How to Extract Fundamental Data from the S&P 500 with Python

The S&P 500 index tracks the performance of about 500 of the largest publicly traded companies in the US and is widely used as a benchmark for the US stock market. Fundamental data on these companies is valuable to investors, analysts, and researchers.

Python's extensive library ecosystem makes it well suited to extracting and analyzing this kind of information. This post shows how to extract fundamental data for S&P 500 companies with Python.

Why Extract Fundamental Data?

Fundamental data covers core financial information such as earnings, revenue, dividends, and other measures commonly used to assess a company's financial strength.

With this data, investors can make better-informed decisions about where to invest their capital. Fundamental analysis is an integral part of value investing, which aims to estimate a stock's intrinsic value.
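Two of the measures used later in this post are simple quotients. As a worked illustration with made-up numbers (not real company data): the trailing P/E ratio divides the share price by trailing earnings per share (EPS), and dividend yield divides the annual dividend per share by the share price. Returning `None` for non-positive EPS is a convention chosen for this sketch, since a negative P/E is not meaningful as a valuation multiple.

```python
from typing import Optional


def pe_ratio(price: float, eps: float) -> Optional[float]:
    """Price-to-earnings ratio; None when EPS is zero or negative (sketch convention)."""
    if eps <= 0:
        return None
    return price / eps


def dividend_yield(annual_dividend: float, price: float) -> float:
    """Annual dividend per share as a fraction of the share price."""
    return annual_dividend / price


# Hypothetical stock: $150 share price, $6 trailing EPS, $3 annual dividend
print(pe_ratio(150.0, 6.0))        # 25.0
print(dividend_yield(3.0, 150.0))  # 0.02, i.e. a 2% yield
```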

Prerequisites

Before you begin, make sure you have the following:

Python 3.x installed: Make sure Python 3.x is installed on your system.
Basic understanding of Python: You should be familiar with libraries such as pandas, requests, and yfinance. You also need an IDE or text editor of your choice, such as Jupyter Notebook or VS Code.
Install required libraries: Install the necessary libraries with pip using the command below:
```
pip install pandas requests yfinance beautifulsoup4 lxml matplotlib
```

Steps to Extract the Data

Follow the steps below to extract fundamental data for the S&P 500 with Python:

Step 1: Import Required Libraries

First, import the required libraries:

```python
import pandas as pd
import yfinance as yf
import requests
from bs4 import BeautifulSoup
```

pandas: manipulates and analyzes tabular data.
yfinance: a Python package for downloading stock market data from Yahoo Finance.
requests: sends HTTP requests to fetch web pages.
Beautiful Soup: parses HTML so you can easily extract information from web pages.


Step 2: Get a List of S&P 500 Companies

Next, we need the list of companies that make up the S&P 500, which Wikipedia publishes as a table:

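```python
from io import StringIO
# Wikipedia page listing the current S&P 500 constituents
url = 'https://en.wikipedia.org/wiki/List_of_S%26P_500_companies'
headers = {'User-Agent': 'Mozilla/5.0 (sp500-fundamentals script)'}  # Wikipedia may reject requests without a User-Agent
response = requests.get(url, headers=headers, timeout=30)
response.raise_for_status()
soup = BeautifulSoup(response.text, 'html.parser')
table = soup.find('table', {'id': 'constituents'})
df = pd.read_html(StringIO(str(table)))[0]
# Yahoo Finance writes share-class tickers with a hyphen (BRK-B); Wikipedia uses a dot (BRK.B)
df['Symbol'] = df['Symbol'].str.replace('.', '-', regex=False)
df.to_csv('sp500_companies.csv', index=False)
# Show the first few rows of the DataFrame
df.head()
```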
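The constituents table is already useful before any Yahoo Finance requests are made. A minimal sketch, assuming the Wikipedia table keeps its `GICS Sector` column (column names on the page can change), that counts how many companies fall in each sector. The `sample` frame is a small hypothetical stand-in so the example runs offline; with the real data, call `companies_per_sector(df)`.

```python
import pandas as pd


def companies_per_sector(constituents: pd.DataFrame, sector_col: str = "GICS Sector") -> pd.Series:
    """Count constituents per sector, largest sector first."""
    return constituents[sector_col].value_counts()


# Small stand-in for the scraped table so the example runs without network access
sample = pd.DataFrame({
    "Symbol": ["AAPL", "MSFT", "JPM", "XOM"],
    "GICS Sector": ["Information Technology", "Information Technology", "Financials", "Energy"],
})
print(companies_per_sector(sample))
```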

Step 3: Fetching Fundamental Data with yfinance

The code in Step 2 scrapes the S&P 500 table from Wikipedia into a pandas DataFrame that lists each company's ticker symbol, name, sector, and other details.

With the list of S&P 500 companies in place, we can start pulling fundamental data with yfinance. The following code retrieves market capitalization, trailing P/E ratio, dividend yield, and earnings per share (EPS):

```python
def get_fundamental_data(ticker):
    stock = yf.Ticker(ticker)
    info = stock.info

    data = {
        'Ticker': ticker,
        'Market Cap': info.get('marketCap', 'N/A'),
        'PE Ratio': info.get('trailingPE', 'N/A'),
        'Dividend Yield': info.get('dividendYield', 'N/A'),
        'EPS': info.get('trailingEps', 'N/A'),
    }
    return data

# Extract data for a few companies
tickers = df['Symbol'].head(5)  # Get tickers for the first 5 companies
fundamental_data = [get_fundamental_data(ticker) for ticker in tickers]
fundamental_df = pd.DataFrame(fundamental_data)

# Print the extracted data
fundamental_df
```

The code above:

Defines a get_fundamental_data function that takes a stock ticker as input and returns a dictionary of fundamental data.
Applies it to a subset of S&P 500 companies (the first five) and stores the output in a DataFrame.
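Running this over all ~500 tickers one request at a time can fail partway if Yahoo Finance returns an error or throttles requests. Below is a hedged sketch of a more defensive loop. `fetch_many` is a hypothetical helper, not part of yfinance: it takes any per-ticker function (such as `get_fundamental_data`), records tickers that raise instead of aborting, and pauses between requests. The one-second default delay is an arbitrary assumption, not a documented Yahoo limit.

```python
import time
from typing import Callable, Iterable, List, Tuple

import pandas as pd


def fetch_many(
    tickers: Iterable[str],
    fetch: Callable[[str], dict],
    delay: float = 1.0,
) -> Tuple[pd.DataFrame, List[str]]:
    """Apply `fetch` to each ticker; return a DataFrame of successes and a list of failures."""
    rows, failed = [], []
    for ticker in tickers:
        try:
            rows.append(fetch(ticker))
        except Exception:  # network errors, missing data, throttling, etc.
            failed.append(ticker)
        time.sleep(delay)  # pause between requests to go easy on the data source
    return pd.DataFrame(rows), failed
```

For example, `fundamental_df, failed = fetch_many(df['Symbol'], get_fundamental_data)` builds the full table and leaves the problem tickers in `failed` so they can be retried later.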

Step 4: Visualize or Analyze Data

Once you have extracted the data, you will probably want to analyze or visualize it. Here's an example that plots the distribution of trailing P/E ratios across the S&P 500. Because it requests data for every constituent, it can take a while to run:

```python
import matplotlib.pyplot as plt

# Extract P/E ratios for all companies (one yfinance request per ticker, so this is slow)
df['PE Ratio'] = df['Symbol'].apply(lambda x: get_fundamental_data(x)['PE Ratio'])
# Convert 'N/A' placeholders to NaN so they can be dropped before plotting
df['PE Ratio'] = pd.to_numeric(df['PE Ratio'], errors='coerce')

# Plot the distribution of P/E ratios
plt.figure(figsize=(10, 6))
df['PE Ratio'].dropna().hist(bins=50)
plt.title('Distribution of PE Ratios in the S&P 500')
plt.xlabel('PE Ratio')
plt.ylabel('Number of Companies')
plt.show()
```

The histogram shows how many S&P 500 companies fall into each P/E range, giving a quick picture of how the index's constituents are valued.
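To compare valuations by sector, P/E values can be joined back onto the constituents table. A sketch, assuming the `Symbol` and `GICS Sector` column names from the Wikipedia table and the `Ticker`/`PE Ratio` keys used in `get_fundamental_data`; `median_pe_by_sector` is a hypothetical helper. It uses the median because a few very high P/E values would distort a mean.

```python
import pandas as pd


def median_pe_by_sector(constituents: pd.DataFrame, fundamentals: pd.DataFrame) -> pd.Series:
    """Median trailing P/E per sector, lowest first."""
    # Keep only the join keys and needed columns to avoid clashing column names
    merged = constituents[["Symbol", "GICS Sector"]].merge(
        fundamentals[["Ticker", "PE Ratio"]], left_on="Symbol", right_on="Ticker", how="inner"
    )
    # get_fundamental_data stores 'N/A' for missing values; coerce those to NaN
    merged["PE Ratio"] = pd.to_numeric(merged["PE Ratio"], errors="coerce")
    # median() skips NaN, so companies without a P/E do not affect the result
    return merged.groupby("GICS Sector")["PE Ratio"].median().sort_values()
```

Calling `median_pe_by_sector(df, fundamental_df)` only covers the tickers present in `fundamental_df` (the first five in Step 3). If you ran the loop above, `df` already holds a numeric `PE Ratio` column, and `df.groupby('GICS Sector')['PE Ratio'].median()` gives the same kind of summary directly.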

Step 5: Save and Share Your Data

Finally, you may want to save the extracted data for further analysis or to share it with others. Exporting the DataFrame to a CSV file takes a single call:

```python
fundamental_df.to_csv('sp500_fundamental_data.csv', index=False)
```

This writes the DataFrame to a CSV file named sp500_fundamental_data.csv, which can be opened in Excel or any other data analysis tool.
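Fundamental figures change as companies report earnings, so overwriting a single CSV loses earlier snapshots. One option, sketched here with a hypothetical `save_snapshot` helper, writes each run to a date-stamped file and returns its path so it can be read back with `pd.read_csv`:

```python
from datetime import date
from pathlib import Path

import pandas as pd


def save_snapshot(frame: pd.DataFrame, directory: str = ".", prefix: str = "sp500_fundamental_data") -> Path:
    """Write `frame` to <directory>/<prefix>_<YYYY-MM-DD>.csv and return the file path."""
    path = Path(directory) / f"{prefix}_{date.today().isoformat()}.csv"
    path.parent.mkdir(parents=True, exist_ok=True)  # create the output folder if needed
    frame.to_csv(path, index=False)
    return path
```

With the data from Step 3, `save_snapshot(fundamental_df)` writes a file such as `sp500_fundamental_data_2024-09-10.csv`, dated with the day the script runs.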

micahgreen

I Am A Software Engineer and Passionate Programmer

Updated on: 2024-09-10T10:55:55+05:30
