QMR.AI - yfinance Library: The Definitive Guide
qmr.ai/yfinance-library-the-definitive-guide/
Yahoo Finance is definitely one of the most popular sources of stock market data
available to retail traders, and as such it is the first data source that most of us used
when we got started. Its ease of use and wide array of datasets also allow for quick
prototyping, making it a valuable source for backtesting research ideas and implementing
very basic strategies.
Although more than a few Python libraries give coders programmatic access to the
unofficial Yahoo Finance API, yfinance stands out as one of the oldest, most reliable and
most actively used libraries out there.
In this article I’ll cover the most useful aspects of this library, and I’ll even go into some
detail describing its parameters.
Installing yfinance
Installing yfinance is a straightforward process, but Python and pip have to be installed
beforehand. If you do not already have them installed, watch the following video.
You'll notice that the tutorial installs “Anaconda”, a Python distribution that comes
with all the most important bells and whistles included, pip among them.
Assuming that you followed the previous steps, the remainder of the installation is trivial.
You just need to open a terminal and run the following command:
pip install yfinance
Now you should be able to follow along during the next sections!
In order to get started, we have to create a Ticker instance and pass the symbol of the
asset that we are interested in getting data from.
import yfinance as yf
from datetime import datetime
amzn = yf.Ticker("AMZN")
# GET TODAY'S DATE AND CONVERT IT TO A STRING WITH YYYY-MM-DD FORMAT (YFINANCE EXPECTS THAT FORMAT)
end_date = datetime.now().strftime('%Y-%m-%d')
amzn_hist = amzn.history(start='2022-01-01', end=end_date)
print(amzn_hist)
By default, Yahoo returns daily data, but we can also set the bar size through the
“interval” parameter. Note that different bar sizes have different limits on how far back
in time we can go.
Yahoo Finance Data Restrictions
Requesting a date range (start to end) that exceeds these limits results in an error. You
can avoid this error by asking only for the maximum allowed amount of data: set the
parameter “period” to ‘max’ and pass either “end” or “start” (but not both).
The following example fetches the most recent available 1-minute data for AMZN.
# OBSERVATION: THE VARIABLE 'end_date' WAS DECLARED IN THE PREVIOUS CODE SNIPPET
amzn_hist = amzn.history(period='max',end=end_date,interval='1m')
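If you prefer to keep explicit start and end dates, another option is to clamp the start date yourself before calling history. The sketch below is a hypothetical helper (clamp_start is not part of yfinance). The 7-day limit for 1-minute bars is the one stated in this article; the 60-day limit for other intraday bars and the 730-day limit for hourly bars are assumptions based on commonly reported Yahoo behavior, and Yahoo may change any of them without notice.

```python
from datetime import datetime, timedelta

# ASSUMED MAXIMUM LOOKBACK IN DAYS PER INTERVAL. THE 7-DAY VALUE FOR 1m COMES FROM
# THIS ARTICLE; THE OTHERS ARE ASSUMPTIONS AND YAHOO MAY CHANGE ANY OF THEM.
MAX_LOOKBACK_DAYS = {
    '1m': 7,
    '2m': 60, '5m': 60, '15m': 60, '30m': 60, '90m': 60,
    '60m': 730, '1h': 730,
}


def clamp_start(start, end, interval):
    """Hypothetical helper (not part of yfinance): move 'start' forward so that
    the window ending at 'end' does not exceed the assumed limit for 'interval'.
    Dates are 'YYYY-MM-DD' strings; 'end' is assumed to be today or very recent."""
    limit = MAX_LOOKBACK_DAYS.get(interval)
    if limit is None:
        # DAILY AND LONGER BARS: NO LIMIT ASSUMED, KEEP THE REQUESTED START
        return start
    start_dt = datetime.strptime(start, '%Y-%m-%d')
    earliest = datetime.strptime(end, '%Y-%m-%d') - timedelta(days=limit)
    return max(start_dt, earliest).strftime('%Y-%m-%d')


print(clamp_start('2024-01-01', '2024-03-15', '1m'))  # 2024-03-08
```

The clamped value could then be passed as start= to amzn.history(...) together with end=end_date and interval='1m'. If Yahoo still rejects a window of exactly the limit, subtracting one more day is a simple safety margin.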
It is also possible to retrieve historical data for multiple assets with a single request. This
can be done by instantiating a Tickers object (plural instead of singular).
companies = ['AMZN','GOOG','WMT','TSLA','META']
tickers = yf.Tickers(companies)
tickers_hist = tickers.history(period='max', end=end_date, interval='1m')
print(tickers_hist)
You might notice that the returned DataFrame has MultiIndex columns (one level for the
price field and one for the ticker), an undesirable structure for most purposes.
We can reshape it into a more convenient form with the code below:
tickers_hist.stack(level=1).rename_axis(['Date', 'Ticker']).reset_index(level=1)
This transforms the previous DataFrame into the following, more convenient structure:
As can be seen, the result is a DataFrame with a single row for each symbol and minute of
available data. Apart from the additional “Ticker” column, the remaining columns keep
the same structure as in the single-asset case.
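To see what the reshaping line does without hitting the network, the sketch below builds a small synthetic DataFrame that assumes the same (field, ticker) column layout as the multi-ticker download and applies the same chain of pandas calls. The prices and volumes are made up for illustration.

```python
import pandas as pd

# SYNTHETIC STAND-IN FOR tickers.history(): COLUMNS ARE (FIELD, TICKER) PAIRS
index = pd.to_datetime(['2024-03-15 09:30', '2024-03-15 09:31'])
columns = pd.MultiIndex.from_product([['Close', 'Volume'], ['AMZN', 'GOOG']])
wide_df = pd.DataFrame([[175.0, 141.0, 1000, 2000],
                        [175.5, 141.2, 1100, 2100]],
                       index=index, columns=columns)

# MOVE THE TICKER LEVEL (LEVEL 1) FROM THE COLUMNS INTO THE ROW INDEX,
# NAME THE TWO INDEX LEVELS, THEN TURN 'Ticker' BACK INTO A REGULAR COLUMN
long_df = (wide_df.stack(level=1)
                  .rename_axis(['Date', 'Ticker'])
                  .reset_index(level=1))
print(long_df)
```

Each (timestamp, ticker) pair becomes its own row, which is the "one row per symbol and minute" shape described above. Recent pandas versions may emit a FutureWarning about the stack implementation; the result is the same.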
Last but not least, it is often convenient to store the retrieved data locally instead of
requesting it from Yahoo multiple times. This is not only because of the request limits
imposed by the server but also because loading locally stored data is much faster than
requesting it again.
tickers_hist.to_csv('all_data.csv')
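A natural next step is to read the file back on later runs and only download when it is missing. The helper below is a hypothetical caching sketch (load_or_fetch is not a yfinance function); it takes any zero-argument function that returns a DataFrame, so you could pass something like lambda: amzn.history(period='1y'). It assumes single-level columns; a file saved from the raw multi-ticker DataFrame with MultiIndex columns typically needs header=[0, 1] when reading it back.

```python
import os

import pandas as pd


def load_or_fetch(path, fetch):
    """Hypothetical caching helper: return the DataFrame stored at 'path' if the
    file exists; otherwise call 'fetch()', save its result to 'path', return it."""
    if os.path.exists(path):
        # index_col=0 RESTORES THE DATE INDEX; parse_dates=True PARSES IT BACK INTO DATETIMES
        return pd.read_csv(path, index_col=0, parse_dates=True)
    df = fetch()
    df.to_csv(path)
    return df
```

The first call downloads and writes the file; later calls only read from disk. Timezone-aware intraday indexes that span a daylight-saving change may not parse back into a clean DatetimeIndex, so checking the index dtype after loading is a good habit.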
The “history” method has many parameters, and knowing them is important in order
to correctly interpret the data you're receiving:
period: As seen before, the value “max” is especially useful. Valid values are:
1d, 5d, 1mo, 3mo, 6mo, 1y, 2y, 5y, 10y, ytd, max.
interval: Defines the size of each bar. Smaller bar sizes have stricter limits;
for example, only 7 days of 1-minute data can be retrieved. Valid values are:
1m, 2m, 5m, 15m, 30m, 60m, 90m, 1h, 1d, 5d, 1wk, 1mo, 3mo.
start: Start date. The server expects a string formatted as YYYY-MM-DD.
end: End date. The server expects a string formatted as YYYY-MM-DD.
prepost: Defines whether to include data from outside regular trading hours (pre- and
post-market). Default value is False.
auto_adjust: Whether to adjust prices for stock splits and dividend payments. The
default value is True.
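To make the effect of auto_adjust concrete, the sketch below reproduces the idea by hand: every price of a bar is scaled by the same factor that turns Close into Adj Close. It assumes data fetched with auto_adjust=False that includes an 'Adj Close' column (whether that column is present may depend on your yfinance version). adjust_ohlc is a hypothetical helper for illustration, not yfinance's own code.

```python
import pandas as pd


def adjust_ohlc(df):
    """Hypothetical helper: scale Open/High/Low by Adj Close / Close, then
    replace Close with Adj Close. Expects Open, High, Low, Close, Adj Close."""
    out = df.copy()
    # ONE RATIO PER BAR, CAPTURING ALL SPLITS AND DIVIDENDS AFTER THAT BAR
    ratio = out['Adj Close'] / out['Close']
    for col in ['Open', 'High', 'Low']:
        out[col] = out[col] * ratio
    out['Close'] = out['Adj Close']
    return out.drop(columns='Adj Close')


# MADE-UP BAR BEFORE A 2-FOR-1 SPLIT: THE ADJUSTED CLOSE IS HALF THE RAW CLOSE
raw = pd.DataFrame({'Open': [98.0], 'High': [102.0], 'Low': [96.0],
                    'Close': [100.0], 'Adj Close': [50.0]})
print(adjust_ohlc(raw))
```

Scaling all four prices by one ratio keeps each bar's shape intact (High stays above Low, and so on), which is why adjusted series remain usable for candlestick charts and indicators.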
[Figure: results of the get_institutional_holders function for a traded company]
What follows is a working code snippet of all the functions described above!
import yfinance as yf
tsla = yf.Ticker('TSLA')
actions = tsla.get_actions()
analysis = tsla.get_analysis()
balance = tsla.get_balance_sheet()
calendar = tsla.get_calendar()
cf = tsla.get_cashflow()
info = tsla.get_info()
inst_holders = tsla.get_institutional_holders()
news = tsla.get_news()
recommendations = tsla.get_recommendations()
sustainability = tsla.get_sustainability()
print(actions)
print('*'*20)
print(analysis)
print('*'*20)
print(balance)
print('*'*20)
print(calendar)
print('*'*20)
print(cf)
print('*'*20)
print(info)
print('*'*20)
print(inst_holders)
print('*'*20)
print(news)
print('*'*20)
print(recommendations)
print('*'*20)
print(sustainability)
print('*'*20)
The following script is a working example of how to retrieve the put and call options for
Tesla. Note that calling option_chain() without a date returns only the nearest expiration.
# IMPORT REQUIRED LIBRARY
import yfinance as yf
tsla = yf.Ticker('TSLA')
tsla_options = tsla.option_chain()
# ACCESS BOTH THE CALLS AND PUTS AND STORE THEM IN THEIR RESPECTIVE VARIABLES
tsla_puts = tsla_options.puts
tsla_calls = tsla_options.calls
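To collect every expiration rather than just the nearest one, you can loop over the Ticker's options attribute (a tuple of expiration-date strings) and call option_chain(date) for each. The function below is a sketch; its name and the added 'expiration' and 'type' columns are hypothetical choices. It sends one request per expiration, so it can be slow and is more likely to hit Yahoo's rate limits.

```python
import pandas as pd


def all_option_chains(ticker):
    """Sketch: combine calls and puts for every expiration of a yfinance Ticker.
    Relies on 'ticker.options' (expiration dates as strings) and on
    'ticker.option_chain(date)' returning an object with .calls and .puts."""
    frames = []
    for expiry in ticker.options:
        chain = ticker.option_chain(expiry)
        for kind, table in (('call', chain.calls), ('put', chain.puts)):
            # TAG EACH ROW SO THE COMBINED TABLE KEEPS ITS EXPIRY AND OPTION TYPE
            frames.append(table.assign(expiration=expiry, type=kind))
    if not frames:
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)
```

Usage would look like tsla_all = all_option_chains(tsla), after which you can filter with tsla_all[tsla_all['type'] == 'put'] or group by 'expiration'.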
As an example, this is an excerpt of the DataFrame containing Tesla's call options
at the time of writing:
Pros of yfinance
Free of charge: the fact that the vast amount of data offered by Yahoo Finance is
free of charge is definitely its biggest advantage.
Lots of datasets: in contrast to other data sources, Yahoo Finance offers a wide
array of datasets, from intraday price data to ESG scores.
Actively maintained: although it might not sound relevant, the fact that the
yfinance library has a large community of users adds an extra layer of robustness
and reliability to the library. That reliability is nonetheless limited by the library's
unofficial nature.
Disadvantages
Unofficial library: the library is developed by a community of users and not by the
data provider itself. Yahoo Finance can update its API at any given time without
giving prior notice to the developers, leading to short periods of time where the data
is inaccessible or corrupt.
Unreliable: because it is an unofficial library, neither the quality of the data nor
access to it is guaranteed by the vendor. yfinance should never be used for live
trading with real money.
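For research scripts that run unattended, one common defensive pattern against the temporary outages described above is to retry failed requests with exponential backoff. The helper below is a generic sketch, not part of yfinance. Because a failed download can also come back as an empty DataFrame instead of raising, it accepts an optional validate callable, for example with_retries(lambda: amzn.history(period='5d'), validate=lambda df: not df.empty).

```python
import time


def with_retries(func, attempts=3, base_delay=1.0, validate=None):
    """Call 'func()' up to 'attempts' times, waiting base_delay, 2*base_delay, ...
    seconds between failures. A result rejected by 'validate' counts as a
    failure. Raises the last error if every attempt fails."""
    if attempts < 1:
        raise ValueError('attempts must be at least 1')
    last_error = None
    for attempt in range(attempts):
        try:
            result = func()
            if validate is None or validate(result):
                return result
            last_error = ValueError('result failed validation')
        except Exception as exc:  # NETWORK AND PARSING ERRORS ALIKE
            last_error = exc
        if attempt < attempts - 1:
            # EXPONENTIAL BACKOFF: 1x, 2x, 4x ... THE BASE DELAY
            time.sleep(base_delay * 2 ** attempt)
    raise last_error
```

Retrying only smooths over short interruptions; it does nothing about the data quality issues above, so it does not make yfinance suitable for live trading.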
A common task with price data is plotting it as candlesticks, which is very straightforward
if we combine yfinance with finplot. The following snippet is a working example that
produced the chart above.
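# MAKE SURE BOTH LIBRARIES ARE INSTALLED BEFOREHAND (pip install yfinance finplot)
import yfinance as yf
import finplot as fplt
tsla = yf.Ticker('TSLA')
df = tsla.history(interval='1d', period='1y')
# PASS THE COLUMNS IN OPEN, CLOSE, HIGH, LOW ORDER; THE DATE INDEX IS USED AS THE X AXIS
fplt.candlestick_ochl(df[['Open','Close','High','Low']])
fplt.show()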
The following script fetches 60 days’ worth of 15-minute bar data of Tesla:
import yfinance as yf
tsla = yf.Ticker('TSLA')
df = tsla.history(interval='15m',period='60d')
print(df)
And the following script transforms said data into 45-minute bars:
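import pandas as pd
# 'df' HOLDS THE 15-MINUTE TESLA BARS FROM THE PREVIOUS SNIPPET
df_45m = df.groupby(pd.Grouper(freq='45min')).agg({"Open": "first",
                                                  "High": "max",
                                                  "Low": "min",
                                                  "Close": "last",
                                                  "Volume": "sum"})
# DROP EMPTY 45-MINUTE BINS (E.G. OVERNIGHT AND WEEKENDS)
df_45m = df_45m.dropna()
print(df_45m)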
By using the pandas groupby function, we can aggregate the data to suit our specific
needs. OHLC data requires a different aggregation for each column: the Open is the first
price of every period, the High is its maximum price, the Low is its minimum price, the
Close is the last price, and the Volume is the sum of all values in the column. The result
looks as follows:
Notice that by regrouping the DataFrame we got rid of the Stock Splits and Dividends
columns.
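To check the aggregation rules on data small enough to verify by hand, the sketch below feeds three made-up 15-minute bars that fall into a single 45-minute bucket through the same groupby and pd.Grouper approach.

```python
import pandas as pd

# THREE SYNTHETIC 15-MINUTE BARS; 09:45 IS A MULTIPLE OF 45 MINUTES AFTER MIDNIGHT,
# SO ALL THREE LAND IN THE SAME 09:45 TO 10:30 BUCKET
idx = pd.to_datetime(['2024-03-15 09:45', '2024-03-15 10:00', '2024-03-15 10:15'])
bars = pd.DataFrame({'Open':   [10.0, 11.0, 12.0],
                     'High':   [11.5, 12.5, 12.2],
                     'Low':    [9.8, 10.9, 11.1],
                     'Close':  [11.0, 12.0, 11.5],
                     'Volume': [100, 200, 300]}, index=idx)

# FIRST OPEN, HIGHEST HIGH, LOWEST LOW, LAST CLOSE, TOTAL VOLUME
rules = {'Open': 'first', 'High': 'max', 'Low': 'min', 'Close': 'last', 'Volume': 'sum'}
bars_45m = bars.groupby(pd.Grouper(freq='45min')).agg(rules)
print(bars_45m)
```

The single output bar has Open 10.0, High 12.5, Low 9.8, Close 11.5 and Volume 600. By default the buckets are anchored at midnight, so with real regular-hours data the 9:30 bar ends up alone in a 9:00 to 9:45 bucket; pd.Grouper also accepts an offset argument (for example offset='30min') that shifts the buckets if that matters for your use case.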
The yfinance library is neither officially maintained nor developed by Yahoo, but by a
community of users. Although Yahoo Finance lets users access its finance API, it does so
informally: users are not required to subscribe, pay a fee, or even generate an API key.
As a consequence, Yahoo does not provide any type of support, nor does it notify users
about changes to its API endpoints. This is why the yfinance library was developed and is
maintained by a handful of developers unrelated to Yahoo.
Yahoo Finance and the yfinance library are both excellent sources of information for
research and prototyping purposes. The fact that they offer a wide array of valuable
information for free is definitely their biggest advantage.
Having said that, the source is only useful as a reference and should never be used for
real-life trading. The data does not meet the usual quality standards of the industry; for
example, its options chain data is often incomplete, to say the least.
Additionally, Yahoo Finance allows the use of its API, but only informally.
Last but not least, the yfinance library is not maintained by Yahoo but by a
generous community of users. As a consequence, the library may stop working from time
to time, since Yahoo regularly updates its API and the community fixes the library
reactively rather than preemptively.
All data points retrieved by the yfinance library are completely free of charge. That said,
not all data offered by Yahoo Finance is accessible through yfinance functions; some
features require a paid Yahoo Finance subscription.
Most quotes are live, although this depends mostly on the market in which a given
company is listed. All US-based exchange quotes are live on Yahoo Finance (and on
yfinance).
You should check other markets individually, since they might have a built-in 15-minute
delay. Just as a random example, Argentinean stocks have such a delay.