
Stock Price Prediction Using ARIMA Model by Dereje Workneh Medium

This document discusses using an ARIMA model to predict future stock prices based on historical price data. It describes collecting 5 years of Apple stock price data, exploring the data, and splitting it into training and test sets. It then covers checking the data for stationarity, identifying the ARIMA model parameters p, d, and q by examining ACF and PACF plots, fitting the ARIMA model to make predictions on the validation set, and summarizing the general ARIMA modeling process.

Uploaded by

Lê Hoà

Dereje Workneh

Jun 6, 2020 · 7 min read

Stock price prediction using ARIMA Model

Any kind of prediction is a difficult task in the real world, especially where the future is very dynamic. The stock market is highly volatile and unpredictable by nature, so investors always take risks in the hope of making a profit. People invest in the stock market expecting a profit from their investments. Many factors influence stock prices, such as supply and demand, market trends, the global economy, corporate results, historical prices, public sentiment, sensitive financial information, and popularity (for example, good or bad news related to a company's name or products), all of which may increase or decrease the number of buyers. Even after analyzing many of these factors, it is still difficult to perform well in the stock market or to predict future prices in general. Predicting the price of a specific stock one day ahead is, by itself, a very complicated task.

In this blogpost, next-day stock prices are predicted for each individual day of a certain year, and each prediction is compared with the actual price to validate the model. We have been tasked with predicting the price of Apple (AAPL) stock and have been provided with historical data (time series data). This includes features such as the opening and closing stock prices, volume, date, and so on.

A time series is a series of data points collected over a period of time and indexed, listed, or graphed in time order. Time series data are sequential and often follow some pattern; they are also called historical or past data. Using them to predict a future value from historical values is called time series analysis. The daily closing price of a stock, the heights of ocean tides, and counts of sunspots are examples of time series data. Time series are studied for several purposes, such as forecasting the future based on knowledge of the past, understanding the phenomenon underlying the measurements, or simply describing the salient features of the series succinctly. Forecasting future values of an observed time series plays an important role in nearly all fields of science, engineering, finance, business intelligence, economics, meteorology, telecommunications, etc. To predict an outcome from time series data, we can use a time series model. Here, the AutoRegressive Integrated Moving Average (ARIMA) model is used to analyze and predict future stock prices based on historical prices.

Data Collection: In this section, we collect our data and load it into Python. This blogpost uses 5 years (2015–2020) of AAPL (Apple) stock price data from Yahoo Finance, which you can download here. We chose to use the closing value for our analysis. This is the workflow of the ARIMA model for this blogpost:

Let’s first import the required libraries:

Load the data
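The original import and loading code is not reproduced in this copy. Here is a minimal sketch of loading a Yahoo-Finance-style CSV with pandas and keeping only the closing value. The inline rows stand in for the downloaded file. Their numbers are invented, not real AAPL prices, and the filename mentioned in the comment is hypothetical.

```python
import io

import pandas as pd

# Stand-in for the Yahoo Finance download. These rows are made-up numbers,
# NOT real AAPL prices. With the real file you would call
# pd.read_csv("AAPL.csv") instead (the filename is a hypothetical example).
CSV_TEXT = """Date,Open,High,Low,Close,Adj Close,Volume
2020-01-02,100.0,102.0,99.0,101.0,100.5,1000000
2020-01-03,101.0,103.0,100.0,102.5,102.0,1200000
2020-01-06,102.5,104.0,101.5,103.0,102.5,900000
2020-01-07,103.0,103.5,100.5,101.5,101.0,1100000
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))
close = df["Close"]  # the post models the closing value only
print(df.head())
```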

Exploratory Data Analysis:

Let’s check whether the data contains null values, and look at its shape.

We have seen earlier that the data type of ‘Date’ is object (text). So first we have to convert it to datetime format; otherwise we cannot extract features from it.
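A minimal sketch of these checks with pandas, again on a few invented rows rather than the real AAPL file. It checks the shape and null counts, then converts ‘Date’ with `pd.to_datetime` and extracts year and month. The extra `Year` and `Month` columns are illustrative choices, not necessarily the features the post built.

```python
import io

import pandas as pd

# Hypothetical Yahoo-Finance-style rows (made-up numbers, not real AAPL prices).
CSV_TEXT = """Date,Open,High,Low,Close,Adj Close,Volume
2020-01-02,100.0,102.0,99.0,101.0,100.5,1000000
2020-01-03,101.0,103.0,100.0,102.5,102.0,1200000
2020-02-03,102.5,104.0,101.5,103.0,102.5,900000
"""
df = pd.read_csv(io.StringIO(CSV_TEXT))

print(df.shape)           # (rows, columns)
print(df.isnull().sum())  # number of nulls per column
print(df["Date"].dtype)   # a text dtype straight out of read_csv

# Convert 'Date' to datetime so that date parts can be extracted from it.
df["Date"] = pd.to_datetime(df["Date"])
df["Year"] = df["Date"].dt.year
df["Month"] = df["Date"].dt.month

# Index by date and keep rows in chronological order for time series work.
df = df.set_index("Date").sort_index()
```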

The closing price based on month

Closing price based on closing price

We can see that the closing price increases over the specified time frame.

Bar graph

The correlation
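The plots above are not reproduced here. The numbers behind a "closing price by month" view and a correlation table can be computed as in the sketch below. It uses a synthetic random-walk series in place of the AAPL data, and the column set is an assumption.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the AAPL data (a random walk with drift; not real prices).
rng = np.random.default_rng(0)
dates = pd.bdate_range("2020-01-01", periods=120)  # business days only
close = 100 + np.cumsum(rng.normal(0.1, 1.0, size=len(dates)))
df = pd.DataFrame(
    {
        "Open": close + rng.normal(0, 0.5, size=len(dates)),
        "Close": close,
        "Volume": rng.integers(800_000, 1_200_000, size=len(dates)),
    },
    index=dates,
)

# Average closing price per calendar month.
monthly_close = df["Close"].groupby(df.index.to_period("M")).mean()

# Pairwise Pearson correlation between the numeric columns.
corr = df.corr()

print(monthly_close.round(2))
print(corr.round(2))
```

Grouping by `to_period("M")` avoids the monthly resample alias, whose spelling differs between pandas versions.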

Training and test datasets: It is important that we do not pick the training and testing datasets randomly. In stock price prediction, the test data should always be the most recent part of the dataset. A random split would let the model train on prices from after the days it is tested on.

The training dataset is 80% of the total dataset, while the test dataset is the remaining 20%.

Data distribution based on train and test datasets
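A minimal sketch of a chronological 80/20 split with pandas. Nothing is shuffled: the oldest 80% of the days form the training set and the newest 20% the test set. The helper name `chronological_split` and the synthetic price series are illustrative assumptions.

```python
import numpy as np
import pandas as pd


def chronological_split(series, train_frac=0.8):
    """Split a time-indexed series in time order: oldest part train, newest part test."""
    series = series.sort_index()
    cut = int(len(series) * train_frac)
    return series.iloc[:cut], series.iloc[cut:]


# Hypothetical daily closing prices (synthetic, not AAPL).
idx = pd.bdate_range("2015-01-01", periods=100)
close = pd.Series(np.linspace(100, 150, 100), index=idx)

train, test = chronological_split(close)
print(len(train), len(test))
```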

ARIMA model:
ARIMA stands for AutoRegressive Integrated Moving Average. It is specified by three ordered parameters (p, d, q), where:

p is the order of the autoregressive model (number of time lags)

d is the degree of differencing (number of times the data have had past
values subtracted)

q is the order of the moving-average model (number of lagged forecast errors)
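For reference, ARIMA(p, d, q) is usually written in lag-operator notation. Here L is the lag (backshift) operator with L y_t = y_{t-1}, and the φ_i are the p autoregressive coefficients. The factor (1 − L)^d applies d differences. The θ_j are the q moving-average coefficients, ε_t is white noise, and c is a constant. Sign conventions for θ vary between textbooks and libraries.

```latex
\left(1 - \sum_{i=1}^{p} \phi_i L^i\right)(1 - L)^d \, y_t
  = c + \left(1 + \sum_{j=1}^{q} \theta_j L^j\right)\varepsilon_t
```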

Before building the ARIMA model, we have to make sure our data is stationary. For a series to be stationary:

1. The mean of the series should not be a function of time.

2. The variance of the series should not be a function of time.

3. The covariance of the i-th term and the (i + m)-th term should not be a function of time (it may depend only on the lag m).
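In symbols, these three conditions define weak stationarity. For a series y_i, μ and σ² are constants, and the autocovariance γ(m) depends only on the lag m:

```latex
\mathbb{E}[y_i] = \mu, \qquad
\operatorname{Var}(y_i) = \sigma^2, \qquad
\operatorname{Cov}(y_i, y_{i+m}) = \gamma(m) \quad \text{for all } i
```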

This matters because linear regression assumes that the observations are independent of each other. In a time series, however, we know that observations are time dependent. It turns out that many nice results that hold for independent random variables (the law of large numbers and the central limit theorem, to name a couple) also hold for stationary random variables. So by making the data stationary, we can actually apply regression techniques to this time-dependent variable.

There are two methods to check the stationarity of a time series. The first is to look at the data: once the series is plotted, a changing mean or variance should be easy to identify. For a more accurate assessment there is the (augmented) Dickey-Fuller test. I won’t go into the specifics of this test, but its null hypothesis is that the series has a unit root (is non-stationary). If the ‘Test Statistic’ is less than (more negative than) the ‘Critical Value’, we reject the null hypothesis and treat the time series as stationary. Below is code that will help you visualize the time series and test for stationarity.

We can easily see that the time series is not stationary, and our
adf_test_stationarity function confirms what we see.

To make our data more stationary, we can transform it. There are many transformations that stationarize data, such as deflation by CPI, taking logarithms, first differencing, seasonal differencing, and seasonal adjustment. If you need to read more, here is the link.

For this blogpost, we plot the ACF and PACF charts to find the optimal parameters. The next step is to determine the tuning parameters of the model by looking at the autocorrelation and partial autocorrelation graphs. There are many rules and practices for selecting the appropriate AR, MA, SAR, and SMA terms for the model. The chart below provides a brief guide on how to read the autocorrelation and partial autocorrelation graphs to select the proper terms. The big issue, as with all models, is that you don’t want to overfit your model to the data by using too many terms.

ACF plot

PACF plot

Build Model:
Below are the steps we follow for implementing auto ARIMA:

1. Fit Auto ARIMA: Fit the model on the univariate series

2. Predict values on validation set: Make predictions on the validation set

Summary:
The general steps to implement an ARIMA model on time series data are:

1. Load the data: The first step in model building is, of course, to load the dataset.

2. Preprocessing: Depending on the dataset, the preprocessing steps will differ. They may include creating timestamps, converting the dtype of the date/time column, making the series univariate, etc.

3. Make the series stationary: To satisfy the stationarity assumption, we need to make the series stationary. This includes checking the stationarity of the series and performing the required transformations.

4. Determine the d value: The number of times the differencing operation was performed to make the series stationary is taken as the d value.

5. Create ACF and PACF plots: This is the most important step in ARIMA implementation. The ACF and PACF plots are used to determine the input parameters of our ARIMA model.

6. Determine the p and q values: Read the values of p and q from the plots in the previous step.

7. Fit the ARIMA model: Using the processed data and the parameter values calculated in the previous steps, fit the ARIMA model.

8. Predict values on the validation set: Predict the future values.

Final Words
Thank you for reading and for your time. If you liked this story, please hold the clap button. I’ll be happy to hear your feedback, and see you in the next blogpost!
