Project Report
on
STOCK PRICE
PREDICTOR
Using Long Short-Term Memory Networks
CENTRE FOR
DEVELOPMENT OF
ADVANCED COMPUTING
JAIPUR
Project By
HARSHIT BHARDWAJ
DEEKSHANT JAIN
Table of Contents
DEFINITION
Project Overview
Problem Statement
ANALYSIS
Data Exploration
Data Visualization
Algorithms and Techniques
METHODOLOGY
Data Preprocessing
Implementation
RESULT
Model Evaluation and Validation
Justification
CONCLUSION
DEFINITION
Project Overview
Investment firms, money funds and even individuals have been using financial
models to better understand market behavior and make profitable investments and
trades. A wealth of information is available in the form of historical stock prices
and company performance data, suitable for machine learning algorithms to
process.
Can we actually predict stock prices with machine learning? Investors make
educated guesses by analyzing data. They read the news and study the company
history, industry trends and the many other data points that go into making a
prediction. The prevailing theory is that stock prices are totally random and
unpredictable, but that raises the question of why top firms like Morgan Stanley
and Citigroup hire quantitative analysts to build predictive models. In fact, about
70% of all orders on Wall Street are now placed by software; we are living in the
age of the algorithm.
This project uses a deep learning model, the Long Short-Term Memory (LSTM)
neural network, to predict stock prices. Recurrent neural networks (RNNs) are
well suited to time-ordered data, and recent research has shown LSTM networks
to be among the most popular and useful variants of RNNs.
We will use Keras to build an LSTM that predicts stock prices from the historical
closing price and trading volume, and visualize both the predicted price values
over time and the optimal parameters for the model.
Problem Statement
The challenge of this project is to accurately predict the future closing
value of a given stock over a given period of time in the future. For
this project I will use a Long Short-Term Memory network, usually just
called an “LSTM”, to predict the closing price of the S&P 500 [1] using a
dataset of past prices.
GOALS
Model Evaluation
Performance in this project is measured with the Mean Squared Error (MSE) and
the Root Mean Squared Error (RMSE), both computed from the difference between
the predicted and actual adjusted close prices of the target stock, and with the
delta between the scores of the benchmark model (Linear Regression) and our
primary model (Deep Learning).
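As a minimal sketch of the two metrics (not the project's own code), MSE and RMSE can be computed with NumPy as shown below. The function names and the sample values are made up for illustration.

```python
import numpy as np


def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean((actual - predicted) ** 2))


def root_mean_squared_error(actual, predicted):
    """Square root of the MSE, expressed in the same units as the prices."""
    return float(np.sqrt(mean_squared_error(actual, predicted)))


# Illustrative values only, not results from this project.
actual = [0.50, 0.52, 0.55, 0.53]
predicted = [0.49, 0.53, 0.52, 0.53]
mse = mean_squared_error(actual, predicted)
rmse = root_mean_squared_error(actual, predicted)
print(f"MSE: {mse:.6f}  RMSE: {rmse:.6f}")
```

Computing the same two numbers for the benchmark and for the LSTM gives the delta between the models directly, since both scores are on the same scale.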
ANALYSIS
Data Exploration
The data used in this project is that of Alphabet Inc from January 1, 2005 to
June 20, 2017. It is a series of data points indexed in time order, i.e. a time
series. My goal was to predict the closing price for any given date after training.
For ease of reproducibility and reusability, all data was pulled from the Google
Finance Python API [4].
The prediction has to be made for the closing (adjusted closing) price of the data.
Since Google Finance already adjusts the closing prices for us [5], we just need
Table: The complete dataset can be found in ‘Google.csv’ in the project root folder.
Note: We did not observe any abnormality in the dataset, i.e., no feature is empty
and none contains incorrect values such as negative numbers.
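A check of this kind can be sketched with Pandas as follows. This is an illustrative helper, not the notebook's code: the function name `check_dataset`, the capitalised column names and the sample rows are all assumptions.

```python
import pandas as pd


def check_dataset(df):
    """Return the number of missing and negative entries in the frame."""
    numeric = df.select_dtypes(include="number")
    return {
        "missing": int(df.isna().sum().sum()),
        "negative": int((numeric < 0).sum().sum()),
    }


# Tiny stand-in frame; the real data would come from the project's CSV file.
sample = pd.DataFrame(
    {
        "Open": [100.0, 101.5, 102.0],
        "Close": [101.0, 101.0, 103.5],
        "Volume": [1200, 1500, 1100],
    }
)
report = check_dataset(sample)
print(report)
```

A dataset with no abnormalities reports zero for both counts.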
We can infer from this dataset that the date, high and low values are not important
features of the data. It does not matter what the highest or the lowest trading
price of the stock was on a particular day. What matters is the opening price and
the closing price of the stock. If at the end of the day the closing price is higher
than the opening price we have made a profit; otherwise we have seen a loss. The
volume of shares traded is also important, as a rising market should see rising
volume, whereas an increasing price with decreasing volume shows a lack of
interest and is a warning of a potential reversal. A price drop (or rise) on large
volume is a stronger signal that something in the stock has fundamentally changed.
Therefore I have removed the Date, High and Low features from the dataset at the
preprocessing step.
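Removing these features could look like the sketch below. The helper name `select_features`, the exact column labels and the two sample rows are assumptions made for illustration; the real labels depend on how the CSV was saved.

```python
import pandas as pd


def select_features(df, drop=("Date", "High", "Low")):
    """Drop the columns judged unimportant, ignoring any that are absent."""
    present = [name for name in drop if name in df.columns]
    return df.drop(columns=present)


# Made-up rows standing in for the downloaded price table.
sample = pd.DataFrame(
    {
        "Date": ["2005-01-03", "2005-01-04"],
        "Open": [100.0, 101.5],
        "High": [102.0, 103.0],
        "Low": [99.5, 100.5],
        "Close": [101.0, 101.0],
        "Volume": [1200, 1500],
    }
)
features = select_features(sample)
print(list(features.columns))
```

The three remaining columns (open, close and volume) match the three features per time step used by the model input.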
Data Visualization
To visualize the data I have used the matplotlib library. I have plotted Closing stock
In addition to adjusting the architecture of the Neural Network, the following full
set of parameters can be tuned to optimize the prediction model:
• Input Parameters
• Number of Nodes (how many nodes per layer; tested 1, 3, 8, 16, 32, 64,
100, 128)
• Training Parameters
• Training / Test Split (how much of dataset to train versus test model on)
• Epochs (how many times to run through the training process; kept at 2 for
the base model)
METHODOLOGY
Data Preprocessing
Acquiring and preprocessing the data for this project occurs in the following sequence:
• Request the data from the Google Finance Python API and save it in google.csv
• Remove the unimportant features (date, high and low) from the acquired data and
reverse the order of the data, i.e., from January 03, 2005 to June 30, 2005
• Split the dataset into training and test datasets for the LSTM model. The split
has the following shapes: x_train (2589, 50, 3), y_train (2589,), x_test (446, 50, 3),
y_test (446,)
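The shapes above suggest samples made of 50 consecutive time steps with 3 features each, and one target value per sample. A minimal sketch of such a windowing and split step is given below. It is an assumption about how the arrays were built, not the notebook's code: `make_windows`, `chronological_split`, the choice of the next row as the label and the synthetic data are all hypothetical.

```python
import numpy as np


def make_windows(data, window=50, target_col=1):
    """Slice a (rows, features) array into overlapping windows.

    Each sample holds `window` consecutive rows; its label is the value of
    `target_col` in the row immediately after the window.
    """
    data = np.asarray(data, dtype=float)
    n_samples = len(data) - window
    x = np.stack([data[i : i + window] for i in range(n_samples)])
    y = data[window:, target_col]
    return x, y


def chronological_split(x, y, n_test):
    """Keep the last `n_test` samples for testing, without shuffling."""
    return x[:-n_test], y[:-n_test], x[-n_test:], y[-n_test:]


# Synthetic stand-in for the (open, close, volume) table.
rng = np.random.default_rng(0)
table = rng.random((300, 3))
x, y = make_windows(table, window=50, target_col=1)
x_train, y_train, x_test, y_test = chronological_split(x, y, n_test=40)
print(x_train.shape, y_train.shape, x_test.shape, y_test.shape)
```

Splitting by position rather than at random keeps every test sample later in time than the training samples, which is what a price forecast needs.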
Implementation
Once the data has been downloaded and preprocessed, the implementation process
is the same for every model and proceeds as follows.
All the steps to build, train and test the model and its predictions are specified
in detail in the notebook itself.
LSTM model:
Here I am calling a function defined in ‘lstm.py’ which builds the LSTM model.
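The contents of ‘lstm.py’ are not reproduced in this report, so the following is only a sketch of what a model-building function of this kind might look like in Keras. The name `build_lstm_model`, the two stacked LSTM layers, the dropout rate and the default of 128 nodes are assumptions; only the input shape of 50 time steps by 3 features and the MSE loss follow from the report.

```python
def build_lstm_model(window=50, n_features=3, units=128, dropout=0.2):
    """Build a small stacked LSTM regressor (illustrative layer sizes)."""
    # Imported here so this file can be loaded without TensorFlow installed.
    from tensorflow import keras

    model = keras.Sequential(
        [
            keras.Input(shape=(window, n_features)),
            # First LSTM layer returns the full sequence for the next LSTM.
            keras.layers.LSTM(units, return_sequences=True),
            keras.layers.Dropout(dropout),
            # Second LSTM layer returns only its final state.
            keras.layers.LSTM(units),
            keras.layers.Dropout(dropout),
            # Single linear output: the predicted closing price.
            keras.layers.Dense(1, activation="linear"),
        ]
    )
    model.compile(loss="mean_squared_error", optimizer="adam")
    return model
```

A model built this way would typically be trained with `model.fit(x_train, y_train, epochs=2)` and queried with `model.predict(x_test)`, using arrays shaped like those listed under Data Preprocessing.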
Step 4: Now it’s time to predict the prices for the given test dataset.
Step 5: Finally, calculate the test score and plot the results of the improved
LSTM model.
Fig: Plot of Adjusted Close and Predicted Close prices for the basic LSTM
model (epochs=2)
Fig: Plot of Adjusted Close and Predicted Close prices for the basic LSTM model (epochs=5)
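Plots of this kind can be produced with matplotlib along the lines of the sketch below. The function name `plot_predictions`, the axis labels and the synthetic series are assumptions for illustration; they are not the data behind the figures above.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def plot_predictions(actual, predicted, title="Adjusted Close vs Predicted Close"):
    """Draw actual and predicted series on one set of axes and return the figure."""
    fig, ax = plt.subplots(figsize=(10, 4))
    ax.plot(actual, label="Adjusted Close")
    ax.plot(predicted, label="Predicted Close")
    ax.set_xlabel("Trading days")
    ax.set_ylabel("Price")
    ax.set_title(title)
    ax.legend()
    return fig


# Synthetic series for illustration only.
days = np.arange(100)
actual = np.sin(days / 15.0)
predicted = actual + 0.05
fig = plot_predictions(actual, predicted)
```

In a notebook the figure is displayed inline; calling `fig.savefig(...)` with a file name writes it to disk instead.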
RESULT
Free-Form Visualization
To conclude my report I would choose my final model visualization, which is the
LSTM with fine-tuned parameters. I was very impressed to see how close I had
gotten to the actual data, with a mean squared error of just 0.0009. It was a
nice moment for me, as I had to poke around a lot (really a lot!). But it was
fun working on this project.
Technologies Used
The technologies and steps used in this project:
● Set Up Infrastructure
○ iPython Notebook
○ Incorporate required libraries (Keras, TensorFlow, Pandas,
Matplotlib, Sklearn, NumPy)
○ Git project organization
● Prepare Dataset
○ Incorporate data for Alphabet Inc
○ Process the requested data into a Pandas DataFrame
○ Develop a function for normalizing data
● Develop Basic LSTM Model
○ Set up basic LSTM model with Keras utilizing parameters from
Benchmark Model
● Plot LSTM Predicted Values per time series
● Analyze and describe results for report.
CONCLUSION
I started this project with the hope of learning a completely new algorithm, i.e.,
Long Short-Term Memory, and of exploring a real time series dataset. The final
model really exceeded my expectations and has worked remarkably well. I am
greatly satisfied with these results.