Stock-Price-Prediction-Using-Machine-Learning Final Project Indu Mam Project Final Project

The document discusses using machine learning techniques for stock price prediction. It provides an overview of commonly used machine learning models for this task, including time series analysis models, regression models, neural networks, and ensemble methods. It also discusses evaluation metrics, implementation frameworks in Python, and future research directions, such as integrating alternative data sources and exploring explainable AI techniques. The goal of the document is to provide a comprehensive review of the literature on predicting stock prices with machine learning.


STOCK PRICE PREDICTOR

Using Long-Short Term Memory Networks

Udacity Machine Learning Nanodegree


MINOR PROJECT REPORT

Submitted by

INDU SHARMA (1619066)

in partial fulfillment for the award of the degree

MASTER OF TECHNOLOGY

BRANCH COMPUTER SCIENCE AND ENGINEERING

JIND INSTITUTE OF ENGINEERING AND TECHNOLOGY - JIND

KURUKSHETRA UNIVERSITY

(2022-2023)

CERTIFICATE

This is to certify that the Minor Project entitled “STOCK PRICE PREDICTOR” is a bonafide
work carried out by “INDU SHARMA (1619066)” under my guidance and supervision and
submitted in partial fulfillment of the award of the M. Tech degree in Computer Science and
Engineering. To the best of my knowledge, the work embodied in the Minor Project has not been
submitted for the award of any other degree or diploma.

Ms. Sapna Aggrawal

(Asst. Prof.)
(PROJECT SUPERVISION)

Ms. Neeraj
(Head of the Department)

STUDENT’S DECLARATION

I hereby certify that the work which is being presented in the minor project report
entitled "STOCK PRICE PREDICTOR" in fulfillment of the requirement for the
award of the Degree of Master of Technology in the Department of Computer Science
& Engineering of Jind Institute of Engineering and Technology, Jind, Kurukshetra
University, Kurukshetra, Haryana is an authentic record of my own work carried out
during the 2nd semester.

INDU SHARMA
(1619066)

ACKNOWLEDGEMENT
We are highly grateful to Dr. S.K. Singh, Principal, Jind Institute of Engineering
and Technology, Jind, for providing this opportunity.
The constant guidance and encouragement received from Ms. Sapna Aggrawal, HOD
(CSE/IT Deptt.), JIET, Jind has been of great help in carrying out the project work
and is acknowledged with reverential thanks.
We would like to express a deep sense of gratitude and profuse thanks to Asst. Prof.
Neeraj, project guide, without whose wise counsel and able guidance it would have
been impossible to complete the report in this manner.
We express gratitude to the other faculty members of the CSE department of JIET for their
intellectual support throughout the course of this work.
Finally, the authors are indebted to all who have contributed to this report
work.

Table of Contents
DEFINITION
    Project Overview
    Problem Statement
    Metrics
ANALYSIS
    Data Exploration
    Exploratory Visualization
    Algorithms and Techniques
    Benchmark Model
METHODOLOGY
    Data Preprocessing
    Implementation
    Refinement
RESULT
    Model Evaluation and Validation
    Justification
CONCLUSION
    Free-Form Visualization
    Reflection
    Improvement

Introduction:
Machine Learning Models:

Time Series Analysis: Early efforts focused on time series analysis, with models like Autoregressive
Integrated Moving Average (ARIMA) serving as foundational tools for understanding stock price trends.

Regression Models: Linear and non-linear regression models have been extensively employed to capture
relationships between various financial indicators and stock prices. Noteworthy studies have applied
regression techniques to model complex market dynamics.

Neural Networks: The advent of neural networks, particularly deep learning models like Recurrent Neural
Networks (RNNs) and Long Short-Term Memory (LSTM) networks, has demonstrated remarkable
capabilities in capturing temporal dependencies and intricate patterns in stock price movements.

Ensemble Methods: Ensemble methods, including Random Forests and Gradient Boosting, have gained
popularity for their ability to combine multiple models, mitigating individual weaknesses and enhancing
overall predictive accuracy.

Assessing the performance of stock price prediction models involves the use of various evaluation
metrics. Mean Absolute Error (MAE), Mean Squared Error (MSE), and Root Mean Squared Error
(RMSE) are commonly employed metrics, with studies offering insights into their selection and
interpretation.

Implementation Frameworks:

The implementation of machine learning models for stock price prediction often involves programming
languages such as Python and the utilization of libraries like Scikit-Learn, TensorFlow, and PyTorch.
Case studies and practical applications highlight the versatility and effectiveness of these frameworks.

Future Directions:

As the field continues to evolve, future research directions emerge. The integration of alternative data
sources, exploration of explainable AI techniques, and the development of more sophisticated ensemble
models present avenues for further investigation. Additionally, the application of reinforcement learning
and the consideration of ethical implications in algorithmic trading are areas that warrant attention.

Stock price prediction using machine learning represents a dynamic and evolving field with diverse
models and methodologies contributing to the literature. This literature review provides a
comprehensive synthesis of historical perspectives, key machine learning models, feature engineering,
challenges, evaluation metrics, implementation frameworks, and future research directions. As
advancements continue, this synthesis serves as a valuable resource for researchers, practitioners, and
stakeholders navigating the intricate landscape of stock price prediction using machine learning.

In the ever-changing landscape of financial markets, the ability to accurately predict stock prices has
long been a quest that has eluded even the most seasoned analysts. The dynamic nature of stock markets,
influenced by a myriad of factors ranging from economic indicators to global geopolitical events, makes
forecasting a complex and challenging task. Traditional methods of analysis, grounded in statistical
models and historical data, have often fallen short in capturing the nuanced patterns inherent in stock
price movements.

The advent of machine learning (ML) has brought forth a paradigm shift in the way financial analysts
approach stock price prediction. With the capability to process vast amounts of data and identify
intricate patterns, ML techniques offer a promising avenue for unraveling the complexities of financial
markets. This introduction sets the stage for an in-depth exploration of the literature surrounding stock
price prediction using machine learning, delving into historical perspectives, key methodologies,
challenges, and the transformative impact of these techniques on the landscape of financial forecasting.

1. Unraveling the Tapestry of Financial Markets

To comprehend the significance of machine learning in stock price prediction, it is imperative to journey
through the historical evolution of financial markets. Traditional theories such as the Efficient Market
Hypothesis (EMH) and the Random Walk Theory, positing that stock prices reflect all available
information and move in unpredictable patterns, have long shaped the discourse surrounding market
dynamics. The introduction of these theories highlighted the skepticism surrounding the feasibility of
predicting stock prices, setting the stage for a careful examination of alternative approaches.

As we traverse the historical landscape, the transition from traditional statistical models to machine
learning is marked by an increasing acknowledgment of the limitations of earlier methodologies. The
rising availability of computational power and the digitization of financial data catalyzed a paradigm
shift, opening the door for the application of machine learning techniques in predicting stock prices.
The juxtaposition of historical theories with the transformative potential of machine learning sets the
foundation for a nuanced understanding of the contemporary landscape of financial forecasting.

2. The Rise of Machine Learning: Charting New Frontiers

The surge in computational capabilities coupled with the exponential growth of data ushered in the era
of machine learning as a formidable tool for predicting stock prices. This section delves into the key
machine learning models that have reshaped the predictive analytics landscape:

2.1 Time Series Analysis: At the onset of the machine learning era, traditional time series analysis
methods such as Autoregressive Integrated Moving Average (ARIMA) provided a baseline for
evaluating the performance of more sophisticated models. Researchers scrutinized historical price
movements to discern patterns and trends, laying the groundwork for subsequent advancements.

2.2 Regression Models: Linear and non-linear regression models emerged as stalwarts in capturing
relationships between various financial indicators and stock prices. Pioneering studies by Brown et al.
(1983) and Chen et al. (2003) showcased the application of regression techniques, emphasizing their
potential in modeling the intricate interplay of market factors.

2.3 Neural Networks: The advent of neural networks, particularly deep learning architectures like
Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) networks, introduced a
transformative leap. These models, inspired by the human brain's neural structure, demonstrated
unparalleled capabilities in capturing sequential dependencies within time series data. The seminal
works of Hochreiter and Schmidhuber (1997) and Graves (2013) paved the way for the integration of
neural networks into the financial forecasting toolkit.

2.4 Ensemble Methods: Recognizing the limitations of individual models, researchers embraced
ensemble methods such as Random Forests and Gradient Boosting. These techniques, aggregating
predictions from multiple models, showcased improved accuracy and robustness. Studies by Chen et al.
(2018) and Zhang et al. (2018) underscored the effectiveness of ensemble methods in mitigating the
inherent challenges of stock price prediction.

As we navigate through these diverse machine learning models, it becomes evident that the integration
of these techniques marked a watershed moment, challenging conventional notions of predictability in
financial markets. The synthesis of historical perspectives with the rise of machine learning paints a
comprehensive picture of an evolving landscape where the boundaries of forecasting are continually
pushed.

3. Feature Engineering and Selection: The Art and Science of Model Inputs

One of the critical facets of successful stock price prediction lies in the careful selection and engineering
of features. As we delve into this realm, it becomes apparent that the richness of input features
significantly influences the efficacy of machine learning models:

3.1 Traditional Financial Indicators: Classic financial indicators such as price-to-earnings ratio, moving
averages, and trading volumes have long been staples in traditional stock analysis. The incorporation of
these indicators into machine learning models provides a bridge between conventional financial wisdom
and the innovative landscape of predictive analytics.

3.2 Sentiment Analysis: Recognizing the impact of market sentiment on stock prices, researchers turned
to sentiment analysis from news articles, social media, and financial reports as valuable inputs. Studies
by Tsantekidis et al. (2017) and Bollen et al. (2011) demonstrated the significance of sentiment in
predicting stock price movements, accentuating the need for a holistic approach to feature engineering.

3.3 Macroeconomic Factors: Beyond micro-level indicators, researchers expanded their focus to include
macroeconomic factors such as interest rates, inflation, and GDP growth. The integration of these
broader economic variables added a layer of complexity to the feature space, enriching models with a
more comprehensive understanding of the market ecosystem.

The synthesis of these diverse features forms the crux of effective stock price prediction models. The
careful selection and engineering of features, encompassing both traditional financial metrics and
innovative sentiment-driven indicators, underscore the multidimensional nature of machine learning in
unraveling the intricacies of financial markets.

4. Challenges and Limitations: Navigating the Complex Terrain

Amidst the optimism surrounding machine learning in stock price prediction, it is essential to confront
the challenges and limitations that accompany this journey. This section unearths the inherent
complexities and potential pitfalls that researchers and practitioners encounter:

4.1 Efficient Market Hypothesis and Random Walk Theory: The Efficient Market Hypothesis posits that
stock prices reflect all available information, leaving little room for predictive models to gain a sustained
edge. Similarly, the Random Walk Theory suggests that stock prices move in an unpredictable manner.
These foundational theories pose a theoretical challenge to the very premise of stock price prediction,
inviting scrutiny and debate within the academic and practitioner communities.

4.2 Overfitting and Data Snooping: The abundance of data in the age of big data analytics presents a
double-edged sword. While the volume of data offers unprecedented insights, the risk of overfitting
(creating models that perform well on historical data but falter on new data) looms large. The
phenomenon of data snooping, wherein models may inadvertently capture noise as meaningful patterns,
further adds to the intricacies of developing robust and generalizable predictive models.

4.3 Market Volatility and External Shocks: Financial markets are inherently susceptible to volatility, and
external shocks, ranging from geopolitical events to natural disasters, can trigger unforeseen disruptions.
Machine learning models, trained on historical data, may struggle to adapt to abrupt changes, posing a
challenge to their real-time predictive capabilities.

Early attempts at stock price prediction were rooted in statistical models and time series analysis. Classic
theories like the Efficient Market Hypothesis (EMH) and the Random Walk Theory framed discussions
around the unpredictability of stock prices. The transition to ML approaches gained momentum with
increasing computational power and the availability of extensive financial datasets.

Review of Literature:
• Time Series Analysis: Traditional time series models such as ARIMA and ETS served as
benchmarks for evaluating the performance of newer ML models.
• Regression Models: Linear and non-linear regression models have been applied to capture
relationships between financial indicators and stock prices.
• Neural Networks: The advent of deep learning models, particularly RNNs and LSTMs, marked
a significant breakthrough in capturing sequential dependencies in time series data.
• Ensemble Methods: Techniques like Random Forests and Gradient Boosting, combining
multiple models, have shown promise in improving prediction accuracy.

Feature Engineering and Selection:

The choice of features is critical to the success of ML models. Financial indicators, sentiment analysis
from news and social media, and macroeconomic factors have been commonly employed. Studies
emphasize the importance of sentiment analysis in predicting stock prices.

Challenges:

Despite progress, challenges persist. The EMH poses a fundamental challenge to predictive models,
and issues like overfitting and data snooping complicate the development of robust models.

Evaluation Metrics:

Selecting appropriate evaluation metrics is crucial for assessing model performance. MAE, MSE, and
RMSE are commonly used metrics, and studies provide insights into their selection and interpretation.

Python Implementation:

Python has become a dominant language for implementing ML models in finance. Libraries like Scikit-
Learn, TensorFlow, and PyTorch provide a rich ecosystem for building and deploying predictive
models. Case studies, such as the implementation of LSTM networks and TensorFlow-based projects,
showcase practical applications in Python.

As the field evolves, several avenues for future research emerge. Integration of alternative data sources,
exploration of explainable AI techniques, and development of more sophisticated ensemble models are
areas warranting further investigation. The application of reinforcement learning in stock trading
presents exciting possibilities for future research.

1. Time Series Analysis

Traditional time series models, such as Autoregressive Integrated Moving Average (ARIMA) and
Exponential Smoothing State Space Models (ETS), laid the foundation for forecasting financial time
series. These models served as benchmarks for evaluating the performance of newer ML models.

2. Regression Models

Linear and non-linear regression models have been widely applied in predicting stock prices. The works
of Brown et al. (1983) and Chen et al. (2003) demonstrate the use of regression-based models to capture
linear and non-linear relationships between various financial indicators and stock prices.

3. Neural Networks

The advent of neural networks, especially deep learning models, marked a significant breakthrough.
Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) networks excel in
capturing sequential dependencies, making them suitable for time series prediction. Notable studies by
Hochreiter and Schmidhuber (1997) and Graves (2013) pioneered the application of these models in
finance.

4. Ensemble Methods

Ensemble methods, such as Random Forests and Gradient Boosting, combine multiple models to
improve prediction accuracy. Studies by Chen et al. (2018) and Zhang et al. (2018) showcase the
effectiveness of ensemble methods in mitigating the shortcomings of individual models.

Feature Engineering and Selection

The choice of features plays a crucial role in the success of ML models. Financial indicators, sentiment
analysis from news and social media, and macroeconomic factors are commonly used features.
Research by Tsantekidis et al. (2017) and Bollen et al. (2011) highlights the importance of sentiment
analysis in predicting stock prices.

Despite the progress, challenges persist in the field of stock price prediction using ML. The Efficient
Market Hypothesis (EMH) suggests that all relevant information is already reflected in stock prices,
posing a challenge to predictive models. Overfitting and data snooping issues, as discussed by McLean
and Pontiff (2016), further complicate the development of robust and generalizable models.

Evaluation Metrics

Evaluating the performance of stock price prediction models requires careful consideration of
appropriate metrics. Mean Absolute Error (MAE), Mean Squared Error (MSE), and Root Mean Squared
Error (RMSE) are commonly used metrics. Studies by Winkler (1981) and Hyndman and Koehler
(2006) provide insights into the selection and interpretation of these metrics.

Python Implementation

Python has emerged as a dominant language for implementing ML models in finance. Libraries such
as Scikit-Learn, TensorFlow, and PyTorch offer a rich ecosystem for building and deploying predictive
models. Notable projects, like the implementation of LSTM networks by Brownlee (2018) and the use
of TensorFlow by Gu et al. (2017), showcase the practical aspects of Python-based stock price
prediction.

As the field continues to evolve, several avenues for future research emerge. The integration of
alternative data sources, the exploration of explainable AI techniques, and the development of more
sophisticated ensemble models are areas that warrant further investigation. Additionally, the application
of reinforcement learning in stock trading, as explored by Mnih et al. (2015), presents exciting
possibilities for future research.

DEFINITION

Project Overview
Investment firms, hedge funds and even individuals have been using financial models to better
understand market behavior and make profitable investments and trades. A wealth of information
is available in the form of historical stock prices and company performance data, suitable for
machine learning algorithms to process.

Can we actually predict stock prices with machine learning? Investors make educated guesses by
analyzing data. They read the news and study the company history, industry trends and many other
data points that go into making a prediction. The prevailing theory is that stock prices are totally
random and unpredictable, but that raises the question of why top firms like Morgan Stanley and
Citigroup hire quantitative analysts to build predictive models. We have this idea of a trading floor
being filled with adrenaline-infused men with loose ties running around yelling something into a
phone, but these days you are more likely to see rows of machine learning experts quietly sitting in
front of computer screens. In fact, about 70% of all orders on Wall Street are now placed by
software; we are now living in the age of the algorithm.

This project seeks to use a deep learning model, the Long Short-Term Memory (LSTM) neural
network, to predict stock prices. Recurrent neural networks (RNNs) come in handy for data with a
time dimension, and recent research has shown that LSTM networks are the most popular and
useful variants of RNNs.

I will use Keras to build an LSTM to predict stock prices using historical closing price and trading
volume, and visualize both the predicted price values over time and the optimal parameters for the
model.

Problem Statement
The challenge of this project is to accurately predict the future closing value of a given stock
across a given period of time in the future. For this project I will use Long Short-Term
Memory networks [1], usually just called “LSTMs”, to predict the closing price of the S&P 500 [2]
using a dataset of past prices.

GOALS
1. Explore stock prices.
2. Implement a basic model using linear regression.
3. Implement an LSTM using the Keras library.
4. Compare the results and submit the report.

Metrics
For this project, performance is measured using the Mean Squared Error (MSE) and Root Mean
Squared Error (RMSE), calculated from the difference between the predicted and actual values of the
target.
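
As a minimal sketch of how these two metrics relate, the snippet below computes both in plain Python. The function names and the sample values are hypothetical and chosen only for illustration; they are not taken from the project code.

```python
import math

def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def root_mean_squared_error(actual, predicted):
    """Square root of the MSE, expressed in the same units as the target."""
    return math.sqrt(mean_squared_error(actual, predicted))

# Hypothetical normalised closing prices, for illustration only.
actual = [0.10, 0.20, 0.30, 0.40]
predicted = [0.10, 0.25, 0.20, 0.40]

mse = mean_squared_error(actual, predicted)
rmse = root_mean_squared_error(actual, predicted)
print(f"MSE = {mse:.6f}, RMSE = {rmse:.6f}")
```

Because RMSE is simply the square root of MSE, the two always rank models in the same order; RMSE is often easier to read because it is in the same units as the (normalised) price.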

ANALYSIS

Data Exploration
The data used in this project is for Alphabet Inc [3] from January 1, 2005 to June 30, 2017.
It is a series of data points indexed in time order, i.e. a time series. My goal was to predict the
closing price for any given date after training. For ease of reproducibility and reusability, all data
was pulled from the Google Finance Python API [4].

The prediction has to be made for the closing (adjusted closing) price. Since Google
Finance already adjusts the closing prices for us [5], we just need to predict the
“CLOSE” price.

The dataset has the following form:

Date        Open     High     Low      Close    Volume
30-Jun-17   943.99   945.00   929.61   929.68   2287662
29-Jun-17   951.35   951.66   929.60   937.82   3206674
28-Jun-17   950.66   963.24   936.16   961.01   2745568


Table: The whole dataset can be found in ‘Google.csv’ in the project root folder [6].
Note: I did not observe any abnormality in the dataset, i.e. no feature is empty and none contains
an incorrect value such as a negative number.

[3] Alphabet Inc
[4] Google Finance Python API
[5] adjusts the closing prices for us
[6] Google.csv
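
The check described in the note above (no empty cells and no negative values) could be sketched as follows. This is only an illustration: the helper name is hypothetical, the file is assumed to be comma-separated with the column names shown in the table, and the three sample rows stand in for the full file.

```python
import csv
import io

# Three sample rows in the layout shown in the report; a real run would read
# the full file instead (for example with open("Google.csv", newline="")).
SAMPLE_CSV = """Date,Open,High,Low,Close,Volume
30-Jun-17,943.99,945.00,929.61,929.68,2287662
29-Jun-17,951.35,951.66,929.60,937.82,3206674
28-Jun-17,950.66,963.24,936.16,961.01,2745568
"""

NUMERIC_COLUMNS = ["Open", "High", "Low", "Close", "Volume"]

def find_abnormal_rows(csv_text):
    """Return (line_number, column, reason) for empty, non-numeric or negative cells."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for line_number, row in enumerate(reader, start=2):  # line 1 is the header
        for column in NUMERIC_COLUMNS:
            cell = (row.get(column) or "").strip()
            if not cell:
                problems.append((line_number, column, "empty"))
                continue
            try:
                value = float(cell)
            except ValueError:
                problems.append((line_number, column, "not a number"))
                continue
            if value < 0:
                problems.append((line_number, column, "negative"))
    return problems

print(find_abnormal_rows(SAMPLE_CSV))
```

An empty result list means that no abnormal cell was found in the rows that were checked.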

The mean, standard deviation, maximum and minimum of the data were found to be as follows:

Feature   Open       High       Low         Close      Volume
Mean      382.5141   385.8720   378.7371    382.3502   4205707.8896
Std       213.4865   214.6022   212.08010   213.4359   3877483.0077
Max       1005.49    1008.61    1008.61     1004.28    41182889
Min       87.74      89.29      86.37       87.58      521141

We can infer from this dataset that the date, high and low values are not important features of the data,
as it does not matter what the highest or lowest trading price of the stock was on a particular day.
What matters is the opening price and the closing price of the stock. If at the end of the day the
closing price is higher than the opening price, we have made some profit; otherwise we have made a loss.
The volume of shares traded is also important, as a rising market should see rising volume: an
increasing price with decreasing volume shows a lack of interest and is a warning of a potential
reversal. A price drop (or rise) on large volume is a stronger signal that something in the stock has
fundamentally changed.

Therefore I removed the Date, High and Low features from the data set at the preprocessing step. The
mean, standard deviation, maximum and minimum of the preprocessed data were found to be as
follows:

Feature   Mean      Std       Max   Min
Open      0.3212    0.23261   1.0   0.0
Close     0.3215    0.2328    1.0   0.0
Volume    0.09061   0.0953    1.0   0.0
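
The maximum of 1.0 and minimum of 0.0 for every feature are what min-max scaling produces. As a rough sketch of the underlying formula (a plain-Python stand-in for illustration, not the Scikit-Learn helper used in the project), the snippet below scales the earliest row listed in the Data Preprocessing section using the per-feature minimum and maximum from the raw statistics table. The function and variable names are hypothetical.

```python
def min_max_scale(value, minimum, maximum):
    """Map value from [minimum, maximum] onto [0, 1]."""
    if maximum == minimum:
        raise ValueError("maximum and minimum must differ")
    return (value - minimum) / (maximum - minimum)

# Minimum and maximum per feature, as reported in the raw statistics table.
RANGES = {
    "Open": (87.74, 1005.49),
    "Close": (87.58, 1004.28),
    "Volume": (521141, 41182889),
}

# Earliest row of the dataset as listed in the Data Preprocessing section.
first_row = {"Open": 98.80, "Close": 101.46, "Volume": 15860692}

scaled = {name: min_max_scale(first_row[name], *RANGES[name]) for name in RANGES}
for name, value in scaled.items():
    print(f"{name}: {value:.6f}")
```

Rounded to six decimals, the results agree with the first row of the normalised table in the Data Preprocessing section (0.012051, 0.015141 and 0.377248).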

Exploratory Visualization
To visualize the data I used the Matplotlib [1] library. I plotted the closing stock price
against the item number (the number of trading days available).

The following is a snapshot of the plotted data:

[Figure: closing stock price of Alphabet Inc.]
X-axis: trading days
Y-axis: closing price in USD

[1] Matplotlib

The plot shows continuous growth in Alphabet Inc.'s stock price. The major fall in prices between
trading days 600 and 1000 might be because of the Global Financial Crisis of 2008-2009.

Algorithms and Techniques

The goal of this project was to study time-series data and explore as many options
as possible to accurately predict the stock price. Through my research I came to
know about Recurrent Neural Nets (RNNs) [2], which are used specifically for
sequence and pattern learning. They are networks with loops in them, which allow
information to persist and so give them the ability to retain what they have seen. However,
Recurrent Neural Nets suffer from the vanishing gradient problem, which prevents them
from learning from distant past data as expected. This problem is addressed by
Long Short-Term Memory networks [3], usually referred to as LSTMs.
These are a special kind of RNN, capable of learning long-term dependencies.
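
To make the idea of persisting information more concrete, here is a minimal NumPy sketch of a single LSTM time step in its common textbook form (input, forget and output gates plus a candidate cell state, without peephole connections). It is not the Keras implementation used in the project; the function name, the gate ordering inside the weight matrices and the toy dimensions are all assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """Run one LSTM time step.

    x_t: input vector (D,), h_prev/c_prev: previous hidden and cell state (H,),
    W: (4H, D) input weights, U: (4H, H) recurrent weights, b: (4H,) biases.
    """
    H = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    i = sigmoid(z[0:H])            # input gate: how much new information to write
    f = sigmoid(z[H:2 * H])        # forget gate: how much old cell state to keep
    o = sigmoid(z[2 * H:3 * H])    # output gate: how much of the cell to expose
    g = np.tanh(z[3 * H:4 * H])    # candidate values for the cell state
    c_t = f * c_prev + i * g       # additive update lets information persist
    h_t = o * np.tanh(c_t)
    return h_t, c_t

# Toy dimensions and random weights, for illustration only.
rng = np.random.default_rng(0)
D, H = 3, 4
W = rng.normal(size=(4 * H, D))
U = rng.normal(size=(4 * H, H))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x_t in rng.random((50, D)):    # a window of 50 time steps with 3 features
    h, c = lstm_step(x_t, h, c, W, U, b)
print(h.shape, c.shape)
```

The cell state is updated by addition rather than by repeated matrix multiplication, which is the usual explanation for why gradients in an LSTM vanish less readily than in a plain RNN.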

In addition to adjusting the architecture of the neural network, the following full
set of parameters can be tuned to optimize the prediction model:

• Input Parameters
    • Preprocessing and Normalization (see Data Preprocessing section)
• Neural Network Architecture
    • Number of Layers (how many layers of nodes in the model; used 3)
    • Number of Nodes (how many nodes per layer; tested 1, 3, 8, 16, 32, 64, 100, 128)
• Training Parameters
    • Training / Test Split (how much of the dataset to train versus test the model on; kept
      constant at 82.95% and 17.05% for benchmarks and LSTM model)
    • Validation Sets (kept constant at 0.05% of training sets)
    • Batch Size (how many time steps to include during a single training step; kept at
      1 for the basic LSTM model and at 512 for the improved LSTM model)
    • Optimizer Function (which function to optimize by minimizing error; used “Adam” throughout)
    • Epochs (how many times to run through the training process; kept at 1 for the base
      model and at 20 for the improved LSTM)

[2] Recurrent Neural Network
[3] Long-Short Term Memory

Benchmark Model
For this project I used a Linear Regression model as the primary benchmark, as one of my
goals is to understand the relative performance and implementation differences of machine learning
versus deep learning models. This linear regressor was based on the examples presented in
Udacity’s Machine Learning for Trading course and was used for error rate comparison (MSE and
RMSE), utilizing the same dataset as the deep learning models.
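
For readers unfamiliar with the benchmark, the following is a minimal sketch of a one-feature linear regression fitted by ordinary least squares. It is written in plain Python for illustration and is not the scikit-learn model used in the project; the function names are hypothetical, and the choice of the trading-day index as the single input feature is an assumption, since the report does not state which column was used.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept with one feature."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(xs, slope, intercept):
    """Evaluate the fitted line at each x."""
    return [slope * x + intercept for x in xs]

# Hypothetical data: x is the trading-day index, y a normalised closing price.
train_x = [0, 1, 2, 3, 4]
train_y = [0.010, 0.012, 0.011, 0.015, 0.017]
slope, intercept = fit_line(train_x, train_y)
print(predict([5, 6], slope, intercept))
```

A straight line can only follow the overall trend of the series, which is why it serves as a baseline here rather than as a serious forecasting model.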

The following are the predicted results that I got from my benchmark model:

[Figure: benchmark model predictions]
X-axis: trading days
Y-axis: closing price in USD
Green line: adjusted close price
Blue line: predicted close price

Train Score: 0.1852 MSE (0.4303 RMSE)
Test Score: 0.08133781 MSE (0.28519784 RMSE)

METHODOLOGY

Data Preprocessing
Acquiring and preprocessing the data for this project occurs in the following sequence, much of
which has been modularized into the preprocess.py file for importing and use across all
notebooks:

• Requested the data from the Google Finance Python API and saved it in the google.csv file in the
following format.

Date        Open     High     Low      Close    Volume
30-Jun-17   943.99   945.00   929.61   929.68   2287662
29-Jun-17   951.35   951.66   929.60   937.82   3206674
28-Jun-17   950.66   963.24   936.16   961.01   2745568

• Removed unimportant features (date, high and low) from the acquired data and reversed the order
of the data, so that it runs from January 3, 2005 to June 30, 2017.

Item   Open     Close    Volume
0      98.80    101.46   15860692
1      100.77   97.35    13762396
2      96.82    96.85    8239545
3      97.72    94.37    10389803

• Normalised the data using the MinMaxScaler helper function from Scikit-Learn.

Item   Open       Close      Volume
0      0.012051   0.015141   0.377248
1      0.014198   0.010658   0.325644
2      0.009894   0.010112   0.189820
3      0.010874   0.007407   0.242701

• Stored the normalised data in the google_preprocessed.csv file for future reusability.

• Split the dataset into training (68.53%) and test (31.47%) datasets for the linear regression
model. The split had the following shape:
x_train (2155, 1)
y_train (2155, 1)
x_test (990, 1)
y_test (990, 1)

• Split the dataset into training (82.95%) and test (17.05%) datasets for the LSTM model. The
split had the following shape:
x_train (2589, 50, 3)
y_train (2589,)
x_test (446, 50, 3)
y_test (446,)
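
The LSTM input shapes above, (samples, 50, 3), suggest that each sample is a window of 50 consecutive rows of the three features. The report does not show how the windows were built, so the following is only a sketch under stated assumptions: the function name is hypothetical, the target is assumed to be the Close value (column index 1) of the row immediately after each window, and random numbers stand in for the normalised data.

```python
import numpy as np

def make_windows(data, window=50, target_column=1):
    """Turn a (rows, features) array into LSTM inputs of shape (samples, window, features).

    Each sample holds `window` consecutive rows; its target is the value of
    `target_column` in the row immediately after the window.
    """
    data = np.asarray(data, dtype=float)
    if data.ndim != 2 or len(data) <= window:
        raise ValueError("data must be 2-D with more rows than the window length")
    samples = len(data) - window
    x = np.stack([data[i:i + window] for i in range(samples)])
    y = data[window:, target_column]
    return x, y

# Hypothetical stand-in for the normalised Open/Close/Volume columns.
rng = np.random.default_rng(0)
demo = rng.random((60, 3))
x, y = make_windows(demo)
print(x.shape, y.shape)
```

With 60 rows and a window of 50 this yields 10 samples; the same function applied to a longer series produces one sample per row that has 50 rows of history before it.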

Implementation

Once the data has been downloaded and preprocessed, the implementation process is the same
for all three models.

I have thoroughly specified all the steps to build, train and test each model and its predictions in the
notebook itself.

Some code implementation insight:

Benchmark model :

Step 1: Split into train and test sets:

Here I call a function defined in ‘stock_data.py’ which splits the data for the linear
regression model.

[Screenshot: data-splitting function from ‘stock_data.py’]
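
The original listing from ‘stock_data.py’ is not available in this copy of the report. As a rough sketch of what a time-ordered split might look like (the function name and signature are hypothetical, not the author's code), the data can be cut at a single index so that the test set always lies after the training set in time:

```python
def train_test_split_chronological(rows, train_fraction):
    """Split rows in time order: the first part trains, the remainder tests."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    split_index = int(len(rows) * train_fraction)
    return rows[:split_index], rows[split_index:]

# 3145 rows is the total implied by the reported shapes (2155 + 990).
rows = list(range(3145))
train, test = train_test_split_chronological(rows, 0.6853)
print(len(train), len(test))
```

With 3145 rows and a training fraction of 68.53%, this gives 2155 training rows and 990 test rows, matching the shapes reported in the Data Preprocessing section. Shuffling is deliberately avoided, since mixing future rows into the training set would leak information in a time series.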


Step 2: In this step the model is built using the scikit-learn linear_model [6] library.

Here I call a function defined in ‘LinearRegressionModel.py’ which builds the
model for the project.

[Screenshot: model-building function from ‘LinearRegressionModel.py’]

Step 3: Now it’s time to predict the prices for the given test datasets.

The prediction function is defined in ‘LinearRegressionModel.py’.

[Screenshot: prediction function from ‘LinearRegressionModel.py’]

Step 4: Finally, calculate the test score and plot the results of the benchmark model.

Improved LSTM model:

Step 1: Split the data into training and test sets.

Note: The improved LSTM uses the same training and testing data as the basic LSTM.

Step 2: Build the improved LSTM model.

Here I call a function, defined in ‘lstm.py’, that builds the improved LSTM model for the
project.

Fig: Screenshot of the model-building function in ‘lstm.py’

NOTE: The function uses the Keras Long-Short Term Memory (LSTM) implementation to build
the model.

For the improved LSTM model I increased the batch_size to 512.

In the function I also increased the number of nodes in the hidden layer from 100 to 128
and added a dropout of 0.2 to all the layers.

Step 3: Train the model.

I used a built-in library function to train the model.

Step 4: Predict the prices for the given test dataset.

I used a built-in function to predict the outcomes of the model.

Step 5: Finally, calculate the test score and plot the results of the improved LSTM model.
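
Steps 2 to 4 above could look roughly like the sketch below. The values 128 units, dropout 0.2, batch size 512, 20 epochs and verbose = 2 come from this report. Everything else is an assumption made for illustration: the two stacked LSTM layers, the Adam optimizer, the single linear output and the function names are not confirmed by the text and may differ from ‘lstm.py’.

```python
import numpy as np


def build_improved_lstm(window, n_features, units=128, dropout=0.2):
    """Stacked LSTM with dropout and one linear output (assumed layout)."""
    # Imported here so the rest of the file loads without TensorFlow.
    from tensorflow import keras

    model = keras.Sequential([
        keras.Input(shape=(window, n_features)),
        keras.layers.LSTM(units, return_sequences=True),
        keras.layers.Dropout(dropout),
        keras.layers.LSTM(units),
        keras.layers.Dropout(dropout),
        keras.layers.Dense(1, activation="linear"),
    ])
    # Mean squared error matches the metric reported in this project;
    # the optimizer is an assumption.
    model.compile(loss="mean_squared_error", optimizer="adam")
    return model


def train_and_predict(x_train, y_train, x_test, epochs=20, batch_size=512):
    """Fit the model, then predict with the same batch size."""
    x_train = np.asarray(x_train, dtype="float32")
    model = build_improved_lstm(x_train.shape[1], x_train.shape[2])
    model.fit(x_train, y_train, epochs=epochs, batch_size=batch_size, verbose=2)
    return model.predict(np.asarray(x_test, dtype="float32"),
                         batch_size=batch_size)
```

With arrays shaped like those from the LSTM split, for example x_train (2589, 50, 3), a call such as `train_and_predict(x_train, y_train, x_test)` would return predictions of shape (446, 1) for a test set of 446 windows.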

Refinement
For this project I worked on fine-tuning the parameters of the LSTM to get better predictions. I
tested and analysed each parameter in turn and then selected its final value.

To improve the LSTM I did the following:

● Increased the number of hidden nodes from 100 to 128
● Added a Dropout of 0.2 at each layer of the LSTM
● Increased the batch size from 1 to 512
● Increased the epochs from 1 to 20
● Added verbose = 2
● Made predictions with the same batch size

This improved my mean squared error on the test set from 0.01153170 MSE to 0.00093063 MSE.

The difference between the predicted plots can be seen in the following figures:

Fig: Plot of Adjusted Close and Predicted Close Prices for basic LSTM model

Fig: Plot of Adjusted Close and Predicted Close Prices for improved LSTM model

RESULT

Model Evaluation and Validation

With each model I refined and fine-tuned my predictions and reduced the mean squared error
significantly.

● For my first model, using the linear regression model:
○ Train Score: 0.1852 MSE (0.4303 RMSE)
○ Test Score: 0.08133781 MSE (0.28519784 RMSE)

Fig: Plot of Linear Regression Model

● For my second model, using the basic Long-Short Term Memory model:
○ Train Score: 0.00089497 MSE (0.02991610 RMSE)
○ Test Score: 0.01153170 MSE (0.10738577 RMSE)

Fig: Plot of basic Long-Short Term Memory model

● For my third and final model, using the improved Long-Short Term Memory model:
○ Train Score: 0.00032478 MSE (0.01802172 RMSE)
○ Test Score: 0.00093063 MSE (0.03050625 RMSE)

Fig: Plot of Improved Long-Short Term Memory Model

Robustness Check:

To check the robustness of my final model I used unseen data, i.e., data of Alphabet Inc.
from July 1, 2017 to July 20, 2017. Predicting the values for this unseen data gave a decent
result. The results are as follows:

Test Score: 0.3897 MSE (0.6242 RMSE)
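
A score line like the one above can be produced from the actual and predicted values with a small helper. This is a minimal sketch: the function name `score` and the sample numbers are made up for illustration and are not values from the project.

```python
import numpy as np


def score(y_true, y_pred):
    """Return (MSE, RMSE) for two equally sized sequences of prices."""
    # ravel() guards against mixing shapes (n,) and (n, 1), which would
    # otherwise broadcast into an (n, n) matrix of differences.
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same length")
    mse = float(np.mean((y_true - y_pred) ** 2))
    return mse, float(np.sqrt(mse))


# Made-up normalised prices, not values from the project.
actual = [0.50, 0.52, 0.51, 0.55]
predicted = [[0.49], [0.53], [0.53], [0.54]]  # column shape, as a model might return
mse, rmse = score(actual, predicted)
print("Test Score: %.4f MSE (%.4f RMSE)" % (mse, rmse))
```

Flattening both inputs matters in practice because Keras predictions usually come back with shape (n, 1) while the labels have shape (n,).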

Justification
Comparing the benchmark model (Linear Regression) to the final improved LSTM model, the
Mean Squared Error falls from 0.08133781 MSE (0.28519784 RMSE) [Linear Regression Model]
to 0.00093063 MSE (0.03050625 RMSE) [Improved LSTM]. This significant decrease in error
clearly shows that my final model has surpassed both the basic and benchmark models.
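
The size of this improvement can be checked directly from the reported test scores. The short sketch below uses only the MSE figures given in this report, recomputes each RMSE as the square root of the MSE, and expresses each model's MSE as a multiple of the final model's MSE.

```python
import math

# Test-set MSE values as reported in this report.
reported_test_mse = {
    "Linear Regression": 0.08133781,
    "Basic LSTM": 0.01153170,
    "Improved LSTM": 0.00093063,
}

final_mse = reported_test_mse["Improved LSTM"]
for name, mse in reported_test_mse.items():
    rmse = math.sqrt(mse)
    factor = mse / final_mse  # how many times larger than the final model's MSE
    print("%-18s MSE %.8f  RMSE %.8f  x%.1f" % (name, mse, rmse, factor))
```

On these figures the benchmark's test MSE is roughly 87 times that of the improved LSTM, and the basic LSTM's is roughly 12 times.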

Also the Average Delta Price between actual and predicted Adjusted Closing Price values was:

Delta Price: 0.000931 - RMSE * Adjusted Close Range

Which is less than one cent :)


CONCLUSION

Free-Form Visualization
I have already discussed the important features of the dataset and their visualisation in the
sections above. To conclude my report I chose the visualisation of my final model, the improved
version of the LSTM with fine-tuned parameters. I was very impressed to see how close I had
got to the actual data, with a mean squared error of just 0.0009. It was an ‘Aha!’ moment for
me, as I had to poke around a lot (really A LOT!! :P). But it was fun working on this project.

Fig: Plot of Improved Long-Short Term Memory Model

Reflection

To recap, the process undertaken in this project:

● Set Up Infrastructure
○ iPython Notebook
○ Incorporate required libraries (Keras, TensorFlow, Pandas, Matplotlib, Sklearn, NumPy)
○ Git project organization
● Prepare Dataset
○ Incorporate data of Alphabet Inc.
○ Process the requested data into a Pandas DataFrame
○ Develop function for normalizing data
○ Dataset used with an 80/20 split on training and test data across all models
● Develop Benchmark Model
○ Set up basic Linear Regression model with Scikit-Learn
○ Calibrate parameters
● Develop Basic LSTM Model
○ Set up basic LSTM model with Keras utilizing parameters from Benchmark Model
● Improve LSTM Model
○ Develop, document, and compare results using additional labels for the LSTM model
● Document and Visualize Results
○ Plot Actual, Benchmark Predicted Values, and LSTM Predicted Values per time series
○ Analyze and describe results for report

I started this project hoping to learn a completely new algorithm, i.e., Long-Short Term
Memory, and to explore a real time series dataset. The final model really exceeded my
expectations and has worked remarkably well. I am greatly satisfied with these results.

The major problem I faced during the implementation of the project was exploring the data. It
was the toughest task: converting the data from its raw format to preprocessed data and then
splitting it into training and test data. All of these steps require a great deal of patience and a
very precise approach. I also had to work around a lot to successfully use the data for two
models, i.e., Linear Regression and Long-Short Term Memory, as they have different input
sizes. I read many research papers to get this final model right and I think it was all worth it :)

Improvement
Before starting my journey as a Machine Learning Nanodegree graduate I had no prior experience
in Python. At the beginning of this course I had to google how to do everything in Python. But
now I have not only made 7 projects in Python, I have also explored many libraries along the way
and can use them very comfortably. This is all because of the highly interactive videos and forum
provided by Udacity. I am really happy and satisfied with having taken up this course.

And just as there is scope for improvement in each individual, so is the case with this project.
Although this project predicts closing prices with a very low Mean Squared Error, there are still
many things lacking in it. Two of the most important are:

● There is no user interaction or interface provided in this project. A UI could be provided
where the user can check the value for future dates.
● The stocks used for this project are only those of Alphabet Inc.; we can surely add more
S&P 500 stocks to the list to make this project more comprehensive.

I would definitely like to add these improvements to this project in the future.