ML Project documentation-LSTM-LR
TEAM MEMBERS
Serial Number Name Roll Number
1. Srinjoy Ganguly 16500321017
2. Sarbojit Ghosh 16500121061
3. Pragyaparomita Kar 16500121034
4. Sneha Mukherjee 16500121056
INDEX
Serial No. Sub-Topics
1. DECLARATION
2. CERTIFICATE
4. DATASET DESCRIPTION
5. METHODOLOGY
6. IMPLEMENTATION
7. RESULTS
9. DEPLOYMENT
10. CONCLUSION
11. REFERENCES
Declaration
This is to declare that the project report entitled submitted by Srinjoy Ganguly, Sarbojit
Ghosh, Pragyaparomita Kar, and Sneha Mukherjee to the Calcutta Institute of Engineering
and Management, Kolkata in partial fulfillment for the award of the degree of B. Tech in
Computer Science & Engineering is a bona fide record of project work carried out by them
under my supervision. The contents of this report, in full or in parts, have not been
submitted to any other Institution or University for the award of any degree or diploma.
Mr. Suman Halder
2. Sarbojit Ghosh
3. Pragyaparomita Kar
4. Sneha Mukherjee
Certificate
This is to certify that the project report entitled submitted by Srinjoy Ganguly,
Sarbojit Ghosh, Pragyaparomita Kar, Sneha Mukherjee to the Calcutta Institute
of Engineering and Management, Kolkata in partial fulfilment for the award of
the degree of B. Tech in Computer Science & Engineering is a bona fide record
of project work carried out by them under my supervision. The contents of this
report, in full or in parts, have not been submitted to any other institution or
University for the award of any degree or diploma.
PROJECT OVERVIEW
● For students learning about finance, tapping into the potential of machine
learning for stock market forecasting is not just a theoretical exercise but a valuable
skill with real-world impact. This project examines stock market prediction by
comparing different machine learning models, offering guidance for students entering
the field with advanced ML techniques.
● This project aims to explain the advantages of using predictive analytics in financial
decision-making and how these skills can give a competitive advantage in the world
of investments. For students, there are multiple benefits. First, learning about stock
market prediction allows them to apply the theoretical knowledge they have gained in
finance and machine learning courses in a practical way. This improves
understanding and helps them see how these subjects come together to produce
valuable insights in real-world situations.
● Additionally, students can gain a better understanding of how the stock market
operates by investing small amounts and earning returns without taking on too much
risk. Understanding bull markets can aid in recognizing growth trends, whereas
knowledge of bear markets can help in managing downturns effectively.
● Furthermore, we investigate the diverse array of ML models employed for stock
market prediction, offering a comprehensive comparison of their common features,
strengths, and weaknesses. This project explores a range of approaches, from simple
linear regression to advanced neural networks, to help identify the best model for
different market conditions.
● By comparing machine learning models across various papers, the goal is to improve
understanding of predictive algorithms and promote critical thinking in assessing
their effectiveness. By analyzing the performance of the models under various
conditions, this paper serves as a guide for those looking to use machine learning in
stock market prediction.
● It bridges the gap between theoretical knowledge and practical application,
empowering aspiring financial analysts with the skills they need to interpret market
patterns and confidently make informed financial decisions.
● Considerable work has already been done on stock market prediction. After
analysing papers available on the internet, we selected several stock market
prediction models, along with two papers that review the stock market prediction
models currently in use.
MOTIVATION: The motivation behind using machine learning (ML) models for stock
market prediction stems from several factors:
1. Data Complexity and Volume: Stock market data is complex and constantly
evolving, with multiple factors influencing stock prices, such as macroeconomic
indicators, corporate earnings, geopolitical events, and market sentiment. Traditional
methods struggle to incorporate all these variables in a comprehensive way, whereas
ML models can handle large datasets, identifying complex patterns and relationships
that may not be obvious through manual analysis.
4. Real-Time Analysis: Machine learning allows for real-time analysis of stock data.
With ML, it's possible to constantly adjust predictions and trading strategies based on
new incoming data (such as stock prices, volume, news, and sentiment). This is
particularly important in the fast-paced, ever-changing environment of the stock
market.
5. Automation and Speed: Machine learning models can automate the prediction and
trading processes. This increases efficiency and speed, enabling algorithms to react to
market changes much faster than human traders. This can be crucial in high-
frequency trading scenarios where even microseconds can make a difference.
6. Risk Management: By learning from historical data, machine learning models can
assess potential risks and forecast volatility, enabling investors to make informed
decisions that help to mitigate losses. Models can be trained to consider a range of
risk factors, leading to better decision-making in portfolio management.
8. Sentiment Analysis: With the rise of social media and online news, sentiment
analysis has become a critical component in stock market prediction. ML models can
process and analyze unstructured data such as news headlines, tweets, or other textual
data to gauge market sentiment, which can be a strong predictor of stock price
movements.
9. Adaptability: Stock market conditions change over time due to evolving economic
conditions, technological advances, and other factors. ML models can be retrained
with new data to adapt to changing market environments, ensuring that predictions
remain relevant.
Overall, machine learning models bring significant advantages in terms of predictive power,
efficiency, and the ability to process vast amounts of data, which makes them highly
valuable in the context of stock market prediction and trading.
Summary: Stock market prediction is a key task for financial analysts and
investors, as it supports better decisions and improved investment strategies. To this
end, this project compares how machine learning (ML) models perform in predicting stock
market prices with relatively small datasets, in order to identify the best-performing one.
This study evaluates the efficiency and accuracy of artificial neural networks, support vector
regression, LSTM and decision trees in capturing market trends and forecasting stock
movements. The importance of this work lies in its potential to demystify the
complexities of financial markets for students and new entrants in the field of finance.
By identifying the best-performing ML models on limited datasets, which are common in
real-life situations, readers can gain practical insight into how the stock market operates
and how theoretical ML concepts apply to financial analytics.
This paper aims to analyse the machine learning models commonly used in
stock market prediction and choose the most effective one based on the performance
achieved with limited information (short-term data). This would
provide an edge in taking small (less risky), informed financial decisions without extensive
study of the stock market.
DATASET DESCRIPTION
Tesla dataset:
● Consists of 7 columns and 2193 rows.
● Covers the period from 29-06-2010 to 15-03-2019.
● Open column: the price at which the stock started trading when the market
opened on a particular day.
● Close column: the price of the stock when the exchange closed the market for
the day, that is, the price of the last buy-sell order executed between two
traders that day.
● High column: the highest price at which the stock traded during the period.
● Low column: the lowest price at which the stock traded during the period.
● Volume column: the total number of shares traded during the period.
● Adjusted Close column (Adj.): the closing price after adjustment for factors
outside normal trading, such as stock splits and dividends, that alter the quoted
price. It is more involved to compute than the raw closing price but gives a more
accurate picture of the stock's true value.
● Snippet from the dataset:
Google dataset:
A. Train Dataset:
● Consists of 6 columns and 1257 rows.
● Covers the period from 03-01-2012 to 30-12-2016.
● Open column: the price at which the stock started trading when the market
opened on a particular day.
● Close column: the price of the stock when the exchange closed the market for
the day, that is, the price of the last buy-sell order executed between two
traders that day.
● High column: the highest price at which the stock traded during the period.
● Low column: the lowest price at which the stock traded during the period.
● Volume column: the total number of shares traded during the period.
● Snippet from the dataset:
B. Test Dataset:
● Consists of 6 columns and 251 rows.
● Covers the period from 13-08-2018 to 13-08-2019.
● Open column: the price at which the stock started trading when the market
opened on a particular day.
● Close column: the price of the stock when the exchange closed the market for
the day, that is, the price of the last buy-sell order executed between two
traders that day.
● High column: the highest price at which the stock traded during the period.
● Low column: the lowest price at which the stock traded during the period.
● Volume column: the total number of shares traded during the period.
● Snippet from the dataset:
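Separately from the dataset snippets, the facts quoted in this section (row count, column count, date range) could be checked programmatically. The following is a minimal sketch, assuming the files are CSVs with a `Date` column and the column names shown; the three sample rows are invented placeholders and are not taken from the Tesla or Google data. The helper names `load_prices` and `describe` are hypothetical.

```python
import io
import pandas as pd

# Hypothetical sample in the 7-column layout described above. Column names
# and every value here are illustrative assumptions, not real market data.
SAMPLE_CSV = """Date,Open,High,Low,Close,Adj Close,Volume
2020-01-03,11.0,12.5,10.5,12.0,12.0,1200
2020-01-02,10.0,11.5,9.5,11.0,11.0,1000
2020-01-06,12.0,13.0,11.5,12.5,12.5,900
"""

def load_prices(source):
    """Read a daily price CSV, parse the Date column, and sort by date."""
    df = pd.read_csv(source, parse_dates=["Date"])
    return df.sort_values("Date").reset_index(drop=True)

def describe(df):
    """Summarise the facts a dataset description would quote."""
    return {
        "rows": df.shape[0],
        "columns": df.shape[1],
        "start": df["Date"].min().strftime("%d-%m-%Y"),
        "end": df["Date"].max().strftime("%d-%m-%Y"),
        "missing": int(df.isna().sum().sum()),
    }

df = load_prices(io.StringIO(SAMPLE_CSV))
summary = describe(df)
print(summary)
```

In practice `load_prices` would be given the path of the actual CSV file instead of the in-memory sample.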
METHODOLOGY
Machine learning is a branch of artificial intelligence that enables systems to learn from
data and enhance their performance without the need for explicit programming. It works by
using algorithms to recognize patterns and connections in data in order to make predictions
or decisions. There are three main types of machine learning techniques: supervised
learning, unsupervised learning, and reinforcement learning.
Supervised learning is when the algorithm learns from labelled data, meaning
each input is matched with a specific output. This type of learning is central to predicting
stock market trends, as it allows models to analyse historical data with known results and
identify patterns and connections between market indicators and upcoming price
changes. By examining factors such as previous stock prices, trading volumes, economic
indicators, and sentiment analysis, supervised machine learning algorithms can be
trained to forecast future stock prices. Supervised machine learning (ML) is valuable in
stock market prediction because a trained model can generalize to unseen data and
forecast upcoming stock prices. Through training on labelled data, it can identify
intricate connections and trends within the market. This
enables investors and financial experts to discover potential market patterns and make
well-informed investment choices. This capability matters in the fast-changing and
unpredictable setting of financial markets, where the accuracy of a prediction can
translate into substantial financial profits or losses.
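To make the idea of labelled data concrete for a price series, one common framing is a sliding window: the previous few closing prices form the input and the next closing price is the label. The sketch below assumes this framing; the window length, the split fraction and the function names `make_supervised` and `chronological_split` are illustrative choices, not settings taken from the project.

```python
import numpy as np

def make_supervised(prices, window):
    """Turn a 1-D price series into (inputs, targets) pairs.

    Each input row holds `window` consecutive prices and the target is the
    price that immediately follows them.
    """
    prices = np.asarray(prices, dtype=float)
    if window < 1 or window >= len(prices):
        raise ValueError("window must be between 1 and len(prices) - 1")
    X = np.array([prices[i:i + window] for i in range(len(prices) - window)])
    y = prices[window:]
    return X, y

def chronological_split(X, y, train_fraction=0.8):
    """Split without shuffling so the test samples lie after the training ones."""
    cut = int(len(X) * train_fraction)
    return X[:cut], y[:cut], X[cut:], y[cut:]

# Illustrative closing prices (made up for the example).
closes = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0]
X, y = make_supervised(closes, window=3)
X_train, y_train, X_test, y_test = chronological_split(X, y, train_fraction=0.75)
```

The split is chronological rather than random because shuffling a time series would let the model train on days that come after the ones it is tested on.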
EXPLANATION OF MODELS:
Long Short-Term Memory (LSTM): Long Short-Term Memory
(LSTM) is a special type of recurrent neural network (RNN) that was created to solve the
vanishing gradient problem, a challenge that commonly occurs with traditional
RNNs.
LSTMs have a distinctive structure that consists of memory cells and gates that
manage the flow of information. These memory cells enable LSTMs to store data over
extended sequences and selectively retain or discard information based on its importance.
The benefits of using LSTMs include their capability to capture long-term
relationships in sequential data, like time series or natural language, and their efficiency in
handling the issue of vanishing gradients during the training process.
LSTMs can also handle input sequences of different lengths.
However, drawbacks include the need for complex hyperparameter tuning, higher
computational cost, and a risk of overfitting if the model is not properly regularized or is
trained on too little data.
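The gate mechanism described above can be illustrated with a single forward step of a standard LSTM cell written in NumPy. This is a sketch of the textbook formulation only, not the project's model: the weight layout (forget, input, candidate and output blocks stacked in that order) and the names `lstm_step` and `run_sequence` are assumptions made for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One forward step of a standard LSTM cell.

    W: (4*H, D) input weights, U: (4*H, H) recurrent weights, b: (4*H,) bias,
    with the forget, input, candidate and output blocks stacked in that order.
    """
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    f = sigmoid(z[0:H])            # forget gate: how much old memory to keep
    i = sigmoid(z[H:2 * H])        # input gate: how much new information to write
    g = np.tanh(z[2 * H:3 * H])    # candidate values for the memory cell
    o = sigmoid(z[3 * H:4 * H])    # output gate: how much memory to expose
    c = f * c_prev + i * g         # updated memory cell
    h = o * np.tanh(c)             # updated hidden state
    return h, c

def run_sequence(xs, W, U, b, hidden_size):
    """Feed a sequence through the cell, starting from zero state."""
    h = np.zeros(hidden_size)
    c = np.zeros(hidden_size)
    for x in xs:
        h, c = lstm_step(x, h, c, W, U, b)
    return h, c

# Small random weights, purely for illustration (D = 1 feature, H = 4 units).
rng = np.random.default_rng(0)
D, H = 1, 4
W = rng.normal(scale=0.1, size=(4 * H, D))
U = rng.normal(scale=0.1, size=(4 * H, H))
b = np.zeros(4 * H)
h_last, c_last = run_sequence(np.array([[0.1], [0.2], [0.3]]), W, U, b, H)
```

The additive update `c = f * c_prev + i * g` is the part that lets information, and gradients during training, pass across many time steps, which is how the design mitigates the vanishing gradient problem.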
IMPLEMENTATION
7. Building and training the Linear Regression model:
8. Plotting the predicted vs. actual values from the model:
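As a minimal sketch of what step 7 might look like, the example below fits an ordinary least squares model with NumPy's least-squares solver. This is a stand-in chosen to keep the example self-contained and is not necessarily the library the project used; the data is synthetic, and `fit_linear_regression` and `predict` are hypothetical helper names.

```python
import numpy as np

def fit_linear_regression(X, y):
    """Ordinary least squares with an intercept term."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(X)), X])   # prepend an intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef                                 # coef[0] = intercept, coef[1:] = slopes

def predict(coef, X):
    """Apply the fitted coefficients to new feature rows."""
    X = np.asarray(X, dtype=float)
    return coef[0] + X @ coef[1:]

# Synthetic data following y = 2x + 1 exactly, used only to show the mechanics.
X_train = [[1.0], [2.0], [3.0], [4.0]]
y_train = [3.0, 5.0, 7.0, 9.0]
coef = fit_linear_regression(X_train, y_train)
y_pred = predict(coef, [[5.0], [6.0]])
```

For step 8, arrays such as `y_pred` and the matching actual values would be drawn on the same axes against the date so that any divergence is visible.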
In summary, the study demonstrates that while Linear Regression provides a simple and
interpretable approach to stock market prediction, LSTM models offer a robust alternative
for capturing the intricate patterns and dependencies in financial time-series data. These
findings are promising for the development of advanced predictive systems in the financial
domain.
❖ Challenges:
1. Data Quality and Preprocessing
○ Stock market datasets often contain missing values, outliers, or inconsistent
formatting that can negatively impact model performance.
○ Aligning datasets from multiple sources, such as Tesla and Google, can
introduce compatibility issues.
○ Feature selection for both Linear Regression and LSTM required careful
consideration to ensure meaningful predictors were chosen.
2. Temporal Dependencies
○ Linear Regression struggles with temporal dependencies since it assumes that
observations are independent.
○ Capturing long-term dependencies in stock prices using LSTM required
extensive parameter tuning to avoid overfitting.
3. High Volatility and Noise
○ Stock market data is inherently noisy and volatile, making it difficult to
separate signal from noise.
○ Sudden market events or external factors not present in the dataset could
degrade prediction accuracy.
4. Model Optimization
○ Ensuring Linear Regression and LSTM models were properly optimized
required significant computational resources.
○ Identifying the optimal architecture for LSTM (e.g., number of layers, hidden
units, and activation functions) was a time-consuming process.
5. Evaluation Metrics
○ Choosing the right evaluation metrics was crucial, as standard metrics like
Mean Squared Error (MSE) might not capture all aspects of predictive
performance in a financial context.
6. Interpretability vs. Complexity
○ Linear Regression is easy to interpret but may not capture complex patterns.
○ LSTM, while powerful, acts as a "black box," making it difficult to explain
predictions to stakeholders.
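On the evaluation-metrics point, one way to look beyond MSE is to report error magnitudes alongside a measure of whether the predicted direction of the move was right. The sketch below is illustrative only: the directional-accuracy definition (comparing each predicted move and each actual move against the previous actual value) is one assumed convention among several, and the numbers are made up.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE and MAE for two equal-length sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    mse = float(np.mean(err ** 2))
    return {
        "mse": mse,
        "rmse": float(np.sqrt(mse)),
        "mae": float(np.mean(np.abs(err))),
    }

def directional_accuracy(y_true, y_pred):
    """Fraction of steps where the predicted move has the same sign as the
    actual move, both measured from the previous actual value."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    actual_move = np.sign(y_true[1:] - y_true[:-1])
    predicted_move = np.sign(y_pred[1:] - y_true[:-1])
    return float(np.mean(actual_move == predicted_move))

# Made-up values for illustration.
actual = [10.0, 12.0, 11.0, 13.0]
predicted = [10.0, 11.0, 12.0, 12.0]
metrics = regression_metrics(actual, predicted)
hit_rate = directional_accuracy(actual, predicted)
```

A model can have a low MSE and still call the direction wrong on many days, which is why the two kinds of measure are worth reading together in a financial setting.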
❖ Solutions:
1. Data Cleaning and Feature Engineering
○ Applied techniques like interpolation to handle missing values and Z-score
normalization to address outliers.
○ Unified Tesla and Google datasets by standardizing formats and timestamps.
○ Used domain knowledge to select key features, such as historical prices,
volume, and technical indicators.
2. Temporal Data Handling
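The cleaning steps named under item 1 of the solutions (interpolation for missing values, Z-scores for outliers) could be sketched as follows. How the project actually treated outliers is not stated, so clipping values beyond a Z-score threshold is an assumption here, as are the threshold and the function names.

```python
import numpy as np

def interpolate_missing(values):
    """Fill NaN entries by linear interpolation between known neighbours."""
    values = np.asarray(values, dtype=float).copy()
    idx = np.arange(len(values))
    known = ~np.isnan(values)
    values[~known] = np.interp(idx[~known], idx[known], values[known])
    return values

def zscores(values):
    """Standardise to zero mean and unit (population) standard deviation."""
    values = np.asarray(values, dtype=float)
    return (values - values.mean()) / values.std()

def clip_outliers(values, threshold=3.0):
    """Pull points whose |z-score| exceeds `threshold` back to the boundary.

    Assumed treatment for illustration; dropping or flagging such points
    would be equally plausible.
    """
    values = np.asarray(values, dtype=float)
    mean, std = values.mean(), values.std()
    return np.clip(values, mean - threshold * std, mean + threshold * std)

# Made-up series with gaps, for illustration only.
raw = [10.0, np.nan, 12.0, 13.0, np.nan, np.nan, 16.0]
filled = interpolate_missing(raw)
z = zscores(filled)
clipped = clip_outliers(filled, threshold=1.0)
```

The mean and standard deviation used for scaling would normally be computed on the training portion only and then reused on the test portion, so that no information from the test period leaks into training.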
DEPLOYMENT
CONCLUSION
Linear Regression served as a baseline model, providing a simple yet interpretable approach
to forecasting trends. However, its limitations in capturing complex temporal patterns and
non-linear relationships inherent in stock market data were evident.
In contrast, the LSTM model, with its ability to learn sequential dependencies and temporal
patterns, demonstrated superior performance. Its recurrent structure allowed it
to capture intricate price fluctuations, making it more effective for stock price prediction.
Despite this, the model's accuracy was still constrained by the inherent unpredictability of
stock markets, driven by external factors such as macroeconomic events, investor
sentiment, and market anomalies.
Through rigorous evaluation, we identified the strengths and limitations of each approach,
providing insights into their applicability for stock market prediction. While LSTMs show
promise for more accurate forecasting, future work could enhance these models further by
incorporating additional features such as trading volume, sentiment analysis from news and
social media, and macroeconomic indicators.
In conclusion, this project highlights the potential of machine learning models in financial
forecasting while emphasizing the importance of continuous innovation and refinement to
navigate the complexities of stock market prediction. These findings underscore the need
for hybrid approaches that combine traditional financial theories with advanced machine
learning techniques to achieve robust and reliable predictions.
REFERENCES