Ai ML Lab Project Template Final
PROJECT REPORT on
ARTIFICIAL INTELLIGENCE & MACHINE LEARNING
Lab(18CS62)
VI SEMESTER
2021-2022
Submitted by
CERTIFICATE
Certified that the Lab Project report work titled “Stock movement prediction using
Machine Learning” has been carried out by Khetan Rishabh Purushotam (1RV9CS0710) and
Mohamed Moin Irfan (1RV19CS089), bona fide students of RV College of Engineering,
Bengaluru, and submitted in partial fulfillment for the Assessment of Course: Artificial
Intelligence & Machine Learning (18CS62) – Lab Component during the year 2021-2022.
It is certified that all corrections/suggestions indicated for the internal assessment
have been incorporated in the report.
RV COLLEGE OF ENGINEERING, BENGALURU ® - 560059
(Autonomous Institution Affiliated to VTU)
DECLARATION
Science and Engineering, R.V. College of Engineering, Bengaluru hereby declare that
the Lab project titled “Stock movement prediction using Machine Learning” has
been carried out by us and submitted in partial fulfillment for the Assessment of
Date:
ACKNOWLEDGEMENT
Any achievement, be it scholastic or otherwise, does not depend solely on individual efforts but
also on the guidance, encouragement and cooperation of intellectuals, elders and friends. A number
of people, in their own capacities, have helped me in carrying out this project work. I would like
to take this opportunity to thank them all.
I would like to thank Dr. Ramakanth Kumar P, Head of Department, Computer Science &
Engineering, R.V.C.E, Bengaluru, for his valuable suggestions and expert advice.
First and foremost I would like to thank Dr. Subramanya K N, Principal, R.V.C.E, Bengaluru, for
his moral support towards completing my project work.
I thank my Parents, and all the Faculty members of the Department of Computer Science &
Engineering for their constant support and encouragement.
Last, but not the least, I would like to thank my peers and friends who provided me with valuable
suggestions to improve my project.
ABSTRACT
The stock market is a dynamic and volatile platform which provides an environment and
opportunity for the traders to invest and trade in stocks of particular companies. The price of a
stock is dependent on numerous static and dynamic features. Predicting the trend in future price
movement of a particular company’s stock can be extremely beneficial for investors and traders.
In this project, we predict the direction in which a stock will move by studying its
previous trends and training a Random Forest classifier, with an emphasis on limiting false
positives. We observed 71% accuracy in our predictions using this model. The
dataset that has been used for this is the
Table of Contents
Abstract
List of tables
Table 1 - Related works
List of figures
1. Introduction
1.1 Project Domain and Problem addressing
1.2 Issues and Challenges
1.3 Need for AI-based solutions
1.4 Problem Statement
1.5 Project objectives
1.6 Summary
2. Literature Study
3. Design Details
3.1 Architecture
3.2 Methodology
3.3 Data set details
3.4 ML/DL techniques used
3.5 Hardware and Software requirements
Appendices
Appendix A: Screenshots
Appendix B: Printout of the base paper used for implementation of this project
1) Introduction
With the adoption of new technology in the stock market, the system has become
increasingly complex and volatile, which in turn has made human predictions highly
inaccurate. Using Machine Learning to find patterns in the system from historical data
can help us predict future prices more accurately.
We want to maximize our true positives - days when the algorithm predicts that the price
will go up, and it actually goes up. Therefore, we'll be using precision as the evaluation
metric for our algorithm, which is true positives / (false positives + true positives). This will
ensure that we minimize how much money we lose with false positives (days when we buy
the stock, but the price actually goes down). This means that we will have to accept a lot of
false negatives - days when we predict that the price will go down, but it actually goes up.
This is okay, since we'd rather minimize our potential losses than maximize our potential
gains.
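The precision formula above can be illustrated with a small sketch. The labels below are made-up values for six trading days (1 = price went up, 0 = price went down), and the helper name `precision` is only an illustration, not the project's code.

```python
def precision(y_true, y_pred):
    """Precision = TP / (TP + FP), where 1 means 'price goes up'."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == 1 and t == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == 1 and t == 0)
    # If the model never predicts "up", no trade is made; return 0.0 by convention.
    return tp / (tp + fp) if (tp + fp) else 0.0


# Hypothetical labels for six trading days (1 = up, 0 = down).
actual = [1, 0, 1, 1, 0, 1]
predicted = [1, 1, 0, 1, 0, 0]

score = precision(actual, predicted)
print(f"precision = {score:.3f}")
```

Here the model buys on three days and is right on two of them, so precision is 2/3. The two missed "up" days are false negatives and do not affect precision.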
1.6 Summary
It is now clear that AI and ML can be highly significant for prediction tasks, and that
using these systems in financial markets can be a major advantage if applied correctly and
carefully. By training on historical data, AI systems can learn the complex mathematical
relationships on the basis of which stocks move and use them to predict how a stock is
likely to move next.
2) Literature Survey
2.1 Introduction:
Work on using artificial intelligence, and especially machine learning, to predict the
prices of equities and commodities has been going on for a long time. With technological
developments in the field of machine learning, it has become clearer that historical patterns
can be used in multiple ways to predict future prices of any type of equity or commodity. As a
result, people have started creating more novel models to predict price movements more
accurately. Since these markets are a huge arena for making financial profits, large financial
institutions have conducted even more research in this field to gain an economic advantage over
their competitors, which has accelerated work on such models.
ISSN: 2319-7064
Sl no: 6
Title and year of paper: Vivek Kanade, Bhausaheb Devikar, Sayali Phadatare, Pranali Munde,
“Stock market prediction: Using historic data analysis”. International journal of advanced
research in computer science and software engineering, volume 7, issue 1, 2017.
ISSN: 2277 128X. DOI: 10.23956/ijarcsse/V711/0112.
Key findings: SVM, ANN; SVM (Support Vector Machine)
Drawbacks: Only sentiment data from various news and Twitter resources are used; no historical
data are considered for predictions.

Sl no: 9
Title and year of paper: Stock Market Prediction based on Social Sentiments using Machine
Learning by Tejas Mankar, Tushar Hotchandani, Manish Madhwani, Akshay Chidrawar, Lifna C.
Key findings: Sentimental analysis in the market
Drawbacks: Large amount of storage required for analyzing such sentiments in real time.

Sl no: 10
Title and year of paper: Stock Market Prediction Using Machine Learning Techniques by Mehak
Usmani, Syed Hasan Adil, Kamran Raza and Syed Saad Azhar Ali
Key findings: The old statistical techniques including Simple Moving Average (SMA) and
Autoregressive Integrated Moving Average (ARIMA) are also used as input. The machine learning
techniques including Single Layer Perceptron (SLP), Multi-Layer Perceptron (MLP), Radial Basis
Function (RBF) and Support Vector Machine (SVM) are compared.
Drawbacks: The Multi-Layer Perceptron algorithm of machine learning predicted 57% correct
market performance.
3.1 Architecture
Data is imported into the system from a CSV file using Python. Once the data is obtained, it
has to be cleaned or preprocessed before it is fed to the model, to remove any unwanted spikes
which could distort the results of the prediction. Once the data has been processed, we
calculate the indicators on the basis of which we design our system and train our model. We then
split the data into training and test sets, train the model on the former and test it on the
latter. After testing, we can plot the predicted information for better understanding of the
results.
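The flow described above can be sketched as follows. This is a minimal illustration under several assumptions: the CSV contents, the column names (`date`, `close`, `volume`), the "next day's close is higher" label and the unshuffled 75/25 split are all hypothetical stand-ins, not the project's actual data or settings.

```python
import io

import pandas as pd
from sklearn.model_selection import train_test_split

# Stand-in for the project's CSV file (hypothetical contents).
csv_text = """date,close,volume
2021-01-04,100.0,1200
2021-01-05,101.5,1500
2021-01-06,,1100
2021-01-07,99.0,1700
2021-01-08,102.0,1600
2021-01-11,103.5,1400
"""

raw = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

# Cleaning: drop rows with missing values and keep rows in date order.
clean = raw.dropna().sort_values("date").reset_index(drop=True)

# Assumed label: 1 if the next day's close is higher, else 0.
# The last row has no "next day", so it is dropped.
clean["up_next_day"] = (clean["close"].shift(-1) > clean["close"]).astype(int)
clean = clean.iloc[:-1]

X = clean[["close", "volume"]]
y = clean["up_next_day"]

# shuffle=False keeps the test rows later in time than the training rows.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, shuffle=False
)
print(len(X_train), "training rows,", len(X_test), "test rows")
```

Keeping the split unshuffled is one possible design choice for time-ordered data, because it avoids testing on days that precede the training days.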
3.2 Methodology
Random Forest is a popular supervised machine learning algorithm. It can be used for both
classification and regression problems. It is based on the concept of ensemble learning, which
combines multiple classifiers to solve a complex problem and to improve the performance of the
model. As the name suggests,
"Random Forest is a classifier that contains a number of decision trees on various subsets of the
given dataset and takes the average to improve the predictive accuracy of that dataset." Instead
of relying on one decision tree, the random forest takes the prediction from each tree and
predicts the final output based on the majority vote of those predictions. A greater number of
trees generally makes the predictions more stable and reduces the risk of overfitting compared
with a single decision tree, although the gain levels off as more trees are added.
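The voting idea can be sketched with scikit-learn. The data here is synthetic (generated with `make_classification`), and the parameter values are illustrative assumptions rather than the project's settings. Note that scikit-learn's `RandomForestClassifier` averages the class probabilities of its trees rather than counting hard votes, so its output may occasionally differ from a strict majority vote.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption; the report's stock dataset is not used here).
X, y = make_classification(n_samples=500, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

# Each fitted decision tree is available in forest.estimators_.
# Collect every tree's prediction for every test sample: shape (n_trees, n_samples).
tree_votes = np.array([tree.predict(X_test) for tree in forest.estimators_])

# Fraction of trees that voted for class 1 on each test sample.
share_voting_up = tree_votes.mean(axis=0)

# The forest's own combined prediction for the same samples.
forest_pred = forest.predict(X_test)

print("share of trees voting 1 (first 5 samples):", share_voting_up[:5])
print("forest prediction (first 5 samples):      ", forest_pred[:5])
```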
● Root Node: Represents the entire population or sample and this further gets divided into
two or more homogeneous sets. Our starting point.
● Splitting: The process of dividing a node into two or more sub-nodes, for example we
split on gender.
● Decision Node: When a sub-node splits into further sub-nodes, then it is called decision
node.
● Leaf/Terminal Node: Nodes that do not split are called Leaf or Terminal nodes.
● Pruning: When we remove sub-nodes of a decision node, this process is called pruning.
It is the opposite of splitting.
● Branch/Subtree: A subsection of the entire tree is called branch or sub-tree.
● Parent and Child Node: A node, which is divided into sub-nodes is called parent node
of sub-nodes whereas sub-nodes are the child of parent node.
In machine learning, we have two categories of learning: supervised learning and unsupervised
learning. With unsupervised learning, we don't supervise the model and instead allow it to
discover information on its own. We do this by providing an "UNLABELED" data set that
doesn't tell the model what category or value is the "correct" answer. With supervised learning,
we provide the model with a "LABELED" data set which tells the model what the "correct"
value should be. Random Forest is an example of a supervised learning algorithm because we
provide the model with a labeled data set.
1. Instability: Even small changes to the input data can have dramatic changes to the
overall structure of the decision tree.
2. They are often relatively inaccurate. Many other predictors perform better with similar
data.
3. For data including categorical variables with different numbers of levels, information
gain in decision trees is biased in favor of those attributes with more levels.
4. Calculations can get very complex, particularly if many values are uncertain and/or if
many outcomes are linked.
These are some of the reasons it is preferable to use Random Forest, which helps overcome
some of these weaknesses of Decision Trees.
RAM 4 GB or higher
Software requirements define what software is being used, including the operating system and
any databases. The project's software requirements are given in table x.x.
This project is built in Jupyter Notebook using Python. Dataset collection (imported from a CSV
file), agents, training and testing of models, and the results that the prediction produces are
all implemented using Python libraries such as pandas, NumPy and scikit-learn.
4) Implementation details of the Project
Once we've sorted the data, we need to calculate the change in price from one period to the next.
To do this, we select the close column and call its diff() method, which calculates the
difference between each row and the row before it.
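As a minimal sketch of this step, assuming a single ticker and made-up closing prices (the column name `change_in_price` follows the text; the values are hypothetical):

```python
import pandas as pd

# Hypothetical closing prices for a single ticker, already sorted by date.
prices = pd.DataFrame({"close": [100.0, 102.5, 101.0, 104.0]})

# diff() subtracts the previous row from the current row. The first row has no
# predecessor, so its result is NaN.
prices["change_in_price"] = prices["close"].diff()
print(prices)
```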
Step 1: Identify the rows where the ticker symbol changes. If we use the shift() method to shift
every row down by one, the rows where the unshifted column does not equal the shifted column
are where the ticker changed. We store this comparison in a variable called mask.
Step 2: Change those rows to NaN values. We can use the numpy.where() function to test our
series: wherever the mask variable equals True, in other words wherever the ticker symbol is
different, set the change_in_price column to np.nan.
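The two steps above can be sketched on a small two-ticker frame. The ticker names and prices are made up, and the column name `symbol` is an assumption; `close`, `change_in_price` and `mask` follow the text.

```python
import numpy as np
import pandas as pd

# Hypothetical two-ticker frame, sorted by symbol and then by date.
price_data = pd.DataFrame({
    "symbol": ["AAA", "AAA", "AAA", "BBB", "BBB"],
    "close": [10.0, 11.0, 10.5, 50.0, 52.0],
})

# A plain diff() would subtract the last AAA price from the first BBB price.
price_data["change_in_price"] = price_data["close"].diff()

# Step 1: True wherever the symbol differs from the row above it.
# The very first row is also True because it has no row above.
mask = price_data["symbol"] != price_data["symbol"].shift(1)

# Step 2: replace the change on those rows with NaN, keep it everywhere else.
price_data["change_in_price"] = np.where(
    mask, np.nan, price_data["change_in_price"]
)
print(price_data)
```

Without the mask, the first BBB row would carry a meaningless change of 39.5 computed against the last AAA price.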
1) Relative Strength Index (RSI) : RSI is a popular momentum indicator that determines
whether the stock is overbought or oversold. A stock is said to be overbought when the
demand unjustifiably pushes the price upwards. This condition is generally interpreted as
a sign that the stock is overvalued, and the price is likely to go down. A stock is said to
be oversold when the price goes down sharply to a level below its true value. This is
usually the result of panic selling. RSI ranges from 0 to 100, and generally, when RSI is
above 70, it may indicate that the stock is overbought and when RSI is below 30, it may
indicate the stock is oversold.
2) Stochastic Oscillator : Stochastic Oscillator follows the speed or the momentum of the
price. As a rule, momentum changes before the price changes. It measures the level of
the closing price relative to the low-high range over a period of time.
3) Williams %R : Williams %R ranges from -100 to 0. When its value is above -20, it
indicates a sell signal and when its value is below -80, it indicates a buy signal.
5) Price Rate Of Change : It measures the most recent change in price with respect to the
price n days ago.
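The indicators listed above can be sketched with pandas as follows. This is an illustration under stated assumptions: the 14-period window, the column names (`rsi`, `k_percent`, `r_percent`, `price_rate_of_change`) and the random-walk sample prices are all hypothetical, and the RSI here uses simple rolling averages of gains and losses, which is one common variant and not necessarily the one used in the project.

```python
import numpy as np
import pandas as pd


def add_indicators(df, n=14):
    """Append RSI, stochastic %K, Williams %R and price rate of change columns."""
    out = df.copy()
    delta = out["close"].diff()

    # RSI: compares average gains with average losses over n periods.
    avg_gain = delta.clip(lower=0).rolling(n).mean()
    avg_loss = (-delta.clip(upper=0)).rolling(n).mean()
    rs = avg_gain / avg_loss
    out["rsi"] = 100 - 100 / (1 + rs)

    # Lowest low and highest high over the last n periods.
    low_n = out["low"].rolling(n).min()
    high_n = out["high"].rolling(n).max()

    # Stochastic oscillator (%K): close relative to the low-high range, 0 to 100.
    out["k_percent"] = 100 * (out["close"] - low_n) / (high_n - low_n)

    # Williams %R: the same range measured from the top, -100 to 0.
    out["r_percent"] = -100 * (high_n - out["close"]) / (high_n - low_n)

    # Price rate of change: relative change versus the close n periods ago.
    out["price_rate_of_change"] = out["close"] / out["close"].shift(n) - 1
    return out


# Hypothetical random-walk prices, used only to exercise the function.
rng = np.random.default_rng(0)
close = 100 + rng.normal(0, 1, 60).cumsum()
sample = pd.DataFrame({"close": close, "high": close + 1.0, "low": close - 1.0})

features = add_indicators(sample, n=14).dropna()
print(features.tail())
```

The first n rows have no full window and are dropped with `dropna()` before the features would be passed to a model.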
To get a more detailed overview of how the model performed, we can build a classification
report that computes the F1 score, the precision, the recall, and the support. The main
evaluation metrics are defined below.
1) Accuracy:
Accuracy measures the portion of all testing samples classified correctly.
2) Recall
Recall (also known as sensitivity) measures the ability of a classifier to correctly identify
positive labels. The recall is intuitively the ability of the classifier to find all the positive
samples. The best value is 1, and the worst value is 0.
3) Specificity
Specificity measures the classifier’s ability to correctly identify negative labels.
4) Precision
Precision measures the proportion of samples classified as positive that are actually
positive. The precision is intuitively the ability of the classifier not to label as positive
a sample that is negative. The best value is 1, and the worst value is 0.
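The four metrics above can be computed from a confusion matrix, as in the sketch below. The labels are made-up values for ten days (1 = up, 0 = down), and the class names "Down Day" and "Up Day" are assumptions used only for the printed report.

```python
from sklearn.metrics import classification_report, confusion_matrix

# Hypothetical true and predicted labels for ten trading days.
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 0, 0, 1, 0]

# With labels=[0, 1], ravel() returns the counts in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)  # all correct / all samples
recall = tp / (tp + fn)                     # found positives / actual positives
specificity = tn / (tn + fp)                # found negatives / actual negatives
precision = tp / (tp + fp)                  # correct positives / predicted positives

print(f"accuracy={accuracy:.2f} recall={recall:.2f} "
      f"specificity={specificity:.2f} precision={precision:.2f}")

# The classification report gives precision, recall, F1 score and support per class.
print(classification_report(y_true, y_pred, target_names=["Down Day", "Up Day"]))
```

For these labels there are 3 true positives, 4 true negatives, 1 false positive and 2 false negatives, giving accuracy 0.70, recall 0.60, specificity 0.80 and precision 0.75.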
6.4. Summary
The authors are of the opinion that applying machine learning techniques to stock price
forecasting needs to be a well-thought-out process and demands painstakingly detailed
execution. The proposed approach departs from this class of problems by reformulating a
traditional forecasting model as a classification problem. Moreover, knowledge discovered from
the analysis could open up new applications, such as a trading strategy based on the strength of
the classification accuracy, or an investigation of the behavior of certain classes of stocks.