
Stock Price Trend Forecasting using Supervised Learning Methods

Sharvil Katariya1 Saurabh Jain2

*This work was supported by International Institute of Information Technology.
1 Sharvil Katariya is a student in Computer Science at IIIT Hyderabad, India.
2 Saurabh Jain is a student in Computer Science at IIIT Hyderabad, India.
Abstract— The aim of this project is to examine a number of different forecasting techniques for predicting future stock returns based on past returns and numerical news indicators, and to construct a portfolio of multiple stocks in order to diversify the risk. We do this by applying supervised learning methods to stock price forecasting, interpreting the seemingly chaotic market data.
I. INTRODUCTION

The fluctuation of the stock market is violent, and there are many complicated financial indicators. However, advances in technology provide an opportunity to gain a steady fortune from the stock market and can also help experts find the most informative indicators for making better predictions. Predicting the market value is of paramount importance in maximizing the profit of stock option purchases while keeping the risk low.

The next section of the paper is the methodology, where we explain each process in detail. After that, we give pictorial representations of the analysis we have made and reason about the results achieved. Finally, we define the scope of the project and discuss how the work can be extended to achieve better results.
II. METHODOLOGY
This section gives a detailed description of each process involved in the project. Each subsection is mapped to one of the stages of the project.

A. Data Pre-Processing
The pre-processing stage involves:
• Data discretization: part of data reduction, of particular importance for numerical data.
• Data transformation: normalization.
• Data cleaning: filling in missing values.
• Data integration: integration of data files.

After the data set is transformed into a clean data set, it is divided into training and testing sets for evaluation. Here, the training values are taken as the more recent values, and the testing data is kept as 5-10 percent of the total dataset.
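To make this stage concrete, here is a minimal sketch, assuming a pandas DataFrame with a "date" column and numeric price columns; the forward/backward fill and min-max normalization are illustrative choices, as the paper does not specify them.

    import pandas as pd

    def preprocess_and_split(df: pd.DataFrame, test_frac: float = 0.10):
        """Clean, normalize, and chronologically split a stock data set."""
        df = df.sort_values("date").reset_index(drop=True)

        # Data cleaning: fill in missing values (forward fill, then back fill).
        df = df.ffill().bfill()

        # Data transformation: min-max normalization of the numeric columns.
        num = df.select_dtypes("number").columns
        df[num] = (df[num] - df[num].min()) / (df[num].max() - df[num].min())

        # Split: per the paper, training uses the more recent values, so the
        # earliest test_frac (5-10 percent) of rows is held out for testing.
        cut = int(len(df) * test_frac)
        return df.iloc[cut:], df.iloc[:cut]   # (train, test)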
B. Feature Selection and Feature Generation

We created new features from the base features that provide better insight into the data, such as the 50-day moving average and the previous-day difference. To prune out less useful features, in feature selection we select the features with the k highest scores, using a linear model to test the effect of a single regressor, applied sequentially to many regressors. We used the SelectKBest algorithm with f_regression as the scoring function. Furthermore, we added Twitter's daily sentiment score as a feature for each company, based on users' tweets about that particular company as well as the tweets on that company's page.
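A minimal sketch of this step is given below; the "close" column name and k = 10 are assumptions for illustration. SelectKBest and f_regression are the scikit-learn utilities named above.

    import pandas as pd
    from sklearn.feature_selection import SelectKBest, f_regression

    def build_and_select_features(df: pd.DataFrame, k: int = 10):
        """Generate derived features, then keep the k highest-scoring ones."""
        # Feature generation: 50-day moving average and previous-day difference.
        df = df.copy()
        df["ma_50"] = df["close"].rolling(window=50).mean()
        df["prev_diff"] = df["close"].diff()
        df = df.dropna()

        # Score each candidate feature against the target with a univariate
        # linear model (f_regression) and keep the k best.
        X = df.drop(columns=["close"]).select_dtypes("number")
        y = df["close"]
        selector = SelectKBest(score_func=f_regression, k=k)
        X_selected = selector.fit_transform(X, y)
        return X_selected, selector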
III. ANALYSIS

To analyze the efficiency of the system, we used the Root Mean Square Error (RMSE) and the r^2 score.

A. Root Mean Squared Error (RMSE)

RMSE is the square root of the mean of the squared errors. Its use is very common, and it makes an excellent general-purpose error metric for numerical predictions. Compared to the similar Mean Absolute Error, RMSE amplifies and severely punishes large errors.

Fig. 1. RMSE Value calculation
Fig. 2. RMSE Value calculation
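The figures showing the RMSE calculation are not reproduced here; as a stand-in, the standard computation is:

    import numpy as np

    def rmse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
        """Root Mean Squared Error: sqrt of the mean of squared errors."""
        return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))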
B. R-Squared Value (r^2 value)

The value of R2 typically ranges between 0 and 1, and the higher its value, the more accurate the regression model, as more variability is explained by the linear regression model. (On held-out test data, R2 can also be negative when a model fits worse than a constant predictor, as seen for the KNeighbours Regressor in Table I.) The R2 value indicates the proportionate amount of variation in the response variable that is explained by the independent variables. R-squared is a statistical measure of how close the data are to the fitted regression line. It is also known as the coefficient of determination or, for multiple regression, the coefficient of multiple determination.
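A minimal sketch of the computation, using the standard definition R^2 = 1 - SS_res / SS_tot (equivalent to scikit-learn's r2_score):

    import numpy as np

    def r_squared(y_true: np.ndarray, y_pred: np.ndarray) -> float:
        """Coefficient of determination: 1 - SS_res / SS_tot."""
        ss_res = np.sum((y_true - y_pred) ** 2)           # residual sum of squares
        ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)  # total sum of squares
        return float(1.0 - ss_res / ss_tot)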
TABLE I
CLASSIFIER EVALUATION

Algorithm                     RMSE Value      R-squared Value
Random Forest Regressor       1.4325434e-07   0.956669
Bagging Regressor             1.329966e-07    0.959771
Adaboost Regressor            2.9882972e-07   0.909611
KNeighbours Regressor         0.00039015      -117.01176
Gradient Boosting Regressor   1.274547e-07    0.961448

IV. GRAPHS

Fig. 3. Comparison Graphs RMSE Value - Different Models
Fig. 4. Comparison Graphs R-squared Value - Different Models

V. RESULTS

Based on the results obtained, we find that the Gradient Boosting Regressor consistently performs best, followed by the Bagging Regressor, Random Forest Regressor, Adaboost Regressor, and KNeighbours Regressor.

The Bagging Regressor performs well because bagging (bootstrap sampling) relies on the fact that combining many independent base learners significantly decreases the error; we therefore want to produce as many independent base learners as possible, each generated by sampling the original data set with replacement. From the results, it is safe to say that additional hidden layer(s) improve upon the score of the models.

Random Forest is an extension of bagging; the major difference is the incorporation of randomized feature selection.
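To make the evaluation concrete, the sketch below trains two of the ensembles from Table I and reports both metrics; the synthetic data and default hyperparameters are stand-ins for the project's actual features and settings.

    import numpy as np
    from sklearn.ensemble import BaggingRegressor, GradientBoostingRegressor
    from sklearn.metrics import mean_squared_error, r2_score

    # Synthetic stand-in data; in the project these come from the
    # preprocessing and feature-selection steps described above.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 5))
    y = X @ rng.normal(size=5) + 0.01 * rng.normal(size=500)
    X_train, X_test, y_train, y_test = X[:450], X[450:], y[:450], y[450:]

    for name, model in [
        ("Bagging Regressor", BaggingRegressor(n_estimators=50, random_state=0)),
        ("Gradient Boosting Regressor", GradientBoostingRegressor(random_state=0)),
    ]:
        model.fit(X_train, y_train)
        y_pred = model.predict(X_test)
        print(f"{name}: RMSE={np.sqrt(mean_squared_error(y_test, y_pred)):.4g}, "
              f"R2={r2_score(y_test, y_pred):.4f}")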
ACKNOWLEDGMENT

We would like to thank Soham Saha for mentoring our project, introducing us to new state-of-the-art technologies, and helping us at every stage of this project. We would also like to thank Dr. Bapi Raju, our course instructor for Statistical Methods in AI, for clearing up the basic concepts required for the project.
