
Stock Price Trend Forecasting using Supervised Learning Methods

Sharvil Katariya1 Saurabh Jain2

*This work was supported by International Institute of Information Technology.
1 Sharvil Katariya is a student in Computer Science at IIIT Hyderabad, India.
2 Saurabh Jain is a student in Computer Science at IIIT Hyderabad, India.
Abstract— The aim of this project is to examine a number of different forecasting techniques for predicting future stock returns based on past returns and numerical news indicators, and to construct a portfolio of multiple stocks in order to diversify the risk. We do this by applying supervised learning methods to stock price forecasting, interpreting the seemingly chaotic market data.
I. INTRODUCTION

The fluctuation of the stock market is violent, and there are many complicated financial indicators. However, advances in technology provide an opportunity to gain a steady fortune from the stock market and can also help experts find the most informative indicators for making better predictions. Predicting the market value is of paramount importance in maximizing the profit of stock option purchases while keeping the risk low.

The next section of the paper is the methodology, where we explain each process in detail. After that, we give pictorial representations of the analysis we have made and reason about the results achieved. Finally, we define the scope of the project and discuss how the work can be extended to achieve better results.
II. METHODOLOGY
This section gives a detailed description of each process involved in the project. Each subsection is mapped to one of the stages of the project.

A. Data Pre-Processing
The pre-processing stage involves:
• Data discretization: part of data reduction, of particular importance for numerical data.
• Data transformation: normalization.
• Data cleaning: filling in missing values.
• Data integration: integration of data files.

After the data set is transformed into a clean data set, it is divided into training and testing sets for evaluation. Here, the training values are taken as the more recent values, and the testing data is kept as 5-10 percent of the total dataset.
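To make this stage concrete, here is a minimal sketch, assuming a pandas DataFrame with a "date" column and numeric price columns; the forward/backward fill and min-max normalization are illustrative choices, as the paper does not specify them.

    import pandas as pd

    def preprocess_and_split(df: pd.DataFrame, test_frac: float = 0.10):
        """Clean, normalize, and chronologically split a stock data set."""
        df = df.sort_values("date").reset_index(drop=True)

        # Data cleaning: fill in missing values (forward fill, then back fill).
        df = df.ffill().bfill()

        # Data transformation: min-max normalization of the numeric columns.
        num = df.select_dtypes("number").columns
        df[num] = (df[num] - df[num].min()) / (df[num].max() - df[num].min())

        # Split: per the paper, training uses the more recent values, so the
        # earliest test_frac (5-10 percent) of rows is held out for testing.
        cut = int(len(df) * test_frac)
        return df.iloc[cut:], df.iloc[:cut]   # (train, test)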
B. Feature Selection and Feature Generation

We created new features from the base features that provide better insight into the data, such as the 50-day moving average and the previous-day difference. To prune out less useful features, in feature selection we select the features with the k highest scores, using a linear model to test the effect of a single regressor, applied sequentially to many regressors. We used the SelectKBest algorithm with f_regression as the scoring function. Furthermore, we added Twitter's daily sentiment score as a feature for each company, based on users' tweets about that particular company as well as the tweets on that company's page.
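A minimal sketch of this step is given below; the "close" column name and k = 10 are assumptions for illustration. SelectKBest and f_regression are the scikit-learn utilities named above.

    import pandas as pd
    from sklearn.feature_selection import SelectKBest, f_regression

    def build_and_select_features(df: pd.DataFrame, k: int = 10):
        """Generate derived features, then keep the k highest-scoring ones."""
        # Feature generation: 50-day moving average and previous-day difference.
        df = df.copy()
        df["ma_50"] = df["close"].rolling(window=50).mean()
        df["prev_diff"] = df["close"].diff()
        df = df.dropna()

        # Score each candidate feature against the target with a univariate
        # linear model (f_regression) and keep the k best.
        X = df.drop(columns=["close"]).select_dtypes("number")
        y = df["close"]
        selector = SelectKBest(score_func=f_regression, k=k)
        X_selected = selector.fit_transform(X, y)
        return X_selected, selector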
III. ANALYSIS

To analyze the efficiency of the system, we used the Root Mean Square Error (RMSE) and the r^2 score.

A. Root Mean Squared Error (RMSE)

RMSE is the square root of the mean of the squared errors. Its use is very common, and it makes an excellent general-purpose error metric for numerical predictions. Compared to the similar Mean Absolute Error, RMSE amplifies and severely punishes large errors.

Fig. 1. RMSE Value calculation
Fig. 2. RMSE Value calculation
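The figures showing the RMSE calculation are not reproduced here; as a stand-in, the standard computation is:

    import numpy as np

    def rmse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
        """Root Mean Squared Error: sqrt of the mean of squared errors."""
        return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))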
B. R-Squared Value (r^2 value)

The value of R2 typically ranges between 0 and 1, and the higher its value, the more accurate the regression model, as more variability is explained by the linear regression model. (On held-out test data, R2 can also be negative when a model fits worse than a constant predictor, as seen for the KNeighbours Regressor in Table I.) The R2 value indicates the proportionate amount of variation in the response variable that is explained by the independent variables. R-squared is a statistical measure of how close the data are to the fitted regression line. It is also known as the coefficient of determination or, for multiple regression, the coefficient of multiple determination.
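A minimal sketch of the computation, using the standard definition R^2 = 1 - SS_res / SS_tot (equivalent to scikit-learn's r2_score):

    import numpy as np

    def r_squared(y_true: np.ndarray, y_pred: np.ndarray) -> float:
        """Coefficient of determination: 1 - SS_res / SS_tot."""
        ss_res = np.sum((y_true - y_pred) ** 2)           # residual sum of squares
        ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)  # total sum of squares
        return float(1.0 - ss_res / ss_tot)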
TABLE I
CLASSIFIER EVALUATION

Algorithm                     RMSE Value      R-squared Value
Random Forest Regressor       1.4325434e-07   0.956669
Bagging Regressor             1.329966e-07    0.959771
Adaboost Regressor            2.9882972e-07   0.909611
KNeighbours Regressor         0.00039015      -117.01176
Gradient Boosting Regressor   1.274547e-07    0.961448

IV. GRAPHS

Fig. 3. Comparison Graphs RMSE Value - Different Models
Fig. 4. Comparison Graphs R-squared Value - Different Models

V. RESULTS

Based on the results obtained, we find that the Gradient Boosting Regressor consistently performs best, followed by the Bagging Regressor, Random Forest Regressor, Adaboost Regressor, and KNeighbours Regressor.

The Bagging Regressor performs well because bagging (bootstrap sampling) relies on the fact that combining many independent base learners significantly decreases the error; we therefore want to produce as many independent base learners as possible, each generated by sampling the original data set with replacement. From the results, it is safe to say that additional hidden layer(s) improve upon the score of the models.

Random Forest is an extension of bagging; the major difference is the incorporation of randomized feature selection.
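To make the evaluation concrete, the sketch below trains two of the ensembles from Table I and reports both metrics; the synthetic data and default hyperparameters are stand-ins for the project's actual features and settings.

    import numpy as np
    from sklearn.ensemble import BaggingRegressor, GradientBoostingRegressor
    from sklearn.metrics import mean_squared_error, r2_score

    # Synthetic stand-in data; in the project these come from the
    # preprocessing and feature-selection steps described above.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 5))
    y = X @ rng.normal(size=5) + 0.01 * rng.normal(size=500)
    X_train, X_test, y_train, y_test = X[:450], X[450:], y[:450], y[450:]

    for name, model in [
        ("Bagging Regressor", BaggingRegressor(n_estimators=50, random_state=0)),
        ("Gradient Boosting Regressor", GradientBoostingRegressor(random_state=0)),
    ]:
        model.fit(X_train, y_train)
        y_pred = model.predict(X_test)
        print(f"{name}: RMSE={np.sqrt(mean_squared_error(y_test, y_pred)):.4g}, "
              f"R2={r2_score(y_test, y_pred):.4f}")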
ACKNOWLEDGMENT

We would like to thank Soham Saha for mentoring our project, introducing us to new state-of-the-art technologies, and helping us at every stage of this project. We would also like to thank Dr. Bapi Raju, our course instructor for Statistical Methods in AI, for clearing up the basic concepts required for the project.
