Stock Price Prediction using Machine Learning

This document discusses the use of machine learning techniques, such as Linear Regression, Support Vector Machine, and Decision Tree, for predicting stock prices on financial exchanges like NSE and BSE. It outlines the methodologies for data preprocessing, feature selection, and the organization of the project, which includes a literature survey of ten research papers related to stock market prediction. The objective is to enhance prediction accuracy by utilizing historical data and various machine learning algorithms.

Abstract

In the world of finance, stock trading is one of the most important activities. Stock market
prediction is the act of trying to determine the near-future value of a stock or other financial
instrument traded on a financial exchange such as the NSE or BSE. Stock brokers commonly use
fundamental and technical analysis when predicting stocks. In this report we propose a Machine
Learning (ML) approach, in which a model is trained on the available stock data, gains
intelligence, and then uses the acquired knowledge to make a prediction. We used techniques
such as Linear Regression, Support Vector Machine and Decision Tree to predict stock prices for
small and large capitalizations and in different markets, using prices at daily and minute
frequencies. Linear Regression is used when the data is approximately linear, that is, when the
points lie close to a line that can be fitted. In Support Vector Machine, when the data is
scattered, the line that passes closest to most of the points is drawn, and the vectors from the
points to that line are considered. In Decision Tree, decisions are made on the basis of previous
data: the effects of all the alternatives are checked and the most suitable one is chosen.

Chapter-1

INTRODUCTION

This chapter introduces the project. It gives a basic idea of the aim and problem statement of
our project. We have also mentioned what we are trying to accomplish in this project. The
technologies and platforms used in the project are also listed.

1.1 Overview
Stock market prediction is used to predict the future values of a company's stock or other
financial instruments traded on financial exchanges. However, the stock market is influenced by
many factors, such as political events, economic conditions and traders' expectations. Stock
market changes appear random, yet they also show specific patterns.

1.2 Problem Statement


The aim of the project is to calculate or predict the future stock prices of companies using a
number of machine learning and forecasting techniques based on historical returns as well as
numerical news indicators, in order to build a portfolio of multiple stocks and so diversify the
risk. We do this by applying supervised learning methods to stock price forecasting, based on an
understanding of the dataset.

1.3 Objective
a. Taking a dataset of a renowned company
b. Feature extraction using fundamental analysis
c. Applying the algorithms to the reduced dataset
d. Evaluating accuracy
e. Plotting and analyzing the graph

1.4 Methodologies

1.4.1. Data Pre-Processing

The pre-processing stage formally involves:

a. Data discretization: Reducing the data while retaining what is important to us, in accordance
with the algorithms.
b. Data transformation: Normalizing the data.
c. Data cleaning: Removing all the unnecessary elements from the dataset and keeping only the
ones we actually need.
d. Data integration: Integrating the data files.
After the dataset has been transformed into a clean dataset, training and testing datasets are
taken from it for further work and evaluation. Here, the training values are taken as the more
recent values because we focus on training the algorithms and the model first. We have kept the
testing dataset at five to ten percent of the size of the training dataset.
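
The cleaning, normalization and splitting steps above can be sketched in plain Python. This is only an illustrative sketch and not the project's code: the function names, the choice of min-max scaling for normalization, the sample rows and the assumption that rows are ordered oldest-first are all assumptions made for the example.

```python
def drop_incomplete_rows(rows):
    """Data cleaning: keep only rows in which no value is missing (None)."""
    return [row for row in rows if all(value is not None for value in row)]


def min_max_normalize(values):
    """Data transformation: rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant series: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def split_train_test(values, test_ratio=0.1):
    """Split an oldest-first series; the test size is test_ratio times the training size.

    Following the text, the most recent values form the training set, so the
    held-out test values are the oldest ones (an assumption of this sketch).
    """
    n_test = round(len(values) * test_ratio / (1 + test_ratio))
    return values[n_test:], values[:n_test]


# Hypothetical (day, closing price) rows; the last row has a missing price.
rows = [(day, 100.0 + day) for day in range(11)] + [(11, None)]
clean = drop_incomplete_rows(rows)
closes = min_max_normalize([price for _, price in clean])
train, test = split_train_test(closes, test_ratio=0.1)
print(len(train), len(test))
```

With a ratio of 0.1, eleven clean rows give ten training values and one test value, which matches the "five to ten percent of the training dataset" rule described above.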

1.4.2. Feature Selection and Feature Generation

We created new features that provide better insight into the data, such as the mean, standard
deviation, difference in prices, mean errors, squared errors, R2 score, etc. We select features
as required by SVM regression, Linear Regression and Decision Tree with the help of a linear
model, testing the effect of a single regressor or of many regressors sequentially. We used the
Support Vector Regression, Linear Regression and Decision Tree algorithms in our project.
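
A minimal sketch of how such features and scores can be computed is given below, using only the Python standard library. The function names and the sample closing prices are hypothetical; the formulas are the usual definitions of mean squared error and the R2 score, and the population standard deviation is an assumed choice.

```python
from statistics import mean, pstdev


def price_features(closes):
    """Summary features of a price series: mean, standard deviation, daily differences."""
    diffs = [b - a for a, b in zip(closes, closes[1:])]
    return {"mean": mean(closes), "std": pstdev(closes), "diffs": diffs}


def mean_squared_error(y_true, y_pred):
    """Average of the squared prediction errors."""
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical closing prices and predictions, for illustration only.
closes = [100.0, 102.0, 101.0, 105.0]
features = price_features(closes)
predicted = [100.5, 101.5, 101.0, 104.0]
print(features["mean"], features["diffs"])
print(mean_squared_error(closes, predicted), r2_score(closes, predicted))
```

An R2 score of 1 means the predictions match the actual prices exactly, while a score of 0 means the model does no better than always predicting the mean price.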

1.5 Organization
1st Chapter: It gives a briefing of the project: a basic idea of the aim of our project and the
problem statement. We have also mentioned what we are trying to accomplish in this project. In
addition, all the technologies and platforms used in the project have been listed.
2nd Chapter: The second chapter contains the literature survey. We have written about all the
research papers that we studied while understanding and developing the project. A variety of
papers and publications from well-known sources on machine learning and neural networks are
covered in this chapter.

3rd Chapter: System development is covered in this chapter. We have described how the model and
our project evolved over time. We have drawn several flow charts for model extraction and system
extraction for a better understanding of the project.
4th Chapter: The fourth chapter includes all the algorithms and mathematical formulas used in our
project, and the different steps used in applying the algorithms. Inputs and their outcomes are
covered and then analyzed.
5th Chapter: The fifth chapter discusses the conclusions drawn from the outputs and the basis on
which they were reached, and describes the future scope of the project.

Chapter-2

LITERATURE SURVEY

This section covers ten research papers and journal articles that we have studied from various
reputed sources.

2.1 Summary of Papers

2.1.1

Title: A prediction approach for stock market volatility based on time series data [1]
Authors: Mohmadd Khan, M. Asraaafi Alam, Parul Goel
Year of Publication: 25 January 2019
Publishing Details: IEEE Access
Summary: This paper examines why time series analysis and forecasting is relevant in the context
of the Indian economy. The major fall of the currency in the recent past made this need
important. The paper aims not only to build an efficient model but also to predict Indian stock
market volatility. The available time series data of the Indian stock market was used for this
study. The predicted time series was compared with the actual time series, showing on average a
deviation of roughly 5% mean percentage error for both Nifty and Sensex. Different tests can be
used to validate the predicted time series.

2.1.2

Title: Comparative study of stock trend prediction using time delay, recurrent and probabilistic
neural networks [2]
Authors: Saad, Prokhorov V.D, D.C. Wunsch
Year of Publication: Nov 1998
Publishing Details: IEEE Transactions on Neural Networks
Summary: Predicting short-term stock trends from the history of daily closing prices is possible
using any of the networks discussed. Preliminary analysis and neuro-engineering are important
for a successful implementation. The measured correlation coefficient is useful in estimating
the delay between the inputs of the TDNN (time delay neural network). A small exponent indicates
either cyclic or systematic behavior of the stock. It was also noticed that a relatively short
training set should be used. TDNN is moderate in terms of memory requirement and implementation
complexity. PNN (probabilistic neural network) has the advantages of extreme simplicity of
implementation and a low false alarm rate even for stocks with low predictability. PNN is more
suitable for stocks which do not need training on a long history, for example Apple stock. Like
TDNN, RNN (recurrent neural network) does not need large storage memory, but its drawback is the
complexity of implementation, which is a one-time task.

2.1.3

Title: Forward forecast of stock price using sliding-window metaheuristic-optimized machine
learning regression [3]
Authors: Sheng Chou, Kha Nguyen
Year of Publication: July 2018
Publishing Details: IEEE Transactions on Industrial Informatics
Summary: The decision to buy or sell a stock is complex, since many factors affect the stock
price. This work presents a novel approach, based on a MetaFA-LSSVR, to construct a stock price
forecasting expert system, with the aim of improving forecasting accuracy. The intelligent time
series prediction system, which uses sliding-window metaheuristic optimization, is a GUI that
runs as a stand-alone application. The system makes the prediction of stock market values
simpler, involving fewer computations, than the other methods mentioned. The original FA is
enhanced with metaheuristic components (chaotic maps, adaptive inertia weight and Levy flight)
to construct an optimization algorithm (MetaFA). The superior performance of the MetaFA was
verified using benchmark functions. The MetaFA was then adopted to automatically tune the
hyperparameters C and σ of the LSSVR. The optimized prediction model was used with the sliding
window to forecast and evaluate stock prices. The default setting of the system, including
prespecified parameter values, saves users' time. To evaluate the proposed approach, it was
applied to five datasets and to three other stock datasets that have been used elsewhere.
Statistical measures were obtained when it was applied to construction company datasets.
Specifically, the one-day-ahead prediction of 2597.TW stock prices was better than that for any
other construction company stock, with a MAPE of 1.372% and an R of 0.990. At the end of the
study, the financial performance of the proposed system was analyzed, with encouraging results.
The proposed system can therefore be used as a decision-making tool to forecast stock prices for
short-term investing.

The study focuses on the stock market. To generalize the application of the proposed system,
future work should use it to estimate stocks in other emerging or mature markets, for example
Vietnam. Finally, the development of a web-based application should be considered to improve the
user-friendliness and usability of the expert system. The limitation of the system is its
computational speed, especially with respect to sliding-window validation, because of the
complexity of solving large mathematical loops in the MATLAB program. The computational cost
rises with the number of validations. Another weakness is the need to define many system
parameters, although default settings are provided. Moreover, the system did not achieve
outstanding results for long-term investment, a finding that will be addressed in future
research.

2.1.4

Title: Integrating a piecewise linear representation method and a neural network model for stock
trading points prediction [4]
Authors: Cha Pei, Fan Yuan, Chen-Hao Liu
Year of Publication: 02 December 2008
Publishing Details: IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and
Reviews)
Summary: A considerable amount of research has been conducted to study the behavior of stock
price movement. However, the investor is interested in making a profit by being given simple
trading decisions, such as buy/hold/sell, from the system rather than predicting the stock price
itself. Therefore, a different approach is taken by applying PLR (piecewise linear
representation) to split historical data into different segments. As a result, turning points
(trough or peak) of the historical stock data can be detected and then used as input to the BPN
(back-propagation network) to train the connection weights of the model. Then, a new set of
input data can trigger the model when a buy or sell point is detected by the BPN. An intelligent
PLR (IPLR) model is then formed by combining the GA (genetic algorithm) with the PLR. This
improves the threshold value of the PLR to further raise the profitability of the model. The
IPLR approach is tested on different types of stocks, i.e., uptrend, steady, and downtrend. The
test results show that the IPLR approach can make a significant amount of profit, especially in
uptrend and downtrend states rather than in the steady state. In general, the proposed system is
effective in predicting the future trading points of a specific stock. However, there is one
issue, which is the price variation of the stock. It is observed that if the price variation of
the current stock is in either an uptrend or a downtrend, then it is better to train the BPN
with a matching pattern, i.e., from a similar uptrend or downtrend period.

2.1.5

Title: A novel instantaneous frequency algorithm and its application in stock index movement
prediction [5]
Authors: Liming Zhang, Pengyi Yu
Year of Publication: 11 May 2012
Publishing Details: IEEE Journal
Summary: A novel instantaneous frequency (IF), called counting IF, is proposed. The newly
defined IF relaxes the three requirements for instantaneous frequency. The principle is simple
and straightforward. It can be used for a simple wave, including IMFs (intrinsic mode functions)
obtained from an EMD (empirical mode decomposition) algorithm. Its theoretical foundation, its
simplicity and its wide applicability make it of independent interest. As a major application,
the proposed counting IF is then used to predict or analyze a stock index using EMD
decomposition. It is expected that the counting IF method may be significant in applications in
financial data analysis.

2.1.6

Title: Neurofuzzy with hybrid dynamics of volatility and sign prediction [6]
Authors: D. Bekiros
Year of Publication: 06 October 2011
Publishing Details: IEEE Transactions on Neural Networks
Summary: Reliable forecasting techniques for financial applications are important for investors,
either to make a profit by trading or to hedge against potential market risks. In this paper the
efficiency of a trading strategy based on the use of a neurofuzzy model is investigated, in order
to predict the direction of stock exchange market returns. Moreover, it is demonstrated that
incorporating estimates of the conditional volatility changes, according to the theory of
Bekaert and Wu, strongly improves the predictability of the neurofuzzy model, as it provides
valid information about a potential turning point on the next trading day. The total return of
the proposed volatility-based neurofuzzy model, including transaction costs, is consistently
better than that of a Markov-switching model, a feedforward neural network and a buy-and-hold
strategy. The findings can be justified by invoking either the "volatility feedback" theory or
the existence of portfolio insurance schemes in the equity markets, and are also consistent with
the view that volatility dependence produces sign dependence.

Thus, a trading strategy based on the proposed neurofuzzy model may allow investors to earn
higher returns than the passive portfolio management strategy.

2.1.7

Title: Prediction of stock market performance by using different ML techniques [7]
Authors: Karman Hasan
Year of Publication: 04 May 2017
Publishing Details: 2017 International Conference
Summary: One decision can have a huge impact on an investor's life in the stock market. The stock
market is a very complex system and often a mystery, so it is hard to analyze all the
influencing factors before making a decision. In this research, the authors designed a stock
market prediction model based on different factors. The model was built to predict KSE-100 index
performance. The market can be positive or negative, with different values as predicted by the
model. The factors included are equity, interest rate, commodity, foreign exchange, general
public sentiment and fuel price fluctuation, along with values predicted with the help of the
historical data of the market. The techniques used for prediction include four different
variants of Artificial Neural Network (ANN), including Radial Basis Function (RBF). All the
techniques were compared to find the best prediction model. The results showed that MLP
(multi-layer perceptron) performed best and predicted the market with an accuracy of 77%. Each
factor was studied independently to find its relationship with market performance. The change
in petrol prices showed the strongest relationship with market performance. The results
suggested that the behavior of the market can be predicted using machine learning techniques.

2.1.8

Title: Support Vector Machine used in Stock Market Prediction [8]
Authors: Zhen Hu, Jie Zhu, Ken Tse
Year of Publication: 09 January 2014
Publishing Details: 6th International Conference
Summary: Many studies provide evidence of significant challenges in out-of-sample predictability
tests, due to model uncertainty and parameter instability. Recently, certain approaches that
overcome these problems have been introduced. Support Vector Machine (SVM) is a relatively new
learning algorithm with the desirable characteristics of control of the decision function, use
of the kernel method, and sparsity of the solution. In this paper, the authors presented a
theoretical and empirical framework for applying the SVM approach to stock prediction. Firstly,
some company-specific factors and six macroeconomic factors that may influence the stock trend
are selected for further multivariate analysis. Secondly, Support Vector Machine is used to find
the relationship between these factors and to predict the performance. The results suggest that
SVM is a powerful tool for predicting stocks in the financial market.

2.1.9

Title: Stock market prediction analysis by incorporating social and news opinion and sentiment [9]
Authors: Zhaoxia Wang, Seng-Beng Ho, Zhiping Lin
Year of Publication: 11 February 2019
Publishing Details: Conference on Data Mining Workshops
Summary: The price of a stock is a good indicator for a company, and it can be affected by many
factors. Different events affect public opinions and emotions in different ways, which may
influence the trend of stock market prices. Because they depend on various factors, stock
prices are not static but dynamic. Owing to its higher learning capability for solving nonlinear
time series prediction problems, machine learning has been applied to this research area.
Learning-based methods for stock price prediction are well known, and many enhanced strategies
have been used to improve the results of learning-based predictors. However, successful stock
market prediction is still a challenge. News articles and social media data are also useful and
important in financial prediction, but currently no good method exists that can take these
social media sources into account to give a better analysis of the financial market. This paper
aims to predict the stock price successfully by analyzing the relationship between the stock
price and the news. Compared with existing learning-based methods, the effectiveness of this new
enhanced learning-based method is demonstrated using a real stock price data set, with an
improvement in performance in terms of reducing the Mean Square Error (MSE). The findings of
this paper not only show the merits of the proposed method but also point to the right direction
for future work in this area.

2.1.10

Title: Machine Learning Techniques for Stock Price Prediction [10]
Authors: Sumeet Sarode, Harsha G. Tolani, Prateek Kak
Year of Publication: 21 November 2019
Publishing Details: Conference on ICISS
Summary: In modern economies, the stock or equity market has a measurable impact. Prediction of
stock prices is extremely complicated and chaotic, and the presence of a dynamic environment
makes it a difficult challenge. Behavioral finance suggests that the decision-making process of
investors is influenced to a very large extent by the sentiments and emotions aroused by
particular news. Therefore, to support the decisions of investors, the paper presents an
approach combining two different fields for analysis of the stock exchange. The system combines
price prediction based on historical and real-time data with news analysis. LSTM (Long
Short-Term Memory) is used for prediction. It takes the latest trading information and analysis
metrics as its input. For news analysis, only relevant and live news is collected from a large
set of business news. The filtered news is analyzed to predict the sentiment around companies.
The results of the two analyses are combined to produce a recommendation for future rises.

Chapter-3

System Development

The third chapter covers system development. We describe how the model and our project evolved
over time. We have drawn several flow charts for model extraction and system extraction for a
better understanding of the project.

3.1 Approach using Support Vector Machine (SVM)

We are going to use Support Vector Machines (SVMs), a set of supervised learning methods used for
classification, regression and outlier detection.
"Support Vector Machine" (SVM) is a supervised machine learning algorithm that can be used for
both classification and regression problems, although it is mostly used for classification. In
the SVM algorithm, each data item is plotted as a point. Classification is then performed by
finding the hyperplane that separates the two classes.

Fig3.1: SVM Diagram

Support vectors are simply the coordinates of individual observations. The Support Vector Machine
is the frontier (hyperplane/line) which best segregates the two classes.

SVM has the following advantages:

a. Effective in high dimensional spaces.

b. Still effective in cases where the number of dimensions is greater than the number of samples.

c. Uses a subset of training points in the decision function (called support vectors), so it is
also memory efficient.

d. Versatile: different kernel functions can be specified for the decision function. Common
kernels are provided, but it is also possible to specify custom kernels.
The disadvantages of support vector machines are:

a. If the number of features is much greater than the number of samples, avoiding over-fitting
when choosing the kernel function and the regularization term is crucial.
b. SVMs do not directly provide probability estimates; these are calculated using an expensive
five-fold cross-validation.

The support vector machines in scikit-learn support both dense and sparse sample vectors as
input. However, to use an SVM to make predictions for sparse data, it must have been fit on such
data. For optimal performance, we used a C-ordered numpy.ndarray (dense) or a
scipy.sparse.csr_matrix (sparse) with dtype=float64.

3.2 Classification and Regression


SVC, NuSVC and LinearSVC are classes capable of performing multi-class classification on a
dataset, as shown in the figure below.

Fig 3.2: Classification of Data……[a]

SVC and NuSVC are similar methods, but they accept slightly different sets of parameters and
have different mathematical formulations. LinearSVC, on the other hand, is another
implementation of Support Vector Classification for the case of a linear kernel. In LinearSVC
the kernel is assumed to be linear, which is why it does not accept the keyword kernel.
Like other classifiers, SVC, NuSVC and LinearSVC take two arrays as input: an array X of size
[n_samples, n_features] holding the training samples, and an array y of class labels (strings or
integers) of size [n_samples]:

>>> from sklearn import svm
>>> # two training samples with two features each, and their class labels
>>> X = [[0, 0], [1, 1]]
>>> y = [0, 1]
>>> clf = svm.SVC(gamma='scale')
>>> clf.fit(X, y)
SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma='scale', kernel='rbf',
    max_iter=-1, probability=False, random_state=None, shrinking=True,
    tol=0.001, verbose=False)

>>> # predict the class of a new sample
>>> clf.predict([[2., 2.]])
array([1])

>>> # get support vectors
>>> clf.support_vectors_
array([[0., 0.],
       [1., 1.]])
>>> # get indices of support vectors
>>> clf.support_
array([0, 1]...)
>>> # get number of support vectors for each class
>>> clf.n_support_
array([1, 1]...)

3.2.1 Classification SVM Types


3.2.1.1 Classification SVM Type1

In this type of SVM, training involves minimization of the error function:

$\frac{1}{2} w^T w + C \sum_{i=1}^{N} \xi_i$    eq. 1

subject to the constraints:

$y_i (w^T \phi(x_i) + b) \geq 1 - \xi_i$ and $\xi_i \geq 0$, $i = 1, \ldots, N$    eq. 2

where C is the capacity constant, w is the vector of coefficients, b is a constant, and $\xi_i$
represents parameters for handling non-separable data (inputs). The index i labels the N training
cases. Note that $y_i \in \{-1, +1\}$ represents the class labels and $x_i$ represents the
independent variables. The kernel $\phi$ is used to transform data from the input space to the
feature space. The larger the value of C, the more the error is penalized. Thus, C should be
chosen with care to avoid over-fitting.

3.2.1.2 Classification SVM Type2

The Classification SVM Type 2 model minimizes the error function:

eq. 3
subject to the constraints:

eq. 4

In a regression SVM, the functional dependence of the dependent variable on the independent
variables is estimated. It assumes, like other regression problems, that the relationship between
the independent and dependent variables is given by a deterministic function f(x) plus some
additive noise.

3.2.2 Regression SVM
3.2.2.1 Regression: SVM Type1

y = f(x) + noise eq.5


We need to find a functional form for f(x) that correctly predicts new cases that the SVM has not
been presented with before. This can be achieved by training the SVM model on a sample set, i.e.,
the training set, a process that involves, as in classification, the sequential optimization of
an error function.

3.2.2.2 Regression SVM Type2

The error function is given by:

eq. 6

which we minimize subject to:

eq. 7

Support Vector Machine models can use a number of kernels. These include linear, polynomial,
radial basis function (RBF) and sigmoid.
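
As a rough illustration, the four kernels named above can be written as small Python functions. This is a sketch using the common textbook forms; the function names and the parameters `gamma`, `coef0` and `degree` are assumptions chosen to mirror the usual SVM parameter names, not code from the project.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def linear_kernel(u, v):
    return dot(u, v)


def polynomial_kernel(u, v, degree=3, gamma=1.0, coef0=0.0):
    return (gamma * dot(u, v) + coef0) ** degree


def rbf_kernel(u, v, gamma=1.0):
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * squared_distance)


def sigmoid_kernel(u, v, gamma=1.0, coef0=0.0):
    return math.tanh(gamma * dot(u, v) + coef0)


u, v = [1.0, 2.0], [3.0, 4.0]
print(linear_kernel(u, v), polynomial_kernel(u, v, degree=2, coef0=1.0))
print(rbf_kernel(u, v, gamma=0.1), sigmoid_kernel(u, v, gamma=0.01))
```

Each function takes two samples and returns a single similarity value; the RBF kernel, for instance, returns 1 for identical samples and decays towards 0 as the samples move apart.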

3.3 Abstraction based extraction


Support Vector Machines are based on the concept of decision planes that define decision
boundaries. A decision plane is one that separates sets of objects having different class
memberships. The figure below illustrates this. In the figure, the objects belong either to
class GREEN or to class RED. The separating line defines a boundary: all objects on its right
side are GREEN and all objects on its left side are RED. Any new object falling on the right side
is labeled, i.e., classified, as GREEN, and one falling on the left side is classified as RED.

Fig 3.3(a): Abstraction

The above is an example of a linear classifier, i.e., a classifier that separates a set of
objects into GREEN and RED with a line. Most classification tasks, however, are not that simple,
and often more complex structures are needed in order to make an optimal separation, i.e., to
correctly classify new objects (test data) on the basis of the examples that are available
(training data). This situation is shown in the figure below. Compared with the previous
schematic, it is clear that a full separation of the GREEN and RED objects would require a curve
(which is more complex than a line). Classification tasks that are based on drawing separating
lines to distinguish between objects of different class memberships are known as hyperplane
classifiers. Support Vector Machines are designed to handle such tasks.

Fig 3.3(b): Abstraction

The figure below shows the basic idea behind Support Vector Machines. Here we see the original
objects (left side of the schematic) mapped, i.e., rearranged, using a set of mathematical
functions known as kernels. The process of rearranging the objects is known as mapping
(transformation). In the new setting, the mapped objects (right side of the schematic) are
linearly separable. Thus, instead of constructing the complex curve (left schematic), all we have
to do is find an optimal line that separates the GREEN and the RED objects.

Fig 3.3(c): Abstraction
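
The mapping idea can be illustrated with a toy example in Python. Everything here is hypothetical: the one-dimensional points, their GREEN/RED labels, the feature map x -> (x, x*x) and the threshold are chosen only to show how data that no single cut on the original axis can separate becomes separable by a straight line after the mapping.

```python
def feature_map(x):
    """Map a 1-D point to 2-D; the second coordinate is the square of the first."""
    return (x, x * x)


def classify(x, threshold=1.0):
    """Separate the mapped points with the horizontal line x2 = threshold."""
    _, x2 = feature_map(x)
    return "GREEN" if x2 > threshold else "RED"


# RED points sit between the GREEN ones, so no single cut on the x axis separates them.
points = [(-2.0, "GREEN"), (-0.5, "RED"), (0.3, "RED"), (1.5, "GREEN")]
for x, label in points:
    print(x, feature_map(x), label, classify(x))
```

In the mapped space the RED points have a small second coordinate and the GREEN points a large one, so a single straight line separates the two classes.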

3.4 Parameters
There are two types of SVR: Linear and Multiple.
Tuning the parameter values of machine learning algorithms effectively improves model
performance. SVR has a list of available parameters; here we discuss the important ones that
have the greatest effect on model performance: 'kernel', 'degree', 'gamma' and 'C'.
a. Kernel: It is a similarity function that takes two inputs and returns how similar they are.
It helps in representing the infinite set of discrete functions in a family of constant
functions.
b. Gamma: It is used in the RBF (Radial Basis Function) kernel to indicate variance. A small
gamma means a Gaussian surface with large variance. Gamma controls the shape of the peaks: the
higher the value of gamma, the more closely the model tries to fit the training dataset exactly,
which increases the generalization error and causes overfitting.
c. C: It is the penalty parameter of the error term. To obtain the best fit, some points in
regression can always be ignored; how far this is allowed is controlled by C. A large C means low
bias and high variance, as misclassification is penalised heavily. It also controls the
trade-off between a smooth decision boundary and classifying the training points correctly.
d. Degree: The degree parameter specifies the degree of the polynomial ('poly') kernel function.
Choosing the degree involves a trade-off: a higher degree can fit the training data more
accurately, but it also leads to more computation time and complexity.
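
The relationship between gamma and variance described in item b can be checked numerically. The sketch below assumes the common convention that the RBF kernel is exp(-gamma * d^2) with gamma = 1 / (2 * sigma^2), where sigma is the standard deviation of the Gaussian; the function names are hypothetical.

```python
import math


def gamma_to_sigma(gamma):
    """Standard deviation of the Gaussian implied by gamma = 1 / (2 * sigma**2)."""
    return math.sqrt(1.0 / (2.0 * gamma))


def rbf_similarity(distance, gamma):
    """RBF kernel value for two points that are `distance` apart."""
    return math.exp(-gamma * distance ** 2)


for gamma in (0.1, 1.0, 10.0):
    print(gamma, gamma_to_sigma(gamma), rbf_similarity(1.0, gamma))
```

A small gamma gives a wide Gaussian, so points one unit apart still look similar; a large gamma gives a narrow peak around each training point, which is the behavior that tends towards overfitting.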

3.5 Approach Using Linear Regression

Linear regression was the first type of regression analysis to be studied rigorously and
to be used extensively in practical applications. This is because models that depend
linearly on their unknown parameters are easier to fit than models that are non-linearly
related to their parameters, and because the statistical properties of the resulting
estimators are easier to determine.
If the goal is prediction, forecasting, or error reduction, linear regression can be used
to fit a predictive model to an observed data set of values of the response and
explanatory variables. After developing such a model, if additional values of the
explanatory variables are collected without an accompanying response value, the fitted
model can be used to make a prediction of the response.
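
The fit-then-predict idea can be sketched with ordinary least squares on a single explanatory variable. This is an illustrative sketch only: the helper names and the price series are hypothetical, and the project itself may rely on a library implementation instead.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for one explanatory variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Predicted response for a new value of the explanatory variable."""
    return slope * x + intercept


# Hypothetical, exactly linear series: day index vs. closing price.
days = [1, 2, 3, 4, 5]
prices = [101.0, 103.0, 105.0, 107.0, 109.0]

slope, intercept = fit_line(days, prices)
next_price = predict(slope, intercept, 6)  # day 6 has no observed response yet
print(slope, intercept, next_price)
```

Here day 6 plays the role of an additional explanatory value collected without an accompanying response, and the fitted line supplies the prediction.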

3.5.1. Proposed Algorithm

Data -> Data Normalization -> Features Selection -> Prediction -> Performance Measure

Fig 3.4: Linear Regression Algorithm

3.6 Approach using Decision Tree Analysis


Decision Tree Analysis is an analytical model that is often used to make decisions in an
organization. It is presented graphically as a tree-like structure in which the issues
are laid out as a flowchart, each with branches representing alternative choices.
Decision trees are useful tools for choosing between several courses of action. In stock
prediction, features are extracted from the daily stock market data, and the relevant
features are then selected using a decision tree. A rough-set-based classifier is then
used to predict the next day's trend.

3.6.1. Terminologies used

a. Root Node: The root node represents the whole sample, which is then divided into
multiple sets made up of homogeneous (similar) variables.
b. Decision Node: A sub-node that splits into further possibilities is known as a
decision node.
c. Terminal Node: The final node, showing an outcome that cannot be divided any further,
is a leaf or terminal node.
d. Branch: Denotes one of the alternatives available to the decision maker.
e. Splitting: The division of a node into multiple sub-nodes is known as splitting.
f. Pruning: The opposite of splitting, in which the decision maker eliminates one or more
sub-nodes from a particular decision node.
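
The terms above can be illustrated with a one-level regression tree (a root node split once into two terminal nodes). This is a minimal sketch under stated assumptions: the split criterion (weighted variance of the child nodes), the function names and the sample prices are chosen for illustration and are not taken from the project code.

```python
def variance(values):
    """Population variance of a non-empty list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def best_split(xs, ys):
    """Split the root node at the threshold that minimises the weighted
    variance of the two child nodes. Returns (threshold, score), or None
    when all x values are identical and no split is possible."""
    best = None
    candidates = sorted(set(xs))
    for low, high in zip(candidates, candidates[1:]):
        threshold = (low + high) / 2
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        score = (len(left) * variance(left) + len(right) * variance(right)) / len(ys)
        if best is None or score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical opening and closing prices forming two clear groups.
opens = [10.0, 11.0, 12.0, 20.0, 21.0, 22.0]
closes = [10.5, 11.5, 12.5, 20.5, 21.5, 22.5]

threshold, score = best_split(opens, closes)

# Each terminal node predicts the mean of the training targets it contains.
left_values = [c for o, c in zip(opens, closes) if o <= threshold]
right_values = [c for o, c in zip(opens, closes) if o > threshold]
left_leaf = sum(left_values) / len(left_values)
right_leaf = sum(right_values) / len(right_values)
print(threshold, left_leaf, right_leaf)
```

A full tree repeats this splitting step inside each child node; pruning would correspond to discarding such a split and keeping the parent as a terminal node.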

3.6.2 Steps in Decision Tree

Fig 3.5: Steps of Decision Tree……[b]

3.6.3 Representation

Fig 3.6: Representation

3.7 Proposed System for Extractive approach
This phase involves supervised classification methods like Support Vector Machines,
Neural Networks, Naive Bayes, ensemble classifiers (like AdaBoost and Random Forest
classifiers), etc.

Fig 3.7: Proposed System

3.8 System architecture for Extractive approach

We use Support Vector Machines (SVM) in our project. An SVM tolerates some classification
errors on the training data in order to minimise the overall error on the test data.

Fig. 3.8: System Architecture

3.9 Model Design
The proposed approach uses machine learning and deep learning concepts. The flow chart
for this approach is as follows:

Fig 3.9: Model Design

Chapter-4

Performance Analysis

This chapter covers the algorithms and mathematical formulas used in our project and the
steps followed in applying them. The results and their accuracy are also analysed in this
chapter.

4.1 Proposed solutions

a. Preprocessing and Cleaning
Removing redundant data and recovering missing data. This step also involves creating
useful features from the existing ones.
b. Feature Extraction
In this step, the space of possible feature subsets is searched, and the subset that is
optimal or near-optimal with respect to some objective function is picked. Overfitting
and underfitting the dataset are major problems, and this step is done to avoid them.
c. Data Normalization
The data needs to be normalized for better accuracy, ensuring that no feature is given
excessively high or low weightage.
d. Analysis of various supervised learning methods
d.1 Classification Methods
Support Vector Machines, Neural Networks, Naïve Bayes and ensemble classifiers (like
AdaBoost and Random Forest classifiers) are the supervised learning methods that form
part of this phase.
d.2 Regression Methods
These models are used to obtain the expected numerical value for the stocks of interest.
This stage involves supervised regression methods like Linear Regression, Support
Vector Regression, kernel methods, etc.
e. Social Media Sentiment Analysis
Analysing the current market situation from Facebook and Twitter, or from the latest
news headlines, in order to gain insights into the future of stock prices.

f. Analysis of Different Models
Comparison between the various methods and models implemented over the datasets.
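
The data normalization step (c) can be sketched as follows. The report does not state which scaling was used, so min-max scaling to the range [0, 1] is an assumption here, and the function name and sample values are hypothetical.

```python
def min_max_normalize(values):
    """Rescale values linearly to [0, 1]; a constant column maps to all zeros."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


# Hypothetical features on very different scales.
closing_prices = [100.0, 150.0, 200.0]
volumes = [1_000_000.0, 3_000_000.0, 2_000_000.0]

scaled_prices = min_max_normalize(closing_prices)
scaled_volumes = min_max_normalize(volumes)
print(scaled_prices, scaled_volumes)
```

After scaling, both features lie in the same range, so the large raw magnitude of the volume column no longer dominates the price column.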

4.2 Analysis
a. Stock analysis will help new traders trade in the stock market based on the various
factors considered by the software application.
b. Our software will calculate the Sensex based on the company's stock value. A company's
stock value depends on many factors. Some of them are:
b.1 Demand and Supply: The demand for and supply of a company's shares is a major reason
for changes in the stock price. Increased demand and decreased supply lead to an increase
in value, and vice versa.
b.2 Corporate results: These concern the profits or progress of the company over a
period of a few months.
b.3 Popularity: A main driver for share buyers. The popularity of a company affects those
buying its shares.

4.2.1 Prominent features based analysis

a. Analyzing stock data.
We need to provide a dataset for a company, which will include its opening and closing
prices and its monthly sales or profit.
b. Analyzing the factors.
We have to obtain data for the same period for the following factors.
b.1 Demand and Supply: from the previously entered data.
b.2 Corporate results: companies declare their results and profit at the end of each
quarter.
b.3 Popularity: analysing the views about the company.

4.2.2 Prediction analysis

a. Technical analysis
b. Fundamental analysis

Technical analysis is a short-term approach, while fundamental analysis is a long-term
approach. By evaluating intrinsic value, fundamental analysis lets us work towards the
long-term value of the company.

4.3 Performance Measures


a. R2 Score (R-squared): A statistical measure of how close the data are to the fitted
regression line. 0% indicates that the model explains none of the variability of the
response around its mean.
b. RMSE (Root Mean Squared Error): The prediction error, i.e. the standard deviation of
the residuals (residuals measure how far the data points are from the regression
line). RMSE measures how spread out these residuals are; in other words, it tells how
concentrated the data are around the line of best fit. The smaller the RMSE, the
better the model.
c. MSE (Mean Squared Error): The average of the squared differences between the
predicted and actual values; squaring places more weight on large errors. A lower MSE
indicates a better fit, although it has to be balanced against overfitting and
underfitting.
d. MAE (Mean Absolute Error): The average absolute difference between two continuous
variables, here the predicted and actual values. Because RMSE squares the errors, RMSE
is preferable to MAE when large errors are particularly undesirable.
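
The four measures can be written out directly from their standard definitions. The sketch below uses plain Python; the function names and the sample values are hypothetical and serve only to show how the measures behave.

```python
import math


def mse(actual, predicted):
    """Mean of the squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Square root of the mean squared error."""
    return math.sqrt(mse(actual, predicted))


def mae(actual, predicted):
    """Mean of the absolute differences between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def r2_score(actual, predicted):
    """1 - SS_res / SS_tot: share of the variance around the mean that is explained."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Hypothetical test targets and two sets of predictions.
y_test = [100.0, 102.0, 104.0, 106.0]
many_small_errors = [101.0, 101.0, 105.0, 105.0]  # every prediction is off by 1
one_large_error = [100.0, 102.0, 104.0, 110.0]    # one prediction is off by 4

for name, y_pred in (("small", many_small_errors), ("large", one_large_error)):
    print(name, mae(y_test, y_pred), rmse(y_test, y_pred), r2_score(y_test, y_pred))
```

Both sets of predictions have the same MAE, but the set with a single large error has the larger RMSE, because squaring weights that one error more heavily.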

4.4 Training Dataset
This dataset includes the date, the open, high, low, last and closing prices, the total
trade quantity and the turnover.

4.5 Testing Snippet
This is the dataset that we test on.
It includes the date, the open, high, low, last and close (or adjusted close) prices, and
the volume of the company, i.e. total trade and turnover.

a. Date: The date, in the format MM-DD-YYYY, on which the trading took place or for
which the prediction is made.
b. Open: The price of each share when the stock exchange opens. It gives a good
indication of where the stock will move during the time the market is open. Because
the stock exchange works like an auction market, in which buyers and sellers make
deals with the highest bidder, the opening price need not be the same as the previous
day's closing price.
c. High/Low Price: These are taken from the previous day and indicate how much the share
usually moves during a day and how this affects the closing price; they show the basic
cyclic movement of a share.
d. Last Price: The most recently reported trading price for the futures contract.
e. Close: The closing price of the stock on the given trading day, adjusted to include
any distributions and corporate actions that occurred at any time before the next
day's open.
f. Total Trade: The number of shares traded in the entire market during a given period
of time.
g. Turnover: A measure of stock liquidity, calculated as the number of shares traded over
a period divided by the average number of shares outstanding for that period.
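
Two of the fields above lend themselves to a short worked example: parsing the MM-DD-YYYY date and computing turnover as defined in item g. The function names and the numbers are hypothetical and only illustrate the definitions.

```python
from datetime import datetime


def parse_trade_date(text):
    """Parse a date written as MM-DD-YYYY into a datetime.date."""
    return datetime.strptime(text, "%m-%d-%Y").date()


def turnover_ratio(shares_traded, average_shares_outstanding):
    """Shares traded over a period divided by the average shares outstanding."""
    return shares_traded / average_shares_outstanding


# Hypothetical values for illustration only.
trade_date = parse_trade_date("03-15-2019")
ratio = turnover_ratio(500_000, 2_000_000)
print(trade_date.isoformat(), ratio)
```

A ratio of 0.25 would mean that a quarter of the average outstanding shares changed hands during the period.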

4.6 Custom Input


The necessary python files/dependencies are imported.

The dependencies used are as follows:
a. requests: handles HTTP(S) requests.
b. pandas as pd: used for handling CSV files and dataframes.
c. re: used for regular-expression string operations.
d. operator: provides function versions of Python's operators; used here for numerical
operations.
e. sys: used for access to interpreter and system-specific functions.
f. urllib: used to send and retrieve HTTP URL requests.
g. os: used for instructing the operating system to perform functions on the file system.
h. csv: used to handle CSV files, i.e. reading and writing in CSV format.
i. numpy as np: used for scientific calculations.
j. from sklearn.svm import SVR: used for machine learning (Support Vector Regression).
k. matplotlib.pyplot as plt: used for visualization and plotting graphs.

First, a function fetches the company's historical stock price data.

Once we have the data, we load it into a CSV file for further processing.

Next, we extract the x and y values from the dataset.
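
The original code for this step is not reproduced in the text, so the following is only a sketch of how x and y could be read from the CSV data. It assumes x is the number of days since the first date and y is the closing price; the column names follow the dataset description, while the helper name and the sample rows are made up for illustration.

```python
import csv
import io
from datetime import datetime

# Hypothetical sample in the described column layout; the values are made up.
SAMPLE_CSV = """Date,Open,High,Low,Last,Close
01-02-2019,100.0,103.0,99.0,102.5,102.0
01-03-2019,102.0,105.0,101.0,104.5,104.0
01-04-2019,104.0,106.0,103.0,105.5,105.0
"""


def load_xy(csv_file):
    """Return x (days since the first date) and y (closing price) lists."""
    rows = list(csv.DictReader(csv_file))
    dates = [datetime.strptime(row["Date"], "%m-%d-%Y").date() for row in rows]
    first_date = min(dates)
    xs = [(d - first_date).days for d in dates]
    ys = [float(row["Close"]) for row in rows]
    return xs, ys


# io.StringIO stands in for a file opened with open(path, newline="").
x, y = load_xy(io.StringIO(SAMPLE_CSV))
print(x, y)
```

Converting dates to a day offset gives the regression models a numeric explanatory variable to work with.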

Input for Linear Regression

4.7 Output/Results

Initial data fetched from the dataset (the CSV file)

Fig 4.1: Initial data

4.7.1 Linear Regression

Error terms for Linear Regression:

Chapter 5

Conclusion

The Internet is growing rapidly, and data is being generated at such a rate that it has
become almost impossible for us to handle it. So much information is processed nowadays
that it is difficult to study its behaviour or to draw conclusions from it, which makes it
hard to summarize. Machine learning algorithms help us understand such datasets.
Technical and fundamental analysis played only a small part in the experiments carried
out. Machine learning algorithms were applied to data sources from different companies.
The report highlights that the stock market is prone to fluctuations, and it concludes
that predicting stock prices is an extremely tough job.
The main objective of this system is to provide ways to help with the stock market.
Because of its limitations, our work cannot be used as an official model. We reached a
certain degree of accuracy by incorporating a limited number of parameters. Since the
stock market fluctuates heavily, predicting everything with high accuracy cannot be
expected. The model we have created therefore depends only on the selected parameters and
their relationship with the share price.
We studied the basics of the Extractive and Abstractive methods and tried to implement the
former. We took a dataset of different companies and performed data cleaning and
normalization. We then split the dataset into training and testing sets, with the testing
set being almost 10% of the data. After that, we created Linear Regression, Support
Vector Regression and Decision Tree models and trained them on dates and prices.
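
A split with roughly 10% of the data held out for testing can be sketched as below. The report does not say whether the split was random or chronological; keeping the most recent rows for testing is an assumption made here because the data is a time series, and the function name and sample series are hypothetical.

```python
def chronological_split(xs, ys, test_fraction=0.1):
    """Hold out the last test_fraction of the rows (at least one) for testing."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    n_test = max(1, round(len(xs) * test_fraction))
    split = len(xs) - n_test
    return xs[:split], xs[split:], ys[:split], ys[split:]


# Hypothetical series of 20 trading days.
days = list(range(20))
prices = [100.0 + day for day in days]

x_train, x_test, y_train, y_test = chronological_split(days, prices)
print(len(x_train), len(x_test))
```

With 20 rows and a 10% test fraction, the last two days form the test set and the first 18 are used for training.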
Predicting the stock market is always a challenging and tedious job, especially for
business analysts. Our predictions reached an overall accuracy of approximately 60% to
65%. To achieve higher accuracy, further in-depth research is needed.
All the experiments performed with machine learning algorithms and techniques show that
input data plays an important role. We had to combine the datasets and form the feature
list accordingly; once the dataset is divided into training and testing parts, the number
of samples becomes very small. Noise and unwanted information were removed from the
dataset using filtering techniques, so that the model works efficiently and predicts the
outcome better, with almost no sign of noise. Additionally, SVM has shown that we can
generate increasingly custom feature lists and obtain predictions with great efficiency.
We conducted experiments using the non-linear RBF kernel, which shows considerable
accuracy in its results. Most importantly, the above analysis not only helped us predict
the future prices of a company but also gave us valuable and deep insights into the nature
of the data, which can be used to train our SVM classifiers better. The project can be
extended further by extending the labelled feature list and by using different
classifiers. Future work could include the use of an unsupervised preprocessor alongside
the linear classifier.
Based on the performance of the three algorithms, Linear Regression, Support Vector
Machine and Decision Tree, we conclude that Decision Tree is the best of the three and
Linear Regression comes second, based on the RMSE values:
RMSE(DT) < RMSE(LR) < RMSE(SVM)
RMSE should be as low as possible, because it measures the error, or difference, between
the actual values (Y_test) and the predicted values (Predicted_Y): the smaller it is, the
closer the two are and the better the prediction.
RMSE(DT) = 833.0699
RMSE(LR) = 923
RMSE(SVM) = 5051.96
Also, based on the pie chart, if we calculate the absolute difference between the actual
and predicted means, we can conclude that Decision Tree (DT) is the best, with an absolute
difference of 35.37; Linear Regression is second with 76.85, and SVM is last with a mean
difference of 1255.22.
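
The comparison above can be reproduced from the reported figures. In the sketch below the dictionaries contain only the values quoted in this chapter; the function names are hypothetical, and `mean_gap` shows one way to compute the absolute difference between the actual and predicted means.

```python
# Values as reported in this chapter.
reported_rmse = {"DT": 833.0699, "LR": 923.0, "SVM": 5051.96}
reported_mean_gap = {"DT": 35.37, "LR": 76.85, "SVM": 1255.22}


def rank_by_error(errors):
    """Model names ordered from lowest to highest error."""
    return sorted(errors, key=errors.get)


def mean_gap(actual, predicted):
    """Absolute difference between the mean actual and mean predicted value."""
    return abs(sum(actual) / len(actual) - sum(predicted) / len(predicted))


print(rank_by_error(reported_rmse))
print(rank_by_error(reported_mean_gap))
```

Both measures produce the same ordering, which is why the two comparisons lead to the same conclusion.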

5.1 Future scope of improvement


1. Our dataset and analysis method can potentially be improved.
2. With more accurate algorithms, refined data and more precise research, further
improvement is possible.
3. Introduction of Twitter feeds.
4. Advanced predictions from news feeds and different websites can be used for better
results.
5. Refining key phrase extraction and doing more work on it should produce better
results.

References

[1] S. M. Idrees, M. A. Alam and P. Agarwal, "A Prediction Approach for Stock
Market Volatility Based on Time Series Data," in IEEE Access, vol. 7, pp. 17287-17298,
2019. doi: 10.1109/ACCESS.2019.2895252

[2] E. W. Saad, D. V. Prokhorov and D. C. Wunsch, "Comparative study of stock trend
prediction using time delay, recurrent and probabilistic neural networks," in IEEE
Transactions on Neural Networks, vol. 9, no. 6, pp. 1456-1470, Nov. 1998.
doi: 10.1109/72.728395

[3] J. Chou and T. Nguyen, "Forward Forecast of Stock Price Using Sliding-Window
Metaheuristic-Optimized Machine-Learning Regression," in IEEE Transactions on Industrial
Informatics, vol. 14, no. 7, pp. 3132-3142, July 2018.
doi: 10.1109/TII.2018.2794389

[4] P. Chang, C. Fan and C. Liu, "Integrating a Piecewise Linear Representation Method and a
Neural Network Model for Stock Trading Points Prediction," in IEEE Transactions on
Systems, Man, and Cybernetics, Part C (Applications and Reviews), vol. 39, no. 1, pp. 80-92,
Jan. 2009. doi: 10.1109/TSMCC.2008.2007255

[5] L. Zhang, N. Liu and P. Yu, "A Novel Instantaneous Frequency Algorithm and Its
Application in Stock Index Movement Prediction," in IEEE Journal of Selected Topics in
Signal Processing, vol. 6, no. 4, pp. 311-318,
Aug. 2012.
doi: 10.1109/JSTSP.2012.2199079

[6] S. D. Bekiros, "Sign Prediction and Volatility Dynamics With Hybrid Neurofuzzy
Approaches," in IEEE Transactions on Neural Networks, vol. 22, no. 12, pp. 2353-2362, Dec.
2011. doi: 10.1109/TNN.2011.2169497

[7] K. Raza, "Prediction of Stock Market performance by using machine learning
techniques," 2017 International Conference on Innovations in Electrical Engineering and
Computational Technologies (ICIEECT), Karachi, 2017, pp. 1-1.
doi: 10.1109/ICIEECT.2017.7916583

[8] Z. Hu, J. Zhu and K. Tse, "Stocks market prediction using Support Vector Machine," 2013 6th
International Conference on Information Management, Innovation Management and Industrial
Engineering, Xi'an, 2013, pp. 115-118.
doi: 10.1109/ICIII.2013.6703096

[9] Z. Wang, S. Ho and Z. Lin, "Stock Market Prediction Analysis by Incorporating Social and
News Opinion and Sentiment," 2018 IEEE International Conference on Data Mining Workshops
(ICDMW), Singapore, Singapore, 2018, pp. 1375-1380.
doi: 10.1109/ICDMW.2018.00195

[10] S. Sarode, H. G. Tolani, P. Kak and C. S. Lifna, "Stock Price Prediction Using Machine
Learning Techniques," 2019 International Conference on Intelligent Sustainable Systems
(ICISS), Palladam, Tamilnadu, India, 2019, pp. 177-181.
doi: 10.1109/ISS1.2019.8907958
