IJREAS Volume 2, Issue 1 (January 2012) ISSN: 2249-3905

Indian Stock Market Trend Prediction Using Support Vector Machine
ABSTRACT
Stock return predictability has been a subject of great controversy. The debate has ranged from market efficiency to the number of factors containing information on future stock returns. The analytical tool of support vector regression, on the other hand, has gained great momentum in its ability to predict time series in various applications, including finance (Smola and Schölkopf, 1998). Support vector machines (SVM) are employed to predict daily stock market trends: ups and downs. The purpose is to examine the effect of macroeconomic information and technical analysis indicators on the accuracy of the classifiers. The construction of a prediction model requires factors that are believed to have some intrinsic explanatory power. These explanatory factors fall largely into two categories: fundamental and technical. Fundamental factors include, for example, macroeconomic indicators, which, however, are usually published only infrequently. Technical factors are based solely on the properties of the underlying time series and can therefore be calculated at the same frequency as the time series. Since this study applies support vector regression to high-frequency data, only technical factors are considered. It is found that macroeconomic information is more suitable for predicting stock market trends than technical indicators. In addition, the combination of the two sets of predictive inputs does not improve the forecasting accuracy. Furthermore, the prediction accuracy improves when trading strategies are considered. A support vector machine (SVM) is a very specific type of learning algorithm characterized by the capacity control of the decision function, the use of kernel functions, and the sparsity of the solution. In this paper, we investigate the predictability of the direction of financial movements with SVM by forecasting the weekly movement direction of the BSE30 index. To evaluate the forecasting ability of SVM, we compare its performance with those of Linear Discriminant Analysis, Quadratic Discriminant Analysis and Elman Backpropagation Neural Networks. The experiment results show that SVM outperforms the other classification
methods. Further, we propose a combining model by integrating SVM with the other classification methods. The combining model performs best among all the forecasting methods.

Keywords: Support Vector Machines, Classification, Stock Market, Technical Indicators.
*Principal, Intel Institute of Science, Anantapur, Andhra Pradesh, India.
**Associate Professor, Department of Computer Science, S.K. University, Anantapur, India.
***Professor & Chairman, Board of Studies, Department of Computer Science, Sri Krishnadevaraya University, Anantapur, India.
1. INTRODUCTION
Forecasting stock market behavior is a very difficult task, since its dynamics are complex and non-linear. For instance, stock return series are generally noisy and may be influenced by many factors, such as the economy, business conditions, and political events, to name a few. Indeed, empirical finance shows that publicly available data on financial and economic variables may explain stock return fluctuations in the Indian stock market. For instance, a number of applications have been proposed to forecast stock market returns from macroeconomic variables with the use of neural networks, Bayesian networks and support vector machines. On the other hand, technical indicators have also been used to predict stock market movements using neural networks, adaptive fuzzy inference systems, and fuzzy logic. The literature shows that economic variables and technical indicators have achieved success in predicting the stock market. However, none of the previous studies have compared the performance of economic information and technical indicators in terms of prediction accuracy.

The financial market is a complex, evolutionary, and non-linear dynamical system. The field of financial forecasting is characterized by data intensity, noise, non-stationarity, an unstructured nature, a high degree of uncertainty, and hidden relationships. Many factors interact in finance, including political events, general economic conditions, and traders' expectations. Therefore, predicting financial market price movements is quite difficult. Increasingly, according to academic investigations, movements in market prices are not random. Rather, they behave in a highly non-linear, dynamic manner. The standard random walk assumption of futures prices may merely be a veil of randomness that shrouds a noisy non-linear process.

A support vector machine (SVM) is a very specific type of learning algorithm characterized by the capacity control of the decision function, the use of kernel functions, and the sparsity of the solution. Established on the unique theory of the structural risk minimization principle, which estimates a function by minimizing an upper bound of the generalization error, SVM is shown to be very resistant to the overfitting problem, eventually achieving high generalization performance. Another key property of SVM is that training an SVM is equivalent to solving a linearly constrained quadratic programming problem, so that the solution of an SVM is always unique and globally optimal, unlike the training of neural networks, which requires non-linear optimization with the danger of getting stuck at local minima.
Some applications of SVM to financial forecasting problems have been reported recently. In most cases, the degree of accuracy and the acceptability of certain forecasts are measured by the deviations of the estimates from the observed values. For practitioners in the financial market, forecasting methods based on minimizing forecast error may not be adequate to meet their objectives. In other words, trading driven by a certain forecast with a small forecast error may not be as profitable as trading guided by an accurate prediction of the direction of movement. The goal of this study is to predict stock price movements only from the statistical properties of the underlying financial time series and to explore the predictability of the direction of financial market movements with SVM. Therefore, financial indicators are extracted from the time series, which are then used by a support vector regression (SVR) to predict market movement.

2. SUPPORT VECTOR MACHINES

Support Vector Machines (SVM) are a supervised statistical learning technique introduced by Vapnik. They are one of the standard tools of machine learning, successfully applied in many different real-world problems; for instance, they have been successfully applied in financial time series trend prediction. SVM were originally formulated for binary classification. SVM seek to implement an optimal marginal classifier that minimizes the structural risk in two steps. First, SVM transform the input to a higher-dimensional space with a kernel (mapping) function. Second, SVM linearly combine the mapped features with a weight vector to obtain the output. As a result, SVM provide very interesting advantages: they avoid local minima in the optimization process, and they offer scalability and generalization capabilities. For instance, to solve a binary classification problem in which the output $y \in \{-1, +1\}$, SVM seek a hyperplane $w \cdot \phi(x) + b = 0$ that separates the data of classes $+1$ and $-1$ with a maximal margin. Here, $x$ denotes the input feature vector, $w$ is a weight vector, $\phi$ is the mapping function to a higher dimension, and $b$ is the bias used for the classification of samples. The maximization of the margin is equivalent to minimizing the norm of $w$. Thus, to find $w$ and $b$, the following optimization problem is solved:

$$ \min_{w,b,\xi}\ \tfrac{1}{2}\,\|w\|^2 + C \sum_{i=1}^{n} \xi_i \quad \text{s.t.} \quad y_i\big(w \cdot \phi(x_i) + b\big) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \dots, n, $$

where $C$ is a strictly positive parameter that determines the trade-off between the maximum margin and the minimum classification error, $n$ is the total number of samples, and $\xi_i$ is the error magnitude of the classification.
IJREAS
ISSN: 2249-3905
The conditions ensure that no training example lies within the margins. The number of training errors and of examples within the margins is controlled by the minimization of the term $C \sum_{i=1}^{n} \xi_i$.

The solution to the previous minimization problem gives the decision frontier

$$ f(x) = \sum_{i} \alpha_i y_i\, \phi(x_i) \cdot \phi(x) + b, $$

where each $\alpha_i$ is a Lagrange coefficient. As mentioned before, the role of the kernel function is to implicitly map the input vector into a high-dimensional feature space to achieve better separability. In this study the polynomial kernel is used, since it is a global kernel: global kernels allow data points that are far away from each other to have an influence on the kernel values as well. The polynomial kernel is

$$ K(x, x_i) = \phi(x_i) \cdot \phi(x) = \big((x_i \cdot x) + 1\big)^d, $$

where the kernel parameter $d$ is the degree of the polynomial to be used. In this study, $d$ is set to 2. Finally, the optimal decision separating function is obtained by substituting this kernel into the decision frontier above.
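As an illustration, the following sketch fits a soft-margin SVM with this degree-2 polynomial kernel using scikit-learn; the data, feature dimension and parameter values are hypothetical placeholders, not the paper's actual inputs.

```python
# A minimal sketch of the classifier described above: a soft-margin SVM
# with the polynomial kernel K(x, x_i) = ((x . x_i) + 1)^d, d = 2.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))                     # 200 samples, 4 indicator features
y = np.where(X[:, 0] * X[:, 1] > 0, 1, -1)        # toy labels in {-1, +1}

# C controls the trade-off between margin width and classification error;
# degree=2, coef0=1 and gamma=1 give the kernel ((x . x_i) + 1)^2 named above.
clf = SVC(kernel="poly", degree=2, coef0=1.0, gamma=1.0, C=1.0)
clf.fit(X, y)
print("training accuracy:", clf.score(X, y))
```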
Our main results show that stock market prediction based on support vector regression significantly outperforms a random stock market prediction. However, the prediction is on average correct in only 50.69 percent of cases, with a standard deviation of 0.26 percent.

We now present the basic theory of the support vector machine model. Let $D$ be the smallest radius of the sphere that contains the data (example vectors). The points on either side of the separating hyperplane have distances to the hyperplane; the smallest such distance is called the margin of separation. The hyperplane is called the optimal separating hyperplane (OSH) if the margin is maximized. Let $q$ be the margin of the optimal hyperplane. The points that are at distance $q$ from the OSH are called the support vectors. Consider the problem of separating a set of training vectors belonging to two separate classes, $G = \{(x_i, y_i),\ i = 1, 2, \dots, N\}$, with a hyperplane $w^T \phi(x) + b = 0$, where $x_i \in \mathbb{R}^n$ is the $i$th input vector and $y_i \in \{-1, 1\}$ is the known binary target. The original SVM classifier satisfies the following conditions:

$$ w^T \phi(x_i) + b \ge 1 \quad \text{if } y_i = 1, \qquad (1) $$
$$ w^T \phi(x_i) + b \le -1 \quad \text{if } y_i = -1, \qquad (2) $$

or, equivalently,

$$ y_i\,\big[w^T \phi(x_i) + b\big] \ge 1, \quad i = 1, 2, \dots, N, \qquad (3) $$

where $\phi: \mathbb{R}^n \to \mathbb{R}^m$ is the feature map, mapping the input space to a usually high-dimensional feature space where the data points become linearly separable. The distance of a point $x_i$ from the hyperplane is

$$ d(x_i; w, b) = \frac{\big|w^T \phi(x_i) + b\big|}{\|w\|}. \qquad (4) $$

The margin is $2/\|w\|$ according to its definition. Hence, we can find the hyperplane that optimally separates the data by solving the optimization problem

$$ \min\ \Phi(w) = \tfrac{1}{2}\,\|w\|^2 \qquad (5) $$

under the constraints of Eq. (3). The solution to the above optimization problem is given by the saddle point of the Lagrange function

$$ L(w, b, \alpha) = \tfrac{1}{2}\,\|w\|^2 - \sum_{i=1}^{N} \alpha_i \big( y_i\big[w^T \phi(x_i) + b\big] - 1 \big) \qquad (6) $$

under the constraints of Eq. (3), where the $\alpha_i$ are the non-negative Lagrange multipliers. So far, the discussion has been restricted to the case where the training data are separable. To generalize the problem to the non-separable case, slack variables $\xi_i$ are introduced such that

$$ y_i\,\big[w^T \phi(x_i) + b\big] \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, 2, \dots, N. \qquad (7) $$
The sum $\sum_{i=1}^{N} \xi_i$ is an upper bound on the number of training errors. Hence, a natural way to assign an extra cost to errors is to change the objective function from Eq. (5) to

$$ \Phi(w, \xi) = \tfrac{1}{2}\,\|w\|^2 + C \sum_{i=1}^{N} \xi_i \qquad (8) $$

under the constraints of Eq. (7), where $C$ is a positive constant parameter used to control the trade-off between the training error and the margin. In this paper, we choose $C = 50$ based on our experimental experience. Similarly, we solve the optimal problem by minimizing its Lagrange function

$$ L(w, b, \xi, \alpha, \nu) = \tfrac{1}{2}\,\|w\|^2 + C \sum_{i=1}^{N} \xi_i - \sum_{i=1}^{N} \alpha_i \big( y_i\big[w^T \phi(x_i) + b\big] - 1 + \xi_i \big) - \sum_{i=1}^{N} \nu_i \xi_i \qquad (9) $$

under the constraints of Eq. (7), where $\alpha_i, \nu_i$ are the non-negative Lagrange multipliers. The Karush–Kuhn–Tucker (KKT) conditions [16] for the primal problem are

$$ \frac{\partial L}{\partial w} = w - \sum_{i=1}^{N} \alpha_i y_i \phi(x_i) = 0, \qquad (10) $$
$$ \frac{\partial L}{\partial b} = -\sum_{i=1}^{N} \alpha_i y_i = 0, \qquad (11) $$
$$ \frac{\partial L}{\partial \xi_i} = C - \alpha_i - \nu_i = 0, \qquad (12) $$
$$ y_i\,\big[w^T \phi(x_i) + b\big] - 1 + \xi_i \ge 0, \qquad (13) $$
$$ \xi_i \ge 0, \qquad (14) $$
$$ \alpha_i \ge 0, \qquad (15) $$
$$ \nu_i \ge 0, \qquad (16) $$
$$ \alpha_i \big( y_i\big[w^T \phi(x_i) + b\big] - 1 + \xi_i \big) = 0, \qquad (17) $$
$$ \nu_i \xi_i = 0. \qquad (18) $$

Hence,

$$ w = \sum_{i=1}^{N} \alpha_i y_i \phi(x_i). \qquad (19) $$

We can use the KKT complementarity conditions, Eqs. (17) and (18), to determine $b$. Note that Eq. (12) combined with Eq. (18) shows that $\xi_j = 0$ if $\alpha_j < C$. Thus, we can simply take any training data point for which $0 < \alpha_j < C$ and use Eq. (17) (with $\xi_j = 0$) to compute

$$ b = y_j - w^T \phi(x_j). \qquad (20) $$

It is numerically reasonable to take the mean value of all $b$ resulting from such computations:
$$ b = \frac{1}{N_s} \sum_{j:\ 0 < \alpha_j < C} \big( y_j - w^T \phi(x_j) \big), \qquad (21) $$

where $N_s$ is the number of support vectors. For a new data point $x$, the classification function is then given by

$$ f(x) = \operatorname{sign}\big( w^T \phi(x) + b \big). \qquad (22) $$
Substituting Eqs. (19) and (21) into Eq. (22), we get the final classification function

$$ f(x) = \operatorname{sign}\Big( \sum_{i=1}^{N} \alpha_i y_i\, \phi(x_i)^T \phi(x) + b \Big). \qquad (23) $$

If there is a kernel function such that $K(x_i, x_j) = \phi(x_i)^T \phi(x_j)$, it is usually unnecessary to know $\phi(x)$ explicitly, and we only need to work with the kernel function in the training algorithm. Therefore, the non-linear classification function is

$$ f(x) = \operatorname{sign}\Big( \sum_{i=1}^{N} \alpha_i y_i\, K(x_i, x) + b \Big). \qquad (24) $$

Any function satisfying Mercer's condition [17] can be used as the kernel function. In this investigation, the radial kernel $K(s, t) = \exp(-\|s - t\|^2 / 10)$ is used as the kernel function of the SVM, because the radial kernel tends to give good performance under general smoothness assumptions. Consequently, it is especially useful if no additional knowledge of the data is available.
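The correspondence between Eq. (24) and a fitted classifier can be checked numerically. The sketch below, on hypothetical data, recovers the decision value from the dual coefficients $\alpha_i y_i$, the support vectors and the bias $b$ exposed by scikit-learn's SVC, using the same radial kernel with gamma = 0.1.

```python
# A sketch checking Eq. (24): after fitting an SVM with the radial kernel
# K(s, t) = exp(-||s - t||^2 / 10), the decision value is rebuilt from the
# dual coefficients alpha_i * y_i, the support vectors and the bias b.
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 2))
y = np.where(X[:, 0] ** 2 + X[:, 1] ** 2 > 1.0, 1, -1)

clf = SVC(kernel="rbf", gamma=0.1, C=50.0).fit(X, y)    # C = 50 as in the paper

x_new = rng.normal(size=(5, 2))
K = rbf_kernel(x_new, clf.support_vectors_, gamma=0.1)  # K(x, x_i)
manual = K @ clf.dual_coef_.ravel() + clf.intercept_    # sum_i alpha_i y_i K + b
assert np.allclose(manual, clf.decision_function(x_new))
print("f(x) =", np.sign(manual))                        # Eq. (24)
```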
3. EXPERIMENT DESIGN

Several financial indicators are calculated in order to reduce the dimensionality of the time series:

$\mathrm{RDP}(t) = \dfrac{p(t) - p(t-1)}{p(t-1)}$: the relative price difference of the prices $p(t)$ at time $t$ and $p(t-1)$ at time $t-1$.

$\mathrm{EMA}_\lambda(t) = \lambda\, p(t) + (1 - \lambda)\, \mathrm{EMA}_\lambda(t-1)$: the exponential moving average of the prices $p(t)$.

$\mathrm{RSI}(t) = 100 \cdot \dfrac{U[t-n;t]}{U[t-n;t] + D[t-n;t]}$: the relative strength indicator of the number of upward movements $U[t-n;t]$ and downward movements $D[t-n;t]$ in the period from $t-n$ until time $t$.

$\mathrm{Stoch}(t) = \dfrac{p(t) - L[t-n;t]}{H[t-n;t] - L[t-n;t]}$: the stochastic indicator of the stock price $p(t)$, the lowest stock price $L[t-n;t]$ and the highest stock price $H[t-n;t]$ in the period from $t-n$ until time $t$.

The figure illustrates some of the properties of the indicators derived as above from a random time series. The kernel densities are estimated for each indicator with a bandwidth of 0.001.
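A sketch of how the four indicators could be computed is given below; it assumes the standard textbook definitions written above, and the window length n and EMA span are illustrative choices, not values taken from the paper.

```python
# A sketch of the four indicators under the standard definitions assumed
# above; `prices` is a hypothetical 1-D array of closing prices.
import numpy as np
import pandas as pd

def indicators(prices: np.ndarray, n: int = 14, span: int = 10) -> pd.DataFrame:
    p = pd.Series(prices, dtype=float)
    rdp = p.pct_change()                             # (p(t) - p(t-1)) / p(t-1)
    ema = p.ewm(span=span, adjust=False).mean()      # exponential moving average
    diff = p.diff()
    up = diff.clip(lower=0).rolling(n).sum()         # U[t-n; t]
    down = (-diff.clip(upper=0)).rolling(n).sum()    # D[t-n; t]
    rsi = 100 * up / (up + down)                     # relative strength indicator
    low, high = p.rolling(n).min(), p.rolling(n).max()
    stochastic = (p - low) / (high - low)            # stochastic %K
    return pd.DataFrame({"RDP": rdp, "EMA": ema, "RSI": rsi, "Stoch": stochastic})

rng = np.random.default_rng(0)
feats = indicators(100 + np.cumsum(rng.normal(size=500)))  # random-walk example
print(feats.dropna().head())
```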
Note that the RDP and EMA indicators are rather Gaussian distributed, while the RSI and stochastic indicators have several modes; the stochastic indicator, in particular, seems to be a mixture of two different Gaussian distributions.

In our empirical analysis, we set out to examine the weekly changes of the BSE30 Index. The BSE30 Index is calculated and disseminated by the Bombay Stock Exchange (BSE). It measures the composite price performance of 30 highly capitalized stocks trading on the BSE, representing a broad cross-section of Indian industries. Trading in the index has gained unprecedented popularity in major financial markets around the world. Futures and options contracts on the BSE30 Index are currently traded on Indian exchanges such as the National Stock Exchange (NSE), under the regulation of the Securities and Exchange Board of India (SEBI). The increasing diversity of financial instruments related to the BSE30 Index has broadened the dimension of global investment opportunity for both individual and institutional investors. There are two basic reasons for the success of these index trading vehicles. First, they provide an effective means for investors to hedge against potential market risks. Second, they create new profit-making opportunities for market speculators and arbitrageurs. Therefore, it has profound implications and significance for researchers and practitioners alike to accurately forecast the movement direction of the BSE30 Index.
An important factor that affects Indian exports is the exchange rate of the US Dollar against the Indian Rupee (Rs), which is also selected as a model input. The prediction model can be written as the following function:

$$ \mathrm{Direction}_t = F\big( \Delta S_{t-1}^{S\&P500},\ \Delta S_{t-1}^{IND} \big), \qquad (25) $$

where $\Delta S_{t-1}^{S\&P500}$ and $\Delta S_{t-1}^{IND}$ are the first-order differences of the natural logarithm of the raw S&P 500 index and the IND exchange-rate series at time $t-1$, respectively. Such transformations implement an effective detrending of the original time series. $\mathrm{Direction}_t$ is a categorical variable indicating the movement direction of the BSE30 Index at time $t$: if the BSE30 Index at time $t$ is larger than that at time $t-1$, $\mathrm{Direction}_t$ is $1$; otherwise, $\mathrm{Direction}_t$ is $-1$.
Fig. 1. First-order differences of the natural logarithm of weekly prices of the BSE Index and the S&P 500 Index (observations from October 2010 to September 2011).
The above selection of model inputs is based only on a macroeconomic analysis. As shown in Fig. 1, the behaviours of the BSE30 Index and the S&P 500 Index are very complex. It is impossible to give an explicit formula to describe the underlying relationship between them.
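The input construction of Eq. (25) can be sketched as follows; the three weekly series are hypothetical placeholders, and the helper name make_dataset is invented for illustration.

```python
# A sketch of the inputs of Eq. (25): first-order differences of the natural
# logarithm of the weekly S&P 500 and exchange-rate series, and the
# categorical direction of the BSE30 index.
import numpy as np

def make_dataset(bse30, sp500, ind):
    """bse30, sp500, ind: 1-D arrays of weekly levels, aligned in time."""
    d_sp500 = np.diff(np.log(sp500))                 # detrended S&P 500 input
    d_ind = np.diff(np.log(ind))                     # detrended IND input
    direction = np.where(np.diff(bse30) > 0, 1, -1)  # 1 if the index rose, else -1
    # Features at t-1 predict the direction at t:
    X = np.column_stack([d_sp500[:-1], d_ind[:-1]])
    y = direction[1:]
    return X, y

rng = np.random.default_rng(0)
levels = lambda: 100 * np.exp(np.cumsum(0.01 * rng.normal(size=695)))
X, y = make_dataset(levels(), levels(), levels())    # placeholder weekly series
print(X.shape, y[:5])
```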
3.1. Data collection

We obtain the historical data from the finance section of Yahoo and from the Bombay Stock Exchange and the National Stock Exchange, respectively. The whole data set covers the period from January 1, 2007 to December 31, 2010, a total of 694 pairs of observations. The data set is divided into two parts. The first part (652 pairs of observations) is used to determine the specifications of the models and their parameters. The second part (42 pairs of observations) is reserved for the out-of-sample evaluation and comparison of performance among the various forecasting models.
3.2. Comparisons with other forecasting methods

To evaluate the forecasting ability of SVM, we use the random walk model (RW) as a benchmark for comparison. RW is a one-step-ahead forecasting method, since it uses the current actual value to predict the future value as follows:

$$ \hat{y}_{t+1} = y_t, \qquad (26) $$

where $y_t$ is the actual value in the current period $t$ and $\hat{y}_{t+1}$ is the predicted value for the next period. We also compare the SVM's forecasting performance with that of linear discriminant analysis (LDA), quadratic discriminant analysis (QDA) and Elman backpropagation neural networks (EBNN). LDA can handle the case in which the within-class frequencies are unequal, and its performance has been examined on randomly generated test data. This method maximizes the ratio of between-class variance to within-class variance in any particular data set, thereby guaranteeing maximal separability. QDA is similar to LDA, only dropping the assumption of equal covariance matrices; therefore, the boundary between two discrimination regions is allowed to be a quadratic surface (for example, an ellipsoid or a hyperboloid) in the maximum-likelihood argument with normal distributions. In this paper, writing $x_{t-1} = (\Delta S_{t-1}^{S\&P500}, \Delta S_{t-1}^{IND})$, we derive a linear discriminant function of the form

$$ L(x_{t-1}) = a_0 + a_1\, \Delta S_{t-1}^{S\&P500} + a_2\, \Delta S_{t-1}^{IND} \qquad (27) $$

and a quadratic discriminant function of the form

$$ Q(x_{t-1}) = a + P\, x_{t-1}^T + x_{t-1}\, T\, x_{t-1}^T, \qquad (28) $$

where $a_0$, $a_1$, $a_2$, $a$, $P$ and $T$ are coefficients to be estimated. The Elman backpropagation neural network is a partially recurrent neural network. Its connections are mainly feed-forward, but they also include a set of carefully chosen feedback connections that let the network remember cues from the recent past. The input layer is divided into two parts: the true input units and the context units, which hold a copy of the activations of the hidden units from the previous time step. Therefore, network activation produced by past inputs can cycle back and affect the processing of future inputs.

3.3. A combining model

Given a task that requires expert knowledge to perform, $k$ experts may be better than one if their individual judgments are appropriately combined. Based on this idea, predictive
performance can be improved by combining various methods. Therefore, we propose a combining model that integrates SVM with the other classification methods as follows:

$$ f_{\mathrm{combined}}(x) = \sum_{i} w_i\, f_i(x), \qquad (29) $$

where $w_i$ is the weight assigned to classification method $i$ and $f_i(x)$ is its output. We would like to determine the weight scheme based on information from the training phase. Under this strategy, the relative contribution of a forecasting method to the final combined score depends on the in-sample forecasting performance of the learned classifier in the training phase. Conceptually, a well-performing forecasting method should be given a larger weight than the others during the score combination. In this investigation, we adopt the following weight scheme:

$$ w_i = \frac{a_i}{\sum_{j} a_j}, \qquad (30) $$

where $a_i$ is the in-sample performance of forecasting method $i$.
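A sketch of Eqs. (29) and (30) is given below, assuming the in-sample performance $a_i$ is the hit ratio on the training set and the classifiers follow the scikit-learn predict convention with outputs in $\{-1, +1\}$.

```python
# A sketch of the combining model of Eqs. (29)-(30): each fitted classifier
# is weighted by its in-sample hit ratio, normalised to sum to one, and the
# weighted scores are mapped back to a {-1, +1} direction.
import numpy as np

def combine(classifiers, X_train, y_train, X_test):
    # a_i: in-sample hit ratio of method i on the training set
    a = np.array([(clf.predict(X_train) == y_train).mean() for clf in classifiers])
    w = a / a.sum()                                        # Eq. (30)
    scores = sum(w_i * clf.predict(X_test)                 # Eq. (29)
                 for w_i, clf in zip(w, classifiers))
    return np.where(scores >= 0, 1, -1)                    # combined direction
```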
Table 1: Forecasting performance of the different classification methods (RW, LDA, QDA, EBNN, SVM, and the combining model); the hit ratios are discussed in Section 4.

Table 2: Covariance matrix of the input variables when Direction_t = −1.

                           ΔS^{IND}_{t−1}     ΔS^{S&P500}_{t−1}
ΔS^{IND}_{t−1}             0.00015167706      0.00002147347
ΔS^{S&P500}_{t−1}          0.00002147347
4. EXPERIMENT RESULTS

Each of the forecasting models described in the previous section is estimated and validated on the in-sample data. The model estimation and selection process is then followed by an empirical evaluation based on the out-of-sample data. At this stage, the relative performance of the models is measured by the hit ratio. Table 1 shows the experiment results.
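How such a table could be produced is sketched below on placeholder data, with scikit-learn stand-ins for LDA, QDA and the SVM, and the random-walk benchmark taken as the previous realised direction; the numbers printed by this sketch are not the paper's results.

```python
# A sketch of the hit-ratio evaluation, with a placeholder data set shaped
# like the split in Section 3.1 (652 in-sample, 42 out-of-sample pairs).
import numpy as np
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                           QuadraticDiscriminantAnalysis)
from sklearn.svm import SVC

def hit_ratio(y_true, y_pred):
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))

rng = np.random.default_rng(2)
X = rng.normal(size=(694, 2))                                # placeholder inputs
y = np.where(X @ np.array([0.8, -0.5]) > 0.1 * rng.normal(size=694), 1, -1)
X_train, y_train, X_test, y_test = X[:652], y[:652], X[652:], y[652:]

models = {
    "LDA": LinearDiscriminantAnalysis(),
    "QDA": QuadraticDiscriminantAnalysis(),
    "SVM": SVC(kernel="rbf", gamma=0.1, C=50.0),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, hit_ratio(y_test, model.predict(X_test)))
# RW benchmark: the predicted direction at t is the realised direction at t-1
print("RW", hit_ratio(y_test[1:], y_test[:-1]))
```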
RW performs worst, producing only a 50% hit ratio. RW assumes not only that all historic information is summarized in the current value, but also that increments (positive or negative) are uncorrelated (random) and balanced, that is, with an expected value equal to zero. In other words, in the long run there are as many positive as negative fluctuations, making long-term predictions other than the trend impossible. SVM has the highest forecasting accuracy among the individual forecasting methods. One reason that SVM performs better than the earlier classification methods is that SVM is designed to minimize the structural risk, whereas the previous techniques are usually based on the minimization of empirical risk. In other words, SVM seeks to minimize an upper bound of the generalization error rather than the training error, so SVM is usually less vulnerable to the overfitting problem. QDA outperforms LDA in terms of hit ratio, because LDA assumes that all the classes have equal covariance matrices, which is not consistent with the properties of the input variables belonging to the different classes, as shown in Tables 2 and 3. In fact, the two classes have different covariance matrices, so heteroscedastic models are more appropriate than homoscedastic models. The integration of SVM with the other forecasting methods improves the forecasting performance. Different classification methods typically have access to different information and therefore produce different forecasting results. Given this, we can combine the individual forecasters' various information sets to produce a single superior information set, from which a single superior forecast can be produced.
Table 3: Covariance matrix of the input variables when Direction_t = 1.

                           ΔS^{IND}_{t−1}     ΔS^{S&P500}_{t−1}
ΔS^{IND}_{t−1}             0.00018240800      −0.00002932242
ΔS^{S&P500}_{t−1}          −0.00002932242
The method of support vector regression involves several parameters to be chosen, which can, for example, be optimized using cross-validation. These parameters include the chosen kernel with its parameter γ, the ε of the ε-insensitive loss function, the cost of error C, and the number of training samples. The advantage of using a kernel is sometimes to be able to linearly classify otherwise inseparable cases, as shown at the top of the figure. In this case, the black- and white-labelled points on the left side are not linearly separable. After the kernel transformation, however, the black- and white-labelled points might fall onto the same point in the new space. Here, the classification problem becomes
trivial. Therefore, choosing a kernel is of high importance, as is the parameter of the kernel function. Another parameter is the ε of the ε-insensitive loss function, which is illustrated at the bottom of the figure. The support vector regression model is trained by placing a penalty on values that are off target. The penalty depends on the ε-insensitive loss function with parameter ε: the idea is to penalize values off target only if the difference is higher than the absolute value of ε. Given the kernel $K(x_i, x_j) = \phi(x_i)^T \phi(x_j)$ and the training set of instance-label pairs $(x_i, y_i)$, $i = 1, \dots, l$, where $x_i \in \mathbb{R}^n$ and $y_i \in \{1, -1\}$, the optimization problem of the support vector machine can be formulated as

$$ \min_{w,b,\xi}\ \tfrac{1}{2}\,\|w\|^2 + C \sum_{i=1}^{l} \xi_i \quad \text{subject to} \quad y_i\big( w^T \phi(x_i) + b \big) \ge 1 - \xi_i, \quad \xi_i \ge 0. $$
The support vector machine then maximizes the margin $2/\|w\|$ of the hyperplane separating the classes, which is equivalent to minimizing $\|w\|$ and therefore also to minimizing $\|w\|^2 / 2$.
Cross-validation
Figure 4: Cross-validation setup. Several parameter values are tested for prediction accuracy on a training set, from which the optimal parameter combination is then chosen for further prediction on the test set.
Since the SVR parameters can be easily controlled manually, the optimal set of parameters is chosen by cross-validation and then used for prediction on the test set. The cross-validation is applied as illustrated in the figure. The total data set is divided into two parts, one for cross-validation and one for testing. A third part of the data set, for optimizing the structure of the model (such as the indicators used), is omitted in this study. In order to optimize the number of training samples, the cost of error C, the kernel parameter γ and the parameter ε of the ε-insensitive loss function, a k-fold cross-validation is used as follows: the data set is divided into k folds of equal size; subsequently, a model is built on all k combinations of k−1 folds, and each time the remaining fold is used for validation. The best model is the one that performs best on average over the k validation folds.
The benefit of using a cross-validation procedure is that, by construction, it ensures that model selection is based entirely on out-of-sample rather than in-sample performance. Thus, the search for the best support vector regression model is immune to the critique of drawing conclusions about the merits of a factor model from its in-sample performance. In this study, a 10-fold cross-validation procedure was used for each parameter above. In each validation loop, different values for one parameter are chosen, while the other parameters are held constant. Then the SVR model is trained with this set of parameters and the prediction accuracy is calculated. This is done for all parameter combinations, and the combination with the maximal prediction accuracy is then chosen.
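A sketch of this parameter search with scikit-learn's GridSearchCV is shown below; the parameter grids are illustrative, and the default R² scoring of SVR stands in for the prediction-accuracy criterion described above.

```python
# A sketch of the 10-fold cross-validation over C, the kernel parameter
# gamma and the epsilon of the epsilon-insensitive loss. X and y are
# placeholder arrays shaped like indicator features and returns.
import numpy as np
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.svm import SVR

rng = np.random.default_rng(3)
X = rng.normal(size=(500, 4))                              # indicator features
y = X @ rng.normal(size=4) + 0.1 * rng.normal(size=500)    # placeholder returns

param_grid = {
    "C": [1, 10, 100, 1000],           # cost of error
    "gamma": [0.01, 0.1, 1.0, 10.0],   # kernel parameter
    "epsilon": [0.001, 0.01, 0.1],     # e-insensitive zone
}
search = GridSearchCV(SVR(kernel="rbf"), param_grid,
                      cv=KFold(n_splits=10, shuffle=False))
search.fit(X, y)
print(search.best_params_)             # combination with the best CV score
```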
Basic model

Figure 5: The basic model. The machine is trained on the past values of the indicators. The resulting model is used to predict the movement on the next day (= 108 data points). After that, the model is shifted and the procedure is repeated.
The basic simulation consists of two steps. First, at time t, all historical values of all explanatory factors, together with the differences in returns for the periods t − n1 until t − 1, are used to build the support vector regressions. The dependent variable is thus the return of the stock in the period from t until t + n2. The variable n2 is arbitrarily set to 108 in order to decrease calculation time. The independent variables are the technical indicators described above. Second, once the prediction is calculated, the model is shifted by 108 data points and built again in order to predict the next 108 stock price movements. Using only historically available data ensures that the implementation of the trading strategy is carried out without the benefit of foresight, in the sense that investment decisions are not based on data that became available after any of the to-be-predicted periods.
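The two-step simulation can be sketched as a walk-forward loop; the feature and return arrays are assumed given, and min_train is an invented warm-up length.

```python
# A sketch of the basic simulation: retrain on all history available at time
# t, predict the next block of n2 = 108 points, then shift and repeat.
import numpy as np
from sklearn.svm import SVR

def walk_forward(features, returns, n2=108, min_train=1000):
    preds = np.full(len(returns), np.nan)
    t = min_train
    while t < len(returns):
        model = SVR(kernel="rbf", gamma=1.0, C=1000, epsilon=0.001)
        model.fit(features[:t], returns[:t])           # only past data
        end = min(t + n2, len(returns))
        preds[t:end] = model.predict(features[t:end])  # next 108 movements
        t = end                                        # shift the window
    return preds                                       # sign gives the direction
```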
Moreover, investment decisions for the to-be-predicted periods are always based on the entire factor set of historical data, ensuring that no variable-selection procedures based on extensive manipulation of the whole available data have been carried out. At any rate, the cross-validation procedure used for model selection ensures that the best candidate model is selected on the basis of its performance on the training set and not on the basis of its performance on external validation samples.

RESULTS AND DISCUSSION

The data set consists of 5-minute closing prices p(t) for 28 stocks of the BSE Sensex. The missing stocks are Satyam and Hypo Real Estate, due to data unavailability. With a time frame of nearly 7 years between April 2004 and August 2010, the data set comprises 140,000 data points per stock. From this data set, the log return of each stock i is calculated as $x_i(t) = \ln p_i(t) - \ln p_i(t-1)$, and the market return $x_{market}(t)$ is computed as the average of $x_i(t)$ over all stocks i. From this, the log return above market is calculated as $x_i'(t) = x_i(t) - x_{market}(t)$ for each stock i.

Cross-validation

Several parameter values are chosen for each of the machine parameters. The cost and training-length parameters show linear dependencies, while the kernel parameter γ shows a quadratic dependency. The ε parameter has a rather non-linear dependence on the prediction accuracy. Several parameter conditions were tested on the first half of the data set. The figure shows the tested values for each parameter. The optimality criterion used here is the cumulated return: the model is trained with a parameter set, the prediction is calculated, and the return resulting from the prediction is cumulated over time. The parameter values are tested on half of the data set, that is, between May 2009 and July 2011. At the top left of the figure, the results for different values of the cost parameter are shown. With an increasing cost value, the cumulated return increases. This seems plausible, since with an increasing cost the model is trained longer. However, the parameter exploration is stopped at a cost value of 1000, since higher values increase computation time dramatically.
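As an aside, the return construction and the cumulated-return criterion described above can be sketched as follows, with simulated prices and placeholder predictions standing in for the real data and model output.

```python
# A sketch combining the return construction and the optimality criterion:
# log returns above market are computed per stock, a prediction in {-1, +1}
# is turned into a position, and the resulting return is cumulated over time.
import numpy as np

rng = np.random.default_rng(4)
prices = 100 * np.exp(np.cumsum(0.001 * rng.normal(size=(1000, 28)), axis=0))

x = np.diff(np.log(prices), axis=0)        # x_i(t) = ln p_i(t) - ln p_i(t-1)
x_market = x.mean(axis=1, keepdims=True)   # average over all stocks i
x_above = x - x_market                     # x_i'(t) = x_i(t) - x_market(t)

positions = rng.choice([-1, 1], size=x_above.shape)   # placeholder predictions
cumulated_return = (positions * x_above).sum(axis=0)  # criterion per stock
print(cumulated_return.round(4))
```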
The top right of the figure shows different values of the parameter ε of the ε-insensitive loss function. Here the results seem to be rather non-linearly related to the cumulated return, since the cumulated return decreases with increasing ε only in general. Overall, however, smaller values of ε seem to be more successful. Since this value controls the penalty of the training algorithm, a small value leads to a quick penalty for values off target. The kernel parameter γ, plotted for different values at the bottom left, seems to approach an optimal value around 1. This parameter controls the shape of the kernel. With high parameter values, the kernel becomes rather flat and the model increasingly predicts future movements only linearly, which is obviously insufficient. With small parameter values, the kernel becomes very thin and the training data are increasingly overfitted, with decreasing generalization performance. This again results in low prediction
performance. Lastly, the prediction performance increases with an increasing number of training points; the quality of the trained model therefore grows with the number of training samples.

Prediction accuracy

The optimized parameters were tested with the basic model approach described above on the second half of the data set. The prediction accuracy over all 28 stocks reached a mean of 50.69 percent with a standard deviation of 0.26 percent. With this performance, the reported approach significantly outperformed a random prediction approach. Even if a gain of 0.69 percent may already be a valuable trading prediction, this approach is market-neutral and operates only on the basic statistical properties of market movements.
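The significance claim can be checked with a one-sample t-test against the 50% random benchmark; since only the summary statistics are reported, the 28 per-stock accuracies are simulated here under the stated mean and standard deviation.

```python
# A sketch of the significance check: 28 per-stock accuracies with mean
# 50.69% and standard deviation 0.26%, tested against the 50% benchmark.
# The per-stock values are simulated, as only summary statistics are given.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
acc = rng.normal(loc=0.5069, scale=0.0026, size=28)    # simulated accuracies
t_stat, p_value = stats.ttest_1samp(acc, popmean=0.5)
print(f"t = {t_stat:.1f}, p = {p_value:.2g}")          # far below 0.05
```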
5. CONCLUSIONS
In this paper, we study the use of support vector machines to predict the direction of financial movements. SVM is a promising type of tool for financial forecasting. As demonstrated in our empirical analysis, SVM is superior to the other individual classification methods in forecasting the weekly movement direction of the BSE30 Index. This is a clear message for financial forecasters and traders, one that can lead to a capital gain. However, each method has its own strengths and weaknesses; thus, we propose a combining model that integrates SVM with the other classification methods, so that the weakness of one method can be balanced by the strengths of another, achieving a systematic effect. The combining model performs best among all the forecasting methods.

The underlying time series were derived from the Bombay Stock Exchange Index. The support vector machine was then trained in order to predict the movement of 28 stocks of the
index against the market. Features for training were extracted directly from the statistical properties of the time series, and no fundamental information was used. The model selection was based on the performance on out-of-sample data, in order to avoid the critique of foresight, and was performed as cross-validation. The main result of this study is that the movement of stocks can be significantly predicted using only technical indicators with support vector regression.
6. REFERENCES
[1] Cristianini N, Shawe-Taylor J. An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. New York: Cambridge University Press; 2000.
[2] Cao LJ, Tay FEH. Financial forecasting using support vector machines. Neural Computing & Applications 2001;10:184-92.
[3] Tay FEH, Cao LJ. Application of support vector machines in financial time series forecasting. Omega 2001;29:309-17.
[4] Castanias RP. Macroinformation and the variability of stock market prices. Journal of Finance 1979;34:439.
[5] Schwert GW. The adjustment of stock prices to information about inflation. Journal of Finance 1981;36:15-29.
[6] Schwert GW. Stock returns and real activity: a century of evidence. Journal of Finance 1990;45:1237-57.
[7] Fama EF. Stock returns, real activity, inflation and money. American Economic Review 1981;71:545.
[8] Chen N-F, Roll R, Ross SA. Economic forces and the stock market. Journal of Business 1986;59:383-403.
[9] Hardouvelis GA. Macroeconomic information and stock prices. Journal of Economics and Business 1987;39:131-40.
[10] Darrat AF. Stock returns, money and fiscal deficits. Journal of Financial and Quantitative Analysis 1990;25:387-98.
[11] Blank SC. "Chaos" in futures markets? A nonlinear dynamical analysis. Journal of Futures Markets 1991;11:711-28.
[12] DeCoster GP, Labys WC, Mitchell DW. Evidence of chaos in commodity futures prices. Journal of Futures Markets 1992;12:291-305.
[13] Frank M, Stengos T. Measuring the strangeness of gold and silver rates of return. The Review of Economic Studies 1989;56:553-67.
[14] Frank M, Stengos T. Measuring the strangeness of gold and silver rates of return. The Review of Economic Studies 1989;56:553-67.
[15] Vapnik VN. Statistical Learning Theory. New York: Wiley; 1998.
[16] Vapnik VN. An overview of statistical learning theory. IEEE Transactions on Neural Networks 1999;10:988-99.