
Title: Cluster analysis of stocks using price movements of high frequency data from National Stock Exchange.

Abstract:

This paper aims to develop new techniques to describe the joint behavior of stocks, beyond regression and correlation. For example, we want to identify clusters of stocks that move together. Our work is based on applying Kernel Principal Component Analysis (KPCA) and Functional Principal Component Analysis (FPCA) to high frequency data from the National Stock Exchange (NSE). Since we deal with high frequency data with a tick size of 30 seconds, FPCA seems to be an ideal choice. FPCA is a functional variant of PCA in which each sample point is considered to be a function in the Hilbert space $L^2$. KPCA, on the other hand, is an extension of PCA using kernel methods. The results obtained from FPCA and Gaussian kernel PCA seem to be in synergy, but with a lag. Two prominent clusters showed up in our analysis, one corresponding to the banking sector and another corresponding to the IT sector. Other, smaller clusters were seen in the automobile industry and the energy sector. The IT sector was seen interacting with these small clusters. The learning gained from these interactions is substantial, as it can be used to develop trading strategies for intraday traders.

Keywords: Financial mathematics, statistics, high frequency trading, big data analytics, artificial

intelligence.

Authors: Charu Sharma, Ph.D. scholar, Assistant Professor, School of Natural Sciences, Shiv

Nadar University, UP; Prof. Amber Habib, Professor, School of Natural Sciences, Shiv Nadar

University, UP; Prof. Sunil Bowry, Professor, School of Management and Entrepreneurship,

Shiv Nadar University, UP.

Address: Charu Sharma, A111D, Shiv Nadar University, NH91, Tehsil Dadri, Gautam Buddha

Nagar, Uttar Pradesh – 201314. Phone No: +91-9911750311, Email: [email protected]

I. Introduction

According to a report by Brookernotes, a UK-based company, about one-third of all the people who trade stocks online are from Asia. Of the 3.2 million traders in Asia, 570,000 are based in India. In view of this, understanding the interactions amongst stocks at a tick-by-tick level is the need of the hour. Among the various factors that influence the price of a stock, changes in the prices of other stocks are one of the most influential. Over the years, researchers have used techniques like regression analysis and correlation to understand the co-movements of stocks, and then only on daily rates of return. In this paper we try to exploit the interactions amongst stocks at the tick-by-tick level, where each tick is a 30-second mark. The techniques we use are extensions of a well-known classification technique called Principal Component Analysis (PCA). When the sample points in the working dataset can be regarded as functions, then instead of using the usual PCA for classification, one can use its functional analogue, called functional PCA (FPCA). The last two decades have seen tremendous development in functional data analysis (FDA), a branch of statistics that treats each sample point as a function. Over this period, Ramsay and Silverman [2-7] have shown many real-world applications of FDA. Since we work with high frequency data, we can treat sample points as functions instead of discrete sets of values, and thus FPCA looked like a good choice for classifying the stocks. The second technique used in this paper is kernel principal component analysis (KPCA). This method is used to exploit the nonlinearity of the data, if any. The data set is mapped to a higher dimensional space where the new set of points obeys linearity, and thus PCA can be performed on this new set.

II. PCA, FPCA, KPCA

Principal Component Analysis (PCA) was introduced in the early 20th century by Karl Pearson. PCA is a data reduction and classification technique. If we have $n$ sample points with $k$ features (usually $k > n$), we aim to find a basis for the subspace given by the linear span of these $n$ sample points in the feature space $\mathbb{R}^k$. Clearly, the dimension of this subspace is less than or equal to $n$. We also wish to order the basis elements so that the first basis element is the major factor that brings out differences amongst the sample points, the second basis element is the next major factor, and so on. It turns out that the elements of such a basis are the eigenvectors of the covariance matrix in the feature space, ordered by decreasing eigenvalue. Thus one carries out an SVD of the covariance matrix to find its eigenvectors and the corresponding eigenvalues. These basis elements are known as principal components.
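
The construction above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the function name `pca` and the toy data are hypothetical, and a symmetric eigendecomposition (`numpy.linalg.eigh`) is used in place of an SVD, which gives the same eigenvectors for a covariance matrix.

```python
import numpy as np

def pca(X):
    """PCA of an (n_samples, k_features) array via the covariance matrix.

    Returns the eigenvalues in descending order, the principal components
    as columns, and the scores (coordinates of the centred samples).
    """
    Xc = X - X.mean(axis=0)                  # centre each feature
    cov = Xc.T @ Xc / (X.shape[0] - 1)       # k x k sample covariance
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]        # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Xc @ eigvecs                    # data expressed in the new basis
    return eigvals, eigvecs, scores

# Toy data (hypothetical): 50 samples, 3 features with very different spreads
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3)) * np.array([5.0, 1.0, 0.2])
eigvals, components, scores = pca(X)
explained = eigvals / eigvals.sum()          # share of variance per component
```

The variance of each score column equals the corresponding eigenvalue, which is what makes the ordering by eigenvalue an ordering by "major factor".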

Now consider the case when the features can be treated as a continuum, for example a quantity being measured at regular intervals of time, be it every second or every minute. The data can then be regarded as a smooth function instead of a discrete set of values. In this case, our $n$ sample points can be treated as $n$ functions, and if these functions are assumed to be in $L^2$, then one again finds a basis of the linear span of these $n$ sample points in $L^2$. The basis elements must, of course, be ordered in the same sense as in PCA. This variant of PCA is called FPCA. Since we are dealing with high frequency data sampled every 30 seconds, FPCA seems reasonable to use.
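
One simple way to see what FPCA computes is to discretise it on the sampling grid. The sketch below assumes curves observed on a common, equally spaced grid and uses the grid spacing as a quadrature weight for $L^2$ inner products. It is only an illustration: the function name `fpca_on_grid` and the toy curves are hypothetical, and a basis-expansion approach with smoothing (as in dedicated FDA packages) would differ in detail.

```python
import numpy as np

def fpca_on_grid(curves, dt):
    """Discretised FPCA for curves sampled on a common, equally spaced grid.

    curves : (n_curves, n_points) array, one row per sample function
    dt     : grid spacing, used as the quadrature weight for L2 integrals
    """
    centred = curves - curves.mean(axis=0)
    # Sample covariance surface cov(s, t) evaluated on the grid
    cov = centred.T @ centred / (curves.shape[0] - 1)
    # Discretised covariance operator: (C f)(s) ~ sum_t cov(s, t) f(t) dt
    eigvals, eigvecs = np.linalg.eigh(cov * dt)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    eigfuncs = eigvecs / np.sqrt(dt)          # so that sum(phi**2) * dt == 1
    scores = centred @ eigfuncs * dt          # L2 inner products <x_i - mean, phi_j>
    return eigvals, eigfuncs, scores

# Toy curves (hypothetical): random mixtures of two smooth modes plus small noise
rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 120, endpoint=False)
dt = t[1] - t[0]
a = rng.normal(scale=2.0, size=(30, 1))
b = rng.normal(scale=1.0, size=(30, 1))
curves = (a * np.sin(2 * np.pi * t) + b * np.cos(2 * np.pi * t)
          + rng.normal(scale=0.05, size=(30, 120)))
eigvals, eigfuncs, scores = fpca_on_grid(curves, dt)
share = eigvals[:2].sum() / eigvals.sum()     # variability explained by two components
```

For a trading day in this paper, each row would hold the 720 values of one stock and `dt` would correspond to 30 seconds.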

Once we have the basis, we express our data as linear combinations of the principal components. We then use a clustering algorithm such as k-means to cluster the data into different groups. However, k-means clustering at times fails to account for nonlinearity in the data, if present. Thus the next method we tried is the nonlinear extension of PCA called kernel principal component analysis, or KPCA. The idea is that even if the data are not linearly separable in $d < n$ dimensions, they can almost always be made linearly separable in higher dimensions. One defines a map $\phi: \mathbb{R}^d \to \mathbb{R}^N$, $N > d$, such that the image of our data under this map is linearly separable in $\mathbb{R}^N$. We have analysed the data by applying both FPCA and KPCA with Gaussian kernels, and we summarize the results below.
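
In practice the map $\phi$ is never computed explicitly; only the kernel matrix of pairwise similarities is needed. The following is a minimal NumPy sketch of Gaussian kernel PCA. The kernel parametrisation $\exp(-\lVert x - y \rVert^2 / (2\sigma^2))$, the function name `gaussian_kpca` and the two-ring toy data are assumptions made for illustration; the paper only states that a Gaussian kernel is used.

```python
import numpy as np

def gaussian_kpca(X, sigma=1.0, n_components=2):
    """Kernel PCA with kernel exp(-||x - y||^2 / (2 sigma^2)) (assumed form)."""
    sq_norms = (X ** 2).sum(axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    sq_dists = np.maximum(sq_dists, 0.0)          # guard against round-off
    K = np.exp(-sq_dists / (2.0 * sigma ** 2))
    n = K.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n           # centring matrix
    Kc = J @ K @ J                                # kernel of the centred features
    eigvals, eigvecs = np.linalg.eigh(Kc)
    order = np.argsort(eigvals)[::-1][:n_components]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Projection of training point i on component j is sqrt(lambda_j) * v_ij
    scores = eigvecs * np.sqrt(np.maximum(eigvals, 0.0))
    return scores, eigvals

# Toy data (hypothetical): two concentric rings, not linearly separable in the plane
rng = np.random.default_rng(0)
angles = rng.uniform(0.0, 2.0 * np.pi, size=40)
radii = np.repeat([1.0, 3.0], 20)
X = np.column_stack([radii * np.cos(angles), radii * np.sin(angles)])
scores, eigvals = gaussian_kpca(X, sigma=1.0, n_components=2)
```

The resulting `scores` play the same role as the PCA scores and can be passed to a clustering step.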

III. Data Description

We picked tick-by-tick data for the year 2014 from the National Stock Exchange. We started with the stocks listed in the CNX100 index for that year. Initially all 100 stocks listed in the CNX100 index were picked, but during the course of the analysis 11 stocks were dropped due to insufficient or missing data. The CNX100 index consists of the Nifty50 and the CNX Nifty Junior stocks. Table 1 gives the details of its composition. The market opens at 9 o'clock in the morning and is functional till 4 PM, but the active trading session runs from 9:30 AM to 3:30 PM. Considering this, we have taken data between 9:30 AM and 3:30 PM, 6 hours a day, in our analysis. Every 30 seconds is considered a tick, and thus each day has 720 tick points for each stock. For each stock, the volume weighted average price (VWAP) per 30 seconds is calculated and used for further analysis.
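
The VWAP of an interval is the traded value divided by the traded volume in that interval. A minimal sketch of the 30-second binning is given below; the function name `vwap_per_bin`, the input layout (seconds since 9:30 AM, price, volume per trade) and the choice to leave empty intervals as NaN are assumptions, since the paper does not describe how intervals without trades were handled.

```python
import numpy as np

def vwap_per_bin(seconds, prices, volumes, bin_seconds=30, n_bins=720):
    """Volume weighted average price per bin.

    seconds : seconds elapsed since the start of the session for each trade
    prices, volumes : trade price and traded quantity for each trade
    """
    seconds = np.asarray(seconds, dtype=float)
    prices = np.asarray(prices, dtype=float)
    volumes = np.asarray(volumes, dtype=float)
    bins = (seconds // bin_seconds).astype(int)
    keep = (bins >= 0) & (bins < n_bins)          # drop trades outside the session
    value = np.bincount(bins[keep], weights=(prices * volumes)[keep], minlength=n_bins)
    volume = np.bincount(bins[keep], weights=volumes[keep], minlength=n_bins)
    vwap = np.full(n_bins, np.nan)                # NaN marks intervals with no trades
    traded = volume > 0
    vwap[traded] = value[traded] / volume[traded]
    return vwap

# Five hypothetical trades: three in the first interval, one in each of the next two
vwap = vwap_per_bin(seconds=[0, 10, 29, 30, 65],
                    prices=[100, 102, 101, 103, 104],
                    volumes=[10, 30, 10, 5, 20])
```

With 6 hours of trading and 30-second bins, `n_bins` is 6 x 120 = 720, matching the number of daily ticks used in the paper.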
Industry Type               Number of stocks in CNX100
INDUSTRIAL MANUFACTURING    6
CEMENT & CEMENT PRODUCTS    4
SERVICES                    2
AUTOMOBILE                  10
CONSUMER GOODS              14
PHARMA                      10
FINANCIAL SERVICES          14
ENERGY                      10
METALS                      6
TELECOM                     3
CONSTRUCTION                2
CHEMICALS                   1
IT                          6
Table 1: Composition of index CNX100, 2014

Number of stocks considered              89
Number of active trading days in 2014    229
Number of daily ticks for each stock     720
Magnitude of data                        1,46,74,320
Table 2: Summary of the data

IV. Methodology and Analysis

In the past, many researchers have used the correlation coefficient to understand the network amongst stocks. We started our analysis with the same. Stocks are taken two at a time, and the Spearman rank correlation coefficient for each of the 3916 resulting pairs is calculated for every working day in 2014. Each day consists of 720 ticks. A sample of these correlation coefficients is shown in Figure 1, which gives the correlation coefficient between each pair of stocks for the first trading day of each month. Table 3 summarizes Figure 1.
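
The Spearman coefficient is the Pearson correlation of the ranks of the two series. A small sketch of the daily computation follows; the helper names and the toy series are hypothetical, and whether the coefficients were computed on VWAP levels or on returns is not stated in the paper, so the input here is simply "one row per stock".

```python
import numpy as np

def average_ranks(x):
    """Ranks 1..n, with tied values receiving the average of their ranks."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(1, len(x) + 1)
    _, inverse, counts = np.unique(x, return_inverse=True, return_counts=True)
    rank_sums = np.bincount(inverse, weights=ranks)
    return (rank_sums / counts)[inverse]

def spearman_matrix(series):
    """series: (n_stocks, n_ticks) array. Returns the matrix of Spearman coefficients."""
    ranked = np.vstack([average_ranks(row) for row in series])
    return np.corrcoef(ranked)

# Toy series (hypothetical): the second is a monotone transform of the first,
# the third is the first in reverse order
series = np.array([[1.0, 2.0, 3.0, 4.0, 5.0],
                   [1.0, 8.0, 27.0, 64.0, 125.0],
                   [5.0, 4.0, 3.0, 2.0, 1.0]])
corr = spearman_matrix(series)
n_pairs = 89 * 88 // 2            # number of distinct pairs among 89 stocks
```

For 89 stocks this yields the 3916 distinct pairs mentioned above.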

For most of these pairs, the correlation coefficient is observed to be less than 0.5; in fact, it can be as low as 0.2. One key observation is that on 8 of the 12 days, the maximum correlation coefficient, though small, occurs between PNB and Bank of Baroda. Two state-owned multinational banks are seen moving hand in hand even at a 30-second tick scale. We investigate this further by running the k-means algorithm on each of the 229 days to form clusters of the 89 stocks, using the distance $d(S_1, S_2) = 1 - \mathrm{corr}(S_1, S_2)$. We then calculate the Hamming distance for each of the 3916 pairs. The Hamming distance between two vectors of the same length is the number of positions at which the corresponding values differ. In our case it gives the number of days on which two stocks were in different clusters. We then pick the pairs that are together at least p% of the time (with p varying from 90% down to 50%) and mark an edge between them. In this way a network among the stocks is built up. Figure 2 gives the prominent subgraphs.
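
The network construction from daily cluster labels can be sketched as follows. The function names, stock names and labels in the example are hypothetical; the daily labels themselves are assumed to come from whichever clustering step precedes this one.

```python
import numpy as np
from itertools import combinations

def together_fraction(daily_labels):
    """daily_labels: (n_days, n_stocks) integer cluster labels, one row per day.

    Returns an (n_stocks, n_stocks) matrix with the fraction of days on which
    each pair of stocks shared a cluster.
    """
    labels = np.asarray(daily_labels)
    same = labels[:, :, None] == labels[:, None, :]   # (days, stocks, stocks)
    return same.mean(axis=0)

def network_edges(fraction, names, threshold):
    """Pairs that share a cluster on at least `threshold` of the days."""
    return [(names[i], names[j])
            for i, j in combinations(range(len(names)), 2)
            if fraction[i, j] >= threshold]

# Hypothetical labels for 3 stocks over 4 days
names = ["A", "B", "C"]
daily_labels = [[0, 0, 1],
                [1, 1, 0],
                [0, 0, 0],
                [2, 2, 1]]
fraction = together_fraction(daily_labels)
n_days = len(daily_labels)
hamming_a_c = round((1.0 - fraction[0, 2]) * n_days)  # days A and C were apart
edges = network_edges(fraction, names, threshold=0.7)
```

Because only the labels of two stocks on the same day are compared, the arbitrary numbering of clusters from one day to the next does not matter.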
Figure 1: Correlation coefficient matrices for the first trading day of each month, shown as images. The correlation coefficient between each pair is represented in grayscale, ranging from black (minimum) to white (maximum).

Day            Max corr.   Min corr.   No. of pairs with corr. > 0.5   No. of pairs with corr. > 0.6
Jan 1, 2014    0.2796      -0.1351     0                               0
Feb 3, 2014    0.3913      -0.1845     0                               0
Mar 3, 2014    0.4828      -0.2005     0                               0
Apr 1, 2014    0.6639      -0.153      10                              1
May 2, 2014    0.5162      -0.1552     2                               0
Jun 2, 2014    0.4596      -0.1532     0                               0
Jul 1, 2014    0.4029      -0.1196     0                               0
Aug 1, 2014    0.4951      -0.1489     0                               0
Sept 1, 2014   0.4045      -0.1535     0                               0
Oct 1, 2014    0.3941      -0.1441     0                               0
Nov 3, 2014    0.3662      -0.1798     0                               0
Dec 1, 2014    0.381       -0.1339     0                               0

Table 3: Summary of the correlation coefficients obtained from each of the 3916 pairs on the first trading day of each month.

Figure 2: Networks found with the correlation coefficient method for Hamming distance (a) < 30%, i.e. the stocks are together at least 70% of the time; similarly (b) < 45%. [Panel labels: 70% together; 55% together. Node labels recovered from the figure: Axis Bank, Yes Bank.]

It is evident that PNB and Bank of Baroda, two nationalized banks, move hand in hand on a large number of days: 162 out of 229, or about 71% of the time. The case is similar for Axis Bank and Yes Bank, with 131 out of 229 days, or about 57%. Though the numbers are not very impressive, one can still relate the occurrence of these pairs in the same clusters to the sector they come from.

We then investigate this further by running FPCA and KPCA with a Gaussian kernel with $\sigma = 1$ on the data for each of the 229 working days. The principal components that together explain at least 75% (and up to 92%) of the variability are picked, and the clustering procedure is applied to them. For each of the 229 days, we use k-means clustering to assign all 89 stocks to clusters. The Hamming distance is again used to form the graphs in the same way. Figures 3 and 4 give the prominent subgraphs for different values of p.
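
The two steps in this paragraph, choosing how many components to keep and clustering the scores, can be sketched as below. Everything here is illustrative: the function names are hypothetical, the number of clusters `k` is not given in the paper, and a deterministic farthest-point initialisation is used instead of the random starts of a typical k-means routine so that the example is reproducible.

```python
import numpy as np

def n_components_for(eigvals, target=0.75):
    """Smallest number of leading components whose cumulative share reaches target."""
    cumulative = np.cumsum(eigvals) / np.sum(eigvals)
    return int(np.searchsorted(cumulative, target) + 1)

def kmeans(points, k, n_iter=100):
    """Plain Lloyd iterations with farthest-point initialisation."""
    centres = [points[0]]
    for _ in range(1, k):
        d = np.min([((points - c) ** 2).sum(axis=1) for c in centres], axis=0)
        centres.append(points[np.argmax(d)])      # point farthest from chosen centres
    centres = np.array(centres, dtype=float)
    labels = None
    for _ in range(n_iter):
        dists = ((points[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break                                  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = points[labels == j]
            if len(members):                       # keep the old centre if a cluster empties
                centres[j] = members.mean(axis=0)
    return labels, centres

# Hypothetical eigenvalues and component scores forming two well-separated groups
kept = n_components_for(np.array([5.0, 3.0, 1.0, 1.0]), target=0.75)
rng = np.random.default_rng(0)
points = np.vstack([rng.normal(0.0, 0.1, size=(20, 2)),
                    rng.normal(10.0, 0.1, size=(20, 2))])
labels, centres = kmeans(points, k=2)
```

Running this once per day produces the matrix of daily labels from which the Hamming distances are computed.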

All the programming is done in Matlab and R. Specifically, for FPCA we have used the Matlab package fdaM (http://www.psych.mcgill.ca/misc/fda/downloads/FDAfuns/Matlab/).

Figure 3: Networks found with FPCA for Hamming distance (a) < 20%, i.e. the stocks are together more than 80% of the time; similarly (b) < 30%, (c) < 40%, (d) < 50%. [Panel titles: FPCA, 80% together; FPCA, >70% together; FPCA, 60% together; FPCA, 50% together.]

Figure 4: Networks found with KPCA for Hamming distance (a) < 10%, i.e. the stocks are together more than 90% of the time; similarly (b) < 15%, (c) < 20%, (d) < 25%, (e) < 30%. [Panel titles: KPCA, 90% together; KPCA, 85% together; KPCA, 80% together; KPCA, 75% together; KPCA, 70% together.]

P-values corresponding to the strength of the relationship are also calculated for all three methods. We performed hypothesis tests on the proportion of days on which two stocks were together in the same group. Table 4 summarizes the p-values obtained with the KPCA method for a one-tailed test of the null hypothesis that two stocks are together 80% of the time against the alternative that they are together more than 80% of the time. Table 5 compares the p-values from the different methods.

S1     S2               No. of times together in same cluster (KPCA)   z-statistic (KPCA)   p-value (KPCA)
PNB    Bank of Baroda   213                                            4.923098573          4.25923E-07
ICICI  Axis Bank        202                                            3.105847422          0.000948673
TCS    Infy             205                                            3.601461372          0.000158217
TCS    Wipro            198                                            2.445028822          0.007242028
TCS    Techm            198                                            2.445028822          0.007242028
Infy   Wipro            198                                            2.445028822          0.007242028
Infy   Techm            202                                            3.105847422          0.000948673
Table 4: Hypothesis testing for the 7 strongest pairs under KPCA. Test: one-tailed test of the hypothesis that two stocks are together 80% of the time against the alternative that they are together more than 80% of the time.
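
The z-statistics and p-values in Table 4 are consistent with the usual normal-approximation test for a single proportion, $z = (\hat{p} - p_0) / \sqrt{p_0 (1 - p_0) / n}$ with $n = 229$ days. The paper does not state the formula, so this is an inference from the reported numbers; the function name below is hypothetical.

```python
import math

def one_sided_proportion_test(days_together, n_days, p0):
    """Test H0: p = p0 against H1: p > p0 using the normal approximation."""
    p_hat = days_together / n_days
    z = (p_hat - p0) / math.sqrt(p0 * (1.0 - p0) / n_days)
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))   # upper-tail normal probability
    return z, p_value

# PNB and Bank of Baroda under KPCA: together on 213 of 229 days, p0 = 0.80
z, p_value = one_sided_proportion_test(213, 229, 0.80)

# Same pair under rank correlation with p0 = 0.70 (162 of 229 days)
z_corr, p_corr = one_sided_proportion_test(162, 229, 0.70)
```

The first call reproduces the PNB and Bank of Baroda row of Table 4; the second uses the 70% threshold of the later comparison across methods.
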
                               Rank correlation coefficient   FPCA                  KPCA
S1           S2                Days      p-value              Days      p-value     Days      p-value
PNB          Bank of Baroda    162       4.03E-01             187       5.901E-05   213       1.49E-14
ICICI        Axis Bank         125       1                    150       9.31E-01    202       9.09E-10
TCS          Infy              47        1                    127       1           205       5.75E-11
TCS          Wipro             30        1                    107       1           198       2.72E-08
TCS          Techm             32        1                    101       1           198       2.72E-08
Infy         Wipro             28        1                    117       1           198       2.72E-08
Infy         Techm             31        1                    94        1           202       9.09E-10
Yes Bank     Axis Bank         131       1                    119       1           177       8.02E-03
HINDPETRO    BPCL              123       1                    175       1.70E-02    145       9.86E-01
TATAMTRDVR   TATAMOTORS        114       1                    137       1           81        1
Days = number of days the pair was in the same cluster.
Table 5: Hypothesis testing for 10 prominent pairs, with sample size 229. Test: one-tailed test of the hypothesis that two stocks are together 70% of the time against the alternative that they are together more than 70% of the time.

Keeping in mind that 2014 was a special year, with General Elections held in India during April and May, it seemed reasonable to look at this time period separately as well. We picked data from March 2014, when promotional rallies were being held, through May 2014. With this data set, the same steps are repeated using all three methods, and the key features of the analysis are summarized as follows:

• The same networks showed up, and this time they were more tightly bonded. Figure 5 gives the percentage of days on which a pair was found together in the same cluster. The p-values corresponding to the top 10 pairs are shown in Table 6.

• The network corresponding to the banking sector again emerged as a standalone network, as seen in the previous analysis.

• The IT network, however, was seen interacting with stocks from different sectors.

Figure 5: All 89 stocks are listed as rows and columns, and for each pair a colour represents the percentage ($p$) of days on which the pair was together in the same cluster. Colour scheme: Red: $p \ge 80\%$, Green: $60\% \le p < 80\%$, Blue: $40\% \le p < 60\%$, White: $p < 40\%$. Panels: (a) rank correlation, 2014 data; (b) FPCA, 2014 data; (c) KPCA, 2014 data; (d) rank correlation, Mar 2014-May 2014 data; (e) FPCA, Mar 2014-May 2014 data; (f) KPCA, Mar 2014-May 2014 data.
                               Rank correlation coefficient   FPCA                  KPCA
S1           S2                Days      p-value              Days      p-value     Days      p-value
PNB          Bank of Baroda    35        1.84E-01             38        3.10E-02    45        1.91E-05
ICICI        Axis Bank         22        1                    35        1.84E-01    39        1.43E-02
TCS          Infy              11        1                    33        3.98E-01    43        2.56E-04
TCS          Wipro             10        1                    28        9.12E-01    43        2.56E-04
TCS          Techm             6         1                    24        9.96E-01    40        6.04E-03
Infy         Wipro             11        1                    29        8.48E-01    42        8.08E-04
Infy         Techm             7         1                    22        9.99E-01    42        8.08E-04
Yes Bank     Axis Bank         23        1                    20        1.00E+00    33        3.98E-01
HINDPETRO    BPCL              24        1                    34        2.81E-01    32        5.26E-01
TATAMTRDVR   TATAMOTORS        25        1                    30        7.60E-01    17        1.00E+00
Days = number of days the pair was in the same cluster.
Table 6: Hypothesis testing for 10 prominent pairs, with sample size 46, March 2014-May 2014. Test: one-tailed test of the hypothesis that two stocks are together 70% of the time against the alternative that they are together more than 70% of the time.

V. Conclusion

The aim of this paper is to study interactions between stocks at the tick-by-tick level. We picked 30 seconds as our tick size and studied the behavior of 89 of the 100 stocks listed in CNX100 for the year 2014. Where the correlation coefficient fails to give a detailed picture, KPCA with a Gaussian kernel gives insight at this level of analysis with higher power. Some sectors are seen to be in tight association and moving together. Stocks from the banking sector and from the IT sector each form a tightly knit network. Some of the stocks in the automobile industry are seen interacting with the IT hub, while the energy sector emerged as an independent network by itself. Knowledge of these interactions should be useful to intraday traders when deciding their portfolios on a day-to-day basis.

Since the IT and financial services sectors emerged as tightly knit networks in our analysis, we plan to study these networks in depth in future work. We aim to fit a multivariate stochastic model to each of these networks at the high frequency level. Fitting the right model should help us improve predictions of risk quantifiers like VaR (Value at Risk) at the individual stock level and thus also at the portfolio level. The estimated VaR can then be used by brokerage firms to set margin money for intraday stock traders.

References:

1. Boginski, Vladimir, Sergiy Butenko, and Panos M. Pardalos (2006) Mining market data: a network approach. Computers & Operations Research.

2. Ramsay, J. O., Munhall, K. G., Gracco, V. L. and Ostry, D. J. (1996) Functional data analysis of lip motion. Journal of the Acoustical Society of America.

3. Ramsay, J. O. (1998) Estimating smooth monotone functions. Journal of the Royal Statistical Society.

4. Ramsay, J. O. (2000) Functional components of variation in handwriting. Journal of the American Statistical Association.

5. Ramsay, J. O. and Ramsey, J. B. (2001) Functional data analysis of the dynamics of the monthly index of nondurable goods production. Journal of Econometrics.

6. G., Cao, J. and Ramsay, J. O. (2007) Parameter cascades and profiling in functional data analysis. Computational Statistics.

7. Silverman, J. O. Ramsay and N. Heckman (1997) Spline smoothing with model-based penalties. Behavior Research Methods, Instruments, and Computers.

8. Zhiliang Wang, Yalin Sun and Peng Li (2014) Functional Principal Components Analysis of Shanghai Stock Exchange 50 Index. Discrete Dynamics in Nature and Society.

9. Karl Pearson (1901) On lines and planes of closest fit to systems of points in space. Philosophical Magazine.

10. L. J. Cao, K. S. Chua, W. K. Chong, H. P. Lee, Q. M. Gu (2003) A comparison of PCA, KPCA and ICA for dimensionality reduction in support vector machine. Neurocomputing, 55, 321-336. doi:10.1016/S0925-2312(03)00433-8.
