A Basic Approach To K-Means Clustering Applied To Stock Data

This document discusses applying K-means clustering to stock market data. It begins with an introduction to clustering and the K-means algorithm. It then describes implementing K-means on return data for 50 stocks over 2 years to group them into optimized clusters. The results show 3 clusters emerged with different mean returns and standard deviations. Plotting the clustered return data reveals how the points were grouped among the 3 centroids.

Uploaded by

Abhay



Abstract

Clustering is the task of grouping a set of objects with similar characteristics into one bucket and differentiating them from the rest of the group. This paper explains the clustering process using the simplest clustering algorithm, K-means. The idea is then applied to stock market data to understand how this method can be used to extract meaningful information from it.

1. Introduction

Data clustering is a way of forming clusters of objects that are similar in their characteristics. More precisely, data clustering is a technique in which information that is logically similar is stored together. Objects with similar properties are placed in one class, and a single access to the group makes the entire class available. A loose definition of clustering could be "the process of organizing objects into groups whose members are similar in some way".

A cluster is therefore a collection of objects that are "similar" to one another and "dissimilar" to the objects belonging to other clusters. Managing data is a complex job, and grouping it into clusters brings order to it. The goal of clustering is thus clear: to discover the underlying structure of the data.

2. Clustering Algorithms

Clustering algorithms can be broadly classified into two categories: unsupervised linear clustering algorithms and unsupervised non-linear clustering algorithms. K-means, hierarchical and Gaussian clustering fall under the unsupervised linear category, whereas kernel K-means and density-based clustering fall under the unsupervised non-linear category. A method is unsupervised when no information is provided to the algorithm about which data points belong to which clusters.

2.1 K-Means clustering

K-means clustering is an example of a partitioning algorithm. Data points are grouped based on similarity, but the degree of homogeneity of the resulting clusters depends largely on how many clusters the algorithm is told to find.

K-means initializes the cluster centroids with randomly selected data points, then iteratively assigns each data point to its closest centroid and updates each centroid to the mean of the data points assigned to it.

Closeness is measured by the Euclidean distance, the straight-line distance between two points. It is named after the "Father of Geometry", the Greek mathematician Euclid.
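The initialize, assign and update loop described in Section 2.1 can be sketched in a few lines of Python. This is a minimal illustration, not the paper's spreadsheet procedure: the function name `kmeans` and the parameters `init`, `max_iter` and `seed` are hypothetical, and leaving an empty cluster's centroid where it was is a choice made only for this sketch.

```python
import math
import random


def kmeans(points, k, init=None, max_iter=100, seed=0):
    """Plain K-means on a list of equal-length tuples; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Start from the given centroids, or from k randomly selected data points.
    start = init if init is not None else rng.sample(points, k)
    centroids = [tuple(c) for c in start]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point goes to its closest centroid (Euclidean distance).
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of the points assigned to it.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                # Empty cluster: keep the previous centroid (sketch-only choice).
                new_centroids.append(centroids[j])
        if new_centroids == centroids:  # centroids no longer move
            break
        centroids = new_centroids
    return centroids, labels
```

Calling `kmeans(points, 3)` on a list of (mean, standard deviation) pairs would return three centroids and one label per point. Because the starting centroids are random unless `init` is supplied, different seeds may give different clusters.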
2.2 K-Means clustering inference

K-means clustering is one of the basic clustering algorithms in the machine learning domain. The inference of this algorithm is based on the value of 'K', the number of clusters to be found in an n-dimensional dataset. Since the algorithm assumes that there are 'k' clusters, it also assumes 'k' cluster means (center points), where each cluster mean is the average of all the data points falling under that cluster. The end objective of the algorithm is that each data point in the data set is assigned to one of the 'k' clusters, each represented by its cluster mean. If the data points lie tightly around the cluster means, the clustering is considered good.

2.3 K-Means clustering algorithm

1. Place K points into the space represented by the objects that are being clustered. These points represent the initial group centroids.

2. Assign each object to the group that has the closest centroid.

3. When all objects have been assigned, recalculate the positions of the K centroids.

4. Repeat Steps 2 and 3 until the centroids no longer move. This produces a separation of the objects into groups from which the metric to be minimized can be calculated.

3. Implementation

So far we have covered what K-means is; now we look at how to apply the concept to real-world data. Stocks form a universe in which each member is related to the others in some way, so extracting meaningful information from this universe is of much importance. In this exercise we take the universe of the 50 stocks that constitute the NIFTY, and we use the return series of these constituents in order to have uniform series from which to create clusters.

To carry out this exercise we need two parameters for each stock, so we take the mean and standard deviation of each stock's returns over the last two years of data and plot them to see what they look like.

Name                 Mean        S.D.
ACC                  0.00045     0.01536
Ambuja Cements       0.00076     0.01915
Asian Paints         0.00070     0.01469
Axis Bank            0.00032     0.02261
:
Tata Power          (0.00036)    0.02166
Tata Steel          (0.00099)    0.02218
TCS                  0.00142     0.01667
Ultra Tech Cement    0.00123     0.01508

Values in parentheses are negative.
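The two parameters per stock, the mean and standard deviation of its return series, could be computed along the following lines. This is a sketch under assumptions the paper does not state: simple period-over-period returns and the sample standard deviation. The function name `return_stats` and the price list are hypothetical and are not data from the paper.

```python
import statistics


def return_stats(prices):
    """Return (mean, standard deviation) of simple period-over-period returns."""
    # Simple return r_t = P_t / P_(t-1) - 1 (assumption; log returns are another common choice).
    returns = [curr / prev - 1.0 for prev, curr in zip(prices, prices[1:])]
    # Sample standard deviation with an n - 1 denominator (also an assumption).
    return statistics.mean(returns), statistics.stdev(returns)


# Hypothetical closing prices, for illustration only.
closes = [100.0, 110.0, 99.0, 108.9]
mean_ret, sd_ret = return_stats(closes)
```

Applying such a function to each of the 50 price series would give one (mean, S.D.) pair per stock, which is the two-dimensional point that the clustering operates on.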
[Figure: Return Series. Scatter plot of mean return (horizontal axis) against standard deviation (vertical axis) for each stock.]

Once the points are plotted, we need to choose 'K' initial cluster centroids to start the clustering.

K=3 (initial centroids)
Cluster-1               Cluster-2               Cluster-3
MEAN        SD          MEAN        SD          MEAN        SD
(0.00191)   0.01343     0.00035     0.01870     0.00184     0.03240

In K-means the objective is to minimize the distance between each data point and its centroid, so the next step is to find the Euclidean distance from each data point to every centroid.

Distance to Centroid 1    Distance to Centroid 2    Distance to Centroid 3
0.00039                   0.00562                   0.00981
0.00343                   0.00202                   0.00609
0.00104                   0.00632                   0.01050
0.00689                   0.00172                   0.00262
0.00091                   0.00441                   0.00857

After finding the distances, we need to determine the minimum distance for each data point and hence the cluster to which it belongs.

Class        Minimum Distance
Cluster 1    0.00039
Cluster 2    0.00202
Cluster 1    0.00104
Cluster 2    0.00172
Cluster 1    0.00091

We then sum the minimum distances. The objective is to minimize this sum in order to obtain optimized clusters. With the help of the solver we minimized the sum of minimum distances and arrived at the optimized cluster centroids.

K=3 (optimized centroids)
Cluster-1               Cluster-2               Cluster-3
MEAN        SD          MEAN        SD          MEAN        SD
0.00060     0.01572     (0.00016)   0.02095     (0.00033)   0.02514

We can plot the optimized clusters against the return series and see how the data points are distributed among the new optimized clusters.

[Figure: return series with the optimized centroids. Legend: Data Point, Centroid-1, Centroid-2, Centroid-3.]

4. Conclusion

This paper introduced clustering and discussed K-means, a linear unsupervised clustering method. The idea was then applied to a universe of stocks, described by the mean and standard deviation of their returns, to find the optimum cluster for each stock.
Name                Cluster (2 years)
ACC                 Cluster 1
Ambuja Cements      Cluster 2
Asian Paints        Cluster 1
Axis Bank           Cluster 2
Bajaj Auto          Cluster 1
Bank of Baroda      Cluster 2
Bharti Airtel       Cluster 2
BHEL                Cluster 3
BPCL                Cluster 2
Cairn India         Cluster 1
Cipla               Cluster 1
Coal India          Cluster 1
DLF                 Cluster 3
Dr Reddys Labs      Cluster 1
GAIL                Cluster 1
Grasim              Cluster 1
HCL Tech            Cluster 1
HDFC                Cluster 1
HDFC Bank           Cluster 1
Hero Motocorp       Cluster 1
Hindalco            Cluster 3
HUL                 Cluster 1
ICICI Bank          Cluster 2
IDFC                Cluster 3
IndusInd Bank       Cluster 2
Infosys             Cluster 2
ITC                 Cluster 1
Jaiprakash Asso     Cluster 3
Jindal Steel        Cluster 3
Kotak Mahindra      Cluster 1
Larsen              Cluster 2
Lupin               Cluster 1
Mah and Mah         Cluster 1
Maruti Suzuki       Cluster 2
NMDC                Cluster 2
NTPC                Cluster 1
ONGC                Cluster 1
PNB                 Cluster 2
Power Grid Corp     Cluster 1
Ranbaxy Labs        Cluster 3
Reliance            Cluster 1
Reliance Infra      Cluster 3
SBI                 Cluster 2
Sesa Goa            Cluster 3
Sun Pharma          Cluster 1
Tata Motors         Cluster 3
Tata Power          Cluster 2
Tata Steel          Cluster 2
TCS                 Cluster 1
UltraTechCement     Cluster 1
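The assignment step behind this table can be checked for the eight stocks whose mean and S.D. the paper lists. The sketch below takes the optimized centroids as reported and assigns each stock to the nearest one by Euclidean distance. The names `CENTROIDS`, `STOCKS` and `assign` are hypothetical. The `objective` value covers only these eight stocks, so it is not the total the paper minimizes over all 50.

```python
import math

# Optimized centroids (mean, SD) as reported; parenthesized values are negative.
CENTROIDS = [(0.00060, 0.01572), (-0.00016, 0.02095), (-0.00033, 0.02514)]

# (mean, SD) for the stocks whose values appear in the paper's table.
STOCKS = {
    "ACC": (0.00045, 0.01536),
    "Ambuja Cements": (0.00076, 0.01915),
    "Asian Paints": (0.00070, 0.01469),
    "Axis Bank": (0.00032, 0.02261),
    "Tata Power": (-0.00036, 0.02166),
    "Tata Steel": (-0.00099, 0.02218),
    "TCS": (0.00142, 0.01667),
    "Ultra Tech Cement": (0.00123, 0.01508),
}


def assign(point, centroids):
    """Return (1-based cluster number, distance to that centroid) for one point."""
    distances = [math.dist(point, c) for c in centroids]
    nearest = min(range(len(centroids)), key=distances.__getitem__)
    return nearest + 1, distances[nearest]


assignments = {name: assign(p, CENTROIDS) for name, p in STOCKS.items()}
# Sum of minimum distances, restricted to the eight stocks above.
objective = sum(dist for _, dist in assignments.values())
```

For ACC and Ambuja Cements the minimum distances come out at about 0.00039 and 0.00202, matching the first two rows of the paper's distance table. This suggests that table was computed against the optimized centroids rather than the initial ones.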
