
A Basic Approach To K-Means Clustering Applied To Stock Data

This document discusses applying K-means clustering to stock market data. It begins with an introduction to clustering and the K-means algorithm. It then describes implementing K-means on return data for 50 stocks over 2 years to group them into optimized clusters. The results show 3 clusters emerged with different mean returns and standard deviations. Plotting the clustered return data reveals how the points were grouped among the 3 centroids.

Uploaded by

Abhay

A Basic approach to K-Means Clustering applied to stock data

Abstract

Clustering is the task of grouping a set of objects with similar characteristics into one bucket and differentiating them from the rest of the group. This paper explains the clustering process using the simplest of the clustering algorithms, K-means. The idea is then applied to stock market data to understand how this method can be used to draw meaningful information out of it.

1. Introduction

Data clustering is a way of forming clusters of objects that are somehow similar in their characteristics. More precisely, data clustering is a technique in which information that is logically similar is stored together. Objects with similar properties are placed in one class, and a single access to the group makes the entire class available. A loose definition of clustering could be “the process of organizing objects into groups whose members are similar in some way”.

A cluster is therefore a collection of objects which are “similar” to one another and “dissimilar” to the objects belonging to other clusters. Managing data is a complex job, and grouping the data into different clusters brings order to it. The goal of clustering is thus to discover the underlying structure of the data.

2. Clustering Algorithms

Clustering algorithms can be broadly classified into two categories: unsupervised linear clustering algorithms and unsupervised non-linear clustering algorithms. K-means, hierarchical and Gaussian clustering fall under the unsupervised linear category, whereas kernel K-means and density-based clustering fall under the unsupervised non-linear category. A method is unsupervised when no information is provided to the algorithm on which data points belong to which clusters.

2.1 K-Means clustering

K-means clustering is an example of a partitioning algorithm. Data points are grouped based on similarity, but the degree of homogeneity of the clusters that are formed depends largely on how many clusters the algorithm is told to find.

K-means initializes the cluster centroids with randomly selected data points, then iteratively assigns each data point to its closest cluster and updates each centroid to the mean of the data points assigned to it.

Closeness is measured by the Euclidean distance, the straight-line distance between two points. It is named after the "Father of Geometry", the Greek mathematician Euclid.
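The K-means loop described in Section 2.1 can be sketched in a few lines of Python. This is only a minimal illustration, assuming NumPy is available; the function name kmeans, its parameters (n_iter, seed) and the sample points are hypothetical and do not come from the paper.

```python
import numpy as np

def kmeans(points, k, n_iter=100, seed=0):
    """Minimal K-means sketch: random initial centroids, then assign/update."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialize the centroids with k distinct, randomly selected data points.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Euclidean distance of every point to every centroid, shape (n, k).
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its points (keep it if it has none).
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Stop once the centroids no longer move (within float tolerance).
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Final assignment against the final centroids.
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return centroids, dists.argmin(axis=1)

# Hypothetical data: two well-separated groups of points.
sample = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
centroids, labels = kmeans(sample, k=2)
```

Because the initial centroids are chosen at random, different seeds can lead to different cluster numbering and, on less clearly separated data, to different final clusters.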
2.2 K-Means clustering inference

K-means clustering is one of the basic clustering algorithms in the machine learning domain. The inference of this algorithm is based on the value of ‘K’, the number of clusters to be found in an n-dimensional dataset. Since the algorithm considers that there are ‘K’ clusters, it also considers that there are ‘K’ cluster means (center points), where each cluster mean is the average of all the data points falling under that cluster. The end objective of the algorithm is that each data point in the dataset is assigned to one of the ‘K’ clusters, each represented by its cluster mean. If the data points lie tightly around the cluster means, the clustering is considered good.

2.3 K-Means clustering algorithm

1. Place K points into the space represented by the objects that are being clustered. These points represent the initial group centroids.

2. Assign each object to the group that has the closest centroid.

3. When all objects have been assigned, recalculate the positions of the K centroids.

4. Repeat Steps 2 and 3 until the centroids no longer move. This produces a separation of the objects into groups from which the metric to be minimized can be calculated.

3. Implementation

So far we have covered what K-means is all about; now we will look at how to apply this concept to real-world data. Stock data is a universe in which each stock is related to the others in some way or the other, so drawing meaningful information out of this universe is of much importance. In this exercise we take the universe of 50 stocks that constitute the NIFTY, and we use the return series of these constituents in order to work with uniform series and to create clusters.

To carry out this exercise we need two parameters for each stock, so we take the mean and standard deviation of each stock's returns over the last two years of data and plot them to see how they look.

Name                  Mean        S.D.
ACC                   0.00045     0.01536
Ambuja Cements        0.00076     0.01915
Asian Paints          0.00070     0.01469
Axis Bank             0.00032     0.02261
:
Tata Power           -0.00036     0.02166
Tata Steel           -0.00099     0.02218
TCS                   0.00142     0.01667
Ultra Tech Cement     0.00123     0.01508
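The two parameters used for each stock in Section 3, the mean and standard deviation of its return series, could be computed along the following lines. This is a sketch under stated assumptions: the paper does not say whether simple or log returns were used, nor whether the standard deviation is the sample or the population version, so simple returns and the sample standard deviation are assumed here. The function name return_stats and the price series are hypothetical.

```python
import statistics

def return_stats(prices):
    """Mean and sample standard deviation of simple period-over-period returns."""
    # Assumption: simple returns r_t = P_t / P_(t-1) - 1.
    returns = [curr / prev - 1.0 for prev, curr in zip(prices, prices[1:])]
    return statistics.mean(returns), statistics.stdev(returns)

# Hypothetical closing prices for one stock (not taken from the paper).
prices = [100.0, 110.0, 99.0, 108.9]
mean_ret, sd_ret = return_stats(prices)
```

Applying this to each of the 50 price series would give one (mean, S.D.) pair per stock, which is the form of data point used in the clustering.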
[Figure: "Return Series" scatter plot of the stocks; horizontal axis from -0.00400 to 0.00400, vertical axis from 0 to 0.03500.]

Once the data points are plotted, we need to create an initial set of ‘K’ clusters to start the clustering process.

K=3
Cluster-1               Cluster-2               Cluster-3
MEAN        SD          MEAN        SD          MEAN        SD
-0.00191    0.01343     0.00035     0.01870     0.00184     0.03240

In K-means the objective is to minimize the distance between each data point and its centroid. For this reason, the next step is to find the Euclidean distance of each data point from all the centroids.

Distance to Centroid 1    Distance to Centroid 2    Distance to Centroid 3
0.00039                   0.00562                   0.00981
0.00343                   0.00202                   0.00609
0.00104                   0.00632                   0.01050
0.00689                   0.00172                   0.00262
0.00091                   0.00441                   0.00857

We then take the minimum distance of every data point and sum these values; our objective is to minimize this sum in order to achieve optimized clusters. With the help of a solver we minimized the sum of the minimum distances and arrived at the optimized cluster values.

K=3
Cluster-1               Cluster-2               Cluster-3
MEAN        SD          MEAN        SD          MEAN        SD
0.00060     0.01572     -0.00016    0.02095     -0.00033    0.02514

We can plot the effect of the optimization on the return series and also look at how the data points are positioned among the new optimized clusters.

[Figure: "Return Series" scatter plot with the optimized clusters; legend: Data Point, Centroid-1, Centroid-2, Centroid-3; vertical axis from 0 to 0.03500.]
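The distance, minimum-distance and summation steps can be illustrated with the four stocks whose mean and S.D. are listed in Section 3, measured against the optimized K=3 centroids. This is a sketch rather than the author's actual procedure: the helper name assign is hypothetical, and values printed in parentheses in the tables are read as negative numbers. Under that reading, the computed distances agree with the first four rows of the distance table to within rounding.

```python
import math

# (mean, S.D.) pairs from the table in Section 3.
stocks = {
    "ACC": (0.00045, 0.01536),
    "Ambuja Cements": (0.00076, 0.01915),
    "Asian Paints": (0.00070, 0.01469),
    "Axis Bank": (0.00032, 0.02261),
}
# Optimized K=3 centroids; parenthesized table values are taken as negative.
centroids = [(0.00060, 0.01572), (-0.00016, 0.02095), (-0.00033, 0.02514)]

def assign(point, centroids):
    """Return (cluster number, minimum distance, all distances) for one point."""
    dists = [math.dist(point, c) for c in centroids]
    best = min(range(len(dists)), key=dists.__getitem__)
    return best + 1, dists[best], dists

assignments = {name: assign(p, centroids) for name, p in stocks.items()}
# Quantity to be minimized: the sum of every point's minimum distance.
objective = sum(min_d for _, min_d, _ in assignments.values())

for name, (cluster, min_d, _) in assignments.items():
    print(f"{name}: Cluster {cluster}, minimum distance {min_d:.5f}")
```

A solver, or the assign/update loop of K-means itself, would then adjust the centroid coordinates to reduce the objective computed over all 50 stocks.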
After finding the distances, we need to determine the minimum distance for each data point and, from it, the cluster to which that data point belongs.

Class        Minimum Distance
Cluster 1    0.00039
Cluster 2    0.00202
Cluster 1    0.00104
Cluster 2    0.00172
Cluster 1    0.00091

Name                2 Years
ACC                 Cluster 1
Ambuja Cements      Cluster 2
Asian Paints        Cluster 1
Axis Bank           Cluster 2
Bajaj Auto          Cluster 1
Bank of Baroda      Cluster 2
Bharti Airtel       Cluster 2
BHEL                Cluster 3
BPCL                Cluster 2
Cairn India         Cluster 1
Cipla               Cluster 1
Coal India          Cluster 1
DLF                 Cluster 3
Dr Reddys Labs      Cluster 1
GAIL                Cluster 1
Grasim              Cluster 1
HCL Tech            Cluster 1
HDFC                Cluster 1
HDFC Bank           Cluster 1
Hero Motocorp       Cluster 1
Hindalco            Cluster 3
HUL                 Cluster 1
ICICI Bank          Cluster 2
IDFC                Cluster 3
IndusInd Bank       Cluster 2
Infosys             Cluster 2
ITC                 Cluster 1
Jaiprakash Asso     Cluster 3
Jindal Steel        Cluster 3
Kotak Mahindra      Cluster 1
Larsen              Cluster 2
Lupin               Cluster 1
Mah and Mah         Cluster 1
Maruti Suzuki       Cluster 2
NMDC                Cluster 2
NTPC                Cluster 1
ONGC                Cluster 1
PNB                 Cluster 2
Power Grid Corp     Cluster 1
Ranbaxy Labs        Cluster 3
Reliance            Cluster 1
Reliance Infra      Cluster 3
SBI                 Cluster 2
Sesa Goa            Cluster 3
Sun Pharma          Cluster 1
Tata Motors         Cluster 3
Tata Power          Cluster 2
Tata Steel          Cluster 2
TCS                 Cluster 1
UltraTechCement     Cluster 1

4. Conclusion

This paper gives an overview of clustering and then discusses K-means clustering, a linear form of unsupervised clustering. The idea is then applied to a universe of stock data, using the mean and standard deviation of each stock's returns as its properties, to find the optimum cluster for each stock.
