Machine Learning
Machine learning is a type of artificial intelligence (AI) that allows software applications to become more accurate at predicting outcomes without being explicitly programmed.
The basic operation of machine learning is to build algorithms that can receive input data and use statistical analysis to predict an output value.
The processes involved in machine learning are similar to those of data mining and predictive modelling.
It is used for fraud detection, image and speech recognition, medical diagnosis, spam filtering, network security threat detection, prediction, classification, learning associations, statistical arbitrage, extraction, and regression.
There are a number of spam filtering approaches that email clients use. To ascertain that these spam filters are continuously updated, they are powered by machine learning. Rule-based spam filtering fails to track the latest tricks adopted by spammers, so learned models such as the multilayer perceptron are used instead.
Machine learning is proving its potential to make cyberspace a secure place, and tracking monetary fraud online is one example. For instance, PayPal uses ML for protection against money laundering. The company uses a set of tools that helps it compare the millions of transactions taking place and distinguish between legitimate and illegitimate transactions between buyers and sellers.
Face Recognition:
You upload a picture of yourself with a friend and Facebook instantly recognizes that friend. Facebook checks the poses and projections in the picture, notices the unique features, and then matches them with the people in your friend list. The entire process at the backend is complicated and takes care of the precision factor, but it seems like a simple application of ML at the front end.
Google and other search engines use machine learning to improve search results for you. Every time you execute a search, the algorithms at the backend watch how you respond to the results. If you open the top results and stay on the web page for long, the search engine assumes that the results it displayed were in accordance with the query. Similarly, if you reach the second or third page of the search results but do not open any of the results, the search engine estimates that the results served did not match the requirement. This way, the algorithms working at the backend improve the search results.
Traffic Predictions:
We have all been using GPS navigation services. While we do, our current locations and velocities are saved at a central server for managing traffic. This data is then used to build a map of current traffic. While this helps in preventing traffic jams and supports congestion analysis, the underlying problem is that only a small number of cars are equipped with GPS. Machine learning in such scenarios helps to estimate the regions where congestion can be found on the basis of daily experience.
Product Recommendations:
You shopped for a product online a few days ago and then kept receiving emails with shopping suggestions. If not this, then you might have noticed that the shopping website or app recommends items that somehow match your taste. Certainly, this refines the shopping experience, but did you know that it's machine learning doing the magic for you? The product recommendations are made on the basis of your behaviour on the website/app, past purchases, items liked or added to cart, brand preferences, etc.
Image Recognition:
One of the most common uses of machine learning is image recognition. There are many situations where you can classify the object as a digital image. For digital images, the measurements describe the outputs of each pixel in the image.
In the case of a black and white image, the intensity of each pixel serves as one measurement. So if a black and white image has N*N pixels, the total number of pixels, and hence of measurements, is N^2.
In a colored image, each pixel is considered as providing 3 measurements, for the intensities of the 3 main color components, i.e., RGB. So for an N*N colored image there are 3N^2 measurements.
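As a quick illustration, here is a minimal sketch (using NumPy and made-up random pixel data; the image size N is chosen only for illustration) of how these measurement counts work out:

import numpy as np

N = 28  # image is N*N pixels
gray = np.random.randint(0, 256, size=(N, N))      # one intensity per pixel
color = np.random.randint(0, 256, size=(N, N, 3))  # three intensities (R, G, B) per pixel
print(gray.size)   # N^2 measurements  -> 784
print(color.size)  # 3*N^2 measurements -> 2352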
For face detection – The categories might be face versus no face present. There might be a
separate category for each person in a database of several individuals.
For character recognition – We can segment a piece of writing into smaller images, each
containing a single character. The categories might consist of the 26 letters of the English
alphabet, the 10 digits, and some special characters.
Speech Recognition:
Speech recognition (SR) is the translation of spoken words into text. It is also known as "automatic speech recognition" (ASR), "computer speech recognition", or "speech to text" (STT).
Although the details of signal representation are outside the scope of this program, we can represent the signal by a set of real values.
Speech recognition applications include voice user interfaces, such as voice dialing, call routing, and domotic appliance control. It can also be used for simple data entry, preparation of structured documents, speech-to-text processing, and voice input in aircraft.
Supervised learning: learning from known, labeled data to create a model, then predicting the target class for the given input data.
Unsupervised learning: learning from unlabeled data to differentiate the given input data. Two common examples are:
1. K-means clustering
2. Hierarchical clustering
Linear Regression:
Linear regression is a basic and commonly used type of predictive analysis. It models the relationship between a dependent variable (Y) and one or more independent variables (X).
First, the regression might be used to identify the strength of the effect that the independent variable(s) have on the dependent variable. Typical questions are: what is the strength of the relationship between dose and effect, sales and marketing spending, or age and income?
Second, it can be used to forecast effects or the impact of changes. That is, regression analysis helps us understand how much the dependent variable changes with a change in one or more independent variables. A typical question is, "how much additional sales income do I get for each additional $1000 spent on marketing?"
Third, regression analysis predicts trends and future values. Regression analysis can be used to get point estimates. A typical question is, "what will the price of gold be in 6 months?"
Line equation:
y = m*x + c + e
where m is the slope, c is the intercept, and e is the error term.
What is a cost function: the error measure is also called the cost function. Here we take the cost function to be the sum of squared errors over the training set:
cost function = Σi (y[i] - (m*x[i] + c))^2
What is gradient descent: gradient descent is used to decrease the cost function with each iteration by moving the parameters in the direction of the negative gradient. For the simplified line y = m*x (intercept dropped), the cost for one point is:
cost (e) = (y[i] - m*x[i])^2
The gradient of the cost with respect to m is:
d(e)/dm = d/dm (y[i] - m*x[i])^2
        = 2 (y[i] - m*x[i]) * d/dm (y[i] - m*x[i])
        = 2 (y[i] - m*x[i]) * (-x[i])
d(e)/dm = -2 (y[i] - m*x[i]) * x[i]
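Each iteration then updates the slope against this gradient, scaled by a learning rate mu. A minimal sketch of that update loop, using small made-up arrays (the data and learning rate are illustrative only):

import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])
m, mu = 0.0, 0.01
for _ in range(100):
    gradient = np.sum(-2 * (y - m * x) * x)  # d(e)/dm summed over the training set
    m = m - mu * gradient                    # step against the gradient
print(m)  # converges toward 2.0, the true slope of this data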
The regression coefficients can also be computed directly (least squares):
b1 = Σ(x - x_bar)(y - y_bar) / Σ(x - x_bar)^2
b0 = y_bar - b1 * x_bar
Worked example for x = [1,2,3,4,5], y = [2,4,5,4,5] (x_bar = 3, y_bar = 4):

(x - x_bar)^2 | (y - y_bar)(x - x_bar)
      4       |          4
      1       |          0
      0       |          0
      1       |          0
      4       |          2
  sum = 10    |       sum = 6

b1 = Σ(x - x_bar)(y - y_bar) / Σ(x - x_bar)^2 = 6/10 = 0.6
R-squared (R^2) compares the estimated distances from the mean with the actual distances from the mean. With predictions y_hat = b0 + b1*x:

R^2 = Σ(y_hat - y_bar)^2 / Σ(y - y_bar)^2

[Figure: the same data plotted twice around the mean of 4, once showing the regression line through the estimated values and once showing the actual values.]

If R^2 is near 1, the line fits the data well; the farther R^2 is from 1, the less the line represents the data.
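As a quick check, here is a small sketch computing R^2 for the worked example above (plain NumPy; the coefficients come from the calculation just shown, with b0 = 4 - 0.6*3 = 2.2):

import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 4, 5])
b1, b0 = 0.6, 2.2            # coefficients from the worked example
y_hat = b0 + b1 * x          # estimated values on the regression line
r2 = np.sum((y_hat - y.mean()) ** 2) / np.sum((y - y.mean()) ** 2)
print(r2)  # 0.6 for this data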
Putting the coefficient formulas into code:

import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 4, 5])

def estimate_coef(x, y):
    # number of observations/points
    n = np.size(x)
    # mean of x and y vectors
    m_x, m_y = np.mean(x), np.mean(y)
    # calculating cross-deviation and deviation about x
    SS_xy = np.sum(y * x) - n * m_y * m_x
    SS_xx = np.sum(x * x) - n * m_x * m_x
    # calculating regression coefficients
    b_1 = SS_xy / SS_xx
    b_0 = m_y - b_1 * m_x
    return (b_0, b_1)

def main():
    b_0, b_1 = estimate_coef(x, y)
    # matches the worked example above: b_1 = 0.6, b_0 = 2.2
    print("b_0 = {}, b_1 = {}".format(b_0, b_1))

if __name__ == "__main__":
    main()
Multiple linear regression
With m samples and n features, the feature matrix x and the target vector y are:

    x11, x12, x13, ..., x1n        y1
    x21, x22, x23, ..., x2n        y2
x = ..., ..., ..., ..., ...    y = ...
    xm1, xm2, xm3, ..., xmn        ym

y = m0*x0 + m1*x1 + m2*x2 + ... + m[n]*x[n]
Examples: house price prediction, or predicting which television show will have more viewers next week.
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt

data = pd.read_csv("house_price.csv")
data.head()
data.shape

def gradient(x, y, m0, mu):
    error = 0
    m_gradient = 0
    for i in range(len(x)):
        # accumulate squared error and gradient over the training set
        error += (y[i] - m0 * x[i]) ** 2
        m_gradient += -2 * (y[i] - m0 * x[i]) * x[i]
    # one gradient-descent step on the slope
    new_m = m0 - mu * m_gradient
    return new_m, error

x = data['sqft']
y = data['price']
# centre the data so a no-intercept line is adequate
x = x - np.mean(x)
y = y - np.mean(y)

m = 0
lst = []
mu = 0.01
for i in range(10):
    m, e = gradient(x, y, m, mu)
    lst.append(e)
plt.plot(lst)   # the error should fall with each iteration
print(m)

plt.scatter(x, y)
x1 = np.arange(-0.5, 0.5, 0.1)
y1 = x1 * m
plt.plot(x1, y1, color='red')
plt.show()
By using sklearn:
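The original sklearn listing is not included here; the following is a minimal sketch of how the same house-price fit could look with scikit-learn's LinearRegression (the CSV name and column names follow the gradient-descent example above):

import pandas as pd
from sklearn.linear_model import LinearRegression

data = pd.read_csv("house_price.csv")
X = data[['sqft']]            # feature matrix; add more columns for multiple regression
y = data['price']             # target

model = LinearRegression()
model.fit(X, y)
print(model.coef_, model.intercept_)   # fitted slope(s) m and intercept c
print(model.score(X, y))               # R^2 on the training data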
Logistic regression:
Logistic regression is used to find the probability of event success and event failure.
Dependent variable:
The dependent variable must be binary in nature (0 or 1). It is widely used for classification problems.
Logistic regression is a special case of linear regression where the outcome variable is categorical and we use the log of odds as the dependent variable. In simple words, it predicts the probability of occurrence of an event by fitting data to a logit function.
1. binomial: the target variable can have only 2 possible types: "0" or "1", which may represent "win" vs "loss", "pass" vs "fail", "dead" vs "alive", etc.
2. multinomial: the target variable can have 3 or more possible types which are not ordered (i.e. the types have no quantitative significance), like "disease A" vs "disease B" vs "disease C".
3. ordinal: it deals with target variables with ordered categories. For example, a test score can be categorized as "very poor", "poor", "good", "very good". Here, each category can be given a score like 0, 1, 2, 3.
ex:

Gender | Pass/Fail
Female | 1
Male   | 0
Male   | 1
Female | 1
Female | 1
Male   | 0
Male   | 0
Decision Boundary :
Our current prediction function returns a probability score between 0 and 1. In order
to map this to a discrete class (true/false, cat/dog), we select a threshold value or tipping
point above which we will classify values into class 1 and below which we classify values
into class 2.
p ≥ 0.5 → class = 1; p < 0.5 → class = 0
For example, if our threshold was 0.5 and our prediction function returned 0.7, we
would classify this observation as positive. If our prediction was 0.2 we would classify the
observation as negative. For logistic regression with multiple classes we could select the
class with the highest predicted probability.
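A minimal sketch of this thresholding with scikit-learn's LogisticRegression (the tiny inline dataset, hours studied versus pass/fail, is made up purely for illustration):

import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[1], [2], [3], [4], [5], [6]])  # hours studied
y = np.array([0, 0, 0, 1, 1, 1])              # fail (0) / pass (1)

model = LogisticRegression()
model.fit(X, y)
p = model.predict_proba([[3.5]])[0, 1]  # probability of class 1
print(p, 1 if p >= 0.5 else 0)          # apply the 0.5 decision threshold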
Here s(z) = 1 / (1 + e^(-z)) is the sigmoid function, which maps any real-valued z to a probability between 0 and 1.
[Figure: cost curves over the sigmoid output s(z) on (0, 1), one for the case y = 1 and one for the case y = 0.]
Softmax: as the name suggests, in softmax regression (SMR) we replace the sigmoid logistic function by the so-called softmax function φ:
φ(z)_j = e^(z_j) / Σ_k e^(z_k), for j = 1, ..., k
where z is the net input,
z = w0 + w1*x1 + ... + wm*xm
(w is the weight vector, x is the feature vector of 1 training sample, and w0 is the bias unit.)
Now, this softmax function computes the probability that the training sample x(i) belongs to class j given the weights and net input z(i). So, we compute the probability P(y = j | x(i); w_j) for each class label in j = 1, ..., k.
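A minimal NumPy sketch of the softmax function itself (the sample scores are made up):

import numpy as np

def softmax(z):
    # subtract the max for numerical stability; it does not change the result
    e = np.exp(z - np.max(z))
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])   # net inputs for 3 classes
print(softmax(scores))               # class probabilities, summing to 1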
ex: labels = [apple, orange, banana, mango]
apple = class 0
orange = class 1
banana = class 2
mango = class 3
First, we want to encode the class labels into a format that we can more easily work with; we apply one-hot encoding:

         class     one-hot (0,1,2,3)
apple    class 0   1 0 0 0
orange   class 1   0 1 0 0
banana   class 2   0 0 1 0
mango    class 3   0 0 0 1
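A minimal sketch of this one-hot encoding in NumPy, using the identity-matrix trick (the variable names are illustrative):

import numpy as np

labels = np.array([0, 1, 2, 3])   # apple, orange, banana, mango as integer classes
one_hot = np.eye(4)[labels]       # row j of the 4x4 identity encodes class j
print(one_hot)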
DECISION TREE
It works for both categorical and continuous input and output variables. In this technique, we split the data into two or more homogeneous sets. The decision tree identifies the most significant variable, and the value of that variable, that gives the best homogeneous sets of the population.
Entropy:
Entropy, as it relates to machine learning, is a measure of the randomness in the information being processed. The higher the entropy, the harder it is to draw any conclusions from that information. For a set with P positive and N negative examples:
Entropy = -P/(P+N) log2(P/(P+N)) - N/(P+N) log2(N/(P+N))
ex: with 5 positive and 5 negative examples out of 10,
Class Entropy:
-5/10 log2 (5/10) - 5/10 log2 (5/10) = 1
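A minimal Python sketch of this entropy computation (the function name is illustrative):

import math

def entropy(p, n):
    # entropy of a set with p positive and n negative examples
    total = p + n
    result = 0.0
    for count in (p, n):
        if count:  # 0 * log2(0) is taken as 0
            frac = count / total
            result -= frac * math.log2(frac)
    return result

print(entropy(5, 5))  # 1.0, matching the class entropy above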
For the worked example with attributes Age, Competition and Type (class entropy = 1, as above):
Entropy(Age): the weighted entropy of the Age attribute works out to 4/10 = 0.4
Gain: 1 - 0.4 = 0.6
Competition:
Entropy(Competition): 0.8755
Gain: 1 - 0.8755 = 0.1245
Type:
Entropy(Type): 1
Gain: 1 - 1 = 0
Compare Gain:

Attribute   | Gain
Age         | 0.6
Competition | 0.124
Type        | 0

Age has the highest gain, so it becomes the root node, with branches:
old → down, new → up, mid → ? (needs a further split)
On the 'mid' branch, we repeat the procedure on the remaining rows:
Age:
profit (class): p = 2, n = 2
Entropy:
-2/4 log2 (2/4) - 2/4 log2 (2/4) = 1
Competition:
Entropy(Competition): 2/4 * (0) + 2/4 * (0) = 0
Gain: 1 - 0 = 1
Type:
Entropy(Type): 1
Gain: 1 - 1 = 0
Compare Gain:

Attribute   | Gain
Competition | 1 (chosen as the next node)
Type        | 0

Decision Tree:
Age: old → down, new → up, mid → Competition
Competition (on the mid branch): yes → down, no → up
Advantages:
• Easy to understand: decision tree output is very easy to understand, even for people from a non-analytical background. It does not require any statistical knowledge to read and interpret. Its graphical representation is very intuitive and users can easily relate it to their hypothesis.
• Useful in data exploration: a decision tree is one of the fastest ways to identify the most significant variables and the relations between two or more variables. With the help of decision trees, we can create new variables/features that have better power to predict the target variable.
• Less data cleaning required: it requires less data cleaning compared to some other modelling techniques. It is not influenced by outliers and missing values to a fair degree.
• Data type is not a constraint: it can handle both numerical and categorical variables.
Sample rows from bank.csv:

age | job      | marital | education | default | balance | housing | loan | contact | day | month | duration | campaign | pdays | previous | poutcome | deposit
59  | admin.   | married | secondary | no      | 2343    | yes     | no   | unknown | 5   | may   | 1042     | 1        | -1    | 0        | unknown  | yes
56  | admin.   | married | secondary | no      | 45      | no      | no   | unknown | 5   | may   | 1467     | 1        | -1    | 0        | unknown  | yes
55  | services | married | secondary | no      | 2476    | yes     | no   | unknown | 5   | may   | 579      | 1        | -1    | 0        | unknown  | yes
54  | admin.   | married | tertiary  | no      | 184     | no      | no   | unknown | 5   | may   | 673      | 2        | -1    | 0        | unknown  | yes
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
data = pd.read_csv("bank.csv")
data
features = data[['age', 'job', 'marital', 'education', 'default', 'balance', 'housing',
                 'loan', 'contact', 'day', 'month', 'duration', 'campaign', 'pdays', 'previous']].copy()
target = data["deposit"]
# encode the categorical (object-dtype) columns as integers
from sklearn.preprocessing import LabelEncoder
label = LabelEncoder()
columns = features.dtypes.pipe(lambda s: s[s == 'object']).index
for col in columns:
    features[col] = label.fit_transform(features[col])
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
from sklearn import tree
# hold out 40% of the data for testing
X_train, X_test, Y_train, Y_test = train_test_split(features, target, test_size=0.4, random_state=1)
model = DecisionTreeClassifier()
model.fit(X_train, Y_train)
model.score(X_test, Y_test)   # accuracy on the held-out set
K-NEAREST NEIGHBOR (KNN)
The k-nearest neighbor classifier algorithm predicts the target label by finding the nearest neighbor class. The closest class is identified using distance measures like the Euclidean distance.
Euclidean distance: the Euclidean distance between two points in either the plane or 3-dimensional space measures the length of the segment connecting the two points.
To demonstrate a k-nearest neighbor analysis, let's consider the task of classifying a new object (query point) among a number of known examples. Suppose we have two different target classes, circle and triangle, and we would like to predict the target class for a new red point. Taking the k value as three, we calculate the similarity distance using similarity measures like the Euclidean distance.
With k = 3, the three nearest neighbours of the new data point are one circle and two triangles, so the new data point is assigned to the triangle class.
Let's consider a setup with "n" training samples, where xi is a training data point. The training data points are categorized into "c" classes. Using KNN, we want to predict the class for a new data point. So, the first step is to calculate the distance (Euclidean) between the new data point and all the training data points.
Nearest neighbor is a special case of k-nearest neighbor where the k value is 1 (k = 1). In this case, the new data point's target class is assigned to that of its single closest neighbor.
Selecting the value of K in k-nearest neighbor is the most critical problem. A small value of K means that noise will have a higher influence on the result, i.e., the probability of overfitting is very high. A large value of K makes it computationally expensive and defeats the basic idea behind KNN (that points that are near might have similar classes). A simple approach to selecting k is k = n^(1/2).
To optimize the results, we can use cross-validation. Using the cross-validation technique, we can test the KNN algorithm with different values of K; the model which gives good accuracy can be considered the optimal choice, as in the sketch below.
It depends on individual cases; at times the best process is to run through each possible value of k and test our results.
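A minimal sketch of that search over k with scikit-learn, using the built-in iris data so it runs as-is (the candidate k values are arbitrary):

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
for k in (1, 3, 5, 7, 9):
    knn = KNeighborsClassifier(n_neighbors=k)
    scores = cross_val_score(knn, X, y, cv=5)   # 5-fold cross-validation
    print(k, np.mean(scores))                   # pick the k with the best mean accuracy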
Advantages:
• The k-nearest neighbours (KNN) classifier is a very simple classifier that works well on basic recognition problems.
Disadvantages:
• The main disadvantage of the KNN algorithm is that it does not learn anything from the training data; it simply uses the training data itself for classification.
• To predict the label of a new instance, the KNN algorithm finds the K closest neighbours to the new instance in the training data; the predicted class label is then set to the most common label among the K closest neighboring points.
• Because the algorithm learns nothing from the training data, it may not generalize well and may not be robust to noisy data. Further, changing K can change the resulting predicted class label.
NAIVE BAYES
Naive Bayes is built on Bayes' theorem:
P(A|B) = P(B|A) * P(A) / P(B)
▪ Basically, we are trying to find the probability of event A, given that event B is true. Event B is also termed the evidence.
▪ P(A) is the priori of A (the prior probability, i.e. the probability of the event before evidence is seen). The evidence is an attribute value of an unknown instance (here, it is event B).
▪ P(A|B) is the posteriori probability of A, i.e. the probability of the event after evidence is seen.
where y is the class variable and X is a dependent feature vector (of size n):
X = (x1, x2, x3, ..., xn)
For example, with
X = (Rainy, Hot, High, False)
y = No,
P(X|y) here means the probability of "not playing golf" given that the weather conditions are "rainy outlook", "temperature is hot", "high humidity" and "no wind".
ex:
The dataset is divided into two parts, namely the feature matrix and the response vector.
▪ The feature matrix contains all the vectors (rows) of the dataset, in which each vector consists of the values of the dependent features. In the above dataset, the features are 'Outlook', 'Temperature', 'Humidity' and 'Windy'.
The per-feature likelihood tables (probability of Outlook, probability of Temperature, probability of Humidity) are built from frequency counts in the same way as the probability-of-play table below:

play  | count | probability
Yes   | 9     | 9/14
No    | 5     | 5/14
total | 14    | 100%
Let us test it on a new set of features (let us call it today):
today = (Sunny, Hot, Normal, False)
P(yes|today) ∝ P(sunny|yes) P(Hot|yes) P(Normal|yes) P(no wind|yes) P(yes)
= 3/9 * 2/9 * 6/9 * 6/9 * 9/14 = 0.0211
P(no|today) ∝ P(sunny|no) P(Hot|no) P(Normal|no) P(no wind|no) P(no)
= 2/5 * 2/5 * 1/5 * 2/5 * 5/14 = 0.0045
Since P(yes|today) + P(no|today) must sum to 1, we normalize:
P(yes|today) = 0.0211 / (0.0211 + 0.0045) = 0.0211 / 0.0256 = 0.82
P(no|today) = 0.0045 / (0.0211 + 0.0045) = 0.0045 / 0.0256 = 0.18
P(yes|today) > P(no|today), so the prediction is "yes": golf will be played today.
x = [sepal.length, sepal.width, petal.length, petal.width]
y = [species]
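The listing for this iris example is not reproduced here; a minimal sketch with scikit-learn's Gaussian naive Bayes, using the built-in iris data so the CSV loading step is skipped:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)   # 4 features: sepal/petal length and width
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=4)
model = GaussianNB()
model.fit(X_train, y_train)
print(model.score(X_test, y_test))  # accuracy on the held-out set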
RANDOM FOREST
To address why the random forest algorithm, here are its advantages:
• The same random forest algorithm, or random forest classifier, can be used for both classification and regression tasks.
• The random forest classifier can handle missing values.
• With more trees in the forest, the random forest classifier avoids overfitting.
At each node:
1. For some number m (see below), m predictor variables are selected at random from all the predictor variables.
2. The predictor variable that provides the best split, according to some objective function, is used to do a binary split on that node.
3. At the next node, choose another m variables at random from all predictor variables and do the same.
Similarly, the algorithm randomly creates multiple decision trees and finds the prediction of each decision tree. Each decision tree's prediction is counted as a vote for one class; the predicted label is the class which receives the most votes, as in the sketch after the feature list below.
features =['Account Balance', 'Duration of Credit (month)', 'Payment Status of Previous Credit',
'Purpose', 'Credit Amount', 'Value Savings/Stocks', 'Length of current employment', 'Installment per
cent', 'Sex & Marital Status', 'Guarantors', 'Duration in Current address', 'Most valuable available
asset', 'Age (years)', 'Concurrent Credits', 'Type of apartment', 'No of Credits at this Bank',
'Occupation', 'No of dependents', 'Telephone', 'Foreign Worker']
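A minimal sketch of a random forest on such data with scikit-learn; the file name german_credit.csv and the target column 'Creditability' are assumptions for illustration:

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

data = pd.read_csv("german_credit.csv")          # assumed file containing the columns above
X = data[features]                               # the feature list defined above
y = data['Creditability']                        # assumed binary target column

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)
model = RandomForestClassifier(n_estimators=100, random_state=1)  # 100 voting trees
model.fit(X_train, y_train)
print(model.score(X_test, y_test))               # accuracy from majority voting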
• For applications in banking, the random forest algorithm is used to find loyal customers, meaning customers who can take out plenty of loans and pay interest to the bank properly, and fraud customers, meaning customers who have bad records like failing to pay back a loan on time or other dangerous actions.
• For applications in medicine, the random forest algorithm can be used both to identify the correct combination of components in a medicine and to identify diseases by analyzing the patient's medical records.
• For applications in the stock market, the random forest algorithm can be used to identify a stock's behavior and the expected loss or profit.
SUPPORT VECTOR MACHINES
Linear SVM: in a linear SVM the data points are separated by an apparent gap. It predicts a straight "hyperplane" dividing the data into 2 classes. The hyperplane is drawn by maximizing the distance from the hyperplane to the nearest data point of either class. This hyperplane is called the "maximum margin hyperplane".
Nonlinear SVM: in a nonlinear SVM, the data points are plotted in a higher-dimensional space, where the kernel trick is used to find the maximum margin hyperplane.
What is margin: the margin is defined as the distance between the separating hyperplane (decision boundary) and the training samples closest to it.
Here H1 does not separate the classes. H2 does, but only with a small margin. H3 separates them with the maximum margin.
1. Identify the right hyper-plane (Scenario 1): here we have three hyper-planes (A, B and C). Now, identify the right hyper-plane to classify stars and circles.
2. Identify the right hyper-plane (Scenario 2): here we have three hyper-planes (A, B and C) and all are segregating the classes well. Now, how can we identify the right hyper-plane? Here we find the maximum distance to the nearest data points of each class.
Hyper-plane B has a higher margin compared to A, but hyper-plane B has a classification error while A has classified all points correctly. Therefore, the right hyper-plane is A.
SVM needs only the support vectors to classify any new data instance, so it is quite efficient. In other words, it uses a subset of the training points in the decision function (the support vectors), so it is also memory efficient.
To get a better understanding of margin maximization, we may want to take a closer look at those positive and negative hyperplanes that run parallel to the decision boundary. We can normalize by the length of the vector w, which is defined as:
||w|| = sqrt(Σ_j w_j^2)
The parameter C controls the penalty for misclassification: large values of C correspond to large error penalties, while we are less strict about misclassification errors if we choose smaller values for C. We can therefore use C to control the width of the margin and tune the bias-variance trade-off.
Kernel method
Kernel methods deal with linearly inseparable data by creating nonlinear combinations of the original features and projecting them onto a higher-dimensional space via a mapping function ϕ(), where the data becomes linearly separable.
The mapping allows us to separate the two classes shown in the plot via a linear hyperplane, which becomes a nonlinear decision boundary if we project it back (ϕ−1) onto the original feature space.
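A minimal sketch of a nonlinear SVM via the RBF kernel, on a synthetic ring-shaped dataset that no straight line can separate (all names come from scikit-learn; the parameters are illustrative):

from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)
linear = SVC(kernel='linear').fit(X, y)
rbf = SVC(kernel='rbf').fit(X, y)   # kernel trick: implicit higher-dimensional mapping
print(linear.score(X, y))           # poor: the classes are not linearly separable
print(rbf.score(X, y))              # near 1.0 on this data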
x = [sepal.length, sepal.width, petal.length, petal.width]
y = [species]
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn import svm
%matplotlib inline
data = pd.read_csv("Iris.csv")
data.head()
# store the feature matrix (X) and response vector (Y)
X = np.array(data.iloc[:, 0:4])
Y = data["Species"]
# splitting X and Y into training and testing sets
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=4)
# training the model on the training set
svc = svm.SVC(kernel='linear', C=15)
svc.fit(X_train, Y_train)
predict = svc.predict(X_test)
predict
accuracy = svc.score(X_test, Y_test)
accuracy
K-MEANS CLUSTERING
K-means clustering is a type of unsupervised learning, which is used when you have unlabeled data. The goal of this algorithm is to find groups in the data, with the number of groups represented by the variable K. The algorithm works iteratively to assign each data point to one of K groups based on the features that are provided. Its key output is:
• the centroids of the K clusters, which can be used to label new data
Algorithm:
• assuming we have inputs x1,x2,x3,…,xn and value of K
• Step 1 - Pick K random points as cluster centers called centroids.
• Step 2 - Assign each xi to nearest cluster by calculating its distance to each centroid.
• Step 3 - Find new cluster center by taking the average of the assigned points.
• Step 4 - Repeat Step 2 and 3 until none of the cluster assignments change.
step 1:
We randomly pick K cluster centres (centroids). Let's assume these are c1, c2, ..., ck, and we can say that:
C = {c1, c2, ..., ck}
C is the set of all centroids.
step 2:
In this step we assign each input value to its closest centre. This is done by calculating the Euclidean (L2) distance between the point and each centroid.
step 3:
In this step, we find the new centroid by taking the average of all the points assigned to that
cluster.
Step 4:
In this step, we repeat step 2 and 3 until none of the cluster assignments change. That
means until our clusters remain stable, we repeat the algorithm.
ex: data = {2, 3, 4, 10, 11, 12, 20, 25, 30}; with K = 3, the natural grouping the algorithm settles on is {2, 3, 4}, {10, 11, 12} and {20, 25, 30}.
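A minimal sketch checking that grouping with scikit-learn's KMeans (n_clusters and the data come from the example above):

import numpy as np
from sklearn.cluster import KMeans

data = np.array([2, 3, 4, 10, 11, 12, 20, 25, 30]).reshape(-1, 1)
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(data)
print(km.labels_)           # cluster assignment for each point
print(km.cluster_centers_)  # centroids near 3, 11 and 25

Typical applications of K-means clustering include: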
• Image Segmentation
• Clustering Gene Segmentation Data
• News Article Clustering
• Clustering Languages
• Species Clustering
• Anomaly Detection
import copy
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
#get_ipython().magic(u'matplotlib inline')
df = pd.DataFrame({
    'x': [12, 20, 28, 18, 29, 33, 24, 45, 45, 52, 51, 52, 55, 53, 55, 61, 64, 69, 72],
    'y': [39, 36, 30, 52, 54, 46, 55, 59, 63, 70, 66, 63, 58, 23, 14, 8, 19, 7, 24]
})
np.random.seed(200)
k = 3
# centroids[i] = [x, y]
centroids = {
    i+1: [np.random.randint(0, 80), np.random.randint(0, 80)]
    for i in range(k)
}
# colour map for the k clusters (assumed; the original listing uses colmap without defining it)
colmap = {1: 'r', 2: 'g', 3: 'b'}

def assignment(df, centroids):
    # assign each point to its nearest centroid by Euclidean distance
    # (helper assumed; the original listing calls assignment() without defining it)
    for i in centroids.keys():
        df['distance_from_{}'.format(i)] = np.sqrt(
            (df['x'] - centroids[i][0]) ** 2 + (df['y'] - centroids[i][1]) ** 2
        )
    distance_cols = ['distance_from_{}'.format(i) for i in centroids.keys()]
    df['closest'] = df.loc[:, distance_cols].idxmin(axis=1)
    df['closest'] = df['closest'].map(lambda c: int(c.replace('distance_from_', '')))
    df['color'] = df['closest'].map(lambda c: colmap[c])
    return df

def update(centroids):
    # move each centroid to the mean of the points currently assigned to it
    for i in centroids.keys():
        centroids[i][0] = np.mean(df[df['closest'] == i]['x'])
        centroids[i][1] = np.mean(df[df['closest'] == i]['y'])
    return centroids

df = assignment(df, centroids)           # initial assignment
old_centroids = copy.deepcopy(centroids)
centroids = update(centroids)            # one update step
df = assignment(df, centroids)           # re-assign with the moved centroids
# Plot results
fig = plt.figure(figsize=(5, 5))
plt.scatter(df['x'], df['y'], color=df['color'], alpha=0.5, edgecolor='k')
for i in centroids.keys():
    plt.scatter(*centroids[i], color=colmap[i])
plt.xlim(0, 80)
plt.ylim(0, 80)
plt.show()
Finally, we repeat the assignment and update steps until the cluster assignments stop changing:

while True:
    closest_centroids = df['closest'].copy(deep=True)
    centroids = update(centroids)
    df = assignment(df, centroids)
    if closest_centroids.equals(df['closest']):
        break