ML - Unit - 2

ML Unit-II

Uploaded by

Dr D S Naga Malleswara Rao

Unsupervised Learning

Unsupervised learning is a type of machine learning algorithm used to draw inferences from
datasets consisting of input data without labeled responses.

The most common unsupervised learning method is cluster analysis, which is used in exploratory
data analysis to find hidden patterns or groupings in data. Clusters are modeled using a measure
of similarity defined on metrics such as Euclidean or probabilistic distance.

Common clustering algorithms include:

Hierarchical clustering: builds a multilevel hierarchy of clusters by creating a cluster tree

k-Means clustering: partitions data into k distinct clusters based on distance to the centroid of a
cluster

Gaussian mixture models: models clusters as a mixture of multivariate normal density components

Self-organizing maps: uses neural networks that learn the topology and distribution of the data

Unsupervised learning methods are used in bioinformatics for sequence analysis and genetic
clustering; in data mining for sequence and pattern mining; in medical imaging for image
segmentation; and in computer vision for object recognition.

k-means clustering algorithm

k-means is one of the simplest unsupervised learning algorithms for solving the well-known clustering
problem.

The procedure classifies a given data set into a certain number of clusters (say k clusters) fixed
a priori.

The main idea is to define k centers, one for each cluster.

These centers should be placed carefully, because different starting locations lead to different results.
A better choice is therefore to place them as far away from each other as possible. The next step
is to take each point in the data set and associate it with the nearest center. When no point is
pending, the first step is complete and an early grouping is done. At this point we need to
re-calculate k new centroids as the barycenters of the clusters resulting from the previous step.

After we have these k new centroids, a new binding has to be done between the same data set
points and the nearest new center. This creates a loop. As a result of this loop, the k centers
change their location step by step until no more changes occur, in other words until the centers
no longer move.
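Finally, this algorithm aims at minimizing an objective function known as the squared error function, given
by:

$$J(V) = \sum_{j=1}^{c} \sum_{i=1}^{c_j} \left( \lVert x_i - v_j \rVert \right)^2$$

where,
‘||xi - vj||’ is the Euclidean distance between xi and vj, with xi ranging over the data points assigned to the jth cluster.

‘cj’ is the number of data points in the jth cluster.

‘c’ is the number of cluster centers.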

Algorithmic steps for k-means clustering

Let X = {x1, x2, x3, ..., xn} be the set of data points and V = {v1, v2, ..., vc} be the set of centers.

1) Randomly select ‘c’ cluster centers.

2) Calculate the distance between each data point and cluster centers.

3) Assign each data point to the cluster center that is nearest to it among all the cluster centers.

4) Recalculate the new cluster center using:

$$v_i = \frac{1}{c_i} \sum_{j=1}^{c_i} x_j$$

where ‘ci’ represents the number of data points in the ith cluster and the sum runs over the data points xj assigned to that cluster.

5) Recalculate the distance between each data point and new obtained cluster centers.

6) If no data point was reassigned then stop, otherwise repeat from step 3).
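
The six steps above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not a reference implementation: the function name `kmeans`, the `max_iter` cap, the fixed random seed, the rule of keeping the old center when a cluster becomes empty, and the toy data are all choices made here for the example.

```python
import numpy as np

def kmeans(X, c, max_iter=100, seed=0):
    """Plain k-means with random initial centers (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Step 1: pick c distinct data points as the initial centers.
    V = X[rng.choice(len(X), size=c, replace=False)].astype(float)
    labels = np.full(len(X), -1)
    for _ in range(max_iter):
        # Steps 2 and 5: Euclidean distance from every point to every center.
        dists = np.linalg.norm(X[:, None, :] - V[None, :, :], axis=2)
        # Step 3: assign each point to its nearest center.
        new_labels = dists.argmin(axis=1)
        # Step 6: stop when no point changed cluster.
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 4: move each center to the mean of its assigned points.
        for i in range(c):
            members = X[labels == i]
            if len(members) > 0:  # assumption: keep the old center if a cluster is empty
                V[i] = members.mean(axis=0)
    # Squared error: sum of squared distances of points to their own center.
    J = float(((X - V[labels]) ** 2).sum())
    return V, labels, J

# Toy data made up for illustration: two well-separated blobs of 30 points each.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 0.3, size=(30, 2)),
               rng.normal(5.0, 0.3, size=(30, 2))])
V, labels, J = kmeans(X, c=2)
print(V)
print(J)
```

Because the initial centers are random, different seeds can converge to different local optima on harder data; running the function several times and keeping the result with the smallest J is a common workaround.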

Advantages:-
1) Fast, robust and easy to understand.

2) Relatively efficient: O(tknd), where n is the number of objects, k the number of clusters, d the dimension of each object, and t the number of iterations. Normally, k, t, d << n.

3) Gives the best results when the clusters in the data set are distinct or well separated from each other.

Fig I: Showing the result of k-means for 'N' = 60 and 'c' = 3

Disadvantages:-

1) The learning algorithm requires a priori specification of the number of cluster centers.

2) The use of exclusive assignment: if two groups of data overlap heavily, k-means will not be
able to resolve that there are two clusters.

3) The learning algorithm is not invariant to non-linear transformations, i.e. different representations
of the data give different results (data represented in Cartesian coordinates and in polar coordinates
will give different results).

4) Euclidean distance measures can weight underlying factors unequally.

5) The learning algorithm finds only a local optimum of the squared error function.

6) Random selection of the initial cluster centers may lead to a poor result. Please refer to Fig.

7) Applicable only when the mean is defined, i.e. fails for categorical data.

8) Unable to handle noisy data and outliers.

9) The algorithm fails for non-linear data sets, i.e. data whose clusters cannot be separated by linear boundaries.

Fig II: Showing the non-linear data set where k-means algorithm fails

kernel k-means clustering algorithm

This algorithm follows the same procedure as k-means, with one difference: in the calculation of distance,
a kernel method is used instead of the Euclidean distance.

Algorithmic steps for Kernel k-means clustering

Let X = {a1, a2, a3, ..., an} be the set of data points and 'c' be the number of clusters.

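1) Randomly initialize ‘c’ cluster centers.

2) Compute the distance between each data point and the cluster center in the transformed space using:

$$\lVert \Phi(a_i) - m_c \rVert^2, \qquad m_c = \frac{1}{|\pi_c|} \sum_{a_j \in \pi_c} \Phi(a_j)$$

which expands into dot products only:

$$\lVert \Phi(a_i) - m_c \rVert^2 = \Phi(a_i)\cdot\Phi(a_i) - \frac{2}{|\pi_c|} \sum_{a_j \in \pi_c} \Phi(a_i)\cdot\Phi(a_j) + \frac{1}{|\pi_c|^2} \sum_{a_j, a_l \in \pi_c} \Phi(a_j)\cdot\Phi(a_l)$$

where,

cth cluster is denoted by πc.

‘mc’ denotes the mean of the cluster πc.

‘Ф(ai)’ denotes the data point ai in transformed space.

Ф(ai). Ф(aj) = exp(-q ||ai - aj||^2) for the Gaussian kernel,

= (c + ai.aj)^d for the polynomial kernel (here c is a constant offset, not the number of clusters).

3) Assign each data point to the cluster center whose distance is minimum.

4) Repeat from step 2) until no data point is reassigned.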
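
The kernel k-means loop above can be sketched as follows. This is an illustrative sketch under stated assumptions: the Gaussian kernel is taken in its usual form exp(-q ||ai - aj||^2), the cluster centers are never formed explicitly (they live in the transformed space), so step 1 is replaced by a random initial assignment of points to clusters, and the names `gaussian_kernel_matrix` and `kernel_kmeans` as well as the toy data are made up for this example.

```python
import numpy as np

def gaussian_kernel_matrix(A, q):
    """K[i, j] = exp(-q * ||a_i - a_j||^2) for all pairs of rows of A."""
    sq = ((A[:, None, :] - A[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-q * sq)

def kernel_kmeans(K, c, max_iter=100, seed=0):
    """Kernel k-means on a precomputed kernel matrix K (hypothetical helper)."""
    n = K.shape[0]
    rng = np.random.default_rng(seed)
    # Assumption: start from a random assignment instead of explicit centers.
    labels = rng.integers(0, c, size=n)
    for _ in range(max_iter):
        dist = np.full((n, c), np.inf)
        for k in range(c):
            mask = labels == k
            size = mask.sum()
            if size == 0:
                continue  # an empty cluster attracts no points
            # Squared distance to the cluster mean, written with kernel entries only:
            # K_ii - (2/size) * sum_j K_ij + (1/size^2) * sum_{j,l} K_jl
            dist[:, k] = (np.diag(K)
                          - 2.0 * K[:, mask].sum(axis=1) / size
                          + K[np.ix_(mask, mask)].sum() / size ** 2)
        # Step 3: assign every point to the closest cluster mean.
        new_labels = dist.argmin(axis=1)
        # Step 4: stop once no point is reassigned.
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels

# Toy data made up for illustration: two tight blobs of 20 and 40 points.
rng = np.random.default_rng(1)
A = np.vstack([rng.normal(0.0, 0.2, size=(20, 2)),
               rng.normal(4.0, 0.2, size=(40, 2))])
K = gaussian_kernel_matrix(A, q=1.0)
labels = kernel_kmeans(K, c=2)
print(labels)
```

The kernel matrix has n × n entries, which is where the higher time and memory cost relative to plain k-means comes from.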

Fig I: Result obtained by applying Gaussian Kernel k-means with 'q' =10

Advantages

1) Algorithm is able to identify the non-linear structures.

2) Algorithm is best suited for real life data set.

Disadvantages

1) The number of cluster centers needs to be predefined.

2) The algorithm is more complex than k-means and its time complexity is high, since it works with the n × n kernel matrix.

What is Dimensionality Reduction?

In machine learning classification problems, there are often too many factors on the basis of which
the final classification is done. These factors are basically variables called features. The higher the
number of features, the harder it gets to visualize the training set and then work on it. Sometimes,
most of these features are correlated, and hence redundant. This is where dimensionality reduction
algorithms come into play. Dimensionality reduction is the process of reducing the number of
random variables under consideration, by obtaining a set of principal variables. It can be divided
into feature selection and feature extraction.

Why is Dimensionality Reduction important in Machine Learning and Predictive Modeling?

An intuitive example of dimensionality reduction is a simple e-mail classification problem, where
we need to classify whether an e-mail is spam or not. This can involve a large number of features,
such as whether or not the e-mail has a generic title, the content of the e-mail, whether the e-mail
uses a template, etc. However, some of these features may overlap. As another example, a
classification problem that relies on both humidity and rainfall can be collapsed into just one
underlying feature, since the two are correlated to a high degree. Hence, we can reduce the number
of features in such problems. A 3-D classification problem can be hard to visualize, whereas a 2-D
one can be mapped to a simple 2-dimensional space, and a 1-D problem to a simple line. The figure
below illustrates this concept, where a 3-D feature space is split into two 1-D feature spaces, and
later, if found to be correlated, the number of features can be reduced even further.

Components of Dimensionality Reduction

There are two components of dimensionality reduction:
• Feature selection: Here we try to find a subset of the original set of variables, or features,
that can be used to model the problem. It usually takes one of three forms:
1. Filter
2. Wrapper
3. Embedded
• Feature extraction: This reduces the data in a high-dimensional space to a lower-dimensional
space, i.e. a space with fewer dimensions.
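
As a rough illustration of the filter style of feature selection, the sketch below drops any feature that is almost perfectly correlated with a feature already kept, using the humidity and rainfall example from the previous section. The helper name `drop_correlated`, the 0.95 threshold and the synthetic data are assumptions made for this example; real filter methods use many other scoring rules.

```python
import numpy as np

def drop_correlated(X, threshold=0.95):
    """Return indices of features to keep (hypothetical correlation filter).

    A feature is kept only if its absolute correlation with every
    previously kept feature is below the threshold.
    """
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, k] < threshold for k in kept):
            kept.append(j)
    return kept

# Synthetic data made up for illustration.
rng = np.random.default_rng(0)
humidity = rng.normal(60.0, 10.0, size=200)
rainfall = 0.5 * humidity + rng.normal(0.0, 0.5, size=200)  # nearly determined by humidity
wind = rng.normal(10.0, 3.0, size=200)                      # unrelated feature
X = np.column_stack([humidity, rainfall, wind])

kept = drop_correlated(X)
print(kept)  # indices of the retained columns
```

Feature selection of this kind keeps original columns unchanged, whereas feature extraction builds new columns from combinations of the old ones.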

Methods of Dimensionality Reduction

The various methods used for dimensionality reduction include:
• Principal Component Analysis (PCA)
• Linear Discriminant Analysis (LDA)
• Generalized Discriminant Analysis (GDA)

Dimensionality reduction may be either linear or non-linear, depending upon the method used. The
prime linear method, called Principal Component Analysis, or PCA, is discussed below.

Principal Component Analysis

This method was introduced by Karl Pearson. It works on the condition that when the data in a
higher-dimensional space is mapped to a lower-dimensional space, the variance of the data in the
lower-dimensional space should be maximal.

It involves the following steps:
• Construct the covariance matrix of the data.
• Compute the eigenvectors of this matrix.
• Use the eigenvectors corresponding to the largest eigenvalues to reconstruct a large fraction
of the variance of the original data.

Hence, we are left with fewer eigenvectors, and some data may have been lost in the process.
However, the most important variances should be retained by the remaining eigenvectors.
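
The three PCA steps above can be sketched with NumPy as follows. This is a minimal sketch, not a reference implementation: the function name `pca`, the centering step (needed so that the covariance is taken around the mean), the returned explained-variance fraction and the toy data are assumptions made for this example.

```python
import numpy as np

def pca(X, n_components):
    """PCA via eigendecomposition of the covariance matrix (hypothetical helper)."""
    Xc = X - X.mean(axis=0)                  # center the data
    cov = np.cov(Xc, rowvar=False)           # step 1: covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # step 2: eigenvectors (ascending eigenvalues)
    order = np.argsort(eigvals)[::-1][:n_components]
    W = eigvecs[:, order]                    # step 3: keep the largest-eigenvalue directions
    explained = eigvals[order].sum() / eigvals.sum()
    return Xc @ W, W, explained

# Toy data made up for illustration: 3-D points lying close to a single line.
rng = np.random.default_rng(0)
t = rng.normal(0.0, 3.0, size=200)
X = np.outer(t, [1.0, 2.0, 0.5]) + rng.normal(0.0, 0.1, size=(200, 3))

Z, W, explained = pca(X, n_components=1)
print(Z.shape)
print(explained)  # fraction of total variance retained by the kept component
```

The explained-variance fraction is one of the rules of thumb used in practice to decide how many components to keep.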
Advantages of Dimensionality Reduction
• It helps in data compression, and hence reduces storage space.
• It reduces computation time.
• It also helps remove redundant features, if any.

Disadvantages of Dimensionality Reduction
• It may lead to some amount of data loss.
• PCA tends to find linear correlations between variables, which is sometimes undesirable.
• PCA fails in cases where mean and covariance are not enough to define datasets.
• We may not know how many principal components to keep; in practice, some rules of thumb
are applied.
• Explain fold
• based filtering

3. Kernel Principal Component Analysis

Many machine learning problems are nonlinear, and nonlinear feature mappings can help to produce
new features that make prediction problems linear. In this section we discuss the following idea:
transform the dataset to a new higher-dimensional (in some cases infinite-dimensional) feature
space and apply PCA in that space in order to produce uncorrelated features. Such a method is
called Kernel Principal Component Analysis, or KPCA.

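Let us denote the covariance matrix in the new feature space as

$$C = \frac{1}{N} \sum_{i=1}^{N} \phi(x_i)\,\phi(x_i)^T$$

where $\phi(x_i)$ is the image of the data point $x_i$, $i = 1, \dots, N$, in the feature space. We will
consider that the dimensionality of the feature space equals $M$.

The eigendecomposition of $C$ is given by

$$C v_k = \lambda_k v_k, \quad k = 1, \dots, M$$

By the definition of $C$,

$$\frac{1}{N} \sum_{i=1}^{N} \phi(x_i)\,\phi(x_i)^T v_k = \lambda_k v_k$$

and therefore

$$v_k = \frac{1}{\lambda_k N} \sum_{i=1}^{N} \left( \phi(x_i)^T v_k \right) \phi(x_i)$$

It is easy to see that $v_k$ is a linear combination of the $\phi(x_i)$ and thus can be written as

$$v_k = \sum_{i=1}^{N} \alpha_{ki}\,\phi(x_i)$$

Substituting this into the equation above and writing it in matrix notation, we get

$$K \alpha_k = \lambda_k N \alpha_k$$

where $K$ is the Gram matrix in the feature space, $K_{ij} = \phi(x_i)^T \phi(x_j)$, and $\alpha_k$ are
column vectors with elements $\alpha_{k1}, \dots, \alpha_{kN}$. The eigenvectors of $C$ should be
orthonormal; therefore, we get the following:

$$v_k^T v_k = \alpha_k^T K \alpha_k = \lambda_k N \alpha_k^T \alpha_k = 1$$

Having the eigenvectors of $K$, we can get the projection of an item $x$ onto the $k$-th eigenvector:

$$\phi(x)^T v_k = \sum_{i=1}^{N} \alpha_{ki}\,\phi(x)^T \phi(x_i)$$

So far, we have assumed that the mapping $\phi$ is known. From the equations above, we can see
that the only thing we need for the data transformation is the eigendecomposition of the Gram
matrix $K$. The dot products that are its elements can be defined without any explicit definition
of $\phi$. The function defining such dot products in some Hilbert space is called a kernel. Valid
kernels satisfy the conditions of Mercer's theorem. There are many different types of kernels;
several popular ones are:

1. Linear: $\kappa(x, y) = x^T y$;
2. Gaussian: $\kappa(x, y) = \exp(-\gamma \lVert x - y \rVert^2)$;
3. Polynomial: $\kappa(x, y) = (x^T y + c)^d$.

Using a kernel function, we can write a new equation for the projection of a data item $x$ onto the
$k$-th eigenvector:

$$\phi(x)^T v_k = \sum_{i=1}^{N} \alpha_{ki}\,\kappa(x, x_i)$$

So far, we have assumed that the mapped data $\phi(x_i)$ have zero mean. Using

$$\tilde{\phi}(x_i) = \phi(x_i) - \frac{1}{N} \sum_{j=1}^{N} \phi(x_j)$$

and substituting it into the equation for $K$, we get

$$\tilde{K} = K - 1_N K - K 1_N + 1_N K 1_N$$

where $1_N$ is an $N \times N$ matrix in which each element equals $1/N$.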

Summary: Now we are ready to write the whole sequence of steps to perform KPCA:

1. Calculate the Gram matrix $K$.
2. Calculate the centered Gram matrix $\tilde{K}$.
3. Find the eigenvectors $\alpha_k$ of $\tilde{K}$ corresponding to nonzero eigenvalues and normalize
them: $\lambda_k N \alpha_k^T \alpha_k = 1$.
4. Sort the eigenvectors found in descending order of the corresponding eigenvalues.
5. Perform projections onto the given subset of eigenvectors.

The method described above requires choosing the number of components, the kernel and its
parameters. It should be noted that the number of nonlinear principal components is in general
infinite, but since we are computing the eigenvectors of an $N \times N$ matrix, we can calculate
at most $N$ nonlinear principal components.
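
The five summary steps can be sketched in NumPy as below. This is a minimal sketch under stated assumptions: a Gaussian kernel of the form exp(-gamma ||x - y||^2) is used, only the training points themselves are projected, the requested components are assumed to have strictly positive eigenvalues, and the function name `kernel_pca`, the parameter `gamma` and the random data are made up for this example.

```python
import numpy as np

def kernel_pca(X, n_components, gamma=1.0):
    """Kernel PCA with a Gaussian kernel (hypothetical helper)."""
    N = X.shape[0]
    # Step 1: Gram matrix of the kernel.
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    K = np.exp(-gamma * sq)
    # Step 2: center the Gram matrix; one_n has every entry equal to 1/N.
    one_n = np.full((N, N), 1.0 / N)
    K_c = K - one_n @ K - K @ one_n + one_n @ K @ one_n
    # Step 3: eigenvectors of the centered Gram matrix (eigh returns unit-norm columns).
    mu, A = np.linalg.eigh(K_c)
    # Step 4: sort in descending order of eigenvalue and keep the leading ones.
    order = np.argsort(mu)[::-1][:n_components]
    mu, A = mu[order], A[:, order]
    # Rescale so that eigenvalue * alpha^T alpha = 1 (assumes mu > 0 for kept components).
    A = A / np.sqrt(mu)
    # Step 5: projections of the training points onto the kept components.
    return K_c @ A, mu

# Random data made up for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
Z, mu = kernel_pca(X, n_components=2, gamma=0.5)
print(Z.shape)
print(mu)
```

The columns of the result are mutually uncorrelated, which is the property the method is designed to produce; projecting new points would additionally require centering their kernel values against the training set.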

Matrix Factorization:
Matrix factorization, as the name suggests, means factorizing a matrix, i.e. finding two (or more)
matrices such that when you multiply them you get back the original matrix.

Matrix factorization can be used to discover latent features underlying the interactions
between two different kinds of entities. (Of course, you can consider more than two
kinds of entities and you will be dealing with tensor factorization, which would be more
complicated.) And one obvious application is to predict ratings in collaborative filtering.

In a recommendation system such as Netflix or MovieLens, there is a group of users and
a set of items (movies for the above two systems). Given that each user has rated some
items in the system, we would like to predict how the users would rate the items that
they have not yet rated, so that we can make recommendations to the users. In this
case, all the information we have about the existing ratings can be represented in a
matrix. Assume now we have 5 users and 4 items, and ratings are integers ranging from
1 to 5; the matrix may look something like this (a hyphen means that the user has not yet
rated the movie):

      D1   D2   D3   D4
U1    5    3    -    1
U2    4    -    -    1
U3    1    1    -    5
U4    1    -    -    4
U5    -    1    5    4

Hence, the task of predicting the missing ratings can be considered as filling in the
blanks (the hyphens in the matrix) such that the values would be consistent with the
existing ratings in the matrix.

The intuition behind using matrix factorization to solve this problem is that there should
be some latent features that determine how a user rates an item. For example, two users
would give high ratings to a certain movie if they both like the actors/actresses of the
movie, or if the movie is an action movie, which is a genre preferred by both users.
Hence, if we can discover these latent features, we should be able to predict a rating with
respect to a certain user and a certain item, because the features associated with the user
should match with the features associated with the item.

In trying to discover the different features, we also make the assumption that the
number of features would be smaller than the number of users and the number of items.
It should not be difficult to understand this assumption because clearly it would not be
reasonable to assume that each user is associated with a unique feature (although this is
not impossible). And anyway if this is the case there would be no point in making
recommendations, because each of these users would not be interested in the items
rated by other users. Similarly, the same argument applies to the items.

The mathematics of matrix factorization

Having discussed the intuition behind matrix factorization, we can now go on to work
on the mathematics. Firstly, we have a set $U$ of users, and a set $D$ of items. Let $R$ of
size $|U| \times |D|$ be the matrix that contains all the ratings that the users have assigned to
the items. Also, we assume that we would like to discover $K$ latent features. Our task,
then, is to find two matrices $P$ (a $|U| \times K$ matrix) and $Q$ (a $|D| \times K$ matrix)
such that their product approximates $R$:

$$R \approx P \times Q^T = \hat{R}$$

In this way, each row of $P$ would represent the strength of the associations between a
user and the features. Similarly, each row of $Q$ would represent the strength of the
associations between an item and the features. To get the prediction of a rating of an
item $d_j$ by $u_i$, we can calculate the dot product of the two vectors corresponding
to $u_i$ and $d_j$:

$$\hat{r}_{ij} = p_i^T q_j = \sum_{k=1}^{K} p_{ik} q_{kj}$$

where $q_{kj}$ denotes the $(k, j)$ entry of $Q^T$.

Now, we have to find a way to obtain $P$ and $Q$. One way to approach this problem is to
first initialize the two matrices with some values, calculate how "different" their product
is from $R$, and then try to minimize this difference iteratively. Such a method is called
gradient descent, aiming at finding a local minimum of the difference.

The difference here, usually called the error between the estimated rating and the real
rating, can be calculated by the following equation for each user-item pair:

$$e_{ij}^2 = (r_{ij} - \hat{r}_{ij})^2 = \left( r_{ij} - \sum_{k=1}^{K} p_{ik} q_{kj} \right)^2$$

Here we consider the squared error because the estimated rating can be either higher or
lower than the real rating.

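To minimize the error, we have to know in which direction we have to modify the values
of $p_{ik}$ and $q_{kj}$. In other words, we need to know the gradient at the current values, and
therefore we differentiate the above equation with respect to these two variables
separately:

$$\frac{\partial}{\partial p_{ik}} e_{ij}^2 = -2 (r_{ij} - \hat{r}_{ij})\, q_{kj} = -2 e_{ij} q_{kj}$$

$$\frac{\partial}{\partial q_{kj}} e_{ij}^2 = -2 (r_{ij} - \hat{r}_{ij})\, p_{ik} = -2 e_{ij} p_{ik}$$

Having obtained the gradient, we can now formulate the update rules for
both $p_{ik}$ and $q_{kj}$:

$$p'_{ik} = p_{ik} - \alpha \frac{\partial}{\partial p_{ik}} e_{ij}^2 = p_{ik} + 2 \alpha e_{ij} q_{kj}$$

$$q'_{kj} = q_{kj} - \alpha \frac{\partial}{\partial q_{kj}} e_{ij}^2 = q_{kj} + 2 \alpha e_{ij} p_{ik}$$

Here, $\alpha$ is a constant whose value determines the rate of approaching the minimum.
Usually we will choose a small value for $\alpha$, say 0.0002. This is because if we take too
large a step towards the minimum we run the risk of overshooting it and ending up
oscillating around it.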

A question might have come to your mind by now: if we find two matrices $P$ and $Q$ such
that $P \times Q^T$ approximates $R$, won't our predictions of all the unseen ratings
be zeros? In fact, we are not really trying to come up with $P$ and $Q$ such that we can
reproduce $R$ exactly. Instead, we will only try to minimise the errors of the observed
user-item pairs. In other words, if we let $T$ be a set of tuples, each of which is in the form
of $(u_i, d_j, r_{ij})$, such that $T$ contains all the observed user-item pairs together with the
associated ratings, we are only trying to minimise every $e_{ij}$ for $(u_i, d_j, r_{ij}) \in T$. (In other
words, $T$ is our set of training data.) As for the rest of the unknowns, we will be able to
determine their values once the associations between the users, items and features have
been learnt.

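Using the above update rules, we can then iteratively perform the operation until the
error converges to its minimum. We can check the overall error as calculated using the
following equation and determine when we should stop the process.

$$E = \sum_{(u_i, d_j, r_{ij}) \in T} e_{ij}^2 = \sum_{(u_i, d_j, r_{ij}) \in T} \left( r_{ij} - \sum_{k=1}^{K} p_{ik} q_{kj} \right)^2$$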
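
The gradient descent procedure described above can be sketched as follows, applied to the example ratings table. This is a minimal sketch under stated assumptions: unobserved ratings (the hyphens) are encoded as 0, K = 2 latent features is an arbitrary choice, the fixed number of 5000 passes replaces a proper convergence check, and the function name `matrix_factorization` and its parameters are made up for this example.

```python
import numpy as np

def matrix_factorization(R, K, alpha=0.0002, steps=5000, seed=0):
    """Factorize R into P (users x K) and Q (items x K); 0 in R means unobserved."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    P = rng.random((n_users, K))   # random initial user-feature strengths
    Q = rng.random((n_items, K))   # random initial item-feature strengths
    # Training set: only the observed user-item pairs.
    T = [(i, j) for i in range(n_users) for j in range(n_items) if R[i, j] > 0]
    for _ in range(steps):
        for i, j in T:
            e_ij = R[i, j] - P[i] @ Q[j]        # error of the current prediction
            p_old = P[i].copy()                 # use the old P row when updating Q
            P[i] += 2 * alpha * e_ij * Q[j]     # update rule for the user row
            Q[j] += 2 * alpha * e_ij * p_old    # update rule for the item row
    # Overall squared error on the observed entries.
    E = float(sum((R[i, j] - P[i] @ Q[j]) ** 2 for i, j in T))
    return P, Q, E

# Ratings from the example table; 0 stands for a hyphen (not yet rated).
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [1, 0, 0, 4],
              [0, 1, 5, 4]], dtype=float)

P, Q, E = matrix_factorization(R, K=2)
R_hat = P @ Q.T   # predictions for every user-item pair, including the unrated ones
print(np.round(R_hat, 2))
print(E)
```

The entries of the predicted matrix at the positions of the hyphens are the predicted ratings; in practice the loop would stop once the overall error falls below a chosen tolerance rather than after a fixed number of passes.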
Matrix completion:
