
Unsupervised Learning: Harsha Vardhan Reddy Burri

Unsupervised learning involves identifying hidden patterns in data without labeled outputs or targets. The main goals are to prepare clusters of similar data points and estimate the density of data distribution in the feature space. Common unsupervised learning techniques include k-means clustering, which groups data points into k clusters based on minimizing distances to centroid points, and hierarchical clustering, which builds nested clusters based on similarity. Unsupervised learning has applications in areas like market analysis, biology, and web mining.


Unsupervised Learning

Harsha Vardhan Reddy Burri


Unsupervised Learning
• There is no output, response, or target variable; only the input variables (X) are available.
• The major goal is to identify hidden patterns and relationships in the data.
• Typical tasks are forming clusters and finding how the data is distributed in the feature space (density estimation).
• Example: grouping fruits by physical characteristics such as shape, color, odor, and size
Grouping:
• Green color – bananas and grapes
• Green color and big size – bananas
• Green color and small size – grapes
• Red color – apples and cherries
• Red color and big size – apples
• Red color and small size – cherries
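The fruit grouping above can be mimicked in a few lines of Python. This is only an illustrative sketch: the `fruits` records, the feature names, and the `group_by` helper are assumptions made up for this example. A real clustering algorithm would discover such groups from the features itself rather than from hand-picked keys.

```python
from collections import defaultdict

# Hypothetical records: no "fruit type" label is used for grouping,
# only the observed physical characteristics.
fruits = [
    {"name": "banana", "color": "green", "size": "big"},
    {"name": "grape", "color": "green", "size": "small"},
    {"name": "apple", "color": "red", "size": "big"},
    {"name": "cherry", "color": "red", "size": "small"},
]


def group_by(items, *features):
    """Group item names by the values of the chosen features."""
    groups = defaultdict(list)
    for item in items:
        key = tuple(item[f] for f in features)
        groups[key].append(item["name"])
    return dict(groups)


# Color alone gives two coarse groups; adding size separates every fruit.
print(group_by(fruits, "color"))
print(group_by(fruits, "color", "size"))
```

Using more features gives finer groups, which is the same effect seen when moving from "green color" to "green color and big size".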
Real Life examples
• You meet strangers at a party and need to group them without any prior knowledge. How? On the basis of gender, age, habits, and other behavioural traits.
• You find a new instance that differs from the others. How do you detect or classify it?
Challenges
• Harder than supervised learning tasks.
• Dealing with a large number of dimensions and a large number of data items can be problematic because of time complexity.
• The effectiveness of the method depends on the definition of “distance” (for distance‐based clustering).
• The result of a clustering algorithm (which in many cases can itself be arbitrary) can be interpreted in different ways.
• How do we know whether the results are meaningful when no answer labels are available?
– Let an expert look at the results (external evaluation).
– Define an objective function on the clustering (internal evaluation).
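One common choice of objective function for internal evaluation is the within-cluster sum of squared distances to each cluster mean (lower is better for a fixed number of clusters). The sketch below assumes one-dimensional data and uses a hypothetical helper name, `within_cluster_ss`; the sample values are the expenditure figures from the K‐means example later in this document.

```python
def within_cluster_ss(clusters):
    """Sum of squared distances from each point to its own cluster mean.

    Assumes each cluster is a non-empty list of numbers (1-D data).
    """
    total = 0.0
    for cluster in clusters:
        center = sum(cluster) / len(cluster)
        total += sum((x - center) ** 2 for x in cluster)
    return total


# Two candidate groupings of the same nine values.
first_guess = [[2, 3, 4], [10, 11, 12, 20, 25, 30]]
final_result = [[2, 3, 4, 10, 11, 12], [20, 25, 30]]

print(within_cluster_ss(first_guess))   # 348.0
print(within_cluster_ss(final_result))  # 150.0
```

The second grouping scores lower, so by this particular objective it is the tighter clustering. Other objectives can rank groupings differently, which is why the choice of objective matters.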
Applications
• Can be applied in many fields
• Market analysis:
– Grouping customers
• Biology:
– Classification of plants and animals given their features
– Analysis of genes and genomes
• Insurance:
– Identifying groups of motor insurance policy holders with a high average claim cost; identifying fraud
• Earthquake studies:
– Clustering observed earthquake epicenters to identify dangerous zones
• World Wide Web:
– Document classification; clustering weblog data to discover groups of similar access patterns
Types of Unsupervised algorithms
• K‐means clustering
• Hierarchical clustering
• Principal Component Analysis
K‐means Clustering 
• Unsupervised learning algorithm
• Works on unlabelled data (no target label)
• Goal is to find patterns and form clusters
Steps in K‐means:
• 1: Pick k random points as cluster centers (also called centroids): c1, c2, c3, …, ck
• 2: Assign each data point to the nearest cluster by calculating its distance to each centroid
• 3: Find the new center of each cluster by taking the average of its assigned points
• 4: Repeat steps 2 and 3 until none of the cluster assignments change
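The four steps above can be sketched in plain Python as follows. This is a minimal illustration, not a production implementation: it assumes points are tuples of numbers, uses Euclidean distance (`math.dist`, Python 3.8+), and the function name `kmeans` and its `init`, `max_iter`, and `seed` parameters are choices made for this example.

```python
import math
import random


def kmeans(points, k, init=None, max_iter=100, seed=None):
    """Batch k-means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    # Step 1: pick k data points at random as the initial centroids,
    # unless the caller passes explicit starting centroids via `init`.
    start = init if init is not None else rng.sample(points, k)
    centroids = [tuple(c) for c in start]
    labels = None
    for _ in range(max_iter):
        # Step 2: assign every point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Step 4: stop once no assignment has changed.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: move each centroid to the average of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return labels, centroids


points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
labels, centroids = kmeans(points, k=2, init=[(1, 1), (8, 8)])
print(labels)
print(centroids)
```

Because step 1 is random, different runs without `init` or `seed` can start from different centroids and, on less clearly separated data, may end in different clusterings.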
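Dataset = [2, 3, 4, 10, 11, 12, 20, 25, 30]  # monthly expenditure (in 1000s) of customers

Cluster 1: 2, 3, 4 (mean = 3)
Cluster 2: 10, 11, 12, 20, 25, 30 (mean = 18)

Cluster 1: 2, 3, 4, 10 (mean = 4.75 ≈ 5)
Cluster 2: 11, 12, 20, 25, 30 (mean = 19.6 ≈ 20)

Cluster 1: 2, 3, 4, 10, 11 (mean = 6)
Cluster 2: 12, 20, 25, 30 (mean = 21.75 ≈ 22)

Cluster 1: 2, 3, 4, 10, 11, 12 (mean = 7)
Cluster 2: 20, 25, 30 (mean = 25)

At each step, the point that lies closer to the other cluster's mean moves across (10, then 11, then 12). After the last step no point is closer to the other mean, so the assignments stop changing.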
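The worked example moves one point at a time and recomputes the means after every move. The sketch below reproduces that trace under this reading of the slides; the helper names `find_misplaced` and `trace` are hypothetical. Note that the batch form of K‐means (reassign all points, then update all means) reaches the same final clusters here but can move several points in a single pass, so its intermediate clusters may differ.

```python
def mean(values):
    return sum(values) / len(values)


def find_misplaced(clusters, means):
    """Return (source, target, point) for the first point that is strictly
    closer to another cluster's mean than to its own, or None."""
    for i, cluster in enumerate(clusters):
        if len(cluster) == 1:
            continue  # never empty a cluster
        for p in cluster:
            nearest = min(range(len(means)), key=lambda j: abs(p - means[j]))
            if nearest != i and abs(p - means[nearest]) < abs(p - means[i]):
                return i, nearest, p
    return None


def trace(clusters, max_moves=100):
    """Record clusters and means before each single-point move."""
    clusters = [sorted(c) for c in clusters]
    history = []
    for _ in range(max_moves):
        means = [mean(c) for c in clusters]
        history.append(([list(c) for c in clusters], means))
        move = find_misplaced(clusters, means)
        if move is None:
            break  # assignments are stable
        source, target, p = move
        clusters[source].remove(p)
        clusters[target].append(p)
        clusters[target].sort()
    return history


history = trace([[2, 3, 4], [10, 11, 12, 20, 25, 30]])
for clusters, means in history:
    print(clusters, [round(m, 2) for m in means])
```

The printed means are exact values (for example 4.75 and 19.6), which the slides show rounded to whole numbers.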

Applications: 
1. Image segmentation
2. Clustering genome data – gene segments
3. Data mining segmentation
4. Anomaly detection
5. Instance classification
6. Customer classification
