L05 Unsupervised Learning - Overview

Uploaded by black hello
Copyright © All Rights Reserved
Overview

Introduction
• a machine learning technique in which users do not need to supervise the model
• allows the model to work on its own to discover patterns and information that were previously undetected
• mainly deals with unlabeled data

Unsupervised Learning Overview
[Diagram]
• fit: unlabeled data (no answers) → model learns the structure
• predict: new unlabeled data + model → map the new data to the structure
Introduction
• Why unsupervised learning?
  • finds all kinds of unknown patterns in data
  • finds features that can be useful for categorization
  • can take place in real time, so all the input data can be analyzed and labeled in the presence of learners
  • it is easier to get unlabeled data from a computer than labeled data, which needs manual intervention

Types of Unsupervised Learning Algorithms
• Clustering
  • finds a structure or pattern in a collection of uncategorized data
  • processes data and finds natural clusters (groups) if they exist in the data
  • types: exclusive (partitioning), agglomerative, overlapping, probabilistic
• Dimension reduction
  • uses structural characteristics to simplify data
  • e.g., PCA is a dimension reduction technique that finds the variance-maximizing directions onto which to project the data
• Association
  • a rule-learning problem that establishes associations among data objects inside large databases
  • for example, people who buy a new home are likely to also buy new furniture
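The home-and-furniture example can be quantified with the two standard association-rule measures, support and confidence. Apriori-style algorithms use these measures to decide which rules to keep. This is a minimal pure-Python sketch. The baskets are invented for illustration, and `support`/`confidence` are hypothetical helper names, not a library API.

```python
# Hypothetical shopping baskets (illustrative data, not from the slides).
transactions = [
    {"new_home", "furniture", "paint"},
    {"new_home", "furniture"},
    {"new_home", "paint"},
    {"furniture", "lamp"},
    {"new_home", "furniture", "lamp"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent) = support(A ∪ C) / support(A)."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)

# Rule: {new_home} -> {furniture}
rule_support = support({"new_home", "furniture"}, transactions)
rule_confidence = confidence({"new_home"}, {"furniture"}, transactions)
print(f"support={rule_support:.2f}, confidence={rule_confidence:.2f}")
# support=0.60, confidence=0.75
```

Here, 3 of the 5 baskets contain both items, and 3 of the 4 baskets with a new home also contain furniture.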

Some of the Unsupervised Learning Algorithms
• Clustering
  • K-means - data points are assigned to K groups, where K is the number of clusters, based on their distance from each group's centroid
  • Hierarchical clustering, also known as hierarchical cluster analysis (HCA) - can be agglomerative (bottom-up merging of clusters) or divisive (top-down splitting of clusters)
  • Probabilistic models - solve density estimation or "soft" clustering problems, where data points are clustered based on the likelihood that they belong to a particular distribution (e.g., Gaussian mixture models)
• Dimension reduction
  • Principal component analysis (PCA) - reduces redundancies and compresses datasets through feature extraction
  • Singular value decomposition (SVD) - factorizes a matrix A into three matrices, A = UΣVᵀ; keeping only the largest singular values gives a low-rank approximation of A
  • Autoencoders - leverage neural networks to compress data and then recreate a new representation of the original input
• Association
  • Apriori algorithm - used for market basket analysis, which underlies recommendation engines for music platforms and online retailers
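The K-means bullet can be made concrete with a short NumPy sketch of Lloyd's algorithm, the classic K-means procedure. It also mirrors the fit/predict pattern of unsupervised learning. Several choices here are assumptions. Centroids are initialised from randomly chosen data points, and `kmeans`/`predict` are hypothetical helper names. Library implementations differ: scikit-learn's `KMeans`, for example, uses k-means++ initialisation by default.

```python
import numpy as np

def predict(X, centroids):
    """Assign each row of X to the index of its closest centroid (Euclidean)."""
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate an assignment step and an update step."""
    rng = np.random.default_rng(seed)
    # Initialise centroids with k distinct points picked at random from X.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = predict(X, centroids)  # assignment step
        # Update step: move each centroid to the mean of its points
        # (keep the old centroid if its cluster is empty).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # converged
        centroids = new_centroids
    return centroids, predict(X, centroids)

# Fit on unlabeled data with two obvious groups.
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]])
centroids, labels = kmeans(X, k=2)

# Predict: map new, unseen points to the learned structure.
new_labels = predict(np.array([[0.5, 0.5], [9.0, 9.0]]), centroids)
print(labels, new_labels)
```

The cluster indices themselves are arbitrary. Only the grouping is meaningful, so the first three points share one label and the last three share the other.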
Types of Unsupervised Learning
• Clustering: identify unknown structure in data
• Dimensionality Reduction: use structural characteristics to simplify data
Clustering: Finding Distinct Groups
[Diagram]
• fit: text articles of unknown topics + model → fitted model
• predict: text articles of unknown topics + model → similar articles
Applications of Clustering
• Recommendation engines
• Market segmentation
• Social network analysis
• Search result grouping
• Medical imaging
• Image segmentation
• Anomaly detection
Dimensionality Reduction: Simplifying Structure
[Diagram]
• fit: high-resolution images + model → fitted model
• predict: high-resolution images + model → compressed images
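One way to realise the fit/predict compression shown above is PCA, which can be computed through the SVD listed among the algorithms. The minimal NumPy sketch below uses synthetic 5-D data as a stand-in for image pixels; that substitution is an assumption. `pca_compress` and `pca_reconstruct` are hypothetical names, not a library API.

```python
import numpy as np

def pca_compress(X, n_components):
    """Project centred data onto its top principal directions, found via SVD."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    codes = Xc @ components.T  # compressed representation
    return codes, components, mean

def pca_reconstruct(codes, components, mean):
    """Map compressed codes back to the original feature space (approximately)."""
    return codes @ components + mean

rng = np.random.default_rng(0)
# Hypothetical data: 200 samples in 5-D that really vary along only 2 directions.
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + 0.01 * rng.normal(size=(200, 5))

codes, components, mean = pca_compress(X, n_components=2)
X_hat = pca_reconstruct(codes, components, mean)
print(codes.shape)  # (200, 2)
print(np.abs(X - X_hat).max())
```

Each sample is stored as 2 numbers instead of 5. The reconstruction error stays small because almost all of the variance lies in the two kept directions.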
Applications of Unsupervised Machine Learning
• Clustering automatically splits the dataset into groups based on their similarities
• Anomaly detection can discover unusual data points in your dataset; it is useful for finding fraudulent transactions
• Association mining identifies sets of items that often occur together in your dataset
• Latent variable models are widely used for data preprocessing, such as reducing the number of features in a dataset or decomposing the dataset into multiple components
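The slides do not name a specific anomaly-detection method. As one simple illustration, a z-score rule flags values that lie far from the mean. This is a swapped-in technique, not one taken from the source. The transaction amounts and the helper name `zscore_anomalies` are hypothetical.

```python
import numpy as np

def zscore_anomalies(amounts, threshold=3.0):
    """Flag values lying more than `threshold` standard deviations from the mean."""
    amounts = np.asarray(amounts, dtype=float)
    z = (amounts - amounts.mean()) / amounts.std()
    return np.abs(z) > threshold

# Hypothetical transaction amounts: 19 ordinary purchases and one suspicious one.
amounts = [42, 55, 48, 51, 60, 47, 53, 49, 58, 45,
           52, 50, 46, 57, 54, 44, 56, 43, 59, 5000]
flags = zscore_anomalies(amounts)
print([a for a, f in zip(amounts, flags) if f])  # [5000]
```

The outlier inflates the mean and standard deviation it is measured against, so this rule can miss anomalies in small samples. Median- and quartile-based scores, the same idea behind RobustScaler, are less affected.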

Disadvantages of Unsupervised Learning
• you cannot get precise information about how the data is sorted, because the data used in unsupervised learning is unlabeled and the expected output is not known
• results can be less accurate because the input data is not labeled by people in advance, so the machine has to do this itself
• in image classification, the spectral classes do not always correspond to informational classes
• the user needs to spend time interpreting and labeling the classes produced by the classification
• spectral properties of classes can also change over time, so the same class information cannot be reused when moving from one image to another
Pre-processing and Scaling
• data preprocessing and normalization are very important when implementing many machine learning algorithms
• because scaling can significantly affect the outcome of the learning model, it is important that all features are on the same scale
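Why scale matters is easiest to see with Euclidean distance, which many unsupervised methods such as K-means rely on. The customers and the feature ranges below are hypothetical, chosen only to show one feature swamping another.

```python
import math

# Hypothetical customers described by (annual income in dollars, age in years).
a = (50_000, 25)
b = (50_500, 60)   # nearly the same income as a, but 35 years older
c = (53_000, 26)   # $3,000 more income than a, almost the same age

# Unscaled: the income axis (thousands of units) swamps the age axis,
# so b looks like a's nearest neighbour.
print(round(math.dist(a, b), 1))  # 501.2
print(round(math.dist(a, c), 1))  # 3000.0

# Assumed (hypothetical) feature ranges used for min-max scaling.
LOW = (20_000, 18)
HIGH = (120_000, 80)

def min_max(point):
    """Rescale each feature of `point` to [0, 1] using the assumed ranges."""
    return tuple((v - lo) / (hi - lo) for v, lo, hi in zip(point, LOW, HIGH))

# Scaled: the 35-year age gap now counts, so c becomes a's nearest neighbour.
print(round(math.dist(min_max(a), min_max(b)), 3))  # 0.565
print(round(math.dist(min_max(a), min_max(c)), 3))  # 0.034
```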

Types of pre-processing and scaling
• StandardScaler
  • ensures that each feature in the dataset has mean 0 and variance 1, bringing all features to the same magnitude
  • does not guarantee any particular minimum or maximum values for the features
• RobustScaler
  • similar to StandardScaler, but uses the median and quartiles (interquartile range) instead of the mean and variance
  • this makes the scaler largely ignore data points that are very different from the rest (e.g., measurement errors)
• Normalizer
  • scales each data point so that its feature vector has a Euclidean length of 1
  • every data point is scaled by a different number (the inverse of its length)
  • used when only the direction of the data matters, not the length of the feature vector
• MinMaxScaler
  • transforms all input variables so that they are on the same scale, between zero and one
  • computes the minimum and maximum of each feature on the training data, then applies the min-max transformation to each feature

Source: https://levelup.gitconnected.com/importance-of-data-preprocessing-and-scaling-in-machine-learning-21db1d4377ec
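The formulas behind the four scalers can be sketched in a few lines of NumPy. These functions mimic the behaviour described for scikit-learn's StandardScaler, RobustScaler, MinMaxScaler and Normalizer, but they are not those classes. They compute statistics on the array they are given rather than fitting on training data and reusing the statistics on test data. They also do not guard against constant columns, which would cause division by zero. The function names and data are illustrative.

```python
import numpy as np

def standard_scale(X):
    """StandardScaler-style: per column, subtract the mean and divide by the std."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def robust_scale(X):
    """RobustScaler-style: per column, subtract the median and divide by the IQR."""
    q1, median, q3 = np.percentile(X, [25, 50, 75], axis=0)
    return (X - median) / (q3 - q1)

def min_max_scale(X):
    """MinMaxScaler-style: per column, map [min, max] onto [0, 1]."""
    col_min, col_max = X.min(axis=0), X.max(axis=0)
    return (X - col_min) / (col_max - col_min)

def normalize_rows(X):
    """Normalizer-style: divide each row (data point) by its Euclidean length."""
    return X / np.linalg.norm(X, axis=1, keepdims=True)

# Hypothetical data: two features on very different scales; row 5 holds an outlier.
X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 400.0],
              [4.0, 500.0],
              [100.0, 600.0]])

for name, scale in [("standard", standard_scale), ("robust", robust_scale),
                    ("min_max", min_max_scale), ("normalized", normalize_rows)]:
    print(name, np.round(scale(X)[:, 0], 3))
```

In column 0, the outlier (100.0) squeezes the other MinMax values into the range 0 to about 0.03. The robust version maps them to -1, -0.5, 0 and 0.5, which is the behaviour the RobustScaler bullet describes.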
Types of pre-processing and scaling
[Figure not reproduced]
Source: https://levelup.gitconnected.com/importance-of-data-preprocessing-and-scaling-in-machine-learning-21db1d4377ec
References:
• https://www.guru99.com/unsupervised-machine-learning.html
• https://stanford.edu/~shervine/teaching/cs-229/cheatsheet-unsupervised-learning#dimension-reduction
• https://www.ibm.com/cloud/learn/unsupervised-learning
• https://levelup.gitconnected.com/importance-of-data-preprocessing-and-scaling-in-machine-learning-21db1d4377ec
• https://www.analyticsvidhya.com/blog/2016/11/an-introduction-to-clustering-and-different-methods-of-clustering/
• https://www.digitalvidya.com/blog/the-top-5-clustering-algorithms-data-scientists-should-know/