
UNIT-4

Unsupervised Learning Techniques: Clustering, K-Means, Limits of
K-Means, Using Clustering for Image Segmentation, Using Clustering
for Preprocessing, Using Clustering for Semi-Supervised Learning,
DBSCAN, Gaussian Mixtures.
Dimensionality Reduction: The Curse of Dimensionality, Main
Approaches for Dimensionality Reduction, PCA, Using Scikit-Learn,
Randomized PCA, Kernel PCA.
Clustering
1. What is Clustering?
Clustering is a way to organize data into groups based on similarities. It’s a method used
in machine learning where you don’t have any labels or categories to guide you.
2. No Labels Needed:
In clustering, you work with data that doesn’t have any predefined tags. For instance, if
you have a list of animals without species names, clustering can help you group them
based on similar traits like size, habitat, or diet.
3. Finding Patterns:
The goal of clustering is to discover patterns or structures in the data. It helps you see how
data points are related to one another, even if you didn’t know those relationships existed
before.
4. Real-World Uses:
You can use clustering in many areas, like:
1. Customer Segmentation: Grouping customers who buy similar products.
2. Social Networks: Identifying communities of users with similar interests.
3. Image Recognition: Grouping similar images together for easier processing.

• In essence, clustering helps you make sense of large datasets by showing you how data
points can be grouped based on their similarities, without needing prior labels or
classifications.
Clustering
Clustering is the process of organizing a set of data points into groups, or clusters, based on their
similarities. In clustering:
• Similar Data Points: Data points within the same group are more similar to each other.
• Dissimilar Data Points: Data points in different groups are more different from each other.
Essentially, clustering helps to categorize objects based on how closely related they are, making it
easier to analyze and understand complex datasets.
For example, in a scatter plot of the data, you might notice that certain data points are closely
grouped together. These closely clustered points can be classified into a single group. By observing
the plot, we can identify that there are three distinct clusters present. Each cluster contains data
points that are similar to each other, while the points in different clusters are more dissimilar.
Clustering
Clustering is used in many ways, including:

• a) Customer Segmentation: Businesses group customers based on what they buy or how
they behave online. This helps tailor products and marketing to different customer types.
• b) Data Analysis: When looking at new data, finding clusters of similar items makes it
easier to understand and analyze each group.
• c) Dimensionality Reduction: Clustering can simplify data by reducing the number of
features. Each data point can be represented by how much it belongs to each cluster,
making it easier to work with.
• d) Anomaly Detection: Clusters can help identify unusual behavior. For example, if a
user acts very differently from others, they may be flagged as an anomaly, which can help
catch fraud or defects.
• e) Semi-Supervised Learning: If you have only a few labeled examples, clustering helps
spread those labels to similar instances, increasing the amount of labeled data for training.
• f) Search Engines: Search engines can find similar images by clustering all images.
When you upload a reference image, it quickly finds and returns images from the same
cluster.
• g) Image Segmentation: By grouping pixels based on color, you can simplify an image,
making it easier to detect and track objects.
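As a quick illustration of use (g), pixels can be clustered by color and each pixel replaced by its cluster's center color. This is a hedged sketch, not from the original notes: the tiny synthetic "image" and the choice of 2 clusters are assumptions for demonstration.

```python
import numpy as np
from sklearn.cluster import KMeans

# Tiny synthetic 4x4 RGB "image": left half reddish, right half bluish
image = np.zeros((4, 4, 3))
image[:, :2] = [0.9, 0.1, 0.1]   # reddish pixels
image[:, 2:] = [0.1, 0.1, 0.9]   # bluish pixels

# Flatten to one row per pixel, then cluster pixels by color
pixels = image.reshape(-1, 3)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(pixels)

# Replace every pixel with its cluster's center color -> segmented image
segmented = kmeans.cluster_centers_[kmeans.labels_].reshape(image.shape)
```

On a real photograph the same three lines (reshape, fit, map back through the centers) reduce the image to k flat color regions, which is the simplification described above.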
k-means clustering
K-Means Clustering is a popular and easy-to-understand method for
grouping data into clusters. Here’s how it works:

1. Choose Clusters: Decide how many clusters you want, let’s say k clusters.
Then, randomly select k points from the data as the starting "centers" of
these clusters.
2. Assign Points: For each data point, find the closest center and assign the
point to that cluster.
3. Update Centers: After assigning all points, calculate the average position
of the points in each cluster. This average becomes the new center for that
cluster.
4. Repeat: Repeat the assignment and update steps until the centers no longer
change much (they converge).
5. Final Clusters: Once the centers are stable, the data points closest to each
center form the final clusters, with each cluster represented by its center.
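The five steps above can be sketched directly with NumPy. This is a minimal illustrative implementation (the function name and the toy two-blob dataset are my own); it assumes no cluster ever ends up empty, which holds for this toy data:

```python
import numpy as np

def k_means(X, k, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: randomly pick k data points as the initial centers
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Step 2: assign each point to its closest center
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: each center moves to the mean of its assigned points
        # (assumes no cluster becomes empty -- fine for this toy data)
        new_centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Step 4: stop once the centers no longer change
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Step 5: the stable centers define the final clusters
    return labels, centers

# Two well-separated blobs of 10 points each
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0, 0.1, (10, 2)), rng.normal(5, 0.1, (10, 2))])
labels, centers = k_means(X, k=2)
```

A production implementation (e.g. Scikit-Learn's KMeans, used later in this unit) adds smarter initialization and multiple restarts, which the next section's disadvantages motivate.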
Disadvantages of K-Means Clustering
1. Choosing k Manually:
You have to decide how many clusters k to use. A “Loss vs. Clusters” plot can help
find the best k.
2. Dependence on Initial Values:
The result can change based on where you start. To reduce this issue, run k-means
multiple times with different starting points and choose the best outcome. For
larger datasets, more advanced methods for picking initial centers (such as
k-means++ seeding) are needed.
3. Varying Sizes and Densities:
K-means struggles when clusters have different sizes or densities. It may not group
them effectively without adjustments to the algorithm.
4. Clustering Outliers:
Outliers can distort the results, dragging the center (centroid) away or forming their
own cluster. It may help to remove or adjust outliers before clustering.
5. High Dimensions:
As the number of features increases, the distances between points become less
meaningful, making clustering harder. You might need to reduce dimensions using
techniques like PCA or consider different clustering methods.
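The “Loss vs. Clusters” plot in point 1 (often called the elbow method) can be sketched with Scikit-Learn, which this unit uses later. The three-blob toy dataset below is an assumption for illustration; the loss is KMeans' inertia_, the sum of squared distances to the closest center:

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy data: three well-separated blobs, so the "elbow" should sit at k = 3
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.3, size=(30, 2)) for c in (0, 5, 10)])

inertias = []
for k in range(1, 7):
    # n_init=10 reruns k-means from different starts (point 2) and keeps the best
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias.append(km.inertia_)  # loss: sum of squared distances to closest center
```

Plotting inertias against k shows a sharp drop up to k = 3 and only small gains after; that bend is the “elbow” used to choose k.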
