
Part 1 - Clustering
Introduction
• Clustering or cluster analysis is a machine learning technique that
groups an unlabelled dataset. It can be defined as "a way of grouping
the data points into different clusters consisting of similar data points.
Objects with possible similarities remain in a group that has few or no
similarities with another group."
Application of clustering
• Market Segmentation – Businesses use clustering to group their customers and
use targeted advertisements to reach a larger audience.
• Market Basket Analysis – Shop owners analyze their sales to figure out which
items are frequently bought together by customers. For example, a well-known US
study found that diapers and beer were often bought together by fathers.
• Social Network Analysis – Social media sites use your data to understand your
browsing behaviour and provide you with targeted friend recommendations or
content recommendations.
• Medical Imaging – Doctors use clustering to identify diseased areas in diagnostic
images like X-rays.
• Anomaly Detection – Clustering can be used to find outliers in a real-time data
stream or to flag potentially fraudulent transactions.
• Simplify working with large datasets – Each cluster is given a cluster ID after
clustering is complete, so an entire feature vector can be reduced to its cluster ID.
Clustering is effective when it can represent a complicated case with a
straightforward cluster ID, and the same principle can make complex datasets
simpler to work with.
Requirements of clustering
• Scalability
• Dealing with different types of attributes
• Discovery of clusters with arbitrary shape
• Minimal requirement for domain knowledge to determine input parameters
• Handling noisy data
• Incremental clustering
• Insensitivity to input order
• Handling high dimensional data
• Handling constraints
• Interpretability and usability
(for explanations, refer to the textbook PDF sent)
Types of clustering
Partitioning method
• Partitioning methods are a type of clustering approach used
to divide a dataset into distinct groups (or clusters) based on
the similarity of data points. The goal is to assign each data
point to exactly one cluster such that the data points in the
same cluster are more similar to each other than to those in
other clusters.
• Key Characteristics
1.Flat structure: These methods create non-overlapping, flat
clusters (no hierarchy).
2.Number of clusters: The number of clusters (k) is typically
predefined.
3.Iterative refinement: Clusters are iteratively refined to
minimize a specific objective function (e.g., within-cluster
variance).
K-Means Clustering
• A centroid-based clustering technique, K-Means is used to partition a dataset
into k distinct clusters. The aim is to minimize the intra-cluster distance (distance
within a cluster) while maximizing the inter-cluster distance (distance between
clusters).
Key Concepts
• Input Parameter k: Number of clusters, provided by the user.
• Cluster Mean (Centroid): The central point of a cluster, calculated as the mean of
all data points in the cluster.
• Distance Metric: Typically, the Euclidean distance is used to measure the
similarity between data points and centroids.
Objective Function
• The K-Means algorithm minimizes the sum of squared errors (SSE), or within-
cluster variance, defined as:
SSE = Σ_{i=1..k} Σ_{x ∈ C_i} ||x − μ_i||², where μ_i is the mean (centroid) of cluster C_i.
Steps in K-Means Algorithm
Initialization:
• Randomly select k data points from the dataset as the initial cluster
centroids.
Assignment:
• For each data point in the dataset, calculate the distance to each
cluster centroid.
• Assign the data point to the nearest cluster.
Update:
• Recalculate the centroid of each cluster as the mean of all data points
assigned to it.
Repeat:
Steps 2 and 3 (above) are repeated iteratively until one of the following
stopping conditions is met:
1.The centroids no longer change significantly.
2.A maximum number of iterations is reached.
3.The intra-cluster variance no longer decreases.
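A minimal NumPy sketch of the steps above follows; the function name kmeans, the tolerance tol, and the random seed are illustrative choices, not from the slides.
```python
import numpy as np

def kmeans(X, k, max_iter=100, tol=1e-4, seed=0):
    """Plain K-Means: random initialization, assign, update, repeat."""
    rng = np.random.default_rng(seed)
    # 1. Initialization: pick k random data points as the initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # 2. Assignment: each point goes to its nearest centroid (Euclidean distance)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # 3. Update: recompute each centroid as the mean of the points assigned to it
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # 4. Stop when the centroids no longer change significantly
        if np.linalg.norm(new_centroids - centroids) < tol:
            centroids = new_centroids
            break
        centroids = new_centroids
    return labels, centroids
```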
Strengths
• Efficient: Computationally simple and fast, especially for small to
medium-sized datasets.
• Scalable: Works well with large datasets when optimized.
• Versatile: Can handle numeric data effectively.
Limitations
1.Predefined k:
1. The number of clusters (k) must be known in advance, which is often not
intuitive.
2.Sensitivity to Initialization:
1. Poor choice of initial centroids can lead to suboptimal clustering.
3.Not Robust to Outliers:
1. Outliers can disproportionately influence the centroids.
4.Shape Assumption:
1. Assumes clusters are spherical and equally sized, making it unsuitable for
clusters with irregular shapes or widely varying sizes.
• Problems (refer to the textbook PDF sent)
K-Medoids Clustering
• K-Medoids is a clustering algorithm similar to K-Means, but instead of using
centroids as cluster centers, it uses actual data points (medoids). This
makes it more robust to outliers and noise because it minimizes the sum of
dissimilarities rather than squared distances.
Key Concepts
• Medoid: A representative data point within a cluster, which minimizes the
total dissimilarity (distance) to all other points in the cluster.
• Distance Metric: Any dissimilarity measure (e.g., Manhattan distance,
Euclidean distance, etc.) can be used.
Objective Function
• The K-Medoids algorithm minimizes the sum of dissimilarities between data
points and their assigned medoids:
Cost = Σ_{i=1..k} Σ_{x ∈ C_i} d(x, m_i), where m_i is the medoid of cluster C_i.
Steps in K-Medoids Algorithm
Initialization:
•Randomly select k data points from the dataset as the initial medoids.
Assignment:
•Assign each data point to the nearest medoid based on the chosen distance
metric.
Update:
•For each cluster, calculate the total dissimilarity if a non-medoid point in
the cluster is swapped with the current medoid.
•If the total dissimilarity decreases, update the medoid for that cluster.
Repeat:
• Steps 2 and 3 are repeated until:
• The medoids no longer change.
• The total dissimilarity no longer decreases.
• A maximum number of iterations is reached.
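Below is a rough NumPy sketch of this procedure using Manhattan distance and a simplified medoid update (picking the in-cluster point with the lowest total dissimilarity) rather than the full PAM swap search; the function name k_medoids and its parameters are assumptions for illustration.
```python
import numpy as np

def k_medoids(X, k, max_iter=100, seed=0):
    """Simplified K-Medoids with Manhattan dissimilarity."""
    rng = np.random.default_rng(seed)
    n = len(X)
    # Pairwise Manhattan dissimilarities between all points
    D = np.abs(X[:, None, :] - X[None, :, :]).sum(axis=2)
    medoids = rng.choice(n, size=k, replace=False)
    for _ in range(max_iter):
        # Assignment: each point joins the cluster of its nearest medoid
        labels = D[:, medoids].argmin(axis=1)
        new_medoids = medoids.copy()
        # Update: within each cluster, pick the point minimizing total dissimilarity
        for j in range(k):
            members = np.where(labels == j)[0]
            if len(members) == 0:
                continue
            costs = D[np.ix_(members, members)].sum(axis=1)
            new_medoids[j] = members[costs.argmin()]
        # Stop when the medoids no longer change
        if np.array_equal(np.sort(new_medoids), np.sort(medoids)):
            break
        medoids = new_medoids
    return labels, medoids
```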

• Problems (refer to the textbook PDF sent)
Hierarchical Clustering
• Hierarchical clustering is a clustering technique that builds
a hierarchy of clusters. Unlike partitioning methods (e.g., K-
Means or K-Medoids), it does not require the user to specify
the number of clusters in advance. Instead, it produces a
tree-like structure called a dendrogram, which represents
the hierarchy of clusters.
• Types of Hierarchical Clustering
• Agglomerative Clustering (Bottom-Up):
• Starts with each data point as its own cluster.
• Gradually merges clusters until all data points belong to one cluster or a
stopping condition is met.
• Divisive Clustering (Top-Down):
• Starts with all data points in a single cluster.
• Recursively splits clusters into smaller ones until each data point is its
own cluster or a stopping condition is met.
Agglomerative Clustering
Agglomerative clustering is the most common type of hierarchical
clustering, where the process starts with each data point as its own cluster
and iteratively merges the closest clusters until all points are grouped into
one large cluster or a predefined number of clusters is reached.
How it works
Initialization:
•Each data point is treated as its own cluster (singleton cluster).
Compute Pairwise Distances:
•Calculate the distance (or dissimilarity) between every pair of clusters
using a specified distance metric (e.g., Euclidean, Manhattan).
Merge Closest Clusters:
•Identify the two clusters with the smallest distance and merge them into a
new cluster.
Update Distances:
•Recalculate the distances between the newly formed cluster and all
remaining clusters using a linkage criterion.
Repeat:
• Steps 3 and 4 are repeated until:
• All data points belong to a single cluster (producing a dendrogram).
• Or the process stops at a predefined number of clusters.
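As an illustration, the same bottom-up process can be run with SciPy's hierarchical clustering utilities; the toy data, the single-linkage choice, and the cut at two clusters below are arbitrary assumptions.
```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

# Toy 2-D data (illustrative values only)
X = np.array([[1.0, 1.1], [1.2, 0.9], [5.0, 5.2], [5.1, 4.8], [9.0, 9.1]])

# Build the merge tree using single linkage on Euclidean distances
Z = linkage(X, method="single", metric="euclidean")

# Either cut the tree at a chosen number of clusters...
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)

# ...or plot the full dendrogram (requires matplotlib)
# import matplotlib.pyplot as plt
# dendrogram(Z); plt.show()
```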
Flowchart of agglomerative clustering
Density-Based Clustering
• Density-based clustering is a type of clustering algorithm designed to
find clusters of arbitrary shape and handle noise (outliers) effectively. It
groups data points that are closely packed together while marking
points in low-density regions as noise. The most popular density-based
clustering algorithm is DBSCAN (Density-Based Spatial Clustering
of Applications with Noise).

• Core Point: A point is considered a core point if at least MinPts
(minimum number of points) are within its ε (radius of neighborhood).
• Border Point: A point that is not a core point but lies within the ε-radius
of a core point. It belongs to a cluster but does not contribute to expanding it.
• Noise Point: A point that is neither a core point nor a border point. It is
considered an outlier.
• Cluster: Formed by connecting core points that are within the ε-radius of
each other, along with any border points associated with those core points.
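A small illustrative run of DBSCAN via scikit-learn follows; the toy data and the eps/min_samples values are made-up assumptions chosen so that the far-away point is marked as noise.
```python
import numpy as np
from sklearn.cluster import DBSCAN

# Two dense blobs plus one far-away point that should be flagged as noise
X = np.array([[1.0, 1.0], [1.1, 0.9], [0.9, 1.1],
              [5.0, 5.0], [5.1, 4.9], [4.9, 5.1],
              [12.0, 0.0]])

# eps is the ε neighborhood radius, min_samples is MinPts
db = DBSCAN(eps=0.5, min_samples=3).fit(X)
print(db.labels_)  # cluster IDs per point; noise points are labelled -1
```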
Grid-Based Clustering
• Grid-based clustering is a clustering method that divides the data
space into a grid structure composed of a finite number of cells, and
clustering is performed based on the density of data points in these
cells. This method is computationally efficient, as it depends on the
number of grid cells rather than the number of data points.
• Key Concepts in Grid-Based Clustering
1.Grid Structure:
1. The data space is divided into a grid with m-dimensional cells (hyper-
rectangular regions) based on the number of features (dimensions) in the
data.
2.Cell Density:
1. The density of a cell is defined as the number of data points it contains.
2. A threshold (density threshold) is used to classify cells as dense or sparse.
3.Cluster Formation:
1. Dense cells that are adjacent (or connected) form clusters.
2. Sparse cells are considered noise or outliers.
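The following toy sketch shows one way these ideas might be coded with NumPy and SciPy; the function name grid_cluster, the bin count, and the density threshold are assumptions, and real grid-based algorithms (e.g., STING, CLIQUE) are considerably more elaborate.
```python
import numpy as np
from scipy import ndimage

def grid_cluster(X, n_bins=10, density_threshold=3):
    """Toy grid-based clustering: bin points, keep dense cells, connect neighbours."""
    # 1. Divide the data space into a grid and count the points in each cell
    counts, edges = np.histogramdd(X, bins=n_bins)
    # 2. Cells at or above the density threshold are 'dense'; the rest are sparse/noise
    dense = counts >= density_threshold
    # 3. Adjacent dense cells form clusters (connected-component labelling)
    cell_labels, n_clusters = ndimage.label(dense)
    # 4. Map each point back to its cell's cluster ID (0 means noise/sparse cell)
    idx = [np.clip(np.digitize(X[:, d], edges[d][1:-1]), 0, n_bins - 1)
           for d in range(X.shape[1])]
    return cell_labels[tuple(idx)], n_clusters
```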
Part 2 - Instance Based Learning
Instance Based Learning
• Instance-based learning is a machine learning approach that makes
decisions or predictions based on specific instances or examples from
the training data. Instead of explicitly creating a generalized model
from the data, instance-based learning algorithms keep the training
data and rely on it to make predictions when new data points are
encountered.
Key Characteristics
• Memory-based: The model retains the training instances, meaning
it requires more memory and storage compared to model-based
approaches.
• Lazy Learning: These algorithms delay generalization until a query
is made. They don't create a model until prediction time, which can
make training faster but prediction slower.
• Local Generalization: Predictions are based on a small subset of
the training data near the new instance rather than a global model.
Instance Based Learning
Advantages
• Adaptability: Can model complex decision boundaries with enough
training data.
• No Training Phase: Quick to implement since no model needs to be pre-
built.
• High Accuracy: Works well with large datasets where relationships aren't
linear.
Disadvantages
• Computational Cost: Prediction can be slow due to the need to scan the
entire dataset.
• Memory Requirement: Stores all or most of the training data.
• Overfitting: May perform poorly with noisy data or irrelevant features
unless preprocessed.
k-NEAREST NEIGHBOR LEARNING
• k-NN is a supervised learning algorithm used for both
classification and regression. It predicts the output of a data
point based on the outputs of its k nearest neighbors in the
feature space.
• The KNN algorithm finds observations in the training set that are similar
to the new observation. These observations are called neighbors. For
better accuracy, a set of K neighbors can be considered for classifying a
new observation. The class of the new observation is predicted to be the
class that the majority of its neighbors belong to.
• In the figure, observations belong to two classes represented by triangle
and circle shapes. To find the class for a new observation, a set of
neighbors, marked by the circle, is examined. As a majority of the
neighbors belong to class B, the new observation is classified as class B.
KNN Algorithm
• The neighbors are found by computing the distance between
observations. Euclidean distance is one of the most widely used
distance metrics. It is given by
d(x, y) = √( Σ_{i=1..n} (x_i − y_i)² )
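A compact sketch of the basic k-NN classifier described above, assuming NumPy arrays for the training data and labels; the function name knn_predict is illustrative.
```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    """Classify x_new by majority vote among its k nearest training points."""
    # Euclidean distance from the new observation to every training observation
    dists = np.linalg.norm(X_train - x_new, axis=1)
    # Indices of the k closest neighbors
    nearest = np.argsort(dists)[:k]
    # Majority class among those neighbors
    return Counter(y_train[nearest]).most_common(1)[0][0]
```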
Distance-Weighted Nearest Neighbor (DWNN)
• The Distance-Weighted Nearest Neighbor (DWNN)
algorithm is a variant of k-Nearest Neighbors (k-NN) that
assigns weights to each neighbor based on their distance from
the query point. Closer neighbors are given higher weights,
making them more influential in the prediction process. This
improves the model's performance, especially when neighbors
are unevenly spaced or have varying importance.
• Steps:
• Calculate the distance between the new data point and all data
points in the training dataset, just like in basic k-NN.
• Select the k-nearest data points like in k-NN.
• Assign a weight to each of the k-nearest neighbors inversely
proportional to their distance from the query point. In simpler
terms, closer neighbors get higher weights, while
farther neighbors get lower weights.
Distance-Weighted Nearest Neighbor (DWNN)
• For classification tasks, when determining the predicted class,
the contributions of neighbors are scaled by their weights.
Closer neighbors have a stronger influence on the
prediction.
• For regression tasks, the values associated with neighbors are
multiplied by their weights before averaging, giving more
importance to closer neighbors in the prediction.
• Formulas
For a discrete-valued target: f̂(x_q) = argmax_v Σ_{i=1..k} w_i · δ(v, f(x_i)), where w_i = 1 / d(x_q, x_i)².
For a real-valued target: f̂(x_q) = ( Σ_{i=1..k} w_i f(x_i) ) / ( Σ_{i=1..k} w_i ).
If the query point exactly matches one of the training instances x_i, the denominator d(x_q, x_i)²
is zero; in this case we simply assign f̂(x_q) = f(x_i).
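A short sketch of distance-weighted k-NN for a real-valued target, following the formulas above; the function name dwnn_predict and the eps guard for exact matches are illustrative assumptions.
```python
import numpy as np

def dwnn_predict(X_train, y_train, x_new, k=5, eps=1e-12):
    """Distance-weighted k-NN regression: closer neighbors get larger weights."""
    dists = np.linalg.norm(X_train - x_new, axis=1)
    nearest = np.argsort(dists)[:k]
    d = dists[nearest]
    # Exact match with a training instance: return its target directly
    if d[0] < eps:
        return y_train[nearest[0]]
    # Weights inversely proportional to the squared distance
    w = 1.0 / d ** 2
    return np.sum(w * y_train[nearest]) / np.sum(w)
```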
Locally Weighted Regression (LWR) / Locally
Weighted Linear Regression (LWLR)

Locally Weighted Linear Regression (LWLR) is
a non-parametric regression technique that
aims to fit a linear regression model to a
dataset by giving more weight to nearby data
points. For example, consider a dataset of
temperature readings and corresponding
energy consumption. LWLR can be used to
predict the energy consumption for a given
temperature reading by fitting a linear
regression model to the training data, where
the weight assigned to each training data
point is inversely proportional to its distance
from the query point. This means that training
data points that are closer to the query point
will have a higher weight and contribute more
to the linear regression model.
• Some important points to remember regarding LWR.
• LWR is a non-parametric regression technique that fits a linear regression model to a
dataset by giving more weight to nearby data points.

• LWR fits a separate linear regression model for each query point based on the weights
assigned to the training data points.

• The weights assigned to each training data point are inversely proportional to their
distance from the query point.

• Training data points that are closer to the query point will have a higher weight and
contribute more to the linear regression model.

• LWLR is useful when a global linear model does not capture the relationship
between the input and output variables well. The goal is to capture local patterns in the
data.
Locally Weighted Regression (LWR)
Key features
Locality:
•Instead of using the entire dataset to fit a single model, LWR focuses on
the data points near the target input value.
•Nearby points are given higher importance (weight), and farther points
are given lower weight.
Weights:
•LWR assigns weights to data points based on their distance from the
query point.
•A common weighting function is the Gaussian kernel:
w(i) = exp( −||x(i) − x_q||² / (2τ²) )
Here, x_q is the query point, and τ (bandwidth) controls the influence of distant points.
Local Model:
•For a given query point x_q, LWR fits a simple linear regression model (or any desired
regression model) weighted by w(i).
•The model minimizes the following weighted loss function:
J(θ) = Σ_i w(i) ( y(i) − θᵀ x(i) )²
Predictions:
•The fitted model at x_q is used to predict y_q.
•This process is repeated for every query point, making the method computationally
expensive for large datasets.
Algorithm Steps:
1.Choose a query point xq.
2.Compute weights w(i) for all points in the dataset, using a kernel function.
3.Fit a weighted linear regression model using the weights.
4.Predict the output yq for xq using the fitted model.
5.Repeat for all desired query points.
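A minimal sketch of these steps for a single query point, assuming a 2-D NumPy feature matrix X and using the Gaussian kernel and weighted least-squares solution given above; the function name lwr_predict is illustrative.
```python
import numpy as np

def lwr_predict(X, y, x_q, tau=0.5):
    """Locally weighted linear regression prediction at a single query point x_q."""
    # Add an intercept column to the design matrix and the query point
    Xb = np.c_[np.ones(len(X)), X]
    xq_b = np.r_[1.0, np.atleast_1d(x_q)]
    # Gaussian kernel weights: nearby points count more
    w = np.exp(-np.sum((X - x_q) ** 2, axis=1) / (2 * tau ** 2))
    W = np.diag(w)
    # Weighted least squares: theta = (X^T W X)^{-1} X^T W y
    theta = np.linalg.pinv(Xb.T @ W @ Xb) @ Xb.T @ W @ y
    return xq_b @ theta
```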
Advantages:
•Flexibility: Captures non-linear relationships in data.
•Local Adaptation: Adapts to varying trends across different regions of the input space.

Disadvantages:
•Computationally Expensive: Each prediction requires fitting a new model, making it inefficient
for large datasets.
•Bandwidth Sensitivity: The choice of τ significantly affects the model's performance.
•Small τ: Too sensitive, prone to overfitting.
•Large τ: Too smooth, prone to underfitting.

Applications:
•Data visualization
•Non-linear regression when a global model is inadequate.
•Situations where interpretability is important locally.
Radial basis function (RBF)
• Kernels play a fundamental role in transforming data into higher-dimensional
spaces, enabling algorithms to learn complex patterns and relationships.
Among the diverse kernel functions, the Radial Basis Function (RBF) kernel
stands out as a versatile and powerful tool. The Radial Basis Function (RBF)
kernel, also known as the Gaussian kernel, is one of the most widely used
kernel functions. It operates by measuring the similarity between data points
based on their Euclidean distance in the input space. Mathematically, the RBF
kernel between two data points x and x′ is defined as
K(x, x′) = exp( −||x − x′||² / (2σ²) )
where:
• ||x − x′||² represents the squared Euclidean distance between the two data
points
• σ is a parameter known as the bandwidth or width of the kernel, controlling
the smoothness of the decision boundary.
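As a quick illustration, the kernel can be evaluated directly; the sample points and the sigma value below are arbitrary.
```python
import numpy as np

def rbf_kernel(x, x_prime, sigma=1.0):
    """Gaussian RBF kernel: similarity decays with squared Euclidean distance."""
    sq_dist = np.sum((np.asarray(x) - np.asarray(x_prime)) ** 2)
    return np.exp(-sq_dist / (2 * sigma ** 2))

# Nearby points score close to 1, distant points close to 0
print(rbf_kernel([0.0, 0.0], [0.1, 0.2]))   # ≈ 0.975
print(rbf_kernel([0.0, 0.0], [3.0, 4.0]))   # ≈ 3.7e-06
```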
Applications in Machine Learning:
1.Kernel in Support Vector Machines (SVMs):
1. The Gaussian RBF is a popular kernel function used to transform data into a
higher-dimensional space where it becomes linearly separable.
2. Kernel function: K(x, x′) = exp( −γ ||x − x′||² )
3. where γ is a hyperparameter (equivalent to 1/(2σ²)).
2.Radial Basis Function Networks (RBFNs):
1. A type of neural network that uses RBF neurons in the hidden layer.
2. Used for function approximation, regression, and classification.
3.Interpolation and Approximation:
1. RBFs are extensively used in interpolating scattered data points or surface
fitting (e.g., in geostatistics).
4.Clustering and Density Estimation:
1. Gaussian Mixture Models (GMMs) use Gaussian RBFs for clustering and
density estimation.
• For more details refer : https://hackernoon.com/radial-basis-functions-types-
advantages-and-use-ca
Radial basis function (RBF) in ANN
Case-Based Reasoning
• Case-Based Reasoning (CBR) is an approach to problem-solving and learning that
uses past experiences (or "cases") to address new problems. It is widely applied in
fields like artificial intelligence, medical diagnosis, legal reasoning, customer support,
and more.
1.Case:
A case typically consists of:
1.Problem description: Details of the situation or problem encountered.
2.Solution: The actions or decisions taken to resolve the problem.
3.Outcome: The result of applying the solution.

2.Case Base:
A repository or database where cases are stored. These past cases are used to guide
the reasoning process.

3.Reasoning Process in CBR: The CBR cycle often consists of four steps:
1.Retrieve: Identify and retrieve the most similar past cases from the case base.
2.Reuse: Adapt the solutions from the retrieved cases to solve the new problem.
3.Revise: Test the solution in the real world and revise it if necessary.
4.Retain: If the solution works well, save it as a new case in the case base for future reference.
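A toy sketch of one pass through this cycle is shown below; the case structure, the similarity measure, and the example cases are invented purely for illustration, and the Revise step is only noted as a comment.
```python
import numpy as np

# Toy case base: each case pairs a problem description (feature vector) with a solution
case_base = [
    {"problem": np.array([38.5, 1, 0]), "solution": "flu treatment"},
    {"problem": np.array([36.8, 0, 1]), "solution": "allergy medication"},
]

def solve(new_problem):
    """One CBR pass: Retrieve the most similar case, Reuse its solution, Retain the new case."""
    # Retrieve: nearest past case by Euclidean distance over the problem features
    best = min(case_base, key=lambda c: np.linalg.norm(c["problem"] - new_problem))
    # Reuse: adopt (or adapt) the retrieved solution
    solution = best["solution"]
    # Revise: in practice the solution is tested and corrected here (omitted in this toy)
    # Retain: store the confirmed new case for future reasoning
    case_base.append({"problem": new_problem, "solution": solution})
    return solution

print(solve(np.array([38.2, 1, 0])))  # -> "flu treatment"
```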
Benefits of CBR:
• Leverages past knowledge: CBR builds on existing knowledge,
avoiding the need to reinvent the wheel for every problem.
• Handles complex problems: It can provide solutions for complex,
poorly understood, or incomplete problems.
• Adaptive: The system evolves by learning from new experiences.
Applications:
1.Medical Diagnosis: Using past cases to diagnose and recommend
treatments for diseases.
2.Legal Reasoning: Applying precedents from previous cases to
resolve legal disputes.
3.Customer Support: Solving customer queries based on similar past
issues.
4.Engineering Design: Reusing solutions for similar design challenges.
5.Education: Adaptive tutoring systems that recommend solutions
based on past learner behavior.
