ML Unit-4-1
Clustering
Cluster analysis or simply clustering is the process of partitioning a set of data objects (or
observations) into subsets. Each subset is a cluster, such that objects in a cluster are similar to
one another, yet dissimilar to objects in other clusters. The set of clusters resulting from a
cluster analysis can be referred to as a clustering. In this context, different clustering methods
may generate different clusterings on the same data set. The partitioning is not performed by
humans, but by the clustering algorithm. Hence, clustering is useful in that it can lead to the
discovery of previously unknown groups within the data.
Cluster analysis has been widely used in many applications such as business intelligence,
image pattern recognition, Web search, biology, and security.
• In business intelligence, clustering can be used to organize a large number of customers into groups, where customers within a group share strongly similar characteristics. This
facilitates the development of business strategies for enhanced customer relationship
management.
• In image recognition, clustering can be used to discover clusters or “subclasses” in
handwritten character recognition systems. Suppose we have a data set of handwritten
digits, where each digit is labeled as either 1, 2, 3, and so on. Note that there can be a
large variance in the way in which people write the same digit. Take the number 2, for
example. Some people may write it with a small circle at the bottom-left part, while
some others may not. We can use clustering to determine subclasses for “2,” each of
which represents a variation on the way in which 2 can be written. Using multiple
models based on the subclasses can improve overall recognition accuracy.
• Clustering has also found many applications in Web search. For example, a keyword
search may often return a very large number of hits (i.e., pages relevant to the search)
due to the extremely large number of web pages. Clustering can be used to organize the
search results into groups and present the results in a concise and easily accessible way.
A cluster is a collection of data objects that are similar to one another within the cluster and dissimilar to objects in other clusters. Because of this, a cluster of data objects can be treated as an implicit class, and clustering is sometimes called automatic classification. Again, a critical difference here is that clustering can automatically find the groupings. This is a distinct advantage of cluster analysis.
Clustering is also called data segmentation in some applications because clustering partitions
large data sets into groups according to their similarity. Clustering can also be used for outlier
detection, where outliers (values that are “far away” from any cluster) may be more interesting
than common cases. Applications of outlier detection include the detection of credit card fraud
and the monitoring of criminal activities in electronic commerce.
Types of Clustering:
1. Partitioning methods.
• k-Means: A Centroid-Based Technique
• k-Medoids: A Representative Object-Based Technique
• CLARANS (Clustering Large Applications based upon RANdomized Search)
2. Hierarchical methods.
• Agglomerative versus Divisive Hierarchical Clustering.
• Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH).
• Chameleon: Multiphase Hierarchical Clustering Using Dynamic Modeling.
3. Density-based methods
• DBSCAN (Density-Based Spatial Clustering of Applications with Noise).
• OPTICS: Ordering Points to Identify the Clustering Structure.
4. Grid-based methods
• STING: STatistical INformation Grid
• CLIQUE (CLustering In QUEst)
k-Means
Suppose a data set, D, contains n objects in Euclidean space. Partitioning methods distribute the objects in D into k clusters, C1, ..., Ck. An objective function is used to assess the partitioning quality so that objects within a cluster are similar to one another but dissimilar to objects in other clusters. That is, the objective function aims for high intracluster similarity and low intercluster similarity.
A centroid-based partitioning technique uses the centroid of a cluster, Ci , to represent that
cluster. Conceptually, the centroid of a cluster is its center point. The centroid can be defined
in various ways such as by the mean or medoid of the objects (or points) assigned to the cluster.
The difference between an object p ∈ Ci and ci, the representative of the cluster, is measured by dist(p, ci), where dist(x, y) is the Euclidean distance between two points x and y. The quality of cluster Ci can be measured by the within-cluster variation, which is the sum of squared error between all objects in Ci and the centroid ci, defined as
E = Σ_{i=1}^{k} Σ_{p ∈ Ci} dist(p, ci)²
where E is the sum of the squared error for all objects in the data set; p is the point in space representing a given object; and ci is the centroid of cluster Ci.
k-means algorithm:
Input: k, the number of clusters, and D, a data set containing n objects.
Output: A set of k clusters.
Method:
1. Arbitrarily choose k objects from D as the initial cluster centers.
2. Repeat:
3. (Re)assign each object to the cluster to which it is the most similar, based on the mean value of the objects in the cluster.
4. Update the cluster means, that is, calculate the mean value of the objects in each cluster.
5. Until no change.
Example: Consider a set of objects located in 2-D space, as depicted in Figure (a). Let k =3,
that is, the user would like the objects to be partitioned into three clusters.
Figure: Clustering of a set of objects using the k-means method; in (b) the cluster centers are updated and objects are reassigned accordingly (the mean of each cluster is marked by a +).
We arbitrarily choose three objects as the three initial cluster centers, where cluster centers are
marked by a +. Each object is assigned to a cluster based on the cluster center to which it is the
nearest. Such a distribution forms silhouettes encircled by dotted curves, as shown in Figure
(a).
Next, the cluster centers are updated. That is, the mean value of each cluster is recalculated
based on the current objects in the cluster. Using the new cluster centers, the objects are
redistributed to the clusters based on which cluster center is the nearest. Such a redistribution
forms new silhouettes encircled by dashed curves, as shown in Figure (b).
This process iterates, leading to Figure (c). The process of iteratively reassigning objects to
clusters to improve the partitioning is referred to as iterative relocation. Eventually, no
reassignment of the objects in any cluster occurs and so the process terminates. The resulting
clusters are returned by the clustering process.
• The k-means method is not guaranteed to converge to the global optimum and often
terminates at a local optimum. The results may depend on the initial random selection
of cluster centers.
• To obtain good results in practice, it is common to run the k-means algorithm multiple
times with different initial cluster centers.
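A minimal sketch of this practice using scikit-learn's KMeans (the data below is randomly generated purely for illustration): setting n_init restarts the algorithm from several different initial centers and keeps the partitioning with the lowest within-cluster sum of squared errors (the quantity E defined above, exposed as inertia_).
import numpy as np
from sklearn.cluster import KMeans

# Illustrative data: 100 random points in 2-D space
X = np.random.RandomState(42).rand(100, 2)

# Run k-means 10 times from different initial centers and keep the best result
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)

print("Cluster centers:\n", kmeans.cluster_centers_)
print("Within-cluster sum of squared errors (E):", kmeans.inertia_)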
Limits of K-Means:
1. Sensitivity to Initial Conditions
• K-means is sensitive to initial conditions. The algorithm randomly initializes the
cluster centroids at the beginning, and the final clustering results can vary
depending on these initial positions.
• Different initializations can converge to different local optima, resulting in different clustering outcomes. This makes the K-means algorithm less reliable and less reproducible.
2. Difficulty in Determining the Number of Clusters (K)
• One of the drawbacks of the K-means algorithm is that we have to set the number of clusters (K) in advance. Choosing an incorrect number of clusters can lead to inaccurate results. Various methods are available to estimate the optimal K, such as silhouette analysis or the elbow method (a short sketch of silhouette analysis follows this list), but they may not always provide a clear-cut answer.
• If we choose a K that is too small, the resulting clusters will be too broad.
• It may require multiple runs to find the most suitable value of K, which can be time- and resource-consuming.
3. Inability to Handle Categorical Data
• The algorithm works with numerical data, where distances between data points can
be calculated. However, categorical data doesn’t have a natural notion of distance
or similarity.
• When categorical data is used with the K-means algorithm, it requires converting
the categories into numerical values, such as using one-hot encoding.
• One shortcoming of using one-hot encoding is that it treats each feature
independently and can degrade performance since it can significantly increase data
dimensionality.
4. Time Complexity
• The time complexity of the algorithm is O(n * K * M * D), where K is the number
of clusters, n is the number of data points, D is the number of dimensions, and M is
the number of iterations.
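As mentioned in the list above, here is a rough sketch of silhouette analysis for choosing K, using randomly generated data purely for illustration; the K with the highest silhouette score is a reasonable candidate.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Illustrative data: 200 random points in 2-D space
X = np.random.RandomState(0).rand(200, 2)

# Compare candidate values of K by their silhouette score
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    print(f"K={k}: silhouette score = {silhouette_score(X, labels):.3f}")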
Using Clustering for Image Segmentation
Image segmentation is the process of dividing an image into distinct regions or segments,
where each region has some common characteristic. The goal is to simplify or change the
representation of an image, making it more meaningful and easier to analyze. Each segment
can represent different objects, boundaries, textures, or regions of interest.
Types of Image Segmentation
1. Thresholding: Pixels are grouped based on intensity values (e.g., foreground vs.
background).
2. Edge Detection: Segmentation based on identifying edges in the image (using
techniques like Sobel, Canny).
3. Region-based Segmentation: Dividing the image based on regions with similar
properties, such as color or texture.
4. Clustering-based Segmentation: Using clustering algorithms to group pixels with
similar features.
K-Means Clustering for Image Segmentation
K-Means is a partitional clustering algorithm that divides data into k clusters, where k is
predefined. It assigns each data point to the nearest cluster center and then iteratively updates
the centers based on the points assigned to them. It’s an unsupervised learning algorithm
because it doesn’t require labeled data.
K-Means Algorithm Steps:
1. Choose k cluster centers (centroids): These can be initialized randomly or by some
other method (like K-Means++).
2. Assign each point to the nearest centroid: Compute the Euclidean distance between
each point and the centroids and assign each pixel to the closest centroid.
3. Update the centroids: After assignment, update the centroids by calculating the mean
of all the points in each cluster.
4. Repeat: Repeat steps 2 and 3 until the centroids no longer change significantly
(convergence) or a specified number of iterations is reached.
Using K-Means for Image Segmentation
In the context of image segmentation, K-Means is applied to group similar pixels based on
color, intensity, or other features, making it easier to segment objects, textures, or regions in an
image.
Process of K-Means for Image Segmentation:
1. Flatten the Image: The image, which is typically a 3D array of shape (height, width,
channels), is reshaped into a 2D array of shape (num_pixels, channels) where each row
represents a pixel’s RGB (or grayscale) value.
2. Clustering: K-Means is applied to the pixel data. Each pixel is assigned to one of the
k clusters, where k is the number of desired segments or regions.
3. Reconstructing the Image: The result of K-Means clustering is a set of centroids (the
average color of each cluster). The image is then reconstructed by replacing each pixel's
original color with the corresponding centroid color.
4. Output: The output is a segmented image with k regions, each having similar pixel
values. These regions could represent objects or different parts of an image.
Example code:
import cv2
import numpy as np
import matplotlib.pyplot as plt
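A possible completion of this example, continuing from the imports above; the file name "image.jpg" and the choice of k = 4 are illustrative assumptions, and OpenCV's cv2.kmeans is used for the clustering step.
# Load the image and convert from BGR (OpenCV default) to RGB
img = cv2.imread("image.jpg")                      # hypothetical file path
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

# Step 1: flatten the image to a (num_pixels, 3) array of float32 RGB values
pixels = img.reshape(-1, 3).astype(np.float32)

# Step 2: cluster the pixel colors with k-means
k = 4                                              # number of segments (illustrative)
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 100, 0.2)
_, labels, centers = cv2.kmeans(pixels, k, None, criteria, 10, cv2.KMEANS_RANDOM_CENTERS)

# Step 3: rebuild the image, replacing each pixel with its cluster's centroid color
segmented = centers[labels.flatten()].astype(np.uint8).reshape(img.shape)

# Step 4: show the original and segmented images side by side
plt.subplot(1, 2, 1); plt.imshow(img); plt.title("Original")
plt.subplot(1, 2, 2); plt.imshow(segmented); plt.title(f"Segmented (k = {k})")
plt.show()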
Using Clustering for Preprocessing
Using clustering in data preprocessing can enhance the quality and efficiency of your machine
learning model by grouping similar data points and creating new features, reducing
dimensionality, or addressing data issues like noise and outliers.
Clustering is an unsupervised machine learning technique where data points are grouped based
on similarities. The idea is that points in the same group (or cluster) share certain characteristics
and should be closer to each other in some feature space than to points in other clusters.
Common clustering algorithms include:
• K-means
• DBSCAN (Density-Based Spatial Clustering)
• Hierarchical Clustering
• Gaussian Mixture Models (GMM)
Each of these algorithms has its strengths and is chosen based on the data's nature (e.g., the
expected shape of the clusters, noise levels, etc.)
Why Use Clustering in Preprocessing?
Clustering is used in preprocessing to simplify, structure, and enhance the data. Here's how
it can be applied at various stages:
1. Feature Engineering
Clustering can be used to create new features that help machine learning models identify
patterns or relationships in the data more easily. By labeling each data point with the cluster it
belongs to, you can add a cluster label as an additional feature, which often improves model
performance.
Example: Suppose you have a dataset of customer information (age, income, spending
behavior) and want to predict customer churn. By clustering the customers into groups based
on their spending patterns, the cluster label can serve as a feature to help the model better
understand the behavior of customers who are likely to churn.
Steps:
1. Apply a clustering algorithm like K-means to the customer data.
2. Assign each customer to a cluster (label).
3. Use the cluster label as a feature for a classification model (e.g., predicting whether a
customer will churn).
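A minimal sketch of these steps on a hypothetical customer table (the columns, cluster count, and data are illustrative assumptions; feature scaling is omitted for brevity):
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical customer data: age, income, spending score
rng = np.random.RandomState(42)
customers = pd.DataFrame({
    "age": rng.randint(18, 70, 200),
    "income": rng.randint(20_000, 120_000, 200),
    "spending": rng.rand(200) * 100,
})

# Steps 1-2: cluster the customers and assign each one a cluster label
kmeans = KMeans(n_clusters=4, n_init=10, random_state=42)
customers["cluster"] = kmeans.fit_predict(customers)

# Step 3: "cluster" can now be used as an extra feature for a churn classifier
print(customers.head())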
2. Dimensionality Reduction
High-dimensional datasets (i.e., datasets with many features) can suffer from the curse of
dimensionality, where the complexity of the model increases exponentially as the number of
features grows. Clustering helps to reduce the number of features by grouping similar data
points together.
How Clustering Helps: After clustering, you can use the cluster centroids (the average of all
points in a cluster) as a summarized representation of that group. This reduces the number of
unique data points the model needs to process.
Example: For customer data with many variables (age, income, education, etc.), you can apply
K-means clustering and represent each customer by the distance to their assigned cluster's
centroid, reducing the complexity of the dataset.
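A rough sketch of this idea with scikit-learn (the data is randomly generated for illustration): KMeans.transform returns, for each sample, its distance to every cluster centroid, so 20 original features can be replaced by 5 distance features.
import numpy as np
from sklearn.cluster import KMeans

# Illustrative data: 300 samples with 20 original features
X = np.random.RandomState(0).rand(300, 20)

kmeans = KMeans(n_clusters=5, n_init=10, random_state=0).fit(X)

# Each sample is now described by its 5 distances to the cluster centroids
X_reduced = kmeans.transform(X)
print(X.shape, "->", X_reduced.shape)   # (300, 20) -> (300, 5)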
3. Noise and Outlier Handling
Clustering can also help address data issues such as noise and outliers: observations that lie far from every cluster centroid (or that a density-based method such as DBSCAN labels as noise) can be reviewed, down-weighted, or removed before model training.
4. Data Balancing
Many real-world datasets are imbalanced, meaning that some classes are underrepresented
compared to others. Clustering can be used to balance data within clusters, which can improve
model performance, especially in classification tasks.
How Clustering Helps: After clustering the data, you can ensure that each cluster contains a balanced representation of different classes by using techniques like oversampling or undersampling within each cluster.
Example: In a medical dataset, the disease class might be underrepresented. After clustering
the data by patient characteristics, you can apply SMOTE (Synthetic Minority Oversampling
Technique) within each cluster to balance the classes before training a model.
5. Reducing Complexity in Model Training
When training models, having smaller, more homogeneous groups (clusters) can reduce the
complexity of the training process. Instead of training on the entire dataset, you can train on
clusters that represent more homogeneous data points. This can make models more
interpretable and faster to train.
How Clustering Helps: Clustering helps segment the data into groups that can be trained
independently, making it easier to fit models and analyze the relationships within smaller
subsets of data.
Example: For large datasets with customer behavior data, clustering helps group customers
with similar buying patterns. You can then train a classifier separately for each cluster, leading
to more efficient training and better performance.
Using Clustering for Semi-Supervised Learning
Semi-supervised learning (SSL) is a type of machine learning where a model is trained using
a combination of:
• Labeled data (data with known outputs or labels),
• Unlabeled data (data where the labels are unknown).
In real-world scenarios, labeled data can be scarce, expensive, or time-consuming to obtain,
while unlabeled data is abundant. Semi-supervised learning aims to leverage both labeled and
unlabeled data to improve model performance. Clustering is a powerful technique in SSL, as
it can help make use of the large amount of unlabeled data.
Clustering in Semi-Supervised Learning
Clustering in SSL involves grouping unlabeled data points into clusters based on their features.
These clusters represent different subgroups or patterns in the data. The key idea is to use the
structure of the unlabeled data to guide the learning process and improve predictions on new
or unseen data points.
3. Using Clusters as Pseudo-Labels:
In SSL, pseudo-labelling is a method where you use the predicted label from an unsupervised
learning model (like clustering) as a "pseudo-label" for training the model. By assigning labels
to unlabelled data based on their cluster assignment, we treat them as if they were labeled.
Example: If a clustering algorithm divides a dataset into 5 clusters, we can assume that all the
data points in the same cluster share the same label and train the model accordingly.
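A rough sketch of pseudo-labelling via clustering, on synthetic data (the blob dataset and the assumption that only 30 points are labeled are purely illustrative): each unlabeled point inherits the majority label of the labeled points in its cluster.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data: 300 points in 5 blobs; pretend only 30 of them are labeled
X, y_true = make_blobs(n_samples=300, centers=5, random_state=0)
labeled_idx = np.random.RandomState(0).choice(len(X), size=30, replace=False)

# Cluster all points (labeled + unlabeled) into 5 clusters
clusters = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X)

# Assign each cluster the majority label among its labeled members (pseudo-labels)
pseudo_labels = np.empty(len(X), dtype=int)
for c in range(5):
    members = np.intersect1d(np.where(clusters == c)[0], labeled_idx)
    majority = np.bincount(y_true[members]).argmax() if len(members) else -1
    pseudo_labels[clusters == c] = majority

print("Pseudo-labels for the first 10 points:", pseudo_labels[:10])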
4. Data Exploration and Disambiguation:
Clustering can also help in data exploration. By grouping similar data points together,
clustering can reveal hidden structure and relationships in the data. This may help in making
better decisions about how to assign labels or refine training data, especially when there's
ambiguity in labelling.
DBSCAN
Clustering is an unsupervised learning technique where we try to group the data points based
on specific characteristics. There are various clustering algorithms with K-
Means and Hierarchical being the most used ones. Some of the use cases of clustering
algorithms include:
• Document Clustering
• Recommendation Engine
• Image Segmentation
• Market Segmentation
• Search Result Grouping
• and Anomaly Detection.
K-Means and Hierarchical Clustering both fail to create clusters of arbitrary shapes. They are
not able to form clusters based on varying densities. That’s why we need DBSCAN clustering.
We can see three different dense clusters in the form of concentric circles with some noise here.
Now, let’s run K-Means and Hierarchical clustering algorithms and see how they cluster these
data points.
You might be wondering why there are four colors in the graph. As I said earlier, this data
contains noise, too. Therefore, I have taken noise as a different cluster, which is represented by
the purple color. Sadly, both of them failed to cluster the data points. Also, they were not able
to detect the noise present in the dataset properly. Now, let’s take a look at the results from
DBSCAN clustering.
Awesome! DBSCAN can not only cluster the data points correctly but also perfectly detect the noise in the dataset.
DBSCAN Algorithm:
“How does DBSCAN find clusters?” Initially, all objects in a given data set D are marked as “unvisited.” DBSCAN randomly selects an unvisited object p, marks p as “visited,” and checks whether the ε-neighborhood of p contains at least MinPts objects. If not, p is marked as a noise point. Otherwise, a new cluster C is created for p, and all the objects in the ε-neighborhood of p are added to a candidate set, N. DBSCAN iteratively adds to C those objects in N that do not belong to any cluster. In this process, for an object p1 in N that carries the label “unvisited,” DBSCAN marks it as “visited” and checks its ε-neighborhood. If the ε-neighborhood of p1 has at least MinPts objects, those objects in the ε-neighborhood of p1 are added to N. DBSCAN
continues adding objects to C until C can no longer be expanded, that is, N is empty. At this
time, cluster C is completed, and thus is output. To find the next cluster, DBSCAN randomly
selects an unvisited object from the remaining ones. The clustering process continues until all
objects are visited.
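A minimal scikit-learn sketch of DBSCAN on synthetic two-ring data (a simpler stand-in for the concentric-circles example above); eps plays the role of the ε-neighborhood radius and min_samples the role of MinPts, and the values used here are illustrative.
import matplotlib.pyplot as plt
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_circles

# Two concentric rings with a little noise
X, _ = make_circles(n_samples=500, factor=0.5, noise=0.05, random_state=42)

# eps ~ neighborhood radius, min_samples ~ MinPts
db = DBSCAN(eps=0.1, min_samples=5).fit(X)
labels = db.labels_            # cluster labels; noise points are labeled -1

plt.scatter(X[:, 0], X[:, 1], c=labels, cmap="viridis")
plt.title("DBSCAN clustering (noise labeled -1)")
plt.show()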
Reachability and Connectivity
These are the two concepts that you need to understand before moving further. Reachability
states if a data point can be accessed from another data point directly or indirectly, whereas
Connectivity states whether two data points belong to the same cluster or not. In terms of
reachability and connectivity, two points in DBSCAN can be referred to as:
• Directly Density-Reachable
• Density-Reachable
• Density-Connected
Density-reachability and density-connectivity: consider the figure below for a given ε, represented by the radius of the circles, and, say, let MinPts = 3.
The labelled points m, p, o, and r are core objects because each is in an ε-neighbourhood containing at
least three points. Object q is directly density-reachable from m. Object m is directly density-
reachable from p and vice versa.
Object q is (indirectly) density-reachable from p because q is directly density reachable from
m and m is directly density-reachable from p. However, p is not density reachable from q
because q is not a core object. Similarly, r and s are density-reachable from o and o is density-
reachable from r. Thus, o, r, and s are all density-connected.
Disadvantages of the DBSCAN Algorithm
• It does not work with datasets that have varying densities.
• Cannot be employed with multiprocessing as it cannot be partitioned.
• Cannot find the right cluster if the dataset is sparse.
• It is sensitive to parameters epsilon and minPoints
Applications of DBSCAN
• It is used in satellite imagery.
• Used in X-ray crystallography.
• Anomaly detection in temperature data.
Gaussian Mixtures
A Gaussian Mixture Model (GMM) is a probabilistic model used to represent the presence of
subpopulations (clusters) within a larger population, where each subpopulation follows a
Gaussian (normal) distribution. GMM is considered a soft clustering technique because each
data point can belong to multiple clusters with different probabilities.
GMM assumes that:
• The data points are generated by a mixture of several Gaussian distributions.
• Each distribution is defined by its mean vector (μ) and covariance matrix (Σ).
Mathematical Representation
A GMM with K components is represented as:
p(x) = Σ_{k=1}^{K} π_k · N(x | μ_k, Σ_k)
where N(x | μ_k, Σ_k) is a Gaussian density and the mixing weights π_k satisfy π_k ≥ 0 and Σ_k π_k = 1.
Parameters of GMM
The GMM has three key sets of parameters for each Gaussian component:
• the mixing weight (prior probability) π_k,
• the mean vector μ_k, and
• the covariance matrix Σ_k.
Parameter Estimation: Expectation-Maximization (EM) Algorithm
To fit a GMM to data, we use the Expectation-Maximization (EM) algorithm:
1. E-step: Using the current parameters, compute the responsibility of each Gaussian component for each data point (the posterior probability that the point came from that component).
2. M-step: Re-estimate the mixing weights, means, and covariance matrices so as to maximize the likelihood, weighting each point by the responsibilities from the E-step.
3. Repeat the E- and M-steps until the log-likelihood (or the parameter estimates) converges.
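A small scikit-learn sketch of fitting a GMM with EM (the blob data is illustrative); predict_proba exposes the soft cluster memberships described earlier.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

# Illustrative data: 300 points drawn from 3 blobs
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

# Fit a 3-component GMM with the EM algorithm
gmm = GaussianMixture(n_components=3, covariance_type="full", random_state=42)
gmm.fit(X)

print("Weights:", gmm.weights_)           # mixing weights pi_k
print("Means:\n", gmm.means_)             # mean vectors mu_k
print("Soft memberships of first point:", gmm.predict_proba(X[:1]))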
Advantages of GMM
• Soft clustering: Points can belong to multiple clusters with varying probabilities.
• Flexibility: Can model clusters with different shapes and densities by adjusting the
covariance structure.
• Probabilistic interpretation: Provides insights into the uncertainty of cluster
assignments.
Limitations of GMM
• Sensitive to Initialization: Poor initialization can lead to local optima.
• Assumption of Gaussian Distribution: May not perform well if the data doesn't fit a
Gaussian distribution.
• Number of Components: Requires the number of components (K) to be specified
beforehand.
Applications
• Anomaly Detection: Identifying outliers based on low probability density.
• Image Processing: Segmenting regions with different textures or intensities.
• Customer Segmentation: Grouping customers based on purchasing behavior.
• Speech Recognition: Modeling acoustic feature distributions.
Dimensionality Reduction
Dimensionality reduction is a technique used in machine learning and data science to reduce
the number of input features while preserving as much important information as possible. This
helps in overcoming computational inefficiencies and improving model performance.
Let’s take an example to explain this better:
Imagine you are building a machine learning model to predict house prices based on features
like the number of bedrooms, square footage, location, age of the house, number of bathrooms,
and so on. If you have too many features like additional ones for each room’s condition,
flooring type, or neighborhood amenities, your dataset can become very large and complex.
With too many features, your model may become slow to train, and it might also pick up
unnecessary details or noise. For example, suppose the flooring type doesn’t significantly
impact house prices. In that case, it might lead the model to make less accurate predictions,
especially when the data is noisy or when there are many irrelevant features.
On the left, data points exist in a 3D space (X, Y, Z), but the Z-dimension appears unnecessary
since the data primarily varies along the X and Y axes. The goal of dimensionality reduction is
to remove less important dimensions without losing valuable information.
On the right, after reducing the dimensionality, the data is represented in lower-dimensional
spaces. The top plot (X-Y) maintains the meaningful structure, while the bottom plot (Z-Y)
shows that the Z-dimension contributed little useful information. This process makes data
analysis more efficient, improving computation speed and visualization while minimizing
redundancy.
Main Approaches for Dimensionality Reduction
1. Feature Selection.
2. Feature Extraction
1. Feature Selection (Selecting Relevant Features):
Feature selection chooses the most relevant features from the dataset without altering them.
It helps remove redundant or irrelevant features, improving model efficiency. There are
several methods for feature selection including filter methods, wrapper methods, and
embedded methods.
2. Feature Extraction (Creating New Features):
Feature extraction transforms the original features into a smaller set of new features that capture most of the information in the data. These methods fall into two groups: linear and non-linear.
A. Linear Methods
1. PCA (Principal Component Analysis): Projects the data onto the directions of maximum variance (discussed in detail below). Best Used for datasets with largely linear relationships.
B. Non-Linear Methods
1. t-SNE (t-Distributed Stochastic Neighbor Embedding): Maps high-
dimensional data to a lower-dimensional space while preserving local relationships.
Best Used for Visualization
2. UMAP (Uniform Manifold Approximation and Projection): Similar to t-
SNE but faster and better at preserving global structure. Best Used for Large datasets
and Visualization.
3. Autoencoders (Neural Networks): Uses an encoder-decoder architecture to
learn compact representations. Best Used for Deep Learning Applications.
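As a quick illustration of one of these methods, the sketch below applies scikit-learn's t-SNE to reduce randomly generated 10-dimensional data (an illustrative stand-in for a real dataset) to 2 dimensions for visualization.
import numpy as np
from sklearn.manifold import TSNE

# Illustrative data: 100 points with 10 features
X = np.random.RandomState(0).rand(100, 10)

# Embed into 2 dimensions for plotting (perplexity is an illustrative choice)
X_embedded = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)
print(X_embedded.shape)   # (100, 2)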
Principal Component Analysis (PCA)
PCA Working:
Step 1: Standardize the Data
Since PCA is sensitive to feature scaling, we first standardize the data by converting each feature into a zero-mean, unit-variance form:
z = (x − μ) / σ
where:
• μ is the mean of each feature.
• σ is the standard deviation of each feature.
Step 2: Compute the Covariance Matrix
For the standardized data matrix X with n samples and d features, the covariance matrix is:
C = (1 / (n − 1)) XᵀX
where:
• C is the d × d covariance matrix.
• XᵀX is the dot product of the data matrix with its transpose.
(Worked equations omitted: an example covariance matrix for 3 features; a worked PCA example computing the eigenvalues; solving for the eigenvectors; and solving for the eigenvector of λ2 = 0.1151.)
Step 6: Select the Principal Component
Since we want to reduce our 2D data to 1D, we select the eigenvector corresponding to the
largest eigenvalue, which captures the most variance.
Our eigenvalues are λ1 = 10.8224 and λ2 = 0.1151. Since λ1 is the largest eigenvalue, its corresponding eigenvector is chosen as the principal component. The final transformed data after applying PCA are the projections of the original points onto this principal component (the direction of maximum variance).
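The same steps can be reproduced in a few lines of NumPy, shown here as a sketch on the small illustrative dataset that also appears in the Scikit-Learn program below (the resulting numbers depend on the data used).
import numpy as np

# Illustrative 2-D data set
X = np.array([[2, 3], [3, 5], [5, 8], [7, 10]], dtype=float)

# Step 1: center the data
X_centered = X - X.mean(axis=0)

# Step 2: covariance matrix C = (1/(n-1)) X^T X
C = np.cov(X_centered, rowvar=False)

# Steps 3-5: eigenvalues/eigenvectors of C, sorted by decreasing eigenvalue
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Step 6: project onto the eigenvector with the largest eigenvalue (2-D -> 1-D)
principal_component = eigvecs[:, 0]
X_projected = X_centered @ principal_component

print("Eigenvalues:", eigvals)
print("Principal component:", principal_component)
print("Projected data:", X_projected)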
Using Scikit-Learn
Scikit-Learn (or sklearn) is a popular Python machine learning library used for:
1. Data Preprocessing (e.g., Standardization, Normalization)
2. Dimensionality Reduction (e.g., PCA, t-SNE)
3. Machine Learning Algorithms (e.g., SVM, Decision Trees, K-Means)
4. Model Evaluation (e.g., Accuracy, Precision-Recall)
Program
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Step 1: Define the data
X = np.array([
    [2, 3],
    [3, 5],
    [5, 8],
    [7, 10]])

# Step 2: Standardize the data (zero mean, unit variance)
X_scaled = StandardScaler().fit_transform(X)

# Step 3: Reduce the 2-D data to 1-D with PCA
pca = PCA(n_components=1)
X_pca = pca.fit_transform(X_scaled)

# Step 4: Print Results
print("Principal component:", pca.components_)
print("Explained variance ratio:", pca.explained_variance_ratio_)
print("Transformed data:\n", X_pca)

# Step 5: Plot the standardized data
plt.figure(figsize=(6, 4))
plt.scatter(X_scaled[:, 0], X_scaled[:, 1], label="Standardized data")
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.legend()
plt.grid()
plt.show()
Randomized PCA
Randomized PCA is a faster approximation of standard PCA, particularly useful for large
datasets with high dimensions. Instead of computing all eigenvalues and eigenvectors, it
uses a stochastic algorithm to estimate the most important components efficiently.
Advantages:
• Faster than Standard PCA for high-dimensional data.
• Good Approximation of the principal components.
• Works Well with Sparse Data.
• Useful for Large Image and Text Datasets
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
X = np.array([
[2, 3],
[3, 5],
[5, 8],
[7, 10]
])
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
# Apply PCA with the randomized SVD solver, reducing the 2-D data to 1-D
pca = PCA(n_components=1, svd_solver="randomized", random_state=42)
X_pca = pca.fit_transform(X_scaled)
print("Explained variance ratio:", pca.explained_variance_ratio_)
print("Transformed data:\n", X_pca)

plt.figure(figsize=(6, 4))
plt.scatter(X[:, 0], X[:, 1], label="Original Data", color='blue')
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.legend()
plt.grid()
plt.show()
Kernel PCA
Kernel PCA (KPCA) is an extension of Principal Component Analysis (PCA) that allows us to
find nonlinear patterns in data. While standard PCA only works with linear transformations,
Kernel PCA uses the kernel trick to map data into a higher-dimensional space, where it can
find nonlinear principal components.
Mathematical Explanation:
1. Standard PCA finds principal components by computing eigenvectors of the covariance
matrix.
2. Kernel PCA applies a nonlinear transformation Φ(X) to map data into a higher-
dimensional space.
3. Instead of computing this higher-dimensional transformation explicitly, KPCA uses the
kernel trick: K(xi,xj)=Φ(xi)⋅Φ(xj) where K is a kernel function.
4. KPCA finds eigenvectors of the kernel matrix K, allowing for dimensionality reduction
in the transformed space.
Kernel PCA in Scikit-Learn
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA, KernelPCA
from sklearn.datasets import make_moons
from sklearn.preprocessing import StandardScaler

# Generate a nonlinear (two-moons) dataset
X, y = make_moons(n_samples=200, noise=0.05, random_state=42)

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Standard (linear) PCA
X_pca = PCA(n_components=2).fit_transform(X_scaled)

# Kernel PCA with an RBF kernel (gamma chosen for illustration)
X_kpca = KernelPCA(n_components=2, kernel="rbf", gamma=15).fit_transform(X_scaled)

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
# Standard PCA
axes[0].scatter(X_pca[:, 0], X_pca[:, 1], c=y, cmap="coolwarm")
axes[0].set_title("Standard PCA")
# Kernel PCA
axes[1].scatter(X_kpca[:, 0], X_kpca[:, 1], c=y, cmap="coolwarm")
axes[1].set_title("Kernel PCA (RBF)")
plt.show()
Differences Between PCA, Randomized PCA, and Kernel PCA
• Approach: Standard PCA computes exact eigenvalues/eigenvectors from the covariance matrix; Randomized PCA uses a randomized Singular Value Decomposition (SVD) for approximation; Kernel PCA applies the kernel trick to map data to a higher-dimensional space.
• Handles nonlinear data? Standard PCA: no, only linear patterns. Randomized PCA: no, still linear. Kernel PCA: yes, captures nonlinear structures.
• Speed: Standard PCA is slow for large datasets; Randomized PCA is fast and optimized for high-dimensional data; Kernel PCA is moderate, depending on the kernel choice.
• Memory usage: Standard PCA is high (stores the full covariance matrix); Randomized PCA is low (works with a subset of the data); Kernel PCA is high (stores the kernel matrix).
• Scalability: Standard PCA scales poorly to big data; Randomized PCA is efficient for large datasets; Kernel PCA can be expensive for large datasets.
• Kernel trick? Standard PCA: no. Randomized PCA: no. Kernel PCA: yes (uses kernels such as RBF or polynomial).
• Works best for: Standard PCA: small to medium datasets with linear relationships. Randomized PCA: large, high-dimensional datasets. Kernel PCA: nonlinear datasets with complex patterns.
• Example use cases: Standard PCA: image compression, eigenfaces, financial data analysis. Randomized PCA: big-data applications (e.g., NLP, genomics). Kernel PCA: clustering, face recognition, non-Euclidean data.
• Implementation in Scikit-Learn: PCA(n_components=k); PCA(n_components=k, svd_solver="randomized"); KernelPCA(n_components=k, kernel="rbf").