
Module 03

The Curse of Dimensionality in Machine Learning arises when working with high-dimensional data, leading to increased computational complexity, overfitting, and spurious correlations. Techniques like dimensionality reduction, feature selection, and careful model design are essential for mitigating its effects and improving algorithm performance. Navigating this challenge is crucial for unlocking the potential of high-dimensional datasets and ensuring robust machine-learning solutions.

What is the Curse of Dimensionality?


The Curse of Dimensionality refers to the phenomenon where the efficiency and effectiveness of algorithms deteriorate rapidly as the dimensionality of the data increases.

In high-dimensional spaces, data points become sparse, making it challenging to discern meaningful patterns or relationships due to the vast amount of data required to adequately sample the space.

The Curse of Dimensionality significantly impacts machine learning algorithms in various ways. It leads to increased computational complexity, longer training times, and higher resource requirements. Moreover, it escalates the risk of overfitting and spurious correlations, hindering the algorithms' ability to generalize well to unseen data.

How to Overcome the Curse of Dimensionality?
To overcome the curse of dimensionality, you can consider the following
strategies:

Dimensionality Reduction Techniques:


Feature Selection: Identify and select the most relevant features from the original dataset while discarding irrelevant or redundant ones. This reduces the dimensionality of the data, simplifying the model and improving its efficiency.

Feature Extraction: Transform the original high-dimensional data into a lower-dimensional space by creating new features that capture the essential information. Techniques such as Principal Component Analysis (PCA) and t-distributed Stochastic Neighbor Embedding (t-SNE) are commonly used for feature extraction.
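As a rough illustration of these two approaches, the sketch below applies a univariate feature-selection filter and PCA with scikit-learn. The dataset and the numbers of retained features are illustrative assumptions, not choices made in these notes.

```python
# Minimal sketch: feature selection vs. feature extraction with scikit-learn.
# The dataset and the numbers of kept features are illustrative choices only.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.decomposition import PCA

X, y = load_breast_cancer(return_X_y=True)     # 30-dimensional example data

# Feature selection: keep the 10 features most associated with the labels.
X_selected = SelectKBest(f_classif, k=10).fit_transform(X, y)

# Feature extraction: project onto the top 5 principal components.
X_pca = PCA(n_components=5).fit_transform(X)

print(X.shape, X_selected.shape, X_pca.shape)  # (569, 30) (569, 10) (569, 5)
```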

Data Preprocessing:
Normalization: Scale the features to a similar range to prevent certain
features from dominating others, especially in distance-based algorithms.

Handling Missing Values: Address missing data appropriately through imputation or deletion to ensure robustness in the model training process.
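A minimal preprocessing sketch is shown below, assuming a small numeric matrix with missing values encoded as NaN; the concrete values and the mean-imputation strategy are illustrative assumptions.

```python
# Minimal preprocessing sketch: imputation followed by normalization.
# The toy matrix and the chosen strategies are illustrative assumptions.
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X = np.array([[1.0, 200.0],
              [2.0, np.nan],
              [3.0, 600.0]])

preprocess = make_pipeline(
    SimpleImputer(strategy="mean"),  # fill missing values with the column mean
    StandardScaler(),                # scale each feature to zero mean, unit variance
)
X_clean = preprocess.fit_transform(X)
print(X_clean)
```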

PCA
As the number of features or dimensions in a dataset increases, the amount of
data required to obtain a statistically significant result increases exponentially.
This can lead to issues such as overfitting, increased computation time, and reduced accuracy of machine learning models. This is known as the curse of dimensionality: the set of problems that arise while working with high-dimensional data.
As the number of dimensions increases, the number of possible combinations
of features increases exponentially, which makes it computationally difficult to
obtain a representative sample of the data. It becomes expensive to perform
tasks such as clustering or classification because the algorithms need to
process a much larger feature space, which increases computation time and
complexity. Additionally, some machine learning algorithms can be sensitive to
the number of dimensions, requiring more data to achieve the same level of
accuracy as lower-dimensional data.

To address the curse of dimensionality, feature engineering techniques are used, which include feature selection and feature extraction. Dimensionality reduction is a type of feature extraction technique that aims to reduce the number of input features while retaining as much of the original information as possible.

In this article, we will discuss one of the most popular dimensionality reduction techniques, i.e., Principal Component Analysis (PCA).

What is Principal Component Analysis (PCA)?
The Principal Component Analysis (PCA) technique was introduced by the mathematician Karl Pearson in 1901. It works on the condition that when data in a higher-dimensional space is mapped to a lower-dimensional space, the variance of the data in the lower-dimensional space should be maximized.

Principal Component Analysis (PCA) is a statistical procedure that uses an orthogonal transformation to convert a set of correlated variables into a set of uncorrelated variables. PCA is the most widely used tool in exploratory data analysis and in machine learning for predictive models.

Principal Component Analysis (PCA) is an unsupervised learning technique used to examine the interrelations among a set of variables. It is also known as general factor analysis, where regression determines a line of best fit.

The main goal of Principal Component Analysis (PCA) is to reduce the dimensionality of a dataset while preserving the most important patterns or relationships between the variables, without any prior knowledge of the target variables.

Principal Component Analysis (PCA) is used to reduce the dimensionality of a data set by finding a new set of variables, smaller than the original set of variables, retaining most of the sample's information, and useful for the regression and classification of data.

Principal Component Analysis

1. Principal Component Analysis (PCA) is a technique for dimensionality reduction that identifies a set of orthogonal axes, called principal components, that capture the maximum variance in the data. The principal components are linear combinations of the original variables in the dataset and are ordered in decreasing order of importance. The total variance captured by all the principal components is equal to the total variance in the original dataset.

2. The first principal component captures the most variation in the data, while the second principal component captures the maximum variance that is orthogonal to the first principal component, and so on.

3. Principal Component Analysis can be used for a variety of purposes, including data visualization, feature selection, and data compression. In data visualization, PCA can be used to plot high-dimensional data in two or three dimensions, making it easier to interpret. In feature selection, PCA can be used to identify the most important variables in a dataset. In data compression, PCA can be used to reduce the size of a dataset without losing important information.

4. In Principal Component Analysis, it is assumed that the information is carried in the variance of the features; that is, the higher the variation in a feature, the more information that feature carries.

Overall, PCA is a powerful tool for data analysis and can help to simplify
complex datasets, making them easier to understand and work with.
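The points above can be checked numerically. Below is a minimal scikit-learn sketch on an arbitrary illustrative dataset; the dataset choice and the printed checks are assumptions for demonstration, not part of the original notes.

```python
# Sketch of points 1-4 above: orthogonal components, ordered by explained variance,
# together accounting for all of the variance in the original data.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)

pca = PCA()                       # keep all components so their variances can be inspected
X_transformed = pca.fit_transform(X)

print(pca.explained_variance_ratio_)         # decreasing order of importance
print(pca.explained_variance_ratio_.sum())   # ~1.0: total variance is preserved
# Orthogonality check: the principal axes are mutually orthonormal.
print(np.allclose(pca.components_ @ pca.components_.T, np.eye(X.shape[1])))
```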

Step-By-Step Explanation of PCA
(Principal Component Analysis)
Step 1: Find the mean of X and the mean of Y.

Step 2: Build the covariance matrix from Cov(X,X), Cov(Y,Y), and Cov(X,Y); Cov(X,Y) and Cov(Y,X) are the same, so the matrix is symmetric. In the worked example the sum Σ(xᵢ − x̄)(yᵢ − ȳ) comes to 5.539, which is only the numerator, so it must be divided by n − 1 = 9 (the example has n = 10 data points) before being written into the matrix, giving Cov(X,Y) = 5.539/9 ≈ 0.615. Do the same for every entry.

Step 3: Find the eigenvalues λ1 and λ2 of the covariance matrix by solving det(C − λI) = 0.

Step 4: For each eigenvalue (λ2 and λ1 in the worked example), find the corresponding eigenvector. The eigenvector belonging to the largest eigenvalue is the first principal component.
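The worked figures themselves are in the original handwritten pages, but the same calculation can be sketched in NumPy. The ten (x, y) points below are an assumption, chosen because they reproduce the 5.539 numerator quoted above; substitute the actual values from the example if they differ.

```python
# From-scratch sketch of the PCA steps above with NumPy.
# The ten (x, y) points are assumed values consistent with the 5.539 numerator.
import numpy as np

data = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
                 [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]])

# Step 1: mean of X and Y.
mean = data.mean(axis=0)
centred = data - mean

# Step 2: covariance matrix (np.cov divides by n - 1 by default).
cov = np.cov(centred, rowvar=False)

# Steps 3-4: eigenvalues and eigenvectors of the (symmetric) covariance matrix.
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]            # sort by decreasing eigenvalue
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Project the centred data onto the first principal component.
projected = centred @ eigvecs[:, :1]
print(cov, eigvals, projected.shape, sep="\n")
```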
LDA

Worked procedure (two classes X1 and X2 of 2-D points):

Step 1: Find the means. μ1 is the mean vector of class X1, i.e. {mean of all x-coordinates of X1, mean of all y-coordinates of X1}; μ2 is computed the same way for class X2.

Step 2: Calculate the scatter matrices S1 and S2 (S1 relates to class X1, S2 to class X2). For each point of X1, find its difference from the mean μ1, treat that difference as a column vector, multiply it by its transpose, and add up all the resulting 2×2 matrices; the sum is S1. Do the same for the points of X2 to obtain S2.
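Here is a NumPy sketch of that scatter-matrix computation. The two small class samples are assumed illustrative values, not the ones from the original worked pages, and the final line (the discriminant direction) is the standard next step rather than something stated explicitly in the notes above.

```python
# NumPy sketch of the LDA scatter-matrix procedure above.
# X1 and X2 are assumed illustrative class samples.
import numpy as np

X1 = np.array([[4.0, 2.0], [2.0, 4.0], [2.0, 3.0], [3.0, 6.0], [4.0, 4.0]])
X2 = np.array([[9.0, 10.0], [6.0, 8.0], [9.0, 5.0], [8.0, 7.0], [10.0, 8.0]])

mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)   # class mean vectors

def scatter(X, mu):
    """Sum of (x - mu)(x - mu)^T over all points of one class."""
    diffs = X - mu
    return diffs.T @ diffs                    # equals the sum of outer products

S1, S2 = scatter(X1, mu1), scatter(X2, mu2)
SW = S1 + S2                                  # within-class scatter matrix

# Standard next step: the discriminant direction w = SW^(-1) (mu1 - mu2).
w = np.linalg.solve(SW, mu1 - mu2)
print(S1, S2, w, sep="\n")
```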
Linear Discriminant Analysis (LDA), also known as Normal Discriminant Analysis
or Discriminant Function Analysis, is a dimensionality reduction technique
primarily utilized in supervised classification problems. It facilitates the
modeling of distinctions between groups, effectively separating two or more
classes. LDA operates by projecting features from a higher-dimensional space
into a lower-dimensional one. In machine learning, LDA serves as a supervised
learning algorithm specifically designed for classification tasks, aiming to
identify a linear combination of features that optimally segregates classes
within a dataset.

For example, suppose we have two classes and we need to separate them efficiently. Classes can have multiple features. Using only a single feature to classify them may result in some overlap between the two classes, so we keep on increasing the number of features for proper classification.

Assumptions of LDA
LDA assumes that the data has a Gaussian distribution and that
the covariance matrices of the different classes are equal. It also assumes that
the data is linearly separable, meaning that a linear decision boundary can
accurately classify the different classes.
Suppose we have two sets of data points belonging to two different classes
that we want to classify. As shown in the given 2D graph, when the data points
are plotted on the 2D plane, there’s no straight line that can separate the two
classes of data points completely. Hence, in this case, LDA (Linear Discriminant
Analysis) is used which reduces the 2D graph into a 1D graph in order to
maximize the separability between the two classes.

(Figure: a linearly separable dataset)
Here, Linear Discriminant Analysis uses both axes (X and Y) to create a new
axis and projects data onto a new axis in a way to maximize the separation of
the two categories and hence, reduces the 2D graph into a 1D graph.
Two criteria are used by LDA to create a new axis:

1. Maximize the distance between the means of the two classes.

2. Minimize the variation within each class.

(Figure: the perpendicular distance between the new axis and the data points)
In the above graph, it can be seen that a new axis (in red) is generated and
plotted in the 2D graph such that it maximizes the distance between the means
of the two classes and minimizes the variation within each class. In simple
terms, this newly generated axis increases the separation between the data
points of the two classes. After generating this new axis using the above-
mentioned criteria, all the data points of the classes are plotted on this new axis
and are shown in the figure given below.

But Linear Discriminant Analysis fails when the means of the distributions are shared, as it becomes impossible for LDA to find a new axis that makes both classes linearly separable. In such cases, we use non-linear discriminant analysis.

How does LDA work?

LDA works by projecting the data onto a lower-dimensional space that
maximizes the separation between the classes. It does this by finding a set of
linear discriminants that maximize the ratio of between-class variance to
within-class variance. In other words, it finds the directions in the feature space
that best separate the different classes of data.
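Concretely, the ratio being maximized is the Fisher criterion J(w) = (wᵀ S_B w) / (wᵀ S_W w), where S_B and S_W are the between-class and within-class scatter matrices. The sketch below shows the same idea through scikit-learn's implementation; the dataset is an arbitrary illustrative choice.

```python
# Sketch of LDA as supervised dimensionality reduction with scikit-learn.
# The wine dataset (13 features, 3 classes) is an illustrative choice only.
from sklearn.datasets import load_wine
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_wine(return_X_y=True)

# With 3 classes, LDA can project onto at most 3 - 1 = 2 discriminant axes.
lda = LinearDiscriminantAnalysis(n_components=2)
X_lda = lda.fit_transform(X, y)

print(X.shape, X_lda.shape)           # (178, 13) -> (178, 2)
print(lda.explained_variance_ratio_)  # share of between-class variance per axis
```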

SVD:
The Singular Value Decomposition (SVD) of a matrix is a factorization of that
matrix into three matrices. It has some interesting algebraic properties and
conveys important geometrical and theoretical insights about linear
transformations. It also has some important applications in data science. In this
article, I will try to explain the mathematical intuition behind SVD and its
geometrical meaning.
Mathematics behind SVD:
The SVD of an m×n matrix A is given by the formula A = U Σ Vᵀ

where:

U: an m×m matrix of the orthonormal eigenvectors of AAᵀ.

Vᵀ: the transpose of an n×n matrix containing the orthonormal eigenvectors of AᵀA.

Σ: a diagonal matrix with r elements equal to the square roots of the positive eigenvalues of AAᵀ or AᵀA (both matrices have the same positive eigenvalues anyway).
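A quick numerical check of this factorization with NumPy is sketched below; the 3×2 matrix is an arbitrary illustrative example.

```python
# Numerical check of A = U Sigma V^T with NumPy on an assumed 3x2 example matrix.
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 3.0],
              [1.0, 1.0]])

U, s, Vt = np.linalg.svd(A, full_matrices=True)   # U: 3x3, s: singular values, Vt: 2x2

# Rebuild the m x n diagonal matrix Sigma from the singular values.
Sigma = np.zeros_like(A)
Sigma[:len(s), :len(s)] = np.diag(s)

print(np.allclose(A, U @ Sigma @ Vt))             # True: the factorization holds
# Singular values are the square roots of the positive eigenvalues of A^T A.
print(np.allclose(s**2, np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]))
```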
