Module 3 ML
the dimensionality of the data, simplifying the model and improving its
efficiency.
Data Preprocessing:
Normalization: Scale the features to a similar range to prevent certain
features from dominating others, especially in distance-based algorithms.
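For example, here is a minimal preprocessing sketch using scikit-learn's StandardScaler and MinMaxScaler (the feature values are made up for the illustration):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler

# Made-up feature matrix: column 0 is on a small scale, column 1 on a large one.
X = np.array([[1.2, 4000.0],
              [0.8, 5200.0],
              [1.5, 3100.0]])

# Standardization: each feature gets zero mean and unit variance.
X_std = StandardScaler().fit_transform(X)

# Min-max scaling: each feature is mapped into the range [0, 1].
X_minmax = MinMaxScaler().fit_transform(X)

print(X_std)
print(X_minmax)
```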
PCA
As the number of features or dimensions in a dataset increases, the amount of
data required to obtain a statistically significant result increases exponentially.
This can lead to issues such as overfitting, increased computation time, and
reduced accuracy of machine learning models. These problems, which arise while
working with high-dimensional data, are known as the curse of dimensionality.
As the number of dimensions increases, the number of possible combinations
of features increases exponentially, which makes it computationally difficult to
obtain a representative sample of the data. It becomes expensive to perform
tasks such as clustering or classification because the algorithms need to
process a much larger feature space, which increases computation time and
complexity. Additionally, some machine learning algorithms can be sensitive to
the number of dimensions, requiring more data to achieve the same level of
accuracy as lower-dimensional data.
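A rough way to see this growth (a small sketch; splitting each feature into 10 bins is an arbitrary choice for illustration): if every feature is divided into 10 bins, the number of grid cells the data would need to cover grows as 10^d with the number of features d.

```python
# Illustration of the curse of dimensionality: the number of grid cells
# explodes as features are added (10 bins per feature is an arbitrary choice).
bins = 10
for d in (1, 2, 5, 10):
    print(f"{d} feature(s) -> {bins ** d} cells to cover")
```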
In this article, we will discuss one of the most popular dimensionality reduction
techniques, i.e., Principal Component Analysis (PCA).
Principal Component Analysis
2. The first principal component captures the most variation in the data, while
the second principal component captures the maximum variance that
is orthogonal to the first principal component, and so on.
Overall, PCA is a powerful tool for data analysis and can help to simplify
complex datasets, making them easier to understand and work with.
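As a minimal illustration, PCA can be applied with scikit-learn as sketched below (the synthetic dataset and the choice of two components are assumptions made for the example):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic dataset: 100 samples, 5 features, but only 2 independent directions.
rng = np.random.default_rng(0)
base = rng.normal(size=(100, 2))
X = np.hstack([base, base @ rng.normal(size=(2, 3))])

# Standardize the features, then keep the top 2 principal components.
X_scaled = StandardScaler().fit_transform(X)
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X_scaled)

print(pca.explained_variance_ratio_)  # fraction of variance captured by each component
print(X_reduced.shape)                # (100, 2)
```

Here almost all of the variance ends up in the first two components, since the remaining features are linear combinations of the first two.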
Step-By-Step Explanation of PCA
(Principal Component Analysis)
Step 1: Find the mean of X and Y.
Step 2: Build the 2x2 covariance matrix from Cov(X,X), Cov(X,Y), Cov(Y,X), and Cov(Y,Y).
Cov(X,Y) and Cov(Y,X) are the same.
Cov(X,Y), i.e., 5.539, is only the value of the numerator, so we have to divide it by n - 1
before writing it in the matrix;
hence 5.539/9 (9 is n - 1, where n = 10 is the number of data points).
Do the same for all entries.
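This step can be checked with numpy (a small sketch; the x and y values below are placeholders, not the data from the worked example). np.cov divides by n - 1 by default, so it uses the same normalisation described above:

```python
import numpy as np

# Placeholder data: n = 10 paired observations of X and Y.
x = np.array([2.5, 0.5, 2.2, 1.9, 3.1, 2.3, 2.0, 1.0, 1.5, 1.1])
y = np.array([2.4, 0.7, 2.9, 2.2, 3.0, 2.7, 1.6, 1.1, 1.6, 0.9])

n = len(x)
numerator = np.sum((x - x.mean()) * (y - y.mean()))  # the "numerator" from the step above
cov_xy = numerator / (n - 1)                         # divide by n - 1 = 9

C = np.cov(x, y)          # full 2x2 covariance matrix, also normalised by n - 1
print(cov_xy, C[0, 1])    # the two values agree
```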
Step 3: Find the eigenvalues λ1 and λ2 of the covariance matrix by solving det(C - λI) = 0, then find the eigenvector corresponding to each. For λ2:
For λ1:
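The eigenvalues and eigenvectors can be verified with numpy (a sketch; the 2x2 covariance matrix below is a placeholder, not the matrix from the worked example):

```python
import numpy as np

# Placeholder 2x2 covariance matrix.
C = np.array([[0.6166, 0.6154],
              [0.6154, 0.7166]])

# Eigen-decomposition: each column of 'vectors' is the eigenvector for the matching eigenvalue.
values, vectors = np.linalg.eig(C)

# Sort by eigenvalue, largest first: the first column is then the first principal component.
order = np.argsort(values)[::-1]
values, vectors = values[order], vectors[:, order]

print(values)   # lambda_1 >= lambda_2
print(vectors)  # the principal directions
```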
LDA
Find the mean.
Before that, find μ1, i.e., (mean of all the left (x) coordinates of the points in X1,
mean of all the right (y) coordinates of the points in X1).
Find the difference of each point from the mean.
Find the transpose (flip the rows and columns) and multiply it with the difference matrix; the multiplication can be done on a calculator.
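A compact numpy version of these steps (a sketch with made-up points for class X1; mu1 is the class mean and S1 the resulting scatter matrix):

```python
import numpy as np

# Made-up 2D samples for class X1 (each row is one point).
X1 = np.array([[4.0, 2.0],
               [2.0, 4.0],
               [2.0, 3.0],
               [3.0, 6.0],
               [4.0, 4.0]])

# Class mean mu1 = (mean of the x coordinates, mean of the y coordinates).
mu1 = X1.mean(axis=0)

# Difference of each point from the mean.
D1 = X1 - mu1

# Transpose times differences -> 2x2 scatter matrix of class X1.
S1 = D1.T @ D1

print(mu1)
print(S1)
```

The same computation is repeated for class X2 to get μ2 and S2.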
Linear Discriminant Analysis (LDA), also known as Normal Discriminant Analysis
or Discriminant Function Analysis, is a dimensionality reduction technique
primarily utilized in supervised classification problems. It facilitates the
modeling of distinctions between groups, effectively separating two or more
classes. LDA operates by projecting features from a higher-dimensional space
into a lower-dimensional one. In machine learning, LDA serves as a supervised
learning algorithm specifically designed for classification tasks, aiming to
identify a linear combination of features that optimally segregates classes
within a dataset.
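As a concrete illustration, here is a minimal scikit-learn sketch using LinearDiscriminantAnalysis on made-up two-class data (the dataset and parameter choices are assumptions for the example):

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Synthetic two-class data: two Gaussian blobs in 3 dimensions.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, size=(50, 3)),
               rng.normal(3.0, 1.0, size=(50, 3))])
y = np.array([0] * 50 + [1] * 50)

# Supervised projection onto a single discriminant axis (at most n_classes - 1 axes).
lda = LinearDiscriminantAnalysis(n_components=1)
X_1d = lda.fit_transform(X, y)

print(X_1d.shape)        # (100, 1): the data reduced to one dimension
print(lda.score(X, y))   # accuracy of the induced linear classifier on the training data
```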
For example, we have two classes and we need to separate them efficiently.
Classes can have multiple features. Using only a single feature to classify them
may result in some overlapping, as shown in the figure below. So, we keep
increasing the number of features until proper classification is possible.
Assumptions of LDA
LDA assumes that the data has a Gaussian distribution and that
the covariance matrices of the different classes are equal. It also assumes that
the data is linearly separable, meaning that a linear decision boundary can
accurately classify the different classes.
Suppose we have two sets of data points belonging to two different classes
that we want to classify. As shown in the given 2D graph, when the data points
are plotted on the 2D plane, there’s no straight line that can separate the two
classes of data points completely. Hence, in this case, LDA (Linear Discriminant
Analysis) is used which reduces the 2D graph into a 1D graph in order to
maximize the separability between the two classes.
Linearly Separable Dataset
Here, Linear Discriminant Analysis uses both axes (X and Y) to create a new
axis and projects data onto a new axis in a way to maximize the separation of
the two categories and hence, reduces the 2D graph into a 1D graph.
Two criteria are used by LDA to create a new axis:
1. Maximize the distance between the means of the two classes.
2. Minimize the variation (scatter) within each class.
Figure: the new axis, with the perpendicular distance between the line and the points.
In the above graph, it can be seen that a new axis (in red) is generated and
plotted in the 2D graph such that it maximizes the distance between the means
of the two classes and minimizes the variation within each class. In simple
terms, this newly generated axis increases the separation between the data
points of the two classes. After generating this new axis using the above-
mentioned criteria, all the data points of the classes are plotted on this new axis
and are shown in the figure given below.
But Linear Discriminant Analysis fails when the means of the distributions are
shared, as it becomes impossible for LDA to find a new axis that makes both
classes linearly separable. In such cases, we use non-linear discriminant
analysis.
LDA works by projecting the data onto a lower-dimensional space that
maximizes the separation between the classes. It does this by finding a set of
linear discriminants that maximize the ratio of between-class variance to
within-class variance. In other words, it finds the directions in the feature space
that best separate the different classes of data.
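For the two-class case this ratio has a well-known closed-form maximiser: the discriminant direction is proportional to S_W^(-1)(μ1 - μ2), where S_W is the total within-class scatter. A small sketch with placeholder class samples:

```python
import numpy as np

# Placeholder 2D samples for the two classes (each row is one point).
X1 = np.array([[4.0, 2.0], [2.0, 4.0], [2.0, 3.0], [3.0, 6.0], [4.0, 4.0]])
X2 = np.array([[9.0, 10.0], [6.0, 8.0], [9.0, 5.0], [8.0, 7.0], [10.0, 8.0]])

mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)

# Within-class scatter: sum of the scatter matrices of the two classes.
S_W = (X1 - mu1).T @ (X1 - mu1) + (X2 - mu2).T @ (X2 - mu2)

# Fisher's discriminant direction: maximizes between-class over within-class variance.
w = np.linalg.solve(S_W, mu1 - mu2)
w /= np.linalg.norm(w)

print(w)                 # the 1D discriminant axis
print(X1 @ w, X2 @ w)    # projections of the two classes onto that axis
```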
SVD:
The Singular Value Decomposition (SVD) of a matrix is a factorization of that
matrix into three matrices. It has some interesting algebraic properties and
conveys important geometrical and theoretical insights about linear
transformations. It also has some important applications in data science. In this
article, I will try to explain the mathematical intuition behind SVD and its
geometrical meaning.
Mathematics behind SVD:
The SVD of an m x n matrix A is given by the formula A = UΣV^T
where: