Unit 5 Mfds
Lasso regression is also called a penalized regression method. It is commonly used in machine learning to select a subset of variables. It can provide higher prediction accuracy than ordinary regression models, and lasso regularization also helps to improve model interpretability.
Lasso regression penalizes the less important features of a dataset: their coefficients are driven to zero, which eliminates them from the model. Datasets with high dimensionality and correlated features are well suited for lasso regression.
Lasso regression is formed by adding a penalizing factor to the least-squares objective: Loss = RSS + lambda * sum of |coefficients|, where RSS is the residual sum of squares. The model is selected by its ability to reduce this loss function to its minimal value.
All the estimated parameters are present in the lasso penalty, and the value of lambda, which lies between zero and infinity, decides how aggressive the regularization is. Lambda is selected using cross-validation. As the value of lambda is increased, the coefficients shrink and gradually become zero.
Lasso linear regression, also known as L1 regularization, retains one variable from a group of correlated variables and sets the others to zero. This can lead to lower accuracy due to the loss of information.
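The behaviour described above can be sketched in a few lines. This is a minimal illustration using scikit-learn's Lasso (an assumed toolkit; the data is synthetic, and alpha plays the role of lambda):

```python
# Minimal sketch of lasso regression with scikit-learn; synthetic data,
# illustrative only.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
# Only the first two features matter; the remaining three are noise.
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=100)

# A larger alpha (the lambda penalty) gives more aggressive regularization.
model = Lasso(alpha=0.1).fit(X, y)
print(model.coef_)  # coefficients of the noise features are shrunk to (near) zero
```

Increasing `alpha` further would drive the remaining coefficients toward zero as well, matching the behaviour of lambda described above.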
PCA generally tries to find a lower-dimensional surface onto which the high-dimensional data can be projected.
PCA works by considering the variance along each direction, because directions of high variance carry the most information (and often show a good split between the classes); projecting onto them reduces the dimensionality.
Some real-world applications of PCA are image processing, movie recommendation systems, and optimizing the power allocation in various communication channels. It is a feature extraction technique, so it retains the important variables and drops the least important ones.
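The projection described above can be sketched with scikit-learn's PCA (an assumed toolkit; the 3-D data below is synthetic and deliberately lies in a 2-D subspace):

```python
# Minimal sketch of PCA: project synthetic 3-D data onto the 2-D surface
# that preserves the most variance.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Three features, where the third is a linear combination of the first two,
# so almost all variance lies in a 2-D subspace.
base = rng.normal(size=(200, 2))
X = np.column_stack([base[:, 0], base[:, 1], base[:, 0] + base[:, 1]])

pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)
print(X_reduced.shape)                       # (200, 2)
print(pca.explained_variance_ratio_.sum())   # close to 1.0: little variance lost
```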
Centroid Method:
1) Obtain the correlation matrix.
2) Obtain the grand sum of the matrix, the row sums, and the column sums.
3) Calculate N = 1 / sqrt(grand total).
4) Multiply each column sum by N, which gives the first factor loadings.
5) To find the second factor loadings, form the cross-product matrix of factor 1 by listing the first factor loadings horizontally and vertically and then multiplying the corresponding rows and columns.
6) Find the first factor residual matrix, given as Residuals = total variation - explained variation, i.e. residual_ij = r_ij - l_i * l_j.
7) Reflection: reflection means that each test (variable) whose column sum in the residual matrix is negative has its signs reversed, so that the column sums become positive before the next factor is extracted.
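Steps 1-6 above can be sketched in NumPy. The small correlation matrix here is an assumed example chosen only to make the arithmetic visible:

```python
# Sketch of steps 1-6 of the centroid method on an assumed 3x3
# correlation matrix.
import numpy as np

R = np.array([
    [1.0, 0.6, 0.5],
    [0.6, 1.0, 0.4],
    [0.5, 0.4, 1.0],
])

grand_total = R.sum()              # step 2: grand sum of the matrix
col_sums = R.sum(axis=0)           # step 2: column sums
N = 1.0 / np.sqrt(grand_total)     # step 3
loadings = col_sums * N            # step 4: first factor loadings

cross = np.outer(loadings, loadings)  # step 5: cross-product matrix of factor 1
residual = R - cross                  # step 6: residual_ij = r_ij - l_i * l_j
print(loadings)
print(residual)
```

Note that the residual matrix always sums to zero by construction, since the cross-product matrix reproduces the grand total of the original matrix.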
Hierarchical Clustering:
Hierarchical clustering is an unsupervised clustering technique that creates clusters in a predefined order. The clusters are arranged in a top-to-bottom manner: similar clusters are grouped together and organized into a hierarchy. It can be further divided into two types, namely agglomerative hierarchical clustering and divisive hierarchical clustering. In this clustering, we successively link pairs of clusters until all the data objects are contained in the hierarchy.
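The agglomerative (bottom-up) variant can be sketched with scikit-learn's AgglomerativeClustering (an assumed toolkit; the 2-D points below are synthetic):

```python
# Minimal sketch of agglomerative hierarchical clustering: each point starts
# as its own cluster, and the closest pair of clusters is merged at each step.
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Two well-separated groups of points.
X = np.array([
    [0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
    [5.0, 5.0], [5.1, 5.2], [5.2, 5.1],
])

model = AgglomerativeClustering(n_clusters=2).fit(X)
print(model.labels_)  # the first three points share one label, the last three the other
```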
Applications
There are many real-life applications of Hierarchical clustering. They
include: