
Unit 5 Mfds

Lasso regression performs variable selection by penalizing the absolute size of coefficients, setting some to exactly zero. It is useful for datasets with high dimensions or correlation between variables. Ridge regression performs L2 regularization which penalizes the squared magnitude of coefficients, preventing overfitting. Principal component analysis reduces dimensionality through orthogonal transformation to linearly uncorrelated principal components that retain maximum variance. Hierarchical clustering arranges clusters in a hierarchical tree structure based on similarity, while non-hierarchical clustering partitions data into a predefined number of clusters without a hierarchical structure.


What is LASSO Regression? 

Lasso regression is also called the penalized regression method. It is commonly used in machine learning to select a subset of variables, and it can provide greater prediction accuracy than other regression models. Lasso regularization also improves model interpretability.

Lasso regression penalizes the less important features of a dataset by shrinking their coefficients to exactly zero, which eliminates them from the model. Datasets with high dimensionality and correlated variables are well suited to lasso regression.

Lasso Regression Formula:

D = Residual Sum of Squares (Least Squares) + λ × (sum of the absolute values of the coefficients)

Lambda denotes the amount of shrinkage in the lasso regression equation.

The penalizing factor is added to the least-squares term to form the lasso objective. The selection of the best model depends upon its ability to reduce this loss function to its minimal value.

All the estimated parameters appear in the lasso penalty, and the value of lambda lies between zero and infinity, which determines how aggressive the regularization is. Lambda is selected using cross-validation.

As the value of lambda is increased, the coefficients shrink and gradually become exactly zero.

Lasso linear regression, also known as L1 regularization, tends to retain one variable from a group of correlated variables while setting the others to zero. This can lead to lower accuracy due to the loss of information.
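
As a rough illustration, here is a minimal sketch of fitting a lasso model with lambda chosen by cross-validation. It assumes scikit-learn and an illustrative synthetic dataset; note that scikit-learn calls lambda "alpha".

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV

# Illustrative synthetic data: many features, only a few informative.
X, y = make_regression(n_samples=200, n_features=50, n_informative=5,
                       noise=10.0, random_state=0)

# LassoCV selects the shrinkage parameter (lambda) by cross-validation.
model = LassoCV(cv=5).fit(X, y)
print("chosen lambda (alpha):", model.alpha_)
print("nonzero coefficients:", np.sum(model.coef_ != 0))

With only a few informative features, most of the fitted coefficients come out exactly zero, which is the variable-selection behavior described above.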

What is Ridge Regression?


Ridge regression is a specialized technique used to analyze multiple regression data that is multicollinear in nature. It is a fundamental regularization technique, though it is not used very widely because of the complex science behind it. However, it is fairly easy to explore the science behind ridge regression if you have an overall idea of the concept of multiple regression. The regression itself stays the same; in regularization, only the way the model coefficients are determined differs.

Ridge regression carries out L2 regularization: a penalty proportional to the square of the magnitude of the coefficients is added to the objective.

The minimization objective = LS Obj + λ × (the sum of the squares of the coefficients)

Here, LS Obj is the Least Squares Objective, i.e. the linear regression objective without regularization.
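
A minimal sketch, again assuming scikit-learn and synthetic data, of fitting a ridge model; alpha plays the role of λ here:

from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

X, y = make_regression(n_samples=200, n_features=20, noise=5.0, random_state=0)

# Ridge shrinks coefficients toward zero but, unlike lasso, rarely to exactly zero.
model = Ridge(alpha=1.0).fit(X, y)
print(model.coef_[:5])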

What are the advantages of Ridge Regression?


 It protects the model from overfitting.
 It does not need unbiased estimators.
 Model complexity is reduced.

What is Principal Component Analysis?


Principal Component Analysis is an unsupervised learning algorithm that is used for dimensionality reduction in machine learning. It is a statistical process that converts the observations of correlated features into a set of linearly uncorrelated features with the help of an orthogonal transformation. These new transformed features are called the Principal Components. It is one of the popular tools used for exploratory data analysis and predictive modeling. It is a technique for drawing strong patterns from a given dataset by keeping the directions of highest variance and discarding the rest.

PCA generally tries to find a lower-dimensional surface onto which to project the high-dimensional data.

PCA works by considering the variance along each direction: directions with high variance carry more information (for example, a better split between classes), so the low-variance directions can be dropped, reducing the dimensionality. Some real-world applications of PCA are image processing, movie recommendation systems, and optimizing the power allocation in various communication channels. It is a feature extraction technique, so it retains the important variables and drops the least important ones.

The PCA algorithm is based on some mathematical concepts such as:

o Variance and Covariance
o Eigenvalues and Eigenvectors
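
As an illustrative sketch, using scikit-learn and its bundled Iris data, here is a reduction of four correlated features to two principal components:

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data  # four correlated measurements per flower

# Each principal component is an orthogonal direction of maximum remaining variance.
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)
print(pca.explained_variance_ratio_)

The printed ratios show how much of the total variance each retained component explains, which is the "retain maximum variance" idea in practice.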

Centroid Method:
1) Obtain the correlation matrix.
2) Obtain the grand total of the matrix, along with each row sum and column sum.
3) Calculate N = 1 / √(grand total).
4) Multiply each column sum by N, which gives the first factor loadings.
5) To find the second factor loading, form the cross-product matrix of factor 1 by writing the first factor loadings horizontally and vertically and multiplying the corresponding rows and columns.
6) Find the first factor residual matrix, given as: residuals = total variation − explained variation, i.e. r̄ᵢⱼ = rᵢⱼ − lᵢlⱼ.
7) Reflection: reflection means that each test (variable) with a negative column sum has the signs of its correlations reversed, so that the column sums are maximized before the procedure is repeated for the next factor.
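
A hedged numpy sketch of steps 1 to 6 on a small correlation matrix; the matrix values are made up purely for demonstration:

import numpy as np

# Illustrative 3x3 correlation matrix (step 1).
R = np.array([[1.0, 0.6, 0.5],
              [0.6, 1.0, 0.4],
              [0.5, 0.4, 1.0]])

grand_total = R.sum()           # step 2: grand total of the matrix
col_sums = R.sum(axis=0)        # step 2: column sums
N = 1.0 / np.sqrt(grand_total)  # step 3
loadings = col_sums * N         # step 4: first factor loadings

# Step 6: residual matrix, r_ij minus l_i * l_j.
residual = R - np.outer(loadings, loadings)
print(loadings)
print(residual.round(3))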

What is Cluster Analysis?


Cluster analysis is a data analysis technique that explores the naturally occurring groups within a data set, known as clusters. Cluster analysis does not assign data points to predefined groups, which makes it an unsupervised learning method: insights are derived from the data without any predefined labels or classes. A good clustering algorithm ensures high intra-cluster similarity and low inter-cluster similarity.

Hierarchical Clustering:
Hierarchical clustering is an unsupervised clustering technique that creates clusters in a predefined order, arranged from top to bottom. In this type of clustering, similar clusters are grouped together in a hierarchical manner. It can be further divided into two types, namely agglomerative hierarchical clustering and divisive hierarchical clustering. In this clustering, pairs of clusters are linked until all the data objects are connected in the hierarchy.
Applications
There are many real-life applications of hierarchical clustering. They include:

 Bioinformatics: grouping animals according to their biological features to reconstruct phylogeny trees.
 Business: dividing customers into segments or forming a hierarchy of employees based on salary.
 Image processing: grouping handwritten characters in text recognition based on the similarity of the character shapes.
 Information retrieval: categorizing search results based on the query.

Hierarchical clustering types

There are two main types of hierarchical clustering:

1. Agglomerative: Initially, each object is considered to be its own cluster. According to a particular procedure, the clusters are then merged step by step until a single cluster remains (see the sketch after this list). At the end of the merging process, a cluster containing all the elements is formed.
2. Divisive: The divisive method is the opposite of the agglomerative method. Initially, all objects are considered to be in a single cluster. The division process is then performed step by step until each object forms its own cluster. The splitting procedure is carried out according to some principle, such as dividing at the maximum distance between neighboring objects in the cluster.
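
A minimal agglomerative (bottom-up) sketch, assuming scikit-learn and synthetic blob data; the Ward linkage choice is illustrative:

from sklearn.datasets import make_blobs
from sklearn.cluster import AgglomerativeClustering

X, _ = make_blobs(n_samples=60, centers=3, random_state=0)

# Bottom-up: each point starts as its own cluster; clusters merge step by step.
labels = AgglomerativeClustering(n_clusters=3, linkage="ward").fit_predict(X)
print(labels[:10])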

Non-Hierarchical Clustering:

Non-hierarchical clustering involves the formation of new clusters by merging or splitting groups of points. It does not follow a tree-like structure the way hierarchical clustering does. This technique groups the data so as to maximize or minimize some evaluation criterion. K-means clustering is an effective non-hierarchical method. In this method, the data is partitioned into non-overlapping groups that have no hierarchical relationships between themselves.
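
A minimal k-means sketch, again with scikit-learn and illustrative synthetic data; the number of clusters is fixed up front, which is the defining trait of this family:

from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

X, _ = make_blobs(n_samples=60, centers=3, random_state=0)

# Partition the data into k non-overlapping groups; no hierarchy is built.
model = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(model.cluster_centers_)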
