
Unit 5 Mfds

Lasso regression performs variable selection by penalizing the absolute size of coefficients, setting some to exactly zero. It is useful for datasets with high dimensions or correlation between variables. Ridge regression performs L2 regularization which penalizes the squared magnitude of coefficients, preventing overfitting. Principal component analysis reduces dimensionality through orthogonal transformation to linearly uncorrelated principal components that retain maximum variance. Hierarchical clustering arranges clusters in a hierarchical tree structure based on similarity, while non-hierarchical clustering partitions data into a predefined number of clusters without a hierarchical structure.


What is LASSO Regression? 

Lasso regression is also called the penalized regression method. It is commonly used in machine learning to select a subset of variables, and it can provide greater prediction accuracy than other regression models. Lasso regularization also improves model interpretability.

Lasso regression penalizes the less important features of a dataset by shrinking their coefficients to exactly zero, which eliminates them from the model. Datasets with high dimensionality and correlated variables are well suited to lasso regression.

Lasso Regression Formula:

D = Residual Sum of Squares (Least Squares) + λ × (sum of the absolute values of the coefficients)

Lambda denotes the amount of shrinkage in the lasso regression equation.

The penalizing factor is added to the least-squares term to form the lasso objective. The selection of the best model depends upon its ability to reduce this loss function to its minimal value.

All the estimated parameters appear in the lasso penalty, and the value of lambda lies between zero and infinity, which determines how aggressive the regularization is. Lambda is selected using cross-validation.

As the value of lambda is increased, the coefficients shrink and gradually become exactly zero.

Lasso linear regression, also known as L1 regularization, tends to retain one variable from a group of correlated variables while setting the others to zero. This can lead to lower accuracy due to the loss of information.
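
As a rough illustration, here is a minimal sketch of fitting a lasso model with lambda chosen by cross-validation. It assumes scikit-learn and an illustrative synthetic dataset; note that scikit-learn calls lambda "alpha".

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV

# Illustrative synthetic data: many features, only a few informative.
X, y = make_regression(n_samples=200, n_features=50, n_informative=5,
                       noise=10.0, random_state=0)

# LassoCV selects the shrinkage parameter (lambda) by cross-validation.
model = LassoCV(cv=5).fit(X, y)
print("chosen lambda (alpha):", model.alpha_)
print("nonzero coefficients:", np.sum(model.coef_ != 0))

With only a few informative features, most of the fitted coefficients come out exactly zero, which is the variable-selection behavior described above.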

What is Ridge Regression?


Ridge regression is a specialized technique used to analyze multiple regression data that is multicollinear in nature. It is a fundamental regularization technique, though it is not used very widely because of the complex science behind it. However, it is fairly easy to explore the science behind ridge regression if you have an overall idea of the concept of multiple regression. The regression itself stays the same; in regularization, only the way the model coefficients are determined differs.

Ridge regression carries out L2 regularization: a penalty proportional to the square of the magnitude of the coefficients is added to the objective.

The minimization objective = LS Obj + λ × (the sum of the squares of the coefficients)

Here, LS Obj is the Least Squares Objective, i.e. the linear regression objective without regularization.
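
A minimal sketch, again assuming scikit-learn and synthetic data, of fitting a ridge model; alpha plays the role of λ here:

from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

X, y = make_regression(n_samples=200, n_features=20, noise=5.0, random_state=0)

# Ridge shrinks coefficients toward zero but, unlike lasso, rarely to exactly zero.
model = Ridge(alpha=1.0).fit(X, y)
print(model.coef_[:5])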

What are the advantages of Ridge Regression?


 It protects the model from overfitting.
 It does not need unbiased estimators.
 Model complexity is reduced.

What is Principal Component Analysis?


Principal Component Analysis is an unsupervised learning algorithm that is used for dimensionality reduction in machine learning. It is a statistical process that converts the observations of correlated features into a set of linearly uncorrelated features with the help of an orthogonal transformation. These new transformed features are called the Principal Components. It is one of the popular tools used for exploratory data analysis and predictive modeling. It is a technique for drawing strong patterns from a given dataset by keeping the directions of highest variance and discarding the rest.

PCA generally tries to find a lower-dimensional surface onto which to project the high-dimensional data.

PCA works by considering the variance along each direction: directions with high variance carry more information (for example, a better split between classes), so the low-variance directions can be dropped, reducing the dimensionality. Some real-world applications of PCA are image processing, movie recommendation systems, and optimizing the power allocation in various communication channels. It is a feature extraction technique, so it retains the important variables and drops the least important ones.

The PCA algorithm is based on some mathematical concepts such as:

o Variance and Covariance
o Eigenvalues and Eigenvectors
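
As an illustrative sketch, using scikit-learn and its bundled Iris data, here is a reduction of four correlated features to two principal components:

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data  # four correlated measurements per flower

# Each principal component is an orthogonal direction of maximum remaining variance.
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)
print(pca.explained_variance_ratio_)

The printed ratios show how much of the total variance each retained component explains, which is the "retain maximum variance" idea in practice.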

Centroid Method:
1) Obtain the correlation matrix.
2) Obtain the grand total of the matrix, along with each row sum and column sum.
3) Calculate N = 1 / √(grand total).
4) Multiply each column sum by N, which gives the first factor loadings.
5) To find the second factor loading, form the cross-product matrix of factor 1 by writing the first factor loadings horizontally and vertically and multiplying the corresponding rows and columns.
6) Find the first factor residual matrix, given as: residuals = total variation − explained variation, i.e. r̄ᵢⱼ = rᵢⱼ − lᵢlⱼ.
7) Reflection: reflection means that each test (variable) with a negative column sum has the signs of its correlations reversed, so that the column sums are maximized before the procedure is repeated for the next factor.
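
A hedged numpy sketch of steps 1 to 6 on a small correlation matrix; the matrix values are made up purely for demonstration:

import numpy as np

# Illustrative 3x3 correlation matrix (step 1).
R = np.array([[1.0, 0.6, 0.5],
              [0.6, 1.0, 0.4],
              [0.5, 0.4, 1.0]])

grand_total = R.sum()           # step 2: grand total of the matrix
col_sums = R.sum(axis=0)        # step 2: column sums
N = 1.0 / np.sqrt(grand_total)  # step 3
loadings = col_sums * N         # step 4: first factor loadings

# Step 6: residual matrix, r_ij minus l_i * l_j.
residual = R - np.outer(loadings, loadings)
print(loadings)
print(residual.round(3))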

What is Cluster Analysis?


Cluster analysis is a data analysis technique that explores the naturally occurring groups within a data set, known as clusters. Cluster analysis does not assign data points to predefined groups, which makes it an unsupervised learning method: insights are derived from the data without any predefined labels or classes. A good clustering algorithm ensures high intra-cluster similarity and low inter-cluster similarity.

Hierarchical Clustering:
Hierarchical clustering is an unsupervised clustering technique that creates clusters in a predefined order, arranged from top to bottom. In this type of clustering, similar clusters are grouped together in a hierarchical manner. It can be further divided into two types, namely agglomerative hierarchical clustering and divisive hierarchical clustering. In this clustering, pairs of clusters are linked until all the data objects are connected in the hierarchy.
Applications
There are many real-life applications of hierarchical clustering. They include:

 Bioinformatics: grouping animals according to their biological features to reconstruct phylogeny trees.
 Business: dividing customers into segments or forming a hierarchy of employees based on salary.
 Image processing: grouping handwritten characters in text recognition based on the similarity of the character shapes.
 Information retrieval: categorizing search results based on the query.

Hierarchical clustering types

There are two main types of hierarchical clustering:

1. Agglomerative: Initially, each object is considered to be its own cluster. According to a particular procedure, the clusters are then merged step by step until a single cluster remains (see the sketch after this list). At the end of the merging process, a cluster containing all the elements is formed.
2. Divisive: The divisive method is the opposite of the agglomerative method. Initially, all objects are considered to be in a single cluster. The division process is then performed step by step until each object forms its own cluster. The splitting procedure is carried out according to some principle, such as dividing at the maximum distance between neighboring objects in the cluster.
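
A minimal agglomerative (bottom-up) sketch, assuming scikit-learn and synthetic blob data; the Ward linkage choice is illustrative:

from sklearn.datasets import make_blobs
from sklearn.cluster import AgglomerativeClustering

X, _ = make_blobs(n_samples=60, centers=3, random_state=0)

# Bottom-up: each point starts as its own cluster; clusters merge step by step.
labels = AgglomerativeClustering(n_clusters=3, linkage="ward").fit_predict(X)
print(labels[:10])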

Non-Hierarchical Clustering:

Non-hierarchical clustering involves the formation of new clusters by merging or splitting groups of points. It does not follow a tree-like structure the way hierarchical clustering does. This technique groups the data so as to maximize or minimize some evaluation criterion. K-means clustering is an effective non-hierarchical method. In this method, the data is partitioned into non-overlapping groups that have no hierarchical relationships between themselves.
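
A minimal k-means sketch, again with scikit-learn and illustrative synthetic data; the number of clusters is fixed up front, which is the defining trait of this family:

from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

X, _ = make_blobs(n_samples=60, centers=3, random_state=0)

# Partition the data into k non-overlapping groups; no hierarchy is built.
model = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(model.cluster_centers_)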
