
Module 2

Statistical Decision Theory, Bayesian Learning (ML, MAP, Bayes estimates, Conjugate
priors), Linear Regression, Ridge Regression, Lasso, Principal Component Analysis,
Partial Least Squares.
Statistical Decision Theory
Statistical decision theory: Decision theory is the science of making optimal decisions in the face of uncertainty. Statistical decision theory is concerned with making decisions in the presence of statistical knowledge (data), which sheds light on some of the uncertainties involved in the decision problem.

o Statistical decision theory may be defined as a body of methods that help the decision-maker select the best course of action from among several alternatives.
o The ideal of decision theory is to make choices rational by reducing them to a kind of routine calculation.
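As a small illustration of the idea (not from the notes), the sketch below chooses the action with the lowest expected loss under an assumed posterior over two states of nature; the probabilities and loss values are made up for the example.

```python
import numpy as np

# Hypothetical decision problem: two states of nature, two possible actions.
# posterior[s] is the (assumed) probability of state s given the observed data.
posterior = np.array([0.7, 0.3])

# loss[a, s] is the loss of taking action a when the true state is s (made up).
loss = np.array([[0.0, 10.0],
                 [5.0,  1.0]])

expected_loss = loss @ posterior              # expected loss of each action
best_action = int(np.argmin(expected_loss))   # action with the smallest expected loss
print(expected_loss, best_action)
```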

Linear Regression
Regression
Regression: The main goal of regression is to construct an efficient model that predicts the dependent attribute from a set of independent attribute variables. A regression problem is one in which the output variable is a real or continuous value, e.g. salary, weight, area, etc.

Types of Regression

1. Linear Regression
2. Polynomial Regression
3. Support Vector Regression
4. Decision Tree Regression
5. Random Forest Regression

Linear Regression
Linear Regression: Linear regression is a machine learning algorithm based on supervised learning. It performs a regression task. It is a statistical method used for predictive analysis. Linear regression makes predictions for continuous/real or numeric variables such as sales, salary, age, product price, etc.

o The linear regression algorithm models a linear relationship between a dependent variable (y) and one or more independent variables (x), hence the name linear regression.
o Since linear regression models a linear relationship, it finds how the value of the dependent variable changes according to the value of the independent variable.
o The linear regression model provides a sloped straight line representing the relationship between the variables.

Mathematically, we can represent a linear regression as:

y = a0 + a1x + ε

Here,

o y = dependent variable / labels of the data (target variable)
o x = independent variable / input training data (predictor variable)
o a0 = intercept of the line (gives an additional degree of freedom)
o a1 = linear regression coefficient (scale factor applied to each input value)
o ε = random error
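As a minimal sketch (not part of the notes), the code below estimates a1 and a0 by ordinary least squares with NumPy; the toy x and y arrays are assumed purely for illustration.

```python
import numpy as np

# Toy data, assumed for illustration: y is roughly 2x + 1 plus noise.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 9.1, 10.8])

# Closed-form least-squares estimates of the slope a1 and intercept a0.
a1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a0 = y.mean() - a1 * x.mean()

y_pred = a0 + a1 * x          # predictions from the fitted line
print(a0, a1)
```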

Linear Regression Line


Linear Regression Line: A straight line showing the relationship between the dependent and independent variables is called a regression line. A regression line can show two types of relationship:

o Positive Linear Relationship: If the dependent variable increases on the Y-axis as the independent variable increases on the X-axis, the relationship is termed a positive linear relationship.
o Negative Linear Relationship: If the dependent variable decreases on the Y-axis as the independent variable increases on the X-axis, the relationship is called a negative linear relationship.

Types of Linear Regression


Linear regression can be further divided into two types of algorithm:

o Simple Linear Regression: If a single independent variable is used to predict


the value of a numerical dependent variable, then such a Linear Regression
algorithm is called Simple Linear Regression.
o Multiple Linear Regression: If more than one independent variable is used to
predict the value of a numerical dependent variable, then such a Linear
Regression algorithm is called Multiple Linear Regression.
Finding the best fit line
When working with linear regression, our main goal is to find the best fit line, which means that the error between the predicted values and the actual values should be minimized. The best fit line has the least error.

Different values for the weights or coefficients of the line (a0, a1) give different regression lines, so we need to calculate the best values for a0 and a1 to find the best fit line. To do this, we use a cost function.

Cost function
o Different values for the weights or coefficients of the line (a0, a1) give different regression lines, and the cost function is used to estimate the values of the coefficients for the best fit line.
o The cost function optimizes the regression coefficients or weights. It measures how well a linear regression model is performing.
o We can use the cost function to find the accuracy of the mapping function, which maps the input variable to the output variable. This mapping function is also known as the hypothesis function.
o In seeking the best-fit regression line, the model aims to predict y such that the error between the predicted value and the true value is minimum. It is therefore very important to update the a0 and a1 values to reach the values that minimize the error between the predicted y value and the true y value.

Mean Squared Error (MSE): For linear regression, we use the Mean Squared Error (MSE) cost function, which is the average of the squared errors between the predicted values and the actual values. It can be written as:

MSE = (1/N) Σ (yi − (a1xi + a0))²

Where,

o N = total number of observations/data points
o yi = actual value
o (a1xi + a0) = predicted value

Residuals: The distance between an actual value and the corresponding predicted value is called a residual. If the observed points are far from the regression line, the residuals will be high and so the cost function will be high. If the scatter points are close to the regression line, the residuals will be small and hence so will the cost function.
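A short sketch (with assumed toy arrays) of computing the residuals and the MSE cost for one candidate pair (a0, a1):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.9, 5.1, 7.0, 9.2])     # actual values, toy data

a0, a1 = 1.0, 2.0                      # candidate intercept and slope
y_pred = a0 + a1 * x                   # predicted values
residuals = y - y_pred                 # actual minus predicted
mse = np.mean(residuals ** 2)          # average squared error (the cost)
print(residuals, mse)
```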

Gradient Descent
o Gradient descent is used to minimize the MSE by calculating the gradient of the cost function.
o A regression model uses gradient descent to update the coefficients of the line by reducing the cost function.
o It starts with randomly selected values of the coefficients and then iteratively updates these values to reach the minimum of the cost function.
o To update a0 and a1, we take gradients from the cost function. To find these gradients, we take partial derivatives of the cost function with respect to a0 and a1.
o Alpha is the learning rate, a hyper-parameter that you must specify. A smaller learning rate moves closer to the minimum but takes more time; a larger learning rate converges sooner but risks overshooting the minimum.
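A minimal gradient-descent sketch for the MSE cost above; the learning rate and iteration count below are illustrative choices, not values prescribed by the notes.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.0, 5.0, 7.0, 9.0, 11.0])   # toy data: y = 2x + 1

a0, a1 = 0.0, 0.0        # initial coefficients
alpha = 0.01             # learning rate (hyper-parameter)

for _ in range(5000):
    y_pred = a0 + a1 * x
    error = y_pred - y
    # Partial derivatives of the MSE with respect to a0 and a1.
    grad_a0 = 2.0 * np.mean(error)
    grad_a1 = 2.0 * np.mean(error * x)
    a0 -= alpha * grad_a0
    a1 -= alpha * grad_a1

print(a0, a1)   # approaches the least-squares solution (about 1 and 2)
```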

Advantages and Disadvantages of Linear Regression

Advantages:

o Linear regression performs exceptionally well for linearly separable data.
o It is easier to implement and interpret, and efficient to train.
o It handles over-fitting fairly well using dimensionality reduction techniques, regularization, and cross-validation.
o It allows extrapolation beyond a specific data set.

Disadvantages:

o It assumes linearity between the dependent and independent variables.
o It is often quite prone to noise and over-fitting.
o Linear regression is quite sensitive to outliers.
o It is prone to multicollinearity.
Model Performance
The goodness of fit determines how well the regression line fits the set of observations. The process of finding the best model out of various models is called optimization. It can be assessed by the method below.

R-squared method

o R-squared is a statistical measure that determines the goodness of fit.
o It measures the strength of the relationship between the dependent and independent variables on a scale of 0 to 100%.
o A high value of R-squared indicates a smaller difference between the predicted values and the actual values and hence represents a good model.
o It is also called the coefficient of determination, or the coefficient of multiple determination for multiple regression.
o It can be calculated from the formula below:

R² = 1 − (Residual sum of squares / Total sum of squares) = 1 − Σ(yi − ŷi)² / Σ(yi − ȳ)²
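A small sketch (toy arrays assumed) of computing R-squared from actual and predicted values:

```python
import numpy as np

y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.8, 5.2, 6.9, 9.1])   # predictions from some fitted line

ss_res = np.sum((y_true - y_pred) ** 2)           # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)    # total sum of squares
r_squared = 1.0 - ss_res / ss_tot
print(r_squared)
```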

Assumptions of Linear Regression


Below are some important assumptions of linear regression. These are formal checks to perform while building a linear regression model, which ensure that we get the best possible result from the given dataset.

o Linear relationship between the features and target: Linear regression assumes a linear relationship between the dependent and independent variables.
o Small or no multicollinearity between the features: Multicollinearity means high correlation between the independent variables. Due to multicollinearity, it may be difficult to find the true relationship between the predictors and the target variable; in other words, it is difficult to determine which predictor variable is affecting the target variable and which is not. So the model assumes either little or no multicollinearity between the features or independent variables.
o Homoscedasticity assumption: Homoscedasticity is the situation in which the error term has the same variance for all values of the independent variables. With homoscedasticity, there should be no clear pattern in the distribution of points in the scatter plot.
o Normal distribution of error terms: Linear regression assumes that the error terms follow a normal distribution. If the error terms are not normally distributed, confidence intervals become either too wide or too narrow, which may cause difficulties in estimating the coefficients. This can be checked using a q-q plot: if the plot shows a straight line without large deviations, the errors are normally distributed.
o No autocorrelation: The linear regression model assumes no autocorrelation in the error terms. If there is any correlation in the error terms, it will drastically reduce the accuracy of the model. Autocorrelation usually occurs when there is a dependency between residual errors.
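A brief sketch of two of these checks (residual normality via a q-q plot, and multicollinearity via a correlation matrix of the features); the residuals and feature matrix below are placeholders for values taken from a fitted model.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Placeholder inputs: in practice, use the residuals of your fitted model
# and your actual feature matrix X.
rng = np.random.default_rng(0)
residuals = rng.normal(size=100)
X = rng.normal(size=(100, 3))

# Q-Q plot: points lying close to the straight line suggest normally
# distributed error terms.
stats.probplot(residuals, dist="norm", plot=plt)
plt.show()

# Correlation matrix of the features: off-diagonal values near +1 or -1
# indicate multicollinearity between independent variables.
print(np.corrcoef(X, rowvar=False))
```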

Linear Regression Use Cases


o Sales forecasting.
o Risk analysis.
o Housing applications, to predict prices and other factors.
o Finance applications, to predict stock prices, evaluate investments, etc.

Ridge Regression & Lasso Regression

Regularization

Regularization: Regularization is a technique to prevent the model from over-fitting by adding extra information to it. It mainly regularizes or shrinks the coefficients of the features toward zero.

o In simple words, in the regularization technique we reduce the magnitude of the coefficients while keeping the same number of features. Hence, it maintains accuracy as well as generalizes the model.
o Regularization is used to reduce errors by fitting the function appropriately on the given training set and to avoid over-fitting.
How does Regularization Work
Regularization works by adding a penalty or complexity term to the complex model.
Let's consider the simple linear regression equation:

y = β0 + β1x1 + β2x2 + β3x3 + ⋯ + βnxn + b

In the above equation, y represents the value to be predicted,

x1, x2, …, xn are the features for y,

β1, …, βn are the weights or magnitudes attached to the features, β0 is the bias of the model, and b represents the intercept.

Linear regression models try to optimize the coefficients and the intercept to minimize the cost function. The loss function for linear regression is called the RSS or residual sum of squares:

RSS = Σ (yi − ŷi)², where ŷi is the predicted value from the equation above.

We then add a penalty term to this loss function and optimize the parameters so that the model can predict the value of y accurately while keeping the coefficients small.

Techniques of Regularization
There are mainly two types of regularization techniques, which are given below:

o Ridge Regression / L2 Regularization


o Lasso Regression / L1 Regularization

L1 & L2 Regularization
L1 regularization adds a penalty equal to the absolute value of the magnitude of the coefficients. This type of regularization can result in sparse models with few coefficients: some coefficients may become exactly zero and be eliminated from the model. Larger penalties result in coefficient values closer to zero (ideal for producing simpler models). L2 regularization, on the other hand, does not eliminate coefficients or produce sparse models. Thus, Lasso regression is easier to interpret than Ridge regression.
Ridge Regression

o Ridge regression is a model tuning method used to analyse data that suffers from multicollinearity. When multicollinearity occurs, the least-squares estimates are unbiased but their variances are large, which results in predicted values being far from the actual values. Ridge regression shrinks the parameters and is therefore used to counter multicollinearity.
o Ridge regression is a type of linear regression in which a small amount of bias is introduced so that we can get better long-term predictions.
o Ridge regression is a regularization technique used to reduce the complexity of the model. It is also called L2 regularization.
o In this technique, the cost function is altered by adding a penalty term to it. The amount of bias added to the model is called the ridge regression penalty. It is calculated by multiplying lambda by the squared weight of each individual feature.
o Ridge regression adds the "squared magnitude" of the coefficients as the penalty term to the loss function (L).
o The equation for the cost function in ridge regression is:

Cost = Σ (yi − ŷi)² + λ Σ βj²

o In the above equation, the penalty term regularizes the coefficients of the model, and hence ridge regression reduces the magnitudes of the coefficients, which decreases the complexity of the model.
o As we can see from the above equation, as the value of λ tends to zero, the equation becomes the cost function of the linear regression model. Hence, for very small values of λ, the model will resemble the linear regression model.
o A general linear or polynomial regression will fail if there is high collinearity between the independent variables; ridge regression can be used to solve such problems.
o It also helps when we have more parameters than samples.
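A minimal sketch using scikit-learn's Ridge (assuming scikit-learn is available; the alpha parameter plays the role of λ and its value here is arbitrary):

```python
import numpy as np
from sklearn.linear_model import Ridge

# Toy data with two nearly collinear features, assumed for illustration.
rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = x1 + 0.01 * rng.normal(size=100)      # almost identical to x1
X = np.column_stack([x1, x2])
y = 3.0 * x1 + rng.normal(scale=0.1, size=100)

model = Ridge(alpha=1.0)                   # alpha is the L2 penalty strength (λ)
model.fit(X, y)
print(model.coef_, model.intercept_)       # coefficients are shrunk toward zero
```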
Lasso Regression
o Lasso regression is another regularization technique used to reduce the complexity of the model. It stands for Least Absolute Shrinkage and Selection Operator.
o It is similar to ridge regression except that the penalty term contains the absolute values of the weights instead of their squares.
o Since it takes absolute values, it can shrink coefficients all the way to 0, whereas ridge regression can only shrink them close to 0.
o Lasso regression adds the "absolute value of magnitude" of the coefficients as the penalty term to the loss function (L).
o This type of regression is well suited to models showing high levels of multicollinearity, or when you want to automate certain parts of model selection, such as variable selection/parameter elimination.
o It is also called L1 regularization. The equation for the cost function of lasso regression is:

Cost = Σ (yi − ŷi)² + λ Σ |βj|

o Some of the features are completely neglected (given zero weight) in this technique.
o Hence, lasso regression can help us reduce over-fitting in the model as well as perform feature selection.
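A minimal sketch using scikit-learn's Lasso, again with assumed toy data and an arbitrary alpha (the L1 penalty strength):

```python
import numpy as np
from sklearn.linear_model import Lasso

# Toy data, assumed: only the first feature actually drives y.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
y = 4.0 * X[:, 0] + rng.normal(scale=0.1, size=100)

model = Lasso(alpha=0.1)     # alpha is the L1 penalty strength (λ)
model.fit(X, y)
print(model.coef_)           # irrelevant coefficients are driven to exactly 0
```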

Note: During Regularization the output function (y_hat) does not change. The
change is only in the loss function.

Differences between Ridge and Lasso Regression

o Ridge regression is mostly used to reduce the overfitting in the model, and it
includes all the features present in the model. It reduces the complexity of the
model by shrinking the coefficients.
o Lasso regression helps to reduce over-fitting in the model and also performs feature selection.
Principal Component Analysis
Principal Component Analysis: Principal Component Analysis (PCA) is an unsupervised learning algorithm that is used for dimensionality reduction in machine learning. It is a statistical process that converts observations of correlated features into a set of linearly uncorrelated features with the help of an orthogonal transformation. These new transformed features are called the principal components.

o Statistically, PCA finds lines, planes and hyper-planes in the K-dimensional space that approximate the data as well as possible in the least-squares sense. A line or plane that is the least-squares approximation of a set of data points makes the variance of the coordinates on the line or plane as large as possible.
o PCA tries to find a lower-dimensional surface onto which to project the high-dimensional data.
o It is a feature extraction technique, so it retains the important variables and drops the least important ones.

Some common terms used in PCA algorithm:

o Dimensionality: The number of features or variables present in the given dataset; more simply, the number of columns in the dataset.
o Correlation: How strongly two variables are related to each other, i.e. if one changes, the other variable also changes. The correlation value ranges from -1 to +1: -1 occurs if the variables are inversely proportional to each other, and +1 indicates that the variables are directly proportional to each other.
o Orthogonal: Variables are not correlated with each other, and hence the correlation between a pair of variables is zero.
o Eigenvectors: If M is a square matrix and v is a non-zero vector, then v is an eigenvector of M if Mv is a scalar multiple of v.
o Covariance Matrix: A matrix containing the covariances between pairs of variables is called the covariance matrix.

Principal Components

Principal Components: The transformed new features, i.e. the output of PCA, are the principal components. The number of these PCs is either equal to or less than the number of original features in the dataset. Some properties of the principal components are given below:

o Each principal component must be a linear combination of the original features.
o The components are orthogonal, i.e., the correlation between any pair of components is zero.
o The importance of each component decreases when going from 1 to n: the 1st PC has the most importance and the nth PC the least.

Steps for PCA algorithm


1. Getting the dataset: Firstly, we take the input dataset and divide it into two subparts X and Y, where X is the training set and Y is the validation set.
2. Representing data in a structure: Next we represent our dataset in a structure, such as the two-dimensional matrix of the independent variables X. Here each row corresponds to a data item and each column corresponds to a feature. The number of columns is the dimensionality of the dataset.
3. Standardizing the data: In this step, we standardize our dataset. In a given column, features with high variance would otherwise be treated as more important than features with lower variance. If the importance of features should be independent of their variance, we divide each data item in a column by the standard deviation of that column. We name the resulting matrix Z.
4. Calculating the covariance of Z: To calculate the covariance of Z, we take the matrix Z and transpose it. After transposing, we multiply it by Z. The output matrix is the covariance matrix of Z.
5. Calculating the eigenvalues and eigenvectors: Now we calculate the eigenvalues and eigenvectors of the resulting covariance matrix. The eigenvectors of the covariance matrix are the directions of the axes with the highest information (variance), and the corresponding eigenvalues give the amount of variance along those directions.
6. Sorting the eigenvectors: In this step, we take all the eigenvalues and sort them in decreasing order, i.e. from largest to smallest, and simultaneously sort the eigenvectors accordingly in a matrix P. The resulting matrix is named P*.
7. Calculating the new features, or principal components: Here we calculate the new features. To do this, we multiply Z by the P* matrix. In the resulting matrix Z*, each observation is a linear combination of the original features, and the columns of Z* are independent of each other.
8. Removing less important features from the new dataset: Once the new feature set is obtained, we decide what to keep and what to remove; that is, we keep only the relevant or important features in the new dataset, and the unimportant features are removed.
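The following NumPy sketch walks through steps 3 to 7 on an assumed toy data matrix (the original notes do not include code, so the data and details here are illustrative):

```python
import numpy as np

# Toy data matrix: rows are observations, columns are features (assumed).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
X[:, 2] = X[:, 0] + 0.1 * rng.normal(size=50)   # make one feature correlated

# Step 3: standardize each column.
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# Step 4: covariance matrix of Z (Z transposed times Z, scaled).
cov = (Z.T @ Z) / (Z.shape[0] - 1)

# Steps 5-6: eigen-decomposition, sorted by decreasing eigenvalue.
eigvals, eigvecs = np.linalg.eigh(cov)          # eigh: covariance matrix is symmetric
order = np.argsort(eigvals)[::-1]
eigvals, P_star = eigvals[order], eigvecs[:, order]

# Steps 7-8: project onto the top-k principal components.
k = 2
Z_star = Z @ P_star[:, :k]                      # new, uncorrelated features
print(eigvals, Z_star.shape)
```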

Applications of Principal Component Analysis


o PCA is mainly used as a dimensionality reduction technique in various AI applications such as computer vision, image compression, etc.
o It can also be used for finding hidden patterns when data has high dimensionality. Some fields where PCA is used are finance, data mining, psychology, etc.
o It is used to find the inter-relations between variables in the data.
o It is used to interpret and visualize data.
o As the number of variables decreases, further analysis becomes simpler.
o It is often used to visualize genetic distance and relatedness between populations.

Principal Axis Method: PCA searches for a linear combination of variables that extracts the maximum variance from the variables. Once this is done, it removes that component and searches for another linear combination that explains the maximum proportion of the remaining variance, which leads to orthogonal factors. In this method, we analyze the total variance.

Partial Least Squares


Partial Least Squares (PLS): Partial least squares (PLS) regression is a technique that
reduces the predictors to a smaller set of uncorrelated components and performs
least squares regression on these components, instead of on the original data. PLS
regression is especially useful when your predictors are highly collinear, or when you
have more predictors than observations and ordinary least-squares regression either
produces coefficients with high standard errors or fails completely.
o PLS does not assume that the predictors are fixed, unlike multiple regression. This means that the predictors can be measured with error, making PLS more robust to measurement uncertainty.
o PLS combines features of principal components analysis and multiple regression. It first extracts a set of latent factors that explain as much of the covariance as possible between the independent and dependent variables. A regression step then predicts the values of the dependent variables using the decomposition of the independent variables.
o In PLS, components are selected based on how much variance they explain in the predictors and between the predictors and the response(s).
o PLS is a predictive technique that is an alternative to ordinary least squares (OLS) regression, canonical correlation, or structural equation modelling.
o PLS regression is primarily used in the chemical, drug, food, and plastic industries.
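A minimal sketch using scikit-learn's PLSRegression (assuming scikit-learn is available; the toy data and the choice of two latent components are illustrative):

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

# Toy data: more predictors than informative directions, assumed for illustration.
rng = np.random.default_rng(2)
X = rng.normal(size=(40, 10))
y = X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=40)

pls = PLSRegression(n_components=2)        # number of latent components to extract
pls.fit(X, y)
y_pred = pls.predict(X)
print(pls.x_scores_.shape, y_pred.shape)   # latent scores and predictions
```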
