
Chapter 3

Regression Techniques
Regression

• Is Supervised or Unsupervised?

• What is the basic requirement of Supervised learning?

• What is Regression?

• Regression is a supervised machine learning technique used to predict continuous values.
Which of the following is a regression task?

1. Predicting the age of a person
2. Predicting the nationality of a person
3. Predicting whether the stock price of a company will increase tomorrow
4. Predicting whether a document is related to a sighting of UFOs

(Only task 1 is regression, since age is a continuous value; the other three predict categories, so they are classification tasks.)
Regression
Regression can be of the following kinds:
• Linear regression
• Multiple linear regression
• Non-linear regression
Linear regression (single predictor variable)

• Data is modeled using a straight line.
• The regression line is represented by the following expression:

Y = α + βX

where X → predictor variable
Y → response variable
Multiple linear regression (multiple predictor variables)

• In both linear and multiple linear regression, the predictor and response variables have a linear relationship.
Non-linear regression:

• If the response variable and predictor variable have a polynomial relationship, the model is called non-linear regression.
How are α and β calculated?

• Using the least squares method:

β = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)^2
α = ȳ − β·x̄

where x̄ → mean value of x
ȳ → mean value of y
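As a quick illustration, here is a minimal Python sketch of this least-squares computation, assuming the data arrives as two plain lists:

```python
# Least-squares fit of Y = alpha + beta * X, using the formulas above.
def fit_line(xs, ys):
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # beta = sum((x - x_mean)(y - y_mean)) / sum((x - x_mean)^2)
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    den = sum((x - x_mean) ** 2 for x in xs)
    beta = num / den
    alpha = y_mean - beta * x_mean
    return alpha, beta
```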
Example 1
The table below shows the marks obtained by students in the midterm and final-year exams.

Midterm (x)   Final year (y)
45            60
70            70
60            54
84            82
75            68
84            76

Find:
1) The equation of prediction (linear regression formula)
2) The final-year marks if the midterm marks are 40
Solution:
• Find the alpha and beta values.

x    y    (x − x̄)   (y − ȳ)   (x − x̄)(y − ȳ)   (x − x̄)^2
45   60   −24.67    −8.33     205.56           608.44
70   70   0.33      1.67      0.56             0.11
60   54   −9.67     −14.33    138.56           93.44
84   82   14.33     13.67     195.89           205.44
75   68   5.33      −0.33     −1.78            28.44
84   76   14.33     7.67      109.89           205.44
x̄ = 69.67   ȳ = 68.33   Σ = 648.67   Σ = 1141.33

• Now,

β = 648.67 / 1141.33 ≈ 0.568
α = ȳ − β·x̄ = 68.33 − 0.568 × 69.67 ≈ 28.73

• So, the prediction equation is Y = 28.73 + 0.568X

• For X = 40, we get Y ≈ 51.5, i.e., about 52.

This is the predicted final-year mark for a student who scored 40 in the midterm.
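To cross-check the hand calculation, a short NumPy sketch (results should match the values above up to rounding):

```python
import numpy as np

x = np.array([45, 70, 60, 84, 75, 84])
y = np.array([60, 70, 54, 82, 68, 76])

beta, alpha = np.polyfit(x, y, deg=1)  # returns slope first, then intercept
print(alpha, beta)                     # ≈ 28.73 and ≈ 0.568
print(alpha + beta * 40)               # predicted final-year marks ≈ 51.5
```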
Example 2: Linear Regression
Solve using linear regression: for age = 55, what will be the glucose level?

SUBJECT   AGE (X)   GLUCOSE LEVEL (Y)
1         43        99
2         21        65
3         25        79
4         42        75
5         57        87
6         59        81
7         55        ?
Linear Regression – Step I
Compute XY, X^2, and Y^2 for each subject, and total each column.

SUBJECT   AGE (X)   GLUCOSE LEVEL (Y)   XY      X^2     Y^2
1         43        99                  4257    1849    9801
2         21        65                  1365    441     4225
3         25        79                  1975    625     6241
4         42        75                  3150    1764    5625
5         57        87                  4959    3249    7569
6         59        81                  4779    3481    6561
Σ         247       486                 20485   11409   40022

Linear Regression – Step II
Find the slope b1 from the column totals:

b1 = (nΣXY − ΣX·ΣY) / (nΣX^2 − (ΣX)^2)
   = (6 × 20485 − 247 × 486) / (6 × 11409 − 247^2)
   = 2868 / 7445 ≈ 0.385225

Linear Regression – Step III
Find the intercept b0:

b0 = (ΣY − b1·ΣX) / n = (486 − 0.385225 × 247) / 6 ≈ 65.14

Linear Regression – Step IV
Insert the values into the equation:

y' = b0 + b1 · x
y' = 65.14 + (0.385225 · x)
Linear Regression – Step V
Prediction: the value of y for the given value x = 55.

y' = 65.14 + (0.385225 × 55)
y' = 86.327

So the predicted glucose level for subject 7 (age 55) is 86.327.
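A small sketch that reproduces Steps II–V from the column totals tabulated in Step I:

```python
# Column totals from Step I.
n = 6
sx, sy, sxy, sx2 = 247, 486, 20485, 11409

b1 = (n * sxy - sx * sy) / (n * sx2 - sx ** 2)  # 2868 / 7445 ≈ 0.385225
b0 = (sy - b1 * sx) / n                         # ≈ 65.14
print(b0 + b1 * 55)                             # glucose level at age 55 ≈ 86.327
```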
Advantages and Disadvantages of Linear Regression

Advantages:
• Performs exceptionally well for linearly separable data
• Easier to implement and interpret, and efficient to train
• Handles overfitting reasonably well using dimensionality reduction techniques, regularization, and cross-validation

Disadvantages:
• Assumes linearity between the dependent and independent variables
• Often quite prone to noise and overfitting
• Quite sensitive to outliers; hence it should not be used for big-size data
Use Case – Implementing Linear Regression
1. Loading the data
2. Exploring the data
3. Slicing the data
4. Train and split the data
5. Generate the model
6. Evaluate the accuracy
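A hedged end-to-end sketch of these six steps with scikit-learn; the file name data.csv and the column names feature and target are hypothetical placeholders, not from the original slides:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

df = pd.read_csv("data.csv")                      # 1. Load the data
print(df.describe())                              # 2. Explore the data
X, y = df[["feature"]], df["target"]              # 3. Slice into predictor/response
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)         # 4. Train/test split
model = LinearRegression().fit(X_train, y_train)  # 5. Generate the model
print(r2_score(y_test, model.predict(X_test)))    # 6. Evaluate the accuracy (R^2)
```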
Multiple linear regression
• It is used to estimate the relationship between two or more
independent variables and one dependent variable
• Example:
• The selling price of a house can depend on the desirability of the
location, the number of bedrooms, the number of bathrooms, the
year the house was built, the square footage of the lot and a number
of other factors
• The height of a child can depend on the height of the mother, the
height of the father, nutrition, and environmental factors.
Multiple linear regression
The simplest multiple regression model, for two predictor variables, is

y = β0 + β1x1 + β2x2 + ε

or, in alternative notation,

y = a + b1x1 + b2x2 + ε
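A minimal two-predictor sketch with scikit-learn; the feature values and prices below are made up for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative data: (bedrooms, bathrooms) -> price in $1000s.
X = np.array([[3, 2], [4, 3], [2, 1], [5, 3]])
y = np.array([200, 270, 150, 310])

model = LinearRegression().fit(X, y)
print(model.intercept_, model.coef_)  # b0 and (b1, b2)
print(model.predict([[4, 2]]))        # predicted price for a new house
```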
Polynomial Regression

• It is also called a special case of multiple linear regression, because we add polynomial terms to the multiple linear regression equation to convert it into polynomial regression.
• It is a linear model with some modification made to increase accuracy.
• The dataset used for training in polynomial regression is non-linear in nature.
• It uses a linear regression model to fit complicated, non-linear functions and datasets.
Need for Polynomial Regression

• If we apply a linear model to a linear dataset, it gives a good result, as we saw in simple linear regression.
• But if we apply the same model, without any modification, to a non-linear dataset, its predictions are very poor.
• The loss function then increases, the error rate is high, and accuracy decreases.
• So for cases where the data points are arranged in a non-linear fashion, we need the polynomial regression model.
Equation of the Polynomial Regression Model

• Simple linear regression equation:
  y = b0 + b1x
• Multiple linear regression equation:
  y = b0 + b1x1 + b2x2 + b3x3 + ... + bnxn
• Polynomial regression equation:
  y = b0 + b1x + b2x^2 + b3x^3 + ... + bnx^n

• The simple and multiple linear equations are also polynomial equations of degree one, and the polynomial regression equation is a linear equation in the coefficients, with terms up to degree n.
• If your data points clearly will not fit a linear regression (a straight line through the data points), polynomial regression may be a better fit.
Problem Description
• A Human Resource company is going to hire a new candidate. The candidate states that his previous salary was 160K per annum, and HR has to check whether he is telling the truth or bluffing.
• To verify this, they only have a dataset from his previous company, in which the salaries of the top 10 positions are listed with their levels.
• By checking the available dataset, we find that there is a non-linear relationship between the position levels and the salaries.
• Our goal is to build a bluff-detector regression model so HR can hire an honest candidate. A sketch of such a model is shown below.
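A hedged scikit-learn sketch of this model; the salary values, the degree-4 polynomial, and the candidate's claimed level of 6.5 are illustrative assumptions, not the actual dataset:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

levels = np.arange(1, 11).reshape(-1, 1)           # positions 1..10
salaries = np.array([45, 50, 60, 80, 110, 150,
                     200, 300, 500, 1000]) * 1000  # illustrative salaries

poly = PolynomialFeatures(degree=4)                # degree chosen for illustration
model = LinearRegression().fit(poly.fit_transform(levels), salaries)

# Candidate claims 160K at (assumed) level 6.5; compare claim vs. model estimate.
print(model.predict(poly.transform([[6.5]])))
```

If the model's estimate near the claimed level is far from 160K, HR would treat the claim as a bluff.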
Linear Regression Use Cases
• Sales of a product; pricing, performance, and risk parameters
• Generating insights on consumer behavior, profitability, and other business
factors
• Evaluation of trends; making estimates, and forecasts
• Determining marketing effectiveness, pricing, and promotions on sales of a
product
• Assessment of risk in financial services and insurance domain
• Studying engine performance from test data in automobiles
• Calculating causal relationships between parameters in biological systems
• Conducting market research studies and customer survey results analysis
• Astronomical data analysis
• Predicting house prices with the increase in sizes of houses
Regularization in Machine Learning

• Regularization is a technique used to reduce errors by fitting the function appropriately on the given training set and avoiding overfitting. The commonly used regularization techniques are:
• Lasso regularization – L1 regularization
• Ridge regularization – L2 regularization
• Elastic Net regularization – combined L1 and L2 regularization
Lasso Regression
• A regression model that uses the L1 regularization technique is called LASSO (Least Absolute Shrinkage and Selection Operator) regression.
• Lasso regression adds the "absolute value of magnitude" of the coefficients as a penalty term to the loss function (L).
• Lasso regression also helps us achieve feature selection, by penalizing weights down to approximately zero when a feature does not serve any purpose in the model.

Cost = (1/n) Σ_{i=1..n} (ŷ_i − y_i)^2 + λ Σ_{j=1..m} |w_j|

where
• m – number of features
• n – number of examples
• y_i – actual target value
• ŷ_i – predicted target value
• w_j – coefficient (weight) of feature j, and λ controls the penalty strength
Ridge Regression

• A regression model that uses the L2 regularization technique is called Ridge regression.
• Ridge regression adds the "squared magnitude" of the coefficients as a penalty term to the loss function (L):

Cost = (1/n) Σ_{i=1..n} (ŷ_i − y_i)^2 + λ Σ_{j=1..m} w_j^2
Elastic Net Regression
• This model combines L1 and L2 regularization: we add both the absolute norm of the weights and the squared magnitude of the weights as penalty terms, with an extra hyperparameter that controls the ratio of the L1 and L2 penalties.
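A short sketch comparing the three regularized regressors in scikit-learn on synthetic data; the alpha and l1_ratio values are illustrative choices:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge, ElasticNet

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(size=100)  # only 2 informative features

for model in (Lasso(alpha=0.1), Ridge(alpha=1.0),
              ElasticNet(alpha=0.1, l1_ratio=0.5)):   # l1_ratio sets the L1/L2 mix
    model.fit(X, y)
    print(type(model).__name__, np.round(model.coef_, 2))
# Lasso and ElasticNet drive the three uninformative coefficients to (near) zero;
# Ridge only shrinks them.
```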
Key Differences between Ridge and Lasso Regression
• Ridge regression only reduces overfitting while keeping all features present in the model; it reduces model complexity by shrinking the coefficients.
• Lasso regression reduces overfitting and also performs automatic feature selection.
• Lasso regression tends to shrink coefficients all the way to zero, whereas ridge regression never sets a coefficient exactly to zero.
Evaluation Metrics for Regression Problems
How accurate is our model? The following evaluation metrics are commonly used:

1. MAE (Mean Absolute Error)
2. MSE (Mean Squared Error)
3. RMSE (Root Mean Squared Error)
4. MAPE (Mean Absolute Percentage Error)
5. R-squared (R²)
1. Mean Absolute Error (MAE):
• MAE is the average of the absolute differences between the predicted
values and the true values.
• It is robust to outliers because it treats all errors equally.
• MAE is easy to interpret as it represents the average magnitude of
errors.
2. Mean Squared Error (MSE):
• MSE is the average of the squared differences between the predicted
values and the true values.
• It penalizes larger errors more heavily than smaller errors, making it
sensitive to outliers.
3. Root Mean Squared Error (RMSE):

• RMSE is the square root of the MSE.


• It is in the same units as the target variable, which can be helpful for
interpretability.
4. R-squared (R²):
• R² measures the proportion of the variance in the dependent variable that is predictable from the independent variables.
• It typically ranges from 0 to 1, where 0 indicates that the model explains none of the variance, and 1 indicates a perfect fit.
• It is often used to compare the goodness of fit of different models.

5. Mean Absolute Percentage Error (MAPE):
• MAPE expresses the error as a percentage of the true values and is
useful when you want to understand the relative size of errors.
• However, it can be problematic when true values are close to zero
because it can result in undefined or extremely large values.
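A minimal sketch computing all five metrics with scikit-learn; the predictions below are made-up example values:

```python
from sklearn.metrics import (mean_absolute_error, mean_squared_error,
                             mean_absolute_percentage_error, r2_score)

y_true = [60, 70, 54, 82, 68, 76]
y_pred = [52, 68, 58, 76, 71, 76]   # illustrative model outputs

mae = mean_absolute_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
rmse = mse ** 0.5                   # same units as the target variable
mape = mean_absolute_percentage_error(y_true, y_pred)
r2 = r2_score(y_true, y_pred)
print(mae, mse, rmse, mape, r2)
```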

Logistic Regression
Example 1
• Regression: employee rating 4 → salary hike 20K (a continuous value).
• Logistic regression: employee rating 4 → getting promoted: yes, with probability 0.75 (based on probability).
Logistic Regression
Logistic regression is a statistical method commonly used to analyze data
with binary outcomes (yes/no, 1/0) and identify the relationship
between those outcomes and independent variables.

• Logistic regression deals with categories.


• It doesn’t predict a specific value but rather the likelihood of
something belonging to a particular class.
• For instance, classifying emails as spam (category 1) or not spam
(category 0). The output here would be a probability between 0 (not
likely spam) and 1 (very likely spam).
• This probability is then used to assign an email to a definitive
category (spam or not spam) based on a chosen threshold.
We have our logistic function, also called the sigmoid function: σ(z) = 1 / (1 + e^−z). Its graph is an S-curve; it squeezes the output of a straight line into the range (0, 1).
Example 2
Recall the key quantities: odds = P / (1 − P), z = log(odds), and P = e^z / (1 + e^z) = 1 / (1 + e^−z), where P is the probability.
Solved example 1
The dataset of pass or fail in an exam for 5 students is given in the table. Use a logistic regression classifier to answer the following questions:

1) Calculate the probability of passing for the student who studied 33 hours.
2) At least how many hours should a student study so that they will pass with more than 95% probability?

Note: Assume the model suggested by the optimizer for the odds of passing the course is

log(odds) = z = −64 + 2 × hours
Solution:
1) Probability of passing for the student who studied 33 hours:

z = −64 + 2 × hours = −64 + 2 × 33 = 2
P = 1 / (1 + e^−2) ≈ 0.88

So the probability of passing is about 88%.

2) At least how many hours should the student study to pass with more than 95% probability?

Since P > 0.95 requires z = log(odds) > ln(0.95 / 0.05) = ln 19 ≈ 2.944:
−64 + 2 × hours > 2.944, so hours > 33.47.

The student should study at least about 33.5 hours.
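A tiny sketch reproducing both parts of the solution:

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

# Part 1: probability of passing after 33 hours of study.
z = -64 + 2 * 33                  # = 2
print(sigmoid(z))                 # ≈ 0.88

# Part 2: hours needed so that P > 0.95.
z_needed = math.log(0.95 / 0.05)  # log-odds threshold ≈ 2.944
print((64 + z_needed) / 2)        # ≈ 33.47 hours
```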
Difference between Linear Regression and Logistic Regression

1. Linear regression is used for solving regression problems; logistic regression is used for solving classification problems.
2. In linear regression, we predict the values of continuous variables; in logistic regression, we predict the values of categorical variables.
3. In linear regression, we find the best-fit line, by which we can easily predict the output; in logistic regression, we find the S-curve, by which we can classify the samples.
4. In linear regression, the least squares estimation method is used to estimate the coefficients; in logistic regression, the maximum likelihood estimation method is used.
5. The output of linear regression must be a continuous value, such as price or age; the output of logistic regression must be a categorical value, such as 0 or 1, Yes or No.
6. Linear regression requires a linear relationship between the dependent and independent variables; logistic regression does not.
Difference between Linear Regression and Logistic Regression (continued)

7. In linear regression, no activation function is used; in logistic regression, an activation function (the sigmoid) is used to convert the linear regression equation into the logistic regression equation.
8. In linear regression, no threshold value is needed; in logistic regression, a threshold value is added.
9. In linear regression, we calculate the Root Mean Squared Error (RMSE) to evaluate the predicted value; in logistic regression, we use precision to evaluate the predicted value.
10. Applications of linear regression: financial risk assessment, business insights, market analysis. Applications of logistic regression: medicine, credit scoring, hotel booking, gaming, text editing.
Modelling High-Dimensional Data (Fuzzy Clustering)
• Clustering is an unsupervised machine learning technique that divides the given
data into different clusters based on their distances (similarity) from each other.

• The unsupervised k-means clustering algorithm assigns each point a membership value in a particular cluster of either 0 or 1, i.e., either true or false. Fuzzy logic instead gives fuzzy membership values for each data point in each of the clusters.

• In fuzzy c-means clustering, we find the centroids of the clusters and then calculate the distance of each data point from those centroids, repeating until the clusters formed become constant.
Fuzzy Clustering
• Fuzzy Clustering is a type of clustering algorithm in machine learning
that allows a data point to belong to more than one cluster with
different degrees of membership.

• Unlike traditional clustering algorithms, such as k-means or


hierarchical clustering, which assign each data point to a single
cluster, fuzzy clustering assigns a membership degree between 0 and
1 for each data point for each cluster.
Advantages of Fuzzy Clustering:
• Flexibility: Fuzzy clustering allows for overlapping clusters, which can
be useful when the data has a complex structure or when there are
ambiguous or overlapping class boundaries.
• Robustness: Fuzzy clustering can be more robust to outliers and noise
in the data, as it allows for a more gradual transition from one cluster
to another.
• Interpretability: Fuzzy clustering provides a more nuanced
understanding of the structure of the data, as it allows for a more
detailed representation of the relationships between data points and
clusters.
Disadvantages of Fuzzy Clustering:

• Complexity: Fuzzy clustering algorithms can be computationally more


expensive than traditional clustering algorithms, as they require
optimization over multiple membership degrees.

• Model selection: Choosing the right number of clusters and


membership functions can be challenging, and may require expert
knowledge or trial and error.
Example: Suppose the given data points are {(1, 3), (2, 5), (4, 8), (7, 9)}
The steps to perform the algorithm are:

• Step 1: Randomly initialize the memberships of the data points in the desired number of clusters.

• Let us assume there are 2 clusters into which the data is to be divided. Each data point lies in both clusters with some membership value, which can be set to anything in the initial state.

• The table below gives the values of the data points along with their initial memberships (γ) in each cluster.

Cluster   (1, 3)   (2, 5)   (4, 8)   (7, 9)
1         0.8      0.7      0.2      0.1
2         0.2      0.3      0.8      0.9

Step 2: Find out the centroids.

The formula for finding the centroid (V) of cluster j is:

V_j = Σ_k (γ_jk^m · x_k) / Σ_k (γ_jk^m)

where γ_jk is the fuzzy membership value of data point x_k in cluster j, and m is the fuzziness parameter (generally taken as 2).
Step 2 (continued): find out the centroids.

For cluster 1, with m = 2:

V1_x = (0.8^2·1 + 0.7^2·2 + 0.2^2·4 + 0.1^2·7) / (0.8^2 + 0.7^2 + 0.2^2 + 0.1^2) = 1.85 / 1.18 = 1.568
V1_y = (0.8^2·3 + 0.7^2·5 + 0.2^2·8 + 0.1^2·9) / 1.18 = 4.78 / 1.18 = 4.051

Computing cluster 2 the same way, the centroids are: (1.568, 4.051) and (5.35, 8.215).
Step 3: Find out the distance of each point from each centroid.

D11 = ((1 − 1.568)^2 + (3 − 4.051)^2)^0.5 = 1.2
D12 = ((1 − 5.35)^2 + (3 − 8.215)^2)^0.5 = 6.79

Similarly, the distances of all other points are computed from both centroids, giving the distance matrix:

Cluster   (1, 3)   (2, 5)   (4, 8)   (7, 9)
1         1.2      1.04     4.64     7.35
2         6.79     4.64     1.367    1.83

Step 4: Update the membership values.

For point 1, the new membership values are:

γ11 = [ { (1.2^2 / 1.2^2) + (1.2^2 / 6.79^2) } ^ (1/(2 − 1)) ]^−1 = 0.96
γ12 = [ { (6.79^2 / 6.79^2) + (6.79^2 / 1.2^2) } ^ (1/(2 − 1)) ]^−1 = 0.04

Alternatively, since memberships sum to 1: γ12 = 1 − γ11 = 1 − 0.96 = 0.04.

Similarly, compute all other membership values and update the matrix. The new membership values are:

Cluster   (1, 3)   (2, 5)   (4, 8)   (7, 9)
1         0.96     0.952    0.0798   0.058
2         0.04     0.048    0.920    0.944

Step 5: Repeat steps 2–4 until the membership values become constant, or until the difference between two consecutive updates is less than the tolerance value (a small value up to which a difference between successive updates is accepted).

Step 6: Defuzzify the obtained membership values, e.g., by assigning each point to the cluster in which its membership is highest.
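A compact NumPy sketch of the whole procedure (Steps 2–6) on the example data; it assumes m = 2 and that no data point coincides exactly with a centroid (which would make a distance zero):

```python
import numpy as np

# Data points and the initial membership matrix from Step 1.
X = np.array([(1, 3), (2, 5), (4, 8), (7, 9)], dtype=float)
U = np.array([[0.8, 0.7, 0.2, 0.1],   # memberships in cluster 1
              [0.2, 0.3, 0.8, 0.9]])  # memberships in cluster 2
m = 2.0                               # fuzziness parameter

for _ in range(100):
    W = U ** m
    centroids = (W @ X) / W.sum(axis=1, keepdims=True)                    # Step 2
    dist = np.linalg.norm(X[None, :, :] - centroids[:, None, :], axis=2)  # Step 3
    U_new = dist ** (-2 / (m - 1))
    U_new /= U_new.sum(axis=0)                                            # Step 4
    done = np.abs(U_new - U).max() < 1e-6                                 # Step 5
    U = U_new
    if done:
        break

print(np.round(centroids, 3))  # first pass gives (1.568, 4.051), (5.348, 8.215)
labels = U.argmax(axis=0)      # Step 6: defuzzify into hard cluster labels
print(labels)
```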
