Hota ML Regression
11.09.2024
[Figure: examples of regression curves — Logistic and Polynomial fits, with Height on one axis. Source: https://currentaffairs.adda247.com/]
• Unemployment rate, education level, population count, land area, income level, investment rate, life expectancy, … (Multiple Linear Regression: multivariate)
Another Example of Multivariate Regression
Sales = b + w1·weather + w2·money + w3·day
Regression: the process of finding the relationship between a dependent variable (outcome/response/label) and one or more independent variables (predictors/covariates/explanatory variables/features).
Independent variables (X): weather (rainy, sunny, cloudy), amount in hand, day type (working, holiday). Dependent variable (Y): Sales.
How will the dependent variable (Y) react to each variable X taken independently?
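A minimal numpy sketch of fitting such a Sales model; the data and the numeric encoding of the categorical inputs below are invented purely for illustration (in practice, one-hot encoding would suit the categorical variables better):

import numpy as np

# Toy data for Sales = b + w1*weather + w2*money + w3*day (all values invented)
weather = np.array([0, 1, 2, 1, 0, 2])            # 0=rainy, 1=sunny, 2=cloudy (crude encoding)
money   = np.array([50, 80, 60, 90, 40, 70])      # amount in hand
day     = np.array([0, 0, 1, 1, 0, 1])            # 0=working, 1=holiday
sales   = np.array([120, 200, 180, 230, 100, 190])

# Design matrix with a bias column, then an ordinary least-squares fit
X = np.column_stack([np.ones(6), weather, money, day])
w, *_ = np.linalg.lstsq(X, sales, rcond=None)
print(w)                                          # [b, w1, w2, w3]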
Fitting the Best Line: Least Squares Method
How are X and Y related? What does the fitted slope tell us if it is > 0? If < 0? If == 0?
residual = observed response − predicted response (data − fit)
Find the optimal parameter values by minimizing the sum of squared residuals.
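A minimal sketch of the closed-form least-squares fit for a single predictor (the data points are made up for illustration):

import numpy as np

# Toy observations (made up for illustration)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Closed-form least squares for a line y = intercept + slope*x:
# slope = cov(x, y) / var(x), intercept = mean(y) - slope*mean(x)
slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
intercept = y.mean() - slope * x.mean()

residuals = y - (intercept + slope * x)           # residual = data - fit
print(slope, intercept, np.sum(residuals ** 2))   # minimized sum of squared residuals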
Can you choose the best-fit line?
BMI = 18 + 1.5·(diet score) + 1.6·(male) + 4.2·(age > 20)
Y = β0 + β1·x1 + β2·x2 + β3·x3
Does the model describe the data well or poorly? Residuals randomly scattered around zero indicate a good fit.
When the data contain a second-degree polynomial (quadratic) component that the model misses, the residuals are systematically positive for much of the data. Good or bad fit?
Non-linear relations using Linear models?
• Feature Engineering: engineer new features by transforming the existing ones to capture non-linear relationships; e.g., you can include polynomial features (quadratic, cubic).
• Using Basis Functions: instead of using the original features, you can use basis functions, which are transformations of the original features, e.g., polynomial basis functions, Gaussian radial basis functions, or sigmoidal basis functions.
Example: we add a quadratic term as an independent variable in the model, y = x²; the fitted curve is a parabola. (Source: https://bookdown.org/)
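A small numpy sketch of this idea: generate data with a quadratic component (parameters invented for illustration), then fit an ordinary linear model on the engineered features [1, x, x²]:

import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 30)
y = 1.5 * x**2 - 2.0 * x + 0.5 + rng.normal(0, 0.3, x.size)   # quadratic + noise

# The model is still LINEAR in its parameters; only the features are non-linear
X = np.column_stack([np.ones_like(x), x, x**2])   # engineered features [1, x, x^2]
theta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(theta)                                      # roughly [0.5, -2.0, 1.5]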
Basis Functions: Why are they needed?
Linear or non-linear? Let us add a basis function x1·x2 to the input (this term couples the two inputs non-linearly). With the third input z = x1·x2, the XOR problem becomes linearly separable.
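A small sketch verifying this on the XOR truth table; the separating weights below are one hand-picked choice, not unique:

import numpy as np

# XOR truth table: not linearly separable in (x1, x2) alone
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])

# Add the basis function z = x1*x2 as a third input
Z = np.column_stack([X, X[:, 0] * X[:, 1]])

# In (x1, x2, z) space the plane x1 + x2 - 2z = 0.5 separates the two classes
scores = Z @ np.array([1.0, 1.0, -2.0])
print((scores > 0.5).astype(int))                 # [0 1 1 0], matching y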
Minimize Cost/Loss (MSE):
J(θ) = (1/2m) · Σᵢ (ŷᵢ − yᵢ)²
The division by 2 is for convenience and doesn't fundamentally change the result; it simplifies the derivative computation when optimizing models.
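A minimal Python sketch of this cost function (toy data chosen for illustration):

import numpy as np

def mse_cost(theta0, theta1, x, y):
    # J(theta) = (1/2m) * sum((theta0 + theta1*x - y)^2); the 1/2
    # cancels against the exponent when the derivative is taken
    m = x.size
    return np.sum((theta0 + theta1 * x - y) ** 2) / (2 * m)

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])
print(mse_cost(0.0, 1.0, x, y))                   # perfect fit -> 0.0
print(mse_cost(0.0, 0.5, x, y))                   # poorer fit -> larger cost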
Minimizing the Cost Function
The MSE cost function for linear regression is always convex.
Gradient Descent: Minimizing the MSE
• Optimization algorithm used to minimize the MSE function by iteratively
adjusting parameters in the direction of the negative gradient, aiming to
find the optimal set of parameters.
If we represent the gradient of the loss function as ∇L and the parameters we are optimizing as θ, then the update rule for gradient descent is:
θ_new = θ_old − α · ∇L
where α is the learning rate.
The MSE cost function is convex. Will you get many local minima? No, only one global minimum. Reason: if you pick any two points on the curve, the line segment joining them never crosses the curve (it never dips below it).
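A minimal sketch of batch gradient descent on a toy linear-regression problem; the data, learning rate, and iteration count are all illustrative choices:

import numpy as np

# Batch gradient descent for y = theta0 + theta1*x (true line: y = 1 + 2x)
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])
theta = np.zeros(2)                               # [theta0, theta1]
alpha = 0.05                                      # learning rate (illustrative)

for _ in range(2000):
    error = theta[0] + theta[1] * x - y           # prediction - target
    grad = np.array([error.mean(), (error * x).mean()])  # gradient of (1/2m)*sum(error^2)
    theta = theta - alpha * grad                  # theta_new = theta_old - alpha * grad
print(theta)                                      # approaches [1.0, 2.0]: the single global minimum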
Visualizing Gradient Descent
[Figure: distance vs. time for a 100 m sprint finished in 10.25 s, with the slope Δy/Δx measured over intervals of different sizes]
Will a Δy/Δx measured over a smaller interval differ from the average slope, i.e., the overall Δy/Δx? What would the instantaneous speed really be?
Better approximation: measure the slope around the point of interest with a smaller and smaller change in x, which yields a smaller and smaller change in y.
[Figure: zooming in on the distance vs. time curve around the steepest point]
Gradient of a Multivariate Function: the vector of partial derivatives with respect to each parameter, ∇f(θ) = [∂f/∂θ1, ∂f/∂θ2, …, ∂f/∂θn].
For a single parameter θ1, the sign of the derivative tells gradient descent which way to step:
• Positive derivative: new θ1 < old θ1 (step left, downhill).
• Negative derivative: new θ1 > old θ1 (step right, downhill).
• Derivative = 0: θ1 no longer changes; the minimum has been reached.
The learning rate α scales the size of each step.
Batch vs. Stochastic Gradient Descent
• Batch GD: very smooth convergence; however, it uses all the data for one update.
• Stochastic GD: very noisy convergence, because it uses only one data point for one update.
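A side-by-side sketch of the two update rules on the same toy data; the hyperparameters are illustrative, not tuned:

import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 100)
y = 1.0 + 2.0 * x + rng.normal(0, 1.0, 100)       # noisy line (made-up data)

def batch_step(theta, alpha=0.01):
    e = theta[0] + theta[1] * x - y               # errors over ALL 100 points
    return theta - alpha * np.array([e.mean(), (e * x).mean()])

def sgd_step(theta, alpha=0.01):
    i = rng.integers(x.size)                      # one randomly chosen point
    e = theta[0] + theta[1] * x[i] - y[i]
    return theta - alpha * np.array([e, e * x[i]])

theta_b, theta_s = np.zeros(2), np.zeros(2)
for _ in range(5000):
    theta_b = batch_step(theta_b)                 # smooth trajectory, costly per step
    theta_s = sgd_step(theta_s)                   # noisy trajectory, cheap per step
print(theta_b, theta_s)                           # batch converges smoothly; SGD hovers noisily near [1, 2]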
Regression vs. Classification
Aspect | Regression | Classification
Objective | Predict continuous values or a range of values (3.4, 8.6, …) | Predict categorical labels (0 or 1; cat, dog, sheep; low, medium, high)
[Figure: logistic regression fit improving after 1, 5, and 10 gradient steps. Source: mlu-explain.github.io/logistic-regression/]
Example: Chances of Admission to BITS Pilani
p = 1 / (1 + e^−(β0 + β1·Math + β2·Physics + β3·Chemistry + β4·12th Percentage))
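A small sketch of this admission-probability model; the coefficients below are entirely hypothetical, chosen only so the sigmoid yields a sensible value in (0, 1):

import numpy as np

def admission_probability(math, physics, chemistry, pct12, beta):
    # p = 1 / (1 + exp(-(b0 + b1*Math + b2*Physics + b3*Chemistry + b4*Pct)))
    z = beta[0] + beta[1]*math + beta[2]*physics + beta[3]*chemistry + beta[4]*pct12
    return 1.0 / (1.0 + np.exp(-z))

# Entirely hypothetical coefficients; a real model would learn them from data
beta = np.array([-50.0, 0.15, 0.12, 0.10, 0.20])
print(admission_probability(95, 90, 88, 92, beta))    # ~0.90, a probability in (0, 1)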
Assignment 3
Thank You!