Program Name: B.Tech CSE Semester: 5th Course Name: Machine Learning Course Code:PEC-CS-D-501 (I) Facilitator Name: Aastha
Introduction to Regression Analysis
Regression analysis is used to:
Predict the value of a dependent variable based on the value of at least one independent variable
Explain the impact of changes in an independent variable on the dependent variable
Dependent variable: the variable we wish to predict or explain
Independent variable: the variable used to explain the dependent variable
Slide-8
Simple Linear Regression Model
Slide-9
Types of Relationships
Linear relationships, Curvilinear relationships
[Figure: plots of Y against X illustrating linear and curvilinear relationships]
Slide-10
Types of Relationships
(continued)
Strong relationships, Weak relationships
[Figure: plots of Y against X illustrating strong and weak relationships]
Slide-11
Types of Relationships
(continued)
No relationship
[Figure: plots against X illustrating no relationship]
Slide-12
Simple Linear Regression Model
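$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
where
$Y_i$: dependent variable
$\beta_0$: population Y intercept
$\beta_1$: population slope coefficient
$X_i$: independent variable
$\varepsilon_i$: random error term (the random error component)
[Figure: the line $Y = \beta_0 + \beta_1 X$ plotted against X, with intercept $\beta_0$ and slope $\beta_1$. For a given $X_i$, the figure marks the observed value of Y, the predicted value of Y on the line, and the random error $\varepsilon_i$ for this $X_i$ value as the gap between the two.]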
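The population model can be illustrated with a short simulation. This is only a sketch: the values of BETA_0, BETA_1 and SIGMA, the X grid, and the assumption of normally distributed errors are all hypothetical choices made for the example, not values from the slides.

```python
import random

# Hypothetical population parameters, chosen only for illustration.
BETA_0 = 100.0  # population Y intercept
BETA_1 = 0.1    # population slope coefficient
SIGMA = 40.0    # standard deviation of the random error term


def simulate(xs, seed=0):
    """Draw (ys, errors) from Y_i = BETA_0 + BETA_1 * X_i + eps_i."""
    rng = random.Random(seed)
    errors = [rng.gauss(0.0, SIGMA) for _ in xs]
    ys = [BETA_0 + BETA_1 * x + e for x, e in zip(xs, errors)]
    return ys, errors


# Hypothetical X values spread between 1000 and 2500.
xs = [1000 + 0.15 * i for i in range(10_000)]
ys, errors = simulate(xs)

# With many observations, the average error should be close to zero.
mean_error = sum(errors) / len(errors)
print(f"mean of simulated errors: {mean_error:.3f}")
```

Each simulated Y is the linear component plus its own random error, so the points scatter around the line rather than falling exactly on it.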
Slide-14
Simple Linear Regression Equation (Prediction Line)
The simple linear regression equation provides an
estimate of the population regression line
$\hat{Y}_i = b_0 + b_1 X_i$
where
$\hat{Y}_i$: estimated (or predicted) Y value for observation i
$b_0$: estimate of the regression intercept
$b_1$: estimate of the regression slope
$X_i$: value of X for observation i
The individual random error terms $e_i$ have a mean of zero
Slide-15
Sample Data for House Price Model
House Price in $1000s (Y)    Square Feet (X)
245                          1400
312                          1600
279                          1700
308                          1875
199                          1100
219                          1550
405                          2350
324                          2450
319                          1425
255                          1700
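As a worked example, the prediction line for this sample can be estimated with the standard least-squares formulas, b1 = Sxy / Sxx and b0 = mean(Y) - b1 * mean(X). The slides do not show these formulas or this code; the function name fit_line is a hypothetical helper, and the sketch uses plain Python rather than any particular statistics package.

```python
# House price sample from the slides: price in $1000s (Y), square feet (X).
sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
prices = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]


def fit_line(x, y):
    """Return (b0, b1) for the least-squares line y_hat = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-products and sum of squares about the means.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = fit_line(sqft, prices)
print(f"b0 = {b0:.5f}, b1 = {b1:.5f}")  # b0 = 98.24833, b1 = 0.10977
```

Read in the units of the table, the slope says that each additional square foot is associated with an estimated increase of about 0.10977 thousand dollars (roughly $110) in price.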
Slide-16
Regression Using Excel
Tools / Data Analysis / Regression
Slide-17
Assumptions of Regression
Use the acronym LINE:
Linearity
The underlying relationship between X and Y is linear
Independence of Errors
Error values are statistically independent
Normality of Error
Error values (ε) are normally distributed for any given value of X
Equal Variance (Homoscedasticity)
The probability distribution of the errors has constant variance
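A first look at these assumptions usually starts from the residuals of the fitted line. The sketch below uses the house price sample from the slides. The Durbin-Watson statistic is one common check for independence of errors; the slides do not name it, so it is an assumed choice here, and it treats the listed row order as the observation order, which is also an assumption.

```python
# House price sample from the slides: price in $1000s (Y), square feet (X).
sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
prices = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]

# Least-squares estimates of the intercept (b0) and slope (b1).
n = len(sqft)
x_bar = sum(sqft) / n
y_bar = sum(prices) / n
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(sqft, prices))
sxx = sum((x - x_bar) ** 2 for x in sqft)
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Residual e_i = observed Y_i minus predicted Y_i.
residuals = [y - (b0 + b1 * x) for x, y in zip(sqft, prices)]

# For a least-squares line with an intercept, residuals average to zero.
residual_mean = sum(residuals) / n

# Durbin-Watson statistic: values near 2 suggest no first-order
# autocorrelation; it always lies between 0 and 4.
durbin_watson = sum(
    (residuals[i] - residuals[i - 1]) ** 2 for i in range(1, n)
) / sum(e ** 2 for e in residuals)

print(f"mean residual: {residual_mean:.6f}")
print(f"Durbin-Watson: {durbin_watson:.3f}")
```

Plotting the residuals against X is the usual next step for judging linearity and equal variance; with only ten observations, any such check is indicative rather than conclusive.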
Department of Statistics, ITS Surabaya Slide-18
Pitfalls of Regression Analysis
Lacking an awareness of the assumptions underlying least-squares regression
Not knowing how to evaluate the assumptions
Not knowing the alternatives to least-squares regression if a particular assumption is violated
Using a regression model without knowledge of the subject matter
Extrapolating outside the relevant range
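The last pitfall can be guarded against in code by checking whether a new X value lies inside the range of X seen in the sample. The sketch below uses the house price sample from the slides; the helper name predict and the choice to return a flag rather than raise an error are hypothetical design choices for illustration.

```python
# House price sample from the slides: price in $1000s (Y), square feet (X).
sqft = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
prices = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]

# Least-squares estimates of the intercept (b0) and slope (b1).
n = len(sqft)
x_bar = sum(sqft) / n
y_bar = sum(prices) / n
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(sqft, prices)) / sum(
    (x - x_bar) ** 2 for x in sqft
)
b0 = y_bar - b1 * x_bar

# Relevant range: the smallest and largest X observed in the sample.
X_MIN, X_MAX = min(sqft), max(sqft)


def predict(x):
    """Return (predicted price in $1000s, True if x is inside the relevant range)."""
    in_range = X_MIN <= x <= X_MAX
    return b0 + b1 * x, in_range


for x in (2000, 5000):
    y_hat, ok = predict(x)
    note = "" if ok else " (extrapolation: outside the observed X range)"
    print(f"{x} sq ft -> {y_hat:.2f}{note}")
```

A 2000 square foot house falls inside the observed range of 1100 to 2450 square feet, so its prediction is an interpolation; the 5000 square foot prediction is flagged because the sample gives no evidence that the same line holds that far out.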
Slide-19
Aravali College of Engineering And Management
Jasana, Tigoan Road, Neharpar, Faridabad, Delhi NCR
Toll Free Number : 91- 8527538785
Website : www.acem.edu.in
09/10/2020 20