CH 5
REGRESSION
INTRODUCTION TO REGRESSION
• Regression analysis is the premier method of supervised learning.
• Given a training dataset D containing N training points (xi, yi), where
i = 1...N, regression analysis is used to model the relationship
between one or more independent variables xi and a dependent
variable yi.
• The relationship between the dependent and independent variables
can be represented as a function as follows:
y = f(x)
• A line of the form y = ax + b can be fitted to the data points to
indicate the relationship between x and y.
• Multiple Regression It is a type of regression where a linear model is fitted to describe the
relationship between two or more independent variables and one dependent variable.
• Polynomial Regression It is a type of non-linear regression method for describing relationships among
variables, where an nth-degree polynomial is used to model the relationship between one independent
variable and one dependent variable.
• Polynomial multiple regression is used to model two or more independent variables and one dependent variable.
• Logistic Regression It is used for predicting categorical variables and involves one or more independent
variables and one dependent variable. When the dependent variable is binary, it is also known as a binary classifier.
• Lasso and Ridge Regression Methods These are special variants of regression where regularization
methods are used to limit the number (lasso) and size (ridge) of the coefficients of the independent
variables (see the sketch after this list).
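As a quick illustration of the variants above, the scikit-learn sketch below fits each of them on synthetic data; the data, polynomial degree, and regularization strengths (alpha) are illustrative assumptions, not values from the text.

```python
# A minimal scikit-learn sketch of the regression variants above,
# fitted on synthetic data (data and parameters are illustrative
# assumptions, not from the text).
import numpy as np
from sklearn.linear_model import LinearRegression, Lasso, Ridge, LogisticRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 2))                 # two independent variables
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(0, 1, 100)

multiple = LinearRegression().fit(X, y)               # multiple regression
poly = make_pipeline(PolynomialFeatures(degree=3),    # polynomial regression on
                     LinearRegression()).fit(X[:, :1], y)  # one independent variable
lasso = Lasso(alpha=0.1).fit(X, y)                    # L1: drives some coefficients to 0
ridge = Ridge(alpha=1.0).fit(X, y)                    # L2: shrinks coefficient sizes
logreg = LogisticRegression().fit(X, (y > y.mean()).astype(int))  # binary classifier
```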
Limitations of Regression Method
• Outliers - Outliers are abnormal data points. They can bias the outcome
of the regression model, as outliers pull the regression line towards them.
• Number of cases - The ratio of cases to independent variables
should be at least 20:1. For every explanatory variable,
there should be at least 20 samples. At least five samples per
variable are required in extreme cases.
• Missing data - Missing data in the training dataset can make the model
unfit for the sampled data.
• Multicollinearity - If the explanatory variables are highly correlated (0.9
and above), the regression is vulnerable to bias. Singularity leads to
perfect correlation of 1. The remedy is to remove the explanatory
variables that exhibit correlation above this threshold (a sketch of this
check follows this list). If there is a tie, then the tolerance (1 - R squared)
is used to eliminate the variables that have the greatest value.
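As a hedged illustration of the multicollinearity check, the sketch below flags explanatory-variable pairs whose correlation exceeds the 0.9 threshold mentioned above; the data are synthetic assumptions.

```python
# A minimal sketch of a multicollinearity check, assuming the 0.9
# correlation threshold from the text; the data are synthetic.
import numpy as np

rng = np.random.default_rng(1)
x1 = rng.normal(size=200)
x2 = 0.95 * x1 + 0.05 * rng.normal(size=200)   # nearly collinear with x1
x3 = rng.normal(size=200)
X = np.column_stack([x1, x2, x3])

corr = np.corrcoef(X, rowvar=False)            # pairwise correlation matrix
n = corr.shape[0]
for i in range(n):
    for j in range(i + 1, n):
        if abs(corr[i, j]) > 0.9:              # flag highly correlated pairs
            print(f"x{i+1} and x{j+1} are highly correlated: r = {corr[i, j]:.3f}")
```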
INTRODUCTION TO LINEAR REGRESSION
• In the simplest form, the linear regression model can be created by
fitting a line among the scattered data points. The line is of the form
given below:
y = a0 + a1 * x + e
• Here, a0 is the intercept, which represents the bias, and a1 represents
the slope of the line.
• These are called regression coefficients. e is the error in prediction.
The assumptions of linear regression are listed
as follows:
• The observations (y) are random and are mutually independent.
• The difference between the predicted and true values is called an
error. The errors are also mutually independent, with the same
distribution, such as a normal distribution with zero mean and
constant variance.
• The distribution of the error term is independent of the joint
distribution of the explanatory variables.
• The unknown parameters of the regression model are constants.
• The idea of linear regression is based on the Ordinary Least Squares
(OLS) approach. In this method, the data points are modelled using a
straight line.
• Any arbitrarily drawn line is not an optimal line.
• In other words, OLS is an optimization technique where the sum of the
squared differences between the data points and the line is minimized,
as sketched below.
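A minimal sketch of OLS for the line y = a0 + a1 * x, using the standard closed-form estimates; the sample data are illustrative assumptions, not taken from the text.

```python
# A minimal OLS sketch for y = a0 + a1 * x, using the standard
# closed-form estimates; the sample data are illustrative assumptions.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 1.9, 3.2, 3.8, 5.1])

x_mean, y_mean = x.mean(), y.mean()
a1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)  # slope
a0 = y_mean - a1 * x_mean                                             # intercept

y_pred = a0 + a1 * x
sse = np.sum((y - y_pred) ** 2)   # the sum of squared errors OLS minimizes
print(f"a0 = {a0:.3f}, a1 = {a1:.3f}, SSE = {sse:.3f}")
```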
Linear Regression in Matrix Form
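In matrix form, the model can be written as Y = Xa + e, and the standard OLS estimate is a = (XᵀX)⁻¹XᵀY. The NumPy sketch below illustrates this formulation; the sample data are illustrative assumptions.

```python
# Linear regression in matrix form: Y = X a + e, with the standard
# OLS estimate a = (X^T X)^(-1) X^T Y.  Data here are illustrative.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 1.9, 3.2, 3.8, 5.1])

X = np.column_stack([np.ones_like(x), x])   # first column of 1s for the intercept a0
a = np.linalg.solve(X.T @ X, X.T @ y)       # solves the normal equations
print(f"a0 = {a[0]:.3f}, a1 = {a[1]:.3f}")  # matches the closed-form OLS result
```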
VALIDATION OF REGRESSION METHODS
Coefficient of Determination
• The sum of the squares of the differences between the y value of each
data pair and the average of y is called the total variation, i.e.,
total variation = Σ(yi - ȳ)².
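Building on the total variation just defined, the coefficient of determination is commonly computed as R² = 1 - SSE/SST, where SST is the total variation and SSE is the residual (unexplained) variation; the sketch below assumes this standard definition, with illustrative data.

```python
# A minimal sketch of the coefficient of determination, assuming the
# standard definition R^2 = 1 - SSE/SST; the data are illustrative.
import numpy as np

y      = np.array([1.2, 1.9, 3.2, 3.8, 5.1])   # observed values
y_pred = np.array([1.1, 2.1, 3.0, 4.0, 4.9])   # values predicted by a fitted line

sst = np.sum((y - y.mean()) ** 2)   # total variation about the mean of y
sse = np.sum((y - y_pred) ** 2)     # unexplained (residual) variation
r2 = 1.0 - sse / sst                # fraction of variation explained by the model
print(f"R^2 = {r2:.3f}")
```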