Introduction to Regression
In this session
• Introduction to Regression.
– What is Regression?
– Why do we need Regression?
– Different types of Regression Models
– How to create a Regression model?
Simple Regression
Multiple Regression
• Nonlinear regression.
Y = β0 + 1 / (β1 + β2X1) + β3X2 + ε
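A model like the one above is nonlinear in its parameters (β1 and β2 appear inside a reciprocal), so it cannot be fit by ordinary least squares directly. A minimal sketch of fitting it with `scipy.optimize.curve_fit` follows; the exact model form is my reading of the slide, and all data and coefficient values are made up for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit

# Assumed reading of the slide's model:
#   Y = b0 + 1 / (b1 + b2*X1) + b3*X2 + error
def model(X, b0, b1, b2, b3):
    x1, x2 = X
    return b0 + 1.0 / (b1 + b2 * x1) + b3 * x2

# Simulated data with known (illustrative) coefficients
rng = np.random.default_rng(0)
x1 = rng.uniform(1, 5, 200)
x2 = rng.uniform(0, 2, 200)
true = (2.0, 1.0, 0.5, 3.0)
y = model((x1, x2), *true) + rng.normal(0, 0.05, 200)

# Nonlinear least squares needs a starting guess (p0)
est, _ = curve_fit(model, (x1, x2), y, p0=(1, 1, 1, 1))
```

The need for a starting guess `p0`, and the possibility of non-convergence, is one practical difference between nonlinear and linear regression.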
Linear Regression
Y = β0 + β1x1 + β2x2 + ... + βkxk + ε

Y = β0 + β1x1 + β2x2 + β3x1x2 + β4x2² + ... + βkxk + ε
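Note that the second model, despite the interaction term x1x2 and the squared term x2², is still linear in the β's, so OLS applies once those terms are added as columns of the design matrix. A minimal sketch with simulated data and made-up coefficients:

```python
import numpy as np

# Multiple regression with an interaction (x1*x2) and a quadratic (x2^2)
# term, fit by OLS. The model stays linear in the parameters.
rng = np.random.default_rng(1)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
beta = np.array([1.0, 2.0, -1.0, 0.5, 0.25])      # b0, b1, b2, b3, b4

# Design matrix: intercept, x1, x2, interaction, quadratic
X = np.column_stack([np.ones(n), x1, x2, x1 * x2, x2**2])
y = X @ beta + rng.normal(0, 0.1, n)

# Least-squares solution
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
```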
[Flowchart: revise the model until it satisfies the diagnostic tests (NO → revise; YES → stop)]
Regression Functional Form
1. Hypothesize the Deterministic Component

Yi = β0 + β1Xi + εi

where Y is the dependent (response) variable (e.g., income) and X is the independent (explanatory) variable (e.g., education).
Deterministic Component in Regression
General form of Regression Models
Interpreting a Scatter plot
Linear Regression Model Assumptions
• The regression model is linear in parameters.
• The explanatory variable X is assumed to be non-stochastic.
• Given the value of X (say Xi), the mean of the random error term
εi is zero.
• The error term, εi, follows a normal distribution.
• Given the value of X, the variance of εi is constant
(Homoscedasticity).
• There is no autocorrelation between any two error terms εi and εj (i ≠ j).
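The assumptions above can be checked empirically from the residuals of a fitted model. A minimal sketch (illustrative and far from exhaustive, using simulated data that satisfies the assumptions by construction):

```python
import numpy as np

# Simulated data satisfying the assumptions: linear in parameters,
# errors with zero mean and constant variance
rng = np.random.default_rng(2)
x = rng.uniform(0, 10, 1000)
y = 3.0 + 2.0 * x + rng.normal(0, 1.0, 1000)

# Fit simple linear regression by OLS
b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0 = y.mean() - b1 * x.mean()
resid = y - (b0 + b1 * x)

# Zero-mean check: with an intercept, OLS residuals average to ~0.
# Homoscedasticity check: residual spread similar in both halves of x.
lo, hi = resid[x < 5].std(), resid[x >= 5].std()
```

More formal diagnostics (e.g., normality or autocorrelation tests) exist in dedicated statistics libraries; the point here is only that each assumption translates into a property of the residuals.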
Assumptions Continued…
[Figure: observations scattered around the unknown population relationship Yi = β0 + β1Xi + εi]
Population Linear Regression Model
Yi = β0 + β1Xi + εi, where εi is the random error.

[Figure: observed values scattered about the population regression line E(Y|Xi) = β0 + β1Xi]
What is the best fit?
How would you draw a line through the points?
How do you determine which line ‘fits best’?
[Scatter plot: Y against X, both axes 0–60]
Method of Ordinary Least Squares (OLS)
Least Squares Graphically
LS minimizes ∑i=1..n ε̂i² = ε̂1² + ε̂2² + ε̂3² + ε̂4²

[Figure: the residuals ε̂1, …, ε̂4 are the vertical distances from the observed points (e.g., Y2 = β̂0 + β̂1X2 + ε̂2) to the fitted line Ŷi = β̂0 + β̂1Xi]
Estimation of Parameters in Regression
SSE = ∑i εi² = ∑i ( yi − β0 − ∑j βj xij )², summing i from 1 to n and j from 1 to k.
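The OLS estimate is the β that minimizes this SSE; with a full-rank design matrix it is given by the normal equations, β̂ = (XᵀX)⁻¹Xᵀy. A minimal sketch verifying the minimizing property on simulated data (all values illustrative):

```python
import numpy as np

# Simulated multiple-regression data with made-up coefficients
rng = np.random.default_rng(3)
n, k = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
beta = np.array([1.0, -2.0, 0.5, 3.0])
y = X @ beta + rng.normal(0, 0.5, n)

def sse(b):
    """Sum of squared errors for coefficient vector b."""
    e = y - X @ b
    return e @ e

# Normal equations: beta_hat minimizes sse() over all b
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
```

Any other coefficient vector, including the true β (because of the noise in y), yields a larger SSE than β̂.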
Regression Coefficient (β1) in SLR
β̂1 = ∑(xi − x̄)(yi − ȳ) / ∑(xi − x̄)² = Cov(X, Y) / Var(X)

Equivalently,

β̂1 = r × SY / SX

where r is the correlation coefficient between X and Y, SY is the standard deviation of Y, and SX is the standard deviation of X.
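The two expressions are algebraically identical, since r = Cov(X, Y)/(SX·SY). A minimal numeric check on simulated data:

```python
import numpy as np

# Verify that Cov(X,Y)/Var(X) equals r * S_Y / S_X
rng = np.random.default_rng(4)
x = rng.normal(size=300)
y = 1.5 * x + rng.normal(0, 1, 300)

# Covariance form of the slope estimate
cov_form = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)

# Correlation form of the slope estimate
r = np.corrcoef(x, y)[0, 1]
corr_form = r * y.std(ddof=1) / x.std(ddof=1)
```

Note that the same ddof must be used throughout (here the sample convention, ddof=1) for the two forms to match exactly.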
Why Least Squares Estimates?
• OLS beta estimates are "Best Linear Unbiased Estimates" (BLUE), provided the error terms are uncorrelated (no autocorrelation) and have equal variance (homoscedasticity). That is,

E[β̂ − β] = 0
Advantages of OLS Estimates
• They are unbiased estimates.
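Unbiasedness means that the estimator is right on average over repeated samples, not in any single sample. A minimal Monte Carlo sketch illustrating this (all parameter values are made up; X is held fixed across replications, matching the non-stochastic-X assumption above):

```python
import numpy as np

# Monte Carlo illustration of unbiasedness: average beta1_hat over
# many simulated samples is close to the true beta1.
rng = np.random.default_rng(5)
true_b0, true_b1 = 2.0, 1.5
x = rng.uniform(0, 10, 50)            # fixed (non-stochastic) X

estimates = []
for _ in range(2000):
    # New error draw each replication; same X
    y = true_b0 + true_b1 * x + rng.normal(0, 1.0, 50)
    b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
    estimates.append(b1)

mean_b1 = np.mean(estimates)          # should be close to true_b1
```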