Statistics
Lecture 10
regression analysis
Regression - helps to determine the functional relationship between 2 or more variables
- is usually conducted:
• when we want to know whether any relationship between variables actually exists
• when we want to understand the nature of the relationship between variables (i.e. strength, direction)
• when we want to predict a variable given the values of others

Common names for the two variables:

X            Y
Independent  Dependent
Predictor    Response
Explanatory  Explained
Exogenous    Endogenous
Regressor    Regressand

E.g. height depends on weight.

What is the difference between correlation and regression?
Regression vs. correlation:
• regression is able to assess the significance of the independent variable in explaining the variability or behaviour of the dependent variable; correlation is not
• regression is able to predict and optimize a variable given the values of others; correlation is not
• regression is able to show cause and effect; correlation shows only a relationship
• in regression the data are represented by a line; in correlation by a single point (one summary number)
regression analysis
Linear regression models
• the most important and most widely used class of regression models in applications
• attempt to explain the relationship between two or more variables using a straight line
• can often lead to very useful results
• even if the relationship between the dependent variable and one or more of the independent variables is nonlinear and described by a nonlinear model, we can change it to a linear model by using an appropriate transformation.
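For example (a standard textbook case, not specific to these slides): the exponential model Y = α·e^(βX) is nonlinear in X, but taking logarithms of both sides gives ln Y = ln α + βX, a linear model with intercept ln α and slope β, so linear regression can be run on (X, ln Y).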
If we have only one independent variable, the model is called "simple" - e.g. dependent: sales; independent: advertising.
If we have many independent variables, the model is called "multiple" - e.g. dependent: sales; independent: advertising, size of the shop, number of shops, number of customers.
Looking for the best solution (model) we usually start from many independent variables
- from multiple linear regression...
...but the multiple linear regression approach is based on the simple linear regression approach...
...so we limit our attention to this class of regression models: simple linear regression.
regression analysis
y = a + bx   (algebra)
Simple linear regression - is a statistical model used to study the relationship
between 𝒚 and 𝒙 if they are related LINEARLY.
The population simple linear regression model
- the type I regression model (for the population):

Y = α₀ + α₁X + ε   (1)

E.g. data on advertising (X) and sales (Y).

where:
Y - the dependent variable, the variable we wish to explain or predict
X - the independent variable
ε - the error term, the only random component in the model
α₀ - the population intercept of the line
α₁ - the population slope of the line
The line α₀ + α₁X is the nonrandom component of the model.

[Figure: scatterplot of advertising (x) vs sales (y) with the population regression line Y = α₀ + α₁X; α₀ is the intercept, α₁ the slope.]

Regression
• a parametric approach
'Parametric' means we estimate the parameters of the regression equation, such as the slope and intercept, and we have to make assumptions about the data for the purpose of analysis.
• is restrictive in nature - NO good results with datasets which do not fulfill the regression assumptions (check with a scatterplot and statistical tests)
Aczel A. D., Sounderpandian J., (2017, 2005), Statystyka w zarządzaniu (Complete Business Statistics)
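As a minimal sketch of what model (1) says (Python with numpy; all numbers here are invented for illustration), the data are a fixed nonrandom line plus random noise:

```python
import numpy as np

rng = np.random.default_rng(0)

alpha0, alpha1 = 2.0, 0.5          # assumed population intercept and slope
X = rng.uniform(0, 100, size=50)   # independent variable, e.g. advertising
eps = rng.normal(0, 3, size=50)    # error term - the only random component

# model (1): nonrandom line alpha0 + alpha1*X plus the random error
Y = alpha0 + alpha1 * X + eps
```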
regression analysis
Model assumptions
1. The true relationship between the dependent variable and the independent variable(s) is linear -
the change in the dependent variable due to a one-unit change in an independent variable is constant, regardless of the value of that independent variable.
(If we fit a linear model to a nonlinear data set, the regression cannot capture all the structure in the data - this leads to an inefficient model and wrong predictions.)
How to check: we can look at residuals vs fitted values plots.
[Figure: a scatterplot showing the perfect setup for a linear regression.]
https://itfeature.com/time-series-analysis-and-forecasting/autocorrelation/residuals-plot-for-detection-of-autocorrelation
https://www.andrew.cmu.edu/user/achoulde/94842/homework/regression_diagnostics.html https://data.library.virginia.edu/diagnostic-plots/ https://data.library.virginia.edu/normality-assumption/
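A minimal sketch of this check (Python with statsmodels and matplotlib; the data set is invented): fit the line and plot residuals against fitted values.

```python
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 100)
y = 3 + 2 * x + rng.normal(0, 1, 100)      # hypothetical linear data

fit = sm.OLS(y, sm.add_constant(x)).fit()  # estimate y = a0 + a1*x

# A shapeless cloud around 0 supports linearity;
# a curved pattern suggests the true relationship is nonlinear.
plt.scatter(fit.fittedvalues, fit.resid)
plt.axhline(0, color="red")
plt.xlabel("Fitted values")
plt.ylabel("Residuals")
plt.show()
```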
regression analysis
Model assumptions
2. The observations are independent of one another → NO autocorrelation = no correlation between the residual (error) terms.
This is mainly an issue with time series data, i.e. data with a natural time ordering (the next instant depends on the previous instant).
Autocorrelation - observations are sequentially correlated.
Autocorrelation strongly reduces the model's accuracy.
How to check: we can use the Durbin-Watson (DW) test, and we can look at residuals vs time plots.
[Figure: five residuals vs time plots; four show some pattern (autocorrelation present), one shows no pattern (no autocorrelation). ONLY a random pattern of residuals indicates the absence of autocorrelation.]
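A minimal sketch of the Durbin-Watson check (Python with statsmodels; the series is simulated with AR(1) errors). DW is close to 2 when there is no autocorrelation; values toward 0 indicate positive, toward 4 negative, autocorrelation.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(2)
t = np.arange(100.0)

e = np.zeros(100)
for i in range(1, 100):
    e[i] = 0.8 * e[i - 1] + rng.normal()   # sequentially correlated errors

y = 5 + 0.3 * t + e
resid = sm.OLS(y, sm.add_constant(t)).fit().resid
print(durbin_watson(resid))   # well below 2 here -> positive autocorrelation
```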
regression analysis
Model assumptions
3. The independent variables should not be correlated → NO multicollinearity.
In a model with correlated variables it becomes difficult to find out which variable actually contributes to predicting the dependent variable.
How to check: we can use a scatterplot matrix, the variance inflation factor (VIF), or a correlation table.
[Figure: Scatter Plot Matrix. The variables are: Age (years), Weight (kg), Oxygen intake rate (ml per kg body weight per min.), RunTime - time to run 1.5 miles (minutes).]
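A minimal sketch of the VIF check (Python with statsmodels; the predictors are invented, with weight deliberately built from age so the collinearity shows up). A common rule of thumb treats VIF values above roughly 5-10 as problematic.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(3)
age = rng.uniform(20, 60, 50)
weight = 0.5 * age + rng.normal(0, 2, 50)   # correlated with age on purpose
runtime = rng.uniform(8, 14, 50)

X = sm.add_constant(np.column_stack([age, weight, runtime]))
# VIF for each predictor (column 0 is the constant, so start at 1)
for i, name in enumerate(["age", "weight", "runtime"], start=1):
    print(name, variance_inflation_factor(X, i))   # high for age and weight
```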
regression analysis
Model assumptions
4. The error terms must have constant variance - homoskedasticity → NO heteroskedasticity (non-constant variance, which often arises in the presence of outliers).
How to check: we can look at the residuals vs fitted values plot, and we can use the Breusch-Pagan (Cook-Weisberg) test or White's general test.
If heteroskedasticity exists, the residuals vs fitted values plot shows a funnel shape pattern.
[Figure: residuals vs fitted values plot with a funnel shape - non-constant variance arising in the presence of outliers.]
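A minimal sketch of the Breusch-Pagan test (Python with statsmodels; data invented so that the error spread grows with x). A small p-value rejects homoskedasticity.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(4)
x = rng.uniform(1, 10, 200)
y = 2 + 0.5 * x + rng.normal(0, 0.3 * x)   # error variance grows with x

X = sm.add_constant(x)
resid = sm.OLS(y, X).fit().resid
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(resid, X)
print(lm_pvalue)   # small -> evidence of heteroskedasticity (funnel shape)
```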
regression analysis
Model assumptions
5. The error terms must be normally distributed: ε ~ N(0, σ²)
How to check: we can look at a Q-Q plot, and we can use the Kolmogorov-Smirnov test or the Shapiro-Wilk test.
Normal Q-Q Plot (quantile-quantile)
A Q-Q plot helps validate the assumption of a normal distribution in a data set:
- If the data come from a normal distribution, the plot shows a fairly straight line.
- Absence of normality in the errors shows up as deviations from the straight line.
https://tobeneo.files.wordpress.com/2013/12/plot.jpg
https://www.analyticsvidhya.com/blog/2016/07/deeper-regression-analysis-assumptions-plots-solutions/
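A minimal sketch of these normality checks (Python with scipy and statsmodels; here the "residuals" are just simulated normal draws):

```python
import numpy as np
from scipy import stats
import statsmodels.api as sm
import matplotlib.pyplot as plt

rng = np.random.default_rng(5)
resid = rng.normal(0, 1, 100)   # stand-in for regression residuals

# Shapiro-Wilk: a small p-value rejects normality
print(stats.shapiro(resid))

# Kolmogorov-Smirnov against the standard normal
print(stats.kstest(resid, "norm"))

# Q-Q plot: points near the reference line support normality
sm.qqplot(resid, line="45")
plt.show()
```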
regression analysis
Regression Plots

Scatter plot - Residuals vs Fitted Values (Predicted Values)

We can check linearity, autocorrelation, multicollinearity, homoskedasticity:
• is there any pattern in this plot?
• is there any funnel shape (non-constant variance, which arises in the presence of outliers)?
• If the residuals show problems with heteroscedasticity or non-normality, we could try transforming the raw data.
• If we don't meet the linearity assumption, we can check whether we can do logistic regression instead.
Even if the assumptions are not met, we could still use the model to draw conclusions - but
about the sample, NOT the POPULATION.
To generalize the model to the POPULATION, the assumptions must be met.
regression analysis
Simple linear regression
The population simple linear regression model
- the type I regression model (for the population):

Y = α₀ + α₁X + ε

The estimated regression equation
- the type II regression model:

y = a₀ + a₁x + e   (1)

For each particular data point:

yᵢ = a₀ + a₁xᵢ + eᵢ   (2),   where i = 1, 2, …, n

where:
a₀ estimates α₀
a₁ estimates α₁
e - the observed errors, the residuals from fitting the line a₀ + a₁x to the data set of n points
eᵢ = yᵢ − ŷᵢ; e₁ = y₁ − ŷ₁ is the first residual, the distance from the 1st data point to the fitted regression line; eₙ is the n-th error

The errors eᵢ are viewed as estimates of the true population errors εᵢ.

The regression line:

ŷ = a₀ + a₁x   (3)

where:
ŷ - the predicted value of y, i.e. the y value lying on the fitted regression line for a given x

[Figure: scatterplot of advertising (x) vs sales (y) with the fitted regression line ŷ = a₀ + a₁x.]
https://www.jmp.com/en_us/statistics-knowledge-portal/what-is-regression/the-method-of-least-squares.html
Aczel A. D., Sounderpandian J., (2017), Statystyka w zarządzaniu (Complete Business Statistics)
regression analysis
Estimates of the unknown population parameters α₀ and α₁ are obtained by the method of least squares.

E.g. data on advertising and sales:

[Figure: scatterplot of the raw data with several different lines passing through the dataset; some lines give very large errors; the least-squares line ŷ = a₀ + a₁x is the one that minimizes the sum of the squared errors (SSE).]

Some errors are positive and others are negative, so if we want to minimize all the errors, we should minimize the sum of the squared errors (SSE).
regression analysis
Simple linear regression - we have to find the least-squares line: the line that minimizes SSE, the sum of the squared errors.

The regression function of y with respect to x in the random sample:
the estimated regression equation: y = a₀ + a₁x + e
the regression line: ŷ = a₀ + a₁x

Least-squares regression estimators:

the slope:   a₁ = SS_xy / SS_x = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)²   (1)   (sums over i = 1, …, n; the formula for detailed data)

equivalently:   a₁ = r · S_y / S_x   (2),   where r is Pearson's correlation coefficient

the intercept:   a₀ = ȳ − a₁x̄

a₁ indicates the average change of the y variable when the x value increases by 1.
a₀ informs what would be the hypothetical value of y when x = 0
(that is only the mathematical interpretation and often it doesn't make any economic sense).

! r and a₁ always have the same sign
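A minimal sketch of formulas (1) and (2) in code (Python with numpy; x and y here are just the first six rows of the worked example on the next slide):

```python
import numpy as np

def least_squares(x, y):
    """Slope by formula (1), intercept a0 = ybar - a1*xbar."""
    xbar, ybar = x.mean(), y.mean()
    a1 = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)  # (1)
    a0 = ybar - a1 * xbar
    return a0, a1

x = np.array([0.0, 20, 40, 60, 80, 100])
y = np.array([18.0, 20.3, 20.5, 20.4, 21.2, 21.7])
a0, a1 = least_squares(x, y)

# formula (2) gives the same slope: a1 = r * Sy / Sx
r = np.corrcoef(x, y)[0, 1]
assert np.isclose(a1, r * y.std(ddof=1) / x.std(ddof=1))
print(a0, a1)
```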
regression analysis
Simple linear regression - Excel LINEST function
E.g. data on sales and cost of sales. We want to estimate the regression equation (parameters a₁, a₀).
x = sales ($), y = cost ($)

x      y       xᵢ−x̄    yᵢ−ȳ    (xᵢ−x̄)(yᵢ−ȳ)   (xᵢ−x̄)²   (yᵢ−ȳ)²
0      18.0    −90     −2.91       261.9        8100      8.4681
20     20.3    −70     −0.61        42.7        4900      0.3721
40     20.5    −50     −0.41        20.5        2500      0.1681
60     20.4    −30     −0.51        15.3         900      0.2601
80     21.2    −10      0.29        −2.9         100      0.0841
100    21.7     10      0.79         7.9         100      0.6241
120    21.3     30      0.39        11.7         900      0.1521
140    21.6     50      0.69        34.5        2500      0.4761
160    22.2     70      1.29        90.3        4900      1.6641
180    21.9     90      0.99        89.1        8100      0.9801
Σ 900  209.1                       571.0       33000     13.249

x̄ = 900/10 = 90
ȳ = 209.1/10 = 20.91
a₁ = SS_xy / SS_x = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)² = 571 / 33000 ≈ 0.0173

a₀ = ȳ − a₁x̄ = 20.91 − 0.0173 · 90 ≈ 19.35

ŷ = 19.35 + 0.0173x
The average increase in cost (y) when sales (x) increase by 1 ($) equals about 0.017 ($).
The hypothetical value of cost (y) when sales (x) = 0 equals about 19.35 ($).
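The result is easy to verify (a sketch in Python with scipy; Excel's LINEST(y_range, x_range) returns the same slope and intercept):

```python
from scipy.stats import linregress

x = [0, 20, 40, 60, 80, 100, 120, 140, 160, 180]                  # sales
y = [18.0, 20.3, 20.5, 20.4, 21.2, 21.7, 21.3, 21.6, 22.2, 21.9]  # cost

res = linregress(x, y)
print(res.slope, res.intercept)   # ~0.0173 and ~19.35
```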