Linear Regression
The Practice of Statistics in the Life Sciences
Objectives
Regression
What is Linear Regression
The least-squares regression line
Finding the least-squares regression line
The coefficient of determination, r²
Outliers and influential observations
Making predictions
Association does not imply causation
What is Linear Regression
Linear regression is a linear model, i.e. a model that assumes a
linear relationship between the input variables (x) and the single
output variable (y). More specifically, it assumes that y can be
calculated from a linear combination of the input variables (x).
When there is a single input variable (x), the method is referred to
as simple linear regression. When there are multiple input
variables, literature from statistics often refers to the method as
multiple linear regression.
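A minimal sketch of simple linear regression in Python, assuming made-up data (the x and y values below are purely illustrative, not from the slides):

```python
import numpy as np

# Illustrative data: one input variable x, one output variable y
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 5.9, 8.2, 9.8])

# np.polyfit with degree 1 fits the least-squares line and returns
# the coefficients highest power first: (slope, intercept)
slope, intercept = np.polyfit(x, y, 1)
y_hat = intercept + slope * x  # predicted values on the fitted line
# slope ≈ 1.93, intercept ≈ 0.27
```

With several input variables, the same idea extends to multiple linear regression: y is modeled as a linear combination of all the inputs.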
The least-squares regression line
The least-squares regression line is the unique line such that the sum
of the vertical distances between the data points and the line is zero,
and the sum of the squared vertical distances is the smallest possible.
Notation
ŷ is the predicted y value on the regression line
ŷ = a + bx, where a is the intercept and b is the slope
(the line may have slope < 0, slope = 0, or slope > 0)
Not all calculators/software use this convention. Other notations include:
ŷ = ax + b
ŷ = b0 + b1x
ŷ = variable_name · x + constant
Interpretation
The slope of the regression line describes how much we expect y to
change, on average, for every unit change in x.
The intercept is a necessary mathematical descriptor of the
regression line. It does not describe a specific property of the data.
Finding the least-squares regression line
The slope of the regression line, b, equals: b = r (sy / sx)
r is the correlation coefficient between x and y
sy is the standard deviation of the response variable y
sx is the standard deviation of the explanatory variable x
The intercept, a, equals: a = y̅ − b x̅
x̅ and y̅ are the respective means of the x and y variables
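These formulas can be checked directly in Python. A sketch with made-up data (the x and y values are illustrative only):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # explanatory variable
y = np.array([2.0, 3.0, 5.0, 4.0, 6.0])  # response variable

r = np.corrcoef(x, y)[0, 1]                    # correlation coefficient
b = r * np.std(y, ddof=1) / np.std(x, ddof=1)  # slope: b = r * sy / sx
a = np.mean(y) - b * np.mean(x)                # intercept: a = y̅ - b x̅

# The least-squares line always passes through the point (x̅, y̅):
assert abs((a + b * np.mean(x)) - np.mean(y)) < 1e-12
```

Note the `ddof=1` argument: the slide's sx and sy are sample standard deviations, and the n − 1 denominators cancel in the ratio sy/sx either way.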
Plotting the least-square regression line
Use the regression equation to find the value of y for two distinct values
of x, and draw the line that goes through those two points.
Hint: The regression line always passes through the mean of x and y.
The points used for drawing the regression line are derived from the
equation. They are NOT actual points from the data set (except by
pure coincidence).
Least-squares regression is only for linear associations
Don’t compute the regression line until you have confirmed that there is
a linear relationship between x and y.
ALWAYS PLOT THE RAW DATA
These data sets all give a linear regression equation of about
ŷ = 3 + 0.5x. But don't report that until you have plotted the data.
[Four scatterplots, each with ŷ = 3 + 0.5x:]
1) Moderate linear association; regression OK.
2) Obvious nonlinear relationship; regression inappropriate.
3) One extreme outlier, requiring further examination.
4) Only two values for x; a redesign is due here…
The coefficient of determination, r²
r², the coefficient of determination, is the square of the
correlation coefficient.
r² represents the fraction of the variance in y that can be explained
by the regression model.
[Scatterplot illustrating the deviations ŷᵢ − y̅ and yᵢ − y̅.]
r = 0.87, so r² = 0.76. This model explains 76% of individual variations in BAC.
r = −0.3, r² = 0.09, or 9%. The regression model explains not even 10% of the variations in y.
r = −0.7, r² = 0.49, or 49%. The regression model explains nearly half of the variations in y.
r = −0.99, r² = 0.9801, or ~98%. The regression model explains almost all of the variations in y.
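The two views of r² — the squared correlation, and the explained fraction of variance — agree for a least-squares fit. A sketch with made-up data (the x and y values are illustrative only):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 3.0, 5.0, 4.0, 6.0])

r = np.corrcoef(x, y)[0, 1]
r_squared = r ** 2                      # square of the correlation

# Equivalent view: 1 - (residual variation / total variation)
b, a = np.polyfit(x, y, 1)              # slope b, intercept a
y_hat = a + b * x
ss_res = np.sum((y - y_hat) ** 2)       # variation left unexplained
ss_tot = np.sum((y - np.mean(y)) ** 2)  # total variation in y
assert abs(r_squared - (1 - ss_res / ss_tot)) < 1e-12
```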
Outliers and influential points
Outlier: An observation that lies outside the overall pattern.
“Influential individual”: An observation that markedly changes the
regression if removed. This is often an isolated point.
[Scatterplot of the child data with the regression line.]
Child 19 = outlier (large residual): Child 19 is an outlier of the
relationship (it is unusually far from the regression line, vertically).
Child 18 = potential influential individual: Child 18 is isolated from
the rest of the points, and might be an influential point.
Residuals
The vertical distances from each point to the least-squares regression
line are called residuals. The sum of all the residuals is by definition 0.
Outliers have unusually large residuals (in absolute value).
Points above the line have a positive residual (underestimation).
Points below the line have a negative residual (overestimation).
The residual is the vertical distance between the observed y and the
predicted ŷ: residual = y − ŷ
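The residual calculation, and the fact that least-squares residuals sum to zero, can be sketched as follows (the data values are made up for illustration; the point at x = 3 plays the role of an outlier with the largest residual):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 3.0, 5.0, 4.0, 6.0])

slope, intercept = np.polyfit(x, y, 1)
residuals = y - (intercept + slope * x)   # residual = observed - predicted

# Least-squares residuals sum to zero (up to floating-point error)
assert abs(residuals.sum()) < 1e-9

# An outlier of the relationship shows up as an unusually large
# residual in absolute value:
print(np.max(np.abs(residuals)))
```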
[Scatterplots comparing the regression line fitted to all data,
without Child 18, and without Child 19.]
Child 18 changes the regression line substantially when it is removed.
So, Child 18 is indeed an influential point.
Child 19 is an outlier of the relationship, but it is not influential
(the regression line changed very little by its removal).
Making predictions
Use the equation of the least-squares regression to predict y for any
value of x within the range studied.
Prediction outside the range is extrapolation. Avoid extrapolation.
ŷ = 0.0144x + 0.0008
What would we expect for the BAC after drinking 6.5 beers?
ŷ = 0.0144 × 6.5 + 0.0008
ŷ = 0.0936 + 0.0008 = 0.0944 mg/ml
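The BAC prediction above, as a one-line Python function using the slide's fitted equation (the function name is just for illustration):

```python
def predict_bac(beers):
    """Predicted BAC (mg/ml) from number of beers, using yhat = 0.0144x + 0.0008."""
    return 0.0144 * beers + 0.0008

bac = predict_bac(6.5)   # 0.0936 + 0.0008 = 0.0944 mg/ml
```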
Manatee deaths vs. powerboat registrations in Florida:

Powerboats (×1000)   Manatee deaths
447    13
460    21
481    24
498    16
513    24
512    20
526    15
559    34
585    33
614    33
645    39
675    43
711    50
719    47
681    55
679    38
678    35
696    49
713    42
732    60
755    54
809    66
830    82
880    78
944    81
962    95
978    73
983    69
1010   79
1024   92

The least-squares regression line is: ŷ = 0.1301x − 43.7 (R² = 0.9061).
[Scatterplot: manatee deaths vs. powerboats (×1000), with the fitted line.]

If Florida were to limit the number of powerboat registrations to 500,000,
what could we expect for the number of manatee deaths in a year?
ŷ = 0.1301(500) − 43.7 = 65.05 − 43.7 = 21.35
Roughly 21 manatee deaths.
Could we use this regression line to predict the number of manatee
deaths for a year with 200,000 powerboat registrations?
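The manatee prediction, with a guard against extrapolation, can be sketched as below. The function name and range check are illustrative; the x range comes from the data table above (roughly 447 to 1024 thousand registrations):

```python
X_MIN, X_MAX = 447, 1024   # observed range of x (thousands of powerboats)

def predict_deaths(powerboats_thousands):
    """Predicted manatee deaths, using yhat = 0.1301x - 43.7."""
    if not (X_MIN <= powerboats_thousands <= X_MAX):
        raise ValueError("extrapolation: x is outside the range studied")
    return 0.1301 * powerboats_thousands - 43.7

print(round(predict_deaths(500), 2))   # 21.35: roughly 21 deaths
# predict_deaths(200) raises ValueError: 200 (i.e. 200,000 boats) is far
# below the observed range, so the regression line should not be used there.
```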
Association does not imply causation
Association, however strong, does NOT imply causation.
The observed association could have an external cause.
A lurking variable is a variable that is not among the explanatory or
response variables in a study, and yet may influence the relationship
between the variables studied.
We say that two variables are confounded when their effects on a
response variable cannot be distinguished from each other.
In each example, what is most likely the lurking variable? Notice that some
cases are more obvious than others.
Strong positive association between the shoe size and reading skills
in young children.
[Scatterplot: reading index (0–1) vs. shoe size (0–7).]
Strong positive association between the number of firefighters
at a fire site and the amount of damage a fire does.
Negative association between moderate
amounts of wine-drinking and death rates
from heart disease in developed nations.
Establishing causation
Establishing causation from an observed association can be done if:
1) The association is strong.
2) The association is consistent.
3) Higher doses are associated with stronger responses.
4) The alleged cause precedes the effect.
5) The alleged cause is plausible.
Lung cancer is clearly associated with smoking.
What if a genetic mutation (lurking variable) caused
people to both get lung cancer and become addicted to smoking?
It took years of research and accumulated indirect evidence to reach the
conclusion that smoking causes lung cancer.