Lec 4

The document discusses regression analysis, focusing on how to explore associations between numerical variables using scatterplots and correlation coefficients. It explains the importance of visualizing data trends, measuring the strength of associations, and modeling linear trends with regression equations. Additionally, it emphasizes the need for careful interpretation of results, including the slope and y-intercept of regression equations, and warns against extrapolation and assuming causation from correlation.

Uploaded by

slenderwather

Regression Analysis:

Exploring Associations
between Variables
Topics

• Explore associations between numerical

variables graphically and numerically
• Model linear trends using a regression line
SECTION 4.1 VISUALIZING
VARIABILITY WITH A SCATTERPLOT

• Use Technology to
Create a Scatterplot
• Use Scatterplots to
Investigate
Associations
Between Numerical
Variables
Visualizing Variability with a Scatterplot

Scatterplot
• The primary tool for examining relationships
between two numerical variables.
• Each point in the scatterplot represents one
observation.
• Usually created using technology such as a
computer software program or a graphing
calculator.
Median Age of Marriage for Women

Each point in the scatterplot represents one U.S. state or the
District of Columbia, and shows the median age of marriage for
women and for men there. Each data point has
the form: (median age for women, median age for men).
Examining Scatterplots

Note three features:

1. Trend (like center)
2. Strength (like spread)
3. Shape
Trend

The general tendency of the scatterplot as

you read from left to right, typical trends:
1. Increasing (uphill), called a positive association
2. Decreasing (downhill), called a negative
association
3. No trend, if there is neither an uphill nor
downhill tendency
Example 1: Positive Trend

This scatterplot shows a positive trend because the graph goes

uphill as you scan from left to right. This means as the age of the
car increases, the mileage also tends to increase.
Example 2: Negative Trend

This scatterplot shows a negative trend because the graph goes

downhill as you scan from left to right. This means as literacy
rate increases, total births per woman tends to decrease.
Example 3: No Trend

This scatterplot shows no trend because the points seem to

follow no predictable pattern. This means that for every age
group we can find relatively fast and relatively slow runners.
Marathon running speed does not seem to be related to age of
runner.
Example 4: Trend, Neither Positive nor
Negative

This data set shows an association between two variables,

but it cannot be characterized as either positive or negative.
Strength of an Association

Scatterplots with large amounts of scatter or

vertical variation indicate a weak association.

Scatterplots with small amounts of scatter or little

vertical variation indicate a strong association.
Example 5: Strength of Association (1 of 2)

Is there a stronger association between height and

weight or between waist size and weight?
There seems to be a stronger association between waist
size and weight (less vertical variation in the graph).
Shape: Linear
Scatterplots that cluster around a line model linear
trends. This scatterplot shows there is a linear
association between volume of searches for the
word “vampire” and the word “zombie.”
Shape: Non-Linear
Sometimes there are trends in data that are non-linear
– trends that are better modeled by a curve rather than
a line. This scatterplot shows there is a non-linear
trend between temperature and pollutant ozone levels.
Writing Descriptions of Associations

When writing a description of an association

between two numerical variables, always
include:
1. Trend
2. Shape
3. Strength
In addition, mention any observations that don’t fit
the general trend (if any).
Example 6: Describing Associations
(1 of 2)

How would you describe the association between median

age of marriage for women and median age of marriage for
men in the 50 states and the District of Columbia?
Example 6: Describing Associations
(2 of 2)

The association between median age of marriage for women and

the median age of marriage for men is positive and linear. In
other words, women who marry at an older age tend to marry
men who are an older age. The association is strong because
there is very little vertical variation in the graph.
Be Careful Describing Associations

• Always use a phrase like “tends to” when

describing an association because the trend you
are describing has variability – the association
you are describing may not be true for all
individuals.
• Always point out any data points that appear to
be unusual or not part of the general pattern.
SECTION 4.2 MEASURING
STRENGTH OF ASSOCIATION
WITH CORRELATION
• Find and Interpret
the Correlation
Coefficient
Correlation Coefficient

• A number that measures the strength of a linear

relationship
• Symbol: r
• Always between −1 and +1
• r values close to −1 or +1 indicate a strong linear
association
• r values close to 0 indicate a weak linear association
r Values of 1 and −1

Correlation coefficients of 1 and −1 indicate perfect

positive and perfect negative associations. The
data points lie exactly on a line.
Visualizing the Correlation Coefficient

Notice that as r gets closer to −1 or +1, there is less vertical variation in the data (the
trend is stronger).
Computing the Correlation Coefficient

Background:
Each x-value and each y-value is converted to a z-score. The two
z-scores for each observation are multiplied together, these
products are added, and the resulting sum is divided by n − 1.
In practice: The correlation coefficient is found
using technology.
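The background steps above can be sketched in a few lines of Python. This is only an illustration of the z-score recipe, not the routine that any particular statistics package uses. The helper names (mean, sample_sd, correlation) are made up for this sketch, and the data are the six heights and weights from Example 7.

```python
from math import sqrt

def mean(values):
    return sum(values) / len(values)

def sample_sd(values):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def correlation(xs, ys):
    """Correlation coefficient r computed from z-scores."""
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    sx, sy = sample_sd(xs), sample_sd(ys)
    # Convert each value to a z-score, multiply the pair, and add the products.
    total = sum(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys))
    # Divide the sum of products by n - 1.
    return total / (n - 1)

# Heights (inches) and weights (pounds) from Example 7.
heights = [61, 62, 63, 64, 66, 68]
weights = [104, 110, 141, 125, 170, 160]

r = correlation(heights, weights)
print(round(r, 3))  # 0.881
```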
Example 7
The table below shows the heights and weights for
6 women. Compute and interpret r, the correlation
coefficient.

Height (in)   61   62   63   64   66   68
Weight (lb)  104  110  141  125  170  160
StatCrunch Output (1 of 2)
• Simple linear regression results:
Dependent Variable: Weight
Independent Variable: Height
Weight = −442.88235 + 9.0294118 Height
Sample size: 6
R ( correlation coefficient ) = 0.88093363
R-sq = 0.77604407
Page 1 of the output has a lot of information, but
we can see r = 0.881. Since r is close to 1, we
would say there is a strong linear association
between height and weight.
StatCrunch Output (2 of 2)

Page 2 provides a graph of the data, including a

graph of the line that best fits the data.
Notes About the Correlation Coefficient

• Changing the order of the variables does not

change r.
• Adding a constant to every value of a variable, or multiplying
every value by a positive constant, does not affect r.
• r is unitless.
• r is only useful to measure a linear trend –
always graph your data first before computing r
to make sure the association is linear!
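These notes can be checked numerically. The sketch below reuses the height and weight data from Example 7. The 10-pound shift, the inch-to-centimeter conversion, and the small curved data set are hypothetical values chosen only for illustration. The correlation function here uses the sums-of-squares form, which is algebraically the same as the z-score recipe.

```python
from math import sqrt

def correlation(xs, ys):
    """Correlation coefficient r from sums of squares and cross-products."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

heights_in = [61, 62, 63, 64, 66, 68]
weights_lb = [104, 110, 141, 125, 170, 160]

r = correlation(heights_in, weights_lb)

# Changing the order of the variables does not change r.
r_swapped = correlation(weights_lb, heights_in)

# Multiplying by a positive constant (inches to centimeters) does not change r.
heights_cm = [2.54 * h for h in heights_in]
r_cm = correlation(heights_cm, weights_lb)

# Adding a constant (hypothetical 10 lb shift) does not change r.
weights_shifted = [w + 10 for w in weights_lb]
r_shifted = correlation(heights_in, weights_shifted)

# Hypothetical curved data: y depends perfectly on x, yet r is 0,
# because r only measures linear association.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]
r_curved = correlation(xs, ys)

print(round(r, 3), round(r_swapped, 3), round(r_cm, 3), round(r_shifted, 3))
print(round(abs(r_curved), 3))
```

The last value is why the scatterplot comes first: a strong curved pattern can produce an r near 0.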
SECTION 4.3 MODELING LINEAR
TRENDS

• Use Technology to
Write the
Regression Equation
• Use the Regression
Equation to Make
Appropriate
Predictions
Regression Line

• A tool for making predictions about future

observed values
• Has the form y = a + bx, where a is the y-intercept
and b is the slope

• Usually generated using appropriate technology

Example 8: Regression Equation

The scatterplot shows a fairly strong positive linear trend.

The regression equation has a slope of 2.16 and a
y-intercept of 30.46. The positive trend indicates that players
who hit more home runs tend to have more RBIs.
Example 9: Using the Regression
Equation (1 of 2)

The scatterplot shows a negative linear trend. As age of car

increases, value tends to decrease. The regression
equation is: predicted value = 21,375 − 1215 age
Example 9: Using the Regression
Equation (2 of 2)

predicted value = 21,375 − 1215 age

Use the regression equation to predict the value of
a car that is 12 years old.

predicted value = 21,375 − 1215 age

predicted value = 21,375 − 1215 (12)
predicted value = $6795
Finding the Regression Equation

• To find the regression equation using

technology, follow the same steps as for finding
the correlation coefficient.
Example 10
The table below shows the heights and weights for
six women. Find the regression equation that
describes the relationship between height and
weight.
Height (in)   61   62   63   64   66   68
Weight (lb)  104  110  141  125  170  160

Note: We previously determined that this data

followed a linear trend, so it is appropriate to find
the regression equation.
StatCrunch Output
• Simple linear regression results:
Dependent Variable: Weight
Independent Variable: Height
Weight = −442.88235 + 9.0294118 Height
Sample size: 6
R ( correlation coefficient ) = 0.88093363
R-sq = 0.77604407
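For readers who want to see where the coefficients come from, here is a minimal sketch of the standard least-squares formulas: the slope is b = Sxy / Sxx (equivalently r times sy / sx) and the intercept is a = ȳ − b·x̄. The function name fit_line is made up for this sketch, and nothing here is claimed to be how StatCrunch computes its output.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Sum of cross-products and sum of squared deviations in x.
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx        # slope
    a = my - b * mx      # intercept: the line passes through (mean x, mean y)
    return a, b

# Heights (inches) and weights (pounds) from Example 10.
heights = [61, 62, 63, 64, 66, 68]
weights = [104, 110, 141, 125, 170, 160]

a, b = fit_line(heights, weights)
print(f"Weight = {a:.5f} + {b:.7f} Height")
```

After rounding, the intercept and slope agree with the equation in the output above.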
Example 11: Using the Regression
Equation

Weight = −442.882 + 9.03 Height

Use the regression equation to predict the weight
of a woman who is 65 inches tall.
Weight = −442.882 + 9.03 Height
Weight = −442.882 + 9.03 ( 65 )
Weight = 144.07 pounds
Notes About the Regression Equation

• Order matters. If x and y are switched, the

regression equation will change.
• We use the x-variable to make predictions about
the y-variable, so the x-variable is called the
explanatory or predictor variable. It is also called
the independent variable.
• The y-variable is the response or predicted
variable. It is also called the dependent
variable.
Example 12

The table below shows the heights and weights for

six women. Find the regression equation that
describes the relationship between height and
weight. This time use weight as the predictor or
explanatory variable (x) and height as the
predicted or response variable (y).
Height (in)   61   62   63   64   66   68
Weight (lb)  104  110  141  125  170  160
Example 13
Simple linear regression results:
Dependent Variable: Height
Independent Variable: Weight
Height = 52.397256 + 0.085946249 Weight
Sample size: 6
R ( correlation coefficient ) = 0.88093363
R-sq = 0.77604407
Note: r (the correlation coefficient) remains the same, but
the regression equation is different from our
previous result.
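The effect of switching x and y can be reproduced with a short sketch. It fits the line both ways on the same six observations and computes r once. The helper names fit_line and correlation are made up for this sketch. It also checks a known identity that the slides do not state: the product of the two slopes equals r squared.

```python
from math import sqrt

def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b

def correlation(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

heights = [61, 62, 63, 64, 66, 68]
weights = [104, 110, 141, 125, 170, 160]

# Height as the predictor (x), weight as the response (y).
a_w, b_w = fit_line(heights, weights)
# Weight as the predictor (x), height as the response (y).
a_h, b_h = fit_line(weights, heights)

r = correlation(heights, weights)

print(f"Weight = {a_w:.3f} + {b_w:.3f} Height")
print(f"Height = {a_h:.3f} + {b_h:.4f} Weight")
# The two slopes differ, but their product equals r squared.
print(round(b_w * b_h, 4), round(r ** 2, 4))
```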
Interpreting the Slope of the
Regression Equation

• Slope tells us how much the y-variable changes

when the x-variable is increased by 1 unit.
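• A slope of 0 means there is no linear
relationship between x and y. The size of a nonzero slope depends on
the units of x and y, so it does not measure the strength of the
relationship.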
Example 14: Interpreting the Slope

Weight = −442.882 + 9.03 Height

The slope of this line is 9.03. The y-variable is
weight and the x-variable is height.
Interpretation:
For every additional inch in height, weight tends to
increase by 9.03 pounds.
Every increase of 1 inch in height is associated
with an increase in weight of 9.03 pounds.
Example 15: Interpreting Slope
In a previous example on the association between
age of car and value of car, the regression
equation was: predicted value = 21,375 − 1215 age
Interpret the slope of the regression equation.
Slope = −1215, x-variable is age, y-variable is value.
Interpretation:
For each additional year of age, value of car tends
to decrease by $1215.
Each additional year of age is associated with a
decrease of $1215 in value.
Interpreting the y-Intercept of the
Regression Equation

• The y-intercept is the predicted value when x is 0.

• The y-intercept is meaningful only if it makes

sense for x to equal 0.
Example 16: Interpreting the y-Intercept
(1 of 2)

In a previous example on the association between

age of car and value of car, the regression
equation was: predicted value = 21,375 − 1215 age

Interpret the y-intercept of the equation, if

appropriate.
y-intercept = 21,375. It is the predicted value when
x (age) is 0. In other words, when the car is new,
its predicted value is $21,375.
Example 16: Interpreting the y-Intercept
(2 of 2)

In a previous example on the association between

height and weight in women, the regression
equation was:
Weight = −442.882 + 9.03 Height

Interpret the y-intercept, if appropriate.

y-intercept = −442.882. It is the predicted value for weight if
x (height) is 0. It is impossible to weigh −442 pounds and it
is impossible for a woman to be 0 inches tall, so in this
case the y-intercept is meaningless.
SECTION 4.4 EVALUATING THE
LINEAR MODEL

• Use Linear Models

to Describe
Associations Only
When Appropriate
• Compute and
Interpret the
Coefficient of
Determination
Cautionary Notes Regarding
Regression
• Don’t use linear models to describe non-linear
associations. Always look at a scatterplot first!
• Correlation is not causation! An association between two
variables is not sufficient evidence to conclude that a
cause-and-effect relationship exists between the
variables.
• Beware of outliers that can have a big effect on r. Always
check the scatterplot for outliers first.
• Don’t extrapolate! Don’t make predictions beyond the
range of the data, because we are not sure that the linear
trend will continue beyond the range of the data.
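The warning about outliers can be illustrated with a small sketch. It takes the six height and weight observations used in the earlier examples and adds one hypothetical seventh point, (70 inches, 100 pounds). That point is invented purely for illustration and is not part of the lecture data.

```python
from math import sqrt

def correlation(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

heights = [61, 62, 63, 64, 66, 68]
weights = [104, 110, 141, 125, 170, 160]

r_without = correlation(heights, weights)

# Hypothetical outlier: a tall woman with an unusually low weight.
heights_out = heights + [70]
weights_out = weights + [100]
r_with = correlation(heights_out, weights_out)

print(round(r_without, 2))  # 0.88
print(round(r_with, 2))     # 0.23
```

In this sketch a single unusual point turns a strong correlation into a weak one, which is why the scatterplot should be checked before r is trusted.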
Example 17: Extrapolation (1 of 2)

In a previous example we found there was a

strong linear relationship between heights and
weights in women, and the regression
equation is Weight = −442.882 + 9.03 Height.

What weight does this equation predict for a

woman who is 36 inches tall?
Example 17: Extrapolation (2 of 2)

Weight = −442.882 + 9.03 Height

Weight = −442.882 + 9.03 ( 36 ) = −117.8 pounds

Note: The range of the data was for women 61 to

68 inches tall. It is not appropriate to use the
regression equation to predict the weight of a
36-inch-tall woman since 36 is beyond the range of
the data (extrapolation).
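One way to guard against accidental extrapolation is to have the prediction code refuse inputs outside the observed range. The sketch below uses the rounded equation from the examples and the 61 to 68 inch range of the data. The function name predict_weight and the choice to raise an error (rather than, say, print a warning) are assumptions made for this sketch.

```python
def predict_weight(height_in, min_height=61, max_height=68):
    """Predict weight (lb) from height (in) with Weight = -442.882 + 9.03 Height.

    Refuses to extrapolate beyond the heights observed in the data.
    """
    if not (min_height <= height_in <= max_height):
        raise ValueError(
            f"height {height_in} in is outside the data range "
            f"{min_height} to {max_height} in; prediction would be extrapolation"
        )
    return -442.882 + 9.03 * height_in

# Inside the range of the data: a sensible prediction.
print(round(predict_weight(65), 2))  # 144.07

# Outside the range of the data: the function refuses to answer.
try:
    predict_weight(36)
except ValueError as err:
    print(err)
```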
Coefficient of Determination: r Squared

• The square of r, the correlation coefficient

• Usually converted to a percentage, so always
between 0% and 100%
• Measures how much variation in the response
variable is explained by the explanatory variable
• The larger r², the smaller the amount of
variation or scatter about the regression line.
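The phrase "variation explained" can be made concrete. For a least-squares line, r² equals 1 minus the ratio of the leftover variation about the line (sum of squared residuals) to the total variation in the response. The sketch below checks this on the height and weight data from the earlier examples. The helper name fit_line is made up for this sketch.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b

heights = [61, 62, 63, 64, 66, 68]
weights = [104, 110, 141, 125, 170, 160]

a, b = fit_line(heights, weights)
mean_w = sum(weights) / len(weights)

# Total variation in weight about its mean.
sst = sum((w - mean_w) ** 2 for w in weights)
# Variation left over about the regression line (squared residuals).
sse = sum((w - (a + b * h)) ** 2 for h, w in zip(heights, weights))

r_squared = 1 - sse / sst
print(f"{r_squared:.1%}")  # 77.6%
```

This agrees with the R-sq value reported in the regression output for these data: height explains about 77.6% of the variation in weight.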
Example 18: r Squared

For the data on car age and predicted value,
r = −0.778. Compute and interpret r².
r² = (−0.778)² = 0.605, so r² = 60.5%.

Car age explains about 60.5% of the variation in

car value.
Section 4.1 Question
The scatterplot shows what type of relationship
between median age of marriage for men and women?

A. A strong positive relationship

B. A weak positive relationship
C. A strong negative relationship
D. A weak negative relationship
Section 4.2 Question 1
There is a negative association between the
percentage of smoke-free homes and the
percentage of high school students who smoke.
This means:
A. As the percentage of smoke-free homes has
increased, the percentage of high school smokers
has also increased.
B. As the percentage of smoke-free homes has
increased, the percentage of high school smokers
has decreased.
C. We cannot predict any trends from the given
information.
Section 4.4 Question
For a certain group of cars, there is a strong association
between city and highway mileage that can be described
by the equation:
Predicted Hwy MPG = 7.79 + 0.95 City MPG
Which of the following is an interpretation of the slope?
A. Each increase of 1 MPG in highway mileage is
associated with an increase of 0.95 in city mileage.
B. Each increase of 1 MPG in city mileage is associated
with an increase of 0.95 in highway mileage.
C. Each increase of 1 MPG in city mileage is associated
with an increase of 7.79 in highway mileage.
D. The slope of this equation is meaningless.
Section 4.2 Question 2

Which of the following correlation coefficient

values indicates the strongest association
between two variables?
A. 0.12
B. 0.42
C. 0.78
D. −0.92
(The strongest association has the r value closest to +1 or −1.)
Section 4.3 Question

When doing a regression analysis on a data

set, which of the following remain the same
no matter which variable is chosen for x and
which is chosen for y?
A. The y-intercept of the regression equation
B. The slope of the regression equation
C. The correlation coefficient
D. All of the above
