
Chapter 12 Notes

Chapter 12 of 'Introductory Statistics 2E' focuses on linear regression and correlation, exploring how to analyze the relationship between two or more numeric variables. It introduces concepts such as scatter plots, the line of best fit, and the correlation coefficient, which measures the strength and direction of the relationship between variables. The chapter emphasizes the importance of interpreting the slope and understanding the coefficient of determination in the context of data analysis.

Uploaded by

meenahershey52


INTRODUCTORY STATISTICS 2E

Chapter 12 LINEAR REGRESSION AND CORRELATION

POWERPOINTS
INTRODUCTION
Professionals often want to know how two or more numeric variables are related.
For example, is there a relationship between the grade on the second math exam
a student takes and the grade on the final exam? If there is a relationship, what is
the relationship and how strong is it?
In another example, your income may be determined by your education, your
profession, your years of experience, and your ability. The amount you pay a
repair person for labor is often determined by an initial amount plus an hourly fee.
The type of data described in the examples is bivariate data — "bi" for two
variables. In reality, statisticians use multivariate data, meaning many variables.
In this chapter, you will be studying the simplest form of regression, "linear
regression" with one independent variable (x). This involves data that fit a line in
two dimensions. You will also study correlation, which measures how strong the
relationship is.
FIGURE 12.1

Linear regression and correlation can help you determine if an auto mechanic’s salary
is related to his work experience. (credit: modification of work “USPS commissions local
repair-shop for some needed work on its older trucks” by Joshua Rothhaas/ Flickr, CC
BY 2.0)
FIGURE 12.2
FIGURE 12.3
FIGURE 12.4

Three possible graphs of y = a + bx.

(a) If b > 0, the line slopes upward to the right.
(b) If b = 0, the line is horizontal.
(c) If b < 0, the line slopes downward to the right.
LINEAR EQUATIONS

Linear regression for two variables is based on a linear equation with one independent variable.

The equation has the form: y = a + bx

a and b are constant numbers.
x is the independent variable.
y is the dependent variable.

Typically, you choose a value to substitute for the independent variable and then solve
for the dependent variable.
LINEAR EQUATIONS

The graph of a linear equation of the form y = a + bx is a straight line.
Any line that is not vertical can be described by this equation.
SLOPE AND Y-INTERCEPT OF A LINEAR EQUATION

For the linear equation y = a + bx, b = slope and a = y-intercept.

From algebra, recall that the slope is a number that describes the
steepness of a line, and the y-intercept is the y-coordinate of the
point (0, a) where the line crosses the y-axis.
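Because two points determine any non-vertical line, the slope and y-intercept can be recovered from two points on it. The sketch below is illustrative only: the helper names `line_through` and `direction` are hypothetical, and the two sample points come from Svetlana's equation y = 25 + 15x used later in these notes. The `direction` helper mirrors the three cases shown in Figure 12.4.

```python
def line_through(p1, p2):
    """Return (a, b) for the line y = a + bx through two points with different x."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("a vertical line cannot be written as y = a + bx")
    b = (y2 - y1) / (x2 - x1)  # slope: rise over run
    a = y1 - b * x1            # y-intercept: the value of y when x = 0
    return a, b


def direction(b):
    """Describe the line's direction from the sign of its slope."""
    if b > 0:
        return "slopes upward to the right"
    if b < 0:
        return "slopes downward to the right"
    return "horizontal"


# Two points on y = 25 + 15x: (0, 25) and (2, 55).
a, b = line_through((0, 25), (2, 55))
print(a, b, direction(b))  # 25.0 15.0 slopes upward to the right
```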
EXAMPLE OF LINEAR EQUATIONS

Aaron's Word Processing Service (AWPS) does word processing. The rate for
services is $32 per hour plus a $31.50 one-time charge. The total cost to a
customer depends on the number of hours it takes to complete the job. Find the
equation that expresses the total cost in terms of the number of hours required
to complete the job.
Let x= the number of hours it takes to get the job done.
Let y= the total cost to the customer.
The $31.50 is a fixed cost.
If it takes x hours to complete the job, then (32)(x) is the cost of the word
processing only.
The total cost is: y= 31.50 + 32x
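The AWPS equation can be evaluated directly: substitute a number of hours for x and solve for y. In this minimal sketch the function name `awps_total_cost` is hypothetical and the job lengths (0, 1 and 3 hours) are arbitrary illustration values.

```python
def awps_total_cost(hours):
    """Total AWPS charge: $31.50 one-time fee (a) plus $32 per hour (b)."""
    fixed_fee = 31.50   # y-intercept a: charged even for a zero-hour job
    hourly_rate = 32.0  # slope b: added for each hour of work
    return fixed_fee + hourly_rate * hours


for hours in (0, 1, 3):
    print(hours, awps_total_cost(hours))
# 0 31.5
# 1 63.5
# 3 127.5
```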
EXAMPLE OF LINEAR EQUATIONS

Svetlana tutors to make extra money for college. For each tutoring session, she charges
a one-time fee of $25 plus $15 per hour of tutoring. A linear equation that expresses
the total amount of money Svetlana earns for each session she tutors is y = 25 + 15x.
What are the independent and dependent variables? What is the
y-intercept and what is the slope? Interpret them using complete
sentences.
SOLUTION TO EXAMPLE

The independent variable (x) is the number of hours Svetlana tutors each session.
The dependent variable (y) is the amount, in dollars, Svetlana earns for each session.
The y-intercept is 25 (a = 25).
At the start of the tutoring session, Svetlana charges a one-time fee of $25 (this is when x = 0).
The slope is 15 (b = 15).
For each session, Svetlana earns $15 for each hour she tutors.
YOU TRY
Exercise 1.
A vacation resort rents SCUBA equipment to certified
divers. The resort charges an up-front fee of $25 and
another fee of $12.50 an hour.
What are the dependent and independent variables?

Exercise 2.
A vacation resort rents SCUBA equipment to
certified divers. The resort charges an up-front fee
of $25 and another fee of $12.50 an hour.
Find the equation that expresses the total fee in
terms of the number of hours the equipment is
rented.

Exercise 3.
Is the equation y = 10 + 5x − 3x² linear? Why or
why not?
12.2 | SCATTER PLOTS
SCATTERPLOT

Before we take up the discussion of linear regression and correlation, we need to
examine a way to display the relation between two variables x and y. The most common
and easiest way is a scatter plot. The following example illustrates a scatter plot.
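A scatter plot is usually drawn with a calculator or plotting software. To keep the idea dependency-free, here is a rough text-only sketch (the function `ascii_scatter` and its grid size are hypothetical choices) that marks each (x, y) pair with a `*`, using the third exam/final exam data that appears later in these notes.

```python
def ascii_scatter(xs, ys, width=40, height=12):
    """Draw (x, y) pairs as '*' marks on a text grid; y increases upward.

    Assumes xs and ys each contain at least two distinct values.
    """
    x_lo, x_hi = min(xs), max(xs)
    y_lo, y_hi = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each value onto the grid's column and row indices.
        col = round((x - x_lo) / (x_hi - x_lo) * (width - 1))
        row = round((y - y_lo) / (y_hi - y_lo) * (height - 1))
        grid[height - 1 - row][col] = "*"  # row 0 of the grid is the top line
    lines = ["|" + "".join(cells) for cells in grid]
    lines.append("+" + "-" * width)  # x-axis
    return "\n".join(lines)


third_exam = [65, 67, 71, 71, 66, 75, 67, 70, 71, 69, 69]               # x
final_exam = [175, 133, 185, 163, 126, 198, 153, 163, 159, 151, 159]    # y
plot = ascii_scatter(third_exam, final_exam)
print(plot)
```

Even this crude picture shows the upward drift of the points from lower left to upper right.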
SCATTERPLOT
A scatter plot shows the direction of a relationship between the variables.
A clear direction happens when there is either:
• High values of one variable occurring with high values of the other variable
or low values of one variable occurring with low values of the other variable.
• High values of one variable occurring with low values of the other variable.
You can determine the strength of the relationship by looking at the scatter
plot and seeing how close the points are to a line, a power function, an
exponential function, or to some other type of function.
For a linear relationship there is an exception. Consider a scatter plot where
all the points fall on a horizontal line providing a "perfect fit."
The horizontal line would in fact show no relationship.
TRENDS IN SCATTERPLOTS

When you look at a scatterplot, you want to notice the overall pattern and any
deviations from the pattern. The following scatterplot examples illustrate these concepts.
PATTERNS IN SCATTERPLOTS
SCATTERPLOT TO LINEAR REGRESSION

In this chapter, we are interested in scatter plots that show a linear pattern.
Linear patterns are quite common.
The linear relationship is strong if the points are close to a straight line,
except in the case of a horizontal line where there is no relationship.
If we think that the points show a linear relationship, we would like to
draw a line on the scatter plot. This line can be calculated through a
process called linear regression.
However, we only calculate a regression line if one of the variables
helps to explain or predict the other variable. If x is the independent
variable and y the dependent variable, then we can use a regression
line to predict y for a given value of x.
FIGURE 12.5

Scatter plot showing the number of m-commerce users (in millions) by year.
FIGURE 12.9

Scatter plot showing the scores on the final exam based on scores from the third exam.
EXAMPLE
Does the scatter plot appear linear? Strong or weak? Positive or negative?
EXAMPLE
Does the scatter plot appear linear? Strong or weak? Positive or negative?
EXAMPLE
Does the scatter plot appear linear? Strong or weak? Positive or negative?
EXAMPLE
A random sample of ten professional athletes produced the following data where x is the numb
SOLUTION
EXAMPLE
12.3 | THE REGRESSION EQUATION
LINE OF BEST FIT

Data rarely fit a straight line exactly. Usually, you must be satisfied with rough
predictions. Typically, you have a set of data whose scatter plot appears to "fit" a
straight line. This is called a Line of Best Fit or Least-Squares Line.
LEAST-SQUARES REGRESSION LINE

The third exam score, x, is the independent variable and the final exam score, y, is the
dependent variable. We will plot a regression line that best "fits" the data.
If each of you were to fit a line "by eye," you would draw different lines. We can use
what is called a least-squares regression line to obtain the best fit line.
Y-HAT

The ŷ is read "y-hat" and is the estimated value of y. It is the value of y
obtained using the regression line. It is generally not equal to the observed y from the data.
RESIDUAL OR ERROR

The term $y_0 - \hat{y}_0 = \varepsilon_0$ is called the "error" or residual. It is not an error in the sense of
a mistake. The absolute value of a residual measures the vertical distance between
the actual value of y and the estimated value of y. In other words, it measures the
vertical distance between the actual data point and the predicted point on the line.
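To make ŷ and the residuals concrete, this sketch computes them for the 11 students in the third exam/final exam data, using the rounded best-fit line ŷ = −173.51 + 4.83x quoted later in these notes. The helper `y_hat` is a hypothetical name; residuals from the unrounded least-squares line would differ slightly.

```python
# Third exam (x, out of 80) and final exam (y, out of 200) scores from these notes.
third_exam = [65, 67, 71, 71, 66, 75, 67, 70, 71, 69, 69]
final_exam = [175, 133, 185, 163, 126, 198, 153, 163, 159, 151, 159]


def y_hat(x, a=-173.51, b=4.83):
    """Estimated final exam score from the rounded best-fit line y-hat = a + bx."""
    return a + b * x


# Residual (error) for each student: observed y minus estimated y-hat.
residuals = [y - y_hat(x) for x, y in zip(third_exam, final_exam)]

for x, y, e in zip(third_exam, final_exam, residuals):
    # e > 0: the point lies above the line; e < 0: below; |e| is the vertical distance.
    print(f"x = {x}  y = {y}  y-hat = {y_hat(x):7.2f}  residual = {e:+7.2f}")
```

For example, the first student (x = 65, y = 175) has ŷ = 140.44 and a residual of +34.56, so that point sits well above the line.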
SUM OF SQUARED ERRORS (SSE)

ε = the Greek letter epsilon

For each data point, you can calculate the residuals or errors,
$y_i - \hat{y}_i = \varepsilon_i$ for i = 1, 2, 3, ..., 11.
Each |ε| is a vertical distance.
If you square each ε and add them together, you get the Sum of Squared
Errors: $\text{SSE} = \sum_{i=1}^{11} \varepsilon_i^2$.
Using calculus, you can determine the values of a and b that make the
SSE a minimum. When you make the SSE a minimum, you have
determined the points that are on the line of best fit.
It turns out that the line of best fit has the equation: ŷ = a + bx
LINEAR REGRESSION

The process of fitting the best-fit line is called linear regression.
The idea behind finding the best-fit line is based on the
assumption that the data are scattered about a straight line.
The criterion for the best-fit line is that the sum of the squared
errors (SSE) is minimized, that is, made as small as possible.
Any other line you might choose would have a higher SSE
than the best-fit line.
This best-fit line is called the least-squares regression line.
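The notes leave out the formulas that the calculus step produces. For reference, the standard least-squares solution (stated here without derivation, with x̄ and ȳ the sample means of x and y) is:

```latex
b = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
a = \bar{y} - b\,\bar{x}
```

The second formula means the least-squares line always passes through the point (x̄, ȳ).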
FIGURE 12.10
FIGURE 12.11
FIGURE 12.12
FIGURE 12.13
FIGURE 12.14
PRACTICE OF USING THE CALCULATOR

A random sample of 11 statistics students produced the following data, where x is the
third exam score out of 80, and y is the final exam score out of 200. What is the
best-fit line for this data?

x (third exam)   y (final exam)
65               175
67               133
71               185
71               163
66               126
75               198
67               153
70               163
71               159
69               151
69               159
PREDICTION WARNING

Remember, it is always important to plot a scatter diagram first.

If the scatter plot indicates that there is a linear relationship between
the variables, then it is reasonable to use a best fit line to make
predictions for y given x within the domain of x-values in the sample
data, but not necessarily for x-values outside that domain.
You could use the line to predict the final exam score for a student
who earned a grade of 73 on the third exam. You should NOT use the
line to predict the final exam score for a student who earned a grade
of 50 on the third exam, because 50 is not within the domain of the x-
values in the sample data, which are between 65 and 75.
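One way to respect this warning in software is to refuse predictions outside the sample's x-range. In this minimal sketch the function names are hypothetical, and the coefficients are the rounded line ŷ = −173.51 + 4.83x quoted elsewhere in these notes.

```python
X_MIN, X_MAX = 65, 75  # smallest and largest third exam scores in the sample


def in_sample_domain(x):
    """True if x lies within the range of x-values used to fit the line."""
    return X_MIN <= x <= X_MAX


def predict_final(third_exam_score, a=-173.51, b=4.83):
    """Predict a final exam score, refusing to extrapolate beyond the sample's x-range."""
    if not in_sample_domain(third_exam_score):
        raise ValueError(
            f"x = {third_exam_score} is outside the sample domain [{X_MIN}, {X_MAX}]"
        )
    return a + b * third_exam_score


print(round(predict_final(73), 2))  # 179.08

try:
    predict_final(50)
except ValueError as err:
    print("refused:", err)
```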
UNDERSTANDING SLOPE

The slope of the line, b, describes how changes in the variables are related. It is
important to interpret the slope of the line in the context of the situation represented
by the data. You should be able to write a sentence interpreting the slope in plain English.
INTERPRETATION OF THE SLOPE: The slope of the best-fit line
tells us how the dependent variable (y) changes for every one-unit
increase in the independent variable (x), on average.
THIRD EXAM vs FINAL EXAM EXAMPLE
Slope: The slope of the line is b = 4.83.
Interpretation: For a one-point increase in the score on the third
exam, the final exam score increases by 4.83 points, on average.
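The slope interpretation follows from the equation: moving x up by one unit adds exactly b to ŷ, wherever you start. The sketch below (hypothetical helper `y_hat`, rounded coefficients from these notes) shows this for a few starting scores. The change applies to predicted averages, not to any individual student.

```python
def y_hat(x, a=-173.51, b=4.83):
    """Predicted (average) final exam score from the best-fit line."""
    return a + b * x


for x in (66, 70, 74):
    change = y_hat(x + 1) - y_hat(x)
    print(f"{x} -> {x + 1}: predicted final changes by {change:.2f} points")
```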
THE CORRELATION COEFFICIENT “R”

Besides looking at the scatter plot and seeing that a line seems reasonable,
how can you tell if the line is a good predictor? Use the correlation coefficient
as another indicator (besides the scatterplot) of the strength of the relationship
between x and y.
The correlation coefficient, r, developed by Karl Pearson in the early 1900s, is
numerical and provides a measure of strength and direction of the linear
association between the independent variable x and the dependent variable y.
The correlation coefficient is calculated as
$$r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$
where n = the number of data points.
If you suspect a linear relationship between x and y, then r can measure how
strong the linear relationship is.
(NEVER CALCULATE BY HAND)
WHAT THE VALUE OF R TELLS US

• The value of r is always between –1 and +1: –1 ≤ r ≤ 1.

• The size of the correlation r indicates the strength of the linear
relationship between x and y. Values of r close to –1 or to +1
indicate a stronger linear relationship between x and y.
• If r = 0, there is absolutely no linear relationship between x and
y (no linear correlation).
• If r = 1, there is perfect positive correlation. If r = –1, there is
perfect negative correlation. In both these cases, all of the original
data points lie on a straight line. Of course, in the real world, this
will not generally happen.
WHAT THE SIGN OF R TELLS US
• A positive value of r means that when x increases, y tends to increase
and when x decreases, y tends to decrease (positive correlation).
• A negative value of r means that when x increases, y tends to decrease
and when x decreases, y tends to increase (negative correlation).
• The sign of r is the same as the sign of the slope, b, of the best-fit line.
NOTE: Strong correlation does not suggest that x causes y or y causes x.
We say "correlation does not imply causation."
FIGURE 12.15

(a) A scatter plot showing data with a positive correlation. 0 < r < 1
(b) A scatter plot showing data with a negative correlation. –1 < r < 0
(c) A scatter plot showing data with zero correlation. r = 0
THE COEFFICIENT OF DETERMINATION
“R²”
The variable r² is called the coefficient of determination and is the square
of the correlation coefficient, but is usually stated as a percent, rather
than in decimal form. It has an interpretation in the context of the data:
• r², when expressed as a percent, represents the percent of variation in
the dependent (predicted) variable y that can be explained by variation in
the independent (explanatory) variable x using the regression (best-fit)
line.
• 1 – r², when expressed as a percentage, represents the percent of
variation in y that is NOT explained by variation in x using the regression
line. This can be seen as the scattering of the observed data points about
the regression line.
THE COEFFICIENT OF DETERMINATION
“R²”

Consider the third exam/final exam example introduced in the previous section.
• The line of best fit is: ŷ = –173.51 + 4.83x
• The correlation coefficient is r = 0.6631
• The coefficient of determination is r² = (0.6631)² = 0.4397
• Interpretation of r² in the context of this example:
• Approximately 44% of the variation (0.4397 is approximately 0.44) in the final exam
grades can be explained by the variation in the grades on the third exam, using the
best-fit regression line.
• Therefore, approximately 56% of the variation (1 – 0.44 = 0.56) in the final exam
grades cannot be explained by the variation in the grades on the third exam, using the
best-fit regression line. (This is seen as the scattering of the points about the line.)
12.5 PREDICTION
PREDICTION

Recall the third exam/final exam example. We examined the scatterplot and
showed that the correlation coefficient is significant. We found the equation
of the best-fit line for the final exam grade as a function of the grade on the
third exam. We can now use the least-squares regression line for prediction.
Suppose you want to estimate, or predict, the mean final exam score of
statistics students who received 73 on the third exam. The exam scores
(x-values) range from 65 to 75. Since 73 is between the x-values 65 and 75,
substitute x = 73 into the equation.
Then: ŷ = −173.51 + 4.83(73) = 179.08
We predict that statistics students who earn a grade of 73 on the third exam
will earn a grade of 179.08 on the final exam, on average.
FIGURE 12.20
FIGURE 12.21
FIGURE 12.22
FIGURE 12.28
FIGURE 12.29
FIGURE 12.30
FIGURE 12.32
EXAMPLE
