What Is Simple Linear Regression?

- Simple linear regression analyzes the relationship between two continuous variables, known as the predictor (independent) variable and the response (dependent) variable.
- The "best fitting line" is determined by the least squares criterion, which minimizes the sum of the squared differences between observed and predicted responses (the residuals).
- This document discusses simple linear regression and how to determine the equation of the line that best fits a set of bivariate data using the least squares method.

Uploaded by

Phương Anh Nguyễn Phạm


LINEAR REGRESSION

What is Simple Linear Regression?

Simple linear regression is a statistical method that allows us to summarize and study relationships
between two continuous (quantitative) variables:

- One variable, denoted x, is regarded as the predictor or independent variable.
- The other variable, denoted y, is regarded as the response or dependent variable.

Simple linear regression concerns the study of only one predictor variable. In contrast, multiple
linear regression gets its adjective "multiple" because it concerns the study of two or more predictor
variables.
Types of relationships
Before proceeding, we must clarify what types of relationships we won't study in this section,
namely, deterministic (or functional) relationships. Here is an example of a deterministic
relationship.

Note that the observed (x, y) data points fall directly on a line. As you may remember, the
relationship between degrees Fahrenheit and degrees Celsius is known to be:
$$F = \frac{9}{5}C + 32$$
That is, if you know the temperature in degrees Celsius, you can use this equation to determine the
temperature in degrees Fahrenheit exactly.
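Here is a minimal Python sketch of this deterministic relationship. The function name `celsius_to_fahrenheit` is our own, chosen for illustration. Given C, the formula determines F exactly, so there is no scatter to model:

```python
def celsius_to_fahrenheit(celsius: float) -> float:
    """Deterministic relationship: F = (9/5) * C + 32."""
    return 9 / 5 * celsius + 32


# Each input temperature maps to exactly one output; no randomness is involved.
for c in [-40, 0, 37, 100]:
    print(f"{c:>4} C = {celsius_to_fahrenheit(c):6.1f} F")
```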
Here are some examples of other deterministic relationships that you studied in high school:

- Circumference = π × diameter: $C = \pi d = 2\pi r$, where r is the radius.
- Ohm's Law: $I = V/R$, where V = voltage applied, R = resistance, and I = current.
- Boyle's Law: For a constant temperature, $P = \alpha/V$, where P = pressure, α = a constant for each
gas, and V = volume of gas.
This section does not cover deterministic relationships. Instead, we are interested in statistical
relationships, in which the relationship between the variables is not perfect.
Here is an example of a statistical relationship. The response variable y is the mortality due to skin
cancer (number of deaths per 10 million people) and the predictor variable x is the latitude (degrees
North) at the center of each of 49 states in the U.S.

You might anticipate that the farther north you lived in the U.S., the less exposed you'd be
to the harmful rays of the sun and, therefore, the lower your risk of death due to skin
cancer. The scatter plot supports such a hypothesis. There appears to be a negative linear relationship
between latitude and mortality due to skin cancer, but the relationship is not perfect. Indeed, the plot
exhibits some "trend," but it also exhibits some "scatter." Therefore, it is a statistical relationship,
not a deterministic one.
Some other examples of statistical relationships might include:

- Height and weight: as height increases, you'd expect weight to increase, but not perfectly.
- Alcohol consumed and blood alcohol content: as alcohol consumption increases, you'd
expect one's blood alcohol content to increase, but not perfectly.
- Vital lung capacity and pack-years of smoking: as the amount of smoking increases (as
quantified by the number of pack-years of smoking), you'd expect lung function (as quantified by
vital lung capacity) to decrease, but not perfectly.
- Driving speed and gas mileage: as driving speed increases, you'd expect gas mileage to
decrease, but not perfectly.
Okay, so let's study statistical relationships between one response variable y and one predictor
variable x!
2.2 - What is the "Best Fitting Line"?
Since we are interested in summarizing the trend between two quantitative variables, the natural
question arises — "what is the best fitting line?"
Here is an example: the plot below shows the heights (x) and weights (y) of 10 students.
Looking at the plot, which line (the solid line or the dashed line) do you think best
summarizes the trend between height and weight?

To compare the two lines, we first introduce some notation:

- $y_i$ denotes the observed response for experimental unit i.
- $x_i$ denotes the predictor value for experimental unit i.
- $\hat{y}_i$ is the predicted response (or fitted value) for experimental unit i.

Then, the equation for the best fitting line is:

$$\hat{y}_i = b_0 + b_1 x_i$$
In our height and weight example, the experimental units ("experimental unit" is the object or
person on which the measurement is made) are students.
Let's try out the notation on our example, with the trend summarized by the line w = -266.53 +
6.1376h. The first data point in the list indicates that student 1 is 63 inches tall and weighs 127
pounds. That is, $x_1 = 63$ and $y_1 = 127$. Do you see this point on the plot? If we know this student's
height but not his or her weight, we could use the equation of the line to predict his or her weight.
We'd predict the student's weight to be -266.53 + 6.1376(63), or 120.1 pounds. That is, $\hat{y}_1 = 120.1$.
Clearly, our prediction wouldn't be perfectly correct; it has some "prediction error" (or "residual
error"). In fact, the size of its prediction error is 127 - 120.1, or 6.9 pounds.
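The same arithmetic can be written as a short Python sketch. The names `B0`, `B1`, and `predict_weight` are illustrative and do not come from the source. The coefficients are the rounded solid-line values quoted above:

```python
# Solid line from the text: w = -266.53 + 6.1376 h
B0 = -266.53   # intercept b0 (pounds)
B1 = 6.1376    # slope b1 (pounds per inch)


def predict_weight(height: float) -> float:
    """Fitted value y_hat = b0 + b1 * x for a height given in inches."""
    return B0 + B1 * height


x1, y1 = 63, 127             # student 1: 63 inches tall, weighs 127 pounds
y1_hat = predict_weight(x1)  # predicted weight for student 1
e1 = y1 - y1_hat             # prediction (residual) error
print(f"y1_hat = {y1_hat:.1f} pounds, e1 = {e1:.1f} pounds")
```

This prints `y1_hat = 120.1 pounds, e1 = 6.9 pounds`. The unrounded values are 120.1388 and 6.8612.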
The following table lists, for each of the 10 data points, the predictor value, the observed
response, and the predicted response:

i    x_i   y_i   ŷ_i
1    63    127   120.1
2    64    121   126.3
3    66    142   138.6
4    69    157   157.0
5    69    162   157.0
6    71    156   169.2
7    71    169   169.2
8    72    165   175.4
9    73    181   181.5
10   75    208   193.8
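Using the ten (height, weight) pairs from the table, a short Python sketch can reproduce the ŷ_i column from the solid line. The function name `fitted_values` is hypothetical:

```python
heights = [63, 64, 66, 69, 69, 71, 71, 72, 73, 75]            # x_i (inches)
weights = [127, 121, 142, 157, 162, 156, 169, 165, 181, 208]  # y_i (pounds)
b0, b1 = -266.53, 6.1376                                       # solid line


def fitted_values(b0: float, b1: float, xs: list) -> list:
    """Return y_hat_i = b0 + b1 * x_i for every predictor value x_i."""
    return [b0 + b1 * x for x in xs]


print(" i  x_i  y_i  y_hat_i")
for i, (x, y, y_hat) in enumerate(zip(heights, weights, fitted_values(b0, b1, heights)), start=1):
    print(f"{i:>2}  {x:>3}  {y:>3}  {y_hat:7.1f}")
```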

As you can see, the size of the prediction error depends on the data point. If we didn't know the
weight of student 5, the equation of the line would predict his or her weight to be -266.53 +
6.1376(69) or 157 pounds. The size of the prediction error here is 162-157, or 5 pounds.

In general, when we use $\hat{y}_i = b_0 + b_1 x_i$ to predict the actual response $y_i$, we make a prediction error (or
residual error) of size:

$$e_i = y_i - \hat{y}_i$$
A line that fits the data "best" will be one for which the n prediction errors (one for each
observed data point) are as small as possible in some overall sense. One way to achieve this goal
is to invoke the "least squares criterion," which says to "minimize the sum of the squared prediction
errors." That is:

- The equation of the best fitting line is: $\hat{y}_i = b_0 + b_1 x_i$
- We just need to find the values $b_0$ and $b_1$ that make the sum of the squared prediction errors
the smallest it can be.
- That is, we need to find the values $b_0$ and $b_1$ that minimize:

$$Q = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

Here's how you might think about this quantity Q:

- The quantity $e_i = y_i - \hat{y}_i$ is the prediction error for data point i.
- The quantity $e_i^2 = (y_i - \hat{y}_i)^2$ is the squared prediction error for data point i.
- And the symbol $\sum_{i=1}^{n}$ tells us to add up the squared prediction errors for all n data points.
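Incidentally, if we didn't square the prediction error $e_i = y_i - \hat{y}_i$ to get $e_i^2 = (y_i - \hat{y}_i)^2$, the positive and
negative prediction errors would cancel each other out when summed. In fact, the unsquared errors sum
to exactly 0 for the least squares line, and for any other line through the point of sample means, so
their raw sum cannot distinguish a good fit from a poor one.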
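To see the cancellation numerically, the following Python sketch sums the unsquared errors for the two candidate lines in the height and weight example. The helper name `residuals` is hypothetical. The dashed line's equation, w = -331.2 + 7.1h, is taken from the comparison table later in this section:

```python
heights = [63, 64, 66, 69, 69, 71, 71, 72, 73, 75]            # x_i (inches)
weights = [127, 121, 142, 157, 162, 156, 169, 165, 181, 208]  # y_i (pounds)


def residuals(b0, b1, xs, ys):
    """Prediction errors e_i = y_i - (b0 + b1 * x_i)."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]


for name, b0, b1 in [("dashed", -331.2, 7.1), ("solid", -266.53, 6.1376)]:
    e = residuals(b0, b1, heights, weights)
    # Individual errors can be large, yet positive and negative ones offset each other.
    print(f"{name}: min e_i = {min(e):.2f}, max e_i = {max(e):.2f}, sum of e_i = {sum(e):.2f}")
```

For the solid line, individual errors range from about -13.24 to +14.21 pounds, yet they sum to only about -0.06. The sum is not exactly 0 because the coefficients are rounded. The dashed line's errors sum to about -20.30.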
Now that we are familiar with the least squares criterion, let's take a fresh look at our plot. In light
of the least squares criterion, which line do you now think is the best fitting line?
Let's see how you did! The following two tables illustrate the implementation of the
least squares criterion for the two lines under consideration: the dashed line and the solid line.

Dashed line: w = -331.2 + 7.1h

i    x_i   y_i   ŷ_i      y_i - ŷ_i   (y_i - ŷ_i)^2
1    63    127   116.1     10.9        118.81
2    64    121   123.2     -2.2          4.84
3    66    142   137.4      4.6         21.16
4    69    157   158.7     -1.7          2.89
5    69    162   158.7      3.3         10.89
6    71    156   172.9    -16.9        285.61
7    71    169   172.9     -3.9         15.21
8    72    165   180.0    -15.0        225.00
9    73    181   187.1     -6.1         37.21
10   75    208   201.3      6.7         44.89
                                       ______
                                       766.5

Solid line: w = -266.53 + 6.1376h

i    x_i   y_i   ŷ_i       y_i - ŷ_i   (y_i - ŷ_i)^2
1    63    127   120.139     6.8612      47.076
2    64    121   126.276    -5.2764      27.840
3    66    142   138.552     3.4484      11.891
4    69    157   156.964     0.0356       0.001
5    69    162   156.964     5.0356      25.357
6    71    156   169.240   -13.2396     175.287
7    71    169   169.240    -0.2396       0.057
8    72    165   175.377   -10.3772     107.686
9    73    181   181.515    -0.5148       0.265
10   75    208   193.790    14.2100     201.924
                                        _______
                                        597.4
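Both column totals can be checked with a few lines of Python. The function name `sum_squared_errors` is our own label for the quantity Q defined earlier:

```python
heights = [63, 64, 66, 69, 69, 71, 71, 72, 73, 75]            # x_i (inches)
weights = [127, 121, 142, 157, 162, 156, 169, 165, 181, 208]  # y_i (pounds)


def sum_squared_errors(b0, b1, xs, ys):
    """Least squares criterion: Q = sum over i of (y_i - (b0 + b1 * x_i))^2."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


q_dashed = sum_squared_errors(-331.2, 7.1, heights, weights)
q_solid = sum_squared_errors(-266.53, 6.1376, heights, weights)
print(f"Q (dashed line) = {q_dashed:.1f}")
print(f"Q (solid line)  = {q_solid:.1f}")
```

This prints Q = 766.5 for the dashed line and Q = 597.4 for the solid line, matching the two tables.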
Based on the least squares criterion, which equation best summarizes the data? The sum of the
squared prediction errors is 766.5 for the dashed line, while it is only 597.4 for the solid line.
Therefore, of the two lines, the solid line, w = -266.53 + 6.1376h, best summarizes the data. But does
this comparison guarantee that it is the best fitting line among all possible lines, including the ones
we didn't consider? Of course not!
If we used the above approach for finding the equation of the line that minimizes the sum of the
squared prediction errors, we'd have our work cut out for us. We'd have to implement the above
procedure for an infinite number of possible lines — clearly, an impossible task! Fortunately,
somebody has done some dirty work for us by figuring out formulas for the intercept b0 and
the slope b1 for the equation of the line that minimizes the sum of the squared prediction errors.
The formulas are determined using methods of calculus. We minimize the equation for the sum of the
squared prediction errors:

$$Q = \sum_{i=1}^{n} \bigl(y_i - (b_0 + b_1 x_i)\bigr)^2$$

(that is, take the partial derivatives with respect to $b_0$ and $b_1$, set them to 0, and solve for $b_0$ and $b_1$) and get the
"least squares estimates" for $b_0$ and $b_1$:
$$b_0 = \bar{y} - b_1 \bar{x}$$

and:

$$b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$
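For readers who want the calculus step spelled out, here is a sketch of the derivation. It sets both partial derivatives of Q to zero (the "normal equations") and uses only the symbols defined above:

```latex
\begin{aligned}
\frac{\partial Q}{\partial b_0} &= -2\sum_{i=1}^{n}\bigl(y_i - b_0 - b_1 x_i\bigr) = 0
  \;\Longrightarrow\; n\bar{y} - n b_0 - n b_1\bar{x} = 0
  \;\Longrightarrow\; b_0 = \bar{y} - b_1\bar{x} \\
\frac{\partial Q}{\partial b_1} &= -2\sum_{i=1}^{n} x_i\bigl(y_i - b_0 - b_1 x_i\bigr) = 0
  \;\Longrightarrow\; \sum_{i=1}^{n} x_i\bigl(y_i - \bar{y} - b_1(x_i - \bar{x})\bigr) = 0 \\
&\;\Longrightarrow\; b_1 = \frac{\sum_{i=1}^{n} x_i (y_i - \bar{y})}{\sum_{i=1}^{n} x_i (x_i - \bar{x})}
  = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
\end{aligned}
```

The second line substitutes $b_0 = \bar{y} - b_1\bar{x}$. The final equality holds because $\sum_{i=1}^{n} \bar{x}(y_i - \bar{y}) = 0$ and $\sum_{i=1}^{n} \bar{x}(x_i - \bar{x}) = 0$, so $x_i$ can be replaced by $x_i - \bar{x}$ in both sums.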
Because the formulas for $b_0$ and $b_1$ are derived using the least squares criterion, the resulting equation,
$\hat{y}_i = b_0 + b_1 x_i$, is often referred to as the "least squares regression line," or simply the "least
squares line." It is also sometimes called the "estimated regression equation." Incidentally, note
that in deriving the above formulas, we made no assumptions about the data other than that they
follow some sort of linear trend.
We can see from these formulas that the least squares line passes through the point $(\bar{x}, \bar{y})$, since
when $x = \bar{x}$, we get $\hat{y} = b_0 + b_1\bar{x} = \bar{y} - b_1\bar{x} + b_1\bar{x} = \bar{y}$.
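Here is a minimal Python sketch of the closed-form formulas applied to the ten students. The function name `least_squares_fit` is our own. The sketch also checks that the fitted line passes through $(\bar{x}, \bar{y})$:

```python
heights = [63, 64, 66, 69, 69, 71, 71, 72, 73, 75]            # x_i (inches)
weights = [127, 121, 142, 157, 162, 156, 169, 165, 181, 208]  # y_i (pounds)


def least_squares_fit(xs, ys):
    """Return (b0, b1) using b1 = Sxy / Sxx and b0 = y_bar - b1 * x_bar."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))  # sum of cross-deviations
    sxx = sum((x - x_bar) ** 2 for x in xs)                       # sum of squared x-deviations
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = least_squares_fit(heights, weights)
print(f"b0 = {b0:.2f}, b1 = {b1:.4f}")

# The fitted value at x_bar should equal y_bar.
x_bar = sum(heights) / len(heights)
y_bar = sum(weights) / len(weights)
print(f"fitted value at x_bar = {b0 + b1 * x_bar:.4f}, y_bar = {y_bar:.4f}")
```

This prints `b0 = -266.53, b1 = 6.1376`, which recovers the solid line. The fitted value at $\bar{x} = 69.3$ equals $\bar{y} = 158.8$.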
In practice, you won't really need to worry about the formulas for $b_0$ and $b_1$. Instead, you are
going to let statistical software, such as R or Minitab, find least squares lines for you.
One thing the estimated regression coefficients, $b_0$ and $b_1$, allow us to do is to predict future
responses, one of the most common uses of an estimated regression line. This use is rather
straightforward. Using the estimated regression line $\hat{y}_{i,wt} = -266.53 + 6.1376\,x_{i,ht}$:

- Predict the (mean) weight of 66-inch tall people: $\hat{y}_{wt} = -266.53 + 6.1376(66) = 138.55$
- Predict the (mean) weight of 67-inch tall people: $\hat{y}_{wt} = -266.53 + 6.1376(67) = 144.69$
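Here are the two predictions above as a Python sketch. `predict_mean_weight` is a hypothetical helper that uses the rounded solid-line coefficients:

```python
B0, B1 = -266.53, 6.1376  # solid line: w = -266.53 + 6.1376 h


def predict_mean_weight(height_in: float) -> float:
    """Predicted (mean) weight in pounds for people of the given height in inches."""
    return B0 + B1 * height_in


for h in (66, 67):
    print(f"{h}-inch tall people: {predict_mean_weight(h):.2f} pounds")
```

This prints 138.55 and 144.69 pounds, matching the worked values.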
Now, what does $b_0$ tell us? The answer is obvious when you evaluate the estimated regression
equation at x = 0. Here, it tells us that a person who is 0 inches tall is predicted to weigh -266.53
pounds! Clearly, this prediction is nonsense. This happened because we "extrapolated" beyond the
"scope of the model" (the range of the x values). It is not meaningful to have a height of 0 inches;
that is, the scope of the model does not include x = 0. So, here the intercept $b_0$ is not meaningful. In
general, if the "scope of the model" includes x = 0, then $b_0$ is the predicted mean response when x =
0. Otherwise, $b_0$ is not meaningful.
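One way to guard against this kind of extrapolation in code is to refuse predictions outside the observed range of x. The sketch below is hypothetical. It treats the sample's height range, 63 to 75 inches, as the scope of the model, which is a simplifying assumption on our part:

```python
B0, B1 = -266.53, 6.1376  # solid line: w = -266.53 + 6.1376 h
X_MIN, X_MAX = 63, 75      # assumed scope: the range of heights in the sample


def predict_within_scope(height_in: float) -> float:
    """Predicted mean weight, refusing to extrapolate beyond the observed heights."""
    if not X_MIN <= height_in <= X_MAX:
        raise ValueError(
            f"height {height_in} is outside the scope of the model [{X_MIN}, {X_MAX}]"
        )
    return B0 + B1 * height_in


print(B0 + B1 * 0)                         # the raw line at x = 0 just returns b0
print(f"{predict_within_scope(66):.2f}")   # inside the scope: a usable prediction
try:
    predict_within_scope(0)                # x = 0 is outside the scope
except ValueError as err:
    print(err)
```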
And what does $b_1$ tell us? The answer is obvious when you subtract the predicted weight of 66-inch
tall people from the predicted weight of 67-inch tall people. We obtain 144.69 - 138.55 = 6.14
pounds, which is the value of $b_1$ (6.1376) rounded to two decimals. Here, it tells us that we predict the mean
weight to increase by 6.14 pounds for every one-inch increase in height. In general, we can expect the mean
response to change by $b_1$ units (an increase if $b_1 > 0$, a decrease if $b_1 < 0$) for every one-unit increase in x.
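Here is a quick numerical check of this interpretation. It is again a sketch, using a hypothetical `predict` helper and the rounded solid-line coefficients. The difference between predictions one inch apart equals $b_1 = 6.1376$ regardless of the starting height, up to floating-point rounding:

```python
B0, B1 = -266.53, 6.1376  # solid line: w = -266.53 + 6.1376 h


def predict(height_in: float) -> float:
    """Predicted mean weight (pounds) at the given height (inches)."""
    return B0 + B1 * height_in


# Change in predicted mean weight for a one-inch increase in height.
for h in (63, 66, 70, 74):
    change = predict(h + 1) - predict(h)
    print(f"{h} -> {h + 1} inches: change = {change:.4f} pounds")
```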

https://drive.google.com/file/d/1ZoaueunP6p0d_--0CHm1QzWdLKXJR3pz/view?usp=sharing
