Chapter 3
3.1
Overview
In Chapter 2 we defined the regression function μ_Y(x1, . . . ,
xk) of a response variable Y on k predictor variables X1, . . . , Xk,
and introduced many of the basic concepts underlying
regression. In particular, we learned that the best function
for predicting the Y value of an item using the values of
X1, . . . , Xk is the regression function μ_Y(x1, . . . , xk). In this
chapter we focus on the simple but important special case of
straight line regression. Accordingly, throughout this
chapter, we assume that there is only one predictor variable
X and that the graph of the regression function of Y on X is
a straight line, i.e.,
μ_Y(x) = β0 + β1x    (3.1.1)
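As a small sketch of what (3.1.1) says, the function below evaluates the regression line μ_Y(x) = β0 + β1x. The coefficient values used as defaults (β0 = 45, β1 = 4) are not stated in this section; they are the values consistent with the subpopulation means of the illustrative population in Section 3.2, which rise from 45.0 at x = 0 in steps of 4.0 per hour.

```python
# Sketch of the straight-line regression function mu_Y(x) = beta0 + beta1 * x.
# The default coefficients (beta0 = 45, beta1 = 4) match the subpopulation
# means of the illustrative population in Section 3.2; they are not universal.

def mu_y(x, beta0=45.0, beta1=4.0):
    """Mean of the subpopulation of Y values determined by X = x."""
    return beta0 + beta1 * x
```

For example, mu_y(10) gives the mean score of students who studied 10 hours under this line.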
The quantity β1 is the slope and β0 is the intercept of the
regression line. Thus the mean of the Y values in the
subpopulation determined by X = x is given by μ_Y(x) = β0
+ β1x. Recall that σ_Y(x) denotes the standard deviation of
this subpopulation. If the entire population data are
available, then we can calculate exactly the values of β0,
β1, and σ_Y(x) for every allowable x, but since the entire
population is almost never available in a real problem, we
cannot know the values of β0, β1, and σ_Y(x) exactly, so we
must rely on sample data to estimate these and other
unknown quantities (parameters). In this chapter we
consider point and confidence interval estimation of
various quantities of interest, and we also discuss statistical
tests. Section 3.3 introduces two sets of assumptions
under which the theory of linear regression has been well
developed. Section 3.4 discusses point estimation of
parameters of interest. Methods for examining the validity
of regression assumptions are discussed in Section 3.5.
Confidence interval procedures and statistical tests are
described in Sections 3.6 and 3.7, respectively. Section 3.8
introduces the analysis of variance. The coefficient of
correlation and the coefficient of determination are
described in Section 3.9. The effect of measurement errors
on inferences about various model parameters is explained
in Section 3.10. Section 3.11 considers the special case of the
straight line regression model, where the regression line is
known to pass through the origin. Chapter exercises appear in Section 3.12. Laboratory
assignments describing the use of a statistical computing package (MINITAB or SAS)
for straight line regression are in Chapter 3 of the laboratory manual.
Before proceeding further, we present a detailed illustrative example where the
entire population of numbers is assumed to be available, even though it never is in
a real problem, so that you can get a better grasp of the concepts. This example
will also point out how various questions of interest, arising in real problems, can
be answered exactly when the entire population of numbers is available. Statistical
inference procedures, discussed in this chapter and throughout this book, attempt
to provide answers to such questions when only sample data, and not the entire
population, are available.
3.2
An Example of Straight Line Regression
Table D-3 in Appendix D contains a set of data consisting of 2,600 pairs of numbers
(Y, X), where Y is the score (in percent) obtained by a student on a standardized
calculus test administered at a certain university, and X is the number of hours
(recorded to the nearest hour) that the student spent studying for this test. These data
are also stored in the file grades.dat on the data disk. For purposes of illustration,
we suppose that these data form a bivariate population {(Y, X )}. The size of the
population is thus 2,600. An examination of these data shows that there are 13
distinct values of X in the population, and they are 0, 1, 2, ... , 12. The number of
observations, the means, and the standard deviations for each of the corresponding
13 subpopulations of Y values are exhibited in Table 3.2.1. A plot of the subpopulation means, together with the population data values, appears in Figure 3.2.1.
TABLE 3.2.1
Subpopulation Counts, Means, and Standard Deviations for Population Data in Table D-3
Hours    Number      Subpopulation    Subpopulation
  X      of Items    Mean             Standard Deviation
  0        200         45.0             2.881
  1        200         49.0             2.881
  2        200         53.0             2.881
  3        200         57.0             2.881
  4        200         61.0             2.881
  5        200         65.0             2.881
  6        200         69.0             2.881
  7        200         73.0             2.881
  8        200         77.0             2.881
  9        200         81.0             2.881
 10        200         85.0             2.881
 11        200         89.0             2.881
 12        200         93.0             2.881
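Because the subpopulation means in Table 3.2.1 increase by exactly 4.0 for each additional hour of study, the population regression line can be recovered directly from them. The sketch below does this with numpy (a stand-in for whichever package one prefers); since the means are exactly collinear, the fit returns the population intercept and slope rather than estimates.

```python
import numpy as np

# Subpopulation means from Table 3.2.1, for X = 0, 1, ..., 12.
x = np.arange(13)
means = np.array([45.0, 49.0, 53.0, 57.0, 61.0, 65.0, 69.0,
                  73.0, 77.0, 81.0, 85.0, 89.0, 93.0])

# Fit a first-degree polynomial through the means. Because the means lie
# exactly on a line, this recovers beta0 ~ 45.0 and beta1 ~ 4.0.
beta1, beta0 = np.polyfit(x, means, 1)
```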
FIGURE 3.2.1
Scatter plot of the population data values (•) and the subpopulation means (♦) against Hours (X = 0 to 12).
Note These data are specifically concocted for the purpose of illustration so that
the population regression function of Y on X will be exactly a straight line and,
in addition, the subpopulation standard deviations will all be the same. In most real
problems, we cannot expect the population regression function to conform exactly to
a straight line model, and the subpopulation standard deviations cannot be expected
to all be exactly the same. But in many situations, these idealized conditions may
be met approximately. You should also be aware that in actual investigations the
number of subpopulations of Y values, determined by X, can be quite large, and
the sizes of the subpopulations need not all be the same. In this particular example,
however, we have deliberately kept the number of subpopulations rather small (13
to be precise) and the sizes of the subpopulations all equal (200 observations in each
subpopulation) for ease of discussion.
Thus, because we know the entire population {(Y, X)} in this example, we are
able to determine exactly the values of β0, β1, and the subpopulation standard devi-
ations σ_Y(x). Any other population summary quantity (parameter) can be calculated
exactly as well.
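This computation can be sketched in Python (a substitute for the MINITAB or SAS sessions in the laboratory manual). The population of Table D-3 is not reproduced here, so the sketch builds a simulated stand-in with the same structure: 200 Y values for each X = 0, . . . , 12, scattered about the line 45 + 4x with common standard deviation 2.881.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated stand-in for the population of Table D-3 (the actual data are in
# grades.dat): 200 Y values for each X = 0, 1, ..., 12, centered on the line
# 45 + 4x with common standard deviation 2.881, as in Table 3.2.1.
population = {x: 45.0 + 4.0 * x + 2.881 * rng.standard_normal(200)
              for x in range(13)}

# With the entire population in hand, the subpopulation means and standard
# deviations are computed exactly -- no estimation is involved.
sub_means = {x: ys.mean() for x, ys in population.items()}
sub_sds = {x: ys.std() for x, ys in population.items()}  # divide by N, not N - 1
```

Note that the simulated subpopulation means and standard deviations will be close to, but not exactly equal to, 45 + 4x and 2.881, since the stand-in values are random draws rather than the actual Table D-3 data.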
Clearly, we are able to obtain exact answers to the questions in (3.2.1) when the
entire population {(Y, X)} is available to us. In many situations, we can answer
various questions concerning the population even if we do not know the entire
population but know only certain important summary quantities (parameters) of the
population. To demonstrate this in the present example, we begin by examining the
histogram of the subpopulation of Y values determined by X = 10. The histogram is
in Figure 3.2.2, which suggests that this subpopulation is approximately Gaussian.
In fact, we should examine the subpopulation of Y values for each distinct X value
to determine if each is approximately Gaussian.
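One simple, package-free way to check a subpopulation for approximate Gaussianity, beyond inspecting its histogram, is to compare the fraction of values falling within one and two standard deviations of the subpopulation mean against the Gaussian benchmarks of roughly 68% and 95%. The sketch below applies this to simulated data standing in for the X = 10 subpopulation (mean 85.0, standard deviation 2.881, per Table 3.2.1), since the actual data are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated stand-in for the X = 10 subpopulation: 200 scores with
# mean 85.0 and standard deviation 2.881 (the values in Table 3.2.1).
ys = 85.0 + 2.881 * rng.standard_normal(200)

mean, sd = ys.mean(), ys.std()
within1 = np.mean(np.abs(ys - mean) <= 1 * sd)  # Gaussian benchmark ~0.683
within2 = np.mean(np.abs(ys - mean) <= 2 * sd)  # Gaussian benchmark ~0.954
```

Large departures from these benchmarks would suggest that the subpopulation is not approximately Gaussian; in practice one would run this check for every distinct X value.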
FIGURE 3.2.2
Histogram of the subpopulation of Y values (scores) determined by X = 10.
Now suppose that we do not have the entire population {(Y, X)} available to