Simple Linear Regression

Basics of Linear Regression, Other Regression Models

Uploaded by

Ariel Raye Rica

10.0 Simple Linear Regression

One of the most important applications of statistics involves
estimating the mean value of a response variable Y, or predicting
some future value y, based on knowledge of a related variable X.
We refer to X as the independent variable, which is used to
predict the dependent variable Y.

There are many pairs of variables that we can readily associate
with one another and use for prediction:

• A person’s weight (Y, in kilograms) is related to his/her
  height (X, in centimeters). If we know the person’s height,
  we can predict his/her weight. In most cases, taller people
  are heavier than shorter ones.
• The volume of water (X, in liters) in a kettle left to boil on
  a stove determines how long (Y, in minutes) it takes before
  all of the water boils. A full kettle of tap water takes
  longer to boil than a half-full one.
• The demand Y for a commodity, in units sold, is inversely
  related to its price X. When department stores go on “sale”,
  more people troop to buy things there because the prices are
  lower. The same goes for new cellphone models: when their
  prices go down, more people buy them.
• The amount of rainfall X is associated with the amount of
  particulates Y that are removed from a surface exposed to
  rain. More rain, more removed particulates.
• The density X of a particle board or plywood determines its
  stiffness Y.
• The entrance exam final grade X of an applicant to
  DLSU-Manila is a predictive indicator of his/her CGPA
  (variable Y) upon graduation. Most universities have this
  predictive model in use (at least, DLSU-M and UP-Diliman do).
• The advertising expense X incurred in promoting a product
  has a predictive relationship with its sales Y in pesos.
  Within a certain range of advertising expense, the more that
  is spent on advertising, the higher the total sales of the
  product. This is like saying that if a TV advertisement for a
  product like Kentucky Fried Chicken ran more often during
  times when people are known to tune in to television, one
  could expect that more people would eat at KFC.
• The number of sweet nothings that a guy does each day for
  a girl he is courting would be a predictive indicator of how
  long the courtship lasts before they go steady. More sweet
  nothings per day, less time until steady. (OK, that’s a
  stretch, and there are other variables to consider, like
  whether the girl already likes the guy in the first place,
  whether the guy has pleasant features, whether the guy can
  really make a girl laugh, or whether the guy is famous as an
  athlete or an entertainment celebrity. But these factors are
  also predictive independent variables for the dependent
  variable called “courtship length.”)

How does one determine which variable is X and which variable
is Y? Practically speaking, variable X should be the variable that
can be easily measured or controlled by an experimenter, and
variable Y should be the variable that is thought to be associated
with X and is of predictive interest. That is, we want to predict Y
because it is desirable to know, and we use X to predict it because
X is hypothesized to be associated with Y.

Let’s use a fairly simple example: height information can be
used to predict a person’s weight. We could collect height and
weight data from a dozen males:

Person   Height (inches)   Weight (lbs)
  1          60               105
  2          61               110
  3          63               115
  4          64               120
  5          64               118
  6          65               124
  7          65               127
  8          66               134
  9          67               145
 10          67               138
 11          68               150
 12          72               136

We could reformat the table above so that we have 12 pairs of x
and y data points.

Let Xi be the height of the ith person, for i = 1, 2, ..., 12, and
Yi be the weight of the ith person.

Xi   60   61   63   64   64   65   65   66   67   67   68   72
Yi  105  110  115  120  118  124  127  134  145  138  150  136

We choose height to be our independent variable X because height
can be easily measured: with some practice, we can judge a
person’s height with good accuracy just by sight. A person’s
weight is harder to judge directly, but it is thought to vary
with height. So we will try to predict weight Y based on height X.

We could plot these paired data on an X-Y graph, called a scatter
plot, with height on the x-axis and weight on the y-axis:

[Figure: Scatter plot of Height and Weight. x-axis: Height (inches); y-axis: Weight (lbs).]

From this scatter plot, we see that as a person’s height
increases, the weight generally increases as well; that is, the
two variables are positively related, roughly along a straight
line.

We would now want to create an equation for weight Y as a
function of X:

Y = f (X)

Based on the scatter plot shown, this function can be
approximated by a straight line.

The equation for a line is of the form:

Y= a+bX

Where: a = y-intercept (the value of y when x is zero)

b = slope of the line.

A linear regression model can be used to estimate the values
of a and b. The name “linear regression” means that the
collected data pairs are said to follow, or “regress back” toward,
the values predicted by the equation of a line. The linear
regression line is the line for which the sum of the squared
differences (errors) between the actual Y values and the values
predicted by the equation is minimized. This line is therefore
also referred to as the “least squares line.”

The equations for b and a in the least squares line Y = a + bX
are as follows:

\[ b = \frac{n\sum_{i=1}^{n} x_i y_i - \left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2} \]

where

\[ a = \frac{\sum_{i=1}^{n} y_i}{n} - b\,\frac{\sum_{i=1}^{n} x_i}{n} \]
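As a cross-check on the hand or calculator computation, the two formulas can be written out directly in code. The following is a minimal sketch in plain Python; the function name least_squares_line and the variable names are illustrative choices, not part of the original notes, and the data are the twelve height-weight pairs from the table of persons.

```python
# Height (inches) and weight (lbs) of the twelve persons in the table.
heights = [60, 61, 63, 64, 64, 65, 65, 66, 67, 67, 68, 72]
weights = [105, 110, 115, 120, 118, 124, 127, 134, 145, 138, 150, 136]


def least_squares_line(xs, ys):
    """Return (a, b) for the least squares line Y = a + bX."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # Slope: the formula for b, term by term.
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # Intercept: mean of y minus b times mean of x.
    a = sum_y / n - b * sum_x / n
    return a, b


a, b = least_squares_line(heights, weights)
print(f"a = {a:.4f}, b = {b:.4f}")
```

The sums are accumulated once and reused, which mirrors how a calculator in regression mode stores them as data pairs are entered.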

Let’s use the height-weight data, reproduced here:

Xi   60   61   63   64   64   65   65   66   67   67   68   72
Yi  105  110  115  120  118  124  127  134  145  138  150  136

Some straightforward calculations may be performed on common
scientific calculators using the following keystroke guide:

Casio calculators:   Mode LR  or  Mode REG>LIN
Sharp calculators:   Mode StatXY

This puts your calculator into linear regression mode.

Press ScL to clear the statistical memory.

Confirm that there are no entries by pressing the keys for n;
the screen should show 0.

The keys vary between calculators, but your calculator should have a
“quick reference” card or info sheet stuck on its panel to show you
the keys.

To enter data, key in the Xi value, press the comma key (,), key in
the Yi value, then press DT (data). Repeat for each x-y data pair.
Some calculators use an XdYd key instead of the comma (,) key.

Then press the keys for A and B to get the values for the least squares
line.

Some calculators are fairly straightforward with this: Shift 7 and
Shift 8. Others may use longer key sequences, such as Shift S-var
followed by the arrow key until you see A and B (and r).

The final answers should be:

a = -111.2830
b = 3.6540

So the least squares line Y = a + bX should be

Y = -111.2830 + 3.6540 X

where Y = weight and X = height.
[Figure: Linear regression line Y = -111.2830 + 3.6540X plotted against the scatter plot points. x-axis: Height (inches); y-axis: Weight (lbs).]

We could now use this formula for Y to predict the weight of a
person who is 70 inches tall:

Y = f (70 inches)
Y = -111.2830 + 3.6540 (70)
  = 144.497 lbs

On a calculator, simply type in 70 and then press ŷ.

Conversely, we could also determine X from a value of Y. Say we
want to know the expected height of a person who weighs 130 lbs:

130 = -111.2830 + 3.6540 X
X = 66.0326 inches, or about 5 ft 6 inches

Calculator keystrokes: 130, then x̂.
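The two predictions above can be reproduced with a few lines of code. This is a small sketch that takes the fitted coefficients as given; the function names predict_weight and height_for_weight are hypothetical and chosen only for illustration.

```python
# Coefficients of the fitted line Y = a + bX quoted above
# (Y = weight in lbs, X = height in inches).
A = -111.2830
B = 3.6540


def predict_weight(height_in):
    """Weight predicted by the fitted line for a given height."""
    return A + B * height_in


def height_for_weight(weight_lb):
    """Solve Y = a + bX for X at a given weight Y."""
    return (weight_lb - A) / B


print(round(predict_weight(70), 3))      # 144.497
print(round(height_for_weight(130), 4))  # 66.0326
```

Note that height_for_weight simply inverts the fitted line, as the calculator's x̂ key does; it does not refit a separate regression of height on weight.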
Correlation: the strength of association between two
variables X and Y

To know if X can be a good predictor of Y, we make use of the

correlation coefficient r as a measure of the relatedness of the
two variables.

Pearson’s correlation coefficient:

\[ r = \frac{n\sum_{i=1}^{n} x_i y_i - \left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{\sqrt{\left[n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2\right]\left[n\sum_{i=1}^{n} y_i^2 - \left(\sum_{i=1}^{n} y_i\right)^2\right]}} \]

Range of values of r     Interpretation
0.7 < r < 1              Strong positive correlation
0 < r < 0.69             Weak positive correlation
-0.69 < r < 0            Weak negative correlation
-1 < r < -0.7            Strong negative correlation
+1 or -1                 Perfect correlation
0                        No correlation

For our height-weight example (using all 12 data pairs):

r = 0.8383

indicating that the correlation between height and weight is
positive (weight tends to increase with height) and that this
correlation is quite strong (r > 0.7). This means that we can
reliably predict weight Y using height X and vice versa; this
predictability works both ways.
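The correlation coefficient can also be computed directly from Pearson's formula. The sketch below does this in plain Python for the twelve height-weight pairs and then labels the result using the ranges in the interpretation table. The function names are illustrative, and treating any value of |r| up to and including 0.7 as weak is an assumption made here to cover the small gap the table leaves between 0.69 and 0.7.

```python
import math

# Height (inches) and weight (lbs) of the twelve persons in the table.
heights = [60, 61, 63, 64, 64, 65, 65, 66, 67, 67, 68, 72]
weights = [105, 110, 115, 120, 118, 124, 127, 134, 145, 138, 150, 136]


def pearson_r(xs, ys):
    """Pearson's correlation coefficient from the raw sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator


def interpret(r):
    """Label r with the ranges from the interpretation table.

    Assumption: |r| <= 0.7 counts as weak, |r| > 0.7 as strong.
    """
    if r in (1, -1):
        return "perfect correlation"
    if r == 0:
        return "no correlation"
    strength = "strong" if abs(r) > 0.7 else "weak"
    sign = "positive" if r > 0 else "negative"
    return f"{strength} {sign} correlation"


r = pearson_r(heights, weights)
print(round(r, 4), interpret(r))
```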
[Figure: Four example scatter plots illustrating weak positive, weak negative, strong positive, and strong negative correlation.]

Practice Exercises:

1. An engineer wants to determine how much temperature affects
the average life of a component. He undertakes an experiment
at various temperatures and records the resulting lifetime of the
component:

Temp (°C)     28    29    30    32    33    35    38    42    46
Life (hrs)  1000   980   890   950   921   885   900   880   850

a. Fit a linear regression model to the data.
b. Determine the correlation coefficient between Temp X and Life
Y and state an interpretation of this number.
c. What would be the expected lifetime of the component in
question if it is exposed to a constant 40 °C?
d. At what temperature would a component last a maximum
of 900 hours?
2. A manufacturer of laundry detergent was interested in testing a
new product prior to market release. One area of concern was
the relationship between the height of the detergent suds in a
washing machine and the amount of detergent added in the wash
cycle. For a standard size washing machine tub filled to the full
level, the manufacturer made random assignments of amounts of
detergent and tested them on the washing machine. The data
appear next (two suds height observations for each amount):

Height, Y        Amount, X
28.1   27.6          6
32.3   33.2          7
34.8   35.0          8
38.2   39.4          9
43.5   46.8         10

a. Fit a linear regression model to the data with repeated observations.
b. Determine the coefficient of correlation and state an interpretation
of this number.
c. If a standard washing machine filled to the full level has an
allowable suds height of only 38 cm above the full water level, and
the suds can rise up to an extra 10 cm before they “overflow” out of
the washing machine, how many spoonfuls of detergent should be
recommended as the maximum amount of detergent to be used?

3. An equal number of families from eight different cities of various sizes
were asked how much money they spend on food, clothing, and housing
per year. The city size and average family responses are summarized
below.

City size    30    50    75   100   150   200   175   120
Food         40    37    40    42    41    45    44    37
Clothing     10    20    20    15    16    12    14    10
Housing      15    20    19    23    26    28    26    24

City size in millions of people, all expenditures in thousands of US
dollars.
a. Fit a simple linear model relating city size and annual expenses
per family.
b. Using the correlation coefficient between city size and annual
expenses per family, state whether there is a strong or weak
correlation.
c. What would be the expected annual family expense in a city
of 65 million people?
d. Can city size predict food expenditure better than it predicts
annual family expenditure? Use the correlation coefficient
for your answer.
Other Linear Regression Models: Exponential Regression,
Power Regression.

Sometimes, the data that you have may not fit into a simple
linear model. However, if you transform the original data pair via
some function like finding its natural logarithm or its inverse, you can
transform data that is inherently not linear into linear values to fit into
our simple linear regression model.

Type of Regression     Estimation Model     Linear Format
Exponential Model      Y = a e^(bX)         ln(Y) = ln(a) + bX
Power Regression       Y = a X^b            ln(Y) = ln(a) + b ln(X)
Inverse Regression     Y = a + b/X          Y = a + b (1/X)

For each of these cases, the correlation coefficient should be computed
on the linearized data to measure the degree of association.

On some Casio™ calculators, these regression functions are
already built into the regression mode options.
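To illustrate the linearization idea, here is a minimal sketch that fits the exponential and power models by transforming the data and reusing the simple least squares formulas. The data are hypothetical, generated without noise from known coefficients so that the recovered values can be checked; the function names are illustrative and not from the notes.

```python
import math


def least_squares_line(xs, ys):
    """Return (a, b) for the least squares line Y = a + bX."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = sum_y / n - b * sum_x / n
    return a, b


def fit_exponential(xs, ys):
    """Fit Y = a * e^(bX) by regressing ln(Y) on X."""
    ln_a, b = least_squares_line(xs, [math.log(y) for y in ys])
    return math.exp(ln_a), b


def fit_power(xs, ys):
    """Fit Y = a * X^b by regressing ln(Y) on ln(X)."""
    ln_a, b = least_squares_line(
        [math.log(x) for x in xs], [math.log(y) for y in ys]
    )
    return math.exp(ln_a), b


# Hypothetical noise-free data with known coefficients.
xs = [1, 2, 3, 4, 5]
ys_exp = [2.0 * math.exp(0.5 * x) for x in xs]  # a = 2, b = 0.5
ys_pow = [3.0 * x ** 1.5 for x in xs]           # a = 3, b = 1.5

a_exp, b_exp = fit_exponential(xs, ys_exp)
a_pow, b_pow = fit_power(xs, ys_pow)
print(round(a_exp, 6), round(b_exp, 6))
print(round(a_pow, 6), round(b_pow, 6))
```

The intercept of the linearized fit is ln(a), so it must be exponentiated to get a back in the original units. Both transforms require positive Y values, and the power model also requires positive X values.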

Practice Problems: (from Hayter, Probability and Statistics for
Engineers and Scientists, Duxbury, 2002, pp. 656-657)

1. Make a plot of the following data set. What intrinsically linear
function would provide a good model for this data set? Fit a
straight line to the transformed variables and write the fitted
model back in terms of the original variables. What is the
predicted value of the dependent variable y when x = 2.0?

X   -2.0  -0.4   1.5   2.4   2.7   3.5   4.6   5.3   5.8   6.4   6.8
Y    5.3   8.8  13.3  17.9  18.9  24.4  28.3  34.0  44.0  55.1  72.2

2. A bioengineer measures the growth rate of a substance by counting
the number of cells N present at various times t, as shown in the
following data table:

Time t     1    2    3    4    5    6    7    8
Cells N   12   31   42   75  119  221  327  546

Fit the model N = A e^(Bt), which exhibits exponential growth, to the
data, and show how strongly the transformed N is correlated with t.

3. In an experiment to investigate the suitability of using a

silicon tube to model the behavior of a human artery,
the following data set was collected, which relates the
pressure differential P with cross-sectional area X.
P 2 4 7 11 13 21 32 48 64 91
X 0.5 0.5 0.5 0.6 0.6 0.7 0.7 0.8 0.9 1.0
4 7 5 9 3 8 5 7 4

Show that a model P = A X^b appears to provide a good fit to
the data set.
Multiple Linear Regression

In most research problems where regression analysis is applied,
more than one independent variable is needed in the regression
model. When this model is linear in the coefficients, it is called a
multiple linear regression (MLR) model. For the case of k independent
variables X1, X2, ..., Xk, the estimated response is obtained from the
sample regression equation:

\[ \hat{y} = b_0 + b_1 X_1 + \cdots + b_k X_k \]

This model is simply an extension of the simple linear regression
(LR) formula y = a + bx, where the intercept term b0 in the MLR model
corresponds to the y-intercept term a in the simple LR model.
Furthermore, the multiple LR model has multiple terms bi Xi
instead of just one such term as in the simple LR model.

The computational effort to find the values of the bi coefficients
is considerable, requiring matrix algebra and the inversion of a
matrix. The reader is referred to common statistical texts such as
Introduction to Probability and Statistics: Principles and
Applications for Engineering and the Computing Sciences (Fourth
Edition) by J. Susan Milton and Jesse C. Arnold, McGraw-Hill
(www.mhhe.com), or Probability and Statistics for Engineers and
Scientists by Walpole, Myers, Myers and Ye.

What this lecture note shows instead is how to use Microsoft™ Excel
worksheets to compute these coefficients, as well as how to
determine which subset of the available variables Xi should be
included in a multiple linear regression model.

Let’s say that a country’s GNP is thought to be predicted by three
indicator variables: total consumption in the capital city X1,
total investments made by the citizens X2, and the city’s
government expenditure X3.

The following table shows the values of each variable Xi and the
true GNP for the corresponding year.
X1 X2 X3 Y (GNP)
50 10 100 330
50 20 150 260
50 30 200 290
50 40 280 306
70 50 240 300
70 70 350 260
80 80 200 200
80 90 750 520

Procedure to get the linear regression coefficients b0, b1, b2, ..., bk
using Excel:

1. Open Excel on your computer and type the data into the
worksheet.
2. Go to the menu item Tools-Data Analysis, and choose
Regression from the Data Analysis dialog box.
3. Input the range of values for the Y column by highlighting
cells D1..D9, and input the range of values for the X columns
by highlighting cells A1..C9. Check the box called “Labels”.
Select Output Range and click a cell that has empty rows
below it in the worksheet to hold the report. At this point the
dialog box should be filled in with these ranges and options.
4. Press OK and you should get the following report:

SUMMARY OUTPUT

Regression Statistics
Multiple R 0.974163
R Square 0.948993
Adjusted R Square 0.910738
Standard Error 28.13851
Observations 8

ANOVA
df SS MS F Significance F
Regression 3 58924.4 19641.47 24.80685 0.004794
Residual 4 3167.103 791.7758
Total 7 62091.5

Coefficients Standard Error t Stat P-value Lower 95% Upper 95%

Intercept 62.341 94.4772 0.659852 0.545407 -199.97 324.6523
X1 4.454482 2.239976 1.98863 0.117635 -1.7647 10.67366
X2 -4.63867 1.271412 -3.64844 0.021801 -8.16868 -1.10866
X3 0.682428 0.082945 8.227451 0.00119 0.452135 0.912721
The coefficients are shown in the last part of the report.
Therefore, for this model that uses X1, X2 and X3 as input
variables, our prediction model for Y (GNP) can be written as:

Y = 62.341 + 4.454482 X1 - 4.63867 X2 + 0.682428 X3

Quite easily done.
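For readers without Excel, the same coefficients can be obtained by solving the normal equations (X'X) b = X'y, which is the matrix computation mentioned earlier. The sketch below uses only plain Python and a small Gaussian elimination routine; it is one possible way to do the calculation, not a description of what Excel does internally, and the function names solve and fit_mlr are illustrative.

```python
# GNP data from the table above: columns X1, X2, X3, then the response Y.
rows = [
    (50, 10, 100, 330),
    (50, 20, 150, 260),
    (50, 30, 200, 290),
    (50, 40, 280, 306),
    (70, 50, 240, 300),
    (70, 70, 350, 260),
    (80, 80, 200, 200),
    (80, 90, 750, 520),
]


def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(matrix[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        # Swap in the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit_mlr(data):
    """Return [b0, b1, ..., bk] from the normal equations (X'X) b = X'y."""
    # A leading 1 in each design row produces the intercept b0.
    design = [[1.0] + [float(v) for v in row[:-1]] for row in data]
    y = [float(row[-1]) for row in data]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


coefficients = fit_mlr(rows)
print([round(b, 4) for b in coefficients])
```

For larger or nearly collinear data sets, a dedicated numerical library would be a safer choice than hand-rolled elimination; the version here is kept short so each step can be followed.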
How to Interpret the Figures on the Regression Report:

• What proportion of the total variation in Y can be
explained by the model?
This is the value of R-squared. R² is called the coefficient of
determination.

From the Example:

Regression Statistics
Multiple R 0.974163
R Square 0.948993
Adjusted R Square 0.910738
Standard Error 28.13851
Observations 8

94.8993% of the variation in y can be explained by the model using

X1, X2 and X3.
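The summary statistics in the report are tied to the ANOVA table by standard formulas: R² is the regression sum of squares divided by the total sum of squares, adjusted R² compares the residual and total mean squares, the standard error is the square root of the residual mean square, and F is the ratio of the regression and residual mean squares. The sketch below recomputes them from the ANOVA figures in the report; the variable names are illustrative.

```python
# Sums of squares and degrees of freedom copied from the ANOVA table.
ss_regression = 58924.4
ss_residual = 3167.103
ss_total = 62091.5
df_regression = 3  # k, the number of X variables
df_residual = 4    # n - k - 1
df_total = 7       # n - 1

ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

r_squared = ss_regression / ss_total
adjusted_r_squared = 1 - ms_residual / (ss_total / df_total)
standard_error = ms_residual ** 0.5
f_statistic = ms_regression / ms_residual

print(round(r_squared, 6))
print(round(adjusted_r_squared, 6))
print(round(standard_error, 4))
print(round(f_statistic, 4))
```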

• To test whether the model containing all the included variables
is adequate to explain the variation in Y: check that the
“ANOVA Significance F” value is below your chosen level of
significance α (alpha). For most statistical tests, α = 0.05 or
less. Significance F is the probability of obtaining an F value at
least this large if none of the X variables were actually related
to Y. Therefore, if the Significance F value falls below the set
value α, there is only a small chance that the model is not
adequate. The confidence level is the complement of alpha,
(1 - α).

From the example: Is the model adequate at 5% level of significance?

ANOVA
df SS MS F Significance F
Regression 3 58924.4 19641.47 24.80685 0.004794
Residual 4 3167.103 791.7758
Total 7 62091.5

Here, we can see that Significance F is 0.004794, which is
well below the value of α = 0.05. Therefore, the model with the
three variables X1, X2 and X3 is adequate to explain the variation
in GNP Y; that is, one can use the model with the coefficients
shown to predict the varying values of Y.

• To determine which set of variables Xi should be included
in a model that is adequate but uses the least number of
variables (model selection):

Look at the p-values in the bottom (third) table of the report.
All p-values of the X variables must be below the significance
level α.

If there are any variables whose p-values are above α, those
variables are not significant enough to explain the variation of
Y. Among X1, X2, ..., Xk, the variable whose p-value is highest
and above α must be eliminated first. Redo the regression in
Excel using the reduced set of variables Xi, and check the new
p-values. Continue eliminating, one at a time, the variable with
the highest p-value above α. When all p-values of the remaining
Xi independent variables are below α, stop. The resulting
multiple linear regression model should be efficient and complete.

From the Example: Which of the variables X1, X2 and X3 must be
retained in an efficient model to predict Y (efficient = least
number of significant variables) at a 5% level of significance?

Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 62.341 94.4772 0.659852 0.545407 -199.97 324.6523
X1 4.454482 2.239976 1.98863 0.117635 -1.7647 10.67366
X2 -4.63867 1.271412 -3.64844 0.021801 -8.16868 -1.10866
X3 0.682428 0.082945 8.227451 0.00119 0.452135 0.912721

Since the p-value of variable X1 (total consumption in the
capital city) is 0.117635 > 0.05, X1 must be eliminated from the
model. Redo the regression using X2 and X3 as input X-values.

A resulting table should look like the report below:

SUMMARY OUTPUT

Regression Statistics
Multiple R 0.947926
R Square 0.898564
Adjusted R Square 0.85799
Standard Error 35.49168
Observations 8

ANOVA
df SS MS F Significance F
Regression 2 55793.2 27896.6 22.14614 0.003277
Residual 5 6298.298 1259.66
Total 7 62091.5

Coefficients Standard Error t Stat P-value Lower 95% Upper 95%

Intercept 245.7008 25.9829 9.456247 0.000223 178.9097 312.4918
X2 -2.35443 0.687503 -3.42462 0.018744 -4.12171 -0.58716
X3 0.624944 0.098062 6.37296 0.001407 0.372869 0.87702

Since all p-values are below 0.05, stop. The efficient model
to predict Y contains X2 and X3, and the model is:

Y = 245.7008 - 2.35443 X2 + 0.624944 X3
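The elimination procedure is a simple loop, which can be sketched as follows. Computing p-values requires the t distribution, so this sketch does not refit the regression; instead it looks up the p-values already reported in the two Excel outputs above. The lookup table REPORTED_P_VALUES and the helper p_values_for are hypothetical stand-ins for "run the regression again and read off the p-values".

```python
# P-values copied from the two Excel reports, keyed by the set of
# X variables in the model. A real run would refit the regression
# at each step; this lookup is a stand-in so the loop can be traced.
REPORTED_P_VALUES = {
    ("X1", "X2", "X3"): {"X1": 0.117635, "X2": 0.021801, "X3": 0.00119},
    ("X2", "X3"): {"X2": 0.018744, "X3": 0.001407},
}


def p_values_for(variables):
    """Hypothetical helper: p-value of each variable in the given model."""
    return REPORTED_P_VALUES[tuple(variables)]


def backward_eliminate(variables, alpha):
    """Drop the variable with the highest p-value until all are below alpha."""
    variables = list(variables)
    while variables:
        p_values = p_values_for(variables)
        worst = max(variables, key=lambda name: p_values[name])
        if p_values[worst] < alpha:
            break  # every remaining variable is significant
        variables.remove(worst)
    return variables


print(backward_eliminate(["X1", "X2", "X3"], alpha=0.05))  # ['X2', 'X3']
```

Only one variable is removed per pass because the p-values of the remaining variables change once the model is refitted, as the two reports show for X2 and X3.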

Practice Exercise:

Find the model that uses the least number of Xi variables to
predict Y. Use α = 0.01.

X1 X2 X3 X4 X5 X6 Y
7 9 10 12 13 22 995
6 10 18 18 19 25 1325
8 9 7 10 17 26 1452
6 7 9 25 37 27 1735
7 8 10 35 22 28 2188
8 9 11 18 15 29 1435
9 7 6 51 18 30 2980
10 6 8 16 41 32 1470
9 2 17 28 36 35 1240

Discrete Probability Distribution Updated
No ratings yet
Discrete Probability Distribution Updated
44 pages
Series1 Sol
No ratings yet
Series1 Sol
6 pages
Final Project - Regression Models
100% (1)
Final Project - Regression Models
35 pages
2013 CBC Standard Gypsum Board Ceiling Details For Suspended and Joist Framing Construction
No ratings yet
2013 CBC Standard Gypsum Board Ceiling Details For Suspended and Joist Framing Construction
68 pages
EAC Brightspace Learner Guide PDF
No ratings yet
EAC Brightspace Learner Guide PDF
31 pages
Practical Problems in Statistic
100% (1)
Practical Problems in Statistic
8 pages
Interest Rate PDF
No ratings yet
Interest Rate PDF
391 pages
Kim Dissertation
No ratings yet
Kim Dissertation
301 pages
Shiny Dashboard
No ratings yet
Shiny Dashboard
27 pages
Regression Models Course Project
100% (1)
Regression Models Course Project
4 pages
Statistics - Linear Regression - Correlation Worksheet PDF
No ratings yet
Statistics - Linear Regression - Correlation Worksheet PDF
2 pages
Vuelve A Intentarlo Cuando Estés Listo.: Week 3 Quiz
No ratings yet
Vuelve A Intentarlo Cuando Estés Listo.: Week 3 Quiz
4 pages
Week-9 Discrete Probability Distributions
No ratings yet
Week-9 Discrete Probability Distributions
97 pages
Ggplot2 - Easy Way To Mix Multiple Graphs On The Same Page - Articles - STHDA
No ratings yet
Ggplot2 - Easy Way To Mix Multiple Graphs On The Same Page - Articles - STHDA
54 pages
Old2ans PDF
No ratings yet
Old2ans PDF
2 pages
Brightspace Orientation NEIS PDF
No ratings yet
Brightspace Orientation NEIS PDF
24 pages
Statistical Experiment: 5.1 Random Variables and Probability Distributions
No ratings yet
Statistical Experiment: 5.1 Random Variables and Probability Distributions
23 pages
2012 - Duration As A Measure of Time Structure of Bond and Interest Rate Risk - IJEP PDF
No ratings yet
2012 - Duration As A Measure of Time Structure of Bond and Interest Rate Risk - IJEP PDF
12 pages
R Examples
No ratings yet
R Examples
56 pages
AnalyticsEdge Rmanual PDF
100% (1)
AnalyticsEdge Rmanual PDF
44 pages
STAB22 Data Analysis Project Instruction-1-已转档
No ratings yet
STAB22 Data Analysis Project Instruction-1-已转档
7 pages
Duration - and - Convexity
No ratings yet
Duration - and - Convexity
22 pages
Assignment
No ratings yet
Assignment
9 pages
Using The Google Chart Tools With R
No ratings yet
Using The Google Chart Tools With R
40 pages
Linear Regression
No ratings yet
Linear Regression
28 pages
Heatmap Calculation Tutorial Using Kernel Density Estimation (KDE) Algorithm
No ratings yet
Heatmap Calculation Tutorial Using Kernel Density Estimation (KDE) Algorithm
6 pages
Chapter 1: Descriptive Statistics: 1.1 Some Terms
No ratings yet
Chapter 1: Descriptive Statistics: 1.1 Some Terms
15 pages
Topic 4 - Probability (Old Notes)
100% (1)
Topic 4 - Probability (Old Notes)
22 pages
R Exercice
No ratings yet
R Exercice
11 pages
Chapter 1 Assignment What Is Statistics?
No ratings yet
Chapter 1 Assignment What Is Statistics?
2 pages
CS2610 Final Exam: If Is - Nan Print
No ratings yet
CS2610 Final Exam: If Is - Nan Print
5 pages
Assignment 3 - Answer Key
No ratings yet
Assignment 3 - Answer Key
13 pages
Potential of The R Packages in Engineering
No ratings yet
Potential of The R Packages in Engineering
14 pages
R - Tutorial: Matrices Are Vectors
No ratings yet
R - Tutorial: Matrices Are Vectors
13 pages
Kellison FM Prep
No ratings yet
Kellison FM Prep
43 pages
Introduction To Rstudio: Creating Vectors
No ratings yet
Introduction To Rstudio: Creating Vectors
11 pages
15 Linear Regression in Geography
No ratings yet
15 Linear Regression in Geography
24 pages
9B BMGT 220 THEORY of ESTIMATION 2
No ratings yet
9B BMGT 220 THEORY of ESTIMATION 2
4 pages
Practice Test - Chap 7-9
No ratings yet
Practice Test - Chap 7-9
12 pages
Macaulay's Duration An Appreciation
No ratings yet
Macaulay's Duration An Appreciation
5 pages
KIRII CeilingSuspensionSystem
No ratings yet
KIRII CeilingSuspensionSystem
7 pages
Statistics Chapter 4 Project
No ratings yet
Statistics Chapter 4 Project
3 pages
Probability and Statistics
No ratings yet
Probability and Statistics
8 pages
Applying Duration: A Bond Hedging Example
No ratings yet
Applying Duration: A Bond Hedging Example
8 pages
Chapter 4 Student
No ratings yet
Chapter 4 Student
15 pages
I.B. Mathematics HL Core: Probability: Please Click On The Question Number You Want
No ratings yet
I.B. Mathematics HL Core: Probability: Please Click On The Question Number You Want
29 pages
Ruck Man
No ratings yet
Ruck Man
180 pages
Nature of Regression Analysis
No ratings yet
Nature of Regression Analysis
19 pages
R Workshop
No ratings yet
R Workshop
47 pages
Response
No ratings yet
Response
20 pages
Chap 2
No ratings yet
Chap 2
41 pages
QM Statistic Notes
No ratings yet
QM Statistic Notes
24 pages
Gypsum
No ratings yet
Gypsum
32 pages
Regression Analysis
No ratings yet
Regression Analysis
7 pages
Groebner Business Statistics 7 Ch07
No ratings yet
Groebner Business Statistics 7 Ch07
34 pages
Wonder of Heavens
No ratings yet
Wonder of Heavens
8 pages
Human Consciousness
No ratings yet
Human Consciousness
6 pages
Correlation and Regression
No ratings yet
Correlation and Regression
8 pages
Chapter 3 Answer Cost Accounting
100% (1)
Chapter 3 Answer Cost Accounting
17 pages
Organic Mock Exam Questions
100% (2)
Organic Mock Exam Questions
119 pages
Lecture 6 Linear Regression
No ratings yet
Lecture 6 Linear Regression
8 pages
5 Bivariate Data. Double The Data, Double The Fun: 5.1 Covariance and Correlation
No ratings yet
5 Bivariate Data. Double The Data, Double The Fun: 5.1 Covariance and Correlation
10 pages
My Brothers Famous Bottom Takes Off Strong Jeremy Instant Download
No ratings yet
My Brothers Famous Bottom Takes Off Strong Jeremy Instant Download
26 pages
A. Background of OJT: On The Job Training (OJT) or Internship Program Is One of The
No ratings yet
A. Background of OJT: On The Job Training (OJT) or Internship Program Is One of The
5 pages
2 Probability
No ratings yet
2 Probability
28 pages
Essentials of Organizational Behavior 13th Edition Robbins Test Bankdownload
100% (5)
Essentials of Organizational Behavior 13th Edition Robbins Test Bankdownload
52 pages
Jurnal Sabun
No ratings yet
Jurnal Sabun
16 pages
School - Tpad - Report - For - Term - Three 2024 Template
0% (1)
School - Tpad - Report - For - Term - Three 2024 Template
3 pages
Scaffolding Works NC Ii
No ratings yet
Scaffolding Works NC Ii
67 pages
Math and Vocabulary For Civil Service Exams
97% (36)
Math and Vocabulary For Civil Service Exams
304 pages
Rotary Pumps
No ratings yet
Rotary Pumps
2 pages
Character Sheet Arcane-Arcade Fallout 2.0 Fillable 222
No ratings yet
Character Sheet Arcane-Arcade Fallout 2.0 Fillable 222
1 page
11th+M+5M+Q+DrKT - (2024-25)
No ratings yet
11th+M+5M+Q+DrKT - (2024-25)
2 pages
Company Profile-Polybond
No ratings yet
Company Profile-Polybond
40 pages
Importance of QAQC in Drill Core Sampling Dr. Sukanta Goswami Sakariya
No ratings yet
Importance of QAQC in Drill Core Sampling Dr. Sukanta Goswami Sakariya
20 pages
References For Sieving Machine
No ratings yet
References For Sieving Machine
4 pages
Mindscapes Class 6 To 8
No ratings yet
Mindscapes Class 6 To 8
27 pages
2024 Ultra Mock 1 English Language 1
No ratings yet
2024 Ultra Mock 1 English Language 1
3 pages
Tema 19
No ratings yet
Tema 19
15 pages
Human Relations Movement
No ratings yet
Human Relations Movement
3 pages
Hyrje Modelim
No ratings yet
Hyrje Modelim
19 pages
Practice Problems - Stoichiometry
100% (1)
Practice Problems - Stoichiometry
2 pages
Chemical Formula
No ratings yet
Chemical Formula
29 pages
3-Entropy As A State Function
No ratings yet
3-Entropy As A State Function
2 pages
Quadratic Oscillations
No ratings yet
Quadratic Oscillations
10 pages
Question Bank Physical Science Class IX Nov 2023
No ratings yet
Question Bank Physical Science Class IX Nov 2023
7 pages
Introduction To Probability
No ratings yet
Introduction To Probability
12 pages
Physical and Chemical Principles
No ratings yet
Physical and Chemical Principles
4 pages
ProteinFoldingIandII PDF
No ratings yet
ProteinFoldingIandII PDF
40 pages
Nuclear Energy: Fusion: Thermonuclear Reaction E MC
No ratings yet

