Simple Linear Regression Analysis - Final

The document summarizes key concepts in simple linear regression, including:
• The simple linear regression model relates a response variable Y to a single predictor X using a straight line.
• The least squares method is used to estimate the intercept β0 and slope β1 parameters by minimizing the sum of squared residuals.
• Hypothesis tests can be conducted on the slope and intercept parameters using t-tests, assuming the errors are normally distributed.
• The coefficient of determination R² indicates how well the model fits the data.


Lecture 03 - 07

Isuru Kumarasiri
Content
• Simple Linear Regression Model
• Least squares estimation of the parameters
• Hypothesis Testing on the slope and intercept
• Interval Estimation in Simple Linear Regression
• Prediction of new observations
• Coefficient of Determination
• Estimation by maximum likelihood
Simple Linear Regression Model
• The simple linear regression model is a model with a single regressor x whose relationship with a response y is a straight line.
• This simple linear regression model is

$$y = \beta_0 + \beta_1 x + \varepsilon$$

• where the intercept β0 and the slope β1 are unknown constants and ε is a random error component.
• The errors are assumed to have mean zero and unknown variance σ2.
Simple Linear Regression Model Contd..
• Additionally, we usually assume that the errors are uncorrelated. This means that the value of one error does not depend on the value of any other error.

• The parameters β0 and β1 are usually called regression coefficients.


Least squares estimation of the parameters
• The parameters β0 and β1 are unknown and must be estimated using
sample data.
• That is, we estimate β0 and β1 so that the sum of the squares of the
differences between the observations yi and the straight line is a
minimum.

• Thus, the least-squares criterion is

$$S(\beta_0, \beta_1) = \sum_{i=1}^{n} \left(y_i - \beta_0 - \beta_1 x_i\right)^2$$

Least squares estimation of the parameters Contd…
• Differentiating S(β0, β1) with respect to β0 and β1 and setting the results equal to zero gives two equations. Simplifying these two equations yields the least-squares normal equations

$$n\hat{\beta}_0 + \hat{\beta}_1 \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i$$

and

$$\hat{\beta}_0 \sum_{i=1}^{n} x_i + \hat{\beta}_1 \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i y_i$$
Least squares estimation of the parameters Contd…
• The fitted simple linear regression model is then

$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$$

• Question: Obtain the regression equation of Y on X, and estimate Y when X = 55, from the following data, by hand and using R software.

Ans: Y = 0.942X + 6.08
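
A minimal R sketch for the software part of this question (the slide's data table is not reproduced here, so the vectors x and y below are assumed to hold the X and Y values from that table):

# Assumed: x and y are numeric vectors holding the data from the table
fit <- lm(y ~ x)                              # least-squares fit of Y on X
coef(fit)                                     # intercept and slope estimates
predict(fit, newdata = data.frame(x = 55))    # estimated Y at X = 55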
Least squares estimation of the parameters Contd…
• Solving the normal equations gives

$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}, \qquad \hat{\beta}_1 = \frac{\sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}}{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}$$

• Therefore $\hat{\beta}_0$ and $\hat{\beta}_1$ are the least-squares estimators of the intercept and slope, respectively. The fitted simple linear regression model is then

$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$$

• The above equation gives a point estimate of the mean of y for a particular x.
New Notations
• Define

$$s_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}, \qquad s_{xy} = \sum_{i=1}^{n} y_i (x_i - \bar{x}) = \sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}$$

• Thus, a convenient way to write $\hat{\beta}_1$ is

$$\hat{\beta}_1 = \frac{s_{xy}}{s_{xx}}$$
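
As a sketch, these quantities can be computed directly in R; the vectors x and y are again assumed to hold the sample data:

Sxx <- sum((x - mean(x))^2)       # corrected sum of squares of x
Sxy <- sum((x - mean(x)) * y)     # corrected sum of cross products
b1  <- Sxy / Sxx                  # slope estimate
b0  <- mean(y) - b1 * mean(x)     # intercept estimate
c(b0, b1)                         # should agree with coef(lm(y ~ x))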
Residuals
• The difference between the observed value yi and the corresponding fitted value ŷi is called a residual. Mathematically, the ith residual is

$$e_i = y_i - \hat{y}_i = y_i - \left(\hat{\beta}_0 + \hat{\beta}_1 x_i\right), \qquad i = 1, 2, \ldots, n$$

• Residuals play an important role in investigating model adequacy and in detecting departures from the underlying assumptions. This topic is discussed later.
Residuals Contd…
• After obtaining the least-squares fit, several interesting questions
come to mind:
1. How well does this equation fit the data?
2. Is the model likely to be useful as a predictor?
3. Are any of the basic assumptions (such as constant variance and uncorrelated
errors) violated, and if so, how serious is this?
• All of these issues must be investigated before the model is finally
adopted for use. As noted previously, the residuals play a key role in
evaluating model adequacy.
Hypothesis Testing on the slope and intercept
• We are often interested in testing hypotheses and constructing
confidence intervals about the model parameters.
• These procedures require that we make the additional assumption
that the model errors εi are normally distributed.
• Thus, the complete assumptions are that the errors are normally and
independently distributed with mean 0 and variance σ2, abbreviated
NID(0, σ2 ).
Testing using t Tests
• Suppose that we wish to test the hypothesis that the slope equals a constant, say β10. The appropriate hypotheses are

$$H_0: \beta_1 = \beta_{10}, \qquad H_1: \beta_1 \neq \beta_{10}$$

• where we have specified a two-sided alternative.
• If σ² is known, the test statistic is

$$Z_0 = \frac{\hat{\beta}_1 - \beta_{10}}{\sqrt{\sigma^2 / s_{xx}}}$$

• which follows the N(0, 1) (standard normal) distribution when H0 is true.
When σ is unknown
• When σ² is unknown, we can show that MSRes is an unbiased estimator of σ².
• The test statistic is then

$$t_0 = \frac{\hat{\beta}_1 - \beta_{10}}{\operatorname{se}(\hat{\beta}_1)}$$

where

$$\operatorname{se}(\hat{\beta}_1) = \sqrt{\frac{MS_{Res}}{s_{xx}}}$$

and

$$MS_{Res} = \frac{SS_{Res}}{n - 2} = \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n - 2}$$

• t0 follows a t distribution with n − 2 degrees of freedom if the null hypothesis H0: β1 = β10 is true.
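
A sketch of this t test in R, computed from first principles rather than read off summary() (x and y are assumed to hold the data; the hypothesized slope β10 is taken as 0):

fit   <- lm(y ~ x)
n     <- length(y)
MSRes <- sum(resid(fit)^2) / (n - 2)       # unbiased estimator of sigma^2
Sxx   <- sum((x - mean(x))^2)
se_b1 <- sqrt(MSRes / Sxx)                 # standard error of the slope
t0    <- (coef(fit)["x"] - 0) / se_b1      # test statistic for H0: beta1 = 0
2 * pt(-abs(t0), df = n - 2)               # two-sided p-value, matches summary(fit)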


Test Hypothesis about Intercept
• A similar procedure can be used to test hypotheses about the intercept. To test

$$H_0: \beta_0 = \beta_{00}, \qquad H_1: \beta_0 \neq \beta_{00}$$

• we would use the test statistic

$$t_0 = \frac{\hat{\beta}_0 - \beta_{00}}{\sqrt{MS_{Res}\left(\frac{1}{n} + \frac{\bar{x}^2}{s_{xx}}\right)}} = \frac{\hat{\beta}_0 - \beta_{00}}{\operatorname{se}(\hat{\beta}_0)}$$

• We reject the null hypothesis if $|t_0| > t_{\alpha/2,\, n-2}$.

Testing the significance of Regression line
• A very important special case of the hypotheses is

$$H_0: \beta_1 = 0, \qquad H_1: \beta_1 \neq 0$$
• These hypotheses relate to the significance of a regression.


• Failing to reject H0: β1 = 0 implies that there is no linear relationship
between x and y
• Alternatively, if H0: β1 = 0 is rejected, this implies that x is of value in
explaining the variability in y.
Testing the significance of Regression line Contd…

• However, rejecting H0: β1 = 0 could mean either that the straight-line model is adequate or that, even though there is a linear effect of x, better results could be obtained with the addition of higher-order polynomial terms in x.
Testing the significance of Regression line Contd…
• The test procedure for H0: β1 = 0 may be developed by simply making use of the t statistic with β10 = 0.
• The null hypothesis of significance of regression would be rejected if $|t_0| > t_{\alpha/2,\, n-2}$.
Example 1

• Consider the dataset (rocket propellant) and answer the following questions.
1. Estimate the model parameters.
2. Estimate σ².
3. Test for significance of regression.
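
A minimal R sketch for this example; the data frame propellant with columns age (weeks) and strength (psi) is an assumed layout for the rocket propellant data, not something fixed by the slides:

fit <- lm(strength ~ age, data = propellant)
summary(fit)    # (1) and (3): parameter estimates and the t test for significance
sigma(fit)^2    # (2): estimate of sigma^2, i.e. MSRes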
Example 2

• The purity of oxygen produced by a fractional distillation process is thought to be related to the percentage of hydrocarbons in the main condenser of the processing unit. Twenty samples are shown.
• Fit a simple linear regression model to the data.
• Test the hypothesis H0: β1 = 0.
Several other Properties of the Least-Squares Fit
• The sum of the residuals is always zero: $\sum_{i=1}^{n} e_i = 0$, and consequently $\sum_{i=1}^{n} y_i = \sum_{i=1}^{n} \hat{y}_i$.
• The least-squares line always passes through the centroid $(\bar{x}, \bar{y})$ of the data.
• The residuals are orthogonal to both the regressor and the fitted values: $\sum_{i=1}^{n} x_i e_i = 0$ and $\sum_{i=1}^{n} \hat{y}_i e_i = 0$.
Analysis of Variance
• We may also use an analysis-of-variance approach to test the
significance of a regression.
• The analysis of variance is based on a partitioning of the total
variability in the response variable y.
• To obtain this partitioning, begin with the identity

$$y_i - \bar{y} = (\hat{y}_i - \bar{y}) + (y_i - \hat{y}_i)$$
Analysis of Variance Contd..
• Squaring both sides of the identity and summing over all n observations gives (the cross-product term vanishes by the properties of the least-squares fit)

$$\sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

• that is, $SS_T = SS_R + SS_{Res}$, with n − 1, 1, and n − 2 degrees of freedom, respectively.
ANOVA Contd…

• We can use the usual analysis-of-variance F test to test the hypothesis H0: β1 = 0.
• The test statistic is

$$F_0 = \frac{SS_R / 1}{SS_{Res} / (n - 2)} = \frac{MS_R}{MS_{Res}}$$

• which follows the F(1, n − 2) distribution when H0 is true.
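
In R, the same F test can be read from the ANOVA table of a fitted model (a sketch, with fit as in the earlier examples):

anova(fit)    # SSR, SSRes, mean squares, F0 and its p-value
# For simple linear regression, F0 equals the square of the slope's t statistic.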


Standard Error
• The denominator of the test statistic t0 is often called the estimated standard error, or more simply, the standard error of the slope. That is,

$$\operatorname{se}(\hat{\beta}_1) = \sqrt{\frac{MS_{Res}}{s_{xx}}}$$

• Similarly, for the intercept,

$$\operatorname{se}(\hat{\beta}_0) = \sqrt{MS_{Res}\left(\frac{1}{n} + \frac{\bar{x}^2}{s_{xx}}\right)}$$

Interval Estimation in Simple Linear Regression
• Here, we consider confidence interval estimation of the regression
model parameters β0, β1
• If the errors are normally and independently distributed, then a 100(1 − α) percent confidence interval (CI) on the slope β1 is given by

$$\hat{\beta}_1 - t_{\alpha/2,\,n-2}\,\operatorname{se}(\hat{\beta}_1) \;\le\; \beta_1 \;\le\; \hat{\beta}_1 + t_{\alpha/2,\,n-2}\,\operatorname{se}(\hat{\beta}_1)$$

• and a 100(1 − α) percent CI on the intercept β0 is

$$\hat{\beta}_0 - t_{\alpha/2,\,n-2}\,\operatorname{se}(\hat{\beta}_0) \;\le\; \beta_0 \;\le\; \hat{\beta}_0 + t_{\alpha/2,\,n-2}\,\operatorname{se}(\hat{\beta}_0)$$


Question

• Construct 95% CIs on β1 using the rocket propellant data from the Example.
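
A sketch of this in R (fit assumed to be the lm object fitted to the rocket propellant data):

confint(fit, level = 0.95)    # 95% CIs on the intercept and the slope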
Covariance
• Covariance measures the direction of the linear relationship between two variables.
• A positive covariance means that both variables tend to be high or
low at the same time.
• A negative covariance means that when one variable is high, the
other tends to be low.
Properties of Covariance
• Let X1 and X2 denote random variables and let a, b, c, d denote some constants. Then, the following properties hold:

$$\operatorname{Cov}(X_1, X_2) = E(X_1 X_2) - E(X_1)E(X_2)$$

$$\operatorname{Cov}(X_1, X_1) = \operatorname{Var}(X_1), \qquad \operatorname{Cov}(X_1, X_2) = \operatorname{Cov}(X_2, X_1)$$

$$\operatorname{Cov}(a X_1 + b,\; c X_2 + d) = a c \operatorname{Cov}(X_1, X_2)$$
Questions
1. Show that $s_{xy} = \sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}$ and $s_{xx} = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}$.
2. Assuming that ___ = 0, show that
a.
b.
c.
Interval Estimation of the Mean Response
• A major use of a regression model is to estimate the mean response
E(y) for a particular value of the regressor variable x.
• Let x0 be the level of the regressor variable for which we wish to
estimate the mean response, say E(y|x0).
• An unbiased point estimator of E(y|x0) is found from the fitted model as

$$\widehat{E(y \mid x_0)} = \hat{\mu}_{y|x_0} = \hat{\beta}_0 + \hat{\beta}_1 x_0$$
Interval Estimation of the Mean Response Contd…
• To obtain a 100(1 − α) percent CI on E(y|x0), use

$$\hat{\mu}_{y|x_0} - t_{\alpha/2,\,n-2}\sqrt{MS_{Res}\left(\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{s_{xx}}\right)} \;\le\; E(y \mid x_0) \;\le\; \hat{\mu}_{y|x_0} + t_{\alpha/2,\,n-2}\sqrt{MS_{Res}\left(\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{s_{xx}}\right)}$$
Question
• Find a 95% CI on E(y|x0) for the rocket propellant data in the Example (i.e., obtain the 95% CI on the mean response at x = x0).
• For x0 = x̄, show that the CI reduces to

$$\bar{y} \pm t_{\alpha/2,\,n-2}\sqrt{\frac{MS_{Res}}{n}}$$
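
A sketch of this interval in R (fit and the predictor name age are the assumed objects from the rocket propellant example; x0 is the regressor value of interest):

predict(fit, newdata = data.frame(age = x0),
        interval = "confidence", level = 0.95)    # fit, lwr, upr for E(y | x0)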


Prediction of New Observation
• An important application of the regression model is the prediction of
new observations y corresponding to a specified level of the regressor
variable x.
• If x0 is the value of the regressor variable of interest, then the point estimate of the new observation y0 is

$$\hat{y}_0 = \hat{\beta}_0 + \hat{\beta}_1 x_0$$

• We now develop an interval estimate, a prediction interval, for this future observation y0.
Prediction of New Observation Contd…
• Note that the random variable

$$\psi = y_0 - \hat{y}_0$$

• is normally distributed with mean zero and variance

$$\operatorname{Var}(\psi) = \sigma^2\left(1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{s_{xx}}\right)$$

• since the future observation y0 is independent of $\hat{y}_0$. Thus, the 100(1 − α) percent prediction interval on a future observation at x0 is

$$\hat{y}_0 - t_{\alpha/2,\,n-2}\sqrt{MS_{Res}\left(1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{s_{xx}}\right)} \;\le\; y_0 \;\le\; \hat{y}_0 + t_{\alpha/2,\,n-2}\sqrt{MS_{Res}\left(1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{s_{xx}}\right)}$$
Question
• Find a 95% prediction interval on a future value of propellant shear
strength in a motor made from a batch of sustainer propellant that is
10 weeks old.
Ans:
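
A sketch of this computation in R (fit and the column name age as assumed before; a batch that is 10 weeks old corresponds to age = 10):

predict(fit, newdata = data.frame(age = 10),
        interval = "prediction", level = 0.95)    # 95% prediction interval for y0
# Note the wider limits than the confidence interval on the mean response.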
Question
• Try page 64 Question (2.18) in the Reference book
Coefficient of Determination
• The quantity

$$R^2 = \frac{SS_R}{SS_T} = 1 - \frac{SS_{Res}}{SS_T}$$

is called the coefficient of determination, and 0 ≤ R² ≤ 1.
• R² is often called the proportion of variation explained by the regressor x.
• Show that R² for the previous example is 0.9018; that is, 90.18% of the variability in strength is accounted for by the regression model.
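
In R (a sketch, with fit as before), R² can be read directly from the fitted model:

summary(fit)$r.squared    # proportion of the variability in y explained by x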
Question
Maximum Likelihood Estimates
• The MLE is an example of a point estimate because it gives a single
value for the unknown parameter.
• Given data, the maximum likelihood estimate (MLE) for a parameter p is the value of p that maximizes the likelihood P(data | p). That is, the MLE is the value of p for which the observed data are most likely.
• It is often easier to work with the natural log of the likelihood function; for short, this is simply called the log-likelihood. Since ln(x) is an increasing function, the maxima of the likelihood and log-likelihood coincide.
Example
Example Contd…
Estimation (β0 and β1) by Maximum Likelihood
• Consider the data ( yi , xi ), i = 1, 2, . . . , n . If we assume that the
errors in the regression model are NID(0, σ2 ), then the observations yi
in this sample are normally and independently distributed random
variables with mean β0 + β1xi and variance σ2.
• The likelihood function is

$$L(\beta_0, \beta_1, \sigma^2) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{(y_i - \beta_0 - \beta_1 x_i)^2}{2\sigma^2}\right] = (2\pi\sigma^2)^{-n/2} \exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i - \beta_0 - \beta_1 x_i)^2\right]$$
Estimation (β0 and β1) by Maximum Likelihood Contd…
• Show that the maximum-likelihood estimators are

$$\tilde{\beta}_0 = \bar{y} - \tilde{\beta}_1 \bar{x}, \qquad \tilde{\beta}_1 = \frac{\sum_{i=1}^{n} y_i (x_i - \bar{x})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad \tilde{\sigma}^2 = \frac{\sum_{i=1}^{n} \left(y_i - \tilde{\beta}_0 - \tilde{\beta}_1 x_i\right)^2}{n}$$
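
As a sketch, the MLEs can also be obtained numerically in R and compared with the least-squares fit (x and y are assumed to hold the data; the starting values passed to optim are arbitrary):

negloglik <- function(par) {
  b0 <- par[1]; b1 <- par[2]; s2 <- par[3]
  # negative log-likelihood of the normal-errors regression model
  0.5 * length(y) * log(2 * pi * s2) + sum((y - b0 - b1 * x)^2) / (2 * s2)
}
mle <- optim(c(0, 1, 1), negloglik, method = "L-BFGS-B",
             lower = c(-Inf, -Inf, 1e-8))$par
mle[1:2]    # agree with coef(lm(y ~ x))
mle[3]      # the sigma^2 MLE divides SSRes by n, not n - 2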
End of Simple Linear Regression Part !!!

Questions?
