Simple Regression and Correlation
Business Statistics
Regression and Correlation
• Regression analysis is the process of
constructing a mathematical model or
function that can be used to predict or
determine one variable from another variable.
Product quality:  27 39 73 66 33 43 47 55 60 68 70 75 82
Market share (%):  2  3 10  9  4  6  5  8  7  9 10 13 12
Example (contd.)
• Here,
X = Product quality (independent variable)
Y = Manufacturer's market share (dependent variable)
because we are assuming that the manufacturer's market share (Y)
is affected by the quality of its product (X).
Scatter Plot
[Figure: scatter plot of market share (%) on the vertical axis against product quality on the horizontal axis]
Scatter plots (also called scatter diagrams) are used to investigate the
possible relationship between two variables that both relate to the
same "event".
Estimators of the Coefficients
• The population coefficients β0 and β1 can be estimated
from sample data.
• Let b0 and b1 be the estimators of β0 and β1.
• Then our prediction equation is:
Ŷ = b0 + b1X
where Ŷ is the predicted value for a given X.
• The best way of obtaining the estimators is known as
the least squares method.
Simple Linear Regression Model
(continued)
$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$
[Figure: regression line with intercept β0 and slope β1, showing the observed value of Y for Xi and its error εi]
Interpretation of b0 and b1
• b0 is the estimated average value of Y when the value
of X is zero.
• b1 is the estimated change in the average value of Y
for a one-unit increase in X.
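Sum of squared errors, which the least squares method minimizes:
$$\sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i)^2$$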
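The intermediate step between the sum of squared errors and the estimator formulas is not shown on the slides. A minimal sketch of it follows, assuming the usual calculus argument; S is a label introduced here for the sum of squared errors and is not notation from the source.

```latex
\begin{align*}
S(b_0, b_1) &= \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i)^2 \\
\frac{\partial S}{\partial b_0} &= -2 \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i) = 0
  \;\Rightarrow\; \sum Y_i = n b_0 + b_1 \sum X_i \\
\frac{\partial S}{\partial b_1} &= -2 \sum_{i=1}^{n} X_i (Y_i - b_0 - b_1 X_i) = 0
  \;\Rightarrow\; \sum X_i Y_i = b_0 \sum X_i + b_1 \sum X_i^2
\end{align*}
```

Solving these two equations for b0 and b1 gives the closed-form estimators.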
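Least Squares Method
• Finally we will obtain,
$$b_1 = \frac{SS_{XY}}{SS_X}, \qquad b_0 = \bar{y} - b_1 \bar{x}$$
where,
$$SS_X = \sum (x - \bar{x})^2 = \sum x^2 - \frac{\left(\sum x\right)^2}{n}$$
$$SS_{XY} = \sum (x - \bar{x})(y - \bar{y}) = \sum xy - \frac{\left(\sum x\right)\left(\sum y\right)}{n}$$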
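As an illustration, the formulas for b1 and b0 can be applied to the product quality and market share data from the example. This is a minimal sketch in plain Python; the function name `least_squares` and the variable names are chosen here for illustration and do not come from the slides.

```python
# Least squares estimates from the SS formulas, using the product quality
# (X) and market share (Y) data from the example.
quality = [27, 39, 73, 66, 33, 43, 47, 55, 60, 68, 70, 75, 82]
share = [2, 3, 10, 9, 4, 6, 5, 8, 7, 9, 10, 13, 12]


def least_squares(x, y):
    """Return (b0, b1) with b1 = SS_XY / SS_X and b0 = ybar - b1 * xbar."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    # Computational forms: SS_X = sum(x^2) - (sum x)^2 / n, and similarly SS_XY.
    ss_x = sum(xi * xi for xi in x) - sum_x ** 2 / n
    ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum_x * sum_y / n
    b1 = ss_xy / ss_x
    b0 = sum_y / n - b1 * sum_x / n
    return b0, b1


b0, b1 = least_squares(quality, share)
print(f"Y-hat = {b0:.4f} + {b1:.4f} X")
```

For these data the slope comes out at roughly 0.19, so each additional point of product quality is associated with about 0.19 percentage points of market share.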
Problem
• The following sample data shows the demand for a product in
thousands of units and its price (Rs.) charged in six different
market areas:
Price: 10 18 14 11 16 13
Demand: 125 58 90 100 72 85
Estimate the simple linear regression relationship between
price and demand.
Estimate the demand for the product in a market where it is
priced at Rs. 15.
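One possible way to work this problem is sketched below, using the definitional forms of SS_X and SS_XY. Price is treated as X and demand as Y, which is an assumption based on the wording of the problem; the variable names are illustrative.

```python
# Fit demand (thousands of units) on price (Rs.) and predict demand at Rs. 15.
price = [10, 18, 14, 11, 16, 13]
demand = [125, 58, 90, 100, 72, 85]

n = len(price)
x_bar = sum(price) / n
y_bar = sum(demand) / n

# Definitional forms: sums of squared and cross deviations from the means.
ss_x = sum((x - x_bar) ** 2 for x in price)
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(price, demand))

b1 = ss_xy / ss_x          # slope
b0 = y_bar - b1 * x_bar    # intercept

demand_at_15 = b0 + b1 * 15
print(f"Demand-hat = {b0:.3f} + ({b1:.3f}) * Price")
print(f"Estimated demand at Rs. 15: {demand_at_15:.2f} thousand units")
```

The fitted slope is about -7.33 and the intercept about 188.52, giving an estimated demand of roughly 78.56 thousand units at a price of Rs. 15.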
Error Variance
• The error variance σ² is a measure of the spread of
the population elements about the regression line.
• Generally, the smaller the error variance, the more closely
the population elements follow the regression line.
• An unbiased estimator of σ², denoted by s², is the mean
squared error (MSE) of the regression.
• The estimate s = √(MSE) of the standard deviation of
the regression errors is called the standard error of
estimate.
Computation of MSE
• Degrees of freedom (error) = n-2
$$SSE = \sum (Y - \hat{Y})^2 = SS_Y - \frac{(SS_{XY})^2}{SS_X} = SS_Y - b_1 SS_{XY}$$
$$MSE = \frac{SSE}{n-2}$$
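The shortcut SSE = SS_Y - b1 SS_XY can be checked against the direct sum of squared residuals. The sketch below does this for the price and demand data from the Problem slide; using that data set here is a choice made for illustration.

```python
import math

price = [10, 18, 14, 11, 16, 13]
demand = [125, 58, 90, 100, 72, 85]
n = len(price)

sum_x, sum_y = sum(price), sum(demand)
ss_x = sum(x * x for x in price) - sum_x ** 2 / n
ss_y = sum(y * y for y in demand) - sum_y ** 2 / n
ss_xy = sum(x * y for x, y in zip(price, demand)) - sum_x * sum_y / n

b1 = ss_xy / ss_x
b0 = sum_y / n - b1 * sum_x / n

# Shortcut formula for the error sum of squares.
sse = ss_y - b1 * ss_xy
# Direct definition: sum of squared residuals about the fitted line.
sse_direct = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(price, demand))

mse = sse / (n - 2)     # error degrees of freedom = n - 2
s = math.sqrt(mse)      # standard error of estimate

print(f"SSE = {sse:.4f} (direct: {sse_direct:.4f})")
print(f"MSE = {mse:.4f}, s = {s:.4f}")
```

Both routes to SSE agree up to floating-point rounding, which is a quick sanity check when computing by hand.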
Standard Error of Regression Coefficients
The standard error of b0 (intercept):
$$s(b_0) = \frac{s\sqrt{\sum x^2}}{\sqrt{n\,SS_X}}$$
The standard error of b1 (slope):
$$s(b_1) = \frac{s}{\sqrt{SS_X}}$$
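A minimal sketch of both standard errors follows, again assuming the price and demand data from the Problem slide as the sample. The helper name `fit_summary` is hypothetical.

```python
import math

price = [10, 18, 14, 11, 16, 13]
demand = [125, 58, 90, 100, 72, 85]


def fit_summary(x, y):
    """Return n, xbar, SS_X, sum of x^2, and s (standard error of estimate)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_x2 = sum(xi * xi for xi in x)
    ss_x = sum_x2 - sum_x ** 2 / n
    ss_y = sum(yi * yi for yi in y) - sum_y ** 2 / n
    ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum_x * sum_y / n
    b1 = ss_xy / ss_x
    s = math.sqrt((ss_y - b1 * ss_xy) / (n - 2))
    return n, sum_x / n, ss_x, sum_x2, s


n, x_bar, ss_x, sum_x2, s = fit_summary(price, demand)

se_b0 = s * math.sqrt(sum_x2) / math.sqrt(n * ss_x)   # intercept
se_b1 = s / math.sqrt(ss_x)                           # slope

print(f"s(b0) = {se_b0:.4f}, s(b1) = {se_b1:.4f}")
```

The intercept's standard error is much larger than the slope's here because the observed prices lie far from X = 0.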
Confidence Intervals for the Regression
Coefficients
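A (1 - α)100% confidence interval for each coefficient:
$$b_0 \pm t_{\alpha/2}\, s(b_0), \qquad b_1 \pm t_{\alpha/2}\, s(b_1)$$
A (1 - α)100% prediction interval for Y at a given x:
$$\hat{y} \pm t_{\alpha/2}\, s \sqrt{1 + \frac{1}{n} + \frac{(x - \bar{x})^2}{SS_X}}$$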
where tα/2 is based on (n-2) degrees of freedom.
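The interval for a predicted value of Y can be sketched as follows for the price and demand data at a price of Rs. 15. The critical value 2.776 is the standard t-table entry for a 95% interval with n - 2 = 4 degrees of freedom; it is hard-coded here as an assumption rather than computed.

```python
import math

price = [10, 18, 14, 11, 16, 13]
demand = [125, 58, 90, 100, 72, 85]
n = len(price)

sum_x, sum_y = sum(price), sum(demand)
x_bar = sum_x / n
ss_x = sum(x * x for x in price) - sum_x ** 2 / n
ss_y = sum(y * y for y in demand) - sum_y ** 2 / n
ss_xy = sum(x * y for x, y in zip(price, demand)) - sum_x * sum_y / n

b1 = ss_xy / ss_x
b0 = sum_y / n - b1 * x_bar
s = math.sqrt((ss_y - b1 * ss_xy) / (n - 2))

T_CRIT = 2.776   # assumed: t(0.025) with 4 degrees of freedom, from a t-table
x_new = 15

y_hat = b0 + b1 * x_new
half_width = T_CRIT * s * math.sqrt(1 + 1 / n + (x_new - x_bar) ** 2 / ss_x)
lower, upper = y_hat - half_width, y_hat + half_width

print(f"95% interval at price {x_new}: ({lower:.2f}, {upper:.2f})")
```

With only six observations the interval is wide, roughly 24 thousand units either side of the point estimate.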
Hypothesis Tests for the Slope
of the Regression Model
• Suppose our null and alternative hypotheses are (for airline
cost data):
H0 : β1 = 0
H1 : β1 ≠ 0
• We have to calculate t-statistic,
$$t = \frac{b_1 - \beta_{1(H_0)}}{S_b}$$
where, b1 = slope of the fitted regression
β1(H0) = actual slope hypothesized for the population
Sb = Standard error of the regression coefficient
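The airline cost data mentioned above is not included in this section, so the sketch below applies the same test to the price and demand data from the Problem slide instead. The 5% two-sided critical value 2.776 (4 degrees of freedom) is taken from a t-table and hard-coded as an assumption.

```python
import math

price = [10, 18, 14, 11, 16, 13]
demand = [125, 58, 90, 100, 72, 85]
n = len(price)

sum_x, sum_y = sum(price), sum(demand)
ss_x = sum(x * x for x in price) - sum_x ** 2 / n
ss_y = sum(y * y for y in demand) - sum_y ** 2 / n
ss_xy = sum(x * y for x, y in zip(price, demand)) - sum_x * sum_y / n

b1 = ss_xy / ss_x
s = math.sqrt((ss_y - b1 * ss_xy) / (n - 2))
s_b = s / math.sqrt(ss_x)        # standard error of the slope

beta1_h0 = 0.0                   # slope under H0: beta1 = 0
t_stat = (b1 - beta1_h0) / s_b

T_CRIT = 2.776                   # assumed: t(0.025), n - 2 = 4 df
reject_h0 = abs(t_stat) > T_CRIT

print(f"t = {t_stat:.3f}, reject H0 at the 5% level: {reject_h0}")
```

Here |t| exceeds the critical value, so for this sample the hypothesis of zero slope would be rejected at the 5% level.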
Correlation Analysis
• Correlation analysis is the statistical tool we
can use to describe the degree to which one
variable is linearly related to another.
• The population correlation, denoted by ρ, can
take on any value from -1 to 1.
• The strength of the linear relationship can also be measured
by calculating the coefficient of determination, r².
Correlation
• ρ = -1 indicates a perfect negative linear relationship
• -1 < ρ < 0 indicates a negative linear relationship
• ρ = 0 indicates no linear relationship
• 0 < ρ < 1 indicates a positive linear relationship
• ρ = 1 indicates a perfect positive linear relationship
The absolute value of ρ indicates the strength or exactness of the relationship.
Illustrations of Correlation
[Figure: six scatter plots of Y against X illustrating ρ = -1, ρ = 0 and ρ = 1 (first row) and ρ = -.8, ρ = 0 and ρ = .8 (second row)]
Covariance and Correlation
The covariance of two random variables X and Y:
$$\mathrm{Cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)]$$
where μX and μY are the population means of X and Y respectively.
The population correlation coefficient:
$$\rho = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y}$$
The sample correlation coefficient*:
$$r = \frac{SS_{XY}}{\sqrt{SS_X\, SS_Y}}$$
*Note: If r < 0, b1 < 0; if r = 0, b1 = 0; if r > 0, b1 > 0. The signs agree because r and b1 share the numerator SS_XY.
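The sample correlation coefficient and the coefficient of determination can be computed from the same sums of squares used for the regression. The sketch below uses the price and demand data from the Problem slide as an illustrative sample.

```python
import math

price = [10, 18, 14, 11, 16, 13]
demand = [125, 58, 90, 100, 72, 85]
n = len(price)

sum_x, sum_y = sum(price), sum(demand)
ss_x = sum(x * x for x in price) - sum_x ** 2 / n
ss_y = sum(y * y for y in demand) - sum_y ** 2 / n
ss_xy = sum(x * y for x, y in zip(price, demand)) - sum_x * sum_y / n

r = ss_xy / math.sqrt(ss_x * ss_y)   # sample correlation coefficient
r_squared = r ** 2                   # coefficient of determination
b1 = ss_xy / ss_x                    # slope, for the sign comparison

print(f"r = {r:.4f}, r^2 = {r_squared:.4f}, b1 = {b1:.4f}")
```

For this sample r is about -0.95, a strong negative linear relationship, and it has the same sign as the fitted slope b1.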
Examples of Approximate r² Values
[Figure: three scatter plots of Y against X illustrating r² = 1, 0 < r² < 1, and r² = 0]