REGRESSION
Regression is the estimation or prediction of unknown values of one variable from known values of another
variable. Once a correlation between two variables has been established, it is natural to ask how much
one variable changes in response to a given change in the other, i.e., to ask about the
nature of the relationship between the two variables.
Regression measures the nature and extent of correlation.
LINEAR REGRESSION:
If two variates x and y are correlated, i.e., there exists an association or relationship between them, then the
points of the scatter diagram will be more or less concentrated around a curve. This curve is called the curve of
regression and the relationship is said to be expressed by means of curvilinear regression. In the particular case
when the curve is a straight line, it is called a line of regression and the regression is said to be linear.
A line of regression is the straight line which gives the best fit, in the least squares sense, to the given
data.
If the line of regression is chosen so that the sum of squares of the deviations parallel to the axis of y is
minimised, it is called the line of regression of y on x, and it gives the best estimate of y for any given value of x.
If the line of regression is chosen so that the sum of squares of the deviations parallel to the axis of x is
minimised, it is called the line of regression of x on y, and it gives the best estimate of x for any given value of y.
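The two definitions above can be illustrated with a minimal Python sketch. It uses the standard least-squares results that the slope of the line of y on x is Σ(x − x̄)(y − ȳ)/Σ(x − x̄)² and the slope of the line of x on y is Σ(x − x̄)(y − ȳ)/Σ(y − ȳ)². The function name `regression_lines` and the five data points are hypothetical, chosen only for illustration.

```python
from statistics import fmean

def regression_lines(xs, ys):
    """Return (x_bar, y_bar, b_yx, b_xy) for paired observations.

    b_yx is the slope of the line of regression of y on x
    (vertical deviations minimised); b_xy is the slope of the line
    of regression of x on y (horizontal deviations minimised).
    """
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return x_bar, y_bar, sxy / sxx, sxy / syy

# Hypothetical data, not taken from the text.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
x_bar, y_bar, b_yx, b_xy = regression_lines(xs, ys)

# Line of y on x:  y - y_bar = b_yx * (x - x_bar)
# Line of x on y:  x - x_bar = b_xy * (y - y_bar)
print(x_bar, y_bar, b_yx, b_xy)
```

Both lines pass through the point of means, but in general they have different slopes, which is why each should be used only for the estimate it was fitted for.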
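[Figure: two scatter diagrams with axes OX and OY, each showing a line AB, a data point P_i(x_i, y_i) and the point H on AB reached from P_i by moving parallel to an axis: H(x_i, y_j) for deviations parallel to the y-axis and H(x_j, y_i) for deviations parallel to the x-axis.]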
Its equation is $y - \bar{y} = r\frac{\sigma_y}{\sigma_x}(x - \bar{x})$, the line of regression of y on x.
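Alternative Method:
Instead of calculating $\bar{x}$, $\bar{y}$, $\sigma_x$, $\sigma_y$ and $r$, we may use the following method.
Find the sums $\sum x$, $\sum y$, $\sum xy$ and $\sum x^2$.
Solve the equations $\sum y = na + b\sum x$ and $\sum xy = a\sum x + b\sum x^2$ simultaneously for $a$ and $b$;
we get the required equation $y = a + bx$. The above equations are called normal equations.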
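As a rough illustration of the alternative method, the sketch below solves the two normal equations for the line y = a + bx, taken here in their usual form Σy = na + bΣx and Σxy = aΣx + bΣx². The function name `solve_normal_equations` and the data are hypothetical.

```python
def solve_normal_equations(n, sum_x, sum_y, sum_xy, sum_x2):
    """Solve  sum_y  = n*a     + b*sum_x
              sum_xy = a*sum_x + b*sum_x2
    for (a, b) by elimination."""
    det = n * sum_x2 - sum_x ** 2
    if det == 0:
        raise ValueError("all x values are equal; the line is not determined")
    b = (n * sum_xy - sum_x * sum_y) / det
    a = (sum_y - b * sum_x) / n
    return a, b

# Hypothetical data, not taken from the text.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = solve_normal_equations(
    len(xs),
    sum(xs),
    sum(ys),
    sum(x * y for x, y in zip(xs, ys)),
    sum(x * x for x in xs),
)
print(a, b)  # intercept and slope of the line of y on x
```

Only the four sums and n are needed, so the means, standard deviations and r never have to be computed separately.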
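(iii) If $r = 0$, the two lines of regression become $y = \bar{y}$ and $x = \bar{x}$, which are two straight lines parallel to the X
and Y axes respectively and passing through the means $\bar{y}$ and $\bar{x}$. They are mutually perpendicular.
(iv) If $r = \pm 1$, the two lines of regression will coincide.
(v) $b_{yx} = r\frac{\sigma_y}{\sigma_x} = \frac{\overline{xy} - \bar{x}\,\bar{y}}{\overline{x^2} - \bar{x}^2}$ OR $b_{yx} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}$
(vi) $b_{xy} = r\frac{\sigma_x}{\sigma_y} = \frac{\overline{xy} - \bar{x}\,\bar{y}}{\overline{y^2} - \bar{y}^2}$ OR $b_{xy} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (y - \bar{y})^2}$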
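Property II. If one of the regression coefficients is greater than unity, the other must be less than unity.
Proof: Let $b_{yx} > 1$, so that $\frac{1}{b_{yx}} < 1$.
Since $b_{yx} \cdot b_{xy} = r^2 \le 1$, we get $b_{xy} \le \frac{1}{b_{yx}} < 1$.
Similarly, if $b_{xy} > 1$ then $b_{yx} < 1$.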
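Property III. The arithmetic mean of the regression coefficients is greater than or equal to the correlation coefficient (for $r > 0$).
Proof: We have to prove that $\frac{1}{2}(b_{yx} + b_{xy}) \ge r$, i.e. $\frac{1}{2}\left(r\frac{\sigma_y}{\sigma_x} + r\frac{\sigma_x}{\sigma_y}\right) \ge r$
or $\sigma_y^2 + \sigma_x^2 \ge 2\sigma_x\sigma_y$ or $(\sigma_x - \sigma_y)^2 \ge 0$, which is true.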
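Properties II and III can be spot-checked numerically. The sketch below builds the two regression coefficients from the standard relations b_yx = rσ_y/σ_x and b_xy = rσ_x/σ_y and verifies, for a few hypothetical positive values of r, σ_x and σ_y, that their product equals r², that they are never both greater than unity, and that their arithmetic mean is at least r. This is a check on sample values, not a proof.

```python
import math

def check_properties(r, sigma_x, sigma_y):
    """Return (b_yx, b_xy, product_ok, not_both_above_one, mean_ok)."""
    b_yx = r * sigma_y / sigma_x
    b_xy = r * sigma_x / sigma_y
    product_ok = math.isclose(b_yx * b_xy, r * r)
    not_both_above_one = not (abs(b_yx) > 1 and abs(b_xy) > 1)
    # Small tolerance guards against floating-point rounding.
    mean_ok = (b_yx + b_xy) / 2 >= r - 1e-12
    return b_yx, b_xy, product_ok, not_both_above_one, mean_ok

# Hypothetical values of r > 0 and of the standard deviations.
results = [
    check_properties(r, sx, sy)
    for r in (0.2, 0.6, 0.9)
    for sx in (1.0, 2.5)
    for sy in (0.5, 3.0)
]
all_hold = all(p and n and m for _, _, p, n, m in results)
print(all_hold)
```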
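Property IV. Regression coefficients are independent of the origin but not of the scale.
Proof: Let $u = \frac{x - a}{h}$, $v = \frac{y - b}{k}$, where $a$, $b$, $h$ and $k$ are constants. Then $x - \bar{x} = h(u - \bar{u})$ and $y - \bar{y} = k(v - \bar{v})$, so
$b_{yx} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = \frac{hk\sum (u - \bar{u})(v - \bar{v})}{h^2\sum (u - \bar{u})^2} = \frac{k}{h}\,b_{vu}$
Similarly $b_{xy} = \frac{h}{k}\,b_{uv}$
Thus, $b_{yx}$ and $b_{xy}$ are both independent of $a$ and $b$ but not of $h$ and $k$.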
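Property IV can be seen on a small example. The sketch below assumes the substitution u = (x − a)/h, v = (y − b)/k and compares the regression coefficient of y on x before and after a shift of origin alone and after a shift plus a change of scale. The data and the constants a, b, h, k are hypothetical.

```python
import math
from statistics import fmean

def slope_y_on_x(xs, ys):
    """Regression coefficient of the second variable on the first."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical data and constants, not taken from the text.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b, h, k = 10.0, -3.0, 2.0, 4.0

b_yx = slope_y_on_x(xs, ys)

# Change of origin only: the coefficient is unchanged.
b_shifted = slope_y_on_x([x - a for x in xs], [y - b for y in ys])

# Change of origin and scale: b_yx = (k / h) * b_vu.
us = [(x - a) / h for x in xs]
vs = [(y - b) / k for y in ys]
b_vu = slope_y_on_x(us, vs)

print(b_yx, b_shifted, b_vu, (k / h) * b_vu)
print(math.isclose(b_yx, (k / h) * b_vu))
```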
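Property V. The correlation coefficient and the two regression coefficients have the same sign.
Proof: Regression coefficient of $y$ on $x$ $= b_{yx} = r\frac{\sigma_y}{\sigma_x}$
Regression coefficient of $x$ on $y$ $= b_{xy} = r\frac{\sigma_x}{\sigma_y}$
Since $\sigma_x$ and $\sigma_y$ are both positive, $b_{yx}$, $b_{xy}$ and $r$ have the same sign.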
where $r$, $\sigma_x$, $\sigma_y$ have their usual meaning. Explain the significance of the formula when $r = 0$
and $r = \pm 1$.
Calculations:
Sr. no.   $x$    $x-\bar{x}$   $(x-\bar{x})^2$   $y$    $y-\bar{y}$   $(y-\bar{y})^2$   $(x-\bar{x})(y-\bar{y})$
1         36     3             9                 35     2             4                 6
2         32     -1            1                 33     0             0                 0
3         34     1             1                 31     -2            4                 -2
4         31     -2            4                 30     -3            9                 6
5         31     -2            4                 34     1             1                 -2
6         32     -1            1                 32     -1            1                 1
7         35     2             4                 36     3             9                 6
Total     231    0             24                231    0             28                15
√ √ √ √ √
( ) ( )
To find the value of when put in the above equation
( )
approximately
Therefore, the judge would have given 37 marks to the eighth performance
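As a check on the arithmetic, the means, sums of squares and regression coefficients for the seven pairs of marks tabulated above can be recomputed with a short Python sketch. The variable names are illustrative, and the sketch assumes that the first column of marks is $x$ and the second is $y$.

```python
from math import sqrt
from statistics import fmean

# The seven pairs of marks from the table.
xs = [36, 32, 34, 31, 31, 32, 35]
ys = [35, 33, 31, 30, 34, 32, 36]

x_bar, y_bar = fmean(xs), fmean(ys)
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

b_yx = sxy / sxx            # regression coefficient of y on x
b_xy = sxy / syy            # regression coefficient of x on y
r = sxy / sqrt(sxx * syy)   # correlation coefficient

def estimate_y(x):
    """Estimate y from x using the line of regression of y on x."""
    return y_bar + b_yx * (x - x_bar)

print(x_bar, y_bar, sxx, syy, sxy)
print(b_yx, b_xy, r)
```

An estimate for a further performance follows by passing its known mark to `estimate_y`, provided the line of y on x is the appropriate one for that estimate.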
3. The following data regarding the heights ($x$) and weights ($y$) of 100 college students are given
∑ ∑ ∑ ∑ ∑ Find the coefficient of
correlation between height and weight and also the equation of regression of height on weight.
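Solution: The coefficients of regression are given by
$b_{yx} = \dfrac{\sum xy - \frac{\sum x \sum y}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}}$
$b_{xy} = \dfrac{\sum xy - \frac{\sum x \sum y}{n}}{\sum y^2 - \frac{(\sum y)^2}{n}}$
$r = \sqrt{b_{yx} \cdot b_{xy}}$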
The equation of the lines of regression of on is ̅ ( ̅)
( )
4. From 10 observations on price and supply of a commodity the following summary figures were
obtained ∑ ∑ ∑ ∑ Compute the equation of the line of
regression of supply ($y$) on price ($x$) and interpret the result. Estimate the supply when the price is 16 units.
Solution: We obtain the values of $a$ and $b$ of the equation of the line of regression of $y$ on $x$, i.e. of the equation
$y = a + bx$, from the normal equations
$\sum y = na + b\sum x$
$\sum xy = a\sum x + b\sum x^2$
But ∑ ∑ ∑ ∑
Multiply the first equation by 13 and subtract the result from the second equation
where $\bar{x}$, $\bar{y}$ are the means of X and Y, $\sigma_x$, $\sigma_y$ are the standard deviations of X and Y and $r$ is the correlation
coefficient between X and Y. John weighs 200 lbs; Smith is 5 feet tall. Estimate the height of John and the weight
of Smith. From the estimated height of John, estimate his weight. Why is it different from 200?
lbs
Hence, the height of John inches and weight of Smith lbs
To estimate the weight of John from his height of 71.25 inches we have to use the equation of the line of regression
of weight on height (and not of height on weight)
i.e. ( )
Putting we get ( )
The difference is due to the fact that for estimating height we use one line of regression and for estimating weight we
use the other.
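The effect can be reproduced with a minimal Python sketch. It assumes the usual forms of the two lines, $y - \bar{y} = r\frac{\sigma_y}{\sigma_x}(x - \bar{x})$ and $x - \bar{x} = r\frac{\sigma_x}{\sigma_y}(y - \bar{y})$. All summary values below are hypothetical and are not the figures of the problem above.

```python
def estimate_y_from_x(x, x_bar, y_bar, r, sigma_x, sigma_y):
    """Line of regression of y on x."""
    return y_bar + r * sigma_y / sigma_x * (x - x_bar)

def estimate_x_from_y(y, x_bar, y_bar, r, sigma_x, sigma_y):
    """Line of regression of x on y."""
    return x_bar + r * sigma_x / sigma_y * (y - y_bar)

# Hypothetical summary values (x = height, y = weight).
x_bar, y_bar = 170.0, 65.0
sigma_x, sigma_y = 10.0, 8.0
r = 0.5

y_known = 81.0
x_est = estimate_x_from_y(y_known, x_bar, y_bar, r, sigma_x, sigma_y)
y_back = estimate_y_from_x(x_est, x_bar, y_bar, r, sigma_x, sigma_y)

print(x_est, y_back)
# The deviation of y from its mean shrinks by the factor r**2
# after the round trip through both lines.
print((y_back - y_bar) / (y_known - y_bar))
```

Going through both lines multiplies the deviation from the mean by $r^2$, so the original value is recovered only when $r = \pm 1$, that is, when the two lines coincide.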
6. It is given that the means of and are 5 and 10. If the line of regression of on is parallel to the line
estimate the value of for
Solution: The line of regression of on is ̅ ( ̅)
Its slope is But this line is parallel to i.e. whose slope is
When
Writing it as we find
Suppose the second equation represents the line of regression of on
Writing it as we find
√ √ √ √
But the value of $r$ can never be greater than 1 numerically. Hence, our supposition is wrong.
Now treating the first equation as representing the line of regression of on we write it as,
Treating the second equation as representing the line of regression of on we write it as,
√ √ √
(iii) For
8. The regression lines of a sample are and Find (i) Sample means ̅ and ̅
(ii) coefficient of correlation between and Also estimate when
Also verify that the sum of the coefficients of regression is greater than $2r$.
Solution: (i) The means $\bar{x}$ and $\bar{y}$ are obtained by solving the two given equations.
⁄
√ √( ) ( ) √
9. Find the angle between the lines of regression using the following data ∑ ∑
10. If the tangent of the angle made by the line of regression of on is 0.6 and find the correlation
coefficient between and
Solution: If the equation of the line of regression of $y$ on $x$ is $y - \bar{y} = b_{yx}(x - \bar{x})$ then we know that $b_{yx}$ is the
slope of the line of regression. We are thus given $b_{yx} = 0.6$.
But and
11. If the tangent of the angle made by the lines of regression is 0.6 and find the coefficient of
correlation between and .
Solution: We know that the tangent of the angle between two lines of regression is given by
$\tan\theta = \frac{1 - r^2}{r} \cdot \frac{\sigma_x \sigma_y}{\sigma_x^2 + \sigma_y^2}$
But and
( )( ) ( )
( )( )
or ⁄ (| | cannot be )
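The behaviour of the angle between the two lines of regression can be explored with a small Python sketch. It assumes the standard formula $\tan\theta = \frac{1 - r^2}{|r|} \cdot \frac{\sigma_x \sigma_y}{\sigma_x^2 + \sigma_y^2}$ for the acute angle; the function names and the sample values are hypothetical.

```python
import math

def tan_angle(r, sigma_x, sigma_y):
    """Tangent of the acute angle between the two lines of regression."""
    if r == 0:
        return math.inf  # the lines are perpendicular
    spread = sigma_x * sigma_y / (sigma_x ** 2 + sigma_y ** 2)
    return abs((1 - r * r) / r) * spread

def angle_degrees(r, sigma_x, sigma_y):
    """Acute angle between the two lines of regression, in degrees."""
    return math.degrees(math.atan(tan_angle(r, sigma_x, sigma_y)))

# Hypothetical values, not taken from the problems above.
print(tan_angle(0.6, 1.0, 1.0))
print(angle_degrees(0.0, 2.0, 3.0))   # r = 0: perpendicular lines
print(angle_degrees(1.0, 2.0, 3.0))   # r = 1: coincident lines
print(angle_degrees(-1.0, 2.0, 3.0))  # r = -1: coincident lines
```

The two limiting cases match the earlier remarks: the lines are perpendicular when $r = 0$ and coincide when $r = \pm 1$.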
12. If and the angle between the lines of regression is find the coefficient of correlation
Solution: We have ( )
( )
√
√
13. If the arithmetic mean of regression coefficients is and their difference is find the correlation
coefficient.
Solution: Let the coefficients of regression be and Now by data and
and
and
Coefficient of correlation √ √
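The method of the last problem can be sketched in Python. It assumes the standard relation $r = \pm\sqrt{b_{yx} \cdot b_{xy}}$, with $r$ taking the common sign of the two regression coefficients, and recovers the coefficients from their arithmetic mean and difference. The function names and the numerical values are hypothetical, not the data of the problem.

```python
import math

def coefficients_from_mean_and_difference(mean, difference):
    """Solve b1 + b2 = 2*mean and b1 - b2 = difference."""
    b1 = mean + difference / 2
    b2 = mean - difference / 2
    return b1, b2

def correlation_from_coefficients(b_yx, b_xy):
    """r = +/- sqrt(b_yx * b_xy), with the sign of the coefficients."""
    product = b_yx * b_xy
    if product < 0:
        raise ValueError("regression coefficients must have the same sign")
    if product > 1:
        raise ValueError("product of regression coefficients cannot exceed 1")
    sign = -1.0 if b_yx < 0 else 1.0
    return sign * math.sqrt(product)

# Hypothetical values: arithmetic mean 0.5, difference 0.6.
b1, b2 = coefficients_from_mean_and_difference(0.5, 0.6)
r = correlation_from_coefficients(b1, b2)
print(b1, b2, r)
```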