Regression

Regression analysis is used to study the relationship between two or more variables and to predict the value of one variable from the known value of another. Simple regression involves two variables; multiple regression involves more than two. The regression curve shows how the plotted points are concentrated around a line or curve of best fit. In linear regression this is a straight line; in non-linear regression it is a curve. Methods for drawing regression lines include freehand curve fitting and the method of least squares. Regression equations are used to predict values of one variable from values of another. Correlation measures the strength of the relationship between variables, while regression describes the nature of the relationship so that predictions can be made.

Uploaded by murshidaman3

Copyright © All Rights Reserved

REGRESSION

Regression means the estimation or prediction of the unknown value of one variable from the known value of another variable. It is a statistical device used to study the relationship between two or more related variables.

Simple regression

In regression analysis, if only two variables are involved (one dependent and one independent), it is known as simple regression.

Multiple regression

In multiple regression analysis, there are more than two variables and we try to find out the effect of
two or more independent variables on one dependent variable.

Regression curve

When bivariate data are plotted on a graph, the points of the resulting scatter diagram are more or less concentrated around a curve. This curve is called the regression curve.

Linear regression

If the regression curve is a straight line, we say that there is linear regression between the variables under study.

Non-linear regression

If the curve of regression is not a straight line, the regression is termed non-linear or curvilinear regression.

Line of best fit

When the points of the scatter diagram concentrate around a straight line, that line is called the line of best fit. It is also known as the regression line.

Methods of drawing regression lines

1). Free hand curve method

Plot the independent variable on the X-axis and the dependent variable on the Y-axis. Then draw a smooth freehand line that clearly indicates the tendency of the original data. The line is fitted by inspection. It is drawn so that the areas of the curve above and below the line are approximately equal. Such a line describes the general tendency of the original data, and it is the regression line.
2). Method of least squares (curve fitting)

The principle of least squares states that the line of best fit should be drawn so that the sum of the squares of the differences is the least. These are the differences between the known values of the dependent variable and the corresponding values obtained from the line of best fit.

i.e., $\sum (y_0 - y_e)^2$ should be a minimum. Here $y_0$ stands for the known (observed) values of the dependent variable, and $y_e$ stands for the corresponding values estimated from the line of best fit.
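The least-squares principle can be checked numerically. The following is a minimal Python sketch. It assumes a straight line $y_e = a + bx$ and reuses the husband and wife ages from Problem 1 below. The helper names `sse` and `least_squares_line` are illustrative and do not come from the text. The sketch fits the line with the usual closed-form least-squares formulas. It then checks that moving the intercept or the slope away from those values only increases $\sum (y_0 - y_e)^2$.

```python
def sse(xs, ys, a, b):
    """Sum of squared differences between observed y0 and line values ye = a + b*x."""
    return sum((y0 - (a + b * x)) ** 2 for x, y0 in zip(xs, ys))


def least_squares_line(xs, ys):
    """Intercept a and slope b of the line y = a + b*x with the smallest sse()."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = sum_y / n - b * sum_x / n  # the fitted line passes through (mean x, mean y)
    return a, b


# Husband's age (x) and wife's age (y) from Problem 1.
xs = [36, 23, 27, 28, 28, 29, 30, 31, 33, 35]
ys = [29, 18, 20, 22, 27, 21, 29, 27, 29, 28]

a, b = least_squares_line(xs, ys)
best = sse(xs, ys, a, b)

# Any nearby line (intercept or slope nudged) has a larger sum of squared errors.
for da, db in [(0.5, 0.0), (-0.5, 0.0), (0.0, 0.05), (0.0, -0.05)]:
    assert sse(xs, ys, a + da, b + db) > best

print(f"y = {a:.2f} + {b:.2f}x, sum of squared errors = {best:.2f}")
```

The check works because the sum of squared errors is a quadratic in $a$ and $b$, and its minimum lies at the least-squares values.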

Regression lines

There are two types of regression lines.

1) Regression line of X on Y.
2) Regression line of Y on X.

Regression Equations

1) Regression equation of X on Y is $x - \bar{x} = b_{xy}(y - \bar{y})$

2) Regression equation of Y on X is $y - \bar{y} = b_{yx}(x - \bar{x})$

where

$b_{yx} = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2}$,   $b_{xy} = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum y^2 - (\sum y)^2}$

are known as the regression coefficients.
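The two coefficient formulas translate directly into code. The following is a minimal Python sketch. It assumes the paired observations are given as two equal-length lists, and the function name `regression_coefficients` is illustrative. The toy data $y = 2x$ is used only because the answer is easy to check by hand.

```python
def regression_coefficients(xs, ys):
    """Return (b_yx, b_xy) computed from the column-total formulas."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length (at least 2 points)")
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y  # shared by both coefficients
    b_yx = numerator / (n * sum_x2 - sum_x ** 2)
    b_xy = numerator / (n * sum_y2 - sum_y ** 2)
    return b_yx, b_xy


# y = 2x exactly, so b_yx = 2 and b_xy = 1/2 (reciprocals, since correlation is perfect).
b_yx, b_xy = regression_coefficients([1, 2, 3], [2, 4, 6])
print(b_yx, b_xy)
```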

PROBLEMS

1. From the following data on the ages of husbands and their wives, form the two regression equations and calculate the husband's age when the wife's age is 16.

Husband's age:  36  23  27  28  28  29  30  31  33  35
Wife's age:     29  18  20  22  27  21  29  27  29  28

Also, find the wife's age when the husband's age is 40.

Solution: Take the husband's age as x and the wife's age as y.

X     Y     XY     X²     Y²
36    29    1044   1296   841
23    18    414    529    324
27    20    540    729    400
28    22    616    784    484
28    27    756    784    729
29    21    609    841    441
30    29    870    900    841
31    27    837    961    729
33    29    957    1089   841
35    28    980    1225   784
300   250   7623   9138   6414   (totals: ∑x, ∑y, ∑xy, ∑x², ∑y²)
$\bar{x} = \frac{\sum x}{n} = \frac{300}{10} = 30$

$\bar{y} = \frac{\sum y}{n} = \frac{250}{10} = 25$

n xy − ( x y )
byx =
n x 2 − (  x )
2

= 0.89

n xy − ( x y )
bxy =
n y 2 − (  y )
2

= 0.75

Equation of Y on X is $y - \bar{y} = b_{yx}(x - \bar{x})$

$y - 25 = 0.89(x - 30)$

$y = 0.89x - 1.7$

Equation of X on Y is $x - \bar{x} = b_{xy}(y - \bar{y})$

$x - 30 = 0.75(y - 25)$

$x = 0.75y + 11.25$

i) When the wife's age (y) is 16, the husband's age (x) is obtained by putting y = 16 in the equation of x on y:

$x = 0.75(16) + 11.25 = 23.25$

Husband's age = 23.25 years

ii) When the husband's age (x) is 40, the wife's age (y) is obtained by putting x = 40 in the equation of y on x:

$y = 0.89(40) - 1.7 = 35.6 - 1.7 = 33.9$

Wife's age = 33.9 years
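The whole of Problem 1 can be reproduced in a few lines. The following is a minimal Python sketch that rebuilds the column totals, the means, the two coefficients and both predictions from the raw ages. The helper names `y_on_x` and `x_on_y` are illustrative.

```python
# Husband's age (x) and wife's age (y) from Problem 1.
xs = [36, 23, 27, 28, 28, 29, 30, 31, 33, 35]
ys = [29, 18, 20, 22, 27, 21, 29, 27, 29, 28]
n = len(xs)

# Column totals of the working table.
sum_x, sum_y = sum(xs), sum(ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))
sum_x2 = sum(x * x for x in xs)
sum_y2 = sum(y * y for y in ys)

x_bar, y_bar = sum_x / n, sum_y / n
numerator = n * sum_xy - sum_x * sum_y
b_yx = numerator / (n * sum_x2 - sum_x ** 2)
b_xy = numerator / (n * sum_y2 - sum_y ** 2)


def y_on_x(x):
    """Regression equation of y on x: y - y_bar = b_yx (x - x_bar)."""
    return y_bar + b_yx * (x - x_bar)


def x_on_y(y):
    """Regression equation of x on y: x - x_bar = b_xy (y - y_bar)."""
    return x_bar + b_xy * (y - y_bar)


print("husband's age when wife is 16:", x_on_y(16))
print("wife's age when husband is 40:", round(y_on_x(40), 2))
```

With the unrounded $b_{yx} = 1230/1380$, the wife's estimated age is about 33.91. The 33.9 above results from rounding $b_{yx}$ to 0.89 first.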

2. You are given the following data:

                      x     y
Arithmetic mean       36    85
Standard deviation    11     8

The correlation coefficient between x and y is 0.66.

i. Find the two regression equations.

ii. Estimate the value of x when y = 75.

Solution: Given $\bar{x} = 36$, $\bar{y} = 85$, $\sigma_x = 11$, $\sigma_y = 8$, $r = 0.66$


$b_{yx} = r\frac{\sigma_y}{\sigma_x} = \frac{0.66 \times 8}{11} = 0.48$

$b_{xy} = r\frac{\sigma_x}{\sigma_y} = \frac{0.66 \times 11}{8} = 0.9075 \approx 0.91$

Equation of Y on X is $y - \bar{y} = b_{yx}(x - \bar{x})$

$y - 85 = 0.48(x - 36) = 0.48x - 17.28$

i.e., $y = 0.48x + 67.72$

Equation of X on Y is $x - \bar{x} = b_{xy}(y - \bar{y})$

$x - 36 = 0.91(y - 85) = 0.91y - 77.35$

i.e., $x = 0.91y - 41.35$

To estimate the value of x when y = 75, we use the regression equation of x on y.


$x = 0.91(75) - 41.35 = 26.9$
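Problem 2 works from summary figures only. The following is a minimal Python sketch of the same steps, using $b_{yx} = r\sigma_y/\sigma_x$ and $b_{xy} = r\sigma_x/\sigma_y$. The helper names are illustrative.

```python
# Summary figures given in Problem 2.
x_bar, y_bar = 36, 85
sigma_x, sigma_y = 11, 8
r = 0.66

b_yx = r * sigma_y / sigma_x  # regression coefficient of y on x
b_xy = r * sigma_x / sigma_y  # regression coefficient of x on y


def y_on_x(x):
    """Estimate y from x with y - y_bar = b_yx (x - x_bar)."""
    return y_bar + b_yx * (x - x_bar)


def x_on_y(y):
    """Estimate x from y with x - x_bar = b_xy (y - y_bar)."""
    return x_bar + b_xy * (y - y_bar)


print(f"b_yx = {b_yx:.4f}, b_xy = {b_xy:.4f}")
print(f"x when y = 75: {x_on_y(75):.3f}")
```

Keeping the unrounded $b_{xy} = 0.9075$ gives $x = 26.925$. The 26.9 above comes from rounding $b_{xy}$ to 0.91 before substituting.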
3. In a partially destroyed record of an analysis of correlation data, only the following results are legible:

Variance of x = 9

Regression equations: $8x - 10y = -66$

$40x - 18y = 214$

Find

a. The mean values of x and y.

b. The coefficient of correlation.

c. The S.D. of y.

Solution:

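a. Mean values of x and y

Since both regression lines pass through $(\bar{x}, \bar{y})$, the means are found by solving the two equations simultaneously.

$8x - 10y = -66$   (1)

$40x - 18y = 214$   (2)

Multiplying (1) by 5, we get $40x - 50y = -330$   (3)

(3) - (2) gives $-32y = -544$

Therefore, $y = 17$.

Putting this value of y in (1): $8x - 10(17) = -66$, which gives $x = 13$.

Thus $\bar{x} = 13$, $\bar{y} = 17$.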

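b. Coefficient of correlation

First, take the first equation to be the equation of x on y:
$8x = 10y - 66$, so $x = \frac{10}{8}y - \frac{66}{8}$.
Then take the second equation as the equation of y on x:
$-18y = -40x + 214$, so $y = \frac{40}{18}x - \frac{214}{18}$.
The product of the regression coefficients is $\frac{10}{8} \times \frac{40}{18} = \frac{400}{144} > 1$.
This product equals $r^2$, which cannot exceed 1, so our assumption is wrong. Hence the first equation is that of y on x, and the second is that of x on y.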
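Write the first equation as
$-10y = -8x - 66$

$y = \frac{8}{10}x + \frac{66}{10}$

$b_{yx} = \frac{8}{10} = 0.8$

The second equation is $40x = 18y + 214$:

$x = \frac{18}{40}y + \frac{214}{40}$

$b_{xy} = \frac{18}{40} = 0.45$

Therefore, $r = \sqrt{b_{xy} \times b_{yx}} = \sqrt{0.45 \times 0.8} = \sqrt{0.36} = 0.6$. It is positive because both regression coefficients are positive.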

c. S.D. of y

Given variance of x = 9

Therefore, $\sigma_x = 3$.

$b_{yx} = r\frac{\sigma_y}{\sigma_x}$, so $0.8 = 0.6 \times \frac{\sigma_y}{3}$

$\therefore \sigma_y = \frac{0.8 \times 3}{0.6} = 4$
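All of Problem 3 can be checked with a short Python sketch. The sketch stores each equation as a tuple (a, b, c) for $ax + by = c$, which is a representation chosen here and not taken from the text. It then works through these steps:

- It solves the pair of equations for the means by Cramer's rule.
- It tries both assignments of the two equations.
- It keeps the assignment whose coefficient product is a valid $r^2$.

Exact fractions are used so that the means and coefficients come out exactly. The helper names `intersection` and `slopes` are hypothetical.

```python
import math
from fractions import Fraction

# The two legible regression equations, each stored as (a, b, c) for a*x + b*y = c.
eq1 = (Fraction(8), Fraction(-10), Fraction(-66))
eq2 = (Fraction(40), Fraction(-18), Fraction(214))


def intersection(e1, e2):
    """Solve the two linear equations by Cramer's rule; both lines pass through the means."""
    a1, b1, c1 = e1
    a2, b2, c2 = e2
    det = a1 * b2 - a2 * b1
    return (c1 * b2 - c2 * b1) / det, (a1 * c2 - a2 * c1) / det


def slopes(y_on_x_eq, x_on_y_eq):
    """b_yx from the equation solved for y, b_xy from the equation solved for x."""
    a1, b1, _ = y_on_x_eq
    a2, b2, _ = x_on_y_eq
    return -a1 / b1, -b2 / a2


x_bar, y_bar = intersection(eq1, eq2)

# Try both assignments; keep the one with 0 <= b_yx * b_xy <= 1, since that product is r^2.
for y_eq, x_eq in [(eq2, eq1), (eq1, eq2)]:
    b_yx, b_xy = slopes(y_eq, x_eq)
    if 0 <= b_yx * b_xy <= 1:
        break
else:
    raise ValueError("neither assignment gives a valid r^2")

# r takes the common sign of the two regression coefficients.
r = math.copysign(math.sqrt(float(b_yx * b_xy)), float(b_yx))
sigma_x = math.sqrt(9)  # variance of x is given as 9
sigma_y = float(b_yx) * sigma_x / r  # from b_yx = r * sigma_y / sigma_x

print(x_bar, y_bar, b_yx, b_xy, r, sigma_y)
```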

Distinction between correlation and Regression

1. In correlation analysis, we study the degree of relationship between variables. In regression analysis, we study the nature of the relationship.

2. In correlation analysis, the choice of dependent and independent variables is purely a matter of personal preference and has no practical significance.
In regression analysis, one has to decide which variable is to be taken as dependent and which as independent.

3. Correlation analysis is not used for prediction, whereas regression analysis is used mainly for prediction.
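Point 2 can be seen numerically with the Problem 1 data. Swapping the roles of x and y leaves the correlation coefficient unchanged, but it gives a different regression coefficient. The following is a minimal Python sketch, and its function names are illustrative.

```python
import math

# Problem 1 data: husband's age (x) and wife's age (y).
xs = [36, 23, 27, 28, 28, 29, 30, 31, 33, 35]
ys = [29, 18, 20, 22, 27, 21, 29, 27, 29, 28]


def centred_sums(us, vs):
    """Return S_uv, S_uu, S_vv (sums of products of deviations from the means)."""
    mu, mv = sum(us) / len(us), sum(vs) / len(vs)
    s_uv = sum((u - mu) * (v - mv) for u, v in zip(us, vs))
    s_uu = sum((u - mu) ** 2 for u in us)
    s_vv = sum((v - mv) ** 2 for v in vs)
    return s_uv, s_uu, s_vv


def correlation(us, vs):
    """Product-moment correlation coefficient of the two lists."""
    s_uv, s_uu, s_vv = centred_sums(us, vs)
    return s_uv / math.sqrt(s_uu * s_vv)


def slope(dependent, independent):
    """Regression coefficient of `dependent` on `independent`."""
    s_uv, s_uu, _ = centred_sums(independent, dependent)
    return s_uv / s_uu


print(correlation(xs, ys), correlation(ys, xs))  # same value either way
print(slope(ys, xs), slope(xs, ys))  # b_yx and b_xy differ
```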

Properties of regression lines

1). The two lines intersect at $(\bar{x}, \bar{y})$.

2). When $r = \pm 1$, the two lines coincide.

3). When $r = 0$, the two lines are mutually perpendicular.
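The three properties follow from writing the two regression equations in deviation form. A short derivation follows; the symbols $u$ and $v$ are introduced here only for brevity.

```latex
\text{Let } u = x - \bar{x},\; v = y - \bar{y}. \text{ The lines are } v = b_{yx}\,u \text{ and } u = b_{xy}\,v.\\
\text{Substituting the first into the second: } u = b_{xy} b_{yx}\, u = r^2 u \;\Rightarrow\; (1 - r^2)\,u = 0.\\
\text{If } r^2 \ne 1:\; u = 0 \text{ and } v = 0, \text{ so the only common point is } (\bar{x}, \bar{y}).\\
\text{If } r = \pm 1:\; b_{xy} = 1/b_{yx}, \text{ so } u = v/b_{yx} \iff v = b_{yx}\,u \text{ (the lines coincide).}\\
\text{If } r = 0:\; b_{yx} = b_{xy} = 0, \text{ so the lines are } y = \bar{y} \text{ (horizontal) and } x = \bar{x} \text{ (vertical).}
```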

Properties of regression coefficients

$b_{yx}$ is the regression coefficient of y on x and $b_{xy}$ is the regression coefficient of x on y.

1). Both regression coefficients have the same sign; that is, both are positive or both are negative.

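2). The product of the two regression coefficients is the square of the correlation coefficient:

$b_{xy} \times b_{yx} = r^2$

i.e., $r = \pm\sqrt{b_{xy} \times b_{yx}}$, where r takes the common sign of the two coefficients.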

3). $b_{yx}$ and $b_{xy}$ have the same sign as $r$.

4). When there is perfect correlation ($r = \pm 1$), $b_{yx}$ and $b_{xy}$ are reciprocals of each other.

5). $\frac{b_{yx}}{r} = \frac{\sigma_y}{\sigma_x}$ and $\frac{b_{xy}}{r} = \frac{\sigma_x}{\sigma_y}$

6). Both regression coefficients cannot be numerically greater than 1. That is, if one of them is greater than 1, the other must be less than 1; both may also be less than 1.
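The following minimal Python sketch checks properties 1, 2, 3, 5 and 6 on the Problem 1 data. It computes $r$ independently with the product-moment formula. Standard deviations use divisor n, and the ratio in property 5 is the same with divisor n - 1.

```python
import math

# Problem 1 data: husband's age (x) and wife's age (y).
xs = [36, 23, 27, 28, 28, 29, 30, 31, 33, 35]
ys = [29, 18, 20, 22, 27, 21, 29, 27, 29, 28]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n

# Sums of squares and products of deviations from the means.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_yy = sum((y - y_bar) ** 2 for y in ys)

b_yx = s_xy / s_xx
b_xy = s_xy / s_yy
r = s_xy / math.sqrt(s_xx * s_yy)  # product-moment correlation coefficient
sigma_x = math.sqrt(s_xx / n)
sigma_y = math.sqrt(s_yy / n)

# Properties 1 and 3: both coefficients share the sign of r.
assert (b_yx > 0) == (b_xy > 0) == (r > 0)
# Property 2: the product of the coefficients is r squared.
assert math.isclose(b_yx * b_xy, r ** 2)
# Property 5: b_yx / r = sigma_y / sigma_x and b_xy / r = sigma_x / sigma_y.
assert math.isclose(b_yx / r, sigma_y / sigma_x)
assert math.isclose(b_xy / r, sigma_x / sigma_y)
# Property 6: the two coefficients cannot both exceed 1 numerically.
assert not (abs(b_yx) > 1 and abs(b_xy) > 1)

print(f"b_yx = {b_yx:.4f}, b_xy = {b_xy:.4f}, r = {r:.4f}")
```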

Standard error of estimate

The difference between an actual value and the corresponding predicted value is the error in prediction. The standard error of estimate is the square root of the mean of the squares of these errors. It measures how accurate the predicted values are. The smaller the standard error, the closer the predicted values are to the actual values.

Standard error of estimate = $\sqrt{\frac{\sum (y_0 - y_e)^2}{n}}$

where $y_0$ stands for the actual values and $y_e$ for the predicted values.
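The following is a minimal Python sketch of the formula above, with n in the denominator as in the text. It is applied to the line of y on x from Problem 1 with the unrounded coefficient. The function name `standard_error_of_estimate` is illustrative.

```python
import math


def standard_error_of_estimate(actual, predicted):
    """Square root of the mean squared prediction error (divisor n, as in the text)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    squared_errors = [(y0 - ye) ** 2 for y0, ye in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(actual))


# Problem 1: predict the wife's age (y) from the husband's age (x) with the
# least-squares line of y on x (unrounded coefficient b_yx = 1230 / 1380).
xs = [36, 23, 27, 28, 28, 29, 30, 31, 33, 35]
ys = [29, 18, 20, 22, 27, 21, 29, 27, 29, 28]
b_yx = 1230 / 1380
predicted = [25 + b_yx * (x - 30) for x in xs]

se = standard_error_of_estimate(ys, predicted)
print(f"standard error of estimate of y on x: {se:.3f}")
```

For a least-squares line, this value equals $\sigma_y\sqrt{1 - r^2}$, which gives a convenient check.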
