
UNIT 9 REGRESSION ANALYSIS

Structure
9.0  Objectives
9.1  Introduction
9.2  The Concept of Regression
9.3  Linear Relationship: Two Variable Case
9.4  Minimisation of Errors
9.5  Method of Least Squares
9.6  Prediction
9.7  Relationship between Regression and Correlation
9.8  Multiple Regression
9.9  Non-linear Regression
9.10 Let Us Sum Up
9.11 Key Words
9.12 Some Useful Books
9.13 Answers/Hints to Check Your Progress Exercises

9.0 OBJECTIVES
After going through this unit, you should be able to:
• explain the concept of regression;
• explain the method of least squares;
• identify the limitations of linear regression;
• apply linear regression models to given data; and
• use the regression equation for prediction.

9.1 INTRODUCTION
In the previous Unit we noted that the correlation coefficient does not reflect a cause and effect relationship between two variables. Thus we cannot predict the value of one variable for a given value of the other variable. This limitation is removed by regression analysis. In regression analysis, to be discussed in this Unit, the relationship between variables is expressed in the form of a mathematical equation. It is assumed that one variable is the cause and the other is the effect. You should remember that regression is a statistical tool which helps understand the relationship between variables and predicts the unknown values of the dependent variable from known values of the independent variable.

9.2 THE CONCEPT OF REGRESSION


In regression analysis we have two types of variables: i) the dependent (or explained) variable, and ii) the independent (or explanatory) variable. As the names (explained and explanatory) suggest, the dependent variable is explained by the independent variable.


In the simplest case of regression analysis there is one dependent variable and one independent variable. Let us assume that the consumption expenditure of a household is related to the household income. For example, it can be postulated that as household income increases, expenditure also increases. Here consumption expenditure is the dependent variable and household income is the independent variable.

Usually we denote the dependent variable as Y and the independent variable as X. Suppose we took up a household survey and collected n pairs of observations on X and Y. The next step is to find out the nature of the relationship between X and Y.

The relationship between X and Y can take many forms. The general practice is to express the relationship in terms of some mathematical equation. The simplest of these equations is the linear equation. This means that the relationship between X and Y is in the form of a straight line and is termed linear regression. When the equation represents curves (not a straight line) the regression is called non-linear or curvilinear. Now the question arises, 'How do we identify the equation form?' There is no hard and fast rule as such. The form of the equation depends upon the reasoning and assumptions made by us. However, we may plot the X and Y variables on a graph paper to prepare a scatter diagram. From the scatter diagram, the location of the points on the graph paper helps in identifying the type of equation to be fitted. If the points are more or less in a straight line, then a linear equation is assumed. On the other hand, if the points are not in a straight line and are in the form of a curve, a suitable non-linear equation (which resembles the scatter) is assumed.

We have to take another decision, that is, the identification of the dependent and independent variables. This again depends on the logic put forth and the purpose of analysis: whether 'Y depends on X' or 'X depends on Y'. Thus there can be two regression equations from the same set of data. These are i) Y is assumed to be dependent on X (this is termed the 'Y on X' line), and ii) X is assumed to be dependent on Y (this is termed the 'X on Y' line). Regression analysis can be extended to cases where one dependent variable is explained by a number of independent variables. Such a case is termed multiple regression. In advanced regression models there can be a number of both dependent as well as independent variables.

You may by now be wondering about the term 'regression', which means 'reduce'. This name is associated with a phenomenon that was observed in a study on the relationship between the stature of father (x) and son (y). It was observed that the average stature of sons of the tallest fathers has a tendency to be less than the average stature of these fathers. On the other hand, the average stature of sons of the shortest fathers has a tendency to be more than the average stature of these fathers. This phenomenon was called regression towards the mean. Although this appeared somewhat strange at that time, it was found later that it is due to natural variation within subgroups of a group, and the same phenomenon occurs in most problems and data sets. The explanation is that many tall men come from families with average stature due to vagaries of natural variation, and they produce sons who are shorter than them on the whole. A similar phenomenon takes place at the lower end of the scale.


9.3 LINEAR RELATIONSHIP: TWO VARIABLE CASE


The simplest relationship between X and Y could perhaps be a linear deterministic function given by

Yi = a + bXi                                        ...(9.1)

In the above equation X is the independent variable or explanatory variable and Y is the dependent variable or explained variable. You may recall that the subscript i represents the observation number; i ranges from 1 to n. Thus Y1 is the first observation of the dependent variable, Xi is the i-th observation of the independent variable, and so on. Equation (9.1) implies that Y is completely determined by X and the parameters a and b. Suppose we have parameter values a = 3 and b = 0.75; then our linear equation is Y = 3 + 0.75X. From this equation we can find out the value of Y for given values of X. For example, when X = 8, we find that Y = 9. Thus if we have different values of X then we obtain the corresponding Y values on the basis of (9.1). Again, if Xi is the same for two observations, then the values of Yi will also be identical for both the observations. A plot of Y on X will show no deviation from the straight line with intercept 'a' and slope 'b'.

If we look into the deterministic model given by (9.1) we find that it may not be appropriate for describing economic interrelationships between variables. For example, let Y = consumption and X = income of households. Suppose you record your income and consumption for successive months. For the months when your income is the same, does your consumption remain the same? The point we are trying to make is that economic relationships involve certain randomness. Therefore, we assume the relationship between Y and X to be stochastic and add an error term to (9.1). Thus our stochastic model is

Yi = a + bXi + ei                                   ...(9.2)

where ei is the error term. In real life situations ei represents randomness in human behaviour and excluded variables, if any, in the model. Remember that the right hand side of (9.2) has two parts, viz., i) the deterministic part (that is, a + bXi), and ii) the stochastic or random part (that is, ei). Equation (9.2) implies that even if Xi remains the same for two observations, Yi need not be the same, because of different ei. Thus, if we plot (9.2) on a graph paper the observations will not remain on a straight line.
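To see the difference between (9.1) and (9.2) concretely, here is a minimal sketch in Python; the parameter values a = 3 and b = 0.75 are taken from the text, while the error distribution (normal with standard deviation 1) is an illustrative assumption.

```python
import random

a, b = 3.0, 0.75   # parameter values used in the text

def deterministic(x):
    """Deterministic model (9.1): Y is completely determined by X."""
    return a + b * x

def stochastic(x, sigma=1.0):
    """Stochastic model (9.2): adds a random error term e."""
    e = random.gauss(0, sigma)   # stands in for randomness / excluded variables
    return a + b * x + e

# Under (9.1) the same X always yields the same Y ...
print(deterministic(8), deterministic(8))   # 9.0 9.0
# ... while under (9.2) two observations with the same X generally differ.
print(stochastic(8), stochastic(8))
```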

Example 9.1: The amount of rainfall and agricultural production for ten years are given in Table 9.1.


Table 9.1: Rainfall and Agricultural Production

Rainfall (in mm)    Agricultural production (in '000 tonnes)
60                  33
62                  37
65                  38
71                  42
73                  42
75                  45
81                  49
85                  52
88                  55
90                  57
Fig. 9.1: Scatter Diagram

We plot the data on a graph paper. The scatter diagram looks something like Fig. 9.1. We observe from Fig. 9.1 that the points do not lie strictly on a straight line. But they show an upward rising tendency, where a straight line can be fitted. Let us draw the regression line along with the scatter plot.

Fig. 9.2: Regression Line

The vertical difference between the regression line and the observations is the error ei. The value corresponding to the regression line is called the predicted value or the expected value. On the other hand, the actual value of the dependent variable corresponding to a particular value of the independent variable is called the observed value. Thus the 'error' is the difference between the observed value and the predicted value.

A question that arises is, 'How do we obtain the regression line?' The procedure of fitting a straight line to the data is explained below.

9.4 MINIMISATION OF ERRORS

As mentioned earlier, a straight line can be represented by Yi = a + bXi, where b is the slope and a is the intercept on the y-axis. The location of a straight line depends on the values of a and b, called parameters. Therefore, the task before us is to estimate these parameters from the collected data. (You will learn more about the concept of estimation in Block 7.) In order to obtain the line of best fit to the data we should find estimates of a and b in such a way that the errors are as small as possible.

In Fig. 9.1 these differences between observed and predicted values of Y are marked with straight lines from the observed points, parallel to the y-axis, meeting the regression line. The lengths of these segments are the errors at the observed points.

Let us denote the n observations as before by (Xi, Yi), i = 1, 2, ....., n. In Example 9.1 on agricultural production and rainfall, n = 10. Let us denote the predicted value of Yi at Xi by Ŷi (the notation Ŷi is pronounced 'Yi-cap' or 'Yi-hat'). Thus Ŷi = a + bXi, i = 1, 2, ....., n. The error at the i-th point will then be

ei = Yi − Ŷi                                        ...(9.3)

It would be nice if we could determine a and b in such a way that each of the ei, i = 1, 2, ....., n is zero. But this is impossible unless it so happens that all the n points lie on a straight line, which is very unlikely. Thus we have to be content with minimising a combination of the ei, i = 1, 2, ....., n. What are the options before us?

It is tempting to think that the total of all the ei, i = 1, 2, ....., n, that is Σei, is a suitable choice. But it is not, because the errors for points above the line are positive and those for points below the line are negative. Thus, by having a combination of large positive and large negative errors, it is possible for Σei to be very small.

A second possibility is that if we take a = Ȳ (the arithmetic mean of the Yi's) and b = 0, Σei could be made zero. In this case, however, we do not need the value of X at all for prediction! The predicted value is the same irrespective of the observed value of X. This evidently is wrong.
What then is wrong with the criterion Σei? It takes into account the sign of ei. What matters is the magnitude of the error; whether the error is on the positive side or the negative side is really immaterial. Thus, the criterion Σ|ei| is a suitable criterion to minimise. Remember that |ei| means the absolute value of ei. Thus, if ei = 5 then |ei| = 5, and also if ei = −5 then |ei| = 5. However, this option poses some computational problems. For theoretical and computational reasons, the criterion of least squares is preferred to the absolute value criterion. While in the absolute value criterion the sign of ei is removed by taking its absolute value, in the least squares criterion it is done by squaring it. Remember that the squares of both 5 and −5 are 25. This device has been found to be mathematically and computationally more attractive. We explain the least squares method in detail in the following Section. A small numerical check of these three criteria is sketched below.
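The following sketch (with illustrative error values, not taken from the text) shows why the criteria behave differently: the raw sum lets large errors of opposite sign cancel, while the absolute and squared criteria do not.

```python
errors = [5.0, -5.0, 3.0, -3.0]   # errors from a poorly fitting line

print(sum(errors))                  # 0.0  -- raw sum looks perfect, misleadingly
print(sum(abs(e) for e in errors))  # 16.0 -- absolute value criterion
print(sum(e ** 2 for e in errors))  # 68.0 -- least squares criterion
```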

9.5 METHOD OF LEAST SQUARES


In the least squares method we minimise the sum of squares of the error terms, that is, Σei². From (9.3) we find that ei = Yi − Ŷi, which implies ei = Yi − (a + bXi) = Yi − a − bXi. Hence,

Σei² = Σ(Yi − a − bXi)²                             ...(9.4)

The next question is: How do we obtain the values of a and b so as to minimise (9.4)? Those of you who are familiar with the concept of differentiation will remember that the value of a function is minimum when the first derivative of the function is zero and the second derivative is positive. Here we have to choose the values of a and b. Hence, Σei² will be minimum when its partial derivatives with respect to a and b are zero. The partial derivatives of Σei² are obtained as follows:

∂(Σei²)/∂a = −2 Σ(Yi − a − bXi)                     ...(9.5)

∂(Σei²)/∂b = −2 Σ Xi(Yi − a − bXi)                  ...(9.6)

By equating (9.5) and (9.6) to zero and re-arranging the terms we get the following two equations:

ΣYi = na + b ΣXi                                    ...(9.7)

ΣXiYi = a ΣXi + b ΣXi²                              ...(9.8)

These two equations, (9.7) and (9.8), are called the normal equations of least squares. They are two simultaneous linear equations in two unknowns, and can be solved to obtain the values of a and b. Those of you who are not familiar with the concept of differentiation can use a rule of thumb (we suggest that you should learn the concept of differentiation, which is so useful in Economics). We can say that the normal equations given at (9.7) and (9.8) are derived by multiplying the linear equation by the coefficients of a and b respectively and summing over all observations. Here the linear equation is Yi = a + bXi. The first normal equation is simply the linear equation Yi = a + bXi summed over all observations (since the coefficient of a is 1). The second normal equation is the linear equation multiplied by Xi (since the coefficient of b is Xi) and then summed over all observations.

After obtaining the normal equations we calculate the values of a and b from the set of data we have. A sketch of this computation in code is given below.
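The following sketch solves the two normal equations (9.7) and (9.8) directly for the rainfall-production data of Table 9.1, using only the Python standard library.

```python
X = [60, 62, 65, 71, 73, 75, 81, 85, 88, 90]   # rainfall (mm)
Y = [33, 37, 38, 42, 42, 45, 49, 52, 55, 57]   # production ('000 tonnes)

n = len(X)
sum_x = sum(X)                               # 750
sum_y = sum(Y)                               # 450
sum_xy = sum(x * y for x, y in zip(X, Y))    # 34526
sum_x2 = sum(x * x for x in X)               # 57294

# Normal equations:  sum_y  = n*a     + b*sum_x    ...(9.7)
#                    sum_xy = a*sum_x + b*sum_x2   ...(9.8)
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = (sum_y - b * sum_x) / n

print(round(b, 3), round(a, 2))   # 0.743 -10.75 (the text's -10.73 rounds b first)
```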

Example 9.2: Assume that the quantity of agricultural production depends on the amount of rainfall and fit a linear regression to the data given in Example 9.1.

In this case the dependent variable (Y) is the quantity of agricultural production and the independent variable (X) is the amount of rainfall. The regression equation to be fitted is Yi = a + bXi + ei. For this equation we find the normal equations by the method of least squares. These equations are given at (9.7) and (9.8). Next we construct a table as follows:
Table 9.2: Computation of Regression Line

Xi      Yi      Xi²     XiYi     Ŷi       ei
60      33      3600    1980     33.85    −0.85
62      37      3844    2294     35.34    +1.66
65      38      4225    2470     37.57    +0.43
71      42      5041    2982     42.02    −0.02
73      42      5329    3066     43.51    −1.51
75      45      5625    3375     45.00     0.00
81      49      6561    3969     49.45    −0.45
85      52      7225    4420     52.43    −0.43
88      55      7744    4840     54.65    +0.35
90      57      8100    5130     56.14    +0.86
Total   750     450     57294    34526    ≈ 450    ≈ 0

By substituting values from Table 9.2 in the normal equations (9.7) and (9.8) we get the following:

450 = 10a + 750b
34526 = 750a + 57294b

By solving these two equations we obtain a = −10.73 and b = 0.743. So the regression line is

Ŷi = −10.73 + 0.743Xi

Notice that the sum of the errors ei for the estimated regression equation is zero (see the last column of Table 9.2). The computation given in Table 9.2 often involves large numbers and poses difficulty. Hence we have a short-cut method for calculating the values of a and b from the normal equations. Let us take xi = Xi − X̄ and yi = Yi − Ȳ, where X̄ and Ȳ are the arithmetic means of X and Y respectively. Hence

Σxiyi = Σ(Xi − X̄)(Yi − Ȳ)

By re-arranging terms in the normal equations we find that

b = Σxiyi / Σxi²    and    a = Ȳ − bX̄

You may recall from Unit 8 that the covariance is given by

σxy = (1/n) Σ(Xi − X̄)(Yi − Ȳ) = (1/n) Σxiyi

Moreover, the variance of X is given by

σx² = (1/n) Σ(Xi − X̄)² = (1/n) Σxi²

Thus the regression coefficient can also be written as b = σxy / σx². Since these formulae are derived from the normal equations, we get the same values for a and b in this method also. For the data given in Table 9.1 we compute the values of a and b by this method. For this purpose we construct Table 9.3.

Table 9.3: Computation of Regression Line (short-cut method)

Xi      Yi      xi = Xi − X̄   yi = Yi − Ȳ   xi²     xiyi
60      33      −15           −12           225     180
62      37      −13           −8            169     104
65      38      −10           −7            100     70
71      42      −4            −3            16      12
73      42      −2            −3            4       6
75      45      0             0             0       0
81      49      6             4             36      24
85      52      10            7             100     70
88      55      13            10            169     130
90      57      15            12            225     180
Total   750     450           0             0       1044    776

On the basis of Table 9.3 we find that

b = Σxiyi / Σxi² = 776 / 1044 = 0.743

a = Ȳ − bX̄ = 45 − 0.743 × 75 = −10.73

Thus the regression line in this method is also

Ŷi = −10.73 + 0.743Xi                               ...(9.12)

The coefficient b in (9.12) is called the regression coefficient. This coefficient reflects the amount of increase in Y when there is a unit increase in X. In the regression equation (9.12) the coefficient b = 0.743 implies that if rainfall increases by 1 mm, agricultural production will increase by 0.743 thousand tonnes. The regression coefficient is widely used and is an important tool of analysis. For example, if Y is aggregate consumption and X is aggregate income, b represents the marginal propensity to consume (MPC). The sketch below repeats this short-cut computation in code.
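A minimal sketch of the short-cut method, as a quick check of Table 9.3 (values as in the table):

```python
X = [60, 62, 65, 71, 73, 75, 81, 85, 88, 90]
Y = [33, 37, 38, 42, 42, 45, 49, 52, 55, 57]

x_bar = sum(X) / len(X)   # 75
y_bar = sum(Y) / len(Y)   # 45

# Deviations from the arithmetic means
x = [xi - x_bar for xi in X]
y = [yi - y_bar for yi in Y]

b = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi ** 2 for xi in x)   # 776/1044
a = y_bar - b * x_bar

print(round(b, 3))   # 0.743, the regression coefficient
print(round(a, 2))   # about -10.75; the text's -10.73 uses the rounded b
```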

9.6 PREDICTION
A major interest in studying regression lies in its ability to forecast. In Example 9.1 we assumed that the quantity of agricultural production is dependent on the amount of rainfall. We fitted a linear equation to the observed data and got the relationship

Ŷi = −10.73 + 0.743Xi

From this equation we can predict the quantity of agricultural output given the amount of rainfall. Thus when rainfall is 60 mm, agricultural production is (−10.73 + 0.743 × 60) = 33.85 thousand tonnes. This figure is the predicted value on the basis of the regression equation. In a similar manner we can find the predicted values of Y for different values of X.

Compare the predicted value with the observed value. From Table 9.1, where the observed values are given, we find that when rainfall is 60 mm, agricultural production is 33 thousand tonnes. In fact, the predicted values for the observed values of X are given in the fifth column of Table 9.2. Thus when rainfall is 60 mm, the predicted value is 33.85 thousand tonnes and the error value is −0.85 thousand tonnes. Now a question arises, 'Which one, between the observed and predicted values, should we believe?' In other words, what will be the quantity of agricultural production if there is a rainfall of 60 mm in future? On the basis of our regression line it is given to be 33.85 thousand tonnes. And we accept this value because it is based on the overall data. The error of −0.85 is considered a random fluctuation which may not be repeated.

The second question that comes to our mind is, 'Is the prediction valid for any value of X?' For example, we find from the regression equation that when rainfall is zero, agricultural production is −10.73 thousand tonnes. But common sense tells us that agricultural production cannot be negative! Is there anything wrong with our regression equation? In fact, the regression equation here is estimated on the basis of rainfall data in the range of 60-90 mm. Thus prediction is valid in this range of X. Our prediction should not be for far off values of X.

A third question that arises here is, 'Will the predicted value come true?' This depends upon the coefficient of determination. If the coefficient of determination is closer to one, there is a greater likelihood that the prediction will be realised. However, the predicted value is constrained by elements of randomness involved with human behaviour and other unforeseen factors. A small sketch of prediction with a range check is given below.
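A minimal sketch of prediction from the fitted line; the explicit range check is an illustrative addition reflecting the warning above, not part of the original method.

```python
def predict_production(rainfall_mm, lo=60.0, hi=90.0):
    """Predict agricultural production ('000 tonnes) from line (9.12)."""
    if not lo <= rainfall_mm <= hi:
        # The line was estimated from rainfall between 60 and 90 mm;
        # far outside this range it can give nonsensical (even negative) values.
        raise ValueError("rainfall outside the range used for estimation")
    return -10.73 + 0.743 * rainfall_mm

print(predict_production(60))   # 33.85, against an observed value of 33
# predict_production(0) would raise: the line alone gives -10.73.
```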

9.7 RELATIONSHIP BETWEEN REGRESSION AND CORRELATION


In regression analysis the status of the two variables (X, Y) is different: Y is the variable to be predicted and X is the variable whose information is to be used. In the rainfall-agricultural production problem, it makes sense to predict agricultural production on the basis of rainfall, and it would not make sense to try and predict rainfall on the basis of agricultural production. However, in the case of scores in Economics and Statistics (see Example 8.1 in the previous Unit), either one could be X and the other Y. Hence we consider the two prediction problems: (i) predicting Economics score (Y) from Statistics score (X); and (ii) predicting Statistics score (X) from Economics score (Y).

Thus we can have two regression coefficients from a given set of data, depending upon the choice of dependent and independent variables. These are:

a) Y on X line, Ŷi = a + bXi

b) X on Y line, X̂i = α + βYi

You may ask, 'What is the need for having two different lines?' By rearrangement of terms of the Y on X line we obtain Xi = −(a/b) + (1/b)Yi. Thus we should have α = −a/b and β = 1/b. However, the observations are not on a straight line and the relation between X and Y is not an exact mathematical one. You may recall that estimates of the parameters are obtained by the method of least squares. Thus the regression line Ŷ = a + bX is obtained by minimising Σ(Yi − a − bXi)², whereas the regression line X̂ = α + βY is obtained by minimising Σ(Xi − α − βYi)².

However, there is a relationship between the two regression coefficients b and β. We have noted earlier that b = σxy/σx². By a similar formula, interchanging the roles of X and Y, we find that β = σxy/σy². But by definition we notice that σxy = σyx. Thus

b × β = σxy² / (σx² σy²)

which is the same as r². This r² is called the coefficient of determination. Thus the product of the two regression coefficients of Y on X and X on Y is the square of the correlation coefficient. This gives a relationship between correlation and regression. Notice, however, that the coefficient of determination of either regression is the same, i.e., r²; this means that although the two regression lines are different, their predictive powers are the same. Note that the coefficient of determination r² ranges between 0 and 1, i.e., the maximum value it can assume is unity and the minimum value is zero; it cannot be negative.

From the previous discussion, two points emerge clearly:

1) If the points in the scatter lie close to a straight line, then there is a strong relationship between X and Y and the correlation coefficient is high.

2) If the points in the scatter diagram lie close to a straight line, then the observed values and the predicted values of Y by least squares are very close and the prediction errors (Yi − Ŷi) are small.
Thus, the prediction errors by least squares seem to be related to the correlation coefficient. We explain this relationship here. The sum of squares of errors at the various points upon using the least squares linear regression is Σ(Yi − Ŷi)². On the other hand, if we had not used the value of the observed X to predict Y, then the prediction would be a constant, say α. The best value of α by the least squares criterion is the α that minimises Σ(Yi − α)²; the solution is seen to be α = Ȳ. Thus the sum of squares of errors of prediction at the various points without using X is Σ(Yi − Ȳ)². The ratio

Σ(Ŷi − Ȳ)² / Σ(Yi − Ȳ)²

can then be used as an index of how much the error sum of squares has been reduced by the use of X. In fact, this ratio is the coefficient of determination, the same as the r² mentioned above. Since both the numerator and the denominator of this ratio are non-negative, the ratio is greater than or equal to zero. The sketch below verifies these relationships numerically.
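The identities discussed above (b × β = r², and the ratio of explained to total sum of squares equalling r²) can be checked numerically; a sketch using the rainfall data, though any paired data would do:

```python
X = [60, 62, 65, 71, 73, 75, 81, 85, 88, 90]
Y = [33, 37, 38, 42, 42, 45, 49, 52, 55, 57]
n = len(X)

mx, my = sum(X) / n, sum(Y) / n
s_xy = sum((x - mx) * (y - my) for x, y in zip(X, Y)) / n   # covariance
s_x2 = sum((x - mx) ** 2 for x in X) / n                    # variance of X
s_y2 = sum((y - my) ** 2 for y in Y) / n                    # variance of Y

b = s_xy / s_x2        # slope of the 'Y on X' line
beta = s_xy / s_y2     # slope of the 'X on Y' line
r2 = s_xy ** 2 / (s_x2 * s_y2)
print(b * beta, r2)    # the two agree: b * beta = r**2

# The ratio of explained to total sum of squares also equals r**2
a = my - b * mx
Y_hat = [a + b * x for x in X]
ess = sum((yh - my) ** 2 for yh in Y_hat)
tss = sum((y - my) ** 2 for y in Y)
print(ess / tss)       # same value as r2 above
```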


Check Your Progress 1

1) From the following data find the coefficient of linear correlation between X and Y. Determine also the regression line of Y on X, and then make an estimate of the value of Y when X = 12.

2) Obtain the lines of regression for the following data:

3) Find the two lines of regression from the following data:

Age of Husband (X):  25  22  28  25  35  20  22  40  20  18
Age of Wife (Y):     18  15  20  17  22  14  16  21  15  14

Hence estimate (i) the age of the husband when the age of the wife is 19, and (ii) the age of the wife when the age of the husband is 30.

4) From the following data, obtain the two regression equations:

Purchases: 71  75  69  97  70  91  39  61  80  47
5) Obtain the equation of the line of regression of yield of rice (y) on water (x), from the data given in the following table:

Water in inches (x):  12    18    24    30    36    42    48
Yield in tons (y):    5.27  5.68  6.25  7.21  8.02  8.71  8.42

Estimate the most probable yield of rice for 40 inches of water.


................................................................................................................... ................................................................................................................... ................................................................................................................... ................................................................................................................... ...................................................................................................... ............. ...................................................................................................................


9.8 MULTIPLE REGRESSION

So far we have considered the case of the dependent variable being explained by one independent variable. However, there are many cases where the dependent variable is explained by two or more independent variables, for example, the yield of crops (Y) being explained by the application of fertiliser (X1) and irrigation water (X2). This sort of model is termed multiple regression. Here, the equation that we consider is

Y = a + βX1 + γX2 + e                               ...(9.13)

where Y is the explained variable, X1 and X2 are explanatory variables, and e is the error term. In order to make the presentation simple we have dropped the subscripts. A regression equation can be fitted to (9.13) by applying the method of least squares discussed in Section 9.5. Here also we minimise Σe² and obtain the normal equations as follows:

ΣY = na + β ΣX1 + γ ΣX2
ΣX1Y = a ΣX1 + β ΣX1² + γ ΣX1X2                     ...(9.14)
ΣX2Y = a ΣX2 + β ΣX1X2 + γ ΣX2²

By solving the above equations we obtain estimates for a, β and γ. The regression equation that we obtain is

Ŷ = a + βX1 + γX2                                   ...(9.15)

Remember that we obtain predicted or forecast values of Y (that is, Ŷ) through (9.15) by applying various values for X1 and X2. In the bivariate case (Y, X) we could plot the regression line on a graph paper. However, it is quite complex to plot the three variable case (Y, X1, X2) on graph paper because it would require three dimensions. The intuitive idea nevertheless remains the same: we have to minimise the sum of squared errors. In fact, when we add all the error terms (e1, e2, ........, en) they sum to zero. In many cases the number of explanatory variables may be more than two.


In such cases we have to follow the basic principle of least squares: minimise Σe². Thus if Y = a0 + a1X1 + a2X2 + ............... + anXn + e, then we have to minimise Σe² = Σ(Y − a0 − a1X1 − a2X2 − ........ − anXn)² and find out the normal equations.

Now a question arises, 'How many variables should be added in a regression equation?' It depends on our logic and what variables are considered to be important. Whether a variable is important or not can be identified on the basis of statistical tests also. These tests will be discussed later in Block 7.
We present a numerical example of multiple regression below.

Example 9.3

A student tries to explain the rent charged for housing near the University. She collects data on monthly rent, the area of the house and the distance of the house from the university campus, and fits a linear regression model. The variables are: rent (in Rs. '000) Y, area (in sq. mt.) X1, and distance (in km) X2.

In the above example rent charged (Y) is the dependent variable, while area of the house (X1) and distance of the house from the university campus (X2) are independent variables. The steps involved in the estimation of the regression line are:

i) Find out the regression equation to be estimated. In this case it is given by Y = a + βX1 + γX2 + e.

ii) Find out the normal equations for the regression equation to be estimated. In this case they are the equations given at (9.14).

iii) Construct a table as given in Table 9.4.

iv) Put the values from the table in the normal equations.

v) Solve for the estimates of a, β and γ.

Table 9.4: Computation of Multiple Regression


By applying the above mentioned steps we obtain the estimated regression line as

Ŷ = −4.80 + 0.45X1 + 0.09X2

A sketch of this computation in code is given below.
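In practice the three normal equations (9.14) are solved as a linear system. A sketch using numpy; the data values below are illustrative assumptions, since the example's data table is not reproduced here.

```python
import numpy as np

# Hypothetical observations (the example's actual data table is not shown here)
Y  = np.array([4.5, 5.0, 6.5, 7.0, 9.0])    # rent (Rs. '000)
X1 = np.array([20., 24., 28., 30., 36.])    # area (sq. mt.)
X2 = np.array([5.0, 4.0, 3.5, 2.0, 1.0])    # distance (km)

# Design matrix: a column of ones for the intercept a, then X1 and X2
A = np.column_stack([np.ones_like(X1), X1, X2])

# (A'A) p = A'Y is exactly the system of normal equations (9.14)
p = np.linalg.solve(A.T @ A, A.T @ Y)
print(p)   # estimates of a, beta, gamma
```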

9.9 NON-LINEAR REGRESSION


The equation fitted in regression can be non-linear or curvilinear also. In fact, it can take numerous forms. A simpler form involving two variables is the quadratic form. The equation is

Y = a + bX + cX²

There are three parameters here, viz., a, b and c, and the normal equations are:

ΣY = na + b ΣX + c ΣX²
ΣXY = a ΣX + b ΣX² + c ΣX³
ΣX²Y = a ΣX² + b ΣX³ + c ΣX⁴

By solving these equations we obtain the values of a, b and c; a sketch in code follows.
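Note that Y = a + bX + cX² is still linear in the parameters a, b and c, so the least squares machinery is unchanged; a minimal sketch with illustrative data:

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])      # illustrative values
Y = np.array([2.1, 4.9, 9.2, 15.8, 24.5])    # roughly quadratic in X

# Columns 1, X, X^2 make Y = a + bX + cX^2 linear in (a, b, c);
# solving (A'A) p = A'Y solves the three normal equations above.
A = np.column_stack([np.ones_like(X), X, X ** 2])
a, b, c = np.linalg.solve(A.T @ A, A.T @ Y)
print(a, b, c)
```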


Certain non-linear equations can be transformed into linear equations by taking logarithms. Finding out the optimum values of the parameters from the transformed linear equations is the same as the process discussed in the previous section. We give below some of the frequently used non-linear equations and the respective transformed linear equations.

1) Y = a e^(bX). By taking the natural log (ln), it can be written as ln Y = ln a + bX, or Y′ = α + βX, where Y′ = ln Y, α = ln a, X′ = X and β = b.

2) Y = aX^b. By taking the logarithm (log), the equation can be transformed into log Y = log a + b log X, or Y′ = α + βX′, where Y′ = log Y, α = log a, β = b and X′ = log X.

3) Y = 1/(a + bX). If we take Y′ = 1/Y, then Y′ = a + bX.

4) Y = a + b/X. If we take X′ = 1/X, then Y = a + bX′.
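As an illustration of transformation 1), Y = a e^(bX) can be fitted by regressing ln Y on X and reversing the transformation at the end. The data below are synthetic, generated from a = 2 and b = 0.3.

```python
import math

X = [1, 2, 3, 4, 5]
Y = [2.70, 3.64, 4.92, 6.64, 8.96]   # approximately 2 * exp(0.3 * X)

# Transform: ln Y = ln a + bX is a straight line in X
Yp = [math.log(y) for y in Y]

n = len(X)
mx = sum(X) / n
my = sum(Yp) / n
b = sum((x - mx) * (yp - my) for x, yp in zip(X, Yp)) / \
    sum((x - mx) ** 2 for x in X)
alpha = my - b * mx      # this is ln a
a = math.exp(alpha)      # the reverse transformation recovers a

print(round(a, 2), round(b, 2))   # close to 2 and 0.3
```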

Once the non-linear equation is transformed, the fitting of a regression line is as per the method discussed in the beginning of this unit. We derive the normal equations and substitute the values calculated from the observed data. From the transformed parameters, the actual parameters can be obtained by making the reverse transformation.

Check Your Progress 2

1) Using the data on scores in Statistics and Economics of Example 8.1 in the previous Unit, compute the regression of Y on X and X on Y and check that the two lines are different. On the scatter diagram, plot both these regression lines. Check that the product of the regression coefficients is the square of the correlation coefficient.

...................................................................................................................
2) Suppose that the least squares linear regression of family expenditure on clothing (Rs. Y) on family annual income (Rs. X) has been found to be Y = 100 + 0.09X, in the range 1000 < X < 100000. Interpret this regression line. Predict the expenditure on clothing of a family with an annual income of Rs. 10,000. What about families with annual incomes of Rs. 100 and Rs. 10,00,000?

9.10 LET US SUM UP


In this Unit we discussed an important statistical tool, that is, regression. In regression analysis we have two types of variables: dependent and independent. The dependent variable is explained by the independent variables. The relationship between variables takes the form of a mathematical equation. Based on our logic, understanding and purpose of analysis we categorise variables and identify the equation form. The regression coefficient enables us to make predictions for the dependent variable given the values of the independent variable. However, prediction remains more or less valid within the range of data used for analysis. If we attempt to predict for far off values of the independent variable we may get nonsensical values for the dependent variable.


9.11 KEY WORDS


Coefficient of Determination : It is given as r², i.e., the square of the correlation coefficient. It shows the percentage variation in the dependent variable Y explained by the independent variable X.

Normal Equations : A set of simultaneous equations derived in the application of the least squares method, for example in regression analysis. They are used to estimate the parameters of the model.

Regression : It is a statistical measure of the average relationship between two or more variables in terms of the original units of the data.

9.12 SOME USEFUL BOOKS


Nagar, A.L. and R.K. Das, 1989: Basic Statistics, Oxford University Press, Delhi.

Goon, A.M., M.K. Gupta and B. Dasgupta, 1987: Basic Statistics, The World Press Pvt. Ltd., Calcutta.

9.13 ANSWERS/HINTS TO CHECK YOUR PROGRESS EXERCISES


Check Your Progress 1

1) r = +0.98; Y = 0.64X + 0.54; 8.2

2) X = 0.95Y − 6.4; Y = 0.95X + 7.25

3) X = 2.23Y − 12.70; Y = 0.39X + 7.33; (i) 29.6 (ii) 18.9

4) Y = 0.613X + 14.81; X = 1.360Y − 5.2

5) Y = 3.99 + 0.103X; 8.11 tons


Check Your Progress 2

1) i) Y = a + bX = 5.856 + 0.676X
   ii) X = α + βY = 29.848 + 0.799Y
   iii) r = 0.73
   iv) 0.676 × 0.799 = 0.54 = r²

2) Expenditure on clothing, when family income is Rs. 10,000, is Rs. 1,000. In the case of incomes below Rs. 1,000 or above Rs. 1,00,000 the regression line may not hold good. In between these two figures, a one rupee increase in income increases expenditure on clothes by 9 paise.
