Chapter 2
Simple Linear Regression Model : Two Variable Case

After learning this chapter you will understand :
> Population Regression Function.
> Stochastic Error Term.
> Sample Regression Function.
> Method of Ordinary Least Squares.
> Assumptions of CLRM.
> Properties of OLS Estimators.
> Gauss-Markov Theorem.
> Hypothesis Testing of OLS Estimators.
> Coefficient of Determination.
> Normality Tests : Normal Probability Plot, Jarque-Bera Test.
> Forecasting.
> Scaling and units of measurement.

Basic Concepts

1. Regression : Regression means returning or stepping back to the average or normal. The term was first used by Sir Francis Galton. Regression analysis, in the general sense, means the estimation or prediction of the unknown value of one variable from the known value of another variable. In the words of M. M. Blair, "Regression analysis is a mathematical measure of the average relationship between two or more variables in terms of the original units of the data". It is a very important statistical tool, used extensively in business and economics to study the relationship between two or more causally related variables and to estimate demand and supply curves, cost functions, production and consumption functions, etc.

2. Linear Regression : Since in this text we are concerned only with linear regression models, it is essential to know the meaning of linearity. The term linearity can be interpreted in two different ways :
(i) Linearity in the Variables : If all the variables involved in the regression equation have degree one, the regression equation is said to be linear in the variables.
(ii) Linearity in the Parameters : If the conditional expectation of Y, $E(Y \mid X_i)$, is a linear function of the parameters, the $\beta$'s, then the equation is linear in the parameters. Here it may or may not be linear in the variables.
In our analysis, by linearity we mean linearity in the parameters only. The regression model may or may not be linear in the variables, the X's, but it is essentially linear in the parameters, the $\beta$'s.

3. Regression Equations : Regression equations are used to estimate the value of one variable on the basis of the value of the other variable. If we have two variables X and Y, the two regression equations for them are X on Y and Y on X.

4. Regression Equation of Y on X : The line of regression of Y on X is the line which gives the best estimate of Y for any given value of X.

5. Regression Equation of X on Y : The line of regression of X on Y is the line which gives the best estimate of X for any given value of Y.
Note : Generally we study only the regression equation of Y on X, where Y is the dependent or explained variable and X is the independent or explanatory variable.

6. A Linear Probabilistic Model : As a first approximation we may assume that the Population Regression Function (PRF) is a linear function, i.e., it is of the type :
$E(Y \mid X_i) = \beta_1 + \beta_2 X_i$
where $E(Y \mid X_i)$ means the mean or expected value of Y corresponding to, or conditional upon, the given value $X_i$; here by linearity we mean linear in the parameters. $\beta_1$ and $\beta_2$ are unknown but fixed parameters known as regression coefficients; $\beta_1$ is called the intercept term and $\beta_2$ the slope term. A small simulation sketch of this conditional-mean interpretation follows below.
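The conditional-mean interpretation of the PRF can be checked numerically. The following minimal sketch (Python, with illustrative values $\beta_1 = 2$ and $\beta_2 = 0.5$ assumed purely for demonstration) draws many values of Y at each fixed X and verifies that the average of Y given X stays close to the PRF line $\beta_1 + \beta_2 X_i$.

```python
import numpy as np

rng = np.random.default_rng(0)

beta1, beta2 = 2.0, 0.5   # assumed population parameters, for illustration only
for x in [1.0, 2.0, 3.0, 4.0]:
    # many draws of Y at this fixed X: Y = beta1 + beta2*X + u, with u ~ N(0, 1)
    u = rng.normal(0.0, 1.0, size=10_000)
    y = beta1 + beta2 * x + u
    # the sample mean of Y | X should be close to the PRF value beta1 + beta2*X
    print(f"X = {x}: mean(Y|X) = {y.mean():.3f}, PRF = {beta1 + beta2 * x:.3f}")
```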
7. Stochastic Specification of PRF : The form $E(Y \mid X_i) = \beta_1 + \beta_2 X_i$ of the regression function implies that the relationship between X and Y is exact, that is, all the variation in Y is solely due to changes in X and there are no other factors affecting the dependent variable. If this were true then all the (X, Y) pairs, if plotted on a two-dimensional plane, would fall on a straight line. However, if we gather observations on actual data and plot them on a diagram, we see that they do not fall on a straight line (or any other smooth curve, for that matter). The deviations of the observations from the line may be attributed to several factors. In statistical analysis, therefore, one generally acknowledges that the relationship is not exact by explicitly including a random factor, known as the disturbance term, in the linear regression model. So the linear regression model becomes :
$Y_i = \beta_1 + \beta_2 X_i + u_i$
where $u_i$ is known as the stochastic or random or residual or error term.

8. Sample Regression Function : Suppose that we have taken a sample of a few values of Y corresponding to given $X_i$. The line which is drawn to fit these data is called the sample regression line. Mathematically, the sample regression line can be expressed as :
$\hat{Y}_i = \hat{\beta}_1 + \hat{\beta}_2 X_i$ or $\hat{Y}_i = b_1 + b_2 X_i$
where $\hat{Y}_i$ is the estimator of $E(Y \mid X_i)$, i.e., the estimator of the population conditional mean, $\hat{\beta}_1$ or $b_1$ is the estimator of $\beta_1$, and $\hat{\beta}_2$ or $b_2$ is the estimator of $\beta_2$.

9. Stochastic Specification of SRF : The stochastic sample regression function can be expressed as :
$Y_i = \hat{\beta}_1 + \hat{\beta}_2 X_i + e_i$ or $Y_i = \hat{Y}_i + e_i$
where $e_i$ (the residual) is the estimator of the population error term $u_i$.

10. Estimating Model Parameters : Our basic objective in regression analysis is to estimate the stochastic PRF
$Y_i = \beta_1 + \beta_2 X_i + u_i$
on the basis of the SRF
$Y_i = \hat{\beta}_1 + \hat{\beta}_2 X_i + e_i$
because our estimation is generally based on a single sample from some population. Due to sampling variation, our estimate of the PRF based on the SRF is only approximate. To estimate the regression equation we use the method of least squares; a small numerical sketch of the PRF/SRF distinction follows below.
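To make the SRF and its residuals concrete, here is a minimal sketch on an assumed toy sample. It fits a sample regression line (via np.polyfit, purely for convenience; the explicit OLS formulae appear in point 12 below) and splits each observation into a fitted value $\hat{Y}_i$ and a residual $e_i$.

```python
import numpy as np

# assumed toy sample (X_i, Y_i), for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.8, 3.9, 4.1, 5.6, 6.2])

# fit the sample regression line Y-hat = b1 + b2*X
# (np.polyfit with deg=1 returns [slope, intercept])
b2, b1 = np.polyfit(x, y, deg=1)

y_hat = b1 + b2 * x    # fitted values: the SRF evaluated at each X_i
e = y - y_hat          # residuals e_i: sample counterparts of the unobservable u_i

print(f"SRF: Y-hat = {b1:.3f} + {b2:.3f} X")
print("residuals:", np.round(e, 3))
print("residuals sum to (approximately) zero:", round(e.sum(), 10))
```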
11. Assumptions of Classical Linear Regression Model : In order to use the method of ordinary least squares (OLS), the following basic assumptions must hold for a two variable regression model :

Assumption 1 : Linear regression model. The regression model $Y_i = \beta_1 + \beta_2 X_i + u_i$ is linear in the parameters. This interpretation of linearity is that the conditional expectation of Y, $E(Y \mid X_i)$, is a linear function of the parameters, the $\beta$'s; it may or may not be linear in the variable X. In this interpretation $E(Y \mid X_i) = \beta_1 + \beta_2 X_i^2$ is a linear (in the parameters) regression model. To see this, let us suppose X takes the value 3. Therefore, $E(Y \mid X = 3) = \beta_1 + 9\beta_2$, which is obviously linear in $\beta_1$ and $\beta_2$.
Note : A function is said to be linear in a parameter, say $\beta_1$, if $\beta_1$ appears with a power of 1 only and is not multiplied or divided by any other parameter (for example, $\beta_1 \beta_2$, $\beta_2 / \beta_1$, and so on).

Assumption 2 : X values are fixed in repeated sampling. Values taken by the regressor X are considered fixed in repeated samples. More technically, X is assumed to be nonstochastic.

Assumption 3 : Zero mean value of disturbance $u_i$. Given the value of X, the mean, or expected, value of the random disturbance term $u_i$ is zero. Technically, the conditional mean value of $u_i$ is zero. Symbolically, $E(u_i \mid X_i) = 0$. A violation of this assumption introduces bias in the intercept of the regression equation.

Assumption 4 : Homoscedasticity or equal variance of $u_i$. Given the value of X, the variance of $u_i$ is the same for all observations. That is, the conditional variances of $u_i$ are identical. Symbolically,
$\operatorname{var}(u_i \mid X_i) = E[u_i - E(u_i \mid X_i)]^2 = E(u_i^2 \mid X_i) = \sigma^2$
where var stands for variance.

Assumption 5 : No autocorrelation between the disturbances. Given any two X values, $X_i$ and $X_j$ ($i \neq j$), the correlation between any two $u_i$ and $u_j$ ($i \neq j$) is zero. Symbolically,
$\operatorname{cov}(u_i, u_j \mid X_i, X_j) = E\{[u_i - E(u_i)] \mid X_i\}\{[u_j - E(u_j)] \mid X_j\} = E(u_i \mid X_i)E(u_j \mid X_j) = 0$
where i and j are two different observations and cov means covariance.

Assumption 6 : Zero covariance between $u_i$ and $X_i$, or $E(u_i X_i) = 0$. Formally,
$\operatorname{cov}(u_i, X_i) = E[u_i - E(u_i)][X_i - E(X_i)]$
$= E[u_i(X_i - E(X_i))]$, since $E(u_i) = 0$
$= E(u_i X_i) - E(X_i)E(u_i)$, since $E(X_i)$ is nonstochastic
$= E(u_i X_i)$, since $E(u_i) = 0$
$= 0$, by assumption.

Assumption 7 : The number of observations n must be greater than the number of parameters to be estimated. Alternatively, the number of observations n must be greater than the number of explanatory variables.

Assumption 8 : Variability in X values. The X values in a given sample must not all be the same. Technically, var(X) must be a finite positive number.

Assumption 9 : The regression model is correctly specified. Alternatively, there is no specification bias or error in the model used in empirical analysis.

Assumption 10 : There is no perfect multicollinearity. That is, there are no perfect linear relationships among the explanatory variables.

12. Method of Least Squares : The method of ordinary least squares is attributed to the German mathematician Carl Friedrich Gauss. The method chooses the line for which the sum of the squares of the differences between the estimated values and the actual observed values is minimum, with the variables X and Y related according to the simple linear regression model. The values of $\beta_1$, $\beta_2$ and $\sigma^2$ will almost never be known to an investigator. Instead, sample data consisting of n observed pairs $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ will be available, from which the model parameters and the true regression line itself can be estimated. Using this method the regression equation may be found as under (a numerical sketch follows below) :
Regression Equation of Y on X : $\hat{Y} = \hat{\beta}_1 + \hat{\beta}_2 X$
Normal Equations :
$\sum Y_i = n\hat{\beta}_1 + \hat{\beta}_2 \sum X_i$ and $\sum X_i Y_i = \hat{\beta}_1 \sum X_i + \hat{\beta}_2 \sum X_i^2$
where $\hat{\beta}_1$ and $\hat{\beta}_2$ are constants whose values can be obtained by solving the normal equations. The least squares estimate of the slope coefficient $\hat{\beta}_2$ of the true regression line is
$\hat{\beta}_2 = \dfrac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2} = \dfrac{n \sum X_i Y_i - \sum X_i \sum Y_i}{n \sum X_i^2 - (\sum X_i)^2}$
The least squares estimate of the intercept coefficient $\hat{\beta}_1$ of the true regression line is :
$\hat{\beta}_1 = \bar{Y} - \hat{\beta}_2 \bar{X}$
Note : Regression Equation of X on Y : $\hat{X} = \hat{\beta}_1 + \hat{\beta}_2 Y$, with normal equations
$\sum X_i = n\hat{\beta}_1 + \hat{\beta}_2 \sum Y_i$ and $\sum X_i Y_i = \hat{\beta}_1 \sum Y_i + \hat{\beta}_2 \sum Y_i^2$
where $\hat{\beta}_1$ and $\hat{\beta}_2$ are constants whose values can be obtained by solving the normal equations.
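The following minimal sketch (reusing the assumed toy sample from above) computes $\hat{\beta}_2$ and $\hat{\beta}_1$ directly from the deviation-form formulae just given and cross-checks the result against numpy's built-in least squares fit.

```python
import numpy as np

# assumed toy sample, for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.8, 3.9, 4.1, 5.6, 6.2])

x_bar, y_bar = x.mean(), y.mean()

# slope: sum of cross-deviations over sum of squared deviations of X
b2 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
# intercept: the least squares line passes through the point of means
b1 = y_bar - b2 * x_bar

print(f"b1 = {b1:.4f}, b2 = {b2:.4f}")

# cross-check against numpy's least squares polynomial fit of degree 1
slope, intercept = np.polyfit(x, y, deg=1)
assert np.isclose(b2, slope) and np.isclose(b1, intercept)
```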
13. Variations in $Y_i$ : The variation in $Y_i$ may be classified as under :
(i) Total Variation : The total variation of the actual Y values about their sample mean $\bar{Y}$ is called the total variation in Y. It may also be called the total sum of squares (TSS). $\sum y_i^2$ is a measure of the total variation in Y, such that
$TSS = \sum (Y_i - \bar{Y})^2$, i.e., $\sum y_i^2$
where the lower-case $y_i = Y_i - \bar{Y}$ denotes a deviation from the mean.
(ii) Explained Variation : $\sum \hat{y}_i^2 = \sum (\hat{Y}_i - \bar{Y})^2 = \hat{\beta}_2^2 \sum (X_i - \bar{X})^2$ is the variation of the estimated Y values ($\hat{Y}_i$) about their mean (which equals $\bar{Y}$), which may appropriately be called the sum of squares due to regression (i.e., due to the explanatory variable), or simply the explained sum of squares (ESS), such that
$ESS = \sum (\hat{Y}_i - \bar{Y})^2$
(iii) Unexplained Variation : $\sum e_i^2$ represents the unexplained variation of the Y values about the regression line; it is called the residual sum of squares (RSS), such that
$RSS = \sum (Y_i - \hat{Y}_i)^2$ or $\sum e_i^2$
Here, Total Sum of Squares = Explained Sum of Squares + Residual Sum of Squares, i.e., TSS = ESS + RSS, where $Y_i$ = actual value, $\hat{Y}_i$ = estimated value, $\bar{Y}$ = actual mean.

14. Estimating $\sigma^2$ and $\sigma$ : The parameter $\sigma^2$ determines the amount of variability inherent in the regression model. A large value of $\sigma^2$ means that the observed $(X_i, Y_i)$ are quite spread out about the true regression line, whereas when $\sigma^2$ is small the observed points tend to fall very close to the true regression line. The variance of the population error term, $\sigma^2$, is usually unknown. We therefore need to replace it by an estimate using sample information. Since the population error term is unobservable, one can use the estimated residuals to find an estimate. We start by forming the residual term
$e_i = Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_i$
After estimating the residuals, we compute the residual sum of squares, denoted
$\sum e_i^2 = \sum (Y_i - \hat{Y}_i)^2 = \sum (Y_i - \hat{\beta}_1 - \hat{\beta}_2 X_i)^2$
We observe that, first of all, two parameters $\hat{\beta}_1$ and $\hat{\beta}_2$ must be estimated, which implies a loss of two degrees of freedom. With this information we may use the following formula for estimating $\sigma^2$ :
$\hat{\sigma}^2 = \dfrac{\sum e_i^2}{n - 2}$
$\hat{\sigma} = \sqrt{\sum e_i^2 / (n - 2)}$ is known as the standard error of estimate or the standard error of the regression (se). It is simply the standard deviation of the Y values about the estimated regression line and is often used as a summary measure of the "goodness of fit" of the estimated regression line.
Note : The term number of degrees of freedom means the total number of observations in the sample (= n) less the number of independent (linear) constraints or restrictions put on them. In other words, it is the number of independent observations out of a total of n observations. For example, before $\sum e_i^2$ can be computed, $\hat{\beta}_1$ and $\hat{\beta}_2$ must first be obtained. These two estimates therefore put two restrictions on $\sum e_i^2$. Therefore, there are n - 2, not n, independent observations with which to compute $\sum e_i^2$.

15. Variances and Standard Errors of OLS Estimators : As we know, the OLS estimators are random variables, because their values change from sample to sample. So we would like to know something about the sampling variability of these estimators. These sampling variabilities are measured by the variances of the estimators, which, together with the standard errors, are computed by the following formulae :
$\operatorname{var}(b_1) = \dfrac{\sigma^2 \sum X_i^2}{n \sum (X_i - \bar{X})^2}$, $SE(b_1) = \sqrt{\operatorname{var}(b_1)}$
$\operatorname{var}(b_2) = \dfrac{\sigma^2}{\sum (X_i - \bar{X})^2}$, $SE(b_2) = \sqrt{\operatorname{var}(b_2)}$
In practice the unknown $\sigma^2$ is replaced by its unbiased estimate $\hat{\sigma}^2$. A numerical sketch of these computations follows below.
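Continuing the assumed toy example, this sketch verifies the TSS = ESS + RSS decomposition and computes $\hat{\sigma}^2$ with n - 2 degrees of freedom, together with the standard errors of $b_1$ and $b_2$ from the formulae above.

```python
import numpy as np

# assumed toy sample, same as above
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.8, 3.9, 4.1, 5.6, 6.2])
n = len(x)

sxx = np.sum((x - x.mean()) ** 2)
b2 = np.sum((x - x.mean()) * (y - y.mean())) / sxx
b1 = y.mean() - b2 * x.mean()
y_hat = b1 + b2 * x
e = y - y_hat

tss = np.sum((y - y.mean()) ** 2)      # total variation
ess = np.sum((y_hat - y.mean()) ** 2)  # explained variation
rss = np.sum(e ** 2)                   # unexplained variation
assert np.isclose(tss, ess + rss)      # TSS = ESS + RSS

sigma2_hat = rss / (n - 2)             # error variance estimate, n - 2 df
se_b2 = np.sqrt(sigma2_hat / sxx)
se_b1 = np.sqrt(sigma2_hat * np.sum(x ** 2) / (n * sxx))

print(f"TSS = {tss:.3f}, ESS = {ess:.3f}, RSS = {rss:.3f}")
print(f"sigma2-hat = {sigma2_hat:.4f}, SE(b1) = {se_b1:.4f}, SE(b2) = {se_b2:.4f}")
```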
16. Test of Significance of Regression Coefficients : Under the CLRM assumptions (invoking the central limit theorem for large samples), the regression coefficients $b_1$ and $b_2$ follow a normal distribution with means equal to the true $\beta_1$ and $\beta_2$ and variances as computed above. The following steps are taken to test the significance of the slope term of the regression equation :
(i) Define the null hypothesis ($H_0$) and the alternative hypothesis ($H_1$) :
$H_0 : \beta_2 = 0$, i.e., the slope term is statistically insignificant.
$H_1$ : the slope term is statistically significant, i.e.,
$\beta_2 \neq 0$ (two tailed test), $\beta_2 > 0$ (upper tailed test), or $\beta_2 < 0$ (lower tailed test).
(ii) Find out the tail of the test, i.e., determine whether it is a single tail or two tail test.
(iii) Calculate the standard error of $b_2$.
(iv) Calculate the test statistic t as under :
$t = \dfrac{b_2 - \beta_2}{SE(b_2)}$, which under $H_0$ reduces to $t = \dfrac{b_2}{SE(b_2)}$
(v) Set the level of significance $\alpha$.
(vi) Find $t_\alpha$ (for a single tail test) or $t_{\alpha/2}$ (for a two tail test) for n - 2 degrees of freedom from the table.
(vii) Compare $|t|$ and $t_\alpha$ (or $t_{\alpha/2}$) :
(a) If $|t| \leq t_\alpha$ (or $t_{\alpha/2}$), then do not reject the null hypothesis.
(b) If $|t| > t_\alpha$ (or $t_{\alpha/2}$), then reject the null hypothesis.
Similarly we can test the statistical significance of the intercept term $\beta_1$. A numerical sketch of this test, and of the confidence interval and $r^2$ computations of points 17 to 19, follows below.

17. Confidence Interval : Let us assume that $\alpha$ is the level of significance, or the probability of committing a Type I error; then the confidence intervals of the regression coefficients are computed as under :
Confidence Interval of Intercept Term :
$P(b_1 - t_{\alpha/2} \cdot SE(b_1) \leq \beta_1 \leq b_1 + t_{\alpha/2} \cdot SE(b_1)) = 1 - \alpha$
Confidence Interval of Slope Term :
$P(b_2 - t_{\alpha/2} \cdot SE(b_2) \leq \beta_2 \leq b_2 + t_{\alpha/2} \cdot SE(b_2)) = 1 - \alpha$

18. The Coefficient of Determination : The coefficient of determination is a measure of how well a regression model is likely to predict future outcomes. The coefficient of determination $r^2$ is the square of the sample correlation coefficient between the outcomes and the predicted values. It is given by
$r^2 = \dfrac{\text{Explained Variance}}{\text{Total Variance}}$
Properties of $r^2$ : The following two properties of $r^2$ may be noted :
(i) It is a non-negative quantity.
(ii) Its limits are $0 \leq r^2 \leq 1$.

19. The Goodness of Fit Test : Once the regression line has been fitted, we would like to know how good the fit is; in other words, we would like to measure the discrepancy of the actual observations from the fitted line. This is important since the closer the data to the line, the better the fit, or, in other words, the better the explanation of the variation of the dependent variable by the independent variables. A usual measure of the goodness of fit is the square of the correlation coefficient, $r^2$. This is the proportion of the total variation of the dependent variable explained by the independent variable. In other words,
$r^2 = \dfrac{\text{Explained Sum of Squares (ESS)}}{\text{Total Sum of Squares (TSS)}}$
The closer the value of $r^2$ to 1, the better the fit of the regression model, because $r^2 = 1$ means the regression is a perfect fit, i.e., $\hat{Y}_i = Y_i$.

20. Gauss-Markov Theorem : The Gauss-Markov theorem states that, provided the assumptions of the CLRM are satisfied, the OLS estimators are BLUE, i.e., Best (most efficient), Linear (combinations of $Y_i$), Unbiased Estimators of the regression parameters. Thus, the OLS estimators have the following properties :
(i) Linearity : $b_1$ and $b_2$ are linear estimators, i.e., they are linear functions of the random variable $Y_i$.
(ii) Unbiasedness : OLS estimators are unbiased.
(a) $b_1$ and $b_2$ are unbiased estimates of $\beta_1$ and $\beta_2$, i.e., $E(b_1) = \beta_1$ and $E(b_2) = \beta_2$.
(b) The OLS estimator of the error variance is unbiased, i.e., $E(\hat{\sigma}^2) = \sigma^2$.
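Continuing the assumed toy example, this sketch carries out the two tailed t test of point 16 for the slope and builds the confidence interval of point 17, using scipy.stats.t for the critical value ($\alpha$ = 0.05 assumed).

```python
import numpy as np
from scipy import stats

# assumed toy sample, same as above
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.8, 3.9, 4.1, 5.6, 6.2])
n = len(x)

sxx = np.sum((x - x.mean()) ** 2)
b2 = np.sum((x - x.mean()) * (y - y.mean())) / sxx
b1 = y.mean() - b2 * x.mean()
e = y - (b1 + b2 * x)
se_b2 = np.sqrt((np.sum(e ** 2) / (n - 2)) / sxx)

# H0: beta2 = 0 against H1: beta2 != 0 (two tailed test)
t_stat = b2 / se_b2
alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)  # t_(alpha/2) with n - 2 df

print(f"t = {t_stat:.3f}, critical value t_(alpha/2) = {t_crit:.3f}")
print("reject H0" if abs(t_stat) > t_crit else "do not reject H0")

# (1 - alpha) confidence interval for the slope
lo, hi = b2 - t_crit * se_b2, b2 + t_crit * se_b2
print(f"95% confidence interval for beta2: [{lo:.3f}, {hi:.3f}]")
```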

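Finally, for the same assumed toy data, $r^2$ from points 18 and 19 can be computed both as ESS/TSS and as the squared correlation between the actual and the fitted values; for a model with an intercept the two agree.

```python
import numpy as np

# assumed toy sample, same as above
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.8, 3.9, 4.1, 5.6, 6.2])

b2 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b1 = y.mean() - b2 * x.mean()
y_hat = b1 + b2 * x

r2_ess = np.sum((y_hat - y.mean()) ** 2) / np.sum((y - y.mean()) ** 2)  # ESS / TSS
r2_corr = np.corrcoef(y, y_hat)[0, 1] ** 2  # squared correlation of actual vs fitted

print(f"r^2 = {r2_ess:.4f}")
assert np.isclose(r2_ess, r2_corr)
```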