Econometrics - Chapter - Chapter - II

Chapter 2 discusses Simple Linear Regression, focusing on regression analysis as a tool for understanding the relationship between dependent and independent variables. It differentiates between correlation and regression, emphasizing that while correlation measures the strength of relationships, regression is used for prediction and explanation. The chapter also outlines the simple linear regression model, its components, and the importance of distinguishing between population and sample regression functions.

Uploaded by

Dessalegn Merdassa

Chapter 2: Simple Linear Regression

2.1 The concept of Regression Analysis

Regression analysis is almost certainly the most important tool at the econometrician’s
disposal. But what is regression analysis? Regression analysis is concerned with the study of
the dependence of one variable, the dependent variable, on one or more other variables, the
explanatory variables, with a view to estimating and/or predicting the (population) mean or
average value of the former in terms of the known or fixed (in repeated sampling) values of
the latter. In very general terms, regression is concerned with describing and evaluating the
relationship between a given variable and one or more other variables. More specifically,
regression is an attempt to explain movements in a variable by reference to movements in
one or more other variables. Regression allows you to estimate how a dependent
variable changes as the independent variable(s) change. To make this more concrete, denote
the variable whose movements the regression seeks to explain by y and the variables which
are used to explain those variations by x1, x2, . . . , xk. Hence, in this relatively simple setup, it
would be said that variations in k variables (the xs) cause changes in some other variable, y.
This chapter will be limited to the case where the model seeks to explain changes in only one
variable y (although this restriction will be removed later). In regression, the
dependent variable (y) and the independent variable(s) (xs) are treated very differently.
There are various completely interchangeable names for y and the xs, as shown in Table 2.1.
All of these terms could be used synonymously in regression analysis.
Table 2.1 Other names for the dependent and independent variables

| Dependent variable | Independent variable |
|---|---|
| Explained variable | Explanatory variable |
| Predictand | Predictor |
| Regressand | Regressor |
| Response | Stimulus |
| Endogenous | Exogenous |
| Outcome | Covariate |
| Controlled variable | Control variable |
| Effect variable | Causal variable |

Although it is a matter of personal taste and tradition, in this course we will use the
dependent variable/explanatory variable or the more neutral, regressand and regressor
terminology.
2.3 Regression and Correlation
Correlation and regression are powerful statistical techniques that have wide
application in data analysis. The two are often confused with one another because
correlation analysis frequently leads into regression analysis. Although correlation analysis is
closely related to regression analysis, it is conceptually very different from regression.
Correlation is a statistical technique that measures the strength and direction of the
linear relationship between two quantitative variables, that is, the extent to which the
two variables change together at a constant rate. It is a common tool for describing simple
relationships without making a statement about cause and effect.
Positive correlation: the two variables move in the same direction, rising or falling
together.
Negative correlation: the two variables move in opposite directions.
Zero or no correlation: a correlation of zero means there is no linear relationship between the two
variables. In other words, knowing how one variable moves says nothing about the linear
movement of the other.
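The three cases above can be illustrated with a small calculation. The following is a minimal sketch in Python, assuming the usual sample (Pearson) correlation coefficient; the function name `pearson_r` and the data are hypothetical and are not taken from the chapter.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical illustrative data (not from the chapter).
x = [1, 2, 3, 4, 5]
y_up = [2, 4, 6, 8, 10]     # rises with x: positive correlation
y_down = [10, 8, 6, 4, 2]   # falls as x rises: negative correlation
y_curve = [4, 1, 0, 1, 4]   # equals (x - 3)^2: no linear relationship

print("positive:", pearson_r(x, y_up))
print("negative:", pearson_r(x, y_down))
print("zero:    ", pearson_r(x, y_curve))
```

The third series depends exactly on x, yet its correlation with x is zero, which is why a zero correlation should be read as "no linear relationship" rather than "no relationship at all".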
Correlation Vs. regression
You need to know when to use correlation and when to use regression. Use correlation for a quick
and simple summary of the direction and strength of the relationship between two or more
numeric variables. Use regression when you want to predict, optimize, or explain a
numeric response (how x influences y).
| Basis of comparison | Correlation | Regression |
|---|---|---|
| Definition | A statistical measure that defines the co-relationship or association of two variables. | Describes how an independent variable is associated with the dependent variable. |
| Objectives | To find a value expressing the relationship between variables. | To estimate values of a random variable based on the values of a fixed variable. |
| When to use | When you want to summarize the direct relationship between two variables. | When you want to predict or explain a numerical response. |
| Dependent and independent variables | No difference | Both variables are different |
| Able to quantify direction of relationship | Yes | Yes |
| Able to quantify strength of relationship | Yes | Yes |
| Able to show cause and effect | No | Only with support from theory |
| Able to predict and optimize | No | Yes |
| X and y are exchangeable | Yes | No |
| Uses a mathematical equation | No | Yes |
Regression and causation
Although regression analysis deals with the dependence of one variable on other variables, it
does not necessarily imply causation. Regression describes dependence among the variables
within a model; whether that dependence is causal must be argued from theory or other
knowledge outside the regression itself.
Causation means that a change in one variable causes a change in another variable, that is,
one event triggers the occurrence of another. The first event is called the cause and the
second event is called the effect, so a causal relationship is also referred to as a
cause-and-effect relationship.
Example
Smoking causes lung cancer
Rain clouds cause rain
Exercise causes muscle growth
Overeating causes weight gain

Types of linear regression

Simple Linear Regression Model
Simple linear regression is a linear regression model with a single explanatory variable. When
there is only one independent variable in the linear regression model, the model is
called a simple linear regression model. When there is more than one independent
variable in the model, it is called a multiple linear regression
model.
The equation for a regression line of Y on X for the population is given as:
$$Y = f(X) + U = \beta_0 + \beta_1 X + U \tag{2.1}$$

where Y is the dependent or study variable and X is the independent or
explanatory variable. β0 and β1 are the parameters of the model: β0 is the intercept
and β1 is the slope. These parameters are usually called regression coefficients. The
unobservable error component U accounts for the failure of the data to lie on the straight line and
represents the difference between the true and observed realization of Y. U is a surrogate for all those
variables that are omitted from the model but that collectively affect Y. The obvious question is: Why
not introduce these variables into the model explicitly? Stated otherwise, why not develop a
multiple regression model with as many variables as possible? The reasons are many.

A) Vagueness of theory: The theory, if any, determining the behavior of Y may be, and often
is, incomplete. We might know for certain that weekly income X influences weekly
consumption expenditure Y, but we might be ignorant or unsure about the other
variables affecting Y. Therefore, ui may be used as a substitute for all the excluded or
omitted variables from the model.
B) Unavailability of data: Even if we know what some of the excluded variables are and
therefore consider a multiple regression rather than a simple regression, we may not have
quantitative information about these variables. It is a common experience in empirical
analysis that the data we would ideally like to have often are not available. For example,
in principle we could introduce family wealth as an explanatory variable in addition to the
income variable to explain family consumption expenditure. But unfortunately,
information on family wealth generally is not available. Therefore, we may be forced to
omit the wealth variable from our model despite its great theoretical relevance in
explaining consumption expenditure.
C) Core variables versus peripheral variables: Assume in our consumption-income example
that besides income X1, the number of children per family X2, sex X3, religion X4,
education X5, and geographical region X6 also affect consumption expenditure. But it is
quite possible that the joint influence of all or some of these variables may be so small
and at best nonsystematic or random that as a practical matter and for cost
considerations it does not pay to introduce them into the model explicitly. One hopes
that their combined effect can be treated as a random variable ui.

D) Intrinsic randomness in human behavior: Even if we succeed in introducing all the relevant
variables into the model, there is bound to be some “intrinsic” randomness in individual
Y’s that cannot be explained no matter how hard we try. Humans are not machines that
do as instructed, so there is always an unpredictable element. For example, for reasons that
cannot be explained, an increase in income may fail to change a family’s consumption. The
disturbance term captures such behavior that is left unexplained by the economic model. The
disturbances, the u’s, may very well reflect this intrinsic randomness.
E) Poor proxy variables: Although the classical regression model (to be developed in Chapter
3) assumes that the variables Y and X are measured accurately, in practice the data may
be plagued by errors of measurement. A variable may not be measured
accurately, either because of data collection difficulties or because it is inherently unmeasurable,
so that a proxy variable must be used instead. In these circumstances the disturbance term can
also be thought of as representing the measurement error in the variable(s).
Example: taste is not easy to measure.

F) Principle of parsimony: Following Occam’s razor, we would like to keep our regression
model as simple as possible. If we can explain the behavior of Y “substantially” with
two or three explanatory variables and if our theory is not strong enough to suggest
what other variables might be included, why introduce more variables? Let ui
represent all other variables. Of course, we should not exclude relevant and
important variables just to keep the regression model simple.
G) Wrong functional form: Even if we have theoretically correct variables explaining a
phenomenon and even if we can obtain data on these variables, very often we do not
know the form of the functional relationship between the regressand and the
regressors. Is consumption expenditure a linear (in the variable) function of income or a
nonlinear (in the variable) function? If it is the former, Yi = β1 + β2 Xi + ui is the proper
functional relationship between Y and X, but if it is the latter, Yi = β1 + β2 Xi + β3 Xi² +
ui may be the correct functional form. In two-variable models the functional form of the
relationship can often be judged from the scattergram. But in a multiple regression
model, it is not easy to determine the appropriate functional form, for graphically we
cannot visualize scattergrams in multiple dimensions. For all these reasons, the stochastic
disturbances ui assume an extremely critical role in regression analysis, which we will
see as we progress.
Generally speaking, simple linear regression analysis is concerned with the study of the
dependence of one dependent variable on a single explanatory (independent)
variable. Moreover, the true relationship that connects the variables
involved is split into two parts: systematic (or explained) variation and random (or
unexplained) variation. Using (2.1), we can separate the two components as follows:
$$Y = \beta_0 + \beta_1 X + U$$
That is,
[variation in Y] = [systematic variation] + [random variation]
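The split into a systematic and a random part can be made concrete with a small simulation. This is only a sketch under stated assumptions: the parameter values, the X values, and the choice of a normal distribution for U are all hypothetical and are not given in the chapter.

```python
import random

random.seed(0)  # fixed seed so the draw is reproducible

# Hypothetical parameter values, chosen only for illustration.
BETA0, BETA1 = 10.0, 0.5

x_values = [80, 100, 120, 140, 160]                 # fixed (nonrandom) X values
systematic = [BETA0 + BETA1 * x for x in x_values]  # explained part of Y
# Assumption: U is drawn from a normal distribution with mean 0 and sd 5.
random_part = [random.gauss(0.0, 5.0) for _ in x_values]
y_values = [s + u for s, u in zip(systematic, random_part)]

for x, s, u, y in zip(x_values, systematic, random_part, y_values):
    print(f"X={x}: systematic={s:.2f}, random={u:+.2f}, Y={y:.2f}")
```

Only Y and X would be observed in practice; the systematic and random parts are visible here because the data were generated from known parameters.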
In our analysis we will assume that the “independent” variable X is nonrandom. We will also
assume a linear model. Note that this course is concerned with linear models like (2.1). In this
regard it is essential to know what the term linear really means, for it can be interpreted in
two different ways. These are:

a) Linearity in the variables

b) Linearity in parameters
a) Linearity in the variables means that Y is a linear function of X, so that the regression
function plots as a straight line.
Example. Consider the regression function Y = β0 + β1X. The slope (or
derivative) of this function, dY/dX = β1, does not depend on X, so the function is linear in
the variable. But if Y = β0 + β1X², the variable X is raised to the second power, so the
function is nonlinear in the variable: its slope, dY/dX = 2β1X, depends on the value taken
by X.

b) Linearity in the parameters means that the parameters (the β’s) appear only to the first
power. In this interpretation Y = β0 + β1X² is a linear regression model but Y = β0 + β1²X is
not; the latter is an example of a regression model that is nonlinear in the parameters. Of the
two interpretations of linearity, linearity in the parameters is the one relevant for the development
of the regression theory. Thus the term linear regression means a regression that is linear
in the parameters, the β’s; it may or may not be linear in the explanatory variables.
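One practical consequence of linearity in the parameters is that a model such as Y = β0 + β1X² can be fitted with the same straight-line formulas once X² is treated as the regressor. The sketch below assumes ordinary least squares as the fitting rule (a method this chapter has not yet introduced) and uses hypothetical noise-free data; `fit_line` is an illustrative helper, not something defined in the text.

```python
def fit_line(z, y):
    """Least-squares intercept and slope for y = b0 + b1 * z."""
    n = len(z)
    mean_z = sum(z) / n
    mean_y = sum(y) / n
    szy = sum((a - mean_z) * (b - mean_y) for a, b in zip(z, y))
    szz = sum((a - mean_z) ** 2 for a in z)
    slope = szy / szz
    return mean_y - slope * mean_z, slope

# Hypothetical data generated exactly from Y = 2 + 3 * X^2.
x = [1, 2, 3, 4, 5]
y = [2 + 3 * v ** 2 for v in x]

# Nonlinear in the variable X, but linear in the parameters:
# regress Y on the transformed regressor Z = X^2.
z = [v ** 2 for v in x]
b0, b1 = fit_line(z, y)
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")
```

No such transformation of X would work for Y = β0 + β1²X if the aim is to treat β1 itself as the coefficient, because the parameter enters squared.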
The following discussion stresses that regression analysis is largely concerned with estimating
and/or predicting the (population) mean or average value of the dependent variable on the
basis of the known or fixed values of the explanatory variable(s).

2.2 Population Regression function Vs Sample Regression Function

Population regression Function

As noted in Section 1.2, regression analysis is largely concerned with estimating
and/or predicting the (population) mean value of the dependent variable on the basis
of the known or fixed values of the explanatory variable(s). To understand this,
consider the data given in Table 2.1.
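Table 2.1 Weekly family income X, $, and weekly family consumption expenditure Y, $

| Weekly family income X, $ | 80 | 100 | 120 | 140 | 160 | 180 | 200 | 220 | 240 | 260 |
|---|---|---|---|---|---|---|---|---|---|---|
| Weekly family consumption expenditure Y, $ | 55 | 65 | 79 | 80 | 102 | 110 | 120 | 135 | 137 | 150 |
| | 60 | 70 | 84 | 93 | 107 | 115 | 136 | 137 | 145 | 152 |
| | 65 | 74 | 90 | 95 | 110 | 120 | 140 | 140 | 155 | 175 |
| | 70 | 80 | 94 | 103 | 116 | 130 | 144 | 152 | 165 | 178 |
| | 75 | 85 | 98 | 108 | 118 | 135 | 145 | 157 | 175 | 180 |
| | – | 88 | – | 113 | 125 | 140 | – | 160 | 189 | 185 |
| | – | – | – | 115 | – | – | – | 162 | – | 191 |
| Total | 325 | 462 | 445 | 707 | 678 | 750 | 685 | 1043 | 966 | 1211 |
| Conditional means of Y | 65 | 77 | 89 | 101 | 113 | 125 | 137 | 149 | 161 | 173 |

Note: the last row gives the conditional means E(Y | X).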

The data in the above table refer to a total population of 60 families in a hypothetical
community and their weekly income (X) and weekly consumption expenditure (Y), in
dollars. The 60 families are divided into 10 income groups (from $80 to $260) and the
weekly expenditures of each family in the various groups are as shown in the table.
Therefore, we have 10 fixed X values and the corresponding Y values against each of
the X values; so to speak, there are 10 Y subpopulations. There is considerable
variation in weekly consumption expenditure in each income group, which can be
seen clearly from Figure 2.1. But the general picture that one gets is that, despite the
variability of weekly consumption expenditure within each income bracket, on the
average, weekly consumption expenditure increases as income increases. To see this
clearly, in Table 2.1 we have given the mean, or average, weekly consumption
expenditure corresponding to each of the 10 levels of income. Thus, corresponding to
the weekly income level of $80, the mean consumption expenditure is $65, while
corresponding to the income level of $200, it is $137. In all we have 10 mean values
for the 10 subpopulations of Y. We call these mean values conditional expected
values, as they depend on the given values of the (conditioning) variable X.
Symbolically, we denote them as E(Y | X), which is read as the expected value of Y
given the value of X (see also Table 2.2).
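The conditional means can be reproduced directly from the population data. The following sketch in Python simply transcribes Table 2.1 into a dictionary (the variable names are illustrative) and averages each income group.

```python
# Table 2.1: weekly income X (key) -> weekly consumption expenditures Y of the
# families in that income group.
population = {
    80:  [55, 60, 65, 70, 75],
    100: [65, 70, 74, 80, 85, 88],
    120: [79, 84, 90, 94, 98],
    140: [80, 93, 95, 103, 108, 113, 115],
    160: [102, 107, 110, 116, 118, 125],
    180: [110, 115, 120, 130, 135, 140],
    200: [120, 136, 140, 144, 145],
    220: [135, 137, 140, 152, 157, 160, 162],
    240: [137, 145, 155, 165, 175, 189],
    260: [150, 152, 175, 178, 180, 185, 191],
}

# Conditional mean E(Y | X): the average of Y within each income group.
conditional_means = {x: sum(ys) / len(ys) for x, ys in population.items()}
n_families = sum(len(ys) for ys in population.values())

for x, mean_y in conditional_means.items():
    print(f"E(Y | X = {x}) = {mean_y:.0f}")
print("families in the population:", n_families)
```

The printed values match the last row of Table 2.1, and the group sizes add up to the 60 families mentioned in the text.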

The dark circled points in Figure 2.1 show the conditional mean values of Y
against the various X values. If we join these conditional mean values, we obtain
what is known as the population regression line (PRL), or more generally, the
population regression curve. More simply, it is the regression of Y on X. The
adjective “population” comes from the fact that we are dealing in this example with
the entire population of 60 families. Of course, in reality a population may have
many families.

FIGURE 2.1 Conditional distribution of expenditure for various levels of income (data of Table 2.1). [Figure not reproduced. Horizontal axis: weekly income, $; vertical axis: weekly consumption expenditure, $; the conditional means E(Y | X) are marked.]

Geometrically, then, a population regression curve is simply the locus of the
conditional means of the dependent variable for the fixed values of the
explanatory variable(s). More simply, it is the curve connecting the means of the
subpopulations of Y corresponding to the given values of the regressor X. It can
be depicted as in Figure 2.2.

FIGURE 2.2 Population regression line (data of Table 2.1). [Figure not reproduced. It shows the distribution of Y given X at X = $80, $140 and $220 around the conditional means E(Y | Xi), for example 65 at X = $80 and 149 at X = $220; horizontal axis: X; vertical axis: weekly consumption expenditure, $.]

This figure shows that for each X (i.e., income level) there is a population of Y values
(weekly consumption expenditures) that are spread around the (conditional) mean of
those Y values. For simplicity, we are assuming that these Y values are distributed
symmetrically around their respective (conditional) mean values. And the regression
line (or curve) passes through these (conditional) mean values. With this background,
the reader may find it instructive to reread the definition of regression given in
Section 1.2.
2.2 THE CONCEPT OF POPULATION REGRESSION FUNCTION (PRF)
From the preceding discussion and Figures 2.1 and 2.2, it is clear that each
conditional mean E(Y | Xi) is a function of Xi, where Xi is a given value of X.
Symbolically,
$$E(Y \mid X_i) = f(X_i) \tag{2.2.1}$$

where f (Xi ) denotes some function of the explanatory variable X. In our example,
E(Y | Xi ) is a linear function of Xi. Equation (2.2.1) is known as the conditional
expectation function (CEF) or population regression function (PRF) or population
regression (PR) for short. It states merely that the expected value of the distribution
of Y given Xi is functionally related to Xi. In simple terms, it tells how the mean or
average response of Y varies with X. What form does the function f (Xi) assume?
This is an important question because in real situations we do not have the entire
population available for examination. The functional form of the PRF is therefore an
empirical question, although in specific cases theory may have something to say. For
example, an economist might posit that consumption expenditure is linearly related to
income. Therefore, as a first approximation or a working hypothesis, we may assume
that the PRF E(Y | Xi) is a linear function of Xi, say, of the type
$$E(Y \mid X_i) = \beta_1 + \beta_2 X_i \tag{2.2.2}$$

where β1 and β2 are unknown but fixed parameters known as the regression
coefficients; β1 and β2 are also known as intercept and slope coefficients,
respectively. Equation (2.2.2) itself is known as the linear population regression
function. Some alternative expressions used in the literature are linear population
regression model or simply linear population regression. In the sequel, the terms
regression, regression equation, and regression model will be used
synonymously. In regression analysis our interest is in estimating the PRFs like
(2.2.2), that is, estimating the values of the unknowns β1 and β2 on the basis of
observations on Y and X. This topic will be studied in detail in Chapter 3.
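Because Table 2.1 describes the whole population, the PRF for this example can be recovered from the conditional means themselves. The sketch below fits a straight line through the ten (X, E(Y | X)) pairs; the least-squares helper `fit_line` is an illustrative assumption, and the resulting intercept and slope are computed here rather than stated in the chapter.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for y = b1 + b2 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Income levels and conditional means E(Y | X) from Table 2.1.
income = [80, 100, 120, 140, 160, 180, 200, 220, 240, 260]
cond_means = [65, 77, 89, 101, 113, 125, 137, 149, 161, 173]

beta1, beta2 = fit_line(income, cond_means)
print(f"E(Y | X) = {beta1:.2f} + {beta2:.2f} X")

# Largest gap between the fitted line and the conditional means.
max_gap = max(abs(m - (beta1 + beta2 * x)) for x, m in zip(income, cond_means))
print(f"largest deviation from the line: {max_gap:.2e}")
```

For these data the conditional means lie exactly on the line E(Y | X) = 17 + 0.6X, which is why the text can say that E(Y | Xi) is a linear function of Xi in this example. In practice the population is not observed, so β1 and β2 have to be estimated from a sample.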
2.2 THE MEANING OF THE TERM LINEAR
Since this text is concerned primarily with linear models like (2.2.2), it is essential
to know what the term linear really means, for it can be interpreted in two different
ways.
Linearity in the Variables
The first and perhaps more “natural” meaning of linearity is that the conditional
expectation of Y is a linear function of Xi, such as, for example, (2.2.2).
Geometrically, the regression curve in this case is a straight line. In this
interpretation, a regression function such as E(Y | Xi) = β1 + β2 Xi² is not a linear
function because the variable X appears with a power or index of 2.
Linearity in the Parameters

The second interpretation of linearity is that the conditional expectation of Y, E(Y | Xi),
is a linear function of the parameters, the β’s; it may or may not be linear in
the variable X. In this interpretation E(Y | Xi) = β1 + β2 Xi² is a linear (in the
parameter) regression model. To see this, let us suppose X takes the value 3.
Therefore, E(Y | X = 3) = β1 + 9β2, which is obviously linear in β1 and β2. All the
models shown in Figure 2.3 are thus linear regression models, that is, models
linear in the parameters. Now consider the model E(Y | Xi) = β1 + β2² Xi. Now
suppose X = 3; then we obtain E(Y | Xi) = β1 + 3β2², which is nonlinear in the
parameter β2. The preceding model is an example of a nonlinear (in the
parameter) regression model. Of the two interpretations of linearity, linearity in
the parameters is relevant for the development of the regression theory to be
presented shortly. Therefore, from now on the term “linear” regression will always
mean a regression that is linear in the parameters, the β’s (that is, the parameters
are raised to the first power only). It may or may not be linear in the explanatory
variables, the X’s. Schematically, we have Table 2.3. Thus, E(Y | Xi) = β1 + β2 Xi,
which is linear both in the parameters and variable, is a LRM, and so is E(Y | Xi) =
β1 + β2 Xi², which is linear in the parameters but nonlinear in variable X.

TABLE 2.3 LINEAR REGRESSION MODELS

| Model linear in parameters? | Model linear in variables? Yes | Model linear in variables? No |
|---|---|---|
| Yes | LRM | LRM |
| No | NLRM | NLRM |

Note: LRM = linear regression model; NLRM = nonlinear regression model.

2.3 STOCHASTIC SPECIFICATION OF PRF

It is clear from Figure 2.1 that, as family income increases, family consumption
expenditure on the average increases, too. But what about the consumption
expenditure of an individual family in relation to its (fixed) level of income? It is
obvious from Table 2.1 and Figure 2.1 that an individual family’s consumption
expenditure does not necessarily increase as the income level increases. For
example, from Table 2.1 we observe that corresponding to the income level of $100
there is one family whose consumption expenditure of $65 is less than the
consumption expenditures of two families whose weekly income is only $80. But
notice that the average consumption expenditure of families with a weekly income of
$100 is greater than the average consumption expenditure of families with a weekly
income of $80 ($77 versus $65). What, then, can we say about the relationship
between an individual family’s consumption expenditure and a given level of income?
We see from Figure 2.1 that, given the income level of Xi , an individual family’s
consumption expenditure is clustered around the average consumption of all families at
that Xi , that is, around its conditional expectation. Therefore, we can express the
deviation of an individual Yi around its expected value as follows:
$$u_i = Y_i - E(Y \mid X_i)$$
or
$$Y_i = E(Y \mid X_i) + u_i \tag{2.4.1}$$
where the deviation ui is an unobservable random variable taking positive or
negative values. Technically, ui is known as the stochastic disturbance or
stochastic error term. How do we interpret (2.4.1)? We can say that the
expenditure of an individual family, given its income level, can be expressed as
the sum of two components: (1) E(Y | Xi ), which is simply the mean consumption
expenditure of all the families with the same level of income. This component is
known as the systematic, or deterministic, component, and (2) ui , which is the
random, or nonsystematic, component. We shall examine shortly the nature of the
stochastic disturbance term, but for the moment assume that it is a surrogate or
proxy for all the omitted or neglected variables that may affect Y but are not (or
cannot be) included in the regression model. If E(Y | Xi ) is assumed to be linear in
Xi , as in (2.2.2), Eq. (2.4.1) may be written as
$$Y_i = E(Y \mid X_i) + u_i = \beta_1 + \beta_2 X_i + u_i \tag{2.4.2}$$

Equation (2.4.2) posits that the consumption expenditure of a family is linearly
related to its income plus the disturbance term. Thus, the individual consumption
expenditures, given X = $80 (see Table 2.1), can be expressed as
Y1 = 55 = β1 + β2(80) + u1

Y2 = 60 = β1 + β2 (80) + u2

Y3 = 65 = β1 + β2 (80) + u3 (2.4.3)

Y4 = 70 = β1 + β2(80) + u4

Y5 = 75 = β1 + β2(80) + u5
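The five disturbances in (2.4.3) can be computed once the conditional mean for the $80 income group is known. The sketch below uses E(Y | X = 80) = 65 from Table 2.1; the variable names are illustrative.

```python
# Consumption expenditures of the five families with weekly income X = 80
# (first column of Table 2.1).
y_at_80 = [55, 60, 65, 70, 75]

# Conditional mean E(Y | X = 80), the systematic component for this group.
cond_mean_80 = sum(y_at_80) / len(y_at_80)

# Disturbance of each family: u_i = Y_i - E(Y | X_i).
disturbances = [y - cond_mean_80 for y in y_at_80]

print("E(Y | X = 80) =", cond_mean_80)
print("u_1..u_5      =", disturbances)
print("mean of u     =", sum(disturbances) / len(disturbances))
```

Each family's expenditure equals 65 plus its own disturbance, and the disturbances within the group average to zero because they are measured from the group's own mean.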

Now if we take the expected value of (2.4.1) on both sides, we obtain

$$E(Y_i \mid X_i) = E[E(Y \mid X_i)] + E(u_i \mid X_i) = E(Y \mid X_i) + E(u_i \mid X_i) \tag{2.4.4}$$

where use is made of the fact that the expected value of a constant is that constant
itself (once Xi is given, E(Y | Xi) is a constant). Notice carefully that in (2.4.4) we have
taken the conditional expectation, conditional upon the given X’s. Since E(Yi | Xi) is the
same thing as E(Y | Xi), Eq. (2.4.4) implies that
$$E(u_i \mid X_i) = 0 \tag{2.4.5}$$

Thus, the assumption that the regression line passes through the conditional means
of Y (see Figure 2.2) implies that the conditional mean values of ui (conditional upon
the given X’s) are zero. From the previous discussion, it is clear that (2.2.2) and (2.4.2) are
equivalent forms if E(ui | Xi) = 0. But the stochastic specification (2.4.2) has the
advantage that it clearly shows that there are other variables besides income that affect
consumption expenditure and that an individual family’s consumption expenditure
cannot be fully explained only by the variable(s) included in the regression model.

2.2 THE SAMPLE REGRESSION FUNCTION (SRF)

By confining our discussion so far to the population of Y values corresponding to the
fixed X’s, we have deliberately avoided sampling considerations (note that the data
of Table 2.1 represent the population, not a sample). But it is about time to face up
to the sampling problems, for in most practical situations what we have is but a
sample of Y values corresponding to some fixed X’s. Therefore, our task now is to
estimate the PRF on the basis of the sample information. As an illustration, pretend
that the population of Table 2.1 was not known to us and the only information we had
was a randomly selected sample of Y values for the fixed X’s as given in Table 2.4.
Unlike Table 2.1, we now have only one Y value corresponding to the given X’s;
each Y (given Xi) in Table 2.4 is chosen randomly from similar Y’s corresponding to
the same Xi from the population of Table 2.1. The question is: From the sample of
Table 2.4 can we predict the average weekly consumption expenditure Y in the
population as a whole corresponding to the chosen X’s? In other words, can we
estimate the PRF from the sample data? As the reader surely suspects, we may not
be able to estimate the PRF “accurately” because of sampling fluctuations.

To see this, suppose we draw another random sample from the population of Table
2.1, as presented in Table 2.5. Plotting the data of Tables 2.4 and 2.5, we obtain the
scattergram given in Figure 2.4. In the scattergram two sample regression lines are
drawn so as to “fit” the scatters reasonably well: SRF1 is based on the first sample,
and SRF2 is based on the second sample. Which of the two regression lines
represents the “true” population regression line? If we avoid the temptation of looking at
Figure 2.1, which purportedly represents the PR, there is no way we can be
absolutely sure that either of the regression lines shown in Figure 2.4 represents the
true population regression line (or curve). The regression lines in Figure 2.4 are known
as the sample regression lines. Supposedly they represent the population regression
line, but because of sampling fluctuations they are at best an approximation of the
true PR. In general, we would get N different SRFs for N different samples, and these
SRFs are not likely to be the same.
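TABLE 2.4 A RANDOM SAMPLE FROM THE POPULATION OF TABLE 2.1

| Y | X |
|---|---|
| 70 | 80 |
| 65 | 100 |
| 90 | 120 |
| 95 | 140 |
| 110 | 160 |
| 115 | 180 |
| 120 | 200 |
| 140 | 220 |
| 155 | 240 |
| 150 | 260 |

TABLE 2.5 ANOTHER RANDOM SAMPLE FROM THE POPULATION OF TABLE 2.1

| Y | X |
|---|---|
| 55 | 80 |
| 88 | 100 |
| 90 | 120 |
| 80 | 140 |
| 118 | 160 |
| 120 | 180 |
| 145 | 200 |
| 135 | 220 |
| 145 | 240 |
| 175 | 260 |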
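To see how two samples from the same population lead to two different sample regression lines, the data of Tables 2.4 and 2.5 can each be fitted with a straight line. The sketch below assumes ordinary least squares as the way of making the line "fit" the scatter; the text has not yet said which fitting rule is used for Figure 2.4, so the choice of method and the helper `ols_line` are assumptions.

```python
def ols_line(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

income = [80, 100, 120, 140, 160, 180, 200, 220, 240, 260]
sample_1 = [70, 65, 90, 95, 110, 115, 120, 140, 155, 150]  # Y of Table 2.4
sample_2 = [55, 88, 90, 80, 118, 120, 145, 135, 145, 175]  # Y of Table 2.5

srf1 = ols_line(income, sample_1)
srf2 = ols_line(income, sample_2)
print(f"SRF1: Y-hat = {srf1[0]:.4f} + {srf1[1]:.4f} X")
print(f"SRF2: Y-hat = {srf2[0]:.4f} + {srf2[1]:.4f} X")
```

Under this assumption the first sample gives an intercept of about 24.45 and a slope of about 0.509, while the second gives about 17.17 and 0.576: the same population, the same X values, and yet two different fitted lines.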

FIGURE 2.4 Regression lines based on two different samples. [Figure not reproduced. It plots the first sample (Table 2.4) and the second sample (Table 2.5) together with the fitted lines SRF1 and SRF2; horizontal axis: weekly income, $; vertical axis: weekly consumption expenditure, $.]

Now, analogously to the PRF that underlies the population regression line, we can develop the
concept of the sample regression function (SRF) to represent the sample regression line.
The sample counterpart of (2.2.2) may be written as
Yˆi = βˆ1 + βˆ2 Xi------------------------------------------------------------------------------ (2.6.1)

Where Yˆ is read as “Y-hat’’ or “Y-cap’’

YˆI = estimator of E(Y | Xi )
βˆ1 = estimator of β1
βˆ2 = estimator of β2

Note that an estimator, also known as a (sample) statistic, is simply a rule or formula or
method that tells how to estimate the population parameter from the information provided by the
sample at hand. A particular numerical value obtained by the estimator in an application is
known as an estimate. Now just as we expressed the PRF in two equivalent forms, (2.2.2) and
(2.4.2), we can express the SRF (2.6.1) in its stochastic form as follows:
Yi = βˆ1 + βˆ2 Xi + uˆ i----------------------------------------------------------------(2.6.2)

where, in addition to the symbols already deﬁned, uˆ i denotes the (sample) residual term.
Conceptually uˆ i is analogous to ui and can be regarded as an estimate of ui . It is introduced
in the SRF for the same reasons as ui was introduced in the PRF. To sum up, then, we ﬁnd our
primary objective in regression analysis is to estimate the PRF

Yi = β1 + β2 Xi + ui ---------------------------------------------------------------------------------(2.4.2)
on the basis of the SRF
ˆ ˆ
----------------------------------------------------------------------------------(2.6.2)
Yi = β 1 + β xi =
uˆ i
because more often than not our analysis is based upon a single sample from some population.
But because of sampling fluctuations, our estimate of the PRF based on the SRF is at best an
approximate one. This approximation is shown diagrammatically in Figure 2.5.
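[Figure 2.5: the SRF, $\hat{Y}_i = \hat{\beta}_1 + \hat{\beta}_2 X_i$, and the PRF, $E(Y \mid X_i) = \beta_1 + \beta_2 X_i$, with weekly consumption expenditure in dollars (Y) on the vertical axis. At $X = X_i$ the figure marks $Y_i$, $\hat{Y}_i$, and $E(Y \mid X_i)$.]
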
For $X = X_i$, we have one (sample) observation $Y = Y_i$. In terms of the
SRF, the observed $Y_i$ can be expressed as
$$Y_i = \hat{Y}_i + \hat{u}_i \tag{2.6.3}$$
and in terms of the PRF, it can be expressed as
$$Y_i = E(Y \mid X_i) + u_i \tag{2.6.4}$$
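
A short derivation may help make the link between the residual and the disturbance explicit. It uses only symbols already defined: equate the right-hand sides of (2.6.3) and (2.6.4), then substitute (2.6.1) and (2.2.2).

$$
\hat{u}_i = u_i + E(Y \mid X_i) - \hat{Y}_i
          = u_i - \left[ (\hat{\beta}_1 - \beta_1) + (\hat{\beta}_2 - \beta_2) X_i \right]
$$

Read this way, the residual differs from the disturbance by exactly the vertical gap between the SRF and the PRF at $X_i$. The closer the estimates are to the true parameters, the closer $\hat{u}_i$ is to $u_i$.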
Now obviously in Figure 2.5 $\hat{Y}_i$ overestimates the true $E(Y \mid X_i)$ for the
$X_i$ shown therein. By the same token, for any $X_i$ to the left of the point
A, the SRF will underestimate the true PRF. But the reader can readily
see that such over- and underestimation is inevitable because of
sampling fluctuations. The critical question now is: Granted that the
SRF is but an approximation of the PRF, can we devise a rule or a
method that will make this approximation as “close” as possible? In
other words, how should the SRF be constructed so that $\hat{\beta}_1$ is as
“close” as possible to the true $\beta_1$ and $\hat{\beta}_2$ is as “close” as possible to
the true $\beta_2$, even though we will never know the true $\beta_1$ and $\beta_2$?
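
The over- and underestimation described above can be checked on simulated data. What follows is a sketch under assumptions, not the text's own procedure: the PRF parameters, the disturbance spread, and all variable names are hypothetical, and least squares is used as one possible fitting rule even though the text has not yet chosen one. Because the data are simulated, the normally unobservable $u_i$ and $E(Y \mid X_i)$ are known and can be compared with their sample counterparts.

```python
import random

# Hypothetical PRF parameters (assumed for illustration only).
BETA1, BETA2 = 17.0, 0.6
xs = [80, 100, 120, 140, 160, 180, 200, 220, 240, 260]

rng = random.Random(1)  # fixed seed so the run is repeatable
u = [rng.gauss(0.0, 10.0) for _ in xs]        # disturbances u_i (unobservable in practice)
cond_mean = [BETA1 + BETA2 * x for x in xs]   # E(Y | X_i), the PRF
ys = [m + e for m, e in zip(cond_mean, u)]    # (2.6.4): Y_i = E(Y | X_i) + u_i

# Fit the SRF by least squares (an assumed rule, see the lead-in).
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
b2_hat = sxy / sxx
b1_hat = y_bar - b2_hat * x_bar

y_hat = [b1_hat + b2_hat * x for x in xs]     # fitted values from the SRF
u_hat = [y - f for y, f in zip(ys, y_hat)]    # residuals, so (2.6.3) holds

# Sign of Y_hat - E(Y | X) tells whether the SRF over- or underestimates.
for x, f, m in zip(xs, y_hat, cond_mean):
    gap = f - m
    label = "over" if gap > 0 else "under"
    print(f"X={x:3d}  Y_hat - E(Y|X) = {gap:+8.3f}  ({label}estimates)")

# The two lines cross where the gap is zero (the analogue of point A).
x_cross = None
if b2_hat != BETA2:
    x_cross = (BETA1 - b1_hat) / (b2_hat - BETA2)
print("SRF and PRF cross at X =", x_cross)
```

Since both the SRF and the PRF are straight lines, the gap between them is linear in X and changes sign at most once, at the crossing point. Which side overestimates depends on the particular sample drawn, so the direction in this run need not match the one in Figure 2.5.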

