1.Introduction _lecture notes_
__________________________________________________________________________________
The process of doing econometrics involves combining economic theory, maths, statistics, data and
software to conduct analysis. As Wooldridge suggests, that analysis could mean using data to
test a particular economic theory (e.g. we could estimate fiscal multipliers to test whether
Keynes’s theory that boosting aggregate demand through more government expenditure raises GDP
holds in South Africa) or to evaluate a given policy (e.g. we could test whether the
Employment Tax Incentive subsidy actually resulted in more jobs for previously unemployed youth).
This course provides you with an introduction to this process. The main outcome of the course is
understanding how to apply the Ordinary Least Squares Linear Regression Model in bivariate form –
meaning that you will learn how to estimate the relationship between two variables: one
dependent, and one independent variable. This course primarily serves as a foundation for learning
more useful econometric techniques in 3rd year and honours courses – where you will expand the
bivariate model to test relationships among multiple variables and to model different types of
relationships.
The image on slide 4 shows what we’ll be doing. In the image you see a graph with an 𝑥-axis
(measuring the number of hours studied for ECO242 – which is the independent variable in this
instance) and a 𝑦-axis (measuring the final mark for ECO242 – the dependent variable). The dots in
the graph are data points for these two variables (study hours and final marks) taken from a sample
of past ECO242 students. The line drawn through this cloud of data points is the regression line. That
line was estimated using linear regression. By applying regression, we can reduce all those data
points to just two estimated parameters, which are given in the equation on the graph: a slope
estimate of 2.2874 and an intercept estimate of 23.028. This is what you will learn to do in this
course: reduce a set of data on two variables to a linear equation with two parameters. Next year (in
3rd year econometrics) you’ll extend the number of estimated parameters beyond two.
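To make the idea of reducing a cloud of points to two parameters concrete, here is a minimal sketch in Python. The six data points are hypothetical (they are not the ECO242 sample behind the slide), and the function `ols_bivariate` is an illustrative name. It uses the standard closed-form least-squares formulas for a bivariate regression, which are covered properly later in the course.

```python
# Hypothetical data (NOT the slide's sample): hours studied and final mark.
hours = [5, 10, 15, 20, 25, 30]
marks = [35, 44, 58, 66, 82, 91]

def ols_bivariate(x, y):
    """Return (intercept, slope) of the least-squares line y = b1 + b2*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope: co-variation of x and y divided by the variation in x.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    # Intercept: forces the line through the point of means (x_bar, y_bar).
    intercept = y_bar - slope * x_bar
    return intercept, slope

intercept, slope = ols_bivariate(hours, marks)
print(f"mark = {intercept:.3f} + {slope:.3f} * hours")
```

However many students are in the sample, the output is always just two numbers: an intercept and a slope.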
The discipline is a relatively ‘young’ one and there have been several advances in the field since its
inception. These slides present some background on the more important figures and institutions in
the development of the field.
Note that advances in AI technology are likely to affect practice in the field in future. Some
economists are already using machine learning techniques to estimate models, and AI may either
replace some of the steps undertaken in econometric modelling or introduce new modelling
techniques. At this stage it is unclear exactly how AI will affect econometrics and other
fields.
The remainder of the slides go through the eight-step methodology of an econometric analysis. This
is a very general procedural guide to how econometrics is done.
Slide 8. Typically, you’ll start out with a hypothesis or some research question you’d like to answer.
The example used here is the marginal propensity to consume (MPC). This is a simple
macroeconomic theory you would have encountered in your first-year macro course. It states that as
aggregate income increases, aggregate consumption increases, but by less than the increase in
income. That is, the value of the MPC lies between zero and one.
Slide 9. Next, you’ll write down a mathematical model for this theory. Writing down the
mathematical model is useful as it allows you to make clear statements about the relationships of
the variables of interest. In this case, the model you saw in first year macro will do. It is simply a
linear equation with consumption as the dependent variable and income as the independent
variable. In this mathematical model a linear relationship is specified for these two variables. There
are two parameters in the model: a slope and an intercept. These are called, respectively, 𝛽2 and 𝛽1.
Recall that the hypothesis stated that 0 < 𝑀𝑃𝐶 < 1, or, to put it in terms of the econometric model,
0 < 𝛽2 < 1. This model is ‘deterministic’ – it indicates that the relationship between consumption
and income is exact. Clearly, this is not credible as economic models (unlike, say, models in physics)
are never exact.
Slide 11. This is why we transform the mathematical model into an econometric (or statistical)
model. We do so by adding an ‘error term’ to the mathematical model – the 𝑢 in the equation
𝐶𝑜𝑛𝑠𝑢𝑚𝑝𝑡𝑖𝑜𝑛 = 𝛽1 + 𝛽2 𝐼𝑛𝑐𝑜𝑚𝑒 + 𝑢. This indicates that the relationship is not exact and that
there may be factors other than income which influence consumption expenditure.
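The difference between the deterministic and the econometric model can be sketched in a few lines of Python. The parameter values, income levels and the size of the error term below are all hypothetical, chosen only for illustration. The error term is drawn from a normal distribution, which is an assumption of this sketch rather than something stated on the slide.

```python
import random

# Hypothetical parameter values, chosen only for illustration.
BETA1 = 0.1  # intercept
BETA2 = 0.6  # slope (the MPC), between zero and one

def consumption(income, u=0.0):
    """Consumption implied by the model; u = 0 gives the deterministic version."""
    return BETA1 + BETA2 * income + u

rng = random.Random(42)  # fixed seed so the illustration is reproducible
for income in [1.0, 1.5, 2.0, 2.5, 3.0]:
    u = rng.gauss(0, 0.05)  # error term: all influences other than income
    print(f"income={income:.1f}  exact={consumption(income):.3f}  "
          f"with error={consumption(income, u):.3f}")
```

With `u = 0` every point lies exactly on the line. Once the error term is included, the points scatter around the line, which is what real data look like.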
Slide 12. Now that we have an econometric model, we need to ‘feed’ some data into it. For this
analysis, since we are dealing with aggregate values, we’ll need time-series data on consumption
and income. If we were interested in the micro level (or individual/household level) relationship
between consumption and income we might have gathered some cross-section data on the two
variables from households or individuals.
Slide 13 shows a plot of the data. As shown on the axes, both consumption and income are
measured in trillions of rands.
Slide 14 shows the estimated regression model (we will deal with how to estimate these values later
in the course). The two parameters now have values. The intercept is estimated to be −0.125 and the
slope is estimated to be 0.63. How do we interpret these estimated values? Recall that in a linear
equation the 𝑦 variable equals the intercept when the 𝑥 variable is set to zero. Further, the slope
indicates the change in 𝑦 for a one unit change in 𝑥.
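The interpretation can be checked numerically with a short Python sketch. The slope of 0.63 is taken from the slide. The intercept is entered as −0.125, on the assumption that it carries the negative sign used in the policy calculation at the end of these notes. The function name `predicted_consumption` is illustrative.

```python
INTERCEPT = -0.125  # assumed negative, as in the policy example (R trillion)
SLOPE = 0.63        # estimated slope from the slide (the MPC)

def predicted_consumption(income):
    """Fitted consumption (R trillion) for a given income (R trillion)."""
    return INTERCEPT + SLOPE * income

# Intercept: the fitted value of consumption when income is zero.
at_zero = predicted_consumption(0.0)

# Slope: the change in fitted consumption for a one-unit (R1 trillion) rise in income.
change = predicted_consumption(3.0) - predicted_consumption(2.0)

print(f"Fitted consumption at zero income: {at_zero:.3f}")
print(f"Change in consumption per extra R1 trillion of income: {change:.2f}")
```

The second number is the estimated MPC: each additional rand of income is associated with about 63 cents of additional consumption, which is consistent with the hypothesis that the MPC lies between zero and one.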
Slide 15 illustrates the regression line. Notice that for the data points near the origin, the regression
line ‘underestimates’ the data. The same appears to be the case at the top right of the plot. In some
cases, the estimated line ‘overestimates’ the data. What do you think? Is this a reasonable fit of the
data?
Slide 16 discusses hypothesis testing. This takes place once the parameters have been estimated and
subjects the estimates to further ‘robustness’ checks. Estimates derived from sample data are
necessarily imperfect and subject to sampling variation – meaning that applying the same model to a
different sample of data may yield a slightly different set of parameter estimates. In this example
we could test, for instance, whether the true value of the slope (the MPC) is close to one.
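Sampling variation can be illustrated with a small simulation, sketched below in Python. Everything here is an assumption made for illustration: the ‘true’ parameter values, the sample size, the normally distributed error term and the function names. The sketch draws three samples from the same model, re-estimates the slope each time, and computes the usual t-statistic for the hypothesis that the true slope equals one (the estimate minus one, divided by its standard error).

```python
import math
import random

def ols_with_se(x, y):
    """Return (intercept, slope, standard error of slope) for y = b1 + b2*x + u."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    sigma2 = sum(e ** 2 for e in residuals) / (n - 2)  # residual variance
    return intercept, slope, math.sqrt(sigma2 / sxx)

def draw_sample(rng, n=30, beta1=-0.125, beta2=0.63, sigma=0.05):
    """Simulate n observations from an assumed 'true' model (hypothetical values)."""
    x = [rng.uniform(1.0, 4.0) for _ in range(n)]
    y = [beta1 + beta2 * xi + rng.gauss(0, sigma) for xi in x]
    return x, y

rng = random.Random(0)  # fixed seed so the illustration is reproducible
results = []
for sample_id in (1, 2, 3):
    x, y = draw_sample(rng)
    b1, b2, se = ols_with_se(x, y)
    t_stat = (b2 - 1.0) / se  # test statistic for H0: true slope = 1
    results.append((b2, se, t_stat))
    print(f"sample {sample_id}: slope={b2:.3f}  se={se:.3f}  t={t_stat:.1f}")
```

Each sample gives a slightly different slope even though the underlying model never changes. A t-statistic far from zero is evidence against the hypothesis that the true slope is one.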
Slide 17 mentions forecasting and prediction. This is not something we’ll touch on at all in this
course. But these would be logical uses of regression results.
Slide 18 talks about policy uses of regression results. This, again, is not something that will be
touched on in ECO242. Naturally, in an ideal world, government would base its policy agenda on
evidence. Regression analysis is one means of producing evidence for policy decision-making
purposes. In the simplified example on this slide, government is targeting a level of consumption
expenditure equal to R2 trillion. The estimated regression model indicates that national income
(proxied by GDP) must be R3.373 trillion in order to obtain consumption expenditure of R2 trillion in
this economy. This is found by simply ‘plugging in’ the number 2 into the model. Recall that
consumption is the 𝑌 (or dependent) variable in this model, so if we want to know what level of
income is associated with consumption of R2 trillion, we use the model as follows:
2 = −0.125 + 0.63𝐼𝑛𝑐𝑜𝑚𝑒.
Then, solve for income (GDP).
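The rearrangement can be written out as a short Python sketch. It uses a slope of 0.63 and an intercept of −0.125, the value that reproduces the R3.373 trillion figure quoted above. The function name `required_income` is illustrative.

```python
INTERCEPT = -0.125  # estimated intercept (R trillion), consistent with R3.373 trillion
SLOPE = 0.63        # estimated slope (the MPC)

def required_income(target_consumption, intercept=INTERCEPT, slope=SLOPE):
    """Solve target = intercept + slope * income for income."""
    return (target_consumption - intercept) / slope

income = required_income(2.0)
print(f"Required income: R{income:.3f} trillion")
# prints: Required income: R3.373 trillion
```

Substituting the answer back into the model returns consumption of R2 trillion, which is a quick way to check the algebra.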