
INTRODUCTION TO ECONOMETRICS
By: W. CHIWOKO

UNILIL BSc in Applied and Development Economics 04/30/202 1


5
What is Econometrics?

• Econometrics literally means measurement in economics (that is the meaning of the Greek word metrics).
• Econometrics may be defined as the quantitative analysis of actual economic phenomena, based on the concurrent development of theory and observation, related by appropriate methods of inference.
• Econometrics may also be defined as the social science in which the tools of economic theory, mathematics and statistical inference are applied to the analysis of economic phenomena.
• It includes all those statistical and mathematical techniques that are
utilized in the analysis of economic data.



Why Econometrics?
• The main aim of using econometric tools is to prove or disprove
particular economic propositions and models.
• Econometrics is the result of a certain outlook on the role of economics: it consists of the application of mathematical statistics to economic data to lend empirical support to the models constructed by mathematical economics and to obtain numerical results.
• Based on the definition above, econometrics is an amalgam of
economic theory, mathematical economics, economic statistics and
mathematical statistics.
Why Study Econometrics Separately?
• Economic theory makes statements or hypotheses that are mostly
qualitative in nature.
• For example, microeconomics states that, other things remaining the same (ceteris paribus), a reduction in the price of a commodity is expected to increase the quantity demanded of that commodity.
• Thus, economic theory postulates a negative or inverse relationship
between the price and quantity demanded of a commodity.
• But the theory itself does not provide any numerical measure of the relationship between the two.
• That is, it does not tell by how much the quantity will go up or down as a result of a certain change in the price of the commodity.
Why Study Econometrics Separately?
• The main concern of mathematical economics is to express economic theory in mathematical form (equations) without regard to measurability or empirical verification of the theory.
• Econometrics, as noted in our discussion above, is mainly
interested in the empirical verification of economic theory.
• The econometrician often uses the mathematical equations proposed by the mathematical economist but puts these equations in such a form that they lend themselves to empirical testing; this conversion of mathematical into econometric equations requires a great deal of ingenuity and practical skill.
Why Study Econometrics Separately?
• Economic statistics is mainly concerned with collecting, processing
and presenting economic data in the form of charts and tables.
• These are the jobs of the economic statistician.
• It is he or she who is primarily responsible for collecting data on gross national product (GNP), employment, unemployment, prices, etc.
• The data thus collected constitute the raw data for econometric work, but the economic statistician does not go any further, not being concerned with using the collected data to test economic theories; one who does that becomes an econometrician.



Why Study Econometrics Separately?
• Although mathematical statistics provides many tools used in the trade, the econometrician often needs special methods in view of the unique nature of most economic data, namely, that the data are not generated as the result of a controlled experiment.
• The econometrician, like the meteorologist, generally
depends on data that cannot be controlled directly.



Estimation and Testing Models:
• What is regression?
• Regression analysis is a fundamental statistical technique used in many fields, from finance and econometrics to the social sciences.
• It is the statistical method (technique) used for estimating the relationship between a dependent variable and one or more independent (explanatory) variables.
• It is used to assess the strength of the relationship between variables.
• The goal of regression analysis is to express the response variable (dependent variable) as a function of the predictor variables (independent variables).
• Types of regression analysis
• Simple linear regression
• Multiple linear regression
• Logistic regression
• Multivariate Linear regression
Types of regression Analysis
• Simple Linear Regression: Involves one independent variable
and one dependent variable.
• The equation is: y = β0 + β1X
• where y is the dependent variable, X is the independent variable, β0 is the intercept, and β1 is the slope.
• Multiple Linear Regression: Involves more than one independent
variable. The equation is: y = β0 + β1X1 + β2X2 + ... + βnXn
• where X1, X2, ..., Xn are the independent variables, and β0, β1, ..., βn are the
coefficients
• NB: The Ordinary Least Squares (OLS) method helps in
estimating the parameters of this regression model.
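The multiple regression equation above can be estimated numerically. As a minimal sketch with hypothetical made-up data (the coefficients 2, 3, and −1 are chosen purely for illustration), using NumPy's least-squares solver:

```python
import numpy as np

# Hypothetical data generated exactly by y = 2 + 3*x1 - 1*x2 (no noise, so the fit is exact)
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
y = 2 + 3 * x1 - 1 * x2

# Design matrix: a column of ones for the intercept β0, then the regressors
X = np.column_stack([np.ones_like(x1), x1, x2])

# OLS: choose beta to minimize the sum of squared residuals ||y - X·beta||²
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)  # recovers approximately [2., 3., -1.]
```

Because the hypothetical data contain no noise, OLS recovers the generating coefficients exactly (up to floating-point error).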
Ordinary least squares (OLS) regression
• Ordinary least squares (OLS) is a technique used in a linear regression model to find the best-fitting line for a set of data points by minimizing the residuals (the differences between the observed and predicted values).
• The best fitting line in OLS regression is the line that minimizes the
sum of the squared differences (residuals) between the observed values
and the predicted values.
• This line is often referred to as the "regression line" or "line of best fit." Mathematically, it can be represented by the equation:
• ŷ = b0 + b1x
• Where: y is the dependent variable (the outcome we are trying to predict), x is the independent variable (the predictor), b0 is the intercept (the value of y when x is 0), and b1 is the slope (the change in y for a one-unit change in x).
Key takeaways in OLS
• The ordinary least squares (OLS) method can be defined as a linear
regression technique that is used to estimate the unknown parameters
in a model.
• SSR measures the level of variance in the error terms, or residuals, of a regression model.
• So the OLS method minimizes the sum of squared residuals (SSR), defined as the sum of squared differences between the actual (observed) values of the dependent variable and the values predicted by the model.
• SSR = ∑(yi − ŷi)²
• where yi are the observed values and ŷi are the predicted values.
• The smaller the SSR, the better your model fits your data, and vice versa.
But what are residuals?
• Residuals are the differences between the observed data
values and the least squares regression line.
• The difference between the actual(observed) value and the value
predicted by the model (y- ŷ).
• The line represents the model’s predictions
• There is one residual per data point, and they collectively
indicate the degree to which the model is wrong.
• To calculate the residual mathematically, it’s simple
subtraction.
• Residual = Observed value – Model value.
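The subtraction above, and the SSR built from it, can be sketched in a few lines. The observations and fitted coefficients below are hypothetical, chosen only to illustrate the arithmetic:

```python
import numpy as np

# Hypothetical observations and an assumed fitted line (made-up coefficients)
x = np.array([1.0, 2.0, 3.0, 4.0])
y_observed = np.array([2.1, 3.9, 6.2, 7.8])
b0, b1 = 0.1, 1.95           # assumed intercept and slope
y_predicted = b0 + b1 * x    # the model's predictions (the regression line)

residuals = y_observed - y_predicted  # Residual = Observed value - Model value
ssr = np.sum(residuals ** 2)          # sum of squared residuals
print(residuals, ssr)
```

One residual per data point; squaring and summing them gives the quantity OLS minimizes.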
Graphical representation of the residuals

(Figure: scatter plot of observed values around the fitted regression line; the vertical distances from the points to the line are the residuals.)


How to calculate the line of best fit?

• The following equation represents the best-fitting regression line: y = b + mx
• Where:
o y is the dependent variable.
o x is the independent variable.
o b is the y-intercept.
o m is the slope of the line.
• The slope represents the mean change in the dependent variable for a one-unit change in the independent variable.
• We need to calculate the values of m and b to find the equation for the best-fitting line.
• Here are the least squares regression line formulas for the slope (m) and intercept (b):
• m = ∑(xi − x̄)(yi − ȳ) / ∑(xi − x̄)²
• b = ȳ − m·x̄
UNILIL BSc in Applied and Development Economics 04/30/2025 15
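The slope and intercept formulas can be applied directly. A short sketch with hypothetical data points (any small dataset works the same way):

```python
import numpy as np

# Hypothetical data points
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 6.0])

x_bar, y_bar = x.mean(), y.mean()

# Least squares formulas: m = Σ(xi - x̄)(yi - ȳ) / Σ(xi - x̄)²,  b = ȳ - m·x̄
m = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
b = y_bar - m * x_bar
print(m, b)  # slope 0.8, intercept 1.8
```

For this data, x̄ = 3 and ȳ = 4.2, giving m = 8/10 = 0.8 and b = 4.2 − 0.8·3 = 1.8.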


Worked examples

ŷ = 11.329 + 1.0616x



The Assumptions underlying the Method of OLS
• Like many statistical analyses, ordinary least squares (OLS) regression has its own
underlying assumptions.
• When these assumptions for linear regression hold, ordinary least squares produces the best estimates.
• However, if some of these assumptions do not hold, you might need to employ remedial measures or use other estimation methods to improve the results.
• Most of these assumptions pivot around the properties of the error term.
• Unfortunately, the error term is a population parameter that we never know in advance.
• Instead, we use the next best thing that is available: the residuals. Residuals are the sample estimate of the error for each observation.
OLS ASSUMPTION
• The Ordinary Least Squares (OLS) method is a
fundamental statistical technique used for estimating the
parameters in a linear regression model.
• The performance and validity of the OLS estimates rely on
certain assumptions known as the Gauss-Markov
assumptions.
• These assumptions help ensure that the OLS estimators are the Best Linear Unbiased Estimators (BLUE), meaning they have the smallest variance among all linear unbiased estimators.
Assumption 1: The regression model
is linear in parameters and variables
• This assumption addresses the functional form of the model.
• In statistics, the linearity of a model can be expressed in two ways:
• Linearity in variables and
• Linearity in parameters
• Linearity in variables means that the conditional expectation E(Y|Xi) is a linear function of Xi.
• That is, geometrically the regression curve in this case is a straight line, y = mXi + c.
• The powers of the variables are always one. That is,
• E(Y|Xi) = β0 + β1Xi is a linear function, whereas E(Y|Xi) = β0 + β1Xi² is not a linear function.
• In other words, a function y = f(x) is said to be linear in the variables if x appears with a power or index of 1 only and is not multiplied or divided by any other variable.
OLS Assumption 2: The
conditional mean should be zero.
• The expected value of the mean of the error terms of OLS
regression should be zero given the values of independent
variables.
• Mathematically, E(ε∣X)=0. This is sometimes just written
as E(ε)=0.
• In other words, the distribution of error terms has zero mean and doesn't depend on the independent variables X's. Thus, there must be no relationship between the X's and the error term.
Assumption 3: No multicollinearity among the independent variables
• In a simple linear regression model, there is only one independent variable and
hence, by default, this assumption will hold true.
• However, in the case of multiple linear regression models, there are more than one
independent variable.
• The OLS assumption of no multi-collinearity says that there should be no linear
relationship between the independent variables.
• For example, suppose you spend your 24 hours in a day on three things – sleeping, studying, or
playing. Now, if you run a regression with dependent variable as exam score/performance and
independent variables as time spent sleeping, time spent studying, and time spent playing, then
this assumption will not hold.
• This is because there is perfect collinearity between the three independent variables.
• Time spent sleeping = 24 – Time spent studying – Time spent playing.
• In such a situation, it is better to drop one of the three independent
variables from the linear regression model.
• If the relationship (correlation) between independent variables is
strong (but not exactly perfect), it still causes problems in OLS
estimators.
• Hence, this OLS assumption says that you should select independent variables
that are not correlated with each other.
• An important implication of this assumption of OLS regression is that there should be sufficient variation in the X's. The more variability there is in the X's, the better the OLS estimates are in determining the impact of the X's on Y.
OLS Assumption 4: Spherical errors: There is
homoscedasticity and no autocorrelation.

• According to this OLS assumption, the error terms in the


regression should all have the same variance.
• Mathematically, Var(ε|X) = σ².
• If this variance is not constant (i.e. dependent on the X's), then the linear regression model has heteroscedastic errors and is likely to give incorrect estimates.
• This OLS assumption of no autocorrelation says that the error
terms of different observations should not be correlated with
each other.
Autocorrelation
• Mathematically, Cov(εi, εj | X) = 0 for i ≠ j.
• For example, when we have time series data (e.g. yearly
data of unemployment), then the regression is likely to
suffer from autocorrelation because unemployment next
year will certainly be dependent on unemployment this
year.
• Hence, error terms in different observations will surely
be correlated with each other.



How do we check for heteroscedasticity?
We can use several methods to detect heteroscedasticity, but we will delve into two methods for a start:
1. Breusch-Pagan test
• You look at the p-value of the chi² statistic.
• If the p-value is not significant, we fail to reject the null hypothesis; the error terms have a constant variance, hence homoskedasticity.
• If the p-value is significant, we reject the null hypothesis; the error terms do not have a constant variance, hence heteroskedasticity.
2. Using scatter plot graphs
• The vertical axis shows the residuals (the differences between observed and
predicted values)
• The Horizontal axis shows the fitted (predicted) values
• Then look for the patterns
• A uniform spread of points suggests homoscedasticity (constant variance).
• When the points form a funnel shape, it suggests heteroscedasticity (non-constant variance).
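The Breusch-Pagan idea can be sketched from first principles (this is a hand-rolled illustration of the test's logic, not any particular package's implementation): fit the regression, regress the squared residuals on the regressors, and compare LM = n·R² of that auxiliary regression against a chi-squared distribution. The simulated data below are hypothetical, with error spread deliberately growing with x:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical data with heteroscedastic errors: the error spread grows with x
n = 200
x = rng.uniform(1, 10, n)
y = 2 + 3 * x + rng.normal(0, x)  # error standard deviation depends on x

# Step 1: fit the main regression and compute the residuals
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

# Step 2: auxiliary regression of squared residuals on the regressors
u2 = resid ** 2
gamma, *_ = np.linalg.lstsq(X, u2, rcond=None)
fitted = X @ gamma
r2_aux = 1 - np.sum((u2 - fitted) ** 2) / np.sum((u2 - u2.mean()) ** 2)

# Step 3: LM statistic follows chi² with df = number of non-constant regressors
lm = n * r2_aux
p_value = stats.chi2.sf(lm, df=1)
print(p_value)  # a small p-value rejects constant variance (heteroskedasticity)
```

With these strongly heteroscedastic simulated errors, the test rejects the null of constant variance, matching the decision rule in the bullets above.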
• In simple terms, this OLS assumption means that the error
terms should be IID (Independent and Identically
Distributed).
• (Figure: plots contrasting homoscedasticity and heteroscedasticity.) The variance of errors is constant in the case of homoscedasticity, while that is not the case if errors are heteroscedastic.



OLS Assumption 5: Error terms should be
normally distributed.

• This assumption states that the errors are normally distributed,


conditional upon the independent variables.
• This OLS assumption is not required for the validity of the OLS method; however, it becomes important when one needs to establish some additional finite-sample properties.
• Note that only the error terms need to be normally distributed. The
dependent variable Y need not be normally distributed.



The Use of OLS Assumptions
• OLS assumptions are extremely important. If the OLS
assumptions 1 to 4 hold, then according to Gauss-Markov
Theorem, OLS estimator is Best Linear Unbiased Estimator
(BLUE).
• These are desirable properties of OLS estimators and require separate discussion in detail. However, the focus below is on the importance of the OLS assumptions: what happens when they fail, and how you can look out for potential problems when the assumptions are violated.



The Assumption of Linearity
• If you fit a linear model to data that are non-linearly related, the model will be incorrect and hence unreliable.
• When you use the model for extrapolation, you are likely to get erroneous results. Hence, you should always plot a graph of observed vs. predicted values.
• If this graph is symmetrically distributed along the 45-degree line,
then you can be sure that the linearity assumption holds.
• If linearity assumptions don’t hold, then you need to change the
functional form of the regression, which can be done by taking non-
linear transformations of independent variables (i.e. you can
take logX instead of X as your independent variable) and then check
for linearity.
The Assumption of Homoscedasticity
• If errors are heteroscedastic (i.e. OLS assumption is violated), then it
will be difficult to trust the standard errors of the OLS estimates.
• Hence, the confidence intervals will be either too narrow or too wide.
• Also, violation of this assumption has a tendency to give too much
weight on some portion (subsection) of the data.
• Hence, it is important to fix this if error variances are not constant.
• You can easily check if error variances are constant or not. Examine the plot of residuals vs. predicted values, or residuals vs. time (for time series models).
• Typically, if the data set is large, then errors are more or less
homoscedastic. If your data set is small, check for this assumption.
The Assumption of Normality of Errors
• If error terms are not normal, then the standard errors of OLS estimates
won’t be reliable, which means the confidence intervals would be too wide or
narrow.
• Also, OLS estimators won’t have the desirable BLUE property.
• A normal probability plot or a normal quantile plot can be used to check if the
error terms are normally distributed or not.
• A bow-shaped deviated pattern in these plots reveals that the errors are not
normally distributed.
• Sometimes errors are not normal because the linearity assumption is not
holding.
• NB: So, it is worthwhile to check for linearity assumption again if this
assumption fails.
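Alongside the normal probability plot described above, a formal numerical check such as the Shapiro-Wilk test (available in SciPy, a swapped-in alternative to eyeballing the plot) can be sketched on hypothetical simulated residuals:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Hypothetical residuals, drawn here from a normal distribution for illustration
residuals = rng.normal(0, 1, 100)

# Shapiro-Wilk test: null hypothesis = the sample is normally distributed
statistic, p_value = stats.shapiro(residuals)
print(statistic, p_value)  # a large p-value means we cannot reject normality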
UNILIL BSc in Applied and Development Economics 04/30/2025 31
Assumption of No
Multicollinearity
• You can check for multicollinearity by making a correlation matrix
(though there are other complex ways of checking them like
Variance Inflation Factor, etc.).
• Almost a sure indication of the presence of multi-collinearity is
when you get opposite (unexpected) signs for your regression
coefficients (e. if you expect that the independent variable
positively impacts your dependent variable but you get a negative
sign of the coefficient from the regression model). It is highly
likely that the regression suffers from multi-collinearity.
• If the variable is not that important intuitively, then dropping that
variable or any of the correlated variables can fix the problem.
UNILIL BSc in Applied and Development Economics 04/30/2025 32
Coefficient of Determination (R²)
• The coefficient of determination is a number between 0 and
1 that measures how well a statistical model predicts an
outcome.
• The coefficient of determination is often written as R2, which
is pronounced as “r squared.”
• The coefficient of determination (R²) measures how well a
statistical model predicts an outcome. The outcome is
represented by the model’s dependent variable.
• The lowest possible value of R² is 0 and the highest possible
value is 1. Put simply, the better a model is at making
predictions, the closer its R² will be to 1.
UNILIL BSc in Applied and Development Economics 04/30/2025 33
• Example: Imagine that you perform a simple linear regression
that predicts students’ exam scores (dependent variable) from
their time spent studying (independent variable).
• If the R2 is 0, the linear regression model doesn’t allow you to
predict exam scores any better than simply estimating that
everyone has an average exam score.
• If the R2 is between 0 and 1, the model allows you to partially
predict exam scores. The model’s estimates are not perfect, but
they’re better than simply using the average exam score.
• If the R2 is 1 or closer to 1, the model allows you to perfectly
predict anyone’s exam score.
UNILIL BSc in Applied and Development Economics 04/30/2025 34
• More technically, R2 is a measure of goodness of fit. It is the
proportion of variance in the dependent variable that is
explained by the model.
• Graphing your linear regression data usually gives you a
good clue as to whether its R2 is high or low. For example,
the graphs below show two sets of simulated data:
• The observations are shown as dots.
• The model’s predictions (the line of best fit) are shown as a black line.
• The distance between the observations and their predicted values
(the residuals) are shown as purple lines.
UNILIL BSc in Applied and Development Economics 04/30/2025 35
• You can see in the first dataset that when the R2 is high, the observations are close
to the model’s predictions. In other words, most points are close to the line of best
fit:
UNILIL BSc in Applied and Development Economics 04/30/2025 36
• In contrast, when the R2 is low, the observations are far from the model’s
predictions. In other words, when the R2 is low, many points are far from
the line of best fit:

• NB: The coefficient of determination is always positive, even when


the correlation is negative.
UNILIL BSc in Applied and Development Economics 04/30/2025 37
Calculating the R-squared
• You can choose between two formulas to calculate the
coefficient of determination (R²) of a simple linear
regression.
• The first formula is specific to simple linear regressions, and
the second formula can be used to calculate the R² of many
types of statistical models.
• Formula 1:
• Where r is the pearson correlation coefficient

UNILIL BSc in Applied and Development Economics 04/30/2025 38


• Example: Calculating R² using the correlation coefficient
• You are studying the relationship between heart rate
and age in children, and you find that the two variables
have a negative Pearson correlation(r) of -0.28:

• = 0.08

UNILIL BSc in Applied and Development Economics 04/30/2025 39


Formula 2: Using the regression outputs

• =
• Where:
• RSS = ∑( Ŷ- Ῡ = sum of squared residuals
• TSS = ∑ = total sum of squares
• ESS = ∑ (= error sum of squares

UNILIL BSc in Applied and Development Economics 04/30/2025 40


Example 2: Calculating and interpretation
of R²
• As part of performing a simple proportion of variance in
linear regression that predicts the dependent variable that is
students’ exam scores predicted by the
(dependent variable) from their statistical model.
study time (independent • 71% of the variance in
variable), you calculate that: students’ exam scores is
• TSS = 2187.04 predicted by their study time
• ESS =629.22 • 29% of the variance in
• Therefore = student’s exam scores is
unexplained by the model
• =
• The students’ study time has
• = 1 – 0.29 = = 0.71 a large effect on their exam
• You can interpret the coefficient scores
UNILIL BSc in Applied and Development Economics 04/30/2025 41
of determination (R²) as the
. import excel "C:\Users\hp\Desktop\EXAMPLE.xlsx", sheet("Sheet1") firstrow
(2 vars, 5 obs)

. reg Y X

Source SS df MS Number of obs = 5


F(1, 3) = 7.54
Model 32.9109589 1 32.9109589 Prob > F = 0.0710
Residual 13.0890411 3 4.3630137 R-squared = 0.7155
Adj R-squared = 0.6206
Total 46 4 11.5 Root MSE = 2.0888

Y Coefficient Std. err. t P>|t| [95% conf. interval]

X 1.061644 .3865466 2.75 0.071 -.16852 2.291808


_cons 11.32877 1.940449 5.84 0.010 5.153394 17.50414

.
UNILIL BSc in Applied and Development Economics 04/30/2025 42
Is the R-squared enough?
• However, R2 is less useful in measuring the goodness of fit
of a multiple regression model. This is because it increases
each time you add new independent variables, even if the
variation explained by them may not be statistically
significant.
• An overfitted model contains deceptively high multiple
R2 values thus have a decreased ability to make precise
predictions.
• So it is appropriate to use the adjusted R-squared
UNILIL BSc in Applied and Development Economics 04/30/2025 43
Adjusted R2

• Adjusted R2 , adjusts for the


number of independent
variables in the model. Its
value increases only when the
added independent variables
improve the fit of the
regression model.
• The adjusted R2 can be
• Furthermore, it decreases
negative if R2 is low enough.
when the added variables do
not improve the model fit by a• However, R2 is always
good enough amount. positive.
UNILIL BSc in Applied and Development Economics 04/30/2025 44
• Where:
• n = number of observations.
• k = number of the independent variables (slope
coefficients).

UNILIL BSc in Applied and Development Economics 04/30/2025 45

You might also like