Advanced Econometrics Assignment
Autoregressive Model:
Autoregressive models are a class of machine learning (ML) models that automatically
predict the next component in a sequence by taking measurements from previous inputs in the
sequence. Autoregression is a statistical technique used in time-series analysis that assumes
that the current value of a time series is a function of its past values. Autoregressive models
use similar mathematical techniques to determine the probabilistic correlation between
elements in a sequence. They then use the knowledge derived to guess the next element in an
unknown sequence. For example, during training, an autoregressive model processes several
English language sentences and identifies that the word “is” always follows the word
“there.” It then generates a new sequence that has “there is” together.
Linear regression
You can imagine linear regression as drawing a straight line that best represents the average
values distributed on a two-dimensional graph. From the straight line, the model generates a
new data point corresponding to the conditional distribution of historical values.
Consider the simplest form of the line equation between y (the dependent variable) and x
(the independent variable): y = c*x + m, where c and m are constants that do not depend on x
or y. For example, if the input dataset of (x, y) pairs is (1, 5), (2, 8), and (3, 11), linear
regression finds the values c = 3 and m = 2, so that every point satisfies y = 3x + 2.
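As a sketch of this fit (not part of the original text; numpy's polyfit is used here as one of many ways to perform the least-squares fit):

```python
import numpy as np

# Sample dataset from the text: (x, y) pairs (1, 5), (2, 8), (3, 11)
x = np.array([1.0, 2.0, 3.0])
y = np.array([5.0, 8.0, 11.0])

# Least-squares fit of a degree-1 polynomial y = c*x + m
c, m = np.polyfit(x, y, 1)
print(c, m)  # ≈ 3.0 and 2.0: the points lie exactly on y = 3x + 2

# Generate a new data point from the fitted line
print(c * 4 + m)  # ≈ 14.0
```

Because the three points lie exactly on a line, the fitted line passes through all of them and the residuals are zero.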
Autoregressive models apply linear regression to lagged values of the series itself, taken from
previous time steps. Unlike ordinary linear regression, the autoregressive model uses no
independent variables other than the series’ own past values. An autoregressive model of
order p, written AR(p), takes the form:

y[t] = δ + ϕ₁y[t-1] + ϕ₂y[t-2] + . . . + ϕₚy[t-p] + ε[t]

where δ is a constant, ϕ₁, . . . , ϕₚ are the lag coefficients, and ε[t] is a random error term.
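A minimal sketch of this idea (synthetic data and numpy least squares; not from the original text): regress the series on its own lagged values to recover the AR coefficients.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate an AR(2) series: y[t] = 0.6*y[t-1] + 0.2*y[t-2] + noise
n = 5000
y = np.zeros(n)
for t in range(2, n):
    y[t] = 0.6 * y[t - 1] + 0.2 * y[t - 2] + rng.normal()

# Build the lagged design matrix [1, y[t-1], y[t-2]] and fit by least squares
X = np.column_stack([np.ones(n - 2), y[1:-1], y[:-2]])
coef, *_ = np.linalg.lstsq(X, y[2:], rcond=None)
print(coef)  # roughly [0.0, 0.6, 0.2]
```

The estimated coefficients approach the true lag weights as the series length grows, which is exactly the "regression on previously observed values" the text describes.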
Lag
Data scientists add more lagged values to improve autoregressive modeling accuracy. They
do so by increasing the lag order p, which denotes the number of past time steps included as
predictors. A higher number of lags allows the model to capture more past values as input.
For example, you can expand an autoregressive model for temperature from the past 7 days
to the past 14 days in the hope of a more accurate outcome. That said, increasing the lag
order of an autoregressive model does not always result in improved accuracy: if a
coefficient is close to zero, the corresponding predictor has little influence on the result of the
model. Moreover, indefinitely expanding the sequence results in a more complex model
requiring more computing resources to run.
In time series applications, we often use the distributed-lag model to assess the dynamic
effects of a predictor x on the response variable y. Dynamic effects here refer to influence
that occurs incrementally over time rather than all at once. In the simplest case of one
explanatory variable x, the response y in time period t is specified as a linear combination
of x values in the same and previous periods:

y[t] = δ + ϕ₀x[t] + ϕ₁x[t-1] + ϕ₂x[t-2] + . . . + ε[t]

where
– x[t], x[t-1], x[t-2], . . . denote the x values at time periods t, t-1, t-2, . . . , with ϕ₀, ϕ₁,
ϕ₂, . . . representing the weights of these x values, and δ is a constant intercept
– ε[t] denotes a random variable that represents the unexplainable variation in y[t]; it
is typically assumed to follow a Gaussian distribution with zero mean and constant variance
A distributed-lag model with infinite lags assumes that y is related to values of x occurring
arbitrarily far in the past. In cases where the influence of x on y diminishes to zero within a
finite number of time periods, a finite number of lags in the model is sufficient.
This model has one explanatory variable x measured at t, t-1, and t-2, with parameters fixed
at δ = 1.2, ϕ₀ = 0.5, ϕ₁ = 0.8, and ϕ₂ = 0.3, so that y[t] = 1.2 + 0.5x[t] + 0.8x[t-1] + 0.3x[t-2] + ε[t].
Let us assume that ε[t] = 0 so that we can focus on analyzing the contribution of x to the
response y.
Temporary Change in x
Suppose the explanatory variable x increases temporarily from 0.0 in period t-1 to 1.0 in
period t, and returns to 0.0 in subsequent periods. Examples of x and y are depicted by the
blue and orange lines, respectively, in Figure 1.
From period t-1 to t, the temporary unit increase in x causes y to increase from 1.2 to 1.7. This
change is equal to the value of ϕ₀, which is the rate of change of y with respect to x in the
same time period. It is also commonly known as the immediate effect or short-run
effect. From period t-1 to t+1, y increases from 1.2 to 2.0, and this change is equal to the value
of ϕ₁. Similarly, from period t-1 to t+2, y increases by the value of ϕ₂. We can also infer from
this example that, if x changes by -1.0 from period t-1 to t, then the effects on y would have
the same magnitudes as defined by the lag weights but with opposite sign.
This example shows that the effect on y is not sustained when the change in x is temporary.
In the absence of error and given an arbitrary temporary change in x from period t-1 to t, the
resulting change in y from period t-1 to t+j for j ≥ 0 is proportional to the change in x. The
proportionality constant is given by the lag weight associated with x[t-j] in the model, or zero
if x[t-j] does not appear in the model.
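As an illustrative sketch (using the parameter values from the text, with ε[t] = 0 as assumed above), the response path to a temporary pulse in x can be computed directly:

```python
# Distributed-lag model from the text: y[t] = δ + ϕ0*x[t] + ϕ1*x[t-1] + ϕ2*x[t-2]
delta, phi = 1.2, [0.5, 0.8, 0.3]

# Temporary change: x is 0 everywhere except a unit pulse in period t (index 2)
x = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0]

# Compute y for periods where all three lags are available (ε[t] = 0)
y = [delta + sum(p * x[t - j] for j, p in enumerate(phi))
     for t in range(2, len(x))]
print(y)  # → [1.7, 2.0, 1.5, 1.2] up to floating-point rounding
```

The response rises by ϕ₀, then reflects ϕ₁ and ϕ₂ in the next two periods, and finally returns to the baseline δ = 1.2 once the pulse has passed out of the lag window.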
Permanent Change in x
Now consider when the explanatory variable x increases from 0.0 in period t-1 to 1.0 in
period t, and remains at 1.0 permanently. The change in y from period t-1 to t is, again,
given by the value of ϕ₀. However, since the change in x is permanent, y does not return
to its original level in future periods. Figure 2 illustrates how y evolves due to a permanent
unit change in x.
From period t-1 to t+1, the permanent unit increase in x causes y to increase from 1.2 to 2.5.
This change is equal to the value of ϕ₀+ϕ₁. From period t-1 to t+2, y increases from 1.2 to
2.8, a change equal to the value of ϕ₀+ϕ₁+ϕ₂. In periods t+3 and beyond, y persists at
2.8 as x remains at 1.0. The sum of all lag weights determines the eventual change in y. This is
commonly referred to as the long-run effect on y in response to a permanent unit change in x.
This example shows that the effects on y would persist when the change in x is permanent.
In the absence of error and given an arbitrary permanent change in x beginning in period t,
the resulting change in y from period t-1 to t+j for j≥0 is proportional to the change in x.
The proportionality constant is given by the sum of the lag weights associated with x[t],
x[t-1], . . . , x[t-j] in the model.
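The same sketch can be repeated for a permanent step change in x (again with the text's parameter values and ε[t] = 0):

```python
# Distributed-lag model from the text: y[t] = δ + ϕ0*x[t] + ϕ1*x[t-1] + ϕ2*x[t-2]
delta, phi = 1.2, [0.5, 0.8, 0.3]

# Permanent change: x steps from 0.0 to 1.0 in period t (index 2) and stays there
x = [0.0, 0.0, 1.0, 1.0, 1.0, 1.0]

y = [delta + sum(p * x[t - j] for j, p in enumerate(phi))
     for t in range(2, len(x))]
print(y)  # → [1.7, 2.5, 2.8, 2.8] up to floating-point rounding

# The long-run effect equals the sum of the lag weights (≈ 1.6)
print(sum(phi))
```

The path accumulates ϕ₀, then ϕ₀+ϕ₁, then ϕ₀+ϕ₁+ϕ₂, after which y settles at the new level δ plus the long-run effect.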
Simultaneous Equation Model
The identification problem is a deductive, logical issue that must be solved before estimating
an economic model. In a demand and supply model, the equilibrium point belongs to both
curves, and many presumptive curves can be drawn through such a point. We need prior
information on the slopes, intercepts, and error terms to identify the true from the
presumptive demand and supply curves. Such prior information will give a set of structural
equations. If the equations are linear, and the error terms are normally distributed with zero
mean and constant variance, then a model is formed for estimation. A typical identification
process may fix the demand curve and shift the supply curve, cutting the demand curve at
many points to trace it out. By the zero mean assumption of the error term, half the
observations are expected above and half below the demand curve. In the same way, the
supply curve can be identified. This method originated with Ragnar Frisch (1938) and Trygve
Haavelmo (1944). Tjalling Koopmans developed the order and rank conditions for identifying
linear models (1949). Franklin Fisher’s work was the first major textbook on the subject
(1966), and Charles Manski extended it to the social sciences (1995). The rank condition
guarantees that the equations can be solved. Econometric texts often create a spreadsheet to
demonstrate the rank condition. For the model above, the columns are labeled with the
variables Q, P, Y, T, and the rows contain information on the equations. Each cell holds either a
0 for an excluded variable or a 1 for an included variable. For the demand function above, the
row vector is [1, 1, 1, 0], and for the supply function it is [1, 1, 0, 1]. To check the order
condition for the demand curve, first locate the zero in its vector, then pick up the
corresponding number in the supply vector. The picked-up number, which is 1, should be
equal to M − 1, where M is the number of equations, which here is also 1. With many
equations, the numbers that we pick up will array into many rows and columns. The general
rank test requires one to find M − 1 rows and M − 1 columns in that array whose elements are
not all zeros, because such an (M − 1) × (M − 1) spreadsheet makes the model solvable.
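A minimal sketch of this bookkeeping (the 0/1 incidence spreadsheet stands in for the structural coefficient matrix, as in the text; the variable names Q, P, Y, T follow the example above):

```python
import numpy as np

# Incidence spreadsheet: rows = equations, columns = Q, P, Y, T
# (1 = variable included in the equation, 0 = excluded)
incidence = np.array([
    [1, 1, 1, 0],  # demand: includes Q, P, Y; excludes T
    [1, 1, 0, 1],  # supply: includes Q, P, T; excludes Y
])
M = incidence.shape[0]  # number of equations

def identified(eq):
    """Rank check for one equation: take the columns of the variables it
    excludes, restricted to the other equations, and require rank M - 1."""
    excluded_cols = incidence[eq] == 0
    sub = np.delete(incidence, eq, axis=0)[:, excluded_cols]
    return np.linalg.matrix_rank(sub) == M - 1

print(identified(0), identified(1))  # True True: both curves are identified
```

For the demand equation the "picked-up" submatrix is the single entry 1 (supply's coefficient on T), which has rank 1 = M − 1, so demand is identified; the supply equation passes symmetrically via Y.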
The identification problem poses significant challenges in statistical modeling and causal
inference, with profound implications across various fields such as economics, social
sciences, and public health. At its core, the identification problem refers to the difficulty
in distinguishing between correlation and causation among variables. This challenge
arises when multiple factors influence an outcome, making it hard to determine which
variable directly affects another. As a result, researchers may draw erroneous conclusions
that can misguide theory development and empirical analysis.
One of the primary implications of the identification problem is its impact on causal
inference. In many cases, observed relationships between variables might be coincidental
rather than causal. For instance, a study might find that higher education levels correlate
with better health outcomes. Without proper identification, researchers might mistakenly
conclude that education directly improves health, overlooking potential confounding
factors such as socioeconomic status or access to healthcare. This misinterpretation can
lead to ineffective or misguided policies aimed at improving health through educational
initiatives, ultimately failing to address the root causes of health disparities.
The identification problem also affects data utilization in research. Researchers often
collect vast amounts of data, but if they cannot accurately identify the relationships
among variables, they may miss critical insights. Omitted variable bias, where
unobserved factors influence the dependent variable, can lead to incomplete or misleading
interpretations. This issue is particularly pronounced in observational studies, where
randomization is not feasible, making it essential for researchers to employ robust
methodologies to account for potential confounders.
Rule of identification:
All the rules listed here assume that errors are independent of exogenous variables,
and all variables have expected value zero.
The two-variable and three-variable rules apply to the Factor Analysis Model: X = ΛF
+ e. These rules assume that all errors are uncorrelated, and each observed variable is
caused by only one factor. If a model includes variables that are caused by more than
one factor, it may be possible to add them to the model later using the Expansion Rule
below.
If, for a given factor,
– The scale is fixed, meaning one factor loading equals one, and
– At least two additional observed variables have loadings that do not equal zero,
then the variance of the factor, the variances of the error terms, and the factor loadings
are all identified.
If, for a given factor,
– The model contains at least one other factor having a non-zero correlation with
this factor, and
– The scale of the other factor is fixed,
then the variance of the factor, the variance of the error term, and the one factor loading
are all identified.
– Apply the Double Measurement Rule to all relevant factors, identifying their
variances and covariances, as well as the variances and possibly covariances of sets of
error terms.
– Apply the two-variable and three-variable rules to each remaining factor, one at a time.
This will identify the factor loadings and the variances of the factors.
– If the scales are fixed for two factors, their covariance is identified provided the
error terms of the two measurements are independent.
The last two rules apply to general Structural Equation Models (such as the LISREL
Model) that have both latent and observed variables.
• Two-Step Rule
1: Consider the latent variable model as a model for observed variables. Check
identification (usually using the Counting Rule and the Recursive Rule).
2: Consider the measurement model as a factor analysis model, ignoring the structure
of V(F). Check identification.
– The additional variables are independent of the error terms in the original
measurement model.
– In the original measurement model, each latent variable has at least one
observed variable that is a function of that latent variable, and of no other latent variable
except for error terms. This will automatically be true if the rules above are used to establish
identification of the measurement model.