Econ3150 v12 Note01 PDF
Cross-Section Data (Tverrsnittsdata): These are data from units observed at the same time or in the same time period. The data may be single
observations from a sample survey or from all units in a population. Examples of
Norwegian cross-section data are the Household Budget Survey for the year 1999,
the Manufacturing Statistics for the year 2000, and the Population Census for the year
2001.
Most often cross-section data are data for micro units: individuals, households,
firms, companies, etc. But macro-like cross-section data may well occur; examples
are cross-section data for municipalities, other local units, counties or even countries. In cross-section data, all data variation goes across units; we have variation
across space (spatial variation).
Most often time-series data are macro data or macro-type data, for example time series for macro-economic variables from the National Accounts. But micro-data
may also occur as time-series, for example time-series for a particular household or
time-series for a particular firm. In time-series data the data variation goes over
time periods; we have variation over time (time serial variation).
(1) y = a + bx + cp + dz + u.
(2) y_{it} = a + b x_{it} + c p_t + d z_i + u_{it},
with the range of the subscripts i and t specified and a suitable set of assumptions for
the disturbances. In formulating (2) we have assumed that all households in each
period have been confronted with the same commodity price, and that the number
of persons in each specific household has not changed from year to year.
(3) y_{i1} = (a + c p_1) + b x_{i1} + d z_i + u_{i1}, i = 1, ..., N.
Since the price does not vary across the data set, we can combine the price term c p_1
with the genuine intercept a and interpret a + c p_1 as a cross-section intercept for
year 1. The variables in the cross-section data set therefore become y, x, z, with
observation set {y_{i1}, x_{i1}, z_i}, i = 1, ..., N. The disturbance u_{i1} varies across households.
Equation (3) shows the following: It is impossible to estimate the price coefficient
c from the cross-section data. This is because the price only varies along the time
dimension, not along the cross-sectional dimension. What we are able to estimate, for example by applying the Ordinary Least Squares (OLS) method to (3),
provided that the disturbances u_{i1} (i = 1, ..., N) satisfy the classical assumptions, are b, d and
the composite intercept (a + c p_1). Even if we know p_1, this is not sufficient to
derive an estimate of c, because a is unknown as well. We say that we are unable to identify c in equation (1)
from our data. This also has a positive side: In cross-section data we do not need to
be concerned with correlation between the income x and the price p.
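The identification argument for the cross-section case can be checked numerically. The sketch below is only an illustration on simulated data: the coefficient values, the sample size and the distributions of x and z are assumptions made here, not taken from the note. It shows that a design matrix containing the price column has deficient rank, while OLS on a constant, x and z recovers b, d and the composite intercept a + c p_1.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "true" coefficients in y = a + b*x + c*p + d*z + u
a, b, c, d = 2.0, 0.6, -1.5, 0.8
N = 500      # households observed in year 1
p1 = 3.0     # the one price every household faces in year 1

x = rng.normal(50.0, 10.0, size=N)              # income
z = rng.integers(1, 7, size=N).astype(float)    # household size, 1 to 6 persons
u = rng.normal(0.0, 1.0, size=N)                # disturbance
y = a + b * x + c * p1 + d * z + u

# With the price included, its column is just p1 times the constant column,
# so the design matrix has only 3 linearly independent columns, not 4.
X_full = np.column_stack([np.ones(N), x, np.full(N, p1), z])
rank_full = np.linalg.matrix_rank(X_full)

# OLS on the cross-section relation: regress y on a constant, x and z.
X = np.column_stack([np.ones(N), x, z])
intercept_hat, b_hat, d_hat = np.linalg.lstsq(X, y, rcond=None)[0]

print("rank of design matrix with price column:", rank_full)
print("composite intercept a + c*p1:", a + c * p1, "estimate:", intercept_hat)
print("b:", b, "estimate:", b_hat, "| d:", d, "estimate:", d_hat)
```

The estimated intercept is an estimate of a + c p_1 only; no value of p_1 lets us split it into a and c.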
Specialization II: Micro Time Series Data: Assume that we have time-series
data for one single household, household i = 1, for T successive years. Equation (2)
translated to this data situation then becomes
(4) y_{1t} = (a + d z_1) + b x_{1t} + c p_t + u_{1t}, t = 1, ..., T.
Since the number of household members does not show any variation across the data
set, we can combine the household size term d z_1 with the genuine intercept a and
interpret a + d z_1 as a time-series intercept for household 1. The variables in the
time-series data set therefore become y, x, p, with observation set {y_{1t}, x_{1t}, p_t}, t = 1, ..., T.
The disturbance u_{1t} varies over years.
Equation (4) shows the following: It is impossible to estimate the household size
coefficient d from the time-series data. This is because the number of household
members only varies along the cross-sectional dimension, not along the time dimension. What we are able to estimate, for example by applying the OLS method
to (4), provided that the disturbances u_{1t} (t = 1, ..., T) satisfy the classical assumptions, are b,
c and the composite intercept (a + d z_1). Even if we know z_1, this is not sufficient
to derive an estimate of d, because a is unknown as well. We say that we are unable to identify d in equation
(1) from our data. This also has a positive side: In time-series data we do not need
to be concerned with correlation between the income x and the
household size z.
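Another way to see why d cannot be identified from one household's time series is that different pairs (a, d) with the same sum a + d z_1 generate exactly the same observations. The sketch below illustrates this on simulated data; all numerical values and the helper name `consumption` are hypothetical choices made for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

T = 40        # years observed for household 1
z1 = 4.0      # its (constant) number of members
b, c = 0.6, -1.5

x = rng.normal(50.0, 10.0, size=T)   # income of household 1
p = rng.normal(3.0, 0.5, size=T)     # price in each year
u = rng.normal(0.0, 1.0, size=T)     # disturbance

def consumption(a, d):
    """Generate y_1t from the time-series relation for a given pair (a, d)."""
    return (a + d * z1) + b * x + c * p + u

y_first = consumption(a=2.0, d=0.8)    # a + d*z1 = 5.2
y_second = consumption(a=4.0, d=0.3)   # a + d*z1 = 5.2 as well
same_data = np.allclose(y_first, y_second)

# OLS on a constant, x and p: only the composite intercept, b and c are estimated.
X = np.column_stack([np.ones(T), x, p])
intercept_hat, b_hat, c_hat = np.linalg.lstsq(X, y_first, rcond=None)[0]

print("identical data from both (a, d) pairs:", same_data)
print("estimates of a + d*z1, b, c:", intercept_hat, b_hat, c_hat)
```

Since both parameter pairs produce the same data, no estimation method applied to these data alone can tell them apart.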
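Altogether, Equations (3) and (4) show that cross-section data for a single period cannot identify the price coefficient c, while time-series data for a single household cannot identify the household size coefficient d.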
(7) Y_t = A + b X_t + C p_t + U_t,
where
Y_t = \sum_{i=1}^{N} y_{it}, X_t = \sum_{i=1}^{N} x_{it}, U_t = \sum_{i=1}^{N} u_{it}, Z = \sum_{i=1}^{N} z_i
are aggregates and
A = aN + dZ, C = cN
are coefficients. From (7) we can estimate the income coefficient b, the macro price
coefficient C and the composite macro intercept A.
We then are in a similar situation as when using the micro time-series relation
(4). It is impossible to estimate the household size coefficient d from the time-series
data. This is because the number of household members only varies along the cross-sectional dimension, not along the time dimension, also in the aggregate data set.
Even if we know Z, this is not sufficient for deriving an estimate of d. On the other
hand, we do not need to be concerned with correlation between
the income X and the population size \sum_{i} z_i in the macro data set.
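The step from the micro relation to the macro relation (7) can be written out by summing over households. The derivation below is a sketch under the assumption that the micro relation has the form y_{it} = a + b x_{it} + c p_t + d z_i + u_{it}, which is the form implied by the specializations (3) and (4), and that the same N households are present in every year.

```latex
\begin{align*}
Y_t = \sum_{i=1}^{N} y_{it}
  &= \sum_{i=1}^{N} \bigl( a + b x_{it} + c p_t + d z_i + u_{it} \bigr) \\
  &= aN + b \sum_{i=1}^{N} x_{it} + cN p_t + d \sum_{i=1}^{N} z_i + \sum_{i=1}^{N} u_{it} \\
  &= \underbrace{(aN + dZ)}_{A} + b X_t + \underbrace{cN}_{C}\, p_t + U_t .
\end{align*}
```

The household size term d Z is constant over time and is absorbed into the intercept A, which is why d cannot be recovered from the aggregate time series.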
i = 1, ..., M; t = 1, ..., T.
Such a data set, with MT observations, is called a panel data set, because we
observe a panel of M households over T years. Alternative terms are combined
time-series/cross-section data or longitudinal data. The variables in a panel data
set can vary both across the spatial dimension and over the time dimension. But
some of them may vary along one dimension only, as z and p in our basic example.
Using panel data, which exhibit both spatial and temporal variation,
we are able to estimate a, b, c and d jointly.
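That panel data identify all four coefficients can be illustrated with a small simulation. The sketch below stacks the MT observations and applies pooled OLS; this is one simple choice made for illustration, and the coefficient values, M, T and the distributions are assumptions, not values from the note.

```python
import numpy as np

rng = np.random.default_rng(2)

a, b, c, d = 2.0, 0.6, -1.5, 0.8   # hypothetical true coefficients
M, T = 200, 10                      # households and years

x = rng.normal(50.0, 10.0, size=(M, T))         # income x_it
p = rng.normal(3.0, 0.5, size=T)                # price p_t, same for all households
z = rng.integers(1, 7, size=M).astype(float)    # household size z_i, same in all years
u = rng.normal(0.0, 1.0, size=(M, T))           # disturbance u_it

# Broadcast p over households and z over years to get M x T arrays.
P = np.broadcast_to(p, (M, T))
Z = np.broadcast_to(z[:, None], (M, T))
y = a + b * x + c * P + d * Z + u

# Stack the M*T observations and run pooled OLS on constant, x, p and z.
X = np.column_stack([np.ones(M * T), x.ravel(), P.ravel(), Z.ravel()])
rank = np.linalg.matrix_rank(X)
a_hat, b_hat, c_hat, d_hat = np.linalg.lstsq(X, y.ravel(), rcond=None)[0]

print("rank of panel design matrix:", rank)
print("estimates of a, b, c, d:", a_hat, b_hat, c_hat, d_hat)
```

Because p varies over t and z varies over i, neither column is proportional to the constant, and the design matrix has full column rank.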
Panel data sets may well be large. For example, M = 5000 households observed over
T = 20 years give a data set with MT = 100 000 observations. Handling such large
bodies of data places demands on computer hardware and
software, but is well within the reach of modern computers, even
laptops.
Final remark
Attempts to estimate the same economic coefficient (i) from cross-section data, e.g.,
the income coefficient b in (3), and (ii) from time-series data, e.g., the income coefficient b in (4), often give systematically different results. Possible explanations of
this have been much discussed. Panel data may put us in a position to study such
differences more closely. Biases in the estimation of b and d from cross-section data
may reflect omitted (and often unobservable) consumption-motivating variables that
are correlated with x_{i1} and z_i across the cross-section, say tastes and preferences. Biases in the estimation of b and c from time-series data may reflect omitted
(and often unobservable) consumption-motivating variables that are correlated with
x_{1t} and p_t over time, say the consumers' moods and expectations about future business-cycle conditions. The Gross/Net Coefficient problem (a concept to be
discussed later on) may therefore enter the scene differently and have different consequences in the two data types. Panel data let the x's, the
z's and the p's vary at the same time. They also put the researcher in a position
to examine both (i) correlation over i in each period, for t = 1, ..., T separately, and
(ii) correlation over t for each individual, for i = 1, ..., N separately. This may help
him or her approach explanations of discrepancies between cross-section-based and
time-series-based estimates of presumably the same parameter. Panel data may
also help to form estimators that are less affected by such biases than those obtainable from the two simple data
types. This often requires the use of specific estimation methods, a topic studied in
more advanced econometrics.
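The kind of discrepancy described here can be reproduced in a stylized simulation. The sketch below assumes a single omitted variable, a time-invariant household "taste" that is correlated with income across households, and it drops p and z to keep the example short; these are simplifications made for illustration, not the note's model. It compares the income coefficient estimated from one year's cross-section with the coefficient estimated from variation over time only (deviations from each household's own mean).

```python
import numpy as np

rng = np.random.default_rng(3)

a, b = 2.0, 0.6     # hypothetical true intercept and income coefficient
M, T = 300, 8       # households and years

taste = rng.normal(0.0, 1.0, size=(M, 1))                    # unobserved, fixed over time
x = 50.0 + 5.0 * taste + rng.normal(0.0, 5.0, size=(M, T))   # income, correlated with taste
u = rng.normal(0.0, 1.0, size=(M, T))
y = a + b * x + 2.0 * taste + u        # taste affects consumption but is omitted below

def slope(xv, yv):
    """OLS slope from regressing yv on a constant and xv."""
    xc = xv - xv.mean()
    return float(np.sum(xc * (yv - yv.mean())) / np.sum(xc ** 2))

# (i) Cross-section for year 1: variation across households only.
b_cross = slope(x[:, 0], y[:, 0])

# (ii) Variation over time only: deviations from each household's own mean,
#      which removes anything that is constant within a household.
x_within = x - x.mean(axis=1, keepdims=True)
y_within = y - y.mean(axis=1, keepdims=True)
b_time = float(np.sum(x_within * y_within) / np.sum(x_within ** 2))

print("true b:", b)
print("cross-section estimate:", b_cross)
print("over-time estimate:", b_time)
```

In this setup the cross-section estimate picks up part of the taste effect, while the estimate based on variation over time does not, so the two differ systematically even though the true b is the same.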
What has been said above underpins, inter alia, the following conclusion: When
discussing correlation between economic variables in relation to an econometric investigation, it is important to be precise about what the correlation goes
across. This enters as an important characteristic of the data type used in the
investigation. Correlation between income and wealth across a cross-section has a
different meaning than correlation between income and wealth over time, and such
correlation coefficients often turn out to differ markedly in size. The nature
of multicollinearity problems (also a concept to be discussed later on) when using
cross-section data and when using time-series data may therefore be widely
different.
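The difference between correlation across units and correlation over time can be made concrete with simulated panel data. In the sketch below, income and wealth share a permanent household component but have unrelated year-to-year fluctuations; this data-generating process and all numbers are hypothetical and chosen only to make the contrast visible.

```python
import numpy as np

rng = np.random.default_rng(4)

M, T = 400, 30   # households and years

permanent = rng.normal(0.0, 1.0, size=(M, 1))    # household-specific level, fixed over time
income = 50.0 + 10.0 * permanent + rng.normal(0.0, 2.0, size=(M, T))
wealth = 200.0 + 40.0 * permanent + rng.normal(0.0, 5.0, size=(M, T))

# Correlation across households in one period (here the first year).
corr_across = float(np.corrcoef(income[:, 0], wealth[:, 0])[0, 1])

# Correlation over time, computed for each household separately and then averaged.
corr_over_time = float(np.mean(
    [np.corrcoef(income[i], wealth[i])[0, 1] for i in range(M)]
))

print("correlation across households, year 1:", corr_across)
print("average correlation over time within households:", corr_over_time)
```

Under these assumptions the two variables are strongly correlated across the cross-section and nearly uncorrelated over time, so a statement about "the correlation between income and wealth" is incomplete until the dimension is named.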
Supplementary readings:
Erik Biørn: Økonometriske emner. En videreføring. Oslo: Unipub 2008. Kapittel 1:
Datatyper and modeltyper.
Zvi Griliches: Handbook of Econometrics, Vol. III. Amsterdam: North-Holland, 1986.
Chapter 25: Economic Data Issues, sections 1, 2, 3.