The Structure of Economic Data
Economic data sets come in various forms. While some econometric methods can be
applied straightforwardly to different types of data set, some types have special
features that it is essential to examine. In the following sections we describe the
most important data structures encountered in applied econometrics.
Cross-sectional data
A cross-sectional data set consists of a sample of individuals, households, firms, cities,
countries, regions or any other type of unit at a specific point in time. In some cases,
the data across all units do not correspond to exactly the same time period. Consider
a survey in which questionnaires are collected from different families on different
days within a month. In this case, we can ignore the minor differences in collection
time and still treat the data as a cross-sectional data set.
In econometrics, cross-sectional variables are usually denoted by the subscript i, with
i taking the values 1, 2, 3, . . . , N, where N is the number of cross-sectional units.
So if, for example, Y denotes the income data we have collected for N individuals, this
variable, in a cross-sectional framework, will be denoted by:
$$Y_i \quad \text{for } i = 1, 2, 3, \ldots, N \tag{2.1}$$
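As a rough illustration of the notation in (2.1), the sketch below stores a cross-sectional variable as a plain Python list and reads it with the 1-based index i used in the text. The income figures and the helper name `Y` are made up for the example and are not taken from any real data set.

```python
# Hypothetical incomes for N = 5 individuals, all observed at (roughly)
# the same point in time. The values are invented for illustration only.
incomes = [32000.0, 45500.0, 28750.0, 61000.0, 39900.0]

N = len(incomes)  # number of cross-sectional units


def Y(i):
    """Return Y_i using the text's 1-based index i = 1, ..., N."""
    if not 1 <= i <= N:
        raise IndexError(f"i must be between 1 and {N}")
    return incomes[i - 1]  # Python lists are 0-based, hence the shift


# A simple cross-sectional summary: the sample mean of Y_i over all i.
mean_income = sum(Y(i) for i in range(1, N + 1)) / N
```

The order of the entries carries no meaning here: shuffling the individuals would leave any cross-sectional statistic, such as the mean, unchanged.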
Cross-sectional data are widely used in economics and other social sciences. In
economics, the analysis of cross-sectional data is associated mainly with applied microeconomics. Labour economics, state and
local public finance, business economics,
demographic economics and health economics are some of the prominent fields in
microeconomics. Data collected at a given point in time are used in these cases to test
microeconomic hypotheses and evaluate economic policies.
Time series data
A time series data set consists of observations of one or more variables over time. Time
series data are arranged in chronological order and can have different time frequencies,
such as biannual, annual, quarterly, monthly, weekly, daily and hourly. Examples of
time series data include stock prices, gross domestic product (GDP), money supply and
ice cream sales figures, among many others.
Time series data are denoted by the subscript t. So, for example, if Y denotes the GDP
of a country between 1990 and 2002 we denote that as:
$$Y_t \quad \text{for } t = 1, 2, 3, \ldots, T \tag{2.2}$$
where t = 1 for 1990 and t = T = 13 for 2002.
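The mapping between calendar years and the index t can be checked with a few lines of code. The following is a minimal sketch for the annual 1990 to 2002 example above; the constant and function names are illustrative assumptions.

```python
START_YEAR = 1990  # corresponds to t = 1 in the example
END_YEAR = 2002    # corresponds to t = T


def year_to_t(year):
    """Map a calendar year in the sample to the time index t = 1, ..., T."""
    if not START_YEAR <= year <= END_YEAR:
        raise ValueError(f"year must lie between {START_YEAR} and {END_YEAR}")
    return year - START_YEAR + 1


# Both endpoints are included, so T counts 1990, 1991, ..., 2002.
T = year_to_t(END_YEAR)
```

Because both endpoints belong to the sample, T equals 2002 − 1990 + 1 = 13 rather than 12.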
Because past events can influence those in the future, and lags in behaviour are
prevalent in the social sciences, time is a very important dimension in time series data
sets. A variable that is lagged one period will be denoted as $Y_{t-1}$, and when it is
lagged s periods will be denoted as $Y_{t-s}$. Similarly, if it is leading k periods it
will be denoted as $Y_{t+k}$.
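Lags and leads amount to shifting the series along the time axis. The sketch below shows one possible way to build them from a list, using `None` for periods where the shifted value falls outside the sample. The function name `shift` and the sample values are assumptions made for illustration.

```python
def shift(series, periods):
    """Shift a time series by a number of periods.

    periods > 0 gives the lag Y_{t-periods};
    periods < 0 gives the lead Y_{t+|periods|}.
    Positions with no available observation are filled with None.
    """
    n = len(series)
    shifted = []
    for t in range(n):
        source = t - periods  # position of the value that lands at time t
        shifted.append(series[source] if 0 <= source < n else None)
    return shifted


# Hypothetical observations Y_1, ..., Y_5.
y = [10, 12, 15, 11, 14]

y_lag1 = shift(y, 1)    # Y_{t-1}: the first observation has no predecessor
y_lead2 = shift(y, -2)  # Y_{t+2}: the last two observations have no successor
```

Each lag or lead costs observations at one end of the sample, which is why models with many lags are estimated on fewer usable periods than the raw series contains.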
A key feature of time series data, which makes them more difficult to analyse than
cross-sectional data, is that economic observations are commonly dependent across
time; that is, most economic time series are closely related to their recent histories. So,
while most econometric procedures can be applied to both cross-sectional and time
series data sets, specifying the appropriate econometric model requires additional
care in the case of time series. Additionally, the fact that many economic time series
display clear trends over time has led to new econometric techniques that attempt to
address these features.
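One common way to quantify how closely a series is related to its recent history is the sample first-order autocorrelation, the correlation between $Y_t$ and $Y_{t-1}$. The sketch below computes it for a made-up trending series; the data and the function name are assumptions, and this is only one of several conventional estimators.

```python
def lag1_autocorrelation(series):
    """Sample first-order autocorrelation of a time series.

    Uses the full-sample mean and the full-sample sum of squared
    deviations in the denominator. A constant series has zero variance
    and would raise ZeroDivisionError.
    """
    n = len(series)
    mean = sum(series) / n
    denominator = sum((y - mean) ** 2 for y in series)
    numerator = sum(
        (series[t] - mean) * (series[t - 1] - mean) for t in range(1, n)
    )
    return numerator / denominator


# Hypothetical upward-trending index, e.g. annual output.
gdp_index = [100.0, 102.0, 105.0, 107.0, 110.0, 114.0, 117.0, 121.0]

r1 = lag1_autocorrelation(gdp_index)
```

For a trending series like this one the statistic is clearly positive, reflecting the dependence across time described above; for independent observations it would be close to zero in large samples.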
Another important feature is that time series data observed at certain frequencies
may exhibit a strong seasonal pattern. This feature is encountered mainly with weekly,
monthly and quarterly time series. Finally, it is important to note that time series data
are mainly associated with macroeconomic applications.
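A seasonal pattern of the kind mentioned above can often be spotted by averaging the observations that fall in the same season. The following sketch does this for two years of invented quarterly ice cream sales; the figures and variable names are hypothetical.

```python
# Hypothetical quarterly sales for two consecutive years:
# Q1, Q2, Q3, Q4 of year 1, then Q1, Q2, Q3, Q4 of year 2.
sales = [20.0, 35.0, 60.0, 25.0, 22.0, 38.0, 66.0, 27.0]

PERIODS_PER_YEAR = 4  # quarterly frequency

# Average all observations that share the same quarter.
seasonal_means = []
for quarter in range(PERIODS_PER_YEAR):
    same_quarter = sales[quarter::PERIODS_PER_YEAR]  # every 4th observation
    seasonal_means.append(sum(same_quarter) / len(same_quarter))

# Quarter (1-based) with the highest average sales.
peak_quarter = seasonal_means.index(max(seasonal_means)) + 1
```

Large, stable differences between the seasonal means suggest that the seasonal component should be modelled or removed before further analysis.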
Panel data
A panel data set consists of a time series for each cross-sectional member in the data
set; as an example we could consider the sales and the number of employees for 50
firms over a five-year period. Panel data can also be collected on a geographical basis;
for example, we might have GDP and money supply data for a set of 20 countries
over a 20-year period.
Panel data are denoted by the use of both i and t subscripts, which we have used before
for cross-sectional and time series data, respectively. This is simply because panel data
have both cross-sectional and time series dimensions. So, we might denote GDP for a
set of countries and for a specific time period as:
$$Y_{it} \quad \text{for } t = 1, 2, 3, \ldots, T \text{ and } i = 1, 2, 3, \ldots, N \tag{2.3}$$
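To make the double index in (2.3) concrete, the sketch below stores a small panel as an N × T table, with one row per country and one column per year, and shows that fixing t recovers a cross-section while fixing i recovers a time series. The country labels, years and GDP values are invented for the example.

```python
# Hypothetical panel: N = 3 countries observed over T = 4 years.
countries = ["A", "B", "C"]
years = [2000, 2001, 2002, 2003]

# gdp[i - 1][t - 1] holds Y_it: one row per country, one column per year.
gdp = [
    [100.0, 103.0, 105.0, 108.0],  # country A
    [250.0, 252.0, 259.0, 263.0],  # country B
    [80.0, 84.0, 83.0, 88.0],      # country C
]

N = len(gdp)
T = len(gdp[0])


def Y(i, t):
    """Return Y_it with 1-based indices i = 1, ..., N and t = 1, ..., T."""
    return gdp[i - 1][t - 1]


# Fixing t gives a cross-section (N values); fixing i gives a time series (T values).
cross_section_at_t2 = [Y(i, 2) for i in range(1, N + 1)]
time_series_for_i3 = [Y(3, t) for t in range(1, T + 1)]

# "Long" layout: one (unit, period, value) record per observation, N * T in total.
long_format = [
    (countries[i - 1], years[t - 1], Y(i, t))
    for i in range(1, N + 1)
    for t in range(1, T + 1)
]
```

The long layout is the form in which panel data are typically stored in files, while the N × T table is convenient for seeing the two dimensions side by side.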
To better understand the structure of panel data, consider a cross-sectional and a time
series variable as N × 1 and T × 1 matrices, respectively