
Lecture 4

The document discusses different types of economic data used in econometric analysis, including cross-sectional data, time series data, pooled cross-sectional data, and panel/longitudinal data. It provides examples of each type of data and notes how the appropriate econometric methods depend on the nature of the data used. The document also discusses sources of international and local economic data and issues regarding data quality. It introduces the linear regression model and explains how the ordinary least squares method is used to estimate the unknown regression coefficients by minimizing the sum of squared errors.

Uploaded by

Hajra Ahmad

3/8/2021

The Nature and Sources of Data

Nature and Sources of Data

Econometric analysis requires data

Different kinds of economic data sets are:
 Cross-sectional data
 Time series data
 Pooled cross-sectional data
 Panel/longitudinal data
Econometric methods depend on the nature of the data used
 Use of inappropriate methods may lead to misleading results


Cross-Sectional Data Sets

 Sample of individuals, households, firms, cities, states, countries, or other units of interest at a given point in time or in a given period

 Cross-sectional observations are more or less independent

 For example, pure random sampling from a population

 Sometimes pure random sampling is violated, e.g. when units refuse to respond in surveys, or when sampling is characterized by clustering

 Cross-sectional data are typically encountered in applied microeconomics

Cross-Sectional Data Sets: Example 1

 Cross-sectional data set on wages and other characteristics

[Table not reproduced. Labeled columns: observation number, hourly wage, and indicator variables (1 = yes, 0 = no).]


Time Series Data

 Observations of a variable or several variables over time

 For example, stock prices, money supply, consumer price index, gross domestic product, annual homicide rates, automobile sales, …

 Time series observations are typically serially correlated

 Ordering of observations conveys important information

 Data frequency: daily, weekly, monthly, quarterly, annually, …

 Typical features of time series: trends and seasonality

 Typical applications: applied macroeconomics and finance
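The claim that time series observations are typically serially correlated can be checked on a small example. The sketch below computes the lag-1 sample autocorrelation of a made-up trending series; the numbers and the helper name `lag1_autocorrelation` are illustrative assumptions, not data or code from the lecture.

```python
def lag1_autocorrelation(series):
    """Sample autocorrelation between each observation and the one before it."""
    n = len(series)
    mean = sum(series) / n
    # Covariance-type sum between x_t and x_{t-1}; the ordering of the data matters here.
    numerator = sum((series[t] - mean) * (series[t - 1] - mean) for t in range(1, n))
    denominator = sum((value - mean) ** 2 for value in series)
    return numerator / denominator


# Hypothetical yearly index with an upward trend (made-up numbers).
trending = [100, 102, 105, 107, 110, 114, 117, 121]

r1 = lag1_autocorrelation(trending)
print(round(r1, 3))  # clearly positive: neighbouring observations move together
```

Shuffling the same numbers would generally change `r1`, which is one way to see that the ordering of time series observations carries information.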

Time Series Data Sets: Example

 Time series data on minimum wages and related variables

[Table not reproduced. Columns: average minimum wage for given year, average coverage rate, unemployment rate, gross national product.]


Pooled Cross Sections

 Two or more cross sections are combined in one data set

 Cross sections are drawn independently of each other

 Pooled cross sections are often used to evaluate policy changes
 Example:
 Evaluate effect of change in property taxes on house prices
 Random sample of house prices for the year 1993
 A new random sample of house prices for the year 1995
 Compare before/after (1993: before reform, 1995: after reform)
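A minimal sketch of how such a pooled data set might be organised: each record carries a year indicator, and the two independently drawn samples are stacked into one list. The prices below are made-up numbers (in thousands), and the field names `price` and `after_reform` are hypothetical.

```python
# Two independent random samples of houses (hypothetical prices, in thousands).
sample_1993 = [{"price": 85.0, "after_reform": 0}, {"price": 90.0, "after_reform": 0},
               {"price": 110.0, "after_reform": 0}, {"price": 95.0, "after_reform": 0}]
sample_1995 = [{"price": 80.0, "after_reform": 1}, {"price": 92.0, "after_reform": 1},
               {"price": 100.0, "after_reform": 1}, {"price": 88.0, "after_reform": 1}]

# Pooling simply stacks the cross sections; the houses differ between years.
pooled = sample_1993 + sample_1995


def mean_price(records, after_reform):
    """Average price among records with the given before/after indicator."""
    prices = [r["price"] for r in records if r["after_reform"] == after_reform]
    return sum(prices) / len(prices)


mean_before = mean_price(pooled, 0)
mean_after = mean_price(pooled, 1)
difference = mean_after - mean_before  # raw before/after comparison
print(mean_before, mean_after, difference)
```

The raw difference in means is only the simplest before/after comparison; it does not control for other house characteristics such as size or number of bathrooms.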

Pooled Cross Sections: Example

 Pooled cross sections on housing prices

[Table not reproduced. Labeled columns: property tax, size of house in square feet, number of bathrooms; rows are grouped into "before reform" and "after reform".]


Panel or Longitudinal Data

 The same cross-sectional units are followed over time

 Panel data have a cross-sectional and a time series dimension

 Panel data can be used to account for time-invariant unobservables

 Panel data can be used to model lagged responses


Example:
• City crime statistics; each city is observed in two years
• Time-invariant unobserved city characteristics may be modeled
• Effect of police on crime rates may exhibit time lag
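One common way to account for time-invariant unobservables in a two-year panel is first differencing: subtracting each city's first-year values from its second-year values removes anything that is constant for that city. The sketch below assumes, purely for illustration, that crime is generated as a city-specific constant minus 0.5 times the number of police; the cities, numbers, and this rule are hypothetical and are not from the lecture.

```python
# Hypothetical two-year panel: the same two cities observed in both years.
# Assumed generating rule (illustration only): crime = city_effect - 0.5 * police.
panel = {
    "City A": {"city_effect": 100.0, "police": (40.0, 50.0)},
    "City B": {"city_effect": 60.0, "police": (20.0, 24.0)},
}


def crime(city_effect, police):
    return city_effect - 0.5 * police


ratios = {}
for city, data in panel.items():
    police_y1, police_y2 = data["police"]
    crime_y1 = crime(data["city_effect"], police_y1)
    crime_y2 = crime(data["city_effect"], police_y2)
    # First differences: the unobserved city effect cancels out.
    delta_crime = crime_y2 - crime_y1
    delta_police = police_y2 - police_y1
    ratios[city] = delta_crime / delta_police

print(ratios)  # both cities show the same change in crime per extra police officer
```

Although the two cities have very different unobserved levels (`city_effect`), the differenced data recover the same relationship for both, which is the sense in which panel data can deal with time-invariant unobservables.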

Panel or Longitudinal Data: Example

 Two-year panel data on city crime statistics

[Table not reproduced. Each city has two time series observations; labeled columns include number of police in 1986 and number of police in 1990.]


Sources of Data

Sources of Data (International)

Success of regression depends on the availability of quality data
International sources:
 World Bank, World Development Indicators (WDI)
 IMF: International Monetary Fund website
 Asian Development Bank, International Energy Agency
In practice, a Google search is often the best starting point for finding relevant international data


Sources of Data (Local)

Local sources of data:
1. Economic Survey (http://www.finance.gov.pk/survey_1819.html)
2. Handbook of SBP (http://www.sbp.org.pk/departments/stats/PakEconomy_HandBook/index.htm)
3. Surveys of the Pakistan Bureau of Statistics
4. Websites of the Planning Commission, SBP, PBS, Ministry of Finance, SECP, etc.

Screenshots of Pakistani Sources of Data


Quality of the data

 The success of regression analysis depends on the quality and availability of data
 Data may not always be available, and not all available data are of good quality
 Check carefully the quality of the agency that collects the data
 Data may contain errors of measurement, errors of omission, or rounding errors, and need to be checked
 Data may be available only at a highly aggregated level and cannot then be used for analysis at a disaggregated level
 The results of research are only as good as the quality of the data

Estimating a Linear Regression Model


Linear Regression Model (LRM)


 Consider the “simple linear regression model”
$$Y_i = \beta_1 + \beta_2 X_{2i} + \varepsilon_i$$
 It is called “simple” because it has only two variables, Y and X
 The term “linear” in the linear regression model refers to linearity in the regression coefficients, the βs, and not linearity in the Y and X variables.
 So X and Y may enter with higher powers, in logs, etc., and the model is still a linear regression model
 The β coefficients, however, cannot be raised to a power or divided by another coefficient
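To make the distinction concrete, here are a few specifications written out. These particular forms are illustrative examples chosen here, not equations from the slides: the first two are linear in the βs even though X or Y is transformed, while the last two are not linear in the βs.

```latex
% Linear in the parameters (linear regression models):
Y_i = \beta_1 + \beta_2 X_{2i}^2 + \varepsilon_i
\ln Y_i = \beta_1 + \beta_2 \ln X_{2i} + \varepsilon_i

% Not linear in the parameters:
Y_i = \beta_1 + \beta_2^2 X_{2i} + \varepsilon_i
Y_i = \beta_1 + \frac{\beta_2}{\beta_3} X_{2i} + \varepsilon_i
```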

Estimating the Regression Model

Consider this model with k variables
$$Y_i = B_1 + B_2 X_{2i} + B_3 X_{3i} + \dots + B_k X_{ki} + u_i$$

Or a simplified version of it with three explanatory variables
$$Wage_i = B_1 + B_2 Edu_i + B_3 Exp_i + B_4 Female_i + u_i$$

Furthermore, let us assume that you have data on 450 workers covering their “experience”, “education”, “gender”, etc.
The structure of the data can be seen on the next slide.


Data on Wage, Education, Gender (sample data)

How can you “estimate” the regression from this data?

Ordinary Least Squares (OLS/LS) Method
 There are several ways to estimate this regression model
 The most common method is called the “ordinary least squares (OLS)”, or simply “least squares (LS)”, method
 It can be stated as follows: “OLS calculates the unknown regression parameters by minimizing the sum of the squared errors of the regression model”
Formally:
$$Y_i = B_1 + B_2 X_{2i} + B_3 X_{3i} + \dots + B_k X_{ki} + u_i \quad (1)$$
$$u_i = Y_i - (B_1 + B_2 X_{2i} + B_3 X_{3i} + \dots + B_k X_{ki}) \quad (2)$$
or, writing the regression part compactly as BX,
$$u_i = Y_i - BX \quad (3)$$


Ordinary Least Squares (OLS/LS) Method
$$u_i = Y_i - BX \quad (3)$$
 This equation means that the “error term” is equal to the difference between the “actual value (Y)” and the “value of Y obtained from the regression (BX)”
 To obtain the values of “B” we want the errors to be as small as possible; a first idea is to make the sum of the error terms equal to zero ($\sum u_i = 0$)
 However, the sum of errors is not a usable criterion: positive and negative errors cancel, so $\sum u_i = 0$ can hold even when the individual errors are large. We therefore minimize the “sum of squared errors” ($\sum u_i^2$) instead
 This implies that we can write equation (3) as follows:
$$\sum u_i^2 = \sum (Y_i - BX)^2$$
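The problem with the plain sum of errors can be seen numerically. In the sketch below (made-up data, hypothetical helper names), any line that passes through the point of sample means has errors that sum to zero, whether it fits the data well or badly; only the sum of squared errors tells the two lines apart.

```python
# Hypothetical illustrative data (not from the lecture).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.0]

x_bar = sum(x) / len(x)
y_bar = sum(y) / len(y)


def errors(slope):
    """Errors u_i = y_i - (b1 + b2 * x_i) for a line through (x_bar, y_bar)."""
    intercept = y_bar - slope * x_bar
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def sum_of_squares(values):
    return sum(v * v for v in values)


flat = errors(0.0)   # horizontal line: ignores x entirely
steep = errors(2.0)  # line that follows the data closely

# Both sums of errors are (numerically) zero, but the sums of squares differ a lot.
print(sum(flat), sum_of_squares(flat))
print(sum(steep), sum_of_squares(steep))
```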

Ordinary Least Squares (OLS/LS) Method
$$\sum u_i^2 = \sum (Y_i - BX)^2$$
or
$$\sum u_i^2 = \sum (Y_i - B_1 - B_2 X_{2i} - B_3 X_{3i} - \dots - B_k X_{ki})^2 \quad (4)$$
 We have data on the “Y” and “X” variables only; the “Bs” are unknown
 To obtain values of the regression coefficients, derivatives are taken with respect to the regression coefficients and set equal to zero.
 This is the standard procedure for optimization: take the first-order derivative of the function with respect to each unknown in turn, set each derivative equal to zero, and solve.
 The same procedure is followed here for each B, one by one; for all the Bs together we obtain a system of equations expressed in terms of the X and Y variables.
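As a numerical check of this idea for the two-coefficient case, the sketch below evaluates the two first-order derivatives of the sum of squared errors on a small made-up data set. The data, the function name `sse_gradient`, and the trial values are assumptions for illustration; for this particular data set the values b1 = 0.09 and b2 = 1.97 are the ones at which both derivatives vanish.

```python
# Hypothetical illustrative data (not from the lecture).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.0]


def sse_gradient(b1, b2):
    """Partial derivatives of sum((y - b1 - b2*x)^2) with respect to b1 and b2."""
    residuals = [yi - b1 - b2 * xi for xi, yi in zip(x, y)]
    d_b1 = -2.0 * sum(residuals)
    d_b2 = -2.0 * sum(r * xi for r, xi in zip(residuals, x))
    return d_b1, d_b2


grad_at_minimum = sse_gradient(0.09, 1.97)  # both derivatives are (numerically) zero
grad_elsewhere = sse_gradient(0.0, 1.0)     # an arbitrary other line: derivatives far from zero

print(grad_at_minimum)
print(grad_elsewhere)
```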


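Ordinary Least Squares (OLS/LS) Method for SLR
$$\sum u_i^2 = \sum (Y_i - BX)^2 \quad (4)$$

Or, for simple linear regression (SLR), we can write Eq. (4) as
$$\sum u_i^2 = \sum (Y_i - B_1 - B_2 X_{2i})^2$$
For minimization of the error sum of squares:
 $$\frac{\partial \sum u_i^2}{\partial B_1} = \frac{\partial}{\partial B_1} \sum (Y_i - B_1 - B_2 X_{2i})^2 = 0$$

 $$\frac{\partial \sum u_i^2}{\partial B_2} = \frac{\partial}{\partial B_2} \sum (Y_i - B_1 - B_2 X_{2i})^2 = 0$$

 These two equations give the following two results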

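Derivations
$$\sum_{i=1}^{n} u_i^2 = \sum_{i=1}^{n} (y_i - \beta_1 - \beta_2 x_{2i})^2$$

Take the derivative with respect to β1 and set it equal to zero

$$\frac{\partial}{\partial \beta_1} \sum_{i=1}^{n} u_i^2 = \frac{\partial}{\partial \beta_1} \sum_{i=1}^{n} (y_i - \beta_1 - \beta_2 x_{2i})^2 = 0$$

$$2 \sum_{i=1}^{n} (y_i - \beta_1 - \beta_2 x_{2i})^{2-1} \, \frac{\partial (y_i - \beta_1 - \beta_2 x_{2i})}{\partial \beta_1} = 0$$

$$2 \sum_{i=1}^{n} (y_i - \beta_1 - \beta_2 x_{2i})(-1) = 0$$

$$\sum_{i=1}^{n} y_i - \sum_{i=1}^{n} \beta_1 - \beta_2 \sum_{i=1}^{n} x_{2i} = 0$$

$$\sum_{i=1}^{n} y_i - (\underbrace{\beta_1 + \beta_1 + \dots + \beta_1}_{n \text{ terms}}) - \beta_2 \sum_{i=1}^{n} x_{2i} = 0$$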


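Derivations (Cont…)
$$\sum_{i=1}^{n} y_i - (\underbrace{\beta_1 + \beta_1 + \dots + \beta_1}_{n \text{ terms}}) - \beta_2 \sum_{i=1}^{n} x_{2i} = 0$$

$$\sum_{i=1}^{n} y_i - n\beta_1 - \beta_2 \sum_{i=1}^{n} x_{2i} = 0$$

$$n\beta_1 = \sum_{i=1}^{n} y_i - \beta_2 \sum_{i=1}^{n} x_{2i}$$

$$\beta_1 = \frac{\sum_{i=1}^{n} y_i}{n} - \beta_2 \frac{\sum_{i=1}^{n} x_{2i}}{n}$$

$$\beta_1 = \bar{Y} - \beta_2 \bar{X}$$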

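Derivations (Cont…)
$$\sum_{i=1}^{n} u_i^2 = \sum_{i=1}^{n} (y_i - \beta_1 - \beta_2 x_{2i})^2$$

Take the derivative with respect to β2 and set it equal to zero

$$\frac{\partial}{\partial \beta_2} \sum_{i=1}^{n} u_i^2 = \frac{\partial}{\partial \beta_2} \sum_{i=1}^{n} (y_i - \beta_1 - \beta_2 x_{2i})^2 = 0$$

$$2 \sum_{i=1}^{n} (y_i - \beta_1 - \beta_2 x_{2i})^{2-1} \, \frac{\partial (-\beta_2 x_{2i})}{\partial \beta_2} = 0$$

$$-2 \sum_{i=1}^{n} (y_i - \beta_1 - \beta_2 x_{2i})(x_{2i}) = 0$$

$$\sum_{i=1}^{n} x_{2i} y_i - \beta_1 \sum_{i=1}^{n} x_{2i} - \beta_2 \sum_{i=1}^{n} x_{2i}^2 = 0$$

$$\beta_2 \sum_{i=1}^{n} x_{2i}^2 = \sum_{i=1}^{n} x_{2i} y_i - \beta_1 \sum_{i=1}^{n} x_{2i}$$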


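Derivations (Cont…)
But we know that
$$\beta_1 = \bar{Y} - \beta_2 \bar{X}$$

$$\beta_2 \sum_{i=1}^{n} x_{2i}^2 = \sum_{i=1}^{n} x_{2i} y_i - \beta_1 \sum_{i=1}^{n} x_{2i}$$

Putting the value of β1 into the above equation

$$\beta_2 \sum_{i=1}^{n} x_{2i}^2 = \sum_{i=1}^{n} x_{2i} y_i - \sum_{i=1}^{n} x_{2i} \left( \frac{\sum_{i=1}^{n} y_i}{n} - \frac{\beta_2 \sum_{i=1}^{n} x_{2i}}{n} \right)$$

$$\beta_2 \sum_{i=1}^{n} x_{2i}^2 = \sum_{i=1}^{n} x_{2i} y_i - \frac{\sum_{i=1}^{n} x_{2i} \sum_{i=1}^{n} y_i}{n} + \frac{\beta_2 \left( \sum_{i=1}^{n} x_{2i} \right)^2}{n}$$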

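Derivations (Cont…)
$$\beta_2 \sum_{i=1}^{n} x_{2i}^2 = \sum_{i=1}^{n} x_{2i} y_i - \frac{\sum_{i=1}^{n} x_{2i} \sum_{i=1}^{n} y_i}{n} + \frac{\beta_2 \left( \sum_{i=1}^{n} x_{2i} \right)^2}{n}$$

$$\beta_2 \sum_{i=1}^{n} x_{2i}^2 - \frac{\beta_2 \left( \sum_{i=1}^{n} x_{2i} \right)^2}{n} = \sum_{i=1}^{n} x_{2i} y_i - \frac{\sum_{i=1}^{n} x_{2i} \sum_{i=1}^{n} y_i}{n}$$

$$\beta_2 \left[ \sum_{i=1}^{n} x_{2i}^2 - \frac{\left( \sum_{i=1}^{n} x_{2i} \right)^2}{n} \right] = \sum_{i=1}^{n} x_{2i} y_i - \frac{\sum_{i=1}^{n} x_{2i} \sum_{i=1}^{n} y_i}{n}$$

$$\beta_2 = \frac{\sum_{i=1}^{n} x_{2i} y_i - \dfrac{\sum_{i=1}^{n} x_{2i} \sum_{i=1}^{n} y_i}{n}}{\sum_{i=1}^{n} x_{2i}^2 - \dfrac{\left( \sum_{i=1}^{n} x_{2i} \right)^2}{n}}$$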


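Derivations (Cont…)
$$\beta_2 = \frac{\sum_{i=1}^{n} x_{2i} y_i - \dfrac{\sum_{i=1}^{n} x_{2i} \sum_{i=1}^{n} y_i}{n}}{\sum_{i=1}^{n} x_{2i}^2 - \dfrac{\left( \sum_{i=1}^{n} x_{2i} \right)^2}{n}}$$

$$\beta_2 = \frac{\dfrac{n \sum_{i=1}^{n} x_{2i} y_i - \sum_{i=1}^{n} x_{2i} \sum_{i=1}^{n} y_i}{n}}{\dfrac{n \sum_{i=1}^{n} x_{2i}^2 - \left( \sum_{i=1}^{n} x_{2i} \right)^2}{n}}$$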

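Derivations (Cont…)
$$\beta_2 = \frac{\dfrac{n \sum_{i=1}^{n} x_{2i} y_i - \sum_{i=1}^{n} x_{2i} \sum_{i=1}^{n} y_i}{n}}{\dfrac{n \sum_{i=1}^{n} x_{2i}^2 - \left( \sum_{i=1}^{n} x_{2i} \right)^2}{n}}$$

$$\beta_2 = \frac{n \sum_{i=1}^{n} x_{2i} y_i - \sum_{i=1}^{n} x_{2i} \sum_{i=1}^{n} y_i}{n \sum_{i=1}^{n} x_{2i}^2 - \left( \sum_{i=1}^{n} x_{2i} \right)^2}$$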


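Derivations (Cont…)
The OLS estimates for the model $Y_i = \beta_1 + \beta_2 X_{2i} + U_i$ are given as follows

$$\beta_1 = \bar{Y} - \beta_2 \bar{X}_2$$

$$\beta_2 = \frac{n \sum_{i=1}^{n} x_{2i} y_i - \sum_{i=1}^{n} x_{2i} \sum_{i=1}^{n} y_i}{n \sum_{i=1}^{n} x_{2i}^2 - \left( \sum_{i=1}^{n} x_{2i} \right)^2}$$

or, equivalently, in deviation form

$$\hat{\beta}_2 = \frac{\sum (x_{2i} - \bar{x}_2)(y_i - \bar{y})}{\sum (x_{2i} - \bar{x}_2)^2}$$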

Formulas for Estimating the SLR Model
For the simple population regression: $Y_i = B_1 + B_2 X_{2i} + U_i$
we can estimate: $Y_i = b_1 + b_2 X_{2i} + e_i$ with $b_1$ and $b_2$ as follows:

$$b_1 = \bar{Y} - b_2 \bar{X}_2$$

$$b_2 = \frac{n \sum X_2 Y - \sum X_2 \sum Y}{n \sum X_2^2 - \left( \sum X_2 \right)^2} \quad \text{or} \quad b_2 = \frac{\sum (X_2 - \bar{X}_2)(Y - \bar{Y})}{\sum (X_2 - \bar{X}_2)^2}$$

Note 1: All Σs run from i = 1 to n, i.e. $\sum_{i=1}^{n}$; for simplicity we just write Σ.
Note 2: We also drop the “i” subscript on the variables (both X and Y) to keep the equations simple and readable.
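A minimal sketch of these formulas in code, assuming a small made-up data set (the numbers and the function names `ols_simple` and `slope_deviation_form` are illustrative, not from the lecture). It computes b2 both from the raw-sums formula and from the deviation form, which should agree.

```python
def ols_simple(x, y):
    """Return (b1, b2) for Y = b1 + b2*X using the raw-sums formula."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_xx = sum(xi * xi for xi in x)
    b2 = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    b1 = sum_y / n - b2 * sum_x / n  # b1 = Y_bar - b2 * X_bar
    return b1, b2


def slope_deviation_form(x, y):
    """Return b2 using deviations from the sample means."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    numerator = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    denominator = sum((xi - x_bar) ** 2 for xi in x)
    return numerator / denominator


# Hypothetical illustrative data (not from the lecture).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.0]

b1, b2 = ols_simple(x, y)
b2_dev = slope_deviation_form(x, y)
print(b1, b2, b2_dev)
```

For this data the raw sums are ΣX = 15, ΣY = 30, ΣXY = 109.7 and ΣX² = 55, so b2 = (5 × 109.7 − 15 × 30) / (5 × 55 − 15²) = 98.5 / 50 = 1.97 and b1 = 6 − 1.97 × 3 = 0.09.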


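Ordinary Least Squares (OLS/LS) Method for MLR
$$\sum u_i^2 = \sum (Y_i - BX)^2 \quad (4)$$

Or, for multiple linear regression (MLR), we can write Eq. (4) as
$$\sum u_i^2 = \sum (Y_i - B_1 - B_2 X_{2i} - B_3 X_{3i})^2$$
For minimization of the error sum of squares:
 $$\frac{\partial \sum u_i^2}{\partial B_1} = \frac{\partial}{\partial B_1} \sum (Y_i - B_1 - B_2 X_{2i} - B_3 X_{3i})^2 = 0$$
 $$\frac{\partial \sum u_i^2}{\partial B_2} = \frac{\partial}{\partial B_2} \sum (Y_i - B_1 - B_2 X_{2i} - B_3 X_{3i})^2 = 0$$
 $$\frac{\partial \sum u_i^2}{\partial B_3} = \frac{\partial}{\partial B_3} \sum (Y_i - B_1 - B_2 X_{2i} - B_3 X_{3i})^2 = 0$$

 These three equations give the following three results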

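Formulas for Multiple Regression Models
For the regression: $Y_i = B_1 + B_2 X_{2i} + B_3 X_{3i} + U_i$
 the estimated multiple regression is $Y_i = b_1 + b_2 X_{2i} + b_3 X_{3i} + e_i$ with $b_1$, $b_2$ and $b_3$ as follows:
$$b_1 = \bar{Y} - b_2 \bar{X}_2 - b_3 \bar{X}_3$$

$$b_2 = \frac{(\sum X_3^2)(\sum X_2 Y) - (\sum X_2 X_3)(\sum X_3 Y)}{(\sum X_2^2)(\sum X_3^2) - (\sum X_2 X_3)^2}$$

$$b_3 = \frac{(\sum X_2^2)(\sum X_3 Y) - (\sum X_2 X_3)(\sum X_2 Y)}{(\sum X_2^2)(\sum X_3^2) - (\sum X_2 X_3)^2}$$

Note: in the formulas for $b_2$ and $b_3$, the variables $X_2$, $X_3$ and $Y$ are measured as deviations from their sample means; the formula for $b_1$ uses the sample means themselves.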
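The two-regressor formulas can be sketched in code as follows. This is an illustration under stated assumptions: the data are made up and generated exactly from Y = 1 + 2·X2 + 3·X3 (no error term), so the formulas should recover 1, 2 and 3; the function name `ols_two_regressors` is hypothetical. The sums in b2 and b3 are taken over deviations from the sample means.

```python
def ols_two_regressors(x2, x3, y):
    """Return (b1, b2, b3) for Y = b1 + b2*X2 + b3*X3 via the closed-form formulas."""
    n = len(y)
    x2_bar, x3_bar, y_bar = sum(x2) / n, sum(x3) / n, sum(y) / n
    # Deviations from sample means.
    d2 = [v - x2_bar for v in x2]
    d3 = [v - x3_bar for v in x3]
    dy = [v - y_bar for v in y]
    s22 = sum(a * a for a in d2)
    s33 = sum(a * a for a in d3)
    s23 = sum(a * b for a, b in zip(d2, d3))
    s2y = sum(a * b for a, b in zip(d2, dy))
    s3y = sum(a * b for a, b in zip(d3, dy))
    denominator = s22 * s33 - s23 ** 2  # zero if X2 and X3 are perfectly collinear
    b2 = (s33 * s2y - s23 * s3y) / denominator
    b3 = (s22 * s3y - s23 * s2y) / denominator
    b1 = y_bar - b2 * x2_bar - b3 * x3_bar
    return b1, b2, b3


# Hypothetical data generated exactly from Y = 1 + 2*X2 + 3*X3.
x2 = [1.0, 2.0, 3.0, 4.0, 5.0]
x3 = [2.0, 1.0, 4.0, 3.0, 6.0]
y = [1.0 + 2.0 * a + 3.0 * b for a, b in zip(x2, x3)]

b1, b2, b3 = ols_two_regressors(x2, x3, y)
print(b1, b2, b3)
```

With real data the fit is not exact, but the same formulas apply; the denominator check in the comment is the reason the two regressors must not be perfectly collinear.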


Where Did These Formulas Come From?

 These formulas have a complete derivation using the calculus tools of function minimization (you do not need to memorize it)

 They are obtained by minimizing the error sum of squares of these models (that is what OLS does)

 Anyone interested can check the annex of the course book. I can derive them too, but for applications of regression models we do not need the derivation; rather, we are interested in how it works

 I showed part of the derivation on the board for the simple linear regression model

 ALL YOU NEED IS TO KNOW THE BACKGROUND AND BE ABLE TO APPLY THE FORMULAS
