Intro To Analyzing Cross-Sectional Time-Series Data in R (For Students of IR & Comparative Politics)
Contents
Overview
Download packages
Introducing the World Development Indicators API in R
Simple demonstration of WDI package in R
Make a real research dataset
  Clean & transform the data
  Subset the data
Visualizing panel data
  Multiple variables (on a single scale) for one unit over time
  One time-series plot for one variable and one line per unit
  Panel of time-series plots for one variable and one plot per unit
Import Polity IV
Easily deal with country codes using the countrycode package
Subset variables of interest and merge with WDI
Standardize time-series on different scales
Visualize standardized time-series by country
Modeling pooled data
  Panel-corrected standard errors
Generate summary tables
Generate publication-quality results tables
  Save your dataset to a file
Overview
This document demonstrates how to perform several routine tasks in the analysis of state-level economic
and political data for multiple countries over time, using the programming language R. It is aimed at
first-year graduate students with little or no experience with R. The .Rmd file that generated this document
may be especially useful as a template for getting started. All you need to do is download and install R,
RStudio, and LaTeX, open the .Rmd file in RStudio, and modify or extend the examples already coded there.
This document downloads two widely used data sources from the web (the World Bank's World Development
Indicators and the Polity IV measure of democracy). It merges them, cleans and transforms some variables,
visualizes several variables in commonly recurring types of graphs, runs simple models, and generates
automatically formatted tables.
Download packages
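install.packages("WDI")
install.packages("countrycode")
install.packages("plm")
install.packages("ggplot2")
install.packages("reporttools")
install.packages("stargazer")
install.packages("psData")
install.packages("lmtest")  # coeftest(), used in the modeling section
install.packages("arm")     # loaded before merging and standardizing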
Introducing the World Development Indicators API in R
• To find the variable codes you need, first search with the WDIsearch() function.
• Then download the data with the WDI() function.
• Super easy!
Simple demonstration of WDI package in R
• Let's say we're interested in GDP growth across countries. First we'll search for "gdp growth" to see whether
the World Bank has a variable matching this search term.
### Use this function to find indicator codes for things that interest you
library(WDI)  # load the package before calling its functions
WDIsearch("gdp growth")
## indicator name
## [1,] "NV.AGR.TOTL.ZG" "Real agricultural GDP growth rates (%)"
## [2,] "NY.GDP.MKTP.KD.ZG" "GDP growth (annual %)"
• To download the data straight from the web, we use the WDI() function, specifying which countries,
variables, and years we’re interested in.
### Insert as many countries and variables as you want, within this range.
### Let's try getting GDP growth for US and France from 1960-2012
wb <- WDI(country=c("US", "FR"), indicator=c("NY.GDP.MKTP.KD.ZG"), start=1960, end=2012)
Make a real research dataset
• Download a set of commonly used variables for all countries between 1960 and the most recent update
of the WDI. Setting extra=TRUE retrieves some extra variables such as region, geocodes, and additional
country codes. The additional country code turns out to merge better with other datasets (see below).
Clean & transform the data
# The assignment operator "<-" adds a new variable (column) to the data frame "wb"
wb$debt <- wb$GC.DOD.TOTL.GD.ZS # central government debt as share of GDP
wb$curracct <- wb$BN.CAB.XOKA.GD.ZS # current account balance as share of GDP
wb$dependency <- wb$SP.POP.DPND # age dependency ratio: dependents as % of working-age population
wb$land <- wb$AG.LND.TOTL.K2 # total land area in square kilometers
wb$pop <- wb$SP.POP.TOTL # total population
wb$radioscap <- (wb$IT.RAD.SETS/wb$pop)*1000 # radios per 1000 people
wb$logradioscap <- log(wb$radioscap + 1) # log(1 + radios per 1000 people), because the variable is skewed
wb$newspaperscap <- wb$IT.PRT.NEWS.P3 # daily newspapers in circulation per 1000 people
wb$gdp <- wb$NY.GDP.MKTP.KD # gross domestic product, constant 2000 USD
wb$gdp2 <- wb$NY.GDP.MKTP.CD # gross domestic product, current USD
wb$gdp3 <- wb$NY.GDP.MKTP.PP.CD # gross domestic product, purchasing-power parity, current international dollars
wb$gdpcap <- wb$NY.GDP.PCAP.KD # GDP per capita, constant 2000 USD
wb$gdpgrowth <- wb$NY.GDP.MKTP.KD.ZG # annual GDP growth (%)
wb$imports <- wb$NE.IMP.GNFS.ZS # imports of goods and services as share of GDP
wb$exports <- wb$NE.EXP.GNFS.ZS # exports of goods and services as share of GDP
wb$trade <- wb$NE.TRD.GNFS.ZS # total trade as share of GDP
wb$fdi <- wb$BX.KLT.DINV.WD.GD.ZS # foreign direct investment, net inflows, as share of GDP
wb$privatecapital <- wb$BN.KLT.PRVT.GD.ZS # private capital flows as share of GDP
wb$spending <- wb$NE.CON.GOVT.ZS # government final consumption expenditure as share of GDP
wb$industry <- wb$SL.IND.EMPL.ZS # employment in industry as share of total employment
wb$industry2 <- wb$NV.IND.TOTL.ZS # value added in industry as share of GDP
Subset the data
Take a peek at the "head" (the first six rows) using the head() function. This is "panel" or "pooled"
cross-sectional time-series data, also called "country-year" format.
The World Bank keeps data not only on countries but also on certain aggregates of countries. Aggregate units
are marked by the category "Aggregates" in the factor variable "region" of the wb object (a data frame),
and each aggregate is identified by the "country" variable. To keep things tidy, let's break the wb data frame
into a few separate data frames. We'll make one for all the aggregates and one for regions in particular. For
the regions, you can use any other set of aggregates you're interested in: try replacing the region categories
with other categories in the "country" variable of the aggregate subset. The third will be for good old-fashioned
states.
wb<-subset(wb, region!="Aggregates")
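The paragraph above describes three subsets, but the line above keeps only the states. The following is a minimal sketch of the full three-way split. It assumes the complete download, with aggregates included, is still stored in wb, so it should run before that line overwrites wb. split_wdi() is a hypothetical helper. Aggregate names such as "East Asia & Pacific (all income levels)" can change between WDI releases, so list them with unique(aggregates$country) before choosing.

```r
# Hypothetical helper: split a full WDI download (extra = TRUE) into
# aggregates, a chosen set of regions, and individual states.
split_wdi <- function(wb_full, region_names) {
  aggregates <- subset(wb_full, region == "Aggregates")
  list(
    aggregates = aggregates,
    regions    = subset(aggregates, country %in% region_names),
    states     = subset(wb_full, region != "Aggregates")
  )
}
```

For example, you could run parts <- split_wdi(wb, "East Asia & Pacific (all income levels)") and then regions <- parts$regions and wb <- parts$states.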
Visualizing panel data
Here we explore our data with ggplot2, one of the most powerful graphing packages in R. The plots that follow
are a quick tour of graphs especially suited to panel data.
Multiple variables (on a single scale) for one unit over time
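The following is a minimal ggplot2 sketch of this kind of plot. It assumes the cleaned wb data frame with its year, country, spending and trade columns. plot_vars_one_unit() is a hypothetical helper, and the country you pass is your choice. Reshaping the two columns into long form lets ggplot2 draw one coloured line per variable on a shared % of GDP scale.

```r
library(ggplot2)

# Hypothetical helper: Spending and Trade (% of GDP) for one country over time.
plot_vars_one_unit <- function(wb, unit) {
  one <- subset(wb, country == unit, select = c(year, spending, trade))
  # Long format: one row per year and variable
  long <- data.frame(
    year     = rep(one$year, 2),
    value    = c(one$spending, one$trade),
    variable = rep(c("Spending", "Trade"), each = nrow(one))
  )
  ggplot(long, aes(x = year, y = value, colour = variable)) +
    geom_line() +
    labs(x = "Year", y = "% of GDP", colour = "Variables", title = unit)
}
```

For example, plot_vars_one_unit(wb, "France") draws the plot for France.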
[Figure: Spending and Trade (% of GDP) over time for one country; legend "Variables"]
One time-series plot for one variable and one line per unit
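The following is a minimal sketch. It assumes a data frame of regional aggregates with country, year and gdpcap columns, such as the regions subset described in the subsetting step. plot_one_var_by_unit() is a hypothetical helper. Mapping country to colour produces one line per unit.

```r
library(ggplot2)

# Hypothetical helper: GDP per capita over time, one line per region.
plot_one_var_by_unit <- function(df) {
  ggplot(df, aes(x = year, y = gdpcap, colour = country)) +
    geom_line() +
    labs(x = "Year", y = "GDP Per Capita", colour = "Regions")
}
```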
[Figure: GDP Per Capita over time, one line per region (legend "Regions", e.g. East Asia & Pacific (all income levels))]
Panel of time-series plots for one variable and one plot per unit
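The following is a minimal sketch that uses facet_wrap(), which draws one small panel per country. It assumes the country-level wb data frame has a region column from extra = TRUE. The region label "Europe & Central Asia" is an assumption and may differ across WDI releases. plot_region_panels() is a hypothetical helper.

```r
library(ggplot2)

# Hypothetical helper: trade (% of GDP) over time, one panel per country
# in the chosen region.
plot_region_panels <- function(states, region_name = "Europe & Central Asia") {
  sub <- subset(states, region == region_name)
  ggplot(sub, aes(x = year, y = trade)) +
    geom_line() +
    facet_wrap(~ country) +
    labs(x = "Year", y = "Trade (% of GDP)",
         title = paste("Trade Levels in", region_name))
}
```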
[Figure: Trade Levels in Europe and Central Asia, 1960-2012; one panel per country (Albania through Uzbekistan); y-axis Trade (% of GDP), x-axis Year]
Import Polity IV
require(psData)
polity <- PolityGet(vars = "polity2")
polity<-subset(polity, year>=1960 & year<=2012)
head(polity) # peek at the dataframe
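Polity IV and the WDI label countries differently. The countrycode package can translate country names into the iso3c codes that WDI returns with extra = TRUE, which are also the codes used as the panel index in the model below. The following is a minimal sketch. It assumes polity stores country names in a column called country; check names(polity), because PolityGet() may provide other identifiers. add_iso3c() is a hypothetical helper.

```r
library(countrycode)

# Hypothetical helper: add an ISO 3-letter code column (iso3c) converted
# from a column of country names.
add_iso3c <- function(df, name_col = "country") {
  df$iso3c <- countrycode(df[[name_col]], origin = "country.name",
                          destination = "iso3c")
  df
}
```

Typical use is polity <- add_iso3c(polity). Names that countrycode cannot match become NA and trigger a warning, so inspect those rows before merging.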
Subset variables of interest and merge with WDI
require(arm)
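The following is a minimal sketch of the merge and of standardizing variables that sit on different scales. It assumes both data frames carry iso3c and year columns. merge_wdi_polity() and standardize_within() are hypothetical helpers. The arm package loaded above provides rescale(), which centers a variable and divides by two standard deviations. This sketch swaps in plain within-country z-scores (one standard deviation) computed in base R.

```r
# Hypothetical helper: merge WDI country-years with Polity scores on iso3c
# and year, keeping the columns used in the models and tables below.
merge_wdi_polity <- function(wb, polity) {
  keep_wb <- c("iso3c", "country", "year", "spending", "trade",
               "dependency", "gdpgrowth", "lending", "income",
               "imports", "exports", "industry")
  merge(wb[, intersect(keep_wb, names(wb))],
        polity[, c("iso3c", "year", "polity2")],
        by = c("iso3c", "year"))  # inner join; use all.x = TRUE to keep all WDI rows
}

# Standardize within each group: subtract the group mean and divide by the
# group standard deviation, ignoring missing values.
standardize_within <- function(x, group) {
  ave(x, group, FUN = function(v) (v - mean(v, na.rm = TRUE)) / sd(v, na.rm = TRUE))
}
```

For example, df <- merge_wdi_polity(wb, polity) followed by df$trade_z <- standardize_within(df$trade, df$iso3c).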
Visualize standardized time-series by country
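The following is a minimal sketch. It assumes df holds country, year and several already-standardized columns; naming them with a _z suffix is a hypothetical convention. plot_standardized() is a hypothetical helper. Stacking the columns into long form lets ggplot2 draw one coloured line per variable in each country panel.

```r
library(ggplot2)

# Hypothetical helper: standardized variables over time, one panel per country.
plot_standardized <- function(df, vars) {
  # Long format: one row per country, year, and variable
  long <- do.call(rbind, lapply(vars, function(v)
    data.frame(country = df$country, year = df$year,
               value = df[[v]], variable = v)))
  ggplot(long, aes(x = year, y = value, colour = variable)) +
    geom_line() +
    facet_wrap(~ country) +
    labs(x = "Year", y = "Standardized deviations from mean",
         colour = "Standardized Variables")
}
```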
[Figure: Standardized Variables (legend) shown as standardized deviations from mean against Year, 1960-2010, one panel per country]
Modeling pooled data
require(plm)
model<-plm(spending ~ trade + polity2 + dependency + gdpgrowth + lag(spending),
index = c("iso3c","year"),
model="within",
effect="twoways",
data=df)
ls(model) # list the components of the fitted model object
Panel-corrected standard errors
require(lmtest)
coeftest(model, vcov=function(x) vcovBK(x, type="HC1", cluster="time")) # Beck-Katz standard errors via plm's vcovBK()
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## trade 0.00738 0.00233 3.17 0.0015 **
## polity2 0.01961 0.00951 2.06 0.0392 *
## dependency 0.00604 0.00364 1.66 0.0971 .
## gdpgrowth -0.05459 0.00725 -7.53 6e-14 ***
## lag(spending) 0.79578 0.02035 39.10 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
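For orientation, the panel-corrected covariance of Beck and Katz can be written as the sandwich below. This is its textbook form for a balanced panel of N units observed over T periods, with observations stacked unit by unit and OLS residuals e_{it}. plm's vcovBK() implements a generalization of this idea, and its type and cluster arguments change the details. Treat this as a sketch of the concept rather than the function's exact computation.

```latex
\widehat{\operatorname{Var}}(\hat{\beta})
  = (X^{\top}X)^{-1}\, X^{\top}\hat{\Omega}X\, (X^{\top}X)^{-1},
\qquad
\hat{\Omega} = \hat{\Sigma} \otimes I_T,
\qquad
\hat{\Sigma}_{ij} = \frac{1}{T}\sum_{t=1}^{T} e_{it}\, e_{jt}
```

The N by N matrix \hat{\Sigma} lets each unit have its own error variance and lets errors of different units be correlated within the same period.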
Generate summary tables
These functions generate publication-quality LaTeX tables of summary statistics for the variables you select
from your data frame. The functions print LaTeX code in the R console. To use a table in a paper, copy that
code into a LaTeX document and typeset it.
require(reporttools)
summaryvars <- subset(df, select=c("lending", "income", "imports", "exports", "industry"))
tableContinuous(summaryvars[, sapply(summaryvars, is.numeric)], font.size=12,
longtable=FALSE, comment=FALSE, timestamp=FALSE,
cap="Example Table for Continuous Variables") # numeric variables only
tableNominal(summaryvars[, !sapply(summaryvars, is.numeric)],
longtable=FALSE, cap="Example Table for Nominal Variables") # factor variables only
Variable     n   Min    q1  Median  Mean    q3    Max     s   IQR   #NA
imports   6576   0.1  22.3    32.2  38.6  48.0  424.8  27.3  25.7  1058
exports   6576   0.2  17.2    27.7  33.3  42.8  230.3  24.3  25.6  1058
industry  2403   2.1  19.8    24.3  24.7  29.9   59.6   7.8  10.1  5231
Variable  Levels                   n      %  Cum. %
lending   Aggregates               0    0.0     0.0
          Blend                  488    6.4     6.4
          IBRD                  2627   34.4    40.8
          IDA                   2635   34.5    75.3
          Not classified        1884   24.7   100.0
          all                   7634  100.0
income    Aggregates               0    0.0     0.0
          High income: nonOECD   464    6.1     6.1
          High income: OECD     1494   19.6    25.6
          Low income            1676   21.9    47.6
          Lower middle income   2088   27.4    75.0
          Not classified           0    0.0    75.0
          Upper middle income   1912   25.1   100.0
          all                   7634  100.0
Generate publication-quality results tables
The stargazer function in the package of the same name turns your model object into a nicely formatted
table of results. It works much like the reporttools functions above. A handy trick is to have stargazer write
the LaTeX code straight to a file with its out argument, which saves the file in your working directory. After
running these lines, look in your working directory for the LaTeX file called "model.tex" and typeset it in a
LaTeX editor.
require(stargazer)
model<-lm(spending ~ trade + polity2 + dependency + gdpgrowth + lag(spending), data=df)
stargazer(model, out="model.tex", # also write the LaTeX code to model.tex in the working directory
header=FALSE,
title="Table of Regression Results",
digits = 2,
style = "apsr")
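The output below reports an R² of 1.00, a coefficient of 1.00 on lag(spending) and an enormous F statistic. A likely cause is how lag() behaves outside plm. On an ordinary numeric vector, stats::lag() shifts only a time-series attribute, not the values, so lag(spending) inside lm() is simply spending again. The following is a minimal sketch of building the lagged variable within each country before fitting. It assumes df has iso3c and year columns and no gaps in years within a country. lag_within() is a hypothetical helper.

```r
# Hypothetical helper: previous-year value of x within each unit, computed
# after sorting by unit and year; results are returned in the original row order.
# Assumes consecutive years with no gaps inside each unit.
lag_within <- function(x, unit, year) {
  ord <- order(unit, year)
  out <- rep(NA_real_, length(x))
  out[ord] <- ave(x[ord], unit[ord], FUN = function(v) c(NA, head(v, -1)))
  out
}
```

For example, df$lagspending <- lag_within(df$spending, df$iso3c, df$year) followed by lm(spending ~ trade + polity2 + dependency + gdpgrowth + lagspending, data=df).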
                     spending
trade                −0.00***
                     (0.00)
polity2              −0.00***
                     (0.00)
dependency           0.00
                     (0.00)
gdpgrowth            0.00***
                     (0.00)
lag(spending)        1.00***
                     (0.00)
Constant             −0.00***
                     (0.00)
N                    6,094
R²                   1.00
Adjusted R²          1.00
Residual Std. Error  0.00 (df = 6088)
F Statistic          1,784,599,984,481,631,618,244,476,630,204,416.00*** (df = 5; 6088)
*p < .1; **p < .05; ***p < .01
Save your dataset to a file
Finally, this code chunk will save your dataframe “df” to a comma-separated text file, so you can easily open
it in Stata, SPSS, Excel, etc.
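The following is a minimal sketch using base R's write.csv(). The file name df.csv and the helper save_dataset() are hypothetical. Setting row.names = FALSE keeps R's row names out of the file; otherwise Stata, SPSS or Excel would read them as an extra column.

```r
# Hypothetical helper: write a data frame to a comma-separated file
# (in the working directory unless path includes a directory).
save_dataset <- function(df, path = "df.csv") {
  write.csv(df, file = path, row.names = FALSE)
  invisible(path)
}
```

Typical use is save_dataset(df).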