ECON6067
Computation and Analysis of Economic Data
Stata (II)
Karen Xiaoting Mai
Fall 2022
Plan
▶ Graphs
▶ Time-Series Operators
▶ Encode/Decode
▶ T-Test
▶ Linear Regression
Graphs
Twoway Graphs
▶ [graph] twoway plot [if] [in] [, twoway_options]
▶ Examples of plottype
▶ scatter: scatterplot
▶ line: line plot
▶ connected: connected-line plot
▶ bar: bar plot
▶ lfit: linear prediction plot
▶ qfit: quadratic prediction plot
▶ lfitci: linear prediction plot with CIs
▶ qfitci: quadratic prediction plot with CIs
▶ function: line plot of function
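A minimal sketch of a few of these plot types. It assumes Stata's bundled auto dataset (loaded with sysuse auto), which is not part of the original slides; the variables mpg and weight are chosen only for illustration.

```stata
* Illustration only: Stata's bundled auto data, not the course data
sysuse auto, clear

* Linear fit with a confidence band, with the raw points drawn on top
twoway (lfitci mpg weight) (scatter mpg weight)

* Quadratic prediction plot overlaid on the scatterplot
twoway (scatter mpg weight) (qfit mpg weight)

* Line plot of a function; this plot type does not use the data in memory
twoway function y = x^2, range(-2 2)
```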
Line Plot
▶ Line plot of y1 vs x
▶ twoway line y1 x
▶ Line plot of y1, y2, y3 each against sorted values of x
▶ twoway line y1 y2 y3 x, sort
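As a hedged illustration of the two commands above, the sketch below assumes Stata's bundled uslifeexp dataset (an assumption, not the course data), in which year plays the role of x and the life-expectancy series play the role of y1, y2.

```stata
* Illustration only: bundled U.S. life-expectancy data
sysuse uslifeexp, clear

* One series against x
twoway line le year

* Several series against x; sort orders the observations by x
* before the points are connected
twoway line le_male le_female year, sort
```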
Scatter Plot
▶ Scatter plot
▶ twoway scatter y x
▶ Adding a line of best fit
▶ twoway scatter y x || lfit y x, or equivalently
▶ twoway (scatter y x) (lfit y x)
▶ Combine with more plot types
▶ twoway (scatter ...) (line...) (lfit ...)
▶ Save the graph
▶ graph save [graphname] filename [, asis replace]
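A short sketch of the overlay-and-save workflow, assuming the bundled auto dataset. The file name mpg_weight.gph is a hypothetical choice made for this example.

```stata
* Illustration only: Stata's bundled auto data
sysuse auto, clear

* Scatterplot with a line of best fit from the same y and x
twoway (scatter mpg weight) (lfit mpg weight)

* Save the graph currently in memory; replace overwrites an existing file
graph save "mpg_weight.gph", replace
```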
Gph Files
▶ Gph files come in three forms
▶ old-format Stata 7 or earlier .gph file
▶ modern-format graph in live format
▶ contains the data and other information necessary to re-create
the graph
▶ can be edited later and can be displayed using different
schemes
▶ the data used to create the graph can be retrieved from the .gph
file
▶ modern-format graph in as-is format
▶ contains only a recording of the picture
▶ generally smaller than live-format files
▶ cannot be modified
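The difference between the two modern formats can be sketched as follows. The file names are hypothetical, and the s1mono scheme is just one example of a scheme that ships with Stata.

```stata
* Illustration only: Stata's bundled auto data
sysuse auto, clear
twoway scatter mpg weight

* Live format (the default): stores the data and the graph specification
graph save "live_version.gph", replace

* As-is format: stores only a recording of the picture
graph save "asis_version.gph", asis replace

* A live-format file can be redisplayed under a different scheme
graph use "live_version.gph", scheme(s1mono)

* An as-is file is simply redrawn as it was saved
graph use "asis_version.gph"
```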
Time-Series Operators
▶ Suppose the dataset has a variable that represents time in
numeric values, say, 1980, 1981, ...
▶ Use tsset to set time variable and then use Stata time series
operators and commands
▶ Declare the data to be a single time series
▶ tsset timevar
▶ Declare the data to be a panel (a collection of time series)
▶ tsset panelvar timevar
▶ Time-series operators: L., F., D.
▶ Lag: L.x = $x_{t-1}$, L2.x = $x_{t-2}$
▶ Lead: F.x = $x_{t+1}$, F2.x = $x_{t+2}$
▶ Difference: D.x = $x_t - x_{t-1}$, D2.x = $(x_t - x_{t-1}) - (x_{t-1} - x_{t-2})$
▶ e.g.,
▶ gen GDPchange = (GDP - L.GDP) / L.GDP
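A minimal sketch of the operators in use, assuming the bundled uslifeexp dataset, where year is the time variable and le stands in for a series such as GDP. The generated variable names are hypothetical.

```stata
* Illustration only: bundled annual data, one observation per year
sysuse uslifeexp, clear

* Declare year as the time variable
tsset year

gen le_lag    = L.le              // value in the previous year
gen le_lead   = F.le              // value in the next year
gen le_diff   = D.le              // le minus its lag
gen le_growth = (le - L.le) / L.le

* The first observation has no lag, so le_lag is missing there
list year le le_lag le_lead le_diff le_growth in 1/5
```

The operators respect the declared time variable: if a year were absent from the data, L.le would be missing for the following year rather than silently taking the previous row.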
Encode/Decode
▶ The panelvar in tsset panelvar timevar needs to be numeric
▶ String variable to numeric variable
▶ encode varname, gen(newvar)
▶ Labeled numeric variable to string variable (uses the value labels)
▶ decode varname, gen(newvar)
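To show why encode matters for tsset, the sketch below builds a tiny panel by hand. The country codes and GDP figures are made up for illustration, and all variable names are hypothetical.

```stata
* Made-up panel: two countries, two years each
clear
input str3 country year gdp
"AAA" 2000 100
"AAA" 2001 110
"BBB" 2000 200
"BBB" 2001 210
end

* tsset needs a numeric panel variable, so encode the string first;
* the original strings become the value labels of country_id
encode country, gen(country_id)
tsset country_id year

* Time-series operators now work within each country
gen gdp_growth = D.gdp / L.gdp

* decode turns the value labels back into a string variable
decode country_id, gen(country_name)
list, sepby(country_id)
```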
Example: Penn World Table
Penn World Table
▶ Question: Do poor countries grow faster?
▶ Average annual growth rate of real per capita GDP vs. real per
capita GDP 1960
▶ Data: Penn World Table 10.0
▶ Information on relative levels of income, output, input and
productivity, covering 183 countries 1950-2019
▶ https://www.rug.nl/ggdc/productivity/pwt/
▶ Feenstra, Robert C., Robert Inklaar and Marcel P. Timmer
(2015), "The Next Generation of the Penn World Table,"
American Economic Review, 105(10), 3150-3182, available for
download at http://www.ggdc.net/pwt.
▶ Use the series “rgdpo” for GDP: Output-Side Real GDP at
Chained PPPs
▶ Output-side real GDP allows comparison of productive
capacity across countries and over time
▶ Recall: with discrete time, we can derive the average growth rate $g$
from
$$Y_t = Y_0 (1 + g)^t$$
So
$$g = \left(\frac{Y_t}{Y_0}\right)^{1/t} - 1$$
Approximately
$$g \approx \frac{\ln Y_t - \ln Y_0}{t}$$
With continuous time, this is exact.
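The two growth formulas can be compared in a small sketch. The numbers below are made up and are not Penn World Table values; the variable names countrycode and pop are assumptions about how the data might be laid out, while rgdpo is the series named above.

```stata
* Made-up data: two countries observed in 1960 and 2019
clear
input str3 countrycode year rgdpo pop
"AAA" 1960  1000 10
"AAA" 2019  8000 20
"BBB" 1960  5000 10
"BBB" 2019 15000 12
end

* Real GDP per capita
gen rgdppc = rgdpo / pop

* Within each country, [1] is the first year and [_N] the last
bysort countrycode (year): gen g_exact = ///
    (rgdppc[_N] / rgdppc[1])^(1 / (year[_N] - year[1])) - 1
bysort countrycode (year): gen g_log = ///
    (ln(rgdppc[_N]) - ln(rgdppc[1])) / (year[_N] - year[1])

* Initial income, for plotting growth against the 1960 level
bysort countrycode (year): gen rgdppc_1960 = rgdppc[1]

list countrycode year rgdppc g_exact g_log, sepby(countrycode)
```

For positive growth the log approximation is slightly below the exact discrete-time rate, and the gap shrinks as g gets smaller.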
T-Test
▶ T-test: test equality of means
▶ One sample: compares the mean of the sample to a given
number
▶ ttest varname == #
▶ Two unpaired samples: tests whether the difference in the
means of the two groups is 0
▶ ttest varname, by(groupvar)
▶ ttest income, by(gender)
▶ Two paired samples: tests whether the difference in the
means of two variables measured on the same set of
subjects is 0, taking into account that the two measurements are
not independent
▶ ttest varname1 == varname2
▶ ttest bp_before == bp_after
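The three forms can be tried on datasets that ship with Stata. The sketch below assumes the bundled auto and bpwide datasets; the hypothesized mean of 20 is an arbitrary value chosen for illustration.

```stata
* One-sample test: is mean mpg equal to 20?
sysuse auto, clear
ttest mpg == 20

* Two unpaired samples: mean mpg of domestic vs. foreign cars
ttest mpg, by(foreign)

* Two paired samples: blood pressure before and after, same patients
sysuse bpwide, clear
ttest bp_before == bp_after
```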
▶ Stored results
▶ r(N_1) sample size n_1
▶ r(N_2) sample size n_2
▶ r(p_l) lower one-sided p-value
▶ r(p_u) upper one-sided p-value
▶ r(p) two-sided p-value
▶ r(se) estimate of standard error
▶ r(t) t statistic
▶ r(sd_1) standard deviation for first variable
▶ r(sd_2) standard deviation for second variable
▶ r(sd) combined standard deviation
▶ r(mu_1) x_1 bar, mean for population 1
▶ r(mu_2) x_2 bar, mean for population 2
▶ r(df_t) degrees of freedom
▶ r(level) confidence level
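A hedged sketch of how these stored results might be used after a test, again assuming the bundled auto dataset. The scalar names are hypothetical. Results in r() are overwritten by the next r-class command, so it is safest to copy them right away.

```stata
sysuse auto, clear
ttest mpg, by(foreign)

* Show everything ttest left behind in r()
return list

* Copy the results needed later into scalars
scalar mean_diff = r(mu_1) - r(mu_2)
scalar t_stat    = r(t)
scalar p_two     = r(p)
scalar se_diff   = r(se)

display "Difference in means: " mean_diff
display "t = " t_stat ", two-sided p = " p_two
```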
Linear Regression
Regress
▶ Linear regression
▶ regress depvar [indepvars] [if] [in] [weight] [, options]
▶ regress y x1 x2 x3
▶ Common options
▶ noconstant: suppress constant term
▶ vce(vcetype): specifies the type of standard error reported.
vcetype may be ols, robust, cluster clustvar, bootstrap, ...
▶ vce(robust): robust to some kinds of misspecification
▶ vce(cluster clustvar): allows for intragroup correlation
▶ depvar and indepvars may contain time-series operators
▶ Stored results
▶ e(N): number of observations
▶ e(r2): R-squared
▶ e(r2_a): adjusted R-squared
▶ e(F): F statistic
▶ e(V): variance-covariance matrix of the estimators
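The options and stored results above can be sketched as follows, assuming the bundled auto and uslifeexp datasets. Clustering on rep78 is done only to show the syntax; it is not a sensible cluster variable.

```stata
sysuse auto, clear

* Robust standard errors
regress price mpg weight, vce(robust)

* Cluster-robust standard errors (rep78 used purely as a syntax example)
regress price mpg weight, vce(cluster rep78)

* Default OLS standard errors, then inspect the stored results
regress price mpg weight
display "N = " e(N) ", R2 = " e(r2) ", adjusted R2 = " e(r2_a)
display "F = " e(F)
matrix list e(V)

* Time-series operators in the variable list require tsset data
sysuse uslifeexp, clear
tsset year
regress le L.le
```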
Adding Interactions
▶ In Stata, you can use factor-variable operators to create virtual variables (indicators and interactions that are not stored in the dataset)
▶ i. unary operator to specify indicators
▶ c. unary operator to treat as continuous
▶ # binary operator to specify interactions
▶ ## binary operator to specify factorial interactions
▶ Add interactions between variables by putting ## between them
▶ x1##x2
▶ include main effects of x1 and x2 and their interactions
▶ Variables in an interaction are assumed to be categorical unless
stated otherwise
▶ If the interaction involves a continuous variable
▶ x1##c.x2
▶ To include only the interaction terms
▶ x1#x2
▶ To include only the main effects of categorical variables
▶ i.x1 i.x2
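A minimal sketch of each case, assuming the bundled nlsw88 and auto datasets. The variables play the roles of x1 and x2 and are chosen only for illustration.

```stata
* Categorical x categorical: main effects plus interaction
sysuse nlsw88, clear
regress wage union##collgrad

* Categorical x continuous: mark the continuous variable with c.
sysuse auto, clear
regress price foreign##c.weight

* Interaction terms only (a separate weight slope for each group,
* with no main effect of foreign)
regress price foreign#c.weight

* Main effects of two categorical variables only
regress price i.foreign i.rep78

* Refit the ## model so its coefficients are the active results
regress price foreign##c.weight
display _b[1.foreign#c.weight]
```

After fitting, adding the coeflegend option to regress shows the exact name of each coefficient, which is useful when referring to interaction terms later.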
Hypothesis Tests on Coefficients
▶ Jointly tests linear hypotheses about model coefficients
▶ test x1
▶ test x1 x2
▶ test (x1==10) (x2==2)
▶ test 2.x1==100
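The forms above can be sketched after a regression on the bundled auto dataset (an assumption). The hypothesized values are arbitrary and only illustrate the syntax.

```stata
sysuse auto, clear
regress price mpg weight i.rep78

* H0: coefficient on mpg is 0
test mpg

* H0: coefficients on mpg and weight are jointly 0
test mpg weight

* H0: coefficient on mpg is 0 and coefficient on weight is 2, jointly
test (mpg == 0) (weight == 2)

* H0: coefficient on the rep78 == 2 indicator equals 100
test 2.rep78 == 100

* test stores the F statistic and p-value in r()
display "F = " r(F) ", p = " r(p)
```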