An Introduction To Stata For Economists: Data Analysis

This document provides an overview and introduction to analyzing panel data and performing various statistical analyses in Stata. It covers generating summary statistics, correlation analyses, linear regression, instrumental variables estimation, and converting between long and wide formats for panel data. Do-files are introduced as a way to save and run sequences of Stata commands.

Uploaded by

Xiaoying Xu

An Introduction to Stata for Economists
Part II: Data Analysis
Kerry L. Papps
2. Overview
• Do-files
• Summary statistics
• Correlation
• Linear regression
• Generating predicted values and hypothesis testing
• Instrumental variables and other estimators
• Panel data capabilities
• Panel estimators
3. Overview (cont.)
• Writing loops
• Graphs
4. Comment on notation used
• Consider the following syntax description:
list [varlist] [in range]
– Command words (here list and in) appear in typewriter-style font on the original slides and should be typed exactly as they appear (although there are possibilities for abbreviation).
– Italicised words (here varlist and range) should be replaced by the desired variable names etc.
– Square brackets (i.e. []) enclose optional parts of a command (do not actually type the brackets).
5. Comment on notation used (cont.)
• For example, an actual Stata command might be:
list name occupation
• This notation is consistent with the notation in the Stata Help menu and manuals.
6. Do-files
• Do-files allow commands to be saved and
executed in “batch” form.
• We will use the Stata do-file editor to write do-files.
• To open the do-file editor, click Window → Do-File Editor or click the do-file editor button on the toolbar.
• You can also use WordPad or Notepad: save as “Text Document” with the extension “.do” (instead of “.txt”). This allows larger files than the do-file editor.
7. Do-files (cont.)
• Note: a blank line must be included at the end of a WordPad do-file (otherwise the last line will not run).
• To run a do-file from within the do-file editor, either select Tools → Do or click the Do button on the toolbar.
• If you highlight certain lines of code, only those commands will run.
• To run a do-file from the main Stata window, either select File → Do or type:
do dofilename
8. Do-files (cont.)
• Can “comment out” lines by preceding with * or
by enclosing text within /* and */.
• Can save the contents of the Review window as a
do-file by right-clicking on window and selecting
“Save All...”.
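The points above can be put together in a short do-file. This is only a minimal sketch: the file name example.do and the use of the auto dataset that ships with Stata are assumptions made for illustration and do not come from the original slides.

```stata
* example.do: a hypothetical do-file (the name is illustrative)
sysuse auto, clear      /* load the auto dataset shipped with Stata */

* the next command is commented out and will not run
* list make price

/* a comment block can
   span several lines */
summarize price mpg

```

If the file is saved as example.do in the working directory, it could be run by typing do example in the Command window.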
9. Univariate summary statistics
• tabstat produces a table of summary statistics:
tabstat varlist [, statistics(statlist)]
• Example (semean requests the standard error of the mean):
tabstat age educ, stats(mean sd semean n)
• summarize displays a variety of univariate
summary statistics (number of non-missing
observations, mean, standard deviation, minimum,
maximum):
summarize [varlist]
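A short sketch of both commands in use. It assumes Stata's built-in auto dataset (not used in the slides), so the variable names price, mpg and foreign are illustrative only.

```stata
sysuse auto, clear

* mean, standard deviation, standard error of the mean and count
tabstat price mpg, statistics(mean sd semean n)

* default summary: observations, mean, sd, min and max
summarize price mpg

* the same summary restricted to a subset of observations
summarize price mpg if foreign == 1
```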
10. Multivariate summary statistics
• table displays a table of statistics:
table rowvar [colvar] [, contents(clist varname)]
• clist can be freq, mean, sum etc.
• rowvar and colvar may be numeric or string variables.
• Example:
table sex educ, c(mean age median inc)
11. Multivariate summary
statistics (cont.)
• One “super-column” and up to 4 “super-rows” are
also allowed.
• Missing values are excluded from tables by
default. To include them as a group, use the
missing option with table.
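The table syntax described above is the one used before Stata 17. In Stata 17 and later the command was rewritten, and the contents() option may not be accepted unless the command is run under version control. The sketch below assumes Stata 17 or later and the built-in auto dataset (both are assumptions, not part of the slides). It shows what the newer form might look like, followed by the older form prefixed with version 16:.

```stata
sysuse auto, clear

* Stata 17 or later: row and column variables go in parentheses,
* and each requested statistic is given in its own statistic() option
table (foreign) (rep78), statistic(mean price) statistic(median mpg)

* the older syntax, run under version control
version 16: table foreign rep78, contents(mean price median mpg)
```

In Stata 16 or earlier only the second form (without the version prefix) is expected to work.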
EXERCISE 1
12. Generating simple statistics
• Open the do-file editor in Stata. Run all your solutions
to the exercises from here.
• Open nlswork.dta from the internet as follows:
webuse nlswork
• Type summarize to look at the summary statistics
for all variables in the dataset.
• Generate a wage variable, which exponentiates
ln_wage:
gen wage=exp(ln_wage)
EXERCISE 1 (cont.)
13. Generating simple statistics
• Restrict summarize to hours and wage and
perform it separately for non-married and married
(i.e. msp==0 and 1).
• Use tabstat to report the mean, median,
minimum and maximum for hours and wage.
• Report the mean and median of wage by age
(along the rows) and race (across the columns) :
table age race, c(mean wage median
wage)
14. Sets of dummy variables
• Dummy variables take the values 0 and 1 only.
• Large sets of dummy variables can be created
with:
tab varname, gen(dummyname)
• When using large numbers of dummies in regressions, it is useful to name them with a pattern, e.g. id1, id2… Then id* can be used to refer to all variables beginning with id.
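As an illustration, the sketch below creates a set of dummies and then refers to them with a wildcard. The auto dataset and the stub name origin are assumptions chosen for the example.

```stata
sysuse auto, clear

* foreign takes two values, so this creates origin1 and origin2
tabulate foreign, generate(origin)

* the wildcard origin* refers to both new dummies
summarize origin*
```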
15. Correlation
• To obtain the correlation between a set of
variables, type:
correlate [varlist] [[weight]] [, covariance]
• covariance option displays the covariances
rather than the correlation coefficients.
• pwcorr displays all the pairwise correlation
coefficients between the variables in varlist:
pwcorr [varlist] [[weight]] [, sig]
16. Correlation (cont.)
• sig option adds a line to each row of matrix
reporting the significance level of each correlation
coefficient.
• Difference between correlate and pwcorr is
that the former performs listwise deletion of
missing observations while the latter performs
pairwise deletion.
• To display the estimated covariance matrix after a
regression command use:
estat vce
17. Correlation (cont.)
• (This matrix can also be displayed using Stata’s
matrix commands, which we will not cover in this
course.)
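The difference between listwise and pairwise deletion can be seen in a small sketch. It assumes the built-in auto dataset, in which rep78 has some missing values. The obs option of pwcorr, which is not mentioned in the slides, prints the number of observations used for each pair.

```stata
sysuse auto, clear

* listwise deletion: only cars with all three variables non-missing are used
correlate price mpg rep78

* pairwise deletion, with significance levels and observation counts
pwcorr price mpg rep78, sig obs

* covariances instead of correlation coefficients
correlate price mpg, covariance
```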
18. Linear regression
• To perform a linear regression of depvar on
varlist, type:
regress depvar varlist [[weight]] [if exp] [, noconstant robust]
• depvar is the dependent variable.
• varlist is the set of independent variables
(regressors).
• By default Stata includes a constant. The
noconstant option excludes it.
19. Linear regression (cont.)
• robust specifies that Stata report the Huber-
White standard errors (which account for
heteroskedasticity).
• Weights are often used, e.g. when data are group averages, as in:
regress inflation unemplrate year [aweight=pop]
• This is weighted least squares (a special case of GLS).
• Note that here year allows for a linear time trend.
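A minimal sketch of the regress options described above, assuming the built-in auto dataset (the variables price, mpg and weight are illustrative and not from the slides):

```stata
sysuse auto, clear

* OLS with a constant (the default)
regress price mpg weight

* the same model with Huber-White standard errors
regress price mpg weight, robust

* the same model without a constant
regress price mpg weight, noconstant
```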
20. Post-estimation commands
• After all estimation commands (e.g. regress, logit) several predicted values can be computed using predict.
• predict refers to the most recent model
estimated.
• predict yhat, xb creates a new variable yhat
equal to the predicted values of the dependent
variable.
• predict res, residual creates a new
variable res equal to the residuals.
21. Post-estimation commands
(cont.)
• Linear hypotheses can be tested (e.g. t-test or F-
test) after estimating a model by using test.
• test varlist tests that the coefficients
corresponding to every element in varlist jointly
equal zero.
• test eqlist tests the restrictions in eqlist, e.g.:
test sex==3
• The option accumulate allows a hypothesis to
be tested jointly with the previously tested
hypotheses.
22. Post-estimation commands
(cont.)
• Example:
regress lnw sex race school age
test sex race
test school == age, accum
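The post-estimation commands can be combined as in the following sketch. It assumes the built-in auto dataset, and the new variable names yhat and res follow the naming used in the bullets above.

```stata
sysuse auto, clear
regress price mpg weight foreign

* fitted values and residuals from the regression just estimated
predict yhat, xb
predict res, residuals

* with a constant in the model, OLS residuals average to (about) zero
summarize res

* joint test that the coefficients on mpg and weight are both zero
test mpg weight

* add a further restriction to the hypothesis tested above
test foreign == 0, accumulate

* estimated covariance matrix of the coefficient estimates
estat vce
```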
EXERCISE 2
23. Linear regression
• Compute the correlation between wage and
grade. Is it significant at the 1% level?
• Generate a variable called age2 that is equal to the square of age (the power operator in Stata is ^).
• Create a set of race dummies with:
tab race, gen(race)
• Regress ln_wage on: age, age2, race2,
race3, msp, grade, tenure, c_city.
EXERCISE 2 (cont.)
24. Linear regression
• Display the covariance matrix from this
regression.
• Use predict to generate a variable res
containing the residuals from the equation.
• Use summarize to confirm that the mean of the
residuals is zero.
• Rerun the regression and report Huber-White
standard errors.
25. Additional estimators
• Instrumental variables:
ivregress 2sls depvar exogvars (endogvars=ivvars)
• Both exogvars and ivvars are used as instruments for endogvars.
• For example:
ivregress 2sls price inc pop (qty=cost)
• Logit:
logit depvar indepvars
26. Additional estimators
(cont.)
• Probit:
probit depvar indepvars
• Ordered probit:
oprobit depvar indepvars
• Tobit:
tobit depvar indepvars, ll(cutoff)
• For example, tobit could be used to estimate
labour supply:
tobit hrs educ age child, ll(0)
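The estimators above might be used as in the sketch below. The datasets are assumptions: hsng2 is an example dataset downloaded with webuse (so an internet connection is needed), and auto ships with Stata. Neither appears in the slides, and the model specifications are purely illustrative.

```stata
* IV: hsngval is treated as endogenous, with faminc as the excluded instrument
webuse hsng2, clear
ivregress 2sls rent pcturban (hsngval = faminc)

* probit for a binary outcome
sysuse auto, clear
probit foreign mpg weight

* tobit with the dependent variable censored from below at 17
tobit mpg weight, ll(17)
```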
EXERCISE 3
27. IV and probit
• Repeat the regression from Exercise 2 using
ivregress 2sls and instrument for tenure
using union and south. Compare the results
with those from Exercise 2.
• Estimate a probit model for union with the
following regressors: age, age2, race2,
race3, msp, grade, c_city, south.
28. Panel data manipulation
• Panel data generally refer to the repeated
observation of a set of fixed entities at fixed
intervals of time (also known as longitudinal data).
• Stata is particularly good at arranging and analysing
panel data.
• Stata refers to two panel display formats:
– Wide form: useful for display purposes and often the form in which data are obtained.
– Long form: needed for regressions etc.
29. Panel data manipulation
(cont.)
Example of wide form:

 i                   x_ij
id   sex   inc2008   inc2009   inc2010
 1    0       5000      5500      6000
 2    1       2000      2200      3300
 3    0       3000      2000      1000

• Note the naming convention for inc.

30. Panel data manipulation
(cont.)
Example of long form:

 i     j          x_ij
id   year   sex    inc
 1   2008    0    5000
 1   2009    0    5500
 1   2010    0    6000
 2   2008    1    2000
 2   2009    1    2200
 2   2010    1    3300
 3   2008    0    3000
 3   2009    0    2000
 3   2010    0    1000
31. Panel data manipulation
(cont.)
• To change from long to wide form, type:
reshape wide varlist, i(ivarname) j(jvarname)
• varlist is the list of variables to be converted from
long to wide form.
• i(ivarname) specifies the variable(s) whose
unique values denote the spatial unit.
• j(jvarname) specifies the variable whose unique
values denote the time period.
32. Panel data manipulation
(cont.)
• To change from wide to long form, type:
reshape long stublist, i(ivarname) j(jvarname)
• stublist is the “word” part of the names of
variables to be converted from wide to long form,
e.g. “inc” above.
• It is important to name variables in this format, i.e.
word description followed by year.
33. Panel data manipulation
(cont.)
• To move between the above example datasets use:
reshape long inc, i(id) j(year)
reshape wide inc, i(id) j(year)
• These steps “undo” each other.
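The round trip can be tried on the small example dataset from the earlier slides. The sketch below types the wide-form data in with input (an assumption about how the data are entered; the slides do not say) and then applies the two reshape commands.

```stata
clear
input id sex inc2008 inc2009 inc2010
1 0 5000 5500 6000
2 1 2000 2200 3300
3 0 3000 2000 1000
end

* wide to long: one row per id and year, with year taken from the inc suffix
reshape long inc, i(id) j(year)
list, sepby(id)

* long back to wide: recovers inc2008, inc2009 and inc2010
reshape wide inc, i(id) j(year)
list
```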
34. Lags
• You can “declare” the data to be in panel form,
with the tsset command:
tsset panelvar timevar
• For example:
tsset country year
• After using tsset, a lag can be created with:
gen lagname = L.varname
• Similarly, L2.varname gives the second lag.
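A small sketch of lags in a panel, using part of the long-form example data from the earlier slides. The variable names inc_lag1 and inc_lag2 are illustrative.

```stata
clear
input id year inc
1 2008 5000
1 2009 5500
1 2010 6000
2 2008 2000
2 2009 2200
2 2010 3300
end

* declare id as the panel variable and year as the time variable
tsset id year

* first and second lags of inc, taken within each id
generate inc_lag1 = L.inc
generate inc_lag2 = L2.inc

list, sepby(id)
```

The lags are missing wherever the earlier period is not observed for that id, so inc_lag1 is missing in 2008 and inc_lag2 is missing in 2008 and 2009.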
35. Panel estimators
• Panel data estimation:
xtreg depvar indepvars [, re fe i(panelvar)]
• i(panelvar) specifies the variable corresponding
to an independent unit (e.g. country). This can be
omitted if the data have been tsset.
• re and fe specify how we wish to treat the time-
invariant error term (random effects vs fixed
effects).
36. Panel estimators (cont.)
• An alternative to fe is to regress depvar on a set
of dummy variables for each panel unit.
• You should either drop one dummy or use the
noconstant option to avoid the dummy
variable trap, although Stata automatically drops
regressors when they are perfectly collinear.
• To perform a Hausman test of fixed vs random
effects, first run each estimator and save the
estimates, then use the hausman command:
37. Panel estimators (cont.)
xtreg depvar indepvars, fe
estimates store fe_name
xtreg depvar indepvars, re
estimates store re_name
hausman fe_name re_name
• You must list the fe_name before re_name in the
hausman command.
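The sequence might look as follows on a concrete dataset. This sketch assumes the Grunfeld investment data, downloaded with webuse (an internet connection is needed). The dataset and the stored names fixed_est and random_est are illustrative and do not come from the slides.

```stata
webuse grunfeld, clear
tsset company year

* fixed effects estimates, stored for later comparison
xtreg invest mvalue kstock, fe
estimates store fixed_est

* random effects estimates, stored under a second name
xtreg invest mvalue kstock, re
estimates store random_est

* Hausman test: the fixed effects estimates are listed first
hausman fixed_est random_est
```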
EXERCISE 4
38. Manipulating a panel
• Declare the data to be a panel using tsset, noting
that idcode is the panel variable and year is the
time variable.
• Generate a new variable lwage equal to the lag of
wage and confirm that this contains the correct
values by listing some data (use the break button):
list idcode year wage lwage
• Save the file as “NLS data” in a folder of your
choice.
EXERCISE 4 (cont.)
39. Manipulating a panel
• Using the same regressors from the regress
command in Exercise 2, run a fixed effects
regression for ln_wage using xtreg.
• Note that all time invariant variables are dropped.
• Store the estimates as fixed.
• Run a random effects regression and store the
estimates as random.
• Perform a Hausman test of random vs fixed effects.
Which is preferred?
EXERCISE 4 (cont.)
40. Manipulating a panel
• Drop all variables other than idcode, year and
wage using the keep command (quicker than
using drop).
• Use the reshape wide command to rearrange the data so that the first column represents each person (idcode) and the other columns contain wage for a particular year.
• Return the data to long form (change wide to
long in the command).
EXERCISE 4 (cont.)
41. Manipulating a panel
• Do not save the new dataset.
42. Writing loops
• The foreach command allows one to repeat a sequence of commands over a set of variables:
foreach name of varlist varlist {
    Stata commands referring to `name'
}
• Stata sequentially sets name equal to each element in varlist and executes the commands enclosed in braces.
• name should be enclosed within the characters ` (left single quote, or backtick) and ' (plain apostrophe) when referred to within the braces.
43. Writing loops (cont.)
• name can be any word and is an example of a “local macro”.
• For example:
foreach var of varlist age educ inc {
    gen l`var'=log(`var')
    drop `var'
}
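A runnable variant of this kind of loop, assuming the built-in auto dataset (the variables price, mpg and weight are illustrative). Note that the character typed after the macro name is the plain apostrophe ('), and that the opening brace must be on the same line as foreach.

```stata
sysuse auto, clear

foreach var of varlist price mpg weight {
    * create lprice, lmpg and lweight in turn
    generate l`var' = log(`var')
    label variable l`var' "Log of `var'"
}

summarize lprice lmpg lweight
```

Unlike the example on the slide, this version keeps the original variables so that the logged values can be checked against them.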
EXERCISE 5
44. Using loops in regression
• Open “NLS data” and rerun the fixed effects
regression from Exercise 4.
• Use foreach with varlist to loop over all the
regressors and report their t-statistics (using
test).
• Use foreach with varlist to create a loop
that renames each variable by adding “68” to the
end of the existing name.
45. Graphs
• To obtain a basic histogram of varname, type:
histogram varname, discrete freq
• To display a scatterplot of two (or more) variables,
type:
scatter varlist [[weight]]
• weight determines the diameter of the markers
used in the scatterplot.
46. Graphs (cont.)
• There are options for (among other things):
– Adding a title (title)
– Altering the scale of the axes (xscale,
yscale)
– Specifying what axis labels to use (xlabel,
ylabel)
– Changing the markers used (msymbol)
– Changing the connecting lines (connect)
47. Graphs (cont.)
• Particularly useful is mlabel(varname), which uses the values of varname to label the markers in the scatterplot.
• Example:
scatter gdp unemplrate, mlabel(country)
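A few of the graph commands and options above, sketched on the built-in auto dataset (an assumption; the slides do not name a dataset for graphs). The title text and the choice of msymbol(Oh), a hollow circle, are illustrative.

```stata
sysuse auto, clear

* histogram of a discrete variable, with frequencies on the vertical axis
histogram rep78, discrete frequency

* scatterplot with a title and each marker labelled by the car's make
scatter price mpg, title("Price against mileage") mlabel(make)

* weighted scatterplot: marker size reflects each car's weight
scatter price mpg [aweight=weight], msymbol(Oh)
```

Each command replaces the contents of the Graph window, so only the last graph remains on screen after the block has run.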
48. Graphs (cont.)
• Graphs are not saved in log files (they appear in separate windows).
• To save a graph, select File → Save Graph.
• To insert a graph in a Word document etc., select Edit → Copy and then paste into the document. The pasted graph can be resized but is not interactive (unlike Excel charts etc.).
