
Universitat Pompeu Fabra
Kurt Schmidheiny
Short Guides to Microeconometrics
October 2008

A Short Guide to Stata 10 for Windows∗

1 Introduction
2 The Stata Environment
3 Additions to Stata
4 Where to get help
5 Opening and Saving Data
6 Importing Data from Excel
7 Data Manipulation
8 Descriptive Statistics
9 Graphs
10 OLS Regression
11 Log Files
12 Do-Files
13 Important Functions and Operators

∗This manual draws on material from various sources: the Stata handbooks, Ben
Jann (ETH Zurich), Data and Statistical Services (Princeton University).

Version: 31-10-2008, 18:41

1 Introduction

This guide introduces the basic commands of Stata. More commands (on
panel data, limited dependent variables, Monte Carlo experiments, etc.)
are described in the respective handouts.
Stata commands are set in Courier. Options in [brackets] are optional.

2 The Stata Environment

When you start Stata for Windows you will see the following windows:
the Command window where you type in your Stata commands, the Results
window where Stata results are displayed, the Review window where
past Stata commands are displayed and the Variables window which lists
all the variables in the active datafile.
The data in the active datafile can be browsed (read-only) in the Browser
window, which is activated from the menu Data/Data browser or by

    browse varlist

where varlist (e.g. income age) is a list of variables to be displayed.
The Editor window allows you to edit data either by typing directly into
the editor window or by copying and pasting from spreadsheet software:

    edit varlist

Since version 8, Stata implements every command (except the programming
commands) as a dialog that can be accessed from the menus. This makes
commands you are using for the first time easier to learn, as the proper
syntax for the operation is displayed in the Review window.
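A minimal sketch of the browse command from Section 2, to be typed in the Command window of an interactive session. It assumes the example dataset auto.dta that ships with Stata; the variables make, price, mpg and foreign belong to that dataset and are not part of this guide.

```stata
* Sketch only: auto.dta is an example dataset shipped with Stata
sysuse auto, clear

* open the read-only Browser window for three variables
browse make price mpg

* browse also accepts an if condition to restrict the observations shown
browse make price if foreign == 1
```

Replacing browse by edit opens the same view in the Editor window, where values can be changed.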


3 Additions to Stata

Many researchers provide their own Stata programs on Stata’s webpage.

    net search keyword

searches the Internet for user-written additions to Stata that contain
the specified keyword, including user-written additions published in the
Stata Journal (SJ) and the old Stata Technical Bulletin (STB).

4 Where to get help

The Stata User’s Guide is an introduction to the capabilities and basic
concepts of Stata. The Stata Base Reference Manual provides systematic
information about all Stata commands. It is also often an excellent
treatise on the implemented statistical methods.
The online help in Stata describes all Stata commands with their options.
However, it does not explain the statistical methods as the Reference
Manual does. You can start the online help by issuing the command

    help command

If you don’t know the exact expression for the command, you can search
the Stata documentation by

    search word

In both cases the result is written into the Results window. Alternatively,
you can display the result in the Viewer window by issuing the command

    view help command

or by calling the Stata online help in the menu bar: Help/Search....

5 Opening and Saving Data

Open an existing Stata datafile (extension .dta):

    use filename [, clear]

where the option clear clears the dataset already in memory.
Save a datafile in Stata format:

    save [filename]

If filename is not specified, the name under which the data was last
known is used. If filename is specified without an extension, .dta is used.
Stata will look for data, save data or save a log file in the drive and
directory specified by

    cd drive:directory

See help memory if you encounter memory problems when loading a file.

6 Importing Data from Excel

Prepare the data in Excel for conversion:

• Make sure that missing data values are coded as empty cells or as
  numeric values (e.g., 999 or -1). Do not use character values (e.g. -,
  N/A) to represent missing data.
• Make sure that there are no commas in the numbers. You can
  change this under the Format menu, then select Cells... .
• Make sure that variable names are included only in the first row of
  your spreadsheet. Variable names should be 32 characters or less,
  start with a letter and contain no special characters except ‘_’.

Under the File menu, select Save As... . Then choose Save as type Text (tab
delimited). The file will be saved with a .txt extension.
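As a rough sketch of the round trip, the lines below write a tab-delimited text file from Stata itself (standing in for the file saved from Excel), read it back with insheet and store the result in Stata format. The dataset auto.dta ships with Stata; the file names auto_tab.txt and auto_copy are made up for illustration.

```stata
* Sketch only: file names are hypothetical, auto.dta ships with Stata
sysuse auto, clear

* write a tab-delimited text file with variable names in the first row
outsheet make price mpg using auto_tab.txt, replace

* read the text file back in, replacing the data in memory
insheet using auto_tab.txt, clear
describe

* save in Stata format and reopen the saved file
save auto_copy, replace
use auto_copy, clear
```

The files are written to and read from the current working directory, which can be changed with cd.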
Start Stata. Then issue the following command:

    insheet using filename [, clear]

where filename is the name of the tab-delimited file (with extension .txt).
If you have already opened a data file in Stata you can replace the old
data file using the option clear.

7 Data Manipulation

New variables are created by

    generate newvar = expression [if expression]

where newvar is the name of the new variable and expression is a
mathematical function of existing variables. The if option applies the
command only to the data specified by a logical expression. The (system)
missing value code ‘.’ is assigned to observations that take no value.
Some examples:

    generate age2 = age^2
    generate agewomen = age if women == 1
    generate rich = 0 if wealth != .
    replace rich = 1 if wealth >= 1000000 & wealth != .

or, as an alternative to the last two lines,

    generate rich = wealth >= 1000000 if wealth != .

Note that Stata treats the missing value ‘.’ as larger than any number, so
the condition wealth != . is needed to keep observations with missing
wealth from being coded as rich.
Existing variables can be changed by

    replace oldvar = expression [if expression]

The command egen extends the functionality of generate. For example

    egen average = mean(income)

creates a new variable containing the (constant) mean income for all
observations. See the last section for some available functions.
Both the generate and the egen commands allow the by varlist: prefix,
which repeats the command for each group of observations for which the
values of the variables in varlist are the same. For example,

    sort nationality
    by nationality: egen referenceinc = mean(income)

generates the new variable referenceinc containing for each observation
the mean income of all observations of the same nationality. Note that
the data has to be sorted by nationality beforehand.
The recode command is a convenient way to exchange the values of
ordinal variables:

    recode var (rule1) [(rule2)]

e.g. recode gender (1=0) (2=1) will produce a dummy variable.
The following system variables (note the ‘_’) may be useful:

    _n     contains the number of the current observation.
    _N     contains the total number of observations in the dataset.
    _pi    contains the value of pi to machine precision.

A lagged variable can be created in the following way: First define a time
series index. Second declare the data a time series. For example this can
be done with the commands

    generate t = _n    /* generate a variable with values 1...N */
    tsset t            /* declare the time series */

Lagged values can now be designated as L.varname. For example L.gdp
designates a lagged value of the variable gdp, L2.invest designates the
variable invest lagged twice.
You can delete variables from the dataset by specifying either the
variables to be dropped or the variables to be kept:

    drop varlist
    keep varlist

You can delete observations from the dataset by specifying the
observations to be dropped (or kept) either by a logical expression or by
specifying the first and last observation:

    drop [if expression] [in first/last]
    keep [if expression] [in first/last]

Arrange the observations of the current dataset in ascending order with
respect to varlist:

    sort varlist

Change the order of the variables in the current dataset:

    order varlist

by specifying a list of variables to be moved to the front of the dataset.
You can convert the data into a dataset of the means (or other statistics,
see help) of varlist. varname specifies the groups over which the means
are calculated.

    collapse varlist, by(varname)

A description of the variables in the dataset is produced by describe
and codebook [varlist].

8 Descriptive Statistics

Display univariate summary statistics of the variables in varlist:

    summarize varlist

Report the frequency counts of varname:

    tabulate varname [if expression] [, missing]

The missing option requests that missing values are reported.
Display the correlation or covariance matrix for varlist:

    correlate varlist [, covariance]

where the option covariance requests covariances instead of correlations.
Produce a two-way table of absolute and relative frequency counts along
with Pearson’s chi-square statistic:

    tabulate var1 var2, col chi2

Perform a two-sample t test of the hypothesis that varname has the same
mean within the two groups defined by the dummy variable groupvar:

    ttest varname [if exp], by(groupvar) [unequal]

where the option unequal indicates that the two-sample data are not to
be assumed to have equal variances.

9 Graphs

Draw a scatter plot of the variables yvar1 yvar2 ... (y-axis) against xvar
(x-axis):

    scatter yvar1 yvar2 ... xvar

Draw a line graph, i.e. a scatter plot with connected points:

    line yvar1 yvar2 ... xvar

Draw a histogram of the variable var:

    histogram var

Draw a scatter plot with regression line:

    scatter yvar xvar || lfit yvar xvar

10 OLS Regression

To regress a dependent variable depvar on a constant and one or more
independent variables in varlist use

    regress depvar [varlist] [if exp] [, level(#) noconstant]
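The regress syntax above can be tried out with a short sketch. It assumes the example dataset auto.dta that ships with Stata; price, mpg, weight and foreign are variables of that dataset, not of this guide.

```stata
* Sketch only: auto.dta is an example dataset shipped with Stata
sysuse auto, clear

* price on a constant, mpg and weight; domestic cars only, 99% intervals
regress price mpg weight if foreign == 0, level(99)

* same regressors on the full sample, without the constant term
regress price mpg weight, noconstant
```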


The if option limits the estimation to a subsample specified by the
logical expression exp. The noconstant option suppresses the constant
term. level(#) specifies the confidence level, in percent, for confidence
intervals of the coefficients. See help regress for more options.
You can access the estimated parameters and their standard errors from
the most recently estimated model:

    _coef[varname]    contains the value of the coefficient on varname
    _se[varname]      contains the standard error of the coefficient

Stata calculates predictions from the previously estimated regression by

    predict newvarname [, stdp]

The stdp option provides the standard error of the prediction.
[post-estimation commands: predict, cve, ...]

11 Log Files

A log file keeps a record of the commands you have issued and their
results during your Stata session. You can create a log file with

    log using filename [, append replace text]

where filename is any name you wish to give the file. The append option
simply adds more information to an existing file, whereas the replace
option erases anything that was already in the file. Full logs are recorded
in one of two formats: SMCL (Stata Markup and Control Language) or
text (meaning ASCII). The default is SMCL, but the option text can
change that.
A command log contains only your commands:

    cmdlog using filename

Both types of log files can be viewed in the Viewer:

    view filename

You can temporarily suspend, resume or stop the logging with the
commands:

    log { on | off | close }
    cmdlog { on | off | close }

12 Do-Files

A “do”-file is a set of commands just as you would type them in
one-by-one during a regular Stata session. Any command you use in Stata
can be part of a do-file. The default extension of do-files is .do, which
explains its name. Do-files allow you to run a long series of commands
several times with minor or no changes. Furthermore, do-files keep a
record of the commands you used to produce your results.
To edit a do-file, just click on the icon (like an envelope) in the toolbar.
To run this file, save it in the do-file editor and issue the command:

    do mydofile

You can also click on the Do current file icon in the do-file editor to run
the do-file you are currently editing.
Comments are indicated by a * at the beginning of a line. Alternatively,
what appears inside /* */ is ignored. The /* and */ comment delimiter
has the advantage that it may be used in the middle of a line.

    * this is a comment
    generate x = 2*y /* this is another comment*/ + 5

Hitting the return key tells Stata to execute the command. In a do-file,
the return key is at the end of every line, and restricts commands to
be on the same line with a maximum of 255 characters. In many cases,
(long) commands are more clearly arranged on multiple lines. You can
tell Stata that the command is longer than one line by using the

    #delimit ;

command in the beginning of your do-file. The following Stata commands
are now terminated by a ‘;’. An example do-file:

    capture log using mincer, replace
    #delimit ;
    use schooling.dta, clear ;
    * generate a proxy for experience ;
    generate exp = age - educ - 6 ;
    * generate its square, which is used in the regression below ;
    generate exp2 = exp^2 ;
    * estimate the Mincer equation ;
    regress
        lnwage educ exp exp2 female
        /* change the significance level to 0.01 */
        , level(99) ;
    log close ;

⇒ Note that lines with comments also need to be terminated by ‘;’.
Otherwise the following command will not be executed.

13 Important Functions and Operators

Some Mathematical Expressions

    abs(x)            returns the absolute value of x.
    exp(x)            returns the exponential function of x.
    int(x)            returns the integer by truncating x towards zero.
    ln(x), log(x)     returns the natural logarithm of x if x>0.
    log10(x)          returns the log base 10 of x if x>0.
    max(x1,...,xn)    returns the maximum of x1, ..., xn.
    min(x1,...,xn)    returns the minimum of x1, ..., xn.
    round(x)          returns x rounded to the nearest whole number.
    round(x,y)        returns x rounded to units of y.
    sign(x)           returns -1 if x<0, 0 if x==0, 1 if x>0.
    sqrt(x)           returns the square root of x if x>=0.

Logical and Relational Operators

    &     and                   |     or
    !     not                   ~     not
    >     greater than          <     less than
    >=    greater or equal      <=    smaller or equal
    ==    equal                 !=    not equal

Some Probability Distributions and Density Functions

    norm(z)           cumulative standard normal distribution
    normden(z)        returns the standard normal density
    normden(z,m,s)    normal density with mean m and stand. deviation s
    invnorm(p)        inverse cumulative standard normal distribution

Similar commands are available for a variety of distribution functions.
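The mathematical functions and operators of this section can be explored with a short sketch like the following. It assumes the example dataset auto.dta that ships with Stata; the names of the new variables are made up for illustration.

```stata
* Sketch only: auto.dta ships with Stata; new variable names are made up
sysuse auto, clear

* mathematical functions evaluated on constants
display abs(-3.5)
display int(-3.7)
display round(3.14159, 0.01)
display sqrt(16)

* functions applied to variables
generate lnprice = ln(price)
generate mpgfloor = max(mpg, 25)

* logical and relational operators yield 1 (true) or 0 (false)
generate cheap_or_thrifty = (price < 4000 | mpg > 30)
generate dear_domestic = (price >= 6000 & foreign == 0)

* rep78 has missing values; count the observations where it is observed
count if rep78 != .
```

The display command evaluates an expression once and prints the result, which is a convenient way to check what a function returns before using it in generate.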



Some Functions in egen

diff(varlist)
    creates an indicator variable equal to 1 where the variables in varlist
    are not equal, and 0 otherwise.
fill(numlist)
    creates a variable of ascending or descending numbers or complex
    repeating patterns. See help numlist for the numlist notation.
max(varname) (allows by varlist:)
    creates a constant containing the maximum value of varname.
mean(varname) (allows by varlist:)
    creates a constant containing the mean of varname.
median(varname) (allows by varlist:)
    creates a constant containing the median of varname.
min(varname) (allows by varlist:)
    creates a constant containing the minimum value of varname.
rmax(varlist)
    gives the maximum value in varlist for each observation (row). Equals
    max(var1, var2, ...) in the generate command.
rmean(varlist)
    gives the mean of the variables in varlist for each observation (row).
    The generate command has no corresponding mean() function.
rmin(varlist)
    gives the minimum value in varlist for each observation (row). Equals
    min(var1, var2, ...) in the generate command.
sd(varname) (allows by varlist:)
    creates a constant containing the standard deviation of varname.
sum(varname) (allows by varlist:)
    creates a constant containing the sum of varname.
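A small sketch of how some of these egen functions combine with the by varlist: prefix. It assumes the example dataset auto.dta that ships with Stata; the names of the new variables are made up for illustration.

```stata
* Sketch only: auto.dta ships with Stata; new variable names are made up
sysuse auto, clear

* the by prefix requires the data to be sorted by the grouping variable
sort foreign

* group-wise constants: one value per group, repeated for each observation
by foreign: egen meanprice = mean(price)
by foreign: egen maxprice = max(price)
by foreign: egen sdprice = sd(price)

* deviation of each car's price from the mean of its group
generate pricedev = price - meanprice

* inspect the group-wise constants
by foreign: summarize meanprice maxprice sdprice
```

Within each group the three egen variables are constant, so summarize reports a standard deviation of zero for them.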
