Basic Stata Programming
RESEP
Research on Socio-Economic Policy
Disclaimer
These slides are meant to serve as a basic introduction to Stata programming and
offer an overview of macros, loops, and user-defined programs.
All notes, examples, and applications are my own work, but in places draw heavily on
a number of excellent sources, most of which are listed at the end of this
presentation. Users are strongly encouraged to consult these resources for a more
complete treatment of the concepts introduced and discussed here.
Few users ever create routines/programs intended for sharing with other Stata
users
However, the vast majority of Stata users will benefit massively from employing
macros, loops, and programs in their own do-files
. More often than not, these commands, functions, and features bring significant gains
in terms of power, flexibility, and efficiency
. Learning how to use them well requires some exposure to their syntax and uses,
practice, and a healthy dose of patience
. But the advantages of using them far outweigh the initial investment required to
learn how to use them
Objective
An introduction to basic Stata programming
Data
NIDS2008_sample.dta
1 NIDS website
Basic Stata Programming (24-26/11/2014) [email protected] 6 / 111
Introduction Basic programming Developing a program Examples & Applications Learning Stata Programming References
Outline
Structure of the presentation
Section 1: Introduction
Section 2: Basic programming
. Introduces macros, macro functions, loops, branching, and user-defined programs
Section 3: Developing a program
. Provides a real world example of how a program may be developed, starting from
the reason why the program is needed, to its conceptual and practical evolution
from basic code to a self-contained ado-file
Section 4: Examples & Applications
. Provides a set of further examples and applications for macros, loops, and programs
Stata Macros I
What are they?
Their names and contents are largely arbitrary (STS)2
Stata Macros II
Two types: locals and globals
locals and globals differ in terms of their scope and how they are called
once defined
A local has a ``local'' scope
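. it is only valid within the do-file or program within which it was defined (a loop does not create a
scope of its own, so a local defined inside a loop remains defined after the loop ends)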
A global has a ``global'' scope
. within a particular instance of Stata, it is always valid, regardless of where it was
defined
Defining and calling a local:
local x = 2
display 2 + `x'
>>> 4
Defining and calling a global:
global x = 2
display 2 + $x
>>> 4
Rule of thumb (ROT): Use locals whenever globals can be avoided
Task: calculate 1 + 1 by creating a local with an initial value of 1, incrementing the local by 1, and then
displaying the answer in/as a string.
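One possible solution to the task above is shown below as a minimal sketch. The local name n is arbitrary.

```stata
* Task: calculate 1 + 1 with a local and display the answer as a string
local n = 1                       // define local n with initial value 1
local n = `n' + 1                 // `n' is replaced by 1 before the expression is evaluated
display "1 + 1 = `n'"             // dereference n inside a string
```

The increment line can also be written as local ++n, which is the shorthand used in the encoding loops later on.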
Three special local cases that are particularly useful when programming
. tempvar defines local(s) that may be used to create and refer to temporary
variables in the data
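. tempname defines local(s) that may be used to create and refer to temporary
scalars or matrices in memory
. tempfile defines local(s) that may be used to create and refer to temporary files
tempvar and tempname help keep programs clean: the names they generate cannot clash with existing
variables, scalars, or matrices, and the objects are dropped automatically when the do-file or program ends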
tempfile is particularly useful when calculations require 'destructive' data
manipulations
. e.g. when collapsing data or creating sub-samples
More on this later...
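The sketch below uses all three special locals in one do-file. Stata's bundled auto dataset stands in for the NIDS sample, and all local names are illustrative. The tempfile holds the result of a destructive collapse, which is then merged back onto the restored data.

```stata
sysuse auto, clear                          // stand-in for NIDS2008_sample.dta

tempvar lnprice                             // `lnprice' holds a unique, unused variable name
generate `lnprice' = ln(price)              // temporary variable

tempname meanln                             // `meanln' holds a unique scalar name
quietly summarize `lnprice'
scalar `meanln' = r(mean)                   // temporary scalar
display "Mean of ln(price): " `meanln'

tempfile bygroup                            // `bygroup' holds a temporary file path
preserve
collapse (mean) meanprice = price, by(foreign)   // destructive: replaces the data in memory
save "`bygroup'"
restore                                     // the original 74 observations are back
merge m:1 foreign using "`bygroup'", nogenerate  // attach group means from the temporary file
```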
Stata Macros IV
What are they for?
Task: estimate (OLS) association between education and ln(earnings), controlling for age, age², race, the
interaction between being female and being married, household size, and province
Task: summarize monthly earnings, tabulate education, and graph the distribution of
ln(hhincome) for females between the ages of 15 and 64 who are either married or living with a
partner
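Both tasks boil down to typing a long specification once, storing it in a local, and reusing it. The sketch below assumes the bundled auto dataset as a stand-in, so the variable names differ from the tasks above. The squared term uses factor-variable notation (c.length##c.length adds both length and length²).

```stata
sysuse auto, clear

* A set of controls stored once and reused (hypothetical control set)
local controls "weight c.length##c.length i.rep78"
regress price mpg `controls'

* A sample restriction stored once and reused across several commands
local sample "if foreign == 1 & inrange(mpg, 20, 30)"
summarize price `sample'
tabulate rep78 `sample'
```

Changing the controls or the sample now means editing a single line rather than every command.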
Stata's macro extended functions are incredibly useful and powerful tools for,
among other things, accessing OS parameters, data attributes, and estimation
results
They can be used to...
. Access data/variable attributes
storage type, display format, variable label, value label name, label(s) associated with
numeric value(s), etc.3
. Access OS parameters
current/specified directory and/or sub-directories, specific types of files in
current/specified directory and/or subdirectories, etc.
. Access stored estimation results
anything stored in e(), r(), or s()
. Etc., etc., etc.
3 Stata's describe command, for example, is based on these functions
4 See StataCorp (2013b, pp. 261 - 276)
Task: access the storage type, variable label, value label name, and the value label corresponding to a
numeric value of 1 for the variable 'pcode' and then display these attributes
Task: Get the third element in the list '2 guys 1 girl 1 pizza place'
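A sketch of both tasks using macro extended functions follows. Since 'pcode' lives in the NIDS data, the auto dataset's 'foreign' variable stands in for it here.

```stata
sysuse auto, clear

local type   : type foreign               // storage type
local varlab : variable label foreign     // variable label
local vallab : value label foreign        // name of the attached value label
local lab1   : label (foreign) 1          // value label for the numeric value 1
display "`type' | `varlab' | `vallab' | `lab1'"

* Third element of a list: words are separated by spaces
local third : word 3 of 2 guys 1 girl 1 pizza place
display "`third'"
```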
Both Stata macros and scalars can contain numeric and non-numeric information,
but there are differences
Macros:
. Limited to 1mill+ characters
. Numeric information is converted (potential loss of accuracy)
. Must be dereferenced (referred to using `' syntax)
. e.g.:
local root = sqrt(2.15)
di `root'/4
>>> .36657196
Scalars:
. Limited to 244 characters
. No conversion of numeric information (full numeric precision)
. Can be referred to by name
. e.g.:
scalar root = sqrt(2.15)
di root/4
>>> .36657196
Use scalars for intermediate calculations if precision is critical
Use macros everywhere else
To view all of the macros currently defined in Stata's memory (including the
current F-key mapping), simply issue the command macro dir
The preserve command preserves a copy of the current state of the data in
Stata's memory, ensuring that data can be restored after do-file/program
termination or after the restore command is issued
This is particularly useful when executing experimental and potentially
destructive procedures which one may wish to undo at a later stage.
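A minimal sketch of undoing a destructive step follows. It assumes the bundled auto dataset (74 cars, 22 of them foreign) as a stand-in.

```stata
sysuse auto, clear
preserve                     // keep a copy of the current data
keep if foreign == 1         // destructive: drops the 52 domestic cars
count                        // 22 observations remain
restore                      // bring back the preserved copy
count                        // 74 observations again
```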
Task: assign the preserve and restore commands to the F5 and F6 keys, respectively
* 1. Assign 'preserve' to F5
global F5 "preserve;"

* 2. Assign 'restore' to F6
global F6 "restore;"
Ending the definition of a shortcut macro with a semicolon ensures that the
command will be executed once the corresponding shortcut key is hit
. Without the semicolon, hitting the shortcut key will only make the command appear
in Stata's Command window.
Every time Stata is invoked, it searches for a file called profile.do and, if found, it
executes all of the commands contained therein
This means that it is possible to make your preferred F-key shortcut mappings
permanent by placing their macro definitions in the profile.do file
To enable Stata to find this file, it is recommended that it be located in
C:\ado\personal, if ``C:\'' is the root directory where Stata is installed
The commands that may appear in profile.do are by no means limited only to
macro definitions
. E.g. you could specify your favourite working directory in profile.do or
. include a welcome message or
. start a log file
. etc.
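A minimal profile.do might look like the sketch below. The working directory and log path are hypothetical and are therefore left commented out. Adjust them before uncommenting.

```stata
* profile.do: commands executed every time Stata starts (sketch)
global F5 "preserve;"                       // F5 runs preserve
global F6 "restore;"                        // F6 runs restore
* cd "C:\my_projects\nids"                  // hypothetical favourite working directory
* log using "C:\my_projects\session.log", text append   // hypothetical log file
display as text "Welcome back! profile.do has been executed."
```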
Loops
What macros were born to do
Though useful in and of themselves, macros are most useful in the context of
loops
Loops allow you to ``loop through'' or repeat blocks of code based on specified
criteria
Three types of loops in Stata
. forvalues
loop over consecutive/fixed interval values
. foreach
loop over elements of a list (values, variables, macros, or names)
. while
loop while specified expression is evaluated as true
Each type has slightly different syntax
Which one you should use depends on what you want to accomplish
. But it is often possible to accomplish something using any of the three
Say you want to print/display the numbers [0; 10] in intervals of 2 below one
another in the results window
You could run the following code:
di 0 // display value 0
di 2 // display value 2
di 4 // display value 4
di 6 // display value 6
di 8 // display value 8
di 10 // display value 10
* 1. forvalues loop
forvalues i = 0(2)10 {                 // for the range [0,10] in steps of 2
    di `i'                             // display value
}                                      // repeat/end loop

* 2.1 foreach loop with specified numlist
foreach i of numlist 0 2 4 6 8 10 {    // for each of the values specified
    di `i'                             // display value
}                                      // repeat/end loop

* 2.2 foreach loop with anything specified
foreach i in 0 2 4 6 8 10 {            // for each of the things specified
    di `i'                             // display value
}                                      // repeat/end loop

* 3. while loop with initial value and increment
local i = 0                            // define initial value for local i
while `i' <= 10 {                      // while value of local i <= 10
    di `i'                             // display value
    local i = `i' + 2                  // increment local i by 2
}                                      // repeat/end loop
In addition to iterating over values and numlists, it is also possible to loop over
variables
. When working with data there are many instances where we wish to repeat more
or less the same procedure several times for different variables
. the foreach loop is the main workhorse when it comes to looping over variables
rather than values
Say, for example, one wanted to
1. cross tabulate educ against several other categorical/discrete variables or
2. run a series of regressions, incrementally adding regressors to see how the results
change or
3. create a series of graphs showing the distributions of the log of household income
from different sources?
None of these tasks are particularly difficult, but they can be tedious if done
manually
. The solution is to loop over variables
(10) This is the standard syntax that must be used when foreach is invoked to loop over variables
. The choice of the term ``var'' here is completely arbitrary and simply sets up the alias (i.e. `var') that may
be used within the loop to refer to the respective variables in the list
. ``of varlist'' is compulsory syntax needed to ensure that foreach loops over the varlist specified thereafter
(11) If one notes that the local `var' is simply an alias for each of the respective variables specified in the varlist,
then it should be obvious that this line simply evaluates to the command line on line 2 for the first iteration, the
command line on line 3 for the second iteration, and so on until it evaluates to the command line on line 7 for the
final iteration of the loop
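A loop of the kind annotated above is sketched below. It uses the auto dataset as a stand-in, with 'foreign' playing the role of 'educ' and rep78 and headroom as the other categorical variables.

```stata
sysuse auto, clear
foreach var of varlist rep78 headroom {    // "of varlist": loop over existing variables
    tabulate foreign `var'                 // 1st iteration: tabulate foreign rep78
}
```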
(10) The syntax here is different from that used in the previous example
. We have included factor variable operators when specifying i.race, i.marital, and i.area. These variables do
not exist as specified in the data, so Stata will not be able to find them if told to search only through the
existing variables
. what we have actually specified is a list of strings, rather than a list of variables (it does not matter that they
refer to variables once specified in the context of the regress command)
. ``in'' is the syntax needed to make foreach loop over an arbitrary list of strings, as opposed to the specific
list types (varlist, newlist, numlist, local, or global) that may follow ``of''
(11) The self-referential use of the local `newlist' in the definition of newlist is a neat trick that allows one
to incrementally build a list over loop iterations. To explain, consider the following:
. Iteration 1: `newlist' is empty such that the line evaluates to local newlist `var' which in turn
evaluates to local newlist educ
. Iteration 2: `newlist' is already an alias for educ. The line thus evaluates to local newlist educ
`var' which in turn evaluates to local newlist educ age
. Iteration 3: `newlist' is already an alias for educ age. The line thus evaluates to local newlist
educ age `var' which in turn evaluates to local newlist educ age female
. This process repeats itself and through every iteration, `newlist' gains another variable
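The incremental-list trick is sketched below on the auto dataset, a stand-in for the NIDS variables. The loop uses "in" because i.foreign is a factor-variable expression rather than an existing variable.

```stata
sysuse auto, clear
local newlist                                  // start with an empty list
foreach var in mpg weight i.foreign {
    local newlist `newlist' `var'              // self-referential append
    quietly regress price `newlist'            // one more regressor each iteration
    display as text "`newlist': R-squared = " as result %5.3f e(r2)
}
```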
(15) As in the previous example, we are looping over strings rather than variables. The syntax used is thus the
standard foreach syntax for looping over anything
. This was done purely for the sake of convenience. All of the variable names have a common prefix. It is
therefore not necessary to write this prefix out every time
(16) This line designates logvar as an alias that may be used to refer to a temporary variable which may or may not
be created during the loop
(17) This line generates the temporary variable logvar based on the alias `logvar' and sets it equal to the natural
logarithm of the respective household income variables
. Temporary variables exist only within the program or do-file within which they are created and are dropped
automatically when it terminates (a loop does not have a scope of its own). The temporary variable logvar therefore
does not outlive the do-file or program containing the loop. This is useful if we don't
want the logged versions of the household income variables for any purposes other than generating the
kdensity graphs (which we assume is the case here)
(18) For the first iteration of the loop, this line evaluates to kdensity logvar,
name(hhinc_labour,replace)
(12)-(19) All of these lines employ the macro extended function `:variable label varname' to
dereference the existing variable label for the variable in question. This is included within quotation marks as a suffix
after the phrase HH Owns a when assigning the new variable labels to the variables.
(22) This is the standard syntax that must be used when foreach is invoked to loop over variables
. The choice of the term ``var'' here is completely arbitrary and simply sets up the alias (i.e. `var') that may
be used within the loop to refer to the respective variables in the list
. ``of varlist'' is compulsory syntax needed to ensure that foreach loops over the varlist specified thereafter
(23) If one notes that the local `var' is simply an alias for each of the respective variables specified in the varlist,
then it should be obvious that this line simply evaluates to the command line on line 12 for the first iteration, the
command line on line 13 for the second iteration, and so on until it evaluates to the command line on line 19 for the
final iteration of the loop
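A sketch of the label-prefixing loop follows. It uses the auto dataset as a stand-in, with the prefix "Car: " playing the role of "HH Owns a".

```stata
sysuse auto, clear
foreach var of varlist price mpg {
    * the extended function returns the current label, which becomes a suffix
    label variable `var' "Car: `: variable label `var''"
}
describe price mpg
```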
Stata's loop commands allow loops to be combined and nested within one
another
. One can combine/nest any number of forvalues, foreach, and/or while
loops that loop over values or variables
This exponentially increases the usefulness of loops, but also tends to make
things more complicated and difficult to stay on top of
For our final example, we will use two nested foreach loops to determine the
proportion of observations in the data that are jointly non-missing for different pairs
of variables
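A basic version of this nested loop is sketched below. The auto dataset is a stand-in, and rep78 is the variable with missing values. Note that every off-diagonal pair is evaluated twice, once in each order.

```stata
sysuse auto, clear
foreach var1 of varlist price mpg rep78 {          // outer loop
    foreach var2 of varlist price mpg rep78 {      // inner loop
        quietly count if !missing(`var1', `var2')  // jointly non-missing observations
        display as text "`var1' & `var2': " as result %5.3f r(N)/_N
    }
}
```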
Basic branching
Executing parts of code conditionally
A further way in which loops and programs can be made more powerful,
general, and robust is through the use of branching
Branching allows one to conditionally execute blocks of code, depending on
whether or not a certain condition is true
. that is, it allows us to specify what should happen if a certain condition is true,
and what should happen if that condition is false and/or another condition applies
Code can be ``branched'' by using the if, else, and, in some instances, the
else if commands
Suppose, for the purposes of illustration, we wanted to summarize all int
variables, run ds on all str variables, and do nothing with the other variables in the
example data
Branching makes this easy to achieve
(2) When used to specify a varlist in Stata, an asterisk serves as a shorthand for ``all variables in the dataset''
(3)-(5) This block of code will only be executed if the condition in line 3 is evaluated as true. In other words, only
if the variable type of the variable in question is int (short for integer)
(6)-(8) The else if command used to initialise this block of code means that its contents will only be
executed if
. the statement in line 3 evaluates to false AND the statement in line 6 evaluates to true
(9)-(10) The else command used to initialise this block of code means that its contents (which are empty) will
only be executed if
. the statement in line 3 evaluates to false AND the statement in line 6 evaluates to false (i.e. the variable in
question is neither a string nor an integer variable)
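A sketch of the branching loop described above follows, using the auto dataset as a stand-in. The two counters are additions for illustration only.

```stata
sysuse auto, clear
local n_int = 0
local n_str = 0
foreach var of varlist * {                      // * = all variables
    local vtype : type `var'                    // e.g. int, float, str18
    if "`vtype'" == "int" {
        summarize `var'
        local ++n_int
    }
    else if substr("`vtype'", 1, 3) == "str" {
        ds `var'
        local ++n_str
    }
    else {
        // neither int nor string: do nothing
    }
}
display "int variables: `n_int'; string variables: `n_str'"
```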
There are only three substantive new lines to the code: (5) ; (6) ; (11)
. (13)-(14) These lines are technically also new, but the loop would function precisely the same if they
were deleted from the code
(5) This stores the variable pairs specified by the outer and inner foreach loops in reverse order in the local
pair
(6)-(12) This block of code will only be executed if the condition in line 6 is evaluated as true
. The string function used here searches for the position in the string "`pairs'" at which "`pair'"
is found. If it is not found, the expression evaluates to zero
(11) This uses the same self-referential ``trick'' to incrementally concatenate a string containing all of the
variable pairs for which lines (7)-(10) have been executed
. the rationale here is that, if the string "`pair'" is found within "`pairs'" such that
strpos("`pairs'","`pair'") != 0, then it must be the case that the joint-non-missingness for
that pair of variables has already been determined and the current iteration of the inner loop should
therefore be skipped
. the evaluated values of `pair' and strpos("`pairs'","`pair'") for each of the iterations of
the outer and inner loops are presented on the next frame
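A sketch of the strpos() skip trick follows, using the auto dataset as a stand-in. One deviation is labelled here as an assumption: each stored pair is wrapped in "|" delimiters. This guards against a reversed pair accidentally matching across two adjacent entries of `pairs'. The n_done counter exists only to show that 6 of the 9 combinations are evaluated.

```stata
sysuse auto, clear
local vars price mpg rep78
local pairs                                    // pairs already processed, each as |var1 var2|
local n_done = 0
foreach var1 of varlist `vars' {
    foreach var2 of varlist `vars' {
        local pair "|`var2' `var1'|"           // current pair in reverse order
        if strpos("`pairs'", "`pair'") == 0 {  // reverse pair not processed yet
            quietly count if !missing(`var1', `var2')
            display as text "`var1' & `var2': " as result %5.3f r(N)/_N
            local pairs "`pairs'|`var1' `var2'|"   // record this pair
            local ++n_done
        }
    }
}
display "pairs evaluated: `n_done'"
```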
User-defined programs
From loops to self-contained commands
In much the same way that loops extend the functionality of basic Stata code,
programs (can) extend the functionality of loops and macros
User-defined programs offer additional layers of power, flexibility, and efficiency,
at the potential cost of greater complexity
User-defined commands (hereafter programs) function in precisely the same
way as Stata's commands
. Many of Stata's own commands are in fact ado-file programs, written in the same
programming language that is available to users
Programs
. are called via a unique command name,
. may or may not accept/require compulsory and/or optional arguments and
. once issued, execute one or more 'procedures' in Stata
What these procedures are and precisely how they are executed depends on
how the program, and its syntax, is defined
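A minimal program is sketched below. The command name greet is hypothetical. It takes one compulsory argument, executes a single procedure (displaying a message), and returns that message in r().

```stata
capture program drop greet              // allow re-running the definition
program define greet, rclass
    syntax anything(name=who)           // one compulsory argument
    display as text "Hello, `who'!"
    return local greeting "Hello, `who'!"
end

greet world
```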
There are a number of reasons why Stata users develop programs, but they
mostly relate to actual or perceived deficiencies in the available repertoire of
Stata commands
Some of the major reasons why you may want to develop a program are
. There is no existing Stata command that can do what you need it to do
. You are not aware of any existing Stata command(s) that can do what you need them to
do (more likely)
. Stata's command(s) for doing what you want are needlessly slow and you believe
you can program something more efficient
. There are Stata commands that can do 99% of what you want/need, but you want
that last 1%
. You want to have cleaner do-files, type fewer lines of code, and speed up your
coding
There are a number of ways in which one can define Stata programs
These differ in terms of
. The scope, flexibility, and robustness of the programs they can be used to define
. How easy their definition syntax is (how easy it will be for you to define the
program)
. How general their specification syntax is (how easy it will be for another Stata user
to call the program)
Learning to create your own Stata programs can be daunting at first and it is
tempting to avoid complicated syntax insofar as is possible
However, there are good reasons why you should try to learn how to adhere to
standard programming practice and standard Stata syntax
The initial cleaning of a new dataset in Stata tends to involve quite a lot of
variable renaming, variable labelling, and variable recoding/replacement
Stata has perfectly adequate commands for doing any one of these tasks, but in
some instances it would be nice if one could rename, assign a variable label, and
set to missing invalid values on a variable in one step
This is precisely what the revalve command in the example below does
Note that revalve does nothing novel
. it simply serves as a wrapper for existing Stata commands in an attempt to reduce
the amount of typing required to clean data
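The idea behind such a wrapper is sketched below. This is a hypothetical illustration, not the author's revalve. The name myrevalve and its options newname(), label(), and invalid() are assumptions. The auto dataset is used for the example call.

```stata
capture program drop myrevalve
program define myrevalve
    * hypothetical syntax: one variable, a new name, a label, optional invalid codes
    syntax varname, Newname(name) Label(string) [Invalid(numlist)]
    rename `varlist' `newname'                          // 1. rename
    label variable `newname' `"`label'"'                // 2. variable label
    foreach v of local invalid {                        // 3. invalid values -> missing
        quietly replace `newname' = . if `newname' == `v'
    }
end

sysuse auto, clear
myrevalve rep78, newname(repair) label("Repair record, 1978") invalid(1)
tabulate repair, missing
```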
Problems with encode Manual encoding Encoding using loops Developing oencode MWE Stable Final ado
The table below illustrates what would happen if one used encode to convert
the string variable, read_hl, into a numeric variable:
Original: values (alphabetic) | Original: rank ordering | Encoded: values | Encoded: value labels
Fair | 3 | 1 | Fair
Not at all | 1 | 2 | Not at all
Not well | 2 | 3 | Not well
Very Well | 4 | 4 | Very Well
* 1. Create new variable 'newvar' and fill with missing values
gen newvar = .

* 2. Fill in values of 'newvar' based on string values of 'read_hl'
replace newvar = 1 if read_hl == "Not at all"
replace newvar = 2 if read_hl == "Not well"
replace newvar = 3 if read_hl == "Fair"
replace newvar = 4 if read_hl == "Very Well"

* 3. Define/Modify appropriate value label
label define newvar 1 "Not at all", modify
label define newvar 2 "Not well", modify
label define newvar 3 "Fair", modify
label define newvar 4 "Very Well", modify

* 4. Assign new value label and appropriate variable label to 'newvar'
label values newvar newvar
label var newvar "Respondent's self-reported home language reading level"
* 1. Convert string var 'read_hl' to labeled numeric var 'newvar'
encode read_hl, gen(newvar)

* 2. Adjust the values of 'newvar' using recode
recode newvar (1=3) (2=1) (3=2)

* 3. Modify the 'newvar' value label
label define newvar 1 "Not at all", modify
label define newvar 2 "Not well", modify
label define newvar 3 "Fair", modify
This code seems like a significant improvement over the previous example
. It is much more compact
. Only some of the values and value labels needed to be redefined
. The variable label was automatically assigned to the newvar variable
But it is still not very general and is overly reliant on manual input
Next up: introduce some automation using a loop and macro extended functions
* 1. Create new variable 'newvar' and fill with missing values
gen newvar = .

* 2. Get list of unique values for 'read_hl' and store in local `levels'
levelsof read_hl, local(levels)

* 3. Specify initial value for index local `i'
local i = 1

* 4. Automatically recode and label 'newvar' using loop with ordered numlist
foreach num of numlist 3 1 2 4 {
    replace newvar = `num' if read_hl == "`: word `i' of `levels''"
    label define newvar `num' "`: word `i' of `levels''", modify
    local ++i
}

* 5. Assign new value label and appropriate variable label to 'newvar'
label values newvar newvar
label var newvar "`:variable label read_hl'"
Loops do not necessarily make code more efficient (they can do the opposite)
. Unless we intend to repeat the process, there is no real need for a loop in the
context of the present example
The initial setup cost of a loop can be high
. User-defined programs tend to have significantly higher setup costs
Decision rule: use loops/programs iff
It is useful to note that the example data contains three more string variables
(write_hl, read_en, and write_en) that have exactly the same coding as read_hl
. we may want to use our code to achieve ordered encodes of these variables also
More generally, being able to specify the order in which a string variable is
encoded might prove useful in a range of contexts
Our goal is thus
1. To make it as easy as possible to achieve an ordered encode and
2. To make the code with which we do so as general/flexible as possible
We can achieve both of these goals by converting our loop code into a
self-contained program
Plan ahead
. What arguments and options will your program need/take, etc.
. Will it accept weights and if statements, etc.
Use good syntax
. Adhere to syntax and usage guidelines given in StataCorp (2013b, pp. 505 - 519)
. This makes debugging easier, makes it easier for others to understand what your
program does, and makes it easier for you to remember what you were trying to
achieve
Start simple
. Try to create a minimal working example (MWE) that achieves basic objectives first
and then add complexity and refinement in incremental steps
We want to create our first version of the program, trying to keep the number
of changes we make to the bare minimum
At the very least, our MWE will require the following elements:
. A program/command name
We'll call the program oencode (short for ``ordered encode'') since it is moderately
descriptive and also a non-reserved name
. Allowance for two compulsory arguments:
1. The name of the string variable to be encoded and
2. The order in which the string variable's values should be encoded
(4) The syntax line introduces a host of changes from the previous example
. the varlist_specifier max = 1 restricts the number of input variables that may be specified to one
. the varlist_specifier str restricts the type of input variable that may be specified to string variables only
. the addition of GENerate(string) to the syntax line adds a further compulsory argument to the
command. The user must specify a name for the new encoded variable to be created within the wrapper
generate().
. The respective uppercase parts of the GENerate() and Order() wrappers indicate the shortest
abbreviations that may be used to specify these compulsory arguments
(5) ; (6) ; (9) Prefixing these command lines with qui (short for quietly) suppresses any output that they
would otherwise generate
(5) ; (9) ; (10) ; (13) ; (14) In each of these lines, newvar has been replaced by `generate', which is the alias for
the new variable created by the command
(17) Call the program oencode with read_hl as the input varlist, arbitraryvar as the name of the new encoded
variable, and '3 1 2 4' as the input order() numlist
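A minimal oencode in the spirit of the MWE described above is sketched below. It is an illustration of the described design, not the author's exact listing. The small dataset is made up and only mimics the categories of read_hl.

```stata
capture program drop oencode
program define oencode
    syntax varlist(max=1 string), GENerate(string) Order(numlist)
    qui gen `generate' = .
    qui levelsof `varlist', local(levels)              // alphabetical unique values
    local i = 1
    foreach num of numlist `order' {
        qui replace `generate' = `num' if `varlist' == "`: word `i' of `levels''"
        label define `generate' `num' "`: word `i' of `levels''", modify
        local ++i
    }
    label values `generate' `generate'
    label variable `generate' "`: variable label `varlist''"
end

* Made-up data with the same categories as read_hl
clear
input str10 read_hl
"Fair"
"Not at all"
"Not well"
"Very Well"
end
oencode read_hl, gen(reading) order(3 1 2 4)
list, nolabel
```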
To make our code more robust, we will introduce some changes that
. check whether the number of elements specified in the order() numlist matches the
number of elements in the `levels' alias,
. prevent execution of the program if it does not, and
. inform the user of the program termination and the nature of the specification
error when necessary
...
...
...
...
17 qui gen `generate' = .
18 local i = 1
19 foreach num of numlist `order' {
20 qui replace `generate' = `num' if `varlist' == "`: word `i' of `levels''"
21 label define `generate' `num' "`: word `i' of `levels''", modify
22 local ++i
23 }
24 label value `generate' `generate'
25 label var `generate' "`:variable label `varlist''"
26 end
27
28 oencode read_hl, gen(randomvar) o(3 1 2) // too few values
29 oencode read_hl, gen(randomvar) o(3 1 2 4 5) // too many values
30 oencode read_hl, gen(randomvar) o(3 1 2 4) // correct
(6) ; (17) The order in which these two lines of code are executed has been switched around in the code to
prevent execution of the command in the event of specification errors
(8) This line can be interpreted as: count the number of elements/words in the numlist specified within the order()
wrapper and count the number of elements/words in the `levels' local. If the former is smaller than the latter, do the
following...
. (9) Issue Stata error code number 122: ``invalid numlist has too few elements''8
. (10) Terminate the program immediately without executing any further part of it
(12) This line can be interpreted as: if the previous if statement was evaluated as 'false', count the number of
elements/words in the numlist specified within the order() wrapper and count the number of elements/words in the
`levels' local. If the former is greater than the latter, do the following...
. (13) Issue Stata error code number 123: ``invalid numlist has too many elements''
. (14) Terminate the program immediately without executing any further part of it
(28)-(30) Call the program oencode with too few, too many, and the right number of elements in the input
order() numlist
8 See StataCorp (2013b, pp. 182 - 195) for a list of Stata's error codes
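The two checks are isolated below in a small helper program so that the resulting error codes can be seen directly. The helper checkorder and its argument layout are hypothetical. Note that error # prints Stata's standard message for that code and also stops the program, so a separate exit is not strictly needed after it.

```stata
capture program drop checkorder
program define checkorder
    args nlevels order                           // e.g. checkorder 4 "3 1 2"
    local norder : word count `order'
    if `norder' < `nlevels' {
        display as error "order() has too few elements"
        error 122                                // invalid numlist has too few elements
    }
    else if `norder' > `nlevels' {
        display as error "order() has too many elements"
        error 123                                // invalid numlist has too many elements
    }
    display as text "order() has the right number of elements"
end

capture noisily checkorder 4 "3 1 2"
local rc_few = _rc
capture noisily checkorder 4 "3 1 2 4 5"
local rc_many = _rc
capture noisily checkorder 4 "3 1 2 4"
local rc_ok = _rc
```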
Our program is now fairly stable, but there remains room for improvement
(almost always the case) and scope for enhancements
. E.g. when encoding a string variable, the user may want to replace the original string
variable with the encoded one rather than generating an additional variable in the
data
. Obviously, this can be done by using oencode to generate a new variable,
dropping the old string variable, assigning the old string variable name to the new
encoded variable, and ordering the new variable where the original string variable
used to be positioned in the data
. However, it would be neat if we could include the option to do precisely this
directly in our oencode command
For our final iteration of the oencode program, we will change the code such
that it allows the user
. Either to generate a new encoded variable by specifying a name in the
generate() wrapper or to replace the original string variable with an encoded
version by specifying a replace option.
This requires
. making the generate() argument optional and introducing an additional
replace optional argument
. including exit clauses in the event that the user specifies both or neither of the
generate() and replace optional arguments
. specifying what happens if the user chooses the replace option instead of the
generate() option
(9)-(12) Specifies what happens if both the generate() and replace options are specified
(13)-(16) Specifies what happens if neither the generate() nor the replace option is specified
(17)-(27) Specifies what happens if the generate() option is specified. The code inside this block is the
same as in the previous example
(28)-(41) Specifies what happens if the replace option is specified. The code in this block differs somewhat
from what was used before:
. (29) Rename the original input string variable to avoid naming conflicts
. (30) Generate a placeholder variable with the same name as the original (now renamed) input variable
. (37) Order the new encoded variable before the renamed original input variable
. (40) Drop the renamed original input variable from the data
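The option logic described above might be sketched as follows. This is a hypothetical reconstruction, named oencode2 to avoid clashing with the real command: the error codes 198 and 123, the use of a temporary variable for the renamed original, and the rule that the k-th code in order() goes to the k-th level in levelsof's alphabetical order are all assumptions rather than the author's code.

```stata
capture program drop oencode2
program define oencode2
    version 13
    * Hypothetical sketch; internals are assumed, option names follow the slides
    syntax varname(string), Order(numlist integer) [Generate(name) replace]

    * Both options given: specification error
    if "`generate'" != "" & "`replace'" != "" {
        display as error "options generate() and replace are mutually exclusive"
        exit 198
    }
    * Neither option given: specification error
    if "`generate'" == "" & "`replace'" == "" {
        display as error "must specify either generate() or replace"
        exit 198
    }

    quietly levelsof `varlist', local(levels)
    local nlev : word count `levels'
    local nord : word count `order'
    if `nord' != `nlev' {
        display as error "order() must contain one code per level of `varlist'"
        exit 123
    }

    if "`replace'" != "" {
        * Free up the original name, then reuse it for the encoded variable
        local target `varlist'
        tempvar orig
        rename `varlist' `orig'
        local source `orig'
    }
    else {
        confirm new variable `generate'
        local target `generate'
        local source `varlist'
    }

    * k-th level (alphabetical, as returned by levelsof) gets k-th code
    quietly generate long `target' = .
    forvalues i = 1/`nlev' {
        local lev  : word `i' of `levels'
        local code : word `i' of `order'
        quietly replace `target' = `code' if `source' == `"`lev'"'
        label define `target' `code' `"`lev'"', modify
    }
    label values `target' `target'
    order `target', before(`source')
    if "`replace'" != "" drop `source'
end

sysuse auto, clear
decode foreign, generate(origin_str)
capture noisily oencode2 origin_str, order(1 0) generate(x) replace
local rc = _rc                              // both options given
oencode2 origin_str, order(1 0) generate(origin_num)
tabulate origin_num
generate origin_copy = origin_str
oencode2 origin_copy, order(1 0) replace    // encoded in place
```

Checking the option combinations before any data are touched means a misspecified call leaves the dataset exactly as it was.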
This last version of the oencode command is general, compact, and fairly
robust to specification error
. There are definitely more refinements and enhancements we could make, but the
present version should suffice for most uses
To permanently add it to Stata's repertoire of commands, we need to save the
program code in a file named oencode.ado and store it in the
C:\ado\personal folder (Stata's default PERSONAL directory on Windows)
This will ensure that the oencode command is available the next time Stata is
opened
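To check where the PERSONAL directory is on a given machine (it differs across operating systems and installations), Stata's official sysdir, personal, and adopath commands can be used; discard and which then confirm that Stata picks up the saved file. A minimal sketch:

```stata
* Where does Stata look for user-written ado-files on this machine?
sysdir                       // lists system directories, including PERSONAL
personal                     // displays the PERSONAL directory only
adopath                      // shows the full ado-file search path
display "`c(sysdir_personal)'"

* After saving oencode.ado in the PERSONAL directory:
discard                      // clear programs in memory so the file is reloaded
capture noisily which oencode   // reports the file Stata would run (error 111 if not found)
```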
As a final example, we'll test how well the oencode command functions
Task: convert all of the string variables on self-reported reading and writing ability and on self-reported
emotional well-being into labelled numeric variables
Loops
Task: Summarize hourly wages and store its count, mean, standard deviation, and minimum and
maximum value in a nicely formatted matrix
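A minimal sketch for the single-variable task, using Stata's shipped nlsw88 data as a stand-in for NIDS2008_sample.dta (its wage variable is an hourly wage; the NIDS variable names are not shown here, so this is an assumption). summarize leaves its results in r(), which can be copied straight into a matrix:

```stata
sysuse nlsw88, clear
quietly summarize wage
* Copy the stored results into a 1 x 5 row vector
matrix S = (r(N), r(mean), r(sd), r(min), r(max))
matrix rownames S = wage
matrix colnames S = N mean sd min max
matrix list S, format(%9.2f)
```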
Task: Summarize hourly wages, monthly earnings, per capita household income, and household income
and store the counts, means, standard deviations, minimum and maximum values in a nicely
formatted matrix
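Extending this to several variables only requires a loop that appends one row per variable; nullmat() lets the first pass start from an empty matrix. The nlsw88 variables below are stand-ins for the NIDS earnings and income variables:

```stata
sysuse nlsw88, clear
capture matrix drop S
local vars wage hours ttl_exp tenure     // stand-ins for the NIDS variables
foreach v of local vars {
    quietly summarize `v'
    * Append one row of summary statistics per variable
    matrix S = nullmat(S) \ (r(N), r(mean), r(sd), r(min), r(max))
}
matrix rownames S = `vars'
matrix colnames S = N mean sd min max
matrix list S, format(%9.2f)
```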
Task: Run a basic Mincerian earnings function and access/display the standard errors from the estimation
Task: Run a basic Mincerian earnings function and access the standard errors from the estimation using a
simple command
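One way to access the standard errors, sketched with nlsw88 stand-ins (log hourly wage on years of schooling, grade, and a quadratic in experience, ttl_exp): the standard errors are the square roots of the diagonal of the variance matrix e(V), and individual ones are also available as _se[varname]. The specification is illustrative, not the one used in the slides.

```stata
sysuse nlsw88, clear
generate lnwage = ln(wage)
quietly regress lnwage grade c.ttl_exp##c.ttl_exp

* Standard errors = square roots of the diagonal of e(V)
matrix V = e(V)
matrix SE = vecdiag(V)
forvalues j = 1/`=colsof(SE)' {
    matrix SE[1, `j'] = sqrt(SE[1, `j'])
}
matrix list SE, format(%9.4f)

* A single standard error can be read directly
display "SE of grade: " _se[grade]
```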
Task: Create a local polynomial graph between two variables and provide basic adornments to the figure
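A minimal sketch of a local polynomial graph with basic adornments, again using nlsw88 as a stand-in; degree(1) requests a local linear smooth, and the titles and note are illustrative. Run it from a do-file, since /// line continuation only works there:

```stata
sysuse nlsw88, clear
generate lnwage = ln(wage)
* Local linear smooth of log wage on experience, with titles and a note
twoway (lpoly lnwage ttl_exp, degree(1)), ///
    title("Log hourly wage and work experience") ///
    xtitle("Total work experience (years)") ///
    ytitle("Log hourly wage") ///
    note("Local linear smooth; nlsw88 example data") ///
    name(lp, replace)
```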
Task: Graphically illustrate the distribution of the log of household income from labour, grants,
government sources, investments, remittances, and rent on one graph
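Drawing several densities on one graph is a natural use of macros and loops: each pass adds one plot to a local macro, plus a matching legend key. A sketch with nlsw88 stand-ins for the income sources (the variable choice is an assumption; with the NIDS data the list would hold the income variables instead):

```stata
sysuse nlsw88, clear
local sources wage hours ttl_exp         // stand-ins for the income sources
local plots
local labels
local i = 1
foreach v of local sources {
    generate ln_`v' = ln(`v')
    * Accumulate one kdensity plot and one legend key per variable
    local plots `plots' (kdensity ln_`v')
    local labels `"`labels' `i' "`v'""'
    local ++i
}
twoway `plots', legend(order(`labels')) ///
    xtitle("Log value") ytitle("Density") name(kd, replace)
```

Building the plot list in a macro means adding a further source only requires extending the sources list, not editing the graph command.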
Task: Generate a labelled age cohort variable with 5-year intervals based on the age variable
* 1. Create and fill 'agecohort' variable based on values of 'age'
gen agecohort = .
replace agecohort = 1 if inrange(age,0,4)
replace agecohort = 2 if inrange(age,5,9)
* ........................................
replace agecohort = 21 if inrange(age,100,104)
replace agecohort = 22 if inrange(age,105,109)

* 2. Define/modify value labels
label define agecohort 1 "0 - 4", modify
label define agecohort 2 "5 - 9", modify
* ........................................
label define agecohort 21 "100 - 104", modify
label define agecohort 22 "105 - 109", modify

* 3. Assign variable and value labels and order variable
label values agecohort agecohort
label var agecohort "Age cohort"
order agecohort, after(age)
* 1. Create and fill 'agecohort' var & value labels based on values of 'age'
gen agecohort = .
local i = 1
forvalues r = 0(5)105 {
    local s = `r' + 4
    replace agecohort = `i' if inrange(age,`r',`s')
    label define agecohort `i' "`r' - `s'", modify
    local ++i
}

* 2. Assign variable and value labels and order variable
label values agecohort agecohort
label var agecohort "Age cohort"
order agecohort, after(age)
Task: Calculate the proportion of females for each race group and graphically illustrate the point
estimates along with the 95% confidence intervals
Task: Calculate the number of household members and household residents per household
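A minimal sketch using by-group processing, on a tiny made-up dataset: hhid, pid, and resident (1 = resident) are hypothetical variable names, since the NIDS identifiers are not shown here. _N within a by-group gives the number of records, and egen's total() counts the residents:

```stata
clear
input hhid pid resident
1 1 1
1 2 1
1 3 0
2 1 1
2 2 1
end

* Household members: number of person records in each household
bysort hhid: generate hhmembers = _N
* Household residents: members flagged as resident (assumed coding 1 = yes)
by hhid: egen hhresidents = total(resident == 1)
list, sepby(hhid)
```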
Thank you
Official resources for learning Stata Other resources for learning Stata
A number of excellent books on Stata and statistics can be purchased from the
Stata Bookstore
. These books cover a wide array of topics and applications ranging from basic data
management and workflow in Stata to advanced econometric modelling and
programming
The Stata Journal is a quarterly publication of peer-reviewed articles, tips, and
notes on various applications in Stata
. This is one of the best resources for keeping abreast of modern
econometric/analytical techniques and how they can be implemented in Stata
. The topics dealt with range from simple tips and tricks that can be used to improve
workflow in Stata to involved applications of econometric techniques at the
forefront of the discipline
. A subscription is required to gain access to the most recent issues of the journal, but
articles from back issues can often be accessed for free
NetCourses
Training on Stata, by Stata
There are many excellent third-party resources for learning Stata programming
freely available online
. These resources vary in terms of their coverage, comprehensiveness, focus, level,
target audience, and structure
. Taken together, they are an incredibly useful resource for learning Stata and should be more
than sufficient for gaining familiarity with the vast majority of applications most users
(including advanced users) may be interested in
Many of these free resources have proved invaluable in my own understanding
and command of Stata
I strongly encourage all interested users to exploit the availability of the
resources listed below
The list below includes my favourite free, third-party resources for learning
Stata, but is by no means an exhaustive list of what is available online
. Introduction to basic and most frequently used commands in Stata and illustration of
their uses
4. Stata Programming Essentials by the Social Science Computing Cooperative at
UW-Madison (intermediate)
. Good overview of and introduction to programming concepts (macros, loops, etc)
5. Stataman's The Stata Project-Oriented Guide (intro-advanced)
. As the name implies, this site offers a project-oriented approach to learning data
manipulation, results manipulation, command automation, and results presentation
in Stata
6. Alexander C. Lembecke's Advanced Stata Topics notes at the London School of
Economics (advanced)
. One of the best introductions (brief, yet thorough) to programming basics, syntax,
maximum likelihood methods, and Mata that I have seen. This is one of my
permanent reference resources
. Increasingly, Stata users (and StataCorp) are uploading video tutorials to YouTube.
These tutorials vary in quality, usefulness, and scope, but are useful for those who
prefer the screencast medium
References I
Baum, C. F. (2005). SUGUK 2005 invited lecture: A little bit of Stata programming goes a long way...
Lecture, Boston College.
Driver, S. (2005). Stata tip 18: Making keys functional. The Stata Journal 5(1), pp. 137-138.
Schechter, C. (2011). Stata tip 99: Taking extra care with encode. The Stata Journal 11(2), pp. 321-322.
StataCorp (2013a). Getting Started Stata for Windows Release 13. StataCorp, College Station, TX: Stata Press.
--------- (2013b). Stata Programming Reference Manual Release 13. StataCorp, College Station, TX: Stata
Press.
[online] available from http://www.stata.com/manuals13/p.pdf
--------- (2013c). Stata User's Guide Release 13. StataCorp, College Station, TX: Stata Press.
[online] available from http://www.stata.com/manuals13/pmacro.pdf