Computing Stata Notes
Variable types:
1. String: this type of variable can contain both numbers and letters (and other symbols)
a. String variables are stored in “str” format
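Let’s suppose b1 has been coded as Male and Female. We can convert this string variable to a
numeric variable in either of two ways. The encode command creates a labeled numeric variable,
assigning codes in alphabetical order (Female=1, Male=2):
o . encode b1, generate(sex)
Alternatively, we can assign the codes by hand. Use this instead of encode, not after it, because
replace would change the codes without updating the value labels that encode created:
o . generate sex=1 if b1=="Male"
o . replace sex=2 if b1=="Female"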
2. Numeric: this type of variable can contain only numbers
a. The numeric storage types are: byte, int, long, float, and double
You can find a variable’s type in the Properties window in Stata 12 (and in the Variables window in
previous versions).
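As a minimal sketch of checking what encode produced, the following builds a tiny illustrative dataset (the four values typed in here are made up for the example) and then inspects the storage types and the codes that were assigned:

```stata
* Build a small illustrative dataset with one string variable
clear
input str6 b1
"Male"
"Female"
"Female"
"Male"
end

* Convert the string variable to a labeled numeric variable
encode b1, generate(sex)

* Show the storage type of each variable (b1 is a string, sex is numeric)
describe b1 sex

* Show which number encode assigned to each category
label list sex

* Tabulate the underlying numeric codes instead of the labels
tabulate sex, nolabel
```

Checking the value label this way is useful before writing conditions such as sex==1, because the codes follow alphabetical order rather than the order in which the categories appear in the data.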
You can copy and paste data into Stata from an Excel file using the Data Editor. You can open the Data Editor from
Data>Data Editor>Data Editor (Edit). You can also type “edit” in the Command window.
Make sure that variable names do not contain spaces or symbols. Otherwise Stata will treat the row of names
as just another observation.
Data Browser
We usually view data in the Data Browser. It looks the same as the Data Editor but it does not allow you
to make any changes to the data. You can open the Data Browser from Data>Data Editor>Data
Editor (Browse). You can also type “browse” (or its abbreviation “br”) in the Command window,
optionally followed by the variables you want to see:
o br b1 pcode msno b3
Variables window
When you import a dataset into Stata, you can find information on each variable in that dataset in the
Variables window. In Stata 12, the Variables window displays each variable’s name and label. A label is simply a brief
description of a variable.
You can use the rename and label variable commands to change a variable’s name and label.
Make sure that the variable name does not contain any spaces and that the label is written in quotation marks.
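A short sketch of renaming and labeling, using the auto example dataset that ships with Stata; the new name “mileage” and the label text are arbitrary choices made for illustration:

```stata
* Load the example dataset that ships with Stata
sysuse auto, clear

* Change the variable name from mpg to mileage (no spaces allowed in names)
rename mpg mileage

* Attach a descriptive label, written inside quotation marks
label variable mileage "Mileage in miles per gallon"

* Confirm the new name and label
describe mileage
```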
Describe Variable
You can also use the describe command to view a variable’s storage type, display format, and labels.
describe
You can also view the properties of multiple variables by specifying each name. Executing describe without
specifying a variable name will display the properties of all the variables.
Dropping a Variable
The drop command removes the listed variables from the dataset. Keep is another command for the same
purpose: instead of naming the variables to remove, you specify the list of variables that you do not
want to drop:
keep VariableName3
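The two commands can be compared side by side. This is a sketch on Stata’s shipped auto dataset, and the particular variables chosen are only for illustration:

```stata
* Load the example dataset that ships with Stata
sysuse auto, clear

* drop removes the variables that are listed
drop headroom trunk

* keep removes every variable that is NOT listed
keep make price mpg foreign

* Only make, price, mpg, and foreign remain
describe
```

Both commands change the dataset in memory only; the file on disk is unaffected until you save.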
List Command
We can also view data in the Results window by using the list command:
list VariableName
list in 1/3
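As a brief illustration on Stata’s shipped auto dataset (the variables and the price threshold are arbitrary), list can be restricted to certain variables, to a range of observations, or to observations meeting a condition:

```stata
* Load the example dataset that ships with Stata
sysuse auto, clear

* List two variables for the first three observations
list make price in 1/3

* List two variables for observations that satisfy a condition
list make price if price > 10000
```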
Sort Command
The sort command arranges your dataset with respect to a given variable. The following command
sorts the data in ascending order:
sort VariableName
The gsort command can also sort in descending order; put a minus sign before the variable name:
gsort -VariableName
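A small sketch of both commands on Stata’s shipped auto dataset; the choice of price and foreign as sort keys is only for illustration:

```stata
* Load the example dataset that ships with Stata
sysuse auto, clear

* Ascending order: the cheapest cars come first
sort price
list make price in 1/5

* Descending order: the most expensive cars come first
gsort -price
list make price in 1/5

* Ascending by foreign, then descending by price within each group
gsort foreign -price
```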
Stata’s Help and Search:
Stata Help provides a detailed description of the syntax and use of each command in Stata. You can open the help
dialogue box from Help>Stata Command.
The help window for each command also provides a link (top right corner) to its dialogue box. We can
execute each command either by typing it or through its dialogue box. The dialogue box allows you to execute
a command in a much simpler manner. However, it is recommended that you use the command
language, as a way to ensure reproducibility of your work.
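Help can also be reached from the Command window. A minimal sketch, with the command name and search keywords chosen only as examples:

```stata
* Open the help file for a command whose name you already know
help summarize

* Search by keyword when you do not know the name of the command
search frequency table
```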
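Stata has the following symbols for relational and logical operators:
Relational: == (equal to), != or ~= (not equal to), > (greater than), < (less than), >= (greater than or equal to), <= (less than or equal to)
Logical: & (and), | (or), ! or ~ (not)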
Count Command:
The count command counts the observations satisfying specified conditions. A simple count returns the total number of observations
in a dataset. We can also combine count with the if qualifier to narrow our search.
count if VariableName1==1
count if gender=="Male"
count if VariableName1==1 & gender=="Male"
The first command simply counts the number of observations where “VariableName1” is equal to 1. The second
command performs a similar function on “gender” but specifies the condition in quotation marks; this is
because “gender” is a string variable. The last command combines both of these conditions.
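The same patterns can be tried on Stata’s shipped auto dataset. This is a sketch only; the variables and cutoffs are arbitrary choices for illustration:

```stata
* Load the example dataset that ships with Stata
sysuse auto, clear

* Total number of observations
count

* Numeric condition: no quotation marks
count if foreign == 1

* String condition: the value goes inside quotation marks
count if make == "Volvo 260"

* Two conditions combined with & (and)
count if foreign == 1 & price > 10000

* Number of observations with a missing value in rep78
count if missing(rep78)
```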
Summarize Command:
This command is used to generate summary statistics for a given set of variables. A simple summarize
followed by a variable list will display the number of observations, mean, standard deviation, minimum
value, and maximum value.
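A brief sketch on Stata’s shipped auto dataset, with the variables chosen only for illustration:

```stata
* Load the example dataset that ships with Stata
sysuse auto, clear

* Observations, mean, standard deviation, minimum, and maximum
summarize price mpg

* The detail option adds percentiles, variance, skewness, and kurtosis
summarize price, detail

* Summary statistics for a subset of observations
summarize price if foreign == 1
```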
Tabulate Command:
This command is used to generate both one-way and two-way frequency tables. A simple tabulate command
followed by a single variable will display each unique value along with its frequency, percentage, and
cumulative percentage. We can use the “missing” option to include missing values in the table.
tabulate VariableName1
A tabulate command followed by two variables will generate a cross-tab of those variables. We can use
both string and numeric variables with the tabulate command.
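As an illustration on Stata’s shipped auto dataset (rep78 is used because it contains some missing values):

```stata
* Load the example dataset that ships with Stata
sysuse auto, clear

* One-way table, with missing values shown as their own row
tabulate rep78, missing

* Two-way table (cross-tab) of two variables
tabulate foreign rep78

* The row option adds row percentages to each cell
tabulate foreign rep78, row
```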
Tabstat Command:
tabstat VariableName1, by(VariableName2)
This will display the mean value of “VariableName1” for each unique value of “VariableName2”. By default this
command displays only the mean, but you can use the “statistics” option to generate additional
statistics.
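A short sketch of the “statistics” option on Stata’s shipped auto dataset; the list of statistics requested here is just one possible selection:

```stata
* Load the example dataset that ships with Stata
sysuse auto, clear

* Default: mean of price for each value of foreign
tabstat price, by(foreign)

* Request several statistics for two variables at once
tabstat price mpg, by(foreign) statistics(mean sd min max n)
```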
Generate command:
generate VariableNew=1
This command will generate a new variable named “VariableNew” in which every observation is
“1”. Note that we have used a single “equals” sign instead of writing it twice; a single sign is used when we
are setting something equal to a value, whereas the double “equals” sign is used to
test for equality (as discussed above).
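generate Dummy=1 if gender=="Male"
replace Dummy=0 if gender=="Female"
replace Dummy=0 if Dummy==.
The generate command sets “Dummy” to 1 for males and leaves it missing for everyone else. The two replace
commands are alternative ways to fill in the zeros. In the first case we replace “Dummy” (the same variable)
with zeros for females. In the second case we replace all the missing values with zeros. Both will have the
same result if the variable “gender” does not contain any missing values. Now we have a dummy variable that
has the value 1 for Male and 0 for Female.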
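The difference between the two ways of filling in zeros only shows up when “gender” has missing values. The sketch below uses a made-up four-row dataset, with one empty gender entry, to make that visible; the variable names Dummy1 and Dummy2 are hypothetical:

```stata
* Build a small illustrative dataset; the third observation has no gender
clear
input str6 gender
"Male"
"Female"
""
"Male"
end

* First approach: zeros are assigned to females only
generate Dummy1 = 1 if gender == "Male"
replace Dummy1 = 0 if gender == "Female"

* Second approach: every remaining missing value becomes zero
generate Dummy2 = 1 if gender == "Male"
replace Dummy2 = 0 if Dummy2 == .

* Dummy1 stays missing where gender is empty; Dummy2 is 0 there
list
tabulate Dummy1, missing
tabulate Dummy2, missing
```

The first approach is usually safer, because an observation with unknown gender is not silently counted as female.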
You can combine multiple logical and relational operators with multiple variables to generate much more
complex variables. It is better to keep the Data Browser open while generating complex variables so that you
can check your output.
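Stata evaluates & before |, so a condition of the form A & B | C is read as (A & B) | C.
That means a male student majoring in economics, or any student who scored more than 80% in the
math test. Therefore it is better to state your conditions explicitly with parentheses: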
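A minimal sketch of how parentheses change the result. The dataset is made up for illustration, and the variable names gender, major, math, and the three flag variables are hypothetical:

```stata
* Build a small illustrative dataset of four students
clear
input str6 gender str10 major math
"Male"   "Economics" 70
"Female" "Economics" 90
"Male"   "History"   85
"Female" "History"   60
end

* No parentheses: Stata evaluates & first, then |
generate flag_a = gender == "Male" & major == "Economics" | math > 80

* Same meaning as flag_a, but stated explicitly
generate flag_b = (gender == "Male" & major == "Economics") | math > 80

* Different meaning: a male student who is either an economics major
* or scored more than 80 in the math test
generate flag_c = gender == "Male" & (major == "Economics" | math > 80)

* Compare the three flags observation by observation
list
```

In this sketch flag_a and flag_b agree for every observation, while flag_c differs for the second student, a female economics major with a score of 90.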