Computing Stata Notes

computing 2 miss sabrina afzal notes

Uploaded by

toufeeqjamal908
Copyright
© All Rights Reserved

Stata Notes

Important File types

1. Data input and output
The default extension for a data file in Stata is “.dta”.
2. Log File
A log file keeps a record of everything you do in Stata. It saves everything that appears in the
“Results” window. Log files come in two formats: 1) Stata format “.smcl” 2) plain-text format “.log”
a. You can start a log file from File>Log>Begin
3. Do File
Do-files are a very important feature of Stata. A do-file lets you save all your commands in a small
text file that you can run later to reproduce all your results.
a. You can open a Do-file window from Window>Do-file Editor>New Do-file Editor
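A minimal sketch of how a log file and a do-file work together. The file name session1.log is hypothetical, and auto is the example dataset that ships with Stata; neither comes from these notes.

```stata
* Contents of a do-file (a plain text file of commands).
* "session1.log" is a hypothetical file name chosen for illustration.

* Start a plain-text log; replace "text" with "smcl" for Stata format
log using "session1.log", text replace

* Load Stata's shipped example dataset and inspect it
sysuse auto, clear
describe

* Stop recording
log close
```

Running the do-file again repeats the same commands and rewrites the same log, which is what makes the results reproducible.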

Variable types:

Stata has two broad categories of variables:

1. String: this type of variable can contain numbers, letters, and other symbols
a. String variables are stored in “str” format
• Suppose b1 has been coded as Male and Female. The encode command converts this string
variable to a labelled numeric variable, numbering the categories in alphabetical order
(Female=1, Male=2):
o . encode b1, generate(sex)
• To choose the codes yourself, build the numeric variable with generate and replace instead
of encode:
o . generate sex=1 if b1=="Male"
o . replace sex=2 if b1=="Female"
2. Numeric: this type of variable can contain only numbers
a. The numeric storage types are: byte, int, long, float, and double

You can find a variable’s type in the Properties window in Stata 12 (and in the Variables window in
earlier versions).
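The conversion from string to numeric can be tried on a tiny made-up dataset. This is only a sketch: the three observations below are invented for illustration, and only the variable names b1 and sex come from the notes.

```stata
* Tiny illustrative dataset with one string variable
clear
input str6 b1
"Male"
"Female"
"Female"
end

* encode numbers the categories alphabetically and attaches value labels
encode b1, generate(sex)

* Show the labelled values, then the underlying numeric codes
list b1 sex
list b1 sex, nolabel

* describe reports the storage type of each variable
describe b1 sex
```

Because encode follows alphabetical order, Female is stored as 1 and Male as 2 in this example.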

Importing and Viewing data in Stata

You can copy and paste data into Stata from an Excel file using the Data Editor. You can open the Data
Editor from Data>Data Editor>Data Editor (Edit). You can also type “edit” in the Command window.

Make sure that variable names do not contain spaces or symbols. Otherwise Stata will not read them as
variable names and will treat them like any other observation.

• Data Browser
We usually view data in the Data Browser. It looks the same as the Data Editor, but it does not allow
you to make any changes to the data. You can open the Data Browser from Data>Data Editor>Data
Editor (Browse). You can also type “browse” (or “br” for short) in the Command window, optionally
followed by the variables you want to see:
o br b1 pcode msno b3

Variables window

When you import a dataset into Stata, you can find information on each variable in that dataset in the
Variables window. In Stata 12, the Variables window displays each variable’s name and label. A label is
simply a brief description of a variable.

You can use the following commands to change a variable’s name and label:

• rename OldVariableName NewVariableName
o . rename B1 gender
• label variable VariableName "Label that you want to give"

Make sure that the variable name does not contain any spaces and that the label is enclosed in double
quotes.

Describe Variable

You can use the describe command to view a variable’s storage type, display format, and labels.

describe VariableName1 VariableName2

describe

The first form shows the properties of the variables you name. Executing describe without a variable
name displays the properties of all the variables in the dataset.
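The rename, label variable, and describe commands can be tried together as follows. This sketch assumes Stata's shipped auto dataset; the new name repair and the label text are illustrative choices, not part of the notes.

```stata
sysuse auto, clear

* Rename rep78 and give it a new label (name and label text are illustrative)
rename rep78 repair
label variable repair "Repair record in 1978"

* Properties of selected variables, then of every variable
describe repair price
describe
```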

Dropping a Variable

You can use the drop command to remove variables from the dataset:

drop VariableName1 VariableName2

keep does the same job from the other direction: you list the variables that you do not want to drop,
and Stata removes all the others:

keep VariableName3
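A short sketch of the difference between drop and keep, assuming Stata's shipped auto dataset (the choice of variables is arbitrary):

```stata
sysuse auto, clear

* Remove two variables; everything else stays
drop trunk turn

* Retain only the listed variables; everything else is removed
keep make price mpg foreign

* Confirm which variables are left
describe
```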

List Command

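We can also view data in the “Results” window by using the list command. The first command below lists
every observation of one variable; the second lists the first three observations of all variables:

list VariableName

list in 1/3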

Sort Command

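The sort command arranges your dataset with respect to a given variable. The following command sorts
the data in ascending order. Stata commands are case sensitive, so sort must be typed in lowercase.

sort VariableName

To sort the data in descending order, use gsort with a minus sign in front of the variable name:

gsort -VariableName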
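The list and sort commands are often used together. The sketch below assumes Stata's shipped auto dataset, which is not part of these notes:

```stata
sysuse auto, clear

* Ascending order of price, then show the three cheapest cars
sort price
list make price in 1/3

* Descending order: note the minus sign in front of the variable
gsort -price
list make price in 1/3
```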

Stata’s Help and Search:

Stata Help provides a detailed description of the syntax and use of each command. You can open the help
dialogue box from Help>Stata Command, or type help followed by a command name (for example, help
summarize) in the Command window.

The help window for each command also provides a link (top right corner) to its dialogue box. We can
execute each command either by typing it or through its dialogue box. The dialogue box lets you
execute a command in a much simpler manner. However, it is recommended that you use the command
language to ensure the reproducibility of your work.

Logical and Relational Operators in Stata:

Stata has the following symbols for logical and relational operators:

Logical Operators        Relational Operators

~    not                 >           greater than
|    or                  <           less than
&    and                 ==          equal to
                         >=          greater than or equal to
                         <=          less than or equal to
                         != or ~=    not equal to

Count command:

It counts the observations that satisfy the specified conditions. A simple count returns the total number
of observations in the dataset. We can also combine the count command with an if condition to narrow
our search.

count if VariableName1==1

count if gender=="Male"

count if VariableName1==1 & gender=="Male"

The first command counts the number of observations where “VariableName1” is equal to 1. The second
command performs the same function on “gender” but puts the value in double quotes; this is because
“gender” is a string variable. The last command combines both of these conditions.
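The three forms of count can be tried on real data. This sketch assumes Stata's shipped auto dataset, in which foreign is numeric (0 = Domestic, 1 = Foreign) and make is a string; the price cutoff is arbitrary.

```stata
sysuse auto, clear

* Total number of observations
count

* Numeric condition
count if foreign==1

* String condition: the value goes in double quotes
count if make=="Volvo 260"

* Two conditions combined with &
count if foreign==1 & price>10000
```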

Summarize Command:

This command generates summary statistics for a given set of variables. A simple summarize followed by
a variable list displays the number of observations, mean, standard deviation, minimum value, and
maximum value.

summarize VariableName1 VariableName2

We can also generate additional summary statistics, such as percentiles, variance, skewness, and
kurtosis, by using the “detail” option:

summarize VariableName1 VariableName2, detail
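A brief sketch of both forms of summarize, assuming Stata's shipped auto dataset:

```stata
sysuse auto, clear

* Basic summary statistics for two variables
summarize price mpg

* Extended statistics for one variable
summarize price, detail

* summarize also stores its results, for example the mean of the last variable
display r(mean)
```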


Tabulate Command:

This is used to generate both one-way and two-way frequency tables. A simple tabulate command
followed by a single variable displays each unique value along with its frequency, percentage, and
cumulative percentage. We can use the “missing” option to include missing values in the table and so
find out how many there are.

tabulate VariableName1

tabulate VariableName1, missing

A tabulate command followed by two variables generates a cross-tabulation of those variables. We can
use both string and numeric variables with the tabulate command.

tabulate VariableName1 VariableName2
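The one-way, missing, and two-way forms of tabulate can be compared on Stata's shipped auto dataset (an assumption of this sketch), where rep78 contains some missing values:

```stata
sysuse auto, clear

* One-way table: missing values are left out
tabulate rep78

* Same table with a row for missing values
tabulate rep78, missing

* Two-way table (cross-tabulation)
tabulate rep78 foreign
```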

Tabstat Command:

Tabstat combines the tabulate command with the summarize command.

tabstat VariableName1, by(VariableName2)

This displays the mean of “VariableName1” for each unique value of “VariableName2”. By default the
command displays only the mean, but you can use the “statistics” option to request additional
statistics.

tabstat VariableName1, statistics(mean min max) by(VariableName2)

Now it also displays the minimum and maximum values.
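A sketch of the two tabstat commands on Stata's shipped auto dataset (assumed here for illustration), comparing car prices by origin:

```stata
sysuse auto, clear

* Mean price for each value of foreign
tabstat price, by(foreign)

* Mean, minimum, and maximum price for each value of foreign
tabstat price, statistics(mean min max) by(foreign)
```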

Generate command:

It is used to create new variables in Stata.

generate VariableNew=1

This command generates a new variable named “VariableNew” in which every observation is “1”. Note
that we use a single “equals” sign here instead of a double one: a single sign sets something equal to a
value, whereas the double “equals” sign tests for equality (as discussed above).

We can combine this command with if, so that the value is assigned only where a condition holds:

generate VariableNew2=1 if VariableName1==1

Generating Dummy Variables

We can combine the generate command with an if condition. Suppose gender is a numeric variable coded
1 for Male and 2 for Female:

generate Dummy=1 if gender==1

This command generates a new variable named “Dummy” that has a value of 1 where gender is male. It
leaves the rest of the observations missing (displayed as dots). To fill those in, we use the replace
command:

replace Dummy=0 if gender==2

We can also do it the following way:

replace Dummy=0 if Dummy==.

In the first case we replace “Dummy” (the same variable) with zeros for females. In the second case we
replace all the missing values with zeros. Both give the same result if the variable “gender” does not
contain any missing values. Now we have a dummy variable that has the value 1 for Male and 0 for
Female.

You can combine multiple logical and relational operators with multiple variables to generate much more
complex variables. It is better to keep the Data Browser open while generating complex variables so
that you can check your output.
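The generate-then-replace pattern can be sketched on Stata's shipped auto dataset. The dataset, the variable name expensive, and the cutoff of 6000 are all assumptions made for illustration.

```stata
sysuse auto, clear

* 1 for cars priced above 6000 (an arbitrary cutoff), missing otherwise
generate expensive=1 if price>6000

* Fill the remaining observations with 0
replace expensive=0 if expensive==.

* Check the result against the original variable
tabulate expensive, missing
list make price expensive in 1/5
```

Filling every missing value with 0 is safe here only because price itself has no missing values in this dataset.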

More on generate command:

• Make sure that you view your data after using a generate/replace command to double-check your
output.
• Stata treats the missing value “.” as the highest value in a numeric variable. For example, suppose
you have a variable (x) with values ranging from 1 to 10 and some missing values. Asking Stata to
generate a variable that takes the value 1 if x is greater than 5 (generate dummy=1 if x>5) will
also set it to 1 where x is missing. Therefore, you should write such commands as:
o generate dummy=1 if x>5 & x!=.
• It is also recommended that you use parentheses whenever you combine multiple logical
operators. For example, suppose you want to generate a new variable that equals 1 if a student is
male and has either taken economics as a major or scored more than 80% in the math test:
o generate dummy=1 if gender=="Male" & major=="Economics" | math>80
This will not generate the desired variable. Stata evaluates & before |, so it reads this as:
o generate dummy=1 if (gender=="Male" & major=="Economics") | math>80

That means a male student with an economics major, or any student who scored more than 80% in the
math test. Therefore it is better to state your conditions explicitly:

o generate dummy=1 if gender=="Male" & (major=="Economics" | math>80)


• When you are dealing with interaction terms (interacting two variables), make sure your
comparison group is consistent. For example, if you need to generate a variable that is equal to 1
if a person is male and lives in an urban area, then its comparison groups are: males living in
rural areas, females living in urban areas, and females living in rural areas. Make sure that you
have coded your variable correctly.
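The points about missing values and parentheses can be checked directly. This sketch assumes Stata's shipped auto dataset, in which rep78 has missing values; the variable names high_naive and high_safe and the cutoffs are illustrative.

```stata
sysuse auto, clear

* A missing value counts as larger than any number
generate high_naive=1 if rep78>3
generate high_safe=1 if rep78>3 & rep78!=.

* high_naive is wrongly set to 1 where rep78 is missing; high_safe is not
list make rep78 high_naive high_safe if rep78==.

* & is evaluated before |, so these two conditions are not equivalent:
* the first also counts domestic cars with mpg above 30
count if foreign==1 & price>10000 | mpg>30
count if foreign==1 & (price>10000 | mpg>30)
```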
