Introduction To Stata 2024-06-18 Handout
5 Basic windows
1. Command
2. Results
3. History
4. Variables
5. Properties
(See “Window” menu)
Setting up a log file in Stata
• A log file saves the commands you type and the results
displayed, for later use or review
• Your personal notes are saved in the log file too:
begin the line with *
• Two types of log file
• .smcl (Stata Markup and Control Language), which looks nice
in Stata (with command lines in bold) but is cluttered with
markup tags when opened in other text editors
• .log, a plain-text file that can be opened in any text editor
• Menu > File > Log (or through the “Log” icon)
• To save and close the log file
log close
• Or through GUI (which can also be used to view the
ongoing log file)
• Menu > File > Log
• “Log” icon
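The same steps can also be typed as commands. A minimal sketch, assuming a hypothetical file name intro_session.log and Stata's built-in example dataset auto:

```stata
* Close any log that is still open (capture ignores the error if none is)
capture log close
* Start a plain-text log; "intro_session.log" is a hypothetical file name
log using "intro_session.log", text replace
* Lines that begin with * are written to the log as personal notes
sysuse auto, clear
summarize price
* Save and close the log file
log close
```

Leaving out the text option should produce a .smcl log instead.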
Stata data files
• By default, Stata holds only one dataset in memory at a time
(Stata 16 and later can hold additional datasets in frames)
• Changes to a data file are not saved automatically; you
have to save them yourself
• However, changing data points (values) only within the
data analysis software is not recommended; such changes
should be made during data entry, not during analysis
• Stata data files have .dta extension
• Other types of data files (e.g., .xls, .txt, and .sav) can
also be imported
Importing data from Excel
• Menu > File > Import
• Select “Import first row as variable names”
• A range of Excel cells to import can also be specified
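The menu steps above correspond to the import excel command. A minimal sketch: the workbook auto_demo.xlsx is a hypothetical file, created here from the built-in auto dataset only so that the example runs on its own.

```stata
* Create a small Excel file to import (hypothetical file name)
sysuse auto, clear
export excel using "auto_demo.xlsx", firstrow(variables) replace

* firstrow  : treat the first row of the range as variable names
* cellrange : import only cells A1 through C11
* clear     : replace the dataset currently in memory
import excel using "auto_demo.xlsx", firstrow cellrange(A1:C11) clear
describe
```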
Executing commands in Stata
1. “Command” window
2. .do file
3. Graphical user interface (GUI) menu and
dialog boxes, available for most commands
• To invoke a dialog box
db command
e.g., db summarize
GUI versus command line
Whenever you run a command through the GUI, Stata automatically
converts it to an unabbreviated command line and executes it.
Command lines in Stata
• A command line may contain prefixes, commands,
variables, qualifiers (if conditions and in ranges), and options.
• NB: Note the distinction between these elements
• Variable names can be typed in full, typed partially (optionally
completed with the Tab key), or selected from the Variables window.
• Abbreviations are possible and can save typing time.
• Command lines are sensitive to case and
punctuation, but rarely to spacing.
General structure
bysort sex: summarize age if (age >= 30) & (age < 45), detail
Abbreviation:
list id age education if school==1 in 1/5, nolabel clean
l id age edu if sch==1 in 1/5, nol clean
Displaying help in Stata
help command
e.g., h summarize
Executing a command on specific observations
• Applying condition(s) based on values of variable(s)
command … if condition(s)
• In a specific range of observations
command … in range
(NB: In a range such as 1/5, “ / ” means “through”)
• if and in may be used together (but this is rarely needed)
• Values of string variables are both case-sensitive and
space-sensitive, and must be wrapped in "…"
list id age if sex=="Male" & school==1 in 1/5
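The handout's own dataset is not available here, so the following sketch applies the same qualifiers to Stata's built-in auto dataset. Its variables (make, price, mpg, foreign) are stand-ins for the handout's variables.

```stata
sysuse auto, clear

* String values go in double quotes and must match case and spacing exactly
list make price if make == "Volvo 260"

* Observations 1 through 5 only
list make price mpg in 1/5

* if and in combined: domestic cars above 5000 among the first 10 rows
list make price if foreign == 0 & price > 5000 in 1/10

* count reports how many observations satisfy a condition
count if foreign == 1
```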
Conditional expressions
• Relational operators
Sign Meaning
> Greater than
< Less than
>= Greater than or equal to
<= Less than or equal to
== Equal to
!= Not equal to
~= Not equal to
Conditional expressions
• Logical operators
Sign Meaning
& And
| Or
! Not
~ Not
Conditional expressions
inlist(var, value, value, …)
• The condition is met when the variable (either
numeric or string [with "…"]) has values that
match those in the list.
• Example:
command … if inlist(histo, 1, 3, 4)
Returns all records with histo equal to 1, 3, or 4,
i.e., identical to:
command … if histo==1 | histo==3 | histo==4
Conditional expressions
inrange(var, value, value)
• The condition is met when the variable (usually
numeric) has a value that falls within the
specified range, endpoints included.
• Examples:
command … if inrange(histo, 1, 3)
• All records with histo ranging from 1 to 3
command … if inrange(age, 60, .)
• All records with non-missing age of ≥ 60
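As a quick check of inlist() and inrange(), the sketch below compares each function with its written-out condition. It assumes the built-in auto dataset, with rep78 standing in for histo and age.

```stata
sysuse auto, clear

* inlist() versus the equivalent chain of | conditions
count if inlist(rep78, 1, 3, 4)
local n_inlist = r(N)
count if rep78 == 1 | rep78 == 3 | rep78 == 4
assert r(N) == `n_inlist'

* inrange(x, a, .) keeps non-missing values of x that are >= a
count if inrange(rep78, 4, .)
local n_inrange = r(N)
count if rep78 >= 4 & !missing(rep78)
assert r(N) == `n_inrange'
```

The assert lines stop with an error if the two counts differ, so a silent run suggests that the paired conditions select the same records.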
Prefixes “by” and “bysort”
• by executes a command once for each level of a variable
(it does not work unless the data are sorted first).
sort varlist
by varlist: command …
• bysort sorts the data and runs by in a single step.
bysort varlist: command …
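A short sketch of the prefix in use, assuming the built-in auto dataset, where foreign plays the role of the grouping variable:

```stata
sysuse auto, clear

* Two steps: sort first, then run the command once per group
sort foreign
by foreign: summarize mpg

* One step: bysort sorts and then applies the by prefix
bysort foreign: summarize mpg if inrange(price, 4000, 6000), detail
```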
Using Stata to calculate something
display …
• Examples:
display 23+25+29+30
di sqrt(400) * (12^3)
di ln(2.7182818)
di invnormal(0.975)
di normal(1.96)
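display can also combine text, formats, and stored values. The sketch below computes a normal-based 95% confidence interval; the mean of 50 and standard error of 2 are hypothetical numbers chosen only for illustration.

```stata
* Hypothetical summary values, for illustration only
local m  = 50                     // sample mean
local se = 2                      // standard error
local z  = invnormal(0.975)       // critical value for a 95% interval

* %5.3f and %5.2f control the number of decimals shown
display "z = " %5.3f `z'
display "95% CI: " %5.2f (`m' - `z'*`se') " to " %5.2f (`m' + `z'*`se')
```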
Data Editor (Edit/Browse)
Showing description of data
describe varlist
codebook varlist
• If no variables are specified, all variables in the
dataset are described.
• NB: The type of data (categorical vs. continuous) and the type
of variable (string vs. numeric) do not always match.
• Categorical data may be stored in a numeric variable.
• Continuous data may be stored in a string variable.
Displaying values of specified variables
list varlist
list hn age sex
Missing values in Stata
• Stata commands use . for missing values in numeric
variables
• Missing values are treated as the largest values:
• All non-missing values < missing values
• Different types of missing values can be specified
by adding a letter (a-z) after . (27 types in total)
. < .a < .b < .c < … < .z
• To show variables with missing values
misstable summarize
• For missing values in string variables, use ""
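Because missing values sort above every number, a condition such as x > 3 also selects missing observations. A minimal sketch of this pitfall, assuming the built-in auto dataset, in which rep78 has some missing values:

```stata
sysuse auto, clear
misstable summarize

* This count includes observations where rep78 is missing
count if rep78 > 3
local n_all = r(N)

* Exclude missing values explicitly
count if rep78 > 3 & !missing(rep78)
local n_valid = r(N)

count if missing(rep78)
local n_miss = r(N)

* The difference between the two counts is the number of missing values
assert `n_all' == `n_valid' + `n_miss'
```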
Data management commands in Stata
Labeling
• Labeling variables
• Labeling values of data
Labeling variables
label variable varname "label"
• Output that reports labeled variables looks better and is
easier for other people to understand
Value labels
• In Stata, integer variable values can be labeled as
categories
• For example, you can label variable sex
• 1 as Male
• 2 as Female
• Why should the values be labeled?
• Output will provide labels instead of just numeric value
• This aids in interpretation and formatting of results
Labeling values of data
label define labelname # "label" # "label" …
label values varlist labelname
label list
numlabel, add
numlabel, remove
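A worked sketch of the labeling commands on a tiny dataset typed in with input. The data and the label name sexlbl are hypothetical; the coding 1 = Male, 2 = Female follows the handout's example.

```stata
clear
input id sex
1 1
2 2
3 1
end

label variable sex "Sex of participant"

* Define the value label, then attach it to the variable
label define sexlbl 1 "Male" 2 "Female"
label values sex sexlbl
label list sexlbl
tabulate sex

* Prefix each label with its numeric code, then undo that
numlabel sexlbl, add
tabulate sex
numlabel sexlbl, remove
```

A label created with label define has no effect on output until label values attaches it to a variable.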
Categorizing continuous variables
1. Generate a new variable based upon conditions
2. A quick command for categorizing numerical data
1. Generate a new variable based on conditions
• Example: From guidelines, timing of iodine
administration after surgery should be within 42 days
1.1. Generate a new variable
generate newvar = value if …
1.2. Replace values of the variable
• Because generate cannot create a variable whose name
already exists, use the replace command instead
replace var = value if …
• Repeat this as many times as needed
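Steps 1.1 and 1.2 can be sketched with the 42-day iodine example. The data, the variable names days_to_iodine and timely, and the label name timelylbl are hypothetical.

```stata
clear
input id days_to_iodine
1 30
2 42
3 60
4 .
end

* 1.1 Create the new variable for the first category
generate timely = 1 if days_to_iodine <= 42

* 1.2 Fill in the next category; exclude missing, which would satisfy > 42
replace timely = 0 if days_to_iodine > 42 & !missing(days_to_iodine)

label define timelylbl 0 "Later than 42 days" 1 "Within 42 days"
label values timely timelylbl
tabulate timely, missing
```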
2. Quick command for categorizing data
• recode can perform all the previous steps, plus
apply value labels, in a single command line
recode var (rule=value "label") (rule=value "label") …, generate(newvar)
• Example:
recode rep (min/3=1 "Below average") (4 5=2 "Average") (6/max=3 "Above average"), gen(newrep)
Recoding variables
• Examples of rules
Rule      Meaning
1 2       1 or 2
0/10      0 through 10
.         Missing
11/max    11 through maximum
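A runnable sketch of recode, assuming the built-in auto dataset, whose rep78 variable runs from 1 to 5. The cut-points and the new variable name rep_cat are chosen for illustration only.

```stata
sysuse auto, clear

* One command creates the categories and their value labels;
* /// continues the command on the next line of a do-file
recode rep78 (min/2 = 1 "Below average") ///
             (3     = 2 "Average")       ///
             (4/max = 3 "Above average"), generate(rep_cat)

* Cross-tabulate old against new to check the recoding
tabulate rep78 rep_cat, missing
```

Here max refers to the largest non-missing value, so missing values of rep78 stay missing in rep_cat.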
Variables management commands
• Renaming variables
rename oldvar newvar
rename (oldvarlist) (newvarlist)
• Removing variables
drop varlist
keep varlist
• Removing observations
drop if …
keep if …
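A brief sketch of these commands, assuming the built-in auto dataset. The new variable names are hypothetical.

```stata
sysuse auto, clear

* Rename one variable, then several at once
rename mpg miles_per_gallon
rename (price weight) (price_usd weight_lb)

* drop removes the listed variables; keep removes everything else
drop headroom trunk
keep make price_usd miles_per_gallon foreign

* With if, drop and keep act on observations instead of variables
keep if foreign == 1
describe
```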
Changing variable types
• Converting string variables with numeric characters
into numeric variables
destring varlist, replace
destring varlist, generate(newvarlist)
destring varlist, generate(newvarlist) force
(force option: values containing non-numeric characters are converted to missing)
• Converting numeric variables into string variables with
numeric characters
tostring varlist, generate(newvarlist)
Changing variable types
• Creating a numeric variable with value labels from a
string variable
encode var, generate(newvar)
• Creating a string variable from a numeric variable with
value labels
decode var, generate(newvar)
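The four conversion commands can be tried on a small hypothetical dataset in which age was entered as text and one entry is not a number:

```stata
clear
input str6 age_str str6 sex_str
"34"  "Male"
"51"  "Female"
"n/a" "Male"
end

* String to numeric; with force, the non-numeric entry becomes missing
destring age_str, generate(age) force

* Numeric back to a string of digits
tostring age, generate(age_txt)

* String categories to a labeled numeric variable (codes follow alphabetical order)
encode sex_str, generate(sex)

* Labeled numeric variable back to a string
decode sex, generate(sex_txt)

list, nolabel
```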
Dataset management
save filename
use filename
import excel …
cd
merge
append
reshape
Saving data temporarily
• Sometimes you would like to preserve a dataset
before manipulation
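The text above does not name a command for this. One built-in option is the preserve and restore pair, sketched here on the built-in auto dataset:

```stata
sysuse auto, clear

* Keep a temporary copy of the data as they are now
preserve

* Work on a subset
keep if foreign == 1
summarize price

* Bring back the full dataset
restore
count
```

When preserve is used inside a do-file, the data are restored automatically when the do-file ends, even without an explicit restore.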
Snapshotting data
• If you would like to preserve multiple versions of the
dataset, use the snapshot command
snapshot save
snapshot restore snapshot#
snapshot list
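A minimal sketch of the snapshot commands, assuming the built-in auto dataset. The label text is hypothetical, and the snapshot number is read from r(snapshot) instead of being typed by hand.

```stata
sysuse auto, clear

* Save a snapshot and remember its number
snapshot save, label("full data")
local snap = r(snapshot)

* Change the data, then review the saved snapshots
drop if missing(rep78)
snapshot list

* Return to the saved version and delete the snapshot
snapshot restore `snap'
snapshot erase `snap'
```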
Saving commands for further use
Stata’s .do file
• Do-file: a file containing a list of Stata commands
• Useful for large projects that may require repeated analysis
Saving commands from History window
• Select commands in
History window
• Right-click and choose
“Send to Do-file Editor”
Executing commands in Do-file editor
• Select the command line(s) you want to execute
• Menu > Tools > Execute (do), or click the Execute (do)
toolbar icon, or press Ctrl+D
Tips & tricks: special commands often used in do-files
• To run commands with long results without pausing
set more off
• When importing or opening a data file, add the “clear” option
to clear the previous dataset from memory
• Write comments after // (on the same line as the command or
on a new line) or after * (only at the start of a new line)
• Join lines with ///
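These tips can be combined in one short do-file. A minimal sketch, assuming the built-in auto dataset; the regression is included only to show a command continued over two lines.

```stata
* example.do: a hypothetical do-file
set more off                  // do not pause while long output scrolls

sysuse auto, clear            // clear replaces the dataset in memory

* A comment on its own line
summarize price mpg           // a comment after a command

regress price mpg weight ///
    foreign                   // one command continued over two lines
```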
Thank you for your attention