
Introduction To Stata 2024-06-18 Handout


Introduction to Stata

Assoc. Prof. Kunlawat Thadanipon, MD, MSc

Department of Clinical Epidemiology and Biostatistics


Faculty of Medicine Ramathibodi Hospital
Mahidol University

18th July 2024


Content
• Orientation to Stata
• Basic functions and commands in Stata

5 Basic windows
1. Command
2. Results
3. History
4. Variables
5. Properties
(See “Window” menu)

Setting up a log file in Stata
• A log file saves the commands you type and the results displayed, for later use or review
• Your personal notes are also saved in the log file: begin the line with *
• Two types of log file:
• .smcl (Stata Markup and Control Language): looks nice in Stata (command lines in bold), but its formatting tags make it hard to read in other text editors
• .log: plain text that can be opened in any text editor
• Menu > File > Log > Begin (or through the “Log” icon)
Setting up a log file in Stata
• To save and close the log file
log close
• Or through the GUI (which can also be used to view the log file while it is still open)
• Menu > File > Log
• “Log” icon
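These steps can also be typed as commands, which is handy at the top of a do-file. A minimal sketch, assuming the working directory is writable; the file name session1.log is only an example, and sysuse auto loads the example dataset that ships with Stata:

```stata
* Close any log left open from earlier (capture ignores the error if none is open)
capture log close

* Open a plain-text log; replace overwrites a file of the same name (session1 is hypothetical)
log using session1.log, replace text

* Everything below, including this note, is recorded in the log
sysuse auto, clear
summarize price

* Save and close the log file
log close
```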

Stata data files
• Stata holds one dataset in memory at a time (Stata 16 and later can hold additional datasets in separate frames)
• Changes to the data are not saved automatically; you have to save them yourself
• However, correcting data values only in the analysis software is not recommended; corrections should be made in the source data during data entry, not during analysis
• Stata data files have the .dta extension
• Other types of data files (e.g., .xls, .txt, and .sav) can also be imported

Importing data from Excel
• Menu > File > Import
• Select “Import first row as variable names”
• A range of Excel cells to import can also be specified
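The same import can be done from the command line. This sketch first writes a small workbook from the built-in auto dataset so that it runs on its own; the file name auto_demo.xlsx is hypothetical. The firstrow option corresponds to “Import first row as variable names”, and cellrange() selects the block of cells to read:

```stata
* Build a small example workbook (auto_demo.xlsx is a hypothetical file name)
sysuse auto, clear
export excel make price mpg using "auto_demo.xlsx", firstrow(variables) replace

* Import only cells A1:C11: row 1 holds the variable names, rows 2 to 11 the data
import excel using "auto_demo.xlsx", cellrange(A1:C11) firstrow clear
describe
```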

Executing commands in Stata
1. “Command” window
2. .do file
3. Graphical user interface (GUI) menu and
dialog boxes, available for most commands
• To invoke a dialog box
db command
e.g., db summarize

GUI versus command line
Whenever you run a command through the GUI, Stata automatically converts it into a full (unabbreviated) command line, shows it in the Results and History windows, and executes it.

Command lines in Stata
• A command line may contain prefixes, commands, variables, conditions (following if or in), and options.
• Note the distinction between them.
• Variable names can be typed in full, typed partially (with or without Tab-key completion), or selected from the Variables window.
• Abbreviations are allowed and save typing.
• Command lines are sensitive to case and punctuation, but rarely to spaces.

General structure
bysort sex: summarize age if (age >= 30) & (age < 45), detail
• Prefix: bysort sex:
• Command: summarize
• Data (variable[s]): age
• Condition: if (age >= 30) & (age < 45)
• Option: detail

Abbreviation (the two command lines below are equivalent):
list id age education if school==1 in 1/5, nolabel clean
l id age edu if sch==1 in 1/5, nol clean
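A runnable version of this structure, using the auto example dataset that ships with Stata (sysuse auto) instead of the course data; the variable choices are illustrative. Both command lines do the same thing once the abbreviations are expanded (bys = bysort, su = summarize, pri = price, d = detail):

```stata
sysuse auto, clear

* Full form: prefix, command, variable, condition, option
bysort foreign: summarize price if (mpg >= 20) & (mpg < 30), detail

* Abbreviated form of the same command line
bys foreign: su pri if (mpg >= 20) & (mpg < 30), d
```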

Displaying help in Stata
help command

• The help page also shows the shortest allowed abbreviation of the command and its options (the underlined part of each name).

e.g., h summarize

Executing a command on specific observations
• Applying condition(s) based on values of variable(s)
command … if condition(s)
• Restricting to a range of observations
command … in range
(e.g., in 1/5 means records 1 through 5; here “/” means “through”)
• if and in may be used together (but rarely)
• Values of string variables are both case-sensitive and
space-sensitive, and must be wrapped in "…"
list id age if sex=="Male" & school==1 in 1/5
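A sketch with the built-in auto dataset, whose string variable make holds car names such as "Honda Civic". count reports how many records meet a condition, which makes the case sensitivity of string comparisons easy to see:

```stata
sysuse auto, clear

* Condition on a numeric variable, restricted to the first 10 records
list make price if foreign == 0 in 1/10

* String values must be in double quotes and match case and spaces exactly
count if make == "Honda Civic"   // matches 1 record
count if make == "honda civic"   // matches 0 records: the case differs
```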

Conditional expressions
• Relational operators

Sign Meaning
> Greater than
< Less than
>= Greater than or equal to
<= Less than or equal to
== Equal to
!= Not equal to
~= Not equal to

Conditional expressions
• Logical operators

Sign Meaning
& And
| Or
! Not
~ Not
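The operators can be tried out with count on the built-in auto dataset (a sketch; the thresholds are arbitrary). Note that == (two equals signs) tests equality, whereas a single = assigns values in generate and replace:

```stata
sysuse auto, clear

* != and ~= are synonyms: both counts are identical
count if rep78 != 3
local n1 = r(N)
count if rep78 ~= 3
local n2 = r(N)
display `n1' == `n2'    // 1 means true

* Relational operators combined with & (and), | (or), and ! (not)
count if (price > 10000) & (foreign == 1)
count if (mpg < 15) | (mpg > 30)
count if !(foreign == 1)
```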

Conditional expressions
inlist(var, value, value, …)
• The condition is met when the variable (either numeric or string [with "…"]) has a value that matches any value in the list.
• Example:
command … if inlist(histo, 1, 3, 4)
Selects all records in which histo equals 1, 3, or 4, i.e., it is identical to:
command … if histo==1 | histo==3 | histo==4
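A sketch on the built-in auto dataset (rep78 is its repair-record variable, coded 1 to 5), showing that inlist() gives the same count as the chained | conditions and that it also accepts string values:

```stata
sysuse auto, clear

* inlist() with numeric values
count if inlist(rep78, 1, 2)
count if rep78 == 1 | rep78 == 2      // same result as the line above

* inlist() with string values (wrapped in double quotes)
count if inlist(make, "Honda Accord", "Honda Civic")
```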
Conditional expressions
inrange(var, value, value)
• The condition is met when the variable (only numeric) has a value that falls within the specified range, including both endpoints.
• Examples:
command … if inrange(histo, 1, 3)
• All records with histo from 1 to 3 (inclusive)
command … if inrange(age, 60, .)
• All records with non-missing age of ≥ 60
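A sketch on the built-in auto dataset; the two mpg conditions select the same records, and an upper limit of . leaves the range open at the top without picking up missing values:

```stata
sysuse auto, clear

* Records with mpg from 20 to 25, both endpoints included
count if inrange(mpg, 20, 25)
count if mpg >= 20 & mpg <= 25         // same result

* An upper limit of . means "no upper limit"; missing rep78 is not selected
count if inrange(rep78, 4, .)          // 29 records: rep78 is 4 or 5
```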

Prefixes “by” and “bysort”
• Use the by prefix to execute a command once for each level of a variable; the data must be sorted by that variable first (otherwise by stops with a "not sorted" error).
sort varlist
by varlist: command …

• These two command lines can be combined into one:
bysort varlist: command …
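Both forms, sketched on the built-in auto dataset (the variable choices are illustrative). After bysort the data stay sorted by the grouping variable, with missing values placed last:

```stata
sysuse auto, clear

* Two-step version: sort first, then use by
sort foreign
by foreign: summarize price

* One-step version: bysort sorts and then runs the command for each group
bysort rep78: summarize mpg
```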

Using Stata as a calculator
display …

• Examples:
display 23+25+29+30
di sqrt(400) * (12^3)
di ln(2.7182818)
di invnormal(0.975)
di normal(1.96)
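display can chain these functions into a small calculation. The sketch below computes a 95% confidence interval for a mean from hypothetical summary figures (mean 120, SD 15, n = 100), with invnormal(0.975) as the multiplier; the local macro names are illustrative:

```stata
* Hypothetical summary statistics
local mean = 120
local sd   = 15
local n    = 100

* Standard error and 95% confidence limits
local se = `sd' / sqrt(`n')
display "SE = " `se'
display "95% CI: " `mean' - invnormal(0.975)*`se' " to " `mean' + invnormal(0.975)*`se'
```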
Data Editor (Edit/Browse)

Showing description of data
describe varlist
codebook varlist
• If no variables are specified, all variables in the dataset are described.
• Note that the type of data (categorical vs. continuous) and the storage type of the variable (string vs. numeric) do not necessarily match:
• Categorical data may be stored in a numeric variable.
• Continuous data may also be stored in a string variable.
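A sketch on the built-in auto dataset, where make is a string variable and foreign is categorical data (Domestic/Foreign) stored as a numeric variable with a value label:

```stata
sysuse auto, clear

* Storage type, display format, and value label of selected variables
describe make price foreign

* foreign: categorical data stored as a numeric variable with value labels
codebook foreign
```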

Displaying values of specified variables
list varlist

• Useful for exploring the data, because the listed values are recorded in the log file (unlike browsing in the Data Editor)
• May be used with “if”, “in”, or both.
• A number of options are also available (see help list).

list hn age sex

list hn age sex if revision==1

list hn age sex if (age>50) & (age<53) in 1/5, nolabel clean

Missing values in Stata
• Stata uses . (a dot) to denote missing values in numeric variables
• Missing values are treated as larger than any number:
• All non-missing values < missing values
• Hence a condition such as if age > 60 also selects records with missing age, unless you add & !missing(age)
• Extended missing values can be specified by adding a letter (a to z) after the dot (27 missing-value codes in total):
. < .a < .b < .c < … < .z
• To show which variables have missing values:
misstable summarize
• Missing values in string variables are empty strings: ""
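A sketch on the built-in auto dataset, in which rep78 has 5 missing values. Because missing counts as larger than any number, a condition such as > 4 picks up the missing records unless they are excluded with !missing():

```stata
sysuse auto, clear

* Which variables have missing values? (rep78 has 5 in this dataset)
misstable summarize

* Missing is larger than any number, so it satisfies "> 4"
count if rep78 > 4                         // 11 records with 5, plus 5 missing
count if rep78 > 4 & !missing(rep78)       // only the 11 records with rep78 = 5
```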

Data management commands in Stata

Labeling
• Labeling variables
• Labeling values of data

(Both can also be done conveniently through the GUI)

Labeling variables
label variable varname "label"
• Labeled variables make reports look better and easier for other people to understand
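A sketch on the built-in auto dataset; the label text is only an example:

```stata
sysuse auto, clear

* Attach a descriptive label (example text) to the variable mpg
label variable mpg "Fuel economy (miles per gallon)"
describe mpg
```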

Value labels
• In Stata, the integer values of a numeric variable can be labeled as categories
• For example, for the variable sex, you can label
• 1 as Male
• 2 as Female
• Why should the values be labeled?
• Output displays the labels instead of just the numeric codes
• This aids in interpreting and formatting the results

Labeling values of data
label define labelname # "label" # "label" …

label values varlist labelname

label list

• To show or hide the numeric code in front of each label (e.g., “1. Male”):
numlabel, add
numlabel, remove
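A sketch of the full workflow on the built-in auto dataset; heavy and heavylbl are hypothetical names, and the 3,000 lb cut-off is arbitrary:

```stata
sysuse auto, clear

* Hypothetical 0/1 variable to label
generate byte heavy = (weight > 3000) if !missing(weight)

* Define the value labels, then attach them to the variable
label define heavylbl 0 "No" 1 "Yes"
label values heavy heavylbl
label list heavylbl
tabulate heavy

* Show the numeric codes in front of the labels, then remove them again
numlabel heavylbl, add
tabulate heavy
numlabel heavylbl, remove
```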

Categorizing continuous variables
1. Generate a new variable based upon conditions
2. A quick command for categorizing numerical data

1. Generate a new variable based on conditions
• Example: according to guidelines, iodine should be administered within 42 days after surgery
• datediff is the variable holding the number of days (from surgery to iodine administration)
• Categorize it into 2 groups in a new categorical variable:
• 1: patients who received iodine within 42 days
• 2: patients who received iodine later than 42 days

1.1. Generate a new variable
generate newvar = value if …

• Stata examines the condition in each record and assigns the specified value to the new variable only where the condition is met; otherwise, a missing value is assigned
• The inlist() and inrange() functions can be useful here

1.2. Replace values of the variable
• Because generate cannot create a variable whose name already exists, use the replace command to assign the remaining values
• Repeat this as many times as needed

replace var = value if …
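The two steps applied to the iodine example, as a sketch on four made-up records; the variable name timing42 is hypothetical, and “within 42 days” is assumed to include day 42. The !missing() guard matters because a missing datediff counts as larger than 42:

```stata
* Four made-up records (record 4 has a missing datediff)
clear
input id datediff
1 30
2 42
3 55
4 .
end

* Step 1.1: code 1 where iodine was given within 42 days (timing42 is hypothetical)
generate byte timing42 = 1 if datediff <= 42

* Step 1.2: code 2 where it was given later; a missing datediff stays missing
replace timing42 = 2 if datediff > 42 & !missing(datediff)

list
```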

2. Quick command for categorizing data
• recode can perform all of the previous steps, and also apply value labels, in a single command line
recode var (rule=value "label") (rule=value "label") …, generate(newvar)

• Example:
recode rep (min/3=1 "Below average") (4 5=2 Average) (6/max=3 "Above average"), gen(newrep)

Recoding variables
• Examples of rules

Rule      Meaning
1 2       1 or 2
0/10      0 through 10
.         Missing
11/max    11 through maximum
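A sketch applying these rule types to rep78 (repair record, coded 1 to 5) in the built-in auto dataset; rep3 is a hypothetical name. Values not covered by any rule, here the missing values, are copied unchanged into the new variable, and /// continues the command on the next line in a do-file:

```stata
sysuse auto, clear

* 1 and 2 -> 1, 3 -> 2, 4 through max -> 3, each with a value label (rep3 is hypothetical)
recode rep78 (1 2 = 1 "Below average") (3 = 2 "Average") ///
    (4/max = 3 "Above average"), generate(rep3)

tabulate rep78 rep3, missing
```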

Variables management commands
• Renaming variables
rename oldvar newvar
rename (oldvarlist) (newvarlist)

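• Removing variables (drop deletes the listed variables; keep retains only the listed variables)
drop varlist
keep varlist

• Removing observations (drop if deletes the records meeting the condition; keep if retains only those records)
drop if …
keep if …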
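A sketch on the built-in auto dataset; the new names (repair, cost, fuel) are arbitrary:

```stata
sysuse auto, clear

* Rename one variable, then several at once (new names are arbitrary)
rename rep78 repair
rename (price mpg) (cost fuel)

* Remove observations: drop records with a missing repair record
drop if missing(repair)

* Remove variables: keep only the listed ones
keep make cost fuel repair
describe, short
```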

Changing variable types
• Converting string variables containing numbers into numeric variables
destring varlist, replace
destring varlist, generate(newvarlist)
destring varlist, generate(newvarlist) force
(force option: values containing non-numeric characters are converted to missing)
• Converting numeric variables into string variables with
numeric characters
tostring varlist, generate(newvarlist)
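A sketch on three made-up records (age_str, age, and age_s are hypothetical names), showing how force handles an entry that is not a number:

```stata
* Three made-up records stored as strings
clear
input str5 age_str
"25"
"31"
"n/a"
end

* force converts entries that are not numbers ("n/a") to missing values
destring age_str, generate(age) force

* Back to a string variable
tostring age, generate(age_s)
list
```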

Changing variable types
• Creating a numeric variable with value labels from a
string variable
encode var, generate(newvar)
• Creating a string variable from a numeric variable with
value labels
decode var, generate(newvar)
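A sketch on three made-up records; sex_n and sex_s are hypothetical names. encode numbers the categories in alphabetical order of the strings and stores the strings as the value label (named after the new variable by default):

```stata
* Three made-up records
clear
input str6 sex
"Male"
"Female"
"Male"
end

* encode assigns codes in alphabetical order: Female = 1, Male = 2
encode sex, generate(sex_n)
label list sex_n

* decode turns the value labels back into string values
decode sex_n, generate(sex_s)
list
```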

Dataset management
save filename
use filename
import excel …
cd
merge
append
reshape
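A sketch of save, use, merge, and append with pieces cut from the built-in auto dataset. The file names prices, mileage, and domestic are hypothetical and are written to the current working directory, which pwd displays (cd changes it). make uniquely identifies each car, so it serves as the merge key:

```stata
pwd

* Save two pieces of the same dataset (hypothetical file names)
sysuse auto, clear
keep make price
save prices, replace

sysuse auto, clear
keep make mpg
save mileage, replace

* merge: add variables (mpg) matched on the key variable make
use prices, clear
merge 1:1 make using mileage
tabulate _merge
drop _merge

* append: add observations (stack records) from another file
sysuse auto, clear
keep if foreign == 0
save domestic, replace
sysuse auto, clear
keep if foreign == 1
append using domestic
```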

Saving data temporarily
• Sometimes you would like to keep a copy of the dataset before manipulating it

• preserve: saves a copy of the dataset in temporary memory

• restore: brings back the preserved dataset

• Only one preserved copy can exist at a time
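A sketch on the built-in auto dataset: the data are temporarily cut down to foreign cars and then restored.

```stata
sysuse auto, clear

preserve                     // keep a temporary copy of all 74 records
keep if foreign == 1         // work temporarily with foreign cars only
summarize price
restore                      // bring back the full dataset
```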

Snapshotting data
• To keep multiple versions of the dataset, use the snapshot command

snapshot save
snapshot restore #
snapshot list
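A sketch on the built-in auto dataset. It assumes, as described in help snapshot, that snapshot save leaves the number of the new snapshot in r(snapshot); the labels and the local macro name full are illustrative:

```stata
sysuse auto, clear

* Save a snapshot and remember its number (assumed to be stored in r(snapshot))
snapshot save, label("full auto data")
local full = r(snapshot)

* Change the data and save a second version
drop if foreign == 1
snapshot save, label("domestic only")

snapshot list

* Go back to the first version
snapshot restore `full'
```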

Saving commands for further use

Stata’s .do file
• Do-file: a text file containing a list of Stata commands
• Useful for large projects that may require repeated analysis

• Starting a new empty do-file:


• Window > Do-file Editor > New Do-file Editor
(or through the “Do-file” icon)

Saving commands from History window
• Select commands in
History window
• Right-click and choose
“Send to Do-file Editor”

Executing commands in Do-file editor
• Select the command line(s) you want to execute (if nothing is selected, the whole do-file is executed)
• Menu > Tools > Execute (do), click the Execute (do) icon, or press Ctrl+D

Tips & tricks: special commands often used in do-files
• To run commands with long output without pausing:
set more off
(in Stata 16 and later, more is already off by default)
• When importing or opening a data file, add the option clear to replace the dataset currently in memory
• Comments: after // (on the same line as a command or on a line of its own) or after * (only at the start of a line)
• Continue a long command onto the next line by ending the line with ///
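A short do-file skeleton that puts these tips together, as a sketch; the dataset and variables come from the built-in auto example data and are only illustrative:

```stata
* Example do-file skeleton (dataset and variables are illustrative)
set more off                 // do not pause when output fills the screen

sysuse auto, clear           // clear replaces any dataset already in memory

// A comment on a line of its own
summarize price              // a comment after a command

* /// continues a long command on the next line
summarize price mpg ///
    weight length
```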
Thank you for your attention
