Data Analysis with Stata: Creating a Working Dataset
Gumilang Aryo Sahadewo
October 9, 2017
MEP FEB UGM
povertyactionlab.org 2
Basic workflow in Stata
Stata User Interface
Interactive Menu
Command
Do file editor
[Diagram: Basic workflow in Stata. Labels: user; functions (imported, web download); interactive menu, Command window, and do-file editor, each leading to results; data and variables.]
Please use do files
Creating a do file
Give each do file a title
State the author
Track the dates
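The three habits above can be sketched as a do-file header. This is a minimal template, not a required layout: the title, author name, and dates are placeholders, and the version number is an assumption that should match the Stata release you actually run.

```stata
* Title:   Creating a working dataset (placeholder title)
* Author:  Your Name (placeholder)
* Created: 2017-10-09
* Updated: 2017-10-09  (change this line each time you revise the file)

version 15          // assumed Stata version; set to the version you run
clear all           // start from an empty session
set more off        // do not pause output at each screen
```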
Commenting
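As a short illustration of the comment styles Stata accepts in a do file, the sketch below uses the auto dataset that ships with Stata (an assumption made only for the example). Note that // and /// are meant for do files and do not work when typed in the Command window.

```stata
* A line starting with an asterisk is a comment
sysuse auto, clear      // text after two slashes is also a comment

/* A block comment
   can span several lines */

* Three slashes continue a long command on the next line
summarize price mpg ///
    weight length
```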
Using the Help Viewer
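The Help Viewer can also be opened from the command line. A minimal sketch, with summarize and reshape chosen only as example topics:

```stata
* Open the help file for a command you already know by name
help summarize

* Search help files and other documentation by keyword
search reshape
```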
SSC archive
Stata users develop commands and store them in the SSC archive
You need to be connected to the internet to use this feature
Let's install wbopendata:
ssc describe wbopendata
ssc install wbopendata
To see newly added SSC packages:
ssc new
To see trending SSC packages:
ssc hot
Updating SSC packages
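One way to keep installed packages current is the adoupdate command, sketched below. It needs an internet connection. The command name reflects Stata releases around the date of these slides; newer releases also accept the spelling ado update.

```stata
* List community-contributed packages installed on this machine
ado dir

* Check which installed packages have newer versions (nothing is changed)
adoupdate

* Download and install the available updates
adoupdate, update
```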
Importing dataset to Stata
Setting a directory
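A minimal sketch of setting the working directory. The local macro project is a name introduced here for illustration; it is filled with the current directory so the example runs anywhere, and you would replace it with the path to your own project folder.

```stata
* Show the current working directory
pwd

* Placeholder: replace the right-hand side with your own project folder
local project "`c(pwd)'"

* Change to the project folder and confirm the change
cd "`project'"
pwd
```

Run the lines together in one do file, because a local macro disappears once the run ends.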
Types of files
Stata: .dta
Excel: .xls, .xlsx, .csv
ASCII: .csv, .dat, .txt
Once you import the data, make sure to save it in Stata format:
save filename.dta, replace
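The import-then-save step can be sketched as follows. To stay self-contained, the example first writes a CSV file from the auto dataset that ships with Stata; the file names auto_example.csv and auto_example.dta are hypothetical stand-ins for your own files.

```stata
* Build a small CSV file to import (stands in for a file you received)
sysuse auto, clear
export delimited using "auto_example.csv", replace

* Import the delimited text file, then save it in Stata format
import delimited using "auto_example.csv", clear
save "auto_example.dta", replace

* Reload the saved Stata dataset and check its contents
use "auto_example.dta", clear
describe, short

* For an Excel workbook the analogous command is import excel, for example:
* import excel using "yourfile.xlsx", firstrow clear   (hypothetical file)
```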
Importing data from a public source
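As one possible illustration, the wbopendata package from SSC downloads World Bank indicators directly into Stata. The sketch below assumes the package is installed and an internet connection is available; the indicator (GDP per capita in current US dollars) and the output file name are choices made for this example, not part of the original slides.

```stata
* Requires: ssc install wbopendata, plus an internet connection
* NY.GDP.PCAP.CD is the World Bank code for GDP per capita (current US$)
wbopendata, indicator(NY.GDP.PCAP.CD) long clear

* Inspect what was downloaded, then keep a local copy in Stata format
describe
save "wb_gdp_per_capita.dta", replace
```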
Wide vs long dataset
Reshaping: wide to long and vice versa
We can use the built-in reshape command to convert the data from wide to long or
vice versa.
Suppose that we have data in long format and want wide:
reshape wide varlist, i(logical observations) j(subobservations)
Suppose that we have data in wide format and want long:
reshape long varlist, i(logical observations) j(subobservations)
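A small worked example may help. The toy data below are made up for illustration: id identifies the logical observation, and the year suffix on income becomes the subobservation variable.

```stata
* Toy wide dataset: one row per person, one income column per year
clear
input id income2015 income2016
1 100 110
2 200 210
end

* Wide to long: one row per person-year, with a new variable called year
reshape long income, i(id) j(year)
list, sepby(id)

* Long back to wide: restores income2015 and income2016
reshape wide income, i(id) j(year)
list
```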
Cleaning the dataset
Label and missing values
Labels
Labels are very useful for users.
Consider the educ variable in the dataset.
Instead of seeing 0, 1, 2, 3, ..., users will see:
No schooling
DNF elementary (did not finish elementary school)
SD (elementary school)
SMP (junior secondary school)
etc.
Missing values
Often, we are going to deal with missing values.
Missing values can be driven by:
Non-response
Non-valid response
No response required
etc.
It is good to document the reason for missingness.
There is a subdiscipline in statistics and econometrics dedicated to missing data.
Stata provides several missing-value codes that we can use:
.
.a
.b
.c
etc.
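Value labels and extended missing values can be combined, as in the sketch below. The data are made up, and the raw codes -8 and -9 for the two kinds of missingness are assumptions for illustration; your survey's codebook determines the real ones.

```stata
* Toy data: educ coded 0-3, with -8 and -9 as assumed raw missing codes
clear
input id educ
1 0
2 1
3 2
4 3
5 -8
6 -9
end

* Attach readable labels to the numeric codes
label define educ_lbl 0 "No schooling" 1 "DNF elementary" 2 "SD" 3 "SMP"
label values educ educ_lbl

* Recode each reason for missingness to its own extended missing value
replace educ = .a if educ == -8
replace educ = .b if educ == -9

* Document the reason by labeling the missing codes as well
label define educ_lbl .a "Non-response" .b "No response required", add

tabulate educ, missing
count if missing(educ)
```

Keeping a separate code per reason preserves why each value is missing, while missing(educ) still treats all of them as missing.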
Summarizing variables: descriptive statistics
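A minimal sketch of descriptive statistics with summarize, using the auto dataset that ships with Stata as a stand-in for your working dataset:

```stata
sysuse auto, clear

* Observations, mean, standard deviation, minimum, and maximum
summarize price mpg weight

* Add percentiles, skewness, and kurtosis for one variable
summarize price, detail

* Results are stored in r() for later use, for example the median
display r(p50)
```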
One-way tabulation
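A one-way frequency table can be sketched with tabulate. The example again assumes the auto dataset that ships with Stata; its rep78 variable contains some missing values, which the missing option brings into the table.

```stata
sysuse auto, clear

* Frequencies and percentages of rep78, excluding missing values
tabulate rep78

* The same table with missing values shown as their own row
tabulate rep78, missing
```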
Creating summary statistics
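One built-in way to produce a compact table of summary statistics is tabstat, sketched below with the auto dataset that ships with Stata. This is an illustrative choice; the original slide may have used a different command.

```stata
sysuse auto, clear

* One row per variable, one column per statistic
tabstat price mpg weight, statistics(n mean sd min max) columns(statistics)

* The same statistics split by group
tabstat price mpg weight, by(foreign) statistics(mean sd)
```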
Log files
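A log file records the commands you run and their output. A minimal sketch, where analysis_log.txt is a hypothetical file name and the auto dataset stands in for your own data:

```stata
* Close any log left open by an earlier run (ignored if none is open)
capture log close

* Start a plain-text log, overwriting an older file with the same name
log using "analysis_log.txt", text replace

sysuse auto, clear
summarize price mpg

* Stop recording and close the file
log close
```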
Thank you