
Data Analysis With Stata: Creating A Working Dataset: Gumilang Aryo Sahadewo October 9, 2017 Mep Feb Ugm

The document discusses using Stata for data analysis. It describes Stata's popularity and capabilities for statistical analysis. It emphasizes the importance of using do files for reproducibility and documentation when working with Stata. Basic workflows, interfaces, and commands for importing data, cleaning variables, and summarizing statistics are covered.


Data Analysis with Stata: Creating a Working Dataset
Gumilang Aryo Sahadewo
October 9, 2017
MEP FEB UGM
Why Stata?

It is widely used in government, academia, and research institutions
The software:
Quite intuitive
Results are presented clearly
Handles large datasets
First known for cross-section and panel data, it now offers packages for time-series analysis
Better graphical capabilities in recent versions
Offers built-in packages for a wide range of statistical analyses
Third-party development of packages
Annual update of the software
Cross-platform compatibility

povertyactionlab.org 2
Basic workflow in Stata

Stata User Interface

[Figure: screenshot of the Stata user interface, labelled: Interactive Menu, Command, Do file editor]

Basic workflow in Stata

[Figure: workflow diagram. Labels: user functions (imported, web download); Interactive, Command, Do file; Data editor, Variables; Results]

Please use do files

Stata is command-driven software
Utilizing do files is the best way to learn and use Stata
A long list of commands is inevitable in data analysis
Structure commands
Run commands in batch
A do file is a great tool for reproducibility
File sharing
Transparency of procedures and results
The interactive menu is still useful:
Getting the syntax of a command right is hard the first time
Copy the command to the do file

Creating a do file
Give each do file a title
State the author
Track the dates
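
A header along these lines is one way to cover the three points above. It is only a sketch: the title, the author placeholder, and the dates are hypothetical and should be replaced with your own.

```stata
/*
Title   : Creating a working dataset   (hypothetical title)
Author  : Your Name                    (placeholder)
Created : 2017-10-09                   (hypothetical date)
Updated : 2017-10-09                   (change this each time you edit)
*/

clear all        // start from an empty session
set more off     // do not pause when output fills the screen
```

Putting clear all at the top is a design choice, not a requirement: it makes the do file behave the same way no matter what was in memory before it ran.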

Commenting

It is useful to provide descriptions, explanations, or the logic behind your commands
Some useful syntax:
Block commenting: /* insert text here */
Line commenting:
* insert text here
// insert text here
Continue a syntax to a new line: ///
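
The four styles can be seen together in a short do file. This sketch assumes the auto example dataset that ships with Stata; the choice of variables is arbitrary.

```stata
/* Block comment: load the example data that ships with Stata
   and look at a few variables. */
sysuse auto, clear

* Line comment: a star at the start of a line
summarize price   // a double slash after a command

* Three slashes continue one command on the next line
summarize price mpg weight ///
    if foreign == 1, detail
```

Note that //, ///, and /* */ are meant for do files; in the Command window only the * style works.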

Using the Help Viewer

It is difficult to memorize the syntax of a particular command
There are many options and sub-options
It is always useful to visit the help viewer:
help topic
The help viewer provides:
A description of a command
A description of each option
An example of how to use the command

SSC archive

Stata users develop commands and store them in the SSC archive
You need to be connected to the internet to use this feature
Let's install wbopendata:
ssc describe wbopendata
ssc install wbopendata
To see newly added SSC packages:
ssc new
To see trending SSC packages:
ssc hot
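
In a do file that other people will run, it can help to install a package only when it is missing. The following is a minimal sketch of that idea, assuming an internet connection; it relies on which returning an error code when Stata cannot find the command.

```stata
* Install wbopendata from SSC only when Stata cannot find it
capture which wbopendata
if _rc != 0 {
    ssc install wbopendata
}
which wbopendata    // report where the installed file lives
```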

Updating SSC packages

Check whether installed SSC packages are up-to-date:
adoupdate
You may also update all or specific SSC packages you've installed:
adoupdate, update
Note that it's good to check what is being updated before you actually update a package
Some updates may be major
Some of your do files may become obsolete!
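
To update one package rather than all of them, name it before the comma. A short sketch, using wbopendata as the example package and assuming it is already installed:

```stata
adoupdate wbopendata            // report only: is an update available?
adoupdate wbopendata, update    // then update just this package
```

Running the report-only form first follows the advice above: you see what would change before anything is replaced.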

Importing dataset to Stata

Setting a directory

It is useful to set a unique folder for each project
Keeps you organized
Mac:
cd /users/sahadewo/stata/
Windows:
cd C:\Stata\
In the folder, organize your files in subfolders, e.g.:
Raw Data
Do file
Data
Figure
Table
Log
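
The subfolders can also be created from within Stata. This sketch assumes you have already used cd to move into the project folder; the folder names are the ones listed above.

```stata
* Assumes the working directory is already the project folder
foreach folder in "Raw Data" "Do file" "Data" "Figure" "Table" "Log" {
    capture mkdir "`folder'"    // capture suppresses the error raised when the folder already exists
}
dir                             // list the folder contents to check
```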

Types of files

Stata: .dta
Excel: .xls, .xlsx, .csv
ASCII: .csv, .dat, .txt
Once you import the data, make sure to save it in Stata format:
save filename.dta, replace
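
The import-then-save step might look like the sketch below. So that it runs anywhere, it first writes a CSV file from the auto dataset that ships with Stata; the file names auto.csv and auto_from_csv.dta are made up for the illustration.

```stata
* Make a small CSV file to practise on (auto ships with Stata)
sysuse auto, clear
export delimited using "auto.csv", replace

* Import the text file, inspect it, then save it in Stata format
import delimited using "auto.csv", clear
describe
save "auto_from_csv.dta", replace
```

For an Excel workbook the matching command is import excel, for example import excel using "filename.xlsx", sheet("Sheet1") firstrow clear, where the file and sheet names are placeholders and firstrow treats the first row as variable names.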

Importing data from a public source

An example of public data accessible through Stata is World Bank Open Data.
You can import data provided that you have internet access:
wbopendata, country(IDN) clear
You can also specify a certain topic:
wbopendata, topics(4) clear
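
A single series can also be requested by its indicator code. This is a hedged sketch: it assumes the wbopendata package is installed, that you are online, and that the indicator() and long options behave as described in the package's help file. NY.GDP.PCAP.CD is the World Bank code for GDP per capita in current US dollars.

```stata
* Requires internet access and the wbopendata package from SSC
wbopendata, indicator(NY.GDP.PCAP.CD) long clear
describe    // check which variables were downloaded
```

Check help wbopendata for the options available in your installed version.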

Wide vs long dataset

There are two forms of data layout:
Wide
Long
Two important terms:
Logical observations, e.g. persons, firms, countries
Subobservations, e.g. time
The data are in wide form if a row contains multiple columns of subobservations for each logical observation.
The data are in long form if a row contains a specific subobservation for each logical observation.

Reshaping: wide to long and vice versa

We can use the built-in reshape command to convert the data from wide to long or vice versa.
Suppose that we have a long format and want a wide one:
reshape wide varlist, i(logical observations) j(subobservations)
Suppose that we have a wide format and want a long one:
reshape long varlist, i(logical observations) j(subobservations)
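
A small worked example may make the two layouts concrete. The data below are made up: two persons (the logical observations, id) with income observed in three years (the subobservations). In the wide layout the income columns share the stub inc, which is what reshape expects in place of varlist.

```stata
clear
input id inc2015 inc2016 inc2017
1 10 11 12
2 20 21 22
end
list                               // wide: one row per person

reshape long inc, i(id) j(year)    // inc is the common stub, year the new subobservation variable
list, sepby(id)                    // long: one row per person and year

reshape wide inc, i(id) j(year)    // back to the original wide layout
list
```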

Cleaning the dataset


The first thing we should do is clean the data
This routine includes:
Renaming variables
Providing a brief description
Defining a label
Make a habit of renaming variables and providing a brief description for each variable
Remember, your dataset will be used by many people
It is good to assume that they don't know the dataset at all
You are going to forget the dataset in a matter of weeks!
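
Renaming and describing a variable takes two commands. The sketch below uses the auto dataset that ships with Stata; the new name and the label texts are illustrative choices, not part of the original slides.

```stata
sysuse auto, clear

rename rep78 repair_1978                            // a more telling name
label variable repair_1978 "Repair record in 1978"  // brief description
label data "Working copy of the auto example data"  // describe the dataset itself

describe repair_1978
```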

Label and missing values

Labels are very useful for users
Consider the educ variable in the dataset
Instead of seeing 0, 1, 2, 3, ..., users will see:
No schooling
DNF elementary (did not finish elementary school)
SD (elementary school)
SMP (junior high school)
etc.
Often, we are going to deal with missing values
Missing values can be driven by:
Non-response
Non-valid response
No response required
etc.
It is good to document the reason for missingness
There is a subdiscipline in statistics and econometrics dedicated to missing data
There are many missing values that we can use:
.
.a
.b
.c
etc.
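
Value labels and extended missing values can be combined, so that the reason for missingness is documented in the data itself. In this sketch the six rows of data and the raw codes -8 and -9 are made up; only the four education labels come from the slide.

```stata
clear
input id educ
1 0
2 1
3 2
4 3
5 -8
6 -9
end

* Hypothetical raw codes: -9 = non-response, -8 = no response required
replace educ = .a if educ == -9
replace educ = .b if educ == -8

label define educ_lbl 0 "No schooling" 1 "DNF elementary" 2 "SD" 3 "SMP" ///
    .a "Non-response" .b "No response required"
label values educ educ_lbl
label variable educ "Highest education level"

tabulate educ, missing    // the missing option shows .a and .b with their labels
```

Keep in mind that Stata treats every missing value as larger than any number, so a condition such as educ > 2 also selects the missing rows unless you add & !missing(educ).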

Summarizing variables: descriptive statistics

We can start our analysis by running simple descriptive statistics:
Average
Standard deviation
Min
Max
We can use the command:
summ varname
We can also run more detailed descriptive statistics:
summ varname, d
After running the summ command, we can use the stored results:
display r(mean)
display r(sd)
We can also use the stored results in another command
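
As one illustration of reusing stored results, the sketch below centres a variable on its mean. It assumes the auto dataset that ships with Stata; the variable name price_dm is made up.

```stata
sysuse auto, clear

summarize price
display r(mean)                          // mean stored by summarize
generate price_dm = price - r(mean)      // reuse it: deviation from the mean

summarize price, detail
display r(p50)                           // the detail option also stores percentiles
```

Stored results are replaced by the next command that stores its own, so use them (or copy them into a local macro) before running another summarize.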

One-way tabulation

Next, it is good to tabulate variables:
Categorical variable
Ordinal variable
Ratio variable with limited values
We can observe:
Frequency (count)
Percentage
Cumulative percentage
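
A one-way table is produced with the tabulate command. A minimal sketch, assuming the auto dataset that ships with Stata, where rep78 is a repair record with a handful of distinct values:

```stata
sysuse auto, clear
tabulate rep78            // frequency, percent, cumulative percent
tabulate rep78, missing   // also count the missing values
```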

Creating a summary statistics table

We can use Stata's built-in command tabstat to create a summary statistics table
This command allows us to summarize important descriptive statistics:
Average
Median
SD
Min, Max
Interquartile range
Number of observations
We can use the output to create a summary statistics table in a document file
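
The statistics listed above map onto tabstat's statistics() option as follows. This is a sketch using the auto dataset that ships with Stata; the three variables are an arbitrary choice.

```stata
sysuse auto, clear
tabstat price mpg weight, ///
    statistics(mean median sd min max iqr n) ///
    columns(statistics) format(%9.2f)
```

The columns(statistics) option puts one variable per row and one statistic per column, which is the usual layout for a table in a paper. Adding by(foreign) would repeat the table for each group.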

Log files

Keep your session documented!
You can use a log file to document your do file and the output associated with the commands in the do file
Keep your log file in an organized folder
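
A log is opened at the top of a do file and closed at the end. In this sketch the file name analysis.log is hypothetical; in a project you would point it at your log folder instead.

```stata
capture log close                       // close any log left open by an earlier run
log using "analysis.log", replace text  // hypothetical file name; text gives a plain-text log
sysuse auto, clear
summarize price mpg
log close
```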

Thank You

Gumilang Aryo Sahadewo


sahadewo.wordpress.com
