
Cheat Sheet: With Stata 15

This document provides a summary of basic data processing and syntax in Stata 15. It outlines the general syntax format for Stata commands as [by varlist1:] command [varlist2] [=exp] [if exp] [in range] [weight] [using filename] [, options] and provides examples of arithmetic, logic, and data manipulation operations. It also reviews Stata's six basic data types and how to convert between them using gen with string() and real(), decode, encode, tostring, and destring.
Data Processing with Stata 15 Cheat Sheet
For more info see Stata's reference manual (stata.com)

Basic Syntax

All Stata commands have the same format (syntax):

[by varlist1:] command [varlist2] [=exp] [if exp] [in range] [weight] [using filename] [, options]

by varlist1:      apply the command across each unique combination of variables in varlist1
command           function: what are you going to do to varlists?
varlist2          column(s) to apply the command to
=exp              save output as a new variable
if exp            condition: only apply the function if something is true
in range          apply the function to specific rows
weight            apply weights
using filename    pull data from a file (if not loaded)
, options         special options for the command

Example:
bysort rep78 : summarize price if foreign == 0 & price <= 9000, detail
    For each value of rep78, give a detailed summary of price (stats like kurtosis, plus mean and median) for domestic cars priced at $9,000 or less.

To find out more about any command, such as what options it takes, type help command

Useful Shortcuts

Keyboard buttons:
F2            describe data
Ctrl + 8      open the data editor
Ctrl + 9      open a new .do file
Ctrl + D      highlight text in a .do file, then press Ctrl + D to execute it in the command line

At Command Prompt:
PgUp / PgDn   scroll through previous commands
Tab           autocomplete a variable name after typing part of it
cls           clear the console (where results are displayed)
clear         delete data in memory
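The general syntax can be tried directly on the auto dataset that ships with Stata. A minimal sketch, assuming the auto data is available through sysuse; pricePerMpg is a hypothetical variable name used only to illustrate the =exp part:

```stata
* Sketch: several parts of the general syntax on the auto dataset
sysuse auto, clear

* by-prefix + command + varlist + if-condition + option
bysort rep78 : summarize price if foreign == 0 & price <= 9000, detail

* "in range": restrict the command to the first 10 rows
summarize price in 1/10

* "=exp": save the result of an expression as a new variable
* (pricePerMpg is a hypothetical name)
generate pricePerMpg = price / mpg
```

Each line combines only the pieces it needs; parts in square brackets in the syntax template are optional.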
Basic Data Operations

Arithmetic
+         add (numbers); combine (strings)
-         subtract
*         multiply
/         divide
^         raise to a power

Logic
&         and
|         or
! or ~    not
==        equal
!= or ~=  not equal
<         less than
<=        less than or equal to
>         greater than
>=        greater than or equal to

== tests if something is equal; = assigns a value to a variable.

Example data (four observations from the auto dataset):
make            foreign   price
Chevy Colt      0         3,984
Buick Riviera   0         10,372
Honda Civic     1         4,499
Volvo 260       1         11,995

if foreign != 1 & price >= 10000
    both conditions must hold: only the Buick Riviera qualifies
if foreign != 1 | price >= 10000
    either condition may hold: the Chevy Colt, Buick Riviera and Volvo 260 qualify
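The same two conditions can be applied to the full auto dataset instead of the four-row excerpt. A sketch that counts matching observations with count (which stores the result in r(N)); expDomestic is a hypothetical variable name:

```stata
* Sketch: & versus | on the full auto dataset
sysuse auto, clear

* both conditions must hold (domestic AND price of at least $10,000)
count if foreign != 1 & price >= 10000
local n_and = r(N)

* either condition may hold (domestic OR price of at least $10,000)
count if foreign != 1 | price >= 10000
local n_or = r(N)

display "and: `n_and'   or: `n_or'"

* = assigns, == compares: store the & condition as a 0/1 flag
* (expDomestic is a hypothetical name)
generate byte expDomestic = (foreign == 0 & price >= 10000)
list make foreign price if expDomestic == 1
```

The | count can never be smaller than the & count, because every row that satisfies both conditions also satisfies at least one.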
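Change Data Types

Stata has 6 data types, and data can also be missing:
no data          missing
true/false       byte
words            string
numbers          int, long, float, double

To convert between numbers and strings (for example, foreign is 1 for the Honda Civic):
gen foreignString = string(foreign)
    number to string: 1 becomes "1"
tostring foreign, gen(foreignString)
    number to string: 1 becomes "1"
decode foreign, gen(foreignString)
    value label to string: 1 becomes "Foreign"
gen foreignNumeric = real(foreignString)
    string to number: "1" becomes 1
destring foreignString, gen(foreignNumeric)
    string to number: "1" becomes 1
encode foreignString, gen(foreignNumeric)
    string to labeled number: "Foreign" becomes a number carrying the value label "Foreign"

recast double mpg
    generic way to convert between types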
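The conversions above each create a variable named foreignString or foreignNumeric, so running them one after another would fail on the second attempt. A sketch that runs all of them together, assuming the auto data; the numbered variable names (foreignStr1, foreignNum1, and so on) are hypothetical and chosen only to avoid name clashes:

```stata
* Sketch: round-trip foreign between numeric, string and labeled forms
sysuse auto, clear

* number -> string, two equivalent ways
generate foreignStr1 = string(foreign)
tostring foreign, generate(foreignStr2)

* labeled number -> its label text ("Domestic" / "Foreign")
decode foreign, generate(foreignLabel)

* string -> number, two equivalent ways
generate foreignNum1 = real(foreignStr1)
destring foreignStr2, generate(foreignNum2)

* string -> new labeled number; encode numbers the distinct strings
* in sorted order starting at 1, so "Domestic" = 1 and "Foreign" = 2
encode foreignLabel, generate(foreignCode)

* change the storage type of an existing variable
recast double mpg
describe foreignStr1 foreignStr2 foreignLabel foreignNum1 foreignNum2 foreignCode mpg
```

Note that encode does not restore the original 0/1 coding; to keep specific codes, convert with destring or real() instead.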
Set up

pwd
    print current (working) directory
cd "C:\Program Files (x86)\Stata13"
    change working directory
dir
    display filenames in working directory
dir *.dta
    list all Stata data in working directory
capture log close
    close any log that is already open; capture suppresses the error if none is open (capture can be abbreviated to cap)
log using "myDoFile.txt", replace
    create a new log file to record your work and results
search mdesc
    find the package mdesc to install; packages contain extra commands that expand Stata's toolkit
ssc install mdesc
    install the package mdesc; needs to be done once

Import Data

sysuse auto, clear
    load system data (Auto data); for many examples, we use the auto dataset
use "yourStataFile.dta", clear
    load a dataset from the current directory
import excel "yourSpreadsheet.xlsx", /*
*/ sheet("Sheet1") cellrange(A2:H11) firstrow
    import an Excel spreadsheet
import delimited "yourFile.csv", /*
*/ rowrange(2:11) colrange(1:8) varnames(2)
    import a .csv file
webuse set "https://github.com/GeoCenter/StataTraining/raw/master/Day2/Data"
webuse "wb_indicators_long"
    set web-based directory and load data from the web

In a .do file, the /* */ pair continues a command onto the next line.
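A typical .do file opens with the set-up commands and then loads data. The sketch below assumes write access to the working directory and creates a hypothetical scratch file, auto.csv, with export delimited so that import delimited has something to read; the text option asks for a plain-text log:

```stata
* Sketch: do-file header plus a self-contained import round-trip
capture log close
log using "myDoFile.txt", text replace

sysuse auto, clear

* auto.csv is a hypothetical scratch file created only for this example
export delimited using "auto.csv", replace

* read it back; varnames(1) takes variable names from the first row
import delimited using "auto.csv", varnames(1) clear
describe, short

log close
```

Because export delimited writes value labels by default, foreign comes back as a string ("Domestic"/"Foreign") rather than 0/1.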
Explore Data

View Data Organization
describe make price
    display variable type, format, and any value/variable labels
count
    number of rows (observations)
count if price > 5000
    count can be combined with logic
ds, has(type string)
    search for variables by type
lookfor "in."
    search for text in variable names or variable labels
isid mpg
    check if mpg uniquely identifies the data

See Data Distribution
codebook make price
    overview of variable type, stats, number of missing/unique values
summarize make price mpg
    print summary statistics (mean, stdev, min, max) for variables
inspect mpg
    show histogram of data, number of missing or zero observations
histogram mpg, frequency
    plot a histogram of the distribution of a variable

Browse Observations within the Data
browse (or Ctrl + 8)
    open the data editor
list make price if price > 10000 & !missing(price)
    list the make and price for observations with price > $10,000; clist ... gives a compact form
display price[4]
    display the 4th observation in price; only works on single values
gsort price mpg
    sort in ascending order, first by price, then by miles per gallon
gsort -price -mpg
    sort in descending order, first by price, then by miles per gallon
duplicates report
    report observations that are duplicated across all variables
assert price != .
    verify the truth of a claim (here, that price is never missing)
levelsof rep78
    display the unique values for rep78

Missing values are treated as the largest positive number. To exclude missing values, ask whether the value is less than "." or use !missing(), as in the list example.
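The missing-value rule matters in practice because rep78 in the auto data has missing values. A sketch, assuming the auto data, showing that a "greater than" condition silently includes missing rows unless they are excluded:

```stata
* Sketch: exploring the auto data, including the missing-value rule
sysuse auto, clear

describe make price
count if price > 5000

* missing (.) sorts above every number, so "rep78 > 3" is also true
* for rows where rep78 is missing
count if rep78 > 3
local withMissing = r(N)
count if rep78 > 3 & !missing(rep78)
local withoutMissing = r(N)
display "rep78 > 3: `withMissing' incl. missing, `withoutMissing' excl. missing"

* sort descending by price, then show the single most expensive car
gsort -price
display make[1] ", " price[1]

levelsof rep78
```

The gap between the two counts equals the number of observations with missing rep78.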
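Summarize Data

tabulate rep78, mi gen(repairRecord)
    one-way table: number of rows with each value of rep78; mi includes missing values; gen() creates a binary variable for every rep78 value (repairRecord1, repairRecord2, ...)
tabulate rep78 foreign, mi
    two-way table: cross-tabulate the number of observations for each combination of rep78 and foreign
bysort rep78: tabulate foreign
    for each value of rep78, apply the command tabulate foreign
tabstat price weight mpg, by(foreign) stat(mean sd n)
    create a compact table of summary statistics
table foreign, contents(mean price sd price) f(%9.2fc) row
    create a flexible table of summary statistics; f() formats numbers and row displays stats for all data
collapse (mean) price (max) mpg, by(foreign)
    calculate mean price and max mpg by car type (foreign); replaces the data in memory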
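Because collapse replaces the data in memory, it is often wrapped in preserve and restore. A sketch, assuming the auto data, that builds the indicator variables, prints summary tables, and then collapses a temporary copy:

```stata
* Sketch: indicator variables, a compact summary table, and a safe collapse
sysuse auto, clear

* one 0/1 indicator per observed rep78 value: repairRecord1, repairRecord2, ...
tabulate rep78, missing generate(repairRecord)

tabstat price weight mpg, by(foreign) statistics(mean sd n)

* collapse replaces the data: preserve a copy and restore it afterwards
preserve
collapse (mean) price (max) mpg, by(foreign)
list
restore
```

After restore, the original 74 observations (plus the new repairRecord variables) are back in memory.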
Create New Variables

generate mpgSq = mpg^2
    create a new variable
gen byte lowPr = price < 4000
    useful for creating binary variables based on a condition (generate byte)
generate id = _n
bysort rep78: gen repairIdx = _n
    _n creates a running index of observations in a group
generate totRows = _N
bysort rep78: gen repairTot = _N
    _N stores the total number of observations in a group
pctile mpgQuartile = mpg, nq(4)
    compute the quartile cutpoints of mpg (stored in the first 3 observations of mpgQuartile); use xtile to assign each observation to a quartile
egen meanPrice = mean(price), by(foreign)
    calculate mean price for each group in foreign; see help egen for more options
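The sketch below runs the Create New Variables commands together on the auto data. It also contrasts pctile (cutpoints) with xtile (group membership). The names mpgCut and mpgQuartile are hypothetical:

```stata
* Sketch: creating new variables on the auto data
sysuse auto, clear

generate mpgSq = mpg^2
generate byte lowPr = price < 4000      // 1 if price < 4000, else 0

* running index and group size within rep78 groups
bysort rep78: generate repairIdx = _n
bysort rep78: generate repairTot = _N

* group mean with egen
egen meanPrice = mean(price), by(foreign)

* pctile stores the 3 quartile cutpoints; xtile assigns each car to a quartile
* (mpgCut and mpgQuartile are hypothetical names)
pctile mpgCut = mpg, nquantiles(4)
xtile mpgQuartile = mpg, nquantiles(4)
list mpgCut in 1/3
tabulate mpgQuartile
```

Within each rep78 group, repairIdx counts 1, 2, 3, ... while repairTot repeats the group size on every row.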
Tim Essam • Laura Hughes
Inspired by RStudio's awesome Cheat Sheets (rstudio.com/resources/cheatsheets). geocenter.github.io/StataTraining, updated June 2016.
Follow us @StataRGIS and @flaneuseks. Disclaimer: we are not affiliated with Stata. But we like it. CC BY 4.0
