Stata Class Notes
1. Open Stata
2. Click File > Change working directory
3. Browse to the folder that holds all the files required for your analysis
4. Select the data subfolder and click Open
Summary: File > Change working directory > stataclassfolder > data subfolder > Open
1. Type dir in the Command window to list the files in the data subfolder
2. Type pwd (print working directory) to confirm which folder Stata is working in
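The menu steps and the two checks above can also be typed as commands. This is a minimal sketch, assuming a hypothetical folder called stataclassfolder/data below the current directory; replace it with the path to your own data subfolder.

```stata
* Hypothetical folder name: replace with the path to your own data subfolder
local datadir "stataclassfolder/data"

* capture stops the script from halting if the folder does not exist
capture cd "`datadir'"
if _rc != 0 {
    display as error "Folder not found: `datadir'"
}

pwd   // print the working directory Stata is using now
dir   // list the files in that directory
```

Run the lines together from a do-file, because the local macro datadir only exists while that do-file is running.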
• The purpose of a log file is to save everything you do in Stata: the commands you type and the output they produce.
• It creates a file that stores a record of the work done and the changes made, which is useful for:
Double checking your work later on
Reviewing the output of statistical procedures
Copying and pasting results
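General way of creating it:
log using "filename.log"
When a log file with that name already exists, either overwrite it or add to it:
log using "filename.log", replace (overwrites the existing log file)
log using "filename.log", append (adds the new session to the end of the existing log file)
log close (closes the log file)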
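A short session that uses these commands might look like the sketch below. The file name session1.log is hypothetical, and the built-in auto dataset stands in for your own data.

```stata
capture log close                   // close any log left open by an earlier session
log using "session1.log", replace   // hypothetical file name; replace overwrites an old copy
sysuse auto, clear                  // example dataset installed with Stata
summarize mpg                       // this output is written to the log
log close                           // stop logging and close the file
```

Starting with capture log close is a common habit: log using fails if a log is already open, and capture lets the do-file continue when there is nothing to close.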
Type use "path and filename.dta" to open a dataset.
use "path and filename.dta", clear discards the data currently in memory, including any unsaved changes, before loading the file.
To add the observations from another .dta file to the data in memory, type append using "path and filename.dta". To replace an existing .dta file with the data in memory, type save "path and filename.dta", replace.
e.g. append using zsbs2009.dta
save zsbs2009.dta, replace
Note: replace overwrites the contents of the old file, so make sure that no information you still want is lost by resaving it.
use zsbs2009.dta, clear
Type clear and then exit if you want to close Stata. Your log file is not affected, because everything has already been saved to it.
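The difference between opening, appending and saving can be tried safely on the built-in auto dataset. This is a sketch only: auto_doubled.dta is a hypothetical file name, and the temporary file stands in for a second dataset.

```stata
sysuse auto, clear                 // example dataset (74 observations)
tempfile copy1                     // temporary file name, deleted when the do-file ends
save `copy1'                       // save the data in memory to the temporary file

append using `copy1'               // add the saved observations below the ones in memory
count                              // now 148 observations

save "auto_doubled.dta", replace   // hypothetical file name; overwrites it if it exists
use "auto_doubled.dta", clear      // reload the saved file, discarding the data in memory
```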
• Type describe to check all the variables available in the file
• This shows the number of observations, the number of variables, and each variable's name, storage type and labels, in the order the variables appear in the dataset
• describe
b) Percent: the percentage of the total number of observations that each category accounts for
Stata ignores missing values when running tabulations. To view them you must use the missing option, e.g. tabulate varname, missing
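The effect of the missing option is easy to see on the built-in auto dataset, where the variable rep78 has 5 missing values. A minimal sketch:

```stata
sysuse auto, clear        // example dataset installed with Stata
tabulate rep78            // missing values are left out: total of 69 observations
tabulate rep78, missing   // adds a row for missing (.): total of 74 observations
```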
* keep command (the counterpart of drop): removes every observation that does not meet the condition
keep if q101==1
keep if q103<=19
* generate command
gen age=q103
* recode q103 into labelled groups stored in a new variable called agegroup
recode q103 (15/24=1 "15-24") (25/49=2 "25-49") (50/60=3 "50-60"), gen(agegroup)
tabulate agegroup, missing
rename q101 sex // to rename, write the old variable name, a space, then the new name
Kay Vincelaama 8/26/2017
recode q103 (15/24=1 "15-24") (25/49=2 "25-49") (50/60=3 "50-60") if q101==1, gen(mage) // use sex in place of q101 if the variable has already been renamed
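To check that a recode does what you expect, it can help to run it on a few made-up values first. The sketch below uses toy ages typed in with input (they are not taken from zsbs2009) and then cross-tabulates the old and new variables.

```stata
* Toy ages for illustration only (not from zsbs2009); the last value is missing
clear
input q103
15
24
25
49
50
60
.
end

recode q103 (15/24=1 "15-24") (25/49=2 "25-49") (50/60=3 "50-60"), gen(agegroup)

* Cross-check: each original age should fall in the expected group
tabulate q103 agegroup, missing
```

Values that match none of the rules, including missing values, are copied to the new variable unchanged.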
• Open Stata, then type doedit in the Command window or press Ctrl+9 to open the Do-file Editor
• Click on the Open icon and select your do-file by locating it
• To run the do-file, click Execute once; the commands and their output will appear in the Results window
• Use the keep command to work with a subset of the variables
e.g. keep q101 q701-q711
• Highlight the command and click Execute to run only that command, which keeps the selected variables
To get all the variables back, highlight the command that opens the dataset and click Execute to reload it with all the variables
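A do-file is simply a text file of commands. The sketch below shows what a very small one might contain; the name example.do is hypothetical, and the built-in auto dataset stands in for your own file.

```stata
* example.do: a hypothetical do-file
sysuse auto, clear      // example dataset stands in for your own file
keep make price-rep78   // keep make plus every variable from price to rep78
describe                // 4 variables remain: make price mpg rep78
```

Save the lines as example.do and either click Execute in the Do-file Editor or type do example.do in the Command window. The hyphen in price-rep78 selects a range of variables in dataset order, in the same way as q701-q711 above.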
Using the zsbs2009 individual dataset and the age variable q103, what command could you use to identify the number of respondents who are missing a value for age? tab q103, m
How many are there? 482
Create a new variable called oldfolks to identify those respondents older than age 45 using the generate and replace commands. How many are there?
gen oldfolks=q103
keep if oldfolks>45
574 people are older than 45 years
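An alternative that uses both generate and replace, and keeps the full dataset, is to build a 0/1 indicator. Stata treats a missing value as larger than any number, so a condition such as q103>45 is also true for respondents with a missing age unless they are excluded explicitly. The sketch below uses toy ages (not from zsbs2009) to show the pattern.

```stata
* Toy ages for illustration only (not from zsbs2009); the last value is missing
clear
input q103
30
45
46
50
.
end

gen oldfolks = 0 if !missing(q103)                   // 0 for everyone with a known age
replace oldfolks = 1 if q103 > 45 & !missing(q103)   // 1 for those older than 45
tabulate oldfolks, missing                           // missing ages stay missing

count if oldfolks == 1                               // number of respondents older than 45
```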
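ttest mpg1==mpg2 // paired t test: both variables are measured on the same observations
ttest mpg1==mpg2, unpaired // two-sample t test: the two variables are treated as independent groups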
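Both forms can be tried on a few typed-in values. The numbers below are made up purely to make the commands runnable; they are not the data used in class.

```stata
* Toy mileage values for illustration only
clear
input mpg1 mpg2
20 24
23 25
21 21
25 22
18 23
end

ttest mpg1 == mpg2             // paired: tests whether the mean difference is zero
ttest mpg1 == mpg2, unpaired   // unpaired: compares the two means as separate samples
```

The paired version is appropriate when each row holds two measurements on the same unit; otherwise use unpaired.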
Most of the values lie between 0 and 6 kg. Values beyond 6 kg are unusual outcomes, called outliers, and they can be dropped to give a clearer view of the data:
1. drop if v438>2000
2. drop if m19_1>6000
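Before dropping outliers it is worth looking at the distribution, and remembering that a condition such as v438>2000 is also true for missing values. A minimal sketch on the built-in auto dataset, with a hypothetical cutoff of 10000 for price:

```stata
sysuse auto, clear                        // example dataset installed with Stata
summarize price, detail                   // inspect the percentiles and the largest values
drop if price > 10000 & !missing(price)   // hypothetical cutoff; missing values are kept
summarize price                           // the maximum is now at most 10000
```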
(obs=6605)

             |    m19_1     v438    b4_01     v106     v025     v012
-------------+------------------------------------------------------
       m19_1 |   1.0000
        v438 |   0.1056   1.0000
       b4_01 |  -0.0808  -0.0002   1.0000
        v106 |  -0.0268   0.1444  -0.0053   1.0000
        v025 |   0.0756  -0.1229  -0.0031  -0.3265   1.0000
        v012 |   0.1177   0.1147   0.0067  -0.1717   0.0472   1.0000
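Output of this shape is what the correlate command prints. The command itself is not shown in the notes, so presumably it was correlate m19_1 v438 b4_01 v106 v025 v012. The sketch below runs the same kind of command on the built-in auto dataset.

```stata
sysuse auto, clear             // example dataset installed with Stata
correlate price mpg weight     // correlation matrix; uses only rows complete on all variables
pwcorr price mpg weight, sig   // pairwise correlations with significance levels
```

The (obs=...) line at the top of the correlate output reports how many observations had no missing values on any of the listed variables.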