Stata Session 1 KA (Class)
dta, clear
The Dirty Data Theorem states that “real world” data tends to come from bizarre and unspecifiable distributions of highly correlated variables and have unequal sample sizes, missing data points, non-independent observations and an indeterminate number of inaccurately recorded values.

codebook
codebook examines the variable names, labels, and data to produce a codebook describing the dataset.
codebook [varlist]
webuse hbp2, clear

list specific variables for specific number of observations
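A minimal sketch of codebook and list on chosen variables and a chosen range of observations. It assumes Stata's bundled auto dataset (sysuse auto), which is not the class data; the variable names make, price, mpg and foreign belong to that dataset and the price cut-off is arbitrary.

```stata
* Assumption: the bundled auto dataset stands in for the class data
sysuse auto, clear

* codebook restricted to two variables
codebook make price

* list three variables for the first five observations
list make price mpg in 1/5

* list two variables for observations that satisfy a condition
list make price if foreign == 1 & price > 10000
```

The in qualifier selects observations by position, while if selects them by a logical condition; both can follow list, codebook and most other commands.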
===================================
keep the first 10 observations
keep in 1/10
same logic applies to variables
drop [varlist]

webuse destring1, clear
destring id, replace
replace total = "toto" in 2
destring total, replace
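The commands above can be tried without an internet connection on a small made-up dataset. The sketch below is written for a do-file and uses three hypothetical observations typed in with input; only the variable names id and total and the value "toto" come from the text above. It shows why the last destring does not convert total, and one possible way around it (the force option, which is an addition here and not part of the original commands).

```stata
* Hypothetical data: both variables are stored as strings
clear
input str5 id str6 total
"1" "100"
"2" "toto"
"3" "250"
end

destring id, replace            // every value is numeric, so id is converted
destring total, replace         // "toto" is not numeric: total stays a string
destring total, replace force   // force turns "toto" into a missing value (.)
list

keep in 1/2                     // keep the first 2 observations
drop id                         // same logic for variables: drop removes them
describe
```

force silently discards whatever cannot be read as a number, so it is worth listing the offending values before using it.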
===================================
Defining labels and values for variables
label define age_cat 1 "less than 20" 2 "20-24" 3 "25-29" 4 "30-34" 5 "35-39" 6 "40-44" 7 "45+"
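label define only creates the set of labels; it still has to be attached to a variable with label values. A minimal sketch, assuming a hypothetical numeric variable age holding whole years and a grouping variable age_cat built from it with recode (neither the data nor the variable label "age group" comes from the class dataset):

```stata
* Hypothetical ages in whole years
clear
input age
18
22
27
33
50
end

* Build the grouping variable (codes 1 to 7) from age
recode age (min/19=1) (20/24=2) (25/29=3) (30/34=4) (35/39=5) (40/44=6) (45/max=7), gen(age_cat)

* Define the value labels, attach them, and label the variable itself
label define age_cat 1 "less than 20" 2 "20-24" 3 "25-29" 4 "30-34" 5 "35-39" 6 "40-44" 7 "45+"
label values age_cat age_cat
label variable age_cat "age group"

tabulate age_cat
```

In label values the first name is the variable and the second is the label set; they may share a name, as here, but they are separate objects.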
what is the storage type of the variable “name”?
Give the variable energy the label “total energy expenditure” and the variable bmi the label “body mass index”.
bysort sex: sum bmi
What do you notice? Any extreme observations?
Try and replace the extreme observations by a missing value
gen id=_n

suppose we want to merge the 2 categories obese and overweight together
recode bmi_cat (4=3), gen(bmi_cat1)
label define bmi_cat1 1 "underweight" 2 "normal" 3 "overweight"
give a label for the newly created variable
label define bmi_cat2 1 "underweight" 2 "normal" 3 "overweight"

Application 2
Open dataset inmates.dta.
- Assign the following values for gender: 1 (males), 2 (females); marital status: 1 (never married), 2 (married), 3 (divorced)
- how many missing values do we have for the following variables: age gender marital_stat education height weight.
- categorize age into 4 groups: 14 to 29, 30 to 49, 50 to 69 and >69
- appropriately label each category.
- knowing that the condition “and” is denoted as “&” and the condition “or” is denoted as “|”, categorize the newly created bmi into 4 categories as follows:
FOR FEMALE INMATES: <18.5 (underweight), 18.5-25 (normal), 26-30 (overweight), >30 (obese)
FOR MALE INMATES: <20 (underweight), 20-25 (normal), 26-30 (overweight), >30 (obese)
Produce the mean bmi for male and female inmates separately in two different ways.
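One possible way to combine & and | for sex-specific BMI categories is sketched below on made-up data, not on inmates.dta. Assumptions: gender is coded 1 (males) and 2 (females) as in the text, values above 25 and up to 30 are treated as overweight (the text leaves 25 to 26 unassigned), and the label name bmi_lbl is hypothetical.

```stata
* Hypothetical data: gender 1 = male, 2 = female
clear
input gender bmi
1 19
1 24
1 28
1 33
2 17
2 22
2 28
2 35
end

gen bmi_cat = .
* underweight: the cut-off depends on gender
replace bmi_cat = 1 if (gender == 2 & bmi < 18.5) | (gender == 1 & bmi < 20)
* normal: the lower bound depends on gender
replace bmi_cat = 2 if (gender == 2 & bmi >= 18.5 & bmi <= 25) | (gender == 1 & bmi >= 20 & bmi <= 25)
* overweight and obese: same bounds for both genders
replace bmi_cat = 3 if bmi > 25 & bmi <= 30
* missing values count as very large numbers, so exclude them explicitly
replace bmi_cat = 4 if bmi > 30 & !missing(bmi)

label define bmi_lbl 1 "underweight" 2 "normal" 3 "overweight" 4 "obese"
label values bmi_cat bmi_lbl
tabulate gender bmi_cat

* Two ways to obtain the mean bmi by gender
bysort gender: summarize bmi
tabstat bmi, by(gender) statistics(mean)
```

The !missing(bmi) guard matters because Stata treats a missing value as larger than any number, so bmi > 30 alone would classify missing BMIs as obese.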