STATA Command Summary
Command
disp {formula} - Calculator function, e.g. disp (1+1)/(2*6), disp sqrt(16), disp ln(10); log() and ln() both give the natural logarithm
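A few calculator lines as a minimal sketch; disp is the abbreviation of display, and log10() is included only to contrast with the natural logarithm.

```stata
* display evaluates an expression and prints the result
display (1+1)/(2*6)
display sqrt(16)
* ln() and log() are synonyms: both return the natural logarithm
display ln(10)
display log(10)
* base-10 logarithm has its own function
display log10(100)
```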
tab {var} - Create a one-way table with frequency, percentage, and cumulative percentage -> for categorical data
tab {var1} {var2} - Create two-way table (row = var1, column = var2) (Cross-tabulation); 2x2 when both variables are binary
tab {var1} {var2}, col - Create two-way table (row = var1, column = var2) with column percentages **Column percentages are preferred.
tab {var1} {var2}, row - Create two-way table (row = var1, column = var2) with row percentages
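The tab variants can be tried on the auto dataset that ships with Stata. This is only an illustration; rep78 and foreign are variables of that example dataset, not of any study data.

```stata
sysuse auto, clear
* one-way table: frequency, percent, cumulative percent
tabulate rep78
* two-way table: rows = rep78, columns = foreign
tabulate rep78 foreign
* same table with column percentages, then with row percentages
tabulate rep78 foreign, col
tabulate rep78 foreign, row
```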
histogram {var} - Create histogram -> Can be used to evaluate whether the data are normally distributed
histogram {var1}, by({var2}) - Create one histogram of Var1 for each level of Var2 (the by() option is required; histogram {var1},{var2} is not valid syntax)
swilk {var} - Shapiro-Wilk W test for normality (if not significant -> consistent with a normal distribution)
**Tests of normality (Kolmogorov-Smirnov, Shapiro-Wilk) - not useful if n > 40 (they tend to be significant even for departures from normality too small to matter)
**Easy way to confirm normality
1. Eyeball test (Histogram plot)
2. Size of S.D. (< Mean/2 ?)
3. Mean = Median = Mode ?
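The three informal checks can be sketched on the auto example dataset as follows. The choice of mpg is arbitrary, and summarize reports the median (50th percentile) but not the mode.

```stata
sysuse auto, clear
* 1. Eyeball test: histogram with a normal density curve overlaid
histogram mpg, normal
* 2 and 3. Compare the SD with mean/2, and the mean with the median (50%)
summarize mpg, detail
* Formal test, for comparison with the informal checks
swilk mpg
```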
**Clinical count data: 1. Mostly non-normally distributed, 2. Mostly right-skewed
**TIPS Do not normally present the standard error of the mean in a manuscript; use the CI instead.
**Proportion -> Stata will use the binomial (Bernoulli) distribution for a proportion variable -> shows the 95% CI as ‘Binomial Exact’
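One way to obtain the exact binomial CI for a proportion, assuming a 0/1 variable (foreign in the auto example dataset). The command form depends on the release: to my knowledge the first form is the one used from Stata 14.1 onward, and the commented form is the older syntax.

```stata
sysuse auto, clear
* foreign is coded 0/1; the default interval here is the exact binomial one
ci proportions foreign
* older releases use this form instead:
* ci foreign, binomial
```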
tab {var1} {var2}, col chi2 - Create two-way table (row = var1, column = var2) with column percentages, analysed with the Chi-square test
tab {var1} {var2}, col exact - Create two-way table (row = var1, column = var2) with column percentages, analysed with Fisher’s Exact Probability test
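A short sketch of both tests on Stata's lbw example dataset (low = low birthweight, smoke = smoking during pregnancy, both 0/1). The last line, which prints expected cell counts, is an addition that helps judge whether the sample is large enough for Chi-square.

```stata
webuse lbw, clear
* column percentages plus Pearson chi-square
tabulate low smoke, col chi2
* column percentages plus Fisher's exact test
tabulate low smoke, col exact
* expected counts per cell, useful when deciding between the two tests
tabulate low smoke, expected
```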
sum {var1} if {var2} {comparator} {value}
- Summarize var1 for the observations that meet the if condition, e.g. sum a if b==1
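Some if conditions as a sketch on the auto example dataset. Note the double equals sign for comparison; the bysort line is an alternative that summarizes every group in one go.

```stata
sysuse auto, clear
* only foreign cars
summarize mpg if foreign == 1
* conditions can be combined with & (and) or | (or)
summarize mpg if foreign == 0 & price > 5000
* same summary for each level of foreign
bysort foreign: summarize mpg
```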
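ttest {var1}, by({var2}) - Compare the means of Var1 between the 2 groups of Var2 using the 2-sample t-test with equal variances
sdtest {var1}, by({var2}) - Test of equal variances of Var1 between the 2 groups of Var2 (checks the equal-variance assumption)
ranksum {var1}, by({var2}) - Compare Var1 between the 2 groups of Var2 using the Wilcoxon rank-sum test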
**T-test can only be used if the data in both groups are normally distributed -> Use histogram to evaluate first! (T-test is a ‘Parametric test’ -> uses parameters, e.g. Mean, SD)
**If not normally distributed -> Use a non-parametric test instead: Wilcoxon rank-sum (= Mann-Whitney U test)
**Non-parametric statistics - Do not depend on the mean and SD of the data (but have lower power than parametric tests)
**Using a non-parametric test on normally distributed data is OK, but not preferred because of the lower power
**Ranksum test is a test of rank sums, not a test of medians!
**Conservatively: always use the 2-sided p-value initially (Pr(|T|>|t|)), because we don’t actually know the direction of the difference
**But if we know that the intervention can only produce a difference in 1 direction -> We can use the 1-sided p-value for the expected direction (BUT not recommended)
**H0 = Null hypothesis, Ha = Alternative hypothesis
**Chi-square: Use for ‘LARGE’ samples (how large is LARGE is not clearly defined). If the sample is small or you don’t want any assumption -> Use Fisher’s Exact test instead
**Fisher’s Exact test uses a very complex calculation -> very slow with a very large sample -> Using Chi-square is accepted (the results can be assumed equal)
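The two-group workflow described in these notes could look like this on the lbw example dataset (bwt = birthweight in grams, smoke = 0/1). The unequal option is not mentioned above; it is shown as the usual fallback when sdtest suggests the variances differ.

```stata
webuse lbw, clear
* look at the distribution in each group first
histogram bwt, by(smoke)
* check the equal-variance assumption
sdtest bwt, by(smoke)
* 2-sample t-test, equal variances assumed by default
ttest bwt, by(smoke)
* version that does not assume equal variances
ttest bwt, by(smoke) unequal
* non-parametric alternative
ranksum bwt, by(smoke)
```

The ttest output lists both 1-sided p-values and the 2-sided one (Pr(|T| > |t|)) side by side.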
pwcorr {var1} {var2} - Pearson’s pairwise correlation: - is negative correlation, + is positive, 0 is no linear correlation; the range can only be between -1 and +1. Greater absolute value = greater strength of linear correlation!
pwcorr {var1} {var2}, sig - Pearson’s pairwise correlation with p-value
spearman {var1} {var2} - Spearman’s rho (rank) correlation
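Both correlation commands on the auto example dataset, as a sketch. The scatter plot is an extra step to check whether the relationship is roughly linear, which is what Pearson's coefficient measures.

```stata
sysuse auto, clear
* Pearson correlation with p-value
pwcorr mpg weight, sig
* Spearman rank correlation
spearman mpg weight
* visual check of linearity
scatter mpg weight
```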
oneway {var1} {var2} - Analysis of variance (ANOVA) of Var1 by multiple groups of Var2
oneway {var1} {var2}, tab - Create a summary table (mean, SD, frequency) of Var1 by Var2, and do the Analysis of variance (ANOVA) of Var1 by multiple groups of Var2
oneway {var1} {var2}, tab bon - Create a summary table of Var1 by Var2, and do the Analysis of variance (ANOVA) of Var1 by multiple groups of Var2 with Bonferroni correction (multiple pairwise comparisons with p-value adjustment)
kwallis {var1}, by({var2}) - Kruskal-Wallis rank test for comparing multiple groups
**ANOVA is like the T-test, with the same assumptions (normally distributed, equal variance -> This command uses Bartlett’s test of equal variances; if significant -> variance is not equal between groups)
**ANOVA is a parametric test
**We don’t do the T-test 3 times instead of analysing multiple means together -> Multiplicity. Some may use the Bonferroni p-value correction (but not recommended)
**Multivariable analysis has to include variables that differ between groups even when the difference is not statistically significant
**Regression analysis is better than ANOVA, and thus preferred
**Kruskal-Wallis rank test is the non-parametric test for multiple groups. It does not depend on the mean and SD of the data
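A sketch of the multi-group comparisons on the auto example dataset, using rep78 (five levels, with a few missing values that Stata drops) as the grouping variable.

```stata
sysuse auto, clear
* ANOVA with a summary table; Bartlett's test is printed below the ANOVA table
oneway mpg rep78, tabulate
* add Bonferroni-adjusted pairwise comparisons
oneway mpg rep78, tabulate bonferroni
* non-parametric alternative
kwallis mpg, by(rep78)
```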
Regression plot (Linear plot) : Menu Graphics -> Two-way graph -> Create -> select ‘Fit plot’ -> Linear prediction -> input X and Y variable -> Submit
Scatter plot : Menu Graphics -> Two-way graph -> Create -> select ‘Basic plot’ -> Scatter plot -> input X and Y variable -> Submit
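The same two plots can also be typed as commands. A minimal sketch on the auto example dataset, with the Y variable first and the X variable second:

```stata
sysuse auto, clear
* scatter plot: Y = mpg, X = weight
twoway scatter mpg weight
* fitted regression line only
twoway lfit mpg weight
* both layers in one graph
twoway (scatter mpg weight) (lfit mpg weight)
```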
regress {var1} {var2} - Linear regression of var1 (Y) on var2 (X); displays the constant and coefficient that form the linear formula (Y = a + b(X), a = constant, b = coefficient)
regress {var1} i.{var2} **if var2 is ‘strata’ (group1, group2,…)
- Linear regression with var2 as a categorical variable (Y = base + 0(group0) + Coef1(group1) + Coef2(group2) +…)
regress {var1} i.{var2}, base **if var2 is ‘strata’ (group1, group2,…)
- Same model (Y = base + 0(group0) + Coef1(group1) + Coef2(group2) +…), and show the base group (group 0) in the output
regress {var1} i.{var2} {var3} … {varn}
- Linear regression adjusted for Var3 to Var n (**Var3 to Var n have to be linearly associated with var1)
**Linear regression plot - Draws the line with the lowest total squared vertical distance between the line and each data point in the scatter plot (least error)
**Regression analysis = regress to the mean/best line
**regress command = Gaussian regression (Y data have a normal distribution). Non-Gaussian regression models also exist
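The regress variants above, sketched on the auto example dataset. The ib3. prefix in the last line is an addition not covered in the notes: it changes the reference group from the lowest level to level 3.

```stata
sysuse auto, clear
* simple linear regression: mpg = a + b*weight
regress mpg weight
* categorical predictor, with the base group shown in the output
regress mpg i.rep78, baselevels
* categorical predictor adjusted for continuous covariates
regress mpg i.rep78 weight length
* same model as the second line, with level 3 as the reference group
regress mpg ib3.rep78, baselevels
```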
(Cohort study)
(Case-control study)
cc {var y} {var x} - Case-control study -> Create 2x2 table, calculate odds ratio, 95% CI and Chi-square test result
cci {value1} {value2} {value3} {value4}
- Case-control study immediate command -> Create table using values 1-4 (exposed cases, unexposed cases, exposed controls, unexposed controls)
logistic {var1} {var2} {var3} … {var n} - Multivariable logistic regression of Var1 (outcome) on Var2, **adjusting for Var3 to Var n to correct for confounding factors, then calculates OR (e.g. logistic lbw smoke)
**Risk factor research (Cohort study): Can use OR in a cohort study, but it may overestimate the risk ratio (But it looks dramatic!, and is frequently used in risk factor research)
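A sketch of the case-control commands on Stata's lbw example dataset (low = outcome, smoke = exposure). The four counts passed to cci are hypothetical and chosen only to make the arithmetic easy: OR = (20 x 90) / (80 x 10) = 2.25.

```stata
webuse lbw, clear
* 2x2 table with odds ratio, 95% CI and chi-square
cc low smoke
* immediate form with hypothetical counts:
* exposed cases, unexposed cases, exposed controls, unexposed controls
cci 20 80 10 90
* crude OR from logistic regression
logistic low smoke
* OR for smoke adjusted for age, mother's weight and race
logistic low smoke age lwt i.race
```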
ir {var y} {var x} {follow-up time} - Create 2x2 table, calculate Incidence rate, Incidence rate ratio, Incidence rate difference and exact test p-values
poisson {var1} {var2}, exp(day) irr - Poisson regression analysis for rate (Univariable); exp() = exposure (follow-up time) variable
poisson {var1} {var2} {var3} … {var n}, exp(day) irr
- Poisson regression analysis for rate (Multivariable, adjusted for Var3 to Var n) -> for average rate (the incidence rate must be constant at all points in time)
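Rate analysis sketched on Stata's dollhill3 example dataset (deaths = event count, smokes = exposure, pyears = person-years, agecat = age group). Here pyears plays the role of the follow-up time that the notes call day.

```stata
webuse dollhill3, clear
* incidence rates, rate ratio and rate difference
ir deaths smokes pyears
* univariable Poisson regression, reported as incidence rate ratio
poisson deaths smokes, exposure(pyears) irr
* adjusted for age group
poisson deaths smokes i.agecat, exposure(pyears) irr
```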
stset {day}, failure({var y}) - Prepare data for survival analysis (stset = survival-time set)
sts graph, hazard - Show shape of the smoothed hazard estimate -> shape of the rate at all time points
sts graph, cumhaz - Show shape of the cumulative hazard curve
sts graph [, surv] - Show Kaplan-Meier survival probability curve -> Showing overall survival **The default of the sts graph command is the KM curve, no need to use ‘surv’
sts graph, surv by({var}) - Show Kaplan-Meier survival probability curve stratified by var
sts graph, failure - Show failure curve (inversion of the KM curve) -> For showing a complication, not a death outcome
stsum - Show time at risk, incidence rate, and survival time percentiles
sts graph if {condition} - Show the curve for observations that meet the specified condition
sts test {var} - Log-rank test for equality of survival functions across the groups of var (Non-parametric) -> Only tells whether the survival curves differ or not
sts list, at({time1} {time2} {time3} …) surv
- Show survival list at time1, time2, time3,…
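The survival commands in sequence, as a sketch on Stata's drugtr example dataset (studytime = follow-up time, died = event indicator, drug = treatment group). The time points in at() are arbitrary.

```stata
webuse drugtr, clear
* declare the data as survival-time data first
stset studytime, failure(died)
* Kaplan-Meier curve, overall and by group
sts graph
sts graph, by(drug)
* failure curve, cumulative hazard and smoothed hazard
sts graph, failure by(drug)
sts graph, cumhaz
sts graph, hazard
* time at risk, incidence rate and survival time percentiles by group
stsum, by(drug)
* log-rank test between groups
sts test drug
* survival probabilities at selected times
sts list, at(10 20 30) by(drug)
```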
stcox {var x} - Cox regression analysis -> shows the Hazard ratio for Var x
stcox i.{var x}, base - Cox regression analysis if Var x is ‘strata’ (categories, not a value) and show the base group
**If the incidence rate is not constant -> Instantaneous rate: cannot use Poisson regression, use Cox regression analysis instead
**sts command: Only use after the stset command
**Median survival time = Time point at which only 50% of the study population survives
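A minimal Cox regression sketch on the drugtr example dataset. The data are stset again so the block runs on its own, and age is added only to illustrate adjustment.

```stata
webuse drugtr, clear
stset studytime, failure(died)
* hazard ratio for drug
stcox drug
* drug treated as categorical, base group shown, adjusted for age
stcox i.drug age, baselevels
```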
diagt {reference} {test} - Calculate all characteristics of a diagnostic test (Sensitivity, Specificity, PPV, NPV), with 95% CI included
roctab {reference} {test} - Create ROC table -> Helps display accuracy of a non-binary index test
roctab {reference} {test}, graph - Create ROC curve -> Helps display accuracy of a non-binary index test
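A sketch on Stata's hanley example dataset (disease = reference standard, rating = ordinal index test from 1 to 5). Two assumptions: the cut-off rating >= 4 and the variable name testpos are hypothetical, and diagt is treated as a user-written command that has to be installed first (for example with ssc install diagt).

```stata
webuse hanley, clear
* area under the ROC curve
roctab disease rating
* ROC curve
roctab disease rating, graph
* sensitivity and specificity at every cut-off
roctab disease rating, detail
* hypothetical binary test: positive when rating is 4 or 5
generate byte testpos = rating >= 4 if !missing(rating)
* requires the user-written diagt package
diagt disease testpos
```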