Basic STATA Command
**TIPS: Normally do not present the standard error of the mean in a manuscript; report the 95% CI instead.
**Proportion -> Stata uses the binomial (Bernoulli) distribution for a proportion
variable -> the 95% CI is shown as ‘Binomial Exact’
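A minimal sketch of the two tips above, assuming Stata 14.1 or later and the built-in auto dataset (the variables mpg and foreign belong to that dataset, not to these notes):

```stata
* Load Stata's bundled example data
sysuse auto, clear

* Mean of a continuous variable with its 95% CI (report this rather than the SE alone)
ci means mpg

* Proportion of a 0/1 variable; the default interval is the exact binomial one
ci proportions foreign

* In older Stata versions the equivalents are "ci mpg" and "ci foreign, binomial"
```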
tab {var1} {var2}, col chi2 - Create a 2x2 table (row = var1,
column = var2) with column percentages, analysed with the Chi-square test
tab {var1} {var2}, col exact - Create a 2x2 table (row = var1, column =
var2) with column percentages, analysed with Fisher’s Exact Probability test
sum {var1} if {var2 + comparator + value} - Summarize a variable with an if clause
e.g. sum a if b==1
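As an illustration, the two tab commands might be run as follows. This assumes the built-in auto dataset; highmpg is a hypothetical binary variable created only for this example.

```stata
sysuse auto, clear

* Hypothetical binary variable: 1 if mpg is above 20, 0 otherwise
generate byte highmpg = mpg > 20 if !missing(mpg)

* 2x2 table with column percentages and the Chi-square test
tabulate highmpg foreign, col chi2

* Same table, analysed with Fisher's exact test
tabulate highmpg foreign, col exact
```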
ttest {var1}, by({var2}) - Compare the mean of var1 between the 2 groups of var2
using the 2-sample t-test with equal variances
sdtest {var1}, by({var2}) - Variance-ratio test between the 2 groups (checks the
equal-variance assumption of the t-test)
ranksum {var1}, by({var2}) - Compare the 2 groups using the Wilcoxon rank-sum test
**The t-test should only be used if the data in both groups are approximately normally
distributed -> Use a histogram to evaluate first! (The t-test is a ‘parametric test’ -> it uses
parameters, e.g. mean, SD)
**If not normally distributed -> Use a non-parametric test instead: Wilcoxon rank-sum
(= Mann-Whitney U test)
**Non-parametric statistics - Do not depend on the mean and SD of the data (but have lower
power than parametric tests)
**Using a non-parametric test on normally distributed data is OK, but not preferred
because of the lower power
**The rank-sum test is a test of rank sums, not a test of medians!
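The workflow described in these notes (look at the distribution, check the variances, then pick the test) could look like the sketch below. It assumes the built-in auto dataset; mpg and foreign are that dataset's variables.

```stata
sysuse auto, clear

* Step 1: inspect the distribution in each group, with a normal curve overlaid
histogram mpg, by(foreign) normal

* Step 2: variance-ratio test for the equal-variance assumption
sdtest mpg, by(foreign)

* Step 3a: 2-sample t-test with equal variances
ttest mpg, by(foreign)

* Step 3b: if the variances look unequal, the same test without that assumption
ttest mpg, by(foreign) unequal

* Step 3c: if the data are not normally distributed, Wilcoxon rank-sum instead
ranksum mpg, by(foreign)
```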
**Conservatively: always use the 2-sided p-value initially (Pr(|T| > |t|)), because we don’t
actually know the direction of the difference
**But if we know that the intervention can only produce a difference in 1 direction
-> We can use the 1-sided p-value for the expected direction of difference
(BUT not recommended)
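The ttest output prints all three p-values side by side; they can also be read from the stored results, as in this sketch (built-in auto dataset assumed):

```stata
sysuse auto, clear
ttest mpg, by(foreign)

* 2-sided p-value, Pr(|T| > |t|): the conservative default
display "2-sided p                = " r(p)

* 1-sided p-values, one for each possible direction of the difference
display "1-sided p (Ha: diff < 0) = " r(p_l)
display "1-sided p (Ha: diff > 0) = " r(p_u)
```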
**H0 = NULL hypothesis, Ha = Alternative hypothesis
**Chi-square: Use for ‘LARGE’ samples (how large is LARGE is not clearly defined).
If the sample size is small, or you don’t want to make any assumption -> Use Fisher’s Exact
test instead
**Fisher’s Exact test uses a very complex calculation -> very slow if the sample size is very
large -> Using Chi-square is acceptable (the results can be assumed to be equal)
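One way to judge whether a sample is large enough is to print the expected cell counts next to the observed ones and run both tests together. This is a sketch on the built-in auto dataset; highmpg is a hypothetical variable made up for the example, and the often-quoted "expected count below 5" threshold is a rule of thumb, not a definition from these notes.

```stata
sysuse auto, clear
generate byte highmpg = mpg > 20 if !missing(mpg)

* Observed and expected frequencies, with both the Chi-square and Fisher's exact p-values
tabulate highmpg foreign, expected chi2 exact
```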
**ANOVA is like the t-test, with the same assumptions (normally distributed, equal variances ->
the ANOVA command (oneway) reports Bartlett’s test for equal variances; if significant ->
variances are not equal between groups)
**ANOVA is Parametric test
**We don’t run the t-test 3 times (once per pair of groups) instead of using an analysis of
multiple means -> Multiplicity. Some may use the Bonferroni p-value correction (but not recommended)
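A minimal ANOVA sketch, assuming the built-in auto dataset with rep78 as the grouping variable:

```stata
sysuse auto, clear

* One-way ANOVA with a table of group means; Bartlett's test is printed under the ANOVA table
oneway mpg rep78, tabulate

* Pairwise comparisons with the Bonferroni correction (shown for completeness)
oneway mpg rep78, bonferroni
```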
**Multivariate analysis has to include variables that show a difference between groups, even if
the difference is not statistically significant
**Regression analysis is better than ANOVA, and is thus preferred
**The Kruskal-Wallis (K-Wallis) rank test is the non-parametric test for comparing more than
2 groups. It does not depend on the mean and SD of the data
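The Kruskal-Wallis test is run with the kwallis command. A sketch on the built-in auto dataset:

```stata
sysuse auto, clear

* Non-parametric comparison of mpg across the rep78 groups
kwallis mpg, by(rep78)
```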
Regression plot (Linear plot) : Menu Graphics -> Two-way graph ->
Create -> select ‘Fit plot’ -> Linear prediction -> input X and Y variable -> Submit
Scatter plot : Menu Graphics -> Two-way graph ->
Create -> select ‘Basic plot’ -> Scatter plot -> input X and Y variable -> Submit
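The same two plots can be drawn from the command line instead of the menus. A sketch assuming the built-in auto dataset, with mpg as Y and weight as X:

```stata
sysuse auto, clear

* Scatter plot of Y against X
twoway scatter mpg weight

* Fitted regression line only (linear prediction)
twoway lfit mpg weight

* Both layers overlaid in one graph
twoway (scatter mpg weight) (lfit mpg weight)
```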
regress {var1} {var2} - Do the linear regression analysis of var1 (Y) on var2 (X)
and display the constant and coefficient that form the linear formula (Y = a +
b(X), a = constant, b = coefficient)
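As a sketch of how the constant and coefficient form the linear formula, the stored estimates can be plugged back into Y = a + b(X). The built-in auto dataset is assumed, and the weight of 3000 is an arbitrary illustration value.

```stata
sysuse auto, clear

* Y = mpg, X = weight
regress mpg weight

* a = _b[_cons], b = _b[weight]; predicted Y for X = 3000
display _b[_cons] + _b[weight] * 3000
```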
regress {var1} i.{var2} **if var2 is ‘strata’ (categorical: group1, group2,…) - Do the
linear regression analysis (Y = base + 0(group0) + Coef1(group1) + Coef2(group2) +
…), where base = the constant and each Coef = that group's difference from the base group
regress {var1} i.{var2}, base **if var2 is ‘strata’ (categorical: group1, group2,…) - Do the
same linear regression analysis, and also show the base group (group 0) in the output
regress {var1} i.{var2} {var3} …{varn} - Do the linear regression analysis, adjusted
for var3 to varn (**var3 to varn have to be linearly associated with var1)
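The factor-variable forms above might be used as follows. This is a sketch on the built-in auto dataset; rep78 plays the role of the grouped variable, and weight and length are the adjusting covariates.

```stata
sysuse auto, clear

* Grouped predictor: the lowest level is the base group by default
regress mpg i.rep78

* Same model, with the base group shown in the output
regress mpg i.rep78, base

* Choose a different base group (here level 3) with the ib#. prefix
regress mpg ib3.rep78, base

* Group effect adjusted for continuous covariates
regress mpg i.rep78 weight length
```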
**Linear regression plot - Draws the line with the lowest sum of squared vertical distances
between the line and each data point in the scatter plot (least squares = least error)
**Regression analysis = regress to the mean/best line
**regress command = Gaussian regression (Y data have a normal distribution). There are
also non-Gaussian regressions
(Cohort study)
**Using the cs command to create a 2x2 epitable and reporting only the univariable risk ratio
in a cohort study is not acceptable anymore (we have to use multivariate binary regression to
adjust for other confounding factors)
Except for RCTs (OK due to low confounding)
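A sketch of the contrast between the univariable 2x2 table and an adjusted binary regression. It assumes the lbw example dataset that webuse downloads (internet access needed), with low as the outcome, smoke as the exposure and age as a stand-in confounder. binreg with the rr option is one possible choice of binary regression, not necessarily the one these notes intend, and log-binomial models do not always converge.

```stata
* Example data shipped with the Stata manuals (downloaded on demand)
webuse lbw, clear

* Univariable 2x2 epitable: crude risk ratio for low by smoke
cs low smoke

* Binary regression reporting risk ratios, adjusted for age
binreg low smoke age, rr
```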
(Case-control study)
**Risk factor research (Cohort study): The OR can be used in a cohort study, but it may
overestimate the risk ratio (but it looks dramatic, and is frequently used in risk factor
research)
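The overestimation can be seen with the immediate command csi and hypothetical counts invented for this illustration: 30 of 100 exposed and 10 of 100 unexposed develop the outcome. The risk ratio is 0.30 / 0.10 = 3, while the odds ratio is (30 x 90) / (10 x 70), about 3.86.

```stata
* csi #a #b #c #d: a = exposed cases, b = unexposed cases,
*                  c = exposed non-cases, d = unexposed non-cases
* The counts are hypothetical; the or option adds the odds ratio to the output
csi 30 10 70 90, or
```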
**If the incidence rate is not constant -> Instantaneous rate: cannot use Poisson
regression, use Cox regression analysis instead
**sts command: Only use after the stset command
**Median survival time = Time point at which only 50% of the study population still survives
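A sketch of the survival-analysis order of operations. It assumes the drugtr example dataset that webuse downloads (internet access needed), in which studytime is the follow-up time, died is the event indicator, and drug and age are covariates.

```stata
webuse drugtr, clear

* Declare the survival-time structure first; sts and stcox depend on it
stset studytime, failure(died)

* Kaplan-Meier survival curves by treatment group
sts graph, by(drug)

* Summary including the median survival time (the 50% column) per group
stsum, by(drug)

* Cox regression, for when the incidence rate is not constant over time
stcox drug age
```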