Lecture 4
Stata is a statistical software package used for data analysis, data management, and visualization.
It provides a wide range of tools for statistical analysis, including regression analysis, time-series
analysis, panel data analysis, and survival analysis, among others. Stata is popular in academia,
business, and government for its robust features, user-friendly interface, and powerful scripting
capabilities. It is commonly used by researchers, economists, social scientists, and healthcare
professionals to analyze and interpret data in various fields.
Task 1
In the general population the population mean (μ) and standard deviation (σ) of serum creatinine
are μ = 1.0 mg/dL and σ = 0.4 mg/dL, respectively.
Question # 1
A sample of n=12 patients were administered a new antibiotic. One day later, their serum creatinine
levels were measured. The mean of this sample was 1.2 mg/dL. Using an appropriate immediate
command, test the null hypothesis that the mean in the sample is different from the mean in
the general population. Do NOT assume that the population variance is known. Instead, suppose
you are given the sample standard deviation (s) and it has value s = 0.6 mg/dL. Provide a 1 sentence
interpretation of your output.
In this sample of n=12, the observed mean was 1.2 mg/dL. A two-sided t-test of the null hypothesis
that μ = 1.0, using the sample standard deviation s = 0.6, yielded a t-statistic of 1.15 on 11 degrees
of freedom and an associated two-sided p-value of 0.27. This is not statistically significant, so the
null hypothesis is not rejected. These data do not provide statistically significant evidence that the
mean serum creatinine among patients taking the new antibiotic differs from 1.0 mg/dL (μ ≠ 1.0).
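Note that the question words the hypothesis as a difference between the sample mean and the
population mean. In Stata, the null hypothesis H0 is always equality (here μ = 1.0); the difference
(μ ≠ 1.0) is the two-sided alternative.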
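The command itself is not shown in the answer above. A minimal sketch, assuming the one-sample form of Stata's immediate command ttesti (arguments in order: number of observations, sample mean, sample standard deviation, hypothesized mean), would be the following. The two display lines are an optional hand check of the same arithmetic and are not required by the question.

```stata
* One-sample t-test from summary statistics: n = 12, mean = 1.2, s = 0.6, H0: mu = 1.0
ttesti 12 1.2 0.6 1.0

* Hand check: t = (xbar - mu0) / (s / sqrt(n)), two-sided p-value on n - 1 = 11 df
display (1.2 - 1.0) / (0.6 / sqrt(12))
display 2 * ttail(11, (1.2 - 1.0) / (0.6 / sqrt(12)))
```

In the ttesti output, the two-sided result is the one reported under the alternative Ha: mean != 1.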
Question # 2
Using an appropriate immediate command, obtain a 95% confidence interval estimate of the true
mean serum creatinine among patients who have received the new antibiotic. Again, do NOT
assume that the population variance is known. Again, use the sample standard deviation s = 0.6
mg/dL. Provide a 1-sentence interpretation of your output.
Answer
Based on this sample of n=12, with 95% confidence, the unknown mean serum creatinine among
patients taking the new antibiotic is estimated to be between 0.82 and 1.58 mg/dL.
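One way to obtain this interval is with the immediate form of the confidence interval command. The sketch below assumes a recent Stata release, where the syntax is cii means; older releases used cii without the means keyword. The display lines repeat the calculation by hand using the t critical value on 11 degrees of freedom.

```stata
* 95% CI for a mean from summary statistics: n = 12, mean = 1.2, s = 0.6
* ASSUMPTION: recent Stata syntax; older releases use: cii 12 1.2 0.6
cii means 12 1.2 0.6, level(95)

* Hand check: xbar +/- t(0.975, 11) * s / sqrt(n)
display 1.2 - invttail(11, 0.025) * 0.6 / sqrt(12)
display 1.2 + invttail(11, 0.025) * 0.6 / sqrt(12)
```

Because level(95) is the default, the option can be omitted; it is written out here only to make the confidence level explicit.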
Question # 3
Consider treated patients whose race is recorded as “other”. For this subset of the data, test
whether the baseline temperature (temp0) is significantly different from their temperature
after 2 hours (temp1). Provide a 1 sentence interpretation of your output.
INTERPRETATION: In the sub-group of 32 who are classified as race=’other’, data on temp0 and
temp1 were complete for n=28, or 87.5%. Among these n=28, the mean temperatures at baseline
and two hours were 100.6 and 99.9, respectively. The mean and standard error of the baseline
to two-hour change in temperature were 0.67 and 0.22, respectively. A two-sided paired t-test of the
null hypothesis that the mean change μd = 0 yielded a paired t-statistic of 2.96 and an associated
two-sided p-value of 0.006. This is statistically significant, so the null hypothesis is rejected.
These data provide statistically significant evidence that the mean change in temperature between
baseline and two hours is nonzero (μd ≠ 0).
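The paired test described above could be run along the following lines. The variable names treat, race, temp0 and temp1 come from the questions, but the numeric codes used in the if condition are assumptions and should be checked against the data first.

```stata
* Check how the variables are coded before subsetting
codebook treat race

* Paired t-test of H0: mean(temp0 - temp1) = 0 in the treated, race = "other" subgroup
* ASSUMPTION: treat == 1 means treated and race == 2 means "other"; adjust to the codebook
ttest temp0 == temp1 if treat == 1 & race == 2
```

The paired form of ttest uses only observations where both temp0 and temp1 are non-missing, which is why the analysis is based on n=28 rather than all 32 patients in the sub-group.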
Question # 4
Consider, still, ONLY the treated patients whose race is recorded as “other”. For this subset of the
data, obtain a 90% confidence interval for the true change in temperature between baseline and 2
hours. Provide a 1 sentence statement and interpretation of your confidence interval.
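No answer is recorded for this question. As a sketch only, the interval can be requested by adding the level() option to the same paired test used in Question 3. The subgroup codes are again assumptions, not values confirmed by the source.

```stata
* 90% CI for the mean paired difference temp0 - temp1, same subgroup as Question 3
* ASSUMPTION: treat == 1 means treated and race == 2 means "other"; adjust to the codebook
ttest temp0 == temp1 if treat == 1 & race == 2, level(90)
```

The interval to report is the one shown on the diff row of the output; the rows for temp0 and temp1 give separate intervals for each occasion and do not answer the question.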
Question # 5
Test whether the baseline APACHE score (apache) is different between treated and untreated
patients. The treatment variable is treat. Provide a 1 sentence interpretation of your output.
Hypothesis: the APACHE score (apache) is different between treated and untreated patients.
Answer
Full data set, two independent samples, continuous outcome: we apply a two-sample t-test.
INTERPRETATION: In this cohort of n0 = 230 patients given placebo and n1 = 224 patients given
ibuprofen, the mean APACHE scores were 15.19 (s0 = 6.9) and 15.48 (s1 = 7.3), respectively. A
two-sample t-test of the null hypothesis of equality of means yielded a t-statistic of -0.44 and an
associated two-sided p-value of 0.66. This is not statistically significant, so the null hypothesis is
not rejected. These data do not provide statistically significant evidence that the mean baseline
APACHE score differs between the treated and untreated groups.
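A comparison of this kind is typically run with the by() option of ttest. The sketch below assumes that treat takes exactly two values in the data (for example 0 for placebo and 1 for ibuprofen), which the source does not state explicitly.

```stata
* Two-sample t-test of H0: mean apache is equal in the two levels of treat
* (the default pools the variances, i.e. assumes they are equal in both groups)
ttest apache, by(treat)

* Variant that does not assume equal variances (Satterthwaite's approximation)
ttest apache, by(treat) unequal
```

The sign of the t-statistic depends on which group Stata lists first, so a negative value here only means that the first group has the smaller mean.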