Biostatistics Assessment!
.................
2. A. What is meant by a measure of location?
B. If you have an open-end continuous series, which measures of central tendency would you
recommend to represent this series?
C. Why?
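A. Measures of location are summary statistics that describe where the center, or typical value,
of a distribution or dataset lies. The mean, median, and mode (the measures of central tendency)
are the most common examples.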
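B. If you have an open-end continuous series, that is, a grouped frequency distribution in which
the lowest class has no lower limit and/or the highest class has no upper limit (for example,
"under 10" or "60 and over"), it is recommended to use the median and the mode as measures
of central tendency.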
1. Median: The median is the middle value of a dataset when it is arranged in ascending or
descending order.
2. Mode: The mode represents the most frequently occurring value or values in a dataset. It
is especially useful when there are distinct peaks or modes in the data distribution. The
mode can provide insights into the dominant values or categories in the open-ended
continuous series.
C. The mean cannot be computed exactly for an open-ended series, because it requires the
midpoint of every class, and the open-ended classes have no midpoint. The median and the mode
depend only on the limits and frequencies of the median class and the modal class, so they can
still be calculated as long as those classes are closed. They are also less sensitive to extreme
values and outliers, which are often what the open-ended classes contain, so they give a more
representative measure of central tendency than a mean that would be disproportionately
influenced by such values.
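The standard grouped-data interpolation formulas make this concrete: median = L + ((N/2 − CF)/f) × h and mode = L + ((f1 − f0)/(2f1 − f0 − f2)) × h, where L is the lower limit and h the width of the median (or modal) class. The sketch below applies them to a small hypothetical frequency table (class limits and frequencies are invented for illustration) whose first and last classes are open. Only the median class and the modal class need definite limits, so both measures can be computed even though the mean could not. The function names are illustrative, and treating a missing neighbouring class as frequency 0 in the mode formula is an assumed convention.

```python
def grouped_median(classes):
    """Median of grouped data: L + ((N/2 - CF) / f) * h.

    `classes` is a list of (lower, upper, frequency) tuples in ascending order;
    lower or upper is None for an open-ended class.
    """
    n = sum(f for _, _, f in classes)
    cum = 0  # cumulative frequency before the current class
    for lower, upper, f in classes:
        if cum + f >= n / 2:  # this is the median class
            if lower is None or upper is None:
                raise ValueError("the median class is open-ended")
            return lower + (n / 2 - cum) / f * (upper - lower)
        cum += f
    raise ValueError("empty frequency table")


def grouped_mode(classes):
    """Mode of grouped data: L + ((f1 - f0) / (2*f1 - f0 - f2)) * h."""
    i = max(range(len(classes)), key=lambda k: classes[k][2])  # modal class
    lower, upper, f1 = classes[i]
    if lower is None or upper is None:
        raise ValueError("the modal class is open-ended")
    f0 = classes[i - 1][2] if i > 0 else 0                 # class before
    f2 = classes[i + 1][2] if i + 1 < len(classes) else 0  # class after
    return lower + (f1 - f0) / (2 * f1 - f0 - f2) * (upper - lower)


# Hypothetical open-ended series: "under 10", 10-20, 20-30, 30-40, "40 and over"
table = [(None, 10, 5), (10, 20, 12), (20, 30, 20), (30, 40, 6), (40, None, 7)]
print(grouped_median(table))          # 24.0
print(round(grouped_mode(table), 2))  # 23.64
```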
................
3. If we have a frequency distribution which is almost, but not quite, symmetrical and
whose mean and mode are 27 and 29 respectively, what will be the appropriate
value of the median?
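A worked sketch, assuming Karl Pearson's empirical relation for moderately skewed distributions, Mode ≈ 3 × Median − 2 × Mean (equivalently, Mean − Mode ≈ 3 × (Mean − Median)). Solving for the median gives Median ≈ (Mode + 2 × Mean)/3 = (29 + 54)/3 ≈ 27.67. The helper name is hypothetical.

```python
def median_from_mean_mode(mean, mode):
    """Pearson's empirical relation mode ≈ 3*median - 2*mean, solved for the median."""
    return (mode + 2 * mean) / 3


print(round(median_from_mean_mode(mean=27, mode=29), 2))  # 27.67
```

As expected for a nearly symmetrical distribution, the median lies between the mean and the mode, closer to the mean.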
.............
4. How does one plot a box-and-whisker plot? What are the advantages of this type of
plot? What additional information does this type of display give that is not available from
either a bar graph or a stem-and-leaf plot?
To plot a box-and-whisker plot, follow these steps:
1. Arrange your dataset in ascending order.
2. Determine the median, which is the middle value of the dataset. If the dataset has an odd
number of values, the median is the middle value. If the dataset has an even number of values,
the median is the average of the two middle values.
3. Calculate the lower quartile (Q1), which is the median of the lower half of the dataset. It
represents the point below which 25% of the data falls.
4. Calculate the upper quartile (Q3), which is the median of the upper half of the dataset. It
represents the point below which 75% of the data falls.
5. Calculate the interquartile range (IQR), which is the difference between Q3 and Q1.
6. Identify any outliers in the dataset. Outliers are values that fall below Q1 − 1.5 × IQR or above
Q3 + 1.5 × IQR.
7. Draw a number line or horizontal axis and mark the position of the median, Q1, and Q3. Draw
a box from Q1 to Q3, with a line inside the box at the position of the median.
8. Draw lines, called whiskers, from the box to the smallest and largest values that lie within the
range Q1 − 1.5 × IQR to Q3 + 1.5 × IQR. Any outliers beyond this range can be shown as
individual points or asterisks.
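The steps above can be sketched in Python as a function that returns the numbers needed to draw the plot. This is a minimal illustration assuming the "median of each half" quartile rule from steps 3 and 4, with the overall median left out of both halves when n is odd; statistical software uses several other quartile conventions, so its Q1 and Q3 may differ slightly. The function name `box_plot_summary` and the sample data are hypothetical.

```python
from statistics import median


def box_plot_summary(data):
    """Quartiles, IQR, whisker ends and outliers for a box-and-whisker plot.

    Assumed convention: Q1 and Q3 are the medians of the lower and upper halves,
    excluding the overall median when the number of values is odd.
    """
    x = sorted(data)                        # step 1: ascending order
    n = len(x)
    med = median(x)                         # step 2: median
    q1 = median(x[: n // 2])                # step 3: median of lower half
    q3 = median(x[(n + 1) // 2 :])          # step 4: median of upper half
    iqr = q3 - q1                           # step 5: interquartile range
    lo_fence, hi_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # step 6: outlier fences
    inside = [v for v in x if lo_fence <= v <= hi_fence]
    outliers = [v for v in x if v < lo_fence or v > hi_fence]
    return {
        "q1": q1, "median": med, "q3": q3, "iqr": iqr,
        "whisker_low": min(inside),         # step 8: whiskers stop at the
        "whisker_high": max(inside),        # most extreme non-outlying values
        "outliers": outliers,
    }


summary = box_plot_summary([2, 4, 4, 5, 6, 7, 8, 9, 30])  # hypothetical data
print(summary["outliers"])  # [30]
```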
............
6. In a certain population of women 3 percent have had breast cancer, 15 percent are
smokers, and 4 percent are smokers and have had breast cancer. A woman is selected at
random from the population. What is the probability that she has had breast cancer or
smokes or both?
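The question calls for the addition rule, P(C or S) = P(C) + P(S) − P(C and S). As stated, however, the 4 percent who are smokers with breast cancer exceeds the 3 percent who have had breast cancer at all, which is impossible because P(C and S) can never exceed P(C), so the figures as given cannot all be correct. The hypothetical helper `prob_union` below applies the rule only after checking that the joint probability is coherent with the two marginals; the values in the usage line are made up for illustration.

```python
def prob_union(p_a, p_b, p_ab):
    """P(A or B) by the addition rule, after checking the inputs are coherent."""
    for p in (p_a, p_b, p_ab):
        if not 0 <= p <= 1:
            raise ValueError("probabilities must lie in [0, 1]")
    # A joint probability can be neither larger than either marginal
    # nor smaller than p_a + p_b - 1.
    if p_ab > min(p_a, p_b) or p_ab < p_a + p_b - 1:
        raise ValueError("joint probability is inconsistent with the marginals")
    return p_a + p_b - p_ab


# Illustrative (made-up) values:
print(round(prob_union(0.10, 0.20, 0.05), 2))  # 0.25
```

Calling `prob_union(0.03, 0.15, 0.04)` with the figures exactly as printed raises `ValueError`.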
..............
7. Assume the number of episodes per year of otitis media, a common disease of the middle
ear in early childhood, follows a Poisson distribution with parameter λ = 1.6 episodes per
year. Find the probability of getting 3 or more episodes of otitis media in the first 2 years
of life.
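A worked sketch, assuming episodes occur at a constant rate of 1.6 per year independently from year to year, so that the count over the first 2 years is Poisson with mean μ = 2 × 1.6 = 3.2. Then P(X ≥ 3) = 1 − [P(0) + P(1) + P(2)] = 1 − e^(−3.2) × (1 + 3.2 + 3.2²/2) ≈ 0.620. The helper names are illustrative.

```python
from math import exp, factorial


def poisson_pmf(k, mu):
    """P(X = k) for a Poisson random variable with mean mu."""
    return exp(-mu) * mu ** k / factorial(k)


def prob_at_least(k, mu):
    """P(X >= k) = 1 - P(X <= k - 1)."""
    return 1 - sum(poisson_pmf(i, mu) for i in range(k))


mu = 1.6 * 2  # 1.6 episodes per year over 2 years, assuming a constant rate
print(round(prob_at_least(3, mu), 3))  # 0.62
```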
.........
8. What is the difference between the population standard deviation and the standard
error of the mean?
The population standard deviation (σ) is the square root of the population variance; it measures
how much individual observations vary around the population mean. The standard error of the
mean is the standard deviation of the sampling distribution of the sample mean; it measures how
much sample means vary from sample to sample. For random samples of size n, the standard error
equals σ/√n, so it is smaller than the population standard deviation by a factor of 1/√n.
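A quick simulation illustrates the 1/√n relationship. The sketch below assumes a normal population with a hypothetical σ = 10 and samples of size n = 25, draws many samples, and takes the standard deviation of their means. The result should be close to σ/√n = 2, even though individual observations have a standard deviation of 10. The function name and parameter values are illustrative.

```python
import random
import statistics


def simulated_se(sigma=10.0, n=25, reps=10_000, seed=1):
    """Standard deviation of many simulated sample means (an empirical SE)."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.gauss(0.0, sigma) for _ in range(n))
        for _ in range(reps)
    ]
    return statistics.stdev(means)


print(round(simulated_se(), 2), "vs sigma/sqrt(n) =", 10.0 / 25 ** 0.5)
```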
..............
9. What are the desirable properties of an estimator of a population parameter?
An estimator is a statistic used to estimate or infer a population parameter based on sample data.
Desirable properties of an estimator include:
1. Unbiasedness: An estimator is unbiased if, on average, it produces estimates that are
equal to the true population parameter. In other words, the expected value of the estimator
is equal to the population parameter. Unbiased estimators are desirable because they
provide accurate estimates without systematic overestimation or underestimation.
2. Efficiency: An efficient estimator is one whose variance is smaller than that of other
(unbiased) estimators of the same parameter, so its estimates cluster more tightly around the
true value from sample to sample. Efficiency is important because it gives more precise
information about the population parameter for the same sample size.
3. Consistency: A consistent estimator is one that converges to the true population
parameter as the sample size increases. In other words, as the sample size grows, the
estimates produced by the estimator become increasingly closer to the true value.
Consistency is desirable because it ensures that with larger samples, the estimator
provides more accurate and reliable estimates.
4. Sufficiency: A sufficient estimator is one that captures all the information in the sample
that is relevant to the population parameter; once it is known, the rest of the data adds nothing
further about that parameter. Sufficiency is desirable because it allows the data to be reduced
to a simple summary without losing information, which leads to simpler and more efficient
estimation procedures.
5. Robustness: A robust estimator is one that is not greatly affected by outliers or violations
of assumptions. It means that the estimator’s performance remains stable even in the
presence of deviations from the underlying assumptions.
Robust estimators are desirable because they provide reliable estimates even in situations
where the data may not strictly adhere to the assumptions of the estimation method.
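A small simulation can illustrate unbiasedness and efficiency. The sketch below assumes normally distributed data with hypothetical values μ = 50 and σ = 10, draws many samples of size 25, and compares the sample mean and the sample median as estimators of μ. Both are approximately unbiased here, but the mean has the smaller sampling variance, so for normal data it is the more efficient of the two. With data contaminated by outliers the ranking can reverse, which is the robustness trade-off described in point 5. The function name and settings are illustrative.

```python
import random
import statistics


def compare_estimators(mu=50.0, sigma=10.0, n=25, reps=5000, seed=2):
    """Simulated bias and variance of the sample mean and sample median."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    return {
        "bias_mean": statistics.fmean(means) - mu,      # close to 0: unbiased
        "bias_median": statistics.fmean(medians) - mu,  # close to 0 for symmetric data
        "var_mean": statistics.variance(means),         # about sigma**2 / n
        "var_median": statistics.variance(medians),     # larger: less efficient
    }


print(compare_estimators())
```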
.....................
10. Suppose we randomly select 20 students enrolled in an introductory course in
biostatistics and measure their resting heart rates. We obtain a mean of 66.9 (S =
9.02). Calculate a 95% confidence interval for the population mean and give an
interpretation of the interval you obtain.
To calculate a 95% confidence interval for the population mean resting heart rate, we use the
t-distribution, because the sample size is relatively small (n = 20) and the population standard
deviation is unknown.
Given:
Sample mean (x̄) = 66.9
Sample standard deviation (s) = 9.02
Sample size (n) = 20
CI = x̄ ± t(0.975, 19) × (s/√n) = 66.9 ± 2.093 × (9.02/√20) = 66.9 ± 2.093 × 2.017 ≈ 66.9 ± 4.22
The 95% confidence interval for the population mean resting heart rate is therefore approximately
(62.68, 71.12). This means that we are 95% confident that the true population mean resting heart
rate falls within this interval.
Interpretation: If we were to repeat the sampling process many times and construct a 95%
confidence interval each time, approximately 95% of these intervals would contain the true
population mean resting heart rate. Therefore, based on this specific sample of 20 students, we
estimate with 95% confidence that the population mean resting heart rate lies between 62.68 and
71.12 beats per minute.
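The interval can be checked with a few lines of Python. The critical value 2.093 is copied from a t table for 19 degrees of freedom rather than computed (with SciPy installed, `scipy.stats.t.ppf(0.975, 19)` gives the same value); with it the interval works out to about (62.68, 71.12), and small differences in the last decimal place come from how the critical value is rounded. The helper name is illustrative.

```python
from math import sqrt


def t_interval(mean, s, n, t_crit):
    """Confidence interval mean ± t_crit * s / sqrt(n)."""
    margin = t_crit * s / sqrt(n)
    return mean - margin, mean + margin


T_975_DF19 = 2.093  # two-sided 95% critical value, df = 19, from a t table
lo, hi = t_interval(66.9, 9.02, 20, T_975_DF19)
print(f"({lo:.2f}, {hi:.2f})")  # (62.68, 71.12)
```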
................
11. The standard hemoglobin reading for normal males of adult age is 15 g/100 ml. The
standard deviation is about 2.5 g/100 ml. For a group of 36 male construction workers, the
sample mean was 16 g/100 ml. Construct a 95% confidence interval for the mean hemoglobin level
of male construction workers. What is your interpretation of this interval relative to the normal
adult male population?
To construct a 95% confidence interval for the mean hemoglobin level of male construction
workers, we can use the formula:
CI = x̄ ± Z × (σ/√n)
Where:
- CI is the confidence interval
- x̄ is the sample mean (16 g/100 ml)
- Z is the critical value from the standard normal distribution corresponding to the desired
confidence level (95% confidence level corresponds to Z ≈ 1.96)
- σ is the population standard deviation (2.5 g/100 ml)
- n is the sample size (36)
Substituting: 16 ± 1.96 × (2.5/√36) = 16 ± 1.96 × 0.417 ≈ 16 ± 0.82. The 95% confidence
interval for the mean hemoglobin level of male construction workers is therefore approximately
(15.18 g/100 ml, 16.82 g/100 ml).
Interpretation: We can interpret the confidence interval as follows: Based on the sample of 36
male construction workers, we estimate with 95% confidence that the true mean hemoglobin
level of male construction workers falls within the interval of approximately 15.18 g/100 ml to
16.82 g/100 ml. Because the entire interval lies above the standard value for normal adult males
(15 g/100 ml), the data suggest that the mean hemoglobin level of male construction workers is
higher than that of the normal adult male population.
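A minimal sketch of the same calculation, assuming (as the question does) that σ = 2.5 g/100 ml is known so that the normal critical value 1.96 applies. The last line checks whether the whole interval lies above the 15 g/100 ml standard. The helper name is illustrative.

```python
from math import sqrt


def z_interval(mean, sigma, n, z_crit=1.96):
    """Confidence interval mean ± z_crit * sigma / sqrt(n) for known sigma."""
    margin = z_crit * sigma / sqrt(n)
    return mean - margin, mean + margin


lo, hi = z_interval(16, 2.5, 36)
print(f"({lo:.2f}, {hi:.2f})")  # (15.18, 16.82)
print(lo > 15)                  # True: the interval lies entirely above 15 g/100 ml
```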
............
12. Describe the differences between a one-tailed and a two-tailed test. Give examples of
when it would be appropriate to use a two-tailed test and when it would be appropriate to
use a one-tailed test.
...........
The difference between a one-tailed test and a two-tailed test lies in the directionality of the
alternative hypothesis. A one-tailed test is used when there is a specific expectation of the effect’s
direction, while a two-tailed test is used when the alternative hypothesis is non-directional,
allowing for differences in either direction. The choice between the two depends on the specific
research question and the prior expectations about the relationship being tested.
Example 1: Suppose a new medication is believed to increase the average score on a cognitive
test. The null hypothesis (H₀) would state that the medication has no effect, while the alternative
hypothesis (H₁) would state that the medication increases the average score. In this case, a one-
tailed test would be appropriate because we are specifically interested in determining if the
medication has a positive effect on the cognitive test scores.
Example 2: Consider a study investigating whether a new teaching method changes student
performance. The null hypothesis (H₀) would state that the teaching method has no effect, while
the alternative hypothesis (H₁) would state that the teaching method has a different effect, either
positive or negative. In this case, a two-tailed test would be appropriate because we want to
determine if there is a significant difference in student performance, regardless of whether it
improves or declines.
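To see how the choice affects a decision, the sketch below converts a hypothetical z statistic of 1.8 into one- and two-tailed p-values using the standard normal distribution from Python's `statistics.NormalDist`. At α = 0.05 the same statistic is significant under a one-tailed test (p ≈ 0.036) but not under a two-tailed test (p ≈ 0.072), because the two-tailed test splits α between both tails. The function name is illustrative.

```python
from statistics import NormalDist


def p_values(z):
    """(upper one-tailed, two-tailed) p-values for a standard normal statistic z."""
    upper_tail = 1 - NormalDist().cdf(z)             # H1: mean > hypothesized value
    two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))  # H1: mean != hypothesized value
    return upper_tail, two_tailed


one, two = p_values(1.8)  # hypothetical test statistic
print(f"one-tailed p = {one:.4f}, two-tailed p = {two:.4f}")
# one-tailed p = 0.0359, two-tailed p = 0.0719
```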
.........
13. It is known in a pharmacological experiment that rats fed with a particular diet over a
certain period gain an average of 40 gms in weight. A new diet was tried on a sample of 20
rats, yielding a mean weight gain of 43 gms with variance 7 gms². Test the hypothesis that the new
diet is an improvement assuming normality.
To test the hypothesis that the new diet is an improvement compared to the known average
weight gain of 40 grams, we can perform a hypothesis test using the sample data.
Let’s set up the null and alternative hypotheses:
Null Hypothesis (H₀): The new diet produces no improvement; the average weight gain is equal to
40 grams.
Alternative Hypothesis (H₁): The new diet is an improvement; the average weight gain is
greater than 40 grams.
Since the sample size is 20 and the population variance is unknown (it is estimated by the sample
variance, 7 gms²), we use a one-sample t-test. The test statistic for the one-sample t-test is given
by:
T = (x̄ − μ₀) / (s / √n)
Where:
- x̄ is the sample mean weight gain (43 gms)
- μ₀ is the hypothesized population mean weight gain under the null hypothesis (40 gms)
- s is the sample standard deviation (√(variance) = √7 gms)
- n is the sample size (20)
Calculating the test statistic:
T = (43 − 40) / (√7 / √20)
T = 3 / (√7 / √20)
T ≈ 3 / 0.5916
T ≈ 5.07
Next, we need to determine the critical value or p-value for the test. Since the alternative
hypothesis is one-sided (the average weight gain is greater), we will compare the test statistic to
the critical value from the t-distribution at the desired significance level.
Assuming a significance level of 0.05, the critical value for a one-sided test with 19 degrees of
freedom (n − 1) is approximately 1.729.
Since the calculated test statistic (5.07) is greater than the critical value (1.729), we have strong
evidence to reject the null hypothesis.
Therefore, we can conclude that based on the given sample, the new diet has shown a statistically
significant improvement in weight gain compared to the known average weight gain of 40
grams.
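The arithmetic above can be checked with a short sketch. The critical value 1.729 is copied from a t table rather than computed (with SciPy installed, `scipy.stats.t.ppf(0.95, 19)` returns it), and s is taken as the square root of the stated sample variance of 7. The helper name is illustrative.

```python
from math import sqrt


def one_sample_t(xbar, mu0, s, n):
    """One-sample t statistic: (xbar - mu0) / (s / sqrt(n))."""
    return (xbar - mu0) / (s / sqrt(n))


T_CRIT = 1.729  # upper 5% point of t with 19 df, taken from a t table
t = one_sample_t(xbar=43, mu0=40, s=sqrt(7), n=20)  # s = sqrt(sample variance 7)
print(round(t, 2), t > T_CRIT)  # 5.07 True
```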
.................
14. An epidemiologic study examined risk factors associated with pediatric AIDS. In a
small study of 30 cases and 30 controls, a positive history of substance abuse occurred
among 11 of the cases and 6 of the controls. Based on these data, can the investigator assert
that substance abuse is significantly associated with pediatric AIDS at the α = 0.05 level?
Compute the approximate 95% confidence interval for the difference between the
proportions of substance abuse found in the case and control groups.
To determine if substance abuse is significantly associated with pediatric AIDS at the α = 0.05
level, we can perform a hypothesis test for the difference in proportions between the case and
control groups.
Let’s define the following:
p1 = proportion of cases with a positive history of substance abuse
p2 = proportion of controls with a positive history of substance abuse
We want to test the null hypothesis (H₀) that there is no difference in the proportions of
substance abuse in the case and control groups against the alternative hypothesis (H₁) that there
is a difference.
H₀: p1 − p2 = 0
H₁: p1 − p2 ≠ 0 (two-tailed test)
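One way to complete the test and the interval is sketched below, assuming the usual large-sample normal approximation: a pooled standard error under H₀ for the z test and an unpooled standard error for the 95% confidence interval. With p1 = 11/30 ≈ 0.367 and p2 = 6/30 = 0.200, it gives z ≈ 1.43, a two-sided p-value of about 0.15, and a 95% CI for p1 − p2 of roughly (−0.057, 0.391). On that basis the association would not be significant at α = 0.05, consistent with the interval containing 0. The function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_test(x1, n1, x2, n2, conf=0.95):
    """Large-sample z test and confidence interval for p1 - p2."""
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    # Test statistic: pooled proportion under H0: p1 - p2 = 0
    pooled = (x1 + x2) / (n1 + n2)
    se0 = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = diff / se0
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Confidence interval: unpooled standard error
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z_crit = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # 1.96 for 95%
    return z, p_value, (diff - z_crit * se, diff + z_crit * se)


z, p, (lo, hi) = two_proportion_test(11, 30, 6, 30)  # cases: 11/30, controls: 6/30
print(f"z = {z:.2f}, p = {p:.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```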
THANK YOU!