
Biostatistics Assessment

The document discusses five application areas of biostatistics for medical laboratory science: quality control, diagnostic accuracy assessment, epidemiological studies, reference range determination, and clinical trial design and analysis. It also answers questions about measures of central tendency, box-and-whisker plots, sample size and selection, probability, Poisson distribution, standard error, and properties of estimators.

Uploaded by Abel Christos
Copyright © All Rights Reserved

1. Write at least five application areas of biostatistics specifically for Medical Laboratory Science.

1. Quality control: Biostatistics is instrumental in ensuring the quality and accuracy of
laboratory testing procedures. It helps in designing robust quality control measures,
analyzing control data, and determining if the laboratory’s performance meets acceptable
standards. Biostatistical methods such as control charts and regression analysis are
commonly used for this purpose.
2. Diagnostic accuracy assessment: Biostatistics plays a crucial role in evaluating the
accuracy and reliability of diagnostic tests used in medical laboratory science. It helps in
determining sensitivity, specificity, positive predictive value, negative predictive value,
and likelihood ratios of various tests. Biostatistical techniques such as receiver operating
characteristic (ROC) curve analysis and calculation of diagnostic test performance
indices aid in evaluating and comparing different test methods.
3. Epidemiological studies: Biostatistics is essential for conducting epidemiological studies
within the field of medical laboratory science. It helps in designing study protocols,
calculating sample sizes, analyzing data, and drawing conclusions. Study designs such as
case-control studies, cohort studies, and cross-sectional surveys, analyzed with biostatistical
methods, are used to investigate the relationship between laboratory parameters and disease outcomes.
4. Reference range determination: Biostatistical methods are employed in establishing
reference ranges for laboratory tests. These ranges provide a benchmark against which
individual patient results can be compared to identify abnormalities. Biostatistics helps in
determining the mean, standard deviation, and percentiles of test results in a healthy
population, considering factors such as age, sex, and ethnicity.
5. Clinical trial design and analysis: Biostatistics is indispensable in the design, conduct,
and analysis of clinical trials conducted in medical laboratory science. It helps in
determining sample sizes, randomization methods, and statistical analysis plans.
Biostatistical techniques such as hypothesis testing, analysis of variance (ANOVA),
survival analysis, and regression analysis are used to assess the efficacy and safety of new
laboratory techniques, drugs, or interventions.
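Two of the calculations mentioned above, diagnostic accuracy indices (item 2) and a reference interval (item 4), can be sketched in a few lines of Python. This is a minimal illustration, not a validated laboratory procedure: the 2x2 counts and the 120 "healthy" results are hypothetical (the latter are simulated), and the reference interval is the conventional central 95% of results. Note that PPV and NPV depend on disease prevalence, so values from one study population do not transfer to a setting with a different prevalence.

```python
import random
import statistics

def diagnostic_indices(tp, fp, fn, tn):
    """Item 2: accuracy indices from a 2x2 table (test result vs. reference standard)."""
    sens = tp / (tp + fn)  # P(test positive | disease present)
    spec = tn / (tn + fp)  # P(test negative | disease absent)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": tp / (tp + fp),  # P(disease | test positive)
        "npv": tn / (tn + fn),  # P(no disease | test negative)
        "lr_positive": sens / (1 - spec),
        "lr_negative": (1 - sens) / spec,
    }

def reference_interval(values):
    """Item 4: central 95% reference interval from results of healthy reference subjects.

    Returns a parametric interval (mean +/- 1.96 SD, sensible only for roughly normal
    results) and a nonparametric one (2.5th and 97.5th percentiles).
    """
    mean, sd = statistics.fmean(values), statistics.stdev(values)
    # n=40 gives cut points at 2.5%, 5%, ..., 97.5%; "exclusive" ranks at p*(n+1)
    cuts = statistics.quantiles(values, n=40, method="exclusive")
    return (mean - 1.96 * sd, mean + 1.96 * sd), (cuts[0], cuts[-1])

# Hypothetical diagnostic study: 90 TP, 20 FP, 10 FN, 180 TN
print(diagnostic_indices(tp=90, fp=20, fn=10, tn=180))

# Hypothetical results for 120 healthy reference subjects (simulated, not real data)
rng = random.Random(2024)
healthy = [rng.gauss(5.0, 0.5) for _ in range(120)]
parametric, nonparametric = reference_interval(healthy)
print("parametric:", parametric, "nonparametric:", nonparametric)
```

The nonparametric interval makes no distributional assumption, which is why guidelines commonly cite at least 120 reference individuals for it; the parametric version needs fewer subjects but is only appropriate when the results (or a transformation of them) are approximately normal.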

.................
2. A. What is meant by a measure of location?
B. If you have an open-ended continuous series, which measures of central
tendency do you recommend to represent this series?
C. Why?
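A. A measure of location is a single value that describes where a distribution is positioned on
the measurement scale, most often its center (the mean, median, or mode); quantiles such as
quartiles and percentiles are also measures of location.
B. An open-ended continuous series is a grouped frequency distribution whose first class has no
lower limit or whose last class has no upper limit (e.g., "under 20" or "60 and above"). For such
a series, the median and the mode are the recommended measures of central tendency.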
1. Median: The median is the middle value of a dataset when it is arranged in ascending or
descending order.
2. Mode: The mode represents the most frequently occurring value or values in a dataset. It
is especially useful when there are distinct peaks or modes in the data distribution. The
mode can provide insights into the dominant values or categories in the open-ended
continuous series.
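C. The median and mode are recommended because they can be computed without knowing the limits
of the open-ended classes: the median depends only on the class containing the middle observation,
and the mode only on the class with the highest frequency and its neighbors, and these are rarely
the open classes at the extremes. The mean, by contrast, needs the midpoint of every class, and an
open-ended class has no midpoint unless one is assumed. The median and mode are also less
sensitive to extreme values and outliers, which tend to be lumped into the open classes, so they
avoid the distortion that such values cause in the mean.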
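To show why the median and mode still work, here is a minimal Python sketch of the usual grouped-data formulas, Median = L + ((n/2 − CF)/f) × h and Mode = L + ((f1 − f0)/(2f1 − f0 − f2)) × h, where L is the lower limit of the median (or modal) class, CF the cumulative frequency before it, f or f1 its frequency, f0 and f2 the frequencies of the neighboring classes, and h the class width. The distribution is hypothetical, with an open first class ("under 20") and an open last class ("50 and over"); neither formula touches the limits of those classes, whereas the mean would need a midpoint for each of them.

```python
def grouped_median_mode(classes, width):
    """Median and mode of a grouped frequency distribution.

    classes: list of (lower_limit, frequency) pairs in ascending order. An open first
    class may use None as its lower limit; this works as long as neither the median
    class nor the modal class is open-ended. All closed classes share the same width.
    """
    freqs = [f for _, f in classes]
    n = sum(freqs)

    # Median: the first class whose cumulative frequency reaches n/2
    cum = 0  # cumulative frequency of the classes before the current one
    for lower, f in classes:
        if cum + f >= n / 2:
            median = lower + (n / 2 - cum) / f * width
            break
        cum += f

    # Mode: the class with the largest frequency, interpolated using its neighbors
    k = freqs.index(max(freqs))
    f1 = freqs[k]
    f0 = freqs[k - 1] if k > 0 else 0
    f2 = freqs[k + 1] if k + 1 < len(freqs) else 0
    mode = classes[k][0] + (f1 - f0) / (2 * f1 - f0 - f2) * width
    return median, mode

# Hypothetical distribution: under 20, 20-30, 30-40, 40-50, 50 and over
table = [(None, 5), (20, 12), (30, 20), (40, 9), (50, 4)]
median, mode = grouped_median_mode(table, width=10)
print(f"median = {median:.2f}, mode = {mode:.2f}")  # → median = 34.00, mode = 34.21
```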

................
3. If we have a frequency distribution which is almost, but not quite, symmetrical and
whose mean and mode are 27 and 29 respectively, what will be the appropriate
value of the median?
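No answer is recorded for this question. For a distribution that is only moderately skewed, the standard textbook approach is Karl Pearson's empirical relation between the mean, median, and mode (an approximation, not an exact identity). Under that assumption:

```latex
\text{Mode} \approx 3\,\text{Median} - 2\,\text{Mean}
\quad\Longrightarrow\quad
\text{Median} \approx \frac{\text{Mode} + 2\,\text{Mean}}{3}
= \frac{29 + 2(27)}{3} = \frac{83}{3} \approx 27.67
```

As expected for a slightly negatively skewed distribution (mean < median < mode), the median lies between the mean and the mode.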
.............
4. How does one plot a box-and-whisker plot? What are the advantages of this type of
plot? What additional information does this type of display give that is not available from
either a bar graph or a stem-and-leaf plot?
To plot a box-and-whisker plot, follow these steps:
1. Arrange your dataset in ascending order.
2. Determine the median, which is the middle value of the dataset. If the dataset has an odd
number of values, the median is the middle value. If the dataset has an even number of values,
the median is the average of the two middle values.
3. Calculate the lower quartile (Q1), which is the median of the lower half of the dataset. It
represents the point below which 25% of the data falls.
4. Calculate the upper quartile (Q3), which is the median of the upper half of the dataset. It
represents the point below which 75% of the data falls.
5. Calculate the interquartile range (IQR), which is the difference between Q3 and Q1.
6. Identify any outliers in the dataset. Outliers are values that fall below Q1 – 1.5 * IQR or above
Q3 + 1.5 * IQR.
7. Draw a number line or horizontal axis and mark the position of the median, Q1, and Q3. Draw
a box from Q1 to Q3, with a line inside the box at the position of the median.
8. Draw lines, called whiskers, from the box to the minimum and maximum values within the
range of Q1 – 1.5 * IQR to Q3 + 1.5 * IQR. Any outliers beyond this range can be shown as
individual points or asterisks.
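The steps above can be sketched in Python. Quartiles here follow the "median of each half" rule from steps 3 and 4 (excluding the overall median from both halves when n is odd); statistical packages use several other quartile definitions, so their Q1 and Q3 can differ slightly. The data are hypothetical.

```python
import statistics

def box_plot_stats(data):
    """Quartiles, 1.5*IQR fences, whisker ends and outliers for a box-and-whisker plot."""
    x = sorted(data)                      # step 1: order the data
    n = len(x)
    median = statistics.median(x)         # step 2
    lower_half = x[: n // 2]              # excludes the middle value when n is odd
    upper_half = x[(n + 1) // 2 :]
    q1 = statistics.median(lower_half)    # step 3
    q3 = statistics.median(upper_half)    # step 4
    iqr = q3 - q1                         # step 5
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr   # step 6
    inside = [v for v in x if low_fence <= v <= high_fence]
    outliers = [v for v in x if v < low_fence or v > high_fence]
    return {
        "q1": q1, "median": median, "q3": q3, "iqr": iqr,
        "whisker_low": inside[0], "whisker_high": inside[-1],  # step 8: whisker ends
        "outliers": outliers,                                  # plotted as points
    }

print(box_plot_stats([2, 4, 4, 5, 6, 7, 8, 9, 10, 30]))
```

For this sample, Q1 = 4, median = 6.5, Q3 = 9 and IQR = 5, so the fences are −3.5 and 16.5: the upper whisker stops at 10 and the value 30 is drawn as an individual outlier.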

Advantages of a box-and-whisker plot include:


Summary of distribution: A box-and-whisker plot provides a concise summary of the
distribution of a dataset. It shows the minimum, maximum, median, quartiles, and outliers,
allowing for a quick understanding of the data’s spread and central tendency.
Comparison between groups: Box-and-whisker plots are effective for comparing multiple
groups or datasets side by side. By plotting multiple boxes on the same axis, you can visually
compare the quartiles, medians, and ranges of different groups, making it easier to identify
differences or similarities.
Identification of outliers: Box-and-whisker plots explicitly highlight outliers, making it easy
to identify values that deviate significantly from the rest of the data. This can be valuable for
identifying potential errors or unusual observations.
Compared to bar graphs or stem-and-leaf plots, a box-and-whisker plot provides additional
information:
- It gives a better visual representation of the spread of the data, including the range, quartiles,
and outliers, whereas a bar graph typically only shows the mean or count of each category.
- It provides a clear indication of the median and quartiles, which stem-and-leaf plots do not
display as explicitly.
...............
5. What role does sample size play in the accuracy of statistical inference? Why is the
method of selecting the sample even more important than the size of the sample?
Sample size is important in statistical inference because it affects the accuracy of our
conclusions about a population. A larger sample size provides more information and reduces
sampling error, leading to more precise estimates and inferences.
However, the method of selecting the sample is even more important than the sample size. A larger
sample reduces only random sampling error; it does not remove bias. The sample should therefore be
chosen in a way that is unbiased and representative of the population, ideally by a random
(probability) method. If the sample is not representative, sampling bias is introduced, and
increasing the sample size then only gives a more precise estimate of the wrong value. Therefore,
it is crucial to use a proper sampling method to ensure the sample is representative of the
population being studied.
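A small simulation can illustrate both points. It is a sketch under made-up assumptions: a hypothetical population whose values follow Normal(100, 15), and a biased scheme in which only individuals with values above 90 ever enter the sample (for example, through self-selection). As n grows, the random-sample mean settles near the true value of 100, while the biased-sample mean settles near a wrong value (about 106) no matter how large the sample is.

```python
import random
import statistics

def compare_sampling(n, seed=42):
    """Mean of a random sample vs. a biased sample of the same size (hypothetical data)."""
    rng = random.Random(seed)
    # Random sampling: every member of the Normal(100, 15) population can be drawn
    random_sample = [rng.gauss(100, 15) for _ in range(n)]
    # Biased sampling: only individuals with values above 90 ever enter the sample
    biased_sample = []
    while len(biased_sample) < n:
        v = rng.gauss(100, 15)
        if v > 90:
            biased_sample.append(v)
    return statistics.fmean(random_sample), statistics.fmean(biased_sample)

for n in (20, 200, 2000, 20000):
    r, b = compare_sampling(n)
    print(f"n={n:>5}: random-sample mean {r:6.2f}, biased-sample mean {b:6.2f}")
```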

............

6. In a certain population of women 3 percent have had breast cancer, 15 percent are
smokers, and 4 percent are smokers and have had breast cancer. A woman is selected at
random from the population. What is the probability that she has had breast cancer or
smokes or both?
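No worked answer is given above. The question calls for the general addition rule, P(C or S) = P(C) + P(S) − P(C and S), where C is "has had breast cancer" and S is "smokes". As printed, however, the figures are inconsistent: the 4 percent who both smoke and have had breast cancer cannot exceed the 3 percent who have had breast cancer, so one of the percentages appears to be mistyped and should be confirmed before a numerical answer is given. The sketch below implements the rule with that consistency check; the second call uses hypothetical figures.

```python
def prob_union(p_a, p_b, p_both):
    """P(A or B or both) by the general addition rule: P(A) + P(B) - P(A and B)."""
    # P(A and B) cannot exceed P(A) or P(B), and the result cannot exceed 1
    if not 0 <= p_both <= min(p_a, p_b) or p_a + p_b - p_both > 1:
        raise ValueError("inconsistent probabilities")
    return p_a + p_b - p_both

# The figures as printed fail the check, because 0.04 > 0.03
try:
    prob_union(0.03, 0.15, 0.04)
except ValueError as err:
    print("rejected:", err)

# Hypothetical consistent figures: P(A) = 0.20, P(B) = 0.30, P(A and B) = 0.10
print(prob_union(0.20, 0.30, 0.10))
```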
..............
7. Assume the number of episodes per year of otitis media, a common disease of the middle
ear in early childhood, follows a Poisson distribution with parameter λ = 1.6 episodes per
year. Find the probability of getting 3 or more episodes of otitis media in the first 2 years
of life.
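No worked answer is given above. Assuming episodes occur independently at a constant rate, the number of episodes in the first 2 years is Poisson with mean μ = 2 × 1.6 = 3.2, so P(X ≥ 3) = 1 − [P(X = 0) + P(X = 1) + P(X = 2)] = 1 − e^(−3.2)(1 + 3.2 + 3.2²/2). A short Python check of this arithmetic:

```python
import math

def poisson_pmf(k, mu):
    """P(X = k) for a Poisson random variable with mean mu."""
    return math.exp(-mu) * mu ** k / math.factorial(k)

rate_per_year = 1.6
years = 2
mu = rate_per_year * years  # expected episodes in the first 2 years = 3.2

p_at_least_3 = 1 - sum(poisson_pmf(k, mu) for k in range(3))
print(f"P(X >= 3) = {p_at_least_3:.4f}")  # → P(X >= 3) = 0.6201
```

So there is roughly a 62% chance of 3 or more episodes in the first 2 years of life.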

.........
8. What is the difference between the population standard deviation and the standard
error of the mean?

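The population standard deviation (σ) is the square root of the population variance and describes
the spread of individual observations. The standard error of the mean is the standard deviation of
the sampling distribution of the sample mean and describes how much sample means vary from sample
to sample. For a random sample of n independent observations, SE = σ/√n, so it is smaller than the
population standard deviation by a factor of 1/√n; when σ is unknown, it is estimated by s/√n.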
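A quick simulation makes the 1/√n relationship concrete. Assuming hypothetical normal data with σ = 10 and samples of n = 25, the standard deviation of many simulated sample means should come out close to σ/√n = 2, even though individual observations vary with a standard deviation of 10.

```python
import random
import statistics

def se_simulation(sigma=10.0, n=25, reps=20000, seed=1):
    """Return (SD of simulated sample means, theoretical sigma / sqrt(n))."""
    rng = random.Random(seed)
    means = [statistics.fmean([rng.gauss(0, sigma) for _ in range(n)]) for _ in range(reps)]
    return statistics.stdev(means), sigma / n ** 0.5

simulated, theoretical = se_simulation()
print(f"SD of sample means: {simulated:.3f}; sigma/sqrt(n): {theoretical:.3f}")
```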

..............
9. What are the desirable properties of an estimator of a population parameter?
An estimator is a statistic used to estimate or infer a population parameter based on sample data.
Desirable properties of an estimator include:
1. Unbiasedness: An estimator is unbiased if, on average, it produces estimates that are
equal to the true population parameter. In other words, the expected value of the estimator
is equal to the population parameter. Unbiased estimators are desirable because they
provide accurate estimates without systematic overestimation or underestimation.
2. Efficiency: Among unbiased estimators, the efficient one is the one with the smallest variance,
so its estimates cluster most tightly around the true value. Efficiency is important because, for
a given sample size, it gives the most precise information about the population parameter.
3. Consistency: A consistent estimator is one that converges to the true population
parameter as the sample size increases. In other words, as the sample size grows, the
estimates produced by the estimator become increasingly closer to the true value.
Consistency is desirable because it ensures that with larger samples, the estimator
provides more accurate and reliable estimates.
4. Sufficiency: A sufficient estimator captures all the information in the sample data that is
relevant to the population parameter: once its value is known, the rest of the data carry no
further information about that parameter (for example, the sample mean is sufficient for the mean
of a normal distribution with known variance). Sufficiency is desirable because it allows the data
to be reduced to a simple summary without losing information about the parameter.
5. Robustness: A robust estimator is one that is not greatly affected by outliers or violations
of assumptions. It means that the estimator’s performance remains stable even in the
presence of deviations from the underlying assumptions.
Robust estimators are desirable because they provide reliable estimates even in situations
where the data may not strictly adhere to the assumptions of the estimation method.
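Unbiasedness and efficiency can be checked by simulation. The sketch below assumes normally distributed data with hypothetical values μ = 0, σ = 2 and n = 5. The variance computed with divisor n averages about σ²(n − 1)/n = 3.2 (biased), while the divisor n − 1 version averages about σ² = 4 (unbiased). Both the sample mean and the sample median are unbiased for μ here, but the mean has the smaller variance, so it is the more efficient of the two; the median's advantage is robustness to outliers, which this sketch does not test.

```python
import random
import statistics

def estimator_simulation(n=5, reps=10000, mu=0.0, sigma=2.0, seed=7):
    """Monte Carlo look at bias (variance estimators) and efficiency (mean vs. median)."""
    rng = random.Random(seed)
    var_n, var_n1, means, medians = [], [], [], []
    for _ in range(reps):
        x = [rng.gauss(mu, sigma) for _ in range(n)]
        var_n.append(statistics.pvariance(x))  # divisor n: biased for sigma**2
        var_n1.append(statistics.variance(x))  # divisor n - 1: unbiased for sigma**2
        means.append(statistics.fmean(x))
        medians.append(statistics.median(x))
    return {
        "avg_var_div_n": statistics.fmean(var_n),           # about sigma**2 * (n - 1) / n
        "avg_var_div_n_minus_1": statistics.fmean(var_n1),  # about sigma**2
        "var_of_mean": statistics.variance(means),          # smaller: mean is more efficient
        "var_of_median": statistics.variance(medians),
    }

for name, value in estimator_simulation().items():
    print(f"{name}: {value:.3f}")
```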

.....................
10. Suppose we randomly select 20 students enrolled in an introductory course in
biostatistics and measure their resting heart rates. We obtain a mean of 66.9 (s =
9.02). Calculate a 95% confidence interval for the population mean and give an
interpretation of the interval you obtain.

To calculate a 95% confidence interval for the population mean resting heart rate, we can use
the t-distribution since the sample size is relatively small (n = 20) and the population standard
deviation is unknown.

Given:
Sample mean (x̄) = 66.9
Sample standard deviation (s) = 9.02
Sample size (n) = 20
Degrees of freedom = n - 1 = 19, so the two-sided 95% critical value is t(0.025, 19) ≈ 2.093.

CI = x̄ ± t * (s/√n)
CI = 66.9 ± 2.093 * (9.02/√20)
CI = 66.9 ± 2.093 * 2.017
CI ≈ 66.9 ± 4.22

The 95% confidence interval for the population mean resting heart rate is approximately (62.68,
71.12). This means that we are 95% confident that the true population mean resting heart rate
falls within this interval.

Interpretation: If we were to repeat the sampling process many times and construct a 95%
confidence interval each time, approximately 95% of these intervals would contain the true
population mean resting heart rate. Therefore, based on this specific sample of 20 students, we
estimate with 95% confidence that the population mean resting heart rate lies between 62.68 and
71.12 beats per minute.
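The calculation can be reproduced in a few lines of Python. The standard library has no t quantile function, so the critical value t(0.025, 19) ≈ 2.093 is hard-coded from a t table; where SciPy is available, `scipy.stats.t.ppf(0.975, 19)` returns the same value.

```python
import math

# Summary statistics from the question
mean, s, n = 66.9, 9.02, 20
t_crit = 2.093  # two-sided 95% critical value of t with df = 19, from a t table

se = s / math.sqrt(n)  # estimated standard error of the mean
margin = t_crit * se
lower, upper = mean - margin, mean + margin
print(f"SE = {se:.3f}, margin = {margin:.3f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```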
................
11. The standard hemoglobin reading for normal males of adult age is 15 g/100 ml. The
standard deviation is about 2.5 g/100 ml. For a group of 36 male construction workers, the
sample mean was 16 g/100 ml. Construct a 95% confidence interval for the male
construction workers. What is your interpretation of this interval relative to the normal
adult male population?
To construct a 95% confidence interval for the mean hemoglobin level of male construction
workers, we can use the formula:
CI = x̄ ± (Z * (σ/√n))
Where:
- CI is the confidence interval
- x̄ is the sample mean (16 g/100 ml)
- Z is the critical value from the standard normal distribution corresponding to the desired
confidence level (95% confidence level corresponds to Z ≈ 1.96)
- σ is the population standard deviation (2.5 g/100 ml)
- n is the sample size (36)

Plugging in the values:


CI = 16 ± (1.96 * (2.5/√36))
CI = 16 ± (1.96 * (2.5/6))
CI = 16 ± (1.96 * 0.4167)
CI ≈ 16 ± 0.8167

The 95% confidence interval for the mean hemoglobin level of male construction workers is
approximately (15.18 g/100 ml, 16.82 g/100 ml).

Interpretation: Based on the sample of 36 male construction workers, we estimate with 95%
confidence that the true mean hemoglobin level of male construction workers lies between
approximately 15.18 g/100 ml and 16.82 g/100 ml. Because the entire interval lies above the
standard value for normal adult males (15 g/100 ml), the data suggest that the mean hemoglobin
level of male construction workers is higher than that of the normal adult male population.
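A minimal check of the interval in Python, treating σ = 2.5 g/100 ml as known (as the question does) and using the standard library's `statistics.NormalDist` for the 1.96 critical value:

```python
import math
from statistics import NormalDist

xbar, sigma, n = 16.0, 2.5, 36  # g/100 ml; sigma treated as known
reference_mean = 15.0           # standard value for normal adult males

z = NormalDist().inv_cdf(0.975)  # about 1.96 for a two-sided 95% interval
margin = z * sigma / math.sqrt(n)
lower, upper = xbar - margin, xbar + margin
print(f"95% CI: ({lower:.2f}, {upper:.2f}) g/100 ml")
print("Reference value inside interval:", lower <= reference_mean <= upper)
```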
............

12. Describe the differences between a one-tailed and a two-tailed test. Give examples of
when it would be appropriate to use a two-tailed test and when it would be appropriate to
use a one-tailed test.
...........
The difference between a one-tailed test and a two-tailed test lies in the directionality of the
alternative hypothesis. A one-tailed test is used when there is a specific expectation about the
direction of the effect, and it places the entire rejection region (all of α) in one tail of the
distribution. A two-tailed test is used when the alternative hypothesis is non-directional,
allowing for differences in either direction, and it splits α equally between the two tails. The
choice between the two depends on the specific research question and the prior expectations about
the relationship being tested, and it should be made before the data are examined.
Example 1: Suppose a new medication is believed to increase the average score on a cognitive
test. The null hypothesis (H₀) would state that the medication has no effect, while the alternative
hypothesis (H₁) would state that the medication increases the average score. In this case, a
one-tailed test would be appropriate because we are specifically interested in determining whether
the medication has a positive effect on the cognitive test scores.

Example 2: Consider a study investigating whether a new teaching method changes student
performance. The null hypothesis (H₀) would state that the teaching method has no effect, while
the alternative hypothesis (H₁) would state that it has an effect, either positive or negative. In
this case, a two-tailed test would be appropriate because we want to detect a significant
difference in student performance regardless of whether performance improves or declines.
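The practical difference shows up in the p-value. For a standard normal test statistic, the two-tailed p-value is twice the one-tailed one, so the same result can be significant with a one-tailed test but not with a two-tailed test. A sketch with a hypothetical observed z = 1.80:

```python
from statistics import NormalDist

def p_values(z):
    """Upper one-tailed and two-tailed p-values for a standard normal test statistic z."""
    upper_tail = 1 - NormalDist().cdf(z)             # H1: parameter > null value
    two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))  # H1: parameter != null value
    return upper_tail, two_tailed

one, two = p_values(1.80)  # hypothetical observed z
print(f"one-tailed p = {one:.4f}, two-tailed p = {two:.4f}")
```

Here the one-tailed p (about 0.036) is below 0.05 while the two-tailed p (about 0.072) is not, which is why the direction of the test must be fixed in advance rather than chosen after seeing the data.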

.........
13. It is known in a pharmacological experiment that rats fed with a particular diet over a
certain period gain an average of 40 gms in weight. A new diet was tried on a sample of 20
rats, yielding a mean weight gain of 43 gms with variance 7 gms². Test the hypothesis that the new
diet is an improvement, assuming normality.

To test the hypothesis that the new diet is an improvement compared to the known average
weight gain of 40 grams, we can perform a hypothesis test using the sample data.
Let’s set up the null and alternative hypotheses:
Null Hypothesis (H₀): The new diet has no improvement, and the average weight gain is equal to
40 grams.
Alternative Hypothesis (H₁): The new diet has an improvement, and the average weight gain is
greater than 40 grams.
Since the sample size is small (n = 20) and the population variance is unknown (7 gms² is the
sample variance, an estimate), we use a one-sample t-test. The test statistic for the one-sample
t-test is given by:
T = (x̄ - μ₀) / (s / √n)
Where:
- x̄ is the sample mean weight gain (43 gms)
- μ₀ is the hypothesized population mean weight gain under the null hypothesis (40 gms)
- s is the sample standard deviation (√(variance) = √7 gms)
- n is the sample size (20)
Calculating the test statistic:
T = (43 – 40) / (√7 / √20)
T = 3 / (√7 / √20)
T ≈ 3 / 0.5916
T ≈ 5.07

Next, we need to determine the critical value or p-value for the test. Since the alternative
hypothesis is one-sided (the average weight gain is greater), we will compare the test statistic to
the critical value from the t-distribution at the desired significance level.
Assuming a significance level of 0.05, the critical value for a one-sided test with 19 degrees of
freedom (n-1) is approximately 1.729.
Since the calculated test statistic (5.07) is greater than the critical value (1.729), we have strong
evidence to reject the null hypothesis.
Therefore, we can conclude that based on the given sample, the new diet has shown a statistically
significant improvement in weight gain compared to the known average weight gain of 40
grams.
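The test can be reproduced as follows, assuming the stated 7 gms² is the usual sample variance (divisor n − 1). The one-sided 5% critical value t(0.05, 19) ≈ 1.729 is hard-coded from a t table because the Python standard library has no t quantile function.

```python
import math

# Summary data from the question
xbar, mu0, variance, n = 43.0, 40.0, 7.0, 20
t_crit = 1.729  # one-sided 5% critical value of t with df = 19, from a t table

s = math.sqrt(variance)  # sample standard deviation
t_stat = (xbar - mu0) / (s / math.sqrt(n))
print(f"t = {t_stat:.2f} on {n - 1} df; reject H0: {t_stat > t_crit}")
```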
.................
14. An epidemiologic study examined risk factors associated with pediatric AIDS. In a
small study of 30 cases and 30 controls, a positive history of substance abuse occurred
among 11 of the cases and 6 of the controls. Based on these data, can the investigator assert
that substance abuse is significantly associated with pediatric AIDS at the α = 0.05 level?
Compute the approximate 95% confidence interval for the difference between the
proportions of substance abuse found in the case and control groups.

To determine if substance abuse is significantly associated with pediatric AIDS at the α = 0.05
level, we can perform a hypothesis test for the difference in proportions between the case and
control groups.
Let’s define the following:
p1 = proportion of cases with a positive history of substance abuse
p2 = proportion of controls with a positive history of substance abuse
We want to test the null hypothesis (H₀) that there is no difference in the proportions of
substance abuse in the case and control groups against the alternative hypothesis (H₁) that there
is a difference.
H₀: p1 – p2 = 0
H₁: p1 – p2 ≠ 0 (two-tailed test)

Using the given data:


Cases (n1) = 30, with 11 having a positive history of substance abuse
Controls (n2) = 30, with 6 having a positive history of substance abuse
The sample proportions are:
p̂1 = 11/30 ≈ 0.3667
p̂2 = 6/30 = 0.2
To test the hypothesis, we can use the z-test for the difference in proportions. The test statistic is
given by:
Z = (p̂1 – p̂2) / √((p̄(1 – p̄)/n1) + (p̄(1 – p̄)/n2))
Where p̄ is the pooled proportion:
p̄ = (x1 + x2) / (n1 + n2), with x1 = 11 and x2 = 6 the numbers with a positive history
Calculating the test statistic:
p̄ = (11 + 6) / (30 + 30) = 17 / 60 = 0.2833
Z = (0.3667 – 0.2) / √((0.2833 * (1 – 0.2833)/30) + (0.2833 * (1 – 0.2833)/30))
Z ≈ 0.1667 / √(0.00677 + 0.00677)
Z ≈ 0.1667 / √0.01354
Z ≈ 0.1667 / 0.1163
Z ≈ 1.43
Next, we compare the test statistic with the critical values. Since the alternative hypothesis is
two-sided, the significance level α = 0.05 is split equally between the two tails (0.025 in each),
so the critical z-values are approximately -1.96 and +1.96.
Since the calculated test statistic (1.43) lies between -1.96 and +1.96, we do not have sufficient
evidence to reject the null hypothesis (two-sided p ≈ 0.15).
Therefore, based on these data, we cannot assert that substance abuse is significantly associated
with pediatric AIDS at the α = 0.05 level.
To compute the approximate 95% confidence interval for the difference between the proportions
of substance abuse found in the case and control groups, we can use the formula:
CI = (p̂1 – p̂2) ± Z * √((p̂1 * (1 – p̂1)/n1) + (p̂2 * (1 – p̂2)/n2))
Where Z is the critical value from the standard normal distribution corresponding to the desired
confidence level (95% confidence level corresponds to Z ≈ 1.96).
Plugging in the values:
CI = (0.3667 – 0.2) ± 1.96 * √((0.3667 * (1 – 0.3667)/30) + (0.2 * (1 – 0.2)/30))
CI = 0.1667 ± 1.96 * √(0.00774 + 0.00533)
CI ≈ 0.1667 ± 1.96 * √0.01307
CI ≈ 0.1667 ± 1.96 * 0.1143
CI ≈ 0.1667 ± 0.224
The 95% confidence interval for the difference in proportions is approximately (-0.057,
0.391).
This interval suggests that, with 95% confidence, the proportion with a history of substance abuse
among cases could be anywhere from about 5.7 percentage points lower to about 39.1 percentage
points higher than among controls. Because the interval includes zero, we cannot conclude that
there is a significant difference in substance abuse between the two groups based on this study,
which agrees with the result of the hypothesis test.
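Both calculations can be reproduced with the Python standard library. The sketch uses the pooled standard error for the z-test (as above) and the unpooled standard error for the confidence interval. With only 30 subjects per group the normal approximation is rough, and an exact method such as Fisher's exact test is a common cross-check.

```python
import math
from statistics import NormalDist

def two_proportion_test(x1, n1, x2, n2, conf=0.95):
    """Pooled z-test for p1 - p2 and an unpooled (Wald) confidence interval."""
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    pooled = (x1 + x2) / (n1 + n2)
    se_pooled = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))  # SE under H0
    z_stat = diff / se_pooled
    p_value = 2 * (1 - NormalDist().cdf(abs(z_stat)))                 # two-sided
    z_crit = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    se_wald = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)      # SE for the CI
    ci = (diff - z_crit * se_wald, diff + z_crit * se_wald)
    return z_stat, p_value, ci

z, p, (lo, hi) = two_proportion_test(11, 30, 6, 30)
print(f"z = {z:.2f}, two-sided p = {p:.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```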

THANK YOU!
