
1 - Practice Exercise 1 Data Descriptives

The document contains practice exercises for a biostatistics course, focusing on data types, descriptive statistics, and various statistical calculations. It includes questions on identifying data types, calculating measures such as median, mean, standard deviation, and variance, as well as interpreting data from surveys and histograms. The exercises also cover concepts like z-scores and weighted means, providing a comprehensive overview of statistical analysis techniques.

Uploaded by

strasark

HESY 5007 Biostatistics

PRACTICE EXERCISE 1
DATA TYPE & DESCRIPTIVE STATISTICS
Question 1 – Data

Identify each of the following as quantitative data or categorical data.


a) The platelet counts of exam subjects in the dataset above.
Quantitative data
b) The names of the pharmaceutical companies that manufacture aspirin tablets
Categorical data
c) The colors of pills
Categorical data
d) The weights of aspirin tablets
Quantitative data

Question 2 – Data
Which of the following describe discrete data?
a) The numbers of people surveyed in each of the next several National Health and
Nutrition Examination Surveys
Discrete
b) The exact foot lengths (cm) of a random sample of statistics students
Not Discrete (Continuous)
c) The exact times that randomly selected drivers spend texting while driving during the
past 7 days
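Not Discrete (Continuous): time is measured on a continuous scale, so exact times can take any value in an interval.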

Question 3
In a survey of 1020 adults in the United States, 44% said that they wash
their hands after riding public transportation (based on data from KRC Research).
a) Identify the sample and the population.
The population consists of all adults in the United States and the sample is the 1020
adults who were surveyed.
b) Is the value of 44% a statistic or a parameter? Why?
It is a statistic, because it is calculated from the sample rather than from the whole population.
c) What is the level of measurement of the value of 44%? (Nominal, ordinal, interval,
ratio)
Ratio level of measurement
d) Are the numbers of subjects in such surveys discrete or continuous?
Discrete
e) The responses are “yes,” “no,” “not sure,” or “refused to answer.” Are these responses
quantitative data or categorical data?
Categorical data
Total 6 marks

Question 4
The data below shows the ages of patients in an optometry clinic on a certain day.
24 27 32 23 35 34 28 40 28 29 45 51 24 33 42
22 34 21 34 56 38 29 41 44 27 30 63 30 39 49

a) Draw a stem-and-leaf diagram to display these data.


Order the data in ascending order:
21 22 23 24 24 27 27 28 28 29 29 30 30 32 33
34 34 34 35 38 39 40 41 42 44 45 49 51 56 63

Age Stem-and-Leaf Plot

Frequency   Stem | Leaf
   11         2  | 1 2 3 4 4 7 7 8 8 9 9
   10         3  | 0 0 2 3 4 4 4 5 8 9
    6         4  | 0 1 2 4 5 9
    2         5  | 1 6
    1         6  | 3

Key: 2 | 1 = 21
(2marks)
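
The stem-and-leaf display can be cross-checked with a short script. This is only a sketch: the function name `stem_and_leaf` is illustrative, and it assumes two-digit ages so that the tens digit is the stem and the units digit is the leaf.

```python
from collections import defaultdict

# Ages of the 30 patients, as listed in Question 4
ages = [24, 27, 32, 23, 35, 34, 28, 40, 28, 29, 45, 51, 24, 33, 42,
        22, 34, 21, 34, 56, 38, 29, 41, 44, 27, 30, 63, 30, 39, 49]

def stem_and_leaf(values):
    """Group each value's units digit (leaf) under its tens digit (stem)."""
    plot = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        plot[stem].append(leaf)
    return dict(plot)

plot = stem_and_leaf(ages)
for stem, leaves in plot.items():
    # frequency, stem, then the ordered leaves
    print(f"{len(leaves):>2}   {stem} | {' '.join(map(str, leaves))}")
```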

b) Determine the
a. Median
With n = 30 ordered values, the median is the average of the 15th and 16th values:
Median = $\frac{33 + 34}{2} = 33.5$
(2marks)
b. Interquartile range
NOTE – ONE APPROACH
Finding the quartiles
• First arrange the data in ascending order
Case 1: An even number of data values
• Split the data into their upper half and lower half
• Then the median of the upper half is Q3, and the median of the lower
half is Q1.
Case 2: An odd number of data values
• Find the median, Q2, and delete it from the list.
• Split the remaining data into their upper half and lower half.
• Then the median of the upper half is Q3, and the median of the lower
half is Q1.
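Here n = 30 (even), so the lower half is the first 15 ordered values and the upper
half is the last 15. The median of each half is its 8th value, giving Q1 = 28 and Q3 = 41.
Interquartile range = upper quartile (Q3) – lower quartile (Q1)
Interquartile range = 41 – 28 = 13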
(4marks)
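
The split-halves approach described above can be sketched as follows. The helper name `quartiles_by_halves` is illustrative. Library quantile routines often use interpolation conventions instead and can return slightly different quartiles for the same data.

```python
from statistics import median

ages = [24, 27, 32, 23, 35, 34, 28, 40, 28, 29, 45, 51, 24, 33, 42,
        22, 34, 21, 34, 56, 38, 29, 41, 44, 27, 30, 63, 30, 39, 49]

def quartiles_by_halves(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method from the note."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    # For odd n the overall median (index `half`) is dropped from both halves
    upper = data[half:] if n % 2 == 0 else data[half + 1:]
    return median(lower), median(data), median(upper)

q1, q2, q3 = quartiles_by_halves(ages)
iqr = q3 - q1
print(q1, q2, q3, iqr)
```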
c. Mean
Mean $\bar{x} = \frac{\sum x}{n} = \frac{1052}{30} \approx 35.067$
(2marks)
d. 5% trimmed mean
This is the mean computed after excluding approximately the lowest 5% and the highest 5%
of the ordered data. Here 5% of 30 is 1.5, so approximately the 2 lowest values (21, 22)
and the 2 highest values (56, 63) are removed. The mean of the remaining 26 values is then found.
Remaining values:
23 24 24 27 27 28 28 29 29 30 30 32 33
34 34 34 35 38 39 40 41 42 44 45 49 51

5% trimmed mean ≈ $\frac{\sum x}{n} = \frac{890}{26} \approx 34.23$
(4marks)
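
A minimal sketch of the mean and the 5% trimmed mean follows. The function name `trimmed_mean` is illustrative, and it assumes the trim count is rounded up (1.5 becomes 2), as in the worked answer. Statistical libraries may round the trim count differently and so give a different result.

```python
import math

ages = [24, 27, 32, 23, 35, 34, 28, 40, 28, 29, 45, 51, 24, 33, 42,
        22, 34, 21, 34, 56, 38, 29, 41, 44, 27, 30, 63, 30, 39, 49]

def trimmed_mean(values, proportion):
    """Mean after dropping ceil(proportion * n) values from each end."""
    data = sorted(values)
    k = math.ceil(proportion * len(data))  # assumption: round the trim count up
    kept = data[k:len(data) - k]
    return sum(kept) / len(kept)

mean = sum(ages) / len(ages)
tmean = trimmed_mean(ages, 0.05)
print(round(mean, 3), round(tmean, 2))
```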
e. Standard Deviation
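$\sum x = 1052$
$\sum x^2 = 39998$
n = 30
The formula divides by n(n – 1), so this is the sample standard deviation s:
$s = \sqrt{\frac{n(\sum x^2) - (\sum x)^2}{n(n-1)}}$
$s = \sqrt{\frac{30(39998) - (1052)^2}{30(30-1)}} = \sqrt{\frac{93236}{870}}$
$s \approx 10.35219$

(3marks)
f. Variance
Variance = $s^2 = 10.35219^2 \approx 107.168$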
(1mark)
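
The shortcut formula can be checked against Python's built-in sample standard deviation. This is a sketch: `statistics.stdev` and `statistics.variance` use the n – 1 denominator, which matches the formula used in the worked answer.

```python
import math
from statistics import stdev, variance

ages = [24, 27, 32, 23, 35, 34, 28, 40, 28, 29, 45, 51, 24, 33, 42,
        22, 34, 21, 34, 56, 38, 29, 41, 44, 27, 30, 63, 30, 39, 49]

n = len(ages)
sum_x = sum(ages)                   # sum of the values
sum_x2 = sum(x * x for x in ages)   # sum of the squared values

# Shortcut formula for the sample variance and standard deviation
var_s = (n * sum_x2 - sum_x ** 2) / (n * (n - 1))
s = math.sqrt(var_s)

print(sum_x, sum_x2, round(s, 5), round(var_s, 3))
# Both forms should agree with the library results
print(math.isclose(s, stdev(ages)), math.isclose(var_s, variance(ages)))
```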

c) Construct a box-and-whisker plot for this distribution.

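[Box-and-whisker plot figure(s) not reproduced in this copy. Five-number summary from the results above: minimum = 21, Q1 = 28, median = 33.5, Q3 = 41, maximum = 63.]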
(4marks)

d) Hence describe the shape of the distribution by discussing its skewness and normality.
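The data are positively skewed: the mean (35.07) exceeds the median (33.5) and the upper
tail is longer than the lower tail. The distribution does not follow a bell-shaped curve,
hence it is not normal.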

(3marks)
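
As a rough numerical check of the skewness, the sketch below compares the mean with the median and applies the common 1.5 × IQR fence rule. The fence rule is an assumption here, since the exercise does not state which outlier rule it uses. Q1 and Q3 are taken from part (b).

```python
from statistics import mean, median

ages = [24, 27, 32, 23, 35, 34, 28, 40, 28, 29, 45, 51, 24, 33, 42,
        22, 34, 21, 34, 56, 38, 29, 41, 44, 27, 30, 63, 30, 39, 49]

q1, q3 = 28, 41                  # quartiles found in part (b)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr     # values below this are flagged
upper_fence = q3 + 1.5 * iqr     # values above this are flagged
outliers = [x for x in ages if x < lower_fence or x > upper_fence]

# A mean above the median is a common hint of positive (right) skew
right_skew_hint = mean(ages) > median(ages)
print(lower_fence, upper_fence, outliers, right_skew_hint)
```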

Total 25 marks

Question 5 - Histogram
Answer the questions by referring to the following histogram, which represents the sepal
widths (mm) of a sample of irises.
a) Based on the histogram, what is the approximate number of irises in the sample?
Approximately 50
b) What is the class width? What are the approximate lower-class and upper-class limits
of the first class?
Class width is the difference between two consecutive lower class limits (or two
consecutive lower class boundaries) in a frequency distribution.
That is, 2.5 – 2.0 = 0.5 mm.
Lower class limit = 2.0 mm
Upper class limit = 2.5 mm

c) What is the largest possible value? Would that value be an outlier?


The largest possible value is about 4.5 mm. That value would not be an outlier.

d) Does it appear that the sample is from a population having a normal distribution?
The sample does not appear to be normally distributed, since it does not follow a
bell-shaped curve.

Question 6 – Weighted Mean


A student of one of the authors earned grades of A, C, B, A, and D. Those courses had these
corresponding numbers of credit hours: 3, 3, 3, 4, and 1. The grading system assigns quality
points to letter grades as follows: A = 4; B = 3; C = 2; D = 1; F = 0. Compute the grade-point
average (GPA) and round the result with two decimal places. If the dean’s list requires a GPA
of 3.00 or greater, did this student make the dean’s list?

Grade   Credit hours (w)   Quality points (x)   w·x
A       3                  4                    12
C       3                  2                     6
B       3                  3                     9
A       4                  4                    16
D       1                  1                     1
Total   ∑w = 14                                 ∑(w·x) = 44

Weighted mean, $\bar{x} = \frac{\sum (w \cdot x)}{\sum w} = \frac{44}{14} \approx 3.14$

Yes, the student made the dean's list, since 3.14 ≥ 3.00.
(5marks)
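
The GPA can be computed directly from the grades and credit hours given in the question. This is a minimal sketch: the variable names are illustrative, and credit hours are used as the weights.

```python
# Quality points per letter grade, as given in the question
quality_points = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}

grades = ["A", "C", "B", "A", "D"]   # grades earned
credits = [3, 3, 3, 4, 1]            # corresponding credit hours (weights)

total_points = sum(quality_points[g] * c for g, c in zip(grades, credits))
total_credits = sum(credits)

gpa = round(total_points / total_credits, 2)  # weighted mean, two decimals
deans_list = gpa >= 3.00

print(total_points, total_credits, gpa, deans_list)
```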

Question 7 – z scores
Body Data

Females have pulse rates with a mean of 74.0 beats per minute (BPM), a standard
deviation of 12.5 BPM, and a maximum of 104 BPM.
a) What is the difference between the maximum and the mean?
Difference = 104 – 74.0 = 30.0 BPM
(1mark)

b) How many standard deviations is that [the difference found in part (a)]?
That is $\frac{\text{Difference}}{\sigma} = \frac{30}{12.5} = 2.4$ standard deviations.
c) Convert the maximum pulse rate to a z score.
$z = \frac{x - \mu}{\sigma} = \frac{104 - 74.0}{12.5} = 2.4$

d) If we consider pulse rates that convert to z scores between -2 and 2 to be neither
significantly low nor significantly high, is the maximum pulse rate significant?
Yes. The maximum pulse rate of 104 BPM is significantly high, since z = 2.4 > 2.
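
The z score calculation and the significance rule from part (d) can be sketched as below. The function name `z_score` is illustrative. The cut-off of 2 comes from the question itself.

```python
import math

def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

z = z_score(104, 74.0, 12.5)

# Rule from the exercise: z scores between -2 and 2 are not significant
significantly_high = z > 2
significantly_low = z < -2

print(round(z, 2), significantly_high, significantly_low)
```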
Question 8 – Descriptives for Categorical Data

a) Identify the class width, class midpoints, and class boundaries for the given frequency
distribution.
Class width = 100 (the difference between consecutive lower class limits, e.g. 100 – 0)
Class midpoints: x = 49.5, 149.5, 249.5, 349.5, 449.5, 549.5, 649.5
Class boundaries: –0.5, 99.5, 199.5, 299.5, 399.5, 499.5, 599.5, 699.5
(3marks)
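
The class width, midpoints and boundaries can be derived from the class limits. The sketch below assumes the limits 0–99, 100–199, ..., 600–699 used in the tables later in this question. Each boundary is placed halfway across the gap between adjacent classes.

```python
# Class limits assumed from the frequency tables in this question
lower_limits = [0, 100, 200, 300, 400, 500, 600]
upper_limits = [99, 199, 299, 399, 499, 599, 699]

# Width: difference between consecutive lower class limits
class_width = lower_limits[1] - lower_limits[0]

# Midpoint: average of each class's lower and upper limits
midpoints = [(lo + hi) / 2 for lo, hi in zip(lower_limits, upper_limits)]

# Boundaries: split the gap between one class's upper limit and the next lower limit
gap = lower_limits[1] - upper_limits[0]
boundaries = [lower_limits[0] - gap / 2] + [hi + gap / 2 for hi in upper_limits]

print(class_width, midpoints, boundaries)
```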
b) Construct a histogram and bar chart.
[Histogram and bar graph figures not reproduced in this copy.]

(4marks)

c) Determine whether the frequency distribution is approximately a normal distribution.


The distribution is not normal, since it does not follow a bell-shaped curve: the frequencies are not symmetric about the centre.
(1mark)

d) Ignore the given frequencies. Assume that the first three frequencies are 2, 12, and 18,
respectively. Assuming that the distribution of the 153 sample values is a normal
distribution, identify the remaining four frequencies.

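A normal distribution is symmetric about its centre, so the last three frequencies
mirror the first three around the middle class (300 – 399). The middle frequency is
153 – 2(2 + 12 + 18) = 89.

Blood Platelet Count of Males   Frequency
0 – 99                            2
100 – 199                        12
200 – 299                        18
300 – 399                        89
400 – 499                        18
500 – 599                        12
600 – 699                         2

The four remaining frequencies are 89, 18, 12 and 2.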
(4marks)
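
The symmetry argument can be expressed as a small helper. This is a sketch: `complete_symmetric` is an illustrative name, and it assumes an odd number of classes with a single middle class, as in this question.

```python
def complete_symmetric(first_half, total):
    """Fill in a symmetric frequency list given the frequencies below the middle class."""
    middle = total - 2 * sum(first_half)   # what is left over for the middle class
    return first_half + [middle] + first_half[::-1]

freqs = complete_symmetric([2, 12, 18], 153)
remaining_four = freqs[3:]
print(freqs, remaining_four)
```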
e) Find the mean of the data summarized in the original frequency
distribution.

Blood Platelet Count of Males   Frequency (f)   Midpoint (x)   f·x
0 – 99                            1              49.5              49.5
100 – 199                        51             149.5            7624.5
200 – 299                        90             249.5           22455
300 – 399                        10             349.5            3495
400 – 499                         0             449.5               0
500 – 599                         0             549.5               0
600 – 699                         1             649.5             649.5
Total                          ∑f = 153                  ∑(f·x) = 34273.5

$\bar{x} = \frac{\sum (f \cdot x)}{\sum f} = \frac{49.5(1) + 149.5(51) + 249.5(90) + 349.5(10) + 449.5(0) + 549.5(0) + 649.5(1)}{1 + 51 + 90 + 10 + 0 + 0 + 1}$
$\bar{x} = \frac{49.5 + 7624.5 + 22455 + 3495 + 0 + 0 + 649.5}{153} = \frac{34273.5}{153}$
$\bar{x} \approx 224.0098$
(3marks)
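
The grouped-data mean can be reproduced with a few lines. This is a sketch that uses the class midpoints as stand-ins for the individual values, so the result is an approximation of the true sample mean.

```python
# Class midpoints and frequencies from the original frequency distribution
midpoints = [49.5, 149.5, 249.5, 349.5, 449.5, 549.5, 649.5]
freqs = [1, 51, 90, 10, 0, 0, 1]

n = sum(freqs)                                            # total sample size
sum_fx = sum(f * x for f, x in zip(freqs, midpoints))     # sum of f * x

grouped_mean = sum_fx / n
print(n, sum_fx, round(grouped_mean, 4))
```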
f) Find the standard deviation by using the formula below, where x represents the
class midpoint, f represents the class frequency, and n represents the total number
of sample values.

$s = \sqrt{\frac{n\left[\sum (f \cdot x^2)\right] - \left[\sum (f \cdot x)\right]^2}{n(n-1)}}$

Blood Platelet Count of Males   Frequency (f)   Midpoint (x)   x²           f·x       f·x²
0 – 99                            1              49.5            2450.25      49.5       2450.25
100 – 199                        51             149.5           22350.25    7624.5    1139862.75
200 – 299                        90             249.5           62250.25   22455      5602522.5
300 – 399                        10             349.5          122150.25    3495      1221502.5
400 – 499                         0             449.5          202050.25       0            0
500 – 599                         0             549.5          301950.25       0            0
600 – 699                         1             649.5          421850.25     649.5     421850.25
Total                           153                           1135051.75   34273.5    8388188.25

$n = \sum f = 153$
$s = \sqrt{\frac{153(8388188.25) - (34273.5)^2}{153(153-1)}}$
$s = \sqrt{\frac{1283392802.25 - 1174672802.25}{23256}}$
$s = \sqrt{4674.9226}$
$s \approx 68.37$
(4marks)
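
The grouped-data standard deviation can be checked in the same way. This is a sketch using the class midpoints and the n(n – 1) denominator from the worked solution.

```python
import math

# Class midpoints and frequencies from the original frequency distribution
midpoints = [49.5, 149.5, 249.5, 349.5, 449.5, 549.5, 649.5]
freqs = [1, 51, 90, 10, 0, 0, 1]

n = sum(freqs)
sum_fx = sum(f * x for f, x in zip(freqs, midpoints))        # sum of f * x
sum_fx2 = sum(f * x * x for f, x in zip(freqs, midpoints))   # sum of f * x^2

# Shortcut formula for grouped data
grouped_variance = (n * sum_fx2 - sum_fx ** 2) / (n * (n - 1))
grouped_sd = math.sqrt(grouped_variance)

print(sum_fx2, round(grouped_variance, 4), round(grouped_sd, 2))
```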
g) Hence, find the variance.
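Variance = $s^2 \approx 4674.9226$ (the unrounded value under the square root in part (f); squaring the rounded s = 68.37 gives about 4674.46).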
(1mark)
Total 20marks
