MetNum1 2023 1 Week 10

This document provides an outline for the topics covered in Week 10 of the CE23216 MetNum 1 course. The week focuses on descriptive statistics and sampling techniques. Key concepts covered include measures of location (percentiles, quartiles), measures of variability (range, standard deviation), measures of central tendency (mean, median, mode), and common sampling methods. The document also includes examples of calculating percentiles from a data set and determining the mean from a set of observations.

Uploaded by

donbradman334
Copyright
© © All Rights Reserved

CE23216 MetNum 1

Semester 2023/1

Week 10

Rehan Hussain, Ph.D.

1
Topics covered in Part 2

Week Title
Week 9 Introduction to statistics and probability
Week 10 Descriptive statistics and sampling techniques
Week 11 Probability theory
Week 12 Discrete and continuous probability distributions
Week 13 Variance, co-variance, and correlation
Week 14 Statistical inference methods
Week 15 Statistical analysis using Octave/MATLAB
Week 16 UAS (final semester exam)

2
Outline

What's in this week’s lectures?

1. Measures of location and center

2. Measures of variability

3. Measures of shape

4. Data presentation methods

5. Sampling methods

3
Cartoon of the week

4
What is descriptive statistics?

• Descriptive statistics aims to summarize a sample, rather than to learn
about the population from which the sample was drawn.

• Descriptive statistics forms the basis of nearly every quantitative
analysis of data!

• Data are usually described in terms of location, central tendency,
variability, and shape.

5
Summary of measures

Measures of Location
• Percentile
• Quartile

Measures of Variability
• Range
• Interquartile range
• Variance
• Standard deviation

Measures of Central Tendency
• Median
• Mode
• Mean

Measures of Shape
• Skewness
• Kurtosis

6
Topic 1: Measures of location and center

7
Sorting data

• To calculate useful statistics, the data collected in an experiment
usually needs to be sorted in some way.

• Two ways of doing this are ordering and grouping.

• In ordering, the data are normally arranged from the smallest value
to the largest (a.k.a. ascending order).

• In grouping, data are grouped according to the number of times
each value appears, which is known as the frequency.

8
Sorting data

• Suppose a group of eight children have the following ages:

5, 6, 8, 5, 5, 7, 5, 6

We can put the data in ascending order, as follows:

i    xi
1    5
2    5
3    5
4    5
5    6
6    6
7    7
8    8

i - observation number
xi - data value

We can group the data, as follows:

i    fi    xi
1    4     5
2    2     6
3    1     7
4    1     8

i - group number
fi - frequency
xi - data value
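
As a minimal sketch of ordering and grouping, the children's ages can be sorted and counted with Python's standard library. The variable names are illustrative assumptions, not part of the lecture.

```python
from collections import Counter

# Ages of the eight children from the slide
ages = [5, 6, 8, 5, 5, 7, 5, 6]

# Ordering: arrange the observations in ascending order
ordered = sorted(ages)

# Grouping: count how often each value appears (its frequency)
grouped = sorted(Counter(ages).items())  # list of (value, frequency) pairs

print(ordered)
for value, frequency in grouped:
    print(f"x = {value}, f = {frequency}")
```

The grouped pairs correspond to the rows of the grouped table, with one row per distinct data value.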

9
Percentiles

• Percentiles are a measure of location, whereby the data are partitioned into
100 segments.

• The Pth percentile in the ordered set is that value below which P% of the
observations in the set lie.

• The position of the Pth percentile is given by

$(n + 1)P/100$

where $n$ is the number of observations (or sample size).

• If the position is not a whole number, linear interpolation is used to find the
correct percentile value.

10
Worked Example 1: percentiles
The marks scored by a group of students on their final exams are shown below.
Find the 50th, 80th, and 90th percentiles of the given data set.

Marks: 33, 26, 24, 21, 19, 20, 18, 18, 52, 56, 27, 22, 18, 49, 22, 20, 23, 32, 20, 18

11
Worked Example 1: percentiles
Solution:

The first step is to sort the data in ascending order and write the
corresponding index number ($i$) for each observation.

i     Marks (sorted)
1     18
2     18
3     18
4     18
5     19
6     20
7     20
8     20
9     21
10    22
11    22
12    23
13    24
14    26
15    27
16    32
17    33
18    49
19    52
20    56

12
Worked Example 1: percentiles

• To find the 50th percentile, determine its position:

(n + 1)P/100 = (20 + 1)(50/100) = 10.5

• Hence, we linearly interpolate between the 10th and 11th observations:

[Figure: marks plotted against observation number for observations 8 to 12]

• In this case, since both the 10th and 11th observations have the same
value, the 50th percentile is 22.
13
Worked Example 1: percentiles

• To find the 80th percentile, determine its position:

(n + 1)P/100 = (20 + 1)(80/100) = 16.8

• Hence, we linearly interpolate between the 16th and 17th observations:

[Figure: marks plotted against observation number for observations 15 to 18]

• In this case, the 16th and 17th observations have different values (32 and 33), so the
80th percentile is 32 + 0.8(33 – 32) = 32.8.

14
Worked Example 1: percentiles

• To find the 90th percentile, determine its position:

(n + 1)P/100 = (20 + 1)(90/100) = 18.9

• Thus, the 90th percentile is located between the 18th observation
(49) and the 19th observation (52).

• Using linear interpolation, we get:

P90 = 49 + 0.9(52 – 49) = 49 + 2.7 = 51.7
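
The position formula and the interpolation step can be sketched in Python as follows. The function name `percentile` and the clamping at the ends of the data are assumptions made for this illustration. The position rule (n + 1)P/100 is the one used on the slides.

```python
def percentile(data, p):
    """Return the p-th percentile using position (n + 1) * p / 100
    and linear interpolation between neighbouring observations."""
    ordered = sorted(data)
    n = len(ordered)
    position = (n + 1) * p / 100   # 1-based position in the ordered data
    lower = int(position)          # whole-number part of the position
    fraction = position - lower    # fractional part used for interpolation

    # Assumption: positions outside 1..n are clamped to the end values
    if lower < 1:
        return float(ordered[0])
    if lower >= n:
        return float(ordered[-1])

    below = ordered[lower - 1]     # observation at the whole-number position
    above = ordered[lower]         # next observation
    return below + fraction * (above - below)


# Marks from Worked Example 1
marks = [33, 26, 24, 21, 19, 20, 18, 18, 52, 56,
         27, 22, 18, 49, 22, 20, 23, 32, 20, 18]

for p in (50, 80, 90):
    print(p, round(percentile(marks, p), 2))
```

Applied to the marks data, this reproduces the three percentiles from Worked Example 1 (22, 32.8, and 51.7), up to floating-point rounding.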

15
Quartiles

• Quartiles are the specific percentiles which break down the
ordered data set into quarters.

  • The 25th percentile is known as the first (or lower) quartile (Q1).

  • The 50th percentile is known as the second (or middle) quartile (Q2).

  • The 75th percentile is known as the third (or upper) quartile (Q3).

16
Arithmetic mean or average
• The arithmetic mean or average of a set of measurements is a measure of
center. It is equal to the sum of the measurements divided by the total
number of measurements, n.
• For a sample, the mean is normally assigned the symbol $\bar{x}$ (pronounced
as ‘x bar’).

Ungrouped data: $\bar{x} = \frac{\sum x_i}{n}$

Grouped data: $\bar{x} = \frac{\sum f_i x_i}{n}$

Note: If we were able to enumerate the whole population, the population
mean would be assigned the symbol μ.
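
A short sketch of the two forms of the mean, using the children's ages from the "Sorting data" slides. The function names are illustrative assumptions. Both forms should return the same value, since grouping only reorganizes the same observations.

```python
from collections import Counter


def mean_ungrouped(values):
    """Sum of the observations divided by the number of observations."""
    return sum(values) / len(values)


def mean_grouped(frequencies):
    """frequencies maps each data value x_i to its frequency f_i."""
    n = sum(frequencies.values())  # total number of observations
    return sum(f * x for x, f in frequencies.items()) / n


ages = [5, 6, 8, 5, 5, 7, 5, 6]
grouped_ages = Counter(ages)       # {5: 4, 6: 2, 8: 1, 7: 1}

print(mean_ungrouped(ages))
print(mean_grouped(grouped_ages))
```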

17
Calculating the sample mean

The table shows 8 observations of pull-off force from engine connectors, in
Newtons.

i    xi
1    12.6
2    12.9
3    13.4
4    12.3
5    13.6
6    13.5
7    12.6
8    13.1

The mean pull-off force is given by:

$\bar{x} = \frac{\sum_{i=1}^{8} x_i}{8} = \frac{12.6 + 12.9 + \dots + 13.1}{8} = \frac{104}{8} = 13.0 \text{ N}$

18
Median
• The median of a set of measurements is the middle measurement when
the measurements are put in ascending order.

• The position of the median is

$0.5(n + 1)$

Note: For an odd number of observations, the median will correspond to a
single data point.

• The median is the same as the second quartile or 50th percentile.
Therefore, we have already seen how to calculate it (Worked Example 1).
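
The position rule 0.5(n + 1) can be sketched as below. The function name `median` is chosen for illustration (Python's standard library also offers `statistics.median`). For an even number of observations the position falls halfway between two data points, so their average is taken.

```python
def median(values):
    """Median via the 1-based position 0.5 * (n + 1) in the ordered data."""
    ordered = sorted(values)
    n = len(ordered)
    position = 0.5 * (n + 1)
    lower = int(position)
    if position == lower:
        # Odd n: the position is a whole number, so it is a single data point
        return ordered[lower - 1]
    # Even n: the position ends in .5, so average the two neighbours
    return (ordered[lower - 1] + ordered[lower]) / 2


print(median([3, 1, 2]))     # odd number of observations
print(median([4, 1, 3, 2]))  # even number of observations
```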

19
Mode

— The mode is the data value with the highest frequency.

— For example:
1. For the set {2, 4, 9, 8, 8, 5, 3}, the mode is 8, which occurs twice.
2. For the set {2, 2, 9, 8, 8, 5, 3}, there are two modes: 8 and 2 (this
is called a bimodal set).
3. For the set {2, 4, 9, 8, 5, 3}, there is no mode (each value is
unique).
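
The three cases above can be checked with a small sketch. The function name `modes` and the choice to return an empty list when every value is unique are assumptions that follow the convention used on this slide.

```python
from collections import Counter


def modes(values):
    """Return all values that share the highest frequency.
    An empty list means there is no mode (every value is unique)."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []
    return sorted(value for value, count in counts.items() if count == highest)


print(modes([2, 4, 9, 8, 8, 5, 3]))  # one mode
print(modes([2, 2, 9, 8, 8, 5, 3]))  # bimodal set
print(modes([2, 4, 9, 8, 5, 3]))     # no mode
```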

20
Mean vs median vs mode

The number of cartons (quarts) of milk purchased each week by 25
households are as follows:
0 0 1 1 1 1 1 2 2 2 2 2 2 2 2 2 3 3 3 3 3 4 4 4 5

• Mean: $\bar{x} = \frac{\sum x_i}{n} = \frac{55}{25} = 2.2$

• Median: $M = 2$ (the 13th data point)

• Mode: $Mo = 2$ (the number 2 appears 9 times)

21
Class Exercise 1: mean, median, mode

22
Topic 2: measures of variability

23
Variability (dispersion)

— Measures of variability tell us how spread out the data are (in other words, the
degree to which the different data points deviate from the central value).

Low variability

Medium variability

High variability

24
Measures of variability

• Range refers to the difference between the maximum and
minimum values of a dataset.

• Interquartile range (IQR) refers to the difference between the
third and first quartiles (Q3 - Q1).

• Variance refers to the average of the squared deviations from
the mean.

• Standard deviation refers to the square root of the variance.

25
Range

— If the n observations in a sample are denoted by x1, x2, …, xn, the sample range
is given by the largest observation in the sample minus the smallest observation,
i.e.:

r = max(xi) – min(xi)

— Note that the population range is always ≥ the sample range.

26
Sample range

For the data set shown in Worked Example 1,

Range = Maximum – Minimum
      = 56 – 18 = 38

                  Position                      Interpolation
First Quartile:   (20 + 1) × 25/100 = 5.25      19 + (0.25)(1) = 19.25
Third Quartile:   (20 + 1) × 75/100 = 15.75     27 + (0.75)(5) = 30.75

Interquartile Range = Q3 – Q1
                    = 30.75 – 19.25 = 11.5
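
The same range and quartiles can be reproduced with Python's `statistics.quantiles`. Its `method="exclusive"` option places the quartiles at positions (n + 1)P/100 with linear interpolation, which matches the convention on these slides. Other tools often default to a different convention and may give slightly different quartiles.

```python
import statistics

# Marks from Worked Example 1
marks = [33, 26, 24, 21, 19, 20, 18, 18, 52, 56,
         27, 22, 18, 49, 22, 20, 23, 32, 20, 18]

sample_range = max(marks) - min(marks)

# n=4 splits the data into quarters and returns [Q1, Q2, Q3]
q1, q2, q3 = statistics.quantiles(marks, n=4, method="exclusive")
iqr = q3 - q1

print("Range:", sample_range)
print("Q1:", q1, "Q3:", q3, "IQR:", iqr)
```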

27
Variance

If the $n$ observations in a sample are denoted by $x_1, x_2, \dots, x_n$, the
sample variance is

$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$

Q: Why do we use a squared term? Why not just $\sum (x_i - \bar{x})$?

For the $N$ observations in a population denoted by $x_1, x_2, \dots, x_N$, the
population variance is

$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$
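
As a hint towards the question about the squared term (a standard observation, not necessarily the answer intended in the lecture), the unsquared deviations from the sample mean always cancel, so their sum carries no information about spread:

```latex
\sum_{i=1}^{n} (x_i - \bar{x})
  = \sum_{i=1}^{n} x_i - n\bar{x}
  = n\bar{x} - n\bar{x}
  = 0
```

Squaring makes every term non-negative, so deviations above and below the mean can no longer offset each other.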
28
Bessel’s correction
— You may have noticed that the sample variance is divided by (n – 1), whereas the
population variance uses the total population size N.

— The use of (n – 1) is known as Bessel’s correction and reduces the bias (i.e. the
discrepancy between the measured value and the ‘true’ value) due to estimating the
population variance using a sample. A more detailed explanation can be found here!

— The smaller the sample, the greater the bias, as the sample mean is less representative
of the population mean for smaller samples.
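
The effect of Bessel's correction can be illustrated without random simulation by listing every possible sample. The sketch below assumes a hypothetical toy population {1, 2, 3} and samples of size 2 drawn with replacement, so that all nine samples are equally likely. Exact fractions are used to avoid rounding.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3]  # hypothetical toy population
N = len(population)
mu = Fraction(sum(population), N)
population_variance = sum((x - mu) ** 2 for x in population) / N


def average_sample_variance(divisor_offset, sample_size=2):
    """Average the sample variance over every equally likely sample.
    divisor_offset=1 divides by (n - 1); divisor_offset=0 divides by n."""
    samples = list(product(population, repeat=sample_size))
    total = Fraction(0)
    for sample in samples:
        xbar = Fraction(sum(sample), sample_size)
        squared_deviations = sum((x - xbar) ** 2 for x in sample)
        total += squared_deviations / (sample_size - divisor_offset)
    return total / len(samples)


with_bessel = average_sample_variance(1)
without_bessel = average_sample_variance(0)

print("Population variance:    ", population_variance)
print("Average using (n - 1):  ", with_bessel)
print("Average using n:        ", without_bessel)
```

In this toy case, dividing by (n – 1) recovers the population variance on average, while dividing by n underestimates it.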

29
Standard deviation

— The variance is a measure of how spread out the dataset is. However, because it
uses a squared term, it does not give us a measure of how far the data is from
the mean in terms of the same units as the mean.

— Therefore, to obtain a more direct measure of the variation of the data points
relative to the mean, we take the square root of the variance, which is known as
the standard deviation.
Sample standard deviation: $s = \sqrt{s^2}$
Population standard deviation: $\sigma = \sqrt{\sigma^2}$

30
Calculating the sample variance
Recall the pull-off force data in Slide 18. The mean was calculated as $\bar{x}$ = 13.0 N. The table
below displays the quantities needed to calculate the sample variance and sample standard
deviation:

i    $x_i$    $x_i - \bar{x}$    $(x_i - \bar{x})^2$
1    12.6     -0.4               0.16
2    12.9     -0.1               0.01
3    13.4      0.4               0.16
4    12.3     -0.7               0.49
5    13.6      0.6               0.36
6    13.5      0.5               0.25
7    12.6     -0.4               0.16
8    13.1      0.1               0.01

The squared deviations sum to 1.60, so $s^2$ = 1.60/7 = 0.23 N² and $s$ = 0.48 N (note: the desired
accuracy is normally one more decimal place than the data).

31
Shortcut formula
• Using the definition of the mean, it is possible to rewrite the variance formula from the previous
slide,

$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$

into the following form (try it yourself before watching the derivation!):

$s^2 = \frac{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}{n - 1}$

• This is known as the shortcut formula, as it allows us to calculate the variance using values of xi
directly, without having to subtract each value from the mean.
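
For readers who want to check their own attempt, one possible route to the shortcut formula is sketched below. It expands the square and substitutes $\bar{x} = \sum x_i / n$. It is not necessarily the derivation shown in the lecture.

```latex
\begin{aligned}
\sum (x_i - \bar{x})^2
  &= \sum x_i^2 - 2\bar{x}\sum x_i + n\bar{x}^2 \\
  &= \sum x_i^2 - \frac{2\left(\sum x_i\right)^2}{n} + \frac{\left(\sum x_i\right)^2}{n} \\
  &= \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}
\end{aligned}
```

Dividing both sides by $n - 1$ gives the shortcut form of $s^2$.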

32
Using the shortcut formula
i       $x_i$    $x_i^2$
1       12.6     158.76
2       12.9     166.41
3       13.4     179.56
4       12.3     151.29
5       13.6     184.96
6       13.5     182.25
7       12.6     158.76
8       13.1     171.61
sums    104.0    1,353.60

$s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n - 1} = \frac{1{,}353.60 - \frac{(104.0)^2}{8}}{7} = \frac{1.60}{7} = 0.2286 \text{ N}^2$

$s = \sqrt{0.2286} = 0.48 \text{ N}$

When calculating s, don’t forget to use the unrounded value of s²!
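
As a quick check, the definition and the shortcut formula can be compared numerically on the pull-off force data. This is a minimal sketch and the variable names are illustrative. In floating-point arithmetic the shortcut form subtracts two large, nearly equal numbers, so it can lose precision on data whose mean is large relative to its spread. For this small data set the two results agree closely.

```python
import math

# Pull-off force observations in newtons
forces = [12.6, 12.9, 13.4, 12.3, 13.6, 13.5, 12.6, 13.1]
n = len(forces)
mean = sum(forces) / n

# Definition: sum of squared deviations from the mean, divided by (n - 1)
variance_definition = sum((x - mean) ** 2 for x in forces) / (n - 1)

# Shortcut: uses only the sum of x_i and the sum of x_i squared
sum_x = sum(forces)
sum_x_squared = sum(x * x for x in forces)
variance_shortcut = (sum_x_squared - sum_x ** 2 / n) / (n - 1)

# Use the unrounded variance when taking the square root
std_dev = math.sqrt(variance_definition)

print(round(variance_definition, 4), round(variance_shortcut, 4), round(std_dev, 2))
```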

33
Class Exercise 2: mean and std dev

(a) Suppose the mean score on a national test is 400 with a standard
deviation of 50. If each score is increased by 25, what are the new
mean and standard deviation?
(b) Suppose the mean score on a national test is 400 with a standard
deviation of 50. If each score is increased by 25%, what are the
new mean and standard deviation?

34
Topic 3: measures of shape

35
Skewness and kurtosis

• Skewness is a measure of the degree of asymmetry of a
distribution (hence, it only applies to grouped data). The data can
be either:
  • Skewed to the left (negative skew)
  • Symmetric (unskewed)
  • Skewed to the right (positive skew)

• Kurtosis is a measure of the flatness or peakedness of a
distribution. The data can be:
  • Platykurtic (relatively flat)
  • Mesokurtic (normal)
  • Leptokurtic (relatively peaked)
36
Skewness

[Figure: three distribution curves]
• Skewed to left: Mean < Median < Mode
• Symmetric: Mean = Median = Mode
• Skewed to right: Mean > Median > Mode
37
Kurtosis
• Leptokurtic: high and thin, peaked distribution
• Mesokurtic: normal in shape, not too flat and not too peaked
• Platykurtic: flat and spread out, flat distribution

38
Topic 4: data presentation methods

39
Data presentation

• In science and engineering, the communication of technical data is
hugely important. For example, it enables researchers to share their
findings with the public or each other, and engineers to present their
designs to the company management.

• Here, we will look at some common ways of communicating statistical
data through visual representations.

40
Methods for displaying data

• Pie chart: categories are represented as percentages of the total.
It is circular in shape, like a pie.

• Bar chart: categories are represented as rectangles (‘bars’) with
different heights. Can be either vertical or horizontal.

• Frequency polygon: line graph of frequency

• Ogive: line graph of cumulative frequency

• Time series plot: shows how values change over time

41
Pie charts and bar charts
Student Cola Preference From Survey

Colas           Frequency (Count)
Coca Cola       88
Pepsi           63
Bloxy Cola      112
Mecca Cola      47
RC Cola         13
Corsica Cola    27
Zam Zam Cola    47

[Figure: bar chart of the data above, with cola brand on the horizontal axis and frequency on the vertical axis]

[Figure: pie chart of the same data, titled "Student Cola Preference From Survey"]

42
Time series plots
— Time series plots show the data (or statistic) value on the vertical axis
and the time on the horizontal axis.
— Such plots reveal trends, cycles or other time-oriented behavior that
could not otherwise be seen in the data.

Company sales: (a) yearly and (b) quarterly.


43
Box and whisker plots

— Box and whisker plots (also just called box plots) illustrate the data in a graphical display
that simultaneously describes several important features of a data set, such as:
Ø Center
Ø Spread
Ø Symmetry
Ø Identification of outliers

• Five specific values are used: the median (Q2), the
first quartile (Q1), the third quartile (Q3) and the
maximum and minimum values of the data set.

44
Box plot without outliers

• The figure below shows the construction of a box plot without outliers:

• Sometimes, instead of showing the maximum and minimum values, the
whiskers extend to the furthest data points within 1.5 × IQR of each end of the
box. Values beyond the whiskers are shown as outliers. This type of box plot will be
referred to as a ‘box plot with outliers’.

45
Box plot with outliers
To illustrate the construction of a box plot with outliers, consider the alloy compressive strength
data listed in Table 1.

Table 1. Compressive strength (psi) of aluminum-lithium specimens.

105 221 183 186 121 181 180 143
97 154 153 174 120 168 167 141
245 228 174 199 181 158 176 110
163 131 154 115 160 208 158 133
207 180 190 193 194 133 156 123
134 178 76 167 184 135 229 146
218 157 101 171 165 172 158 169
199 151 142 163 145 171 148 158
160 175 149 87 160 237 150 135
196 201 200 176 150 170 118 149

46
Box plot with outliers

Step 1: Find the values of Q1, Q2, and Q3 and calculate the IQR.

Q1 = 143.5, Q2 = 161.5, Q3 = 181, IQR = 37.5

Step 2: Construct the ‘box’ (can be either horizontal or vertical)

Step 3: Find the upper and lower limits for outliers.

The upper limit is given by: 181 + 1.5 × IQR = 237.25
The lower limit is given by: 143.5 − 1.5 × IQR = 87.25

Step 4: Find the closest data points within the upper and lower limits and draw the ‘whiskers’
extending to these points.

Step 5. Mark the outlier values and label the plot.
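
The numerical parts of Steps 1, 3, 4, and 5 can be sketched in Python using the data from Table 1. The sketch assumes `statistics.quantiles` with `method="exclusive"`, which uses the same (n + 1)P/100 position rule as the earlier slides. The variable names are illustrative, and drawing the plot itself is left out.

```python
import statistics

# Compressive strength (psi) of the 80 aluminum-lithium specimens (Table 1)
strength = [
    105, 221, 183, 186, 121, 181, 180, 143,
    97, 154, 153, 174, 120, 168, 167, 141,
    245, 228, 174, 199, 181, 158, 176, 110,
    163, 131, 154, 115, 160, 208, 158, 133,
    207, 180, 190, 193, 194, 133, 156, 123,
    134, 178, 76, 167, 184, 135, 229, 146,
    218, 157, 101, 171, 165, 172, 158, 169,
    199, 151, 142, 163, 145, 171, 148, 158,
    160, 175, 149, 87, 160, 237, 150, 135,
    196, 201, 200, 176, 150, 170, 118, 149,
]

# Step 1: quartiles and interquartile range
q1, q2, q3 = statistics.quantiles(strength, n=4, method="exclusive")
iqr = q3 - q1

# Step 3: limits beyond which a value counts as an outlier
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

# Step 4: whiskers reach the furthest data points inside the limits
inside = [x for x in strength if lower_limit <= x <= upper_limit]
whisker_low, whisker_high = min(inside), max(inside)

# Step 5: values outside the limits are marked as outliers
outliers = sorted(x for x in strength if x < lower_limit or x > upper_limit)

print("Q1, Q2, Q3:", q1, q2, q3, "IQR:", iqr)
print("Limits:", lower_limit, upper_limit)
print("Whiskers:", whisker_low, whisker_high)
print("Outliers:", outliers)
```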

47
Box plot (with outliers)
The figure shows the resulting box plot (with outliers) of the compressive strength data of 80
aluminum-lithium alloy specimens. We can see that the dataset contains three outliers, at
245, 87, and 76.

[Figure: box plot of the compressive strength data, with the IQR marked]

48
Histograms
• A histogram is a way of representing grouped data. It looks like a bar chart, but
with data values on the x-axis and frequency (or relative frequency, which is
normalized relative to the sample size, n) on the y-axis.

• Unlike a bar chart, there are no spaces between the bars on a
histogram (the x-axis is continuous). For the data in Slide
21, the histogram would look like:

49
Grouping data into intervals

• When the number of observations is large, it is useful to partition
the data into intervals or classes (e.g. different age groups for people: 0–16,
17–30, 31–45, 46+).

• Intervals should be
  • Mutually exclusive: Every observation is assigned to only one group,
    without any overlap
  • Exhaustive: Every observation is assigned to a group

• In addition, the intervals are normally of equal width, although the first or
last group may be open-ended.

50
Frequency distributions

• A frequency distribution is a table that shows classes or intervals of data entries
with a count of the number of entries in each class.

• The data is gathered into bins or cells, which are defined by the boundaries of
the intervals.

• The boundaries of the intervals should be convenient values (e.g. whole
numbers), as should the interval width.

• The number of classes multiplied by the interval width should exceed the range
of the data (meaning that all the data fit within the defined intervals).

51
Constructing a frequency distribution
Step 1. Decide on the number of classes (k) to include in the frequency distribution. We can do this
using Sturges' rule and then rounding up.
Sturges' rule: $k = 1 + 3.22 \log_{10} n$, where n = sample size

Step 2. Find the class interval (i) as follows: determine the range (r) of the data, divide the range by
the number of classes, and round up to the next convenient number.

$i = \frac{r}{k}$

Step 3. Find the class limits. You can use the minimum data value as the lower limit of the first
class.

Step 4. Make a tally mark for each data entry in the row of the appropriate class. Count the tally
marks to find the frequency (f) for each class.

52
Worked Example 2: frequency distribution

The following sample data set lists the number of minutes 50 students spent on
social media during a given day. Construct a frequency distribution of the data.

50 40 41 17 11 7 22 44 28 21 19 23 37 51 54 42 88
41 78 56 72 56 17 7 69 30 80 56 29 33 46 31 39 20
18 29 34 59 73 77 36 39 30 62 54 67 39 31 53 44

53
Worked Example 2: frequency distribution

Solution:
1. $k = 1 + 3.22 \log_{10} 50 = 6.47$ → Round up to 7
2. $i = \frac{88 - 7}{7} = 11.57$ → Round up to 12

Class      Tally            Frequency, f    Relative frequency    Cumulative frequency
7 – 18     |||| |           6               0.12                  6
19 – 30    |||| ||||        10              0.20                  16
31 – 42    |||| |||| |||    13              0.26                  29
43 – 54    |||| |||         8               0.16                  37
55 – 66    ||||             5               0.10                  42
67 – 78    |||| |           6               0.12                  48
79 – 90    ||               2               0.04                  50
                            Σf = 50         1.00

54
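
The four construction steps can be sketched in Python for the social media data. This is an illustration under stated assumptions: "round up to the next convenient number" is taken to mean rounding up to the next whole number, the first class starts at the minimum value, and the coefficient 3.22 is the one given on the slide (Sturges' rule is often quoted with 3.322, which also gives 7 classes here).

```python
import math

# Minutes spent on social media by 50 students (Worked Example 2)
minutes = [50, 40, 41, 17, 11, 7, 22, 44, 28, 21, 19, 23, 37, 51, 54, 42, 88,
           41, 78, 56, 72, 56, 17, 7, 69, 30, 80, 56, 29, 33, 46, 31, 39, 20,
           18, 29, 34, 59, 73, 77, 36, 39, 30, 62, 54, 67, 39, 31, 53, 44]

n = len(minutes)

# Step 1: number of classes, rounded up
k = math.ceil(1 + 3.22 * math.log10(n))

# Step 2: class interval = range / number of classes, rounded up
data_range = max(minutes) - min(minutes)
width = math.ceil(data_range / k)
if k * width == data_range:
    width += 1  # ensure k * width exceeds the range so the maximum fits

# Step 3: the minimum data value is the lower limit of the first class
start = min(minutes)

# Step 4: count the entries falling in each class
frequencies = [0] * k
for value in minutes:
    frequencies[(value - start) // width] += 1

cumulative = 0
for index, f in enumerate(frequencies):
    lower = start + index * width
    upper = lower + width - 1
    cumulative += f
    print(f"{lower:2d} - {upper:2d}  f = {f:2d}  relative = {f / n:.2f}  cumulative = {cumulative}")
```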
Histogram of a frequency distribution
— Histograms are commonly used to display frequency distributions. However, since the histogram
is on a continuous scale, the class boundaries for discrete data will need to be adjusted so that
there are no gaps in between.

• For the data from Worked Example 2, the distance from the upper limit of the first class to the lower limit
of the second class is 19 – 18 = 1. Half this distance is 0.5. Hence, on the histogram, the upper
boundary used for the first class (which is also the lower boundary for the second class) is 18.5.
Similarly, the second class is adjusted to be between 18.5 and 30.5, and so on.

— Note that for the first class, the starting value is normally adjusted down to maintain an equal
width to the other intervals. In this case, the starting value would be 6.5.

55
Histogram of a frequency distribution
— The resulting histogram is constructed as follows, with the x-axis labelled with either the
class midpoints (left) or class boundaries (right):

[Figure: two histograms titled "Social media usage", one labelled with class midpoints (left) and one with class boundaries (right)]

56
Frequency polygon

• A frequency polygon is like a histogram, but instead of bars, the class midpoints are
connected using straight lines.

[Figure: frequency polygon titled "Social media usage"]

To ensure that the graph begins and ends on the horizontal axis, extend the
graph by one class width at each end.

57
Relative frequency histogram

• The corresponding relative frequency histogram, obtained by dividing the f
values through by n, is as follows:

[Figure: relative frequency histogram titled "Social media usage"]

58
Ogive
— Cumulative frequency refers to the total frequency value at each upper class boundary (as shown
in the table). A plot of cumulative frequency is known as an ogive (see the figure).

Social media usage

— Notice that the graph starts at 6.5, where the cumulative frequency is 0, and ends at 90.5, where
the cumulative frequency is 50.
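
The points plotted on the ogive can be generated from the class frequencies as sketched below. The values are taken from the frequency table in Worked Example 2, the half-unit boundary adjustment follows the histogram slides, and the variable names are illustrative.

```python
from itertools import accumulate

first_lower_limit = 7            # lower limit of the first class
class_width = 12
frequencies = [6, 10, 13, 8, 5, 6, 2]

# Class boundaries: shift the first lower limit down by 0.5, then step by the width
first_boundary = first_lower_limit - 0.5
boundaries = [first_boundary + i * class_width for i in range(len(frequencies) + 1)]

# Cumulative frequency is 0 at the first boundary, then the running total
cumulative = [0] + list(accumulate(frequencies))

# Each ogive point pairs a class boundary with the cumulative frequency reached there
ogive_points = list(zip(boundaries, cumulative))
for boundary, total in ogive_points:
    print(boundary, total)
```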

59
Class Exercise 3: histograms

60
Topic 5: sampling techniques

61
What is sampling?

• Sampling is a technique of selecting individual members or a subset of the
population to make statistical inferences from them and estimate characteristics
of the whole population.

• Proper sampling is a time-saving and cost-effective way to design experiments.

• For example, different sampling methods are widely used by researchers in the
field of market research so that they do not need to study the entire population to
collect actionable insights.

62
Types of sampling

• Probability sampling: in this method, a researcher chooses members of a
population randomly. Hence, all the members have an equal opportunity to be a
part of the sample.

• Non-probability sampling: in this method, individuals are selected based on
non-random criteria, and not every individual has a chance of being included.

63
Probability sampling

• Probability sampling is mainly used in quantitative research, as it is more likely to
produce results that are representative of the entire population.

• Different types of probability sampling techniques include:
  • Simple random sampling
  • Systematic sampling
  • Stratified sampling
  • Cluster sampling

64
Simple random sampling

— Sampling frame includes the entire population.


— True randomization (such as a random number generator) is used to select members of
the population.
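
A minimal sketch of simple random sampling, assuming a hypothetical sampling frame of 30 numbered students. Python's `random` module is a pseudo-random generator, which is usually adequate for classroom exercises. The optional seed is only there to make the illustration repeatable.

```python
import random


def simple_random_sample(population, sample_size, seed=None):
    """Select sample_size distinct members, each equally likely to be chosen."""
    rng = random.Random(seed)
    return rng.sample(list(population), sample_size)


student_ids = list(range(1, 31))  # hypothetical sampling frame of 30 students
sample = simple_random_sample(student_ids, 5, seed=1)
print(sample)
```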

65
Systematic sampling

— Slightly easier to conduct than simple random sampling.


— Each member is assigned a number and members are chosen at regular intervals.
— Can result in bias or skew if there is a hidden pattern in the list.
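
Systematic sampling can be sketched as picking a random starting point and then taking members at a fixed interval. The function name, the numbered list of 30 students, and the choice of interval (population size divided by sample size, rounded down) are assumptions for this illustration.

```python
import random


def systematic_sample(population, sample_size, seed=None):
    """Choose a random start in the first interval, then every interval-th member."""
    members = list(population)
    interval = len(members) // sample_size            # sampling interval
    start = random.Random(seed).randrange(interval)   # random start within the first interval
    return [members[start + i * interval] for i in range(sample_size)]


student_ids = list(range(1, 31))  # hypothetical numbered list of 30 students
sample = systematic_sample(student_ids, 5, seed=1)
print(sample)
```

If the list has a repeating pattern with the same period as the interval, every selected member would share that pattern, which is the hidden-pattern bias mentioned above.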

66
Stratified sampling

• The population is divided into subgroups (or strata) according to certain
criteria (e.g. gender, age range, job role).
• Equal numbers are then chosen from each stratum using simple random
or systematic sampling.
• This can ensure each important category is represented properly.
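
A minimal sketch of stratified sampling with equal numbers per stratum, using simple random sampling inside each stratum. The strata labels and member names below are hypothetical.

```python
import random


def stratified_sample(strata, per_stratum, seed=None):
    """strata maps each stratum label to the list of its members.
    Returns per_stratum randomly chosen members from every stratum."""
    rng = random.Random(seed)
    return {label: rng.sample(members, per_stratum)
            for label, members in strata.items()}


# Hypothetical population split into two strata by year of study
strata = {
    "first_year": [f"A{i}" for i in range(1, 21)],
    "second_year": [f"B{i}" for i in range(1, 16)],
}
sample = stratified_sample(strata, 4, seed=1)
print(sample)
```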

67
Cluster sampling

— The population is divided into subgroups that should all have similar
characteristics to the whole group. The subgroups are then randomly
selected.
— If subgroups are very large, these can be further sampled (known as
multistage sampling).
— Can lead to errors if the subgroups are not truly representative.
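
Cluster sampling differs from stratified sampling in that whole subgroups are selected at random, rather than individuals within every subgroup. A minimal sketch, with hypothetical cluster labels and members:

```python
import random


def cluster_sample(clusters, clusters_to_pick, seed=None):
    """clusters maps each cluster label to the list of its members.
    Randomly selects whole clusters and keeps every member of each."""
    rng = random.Random(seed)
    chosen_labels = rng.sample(sorted(clusters), clusters_to_pick)
    return {label: clusters[label] for label in chosen_labels}


# Hypothetical population already divided into four similar clusters
clusters = {
    "class_A": ["A1", "A2", "A3"],
    "class_B": ["B1", "B2", "B3"],
    "class_C": ["C1", "C2", "C3"],
    "class_D": ["D1", "D2", "D3"],
}
sample = cluster_sample(clusters, 2, seed=1)
print(sample)
```

For multistage sampling, the members of each selected cluster could then be sampled again, for example with simple random sampling.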

68
Non-probability sampling
— Non-probability sampling is mainly used in qualitative or exploratory research,
where the goal is to gain an initial understanding of a small or under-researched
population.

— Can be cheaper and easier to implement than probability sampling but is also
more prone to sampling bias which can cause errors.

— Different types of non-probability sampling techniques include:


Ø Convenience sampling
Ø Voluntary response sampling
Ø Purposive sampling
Ø Snowball sampling

69
Convenience sampling

— Members of the population are selected based on ease of access (e.g. living in the same city or
country as the researcher).
— Cheap and easy to implement, but there is no way to tell if the sample is representative, so
results may not be generalizable.

70
Voluntary response sampling

— The sample consists of people who willingly participate (volunteer) in the study. Hence, it is easy
for the researcher to implement.
— Likely to result in a degree of sample bias as some people may be inherently more likely than
others to volunteer.

71
Purposive sampling

— Researcher uses their prior knowledge or expertise to select the most suitable sample (also
known as judgement sampling).
— Cheap and easy to implement, but there is no way to tell if the sample is representative, so
results may not be generalizable.

72
Snowball sampling

— If the population is hard to access, snowball sampling can be used to recruit participants via other
participants.
— The number of people you have access to grows rapidly (or “snowballs”) as you contact more
people.

73
Class Activity: sampling

Collect the heights of all students in a spreadsheet. Calculate the mean height of
the class.

a) Use simple random sampling to select a sample of 5 students. Calculate the
mean. How does it differ from the entire class mean?
b) Use simple random sampling to select a sample of 10 students. Calculate
the mean. How does it differ from the entire class mean?
c) Use stratified sampling to select samples of 10 boys and 10 girls,
respectively. Calculate the sample means. How do they differ from the entire
class mean?

74
Problem Set 2

75
Question 1

The concentration of ions in a solution was measured by an
operator using the same instrument 10 times. She obtained
the following data (ppm):

7.15, 7.20, 7.18, 7.19, 7.21, 7.20, 7.16, 7.18, 7.20, 7.17

(a) Calculate the sample mean, median, and mode
(b) Find the interquartile range
(c) Calculate the variance and standard deviation

76
Question 2
From the following data,
1.09 1.92 2.31 1.79 2.28 1.74 1.47 1.97
0.85 1.24 1.58 2.03 1.7 2.17 2.55 2.11
1.86 1.9 1.68 1.51 1.64 0.72 1.69 1.85
1.82 1.79 2.46 1.88 2.08 1.67 1.37 1.93
1.4 1.64 2.09 1.75 1.63 2.37 1.75 1.69

(a) Find the mean, median, and mode.


(b) Calculate the variance and standard deviation.
(c) Construct a box and whisker plot with outliers.

77
Question 3

Fifty students were asked how much sleep they get per school night, rounded
to the nearest hour. The results are shown in Table 1. Based on these data:

(a) Calculate the 28th and 80th percentiles
(b) Construct a relative frequency histogram
(c) Construct an ogive

Table 1. Sleep data for students.

Amount of sleep per school night (hours)    Frequency
4                                           2
5                                           5
6                                           7
7                                           12
8                                           14
9                                           7
10                                          3

78
Question 4

79
