MetNum1 2023 1 Week 10

This document provides an outline for the topics covered in Week 10 of the CE23216 MetNum 1 course. The week focuses on descriptive statistics and sampling techniques. Key concepts covered include measures of location (percentiles, quartiles), measures of variability (range, standard deviation), measures of central tendency (mean, median, mode), and common sampling methods. The document also includes examples of calculating percentiles from a data set and determining the mean from a set of observations.

Uploaded by

donbradman334

CE23216 MetNum 1

Semester 2023/1

Week 10

Rehan Hussain, Ph.D.

1
Topics covered in Part 2

Week Title
Week 9 Introduction to statistics and probability
Week 10 Descriptive statistics and sampling techniques
Week 11 Probability theory
Week 12 Discrete and continuous probability distributions
Week 13 Variance, co-variance, and correlation
Week 14 Statistical inference methods
Week 15 Statistical analysis using Octave/MATLAB
Week 16 Final exam (UAS)

2
Outline

What's in this week’s lectures?

1. Measures of location and center

2. Measures of variability

3. Measures of shape

4. Data presentation methods

5. Sampling methods

3
Cartoon of the week

4
What is descriptive statistics?

Descriptive statistics aims to summarize a sample, rather than to learn about the population from which the sample was drawn.

Descriptive statistics forms the basis of nearly every quantitative analysis of data!

Data are usually described in terms of location, central tendency, variability, and shape.

5
Summary of measures

Measures of Location
• Percentile
• Quartile

Measures of Variability
• Range
• Interquartile range
• Variance
• Standard deviation

Measures of Central Tendency
• Median
• Mode
• Mean

Measures of Shape
• Skewness
• Kurtosis

6
Topic 1: Measures of location and center

7
Sorting data

• To calculate useful statistics, the data collected in an experiment usually needs to be sorted in some way.

• Two ways of doing this are ordering and grouping.

• In ordering, the data are normally arranged from the smallest value to the largest (a.k.a. ascending order).

• In grouping, data are grouped according to the number of times each value appears, which is known as the frequency.

8
Sorting data

• Suppose a group of eight children have the following ages:

5, 6, 8, 5, 5, 7, 5, 6

We can put the data in ascending order, as follows:

i xi
1 5
2 5
3 5
4 5
5 6
6 6
7 7
8 8

i - observation number
xi - data value

We can group the data, as follows:

i fi xi
1 4 5
2 2 6
3 1 7
4 1 8

i - group number
fi - frequency
xi - data value
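As a quick check of the two tables, ordering and grouping can be sketched in a few lines of Python. The variable names (`ages`, `ordered`, `grouped`) are illustrative choices, not part of the lecture material.

```python
from collections import Counter

ages = [5, 6, 8, 5, 5, 7, 5, 6]

# Ordering: arrange the observations from smallest to largest.
ordered = sorted(ages)

# Grouping: count how many times each distinct value appears (its frequency).
frequency = Counter(ages)
grouped = sorted(frequency.items())  # list of (value, frequency) pairs

print(ordered)
for group_number, (value, count) in enumerate(grouped, start=1):
    print(group_number, count, value)  # columns: i, fi, xi
```

The printed rows follow the same column order as the grouped table (group number, frequency, data value).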

9
Percentiles

• Percentiles are a measure of location, whereby the data are partitioned into 100 segments.

• The Pth percentile in the ordered set is that value below which P% of the observations in the set lie.

• The position of the Pth percentile is given by

(𝑛 + 1)𝑃/100

where 𝑛 is the number of observations (or sample size).

• If the position is not a whole number, linear interpolation is used to find the correct percentile value.

10
Worked Example 1: percentiles
The marks scored by a group of students on their final exams are shown below.

Marks: 33, 26, 24, 21, 19, 20, 18, 18, 52, 56, 27, 22, 18, 49, 22, 20, 23, 32, 20, 18

Find the 50th, 80th, and 90th percentiles of the given data set.

11
Worked Example 1: percentiles
Solution:

The first step is to sort the data in ascending order and write the corresponding index number (𝑖) for each observation.

i Marks (sorted)
1 18
2 18
3 18
4 18
5 19
6 20
7 20
8 20
9 21
10 22
11 22
12 23
13 24
14 26
15 27
16 32
17 33
18 49
19 52
20 56

12
Worked Example 1: percentiles

• To find the 50th percentile, determine its position:

(n + 1)P/100 = (20 + 1)(50/100) = 10.5

• Hence, we linearly interpolate between the 10th and 11th observations:

[Figure: marks plotted against observation number for observations 8 to 12]

• In this case, since both the 10th and 11th observations have the same value, the 50th percentile is 22.
13
Worked Example 1: percentiles

• To find the 80th percentile, determine its position:

(n + 1)P/100 = (20 + 1)(80/100) = 16.8

• Hence, we linearly interpolate between the 16th and 17th observations:

[Figure: marks plotted against observation number for observations 15 to 18]

• In this case, the 16th observation (32) and the 17th observation (33) have different values, so the 80th percentile is

P80 = 32 + 0.8(33 – 32) = 32.8

14
Worked Example 1: percentiles

• To find the 90th percentile, determine its position:

(n + 1)P/100 = (20 + 1)(90/100) = 18.9

• Thus, the 90th percentile is located between the 18th observation (49) and the 19th observation (52).

• Using linear interpolation, we get:

P90 = 49 + 0.9(52 – 49) = 49 + 2.7 = 51.7
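The three results of Worked Example 1 can be reproduced with a short sketch of the (n + 1)P/100 position rule followed by linear interpolation. The function name `percentile` and the clamping at the two ends of the data (for positions below 1 or beyond n) are assumptions made here; the slides do not say how such positions should be handled. Library routines often use a different position rule by default, so their answers may differ slightly.

```python
import math

def percentile(data, p):
    """Return the p-th percentile using the (n + 1)p/100 position rule."""
    ordered = sorted(data)
    n = len(ordered)
    position = (n + 1) * p / 100   # 1-based position in the ordered data
    lower = math.floor(position)   # observation number just below the position
    fraction = position - lower    # fractional distance to the next observation
    if lower < 1:                  # assumption: clamp to the smallest value
        return ordered[0]
    if lower >= n:                 # assumption: clamp to the largest value
        return ordered[-1]
    below, above = ordered[lower - 1], ordered[lower]
    return below + fraction * (above - below)

marks = [33, 26, 24, 21, 19, 20, 18, 18, 52, 56,
         27, 22, 18, 49, 22, 20, 23, 32, 20, 18]

for p in (50, 80, 90):
    print(p, round(percentile(marks, p), 2))
```

Sorting inside the function means the marks can be passed in the order they were recorded.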

15
Quartiles

• Quartiles are the specific percentiles which break down the ordered data set into quarters.

  • The 25th percentile is known as the first (or lower) quartile (Q1).

  • The 50th percentile is known as the second (or middle) quartile (Q2).

  • The 75th percentile is known as the third (or upper) quartile (Q3).

16
Arithmetic mean or average
• The arithmetic mean or average of a set of measurements is a measure of center. It is equal to the sum of the measurements divided by the total number of measurements, n.
• For a sample, the mean is normally assigned the symbol 𝑥̅ (pronounced as ‘x bar’).

Ungrouped data: $\bar{x} = \frac{\sum x_i}{n}$

Grouped data: $\bar{x} = \frac{\sum f_i x_i}{n}$

Note: If we were able to enumerate the whole population, the population mean would be assigned the symbol μ.

17
Calculating the sample mean

The table shows 8 observations of pull-off force from engine connectors, in newtons. The mean pull-off force is given by:

$$\bar{x} = \frac{\sum_{i=1}^{8} x_i}{8} = \frac{12.6 + 12.9 + \dots + 13.1}{8} = \frac{104}{8} = 13.0\ \text{N}$$

i xi
1 12.6
2 12.9
3 13.4
4 12.3
5 13.6
6 13.5
7 12.6
8 13.1
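Both forms of the mean can be sketched as follows. The function names are illustrative, and the grouped example reuses the children's ages from the sorting slide as (value, frequency) pairs; here n is taken as the sum of the frequencies.

```python
def mean_ungrouped(values):
    """Sum of the observations divided by the number of observations."""
    return sum(values) / len(values)

def mean_grouped(pairs):
    """Mean from (value, frequency) pairs, with n equal to the total frequency."""
    n = sum(frequency for _, frequency in pairs)
    return sum(frequency * value for value, frequency in pairs) / n

pull_off_force = [12.6, 12.9, 13.4, 12.3, 13.6, 13.5, 12.6, 13.1]  # newtons
ages_grouped = [(5, 4), (6, 2), (7, 1), (8, 1)]                    # (xi, fi)

print(round(mean_ungrouped(pull_off_force), 1))
print(mean_grouped(ages_grouped))
```

The grouped mean agrees with the ungrouped mean of the same eight ages, since grouping only changes how the sum is collected.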

18
Median
• The median of a set of measurements is the middle measurement when the measurements are put in ascending order.

• The position of the median is 0.5(n + 1).

Note: For an odd number of observations, the median will correspond to a single data point.

• The median is the same as the second quartile or 50th percentile. Therefore, we have already seen how to calculate it (Worked Example 1).

19
Mode

The mode is the data value with the highest frequency.

For example:
1. For the set {2, 4, 9, 8, 8, 5, 3}, the mode is 8, which occurs twice.
2. For the set {2, 2, 9, 8, 8, 5, 3}, there are two modes: 8 and 2 (this
is called a bimodal set).
3. For the set {2, 4, 9, 8, 5, 3}, there is no mode (each value is
unique).
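The three cases above can be sketched with a small helper. The function name `modes` is an illustrative choice, and returning an empty list when every value is unique follows the convention on this slide (some libraries instead report every value as a mode in that case).

```python
from collections import Counter

def modes(values):
    """Return the list of modes, or an empty list if no value repeats."""
    if not values:
        return []
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:               # every value is unique: no mode
        return []
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([2, 4, 9, 8, 8, 5, 3]))
print(modes([2, 2, 9, 8, 8, 5, 3]))
print(modes([2, 4, 9, 8, 5, 3]))
```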

20
Mean vs median vs mode

The numbers of cartons (quarts) of milk purchased each week by 25 households are as follows:

0 0 1 1 1 1 1 2 2 2 2 2 2 2 2 2 3 3 3 3 3 4 4 4 5

Mean: $\bar{x} = \frac{\sum x_i}{n} = \frac{55}{25} = 2.2$

Median: 𝑀 = 2 (the 13th data point)

Mode: Mo = 2 (the number 2 appears 9 times)
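The same three values can be checked with Python's standard `statistics` module. This is only a verification sketch; the list `milk` is rebuilt here from the frequencies of each value in the data row.

```python
import statistics

# 25 households: two 0s, five 1s, nine 2s, five 3s, three 4s, one 5
milk = [0] * 2 + [1] * 5 + [2] * 9 + [3] * 5 + [4] * 3 + [5] * 1

print(statistics.mean(milk))    # arithmetic mean
print(statistics.median(milk))  # middle (13th) value of the ordered data
print(statistics.mode(milk))    # most frequent value
```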

21
Class Exercise 1: mean, median, mode

22
Topic 2: measures of variability

23
Variability (dispersion)

Measures of variability tell us how spread out the data are (in other words, the
degree to which the different data points deviate from the central value).

[Figure: three example distributions showing low, medium, and high variability]

24
Measures of variability

• Range refers to the difference between the maximum and minimum values of a dataset

• Interquartile range (IQR) refers to the difference between the third and first quartiles (Q3 - Q1)

• Variance refers to the average of the squared deviations from the mean

• Standard deviation refers to the square root of the variance

25
Range

If the n observations in a sample are denoted by x1, x2, …, xn, the sample range
is given by the largest observation in the sample minus the smallest observation,
i.e.:

r = max(xi) – min(xi)

Note that the population range is always ≥ the sample range.

26
Sample range

For the data set shown in Worked Example 1,

Range = Maximum – Minimum = 56 – 18 = 38

Quartile Position Interpolation
First quartile (Q1) (20 + 1) × 25/100 = 5.25 19 + (0.25)(1) = 19.25
Third quartile (Q3) (20 + 1) × 75/100 = 15.75 27 + (0.75)(5) = 30.75

Interquartile Range = Q3 – Q1 = 30.75 – 19.25 = 11.5
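A minimal sketch of the range and interquartile range for the Worked Example 1 marks is given below. The helper `percentile` implements the (n + 1)P/100 rule with linear interpolation; its name and its clamping at the ends of the data are assumptions made for this illustration.

```python
import math

def percentile(data, p):
    """p-th percentile from the (n + 1)p/100 position with interpolation."""
    ordered = sorted(data)
    n = len(ordered)
    position = (n + 1) * p / 100
    lower = math.floor(position)
    if lower < 1:                  # assumption: clamp at the smallest value
        return ordered[0]
    if lower >= n:                 # assumption: clamp at the largest value
        return ordered[-1]
    fraction = position - lower
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])

marks = [33, 26, 24, 21, 19, 20, 18, 18, 52, 56,
         27, 22, 18, 49, 22, 20, 23, 32, 20, 18]

data_range = max(marks) - min(marks)
q1 = percentile(marks, 25)
q3 = percentile(marks, 75)
iqr = q3 - q1

print(data_range, q1, q3, iqr)
```

Note how much smaller the IQR is than the range: the three highest marks (49, 52, 56) stretch the range but lie outside the middle half of the data.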

27
Variance

If the n observations in a sample are denoted by x1, x2, …, xn, the sample variance is

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$

Q: Why do we use a squared term? Why not just ∑(xi − 𝑥̅)?

For the N observations in a population denoted by x1, x2, …, xN, the population variance is

$$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$$
28
Bessel’s correction
You may have noticed that the sample variance is divided by (n – 1), whereas the
population variance uses the total population size N.

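The use of (n – 1) is known as Bessel’s correction. It reduces the bias (i.e. the systematic discrepancy between the estimated value and the ‘true’ value) that arises when the population variance is estimated from a sample: deviations measured from the sample mean are, on average, smaller than deviations measured from the population mean, so dividing by n would underestimate the variance. A more detailed explanation can be found here!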

The smaller the sample, the greater the bias, as the sample mean is less representative
of the population mean for smaller samples.
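The effect of Bessel's correction can be illustrated exactly on a tiny example. The sketch below assumes a hypothetical population of the six values 1 to 6 and lists every possible ordered sample of size 2 drawn with replacement; the population, the sample size, and all names are illustrative choices. Exact fractions are used so that no rounding is involved.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3, 4, 5, 6]          # hypothetical population
N = len(population)
mu = Fraction(sum(population), N)
pop_var = sum((x - mu) ** 2 for x in population) / N   # divides by N

def sample_variance(sample, divisor):
    """Sum of squared deviations from the sample mean over the given divisor."""
    xbar = Fraction(sum(sample), len(sample))
    return sum((x - xbar) ** 2 for x in sample) / divisor

n = 2
samples = list(product(population, repeat=n))   # all 36 ordered samples

# Average of the variance estimates over every possible sample
avg_divide_by_n = sum(sample_variance(s, n) for s in samples) / len(samples)
avg_divide_by_n_minus_1 = sum(sample_variance(s, n - 1) for s in samples) / len(samples)

print(pop_var, avg_divide_by_n, avg_divide_by_n_minus_1)
```

Under these assumptions, averaging the (n – 1) version over all samples recovers the population variance, while the version that divides by n comes out too small by the factor (n – 1)/n.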

29
Standard deviation

The variance is a measure of how spread out the dataset is. However, because it
uses a squared term, it does not give us a measure of how far the data is from
the mean in terms of the same units as the mean.

Therefore, to obtain a more direct measure of the variation of the data points
relative to the mean, we take the square root of the variance, which is known as
the standard deviation.
Sample standard deviation: $s = \sqrt{s^2}$
Population standard deviation: $\sigma = \sqrt{\sigma^2}$

30
Calculating the sample variance
Recall the pull-off force data in Slide 18. The mean was calculated as 𝑥̅ = 13.0 N. The table below displays the quantities needed to calculate the sample variance and sample standard deviation:

i xi xi − 𝑥̅ (xi − 𝑥̅)²
1 12.6 -0.4 0.16
2 12.9 -0.1 0.01
3 13.4 0.4 0.16
4 12.3 -0.7 0.49
5 13.6 0.6 0.36
6 13.5 0.5 0.25
7 12.6 -0.4 0.16
8 13.1 0.1 0.01
Sum 104.0 0.0 1.60

Hence, s² = 1.60/7 = 0.23 N² and s = 0.48 N (note: the desired accuracy is normally one more decimal place than the data).

31
Shortcut formula
Using the definition of the mean, it is possible to rewrite the variance formula from the previous slide,

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$

into the following form (try it yourself before watching the derivation!):

$$s^2 = \frac{\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}}{n - 1}$$

This is known as the shortcut formula, as it allows us to calculate the variance using values of xi directly, without having to subtract each value from the mean.
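For readers who want to check their own attempt, one possible route is sketched below. It uses only the definition of the sample mean, $\bar{x} = \frac{1}{n}\sum x_i$; the layout of the steps is an illustration and is not necessarily the derivation shown in the lecture.

```latex
\begin{align*}
\sum (x_i - \bar{x})^2
  &= \sum x_i^2 - 2\bar{x}\sum x_i + n\bar{x}^2 \\
  &= \sum x_i^2 - 2n\bar{x}^2 + n\bar{x}^2
     && \text{since } \sum x_i = n\bar{x} \\
  &= \sum x_i^2 - n\bar{x}^2 \\
  &= \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}
     && \text{since } \bar{x} = \tfrac{1}{n}\sum x_i
\end{align*}
```

Dividing both sides by (n – 1) gives the shortcut formula.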

32
Using the shortcut formula

$$s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2}{n - 1} = \frac{1353.60 - (104.0)^2/8}{7} = \frac{1.60}{7} = 0.2286 \approx 0.23\ \text{N}^2$$

$$s = \sqrt{0.2286} = 0.48\ \text{N}$$

When calculating s, don’t forget to use the unrounded value of s²!

i xi xi²
1 12.6 158.76
2 12.9 166.41
3 13.4 179.56
4 12.3 151.29
5 13.6 184.96
6 13.5 182.25
7 12.6 158.76
8 13.1 171.61
Sums 104.0 1,353.60
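The definition and the shortcut formula can be compared numerically on the pull-off force data. The function names below are illustrative. As a general caution, the shortcut form subtracts two large, nearly equal numbers, so in floating-point arithmetic it can lose precision for data with a large mean and a small spread; for this data set the two forms agree closely.

```python
import math

def variance_definition(values):
    """Sample variance from squared deviations about the sample mean."""
    n = len(values)
    xbar = sum(values) / n
    return sum((x - xbar) ** 2 for x in values) / (n - 1)

def variance_shortcut(values):
    """Sample variance from the sum of squares and the squared sum."""
    n = len(values)
    sum_of_squares = sum(x * x for x in values)
    squared_sum = sum(values) ** 2
    return (sum_of_squares - squared_sum / n) / (n - 1)

force = [12.6, 12.9, 13.4, 12.3, 13.6, 13.5, 12.6, 13.1]  # newtons

s2 = variance_definition(force)
s = math.sqrt(s2)                 # uses the unrounded variance
print(round(s2, 4), round(variance_shortcut(force), 4), round(s, 2))
```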

33
Class Exercise 2: mean and std dev

(a) Suppose the mean score on a national test is 400 with a standard deviation of 50. If each score is increased by 25, what are the new mean and standard deviation?
(b) Suppose the mean score on a national test is 400 with a standard deviation of 50. If each score is increased by 25%, what are the new mean and standard deviation?

34
Topic 3: measures of shape

35
Skewness and kurtosis

• Skewness is a measure of the degree of asymmetry of a distribution (it is easiest to see once the data have been grouped, e.g. in a histogram). The data can be either:
  • Skewed to the left (negative skew)
  • Symmetric (unskewed)
  • Skewed to the right (positive skew)

• Kurtosis is a measure of the flatness or peakedness of a distribution. The data can be:
  • Platykurtic (relatively flat)
  • Mesokurtic (normal)
  • Leptokurtic (relatively peaked)
36
Skewness

Skewed to left: Mean < Median < Mode
Symmetric: Mean = Median = Mode
Skewed to right: Mean > Median > Mode
37
Kurtosis
• Leptokurtic: high and thin, peaked distribution
• Mesokurtic: normal in shape, not too flat and not too peaked
• Platykurtic: flat and spread out, flat distribution
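The slides describe skewness and kurtosis qualitatively and do not give formulas. One common way to put numbers on them, used here purely as an assumption, is through central moments: skewness as m3 / m2^1.5 and excess kurtosis as m4 / m2² − 3, where mk is the average of (xi − 𝑥̅)^k. Other definitions with small-sample corrections exist and give slightly different values. The data sets below are made up for illustration.

```python
def central_moment(values, k):
    """Average of the k-th powers of the deviations from the mean."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** k for x in values) / n

def skewness(values):
    """Moment coefficient of skewness: negative = left skew, positive = right skew."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    """Kurtosis relative to the normal distribution (0 = mesokurtic)."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3

symmetric = [1, 2, 3, 4, 5]        # hypothetical data
right_skewed = [1, 1, 1, 2, 10]    # hypothetical data with a long right tail
left_skewed = [1, 9, 10, 10, 10]   # hypothetical data with a long left tail

print(skewness(symmetric), skewness(right_skewed), skewness(left_skewed))
print(excess_kurtosis(symmetric))
```

With this convention, a negative excess kurtosis indicates a platykurtic shape and a positive value a leptokurtic one.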

38
Topic 4: data presentation methods

39
Data presentation

In science and engineering, the communication of technical data is hugely important. For example, it enables researchers to share their findings with the public or each other, and engineers to present their designs to the company management.

Here, we will look at some common ways of communicating statistical data through visual representations.

40
Methods for displaying data

• Pie chart: categories are represented as percentages of the total. It is circular in shape, like a pie.

• Bar chart: categories are represented as rectangles (‘bars’) with different heights. Can be either vertical or horizontal.

• Frequency polygon: line graph of frequency

• Ogive: line graph of cumulative frequency

• Time series plot: shows how values change over time

41
Pie charts and bar charts
Student Cola Preference From Survey

Colas Frequency (Count)
Coca Cola 88
Pepsi 63
Bloxy Cola 112
Mecca Cola 47
RC Cola 13
Corsica Cola 27
Zam Zam Cola 47

[Figures: bar chart and pie chart of the survey frequencies, both titled ‘Student Cola Preference From Survey’]

42
Time series plots
Time series plots show the data (or statistic) value on the vertical axis
and the time on the horizontal axis.
Such plots reveal trends, cycles or other time-oriented behavior that
could not otherwise be seen in the data.

Company sales: (a) yearly and (b) quarterly.

43
Box and whisker plots

Box and whisker plots (also just called box plots) illustrate the data in a graphical display that simultaneously describes several important features of a data set, such as:
• Center
• Spread
• Symmetry
• Identification of outliers

Five specific values are used: the median (Q2), the first quartile (Q1), the third quartile (Q3) and the maximum and minimum values of the data set.

44
Box plot without outliers

• The figure below shows the construction of a box plot without outliers:

• Sometimes, instead of showing the maximum and minimum values, the whiskers extend to the furthest data points within 1.5 IQR on each side of the box. Values beyond the whiskers are shown as outliers. This type of box plot will be referred to as a ‘box plot with outliers’.

45
Box plot with outliers
To illustrate the construction of a box plot with outliers, consider the alloy compressive strength
data listed in Table 1.

Table 1. Compressive strength (psi) of aluminum-lithium specimens.

105 221 183 186 121 181 180 143

97 154 153 174 120 168 167 141
245 228 174 199 181 158 176 110
163 131 154 115 160 208 158 133
207 180 190 193 194 133 156 123
134 178 76 167 184 135 229 146
218 157 101 171 165 172 158 169
199 151 142 163 145 171 148 158
160 175 149 87 160 237 150 135
196 201 200 176 150 170 118 149

46
Box plot with outliers

Step 1: Find the values of Q1, Q2, and Q3 and calculate the IQR.

Q1 = 143.5, Q2 = 161.5, Q3 = 181, IQR = 37.5

Step 2: Construct the ‘box’ (can be either horizontal or vertical)

Step 3: Find the upper and lower limits for outliers.

The upper limit is given by: 181 + 1.5 IQR = 237.25

The lower limit is given by: 143.5 − 1.5 IQR = 87.25

Step 4: Find the closest data points within the upper and lower limits and draw the ‘whiskers’
extending to these points.

Step 5. Mark the outlier values and label the plot.

47
Box plot (with outliers)
The figure shows the resulting box plot (with outliers) of the compressive strength data of 80
aluminum-lithium alloy specimens. We can see that the dataset contains three outliers, at
245, 87, and 76.
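The numbers behind this box plot can be recomputed from Table 1 with a short sketch. It assumes the quartiles are found with the (n + 1)P/100 position rule from the percentile slides; the helper `percentile` and the variable names are illustrative. No plotting is done here, only the five quantities needed to draw the plot.

```python
import math

def percentile(data, p):
    """p-th percentile from the (n + 1)p/100 position with interpolation."""
    ordered = sorted(data)
    n = len(ordered)
    position = (n + 1) * p / 100
    lower = math.floor(position)
    if lower < 1:                  # assumption: clamp at the smallest value
        return ordered[0]
    if lower >= n:                 # assumption: clamp at the largest value
        return ordered[-1]
    fraction = position - lower
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])

strength = [
    105, 221, 183, 186, 121, 181, 180, 143,
    97, 154, 153, 174, 120, 168, 167, 141,
    245, 228, 174, 199, 181, 158, 176, 110,
    163, 131, 154, 115, 160, 208, 158, 133,
    207, 180, 190, 193, 194, 133, 156, 123,
    134, 178, 76, 167, 184, 135, 229, 146,
    218, 157, 101, 171, 165, 172, 158, 169,
    199, 151, 142, 163, 145, 171, 148, 158,
    160, 175, 149, 87, 160, 237, 150, 135,
    196, 201, 200, 176, 150, 170, 118, 149,
]  # compressive strength in psi (Table 1)

# Step 1: quartiles and IQR
q1, q2, q3 = (percentile(strength, p) for p in (25, 50, 75))
iqr = q3 - q1

# Step 3: limits beyond which a point counts as an outlier
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

# Step 4: whiskers end at the most extreme points inside the limits
inside = [x for x in strength if lower_limit <= x <= upper_limit]
whisker_low, whisker_high = min(inside), max(inside)

# Step 5: everything outside the limits is an outlier
outliers = sorted(x for x in strength if x < lower_limit or x > upper_limit)

print(q1, q2, q3, iqr, lower_limit, upper_limit)
print(whisker_low, whisker_high, outliers)
```

Note that 237 lies just inside the upper limit of 237.25, so it becomes the end of the upper whisker rather than an outlier.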


48
Histograms
• A histogram is a way of representing grouped data. It looks like a bar chart, but with data values on the x-axis and frequency (or relative frequency, which is normalized relative to the sample size, n) on the y-axis.

• Unlike a bar chart, there are no spaces between the bars on a histogram (the x-axis is continuous). For the data in Slide 21, the histogram would look like:

49
Grouping data into intervals

• When the number of observations is large, it is useful to partition the data into intervals or classes (e.g. different age groups for people: 0–16, 17–30, 31–45, 46+).

• Intervals should be
  • Mutually exclusive: Every observation is assigned to only one group, without any overlap
  • Exhaustive: Every observation is assigned to a group

In addition, the intervals are normally of equal width, although the first or last group may be open-ended.

50
Frequency distributions

A frequency distribution is a table that shows classes or intervals of data entries with a count of the number of entries in each class.

The data is gathered into bins or cells, which are defined by the boundaries of the intervals.

The boundaries of the intervals should be convenient values (e.g. whole numbers), as should the interval width.

The number of classes multiplied by the interval width should exceed the range of the data (meaning that all the data fit within the defined intervals).

51
Constructing a frequency distribution
Step 1. Decide on the number of classes (k) to include in the frequency distribution. We can do this using Sturges’ rule and then rounding up.
Sturges’ rule: k = 1 + 3.322 log₁₀ n, where n = sample size

Step 2. Find the class interval (i), i.e. the class width, as follows: determine the range (r) of the data, divide the range by the number of classes, and round up to the next convenient number.

$$i = \frac{r}{k}$$

Step 3. Find the class limits. You can use the minimum data value as the lower limit of the first class.

Step 4. Make a tally mark for each data entry in the row of the appropriate class. Count the tally marks to find the frequency (f) for each class.

52
Worked Example 2: frequency distribution

The following sample data set lists the number of minutes 50 students spent on
social media during a given day. Construct a frequency distribution of the data.

50 40 41 17 11 7 22 44 28 21 19 23 37 51 54 42 88
41 78 56 72 56 17 7 69 30 80 56 29 33 46 31 39 20
18 29 34 59 73 77 36 39 30 62 54 67 39 31 53 44

53
Worked Example 2: frequency distribution

Solution:
1. 𝑘 = 1 + 3.322 log₁₀ 50 = 6.64 → Round up to 7
2. 𝑖 = (88 − 7)/7 = 11.57 → Round up to 12

Class Tally Frequency, f Relative frequency Cumulative frequency
7 – 18 |||| | 6 0.12 6
19 – 30 |||| |||| 10 0.2 16
31 – 42 |||| |||| ||| 13 0.26 29
43 – 54 |||| ||| 8 0.16 37
55 – 66 |||| 5 0.1 42
67 – 78 |||| | 6 0.12 48
79 – 90 || 2 0.04 50
Σ f = 50 1.00

54
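The frequency table for Worked Example 2 can be rebuilt with a short sketch of the four construction steps. It assumes Sturges' rule in its base-2 form, k = 1 + log₂ n (equivalent to 1 + 3.322 log₁₀ n), and it assumes integer data, so that each class runs from its lower limit to the lower limit plus the width minus 1. "Round up to the next convenient number" is interpreted here as rounding up to the next whole number. All variable names are illustrative.

```python
import math
from itertools import accumulate

minutes = [
    50, 40, 41, 17, 11, 7, 22, 44, 28, 21, 19, 23, 37, 51, 54, 42, 88,
    41, 78, 56, 72, 56, 17, 7, 69, 30, 80, 56, 29, 33, 46, 31, 39, 20,
    18, 29, 34, 59, 73, 77, 36, 39, 30, 62, 54, 67, 39, 31, 53, 44,
]
n = len(minutes)

# Step 1: number of classes from Sturges' rule, rounded up
k = math.ceil(1 + math.log2(n))

# Step 2: class width = range / k, rounded up to a whole number
data_range = max(minutes) - min(minutes)
width = math.ceil(data_range / k)

# Step 3: class limits, starting at the minimum data value
classes = [(min(minutes) + j * width, min(minutes) + (j + 1) * width - 1)
           for j in range(k)]

# Step 4: count the entries that fall in each class
frequencies = [sum(lo <= x <= hi for x in minutes) for lo, hi in classes]
relative = [f / n for f in frequencies]
cumulative = list(accumulate(frequencies))

for (lo, hi), f, rf, cf in zip(classes, frequencies, relative, cumulative):
    print(f"{lo} - {hi}", f, rf, cf)
```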
Histogram of a frequency distribution
Histograms are commonly used to display frequency distributions. However, since the histogram
is on a continuous scale, the class boundaries for discrete data will need to be adjusted so that
there are no gaps in between.

For the data from Worked Example 2, the distance from the upper limit of the first class to the lower limit of the second class is 19 – 18 = 1. Half this distance is 0.5. Hence, on the histogram, the upper boundary used for the first class (which is also the lower boundary for the second class) is 18.5. Similarly, the second class is adjusted to be between 18.5 and 30.5, and so on.

Note that for the first class, the starting value is normally adjusted down to maintain an equal
width to the other intervals. In this case, the starting value would be 6.5.
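The boundary adjustment described above, together with the class midpoints used to label a histogram, can be sketched as follows. The class limits are those of the social media frequency table; the variable names are illustrative.

```python
classes = [(7, 18), (19, 30), (31, 42), (43, 54), (55, 66), (67, 78), (79, 90)]

# Gap between the upper limit of one class and the lower limit of the next
gap = classes[1][0] - classes[0][1]
half_gap = gap / 2

# Shift the first lower limit down and every upper limit up by half the gap
boundaries = [classes[0][0] - half_gap] + [hi + half_gap for _, hi in classes]

# Midpoint of each class (the same whether limits or boundaries are used)
midpoints = [(lo + hi) / 2 for lo, hi in classes]

print(boundaries)
print(midpoints)
```

Each pair of neighbouring boundaries is 12 apart, which matches the class width chosen in the worked example.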

55
Histogram of a frequency distribution
The resulting histogram is constructed as follows, with the x-axis labelled with either the class midpoints (left) or class boundaries (right):

[Figure: two histograms titled ‘Social media usage’, one labelled with class midpoints and one with class boundaries]

56
Frequency polygon

A frequency polygon is like a histogram, but instead of bars, the class midpoints are connected using straight lines.

[Figure: frequency polygon titled ‘Social media usage’]

To ensure that the graph begins and ends on the horizontal axis, extend the graph by one class width at each end.

57
Relative frequency histogram

The corresponding relative frequency histogram, obtained by dividing the f values through by n, is as follows:

[Figure: relative frequency histogram titled ‘Social media usage’]

58
Ogive
Cumulative frequency refers to the total frequency value at each upper class boundary (as shown in the table). A plot of cumulative frequency is known as an ogive (see the figure).

[Figure: ogive titled ‘Social media usage’]

Notice that the graph starts at 6.5, where the cumulative frequency is 0, and ends at 90.5, where the cumulative frequency is 50.
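The points that make up the ogive can be listed with a short sketch: each class boundary is paired with the cumulative frequency reached at that boundary. The boundaries and frequencies are those of the social media example; the variable names are illustrative.

```python
from itertools import accumulate

boundaries = [6.5, 18.5, 30.5, 42.5, 54.5, 66.5, 78.5, 90.5]
frequencies = [6, 10, 13, 8, 5, 6, 2]

# Cumulative frequency is 0 at the first boundary and grows class by class
cumulative = [0] + list(accumulate(frequencies))
ogive_points = list(zip(boundaries, cumulative))

for boundary, total in ogive_points:
    print(boundary, total)
```

Joining these points with straight lines gives the ogive.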

59
Class Exercise 3: histograms

60
Topic 5: sampling techniques

61
What is sampling?

Sampling is a technique of selecting individual members or a subset of the population to make statistical inferences from them and estimate characteristics of the whole population.

Proper sampling is a time-saving and cost-effective way to design experiments.

For example, different sampling methods are widely used by researchers in the field of market research so that they do not need to study the entire population to collect actionable insights.

62
Types of sampling

Probability sampling: in this method, a researcher chooses members of a population randomly. Hence, every member has a known, non-zero chance of being part of the sample (an equal chance in the case of simple random sampling).

Non-probability sampling: in this method, individuals are selected based on non-random criteria, and not every individual has a chance of being included.

63
Probability sampling

Probability sampling is mainly used in quantitative research, as it is more likely to produce results that are representative of the entire population.

Different types of probability sampling techniques include:

• Simple random sampling
• Systematic sampling
• Stratified sampling
• Cluster sampling

64
Simple random sampling

Sampling frame includes the entire population.

True randomization (such as a random number generator) is used to select members of
the population.

65
Systematic sampling

Slightly easier to conduct than simple random sampling.

Each member is assigned a number and members are chosen at regular intervals.
Can result in bias or skew if there is a hidden pattern in the list.
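Simple random sampling and systematic sampling can be sketched as follows. The population of 100 numbered members, the sample size of 10, and the fixed seed are hypothetical choices made so that the example is repeatable; Python's `random` module is a pseudo-random generator, which is normally adequate for classroom use.

```python
import random

rng = random.Random(42)                # fixed seed so the run is repeatable
population = list(range(1, 101))       # hypothetical members numbered 1 to 100
sample_size = 10

# Simple random sampling: every member is equally likely to be chosen
simple_random = rng.sample(population, sample_size)

# Systematic sampling: random starting point, then every step-th member
step = len(population) // sample_size
start = rng.randrange(step)
systematic = population[start::step]

print(sorted(simple_random))
print(systematic)
```

If the list has a hidden pattern with the same period as `step`, the systematic sample would pick up only one phase of that pattern, which is the source of bias mentioned above.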

66
Stratified sampling

The population is divided into subgroups (or strata) according to certain criteria (e.g. gender, age range, job role).
Equal numbers are then chosen from each stratum using simple random or systematic sampling.
This can ensure each important category is represented properly.

67
Cluster sampling

The population is divided into subgroups that should all have similar
characteristics to the whole group. The subgroups are then randomly
selected.
If subgroups are very large, these can be further sampled (known as
multistage sampling).
Can lead to errors if the subgroups are not truly representative.
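Stratified and cluster sampling can be sketched in the same style. Everything about the population below is hypothetical: 60 members, two strata labelled "A" and "B", and clusters formed from blocks of 10 consecutive members. The sketch takes equal numbers from each stratum, as described on the stratified sampling slide.

```python
import random

rng = random.Random(0)                 # fixed seed so the run is repeatable

# Hypothetical population of (member_id, stratum) pairs
population = [(i, "A" if i % 3 == 0 else "B") for i in range(60)]

# Stratified sampling: split into strata, then sample randomly within each
strata = {}
for member in population:
    strata.setdefault(member[1], []).append(member)
per_stratum = 5
stratified = [m for group in strata.values()
              for m in rng.sample(group, per_stratum)]

# Cluster sampling: split into clusters, then randomly select whole clusters
cluster_size = 10
clusters = [population[j:j + cluster_size]
            for j in range(0, len(population), cluster_size)]
chosen_clusters = rng.sample(clusters, 2)
cluster_sample = [m for cluster in chosen_clusters for m in cluster]

print(stratified)
print([m[0] for m in cluster_sample])
```

The difference in design is visible in the code: stratified sampling draws a few members from every subgroup, whereas cluster sampling keeps every member of a few randomly chosen subgroups.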

68
Non-probability sampling
Non-probability sampling is mainly used in qualitative or exploratory research, where the goal is to gain an initial understanding of a small or under-researched population.

Can be cheaper and easier to implement than probability sampling but is also more prone to sampling bias which can cause errors.

Different types of non-probability sampling techniques include:

• Convenience sampling
• Voluntary response sampling
• Purposive sampling
• Snowball sampling

69
Convenience sampling

Members of the population are selected based on ease of access (e.g. living in the same city or
country as the researcher).
Cheap and easy to implement, but there is no way to tell if the sample is representative, so
results may not be generalizable.

70
Voluntary response sampling

The sample consists of people who willingly participate (volunteer) in the study. Hence, it is easy
for the researcher to implement.
Likely to result in a degree of sample bias as some people may be inherently more likely than
others to volunteer.

71
Purposive sampling

Researcher uses their prior knowledge or expertise to select the most suitable sample (also
known as judgement sampling).
Cheap and easy to implement, but there is no way to tell if the sample is representative, so
results may not be generalizable.

72
Snowball sampling

If the population is hard to access, snowball sampling can be used to recruit participants via other
participants.
The number of people you have access to grows rapidly (or “snowballs”) as you contact more
people.

73
Class Activity: sampling

Collect the heights of all students in a spreadsheet. Calculate the mean height of the class.

a) Use simple random sampling to select a sample of 5 students. Calculate the mean. How does it differ from the entire class mean?
b) Use simple random sampling to select a sample of 10 students. Calculate the mean. How does it differ from the entire class mean?
c) Use stratified sampling to select samples of 10 boys and 10 girls, respectively. Calculate the sample means. How do they differ from the entire class mean?

74
Problem Set 2

75
Question 1

The concentration of ions in a solution was measured by an operator using the same instrument 10 times. She obtained the following data (ppm):

7.15, 7.20, 7.18, 7.19, 7.21, 7.20, 7.16, 7.18, 7.20, 7.17

(a) Calculate the sample mean, median, and mode
(b) Find the interquartile range
(c) Calculate the variance and standard deviation

76
Question 2
From the following data,
1.09 1.92 2.31 1.79 2.28 1.74 1.47 1.97
0.85 1.24 1.58 2.03 1.7 2.17 2.55 2.11
1.86 1.9 1.68 1.51 1.64 0.72 1.69 1.85
1.82 1.79 2.46 1.88 2.08 1.67 1.37 1.93
1.4 1.64 2.09 1.75 1.63 2.37 1.75 1.69

(a) Find the mean, median, and mode.

(b) Calculate the variance and standard deviation.
(c) Construct a box and whisker plot with outliers.

77
Question 3

Fifty students were asked how much sleep they get per school night, rounded to the nearest hour. The results are shown in Table 1. Based on these data:

(a) Calculate the 28th and 80th percentiles
(b) Construct a relative frequency histogram
(c) Construct an ogive

Table 1. Sleep data for students.

Amount of sleep per school night (hours) Frequency
4 2
5 5
6 7
7 12
8 14
9 7
10 3

78
Question 4

No ratings yet
Advanced Econometrics:Test 1 (30 Marks) Date: October 9, 2017
4 pages
ME-Tut 9
0% (1)
ME-Tut 9
1 page
Name: Mukund N. Purohit Roll No.: 21 Multivariate Analysis: Definition
100% (1)
Name: Mukund N. Purohit Roll No.: 21 Multivariate Analysis: Definition
6 pages
Assignment 4 MA2201
No ratings yet
Assignment 4 MA2201
3 pages
Sentiment Analysis IMDB Review - Presentation
No ratings yet
Sentiment Analysis IMDB Review - Presentation
19 pages
Hypothesis Tests For The Means of Two Populations
No ratings yet
Hypothesis Tests For The Means of Two Populations
21 pages

Measures of Central Tendency Measures of Shape

• To calculate useful statistics, the data collected in an experiment

• Two ways of doing this are ordering and grouping.

• In grouping, data are grouped according to the number of times

• Suppose a group of eight children have the following ages:

• The position of the Pth percentile is given by

• To find the 50th percentile, determine its position:

• Hence, we linearly interpolate between the 10th and 11th observations:

• To find the 80th percentile, determine its position:

• Hence, we linearly interpolate between the 16th and 17th observations:

• To find the 90th percentile, determine its position:

• Thus, the 90th percentile is located between the 18th observation

• Using linear interpolation, we get:

• Quartiles are the specific percentiles which break down the

➢ The 50th percentile is known as the second (or middle) quartile

The table shows 8 observations of pull-off force from engine connectors, in

• The position of the median is

• The median is the same as the second quartile or 50th percentile.

• The mode is the data value with the highest frequency.

The number of cartons (quarts) of milk purchased each week by 25

• Median: M = 2 (the 13th data point)

• Mode: Mo = 2 (the number 2 appears 9 times)
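The median and mode quoted for the milk example can be reproduced with a short Python sketch. The raw data are not listed here, so the 25 values below are hypothetical, chosen only so that the value 2 occurs 9 times and sits at the 13th ordered position; the variable names are likewise illustrative.

```python
import statistics

# Hypothetical weekly purchases for 25 families (NOT the lecture's data):
# built so that 2 occurs 9 times and is the 13th ordered value.
cartons = [0] * 5 + [1] * 5 + [2] * 9 + [3] * 3 + [4] * 2 + [5] * 1

n = len(cartons)                            # 25 observations
median_position = (n + 1) / 2               # position of the median in the ordered data
median_value = statistics.median(cartons)   # middle value of the ordered data
mode_value = statistics.mode(cartons)       # most frequent value
mode_count = cartons.count(mode_value)      # how often the mode occurs
mean_value = statistics.mean(cartons)       # arithmetic mean

print(median_position, median_value, mode_value, mode_count, mean_value)
```

With an odd number of observations the median is a single data point, so no averaging of two middle values is needed.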

• Range refers to the difference between the maximum and

• Interquartile range (IQR) refers to the difference between the

• Variance refers to the average of the squared deviations from

• Standard deviation refers to the square root of the variance

• Note that the population range is always ≥ the sample range.

For the data set shown in Worked Example 1,

Range = Maximum – Minimum

Third quartile: position = (20 + 1) × 75/100 = 15.75; value = 27 + (0.75)(5) = 30.75
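The third-quartile calculation above uses the position rule (n + 1) × P/100 followed by linear interpolation between the two neighbouring observations. A minimal Python sketch of that rule is given below; the function name `percentile` and the sample values are assumptions for illustration, not the data of Worked Example 1.

```python
def percentile(data, p):
    """Pth percentile via position (n + 1) * p / 100 and linear interpolation."""
    values = sorted(data)
    n = len(values)
    position = (n + 1) * p / 100      # 1-based position, possibly fractional
    lower = int(position)             # rank of the observation just below
    fraction = position - lower       # how far to move towards the next one
    if lower < 1:                     # position falls before the first observation
        return values[0]
    if lower >= n:                    # position falls at or after the last observation
        return values[-1]
    below, above = values[lower - 1], values[lower]
    return below + fraction * (above - below)


# Hypothetical sample of 20 observations: 1, 2, ..., 20
sample = list(range(1, 21))
q2 = percentile(sample, 50)   # position 10.5, between the 10th and 11th values
q3 = percentile(sample, 75)   # position 15.75, between the 15th and 16th values
print(q2, q3)
```

Note that library routines may use a different position rule by default, so their results can differ slightly from this hand calculation.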

If the $n$ observations in a sample are denoted by $x_1, x_2, \ldots, x_n$, the

For the $N$ observations in a population denoted by $x_1, x_2, \ldots, x_N$, the

Hence, $s^2 = 0.23~\mathrm{N}^2$ and $s = 0.48~\mathrm{N}$ (note: the desired accuracy is
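The distinction between the sample and population formulas can be sketched in Python. The code assumes the standard definitions (divisor n − 1 for the sample variance s², divisor N for the population variance σ²); the function names and the force values are hypothetical and are not the pull-off force data from the table.

```python
import math


def sample_variance(xs):
    """Sample variance s^2: squared deviations from the mean, divided by n - 1."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


def population_variance(xs):
    """Population variance sigma^2: squared deviations from the mean, divided by N."""
    N = len(xs)
    mean = sum(xs) / N
    return sum((x - mean) ** 2 for x in xs) / N


# Hypothetical observations in newtons (illustrative only)
forces = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

s2 = sample_variance(forces)          # units: N^2
s = math.sqrt(s2)                     # sample standard deviation, units: N
sigma2 = population_variance(forces)  # units: N^2
print(s2, s, sigma2)
```

The variance carries squared units, which is why the standard deviation is usually the quantity reported alongside the mean.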

• Skewness is a measure of the degree of asymmetry of a

• Kurtosis is a measure of the flatness or peakedness of a

Negatively skewed: Mean < Median < Mode. Symmetric: Mean = Median = Mode. Positively skewed: Mean > Median > Mode.

• In science and engineering, the

• Here, we will look at some common

• Pie chart: categories are represented as percentages of the total.

• Bar chart: categories are represented as rectangles (‘bars’) with

• Frequency polygon: line graph of frequency

• Ogive: line graph of cumulative frequency

• Time series plot: shows how values change over time

Figure: Student Cola Preference From Survey. Categories shown include Coca Cola (88), Pepsi, Bloxy Cola, Mecca Cola, and Zam Zam Cola (47).

Company sales: (a) yearly and (b) quarterly.

• Five specific values are used: the median (Q2), the

• Sometimes, instead of showing the maximum and minimum values, the

Table 1. Compressive strength (psi) of aluminum-lithium specimens.

105 221 183 186 121 181 180 143

Q1 = 143.5, Q2 = 161.5, Q3 = 181, IQR = 37.5

Step 2: Construct the ‘box’ (can be either horizontal or vertical)

Step 3: Find the upper and lower limits for outliers.

The upper limit is given by: Q3 + 1.5 × IQR = 181 + 1.5 × 37.5 = 237.25

Step 5: Mark the outlier values and label the plot.
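The outlier limits in Step 3 can be checked numerically. The Python sketch below uses the quartiles quoted above (Q1 = 143.5, Q3 = 181); the lower limit Q1 − 1.5 × IQR and the helper name `is_outlier` are assumptions based on the standard 1.5 × IQR rule, since only the upper limit is shown here.

```python
# Quartiles quoted for the compressive strength data (psi)
q1 = 143.5
q3 = 181.0

iqr = q3 - q1                    # interquartile range
lower_limit = q1 - 1.5 * iqr     # assumed symmetric rule for the lower limit
upper_limit = q3 + 1.5 * iqr     # matches the upper limit computed above


def is_outlier(value):
    """True if the value lies outside the 1.5 * IQR limits."""
    return value < lower_limit or value > upper_limit


print(iqr, lower_limit, upper_limit)
```

Any observation for which `is_outlier` returns True would be drawn as an individual point beyond the whiskers.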

• Unlike a bar chart, there are no

• When the number of observations is large, it is useful to partition

• A frequency distribution is a table that shows classes or intervals of data entries

• The boundaries of the intervals should be convenient values (e.g. whole

Figure: Social media usage

To ensure that the graph

• The corresponding relative frequency histogram, obtained by dividing the f

Figure: Social media usage

• Sampling is a technique of selecting individual members or a subset of the

• Proper sampling is a time-saving and cost-effective way to design experiments.

Probability sampling: in this method, a researcher chooses members of a

Non-probability sampling: in this method, individuals are selected based on non-

Probability sampling is mainly used in quantitative research, as it is more likely to

Different types of probability sampling techniques include:

Sampling frame includes the entire population.

Slightly easier to conduct than simple random sampling.

The population is divided into subgroups (or strata) according to certain
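Two of the probability sampling techniques named above, simple random sampling and stratified sampling, can be sketched in Python. The population, the two strata, the sample sizes, and the fixed seed are all hypothetical choices made for illustration.

```python
import random

rng = random.Random(0)                 # fixed seed so the sketch is repeatable

# Hypothetical sampling frame: 100 units labelled 1..100
population = list(range(1, 101))

# Simple random sampling: every unit in the frame is equally likely to be chosen
simple_sample = rng.sample(population, 10)

# Stratified sampling: split the population into strata, then draw a simple
# random sample from each stratum (here, two equal-sized hypothetical strata)
strata = {
    "group_a": population[:50],
    "group_b": population[50:],
}
stratified_sample = {
    name: rng.sample(units, 5) for name, units in strata.items()
}

print(simple_sample)
print(stratified_sample)
```

Drawing the same number of units from each stratum is only one option; the allocation could also be made proportional to stratum size.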

Different types of non-probability sampling techniques include: