MATH103 M2 Data Presentation
Data Presentation
Distribution Tables
Graphical Methods
Frequency Histogram
Frequency Polygon
Cumulative Frequency Polygon or Ogive
Stem and Leaf Plot
Data Presentation
Numerical Measures
Frequency Distribution Table
Distribution Tables
A tabular presentation of grouped data that lists, for each class, the lower and upper class limits (LCL, UCL), the lower and upper class boundaries (LCB, UCB), and the class mark (CM).
Distribution Tables
Frequency Distribution
A distribution of the total number of observations over arbitrarily defined classes, cells, or categories.
Frequency
The number of times a particular data point occurs in the set of data.
$$\sum_{i=1}^{k} f_i = n$$
Frequency Distribution Table
LCB      LCL      CM       UCL      UCB      Frequency
LCB_1    LCL_1    Cell 1   UCL_1    UCB_1    f_1
⋮        ⋮        ⋮        ⋮        ⋮        ⋮
LCB_k    LCL_k    Cell k   UCL_k    UCB_k    f_k
Distribution Tables
Relative Frequency Distribution
A distribution showing the relative frequencies over arbitrarily defined classes.
Relative Frequency
Ratio of the class frequency to the total number of observations. For example, for the second class:
$$rf_2 = \frac{f_2}{n}$$
$$\sum_{i=1}^{k} rf_i = 1$$
Relative Frequency Distribution Table
LCB      LCL      CM       UCL      UCB      Relative Frequency
LCB_1    LCL_1    Cell 1   UCL_1    UCB_1    rf_1
⋮        ⋮        ⋮        ⋮        ⋮        ⋮
LCB_k    LCL_k    Cell k   UCL_k    UCB_k    rf_k
Distribution Tables
Cumulative Distribution
A distribution showing the cumulative frequency corresponding to a class boundary.
Cumulative Frequency
The accumulated frequency that is less than or equal to the stated value or class boundary. For example, for the third class:
$$cf_3 = f_3 + f_2 + f_1$$
$$cf_k = n$$
Cumulative Frequency Distribution Table
LCB      LCL      CM       UCL      UCB      Cumulative Frequency
LCB_1    LCL_1    Cell 1   UCL_1    UCB_1    cf_1
⋮        ⋮        ⋮        ⋮        ⋮        ⋮
LCB_k    LCL_k    Cell k   UCL_k    UCB_k    cf_k
Distribution Tables
Relative Cumulative Frequency Distribution
A distribution showing the relative cumulative frequency corresponding to a class boundary.
$$rcf_k = 1$$
Relative Cumulative Frequency Distribution Table
LCB      LCL      CM       UCL      UCB      Relative Cumulative Frequency
LCB_1    LCL_1    Cell 1   UCL_1    UCB_1    rcf_1
⋮        ⋮        ⋮        ⋮        ⋮        ⋮
LCB_k    LCL_k    Cell k   UCL_k    UCB_k    rcf_k
Sample Problem
98.6 98.6 98.0 98.0 99.0 98.4 97.7
98.8 98.5 98.6 97.3 97.0 98.7 97.0
97.4 98.8 98.9 97.6 98.6 98.8 98.0
99.6 98.7 99.4 98.2 98.0 98.6 98.0
98.0 97.8 98.0 98.4 98.6 98.6 98.3
96.9 97.6 97.1 97.9 98.4 97.3 99.5
98.8 98.7 97.8 98.0 97.1 97.4 97.5
98.6 98.3 98.7 98.8 99.1 98.6 97.3
98.9 98.4 98.6 97.1 97.9 98.8 97.6
98.0 98.4 97.8 98.4 97.4 98.0 98.2
99.0 96.5 97.6 98.0 97.8 97.6 98.6
97.5 97.6 98.2 98.5 98.0 98.2 97.2
98.4 98.6 98.4 98.5 99.4 99.2 98.4
98.8 98.0 98.7 98.5 97.9 97.8 98.6
97.0 98.7 98.4 98.4 98.4 98.6 98.6
98.2
The possible values from 96.5 to 99.6, in steps of 0.1, are divided into eight cells:
Cell 1   96.5 96.6 96.7 96.8
Cell 2   96.9 97.0 97.1 97.2
Cell 3   97.3 97.4 97.5 97.6
Cell 4   97.7 97.8 97.9 98.0
Cell 5   98.1 98.2 98.3 98.4
Cell 6   98.5 98.6 98.7 98.8
Cell 7   98.9 99.0 99.1 99.2
Cell 8   99.3 99.4 99.5 99.6
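The class limits, boundaries, and marks for the eight cells can be generated rather than typed by hand. The sketch below is only an illustration: the function name `class_table` and its parameters are hypothetical, and it assumes a class size of 0.4 and a measurement unit of 0.1, which is what the cell layout suggests. `Decimal` is used so that values such as 96.45 come out exact.

```python
from decimal import Decimal

def class_table(first_lcl, class_size, unit, k):
    """Return k rows of (LCB, LCL, CM, UCL, UCB) as Decimal values.

    first_lcl  : lower class limit of the first class
    class_size : distance between consecutive lower class limits
    unit       : smallest step between recorded values (assumed 0.1 here)
    """
    rows = []
    for i in range(k):
        lcl = first_lcl + i * class_size   # lower class limit
        ucl = lcl + class_size - unit      # upper class limit
        lcb = lcl - unit / 2               # lower class boundary
        ucb = ucl + unit / 2               # upper class boundary
        cm = (lcl + ucl) / 2               # class mark (midpoint)
        rows.append((lcb, lcl, cm, ucl, ucb))
    return rows

rows = class_table(Decimal("96.5"), Decimal("0.4"), Decimal("0.1"), 8)
for row in rows:
    print(*row)
```

Each printed row is one line of the LCB, LCL, CM, UCL, UCB columns used in the distribution tables.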
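Sample Problem
Frequency Distribution Table
LCB     LCL    CM      UCL    UCB     Frequency
96.45   96.5   96.65   96.8   96.85   1
96.85   96.9   97.05   97.2   97.25   8
97.25   97.3   97.45   97.6   97.65   14
97.65   97.7   97.85   98.0   98.05   22
98.05   98.1   98.25   98.4   98.45   19
98.45   98.5   98.65   98.8   98.85   32
98.85   98.9   99.05   99.2   99.25   6
99.25   99.3   99.45   99.6   99.65   4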
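As a rough check on the class frequencies, the 106 sample values can be tallied into the eight classes programmatically. This is a minimal sketch, not the course's prescribed procedure: the helper `tally` and its default arguments are hypothetical, and the values are converted to whole tenths so that floating-point rounding cannot push a value into the wrong class.

```python
# The 106 observations of the sample problem, as listed in the slide.
DATA = """
98.6 98.6 98.0 98.0 99.0 98.4 97.7
98.8 98.5 98.6 97.3 97.0 98.7 97.0
97.4 98.8 98.9 97.6 98.6 98.8 98.0
99.6 98.7 99.4 98.2 98.0 98.6 98.0
98.0 97.8 98.0 98.4 98.6 98.6 98.3
96.9 97.6 97.1 97.9 98.4 97.3 99.5
98.8 98.7 97.8 98.0 97.1 97.4 97.5
98.6 98.3 98.7 98.8 99.1 98.6 97.3
98.9 98.4 98.6 97.1 97.9 98.8 97.6
98.0 98.4 97.8 98.4 97.4 98.0 98.2
99.0 96.5 97.6 98.0 97.8 97.6 98.6
97.5 97.6 98.2 98.5 98.0 98.2 97.2
98.4 98.6 98.4 98.5 99.4 99.2 98.4
98.8 98.0 98.7 98.5 97.9 97.8 98.6
97.0 98.7 98.4 98.4 98.4 98.6 98.6
98.2
""".split()

def tally(values, first_lcl_tenths=965, class_size_tenths=4, k=8):
    """Count how many values fall in each of k equal-width classes."""
    counts = [0] * k
    for v in values:
        tenths = round(float(v) * 10)   # 98.6 -> 986
        counts[(tenths - first_lcl_tenths) // class_size_tenths] += 1
    return counts

frequencies = tally(DATA)
print(frequencies, sum(frequencies))
```

The class frequencies should add up to the number of observations, which is the check stated by the sum of f_i equal to n.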
Sample Problem
Relative Frequency Distribution Table
LCB     LCL    CM      UCL    UCB     Relative Frequency
96.45   96.5   96.65   96.8   96.85   0.009433962
96.85   96.9   97.05   97.2   97.25   0.075471698
97.25   97.3   97.45   97.6   97.65   0.132075472
97.65   97.7   97.85   98.0   98.05   0.20754717
98.05   98.1   98.25   98.4   98.45   0.179245283
98.45   98.5   98.65   98.8   98.85   0.301886792
98.85   98.9   99.05   99.2   99.25   0.056603774
99.25   99.3   99.45   99.6   99.65   0.037735849
Sample Problem
Cumulative Frequency Distribution Table
LCB     LCL    CM      UCL    UCB     Cumulative Frequency
96.45   96.5   96.65   96.8   96.85   1
96.85   96.9   97.05   97.2   97.25   9
97.25   97.3   97.45   97.6   97.65   23
97.65   97.7   97.85   98.0   98.05   45
98.05   98.1   98.25   98.4   98.45   64
98.45   98.5   98.65   98.8   98.85   96
98.85   98.9   99.05   99.2   99.25   102
99.25   99.3   99.45   99.6   99.65   106
Sample Problem
Relative Cumulative Frequency Distribution Table
LCB     LCL    CM      UCL    UCB     Relative Cumulative Frequency
96.45   96.5   96.65   96.8   96.85   0.009433962
96.85   96.9   97.05   97.2   97.25   0.08490566
97.25   97.3   97.45   97.6   97.65   0.216981132
97.65   97.7   97.85   98.0   98.05   0.424528302
98.05   98.1   98.25   98.4   98.45   0.603773585
98.45   98.5   98.65   98.8   98.85   0.905660377
98.85   98.9   99.05   99.2   99.25   0.962264151
99.25   99.3   99.45   99.6   99.65   1
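The relative, cumulative, and relative cumulative columns all follow from the class frequencies alone. The sketch below assumes the frequencies 1, 8, 14, 22, 19, 32, 6, 4, obtained by differencing the cumulative frequencies of the sample problem (1, 9, 23, 45, 64, 96, 102, 106); the variable names are illustrative.

```python
from itertools import accumulate

# Class frequencies, recovered as differences of the cumulative frequencies.
frequencies = [1, 8, 14, 22, 19, 32, 6, 4]
n = sum(frequencies)                      # total number of observations

rf = [f / n for f in frequencies]         # relative frequency: f_i / n
cf = list(accumulate(frequencies))        # cumulative frequency: f_1 + ... + f_i
rcf = [c / n for c in cf]                 # relative cumulative frequency: cf_i / n

for row in zip(frequencies, rf, cf, rcf):
    print("{:3d}  {:.9f}  {:3d}  {:.9f}".format(*row))
```

The last cumulative frequency equals n and the last relative cumulative frequency equals 1, matching the identities given with the definitions.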
Frequency Histogram
Frequency Polygon
Pie Chart
Scatterplot
Graphical Methods
Frequency Histogram
[Figure: frequency histogram. x-axis: classes 96.5-96.8, 96.9-97.2, 97.3-97.6, 97.7-98, 98.1-98.4, 98.5-98.8, 98.9-99.2, 99.3-99.6. y-axis: Frequency, 0 to 35.]
Graphical Methods
Frequency Polygon
Line segments connect the points that plot each class frequency against its class mark, across all classes.
[Figure: frequency polygon. x-axis: class marks 96.65, 97.05, 97.45, 97.85, 98.25, 98.65, 99.05, 99.45. y-axis: Frequency, 0 to 35.]
Graphical Methods
Cumulative Frequency Polygon
A line graph that is based on the cumulative frequency distribution of grouped data.
[Figure: cumulative frequency polygon. y-axis: Cumulative Frequency, up to 120.]
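The points that a frequency polygon and a cumulative frequency polygon connect can be listed before any plotting is done. The following is a sketch using the sample problem's class marks, upper class boundaries, and frequencies; starting the ogive at the first lower class boundary with a cumulative frequency of 0 is a common convention assumed here, not something the slides state.

```python
from itertools import accumulate

class_marks = [96.65, 97.05, 97.45, 97.85, 98.25, 98.65, 99.05, 99.45]
upper_boundaries = [96.85, 97.25, 97.65, 98.05, 98.45, 98.85, 99.25, 99.65]
frequencies = [1, 8, 14, 22, 19, 32, 6, 4]

# Frequency polygon: one point per class at (class mark, frequency).
polygon_points = list(zip(class_marks, frequencies))

# Ogive: one point per class at (upper class boundary, cumulative frequency),
# preceded by (first lower class boundary, 0) -- an assumed starting point.
ogive_points = [(96.45, 0)] + list(zip(upper_boundaries, accumulate(frequencies)))

print(polygon_points)
print(ogive_points)
```

Either list can be handed to any plotting tool as x and y coordinates joined by straight line segments.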
Graphical Methods
Stem and Leaf Plot
A tabular method that splits each data value by place value into a stem and a leaf.
Used to tally the frequency of ungrouped data.
Stem   Leaf                                                                  Frequency
96     59                                                                    2
97     000111233344455666666788888999                                        30
98     000000000000022222334444444444445555666666666666666777777888888899    66
99     00124456                                                              8
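A stem and leaf plot for one-decimal data such as the sample problem can be sketched as follows. The function name `stem_and_leaf` is hypothetical, and the choice of the whole-number part as stem and the tenths digit as leaf mirrors the table in the slide.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (whole-number part) to its sorted leaves (tenths digit)."""
    plot = defaultdict(list)
    for v in sorted(values):
        tenths = round(v * 10)            # 98.6 -> 986
        stem, leaf = divmod(tenths, 10)   # 986 -> (98, 6)
        plot[stem].append(leaf)
    return dict(plot)

# A few values taken from the sample problem, for illustration only.
sample = [96.5, 96.9, 97.0, 98.6, 98.6, 99.4]
for stem, leaves in stem_and_leaf(sample).items():
    print(stem, "".join(map(str, leaves)), len(leaves))
```

Each printed line gives the stem, its leaves, and the frequency for that stem.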
Graphical Methods
Pie Chart
[Figure: pie chart titled "Frequency".]
Graphical Methods
Scatterplot
[Figure: scatterplot titled "Frequency".]
Measures of Central Tendency
Measures of Position
Numerical Measures
Measures of Central Tendency
Arithmetic Mean
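The center of gravity of the data. It works best if the data are distributed evenly across the range.
For ungrouped data:
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
For grouped data:
$$\bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{n}$$
Where:
$x_i$   Observation (ungrouped data) or class mark (grouped data)
$f_i$   Frequency of the ith class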
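Both forms of the arithmetic mean translate directly into code. This is a minimal sketch with hypothetical function names; the grouped example uses the class marks of the sample problem with the class frequencies implied by its cumulative frequency table.

```python
def mean_ungrouped(xs):
    """Sum of the observations divided by their count."""
    return sum(xs) / len(xs)

def mean_grouped(class_marks, frequencies):
    """Sum of f_i * x_i over all classes, divided by n = sum of f_i."""
    n = sum(frequencies)
    return sum(f * x for f, x in zip(frequencies, class_marks)) / n

class_marks = [96.65, 97.05, 97.45, 97.85, 98.25, 98.65, 99.05, 99.45]
frequencies = [1, 8, 14, 22, 19, 32, 6, 4]

print(mean_ungrouped([98.6, 98.0, 99.0]))
print(mean_grouped(class_marks, frequencies))
```

The grouped mean treats every observation in a class as if it sat at the class mark, so it generally differs slightly from the mean of the raw values.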
Numerical Measures
Measures of Central Tendency
Median
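Divides the data into two equal parts.
A better measure of centrality than the mean if the data are skewed, meaning lopsided.
For ungrouped data, sorted in ascending order:
$$\tilde{x} = x_{\frac{n+1}{2}} \quad (n \text{ odd}), \qquad \tilde{x} = \frac{x_{\frac{n}{2}} + x_{\frac{n}{2}+1}}{2} \quad (n \text{ even})$$
For grouped data:
$$\tilde{x} = LCB_{median} + \frac{\frac{n}{2} - \sum f_{lower\ classes}}{f_{median}} \times c$$
Where:
$c$   Class size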
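A short sketch of both median formulas follows. The function names are hypothetical, and the median class is taken to be the first class whose cumulative frequency reaches n/2, which is the usual reading of the grouped formula. The grouped example uses the sample problem's lower class boundaries, class frequencies, and a class size of 0.4.

```python
def median_ungrouped(xs):
    """Middle value for odd n, mean of the two middle values for even n."""
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2

def median_grouped(lcbs, frequencies, c):
    """LCB_median + (n/2 - sum of lower-class frequencies) / f_median * c."""
    n = sum(frequencies)
    below = 0                              # frequencies of the lower classes
    for lcb, f in zip(lcbs, frequencies):
        if f > 0 and below + f >= n / 2:   # first class reaching n/2
            return lcb + (n / 2 - below) / f * c
        below += f
    raise ValueError("no observations")

lcbs = [96.45, 96.85, 97.25, 97.65, 98.05, 98.45, 98.85, 99.25]
frequencies = [1, 8, 14, 22, 19, 32, 6, 4]
print(median_ungrouped([4, 1, 3, 2]))
print(median_grouped(lcbs, frequencies, 0.4))
```

For the sample problem, n/2 = 53 falls in the fifth class, whose lower boundary is 98.05 and whose frequency is 19, with 45 observations in the classes beneath it.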
Numerical Measures
Measures of Central Tendency
Mode
Most frequently observed value of the variable.
Useful when differences are rare or when the differences are non-numerical.
For ungrouped data:
$\hat{x}$ = most observed value
For grouped data:
$$\hat{x} = LCB_{modal} + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times c$$
Where:
$\Delta_1$   Frequency of the modal class minus frequency of the next lower class
$\Delta_2$   Frequency of the modal class minus frequency of the next higher class
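The grouped mode formula can be sketched as below. The function name `mode_grouped` is hypothetical; treating a missing neighbour class as having frequency 0, and picking the first class when several share the highest frequency, are assumptions made for this illustration. For ungrouped data the standard library's `statistics.mode` returns the most observed value.

```python
import statistics

def mode_grouped(lcbs, frequencies, c):
    """LCB_modal + d1 / (d1 + d2) * c for the class with the highest frequency."""
    m = max(range(len(frequencies)), key=frequencies.__getitem__)   # modal class index
    f_lower = frequencies[m - 1] if m > 0 else 0                    # next lower class
    f_higher = frequencies[m + 1] if m + 1 < len(frequencies) else 0  # next higher class
    d1 = frequencies[m] - f_lower
    d2 = frequencies[m] - f_higher
    return lcbs[m] + d1 / (d1 + d2) * c

lcbs = [96.45, 96.85, 97.25, 97.65, 98.05, 98.45, 98.85, 99.25]
frequencies = [1, 8, 14, 22, 19, 32, 6, 4]

print(statistics.mode([98.6, 98.0, 98.6]))   # ungrouped: most observed value
print(mode_grouped(lcbs, frequencies, 0.4))  # grouped
```

In the sample problem the modal class is the sixth one (frequency 32), so the two differences are 32 - 19 = 13 and 32 - 6 = 26.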
Numerical Measures
Measures of Variation
Range
Measure of dispersion or variation defined as the difference between the
largest and smallest data values.
It is a crude indication of the spread of the scores.
For ungrouped data:
𝑅 = 𝑥𝑚𝑎𝑥 − 𝑥𝑚𝑖𝑛
For grouped data:
Numerical Measures
Measures of Variation
Mean Deviation
The average of the absolute differences between the observations and the mean.
For ungrouped data:
$$MD = \frac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n}$$
Where:
$\bar{x}$   Mean
$n$   Number of observations
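The range and the mean deviation for ungrouped data are simple enough to sketch in a few lines. The function names and the four data values are hypothetical and chosen so the arithmetic is easy to follow by hand.

```python
def data_range(xs):
    """Largest value minus smallest value."""
    return max(xs) - min(xs)

def mean_deviation(xs):
    """Average absolute difference between each observation and the mean."""
    mean = sum(xs) / len(xs)
    return sum(abs(x - mean) for x in xs) / len(xs)

values = [2, 4, 6, 8]          # hypothetical data; the mean is 5
print(data_range(values))      # 8 - 2
print(mean_deviation(values))  # (3 + 1 + 1 + 3) / 4
```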
Numerical Measures
Measures of Variation
Standard Deviation
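The dispersion of data, defined as the square root of the variance.
For ungrouped data:
$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$
For grouped data:
$$s = \sqrt{\frac{\sum_{i=1}^{k} f_i (x_i - \bar{x})^2}{n - 1}}$$
Where:
$x_i$   Observation (ungrouped data) or class mark (grouped data)
$\bar{x}$   Mean
$n$   Number of observations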
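The two standard deviation formulas can be sketched as follows, with hypothetical function names. As a sanity check, the grouped result is compared with `statistics.stdev` applied to a list in which each class mark is repeated as many times as its frequency; the two should agree, because the grouped formula is the ungrouped one with repeated values collected.

```python
import math
import statistics

def stdev_ungrouped(xs):
    """Square root of the sum of squared deviations divided by n - 1."""
    n = len(xs)
    mean = sum(xs) / n
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))

def stdev_grouped(class_marks, frequencies):
    """Same formula, with each squared deviation weighted by its class frequency."""
    n = sum(frequencies)
    mean = sum(f * x for f, x in zip(frequencies, class_marks)) / n
    ss = sum(f * (x - mean) ** 2 for f, x in zip(frequencies, class_marks))
    return math.sqrt(ss / (n - 1))

class_marks = [96.65, 97.05, 97.45, 97.85, 98.25, 98.65, 99.05, 99.45]
frequencies = [1, 8, 14, 22, 19, 32, 6, 4]

s_grouped = stdev_grouped(class_marks, frequencies)
expanded = [x for x, f in zip(class_marks, frequencies) for _ in range(f)]
print(s_grouped, statistics.stdev(expanded))
```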
Numerical Measures
Further Description of Data
Skewness
Degree of asymmetry, or departure from symmetry, of a distribution.
A distribution, or data set, is symmetric if it looks the same to the left and right of the center point.
Image from: http://www.janzengroup.net/stats/lessons/descriptive.html
Numerical Measures
Further Description of Data
Skewness
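For ungrouped data:
$$a_3 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^3}{(n - 1)\, s^3}$$
For grouped data:
$$a_3 = \frac{\sum_{i=1}^{k} f_i (x_i - \bar{x})^3}{(n - 1)\, s^3}$$
Where:
$x_i$   Observation (ungrouped data) or class mark (grouped data)
$\bar{x}$   Mean
$n$   Number of observations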
Numerical Measures
Further Description of Data
Kurtosis
Degree of peakedness of a unimodal distribution.
$a_4 < 3$: flatter than the normal curve. $a_4 \approx 3$: comparable to the normal curve. $a_4 > 3$: more peaked than the normal curve.
Image from: http://www.janzengroup.net/stats/lessons/descriptive.html
Numerical Measures
Further Description of Data
Kurtosis
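For ungrouped data:
$$a_4 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^4}{(n - 1)\, s^4}$$
For grouped data:
$$a_4 = \frac{\sum_{i=1}^{k} f_i (x_i - \bar{x})^4}{(n - 1)\, s^4}$$
Where:
$x_i$   Observation (ungrouped data) or class mark (grouped data)
$\bar{x}$   Mean
$n$   Number of observations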
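The skewness and kurtosis coefficients for ungrouped data share one pattern: the sum of the r-th powers of the deviations, divided by (n - 1) times s to the power r, with r = 3 for skewness and r = 4 for kurtosis. The sketch below follows that definition with the n - 1 divisor; the function name `moment_coefficient` and the data are hypothetical. Statistical packages often use other divisors, so their results can differ.

```python
import math

def moment_coefficient(xs, r):
    """Sum of (x - mean)**r divided by (n - 1) * s**r."""
    n = len(xs)
    mean = sum(xs) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))   # sample standard deviation
    return sum((x - mean) ** r for x in xs) / ((n - 1) * s ** r)

symmetric = [1, 2, 3, 4, 5]       # hypothetical, symmetric about 3
right_tailed = [1, 1, 1, 2, 10]   # hypothetical, long right tail

print(moment_coefficient(symmetric, 3))     # skewness of symmetric data
print(moment_coefficient(right_tailed, 3))  # positive for a long right tail
print(moment_coefficient(symmetric, 4))     # kurtosis
```

A symmetric data set gives a skewness of zero because the positive and negative cubed deviations cancel.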
Numerical Measures
Measures of Position
Quartile
Identifies the position of data in quarters of the total observations.
$$Q_j = LCB_{Q_j} + \frac{\frac{j}{4} n - \sum f_{lower\ classes}}{f_{Q_j}} \times c$$
Where:
$Q_j$   jth quartile
$n$   Total number of observations
$f_{Q_j}$   Frequency of the quartile class
$c$   Class size
Numerical Measures
Measures of Position
Decile
Identifies the position of data in tenths of the total observations.
$$D_j = LCB_{D_j} + \frac{\frac{j}{10} n - \sum f_{lower\ classes}}{f_{D_j}} \times c$$
Where:
$D_j$   jth decile
$n$   Total number of observations in the sample
$f_{D_j}$   Frequency of the decile class
$c$   Class size
Numerical Measures
Measures of Position
Percentile
Identifies the position of data in hundredths of the total observations.
$$P_j = LCB_{P_j} + \frac{\frac{j}{100} n - \sum f_{lower\ classes}}{f_{P_j}} \times c$$
Where:
$P_j$   jth percentile
$n$   Total number of observations in the sample
$f_{P_j}$   Frequency of the percentile class
$c$   Class size
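Quartiles, deciles, and percentiles for grouped data use the same formula with a different divisor (4, 10, or 100), so a single function can cover all three. This is a sketch under stated assumptions: the function name `grouped_quantile` is hypothetical, and the quantile class is taken to be the first class whose cumulative frequency reaches (j / parts) × n. The inputs are the sample problem's lower class boundaries, class frequencies, and class size of 0.4.

```python
def grouped_quantile(lcbs, frequencies, c, j, parts):
    """j-th quantile when the data are split into `parts` equal parts.

    parts = 4 gives quartiles, 10 gives deciles, 100 gives percentiles.
    """
    n = sum(frequencies)
    target = j / parts * n
    below = 0                                  # frequencies of the lower classes
    for lcb, f in zip(lcbs, frequencies):
        if f > 0 and below + f >= target:      # first class reaching the target
            return lcb + (target - below) / f * c
        below += f
    raise ValueError("j must not exceed parts")

lcbs = [96.45, 96.85, 97.25, 97.65, 98.05, 98.45, 98.85, 99.25]
frequencies = [1, 8, 14, 22, 19, 32, 6, 4]

q1 = grouped_quantile(lcbs, frequencies, 0.4, 1, 4)
q3 = grouped_quantile(lcbs, frequencies, 0.4, 3, 4)
d9 = grouped_quantile(lcbs, frequencies, 0.4, 9, 10)
p50 = grouped_quantile(lcbs, frequencies, 0.4, 50, 100)
print(q1, q3, d9, p50)
```

The 50th percentile computed this way coincides with the grouped median, since both use n/2 as the target position.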
Summary Measures & Decisions
Empirical Rule
Standardized values and Z-scores
Box and whisker plot
Summary Measures & Decisions
Empirical Rule
An outlier is any data value below $\bar{x} - 3s$ or above $\bar{x} + 3s$.
Image from: https://es.khanacademy.org/math/probability/normal-distributions-a2/normal-distributions-a2ii/a/basic-normal-
Summary Measures & Decisions
Empirical Rule
An outlier is any data value below $\bar{x} - 3s$ or above $\bar{x} + 3s$.
[Figure: histogram of the sample data. x-axis: 96.4 to 100 in steps of 0.1. y-axis: 0 to 16.]
Summary Measures & Decisions
Standardized values and Z-scores
Similar to the Empirical Rule, but stated in standardized units: outliers are any data with $z < -3$ or $z > 3$.
Z-score:
$$z = \frac{x - \bar{x}}{s}$$
Image from: https://heartbeat.fritz.ai/how-to-make-your-machine-learning-models-robust-to-outliers-44d404067d07
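The z-score rule can be sketched with the standard library's `statistics.mean` and `statistics.stdev` (the sample standard deviation, with the n - 1 divisor). The function names and the readings are hypothetical: twenty values of 98.0 and one of 104.0, chosen so that exactly one value lies more than three standard deviations from the mean.

```python
import statistics

def z_scores(xs):
    """Standardize each value: (x - mean) / s."""
    mean = statistics.mean(xs)
    s = statistics.stdev(xs)
    return [(x - mean) / s for x in xs]

def z_outliers(xs, limit=3.0):
    """Values whose z-score is below -limit or above +limit."""
    return [x for x, z in zip(xs, z_scores(xs)) if abs(z) > limit]

readings = [98.0] * 20 + [104.0]   # hypothetical data with one extreme value
print(z_outliers(readings))
```

Because the mean and standard deviation are themselves pulled by extreme values, a small data set can never produce a very large z-score, which is one reason the rule is usually applied to larger samples.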
Summary Measures & Decisions
Standardized values and Z-scores
Similar to the Empirical Rule, but outliers are any data with $z < -3$ or $z > 3$.
[Figure: distribution plotted against the z-score. x-axis: z from -3 to 3. y-axis: 0 to 0.7.]
Summary Measures & Decisions
Box and whisker plot
Displays the distribution of data based on the quartile division of the data.
Image from: http://www.webquest.hawaii.edu/kahihi/mathdictionary/B/boxwhiskerplot.php
Summary Measures & Decisions
Boxplots
Outliers are any data below $Q_1 - 1.5\,IQR$ or above $Q_3 + 1.5\,IQR$, where
$$IQR = Q_3 - Q_1$$
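The boxplot outlier rule can be sketched as follows. Note the swap: the quartiles here come from the standard library's `statistics.quantiles` (its default "exclusive" method on raw data), not from the grouped-data quartile formula, so the cut points can differ slightly from hand calculations. The function name `iqr_outliers` and the data are hypothetical.

```python
import statistics

def iqr_outliers(xs):
    """Return (outliers, (lower fence, upper fence)) using the 1.5 * IQR rule."""
    q1, _, q3 = statistics.quantiles(xs, n=4)   # three cut points: Q1, Q2, Q3
    iqr = q3 - q1                               # interquartile range
    low = q1 - 1.5 * iqr
    high = q3 + 1.5 * iqr
    return [x for x in xs if x < low or x > high], (low, high)

values = list(range(1, 12)) + [100]   # hypothetical: 1..11 plus one extreme value
outliers, fences = iqr_outliers(values)
print(outliers, fences)
```

Unlike the mean and standard deviation, the quartiles barely move when one extreme value is added, so this rule tends to flag outliers more readily in small samples.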