Descriptive Statistics - Grouped Data and Graphs - Math403 - EDA
The grouped-data approach is used to solve and analyze statistical data sets
that comprise 30 or more observations.
4. Class Mark.
Halfway between the upper limit and lower limit of a class (also called midpoint).
5. Class Boundary.
Halfway between the lower limit of one class and the upper limit of the
preceding class (the exact limit).
6. Relative Frequency.
The frequency of one score or group of scores divided by the total frequency of
all the observations.
7. Cumulative Frequency.
The frequency of any class plus the frequencies of all preceding classes in a
distribution.
Some Important Terminologies
(Grouped Data)
Frequency Distribution Table

Class        Class Boundary   Class Mark (x)   Frequency (f)   Cumulative Frequency (cf)   Relative Frequency (rf)
19 - 50      18.5 - 50.5      34.5             6               6                           0.20
51 - 82      50.5 - 82.5      66.5             10              16                          0.33
83 - 114     82.5 - 114.5     98.5             4               20                          0.13
115 - 146    114.5 - 146.5    130.5            4               24                          0.13
147 - 178    146.5 - 178.5    162.5            4               28                          0.13
179 - 210    178.5 - 210.5    194.5            2               30                          0.07

The class mark (x) is also called the midpoint.
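The derived columns of the table can be computed from the class limits and frequencies alone. The following is a minimal sketch in Python, assuming integer class limits with a gap of 1 between consecutive classes. The names `frequency_table` and `gap` are illustrative and do not come from the course material.

```python
# Class limits and frequencies copied from the frequency distribution table.
limits = [(19, 50), (51, 82), (83, 114), (115, 146), (147, 178), (179, 210)]
freqs = [6, 10, 4, 4, 4, 2]


def frequency_table(limits, freqs, gap=1):
    """Return one dict per class with boundary, class mark, cf, and rf.

    gap is the distance between the upper limit of one class and the
    lower limit of the next (assumed to be 1 for integer data).
    """
    n = sum(freqs)
    rows = []
    running = 0  # cumulative frequency so far
    for (lo, hi), f in zip(limits, freqs):
        running += f
        rows.append({
            "class": (lo, hi),
            "boundary": (lo - gap / 2, hi + gap / 2),  # halfway to the neighbouring class
            "x": (lo + hi) / 2,                        # class mark (midpoint)
            "f": f,
            "cf": running,
            "rf": round(f / n, 2),                     # relative frequency
        })
    return rows


for row in frequency_table(limits, freqs):
    print(row)
```

Each printed row should agree with the corresponding row of the table, which gives a quick check on hand computations.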
8. Graph.
A pictorial representation of a set of data.
9. Histogram.
A vertical bar graph that shows the frequencies of scores or classes of scores
by the height of the bar.
[Figure: four histogram shapes: a bimodal distribution, a normal distribution,
a negatively skewed distribution, and a positively skewed distribution.]
NOTE: These data will be used to compute the different statistical measures
using grouped data analysis.
1.1 Creating a Frequency Distribution Table
STATISTICAL MEASURES:
Grouped Data Analysis
Step 1. Get the lowest and the highest values in the distribution.
Let H and L be the highest and the lowest values in the distribution,
respectively.
where:
k = number of classes
n = sample size
Step 4. Determine the size of the class interval. The value of CI can be
obtained by dividing the range by the desired number of classes.
Hence, CI = R/k.
Step 5. Construct the classes. In constructing the classes, we first determine the
lowest lower limit of the distribution. The value of this lower limit can be chosen
arbitrarily, as long as the lowest value falls in the first interval and the
highest value falls in the last interval.
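Steps 4 and 5 can be sketched in Python as follows. This is an illustration under stated assumptions, not the course's prescribed procedure: the data are integers, the number of classes k is already known, CI = R/k is rounded up to a whole number, and the lowest value itself is used as the lowest lower limit. The function name `build_classes` is hypothetical, and the values 19, 210, and 6 are illustrative inputs taken from the class limits of the frequency distribution table.

```python
import math


def build_classes(lowest, highest, k):
    """Return (range, class interval, list of (lower, upper) class limits)."""
    r = highest - lowest          # range R = H - L
    ci = math.ceil(r / k)         # CI = R/k, rounded up (assumption)
    classes = []
    lower = lowest                # lowest lower limit (assumed equal to L)
    while True:
        upper = lower + ci - 1    # integer data: classes are 1 unit apart
        classes.append((lower, upper))
        if upper >= highest:      # stop once the highest value is covered
            break
        lower = upper + 1
    return r, ci, classes


r, ci, classes = build_classes(19, 210, 6)
print(r, ci)
for c in classes:
    print(c)
```

Because the loop runs until the highest value is covered, it may produce one class more than k when the rounded class interval is too small to span the data.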
Class    Class Boundary    Class Mark (x)    f    CF down    CF up    rf
  -            -
  -            -
  -            -
  -            -
  -            -
  -            -
Measure of Central Tendency or Central Location
a. Mean.
Population Mean: $\mu = \frac{\sum fx}{N}$ and Sample Mean: $\bar{x} = \frac{\sum fx}{n}$
where:
$\sum fx$ = sum of the products of each class frequency and its corresponding class mark
N = population size
n = sample size
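As a worked illustration, the grouped mean of the frequency distribution table (class marks 34.5 to 194.5, total frequency 30) can be computed as the sum of f times x divided by the total frequency. The function name `grouped_mean` is an illustrative choice.

```python
# Class marks (x) and frequencies (f) from the frequency distribution table.
marks = [34.5, 66.5, 98.5, 130.5, 162.5, 194.5]
freqs = [6, 10, 4, 4, 4, 2]


def grouped_mean(marks, freqs):
    """Mean of grouped data: sum of f*x divided by the total frequency."""
    total_fx = sum(f * x for f, x in zip(freqs, marks))
    return total_fx / sum(freqs)


print(round(grouped_mean(marks, freqs), 2))
```

Here the sum of f times x is 2827 and the total frequency is 30, so the mean is about 94.23.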
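b. Median
Median: $\tilde{x} = LCB_{\text{median class}} + \left( \frac{\frac{n}{2} - cf_{\text{before}}}{f_{\text{median class}}} \right) CI$
where:
$LCB_{\text{median class}}$ = lower class boundary of the median class
n = sample size
$cf_{\text{before}}$ = cumulative frequency of the class preceding the median class
$f_{\text{median class}}$ = frequency of the median class
CI = class interval value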
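A minimal sketch of the grouped median in Python, using the usual interpolation inside the median class: the lower class boundary plus (n/2 minus the cumulative frequency before the median class) divided by the median-class frequency, times CI. The data are the lower class boundaries and frequencies of the frequency distribution table. The function name `grouped_median` is illustrative.

```python
# Lower class boundaries and frequencies from the frequency distribution table.
lower_boundaries = [18.5, 50.5, 82.5, 114.5, 146.5, 178.5]
freqs = [6, 10, 4, 4, 4, 2]
ci = 32  # class interval (width between consecutive class boundaries)


def grouped_median(lower_boundaries, freqs, ci):
    """Interpolate the median inside the first class whose cf reaches n/2."""
    n = sum(freqs)
    if n == 0:
        raise ValueError("frequency table is empty")
    cf_before = 0  # cumulative frequency of the classes before the current one
    for lcb, f in zip(lower_boundaries, freqs):
        if f > 0 and cf_before + f >= n / 2:
            return lcb + (n / 2 - cf_before) / f * ci
        cf_before += f
    raise ValueError("median class not found")


print(round(grouped_median(lower_boundaries, freqs, ci), 1))
```

With n/2 = 15, the median class is 51 - 82 (cumulative frequency 16), so the median is 50.5 + (15 - 6)/10 x 32, about 79.3.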
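c. Mode
Mode: $\hat{x} = LCB_{\text{modal class}} + \left( \frac{\Delta_1}{\Delta_1 + \Delta_2} \right) CI$
where:
$LCB_{\text{modal class}}$ = lower class boundary of the modal class
$\Delta_1$ = frequency of the modal class minus the frequency of the preceding class
$\Delta_2$ = frequency of the modal class minus the frequency of the following class
CI = class interval value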
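A minimal sketch of the grouped mode in Python. It assumes the commonly used interpolation formula, in which the mode is the lower class boundary of the modal class plus d1/(d1 + d2) times CI, where d1 and d2 are the differences between the modal frequency and the frequencies of the preceding and following classes. The function name `grouped_mode` is illustrative, and the data come from the frequency distribution table.

```python
# Lower class boundaries and frequencies from the frequency distribution table.
lower_boundaries = [18.5, 50.5, 82.5, 114.5, 146.5, 178.5]
freqs = [6, 10, 4, 4, 4, 2]
ci = 32  # class interval


def grouped_mode(lower_boundaries, freqs, ci):
    """Interpolate the mode inside the class with the highest frequency."""
    i = max(range(len(freqs)), key=lambda j: freqs[j])   # first modal class
    f_prev = freqs[i - 1] if i > 0 else 0                # no class before: 0
    f_next = freqs[i + 1] if i < len(freqs) - 1 else 0   # no class after: 0
    d1 = freqs[i] - f_prev
    d2 = freqs[i] - f_next
    if d1 + d2 == 0:
        raise ValueError("mode is not defined: neighbouring classes tie")
    return lower_boundaries[i] + d1 / (d1 + d2) * ci


print(round(grouped_mode(lower_boundaries, freqs, ci), 1))
```

For the table, the modal class is 51 - 82 with d1 = 10 - 6 = 4 and d2 = 10 - 4 = 6, giving 50.5 + (4/10) x 32, about 63.3.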
[Figure: bar chart of frequency by grade, with the mean, median, and mode marked.]
Grade        F    D    C    B    A
Frequency    3    10   20   23   12
2. Measures of Variability.
Measures of dispersion are important for describing the spread of
the data, or its variation around a central value.
a. Range
b. Variance
c. Standard Deviation
d. Mean Deviation
e. Interquartile Range
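As an illustration of items b and c, the variance and standard deviation of grouped data can be estimated from the class marks and frequencies. The sketch below assumes the usual grouped formulas, in which the squared deviations of the class marks from the mean are weighted by frequency and divided by n - 1 for a sample or by N for a population. The function name `grouped_variance` and the `sample` flag are illustrative.

```python
import math

# Class marks (x) and frequencies (f) from the frequency distribution table.
marks = [34.5, 66.5, 98.5, 130.5, 162.5, 194.5]
freqs = [6, 10, 4, 4, 4, 2]


def grouped_variance(marks, freqs, sample=True):
    """Variance of grouped data around the grouped mean."""
    n = sum(freqs)
    mean = sum(f * x for f, x in zip(freqs, marks)) / n
    ss = sum(f * (x - mean) ** 2 for f, x in zip(freqs, marks))
    return ss / (n - 1 if sample else n)  # n - 1 for a sample, N for a population


variance = grouped_variance(marks, freqs)
std_dev = math.sqrt(variance)
print(round(variance, 1), round(std_dev, 1))
```

For the table this gives a sample variance of roughly 2523.5 and a sample standard deviation of roughly 50.2.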
Measure of Variability or Dispersion
2.1. Range. The difference between the largest and smallest number in
the set.
R = Hscore - Lscore
Percentile Deviation:
Decile Deviation:
Quartile Deviation:
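Percentiles, deciles, and quartiles of grouped data are commonly located by interpolating inside the class that contains the target cumulative frequency, in the same way as the median. The sketch below assumes that approach and takes the quartile deviation as (Q3 - Q1)/2, which is the usual definition but an assumption with respect to this material. The function name `grouped_quantile` is illustrative.

```python
# Lower class boundaries and frequencies from the frequency distribution table.
lower_boundaries = [18.5, 50.5, 82.5, 114.5, 146.5, 178.5]
freqs = [6, 10, 4, 4, 4, 2]
ci = 32  # class interval


def grouped_quantile(lower_boundaries, freqs, ci, p):
    """Quantile for a proportion p (0 < p <= 1), e.g. 0.25 for Q1, 0.9 for D9."""
    n = sum(freqs)
    target = p * n   # position of the quantile in the cumulative frequency
    cf_before = 0    # cumulative frequency before the current class
    for lcb, f in zip(lower_boundaries, freqs):
        if f > 0 and cf_before + f >= target:
            return lcb + (target - cf_before) / f * ci
        cf_before += f
    raise ValueError("quantile class not found")


q1 = grouped_quantile(lower_boundaries, freqs, ci, 0.25)
q3 = grouped_quantile(lower_boundaries, freqs, ci, 0.75)
quartile_deviation = (q3 - q1) / 2  # assumed definition
print(round(q1, 1), round(q3, 1), round(quartile_deviation, 1))
```

For the table, Q1 is about 55.3 and Q3 is about 134.5, so the interquartile range is about 79.2 and the quartile deviation about 39.6. The 50th percentile (p = 0.5) coincides with the grouped median.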
Measure of Shape
[Figure: three bar charts illustrating distribution shape.]

Age Groups
Group        > 59    50 - 59    40 - 49    30 - 39    20 - 29    < 20
Frequency    40      50         40         20         15         12

GRE - Numerical Scores
Score        < 100   100 - 199  200 - 299  300 - 399  400 - 499  500 - 600
Frequency    300     500        600        1000       1100       950

Self-Ratings (1 = Low Self-Esteem, 7 = High Self-Esteem)
Rating       1    2    3    4    5    6    7
Frequency    25   55   65   50   62   58   25
Measure of Shape: Skewness
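One common way to quantify skewness from grouped data is Pearson's second coefficient of skewness, 3(mean - median)/standard deviation. The sketch below uses that coefficient as an assumption, since other skewness formulas exist, and applies it to the frequency distribution table with the sample standard deviation.

```python
import math

# Class marks, lower class boundaries, and frequencies from the
# frequency distribution table.
marks = [34.5, 66.5, 98.5, 130.5, 162.5, 194.5]
lower_boundaries = [18.5, 50.5, 82.5, 114.5, 146.5, 178.5]
freqs = [6, 10, 4, 4, 4, 2]
ci = 32  # class interval

n = sum(freqs)
mean = sum(f * x for f, x in zip(freqs, marks)) / n
# Sample standard deviation of the grouped data (divisor n - 1).
sd = math.sqrt(sum(f * (x - mean) ** 2 for f, x in zip(freqs, marks)) / (n - 1))

# Grouped median by interpolation inside the median class.
cf_before = 0
for lcb, f in zip(lower_boundaries, freqs):
    if cf_before + f >= n / 2:
        median = lcb + (n / 2 - cf_before) / f * ci
        break
    cf_before += f

# Pearson's second coefficient of skewness (assumed measure).
skewness = 3 * (mean - median) / sd
print(round(skewness, 2))
```

A positive value indicates a positively skewed distribution (mean above the median) and a negative value a negatively skewed one. For the table the coefficient is positive, at roughly 0.89.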
Measure of Shape: Kurtosis
Bell-shaped -- Mesokurtic
Peaked -- Leptokurtic
Flat -- Platykurtic
The following data are the measures of the diameters of 36 rivet heads in
1/100 of an inch.
Using all the data in the table above, calculate and interpret the following:
• Mean
• Standard Deviation
• Skewness
• 50th Percentile
• Create the following graphs:
  • Histogram
  • Frequency Ogive
  • Box Plot