Descriptive Statistics - Grouped Data and Graphs - Math403 - EDA


LEARNING OUTCOMES

• Know different terminologies involved in grouped data analysis.
• Create a frequency distribution table.
• Solve problems involving the different statistical measures and interpret the computed values.
• Interpret the corresponding graphical presentations derived from the solutions.
INTRODUCTION TO GROUPED DATA ANALYSIS

• The grouped data approach is used to solve and analyze statistical data sets that comprise 30 or more observations.
• This kind of statistical analysis is widely performed in all industries to analyze quality performance and to support quality improvement.
Some Important Terminologies
(Grouped Data)

1. Frequency Distribution Table.
A tabulation of data showing the number of times a score appears.
2. Frequency.
The number of times a score or a group of scores (class) occurs in a population or sample.
3. Class.
A group of scores in a population or sample.

Class        Class Boundary     x        f
19 - 50      18.5 - 50.5        34.5     6
51 - 82      50.5 - 82.5        66.5     10
83 - 114     82.5 - 114.5       98.5     4
115 - 146    114.5 - 146.5      130.5    4
147 - 178    146.5 - 178.5      162.5    4
179 - 210    178.5 - 210.5      194.5    2

4. Class Mark.
Halfway between the upper limit and lower limit of a class (also called midpoint).
5. Class Boundary.
Halfway between the lower limit of one class and the upper limit of the
preceding class (the exact limit).
6. Relative Frequency.
The frequency of one score or group of scores divided by the total frequency of
all the observations.
7. Cumulative Frequency.
The frequency of any class plus the frequencies of all preceding classes in a
distribution.
Frequency Distribution Table

Class        Class Boundary     Class Mark (x)    Frequency (f)    Cumulative Frequency (cf)    Relative Frequency (rf)
19 - 50      18.5 - 50.5        34.5              6                6                            0.20
51 - 82      50.5 - 82.5        66.5              10               16                           0.33
83 - 114     82.5 - 114.5       98.5              4                20                           0.13
115 - 146    114.5 - 146.5      130.5             4                24                           0.13
147 - 178    146.5 - 178.5      162.5             4                28                           0.13
179 - 210    178.5 - 210.5      194.5             2                30                           0.07
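The derived columns of the table can be reproduced from the class limits and frequencies alone. The following is a minimal sketch, assuming integer class limits (so each boundary sits 0.5 beyond its limit) and relative frequencies rounded to two decimals; the function name `build_table` is illustrative, not part of the lecture.

```python
# Class limits and frequencies copied from the frequency distribution table.
classes = [(19, 50), (51, 82), (83, 114), (115, 146), (147, 178), (179, 210)]
freqs = [6, 10, 4, 4, 4, 2]

def build_table(classes, freqs):
    """Return one row per class with boundary, class mark, f, cf and rf."""
    total = sum(freqs)
    rows, running = [], 0
    for (lower, upper), f in zip(classes, freqs):
        running += f  # cumulative frequency up to and including this class
        rows.append({
            "class": (lower, upper),
            "boundary": (lower - 0.5, upper + 0.5),  # assumes a gap of 1 between classes
            "mark": (lower + upper) / 2,             # class mark (midpoint)
            "f": f,
            "cf": running,
            "rf": round(f / total, 2),
        })
    return rows

table = build_table(classes, freqs)
for row in table:
    print(row)
```

Each printed row should match the corresponding line of the table, which is a quick way to check a hand-built frequency distribution.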

8. Graph.
A pictorial representation of a set of data.
9. Histogram.
A vertical bar graph that shows the frequencies of scores or classes of scores by the height of the bar.
How to Create Histogram

[Figure: example histograms showing a bimodal distribution, a normal distribution, a negatively skewed distribution, and a positively skewed distribution.]
10. Frequency Polygon.
A graph on which the frequencies of classes are plotted at the class mark and the class marks are connected by straight lines.
11. Scatter Diagram.
The relationship between two variables is shown by a series of dots plotted on a graph.

[Figure: scatter diagram with the x axis running from 150 to 650 and the y axis from 3.5 to 4.5; one point is labeled with x = 200 and y = 3.8.]
12. Frequency Ogive.
An ogive is a cumulative frequency polygon, often presented in percentage terms. Cumulative frequencies show the running total, the frequency below each class boundary.
STATISTICAL MEASURES:
Grouped Data Analysis

1. Measure of Central Tendency or Central Location


2. Measure of Variability or Dispersion
3. Measure of Position
4. Measure of Shape
The following are the scores of the students in a certain examination.

23.5 60.7 56.9 79.4 89.4 57.3 74.5 52.4 55.3
80.4 77.9 84.5 81.3 17.2 41.2 65.7 63.6 26.5
52.6 71.3 77.4 64.8 32.8 78.3 25.4 88.1 76.2
41.8 78.1 98.2 83.5 95.6 64.0 48.6 80.3 43.9

NOTE: These data will be used to solve the different statistical measures using
grouped data analysis.
1.1 Creating a Frequency Distribution Table

Steps in constructing the Frequency Distribution:

Step 1. Get the lowest and the highest value in the distribution. We shall let H and L be the highest and the lowest value in the distribution.

Step 2. Get the value of the range. The range, denoted by R, refers to the difference between the highest and the lowest value in the distribution, R = H - L.
Step 3. Determine the number of classes. There is no standard method for determining the number of classes. Generally, the number of classes should be no fewer than 5 and no more than 15. In some instances, however, the number of classes can be approximated using Sturges' formula, k = 1 + 3.322 log n, where log is the base-10 logarithm.

where:
k = number of classes
n = sample size
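As a quick check of Step 3, the sketch below evaluates Sturges' formula for the 36 exam scores listed earlier. It assumes a base-10 logarithm, the constant 3.322 (which is 1/log10 2), and rounding to the nearest whole number of classes; the function name is illustrative.

```python
import math

def sturges_classes(n):
    """Approximate number of classes: k = 1 + 3.322 * log10(n), rounded."""
    return round(1 + 3.322 * math.log10(n))

k = sturges_classes(36)
print(k)  # 1 + 3.322 * 1.5563 is about 6.17, so k = 6
```

The result is only a starting point; the 5-to-15 guideline and the readability of the resulting table still apply.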
Step 4. Determine the size of the class interval (CI). The value of CI can be obtained by dividing the range by the desired number of classes. Hence, CI = R/k.

Step 5. Construct the classes. In constructing the classes, we first determine the lowest lower limit of the distribution. The value of this lower limit can be chosen arbitrarily as long as the lowest value falls in the first interval and the highest value falls in the last interval.
Step 6. Determine the frequency of each class. The frequency of a class is obtained by counting the number of items that fall in its interval. Use a frequency tally.

Step 7. Fill out the frequency distribution table with the other required information as indicated.
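Steps 1 to 6 can be traced on the 36 exam scores with a short script. This is a sketch under stated assumptions, not the lecture's worked solution: the class width is R/k rounded up to a whole number, the first class starts at the lowest score rounded down, and each class is treated as a half-open interval (lower value included, upper value excluded).

```python
import math

scores = [
    23.5, 60.7, 56.9, 79.4, 89.4, 57.3, 74.5, 52.4, 55.3,
    80.4, 77.9, 84.5, 81.3, 17.2, 41.2, 65.7, 63.6, 26.5,
    52.6, 71.3, 77.4, 64.8, 32.8, 78.3, 25.4, 88.1, 76.2,
    41.8, 78.1, 98.2, 83.5, 95.6, 64.0, 48.6, 80.3, 43.9,
]

highest, lowest = max(scores), min(scores)        # Step 1: H and L
value_range = highest - lowest                    # Step 2: R = H - L
k = round(1 + 3.322 * math.log10(len(scores)))    # Step 3: Sturges' formula
width = math.ceil(value_range / k)                # Step 4: CI = R / k, rounded up (assumption)
start = math.floor(lowest)                        # Step 5: chosen lowest lower limit (assumption)

freqs = [0] * k                                   # Step 6: tally each score into its class
for score in scores:
    # min() guards against a score landing past the last class
    index = min(int((score - start) // width), k - 1)
    freqs[index] += 1

classes = [(start + i * width, start + (i + 1) * width) for i in range(k)]
for (low, high), f in zip(classes, freqs):
    print(f"{low} to under {high}: {f}")
```

A different starting limit or class width gives a different but equally valid table, which is why Step 5 calls the choice arbitrary.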
Class    Class Boundary    Class Mark (x)    f    CF down    CF up    rf
- -
- -
- -
- -
- -
- -
1. Measure of Central Tendency or Central Location

Central Tendency is the point about which the scores tend to cluster, a sort of average in the series. It is the center of concentration of scores in any set of data. It is a single number which represents the general level of performance of a group.
a. Mean.

Population Mean: $\mu = \frac{\sum fx}{N}$ and Sample Mean: $\bar{x} = \frac{\sum fx}{n}$

where:
$\sum fx$ = sum of the products of each class frequency and its corresponding class mark
N = population size
n = sample size
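To make the grouped mean concrete, the sketch below applies it to the class marks and frequencies of the frequency distribution table shown earlier (classes 19 - 50 through 179 - 210). It assumes the mean is the sum of f times x divided by the total frequency; the function name is illustrative.

```python
# Class marks (x) and frequencies (f) from the earlier frequency distribution table.
marks = [34.5, 66.5, 98.5, 130.5, 162.5, 194.5]
freqs = [6, 10, 4, 4, 4, 2]

def grouped_mean(marks, freqs):
    """Sum of f * x divided by the total frequency."""
    return sum(f * x for f, x in zip(freqs, marks)) / sum(freqs)

mean = grouped_mean(marks, freqs)
print(round(mean, 2))  # 2827 / 30 = 94.23
```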
b. Median

$$\tilde{x} = LCB_{\text{median class}} + \left(\frac{\frac{n}{2} - F}{f_{\text{median class}}}\right) CI$$

where:
$LCB_{\text{median class}}$ = lower class boundary of the median class
n = sample size
F = cumulative frequency of the classes before the median class
$f_{\text{median class}}$ = frequency of the median class
CI = class interval value
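The grouped median can be sketched in a few lines. The example assumes the common textbook interpolation, in which the cumulative frequency of the classes before the median class is subtracted from n/2 and the result is divided by the median class frequency before multiplying by the class interval. The data come from the earlier frequency distribution table; the function name is illustrative.

```python
# Lower class boundaries and frequencies from the earlier frequency distribution table.
lower_boundaries = [18.5, 50.5, 82.5, 114.5, 146.5, 178.5]
freqs = [6, 10, 4, 4, 4, 2]
width = 32  # class interval: 50.5 - 18.5

def grouped_median(lower_boundaries, freqs, width):
    """Interpolate the median inside the first class whose cf reaches n / 2."""
    n = sum(freqs)
    cum_before = 0  # cumulative frequency of the classes before the current one
    for lcb, f in zip(lower_boundaries, freqs):
        if cum_before + f >= n / 2:
            return lcb + (n / 2 - cum_before) / f * width
        cum_before += f

median = grouped_median(lower_boundaries, freqs, width)
print(round(median, 2))  # 50.5 + (15 - 6) / 10 * 32 = 79.3
```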
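c. Mode

$$\hat{x} = LCB_{\text{modal class}} + \left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right) CI$$

where:
$LCB_{\text{modal class}}$ = lower class boundary of the modal class
$\Delta_1$ = frequency of the modal class minus the frequency of the preceding class
$\Delta_2$ = frequency of the modal class minus the frequency of the following class
CI = class interval value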
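A short sketch of a grouped mode follows. It assumes the common textbook form, mode = LCB + (d1 / (d1 + d2)) * CI, where d1 is the modal class frequency minus the preceding class frequency and d2 is the modal class frequency minus the following class frequency; a missing neighbor is treated as frequency 0. The data come from the earlier frequency distribution table, and the names are illustrative.

```python
# Lower class boundaries and frequencies from the earlier frequency distribution table.
lower_boundaries = [18.5, 50.5, 82.5, 114.5, 146.5, 178.5]
freqs = [6, 10, 4, 4, 4, 2]
width = 32  # class interval

def grouped_mode(lower_boundaries, freqs, width):
    """Interpolate the mode inside the class with the highest frequency."""
    i = freqs.index(max(freqs))                          # first modal class
    before = freqs[i - 1] if i > 0 else 0                # neighbor below, 0 if none
    after = freqs[i + 1] if i < len(freqs) - 1 else 0    # neighbor above, 0 if none
    d1 = freqs[i] - before
    d2 = freqs[i] - after
    return lower_boundaries[i] + d1 / (d1 + d2) * width

mode = grouped_mode(lower_boundaries, freqs, width)
print(round(mode, 2))  # 50.5 + 4 / (4 + 6) * 32 = 63.3
```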
[Figure: Distribution of Final Grades in Statistics Course. Bar chart of frequency by grade, with the mean, median, and mode marked.]

Grade        F    D     C     B     A
Frequency    3    10    20    23    12
Comparison of Central Tendency Measures

• Use Mean when the distribution is reasonably symmetrical, with few extreme scores, and has one mode.
• Use Median with nonsymmetrical distributions because it is not sensitive to skewness.
• Use Mode when dealing with a frequency distribution for nominal data.

2. Measures of Variability.
Measures of dispersion are important for describing the spread of
the data, or its variation around a central value.
a. Range
b. Variance
c. Standard Deviation
d. Mean Deviation
e. Interquartile Range

2.1. Range. The difference between the largest and smallest number in
the set.

R = Hscore - Lscore
2.2.a. Population Variance (σ²).

2.2.b. Sample Variance (s²).


2.3. Standard Deviation. The square root of the variance.

2.3.a. Population Standard Deviation (σ).
2.3.b. Sample Standard Deviation (s).

2.4. Mean Deviation.
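The variance, standard deviation, and mean deviation can be illustrated on the earlier frequency distribution table. The sketch assumes the usual grouped-data forms: squared deviations of the class marks from the mean, weighted by frequency, divided by N for a population or by n - 1 for a sample, and frequency-weighted absolute deviations divided by n for the mean deviation. These forms are an assumption here, not taken from the slides.

```python
import math

# Class marks (x) and frequencies (f) from the earlier frequency distribution table.
marks = [34.5, 66.5, 98.5, 130.5, 162.5, 194.5]
freqs = [6, 10, 4, 4, 4, 2]

n = sum(freqs)
mean = sum(f * x for f, x in zip(freqs, marks)) / n

# Frequency-weighted sum of squared deviations from the mean.
sum_sq = sum(f * (x - mean) ** 2 for f, x in zip(freqs, marks))

population_variance = sum_sq / n        # divide by N
sample_variance = sum_sq / (n - 1)      # divide by n - 1
population_sd = math.sqrt(population_variance)
sample_sd = math.sqrt(sample_variance)

# Mean deviation: frequency-weighted average absolute deviation from the mean.
mean_deviation = sum(f * abs(x - mean) for f, x in zip(freqs, marks)) / n

print(round(population_variance, 2), round(sample_variance, 2))
print(round(population_sd, 2), round(sample_sd, 2), round(mean_deviation, 2))
```

The only difference between the population and sample versions is the divisor, so the sample values are always slightly larger.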


2.5. Interquartile Range. The range of values extending from the 25th percentile to the 75th percentile.


Measure of Position

3.1. Percentile. Divides the data into 100 equal parts.

3.2. Decile. Divides the data into 10 equal parts.

3.3. Quartile. Divides the data into 4 equal parts.


3.1. Percentile. Values that divide a set of observations into 100 equal parts. It is denoted by P1, P2, ..., P99, such that 1% of the data falls below P1.

where:
$P_n$ = percentile being computed (ex. $P_{50}$)
$L_{P_n}$ = lower class boundary of the percentile being computed
i = class interval value
$f_{P_n}$ = frequency of the class of the percentile being computed
kn = percentile value (ungrouped)
$F_{P_n}$ = cumulative frequency of the class of the percentile being computed
3.2. Decile. Values that divide a set of observations into 10 equal parts. It is denoted by D1, D2, ..., D9, such that 20% of the data falls below D2.

where:
$D_n$ = decile being computed (ex. $D_5$)
$L_{D_n}$ = lower class boundary of the decile being computed
i = class interval value
$f_{D_n}$ = frequency of the class of the decile being computed
kn = decile value (ungrouped)
$F_{D_n}$ = cumulative frequency of the class of the decile being computed
3.3. Quartile. Values that divide a set of observations into 4 equal parts. It is denoted by Q1, Q2, Q3, such that 25% of the data falls below Q1.

where:
$Q_n$ = quartile being computed (ex. $Q_2$)
$L_{Q_n}$ = lower class boundary of the quartile being computed
i = class interval value
$f_{Q_n}$ = frequency of the class of the quartile being computed
kn = quartile value (ungrouped)
$F_{Q_n}$ = cumulative frequency of the class of the quartile being computed
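Percentiles, deciles, and quartiles can all be computed with one interpolation routine, since each is a fraction of the data. The sketch below assumes the common textbook form, value = L + ((kn - F) / f) * i, where kn is the fraction times n and F is taken as the cumulative frequency of the classes below the target class. The data come from the earlier frequency distribution table; the function name is illustrative.

```python
# Lower class boundaries and frequencies from the earlier frequency distribution table.
lower_boundaries = [18.5, 50.5, 82.5, 114.5, 146.5, 178.5]
freqs = [6, 10, 4, 4, 4, 2]
width = 32  # class interval value (i)

def grouped_quantile(fraction, lower_boundaries, freqs, width):
    """Value below which `fraction` of the observations fall (0 < fraction < 1)."""
    n = sum(freqs)
    target = fraction * n    # the "kn" position, e.g. 0.25 * n for Q1
    cum_before = 0           # F: cumulative frequency below the current class
    for lcb, f in zip(lower_boundaries, freqs):
        if f > 0 and cum_before + f >= target:
            return lcb + (target - cum_before) / f * width
        cum_before += f
    raise ValueError("fraction must be between 0 and 1")

q1 = grouped_quantile(0.25, lower_boundaries, freqs, width)   # Q1 = P25
p50 = grouped_quantile(0.50, lower_boundaries, freqs, width)  # P50 = D5 = Q2
q3 = grouped_quantile(0.75, lower_boundaries, freqs, width)   # Q3 = P75
iqr = q3 - q1                                                 # interquartile range

print(round(q1, 2), round(p50, 2), round(q3, 2), round(iqr, 2))
```

Passing 0.5 reproduces the grouped median, which is a useful consistency check between the position measures and the measures of central tendency.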

Percentile Deviation:

Decile Deviation:

Quartile Deviation:
Measure of Shape

4.1 Skewness. The measure of the shape of a nonsymmetrical distribution.

• Two sets of data can have the same mean and SD but different skewness.
• Two types of skewness:
• Positive skewness
• Negative skewness
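One simple way to put a number on skewness is Pearson's second coefficient, 3 * (mean - median) / standard deviation. It is used here only as an illustrative assumption, since the slides do not state which skewness formula the course uses. The sketch reuses the earlier frequency distribution table, with the grouped mean, grouped median, and sample standard deviation computed inline.

```python
import math

# Data from the earlier frequency distribution table.
marks = [34.5, 66.5, 98.5, 130.5, 162.5, 194.5]
lower_boundaries = [18.5, 50.5, 82.5, 114.5, 146.5, 178.5]
freqs = [6, 10, 4, 4, 4, 2]
width = 32

n = sum(freqs)
mean = sum(f * x for f, x in zip(freqs, marks)) / n
sample_sd = math.sqrt(sum(f * (x - mean) ** 2 for f, x in zip(freqs, marks)) / (n - 1))

# Grouped median: interpolate inside the first class whose cf reaches n / 2.
cum_before = 0
for lcb, f in zip(lower_boundaries, freqs):
    if cum_before + f >= n / 2:
        median = lcb + (n / 2 - cum_before) / f * width
        break
    cum_before += f

# Pearson's second coefficient of skewness (an assumed choice of formula).
skewness = 3 * (mean - median) / sample_sd
print(round(skewness, 2))  # positive value, so the distribution is positively skewed
```

A positive coefficient means the mean lies above the median (a longer right tail); a negative one means the opposite.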
The shape of the distribution provides information about the central tendency and variability of measurements.

Three common shapes of distributions are:

Normal: bell-shaped curve; symmetrical
Skewed: non-normal; non-symmetrical; can be positively or negatively skewed
Multimodal: has more than one peak (mode)
[Figure: Relative Locations for Measures of Central Tendency. Negatively skewed: mean, median, mode (left to right). Symmetric (not skewed): mean, median, and mode coincide. Positively skewed: mode, median, mean (left to right).]


[Figure: Age Distribution (Positively Skewed Distribution). Bar chart of frequency by age group.]

Age Groups    > 59    50 - 59    40 - 49    30 - 39    20 - 29    < 20
Frequency     40      50         40         20         15         12

[Figure: Distribution of Scores on the Numerical Section of GRE (Negatively Skewed Distribution). Bar chart of frequency by score band.]

GRE - Numerical Scores    <100    100 - 199    200 - 299    300 - 399    400 - 499    500 - 600
Frequency                 300     500          600          1000         1100         950

[Figure: Distribution of Self-Ratings on Self-Esteem (Bimodal Distribution). Bar chart of frequency by self-rating.]

Self-Ratings (1 = Low Self-Esteem, 7 = High Self-Esteem)    1     2     3     4     5     6     7
Frequency                                                   25    55    65    50    62    58    25
4.2 Kurtosis. A measure of whether the curve of a distribution is:

• Bell-shaped -- Mesokurtic
• Peaked -- Leptokurtic
• Flat -- Platykurtic

• Measure of peakedness of the data.
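Kurtosis can be quantified with the moment coefficient, the fourth central moment divided by the square of the second central moment. This choice of formula is an assumption, since the slides do not give one. Under it, a value of 3 corresponds to a mesokurtic (normal-like) curve, values above 3 to a leptokurtic curve, and values below 3 to a platykurtic curve. The function names and the `tolerance` parameter are illustrative.

```python
def grouped_kurtosis(marks, freqs):
    """Moment coefficient of kurtosis: m4 / m2**2, using class marks and frequencies."""
    n = sum(freqs)
    mean = sum(f * x for f, x in zip(freqs, marks)) / n
    m2 = sum(f * (x - mean) ** 2 for f, x in zip(freqs, marks)) / n  # second central moment
    m4 = sum(f * (x - mean) ** 4 for f, x in zip(freqs, marks)) / n  # fourth central moment
    return m4 / m2 ** 2

def classify_kurtosis(value, tolerance=0.0):
    """Label a kurtosis value relative to 3, the value for a normal curve."""
    if abs(value - 3) <= tolerance:
        return "mesokurtic"
    return "leptokurtic" if value > 3 else "platykurtic"

# Class marks and frequencies from the earlier frequency distribution table.
marks = [34.5, 66.5, 98.5, 130.5, 162.5, 194.5]
freqs = [6, 10, 4, 4, 4, 2]

kurtosis = grouped_kurtosis(marks, freqs)
print(round(kurtosis, 2), classify_kurtosis(kurtosis))
```

In practice a small tolerance around 3 is often allowed before calling a distribution peaked or flat; how large it should be depends on the sample size.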


Assessment Task

The following data are the measures of the diameters of 36 rivet heads in 1/100 of an inch.

Using all the data in the table above, calculate and interpret the following:
• Mean
• Standard Deviation
• Skewness
• 50th Percentile
• Create the following graphs:
• Histogram
• Frequency Ogive
• Box Plot
