Com 201 - Concept of Frequency Distribution, Mean, Median
OMISORE AKINLOLU G.
FWACP, MPH, MB.Ch.B
RE-CAP OF LAST LECTURE
PRESENTATION OF DATA
• There are various methods of data presentation.
• Irrespective of the method, data are usually
grouped or collated into frequencies to determine
the rate of occurrence.
METHODS OF DATA PRESENTATION
• Tabular Presentation of Data.
• Graphical or diagrammatic presentation of Data
- Quantitative or numerical Data- Use
Histogram, Frequency Polygon.
- Qualitative or categorical data- Use Bar OR
Pie chart. Dot maps for geographical mapping
• Summary indices- e.g. Mean, Median & Mode.
TABULAR PRESENTATION.
• Done in the form of frequency tables.
• Can be for both quantitative and qualitative data.
• Definitions for Frequency Table
CLASS- one of the groups into which
data can be classified.
CLASS FREQUENCY (CF)- is the number of
observations (NOB) in the data set falling in a
particular class.
CLASS RELATIVE FREQUENCY- CF divided by
the total NOB in the data set.
Example of a Frequency Table
CLASS                  FREQUENCY    RELATIVE
(Level of Education)   (Number)     FREQUENCY
None                   254          0.34
Primary                201          0.27
Secondary              119          0.16
Others                  75          0.10
Total                  746          1.00
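The relative frequencies in the table can be checked with a short Python sketch. It assumes the total of 746 is taken directly from the table's Total row rather than summed from the rows shown; the variable names are illustrative only.

```python
# Class frequencies copied from the rows shown in the frequency table.
class_frequencies = {"None": 254, "Primary": 201, "Secondary": 119, "Others": 75}

# Total number of observations (NOB), taken from the table's Total row.
total_observations = 746

# Class relative frequency = class frequency / total NOB, rounded to 2 d.p.
relative_frequencies = {
    level: round(count / total_observations, 2)
    for level, count in class_frequencies.items()
}

for level, rel in relative_frequencies.items():
    print(f"{level:<10} {class_frequencies[level]:>4} {rel:.2f}")
```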
Methods of summarizing data
• Measures of location/central tendency
- Mean
- Median
- Mode
- Midrange
• Measures of dispersion/spread/variation
The Mean (x-bar)
x̄ = (23 + 19 + 21 + 20 + 23 + 21 + 22 + 24 + 22 + 22) / 10
= 217 / 10
= 21.7 years
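The same calculation can be reproduced in Python as a quick check. This is a minimal sketch: the list name `ages` is an assumption, and `statistics.mean` is used only to confirm the hand calculation.

```python
from statistics import mean

# The ten observations from the example above.
ages = [23, 19, 21, 20, 23, 21, 22, 24, 22, 22]

# Mean = sum of the observations divided by the number of observations.
manual_mean = sum(ages) / len(ages)

# The standard library gives the same result.
library_mean = mean(ages)

print(manual_mean, library_mean)
```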
PROCEDURE FOR CALCULATING MEAN (Grouped
data)
(i) Find the class mid-mark for each interval. The class
mid-mark is the average of the lower and upper
class limits.
(ii) Multiply the class mid-mark of each interval by the
corresponding frequency.
(iii) Add the results in (ii) across all intervals.
(iv) Divide the result in (iii) by the number of observations
(the total frequency).
Mean - (Grouped Data)
Example:
Marks of students in practical 1
Marks    Frequency (fi)   Class mid-mark (xi)   fi × xi
60-64    10               62                    10 × 62 = 620
65-69    14               67                    14 × 67 = 938
70-74    12               72                    12 × 72 = 864
75-79    20               77                    20 × 77 = 1540
80-84    10               82                    10 × 82 = 820
85-89    14               87                    14 × 87 = 1218
Total    ∑ fi = 80                              ∑ fi xi = 6000
x̄ = ∑ fi xi / ∑ fi
= 6000 / 80
= 75
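The grouped-data procedure can be sketched in Python using the marks example. The variable names are illustrative assumptions; the class limits and frequencies are those in the table.

```python
# Class limits (lower, upper) and frequencies from the marks example.
class_limits = [(60, 64), (65, 69), (70, 74), (75, 79), (80, 84), (85, 89)]
frequencies = [10, 14, 12, 20, 10, 14]

# Step 1: the class mid-mark is the average of the class limits.
mid_marks = [(lower + upper) / 2 for lower, upper in class_limits]

# Step 2: multiply each mid-mark by its frequency.
products = [f * x for f, x in zip(frequencies, mid_marks)]

# Step 3: add the products; step 4: divide by the total frequency.
total_frequency = sum(frequencies)
grouped_mean = sum(products) / total_frequency

print(mid_marks, sum(products), total_frequency, grouped_mean)
```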
Properties of Mean (x̄)
• Affected by extreme values
• The other observations are distributed about it
• Makes use of all the information in the data
• The sum of the deviations of the values
from the mean is always equal to zero, where
each deviation is formed by subtracting the
mean from a value (x − x̄)
• ∑ (x − x̄) = 0
• (x1 − x̄) + (x2 − x̄) + (x3 − x̄) + … = 0
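The zero-sum property of the deviations can be verified on the ten observations used earlier. This sketch uses `fractions.Fraction` so the arithmetic is exact; that choice, and the variable names, are assumptions made for illustration.

```python
from fractions import Fraction

# The ten observations from the earlier mean example.
ages = [23, 19, 21, 20, 23, 21, 22, 24, 22, 22]

# Exact mean (217/10), avoiding floating-point rounding.
mean_age = Fraction(sum(ages), len(ages))

# Deviation of each value from the mean: x minus x-bar.
deviations = [x - mean_age for x in ages]

# The deviations sum to exactly zero.
total_deviation = sum(deviations)
print(total_deviation)
```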
Advantages of the Mean
• It is easy to calculate
• Makes use of all information in the
distribution- hence reliable and
accurate
• Amenable to statistical procedures and
testing
Disadvantages of the mean
• It may be unduly influenced by abnormal
values in the distribution
• Not suitable for badly skewed
distributions: the more asymmetric the
distribution, the less desirable it is to
summarize the observations by using the
mean
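The influence of an abnormal value can be illustrated with a small Python sketch. The added value of 60 is hypothetical and does not come from the lecture data; it is included only to show how one extreme observation shifts the mean while leaving the median unchanged.

```python
from statistics import mean, median

# The ten observations from the earlier mean example.
ages = [23, 19, 21, 20, 23, 21, 22, 24, 22, 22]

# Hypothetical extreme value (an assumption, not part of the original data).
ages_with_outlier = ages + [60]

print("without outlier:", mean(ages), median(ages))
print("with outlier:   ", mean(ages_with_outlier), median(ages_with_outlier))
```

In this sketch the mean moves by more than three years while the median stays at 22.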
The Median
• The middle number in an array of
the data, when the number of
observations is odd
Or
• The arithmetic mean of the middle
numbers in an array of data when
the number of observations is even
The Median cont’d
• The value above and below which half
(50%) of the observations fall
• Bisects the area of the histogram/ frequency polygon
• For interval, ratio and ordinal scales (not
nominal)
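EXAMPLE ON MEDIAN
Find the median of the ages (in years) of the first 10 1st year clinical
students in UNIOSUN: 23, 19, 21, 20, 23, 21, 22, 24, 22, 22
Arranged in ascending order: 19, 20, 21, 21, 22, 22, 22, 23, 23, 24
The number of observations is even, so the median is the mean of the
5th and 6th values: (22 + 22) / 2 = 22 years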
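The median of these ten values can be worked out in Python by following the odd/even definition given earlier. A minimal sketch, with illustrative variable names; `statistics.median` is used only as a cross-check.

```python
from statistics import median

# The ten observations from the example.
ages = [23, 19, 21, 20, 23, 21, 22, 24, 22, 22]

# Arrange the data in ascending order (the "array").
ordered = sorted(ages)
n = len(ordered)

if n % 2 == 1:
    # Odd number of observations: take the middle value.
    manual_median = ordered[n // 2]
else:
    # Even number of observations: average the two middle values.
    manual_median = (ordered[n // 2 - 1] + ordered[n // 2]) / 2

print(ordered, manual_median, median(ages))
```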
(a) The class values 7.1 and 7.3 are the lower and upper limits of
the class.
(b) The class boundaries are 0.05 below the lower class limit and
0.05 above the upper class limit (because the figures are given to
1 decimal place), i.e. 7.05 and 7.35.
(c) The class interval/ width is the difference between the upper and
lower class boundaries, i.e. 7.35 − 7.05 = 0.3.
(d) Question: What are the class boundaries if the figures
are between 7 and 8?
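The rule in (b) and (c) can be sketched as a small Python helper. The function name `class_boundaries` and its parameters are hypothetical; `decimal.Decimal` is used so that values such as 7.05 are represented exactly.

```python
from decimal import Decimal


def class_boundaries(lower_limit: str, upper_limit: str, decimal_places: int):
    """Return (lower boundary, upper boundary) for a class.

    The boundaries lie half a unit of the last recorded decimal place
    below the lower limit and above the upper limit.
    """
    half_unit = Decimal(1).scaleb(-decimal_places) / 2
    return Decimal(lower_limit) - half_unit, Decimal(upper_limit) + half_unit


# Class 7.1 to 7.3, figures recorded to 1 decimal place.
lower_b, upper_b = class_boundaries("7.1", "7.3", 1)
class_width = upper_b - lower_b
print(lower_b, upper_b, class_width)

# Whole-number marks, e.g. the class 60-64 (0 decimal places).
marks_lower_b, marks_upper_b = class_boundaries("60", "64", 0)
print(marks_lower_b, marks_upper_b)
```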
Median (Grouped Data)
Using last example of mean.
Marks    Boundaries    Frequency    Cumulative Frequency (F)
60-64    59.5-64.5     10           10
65-69    64.5-69.5     14           24
70-74    69.5-74.5     12           36
75-79    74.5-79.5     20           56
80-84    79.5-84.5     10           66
85-89    84.5-89.5     14           80
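The slides do not show the grouped-median calculation itself. One common approach, assumed here and not taken from the lecture, is linear interpolation within the median class: Median = L + [(N/2 − F) / f] × C, where L is the lower boundary of the median class, N the total frequency, F the cumulative frequency before the median class, f the frequency of the median class and C the class width. A minimal Python sketch under that assumption:

```python
# Class boundaries and frequencies from the table above.
boundaries = [(59.5, 64.5), (64.5, 69.5), (69.5, 74.5),
              (74.5, 79.5), (79.5, 84.5), (84.5, 89.5)]
frequencies = [10, 14, 12, 20, 10, 14]

total_frequency = sum(frequencies)
half = total_frequency / 2

# Walk through the classes until the cumulative frequency reaches N/2.
cumulative_before = 0
grouped_median = None
for (lower, upper), f in zip(boundaries, frequencies):
    if cumulative_before + f >= half:
        # Interpolate within the median class (assumed formula, see lead-in).
        grouped_median = lower + (half - cumulative_before) / f * (upper - lower)
        break
    cumulative_before += f

print(cumulative_before, grouped_median)
```

With these data the median class is 75-79, since the cumulative frequency first reaches 40 there.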
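• Mode = L + [∆1 / (∆1 + ∆2)] × C
L = lower boundary of the modal class. C = class width (modal class size).
∆1 = f1 − f0 (frequency of the modal class minus the frequency of the class
preceding the modal class).
∆2 = f1 − f2 (frequency of the modal class minus the frequency of the class
succeeding the modal class).
Using the last example, the modal class is 75-79 (frequency 20):
Mode = 74.5 + [(20 − 12) / ((20 − 12) + (20 − 10))] × 5
= 74.5 + (8 / 18) × 5
= 76.7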
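The grouped-mode formula can be checked in Python on the same marks data. This sketch assumes a single modal class that is neither the first nor the last class; the variable names are illustrative.

```python
# Class boundaries and frequencies from the marks example.
boundaries = [(59.5, 64.5), (64.5, 69.5), (69.5, 74.5),
              (74.5, 79.5), (79.5, 84.5), (84.5, 89.5)]
frequencies = [10, 14, 12, 20, 10, 14]

# The modal class is the class with the highest frequency.
modal_index = max(range(len(frequencies)), key=lambda i: frequencies[i])

# Assumption: the modal class has a neighbour on each side.
f0 = frequencies[modal_index - 1]  # class preceding the modal class
f1 = frequencies[modal_index]      # modal class
f2 = frequencies[modal_index + 1]  # class succeeding the modal class

lower, upper = boundaries[modal_index]
class_width = upper - lower

delta1 = f1 - f0
delta2 = f1 - f2
grouped_mode = lower + delta1 / (delta1 + delta2) * class_width

print(round(grouped_mode, 1))
```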
Advantages of the Mode
• Easy to compute
• Not affected by extreme values
• Main usefulness is for calling attention to
distribution in which the values cluster at
several places
• Only average available for nominal scale
Disadvantages of the Mode
• Not the best for biological or medical
statistics
• Not always unique: several values may share the
highest frequency (multimodal)
• Least valuable of the measures of central tendency
Mid-Range
• (Minimum + maximum) divided by 2
• Strongly affected by extreme values, since it is
computed from them alone
• Does not consider all information in the
distribution
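For the ten observations used in the earlier examples, the mid-range can be computed as follows. A minimal sketch; note that only the smallest and largest values enter the calculation.

```python
# The ten observations from the earlier mean and median examples.
ages = [23, 19, 21, 20, 23, 21, 22, 24, 22, 22]

# Mid-range = (minimum + maximum) / 2; all other values are ignored.
mid_range = (min(ages) + max(ages)) / 2

print(min(ages), max(ages), mid_range)
```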
Choice of measure of central
tendency
• Depends on the shape/nature of the
distribution: skewed to the left, skewed to the
right, or normal