Module 2 - 4
Data Presentation
Unit 1: Data Presentation
Content
1.0 Introduction
2.0 Learning Outcomes
3.1 Narration/Tabulation
4.0 Conclusion
5.0 Summary
Examples:
1. Guardian Newspapers has three titles in its stable: “The Guardian”, “African
Guardian” and “Guardian Express”. A study of the staff ratio in three departments
was carried out and the following information was gathered. There are 200 staff in
the three departments, of which 65 are in the African Guardian. Of the staff in The
Guardian, 30 are in the editorial department and 21 in the advertisement department.
In the Guardian Express, 22 staff are in the production department and 15 in the
advertisement department, out of a total of 61. The total number of staff in the three
departments who work in the advertisement department is 55, and in the
production department 65.
i. Tabulate the above information so as to give the fullest possible information.
ii. How many staff are in the advertisement department of the African Guardian?
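The missing cells of such a table can be found by subtracting the known figures from the row and column totals. The following is a minimal sketch in Python of that bookkeeping. The dictionary layout and variable names are illustrative assumptions, and it reads "the three departments" as editorial, advertisement and production.

```python
# Figures stated in the example (numbers of staff).
TOTAL_STAFF = 200
title_totals = {"African Guardian": 65, "Guardian Express": 61}
# The Guardian's total is whatever remains of the 200 staff.
title_totals["The Guardian"] = TOTAL_STAFF - sum(title_totals.values())

dept_totals = {"Advertisement": 55, "Production": 65}
dept_totals["Editorial"] = TOTAL_STAFF - sum(dept_totals.values())

# Known cells, stored as table[title][department].
table = {
    "The Guardian": {"Editorial": 30, "Advertisement": 21},
    "Guardian Express": {"Production": 22, "Advertisement": 15},
    "African Guardian": {},
}

# Step 1: complete each row that has exactly one missing cell.
for title in ("The Guardian", "Guardian Express"):
    row = table[title]
    missing = next(d for d in dept_totals if d not in row)
    row[missing] = title_totals[title] - sum(row.values())

# Step 2: fill the African Guardian row from the department totals.
for dept, total in dept_totals.items():
    others = sum(table[t][dept] for t in ("The Guardian", "Guardian Express"))
    table["African Guardian"][dept] = total - others

for title, row in table.items():
    print(title, row, "total:", sum(row.values()))
```

Under this reading, the African Guardian row works out to 19 advertisement staff, and the row adds up to the stated 65.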
Solution:
= 2.05%
iii) Multiple bar chart and component bar charts.
Self-Assessment Exercise
1. Tabulation thus enhances the condensation and easy comparison of data (Yes
or No)?
3.2 Pictorial Presentations (Diagrams, Charts and Graphs)
No matter how informative and well designed a statistical table is, as a medium for
conveying to the reader an immediate and clear impression of its content, it is inferior
to a good chart or graph. Many people are incapable of comprehending large masses
of information presented in tabular form; the figures merely confuse them.
Furthermore, many of such people are unwilling to make the effort to grasp the
meaning of such data. Graphs and charts come into their own as a means of conveying
information in easily comprehensible form. It is for such reasons that government and
multinationals always produce popular versions of important white papers in the form
of multi-coloured booklets full of simple diagrams and charts. Such diagrams and
charts are also often now seen on television both for viewers’ easy understanding and
for advertising.
Though such pictorial presentation reduces the amount of detail that can be put across
to the reader or viewer, often it is not the detail that matters, but rather the overall
picture. The most popular charts, diagrams and graphs are:
i. Pie-charts
ii. Bar diagrams (bar charts and histograms)
iii. Graphs (frequency polygons and Ogives)
Self-Assessment Exercise 2
1. An investigation of the marital status of the staff of an institution reveals the
following:
A component bar chart comprises bars which are subdivided into components.
Example:
Self-Assessment Exercise 3
Self-Assessment Exercise 4
The sex distribution of staff in five departments of a Television station is given below:
3.2.3 Histograms
Histograms and bar charts look alike in presentation, but while the bars of a bar chart
are usually not joined, those of a histogram are usually joined. Further, while a bar
chart attaches importance only to the heights of its bars, a histogram attaches
importance to both the heights and the widths.
Self-Assessment Exercise 5.
[Figure: bar chart with bars for Widowed, Divorced, Single and Married; vertical axis marked from 40 to 130.]
Self-Assessment Exercise(s) 6
1. Of 100 patients in an orthopaedic hospital who were asked for their room
preference, 50 wanted private rooms, 40 wanted semi-private rooms and 10 would
make do with any room. Present these data by means of a bar chart.
2. Of 400 nursing students at a teaching hospital, 152 planned to go into the
psychiatric speciality, 120 into paediatric, 80 into public health and 48 into
orthopaedic nursing. Represent the information on a pie chart.
3. In a study of the age distribution of patients in an orthopaedic hospital, the
following ages were recorded:
51 35 45 52 53 32 31 44 47 35 52 36 44 45 44 32
48 44 44 33 53 44 44 47 44 44 44 55 44 34 54 44
45 48 32 44 47 58 50 37 44 47 50 46 38 57 49 50
51 38
Draw the frequency polygon for the data above.
4.0 Conclusion
In this unit, you have studied several graphical representations of data. These
representations are used in interpreting features of data. They are descriptive in
nature.
5.0 Summary
1. The raw data resulting from a survey or census are usually unorganised
2. A collection of data must be organised and summarised so as to reveal the
significant features.
3. A collection of data may be described by frequency tables, pie chart, bar chart,
histograms and frequency polygons.
51 35 45 52 53 32 31 44 47 35 52 36 44 45 44 32 48
44 44 33 53 44 44 47 44 44 44 55 44 34 54 44 45 48
32 44 47 58 50 37 44 47 50 46 38 57 49 50 32 51 38
Indira Gandhi National Open University, (1999) Probability and Statistics, Sita Fine
Arts Pvt. Ltd., New Delhi-28
Module 3
Measures of Central
Tendency
Unit 1: Measures of Location
4.0 Conclusion
5.0 Summary
Sometimes the figures in a data set are so widely spread that, unless they are grouped,
a neat and sensible frequency table cannot be achieved. A table prepared in this way
is called a grouped frequency distribution table. The figures are grouped into distinct,
non-overlapping classes so that no value can be placed in two or more classes. The
classes should also leave no gaps, so that every value falls into exactly one class.
Self-Assessment Exercise 1
The weights in Kg of a collection of 40 workers in an organization are given below:
59, 53, 66, 55, 57, 65, 48, 59, 51, 58, 52, 68, 60, 70, 71, 56, 70, 64, 54, 67, 62, 53,
49, 56, 63, 48, 57, 61, 58, 55, 56, 55, 61, 52, 54, 65, 56, 50, 62, 60.
Using the tally method, prepare a grouped frequency distribution table using groups
48 – 52, 53 – 57, …
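The tally method amounts to assigning each value to its class and counting. A minimal sketch in Python, using the 40 weights listed above and the class width of 5 implied by the groups 48 – 52, 53 – 57, …; the variable names are illustrative assumptions.

```python
# Weights (kg) of the 40 workers, as listed in the exercise.
weights = [59, 53, 66, 55, 57, 65, 48, 59, 51, 58, 52, 68, 60, 70, 71,
           56, 70, 64, 54, 67, 62, 53, 49, 56, 63, 48, 57, 61, 58, 55,
           56, 55, 61, 52, 54, 65, 56, 50, 62, 60]

LOWEST = 48       # lower limit of the first class
CLASS_WIDTH = 5   # 48-52, 53-57, ... each cover five whole values

counts = {}
for w in weights:
    # Lower limit of the class that contains w.
    lower = LOWEST + ((w - LOWEST) // CLASS_WIDTH) * CLASS_WIDTH
    label = f"{lower} - {lower + CLASS_WIDTH - 1}"
    counts[label] = counts.get(label, 0) + 1

# Print class, tally marks and frequency, in ascending class order.
for label in sorted(counts, key=lambda s: int(s.split()[0])):
    print(f"{label:9} {'|' * counts[label]:15} {counts[label]}")
```

Because each value is mapped to exactly one class, the frequencies must add up to the number of observations, which is a useful check on any hand tally.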
3.2 Mean (Averages)
The word ‘average’ is used here to denote the not-too-brilliant and not-too-dull
student. But in statistics the word has a special meaning. In the above context it
would be used, statistically, to describe the student who is representative, in some
way, of all the students that sat for the examination. Therefore, if we have a group of
figures, the average is that single figure that can represent all the other figures
in that distribution. Three types of averages are often used in statistics:
i. The mean
ii. The median, and
iii. The mode
The mean of n numbers x1, x2, …, xn is
$$\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{\sum x}{n}$$
For example, the mean of the numbers 3, 5, 7, 10 and 15 is
$$\bar{x} = \frac{\sum x}{n} = \frac{3 + 5 + 7 + 10 + 15}{5} = \frac{40}{5} = 8$$
Note: When each of the numbers x1, x2, …, xk has an attached frequency f1, f2, …, fk, then
the mean becomes
$$\bar{x} = \frac{\sum fx}{\sum f}$$
where Σf = n, the total number of observations.
Self-Assessment Exercise(s) 2
1. The figures are usually grouped into distinct classes to avoid confusion of
possible placement of data into two or more classes (Yes or No)?
2. A large and ungrouped data are cumbersome to study and interpret (Yes or
No)?
3.2.2 Mean of a Grouped Data
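Long Method
$$\bar{x} = \frac{\sum fx}{\sum f}$$
where x is the class mark (mid-point) of each class and f is its frequency.
Assumed Mean Method
$$\bar{x} = A + \frac{\sum fd}{\sum f}$$
Where,
A is a guessed or assumed mean and d = X – A are the deviations from the assumed
mean.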
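Coding Method
$$\bar{x} = A + \left(\frac{\sum fu}{\sum f}\right) \times C$$
where u = (X – A)/C and C is the common class width.
NOTE: The coding method is the shortest of the three and should always be used for
grouped data when the class intervals are equal.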
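The three methods always give the same mean; they differ only in how much arithmetic they need. The sketch below, in Python, checks this on the marks distribution of 50 students used in the standard deviation example of this course (classes 1 – 10 to 51 – 60). The assumed mean A = 25.5 is an arbitrary illustrative choice, and the coding variable is taken as u = (x − A)/C.

```python
# (lower limit, upper limit, frequency) for each class of marks.
classes = [(1, 10, 6), (11, 20, 6), (21, 30, 12),
           (31, 40, 11), (41, 50, 10), (51, 60, 5)]

mids = [(lo + hi) / 2 for lo, hi, _ in classes]  # class marks x
freqs = [f for _, _, f in classes]               # frequencies f
n = sum(freqs)                                   # sum of f

# Long method: mean = sum(fx) / sum(f)
mean_long = sum(f * x for f, x in zip(freqs, mids)) / n

# Assumed mean method: mean = A + sum(fd) / sum(f), with d = x - A
A = 25.5  # assumed mean (illustrative choice; any number works)
mean_assumed = A + sum(f * (x - A) for f, x in zip(freqs, mids)) / n

# Coding method: mean = A + C * sum(fu) / sum(f), with u = (x - A) / C
C = 10  # common class width
mean_coded = A + C * sum(f * (x - A) / C for f, x in zip(freqs, mids)) / n

print(mean_long, mean_assumed, mean_coded)
```

All three results agree with the value 31.1 obtained in that example, up to floating-point rounding.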
Self-Assessment Exercise 3
1. Referring to the table in the above SAE 1, calculate the mean using:
a. The long method
b. Assumed mean of 61
c. The coding method.
Advantages of Mean
It takes account of all the values of a distribution. It is therefore, more representative
than the other two and for this reason alone, it is used more than the other two
averages.
Disadvantages of Mean
Self-Assessment Exercise(s) 4
Overtime (hrs) 60 – 69 50 – 59 40 – 49 30 – 39 20 – 29 10 – 19
No. of workers 5 11 17 14 9 4
If a set of data is arranged in order of magnitude, the middle value, which divides the
set into two equal groups, is the median. Generally, for N data,
$$\text{Median} = \left(\frac{N+1}{2}\right)^{\text{th}} \text{ item}$$
Example:
Find the median of the following sets of data
(a) 3, 6, 2, 4, 3
(b) 2, 5, 3, 4, 8, 3
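Solution:
(a) Arrangement in order: 2, 3, 3, 4, 6
Here N = 5 and
$$\text{Median} = \left(\frac{5+1}{2}\right)^{\text{th}} \text{ item} = 3^{\text{rd}} \text{ item}$$
The 3rd item is 3, so the median is 3.
(b) Arrangement in order: 2, 3, 3, 4, 5, 8
Here, N = 6. Thus,
$$\text{Median} = \left(\frac{6+1}{2}\right)^{\text{th}} \text{ item} = 3.5^{\text{th}} \text{ item}$$
The 3.5th item lies midway between the 3rd item (3) and the 4th item (4).
Thus, median = (3 + 4)/2 = 3.5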
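The rule for ungrouped data can be sketched in Python as follows. The function name is an illustrative assumption; when N is even, the function averages the two middle items.

```python
def median(values):
    """Middle value of the data once arranged in order of magnitude."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd N: the ((N + 1) / 2)th item is a single middle value.
        return ordered[mid]
    # Even N: the position falls between two items, so average them.
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([3, 6, 2, 4, 3]))     # prints 3
print(median([2, 5, 3, 4, 8, 3]))  # prints 3.5
```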
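The median of grouped data can be obtained graphically from the cumulative frequency
curve (ogive) or by calculation, using the formula:
$$\text{Median} = L + \left(\frac{N/2 - F}{f}\right) C$$
Where:
L = lower class boundary of the median class
N = total frequency
F = cumulative frequency of the classes before the median class
f = frequency of the median class
C = size of the median class interval.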
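A minimal Python sketch of the calculation. It takes L as the lower boundary of the median class, F as the cumulative frequency before that class, f as its frequency and C as the class width. The function name is an illustrative assumption, and the data are the marks distribution of 50 students from the standard deviation example of this course.

```python
def grouped_median(lower_boundaries, freqs, width):
    """Median = L + ((N/2 - F) / f) * C for a grouped distribution."""
    n = sum(freqs)
    if n == 0:
        raise ValueError("empty distribution")
    cumulative = 0  # F: cumulative frequency before the current class
    for lower, f in zip(lower_boundaries, freqs):
        if cumulative + f >= n / 2:
            # This is the median class: the (N/2)th item falls inside it.
            return lower + (n / 2 - cumulative) / f * width
        cumulative += f

# Classes 1-10, 11-20, ..., 51-60 have lower boundaries 0.5, 10.5, ...
boundaries = [0.5, 10.5, 20.5, 30.5, 40.5, 50.5]
freqs = [6, 6, 12, 11, 10, 5]
m = grouped_median(boundaries, freqs, 10)
print(round(m, 2))
```

Here N/2 = 25 falls in the class 31 – 40 (cumulative frequency 24 before it), so the median is 30.5 + (1/11) × 10.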
Advantages of Median
1. It is easily understood.
2. It is relatively easy to calculate.
Disadvantages of Median
1. It takes no account of extreme values in the distribution. For instance, the median
of 2, 40, 43, 45, and 96 is 43 even though there are two extreme values, 2 and 96.
2. It does not use all the data available.
Self-Assessment Exercise(s) 5
Overtime (hrs) 60 – 69 50 – 59 40 – 49 30 – 39 20 – 29 10 – 19
No. of workers 5 11 17 14 9 4
a. Construct the cumulative frequency curve and from it estimate the median.
b. Calculate the median and compare your results.
This is the value or number that has the highest frequency in a distribution. The mode
may not exist and even when it does exist, it may not be unique.
Example:
The mode can be obtained both graphically and by calculations. For a grouped data,
we use the histogram to estimate the mode while by calculation we use the formula:
$$\text{Mode} = L + \left(\frac{f_m - f_a}{2f_m - f_a - f_b}\right) C$$
Where:
L = lower class boundary of the modal class
fm = frequency of the modal class
fa = frequency of the class above the modal class
fb = frequency of the class below the modal class
C = size of the modal class interval.
Note: The modal class is the class that has the highest frequency. The mode itself is
a number within this class.
a. Construct the histogram and from it, estimate the mode of the distribution.
b. Calculate the mode and compare your answer with the estimated value in (a)
above.
Solution:
a. The construction should be done on a graph sheet, with frequencies on the
vertical axis and class boundaries on the horizontal axis.
[Figure: histogram of the distribution; frequency axis marked from 10 to 13.]
b. $$\text{Mode} = L + \left(\frac{f_m - f_a}{2f_m - f_a - f_b}\right) C$$
$$\text{Mode} = 52.5 + \left(\frac{12 - 8}{2(12) - 8 - 10}\right) \times 5$$
= 52.5 + (4/6) × 5
= 52.5 + 3.33
= 55.83
Comparison: Graphical value = 56
Calculated value = 55.83
The two values agree closely.
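The calculation in (b) can be checked with a short Python sketch. The function name is an illustrative assumption; the arguments are the symbols of the mode formula and the values are those used in the solution.

```python
def grouped_mode(L, fm, fa, fb, C):
    """Mode = L + (fm - fa) / (2*fm - fa - fb) * C."""
    return L + (fm - fa) / (2 * fm - fa - fb) * C

# Values used in the worked solution: L = 52.5, fm = 12, fa = 8, fb = 10, C = 5.
mode = grouped_mode(L=52.5, fm=12, fa=8, fb=10, C=5)
print(round(mode, 2))  # prints 55.83
```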
Advantages of Mode
Disadvantages of Mode
1. It presents a misleading picture for a distribution that does not have a regular
shape.
2. It does not use all the data available.
The figure below shows the relative positions of the mean, median and mode for
frequency curves which are skewed to the right and left respectively. For symmetrical
curves the mean, mode and median all coincide.
Self-Assessment Exercise 6
1. The distribution of the number of overtime hours per month worked by 60 staff
of NITEL are given below:
Overtime (hrs) 60 – 69 50 – 59 40 – 49 30 – 39 20 – 29 10 – 19
No. of workers 5 11 17 14 9 4
a. Calculate the mode of the distribution.
4.0 Conclusion
In this unit, you have been introduced to the measures of central tendency, which are
benchmarks, typical scores or measures that give a precise and brief description of a
set of data. They are a very important aspect of statistics that you cannot afford to
neglect. To make your data precise for interpretation, you will need to learn these
measures of location very well.
5.0 Summary
In this unit you have learnt that the measures of central tendency are a set of
benchmarks which give a precise and brief description of a set of scores. The
three basic measures of central tendency are the mean, the median and the mode.
The mean is the most widely used. It is equal to the sum of the scores divided by the
number of the scores. The symbol is $\bar{x}$ and the formula is $\sum X / N$ or $\sum fX / \sum f$ or,
using an assumed mean A and class interval C, $\bar{x} = A + C\left(\sum fu / \sum f\right)$.
4.0 Conclusion
5.0 Summary
Example: The deviations of the numbers 8, 3, 5, 12, 10 from their arithmetic mean
7.6 are 8 − 7.6, 3 − 7.6, 5 − 7.6, 12 − 7.6, 10 − 7.6
= 0.4, −4.6, −2.6, 4.4, 2.4, with algebraic sum
0.4 − 4.6 − 2.6 + 4.4 + 2.4 = 0.
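This property can be verified directly. The Python sketch below uses exact fractions so that the sum comes out as exactly zero rather than a tiny rounding error; the variable names are illustrative assumptions.

```python
from fractions import Fraction

values = [8, 3, 5, 12, 10]
mean = Fraction(sum(values), len(values))  # 38/5, i.e. 7.6
deviations = [x - mean for x in values]    # 2/5, -23/5, -13/5, 22/5, 12/5
total = sum(deviations)

print(float(mean), [float(d) for d in deviations], total)
```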
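2. The sum of the squares of the deviations of a set of numbers, X, from any
number a is a minimum if and only if a = $\bar{X}$.
Solution:
$$\frac{\sum (X - a)^2}{N} = a^2 - 2a\,\frac{\sum X}{N} + \frac{\sum X^2}{N}$$
Comparing the last expression with $w^2 + pw + q$, we have:
$w = a$, $p = -2\sum X/N$, $q = \sum X^2/N$
A quadratic $w^2 + pw + q$ is a minimum when $w = -\tfrac{1}{2}p$, so the expression is a
minimum when $a = -\tfrac{1}{2}p = \sum X/N = \bar{X}$.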
(c) If f1 numbers have mean m1, f2 numbers have mean m2, . . ., fK numbers have mean
mK, then the mean of all the numbers is
$$\bar{X} = \frac{f_1 m_1 + f_2 m_2 + \dots + f_K m_K}{f_1 + f_2 + \dots + f_K}$$
(d) If A is any guessed or assumed arithmetic mean (which may be any number) and
if d = X − A are the deviations of X from A, then
$$\bar{X} = A + \frac{\sum d_i}{N}$$
or simply $\bar{X} = A + \sum d / N$.
If the data is grouped, then
$$\bar{X} = A + \frac{\sum f_i d_i}{\sum f_i}$$
or simply $\bar{X} = A + \sum fd / N$, where $N = \sum f_i$.
Self-Assessment Exercise(s) 1
1. The algebraic sum of the deviations of a set of numbers from their arithmetic
mean is zero (Yes or No)?
2. If a final examination in a course is weighted three times as much as a quiz and
a student has a final examination grade of 85 and quiz grades of 70 and 90,
calculate the mean grade.
The harmonic mean, H, of a set of N numbers X1, X2, X3, …, XN is the reciprocal of
the arithmetic mean of the reciprocals of the numbers:
$$H = \frac{N}{\sum \frac{1}{X}}$$
Example: The harmonic mean of the numbers 2, 4, 8 is
$$H = \frac{3}{\frac{1}{2} + \frac{1}{4} + \frac{1}{8}} = \frac{3}{7/8} = 3.43$$
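A minimal Python sketch of the same calculation. The function name is an illustrative assumption, and the inputs are assumed to be non-zero.

```python
def harmonic_mean(values):
    """N divided by the sum of the reciprocals of the values."""
    return len(values) / sum(1 / x for x in values)

h = harmonic_mean([2, 4, 8])
print(round(h, 2))  # prints 3.43
```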
Self-Assessment Exercise(s) 2
1. The harmonic mean, H of a set of N numbers: X1, X2, X3, .. . , XN , is the reciprocal
of the arithmetic mean of the reciprocals of the numbers (Yes or No)?
2. Calculate the geometric mean of the numbers: 3, 5, 6, 7.
3. Calculate harmonic mean of the numbers: 5, 6, 7.
Solution:
i. Mean ratio of milk to bread prices = ½(2.50 + 2.00) = 2.25
ii. Since the ratio of milk to bread prices for the first year is 2.50, the ratio of bread to
milk prices is 1/2.50 = 0.40. Similarly, the ratio of bread to milk prices for the second
year is 1/2.00 = 0.50.
Then
Mean ratio of bread to milk prices = ½(0.40 + 0.50) = 0.45
iii. We would expect the mean ratio of milk to bread prices to be the reciprocal of the
mean ratio of bread to milk prices if the mean is an appropriate average.
However, 1/0.45 = 2.22 ≠ 2.25.
This shows that the arithmetic mean is a poor average to use for ratios.
iv. Geometric mean of ratios of milk to bread prices = √((2.50)(2.00)) = √5
Geometric mean of ratios of bread to milk prices = √((0.40)(0.50)) = √0.2 = 1/√5
Since these averages are reciprocals, our conclusion is that the geometric mean is more
suitable than the arithmetic mean for averaging ratios for this type of problem.
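The reciprocal check in (iii) and (iv) can be reproduced with a short Python sketch. The helper names are illustrative assumptions; `math.prod` requires Python 3.8 or later.

```python
import math

milk_to_bread = [2.50, 2.00]                   # ratios for the two years
bread_to_milk = [1 / r for r in milk_to_bread]  # 0.40 and 0.50

def arithmetic_mean(values):
    return sum(values) / len(values)

def geometric_mean(values):
    # nth root of the product of n positive values
    return math.prod(values) ** (1 / len(values))

am_mb, am_bm = arithmetic_mean(milk_to_bread), arithmetic_mean(bread_to_milk)
gm_mb, gm_bm = geometric_mean(milk_to_bread), geometric_mean(bread_to_milk)

# If an average suits ratios, the two means should be reciprocals,
# so their product should be 1.
print("arithmetic:", am_mb * am_bm)
print("geometric: ", gm_mb * gm_bm)
```

The arithmetic means multiply to 2.25 × 0.45 = 1.0125, whereas the geometric means multiply to 1 (up to floating-point rounding).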
Self-Assessment Exercise 3
4.0 Conclusion
We have been exposed to the concept of the weighted mean, be it the arithmetic,
geometric or harmonic mean. To calculate an appropriate mean, a weight has to
be attached to each individual value accordingly.
5.0 Summary
In this unit, you have learnt that the concept of the weighted mean is particularly
useful in the construction of price index numbers and in similar situations, where
the relative importance (or weight) of the different observations must be taken into
account in evaluating the mean.
4.0 Conclusion
5.0 Summary
The mean and the median wages for each of the two distributions are N35,000. From
the results, one could wrongly conclude that the workers' conditions of service in both
stations are the same. A close look at the figures clearly shows that the wages
of workers in station A are more fairly and evenly distributed than those in B. One
therefore needs a study of dispersion to detect the disparity in a distribution. Various
measures of dispersion are available; the measures which we shall discuss in this unit
are the range, the quartiles, the deciles and the percentiles.
3.2 Quartiles
In the last unit, you learnt that the median is a positional score which occupies the
middle point on the score scale. In the same way, the quartiles are positional scores.
The first quartile Q1 is the score point that marks off the lowest quarter, or 25%, of the
group. The middle quartile Q2 is the median, and the third quartile Q3 is the score
point below which 75% of the group lies. So, quartiles are points that divide a
distribution of scores into four equal parts. These points can be located in a distribution.
Scores   15  18  21  23  25  27  28  29  32
Freq.     1   1   2   3   6   5   3   4   3
Solution:
i. Arrange the scores in order and write down the frequency (F) and cumulative
frequency (CF) of each:
S/No   Scores   F   CF
1      32       3   28
2      29       4   25
3      28       3   21
4      27       5   18
5      25       6   13
6      23       3   7
7      21       2   4
8      18       1   2
9      15       1   1
ii. Find 25% or ¼ of the number of scores = 25/100 × 28 = 7
iii. Count up from the lowest score along the cumulative frequency column until you
reach 25% of the cases (the 7th score). Q1 lies between 23 and 25, i.e. Q1 = (23 + 25)/2 = 24
iv. Find 75% or ¾ of the number of scores = 75/100 × 28 = 21
v. Count up from the lowest score along the cumulative frequency column until you
reach 75% of the cases (the 21st score). This gives Q3. It lies between 28 and 29,
i.e. Q3 = (28 + 29)/2 = 28.5
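The counting steps above can be sketched in Python. The function name is an illustrative assumption. As in the worked solution, the sketch handles only the case where the required percentage of N is a whole number k, and takes the point midway between the kth and (k + 1)th scores.

```python
scores = [15, 18, 21, 23, 25, 27, 28, 29, 32]
freqs = [1, 1, 2, 3, 6, 5, 3, 4, 3]

# Write out every score in ascending order, repeated by its frequency.
ordered = [s for s, f in zip(scores, freqs) for _ in range(f)]

def score_point(ordered, percent):
    """Point below which `percent` % of the cases lie (whole-number case)."""
    k, remainder = divmod(len(ordered) * percent, 100)
    if remainder != 0:
        raise ValueError("this sketch needs percent of N to be a whole number")
    # Midway between the kth and (k + 1)th scores (1-based positions).
    return (ordered[k - 1] + ordered[k]) / 2

q1 = score_point(ordered, 25)
q3 = score_point(ordered, 75)
print(q1, q3)  # prints 24.0 28.5
```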
Self-Assessment Exercise 2
No. of Employees 8 10 16 14 10 5 2
In this sub-unit, we move a step further and divide a distribution into ten equal parts
to locate the deciles. Decile points are used to mark off a distribution into tenths.
Thus, there are 9 deciles, D1 to D9, which divide a distribution into ten equal parts.
D1 is the first decile, and below D1 lies the bottom 10% of the group. In the same
way, D2 is the point in the distribution below which 20% of the cases fall. Like
quartiles, deciles are points in a distribution, not segments.
Scores 16 18 20 22 24 26 28 30 32
Freq. 1 1 2 3 6 5 3 4 3
4.0 Conclusion
In this unit, we have gone through some of the measures of variability or dispersion.
These are the measures used to establish the homogeneity or heterogeneity of a set
of data in a distribution.
5.0 Summary
In this unit you have been exposed to some measures of variability which are
measures that show the spread of the scores in a given distribution. The measures
you have seen so far are:
i. The range which simply shows the difference between the highest and the lowest
observations or numbers.
ii. The quartiles are the points which divide the distributions or scores into four
equal parts called quarters.
iii. The deciles are also points on the distribution that divide the distribution into ten
equal parts or tenths.
iv. The percentiles are points on the score scale that divide the distribution into 100
equal parts called centiles or percentages.
4.0 Conclusion
5.0 Summary
Solution
x       x − x̄ = x − 7    |x − x̄| = |x − 7|
2       -5               5
4       -3               3
7       0                0
10      3                3
12      5                5
Total                    16
Hence:
Mean Deviation = 16/5 = 3.2
Note: For a grouped data, the class marks are taken as our x values.
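The mean deviation for ungrouped data can be sketched in Python as follows, using the five values from the table in the solution above. The function name is an illustrative assumption.

```python
def mean_deviation(values):
    """Average of the absolute deviations of the values from their mean."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)

md = mean_deviation([2, 4, 7, 10, 12])
print(md)  # prints 3.2
```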
Self-Assessment Exercise 1
1. Find the mean deviation of the set of values: 12, 6, 7, 3, 15, 10, 18, 5
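Calculations
1. Long Method:
$$S = \sqrt{\frac{\sum fx^2}{\sum f} - \left(\frac{\sum fx}{\sum f}\right)^2}$$
2. Assumed Mean Method:
$$S = \sqrt{\frac{\sum fd^2}{\sum f} - \left(\frac{\sum fd}{\sum f}\right)^2}$$
Where d = x – A
3. Coding Method:
$$S = C \times \sqrt{\frac{\sum fu^2}{\sum f} - \left(\frac{\sum fu}{\sum f}\right)^2}$$
Where u = (x – A)/C and C is the common class width.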
Note: In problems involving calculation of both mean and standard deviation, for
simplicity, the method used in computing the mean should be applied to find the
standard deviation.
The square of the standard deviation is called the variance. So if S denotes the
standard deviation, then S² is the variance.
Examples:
1. The marks scored by some 50 students in a statistics test are given below:
Marks      51 – 60  41 – 50  31 – 40  21 – 30  11 – 20  1 – 10
Frequency  5        10       11       12       6        6
a. Calculate the mean and the standard deviation using the long method.
Solution:
Marks     f    x      fx       x²        fx²        x − 31.1   |x − 31.1|   f|x − 31.1|   F
1 – 10    6    5.5    33.0     30.25     181.50     -25.6      25.6         153.6         6
11 – 20   6    15.5   93.0     240.25    1441.50    -15.6      15.6         93.6          12
21 – 30   12   25.5   306.0    650.25    7803.00    -5.6       5.6          67.2          24
31 – 40   11   35.5   390.5    1260.25   13862.75   4.4        4.4          48.4          35
41 – 50   10   45.5   455.0    2070.25   20702.50   14.4       14.4         144.0         45
51 – 60   5    55.5   277.5    3080.25   15401.25   24.4       24.4         122.0         50
Total     50          1555.0             59392.50                           628.8
(F is the cumulative frequency.)
$$\bar{x} = \frac{\sum fx}{\sum f} = \frac{1555}{50} = 31.1$$
a. $$S = \sqrt{\frac{\sum fx^2}{\sum f} - \left(\frac{\sum fx}{\sum f}\right)^2}$$
$$S = \sqrt{\frac{59392.5}{50} - \left(\frac{1555}{50}\right)^2} = \sqrt{1187.85 - 967.21}$$
= 14.85
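The result can be cross-checked with a short Python sketch that applies both the long method and the coding method to the same distribution. The assumed mean A = 25.5 is an arbitrary illustrative choice, and u = (x − A)/C is taken as the coding variable.

```python
import math

mids = [5.5, 15.5, 25.5, 35.5, 45.5, 55.5]  # class marks x
freqs = [6, 6, 12, 11, 10, 5]               # frequencies f
n = sum(freqs)

# Long method: S = sqrt(sum(fx^2)/sum(f) - (sum(fx)/sum(f))^2)
sum_fx = sum(f * x for f, x in zip(freqs, mids))
sum_fx2 = sum(f * x * x for f, x in zip(freqs, mids))
s_long = math.sqrt(sum_fx2 / n - (sum_fx / n) ** 2)

# Coding method: S = C * sqrt(sum(fu^2)/sum(f) - (sum(fu)/sum(f))^2)
A, C = 25.5, 10  # assumed mean (illustrative) and class width
u = [(x - A) / C for x in mids]
sum_fu = sum(f * ui for f, ui in zip(freqs, u))
sum_fu2 = sum(f * ui * ui for f, ui in zip(freqs, u))
s_coded = C * math.sqrt(sum_fu2 / n - (sum_fu / n) ** 2)

print(sum_fx, sum_fx2, round(s_long, 2), round(s_coded, 2))
```

Both methods give S ≈ 14.85, and squaring this value gives the variance.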
Self-Assessment Exercise 2
4.0 Conclusion
In this unit you have learnt that, useful as the measures of central tendency are for
providing a concise index of the average value of a set of scores, there is more to be
studied about a set of scores. To describe a distribution of scores adequately, we need
both the measures of central tendency and the measures of variability. The two make
up two types of descriptive statistics which are indispensable in describing the
distribution of a given set of data.
5.0 Summary
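You have studied three measures in this unit: the mean deviation, the standard
deviation and the variance. The standard deviation can be calculated by:
i. Long Method:
$$S = \sqrt{\frac{\sum fx^2}{\sum f} - \left(\frac{\sum fx}{\sum f}\right)^2}$$
ii. Assumed Mean Method:
$$S = \sqrt{\frac{\sum fd^2}{\sum f} - \left(\frac{\sum fd}{\sum f}\right)^2}$$
Where d = x – A
iii. Coding Method:
$$S = C \times \sqrt{\frac{\sum fu^2}{\sum f} - \left(\frac{\sum fu}{\sum f}\right)^2}$$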
6.0 Tutor-Marked Assignment
1. Find the mean deviation of the set of values: 2, 3, 5, 6, 8.
2. The distribution of the ages of 108 staff of a telecommunication outfit is given
below:
No of Staff 8 12 20 24 16 16 12
Using assumed mean 33, calculate the mean and the standard deviation of the
distribution.