Statistics Chapter-II
PROBABILITY
CHAPTER-II
METHODS OF DATA COLLECTION
AND PRESENTATION
Objectives:
Introduction
Methods of data collection
Definitions:
There are three basic types of
frequency distributions:
a) Categorical frequency distribution
b) Ungrouped frequency distribution
c) Grouped frequency distribution
a) Categorical frequency distribution:
Used for data that can be placed in specific
categories, such as nominal or ordinal data,
e.g. marital status.
Example 1
A social worker collected the following data on
marital status for 25 persons (M = married,
S = single, W = widowed, D = divorced).
Solution:
Since the data are categorical, discrete classes
can be used. There are four types of
marital status: M, S, D, and W. These types will
be used as the classes for the distribution. We
follow the procedure below to construct the frequency
distribution.
Step 1: Make a table as shown.
Step 2: Tally the data and place the result in column (2).
Step 3: Count the tallies and place the result in column (3).
Step 4: Find the percentage of values in each class using
percentage = (f / n) × 100%,
where f = frequency of the class and n = total number of values.
Percentages are not normally a part of a
frequency distribution, but they can be added
since they are used in certain types of
diagrams, such as pie charts.
Combining all the steps, one can construct the
following frequency distribution.
Solution:
Step 1: Find the range, Range=Max-Min=90-60=30.
Step 2: Make a table as shown
Step 3: Tally the data
Step 4: Compute the frequency.
Mark    Tally   Frequency
60      //      2
62      /       1
63      /       1
65      /       1
70      ////    4
74      /       1
75      //      2
76      /       1
80      ///     3
85      ///     3
90      /       1
Total           20
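The tally steps above can be sketched in Python. This is a minimal illustration, not the method used on the slides: the raw list of 20 marks is not shown in the source, so `marks` below is rebuilt from the frequency table itself, and the variable names are assumptions.

```python
from collections import Counter

# Marks rebuilt from the frequency table above; the original raw list
# is not given in the source, so the order here is arbitrary.
marks = [60, 60, 62, 63, 65, 70, 70, 70, 70, 74,
         75, 75, 76, 80, 80, 80, 85, 85, 85, 90]

# Step 1: Range = Max - Min
data_range = max(marks) - min(marks)

# Steps 3-4: tally each distinct value and count the tallies
freq = Counter(marks)

print("Range:", data_range)
print(f"{'Mark':<8}{'Tally':<8}Frequency")
for mark in sorted(freq):
    print(f"{mark:<8}{'/' * freq[mark]:<8}{freq[mark]}")
print(f"{'Total':<16}{sum(freq.values())}")
```

`Counter` does the tallying and counting in one pass; the tally column is printed only to mirror the table layout.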
c) Grouped frequency Distribution:
• When the range of the data is large, the data must be grouped into classes
that are more than one unit in width.
Definitions:
Grouped frequency distribution: a frequency distribution
in which several numbers are grouped into one class.
Class limits: the values that separate one class in a grouped frequency
distribution from another. The limits can actually appear
in the data, and there are gaps between the
upper limit of one class and the lower limit of the next.
Unit of measurement (U): the distance between two
possible consecutive measures. It is usually taken as 1, 0.1,
0.01, 0.001, and so on.
• Class boundaries: the values that separate one class in a grouped
frequency distribution from another. The boundaries
have one more decimal place than the raw data and
therefore do not appear in the data. There is no gap
between the upper boundary of one class and the lower
boundary of the next class. The lower class boundary is
found by subtracting U/2 from the corresponding lower
class limit, and the upper class boundary is found by
adding U/2 to the corresponding upper class limit.
Class width: the difference between the upper and lower class boundaries of
any class. It is also the difference between the lower limits of any two
consecutive classes or the difference between any two consecutive class
marks.
Class mark (midpoint): the average of the lower and upper class
limits, or the average of the upper and lower class boundaries.
Cumulative frequency: the number of observations less than/more than
or equal to a specific value.
More than cumulative frequency (MCF): the total frequency of all
values greater than or equal to the lower class boundary of a given class.
Less than cumulative frequency (LCF): the total frequency of all
values less than or equal to the upper class boundary of a given class.
• Relative frequency (rf): it is the frequency divided
by the total frequency.
• Relative cumulative frequency (rcf): it is the
cumulative frequency divided by the total frequency
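As a rough sketch of these definitions, the cumulative and relative frequencies can be computed from a list of class frequencies. The frequencies below are the six class counts of the travel-time example that appears later in this chapter; the variable names are illustrative assumptions.

```python
from itertools import accumulate

# Class frequencies, ordered from the lowest class to the highest
# (counts of the travel-time example later in this chapter).
freqs = [3, 6, 8, 4, 3, 1]
n = sum(freqs)  # total frequency

# Less-than cumulative frequency: running total from the first class
lcf = list(accumulate(freqs))

# More-than cumulative frequency: running total from the last class
mcf = list(accumulate(reversed(freqs)))[::-1]

# Relative frequency and relative cumulative frequencies
rf = [f / n for f in freqs]
lrcf = [c / n for c in lcf]
mrcf = [c / n for c in mcf]

for row in zip(freqs, lcf, mcf, rf, lrcf, mrcf):
    print("{:>3} {:>3} {:>3} {:>6.2f} {:>6.2f} {:>6.2f}".format(*row))
```

The last LCF value and the first MCF value both equal the total frequency, which is a quick check on the table.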
Guidelines for classes
1. There should be between 5 and 20 classes.
2. The classes must be mutually exclusive. This
means that no data value can fall into two different
classes.
3. The classes must be all inclusive or
exhaustive. This means that all data values
must be included.
4. The classes must be continuous. There are no
gaps in a frequency distribution.
5. The classes must be equal in width. The
exception here is the first or last class.
Steps for constructing Grouped frequency
Distribution
4. Find the class width (w) by dividing the
range by the number of classes and rounding
up, not off.
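A minimal sketch of the class-width calculation, under stated assumptions: the number of classes is taken from Sturges' rule, k = 1 + 3.322 log10(n), which the exercises in this chapter name, and k is rounded to the nearest integer (that rounding choice is an assumption, not something the slides state). The data are the 25 travel times used later in this chapter.

```python
import math

# Travel times (minutes) for 25 workers, from the example later in this chapter
times = [28, 25, 48, 37, 41, 19, 32, 26, 16, 23, 23, 29, 36, 31,
         26, 21, 32, 25, 31, 43, 35, 42, 38, 33, 28]

n = len(times)

# Number of classes by Sturges' rule; rounding to the nearest
# integer is an assumption made for this sketch.
k = round(1 + 3.322 * math.log10(n))

# Range = Max - Min
data_range = max(times) - min(times)

# Class width: range / number of classes, rounded up (not off)
width = math.ceil(data_range / k)

print(n, k, data_range, width)  # 25 6 32 6
```

With these values the classes start at the minimum, 16, and are 6 units wide, which agrees with the class limits 16-21, 22-27, and so on used in the solution tables.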
Step 5: Select the starting point; let it be the
minimum observation.
6, 12, 18, 24, 30, 36 are the lower class limits.
Step 6: Find the upper class limits; e.g. the first
upper class limit = 12 - U = 12 - 1 = 11.
11, 17, 23, 29, 35, 41 are the upper class
limits.
So, combining step 5 and step 6, one can construct the following
classes.
Class limits    Class boundary
6 – 11          5.5 – 11.5
12 – 17         11.5 – 17.5
18 – 23         17.5 – 23.5
24 – 29         23.5 – 29.5
30 – 35         29.5 – 35.5
36 – 41         35.5 – 41.5
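The class limits and boundaries above can be reproduced with a short sketch. The function name `build_classes` and its parameters are hypothetical; the inputs (starting point 6, class width 6, six classes, U = 1) are read off the worked example.

```python
def build_classes(start, width, k, unit=1):
    """Return (lower limit, upper limit, lower boundary, upper boundary)
    for k classes of the given width, starting at `start`."""
    classes = []
    for i in range(k):
        lower = start + i * width       # Step 5: lower class limits
        upper = lower + width - unit    # Step 6: next lower limit minus U
        # Boundaries: subtract U/2 from the lower limit, add U/2 to the upper
        classes.append((lower, upper, lower - unit / 2, upper + unit / 2))
    return classes


classes = build_classes(start=6, width=6, k=6, unit=1)
for lower, upper, lb, ub in classes:
    print(f"{lower} - {upper}    {lb} - {ub}")
```

Because each upper boundary equals the next lower boundary, the classes leave no gaps, as the definition of class boundaries requires.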
Step 8: Tally the data.
Step 9: Write the numeric values for the tallies
in the frequency column.
Step 10: Find the cumulative frequency.
Step 11: Find the relative frequency and/or the relative
cumulative frequency.
• The complete frequency distribution follows:
Columns: Class limit, Class boundary, Class mark, Tally, Freq., Cf (less than type), Cf (more than type), rf, rcf (less than type)
Exercise 1: The following data are the weights in kg of 40
individuals who participated in a diet program for weight
loss:
• 70 64 99 55 64 89 87 65 62 38 67
70 60 69 78 39 75 56 71 51 99 68
95 86 57 53 47 50 55 81 80 98 51
36 63 66 85 79 83 70
• Construct a grouped frequency distribution for this data
by using Sturges' rule for the number of classes. The
distribution must contain class boundaries, class marks,
Lcf, Mcf, Rf, Lrcf and Mrcf.
Test (time allowed: 1 hr)
The following data are the number of minutes
taken to travel from home to work for 25 workers:
28 25 48 37 41 19 32 26 16 23 23 29 36 31
26 21 32 25 31 43 35 42 38 33 28
Construct a grouped frequency distribution for this
data by using Sturges' rule for the number of
classes. The distribution must contain class
boundaries, class marks, Lcf, Mcf, Rf, Lrcf and
Mrcf.
Solution:
• Definition : A relative frequency distribution is a
distribution which specifies the frequency of a
class relative to the total frequency.
• Percentage: the percentage for a category is obtained by
multiplying the relative frequency for that
category by 100.
• Solution: First we find the relative frequency
of each class. The relative frequency of a class
is the frequency of the class divided by the
total number of observations. For instance, the
relative frequency of the first class is
3/25 = 0.12, the relative frequency of the second
class is 6/25 = 0.24, and so on. The resulting
relative frequency distribution is shown in the
table below.
Class limit   Frequency   Relative frequency   Percentage (%)
16-21         3           0.12                 0.12 x 100 = 12%
22-27         6           0.24                 0.24 x 100 = 24%
28-33         8           0.32                 0.32 x 100 = 32%
34-39         4           0.16                 0.16 x 100 = 16%
40-45         3           0.12                 0.12 x 100 = 12%
46-51         1           0.04                 0.04 x 100 = 4%
Total         25          1.0                  100%
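The table above can be checked by tallying the 25 travel times into the six classes. This is an illustrative sketch; the class limits are copied from the table and the variable names are assumptions.

```python
# Travel times (minutes) for 25 workers, as given in this chapter
times = [28, 25, 48, 37, 41, 19, 32, 26, 16, 23, 23, 29, 36, 31,
         26, 21, 32, 25, 31, 43, 35, 42, 38, 33, 28]

# Class limits taken from the relative frequency table
limits = [(16, 21), (22, 27), (28, 33), (34, 39), (40, 45), (46, 51)]

n = len(times)
rows = []
for lower, upper in limits:
    # Frequency: number of observations within the class limits
    f = sum(lower <= t <= upper for t in times)
    # Relative frequency f/n and percentage (f/n) * 100
    rows.append((f"{lower}-{upper}", f, f / n, 100 * f / n))

for label, f, rf, pct in rows:
    print(f"{label:<8}{f:>3}{rf:>8.2f}{pct:>7.0f}%")
print(f"{'Total':<8}{sum(r[1] for r in rows):>3}")
```

The inclusive comparison `lower <= t <= upper` works here because the data are whole minutes, so every value falls inside exactly one pair of class limits.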
Diagrammatic and graphical presentation
of data
1) Diagrammatic presentation of data
Diagrams are appropriate for presenting discrete data.
The most commonly used diagrammatic presentations for
discrete as well as qualitative data include:
• Pie charts
• Bar charts
a) Pie chart
• It is a circle divided by radial lines into sectors so that the area of
each sector is proportional to the size of the figure represented.
Pie-chart construction:
• Calculate the percentage frequency of each component:
Percentage = (f_i / n) × 100%
• Calculate the degree measure of each sector:
Angle = (f_i / n) × 360°
Here f_i is the frequency of component i and n is the total frequency.
• Draw the circle using a protractor and compass.
Example:
• The following data are the blood types of 50
volunteers at a blood plasma donation clinic:
• O A O AB A A O O B A O A AB B O
O O A B A A O A A B O B A O AB A
O O A B A A A O B O O A O A B O
AB A O
a) Organize this data using a categorical
frequency distribution.
b) Present the data using a pie chart.
Solution:
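The classes of the frequency distribution are A, B, O, AB. Count the number of donors
for each of the blood types.
Class   Frequency   Percentage (%)   Angle (degrees)
A       19          38.0             136.8
B       8           16.0             57.6
O       19          38.0             136.8
AB      4           8.0              28.8
[Figure: pie chart of the blood types with sectors of 38%, 16%, 38% and 8%]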
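The counts, percentages and sector angles in this solution can be verified with a short sketch. It assumes the run "AAA" in the source listing stands for three separate type-A donors, which is what makes the total come to 50; the variable names are illustrative.

```python
from collections import Counter

# Blood types of the 50 volunteers. The run "AAA" in the source listing
# is read here as three separate A donors (an assumption).
blood = ("O A O AB A A O O B A O A AB B O "
         "O O A B A A O A A B O B A O AB A "
         "O O A B A A A O B O O A O A B O "
         "AB A O").split()

counts = Counter(blood)
n = len(blood)

# For each class: frequency, percentage (f/n * 100), angle (f/n * 360)
table = {}
for group in ("A", "B", "O", "AB"):
    f = counts[group]
    table[group] = (f, 100 * f / n, 360 * f / n)

for group, (f, pct, angle) in table.items():
    print(f"{group:<4}{f:>3}{pct:>7.1f}{angle:>8.1f}")
```

The angles add up to 360 degrees and the percentages to 100, which is a useful check before drawing the sectors.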
Exercise
Draw a pie-chart to represent the following data on a certain
family expenditure.
Table: Family expenditure.
Item Amount (in birr)
Food 3,000
Clothing 1,000
House rent 3,000
Fuel & Light 1,000
Saving 2,000
Total 10,000
b) Bar chart
• A set of bars (thick lines or narrow rectangles)
representing some magnitude over time or space.
- They are useful for comparing aggregates over
time or space.
- Bars can be drawn either vertically or
horizontally.
- There are different types of bar charts. The most
common types are:
• Simple bar chart
• Component or subdivided bar chart
• Multiple bar chart
a)Simple Bar chart
Product   Sales (birr) in 1957   Sales (birr) in 1958   Sales (birr) in 1959
A         12                     14                     18
B         24                     21                     18
C         24                     35                     54
[Figure: simple bar chart "SALES IN 1957"; x-axis: Types of Products (Product-A, Product-B, Product-C); y-axis: Sales in Birr]
[Figure: simple bar chart "SALES IN 1958"; x-axis: Types of Products (Product-A, Product-B, Product-C); y-axis: Sales in Birr]
[Figure: simple bar chart "SALES IN 1959"; x-axis: Types of Products (Product-A, Product-B, Product-C); y-axis: Sales in Birr]
b)Component Bar chart
Solution:
[Figure: component bar chart; x-axis: Years of production (1957, 1958, 1959); y-axis: Sales in birr (0 to 100); stacked segments: Product-A, Product-B, Product-C]
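A component bar chart stacks the components of each total on top of one another. The sketch below computes where each segment starts and ends for the sales data in the table of the simple bar chart example; the stacking order (Product-A at the bottom) and the variable names are assumptions.

```python
# Sales in birr by product and year, from the sales table in this chapter
sales = {
    "Product-A": {1957: 12, 1958: 14, 1959: 18},
    "Product-B": {1957: 24, 1958: 21, 1959: 18},
    "Product-C": {1957: 24, 1958: 35, 1959: 54},
}
years = [1957, 1958, 1959]

# For each year, stack the products: each segment runs from the running
# total before it (bottom) to the running total after it (top).
stacks = {}
for year in years:
    bottom = 0
    segments = []
    for product, by_year in sales.items():
        top = bottom + by_year[year]
        segments.append((product, bottom, top))
        bottom = top
    stacks[year] = segments

for year in years:
    total = stacks[year][-1][2]  # top of the last segment = bar height
    print(year, stacks[year], "total:", total)
```

The bar heights are the yearly totals (60, 70 and 90 birr), which is why the vertical axis of the chart runs up to 100.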
c) Multiple bar chart
These are used to display data on more than
one variable.
They are used for comparing different
variables at the same time.
Example:
Draw a multiple bar chart to represent the sales
by product from 1957 to 1959 (using the data from the previous
example).
Solution:
[Figure: multiple bar chart "SALES BY PRODUCT FROM 1957 TO 1959"; x-axis: Years of production (1957, 1958, 1959); y-axis: Sales in birr (0 to 60); bars: Product-A, Product-B, Product-C]
2)Graphical Presentation of data
a)Histogram
It consists of a set of adjacent rectangles whose bases are
marked off by class boundaries (not class limits) along
the horizontal axis and whose heights are proportional
to the frequencies associated with the respective
classes.
Example: The following data are the number of minutes
taken to travel from home to work for 25 workers:
28 25 48 37 41 19 32 26 16 23 23 29 36 31 26
21 32 25 31 43 35 42 38 33 28
Present this data using a histogram and a frequency polygon.
Solution:
First organize the data in a frequency distribution table:
Class limit   Class boundaries   Frequency
16-21         15.5-21.5          3
22-27         21.5-27.5          6
28-33         27.5-33.5          8
34-39         33.5-39.5          4
40-45         39.5-45.5          3
46-51         45.5-51.5          1
Total                            25
Frequency polygon
[Figure: frequency polygon; x-axis: Class Mark (12.5, 18.5, 24.5, 30.5, 36.5, 42.5, 48.5, 54.5); y-axis: Frequency (0 to 9)]
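The points plotted in a frequency polygon are the class marks paired with the class frequencies. A minimal sketch for the travel-time table follows; closing the polygon with a zero-frequency class mark at each end (12.5 and 54.5) is inferred from the class-mark axis of the figure, and the variable names are assumptions.

```python
# Class boundaries and frequencies from the travel-time frequency table
boundaries = [(15.5, 21.5), (21.5, 27.5), (27.5, 33.5),
              (33.5, 39.5), (39.5, 45.5), (45.5, 51.5)]
freqs = [3, 6, 8, 4, 3, 1]

# Histogram: one rectangle per class, base on the boundaries,
# height equal to the class frequency
bars = [(lo, hi, f) for (lo, hi), f in zip(boundaries, freqs)]

# Class mark = average of the lower and upper class boundaries
marks = [(lo + hi) / 2 for lo, hi in boundaries]
width = boundaries[0][1] - boundaries[0][0]

# Frequency polygon: class marks against frequencies, closed by an
# extra class mark with zero frequency on each side
xs = [marks[0] - width] + marks + [marks[-1] + width]
ys = [0] + freqs + [0]
points = list(zip(xs, ys))

print(bars)
print(points)
```

The same `points` list can be passed to any plotting tool; joining consecutive points with straight lines gives the polygon.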
Example
Frequency Distribution Table
Class Limit Class Boundary Class Mark Frequency
Histogram
Frequency Polygon
[Figure: frequency polygon; x-axis: Class Mark (2.5, 8.5, 14.5, 20.5, 26.5, 32.5, 38.5, 44.5); y-axis: Frequency (0 to 8)]
3) Ogive (cumulative frequency polygon)
Class Limit   Class Boundary   Class Mark   Frequency   LCF
Ogive
[Figure: less-than ogive; x-axis: Upper Class boundaries (0 to 50); y-axis: Less than cumulative frequency (0 to 25)]