Statistics Chapter-II


INTRODUCTION TO STATISTICS & PROBABILITY
CHAPTER-II
METHODS OF DATA COLLECTION
AND PRESENTATION

Objectives:

After completing this unit, you should be able to:

• Organize data using frequency distributions.
• Present data using suitable graphs or diagrams.
Introduction

• The amount of data collected in real-life situations is often too large to handle observation by observation, so we need methods to organize it. One such method is grouping, that is, putting data into groups rather than treating each observation individually. Raw data on their own provide little, if any, information to decision makers, so we need a means of converting them into useful information. Hence, the purpose of this unit is to introduce tools used for organizing and presenting data.
Methods of data collection

Depending on the source, data can be primary or secondary.

1) Primary data are statistical data that the investigator originates for the purpose of the study.
2) Secondary data, on the other hand, are data that the investigator does not originate himself but obtains from someone else's records. Secondary data can be obtained from published or unpublished documents: reports, journals, magazines, articles, etc.
• Primary methods of data collection include observation, personal interviews, self-administered questionnaires, mailed questionnaires, etc.
Classification and tabulation of data

The process of arranging data into classes or categories according to their similarities is technically called classification.
Types of classification
• Geographical: in terms of cities, districts, countries, etc.
• Chronological: on the basis of time.
• Qualitative: according to some qualitative characteristic.
• Quantitative: in terms of magnitude.
One can also use a combination of these to classify data.
Definitions:

• Raw data: recorded information in its original collected form, whether counts or measurements, is referred to as raw data.
• Frequency: the number of values in a specific class of the distribution.
• Frequency distribution: the organization of raw data in table form using classes and frequencies.
There are three basic types of frequency distributions:
a) Categorical frequency distribution
b) Ungrouped frequency distribution
c) Grouped frequency distribution
a) Categorical frequency distribution:
Used for data that can be placed in specific categories, such as nominal- or ordinal-level data, e.g. marital status.
Example 1
A social worker collected the following data on marital status for 25 persons (M = married, S = single, W = widowed, D = divorced).
Solution:
Since the data are categorical, discrete classes can be used. There are four types of marital status: M, S, D and W. These types will be used as the classes of the distribution. We follow the procedure below to construct the frequency distribution.
Step 1: Make a table as shown.
Step 2: Tally the data and place the results in column (2).
Step 3: Count the tallies and place the results in column (3).
Step 4: Find the percentage of values in each class using
Percentage = (f/n) × 100%
where f = frequency of the class and n = total number of values.
Percentages are not normally part of a frequency distribution, but they can be added since they are used in certain types of diagrams, such as pie charts.
Combining all the steps, one can construct the following frequency distribution.
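Steps 2 to 4 can be sketched in a few lines of Python. This is a minimal sketch, assuming the raw observations are available as a list of category labels; because the marital-status data of Example 1 are not reproduced here, it uses the blood-type data from the pie-chart example later in this chapter. The function name `categorical_distribution` is hypothetical.

```python
from collections import Counter

def categorical_distribution(values):
    """Return (category, frequency, percentage) rows for categorical data.

    Hypothetical helper: Counter does the tallying and counting (steps 2-3),
    and each percentage is f / n * 100 (step 4).
    """
    n = len(values)
    counts = Counter(values)
    # Categories appear in the order they are first seen in the data.
    return [(cat, f, f / n * 100) for cat, f in counts.items()]

# Blood types of 50 volunteers (from the pie-chart example in this chapter).
blood = ("O A O AB A A O O B A O A AB B O "
         "O O A B A A O A A B O B A O AB A "
         "O O A B A A A O B O O A O A B O "
         "AB A O").split()

for cat, f, pct in categorical_distribution(blood):
    print(f"{cat:<3}{f:>4}{pct:>7.1f}%")
```

Running it gives 19 donors each for A and O, 8 for B and 4 for AB, i.e. 38%, 38%, 16% and 8%.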

b) Ungrouped frequency distribution:
It is a table of all the potential raw score values that could possibly occur in the data, along with the number of times each actually occurred.
It is often constructed for small data sets or for data on a discrete variable.
Constructing an ungrouped frequency distribution:
• First find the smallest and largest raw scores in the collected data.
• Arrange the data in order of magnitude and count the frequency of each value.
• To facilitate counting, one may include a column of tallies.
Example:
The following data represent the marks of 20 students. Construct an ungrouped frequency distribution.
Solution:
Step 1: Find the range: Range = Max - Min = 90 - 60 = 30.
Step 2: Make a table as shown.
Step 3: Tally the data.
Step 4: Compute the frequencies.
Mark   Tally   Frequency
60     //      2
62     /       1
63     /       1
65     /       1
70     ////    4
74     /       1
75     //      2
76     /       1
80     ///     3
85     ///     3
90     /       1
Total          20
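The same procedure (order the values, tally, count) can be sketched in Python. Because the 20 raw marks are not listed in the text, the list below is rebuilt from the frequency table above; their original order is unknown, which does not affect the counts. The helper name `ungrouped_distribution` is hypothetical, and, like the table, it lists only the values that actually occur.

```python
from collections import Counter

def ungrouped_distribution(data):
    """Return (value, tally, frequency) rows in order of magnitude."""
    counts = Counter(data)
    # sorted() puts the observed values in order of magnitude.
    return [(v, "/" * counts[v], counts[v]) for v in sorted(counts)]

# Marks rebuilt from the frequency table above (original order unknown).
marks = ([60] * 2 + [62, 63, 65] + [70] * 4 + [74] + [75] * 2 + [76]
         + [80] * 3 + [85] * 3 + [90])

print("Range =", max(marks) - min(marks))
for value, tally, f in ungrouped_distribution(marks):
    print(f"{value:<6}{tally:<6}{f}")
print("Total", len(marks))
```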
c) Grouped frequency distribution:
• When the range of the data is large, the data must be grouped into classes that are more than one unit in width.
Definitions:
Grouped frequency distribution: a frequency distribution in which several values are grouped into one class.
Class limits: separate one class in a grouped frequency distribution from another. The limits can actually appear in the data, and there are gaps between the upper limit of one class and the lower limit of the next.
Units of measurement (U): the distance between two possible consecutive measurements. It is usually taken as 1, 0.1, 0.01, 0.001, and so on.
• Class boundaries: separate one class in a grouped frequency distribution from another. The boundaries have one more decimal place than the raw data and therefore do not appear in the data. There is no gap between the upper boundary of one class and the lower boundary of the next class. The lower class boundary is found by subtracting U/2 from the corresponding lower class limit, and the upper class boundary is found by adding U/2 to the corresponding upper class limit.
Class width: the difference between the upper and lower class boundaries of any class. It is also the difference between the lower limits of any two consecutive classes, or the difference between any two consecutive class marks.
Class mark (midpoint): the average of the lower and upper class limits, or equivalently the average of the lower and upper class boundaries.
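The definitions of boundaries, class width and class mark can be checked numerically. This is a minimal sketch, assuming numeric class limits and a known unit of measurement U; the function names are hypothetical. It uses the first two classes (6-11 and 12-17, U = 1) of the worked example later in this chapter and confirms that the three descriptions of class width agree.

```python
def class_boundaries(lower, upper, unit):
    """Lower boundary = lower limit - U/2; upper boundary = upper limit + U/2."""
    return lower - unit / 2, upper + unit / 2

def class_mark(lower, upper):
    """Midpoint of a class: the average of its lower and upper limits."""
    return (lower + upper) / 2

# First two classes of the worked example later in this chapter (U = 1).
lb1, ub1 = class_boundaries(6, 11, 1)
lb2, ub2 = class_boundaries(12, 17, 1)

width = ub1 - lb1                      # upper boundary minus lower boundary
assert ub1 == lb2                      # no gap between consecutive boundaries
assert width == 12 - 6                 # difference of consecutive lower limits
assert width == class_mark(12, 17) - class_mark(6, 11)  # ... of class marks
assert (lb1 + ub1) / 2 == class_mark(6, 11)  # boundaries give the same mark
print(lb1, ub1, class_mark(6, 11), width)
```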
Cumulative frequency: the number of observations less than or equal to (or more than or equal to) a specific value.
More than cumulative frequency (MCF): the total frequency of all values greater than or equal to the lower class boundary of a given class.
Less than cumulative frequency (LCF): the total frequency of all values less than or equal to the upper class boundary of a given class.
• Relative frequency (rf): the frequency of a class divided by the total frequency.
• Relative cumulative frequency (rcf): the cumulative frequency divided by the total frequency.
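Given the class frequencies listed from the lowest class to the highest, the cumulative and relative columns follow directly from these definitions. A minimal sketch, assuming that ordering; the function name `cumulative_tables` is hypothetical, and the frequencies are those of the six-class worked example later in this chapter.

```python
from itertools import accumulate

def cumulative_tables(freqs):
    """Return LCF, MCF, rf and (less than) rcf lists for class frequencies
    ordered from the lowest class to the highest."""
    n = sum(freqs)
    lcf = list(accumulate(freqs))                  # running total from the first class down
    mcf = list(accumulate(reversed(freqs)))[::-1]  # running total from the last class up
    rf = [f / n for f in freqs]
    rcf = [c / n for c in lcf]
    return lcf, mcf, rf, rcf

lcf, mcf, rf, rcf = cumulative_tables([2, 2, 7, 4, 3, 2])
print(lcf)
print(mcf)
print([round(x, 2) for x in rcf])
```

The first class's MCF always equals n and the last class's LCF always equals n, which is a quick check on a hand-built table.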
Guidelines for classes
1. There should be between 5 and 20 classes.
2. The classes must be mutually exclusive. This
means that no data value can fall into two different
classes.

3. The classes must be all inclusive or
exhaustive. This means that all data values
must be included.
4. The classes must be continuous. There are no
gaps in a frequency distribution.
5. The classes must be equal in width. The only exception may be the first or last class.
Steps for constructing a grouped frequency distribution

1. Find the maximum and minimum values.
2. Compute the range: R = maximum value - minimum value.
3. Select the number of classes desired, usually between 5 and 20, or use Sturges' rule, where k is the number of classes desired, n is the total number of observations and log is the base-10 logarithm:
k = 1 + 3.32 log(n)
4. Find the class width (w) by dividing the
range by the number of classes and rounding
up, not off.
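Steps 3 and 4 amount to two small formulas. A minimal sketch, assuming Python's `math.log10` for the base-10 logarithm and `math.ceil` for "rounding up" (the worked examples in this chapter also round k up); the function names are hypothetical.

```python
import math

def sturges_classes(n):
    """Sturges' rule k = 1 + 3.32 log10(n), rounded up to a whole number."""
    return math.ceil(1 + 3.32 * math.log10(n))

def class_width(data_range, k):
    """Range divided by the number of classes, rounded up (not off)."""
    return math.ceil(data_range / k)

# n = 20, R = 33 (worked example) and n = 25, R = 32 (test) in this chapter.
k20 = sturges_classes(20)
k25 = sturges_classes(25)
print(k20, class_width(33, k20))
print(k25, class_width(32, k25))
```

One caution: `math.ceil` leaves a width that is already a whole number unchanged (for instance R = 30 and k = 6 give w = 5), and k classes of that width starting at the minimum then end one unit below the maximum. It is worth checking that the last class still contains the maximum value.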

5. Pick a suitable starting point less than or equal to the minimum value. The starting point is called the lower limit of the first class. Continue to add the class width to this lower limit to get the rest of the lower limits.
6. To find the upper limit of the first class, subtract U from the lower limit of the second class. Then continue to add the class width to this upper limit to find the rest of the upper limits.
7. Find the boundaries by subtracting U/2 units from the lower limits and adding U/2 units to the upper limits.
8. Tally the data.
9. Find the frequencies.
10. Find the cumulative frequencies and, if necessary, the relative frequencies.
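Steps 5 to 7 can be written as one small function. This is a sketch under stated assumptions: the starting point, class width w, number of classes k and unit U are already chosen, and the name `build_classes` is hypothetical. The call reproduces the classes of the worked example that follows (start 6, w = 6, k = 6, U = 1).

```python
def build_classes(start, width, k, unit=1):
    """Steps 5-7 for k classes of equal width.

    start: lower limit of the first class (at most the minimum value)
    width: class width w
    unit:  unit of measurement U
    """
    lowers = [start + i * width for i in range(k)]        # step 5
    first_upper = (start + width) - unit                  # step 6: 2nd lower limit minus U
    uppers = [first_upper + i * width for i in range(k)]
    limits = list(zip(lowers, uppers))
    bounds = [(lo - unit / 2, up + unit / 2)              # step 7
              for lo, up in limits]
    return limits, bounds

limits, bounds = build_classes(start=6, width=6, k=6, unit=1)
print(limits)
print(bounds)
```

With U = 0.1 or smaller, the boundaries are floating-point numbers and may need rounding for display.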
Example:
Construct a frequency distribution for the following data.
11 29 6 33 14 31 22 27 19 20
18 17 22 38 23 21 26 34 39 27
Solution:
Step 1: Find the highest and the lowest values: H = 39, L = 6.
Step 2: Find the range: R = H - L = 39 - 6 = 33.
Step 3: Select the number of classes using Sturges' formula:
k = 1 + 3.32 log(n) = 1 + 3.32 log(20) ≈ 5.32, rounded up to 6.
Step 4: Find the class width: w = R/k = 33/6 = 5.5, rounded up to 6.
Step 5: Select the starting point; let it be the minimum observation.
6, 12, 18, 24, 30, 36 are the lower class limits.
Step 6: Find the upper class limits; e.g. the upper limit of the first class = 12 - U = 12 - 1 = 11.
11, 17, 23, 29, 35, 41 are the upper class limits.
So, combining steps 5 and 6, one can construct the following classes.
Class limits
6 – 11
12 – 17
18 – 23
24 – 29
30 – 35
36 – 41

Step 7: Find the class boundaries.
For the first class:
Lower class boundary = 6 - U/2 = 5.5
Upper class boundary = 11 + U/2 = 11.5
Then continue adding w to both boundaries to obtain the rest of the boundaries. Doing so gives the following classes.

Class boundaries
5.5 – 11.5
11.5 – 17.5
17.5 – 23.5
23.5 – 29.5
29.5 – 35.5
35.5 – 41.5
Step 8: Tally the data.
Step 9: Write the numeric values for the tallies in the frequency column.
Step 10: Find the cumulative frequencies.
Step 11: Find the relative frequencies and/or the relative cumulative frequencies.
The complete frequency distribution follows:
Class limit  Class boundary  Class mark  Tally    Freq.  Cf (less than)  Cf (more than)  rf    rcf (less than)
6 – 11       5.5 – 11.5      8.5         //       2      2               20              0.10  0.10
12 – 17      11.5 – 17.5     14.5        //       2      4               18              0.10  0.20
18 – 23      17.5 – 23.5     20.5        ///////  7      11              16              0.35  0.55
24 – 29      23.5 – 29.5     26.5        ////     4      15              9               0.20  0.75
30 – 35      29.5 – 35.5     32.5        ///      3      18              5               0.15  0.90
36 – 41      35.5 – 41.5     38.5        //       2      20              2               0.10  1.00
Total                                             20                                     1.00
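The whole table can be recomputed from the raw data as a check on steps 8 to 11. A minimal sketch, assuming integer data so that the class limits can be compared directly (each observation x is counted in the class with lower limit <= x <= upper limit); `grouped_frequencies` is a hypothetical name.

```python
def grouped_frequencies(data, limits):
    """Steps 8-9: count the observations x with lower <= x <= upper."""
    return [sum(lo <= x <= up for x in data) for lo, up in limits]

data = [11, 29, 6, 33, 14, 31, 22, 27, 19, 20,
        18, 17, 22, 38, 23, 21, 26, 34, 39, 27]
limits = [(6, 11), (12, 17), (18, 23), (24, 29), (30, 35), (36, 41)]

freqs = grouped_frequencies(data, limits)
n = len(data)
lcf = 0
for (lo, up), f in zip(limits, freqs):
    mcf = n - lcf          # observations in this class or any later class
    lcf += f               # observations in this class or any earlier class
    tally = "/" * f
    print(f"{lo}-{up:<4}{tally:<9}{f:>3}{lcf:>4}{mcf:>4}{f / n:>6.2f}{lcf / n:>6.2f}")
```

The printed rows match the table, including a tally of seven strokes for the class 18 – 23.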
Exercise 1: The following data are the weights in kg of 40 individuals who participated in a diet program for weight loss:
70 64 99 55 64 89 87 65 62 38 67
70 60 69 78 39 75 56 71 51 99 68
95 86 57 53 47 50 55 81 80 98 51
36 63 66 85 79 83 70
• Construct a grouped frequency distribution for these data, using Sturges' rule for the number of classes. The distribution must contain class boundaries, class marks, Lcf, Mcf, Rf, Lrcf and Mrcf.
Test (time allowed: 1 hr)
The following data are the numbers of minutes taken to travel from home to work by 25 workers:
28 25 48 37 41 19 32 26 16 23 23 29 36 31
26 21 32 25 31 43 35 42 38 33 28
Construct a grouped frequency distribution for these data, using Sturges' rule for the number of classes. The distribution must contain class boundaries, class marks, Lcf, Mcf, Rf, Lrcf and Mrcf.
Solution:
• Range = R = max. value - min. value = 48 - 16 = 32
• Number of classes = K = 1 + 3.32 log(25) ≈ 5.64 ≈ 6
• Class width = W = R/K = 32/6 ≈ 5.33; rounding up to the nearest integer gives W = 6.
• The unit of measurement is U = 1.
• Let the lower limit of the first class be 16; then the frequency distribution is as follows:
Class limit  Class boundaries  Class mark  Frequency  LCF  MCF  Relative frequency  LRCF  MRCF
16-21        15.5-21.5         18.5        3          3    25   0.12                0.12  1.00
22-27        21.5-27.5         24.5        6          9    22   0.24                0.36  0.88
28-33        27.5-33.5         30.5        8          17   16   0.32                0.68  0.64
34-39        33.5-39.5         36.5        4          21   8    0.16                0.84  0.32
40-45        39.5-45.5         42.5        3          24   4    0.12                0.96  0.16
46-51        45.5-51.5         48.5        1          25   1    0.04                1.00  0.04
Total                                      25                   1.00
• Definition: A relative frequency distribution is a distribution that specifies the frequency of each class relative to the total frequency.
• PERCENTAGE
The percentage for a category is obtained by multiplying the relative frequency for that category by 100.
• Example: Convert the frequency distribution in the example above into a relative frequency distribution and a percentage distribution.
• Solution: First we find the relative frequency of each class. The relative frequency of a class is the frequency of the class divided by the total number of observations. For instance, the relative frequency of the first class is 3/25 = 0.12, the relative frequency of the second class is 6/25 = 0.24, and so on. Thus, the relative frequency distribution is shown in the table below.
Class limit  Frequency  Relative frequency  Percentage (%)
16-21        3          0.12                0.12 × 100 = 12%
22-27        6          0.24                0.24 × 100 = 24%
28-33        8          0.32                0.32 × 100 = 32%
34-39        4          0.16                0.16 × 100 = 16%
40-45        3          0.12                0.12 × 100 = 12%
46-51        1          0.04                0.04 × 100 = 4%
Total        25         1.00                100%
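The conversion can be sketched in Python with exact fractions, so that the relative frequencies sum to exactly 1 and the percentages to exactly 100%, something floating-point sums do not always guarantee. This is a minimal sketch using the frequencies from the table above; the dictionary layout is an illustrative choice.

```python
from fractions import Fraction

freqs = {"16-21": 3, "22-27": 6, "28-33": 8, "34-39": 4, "40-45": 3, "46-51": 1}
n = sum(freqs.values())

for cls, f in freqs.items():
    rf = Fraction(f, n)                  # exact relative frequency, e.g. 3/25
    print(f"{cls}  {f}  {float(rf):.2f}  {float(rf * 100):.0f}%")

# Relative frequencies sum to 1, so the percentages sum to 100%.
assert sum(Fraction(f, n) for f in freqs.values()) == 1
```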
Diagrammatic and graphical presentation of data

These are techniques for presenting data in visual displays using geometric shapes and pictures.
Importance:
• They are more attractive.
• They facilitate comparison.
• They are easily understood.
1) Diagrammatic presentation of data
Diagrams are appropriate for presenting discrete data. The most commonly used diagrammatic presentations for discrete as well as qualitative data are:
• Pie charts
• Bar charts
a) Pie chart
• It is a circle divided by radial lines into sectors so that the area of each sector is proportional to the size of the figure represented.
Pie chart construction:
• Calculate the percentage frequency of each component, where f_i is the frequency of component i and n is the total frequency:
$$\text{Percentage} = \frac{f_i}{n} \times 100\%$$
• Calculate the degree measure of each sector:
$$\text{Angle} = \frac{f_i}{n} \times 360^\circ$$
• Draw the circle using a protractor and compass.
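The two pie-chart formulas can be applied together in a short sketch. It assumes the frequencies are already known; the blood-type counts used here are those found in the example that follows, and `pie_sectors` is a hypothetical name.

```python
def pie_sectors(freqs):
    """Return {category: (percentage, angle in degrees)} for a pie chart."""
    n = sum(freqs.values())
    return {c: (f / n * 100, f / n * 360) for c, f in freqs.items()}

blood_type_counts = {"A": 19, "B": 8, "O": 19, "AB": 4}
sectors = pie_sectors(blood_type_counts)
for c, (pct, angle) in sectors.items():
    print(f"{c:<3}{pct:6.1f}%{angle:8.1f} deg")
# The sector angles should add up to a full circle.
print("total angle:", round(sum(a for _, a in sectors.values()), 6))
```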
Example:
• The following data are the blood types of 50 volunteers at a blood plasma donation clinic:
O A O AB A A O O B A O A AB B O
O O A B A A O A A B O B A O AB A
O O A B A A A O B O O A O A B O
AB A O
a) Organize these data using a categorical frequency distribution.
b) Present the data using a pie chart.
Solution:
The classes of the frequency distribution are A, B, O and AB. Count the number of donors for each blood type.

Blood type  Frequency  Percent  Angle (in degrees)
A           19         38.0     136.8
B           8          16.0     57.6
O           19         38.0     136.8
AB          4          8.0      28.8
Total       50         100.0    360
[Figure: pie chart "Blood Types"; Blood Type A 38%, Blood Type B 16%, Blood Type O 38%, Blood Type AB 8%]
Exercise
Draw a pie chart to represent the following data on a certain family's expenditure.
Table: Family expenditure
Item           Amount (in birr)
Food           3,000
Clothing       1,000
House rent     3,000
Fuel & Light   1,000
Saving         2,000
Total          10,000
b) Bar chart
• A set of bars (thick lines or narrow rectangles) representing some magnitude over time or space.
- They are useful for comparing aggregates over time or space.
- Bars can be drawn either vertically or horizontally.
- There are different types of bar charts. The most common types are:
• Simple bar chart
• Component or subdivided bar chart
• Multiple bar chart
a) Simple bar chart
• Simple bar charts are used to present data on one variable.
• They consist of thick lines (narrow rectangles) of the same breadth.
• The magnitude of a quantity is represented by the height or length of the bar.
Example: The following data represent sales by product, 1957-1959, of a given company for three products A, B and C.
Product  Sales (birr) in 1957  Sales (birr) in 1958  Sales (birr) in 1959
A        12                    14                    18
B        24                    21                    18
C        24                    35                    54
[Figure: simple bar chart "Sales in 1957"; sales in birr (y-axis) for Product-A, Product-B and Product-C (x-axis: types of products)]
[Figure: simple bar chart "Sales in 1958"; sales in birr (y-axis) for Product-A, Product-B and Product-C (x-axis: types of products)]
[Figure: simple bar chart "Sales in 1959"; sales in birr (y-axis) for Product-A, Product-B and Product-C (x-axis: types of products)]
b) Component bar chart
• When we want to show how a total (or aggregate) is divided into its component parts, we use a component bar chart.
- The bars represent the total value of a variable, with each total broken into its component parts; different colours or designs are used for identification.
Example:
Draw a component bar chart to represent the sales by product from 1957 to 1959 (data from the previous example).
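Each bar in a component bar chart is a stack: every product's segment starts where the previous one ends, and the bar's height is the year's total. A minimal sketch using the sales table above; `stacked_segments` is a hypothetical name. If matplotlib were used to draw the chart, the computed bottoms are the kind of values one would pass as the `bottom` argument of `matplotlib.pyplot.bar`, but the sketch itself only does the arithmetic.

```python
sales = {  # birr, from the sales table above
    "A": {1957: 12, 1958: 14, 1959: 18},
    "B": {1957: 24, 1958: 21, 1959: 18},
    "C": {1957: 24, 1958: 35, 1959: 54},
}

def stacked_segments(sales, year):
    """Return (product, bottom, top) for each component of one year's bar."""
    segments, bottom = [], 0
    for product, by_year in sales.items():
        top = bottom + by_year[year]
        segments.append((product, bottom, top))
        bottom = top                      # next component starts here
    return segments

for year in (1957, 1958, 1959):
    segs = stacked_segments(sales, year)
    print(year, "total =", segs[-1][2], segs)
```

The totals of 60, 70 and 90 birr explain why the vertical scale of the chart needs to reach 90.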
Solution:
[Figure: component bar chart; sales in birr (y-axis) by year of production 1957, 1958 and 1959 (x-axis), each bar divided into Product-A, Product-B and Product-C]
c) Multiple bar chart
These are used to display data on more than one variable.
They are used for comparing different variables at the same time.
Example:
Draw a multiple bar chart to represent the sales by product from 1957 to 1959 (data from the previous example).
Solution:
[Figure: multiple bar chart "Sales by product from 1957 to 1959"; sales in birr (y-axis) by year of production (x-axis), with adjacent bars for Product-A, Product-B and Product-C]
2) Graphical presentation of data
• The histogram, the frequency polygon and the cumulative frequency graph (ogive) are the most commonly applied graphical representations for continuous data.
Procedure for constructing statistical graphs:
• Draw and label the X and Y axes.
• Choose a suitable scale for the frequencies or cumulative frequencies and label it on the Y axis.
• Represent the class boundaries (for the histogram or ogive) or the midpoints (for the frequency polygon) on the X axis.
• Plot the points.
• Draw the bars or lines to connect the points.
a) Histogram
It consists of a set of adjacent rectangles whose bases are marked off by class boundaries (not class limits) along the horizontal axis and whose heights are proportional to the frequencies of the respective classes.
Example: The following data are the numbers of minutes taken to travel from home to work by 25 workers:
28 25 48 37 41 19 32 26 16 23 23 29 36 31 26
21 32 25 31 43 35 42 38 33 28
Present these data using a histogram and a frequency polygon.
Solution:
First organize the data using a frequency distribution table:
Class limit  Class boundaries  Frequency
16-21        15.5-21.5         3
22-27        21.5-27.5         6
28-33        27.5-33.5         8
34-39        33.5-39.5         4
40-45        39.5-45.5         3
46-51        45.5-51.5         1
Total                          25
Frequency polygon
[Figure: "Frequency Polygon"; frequency (y-axis) against class mark (x-axis), with class marks 12.5, 18.5, 24.5, 30.5, 36.5, 42.5, 48.5 and 54.5]
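The vertices of a frequency polygon can be computed from the class marks and frequencies. This sketch assumes the common convention, consistent with the horizontal axis running from 12.5 to 54.5, of closing the polygon with an empty class one class width below the first class and one above the last; `polygon_points` is a hypothetical name.

```python
def polygon_points(marks, freqs):
    """Frequency-polygon vertices (class mark, frequency), closed at both ends
    by an empty class one width below the first and above the last class."""
    width = marks[1] - marks[0]           # distance between consecutive class marks
    xs = [marks[0] - width] + list(marks) + [marks[-1] + width]
    ys = [0] + list(freqs) + [0]
    return list(zip(xs, ys))

marks = [18.5, 24.5, 30.5, 36.5, 42.5, 48.5]   # travel-time classes
freqs = [3, 6, 8, 4, 3, 1]
print(polygon_points(marks, freqs))
```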
Example

Construct a histogram and a frequency polygon for the following data.
11 29 6 33 14 31 22 27 19 20
18 17 22 38 23 21 26 34 39 27
Solution:
First organize the data using a grouped frequency distribution.
Step 1: Range = max. value - min. value = 39 - 6 = 33
Step 2: Select the number of classes using Sturges' formula:
k = 1 + 3.32 log(n) = 1 + 3.32 log(20) ≈ 5.32, rounded up to 6.
Step 3: Find the class width: w = R/k = 33/6 = 5.5, rounded up to 6.
Step 4: Select the starting point; let it be the minimum observation.
6, 12, 18, 24, 30, 36 are the lower class limits.
Step 5: Find the upper class limits; e.g. the upper limit of the first class = 12 - U = 12 - 1 = 11.
11, 17, 23, 29, 35, 41 are the upper class limits.
Frequency Distribution Table
Class limit  Class boundary  Class mark  Frequency
6 – 11       5.5 – 11.5      8.5         2
12 – 17      11.5 – 17.5     14.5        2
18 – 23      17.5 – 23.5     20.5        7
24 – 29      23.5 – 29.5     26.5        4
30 – 35      29.5 – 35.5     32.5        3
36 – 41      35.5 – 41.5     38.5        2
Total                                    20
[Figure: Histogram]
Frequency polygon
[Figure: "Frequency Polygon"; frequency (y-axis) against class mark (x-axis), with class marks 2.5, 8.5, 14.5, 20.5, 26.5, 32.5, 38.5 and 44.5]
3) Ogive (cumulative frequency polygon)

A graph showing the cumulative frequency (less than or more than type) plotted against the upper or lower class boundaries, respectively. That is, class boundaries are plotted along the horizontal axis and the corresponding cumulative frequencies along the vertical axis. The points are joined by a freehand curve.
Example: Draw an ogive curve (less than type) for the above example.
Class limit  Class boundary  Class mark  Frequency  LCF
6 – 11       5.5 – 11.5      8.5         2          2
12 – 17      11.5 – 17.5     14.5        2          4
18 – 23      17.5 – 23.5     20.5        7          11
24 – 29      23.5 – 29.5     26.5        4          15
30 – 35      29.5 – 35.5     32.5        3          18
36 – 41      35.5 – 41.5     38.5        2          20
Total                                    20
[Figure: "Ogive"; less than cumulative frequency (y-axis) against upper class boundaries (x-axis)]
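The points of a less-than ogive pair each upper class boundary with its LCF. This sketch assumes the common convention of starting the curve at the first lower boundary with a cumulative frequency of 0 (the text does not state this explicitly); `less_than_ogive` is a hypothetical name, and the data are the boundaries and frequencies from the table above.

```python
from itertools import accumulate

def less_than_ogive(boundaries, freqs):
    """Points (upper class boundary, LCF), starting at the first lower
    boundary with cumulative frequency 0."""
    first_lower = boundaries[0][0]
    uppers = [up for _, up in boundaries]
    return [(first_lower, 0)] + list(zip(uppers, accumulate(freqs)))

bounds = [(5.5, 11.5), (11.5, 17.5), (17.5, 23.5),
          (23.5, 29.5), (29.5, 35.5), (35.5, 41.5)]
freqs = [2, 2, 7, 4, 3, 2]
for x, cf in less_than_ogive(bounds, freqs):
    print(f"{x:>5}  {cf:>3}")
```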
Thank you !!!
