Chapter Two
It includes all methods, from simple visual observation to measurement with sophisticated equipment or facilities such as radiographic (X-ray) machines and microscopes.
In the direct observation approach, the investigator stays at the place of the survey and records the observations personally.
c) Questionnaire Method (Self-administered)
Under this method, a list of questions related to the survey is prepared and either distributed by hand or sent to the respondents by post, websites, e-mail, etc. It is often used in statistical investigations; however, it cannot be used if the respondent is illiterate.
Depending on how questions are asked and recorded, we can distinguish two major types of questions: closed-ended questions and open-ended questions.
a) Closed-ended questions: Closed questions offer a list of possible options or answers from which the respondents must choose.
b) Open-ended questions: Open-ended questions permit free responses that should be recorded in the respondent’s own words. The respondent is not given any possible answers to choose from.
2.1.3. Types of Variables and Level (Scale) of Measurement
Types of Variables
Variables are divided into two types: qualitative and quantitative.
1. Qualitative variables are nonnumeric variables and cannot be measured. Examples: gender, religious affiliation, and state of birth.
2. Quantitative variables are numerical variables and can be measured. Examples: balance in a checking account, number of children in a family.
Note that quantitative variables are either discrete or continuous.
Discrete variable: It assumes a finite or countable number of possible values and is usually obtained by counting. Examples: number of children in a family, number of cars at a traffic light.
Continuous variable: It can assume any value within a defined range and is usually obtained by measuring. Examples: weight in kg, height, time, air pressure in a tire.
Level (Scale) of Measurement
There are four general levels of measurement: nominal, ordinal, interval and ratio.
1. Nominal level
The terms nominal level of measurements and nominal scale are commonly used to refer to data
that can only be classified into categories. In the strict sense of the words, however, there are no
measurements and no scales involved. Instead, there are just counts.
There is no mathematical relationship (such as order or distance) between the categories.
Examples:
1. Sex of a person (male and female could be coded as 0 and 1).
2. Ethnic group (black, white and oriental may be coded as 0, 1 and 2).
This indicates that for nominal level of measurement, there is no particular order for the groupings.
Further, the categories are considered to be mutually exclusive. Note that in all the above cases one cannot say that 1 > 0 or 2 > 1, etc.
The nominal level is considered the most primitive, the lowest, or the most limited type of measurement.
2. Ordinal Level
Ordinal data are nominal data whose categories have a natural order. Measurements on an ordinal scale are ordered in the sense that higher numbers represent higher values, i.e., they can have meaningful inequalities (< or >). For such data, only counting and ranking are possible; the exact differences between values cannot be determined.
Examples:
1. Military ranks: comparing a three-star general with a four-star general.
2. Likert scales, such as 1 = poor, 2 = fair, 3 = good and 4 = excellent.
The major difference between a nominal level and an ordinal level of measurement is the “greater
than” relationship between the ordinal-level categories. Otherwise, the ordinal scale of
measurement has the same characteristics as the nominal scale; namely, the categories are mutually
exclusive and exhaustive.
3. Interval level
The interval scale of measurement is the next higher level. It includes all the characteristics of the
ordinal scale, but in addition, the distance between values is a constant size. If one observation is
greater than another by a certain amount, and the zero point is arbitrary, the measurement is on at
least an interval scale. For example, the difference between temperatures of 70 degrees and 80
degrees is 10 degrees. Likewise, a temperature of 90 degrees is 10 degrees more than a temperature
of 80 degrees, and so on. Scores on a statistics or mathematics examination are also examples of the
interval scale of measurement.
4. Ratio level
Ratio level is the highest level of measurement. This level has all the characteristics of the interval level: the distances between numbers are of a known, constant size, the categories are mutually exclusive, and so on.
The major differences between the interval and ratio levels of measurement are: (1) ratio-level data have a meaningful zero point, and (2) the ratio between two numbers is meaningful. Money is a good illustration: having zero dollars has meaning (you have none), and $20 is twice as much as $10. Weight is another ratio-level measurement.
Examples:
Classification of data is the process of arranging things in groups or classes according to their
resemblance.
Purposes of Classification:
To eliminate unnecessary detail.
To bring out clearly points of similarity and dissimilarity.
To enable one to form a mental picture of the data.
To enable one to make comparisons and draw inferences.
Types of Classification
1. Geographical Classification: Data are arranged according to places, like continents, regions, and countries.
2. Chronological Classification: Data are arranged according to time, like year and month.
3. Qualitative Classification: Data are arranged according to attributes like color, religion, marital status, sex, educational background, etc.
4. Quantitative Classification: In this type of classification, the statistical data are classified according to some quantitative variable. The variable may be either discrete or continuous.
2.3. PRESENTATION OF DATA
This section deals with organizing a set of raw data into a Frequency Distribution (FD) and describing the distribution graphically with a histogram, a frequency polygon and a cumulative frequency curve (ogive). Other types of numerical information will be summarized and presented in the form of bar charts, pie charts or pictograms.
Definition: A frequency distribution is the organization of raw data in table form, using classes and
frequencies.
Frequency distributions are of two kinds: ungrouped and grouped.
2.3.2.1. Ungrouped Frequency Distribution (UFD)
An ungrouped frequency distribution lists each distinct value of a variable together with its frequency.
Example 1: Consider the number of children in 15 families.
1 0 3 2 0
2 4 1 3 1
4 1 2 2 3
Construct ungrouped FD for the above data.
Solution:
No. of Children   No. of Families   Frequency   Relative
(Values)          (Tallies)                     Frequency
0                 //                2           0.13
1                 ////              4           0.27
2                 ////              4           0.27
3                 ///               3           0.20
4                 //                2           0.13
Total                               15          1.00
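As a cross-check on the table above, the tallying can be sketched in Python. This is a minimal illustration, assuming the 15 values of Example 1 are typed in by hand; the function name `ungrouped_fd` is hypothetical, not part of any library.

```python
from collections import Counter


def ungrouped_fd(data):
    """Return (value, frequency, relative frequency) for each distinct value, in ascending order."""
    n = len(data)
    counts = Counter(data)  # frequency of each distinct value
    return [(value, f, f / n) for value, f in sorted(counts.items())]


# Number of children in the 15 families of Example 1
children = [1, 0, 3, 2, 0,
            2, 4, 1, 3, 1,
            4, 1, 2, 2, 3]

for value, f, rf in ungrouped_fd(children):
    # tally marks drawn as '/' characters, as in the table
    print(f"{value}  {'/' * f:<5} {f}  {rf:.2f}")
```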
Exercise 1
Consider the following scores in a statistics test obtained by 20 students in a given class.
10, 4, 4, 7, 5, 7, 7, 8, 5, 7, 8, 5, 10, 8, 7, 5, 7, 8, 7, 4
Prepare an ungrouped FD.
2.3.2.2. Grouped Frequency Distribution (GFD)
If the data set is very large, it is necessary to condense the data into an appropriate number of classes (groups of values of a variable) and to indicate the number of observed values that fall into each class. A GFD is therefore a frequency distribution in which the values of a variable are grouped into classes, each paired with the number of observations it contains.
Example 2
Values (xi)      1 – 25   26 – 50   51 – 75   76 – 100
Frequency (fi)   3        10        18        6
Remarks:
a. If both the LCL and the UCL are included in a class, it is called an inclusive class, e.g. 1 – 10, 11 – 20, 21 – 30, and so on. For inclusive classes, class width (cw) = UCBi – LCBi.
b. If the LCL is included but the UCL is not, it is called an exclusive class, e.g. 0 – 10, 10 – 20, 20 – 30, and so on. For exclusive classes, cw = UCLi – LCLi.
To be consistent, we use inclusive classes.
5. Class Mark (cm): the midpoint (center) of a class, cmi = (UCBi + LCBi)/2
Note: the difference between any two successive class marks is equal to the class width.
6. Range (R): the difference between the largest (L) and the smallest (S) values in a data set, R = L – S.
2. Determine the number of classes (n) using Sturges' formula, n = 1 + 3.322 log N (log to base 10), rounded up or down to the nearest whole number, where N is the total number of observations.
3. Find the class width (cw) by dividing the range (R) by the number of classes (n) and rounding to the nearest integer: cw = R/n.
4. Identify the unit of measurement (u), usually 1, 0.1, 0.01, …
5. Determine the class limits.
a. Determine the lower class limit of the first class (LCL1) by picking a suitable starting point less than or equal to the minimum value. Then
LCL2 = LCL1 + cw, LCL3 = LCL2 + cw, …, LCLi+1 = LCLi + cw
b. Determine the upper class limit of the first class (UCL1), i.e.
UCL1 = LCL1 + cw – u, where u is the unit of measurement. Then
UCL2 = UCL1 + cw, UCL3 = UCL2 + cw, …, UCLi+1 = UCLi + cw
6. Compute the class boundaries (UCBi = UCLi + ½u and LCBi = LCLi – ½u) and the class marks (cmi = (UCBi + LCBi)/2), then complete the GFD with the respective class frequencies.
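Steps 2, 5 and 6 above can be sketched in Python as follows. This is a minimal illustration, not a standard routine: the helper names `sturges` and `class_table` are hypothetical, and the class width and first lower limit are passed in by hand because the text leaves their final choice to judgement (Example 3 below rounds 9.33 up to 10 for convenience). The boundaries come out exact here because u = 1; with u = 0.1 or 0.01 you may want to round the printed values.

```python
import math


def sturges(N):
    """Step 2: number of classes n = 1 + 3.322 log10(N), rounded to a whole number."""
    return round(1 + 3.322 * math.log10(N))


def class_table(lcl1, cw, u, k):
    """Steps 5-6: (LCL, UCL, LCB, UCB, class mark) for k inclusive classes.

    lcl1 -- lower class limit of the first class
    cw   -- class width
    u    -- unit of measurement (1, 0.1, 0.01, ...)
    """
    rows = []
    for i in range(k):
        lcl = lcl1 + i * cw      # LCL(i+1) = LCLi + cw
        ucl = lcl + cw - u       # UCLi = LCLi + cw - u
        lcb = lcl - u / 2        # LCBi = LCLi - u/2
        ucb = ucl + u / 2        # UCBi = UCLi + u/2
        rows.append((lcl, ucl, lcb, ucb, (lcb + ucb) / 2))  # last item: class mark
    return rows


# Parameters used in Example 3: N = 30, LCL1 = 16, cw = 10, u = 1
k = sturges(30)  # 1 + 3.322 * log10(30) = 5.9, rounded to 6
for row in class_table(16, 10, 1, k):
    print(row)
```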
Example 3: The number of customers in a supermarket on 30 consecutive days was recorded as follows:
20 48 65
25 48 49
35 25 72
42 22 58
53 42 23
57 65 37
18 65 37
16 39 42
49 68 69
63 29 67
a. Construct a GFD with a suitable number of classes
b. Complete the distribution obtained in (a) with class boundaries & class marks
Solution:
1. Range = Largest value – smallest value = 72 – 16 = 56
2. N = 30 (total number of observations)
Number of classes, n = 1 + 3.322 log 30
= 1 + 3.322 (1.4771)
= 5.9
Hence a suitable number of classes is n = 6.
3. Class width, cw = R/n = 56/6 = 9.33
For the sake of convenience, take cw to be 10 (note that it is also
possible to choose the cw to be 9).
4. u = 1
5. Take the lower limit of the first class (LCL1) to be 16, with u = 1:
LCL1 = 16 and UCL1 = LCL1 + cw – u =16+10-1 = 25
LCL2 = LCL1 + cw = 16 + 10 = 26 UCL2 = UCL1 + cw = 25 + 10 = 35
LCL3=LCL2 + cw = 26 + 10 = 36 UCL3 = UCL2 + cw = 35 + 10 = 45 and so on
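Counting how many of the 30 observations fall in each of the six classes completes the GFD. A minimal sketch, assuming the inclusive classes 16 – 25, 26 – 35, …, 66 – 75 chosen above; `grouped_frequencies` is a hypothetical helper. The counts it produces are the frequencies used in Example 4.

```python
def grouped_frequencies(data, limits):
    """Count the observations falling in each inclusive class [LCL, UCL]."""
    return [sum(1 for x in data if lcl <= x <= ucl) for lcl, ucl in limits]


# Example 3: number of customers on 30 consecutive days
customers = [20, 48, 65, 25, 48, 49, 35, 25, 72, 42,
             22, 58, 53, 42, 23, 57, 65, 37, 18, 65,
             37, 16, 39, 42, 49, 68, 69, 63, 29, 67]

# Six inclusive classes with LCL1 = 16, cw = 10, u = 1
limits = [(16 + 10 * i, 25 + 10 * i) for i in range(6)]
freqs = grouped_frequencies(customers, limits)
for (lcl, ucl), f in zip(limits, freqs):
    print(f"{lcl} - {ucl}: {f}")
```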
37 40 69 35 36 70 72 62 36 72
65 64 47 59 55 42 45 50 46 65
54 63 51 50 61 60 58 58 56 58
55 45 49 51 50 56 44 60 70 44
52 43 55 46 42 62 57 48 60 55
A. How many classes would you recommend?
B. What class interval would you suggest?
C. What would you recommend as the lower limit and upper limit of the first class?
D. Organize the data into a grouped frequency distribution based on the appropriate
class interval.
E. What is the modal age?
2.3.2.3. Cumulative Frequency Distribution (CFD)
It shows the number of observations lying above or below specified values in a distribution. A CFD is of two types.
a. ‘Less Than’ Cumulative Frequency Distribution (<CFD): shows the number of cases lying below the upper class boundary of each class.
b. ‘More Than’ Cumulative Frequency Distribution (>CFD): shows the number of cases lying above the lower class boundary of each class.
Remark: A frequency distribution does not directly tell us the number of units above or below specified values of the classes; this can be determined from a ‘cumulative frequency distribution’.
Example 4: Convert the absolute frequency distribution in Example 3 into ‘less than’ and ‘more than’ cumulative frequency distributions.
Class     Class          Frequency   Less than Cumulative   More than Cumulative
(xi)      Boundaries     (fi)        Frequency (<cfi)       Frequency (>cfi)
16 – 25   15.5 – 25.5    7           7                      30
26 – 35   25.5 – 35.5    2           9                      23
36 – 45   35.5 – 45.5    6           15                     21
46 – 55   45.5 – 55.5    5           20                     15
56 – 65   55.5 – 65.5    6           26                     10
66 – 75   65.5 – 75.5    4           30                     4
This means that, from the ‘less than’ cumulative frequency distribution, there are 7 observations below 25.5, 9 observations below 35.5, and so on; from the ‘more than’ cumulative frequency distribution, 30 observations are above 15.5, 23 observations are above 25.5, and so on.
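The two running totals can be sketched with Python's standard `itertools.accumulate`. A minimal illustration, assuming the class frequencies are listed from the lowest class to the highest; `cumulative` is a hypothetical name.

```python
from itertools import accumulate


def cumulative(freqs):
    """Return the 'less than' and 'more than' cumulative frequencies, class by class."""
    less_than = list(accumulate(freqs))                  # running total from the first class down
    more_than = list(accumulate(reversed(freqs)))[::-1]  # running total from the last class up
    return less_than, more_than


freqs = [7, 2, 6, 5, 6, 4]  # class frequencies of Example 4
lt, mt = cumulative(freqs)
print(lt)  # [7, 9, 15, 20, 26, 30]
print(mt)  # [30, 23, 21, 15, 10, 4]
```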
2.3.2.4. Relative Frequency Distribution (RFD)
It enables the researcher to know the proportion or percentage of cases in each class. Relative frequencies are obtained by dividing the frequency of each class by the total frequency, and they can be converted into percentage frequencies by multiplying each relative frequency by 100%, i.e.
Rfi = fi / n and Pfi = Rfi × 100%
where Rfi is the relative frequency of the ith class, fi is the frequency of the ith class, Pfi is the percentage frequency of the ith class, and n is the total number of observations.
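The formulas Rfi = fi / n and Pfi = Rfi × 100% translate directly into code. A minimal sketch, applied here to the class frequencies of Example 3; `relative_frequencies` is a hypothetical helper.

```python
def relative_frequencies(freqs):
    """Return (Rf, Pf): Rf_i = f_i / n and Pf_i = Rf_i * 100 for each class."""
    n = sum(freqs)                 # total number of observations
    rf = [f / n for f in freqs]    # relative frequencies
    pf = [r * 100 for r in rf]     # percentage frequencies
    return rf, pf


rf, pf = relative_frequencies([7, 2, 6, 5, 6, 4])  # class frequencies of Example 3
print([round(r, 2) for r in rf])  # [0.23, 0.07, 0.2, 0.17, 0.2, 0.13]
print([round(p, 2) for p in pf])
```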
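Example 5: The relative and percentage frequency distribution of Example 3 is:
Class     Frequency (fi)   Relative Frequency (Rfi)   Percentage Frequency (Pfi)
16 – 25   7                0.23                       23.33%
26 – 35   2                0.07                       6.67%
36 – 45   6                0.20                       20.00%
46 – 55   5                0.17                       16.67%
56 – 65   6                0.20                       20.00%
66 – 75   4                0.13                       13.33%
Total     30               1.00                       100%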
Exercise 4
Consider the following data.
22, 11, 12, 12, 13, 20, 14, 14, 3, 4, 5, 6, 7, 7, 8, 7, 8, 9, 10, 11, 13, 15, 16, 17, 17, 18, 15, 19, 12, 14, 15, 18
A. Prepare an ungrouped frequency distribution
B. Construct a grouped frequency distribution with appropriate number of classes and
compute the class boundaries and class marks
C. Construct a less than and more than cumulative frequency distribution
D. Compute the relative and percentage frequency distribution
2.3.3. GRAPHIC METHODS OF DATA PRESENTATION
2.3.3.1. Histogram
After you complete a frequency distribution, your next step will be to construct a “picture” of
these data values using a histogram. A histogram is a graph consisting of a series of adjacent
rectangles whose bases are equal to the class width of the corresponding classes and whose
heights are proportional to the corresponding class frequencies. Here, class boundaries are
marked along the horizontal axis (x – axis) and the class frequencies along the vertical axis (y –
axis) according to a suitable scale. A histogram describes the shape of the data: you can use it to answer questions such as whether the data are symmetric and where most of the data values lie.
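The geometry of a histogram can be sketched without any plotting library: each class becomes a rectangle whose left edge is its lower class boundary, whose width is the class width and whose height is its frequency. This is a minimal sketch with hypothetical helper names; the text version is drawn sideways (one row per class) purely so that it prints in a terminal, and it reuses the customer data of Example 4 for illustration.

```python
def histogram_bars(boundaries, freqs):
    """One rectangle per class as (left edge, width, height).

    Adjacent classes share a boundary, so the rectangles touch with no gaps.
    """
    return [(lcb, ucb - lcb, f) for (lcb, ucb), f in zip(boundaries, freqs)]


def text_histogram(boundaries, freqs, mark="#"):
    """Print a crude sideways histogram: one row per class, bar length = frequency."""
    for (lcb, ucb), f in zip(boundaries, freqs):
        print(f"{lcb:>5} - {ucb:<5}| {mark * f}")


# Class boundaries and frequencies of Example 4 (customer data)
bounds = [(15.5, 25.5), (25.5, 35.5), (35.5, 45.5),
          (45.5, 55.5), (55.5, 65.5), (65.5, 75.5)]
freqs = [7, 2, 6, 5, 6, 4]

print(histogram_bars(bounds, freqs)[0])  # (15.5, 10.0, 7)
text_histogram(bounds, freqs)
```

The (left edge, width, height) tuples are what a plotting package would need to draw the actual vertical bars.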
Example 6: Construct a histogram for the following distribution.
[Figure: histogram for Example 6; x-axis: class boundaries 14.5–24.5, 24.5–34.5, 34.5–44.5, 44.5–54.5, 54.5–64.5; y-axis: frequency]
Exercise 5: Construct a histogram for the following distribution.
Class (xi)       5 – 10   10 – 15   15 – 20   20 – 25   25 – 30   30 – 35
Frequency (fi)   4        7         9         12        6         5
2.3.3.2. Frequency Polygon
[Figure: frequency polygon; x-axis: class marks (9.5 to 69.5); y-axis: frequency]
Exercise 6: Construct a frequency polygon for the frequency distribution given in Exercise 5.
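The points of a frequency polygon can be sketched as (class mark, frequency) pairs. This sketch assumes the common convention of closing the polygon on the horizontal axis with a zero-frequency point one class width beyond each end class, and it treats the classes of Exercise 5 as exclusive, so their limits double as boundaries; `polygon_points` is a hypothetical helper.

```python
def polygon_points(boundaries, freqs):
    """Vertices of a frequency polygon as (class mark, frequency) pairs.

    Assumes equal class widths and closes the polygon on the x-axis with a
    zero-frequency point one class width before the first and after the last class.
    """
    marks = [(lcb + ucb) / 2 for lcb, ucb in boundaries]  # class marks
    cw = boundaries[0][1] - boundaries[0][0]              # class width
    return [(marks[0] - cw, 0)] + list(zip(marks, freqs)) + [(marks[-1] + cw, 0)]


# Exercise 5: exclusive classes 5 - 10, 10 - 15, ..., 30 - 35
bounds = [(5, 10), (10, 15), (15, 20), (20, 25), (25, 30), (30, 35)]
freqs = [4, 7, 9, 12, 6, 5]
print(polygon_points(bounds, freqs))
```

Joining the printed points in order with straight lines gives the polygon.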
2.3.3.3. Cumulative Frequency Curve (Ogive)
It is the graphic representation of a cumulative frequency distribution. Ogives are of two kinds: the ‘less than’ ogive and the ‘more than’ ogive (< ogive and > ogive).
A) ‘Less than’ ogive: here, the upper class boundaries are plotted against the ‘less than’ cumulative frequencies of the respective classes, and the points are joined by straight line segments.
Example 8: Draw a ‘less than’ ogive for the following frequency distribution.
Class (xi)         3 – 6       7 – 10       11 – 14       15 – 18       19 – 22
Frequency (fi)     4           7            10            6             3
Class Boundaries   2.5 – 6.5   6.5 – 10.5   10.5 – 14.5   14.5 – 18.5   18.5 – 22.5
<Cumulative FD     4           11           21            27            30
>Cumulative FD     30          26           19            9             3
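Pairing the boundaries with the cumulative frequencies gives the points to plot for each ogive. A minimal sketch using the Example 8 table; `ogive_points` is a hypothetical helper. It pairs upper boundaries with the ‘less than’ totals and lower boundaries with the ‘more than’ totals, as in the table.

```python
from itertools import accumulate


def ogive_points(boundaries, freqs):
    """Points to plot for the 'less than' and 'more than' ogives.

    Less than: upper class boundary paired with the cumulative frequency up to that class.
    More than: lower class boundary paired with the cumulative frequency from that class on.
    """
    lt = accumulate(freqs)
    mt = list(accumulate(reversed(freqs)))[::-1]
    less_than = [(ucb, c) for (_, ucb), c in zip(boundaries, lt)]
    more_than = [(lcb, c) for (lcb, _), c in zip(boundaries, mt)]
    return less_than, more_than


# Example 8
bounds = [(2.5, 6.5), (6.5, 10.5), (10.5, 14.5), (14.5, 18.5), (18.5, 22.5)]
freqs = [4, 7, 10, 6, 3]
less, more = ogive_points(bounds, freqs)
print(less)  # [(6.5, 4), (10.5, 11), (14.5, 21), (18.5, 27), (22.5, 30)]
print(more)  # [(2.5, 30), (6.5, 26), (10.5, 19), (14.5, 9), (18.5, 3)]
```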
[Figure: ‘less than’ ogive for Example 8; x-axis: upper class boundaries (6.5 to 22.5); y-axis: ‘less than’ cumulative frequency (0 to 35)]
[Figure: ‘more than’ ogive for the above frequency distribution; x-axis: lower class boundaries (LCBi), 2.5 to 18.5; y-axis: ‘more than’ cumulative frequency (>cfi)]
[Figure: line graph; x-axis: year (1986 to 1991); y-axis: values]
2.3.3.5. Bar Chart (Bar Diagram)
Histograms, frequency polygons and ogives are used for data measured at the interval or ratio level. Other ways of presenting statistical data, suited to particular situations, are bar charts, pie charts and pictographs.
A bar chart is a series of equally spaced bars of uniform width, where the height (length) of each bar represents the magnitude of the frequency of the corresponding category. Bars may be drawn horizontally or vertically; vertical bar charts are generally preferred because they make comparisons between bars easy.
[Figure: simple bar chart of revenue; x-axis: year (1980 to 1982); y-axis: revenue (0 to 250)]
B. Multiple Bar Chart: Here two or more bars are grouped with the corresponding frequency to
represent two or more interrelated data in each category. The bars of related variables are kept
adjacent to each other for every set of values. These charts are used when the overall total is not required; each bar is shaded or colored separately, and a key is given to distinguish them.
Example 12: The following table shows the production of wheat and maize (in hundreds of quintals).
Year    1980   1981   1982
Maize   40     20     60
Wheat   80     60     100
Solution:
C. Subdivided (Component) Bar Chart: It is used to present data by subdividing a single bar
with respect to the proportional frequency. Each portion of the bar is then shaded or colored and
a key is given to distinguish them.
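The heights needed to draw a subdivided bar, and the shares needed for a percentage version of the chart, can be sketched as follows. The Example 12 production figures are reused purely for illustration; `stacked_segments` and `percentage_shares` are hypothetical helpers.

```python
def stacked_segments(components):
    """For one subdivided bar: (name, bottom, top) of each component, stacked in order."""
    segments, bottom = [], 0
    for name, value in components:
        segments.append((name, bottom, bottom + value))
        bottom += value  # the next component starts where this one ends
    return segments


def percentage_shares(components):
    """Each component as a percentage of the bar's total (for a 100% bar)."""
    total = sum(v for _, v in components)
    return {name: 100 * v / total for name, v in components}


production = {  # Example 12, hundreds of quintals
    1980: [("Maize", 40), ("Wheat", 80)],
    1981: [("Maize", 20), ("Wheat", 60)],
    1982: [("Maize", 60), ("Wheat", 100)],
}
for year, comps in production.items():
    print(year, stacked_segments(comps), percentage_shares(comps))
```

The top of the last segment equals the bar's total height, so each full bar still shows the year's combined production.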
Example 13: The number of quintals of wheat and maize (in millions of quintals) produced by country X in the indicated years.
Solution:
[Figure: subdivided bar chart, ‘The number of quintals of wheat and maize produced by country X’; x-axis: year; y-axis: number of quintals]
[Figure: percentage bar chart of wheat and maize; x-axis: year (1980 to 1982); y-axis: 0% to 100%]
Solution:
2.3.3.6. Pie Chart
A pie chart is a circle divided into sectors whose areas are proportional to the values of the components they represent. It shows the components in terms of percentages, not absolute magnitudes. The angle of each sector at the center must be proportional to the value it represents.
Item            Expenditure   Percentage
Clothing        100           100/1000 × 100 = 10%
Food            350           350/1000 × 100 = 35%
House Rent      250           250/1000 × 100 = 25%
Miscellaneous   300           300/1000 × 100 = 30%
Total           1000          100%
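The percentage of each item above, and the central angle of each sector (value/total × 360°), can be computed as in this minimal sketch; `pie_angles` is a hypothetical helper. For this expenditure the angles come out as 36°, 126°, 90° and 108°, which add up to 360°.

```python
def pie_angles(items):
    """Map each item to (percentage share, central angle in degrees)."""
    total = sum(items.values())
    return {name: (100 * v / total, 360 * v / total) for name, v in items.items()}


expenditure = {"Clothing": 100, "Food": 350, "House Rent": 250, "Miscellaneous": 300}
for name, (pct, deg) in pie_angles(expenditure).items():
    print(f"{name:<14}{pct:5.1f}%  {deg:6.1f} degrees")
```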
Solution: The pie chart for the above expenditure is as follows:
[Figure: pie chart of the expenditure; Food 35%, Miscellaneous 30%, House rent 25%, Clothing 10%]