Chapter 1
1. Basic concepts, methods of data collection and presentation
1.1 Introduction
1.1.1 Definition and Classification of Statistics
Definition: Statistics is the science of conducting studies to collect, organize, analyze and draw
conclusions from data.
In general, statistics can be defined in two senses.
1. In singular sense: It is a subject or science that deals with the methods of collection,
organization, analysis of data and interpretation of the results.
2. In plural sense: It is defined as a set (aggregate) of numerical data or a quantitative aspect of
facts.
1.1.2 Classification of Statistics
Statistics can be classified into two broad areas.
1. Descriptive Statistics: It is the part of statistics that is used to organize and summarize
masses of data.
Frequency distributions, measures of central tendency such as the mean and median, and
measures of variation such as the range and standard deviation belong to this category of
statistics.
❖ Example: The average age of students in this class is 21.
2. Inferential Statistics: It is the major part of statistics which is concerned with making
decisions, drawing inferences (conclusions) and forecasting about the population based on sample
results.
❖ It includes estimation and test of hypothesis about the population.
Example: Drinking decaffeinated coffee can raise cholesterol levels by 7%.
Exercise: Classify each of the following statements as inferential statistics or descriptive statistics.
Suppose that the heights of 6 randomly selected students from section 2 are the following:
160cm, 165cm, 175cm, 170cm, 180cm and 185cm.
1. The average height of six students is 172.5cm.
2. The average height of students in this section is not less than 172.5cm.
3. About half of the six students are taller than 170cm.
4. The average height of students in section 2 is greater than that of section 1.
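The quantities mentioned in statements 1 and 3 can be computed directly from the six sampled heights, which is one way to see that they only describe the sample. The following is a minimal sketch using Python's standard library; the variable names are illustrative and not part of the exercise.

```python
from statistics import mean

# Heights (cm) of the six sampled students, taken from the exercise.
heights = [160, 165, 175, 170, 180, 185]

# Both quantities summarize the sample itself and say nothing
# about students who were not measured.
sample_mean = mean(heights)
taller_than_170 = sum(h > 170 for h in heights)

print(sample_mean)       # 172.5
print(taller_than_170)   # 3, i.e. half of the six students
```

Statements 2 and 4, by contrast, refer to whole sections, so they cannot be verified from these six values alone.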
❖ A skip pattern means skipping a question or a group of questions which are not applicable.
Disadvantages
❖ An untrained interviewer may distort the meaning of the questions.
❖ Attributes of the interviewer may affect the responses due to:
a) bias of the interviewer and b) his/her social or ethnic characteristics.
❖ It costs much in terms of time and money.
b) Telephone Interviews
Advantages
❖ It is less expensive in time and money compared with face-to-face interviews.
❖ The interviewer is able to help the respondent if he/she doesn’t understand the question
(as with face-to-face interviews).
❖ Broad representative samples can be obtained for those who have telephone lines.
Disadvantages
❖ Under-representation of those groups which do not have telephones.
❖ Problems with telephone numbers that are not listed in the directory.
❖ The respondent may be substituted by another person.
c) Self-administered questionnaires returned by mail (mailed questionnaire)
Here the questionnaire is mailed to the respondents to be filled in. This is sometimes
known as self-enumeration.
Advantages
❖ These are the cheapest.
❖ There is no need for a trained interviewer.
❖ There is no interviewer bias.
Disadvantages
❖ Low response rate.
❖ Incomplete questionnaires due to omissions or invalid responses.
❖ No assurance that the questionnaire was answered by the right person.
❖ Needs intense follow up to get a high response rate.
3. The use of documentary sources
Extracting information from existing sources (e.g. hospital records) is much less expensive
than the other two methods. It can be an important source of data.
After the data have been collected and edited, the next important step is to organize them, that is,
to present them in a readily comprehensible, condensed form that helps in drawing inferences.
It is also necessary that like items be separated from unlike ones.
M S D W D
S S M M M
W D S M M
W D D S S
S W W D D
Solution: Since the data are qualitative (categorical), discrete classes can be used. There are four types
of marital status M, S, D, and W. These types will be used as the classes for the distribution.
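The tallying described in the solution can be sketched in a few lines. This is only an illustration using Python's `collections.Counter`; the variable names are assumptions, and the percentage column is an optional extra that the solution text does not ask for.

```python
from collections import Counter

# Marital status codes exactly as listed in the data above, row by row.
raw = """
M S D W D
S S M M M
W D S M M
W D D S S
S W W D D
"""

counts = Counter(raw.split())   # one tally per class
total = sum(counts.values())

# Print the categorical frequency distribution in the order M, S, D, W.
for status in ["M", "S", "D", "W"]:
    f = counts[status]
    print(f"{status}  {f:2d}  {100 * f / total:.0f}%")
```

The frequencies should add up to the number of observations, which is a quick check that no value was skipped while tallying.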
0 2 2 1 1 2
3 5 3 2 2 2
1 0 1 2 4 2
0 1 0 1 4 4
2 2 0 1 1 5
Solution: First arrange the data in order of magnitude (in ascending order) and then count the
frequency of each value. The distinct values for these data are 0, 1, 2, 3, 4 and 5. Since the number of
distinct values is small, each value can be used as a class.
No of cups Frequency (f)
0 5
1 8
2 10
3 2
4 3
5 2
Total 30
✓ Each individual value is presented separately, which is why it is called an ungrouped frequency
distribution.
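As a cross-check of the table, the same ungrouped frequency distribution can be rebuilt by sorting the raw values and counting each distinct one. A minimal sketch follows; the name `cups` is taken from the table header and the rest is illustrative.

```python
from collections import Counter

# Raw observations (number of cups), copied row by row from the data above.
cups = [
    0, 2, 2, 1, 1, 2,
    3, 5, 3, 2, 2, 2,
    1, 0, 1, 2, 4, 2,
    0, 1, 0, 1, 4, 4,
    2, 2, 0, 1, 1, 5,
]

# Arrange in ascending order, then count how often each value occurs.
freq = Counter(sorted(cups))

for value in sorted(freq):
    print(value, freq[value])
print("Total", sum(freq.values()))
```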
Class intervals (CI): non-overlapping intervals such that each value in the set of
observations can be placed in one, and only one, of the intervals.
Then continue to add the class width to this upper limit to get the rest of the
upper class limits, i.e. $ucl_{i+1} = ucl_i + w$, $i = 1, 2, \dots, k-1$.
✓ where "u" is the unit of measurement, i.e. the smallest difference between the two nearest
observations in the data. It is usually taken as 1, 0.1, 0.01, ... when the data are given as whole
numbers, to one decimal place, to two decimal places, ... respectively.
6. Find the frequencies.
o Class boundaries (CB): separate one class from another, with no gap between
consecutive classes.
They are the exact (true) limits of a class and are called the lower and upper class
boundaries.
o Lower class boundary (LCB): The lcb is obtained by subtracting half the unit
of measurement from the lcl of the class, i.e.
$lcb_i = lcl_i - \frac{u}{2}$. Note: $lcb_{i+1} = lcb_i + w$.
o Upper class boundary (UCB): The ucb is obtained by adding half the unit of
measurement to the ucl of the class, i.e.
$ucb_i = ucl_i + \frac{u}{2}$. Note: $ucb_{i+1} = ucb_i + w$.
❖ Class marks (midpoints) (m): It is the average of the lcl and ucl, or of the lcb and ucb, of a class:
$m_i = \frac{lcl_i + ucl_i}{2}$ or $m_i = \frac{lcb_i + ucb_i}{2}$. Note: $m_{i+1} = m_i + w$.
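The formulas for class limits, class boundaries and class marks can be applied mechanically once the first class, the width w and the unit u are known. The sketch below assumes, purely for illustration, a first class of 6 to 12 with w = 7, u = 1 and k = 5 classes; the function name `class_table` and these input values are hypothetical and not taken from the notes.

```python
def class_table(lcl1, ucl1, w, u, k):
    """Return (lcl, ucl, lcb, ucb, m) for each of k classes of width w."""
    rows = []
    for i in range(k):
        lcl = lcl1 + i * w        # lcl_{i+1} = lcl_i + w
        ucl = ucl1 + i * w        # ucl_{i+1} = ucl_i + w
        lcb = lcl - u / 2         # lower class boundary
        ucb = ucl + u / 2         # upper class boundary
        m = (lcl + ucl) / 2       # class mark (midpoint)
        rows.append((lcl, ucl, lcb, ucb, m))
    return rows

# Assumed inputs: first class 6-12, width 7, whole-number data (u = 1), 5 classes.
rows = class_table(lcl1=6, ucl1=12, w=7, u=1, k=5)
for lcl, ucl, lcb, ucb, m in rows:
    print(f"{lcl:>2}-{ucl:<2}  {lcb:>4}-{ucb:<4}  {m}")
# The first row is: limits 6-12, boundaries 5.5-12.5, class mark 9.0
```

Because each upper boundary equals the next lower boundary, consecutive classes leave no gap, and the class mark is the same whether it is computed from the limits or from the boundaries.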
• Then continue adding 𝒘 to both boundaries to obtain the remaining boundaries. By doing so one
can obtain the following classes.
Class boundary
5.5 – 12.5
12.5 – 19.5
19.5 – 26.5
26.5 – 33.5
33.5 – 40.5
Step 7: Find the frequencies.
Year of report 1986 1987 1988 1989 1990 1991 1992 1993
Cases 2 17 87 190 448 885 3256 2814
                  Sex
Antigen    Male    Female    Total
DPT        250     300       550
Polio      300     320       620
BCG        200     210       410
2. Pie-Chart
It is used to show the partitioning of a total into its component parts using a circle. The
circle is divided into sectors proportional to the frequencies of the categories they
represent.
Steps to draw a pie chart
1. Convert frequencies into percentage relative frequency.
2. Draw a circle of any radius.
3. Convert percentage relative frequencies into degree measures.
$\text{angle of a sector} = \dfrac{360^\circ \times \%rf}{100\%}$
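Steps 1 and 3 amount to two proportional conversions, which can be sketched as follows. The function name `sector_angles` is hypothetical, and the input reuses the antigen totals from the table above purely as illustrative data; the notes do not state that a pie chart was drawn for them.

```python
def sector_angles(frequencies):
    """Map each category to (percentage relative frequency, angle in degrees)."""
    total = sum(frequencies.values())
    table = {}
    for category, f in frequencies.items():
        pct = 100 * f / total        # step 1: percentage relative frequency
        angle = 360 * pct / 100      # step 3: angle of the sector
        table[category] = (pct, angle)
    return table

# Illustrative input (assumption): total doses per antigen.
example = {"DPT": 550, "Polio": 620, "BCG": 410}
angles = sector_angles(example)

for category, (pct, angle) in angles.items():
    print(f"{category:6s} {pct:6.2f}%  {angle:7.2f} degrees")
```

A useful check when building the table by hand is that the percentages sum to 100 and the angles sum to 360 degrees, up to rounding.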
Example
Draw the pie chart for the following data. First construct a table providing the central angles.
Histogram
b) Frequency polygon
It is a multi-sided figure which is drawn by plotting the class marks (midpoints) on the x-axis and
the frequencies on the y-axis. The points are then connected with straight lines, and the lines are
extended at both ends so that they reach the horizontal axis at the midpoints of the classes just
before the first class and just after the last class. This allows the total area to be enclosed.
Example: draw the frequency polygon for the following age data.
Note: The total area under the frequency polygon is equal to the area under the histogram.
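The note can be checked numerically for classes of equal width. The sketch below builds the polygon's points, including the two extra midpoints with zero frequency at each end, and compares the area under the polygon with the area of the histogram bars. The frequencies, the first boundary 5.5 and the width 7 are hypothetical values chosen for illustration, and the function names are assumptions.

```python
def polygon_points(first_lcb, w, freqs):
    """Class marks paired with frequencies, closed with a zero at each end."""
    marks = [first_lcb + w / 2 + i * w for i in range(len(freqs))]
    xs = [marks[0] - w] + marks + [marks[-1] + w]
    ys = [0] + list(freqs) + [0]
    return list(zip(xs, ys))

def area_under(points):
    """Area under the straight-line segments, by the trapezoid rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

w = 7                        # assumed common class width
freqs = [3, 8, 10, 6, 3]     # hypothetical class frequencies
points = polygon_points(5.5, w, freqs)

histogram_area = w * sum(freqs)   # each bar has area w * f
print(points[0], points[-1])      # the two points on the horizontal axis
print(area_under(points), histogram_area)
```

The two areas agree here because every class has the same width; with unequal class widths the equality would not hold in general.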