Chapter One & Two
INTRODUCTION TO STATISTICS
1.1. Definition and Classification of Statistics
Definition:
1. Descriptive Statistics
Deals with describing data without attempting to infer anything that goes beyond the given set of data.
Consists of the collection, organization, summarization and presentation of data.
2. Inferential Statistics
Deals with making inferences and/or drawing conclusions about a population based on data obtained from a limited sample of observations.
Consists of performing hypothesis tests, determining relationships among variables and making predictions.
Examples:
a) From past figures, it has been predicted that 31% of registered voters will vote in the November election.
b) The average age of a student in Hawassa University is 20.1 years.
To determine the probability of reliability of a product.
To control the quality of products in a given production process.
To compare the improvement in yield due to certain additives (fertilizers, herbicides, weedicides, etc.).
However, statistics has the following limitations:
a) It does not study qualitative characteristics directly. Examples: beauty, honesty, poverty and standard of living.
b) It does not study a single individual but deals with aggregates of facts. Example: the population size of a country for some given year does not help us in comparative studies.
c) Statistical results are true only on average. Examples: the probability of getting a head in tossing a coin is 1/2; the germination percentage of a given variety of seed is 80%.
d) It is liable to misuse. Example: in a particular year, the number of car accidents in a city caused by women drivers is 10, while the number caused by men drivers is 40. Concluding from this that women drivers are safer drivers is a misuse, since the numbers of women and men drivers are not taken into account.
Examples: red, brown, black, short, tall, pass, fail
One is different from, and greater (better) or less than, the other.
1.5. Sources of Data and Methods of Data Collection
Not every aggregate of numbers can be called statistical data. We say an aggregate of numbers is statistical data when the numbers are:
Comparable
Meaningful and
Collected for a well-defined objective
Raw data: collected data that have not been organized numerically.
Examples: 25, 10, 32, 18, 6, 93, 4.
An array: an arrangement of raw numerical data in ascending or descending order of magnitude.
It enables us to find the range of the data set easily, and it also gives us some idea about the general characteristics of the distribution.
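As a quick illustration, the short Python sketch below arranges the raw data example above into an array and reads off the range, assuming the usual definition of the range as the largest value minus the smallest value.

```python
# Raw data from the example above
raw_data = [25, 10, 32, 18, 6, 93, 4]

# An array: the raw data arranged in ascending (or descending) order of magnitude
ascending = sorted(raw_data)
descending = sorted(raw_data, reverse=True)

# In an ascending array the range is simply the last value minus the first
data_range = ascending[-1] - ascending[0]

print("Ascending array: ", ascending)   # [4, 6, 10, 18, 25, 32, 93]
print("Descending array:", descending)  # [93, 32, 25, 18, 10, 6, 4]
print("Range:", data_range)             # 89
```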
Any scientific investigation requires data related to the study. The required data can be obtained
from either a primary source or a secondary source.
Primary source: a source of data that supplies firsthand information for the immediate purpose.
Primary data: data originally collected for the immediate purpose.
- Primary data are generally more expensive to obtain than secondary data.
Secondary source: individuals or agencies that supply data originally collected, by them or by others, for other purposes.
- These are usually published or unpublished materials, records, reports, etc.
Secondary data: data collected from a secondary source.
The process of data collection from a primary source may be through:
a) Field trials
b) Laboratory experiments
c) Surveys
- Census survey
- Sample survey
CHAPTER TWO
METHOD OF DATA ORGANIZATION AND PRESENTATION
Classification eliminates inconsistency and also brings out the points of similarity and/or
dissimilarity of collected items/data.
Classification is necessary because it would not be possible to draw inferences and conclusions directly from a large set of collected (raw) data.
A frequency distribution is a table that presents data according to some criteria, together with the number of items falling in each class (i.e., the corresponding frequencies).
Example: A frequency distribution presenting the number of males and females in a class
Sex Frequency
Male 57
Female 39
Generally, there are two basic types of frequency distributions: Ungrouped and Grouped
frequency distributions.
An ungrouped frequency distribution is a table of all the potential raw score values that could occur in the data, along with their corresponding frequencies. It is often constructed for a small set of data or for a discrete variable.
Example: The following data are the ages (in years) of 20 women who attended health education last year: 30, 41, 39, 41, 32, 29, 35, 31, 30, 36, 33, 36, 32, 42, 30, 35, 37, 32, 30 and 41.
STEP 2. Construct a table, tally the data and complete the frequency column. The frequency
distribution becomes as follows.
Age Tally Frequency
29 / 1
30 //// 4
31 / 1
32 /// 3
33 / 1
35 // 2
36 // 2
37 / 1
39 / 1
41 /// 3
42 / 1
Total 20
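The tallying in Step 2 can be sketched in Python with `collections.Counter` from the standard library. Like the table above, this sketch lists only the ages that actually occur in the data.

```python
from collections import Counter

# Ages (in years) of the 20 women in the example above
ages = [30, 41, 39, 41, 32, 29, 35, 31, 30, 36,
        33, 36, 32, 42, 30, 35, 37, 32, 30, 41]

# Count how many times each distinct age occurs
counts = Counter(ages)

# Print each observed age in ascending order with its tally and frequency
print(f"{'Age':>4}  {'Tally':<6} Frequency")
for age in sorted(counts):
    freq = counts[age]
    print(f"{age:>4}  {'/' * freq:<6} {freq}")
print(f"Total frequency: {sum(counts.values())}")
```

To list every potential value between the smallest and the largest age (including ages with zero frequency, such as 34), loop over `range(min(ages), max(ages) + 1)` instead of `sorted(counts)`; a `Counter` returns 0 for a missing key. The tally column simply repeats `/`, which is adequate here because no frequency exceeds 4.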
3. Grouped frequency distribution
When the range of the data is large, the data must be grouped into classes. A grouped frequency distribution is a frequency distribution in which several values of the data are grouped into one class.
– Unit of measurement (U): the smallest difference between any two values of the variable being
measured.
– Cumulative frequency (Cf) less than type: the total frequency of all values (observations) less
than or equal to the upper class boundary for the given class.
– Cumulative frequency (Cf) more than type: The total frequency of all values (observations)
greater than or equal to the lower class boundary for the given class.
A tabular arrangement of class intervals together with their corresponding cumulative frequency
(either less than or more than type; as defined above) is called a cumulative frequency
distribution.
– Relative frequency: the frequency of a class divided by the total frequency (i.e., the sum of all frequencies); multiplied by 100, it gives the percent of values falling in that class.
$$\text{Relative frequency of a class} = \frac{\text{Frequency of that class}}{\text{Total frequency}}$$
Note:
The relative frequency shows what fractional part or proportion of the total frequency
belongs to the corresponding class.
The sum of all the relative frequencies in the frequency distribution is always 1.
– Relative cumulative frequency (less than type/more than type): the total of the relative frequencies above/below a class, including that class; equivalently, the cumulative frequency (less than type/more than type) divided by the total frequency. Multiplied by 100, it gives the percent of values that are less than/more than the upper/lower class boundary.
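A minimal Python sketch of these definitions, applied to the ungrouped age distribution of the earlier example. As an assumption for illustration, each distinct age plays the role of a class, so "less than or equal to the upper class boundary" becomes "less than or equal to that age", and likewise "greater than or equal to that age" for the more than type.

```python
# Ungrouped frequency distribution of ages from the earlier example: (age, frequency)
table = [(29, 1), (30, 4), (31, 1), (32, 3), (33, 1), (35, 2),
         (36, 2), (37, 1), (39, 1), (41, 3), (42, 1)]

total = sum(f for _, f in table)

rows = []
running_less = 0
for age, f in table:
    running_less += f                  # less than type: number of values <= this age
    more = total - running_less + f    # more than type: number of values >= this age
    rel = f / total                    # relative frequency of this "class"
    rows.append((age, f, rel, running_less, more,
                 running_less / total, more / total))

print("Age  f  Rel.f  Cf(<=)  Cf(>=)  RCf(<=)  RCf(>=)")
for age, f, rel, cf_less, cf_more, rcf_less, rcf_more in rows:
    print(f"{age:>3} {f:>2} {rel:>6.2f} {cf_less:>6} {cf_more:>7} "
          f"{rcf_less:>8.2f} {rcf_more:>8.2f}")

# The relative frequencies add up to 1 (up to floating-point rounding)
print("Sum of relative frequencies:", round(sum(r[2] for r in rows), 10))
```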
STEP 6. Find the upper class limits. To find the upper class limit of the first class, subtract one unit of measurement from the lower limit of the second class. Then continue to add the class width to this upper limit to get the rest of the upper limits.
STEP 7. Compute the class boundaries as $LCB = LCL - \tfrac{1}{2}U$ and $UCB = UCL + \tfrac{1}{2}U$,
where LCL = lower class limit, UCL = upper class limit, LCB = lower class boundary and UCB = upper class boundary. The class boundaries also lie halfway between the upper limit of one class and the lower limit of the next class.
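Steps 6 and 7 can be sketched as a small Python function. The earlier steps that fix the lower limit of the first class and the class width are taken as inputs here; the function name `class_limits_and_boundaries` and its parameters are illustrative, and the values in the call (first class 26-30, width 5, eight classes, U = 1) are read off the worked example that follows. The function assumes at least two classes.

```python
def class_limits_and_boundaries(first_lcl, width, k, unit):
    """Return a list of (LCL, UCL, LCB, UCB) for k classes of equal width.

    first_lcl: lower class limit of the first class
    width:     class width
    k:         number of classes (at least 2)
    unit:      unit of measurement U
    """
    # Lower limits: start at the first lower limit and keep adding the width
    lower = [first_lcl + i * width for i in range(k)]
    # STEP 6: upper limit of the first class = lower limit of the second class - U;
    # the remaining upper limits follow by adding the class width
    first_ucl = lower[1] - unit
    upper = [first_ucl + i * width for i in range(k)]
    # STEP 7: LCB = LCL - U/2 and UCB = UCL + U/2
    half = unit / 2
    return [(lcl, ucl, lcl - half, ucl + half) for lcl, ucl in zip(lower, upper)]

# Assumed inputs taken from the worked example below
result = class_limits_and_boundaries(26, 5, 8, 1)
for lcl, ucl, lcb, ucb in result:
    print(f"{lcl} - {ucl}    {lcb} - {ucb}")
```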
Example: Construct a grouped frequency distribution for the data given below.
62 50 35 36 31 43 43 43
41 31 65 30 41 58 49 41
37 62 27 47 65 50 45 48
27 53 40 29 63 34 44 32
58 61 38 41 26 50 47 37
STEP 6. Upper limit of the first class = 31 - 1 = 30. Hence the upper class limits are:
30 35 40 45 50 55 60 65
Class limits   Class boundaries   Tally   Frequency   Cumulative frequency (less than type)   Cumulative frequency (more than type)
51 – 55   50.5 – 55.5   /   1   32   9
56 – 60   55.5 – 60.5   //   2   34   8
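The complete grouped frequency distribution for these 40 observations can be computed with the Python sketch below. It hard-codes the classes from Step 6 (lower limits 26, 31, ..., 61, class width 5, U = 1) as an assumption; its rows for the classes 51 – 55 and 56 – 60 agree with the table above.

```python
# The 40 observations from the example above
data = [62, 50, 35, 36, 31, 43, 43, 43,
        41, 31, 65, 30, 41, 58, 49, 41,
        37, 62, 27, 47, 65, 50, 45, 48,
        27, 53, 40, 29, 63, 34, 44, 32,
        58, 61, 38, 41, 26, 50, 47, 37]

unit = 1                                          # unit of measurement U
lower_limits = list(range(26, 62, 5))             # 26, 31, ..., 61
upper_limits = [lcl + 4 for lcl in lower_limits]  # 30, 35, ..., 65

n = len(data)
table = []
cf_less = 0
for lcl, ucl in zip(lower_limits, upper_limits):
    f = sum(lcl <= x <= ucl for x in data)  # frequency of the class
    cf_more = n - cf_less                   # values >= lower class boundary
    cf_less += f                            # values <= upper class boundary
    table.append((lcl, ucl, lcl - unit / 2, ucl + unit / 2, f, cf_less, cf_more))

print("Limits   Boundaries    f  Cf(less)  Cf(more)")
for lcl, ucl, lcb, ucb, f, cl, cm in table:
    print(f"{lcl}-{ucl}   {lcb}-{ucb}   {f:>2}  {cl:>8}  {cm:>8}")
```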
Diagrams and graphs:
are techniques for presenting data in visual displays using geometric figures;
are visual aids which give a bird’s-eye view of a given set of numerical data;
are easily understood by anyone, even someone with no statistical background.
Usually diagrams are appropriate for presenting discrete data, whereas graphs are appropriate for
presenting continuous types of data.
There are three common diagrammatic presentations of data: bar-diagram/charts, pie-chart and
pictograms, as well as three common graphic presentations of data: histogram, frequency
polygon, and cumulative frequency polygon (ogive).
I. Bar-diagrams/Bar-charts
A bar-diagram is a series of equally spaced bars of equal width, the height of each bar representing the magnitude or frequency of observations in the corresponding group.
Bar-diagrams are usually used to represent one-way (simple) frequency distributions.
1. Simple bar-diagrams
Simple bar-diagrams are used to depict data on a single (one-way) variable.
Example: The following frequency distribution shows the sales (in million birr) of four products for the 2004 production year.
Product Sales (million birr)
A 14
B 21
C 9
D 17
[Figure: Simple bar diagram of the sales of products A, B, C and D; horizontal axis: Product.]
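A minimal text-only sketch of a simple bar diagram for the sales data, assuming a scale of one `#` per million birr. The bars are drawn horizontally, so bar length plays the role of bar height; the helper name `text_bar_diagram` is illustrative.

```python
# Sales (in million birr) of products A to D, from the example above
sales = {"A": 14, "B": 21, "C": 9, "D": 17}

def text_bar_diagram(values, symbol="#"):
    """Return one text line per bar, at a scale of one symbol per unit."""
    lines = []
    for label, value in values.items():
        # Every bar has the same width (one line of text) and the bars are
        # equally spaced; only the length varies, in proportion to the value
        lines.append(f"{label} | {symbol * value} {value}")
    return lines

lines = text_bar_diagram(sales)
for line in lines:
    print(line)
```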
II. Pie-charts
A pie-chart is a circle that is divided into sections or wedges according to the percentages of frequencies in each category of the distribution. The angle of the sector of a class is obtained by multiplying the ratio of the frequency of the class to the total frequency by 360°. That is,
$$\text{Sector angle of a class} = \frac{\text{Frequency of the class}}{\text{Total frequency}} \times 360^{\circ}$$
Note that pie-charts are usually used for depicting nominal level data.
Example: A survey showed that a car owner spends birr 2,950 per year on operating expenses.
Below is the breakdown of the various expenditure items. Draw an appropriate chart to portray
the data.
Item Amount (birr)
Fuel 603
Repairs 930
Depreciation 492
Total 2,950
Item Amount (birr) Percentage Sector angle (degrees)
Fuel 603 20 74
Depreciation 492 17 60
[Figure: Pie chart of the car owner's annual operating expenses, with a key for Fuel, Repairs and Depreciation.]
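The sector-angle formula can be checked with a short Python sketch for the expense items listed above. The total of birr 2,950 is taken from the example and also covers items not listed here; the helper name `sector_angle` is illustrative.

```python
# Expense items listed in the example (amounts in birr); the total of 2,950
# also includes items not listed here
expenses = {"Fuel": 603, "Repairs": 930, "Depreciation": 492}
total = 2950

def sector_angle(frequency, total_frequency):
    """Sector angle in degrees = frequency / total frequency * 360."""
    return frequency / total_frequency * 360

for item, amount in expenses.items():
    percent = amount / total * 100
    angle = sector_angle(amount, total)
    print(f"{item:<13} {amount:>4}  {percent:5.1f}%  {angle:6.1f} degrees")
```

Rounded to whole numbers, this gives 20% and 74° for fuel and 17% and 60° for depreciation, as in the table above.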
III. Histogram
A histogram is another way of presenting data, more suitable for frequency distributions with continuous classes.
In drawing a histogram, we put the class boundaries of each class on the horizontal axis and the corresponding frequency on the vertical axis.
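Class boundaries   Class mark   Frequency   Cumulative frequency (less than type)   Cumulative frequency (more than type)
5.5 – 11.5   8.5   2   2   20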
A frequency polygon is a line graph drawn by taking the frequencies of the classes along the vertical axis and their respective class marks along the horizontal axis. The plotted points are then joined by a freehand curve.
Example: Present the data in the previous example using a frequency polygon.
[Figure: Frequency polygon; horizontal axis: Class Marks (8.50, 14.50, 20.50, 26.50, 32.50, 38.50); vertical axis: Frequency.]
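The Python sketch below computes the points of a frequency polygon, taking the class mark to be the midpoint of the class (the usual definition) and using the grouped distribution of 40 observations constructed earlier. Its class frequencies (5, 5, 5, 9, 7, 1, 2 and 6) come from counting those data into the classes 26-30, ..., 61-65. It only produces the (class mark, frequency) pairs; plotting them is left to any graphing tool.

```python
# Grouped distribution of the 40 observations constructed earlier
lower_limits = [26, 31, 36, 41, 46, 51, 56, 61]
upper_limits = [30, 35, 40, 45, 50, 55, 60, 65]
frequencies = [5, 5, 5, 9, 7, 1, 2, 6]

# Class mark = midpoint of the class = (lower limit + upper limit) / 2;
# the midpoint of the class boundaries gives the same value
class_marks = [(lcl + ucl) / 2 for lcl, ucl in zip(lower_limits, upper_limits)]

# Points of the frequency polygon: (class mark, frequency)
points = list(zip(class_marks, frequencies))
for mark, f in points:
    print(f"({mark}, {f})")
```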
A cumulative frequency polygon can be drawn on either a less than or a more than cumulative frequency basis. Place the class boundaries along the horizontal axis and the corresponding cumulative frequencies (either less than or more than type) along the vertical axis. Then join the plotted points by a freehand curve.
Example: The data in the previous example can be presented using either a less than or a more than cumulative frequency polygon, as shown in (i) and (ii) below, respectively.
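[Figure (i): Less than cumulative frequency polygon; horizontal axis: class boundaries (11.50, 17.50, 23.50, 29.50, 35.50, 41.50); vertical axis: cumulative frequency.]
[Figure (ii): More than cumulative frequency polygon; horizontal axis: class boundaries (5.50, 11.50, 17.50, 23.50, 29.50, 35.50); vertical axis: cumulative frequency.]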
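The points of both cumulative frequency polygons can be computed with `itertools.accumulate` from the Python standard library. This sketch again uses the grouped distribution of 40 observations constructed earlier, plotting less than cumulative frequencies against upper class boundaries and more than cumulative frequencies against lower class boundaries, as in the definitions above.

```python
from itertools import accumulate

# Grouped distribution of the 40 observations constructed earlier
lower_boundaries = [25.5, 30.5, 35.5, 40.5, 45.5, 50.5, 55.5, 60.5]
upper_boundaries = [30.5, 35.5, 40.5, 45.5, 50.5, 55.5, 60.5, 65.5]
frequencies = [5, 5, 5, 9, 7, 1, 2, 6]

# Less than type: running total from the first class down,
# paired with each class's upper class boundary
less_than = list(zip(upper_boundaries, accumulate(frequencies)))

# More than type: running total from the last class up,
# paired with each class's lower class boundary
more_cf = list(accumulate(reversed(frequencies)))[::-1]
more_than = list(zip(lower_boundaries, more_cf))

print("Less than polygon points:", less_than)
print("More than polygon points:", more_than)
```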