Introduction To Statistics
CHAPTER ONE
1. DEFINITIONS, SCOPE AND LIMITATIONS OF STATISTICS
Introduction:
In the modern world of computers and information technology, the importance of statistics is
well recognized by all disciplines. Statistics originated as a science of statehood and
gradually found applications in agriculture, economics, commerce, biology,
medicine, industry, planning, education and so on.
Today there is hardly any walk of human life where statistics cannot be applied.
Origin and Growth of Statistics:
The words ‘Statistics’ and ‘Statistical’ are derived from the Latin word status, meaning a
political state. The theory of statistics as a distinct branch of scientific method is of
comparatively recent growth. Research, particularly into the mathematical theory of statistics, is
proceeding rapidly and fresh discoveries are being made all over the world.
Definitions of Statistics:
Statistics has been defined differently by different authors over time.
Definitions by A.L. Bowley:
In one place, Bowley says that ‘statistics may be called the science of counting’.
This is clearly an incomplete definition, as it takes into account only the aspect of
collection and ignores other aspects such as analysis, presentation and interpretation.
Bowley gives another definition, which states that ‘statistics may be rightly
called the science of averages’. This definition is also incomplete: averages play an
important role in understanding and comparing data, but statistics provides many other measures.
Scope of Statistics:
Statistics is applied in every sphere of human activity, social as well as physical,
such as Biology, Commerce, Education, Planning, Business Management and
Information Technology. It is almost impossible to find a single department
of human activity where statistics cannot be applied.
Statistics and Industry:
In industry, control charts are widely used to maintain a certain quality level. In
production engineering, statistical tools such as inspection plans and control charts
are of extreme importance in finding out whether a product conforms to specifications.
In inspection plans we have to resort to some kind of sampling, a very
important aspect of Statistics.
Statistics and Commerce:
No businessman can afford either to understock or to overstock his goods.
At the outset he estimates the demand for his goods and then takes steps to adjust his
output or purchases accordingly. Thus statistics is indispensable in business and commerce.
Statistics and Agriculture:
Analysis of variance (ANOVA), one of the statistical tools developed by Professor R.A.
Fisher, plays a prominent role in agricultural experiments. In tests of significance based
on small samples, statistics provides adequate methods for testing whether the
difference between two sample means is significant.
Statistics and Economics:
Statistical data and techniques of statistical analysis have proved immensely useful in solving a variety of
economic problems, such as those concerning wages and prices, the analysis of time series and demand analysis. They have also
facilitated the development of economic theory. Wide applications of mathematics and statistics in the study of
economics have led to the development of new disciplines called Economic Statistics and Econometrics.
Statistics and Education:
Statistics is widely used in education. Research has become a common feature in all branches of
activity. Statistics is necessary for formulating policies, such as whether to start new courses, and for
assessing the facilities available for new courses.
2. Inferential Statistics
Inferential statistics deals with making inferences and/or drawing conclusions about a population based on data obtained from a
limited sample of observations.
It is used to make predictions or comparisons about a larger group (a
population) using information gathered about a small part of that population.
There are two main methods used in inferential statistics: estimation and hypothesis testing.
1.1 Definition of some basic terms
Population: A population is the set of all objects we wish to study.
A population may be finite or infinite.
If a population of values consists of a fixed number of these values, the population is said to be
finite, otherwise, it is infinite.
Sample: A sample is part of the population we study to learn about the population.
A sample should be representative of the population.
Example 1: In a certain study, 900 men were selected from Oromia Region. It was found that 25
of them were smokers.
(a) What is the population in this study?
(b) What is the sample size?
Solution
(a) The population is men from Oromia.
(b) The sample size is 900.
Example 2
A finite population includes the following:
(a) Students studying Business Administration at the Methodist University.
(b) All football clubs in the first and second divisions in Ghana.
(c) All households in Ethiopia.
Example 3
An infinite population (or one so large that it is treated as infinite in practice) includes the following:
(a) The set of real numbers between two integers.
(b) All fishes in River Volta.
(c) All palm trees in West Africa.
Sample survey: The technique of collecting information from a portion of the population.
Census survey: A survey that includes every member of the population.
Parameter: A parameter is a descriptive measure of a population, or a summary value calculated from a
population. Examples: the average, range or variance of the population.
Statistic: A statistic is a descriptive measure of a sample, or a summary value calculated from a sample.
Examples: the average, range or variance of the sample.
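The difference between a parameter and a statistic can be illustrated with a small sketch. The population values below are hypothetical (they are not taken from this chapter), and the variable names are chosen only for illustration.

```python
import random
import statistics

# Hypothetical finite population: ages of 10 people (illustrative values only).
population = [21, 25, 30, 34, 38, 41, 45, 52, 58, 66]

# Parameters: summary values computed from the whole population.
population_mean = statistics.mean(population)
population_range = max(population) - min(population)

# Statistics: the same summaries computed from a sample drawn from the population.
random.seed(1)  # fixed seed so that the sketch is repeatable
sample = random.sample(population, 4)
sample_mean = statistics.mean(sample)
sample_range = max(sample) - min(sample)

print("Parameter (population mean):", population_mean)
print("Statistic (sample mean):", sample_mean)
print("Parameter (population range):", population_range)
print("Statistic (sample range):", sample_range)
```

The sample values change from one sample to another, so a statistic varies between samples, whereas the parameter of a fixed population does not.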
1.2 Stages in Statistical Investigation
We have defined statistics, in the singular sense, as a science that deals with the collection, organization
(classification), presentation, analysis and interpretation of numerical facts. So we consider the
following stages of statistical investigation:
Data Collection: This is the stage where we gather information for our purpose.
Data Organization: This is the stage where we edit our data. A large mass of figures
collected from surveys frequently needs organization. The collected data may contain irrelevant
figures, incorrect facts, omissions and mistakes.
Data Presentation: The organized data can now be presented in the form of tables, charts,
diagrams and graphs. At this stage, large data sets are presented in a summarized and condensed
manner.
Data Analysis: This is the stage where we critically study the data. The purpose of data analysis
is to dig out information useful for decision making.
Data Interpretation: This is the stage where we draw valid conclusions from the results obtained
through data analysis. If the data that have been analyzed are not properly interpreted, the whole
purpose of the investigation may be defeated and misleading conclusions may be drawn.
1.3 Application and limitation of statistics
Uses of statistics
The science of statistics is essential for research and decision making in all
aspects of human life. The following are some of the purposes for which statistical analysis is
required:
To represent facts in the form of numerical data.
To summarize a mass of data into a few presentable, understandable and precise figures.
To predict or forecast future trends.
To help select a course of action among a number of alternatives.
To help in formulating policies.
Limitations of Statistics
a) It does not study qualitative characteristics directly. Examples: beauty, honesty, poverty,
and standard of living.
b) It does not study a single individual but deals with aggregates of facts. Example: The
population size of a country for a single given year does not help us in comparative
studies.
c) Statistical laws are not exact: The mathematical and physical sciences are generally
regarded as exact, but statistical laws are only approximations.
Statistical conclusions are not universally true; they are true only on average.
d) It is open to misuse: Statistical methods should be used by trained people; in the
hands of the inexpert they are among the most dangerous tools. The use of statistical
tools by inexperienced and untrained persons might lead to wrong conclusions.
Statistics can also be easily misused by quoting wrong figures. As King aptly says,
‘statistics are like clay of which one can make a God or Devil as one pleases’.
1.4 Types of variables and measurement scales
A variable is a characteristic of an object that can have different possible values.
There are two types of variables.
a) Quantitative variables: are variables that can be quantified or can have numerical
values. Examples: height, area, income, temperature, etc.
b) Qualitative variables: are variables that cannot be quantified directly. Examples: beauty,
gender, location. Qualitative variables are also called categorical variables. Hence we
have two types of data: quantitative and qualitative data.
Quantitative variables can be further classified as
Discrete variables, and
Continuous variables
a) Discrete variables are variables whose values are counts.
Examples: number of students, number of households, family size, number of pages of
a book.
b) Continuous variables are variables that can have any value within an interval.
Examples: weight, length, volume, etc.
1.5 Measurement scales
There are four types of measurement scales for variables
1. Nominal scale: “Nominal” comes from the Latin word for “name”.
This is a scale for grouping individuals into different categories.
This scale of measurement applies to qualitative variables only.
On the nominal scale, the categories have no order.
We cannot do arithmetic operations on data measured on the
nominal scale.
Examples: colour, gender, short/tall, pass/fail, etc.
2. Ordinal scale: “Ordinal” comes from the Latin word for “order”.
This scale also applies to qualitative data.
It is a scale for grouping and ordering individuals into different categories.
In this case one category is lower than the next one or vice versa.
Ordinal scale data contain and convey more information than nominal scale data.
We cannot do arithmetic operations on data measured on the ordinal scale.
Examples: military ranks, ranks in a race, ranks of college academic staff, etc.
3. Interval scale:
This scale of measurement applies to quantitative data only.
There is no true zero point (the zero point is arbitrary).
There is no physical significance to the zero point.
In this scale, the zero point does not indicate a total absence of the quantity
being measured.
Example: °C, °F (units for measuring temperature)
Addition and subtraction are possible, but multiplication and division are not meaningful.
37 °C – 35 °C = 2 °C
45 °C – 43 °C = 2 °C
40 °C = 2(20 °C), but this does not imply that an object at 40 °C is twice as hot as an
object at 20 °C.
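The point about interval scales can be checked numerically. The sketch below is an illustration only: it brings in the Kelvin scale, which is not mentioned in the text above, because its zero is a true (absolute) zero, so ratios of Kelvin temperatures are meaningful.

```python
def celsius_to_kelvin(celsius: float) -> float:
    """Convert Celsius to Kelvin, a scale with a true (absolute) zero."""
    return celsius + 273.15


# Differences are meaningful on an interval scale.
diff_a = 37 - 35  # 2 degrees Celsius
diff_b = 45 - 43  # 2 degrees Celsius

# Ratios are not: the Celsius readings have ratio 2, but the same two
# temperatures on the Kelvin scale have a ratio close to 1.07.
ratio_celsius = 40 / 20
ratio_kelvin = celsius_to_kelvin(40) / celsius_to_kelvin(20)

print("Equal differences:", diff_a == diff_b)
print("Ratio of Celsius readings:", ratio_celsius)
print("Ratio of Kelvin readings:", round(ratio_kelvin, 3))
```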
[Figure: Classification of variables. Quantitative variables are divided into discrete and continuous variables; qualitative variables are measured using the nominal and ordinal scales of measurement.]
1.6 Sources of data and methods of data collection
Any aggregate of numbers cannot be called statistical data. We say an aggregate of numbers is
statistical data when they are
Comparable
Meaningful and
Collected for a well defined objective
Nature of Data:
It may be noted that different types of data can be collected for different purposes. The data can
be collected in connection with time or geographical location or in connection with time and
location. The following are the three types of data:
1. Time series data: Data collected over a period of time.
- This type of data may be collected either at regular or at irregular intervals of time.
2. Spatial data: If the data collected are connected with a place, they are termed spatial
data.
3. Spatio-temporal data: If the data collected are connected with time as well as place, they are
known as spatio-temporal data.
Raw data: are collected data, which have not been organized numerically.
Examples: 25, 10, 32, 18, 6, 93, 4.
An array: is an arrangement of raw numerical data in ascending or descending order of
magnitude.
It enables us to find the range of the data set easily, and it also gives us some idea
about the general characteristics of the distribution.
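As a minimal sketch, the raw data listed above can be arranged into an array and the range read off from its two ends. The variable names are chosen for illustration.

```python
# Raw data from the example above.
raw_data = [25, 10, 32, 18, 6, 93, 4]

# An array is the same data arranged in order of magnitude.
ascending_array = sorted(raw_data)
descending_array = sorted(raw_data, reverse=True)

# The range is the difference between the largest and smallest values,
# which are the last and first entries of the ascending array.
data_range = ascending_array[-1] - ascending_array[0]

print("Ascending array:", ascending_array)
print("Descending array:", descending_array)
print("Range:", data_range)
```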
Categories of data based on its sources
Primary source:
A primary source is a source of data that supplies first-hand information for the immediate purpose.
Primary data are collected by the investigator himself or herself for the purpose of a
specific inquiry or study. Such data are original in character and are generated by surveys conducted
by individuals, research institutions or other organizations.
Primary data are usually more expensive to obtain than secondary data.
Secondary source:
Secondary data are those data which have been already collected and analysed by some earlier
agency for its own use; and later the same data are used by a different agency.
Secondary data are available from libraries, government agencies and the internet.
Libraries
A common place to look for secondary data is a library. Here, data can be obtained from
magazines, journals and newspapers.
Government agencies
Government data can be obtained from publications issued by local, state, national and
international governments. Such data include laws, regulations, statistics and consumer
information.
Internet
Secondary data can be obtained from search engines such as Yahoo, Google, MSN.com, etc., on
the internet.
6. What is a questionnaire?
d) Quantitative classification: classification in terms of magnitude.
Quantitative classification refers to the classification of data according to some
characteristic that can be measured, such as height or weight. For example, the students of
a college may be classified according to weight.
2.2 Frequency Distributions
Frequency: is the number of times a certain value or set of values occurs in a specific group.
The word 'frequency' is derived from 'how frequently' a value of a variable occurs.
Frequency distribution: is a table that presents data according to some criteria with the corresponding
number of items falling in each class (i.e. with the corresponding frequencies).
Example: A frequency distribution presenting the number of males and females in the 1st-year Statistics department.
Sex      Frequency
Male     1
Female   20
A frequency distribution is constructed for three main
reasons:
1. To facilitate the analysis of data.
2. To estimate frequencies of the unknown population distribution from the distribution of
sample data and
3. To facilitate the computation of various statistical measures
Generally, there are three basic types of frequency distributions: Categorical, Ungrouped and Grouped
frequency distributions.
1. Categorical frequency distribution
–the data are usually qualitative
– the scales of measurements for the data are usually nominal or ordinal
The categorical frequency distribution is used for data which can be placed in specific categories
such as nominal or ordinal level data. For example, data such as political affiliation, religious
affiliation, blood type, marital status, or major field of study would use categorical frequency
distributions.
Example: The following data are on the political party affiliations of a sample of 40 Rural
Development and Agricultural Extension students. D, R and O stand for Democratic,
Republican and Other, respectively.
D D D D O R O R O R O R O D D R D D D R
R O R D R R O R R R R R O O R R D R D D
The classes for grouping are ‘Democratic’, ‘Republican’ and ‘Other’.
Table: Number of students by political party affiliations.
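The tallying for a categorical frequency distribution can be sketched as follows, using the 40 values listed above. The percentage column is an addition for illustration and is not part of the original table.

```python
from collections import Counter

# The two rows of raw data from the example, as listed above.
row_1 = "D D D D O R O R O R O R O D D R D D D R"
row_2 = "R O R D R R O R R R R R O O R R D R D D"
affiliations = (row_1 + " " + row_2).split()

labels = {"D": "Democratic", "R": "Republican", "O": "Other"}

# Count how many times each category occurs.
counts = Counter(affiliations)
total = sum(counts.values())

print(f"{'Class':<12}{'Frequency':>10}{'Percent':>10}")
for code in ("D", "R", "O"):
    percent = 100 * counts[code] / total
    print(f"{labels[code]:<12}{counts[code]:>10}{percent:>10.1f}")
print(f"{'Total':<12}{total:>10}")
```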
STEPS 8, 9 and 10 are displayed in the following table (columns 3, 4 and 5&6 respectively).
Class limits   Class boundaries   Tally        Frequency   Cumulative frequency   Cumulative frequency
                                                           (less than type)       (more than type)
26 – 30        25.5 – 30.5        /////        5           5                      40
31 – 35        30.5 – 35.5        /////        5           10                     35
36 – 40        35.5 – 40.5        /////        5           15                     30
41 – 45        40.5 – 45.5        ///// ////   9           24                     25
46 – 50        45.5 – 50.5        ///// //     7           31                     16
51 – 55        50.5 – 55.5        /            1           32                     9
56 – 60        55.5 – 60.5        //           2           34                     8
61 – 65        60.5 – 65.5        ///// /      6           40                     6
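The class boundaries and both cumulative frequency columns of the table above can be reproduced from the class limits and frequencies alone. This is a minimal sketch; the variable names are illustrative.

```python
from itertools import accumulate

# Class limits and frequencies taken from the table above.
class_limits = [(26, 30), (31, 35), (36, 40), (41, 45),
                (46, 50), (51, 55), (56, 60), (61, 65)]
frequencies = [5, 5, 5, 9, 7, 1, 2, 6]

# Class boundaries: the gap between consecutive classes is 1 (e.g. 30 and 31),
# so each limit is moved outward by half of that gap.
boundaries = [(lower - 0.5, upper + 0.5) for lower, upper in class_limits]

# Less-than type: running total from the first class downward.
less_than = list(accumulate(frequencies))

# More-than type: running total from the last class upward.
more_than = list(accumulate(reversed(frequencies)))[::-1]

for limits, bounds, freq, lt, mt in zip(
    class_limits, boundaries, frequencies, less_than, more_than
):
    print(limits, bounds, freq, lt, mt)
```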
Example 2.2
Table 2.15 gives the ages of a sample of patients who attended Hope Medical Hospital.
(a) Find the sample size. (b) Complete the blank cells.
Table 2.15: Ages of patients
Ages (years)   Frequency   Relative frequency   Cumulative frequency
                                                (less than type)
10 – 14        –           –                    –
15 – 19        8           0.16                 12
20 – 24        15          –                    –
25 – 29        –           –                    37
30 – 34        –           –                    –
Solution
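(a) If the sample size is n, then the relative frequency of the second class interval
is 8 ÷ n. Hence n satisfies
$$\frac{8}{n} = 0.16, \qquad n = \frac{8}{0.16} = 50.$$
Hence the sample size is 50.
(b) The missing values follow from n = 50 and the definitions of relative and cumulative
frequency. For example, the first class has frequency 12 – 8 = 4, and the last cumulative
frequency must equal n = 50. The completed table is:
Ages (years)   Frequency   Relative frequency   Cumulative frequency
                                                (less than type)
10 – 14        4           0.08                 4
15 – 19        8           0.16                 12
20 – 24        15          0.30                 27
25 – 29        10          0.20                 37
30 – 34        13          0.26                 50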
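The blank cells of Table 2.15 can be worked out step by step from the given entries. The sketch below is one way to organize the arithmetic; the variable names are illustrative, and the only inputs are the values printed in the table.

```python
from itertools import accumulate

# Given entries of Table 2.15.
f_15_19, rel_15_19, cum_15_19 = 8, 0.16, 12
f_20_24 = 15
cum_25_29 = 37

# (a) Sample size: relative frequency = frequency / n.
n = round(f_15_19 / rel_15_19)

# (b) Missing frequencies, using differences of cumulative frequencies.
f_10_14 = cum_15_19 - f_15_19    # cumulative up to 15-19 minus its own frequency
cum_20_24 = cum_15_19 + f_20_24  # cumulative frequency of the third class
f_25_29 = cum_25_29 - cum_20_24  # fourth class
f_30_34 = n - cum_25_29          # the last cumulative frequency equals n

frequencies = [f_10_14, f_15_19, f_20_24, f_25_29, f_30_34]
relative = [round(f / n, 2) for f in frequencies]
cumulative = list(accumulate(frequencies))

print("n =", n)
print("Frequencies:", frequencies)
print("Relative frequencies:", relative)
print("Cumulative frequencies:", cumulative)
```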
[Figure: Bar-diagrams. A horizontal bar-diagram of frequency by blood type, and a vertical bar-diagram of sales (in million birr) by product (A, B, C, D).]
2. Deviation bar-diagrams
When the data take both positive and negative values (for instance data on profit, net export, percent
change, etc) deviation bar-diagrams are appropriate.
Example: Present the following data using a suitable bar-diagram.
Data: Net profit (in thousands birr) in oil sales for five years
Year Profit (in thousands)
1997 12
1998 -5
1999 14
2000 9
2001 -6
The deviation bar-diagram for the data looks like the following.
[Figure: Deviation bar-diagram of profit (in thousands) by year, 1997 to 2001.]
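A rough text version of a deviation bar-diagram for the profit data above can be sketched as follows. The bars are drawn sideways (negative values to the left of the baseline, positive values to the right), which is a layout choice made here for plain-text output and not the layout of the original figure.

```python
# Net profit (in thousands of birr) by year, from the table above.
profits = {1997: 12, 1998: -5, 1999: 14, 2000: 9, 2001: -6}


def deviation_bar(value: int, width: int = 15) -> str:
    """Return a text bar: negatives extend left of the '|' baseline, positives right."""
    left = "#" * -value if value < 0 else ""
    right = "#" * value if value > 0 else ""
    return f"{left:>{width}}|{right:<{width}}"


for year, profit in profits.items():
    print(year, deviation_bar(profit), f"{profit:+d}")
```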
3. Broken bar-diagrams
This kind of bar-diagram is used to present data involving a few extreme values, where it would be difficult
to fit the bars corresponding to these values on the graph paper. In this
case we draw the bars in pieces, with each piece starting after a jump on the numerical scale.
Example: Data: Amount of production per day for four products of a factory.
Product Quantity produced (kg/day)
A 14
B 35
C 23
D 109
4. Component bar-diagrams
When it is desired to show how a total (an aggregate) is divided into component parts, we use component
bar diagram. In such type of bar-diagrams, the bars represent aggregate value of a variable with each
aggregate broken into its component parts and different colors or designs are used for identification.
Example: Represent the following data using bar-charts
Data: Yields of production of farmers in Southern Ethiopia.
Crop      1990 EC   1991 EC   1992 EC   1993 EC
Barley    14        15        26        19
Wheat     10        15        14        25
Maize     2         6         10        3
Total     26        36        50        47
The component bar-diagram for this table is as follows.
[Figure: Component bar-diagram of production by year (1990 to 1993), with each bar divided into barley, wheat and maize.]
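The heights of the bars and of their components can be computed from the yield table. This sketch recomputes the totals row and, as an optional extra not shown in the original table, the percentage share of each crop in each year.

```python
# Yields by crop for 1990 EC to 1993 EC, from the table above.
years = ["1990 EC", "1991 EC", "1992 EC", "1993 EC"]
yields = {
    "Barley": [14, 15, 26, 19],
    "Wheat": [10, 15, 14, 25],
    "Maize": [2, 6, 10, 3],
}

# Height of each whole bar: the total over all crops in that year.
totals = [sum(column) for column in zip(*yields.values())]

# Share of each component within its bar, in percent.
shares = {
    crop: [round(100 * value / total, 1) for value, total in zip(values, totals)]
    for crop, values in yields.items()
}

for index, year in enumerate(years):
    parts = ", ".join(f"{crop} {shares[crop][index]}%" for crop in yields)
    print(f"{year}: total {totals[index]} ({parts})")
```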
5. Multiple bar-diagrams
Multiple bar-diagrams are used to display data on more than one variable. They are used for comparing
different variables at the same time.
Example: The data given in the above example can be presented by using multiple bar-diagram as below
[Figure: Multiple bar-diagram of production by year (1990 to 1993), with separate bars for barley, wheat and maize.]
II. Pie-charts
A pie-chart is a circle that is divided into sections or wedges according to the percentages of frequencies
in each category of the distribution. The angle of the sector of a class is obtained by multiplying the ratio
of the frequency of the class to the total frequency by 360°, i.e.
$$\text{sector angle of a class} = \frac{\text{frequency of the class}}{\text{total frequency}} \times 360^\circ$$
Note that pie-charts are usually used for depicting nominal level data.
Example: A survey showed that a car owner spends birr 2,950 per year on operating expenses. Below is
the breakdown of the various expenditure items. Draw an appropriate chart to portray the data.
Expenditure item Amount (in birr)
Fuel 603
Interest on car loan 279
Repairs 930
Insurance and license 646
Depreciation 492
Total 2,950
[Figure: Pie-chart of annual car operating expenses: fuel 20%, interest on car loan 9%, repairs 32%, insurance and license 22%, depreciation 17%.]
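The sector angles and percentages for the car expense data can be computed directly from the sector-angle formula. This is a minimal sketch with illustrative variable names.

```python
# Annual operating expenses (in birr), from the table above.
expenses = {
    "Fuel": 603,
    "Interest on car loan": 279,
    "Repairs": 930,
    "Insurance and license": 646,
    "Depreciation": 492,
}

total = sum(expenses.values())

# Sector angle = (frequency of the class / total frequency) x 360 degrees.
angles = {item: amount / total * 360 for item, amount in expenses.items()}
percents = {item: amount / total * 100 for item, amount in expenses.items()}

for item in expenses:
    print(f"{item:<24}{percents[item]:>6.1f}%{angles[item]:>8.1f} degrees")
print("Total angle:", round(sum(angles.values()), 1))
```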
III. Pictograms
In pictograms, we represent the data by means of some picture symbols. Here we decide a suitable picture
to represent a definite number of units in which the variable is measured.
Example: The following table shows the orange production in a plantation for the production
years 1990 to 1993. Represent the data by a pictogram.
Table: Orange productions from 1990 to 1993.
Production year 1990 1991 1992 1993
Amount (in kg) 3000 3850 3500 5000
IV. Histogram
A histogram is another way of presenting data which is more suitable for frequency distributions with
continuous classes. In drawing a histogram, we put the class boundaries of each class on the horizontal
axis and its respective frequency on the vertical axis.
Example: Draw a histogram presenting the following data.
Class Boundaries   Class Mark   Frequency   Cumulative Frequency   Cumulative Frequency
                                            (less than type)       (more than type)
5.5 – 11.5         8.5          2           2                      20
11.5 – 17.5        14.5         2           4                      18
17.5 – 23.5        20.5         7           11                     16
23.5 – 29.5        26.5         4           15                     9
29.5 – 35.5        32.5         3           18                     5
35.5 – 41.5        38.5         2           20                     2
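As a rough illustration, the class marks in the table can be recomputed from the class boundaries, and the histogram can be approximated in plain text with one row of symbols per class. The sideways text layout is an assumption made here for simplicity; a drawn histogram would use adjacent vertical bars.

```python
# Class boundaries and frequencies from the table above.
boundaries = [(5.5, 11.5), (11.5, 17.5), (17.5, 23.5),
              (23.5, 29.5), (29.5, 35.5), (35.5, 41.5)]
frequencies = [2, 2, 7, 4, 3, 2]

# Class mark: the midpoint of the class boundaries.
class_marks = [(lower + upper) / 2 for lower, upper in boundaries]

# Class width: upper boundary minus lower boundary (equal for every class here).
class_widths = [upper - lower for lower, upper in boundaries]

# Text histogram: one row per class, one '#' per observation.
for (lower, upper), freq in zip(boundaries, frequencies):
    print(f"{lower:>5.1f} - {upper:<5.1f} | {'#' * freq} ({freq})")
```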
V. Frequency Polygon
A frequency polygon is a line graph drawn by taking the frequencies of the classes along the vertical axis
and their respective class marks along the horizontal axis. The plotted points are then joined by straight line segments.
Example: Present the data in the previous example using a frequency polygon.
[Figure: Frequency polygon with frequency on the vertical axis and class marks (8.5, 14.5, 20.5, 26.5, 32.5, 38.5) on the horizontal axis.]
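[Figure: Cumulative frequency curves for the same data. The less than type cumulative frequencies are plotted against the upper class boundaries (11.5 to 41.5); a second curve is plotted against the lower class boundaries (5.5 to 35.5).]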