1.1 Definitions and Classification of Statistics: Chapter One: Introduction
Contents
1. Introduction
1.1 Definitions and classification of Statistics
1.2 Stages in statistical investigation
1.3 Definition of some terms
1.4 Use, scope, limitation & misuse of Statistics
1.4.1 Uses of Statistics
1.4.2 Scope of Statistics
1.4.3 Limitations of Statistics
Introduction
“Statistical thinking will one day be as necessary for efficient citizenship as the ability to
read and write.”
H. G. Wells
The words “Statistics” and “Statistical” are both derived from the Latin word status, which
means a political state. Statistics has been defined differently by different authors over
time. In the old days, statistics was confined to state affairs only, but today it embraces
almost every sphere of human activity. Therefore, a number of old definitions, which
were confined to a narrow field of enquiry, have been replaced by newer definitions that
are more comprehensive and exhaustive. Let us examine the different ways in which
various authors and dictionaries define statistics.
Beyond these definitions, the word statistics has two different senses, depending on
whether it is used as a singular or a plural noun. In the singular, statistics is the branch of
mathematics that deals with the collection, organization, analysis, and interpretation of
numerical data. Statistics is especially useful for drawing general conclusions about a set
of data from a sample of the data. In the plural, statistics are numerical data that have
been collected, classified, and interpreted.
Based on how statistical data are used, statistics is broadly divided into two mutually
exclusive groups, called descriptive statistics and inferential statistics.
Descriptive statistics are used to describe the basic features of the data in a study. They
provide simple summaries about the sample and the measures. Together with simple
graphical analysis, they form the basis of virtually every quantitative analysis of data.
Various techniques that are commonly used are classified as:
Example-1: Of 350 randomly selected people in the town of Addis Ababa, 280 had the
last name Abebe. An example of descriptive statistics is the following statement:
"80% of these people have the last name Abebe."
Example-2: On the last 3 Sundays, Hiwot, a car salesman, sold 2, 1, and 0 new cars,
respectively. An example of descriptive statistics is the following statement: "Hiwot
averaged 1 new car sold over the last 3 Sundays."
These are both descriptive statements because they can actually be verified from the
information provided.
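Both descriptive statements are plain arithmetic on the observed data, which is why they can be verified. A minimal Python sketch that recomputes them follows; the helper name `sample_proportion` is hypothetical, while `statistics.mean` is from Python's standard library.

```python
import statistics

def sample_proportion(count: int, n: int) -> float:
    """Share of the n observed units that have the characteristic of interest."""
    return count / n

# Example-1: 280 of the 350 sampled people have the last name Abebe.
p = sample_proportion(280, 350)
print(f"{p:.0%} of these people have the last name Abebe")

# Example-2: new cars sold on each of the last 3 Sundays.
sales = [2, 1, 0]
avg = statistics.mean(sales)
print(f"Hiwot averaged {avg} new car(s) sold over the last {len(sales)} Sundays")
```

Both numbers describe only the 350 people and the 3 Sundays actually observed; nothing in the computation says anything about people or Sundays outside the data.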
Example-3: Of 350 randomly selected people in the town of Addis Ababa, Ethiopia, 280
had the last name Abebe. An example of inferential statistics is the following
statement: "80% of all people living in Ethiopia have the last name Abebe."
We have no information about all the people living in Ethiopia, only about the 350
people sampled in Addis Ababa. We have taken that information and generalized it to all
people living in Ethiopia.
Example-4: On the last 3 Sundays, Hiwot, a car salesman, sold 2, 1, and 0 new cars,
respectively. An example of inferential statistics is the following statement: "Hiwot
never sells more than 2 cars on a Sunday."
Although this statement is true for the last 3 Sundays, we do not know that this is true for
all Sundays.
Before we deal with statistical investigation, let us see what statistical data mean. Not
every set of numerical data can be considered statistical data; to qualify, it must satisfy
the following criteria. These are:
A statistician should be involved in all the stages of a statistical investigation: formulating
the problem, and then collecting, organizing and classifying, presenting, analyzing, and
interpreting the statistical data. Let us look at each stage in detail.
Formulating the problem: research must first arise from a genuine problem. At
this stage, the investigator must make sure to understand the problem and then
formulate it in statistical terms. Clarify the objectives very carefully. Ask as
many questions as necessary, because “An approximate answer to the right
question is worth a great deal more than a precise answer to the wrong
question” (the first golden rule of applied mathematics).
Therefore, the first stage in any statistical investigation should be to:
Get a clear understanding of the physical background to the
situation under study;
Clarify the objectives;
Formulate the objectives in statistical terms.
Proper collection of data: in order to draw valid conclusions, it is important to
have ‘good’ data. Data are gathered with the aim of meeting predetermined
objectives; in other words, the data must provide answers to the problem. The
data themselves form the foundation of the statistical analysis, and hence they
must be collected carefully and accurately.
Organization and classification of data: at this stage, the collected data are
organized in a systematic manner, that is, the data are placed in relation to each
other. Classifying, or sorting, the data is by itself a kind of organization of
data.
Presentation of data: The purpose of putting the organized data in graphs,
charts and tables is two-fold. First, it is a visual way to look at the data and see
what happened and make interpretations. Second, it is usually the best way to
show the data to others. Reading lots of numbers in the text puts people to sleep
and does little to convey information.
Analysis of data: the process of looking at and summarizing data with the
intent to extract useful information and develop conclusions. Data analysis is
closely related to data mining, but data mining tends to focus on larger data sets,
puts less emphasis on making inferences, and often uses data that were originally
collected for a different purpose. At this stage, various inferential statistical
methods are applied, for instance hypothesis tests such as the χ² (chi-square)
test of association.
Interpretation of data: interpretation means drawing valid conclusions from
data which form the basis of decision making. Correct interpretation requires a
high degree of skill and experience.
Note: analysis and interpretation of data are two sides of the same
coin.
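The analysis stage mentions the chi-square (χ²) test of association. The Python sketch below carries hypothetical raw data through two of the stages: organization and classification (counting records into a contingency table) and analysis (Pearson's χ² statistic, with expected counts computed under independence as row total × column total / n). The data, variable names, and helper `chi_square_statistic` are all made up for illustration, and no continuity correction is applied.

```python
from collections import Counter

# Hypothetical raw data, made up for illustration only:
# one (gender, smokes) record per respondent, as it might appear on a data sheet.
raw = ([("M", "yes")] * 20 + [("M", "no")] * 30 +
       [("F", "yes")] * 10 + [("F", "no")] * 40)

# Organization and classification: count respondents in each (row, column) cell.
counts = Counter(raw)
rows = sorted({r for r, _ in raw})   # row categories: 'F', 'M'
cols = sorted({c for _, c in raw})   # column categories: 'no', 'yes'
table = [[counts[(r, c)] for c in cols] for r in rows]

def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

stat, df = chi_square_statistic(table)
print(rows, cols, table)  # ['F', 'M'] ['no', 'yes'] [[40, 10], [30, 20]]
print(f"chi-square = {stat:.3f} with {df} degree(s) of freedom")  # chi-square = 4.762 with 1 degree(s) of freedom

# Interpretation: 3.841 is the 5% critical value of the chi-square distribution with 1 df.
CRITICAL_5PCT_DF1 = 3.841
if df == 1 and stat > CRITICAL_5PCT_DF1:
    print("Reject independence at the 5% level: the variables appear to be associated.")
else:
    print("No evidence of association at the 5% level.")
```

If a library is used instead, note that SciPy's `scipy.stats.chi2_contingency` applies Yates' continuity correction by default when there is 1 degree of freedom, so its statistic for this table would be smaller than the uncorrected value computed here.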
In this section, we will define those terms which will be used most frequently. These are:
Data: Facts or figures from which conclusions can be drawn.
Data set: Facts or figures collected for a particular study. Each value in the data set is
called a data value or datum.
Raw Data: Data in the form in which they were originally recorded are called raw data.
They are usually recorded on data sheets, which are often hand drawn but can also be
printouts from spreadsheet programs such as Microsoft Excel.
Population: The totality of all subjects with certain common characteristics that are
being studied in a specified time and place.
Sample: A portion of a population that is selected using some sampling technique. A
sample must be representative of the population, so it should be selected using one of
the established sampling techniques.
Sampling: The process of selecting units (e.g., people, organizations) from a population
of interest so that, by studying the sample, we may fairly generalize our results back to
the population from which they were chosen. There are two types of sampling
techniques, namely random sampling techniques and non-random sampling techniques.
A random sampling technique, or probability sampling technique, gives every element of
the population a known, non-zero chance of being included in the sample. In other
words, there is no personal bias in the selection. The five common random sampling
techniques are:
Simple Random sampling
Systematic Random sampling
Stratified Random sampling
Cluster Random sampling
Multi-stage sampling
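The random sampling techniques listed above can be sketched with Python's standard `random` module. Everything about the population here is an assumption made for illustration: a hypothetical frame of 100 numbered units, the first 60 labelled "urban" and the rest "rural" (used as strata), and blocks of ten consecutive ids treated as clusters such as villages. Proportional allocation in the stratified sampler is one common choice, not the only one, and the multi-stage sampler shows only the simplest two-stage case.

```python
import random

# Hypothetical sampling frame (made up for illustration): 100 units with ids 0..99.
# Units 0-59 are labelled "urban" and 60-99 "rural"; these labels act as strata.
# Consecutive blocks of ten ids (0-9, 10-19, ...) act as clusters, e.g. villages.
population = [(uid, "urban" if uid < 60 else "rural") for uid in range(100)]

def stratum_of(unit):
    return unit[1]

def cluster_of(unit):
    return unit[0] // 10

def group_by(units, key):
    """Split units into groups (strata or clusters) keyed by key(unit)."""
    groups = {}
    for u in units:
        groups.setdefault(key(u), []).append(u)
    return groups

def simple_random_sample(units, n, rng):
    """Draw n units without replacement; every subset of size n is equally likely."""
    return rng.sample(units, n)

def systematic_sample(units, n, rng):
    """Pick a random start in the first interval, then take every k-th unit."""
    k = len(units) // n          # sampling interval
    start = rng.randrange(k)     # random start in 0..k-1
    return units[start::k][:n]

def stratified_sample(units, n, rng, key):
    """Proportional allocation: a simple random sample inside every stratum.
    Rounding can make the total differ slightly from n for other frame sizes."""
    sample = []
    for members in group_by(units, key).values():
        share = round(n * len(members) / len(units))
        sample.extend(rng.sample(members, share))
    return sample

def cluster_sample(units, n_clusters, rng, key):
    """Randomly choose whole clusters and keep every unit inside them."""
    clusters = group_by(units, key)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [u for c in chosen for u in clusters[c]]

def two_stage_sample(units, n_clusters, n_per_cluster, rng, key):
    """Multi-stage (here two-stage): choose clusters, then sample units within each."""
    clusters = group_by(units, key)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [u for c in chosen for u in rng.sample(clusters[c], n_per_cluster)]

rng = random.Random(42)  # fixed seed so the draws are reproducible
srs = simple_random_sample(population, 10, rng)
sys_s = systematic_sample(population, 10, rng)
strat = stratified_sample(population, 10, rng, stratum_of)
clus = cluster_sample(population, 2, rng, cluster_of)
ms = two_stage_sample(population, 2, 3, rng, cluster_of)
print(len(srs), len(sys_s), len(strat), len(clus), len(ms))  # 10 10 10 20 6
```

In every function the unit ids actually chosen depend only on the random generator, not on the investigator's preferences, which is what separates these techniques from the non-random ones.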
Non-random sampling techniques are also known as non-probability sampling
techniques. In this case, not all elements of the population have a known chance of
inclusion, or some elements have zero chance of being selected into the sample. The
most familiar examples of non-random sampling techniques are:
Quota sampling
Convenience sampling
Volunteer sampling
Purposive sampling
Haphazard sampling
Snowball sampling, etc.
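Two of the non-random techniques listed above, convenience sampling and quota sampling, can be sketched in a few lines of Python. The "arrivals" list is entirely hypothetical: people in the order they happen to pass an interviewer, with the first 30 being students. The function names are illustrative only. The sketch shows why such samples can be biased: who gets selected depends on who is easiest to reach, not on chance.

```python
# Hypothetical stream of people in the order they pass an interviewer
# (made-up data): ids 0-29 are students, ids 30-99 are workers.
arrivals = [(i, "student" if i < 30 else "worker") for i in range(100)]

def convenience_sample(stream, n):
    """Take whoever is easiest to reach: here, simply the first n arrivals."""
    return stream[:n]

def quota_sample(stream, quotas, group_of):
    """Accept people in arrival order until each group's quota is filled.
    Selection within a group is not random, so inclusion chances are unknown."""
    filled = {g: 0 for g in quotas}
    sample = []
    for person in stream:
        g = group_of(person)
        if filled.get(g, 0) < quotas.get(g, 0):
            sample.append(person)
            filled[g] += 1
        if filled == quotas:   # stop once every quota is met
            break
    return sample

conv = convenience_sample(arrivals, 10)
quota = quota_sample(arrivals, {"student": 5, "worker": 5}, group_of=lambda p: p[1])
print([p[1] for p in conv].count("student"))  # 10: only students were reached
print([p[0] for p in quota])  # [0, 1, 2, 3, 4, 30, 31, 32, 33, 34]
```

The quota sample matches the target group sizes, yet the people chosen within each group are still just the earliest arrivals, so it shares the selection bias of the convenience sample.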
Sample size: The number of elements or observations to be included in the sample.
Variable: An attribute of a physical or an abstract system that may change its value
while it is under observation. Variables are often specified according to their type and
intended use; hence, variables can be classified into two types, namely qualitative and
quantitative variables.
device. There are no gaps between possible values; they are obtained by measuring. For
example, consider the heights of two people: no matter how close they are, we can
always find another person whose height falls somewhere between the two. Height is
therefore a continuous variable.
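The "always another height in between" argument can be written out as a one-line derivation. The symbols h_1 and h_2 are introduced here for illustration and stand for any two distinct heights.

```latex
h_1 < h_2 \;\Longrightarrow\; h_1 < \frac{h_1 + h_2}{2} < h_2,
\qquad\text{because}\qquad
\frac{h_1 + h_2}{2} - h_1 \;=\; h_2 - \frac{h_1 + h_2}{2} \;=\; \frac{h_2 - h_1}{2} \;>\; 0 .
```

For example, 1.705 m lies between 1.70 m and 1.71 m, and repeating the argument with 1.70 m and 1.705 m gives 1.7025 m. Since this never stops, no two possible heights are "next to" each other, which is what having no gaps between possible values means.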
movement of the two variables. In such cases, it is the user who has to
interpret the results carefully, pointing out the type of relationship obtained.