MEANING AND DEFINITION OF STATISTICS
Introduction to Statistics
If you have ever calculated the average of a set of numbers, or drawn a bar chart, pie chart or
any other visual display of data, you have already worked with statistics. The world we live in
today is replete with numbers and data of all kinds, especially since the boom in
consumer-focused technologies such as smartphones, personal computers, the Internet and social
media.
Students at all levels of education, starting from primary school, study statistics in one
form or another, which underscores the importance of the subject. When you take a statistics
course as a major or minor part of your academic discipline, you will certainly have to do
tasks and assignments that involve data processing.
Definitions
Statistics is a sub-discipline or branch of mathematics that concerns data. It is heavily focused
on the collection, organization, analysis, presentation, and interpretation of data.
Statistics (in the plural) are numerical statements of facts capable of analysis and
interpretation; the science of statistics is the study of the principles and methods applied in
collecting, presenting, analyzing and interpreting numerical data in any field of inquiry.
Statistics is the branch of mathematics that deals with the collection, organization, analysis
and interpretation of numerical data.
Statistics is the science of estimates and probabilities.
Statistics as a Science: Science is a body of systematized knowledge. Any subject can be put in
the category of science if it possesses the following characteristics:
It is a systematized body of knowledge.
Its laws and methods are universally acceptable.
It analyzes cause-effect relationships.
It possesses the quality of estimation and forecasting.
Prof. Horace Secrist defines Statistics as an aggregate of facts affected to a marked extent by
multiplicity of causes numerically expressed, enumerated or estimated according to reasonable
standards of accuracy, collected in a systematic manner for a pre-determined purpose and placed
in relation to each other.
Secrist’s definition emphasises that statistics (as numerical data) must have the following characteristics:
o Statistics should be an aggregate of facts; single and isolated figures may not be termed
statistics. For example, the age of one individual or the price of a single commodity does not
form statistics.
o Statistics should be affected by multiple causes; usually, numerical figures (data) are
influenced by a variety of factors (causes).
o Statistics should be expressed numerically; things, attributes and conditions are counted.
o A reasonable standard of accuracy in estimation or enumeration is essential.
o Collection of statistics should be in a systematic manner.
o Collection should be for a pre-determined purpose, to avoid waste of time, energy,
money and labour.
Branches of statistics
Descriptive statistics deals with the collection and presentation of data and with finding
averages and other measures that describe the data.
Inferential statistics deals with the techniques used for the analysis of data, making
estimates and drawing conclusions about a population on the basis of results obtained from
samples.
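As a minimal sketch of the descriptive branch, the Python snippet below summarizes a small data
set with an average and a few other measures. The marks are made-up values used only for
illustration, and the variable names are not taken from these notes.

```python
import statistics

# Hypothetical examination marks for five students (illustrative values only).
marks = [52, 60, 60, 71, 77]

mean_mark = statistics.mean(marks)      # arithmetic average
median_mark = statistics.median(marks)  # middle value of the sorted data
mode_mark = statistics.mode(marks)      # most frequent value
mark_range = max(marks) - min(marks)    # spread between largest and smallest
std_dev = statistics.stdev(marks)       # sample standard deviation (n - 1 divisor)

print(mean_mark, median_mark, mode_mark, mark_range, round(std_dev, 2))
```

Inferential statistics would go one step further and use summaries like these, computed from a
sample, to say something about the wider population the marks came from.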
Population and Samples
A population is an aggregate of individuals, objects or materials about which some sort of
information is required. Put differently, a population is the set of objects we wish to study.
The objects may be people, things, or data (numbers) associated with people, things and events.
For example, we may have to study the population of fans produced in a factory during a certain
period. Other examples are all the members of a class, or the set of productivity levels of all
employees under a certain programme.
A population may be finite or infinite. If a population consists of a fixed number of values,
it is said to be finite; otherwise, it is infinite. An infinite population consists of an
endless succession of values. In practice, the term infinite population is used to refer
to a population that cannot be enumerated in a reasonable period of time.
A sample is a part selected from the population. The selected part enables us to learn about
the characteristics of the population. For example, a sample of ten fans may be selected from a
big lot of fans produced in a factory in order to learn about the performance of all the fans.
The objective of sampling is to form conclusions about a population based on a sample drawn
from that population.
The following are some of the reasons for sampling:
If it is too costly or time-consuming to consider the entire population.
If the test is destructive.
If sufficiently accurate results can be obtained from a sample.
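The idea of learning about a population from a small sample can be sketched in Python as
follows. This is only an illustration under stated assumptions: the fan lifetimes are simulated
(normally distributed around 5,000 hours, a figure invented for the example), and simple random
sampling is just one of several possible sampling schemes.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: lifetimes (in hours) of 5,000 fans from one production period.
population = [rng.gauss(5000, 500) for _ in range(5000)]

# Draw a simple random sample of ten fans without replacement.
sample = rng.sample(population, 10)

population_mean = statistics.mean(population)  # normally unknown in practice
sample_mean = statistics.mean(sample)          # what we can actually compute

print(f"population mean: {population_mean:.1f} hours")
print(f"sample mean:     {sample_mean:.1f} hours")
```

In a real study only the sample mean would be available; the population mean is computed here
purely to show how close the estimate from ten fans can be.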
DATA
Data are the facts and figures that are collected, analyzed, and summarized for presentation and
interpretation. Basic to any successful statistical analysis is the collection of data. By data, we
mean information obtained by measuring or observing characteristics or attributes of individuals
in a group.
The validity of any conclusions drawn from a statistical analysis largely depends on the nature,
relevance and accuracy of the data. The data may be regarded as the “raw material” of any
statistical analysis.
Collection of data
There are two ways of collecting data:
a) Primary data; data collected first-hand by the organization that uses and publishes them.
Methods of collecting primary data
Direct personal observation
Registration
Estimates through local correspondents or sources
Investigation or collection through enumeration
Questionnaire method
b) Secondary data; data used by an organization other than the one that originally collected
them.
Sources of secondary data
Official sources (GSS, UNESCO, IMF, BoG)
Private sources
Publications of research organisations
The Structure of Data
The structure of data is largely determined by the types of variables involved, the number of
variables and the number of observations.
Variable and constant
A quantity which changes from individual to individual, item to item or object to object is
called a variable. In other words, a variable is a phenomenon that changes. A variable is also
called a variate and is denoted by x, y or z. A quantity which does not change, that is, one
which is fixed, is called a constant.
Types of Variable
Every individual possesses a certain number of characteristics or attributes. Colour, age,
height and examination marks are examples of characteristics or attributes of an observation.
Characteristics can assume different values, categories or descriptions. For example, the colour
of a pencil can be described as red, yellow, green, blue, etc., so the colour of a pencil is a
variable.
There are, broadly speaking, two types of variables: quantitative and qualitative variables.
The statistical analysis that is appropriate for a particular variable depends upon whether the
variable is qualitative or quantitative.
Qualitative Variable
Qualitative variables are labels or names used to identify an attribute of each element. They
use either the nominal or the ordinal scale of measurement and may be non-numeric or numeric.
On the nominal scale, no order or magnitude is implied; on the ordinal scale, however, order is
important, for example police rank. Qualitative variables do not adhere to a meaningful
numerical scale; rather, they consist of categories and labels that may or may not have a
natural ordering. Qualitative data cannot be measured or counted directly (without introducing
coding). They can only be described or classified according to the characteristics which they
share in common, so qualitative data consist of counts or frequencies in various categories. A
qualitative variable changes in quality. To illustrate, suppose a footballer, Asamoah Gyan, has
the number “3” on his jersey. This is simply a label that identifies the player.
Qualitative data that have a natural order but provide no information about the difference
between adjacent positions are ordinal scaled. For example, if a runner finishes fourth in a race,
the ‘4’ identifies the order of finish for this runner in a natural sequence from first to last; but it
provides no information about the difference in times between this runner and the runners who
finished third and fifth.
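The runner example can be made concrete with a short Python sketch. The finishing times below
are invented for illustration; the point is only that equal steps in finishing position do not
imply equal steps in time.

```python
# Hypothetical race results: (runner, finishing position, finishing time in seconds).
results = [("Runner A", 3, 1805), ("Runner B", 4, 1806), ("Runner C", 5, 1872)]

positions = [position for _, position, _ in results]
times = [seconds for _, _, seconds in results]

# Gaps between adjacent finishing positions are always 1 ...
position_gaps = [later - earlier for earlier, later in zip(positions, positions[1:])]
# ... but the corresponding time gaps can be very different.
time_gaps = [later - earlier for earlier, later in zip(times, times[1:])]

print(position_gaps)  # [1, 1]
print(time_gaps)      # [1, 66]
```

The finishing position is ordinal: it can be sorted and compared, but subtracting positions
says nothing about how far apart the runners actually were.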
Quantitative Variable
A variable which changes in quantity is called a quantitative variable. Quantitative variables
take numeric values that indicate how much or how many. They can be measured or counted. Age,
temperature and weight are examples of quantitative variables. They are numerically
distinguished and ordered, and the differences among them are meaningful. For example, the
runner’s time in a race is a quantitative variable because it measures the amount of time it took
the runner to finish this race.
A quantitative variable can be continuous or discrete. A continuous variable such as weight can
take any value on a continuous scale, although in practice such a value may be rounded to a
specified number of significant figures.
Discrete variables take only specific values with no possibility of any other values between them.
They can only take a value from a set of distinct values such as nonnegative integers. Arithmetic
operations often provide meaningful results for quantitative variables. The data may be added
and then divided by the number of observations to compute the average value.
Basic comparison | Qualitative data | Quantitative data
Definition | Qualitative data is information that cannot be expressed as a number | Quantitative data is data that can be expressed as a number or can be quantified
Can data be counted? | No | Yes
Data type | Words, objects, pictures, observations, and symbols | Numbers and statistics
Questions that data answer | How and why this has happened? | How many, how much and how often
Examples | Names such as Emma, Ana, Ella, Ho, Saviefe; green, blue | Scores on tests and exams, weight of a person, shoe size
Purpose of data analysis | Understand, explain and interpret social interactions and patterns | Test hypotheses, develop predictions for the future, check cause and effect
Types of data analysis | Patterns, characteristics, theme identification | Statistical relationship identification
Scope of the results | Less generalization, particular findings. Do not drive conclusions and generalizations across a population | Generalization findings. Draw conclusions and trends about a large population based on a sample taken from it
Popular methods of data analysis | Content analysis, thematic analysis, discourse analysis, grounded theory, conversation analysis | Linear regression model, logistic regression, analysis of variance, correlation analysis, distribution
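[Figure: tabular and graphical methods for summarizing qualitative and quantitative data]
Data
Qualitative (Nominal and Ordinal)
Tabular methods: frequency distribution, relative frequency distribution, percent frequency
distribution, crosstabulation
Graphical methods: bar graph, pie chart
Quantitative (Continuous and Discrete)
Tabular methods: frequency distribution, relative frequency distribution, percent frequency
distribution, cumulative frequency distribution, cumulative relative frequency distribution,
crosstabulation
Graphical methods: histogram, ogive, stem-and-leaf display, scatter plot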
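Several of the tabular methods named above (frequency, relative frequency, percent frequency
and cumulative frequency distributions) can be built from the same counts. The Python sketch
below shows one way to do this; the data are hypothetical, and the layout of the printed table
is a choice made for this example rather than a prescribed format.

```python
from collections import Counter

# Hypothetical data: number of siblings reported by ten students.
siblings = [1, 2, 2, 3, 3, 3, 4, 4, 2, 3]

counts = Counter(siblings)
n = len(siblings)

table = []       # rows of (value, frequency, relative, percent, cumulative)
cumulative = 0
for value in sorted(counts):
    frequency = counts[value]
    cumulative += frequency          # running total of frequencies so far
    relative = frequency / n         # share of all observations
    table.append((value, frequency, relative, 100 * relative, cumulative))

print("value  freq  rel.freq  percent  cum.freq")
for value, frequency, relative, percent, cum in table:
    print(f"{value:>5}  {frequency:>4}  {relative:>8.2f}  {percent:>7.1f}  {cum:>8}")
```

The relative frequencies always sum to 1 and the last cumulative frequency equals the number of
observations, which gives a quick check on the table.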
Basic steps in statistical analysis (DCOVA)
DEFINE the variables that you want to study in order to solve a business problem or meet a
business objective
COLLECT data from appropriate sources
ORGANISE the data by developing tables
VISUALISE the data by developing charts
ANALYZE the data by examining the appropriate tables and charts, and using other
statistical methods to reach conclusions.
Difference between Parameter and Statistic
PARAMETER | STATISTIC
A parameter is a specific characteristic of a population | A statistic is a specific characteristic of a sample
The symbol µ is used for the population mean and the symbol σ for the population standard deviation | The symbol x̄ is used for the sample mean and the symbol s for the sample standard deviation
Parameters are usually unknown and are estimated from a sample | The sample statistic is used to draw conclusions about the population parameter
A population parameter is a fixed quantity (constant) | A sample statistic is not a fixed quantity (variable)
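For reference, the usual formulas behind these four quantities can be written as follows. The
notation N for the population size and n for the sample size is an assumption, since these
notes do not name them, and the n − 1 divisor is the common convention for the sample standard
deviation rather than the only one in use.

```latex
\[
\mu = \frac{1}{N}\sum_{i=1}^{N} x_i,
\qquad
\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2}
\]
\[
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2}
\]
```

The first pair describes the whole population and is fixed; the second pair is computed from
one particular sample and changes from sample to sample.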
Parametric or Non-Parametric Tests
Before choosing a statistical test to apply to your data, you should decide whether your data
are parametric or nonparametric. This can be a subtle decision, but as a general guide:
• Ranks, scores, or categories are generally nonparametric data.
• Measurements that come from a population that is normally distributed can usually
be treated as parametric.
Does it matter whether you choose a parametric or nonparametric test?
– Large data sets present no problems.
– Small data sets present a dilemma.
Parametric Test
• Parametric statistical tests are based upon the assumption that the data are sampled
from a Gaussian (normal) distribution.
• These tests include the t test and analysis of variance.
• For parametric statistical tests, it is important that the assumption made about the
probability distribution is valid.
• If this assumption about the data is true, parametric tests:
– are more powerful than their equivalent nonparametric counterparts,
– can detect differences with smaller sample sizes,
– can detect smaller differences with the same sample size.
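To give a feel for what a parametric test computes, the Python sketch below calculates a
two-sample t statistic by hand. It is an illustration only: it uses Welch's form of the
statistic (which does not assume equal variances), a choice made for this example, and the two
groups of measurements are invented.

```python
import math
import statistics

def welch_t_statistic(sample_a, sample_b):
    """Two-sample t statistic that does not assume equal variances (Welch's form)."""
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    # Standard error of the difference between the two sample means.
    standard_error = math.sqrt(var_a / len(sample_a) + var_b / len(sample_b))
    return (mean_a - mean_b) / standard_error

# Hypothetical measurements from two groups (illustrative values only).
group_a = [5, 7, 9]
group_b = [1, 3, 5]

t_value = welch_t_statistic(group_a, group_b)
print(round(t_value, 3))
```

A larger absolute t value indicates stronger evidence of a difference in means. Converting it
to a p-value requires the t distribution, which a statistics library such as SciPy provides
(for example `scipy.stats.ttest_ind` with `equal_var=False`).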
Non-parametric Test
• Tests that do not make assumptions about the population distribution are referred to as
nonparametric tests.
• All commonly used nonparametric tests rank the outcome variable from low to high and
then analyze the ranks.
• These tests include the Wilcoxon, Mann-Whitney and Kruskal-Wallis tests.
• These tests are also called distribution-free tests.
Nonparametric tests:
• Assume that your data have an underlying continuous distribution.
• Assume that, for the groups being compared, the parent distributions are similar in all
characteristics other than location.
• Are usually less sensitive than parametric methods.
• Are often more robust than parametric methods when their assumptions are properly met.
• Can run into problems when there are many ties (data with the same value).
• Are more powerful when they take into account the magnitude of the difference between
categories (e.g. Wilcoxon signed ranks test) than when they do not (e.g. sign test).
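The ranking step that these tests share can be sketched in Python. The example below computes
the Mann-Whitney U statistics for two small groups, giving tied values the average of the ranks
they occupy. The helper names `average_ranks` and `mann_whitney_u` and the data are
hypothetical, and the sketch stops at the U statistics without deriving a p-value.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def mann_whitney_u(sample_a, sample_b):
    """Return the pair (U_a, U_b) computed from the ranks of the pooled data."""
    ranks = average_ranks(list(sample_a) + list(sample_b))
    n_a, n_b = len(sample_a), len(sample_b)
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    return u_a, u_b

# Hypothetical scores for two groups; the value 5 appears in both (a tie).
group_a = [3, 5, 8]
group_b = [5, 9, 10]

u_a, u_b = mann_whitney_u(group_a, group_b)
print(u_a, u_b)  # 1.5 7.5
```

Because only the ranks enter the calculation, replacing 10 with 1,000 in the second group would
leave both U values unchanged, which is why such tests are less affected by extreme values.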
Qn 1
Ghana Airport Company collects data on visitors to Accra. The following questions were among
16 asked in a questionnaire handed out to passengers during incoming airline flights in
September, 2014.
a) This trip to Accra is my: 1st, 2nd, 3rd, 4th, etc.
b) The primary reason for this trip is: (10 categories including vacation, convention,
honeymoon, Business, seminar)
c) Where I plan to stay: (11 categories including hotel, apartment, relatives, camping)
d) Total days in Accra
What is the population being studied?
Is the use of a questionnaire a good way to reach the population of passengers on
incoming airline flights?
Comment on each of the four questions in terms of whether it will provide qualitative or
quantitative data.
Qn 2
Suppose that you measure the time it takes to download a video from the Internet.
a. Explain why the download time is a continuous numerical variable.
b. Explain why the download time is a ratio-scaled variable.
Qn 3
The director of market research at a large department store chain wanted to conduct a survey
throughout the Municipality to determine the amount of time working women spend shopping
for clothing in a typical month.
a). Describe both the population and the sample of interest.
b). Indicate the type of data the director might want to collect.
c). Develop a first draft of the questionnaire needed in (b) by writing three categorical
questions and three numerical questions that you feel would be appropriate for this
survey.
Qn 4
State whether each of the following variables is qualitative or quantitative and indicate its
measurement scale.
a) Gender
b) Class rank
c) Make of automobile
d) Number of people favoring the death penalty
e) Annual Sales
f) Soft-drink size (small, medium, large)
g) Employee classification (GS1 through GS18)
h) Earnings per share
i) Method of payment (cash, cheque, credit card)
j) Age
Qn 5
Identify each random variable as being either discrete or continuous.
a) The number of blue balls selected randomly from a box containing six red and eight blue
balls
b) The number shown after a die is tossed once
c) The speed of a car on a highway
d) Weight loss after a series of aerobic exercises.
Qn 6
(a) What is the difference between a parameter and a statistic?
(b) Explain three reasons why you would adopt sampling.
(c) What are the differences between parametric and nonparametric tests?
(d) Mention five each of the parametric and nonparametric statistical tools.
(e) Briefly explain descriptive statistics and inferential statistics.
(f) State five differences between qualitative and quantitative variables.