Statistics Chapter 1


CHAPTER ONE

Definition of terms
Data:

1. Figures or facts from which conclusions can be drawn.

2. The numerical results of any scientific measurement.

Any value that is expressed in numbers is called data.

Population: the totality of all elements under study.

Sample: a portion or part of the population taken so that some generalization about the
population can be made.

It is a subset of the population that is assumed to be representative of the population.
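The relation between a population and a sample can be sketched in a few lines of Python. The ages below are made-up data, and the seed, sizes and variable names are assumptions chosen only for illustration.

```python
import random
import statistics

# Hypothetical population: ages of all 1,000 students in a college (made-up data).
rng = random.Random(42)  # fixed seed so the run is repeatable
population = [rng.randint(18, 30) for _ in range(1000)]

# A sample is a subset of the population: here 50 students chosen at random
# without replacement.
sample = rng.sample(population, k=50)

population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

print(f"Population size: {len(population)}, mean age: {population_mean:.2f}")
print(f"Sample size:     {len(sample)}, mean age: {sample_mean:.2f}")
```

The sample mean will usually be close to, but not exactly equal to, the population mean; that gap is what generalizing from a sample has to account for.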

Definition of Statistics

Statistics can be defined in two senses: plural (as Statistical Data) and singular (as
Statistical Methods).

Plural sense: statistics are collections of facts (figures).

This meaning of the word is widely used when reference is made to facts and figures on sales,
employment or unemployment, accidents, weather, deaths, education, etc.

Eg: Sales Statistics, Labor Statistics, Employment Statistics, etc.

In this sense, the word Statistics simply means data. But not all numerical data are statistics.

Singular sense: statistics is the science that deals with the methods of collection, organization,
presentation, analysis and interpretation of data.

It refers to the subject area that is concerned with extracting relevant information from available
data with the aim of making sound decisions.

According to this meaning, statistics is concerned with the development and application of
methods and techniques for collecting, organizing, presenting, analyzing and interpreting
statistical data.
Stages in Statistical investigation

According to the singular sense definition of statistics, a statistical study involves five stages:
Collection of Data, Organization of Data, Presentation of Data, Analysis of Data and
Interpretation of Data.

1. Collection of Data: This is the first stage in any statistical investigation and involves the process
of obtaining (gathering) a set of related measurements to meet predetermined objectives.
The data collected may be primary data (data collected directly by the investigator) or
secondary data (data obtained from intermediate sources such as newspapers,
journals, official records, etc.).
2. Organization of Data: It is usually not possible to draw any conclusion about the main
features of the data from direct inspection of the observations. The second purpose of
statistics is to describe the properties of the data in summary form. This stage of statistical
investigation helps to give a clear understanding of the information gathered and includes
editing (correcting), classifying and tabulating the collected data in a systematic manner.
The first step in the organization of data is editing, which means correcting (adjusting)
omissions, inconsistencies, irrelevant answers and wrong computations in the collected
data. The second step is classification, that is, arranging the
collected data according to some common characteristics. The last step is tabulation,
that is, presenting the classified data in tabular form using rows and columns.
3. Presentation of Data: The purpose of data presentation is to give an overview of what the data
actually look like and to facilitate statistical analysis.
Data can be presented using graphs and diagrams, which are easy to remember
and facilitate comparison.

4. Analysis of Data: The analysis of data is the extraction of summarized and comprehensive
numerical description in order to reach conclusions or provide answers to a problem. The
problem may require simple or sophisticated mathematical expressions.

5. Interpretation of Data: This is the last stage of statistical investigation. Interpretation
involves drawing conclusions from the data collected and analyzed in order to make decisions.
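The five stages can be walked through on a tiny data set. The following is only a sketch: the family sizes are invented, and treating `None` as an omitted answer and negative numbers as wrong entries is an assumption made for the example.

```python
from collections import Counter
import statistics

# 1. Collection: hypothetical family sizes recorded by an enumerator.
#    None marks an omitted answer, -3 is an obviously wrong entry.
raw = [4, 5, None, 3, 4, -3, 6, 4, 5, 3]

# 2. Organization
#    Editing: drop omissions and impossible values.
edited = [x for x in raw if x is not None and x > 0]
#    Classification and tabulation: count how often each family size occurs.
frequency = Counter(edited)

# 3. Presentation: a simple text table with a bar of stars per class.
print("Family size | Frequency")
for size in sorted(frequency):
    print(f"{size:>11} | {'*' * frequency[size]}")

# 4. Analysis: a summary measure of the edited data.
average = statistics.mean(edited)

# 5. Interpretation: state what the number means for the problem at hand.
print(f"On average a surveyed family has {average:.2f} members.")
```

In a real study, editing would follow documented rules rather than simply discarding values, but the order of the stages is the same.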
Classification of Statistics

Based on the scope of the decision, statistics can be classified into two: Descriptive and
Inferential Statistics.

1) Descriptive Statistics refers to the procedures used to organize and summarize masses of data.

It is concerned with describing or summarizing the most important features of the data.

It deals only with the characteristics of the collected data without going beyond it.

The methodology of descriptive statistics includes the methods of organizing (classification,
tabulation, Frequency Distributions) and presenting (Graphical and Diagrammatic Presentation)
data and calculations of certain indicators of data like Measures of Central Tendency and
Measures of Dispersion (Variation) which summarize some important features of the data.

2) Inferential (Inductive) Statistics includes the methods used to find out something about a
population based on a sample.

It is concerned with drawing statistically valid conclusions about the characteristics of the
population based on information obtained from a sample. Performing hypothesis tests,
determining relationships between variables and making predictions are also part of inferential
statistics.
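The difference between the two branches can be illustrated on one small sample. The ages are made-up, and the interval uses the normal value 1.96 as a rough assumption; for a sample this small a t-value would normally be preferred.

```python
import math
import statistics

# Hypothetical sample: ages of 10 students drawn from a larger class.
sample = [19, 21, 22, 20, 23, 21, 20, 22, 24, 18]

# Descriptive statistics: summarize the collected data only.
mean = statistics.mean(sample)
std_dev = statistics.stdev(sample)  # sample standard deviation

# Inferential statistics: say something about the population mean.
# Rough 95% interval based on the normal value 1.96 (an assumption).
standard_error = std_dev / math.sqrt(len(sample))
lower = mean - 1.96 * standard_error
upper = mean + 1.96 * standard_error

print(f"Descriptive: sample mean = {mean}, sample std dev = {std_dev:.2f}")
print(f"Inferential: population mean is roughly between {lower:.2f} and {upper:.2f}")
```

The first printed line describes only the ten observed students; the second makes a claim about students who were never observed, which is what makes it inferential.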

Ex: Classify the following statements as Descriptive or Inferential Statistics.

a. The average age of the students in this class is 21 years.
b. At least 5% of the killings reported last year in city X were due to tourists.
c. Of the students enrolled in Rift Valley University this year, 74% are male and 26% are
female.
d. The chance of winning the Ethiopian National Lottery on any day is 1 out of 167000.
Variable
A variable is a characteristic or an attribute that can assume different values.
Eg: Height, Family size, Gender
Based on the values that they assume, variables can be classified as
1. Qualitative variables: do not assume numeric values.
Eg: Gender
2. Quantitative variables: assume numeric values. These variables are numeric in nature.
Eg: Height, Family size
• Discrete variable: takes whole number values and consists of distinct,
recognizable individual elements that can be counted. It is a variable that
assumes a finite or countable number of possible values. These values are
obtained by counting (0, 1, 2, ...).
Eg: Family size, Number of children in a family, Number of cars at a
traffic light
• Continuous variable: takes any value, including decimals. Such a variable
can theoretically assume an infinite number of possible values. These
values are obtained by measuring.
Eg: Height, Weight, Time, Temperature
Generally, the values of a variable can be obtained by counting for discrete variables, by
measuring for continuous variables or by making categories for qualitative variables.
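As a rough illustration, the classification above can be mimicked by looking at how the values were recorded. The function `classify` is a hypothetical helper written for this sketch, and it is only a heuristic: a continuous variable recorded in whole units (for example height in whole centimetres) would be reported as discrete.

```python
def classify(values):
    """Rough heuristic: guess the variable type from the recorded values."""
    if all(isinstance(v, str) for v in values):
        return "qualitative"            # categories, e.g. Gender
    if all(isinstance(v, int) for v in values):
        return "quantitative, discrete"  # obtained by counting
    return "quantitative, continuous"    # obtained by measuring


print(classify(["male", "female", "female"]))  # Gender
print(classify([3, 5, 4, 6]))                  # Family size
print(classify([1.72, 1.65, 1.80]))            # Height in metres
```

The decision really depends on the nature of the variable, not on the stored data type, so the output should be read as a first guess only.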
Ex: Classify each of the following as Qualitative or Quantitative and, if it is quantitative, classify it
as Discrete or Continuous.

a. Color of automobiles in a dealer’s show room.
b. Number of seats in a movie theater.
c. Classification of patients based on nursing care needed (complete, partial or self-care).
d. Number of tomatoes on each plant on a field.
e. Weight of newly born babies.

Scales of Measurements/Levels of Measurements


Consider the following two cases.

• Mr A wears shirt number 5 when he plays football.
• Mr B wears shirt number 6 when he plays football.
Who plays better?
What is the average shirt number?
• Mr A scored 5 in a Stat quiz.
• Mr B scored 6 in a Stat quiz.
Who did better?
What is the average score?

Based on the numbers on the shirts it is not possible to judge whether Mr B plays better. But
using the test scores, it is possible to judge that Mr B did better in the quiz. It is also not possible to
find a meaningful average shirt number, because the numbers on
the shirts are simply codes, but it is possible to obtain the average test score.
Therefore, scales of measurement:

• Show the information contained in the value of a variable.
• Show which mathematical operations and which statistical analyses are
permissible on the values of the variable.

• Nominal scales apply to qualitative variables that show the category of
individuals. They reflect classification into categories (names of groups) where there is no
particular order or qualitative difference between the labels. Numbers may be assigned to the
variables simply for coding purposes. It is not possible to compare individuals based on the
numbers assigned to them. The only mathematical operation permissible on these variables is
counting.
These variables
- Have mutually exclusive (non-overlapping) and exhaustive categories.
- Have no ranking or order between (among) the values of the variable.
Eg: Gender, Religion, ID No, Ethnicity, Color
• Ordinal scales apply to qualitative variables whose values can be ordered
and ranked. Ranking and counting are the only mathematical operations that can be performed on the
values of these variables. There is, however, no precise difference between the values (categories) of
the variable.
Eg: Academic qualifications (B.Sc., M.Sc., Ph.D.), Grade scores (A, B, C, D, F), Strength
(very weak, weak, strong, very strong), Health status (very sick, sick, cured)
• Interval scales apply to quantitative variables for which a value of
zero does not show absence of the characteristic, i.e. there is no true zero. Zero indicates a low
value rather than nothing. There is a precise difference between the units of measurement (levels).
Eg: Temperature. 0 °C does not mean that there is no temperature, only that it is cold.
• Ratio scales apply to quantitative variables for which a value of
zero shows absence of the characteristic, i.e. there is a true zero.
Eg: Height, Weight, Income, Amount of yield, Expenditure, Consumption.
All mathematical operations can be performed on the values of these variables.
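The four scales can be compared by the operations that remain meaningful on each. The sketch below uses invented values; using the mode for nominal data and the median for ordinal data is the usual convention, not something stated in these notes.

```python
import statistics

# Nominal: shirt numbers are only codes, so counting (and the mode) is all
# that makes sense.
shirt_numbers = [5, 6, 5, 10]
most_common_shirt = statistics.mode(shirt_numbers)

# Ordinal: grades can be ranked, so a median is meaningful.
grade_order = ["F", "D", "C", "B", "A"]            # lowest to highest
grades = ["B", "A", "C", "B", "D"]
ranks = sorted(grade_order.index(g) for g in grades)
median_grade = grade_order[ranks[len(ranks) // 2]]  # odd count: middle value

# Interval: differences are meaningful, ratios are not (no true zero).
temp_morning_c, temp_noon_c = 10.0, 20.0
difference_c = temp_noon_c - temp_morning_c        # 10 degrees warmer: valid
# "Twice as hot" is not valid: on the kelvin scale the ratio is far from 2.
ratio_kelvin = (temp_noon_c + 273.15) / (temp_morning_c + 273.15)

# Ratio: a true zero exists, so ratios are meaningful.
weight_a_kg, weight_b_kg = 40.0, 80.0
weight_ratio = weight_b_kg / weight_a_kg           # B is twice as heavy as A

print(most_common_shirt, median_grade, difference_c,
      round(ratio_kelvin, 3), weight_ratio)
```

Nothing stops a program from averaging the shirt numbers; the scale of measurement is what tells the analyst that such an average carries no information.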
Data Types
Based on the source, data can be classified into two: Primary Data and Secondary Data.

• Primary data are data collected for the first time, either through direct observation or by
questioning individuals. The term refers to data collected either by the researcher or under the
researcher's direct supervision and instruction.
• Secondary data are data obtained from published or unpublished sources such as newspapers,
journals, official records, etc.
Based on the role of time, data can be classified as Cross-sectional and Time series.

• Cross-sectional data: a set of observations taken at a single point in time.
• Time series data: a set of observations collected over a sequence of time points, usually at equal
intervals.
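The two kinds of data differ in what varies: the unit observed or the time of observation. A minimal sketch with invented household incomes (all names and figures are hypothetical):

```python
# Cross-sectional data: several units observed at one point in time.
# Hypothetical monthly incomes of four households, all recorded in 2020.
cross_sectional = {"household A": 3200, "household B": 4100,
                   "household C": 2800, "household D": 3900}

# Time series data: one unit observed at equally spaced points in time.
# Hypothetical monthly income of household A over four consecutive years.
time_series = {2017: 2900, 2018: 3000, 2019: 3100, 2020: 3200}

# A cross-section supports comparison between units at one moment...
highest = max(cross_sectional, key=cross_sectional.get)
# ...while a time series supports studying change over time.
years = sorted(time_series)
yearly_change = [time_series[b] - time_series[a]
                 for a, b in zip(years, years[1:])]

print(f"Highest income in 2020: {highest}")
print(f"Year-to-year change for household A: {yearly_change}")
```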
Methods of Data Collection
The first and foremost task in a statistical investigation is data collection. Before data collection,
four important points should be considered: the purpose of data collection (why we need
to collect data), the data to be collected (what kind of data to collect), the source of the data
(where we can get the data) and the method of data collection (how we can collect the data).
These points are called the why, what, where and how of data collection.

Primary data are collected from primary sources and secondary data from secondary sources.

Primary data can be collected through experimental methods in the laboratory in the natural sciences and
through survey methods in the social sciences.

The survey methods of data collection are personal interview, telephone interview, mailed
questionnaire and personal observation.

• Observational Method: This method involves monitoring an ongoing activity and
directly recording data. It avoids incompleteness of data. However, it is rarely used, as it
is not possible to plan when the events will happen.
• Personal Interview: A trained interviewer asks a series of questions and records the responses
on a specially designed form called a questionnaire. In this approach the enumerator is with
the respondent and can explain any points that are not clear to the respondent.
The quality of the data is affected by both the design of the questionnaire and
the quality of the interviewer. The method has the advantage of obtaining in-depth information from
the person being interviewed, since the interviewer can clarify the questions, and
it avoids incomplete and disordered responses.
Disadvantages:
- It is costlier than other methods, since it requires training of interviewers and
transportation costs.
- The respondent may not give true information on sensitive questions,
since there is face-to-face interaction. Eg: When asked about salary, a respondent whose
salary is very small might report a wrong figure out of embarrassment.

• Telephone Interview: This method involves contacting the respondent by telephone and
collecting information. It is a faster way to collect information. The absence of telephone lines
makes this approach less usable. It also cannot be used for rural surveys.
Advantage: It is less costly, since it requires fewer interviewers and the cost of
calling is less than the cost of transportation. The respondent may give his/her opinion candidly
since there is no face-to-face interaction. Because of this, the data obtained through this
method are more realistic than those from the previous one.
Disadvantage: This method is hard to apply in developing countries because of the lack of
access to telephones. The respondent might not be at home or may not respond to
the call, and in the meantime the interviewer might get bored. There is a high chance of
getting incomplete responses, since the connection can be interrupted.
• Mailed Questionnaire: The researcher sends the questionnaire to the respondents; the
respondents complete the form and send it back to the researcher. Costs are low. The
responses are free from the biases of the interviewer, and respondents have more time to
give well-thought-out answers. However, the method is applicable only to educated persons,
and it suffers from non-response, partial response and low return rates.
Disadvantage: The respondents might give inappropriate answers, since
no one is there with them; they may misunderstand a question and respond
incorrectly.
Types of Surveys
In general there are two methods of data collection: the Census Survey method and the Sample Survey
method.

• Census Survey (complete enumeration): a study that covers all the elements in the
population under consideration. In this method we resort to a 100% inspection of the
population, and each and every unit of the population is enumerated. It makes it possible to
obtain information about each and every element in the population.

• Sample Survey: a survey in which some elements that are representative of
the population (a sample) are taken in order to make inferences about the whole population. It is a
statistical process in which we select and examine a sample instead of considering
the whole population.

The sampling method has many advantages over the census method.
1. Reduced cost. Sampling reduces the cost of data collection.
2. Greater speed, i.e. it enables us to obtain results on time.
3. Greater accuracy. As the number of enumerators decreases, we can train and supervise them
well during data collection, which helps us to get data of good quality.
4. Greater scope (under circumstances where human and material resources are limited).
5. A census may be destructive. Samples reduce the damage caused by some tests in
quality control. For example, when cooking, mothers check whether the food has
enough salt, spices, butter and so on by taking a small amount and testing it.
What would happen if everything in the dish were used for the test?
6. Complete enumeration may be impossible or impractical (for example, when the population is
infinite), so sampling is the only way.
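The cost argument can be made concrete with a small simulation. This is a sketch under stated assumptions: the population of 10,000 bulbs, the 5% defect chance, the sample size of 500 and the seed are all invented for illustration.

```python
import random

rng = random.Random(7)  # fixed seed, so repeated runs give the same numbers

# Hypothetical population: 10,000 manufactured bulbs, True marks a defective one.
population = [rng.random() < 0.05 for _ in range(10_000)]

# Census: inspect every unit.
census_rate = sum(population) / len(population)

# Sample survey: inspect only 500 units chosen at random without replacement.
sample = rng.sample(population, k=500)
sample_rate = sum(sample) / len(sample)

print(f"Census: {len(population)} units inspected, defect rate {census_rate:.3f}")
print(f"Sample: {len(sample)} units inspected, defect rate {sample_rate:.3f}")
```

The sample inspects one twentieth of the units. If inspecting a bulb destroys it, as in a lifetime test, the census would leave nothing to sell, which is the point of advantage 5 above.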
