Statistics Chapter 1
Definition of terms
Data: -
Sample: a portion or part of the population taken so that some generalization about the
population can be made.
It is a subset of the population that is assumed to be representative of the population.
Definition of Statistics
Statistics can be defined in two senses: plural (as statistical data) and singular (as
statistical methods).
Plural sense: this meaning of the word is widely used when reference is made to facts and
figures on sales, employment or unemployment, accidents, weather, deaths, education, etc.
In this sense, the word statistics simply means data. But not all numerical data are statistics.
Singular sense: it is the science that deals with the methods of data collection, organization,
presentation, analysis and interpretation of data.
It refers to the subject area that is concerned with extracting relevant information from
available data with the aim of making sound decisions.
According to this meaning, statistics is concerned with the development and application of
methods and techniques for collecting, organizing, presenting, analyzing and interpreting
statistical data.
Stages in Statistical investigation
According to the singular sense definition of statistics, a statistical study involves five stages:
Collection of Data, Organization of Data, Presentation of Data, Analysis of Data and
Interpretation of Data.
1. Collection of Data: It is the first stage in any statistical investigation and involves the process
of obtaining (gathering) a set of related measurements to meet predetermined objectives.
The data collected may be primary data (data collected directly by the investigator) or
secondary data (data obtained from intermediate sources such as newspapers,
journals, official records, etc.).
2. Organization of Data: It is usually not possible to derive any conclusion about the main
features of the data from direct inspection of the observations. The second purpose of
statistics is describing the properties of the data in a summary form. This stage of statistical
investigation helps to have a clear understanding of the information gathered and includes
editing (correcting), classifying and tabulating the collected data in a systematic manner.
Thus the first step in the organization of data is editing. It means correcting (adjusting)
omissions, inconsistencies, irrelevant answers and wrong computations in the collected
data. The second step of the organization of data is classification, that is, arranging the
collected data according to some common characteristics. The last step of the organization
of data is presenting the classified data in tabular form, using rows and columns
(tabulation).
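The three steps of organization (editing, classification, tabulation) can be sketched in a few lines of Python. This is only an illustration: the list of blood-group responses and the names raw_responses, edited and frequency are invented here and do not come from any real survey.

```python
from collections import Counter

# Hypothetical raw responses (invented for illustration): blood groups
# recorded by an enumerator, with one omission and inconsistent casing.
raw_responses = ["A", "b", "O", "", "AB", "a", "O", "o", "B", "A"]

# Step 1: editing -- drop omissions and make the coding consistent.
edited = [r.strip().upper() for r in raw_responses if r.strip()]

# Step 2: classification -- group observations by a common characteristic.
frequency = Counter(edited)

# Step 3: tabulation -- present the classified data in rows and columns.
print(f"{'Blood group':<12}{'Frequency':>9}")
for group in sorted(frequency):
    print(f"{group:<12}{frequency[group]:>9}")
```

The empty response is treated as an omission and removed during editing; in a real study the investigator might instead go back to the respondent.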
3. Presentation of Data: The purpose of data presentation is to give an overview of what the
data actually look like, and to facilitate statistical analysis.
Data presentation can be done using graphs and diagrams, which are easy to remember
and facilitate comparison.
4. Analysis of Data: The analysis of data is the extraction of summarized and comprehensive
numerical descriptions in order to reach conclusions or provide answers to a problem. The
problem may require simple or sophisticated mathematical expressions.
Based on the scope of the decision, statistics can be classified into two branches:
1) Descriptive Statistics refers to the procedures used to organize and summarize masses of data.
It is concerned with describing or summarizing the most important features of the data.
It deals only with the characteristics of the collected data without going beyond it.
2) Inferential (Inductive) Statistics includes the methods used to find out something about a
population, based on a sample.
It is concerned with drawing statistically valid conclusions about the characteristics of the
population based on information obtained from a sample. Performing hypothesis tests,
determining relationships between variables and making predictions are also part of
inferential statistics.
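The difference between the two branches can be illustrated with a short Python sketch. The test scores below are invented, and the interval of plus or minus two standard errors is only a rough approximation chosen for simplicity, not a method prescribed by these notes.

```python
import math
import statistics

# Hypothetical sample of test scores (invented for illustration).
sample_scores = [62, 70, 75, 81, 68, 90, 77, 73]

# Descriptive statistics: summarize the collected data only.
sample_mean = statistics.mean(sample_scores)
sample_sd = statistics.stdev(sample_scores)  # sample standard deviation

# Inferential statistics: say something about the population mean.
# Standard error of the mean, and a rough interval of +/- 2 standard
# errors (an approximation; a t-based interval would be more exact here).
standard_error = sample_sd / math.sqrt(len(sample_scores))
interval = (sample_mean - 2 * standard_error,
            sample_mean + 2 * standard_error)

print(f"sample mean = {sample_mean}, sample sd = {sample_sd:.2f}")
print(f"rough interval for the population mean: "
      f"{interval[0]:.1f} to {interval[1]:.1f}")
```

The first two quantities describe only the eight observed scores; the interval goes beyond the data and makes a statement about the population from which the sample was drawn.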
Based on the numbers on the shirts, it is not possible to judge whether Mr B plays better. But by
using the test scores, it is possible to judge that Mr B did better in the exam. It is also not
possible to find a meaningful average shirt number, because the numbers on the shirts are simply
codes, but it is possible to obtain the average test score.
This leads to the following scales of measurement:
Nominal Scales of variables are those qualitative variables which show the category of
individuals. They reflect classification into categories (names of groups) where there is no
particular order or qualitative difference between the labels. Numbers may be assigned to the
variables simply for coding purposes. It is not possible to compare individuals based on the
numbers assigned to them. The only mathematical operation permissible on these variables is
counting.
These variables have mutually exclusive (non-overlapping) and exhaustive categories, and
there is no ranking or order among the values of the variable.
E.g.: Gender, Religion, ID No, Ethnicity, Color
Ordinal Scales of variables are those qualitative variables whose values can be ordered
and ranked. Ranking and counting are the only mathematical operations that can be done on
the values of the variables. But there is no precise difference between the values
(categories) of the variable.
E.g.: Academic qualifications (B.Sc., M.Sc., Ph.D.), Grade Scores (A, B, C, D, F), Strength
(very weak, weak, strong, very strong), Health status (very sick, sick, cured)
Interval Scales of variables are those quantitative variables for which a value of zero does
not show absence of the characteristic, i.e. there is no true zero. Zero indicates a low value
rather than an empty one. There is a precise difference between the units of measurement (levels).
E.g.: Temperature; 0 °C does not mean that there is no temperature, only that it is very cold.
Ratio Scales of variables are those quantitative variables for which a value of zero shows
absence of the characteristic, i.e. there is a true zero.
E.g.: Height, Weight, Income, Amount of yield, Expenditure, Consumption.
All mathematical operations can be performed on the values of these variables.
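The operations that make sense on each scale can be sketched in Python. All values below are invented for illustration, and the choice of summaries (mode for nominal, median for ordinal, difference for interval, ratio for ratio scales) is one common convention rather than the only possible one.

```python
import statistics

# Nominal: shirt numbers are only codes, so we can count them but not average them.
shirt_numbers = [7, 10, 10, 23]
most_common_shirt = statistics.mode(shirt_numbers)  # counting is allowed

# Ordinal: grades can be ranked, so a median grade is meaningful.
grade_order = ["F", "D", "C", "B", "A"]            # lowest to highest
grades = ["B", "A", "C", "B", "F"]
ranks = [grade_order.index(g) for g in grades]     # position in the ordering
median_grade = grade_order[statistics.median_low(ranks)]

# Interval: differences are meaningful, ratios are not (no true zero).
temp_yesterday_c, temp_today_c = 10.0, 20.0
temp_difference = temp_today_c - temp_yesterday_c  # 10 degrees warmer
# 20 C is not "twice as hot" as 10 C: on the Kelvin scale, which has a
# true zero, the ratio is only about 1.04.
kelvin_ratio = (temp_today_c + 273.15) / (temp_yesterday_c + 273.15)

# Ratio: a true zero exists, so ratios are meaningful.
weight_parent_kg, weight_child_kg = 80.0, 40.0
weight_ratio = weight_parent_kg / weight_child_kg  # twice as heavy

print(most_common_shirt, median_grade, temp_difference,
      round(kelvin_ratio, 2), weight_ratio)
```

Python would happily compute statistics.mean(shirt_numbers); the point of the scale is that the result would carry no meaning.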
Data Types
Based on the source, data can be classified into two: Primary Data and Secondary Data.
Primary data are data collected for the first time, either through direct observation or by
questioning individuals. They are data collected either by the researcher or under the
researcher's direct supervision and instruction.
Secondary data are data obtained from published or unpublished sources such as newspapers,
journals, official records, etc.
Based on the role of time, data can be classified as Cross-sectional and Time series.
Primary data are collected from primary sources and secondary data from secondary sources.
Primary data can be collected through experimental methods in the laboratory in the natural
sciences and through survey methods in the social sciences.
The survey methods of data collection are personal interview, telephone interview, mailed
questionnaire and personal observation.
Telephone Interview: This method involves contacting the respondent by telephone and
collecting information. It is a faster way to collect information. The absence of telephone
lines makes this approach less usable. It also cannot be used for rural surveys.
Advantage: It is less costly, since it requires fewer interviewers and the cost of
calling is less than the cost of transportation. The respondent may give his/her opinion
candidly since there is no face-to-face interaction. Because of this, the data obtained
through this method are more realistic than those from the previous one.
Disadvantage: This method is not applicable in developing countries because of the lack of
access to telephones. The respondent might not be at home or may not respond to
the call, and in the meantime the interviewer might get bored. There is a high chance of
getting incomplete responses, since the connection can be interrupted.
Mailed Questionnaire: The researcher sends the questionnaire to the respondents; the
respondents complete the form and send it back to the researcher. Costs are low. The
responses are free from the biases of the interviewer, and respondents have more time to
give well-thought-out answers. But it is applicable only to educated persons, and it suffers
from non-response, partial response and low return rates.
Disadvantage: The respondents might give inappropriate answers to questions; since there
is no one with them, they may misunderstand a question and respond to it
incorrectly.
Types of Surveys
In general there are two methods of data collection: the Census Survey method and the Sample
Survey method.
Census Survey (complete enumeration): a study that covers all the elements in the
population under consideration. In this method we resort to a 100% inspection of the
population, and each and every unit of the population is enumerated. It enables us to
obtain information about each and every element in the population.
The Sampling method has many advantages over the census method.
1. Sampling reduces the cost of data collection.
2. Greater speed, i.e. it enables us to obtain results on time.
3. Greater accuracy. It helps us to get data of good quality: as the number of
enumerators decreases, we can train and supervise them well in the process of data
collection.
4. Greater scope (under circumstances where human and material resources are limited).
5. A census may be destructive. Samples reduce the damage caused by some tests in
quality control. For example, when cooking, mothers check whether the food has
enough salt, spices, butter and so on by taking a small amount and tasting it.
What would happen if the whole dish were used for the test?
6. Complete enumeration may be impossible or impractical (when the population is
infinite), thus sampling is the only way.
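The trade-off between a census and a sample survey can be sketched with a small simulation. Everything here is an assumption made for illustration: the population of 10,000 household incomes is simulated, the sample size of 200 is arbitrary, and simple random sampling is used only because it is the easiest design to show.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population of 10,000 household incomes (simulated).
population = [random.gauss(5000, 1200) for _ in range(10_000)]

# Census: inspect every unit of the population.
census_mean = statistics.mean(population)

# Sample survey: inspect only 200 units chosen by simple random sampling.
sample = random.sample(population, k=200)
sample_mean = statistics.mean(sample)

# Share of the population that had to be contacted.
sampling_fraction = len(sample) / len(population)

print(f"census mean: {census_mean:.1f} (all {len(population)} units)")
print(f"sample mean: {sample_mean:.1f} "
      f"({sampling_fraction:.0%} of the units)")
```

Only 2% of the units are contacted, which is where the savings in cost and time come from; the price is that the sample mean differs somewhat from the census mean.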