Pre Requisite Excel Stats
1
COLLECTION AND
TYPES OF DATA
INTRODUCTION
Statistics deals with collection, presentation, analysis and interpretation of data.
The whole exercise is performed with a certain objective in mind. The objective
could be a crucial decision for a company, organisation, institution or even the
nation. Decisions can be about 'how much to produce? how much to stock? what
should be the import-export policy? what should be the recruitment policy? how
many counters should be opened at a service facility?' etc. Any decision backed
by sound statistical reasoning would be beneficial. The first step in reaching such
a decision would be data collection. This would be the main focus of this chapter.
We would also discuss the different types of data that accrue from any statistical
inquiry.
POPULATION AND SAMPLE
To reach a decision for the population, ideally, information on the entire population
would be best. For small populations that would be possible. However, a complete
census survey for large populations may not be possible due to the time and cost
involved. Therefore it is preferable to collect information from a part of the
population under study, namely, a sample. A sample is a collection of some, but
not all, of the units of the population. It is a subset containing the characteristics of
a population. How does one decide on the sample to be collected? There are
several procedures to decide on the units to be sampled, namely simple random
sampling, stratified sampling, cluster sampling, etc. We shall discuss one of them,
namely simple random sampling (SRS).
Simple random sampling is a simple process of selecting a sample of 'n' units from
a population of 'N' units in a random manner. By random we mean the selection of
units is done in such a way that the population units, eligible for selection, stand an
equal chance of inclusion in the sample. In simple random sampling we use either
one of the two procedures, namely (i) simple random sampling without replacement
(SRSWOR) and (ii) simple random sampling with replacement (SRSWR),
depending on the nature and size of the population.
In SRSWOR a sample of size 'n' is selected from a population of size 'N'
such that for selection of the first sample unit, all the N units are eligible for
selection with equal chance. In other words, all the units stand an equal chance
1/N of being included in the sample. Now the first unit selected is dropped from the
population, leaving behind (N - 1) units eligible for selection of the second sample
unit. Now the chance of inclusion, as second sample unit, is 1/(N - 1). The process
is repeated until we have a sample of 'n' units.
In SRSWR all the population units are eligible for inclusion every time a selection
of a sample unit is done. We do not drop the unit or units already selected. Thus all
the population units stand a chance 1/N every time a selection is made.
In SRSWOR the sample units are distinct whereas in SRSWR the sample units
may be repeated more than once. The SRSWOR procedure is advised for small or
finite populations and the SRSWR is advised for infinite populations.
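The two procedures can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `srswor` and `srswr`, the population of ten numbered units and the fixed seed are assumptions made for the example and do not come from the text.

```python
import random

def srswor(population, n, rng=None):
    """Simple random sampling without replacement: n distinct units."""
    rng = rng or random.Random()
    if n > len(population):
        raise ValueError("n cannot exceed N when sampling without replacement")
    # random.sample draws n units, dropping each selected unit from later draws
    return rng.sample(population, n)

def srswr(population, n, rng=None):
    """Simple random sampling with replacement: units may repeat."""
    rng = rng or random.Random()
    # every draw is made from the full population, so each unit has chance 1/N
    return [rng.choice(population) for _ in range(n)]

# Hypothetical population of N = 10 units labelled 1..10.
units = list(range(1, 11))
rng = random.Random(42)  # fixed seed only so that reruns give the same draws
print("SRSWOR:", srswor(units, 4, rng))
print("SRSWR: ", srswr(units, 4, rng))
```

Note that `srswr` can return a sample larger than the population, whereas `srswor` cannot, which mirrors the distinction drawn above.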
COLLECTION OF DATA
Statistical data may be categorised as primary data and secondary data. Primary
data is data collected primarily for the purpose of the given enquiry. They are
original in character and are the raw materials of the enquiry. Secondary data are
data already collected by someone for some purpose and are available for the
present study. For instance, the data collected during census operations are primary
data to the department of census and the same data, if utilized by a researcher for
some study, are secondary data for him.
PRIMARY DATA
For collection of primary data, the investigator may choose any one of the following
methods :
1. Direct personal observation
2. Indirect oral investigation
3. Data through mailed questionnaires
4. Information through agencies
5. Investigations through enumerators
1. Direct Personal Observation
In this method the investigator collects the requisite data personally. He asks or
cross-examines the informant in a tactful and courteous manner, and collects the
necessary information. This method is adopted when (i) intensive or in-depth study
is essential, (ii) greater accuracy is needed, (iii) the field of enquiry is not large but
complex, (iv) data of a confidential nature are to be collected and (v) sufficient
time is available.
Descriptive Statistics (F.Y.B.Sc.: SEM-I & II)
Such a personal enquiry gives reliable and accurate information as response will
be encouraging because of personal touch. Also, uniformity and homogeneity of
data can be maintained. However, such an enquiry will be expensive and time
consuming and will not give good results at the hands of an untrained investigator.
Also, the chances of personal prejudices / bias creeping into the investigation are
more.
2. Indirect Oral Investigation
This method is used when the informant is reluctant to supply information. Here
the investigator approaches witnesses or third parties who are in touch with the
informant. For instance, if one wants to collect information about drinking or
gambling habits of people, one gets the information from family members, friends
and/or liquor shops, etc. This method is generally adopted by Government agencies
for their enquiry committees or commissions. Also, in cases of thefts, murders,
riots etc. the police interrogate third parties who possess knowledge about the
happenings under study.
This method is simple and convenient, and adequate information can be obtained
if information is collected from different parties.
However, absence of direct contact can mar the reliability of the information. Also,
witnesses may colour the information to suit their interests.
3. Data Through Mailed Questionnaires
As the title suggests, data is collected through a questionnaire consisting of a list of
questions pertaining to the inquiry. This questionnaire is posted to the respondents,
who are expected to answer the questions or write the answers in the blank spaces.
A covering letter is also sent along with the questionnaire requesting full co-operation
and prompt reply from the respondents. This method is followed by
research workers, private individuals, non-official agencies and Government
agencies.
The mailed questionnaire method is the most economical and there is a saving of
time and labour. It is suitable when the area of the survey is large. Since information
is obtained directly from respondents, error in the investigation is small.
However, since there is no direct contact between investigator and respondent, one
cannot be sure about the accuracy and reliability of the data. Some people may not
reply as they may be illiterate or simply lazy. This could lead to non-response or
delayed response.
4. Information Through Agencies
Here, the investigator appoints agents or correspondents to collect data. These
agents collect the information and transfer it to the investigator. This method of
primary data collection is usually followed by news agencies where information is
needed in different fields like politics, sports, natural and man-made calamities
etc. This method is adopted where information is required from a wide area on a
regular basis.
Although this method gives extensive information in a speedy and economical
way, it may be biased and the requisite degree of accuracy and uniformity cannot
be maintained.
5. Data Through Schedules or Investigations Through Enumerators
This method is generally employed by the Government for population census etc.
A number of enumerators are selected and trained. They are then armed with
standardised questionnaires and sent to informants to get first hand information.
The method is adopted when one requires reliable and accurate information from
all types of people (literate and illiterate).
However, this data collection exercise may be costly and time consuming, and
depends on the efficiency of the enumerators.
QUESTIONNAIRE
The reader can easily appreciate, from the above primary data collection methods,
that the most important requirement for a successful data collection exercise is a
questionnaire. A questionnaire is the medium of communication between an
investigator and the respondents. Utmost care and caution must be exercised when
one designs or drafts a questionnaire.
A questionnaire is often divided into two sections: (i) classification section and (ii)
subject-matter section.
(i) Classification section : This section includes details of the respondent such
as name, age, sex, education, marital status, occupation etc. Details like the
date of interview and name of the interviewer are also included in this section.
(ii) Subject-matter section : This section includes questions related to the
subject-matter of the inquiry. The answers given here can be analysed
according to the information in the classification section.
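As a rough illustration of analysing subject-matter answers according to the classification section, the sketch below counts the answers to one question within each classification group. The records, the field names (`age_group`, `occupation`, `owns_vehicle`) and the helper `cross_tabulate` are all hypothetical and invented purely for the example.

```python
from collections import Counter

# Hypothetical filled-in questionnaires: two classification fields and one
# subject-matter answer. The values are made up for illustration only.
responses = [
    {"age_group": "18-30", "occupation": "student",  "owns_vehicle": "no"},
    {"age_group": "18-30", "occupation": "service",  "owns_vehicle": "yes"},
    {"age_group": "31-50", "occupation": "service",  "owns_vehicle": "yes"},
    {"age_group": "31-50", "occupation": "business", "owns_vehicle": "yes"},
    {"age_group": "51+",   "occupation": "retired",  "owns_vehicle": "no"},
    {"age_group": "18-30", "occupation": "student",  "owns_vehicle": "no"},
]

def cross_tabulate(records, class_field, answer_field):
    """Count answers to one question within each classification group."""
    table = {}
    for rec in records:
        group = table.setdefault(rec[class_field], Counter())
        group[rec[answer_field]] += 1
    return table

# Answers to the subject-matter question, broken down by age group.
for group, counts in cross_tabulate(responses, "age_group", "owns_vehicle").items():
    print(group, dict(counts))
```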
Requisites of a Good Questionnaire
Construction of a good questionnaire requires great skill, care, wisdom, efficiency
and experience.
The following points must be kept in mind:
(i) The questionnaire must be brief, i.e. the number of questions should be as
few as possible, as respondents will not like or may not have time for answering
a long questionnaire. All the questions must be relevant to the problem under
investigation.
(ii) Questions should not be ambiguous. They must be capable of one and only
one interpretation.
(iii) Questions must be easily understood. Technical terms must be avoided except
when addressed to specialists.
(iv) Questions must be arranged in a logical sequence.
(v) Questions should have precise answers. The answers should take the form
of 'yes' or 'no', a quantity, a date, a place etc. Wherever possible the
questionnaire should suggest answers so that the informant has merely to
tick or cross (✓ or ×) the answer.
(vi) Questions must not contain words of vague meaning. To ask if something is
large or if a man is unskilled are examples of such questions.
(vii) Questions of a sensitive or personal nature should be avoided. Such questions
may not be answered and some informants may be offended.
(viii) Questions should not require the respondent to make any calculations. The
figures provided by the respondent must be accepted and calculations
according to the need of the questionnaire done later.
(ix) To check reliability of answers, some questions should be asked so as to
provide a cross-check for the answers to similar questions.
Before using the method of collecting data through questionnaire, it is always
advisable to conduct a 'pilot survey' for testing the questionnaire. This pilot survey
is, in fact, a replica or rehearsal of the main survey. Such a survey brings to light
weaknesses, if any, of the questionnaire and also of the survey technique. From the
experience gained in this way, improvement can be made if necessary.
SCHEDULES
This method of data collection is very much like the collection of data through
questionnaire, with a little difference which lies in the fact that schedules (proforma
containing the set of questions) are filled in by the enumerators who are specially
appointed for the purpose. These enumerators go to respondents, put to them the
questions from the schedule and record the replies in the space meant for the same
in the proforma. The method requires careful selection of enumerators for filling
up schedules and they should be trained to perform their job well. The enumerators
should be honest, sincere, hardworking and should have patience and perseverance.
This method of data collection is very useful in extensive enquiries and can lead to
fairly reliable results. It is, however, very expensive. Population census all over
the world is conducted through this method.
Differences between Questionnaires and Schedules:
1. The questionnaire is generally sent through mail to informants with a covering
letter; the schedule is generally filled in by the enumerator.
2. To collect data through questionnaire is relatively cheap and economical since
we have to spend money only in preparing the questionnaire and mailing it to
the respondents. No field staff is required. To collect data through schedule
is relatively more expensive since a considerable amount of money is spent in
appointing enumerators and training them.
3. Non-response is high in case of questionnaire as many people do not respond
and many return the questionnaire without answering all questions. As against
this, non-response is generally very low in case of schedules.
4. The questionnaire method is likely to be slow since many respondents may
not return the questionnaire on time, but in case of schedule the information
is collected well in time as they are filled in by the enumerators.
5. Questionnaire method can be used only when the respondents are literate
and co-operative, but in case of schedules, the information can be gathered
even when the respondents happen to be illiterate.
6. In case of questionnaire, it is not clear as to who replies, but in case of schedule,
the identity of respondent is known.
SECONDARY DATA
Secondary data are those bits of information which have been already collected by
some other agency for its own use. Secondary data are like finished products as
they are already processed. Primary data lose their original form when they are
statistically processed and become secondary data for the user. However, the
investigator must be very cautious while using secondary data as (i) it might have
been influenced by personal prejudices of the original investigator, (ii) the method
of collection may not be proper and (iii) the degree of accuracy may not be sufficient
for the present inquiry.
Secondary data are available through (a) published sources and (b) unpublished sources.
(a) Published Sources
Various governmental, international and local agencies publish statistical data. The
chief among them are:
(i) Reports of international agencies like International Monetary Fund (I.M.F.),
International Bank for Reconstruction and Development (I.B.R.D.), United
Nations Organisation (UNO), United Nations Educational, Scientific and
Cultural Organisation (UNESCO), United Nations International Children's
Emergency Fund (UNICEF), etc.
(ii) Official publications of Central and State Government departments. Some of
the important publications are the Reserve Bank of India Bulletin, Census of
India, Statistical Abstracts of States, Agricultural Statistics of India, Indian
Trade Journal, Indian Forest Statistics etc.
(iii) Semi-official publications of semi-government institutions like Municipal
Corporations, Panchayats, District Boards, etc.
(iv) Reports of various Commissions and Committees appointed by the Government
like Pay Commission Reports, Land Reforms Committee Reports etc.
(v) Newspapers and journals like Economic and Political Weekly, Commerce,
Economic Times, Capital, Dalal Street Journal, Monthly Statistics of Trade etc.
(b) Unpublished Sources
There are various sources of unpublished data. They are the records maintained by
various government offices, and research material of researchers for universities or
research institutes.
EDITING THE DATA
Once the data have been obtained either from a primary or a secondary source, the
next step in any statistical investigation is to edit the data. The main purpose of
editing the data is to scrutinize it for possible errors and irregularities. The task of
editing is a highly specialised one and is necessary before proceeding to tabulation.
It requires great care, and negligence in this regard may render a valuable study useless.
The process of editing will consist of :
(a) Checking the schedules / questionnaires for completeness : The editor
must see whether an answer to each question has been furnished. If not, the
informant must be contacted again. If one still doesn't get any reply then the
editor should mark 'No reply' in the space provided for the answer.
(b) Ensuring that there are no inconsistent entries : The editor must check that
answers to questions are not contradictory. For example, if answers to the two
questions 'Are you married?' and 'How many children do you have?' are
'No' and 'Two' respectively, then clarification should be obtained.
(c) Checking the questionnaires for accuracy : Validity or reliability of
conclusions drawn from a survey depends on accuracy of information. Sometimes
the editor may be able to identify erroneous entries easily. Rectification of such
entries should be done before the data are used. If entries are obviously wrong and
no rectification is possible then those entries should not be used.
Inaccuracies may arise due to dropping or shifting of a decimal point. For
example, the third value of the data 14.9, 13.8, 142, 12.3, 15.6 is obviously 14.2.
Also, it is obvious that the son's age cannot be 20 if the mother's age is reported
as 25. Entries reported for September 31 or February 30 cannot be
considered.
(d) Checking the questionnaires for homogeneity of answers : The editor
must check if the questions have been interpreted uniformly. For example, if one
respondent has mentioned gross pay then it cannot be compared with net pay
after tax deduction. Also, if some respondents give monthly income, some
annual income and others weekly income, then no comparison is possible.
Before using such information it must be converted to a common scale.
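The four editing checks above could be automated along the following lines. This is a minimal sketch under stated assumptions, not a prescribed procedure: the function names, the record field names (`married`, `children`), the rule "more than five times the median" for spotting a shifted decimal point, and the use of 52 weeks per year are all illustrative choices that do not come from the text.

```python
import calendar

def is_complete(record, required_fields):
    """(a) Completeness: every required question has a non-empty answer."""
    return all(record.get(f) not in (None, "") for f in required_fields)

def is_consistent(record):
    """(b) Consistency: flag 'not married' reported together with children."""
    return not (record.get("married") == "No" and record.get("children", 0) > 0)

def suspect_decimal_shift(values, ratio=5):
    """(c) Accuracy: indices of values more than `ratio` times the median,
    a crude signal (assumed rule) of a dropped or shifted decimal point."""
    ordered = sorted(values)
    median = ordered[len(ordered) // 2]
    return [i for i, v in enumerate(values) if v > ratio * median]

def is_valid_date(year, month, day):
    """(c) Accuracy: reject impossible dates such as September 31."""
    if not 1 <= month <= 12:
        return False
    # monthrange returns (weekday of first day, number of days in the month)
    return 1 <= day <= calendar.monthrange(year, month)[1]

# Assumed conversion factors; 52 weeks per year is an approximation.
PERIODS_PER_YEAR = {"weekly": 52, "monthly": 12, "annual": 1}

def to_annual_income(amount, period):
    """(d) Homogeneity: convert reported incomes to a common annual scale."""
    return amount * PERIODS_PER_YEAR[period]

# The data set from the text: the third value (index 2) is flagged.
print(suspect_decimal_shift([14.9, 13.8, 142, 12.3, 15.6]))  # [2]
print(is_valid_date(2023, 9, 31))                            # False
```

A flagged entry is only a candidate for clarification or rectification; the decision to correct or discard it still rests with the editor.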
TYPES OF DATA
The data obtained through primary or secondary sources can be of the following
types :
(i) Geographical or Spatial Data
(ii) Chronological or Time-Series Data
(i) Geographical or Spatial Data :
Data is collected according to geographical areas and locations. The basis of
collection is the geographical or locational differences between various items.
e.g. Sales of cars in cities