ERM 4a Final
DATA COLLECTION
&
DATA ANALYSIS
Prof. P. Laxminarayana
Dept. of Mechanical Engineering
Osmania University, Hyderabad
Why a Manager Needs to Know About Statistics
Source of Data:
The Researcher should keep in mind two types of data:
1. Primary
2. Secondary
The Primary Data : Those which are collected afresh and for the first time, and thus
happen to be original in character.
The secondary data : Those which have already been collected by someone else and
which have already been passed through the statistical process.
The distinction between Primary and Secondary data can be made clearer on the
basis of documents:
1. Primary data : Documented as record
2. Secondary data : Documented as report
Exploring the Data Contd….
There are several methods of collecting primary data, particularly in surveys and
descriptive research. Important ones are:
(i) Observation method,
(ii) Interview method,
(iii) Through questionnaires,
(iv) Through schedules, and
(v) Other methods which include
Collection of Primary Data
Other Methods:
(a) Warranty cards;
(b) Distributor audits;
(c) Pantry audits;
(d) Consumer panels;
(e) Using mechanical devices;
(f) Through projective techniques;
(g) Depth interviews, and
(h) Content analysis.
We briefly take up each method separately.
Observation Method
Good and Hatt : Science begins with observation and must ultimately return to
observation for its final validation.
Moser and Kalton: Observation implies the use of eyes rather than of ears and the
voice.
Definition of Observation: Systematic viewing, coupled with consideration of the seen
phenomena, in which main consideration must be given to the larger unit of activity by
which the specific observed phenomena occurred.
Observing natural phenomena, aided by systematic classification and measurement, led to
the development of theories and laws of nature’s forces.
Aids of Observation: Diaries, note-books, schedules, photographs and maps are the
commonly used devices for observation.
Observation method has various limitations:
It is an expensive method.
The information provided by this method is very limited.
Sometimes unforeseen factors may interfere with the observational task.
At times, the fact that some people are rarely accessible to direct observation
creates an obstacle to collecting data effectively by this method.
Generally, controlled observation takes place in various experiments that are carried out in a
laboratory or under controlled conditions, whereas uncontrolled observation is resorted to in
exploratory research.
Interview Method
The interview method is one of the important methods of primary data collection.
It is a conversation between the interviewer and the respondent: oral-verbal questions are put,
and corresponding oral-verbal responses are given to the queries made.
Definition of interviews:
PV Young : The interview may be regarded as a systematic method by which one
person enters more or less legitimately into the inner life of another who
is generally a stranger to him.
Hsin Pao Yang: The interview is a technique of field work which is used to watch the
behaviour of an individual or individuals, to record statements, to observe
the concrete results of social or group interactions.
CA Master : In a formal interview pre-determined questions are asked and the answers
are collected in a certain way.
The interviews can be conducted personally or through telephones.
The concept of interview, usually understood as face -to- face encounter, can be extended to
include telephone interviews and in today’s context, video interviews.
Interview Method
The interview method of collecting data involves presentation of oral-verbal
stimuli and reply in terms of oral-verbal responses.
This method can be used through personal interviews and, if possible, through
telephone interviews.
Personal interviews: Personal interview method requires a person known as the
interviewer asking questions generally in a face-to-face contact to the other
person or persons.
At times the interviewee may also ask certain questions and the interviewer
responds to these, but usually the interviewer initiates the interview and collects
the information.
This sort of interview may be in the form of direct personal investigation or it
may be indirect oral investigation.
Direct personal investigation: The interviewer has to be on the spot and has to meet the people
from whom data have to be collected.
This method is particularly suitable for intensive investigations.
Interview Method
Indirect oral investigation: The interviewer has to cross-examine other persons
who are supposed to have knowledge about the problem under investigation, and
the information obtained is recorded.
Most of the commissions and committees appointed by government to carry on
investigations make use of this method.
Major advantages of personal interviews:
1. More information and that too in greater depth can be obtained.
2. There is greater flexibility under this method as the opportunity to restructure
questions is always there, especially in case of unstructured interviews.
3. Observation method can as well be applied to recording verbal answers to
various questions.
4. Personal information can as well be obtained easily under this method.
5. The interviewer can collect supplementary information about the respondent’s
personal characteristics and environment which is often of great value in
interpreting results.
Interview Method
Weaknesses of personal interviews:
1. It is a very expensive method, especially when a large and widely spread
geographical sample is taken.
2. There remains the possibility of the bias of interviewer as well as that of the
respondent; there also remains the headache of supervision and control of
interviewers.
3. Certain types of respondents such as important officials or executives or people
in high income groups may not be easily approachable under this method and
to that extent the data may prove inadequate.
4. The presence of the interviewer on the spot may over-stimulate the respondent,
sometimes even to the extent that he may give imaginary information just to
make the interview interesting.
5. Under the interview method the organization required for selecting, training
and supervising the field-staff is more complex with formidable problems.
6. Interviewing at times may also introduce systematic errors.
Interview Method
Telephone interviews: This method of collecting information consists in
contacting respondents on the telephone itself. It is not a very widely used
method, but plays an important part in industrial surveys, particularly in
developed regions.
The chief merits of such a system are:
1. It is more flexible in comparison to mailing method.
2. It is faster than other methods i.e., a quick way of obtaining information.
3. It is cheaper than personal interviewing method; here the cost per response is relatively low.
4. Recall is easy; callbacks are simple and economical.
5. There is a higher rate of response than what we have in mailing method; the non-response is
generally very low.
6. Replies can be recorded without causing embarrassment to respondents.
7. Interviewer can explain requirements more easily.
8. At times, access can be gained to respondents who otherwise cannot be contacted for one reason
or the other.
9. No field staff is required.
10. Representative and wider distribution of sample is possible.
Interview Method Contd….
Telephone interviews
Demerits of collecting information are:
1. Little time is given to respondents for considered answers; interview period is
not likely to exceed five minutes in most cases.
2. Surveys are restricted to respondents who have telephone facilities.
3. Extensive geographical coverage may get restricted by cost considerations.
4. It is not suitable for intensive surveys where comprehensive answers are
required to various questions.
5. Possibility of the bias of the interviewer is relatively more.
6. Questions have to be short and to the point; probes are difficult to handle.
COLLECTION OF DATA THROUGH QUESTIONNAIRES
This method of data collection is quite popular, particularly in case of big
enquiries. It is being adopted by private individuals, research workers, private
and public organisations and even by governments.
In this method a questionnaire is sent (usually by post) to the persons
concerned with a request to answer the questions and return the questionnaire.
A questionnaire consists of a number of questions printed or typed in a
definite order on a form or set of forms.
The questionnaire is mailed to respondents who are expected to read and
understand the questions and write down the reply in the space meant for the
purpose in the questionnaire itself. The respondents have to answer the
questions on their own.
The method of collecting data by mailing the questionnaires to respondents is
most extensively employed in various economic and business surveys.
COLLECTION OF DATA THROUGH QUESTIONNAIRES
Contd…
Rating Scales and Agreement Scales are two types of questions that some researchers treat
as multiple choice questions and others treat as numeric open end questions.
Rating Scales
4. How would you rate this product?
Excellent
Good
Fair
Poor
5. On a scale where “10” means you have a great amount of interest in a subject and “1” means you
have none at all, how would you rate your interest in each of the following topics?
Domestic politics …
Foreign Affairs …
Science and Health …
Business …
Questionnaire Design
Agreement Scale
6. How much do you agree with each of the following statements?
S. No  Particulars                                        Strongly agree  Agree  Disagree  Strongly disagree
1      My manager provides constructive criticism
2      Our medical plan provides adequate coverage
3      I would prefer to work longer hours on fewer days
A Sample Questionnaire
A study for telephone services company to find the expectations of customers using telephone booths
at Hyderabad and their profiles. The format of the questionnaire used in this study is presented below:
Questionnaire
Study on customer expectations and profiles of PCO booths at Hyderabad
Address of Telephone Booth:
Customer’s personal profile
1. Name :
2. Age :
a. Up to 17 years b. 18-24 years
c. 25-40 years d. 41-50 years
e. 51- 60 years f. More than 60 years
3. Gender
a. Male …… b. Female …..
4. Monthly household income
a. Less than Rs. 10,000 b. Rs. 10,000 – 20,000
c. Rs. 20,000 – 30,000 d. Rs. 30,000 – 50,000 e. More than Rs. 50,000
5. Occupation
a. Service sector b. Government c. Public d. Private
e. Business f. Student / house wife g. Others (specify) ………
SOME OTHER METHODS OF DATA COLLECTION
Particularly used by big business houses in modern times.
1. Warranty cards: Warranty cards are usually postal sized cards which are used by dealers of consumer durables to
collect information regarding their products. The consumer is asked to fill in the card and post it back to the dealer.
2. Distributor or store audits: Performed by distributors as well as manufacturers through their salesmen at regular
intervals, to estimate market size, market share, seasonal purchasing pattern and so on. The data are obtained in
such audits not by questioning but by observation.
3. Pantry audit technique: It is used to estimate consumption of the basket of goods at the consumer level. It is to
find out what types of consumers buy certain products and certain brands, the assumption being that the contents of
the pantry accurately portray consumer’s preferences.
4. Consumer panel: An extension of the pantry audit approach on a regular basis is known as a ‘consumer panel’,
where a set of consumers agree to maintain detailed daily records of their
consumption and to make them available to the investigator on demand.
5. Use of mechanical devices: Mechanical devices have been widely used to collect information by
indirect means. The eye camera, pupilometric camera, psychogalvanometer, motion picture camera and
audiometer are the principal devices so far developed and commonly used by modern big business houses, mostly
in the developed world, for the purpose of collecting the required information.
6. Projective techniques: Projective techniques (sometimes called indirect interviewing techniques)
for the collection of data play an important role in motivational research and in attitude surveys.
7. Depth interviews: Depth interviews are held to explore the needs, desires and feelings of respondents. Unless the
researcher has specialized training, depth interviewing should not be attempted.
8. Content-analysis : Content-analysis consists of analysing the contents of documentary materials such as books,
magazines, newspapers and the contents of all other verbal materials.
COLLECTION OF SECONDARY DATA
Secondary data means data that are already available i.e., they refer to the data which have
already been collected and analyzed by someone else.
When the researcher utilizes secondary data, then he has to look into various sources from
where he can obtain them.
Secondary data may either be published data or unpublished data.
Usually published data are available in:
a. Various publications of the central, state or local governments;
b. Various publications of foreign governments or of international bodies and their subsidiary
organizations;
c. Technical and trade journals;
d. Books, magazines and newspapers;
e. Reports and publications of various associations connected with business and industry,
banks, stock exchanges, etc.;
f. Reports prepared by research scholars, universities, economists, etc. in different fields;
g. Public records and statistics, historical documents, and other sources of published
information.
COLLECTION OF SECONDARY DATA Contd….
The sources of unpublished data are many: It may be found in diaries, letters, unpublished
biographies and autobiographies and also may be available with scholars and research workers,
trade associations, labour bureaus and other public/private individuals and organisations.
Researcher must be very careful in using secondary data. By way of caution, the researcher,
before using secondary data, must see that they possess following characteristics:
1. Reliability of data: Reliability can be tested by finding out
(a) Who collected the data? (b) What were the sources of data?
(c) Were they collected by using proper methods? (d) At what time were they collected?
(e) Was there any bias of the compiler? (f) What level of accuracy was desired? Was it achieved?
2. Suitability of data: The data that are suitable for one enquiry may not necessarily be found
suitable in another enquiry.
3. Adequacy of data: If the level of accuracy achieved in data is found inadequate for the purpose of
the present enquiry, they will be considered as inadequate and should not be used by the researcher.
From all this we can say that it is very risky to use the already available data. The already
available data should be used by the researcher only when he finds them reliable, suitable and
adequate.
Description and analysis of Data
Technically speaking, description implies editing, coding, classification and
tabulation of collected data so that they are amenable to analysis.
The term analysis refers to the computation of certain measures along with
searching for patterns of relationship that exist among data-groups.
Thus, “in the process of analysis, relationships or differences supporting or
conflicting with original or new hypotheses should be subjected to statistical
tests of significance to determine with what validity data can be said to indicate
any conclusions”.
Editing: Checking the filled questionnaires. It is routine work, but it has to be carried out with utmost care and devotion.
Coding: Reducing the mass of data to manageable proportions. It is an operation which requires judgment and skill, particularly for developing the coding frame.
Classification and tabulation: Summarizing data into tabular form. Tabulation is a common tool used for summarizing the data so that they are amenable to interpretation.
Description Operations
Editing: Editing of data is a process of examining the collected raw data
(specially in surveys) to detect errors and omissions and to correct
these when possible. It involves a careful scrutiny of the
completed questionnaires and/or schedules.
Field editing:
• Consists in the review of the reporting forms by the investigator for
completing (translating or rewriting) what was recorded in abbreviated or illegible form.
• This type of editing is necessary in view of the fact that individual writing
styles often can be difficult for others to decipher.
Central editing:
• It should take place when all forms or schedules have been completed
and returned to the office. This type of editing implies that all forms
should get a thorough editing by a single editor in a small study and by
a team of editors in case of a large inquiry.
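As a rough illustration of what detecting errors and omissions can look like once questionnaires are keyed in, the sketch below checks each record against the allowed answer codes. The field names, the answer codes and the three records are hypothetical and only loosely follow the sample questionnaire; this is not a prescribed editing procedure.

```python
# Hypothetical answer codes for two closed questions (assumed for this sketch).
ALLOWED_AGE_GROUPS = {"a", "b", "c", "d", "e", "f"}
ALLOWED_GENDER = {"a", "b"}

def find_problems(record):
    """Return a list of errors and omissions found in one questionnaire."""
    problems = []
    if not record.get("name"):
        problems.append("name missing")
    if record.get("age_group") not in ALLOWED_AGE_GROUPS:
        problems.append("age group missing or invalid")
    if record.get("gender") not in ALLOWED_GENDER:
        problems.append("gender missing or invalid")
    return problems

# Made-up completed questionnaires.
records = [
    {"name": "R1", "age_group": "b", "gender": "a"},
    {"name": "", "age_group": "c", "gender": "b"},
    {"name": "R3", "age_group": "z", "gender": None},
]

# Questionnaire number -> list of problems the editor should follow up.
report = {i: find_problems(r) for i, r in enumerate(records, start=1)}
for number, problems in report.items():
    print(number, problems or "ok")
```

A check of this kind only flags forms for the editor; deciding how to correct an entry remains a matter of judgment.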
Description Operations Contd…..
Coding:
• Coding refers to the process of assigning numerals or other symbols to
answers so that responses can be put into a limited number of categories or
classes.
• Coding is necessary for efficient analysis and through it the several replies
may be reduced to a small number of classes which contain the critical
information required for analysis.
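A minimal sketch of coding, assuming a four-point agreement scale: each answer category is assigned a numeral through a coding frame. The numerals and the sample answers are assumptions made for illustration only.

```python
# Coding frame: each answer category gets a numeral (values are an assumption).
CODING_FRAME = {
    "strongly agree": 4,
    "agree": 3,
    "disagree": 2,
    "strongly disagree": 1,
}

def code_response(answer):
    """Return the numeric code for an answer, or None if it cannot be coded."""
    return CODING_FRAME.get(answer.strip().lower())

# Made-up raw answers as they might appear on returned forms.
raw_answers = ["Agree", "strongly agree ", "Disagree", "no opinion"]
coded = [code_response(a) for a in raw_answers]
print(coded)
```

Answers that fall outside the coding frame come back as None, which signals that the frame may need an extra category.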
Classification:
• Most research studies result in a large volume of raw data which must be
reduced into homogeneous groups if we are to get meaningful relationships.
1. Classification according to attributes: Data are classified on the basis of common
characteristics which can either be descriptive (such as literacy, sex, honesty, etc.) or
numerical (such as weight, height, income, etc.).
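The sketch below shows one way such a classification might be carried out: a descriptive attribute is counted by category, and a numerical characteristic is grouped into classes. The respondents, the income figures and the class limits are all hypothetical.

```python
from collections import Counter

# Hypothetical respondents: (literacy, monthly income in Rs.)
respondents = [
    ("literate", 8000), ("illiterate", 12000), ("literate", 25000),
    ("literate", 18000), ("illiterate", 9000), ("literate", 52000),
]

# Descriptive characteristic: count respondents in each category.
by_literacy = Counter(literacy for literacy, _ in respondents)

def income_class(income):
    """Assign a numerical value to a class (class limits are assumptions)."""
    if income < 10000:
        return "below 10,000"
    if income < 20000:
        return "10,000-19,999"
    if income < 50000:
        return "20,000-49,999"
    return "50,000 and above"

# Numerical characteristic: count respondents in each income class.
by_income = Counter(income_class(income) for _, income in respondents)

print(dict(by_literacy))
print(dict(by_income))
```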
vi. Availability of finance: In practice, size of the sample depends upon the
amount of money available for the study purposes. This factor should be
kept in view while determining the size of the sample, since large samples
increase the cost of sampling estimates.
and drawing conclusions therefrom. Most research studies result in a large
volume of raw data which must be suitably reduced so that the same can be
read easily and can be used for further analysis. Clearly the science of
statistics cannot be ignored by any research worker.
The important statistical measures that are used to summarise the
survey/research data are:
1. Measures of central tendency or statistical averages
2. Measures of dispersion
3. Measures of asymmetry (skewness)
4. Measures of relationship
Some Important Definitions
A Population (Universe) is the whole collection of things under
consideration
Population: use parameters to summarize features
Sample: use statistics to summarize features
Data
Categorical (Qualitative)
Numerical (Quantitative): Discrete or Continuous
IMPORTANT STATISTICAL MEASURES
Measures of Central Tendency(Statistical averages)
Mean, Median, Mode, Geometric Mean, Harmonic Mean
Quartiles
Measure of Variation
Range, Semi Inter-quartile Range, Mean Deviation, Variance, Standard
Deviation and Coefficient of Variation
Measures of Skewness / Shape (Measure Asymmetry)
Symmetric, Skewed
Measures of Kurtosis/Peakedness
Lepto kurtic / Platy Kurtic / Meso kurtic
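To make the measures of variation and shape concrete, here is a small sketch on made-up data. It assumes the population formulas (dividing by n) and the moment-based coefficients of skewness and kurtosis, under which a normal curve has kurtosis 3; other conventions exist and give different numbers.

```python
import statistics

# Hypothetical observations, chosen so the arithmetic is easy to follow.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean_value = statistics.mean(data)
data_range = max(data) - min(data)
variance = statistics.pvariance(data)         # population variance (divides by n)
std_dev = statistics.pstdev(data)             # population standard deviation
coeff_variation = std_dev / mean_value * 100  # in per cent

def central_moment(values, k):
    """k-th moment about the mean."""
    m = statistics.mean(values)
    return sum((v - m) ** k for v in values) / len(values)

# Moment coefficients: skewness is 0 for a symmetric distribution,
# kurtosis is 3 for a mesokurtic (normal) one.
skewness = central_moment(data, 3) / central_moment(data, 2) ** 1.5
kurtosis = central_moment(data, 4) / central_moment(data, 2) ** 2

if kurtosis > 3:
    shape = "leptokurtic"
elif kurtosis < 3:
    shape = "platykurtic"
else:
    shape = "mesokurtic"

print(data_range, variance, std_dev, coeff_variation)
print(round(skewness, 3), round(kurtosis, 3), shape)
```

A positive skewness coefficient here reflects the longer tail on the right of these particular values.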
Summary Measures
Central Tendency: Mean, Median, Mode, Geometric Mean
Variation: Range, Variance, Standard Deviation, Coefficient of Variation
Shape of a Distribution
Describe How Data are Distributed
Measures of Shape
Symmetric or skewed
i. Test of a hypothesis concerning some single value for the given data (such as one-
sample sign test).
ii. Test of a hypothesis concerning no difference among two or more sets of data
(such as two-sample sign test, Fisher-Irwin test, Rank sum test, etc.).
iii. Test of a hypothesis of a relationship between variables.
iv. Test of a hypothesis concerning variation in the given data i.e., test analogous to
ANOVA .
v. Tests of randomness of a sample based on the theory of runs viz., one sample runs
test.
vi. Test of hypothesis to determine if categorical data shows dependency or if two
classifications are independent viz., the chi-square test. The chi-square test can as
well be used to make comparison between theoretical populations and actual data
when categories are used.
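As an illustration of the chi-square test of independence mentioned in item vi, the sketch below computes the statistic for a 2 x 2 table of made-up counts. The observed frequencies are hypothetical; 3.841 is the tabulated critical value for 1 degree of freedom at the 5% level.

```python
# Hypothetical observed frequencies: rows and columns are two classifications.
observed = [
    [30, 20],
    [20, 30],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Sum of (observed - expected)^2 / expected over all cells, where the expected
# frequency under independence is row total * column total / grand total.
chi_square = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / grand_total
        chi_square += (obs - expected) ** 2 / expected

degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)
CRITICAL_VALUE = 3.841  # chi-square table, 1 degree of freedom, 5% level

# A statistic above the critical value leads to rejecting independence.
reject_independence = chi_square > CRITICAL_VALUE
print(chi_square, degrees_of_freedom, reject_independence)
```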
Points of Central Tendency
Measures of central tendency (or statistical averages) tell us the point about which
items have a tendency to cluster. Such a measure is considered as the most
representative figure for the entire mass of data. Measure of central tendency is also
known as statistical average. Mean, median and mode are the most popular averages.
Mean, also known as arithmetic average
Median (M) is the value of the middle item of series when it is arranged in
ascending or descending order of magnitude.
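A quick sketch of the three popular averages, using Python's standard statistics module on made-up study hours. The data are hypothetical and include one large value to show how the mean and the median can differ.

```python
import statistics

# Hypothetical daily study hours of seven students.
hours = [2, 3, 3, 4, 5, 6, 12]

mean_hours = statistics.mean(hours)      # arithmetic average
median_hours = statistics.median(hours)  # middle item of the ordered series
mode_hours = statistics.mode(hours)      # most frequently occurring value

print(mean_hours, median_hours, mode_hours)
```

The single value of 12 pulls the mean above the median, which is why the median is often preferred for skewed data.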
For example: Whether the number of hours students devote to studies is
somehow related to their family income, to age, to sex or to
other similar factors.
There are several methods of determining the relationship
between variables, but no method can tell us for certain that a
correlation is indicative of causal relationship.
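One common measure of relationship is Karl Pearson's coefficient of correlation, sketched below on made-up paired data. The figures are hypothetical, and a high coefficient would still say nothing about which variable, if either, causes the other.

```python
from math import sqrt

# Hypothetical paired observations for five students.
study_hours = [1, 2, 3, 4, 5]
family_income = [2, 4, 5, 4, 5]  # in arbitrary units

def pearson_r(x, y):
    """Karl Pearson's coefficient of correlation for paired data."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / sqrt(ss_x * ss_y)

r = pearson_r(study_hours, family_income)
print(round(r, 4))
```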
Inferential Statistics
Analysis, particularly in case of survey or experimental data,
involves estimating the values of unknown parameters of the
population and testing of hypotheses for drawing inferences.
Present Data
E.g., Tables and graphs
Characterize Data
E.g., Sample Mean: $\bar{X} = \frac{\sum X_i}{n}$
Inferential Statistics
Estimation
E.g., Estimate the population mean
weight using the sample mean
weight
Hypothesis Testing
E.g., Test the claim that the
population mean weight is 120
pounds
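A sketch of estimation for the weight example: the sample mean is the point estimate, and a 95% confidence interval is built around it. The nine weights and the population standard deviation of 10 pounds are assumptions made purely for illustration, and the interval uses the normal distribution because the standard deviation is treated as known.

```python
from math import sqrt
from statistics import NormalDist, mean

# Hypothetical sample of weights in pounds.
weights = [118, 125, 121, 130, 116, 122, 119, 127, 124]
sigma = 10.0        # assumed known population standard deviation
confidence = 0.95

x_bar = mean(weights)                        # point estimate of the population mean
standard_error = sigma / sqrt(len(weights))
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

lower = x_bar - z * standard_error
upper = x_bar + z * standard_error

# The claimed mean of 120 pounds is plausible if it lies inside the interval.
claim_is_plausible = lower <= 120 <= upper
print(round(x_bar, 2), round(lower, 2), round(upper, 2), claim_is_plausible)
```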
Reason for Rejecting H0
Sampling distribution of $\bar{X}$ if H0 is true ($\mu = 3.5$): it is unlikely that we would get a sample mean of this value ($\bar{X} = 2.4$). Therefore, we reject the null hypothesis that $\mu = 3.5$.
General Steps in Hypothesis Testing
E.g., Test the Assumption that the True Mean # of TV Sets in U.S.
Homes is at Least 3 ($\sigma$ Known)
1. State the H0: $H_0: \mu \ge 3$
2. State the H1: $H_1: \mu < 3$
3. Choose $\alpha$: $\alpha = .05$
4. Choose n: $n = 100$
5. Choose Test: Z test
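The five steps can be carried through to a decision as sketched below for the TV sets example (null hypothesis: the mean is at least 3; level of significance .05; n = 100; Z test). The population standard deviation (0.8) and the sample mean (2.84) are hypothetical values, assumed only so that the arithmetic can be shown.

```python
from math import sqrt
from statistics import NormalDist

mu_0 = 3.0     # value of the mean under the null hypothesis
alpha = 0.05   # level of significance
n = 100        # sample size
sigma = 0.8    # assumed known population standard deviation (hypothetical)
x_bar = 2.84   # assumed sample mean (hypothetical)

standard_error = sigma / sqrt(n)
z_stat = (x_bar - mu_0) / standard_error

# Lower-tailed test: reject H0 when the statistic falls below the critical value.
z_critical = NormalDist().inv_cdf(alpha)
p_value = NormalDist().cdf(z_stat)
reject_h0 = z_stat < z_critical

print(round(z_stat, 2), round(z_critical, 3), round(p_value, 4), reject_h0)
```

With these assumed figures the statistic lies in the rejection region; a sample mean closer to 3 would not.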
General Steps in Hypothesis Testing Contd…
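Rejection Regions
$H_0: \mu \le 3.5$, $H_1: \mu > 3.5$: rejection region of area $\alpha$ in the upper tail
$H_0: \mu = 3.5$, $H_1: \mu \ne 3.5$: rejection regions of area $\alpha/2$ in each tail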
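A brief sketch of how the boundaries of the rejection regions might be obtained for a Z test, assuming a level of significance of .05. The helper names are made up for this illustration.

```python
from statistics import NormalDist

alpha = 0.05
standard_normal = NormalDist()

# One-tailed (upper) test: the whole of alpha lies in the upper tail.
upper_critical = standard_normal.inv_cdf(1 - alpha)

# Two-tailed test: alpha/2 lies in each tail.
two_tail_critical = standard_normal.inv_cdf(1 - alpha / 2)

def reject_upper_tailed(z_stat):
    """Reject H0 when the statistic falls in the upper-tail rejection region."""
    return z_stat > upper_critical

def reject_two_tailed(z_stat):
    """Reject H0 when the statistic falls in either tail."""
    return abs(z_stat) > two_tail_critical

print(round(upper_critical, 3), round(two_tail_critical, 3))
```

The same statistic, say 1.8, would lead to rejection in the upper-tailed test but not in the two-tailed test, because the two-tailed critical value is larger.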
Type I & II Errors Have an Inverse Relationship
Factors Affecting Type II Error
Sample size n: the probability of a Type II error increases when n decreases.
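The inverse relationship, and the effect of n, can be illustrated with a rough calculation of the Type II error probability for a lower-tailed Z test. The hypothesized mean (3.0), the true mean (2.8) and the standard deviation (0.8) are hypothetical values chosen for this sketch.

```python
from math import sqrt
from statistics import NormalDist

def type_ii_error(alpha, n, mu_0=3.0, mu_true=2.8, sigma=0.8):
    """Probability of failing to reject H0 (mean >= mu_0) when the true mean is mu_true."""
    standard_error = sigma / sqrt(n)
    # Sample means below this value fall in the rejection region.
    critical_mean = mu_0 + NormalDist().inv_cdf(alpha) * standard_error
    # Type II error: the sample mean lands above the critical value anyway.
    return 1 - NormalDist(mu_true, standard_error).cdf(critical_mean)

beta_alpha_05 = type_ii_error(alpha=0.05, n=100)
beta_alpha_01 = type_ii_error(alpha=0.01, n=100)
beta_larger_n = type_ii_error(alpha=0.05, n=400)

print(round(beta_alpha_05, 3), round(beta_alpha_01, 3), round(beta_larger_n, 4))
```

Under these assumptions, lowering the Type I error from .05 to .01 raises the Type II error, while a larger sample lowers it.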
How to Choose between Type I and Type II Errors
Choice Depends on the Cost of the Errors
Choose Smaller Type I Error When the Cost of
Rejecting the Maintained Hypothesis is High
A criminal trial: convicting an innocent person
The Exxon Valdez: causing an oil tanker to sink
Choose Larger Type I Error When You Have an
Interest in Changing the Status Quo
A decision in a startup company about a new piece of software
A decision about unequal pay for a covered group
Less Variability
The standard error (standard deviation) of the sampling distribution of $\bar{X}$ is less than the
standard error of other unbiased estimators.
[Figure: sampling distribution of the mean, which is narrower than the sampling distribution of the median]
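The comparison between the mean and the median can be checked informally by simulation, as sketched below. The normal population (mean 50, standard deviation 10), the sample size of 25 and the 2000 repetitions are all assumptions of this sketch; the result illustrates the claim for a normal population and is not a proof.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

SAMPLE_SIZE = 25
TRIALS = 2000

# Draw repeated samples from a hypothetical normal population.
samples = [
    [random.gauss(50, 10) for _ in range(SAMPLE_SIZE)]
    for _ in range(TRIALS)
]

sample_means = [statistics.mean(s) for s in samples]
sample_medians = [statistics.median(s) for s in samples]

# The standard deviation of each list estimates that estimator's standard error.
se_mean = statistics.stdev(sample_means)
se_median = statistics.stdev(sample_medians)

print(round(se_mean, 2), round(se_median, 2))
```

For a normal population the spread of the sample medians comes out wider than that of the sample means, which is what the less variability property describes.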