
ERM 4a Final


UNIT-IV

DATA COLLECTION
&

DATA ANALYSIS

Prof. P. Laxminarayana
Dept. of Mechanical Engineering
Osmania University, Hyderabad
Why a Manager Needs to Know About Statistics
• To Know How to Properly Present Information
• To Know How to Draw Conclusions about Populations Based on Sample Information
• To Know How to Improve Processes
• To Know How to Obtain Reliable Forecasts
Why We Need Data
• To Provide Input to Survey
• To Provide Input to Study
• To Measure Performance of Ongoing Service or Production Process
• To Evaluate Conformance to Standards
• To Assist in Formulating Alternative Courses of Action
• To Satisfy Curiosity
Exploring the Data
• The task of data collection begins after a research problem has been defined and the research design/plan has been chalked out.
• The collection of data is an important task in research methodology. Before choosing a method of data collection, the researcher should understand the need of the study and decide the type of data required.

Sources of Data:
The researcher should keep in mind two types of data:
1. Primary
2. Secondary
Primary data: Those which are collected afresh and for the first time, and thus happen to be original in character.
Secondary data: Those which have already been collected by someone else and which have already been passed through the statistical process.
The distinction between primary and secondary data can be made clearer on the basis of how they are documented:
1. Primary data: Documented as a record
2. Secondary data: Documented as a report
Exploring the Data Contd….
• The researcher has to decide which type of data to use for the study and, accordingly, select the particular type of data base.
• The data collection methods differ for each type of data: primary data have to be collected by the researcher personally, whereas for secondary data the work is only a compilation of data already collected.
• We describe the different methods of data collection, with the pros and cons of each method.
COLLECTION OF PRIMARY DATA
We collect primary data in the course of doing experiments in experimental research. In descriptive research, where we perform surveys (whether sample surveys or census surveys), we can obtain primary data through observation, through direct communication with respondents in one form or another, or through personal interviews.
• In an experiment, the investigator measures the effects of an experiment which he conducts intentionally.
• In a survey, the investigator examines those phenomena which exist in the universe independent of his action.
[Figure: the difference between an experiment and a survey]

There are several methods of collecting primary data, particularly in surveys and descriptive research. The important ones are:
(i) Observation method,
(ii) Interview method,
(iii) Through questionnaires,
(iv) Through schedules, and
(v) Other methods which include
Collection of Primary Data
Other Methods:
(a) Warranty cards;
(b) Distributor audits;
(c) Pantry audits;
(d) Consumer panels;
(e) Using mechanical devices;
(f) Through projective techniques;
(g) Depth interviews, and
(h) Content analysis.
We briefly take up each method separately.
Observation Method
Good and Hatt: Science begins with observation and must ultimately return to observation for its final validation.
Moser and Kalton: Observation implies the use of eyes rather than of ears and the voice.
Definition of Observation: Observation is systematic viewing, coupled with consideration of the seen phenomena, in which main consideration must be given to the larger unit of activity by which the specific observed phenomena occurred.
Observing natural phenomena, aided by systematic classification and measurement, led to the development of theories and laws of nature’s forces.

Components of Observation: Observation involves three processes:
1. Sensation: It is gained through the sense organs and depends upon the physical alertness of the observer. It reports the facts as observed.
2. Attention: It is largely a matter of habit.
3. Perception: It involves the interpretation of sensory reports. It enables the mind to recognize the facts.
Observation Method
• Observation becomes a scientific tool and a method of data collection for the researcher when it serves a formulated research purpose, is systematically planned and recorded, and is subjected to checks and controls on validity and reliability. Under the observation method, the information is sought by way of the investigator’s own direct observation without asking the respondent.
• For instance, in a study relating to consumer behaviour, the investigator, instead of asking the brand of wrist watch used by the respondent, may himself look at the watch.
• The main advantage of this method is that subjective bias is eliminated, if observation is done accurately.
• The information obtained under this method relates to what is currently happening; it is not complicated by either past behaviour or future intentions or attitudes.
• This method is independent of respondents’ willingness to respond and as such is relatively less demanding of active cooperation on the part of respondents than the interview or the questionnaire method.
Characteristics of Observation
1. Observation is at once a physical as well as a mental activity. The use of sense organs is involved, as in observation one has to see or hear something.
2. Observation is selective: one observes only the range of things that fall within the scope of the observation.
3. Observation is purposive. Observation is limited to those facts and details which help in achieving the specified objectives of research.
4. Observation has to be efficient. Mere looking is not enough; there should be scientific thinking. Further, these observations should be based on tools of research which have been properly standardized.
5. In observation the researcher makes a direct study. It is a classical scientific method for the collection of primary and dependable data.
6. Through observation, it is possible to establish cause-effect relationships in social phenomena. The investigator first of all observes things and then collects data.

Aids of Observation: Diaries, note-books, schedules, photographs and maps are the commonly used devices for observation.
The observation method has various limitations:
• It is an expensive method.
• The information provided by this method is very limited.
• Sometimes unforeseen factors may interfere with the observational task.
• At times, the fact that some people are rarely accessible to direct observation creates an obstacle to collecting data effectively by this method.

The researcher should keep in mind things like:
• What should be observed?
• How should the observations be recorded?
• How can the accuracy of observation be ensured?
• If the observation is characterized by a careful definition of the units to be observed, the style of recording the observed information, standardized conditions of observation and the selection of pertinent data of observation, then it is called structured observation.
• Generally, controlled observation takes place in experiments that are carried out in a laboratory or under controlled conditions, whereas uncontrolled observation is resorted to in exploratory research.
Interview Method
• The interview method is one of the important methods of primary data collection.
• It is a conversation between the interviewer and the respondent: oral-verbal questions are asked and corresponding oral-verbal responses are given to the queries made.

Definitions of interview:
PV Young: The interview may be regarded as a systematic method by which one person enters more or less legitimately into the inner life of another who is generally a stranger to him.
Hsin Pao Yang: The interview is a technique of field work which is used to watch the behaviour of an individual or individuals, to record statements, to observe the concrete results of social or group interactions.
CA Master: In a formal interview pre-determined questions are asked and the answers are collected in a certain way.
Interviews can be conducted personally or through telephones.
The concept of interview, usually understood as a face-to-face encounter, can be extended to include telephone interviews and, in today’s context, video interviews.
Interview Method
• The interview method of collecting data involves presentation of oral-verbal stimuli and reply in terms of oral-verbal responses.
• This method can be used through personal interviews and, if possible, through telephone interviews.
Personal interviews: The personal interview method requires a person, known as the interviewer, to ask questions, generally in face-to-face contact with the other person or persons.
• At times the interviewee may also ask certain questions and the interviewer responds to these, but usually the interviewer initiates the interview and collects the information.
• This sort of interview may be in the form of direct personal investigation or it may be indirect oral investigation.
• Direct personal investigation: The interviewer has to be on the spot and has to meet the people from whom data have to be collected.
• This method is particularly suitable for intensive investigations.
Interview Method
Indirect oral examination: The interviewer has to cross-examine other persons who are supposed to have knowledge about the problem under investigation, and the information obtained is recorded.
Most of the commissions and committees appointed by government to carry on investigations make use of this method.
Major advantages of personal interviews:
1. More information, and that too in greater depth, can be obtained.
2. There is greater flexibility under this method, as the opportunity to restructure questions is always there, especially in the case of unstructured interviews.
3. The observation method can as well be applied to recording verbal answers to various questions.
4. Personal information can as well be obtained easily under this method.
5. The interviewer can collect supplementary information about the respondent’s personal characteristics and environment, which is often of great value in interpreting results.
Interview Method
Weaknesses of personal interviews:
1. It is a very expensive method, especially when a large and widely spread geographical sample is taken.
2. There remains the possibility of the bias of interviewer as well as that of the
respondent; there also remains the headache of supervision and control of
interviewers.
3. Certain types of respondents such as important officials or executives or people
in high income groups may not be easily approachable under this method and
to that extent the data may prove inadequate.
4. The presence of the interviewer on the spot may over-stimulate the respondent,
sometimes even to the extent that he may give imaginary information just to
make the interview interesting.
5. Under the interview method the organization required for selecting, training
and supervising the field-staff is more complex with formidable problems.
6. Interviewing at times may also introduce systematic errors.
Interview Method
Telephone interviews: This method of collecting information consists in contacting respondents on the telephone itself. It is not a very widely used method, but it plays an important part in industrial surveys, particularly in developed regions.
The chief merits of such a system are:
1. It is more flexible in comparison to mailing method.
2. It is faster than other methods i.e., a quick way of obtaining information.
3. It is cheaper than personal interviewing method; here the cost per response is relatively low.
4. Recall is easy; callbacks are simple and economical.
5. There is a higher rate of response than what we have in mailing method; the non-response is
generally very low.
6. Replies can be recorded without causing embarrassment to respondents.
7. Interviewer can explain requirements more easily.
8. At times, access can be gained to respondents who otherwise cannot be contacted for one reason
or the other.
9. No field staff is required.
10. Representative and wider distribution of sample is possible
Interview Method Contd….
Telephone interviews
The demerits of this method are:
1. Little time is given to respondents for considered answers; interview period is
not likely to exceed five minutes in most cases.
2. Surveys are restricted to respondents who have telephone facilities.
3. Extensive geographical coverage may get restricted by cost considerations.
4. It is not suitable for intensive surveys where comprehensive answers are
required to various questions.
5. Possibility of the bias of the interviewer is relatively more.
6. Questions have to be short and to the point; probes are difficult to handle.
COLLECTION OF DATA THROUGH QUESTIONNAIRES
• This method of data collection is quite popular, particularly in case of big enquiries. It is being adopted by private individuals, research workers, private and public organisations and even by governments.
• In this method a questionnaire is sent (usually by post) to the persons concerned with a request to answer the questions and return the questionnaire. A questionnaire consists of a number of questions printed or typed in a definite order on a form or set of forms.
• The questionnaire is mailed to respondents who are expected to read and understand the questions and write down the reply in the space meant for the purpose in the questionnaire itself. The respondents have to answer the questions on their own.
• The method of collecting data by mailing the questionnaires to respondents is most extensively employed in various economic and business surveys.
COLLECTION OF DATA THROUGH QUESTIONNAIRES Contd…
• The opening questions should be such as to arouse human interest. The following types of questions should generally be avoided as opening questions in a questionnaire:
1. Questions that put too great a strain on the memory or intellect of the respondent;
2. Questions of a personal character;
3. Questions related to personal wealth, etc.
Questionnaire Design
General Considerations
The first rule is to design the questionnaire to fit the medium.
Examples:
Multiple Choice
1. Where do you live?
☐ North
☐ South
☐ East
☐ West
Numeric Open End
2. How much did you spend on groceries this week? ……………..
Questionnaire Design
Text Open End
3. How can our company improve its working conditions?

Rating Scales and Agreement Scales are two types of questions that some researchers treat as multiple choice questions and others treat as numeric open end questions.
Rating Scales
4. How would you rate this product?
☐ Excellent
☐ Good
☐ Fair
☐ Poor
5. On a scale where “10” means you have a great amount of interest in a subject and “1” means you have none at all, how would you rate your interest in each of the following topics?
Domestic politics …
Foreign Affairs …
Science and Health …
Business …
Questionnaire Design

Agreement Scale
6. How much do you agree with each of the following statements?
S. No  Particulars                                        Strongly agree  Agree  Disagree  Strongly disagree
1      My manager provides constructive criticism
2      Our medical plan provides adequate coverage
3      I would prefer to work longer hours on fewer days
A Sample Questionnaire
A study for a telephone services company to find the expectations and profiles of customers using telephone booths at Hyderabad. The format of the questionnaire used in this study is presented below:
Questionnaire
Study on customer expectations and profiles of PCO booths at Hyderabad
Address of Telephone Booth:
Customer’s personal profile
1. Name :
2. Age :
a. Up to 17 years b. 18-24 years
c. 25-40 years d. 41-50 years
e. 51-60 years f. More than 60 years
3. Gender
a. Male …… b. Female …..
4. Monthly household income
a. Less than Rs. 10,000 b. Rs. 10,000 – 20,000
c. Rs. 20,000 – 30,000 d. Rs. 30,000 – 50,000 e. More than Rs. 50,000
5. Occupation
a. Service sector b. Government c. Public d. Private
e. Business f. Student / housewife g. Others (specify) ………
SOME OTHER METHODS OF DATA COLLECTION
These methods are used particularly by big business houses in modern times.
1. Warranty cards: Warranty cards are usually postal-sized cards which are used by dealers of consumer durables to collect information regarding their products. The consumer fills in the card and posts it back to the dealer.
2. Distributor or store audits: These are performed by distributors as well as manufacturers through their salesmen at regular intervals, to estimate market size, market share, seasonal purchasing pattern and so on. The data in such audits are obtained not by questioning but by observation.
3. Pantry audit technique: It is used to estimate consumption of the basket of goods at the consumer level. The aim is to find out what types of consumers buy certain products and certain brands, the assumption being that the contents of the pantry accurately portray the consumer’s preferences.
4. Consumer panel: An extension of the pantry audit approach on a regular basis is known as a ‘consumer panel’, where a set of consumers agree to maintain detailed daily records of their consumption, which are made available to the investigator on demand.
5. Use of mechanical devices: Mechanical devices have been widely used to collect information by indirect means. The eye camera, pupilometric camera, psychogalvanometer, motion picture camera and audiometer are the principal devices so far developed and commonly used by modern big business houses, mostly in the developed world, for collecting the required information.
6. Projective techniques: Projective techniques (sometimes called indirect interviewing techniques) for the collection of data play an important role in motivational research and in attitude surveys.
7. Depth interviews: Depth interviews are held to explore the needs, desires and feelings of respondents. Unless the researcher has specialized training, depth interviewing should not be attempted.
8. Content analysis: Content analysis consists of analysing the contents of documentary materials such as books, magazines and newspapers, and the contents of all other verbal materials.
COLLECTION OF SECONDARY DATA
Secondary data means data that are already available i.e., they refer to the data which have
already been collected and analyzed by someone else.
When the researcher utilizes secondary data, then he has to look into various sources from
where he can obtain them.
Secondary data may either be published data or unpublished data.
Usually published data are available in:
a. Various publications of the central, state or local governments;
b. Various publications of foreign governments or of international bodies and their subsidiary
organizations;
c. Technical and trade journals;
d. Books, magazines and newspapers;
e. Reports and publications of various associations connected with business and industry,
banks, stock exchanges, etc.;
f. Reports prepared by research scholars, universities, economists, etc. in different fields;
g. Public records and statistics, historical documents, and other sources of published
information.
COLLECTION OF SECONDARY DATA Contd….
The sources of unpublished data are many: It may be found in diaries, letters, unpublished
biographies and autobiographies and also may be available with scholars and research workers,
trade associations, labour bureaus and other public/private individuals and organisations.
The researcher must be very careful in using secondary data. By way of caution, the researcher, before using secondary data, must see that they possess the following characteristics:
1. Reliability of data: Reliability can be tested by finding out:
(a) Who collected the data? (b) What were the sources of data?
(c) Were they collected by using proper methods? (d) At what time were they collected?
(e) Was there any bias of the compiler? (f) What level of accuracy was desired? Was it achieved?
2. Suitability of data: The data that are suitable for one enquiry may not necessarily be found
suitable in another enquiry.
3. Adequacy of data: If the level of accuracy achieved in data is found inadequate for the purpose of
the present enquiry, they will be considered as inadequate and should not be used by the researcher.
From all this we can say that it is very risky to use the already available data. The already
available data should be used by the researcher only when he finds them reliable, suitable and
adequate.
Description and analysis of Data
• Technically speaking, description implies editing, coding, classification and tabulation of collected data so that they are amenable to analysis.
• The term analysis refers to the computation of certain measures along with searching for patterns of relationship that exist among data-groups.
• Thus, “in the process of analysis, relationships or differences supporting or conflicting with original or new hypotheses should be subjected to statistical tests of significance to determine with what validity data can be said to indicate any conclusions”.
Editing: Checking the filled questionnaires. It is routine work, but it has to be carried out with utmost care and devotion.
Coding: Reducing the mass of data into manageable proportions. It is an operation which requires judgment and skill, particularly for developing the coding frame.
Classification and tabulation: Summarizing data into tabular form. Tabulation is a common tool, used for summarizing the data so that they are amenable to interpretation.
Description Operations
Editing: Editing of data is a process of examining the collected raw data (especially in surveys) to detect errors and omissions and to correct these when possible. It involves a careful scrutiny of the completed questionnaires and/or schedules.
• Field editing: It consists in the review of the reporting forms by the investigator for completing (translating or rewriting) what was written in abbreviated or illegible form at the time of recording. This type of editing is necessary in view of the fact that individual writing styles often can be difficult for others to decipher.
• Central editing: It should take place when all forms or schedules have been completed and returned to the office. This type of editing implies that all forms should get a thorough editing by a single editor in a small study and by a team of editors in case of a large inquiry.
Description Operations Contd…..
Coding:
• Coding refers to the process of assigning numerals or other symbols to answers so that responses can be put into a limited number of categories or classes.
• Coding is necessary for efficient analysis; through it the several replies may be reduced to a small number of classes which contain the critical information required for analysis.

Classification:
• Most research studies result in a large volume of raw data which must be reduced into homogeneous groups if we are to get meaningful relationships.
1. Classification according to attributes: Data are classified on the basis of common characteristics which can either be descriptive (such as literacy, sex, honesty, etc.) or numerical (such as weight, height, income, etc.).
2. Classification according to class-intervals: The numerical characteristics refer to quantitative phenomena which can be measured through some statistical units. Data relating to income, production, age, weight, etc. come under this category.
Description Operations Contd…..
Tabulation: When a mass of data has been assembled, it becomes necessary for the researcher to arrange the same in some kind of concise and logical order. This procedure is referred to as tabulation.

Tabulation is essential because of the following reasons:
1. It conserves space and reduces explanatory and descriptive statement to a minimum.
2. It facilitates the process of comparison.
3. It facilitates the summation of items and the detection of errors and omissions.
4. It provides a basis for various statistical computations.
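
The coding, classification and tabulation operations described above can be sketched in a few lines of Python. This is only an illustration: the coding frame, the class intervals (borrowed from the age groups of the sample questionnaire) and the raw responses are all hypothetical, and the helper names are made up for this sketch.

```python
from collections import Counter

# Hypothetical coding frame for a rating question (the numerals are illustrative).
CODING_FRAME = {"Excellent": 4, "Good": 3, "Fair": 2, "Poor": 1}

# Hypothetical class intervals for age: (lower, upper, label), both ends inclusive.
AGE_CLASSES = [(0, 17, "Up to 17"), (18, 24, "18-24"), (25, 40, "25-40"),
               (41, 50, "41-50"), (51, 60, "51-60")]


def code_response(answer):
    """Coding: map a verbal answer to its numeral; None marks an uncodable answer."""
    return CODING_FRAME.get(answer.strip().title())


def classify_age(age):
    """Classification by class-interval: return the label of the interval holding age."""
    for lower, upper, label in AGE_CLASSES:
        if lower <= age <= upper:
            return label
    return "More than 60"


def tabulate(values):
    """Tabulation: count how many items fall into each category."""
    return Counter(values)


raw_ratings = ["good", "Excellent", "fair", "good", "POOR", "good"]
raw_ages = [16, 22, 35, 35, 47, 63, 24]

coded = [code_response(r) for r in raw_ratings]
age_table = tabulate(classify_age(a) for a in raw_ages)

print(coded)
for label, count in age_table.items():
    print(f"{label:>12}: {count}")
```

Returning None for an answer outside the coding frame keeps omissions visible, so they can be handled at the editing stage rather than silently dropped.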
Need for Sampling
Sampling is used in practice for a variety of reasons such as:
1. Sampling can save time and money. A sample study is usually less expensive than a
census study and produces results at a relatively faster speed.
2. Sampling may enable more accurate measurements, because a sample study is generally conducted by trained and experienced investigators.
3. Sampling remains the only way when the population contains infinitely many members.
4. Sampling remains the only choice when a test involves the destruction of the item under study.
5. Sampling usually enables the researcher to estimate the sampling errors and, thus, assists in obtaining information concerning some characteristic of the population.
Sample Design
The following are to be considered for a sample design:
i. Nature of universe: The universe may be either homogeneous or heterogeneous in nature. If the items of the universe are homogeneous, a small sample can serve the purpose. But if the items are heterogeneous, a large sample would be required. Technically, this can be termed the dispersion factor.
ii. Number of classes proposed: If many class-groups (groups and sub-groups)
are to be formed, a large sample would be required because a small sample
might not be able to give a reasonable number of items in each class-group.
iii. Nature of study: If items are to be intensively and continuously studied, the
sample should be small. For a general survey the size of the sample should
be large, but a small sample is considered appropriate in technical surveys.
iv. Type of sampling: Sampling technique plays an important part in
determining the size of the sample. A small random sample is apt to be much
superior to a larger but badly selected sample.
Sample Design Contd…
v. Standard of accuracy and acceptable confidence level: If the standard of accuracy or the level of precision is to be kept high, we shall require a relatively larger sample. For doubling the accuracy at a fixed significance level, the sample size has to be increased fourfold.
vi. Availability of finance: In practice, the size of the sample depends upon the amount of money available for the study. This factor should be kept in view while determining the size of the sample, because large samples increase the cost of sampling estimates.
vii. Other considerations: Nature of units, size of the population, size of questionnaire, availability of trained investigators, the conditions under which the sample is being conducted, and the time available for completion of the study are a few other considerations to which a researcher must pay attention while selecting the size of the sample.
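
The fourfold rule in point (v) follows from the standard error of a sample mean, which shrinks with the square root of the sample size. A minimal sketch, assuming the textbook formula n = (z·σ/E)² for estimating a mean (this formula is not stated in the slides), with a made-up standard deviation and made-up margins of error:

```python
import math
from statistics import NormalDist


def required_sample_size(sigma, margin, confidence=0.95):
    """Smallest n such that z * sigma / sqrt(n) <= margin when estimating a mean.

    sigma  : assumed population standard deviation (hypothetical value below)
    margin : acceptable error E, in the same units as sigma
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical value
    return math.ceil((z * sigma / margin) ** 2)


n_base = required_sample_size(sigma=10.0, margin=2.0)
n_double = required_sample_size(sigma=10.0, margin=1.0)  # margin halved

# Halving the margin of error roughly quadruples the required sample size.
print(n_base, n_double, round(n_double / n_base, 2))
```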
Role of Statistics for Data Analysis
• The role of statistics in research is to function as a tool in designing research, analysing its data and drawing conclusions therefrom. Most research studies result in a large volume of raw data which must be suitably reduced so that the same can be read easily and can be used for further analysis. Clearly the science of statistics cannot be ignored by any research worker.
• The important statistical measures that are used to summarise the survey/research data are:
1. Measures of central tendency or statistical averages
2. Measures of dispersion
3. Measures of asymmetry (skewness)
4. Measures of relationship
Some Important Definitions
• A Population (Universe) is the whole collection of things under consideration
• A Sample is a portion of the population selected for analysis
• A Parameter is a summary measure computed to describe the characteristic of a population
• A Statistic is a summary measure computed to describe the characteristic of a sample
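
To make the parameter/statistic distinction concrete, the sketch below builds a hypothetical population of household incomes, draws a simple random sample from it, and computes the same summary measure (the mean) on both. The population values, the sample size and the seed are arbitrary choices for illustration.

```python
import random
from statistics import mean

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: monthly incomes (Rs. thousand) of 1,000 households.
population = [random.uniform(5, 60) for _ in range(1000)]

# A simple random sample of 50 households, drawn without replacement.
sample = random.sample(population, k=50)

parameter = mean(population)  # summary measure of the whole population
statistic = mean(sample)      # the same measure computed on the sample

print(f"population mean (parameter): {parameter:.2f}")
print(f"sample mean (statistic):     {statistic:.2f}")
```

In practice only the statistic is observable; the parameter is shown here only because the population was generated artificially.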
Population and Sample

Population: use parameters to summarize features
Sample: use statistics to summarize features
Inference on the population is made from the sample


Types of Data

Data
  Categorical (Qualitative)
  Numerical (Quantitative)
    Discrete
    Continuous
IMPORTANT STATISTICAL MEASURES
• Measures of Central Tendency (statistical averages): Mean, Median, Mode, Geometric Mean, Harmonic Mean
• Quartiles
• Measures of Variation: Range, Semi Inter-quartile Range, Mean Deviation, Variance, Standard Deviation and Coefficient of Variation
• Measures of Skewness / Shape (asymmetry): Symmetric, Skewed
• Measures of Kurtosis / Peakedness: Leptokurtic, Platykurtic, Mesokurtic
Summary Measures
Central Tendency: Mean, Median, Mode, Geometric Mean
Quartile
Variation: Range, Variance, Standard Deviation, Coefficient of Variation
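
The measures of variation named above can be computed with Python's standard `statistics` module. The sketch below uses a small made-up data set and the population forms of variance and standard deviation (dividing by n); a sample-based analysis would divide by n - 1 instead. Quartile values depend on the interpolation convention; the default "exclusive" method of `statistics.quantiles` is assumed here.

```python
from statistics import mean, pstdev, pvariance, quantiles

data = [12, 15, 11, 18, 20, 14, 16, 14]  # hypothetical observations

avg = mean(data)
data_range = max(data) - min(data)                 # range
q1, q2, q3 = quantiles(data, n=4)                  # quartiles (exclusive method)
semi_iqr = (q3 - q1) / 2                           # semi inter-quartile range
mean_deviation = sum(abs(x - avg) for x in data) / len(data)
variance = pvariance(data)                         # population variance
std_dev = pstdev(data)                             # population standard deviation
coeff_var = std_dev / avg * 100                    # coefficient of variation, in %

print(f"range={data_range}, semi-IQR={semi_iqr}, mean deviation={mean_deviation}")
print(f"variance={variance}, SD={std_dev:.3f}, CV={coeff_var:.1f}%")
```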
Shape of a Distribution
• Describes how data are distributed
• Measures of shape: symmetric or skewed

Left-Skewed: Mean < Median < Mode
Symmetric: Mean = Median = Mode
Right-Skewed: Mode < Median < Mean
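
The ordering of mean, median and mode is a useful rule of thumb rather than a strict law, so it helps to check it alongside a numerical skewness measure. The sketch below uses three made-up data sets and the moment coefficient of skewness (third central moment divided by the cube of the standard deviation), which is one common choice and is not named in the slides.

```python
from statistics import mean, median, mode, pstdev


def moment_skewness(values):
    """Third central moment divided by the cube of the population SD."""
    m = mean(values)
    s = pstdev(values)
    return sum((x - m) ** 3 for x in values) / (len(values) * s ** 3)


# Hypothetical data sets chosen to show each shape.
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 14]
left_skewed = [15 - x for x in right_skewed]   # mirror image of the above
symmetric = [1, 2, 3, 3, 3, 4, 5]

for name, values in [("right", right_skewed), ("left", left_skewed),
                     ("symmetric", symmetric)]:
    print(f"{name:>9}: mean={mean(values)}, median={median(values)}, "
          f"mode={mode(values)}, skewness={moment_skewness(values):+.2f}")
```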
Estimates of Population
• In most statistical research studies, population parameters are usually unknown and have to be estimated from a sample.
• The estimate of a population parameter may be one single value or it could be a range of values. In the former case it is referred to as a point estimate, whereas in the latter case it is termed an interval estimate.
• The random variables (such as the sample mean X̄ and the sample variance s²) used to estimate population parameters (such as µ and σ²) are conventionally called ‘estimators’, while specific values of these (such as X̄ = 105 or s² = 21.44) are referred to as ‘estimates’ of the population parameters.
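
A point estimate and an interval estimate of a population mean can be sketched as follows. The sample values are hypothetical, and the interval uses the normal approximation X̄ ± z·s/√n, which is an assumption of this sketch: it is reasonable for large samples, while for a small sample such as this one a t-based interval would normally be wider.

```python
import math
from statistics import NormalDist, mean, stdev


def interval_estimate(sample, confidence=0.95):
    """Return (point estimate, lower limit, upper limit) for the population mean."""
    n = len(sample)
    x_bar = mean(sample)                       # point estimate of mu
    std_error = stdev(sample) / math.sqrt(n)   # s / sqrt(n), with s using n - 1
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return x_bar, x_bar - z * std_error, x_bar + z * std_error


sample = [102, 98, 110, 105, 99, 107, 111, 104, 101, 108, 106, 109]  # hypothetical
point, lower, upper = interval_estimate(sample)
print(f"point estimate: {point}")
print(f"95% interval estimate: ({lower:.2f}, {upper:.2f})")
```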
Parametric and Non-Parametric Methods
• Statisticians have developed several tests of hypotheses (also known as tests of significance), which can be classified as:
1. Parametric tests or standard tests of hypotheses
2. Non-parametric tests or distribution-free tests of hypotheses
• Parametric tests usually assume certain properties of the parent population from which we draw samples. Assumptions such as that the observations come from a normal population, that the sample size is large, and assumptions about population parameters like mean, variance, etc., must hold good before parametric tests can be used.
• The important parametric tests are:
(1) z-test; (2) t-test; (3) χ²-test, and (4) F-test.
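
As an illustration of the first test in this list, here is a minimal sketch of a two-sided one-sample z-test for a mean. It assumes a normal parent population with known standard deviation; the function name and all numbers in the example are hypothetical.

```python
import math
from statistics import NormalDist


def one_sample_z_test(sample_mean, mu0, sigma, n):
    """Two-sided z-test of H0: population mean equals mu0, with sigma known."""
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical case: 36 observations with mean 105, tested against mu0 = 100, sigma = 15.
z, p = one_sample_z_test(sample_mean=105, mu0=100, sigma=15, n=36)
print(f"z = {z:.2f}, two-sided p-value = {p:.4f}")
```

A p-value below the chosen significance level (say 0.05) would lead to rejecting the null hypothesis.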
Parametric and Non-Parametric Methods Contd..
When the researcher cannot or does not want to make such assumptions, we use statistical methods for testing hypotheses which are called non-parametric tests, because such tests do not depend on any assumption about the parameters of the parent population.
IMPORTANT NONPARAMETRIC OR DISTRIBUTION-FREE TESTS

Tests of hypotheses with ‘Order statistics’ or ‘Non-Parametric statistics’ or ‘Distribution-free statistics’ are known as nonparametric or distribution-free tests.

The following distribution-free tests are important and generally used:

i. Test of a hypothesis concerning some single value for the given data (such as one-
sample sign test).
ii. Test of a hypothesis concerning no difference among two or more sets of data
(such as two-sample sign test, Fisher-Irwin test, Rank sum test, etc.).
iii. Test of a hypothesis of a relationship between variables.
iv. Test of a hypothesis concerning variation in the given data i.e., test analogous to
ANOVA .
v. Tests of randomness of a sample based on the theory of runs viz., one sample runs
test.
vi. Test of hypothesis to determine if categorical data shows dependency or if two
classifications are independent viz., the chi-square test. The chi-square test can as
well be used to make comparison between theoretical populations and actual data
when categories are used.
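As an illustration of item (i), the one-sample sign test can be sketched in a few lines. This is a minimal sketch, not a reference implementation: the function name `sign_test_p_value` and the data are hypothetical, and it assumes a two-sided test in which observations equal to the hypothesized median are dropped.

```python
from math import comb

def sign_test_p_value(data, hypothesized_median):
    """Two-sided exact one-sample sign test of H0: median == hypothesized_median."""
    plus = sum(1 for x in data if x > hypothesized_median)
    minus = sum(1 for x in data if x < hypothesized_median)
    n = plus + minus              # observations equal to the median are dropped
    k = min(plus, minus)          # count of the rarer sign
    # Under H0 each sign has probability 0.5, so the count follows Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)     # double the tail for a two-sided test

# Hypothetical data: does the median differ from 10?
scores = [12, 15, 11, 18, 14, 16, 17, 13, 19, 20]
p_value = sign_test_p_value(scores, hypothesized_median=10)
print(f"p-value = {p_value:.4f}")
```

No assumption about the shape of the parent population is needed here; only the signs of the deviations are used.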
Points of Central Tendency
• Measures of central tendency (or statistical averages) tell us the point about which
items have a tendency to cluster. Such a measure is considered the most
representative figure for the entire mass of data. Mean, median and mode are the
most popular averages.

• Mean, also known as the arithmetic average:
$\bar{X} = \frac{\sum X_i}{n}$
  ◦ where $\bar{X}$ = the symbol we use for the mean (pronounced "X bar")
  ◦ $\sum$ = symbol for summation
  ◦ $X_i$ = value of the i-th item
  ◦ n = total number of items

• Median (M) is the value of the middle item of a series when it is arranged in
ascending or descending order of magnitude.

• Mode is the most commonly or frequently occurring value in a series. The
mode in a distribution is that item around which there is maximum
concentration.
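The three averages can be computed directly with Python's standard `statistics` module. The data below are hypothetical and chosen only to make the three values easy to check by hand.

```python
import statistics

# Hypothetical series of seven observations.
marks = [4, 6, 6, 7, 8, 6, 9]

mean = sum(marks) / len(marks)        # arithmetic average: sum of items / number of items
median = statistics.median(marks)     # middle item after sorting: 4, 6, 6, 6, 7, 8, 9
mode = statistics.mode(marks)         # most frequently occurring value

print(mean, median, mode)
```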
Measures of Relationship
• So far we have dealt with statistical measures used in the context of a
univariate population, i.e., a population consisting of measurements of
only one variable. When data are available on two or more variables, we
can ask whether those variables are related.

For example: Is the number of hours students devote to studies
somehow related to their family income, age, sex or other
similar factors?
There are several methods of determining the relationship
between variables, but no method can tell us for certain that a
correlation is indicative of a causal relationship.
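One common measure of relationship is Karl Pearson's coefficient of correlation, which the text above does not name explicitly. The sketch below computes it from first principles; the helper name `pearson_r` and the study-hours and income figures are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations about the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: weekly study hours and family income (in thousands).
study_hours = [5, 8, 10, 12, 15, 18]
family_income = [30, 28, 45, 40, 52, 60]
r = pearson_r(study_hours, family_income)
print(f"r = {r:.3f}")
```

A value of r near +1 or -1 indicates a strong linear association, but it says nothing about which variable, if either, causes the other.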
Inferential Statistics
Analysis, particularly in the case of survey or experimental data,
involves estimating the values of unknown parameters of the
population and testing hypotheses in order to draw inferences.

• Descriptive analysis: Descriptive analysis is largely the study of
distributions of one variable. This study provides us with profiles of
companies, work groups, persons and other subjects on any of a number of
characteristics such as size, composition, efficiency and preferences.

• Inferential analysis: Inferential analysis is often known as statistical
analysis. It is concerned with the various tests of significance for testing
hypotheses in order to determine with what validity data can be said to indicate
some conclusion or conclusions. It is also concerned with the estimation of
population values.
Descriptive Statistics
• Collect Data
  ◦ E.g., Survey
• Present Data
  ◦ E.g., Tables and graphs
• Characterize Data
  ◦ E.g., Sample Mean $\bar{X} = \frac{\sum X_i}{n}$
Inferential Statistics
• Estimation
  ◦ E.g., Estimate the population mean weight using the sample mean weight
• Hypothesis Testing
  ◦ E.g., Test the claim that the population mean weight is 120 pounds

Drawing conclusions and/or making decisions concerning a population based on sample results.
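The estimation example can be sketched as follows. The slide mentions only the point estimate (the sample mean); the interval estimate added here is an extension, and it assumes a known population standard deviation so that the normal distribution applies. The weights, the value sigma = 5.0 and the function name are all hypothetical.

```python
from math import sqrt
from statistics import NormalDist, fmean

def mean_confidence_interval(sample, sigma, confidence=0.95):
    """Point estimate and z-based interval for a population mean (sigma assumed known)."""
    n = len(sample)
    point_estimate = fmean(sample)
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / sqrt(n)
    return point_estimate, point_estimate - margin, point_estimate + margin

# Hypothetical sample of ten weights in pounds.
weights = [118, 122, 125, 117, 121, 119, 124, 120, 123, 121]
estimate, lower, upper = mean_confidence_interval(weights, sigma=5.0)
print(f"estimate = {estimate:.1f}, interval = ({lower:.2f}, {upper:.2f})")
```

With these made-up numbers the claimed value of 120 pounds falls inside the interval, so the sample would not contradict the claim at the 5% level.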
What is a Hypothesis?
• A Hypothesis is a Claim (Assumption) about the Population Parameter
  ◦ Examples of parameters are population mean or proportion
  ◦ The parameter must be identified before analysis

Example claim: "I claim the mean GPA of this class is $\mu = 3.5$!"



The Null Hypothesis, H0
• States the Assumption (Numerical) to be Tested
  ◦ E.g., The mean GPA is 3.5
• Null Hypothesis is Always about a Population Parameter ($H_0: \mu = 3.5$), Not about a Sample Statistic ($H_0: \bar{X} = 3.5$)
• Is the Hypothesis a Researcher Tries to Reject
The Null Hypothesis, H0 (continued)
• Begin with the Assumption that the Null Hypothesis is True
  ◦ Similar to the notion of innocent until proven guilty
• Refers to the Status Quo
• Always Contains the “=” Sign
• The Null Hypothesis May or May Not be Rejected
The Alternative Hypothesis, H1
• Is the Opposite of the Null Hypothesis
  ◦ E.g., The mean GPA is NOT 3.5 ($H_1: \mu \neq 3.5$)
• Challenges the Status Quo
• Never Contains the “=” Sign
• The Alternative Hypothesis May or May Not Be Accepted (i.e., The Null Hypothesis May or May Not Be Rejected)
• Is Generally the Hypothesis that the Researcher Claims
Hypothesis Testing Process
1. Identify the population. Assume the population mean GPA is 3.5 ($H_0: \mu = 3.5$).
2. Take a sample and compute the sample mean ($\bar{X} = 2.4$).
3. Ask: is $\bar{X} = 2.4$ likely if $\mu = 3.5$?
4. No, not likely! REJECT the Null Hypothesis.
Reason for Rejecting H0
It is unlikely that we would get a sample mean of this value ($\bar{X} = 2.4$) if in fact $\mu = 3.5$ were the population mean. Therefore, we reject the null hypothesis that $\mu = 3.5$.

[Figure: Sampling Distribution of $\bar{X}$ if H0 is true, centered at $\mu = 3.5$, with the observed sample mean 2.4 marked to the left of the center.]
General Steps in Hypothesis Testing

E.g., Test the Assumption that the True Mean # of TV Sets in U.S.
Homes is at Least 3 ($\sigma$ Known)
1. State the H0: $H_0: \mu \geq 3$
2. State the H1: $H_1: \mu < 3$
3. Choose $\alpha$: $\alpha = .05$
4. Choose n: $n = 100$
5. Choose Test: Z test
General Steps in Hypothesis Testing Contd…

6. Set up critical value(s): Reject H0 if Z < -1.645
7. Collect data: 100 households surveyed
8. Compute test statistic and p-value: Computed test stat = -2, p-value = .0228
9. Make statistical decision: Reject null hypothesis
10. Express conclusion: The true mean # of TV sets is less than 3
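The ten steps of the TV-set example can be traced in code. The slides give only the test statistic (-2) and the p-value (.0228); the sample mean of 2.84 and the population standard deviation of 0.8 used below are hypothetical values chosen so that the statistic comes out at -2. The function name is also an assumption made for this sketch.

```python
from math import sqrt
from statistics import NormalDist

def lower_tail_z_test(sample_mean, mu0, sigma, n, alpha):
    """Z test of H0: mu >= mu0 against H1: mu < mu0, with sigma known."""
    std_normal = NormalDist()
    z = (sample_mean - mu0) / (sigma / sqrt(n))   # step 8: test statistic
    critical = std_normal.inv_cdf(alpha)          # step 6: lower-tail critical value
    p_value = std_normal.cdf(z)                   # step 8: P(Z <= z) if H0 is true
    reject = z < critical                         # step 9: statistical decision
    return z, critical, p_value, reject

# Steps 1-5 and 7: H0: mu >= 3, alpha = .05, n = 100; sample_mean and sigma are hypothetical.
z, critical, p_value, reject = lower_tail_z_test(
    sample_mean=2.84, mu0=3, sigma=0.8, n=100, alpha=0.05
)
print(f"Z = {z:.2f}, critical value = {critical:.3f}, p-value = {p_value:.4f}")
print("Reject H0" if reject else "Do not reject H0")
```

Comparing Z with the critical value and comparing the p-value with alpha always lead to the same decision.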
Level of Significance, $\alpha$
• Defines Unlikely Values of Sample Statistic if Null Hypothesis is True
  ◦ Called rejection region of the sampling distribution
• Designated by $\alpha$ (level of significance)
  ◦ Typical values are .01, .05, .10
• Selected by the Researcher at the Beginning
• Controls the Probability of Committing a Type I Error
• Provides the Critical Value(s) of the Test
Error in Making Decisions Contd…
• Type II Error
  ◦ Fail to reject a false null hypothesis
  ◦ Probability of Type II Error is $\beta$
  ◦ The power of the test is $(1 - \beta)$
• Probability of Not Making Type I Error
  ◦ $(1 - \alpha)$
  ◦ Called the Confidence Coefficient
Result Probabilities
H0: Innocent

Jury Trial
                 The Truth
Verdict          Innocent      Guilty
Innocent         Correct       Error
Guilty           Error         Correct

Hypothesis Test
                     The Truth
Decision             H0 True                     H0 False
Do Not Reject H0     $1 - \alpha$                Type II Error ($\beta$)
Reject H0            Type I Error ($\alpha$)     Power ($1 - \beta$)
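Level of Significance and the Rejection Region

• $H_0: \mu \geq 3.5$, $H_1: \mu < 3.5$: rejection region of area $\alpha$ in the lower tail
• $H_0: \mu \leq 3.5$, $H_1: \mu > 3.5$: rejection region of area $\alpha$ in the upper tail
• $H_0: \mu = 3.5$, $H_1: \mu \neq 3.5$: rejection regions of area $\alpha/2$ in each tail

The critical value(s) mark the boundary of each rejection region.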
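The critical values that bound the three rejection regions can be computed from the standard normal distribution. This is a sketch assuming a Z test; the function name `critical_values` and the choice of alpha = .05 are illustrative.

```python
from statistics import NormalDist

def critical_values(alpha, tail):
    """Critical Z value(s) for a lower-tail, upper-tail or two-tail test."""
    std_normal = NormalDist()
    if tail == "lower":                    # H1: mu < mu0, area alpha in the left tail
        return (std_normal.inv_cdf(alpha),)
    if tail == "upper":                    # H1: mu > mu0, area alpha in the right tail
        return (std_normal.inv_cdf(1 - alpha),)
    if tail == "two":                      # H1: mu != mu0, area alpha/2 in each tail
        z = std_normal.inv_cdf(1 - alpha / 2)
        return (-z, z)
    raise ValueError("tail must be 'lower', 'upper' or 'two'")

for tail in ("lower", "upper", "two"):
    values = ", ".join(f"{v:.3f}" for v in critical_values(0.05, tail))
    print(f"{tail}-tail test at alpha = .05: {values}")
```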
Type I & II Errors Have an Inverse Relationship

If you reduce the probability of one error, the probability of the other
goes up, holding everything else unchanged.


Factors Affecting Type II Error
• True Value of Population Parameter
  ◦ $\beta$ increases when the difference between the hypothesized parameter and its true value decreases
• Significance Level $\alpha$
  ◦ $\beta$ increases when $\alpha$ decreases
• Population Standard Deviation $\sigma$
  ◦ $\beta$ increases when $\sigma$ increases
• Sample Size n
  ◦ $\beta$ increases when n decreases
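The four factors can be checked numerically. The sketch below computes the probability of a Type II error for a lower-tail Z test of H0: mu >= mu0; the function name and every number in it (mu0 = 3.5, true mean 3.2, sigma = 1.0, n = 25, alpha = .05) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def type_ii_error(mu0, true_mu, sigma, n, alpha):
    """Beta for a lower-tail Z test of H0: mu >= mu0 when the true mean is true_mu."""
    std_normal = NormalDist()
    standard_error = sigma / sqrt(n)
    # H0 is rejected when the sample mean falls below this cutoff.
    critical_mean = mu0 + std_normal.inv_cdf(alpha) * standard_error
    # Beta is the chance the sample mean stays at or above the cutoff although H0 is false.
    return 1 - std_normal.cdf((critical_mean - true_mu) / standard_error)

base = type_ii_error(mu0=3.5, true_mu=3.2, sigma=1.0, n=25, alpha=0.05)
closer_truth = type_ii_error(mu0=3.5, true_mu=3.4, sigma=1.0, n=25, alpha=0.05)
smaller_alpha = type_ii_error(mu0=3.5, true_mu=3.2, sigma=1.0, n=25, alpha=0.01)
larger_sigma = type_ii_error(mu0=3.5, true_mu=3.2, sigma=1.5, n=25, alpha=0.05)
smaller_n = type_ii_error(mu0=3.5, true_mu=3.2, sigma=1.0, n=16, alpha=0.05)

print(f"base beta = {base:.3f}, power = {1 - base:.3f}")
for label, beta in [("true value closer to mu0", closer_truth),
                    ("smaller alpha", smaller_alpha),
                    ("larger sigma", larger_sigma),
                    ("smaller n", smaller_n)]:
    print(f"{label}: beta = {beta:.3f}")
```

Each variation changes one factor at a time, and each one gives a larger beta than the base case.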
How to Choose between Type I and Type II Errors
• Choice Depends on the Cost of the Errors
• Choose Smaller Type I Error When the Cost of Rejecting the Maintained Hypothesis is High
  ◦ A criminal trial: convicting an innocent person
  ◦ The Exxon Valdez: causing an oil tanker to sink
• Choose Larger Type I Error When You Have an Interest in Changing the Status Quo
  ◦ A decision in a startup company about a new piece of software
  ◦ A decision about unequal pay for a covered group
Less Variability
The Standard Error (Standard Deviation) of the
Sampling Distribution of the Mean, $\sigma_{\bar{X}}$, is Less Than the
Standard Error of Other Unbiased Estimators
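A small simulation can illustrate this for the mean and the median. It is only a sketch under stated assumptions: samples are drawn from a standard normal population, and the sample size, number of trials and seed are arbitrary choices. For populations that are far from normal the comparison can come out differently.

```python
import random
import statistics

def standard_errors(sample_size=25, trials=2000, seed=0):
    """Empirical standard errors of the sample mean and sample median (normal population)."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(sample_size)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    # The standard deviation of each estimator across repeated samples is its standard error.
    return statistics.stdev(means), statistics.stdev(medians)

se_mean, se_median = standard_errors()
print(f"standard error of mean   = {se_mean:.3f}")
print(f"standard error of median = {se_median:.3f}")
```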

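[Figure: Sampling Distribution of Median and Sampling Distribution of Mean, plotted as $f(\bar{X})$ against $\bar{X}$; the distribution of the mean is the narrower one, with standard error $\sigma_{\bar{X}}$.]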
