Au B.com Business Statistics
B.Com
IV-SEMESTER
BUSINESS STATISTICS
All rights reserved. No part of this publication which is material protected by this copyright notice
may be reproduced or transmitted or utilized or stored in any form or by any means now known or
hereinafter invented, electronic, digital or mechanical, including photocopying, scanning, recording
or by any information storage or retrieval system, without prior written permission from the
Alagappa University, Karaikudi, Tamil Nadu.
SYLLABI-BOOK MAPPING TABLE
BUSINESS STATISTICS
Syllabi Mapping in Book
UNIT I Statistics - Importance of Statistics in modern business Pages - 1-15
environment - Definition of Statistics - Scope and Applications of
Statistics Characteristics of Statistics- Functions of Statistics -
Limitations of Statistics - Statistical Software
UNIT II Measures of Central Tendency and Dispersion - Objectives Pages - 16-58
of statistical average - Requisites of a Good Average - Statistical
Averages - Arithmetic mean - Properties of arithmetic mean - Merits
and demerits of arithmetic mean - Median - Merits and demerits of
median - Mode - Merits and demerits of mode - Geometric Mean-
Harmonic Mean - Appropriate Situations for the Use of Various
Averages - Positional Averages - Dispersion – Range - Quartile
deviations - Mean deviation - Standard Deviation - Properties of
standard deviation - Coefficient of Variation.
UNIT VI Testing of Hypothesis in Case of Large and Small Samples - Pages - 104-120
Large Samples– Assumptions - Testing Hypothesis - Null and alternate
hypothesis - Interpreting the level of significance - Hypotheses are
accepted and not proved - Selecting a Significance Level - Preference of
type I error - Preference of type II error - Determine appropriate
distribution - Two – Tailed Tests and One – Tailed Tests - Two – tailed
tests - Case study on two – tailed and one-tailed tests - Classification of
Test Statistics - Statistics used for testing of hypothesis - Test procedure -
How to identify the right statistics for the test - Testing of Hypothesis in
Case of Small Samples - Small samples - ‘t’ Distribution - Uses of ‘t’ test
UNIT VII Chi-square Test - Chi-square as a Test of Independence -
Pages -121-124
Characteristics of Chi-square test - Degrees of freedom -
Restrictions in applying Chi-square test - Practical applications of
Chi-square test - Levels of significance - Steps in solving problems
related to Chi-Square test - Interpretation of Chi-Square values -
Chi-Square Distribution -Properties of Chi-square distribution -
Conditions for applying the Chi-Square test - Uses of Chi-square test
- Applications of Chi-Square test - Tests for independence of
attributes - Test of goodness of fit - Test for specified variance
UNIT VIII F – Distribution and Analysis of Variance (ANOVA) –
Pages – 125-131
Analysis of Variance (ANOVA) - Assumptions for F-test -
Objectives of ANOVA - ANOVA table - Assumptions for study of
ANOVA - Classification of ANOVA - ANOVA table in one- way
ANOVA - Two way classifications
UNIT IX Simple Correlation and Regression - Correlation - Pages - 132-141
Causation and Correlation- Types of Correlation - Measures of
Correlation - Scatter diagram - Karl Pearson’s correlation coefficient
- Properties of Karl Pearson’s correlation coefficient - Factors
influencing the size of correlation coefficient - Probable Error -
Conditions under which probable error can be used
UNIT X Spearman’s Rank Correlation Coefficient - Partial Correlations - Pages - 142-152
Multiple Correlations - Regression - Regression analysis - Regression lines
- Regression coefficient - Standard Error of Estimate - Multiple
Regression Analysis - Reliability of Estimates - Application of Multiple
Regressions.
UNIT XI Business Forecasting –Objectives of forecasting in Pages - 153-158
business - Prediction, projection and forecasting - Characteristics of
business forecasting-Steps in forecasting , Methods of Business
Forecasting - Business barometers.
UNIT XII Time series analysis – Extrapolation - Regression analysis Pages - 159-165
- Modern econometric methods - Exponential smoothing method -
Theories of Business Forecasting - Sequence or time-lag theory -
Action and reaction theory - Economic rhythm theory - Specific
historical analogy - Cross-cut analysis theory - Utility of Business
Forecasting - Advantages of business forecasting - Limitations of
business forecasting
UNIT XIII Time Series Analysis – Utility of the Time Series - Pages - 166-182
Components of Time Series - Long term trend or secular trend -
Seasonal variations - Cyclic variations - Random variations -
Methods of Measuring Trend - Free hand or graphic method - Semi-
average method - Method of moving averages - Method of least
squares - Mathematical Models for Time Series - Additive model -
multiplicative model, Editing of Time Series - Measurement of
Seasonal Variation - Seasonal average method - Seasonal variation
through moving averages - Chain or link relative method - Ratio to
trend method - Forecasting Methods Using Time Series - Mean
forecast - Naive forecast - Linear trend forecast - Non-linear trend
forecast - Forecasting with exponential smoothing
UNIT XIV Index Numbers: Definition – Relative - Classification of Pages - 183-203
index numbers - Base year and current year - Chief characteristics of
index numbers - Main steps in the construction of index numbers -
Methods of Computation of Index Numbers – Un- weighted index
numbers - Weighted index numbers, Tests for Adequacy of Index
Number Formulae - Cost of Living Index Numbers of Consumer
Price Index - Utility of consumer price index numbers -
Assumptions of cost of living index number - Steps in construction
of cost of living index numbers - Methods of Constructing
Consumer Price Index - Aggregate expenditure method - Family
budget method - Weight average of price relatives - Limitations of
Index Numbers - Utility and Importance of Index Numbers.
CONTENTS
2.0 Introduction
2.1 Objectives
2.2 Measures of Central Tendency
2.3 Mean
2.3.1 Arithmetic mean
2.3.2 Geometric mean
2.3.3 Harmonic mean
2.4 Median
2.5 Mode
2.6 Partition values
2.6.1 Quartiles
2.6.2 Deciles
2.6.3 Percentile
2.7 Measures of Dispersion
2.8 Range
2.13 Summary
2.17 Further Readings
3.0 Introduction
3.1 Objectives
3.3 Types of Probability
3.9 Summary
4.0 Introduction
4.1 Objectives
4.7 Summary
5.7 Unbiasedness
5.8 Consistency
5.9 Efficiency
5.10 Sufficiency
6.0 Introduction
6.1 Objectives
6.6 Summary
6.7 Key Words
7.0 Introduction
7.1 Objectives
7.5 Summary
8.0 Introduction
8.1 Analysis of Variance (ANOVA)
8.4 Summary
9.0 Introduction
9.1 Objectives
9.2 Correlation
9.8 Summary
10.1 Introduction
10.2 Objectives
10.4 Regression
10.10 Summary
11.1 Introduction
12.1 Introduction
13.4 Forecasting
13.5 Deseasonalisation
13.6 Summary
14.0 Introduction
14.1 Objectives
14.6 Summary
1.2 STATISTICS
The English word "statistics" is derived from the Latin word status,
the Italian word statista or the German word Statistik. In each case it
means "an organised political state". In the past, statistics was
regarded as the "science of statecraft", since the governments of various
states used it to collect data on population, births, deaths, taxes, etc.
Statistics has since undergone modern development. It plays a crucial
role in enriching a specific domain by collecting data in that field,
analysing the data with various statistical techniques and making
inferences from them. For example, knowing the average height of the
students will help an engineer decide the size of a door.
1.2.1 DEFINITION OF STATISTICS
Statistics can be defined in two ways, covering two different
concepts:
1. Statistics as numerical data
2. Statistics as statistical methods
1. Statistics as numerical data
When the word "statistics" is used in the plural sense, it refers to a
collection of numerical data.
For example: export or import quantities, foreign direct investment, etc.
According to Webster, "Statistics are classified facts representing
the conditions of the people in a state, especially those facts which can be
stated in numbers or in tables of numbers or in any tabular or classified
arrangement."
Webster's definition implies that only numerical facts can be
termed statistics. It is an old, narrow definition that is inadequate for
modern times.
According to Bowley, "Statistics are numerical statements of facts in
any department of inquiry placed in relation to each other."
Here, Bowley treats statistics as the science of counting and
ignores other aspects such as analysis and interpretation.
According to Yule and Kendall, "By statistics we mean
quantitative data affected to a marked extent by multiplicity of causes."
Yule and Kendall's definition tells us that numerical data are
affected by a multiplicity of causes. For example, the cost of production is
affected by wage costs, exchange rates, raw material prices, etc.
According to Professor Horace Secrist, "It is the aggregate of
facts affected to a marked extent by multiplicity of causes, numerically
expressed, enumerated or estimated according to a reasonable standard of
accuracy, collected in a systematic manner for a predetermined
purpose and placed in relation to each other."
Secrist's definition of statistics is more complete. The vital points that
the definition covers are:
1) Aggregate of facts
2) Affected by multiplicity of cause
3) Numerically expressed
4) Estimated according to standard of accuracy
5) Systematic Collection of data
6) Data collected for a predetermined purpose
7) Comparable
2. Statistics as Statistical Methods
According to Bowley, "Statistics is the science of measurement of the
social organism, regarded as a whole in all its manifestations."
This definition of Bowley is insufficient.
According to Wallis and Roberts, "Statistics is a body of methods
for making wise decisions in the face of uncertainty."
This definition is modern, as it conveys that statistical methods enable
us to arrive at valid decisions.
According to Croxton and Cowden, "Statistics may be defined as
the science of collection, presentation, analysis and interpretation of
numerical data."
This definition gives a more elaborate meaning to statistics as a set of
statistical tools.
1.2.2 IMPORTANCE OF STATISTICS
Statistics can be applied to various areas of business operations for
effective results. Some prominent areas are given below.
1) Startups - Before opening a new business or acquiring one, we
need to study the market from a statistical point of view to gauge
market demand and supply accurately. A businessman must
do proper research by collecting, analysing and interpreting
data on market trends before starting his business.
2) Production - The production of a commodity depends upon
various factors such as demand, supply of capital, etc. These
factors must be analysed statistically to get a precise and accurate
view of them.
3) Marketing - An ideal marketing strategy requires statistical
analysis of the population, the income of consumers, the availability of
the product, etc.
4) Investment - Statistics plays a vital role in decisions
about buying shares, debentures or real estate. Using
statistical data, an investor can buy investments at a lower price
and sell when the price increases.
5) Banking - The banking sector is highly influenced by economic and
market conditions. Banks have separate research departments which
collect and analyse information on inflation rates, interest
rates, bank rates, etc.
1.2.3 LIMITATIONS OF STATISTICS
1) Statistics does not analyse qualitative phenomena
As statistics is a science that deals with numerical data, it cannot be
applied to data that cannot be measured in quantitative terms.
However, statistical techniques can be used to convert
qualitative data into quantitative data.
2) Statistics does not study individuals
Statistics deals with aggregate quantities and does not give
importance to individual data. This is because individual data are not
useful for statistical analysis.
3) Statistical laws are not exact
Statistical interpretations are based on averages, and hence only
approximations can be made.
4) Statistics may be misused
Statistical data, when used by an inexperienced or illiterate
person, can lead to wrong interpretations. Hence they must be used only by
experts.
1.2.4 FUNCTIONS OF STATISTICS
1) Consolidation
Statistics enables you to consolidate and understand large volumes of data
by providing only the significant observations.
For example, instead of examining the marks of each individual student,
the class average lets you judge the class's performance as a
whole.
2) Comparison
Classification and tabulation of data are used to compare data.
Various statistical tools such as graphs, measures of dispersion and
correlation give us wide scope for comparison.
For example, the market demand for a product can be compared among
the states. This enables the company to identify and analyse the target
market.
3) Forecasting
Forecasting means predicting future prospects. Statistics plays a
huge role in forecasting the future.
For example, with the data on sales value for the past 10 years, we
will be able to predict the sales of the coming year approximately. Time
series analysis and regression analysis are important for forecasting.
4) Estimation
One of the main aims of statistics is to draw conclusions about a large
population based on the analysis of a sample group.
For example, from the heights of a sample of 10 students we will be able to
estimate the average height of all the students in the class.
5) Test of hypothesis
Testing a statistical hypothesis means drawing a conclusion about a large
population from the inferences of a sample observation.
For example, if a particular fertilizer helps in increasing the crop yield in
a particular area, then it will be used in other areas based on this sample.
1.2.5 SCOPE OF STATISTICS
1) Statistics in Industries
Statistics is extensively used in a large number of industries.
Statistics may be used in sales forecasting, consumer preference studies,
quality control, inventory control, risk management, etc. Sampling is vital
for inspection plans.
2) Statistics in Education
Statistics plays an important role in education. Statistics helps in
measuring and evaluating the progress of students, formulating
policies and predicting the future performance of students
so that they can be helped to improve.
3) Statistics in Economics
Statistics helps us to understand and analyse economic theories.
Everything from the analysis of microeconomic factors, such as the demand
for a product and research on different markets, to macroeconomic concepts
such as inflation and unemployment can be handled easily using statistics.
4) Statistics in Medicine
Statistics helps in researching and analysing medical experiments
and investigations. Biostatistics enables researchers to identify whether a
particular treatment or drug is working and how effective it is.
5) Statistics in Modern Application
A lot of software is developed day by day for experimentation,
forecasting and estimation.
For example, SYSTAT is one such software package, which provides scientific
and technical graphical options.
6) Statistics in Agriculture
Statistics can be applied in agriculture by analysing the
effectiveness of fertilizers. It can be used in taking decisions regarding
inputs and outputs, inventories, etc.
1.3 DATA
Data are pieces of factual information that are recorded and used
for analysis. Data are a tool that helps us to understand certain problems
by providing us with information. They are a set of values of
qualitative or quantitative variables.
1.3.1 TYPES OF DATA
Data are broadly classified into two types based upon who collected the
data.
Primary data
Primary data are data collected by the investigator himself for the
first time for his own research and analysis. They are also known as
first-hand information. Primary data are collected using methods such as
personal interviews, surveys, etc.
Secondary data
Secondary data are data that have already been collected and
processed by another person for the purpose of his own research. Journals,
internal sources, books, etc. are sources of secondary data.
MARKS FREQUENCY
0-10 5
10-20 10
20-30 20
30-40 15
40-50 10
Example
MONDAY 2000
TUESDAY 1750
WEDNESDAY 3000
THURSDAY 2250
FRIDAY 1550
KARNATAKA 75.36%
KERALA 93.91%
They can be further divided into two broad categories.
a) Multiple bar diagram
When there is a need to compare two sets of data, a multiple
bar diagram is used. For example, imports and exports, production
and sales, etc.
b) Component bar diagram
Component bar diagrams, also known as sub-divided bar diagrams, are
used to compare the different components of a particular class. For
example, the various components such as rent, medicine and
education on which a monthly salary is spent can be easily
understood from a component bar diagram.
(ii) Pie diagram
A pie diagram is similar to a component bar diagram, but
it represents the components as proportional sectors of a circle instead
of bars. The value given for each class is converted into a percentage, and
each percentage is then multiplied by 3.6 degrees (360/100, i.e. the 360
degrees of a circle divided into 100 parts). The circle is then divided
into sectors accordingly.
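The conversion from class values to sector angles can be sketched in Python. This is only an illustration: the spending figures and the function name pie_angles are hypothetical and do not come from the text.

```python
def pie_angles(values):
    """Convert raw class values to pie-sector angles in degrees."""
    total = sum(values)
    # Percentage share of each value, then 3.6 degrees per percentage point.
    return [value / total * 100 * 3.6 for value in values]


# Hypothetical monthly spending: rent, medicine, education, other
spending = [5000, 1000, 2500, 1500]
angles = pie_angles(spending)
print([round(angle, 1) for angle in angles])
```

Whatever values are supplied, the angles should add up to 360 degrees, which is a quick check on the arithmetic.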
2) Frequency diagram
Data in the form of a grouped frequency distribution are usually
represented by frequency diagrams. The histogram, frequency polygon,
frequency curve and ogive are types of frequency diagram.
(i) Histogram
A histogram is a diagram consisting of rectangular bars
whose areas are proportional to the frequencies of a variable and
whose widths are equal to the class intervals.
(ii) Frequency polygon
A frequency polygon is another type of frequency
distribution graph. In a frequency polygon, the number of
observations is marked with a single point at the midpoint of each
interval. The points are then connected using straight
lines.
(iii) Frequency curve
The frequency curve is obtained by drawing a smooth
freehand curve that passes through the points of a frequency
polygon as closely as possible.
(iv) Ogive
An ogive, also known as a cumulative frequency curve, is of two
types. When the cumulative frequencies are plotted against the
respective upper limits, it is a less-than ogive. When the
cumulative frequencies are plotted against the respective lower limits,
it is a more-than ogive.
3) Arithmetic line graph
An arithmetic line graph, also known as a time series graph, is a graph
in which time (months, years, weeks) is plotted on the x-axis and the
respective values are plotted on the y-axis. It helps us in analysing the
trends and periodicity of data.
1.6 SUMMARY
1.8 ANSWERS TO CHECK YOUR PROGRESS
1. The word "statistics", when used in the plural sense, refers to a
collection of numerical data; in the singular sense it means the science
of collecting, classifying and using statistics.
2. According to Professor Horace Secrist, "It is the aggregate of
facts affected to a marked extent by multiplicity of causes,
numerically expressed, enumerated or estimated according to a
reasonable standard of accuracy, collected in a systematic
manner for a predetermined purpose and placed in relation to
each other."
3. Classification and tabulation of data are used to compare data.
Various statistical tools such as graphs, measures of
dispersion and correlation give us wide scope for comparison.
4. Statistics helps in researching and analysing medical experiments
and investigations. Biostatistics enables researchers to identify whether a
particular treatment or drug is working and how effective it is.
5. Secondary data are data that have already been collected and
processed by another person for the purpose of his own research.
6. Indirect oral investigation is when the investigator questions a
person close to the source. This is done because of the reluctance of
the original person.
7. Questionnaire method
(i) This method is cheaper.
(ii) The time consumed by this process is very little.
8. Publications of international bodies like UNO, WTO and WHO,
publications of research institutes like ISI, NCERT and ICAR, and
Government publications.
9. Component bar diagrams, also known as sub-divided bar diagrams, are
used to compare the different components of a particular class.
10. Spatial classification is when the data classification is based on
place, such as town, city, district, state, country, etc.
1.9 QUESTIONS AND EXERCISE
SHORT ANSWER QUESTIONS
1. Write short notes on the types of data.
2. List the merits and demerits of direct personal interview.
3. What are the general principles followed while framing a
questionnaire?
4. Write about the classification of tabular presentation of data.
5. What is a bar diagram? What are its types?
LONG ANSWER QUESTIONS
1. Analyse the importance and scope of statistics.
2. Explain in detail the data collection techniques used for
primary data.
3. Discuss the functions and limitations of statistics.
4. Explain the various methods used for presentation of data.
1.10 FURTHER READINGS
1. Gupta, S. P. : Statistical Methods, Sultan Chand and Sons, New
Delhi.
2. Hooda, R. P.: Statistics for Business and Economics, Macmillan,
New Delhi.
3. Hein, L. W. Quantitative Approach to Managerial Decisions,
Prentice Hall,NJ.
4. Levin, Richard I. and David S. Rubin: Statistics for Management,
Prentice Hall, New Delhi.
5. Lawrance B. Moore: Statistics for Business & Economics, Harper
Collins, NY.
6. Watsman Terry J. and Keith Parramor: Quantitative Methods in
Finance International, Thompson Business Press, London.
UNIT II MEASURES OF CENTRAL TENDENCY
Structure
2.0 Introduction
2.1 Objectives
2.2 Measures of Central Tendency
2.3 Mean
2.3.1 Arithmetic mean
2.3.2 Geometric mean
2.3.3 Harmonic mean
2.4 Median
2.5 Mode
2.6 Partition values
2.6.1 Quartiles
2.6.2 Deciles
2.6.3 Percentile
2.7 Measures of Dispersion
2.7.1 Properties of a good measure of Dispersion
2.7.2 Characteristics of Measures of Dispersion
2.7.3 Classification of Measures of Dispersion
2.8 Range
2.9 Quartile deviation
2.10 Mean Deviation
2.11 Standard Deviation
2.11.1 Calculation of Standard Deviation
2.12 Coefficient of Variation
2.13 Summary
2.14 Key Words
2.15 Answers to Check Your Progress
2.16 Question and Exercise
2.17 Further Readings
2.0 INTRODUCTION
Measures of central tendency are statistical tools that
summarize data by depicting the central value of the given data. These
measures enable us to identify where most of the values fall. The
three most commonly used measures of central tendency are the mean,
median and mode. In this unit you will learn about them extensively
and also learn about some other partition values.
2.1 OBJECTIVES
So 146 cm is the average height of the brothers. Here 154 > 146
> 138: the average value lies between the minimum value and the
maximum value.
Thus, if x1, x2, ..., xn represent the values of n observations,
then the arithmetic mean (A.M.) of the n observations is (direct method):
$\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$
There are two methods for computing the arithmetic mean: (i)
Direct method (ii) Short cut method.
Direct Method:
Example:
The following data represent the number of books issued in a college
library on 7 selected days: 17, 19, 22, 25, 15, 40, 21. Find
the mean number of books.
Solution:
$\bar{x} = \frac{17 + 19 + 22 + 25 + 15 + 40 + 21}{7} = \frac{159}{7} = 22.71$
Hence the mean number of books issued is 22.71.
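The direct method can be checked with a few lines of Python. The sketch below uses the seven daily counts exactly as they are listed in the example statement (17, 19, 22, 25, 15, 40, 21); the variable names are illustrative only.

```python
# Daily counts of books issued, as listed in the example statement
books_issued = [17, 19, 22, 25, 15, 40, 21]

# Direct method: sum of the observations divided by their number
mean_books = sum(books_issued) / len(books_issued)

print(f"Sum = {sum(books_issued)}, n = {len(books_issued)}, mean = {mean_books:.2f}")
```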
Indirect Method:
In this method an assumed mean or an arbitrary value (A) is used as
the basis for calculating the deviations (di) of the individual values.
If di = xi – A, then
$\bar{x} = A + \frac{\sum_{i=1}^{n} d_i}{n}$
Example:
A student's marks in 5 subjects are 95, 78, 88, 72 and 99. Find the
average of his marks.
Solution:
Let us take the assumed mean, A = 88
xi       di = xi – 88
95         7
78       -10
88         0
72       -16
99        11
Total     -8
$\bar{x} = 88 + \frac{-8}{5} = 88 - 1.6 = 86.4$
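As a cross-check on the assumed-mean calculation, here is a minimal Python sketch. The function name mean_by_assumed_mean is hypothetical; the marks and the assumed mean A = 88 are those of the example. Because the deviations are measured from A and then averaged, the result must agree with the direct mean whatever value of A is chosen.

```python
def mean_by_assumed_mean(values, assumed):
    """Arithmetic mean via deviations from an assumed mean A."""
    deviations = [x - assumed for x in values]   # d_i = x_i - A
    return assumed + sum(deviations) / len(values)


marks = [95, 78, 88, 72, 99]
mean_marks = mean_by_assumed_mean(marks, assumed=88)
direct_mean = sum(marks) / len(marks)

print(mean_marks, direct_mean)
```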
Example:
Given the following frequency distribution, calculate the arithmetic
mean.
Marks              64   63   62   61   60   59
No. of students     8   18   12    9    7    6
Solution:
xi       fi     fi xi    di = xi – A (A = 62)    fi di
64        8      512              2                16
63       18     1134              1                18
62       12      744              0                 0
61        9      549             -1                -9
60        7      420             -2               -14
59        6      354             -3               -18
Total    60     3713                               -7
Direct Method
$\bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{N}$
$\bar{x} = \frac{3713}{60} = 61.88$
Short cut method
$\bar{x} = A + \frac{\sum_{i=1}^{n} f_i d_i}{N}$
Here A = 62
$\bar{x} = 62 + \frac{-7}{60} = 62 - 0.12 = 61.88$
The mean mark is 61.88
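Both methods for a discrete frequency distribution can be reproduced in Python. This is a sketch using the marks and frequencies from the table; the variable names are illustrative.

```python
marks = [64, 63, 62, 61, 60, 59]
students = [8, 18, 12, 9, 7, 6]

total_students = sum(students)                              # N
sum_fx = sum(f * x for x, f in zip(marks, students))        # sum of f_i * x_i

# Direct method
direct_mean = sum_fx / total_students

# Short cut method with assumed mean A = 62
assumed = 62
sum_fd = sum(f * (x - assumed) for x, f in zip(marks, students))
shortcut_mean = assumed + sum_fd / total_students

print(total_students, sum_fx, sum_fd)
print(round(direct_mean, 2), round(shortcut_mean, 2))
```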
Mean of continuous grouped data:
Direct method
$\bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{N}$, where xi is the midpoint of the class interval
Solution:
Yield (in kg)    No. of plots (fi)    Midpoint xi    fi xi    di = (xi – A)/c    fi di
64 – 84                  3                 74          222          -1             -3
84 – 104                 5                 94          470           0              0
104 – 124                7                114          798           1              7
124 – 144               20                134         2680           2             40
Total                   35                            4170                         44
Direct Method
$\bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{N}$
$\bar{x} = \frac{4170}{35} = 119.143$
Short cut method
$\bar{x} = A + \frac{\sum_{i=1}^{n} f_i d_i}{N} \times c$
Here A = 94 and c = 20 (the class width)
$\bar{x} = 94 + \frac{44}{35} \times 20 = 94 + 25.143 = 119.143$
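The same calculation for continuous classes can be sketched in Python. The class limits and frequencies are those of the yield table; the assumed mean A = 94 and class width c = 20 are read off the table's deviation column, and the variable names are illustrative.

```python
class_limits = [(64, 84), (84, 104), (104, 124), (124, 144)]
plots = [3, 5, 7, 20]

# The midpoint of each class stands in for every value in that class
midpoints = [(lower + upper) / 2 for lower, upper in class_limits]
total_plots = sum(plots)

# Direct method
direct_mean = sum(f * x for x, f in zip(midpoints, plots)) / total_plots

# Short cut (step-deviation) method: d_i = (x_i - A) / c
assumed, width = 94, 20
step_deviations = [(x - assumed) / width for x in midpoints]
sum_fd = sum(f * d for d, f in zip(step_deviations, plots))
shortcut_mean = assumed + sum_fd / total_plots * width

print(midpoints, sum_fd)
print(round(direct_mean, 3), round(shortcut_mean, 3))
```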
WEIGHTED ARITHMETIC MEAN
In calculating the simple mean, all the values or sizes of items in
the distribution have equal importance. In practical life this may not be
so. When some items are more important than others, a simple average
is not representative of the distribution, and proper weightage has to
be given to the various items.
For example, a student may use a weighted mean to calculate the
percentage grade in a course. The student would multiply the
weight of each assessment item in the course (e.g. assignments, exams,
projects, etc.) by the respective grade obtained in that category.
The weighted mean is the average in which the component items are
multiplied by certain values known as "weights" and the aggregate of the
products is divided by the sum of the weights.
Let x1, x2, ..., xn be a set of n values having weights w1, w2, ..., wn
respectively;
then the weighted mean is
$\bar{x}_w = \frac{w_1 x_1 + w_2 x_2 + \dots + w_n x_n}{w_1 + w_2 + \dots + w_n} = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$
Example:
A student obtained the marks 40, 50, 60, 80 and 45 in maths,
statistics, physics, chemistry and biology respectively. Assuming
weights 5, 2, 4, 3 and 1 respectively for the above-mentioned subjects,
find the weighted arithmetic mean per subject.
Solution
Components    Marks scored (xi)    Weightage (wi)    wi xi
Maths                40                  5            200
Statistics           50                  2            100
Physics              60                  4            240
Chemistry            80                  3            240
Biology              45                  1             45
Total                                   15            825
Weighted average:
$\bar{x}_w = \frac{\sum w_i x_i}{\sum w_i} = \frac{825}{15} = 55$
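The weighted mean for the marks-and-weights table can be verified with a short Python sketch. The function name weighted_mean is hypothetical; the marks and weights are those given in the example.

```python
def weighted_mean(values, weights):
    """Sum of w_i * x_i divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


# Maths, statistics, physics, chemistry, biology
marks = [40, 50, 60, 80, 45]
weights = [5, 2, 4, 3, 1]

result = weighted_mean(marks, weights)
print(result)
```

With equal weights the function reduces to the simple arithmetic mean, which is one way to see that the weighted mean generalises it.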
Merits of AM
1. It can be calculated easily and is also easy to understand.
2. The effect of sampling fluctuations is minimized.
3. It is capable of further statistical treatment.
4. It is rigidly defined and hence can be used for
comparison.
Demerits of AM
1. It cannot be located graphically.
2. It is not applicable to qualitative data.
3. AM cannot be calculated if the class intervals have open
ends.
4. It is highly influenced by extreme observations.
2.3.2 GEOMETRIC MEAN (GM)
A geometric mean is a mean or average which shows the
central tendency of a set of numbers by using the product of their
values.
The geometric mean of two numbers, say x and y, is the
square root of their product x × y. For three numbers, it is the
cube root of their product, i.e. $(xyz)^{1/3}$.
The geometric mean of a series containing n observations is
the nth root of the product of the values. If x1, x2, ..., xn are the
observations, then
$GM = \sqrt[n]{x_1 x_2 \cdots x_n} = \text{Antilog}\left(\frac{\sum_{i=1}^{n} \log x_i}{n}\right)$
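Example:
Calculate the geometric mean of the following figures for the annual growth
in the price of onions per 100 kg: 180, 250, 490, 1400 and 1050.
Solution:
$GM = \text{Antilog}\left(\frac{\sum \log x_i}{n}\right) = \text{Antilog}\left(\frac{13.5107}{5}\right)$
= Antilog 2.7021 = 503.6
The geometric mean of the onion price figures is 503.6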
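The logarithm route to the geometric mean can be sketched in Python. The function name geometric_mean is illustrative; the five figures are those of the onion example. Full floating-point logarithms give about 503.67, slightly different from the 503.6 obtained with four-figure log tables.

```python
import math


def geometric_mean(values):
    """Antilog of the mean of the base-10 logarithms (values must be positive)."""
    log_sum = sum(math.log10(x) for x in values)
    return 10 ** (log_sum / len(values))


prices = [180, 250, 490, 1400, 1050]
log_total = sum(math.log10(x) for x in prices)
gm = geometric_mean(prices)

print(round(log_total, 4), round(gm, 1))
```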
Example:
Find the geometric mean for the following distribution of students'
marks:
Marks               0 – 30    30 – 50    50 – 80    80 – 100
No. of students       20        30         40          10
Solution:
Marks       No. of students (f)    Midpoint (x)    f log x
0 – 30             20                  15          20 (log 15) = 20 (1.1761) = 23.5218
30 – 50            30                  40          30 (log 40) = 30 (1.6021) = 48.0618
50 – 80            40                  65          40 (log 65) = 40 (1.8129) = 72.5165
80 – 100           10                  90          10 (log 90) = 10 (1.9542) = 19.5424
Total             100                              163.6425
$GM = \text{Antilog}\left(\frac{\sum f \log x}{N}\right) = \text{Antilog}\left(\frac{163.6425}{100}\right)$
= Antilog 1.6364 = 43.29
The geometric mean of the marks is 43.29
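For a grouped distribution, each class midpoint is weighted by its frequency before the logarithms are averaged. A minimal Python sketch using the midpoints and frequencies of the marks table follows; the variable names are illustrative.

```python
import math

midpoints = [15, 40, 65, 90]        # midpoints of 0-30, 30-50, 50-80, 80-100
frequencies = [20, 30, 40, 10]

total = sum(frequencies)                                              # N
sum_f_log_x = sum(f * math.log10(x) for x, f in zip(midpoints, frequencies))

# GM = Antilog( sum(f * log x) / N )
grouped_gm = 10 ** (sum_f_log_x / total)

print(round(sum_f_log_x, 2), round(grouped_gm, 2))
```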
Merits of Geometric mean:
1. It is strictly defined.
2. It is based on all items.
3. It is very suitable for averaging ratios, rates and percentages.
4. It is capable of further mathematical treatment.
5. Unlike AM, it is not affected much by the presence of
extreme values.
Demerits of geometric mean:
1. It cannot be used when the values are negative or if any of the
observations is zero.
2. It is difficult to calculate, particularly when the items are very
large or when there is a frequency distribution.
3. It brings out the ratio of change and not the
absolute difference of change, as is the case with the arithmetic mean.
4. The GM may not be an actual value of the series.
2.3.3 HARMONIC MEAN
The harmonic mean of a set of observations is defined as the
reciprocal of the arithmetic average of the reciprocals of the given
values.
The harmonic mean is used in averaging ratios. The most
common examples of ratios are speed and time, cost and unit
of material, work and time, etc.
H.M. for ungrouped data
If x1, x2, ..., xn are n observations, the harmonic mean (H.M.) is
$H.M. = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}$
Calculate the harmonic mean of the numbers 13.5, 14.5, 14.8, 15.2
and 16.1
Solution:
The harmonic mean is calculated as below:
x 1/x
13.2 0.0758
14.2 0.0704
14.8 0.0676
15.2 0.0658
16.1 0.0621
Total 0.3417
= 5 = 14.63
0.3417
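The reciprocal-of-mean-reciprocals definition translates directly into Python. The sketch below uses the five values tabulated in the solution (13.2, 14.2, 14.8, 15.2, 16.1); the function name harmonic_mean is illustrative.

```python
def harmonic_mean(values):
    """Number of values divided by the sum of their reciprocals."""
    return len(values) / sum(1 / x for x in values)


values = [13.2, 14.2, 14.8, 15.2, 16.1]
reciprocal_sum = sum(1 / x for x in values)
hm = harmonic_mean(values)

print(round(reciprocal_sum, 4), round(hm, 2))
```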
H.M. for discrete grouped data:
For a frequency distribution,
$H.M. = \frac{N}{\sum_{i=1}^{n} \frac{f_i}{x_i}}$, where $N = \sum f_i$
Example:
For the following frequency distribution of the ages of first-year students
of a particular college, calculate the harmonic mean.
Age (years)         17   18   19   20   21
No. of students      2    5   13    7    3
Solution:
Age (years) x    Number of students f    f / x
17                       2               0.1176
18                       5               0.2778
19                      13               0.6842
20                       7               0.3500
21                       3               0.1429
Total                   30               1.5725
$H.M. = \frac{30}{1.5725} = 19.08$
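The table can be completed numerically with a short Python sketch. The ages and frequencies are those of the example; dividing N = 30 by the total of the f/x column gives a harmonic mean of roughly 19.08 years. The variable names are illustrative.

```python
ages = [17, 18, 19, 20, 21]
students = [2, 5, 13, 7, 3]

total_students = sum(students)                              # N
sum_f_over_x = sum(f / x for x, f in zip(ages, students))   # total of the f/x column

# H.M. for a frequency distribution: N divided by the sum of f/x
grouped_hm = total_students / sum_f_over_x

print(total_students, round(sum_f_over_x, 4), round(grouped_hm, 2))
```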
2.4 MEDIAN
The number of students in your classroom, the money your parents
earn and the temperature in your city are all important numbers. But how
can you get information about the number of students in your school
or the amount earned by the citizens of your entire city?
The median is that value of the variable which divides the
group into two equal parts, one part comprising all values greater than and
the other all values less than the median.
Ungrouped data
Arrange the given values in ascending or descending order.
If the number of values is odd, the median is the middle value.
For example, consider the values 12, 15, 21, 27, 35. The number of values
is odd, so the median is the middle value, 21.
Median = $\left(\frac{n+1}{2}\right)^{th}$ term, if n is odd
If the number of values is even, the median is the mean of the middle two
values.
For example, consider 12, 15, 21, 27, 35, 40. The number of values is
even, so the median is the mean of the two middle values, (21 + 27)/2 = 24.
Median = mean of the $\left(\frac{n}{2}\right)^{th}$ and $\left(\frac{n}{2}+1\right)^{th}$ terms, if n is even
Example:
The salaries of 8 employees who work for a small company
are listed below. What is the median salary?
40,000; 29,000; 35,500; 31,000; 43,000; 30,000; 27,000; 32,000
Solution:
Arrange the data in ascending order
27,000; 29,000; 30,000; 31,000; 32,000; 35,500; 40,000; 43,000
Since there is an even number of items in the data set, we compute
the median by taking the mean of the two middlemost numbers.
Median = mean of the $\left(\frac{n}{2}\right)^{th}$ and $\left(\frac{n}{2}+1\right)^{th}$ terms = $\frac{4^{th}\ \text{item} + 5^{th}\ \text{item}}{2}$
$= \frac{31,000 + 32,000}{2} = \frac{63,000}{2} = 31,500$
Example: 13
Find the median of the following set of points in a game: 15, 14, 10,
8, 12, 8, 16
Solution:
First arrange the values in ascending order: 8, 8, 10, 12, 14, 15, 16
Since n = 7 is odd, the median is the ((7 + 1) / 2)th = 4th value, so Median = 12.
Solution:
No of Students (x)    No of Branches (f)    Cumulative Frequency (cf)
1                     2                     2
2                     11                    13
3                     15                    28
4                     20                    48
5                     25                    73
6                     18                    91
7                     10                    101
Total                 101
$$\text{Median} = \text{size of } \left(\frac{N+1}{2}\right)^{th} \text{ item} = \text{size of } \left(\frac{101+1}{2}\right)^{th} \text{ item} = \text{size of the 51st item}$$
Median = 5, because the 51st item falls within the cumulative frequency 73, which corresponds to x = 5.
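Median for continuous grouped data
When the data are given in the form of a frequency table with
class intervals, the following formula is used for calculating the
median:
$$\text{Median} = l + \frac{N/2 - m}{f} \times c$$
where l = lower limit of the median class, m = cumulative frequency of the class preceding the median class, f = frequency of the median class and c = width of the class interval.
Class interval   0-4   5-9   10-14   15-19   20-24   25-29   30-34   35-39
Frequency        5     8     10      12      7       6       3       2
Solution:
Class interval   Frequency f   True class interval   Cumulative frequency cf
0-4              5             -0.5 - 4.5            5
5-9              8             4.5 - 9.5             13
10-14            10            9.5 - 14.5            23
15-19            12            14.5 - 19.5           35
20-24            7             19.5 - 24.5           42
25-29            6             24.5 - 29.5           48
30-34            3             29.5 - 34.5           51
35-39            2             34.5 - 39.5           53
N/2 = 53/2 = 26.5, so the median class is 14.5 - 19.5.
l = 14.5
N/2 = 26.5
m = 23
f = 12
c = 5
$$\text{Median} = 14.5 + \frac{26.5 - 23}{12} \times 5 = 14.5 + 1.46 = 15.96$$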
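The interpolation step for a continuous series can be sketched in Python as follows. This is a minimal illustration, not part of the original text: the function name `grouped_median` and its arguments are assumptions, and the true lower class limits (-0.5, 4.5, ...) are derived from the class intervals of the example.

```python
def grouped_median(lower_limits, freqs, width):
    """Median of a continuous frequency distribution by interpolation."""
    n = sum(freqs)
    half = n / 2
    cum = 0  # cumulative frequency before the current class (m)
    for l, f in zip(lower_limits, freqs):
        if f > 0 and cum + f >= half:      # first class whose cf reaches N/2
            return l + (half - cum) / f * width
        cum += f
    raise ValueError("empty distribution")

# True lower class limits and frequencies from the example above
lower_limits = [-0.5, 4.5, 9.5, 14.5, 19.5, 24.5, 29.5, 34.5]
freqs = [5, 8, 10, 12, 7, 6, 3, 2]

median = grouped_median(lower_limits, freqs, width=5)
print(round(median, 2))   # 15.96
```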
Merits of Median:
1. Median is not influenced by extreme values because it is a
positional average.
2. Median can be calculated in case of distributions with open
end intervals.
3. Median can be located even if the data are incomplete.
4. Median can be located even for qualitative factors such as
ability, honesty etc.
Demerits of Median:
1. A slight change in the series may bring a drastic change in the
median value.
2. In case of an even number of items or a continuous series, the median
is an estimated value other than any value in the series.
3. It is not suitable for further mathematical treatment except its
use in mean deviation.
4. It does not take into account all the observations.
2.5 MODE
The mode is the most frequently occurring value or score. The
mode is useful when there are a lot of repeated values. There can be
no mode, one mode, or multiple modes.
Its importance is very great in marketing studies where a manager is
interested in knowing about the size which has the highest
concentration of items. For example, in placing an order for shoes
or ready-made garments the modal size helps, because this size and
the sizes around it are in common demand.
Ungrouped Data:
For ungrouped values or a series of individual observation mode
is often found by mere inspection
Example:
Find the mode for the following list of values:
13,18,13,14,13,16,14,21,13
Solution:
The mode is the number that is repeated more often than any other
Therefore the Mode = 13
In some cases the mode may be absent while in some cases there
may be more than one mode.
Example:
Ms. Rossy asked the students in her class how many siblings each of them has.
Find the mode of the data: 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 4
Solution:
The modes are 1 and 2 siblings
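Grouped Data
For a discrete distribution, the value of X corresponding to the highest
frequency is the mode.
Continuous distribution:
For a continuous distribution, the mode is found by interpolation within the modal class:
$$\text{Mode} = L + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times h$$
where L = lower limit of the modal class, f1 = frequency of the modal class, f0 = frequency of the preceding class, f2 = frequency of the succeeding class and h = class width.
f 5 14 40 91 150 87 60 38 15
Solution:
The highest frequency is 150 and the corresponding class interval is 200 - 250,
which is the modal class.
Here L = 200, f1 = 150, f0 = 91, f2 = 87, h = 50
$$\text{Mode} = 200 + \frac{150 - 91}{2 \times 150 - 91 - 87} \times 50 = 200 + \frac{2950}{122} = 200 + 24.18 = 224.18$$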
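As a hedged sketch, the modal-class interpolation can be written as a small Python function. The name `grouped_mode` is illustrative; the arguments follow the symbols L, f1, f0, f2 and h used in the solutions, and the two calls reuse the values quoted in the worked examples of this section.

```python
def grouped_mode(L, f1, f0, f2, h):
    """Mode of a continuous series: L + (f1 - f0) / (2*f1 - f0 - f2) * h."""
    return L + (f1 - f0) / (2 * f1 - f0 - f2) * h

# Values quoted in the solution above (modal class 200 - 250)
mode_a = grouped_mode(L=200, f1=150, f0=91, f2=87, h=50)

# Values quoted in the solution of Example 19 (modal class 10 - 12)
mode_b = grouped_mode(L=10, f1=9, f0=4, f2=2, h=3)

print(round(mode_a, 2), mode_b)   # 224.18 11.25
```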
Example: 19
Find the modal class and the actual mode of the data set below
Number      1-3   4-6   7-9   10-12   13-15   16-18   19-21   22-24   25-27   28-30
Frequency   7     6     4     9       2       8       1       2       3       2
Solution:
Modal class = 10 - 12
Here L = 10, f1 = 9, f0 = 4, f2 = 2, h = 3
$$\text{Mode} = 10 + \frac{9 - 4}{2 \times 9 - 4 - 2} \times 3 = 10 + \frac{5}{12} \times 3 = 10 + 1.25 = 11.25$$
Mode = 11.25
Merits of mode:
1. It is easy to calculate and in some cases it can be located by mere
inspection.
2. Mode is not at all affected by extreme values.
3. It can be calculated for open-end classes.
4. It is usually an actual value of an important part of the series.
5. In some circumstances it is the best representative of the data.
Demerits of mode:
1. It is not based on all observations.
2. It is not capable of further mathematical treatment.
3. Mode is ill defined; in some cases it is not possible to find a
mode.
4. As compared with the mean, the mode is affected to a great extent by
sampling fluctuations.
5. It is unsuitable in cases where the relative importance of items has to be
considered.
2.6 PARTITION MEASURES
2.6.1 QUARTILES
The quartiles divide the distribution into four equal parts. There are three
quartiles, denoted by Q1, Q2 and Q3.
That is, 25% of the data lie below Q1, 50% of the data below Q2 and
75% below Q3. Here Q2 is the median. Quartiles are
obtained in almost the same way as the median.
Ungrouped Data:
If the data set consists of n items arranged in ascending order, then
$$Q_1 = \left(\frac{n+1}{4}\right)^{th} \text{ item}, \qquad Q_3 = \left(\frac{3(n+1)}{4}\right)^{th} \text{ item}$$
Example: 20
Compute quartiles for the data 25, 18, 30, 8, 15, 5, 10, 35, 40, 45.
Solution:
Arranging the data in ascending order: 5, 8, 10, 15, 18, 25, 30, 35, 40, 45 (n = 10)
Q1 = ((n + 1) / 4)th item = ((10 + 1) / 4)th item = (2.75)th item
= 2nd item + (3/4) (3rd item - 2nd item)
= 8 + (3/4) (10 - 8) = 8 + 1.5
Q1 = 9.5
Q3 = 3 ((n + 1) / 4)th item = 3 x (2.75)th item
= (8.25)th item
= 8th item + (1/4) (9th item - 8th item)
= 35 + (1/4) (40 - 35) = 35 + 1.25
Q3 = 36.25
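The positional rule used in this example can be sketched in Python. The helper `value_at_position` is a hypothetical name introduced for illustration; it interpolates between neighbouring items at a fractional, 1-based position. The standard-library function `statistics.quantiles` uses the same (n + 1) rule with its default method, so it serves as a cross-check.

```python
from statistics import quantiles

def value_at_position(sorted_data, pos):
    """Value at a 1-based fractional position, by linear interpolation."""
    k = int(pos)            # whole part: index of the lower item
    frac = pos - k          # fractional part of the position
    if k >= len(sorted_data):
        return sorted_data[-1]
    return sorted_data[k - 1] + frac * (sorted_data[k] - sorted_data[k - 1])

data = sorted([25, 18, 30, 8, 15, 5, 10, 35, 40, 45])
n = len(data)

q1 = value_at_position(data, (n + 1) / 4)        # 2.75th item
q3 = value_at_position(data, 3 * (n + 1) / 4)    # 8.25th item
print(q1, q3)   # 9.5 36.25

# Cross-check: the default 'exclusive' method follows the same (n + 1) rule
print(quantiles(data, n=4))
```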
Continuous series:
In the case of a continuous series, find the cumulative frequencies and
then use the interpolation formula.
Find the cumulative frequencies.
Find N / 4 and 3N / 4.
The Q1 class is the class interval corresponding to the value of the
cumulative frequency just greater than N / 4.
The Q3 class is the class interval corresponding to the value of the
cumulative frequency just greater than 3N / 4.
$$Q_1 = l + \frac{N/4 - m}{f} \times c, \qquad Q_3 = l + \frac{3N/4 - m}{f} \times c$$
where l, m, f and c are the lower limit, the preceding cumulative frequency, the frequency and the width of the respective quartile class.
Solution:
Class      Frequency f   Cumulative frequency cf
10 - 20    4             4
20 - 30    3             7
30 - 40    2             9
40 - 50    1             10
50 - 60    5             15
Here N = 15, so N / 4 = 3.75 and 3N / 4 = 11.25.
$$Q_1 = 10 + \frac{3.75 - 0}{4} \times 10 = 10 + 9.38 = 19.38$$
$$Q_3 = 50 + \frac{11.25 - 10}{5} \times 10 = 50 + 2.5 = 52.5$$
2.6.2 DECILES
These are the values which divide the total number of observations
into 10 equal parts. There are nine deciles: D1, D2, D3, D4, D5, D6, D7, D8 and D9.
Ungrouped Data:
Example:
Compute D5 for the data: 5, 24, 36, 12, 20 and 8.
Solution:
Arranging the given data in ascending order: 5, 8, 12, 20, 24, 36 (n = 6)
D5 = (5(n + 1) / 10)th observation = (5(6 + 1) / 10)th observation = (3.5)th observation
= 3rd observation + 0.5 (4th observation - 3rd observation) = 12 + 0.5 (20 - 12) = 16
Example:
= 30 + 0.5 = 30.5
D7 = (7N / 10)th item
= (7 x 62 / 10)th item
= (43.4)th item
This lies in the interval 40 - 50
$$D_7 = l + \frac{7N/10 - m}{f} \times c = 40 + \frac{43.4 - 40}{10} \times 10 = 40 + \frac{3.4}{10} \times 10 = 40 + 3.4 = 43.4$$
2.6.3 PERCENTILE
The percentile values divide the distribution into 100 parts, each
containing 1 percent of the cases. The percentile (Pk) is that value of the
variable up to which lie exactly k% of the total number of observations.
Relationship
P25 = Q1
P50 = Median = Q2
P75 = 3rd quartile = Q3
Ungrouped Data:
Example: 24
The monthly incomes (in ₹1000) of 8 persons working in a factory are
17, 21, 14, 36, 10, 25, 15, 29. Find the P30 income value.
Solution:
Arrange the data in increasing order: 10, 14, 15, 17, 21, 25, 29, 36
n = 8
P30 = (30(n + 1) / 100)th item = (30 x 9 / 100)th item = (2.7)th item
= 2nd item + 0.7 (3rd item - 2nd item)
= 14 + 0.7 (15 - 14)
= 14 + 0.7
P30 = 14.7
Grouped Data:
Example: 25
Find P53 for the following frequency distribution.
Class interval   0-5   5-10   10-15   15-20   20-25   25-30   30-35   35-40
Frequency        5     8      12      16      20      10      4       3
Solution:
Class interval   Frequency   Cumulative frequency
0-5              5           5
5-10             8           13
10-15            12          25
15-20            16          41
20-25            20          61
25-30            10          71
30-35            4           75
35-40            3           78
Total            78
53N / 100 = 53 x 78 / 100 = 41.34, which lies in the class 20 - 25, so l = 20, m = 41, f = 20 and c = 5.
$$P_{53} = l + \frac{53N/100 - m}{f} \times c = 20 + \frac{41.34 - 41}{20} \times 5 = 20 + 0.085 = 20.085$$
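Quartiles, deciles and percentiles of a continuous series all use the same interpolation, so one routine covers them. The following is a minimal sketch under that assumption; `grouped_percentile` is a hypothetical helper name, and the data are those of Example 25.

```python
def grouped_percentile(k, lower_limits, freqs, width):
    """k-th percentile of a continuous series: l + (kN/100 - m) / f * c."""
    n = sum(freqs)
    target = k * n / 100
    cum = 0  # cumulative frequency of the classes already passed (m)
    for l, f in zip(lower_limits, freqs):
        if f > 0 and cum + f >= target:
            return l + (target - cum) / f * width
        cum += f
    raise ValueError("empty distribution")

lower_limits = [0, 5, 10, 15, 20, 25, 30, 35]
freqs = [5, 8, 12, 16, 20, 10, 4, 3]

p53 = grouped_percentile(53, lower_limits, freqs, width=5)
print(round(p53, 3))   # 20.085
```

Calling the same function with k = 25, 50 or 75 gives Q1, the median and Q3, and k = 10, 20, ..., 90 gives the deciles.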
1st series   2nd series   3rd series
10           2            10
10           8            12
10           20           8
∑X = 30      30           30
In all three series, the value of arithmetic mean is 10. On the basis
of this average, we can say that the series are alike. If we carefully
examine the composition of three series, we find the following
differences:
(i) In case of the 1st series, the three items are equal; but in the 2nd and 3rd
series, the items are unequal and do not follow any specific
order.
(ii) The magnitude of deviation, item-wise, is different for the 1st,
2nd and 3rd series. But these deviations cannot be
ascertained if only the value of the simple mean is taken into
consideration.
(iii) In these three series, the value of the
arithmetic mean is 10 in each case, but the value of the median may differ from
series to series. This can be understood as follows:
1st series   2nd series   3rd series
10           2            8
10           8            10
10           20           12
∑X = 30      30           30
Example:
Calculate range and its coefficient from the following distribution.
x   10 - 15   15 - 20   20 - 25   25 - 30
Solution: L = 30, S = 10
Range = L - S = 30 - 10 = 20
Coefficient of Range = (L - S) / (L + S) = (30 - 10) / (30 + 10)
= 20 / 40 = 0.5
Merits of Range
It is the simplest of the measures of dispersion
Easy to calculate
Easy to understand
Independent of change of origin
Demerits of Range
It is based on the two extreme observations only. Hence, it is affected by
fluctuations in them
The range is not a reliable measure of dispersion
Dependent on change of scale
Solution:
$$A = \frac{28 + 72 + 90 + 140 + 210}{5} = \frac{540}{5} = 108$$
X      |D| = |X - A|
28     80
72     36
90     18
140    32
210    102
Ʃ|D| = 268
$$\text{Mean Deviation} = MD = \frac{\sum |X - A|}{N} = \frac{\sum |D|}{N} = \frac{268}{5} = 53.6$$
$$\text{Coefficient of Mean Deviation} = \frac{MD}{A} = \frac{53.6}{108} = 0.4963$$
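The same steps can be sketched in a few lines of Python. The variable names are illustrative; as in the solution, the deviations are taken from the arithmetic mean A.

```python
values = [28, 72, 90, 140, 210]

# Average from which deviations are taken (here the arithmetic mean A)
mean = sum(values) / len(values)

# Absolute deviations |D| = |X - A|, ignoring signs
abs_devs = [abs(x - mean) for x in values]

mean_deviation = sum(abs_devs) / len(values)
coefficient_md = mean_deviation / mean

print(mean, mean_deviation, round(coefficient_md, 4))   # 108.0 53.6 0.4963
```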
Discrete Data Series
For a discrete series, the Mean Deviation can be calculated using
$$MD = \frac{\sum f\,|x - Me|}{N} = \frac{\sum f\,|D|}{N}$$
Where, N = Number of observations.
f = Different values of frequency f.
x = Different values of items.
Me = Median.
Coefficient of Mean Deviation
The Coefficient of Mean Deviation can be calculated using the
following formula.
$$\text{Coefficient of MD} = \frac{MD}{Me}$$
Example: Calculate the mean deviation and its coefficient for the following discrete
data
Frequency 6 15 3 3 9
Solution:
x      f     fx      |D| = |x - Me|    f|D|
42     6     252     93                558
150    3     450     15                45
210    9     1890    75                675
$$\text{Coefficient of MD} = \frac{MD}{Me} = \frac{46.75}{135} = 0.3463$$
Continuous Data Series
The method of calculating the mean deviation in a continuous series
is the same as for a discrete series. In a continuous series, find the midpoints of the
various classes and take the deviations of these points from the average
selected.
$$MD = \frac{\sum f\,|x - Me|}{N} = \frac{\sum f\,|D|}{N}$$
Where N = Number of observations.
f = Different values of frequency f.
x = Mid-points of the classes.
Me = Median.
Age in years    0-10   10-20   20-30   30-40   40-50   50-60   60-70   70-80
No of persons   40     50      64      80      82      70      20      16
Solution:
In this solution the average selected is the arithmetic mean, x̄ = Ʃfx / N, so the deviations are taken from the mean.
Items    Mid point xi   Frequency fi   fixi    |xi - x̄|   fi |xi - x̄|
0-10     5              40             200     31.47      1258.80
10-20    15             50             750     21.47      1073.50
20-30    25             64             1600    11.47      734.08
30-40    35             80             2800    1.47       117.60
40-50    45             82             3690    8.53       699.46
50-60    55             70             3850    18.53      1297.10
60-70    65             20             1300    28.53      570.60
70-80    75             16             1200    38.53      616.48
Total                   N = 422        15390              6367.62
$$\bar{x} = \frac{\sum f_i x_i}{N} = \frac{15390}{422} = 36.47$$
$$\text{Mean Deviation} = \frac{\sum f\,|x - \bar{x}|}{N} = \frac{\sum f\,|D|}{N} = \frac{6367.62}{422} = 15.0891$$
$$\text{Coefficient of MD} = \frac{MD}{\bar{x}} = \frac{15.0891}{36.47} = 0.4137$$
Merits of Mean Deviation:
It is simple to understand and easy to compute.
It is based on each and every item of the data.
MD is less affected by the values of extreme items than the
Standard deviation.
Demerits of Mean Deviation:
The greatest drawback of this method is that algebraic signs are
ignored while taking the deviations of the items.
It is not capable of further algebraic treatments.
It is much less popular as compared to standard deviation.
2.11 STANDARD DEVIATION
The concept of Standard Deviation was introduced by Karl
Pearson in 1893. It is by far the most important and widely used measure
of dispersion. Its significance lies in the fact that it is free from those
defects which afflicted earlier methods and satisfies most of the
properties of a good measure of dispersion. Standard Deviation is also
known as root-mean-square deviation, as it is the square root of the mean of
the squared deviations from the arithmetic mean.
The standard deviation is defined as the positive square root of
the mean of the squared deviations taken from the arithmetic mean of the
data.
Ungrouped data
If x1, x2, x3, ..., xn are the ungrouped data, there are two methods of
calculating the standard deviation in an individual series:
Actual mean method
Assumed mean method
Actual Mean Method:
$$\text{Standard deviation } \sigma = \sqrt{\frac{\sum (X - \bar{X})^2}{n}}$$
Example:
Calculate the standard deviation from the following data: 28, 44,
18, 30, 40, 34, 24, 22.
Solution:
Deviations from actual mean
Values (X)    X - X̅    (X - X̅)2
28            -2       4
44            14       196
18            -12      144
30            0        0
40            10       100
34            4        16
24            -6       36
22            -8       64
Total 240              560
$$\bar{X} = \frac{240}{8} = 30$$
$$\sigma = \sqrt{\frac{\sum (X - \bar{X})^2}{n}} = \sqrt{\frac{560}{8}} = \sqrt{70} = 8.3666$$
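A short Python sketch of the actual mean method follows. The variable names are illustrative; `statistics.pstdev` is the standard-library population standard deviation (divisor n), used here only as a cross-check.

```python
from math import sqrt
from statistics import pstdev

data = [28, 44, 18, 30, 40, 34, 24, 22]
n = len(data)

mean = sum(data) / n                                  # actual mean
squared_devs = [(x - mean) ** 2 for x in data]        # (X - mean)^2
sigma = sqrt(sum(squared_devs) / n)                   # root of the mean square

print(mean, sum(squared_devs), round(sigma, 4))       # 30.0 560.0 8.3666
```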
Assumed Mean Method
This method is used when the arithmetic mean is a fractional value.
Taking deviations from a fractional value would be a very difficult and
tedious task. To save time and labour a short-cut method is used:
deviations are taken from an assumed mean A, so that d = X - A.
$$\text{Standard deviation } \sigma = \sqrt{\frac{\sum d^2}{n} - \left(\frac{\sum d}{n}\right)^2}$$
Example:
The marks obtained by college students in statistics are given below. Using the
following data calculate the standard deviation.
Student No:   1    2    3    4    5    6    7    8    9    10
Marks         53   58   75   67   32   70   35   68   88   69
Solution:
Student No   Marks (X)   d = X - A (A = 67)   d2
1            53          -14                  196
2            58          -9                   81
3            75          8                    64
4            67          0                    0
5            32          -35                  1225
6            70          3                    9
7            35          -32                  1024
8            68          1                    1
9            88          21                   441
10           69          2                    4
n = 10                   Ʃd = -55             Ʃd2 = 3045
$$\sigma = \sqrt{\frac{\sum d^2}{n} - \left(\frac{\sum d}{n}\right)^2} = \sqrt{\frac{3045}{10} - \left(\frac{-55}{10}\right)^2} = \sqrt{304.5 - 30.25} = \sqrt{274.25} = 16.5605$$
2.11.1 CALCULATION OF STANDARD DEVIATION
Discrete series: There are three methods for calculating standard
deviation in discrete series. They are
a) Actual mean method
b) Assumed mean method
c) Step deviation method
Actual mean method
Calculate the mean of the series. Find the deviations of the various
items from the mean, square the deviations, multiply them by the
respective frequencies and total the products. The formula for the actual
mean method is
$$\sigma = \sqrt{\frac{\sum f d^2}{\sum f}}, \quad \text{where } d = X - \bar{X}$$
If the actual mean is a fraction, the calculation takes a lot of time
and labour; as such this method is rarely used in practice.
Assumed mean method
Here deviations are taken not from the actual mean but from an
assumed mean. This method is also used if the given variable values are
not in equal intervals.
$$\sigma = \sqrt{\frac{\sum f d^2}{N} - \left(\frac{\sum f d}{N}\right)^2}, \quad \text{where } d = X - A,\ N = \sum f$$
Example:
Calculate standard deviation from the following data:
X   20   22   25   31   35   40   42   45
f   5    12   15   20   25   14   10   6
Solution:
Deviation from assumed mean (A = 31)
x     f     d = X - A   d2     fd      fd2
20    5     -11         121    -55     605
22    12    -9          81     -108    972
25    15    -6          36     -90     540
31    20    0           0      0       0
35    25    4           16     100     400
40    14    9           81     126     1134
42    10    11          121    110     1210
45    6     14          196    84      1176
N = 107                        Ʃfd = 167   Ʃfd2 = 6037
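The assumed mean method can be verified with a short Python sketch that recomputes the column totals directly from the X and f rows of this example (with A = 31) and compares the short-cut result with the direct calculation from the actual mean. All variable names are illustrative.

```python
from math import sqrt

xs = [20, 22, 25, 31, 35, 40, 42, 45]
fs = [5, 12, 15, 20, 25, 14, 10, 6]
A = 31                                   # assumed mean

N = sum(fs)
sum_fd = sum(f * (x - A) for x, f in zip(xs, fs))
sum_fd2 = sum(f * (x - A) ** 2 for x, f in zip(xs, fs))

# Short-cut formula: sqrt(sum(f d^2)/N - (sum(f d)/N)^2)
sigma_assumed = sqrt(sum_fd2 / N - (sum_fd / N) ** 2)

# Direct calculation from the actual mean, for comparison
mean = sum(f * x for x, f in zip(xs, fs)) / N
sigma_actual = sqrt(sum(f * (x - mean) ** 2 for x, f in zip(xs, fs)) / N)

print(N, sum_fd, sum_fd2, round(sigma_assumed, 2))
```

Both routes give the same standard deviation, which illustrates that the choice of assumed mean does not affect the result.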
Example:
The frequency distribution of marks in mathematics is given in the table. Calculate the standard deviation.
Marks            30   40   50   60   70   80   90
No of students   8    12   20   10   7    3    2
Solution:
Marks (X)   f     d = (X - 50) / 10   fd     fd2
30          8     -2                  -16    32
40          12    -1                  -12    12
50          20    0                   0      0
60          10    1                   10     10
70          7     2                   14     28
80          3     3                   9      27
90          2     4                   8      32
N = 62                                Ʃfd = 13   Ʃfd2 = 141
$$\text{Standard deviation } \sigma = \sqrt{\frac{\sum f d^2}{N} - \left(\frac{\sum f d}{N}\right)^2} \times C$$
$$= \sqrt{\frac{141}{62} - \left(\frac{13}{62}\right)^2} \times 10 = 1.4934 \times 10 = 14.934$$
Example:
Particulars regarding income of two company are given below:
Company
A B
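d1 = x̅12 - x̅1 = 1613.6363 - 1500 = 113.6363
d2 = x̅12 - x̅2 = 1613.6363 - 1750 = -136.3637
$$\sigma_{12} = \sqrt{\frac{N_1(\sigma_1^2 + d_1^2) + N_2(\sigma_2^2 + d_2^2)}{N_1 + N_2}} = \sqrt{\frac{600\,(100 + 12913.209) + 500\,(81 + 18595.0587)}{600 + 500}}$$
= 124.8488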
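The combined standard deviation can be sketched in Python as below. The inputs are read off the working above and are assumptions to that extent: group sizes 600 and 500, means 1500 and 1750, and variances 100 and 81 (standard deviations 10 and 9). The function name `combined_sd` is illustrative.

```python
from math import sqrt

def combined_sd(n1, mean1, sd1, n2, mean2, sd2):
    """Combined mean and standard deviation of two groups."""
    mean12 = (n1 * mean1 + n2 * mean2) / (n1 + n2)
    d1 = mean12 - mean1
    d2 = mean12 - mean2
    var12 = (n1 * (sd1 ** 2 + d1 ** 2) + n2 * (sd2 ** 2 + d2 ** 2)) / (n1 + n2)
    return mean12, sqrt(var12)

# Figures inferred from the working shown above
mean12, sigma12 = combined_sd(600, 1500, 10, 500, 1750, 9)
print(round(mean12, 4), round(sigma12, 3))   # 1613.6364 124.849
```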
Merits of Standard Deviation:
Among all measures of dispersion Standard Deviation is
considered superior because it possesses almost all the requisite
characteristics of a good measure of dispersion. It has the following
merits:
It is rigidly defined.
It is based on all the observations of the series and hence it is
representative.
It is amenable to further algebraic treatment.
It is least affected by fluctuations of sampling.
Demerits:
It is more affected by extreme items.
It cannot be exactly calculated for a distribution with open-ended
classes.
It is relatively difficult to calculate and understand.
2.12 COEFFICIENT OF VARIATION
The coefficient of variation (CV) is a statistical measure of the dispersion
of data points in a data series around the mean. The coefficient of
variation represents the ratio of the standard deviation to the mean, and it
is a useful statistic for comparing the degree of variation from one data
series to another, even if the means are drastically different from one
another.
Coefficient of Variation = (Standard Deviation / Mean) x 100.
$$CV = \frac{\sigma}{\bar{x}} \times 100$$
The coefficient of variation (CV) is a measure of relative
variability. It is the ratio of the standard deviation to the mean (average).
For example, the expression "The standard deviation is 15% of the mean"
is a CV.
The CV is particularly useful when you want to compare results
from two different surveys or tests that have different measures or values,
for example, when comparing the results from two tests that have
different scoring mechanisms. If sample A has a CV of 12% and sample
B has a CV of 25%, you would say that sample B has more variation,
relative to its mean.
Example:
The price of a car in five years in two cities is given below:
City A       City B
20,00,000    10,00,000
22,00,000    20,00,000
19,00,000    18,00,000
23,00,000    12,00,000
16,00,000    15,00,000
Solution (prices in lakhs):
City A                          City B
x     dx = x - x̅    dx2         y     dy = y - y̅    dy2
20    0             0           10    -5            25
22    2             4           20    5             25
19    -1            1           18    3             9
23    3             9           12    -3            9
16    -4            16          15    0             0
Ʃx = 100            Ʃdx2 = 30   Ʃy = 75             Ʃdy2 = 68
City A: x̅ = Ʃx / n = 100 / 5 = 20
$$\sigma_x = \sqrt{\frac{\sum (x - \bar{x})^2}{n}} = \sqrt{\frac{\sum d_x^2}{n}} = \sqrt{\frac{30}{5}} = 2.45$$
$$C.V._X = \frac{\sigma_x}{\bar{x}} \times 100 = \frac{2.45}{20} \times 100 = 12.25\%$$
City B: y̅ = Ʃy / n = 75 / 5 = 15
$$\sigma_y = \sqrt{\frac{\sum (y - \bar{y})^2}{n}} = \sqrt{\frac{\sum d_y^2}{n}} = \sqrt{\frac{68}{5}} = 3.69$$
$$C.V._Y = \frac{\sigma_y}{\bar{y}} \times 100 = \frac{3.69}{15} \times 100 = 24.6\%$$
City A had more stable prices than City B, because the coefficient of
variation is less in City A.
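The comparison can be reproduced with a small Python sketch. Prices are expressed in lakhs, as in the working table; the helper name `coefficient_of_variation` is illustrative, and `statistics.pstdev` is the population standard deviation (divisor n).

```python
from statistics import mean, pstdev

def coefficient_of_variation(data):
    """CV in percent: population standard deviation divided by the mean."""
    return pstdev(data) / mean(data) * 100

city_a = [20, 22, 19, 23, 16]   # prices in lakhs
city_b = [10, 20, 18, 12, 15]

cv_a = coefficient_of_variation(city_a)
cv_b = coefficient_of_variation(city_b)

print(round(cv_a, 2), round(cv_b, 1))   # 12.25 24.6
print("More stable prices:", "City A" if cv_a < cv_b else "City B")
```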
Marks 30 40 50 60 70 80 90
No of students 16 24 40 20 14 6 4
UNIT III - PROBABILITY
Structure
3.0 Introduction
3.1 Objectives
3.2 Important Terms
3.3 Types of Probability
3.4 Basic Relationships of Probability
3.5 Addition Theorem of Probability
3.6 Multiplication Theorem of Probability
3.7 Conditional Probability
3.7.1 Combined Use of Addition and Multiplication Theorem
3.8 Bayes' Theorem and its Application
3.9 Summary
3.10 Key Words
3.11 Answer to Check Your Progress
3.12 Questions and Exercise
3.13 Further Readings
3.0 INTRODUCTION
In our day-to-day life "probability" or "chance" is a very
commonly used term. Sometimes we say "Probably it may rain
tomorrow", "Probably Mr. X may come for taking his class today",
"Probably you are right". In all these statements, possibility and probability
convey the same meaning. But in statistics probability has a certain special
connotation, unlike in the layman's view.
The theory of probability was developed in the 17th century. It
has its origin in games: tossing coins, throwing a die, drawing a
card from a pack. In 1654 Antoine Gombaud took the initiative and
an interest in this area.
After him many authors in statistics tried to remodel the idea
given by the former. "Probability" has become one of the basic tools
of statistics. Sometimes statistical analysis becomes paralyzed without
the theorems of probability. "Probability of a given event is defined as the
expected frequency of occurrence of the event among events of a like
sort." (Garrett)
Probability theory provides a means of getting an idea of the
likelihood of occurrence of different events resulting from a random
experiment in terms of quantitative measures ranging between zero and
one. The probability is zero for an impossible event and one for an event
which is certain to occur.
3.1 OBJECTIVES
The students will be able to understand
The important terms in probability
Concept of conditional probability, addition theorem and
multiplication theorem.
Baye‟s theorem and its applications
3.2 IMPORTANT TERMS
1. Probability or Chance: Probability or chance is a common term
used in day-to-day life. For example, we generally say, 'it may
rain today'. This statement has a certain uncertainty. Probability is
quantitative measure of the chance of occurrence of a particular
event.
2. Experiment: An experiment is an operation which can produce
well-defined outcomes.
6. Sample Space :Sample Space is the set of all possible outcomes
of an experiment. It is denoted by S.
Examples : When a coin is tossed, S = {H, T} where H =
Head and T = Tail
7. Mutually Exclusive Events: Two or more than two events are
said to be mutually exclusive if the occurrence of one of the
events excludes the occurrence of the other
Example :When a coin is tossed, we get either Head or Tail.
Head and Tail cannot come simultaneously. Hence occurrence of
Head and Tail are mutually exclusive events.
If an event can occur in 'a' ways and fail to occur in 'b' ways, and
these are equally likely to occur, then the probability of the event
occurring, a / (a + b), is denoted by p. Such probabilities are known as unitary or
theoretical or mathematical probability.
p is the probability of the event happening and q is the probability
of its not happening.
$$p = \frac{a}{a+b} \quad \text{and} \quad q = \frac{b}{a+b}$$
$$\text{Hence } p + q = \frac{a+b}{a+b}$$
Therefore p + q = 1
Limitations:
o This definition is confined to the problems of games of
chance only and cannot explain problems other than
games of chance.
o This method cannot be applied when the outcomes of a
random experiment are not equally likely.
o The classical definition is applicable only when the events
are mutually exclusive.
Limitations:
o The experimental conditions may not remain essentially
homogeneous and identical in a large number of
repetitions of the experiment.
o The relative frequency m/n may not attain a unique value
no matter how large n is.
o Probability p(A) so defined can never be obtained in practice.
We can only attempt a close estimate of p(A) by making
the number of trials sufficiently large.
3. Subjective Approach:
The subjective approach is also known as the subjective
theory of probability. The probability of an event is considered as
a measure of one's confidence in the occurrence of that particular
event.
This theory is commonly used in business decision making. The
decision reflects the personality of the decision maker. Persons
may arrive at different probability assignments because of
differences in values, experience etc. The decision under
this theory is taken on the basis of the available data plus the
effects of other factors, many of which may be subjective in
nature.
Example: A student would top the B.Com exam this year.
A subjectivist would assign a weight between zero and one to this
event according to his belief in its possible occurrence.
4. Axiomatic Approach:
The probability calculations are based on axioms. The
axiomatic probability includes the concepts of both the classical and
empirical definitions of probability.
The approach assumes finite sample spaces and is based on the
following three axioms:
i) The probability of an event ranges from 0 to 1. If the event
cannot take place its probability is '0', and if it is
bound to occur its probability is '1'.
ii) The probability of the entire sample space is 1, i.e. p(S) = 1.
iii) If A and B are mutually exclusive events, then the
probability of occurrence of either A or B is
p(A ∪ B) = p(A) + p(B)
In addition, if A and B are independent events, the
probability that both occur together (A intersection B)
is p(A ∩ B) = p(A) . p(B)
3.4 BASIC RELATIONSHIPS OF PROBABILITY
There are some basic probability relationships that can be used to
compute the probability of an event without knowledge of all the
sample point probabilities.
Complement of an Event: The complement of any event A is the event (not A),
i.e. the event that A does not occur. The event A and its complement (not A)
are mutually exclusive and exhaustive. It is denoted by Aʹ, Ac or A̅.
Intersection of Two Events: The intersection of events A and B is the
set of all sample points that are in both A and B. It is denoted by A ∩ B.
Mutually Exclusive Events: Two sets are mutually exclusive (also
called disjoint) if they do not have any elements in common; they need not
together comprise the universal set.
1. Addition Theorem For Mutually Exclusive Events
Statement: If A and B are two mutually exclusive events, then the
probability of occurrence of either A or B is the sum of the individual
probabilities of A and B. Symbolically,
$$P(A \cup B) = P(A) + P(B)$$
Proof: Let N be the total number of exhaustive and equally likely cases of
an experiment. Let m1 and m2 be the number of cases favourable to the
happening of events A and B respectively. Then
$$P(A) = \frac{m_1}{N} \quad \text{and} \quad P(B) = \frac{m_2}{N}.$$
Since the events A and B are mutually exclusive, the total number of
cases favourable to either A or B is n(A∪B) = m1 + m2, and then
$$P(A \cup B) = \frac{m_1 + m_2}{N} = \frac{m_1}{N} + \frac{m_2}{N} = P(A) + P(B)$$
When the events are not mutually exclusive, the addition theorem is modified as:
Statement: If A and B are not mutually exclusive events, the probability
of the occurrence of either A or B or both is equal to the probability that
event A occurs, plus the probability that event B occurs, minus the
probability of occurrence of the events common to both A and B. In other
words, the probability of occurrence of at least one of them is given by
$$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$
Example 2
A card is drawn at random from a pack of 52 cards. Find the probability
that the drawn card is either a spade or a king.
Solution: P(spade) = 13/52 and P(king) = 4/52.
Because one of the kings is also a spade, these events are
not mutually exclusive. The probability of drawing the king of spades is 1/52.
$$P(\text{spade or king}) = \frac{13}{52} + \frac{4}{52} - \frac{1}{52} = \frac{16}{52} = \frac{4}{13}$$
event B can happen in n2 ways out of which a2 are favorable, we can
combine each favorable event in the first with each favorable event in the
second case. Thus, the total number of favorable cases is a1 x a2.
Similarly, the total number of possible cases is n1 x n2. Then by definition
the probability of happening of both the independent events is
$$P(A \cap B) = \frac{a_1 a_2}{n_1 n_2} = \frac{a_1}{n_1} \times \frac{a_2}{n_2} = P(A) \cdot P(B)$$
For the conditional event A|B (i.e., the happening of A under the
condition that B has happened), the favorable outcomes (sample points)
must be out of the sample points of B. In other words, for the event A|B,
the sample space is B and hence
$$P(A \mid B) = \frac{n(A \cap B)}{n(B)} = \frac{P(A \cap B)}{P(B)}$$
Similarly, we have
$$P(B \mid A) = \frac{P(A \cap B)}{P(A)}$$
Also
$$P(A \cap B) = P(A) \cdot P(B \mid A) = P(B) \cdot P(A \mid B)$$
Example
A bag contains 5 white and 8 red balls. Two successive drawings of 3
balls are made such that (a) the balls are replaced before the second
drawing, and (b) the balls are not replaced before the second draw. Find
the probability that the first drawing will give 3 white and the second 3
red balls in each case.
Solution:
(a) When balls are replaced.
Total balls in the bag = 8 + 5 = 13
3 balls can be drawn out of a total of 13 balls in 13C3 ways.
3 white balls can be drawn out of 5 white balls in 5C3 ways.
Probability of 3 white balls = 5C3 / 13C3 = 10 / 286
Probability of 3 red balls = 8C3 / 13C3 = 56 / 286
Since the events are independent, the required probability is:
(10 / 286) x (56 / 286) = 140 / 20449
(b) When the balls are not replaced before second draw
Total balls in the bag = 8 + 5 = 13
3 balls can be drawn out of 13 balls in 13C3 ways.
3 white balls can be drawn out of 5 white balls in 5C3 ways.
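Both cases of this example can be worked through with exact fractions in Python. This is a sketch under the problem's stated setup (5 white and 8 red balls, draws of 3); the variable names are illustrative, and `math.comb(n, r)` gives the number of combinations nCr.

```python
from fractions import Fraction
from math import comb

white, red, draw = 5, 8, 3
total = white + red

# First drawing: 3 white balls out of 13
p_white_first = Fraction(comb(white, draw), comb(total, draw))

# (a) With replacement: the second drawing again faces all 13 balls
p_red_replaced = Fraction(comb(red, draw), comb(total, draw))
p_case_a = p_white_first * p_red_replaced

# (b) Without replacement: 10 balls remain, all 8 red balls still present
p_red_not_replaced = Fraction(comb(red, draw), comb(total - draw, draw))
p_case_b = p_white_first * p_red_not_replaced

print(p_case_a, p_case_b)   # 140/20449 7/429
```

In case (b) the second probability is a conditional probability, because it depends on the outcome of the first drawing.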
black =
ii) Probability of drawing First ball black and the second ball white
=
Since these probabilities are mutually exclusive, by using addition
theorem
2. What is an event?
$$P(A \mid B) = \frac{P(B \mid A) \cdot P(A)}{P(B \mid A) \cdot P(A) + P(B \mid A') \cdot P(A')} = \frac{0.99 \times 0.001}{0.99 \times 0.001 + 0.05 \times 0.999} = \frac{0.00099}{0.05094} = 0.0194.$$
(b) A patient has just had a negative test result. What is the
probability that the patient is a carrier? The answer is
$$P(A \mid B') = \frac{P(B' \mid A) \cdot P(A)}{P(B' \mid A) \cdot P(A) + P(B' \mid A') \cdot P(A')} = \frac{0.01 \times 0.001}{0.01 \times 0.001 + 0.95 \times 0.999} = \frac{0.00001}{0.94906} \approx 0.00001.$$
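The two posterior probabilities can be checked with a short Python sketch of Bayes' theorem for a two-way split (A and its complement). The function name `posterior` and the parameter names are illustrative; the numbers are those appearing in the working above, read here as P(A) = 0.001, P(B | A) = 0.99 and P(B | Aʹ) = 0.05.

```python
def posterior(prior, likelihood, likelihood_complement):
    """P(A | E) = P(E|A) P(A) / (P(E|A) P(A) + P(E|A') P(A'))."""
    numerator = likelihood * prior
    return numerator / (numerator + likelihood_complement * (1 - prior))

p_carrier = 0.001          # P(A)
p_pos_given_carrier = 0.99 # P(B | A)
p_pos_given_not = 0.05     # P(B | A')

# (a) Carrier given a positive result
p_a = posterior(p_carrier, p_pos_given_carrier, p_pos_given_not)

# (b) Carrier given a negative result: use the complementary likelihoods
p_b = posterior(p_carrier, 1 - p_pos_given_carrier, 1 - p_pos_given_not)

print(round(p_a, 4), round(p_b, 7))   # 0.0194 1.05e-05
```

Even with a positive result the probability of being a carrier stays below 2%, because the prior probability P(A) is so small.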
3.9 SUMMARY
Bayes' Theorem is often stated in the form: if P(A) ≠ 0, 1 and
P(B) ≠ 0, then
$$P(A \mid B) = \frac{P(B \mid A) \cdot P(A)}{P(B \mid A) \cdot P(A) + P(B \mid A') \cdot P(A')}$$
Conditional Probability: Two events A and B are said to be
dependent when event A can occur only when event B is known
to have occurred (or vice versa).
Multiplication Probability: The probability of simultaneous
occurrence of two or more events.
Addition Probability: If A and B are not mutually exclusive
events, the probability of the occurrence of either A or B or both
is equal to the probability that event A occurs, plus the probability
that event B occurs, minus the probability of occurrence of the
events common to both A and B.
Types of Probability: Axiomatic Approach, Classical Approach,
Relative Frequency Theory of Probability, Subjective Approach
3.10 KEY WORDS
Probability, Sample, Events, Variables, Addition theorem, Multiplication
theorem, Axiomatic approach, Classical approach, Relative frequency
theory, Subjective approach, Bayes' theorem
3.11 ANSWER TO CHECK YOUR PROGRESS
1. Sample Space :Sample Space is the set of all possible
outcomes of an experiment. It is denoted by S
2. Event :Any subset of a Sample Space is an event. Events
are generally denoted by capital letters A, B , C, D etc.
3. Addition Theorem For Mutually Exclusive Events
UNIT IV - PROBABILITY DISTRIBUTION
Structure
4.0 Introduction
4.1 Objectives
4.2 Random Variable
4.3 Types of Random Variable
4.4 Binomial Distribution
4.5 Poisson Distribution
4.6 Normal Distribution
4.7 Summary
4.8 Key Words
4.9 Answer to Check your progress
4.10 Questions and Exercise
4.11 Further Reading
4.0 INTRODUCTION
A probability distribution is a table or an equation that links each
outcome of a statistical experiment with its probability of occurrence. It
describes the range of possible values that a random variable can attain
and the probability that the value of the random variable lies within any
subset of that range. For example, if X is a random variable, we denote
by P(X) the probability that X takes a given value. It must be the case that
0 <= P(X) <= 1 for each value of X and ƩP(X) = 1 (the sum of all the
probabilities is 1).
4.1 OBJECTIVES
The students will be able to understand
The random variable and its types in probability distribution
Concept of Binomial Distribution, Poisson Distribution and
Normal Distribution
Concept of Mean and Standard deviation of Binomial and Poisson
Distribution
Value of X:   3   2   2   2   1   1   1   0
Thus, to every outcome w there corresponds a real number X(w). Since the
points of the sample space correspond to outcomes, this means that a
real number, which we denote by X(w), is defined for each w ∈ S. Let
us denote the outcomes by w1, w2, ..., w8, i.e. X(w1) = 3, X(w2) = 2, ..., X(w8) = 0.
Thus, we define a random variable as a real valued function whose
domain is the sample space associated with a random experiment and whose
range is the real line. Generally it is denoted by capital letters X, Y, Z
etc.
4.3 TYPES OF RANDOM VARIABLE
1. Discrete random variable
If a random variable X assumes only a finite or countable set of
values, it is called a discrete random variable. In other words, a real
valued function defined on a discrete sample space is called a discrete
random variable. In case of discrete random variable we usually talk of
values at a point. Generally it represents counted data. For example,
number of defective milk packet in a milk plant, number of students in a
class etc.
2. Continuous random variable
A random variable is said to be continuous if it can assume an
infinite and uncountable set of values. A continuous random variable is
one whose different values cannot be put in one-to-one correspondence
with a set of positive integers. For example, the weight of a baby elephant can take
any possible value in the interval of 160 kg to 260 kg, say 189 kg or
189.4356 kg; likewise, marks scored by the students in a class etc. In
the case of a continuous random variable we usually take the values in a
particular interval. Continuous random variables represent measured
data.
Probability Distribution of a Random Variable
The concept of a probability distribution is analogous to that of a
frequency distribution. It depicts how the total probability of one is
distributed among the various values which a random variable can take.
P(X):
Then
Example
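A die is tossed twice. Getting a number greater than 4 is
considered a success. Find the variance of the probability distribution of
the number of successes.
Solution:
Here p, the probability of a number greater than 4, is 2/6 = 1/3 and q,
the probability of failure, is 1 - p = 2/3. Let X denote the number of
successes in the two tosses. Thus, we have:
X:       0             1              2
P(X):    q² = 4/9      2pq = 4/9      p² = 1/9
The mean is E(X) = ΣX·P(X) = 0(4/9) + 1(4/9) + 2(1/9) = 2/3.
The variance is E(X²) - [E(X)]² = [0(4/9) + 1(4/9) + 4(1/9)] - (2/3)²
= 8/9 - 4/9 = 4/9.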
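The mean and variance in this example can be checked with exact fractions. The snippet below is a minimal sketch: it builds the distribution of the number of successes in two tosses from p = 1/3 and then applies E(X) = ΣX·P(X) and Var(X) = E(X²) - [E(X)]². The variable names are illustrative.

```python
from fractions import Fraction

p = Fraction(1, 3)      # probability of a number greater than 4
q = 1 - p               # probability of failure

# Distribution of X = number of successes in two independent tosses.
dist = {0: q * q, 1: 2 * p * q, 2: p * p}

mean = sum(x * px for x, px in dist.items())
second_moment = sum(x * x * px for x, px in dist.items())
variance = second_moment - mean ** 2

print(mean, variance)
```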
4.4 BINOMIAL DISTRIBUTION
The binomial distribution is a discrete probability distribution. This
distribution was discovered by the Swiss mathematician James Bernoulli
(1654-1705). A Bernoullian trial is an experiment having only two
possible outcomes, i.e. success or failure. In other words, the results of the
trial are dichotomous, e.g. in the tossing of a coin the outcome is either head
or tail, the sex of a calf can be either male or female, and a manufactured
milk product, engineering equipment or spare part will be either defective or
non-defective. This distribution can be used under the following
conditions:
a) The random experiment is performed repeatedly a finite and fixed
number of times i.e. n, the number of trials is finite and fixed.
b) The outcome of a trial results in the dichotomous classification of
events i.e. each trial must result in two mutually exclusive outcomes,
success or failure.
c) The probability of success (or failure) remains the same in each trial, i.e. in
each trial the probability of success, denoted by p, remains constant. q = 1 - p
is then termed the probability of failure (non-occurrence).
d) Trials are independent i.e. the outcome of any trial does not affect the
outcomes of the subsequent trials.
Theorem:
If X denotes the number of successes in n trials satisfying the
above conditions, then X is a random variable which can take the values
0, 1, 2, …, n, i.e. no success, one success, two successes, …, or all
n successes. The general expression for the probability of r successes is
given by: P(r) = P(X = r) = nCr p^r q^(n-r) for r = 0, 1, 2, …, n
Proof:
By the theorem of compound probability, the probability that r
trials are successes and the remaining (n-r) are failures in a sequence of n
trials in a specified order, say S,F,S,F,S,…,S, is given by
p^r q^(n-r)
But we are interested in any r trials being successes, and r trials can
be chosen out of n trials in nCr (mutually exclusive) ways. Therefore, by
the theorem of total probability, the chance P(r) of r successes in a series
of n independent trials is given by
P(r) = nCr p^r q^(n-r), r = 0, 1, 2, …, n
Example :
It is known that 40 % patients affected by tuberculosis die every
year. 6 patients are admitted to a hospital suffering from tuberculosis.
What is the probability that
(i) Three patients will die.
(ii) at least patients will die
(iii) all patients will be cured
(iv) no patients will be saved.
Solution
We have p = 0.4, q = 1 - 0.4 = 0.6 and n = 6.
In the binomial distribution we have
P(r) = nCr p^r q^(n-r)
(i) Prob. [Three patients will die]
P[r = 3] = P(3) = 6C3 (0.4)^3 (0.6)^3 = 20 × 0.064 × 0.216 = 0.2765
(iii) Prob. (all patients will be cured) = P (no patient will die)
= P(0) = 6C0 (0.4)^0 (0.6)^6
= (0.6)^6 = 0.0467
Consequently, the probability that at least one patient will die is
1 - P(0) = 1 - 0.0467 = 0.9533.
(iv) Prob. (no patients will be saved) = P (all patients will die)
= P(6)
= 6C6 (0.4)^6 (0.6)^0
= (0.4)^6 = 0.0041
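The binomial probabilities used in this example can be reproduced with a short function. This is a minimal sketch of P(r) = nCr p^r q^(n-r) using Python's standard `math.comb`; the function name `binom_pmf` is illustrative.

```python
from math import comb

def binom_pmf(r, n, p):
    """P(X = r) for a binomial distribution with n trials and success probability p."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

n, p = 6, 0.4   # six patients, each dies with probability 0.4

print(round(binom_pmf(3, n, p), 4))       # exactly three patients die
print(round(binom_pmf(0, n, p), 4))       # no patient dies (all are cured)
print(round(1 - binom_pmf(0, n, p), 4))   # at least one patient dies
print(round(binom_pmf(6, n, p), 4))       # all six patients die
```

Note that "all patients are cured" is the event r = 0, so its probability is P(0) itself, while 1 - P(0) is the probability that at least one patient dies.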
Variance = npq
For the binomial distribution, if the mean (np) and the variance (npq) are
known, we can arrive at the frequency distribution. Since q < 1, the
variance is always less than the mean.
iii) The third and fourth central moment μ3 and μ4 can be obtained on the
same lines.
4.5 POISSON DISTRIBUTION
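The Poisson distribution is a limiting case of the Binomial distribution
under the following conditions:
(i) n, the number of trials, is indefinitely large, i.e., n→∞;
(ii) p, the probability of success in each trial, is indefinitely small, i.e., p→0;
(iii) np = m (say) is finite.
Under these conditions the binomial probability P(r) = nCr p^r q^(n-r) tends to
P(X = r) = e^(-m) m^r / r!, r = 0, 1, 2, …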
Total probability is 1:
ΣP(r) = e^(-m) [1 + m + m²/2! + m³/3! + …] = e^(-m) e^(m) = 1
Prob. [That the carton will fail to meet the guaranteed quality] = 1 - Prob.
[The carton will meet the guaranteed quality] = 1 - Prob. [Not more than 10
items will be defective] = 1 - P [r ≤ 10]
= 1 - [P(0) + P(1) + P(2) + P(3) + … + P(10)]
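A tail probability of the form 1 - P[r ≤ 10] is tedious by hand but short in code. The sketch below assumes a Poisson mean of m = 5 defective items per carton; that value is hypothetical, since the mean used in the worked example is not reproduced here.

```python
from math import exp, factorial

def poisson_pmf(r, m):
    """P(X = r) for a Poisson distribution with parameter m."""
    return exp(-m) * m**r / factorial(r)

m = 5.0   # hypothetical mean number of defective items per carton

# Probability that the carton meets the guarantee: not more than 10 defectives.
p_meet = sum(poisson_pmf(r, m) for r in range(11))

# Probability that the carton fails to meet the guarantee.
p_fail = 1 - p_meet

print(round(p_fail, 4))
```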
Examples of Poisson Distribution
The hourly number of customers arriving at a bank
The daily number of accidents on a particular stretch of highway
For the Poisson distribution with parameter m,
Mean = m and Variance = m
Hence, for the Poisson distribution with parameter m, the mean is equal to
the variance.
iii) Third and fourth central moments μ3 and μ4
It may be noted that the mean, the variance (μ2) and the third central
moment (μ3) of the Poisson distribution are identical and are equal to the
value of the parameter itself, namely ‘m’; the fourth central moment is
μ4 = 3m² + m. The coefficients of skewness and kurtosis are γ1 = 1/√m and
γ2 = 1/m. Hence the Poisson distribution is always a positively skewed
distribution (as m > 0) as well as leptokurtic. As the value of m
increases, γ1 decreases and thus the skewness is reduced. As m⟶∞, γ1 and
γ2 tend to zero. So we conclude that the curve of the Poisson distribution
tends to a symmetrical curve for large values of m.
v) The mode of the Poisson distribution is determined by the value of m. If m is an
integer then the distribution is bi-modal, the two modal values being
X = m and X = m-1. When m is not an integer, the distribution has a
unique modal value, namely the integral part of m.
vi) Additive property: If X1 and X2 are two independent
Poisson variates with parameters m1 and m2, then their sum X1 + X2 is also
a Poisson variate with parameter m1 + m2.
Example
The mean of the Poisson distribution is 2.25. Find the other
constants of the distribution.
Solution:
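We have m = 2.25. Therefore
Mean = m = 2.25, Variance μ2 = m = 2.25, Standard deviation = √2.25 = 1.5,
μ3 = m = 2.25 and μ4 = 3m² + m = 3(5.0625) + 2.25 = 17.4375,
γ1 = 1/√m = 1/1.5 = 0.6667 and γ2 = 1/m = 0.4444.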
where a1, a2, …, an are constants, is also a normal variate with
Mean = a1μ1 + a2μ2 + … + anμn and Variance = a1²σ1² + a2²σ2² +
… + an²σn². In particular, if we take a1 = a2 = … = an = 1
then we get that X1 + X2 + … + Xn is a normal variate with mean
μ1 + μ2 + … + μn and variance σ1² + σ2² + … + σn². Thus, the
sum of independent normal variates is also a normal variate with
mean equal to the sum of their means and standard deviation equal to the
square root of the sum of the squares of their standard deviations.
This is known as the Reproductive or Additive Property of the
Normal distribution.
16. Mean Deviation (M.D.) about mean or median or mode is given
by
M.D. = σ√(2/π) ≈ 0.7979σ ≈ (4/5)σ
19. We have (approximately):
Q.D. : M.D. : S.D. :: (2/3)σ : (4/5)σ : σ :: 10 : 12 : 15
21. Area property: One of the most fundamental properties of the
normal probability curve is the area property. If X ∼ N(μ, σ²),
then the probability that a random value of X will lie between X = μ
and X = x1 is given by
P(μ < X < x1) = P(0 < Z < z1), where Z = (X − μ)/σ and z1 = (x1 − μ)/σ.
The area under the normal probability curve between the ordinates at
X = μ−σ and X = μ+σ is 0.6826. In other words, the range X = μ ± σ covers
68.26% of the observations (as shown in Fig). This is known as the 1σ limits
of the normal distribution.
The area under the normal probability curve between the ordinates at
X = μ−2σ and X = μ+2σ is 0.9545. In other words, the range X = μ ± 2σ
covers 95.45% of the observations (as shown in Fig.). This is known as the
2σ limits of the normal distribution and is considered as the warning limit in
statistical quality control: it warns the manufacturer that the
manufacturing process is going out of control.
3. The probability that the random variable X lies in the interval (μ−3σ,
μ+3σ) is given by
P(μ−3σ < X < μ+3σ) = P(−3 < Z < 3) = 0.9973
The area under the normal probability curve between the ordinates at
X = μ−3σ and X = μ+3σ is 0.9973. In other words, the range X = μ ± 3σ
covers 99.73% of the observations (as shown in Fig.). This is known as the
3σ limits of the normal distribution; in statistical quality control, an
observation outside these limits implies that the manufacturing process
is out of control.
Thus, the probability that a normal variate X lies outside the range μ ± 3σ
is given as
P(|X − μ| > 3σ) = 1 − 0.9973 = 0.0027
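The three areas quoted above can be computed directly rather than read from tables. The sketch below uses the identity P(−k < Z < k) = erf(k/√2) for a standard normal variate Z, with `math.erf` from the Python standard library; the function name `area_within` is illustrative.

```python
from math import erf, sqrt

def area_within(k):
    """Area under the standard normal curve between -k and +k."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    inside = area_within(k)
    print(f"{k} sigma limits: inside = {inside:.4f}, outside = {1 - inside:.4f}")
```

Small differences in the fourth decimal place from the figures in the text arise because printed tables are rounded.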
At X = 50,
P(X > 50) = P(Z > 0.334) = 0.5 − P(0 ≤ Z ≤ 0.334) = 0.5 − 0.1308 = 0.3692
No. of students = 1000 × 0.3692 = 369.2 ≈ 369 students
(b) Number of students scoring between 30 and 58
As shown in the figure, we want to find
P(30 < X < 58), i.e. the probability of the shaded portion
P(30 < X < 58) = P(−0.5 ≤ Z ≤ 0.6667) = 0.1915 + 0.2476 = 0.4391
No. of students = 1000 × 0.4391 = 439.1 ≈ 439 students
At X = x1, Z = (x1 − μ)/σ = z1.
From the figure, P(X > x1) is shown as the shaded region
Conclusion
(a) 369 students scored more than 50.
(b) 439 students scored between 30 & 58.
(c) Minimum score of top 100 students is 73.
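The three answers can be cross-checked with `statistics.NormalDist` from the Python standard library. The mean of 42 and standard deviation of 24 used below are assumptions inferred from the z-values 0.334, −0.5 and 0.6667 in the working, because the statement of the example is not reproduced here.

```python
from statistics import NormalDist

students = 1000
scores = NormalDist(mu=42, sigma=24)   # assumed parameters, inferred from the z-values

# (a) students scoring more than 50
more_than_50 = students * (1 - scores.cdf(50))

# (b) students scoring between 30 and 58
between_30_58 = students * (scores.cdf(58) - scores.cdf(30))

# (c) minimum score of the top 100 students, i.e. the 90th percentile
cutoff_top_100 = scores.inv_cdf(1 - 100 / students)

print(round(more_than_50), round(between_30_58), round(cutoff_top_100))
```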
4.7 SUMMARY
Conditions for the binomial probability distribution are
i. The trials are independent
ii. The number of trials is finite
The mean and variance of the Poisson distribution are both equal to λ (the
parameter denoted by m above).
3. Examples
o The hourly number of customers arriving at a bank
o The daily number of accidents on a particular stretch of
highway
o The hourly number of accesses to a particular web server
o The daily number of emergency calls to 108
o The number of typos in a book
binomial distribution
3. Write the main characteristics of normal distribution
4. Fit the Poisson distribution to the following
X 0 1 2 3 4 5
f 120 82 52 22 4 0
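Fitting a Poisson distribution means estimating m by the sample mean and then computing the expected frequencies N·e^(-m)·m^x/x!. The following sketch illustrates the method on the frequency table above; it is one possible way to organise the calculation, not a prescribed solution.

```python
from math import exp, factorial

x = [0, 1, 2, 3, 4, 5]
f = [120, 82, 52, 22, 4, 0]

N = sum(f)                                        # total frequency
m = sum(xi * fi for xi, fi in zip(x, f)) / N      # sample mean estimates m

# Expected (theoretical) frequencies under the fitted Poisson distribution.
expected = [N * exp(-m) * m**xi / factorial(xi) for xi in x]

for xi, fi, ei in zip(x, f, expected):
    print(xi, fi, round(ei, 1))
```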
UNIT – V ESTIMATION
Structure
5.1. Introduction
5.2 Reasons For Making Estimates
5.3 Types Of Estimates
5.4 Point Estimation
5.5 Interval Estimation
5.6 Criteria of a Good Estimator
5.7 Unbiasedness
5.8 Consistency
5.9 Efficiency
5.10 Sufficiency
5.11 Confidence Intervals
5.12 Determining The Sample Size In Estimation
5.1. INTRODUCTION
The sampling process is used to draw statistical inference about the
characteristics of a population or process of interest. On many
occasions we do not have enough information to calculate an
exact value of population parameters (such as μ, σ and p)
and therefore make the best estimate of this value from the
corresponding sample statistics (such as x̄, s and p̄). The need to
use the sample statistic to draw conclusions about the population
characteristic is one of the fundamental applications of statistical
inference in business and economics.
5.2 Reasons for Making Estimates
A few applications of statistical estimation are given below :
A production manager needs to determine the proportion of items being
manufactured that do not match with quality standards.
A mobile phone service company may be interested to know the average
length of a long distance telephone call and its standard
deviation
A bank needs to understand consumer awareness of its services and credit
schemes.
Let us first understand the concept of ‘estimate’ as used in Statistics.
According to some dictionaries, an estimate is a valuation based on
opinion or roughly made from imperfect or incomplete data. This
definition may apply, for example, when an individual forms an
opinion about the competence of one of his colleagues. But in Statistics
the term estimate is not used in this sense. In Statistics too, estimates
are made when the information available is incomplete or imperfect.
However, such estimates are made only when they are based on sound
judgement or experience and when the samples are scientifically selected.
There are two types of estimates that we can make about a population : a
point estimate and an interval estimate. A point estimate is a single
number, which is used to estimate an unknown population parameter.
Although a point estimate may be the most common way of expressing an
estimate, it suffers from a major limitation since it fails to indicate how
close it is to the quantity it is supposed to estimate. In other words, a point
estimate does not give any idea about the reliability or precision of the
method of estimation used. For instance, if someone claims that 40
percent of all children in a certain town do not go to the school and are
devoid of education, it would not be very helpful if this claim is based on
a small number of households, say, 20. However, as the number of
households interviewed for this purpose increases from 20 to 100, 500 or
even 5,000, the claim that 40 percent of children have no school
education would become more and more meaningful and reliable. This
makes it clear that a point estimate should always be accompanied by
some relevant information so that it is possible to judge how far it is
reliable.
The second type of estimate is known as the interval estimate. It is a range
of values used to estimate an unknown population parameter. In case of
an interval estimate, the error is indicated in two ways: first by the extent
of its range; and second, by the probability of the true population
parameter lying within that range. Taking our previous example of 40
percent children not having a school education, the statistician may say
that actual percentage of such children in that town may lie between 35
percent and 45 percent. Thus, he will have a better idea of the reliability
of such an estimate as compared to the point estimate of 40 percent.
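The effect of the number of households on the reliability of the 40 percent claim can be illustrated numerically. The sketch below assumes the usual large-sample formula for a 95% interval around a sample proportion, p ± 1.96·√(p(1−p)/n); for very small n (such as 20) this normal approximation is rough and is shown only for comparison.

```python
from math import sqrt

p_hat = 0.40   # sample proportion from the example
z = 1.96       # standard normal value for 95% confidence

def margin_of_error(p, n, z=1.96):
    """Half-width of the approximate interval estimate for a proportion."""
    return z * sqrt(p * (1 - p) / n)

for n in (20, 100, 500, 5000):
    e = margin_of_error(p_hat, n, z)
    print(n, round(p_hat - e, 3), round(p_hat + e, 3))
```

The interval narrows as n grows, which is the sense in which the same point estimate becomes more reliable.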
Estimator and Estimate
When we make an estimate of a population parameter, we use a sample
statistic. This sample statistic is an estimator. For example, the sample mean
x̄ = Σxi / n
is an estimator of the population mean μ. The
question here is: how can we evaluate the properties of these estimators,
compare them with one another and, finally, decide which is the ‘best’? The
answer to this question is possible only when we have certain criteria that a
good estimator must satisfy. These criteria are briefly discussed below.
5.7 Unbiasedness
This is a very important property that an estimator should possess. If we
take all possible samples of the same size from a population and calculate
their means, the mean μx̄ of all these means will be equal to the mean μ of
the population. This means that the sample mean is an unbiased estimator
of the population mean μ. When the expected value (or mean) of a sample
statistic is equal to the value of the corresponding population parameter,
the sample statistic is said to be an unbiased estimator.
Suppose we take the smallest sample observation as an estimator of the
population mean μ. It can be easily shown that this estimator is biased.
Since the smallest observation cannot exceed the sample mean (and is
smaller whenever the observations are not all equal), its expected
value must be less than μ. Symbolically, E(Xs) < μ, where Xs stands for
the smallest item and E stands for the expected value. Thus, this estimator
is biased downwards. The extent of bias is the difference between the
expected value of the estimator and the value of the parameter. In this case,
the bias is equal to E(Xs) − μ. In contrast, the bias of the sample mean x̄ is
zero.
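Unbiasedness can be verified by listing every possible sample from a small population. The population of four values below is hypothetical; the sketch enumerates all samples of size 2 drawn without replacement and compares the average of the sample means, and of the sample minima, with μ.

```python
from fractions import Fraction
from itertools import combinations

population = [2, 4, 6, 8]   # hypothetical population
mu = Fraction(sum(population), len(population))

# All possible samples of size 2 (without replacement).
samples = list(combinations(population, 2))

# Expected value of the sample mean and of the smallest observation.
mean_of_means = sum(Fraction(sum(s), len(s)) for s in samples) / len(samples)
mean_of_mins = Fraction(sum(min(s) for s in samples), len(samples))

print("mu =", mu)
print("E(sample mean) =", mean_of_means, " bias =", mean_of_means - mu)
print("E(smallest item) =", mean_of_mins, " bias =", mean_of_mins - mu)
```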
5.8 Consistency
Another important characteristic that an estimator should possess is
consistency. Let us take the case of the standard deviation of the
sampling distribution of x̄. The standard deviation of the sampling
distribution of the sample mean is computed by the following formula:
σx̄ = σ/√n
Solution
From the given information, for the population of all employees, N = 4,000,
μ = Rs.4,800 and σ = Rs.1,200.
σx̄ = σ/√n = 1,200/√81 = 1,200/9 = Rs.133.33
In this case, n = 100 and n/N = 100/4,000 = 0.025, which is also less
than 0.05. The mean and the standard deviation of x̄ are
μx̄ = μ = Rs.4,800
σx̄ = σ/√n = 1,200/√100 = 1,200/10 = Rs.120
In this case, n = 180 and n/N = 180/4,000 = 0.045, which again is less
than 0.05. The mean and the standard deviation of x̄ are
μx̄ = μ = Rs.4,800
σx̄ = σ/√n = 1,200/√180 = 1,200/13.416 = Rs.89.44
From the above three sets of calculations, it becomes clear that the mean
of the sampling distribution of x̄ is always equal to the mean of the
population regardless of the sample size. But, in the case of the standard
deviation, we find a change. In the given example, we find that the
standard deviation of x̄ decreased from Rs.133.33 to Rs.120 and then to
Rs.89.44 as the sample size increased from 81 to 100 and then to 180.
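The standard errors in this solution follow from σx̄ = σ/√n and can be recomputed in a few lines. The sample sizes 81, 100 and 180 are taken from the calculations shown above; the check n/N < 0.05 mirrors the condition used in the text for ignoring the finite population correction.

```python
from math import sqrt

N = 4000        # population size
sigma = 1200    # population standard deviation (Rs.)

def standard_error(sigma, n):
    """Standard deviation of the sampling distribution of the sample mean."""
    return sigma / sqrt(n)

for n in (81, 100, 180):
    small_fraction = n / N < 0.05
    print(n, small_fraction, round(standard_error(sigma, n), 2))
```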
5.9 Efficiency
Another desirable property of a good estimator is that it should be
efficient. Efficiency is measured in terms of size of the standard error of
the statistic. Since an estimator is a random variable, it is necessarily
characterised by a certain amount of variability. This means that some
estimates may be more variable than others. Just as bias is related to the
expected value of the estimator, so efficiency can be defined in terms of
the variance. In large samples, for example, the variance of the sample
mean is V(x̄) = σ²/n. As the sample size n increases, the variance of the
sample mean V(x̄) becomes smaller, so the estimator becomes more
efficient. This criterion, when applied to large samples, gives better
estimates as compared to the small ones.
The efficiency of one estimator in relation to another estimator can be
judged by comparing their sampling variances. Thus, efficiency relates to
the size of the standard error. Given the same sample size, the statistic that
has a smaller standard error is preferable as it is efficient in relation to
another statistic that has a larger standard error. The sampling distribution
of the mean and the median have the same mean, that is, the population
mean. However, the variance of the sampling distribution of the means is
smaller than the variance of the sampling distribution of the medians. As
such, the sample mean is an efficient estimator of the population mean,
while the sample median is an inefficient estimator.
5.10 Sufficiency
The fourth property of a good estimator is that it should be sufficient. A
sufficient statistic utilises all the information a sample contains about the
parameter to be estimated. The sample mean x̄, for example, is a sufficient
estimator of the population mean μ. It implies that no other estimator of μ,
such as the sample median, can provide any additional information about the
parameter μ. Likewise, the sample proportion p̄ is a sufficient estimator of
the population proportion π.
Having looked into the properties of a good estimator briefly, a pertinent
question arises: how can we find estimators with these desirable
properties? This brings us to the method of maximum likelihood.
5.6.2 METHOD OF MAXIMUM LIKELIHOOD (ML)
The maximum likelihood method provides estimators with the desirable
properties such as efficiency, consistency and sufficiency, which we
have just discussed. It usually does not give an unbiased estimate. Let
us take an example to explain this method.
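As a hedged illustration of the method (not the example originally given in this unit), consider a sample of n items in which x show a certain attribute. The likelihood of the unknown proportion p is nCx p^x (1−p)^(n−x), and the maximum likelihood estimate is the value of p that makes it largest. The data n = 20 and x = 8 below are hypothetical, and the grid search is used only to keep the sketch simple.

```python
from math import comb

n, x = 20, 8   # hypothetical data: 8 successes in 20 trials

def likelihood(p):
    """Likelihood of p given x successes in n binomial trials."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Search a fine grid of candidate values of p for the maximum.
grid = [i / 1000 for i in range(1, 1000)]
p_ml = max(grid, key=likelihood)

print(p_ml, x / n)
```

The grid maximum coincides with the sample proportion x/n, which is the known closed-form maximum likelihood estimator for this model.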
Strictly speaking, a 95% confidence interval means that if we were to take
100 different samples and compute a 95% confidence interval for each
sample, then approximately 95 of the 100 confidence intervals will contain
the true mean value (μ). In practice, however, we select one random
sample and generate one confidence interval, which may or may not
contain the true mean. The observed interval may over- or underestimate μ.
Consequently, the 95% CI is the likely range of the true, unknown
parameter. The confidence interval does not reflect the variability in the
unknown parameter. Rather, it reflects the amount of random error in the
sample and provides a range of values that are likely to include the
unknown parameter. Another way of thinking about a confidence interval
is that it is the range of likely values of the parameter (defined as the point
estimate ± margin of error) with a specified level of confidence (which is
similar to a probability).
For the standard normal distribution, P(−1.96 < Z < 1.96) = 0.95, i.e., there
is a 95% probability that a standard normal variable, Z, will fall between
−1.96 and 1.96. The Central Limit Theorem states that for large samples
Z = (x̄ − μ)/(σ/√n) follows approximately the standard normal distribution,
so that
P(−1.96 < (x̄ − μ)/(σ/√n) < 1.96) = 0.95
Using algebra, we can rework this inequality such that the mean (μ) is the
middle term, as shown below:
P(−1.96 σ/√n < x̄ − μ < 1.96 σ/√n) = 0.95
then
P(−x̄ − 1.96 σ/√n < −μ < −x̄ + 1.96 σ/√n) = 0.95
and finally
P(x̄ − 1.96 σ/√n < μ < x̄ + 1.96 σ/√n) = 0.95
This last expression, then, provides the 95% confidence interval for the
population mean, and this can also be expressed as:
x̄ ± 1.96 σ/√n
Thus, the margin of error is 1.96 times the standard error (the standard
deviation of the point estimate from the sample), and 1.96 reflects the fact
that a 95% confidence level was selected. So, the general form of a
confidence interval is:
point estimate ± Z × SE (point estimate)
where Z is the value from the standard normal distribution for the selected
confidence level (e.g., for a 95% confidence level, Z = 1.96).
In practice, we often do not know the value of the population standard
deviation (σ). However, if the sample size is large (n > 30), then the sample
standard deviation (s) can be used to estimate the population standard
deviation.
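The general form "point estimate ± Z × SE" translates directly into code. The sketch below is an illustration under stated assumptions: the sample figures (mean 50, standard deviation 10, n = 100) are hypothetical, and the Z value is obtained from `statistics.NormalDist` rather than from a printed table.

```python
from math import sqrt
from statistics import NormalDist

def confidence_interval(xbar, s, n, level=0.95):
    """Large-sample confidence interval for a population mean."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)   # e.g. about 1.96 for 95%
    margin = z * s / sqrt(n)                        # Z times the standard error
    return xbar - margin, xbar + margin

# Hypothetical sample: mean 50, standard deviation 10, size 100.
lower, upper = confidence_interval(50, 10, 100)
print(round(lower, 2), round(upper, 2))
```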
UNIT - VI TEST OF HYPOTHESIS
Structure
6.0 Introduction
6.1 Objectives
6.2 Hypothesis Testing on Population Mean
6.2.1 Population Variance Known
6.2.2 Population Variance Unknown
6.3 Difference Between Mean of Two Populations
6.3.1 Population Variance Known
6.3.2 Population Variance Unknown
6.4 Test of Hypothesis for Population Proportion
6.5 Difference Between Two Proportion
6.6 Summary
6.7 Key Words
6.8 Answers to Check Your Progress
6.9 Questions and Exercise
6.10 Further Readings
6.0 INTRODUCTION
Hypothesis testing was introduced by Ronald Fisher, Jerzy
Neyman, Karl Pearson and Pearson’s son, Egon Pearson. Hypothesis
testing is a statistical method that is used in making statistical decisions
using experimental data. A hypothesis is basically an assumption
that we make about the population parameter.
6.1 OBJECTIVES
variance is known and the second situation is if the population variance is
unknown.
6.2.1 POPULATION VARIANCE KNOWN
Steps:
4. Consider the test statistic Z = (X̄ − μ0)/(σ/√n) under H0, where μ0 is the
value of μ specified by H0. Here X̄ represents the sample mean.
5. Calculate the value of Z for the given sample (x1, x2, ..., xn) as
z0 = (x̄ − μ0)/(σ/√n).
Example:
Thus, |z0| = 3.33
Step 6 : Critical value
Steps:
1. Let μ and σ² be respectively the mean and the variance of the
population under study, where σ² is unknown. If μ0 is an
admissible value of μ, then frame the null hypothesis as H0: μ =
μ0 and choose the suitable alternative hypothesis from
(i) H1: μ ≠ μ0 (ii) H1: μ > μ0 (iii) H1: μ < μ0
2. Let (X1, X2, …, Xn) be a random sample of n observations
drawn from the population, where n is large (n ≥ 30).
3. Specify the level of significance, α.
4. Consider the test statistic Z = (X̄ − μ0)/(S/√n) under H0, where X̄ and S are
the sample mean and sample standard deviation respectively. It
may be noted that the above test statistic is obtained from Z by
substituting S for σ.
The approximate sampling distribution of the test statistic under
H0 is the N(0,1) distribution.
5. Calculate the value of Z for the given sample (x1, x2, ..., xn) as
z0 = (x̄ − μ0)/(s/√n). Here, x̄ and s are respectively the values of X̄ and S
calculated for the given sample.
6. Find the critical value, ze, corresponding to α and H1 from the
following table
Alternative Hypothesis (H1)    μ ≠ μ0         μ > μ0       μ < μ0
Critical Value (ze)            zα/2           zα           –zα
7. Decide on H0 choosing the suitable rejection rule from the
following table corresponding to H1.
Alternative Hypothesis (H1)    μ ≠ μ0         μ > μ0       μ < μ0
Rejection Rule                 |z0| ≥ zα/2    z0 > zα      z0 < –zα
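The seven steps above can be gathered into one small function. This is a sketch under stated assumptions: the function name `z_test`, the labels "two-sided", "greater" and "less" for the three forms of H1, and the sample figures in the usage line are all hypothetical; critical values come from `statistics.NormalDist` instead of a table.

```python
from math import sqrt
from statistics import NormalDist

def z_test(xbar, mu0, s, n, alpha=0.05, alternative="two-sided"):
    """Large-sample test of H0: mu = mu0; returns (z0, critical value, reject H0?)."""
    z0 = (xbar - mu0) / (s / sqrt(n))          # Step 5: calculated value
    std_normal = NormalDist()
    if alternative == "two-sided":             # H1: mu != mu0
        ze = std_normal.inv_cdf(1 - alpha / 2)
        reject = abs(z0) >= ze
    elif alternative == "greater":             # H1: mu > mu0
        ze = std_normal.inv_cdf(1 - alpha)
        reject = z0 > ze
    else:                                      # H1: mu < mu0
        ze = -std_normal.inv_cdf(1 - alpha)
        reject = z0 < ze
    return z0, ze, reject

# Hypothetical sample: mean 52, standard deviation 8, n = 64, H0: mu = 50.
z0, ze, reject = z_test(52, 50, 8, 64)
print(round(z0, 3), round(ze, 3), reject)
```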
Example:
Solution:
Step 1 :Let the fuel consumption of the new model car be assumed to
be distributed according to a distribution with mean and
standard deviation respectively μ and σ. The null and
alternative hypotheses are
Null hypothesis H0: μ = 57
i.e., the average fuel consumption of the company’s new
model car is not significantly different from that of the
existing model.
Alternative hypothesis H1: μ > 57
i.e., the average fuel consumption of the company’s new
model car is significantly lower than that of the existing
model. In other words, the number of kms run by the new model
car is significantly more than that of the existing model car.
Step 2 : Data:
The given sample information are
Size of the sample (n) = 100. Hence, it is a large sample.
Sample mean (𝒙 )= 30
Sample standard deviation(s) = 3
Step 3 : Level of significance
α = 5%
drawn from Population-1 and (Y1, Y2, …, Yn) be a random
sample of n observations drawn from Population-2, where m and
n are large (i.e., m ≥ 30 and n ≥ 30). Further, these two samples
are assumed to be independent.
4. Consider the test statistic
Z = [(X̄ − Ȳ) − (μX − μY)] / √(σX²/m + σY²/n)
under H0, where X̄ and Ȳ are respectively the means of the two samples
described in Step-2.
5. Calculate the value of Z for the given samples (x1, x2, ..., xm) and
(y1, y2, …, yn) as
z0 = (x̄ − ȳ) / √(σX²/m + σY²/n).
Here, x̄ and ȳ are respectively the values of X̄ and Ȳ for the given
samples.
6. Find the critical value, ze, corresponding to α and H1 from the
following table
Example:
Performance of students in a national level Olympiad exam was
studied. The scores secured by randomly selected students from two
districts, viz., D1 and D2 of a State were analyzed. The numbers of students
randomly selected from D1 and D2 are respectively 1000 and 1600.
Average scores secured by the students selected from D1 and D2 are
respectively 116 and 114. Can the samples be regarded as drawn from
identical populations having common standard deviation 27? Test at 5%
level of significance.
Solution:
Step 2 : Data
The given sample information are
Size of the Sample-1 (m) = 1000
Size of the Sample-2 (n) = 1600. Hence, both the samples
are large.
Mean of Sample-1 (𝑥 ) = 116
Mean of Sample-2 (𝑦 ) = 114
Since both m and n are large, the sampling distribution of
Z under H0 is the N(0, 1) distribution. With the common standard
deviation σ = 27, the calculated value is
z0 = (x̄ − ȳ) / [σ √(1/m + 1/n)] = (116 − 114) / [27 √(1/1000 + 1/1600)]
= 2/1.0884 = 1.838
Step-7 : Decision
Since H1 is a two-sided alternative, elements of the critical
region are defined by the rejection rule |z0| ≥ ze = z0.025. For
the given sample information, |z0| = 1.838 < ze = 1.96. It
indicates that the given samples do not contain sufficient evidence
to reject H0. Thus, H0 is not rejected.
Therefore, the average performance of the students in the
districts D1 and D2 in the national level Olympiad
examination is not significantly different. Thus the given
samples may be regarded as drawn from identical populations.
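The test statistic for this example can be recomputed directly. In the sketch below the common standard deviation σ = 27 stated in the problem is included in the denominator, i.e. z0 = (x̄ − ȳ) / [σ√(1/m + 1/n)]; the critical value 1.96 is the two-sided 5% value.

```python
from math import sqrt

m, n = 1000, 1600        # sample sizes from districts D1 and D2
xbar, ybar = 116, 114    # sample means
sigma = 27               # common population standard deviation

# Standard error of the difference between the two sample means.
se = sigma * sqrt(1 / m + 1 / n)
z0 = (xbar - ybar) / se

ze = 1.96                # critical value for a two-sided test at the 5% level
reject = abs(z0) >= ze

print(round(se, 4), round(z0, 3), reject)
```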
Steps:
1. Let μX and σX² be respectively the mean and the variance of
Population-1. Also, let μY and σY² be respectively the mean and
the variance of Population-2 under study. Here σX² and σY² are
unknown.
4. Consider the test statistic
Z = [(X̄ − Ȳ) − (μX − μY)] / √(SX²/m + SY²/n)
under H0,
6. Calculate the value of Z for the given samples (x1, x2, ..., xm) and
(y1, y2, …, yn) as
z0 = (x̄ − ȳ) / √(sX²/m + sY²/n).
Here, x̄ and ȳ are respectively the values of X̄ and Ȳ for the given
samples.
Also, sX² and sY² are respectively the values of SX² and SY² for the
given samples.
7. Find the critical value, ze, corresponding to α and H1 from the
following table
Example:
Solution:
Step 1 : Let P denote the proportion of students in the city who
preferred to have chocolate. Then, the null and the
alternative hypotheses are
Null hypothesis: H0 : P = 0.5
i.e., chocolate and ice-cream are
preferred equally in the city.
Alternative hypothesis: H1 : P ≠ 0.5
i.e., chocolate and ice-cream are not
preferred equally. It is a two-sided alternative hypothesis.
Step 2 : Data
The given sample information are
Sample size (n) = 2000. Hence, it is a large sample.
No. of chocolate consumers = 1120
Sample proportion (p) = 1120/2000 = 0.56
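The remaining steps of this example are not reproduced here, so the following is only a sketch of how the calculation would usually proceed. It assumes the standard large-sample statistic for a single proportion, z0 = (p − P0)/√(P0(1 − P0)/n), and a 5% two-sided critical value of 1.96; both are assumptions rather than values quoted from the text.

```python
from math import sqrt

n = 2000      # sample size
x = 1120      # number of students preferring chocolate
P0 = 0.5      # proportion under H0

p = x / n                                   # sample proportion
z0 = (p - P0) / sqrt(P0 * (1 - P0) / n)     # calculated value of the statistic

ze = 1.96     # assumed: two-sided critical value at the 5% level
reject = abs(z0) >= ze

print(p, round(z0, 3), reject)
```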
Steps:
1: Let PX and PY denote respectively the proportions of Population-1
and Population-2 possessing the qualitative characteristic
(attribute) under study. Frame the null hypothesis as H0:
PX = PY and choose the suitable alternative hypothesis from
(i) H1: PX ≠ PY (ii) H1: PX > PY (iii) H1: PX < PY
p̂ = (m pX + n pY)/(m + n), q̂ = 1 − p̂, where pX and pY are the sample
proportions and m and n are the sample sizes. The approximate sampling
distribution of the test statistic
under H0 is the N(0,1) distribution.
5: Calculate the value of Z for the given data as
z0 = (pX − pY) / √(p̂ q̂ (1/m + 1/n))
Example:
A study was conducted to investigate the interest of students
in private schools. Among 1000 randomly selected students
from City-1, 800 were found to be from private schools.
From City-2, 1600 students were selected randomly and
among them 1200 students were from private schools. Do the
data indicate that the two cities are significantly different
with respect to the prevalence of private schooling among the
students? Choose the level of significance as α = 0.05.
Solution:
Step1 :Let PX and PY be respectively the proportions of private school
students in City-1 and City-2. Then, the null and alternative
hypotheses are
Null hypothesis: H0: PX = PY
i.e., there is no significant difference between the
proportions of private school students in City-1 and City-2.
Alternative hypothesis: H1: PX≠ PY
i.e., difference between the proportions of private school
students in City-1 and City-2 is significant. It is a two-sided
alternative hypothesis.
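Step 2 : Data
The given sample information are
Size of Sample-1 (m) = 1000, sample proportion pX = 800/1000 = 0.80
Size of Sample-2 (n) = 1600, sample proportion pY = 1200/1600 = 0.75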
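The calculation for this example can be sketched with the pooled-proportion statistic described in the steps, z0 = (pX − pY)/√(p̂q̂(1/m + 1/n)). The critical value 1.96 for a two-sided test at α = 0.05 is supplied here as an assumption, since the later steps of the solution are not reproduced.

```python
from math import sqrt

m, x1 = 1000, 800      # City-1: sample size and private school students
n, x2 = 1600, 1200     # City-2: sample size and private school students

pX, pY = x1 / m, x2 / n                 # sample proportions
p_pooled = (x1 + x2) / (m + n)          # same as (m*pX + n*pY) / (m + n)
q_pooled = 1 - p_pooled

z0 = (pX - pY) / sqrt(p_pooled * q_pooled * (1 / m + 1 / n))

ze = 1.96                               # assumed two-sided critical value, alpha = 0.05
reject = abs(z0) >= ze

print(pX, pY, round(p_pooled, 4), round(z0, 3), reject)
```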
6.6 SUMMARY
6.8 ANSWER TO CHECK YOUR PROGRESS
students from two regions. Among 300 students selected from
Region A, 34 students expressed their interest. Among 200
students selected from Region B, 28 students expressed their
interest. Does this information provide sufficient evidence to
conclude at 5% level of significance that students in Region A are
more interested in Residential Schooling than the students in
Region B?
UNIT 7 - CHI – SQUARE TEST
Structure
7.0 Introduction
7.1 Objectives
7.2 Characteristics of Chi –Square Test
7.3 Uses of Chi –Square Test
7.4 Steps of Chi –Square Test
7.5 Summary
7.6 Questions
7.0 INTRODUCTION
A chi-squared test, also written as χ2 test, is any statistical
hypothesis test where the sampling distribution of the test statistic is a
chi-squared distribution when the null hypothesis is true. Without other
qualification, 'chi-squared test' often is used as short for Pearson's chi-
squared test.
The chi-squared test is used to determine whether there is a
significant difference between the expected frequencies and the observed
frequencies in one or more categories.
The chi-square test is applied in statistics to test the goodness of fit, that
is, to verify the agreement of the observed data with an assumed theoretical
distribution. It is therefore a measure of the divergence between actual
and expected frequencies. It has great use in statistics, especially in
sampling studies, where we doubt the agreement between
actual and expected frequencies and need to judge the extent to which the
difference can be ignored as arising from fluctuations of sampling.
7.1 OBJECTIVES
The student will be able to
Understand the purpose for using chi-square test
Understand the procedures for Analysis of variance
Understand the characteristics of the chi-square test
Solve problems to test the hypothesis whether the population
has a particular variance using chi-square test
7.2 CHARACTERISTICS OF χ2 TEST
1. Test is based on events or frequencies, whereas in theoretical
distribution, the test is based on mean and standard deviation.
2. The test is applied to draw inferences, especially for testing
hypotheses; it is not useful for estimation.
3. The test can be used between the entire set of observed and
expected frequencies.
4. For every increase in the number of degrees of freedom, a new χ²
distribution is formed.
5. It is a general purpose test and as such is highly useful in research.
Solution
On tabulation of the information in a 2x2 contingency table, we
get:
Observed Frequencies
                       Hindu    Non-Hindu    Total
Consuming Tea           1236       164        1400
Non-Consuming Tea        564        36         600
Total                   1800       200        2000
Expected Frequencies
                       Hindu    Non-Hindu    Total
Consuming Tea           1260       140        1400
Non-Consuming Tea        540        60         600
Total                   1800       200        2000
Calculation of χ²
    O        E      O – E    (O – E)²    (O – E)²/E
  1236     1260      -24       576         0.457
   564      540       24       576         1.067
   164      140       24       576         4.114
    36       60      -24       576         9.600
                          ∑(O – E)²/E = 15.238
For a 2x2 contingency table, the degrees of freedom are
(r – 1)(c – 1) = (2 – 1)(2 – 1) = 1.
Table value of χ² at the 0.05 level for 1 d.f = 3.841.
The calculated value of χ², 15.238, is higher than the table value, i.e., 3.841;
therefore the null hypothesis is rejected.
Hence, the two communities differ significantly as far as
consumption of tea is concerned.
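The calculation above can be checked with a short program. The following is only a minimal sketch in Python: the function name `chi_square_independence` and the variable names are illustrative assumptions, not part of the text. It derives the expected frequencies from the row and column totals and then sums (O – E)²/E over all cells.

```python
# Pearson chi-square test of independence for a contingency table.
# Observed frequencies are taken from the tea-consumption example.
observed = [
    [1236, 164],  # consuming tea: Hindu, Non-Hindu
    [564, 36],    # not consuming tea: Hindu, Non-Hindu
]


def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom, expected table)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Expected frequency of a cell = row total x column total / grand total
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof, expected


chi2, dof, expected = chi_square_independence(observed)
TABLE_VALUE = 3.841  # 5% table value for 1 d.f., as quoted in the text

print(f"chi-square = {chi2:.3f}, d.f. = {dof}")
print("Reject H0" if chi2 > TABLE_VALUE else "Do not reject H0")
```

The same function works for larger r x c tables, but the table value must then be looked up for the corresponding degrees of freedom.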
7.5. SUMMARY
The uses of the χ² distribution are testing the specified variance of a
normal population, testing goodness of fit and testing independence of
attributes.
Through the test we can find out the deviations between the observed
values and expected values. Here we are not concerned with the
parameters but with the form of the distribution.
UNIT – VIII F – TEST
Structure
8.1 Introduction
8.5 Summary
8.1 Introduction
Solution:
Null Hypothesis H0: There is no significant difference in the durability of
3 makes of computers.
    Computer I           Computer II          Computer III
   X1       X1²         X2       X2²         X3       X3²
    4        16          7        49          6        36
    6        36          9        81          4        16
    8        64         11       121          6        36
    9        81         12       144          3         9
    7        49          5        25          2         4
∑X1=34  ∑X1²=246    ∑X2=44  ∑X2²=420    ∑X3=21  ∑X3²=101
Step – 1
Sum of all items (T) = ∑X1 + ∑X2 + ∑X3
= 34 + 44 + 21
= 99
Step – 2
Correction factor (C.F) = T²/N = (99)²/15 = 653.4
Step – 3
TSS = Sum of squares of all the items – C.F
= ∑X1² + ∑X2² + ∑X3² – T²/N
= 246 + 420 + 101 – 653.4 = 113.6
Step – 4
SSC = Sum of squares between samples – C.F
= (∑X1)²/n + (∑X2)²/n + (∑X3)²/n – C.F
= (34)²/5 + (44)²/5 + (21)²/5 – 653.4
= 231.2 + 387.2 + 88.2 – 653.4 = 53.2
Step – 5
MSC = Sum of squares between samples / d.f
= 53.2/2
= 26.6
Step – 6
SSE = Total sum of squares – Sum of squares between samples
= 113.6 – 53.2
= 60.4
Step – 7
MSE = Sum of squares within samples / d.f
= 60.4/12
= 5.03
ANOVA TABLE
Source of variation   Sum of squares   Degrees of freedom   Mean squares             F-ratio
Between samples       SSC = 53.2       3 – 1 = 2            MSC = SSC/d.f = 26.6     Fc = MSC/MSE = 5.28
Within samples        SSE = 60.4       15 – 3 = 12          MSE = SSE/d.f = 5.03
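The seven steps of the one-way classification can be sketched in Python as follows. This is an illustration only: the function name `one_way_anova` and the dictionary keys are assumptions made for this sketch, and the data are the three computer samples from the example. It follows the correction-factor method used in the steps.

```python
# One-way ANOVA by the correction-factor method.
computers = [
    [4, 6, 8, 9, 7],    # Computer I
    [7, 9, 11, 12, 5],  # Computer II
    [6, 4, 6, 3, 2],    # Computer III
]


def one_way_anova(groups):
    """Return the sums of squares, mean squares and F-ratio."""
    values = [x for group in groups for x in group]
    n_total = len(values)
    correction = sum(values) ** 2 / n_total             # C.F = T^2 / N
    tss = sum(x * x for x in values) - correction       # total sum of squares
    ssc = sum(sum(g) ** 2 / len(g) for g in groups) - correction  # between samples
    sse = tss - ssc                                     # within samples
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    msc = ssc / df_between
    mse = sse / df_within
    return {"TSS": tss, "SSC": ssc, "SSE": sse,
            "MSC": msc, "MSE": mse, "F": msc / mse}


result = one_way_anova(computers)
for name, value in result.items():
    print(f"{name} = {value:.3f}")
```

The computed F-ratio is then compared with the table value of F for (2, 12) degrees of freedom at the chosen level of significance.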
Varieties Yields
1 2 3 4
A 6 5 8 9
B 8 4 6 9
C 7 6 10 6
Solution
Null hypothesis H0: There is no significant difference between varieties
(rows) and between yields (blocks).
Varieties            Yields
             1     2     3     4     Total
A            6     4     6     6      24
B            7     5     8     9      28
C            8     6    10     9      32
Total       21    15    24    24      84
Step – 1
Grand total (T) = 84
Step – 2
Correction factor (C.F) = T²/N = (84)²/12 = 588
Step – 3
SSC = Sum of squares between blocks (columns)
= (21)²/3 + (15)²/3 + (24)²/3 + (24)²/3 – C.F
= 606 – 588
= 18
Step – 4
SSR = Sum of squares between varieties (rows)
= (24)²/4 + (28)²/4 + (32)²/4 – C.F
= 596 – 588
= 8
Step – 5
TSS = Sum of squares of all the items – C.F
= [(6)²+(7)²+(8)²+(4)²+(6)²+(5)²+(8)²+(6)²+(10)²+(6)²+(9)²+(9)²] – 588
= 624 – 588
= 36
Step – 6
SSE = Residual sum of squares
= TSS – (SSC + SSR)
= 36 – (18 + 8) = 10
Step – 7
d.f = v3 = (c – 1)(r – 1)
= (3)(2)
= 6
ANOVA TABLE
Source of variation         Sum of squares   Degrees of freedom    Mean squares            F-ratio
Between Blocks (Columns)    SSC = 18         c – 1 = 4 – 1 = 3     MSC = SSC/d.f = 6       Fc = MSC/MSE = 3.6
Between Varieties (Rows)    SSR = 8          r – 1 = 3 – 1 = 2     MSR = SSR/d.f = 4       FR = MSR/MSE = 2.4
Residual                    SSE = 10         (r – 1)(c – 1) = 6    MSE = SSE/d.f = 1.667
8.5 SUMMARY
The uses of the χ² distribution are testing the specified variance of a
normal population, testing goodness of fit and testing independence of
attributes.
Analysis of variance (ANOVA) is a collection of statistical
models and their associated estimation procedures used to analyze
the differences among group means in a sample.
One-way analysis of variance (abbreviated one-way ANOVA) is a
technique that can be used to compare the means of two or more samples.
The two-way ANOVA compares the mean differences between
groups that have been split on two independent variables.
8.6 KEY WORDS
Chi-square, Analysis of Variance, One way method, Two way method
between price and demand, yield of crop and price.
The following examples illustrate the concept of positive correlation
and negative correlation.
Positive correlation
X 5 7 9 11 16 20 28
y 20 26 35 37 48 50 55
Negative Correlation
X 14 17 23 35 46
y 16 12 10 9 5
X 6 12 18 24
Y 5 10 15 20
Men 2 10 8 20
Women 16 6 8 30
Total 18 16 16 50
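(a) Arithmetic mean Method
$$r = \frac{\sum xy}{\sqrt{\sum x^2 \, \sum y^2}}$$
where x = X − X̄ and y = Y − Ȳ are the deviations of X and Y from their respective means.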
Example:
Find Pearson’s Co-efficient of correlation from the following data
Sales 15 18 22 28 32 46 52
Solution
Let the sales be denoted by x and the profit by y.
Computation of coefficients of correlation
X     x = X − X̄     x²     Y     y = Y − Ȳ     y²     xy
X̄ = ∑X/N = 213/7 = 30.43
Ȳ = ∑Y/N = 645/7 = 92.14
∑x² = 1179.68, ∑y² = 6002.86, ∑xy = 2647.57
$$r = \frac{\sum xy}{\sqrt{\sum x^2 \, \sum y^2}} = \frac{2647.57}{\sqrt{1179.68 \times 6002.86}} = \frac{2647.57}{34.35 \times 77.48} = \frac{2647.57}{2661.44} = 0.99$$
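The arithmetic mean method can be sketched in Python as below. The function names are assumptions made for illustration. The first function works from the three deviation sums quoted in the solution; the second works from raw paired data and is applied here to the positive-correlation illustration given earlier in this unit (X = 5, 7, ..., 28), since the profit figures of the sales example are not reproduced in the text.

```python
import math


def pearson_from_sums(sum_xy, sum_x2, sum_y2):
    """r from sums of products and squares of deviations from the means."""
    return sum_xy / math.sqrt(sum_x2 * sum_y2)


def pearson_r(xs, ys):
    """r from raw paired data, taking deviations from the actual means."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sum_xy = sum(a * b for a, b in zip(dx, dy))
    sum_x2 = sum(a * a for a in dx)
    sum_y2 = sum(b * b for b in dy)
    return pearson_from_sums(sum_xy, sum_x2, sum_y2)


# Sums taken from the sales and profit solution
r_sales_profit = pearson_from_sums(2647.57, 1179.68, 6002.86)
print(f"r (sales, profit) = {r_sales_profit:.2f}")

# Positive-correlation illustration from this unit
x = [5, 7, 9, 11, 16, 20, 28]
y = [20, 26, 35, 37, 48, 50, 55]
r_illustration = pearson_r(x, y)
print(f"r (illustration) = {r_illustration:.2f}")
```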
Candidate      1    2    3    4    5    6    7    8    9   10
Professor A    8   12    6    4    9   15    8    7   16   13
Professor B    9   16   10    8   14   19   12   11   20   17
Solution
Rx     Ry     d = Rx – Ry     d²
 8      9        -1            1
12     16        -4           16
 6     10        -4           16
 4      8        -4           16
 9      5         4           16
15     10         5           25
 8      7         1            1
 7     11        -4           16
16     15         1            1
13     18        -5           25
                        ∑d² = 133
r = 1 – 6∑d²/(n(n² – 1)) = 1 – (6 × 133)/(10 × 99)
= 1 – 0.8060
r = 0.194
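The final step of the rank correlation calculation can be checked as follows. This is a minimal sketch; the function name `spearman_from_d2` is an assumption. It evaluates only the formula 1 – 6∑d²/(n(n² – 1)) from a given ∑d² and n, and does not itself rank the marks.

```python
def spearman_from_d2(sum_d2, n):
    """Spearman's rank correlation from the sum of squared rank differences."""
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


rho = spearman_from_d2(sum_d2=133, n=10)
print(f"r = {rho:.3f}")
```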
2. Coefficients of Correlation are independent of Change of Origin:
This property reveals that if we subtract any constant from all the
values of X and Y, it will not affect the coefficient of correlation.
3. Coefficients of Correlation possess the property of symmetry:
The degree of relationship between two variables is symmetric.
4. Coefficient of Correlation is independent of Change of Scale:
This property reveals that if we divide or multiply all the values
of X and Y by positive constants, it will not affect the coefficient of correlation.
5. The value of the co efficient of correlation shall always lie
between +1 and -1.
6. When r = + 1, then there is perfect positive correlation between
the variables.
7. When r = - 1, then there is perfect negative correlation between
the variables.
8. When r = 0, then there is no linear relationship between the variables.
The third formula given above, that is
$$r = \frac{\sum xy}{\sqrt{\sum x^2 \, \sum y^2}}$$
1. The term correlation refers to the degree of relationship between
two or more variables.
2. Linear correlation is a measure of the degree to which two
variables vary together, or a measure of the intensity of the
association between two variables.
3. Positive and negative correlation; simple, partial and multiple
correlation; linear and non-linear correlation.
4. A scatter diagram is a graphic device for finding correlation
between two variables.
5. $$r_s = 1 - \frac{6\sum D^2}{n(n^2 - 1)}$$
6. Statistical Method by S.P. Gupta. Sultan Chand and Sons., Delhi.
UNIT X - SPEARMAN’S RANK CORRELATION
Structure
10.0 Introduction
10.1 Objectives
10.2 Regression
10.3 Linear Regression
10.4 Types of Regression
10.4.1 Regression Equation of Y on X
10.4.2 Regression Equation of X on Y
10.5 Curve fitting by the Method of Least square
10.6 Derivations of Regression Equation
10.7 Properties of Correlation Coefficient
10.8 Summary
10.9 Key Words
10.10 Answer to Check Your Progress
10.11 Questions and Exercise
10.12 Further Readings
10.0 INTRODUCTION
Regression means stepping back or going back. The term was first used by
Francis Galton in 1877. He studied the relationship between the
heights of fathers and their sons. The study revealed that
Tall fathers have tall sons and short fathers have short sons.
The mean height of the sons of tall fathers is less than the mean height
of their fathers.
The mean height of the sons of short fathers is more than the mean
height of their fathers.
Galton called this tendency of going back the 'Line of
Regression'. This line describing the average relationship
between two variables is known as the line of regression.
In statistical modelling, regression analysis is a set of statistical
processes for estimating the relationships among variables. It
includes many techniques for modelling and analyzing several variables,
when the focus is on the relationship between a dependent variable and
one or more independent variables (or 'predictors'). More specifically,
regression analysis helps one understand how the typical value of
the dependent variable (or 'criterion variable') changes when any
one of the independent variables is varied, while the other independent
variables are held fixed.
Regression analysis is widely used for prediction and forecasting,
where its use has substantial overlap with the field of machine
learning.
10.1 OBJECTIVES
After studying this chapter students will be able to understand
Concept of Regression and Regression coefficients
Types of regression equations
Regression lines both x on y and y on x
10.2 REGRESSION
Regression analysis refers to assessing the relationship between the
outcome variable and one or more other variables. The outcome variable is
known as the dependent or response variable, and the risk factors and
confounders are known as predictors or independent variables. The
dependent variable is denoted by “y” and the independent variables by
“x” in regression analysis.
10.3 LINEAR REGRESSION
Linear regression attempts to model the relationship between two
variables by fitting a linear equation to observed data. One variable is
considered to be an explanatory variable, and the other is considered to
be a dependent variable. For example, a modeler might want to relate the
weights of individuals to their heights using a linear regression model.
10.4 TYPES OF REGRESSION EQUATIONS
The Regression Equation is the algebraic expression of the regression
lines. It is used to predict the values of the dependent variable from the
given values of independent variables. As there are two regression lines,
there are two regression equations. For the two variables X and Y, there
are two regression equations. They are.
o Regression equation of X on Y.
o Regression equation of Y on X.
10.4.1 Regression Equation of X on Y
The straight line equation is X = a + bY
Here a and b are unknown constants, which determine the
position of the line. The constant a is the intercept (the value of X when
Y = 0); the constant b is the slope. The following two normal equations are
derived:
∑X = na + b∑Y
∑XY = a∑Y + b∑Y²
The regression equation of X on Y is used to find out the values of X for
a given value of Y.
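The two normal equations for X on Y, ∑X = na + b∑Y and ∑XY = a∑Y + b∑Y², can be solved directly for a and b. The following Python sketch shows one way of doing so; the function name `fit_x_on_y` is an assumption, and the paired values used are those of the example table that follows in the text.

```python
def fit_x_on_y(xs, ys):
    """Solve the normal equations of X = a + bY for a and b."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_y2 = sum(y * y for y in ys)
    # Eliminating a from the two normal equations gives b
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_y2 - sum_y ** 2)
    # Substituting b back into  sum_x = n*a + b*sum_y  gives a
    a = (sum_x - b * sum_y) / n
    return a, b


x = [15, 20, 25, 30, 35, 40, 45]
y = [8, 14, 20, 26, 32, 38, 44]
a, b = fit_x_on_y(x, y)
print(f"X = {a:.4f} + {b:.4f} Y")

# Estimated value of X for a given Y (Y = 50 is a hypothetical input)
print(f"X when Y = 50: {a + b * 50:.2f}")
```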
X 15 20 25 30 35 40 45
y 8 14 20 26 32 38 44
Solution
X     Y     X²     Y²     XY
15    8     225    64     120
3. Fitting of a Straight Line:
A straight line can be fitted to the given data by the method of
least squares. The equation of a straight line or least square line
is Y = a + bX, where a and b are constants or unknowns.
To compute the values of these constants we need as many equations as
the number of constants in the equation. These equations are called
normal equations. In a straight line there are two constants a and b so we
require two normal equations.
Normal Equation for 'a': ∑Y = na + b∑X
X 1 2 3 4 5
y 2 5 3 8 7
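Solution
X     Y     XY     X²     1.1+1.3X     Y – (1.1+1.3X)
1     2      2      1       2.4           -0.4
2     5     10      4       3.7            1.3
3     3      9      9       5.0           -2.0
4     8     32     16       6.3            1.7
5     7     35     25       7.6           -0.6
Normal equation for 'a': ∑Y = na + b∑X,      25 = 5a + 15b ----(1)
Normal equation for 'b': ∑XY = a∑X + b∑X²,   88 = 15a + 55b ----(2)
Solving (1) and (2) gives a = 1.1 and b = 1.3, so the fitted line is Y = 1.1 + 1.3X.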
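The fitting of the straight line Y = a + bX by least squares can be sketched in Python as below. The function name `fit_y_on_x` is an assumption of this sketch; the data are the five pairs of the example. The program forms the sums needed for the two normal equations, solves them, and lists the fitted values and the deviations of Y from the line.

```python
def fit_y_on_x(xs, ys):
    """Solve the normal equations of Y = a + bX for a and b."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b


x = [1, 2, 3, 4, 5]
y = [2, 5, 3, 8, 7]
a, b = fit_y_on_x(x, y)
print(f"Y = {a:.1f} + {b:.1f} X")

fitted = [a + b * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]
for xi, yi, fi, ri in zip(x, y, fitted, residuals):
    print(f"{xi}  {yi}  {fi:.1f}  {ri:+.1f}")
```

For a least squares line the deviations of Y from the fitted values add up to zero, which serves as a quick check on the arithmetic.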
2. When deviations are Taken from Assumed Mean
Instead of using the actual means of the X and Y observations, we may use any
arbitrary item (in the observations) as the assumed mean.
We then take deviations of the X and Y values from their respective
assumed means.
The formula for calculating regression coefficient when regression is Y
on X is as follows:
The constant 'b' in the regression equation (Ye = a + bX) is called the
Regression Coefficient. It determines the slope of the line, i.e. the
change in the value of Y corresponding to a unit change in X, and
therefore it is also called the “Slope Coefficient.”
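1. The correlation coefficient is the geometric mean of the two
regression coefficients. Symbolically, it can be expressed as:
$$r = \pm\sqrt{b_{yx} \cdot b_{xy}}$$
where b_yx and b_xy are the regression coefficients of Y on X and of X on Y respectively; r takes the sign common to both coefficients.
2. When regression is Y on X
When regression is X on Y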
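The property that the correlation coefficient is the geometric mean of the two regression coefficients can be verified numerically. The sketch below is illustrative only: the function name and the symbols `b_yx` and `b_xy` are assumptions, and the data are the five pairs used in the straight-line fitting example (X = 1 to 5, Y = 2, 5, 3, 8, 7).

```python
import math


def regression_coefficients(xs, ys):
    """Return (b_yx, b_xy, r), each computed from the raw sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    var_x_term = n * sum_x2 - sum_x ** 2
    var_y_term = n * sum_y2 - sum_y ** 2
    b_yx = numerator / var_x_term                       # regression of Y on X
    b_xy = numerator / var_y_term                       # regression of X on Y
    r = numerator / math.sqrt(var_x_term * var_y_term)  # direct formula
    return b_yx, b_xy, r


x = [1, 2, 3, 4, 5]
y = [2, 5, 3, 8, 7]
b_yx, b_xy, r_direct = regression_coefficients(x, y)

# Geometric mean of the two coefficients, with their common sign
r_geometric = math.copysign(math.sqrt(b_yx * b_xy), b_yx)

print(f"b_yx = {b_yx:.2f}, b_xy = {b_xy:.2f}")
print(f"r (direct) = {r_direct:.4f}, r (geometric mean) = {r_geometric:.4f}")
```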
obtained. ΣX=15, ΣY=25, ΣX²=55, ΣY²=135, ΣXY=83. Find the
equation of the lines of regression and estimate the values of X and Y
if Y=8 ; X=12.
4. Using the following information you are requested to (i) obtain
the linear regression of Y on X (ii) estimate the level of defective
parts delivered when inspection expenditure amounts to Rs.28,000.
ΣX=424, ΣY=363, ΣX²=21926, ΣY²=15123, ΣXY=12815, N=10. Here X is the
expenditure on inspection and Y is the defective parts delivered.
10.11 FURTHER READINGS
1. Statistics (Theory & Practice) by Dr. B.N. Gupta. Sahitya Bhawan
Publishers and Distributors (P) Ltd., Agra.
2. Statistics for Management by G.C. Beri. Tata McGraw Hill
Publishing Company Ltd., New Delhi.
3. Business Statistics by Amir D. Aczel and J. Sounderpandian. Tata
McGraw Hill Publishing Company Ltd., New Delhi.
4. Statistics for Business and Economics by R.P. Hooda. MacMillan
India Ltd., New Delhi.
5. Business Statistics by S.P. Gupta and M.P. Gupta. Sultan Chand
and Sons., New Delhi.
6. Statistical Method by S.P. Gupta. Sultan Chand and Sons.,
New Delhi.
UNIT – XI BUSINESS FORECASTING
Structure
11.1 Introduction
11.2 The Objectives of Forecasting
11.3 Prediction, projection and forecasting
11.4 Characteristics of forecasting are as follows
11.5 Steps in Forecasting
11.6 Methods of Business Forecasting
11.1 Introduction
Business forecasting is a method to predict the future, where the future is
narrowly defined by economic conditions. It combines information
gathered from past circumstances with an accurate picture of the present
economy to predict future conditions for a business.
Teunter found that if the goal is simply to minimize forecast error, then
forecasting zero in every period was the best method to use! (The zero
forecast had lower error than a moving average, exponential smoothing,
bootstrapping, and three variations of Croston’s method that were tested.)
However, for proper inventory management to serve customer needs,
forecasting zero demand every period is probably not the right thing to do.
A similar point was made last fall in a Foresight article by Stephan Kolassa
and Roland Martin (discussed in "Tumbling Dice"). Using a simple dice
tossing experiment, they showed the implications for bias in commonly
used percentage error metrics. What makes this important to management
is that if the sole incentive for forecasters is to minimize MAPE, the
forecaster could do best by purposely forecasting too low. This, of course,
could have bad consequences for inventory management and customer
service.
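The bias in percentage error metrics can be illustrated with a small calculation in the spirit of the dice-tossing experiment mentioned above. The sketch below is an assumption-laden illustration, not a reproduction of the published experiment: it supposes that demand is the outcome of one fair six-sided die, that the forecaster must quote the same whole number every period, and it compares the expected absolute percentage error of each candidate forecast with the mean demand.

```python
from fractions import Fraction

OUTCOMES = range(1, 7)  # faces of a fair die, each with probability 1/6


def expected_ape(forecast):
    """Expected absolute percentage error of a constant forecast."""
    return sum(Fraction(abs(d - forecast), d) for d in OUTCOMES) / 6


def expected_abs_error(forecast):
    """Expected absolute error of a constant forecast."""
    return Fraction(sum(abs(d - forecast) for d in OUTCOMES), 6)


mean_demand = Fraction(sum(OUTCOMES), 6)
best_by_mape = min(OUTCOMES, key=expected_ape)

print(f"Mean demand = {float(mean_demand)}")
for f in OUTCOMES:
    print(f"forecast {f}: MAPE = {float(expected_ape(f)):.3f}, "
          f"MAE = {float(expected_abs_error(f)):.3f}")
print(f"Forecast minimising MAPE = {best_by_mape}")
```

Under these assumptions the forecast that minimises the expected percentage error lies below the mean demand, which is the kind of downward bias the passage warns about.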
11.3 Prediction, projection and forecasting
Forecast is scientific and free from intuition and personal bias, whereas
prediction is subjective and fatalistic in nature.
Forecasting is an extrapolation of past into the future while prediction is
judgmental and takes into account changes taking place in the future.
Therefore, prediction is utilized more in business and economics while
forecasting takes place in weather and earthquakes.
Predicting is saying or telling something before the event while forecasting
is done on the basis of analysis of the past.
Forecasting is still not a complete science as there are chances of error.
Concept of Forecasting
Forecasting is a process of making predictions about the future course of a
business or a company based on trend analysis and past and present data.
So essentially data is collected and studied about the business, and analysis
is done to forecast future scenarios that are likely to occur. Hence
forecasting is an important tool in the process of business planning.
Analysis of Deviations
No forecast will be completely accurate. The differences or deviations
from the forecasts should be analyzed and studied. This will help in
making more accurate forecasts in the future.
This method refers to the projection of trends on the basis of past events.
The historical sequence of events is analysed as a basis for understanding
the present situation and forecasting the future trends. The past recurring
trends are associated with the corresponding cause and effect phenomenon
in the future.
(c) Regression
(d) Econometric Model.
(a) Business Index or Barometer:
The term ‘business index’ refers to a series relating to business conditions.
It is also known as ‘barometer’, ‘indicator’ or ‘economic forecaster.’ Such
a business index number may relate to general conditions of business or to
a particular trade or industry or to an individual business.
The index number may measure changes in business activity during the
changes of cyclical variations, i.e. boom, decline, depression and recovery.
It is called business barometer because it helps in making forecasts for
future business conditions.
The indices of production, wages, trade, finance, stocks and shares, etc. are
plotted on a graph paper to obtain the curve showing trend of long-period
and seasonal movements. The various index numbers relating to different
activities of business may be combined into a general or composite index
of business activity.
The following are some of the important series which are considered by
businessmen for forecasting:
(v) Employment
The different figures may be converted into relatives on a certain base. The
weighted average of these relatives may be computed to ascertain the
business index called the barometer.
The reports on general business and trade conditions are published by the
Chamber of Commerce, industry and some trade associations. Important
journals and newspapers also publish index numbers relating to various
industries and trades. The Reserve Bank of India also publishes various
index numbers and indicators of general economic conditions.
(i) There should not be sudden jumps in figures from one period to another;
and
(c) Regression:
UNIT – XII TIME SERIES ANALYSIS
Structure
12.1 Introduction
12.2 Regression analysis
12.3 Exponential Smoothing Method
12.4 Theories of Business Forecasting:
12.5 Theory of Economic Rhythm
12.6 Action and Reaction Approach
12.7 Sequence Method or Time Lag Method
12.8 Specific Historical Analogy
12.9 Cross-Cut Analysis
12.10 Model Building Approach
12.11 Utility of Business Forecasting
12.12 Limitations of Business Forecasting
12.13 Business Forecasting: Advantage
12.1 Introduction
A series of observations, on a variable, recorded after successive intervals
of time is called a time series. The successive intervals are usually equal
time intervals, e.g., it can be 10 years, a year, a quarter, a month, a week, a
day, and an hour, etc. The data on the population of India is a time series
data where time interval between two successive figures is 10 years.
Similarly figures of national income, agricultural and industrial production,
etc., are available on yearly basis.
12.2 Regression analysis
The main objective of regression analysis is to know the nature of
relationship between two variables and to use it for predicting the most
likely value of the dependent variable corresponding to a given, known
value of the independent variable. This can be done by substituting in
Eq.(5.1a) any known value of X corresponding to which the most likely
estimate of Y is to be found.
Yc = 8.61 + 0.71(15)
= 8.61 + 10.65
= 19.26
It may be appreciated that an estimate of Y derived from a regression
equation will not be exactly the same as the Y value which may actually be
observed. The difference between estimated Yc values and the
corresponding observed Y values will depend on the extent of scatter of
various points around the line of best fit.
The closer the various paired sample points (Y, X) are clustered around the line of
best fit, the smaller the difference between the estimated Yc and observed Y
values, and vice-versa. On the whole, the lesser the scatter of the various
points around, and the lesser the vertical distance by which these deviate
from the line of best fit, the more likely it is that an estimated Yc value is
close to the corresponding observed Y value.
Dt-1     Forecast     Error     α(Dt-1 – Ft-1)     Ft-1 + α(Dt-1 – Ft-1)
The estimated Yc values will coincide with the observed Y values only when all
the points on the scatter diagram fall in a straight line. If this were to be so,
the sales for a given marketing expenditure could have been estimated with
100 percent accuracy. But such a situation is too rare to obtain. Since some
of the points must lie above and some below the straight line, perfect
prediction is practically non-existent in the case of most business and
economic situations.
This means that the estimated values of one variable based on the known
values of the other variable are always bound to differ. The smaller the
difference, the greater the precision of the estimate, and vice-versa.
Accordingly, the preciseness of an estimate can be obtained only through a
measure of the magnitude of error in the estimates, called the error of
estimate.
In this regard, business forecasting refers to the analysis of the past and
present economic conditions with the object of drawing inferences about
the future business conditions. In the words of Allen, “Forecasting is a
systematic attempt to probe the future by inference from known facts. The
purpose is to provide management with information on which it can base
planning decisions.”
The reason for the same lies in the fact that despite all precautions, an
element of error is bound to creep in the forecasts and we cannot eliminate
guesswork in forecasts. It is also felt that forecasting is influenced by the
pessimistic or optimistic attitude of the forecaster.
The success of a new business will depend upon the accuracy of such
forecasts. If the forecasts are made systematically, then the operations of
the business will go smoothly and the chances of failure will be minimised.
UNIT XIII - ANALYSIS OF TIME SERIES
Structure
13.0 Introduction
13.1 Objectives
13.2 Time series Analysis
13.2.1 Components of time series
13.2.2 Analysis of Time Series
13.3 Measurement of trends
13.3.1 Moving average method
13.3.2 Least square method
13.4 Measurement of seasonal variation
13.4.1 Methods of constructing seasonal indices
13.5 Forecasting
13.6 Deseasonalisation
13.7 Summary
13.8 Key Words
13.9 Answers to Check Your Progress
13.10 Questions and Exercise
13.11 Further Readings
13.0 INTRODUCTION
When quantitative data are arranged in the order of their
occurrence, the resulting statistical series is called a time series. The
quantitative values are usually recorded over equal time intervals: daily,
weekly, monthly, quarterly, half yearly, yearly, or any other time
measure. Monthly statistics of industrial production in India, annual
birth-rate figures for the entire world, yield on ordinary shares, weekly
wholesale price of rice, and daily records of tea sales or census data are
some examples of time series. Each has the common characteristic of
recording magnitudes that vary with the passage of time. In this unit we will
study time series analysis.
13.1 OBJECTIVES
After going through this unit, you will
Learn about time series analysis
Know about the measurement of trends
Understand forecasting and deseasonalisation
13.2 TIME SERIES ANALYSIS
Time series are influenced by a variety of forces. Some are continuously
effective, others make themselves felt at recurring time intervals, and still
others are non-recurring or random in nature. Therefore, the first task is
to break down the data and study each of these influences in isolation.
This is known as decomposition of the time series. It enables us to
understand fully the nature of the forces at work. We can then analyse
their combined interactions. Such a study is known as time-series
analysis.
13.2.1 COMPONENTS OF TIME SERIES:
The factors that are responsible for bringing about changes in a
time series, also called the components of time series, are as follows:
Secular Trends (or General Trends)
Seasonal Movements
Cyclical Movements
Irregular Fluctuations
Secular Trends:
Secular trend is the main component of a time series which results
from long term effects of socio-economic and political factors. It shows
the growth or decline in a time series over a long period. It is the type of
tendency which continues to persist for a very long period. Prices and
export and import data, for example, reflect obviously increasing
tendencies over time.
Seasonal Trends:
Seasonal trends are short term movements occurring in data due
to seasonal factors. The short term is generally considered as a period in
which changes occur in a time series with variations in weather or
festivities. For example, it is commonly observed that the consumption of
ice-cream during summer is generally high and hence an ice-cream
dealer's sales would be higher in some months of the year while
relatively lower during winter months. Employment, output, exports, etc.,
are subject to change due to variations in weather. Similarly, the sale of
garments, umbrellas, greeting cards and fire-works are subject to large
variations during festivals like Valentine’s Day, Eid, Christmas, New
Year's, etc. These types of variations in a time series are isolated only
when the series is provided biannually, quarterly or monthly.
Cyclic Movements
These are long term oscillations occurring in a time series. These
oscillations are mostly observed in economic data and the periods of
such oscillations generally extend from five to twelve years or
more. These oscillations are associated with the well known business
cycles. These cyclic movements can be studied provided a long series of
measurements, free from irregular fluctuations, is available.
Irregular Fluctuations
These are sudden changes occurring in a time series
which are unlikely to be repeated. They are components of a time series
which cannot be explained by trends, seasonal or cyclic movements.
These variations are sometimes called residual or random components.
These variations, though accidental in nature, can cause a continual
change in the trends, seasonal and cyclical oscillations during the
forthcoming period. Floods, fires, earthquakes, revolutions, epidemic,
strikes etc., are the root causes of such irregularities.
13.2.2 ANALYSIS OF TIME SERIES
The objective of the time series analysis is to identify the
magnitude and direction of trends, to estimate the effect of seasonal and
cyclical variations and to estimate the size of the residual component.
This implies the decomposition of a time series into its several
components. Two lines of approach are usually adopted in analyzing a
given time series:
The additive model
The multiplicative model
It is not always necessary for the time series to include all four
types of variations; rather, one or more of these components might be
missing altogether. For example, when using annual data the seasonal
component may be ignored, while in a time series of a short span having
monthly or quarterly observations, the cyclical component may be
ignored
13.3 MEASUREMENT OF TRENDS
Moving average method
Least square method
13.3.1 MOVING AVERAGE METHOD
Moving average method is a simple device for reducing
fluctuations and obtaining trend values with a fair degree of accuracy. In
this method the average value of a number of years (months, weeks, or
days) is taken as the trend value for the middle point of the period of
moving average. The process of averaging smooths the curve and
reduces the fluctuations.
The first thing to be decided in this method is the period of the
moving average, that is, the number of
consecutive items whose average would be calculated each time.
Suppose it has been decided that the period of the moving average would
be 5 years (months, weeks, or days); then the arithmetic average of the
first 5 items (numbers 1, 2, 3, 4 and 5) would be placed against item No. 3
and then the arithmetic average of item Nos. 2, 3, 4, 5 and 6 would be
placed against item No. 4. This process would be repeated till the
arithmetic average of the last five items has been calculated.
Odd Period of Moving Average
Calculation of three yearly moving averages includes the following steps
1. Add up the values of the first 3 years and place the yearly sum
against the median year. (This sum is called the moving total.)
2. Leave the first year value, add up the values of the next three
years and place the sum against its median year.
3. This process must be continued till all the values of the data are
taken for calculation.
4. Each 3-yearly moving total must be divided by 3 to get the 3-year
moving averages, which are our required trend values.
The formula for calculating 3 yearly moving averages is as follows
$$\frac{a+b+c}{3},\ \frac{b+c+d}{3},\ \frac{c+d+e}{3},\ \dots$$
Example:
Calculate the 3 yearly and 5 yearly moving averages of the data
Years    1    2    3    4    5    6    7    8    9    10   11   12
Sales   5.2  4.9  5.5  4.9  5.2  5.7  5.4  5.8  5.9  6.0  5.2  4.8
Solution:
Year   Sales   3 Year Moving   3 Year Moving   5 Year Moving   5 Year Moving
               Total           Average         Total           Average
                               (Total / 3)                     (Total / 5)
 1     5.2      ---             ---             ---             ---
 2     4.9      15.6            5.20            ---             ---
 3     5.5      15.3            5.10            25.7            5.14
 4     4.9      15.6            5.20            26.2            5.24
 5     5.2      15.8            5.27            26.7            5.34
 6     5.7      16.3            5.43            27.0            5.40
 7     5.4      16.9            5.63            28.0            5.60
 8     5.8      17.1            5.70            28.8            5.76
 9     5.9      17.7            5.90            28.3            5.66
10     6.0      17.1            5.70            27.7            5.54
11     5.2      16.0            5.33            ---             ---
12     4.8      ---             ---             ---             ---
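The odd-period moving averages can be reproduced with a few lines of Python. This is a minimal sketch; the function name `moving_averages` is an assumption, and the sales figures are those of the solution table. Each average is centred on the middle year of its period, so a period of 3 loses one year at each end and a period of 5 loses two.

```python
sales = [5.2, 4.9, 5.5, 4.9, 5.2, 5.7, 5.4, 5.8, 5.9, 6.0, 5.2, 4.8]


def moving_averages(values, period):
    """Return {middle year: moving average} for an odd period."""
    half = period // 2
    result = {}
    for start in range(len(values) - period + 1):
        window = values[start:start + period]
        middle_year = start + half + 1          # years are numbered from 1
        result[middle_year] = round(sum(window) / period, 2)
    return result


three_yearly = moving_averages(sales, 3)
five_yearly = moving_averages(sales, 5)

for year in range(1, len(sales) + 1):
    print(year, sales[year - 1],
          three_yearly.get(year, "---"),
          five_yearly.get(year, "---"))
```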
Even Period of Moving Average:
When the period of the moving average is 4, 6 or 8, it is an even number. The
four yearly total cannot be placed against any year, as the median position 2.5 lies
between the second and the third year. So the total should be placed in
between the 2nd and 3rd years. We must centre the moving average in
order to place the moving average against a year.
Steps to find even period of moving average:
1. Add up the values of the first 4 years and place the sum against
the middle of 2nd and 3rd year. (This sum is called 4 year moving
total)
2. Leave the first year value and add next 4 values from the 2nd year
onward and write the sum against its middle position.
3. This process must be continued till the value of the last item is
taken into account.
4. Add the first two 4-year moving totals and write the sum against
the 3rd year.
5. Leave the first 4-year moving total, add the next two 4-year
moving totals and place the sum against the 4th year.
6. This process must be continued till all the 4-yearly moving totals
are summed up and centred.
7. Divide each centred total (the sum of two 4-year moving totals) by 8
to get the moving averages, which are our required trend values.
Example:
Find the 4 yearly moving average for determining trend values in the
following time series data
Years              2005  2006  2007  2008  2009  2010  2011
Profit in (000) ₹    12    14    16    15    13    14    18
Solution:
Years   Profit   Sum of    4 years Moving   4 yearly Moving
                 Fours     Average          Average Centered
2005     12
2006     14
                  57        14.25
2007     16                                 (14.25 + 14.50)/2 = 14.38
                  58        14.50
2008     15                                 (14.50 + 14.50)/2 = 14.50
                  58        14.50
2009     13                                 (14.50 + 15.00)/2 = 14.75
                  60        15.00
2010     14
2011     18
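The centring of an even-period moving average can be sketched as follows. The function name `centred_moving_average` is an assumption of this sketch; the profit figures are those of the example. The program forms the 4-year moving totals, adds each pair of adjacent totals and divides by 8, which is the same as averaging two adjacent 4-year moving averages.

```python
profits = {2005: 12, 2006: 14, 2007: 16, 2008: 15,
           2009: 13, 2010: 14, 2011: 18}


def centred_moving_average(series, period=4):
    """Return {year: centred moving average} for an even period."""
    years = sorted(series)
    values = [series[y] for y in years]
    # Moving totals, each falling between two years
    totals = [sum(values[i:i + period])
              for i in range(len(values) - period + 1)]
    centred = {}
    for i in range(len(totals) - 1):
        year = years[i + period // 2]   # year lying between the two totals
        centred[year] = (totals[i] + totals[i + 1]) / (2 * period)
    return totals, centred


totals, centred = centred_moving_average(profits)
print("4-year moving totals:", totals)
for year, value in centred.items():
    print(year, value)
```

With seven years of data only three centred values can be obtained, because two years are lost at each end of the series.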
Advantages
Moving averages can be used for measuring the trend of any
series. This method is applicable to linear as well as non-linear trends.
Disadvantages
The trend obtained by moving averages generally is neither a
straight line nor a standard curve. For this reason the trend cannot be
extended for forecasting future values. Trend values are not available for
some periods at the start and some values at the end of the time series.
This method is not applicable to short time series
13.3.2 LEAST SQUARES METHOD
When the trend is linear the trend equation may be represented by
y = a + bt. The values of a and b for the line y = a + bt which
minimizes the sum of squares of the vertical deviations of the actual
(observed) values from the straight line are the solutions to the
so-called normal equations:
$$\sum y = na + b\sum t \qquad (1)$$
$$\sum yt = a\sum t + b\sum t^2 \qquad (2)$$
where n is the number of paired observations.
The normal equations are obtained by multiplying y = a + bt by
the coefficients of a and b, i.e., by 1 and t, throughout and summing up.
When the Number of Years is Odd
This method can be used when the data cover an odd number of years. It
is easy and is widely used in practice. If the number of items is odd,
we can follow these steps:
1. Denote time as the t variable and the values as y.
2. Take the middle year as the period of origin and find the
deviations t of the other years from it.
3. Square the time deviations and find Ʃt².
4. Multiply each value of y by the respective deviation t and
find the total Ʃty.
5. Add up the values of y to get Ʃy.
6. Place the values so obtained in the two equations
i. Ʃy = na + bƩt
ii. Ʃyt = aƩt + bƩt²
and find the values of a and b.
7. Substitute the calculated values of a and b in y = a + bt and find
the trend values of y for the various values of t.
When the number of years is odd the calculation is simplified by
taking the mid year as origin and one year as unit. In that case
Ʃt = 0 and the two normal equations take the form
Ʃy = na ; Ʃyt = bƩt²
Hence
$$a = \frac{\sum y}{n}, \qquad b = \frac{\sum yt}{\sum t^2}$$
Example :
Calculate trend values by the method of least square from data given
below and estimate the sales for 2003
Solution:
Year     y      t      ty     t²
1996     70    -2    -140     4
1997     74    -1     -74     1
1998     80     0       0     0
1999     86     1      86     1
2000     90     2     180     4
Total   400     0      52    10
Since Ʃt = 0,
$$a = \frac{\sum y}{n} = \frac{400}{5} = 80, \qquad b = \frac{\sum yt}{\sum t^2} = \frac{52}{10} = 5.2$$
Hence, y = 80 + 5.2 t
Therefore y1996 = 80 + 5.2 (-2) = 80 – 10.4 = 69.6
y1997 = 80 + 5.2 (-1) = 80 – 5.2 = 74.8
y1998 = 80 + 5.2 (0) = 80 + 0 = 80
y1999 = 80 + 5.2 (1) = 80 + 5.2 = 85.2
y2000 = 80 + 5.2 (2) = 80 + 10.4 = 90.4
For 2003, t will be 5. Putting t = 5 in the equation
y2003 = 80 + 5.2 (5) = 80 + 26 = 106
Thus the estimated sales for the year 2003 is ₹106 lakhs
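The same calculation can be sketched in Python. The function name `fit_linear_trend` is an illustrative assumption; the sketch takes the middle of the period as origin so that Ʃt = 0, and then uses a = Ʃy / n and b = Ʃyt / Ʃt².

```python
def fit_linear_trend(years, y):
    """Least squares line y = a + b*t with the origin at the middle of the period."""
    n = len(years)
    origin = sum(years) / n                    # middle year, so that sum(t) = 0
    t = [year - origin for year in years]
    a = sum(y) / n
    b = sum(ti * yi for ti, yi in zip(t, y)) / sum(ti * ti for ti in t)
    return origin, a, b


years = [1996, 1997, 1998, 1999, 2000]
sales = [70, 74, 80, 86, 90]

origin, a, b = fit_linear_trend(years, sales)
trend = [a + b * (year - origin) for year in years]
forecast_2003 = a + b * (2003 - origin)

print("a =", a, " b =", b)
print("trend values:", [round(v, 1) for v in trend])
print("estimate for 2003:", round(forecast_2003, 1))
```

The estimate for 2003 is an extrapolation of the fitted line and assumes that the linear trend of 1996 to 2000 continues unchanged.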
When the Number of Years is Even
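When the number of years is even the origin is placed midway
between the two middle years. The unit may be taken to be half a year
instead of one year, so that the deviations are the whole numbers
..., -3, -1, 1, 3, ...; if one year is kept as the unit, the deviations
are ..., -1.5, -0.5, 0.5, 1.5, ... Both choices give the same trend
values. With this change of origin we have
Ʃt = 0
Hence
$$a = \frac{\sum y}{n}, \qquad b = \frac{\sum yt}{\sum t^2}$$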
Example:
Production of a company for 6 consecutive years is given in the
following table. Calculate the trend value by using the method of least
square
Production 12 13 18 20 24 28
Solution:
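Year     y      t       ty      t²     Trend value (a + bt)
2000     12    -2.5   -30.0    6.25    10.95
2001     13    -1.5   -19.5    2.25    14.24
2002     18    -0.5    -9.0    0.25    17.52
2003     20     0.5    10.0    0.25    20.81
2004     24     1.5    36.0    2.25    24.10
2005     28     2.5    70.0    6.25    27.38
n = 6   Ʃy = 115   Ʃt = 0   Ʃty = 57.5   Ʃt² = 17.5
Since Ʃt = 0,
$$a = \frac{\sum y}{n} = \frac{115}{6} = 19.17, \qquad b = \frac{\sum yt}{\sum t^2} = \frac{57.5}{17.5} = 3.29$$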
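As a check on the choice of unit, the following Python sketch fits the line to the six production figures twice: once with one year as the unit (t = -2.5, ..., 2.5, as in the table) and once with half a year as the unit (t = -5, -3, ..., 5). The helper name `fit` is an illustrative assumption. The slope b changes with the unit, but the trend values do not.

```python
def fit(t, y):
    """Return (a, b) of y = a + b*t, assuming sum(t) == 0."""
    a = sum(y) / len(y)
    b = sum(ti * yi for ti, yi in zip(t, y)) / sum(ti * ti for ti in t)
    return a, b


production = [12, 13, 18, 20, 24, 28]
n = len(production)

t_year = [i - (n - 1) / 2 for i in range(n)]   # -2.5, -1.5, ..., 2.5
t_half = [2 * ti for ti in t_year]             # -5, -3, ..., 5

a1, b1 = fit(t_year, production)
a2, b2 = fit(t_half, production)

trend_year = [a1 + b1 * ti for ti in t_year]
trend_half = [a2 + b2 * ti for ti in t_half]

print("b with one-year unit :", round(b1, 4))
print("b with half-year unit:", round(b2, 4))
print("trend values:", [round(v, 2) for v in trend_year])
```

With the half-year unit the slope is exactly half of the slope obtained with the one-year unit, because each deviation is doubled.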
13.4.1 METHODS OF CONSTRUCTING SEASONAL INDICES
There are four methods of constructing seasonal indices.
1. Simple averages method
2. Ratio to trend method
3. Percentage moving average method
4. Link relatives method
Simple Average Method :
The time series data for each of the 4 seasons (for quarterly data)
of a particular year are expressed as percentages to the seasonal average
for that year. The percentages for different seasons are averaged over the
years by using simple average. The resulting percentages for each of the
4 seasons then constitute the required seasonal indices.
Steps of the Simple Average Method:
(i) Arrange the data by months, quarters or years according to the data
given.
(ii) Find the sum for each month, quarter or year.
(iii) Find the average for each month, quarter or year.
(iv) Find the average of these averages; it is called the Grand Average (G).
(v) Compute the Seasonal Index for every season (i.e., month, quarter or
year), which is given by
$$\text{Seasonal Index (S.I)} = \frac{\text{Seasonal Average}}{\text{Grand Average}} \times 100$$
2012 369 410 496 510
2013 391 432 458 495
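Solution:
$$\text{Seasonal Index (S.I)} = \frac{\text{Seasonal Average}}{\text{Grand Average}} \times 100$$
$$\text{Grand average} = \frac{343.83 + 412 + 461.67 + 486.83}{4} = \frac{1704.33}{4} = 426.0825$$
$$\text{S.I for I Q} = \frac{343.83}{426.0825} \times 100 = 80.70$$
$$\text{S.I for II Q} = \frac{412}{426.0825} \times 100 = 96.69$$
$$\text{S.I for III Q} = \frac{461.67}{426.0825} \times 100 = 108.35$$
$$\text{S.I for IV Q} = \frac{486.83}{426.0825} \times 100 = 114.26$$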
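The last two steps of the simple average method can be sketched in Python as follows. The quarterly averages are the ones used in the solution above; the variable names are illustrative assumptions.

```python
# Quarterly averages taken from the worked solution.
quarter_averages = {"I": 343.83, "II": 412.00, "III": 461.67, "IV": 486.83}

# Grand average: the average of the seasonal averages.
grand_average = sum(quarter_averages.values()) / len(quarter_averages)

# Seasonal index = seasonal average / grand average * 100.
seasonal_index = {
    quarter: round(average / grand_average * 100, 2)
    for quarter, average in quarter_averages.items()
}

print("Grand average:", round(grand_average, 4))
for quarter, index in seasonal_index.items():
    print("S.I for quarter", quarter, "=", index)
```

A useful check is that the four quarterly indices add up to about 400, that is, they average 100.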
13.5 FORECASTING
Time series forecasting methods produce forecasts based solely on
historical values, and they are widely used in business situations where
forecasts of a year or less are required. These methods are
particularly suited to sales, marketing, finance, production planning etc.,
and they have the advantage of relative simplicity. Time series
forecasting is a technique for the prediction of events through a sequence
of time.
The technique is used across many fields of study, from geology to
economics. The techniques predict future events by analyzing the trends
of the past, on the assumption that future trends will be similar to
historical trends. Data is organized around relatively deterministic
timestamps and therefore, compared to random samples, may contain
additional information that the analysis tries to extract.
Time series methods are better suited for short-term forecasts
(i.e., less than a year).
Time series forecasting relies on sufficient past data being
available and on the data being of high quality and truly
representative.
Time series methods are best suited to relatively stable situations.
Where substantial fluctuations are common and underlying
conditions are subject to extreme change, time series
methods may give relatively poor results.
Deseasonalised data, being free from the seasonal impact, show only the
average value of the data.
Seasonal adjustment can be made by dividing the original data by the
seasonal index.
$$\text{Deseasonalised data} = \frac{\text{Original data}}{\text{Seasonal index}} \times 100$$
where the adjustment multiplier 100 is necessary because the seasonal
indices are usually given in percentages.
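As an illustration of this adjustment, the following Python sketch deseasonalises the four quarterly figures given for 2012 in the simple average example (369, 410, 496, 510). The seasonal indices used are the rounded values obtained from the quarterly averages 343.83, 412, 461.67 and 486.83 of that example; the function name `deseasonalise` is an illustrative assumption.

```python
def deseasonalise(original, seasonal_index):
    """Divide each original value by its seasonal index (in percent) and multiply by 100."""
    return [value / index * 100 for value, index in zip(original, seasonal_index)]


quarters_2012 = [369, 410, 496, 510]            # quarterly data for 2012
indices = [80.70, 96.69, 108.35, 114.26]        # seasonal indices in percent

adjusted = deseasonalise(quarters_2012, indices)

for quarter, value in zip(["I", "II", "III", "IV"], adjusted):
    print("Quarter", quarter, "deseasonalised value:", round(value, 2))
```

Quarters with an index below 100 are adjusted upward and quarters with an index above 100 are adjusted downward.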
In the case of the additive model
Yt = T + S + C + I
$$\text{Deseasonalised data} = \text{Original data} - \frac{\text{Seasonal index}}{100} = Y_t - \frac{\text{Seasonal index}}{100}$$
13.7 SUMMARY
Time series are influenced by a variety of forces. Some are
continuously effective, others make themselves felt at recurring
time intervals, and still others are non-recurring or random in
nature. Therefore, the first task is to break down the data and
study each of these influences in isolation. This is known as
decomposition of the time series.
The objective of the time series analysis is to identify the
magnitude and direction of trends, to estimate the effect of
seasonal and cyclical variations and to estimate the size of the
residual component. This implies the decomposition of a time
series into its several components. Two lines of approach are
usually adopted in analyzing a given time series:
13.9 ANSWERS TO CHECK YOUR PROGRESS
1. Periodically, at equal time intervals, at successive points of
time
2. Cyclical movements
3. Time series are influenced by a variety of forces. Some are
continuously effective, others make themselves felt at recurring
time intervals, and still others are non-recurring or random in
nature.
4. Time series forecasting methods produce forecasts based solely
on historical values and they are widely used in business
situations where forecasts of a year or less are required.
5. There are four methods of constructing seasonal indices:
1. Simple averages method
2. Ratio to trend method
3. Percentage moving average method
4. Link relatives method
Year 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999
Production 21 22 23 25 24 22 25 26 27 26
UNIT XIV - INDEX NUMBER
Structure
14.0 Introduction
14.1 Objectives
14.2 Index Numbers
14.2.1 Types of Index Numbers
14.2.2 Problems in construction of Index Numbers
14.2.3 Methods of Constructing Index Numbers
14.2.4 Quantity or Volume Index Numbers
14.2.5 Test for Index Numbers
14.2.6 Chain Base Index Numbers
group of related variables over a period of time, to obtain a figure that
represents the 'net' result of the change in the constituent variables. In this
unit you will learn in detail about index numbers.
14.1 OBJECTIVES
After going through this unit, you will
Understand about index numbers and their types
Learn about the different methods of calculating index numbers
Know the uses and limitations of index numbers
14.2 INDEX NUMBERS
Index numbers are meant to study changes in the effects of factors
which cannot be measured directly. According to Bowley, “Index
numbers are used to measure the changes in some quantity which we
cannot observe directly”. For example, changes in business activity in a
country are not capable of direct measurement, but it is possible to study
relative changes in business activity by studying the variations in the
values of some such factors which affect business activity, and which are
capable of direct measurement.
Index numbers may be classified in terms of the variables that they
are intended to measure. In business, different groups of variables in the
measurement of which index number techniques are commonly used are
(i) price, (ii) quantity, (iii) value and (iv)business activity. Thus, we have
an index of wholesale prices, index of consumer prices, index of
industrial output, index of value of exports and index of business activity,
etc. Here we shall be mainly interested in index numbers of prices
showing changes with respect to time, although the methods described
can be applied to other cases. In general, the present level of prices is
compared with the level of prices in the past. The present period is called
the current period and some period in the past is called the base period.
14.2.1 TYPES OF INDEX NUMBERS
Index numbers are named after the activity they measure. Their
types are as under:
Price Index: Measures changes in price over a specified period of time. It
is basically the ratio of the price of a certain number of commodities in
the current year to their price in the base year.
Quantity Index: As the name suggests, these indices measure
changes in the volume of commodities, such as goods produced or
goods consumed.
Value Index: These compare changes in the monetary value of
imports, exports, production or consumption of commodities.
The simple average of price relatives method is simpler and easier to apply
than the simple aggregative method. The only disadvantage is that it gives
equal weight to all items.
Example:
The following are the prices of four different commodities for
2017 and 2018. Compute a price index with (1) the simple aggregative
method and (2) the average of price relatives method, using both the
arithmetic mean and the geometric mean, taking 2017 as the base.
Solution:
1. Simple Aggregative Method
$$P_{01} = \frac{\sum P_1}{\sum P_0} \times 100 = \frac{2662}{2623} \times 100 = 101.49$$
2. Simple Average of Price Relatives Method (using the arithmetic
mean)
$$P_{01} = \frac{1}{n}\sum\left(\frac{P_1}{P_0} \times 100\right) = \frac{407.64}{4} = 101.91$$
3. Simple Average of Price Relatives Method (using the geometric
mean)
$$P_{01} = \text{antilog}\left(\frac{\sum \log P}{n}\right) = \text{antilog}\left(\frac{8.0213}{4}\right) = 101.23$$
where P = (P1 / P0) × 100 is the price relative of each commodity.
Weighted Index Number:
When all commodities are not of equal importance, we assign a
weight to each commodity relative to its importance, and the index
number computed from these weights is called a weighted index number.
Weighted aggregative index number:
In order to attribute appropriate importance to each of the items
used in an aggregate index number, some reasonable weights must be
used. There are various methods of assigning weights, and consequently
a large number of formulae for constructing index numbers have been
devised, of which some of the most important ones are:
1. Laspeyre's Index Number
2. Paasche's Index Number
3. Fisher's Ideal Index Number
4. Marshall-Edgeworth Index Number
Laspeyre's Index Number:
In this index number the base year quantities are used as weights,
so it is also called the base year weighted index.
$$P_{01} = \frac{\sum P_1 q_0}{\sum P_0 q_0} \times 100$$
Example:
Compute the weighted aggregative price index numbers for 2011
with 2010 as the base year using (1) Laspeyre's Index Number (2)
Paasche's Index Number (3) Fisher's Ideal Index Number (4)
Marshall-Edgeworth Index Number.
Commodity    Prices          Quantities
             2010    2011    2010    2011
A            10      12      20      22
B             8       8      16      18
C             5       6      10      11
D             4       4       7       8
Solution:
Commodity    Prices          Quantities      P1q0    P0q0    P1q1    P0q1
             2010    2011    2010    2011
             P0      P1      q0      q1
A            10      12      20      22      240     200     264     220
B             8       8      16      18      128     128     144     144
C             5       6      10      11       60      50      66      55
D             4       4       7       8       28      28      32      32
Total                                        ƩP1q0   ƩP0q0   ƩP1q1   ƩP0q1
                                             = 456   = 406   = 506   = 451
(1) Laspeyre's Index Number
$$P_{01} = \frac{\sum P_1 q_0}{\sum P_0 q_0} \times 100 = \frac{456}{406} \times 100 = 112.32$$
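The four weighted aggregative index numbers named in this example can be computed from the same four column totals. The following Python sketch uses the standard formulas (Paasche's index uses current year quantities as weights, Fisher's index is the geometric mean of Laspeyre's and Paasche's, and the Marshall-Edgeworth index uses the sum of base and current year quantities as weights). The helper name `aggregate` is an illustrative assumption.

```python
from math import sqrt

p0 = [10, 8, 5, 4]      # prices in 2010 (base year)
p1 = [12, 8, 6, 4]      # prices in 2011 (current year)
q0 = [20, 16, 10, 7]    # quantities in 2010
q1 = [22, 18, 11, 8]    # quantities in 2011


def aggregate(prices, quantities):
    """Sum of price * quantity over all commodities."""
    return sum(p * q for p, q in zip(prices, quantities))


laspeyre = aggregate(p1, q0) / aggregate(p0, q0) * 100
paasche = aggregate(p1, q1) / aggregate(p0, q1) * 100
fisher = sqrt(laspeyre * paasche)
marshall_edgeworth = ((aggregate(p1, q0) + aggregate(p1, q1))
                      / (aggregate(p0, q0) + aggregate(p0, q1)) * 100)

print("Laspeyre          :", round(laspeyre, 2))
print("Paasche           :", round(paasche, 2))
print("Fisher            :", round(fisher, 2))
print("Marshall-Edgeworth:", round(marshall_edgeworth, 2))
```

Fisher's index always lies between Laspeyre's and Paasche's index numbers, since it is their geometric mean.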
X 5 4 40
Y 3 2 60
Z 2 1 20
Solution:
Commodity    Price                       Weight    P = (P1 / P0) × 100    PW
             Current year   Base year    (W)
             (P1)           (P0)
X            5              4            40        125                    5000
Y            3              2            60        150                    9000
Z            2              1            20        200                    4000
Total                                    120                              18000
$$\text{Weighted Average of Price Relatives index} = \frac{\sum PW}{\sum W} = \frac{18000}{120} = 150$$
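The weighted average of price relatives can be sketched in Python as follows, using the three commodities of this example. The variable names are illustrative assumptions.

```python
# commodity: (current year price, base year price, weight)
items = {"X": (5, 4, 40), "Y": (3, 2, 60), "Z": (2, 1, 20)}

# Price relative P = P1 / P0 * 100 for each commodity.
relatives = {name: p1 / p0 * 100 for name, (p1, p0, _) in items.items()}

sum_pw = sum(relatives[name] * weight for name, (_, _, weight) in items.items())
sum_w = sum(weight for (_, _, weight) in items.values())

index = sum_pw / sum_w

print("Price relatives:", relatives)
print("Weighted average of price relatives index:", index)
```

Commodity Y has the largest weight, so its price relative (150) influences the index more than the relatives of X and Z.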
Commodity    2002                   2012
             Price   Total value    Price   Total value
A            10      200            12      360
B            10      400            15      900
C            15      450            17      680
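Solution:
Here total values are given instead of quantities; hence first find the
quantities of the base year and the current year.
$$\text{Quantity} = \frac{\text{Total value}}{\text{Price}}$$
Commodity    p0    q0    p1    q1    p0q0    p0q1    p1q0    p1q1
A            10    20    12    30    200     300     240     360
B            10    40    15    60    400     600     600     900
C            15    30    17    40    450     600     510     680
Total                                1050    1500    1350    1940
Laspeyre's method
$$Q_{01} = \frac{\sum q_1 p_0}{\sum q_0 p_0} \times 100 = \frac{1500}{1050} \times 100 = 142.86$$
Paasche's formula
$$Q_{01} = \frac{\sum q_1 p_1}{\sum q_0 p_1} \times 100 = \frac{1940}{1350} \times 100 = 143.70$$
Fisher's formula
$$Q_{01} = \sqrt{\frac{\sum q_1 p_0}{\sum q_0 p_0} \times \frac{\sum q_1 p_1}{\sum q_0 p_1}} \times 100 = \sqrt{\frac{1500}{1050} \times \frac{1940}{1350}} \times 100 = 143.28$$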
There are certain tests which are used to verify the consistency, or
adequacy, of an index number formula from different points of view. The
most popular among these are the following tests:
Order reversal test.
Time reversal test.
Factor reversal test.
Unit test.
The time reversal test requires that the index number computed
backwards should be the reciprocal of the index number computed
forward, except for the constant of proportionality. For Fisher's
ideal index,
$$P_{01} = \sqrt{\frac{\sum P_1 q_0}{\sum P_0 q_0} \times \frac{\sum P_1 q_1}{\sum P_0 q_1}}$$
$$P_{10} = \sqrt{\frac{\sum P_0 q_1}{\sum P_1 q_1} \times \frac{\sum P_0 q_0}{\sum P_1 q_0}}$$
$$P_{01} \times P_{10} = 1$$
Laspeyre's and Paasche's methods do not satisfy this test, but
Fisher's ideal index satisfies it. Besides, both the simple and the
weighted geometric mean of price relatives also satisfy the time
reversal test.
3. Factor reversal test:
This test has also been put forth by Prof. Irving Fisher. In this test
the product of the price index and the quantity index must be equal to
the value index. Thus, for the factor reversal test, a formula of index
number should satisfy the following equation:
Price Index × Quantity Index = Value Index
$$P_{01} = \sqrt{\frac{\sum P_1 q_0}{\sum P_0 q_0} \times \frac{\sum P_1 q_1}{\sum P_0 q_1}}$$
$$Q_{01} = \sqrt{\frac{\sum q_1 p_0}{\sum q_0 p_0} \times \frac{\sum q_1 p_1}{\sum q_0 p_1}}$$
$$\therefore\; P_{01} \times Q_{01} = \frac{\sum p_1 q_1}{\sum p_0 q_0}$$
Most of the formulae of index numbers discussed above fail to
satisfy this acid test of consistency, except that of Prof. Irving Fisher.
4. Unit test
This test suggests that the formula for constructing an index
should be independent of the unit of measurement in which the prices
and quantities are quoted. Except for the unweighted aggregative index
number, all other formulas in this chapter satisfy this test.
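Example:
Construct Fisher's ideal index for the following data. Test whether it
satisfies the time reversal test and the factor reversal test.
A 24 20 30 24
B 30 14 40 10
C 10 10 16 18
Solution:
Commodity    q0    p0    q1    p1    p0q0    p0q1    p1q0    p1q1
A            24    20    30    24    480     600     576     720
B            30    14    40    10    420     560     300     400
C            10    10    16    18    100     160     180     288
Total                                1000    1320    1056    1408
Fisher's ideal index:
$$P_{01} = \sqrt{\frac{\sum p_1 q_0}{\sum p_0 q_0} \times \frac{\sum p_1 q_1}{\sum p_0 q_1}} \times 100 = \sqrt{\frac{1056}{1000} \times \frac{1408}{1320}} \times 100$$
$$= \sqrt{1.056 \times 1.0667} \times 100 = \sqrt{1.1264} \times 100 = 1.0613 \times 100 = 106.13$$
Factor reversal test:
$$P_{01} \times Q_{01} = \sqrt{\frac{1056}{1000} \times \frac{1408}{1320} \times \frac{1320}{1000} \times \frac{1408}{1056}} = \frac{1408}{1000} = \frac{\sum p_1 q_1}{\sum p_0 q_0}$$
Hence Fisher's ideal index number satisfies the factor reversal test.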
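Both tests can be verified numerically for this data. In the following Python sketch the helper names `total` and `fisher` are illustrative assumptions, and the indices are kept as ratios (without the factor 100) so that the products can be compared directly.

```python
from math import sqrt, isclose

q0 = [24, 30, 10]
p0 = [20, 14, 10]
q1 = [30, 40, 16]
p1 = [24, 10, 18]


def total(a, b):
    """Sum of the products of corresponding items."""
    return sum(x * y for x, y in zip(a, b))


def fisher(pa, qa, pb, qb):
    """Fisher's ideal index of period b relative to period a, as a ratio."""
    return sqrt(total(pb, qa) / total(pa, qa) * total(pb, qb) / total(pa, qb))


P01 = fisher(p0, q0, p1, q1)       # price index, forward
P10 = fisher(p1, q1, p0, q0)       # price index, backward
Q01 = fisher(q0, p0, q1, p1)       # quantity index: roles of p and q interchanged
value_ratio = total(p1, q1) / total(p0, q0)

print("Fisher's ideal index:", round(P01 * 100, 2))
print("Time reversal test  : P01 x P10 =", round(P01 * P10, 6))
print("Factor reversal test: P01 x Q01 =", round(P01 * Q01, 6),
      " value ratio =", round(value_ratio, 6))
```

The time reversal test holds when P01 × P10 equals 1, and the factor reversal test holds when P01 × Q01 equals Ʃp1q1 / Ʃp0q0.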
$$P_{n-1,\,n} = \frac{P_n}{P_{n-1}} \times 100$$
Example:
Find the chain base index numbers for the following data taking 2004 as the
base year
Solution:
Year    Price    Link relative               Chain index
2004    18       (18 / 18) × 100 = 100       100
2005    21       (21 / 18) × 100 = 116.67    (100 × 116.67) / 100 = 116.67
2006    25       (25 / 21) × 100 = 119.05    (116.67 × 119.05) / 100 = 138.9
2007    23       (23 / 25) × 100 = 92        (138.9 × 92) / 100 = 127.79
2008    28       (28 / 23) × 100 = 121.74    (127.79 × 121.74) / 100 = 155.57
2009    30       (30 / 28) × 100 = 107.14    (155.57 × 107.14) / 100 = 166.68
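The link relatives and chain indices of this example can be reproduced with the following Python sketch. The variable names are illustrative assumptions. Each link relative compares a year with the year immediately before it, and each chain index is the previous chain index multiplied by the current link relative and divided by 100.

```python
prices = {2004: 18, 2005: 21, 2006: 25, 2007: 23, 2008: 28, 2009: 30}

years = sorted(prices)
link = {years[0]: 100.0}     # link relative of the first year is taken as 100
chain = {years[0]: 100.0}    # chain index of the first year is 100

for previous, current in zip(years, years[1:]):
    link[current] = prices[current] / prices[previous] * 100
    chain[current] = chain[previous] * link[current] / 100

for year in years:
    print(year, round(link[year], 2), round(chain[year], 2))
```

Because this sketch does not round the intermediate values, its last figures may differ slightly from the table (for example 166.67 instead of 166.68 for 2009). Without rounding, the chain index of a single series equals the fixed base index (30 / 18) × 100.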
CHECK YOUR PROGRESS – 1
1. What is chain base index number?
2. What is the formula for Fisher's Ideal Index Number?
3. What is weighted index number?
(4) Selection of Commodities:
The next step is the selection of the commodities to be included.
We should select those commodities which are most often used by that
class of people.
14.3.2 METHODS TO COMPUTE COST OF LIVING INDEX
NUMBERS
There are two methods to compute cost of living index numbers:
(1) Aggregate Expenditure Method (2) Family Budget Method
$$P_{01} = \frac{\sum P_1 q_0}{\sum P_0 q_0} \times 100$$
Here,
P1 represents the price of the current year,
P0 represents the price of the base year and
q0 represents the quantities consumed in the base year.
Here, I = (P1 / P0) × 100 and W = P0q0
Example:
Construct the cost of living index number for 2018 on the basis of
2017 from the following data using (1) Aggregate Expenditure Method
(2) Family Budget Method.
Commodity    Quantity consumed in 2017    Price in 2017    Price in 2018
             (in quintal)
A            6                            315.75           316.00
B            6                            305.00           308.00
C            1                            416.00           419.00
D            6                            528.00           610.00
E            4                            120.00           119.50
F            1                            1020.00          1015.00
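Solution:
The cost of living index number of 2018 by Aggregate Expenditure
method:
Commodity    Quantity consumed    Prices                  P1q0       P0q0
             in 2017              2017        2018
             (in quintal)         P0          P1
             q0
A            6                    315.75      316.00      1896.00    1894.50
B            6                    305.00      308.00      1848.00    1830.00
C            1                    416.00      419.00       419.00     416.00
D            6                    528.00      610.00      3660.00    3168.00
E            4                    120.00      119.50       478.00     480.00
F            1                    1020.00     1015.00     1015.00    1020.00
Total                                                     ƩP1q0      ƩP0q0
                                                          = 9316     = 8808.5
For the Family Budget method:
ƩW = 8808.5    ƩWI = 931592.1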
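Both methods can be sketched in Python for the six commodities of this example. The variable names are illustrative assumptions. The aggregate expenditure method uses ƩP1q0 / ƩP0q0 × 100; the family budget method uses ƩWI / ƩW with I = (P1 / P0) × 100 and W = P0q0.

```python
# commodity: (quantity consumed in 2017, price in 2017, price in 2018)
data = {
    "A": (6, 315.75, 316.00),
    "B": (6, 305.00, 308.00),
    "C": (1, 416.00, 419.00),
    "D": (6, 528.00, 610.00),
    "E": (4, 120.00, 119.50),
    "F": (1, 1020.00, 1015.00),
}

# (1) Aggregate expenditure method
sum_p1q0 = sum(q0 * p1 for q0, p0, p1 in data.values())
sum_p0q0 = sum(q0 * p0 for q0, p0, p1 in data.values())
aggregate_index = sum_p1q0 / sum_p0q0 * 100

# (2) Family budget method: weights W = P0 * q0, price relatives I = P1 / P0 * 100
sum_w = sum(p0 * q0 for q0, p0, p1 in data.values())
sum_wi = sum(p0 * q0 * (p1 / p0 * 100) for q0, p0, p1 in data.values())
family_budget_index = sum_wi / sum_w

print("Aggregate expenditure method:", round(aggregate_index, 2))
print("Family budget method        :", round(family_budget_index, 2))
```

The two methods give the same index when the weights are the base year expenditures P0q0, because W × I then reduces to P1q0 × 100. A hand calculation that rounds each price relative may differ slightly in the last digits of ƩWI.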
They indicate the changes in the consumer prices. Thus they help
government in formulating policies regarding control of price,
taxation, imports and exports of commodities, etc.
They are used in granting allowances and other facilities to
employees
They are used for the evaluation of purchasing power of money.
They are used for deflating money
They are used for comparing changes in the cost of living of
different classes of people
14.4 USES OF INDEX NUMBER
The main uses of index numbers are given below.
Index numbers are used in the fields of commerce, meteorology,
labour, industry, etc.
Index numbers measure fluctuations during intervals of time,
group differences of geographical position of degree, etc.
They are used to compare the total variations in the prices of
different commodities in which the unit of measurements differs
with time and price, etc.
They measure the purchasing power of money.
They are helpful in forecasting future economic trends.
They are used in studying the difference between the comparable
categories of animals, people or items.
Index numbers of industrial production are used to measure the
changes in the level of industrial production in the country.
Index numbers of import prices and export prices are used to
measure the changes in the trade of a country.
Index numbers are used to measure seasonal variations and cyclical
variations in a time series.
14.5 LIMITATIONS OF INDEX NUMBER
They are simply rough indications of the relative changes.
The choice of representative commodities may lead to fallacious
conclusions as they are based on samples.
There may be errors in the choice of base periods or weights, etc.
Comparisons of changes in variables over long periods are not
reliable.
They may be useful for one purpose but not for another.
They are specialized types of averages and hence are subject to all
those limitations which an average suffers from.
CHECK YOUR PROGRESS - 2
4. What are the methods to compute Cost of Living Index
numbers?
5. What are the popular tests for index numbers?
6. Write a few uses of index numbers.
14.6 SUMMARY
Index numbers are meant to study changes in the
effects of factors which cannot be measured directly.
According to Bowley, “Index numbers are used to
measure the changes in some quantity which we cannot
observe directly”.
Price Index, Quantity Index and Value Index
are the types of index numbers.
Price index numbers measure and permit comparison of
the prices of certain goods; quantity index numbers, on the
other hand, measure the physical volume of production,
construction or employment. Though price indices are
more widely used, production indices are highly
significant as indicators of the level of output in the
economy or in parts of it.
There are certain tests which are put to verify the
consistency, or adequacy of an index number formula
from different points of view. The most popular among
these are the following tests: (1)Order reversal test.(2)
Time reversal test. (3) Factor reversal test. (4)Unit test.
In the chain base method, there is no fixed base period; the year
immediately preceding the one for which the price index
has to be calculated is assumed as the base year.
Cost of living index numbers measure the changes in the
prices paid by consumers for a special “basket” of goods
and services during the current year as compared to the
base year.
There are two methods to compute cost of living index
numbers: (1) Aggregate Expenditure Method (2) Family
Budget Method.
14.7 KEY WORDS
Index numbers, Price Index, Quantity Index, Value Index, Laspeyre's
Index Number, Paasche's Index Number, Fisher's Ideal Index
Number, Marshall-Edgeworth Index Number, Order reversal test,
Time reversal test, Factor reversal test, Unit test, Chain Base index
number, Cost of living index number.
A 6 Kg 5 7
B 6 Quintal 6 6
C 5 Quintal 5 4
D 6 Quintal 7 7
E 4 Quintal 8 8
F 5 Kg 9 9
DISTANCE EDUCATION – CBCS – (2018 – 2019 Academic Year Onwards)
Question Paper Pattern – BUSINESS STATISTICS
(UG Programs)
Time: 3 Hours Maximum: 75 Marks
Part – A (10 x 2 = 20 Marks)
Answer all questions
1. What is chi-square test?
2. What is meant by analysis of variance?
3. Write short note on index number.
4. Describe Type II error
5. Write any four advantages of statistics
6. Explain the term Probable Error
7. What is meant by Binomial Distribution?
8. Define the term “Correlation”
9. What is meant by Regression?
10. What is forecasting?
Part – B (5 x 5 = 25 Marks)
Answer all questions choosing either (A) or (B)
11. (A) What is the importance of statistics?
(or)
Y : 5 4 3 4 1
(or)
(B) What is the probability that a leap year will contain 53 Sundays?
13. (A) The manufacturer of a certain make of electric bulbs claims that the bulbs have a mean life of 25
months with a standard deviation of 5 months. A random sample of 6 such
bulbs gave the following values.
Life in months: 24 26 30 20 20 18
Can you regard the producer's claim to be valid at 1% level of significance? (Table value V = 4.032)
(or)
(B). Find standard deviation from the following observations.
Size : 120 125 130 135 140 145 150 155 160
Frequencies : 2 3 3 1 2 7 4 2 8
14. (A) The first of two samples consists of 23 pairs and gives a correlation coefficient of 0.5 while
the second of 28 pairs has the correlation coefficient of 0.8. Are these values significantly
different?
(or)
15. (A) Calculate the 3 yearly and 5 yearly moving averages of the data
Years 1 2 3 4 5 6 7 8 9 10 11 12
sales 5.2 4.9 5.5 4.9 5.2 5.7 5.4 5.8 5.9 6.00 5.2 4.8
(or)
(B) Calculate trend values by the method of least square from data given below and estimate the
sales for 2003
Part – C (3 x 10 = 30 Marks)
Answer any three out of five questions
16. Eight coins are tossed simultaneously 256 times. Number of heads observed at each throw is
recorded as given below:
No. of Heads: 0 1 2 3 4 5 6 7 8
Frequencies : 2 6 30 52 67 56 32 10 1
Fit a binomial distribution and the expected frequency. Also find the mean and standard
deviation
17. The quantity of raw materials purchased by a company at the specified prices during the 12
months of 2017 is given as follows:
Months J F M A M J JY A S O N DEC
Quantity 250 200 250 280 300 300 220 220 200 210 300 250
Per kg
b. Can you estimate the approximate quantity likely to be purchased if the price shoots up to
Rs.124 per kg?
18. 200 digits are chosen at random from a set of tables. The frequencies of the digits are as follows
Digit : 0 1 2 3 4 5 6 7 8 9
F : 18 19 23 21 16 25 22 20 21 15
Use Chi-Square test to assess the correctness of the hypothesis that the digits were distributed in
equal numbers in the tables from which they were chosen. V, 9 = 16.22
19. The sales data of an item in six shops before and after a special promotional campaign are as under:
Before Campaign: 53 28 31 48 50 42
After Campaign : 58 29 30 55 56 45
Can the campaign be judged to be a success? Test at 5% level of significance. V= t0.05 = 2.57
Price Quantity
X 9 15 5 5
Y 4 12 10 11
Z 1 5 6 6