Statistics For Management
Learning Objective:
●● To get introduced to the types of statistics and their applications in different fields
●● To develop an understanding of data representation techniques
●● To understand the MS Excel applications of numerical measures of central
tendency and dispersion
Learning Outcome:
At the end of the course, the learners will be able to –
●● To summarize and present data using tabular and graphical procedures
●● To be able to use the tool MS Excel for answering business problems based on
numerical measures
Croxton and Cowden define statistics as –
“It is the science of collection, presentation, analysis, and interpretation of numerical data from logical analysis.”
1.1.1 Statistical Thinking and Analysis
Data is a collection of any number of related observations. We can collect data through observation, surveys, experiments or published records.
Statistics is not restricted to only information about the State, but it also extends
to almost every realm of the business. Statistics is about scientific methods to gather,
organize, summarize and analyze data. More important still is to draw valid conclusions
and make effective decisions based on such analysis. To a large degree, company
performance depends on the preciseness and accuracy of the forecast. Statistics
is an indispensable instrument for manufacturing control and market research.
Statistical tools are extensively used in business for time and motion studies, consumer research and demand forecasting. Traditionally, a single person, usually the owner or manager of the firm, used to take all decisions regarding the business. Example: a manager would decide where the necessary raw materials and other factors of production were to be acquired, how much output would be produced, where it would be sold, and so on. This type of decision making was usually based on the experience and expectations of this single individual and as such had no scientific basis.
1.1.2 Limitations and Applications of Statistics
Statistical techniques, because of their flexibility, have become popular and are used in numerous fields. But statistics is not a cure-all technique and has a few limitations. It cannot be applied to all kinds of situations and cannot be made to answer all queries. The major limitations are:
1. Statistics deals only with problems that can be expressed in quantitative terms and are amenable to mathematical and numerical analysis. It is not suitable for qualitative attributes such as customer loyalty, employee integrity, emotional bonding, motivation, etc.
2. Statistics deals only with aggregates of data; no importance is attached to an individual item.
3. Statistical results are only approximations and not mathematically exact. There is always a possibility of random error.
4. Statistics, if used wrongly, can lead to misleading conclusions, and therefore, should
be used only after complete understanding of the process and the conceptual base.
5. Statistics laws are not exact laws and are liable to be misused.
6. The greatest limitation is that statistical data can be used properly only by a professional. Only a person having thorough knowledge of the methods of statistics and proper training can draw valid conclusions.
7. If statistical data are not uniform and homogeneous, then the study of the problem is not possible. Homogeneity of data is essential for a proper study.
8. Statistical methods are not the only method for studying a problem. There are other
methods as well, and a problem can be studied in various ways.
The study of statistics can be categorized into two main branches. These branches
are descriptive statistics and inferential statistics.
Descriptive statistics is used to summarize and graph the data for a chosen group. Descriptive statistics gives information that describes the data in some manner. For example, suppose a pet shop sells cats, dogs, birds and fish. If 100 pets are sold, and 35 out of the 100 were dogs, then one description of the data on the pets sold would be that 35% were dogs.
Inferential statistics comprises techniques that allow us to use samples to generalize about the populations from which the samples were taken. Hence, it is crucial that the sample represents the population accurately. The method used to get this done is called sampling. Since inferential statistics aims at drawing conclusions from a sample and generalizing them to a population, we need to be sure that our sample accurately represents the population. This requirement affects our process. At a broad level, we must do the following:
●● Define the population we are studying.
●● Draw a representative sample from that population.
●● Use analyses that incorporate the sampling error.
1.1.4 Importance and Scope of Statistics
●● Condensation: Statistics compresses a mass of figures into small, meaningful information, for example, average sales, the BSE index, the growth rate, etc. It is impossible to get a precise idea about the profitability of a business from a mere record of income and expenditure transactions. Information such as Return on Investment (ROI), Earnings Per Share (EPS), profit margins, etc., however, can be easily remembered, understood and thus used in decision-making.
●● Forecasting: Sound forecasts are essential for planning, and a wrong forecast can be harmful for the business. For example, to decide the refining capacity for a petrochemical plant, it is required to predict the demand for the petrochemical product mix, the supply of crude oil, the cost of crude, substitution products, etc., for the next 10 to 20 years, before committing an investment.
●● Expectation: Statistics provides the basic building block for framing suitable policies. For example, how much raw material should be imported, how much capacity should be installed, or how much manpower should be recruited, etc., depends upon the expected demand or sales.
Sample
A sample consists of one or more observations drawn from the population. The sample is the group of people who actually took part in your research. They are the people who are questioned (for example, in a qualitative study) or who actually complete the survey (for example, in a quantitative study). People who could have been research participants but did not actually take part are not considered part of the sample.
A sample data set contains only a part of the population; the size of the sample is always less than the size of the population from which it is taken. [Utilizes the count n – 1 in formulas.]
Population
A population includes all of the elements from a set of data. Population is the
broader group of people that you expect to generalize your study results to. Your
sample is just going to be a part of the population. The size of your sample will depend
on your exact population.
A population data set contains all members of a specified group (the entire list of
possible data values). [Utilizes the count n in formulas.]
For example – Mr. Tom wants to do a statistical analysis of students’ final examination scores in his math class for the past year. Should he consider his data to be a population data set or a sample data set?
Mr. Tom is only working with the scores from his class. There is no reason for him to generalize his results to all management students in the school. He has all of the data pertaining to his investigation, so it is a population data set.
1.2.1 Importance of Graphical Representation of Data
The data obtained from the field needs to be processed and analyzed. The processing consists mainly of recording, labelling, classifying and tabulating the collected data so that it is consistent with the report. The data may be viewed either in tabular form or via charts. Effective use of the data collected primarily depends on how it is organized, presented, and summarized.
●● One of the most convincing and appealing ways in which statistical results may be presented is through diagrams and graphs.

1.2.2 Bar Diagram
In a bar diagram, only the length of the bar is taken into account, not the width. In other words, a bar is a thick line whose width is shown merely for clarity; since only the length of the bar matters, it is called a one-dimensional diagram.

Simple Bar Diagram
A simple bar diagram represents only one variable. Since the bars are of the same width and vary only in length (height), it becomes very easy to make a comparative study. Simple bar diagrams are very popular in practice. A bar chart can be either vertical or horizontal; for example, sales, production, population figures, etc., for various years may be shown by simple bar charts.
Illustration - 1
The following table gives the birth rate per thousand of different countries over a
certain period of time.
Country India Germany U. K. New Zealand Sweden China
Birth rate 33 16 20 30 15 40
Comparing the sizes of the bars, China’s birth rate is the highest, followed by India, whereas Sweden and Germany occupy the lowest positions.
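Such a simple bar diagram can also be produced with a short script. The following is a minimal sketch, assuming Python with the matplotlib library is available; it simply plots the birth-rate figures from the table above.

import matplotlib.pyplot as plt

countries = ["India", "Germany", "U.K.", "New Zealand", "Sweden", "China"]
birth_rate = [33, 16, 20, 30, 15, 40]

plt.bar(countries, birth_rate)           # one bar per country, equal widths
plt.ylabel("Birth rate per thousand")
plt.title("Simple bar diagram of birth rates")
plt.show()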
Sub-divided Bar Diagram
In a sub-divided (component) bar diagram, each bar is further subdivided into various components. Each component occupies a part of the bar proportional to its share in the total.
Illustration - 1: (figure: sub-divided bar diagram)
Multiple Bar Diagram
In a multiple bar diagram, two or more sets of related data are represented and the components are shown as separate adjoining bars. The height of each bar represents the actual value of the component. The components are shown by different shades or colours.
Illustration 1 - Construct a suitable bar diagram for the following data on the number of students in two different colleges in different faculties.

College    Arts    Science    Commerce    Total
Percentage Bar Diagram
In a percentage bar diagram, the length of the entire bar is kept equal to 100. The various segments of each bar represent percentages of the aggregate.

Illustration 1
Year 1995: 45%, 35%, 20%
Year 1997: 48%, 36%, 16%
1.2.3 Pie Chart
A pie chart or a circle chart is a circular statistical graphic that is divided into slices to illustrate numerical proportion. In a pie chart, the arc length of each slice is proportional to the quantity it represents. While it is named for its resemblance to a pie which has been sliced, there are variations on the way it can be presented. In a pie chart, categories of data are represented by wedges in the circle that are proportional in size to the percentage of individuals in each category.
Pie charts are very widely used in the business world and the mass media. Pie charts are generally used to show percentage or proportional data, and usually the percentage represented by each category is provided next to the corresponding slice of pie. Pie charts are good for displaying data for around six categories or fewer.
1.2.4 Histogram
A histogram is a graphical data display using bars of different heights. It is similar to a bar chart, but in a histogram each bar groups numbers into ranges (classes). The height of each bar shows how many data values fall into that range. A histogram is useful when:
●● Analyzing what the output from a supplier’s process looks like
●● Seeing whether a process change has occurred from one time period to another
●● Determining whether the outputs of two or more processes are different
●● You wish to communicate the distribution of data quickly and easily to others
1.2.5 Frequency Polygon
A frequency polygon is obtained by plotting the frequencies against the mid-points of the class intervals and joining the points so obtained by line segments. Comparing a histogram and a frequency polygon, in a frequency polygon the points replace the bars (rectangles). Also, when several distributions are to be compared on the same graph paper, frequency polygons are better than histograms.
Illustration 1
Class Interval     Frequency
10-20              3
20-30              16
30-40              22
40-50              35
50-60              24
60-70              15
70-80              2

(Figure: frequency polygon for the above distribution)
1.2.6 Ogives
When frequencies are added, they are called the cumulative frequencies. The
curve obtained by plotting cumulating frequencies is called a cumulative frequency
curve or an ogive (pronounced as ojive).
An ogive is drawn by plotting the class limits on the horizontal (x-axis) and the cumulative frequencies on the vertical (y-axis).
Less than Ogive: To plot a less than ogive, the data is arranged in ascending order of magnitude and the frequencies are cumulated from the top, i.e. by adding. Cumulative frequencies are plotted against the upper class limits. The ogive obtained under this method gives a rising curve.
Greater than Ogive: To plot a greater than ogive, the data is arranged in ascending order of magnitude and the frequencies are cumulated from the bottom, or subtracted from the total from the top. Cumulative frequencies are plotted against the lower class limits. The ogive obtained under this method gives a falling curve.
Values such as the median, quartiles, quartile deviation and measures of skewness can be located using ogives. Ogives are also helpful in the comparison of two distributions.
Illustration 1 –
Draw less than and more than ogive curves for the following frequency distribution
and obtain median graphically. Verify the result.
x (class)   0-20   20-40   40-60   60-80   80-100   100-120   120-140   140-160
f           5      12      18      25      15       12        8         5

Solution:

Upper limit   Less than c.f.        Lower limit   More than c.f.
20            5                     0             100
40            17                    20            95
60            35                    40            83
80            60                    60            65
100           75                    80            40
120           87                    100           25
140           95                    120           13
160           100                   140           5

(Figure: the ‘less than’ and ‘more than’ ogives intersect at the median.)

Verification: N/2 = 50, so the median class is 60-80.
Median = 60 + (50 – 35)/25 × 20 = 60 + 12 = 72
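The cumulative frequencies and the median can also be checked numerically. The following is a minimal sketch in plain Python, using the class limits and frequencies of the illustration above.

limits = [0, 20, 40, 60, 80, 100, 120, 140, 160]
freq   = [5, 12, 18, 25, 15, 12, 8, 5]
N = sum(freq)                                    # 100

# less-than ogive: cumulate from the top, plotted against upper limits
less_than = []
running = 0
for f in freq:
    running += f
    less_than.append(running)

# more-than ogive: cumulate from the bottom, plotted against lower limits
more_than = [N - c + f for c, f in zip(less_than, freq)]

# median by interpolation within the class containing the (N/2)th item
half = N / 2
for i, cf in enumerate(less_than):
    if cf >= half:
        lower, width = limits[i], limits[i + 1] - limits[i]
        cf_before = less_than[i - 1] if i > 0 else 0
        median = lower + (half - cf_before) / freq[i] * width
        break

print(less_than)    # [5, 17, 35, 60, 75, 87, 95, 100]
print(more_than)    # [100, 95, 83, 65, 40, 25, 13, 5]
print(median)       # 72.0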
1.2.7 Pareto Chart
A Pareto Chart is a graph showing the frequency of the defects and their
cumulative effect. Pareto charts are helpful in identifying the defects that should be
prioritized to achieve the greatest overall change.
The Pareto principle (also known as the 80/20 rule, the law of the vital few, or the
principle of factor sparsity) states that, for many events, roughly 80% of the effects
come from 20% of the causes.
A Pareto chart is typically used:
●● When analyzing broad causes by looking at their specific components
●● When communicating with others about the data
1.2.8 Stem-and-leaf display
A stem-and-leaf display or stem-and-leaf plot is a device for presenting quantitative data in a graphical format, similar to a histogram, to assist in visualizing the shape of a distribution. It is a useful tool in exploratory data analysis.
A stem-and-leaf plot is a special table where each data value is split into a “stem” (the first digit or digits) and a “leaf”, which is usually the last digit. The “stem” values are listed down, and the “leaf” values go right (or left) from the stem values. The “stem” is used to group the scores and each “leaf” shows the individual scores within each group.
For example –
Tom got his friends to do a long jump and got these results:
2.3, 2.5, 2.5, 2.7, 2.8, 3.2, 3.6, 3.6, 4.5, 5.0
The stem-and-leaf plot for the same will be –
Stem    Leaf
2       35578
3       266
4       5
5       0
●● What the stem and leaf here mean (Stem “2” Leaf “3” means 2.3)
●● In this case each leaf is a decimal
●● It is OK to repeat a leaf value
1.2.9 Cross-tabulation
A cross-tabulation (or crosstab) is a two- (or more) dimensional table which records the number (frequency) of respondents having the specific characteristics described in the table cells. Cross-tabulation tables offer a wealth of information about the relationship between the variables. Cross-tabulation analysis goes by several names in the research industry, including crosstab and contingency table analysis. Its benefits include:
●● Clean and Usable Data: Cross tabulation makes it simple to interpret data. The clarity offered by cross tabulation helps deliver clean data that can be used to improve decisions throughout an organization.
●● Easy to Understand: No advanced statistical degree is needed to interpret cross tabulation. The results are easy to read and explain. This makes it useful in any type of presentation.
1.2.10 Scatter plot and Trend line
A scatter diagram is the most fundamental graph plotted to show the relationship between two variables. It is a simple way to represent a bivariate distribution, i.e. the distribution of two random variables. The two variables are plotted along the X and Y axes; thus, every data pair (xi, yi) is represented by a point on the graph, x being the abscissa and y being the ordinate of the point. From a scatter diagram we can find whether there is any relationship between x and y, and if yes, what type of relationship. The scatter diagram thus indicates the nature and strength of the correlation.
The pattern of points obtained by plotting the observed pairs is known as a scatter diagram.
If the dots cluster around a straight line, the correlation is called linear; if the dots cluster around a curve, the correlation is called a non-linear or curvilinear correlation.
A scatter diagram is drawn to visualize the relationship between two variables. The values of the more important (independent) variable are plotted on the X-axis while the values of the other variable are plotted on the Y-axis. On the graph, dots are plotted to represent different pairs of data. When dots are plotted to represent all the pairs, we get a scatter diagram. The way the dots scatter gives an indication of the kind of relationship which exists between the two variables. While drawing a scatter diagram, it is not necessary to start the axes at the zero values of the X and Y variables; the minimum values of the variables considered may be taken as the origin.
●● When there is a positive correlation between the variables, the dots on the
scatter diagram run from left hand bottom to the right hand upper corner. In
case of perfect positive correlation all the dots will lie on a straight line.
●● When a negative correlation exists between the variables, dots on the scatter
diagram run from the upper left hand corner to the bottom right hand corner. In
case of perfect negative correlation, all the dots lie on a straight line.
Example: Figures on advertisement expenditure (X) and Sales (Y) of a firm for the last ten years are given below. Draw a scatter diagram.

Sales (Y) in Lakh `:  45   56   58   82   65   70   64   85   50   85

Solution:
(Figure: scatter diagram of the above data.)
A scatter diagram gives two very useful types of information. First, we can observe patterns between variables that indicate whether the variables are related. Secondly, if the variables are related, we can get an idea of what kind of relationship (linear or non-linear) would describe the relationship. Correlation analysis examines the first question, determining whether an association exists between the two variables, and if it does, to what extent.
1.3.1 Arithmetic Mean
Arithmetic mean is defined as the value obtained by dividing the total value of all items in the series by their number. In other words, it is defined as the sum of the given observations divided by the number of observations, i.e., add the values of all items and divide the total by the number of all items.
Symbolically – X = (x1 + x2 + x3 + … + xn) / n
3. Finding the combined arithmetic mean when different groups are given.
Demerits of Arithmetic Mean
1. Arithmetic mean is affected by the extreme values.
2. Arithmetic mean cannot be determined by inspection and cannot be located
graphically.
3. Arithmetic mean cannot be obtained if a single observation is lost or missing.
4. Arithmetic mean cannot be calculated when open-end class intervals are
present in the data.
Arithmetic Mean for Ungrouped Data
Individual Series
1. Direct Method
The following steps are involved in calculating the arithmetic mean for an individual series using the direct method:
- Add together the values of all the items to obtain Ʃx.
- Divide the sum of the values by the number of items. The result is the arithmetic mean.
The following formula is used: X = Ʃx/N
Illustration - 1 Calculate the arithmetic mean of the following data:
125, 128, 132, 135, 140, 148, 155, 157, 159, 161

Solution –
Ʃx = 125 + 128 + 132 + 135 + 140 + 148 + 155 + 157 + 159 + 161 = 1440
X = Ʃx/n = 1440/10
= 144
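As a minimal sketch in Python, the direct method is simply a sum divided by a count (using the observations listed above):

values = [125, 128, 132, 135, 140, 148, 155, 157, 159, 161]
mean = sum(values) / len(values)     # sum of the values divided by the number of items
print(mean)                          # 144.0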
Illustration - 1 Calculate the arithmetic average of the data given below using the short-cut method:

Roll No    1    2    3    4    5    6    7    8    9    10
Marks      43   48   65   57   31   60   37   48   78   59
Solution – (taking assumed mean A = 60)

Roll No    Marks (X)    d = X – 60
1          43           -17
2          48           -12
3          65           5
4          57           -3
5          31           -29
6          60           0
7          37           -23
8          48           -12
9          78           18
10         59           -1
                        Ʃd = –74

X = A + Ʃd/N = 60 + (–74)/10 = 60 – 7.4 = 52.6
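A small sketch of the short-cut method in Python, assuming the same assumed mean A = 60:

marks = [43, 48, 65, 57, 31, 60, 37, 48, 78, 59]
A = 60                                   # assumed mean
d = [x - A for x in marks]               # deviations from the assumed mean
mean = A + sum(d) / len(marks)           # X = A + (sum of d) / N
print(sum(d), mean)                      # -74  52.6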
If the arithmetic means and the numbers of items of two or more related groups are known, the mean of all the groups taken together is known as the combined mean of the entire group. The combined average of two series can be calculated by the given formula –
X12 = (n1x1 + n2x2) / (n1 + n2)
Where, n1 = No. of items of the first group, n2 = No. of items of the second group, and x1, x2 = means of the first and second groups.
Example - From the following data ascertain the combined mean salary of a factory with two groups of workers: Group I has n1 = 500 workers with average salary x1 = 300, and Group II has n2 = 1,000 workers with average salary x2 = 250.

Solution:
Combined mean = (n1x1 + n2x2) / (n1 + n2)
= (500 × 300 + 1,000 × 250) / 1,500
= (1,50,000 + 2,50,000) / 1,500
= 266.66
Weighted Arithmetic Mean
Sometimes, some observations are relatively more important than other observations. Weights are then assigned to the observations on the basis of their relative importance. In the weighted arithmetic mean, the value of each item is multiplied by its weight, and the sum of these products is divided by the sum of the weights.
Symbolically, Xw = Ʃwx / Ʃw
Example – Calculate the simple and weighted average price per tonne from the following data –
No. of tonnes: 25   30   40   50   10   45

Solution:

Month    Price per tonne (x)    No. of tonnes (w)    wx
March    50                     40                   2,000
April    52                     50                   2,600
June     54                     45                   2,430

Simple AM
X = Ʃx/n = 294/6 = 49

Weighted AM
Xw = Ʃwx/Ʃw = 50.30

The correct average price paid is ` 50.30 and not ` 49, i.e., the weighted arithmetic mean is more appropriate here than the simple arithmetic mean.
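A generic sketch of the weighted arithmetic mean in Python. The prices and quantities below are only the three months shown in the table above (the full data of the example is not reproduced in the text), so the numbers are illustrative:

prices = [50, 52, 54]          # price per tonne (x) for the months shown
tonnes = [40, 50, 45]          # corresponding weights (w)

weighted_mean = sum(w * x for w, x in zip(tonnes, prices)) / sum(tonnes)   # Ʃwx / Ʃw
simple_mean = sum(prices) / len(prices)                                    # Ʃx / n
print(simple_mean, round(weighted_mean, 2))    # 52.0  52.07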
1.3.2 Median
Median is defined as the value of the item dividing the series into two equal halves, where one half contains all values less than (or equal to) it and the other half contains all values greater than (or equal to) it. It is also defined as the central value of the variable. To find the median, the values of the items must be arranged in order of their size or magnitude.
Median is a positional average. The term position refers to the place of a value in the series; the place of the median is such that an equal number of items lie on either side of it. Therefore it is also called a locative average.
Merits of Median
Following are the advantages of median:
●● It is rigidly defined.
●● It is easy to calculate and understand.
●● It can be located graphically.
●● It is not affected by extreme values like the arithmetic mean.
●● It can be found by mere inspection.
●● It can be used for qualitative studies.
●● Even if the extreme values are unknown, median can be calculated if one knows the number of items.
Demerits of Median
Following are the disadvantages of median:
●● It is not based on all the observations of the series.
●● It is not capable of further algebraic treatment.
●● It requires the data to be arranged in order of magnitude, which can be tedious for a large number of items.
Application of Median
Example – Determine the median from the following –

S. No.    Value
3         23
4         23
5         25
6         25
7         25
8         27
9         40

Median = size of the (N + 1)/2 th item = (9 + 1)/2 = 5th item
= 25
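A minimal sketch of locating the median of an ungrouped series in Python, using only the seven items visible in the table above:

values = [23, 23, 25, 25, 25, 27, 40]
values.sort()
n = len(values)
if n % 2 == 1:
    median = values[n // 2]                              # middle item when n is odd
else:
    median = (values[n // 2 - 1] + values[n // 2]) / 2   # average of the two middle items
print(median)                                            # 25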
1.3.3 Mode - Intro and Application
The word “mode” is derived from the French word “la mode”, which means fashion. So the mode can be regarded as the most fashionable item in the series or the group.
Croxton and Cowden regard mode as “the most typical of a series of values”. As a result, it can sum up the characteristics of a group more satisfactorily than the arithmetic mean or the median.
Mode is defined as the value of the variable occurring most frequently in a distribution. In other words, it is the most frequent size of item in a series.
Merits of Mode
The following are the merits of mode:
●● It is easy to understand and, in many cases, can be located by mere inspection.
●● It is not affected by extreme values.
●● It can be located graphically (with the help of a histogram).
●● It can be determined even for distributions with open-end classes.
Demerits of Mode
The following are the demerits of mode:
●● If mode is multiplied by the number of items, the product will not be equal to
the total value of the items.
●● It will not truly represent the group if there are a small number of items of the
same size in a large group of items of different sizes
●● It is not suitable for further mathematical treatment
Applications of Mode
Individual Series
The mode of this series can be obtained by mere inspection. The number which
occurs most often is the mode.
Illustration - 1
Solution:
On inspection, it is observed that the number 9 has the maximum frequency, i.e., it is repeated 4 times, more often than any other number. Therefore mode (Z) = 9.
Discrete Series
The mode is calculated by applying grouping and analysis table.
●● Grouping Table: It consists of six columns. The 1st column contains the given frequencies, the 2nd and 3rd columns contain groupings of the frequencies in twos, and the 4th, 5th and 6th columns contain groupings of the frequencies in threes.
●● Analysis Table: It consists of 2 columns, namely tally bars and frequency.
Steps in Calculating Mode in Discrete Series
The following steps are involved in calculating mode in discrete series:
●● Group the frequencies in twos.
●● Leave the first frequency and group the other frequencies in twos.
●● Leave the frequencies of the first two sizes and add the frequencies of the other sizes in threes.
●● Prepare an analysis table to find the size occurring the maximum number of times. The size which occurs the largest number of times is the mode.
Continuous Series
Find out the modal class. The modal class can easily be found by inspection: the class containing the maximum frequency is the modal class. Where two or more classes appear to be the modal class, it can be decided by the grouping process and by preparing an analysis table as discussed above. The mode is then obtained by the formula:
Mo = l + [(fm – f1) / (2fm – f1 – f2)] × i
where l = lower limit of the modal class, fm = frequency of the modal class, f1 = frequency of the preceding class, f2 = frequency of the succeeding class, and i = width of the class interval.
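A sketch of the grouped-mode formula in Python; the class data used below is illustrative, not taken from the text:

def grouped_mode(lower, fm, f1, f2, width):
    # Mo = l + (fm - f1) / (2*fm - f1 - f2) * i
    return lower + (fm - f1) / (2 * fm - f1 - f2) * width

# illustrative classes 10-20, 20-30, 30-40 with frequencies 8, 12, 5 (modal class 20-30)
print(round(grouped_mode(lower=20, fm=12, f1=8, f2=5, width=10), 2))   # 23.64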
Percentiles and Quartiles
A percentile is the value below which a given percentage of the data falls.
Example: You are the fourth tallest person in a group of 20. If your height is 1.65 m, then “1.65 m” is the 80th percentile height in that group (80% of the group are shorter than you).
Quartiles are the values that split data into quarters. Quartiles are values that divide
a (part of a) data table into four groups containing an approximately equal number of
observations. The total of 100% is split into four equal parts: 25%, 50%, 75% and 100%.
The Quartiles also divide the data into divisions of 25%, so:
●● Quartile 1 (Q1) can be called the 25th percentile
●● Quartile 2 (Q2) can be called the 50th percentile
●● Quartile 3 (Q3) can be called the 75th percentile
Example:
For 1, 3, 3, 4, 5, 6, 6, 7, 8, 8:
●● The 25th percentile = 3
●● The 50th percentile = 5.5
●● The 75th percentile = 7
The percentiles and quartiles are computed by arranging the data in ascending order and then locating the appropriate values.
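The quartiles of the small data set above can be checked with a minimal Python sketch that splits the ordered data into halves. This follows the convention used in the example; other percentile conventions can give slightly different answers.

def median(xs):
    n = len(xs)
    mid = n // 2
    return xs[mid] if n % 2 else (xs[mid - 1] + xs[mid]) / 2

data = sorted([1, 3, 3, 4, 5, 6, 6, 7, 8, 8])
q2 = median(data)                           # 50th percentile (median)
q1 = median(data[:len(data) // 2])          # median of the lower half
q3 = median(data[(len(data) + 1) // 2:])    # median of the upper half
print(q1, q2, q3)                           # 3  5.5  7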
Measures of Dispersion
A measure of dispersion or variation in any data shows the extent to which the numerical values tend to spread about an average. If the difference between items is small, the average represents and describes the data adequately. For large differences, it is proper to supplement the information by calculating a measure of dispersion in addition to an average. A measure of dispersion is useful for the following purposes:
●● To suggest methods to control variation in the data.
A study of variations helps us in knowing the extent of uniformity or consistency in
any data. Uniformity in production is an essential requirement in industry. Quality control
methods are based on the laws of dispersion.
Absolute and Relative Measures of Dispersion
The measures of dispersion can be either ‘absolute’ or “relative”. Absolute
measures of dispersion are expressed in the same units in which the original data
are expressed. For example, if the series is expressed as Marks of the students in a
particular subject; the absolute dispersion will provide the value in Marks. The only
difficulty is that if two or more series are expressed in different units, the series cannot
be compared on the basis of dispersion.
‘Relative’ or ‘Coefficient’ of dispersion is the ratio or the percentage of a measure
of absolute dispersion to an appropriate average. The basic advantage of this measure
is that two or more series can be compared with each other despite the fact they are
expressed in different units.
A precise measure of dispersion is one that gives the magnitude of the variation
in a series, i.e. it measures in numerical terms, the extent of the scatter of the values
around the average.
A good measure of dispersion should have properties similar to those described for a good measure of central tendency.
Graphical Method
Range
Definition: The ‘Range’ of the data is the difference between the largest value of
data and smallest value of data.
Range = L – S
Where L = Largest value and S = Smallest value
In individual observations and discrete series, L and S are easily identified. In a continuous series, either of the following two methods may be used:
(i) L = Upper boundary of the highest class, S = Lower boundary of the lowest class.
(ii) L = Mid value of the highest class, S = Mid value of the lowest class.
Solution: L = 12, S = 5
Range = L – S
= 12 – 5
= 7
Coefficient of range = (L – S) / (L + S)
= (12 – 5) / (12 + 5)
= 7/17
= 0.4118
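A minimal Python sketch of the range and coefficient of range; the observations below are illustrative, chosen so that L = 12 and S = 5 as in the solution above:

data = [5, 7, 9, 10, 12]                 # illustrative observations
L, S = max(data), min(data)
value_range = L - S                      # Range = L - S
coeff_of_range = (L - S) / (L + S)       # Coefficient of range
print(value_range, round(coeff_of_range, 4))   # 7  0.4118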
Interquartile Range and Deviations
Inter-quartile range and deviations are described in the following sub sections.
Inter-quartile Range
Inter-quartile range is a difference between upper quartile (third quartile) and lower
quartile (first quartile). Thus, Inter Quartile Range = (Q3 - Q1)
Quartile deviation
Quartile Deviation is the average of the difference between upper quartile and
lower quartile.
Quartile Deviation (QD) also gives the average deviation of upper and lower
quartiles from Median.
QD = (Q3 – Q1)/2;  Coefficient of QD = (Q3 – Q1) / (Q3 + Q1)
Example – Calculate the quartile deviation from the following data:
Weekly wages (`):   100    200    400    500    600
No. of weeks:         5      8     21     12      6     (Total 52)
Solution:
Weekly wages    No. of weeks    Cumulative frequency
100             5               5
200             8               13
400             21              34
500             12              46
600             6               52
                N = 52
Q1 = size of the (N + 1)/4 th item
= (52 + 1)/4 = 13.25th item
= 200 + 0.25 (400 – 200)
= 200 + 0.25 × 200
= 200 + 50
= 250

Q3 = size of the 3(N + 1)/4 th item
= 3 × 13.25 = 39.75th item
= 500

Q.D. = (Q3 – Q1) / 2
= (500 – 250) / 2
= 250/2
= 125
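The interpolation used for Q1 and Q3 above can be written as a short Python sketch over the cumulative frequency table:

wages = [100, 200, 400, 500, 600]
weeks = [5, 8, 21, 12, 6]
N = sum(weeks)                           # 52

cum, running = [], 0
for f in weeks:
    running += f
    cum.append(running)

def item_value(k):
    # wage of the k-th item (1-based) read from the cumulative frequency column
    for w, c in zip(wages, cum):
        if c >= k:
            return w

def value_at(position):
    # interpolate between the items on either side of a fractional position,
    # as done in the worked example above
    k = int(position)
    frac = position - k
    return item_value(k) + frac * (item_value(k + 1) - item_value(k))

q1 = value_at((N + 1) / 4)               # 13.25th item -> 250.0
q3 = value_at(3 * (N + 1) / 4)           # 39.75th item -> 500.0
print(q1, q3, (q3 - q1) / 2)             # 250.0 500.0 125.0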
Variance is defined as the average of the squared deviations of the data points from their mean. When the data constitute a sample, the variance is denoted by σx² and the averaging is done by dividing the sum of the squared deviations from the mean by (n – 1). When the observations constitute the whole population, the variance is denoted by σ² and we divide by N for the average.
Different formulas for calculating variance:

Sample Variance: Var(x) = σx² = Ʃ(xi – x̄)² / (n – 1)

Population Variance: Var(x) = σ² = Ʃ(xi – µ)² / N

Where,
x̄ = Sample mean
n = Sample size
µ = Population mean
N = Population size

The population variance can also be written as,
Var(x) = σ² = Ʃ(xi – µ)² / N
= Ʃ(xi² – 2µxi + µ²) / N
= [Ʃxi² – 2µƩxi + µ²Ʃ(1)] / N
= Ʃxi²/N – µ²

i.e., Var(x) = E(X²) – [E(X)]²
Standard deviation
Definition: Standard Deviation is the root mean square deviation of the values from their arithmetic mean. S.D. is denoted by the symbol σ (read sigma). The Standard Deviation (SD) of a set of data is the positive square root of the variance of the set. It is also referred to as the Root Mean Square (RMS) value of the deviations of the data points. The SD of a sample is the square root of the sample variance, i.e. equal to σx, and the standard deviation of a population is the square root of the variance of the population and is denoted by σ.
Merits and Limitations of Standard Deviation
●● It is the most important and widely used measure of variability.
●● It is based on all the observations.
●● Further mathematical treatment is possible.
●● It is affected least by any sampling fluctuations.
●● It is affected by extreme values and gives more importance to the values that are away from the mean.
●● The main limitation is that we cannot compare the variability of different data sets given in different units.
The standard deviation can be computed directly from the observations as

σ = √[ Ʃx²/n – (Ʃx/n)² ]

If an assumed value A is taken for the mean and d = X – A, then

σ = √[ Ʃd²/n – (Ʃd/n)² ]

For a frequency distribution (using step deviations d = (X – A)/C),

σ = √[ Ʃfd²/N – (Ʃfd/N)² ] × C

Where, N = Total frequency and C = width of the class interval.
Example – Find the standard deviation of the following frequency distribution:

Class Interval    Frequency
0-10              6
10-20             14
20-30             10
30-40             8
40-50             1
50-60             3
60-70             8

Solution:

Class Interval    Mid value (m)    f     fm      d = m – 30    d²       fd²
0-10              5                6     30      –25           625      3,750
10-20             15               14    210     –15           225      3,150
20-30             25               10    250     –5            25       250
30-40             35               8     280     5             25       200
40-50             45               1     45      15            225      225
50-60             55               3     165     25            625      1,875
60-70             65               8     520     35            1,225    9,800
Total                              50    1,500                           19,250

Mean = Ʃfm/N = 1500/50 = 30
SD = √(Ʃfd²/N) = √(19250/50) = √385 = 19.62
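The mean and standard deviation of the grouped data above can be verified with a short Python sketch:

mids = [5, 15, 25, 35, 45, 55, 65]       # mid values of the class intervals
freq = [6, 14, 10, 8, 1, 3, 8]
N = sum(freq)                            # 50

mean = sum(f * m for f, m in zip(freq, mids)) / N                       # 1500/50 = 30.0
var  = sum(f * (m - mean) ** 2 for f, m in zip(freq, mids)) / N         # 19250/50 = 385.0
sd   = var ** 0.5
print(mean, var, round(sd, 2))           # 30.0 385.0 19.62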
Combined Standard Deviation
Standard Deviation of Combined Means
The mean and S.D. of two groups are given in the following table:

Group    Mean    S.D.    Size
I        x1      σ1      n1
II       x2      σ2      n2

Let X and σ be the mean and S.D. of the combined group of (n1 + n2) items. Then

X = (n1x1 + n2x2) / (n1 + n2)

σ² = [n1σ1² + n2σ2² + n1d1² + n2d2²] / (n1 + n2),  where d1 = x1 – X and d2 = x2 – X

For three groups,

X = (n1x1 + n2x2 + n3x3) / (n1 + n2 + n3)

σ² = [n1σ1² + n2σ2² + n3σ3² + n1d1² + n2d2² + n3d3²] / (n1 + n2 + n3)
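A sketch of the combined mean and combined standard deviation for any number of groups. The three groups used below are illustrative, not data from the text:

def combined_mean_sd(sizes, means, sds):
    n_total = sum(sizes)
    grand_mean = sum(n * m for n, m in zip(sizes, means)) / n_total
    # each group contributes its own variance plus the squared distance of its
    # mean from the grand mean (the d terms in the formula above)
    var = sum(n * s ** 2 + n * (m - grand_mean) ** 2
              for n, m, s in zip(sizes, means, sds)) / n_total
    return grand_mean, var ** 0.5

print(combined_mean_sd([50, 30, 20], [60.0, 55.0, 45.0], [8.0, 7.0, 6.0]))
# (55.5, 9.28...) for these illustrative groups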
Coefficient of Variation (CV)
It is defined as the ratio of the SD to the mean, multiplied by 100.
CV = (σ / μ) × 100
This is also called relative variability. A smaller value of CV indicates greater stability and lesser variability.
Example: Two batsmen A and B made the following scores in the preliminary round of a World Cup series of cricket matches.

A: 14, 13, 26, 53, 17, 29, 79, 36, 84 and 49

Who will you select for the final? Justify your answer.

Solution: We will first calculate the mean, standard deviation and Karl Pearson’s coefficient of variation. We will select the player based on the average score as well as consistency: we want not only the player who has been scoring at a high average but also the one doing it consistently, so that the probability of his playing a good innings in the final is high.

For Player ‘A’ (Using Direct Method)

Score xi    Deviation (xi – µ)    (xi – µ)²    xi²
14          –26                   676          196
13          –27                   729          169
26          –14                   196          676
53          13                    169          2,809
17          –23                   529          289
29          –11                   121          841
79          39                    1,521        6,241
36          –4                    16           1,296
84          44                    1,936        7,056
49          9                     81           2,401
Ʃxi = 400                         Ʃ(xi – µ)² = 5,974

Now,
Mean µ = Ʃxi/10 = 400/10 = 40
Variance = Var(x) = Ʃ(xi – µ)²/N = 5974/10 = 597.4
SD σ = √597.4 = 24.44 (approx.)
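The comparison rests on the coefficient of variation, CV = (σ/µ) × 100. A sketch for player A’s scores is given below; player B’s scores are not reproduced in the text, so only A is shown:

scores_a = [14, 13, 26, 53, 17, 29, 79, 36, 84, 49]
n = len(scores_a)
mean = sum(scores_a) / n                                   # 40.0
variance = sum((x - mean) ** 2 for x in scores_a) / n      # 597.4
sd = variance ** 0.5                                       # about 24.44
cv = sd / mean * 100                                       # Karl Pearson's CV
print(mean, round(sd, 2), round(cv, 1))                    # 40.0 24.44 61.1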
Key Terms
●● Sample: A sample consists of one or more observations drawn from the population. The sample is the group of people who actually took part in your research.
●● Population: A population includes all of the elements from a set of data. The population is the broader group of people to whom you expect to generalize your study results.
●● Frequency Polygon: These are the frequencies plotted against the mid-points of
the class-intervals and the points thus obtained are joined by line segments
●● Bar Diagram: Only the length of the bar is taken into account, not the width. In other words, a bar is a thick line whose width is shown merely for clarity; since only the length of the bar is taken into account, it is called a one-dimensional diagram.
●● Simple Bar Diagram: It represents only one variable. Since these are of the same
width and vary only in lengths (heights), it becomes very easy for comparative
study. Simple bar diagrams are very popular in practice.
●● Percentage Bar Diagram: The length of the entire bar is kept equal to 100. The various segments of each bar represent percentages of the aggregate.
●● Range: The ‘Range’ of the data is the difference between the largest value of data
and smallest value of data.
Check your progress
1. A frequency polygon is constructed by plotting the frequency of the class interval against the
a) Lower limit of the class
b) Upper limit of the class
c) Any value of the class
d) Middle limit of the class
2. The branch of statistics concerned with summarizing and describing the collected data is known as
a) Inferential Statistics
b) Descriptive Statistics
c) Education Statistics
d) Business Statistics
3. A histogram consists of a set of
a) Adjacent triangles
b) Adjacent rectangles
c) Parts
d) Groups
a) Histogram
b) Pie chart
c) Frequency Polygon
d) Ogive
Questions and Exercises
1. What do you mean by statistics?
2. What are the various types of bar diagrams?
3. What are the merits of the mean, median and mode?
4. What do you understand by standard deviation and combined standard deviation?
5. Find the standard deviation for the following data:
Frequency 8 14 10 6 4 3 8
Check your progress
1. d) Middle limit of the class
2. b) Descriptive Statistics
3. b) Adjacent rectangles
4. c) Parts
5. b) Pie chart
Further Readings
Bibliography
7. Gould F J, Introduction to Management Science, Englewood Cliffs, N.J., Prentice Hall.
8. Naray J K, Operations Research: Theory and Applications, Macmillan, New Delhi.
9. Taha Hamdy, Operations Research, Prentice Hall of India.
10. Tulasian, Quantitative Techniques, Pearson Education.
11. Vohra N.D., Quantitative Techniques in Management, TMH.
12. Stevenson W.D., Introduction to Management Science, TMH.
Learning Objective:
●● To get familiarized with business problems associated with the concepts of probability and probability distributions
●● To understand the MS Excel applications of Binomial, Poisson and Normal
probabilities
Learning Outcome:
At the end of the course, the learners will be able to –
●● Understand various theorems and principles of probability
2.1.1 Probability – Introduction
A probability is the quantitative measure of risk. Statistician I.J. Good suggests, “The
theory of probability is much older than the human species, since the assessment of
uncertainty incorporates the idea of learning from experience, which most creatures do.”
Probability and sampling are inseparable parts of statistics. Before we discuss
probability and sampling distributions, we must be familiar with some common terms
used in theory of probability. Although these terms are commonly used in business, they
have precise technical meaning.
●● Experiment: Any process that generates a set of outcomes under study is called an experiment, for example, sampling from a production lot. A random experiment is an experiment whose outcome is not predictable in advance; there is a chance or risk (sometimes also called uncertainty) associated with each outcome.
●● Sample space: The set of all possible outcomes of a random experiment is called the sample space, denoted by S.
Example: If the random experiment is rolling of a die, the sample space is a set, S
= {1, 2, 3, 4, 5, 6}.
Similarly, if the random experiment is tossing of three coins, the sample space is, S
= {HHH, HHT, HTH, THH, HTT, THT, TTH, TTT} with total of 8 possible outcomes. (H is
heads, and T is Tails showing up.)
If we select a random sample of 2 items from a production lot and check them for
defect, the sample space will be S = {DD, DS, DR, RS, RR, SS} where D stands for
defective, S stands for serviceable and R stands for re-workable.
●● Event: One or more possible outcomes that belong to certain category of our
interest are called as event. A sub set E of the sample space S is an event. In
other words, an event is a favorable outcome.
It may be noted that, usually, in probability and statistics we are interested in the number of elements in the sample space and the number of elements in the event space.
●● Union of events: If E and F are two events, then another event defined to include
all outcomes that are either in E or in F or in both is called as a union of events E
and F. It is denoted as E U F.
●● Intersection of events: If E and F are two events, then another event defined to
include all outcomes that are in both E and F is called as an intersection of events
E and F. It is denoted as E∩ F.
●● Mutually exclusive events: The events E and F are said to be mutually exclusive
events if they have no outcome of the experiment common to them. In other
words, events E and F are said to be mutually exclusive events if E∩ F = φ, where
φ is a null or empty set.
●● Collectively exhaustive events: The events are collectively exhaustive if their
union is the sample space.
●● Complement of event: The complement of an event E is an event which consists of all outcomes that are not in E. It is denoted as EC. Thus, E ∩ EC = φ and E U EC = S.
2.1.2 Types of Events
A probability event can be defined as a set of outcomes of an experiment. In other words, an event in probability is a subset of the respective sample space. A random experiment’s entire potential set of outcomes is the sample space, and the individual outcomes are called sample points.
For example –
The sample space for the tossing of three coins simultaneously is given by:
S = {(T, T, T), (T, T, H), (T, H, T), (T, H, H), (H, T, T), (H, T, H), (H, H, T), (H, H, H)}
Suppose we want to find only the outcomes which have at least two heads; then the set of all such possibilities can be given as:
E = {(T, H, H), (H, T, H), (H, H, T), (H, H, H)}
There could be a lot of events associated with a given sample space. For any
event to occur, the outcome of the experiment must be an element of the set of event E.
)A
Example events:
●● Getting a Head when tossing a coin
●● Rolling a ‘5’ with a die
Events can be:
●● Independent (each event is not affected by other events)
●● Dependent (also called “Conditional”, where an event is affected by other events)
●● Mutually Exclusive (events that can’t happen at the same time)
2.1.3 Algebra of Events
Events are the outcomes of an experiment. The likelihood of an event occurring is the ratio of the number of favourable outcomes to the total number of outcomes. Often two events occur together, or it may happen that only one of them occurs. The algebra of events defines, from two given events, a new event through certain operations: the union, intersection, complement and difference of the two events. As events are subsets of the sample space, these operations are performed as set operations.
Complementary Events
For an event A, there is a complementary event B such that B represents the set of outcomes which are not in the set A. For example, if two coins are tossed together then the sample space will be {HT, TH, HH, TT}. Let A be the event of getting exactly one head; then A = {HT, TH}. The complementary event of A is B = {HH, TT}.
Events with AND
AND stands for the intersection of two sets. An event is the intersection of two events if it contains the members present in both the events. For example, if a pair of dice is rolled then the sample space will have 36 members. Suppose A is the event of both dice showing the same number and B is the event of the sum being 6.
A = {(1,1), (2,2), (3,3), (4,4), (5,5), (6,6)}
B = {(3,3), (1,5), (5,1), (2,4), (4,2)}
A AND B = {(3,3)}
Events with OR
OR stands for the union of two sets. An event is called the union of two events if it contains the members present in either of the sets. For example, if two coins are tossed together the sample space is S = {HT, TH, TT, HH}. Let event A be the event of having exactly one head and event B be the event of having two heads.
A = {HT, TH}
B = {HH}
Union of A and B:  A OR B = {HT, TH, HH}
Event A but not B
For two events A and B, “A but not B” is the event having all the elements of A but excluding the elements of B. This can also be represented as A – B. Suppose there is an experiment of choosing 4 cards from a deck of 52 cards. The event A is having all cards as red cards and event B is having all cards as kings. Then the event “A but not B” will have all red cards excluding the two red kings.
2.1.4 Addition Rule of Probability
If one task can be done in n1 ways and other task can be done in n2 ways and
if these tasks cannot be done at the same time, then there are (n1+n2) ways of doing
one of these tasks (either one task or the other). When logical OR is used in deciding
outcomes of the experiment and events are mutually exclusive then the ‘Sum Rule’ is
applicable.
1. If ‘A’ and ‘B’ are any two events then the probability of the occurrence of either ‘A’ or ‘B’ is given by:
P(A U B) = P(A) + P(B) – P(A ∩ B)
2. If ‘A’ and ‘B’ are two mutually exclusive events then the probability of occurrence of either A or B is given by:
P(A U B) = P(A) + P(B)
Example: An urn contains 10 balls of which 5 are white, 3 black and 2 red. If we
select one ball randomly, how many ways are there that the ball is either white or red?
Solution:
Answer is 5 + 2 = 7.
Example: In a triangular series the probability of the Indian team winning its match with Zimbabwe is 0.7 and that with Australia is 0.4. If the probability of India winning both matches is 0.3, what is the probability that India will win at least one match so that it can enter the final?

Solution:
Given, P(Z) = 0.7, P(A) = 0.4 and P(Z ∩ A) = 0.3
Therefore, the probability that India will win at least one match is,
P(Z U A) = P(Z) + P(A) – P(Z ∩ A) = 0.7 + 0.4 – 0.3
= 0.8
2.1.5 Multiplication Rule of Probability
Suppose that a procedure can be broken down into a sequence of two tasks. If there are n1 ways to do the first task and n2 ways to do the second task after the first task has been done, then there are (n1 × n2) ways to do the procedure. In general, if r experiments are to be performed such that the first experiment can result in n1 outcomes, and having completed the first experiment the second experiment can result in n2 outcomes, the third experiment in n3 outcomes, and so on, then there is a total of n1 × n2 × n3 × … × nr possible outcomes of the r experiments.
If ‘A’ and ‘B’ are two independent events then the probability of occurrence of ‘A’
and ‘B’ is given by:
P (A∩B) = P (A) P (B)
It must be remembered that when the logical AND is used to indicate successive
experiments then, the ‘Product Rule’ is applicable.
Example: How many outcomes are there if we toss a coin and then throw a dice?
Answer is 2 × 6 = 12.
Example: It has been found that 80% of all tourists who visit India visit Delhi, 70% of them visit Mumbai and 60% of them visit both.
1. What is the probability that a tourist will visit at least one city?
2. Also, find the probability that he will visit neither city.

Solution:
Let D indicate a visit to Delhi and M denote a visit to Mumbai.
Given, P(D) = 0.8, P(M) = 0.7 and P(D ∩ M) = 0.6
1. P(D U M) = P(D) + P(M) – P(D ∩ M) = 0.8 + 0.7 – 0.6 = 0.9
2. P(neither city) = 1 – P(D U M) = 1 – 0.9 = 0.1
2.1.6 Conditional Probability
The probability that event E occurs given that event F has already occurred is called conditional probability and is denoted by P(E | F). If event F occurs, then our sample space is reduced to the event space of F. Also, now for event E to occur, we must have both events E and F occur simultaneously. Hence the probability that event E occurs, given that event F has occurred, is equal to the probability of EF (that is, E ∩ F) relative to the probability of F. Thus,

P(E | F) = P(EF) / P(F)

Another variation of the conditional probability rule is

P(EF) = P(E | F) × P(F)
Conditional probability satisfies all the properties and axioms of probabilities. Now
onwards, we would write (E ∩ F) as EF, which is a common convention.
Conditional probability is the probability that an event will occur given that another
event has already occurred. If A and B are two events, then the conditional probability of
A given B is written as P (A/B) and read as “the probability of A given that B has already
occurred.”
Example: The probability that a new product will be successful if a competitor does not launch a similar product is 0.67. The probability that a new product will be successful in the presence of a competitor’s new product is 0.42. The probability that the competitor will launch a new product is 0.35. What is the probability that the product will be a success?

Solution: Let S denote that the product is successful, L denote that the competitor will launch a product and LC denote that the competitor will not launch the product. Now, from the given data,
P(S | LC) = 0.67, P(S | L) = 0.42, P(L) = 0.35 and P(LC) = 1 – 0.35 = 0.65

Now, using the conditional probability (total probability) formula, the probability that the product will be a success, P(S), is
P(S) = P(S | L) P(L) + P(S | LC) P(LC)
= 0.42 × 0.35 + 0.67 × 0.65
= 0.147 + 0.4355
= 0.5825
Consider two events, E and F. Whatever the events, we can always say that the probability of E is equal to the probability of the intersection of E and F, plus the probability of the intersection of E and the complement of F. That is,
P(E) = P(E ∩ F) + P(E ∩ FC)
Bayes’ Formula
Since the events (EF) and (EFC) are mutually exclusive (the former must be in F and the latter must not be in F), we have by Axiom 3,
P(E) = P(EF) + P(EFC) = P(E | F) P(F) + P(E | FC) P(FC)
More generally, let F1, F2, ..., Fn be mutually exclusive and collectively exhaustive events. Suppose now that E has occurred and we are interested in determining the probability that Fi has occurred; then, using the above equations, we have the following proposition.
P(Fi | E) = P(EFi) / P(E) = P(E | Fi) × P(Fi) / [ Ʃ P(E | Fk) × P(Fk) ]   for all i = 1, 2, ..., n
(the sum in the denominator running over k = 1, 2, ..., n)
If the events Fi are thought of as possible ‘hypotheses’ about some subject matter, say the market shares of competitors, then Bayes’ formula tells us how these hypotheses should be modified by the new evidence of an experiment, say a market survey.
Example: A bin contains 3 different types of lamps. The probability that a type 1
lamp will give over 100 hours of use is 0.7, with the corresponding probabilities for type
2 and 3 lamps being 0.4 and 0.3 respectively. Suppose that 20 per cent of the lamps in
the bin are of type 1, 30 per cent are of type 2 and 50 per cent are of type 3. What is the
probability that a randomly selected lamp will last more than 100 hours? Given that a
selected lamp lasted more than 100 hours, what are the conditional probabilities that it
is of type 1, type 2 and type 3?
Solution: Let type 1, type 2 and type 3 lamps be denoted by T1, T2 and T3 respectively. Also, we denote S if a lamp lasts more than 100 hours and SC if it does not. Now, as per the given data,
P(S | T1) = 0.7, P(S | T2) = 0.4, P(S | T3) = 0.3
P(T1) = 0.2, P(T2) = 0.3, P(T3) = 0.5

(a) Now, using the total probability formula,
P(S) = P(S | T1) P(T1) + P(S | T2) P(T2) + P(S | T3) P(T3)
= 0.7 × 0.2 + 0.4 × 0.3 + 0.3 × 0.5
= 0.14 + 0.12 + 0.15
= 0.41
(b) Now, using Bayes’ formula,
P(T1 | S) = P(S | T1) P(T1) / P(S) = 0.14/0.41 = 0.341
P(T2 | S) = P(S | T2) P(T2) / P(S) = 0.12/0.41 = 0.293
P(T3 | S) = P(S | T3) P(T3) / P(S) = 0.15/0.41 = 0.366
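The lamp calculation generalizes directly; the following is a minimal Python sketch of the total-probability and Bayes’ computations used above:

prior = {"T1": 0.2, "T2": 0.3, "T3": 0.5}          # P(type)
p_long = {"T1": 0.7, "T2": 0.4, "T3": 0.3}         # P(lasts > 100 hours | type)

# total probability: P(S) = sum over types of P(S | type) * P(type)
p_s = sum(p_long[t] * prior[t] for t in prior)      # 0.41

# Bayes' formula: P(type | S) = P(S | type) * P(type) / P(S)
posterior = {t: p_long[t] * prior[t] / p_s for t in prior}
print(round(p_s, 2))                                        # 0.41
print({t: round(p, 3) for t, p in posterior.items()})       # T1: 0.341, T2: 0.293, T3: 0.366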
Random variables are often classified according to the probability mass function in
case of discrete, and probability density function in case of continuous random variable.
When the distributions are entirely known, all the statistical calculations are possible. In
practice, however, the distributions may not be known fully. But it can be approximated
that the random variable to one of the known types of standard random variables by
examining the processes that make it random. These standard distributions are also
called ‘probability models’ or sample distributions. Various characteristics of distribution
like mean, variance, moments, etc. can be calculated using known closed formulae. We
will study some of the common types of probability distributions. The normal distribution is
the backbone of statistical inference and hence we will study it in more detail.
There are broadly four theoretical distributions which are generally applied in practice. They are:
1. Bernoulli distribution
2. Binomial distribution
3. Poisson distribution
4. Normal distribution
In probability theory, the expected value of a random variable is a generalization of the weighted average and, intuitively, is the arithmetic mean of a large number of independent realizations of that variable. The expected value is also known as the expectation, mathematical expectation, mean, average, or first moment.
A random variable is a set of possible values from a random experiment. The mean of a discrete random variable X is a weighted average of the possible values that the random variable can take. Unlike the sample mean of a group of observations, which gives each observation equal weight, the mean of a random variable weights each outcome xi according to its probability, pi. The common symbol for the mean (also known as the expected value of X) is µ.
It is defined as –
µ = E[X] = Ʃ xi pi
i.e., each possible value is multiplied by its probability and the products are summed.
The formula changes slightly according to what kinds of events are happening. For most simple events, either the Expected Value formula of a Binomial Random Variable or the Expected Value formula for Multiple Events is used.
The variance of a random variable tells us how spread out the values of X are, given how likely each value is to be observed.
Variance: Var(X) = Ʃ x²p − μ²
Standard deviation: σ = √Var(X)
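A minimal Python sketch of E[X] and Var(X) for a discrete random variable; the distribution below (a fair die) is only an illustration:

values = [1, 2, 3, 4, 5, 6]
probs  = [1/6] * 6                                   # fair die, illustrative

mu  = sum(x * p for x, p in zip(values, probs))      # E[X] = sum of x * p
var = sum(x * x * p for x, p in zip(values, probs)) - mu ** 2   # Var(X) = E[X^2] - mu^2
print(round(mu, 3), round(var, 3))                   # 3.5  2.917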
2.2.4 Binomial Distribution - Introduction
We often conduct many trials which are independent and identical. Suppose we perform n independent Bernoulli trials (each with two possible outcomes), each of which results in a success with probability p and a failure with probability (1 – p). If the random variable X represents the number of successes that occur in the n trials (the order of the successes being unimportant), then X is said to be a Binomial random variable with parameters (n, p).

Note that a Bernoulli random variable is a Binomial random variable with parameters (1, p), i.e. n = 1. The probability mass function of a binomial random variable with parameters (n, p) is given by,

P(X = i) = nCi p^i (1 – p)^(n – i)   for i = 0, 1, 2, ....., n

Its mean and variance are,
μ = E[X] = np
Var[X] = np(1 – p)
2.2.5 Binomial Distribution - Application
When to use binomial distribution is an important decision. Binomial distribution
can be used when following conditions are satisfied:
●● Trials are finite (and not very large), performed repeatedly for ‘n’ times.
●● Each trial (random experiment) should be a Bernoulli trial, the one that results
in either success or failure.
●● Probability of success in any trial is ‘p’ and is constant for each trial.
Following are some of the real-life examples of applications of the binomial distribution.

Example: A classroom has 5 lamps, and on any occasion each lamp fails (burns out) independently with probability 1/3. The classroom is unusable if the number of lamps burning is less than two. What is the probability that the classroom is unusable on a random occasion?
Solution: This is a case of binomial distribution with n = 5 and p = 1/3.

The classroom is unusable if the number of burnouts is 4 or 5, that is, i = 4 or 5. Noting that,

P(X = i) = nCi p^i (1 – p)^(n – i)

the probability that the classroom is unusable on a random occasion is,

P(X = 4) + P(X = 5) = 5C4 (1/3)^4 (2/3)^1 + 5C5 (1/3)^5 (2/3)^0
= 0.0412 + 0.0041
= 0.0453
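The classroom example can be checked with a few lines of Python using the binomial p.m.f.:

from math import comb

def binom_pmf(n, p, i):
    # P(X = i) = nCi * p^i * (1 - p)^(n - i)
    return comb(n, i) * p ** i * (1 - p) ** (n - i)

n, p = 5, 1 / 3
p_unusable = binom_pmf(n, p, 4) + binom_pmf(n, p, 5)
print(round(p_unusable, 4))      # 0.0453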
Example: It is observed that 80% of T.V. viewers watch the Aap Ki Adalat programme. What is the probability that at least 80% of the viewers in a random sample of 5 watch this programme?

Solution: This is a case of binomial distribution with n = 5 and p = 0.8. At least 80% of 5 viewers means i = 4 or 5.

P(X ≥ 4) = P(X = 4) + P(X = 5) = 5C4 (0.8)^4 (0.2)^1 + 5C5 (0.8)^5 (0.2)^0
= 0.4096 + 0.3277
= 0.7373
We must remember that a cumulative binomial probability refers to the probability
that the binomial random variable falls within a specified range (e.g., is greater than or
equal to a stated lower limit and less than or equal to a stated upper limit).
2.2.6 Poisson Distribution - Introduction
A Poisson random variable with parameter λ (the average rate of occurrence) takes the values i = 0, 1, 2, ... with probability

P(X = i) = e^(–λ) λ^i / i!

P(X = i) is the probability mass function (p.m.f.) of the Poisson random variable. Its expected value and variance are,
μ = E[X] = λ
Var(X) = λ
The Poisson random variable has a wide range of applications. It can also be used as an approximation to the binomial distribution when n is large and p is small. Typical situations where it applies include:
4. Number of arrivals of calls on telephone exchange per minute.
5. Number of interrupts per second on a server.
2.2.7 Poisson Distribution-Application
Procedure for Using the Cumulative Poisson Probabilities Table
The Poisson p.m.f. for a given λ and i can be easily calculated using a scientific calculator. But while calculating cumulative probabilities, i.e. the c.d.f., manual calculations become too tedious. In such cases, we can use the Cumulative Poisson Probabilities Table.

To find the cumulative Poisson probability for a given λ and i:
●● Look for the given value of λ, i.e. the average rate, in the first column of the table.
●● In the first row look for the value of i, the number of successes.
●● Locate the cell in the column of the i value and the row of the λ value. The value contained in this cell is the cumulative Poisson probability.
Example: The average number of accidents on an expressway is five per week. Find the probability that exactly two accidents will take place in a given week. Also find the probability that at the most two accidents will take place in the next week.

Solution: Here λ = 5.
P(X = 2) = e^(–5) 5²/2! = 0.0842
P(X ≤ 2) = e^(–5) (5^0/0! + 5^1/1! + 5^2/2!) = 18.5 e^(–5) = 0.1247

For comparison, consider a binomial situation with n = 10 and p = 0.1, so that λ = np = 1, and suppose we want P(X ≤ 1).
Method I
Using the Cumulative Binomial Probabilities Table, we can read, for n = 10, p = 0.1 and i = 1, the cumulative probability as 0.7361.
Method II
P(X ≤ 1) = p(0) + p(1) = e^(–1) 1^0/0! + e^(–1) 1^1/1! = e^(–1) + e^(–1) = 0.7358
Or, using the Cumulative Poisson Probabilities Table, we can read, for λ = 1 and i = 1, the cumulative probability as 0.7358.
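A short Python sketch of the Poisson calculations for the accidents example (λ = 5):

from math import exp, factorial

def poisson_pmf(lam, i):
    # P(X = i) = e^(-lambda) * lambda^i / i!
    return exp(-lam) * lam ** i / factorial(i)

lam = 5
p_exactly_two = poisson_pmf(lam, 2)
p_at_most_two = sum(poisson_pmf(lam, i) for i in range(3))
print(round(p_exactly_two, 4), round(p_at_most_two, 4))   # 0.0842  0.1247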
Example: Service time at a counter is exponentially distributed with an average of 4 customers served per hour. Someone arrives just ahead of you. Find the probability that you will have to wait (a) more than one hour, and (b) less than half an hour.

Solution:
P(X > 1) = 1 – F(1) = e^(–4) = 0.0183
P(X < 0.5) = F(0.5) = 1 – e^(–2)
= 1 – 0.1353
= 0.8647
2.2.8 Normal Distribution- Introduction including empirical rule
Normal random variable and its distribution is commonly used in many business
and engineering problems. Many other distributions like binomial, Poisson, beta, chi-
square, students, exponential, etc., could also be approximated to normal distribution
under specific conditions. (Usually when sample size is large.)
If random variable is affected by many independent causes, and the effect of each
cause is not significantly large as compared to other effects, then the random variable
will closely follow the normal distribution, e.g., weights of coffee filled in packs, lengths
of nails manufactured on a machine, hardness of ball bearing surface, diameters of
shafts produced on lathe, effectiveness of training programme on the employees’
productivity, etc., are examples of normally distributed random variables.
Further, many sampling statistics, e.g., sample means X bar, are normally
distributed.
Empirical Rule
The empirical rule, also referred to as the three-sigma rule, is a statistical rule which states that for a normal distribution, almost all observed data will fall within three standard deviations of the mean. The empirical rule can be broken down into three parts:
●● 68% of the data falls within one standard deviation of the mean.
●● 95% falls within two standard deviations.
●● 99.7% falls within three standard deviations.
The Empirical Rule is often used in statistics for forecasting, especially when
obtaining the right data is difficult or impossible to get. The rule can give you a rough
estimate of what your data collection might look like if you were able to survey the entire
population.
The probability density function of a normal random variable with mean µ and standard deviation σ is

f(x) = [1 / (σ√(2π))] e^( –(x – µ)² / (2σ²) ),   where –∞ < x < ∞
The central limit theorem provides a theoretical base to the observation that, in practice, many random phenomena obey, approximately, a normal probability distribution. The mean of a normal random variable is E(X) = μ and its variance is Var(X) = σ². If X is normally distributed with parameters μ and σ, then the random variable aX + b is also normally distributed, with parameters (aμ + b) and (aσ).
Properties of Normal Distribution
1. It is perfectly symmetric about the mean μ.
O
2. For a normal distribution mean = median = mode.
3. It is uni-modal (one mode), with skewness = 0 and kurtosis = 0.
4. Normal distribution is a limiting form of binomial distribution when number trials n is
ity
large, and neither the probability p nor (1-p) is very small.
5. Normal distribution is a limiting case of Poisson distribution when mean μ = λ is very
large.
6. While working with probabilities of a normal distribution, we usually use normal distribution tables (more often, standard normal distribution tables).
While reading these tables, the following properties are used:
(a) The probability that a normally distributed random variable with mean μ and variance σ² lies between two specified values a and b is P(a < X < b) = area under the curve f(x) between X = a and X = b.
(b) The total area under the curve f(x) is equal to 1, of which 0.5 lies on either side of the mean.
(c) Any normal variable X can be converted into the standard normal variable z using the transformation z = (x − μ)/σ, so that
F(a) = ∫ from −∞ to a of f(x) dx = (1/√(2π)) ∫ from −∞ to (a − μ)/σ of e^(−z²/2) dz
These probabilities F(a) have been calculated for various values of a and tabulated in the standard normal distribution table.
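The table lookup can also be reproduced in software. The following minimal Python sketch (scipy.stats) applies the transformation z = (x − μ)/σ; the numbers 200 and 0.5 are borrowed from the tea-filling example that follows and are purely illustrative.

from scipy.stats import norm

mu, sigma = 200.0, 0.5            # any normal variable X ~ N(mu, sigma)
a = 201.0
z = (a - mu) / sigma              # transformation z = (x - mu) / sigma
print(round(norm.cdf(z), 4))      # F(a) as read from the standard normal table
print(round(norm.cdf(a, loc=mu, scale=sigma), 4))  # same value, computed directly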
Example: Tea is filled into packs of 200 gm by a machine with a variance of 0.25 gm². Packs weighing less than 200 gm would be rejected by customers and are not legally acceptable. The marketing and legal departments therefore request the production manager to set the machine to fill slightly more quantity in each pack. However, the finance department objects to this, since it would lead to financial loss due to overfilling the packs. The general manager wants to know the 99% confidence interval when the machine is set at 200 gm, so that he can take a decision. Find the confidence interval. What is your advice to the production manager?
Solution:
We know that the mean μ = 200 gm and the variance σ² = 0.25 gm², i.e., σ = 0.5 gm.
First, we find the value of z for 99% confidence. The standard normal distribution curve is symmetric about the mean. Hence, corresponding to 99% confidence, the half-area under the curve = 0.99/2 = 0.495. The value of z corresponding to a probability of 0.495 is 2.575. Thus, the 99% confidence interval in terms of the variable z is ±2.575, which in terms of the variable x is 200 ± 1.2875, or (198.71 to 201.29).
Since roughly half the packs would weigh less than 200 gm if the machine were set at exactly 200 gm, the production manager may be advised to set the machine at about 201.3 gm (200 + 2.575 × 0.5), thereby meeting the legal requirement and at the same time keeping the cost of excess filling of the tea to a minimum.
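A quick computational check of this interval, as a minimal Python sketch (scipy.stats); the text itself suggests MS Excel for such confidence-interval calculations.

from scipy.stats import norm

mu, sigma = 200.0, 0.5
z_crit = norm.ppf(0.995)                 # about 2.576; the text rounds to 2.575
low, high = mu - z_crit * sigma, mu + z_crit * sigma
print(round(z_crit, 3), round(low, 2), round(high, 2))   # 2.576 198.71 201.29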
Key Terms
●● Probability: The probability of a given event is an expression of the likelihood or chance of occurrence of that event. A probability is a number which ranges from zero to one.
●● Sample: A sample is that part of the universe which we select for the purpose of investigation. A sample exhibits the characteristics of the universe. The word sample literally means small universe.
●● Sampling: Sampling is defined as the selection of some part of an aggregate or totality on the basis of which a judgment or inference about the aggregate or totality is made. Sampling is the process of learning about the population on the basis of a sample drawn from it.
●● Stratified random sampling: Stratified random sampling requires the separation of the defined target population into different groups called strata and the selection of a sample from each stratum.
●● Cluster sampling: Cluster sampling is a probability sampling method in which
the sampling units are divided into mutually exclusive and collectively exhaustive
subpopulation called clusters.
nl
●● Hypothesis testing: Hypothesis testing refers to the formal procedures used by
statisticians to accept or reject statistical hypotheses. It is an assumption about a
population parameter. This assumption may or may not be true.
O
Check your progress
1. In probability theories, events which can never occur together are classified as
ity
a. Collectively exclusive events
b. Mutually exhaustive events
c. Mutually exclusive events
d. Collectively exhaustive events
s
2. Value which is used to measure distance between mean and random variable x in
terms of standard deviation is called
er
a. Z-value
b. Variance
v
c. Probability of x
d. Density function of x
ni
b. Z
c. Rank
d. None of these
ity
c. Originality
d. Convenience
)A
5. Probability of second event in situation if first event has been occurred is classified
as
a. Series probability
b. Conditional probability
(c
c. Joint probability
d. Dependent probability
e
1. What is probability? What do you mean by probability distributions?
2. What is normal distribution? What are the merits of normal distribution?
3. What is Hypothesis Testing?
4. What do you mean by t-test and z-test?
5. Explain Poisson Distribution and its applications.
O
1. c) Mutually exclusive events
2. a) Z value
3. a) T test
ity
4. d) Convenience
5. b) Conditional probability
Further Readings
s
1. Richard I. Levin, David S. Rubin, Sanjay Rastogi Masood Husain Siddiqui,
er
Statistics for Management, Pearson Education, 7th Edition, 2016.
2. Prem.S.Mann, Introductory Statistics, 7th Edition, Wiley India, 2016.
3. Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani, An
v
Introduction to Statistical Learning with Applications in R, Springer, 2016.
ni
Bibliography
1. Srivastava V. K. etal – Quantitative Techniques for Managerial Decision Making,
Wiley Eastern Ltd
U
e
and Estimation
in
Learning Objective:
●● To understand the basic concepts of sampling distribution and estimation
nl
techniques
●● To get familiarize with MS Excel for confidence interval construction
O
Learning Outcome:
At the end of the course, the learners will be able to –
ity
queries
●● Understand the purpose and need of sampling.
s
Sampling is an important concept which is practiced in every activity. Sampling
involves selecting a relatively small number of elements from a large defined group
er
of elements and expecting that the information gathered from the small group will
allow judgments to be made about the large group. The basic idea of sampling is that
by selecting some of the elements in a population, the conclusion about the entire
v
population is drawn. Sampling is used when conducting census is impossible or
unreasonable.
ni
Meaning of Sampling
Sampling is defined as the selection of some part of an aggregate or totality on
U
the basis of which a judgment or inference about the aggregate or totality is made.
Sampling is the process of learning about the population on the basis of a sample
drawn from it.
ity
Purpose of Sampling
There are several reasons for sampling. They are explained below:
1. Lower cost: The cost of conducting a study based on a sample is much lesser
than the cost of conducting the census study.
m
e
materials. Sampling is the only process possible if the population is infinite.
in
The sampling technique has the following good features of value and significance:
nl
requires much less physical resources as well as time than the census technique.
2. Reliability: In sampling technique, if due diligence is exercised in the choice of
sample unit and if the research topic is homogenous then the sample survey
O
can have almost the same reliability as that of census survey.
3. Detailed Study: An intensive and detailed study of sample units can be done
since their number is fairly small. Also multiple approaches can be applied to a
ity
sample for an intensive analysis.
4. Scientific Base: As mentioned earlier this technique is of scientific nature as
the underlined theory is based on principle of statistics.
5. Greater Suitability in most Situations: It has a wide applicability in most
s
situations as the examination of few sample units normally suffices.
6. Accuracy: The accuracy is determined by the extent to which bias is eliminated
er
from the sampling. When the sample elements are drawn properly some
sample elements underestimates the population values being studied and
others overestimate them.
v
Essentials of Sampling
In order to reach a clear conclusion, the sampling should possess the following
ni
essentials:
efficiency and at the same time the cost is more. A proper size of sample is
maintained in order to have optimized results in terms of cost and efficiency.
The sampling design can be broadly grouped on two basis viz., representation
and element selection. Representation refers to the selection of members on a
probability or by other means. Element selection refers to the manner in which the
elements are selected individually and directly from the population. On the basis of representation and element selection, the broad classification of sampling designs is –
in
Probability Sampling
Probability sampling is where each sampling unit in the defined target population
has a known non-zero probability of being selected in the sample. The actual probability
nl
of selection for each sampling unit may or may not be equal depending on the type
of probability sampling design used. Specific rules for selecting members from the
operational population are made to ensure unbiased selection of the sampling units and
proper sample representation of the defined target population. The results obtained by
O
using probability sampling designs can be generalized to the target population within a
specified margin of error.
Probability samples are characterised by the fact that, the sampling units are
ity
selected by chance. In such a case, each member of the population has a known,
non- zero probability of being selected. However, it may not be true that all samples
would have the same probability of selection, but it is possible to say the probability
of selecting any particular sample of a given size. It is possible that one can calculate
s
the probability that any given population element would be included in the sample. This
requires a precise definition of the target population as well as the sampling frame.
er
Probability sampling techniques differ in terms of sampling efficiency which is a
concept that refers to trade off between sampling cost and precision. Precision refers to
the level of uncertainty about the characteristics being measured. Precision is inversely
related to sampling errors but directly related to cost. The greater the precision, the
v
greater the cost and there should be a trade-off between sampling cost and precision.
The researcher is required to design the most efficient sampling design in order to achieve the desired precision at minimum cost.
Simple Random Sampling
In the unrestricted probability sampling design, every element in the population has a known, equal, non-zero chance of being selected as a subject. For example, if a sample of 10 employees is to be drawn at random from a population of 30 employees, each employee would have a 10/30 or 0.333 chance of being randomly selected in a drawn sample. When the defined target population consists of a larger number
of sampling units, a more sophisticated method can be used to randomly draw the
Notes
e
necessary sample. A table of random numbers can be used for this purpose. The table
of random numbers contains a list of randomly generated numbers. The numbers
in
can be randomly generated through the computer programs also. Using the random
numbers the sample can be selected.
nl
The simple random sampling technique can be easily understood and the survey
result can be generalized to the defined target population with a pre specified margin
of error. It also enables the researcher to gain unbiased estimates of the population’s
O
characteristics. The method guarantees that every sampling unit of the population
has a known and equal chance of being selected, irrespective of the actual size of the
sample resulting in a valid representation of the defined target population.
ity
The major drawback of the simple random sampling is the difficulty of obtaining
complete, current and accurate listing of the target population elements. Simple
random sampling process requires all sampling units to be identified which would be
cumbersome and expensive in case of a large population. Hence, this method is most
suitable for a small population.
s
er
Systematic Random Sampling
The systematic random sampling design is similar to simple random sampling but requires that the defined target population be ordered in some way. It involves
drawing every nth element in the population starting with a randomly chosen element
v
between 1 and n. In other words individual sampling units are selected according their
position using a skip interval. The skip interval is determined by dividing the sample size
ni
into population size. For example, if the researcher wants a sample of 100 to be drawn
from a defined target population of 1000, the skip interval would be 10(1000/100). Once
the skip interval is calculated, the researcher would randomly select a starting point and
U
take every 10th until the entire target population is proceeded through. The steps to be
followed in a systematic sampling method are enumerated below:
1. It is important that the natural order of the defined target population list be
unrelated to the characteristic being studied.
)A
2. Skip interval should not correspond to the systematic change in the target
population.
there is no need to number the entries in a large personnel file before drawing a
sample. The availability of lists and shorter time required to draw a sample compared
Notes
e
to random sampling makes systematic sampling an attractive, economical method for
researchers.
in
The greatest weakness of systematic random sampling is the potential for the
hidden patterns in the data that are not found by the researcher. This could result in
a sample not truly representative of the target population. Another difficulty is that the
nl
researcher must know exactly how many sampling units make up the defined target
population. In situations where the target population is extremely large or unknown,
identifying the true number of units is difficult and the estimates may not be accurate.
O
Stratified Random Sampling
Stratified random sampling requires the separation of defined target population into
different groups called strata and the selection of sample from each stratum. Stratified
ity
random sampling is very useful when the divisions of target population are skewed
or when extremes are present in the probability distribution of the target population
elements of interest. The goal in stratification is to minimize the variability within each
stratum and maximize the difference between strata. The ideal stratification would be
s
based on the primary variable under study. Researchers often have several important
variables about which they want to draw conclusions.
er
A reasonable approach is to identify some basis for stratification that correlates
well with other major variables. It might be a single variable like age, income etc. or
a compound variable like on the basis of income and gender. Stratification leads to
segmenting the population into smaller, more homogeneous sets of elements. In order
v
to ensure that the sample maintains the required precision in terms of representing
the total population, representative samples must be drawn from each of the smaller
ni
population groups.
sample:
Cluster Sampling
m
Groups or chunks of elements that, ideally, have heterogeneity among the members within each group are chosen for study in cluster sampling. Several groups
with intragroup heterogeneity and intergroup homogeneity are found. A random
sampling of the clusters or groups is done and information is gathered from each of
the members in the randomly chosen clusters. Cluster sampling offers more of
heterogeneity within groups and more homogeneity among the groups.
(c
e
In single stage cluster sampling, the population is divided into convenient clusters
and required number of clusters are randomly chosen as sample subjects. Each
in
element in each of the randomly chosen cluster is investigated in the study. Cluster
sampling can also be done in several stages which is known as multistage cluster
sampling. For example: To study the banking behaviour of customers in a national
nl
survey, cluster sampling can be used to select the urban, semi-urban and rural
geographical locations of the study. At the next stage, particular areas in each of the
location would be chosen. At the third stage, the banks within each area would be
chosen.
O
Thus multi-stage sampling involves a probability sampling of the primary sampling
units; from each of the primary units, a probability sampling of the secondary sampling
units is drawn; a third level of probability sampling is done from each of these
ity
secondary units, and so on until the final stage of breakdown for the sample units are
arrived at, where every member of the unit will be a sample.
s
The cluster sampling method is widely used due to its overall cost-effectiveness
and feasibility of implementation. In many situations the only reliable sampling unit
er
frame available to researchers and representative of the defined target population,
is one that describes and lists clusters. The list of geographical regions, telephone
exchanges, or blocks of residential dwelling can normally be easily compiled than
the list of all the individual sampling units making up the target population. Clustering
v
method is a cost efficient way of sampling and collecting raw data from a defined target
population.
ni
intra- cluster heterogeneity and inter-cluster homogeneity are often not met. For these
reasons this method is not practiced often.
ity
Area Sampling
Area sampling is a form of cluster sampling in which the clusters are formed by
geographic designations. For example, state, district, city, town etc., Area sampling is
a form of cluster sampling in which any geographic unit with identifiable boundaries
m
can be used. Area sampling is less expensive than most other probability designs and
is not dependent on population frame. A city map showing blocks of the city would be
adequate information to allow a researcher to take a sample of the blocks and obtain
data from the residents therein.
)A
Sequential/Multiphase Sampling
This is also called Double Sampling. Double sampling is opted when further
information is needed from a subset of groups from which some information has already
(c
been collected for the same study. It is called as double sampling because initially a
sample is used in the study to collect some preliminary information of interest and later
a sub-sample of this primary sample is used to examine the matter in more detail The
process includes collecting data from a sample using a previously defined technique.
Notes
e
Based on this information, a sub sample is selected for further study. It is more
convenient and economical to collect some information by sampling and then use this
in
information as the basis for selecting a sub sample for further study.
nl
When the case of cluster sampling units does not have exactly or approximately
the same number of elements, it is better for the researcher to adopt a random
selection process, where the probability of inclusion of each cluster in the sample
tends to be proportional to the size of the cluster. For this, the number of elements
O
in each cluster has to be listed, irrespective of the method used for ordering it. Then
the researcher should systematically pick the required number of elements from the
cumulative totals. The actual numbers thus chosen would not however reflect the
individual elements, but would indicate as to which cluster and how many from them are
ity
to be chosen by using simple random sampling or systematic sampling. The outcome
of such sampling is equivalent to that of simple random sample. This method is also
less cumbersome and is also relatively less expensive.
s
Non-probability Sampling
In non probability sampling method, the elements in the population do not have any
er
probabilities attached to being chosen as sample subjects. This means that the findings
of the study cannot be generalized to the population. However, at times the researcher
may be less concerned about generalizability and the purpose may be just to obtain
v
some preliminary information in a quick and inexpensive way. Sometimes when the
population size is unknown, then non probability sampling would be the only way to
ni
obtain data. Some non-probability sampling techniques may be more dependable than
others and could often lead to important information with regard to the population.
Convenience Sampling
U
Convenience sampling is a non-probability sampling method in which researchers have the freedom to choose as samples whomever they find; thus it is named convenience sampling. It is mostly used during the exploratory phase of a research project and it is the best way of getting some basic information quickly and efficiently. The assumption is that the target population is homogeneous and the individuals selected as samples are similar to the overall defined target population with regard to the characteristics being studied. Samples can be gathered in a relatively short time. This is one of the main reasons for using convenience
sampling in the early stages of research. However the major drawback is that the
e
measurements can have a serious negative impact on the overall reliability and validity
of those measures and instruments used to collect raw data. Another major drawback is
in
that the raw data and results are not generalizable to the defined target population with
any measure of precision. It is not possible to measure the representativeness of the
sample, because sampling error estimates cannot be accurately determined.
nl
Judgment Sampling
Judgment sampling is a non-probability sampling method in which participants
are selected according to an experienced individual’s belief that they will meet the
O
requirements of the study. The researcher selects sample members who conform to
some criterion. It is appropriate in the early stages of an exploratory study and involves
the choice of subjects who are most advantageously placed or in the best position to
provide the information required. This is used when a limited number or category of
ity
people have the information that are being sought. The underlying assumption is that
the researcher’s belief that the opinions of a group of perceived experts on the topic of
interest are representative of the entire target population.
s
If the judgment of the researcher or expert is correct then the sample generated
from the judgment sampling will be much better than one generated by convenience
er
sampling. However, as in the case of all non-probability sampling methods, the
representativeness of the sample cannot be measured. The raw data and information
collected through judgment sampling provides only a preliminary insight
v
Quota Sampling
ni
An inherent limitation of quota sampling is that the success of the study will be
dependent on subjective decisions made by the researchers. As a non-probability
method, it is incapable of measuring true representativeness of the sample or accuracy
of the estimate obtained. Therefore, attempts to generalize the data results beyond
(c
those respondents who were sampled and interviewed become very questionable and
may misrepresent the given target population.
Snowball Sampling
Notes
e
Snowball sampling is a non-probability sampling method in which a set of
respondents are chosen who help the researcher to identify additional respondents to
in
be included in the study. This method of sampling is also called as referral sampling
because one respondent refers other potential respondents. This method involves
probability and non-probability methods. The initial respondents are chosen by a
nl
random method and the subsequent respondents are chosen by non-probability
methods. Snowball sampling is typically used in research situations where the defined
target population is very small and unique and compiling a complete list of sampling
units is a nearly impossible task. This technique is widely used in academic research.
O
While the traditional probability and other non-probability sampling methods would
normally require an extreme search effort to qualify a sufficient number of prospective
respondents, the snowball method would yield better result at a much lower cost. The
ity
researcher has to identify and interview one qualified respondent and then solicit his
help to identify other respondents with similar characteristics.
s
Snowball sampling is best suited to studies where the respondents are small in number, hard to reach and form a uniquely defined target population. It is most
useful in qualitative research practices. Reduced sample size and costs are the primary
er
advantage of this sampling method. The major drawback is that the chance of bias is
higher. If there is a significant difference between people who are identified through
snowball sampling and others who are not then, it may give rise to problems. The
v
results cannot be generalized to members of larger defined target population.
ni
in the sample do not represent the results that would be obtained from the entire
population.
●● Regardless of the fact that the sample is not representative of the population or
ity
skewed in any way, a sampling error is a difference in sampled value versus true
population value.
●● Also randomized samples may have some sampling error, since it is just a
population estimate from which it is derived.
m
●● Sampling errors can be eliminated when the sample size is increased and also
by ensuring that the sample adequately represents the entire population. For
example, ABC Company provides a subscription-based service that allows
)A
consumers to pay a monthly fee to stream videos and other programming over the
web.
A non-sampling error is a statistical term referring to an error resulting from data
collection, which causes the data to differ from the true values. A non-sampling error is
different from that of a sampling error.
(c
●● A non-sampling error refers to either random or systematic errors, and these errors
can be challenging to spot in a survey, sample, or census.
e
because systematic errors may result in the study, survey or census having to be
scrapped.
in
●● The higher the number of errors, the less reliable the information.
●● When non-sampling errors occur, the rate of bias in a study or survey goes up.
nl
3.1.4 Central Limit Theorem
In the study of probability theory, the central limit theorem (CLT) states that the distribution of sample means approximates a normal distribution (also known as a "bell curve") as the sample size becomes larger, assuming that all samples are identical in size, regardless of the shape of the population distribution.
It is a statistical theory stating that, given a sufficiently large sample size from a population with a finite degree of variance, the mean of all samples from the same population will be approximately equal to the mean of the population. Furthermore, the sample means will follow an approximately normal distribution, with a variance approximately equal to the variance of the population divided by the sample size.
s
●● The central limit theorem (CLT) states that the distribution of sample means approximates a normal distribution as the sample size gets larger.
●● Sample sizes equal to or greater than 30 are considered sufficient for the theorem to hold.
●● A key aspect of the theorem is that the average of the sample means and standard
v
deviations will equal the population mean and standard deviation.
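The theorem can be illustrated by simulation. The sketch below is a minimal Python illustration (NumPy) with an assumed, clearly non-normal population; the population and sample sizes are arbitrary choices, not data from the text.

import numpy as np

rng = np.random.default_rng(0)
population = rng.exponential(scale=10.0, size=100_000)   # clearly non-normal population

n = 30                                                   # sample size
sample_means = [rng.choice(population, size=n).mean() for _ in range(5_000)]

print(round(population.mean(), 2), round(np.mean(sample_means), 2))   # both close to 10
print(round(population.std() / n**0.5, 2), round(np.std(sample_means), 2))
# the sample means are approximately normal, with sd close to sigma/sqrt(n)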
ni
A sample is that part of the universe which the select for the purpose of
investigation. A sample exhibits the characteristics of the universe. The word sample
literally means small universe. For example, suppose the microchips produced in
ity
a factory are to be tested. The aggregate of all such items is universe, but it is not
possible to test every item. So in such a case, a part of the universe is taken and then
tested. Now this quantity extracted for testing is known as sample.
If we take a certain number of samples and for each sample compute various statistical measures such as the mean, standard deviation, etc., then we find that each sample may give its own value for the statistic under consideration. All such values of a particular statistic, say the mean, together with their relative frequencies constitute the sampling distribution of that statistic.
If repeated random samples of a given size n are taken from a population of values for a categorical variable, where the proportion in the category of interest is p, then the mean of all sample proportions (p-hat) is the population proportion (p).
As regards the spread of all sample proportions, the theory dictates the behaviour much more precisely than simply saying that there is less spread for larger samples. The standard deviation of all sample proportions is inversely related to the square root of the sample size n, as shown below.
The standard deviation of all sample proportions (p̂) is exactly √(p(1 − p)/n).
Given that the sample size n appears in the square root denominator, the standard
deviation decreases as the sample size increases. Eventually, the p-hat distribution
form should be reasonably normal as long as the sample size n is sufficiently high. The
convention specifies that np and n(1 – p) should be at least 10
Thus p̂ is normally distributed with a mean of μ_p̂ = p and a standard deviation σ_p̂ = √(p(1 − p)/n), as long as np > 10 and n(1 − p) > 10.
s
Let x be a random variable with probability density function (or probability mass function) f(x; θ1, θ2, ..., θk), where θ1, θ2, ..., θk are the k parameters of the population.
It should be noted here that there can be several estimators of a parameter. For example, any of the sample mean, median, mode, geometric mean or harmonic mean may be used as an estimator of the population mean. Similarly, either
s = √[(1/n) Σ(xᵢ − x̄)²]  or  s = √[(1/(n − 1)) Σ(xᵢ − x̄)²]
may be used as an estimator of the population standard deviation, and the sample proportion may be used as an estimator of the population proportion.
e
To estimate a parameter θ, the available information is in the form of a random sample x1, x2, ..., xn of size n drawn from the population. We formulate a function of the sample observations x1, x2, ..., xn; the estimator of θ is denoted by θ̂. Different random samples provide different values of the statistic θ̂. Thus θ̂ is a random variable with its own sampling probability distribution.
nl
●● Interval estimate. An interval estimate is defined by two numbers, between
which a population parameter is said to lie. For example, a < x < b is an
interval estimate of the population mean μ. It indicates that the population
mean is greater than a but less than b.
O
This range of values used to estimate a population parameter is known as an interval estimate, or estimate by a confidence interval, and is defined by two numbers between which the population parameter is expected to lie. For example, a < x̄ < b is an interval estimate of the population mean μ, indicating that the population mean is greater than a but less than b. The purpose of an interval estimate is to provide information about how close the point estimate is to the true parameter.
s
3.1.9 Using z Statistic for Estimating Population Mean
The estimation of a population mean given a random sample is a very common task. If the population standard deviation (σ) is known, the construction of a confidence interval for the population mean (μ) is based on the normally distributed sampling distribution of the sample means.
v
The 100(1 − α)% confidence interval for μ is given by
CI: x̄ ± z*(α/2) × σ_x̄,  where σ_x̄ = σ/√n
U
The value of z*(α/2) is the critical value and is obtained from the standard normal table or computed with the qnorm() function in R. The critical value is a quantity that is related to the desired level of confidence. Typical values for z*(α/2) are 1.64, 1.96 and 2.58, corresponding to confidence levels of 90%, 95% and 99%. This critical value is multiplied by the standard error, σ_x̄, to widen or narrow the margin of error.
The standard error (σ_x̄) is given by the ratio of the standard deviation of the population (σ) to the square root of the sample size n. It describes the degree to which the computed sample statistic may be expected to differ from one sample to another. The product of the critical value and the standard error is called the margin of error; it is the quantity that is subtracted from and added to the value of x̄ to obtain the confidence interval for μ.
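A minimal Python sketch of this z-based interval (scipy.stats); the sample values x_bar, sigma and n below are hypothetical placeholders, not data from the text.

import math
from scipy.stats import norm

x_bar, sigma, n = 52.8, 6.0, 40        # hypothetical sample mean, known sigma, sample size
conf = 0.95
z_crit = norm.ppf(1 - (1 - conf) / 2)  # z*(alpha/2), e.g. 1.96 for 95%
se = sigma / math.sqrt(n)              # standard error sigma_xbar
margin = z_crit * se                   # margin of error
print(round(x_bar - margin, 2), round(x_bar + margin, 2))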
A confidence interval is thus an interval estimate of an unknown population parameter, calculated from a given set of sample data. The common notation for the parameter in question is θ. Often, this parameter is the population mean μ, which is estimated through the sample mean X̄. The level C of a confidence interval gives the probability that the interval produced by the method employed includes the true value of the parameter θ.
In many situations the value of σ is unknown and must be estimated with the sample standard deviation s, and/or the sample size is small (less than 30) and it is uncertain whether the data came from a normal distribution. (In the latter case, the Central Limit Theorem cannot be relied upon.) In either situation, the z*-value from the standard normal (Z) distribution can no longer be used as the critical value; a somewhat larger critical value is needed to allow for the extra uncertainty introduced by estimating σ from the sample.
The formula for a confidence interval for one population mean in this case is
X̄ ± t*(n−1) × s/√n
where t*(n−1) is the critical t-value from the t-distribution with n − 1 degrees of freedom (n is the sample size).
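A minimal Python sketch of the t-based interval (scipy.stats); the numbers reuse the light-bulb data from the example later in this unit purely for illustration.

import math
from scipy.stats import t

x_bar, s, n = 290.0, 50.0, 15                  # sample mean, sample sd, sample size (sigma unknown)
conf = 0.95
t_crit = t.ppf(1 - (1 - conf) / 2, df=n - 1)   # critical t with n-1 degrees of freedom
margin = t_crit * s / math.sqrt(n)
print(round(t_crit, 3), round(x_bar - margin, 1), round(x_bar + margin, 1))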
s
Estimating population mean using t Statistic
er
The t statistic also provides a statistical examination of two population means: a two-sample t-test examines whether two samples are different, and is commonly used when the variances of the two normal distributions are unknown and when an experiment uses a small sample size.
For a single sample, the test statistic is
Formula: t = (x̄ − μ) / (s/√n)
ni
When the standard deviation of the sample is substituted for the standard deviation of the population, the statistic does not have a normal distribution; it has what is called the t-distribution. Because there is a different t-distribution for each sample size, it is not practical to list a separate area-of-the-curve table for each one. Instead, critical t-values for common alpha levels (0.10, 0.05, 0.01, and so forth) are usually given in a single table for a range of sample sizes. For very large samples, the t-distribution approximates the standard normal (z) distribution. In practice, it is best to use the t-distribution whenever the population standard deviation is not known.
Values in the t-table are not actually listed by sample size but by degrees of freedom (df). The number of degrees of freedom for a problem involving the t-distribution for sample size n is simply n − 1 for a one-sample mean problem.
Uses of T Test
(c
●● A two-sample location test of the null hypothesis that the means of two normally distributed populations are equal.
All such tests are usually called Student’s t-tests, though strictly speaking that
in
name should only be used if the variances of the two populations are also assumed
to be equal; the form of the test used when this assumption is dropped is sometimes
called Welch’s t-test. These tests are often referred to as “unpaired” or “independent
nl
samples” t-tests, as they are typically applied when the statistical units underlying the
two samples being compared are non-overlapping.
●● A test of the null hypothesis that the difference between two responses measured on the same statistical unit has a mean value of zero. For example, suppose we measure the size of a cancer patient's tumor before and after a treatment. If the treatment is effective, we expect the tumor size for many of the patients to be smaller following the treatment. This is often referred to as the "paired" or "repeated measures" t-test.
●● A test of whether the slope of a regression line differs significantly from 0.
s
A confidence interval (CI) for a population proportion can be used to show the statistical probability that a characteristic is likely to occur within the population.
For example, if we wish to estimate the proportion of people with diabetes in a population, we consider a diagnosis of diabetes as a "success" (i.e., an individual who has the outcome of interest), and we consider lack of a diagnosis of diabetes as a "failure." In this example, X represents the number of people with a diagnosis of diabetes in the sample. The sample proportion is p̂ (called "p-hat"), and it is computed by taking the ratio of the number of successes in the sample to the sample size, that is,
p̂ = x/n
where x is the number of successes in the sample and n is the size of the sample.
U
The formula for the confidence interval for a population proportion follows the same format as that for an estimate of a population mean. From the sampling distribution for the proportion, the standard deviation was found to be
σ_p′ = √( p(1 − p)/n )
so the confidence interval is
p = p′ ± [ z(α/2) √( p′(1 − p′)/n ) ]
z(α/2) is set according to our desired degree of confidence, and √( p′(1 − p′)/n ) is the estimated standard deviation of the sampling distribution. Here q′ = 1 − p′, and p′ and q′ are the sample estimates of the unknown population proportions p and q. The estimated proportions p′ and q′ are used because p and q are not known.
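A minimal Python sketch of this proportion interval (scipy.stats); the counts x and n are hypothetical placeholders, not data from the text.

import math
from scipy.stats import norm

x, n = 52, 400                      # hypothetical: 52 "successes" in a sample of 400
p_prime = x / n
z_crit = norm.ppf(0.975)            # 95% confidence
se = math.sqrt(p_prime * (1 - p_prime) / n)
print(round(p_prime - z_crit * se, 3), round(p_prime + z_crit * se, 3))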
Key Terms
(c
●● Sample: A sample is that part of the universe which we select for the purpose of investigation. A sample exhibits the characteristics of the universe. The word sample literally means small universe.
●● Sampling: Sampling is defined as the selection of some part of an aggregate or totality on the basis of which a judgment or inference about the aggregate or totality is made. Sampling is the process of learning about the population on the basis of a sample drawn from it.
●● Stratified random sampling: Stratified random sampling requires the separation
of defined target population into different groups called strata and the selection of
nl
sample from each stratum.
●● Cluster sampling: Cluster sampling is a probability sampling method in which
the sampling units are divided into mutually exclusive and collectively exhaustive
O
subpopulation called clusters.
●● Confidence interval: (CI) for a population proportion can be used to show the
statistical probability that a characteristic is likely to occur within the population.
ity
●● Point estimate. A point estimate of a population parameter is a single value of a
statistic
●● Interval estimate. An interval estimate is defined by two numbers, between which
a population parameter is said to lie.
s
Check your progress er
1. _____ states that the distribution of sample means approximates a normal
distribution as the sample size gets larger.
a) Probability
v
b) Central Limit Theorem
c) Z test
ni
d) Sampling Theorem
2. ____ error is a statistical term referring to an error resulting from data collection, which causes the data to differ from the true values.
a) Sampling
b) Non-sampling
c) Probability
d) Central
3. The sampling method in which a set of respondents are chosen who help the researcher to identify additional respondents to be included in the study is
c) Snowball Sampling
d) Convenience Sampling
4. Value used to measure distance between mean and random variable x in terms of standard deviation is -
a) Z-value
b) Variance
Notes
e
c) Probability of x
d) Density function of x
in
5. Test is applied when samples are less than 30.
a) T
nl
b) Z
c) Rank
O
d) None of these
ity
2. Differentiate between sampling and non-sampling.
3. Explain any five types of sampling techniques
4. What do you mean by t-test and z test?
s
5. Explain Confidence interval estimation for population proportion
er
Check your progress:
1. b) Central Limit Theorem
2. b) Non - sampling
v
3. c) Snowball Sampling
ni
4. a) Z-value
5. a) T
U
Further Readings
4. Richard I. Levin, David S. Rubin, Sanjay Rastogi Masood Husain Siddiqui,
Statistics for Management, Pearson Education, 7th Edition, 2016.
ity
Bibliography
13. Srivastava V. K. etal – Quantitative Techniques for Managerial Decision Making,
Wiley Eastern Ltd
)A
e
18. Kalavathy S. – Operation Research – Vikas Pub Co
19. Gould F J – Introduction to Management Science – Englewood Cliffs N J Prentice
in
Hall.
20. Naray J K, Operation Research, theory and applications – Mc Millan, New Dehi.
nl
21. Taha Hamdy, Operations Research, Prentice Hall of India
22. Tulasian: Quantitative Techniques: Pearson Ed.
23. Vohr.N.D. Quantitative Techniques in Management, TMH
O
24. Stevenson W.D, Introduction to Management Science, TMH
e
Learning Objective:
in
●● To get introduced to the concept of hypothesis testing and learn parametric and non-parametric tests
nl
Learning Outcome:
At the end of the course, the learners will be able to –
O
●● Perform Test of Hypothesis as well as calculate confidence interval for a
population parameter for single sample and two sample cases.
ity
Hypothesis test is a method of making decisions using data from a scientific study.
In statistics, a result is called statistically significant if it has been predicted as unlikely
to have occurred by chance alone, according to a pre-determined threshold probability,
the significance level. The phrase “test of significance” was coined by statistician
s
Ronald Fisher. These tests are used in determining what outcomes of a study would
lead to a rejection of the null hypothesis for a pre-specified level of significance;
er
this can help to decide whether results contain enough information to cast doubt on
conventional wisdom, given that conventional wisdom has been used to establish
the null hypothesis. The critical region of a hypothesis test is the set of all outcomes
v
which cause the null hypothesis to be rejected in favor of the alternative hypothesis.
Statistical hypothesis testing is sometimes called confirmatory data analysis, in contrast
ni
to exploratory data analysis, which may not have pre-specified hypotheses. Statistical hypothesis testing is a key technique of frequentist inference.
Characteristics of Hypothesis
U
Being handicapped by the data collection, it may not be possible to test the hypothesis.
Watch for words like ought, should, bad.
The hypothesis should not only be specific to a place and situation but should also be narrowed down with respect to its operation. There should be no global use of concepts, whereby the researcher uses such a broad concept that it becomes all-inclusive and may not be able to tell anything. For example, somebody may propose a relationship between urbanization and family size. Urbanization does influence the decline in family size, but urbanization is such a comprehensive variable that it hides the operation of many other factors which emerge as part of the urbanization process. These factors could be the rise in education levels, women's levels of education, women's empowerment, the emergence of dual-earner families, the decline in patriarchy, accessibility to health services, the role of mass media, and more. Therefore, the global use of the word 'urbanization' may not tell much. Hence it is suggested that the hypothesis should be specific.
O
Hypothesis should be related to available techniques of research
Hypothesis may have empirical reality; still we are looking for tools and techniques
ity
that could be used for the collection of data. If the techniques are not there then the
researcher is handicapped. Therefore, either the techniques are already available or the
researcher is in a position to develop suitable techniques for the study.
s
Hypothesis has to be supported by theoretical argumentation. For this purpose the
research may develop his/her theoretical framework which could help in the generation
er
of relevant hypothesis. For the development of a framework the researcher shall
depend on the existing body of knowledge. In such an effort a connection between
the study in hand and the existing body of knowledge can be established. That is how
v
the study could benefit from the existing knowledge and later on through testing the
hypothesis could contribute to the reservoir of knowledge.
ni
examine the entire population. Since that is often impractical, researchers typically
examine a random sample from the population. If sample data are not consistent with
the statistical hypothesis, the hypothesis is rejected.
In doing so, one has to take the help of certain assumptions or hypothetical values
m
about the characteristics of the population if some such information is available. Such
hypothesis about the population is termed as statistical hypothesis and the hypothesis
is tested on the basis of sample values. The procedure enables one to decide on a
certain hypothesis and test its significance. A claim or hypothesis about the population parameter which is to be tested is called the null hypothesis and is written as H0.
This hypothesis is then tested with available evidence and a decision is made
whether to accept this hypothesis or reject it. If this hypothesis is rejected, then we
accept the alternate hypothesis. This hypothesis is written as H1. For testing hypotheses, parametric tests assume that the data are measured on an interval or ratio scale, whereas non-parametric tests assume only nominal or ordinal data.
in
4.1.2 Developing Null and Alternate Hypothesis
Null Hypothesis
nl
It is used for testing the hypothesis formulated by the researcher. Researchers
treat evidence that supports a hypothesis differently from the evidence that opposes
it. They give negative evidence more importance than to the positive one. It is because
O
the negative evidence tarnishes the hypothesis. It shows that the predictions made by
the hypothesis are wrong. The null hypothesis simply states that there is no relationship
between the variables or the relationship between the variables is “zero.” That is
how symbolically null hypothesis is denoted as “H0”. For example: H0 = There is no
ity
relationship between the level of job commitment and the level of efficiency.
s
(i.e. H0 is non directional), which may be a second step in testing the hypothesis.
First we look whether or not there is an association then we go for the direction of
er
association and the strength of association. Experts recommend that we test our
hypothesis indirectly by testing the null hypothesis. In case we have any credibility in
our hypothesis then the research data should reject the null hypothesis. Rejection of the
v
null hypothesis leads to the acceptance of the alternative hypothesis.
ni
Alternative Hypothesis
The alternative (to the null) hypothesis simply states that there is a relationship
between the variables under study. In our example it could be: there is a relationship
U
between the level of job commitment and the level of efficiency. Not only there is an
association between the two variables under study but also the relationship is perfect
which is indicated by the number “1”. Thereby the alternative hypothesis is symbolically
denoted as “ H1”. It can be written like this: H1: There is a relationship between the level
ity
(as this implies 100% certainty). Because a p-value is based on probabilities, there is
always a chance of making an incorrect conclusion regarding accepting or rejecting the
null hypothesis (H0).
)A
Anytime we make a decision using statistics there are four possible outcomes, with
two representing correct decisions and two representing errors.
Type 1 error
(c
A type 1 error is also known as a false positive and occurs when a researcher
incorrectly rejects a true null hypothesis. This means that your report that your findings
are significant when in fact they have occurred by chance.
Amity Directorate of Distance & Online Education
68 Statistics Management
●● The probability of making a type I error is represented by your alpha level (α),
Notes
e
which is the p-value below which you reject the null hypothesis. A p-value of
0.05 indicates that user is willing to accept a 5% chance that you are wrong
in
when you reject the null hypothesis.
●● The risk of committing a type I error can be reduced by using a lower value
for p. For example, a p-value of 0.01 would mean there is a 1% chance of
nl
committing a Type I error.
●● However, using a lower value for alpha means that you will be less likely to
detect a true difference if one really exists (thus risking a type II error).
O
Type 2 error
A type II error is also known as a false negative and occurs when a researcher fails
to reject a null hypothesis which is really false. Here a researcher concludes there is not
ity
a significant effect, when actually there really is.
The probability of making a type II error is called Beta (β), and this is related to the
power of the statistical test (power = 1- β). The risk of committing a type II error can be
decreased by ensuring that the test has enough power.
s
4.1.4 Level of Significance and Critical Region
er
Level of Significance
●● The level of significance, often referred to as alpha or α, is a measure of the strength of the evidence that must be present in your sample before the null hypothesis is rejected and it is concluded that the effect is statistically significant.
●● Significance levels are used during hypothesis testing to help determine which hypothesis the data supports, by comparing the p-value with the significance level. If the p-value is less than the significance level, then the null hypothesis can be rejected and it can be concluded that the effect is statistically significant.
Critical Region
)A
A critical region, also known as the Region of Rejection, is a set of test statistic
values for which the null hypothesis is rejected. That is to say, if the test statistics
observed are in the critical region then we reject the null hypothesis and accept the
alternative hypothesis. The critical region defines how far away our sample statistic
(c
must be from the null hypothesis value before we can say it is unusual enough to reject
the null hypothesis.
The “best” critical region is one where the likelihood of making a Type I or Type II
Notes
e
error is minimised. In other words, the uniformly most powerful rejection region is the
region where the smallest chance of making a Type I or II error is present. It is also the
in
region that provides the largest (or equally greatest) power function for a UMP test.
nl
A statistic’s standard error is the standard deviation from its sampling distribution,
or an estimate of that standard deviation. If the mean is the parameter or the statistic it
is called the standard error of the mean. It is defined as –
SE = σ/√n
where SE is the standard error of the sample mean, n is the sample size, and σ is the standard deviation.
Standard error increases when standard deviation, i.e. the variance of the
s
population, increases. Standard error decreases when sample size increases – as the
sample size gets closer to the true size of the population, the sample means cluster
more and more around the true population mean.
er
The standard error tells how accurate the mean is likely to be compared with the
true population of any given sample from that population. By increasing the standard
v
error, i.e. the means are more spread out; it becomes more likely that any given mean
is an inaccurate representation of the true mean population.
ni
of estimate computed from the statistics of the observed data. This proposes a range of
plausible values for an unknown parameter (for example, the mean). The interval has
an associated confidence level that the true parameter is in the proposed range.
ity
(i.e. the proportion) of possible confidence intervals that contain the true value
of the unknown population parameter.
●● In other words, if confidence intervals are constructed using a given
)A
collection, in 90% of the samples the interval estimate will contain the population
parameter. The confidence level is designated before examining the data. Most
commonly, a 95% confidence level is used. However, confidence levels of 90% and
Notes
e
99% are also often used in analysis.
Factors affecting the width of the confidence interval include the size of the sample,
in
the confidence level, and the variability in the sample. A larger sample will tend to
produce a better estimate of the population parameter, when all other factors are equal.
A higher confidence level will tend to produce a broader confidence interval.
nl
4.2.1 For Single Population Mean Using t-statistic
When σ is not known, we use its estimate computed from the given sample. Here, the nature of the sampling distribution of X̄ would depend upon the sample size n. There are the following two possibilities:
If the parent population is normal and n < 30 (popularly known as the small sample case), use the t-test. The unbiased estimate of σ in this case is given by s = √[Σ(xᵢ − x̄)²/(n − 1)].
If n ≥ 30 (large sample case), use the standard normal test. The unbiased estimate of σ in this case can be taken as s = √[Σ(xᵢ − x̄)²/n], since the difference between n and n − 1 is negligible for large values of n. Note that the parent population may or may not be normal in this case.
Of course, the value of t0.05 depends on the number of degrees of freedom. For example, with 2 degrees of freedom, t0.05 is equal to 2.92; but with 20 degrees of freedom it is equal to 1.725.
Example:
ABC Corporation manufactures light bulbs. The CEO claims that an average ABC light bulb lasts 300 days. A researcher randomly selects 15 bulbs for testing. The sampled bulbs last an average of 290 days, with a standard deviation of 50 days. If the CEO's claim were true, what is the probability that 15 randomly selected bulbs would have an average life of no more than 290 days?
Note: Solution is the traditional approach and requires the computation of the t
statistic, based on data presented in the problem description. Then, the T distribution
calculator is to be used to find the probability.
Solution:
t = (x̄ − μ) / (s/√n)
where x̄ is the sample mean, μ is the population mean, s is the standard deviation of the sample, and n is the sample size.
t = (290 − 300) / (50/√15) = −10 / 12.909945 = −0.7745966
●● The t statistic is equal to −0.7745966.
With 14 degrees of freedom, the t-distribution calculator gives the cumulative probability 0.226. Hence, if the true bulb life were 300 days, there is a 22.6% chance that the average bulb life for 15 randomly selected bulbs would be less than or equal to 290 days.
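The same calculation can be checked in software. A minimal Python sketch (scipy.stats) for this example, offered only as an illustration:

import math
from scipy.stats import t

x_bar, mu0, s, n = 290, 300, 50, 15
t_stat = (x_bar - mu0) / (s / math.sqrt(n))
p_lower = t.cdf(t_stat, df=n - 1)            # P(sample mean <= 290) if the claim is true
print(round(t_stat, 4), round(p_lower, 3))   # -0.7746 0.226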
ity
A z-test is a statistical test that is used to determine whether population means differ when the variances are known and the sample size is large. It is assumed that the test statistic has a normal distribution, and nuisance parameters such as the standard deviation should be known in order to perform an accurate z-test.
It is useful to standardize the values of a normal distribution by converting them into z-scores because –
er
(a) It allows the researchers to calculate the probability of a score occurring within a
standard normal distribution;
v
(b) It enables the comparison of two scores that are from different samples (which may
have different means and standard deviations).
ni
●● Also, t-tests assume the standard deviation is unknown, while z-tests assume
it is known.
Application
m
Z = (x̄ − µ) / (σ/√n)
where x̄ is the sample mean, µ is the population mean, σ is the population standard deviation and n is the sample size.
Example:
Notes
e
The mean length of the lumber is supposed to be 8.5 feet. A builder wants to check whether the shipment of lumber she receives has a mean length different from 8.5 feet. The builder observes that the sample mean of 61 pieces of lumber is 8.3 feet, with a sample standard deviation of 1.2 feet. What should she conclude? Is 8.3 very different from 8.5?
nl
Solution:
Z = (x̄ − µ) / (σ/√n) = (8.3 − 8.5) / (1.2/√61) = −1.3
Thus, the question is whether −1.3 is very far away from zero, since zero corresponds to the case when x̄ is equal to μ0. If it is far away, the null hypothesis is unlikely to be valid and is rejected; otherwise the null hypothesis cannot be discarded. Since |−1.3| < 1.96, the null hypothesis is not rejected at the 5% level of significance.
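A minimal Python sketch (scipy.stats) reproducing this z statistic, with a two-sided p-value added purely for illustration (the p-value is not part of the original solution):

import math
from scipy.stats import norm

x_bar, mu0, s, n = 8.3, 8.5, 1.2, 61
z = (x_bar - mu0) / (s / math.sqrt(n))
p_two_sided = 2 * norm.cdf(-abs(z))
print(round(z, 2), round(p_two_sided, 3))   # -1.30 0.193 -> do not reject H0 at 5%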
s
4.2.3 Hypothesis Testing for Population Proportion.
er
Using independent samples means that there is no relationship between the
groups. The values in one sample have no association with the values in the other
sample. These populations are not related, and the samples are independent. We look
v
at the difference of the independent means.
ni
Independent sample
The samples from two populations are independent if the samples selected from
one of the populations have no relationship with the samples selected from the other
population.
m
Dependent sample
The samples are dependent if each measurement in one sample is matched or
)A
paired with a particular measurement in the other sample. Another way to consider this
is how many measurements are taken off of each subject. If only one measurement,
then independent; if two measurements, then paired. Exceptions are in familial
situations such as in a study of spouses or twins. In such cases, the data is almost
always treated as paired data.
(c
Example - Compare the time that males and females spend watching TV.
Notes
e
a. We randomly select 15 men and 15 women and compare the average time they
spend watching TV. Is this an independent sample or paired sample?
in
b. We randomly select 15 couples and compare the time the husbands and wives
spend watching TV. Is this an independent sample or paired sample?
nl
a. Independent Sample
b. Paired sample
O
Application
The null hypothesis to be tested is H0: π = π0 against Ha: π ≠ π0 for a two tailed test
and π > or < π0 for a one tailed test. The test statistic is
ity
z_cal = (p − π0) / √( π0(1 − π0)/n ) = (p − π0) √( n / (π0(1 − π0)) )
s
Example :
A wholesaler in apples claims that only 4% of the apples supplied by him are defective. A random sample of 600 apples contained 36 defective apples. Test the claim of the wholesaler.
Solution.
v
We have to test H0: π ≤ 0.04 against Ha: π > 0.04.
Here p = 36/600 = 0.06, so
z_cal = (0.06 − 0.04) √( 600 / (0.04 × 0.96) ) = 2.5
Since 2.5 exceeds the critical value 1.645 for a one-tailed test at the 5% level, H0 is rejected and the wholesaler's claim is not supported.
Example:
470 tails were obtained in 1,000 throws of an unbiased coin. Can the difference between the proportion of tails in the sample and their proportion in the population be regarded as arising only due to fluctuations of sampling?
Solution:
Here p = 470/1000 = 0.47 and π0 = 0.5, so
z_cal = (0.47 − 0.50) / √( 0.5 × 0.5 / 1000 ) = −1.90, i.e. |z_cal| = 1.90.
Since this value is less than 1.96, the coin can be regarded as fair, and thus the difference between the sample and population proportions of tails is only due to fluctuations of sampling.
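Both of the proportion examples above can be checked with a few lines of Python (standard library only); this is an illustrative sketch, not part of the original text.

import math

def z_prop(p_hat, pi0, n):
    """z statistic for a single population proportion."""
    return (p_hat - pi0) / math.sqrt(pi0 * (1 - pi0) / n)

print(round(z_prop(36 / 600, 0.04, 600), 2))     # 2.5  (apples example)
print(round(z_prop(470 / 1000, 0.50, 1000), 2))  # -1.9 (coin example)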
(c
e
a. When the population standard deviation (σ) is known
in
This test is applicable when the random sample X1 , X2 , ...... Xn is drawn from a
normal population.
nl
The test statistic is (X̄ − µ)/(σ/√n) ~ N(0, 1). Let the value of this statistic calculated from the sample be denoted as z_cal = (x̄ − µ)/(σ/√n). The decision rule would be:
Reject H0 at the 5% (say) level of significance if |z_cal| > 1.96. Otherwise, there is no evidence against H0 at the 5% level of significance.
Example –
A company claims that the average mileage of its bikes is 40 km/l. A random sample of 20 bikes of the company showed an average mileage of 42 km/l. Test the claim of the manufacturer on the assumption that the mileage of the bikes is normally distributed with a standard deviation of 2 km/l.
z_cal = (X̄ − µ) / (σ/√n) = (42 − 40) / (2/√20) = 4.47
Since 4.47 is greater than 1.96, there is evidence against H0 at the 5% level of significance and the manufacturer's claim is contradicted.
When σ is not known, we use its estimate s computed from the given sample. Here, the nature of the sampling distribution of X̄ depends upon the sample size n. There are the following two possibilities:
●● If the parent population is normal and n < 30 (popularly known as the small sample case), use the t-test. Also, like the normal test, the hypothesis may be one-tailed or two-tailed.
●● If n ≥ 30 (large sample case), use the standard normal test, since the difference between n and n − 1 is negligible for large values of n. Note that the parent population need not be normal in this case.
Example:
Daily sales figures of 40 shopkeepers showed that their average sales and
standard deviation were Rs 528 and Rs 600 respectively. Is the assertion that daily
sales on the average is Rs 400, contradicted at 5% level of significance by the sample?
Solution:
Since n > 30, the standard normal test is applicable. It is given that n = 40, X̄ = 528 and S = 600.
z_cal = (528 − 400) / (600/√40) = 1.35
Since this value is less than 1.96, there is no evidence against H0 at 5% level of
significance. Hence, the given assertion is not contradicted by the sample.
4.3.2 Inference about the Difference Between two Population
Proportions
A test of two population proportions is very similar to a test of two means, except
that the parameter of interest is now “p” instead of “µ”.
It is expected that p̂ will be close to p. With a test of two proportions, we will have two p̂'s, and we expect that (p̂1 − p̂2) will be close to (p1 − p2). The test statistic accounts for both samples.
z = (p̂ − p) / √( p(1 − p) / n )
and it has an approximate standard normal distribution.
HOWEVER, the null hypothesis will be that p1 = p2. Because H0 is assumed to be true, the test treats p1 = p2 = p, a common population proportion. Since p is unknown, we must compute a pooled estimate of p from the two samples.
Application
Men and Women were asked about what they would do if they received a $100 bill
by mail, addressed to their neighbor, but wrongly delivered to them. Would they return
it to their neighbour? Of the 69 males sampled, 52 said “yes” and of the 131 females
sampled, 120 said “yes.”
Does the data indicate that the proportions that said “yes” are different for male and
female?
If the proportion of males who said “yes, they would return it” is denoted as p1 and the proportion of females who said “yes, they would return it” is denoted as p2, then the null hypothesis is p1 = p2, which may also be written as p1 – p2 = 0 or p1/p2 = 1. The hypothesis can be stated in any of these equivalent forms.
Thus,
Men: n1 = 69, p̂1 = 52/69 = 0.7536
Women: n2 = 131, p̂2 = 120/131 = 0.9160
Using the formula –
p̂1 − p̂2 ± z_(α/2) √( p̂1(1 − p̂1)/n1 + p̂2(1 − p̂2)/n2 )
= (52/69 − 120/131) ± 1.96 √( (52/69)(1 − 52/69)/69 + (120/131)(1 − 120/131)/131 )
= −0.1624 ± 1.96(0.05725)
= −0.1624 ± 0.1122, i.e. (−0.2746, −0.0502)
We are 95% confident that the difference of population proportions of men who
said “yes” and women who said “yes” is between -0.2746 and -0.0502.
Based on both ends of the interval being negative, it seems like the proportion of
females who would return it is higher than the proportion of males who would return it.
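The interval above can be reproduced with a few lines of Python; this is only an illustrative sketch (it uses the unpooled standard error, matching the formula above; all variable names are this sketch's own):

# 95% confidence interval for p1 - p2 ($100-bill example)
z_crit = 1.96
n1, x1 = 69, 52      # males: sample size and number saying "yes"
n2, x2 = 131, 120    # females

p1, p2 = x1 / n1, x2 / n2
se = (p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2) ** 0.5   # unpooled standard error
diff = p1 - p2

print(round(diff - z_crit * se, 4), round(diff + z_crit * se, 4))   # about -0.2746 and -0.0502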
4.3.3 Independent Samples and Matched Samples
Matched samples, also called matched pairs, paired samples or dependent samples, are paired such that all characteristics except the one under review are shared by the participants. A “participant” is a member of the sample and can be a person, object or thing. Matched pairs are widely used to assign one person to a treatment group and another to a control group. This method, called matching, is used in the design of matched pairs. The “pairs” need not be two different persons; they may be the same person measured at different times, for example:
●● The same study participants are measured before and after an intervention.
●● The same study participants are measured twice for two different interventions.
An independent sample is the opposite of a matched sample and deals with unrelated groups. Although matched pairs are intentionally selected, independent samples are typically drawn at random.
One of the essential uses of a test comparing two population variances is to check the equal-variances assumption when you want to use pooled variances. Many people use this test as a guide to see if there are any clear violations, much like using a rule of thumb.
An F-test is used to test if the variances of two populations are equal. This test can
be a two-tailed test or a one-tailed test.
The two-tailed version tests the null hypothesis that the two variances are equal against the alternative that they are not equal.
The one-tailed version tests only in one direction, that is, whether the variance of the first population is either greater than or less than (but not both) the variance of the second population. The choice is determined by the problem. If we are testing a new process, for example, we might only be interested in knowing whether the new process is less variable than the old process.
Application:
To compare the variances of two quantitative variables, the hypotheses of interest
are:
Null hypothesis:        H0: σ1²/σ2² = 1
Alternative hypotheses: Ha: σ1²/σ2² ≠ 1 (two-tailed)
                        Ha: σ1²/σ2² > 1 (one-tailed)
                        Ha: σ1²/σ2² < 1 (one-tailed)
Example:
Suppose 7 women are randomly selected from a population of women, and 12 men from a population of men. The table below shows the standard deviation in each sample and in each population. Compute the f statistic.
Population Population standard deviation Sample standard deviation
Women 30 35
Men 50 45
Solution:
The f statistic can be computed from the population and sample standard deviations as f = (s1²/σ1²) / (s2²/σ2²). Taking the women's data in the numerator,
f = (35²/30²) / (45²/50²)
= 1.361 / 0.81
= 1.68
For this calculation, the numerator degrees of freedom v1 are 7 - 1 or 6; and the
denominator degrees of freedom v2 are 12 - 1 or 11. On the other hand, if the men’s
data appears in the numerator, we can calculate an f statistic as follows:
f = (2025/2500) / (1225/900)
= 0.81 / 1.361
= 0.595
For this calculation, the numerator degrees of freedom v1 are 12 – 1 or 11; and
the denominator degrees of freedom v2 are 7 – 1 or 6. When you are trying to find the
cumulative probability associated with an f statistic, you need to know v1 and v2.
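A small Python sketch of the same f-statistic calculation (plain arithmetic only; a tail probability would still have to be looked up in an F table or computed with a statistics package):

# f statistic for comparing two variances, women's data in the numerator
s1, sigma1 = 35, 30   # women: sample and population standard deviations
s2, sigma2 = 45, 50   # men

f = (s1 ** 2 / sigma1 ** 2) / (s2 ** 2 / sigma2 ** 2)
v1, v2 = 7 - 1, 12 - 1          # numerator and denominator degrees of freedom
print(round(f, 2), v1, v2)      # about 1.68 with 6 and 11 degrees of freedom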
Assumptions
Several assumptions are made for the test. The populations must be approximately normally distributed (i.e. fit the shape of a bell curve) in order to use the test, and the samples must be independent. In addition, you'll want to bear in mind a few important points:
●● The larger variance should always go in the numerator (the top number) to
force the test into a right-tailed test. Right-tailed tests are easier to calculate.
●● For two-tailed tests, divide alpha by 2 before finding the right critical value.
●● If you are given standard deviations, they must be squared to get the
variances.
●● If your degrees of freedom aren’t listed in the F Table, use the larger critical
value. This helps to avoid the possibility of Type I errors.
4.4.1 Analysis of Variance
Variance is defined as the average of the squared deviations of data points from their mean.
When the data constitute a sample, the variance is denoted by s²x and averaging is done by dividing the sum of the squared deviations from the mean by (n − 1). When the observations constitute the whole population, the variance is denoted by σ² and we divide by N for the average.
Sample variance: s²x = Σ(xi − X̄)² / (n − 1)
Population variance: Var(X) = σ² = Σ(xi − µ)² / N
Where,
X̄ = Sample mean
n = Sample size
µ = Population mean
N = Population size
The population variance is
Var(x) = σ² = Σ(xi − µ)² / N
= Σ(xi² − 2µxi + µ²) / N
= [ Σxi² − 2µΣxi + Nµ² ] / N
= Σxi²/N − 2µ² + µ²
= Σxi²/N − µ²
That is, Var(x) = E(X²) − [E(X)]².
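The identity Var(x) = E(X²) − [E(X)]² can be checked numerically; the sketch below uses a small made-up population purely for illustration:

# Numerical check of Var(X) = E[X^2] - (E[X])^2 on a made-up population
data = [4, 8, 6, 5, 7]
N = len(data)

mu = sum(data) / N
var_direct = sum((x - mu) ** 2 for x in data) / N        # mean squared deviation
var_identity = sum(x ** 2 for x in data) / N - mu ** 2   # E[X^2] - (E[X])^2

print(var_direct, var_identity)   # both equal 2.0 here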
The chi-square test uses the chi-square statistic to test the fit between a theoretical frequency distribution and a frequency distribution of observed data, in which each observation may fall into one of several classes.
Formula of the chi-square test:
χ² = Σ (O − E)² / E
The calculated value is compared with the table value of χ² for the given d.f. and α.
A chi-square test can be used when the data satisfies four conditions:
●● There must be two observed sets of data, or one observed set of data and one expected set of data (generally, there are n rows and c columns of data).
●● The two sets of data must be based on the same sample size.
●● Each cell in the data contains an observed or expected count of five or larger.
●● The different cells in a row or column must represent categorical variables (male, female; less than 25 years of age, 25–40 years of age, older than 40 years of age; etc.).
The chi-square test is mainly used:
●● To test whether the sample differences among various sample proportions are significant or whether they can be attributed to chance.
●● To test the independence of two variables in a contingency table.
●● To use it as a test of goodness of fit.
Example 1:
The operations manager of a company that manufactures tires wants to determine
whether there are any differences in the quality of work among the three daily shifts.
She randomly selects 496 tires and carefully inspects them. Each tire is either classified
as perfect, satisfactory, or defective, and the shift that produced it is also recorded. The
two categorical variables of interest are shift and condition of the tire produced. The
data can be summarized by the accompanying two-way table. Does the data provide
sufficient evidence at the 5% significance level to infer that there are differences in
quality among the three shifts?
           Perfect   Satisfactory   Defective   Total
Shift 2    67        85             1           153
Shift 3    37        72             3           112
Solution:
Performing the chi-square test of independence on the observed counts gives a p-value of 0.071. We cannot conclude that there are differences in quality among the three shifts, since the p-value (0.071) is greater than 0.05. Even if we did have a significant result, we still could not trust it, because 3 (33.3%) of the cells have expected counts below 5.0.
Example 2
A food services manager for a baseball park wants to know if there is a relationship
between gender (male or female) and the preferred condiment on a hot dog. The
following table summarizes the results. Test the hypothesis with a significance level of
10%.
          Ketchup   Mustard   Relish   Total
Male      15        23        10       48
Female    25        19        8        52
Total     40        42        18       100
Solution:
●● H0: Gender and condiments are independent
●● Ha: Gender and condiments are not independent
Expected counts:
          Ketchup   Mustard   Relish   Total
Male      19.20     20.16     8.64     48
Female    20.80     21.84     9.36     52
Total     40        42        18       100
None of the expected counts in the table are less than 5. Therefore, we can
proceed with the Chi-square test. The test statistic is
χ² = (15 − 19.2)²/19.2 + (23 − 20.16)²/20.16 + (10 − 8.64)²/8.64
   + (25 − 20.8)²/20.8 + (19 − 21.84)²/21.84 + (8 − 9.36)²/9.36
   = 2.95
The statistic follows a chi-square distribution with (2 − 1)(3 − 1) = 2 degrees of freedom. Using a table or software, we find the p-value to be 0.2288. With a p-value greater than 10%, we conclude that there is not enough evidence in the data to suggest that gender and preferred condiment are related.
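A minimal Python sketch of the same chi-square test of independence (expected counts derived from row and column totals; the dictionary layout is this sketch's own choice, not prescribed by the text):

# Chi-square test of independence for the condiment example
observed = {
    "Male":   {"Ketchup": 15, "Mustard": 23, "Relish": 10},
    "Female": {"Ketchup": 25, "Mustard": 19, "Relish": 8},
}

row_totals = {r: sum(cols.values()) for r, cols in observed.items()}
col_totals = {}
for cols in observed.values():
    for c, v in cols.items():
        col_totals[c] = col_totals.get(c, 0) + v
grand = sum(row_totals.values())

chi_sq = 0.0
for r, cols in observed.items():
    for c, o in cols.items():
        e = row_totals[r] * col_totals[c] / grand   # expected count under independence
        chi_sq += (o - e) ** 2 / e

df = (len(observed) - 1) * (len(col_totals) - 1)
print(round(chi_sq, 2), df)      # about 2.95 with 2 degrees of freedom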
The chi-squared test, when used with the standard approximation that a chi-
squared distribution is applicable, has the following assumptions:
●● Simple random sample: The sample data is a random sampling from a fixed
distribution or population where each member of the population has an equal
probability of selection. Variants of the test have been developed for complex
samples, such as where the data is weighted.
●● Sample size (whole table): A sample with a sufficiently large size is assumed. If a chi-squared test is conducted on a sample with a small size, the chi-squared test will yield an inaccurate inference. The researcher, by using the chi-squared test on small samples, might end up committing a Type II error.
●● Expected cell count: Adequate expected cell counts. Some require 5 or more,
and others require 10 or more. A common rule is 5 or more in all cells of a 2-by-
2 table, and 5 or more in 80% of cells in larger tables, but no cells with zero
nl
expected count. When this assumption is not met, Yates’s correction is applied.
●● Independence: The observations are always assumed to be independent of each other. This means chi-squared cannot be used to test correlated data (like matched pairs or panel data). In those cases you might want to turn to McNemar's test.
The degrees of freedom, abbreviated as d.f., denote the extent of independence (freedom) enjoyed by a given set of observed frequencies. Degrees of freedom are usually denoted by the Greek letter ν (nu).
Suppose, if we are given a set of ‘n’ observed frequencies which are subjected to
s
‘k’ independent constraints (restrictions). Then
●● Type I error: A type 1 error is also known as a false positive and occurs when a
researcher incorrectly rejects a true null hypothesis.
●● Type II error: A type II error is a false negative and occurs when a researcher fails to reject a null hypothesis that is really false, on the basis of the sample data.
●● Z- Test: A z-test is a statistical test to determine whether two population means
are different when the variances are known and the sample size is large.
●● p-Value: The p-value is the probability of obtaining outcomes at least as extreme as the observed outcome, assuming that the null hypothesis is true.
●● Degrees of Freedom: The degree of freedom, abbreviated as d.f, denotes
the extent of independence or the freedom enjoyed by a given set of observed
frequencies
1. A ____ is a range of values within which the true value lies.
a) Confidence Interval
b) Quartile range
c) Sample
d) Mean
2. A ____ is a statistical test to determine whether two population means are different when the variances are known.
a) T test
b) Quartile
c) z test
d) Median
3. What denotes the extent of independence enjoyed by a given set of observed
frequencies
a) Standard deviation
b) Median
c) Degree of freedom
d) Hypothesis
4. Which test is used as a test of goodness of fit?
a) Z test
b) T test
c) Chi square test
d) Fitness test
5. A _____ is also known as a false positive and occurs when a researcher incorrectly rejects a true null hypothesis.
a) Type I error
b) Type II error
c) T test error
d) Probability error
1. a) Confidence Interval
2. c) z test
3. c) Degree of freedom
4. c) Chi square test
5. a) Type I error
Further Readings
1. Richard I. Levin, David S. Rubin, Sanjay Rastogi Masood Husain Siddiqui,
Statistics for Management, Pearson Education, 7th Edition, 2016.
2. Prem.S.Mann, Introductory Statistics, 7th Edition, Wiley India, 2016.
3. Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani, An
Introduction to Statistical Learning with Applications in R, Springer, 2016.
Bibliography
1. Srivastava V. K. et al. – Quantitative Techniques for Managerial Decision Making,
Wiley Eastern Ltd
2. Richard I. Levin and Charles A. Kirkpatrick – Quantitative Approaches to Management, McGraw Hill Kogakusha Ltd.
3. Prem.S.Mann, Introductory Statistics, 7th Edition, Wiley India, 2016.
4. Budnick, Frank S., Dennis McLeavey, Richard Mojena – Principles of Operations Research – AIT BS, New Delhi.
Learning Objective:
●● To understand the measures of linear relationship between variables
●● To get familiarized with Time Series Analysis
Learning Outcome:
●● Understand and apply forecasting techniques for business decision making and to
uncover relationships between variables to produce forecasts of the future values
of strategic variables
“If two or more quantities vary in sympathy, so that the movement in one tends to be accompanied by corresponding movements in the others, then they are said to be correlated.”
L.R. Conner says –
5.1.1 Measures of Linear Relationship: covariance & correlation – Intro
We often encounter the situations, where data appears as pairs of figures relating
to two variables, for example, price and demand of commodity, money supply and
inflation, industrial growth and GDP, advertising expenditure and market share, etc.
Examples of correlation problems are found in the study of the relationship
between IQ and aggregate percentage marks obtained in mathematics examination
or blood pressure and metabolism. In these examples, both variables are observed
as they naturally occur, since neither variable can be fixed at predetermined levels.
Correlation and regression analysis show how to determine the nature and strength of the relationship between variables. It may be the case that one is the cause and the other is an effect, i.e. independent and dependent variables respectively. On the other hand, both may depend on a third variable. In some cases there may not be any cause-effect relationship at all. Therefore, if we do not consider and study the underlying economic or physical relationship, correlation may sometimes give absurd results.
For example, consider the case of the global average temperature and the Indian population: both have been increasing over the past 50 years but are obviously not related. Correlation is an analysis of the degree to which two or more variables fluctuate with reference to each other.
Correlation is expressed by a coefficient ranging between –1 and +1. A positive (+ve) sign indicates movement of the variables in the same direction, e.g. the variation of the fertilizer used on a farm and the yield shows a positive relationship within technological limits. A negative (–ve) coefficient indicates movement of the variables in opposite directions, i.e. when one variable decreases, the other increases, e.g. the price and demand of a commodity have an inverse relationship. Absence of correlation is indicated if the coefficient is close to zero. A value of the coefficient close to ±1 denotes a very strong linear relationship.
●● To identify relationship of various factors and decision variables.
●● To estimate value of one variable for a given value of other if both are
correlated.
●● To understand economic behaviour and market forces.
●● To reduce uncertainty in decision-making to a large extent.
In business, correlation analysis often helps managers take decisions by estimating the effects of changing the values of decision variables such as promotion, advertising, price and production processes on objective parameters such as costs, sales, market share, consumer satisfaction and competitive price. The decision becomes more objective by removing subjectivity to a certain extent. However, it must be understood that correlation analysis only tells us whether two or more variables in a data set fluctuate together or not. This need not be due to a cause-and-effect relationship. To know whether the fluctuations in one of the variables indeed affect the other, the relationship has to be established with a logical understanding of the business environment.
1. Positive or negative correlation: The correlation is said to be positive when an increase (decrease) in the value of one variable is accompanied by an increase (decrease) in the value of the other variable as well.
When we say there is perfect correlation, the scatter diagram shows a linear (straight-line) plot with all points falling on the straight line. If we take an appropriate scale, the straight line's inclination can be adjusted to 45°, although this is not necessary as long as the inclination is not 0° or 90°, where there is no correlation at all because the value of one variable changes without any change in the value of the other variable.
In case of negative correlation, when one variable increases the other decreases, and vice versa. If the scatter diagram shows the points distributed closely around an imaginary line, we say there is a high degree of correlation. On the other hand, if we can hardly see any unique imaginary line around which the observations are scattered, we say correlation does not exist. Even in the case of the imaginary line being parallel to one of the axes, we say no correlation exists between the variables. If the imaginary line is a straight line, we say the correlation is linear.
2. Simple or multiple correlations: In simple correlation the variation is between
only two variables under study and the variation is hardly influenced by any external
factor. In other words, if one of the variables remains same, there won’t be any change
in other variable. For example, variation in sales against price change in case of a
price sensitive product under stable market conditions shows a negative correlation. In
multiple correlations, more than two variables affect one another. In such a case, we
need to study correlation between all the pairs that are affecting each other and study
extent to which they have the influence.
3. Partial or total correlation: In case of partial correlation, we study the variation of two variables, excluding the effects of other variables by keeping them under controlled conditions. In the case of a 'total correlation' study, we allow all relevant variables to vary with respect to each other and find the combined effect. With few variables, it is feasible to study 'total correlation'. As the number of variables increases, it becomes impractical to study the 'total correlation'. For example, the coefficient of correlation between the yield of wheat and chemical fertilizers, excluding the effects of pesticides and manures, is called partial correlation.
When the amount of change in one variable tends to keep a constant ratio to the
amount of change in the other variable, then the correlation is said to be linear.
The distinction between linear and non-linear is based upon the consistency of the
ratio of change between the variables. The manager must be careful in analyzing the
correlation using coefficients because most of the coefficients are based on assumption
of linearity. Hence plotting a scatter diagram is good practice. In case of linear
correlation, the differential (derivative) of relationship is constant with the graph of the
data being a straight line.
In the case of non-linear (curvilinear) correlation, analysis based on the linear assumption will be misleading unless used over a very short data range. Using computers, we can analyse a non-linear correlation to a certain extent, with some simplifying assumptions.
Many times the observations are grouped into a 'two-way' frequency distribution table. This is called a bivariate frequency distribution. It is a matrix where the rows are grouped for the X variable and the columns are grouped for the Y variable. Each cell, say (i, j), represents the frequency or count that falls in both groups for a particular range of values of Xi and Yj. In this case the correlation coefficient is given by
r = [ Σ(f·mx·my) − (1/n) Σ(f·mx) Σ(f·my) ] / √{ [ Σ(f·mx²) − (Σ(f·mx))²/n ] [ Σ(f·my²) − (Σ(f·my))²/n ] }
Where mX and mY are class marks of frequency distributions of X and Y variables,
fX and fY are marginal frequencies of X and Y and fXY are joint frequencies of X and Y
respectively.
0-200 12 6 - - - 18
200-400 2 18 4 2 1 27
400-600 - 4 7 3 - 14
600-800 - 1 - 2 1 4
800-1000 - - 1 2 3 6
Total 14 29 12 9 5 69
Solution: Let the assumed mean for X be a = 1250 and the scaling factor g = 500. Therefore, we can calculate f × dx and f × dx² from the marginal distribution of X, taking dx = (mx − a)/g.

Karl Pearson's coefficient of correlation is defined as
r = [ (1/n) Σ(X − X̄)(Y − Ȳ) ] / (σx σy)
Where r is the ‘Correlation Coefficient’ or ‘Product Moment Correlation Coefficient’
between X and Y. σ X and σ Y are the standard deviations of X and Y respectively. ‘n’ is
the number of the pairs of variables X and Y in the given data.
The quantity (1/n) Σ(X − X̄)(Y − Ȳ) is known as the covariance between the variables X and Y. It is denoted as Cov(x, y). The Correlation Coefficient r is a dimensionless number whose value lies between +1 and –1. Positive values of r indicate positive (or direct) correlation between the two variables X and Y, i.e. both X and Y increase or decrease together.
Negative values of r indicate negative (or inverse) correlation, thereby meaning that
an increase in one variable X or Y results in a decrease in the value of the other variable.
A zero correlation means that there is no association between the two variables.
The formula can be modified as,
r = (1/n) Σ(X − X̄)(Y − Ȳ) / (σx σy) = (1/n) Σ(XY − X̄Y − XȲ + X̄Ȳ) / (σx σy)

= [ ΣXY/n − (ΣX/n)(ΣY/n) ] / √{ [ ΣX²/n − (ΣX/n)² ] [ ΣY²/n − (ΣY/n)² ] }    ...(2)

= [ E[XY] − E[X]E[Y] ] / √{ (E[X²] − (E[X])²)(E[Y²] − (E[Y])²) }    ...(3)
Equations (2) and (3) are alternate forms of equation (1). They have the advantage that the deviation of each value from the mean need not be computed.
Example: The data of advertisement expenditure (X) and sales (Y) of a company for a past 10-year period is given below. Determine the correlation coefficient between these variables.
X 50 50 50 40 30 20 20 15 10 5
Y 700 650 600 500 450 400 300 250 210 200
Solution: We shall take U to be the deviation of the X values from the assumed mean of 30, divided by 5. Similarly, V represents the deviation of the Y values from the assumed mean of 400, divided by 10.
i     X    Y     u    v    uv    u²    v²
1     50   700   4    30   120   16    900
2     50   650   4    25   100   16    625
3     50   600   4    20   80    16    400
4     40   500   2    10   20    4     100
5     30   450   0    5    0     0     25
6     20   400   -2   0    0     4     0
7     20   300   -2   -10  20    4     100
8     15   250   -3   -15  45    9     225
9     10   210   -4   -19  76    16    361
10    5    200   -5   -20  100   25    400
Total            -2   26   561   110   3136
r = [ Σuivi − (1/n)(Σui)(Σvi) ] / √{ [ Σui² − (1/n)(Σui)² ] [ Σvi² − (1/n)(Σvi)² ] }

= [ 561 − (−2)(26)/10 ] / √{ [ 110 − (−2)²/10 ] [ 3136 − (26)²/10 ] }

= (561 + 5.2) / √(109.6 × 3068.4) = 566.2 / 579.9 = 0.976
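For comparison, the same correlation coefficient can be computed directly from the raw data; the following Python sketch uses equation (2) rather than the coded u, v values (all names are this sketch's own):

# Product-moment correlation for the advertisement/sales data
x = [50, 50, 50, 40, 30, 20, 20, 15, 10, 5]
y = [700, 650, 600, 500, 450, 400, 300, 250, 210, 200]
n = len(x)

sxy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
sxx = sum(a * a for a in x) - sum(x) ** 2 / n
syy = sum(b * b for b in y) - sum(y) ** 2 / n

r = sxy / (sxx * syy) ** 0.5
print(round(r, 3))    # about 0.976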
Interpretation of r
The correlation coefficient, r ranges from −1 to 1. A value of 1 implies that a linear
equation describes the relationship between X and Y perfectly, with all data points lying
on a line for which Y increases as X increases. A value of −1 implies that all data points
lie on a line for which Y decreases as X increases. A value of 0 implies that there is no
linear correlation between the variables.
More generally, note that (Xi − X) (Yi − Y) is positive if and only if Xi and Yi lie
on the same side of their respective means. Thus the correlation coefficient is positive
if Xi and Yi tend to be simultaneously greater than, or simultaneously less than, their
respective means.
values.
●● When r is positive, the variables x and y increases or decrease together.
●● r = +1 implies that there is a perfect positive correlation between variables x
and y.
●● When r is negative, the variables x and y move in the opposite direction.
●● When r = -1, there is a perfect negative correlation.
There are situations where it is not possible to measure the variables quantitatively. Also, there are occasions where it is difficult to measure the cause-effect variables. For example, while selecting a candidate, there are a number of factors on which the experts base their assessment. It is not possible to measure many of these parameters in physical units, e.g. sincerity, loyalty, integrity, tactfulness, initiative, etc. Similar is the case during dance contests. However, in these cases the experts may rank the candidates. It is then necessary to find out whether the two sets of ranks are in agreement with each other. This is measured by the Rank Correlation Coefficient.
The purpose of computing a correlation coefficient in such situations is to determine
the extent to which the two sets of ranking are in agreement. The coefficient that is
determined from these ranks is known as Spearman’s rank coefficient, rS
rs = 1 − [ 6 Σdi² ] / [ n(n² − 1) ]
Where, n = number of observation pairs
di = Xi − Yi
Xi = values (ranks) of variable X and Yi = values (ranks) of variable Y
Rank Correlation when Ranks are given
Example: Ranks obtained by a set of ten students in a mathematics test (variable
X) and a physics test (variable Y) are shown below:
Rank for Variable X 1 2 3 4 5 6 7 8 9 10
Rank for Variable Y 3 1 4 2 6 9 8 10 5 7
Solution: Computations of Spearman's Rank Correlation are shown below:

Rank X   Rank Y   d     d²
1        3        +2    4
2        1        -1    1
3        4        +1    1
4        2        -2    4
5        6        +1    1
6        9        +3    9
7        8        +1    1
8        10       +2    4
9        5        -4    16
10       7        -3    9
Total                   50
Now, n = 10 and Σdi² = 50. Therefore,
rs = 1 − (6 × 50) / (10(100 − 1)) = 1 − 300/990 = 0.697
It can be said that there is a high degree of correlation between the performance in
mathematics and physics.
Example: Find the rank correlation coefficient for the following data.
X 75 88 95 70 60 80 81 50
Y 120 134 150 115 110 140 142 100
Solution: Let R1 and R2 denote the ranks in X and Y respectively.
X     Y     R1   R2   d = R1 − R2   d²
75    120   5    5    0             0
88    134   2    4    -2            4
95    150   1    1    0             0
70    115   6    6    0             0
60    110   7    7    0             0
80    140   4    3    1             1
81    142   3    2    1             1
50    100   8    8    0             0
                      Total         6
Coefficient of Rank Correlation ρ = 1 − 6Σd² / [n(n² − 1)] = 1 − (6 × 6)/(8(64 − 1)) = 1 − 36/504 = +0.93
In this method (when ranks are not given), the biggest item gets the first rank, the next biggest the second rank, and so on.
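A short Python sketch of Spearman's formula applied to the mathematics/physics ranks above (assuming no tied ranks, so the simple formula applies; variable names are illustrative):

# Spearman's rank correlation for the mathematics/physics ranks
rank_x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
rank_y = [3, 1, 4, 2, 6, 9, 8, 10, 5, 7]
n = len(rank_x)

d_sq = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
r_s = 1 - 6 * d_sq / (n * (n ** 2 - 1))

print(d_sq, round(r_s, 3))   # 50 and about 0.697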
5.1.6 Regression Model
There is a need for a statistical model that will extract information from the given data to establish the regression relationship between the independent and dependent variables. The model should capture the systematic behaviour of the data. The non-systematic behaviour cannot be captured and is called error. The error is due to a random component that cannot be predicted, as well as to components not adequately considered in the statistical model. A good statistical model captures the entire systematic component, leaving only random errors.
The best fit is calculated as per Legendre's principle of the least sum of squares of deviations of the observed data points from the corresponding values on the 'best fit' curve. This is called the minimum squared error criterion. It may be noted that if we take a data point (x, y), measure the corresponding y value on the 'best fit' curve and then take the deviation in y, we call it regression of Y on X. In the other case, if we measure deviations in the X direction, we call it regression of X on Y.
Definition: According to Morris Myers Blair, regression is the measure of the average
relationship between two or more variables in terms of the original units of the data.
Applicability of Regression Analysis
Regression analysis is one of the most popular and commonly used statistical tools in business. The availability of computer packages has simplified its use. However, one must be careful before using this tool, as it gives only a mathematical measure based on the available data. It does not check whether a cause-effect relationship really exists and, if it exists, which is the dependent and which is the independent variable.
Regression analysis is a branch of statistical theory which is widely used in
all the scientific disciplines. It is a basic technique for measuring or estimating the
relationship among economic variables that constitute the essence of economic theory
and economic life. The uses of regression analysis are not confined to economic and
business activities. Its applications are extended to almost all the natural, physical and
social sciences.
For the regression of Y on X, Y = a + bX, writing xi = Xi − X̄ and yi = Yi − Ȳ for the deviations from the means, we have
Σxi² = ΣXi² − nX̄²
Σyi² = ΣYi² − nȲ²
Σxiyi = ΣXiYi − nX̄Ȳ
b = Σxiyi / Σxi²,   a = Ȳ − bX̄
These measures define a and b, which give the best possible fit through the original X and Y points, and the value of r can then be worked out as under:
r = b √( Σxi² / Σyi² )
Thus, the regression analysis is a statistical method to deal with the formulation
of mathematical model depicting relationship amongst variables which can be used for
the purpose of prediction of the values of dependent variable, given the values of the
independent variable.
In fitting the regression equation Y = a + bX, the values of a and b can also be obtained using the following two normal equations:
ΣYi = na + bΣXi
ΣXiYi = aΣXi + bΣXi²
Solving these equations gives the values of a and b. Once these values are obtained and put in the equation Y = a + bX, we say that we have fitted the regression equation of Y on X to the given data. In a similar fashion, we can develop the regression equation of X on Y, viz. X = a + bY, presuming Y as the independent variable and X as the dependent variable.
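A minimal Python sketch of fitting Y = a + bX by least squares, applied to the advertisement/sales data used earlier (deviation form of the formulas; variable names are this sketch's own):

# Least-squares fit of Y = a + bX for the advertisement/sales data
x = [50, 50, 50, 40, 30, 20, 20, 15, 10, 5]
y = [700, 650, 600, 500, 450, 400, 300, 250, 210, 200]
n = len(x)

x_bar, y_bar = sum(x) / n, sum(y) / n
sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
sxx = sum((a - x_bar) ** 2 for a in x)

b = sxy / sxx          # slope, the regression coefficient of Y on X
a = y_bar - b * x_bar  # intercept

print(round(a, 2), round(b, 3))   # roughly 126.37 and 10.332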
5.1.8 Assessing the Model
Let the trend equation be Y = a + bt + ct² (dropping the subscript for convenience). Here a, b and c are constants to be determined from the given data. Using the method of least squares, the normal equations for the simultaneous solution of a, b and c are:
ΣY = na + bΣt + cΣt²
ΣtY = aΣt + bΣt² + cΣt³
Σt²Y = aΣt² + bΣt³ + cΣt⁴
By selecting a suitable year of origin, i.e., defining X = t − origin such that ΣX = 0, the computational work can be considerably simplified. Also note that if ΣX = 0, then ΣX³ will also be equal to zero. Thus, the above equations can be rewritten as:
ΣY = na + cΣX²        ...(i)
ΣXY = bΣX²            ...(ii)
ΣX²Y = aΣX² + cΣX⁴    ...(iii)
From equation (ii), b = ΣXY / ΣX²        ...(iv)
Further, from equation (i), we get a = (ΣY − cΣX²) / n        ...(v)
And from equation (iii), we get c = [ nΣX²Y − (ΣX²)(ΣY) ] / [ nΣX⁴ − (ΣX²)² ]        ...(vi)
Thus, equations (iv), (v) and (vi) can be used to determine the values of the
constants a, b and c.
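A small Python sketch of equations (iv)–(vi) applied to a made-up coded series (the X and Y values below are illustrative only, with X chosen so that ΣX = 0):

# Quadratic trend Y = a + b*X + c*X**2 fitted with equations (iv)-(vi)
X = [-2, -1, 0, 1, 2]          # coded years, chosen so that sum(X) = 0
Y = [10, 12, 15, 19, 25]       # illustrative values only

n = len(X)
sx2 = sum(x * x for x in X)
sx4 = sum(x ** 4 for x in X)
sy = sum(Y)
sxy = sum(x * y for x, y in zip(X, Y))
sx2y = sum(x * x * y for x, y in zip(X, Y))

b = sxy / sx2                                      # equation (iv)
c = (n * sx2y - sx2 * sy) / (n * sx4 - sx2 ** 2)   # equation (vi)
a = (sy - c * sx2) / n                             # equation (v)

print(round(a, 3), round(b, 3), round(c, 3))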
5.1.9 Standard Error of Estimate
Standard Error of Estimate is the measure of variation around the computed
regression line.
The standard error of estimate (SE) of Y measures the variability of the observed values of Y around the regression line, i.e. it gives a measure of the scatter of the observations about the line of regression, where Y is the observed value of y and Ye is the estimated value from the regression equation corresponding to each y value.
Regression Coefficient of X on Y
The regression coefficient of X on Y is represented by the symbol bxy; it measures the change in X for a unit change in Y. When the deviations x = X − X̄ and y = Y − Ȳ are taken from the actual means of X and Y, bxy = Σxy / Σy². When the deviations dx and dy are obtained from assumed means, the following formula is used:
bxy = [ N Σdxdy − (Σdx)(Σdy) ] / [ N Σdy² − (Σdy)² ]
Regression Coefficient of Y on X
The symbol byx is used; it measures the change in Y corresponding to a unit change in X.
●● In case the deviations are taken from the actual means, the following formula is used: byx = Σxy / Σx².
●● When the deviations are taken from assumed means, byx = [ N Σdxdy − (Σdx)(Σdy) ] / [ N Σdx² − (Σdx)² ].
The regression coefficient determines the slope of the line, i.e., the change in the dependent variable for a unit change in the independent variable.
5.1.10 Regression Coefficient
The coefficients of regression are bYX and bXY. They have the following implications:
●● The slopes of the regression lines of Y on X and X on Y, viz. bYX and bXY, must have the same sign (because r² cannot be negative).
●● The correlation coefficient is the geometric mean of bYX and bXY, i.e. r = ±√(bYX · bXY).
●● If both slopes bYX and bXY are positive, the correlation coefficient r is positive; if both bYX and bXY are negative, the correlation coefficient r is negative.
●● Both regression lines intersect at the point (X̄, Ȳ).
As in case of calculation of correlation coefficient, we can directly write the formula
for the two regression coefficients for a bivariate frequency distribution as given below –
b = [ N ΣΣ fij xi yj − (Σ fi xi)(Σ fj yj) ] / [ N Σ fi xi² − (Σ fi xi)² ]
or, if we define ui = (Xi − A)/h and vj = (Yj − B)/k,
b = (k/h) · [ N ΣΣ fij ui vj − (Σ fi ui)(Σ fj vj) ] / [ N Σ fi ui² − (Σ fi ui)² ]
Similarly, d = [ N ΣΣ fij xi yj − (Σ fi xi)(Σ fj yj) ] / [ N Σ fj yj² − (Σ fj yj)² ]
or d = (h/k) · [ N ΣΣ fij ui vj − (Σ fi ui)(Σ fj vj) ] / [ N Σ fj vj² − (Σ fj vj)² ]
Decomposition is a procedure for separating the time series into these patterns, and the patterns are then used for forecasting. However, a more accurate and statistically sound procedure is to identify the patterns in a time series using auto-correlations, which were explained in the previous subsection. Auto-correlation is the correlation between the values of the same variable at different time lags. When the time series represents completely random data, the auto-correlation for various time lags is close to zero, with values fluctuating on both the positive and negative sides. If the auto-correlation drops to zero slowly, and more than two or three auto-correlations differ significantly from zero, it indicates the presence of a trend in the data. The trend can be removed by taking the difference between consecutive values and constructing a new series. This is called differencing.
Definition
A time series is a collection of data obtained by observing a response variable at periodic points in time. If repeated observations on a variable produce a time series, the variable is called a time series variable. We use Yi to denote the value of the variable at time i.
Objectives of Time Series
The analysis of time series implies its decomposition into various factors that affect
the value of its variable in a given period. It is a quantitative and objective evaluation of
the effects of various factors on the activity under consideration.
There are two main objectives of the analysis of any time series data:
1. To understand the past behaviour of the variable and the effects of various factors on it.
2. To make forecasts for the future. The study of past behaviour is essential because it provides us the knowledge of the effects of various forces. This can facilitate the process of anticipating the future course of events and, thus, forecasting the value of the variable as well as planning for the future.
5.2.2 Variation in Time Series er
Time Series analysis – Secular Component
Secular trend, or simply trend, is the general tendency of the data to increase, decrease or stagnate over a long period of time. Most business and economic time series reveal a tendency to increase or to decrease over a number of years. For example, data regarding industrial production, agricultural production, population, bank deposits, deficit financing, etc., show that, in general, these magnitudes have been rising over a fairly long period. As opposed to this, a time series may also reveal a declining trend, e.g. in the case of the substitution of one commodity by another, the demand for the substituted commodity reveals a declining trend, such as the demand for cotton clothes or the demand for coarse grains like bajra and jowar. With improved medical facilities, the death rate is likely to show a declining trend. The change in trend, in either case, is attributable to fundamental forces such as changes in population, technology, tastes and habits.
Seasonal variations repeat over periods shorter than a year, such as quarterly, monthly, weekly or daily. A time series where the time interval between successive observations is less than or equal to one year may show the effects of both seasonal and cyclical variations. However, seasonal variations are absent if the time interval between successive observations is greater than one year.
Climate and Weather Conditions: Changes in climate and weather conditions affect the value of the time series variable, and the resulting changes are known as seasonal variations. For example, the sale of woollen garments is generally at its peak in the months of November and December because of the beginning of the winter season. Similarly, timely rainfall may increase agricultural output, prices of agricultural commodities are lowest during their harvesting season, etc.; these reflect the effect of climatic conditions on the value of the time series variable.
Customs and Traditions: The customs and traditions of the people also give rise
to the seasonal variations in time series. For example, the purchase of clothing and
ornaments may be highest during the marriage season, sale of sweets during Diwali,
etc., are variations which are the results of customs and traditions of the people.
●● Cyclical variations are revealed by most of the economic and business
time series and, therefore, are also termed as trade or the business cycles.
Any trade cycle has four phases which are respectively known as boom,
recession, depression and recovery.
●● Various phases repeat themselves regularly one after another in the given
sequence. The time interval between two identical phases is known as the
period of cyclical variations. The period is always greater than one year.
Normally, the period of cyclical variations lies between 3 to 10 years.
Random or irregular variations are caused by unforeseen events such as fire, floods, war, famines, etc. Random variation is that component of a time series that cannot be explained in terms of any of the components discussed so far. This component is obtained as a residue after the elimination of trend, seasonal and cyclical components, and hence is often termed the residual component. Random variations are usually short-term variations, but sometimes their effect may be so intense that the value of the trend may get permanently affected.
Numerical Application
Using the freehand method, determine the trend of the following data:
Production (in tonnes): 42 44 48 42 46 50 48 52
Solution:
Example 2 - Find trend values from the following data using three yearly moving
averages and show the trend line on the graph.
Year   Price (`)   Year   Price (`)
1994   52          2000   75
1995   65          2001   70
1996   58          2002   64
1997   63          2003   78
1998   66          2004   80
1999   72          2005   73
Solution:
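A minimal Python sketch of the three-yearly moving-average calculation for these figures (each average is centred on the middle year of its group of three; the variable names are this sketch's own):

# Three-yearly moving averages for the price series
years = list(range(1994, 2006))
prices = [52, 65, 58, 63, 66, 72, 75, 70, 64, 78, 80, 73]

for i in range(1, len(prices) - 1):
    total = prices[i - 1] + prices[i] + prices[i + 1]   # three-yearly moving total
    print(years[i], round(total / 3, 2))                # trend value for that year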
Key Terms
●● Correlation: Correlation is expressed by a coefficient ranging between –1 and +1.
Positive (+ve) sign indicates movement of the variables in the same direction.
●● Positive correlation: The correlation is said to be positive when an increase (decrease) in the value of one variable is accompanied by an increase (decrease) in the value of the other variable.
●● Linear correlation: When the amount of change in one variable tends to keep a constant ratio to the amount of change in the other variable, the correlation is said to be linear.
●● Regression: Regression is a basic technique for measuring or estimating the
relationship among economic variables that constitute the essence of economic
theory and economic life.
1. The correlation that refers to the movement of the variables in the same direction is
a) Constant
b) Positive
c) Negative
d) Probability
2. The correlation that refers to the movement of the variables in opposite direction
a) Constant
b) Positive
c) Negative
d) Probability
3. A ____ is a collection of data obtained by observing a response variable at
periodic points in time
a) Mean deviation
b) Sample
c) Time Series
d) Hypothesis
4. The technique for estimating the relationship among economic variables that constitute the essence of economic theory is?
a) Correlation
b) Time Series
c) Regression
d) Standard deviation
5. In ____ the variation is between only two variables under study and the variation is
hardly influenced by any external factor.
a) Partial correlation
b) Total correlation
c) Standard correlation
d) Multiple correlation
The data of advertisement expenditure (X) and sales (Y) of a company for a past 10-year period is given below. Determine the correlation coefficient between these variables and comment on the correlation.
X 50 50 50 40 30 20 20 15 10 5
Y 700 650 600 500 450 400 300 250 210 200
1. b) Positive
2. c) Negative
3. c) Time Series
4. c) Regression
5. d) Multiple correlation
Further Readings
1. Richard I. Levin, David S. Rubin, Sanjay Rastogi Masood Husain Siddiqui,
Statistics for Management, Pearson Education, 7th Edition, 2016.
2. Prem.S.Mann, Introductory Statistics, 7th Edition, Wiley India, 2016.
3. Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani, An
Introduction to Statistical Learning with Applications in R, Springer, 2016.
Bibliography
1. Srivastava V. K. et al. – Quantitative Techniques for Managerial Decision Making,
Wiley Eastern Ltd
2. Richard I. Levin and Charles A. Kirkpatrick – Quantitative Approaches to Management, McGraw Hill Kogakusha Ltd.
3. Prem.S.Mann, Introductory Statistics, 7th Edition, Wiley India, 2016.
4. Budnick, Frank S., Dennis McLeavey, Richard Mojena – Principles of Operations Research – AIT BS, New Delhi.