MBA 105 Statistical Techniques
IMPORTANCE OF STATISTICS IN
BUSINESS MANAGEMENT DOMAIN
CHAPTER OBJECTIVES
1. To familiarize the students with the use of statistics in business domain.
2. To familiarize the students with the data presentation techniques.
3. To develop the quantitative skills of the students to make them skilled at
understanding data, tabulation of data, diagrammatic and graphical
presentation of the data.
CONTENTS
1.1 definition of business statistics.
1.2 importance of statistics in business domain.
1.3 frequency distribution.
1.4 types of frequency distribution.
1.5 diagrammatic presentation of data.
a) Histogram.
b) Frequency polygon.
c) Frequency curve.
d) Cumulative frequency curves (ogive curves).
1.6 graphical presentation of data.
a) Simple bar diagram.
b) Sub divided bar diagram
c) Percentage bar diagram
d) Pie chart
KEY WORDS
Business domain, frequency, polygon, Histograms, Ogive curves, pie chart
INTRODUCTION
Statistics plays an important role in business. Statistical techniques refine
raw data into usable information, much as crude oil is refined into fuel before it
can be used in an automobile engine. The word Statistics comes from the Italian
word ‘Statista’ (statesman), which relates to the State. The word was first used by
Professor Gottfried Achenwall in the eighteenth century. Like many other words,
Statistics has different meanings for different people. It is a body of knowledge
or a branch of Mathematics.
The singular word ‘statistic’ means a quantity calculated from data, whereas
‘statistics’ is the subject comprising the theorems and techniques
used for the analysis of data.
There have been many definitions of Statistics. A few definitions are given
below.
Merriam Webster’s Definition: Merriam-Webster dictionary defines
statistics as "a branch of mathematics dealing with the collection, analysis,
interpretation, and presentation of masses of numerical data.”
The above definition gives importance to collection of data and presentation
of collected data. It cannot cover all the aspects of Statistics.
Sir Arthur Lyon Bowley’s Definition: Sir Arthur Lyon Bowley defines
statistics as "Numerical statements of facts in any department of inquiry placed
in relation to each other.”
The above definition emphasises the relation between
numerical statements. It is too brief and therefore confines the scope of
Statistics.
Yule and Kendall’s Definition: “By Statistics we mean quantitative data
affected to a marked extent by multiplicity of causes.”
The above definition is also too narrow to convey the wide scope of
Statistics.
Horace Secrist’s Definition: “By Statistics we mean aggregates of facts
affected to a marked extent by multiplicity of causes, numerically expressed,
enumerated or estimated according to reasonable standards of accuracy, collected
in a systematic manner for a predetermined purpose and placed in relation to
each other.”
The above definition covers the functions and aspects of Statistics
comprehensively. It includes almost all the attributes of Statistics.
The following figure exhibits the meaning of Statistics.
[Figure: the meaning of Statistics]
In Statistics, the frequency (or absolute frequency) of an event i is the number n_i
of times the event occurred in an experiment or study.
A frequency distribution is a table that displays the frequency of various outcomes in
a sample. Each entry in the table contains the frequency or count of the occurrences of
values within a group or interval and in this way the table summarizes the distribution of
values in the sample.
According to Croxton and Cowden “A frequency distribution is a statistical table which
shows the set of all the distinct values of the variable arranged in the order of magnitude
either individually or in groups with their corresponding frequencies side by side.”
Ex. 2: Given below are the marks obtained by 20 students in a Statistical Techniques
examination. Prepare a discrete frequency distribution table.
15, 25, 20, 30, 20, 40, 10, 40, 15, 20, 30, 10,10, 15, 20, 20, 40, 25, 25, 40
Solution
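As a minimal sketch of how the discrete frequency distribution for Ex. 2 can be tabulated, the Python snippet below counts how often each mark occurs. The function name `discrete_frequency_table` is a hypothetical helper chosen for illustration; only the marks come from the example.

```python
from collections import Counter

# Marks of the 20 students listed in Ex. 2.
marks = [15, 25, 20, 30, 20, 40, 10, 40, 15, 20,
         30, 10, 10, 15, 20, 20, 40, 25, 25, 40]


def discrete_frequency_table(values):
    """Return (value, frequency) pairs sorted by value."""
    counts = Counter(values)
    return sorted(counts.items())


table = discrete_frequency_table(marks)

print("Marks  Frequency")
for value, freq in table:
    print(f"{value:>5}  {freq:>9}")
print(f"Total  {sum(freq for _, freq in table):>9}")
```

The frequencies add up to 20, the number of students, which is a quick check that no observation was missed.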
Note
(a) The less than cumulative frequency never decreases from one class to
the next. The less than cumulative frequency of the lowest class is the
same as its usual frequency.
(b) The more than cumulative frequency never increases from one class to
the next. The more than cumulative frequency of the lowest class is equal
to the total frequency (i.e. ∑f), which is also the less than cumulative
frequency of the highest class.
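The two properties in the note can be checked numerically. The sketch below reuses the frequencies of the marks in Ex. 2 (an illustrative choice; for discrete values, "less than" is read here as "up to and including the value" and "more than" as "this value or above", which is an assumption about the convention).

```python
from itertools import accumulate

# Marks and their frequencies from Ex. 2, in increasing order of marks.
values = [10, 15, 20, 25, 30, 40]
freqs = [3, 3, 5, 3, 2, 4]

# Running totals from the lowest value upwards.
less_than_cf = list(accumulate(freqs))
# Running totals from the highest value downwards, restored to the original order.
more_than_cf = list(accumulate(reversed(freqs)))[::-1]

for v, f, lt, mt in zip(values, freqs, less_than_cf, more_than_cf):
    print(f"{v:>5} {f:>3} {lt:>4} {mt:>4}")
```

The first less than cumulative frequency equals the first frequency, and the first more than cumulative frequency equals ∑f, as stated in (a) and (b).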
(v) Relative frequency distribution: The relative frequency of a class is the
proportion of all observations that fall in that class. The formula for relative
frequency is given below.
\[ \text{Relative Frequency} = \frac{\text{Class Frequency}}{\text{Total Frequency}} \]
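The formula can be applied directly in code. This is a small sketch that reuses the frequencies of the marks in Ex. 2 purely for illustration; the variable names are hypothetical.

```python
# Frequencies of the marks in Ex. 2 (value -> class frequency).
freqs = {10: 3, 15: 3, 20: 5, 25: 3, 30: 2, 40: 4}

total = sum(freqs.values())  # total frequency
relative = {value: f / total for value, f in freqs.items()}

for value, rel in relative.items():
    print(f"{value:>3}  {rel:.2f}")
```

Relative frequencies always add up to 1, so they can be read as the share of observations in each class.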
Ex. 6: The following data gives the frequency distribution of wages (in
hundreds of rupees) of 60 workers. Find the relative frequencies.
Solution
Types of Classification of frequency distribution: There are two types of
classification of frequency distribution: the inclusive type and the
exclusive type.
(a) Inclusive type of frequency distribution: In this type of
distribution the upper limit of a class is not the same as the lower limit of
the succeeding class, and both limits belong to the class. There is a gap
between consecutive classes.
e.g.
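A minimal sketch of an inclusive classification, using the weekly wages given in Exercise 12 at the end of this chapter with classes 10 - 19, 20 - 29, and so on. The helper `inclusive_frequency_table` is hypothetical and assumes whole-number data; classes with no observations are simply omitted.

```python
from collections import Counter

# Weekly wages (rupees) of the 50 workers in Exercise 12.
wages = [58, 65, 47, 71, 67, 41, 55, 28, 27, 59,
         34, 16, 37, 68, 68, 73, 36, 56, 59, 33,
         28, 20, 56, 17, 50, 39, 20, 13, 18, 70,
         18, 30, 18, 10, 56, 28, 13, 36, 60, 58,
         61, 66, 69, 81, 60, 12, 37, 12, 45, 48]


def inclusive_frequency_table(values, start=10, width=10):
    """Count whole-number values in inclusive classes start..start+width-1, etc."""
    # Map every value to the lower limit of the class that contains it.
    lower_limits = Counter(start + ((v - start) // width) * width for v in values)
    return [(f"{lower} - {lower + width - 1}", lower_limits[lower])
            for lower in sorted(lower_limits)]


table = inclusive_frequency_table(wages)
for label, freq in table:
    print(f"{label:>8}  {freq}")
```

Because the upper limit of one class (19) differs from the lower limit of the next (20), every whole-number wage falls in exactly one class.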
Ex. 8: Draw a Histogram to represent the following frequency distribution.
Solution
Ex. 16: Draw a Simple Bar Diagram for the following data.
Solution
(b) Sub-divided Bar Diagram: In a sub-divided bar diagram, each bar
represents the total magnitude of a given phenomenon and is divided into
the same number of parts, one for each component. Each
component occupies a part of the bar proportional to its share in the total.
Ex. 18: Draw a Sub Divided Bar Diagram for the following data.
Solution
Ex. 17: Draw a Sub Divided Bar Diagram for the following data.
Solution
Ex. 19: Represent the following data by a Percentage Bar Diagram.
Solution
(d) Pie Chart: Pie diagrams are commonly used in practice to show how a
total breaks down into components, usually expressed as percentages. While
constructing a pie chart, follow the procedure below.
Solution
SUMMARY
In this chapter, we have studied the definitions of Statistics, the importance of
Statistics in business, the organisation of data using frequency distributions, and
the types of frequency distribution. The chapter also elaborates various graphical
representation methods such as histogram, frequency polygon, frequency
curves and cumulative frequency curves, and diagrammatic representation methods
such as simple bar diagram, sub-divided bar diagram, percentage bar diagram and
pie chart.
REFERENCES
1. "Definition of Statistics". www.merriam-webster.com. Retrieved 2016-05-28.
Accessed via Wikipedia, 3 Nov 2017, 4.30 pm.
2. "Essay on Statistics: Meaning and Definition of Statistics". Economics
Discussion. 2014-12-02. Retrieved 2016-05-28. Accessed via Wikipedia,
3 Nov 2017, 4.30 pm.
3. S.P. Gupta, “Statistical Methods.” Sultan Chand and Sons, Thirteenth
Revised Edition, 2001.
4. M.G. Dhaygude, “Statistical and Quantitative Methods.” Everest
Publishing House, 2004. Print.
EXERCISE
1. Define Statistics and explain its importance in Business.
2. “Statistics play an important role not only in production but also in
human resource management”. Explain this statement with appropriate
examples.
3. “Statistics is an effective tool in marketing”. Comment on the
statement.
4. “Successful marketing managers are those who are well equipped with
statistical tools.” Critically examine the statement and justify your
stand in favour of as well as against the statement.
5. What is Statistics? Explain how Statistics is useful in decision making.
6. “Graphs and diagrams are more effective than other methods of
presenting data.” Justify your answer.
7. What is a Histogram? How do you construct it? Explain with a suitable
example.
8. “Pie-diagrams are useful for representing percentages.” Illustrate
your answer with a suitable example.
9. Write a note on Cumulative frequency distribution.
10. Draw a Histogram and frequency polygon for the following data.
11. Construct a frequency table for the following data regarding annual
profits in thousands of rupees of 40 firms. Take classes as 10 – 20,
20 – 30…
11, 37, 25, 21, 60, 11, 43, 22, 25, 53
33, 11, 31, 65, 68, 73, 16, 16, 86, 74
28, 20, 56, 27, 50, 39, 20, 13, 18, 70
18, 70, 18, 10, 56, 28, 13, 32, 10, 88
12. Given below are the weekly wages in rupees of 50 workers in a factory.
Prepare a frequency distribution table with classes as 10 – 19, 20 – 29, .....
58, 65, 47, 71, 67, 41, 55, 28, 27, 59
34, 16, 37, 68, 68, 73, 36, 56, 59, 33
28, 20, 56, 17, 50, 39, 20, 13, 18, 70
18, 30, 18, 10, 56, 28, 13, 36, 60, 58
61, 66, 69, 81, 60, 12, 37, 12, 45, 48
13. Draw a Histogram and frequency polygon for the following data.
*****
MODULE - 2
MEASURES OF CENTRAL TENDENCY
AND DISPERSION
CHAPTER OBJECTIVES
1. To determine the central tendency and dispersion for Individual and
Ungrouped data.
2. To determine the central tendency and dispersion for Grouped data.
CONTENTS
A measures of central tendency
2.1: arithmetic mean,
2.2: median,
2.3: mode
2.4: partition values: quartiles, deciles and percentiles
B measures of dispersion:
2.5: range,
2.6: mean deviation,
2.7: variance and standard deviation,
2.8: coefficient of variation
KEY WORDS
Central tendency, Dispersion, Partition values
2.2 MEDIAN
Median is defined as the middle item of all given observations arranged in
order. For ungrouped data with an odd number of observations, the median is
simply the middle value. When the number of observations is even, the median
is obtained by taking the average of the two middle values.
Example 3: The median of the ungrouped data
20, 18, 15, 15, 14, 12, 11, 9, 7, 6, 4, 1 is
Median = (12 + 11) / 2
Median = 11.5
For grouped data, the median can be found by first identifying the class
containing the median and then applying the following formula:
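As a hedged illustration, the sketch below uses the usual textbook interpolation, Median = L + ((N/2 − cf) / f) × h, where L is the lower boundary of the median class, N the total frequency, cf the cumulative frequency before the median class, f its frequency and h the class width. These symbols are assumptions, since the chapter's own notation is not shown here. The data are the downtime distribution from Exercise 22 of this chapter, and the 0.5 continuity correction assumes values rounded to the nearest minute.

```python
# Downtime distribution from Exercise 22 (inclusive classes, minutes).
classes = [(0, 9), (10, 19), (20, 29), (30, 39), (40, 49),
           (50, 59), (60, 69), (70, 79), (80, 89)]
freqs = [3, 13, 30, 25, 14, 8, 4, 2, 1]


def grouped_median(classes, freqs):
    """Interpolated median L + ((N/2 - cf) / f) * h for inclusive integer classes."""
    n = sum(freqs)
    if n == 0:
        raise ValueError("empty distribution")
    cumulative = 0  # cumulative frequency before the current class
    for (lower, upper), f in zip(classes, freqs):
        if cumulative + f >= n / 2:
            boundary = lower - 0.5      # lower class boundary (assumed correction)
            width = upper - lower + 1   # class width for inclusive limits
            return boundary + (n / 2 - cumulative) / f * width
        cumulative += f


median_downtime = grouped_median(classes, freqs)
print(round(median_downtime, 2))
```

Here the median class is 30 - 39, because the cumulative frequency first reaches N/2 = 50 in that class, and the result is about 31.1 minutes.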
2.3 MODE
Mode is the value which occurs most frequently. The mode may not exist,
and even if it does, it may not be unique.
For ungrouped data, we simply find the value with the largest
frequency. If all values have the same frequency, no mode exists. If more than
one value has the same largest frequency, then the mode is not unique.
Example 4: The mode of the data in Example 3 is 15 (unimodal).
Example 5: Calculate the mode for the following data.
{2, 2, 2, 4, 5, 6, 7, 7, 7}
Mode = 2 or 7 (Bimodal)
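Examples 3 to 5 can be checked with Python's standard `statistics` module. This is only a verification sketch; `multimode` requires Python 3.8 or later, and it returns every value when all frequencies are equal, whereas the text above treats that case as having no mode.

```python
from statistics import median, multimode

example_3 = [20, 18, 15, 15, 14, 12, 11, 9, 7, 6, 4, 1]
example_5 = [2, 2, 2, 4, 5, 6, 7, 7, 7]

median_3 = median(example_3)    # average of the two middle values, 11 and 12
modes_3 = multimode(example_3)  # single most frequent value
modes_5 = multimode(example_5)  # two values tie for the largest frequency

print(median_3)  # 11.5
print(modes_3)   # [15]
print(modes_5)   # [2, 7]
```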
For grouped data, the mode can be found by first identifying the class with
the largest frequency, called the modal class, and then applying the following
formula to the modal class:
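A sketch under stated assumptions: one widely used textbook form is Mode = L + ((f1 − f0) / (2·f1 − f0 − f2)) × h, where f1 is the frequency of the modal class, f0 and f2 are the frequencies of the classes before and after it, L is its lower boundary and h its width. The chapter's own formula may differ; the data are again the downtime distribution from Exercise 22.

```python
# Downtime distribution from Exercise 22 (inclusive classes, minutes).
classes = [(0, 9), (10, 19), (20, 29), (30, 39), (40, 49),
           (50, 59), (60, 69), (70, 79), (80, 89)]
freqs = [3, 13, 30, 25, 14, 8, 4, 2, 1]


def grouped_mode(classes, freqs):
    """Mode by L + (f1 - f0) / (2*f1 - f0 - f2) * h (assumed textbook form)."""
    i = max(range(len(freqs)), key=freqs.__getitem__)  # index of the modal class
    f1 = freqs[i]
    f0 = freqs[i - 1] if i > 0 else 0                  # class before the modal class
    f2 = freqs[i + 1] if i + 1 < len(freqs) else 0     # class after the modal class
    lower, upper = classes[i]
    boundary = lower - 0.5                             # lower class boundary
    width = upper - lower + 1
    return boundary + (f1 - f0) / (2 * f1 - f0 - f2) * width


modal_downtime = grouped_mode(classes, freqs)
print(round(modal_downtime, 2))
```

The modal class is 20 - 29 (frequency 30), and the interpolated mode is roughly 27.2 minutes.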
Note that the mode is independent of extreme values and it may be applied
to qualitative data.
For symmetrically distributed data, the mean, median and mode can be used
almost interchangeably.
For moderately skewed distributions, their relationship can be given by
\[ \text{Mean} - \text{Mode} \approx 3\,(\text{Mean} - \text{Median}) \]
Physically, the mean can be interpreted as the center of gravity of the
distribution. The median divides the area of the distribution into two equal
parts and the mode is the highest point of the distribution.
B. MEASURES OF DISPERSION
Here are some definitions of Dispersion given by statisticians from time
to time.
“Dispersion is the measure of variation of the item” - A.L. Bowley.
“The degree to which the numerical data tend to spread about an average
value is called the variation or dispersion of the data.” – Spiegel.
The measures of Dispersion are range, mean absolute deviation, variance,
standard deviation and coefficient of variation.
2.5 RANGE
Range is the difference between the two extreme values. The range is easy to
calculate but cannot be obtained if open-ended grouped data are given. The
value (Q3 - Q1) / 2 is known as the Quartile Deviation, QD, or the
semi-interquartile range.
Mean absolute deviation is the mean of the absolute values of all deviations
from the mean. Therefore, it takes every item into account. Mathematically it
is given as:
\[ \text{MAD} = \frac{1}{n} \sum_{i=1}^{n} \lvert x_i - \bar{x} \rvert \]
Example 7
Skewness
Skewness describes how the data are piled up, that is, how asymmetric the
distribution is. A number of measures have been suggested to determine the
skewness of a given distribution. One of the simplest is known as Pearson’s
measure of skewness:
If the tail is on the right, we say that the distribution is skewed to the
right, and the coefficient of skewness is positive.
If the tail is on the left, we say that it is skewed to the left, and the
coefficient of skewness is negative.
Example 12: The salary scales of the two companies are given below.
Company A: $5,000 $15,000 $25,000 $35,000 $45,000 $55,000
Company B: $5,000 $5,000 $5,000 $55,000 $55,000 $55,000
Calculate: (i) Range, Mean Absolute Deviation, Variance
(ii) Standard deviation, Coefficient of Variation, Skewness
(i) Range
Company A: $55,000 − $5,000 = $50,000
Company B: $55,000 − $5,000 = $50,000
(iii) Variance
Company A:
\[ \frac{(5{,}000-30{,}000)^2 + (15{,}000-30{,}000)^2 + (25{,}000-30{,}000)^2 + (35{,}000-30{,}000)^2 + (45{,}000-30{,}000)^2 + (55{,}000-30{,}000)^2}{6} = 291{,}666{,}667\ \$^2 \]
Company B:
\[ \frac{(5{,}000-30{,}000)^2 + (5{,}000-30{,}000)^2 + (5{,}000-30{,}000)^2 + (55{,}000-30{,}000)^2 + (55{,}000-30{,}000)^2 + (55{,}000-30{,}000)^2}{6} = 625{,}000{,}000\ \$^2 \]
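The remaining measures asked for in Example 12 can be computed along the same lines. The sketch below is illustrative rather than the chapter's worked solution: it uses the divisor n for the variance (as in the figures above), takes the coefficient of variation as standard deviation divided by mean, and assumes Pearson's second coefficient, 3(Mean − Median) / Standard deviation, for skewness.

```python
from math import sqrt
from statistics import mean, median

company_a = [5_000, 15_000, 25_000, 35_000, 45_000, 55_000]
company_b = [5_000, 5_000, 5_000, 55_000, 55_000, 55_000]


def dispersion_summary(data):
    """Range, mean absolute deviation, variance, SD, CV (%) and skewness."""
    m = mean(data)
    n = len(data)
    variance = sum((x - m) ** 2 for x in data) / n  # divisor n, as in the example
    sd = sqrt(variance)
    return {
        "range": max(data) - min(data),
        "mean_abs_dev": sum(abs(x - m) for x in data) / n,
        "variance": variance,
        "std_dev": sd,
        "coeff_var_pct": 100 * sd / m,
        "skewness": 3 * (m - median(data)) / sd,    # assumed: Pearson's second coefficient
    }


summary_a = dispersion_summary(company_a)
summary_b = dispersion_summary(company_b)
for name, summary in (("A", summary_a), ("B", summary_b)):
    print(name, {key: round(value, 2) for key, value in summary.items()})
```

Both companies have the same range and the same mean of $30,000, yet Company B shows the larger mean absolute deviation, variance and coefficient of variation, which is why the range alone is a weak measure of dispersion. Both salary scales are symmetric about the mean, so the skewness is zero.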
SUMMARY
In this chapter, we have studied the measures of central tendency as mean,
median and mode. The examples of individual data, ungrouped data and grouped
data are used for calculation of various central tendencies. Measures of variation
such as range, mean deviation, standard deviation, and coefficient of variation
are also studied.
REFERENCES
1. www.mypolyuweb.hk/machanck/lectnotes/c1_desc.doc
EXERCISE
1. In the following list, post a D for the situations in which statistical
techniques are used for the purpose of description and an I for those
in which the techniques are used for the purpose of inference.
(a) The price movements of 50 issues of stock are analysed to
determine whether stocks in general have gone up or down during
a certain period.
(b) A statistical table is constructed for presenting the passenger-
miles flown by various commercial airlines in the United States.
(c) The average of a group of test scores is computed so that each
score in the group can be classified as being either above or below
average.
(d) Several manufacturing firms in an industry are surveyed for the
purpose of estimating industrywide investment in capital
equipment.
4. The word “statistics” has at least three distinct meanings, depending on
the context in which it is used. It may refer to:
(i) the procedure of statistical analysis
(ii) descriptive measures of a sample
(iii) the individual measurements, or elements, that make up either a
sample or a population.
(a) When one becomes “an accident statistic” by being included in
some count of accident frequency, the term is used in the sense
of definition ____.
(b) According to the definitions in a course of study called “Business
Statistics” the term “statistics” is usually used in the sense of
definition ____.
(c) According to the definitions when such sample statistics as the
proportion of a sample in favour of a proposal and the average
age of those in the sample are determined, the term “statistics” is
being used in the sense of definition ____.
10. How does the computation of a sample variance differ from the
computation of a population variance?
(a) μ is replaced by x̄
(b) N is replaced by n − 1
(c) N is replaced by n
(d) a and c but not b
(e) a and b but not c
16. An electronically controlled automatic bulk food filler is set to fill tubs
with 60 units of cheese. A random sample of five tubs from a large
production lot shows filled weights of 60.00, 59.95, 60.05, 60.02 and
60.01 units. Find the mean and the standard deviation of these fills.
18. Find the mean, median and mode for the set of numbers
(a) 3, 5, 2, 6, 5, 9, 5, 2, 8, 6;
(b) 51.6, 48.7, 50.3, 49.5, 48.9.
20. The 1971 populations and growth rates for various regions are given
below. Find the growth rate for the world as a whole
Region Population (millions) Annual Growth Rate (%)
Europe 470 0.8
USSR 240 1.1
N. America 230 1.3
Oceania 20 2.1
Asia 2,100 2.3
Africa 350 2.6
S. America 290 2.9
21. Suppose that the annual incomes of the residents of a certain country
have a mean of $48,000 and a median of $34,000. What is the shape of
the distribution?
22. In a factory, the time during working hours in which a machine is not
operating as a result of breakage or failure is called the “downtime”.
The following distribution shows a sample of 100 downtimes of a
certain machine (rounded to the nearest minute):
Downtime Frequencies
0 - 9 3
10 - 19 13
20 - 29 30
30 - 39 25
40 - 49 14
50 - 59 8
60 - 69 4
70 - 79 2
80 - 89 1
With reference to the above distribution, calculate
(a) the mean.
(b) the standard deviation.
(c) the median.
(d) the quartiles Q1 and Q3.
(e) the deciles D1 and D9.
(f) the percentiles P5 and P95.
(g) Pearson’s first and second coefficients of skewness.
(h) the modal downtime of the distribution by the empirical formula
(using the results obtained in part (a) and part (c) only). Compare
this result with the mode obtained in part (g).
26. The National Space Agency requires that all resistors used in electronic
packages assembled for space flight have a coefficient of variation less
than 5 percent. The following resistors made by the Mary Drake
Company have been tested with results as follows :
Resistor Mean Resistance Standard Deviation
A 100 K-ohms 4 K-ohms
B 200 12
C 300 14
D 400 16
E 500 18
F 600 20
Which of the resistors meets specifications?
27. Salaries paid last year to supervisors had a mean of $25,000 with a
standard deviation of $2000. What will be the new mean and standard
deviation if all salaries are increased by $2500?
*****
MODULE - 3
CORRELATION ANALYSIS
CHAPTER OBJECTIVE
1. To familiarize the students with one of the techniques for analysing two
variables.
2. To develop the quantitative skills of the students to make them skilled at
understanding the bivariate data analysis technique.
CONTENTS
3.1 definition of correlation.
3.2 types of correlation.
3.3 Methods of studying correlation.
A) Scatter diagrams.
B) Karl Pearson’s coefficient of correlation.
C) Rank correlation.
3.4 Coefficient of determination.
KEY WORDS
Correlation, Scatter Diagram, Rank Correlation, Determination.
INTRODUCTION
If, for two quantities under study, variation in one is accompanied by
variation in the other, then we say that these quantities are correlated. The degree
of relationship between the variables is measured through correlation analysis.
Correlation is therefore an analysis of the covariation between two or more
variables. The measure of correlation is called the correlation coefficient. It
indicates the direction and degree of correlation between the variables.
1. Positive Correlation
If both the variables under study vary in the same direction i.e. if one
variable (say X) increases then the other variable (say Y) also increases
correspondingly or vice versa then there is a Positive Correlation between the
two variables.
e.g. Rainfall and yield of a crop, consumer spending and gross domestic
product (GDP), etc.
2. Negative Correlation
If variables vary in opposite direction i.e. if one variable (say X) increases
then the other variable (say Y) decreases correspondingly or vice versa then there
is a Negative Correlation between the two variables.
e.g. Price and demand. The more time a person spends at the mall purchasing
goods, the less money he has. The higher an investor's mutual fund fees, the
lower his investment returns. The more hours a person spends at the office, the
less time he has for other activities.
3. No Correlation
If change in one variable is not related to change in other variable then there
is no Correlation between two variables.
There are other types of Correlation:
• Simple Correlation
• Partial Correlation
• Multiple Correlation
• Linear Correlation
• Non-Linear Correlation
Simple Correlation: If only two variables are studied, the correlation
between them is Simple Correlation.
Partial Correlation: If more than two variables are studied, but only two
are considered at a time while the remaining variables are held constant, the
correlation is Partial Correlation.
Multiple Correlation: If more than two variables are studied simultaneously,
the correlation is Multiple Correlation.
Linear Correlation: If a unit change in one variable is accompanied by a
constant change in the other over the entire range of values, there is Linear
Correlation between the variables.
A. Scatter Diagram
It is a graphical representation of bivariate distribution.
Let there be n pairs of values representing the two variables X and Y, i.e.
(x1, y1), (x2, y2), ..., (xn, yn). Taking one of the variables on the X axis and
the other on the Y axis, we get a diagram called a Scatter Diagram. This diagram
shows the nature of the Correlation, but it cannot establish the exact degree of
Correlation.
These diagrams are classified into five different categories, which are
shown below.
LINEAR CORRELATION
The purpose of a linear correlation analysis is to determine whether there is
a relationship between two sets of variables. We may find that: 1) there is a
positive correlation, 2) there is a negative correlation, or 3) there is no correlation.
These relationships can be easily visualized by using scatter diagrams.
Positive Correlation
Notice that in this example as the heights increase, the diameters of the
trunks also tend to increase. If this were a perfect positive correlation all of the
points would fall on a straight line. The more linear the data points, the closer
the relationship between the two variables.
Negative Correlation
Notice that in this example as the number of parasites increases, the harvest
of unblemished apples decreases. If this were a perfect negative correlation all
of the points would fall on a line with a negative slope. The more linear the data
points, the more negatively correlated are the two variables.
No Correlation
r =
Solution
r =
Solution
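As a hedged illustration of Karl Pearson's coefficient of correlation (item B in the chapter contents), the sketch below uses the usual product-moment form r = Σ(x − x̄)(y − ȳ) / √(Σ(x − x̄)² · Σ(y − ȳ)²). The function name `pearson_r` and the sample values are hypothetical and are not taken from the chapter's examples.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Product-moment correlation coefficient of two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long series with at least two pairs")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))  # co-variation
    sxx = sum((x - mx) ** 2 for x in xs)                    # variation in x
    syy = sum((y - my) ** 2 for y in ys)                    # variation in y
    return sxy / sqrt(sxx * syy)


# Hypothetical paired observations, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
print(round(r, 4))       # 0.7746
print(round(r ** 2, 4))  # 0.6, the coefficient of determination
```

A value of r close to +1 or −1 indicates points lying near a straight line, while a value near 0 indicates little linear relationship.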
Ex 3: Calculate Coefficient of Correlation for the following data.
Solution
Calculate Spearman’s rank coefficient for above data and comment on
result.
Solution
Let us consider the ranks given by Judge I and Judge II be X and Y
respectively.
Ex. 5: The marks of 12 candidates in a certain test of Statistics and
Economics are as follows;
Solution
First, we must rank the data. Tied marks share the average of the ranks they occupy.
Geography                          History
Mark   Rank                        Mark   Rank
19     1                           13     ½ (1 + 2) = 1.5
18     1/3 (2 + 3 + 4) = 3         12     1/3 (3 + 4 + 5) = 4
17     ½ (5 + 6) = 5.5             11     1/3 (6 + 7 + 8) = 7
16     ½ (7 + 8) = 7.5             10     9
15     ½ (9 + 10) = 9.5            9      10
14     11                          8      11
10     12                          7      12
Let the ranks given to Statistics be X and the ranks given to Economics be Y
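The averaging of tied ranks shown above can be sketched in code. The helper names are hypothetical. The marks below are the values appearing in the first column of the ranking table, repeated according to how many ranks each occupies. The `spearman_rho` function uses the simple form 1 − 6Σd² / (n(n² − 1)) without any correction for ties, which is an assumption about the intended formula.

```python
def average_ranks(values):
    """Rank values in descending order; tied values share the mean of their positions."""
    ordered = sorted(values, reverse=True)
    positions = {}
    for position, value in enumerate(ordered, start=1):
        positions.setdefault(value, []).append(position)
    return [sum(positions[v]) / len(positions[v]) for v in values]


def spearman_rho(rank_x, rank_y):
    """Spearman's rank coefficient, simple form without tie correction (assumed)."""
    n = len(rank_x)
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Marks from the first column of the ranking table (19 once, 18 three times, ...).
first_subject = [19, 18, 18, 18, 17, 17, 16, 16, 15, 15, 14, 10]
ranks = average_ranks(first_subject)
print(ranks)
```

Once both subjects are ranked in the candidates' original order, the two rank lists can be passed to `spearman_rho`; identical rankings give 1 and exactly reversed rankings give −1.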
Coefficient of Determination = r² = Explained Variance / Total Variance
Ex. 6: Calculate coefficient of correlation, coefficient of determination from
the following data.
Solution
Here r = 0.99
r² = 0.9801
SUMMARY
In this chapter, we have studied the definitions of Correlation, types of
correlation. This chapter also elaborates methods of studying correlation such as
scatter diagrams, Karl Pearson’s coefficient of correlation, Spearman’s rank
correlation. Coefficient of determination is explained by the examples.
REFERENCES
1. https://en.oxforddictionaries.com/definition/correlation
2. "Definition of Statistics". www.merriam-webster.com. Retrieved 2016-05-28.
Accessed via Wikipedia, 3 Nov 2017, 4.30 pm.
3. "Essay on Statistics: Meaning and Definition of Statistics". Economics
Discussion. 2014-12-02. Retrieved 2016-05-28. Accessed via Wikipedia,
3 Nov 2017, 4.30 pm.
4. S.P. Gupta, “Statistical Methods.” Sultan Chand and Sons, Thirteenth
Revised Edition, 2001.
5. M.G. Dhaygude, “Statistical and Quantitative Methods.” Everest
Publishing House, 2004. Print.
EXERCISE
1. From the data given below calculate the coefficient of correlation
between marks in mathematics and marks in statistics.
*****
MODULE - 4
REGRESSION ANALYSIS
CHAPTER OBJECTIVES
1. To Distinguish between a dependent variable and an independent
variable and analyse data.
2. To Examine possible relationships between two variables.
3. To Develop simple linear regression models and use them as a
forecasting tool.
CONTENTS
4.1 Introduction
4.2 Definition of regression.
4.3 Regression lines.
4.4 Regression coefficient
4.5 Applications of regression in business.
KEY WORDS
Regression, Regression lines, Regression coefficient
4.1 INTRODUCTION
Solutions:
4. a.
b. There appears to be a positive linear relationship between x = height
and y = weight.
c. Many different straight lines can be drawn to provide a linear
approximation of the relationship between x and y; in part (d) we will
determine the equation of a straight line that “best” represents the
relationship according to the least squares criterion.
d.
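The least squares criterion mentioned in part (c) can be sketched as follows. The height and weight values are hypothetical, chosen only to illustrate the arithmetic, and the function names are illustrative. The sketch computes both regression coefficients, b_yx = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² for the line of Y on X and b_xy = Σ(x − x̄)(y − ȳ) / Σ(y − ȳ)² for the line of X on Y; both lines pass through the point (x̄, ȳ).

```python
def regression_lines(xs, ys):
    """Least-squares fit returning the means and both regression coefficients."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return {"mean_x": mx, "mean_y": my,
            "byx": sxy / sxx,   # regression coefficient of Y on X
            "bxy": sxy / syy}   # regression coefficient of X on Y


def predict_y(fit, x):
    """Estimate Y from X using the regression line of Y on X."""
    return fit["mean_y"] + fit["byx"] * (x - fit["mean_x"])


def predict_x(fit, y):
    """Estimate X from Y using the regression line of X on Y."""
    return fit["mean_x"] + fit["bxy"] * (y - fit["mean_y"])


# Hypothetical heights (cm) and weights (kg), for illustration only.
heights = [150, 160, 170, 180, 190]
weights = [52, 58, 69, 75, 86]

fit = regression_lines(heights, weights)
print(round(fit["byx"], 4), round(fit["bxy"], 4))
print(round(predict_y(fit, 175), 2))
```

The product of the two regression coefficients equals r², which links the regression lines back to the coefficient of determination from the previous module.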
1. Predictive Analytics
Predictive analytics, i.e. forecasting future opportunities and risks, is the most
prominent application of regression¹ analysis in business. Demand analysis, for
instance, predicts the number of items which a consumer will probably purchase.
However, demand is not the only dependent variable when it comes to business.
Regression analysis can go far beyond forecasting impact on direct revenue. For
example, we can forecast the number of shoppers who will pass in front of a
billboard and use that data to estimate the maximum to bid for an advertisement.
Insurance companies rely heavily on regression analysis to estimate the credit
standing of policyholders and the likely number of claims in a given period.
2. Operational Efficiency
Regression models can also be used to optimize business processes. A
factory manager, for example, can create a statistical model to understand the
impact of oven temperature on the shelf life of the cookies baked in those ovens.
In a call center, we can analyze the relationship between wait times of callers
and number of complaints. Data-driven decision making eliminates guesswork,
hypothesis and corporate politics from decision making. This improves the
business performance by highlighting the areas that have the maximum impact
on the operational efficiency and revenues.
3. Supporting Decisions
Businesses today are overloaded with data on finances, operations and
customer purchases. Increasingly, executives lean on data analytics
to make informed business decisions rather than relying on intuition and gut feel.
Regression analysis can bring a scientific angle to the management of any
business. By reducing the tremendous amount of raw data to actionable
information, regression analysis leads the way to smarter and more accurate
decisions. This does not mean that regression analysis is an end to managers'
creative thinking. Rather, the technique serves as a tool for testing a hypothesis
before diving into execution.
4. Correcting Errors
Regression is not only great for lending empirical support to management
decisions but also for identifying errors in judgment. For example, a retail store
manager may believe that extending shopping hours will greatly increase sales.
Regression analysis, however, may indicate that the increase in revenue might
not be sufficient to support the rise in operating expenses due to longer working
hours (such as additional employee labor charges). Hence, regression analysis
can provide quantitative support for decisions and prevent mistakes due to
a manager's intuition.
5. New Insights
Over time businesses have gathered a large volume of unorganized data that
has the potential to yield valuable insights. However, this data is useless without
proper analysis. Regression analysis techniques can find a relationship between
different variables by uncovering patterns that were previously unnoticed. For
example, analysis of data from point of sales systems and purchase accounts may
highlight market patterns like increase in demand on certain days of the week or
at certain times of the year. You can maintain optimal stock and personnel before
a spike in demand arises by acknowledging these insights.
SUMMARY
In this chapter, we have studied the definitions of Regression, regression
lines. This chapter also focused on regression coefficients. In this chapter
applications of regression are also discussed.
REFERENCES
1. https://en.oxforddictionaries.com/definition/correlation
2. https://www.newgenapps.com
3. "Definition of Statistics". www.merriam-webster.com. Retrieved 2016-05-28.
Accessed via Wikipedia, 3 Nov 2017, 4.30 pm.
4. "Essay on Statistics: Meaning and Definition of Statistics". Economics
Discussion. 2014-12-02. Retrieved 2016-05-28. Accessed via Wikipedia,
3 Nov 2017, 4.30 pm.
5. S.P. Gupta, “Statistical Methods.” Sultan Chand and Sons, Thirteenth
Revised Edition, 2001.
6. M.G. Dhaygude, “Statistical and Quantitative Methods.” Everest
Publishing House, 2004. Print.
EXERCISE
1. From the data given below calculate two regression lines and estimate
the marks of mathematics when marks of statistics are 50.
2. From the data given below calculate two regression lines and
(i) Estimate X when Y is 58
(ii) Estimate Y when X is 30
4. From the data given below calculate two regression lines and
(i) Estimate X when Y is 70
(ii) Estimate Y when X is 60
5. Given:
MODULE - 5
PROBABILITY AND PROBABILITY
DISTRIBUTION
CHAPTER OBJECTIVES
1. Develop problem-solving techniques needed to accurately calculate
probabilities.
2. Apply problem-solving techniques to solving real-world events.
3. Apply selected probability distributions to solve problems.
CONTENTS
5.1 Sample space, events
5.2 Experiment, random experiment
5.3 Probability
5.4 Random variable
5.5 Conditional probability
5.6 Bayes' theorem
5.7 Probability distributions- binomial distribution, poisson distribution,
normal distribution
KEY WORDS
Sample space, Events, Random variable, Conditional probability, Bayes'
theorem, Binomial, Poisson, Normal distribution.
INTRODUCTION
Probabilities are associated with experiments where the outcome is not
known in advance or cannot be predicted. For example, if you toss a coin, will
you obtain a head or a tail? If you roll a die, will you obtain 1, 2, 3, 4, 5 or 6?
Probability measures and quantifies "how likely" an event related to these types
of experiment is to happen. The value of a probability is a number between 0 and
1 inclusive. An event that cannot occur has a probability equal to 0, and an event
that is certain to occur has a probability equal to 1 (see probability scale below).
SAMPLE SPACE: The sample space is the set of all possible outcomes in
an experiment. It is denoted by S.
Examples
When a coin is tossed, S = {H, T} where H = Head and T = Tail
When a die is thrown, S = {1, 2, 3, 4, 5, 6}
When two coins are tossed, S = {HH, HT, TH, TT} where H = Head and T = Tail
If two dice are rolled, the sample space S is given by
S = { (1,1),(1,2),(1,3),(1,4),(1,5),(1,6)
(2,1),(2,2),(2,3),(2,4),(2,5),(2,6)
(3,1),(3,2),(3,3),(3,4),(3,5),(3,6)
(4,1),(4,2),(4,3),(4,4),(4,5),(4,6)
(5,1),(5,2),(5,3),(5,4),(5,5),(5,6)
(6,1),(6,2),(6,3),(6,4),(6,5),(6,6) }
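The two-dice sample space above can also be generated mechanically. The following is a minimal sketch in Python, assuming each outcome is represented as an ordered pair (first die, second die). The variable names are illustrative only.

```python
from itertools import product

faces = range(1, 7)                  # the six faces of one die
S = list(product(faces, repeat=2))   # all ordered pairs (first die, second die)

print(len(S))    # number of outcomes in the sample space
print(S[:6])     # the first row of the listing: (1, 1) ... (1, 6)
```

Representing outcomes as ordered pairs keeps (1, 2) and (2, 1) distinct, which is why the sample space has 6 × 6 outcomes.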
EVENT
An event is some specific outcome of an experiment. Any subset of a
Sample Space is an event. Events are generally denoted by capital letters A, B,
C, D etc.
Examples
When a coin is tossed, outcome of getting head or tail is an event
When a die is rolled, outcome of getting 1 or 2 or 3 or 4 or 5 or 6 is an event
A die is rolled. Let us define event E as the set of possible outcomes where
the number on the face of the die is even. Event E is given by E = {2, 4, 6}
Two coins are tossed (see the sample space above). Let us define event E as
the set of possible outcomes where the number of heads obtained is equal to
two. Event E is given by
E = {HH}
Two dice are rolled (see the sample space above). Let us define event E as
the set of possible outcomes where the sum of the numbers on the faces of the
two dice is equal to four. Event E is given by E = {(1,3), (2,2), (3,1)}
Events can be classified in various ways depending on different
characteristics.
Simple Events: Simple events are those events where only a single
experiment is carried out. Tossing a coin and rolling a die are simple events.
Simple events can be complementary to each other and independent of or
dependent on each other. In the case of simple events, we take the probability of
occurrence of single events.
Examples
Probability of getting a Head (H) when a coin is tossed
Probability of getting 1 when a die is thrown
Compound Events: Compound events are those events in which more
than one experiment occurs together. For example, rolling a die and tossing
a coin together is a compound event. The sample space of compound events
is obtained using lists, tables and tree diagrams. In the case of compound
events, we take the probability of joint occurrence of two or more events.
Example: When two coins are tossed, probability of getting a Head (H) in
the first toss and getting a Tail (T) in the second toss.
Independent Events: Events are said to be independent if the occurrence
or non-occurrence of one event does not affect the probability of occurrence
of the other. If A and B are two independent events, then the probability of
both A and B occurring is written as
P(A ∩ B) = P(A) × P(B)
Example
If we toss a coin twice then the outcomes are independent of each other.
When a coin is tossed twice, the event of getting Tail(T) in the first toss and the
event of getting Tail(T) in the second toss are independent events. This is because
the occurrence of getting Tail(T) in any toss does not influence the occurrence
of getting Tail(T) in the other toss.
Dependent Events: Dependent events are those where the occurrence of one
event affects the probability of the other event.
Example
If there are 3 red and 2 green balls in a bag and one green ball has been
taken out, then the probability of getting a green ball in the next attempt is
affected: it falls from 2/5 to 1/4.
Impossible Event: If the probability of an event is zero, it is known as an
impossible event. The empty set represents the impossible event. If the
probability of an event is 1, the event is certain to occur. Such an event is
known as a sure event.
Exhaustive Events: The exhaustive events of an experiment are all of its
possible outcomes taken together. Events which together exhaust the whole
sample space are known as exhaustive events.
Example
Sample space = {1,2,3,4,5,6}, Odd = {1, 3, 5}, Even = {2, 4, 6}
The union of odd and even events will add up to the sample space.
When a coin is tossed, we get either Head or Tail. Hence there are 2
exhaustive events.
When two coins are tossed, the possible outcomes are (H, H), (H, T), (T,
H), (T, T). Hence there are 4 (= 2^2) exhaustive events.
When a die is thrown, we get 1 or 2 or 3 or 4 or 5 or 6. Hence there are 6
exhaustive events.
When we roll a die, one event is to get an odd number and the other is to get
an even number. Both events together exhaust the whole sample space.
Equally Likely Events: Events are said to be equally likely if there is no
preference for one event over the other.
Examples
When a coin is tossed, Head (H) or Tail (T) is equally likely to occur.
When a die is thrown, all the six faces (1, 2, 3, 4, 5, 6) are equally likely
to occur.
Mutually Exclusive Events: Two or more than two events are said to be
mutually exclusive if the occurrence of one of the events excludes the occurrence
of the other.
Examples
When a coin is tossed, we get either Head or Tail. Head and Tail cannot
come simultaneously. Hence occurrence of Head and Tail are mutually
exclusive events.
When a die is rolled, we get 1 or 2 or 3 or 4 or 5 or 6. All these faces cannot
come simultaneously. Hence occurrences of faces when rolling a die are mutually
exclusive events.
Note: If A and B are mutually exclusive events, A ∩ B = ∅, where ∅
represents the empty set.
Suppose a die is thrown. Let A be the event of getting 2 or 4 or 6 and B be
the event of getting 4 or 5 or 6. Then A = {2, 4, 6} and B = {4, 5, 6}
Here A ∩ B = {4, 6} ≠ ∅. Hence A and B are not mutually exclusive events.
ALGEBRA OF EVENTS
Let A and B be two events with sample space S. Then
A ∪ B is the event that either A or B or both occur (i.e., at least one of A
or B occurs)
A ∩ B is the event that both A and B occur
A¯ is the event that A does not occur
A¯ ∩ B¯ is the event that neither A nor B occurs
Example: Suppose a die is thrown. Let A be the event of getting 2 or 4 or 6 and
B be the event of getting 4 or 5 or 6. Then
A = {2, 4, 6} and B = {4, 5, 6}, A ∪ B = {2, 4, 5, 6}, A ∩ B = {4, 6}, A¯
= {1, 3, 5}, B¯ = {1, 2, 3}, A¯ ∩ B¯ = {1, 3}
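The algebra of events in the die example maps directly onto set operations. The sketch below uses Python's built-in sets. The names `not_A`, `not_B` and `neither` are illustrative choices, not notation from the text.

```python
S = {1, 2, 3, 4, 5, 6}   # sample space for one die
A = {2, 4, 6}            # event A: getting 2 or 4 or 6
B = {4, 5, 6}            # event B: getting 4 or 5 or 6

union = A | B                 # A or B (at least one occurs)
intersection = A & B          # both A and B occur
not_A = S - A                 # complement of A
not_B = S - B                 # complement of B
neither = not_A & not_B       # neither A nor B occurs

# A and B are mutually exclusive only if their intersection is empty.
mutually_exclusive = len(intersection) == 0

print(union, intersection, not_A, not_B, neither, mutually_exclusive)
```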
5.2 EXPERIMENT
Examples
Tossing of a fair coin: When we toss a coin, the outcome will be either Head
(H) or Tail (T)
Throwing an unbiased die: Die is a small cube used in games. It has six
faces and each of the six faces shows a different number of dots from 1 to 6.
Plural of die is dice.
When a die is thrown or rolled, the outcome is the number that appears on
its upper face and it is a random integer from one to six, each value being
equally likely.
Drawing a card from a pack of shuffled cards: A pack or deck of playing
cards has 52 cards which are divided into four categories as given below
Spades (♠) Clubs (♣) Hearts (♥) Diamonds (♦)
Each of the above-mentioned categories has 13 cards: 9 cards numbered
from 2 to 10, an Ace, a King, a Queen and a Jack.
Hearts and Diamonds are red cards whereas Spades and Clubs are
black cards.
Kings, Queens and Jacks are called face cards.
Taking a ball randomly from a bag containing balls of different colours
5.3 PROBABILITY
Two dice are rolled. What is the probability that the sum on the top faces of
the two dice will be greater than 9?
Total number of outcomes possible when a die is rolled = 6
(∵ any one face out of the 6 faces)
Hence, total number of outcomes possible when two dice are rolled, n(S) = 6 × 6 = 36
E = Getting a sum greater than 9 when the two dice are rolled
= {(4, 6), (5, 5), (5, 6), (6, 4), (6, 5), (6, 6)}
Hence, n(E) = 6
P(E) = n(E) / n(S) = 6/36 = 1/6
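The counting argument above can be sketched in a few lines of Python: enumerate the sample space, keep the outcomes whose sum exceeds 9, and divide n(E) by n(S). Exact fractions are used so that the result is not affected by rounding. The variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

S = list(product(range(1, 7), repeat=2))          # all 36 outcomes for two dice
E = [outcome for outcome in S if sum(outcome) > 9]  # favorable outcomes

p_E = Fraction(len(E), len(S))                    # n(E) / n(S)
print(E)
print(len(E), len(S), p_E)
```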
GENERAL PROBABILITY
Probability is based on observations of certain events. Probability of an
event is the ratio of the number of observations of the event to the total numbers
of the observations. An experiment is a situation involving chance or probability
that leads to results called outcomes. An outcome is the result of a single trial of
an experiment. The probability of an event is the measure of the chance that the
event will occur because of an experiment.
Probability of an event A is symbolized by P(A).
The probability of an event A lies between 0 and 1, i.e. 0 ≤ P(A) ≤ 1.
PROBABILITY FORMULA
Probability is the measure of how likely an event is. An event is one or
more outcomes of an experiment. The probability formula is the ratio of the
number of favorable outcomes to the total number of possible outcomes:
P(A) = (number of favorable outcomes) / (total number of possible outcomes)
Definition
For any two events A and B, they can be classified as exclusive, exhaustive,
independent and dependent events. Based on the types of events they are, the
probability of A, P(A) and the probability of B, P(B) will have different rules of
probability for their addition, subtraction and multiplication.
For example: If in a biased coin the probability of getting a head is 0.8 then
what will be the probability of getting a tail?
Solution
The probability of getting a tail will be, 1−0.8=0.2.
Independent Probability Rules: Two events are known to be independent
events if the occurrence of one event has no effect on the probability of the
other one. If A and B are two independent events, and P(A) and P(B) are their
probabilities, then the independent probability rule of multiplication is,
Probability of both A and B occurring, P(A ∩ B) = P(A) × P(B)
Example
Getting a king from a deck of 52 cards and getting a 6 when a die is rolled
are independent events. Find the probability of both these events
happening together.
Solution
Probability of getting a king, P(A) = 4/52 = 1/13
Probability of getting a 6, P(B) = 1/6
Probability of both the events happening = P(A) × P(B) = 1/13 × 1/6 = 1/78
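As a check on the multiplication rule for this example, the sketch below enumerates every (card, die face) pair and compares the directly counted joint probability with the product of the two individual probabilities. The rank and suit labels are illustrative assumptions about how to encode a deck.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["Spades", "Clubs", "Hearts", "Diamonds"]
deck = list(product(ranks, suits))        # 52 cards as (rank, suit)
joint = list(product(deck, range(1, 7)))  # every (card, die face) pair

# Count the pairs where the card is a king and the die shows 6.
favorable = [(card, face) for card, face in joint
             if card[0] == "K" and face == 6]

p_joint = Fraction(len(favorable), len(joint))
p_king = Fraction(4, 52)
p_six = Fraction(1, 6)

print(p_joint, p_king * p_six)
```

Both values agree, which is what the multiplication rule predicts for independent events.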
Example
Suppose a variable X can take the values 1, 2, 3, or 4.
The probabilities associated with each outcome are described by the
following table:
Outcome 1 2 3 4
Probability 0.1 0.3 0.4 0.2
The probability that X is equal to 2 or 3 is the sum of the two probabilities:
P(X = 2 or X = 3) = P(X = 2) + P(X = 3) = 0.3 + 0.4 = 0.7.
Similarly, the probability that X is greater than 1 is equal to 1 - P(X = 1) =
1 - 0.1 = 0.9, by the complement rule.
This distribution may also be described by the probability histogram shown
to the right:
Example
The cumulative distribution function for the above probability distribution
is calculated as follows:
The probability that X is less than or equal to 1 is 0.1,
The probability that X is less than or equal to 2 is 0.1+0.3 = 0.4,
The probability that X is less than or equal to 3 is 0.1+0.3+0.4 = 0.8, and
The probability that X is less than or equal to 4 is 0.1+0.3+0.4+0.2 = 1.
The probability histogram for the cumulative distribution of this random
variable is shown to the right:
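The running totals in the cumulative distribution above can be reproduced with a short Python sketch. Values are rounded only to suppress floating-point noise. The list names are illustrative.

```python
from itertools import accumulate

outcomes = [1, 2, 3, 4]
probabilities = [0.1, 0.3, 0.4, 0.2]

# Running sum of the probabilities gives P(X <= x) for each outcome x.
cumulative = [round(c, 10) for c in accumulate(probabilities)]

for x, c in zip(outcomes, cumulative):
    print(f"P(X <= {x}) = {c}")
```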
Imagine a student who takes leave from school twice a week excluding
Sunday. If it is known that he will be absent from school on Tuesday, then what
are the chances that he will also take a leave on Saturday in the same week?
In such problems, the occurrence of one event affects the probability of the
following event. These cases of probability are known as conditional
probability.
Let us now define conditional probability mathematically.
Definition
The probability of occurrence of any event A when another event B in
relation to A has already occurred is known as conditional probability. It is
depicted by P(A|B) and, provided P(B) > 0, is given by
P(A|B) = P(A ∩ B) / P(B)
Example
Two dice are thrown simultaneously, and the sum of the numbers obtained
is found to be 7. What is the probability that the number 3 has appeared at
least once?
Solution
The sample space S consists of all the combinations of numbers possible
with two dice.
Therefore, S consists of 6 × 6, i.e. 36 outcomes.
Event A indicates the combinations in which 3 has appeared at least once.
Event B indicates the combinations of numbers which sum up to 7.
A = {(3,1), (3,2), (3,3), (3,4), (3,5), (3,6), (1,3), (2,3), (4,3), (5,3), (6,3)}
B = {(1,6), (2,5), (3,4), (4,3), (5,2), (6,1)}
A ∩ B = {(3,4), (4,3)}
P(B) = 6/36 and P(A ∩ B) = 2/36
P(A|B) = P(A ∩ B) / P(B) = (2/36) / (6/36) = 1/3
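The events A and B in this example can be built by enumeration, and the conditional probability then follows from the ratio P(A ∩ B) / P(B). The following is a minimal sketch using exact fractions, with illustrative variable names.

```python
from fractions import Fraction
from itertools import product

S = list(product(range(1, 7), repeat=2))        # 36 outcomes for two dice
A = {o for o in S if 3 in o}                    # 3 appears at least once
B = {o for o in S if sum(o) == 7}               # the two numbers sum to 7

p_B = Fraction(len(B), len(S))
p_A_and_B = Fraction(len(A & B), len(S))
p_A_given_B = p_A_and_B / p_B                   # P(A|B) = P(A ∩ B) / P(B)

print(sorted(A & B), p_A_given_B)
```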
Example
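Out of 50 people surveyed in a study, 35 are runners, of whom 20 are male.
What is the probability that a surveyed person is male, given that the person
is a runner?
Solution
Let A be the event that the person is a runner and B the event that the
person is male.
Probability of the person being male and a runner, P(A ∩ B) = 20/50
Probability of the person being a runner, P(A) = 35/50
Probability of a person being male if he is a runner,
P(B|A) = P(A ∩ B) / P(A) = (20/50) / (35/50) = 20/35 = 4/7 ≈ 0.57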
Example
The probability that it is raining and the day is Sunday is 0.07. If today is
Sunday, then find the probability of rain today.
Solution
Probability that it is raining and the day is Sunday, P(A ∩ B) = 0.07
Probability that it is Sunday, P(B) = 1/7
Probability that it will rain if today is Sunday,
P(A|B) = P(A ∩ B) / P(B) = 0.07 / (1/7) = 0.49
Hence, the conditional probability of rain given that it is Sunday is 0.49.
Example
In a school the third language must be chosen between Hindi and French.
If a student has taken French, then what is the probability that he will take
Hindi, if the probability of taking French is 0.34?
Solution
Let A be the event of taking Hindi and B the event of taking French.
Probability of taking French and Hindi, P(A ∩ B) = 0 as they are mutually
exclusive events.
Probability of taking French, P(B) = 0.34
Probability of taking Hindi given French, P(A|B) = P(A ∩ B) / P(B) = 0 / 0.34 = 0
BINOMIAL DISTRIBUTION
Consider the toss in cricket. Suppose that you won the toss today and this
indicates a successful event. You toss again but you lose this time. If you win a
toss today, this does not necessitate that you will win the toss tomorrow. Let's
assign a random variable, say X, to the number of times you won the toss. What
can be the possible value of X? It can be any number from 0 up to the number
of times you tossed a coin.
There are only two possible outcomes: head denoting success and tail
denoting failure. Therefore, the probability of success is p = 0.5 and the
probability of failure can be easily computed as q = 1 − p = 0.5.
A distribution where only two outcomes are possible, such as success or
failure, gain or loss, win or lose and where the probability of success and failure
is same for all the trials is called a Binomial Distribution.
The outcomes need not be equally likely. For example, if the probability of
success in an experiment is 0.2, then the probability of failure can be easily
computed as q = 1 − 0.2 = 0.8.
Each trial is independent since the outcome of the previous toss doesn’t
determine or affect the outcome of the current toss. An experiment with only two
possible outcomes repeated n number of times is called binomial. The parameters
of a binomial distribution are n and p where n is the total number of trials and p
is the probability of success in each trial.
Based on the above explanation, the properties of a Binomial Distribution are
Each trial is independent.
There are only two possible outcomes in a trial- either a success or a failure.
A total number of n identical trials are conducted.
The probability of success and failure is same for all trials. (Trials are
identical.)
The mathematical representation of binomial distribution is given by:
P(X = x) = [n! / (x! (n − x)!)] p^x q^(n − x), for x = 0, 1, 2, ..., n
[Figure: binomial distribution graph where the probability of success does not
equal the probability of failure]
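The binomial probabilities for n trials with success probability p can be tabulated with a short Python sketch. It assumes the standard binomial formula P(X = x) = C(n, x) p^x q^(n − x). The function name `binomial_pmf` and the choice n = 5, p = 0.2 are illustrative.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    q = 1 - p                               # probability of failure
    return comb(n, x) * p**x * q**(n - x)

n, p = 5, 0.2   # illustrative values; p = 0.2 gives q = 0.8 as in the text
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]

for x, prob in enumerate(pmf):
    print(x, round(prob, 4))
print("total:", round(sum(pmf), 10))        # the probabilities add up to 1
```

Because p is not equal to q here, the tabulated probabilities are not symmetric about the middle value of x.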
NORMAL DISTRIBUTION
Normal distribution represents the behavior of many situations in the
universe (That is why it's called a "normal" distribution. I guess!). The sum
of a large number of (small) random variables often turns out to be normally
distributed, contributing to its widespread application. Any distribution is
known as Normal distribution if it has the following characteristics:
The mean, median and mode of the distribution coincide.
The curve of the distribution is bell-shaped and symmetrical about the line x=μ.
The total area under the curve is 1.
Exactly half of the values are to the left of the center and the other half to
the right.
A normal distribution is highly different from Binomial Distribution.
However, if the number of trials approaches infinity then the shapes will be quite
similar.
The PDF of a random variable X following a normal distribution is given by:
f(x) = [1 / (σ √(2π))] e^(−(x − µ)^2 / (2σ^2)), for −∞ < x < ∞
The mean and variance of a random variable X which is said to be normally
distributed are given by:
Mean -> E(X) = µ
Variance -> Var(X) = σ^2
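The characteristics listed above (symmetry about x = µ and half of the area on each side of the center) can be checked numerically. The sketch below assumes the standard normal density and expresses the cumulative probability through the error function `math.erf`. The function names and the values µ = 10, σ = 2 are illustrative.

```python
from math import erf, exp, pi, sqrt

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std. deviation sigma."""
    return exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))

def normal_cdf(x, mu, sigma):
    """P(X <= x), written in terms of the error function."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

mu, sigma = 10, 2   # illustrative parameters

print(normal_pdf(mu - 1, mu, sigma), normal_pdf(mu + 1, mu, sigma))  # symmetric
print(normal_cdf(mu, mu, sigma))                                     # half the area
```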
POISSON DISTRIBUTION
Suppose you work at a call center. Approximately how many calls do you
get in a day? It can be any number. The total number of calls at a call center
in a day can be modeled by a Poisson distribution. Some more examples are
The number of emergency calls recorded at a hospital in a day.
The number of thefts reported in an area on a day.
The number of customers arriving at a salon in an hour.
The number of suicides reported in a particular city.
The number of printing errors on each page of a book.
You can now think of many examples following the same course. Poisson
Distribution is applicable in situations where events occur at random points of
time and space wherein our interest lies only in the number of occurrences of
the event.
A distribution is called Poisson distribution when the following assumptions
are valid:
1. Any successful event should not influence the outcome of another
successful event.
2. The probability of success over a short interval must equal the
probability of success over a longer interval.
3. The probability of success in an interval approaches zero as the interval
becomes smaller.
Now, if any distribution validates the above assumptions then it is a Poisson
distribution. Some notations used in Poisson distribution are:
λ is the rate at which an event occurs,
t is the length of a time interval,
And X is the number of events in that time interval.
Here, X is called a Poisson Random Variable and the probability distribution
of X is called Poisson distribution.
Let µ denote the mean number of events in an interval of length t. Then, µ = λ*t.
The PMF of X following a Poisson distribution is given by:
P(X = x) = e^(−µ) µ^x / x!, for x = 0, 1, 2, ...
The graph shown below illustrates the shift in the curve due to increase in mean.
It is perceptible that as the mean increases, the curve shifts to the right.
The mean and variance of X following a Poisson distribution:
Mean -> E(X) = µ
Variance -> Var(X) = µ
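The statement that the mean and the variance of a Poisson variable are both equal to µ can be checked numerically. The sketch below assumes the standard Poisson formula P(X = x) = e^(−µ) µ^x / x! and truncates the infinite sum at x = 59, where the remaining terms are negligible for the chosen mean. The rate λ = 2 and interval length t = 3 are illustrative values.

```python
from math import exp, factorial

def poisson_pmf(x, mu):
    """Probability of exactly x events when the mean number of events is mu."""
    return exp(-mu) * mu**x / factorial(x)

rate, t = 2, 3          # illustrative: λ = 2 events per unit time, t = 3 units
mu = rate * t           # mean number of events in the interval, µ = λ*t

xs = range(60)          # truncation point; the tail beyond it is negligible here
probs = [poisson_pmf(x, mu) for x in xs]

mean = sum(x * p for x, p in zip(xs, probs))
variance = sum((x - mean) ** 2 * p for x, p in zip(xs, probs))

print(round(sum(probs), 6), round(mean, 6), round(variance, 6))
```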
Relation between Poisson and Binomial Distribution
Poisson Distribution is a limiting case of binomial distribution under the
following conditions:
The number of trials is indefinitely large or n → ∞.
The probability of success for each trial is same and indefinitely small or p →0.
np = λ, is finite.
Relation between Normal and Binomial Distribution & Normal and Poisson
Distribution:
Normal distribution is another limiting form of binomial distribution under
the following conditions:
The number of trials is indefinitely large, n → ∞.
Both p and q are not indefinitely small.
The normal distribution is also a limiting case of Poisson distribution with
the parameter λ → ∞.
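The limiting relation between the binomial and Poisson distributions can be illustrated numerically: hold np = λ fixed, let n grow, and compare the binomial probability of x successes with the Poisson probability of x events. This is a sketch with illustrative values λ = 2 and x = 3, not a proof.

```python
from math import comb, exp, factorial

lam = 2.0   # fixed value of n * p (illustrative)
x = 3       # number of successes / events to compare (illustrative)

poisson = exp(-lam) * lam**x / factorial(x)

errors = []
for n in (10, 100, 1000, 10000):
    p = lam / n                                   # p shrinks as n grows
    binom = comb(n, x) * p**x * (1 - p)**(n - x)
    errors.append(abs(binom - poisson))
    print(n, round(binom, 5), round(poisson, 5))
```

The gap between the two probabilities shrinks as n increases, in line with the conditions n → ∞, p → 0 and np = λ finite.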
Solution
If a coin is tossed, either 0 tail or 1 tail can be obtained.
Probability of getting zero tail, P(0) = 1/2
Probability of getting one tail, P(1) = 1/2
The probability distribution table will be,
Number of tails (X)   0     1
P(X)                  1/2   1/2
Example 2
If two coins are tossed simultaneously, find the probability distribution of
getting head.
Solution
If two coins are being tossed, there can be either 0 heads, 1 head, or 2 heads.
Sample space, S = {HH, HT, TH, TT}
The probability distribution of getting heads can be shown as:
Number of heads (X)   0     1     2
P(X)                  1/4   1/2   1/4
Example 3
In a class of 100 students, 80 students passed in all subjects, 10 failed in
one subject, 7 failed in two subjects and 3 failed in three subjects. Find the
probability distribution of the variable for number of subjects a student from the
given class has failed in.
Solution
For a random student,
The probability of failing in 0 subjects, P(X=0) = 0.8
The probability of failing in 1 subjects, P(X=1) = 0.1
The probability of failing in 2 subjects, P(X=2) = 0.07
The probability of failing in 3 subjects, P(X=3) = 0.03
The probability distribution can be shown as:
Number of subjects failed (X)   0     1     2      3
P(X)                            0.8   0.1   0.07   0.03
Example 4
A die having 6 sides has two dots on 3 sides, four dots on 2 sides and six
dots on 1 side. Find the probability distribution of the number obtained on
rolling the die.
Solution
On rolling the given die, any one of the three numbers, 2, 4, and 6 will come.
Probability of getting 2 dots, P(X = 2) = 3/6 = 1/2
Probability of getting 4 dots, P(X = 4) = 2/6 = 1/3
Probability of getting 6 dots, P(X = 6) = 1/6
The probability distribution can be shown as,
Number of dots (X)   2     4     6
P(X)                 1/2   1/3   1/6
Example 5
Three coins are being tossed, find the probability distribution of getting any
number of tails.
Solution
Sample space, S = {HHH, HHT, HTH, THH, HTT, THT, TTH, TTT}
Hence, in tossing three coins number of tails can be 0, 1, 2, 3.
Probability of getting 0 tails, P(X=0) = 1/8
Probability of getting 1 tails, P(X=1) = 3/8
Probability of getting 2 tails, P(X=2) = 3/8
Probability of getting 3 tails, P(X=3) = 1/8
Probability distribution can be given as:
Number of tails (X)   0     1     2     3
P(X)                  1/8   3/8   3/8   1/8
Example 6
The probability of winning a match for team A is 0.6. Find the
probability of winning 3 matches out of 5.
Solution
Probability of winning, p = 0.6
Probability of losing, q = 0.4
Probability of winning 3 matches out of 5, i.e. P(x = 3)
P(x = 3) = [5! / (3! 2!)] × (0.6)^3 × (0.4)^2
= 10 × 0.216 × 0.16
= 0.3456
Hence, the probability is 0.3456
Example 7
If a committee has 7 members, find the probability of having more female
members than male members given that the probability of having a male or a
female member is equal.
Solution
The probability of having a female member = 0.50
The probability of having a male member = 0.50
To have more female members, the number of females should be greater
than or equal to 4.
P(X ≥ 4) = P(4) + P(5) + P(6) + P(7)
= C(7,4)(0.5)^4(0.5)^3 + C(7,5)(0.5)^5(0.5)^2 + C(7,6)(0.5)^6(0.5)^1 + C(7,7)(0.5)^7(0.5)^0
= (0.5)^7 × (C(7,4) + C(7,5) + C(7,6) + C(7,7))
= 0.0078125 × 64
= 0.5
The probability is 0.5
Example 8
Aren is taking part in four competitions. If the probability of him winning
any competition is 0.3, find the probability of him winning at least one
competition.
Solution
Probability of winning at least one competition will be the complement of
the probability of winning not a single competition.
P(X = 0) = C(4,0)(0.3)^0(0.7)^4
= 1 × 1 × 0.2401
= 0.2401
Hence, P(X ≥ 1) = 1 − P(X = 0) = 1 − 0.2401 = 0.7599
Example 9
If a coin is tossed thrice, find the probability of getting a head at least two times.
Solution
The probability of getting head at least two times is the sum of probabilities
of getting head two times and three times.
P(X ≥ 2) = P(X = 2) + P(X = 3)
= C(3,2)(0.5)^2(0.5)^1 + C(3,3)(0.5)^3(0.5)^0
= 3 × 0.125 + 1 × 0.125
= 0.5
Hence, the needed probability is 0.50.
Example 10
If only 5 percent of kids can secure an A grade in a paper, find the probability
of at most 2 out of 10 kids getting an A grade in that paper.
Solution
Probability of securing grade A = 0.05
Probability of at most 2 kids getting grade A = P(X=0) + P(X=1) + P(X=2)
The required probability =
C(10,0)(0.05)^0(0.95)^10 + C(10,1)(0.05)^1(0.95)^9 + C(10,2)(0.05)^2(0.95)^8
= 0.5987 + 0.3151 + 0.0746
= 0.9884
Hence, the probability needed is 0.9884
Example 11
The mean number of occurrences of an event X is 2 per day. Find the
probability that event X occurs thrice in a day.
Solution
Mean, m = 2
Probability of the event to occur thrice, P(3;2) = e^(−2) × 2^3 / 3! = 0.1804
Example 12
A shop sells five shirts every day on average. What is the probability of
selling three shirts today?
Solution
Mean value for 1 day, m = 5
Probability of selling 3 shirts, P(3;5) = e^(−5) × 5^3 / 3!
= 0.006738 × 125 / 6
= 0.1404
Hence, the probability of selling three shirts is 0.1404 when on average
5 shirts are being sold each day.
Example 13
If three persons, on an average, come to ABC company for a job interview
each day, then find the probability that fewer than three people come for
interview on a given day.
Solution
The mean for Poisson random variable, m = 3
P(x < 3; 3) = P(0;3) + P(1;3) + P(2;3)
P(0;3) = e^(−3) × 3^0 / 0!
= 0.0498
P(1;3) = e^(−3) × 3^1 / 1!
= 0.1494
P(2;3) = e^(−3) × 3^2 / 2!
= 0.2240
Hence,
P(x < 3; 3) = P(0;3) + P(1;3) + P(2;3)
= 0.0498 + 0.1494 + 0.2240
= 0.4232
The probability of fewer than three persons coming for interview on a certain
day is 0.4232
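A cumulative Poisson probability such as the one in Example 13 is just a sum of individual Poisson terms. The sketch below recomputes it to check the arithmetic, assuming the standard formula P(x; m) = e^(−m) m^x / x!. The function name is illustrative.

```python
from math import exp, factorial

def poisson_pmf(x, m):
    """P(x; m): probability of exactly x events when the mean is m."""
    return exp(-m) * m**x / factorial(x)

m = 3
terms = [poisson_pmf(x, m) for x in range(3)]   # x = 0, 1, 2
p_less_than_3 = sum(terms)

print([round(t, 4) for t in terms], round(p_less_than_3, 4))
```

The result is approximately 0.42. Small differences in the fourth decimal place can arise from rounding each term before adding.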
Example 14
Number of calls coming to the customer care center of a mobile company
per minute is a Poisson random variable with mean 5. Find the probability that
no call comes in a certain minute.
Solution
The mean value, m = 5
We need to find the probability of getting zero calls when 5 calls are known
to come every minute on average.
P(0;5) = e^(−5) × 5^0 / 0! = e^(−5) = 0.0067
Example 15
There are five students in a class and the number of students who will
participate in annual day every year is a Poisson random variable with mean 3.
What will be the probability of more than 3 students participating in annual day
this year?
Solution
Mean for Poisson random variable, m = 3
P(x > 3; 3) = P(4;3) + P(5;3)
P(4;3) = e^(−3) × 3^4 / 4!
= 0.1680
P(5;3) = e^(−3) × 3^5 / 5!
= 0.1008
Hence, P(x > 3; 3) = P(4;3) + P(5;3)
= 0.1680 + 0.1008
= 0.2688
The probability of getting more than three students participating is 0.2688
Example 16
The number of deals cracked by an agent per day is a Poisson random variable
with mean 2. Given that each day is independent of the other days, find the
probability of 2 deals being cracked on the first day and 1 deal being cracked
the next day.
Solution
The probability of getting 2 deals in a day is P(2;2) and the probability of
getting 1 deal is P(1;2).
The probability of getting 2 deals on first day and one deal on second day
= P(2;2) × P(1;2)
P(2;2) = e^(−2) × 2^2 / 2! = 0.2707
P(1;2) = e^(−2) × 2^1 / 1! = 0.2707
P(2;2) × P(1;2) = 0.2707 × 0.2707
= 0.0733
The probability that two deals are cracked on the first day and one deal is
cracked on the second day is 0.0733
Normal Distribution Examples
Normal distribution is a symmetric distribution where the single peak is at
the mean of the data. The normal distribution curve is bell shaped and the spread
of data is controlled by the standard deviation. The 68-95-99.7 rule says that 68
percent of data in a normal distribution lies within one standard deviation of
the mean, 95 percent lies within two standard deviations and 99.7 percent lies
within three standard deviations. To get a probability from a normal
distribution, we need to know the mean and the standard deviation of the given
data. Then, calculating the z-score and looking it up in the z-table gives the
needed probability.
Formula to find the z-score is,
Z = (x − m) / σ
Example 17
If mean of a given data for a random value is 81.1 and standard deviation
is 4.7, then find the probability of getting a value more than 83.
Solution
Standard deviation, σ = 4.7
Mean, m = 81.1
Value of interest, x = 83
Z-score, Z = (x − m) / σ
z = (83 − 81.1) / 4.7 = 0.4043
Looking up z = 0.40 in the z-table gives 0.6554
Hence, probability = 1 − 0.6554 = 0.3446
Example 18
The average speed of a car is 65 kmph with a standard deviation of 4. Find
the probability that the speed is less than 60 kmph.
Solution
Mean, m = 65
Standard deviation, σ = 4
Value of interest, x = 60
Z-score, Z = (x − m) / σ
z = (60 − 65) / 4
= -1.25
Looking up the z-score in the z-table, we get 0.1056
Hence, probability is 0.1056.
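The z-table lookup in Example 18 can be reproduced without a printed table, because the standard normal cumulative probability can be written in terms of the error function. The sketch below uses `math.erf` for this. The function name `standard_normal_cdf` is illustrative.

```python
from math import erf, sqrt

def standard_normal_cdf(z):
    """P(Z < z) for a standard normal variable, i.e. the z-table value."""
    return 0.5 * (1 + erf(z / sqrt(2)))

m, sigma, x = 65, 4, 60          # values from Example 18
z = (x - m) / sigma              # z-score
p = standard_normal_cdf(z)       # probability that the speed is below 60

print(z, round(p, 4))
```

For a "greater than" question, the same function can be used with the complement, 1 − P(Z < z).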
Example 19
The average score of a statistics test for a class is 85 and standard deviation
is 10. Find the probability of a random score falling between 75 and 95.
Solution
The probability of score falling between 75 and 95 can be found after
finding the respective z-scores.
For X = 75, z = (75 − 85) / 10
= -1.00
For X = 95, z = (95 − 85) / 10
= 1.00
Probability is, P(-1.00 < z < 1.00) = P(z < 1.00) - P(z < -1.00)
= 0.8413 - 0.1587 = 0.6826.
SUMMARY
In this chapter, we have studied the general definition of probability and
random experiments. Conditional probability and Bayes' theorem were also
studied. The chapter discussed probability distributions such as the normal,
Poisson and binomial distributions with different types of examples.
REFERENCES
1. Valerie J. Easton and John H. McColl's Statistics Glossary v1.1
2. https://en.wikipedia.org/wiki/Probability_distribution
3. https://www.analyticsvidhya.com/blog/2017/09/6-probability-distributions-data-science/
4. https://www.investopedia.com/terms/p/probabilitydistribution.asp#ixzz5HFYZkGKb
5. http://www.probabilityformula.org/probability-distribution-examples.html
6. http://www.stat.yale.edu/Courses/1997-98/101/ranvar.htm
7. www.analyzemath.com
EXERCISE
1. 40% of people can roll their tongue, and 10% of people surveyed had
red hair.
a) What is the probability of someone having red hair and having
the ability to roll their tongue?
b) In a random sample of six people, what is the probability that at
least five of them can roll their tongue?
2. A bridge hand is made up of 13 cards from a regular 52 card deck.
a) What is the probability that a player has three aces in his hand?
b) What is the probability that a player does not have a face card or
an ace in his hand?
4. A bag contains 8 white balls and 5 red balls. What is the probability of:
a) drawing 3 white balls with your first three attempts, without
replacement?
b) drawing 15 balls, replacing the ball each time and drawing out
exactly 9 red balls?
8. A building supply store buys 40% of its pine boards from sawmill A
and 60% from sawmill B. 7% of the boards from sawmill A and 5
percent from sawmill B have blue discoloration. If a randomly picked
board is discolored, what is the probability that it came from
sawmill A?
14. Suppose the average number of car accidents on the highway in one
day is 4. What is the probability of no car accident in one day? What
is the probability of 1 car accident in two days?
15. Suppose the average number of calls by 104 in one minute is 2. What
is the probability of 10 calls in 5 minutes?
17. A mile-runner’s times for the mile are normally distributed with a
mean of 4 min. 3 sec. (4.05 minutes when expressed in decimal
minutes) and a standard deviation of 2 seconds (0.0333... minutes,
where the dots indicate a repeating decimal).
What is the probability that on a given run, the time will be 4 minutes
or less?
18. A machine fills 24-ounce (according to the label) boxes with cereal.
The amount deposited into the box is normally distributed with a
standard deviation of 0.25 ounce. What does the mean have to be for
99.5% of the boxes to contain 24 ounces or more of cereal?
*****
MODULE - 6
ASSOCIATION OF ATTRIBUTES
CHAPTER OBJECTIVES
1. To determine the association between two or more attributes.
2. To study Yule’s coefficient of association between attributes.
CONTENTS
6.1 Introduction
6.2 Types of association
6.3 Notations, order of classes and class frequencies, two attributes and
three attributes,
6.4 Yule’s coefficient of association and interpretation.
6.5 Decision making; process of decision making, types of decision, risk,
uncertainty, criteria of decision making
KEY WORDS
Association, Attributes, Risk, Uncertainty
6.1 INTRODUCTION
Two attributes A and B are said to be associated if they are not independent
but are related in some way or the other. There are three kinds of association
that can occur between attributes:
1. Positive association
2. Negative association or disassociation
3. No association or independence.
In positive association, the presence of one attribute is accompanied by the
presence of another attribute. For example, health and hygiene are positively
associated.
In symbols, if (AB) > (A)(B) / N,
then attributes A and B are positively associated.
In negative association, the presence of one attribute say A ensures the
absence of another attribute say B or vice versa. For example, vaccination and
occurrence of disease for which vaccine is meant are negatively associated.
In symbols, if (AB) < (A)(B) / N,
then attributes A and B are negatively associated.
If two attributes are such that the presence or absence of one attribute has
nothing to do with the absence or presence of the other, they are said to be
independent or not associated. For example, honesty and boldness are independent.
In symbols, if (AB) = (A)(B) / N,
then attributes A and B are independent.
6.3 NOTATIONS
In this unit, the following symbols will be used:
(AB)₀ = (A)(B) / N, (αβ)₀ = (α)(β) / N
(αB)₀ = (α)(B) / N, (Aβ)₀ = (A)(β) / N
δ = (AB) − (AB)₀ = (AB) − (A)(B) / N
If δ = 0, then (AB) = (A)(B) / N, so A and B are independent.
If δ > 0, attributes A and B are positively associated; if δ < 0,
attributes A and B are negatively associated.
Example 1
Show whether A and B are independent, positively associated or negatively
associated in each of the following cases:
(i) N = 1000; (A) = 550; (B) = 700; (AB) = 540
(ii) (A) = 480; (AB) = 290; (α) = 585; (B) = 383
(iii) N = 1000; (A) = 500; (B) = 400; (AB) = 200
Solution
We are given:
(i) (A)(B) / N = (550)(700) / 1000 = 385 = (AB)₀
Thus (AB) = 540 > (A)(B) / N = 385.
Since (AB) > (AB)₀, A and B are positively associated.
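The same comparison can be applied to all three cases of Example 1. The sketch below is illustrative only: the function name classify_association is an assumption of this example, and for case (ii) it assumes N = (A) + (α) = 480 + 585 = 1065, since N is not given directly.

```python
def classify_association(ab: int, a: int, b: int, n: int) -> tuple[float, str]:
    """Compare the observed (AB) with the expected frequency (A)(B)/N."""
    delta = ab - a * b / n
    # Compare integers (AB)*N and (A)*(B) so the sign test is exact.
    if ab * n > a * b:
        label = "positively associated"
    elif ab * n < a * b:
        label = "negatively associated"
    else:
        label = "independent"
    return delta, label


cases = {
    "(i)": dict(ab=540, a=550, b=700, n=1000),
    "(ii)": dict(ab=290, a=480, b=383, n=480 + 585),  # assumes N = (A) + (alpha)
    "(iii)": dict(ab=200, a=500, b=400, n=1000),
}

results = {name: classify_association(**values) for name, values in cases.items()}
for name, (delta, label) in results.items():
    print(f"{name}: delta = {delta:.2f}, A and B are {label}")
```

Under these assumptions, cases (i) and (ii) come out positively associated and case (iii) comes out independent, because (500)(400) / 1000 equals the observed (AB) = 200.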
Example 2
In a certain interview, there were 150 candidates of which 80 were boys,
40 candidates were successful among them 15 were boys. Calculate Yule’s
Coefficient of Association.
Solution
Let A denote boys and α girls; let B denote successful candidates and β unsuccessful candidates.
We have N = 150, (A) = 80, (B) = 40, (AB) = 15.
We have prepared the 2 x 2 contingency table as below.
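As a rough check on Example 2, the remaining cell frequencies and the coefficient can be computed directly from the given totals. This sketch assumes the standard form of Yule's coefficient, Q = [(AB)(αβ) − (Aβ)(αB)] / [(AB)(αβ) + (Aβ)(αB)]; the function names are hypothetical and chosen only for this illustration.

```python
def ultimate_frequencies(n: int, a: int, b: int, ab: int) -> tuple[int, int, int, int]:
    """Return (AB), (A beta), (alpha B), (alpha beta) from N, (A), (B), (AB)."""
    a_beta = a - ab               # A present, B absent
    alpha_b = b - ab              # A absent, B present
    alpha_beta = n - a - b + ab   # both absent
    return ab, a_beta, alpha_b, alpha_beta


def yules_q(ab: int, a_beta: int, alpha_b: int, alpha_beta: int) -> float:
    """Yule's coefficient of association for a 2 x 2 table (standard form assumed)."""
    same = ab * alpha_beta
    cross = a_beta * alpha_b
    return (same - cross) / (same + cross)


# Example 2: 150 candidates, 80 boys, 40 successful, 15 successful boys.
cells = ultimate_frequencies(n=150, a=80, b=40, ab=15)
q = yules_q(*cells)
print(cells)
print(round(q, 4))
```

With these figures the cells are (AB) = 15, (Aβ) = 65, (αB) = 25 and (αβ) = 45, and Q is roughly −0.41, which suggests a moderate negative association between being a boy and being successful in this interview.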
Example 3
Calculate Yule’s coefficient of association for the following data:
(i) (A) = 500; (B) = 400; (AB) = 350; N = 1000
(ii) (A) = 900; (B) = 800; (AB) = 600; N = 2000
Solution
(i) We have prepared 2 x 2 contingency table as below
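The arithmetic for both parts of Example 3 can be written out as follows. This is a worked sketch that assumes the standard form of Yule's coefficient Q and derives the missing cell frequencies from the given totals; it is offered as a check, not as a reproduction of the printed table.

```latex
\begin{align*}
\text{(i)}\quad (A\beta) &= (A) - (AB) = 500 - 350 = 150, \\
(\alpha B) &= (B) - (AB) = 400 - 350 = 50, \\
(\alpha\beta) &= N - (A) - (B) + (AB) = 1000 - 500 - 400 + 350 = 450, \\
Q &= \frac{(AB)(\alpha\beta) - (A\beta)(\alpha B)}{(AB)(\alpha\beta) + (A\beta)(\alpha B)}
   = \frac{157500 - 7500}{157500 + 7500}
   = \frac{150000}{165000} \approx 0.909. \\[6pt]
\text{(ii)}\quad (A\beta) &= 900 - 600 = 300, \\
(\alpha B) &= 800 - 600 = 200, \\
(\alpha\beta) &= 2000 - 900 - 800 + 600 = 900, \\
Q &= \frac{540000 - 60000}{540000 + 60000}
   = \frac{480000}{600000} = 0.8.
\end{align*}
```

In both cases Q is close to +1, which indicates a strong positive association between A and B.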
Example 4
Investigate if there is any association between extravagance in father and
son from the following:
Extravagant sons with extravagant fathers (AB) = 600
Miser sons with extravagant fathers (αB) = 1750
Extravagant sons with miser fathers (Aβ) = 1000
Miser sons with miser fathers (αβ) = 350
Solution
We have prepared 2 x 2 contingency table as below
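A short numerical check of Example 4 is sketched below. It takes A as "extravagant son" and B as "extravagant father", with (AB) = 600, (αB) = 1750, (Aβ) = 1000 and (αβ) = 350; the α and β labels are assumed where the printed symbols are incomplete, and the standard form of Yule's coefficient is assumed as well. Because Q is unchanged when (Aβ) and (αB) are swapped, the result does not depend on that labeling.

```python
# Cell frequencies for Example 4 (A: extravagant son, B: extravagant father).
ab = 600          # extravagant sons with extravagant fathers
alpha_b = 1750    # miser sons with extravagant fathers (label assumed)
a_beta = 1000     # extravagant sons with miser fathers (label assumed)
alpha_beta = 350  # miser sons with miser fathers

n = ab + alpha_b + a_beta + alpha_beta   # total number of father-son pairs
a_total = ab + a_beta                    # (A): all extravagant sons
b_total = ab + alpha_b                   # (B): all extravagant fathers

# Expected (AB) under independence, and Yule's coefficient Q.
expected_ab = a_total * b_total / n
q = (ab * alpha_beta - a_beta * alpha_b) / (ab * alpha_beta + a_beta * alpha_b)

print(n, a_total, b_total)
print(round(expected_ab, 1), round(q, 4))
```

Here the observed (AB) = 600 is well below the expected value of about 1016, and Q is about −0.79, so on these figures extravagance in fathers and sons is negatively associated.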
When linear programming models are used, decisions are made based on
the results of the models. However, these LP models were all formulated under
the assumption that certainty exists. It is assumed that all the model coefficients,
constraint values, and solution values are known with certainty and do not vary.
Decisions are often made within an environment of risk, uncertainty, or conflict.
Decision theory is concerned with decision making under conditions of risk and
uncertainty. Game theory, which will not be discussed, is concerned with decision
making under conflict. Decision situations can be categorized into two classes:
situations where probabilities cannot be assigned to future occurrences, and
situations where probabilities can be assigned.
Then the maximum among the maximum values is 550,000, so the decision
is to construct a "large-sized facility."
Each of the highlighted values is the maximum for its alternative. The
minimum among the highlighted maxima is 250,000 for a "medium-sized
facility." The decision should be to construct a medium-sized facility.
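The mechanics of these criteria can be sketched in a few lines. The payoff table below is purely hypothetical (it is not the table used in the text, and its values are in arbitrary units), as are the state names "low demand" and "high demand"; the sketch only shows how the maximax, maximin and minimax regret choices are obtained from any payoff table.

```python
# Hypothetical payoff table: alternative -> payoff under each state of nature.
payoffs = {
    "small": {"low demand": 40, "high demand": 60},
    "medium": {"low demand": 10, "high demand": 90},
    "large": {"low demand": -50, "high demand": 150},
}
states = list(next(iter(payoffs.values())))

# Maximax: pick the alternative whose best payoff is largest.
maximax = max(payoffs, key=lambda alt: max(payoffs[alt].values()))

# Maximin: pick the alternative whose worst payoff is largest.
maximin = max(payoffs, key=lambda alt: min(payoffs[alt].values()))

# Minimax regret: regret = best payoff in that state minus the payoff received.
best_by_state = {s: max(row[s] for row in payoffs.values()) for s in states}
max_regret = {
    alt: max(best_by_state[s] - row[s] for s in states)
    for alt, row in payoffs.items()
}
minimax_regret = min(max_regret, key=max_regret.get)

print(maximax, maximin, minimax_regret)
print(max_regret)
```

With these made-up numbers the three criteria select three different alternatives, which illustrates why the choice of criterion matters.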
The Hurwicz criterion requires that, for each decision alternative, the
maximum payoff be multiplied by the coefficient of optimism, and the minimum
payoff be multiplied by the coefficient of pessimism (one minus the coefficient
of optimism). For example, assume that the coefficient of optimism is 0.6, so
that the coefficient of pessimism is 0.4. Then each of the weighted values is
computed as follows:
However, if the coefficient of optimism is 0.4, then the values are:
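The effect of the coefficient of optimism can be illustrated with a small sketch. The payoffs below are hypothetical values chosen for illustration only, not the figures from the text, and the function name hurwicz_values is likewise an assumption of this example.

```python
def hurwicz_values(payoffs: dict[str, list[float]], optimism: float) -> dict[str, float]:
    """Weight each alternative's best payoff by optimism and its worst by (1 - optimism)."""
    pessimism = 1.0 - optimism
    return {
        alt: optimism * max(row) + pessimism * min(row)
        for alt, row in payoffs.items()
    }


# Hypothetical payoffs for each alternative across the states of nature.
payoffs = {
    "small": [40, 60],
    "medium": [10, 90],
    "large": [-50, 150],
}

values_06 = hurwicz_values(payoffs, optimism=0.6)
values_04 = hurwicz_values(payoffs, optimism=0.4)
choice_06 = max(values_06, key=values_06.get)
choice_04 = max(values_04, key=values_04.get)

print(choice_06, choice_04)
```

In this made-up table the preferred alternative switches from "large" at an optimism of 0.6 to "small" at 0.4, showing how sensitive the criterion can be to the decision maker's outlook.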
The use of several decision criteria often results in a mix of decisions, with
no one decision being selected more than the others. Hence, the appropriate
criterion is dependent on the "risk" personality and philosophy of the
decision maker.
The best decision results from minimizing the regret. In this case, the
decision is a "large-sized facility." The expected value and expected opportunity
loss criteria result in the same decision. You may wonder why you need two
separate approaches to reach the same conclusion. This will be discussed in the
next section.
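The agreement between the two criteria can be seen in a small numerical sketch. The payoffs and probabilities below are hypothetical and are not the figures from the text; they are assumed only to show how expected value (EV) and expected opportunity loss (EOL) are computed.

```python
# Hypothetical payoffs (rows) and state probabilities; not taken from the text.
states = ["low demand", "high demand"]
probabilities = {"low demand": 0.3, "high demand": 0.7}
payoffs = {
    "small": {"low demand": 40, "high demand": 60},
    "medium": {"low demand": 10, "high demand": 90},
    "large": {"low demand": -50, "high demand": 150},
}

# Expected value of each alternative.
expected_value = {
    alt: sum(probabilities[s] * row[s] for s in states)
    for alt, row in payoffs.items()
}

# Opportunity loss (regret) in each state, then its expectation.
best_by_state = {s: max(row[s] for row in payoffs.values()) for s in states}
expected_loss = {
    alt: sum(probabilities[s] * (best_by_state[s] - row[s]) for s in states)
    for alt, row in payoffs.items()
}

best_by_ev = max(expected_value, key=expected_value.get)
best_by_eol = min(expected_loss, key=expected_loss.get)
print(best_by_ev, best_by_eol)
```

In this sketch both criteria pick the same alternative. For every alternative, EV plus EOL adds up to the same constant (the expected payoff when the best action is taken in each state), so maximizing one is equivalent to minimizing the other.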
SUMMARY
In this chapter, we have studied how to test for association between
attributes, using Yule’s coefficient of association. We have also demonstrated
the concepts and basics of decision making with and without probabilities,
using both a payoff table approach and a decision tree approach.
REFERENCES
1. https://xisspm.files.wordpress.com/2010/11/decision-theory.doc
2. M.G. Dhaygude, Statistical and Quantitative Methods, Himalaya
Publications.
EXERCISE
Q1: Calculate Coefficient of association for the following data.
(i) (AB) = 10, (A) = 24, (αβ) = 8, and N = 40
(ii) (AB) = 12, (A) = 21, (αβ) = 10, and N = 50
(iii) (AB) = 100, (A) = 57, (αβ) = 30, and N = 200
Q3: Given (AB) = 13, (Aβ) = 24, (αB) = 18, (αβ) = 25 and N = 80. Calculate
coefficient of association.
Q4: Given (AB) = 3150, (A) = 4500, (B) = 6000, and N = 10000. Calculate
coefficient of association.
Q5: Given (AB) = 13, (Aβ) = 20, (αB) = 15, (αβ) = 25 and N = 100.
Calculate coefficient of association.
Q6: In a sample survey of 3500 students, 1500 liked economics, 1250 liked
statistics and 500 liked both economics and statistics. Find coefficient
of association.
Q7: Given (AB) =8, (A) = 18, (αβ) = 5, and N = 35. Calculate coefficient
of association.
Q8: Given N = 160, (A) = 80, (B) = 90, (AB) = 55. Find the ultimate class
frequencies.
*****