Data Visualization & Analytics for Decision Making
PGCBM 43
Abhishek Chakraborty
Course Outline
Evaluation
Software Tools
Microsoft Excel, R
Descriptive Statistics and Data Representation
Nominal
Ordinal
Interval
Ratio
Nominal Scale
• Used for classification or categorization
• The numbers are used only to differentiate entities, not to assign values to them or make a statement about their values
Ordinal Scale
• The ordinal scale is used to rank or order entities
• For example, during appraisal, an employee may be graded on a scale of one to five
• The distances between consecutive numbers have no meaning
• Due to the imprecise nature of measurement, both nominal and ordinal scales are referred to as non-metric data, also called qualitative data
• Both nominal and ordinal variables can take values only from a fixed set
Interval Scale
• The distances between consecutive numbers have a meaning, and the data is always numerical
• The zero point is a matter of convention or convenience and is not fixed
• For instance, the Celsius scale, GMAT score, pH value
Ratio Scale
• The zero point is fixed and represents the absence of the characteristic being studied
• Both interval and ratio scales are referred to as metric data or quantitative data
• Examples include height, weight, total monthly sales, etc.
Nominal ✓ ✗ ✗ ✗
Ordinal ✓ ✓ ✗ ✗
Interval ✓ ✓ ✓ ✗
Ratio ✓ ✓ ✓ ✓
Frequency Distribution
• Raw data is sometimes referred to as ungrouped data
• We need to organize the ungrouped data into grouped data through a frequency distribution
• For instance, consider the NIFTY 50 stocks and their sectors
Sectors | Frequency | Relative Frequency | Percentage Frequency
Consumer Goods | 6 | 0.12 | 12.00%
Banking | 6 | 0.12 | 12.00%
Automobile | 6 | 0.12 | 12.00%
Information Technology | 5 | 0.10 | 10.00%
Financial Services | 5 | 0.10 | 10.00%
Pharmaceuticals | 4 | 0.08 | 8.00%
Metals | 4 | 0.08 | 8.00%
Energy - Oil & Gas | 3 | 0.06 | 6.00%
Cement | 3 | 0.06 | 6.00%
Energy - Power | 2 | 0.04 | 4.00%
Telecommunication | 1 | 0.02 | 2.00%
Infrastructure | 1 | 0.02 | 2.00%
Healthcare | 1 | 0.02 | 2.00%
Consumer Durables | 1 | 0.02 | 2.00%
Construction | 1 | 0.02 | 2.00%
Chemicals | 1 | 0.02 | 2.00%
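The grouping step can be sketched in Python as follows. The individual NIFTY 50 stock names are not listed here, so, as an assumption for illustration, the raw (ungrouped) sector labels are rebuilt from the counts in the table; the function then regroups them into frequency, relative frequency and percentage frequency.

```python
from collections import Counter

def frequency_table(labels):
    """Group raw labels into (category, frequency, relative freq., percentage) rows,
    ordered by descending frequency (ties keep first-seen order)."""
    counts = Counter(labels)
    n = sum(counts.values())
    rows = []
    for category, freq in counts.most_common():
        rel = freq / n
        rows.append((category, freq, rel, 100 * rel))
    return rows

# Sector counts taken from the frequency table; raw labels rebuilt from them (illustrative)
sector_counts = {
    "Consumer Goods": 6, "Banking": 6, "Automobile": 6,
    "Information Technology": 5, "Financial Services": 5,
    "Pharmaceuticals": 4, "Metals": 4, "Energy - Oil & Gas": 3, "Cement": 3,
    "Energy - Power": 2, "Telecommunication": 1, "Infrastructure": 1,
    "Healthcare": 1, "Consumer Durables": 1, "Construction": 1, "Chemicals": 1,
}
raw_labels = [sector for sector, c in sector_counts.items() for _ in range(c)]

table = frequency_table(raw_labels)
for category, freq, rel, pct in table:
    print(f"{category:<24}{freq:>4}{rel:>8.2f}{pct:>9.2f}%")
```

With real data, `raw_labels` would simply be the sector column of the 50 stocks.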
Bar Chart
[Figure: bar chart of frequency by sector for the NIFTY 50 stocks; y-axis: Frequency]
Pie Chart
[Figure: pie chart of frequency by sector; legend: Consumer Goods, Banking, Automobile, Information Technology, Financial Services, Pharmaceuticals, Metals, Energy - Oil & Gas, Cement, Energy - Power, Others]
Pie Charts
[Figure: pie chart titled "Dismissals", showing the share of each mode of dismissal: lbw, caught, run out, bowled, not out, stumped, hit wicket]
Pareto Chart
[Figure: Pareto chart of frequency by sector, with sectors ordered from most to least frequent]
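A Pareto chart is commonly drawn as bars sorted in descending order with a cumulative-percentage line on top. A minimal Python sketch of the numbers behind such a chart, reusing the sector frequencies from the frequency table:

```python
# Sector frequencies from the NIFTY 50 frequency table
sector_counts = {
    "Consumer Goods": 6, "Banking": 6, "Automobile": 6,
    "Information Technology": 5, "Financial Services": 5,
    "Pharmaceuticals": 4, "Metals": 4, "Energy - Oil & Gas": 3, "Cement": 3,
    "Energy - Power": 2, "Telecommunication": 1, "Infrastructure": 1,
    "Healthcare": 1, "Consumer Durables": 1, "Construction": 1, "Chemicals": 1,
}

def pareto_table(counts):
    """Sort categories by descending frequency and attach the cumulative percentage."""
    total = sum(counts.values())
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    rows, running = [], 0
    for category, freq in ordered:
        running += freq
        rows.append((category, freq, 100 * running / total))
    return rows

rows = pareto_table(sector_counts)
for category, freq, cum_pct in rows:
    print(f"{category:<24}{freq:>4}{cum_pct:>8.1f}%")
```

Python's `sorted` is stable, so sectors with equal frequency keep their table order.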
Data Representation
• Histogram
• Ogives and Frequency Polygons
• Dot Plots
• Stem and Leaf Plot
Histogram
[Figure: Excel histogram titled "Kohli’s Runs"; x-axis: Runs (bins 9, 19, 29, …, 189, More); y-axis: Frequency]
CAUTION!!! In Excel, histograms are represented like this, though a histogram needs to be represented as a series of contiguous rectangles.
[Figure: histogram titled "Kohli’s Runs" with a cumulative percentage line; x-axis: Runs (bins 9, 19, …, 189, More); left axis: Frequency; right axis: Cumulative %]
Frequency Polygons
[Figure: frequency polygon titled "Distribution of Kohli's runs"; x-axis: Runs Scored (class midpoints 4.5, 14.5, …, 184.5); y-axis: Frequency]
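The three charts above share the same binning. A minimal Python sketch, assuming Excel-style bins (each bin holds values above the previous upper limit and at most its own, with a final "More" bin) and using a small hypothetical list of scores, since Kohli's actual runs are not reproduced here:

```python
def histogram_with_cumulative(values, upper_limits):
    """Bin values Excel-style (assumed): a value goes to the first bin whose upper
    limit it does not exceed; anything above the last limit goes to "More"."""
    counts = [0] * (len(upper_limits) + 1)      # last slot is the "More" bin
    for v in values:
        for i, upper in enumerate(upper_limits):
            if v <= upper:
                counts[i] += 1
                break
        else:                                    # no upper limit matched
            counts[-1] += 1
    running, cumulative = 0, []
    for c in counts:
        running += c
        cumulative.append(100 * running / len(values))   # ogive: cumulative %
    return counts, cumulative

upper_limits = list(range(9, 190, 10))          # 9, 19, ..., 189 as on the Runs axis
midpoints = [u - 4.5 for u in upper_limits]     # 4.5, 14.5, ..., 184.5 for the polygon
scores = [0, 3, 12, 45, 45, 78, 100, 107, 133, 183, 201]   # hypothetical innings scores

counts, cumulative = histogram_with_cumulative(scores, upper_limits)
labels = [str(u) for u in upper_limits] + ["More"]
for label, c, cum in zip(labels, counts, cumulative):
    print(f"{label:>5} {c:>3} {cum:>7.2f}%")
```

The counts give the histogram bars, `cumulative` gives the ogive, and plotting counts against `midpoints` gives the frequency polygon.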
Stem and Leaf Plot
STEM | LEAF
0 | 0 0 0 2 2 8 9
1 | 0 0 1 2 6 8 8
2 | 5 7 8
5 | 4 4 7
6 | 3 4 8
7 | 1 9
8 | 2
9 | 1
10 | 2 5 7
11 | 8
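A stem-and-leaf display like the one above can be built with a few lines of Python. This is a sketch for non-negative integers (stem = tens, leaf = units) on hypothetical scores; as a design choice it prints empty stems too, so gaps in the data stay visible.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return display lines for non-negative integers: stem = value // 10, leaf = last digit."""
    plot = defaultdict(list)
    for v in sorted(values):                     # sorting keeps each row's leaves ordered
        stem, leaf = divmod(v, 10)
        plot[stem].append(leaf)
    lines = []
    for stem in range(min(plot), max(plot) + 1):   # include empty stems
        leaves = " ".join(str(leaf) for leaf in plot.get(stem, []))
        lines.append(f"{stem:>3} | {leaves}")
    return lines

scores = [8, 0, 25, 12, 107, 54, 63, 10, 91]   # hypothetical scores
lines = stem_and_leaf(scores)
for line in lines:
    print(line)
```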
Bivariate Analysis
Crosstabulation
We will look at two dimensions, dismissals and innings, using pivot tables
Crosstabulation
• In an IT firm, an IT engineer has been promoted to the role of HR manager. In the following year, after the appraisal of the 30 associates of the firm, he was accused by some of the associates of being partial towards those coming from the “Mainframes” skill set, as compared to the “Java/J2E” skill set, while promoting associates. One of the associates who had not been promoted registered a complaint with the CEO against the manager, stating that since the HR manager is himself from a Mainframes background, he has been biased towards associates from the Mainframes background while promoting them. In his defence, the manager attached the following table, which lists the background of each associate along with whether they have been promoted. Is there any reason to believe the claim being made?
Cross-tabulation
Employee Code | Skill Set | If Promoted | Employee Code | Skill Set | If Promoted
Emp 1 | Mainframes | Yes | Emp 16 | Java/J2E | Yes
Emp 2 | Java/J2E | No | Emp 17 | Mainframes | No
Emp 3 | Java/J2E | No | Emp 18 | Java/J2E | Yes
Emp 4 | Mainframes | No | Emp 19 | Mainframes | Yes
Emp 5 | Java/J2E | No | Emp 20 | Mainframes | No
Emp 6 | Mainframes | Yes | Emp 21 | Java/J2E | No
Emp 7 | Mainframes | Yes | Emp 22 | Java/J2E | Yes
Emp 8 | Java/J2E | No | Emp 23 | Mainframes | Yes
Emp 9 | Java/J2E | Yes | Emp 24 | Java/J2E | No
Emp 10 | Mainframes | No | Emp 25 | Java/J2E | Yes
Emp 11 | Java/J2E | Yes | Emp 26 | Mainframes | Yes
Emp 12 | Mainframes | Yes | Emp 27 | Java/J2E | No
Emp 13 | Mainframes | No | Emp 28 | Mainframes | Yes
Emp 14 | Mainframes | Yes | Emp 29 | Mainframes | No
Emp 15 | Java/J2E | No | Emp 30 | Java/J2E | Yes
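The pivot-table step can be mimicked in Python to turn the 30 rows into a 2×2 crosstabulation. In this sketch the rows are transcribed from the table as one-letter codes (M = Mainframes, J = Java/J2E; Y/N = promoted or not); pandas users could get the same table from `pandas.crosstab`.

```python
from collections import Counter

# Emp 1-10, 11-20, 21-30, transcribed from the cross-tabulation table
skill_codes = "MJJMJMMJJM" "JMMMJJMJMM" "JJMJJMJMMJ"
promoted_codes = "YNNNNYYNYN" "YYNYNYNYYN" "NYYNYYNYNY"
names = {"M": "Mainframes", "J": "Java/J2E"}

pairs = [(names[s], p == "Y") for s, p in zip(skill_codes, promoted_codes)]
crosstab = Counter(pairs)            # (skill set, promoted?) -> number of associates

for skill in ("Mainframes", "Java/J2E"):
    yes, no = crosstab[(skill, True)], crosstab[(skill, False)]
    print(f"{skill:<11} promoted={yes:>2}  not promoted={no:>2}  rate={yes / (yes + no):.3f}")
```

The table shows 9 of 15 Mainframes associates and 7 of 15 Java/J2E associates promoted; whether such a gap supports the complaint is the question the later probability material addresses.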
Scatter Plot
[Figure: scatter plot of Total Traded Quantity against Closing Price]
• Mean (Arithmetic, Geometric, Harmonic)
• Median
• Mode
Percentiles
• Procedure: Suppose we want to find the pth percentile of a dataset with n data points
• Arrange the data in ascending order
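The measures above can be computed with Python's standard `statistics` module. The remaining steps of the percentile procedure are not shown here, so the `percentile` function assumes one common convention: rank L = (p/100)(n + 1) in the sorted data, with linear interpolation between neighbouring values. As far as we know this is also what Excel's PERCENTILE.EXC uses, and the test below checks it against `statistics.quantiles(..., method="exclusive")`. The data values are hypothetical.

```python
import math
import statistics

def percentile(data, p):
    """pth percentile with rank L = (p/100)(n + 1) and linear interpolation (assumed convention)."""
    xs = sorted(data)                        # step 1: arrange in ascending order
    n = len(xs)
    rank = p / 100 * (n + 1)                 # 1-based position in the ordered data
    if rank < 1 or rank > n:
        raise ValueError("p is too extreme for this convention and sample size")
    lower = math.floor(rank)
    if lower == n:                           # rank == n: the largest value
        return xs[-1]
    frac = rank - lower
    return xs[lower - 1] + frac * (xs[lower] - xs[lower - 1])

data = [12, 15, 11, 20, 15, 18, 14, 16, 15, 30]   # hypothetical data

print("arithmetic mean:", statistics.mean(data))
print("geometric mean:", statistics.geometric_mean(data))
print("harmonic mean:", statistics.harmonic_mean(data))
print("median:", statistics.median(data), "mode:", statistics.mode(data))
print("25th, 50th, 75th percentiles:", [percentile(data, p) for p in (25, 50, 75)])
```

Other conventions (e.g. Excel's PERCENTILE.INC) give slightly different values for small samples.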
Analyzing Distributions
• Quartiles
• z-scores
• Coefficient of Skewness
• Coefficient of Kurtosis
• Coefficient of Excess Kurtosis
Measures of Dispersion
• Range
• Interquartile range
• Coefficient of Variation
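A short Python sketch of the dispersion measures on the same hypothetical data. It assumes the sample standard deviation (divisor n − 1) and the (n + 1)-rank quartile convention; other conventions shift the quartiles slightly.

```python
import statistics

data = [12, 15, 11, 20, 15, 18, 14, 16, 15, 30]   # hypothetical data

data_range = max(data) - min(data)
q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")   # quartiles
iqr = q3 - q1                                    # interquartile range
mean = statistics.mean(data)
sd = statistics.stdev(data)                      # sample standard deviation (n - 1)
cv = 100 * sd / mean                             # coefficient of variation, in %
z_scores = [(x - mean) / sd for x in data]       # distance from the mean in SD units

print("range:", data_range, "IQR:", iqr, "CV: %.1f%%" % cv)
print("z-scores:", [round(z, 2) for z in z_scores])
```

The value 30 stands out with a z-score of roughly 2.5.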
Coefficient of Skewness (Measure of Symmetry)
• Positive skewness: positive value, longer right tail, higher data concentration on the left
• Negative skewness: negative value, longer left tail, higher data concentration on the right
Coefficient of Kurtosis (Measure of Peakedness)
• Mesokurtic
• Leptokurtic
• Platykurtic
Coefficient of Excess Kurtosis
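A minimal sketch of the moment-based coefficients: skewness m3/m2^1.5 and kurtosis m4/m2², where mk is the kth central moment with divisor n. Excess kurtosis is kurtosis − 3, so a normal (mesokurtic) distribution scores 0, leptokurtic distributions score above 0, and platykurtic ones below 0. Excel's SKEW and KURT apply small-sample adjustments (KURT reports excess kurtosis), so their values will differ somewhat.

```python
def moment_skew_kurt(data):
    """Return (skewness, kurtosis, excess kurtosis) from central moments (divisor n)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    return skew, kurt, kurt - 3

symmetric = [1, 2, 3, 4, 5]            # flat and symmetric: skew 0, platykurtic
right_skewed = [1, 2, 2, 3, 3, 3, 4, 10]   # one large value stretches the right tail
print(moment_skew_kurt(symmetric))
print(moment_skew_kurt(right_skewed))
```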
Introduction to Probability
Nissan has launched the Micra, which is available in 6 different exterior colours (Olive Green, Onyx Black, Blade Silver, Brick Red, Storm White, Turquoise Blue). It is not possible for a Nissan car dealer to keep all these variants in the showroom. However, based on past experience, dealers have some idea of which variants are requested more often than others. Based on all the customer requests received in the past month, a particular car dealer found the following results, and will keep only the three most requested colours in the showroom for display.
Introduction to Probability
Car Sales
Car Shades | Sales
Olive Green | 35
Onyx Black | 48
Blade Silver | 63
Brick Red | 100
Storm White | 40
Turquoise Blue | 34
[Figure: horizontal bar chart "Car Sales" of the same figures by shade]
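The dealer's decision can be framed with relative frequencies. In this sketch the chart's "Sales" figures are assumed to be the counts of customer requests per shade; each shade's probability is its count over the total.

```python
from fractions import Fraction

requests = {"Olive Green": 35, "Onyx Black": 48, "Blade Silver": 63,
            "Brick Red": 100, "Storm White": 40, "Turquoise Blue": 34}

total = sum(requests.values())                       # 320 requests in the month
probabilities = {shade: Fraction(c, total) for shade, c in requests.items()}
top_three = sorted(requests, key=requests.get, reverse=True)[:3]

for shade in top_three:
    print(shade, probabilities[shade], round(float(probabilities[shade]), 4))
print("share covered by the top three:", float(sum(probabilities[s] for s in top_three)))
```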
Introduction to Probability
• Random Experiment
• The possible experimental outcomes are well defined and known before the experiment is conducted
• In a single trial of the experiment, one and only one of the possible experimental outcomes will occur, and we do not know the outcome of a particular trial in advance
• Sample Space
• It is the set of all experimental outcomes
Assigning Probabilities to Experimental Outcomes
• Consider a random experiment with N possible experimental outcomes
• The probability of occurrence of each outcome lies between 0 and 1, and the probabilities of occurrence of all the outcomes sum to 1
• Subjective Way
• It is an estimate that reflects a person’s opinion, or best guess, about whether an outcome will occur
• Classical Way
• When all the experimental outcomes are equally likely, we can assign a probability of 1/N to each experimental outcome
• Relative Frequency Way
• When the experiment is repeated a large number of times, the probability of a given outcome is estimated as the number of times that outcome occurs divided by the total number of repetitions
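The relative frequency way can be illustrated with a simulated fair die. In this sketch the estimate of P(six) settles near the classical value 1/6 as the number of repetitions grows; the seed is fixed only so the run is reproducible.

```python
import random

def relative_frequency(trials, seed=42):
    """Estimate P(rolling a six) as (number of sixes) / (number of rolls)."""
    rng = random.Random(seed)
    sixes = sum(1 for _ in range(trials) if rng.randint(1, 6) == 6)
    return sixes / trials

for trials in (60, 600, 60_000):
    print(trials, relative_frequency(trials))
print("classical:", 1 / 6)
```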
Introduction to Probability
Probability
Packages A B C D E
A B C D E
Introduction to Probability
A market research firm is assigned the task of finding the acceptance of a certain software solution used for trading stocks, currencies, commodities, etc. among the students specialized in different streams of XYZ University. In that university, the distribution of student enrolment among the different streams/programs is as follows:
Streams/Programs | Enrolments
Science | 350
Engineering | 450
Medical | 120
Arts | 150
Commerce | 275
Management | 180
Law | 75
Total | 1600
First, the firm wants to find out the acceptance of the software solution among the students from the management discipline. What is the probability that a student selected at random will have a specialization in management?
What is the probability that a student selected at random will have a specialization in commerce?
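Since every student is equally likely to be selected, each probability is the stream's enrolment divided by the total. A Python check using exact fractions:

```python
from fractions import Fraction

enrolments = {"Science": 350, "Engineering": 450, "Medical": 120, "Arts": 150,
              "Commerce": 275, "Management": 180, "Law": 75}

total = sum(enrolments.values())                         # 1600 students
p_management = Fraction(enrolments["Management"], total)
p_commerce = Fraction(enrolments["Commerce"], total)
print(p_management, float(p_management))   # 9/80 0.1125
print(p_commerce, float(p_commerce))       # 11/64 0.171875
```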
Introduction to Probability
The table below shows the percentage of on-time arrivals, the number of mishandled baggage reports per 1000 passengers, and the number of customer complaints per 1000 passengers for 10 airlines.
Airline | On-Time Arrivals (%) | Mishandled Baggage per 1000 passengers | Customer Complaints per 1000 passengers
Virgin America | 83.5 | 0.87 | 1.50
JetBlue | 79.1 | 1.88 | 0.79
AirTran Airways | 87.1 | 1.58 | 0.91
Delta Airlines | 86.5 | 2.10 | 0.73
Alaska Airlines | 87.5 | 2.93 | 0.51
Frontier Airlines | 77.9 | 2.22 | 1.05
Southwest Airlines | 83.1 | 3.08 | 0.25
US Airways | 85.9 | 2.14 | 1.74
American Airlines | 76.9 | 2.92 | 1.80
United Airlines | 77.4 | 3.87 | 4.24
Source: Statistics for Business and Economics by Anderson, Sweeney, Williams, Camm and Cochran, Cengage Publishers 13e
Introduction to Probability
• Revisiting the promotion example from the Crosstabulation section: given the skill set (Mainframes or Java/J2E) and promotion status of the 30 associates tabulated there, is there any reason to believe the claim that the HR manager has been biased towards associates from the Mainframes background?
Events
• Complement of an Event: For any event A, its complement consists of all those sample points which are not in A
• It is denoted by A^C
• Thus, P(A^C) = 1 − P(A)
• For instance, consider the event of getting an odd number when rolling a die
• The sample points corresponding to the event are 1, 3, and 5
• Thus, the complement of the above event is getting 2, 4, or 6
Events
• Union of Two Events: The union of two events A and B is the event consisting of all those sample points which belong to A, to B, or to both
• It is denoted by A ∪ B
• Intersection of Two Events: The intersection of two events A and B is the event consisting of all those sample points which belong to both A and B
• It is denoted by A ∩ B
• Addition Rule: P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
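Python sets make these definitions concrete. This sketch uses one roll of a fair die, with A = "odd number" and, for illustration only, B = "number greater than 3"; probabilities are assigned the classical way.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}      # one roll of a fair die
A = {1, 3, 5}                          # odd number
B = {4, 5, 6}                          # greater than 3 (illustrative event)

def prob(event):
    """Classical probability: favourable outcomes / equally likely outcomes."""
    return Fraction(len(event), len(sample_space))

union, intersection = A | B, A & B     # A ∪ B = {1, 3, 4, 5, 6}, A ∩ B = {5}
complement_A = sample_space - A        # A^C = {2, 4, 6}

assert prob(union) == prob(A) + prob(B) - prob(intersection)   # addition rule
assert prob(complement_A) == 1 - prob(A)                       # complement rule
print(prob(union), prob(intersection), prob(complement_A))
```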
Example
Solution
Events
Conditional Probability
• An IT firm wishes to analyze the data regarding the promotion of its employees over the
past 2 years. The data is presented in the following table:
Conditional Probability
• Find the probability that a randomly selected employee with the JAVA/J2E skill set is promoted
• Solution: Let A denote the event that an employee is promoted, and J and M the events that the employee has the JAVA/J2E and the Mainframes skill set respectively. We are required to find P(A|J)
• P(A|J) = 145/200 = 0.725
• Let us also find the other cases
• P(A|M) = 375/500 = 0.75, P(A^C|J) = 1 − 0.725 = 0.275, P(A^C|M) = 1 − 0.75 = 0.25
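The data table itself is not reproduced in this text, so the sketch below uses only the counts quoted in the solution (145 of 200 JAVA/J2E employees and 375 of 500 Mainframes employees promoted). It computes the conditional probabilities and checks that P(A|J) = P(A ∩ J)/P(J) gives the same value when both are taken over all 700 employees.

```python
from fractions import Fraction

# Counts quoted in the solution: promoted employees and totals by skill set
promoted = {"Java/J2E": 145, "Mainframes": 375}
total = {"Java/J2E": 200, "Mainframes": 500}

def p_promoted_given(skill):
    """P(A | skill) = count(promoted with skill) / count(skill)."""
    return Fraction(promoted[skill], total[skill])

p_A_J = p_promoted_given("Java/J2E")       # 29/40 = 0.725
p_A_M = p_promoted_given("Mainframes")     # 3/4 = 0.75

grand_total = sum(total.values())          # 700 employees
p_A_and_J = Fraction(promoted["Java/J2E"], grand_total)
p_J = Fraction(total["Java/J2E"], grand_total)
assert p_A_and_J / p_J == p_A_J            # same answer from the definition

print(float(p_A_J), float(p_A_M), float(1 - p_A_J), float(1 - p_A_M))
```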
Conditional Probability: Context of Reneging
• Article Source: https://www.benivo.com/blog/how-to-prevent-employees-from-reneging-on-a-signed-offer-and-pulling-no-shows
Conditional Probability
• HR managers often face the problem of offer reneging, which means that candidates, after getting an offer from a firm, do not end up joining that firm. A particular IT firm has investigated its data on reneges over the last one year to conclude whether educational background has something to do with reneging. The following table shows the data gathered:
Independence of Events
• In the previous example, we have seen that whether an employee is promoted or not depends upon the skill set of the individual, since P(A|M) ≠ P(A|J)
• Two events A and B are independent if P(B|A) = P(B) and P(A|B) = P(A)
• Revisiting the Promotion Example with modified data:
 | Skill set JAVA/J2E | Skill set Mainframes
Promoted | 140 | 350
Not Promoted | 60 | 150
Total | 200 | 500
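With the modified data, the independence condition can be checked directly. This sketch compares P(A|skill) with P(A) for each skill set, and also checks the equivalent product form P(A ∩ skill) = P(A)·P(skill).

```python
from fractions import Fraction

# Modified promotion data from the table above
promoted = {"Java/J2E": 140, "Mainframes": 350}
not_promoted = {"Java/J2E": 60, "Mainframes": 150}

totals = {s: promoted[s] + not_promoted[s] for s in promoted}   # 200 and 500
grand_total = sum(totals.values())                                # 700
p_A = Fraction(sum(promoted.values()), grand_total)               # P(promoted) = 7/10

for skill in promoted:
    p_A_given_skill = Fraction(promoted[skill], totals[skill])
    p_skill = Fraction(totals[skill], grand_total)
    p_joint = Fraction(promoted[skill], grand_total)
    print(skill, p_A_given_skill, p_A, p_joint == p_A * p_skill)
```

Both conditional probabilities equal 7/10 = P(A), so with this data promotion is independent of skill set.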
Independence of Events
Multiplication Law
• For any two events A and B, P(A ∩ B) = P(A) P(B|A); if A and B are independent, this reduces to P(A ∩ B) = P(A) P(B)
• Example: It is known that 80% of the households in a city have television sets. It is also known that, of those having TV sets, 75% also have a connection to satellite channels. What is the probability that a household selected at random will have both a TV set and a connection to satellite channels?
Solution
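A one-line application of the multiplication law, with T = "household has a TV set" and S = "household has a satellite connection" (symbols chosen here for illustration):

```python
from fractions import Fraction

p_tv = Fraction(80, 100)               # P(T)
p_sat_given_tv = Fraction(75, 100)     # P(S | T)
p_tv_and_sat = p_tv * p_sat_given_tv   # multiplication law: P(T ∩ S) = P(T) P(S | T)
print(p_tv_and_sat, float(p_tv_and_sat))   # 3/5 0.6
```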
Bayes’ Theorem
• Let events C1, C2, …, Cn form a partition of the sample space S, where all the events have a non-zero probability of occurrence
• For any event A associated with S, the total probability theorem states that P(A) = P(A|C1) P(C1) + P(A|C2) P(C2) + … + P(A|Cn) P(Cn)
[Figure: sample space S partitioned into C1, C2, C3, …, Cn, with the event A overlapping the partitions]
Example
• A student needs to appear for the DVADM examination. The question paper can be of low, moderate, or high difficulty. The probabilities of the student passing the exam in each of these cases are 0.9, 0.7, and 0.5 respectively. If the probability that the question paper will be of moderate difficulty is 0.45, and of low difficulty is 0.35, what is the probability that the student will pass the exam?
• Sol: Let the events be denoted as L (Low Difficulty), M (Moderate Difficulty), and H (High Difficulty)
• Here, P(L) = 0.35, P(M) = 0.45, and, since the three cases are exhaustive, P(H) = 1 − 0.35 − 0.45 = 0.2
• Also, let the event of passing the examination be denoted by P
• Then P(P|L) = 0.9, P(P|M) = 0.7, and P(P|H) = 0.5
• So, P(P) = P(P|L) P(L) + P(P|M) P(M) + P(P|H) P(H) = 0.9 × 0.35 + 0.7 × 0.45 + 0.5 × 0.2 = ?
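The total probability computation can be checked in Python. This sketch uses the pass probabilities given in the problem statement (0.9, 0.7 and 0.5 for low, moderate and high difficulty) and derives P(high) from the fact that the three cases are exhaustive.

```python
p_difficulty = {"low": 0.35, "moderate": 0.45}
p_difficulty["high"] = 1 - sum(p_difficulty.values())     # 0.2: the cases are exhaustive
p_pass_given = {"low": 0.9, "moderate": 0.7, "high": 0.5}

# Total probability theorem: P(Pass) = Σ P(Pass | case) P(case)
p_pass = sum(p_pass_given[case] * p_difficulty[case] for case in p_difficulty)
print(round(p_pass, 4))
```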
Bayes’ Theorem
• Prior Probability
Let there be some specific events of interest. We have some initial information about the probability of occurrence of such events. We call these the prior probabilities, and then seek to collect further information about the events
• Posterior Probability
After obtaining additional information, we update the prior probabilities to get revised probabilities, referred to as posterior probabilities
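In symbols, a standard statement of Bayes’ theorem for the partition C1, …, Cn used in the total probability theorem turns the prior P(Ci) into the posterior P(Ci | A). The slides’ own formula is not reproduced in this text, so this is the textbook form:

```latex
P(C_i \mid A) = \frac{P(A \mid C_i)\,P(C_i)}{\sum_{j=1}^{n} P(A \mid C_j)\,P(C_j)}, \qquad i = 1, \dots, n
```

The denominator is exactly P(A) from the total probability theorem.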
Solution
Solution
Solution
Example
An IT firm has developed its own filter for the emails it receives. Emails are classified as Genuine emails and Junk emails. About 10% of the emails the firm receives are Junk emails. The filter is designed in such a way that if it detects a Junk email, the email is sent to the “Spam” folder; otherwise it is sent to the “Inbox”. However, the filter is not foolproof. It has been found that about 15% of the Junk emails are sent to the Inbox folder, while about 5% of the Genuine emails are sent to the Spam folder.
What is the probability that an email received at random will be sent to the Spam folder?
What is the conditional probability that a randomly checked email from the Spam folder is a Genuine email?
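Solution
• Consider 1000 emails received at random: 100 (10%) will be junk and 900 will be genuine
• Out of the 900 genuine emails, 5% i.e. 45 emails will be sent to the spam folder and the remaining 855 will be sent to the inbox folder
• Out of the 100 junk emails, 15% i.e. 15 emails will be sent to the inbox folder and the remaining 85 will be sent to the spam folder
• So, 45 + 85 = 130 of the 1000 emails are sent to the spam folder, and the probability that an email received at random is sent to the Spam folder is 130/1000 = 0.13
• Out of the 130 emails in the spam folder, 45 are genuine
• Required probability is 45/130 = 0.346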
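The same answers follow directly from the total probability theorem and Bayes’ theorem, without the 1000-email device. A Python sketch with exact fractions:

```python
from fractions import Fraction

p_junk = Fraction(10, 100)
p_genuine = 1 - p_junk
p_spam_given_junk = 1 - Fraction(15, 100)    # 15% of junk slips into the Inbox
p_spam_given_genuine = Fraction(5, 100)      # 5% of genuine mail lands in Spam

# Total probability theorem: P(Spam)
p_spam = p_spam_given_junk * p_junk + p_spam_given_genuine * p_genuine
# Bayes' theorem: P(Genuine | Spam)
p_genuine_given_spam = p_spam_given_genuine * p_genuine / p_spam
print(float(p_spam), round(float(p_genuine_given_spam), 3))   # 0.13 0.346
```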
Probability Distributions
Random Variables
• Consider the random experiment of rolling a pair of dice. We define the random variable x as the sum of the outcomes of the pair of dice. The sample space is expressed in the following table (rows: first die; columns: second die). The numbers outside the brackets indicate the sum of the outcomes.
 | 1 | 2 | 3 | 4 | 5 | 6
1 | (1,1) 2 | (1,2) 3 | (1,3) 4 | (1,4) 5 | (1,5) 6 | (1,6) 7
2 | (2,1) 3 | (2,2) 4 | (2,3) 5 | (2,4) 6 | (2,5) 7 | (2,6) 8
3 | (3,1) 4 | (3,2) 5 | (3,3) 6 | (3,4) 7 | (3,5) 8 | (3,6) 9
4 | (4,1) 5 | (4,2) 6 | (4,3) 7 | (4,4) 8 | (4,5) 9 | (4,6) 10
5 | (5,1) 6 | (5,2) 7 | (5,3) 8 | (5,4) 9 | (5,5) 10 | (5,6) 11
6 | (6,1) 7 | (6,2) 8 | (6,3) 9 | (6,4) 10 | (6,5) 11 | (6,6) 12
Random Variables
• P(x=2) = 1/36
• P(x=3) = 2/36
• P(x=4) = 3/36
• P(x=5) = 4/36
• P(x=6) = 5/36
• P(x=7) = 6/36
• P(x=8) = 5/36
• P(x=9) = 4/36
• P(x=10) = 3/36
• P(x=11) = 2/36
• P(x=12) = 1/36
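The probability mass function above can be generated by enumerating the 36 equally likely outcomes and counting each sum, as in this Python sketch:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))     # all 36 (die 1, die 2) pairs
counts = Counter(a + b for a, b in outcomes)        # how many pairs give each sum
pmf = {x: Fraction(counts[x], len(outcomes)) for x in sorted(counts)}

for x in pmf:
    print(f"P(x={x}) = {counts[x]}/36")
```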
Random Variables
• Machines can break down due to electrical faults, mechanical faults or misuse
• Each cause of breakdown has an associated cost of repair
• The costs are given as follows:
Reason | Electrical | Mechanical | Misuse
Cost of Repair | 2000 | 2500 | 5000
Random Variables
• Thus, a random variable associates a numerical value with each and every experimental outcome
• It can be classified as discrete or continuous depending on the numerical values it assumes
• A random variable that assumes only a finite set of values, or an infinite sequence of values such as 0, 1, 2, …, is referred to as a discrete random variable
• Consider the example of the number of cars crossing a toll booth each hour on a particular day
• Here the random variable x can assume the values 0, 1, 2, 3, …
• A call centre firm records the number of calls received per hour from different clients on a particular day. The probability distribution of x, the number of calls received in an hour, is presented in the following table:
x | 13 | 14 | 15 | 16 | 17 | 18
f(x) | 0.05 | 0.1 | 0.2 | 0.3 | 0.25 | 0.1
• Compute the expected value and the variance of the random variable x
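A sketch of the computation with exact fractions, using E(x) = Σ x f(x) and Var(x) = Σ (x − μ)² f(x):

```python
from fractions import Fraction

x = [13, 14, 15, 16, 17, 18]
f = [Fraction(s) for s in ("0.05", "0.1", "0.2", "0.3", "0.25", "0.1")]
assert sum(f) == 1                                  # a valid probability mass function

mean = sum(xi * fi for xi, fi in zip(x, f))                      # E(x)
variance = sum((xi - mean) ** 2 * fi for xi, fi in zip(x, f))    # Var(x)
print(float(mean), float(variance))   # 15.9 1.69
```

The standard deviation is therefore √1.69 = 1.3 calls per hour.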
Continuous Random Variable
• A continuous random variable may assume any value in an interval on the real number line or in a collection of intervals
• Experimental outcomes based on weight, height, marks, measurements, etc. are described by continuous random variables
Continuous Random Variable
• A major issue in dealing with continuous random variables is computing the probability that the random variable takes a particular value in the sample space: for a continuous random variable this probability is zero, so probabilities are assigned to intervals instead
• We define the probability density function as the continuous counterpart of the probability mass function
• Unlike the probability mass function, the probability density function f(x) does not provide probability values directly; the probability that x lies in an interval is the area under f(x) over that interval
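To see that probabilities come from areas under f(x), this sketch integrates the standard normal density numerically (trapezoidal rule) over [−1, 1] and compares the result with the exact value from the CDF. The standard normal is used only as a familiar example density.

```python
from statistics import NormalDist

def area_under(pdf, a, b, steps=10_000):
    """Approximate the integral of pdf over [a, b] with the trapezoidal rule."""
    h = (b - a) / steps
    total = 0.5 * (pdf(a) + pdf(b))
    total += sum(pdf(a + i * h) for i in range(1, steps))
    return total * h

z = NormalDist(mu=0, sigma=1)
approx = area_under(z.pdf, -1, 1)      # P(-1 <= Z <= 1) as an area
exact = z.cdf(1) - z.cdf(-1)           # about 0.6827
print(approx, exact)
print(z.pdf(0))                        # a density value (about 0.399), not a probability
print(area_under(z.pdf, 0, 0))         # the "area" at a single point is 0
```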
• Discrete
  • Binomial
• Continuous
  • Normal
Normal Distribution
Introduction to Sampling
Variable: A specific population characteristic which varies from one unit to another unit
The six samples yielded the following sample means: 8.75, 11.36, 9.19, 9.89, 9.24, 12.31
• Exercise
Consider the set of numbers from 1 to 100. Draw samples of size 2, 3, 4, … from it and compute the sample means.
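One way to carry out the exercise in Python is sketched below. As an assumption beyond the exercise's wording, it repeats the draw 1000 times per sample size (without replacement) so that the behaviour of the sample means is visible: they centre near the population mean 50.5 and spread less as the sample size grows.

```python
import random
import statistics

population = list(range(1, 101))        # the numbers 1 to 100
rng = random.Random(2024)               # fixed seed for a reproducible run

results = {}
for n in (2, 3, 4, 10, 30):
    # 1000 samples of size n, drawn without replacement; record each sample mean
    means = [statistics.mean(rng.sample(population, n)) for _ in range(1000)]
    results[n] = (statistics.mean(means), statistics.stdev(means))
    print(n, round(results[n][0], 2), round(results[n][1], 2))
```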
Let S1, S2, …, Sk be samples of size n drawn from a population with mean µ and standard deviation σ, where the observations are independent and identically distributed.
According to the Central Limit Theorem (CLT), for a large value of n, the means of S1, S2, …, Sk approximately follow a normal distribution with mean µ and standard deviation σ/√n.
Independent and identically distributed implies that the random variables are mutually independent and follow the same probability distribution.
Central Limit Theorem
The average annual stipend in SIP for a B-school in Eastern India is ₹82000, with a standard deviation of ₹5000. A random sample of 36 students is selected from the population. What is the standard error of the mean? What is the probability that the sample mean is less than ₹80000?
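A Python sketch of the calculation: the standard error is σ/√n, and by the CLT the sample mean is approximately normal with mean µ and that standard deviation, so the probability is a normal CDF value (z = −2.4).

```python
import math
from statistics import NormalDist

mu, sigma, n = 82_000, 5_000, 36
standard_error = sigma / math.sqrt(n)            # σ/√n ≈ 833.33

sampling_dist = NormalDist(mu, standard_error)   # approximate distribution of x̄
p_below_80k = sampling_dist.cdf(80_000)          # P(x̄ < 80000)
print(round(standard_error, 2), round(p_below_80k, 4))
```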
Solution
Statistical Inference
• Estimation: Point Estimation and Interval Estimation
Point Estimation
Interval Estimation
• A random sample of 36 students is selected from the population of all B-school students in Eastern India, and their stipends during their summer internships are noted. The sample mean is found to be 78000. The population standard deviation is assumed to be 10000. Find a 90% confidence interval for the true (population) mean of the summer internship stipend.
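Because σ is known, the interval is x̄ ± z(α/2)·σ/√n. A Python sketch that looks up z(α/2) for 90% confidence (about 1.645) rather than hard-coding it:

```python
import math
from statistics import NormalDist

x_bar, sigma, n, confidence = 78_000, 10_000, 36, 0.90
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # z_{α/2} ≈ 1.645
margin = z * sigma / math.sqrt(n)                    # margin of error
lower, upper = x_bar - margin, x_bar + margin
print(round(z, 3), round(lower), round(upper))
```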
Interval Estimation
What is a hypothesis?
• In May 2015, the ICC (cricket’s governing body) decided to get rid of the batting powerplay in the limited-overs format, so as to allow bowlers a little more breathing space in a format that has been largely dominated by batsmen.
• The economy rates of bowlers before and after the introduction of the rule are compared to test the claim.
What is a hypothesis?
We define hypothesis as: “A claim or statement regarding the population parameter which may or may not be true but requires verification from a randomly drawn sample”
Hypothesis Testing
• Null Hypothesis
• Dictionary Definition of Null Hypothesis
• “the hypothesis that there is no significant difference between specified populations, any observed difference being due to sampling or experimental error”
• The null hypothesis specifies a population parameter of interest and proposes a value for it
• It symbolizes the status quo and is denoted by H0
Hypothesis Testing
• A large distributor of automobile parts has been able to maintain the average duration of receipts of interest-free credit from the wholesalers at 20 days since the inception of his company. Due to some regulatory changes in the past months, he noticed that for a randomly selected sample of 75 wholesalers, the average duration of interest-free credit has become 22 days, with a sample standard deviation of 2.5 days. Has the regulatory change increased the average credit days, at the 5% level of significance?
• It is claimed that increasing the advertisement budget helps boost average daily sales. Average daily sales prior to the increase in advertisement expenditure were 2500 units. After the increase, average daily sales were found to be 2650 units over 25 days, with a sample standard deviation of 100 units. Does increasing the advertisement budget boost average daily sales, at the 5% level of significance?
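Both questions can be set up as right-tailed one-sample t tests (H1: μ > 20 days and H1: μ > 2500 units). In this sketch the critical values are assumptions read from a standard t table, t(0.05, 74) ≈ 1.666 and t(0.05, 24) = 1.711, since the standard library has no t distribution; it is not the slides' worked solution.

```python
import math

def one_sample_t(x_bar, mu0, s, n):
    """t statistic for H0: μ = μ0, using the sample standard deviation s."""
    return (x_bar - mu0) / (s / math.sqrt(n))

T_CRIT_74 = 1.666   # t(0.05, 74 df), from a t table (assumed)
T_CRIT_24 = 1.711   # t(0.05, 24 df), from a t table (assumed)

t_credit = one_sample_t(x_bar=22, mu0=20, s=2.5, n=75)
t_sales = one_sample_t(x_bar=2650, mu0=2500, s=100, n=25)
print("credit days:", round(t_credit, 3), "reject H0" if t_credit > T_CRIT_74 else "do not reject H0")
print("daily sales:", round(t_sales, 3), "reject H0" if t_sales > T_CRIT_24 else "do not reject H0")
```

With scipy available, `scipy.stats.t.ppf(0.95, df)` would replace the table lookups.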
Example
• Find the probability of Type II error against the alternative H1: {P(X=1,2,3) = 1/9 and P(X=4,5,6) = 2/9}, where X is a random variable denoting the face obtained after the die is rolled
Solution
Die 2 \ Die 1 | 1 | 2 | 3 | 4 | 5 | 6
1 | 2 | 3 | 4 | 5 | 6 | 7
2 | 3 | 4 | 5 | 6 | 7 | 8
3 | 4 | 5 | 6 | 7 | 8 | 9
4 | 5 | 6 | 7 | 8 | 9 | 10
5 | 6 | 7 | 8 | 9 | 10 | 11
6 | 7 | 8 | 9 | 10 | 11 | 12
Solution
• Here α = 6 × (1/36) = 1/6
• And β = 1 − 6 × (4/81) = 57/81
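The numbers α = 1/6 and β = 57/81 can be reproduced by enumeration. The null hypothesis and rejection rule are not stated in this text, so this sketch assumes H0: the die is fair, with H0 rejected when the sum of two rolls is 10 or more; that region has 6 outcomes, each with both faces in {4, 5, 6}, which is consistent with the solution's numbers.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
# Assumed rejection region: sum of two rolls >= 10 (6 of the 36 outcomes)
reject = [(a, b) for a, b in product(faces, repeat=2) if a + b >= 10]

p_fair = {x: Fraction(1, 6) for x in faces}                                # H0 (assumed)
p_alt = {x: Fraction(1, 9) if x <= 3 else Fraction(2, 9) for x in faces}   # H1

alpha = sum(p_fair[a] * p_fair[b] for a, b in reject)   # P(reject H0 | H0 true)
power = sum(p_alt[a] * p_alt[b] for a, b in reject)     # P(reject H0 | H1 true)
beta = 1 - power                                        # P(Type II error)
print(len(reject), alpha, beta)   # 6 1/6 19/27
```

Note that 19/27 is 57/81 in lowest terms.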
Hypothesis Testing
Example 1
• An app-based food ordering firm, FoodAnytime, wants to launch targeted offers for customers who avail its services on specific days of the week, so as to boost customer demand on those days. It first wants to find whether customers prefer its services more during the weekends than during the weekdays. It collected data on the total number of daily orders for a period of 50 days selected at random for a city. Test at the 5% significance level whether the demand for the services of FoodAnytime is higher during the weekends than during the weekdays. If the demand during the weekends is found to be significantly higher than that during the weekdays, FoodAnytime will launch the targeted offers during the weekdays to attract more customers.
• Further, FoodAnytime is also interested in finding whether the demand for its services is lower on any particular weekday, so that even better offers can be launched to attract customers. Test at the 5% significance level whether the demand for the services of FoodAnytime differs across the days of the week.
Assumptions
• x̄ follows a normal distribution with mean μ and variance σ²/n, so that Z = (x̄ − μ)/(σ/√n) follows the standard normal distribution
• (n − 1)s²/σ² follows a chi-square distribution with n − 1 degrees of freedom
• Z and s are independent
• The marketing team needs to analyse the data to test whether the sales of the existing product have gone down significantly after the launch of the new product. In such a case, the team will make a recommendation to senior management regarding increasing the price of the new offering.
Example 3
• Ajanta Foods is a key player in the processed foods segment in the western part of India. It wants to penetrate the neighbouring regions, where its presence is limited. For this, the company has hired three MBA graduates for sales roles, and a different region is allocated to each of them. The regions are assumed to be similar demographically as well as on socio-economic parameters. The company wishes to evaluate the performance of these MBA graduates on the basis of the sales generated in their regions. For this purpose, the daily sales figures across the three regions are obtained. The objective is to find whether there is a significant difference in the performance of the three individuals in terms of average daily sales. If one of the MBA graduates is outperforming the rest, that person will be rewarded; if there is no significant difference, no extra compensation will be provided.
• A manufacturer wants to test whether the hourly output rate of a newly purchased machine is 60 units. It is known that such machines have a standard deviation of hourly output of 10 units. He allowed the machine to operate for some time and found that 1910 units were produced in 2000 minutes. Test whether the observed output is consistent with the initial claim at the 5% level of significance.
Example 4
• FinoTech is an Indian IT firm based out of Bangalore. It serves its clients primarily in the Banking and Financial Services (BFS) sector. Last year it signed a contract with a leading US-based firm in the BFS sector, under which any IT-related issues are raised in the form of tickets. The tickets have different levels of criticality: Category A tickets are the most critical and must be addressed urgently, i.e. within 2 hours of being raised; Category B tickets are less critical than Category A and must be resolved within 24 hours of being raised; and Category C tickets are the least critical and must be resolved within one week. As per the contract, FinoTech has to ensure that a service level of 95% is achieved for Category A tickets and an overall service level of 90% is achieved for all the tickets taken together. The project manager has to assign the responsibility of handling these tickets to the different associates reporting to him, based on their criticality. The associates handling these tickets are either posted at the client’s site or provide the services from the offshore office.
• Can we say that associates at the client’s location are more efficient in handling Category A tickets?
• Can we say that the associates onsite handle fewer Category B and Category C tickets than their offshore counterparts?
• Test at the 5% level of significance.
R codes
Advanced Topics
Sampling Distribution
• A box contains 5 balls with weights, in certain units, of 1, 2, 3, 4 and 5. A simple random sample of 2 balls is drawn from the box without replacement. Let x1 and x2 be the weights of the balls in the sample. The possible samples, with the sample mean x̄ and sample variance s² of each, are:
x1 | x2 | x̄ | s²
1 | 2 | 1.5 | 0.5
1 | 3 | 2 | 2
1 | 4 | 2.5 | 4.5
1 | 5 | 3 | 8
2 | 3 | 2.5 | 0.5
2 | 4 | 3 | 2
2 | 5 | 3.5 | 4.5
3 | 4 | 3.5 | 0.5
3 | 5 | 4 | 2
4 | 5 | 4.5 | 0.5
Solution
• Expected value of x̄ = 3
• Variance of x̄ = 0.75
• Expected value of s² = 2.5
• Population mean is 3 and population variance is 2
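These values can be reproduced by enumerating the 10 equally likely samples. The sketch also checks two standard finite-population results that explain them: Var(x̄) = (σ²/n)(N − n)/(N − 1) and E(s²) = N σ²/(N − 1), where σ² is the population variance with divisor N.

```python
from fractions import Fraction
from itertools import combinations

def sample_variance(xs):
    """s² with divisor n − 1."""
    m = Fraction(sum(xs), len(xs))
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

weights = [1, 2, 3, 4, 5]
N, n = len(weights), 2
samples = list(combinations(weights, n))        # 10 samples, no replacement
means = [Fraction(sum(s), n) for s in samples]
variances = [sample_variance(s) for s in samples]

e_mean = sum(means) / len(means)                                 # E(x̄)
var_mean = sum((m - e_mean) ** 2 for m in means) / len(means)    # Var(x̄)
e_var = sum(variances) / len(variances)                          # E(s²)
pop_mean = Fraction(sum(weights), N)
pop_var = sum((w - pop_mean) ** 2 for w in weights) / N          # σ², divisor N

assert var_mean == pop_var / n * Fraction(N - n, N - 1)   # finite population correction
assert e_var == pop_var * Fraction(N, N - 1)
print(e_mean, var_mean, e_var, pop_mean, pop_var)
```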
Example
• A box contains 4 balls with weights, in certain units, of 1, 2, 3 and 4. A simple random sample of 2 balls is drawn from the box with replacement. Let x1 and x2 be the weights of the balls in the sample.
Solution
• Sampling distribution of x̄:
• P(x̄=1) = 0.0625, P(x̄=1.5) = 0.125, P(x̄=2) = 0.1875, P(x̄=2.5) = 0.25, P(x̄=3) = 0.1875, P(x̄=3.5) = 0.125, P(x̄=4) = 0.0625
• Sampling distribution of s²:
• P(s²=0) = 0.25, P(s²=0.5) = 0.375, P(s²=2) = 0.25, P(s²=4.5) = 0.125
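With replacement there are 4 × 4 = 16 equally likely ordered samples. This Python sketch tallies x̄ and s² over them to reproduce both sampling distributions:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def sample_variance(xs):
    """s² with divisor n − 1."""
    m = Fraction(sum(xs), len(xs))
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

weights = [1, 2, 3, 4]
samples = list(product(weights, repeat=2))      # 16 ordered samples, with replacement

mean_dist = Counter(Fraction(a + b, 2) for a, b in samples)
var_dist = Counter(sample_variance((a, b)) for a, b in samples)

for value in sorted(mean_dist):
    print("x̄ =", float(value), "P =", mean_dist[value] / len(samples))
for value in sorted(var_dist):
    print("s² =", float(value), "P =", var_dist[value] / len(samples))
```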
Solution