ZC-417 Quantitative Methods Exam Notes
Types of Data
Measurement Scale
Frequency Distribution
Relative Frequency Distribution
Cumulative Frequency Distribution
Percent Frequency Distribution
Bar Charts / Graph
Pie Charts
All have the same information – A matter of taste
Cross Tabulating the Titanic Data (Titanic Data Contingency Table)
A person is randomly picked and the Class & Survival were recorded
• P((F & A) or (S & A)) = P(F & A) + P(S & A) = 202/2201 + 118/2201 = 320/2201 (mutually exclusive events)
• P(Not (F & A)) = 1 – 202/2201 = 1999/2201 (complementary events)
• P(Person was Crew or Person Survived)
= P(Person was Crew) + P(Person Survived) – P(Person was Crew and survived)
= 885/2201 + 710/2201 – 212/2201 = 1383 / 2201
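The three results above can be checked with exact fractions. This minimal Python sketch uses only the counts quoted in these notes, and F, S and A keep their meaning from the contingency table.

```python
from fractions import Fraction

N = 2201  # total people in the Titanic data used in these notes

p_first_alive = Fraction(202, N)    # P(F & A)
p_second_alive = Fraction(118, N)   # P(S & A)
p_crew = Fraction(885, N)
p_survived = Fraction(710, N)
p_crew_and_survived = Fraction(212, N)

# Addition rule for mutually exclusive events: simply add
p_first_or_second_alive = p_first_alive + p_second_alive

# Complement rule
p_not_first_alive = 1 - p_first_alive

# General addition rule: subtract the overlap so it is not counted twice
p_crew_or_survived = p_crew + p_survived - p_crew_and_survived

print(p_first_or_second_alive, p_not_first_alive, p_crew_or_survived)
```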
Frequency Distribution
Relative Frequency Distribution
Percentage Relative Frequency Distribution
Cumulative Frequency Distribution
Histogram
Ogive
Dot Plot
Time Taken – Frequency Distributions
Skewness
Graphical and tabular presentations summarize the entire data set pictorially
Business may require:
o A measure that summarizes the data with a single number
o A measure that summarizes the spread of the data with a single number
Measures of Location I
Measures of Location II
Measures of Dispersion
The variance is a measure of variability that includes all the available information
$\sigma = \sqrt{\dfrac{\sum (x_i - \mu)^2}{N}}$ & $s = \sqrt{\dfrac{\sum (x_i - \bar{x})^2}{n - 1}}$
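The only difference between σ and s is the divisor: N for a population, n − 1 for a sample. The sketch below applies both formulas to the same hypothetical numbers and cross-checks them against Python's `statistics.pstdev` and `statistics.stdev`.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values, for illustration only

# Treat the numbers as a whole population: divide by N
mu = sum(data) / len(data)
sigma = math.sqrt(sum((x - mu) ** 2 for x in data) / len(data))

# Treat the same numbers as a sample: divide by n - 1
xbar = sum(data) / len(data)
s = math.sqrt(sum((x - xbar) ** 2 for x in data) / (len(data) - 1))

print(sigma, statistics.pstdev(data))   # population standard deviation
print(s, statistics.stdev(data))        # sample standard deviation (slightly larger)
```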
Mean Vs Median
1. Suppose the company knows it will retain customers if the customer satisfaction index is 6 or above (on
a scale of 1 to 10). The average index for the current survey is 6. Should the company feel comfortable?
2. The shopkeeper maintains a record of the monthly spends by his regular customers. Would he be
interested in the median spend or in the average spend?
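For the first question, an average of 6 can hide many customers below the threshold. Here is a sketch with hypothetical scores (not survey data): the mean is 6, but the median is 5 and 5 of the 8 customers score below 6.

```python
import statistics

# Hypothetical satisfaction scores (scale 1 to 10) from 8 customers
scores = [2, 3, 4, 5, 5, 9, 10, 10]

mean = statistics.mean(scores)
median = statistics.median(scores)
share_below_6 = sum(1 for x in scores if x < 6) / len(scores)

print(mean, median, share_below_6)
```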
Kentucky Derby Data: Box Plot & Five-Number Summary & Outliers
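The Kentucky Derby data are not reproduced in these notes, so this sketch uses hypothetical times. It computes a five-number summary and flags outliers with the common 1.5 × IQR fence rule. Quartile conventions differ between textbooks and software. Python's default `statistics.quantiles` method ("exclusive") is assumed here, so other tools may give slightly different quartiles.

```python
import statistics

# Hypothetical winning times in seconds (NOT the Kentucky Derby data)
times = sorted([119, 120, 121, 122, 122, 123, 124, 125, 130, 145])

q1, q2, q3 = statistics.quantiles(times, n=4)  # default method='exclusive'
five_number = (times[0], q1, q2, q3, times[-1])

iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [t for t in times if t < lower_fence or t > upper_fence]

print(five_number)   # (119, 120.75, 122.5, 126.25, 145)
print(outliers)      # [145]
```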
Comparing Different-Looking Values
You have a job offer of Rs.8.2 lakh in Hyderabad and another of Rs.8.6 lakh in Bangalore. The mean and
variance for similar jobs in Hyderabad were Rs.7 lakh and 9 (Rs. lakh)², while for Bangalore, these were Rs.7.4
lakh and 12.25 (Rs. lakh)². Which is the better job offer?
The Feb “high” temperature averaged 30°C with variance 100 (°C)², while in May these were 40°C and 64 (°C)².
When is it more unusual to have a high of 35°C?
Z-Scores:
Zhyd = (8.2-7)/3 = 0.4 &
Zbgl =(8.6-7.4)/3.5 = 0.3429
Hyderabad: The offer is 0.4 σ above the mean
Bangalore: The offer is 0.34 σ above the mean
Relative to similar jobs in each city, the Hyderabad offer is the better one.
Z Scores
Zfeb = (35-30)/10 = 0.5 &
Zmay = (35-40)/8 = -5/8 = -0.625
February: 35°C is 0.5 σ above the mean
May: 35°C is 0.625 σ below the mean (ignoring the sign, it is farther from the mean)
A high of 35°C is therefore more unusual in May.
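Both comparisons reduce to one formula, z = (x − mean)/σ, where σ is the square root of the given variance. This minimal sketch reproduces the four z-scores above.

```python
import math

def z_score(x, mean, variance):
    """Standardize x: how many standard deviations it lies from the mean."""
    return (x - mean) / math.sqrt(variance)

z_hyd = z_score(8.2, 7.0, 9.0)     # Rs. lakh; variance in (Rs. lakh)^2
z_bgl = z_score(8.6, 7.4, 12.25)
z_feb = z_score(35, 30, 100)       # deg C; variance in (deg C)^2
z_may = z_score(35, 40, 64)

print(round(z_hyd, 4), round(z_bgl, 4))  # 0.4 0.3429
print(z_feb, z_may)                      # 0.5 -0.625
```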
Recall the team member with a job offer of Rs.15 lakh in Hyderabad.
Should you try and retain him or start the separation process?
Recall also that for his level in Hyderabad, μ = Rs.8 lakh and σ = Rs.2 lakh.
The z-score is (15 − 8)/2 = 3.5
Suppose the salaries for his profile and level are bell shaped
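For a bell-shaped distribution, almost all values lie within 3σ of the mean, so z = 3.5 is extreme. This sketch makes the stronger assumption that salaries are exactly normal and estimates the share of salaries above Rs.15 lakh.

```python
from statistics import NormalDist

mu, sigma = 8.0, 2.0   # Rs. lakh, for his level in Hyderabad
offer = 15.0
z = (offer - mu) / sigma

# Assumption: salaries are exactly normal (stronger than "bell shaped")
share_above = 1 - NormalDist().cdf(z)
print(z, round(share_above, 5))  # 3.5 0.00023
```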
Lesson -2
The soft drinks manufacturer has recently introduced a new drink Cocofizz targeted at college students. It
has engaged a market research company to do a survey among young people to find out how the product is
being perceived.
In the pilot survey, the MR team decided to meet 20 students. A random college was selected and a team
was stationed at the college canteen. The team asked every 10th student to rate the product on a scale of 1
to 5 (1: Yuck!, 2: Poor, 3: Neutral, 4: Good, 5: Excellent).
• The survey result has well-defined outcomes but cannot be predicted with certainty
A random college was selected and a team was stationed at the college canteen. The team asked every 10th
student whether he / she will recommend the drink to a friend – Yes or No
The survey result has well-defined outcomes but cannot be predicted with certainty
• Random Experiment: A process that has well-defined outcomes but these cannot be predicted with
certainty
• Sample Space: The set of all outcomes
• Event: Any subset of the sample space
• For the following processes, identify the sample space and one event:
Flip a coin
Roll a die
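One way to answer the exercise is to write the sample space and an event as sets, with the event a subset of the sample space. The events chosen here ("heads", "an even number") are illustrative. With equally likely outcomes, P(event) = |event|/|sample space|.

```python
from fractions import Fraction

coin_space = {"H", "T"}
coin_event = {"H"}                                  # event: "the coin shows heads"

die_space = {1, 2, 3, 4, 5, 6}
die_event = {x for x in die_space if x % 2 == 0}    # event: "an even number"

def prob_equally_likely(event, space):
    """Classical probability: |event| / |sample space| (equally likely outcomes)."""
    assert event <= space, "an event must be a subset of the sample space"
    return Fraction(len(event), len(space))

print(prob_equally_likely(coin_event, coin_space))  # 1/2
print(prob_equally_likely(die_event, die_space))    # 1/2
```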
Assigning Probabilities
• Equally Likely
Assigning probabilities based on the assumption of equally likely outcomes
• Relative Frequency Method
Assigning probabilities based on historical data
• Subjective Method
Assigning probabilities based on judgement
Venn Diagram
The rental classifieds suggest that 50% of the flats are fully furnished, 20% have 24/7 running water, and
10% have both features. What is the probability that a flat for rent:
A. Is fully furnished or has running water? B. Has neither feature? C. Is fully furnished but has no running water?
C. P(Furnished and Not 24/7 Water) = P(Furnished) – P(Both) = 0.50 – 0.10 = 0.40
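All three parts follow from the Venn diagram rules: the general addition rule for A, the complement of A for B, and subtracting the overlap for C. A sketch with exact fractions:

```python
from fractions import Fraction

p_furnished = Fraction(50, 100)
p_water = Fraction(20, 100)
p_both = Fraction(10, 100)

p_a = p_furnished + p_water - p_both   # A: furnished or 24/7 water
p_b = 1 - p_a                          # B: neither feature
p_c = p_furnished - p_both             # C: furnished but no 24/7 water

print(float(p_a), float(p_b), float(p_c))  # 0.6 0.4 0.4
```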
Subjective Method
Often managers use their experience and intuition (and the data available) to assign probabilities. The
probabilities represent their belief in the likelihood of the events
Usually, probability estimates are based on the Relative Frequency approach together with the subjective
estimate.
Example: The firm will shortly launch a variant of the existing model.
R&A assigned the following probabilities to the possible market share by year-end: P(5%) = 20%, P(10%) =
55% and P(15%) = 25%.
VP Marketing modified the numbers as follows:
P(5%) = 25%, P(10%) = 40% and P(15%) = 35%.
Basic requirements for assigning probabilities:
1. 0 ≤ P(Ei) ≤ 1 for each outcome Ei
2. ∑ P(Ei) = 1
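Both the R&A and the VP Marketing assignments can be checked against the basic requirements. This is a minimal sketch; `math.isclose` guards against floating-point rounding in the sum.

```python
import math

r_and_a = {"5%": 0.20, "10%": 0.55, "15%": 0.25}
vp_marketing = {"5%": 0.25, "10%": 0.40, "15%": 0.35}

def is_valid_assignment(probs):
    """Each P(Ei) lies in [0, 1] and the P(Ei) sum to 1."""
    in_range = all(0 <= p <= 1 for p in probs.values())
    return in_range and math.isclose(sum(probs.values()), 1.0)

print(is_valid_assignment(r_and_a), is_valid_assignment(vp_marketing))  # True True
```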
Conditional Probability (Titanic example)
o P(A | B) “Read as Probability of A given B”
o P(A | B) = P(A and B) / P(B) = P(A & B) / P(B)
Independent Events
o P(A | B) = P(A) or P(A & B) = P(A) * P(B)
Multiplicative Law
o P(A and B) = P(B) * P(A | B) or P(A & B) = P(B)P(A|B)
Joint Probability
o Probability of two events both occurring
Marginal Probability
o Probability distribution of one of the variables alone, obtained by summing the joint probabilities over the other variable
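The Titanic counts quoted earlier illustrate these definitions. The joint probability P(Crew & Survived) and the marginal P(Crew) give the conditional P(Survived | Crew). Comparing it with the marginal P(Survived) is the independence check. A minimal sketch:

```python
from fractions import Fraction

N = 2201
p_crew = Fraction(885, N)                 # marginal
p_survived = Fraction(710, N)             # marginal
p_crew_and_survived = Fraction(212, N)    # joint

# Conditional probability: P(A | B) = P(A & B) / P(B)
p_survived_given_crew = p_crew_and_survived / p_crew   # = 212/885

# Multiplicative law recovers the joint probability
assert p_crew * p_survived_given_crew == p_crew_and_survived

# Independence would require P(Survived | Crew) == P(Survived)
independent = p_survived_given_crew == p_survived

print(round(float(p_survived_given_crew), 4), round(float(p_survived), 4), independent)
# 0.2395 0.3226 False
```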
The firm will shortly launch a variant of the existing soft drink.
Based on past data and industry reports, the R&A Department has assigned the following probabilities to
the possible market share: P(5%) = 0.2, P(10%) = 0.5 and P(15%) = 0.3.
Since the soft drink was targeted for the younger generation, a taste test was held at a college campus. 35
of the 50 students who took the test said they like the drink.
This new fact has to be factored into the earlier estimates to refine the probabilities.
The earlier probabilities are called Prior probabilities and the latter would be called Posterior probabilities.
Bayes’ theorem provides the tool for revising the prior probabilities.
Any manager, dealing with a situation of uncertainty, can assign probabilities to the possible outcomes.
These probabilities may be a combination of subjective and objective probabilities, the latter being based
on historical data and reports.
These initial probabilities are termed prior probabilities.
Later, new information is received: a survey, a product test or some stray event.
This new information is factored in to generate the posterior probabilities.
Bayes’ theorem is the tool for revising the prior probabilities.
The probability model may be continuously updated as new data flow in!
Titanic:
Bayes’ Theorem
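Suppose the Ei’s are ALL the possible outcomes and the prior probabilities P(Ei) have been assigned to them.
The event F has occurred.
$P(E_i \mid F) = \dfrac{P(E_i)\,P(F \mid E_i)}{P(E_1)\,P(F \mid E_1) + P(E_2)\,P(F \mid E_2) + \cdots + P(E_n)\,P(F \mid E_n)}$
$= \dfrac{P(E_i)\,P(F \mid E_i)}{P(E_1 \,\&\, F) + P(E_2 \,\&\, F) + \cdots + P(E_n \,\&\, F)}$
$= \dfrac{P(E_i)\,P(F \mid E_i)}{P(F)}$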
Required:
1. The Ei’s are mutually exclusive
2. The Ei’s are collectively exhaustive, that is, they are all the possible events
The MD’s EA is often late returning from lunch. Based on observation, HR assigned the following
probabilities of the EA returning late, depending on where the EA had lunch:
Lunch        P(Late)
Out          40%
Canteen      19%
Cubicle      1%
Events:
E1: Lunch Out, E2: Lunch at the canteen, E3: Lunch in the cubicle
Assumption: These are all the possibilities for the EA to have lunch
F: EA came late today
Prior Probabilities: P(E1) = P(E2) = P(E3) = 1/3
Conditional Probabilities: P(F | E1) = 0.40, P(F | E2) =0.19, P(F | E3) = 0.01
The disease is present in 0.5% of the population. It is a deadly disease and death is almost
inevitable.
But there is a test that can detect the disease.
The true positive rate is 99% while the false positive rate is 5%.
That is P(Test is Positive given that you have the disease) = 0.99
And P(Test is Positive given that you do not have the disease) = 0.05
Question: If you test positive, should you panic?
Events:
E1: You have the disease, E2: You do not have the disease
F: You have tested positive
Prior Probabilities: P(E1) = 0.005, P(E2) = 0.995
Conditional Probabilities: P(F|E1) = 0.99, P(F|E2) =0.05
Need to compute P(E1|F)
Step 1
Identify the mutually exclusive events (E’s) that make up the Sample Space; Identify the Fact F.
Prepare the table with 5 columns and (n+1) rows, where n is the size of the sample space
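Step 2
Enter
Column 1 The events E’s
Column 2 The prior probabilities P(E)
Column 3 The conditional probabilities P(F | E)
Step 3
Column 4 Compute the joint probabilities P(E&F) using P(E&F) = P(E) * P(F | E)
Step 4
Sum Column 4 to obtain P(F) and enter it in the last, (n+1)th, row
Step 5
Column 5 Compute the posterior probabilities using P(E | F) = P(E & F) / P(F).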
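The five steps translate directly into code. The function name `bayes_table` is illustrative. Applied to the disease example, it gives P(F) = 0.0547 and P(E1 | F) of about 0.0905. A positive result raises the chance of having the disease from 0.5% to roughly 9%, far from certainty.

```python
def bayes_table(events, priors, likelihoods):
    """Build the 5-column table: event, P(E), P(F|E), P(E & F), P(E|F)."""
    joints = [p * l for p, l in zip(priors, likelihoods)]   # Column 4
    p_f = sum(joints)                                        # Step 4: P(F)
    rows = [(e, p, l, j, j / p_f)                            # Column 5
            for e, p, l, j in zip(events, priors, likelihoods, joints)]
    return rows, p_f

rows, p_positive = bayes_table(
    events=["E1: has disease", "E2: no disease"],
    priors=[0.005, 0.995],
    likelihoods=[0.99, 0.05],
)
for row in rows:
    print(*(x if isinstance(x, str) else round(x, 5) for x in row))
print("P(F) =", round(p_positive, 5))
```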
Lesson -3
Random Variables
The random variable inherits the probabilities of the events of the random experiment
EXAMPLE
Event   P(Event)   X    P(X = x)
H       0.5        10   0.5
• # of dependents of an employee (discrete random variable)
• Life of a tire (continuous random variable)
The Probability Function lists the possible outcomes and their probabilities
The probability function provides the probability for each value of the random variable.
The required conditions for the probability function are: f(x) ≥ 0 & ∑f(x) = 1
The expected value, or mean, of a random variable is a measure of its central location. E(X) = μ = ∑ xf(x)
The variance, Var(X) = σ² = ∑(x − μ)² f(x), measures the spread of the random variable; the standard deviation, σ, is defined as the positive square root of the variance.
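A sketch of these definitions for a discrete random variable. The probability function for "# of dependents" is hypothetical and not from the notes. The code checks the two required conditions and then computes E(X), Var(X) and σ.

```python
import math

# Hypothetical probability function for X = # of dependents of an employee
f = {0: 0.10, 1: 0.20, 2: 0.40, 3: 0.20, 4: 0.10}

assert all(p >= 0 for p in f.values())          # f(x) >= 0
assert math.isclose(sum(f.values()), 1.0)       # sum of f(x) = 1

mu = sum(x * p for x, p in f.items())                 # E(X) = sum x f(x)
var = sum((x - mu) ** 2 * p for x, p in f.items())    # Var(X) = sum (x - mu)^2 f(x)
sigma = math.sqrt(var)

print(round(mu, 4), round(var, 4), round(sigma, 4))
```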
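Linear transformation: if Y = aX + b, then E(Y) = aE(X) + b and Var(Y) = a²Var(X).
Example: Suppose X managed to convince Ace to reduce the charges on every order by Rs.100 (here a = 1, b = −100).
Example: X gets a 10% discount on all orders. In the above equation, a = 0.9 (and b = 0).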
You are about to launch a new product. The product was test marketed, but preference for body colour
was not tested.
There are six colours: Violet (1), Blue (2), Green (3), Yellow (4), Orange (5) and Red (6). Initially it must be
assumed that each body colour is equally preferred.
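"Equally preferred" means a discrete uniform probability function, f(x) = 1/6 for each colour code. The codes are only labels. In the sketch below, the "warm colours" event (Yellow, Orange, Red) is an illustrative addition.

```python
from fractions import Fraction

colours = {1: "Violet", 2: "Blue", 3: "Green", 4: "Yellow", 5: "Orange", 6: "Red"}
f = {x: Fraction(1, len(colours)) for x in colours}   # f(x) = 1/6 for every colour

warm = {4, 5, 6}                                      # hypothetical event of interest
p_warm = sum(f[x] for x in warm)

print(f[1], sum(f.values()), p_warm)   # 1/6 1 1/2
```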
A Poisson distributed random variable is used in estimating the number of occurrences in a specified
interval of time or space
Application
Sizing operations at a bank, call centre, service centre, petrol bunk, …
Examples
Requirements
X ~ Π(μ)
$P(X = x) = \dfrac{\mu^x e^{-\mu}}{x!}, \quad x = 0, 1, 2, \ldots$
μ = E(X) = V(X)
Employees visit the ATM at the average rate of 6 per hour in the post-lunch period. What is the probability
of 2 arrivals in 30 minutes in the post-lunch period?
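The key step is to rescale the mean to the interval asked about: 6 per hour becomes μ = 3 per 30 minutes. Then P(X = 2) = 3²e⁻³/2!, which is about 0.224. A minimal sketch:

```python
import math

def poisson_pmf(x, mu):
    """P(X = x) for X ~ Poisson(mu): mu^x e^(-mu) / x!"""
    return mu ** x * math.exp(-mu) / math.factorial(x)

rate_per_hour = 6
mu_30min = rate_per_hour * 0.5      # mean number of arrivals in 30 minutes
p_two = poisson_pmf(2, mu_30min)
print(round(p_two, 4))              # 0.224
```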
The typical example: Tossing a Coin & we are interested in the number of Heads
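1. The experiment consists of a sequence of n identical trials.
2. Two outcomes are possible on each trial: success or failure.
3. The probability of a success, denoted by p, does not change from trial to trial.
4. The trials are independent.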
B(n, p) – Example 1
Indica sells encyclopedias targeted towards children using door-to-door saleswomen. Ms Rita, a
saleswoman with Indica, has randomly selected 20 houses in her neighborhood to sell the product. From
past experience, Rita knows that the probability that a sale will be made is 0.1.
# of trials, n, is 20
1. Trials are identical in the sense that each trial is a household
3. The probability may change. Initially p, the probability of success, may be 0.1. But as the day
progresses, Rita may get tired and the success rate may decrease.
B(n, p) – Example 2
A 1000-strong IT firm is concerned about a low retention rate for its employees. In recent years, management
has seen a turnover of 10% of the employees annually. HR takes a random sample of 5 employees and meets
each one separately to understand their concerns and also whether they are planning to leave.
1. # of trials, n, is 5; identical in the sense that each trial is an interview with an employee
The supplier claims that the defective rate is 1%. We test the consignment of 1000 items by sampling 10
items and classifying each as Defective or Not Defective
Note:
Since this is sampling without replacement, p changes as we sample.
If N >> n, p does not change by much, and the binomial distribution is a good approximation.
Rule of Thumb: n/N < 5%
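For the consignment example (N = 1000, n = 10, so n/N = 1%), the exact without-replacement probabilities can be compared with the binomial approximation. The exact probabilities come from the hypergeometric distribution. This sketch assumes the supplier's claim is true, so the consignment holds exactly 10 defectives.

```python
import math

N, n = 1000, 10
defectives = 10           # assumed: exactly 1% of 1000, as the supplier claims
p = defectives / N

def hypergeom_pmf(x):
    """Exact P(X = x) when sampling without replacement."""
    return math.comb(defectives, x) * math.comb(N - defectives, n - x) / math.comb(N, n)

def binom_pmf(x):
    """Binomial approximation: treats p as constant from draw to draw."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

print("n/N =", n / N)
for x in range(3):
    print(x, round(hypergeom_pmf(x), 4), round(binom_pmf(x), 4))
```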
B(n, p) – Probability Function, Mean and Variance
X ~ B(n, p)
p: Probability of Success
q: Probability of Failure
$f(x) = P(X = x) = \dbinom{n}{x} p^x q^{n-x} = \dfrac{n!}{x!\,(n - x)!}\, p^x q^{n-x}$
μ = n*p
σ² = npq
σ = √(npq)
B(3, 0.1) - Example
• P(X = x)
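The B(3, 0.1) table can be produced from the formula above. It gives P(X = 0, 1, 2, 3) = 0.729, 0.243, 0.027, 0.001. The code also confirms that the mean and variance computed from the table match np = 0.3 and npq = 0.27.

```python
import math

def binom_pmf(x, n, p):
    """f(x) = C(n, x) p^x q^(n - x)"""
    q = 1 - p
    return math.comb(n, x) * p ** x * q ** (n - x)

n, p = 3, 0.1
dist = {x: binom_pmf(x, n, p) for x in range(n + 1)}
mean = sum(x * f for x, f in dist.items())
var = sum((x - mean) ** 2 * f for x, f in dist.items())

for x, f in dist.items():
    print(x, round(f, 4))
print(round(mean, 4), round(n * p, 4))
print(round(var, 4), round(n * p * (1 - p), 4))
```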
exp(μ) - Properties
Suppose the rate at which cars cross the toll booth is 10 cars/h, and the arrival process can be described by
a Poisson Distribution. Write down the Poisson & Exponential distributions that describe the process.
Suppose calls on your cell phone follow an exponential distribution with the average time between calls
being 10 min. What are the Poisson & Exponential distributions that describe the process? (For the Poisson
distribution, take the time period to be 1 h.) Find the probability that there will be no calls in the next 1
hour.
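The two descriptions must agree on "no calls in the next hour". In Poisson terms, μ = 60/10 = 6 calls per hour and P(X = 0) = e⁻⁶. In exponential terms, f(x) = (1/10)e^(−x/10) with x in minutes, and P(T > 60) = e^(−60/10) = e⁻⁶, about 0.0025. The same reasoning for the toll booth gives Poisson μ = 10 cars per hour and an exponential mean gap of 6 minutes.

```python
import math

mean_gap_min = 10                      # average time between calls (exponential mean)
mu_per_hour = 60 / mean_gap_min        # Poisson mean for a 1-hour period: 6 calls

# Poisson view: probability of X = 0 calls in one hour
p_no_calls_poisson = mu_per_hour ** 0 * math.exp(-mu_per_hour) / math.factorial(0)

# Exponential view: probability that the next call is more than 60 minutes away
p_no_calls_expon = math.exp(-60 / mean_gap_min)

print(round(p_no_calls_poisson, 5), round(p_no_calls_expon, 5))  # 0.00248 0.00248
```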
N(μ, σ) - Properties
N(μ, σ) – Examples
Normal Probability Distribution
N(4, 2) – Example
N(4, 1) – Example
Session -4
Purpose of Sampling
• Estimation
What is μ / p / σ?
• Hypothesis Testing
• For Hypothesis Testing, we examine how “far” the statistic is from the hypothesized (standard) value
Sampling Methods
• Probabilistic Sampling
o Simple Random Sampling (SRS)
o Stratified Random Sampling
o Cluster Sampling
o Systematic Sampling
• Non-Probabilistic Sampling
o Convenience Sampling
o Judgment Sampling
Simple Random Sample (SRS)
HR received 900 applications for the advertised job. The applicants were numbered, from 1 to 900, as their
applications arrived. Director HR wanted a simple random sample of 30 applicants to understand the
profiles of the applicants.
A simple random sample of size n from a finite population of size N is a sample selected such that each
possible sample of size n has the same probability of being selected
Conducting an SRS
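For the HR example, an SRS of 30 from applicants numbered 1 to 900 can be drawn with `random.sample`, which makes every subset of size 30 equally likely. The seed is arbitrary and only makes the draw reproducible.

```python
import random

random.seed(417)  # arbitrary seed so the draw can be reproduced

applicants = range(1, 901)                       # applicants numbered 1..900
sample = sorted(random.sample(applicants, 30))   # each 30-subset equally likely
print(sample)
```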
Stratified Sampling
HR wants to introduce a yoga program in the organization. She wants to interview a sample of 40
employees to understand how it will be received and what employees are looking for.
If HR believes that the staff comprises Top, Middle and Lower Level Managers, stratified sampling is
required
(Formulas are available for combining the stratum sample data into one population parameter estimate)
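The notes do not say how the 40 interviews are split across levels. One common choice is proportional allocation, where each stratum's share of the sample equals its share of the staff. An SRS is then drawn within each stratum. The staff counts below are hypothetical. In general, rounding may need a manual adjustment so the counts add up to n.

```python
# Hypothetical staff counts per level (not given in the notes)
strata = {"Top": 50, "Middle": 250, "Lower": 700}
n_total = 40

N = sum(strata.values())
allocation = {level: round(n_total * size / N) for level, size in strata.items()}
print(allocation)   # {'Top': 2, 'Middle': 10, 'Lower': 28}
```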
Cluster Sampling
Once every quarter the large nation-wide fast-food chain conducts a quality check on its restaurants. 30
randomly selected restaurants are audited.
The population is composed of clusters, and each cluster is a representative of the population on a small
scale.
• All elements within each sampled (chosen) cluster form the sample OR a sample from each cluster
is chosen.
Systematic Sampling
Examples
This method has the properties of a simple random sample, especially if the list of the population elements
is a random ordering.
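A sketch of systematic sampling, reusing the 900-applicant list as a hypothetical population. With interval k = N/n, the method picks a random start among the first k elements and then takes every k-th element after it.

```python
import random

def systematic_sample(population, n, rng=random):
    """Pick a random start among the first k elements, then take every k-th one."""
    k = len(population) // n           # sampling interval
    start = rng.randrange(k)           # random start in 0..k-1
    return population[start::k][:n]

random.seed(7)
applicants = list(range(1, 901))
sample = systematic_sample(applicants, 30)
print(len(sample), sample[:3])
```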
Judgment Sampling
The person most knowledgeable on the subject of the study selects elements of the population that he or
she feels are most representative of the population.
Point Estimation
• The sample data is used to compute a value of a sample statistic that serves as an estimate of a
population parameter.
• X̄ is the point estimator of the population mean μ.
• s is the point estimator of the population standard deviation σ.
• s² is the point estimator of the population variance σ².
• p̄ is the point estimator of the population proportion p.
A sample of 5 weeks of call data was collected. Develop a point estimate for μ and σ. If success is getting
over 90 calls, estimate the percentage of successful weeks
• E(X̄) = μ
• E(p̄) = p
• E(s²) = σ²
• Interesting Point to Note
o E(s) ≠ σ!!
E(s) < σ
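The claim E(s²) = σ² but E(s) < σ can be checked by simulation. This sketch assumes a normal population with σ = 1 and samples of size 5. The average of s² should be close to 1, while the average of s comes out near 0.94.

```python
import random
import statistics

random.seed(1)
sigma, n, reps = 1.0, 5, 20_000

s2_values, s_values = [], []
for _ in range(reps):
    sample = [random.gauss(0.0, sigma) for _ in range(n)]
    m = sum(sample) / n
    s2 = sum((x - m) ** 2 for x in sample) / (n - 1)   # sample variance, n - 1 divisor
    s2_values.append(s2)
    s_values.append(s2 ** 0.5)                         # sample standard deviation

mean_s2 = statistics.fmean(s2_values)
mean_s = statistics.fmean(s_values)
print(round(mean_s2, 3), round(mean_s, 3))
```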
Variance of X̄ / p̄
Std Dev of X̄ = $\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}}$
Std Dev of p̄ = $\sigma_{\bar{p}} = \sqrt{\dfrac{pq}{n}}$
Unbiased Estimators
The expected value of the sample statistic is equal to the population parameter being estimated
E(X̄) = μ
E(p̄) = p
E(s²) = σ²
Sampling Distribution
• A sampling distribution is the distribution of a statistic that would be produced in repeated random
sampling from the same population.
• For example:
o We collect a sample, record the mean and then discard the sample
o Collect another sample, record the mean and then discard the sample. Do this again and again,
ad nauseam!
• We can then create the histogram and subsequently the distribution
Example 1
• The Central Statistics Office (CSO) estimates the “Per Capita Income” in 2016-17 will be Rs.100,000.
• We will take the following steps ad nauseam:
• Take a random sample of 1000 Indians
• Compute the Average Income
• Plot the value in a Dot Plot
Dot Plot of Sample Mean – 1st Sample
Sample Size: n
3. Compute ∑Xi
4. Compute X̄ = (∑Xi) / n
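The "ad nauseam" procedure can be simulated. This sketch uses a hypothetical right-skewed population: exponential incomes with mean Rs.100,000, for which σ also equals Rs.100,000. It uses n = 100 rather than 1000 to keep the run fast. Across many samples, the sample means centre on μ and their standard deviation is close to σ/√n = Rs.10,000.

```python
import random
import statistics

random.seed(2016)
mu = 100_000          # hypothetical population mean income (Rs.); sigma = mu here
n, reps = 100, 5_000  # sample size and number of repeated samples

sample_means = [
    statistics.fmean(random.expovariate(1 / mu) for _ in range(n))
    for _ in range(reps)
]

print(round(statistics.fmean(sample_means)))   # close to mu
print(round(statistics.stdev(sample_means)))   # close to sigma / sqrt(n)
```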
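We know:
$E(\bar{X}) = \mu$ & $\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}}$
$E(\bar{p}) = p$ & $\sigma_{\bar{p}} = \sqrt{\dfrac{pq}{n}}$
$X \sim N(\mu, \sigma) \Rightarrow \bar{X} \sim N\left(\mu, \dfrac{\sigma}{\sqrt{n}}\right)$
The sample mean X̄ is approximately normally distributed for large sample sizes
$\bar{X} \sim N\left(\mu, \dfrac{\sigma}{\sqrt{n}}\right)$
If np ≥ 5 and nq ≥ 5
$\bar{p} \sim N\left(p, \sqrt{\dfrac{pq}{n}}\right)$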
Takeaways
If n ≥ 30, $\bar{X} \sim N\left(\mu, \dfrac{\sigma}{\sqrt{n}}\right)$
If np ≥ 5 and nq ≥ 5, $\bar{p} \sim N\left(p, \sqrt{\dfrac{pq}{n}}\right)$
Sampling Distribution of X̄
$\bar{X} \sim N\left(\mu, \dfrac{\sigma}{\sqrt{n}}\right)$
The Central Bank conducted a survey of bank accounts of small farmers in a certain province and found the
average money in an account is Rs.1400 & σ = Rs.84. A sample of 36 accounts was taken. What is the
sampling distribution of the sample mean?
Sampling Distribution of 𝑿 Example
What is the probability that a simple random sample of 36 accounts will provide an estimate of the
population mean that is within +/-10 of the actual population mean μ ?
In other words, what is the probability that 𝑋 will be between 1390 and 1410?
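Here σ_X̄ = 84/√36 = 14, so X̄ ~ N(1400, 14). The required probability is P(1390 ≤ X̄ ≤ 1410), which comes to about 0.525. A minimal sketch using `statistics.NormalDist`:

```python
import math
from statistics import NormalDist

mu, sigma, n = 1400, 84, 36
se = sigma / math.sqrt(n)                   # standard error: 84 / 6 = 14
xbar_dist = NormalDist(mu, se)              # sampling distribution of the sample mean

prob = xbar_dist.cdf(1410) - xbar_dist.cdf(1390)
print(se, round(prob, 4))
```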
Sampling Distribution of p̄
$\bar{p} \sim N\left(p, \sqrt{\dfrac{pq}{n}}\right)$
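• We know:
• If n is large enough (n ≥ 30), $\bar{X} \sim N\left(\mu, \dfrac{\sigma}{\sqrt{n}}\right)$
• If X ~ N(μ, σ), $\bar{X} \sim N\left(\mu, \dfrac{\sigma}{\sqrt{n}}\right)$
• What happens if σ is unknown?
• If X is Normal, $\dfrac{\bar{X} - \mu}{s/\sqrt{n}}$ ~ t distribution with (n − 1) Degrees of Freedom
• Compute (n − 1)s²/σ²
χ² distribution: if X is Normal, (n − 1)s²/σ² ~ χ² distribution with (n − 1) Degrees of Freedom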
The F-Distribution
• Assume we repeatedly select a random sample of size n1 from one normal population and another random
sample of size n2 from another normal population.
• And each time, we compute s1²/s2².
• If we do this ad nauseam, we would arrive at the distribution of the ratio of two sample variances: F = s1²/s2².
• When the two population variances are equal, the distribution formed in this manner is an F distribution with the following degrees of
freedom:
o v1 = n1 - 1 and v2 = n2 - 1
Reading the F Tables
The Sampling Distribution of s1²/s2²
Lesson -5
• PollStar: 220/500 = 44% will not be the percentage of votes polled at election time
PollStar found that 220 of the 500 voters contacted favored their client.
That is, p̄ = 44%
1. The Statistic (X̄ or p̄)
3. The distribution of X̄ or p̄
But if the daily calibration has been proper, the population mean would be 12 liters.
At the start of production, a random sample of 36 bottles is selected and the sample mean is computed.
Sampling Distribution of 𝐗
1. μ = 12 liters
2. σ = 0.6 liters
3. n = 36
X̄ ~ N(12, 0.1), since σ/√n = 0.6/√36 = 0.1
Let’s Experiment
X̄ = 12.2
X̄ = 11.8
X̄ = 12.1
Experiment’s Conclusion
90% Confidence Interval (X̄ − 0.1645, X̄ + 0.1645), since z0.05 × σX̄ = 1.645 × 0.1 = 0.1645
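For each experimental sample mean, this sketch builds the 90% interval X̄ ± 1.645 × 0.1 and checks whether it covers μ = 12. For 12.2 and 11.8 the interval misses 12, and for 12.1 it covers it. With many samples, about 90% of such intervals would cover μ.

```python
from statistics import NormalDist

mu, se = 12.0, 0.1                           # X-bar ~ N(12, 0.1)
z = NormalDist().inv_cdf(0.95)               # about 1.645: 5% in each tail
margin = z * se                              # about 0.1645 litres

covers = []
for xbar in (12.2, 11.8, 12.1):
    lo, hi = xbar - margin, xbar + margin
    covers.append(lo <= mu <= hi)
    print(xbar, (round(lo, 4), round(hi, 4)), covers[-1])
```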
A confidence interval is a range of values that is likely to contain the (unknown) parameter.
If random samples are drawn repeatedly and for each sample the confidence interval is constructed, a certain
percentage of the confidence intervals will contain the population mean. This percentage is the confidence
level.
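The repeated-sampling interpretation can be simulated. This sketch assumes a normal population with the Ozone settings (μ = 12, σ = 0.6, n = 36). It builds a 95% interval from each of 2,000 samples and counts how often the interval contains μ. The share should be close to 0.95.

```python
import random
from statistics import NormalDist, fmean

random.seed(12)
mu, sigma, n, reps = 12.0, 0.6, 36, 2_000   # assumed Ozone-style settings
z = NormalDist().inv_cdf(0.975)             # 95% confidence level
margin = z * sigma / n ** 0.5

hits = 0
for _ in range(reps):
    xbar = fmean(random.gauss(mu, sigma) for _ in range(n))
    hits += (xbar - margin <= mu <= xbar + margin)

print(hits / reps)   # close to 0.95
```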
Recall X̄ & p̄ are the sample mean and sample proportion respectively.
X̄ & p̄ cannot be expected to provide the exact value of μ or p.
An interval estimate is computed using
o The Point Estimate, say X̄ or p̄
o The Sampling Distribution of X̄ or p̄
o The Standard error of X̄ or p̄: namely $\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}}$ or $\sigma_{\bar{p}} = \sqrt{\dfrac{pq}{n}}$
o The prescribed Confidence Level
An interval estimate can be computed by adding and subtracting a margin of error to the point estimate.
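$\bar{X} \pm z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}$, where $z_{\alpha/2}$ is the z value providing an area of α/2 in the upper tail of N(0, 1)
$\bar{X} \pm t_{\alpha/2}\dfrac{s}{\sqrt{n}}$, where $t_{\alpha/2}$ is the t value providing an area of α/2 in the upper tail of $t_{n-1}$
$\bar{p} \pm z_{\alpha/2}\sqrt{\dfrac{\bar{p}\bar{q}}{n}}$, where $z_{\alpha/2}$ is the z value providing an area of α/2 in the upper tail of N(0, 1)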
Margin of Error and the Interval Estimate III
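Given α:
Formulas
X̄ & σ is known (n ≥ 30 or X is Normal): $E = z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}} \Rightarrow n = \dfrac{(z_{\alpha/2})^2\sigma^2}{E^2}$
p̄ with np* & nq* ≥ 5: $E = z_{\alpha/2}\sqrt{\dfrac{p^*q^*}{n}} \Rightarrow n = \dfrac{(z_{\alpha/2})^2 p^* q^*}{E^2}$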
Ozone sells 12 liter bottles. 36 bottles were sampled and the mean volume of water was found to be
12.17. The population standard deviation is believed to be 0.6 liters. Compute the 95% Confidence Interval
of μ.
Sampling Distribution of X̄: $\dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$
4. Significance Level 5%
σ: 0.6; 95% Confidence Interval of μ: (11.974 liters, 12.366 liters). E = 1.96 × 0.6/√36 = 0.196 liters.
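The σ-known interval follows the recipe above. Take z_{α/2} from N(0, 1), form E = z_{α/2}σ/√n, and add and subtract E from X̄. A minimal sketch:

```python
import math
from statistics import NormalDist

xbar, sigma, n, conf = 12.17, 0.6, 36, 0.95
z = NormalDist().inv_cdf(1 - (1 - conf) / 2)    # z_{alpha/2}, about 1.96
E = z * sigma / math.sqrt(n)                     # margin of error
print(round(E, 3), (round(xbar - E, 3), round(xbar + E, 3)))
# 0.196 (11.974, 12.366)
```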
Ozone sells 12 liter bottles. 36 bottles were sampled and the mean volume of water was found to be
12.17. The sample standard deviation was 0.6 liters. Compute the 95% Confidence Interval of μ.
Sampling Distribution of X̄: $\dfrac{\bar{X} - \mu}{s/\sqrt{n}}$ ~ t distribution with 35 DoF
1. Significance Level 5%
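With s in place of σ, the multiplier comes from the t distribution with 35 degrees of freedom. Python's standard library has no t quantile function, so the critical value 2.030 below is read from a t table. If SciPy is available, `scipy.stats.t.ppf(0.975, 35)` gives the same value. The result is E of about 0.203 and an interval of roughly (11.967, 12.373).

```python
import math

xbar, s, n = 12.17, 0.6, 36
t_crit = 2.030   # t_{0.025} with 35 degrees of freedom, from a t table
E = t_crit * s / math.sqrt(n)
print(round(E, 3), (round(xbar - E, 3), round(xbar + E, 3)))
```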
In the current by poll, PollStar’s client wanted a 99% confidence interval for the proportion of voters that
support the client.
PollStar sampled 500 voters and found that 220 would vote for their client.
1. Point Estimate of p: 220/500 = 0.44
2. Estimate for Std Error of p̄: $\sqrt{\dfrac{0.44 \times 0.56}{500}} = 0.0222$
3. Significance Level 1%
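The 99% interval for the proportion follows the same pattern. The standard error is about 0.0222 and z_{0.005} is about 2.576. The margin of error is about 0.0572 (the 5.72% quoted below), and the interval is roughly (0.383, 0.497).

```python
import math
from statistics import NormalDist

successes, n, conf = 220, 500, 0.99
p_bar = successes / n                          # 0.44
se = math.sqrt(p_bar * (1 - p_bar) / n)        # estimated standard error
z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # about 2.576
E = z * se                                     # margin of error
print(round(se, 4), round(E, 4), (round(p_bar - E, 4), round(p_bar + E, 4)))
```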
PollStar’s client was unhappy since the margin of error was 5.72% while they wanted the sampling error to
be 3%.
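$\left(\bar{p} - z_{\alpha/2}\sqrt{\dfrac{\bar{p}\bar{q}}{n}},\ \bar{p} + z_{\alpha/2}\sqrt{\dfrac{\bar{p}\bar{q}}{n}}\right)$ or $\left(\bar{p} - 2.576\sqrt{\dfrac{0.44 \times 0.56}{n}},\ \bar{p} + 2.576\sqrt{\dfrac{0.44 \times 0.56}{n}}\right)$
Required Margin of Error is 0.03 ⇒ $2.576\sqrt{\dfrac{0.44 \times 0.56}{n}} = 0.03$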
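Solving the last equation for n gives n = 2.576² × 0.44 × 0.56 / 0.03². That is about 1816.7, so PollStar needs 1817 voters. The result is always rounded up so that the margin of error is at most 3%. This sketch uses the earlier sample proportion as the planning value p*.

```python
import math

z = 2.576                     # z_{0.005}, as used above for 99% confidence
p_star, q_star = 0.44, 0.56   # planning values from the earlier PollStar sample
E = 0.03                      # required margin of error

n_required = math.ceil(z ** 2 * p_star * q_star / E ** 2)   # always round up
print(n_required)             # 1817
```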
Lesson -6
2. QC Head’s demand that it should be neither more nor less than 12 litres
Hypothesis Testing
• The testing procedure samples the population to test the two competing statements H0 and Ha.
A new drug is developed to lower blood sugar more than the existing drug.
Alternative Hypothesis: The new drug lowers blood sugar more than the existing drug.
Null Hypothesis: The new drug does not lower blood sugar more than the existing drug.
The label on a coffee can states that it contains 500g. For the Consumer Forum
The label on a coffee can states that it contains 500g. For the Quality Inspector
The equality part of the hypotheses always appears in the null hypothesis.
H0 and Ha take one of the following three forms:
o H0: μ ≥ μ0 & Ha: μ < μ0, One-Tailed Test (Lower Tail or Left Tail)
o H0: μ ≤ μ0 & Ha: μ > μ0, One-Tailed Test (Upper Tail or Right Tail)
o H0: μ = μ0 & Ha: μ ≠ μ0, Two-Tailed Test
μ0 is the hypothesized value of the population mean
Type I & Type II Errors
The probability of making a Type I error when the null hypothesis is true as an equality is called the
level of significance.
In this course we will control only the Type I error.
Such tests are also called significance tests.
Three Approaches
EXAMPLE
The CFO is focusing on cost reduction. The CFO believes that excess water is being dispensed. If that is the
case, he wants to shut down production for a major overhaul of the machinery.
36 bottles were sampled and the mean volume of water was found to be 12.17. The population standard
deviation is believed to be 0.6 liters.
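The CFO's concern is an upper-tailed test, H0: μ ≤ 12 against Ha: μ > 12. The notes do not fix α for this example. This sketch assumes α = 5%, which matches the 12.1645 cut-off used later. The test statistic is z = (12.17 − 12)/0.1 = 1.7 and the p-value is about 0.0446, so H0 is rejected at 5%.

```python
import math
from statistics import NormalDist

mu0, xbar, sigma, n = 12.0, 12.17, 0.6, 36
alpha = 0.05   # assumed significance level

# H0: mu <= 12 vs Ha: mu > 12 (excess water) -> upper-tailed test
z = (xbar - mu0) / (sigma / math.sqrt(n))
p_value = 1 - NormalDist().cdf(z)          # area to the right of the test statistic
reject = p_value < alpha

print(round(z, 2), round(p_value, 4), "reject H0" if reject else "do not reject H0")
```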
Computer packages compute the p-value and we need to compare this with α
• For a Lower-Tailed Test, the p-value is the area to the left of the test statistic
• For an Upper-Tailed Test, the p-value is the area to the right of the test statistic
• For a Two-Tailed Test:
o The critical region comprises 2 tails and each tail has area α / 2
o If test statistic < 0, we need to compare the area to the left of the test stat & α / 2
o If test statistic > 0, we need to compare the area to the right of the test stat & α / 2
o Computer Packages compute the p-value as 2 * the area to the left / right of the test stat
Testing A Mean
σ Known
Sampling Distribution: $\dfrac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \sim N(0, 1)$
Test Statistic: $z = \dfrac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$
Decision Rule: Reject H0 if p-value < α Else Do Not Reject H0
σ Unknown & X is Normal
Sampling Distribution: $\dfrac{\bar{X} - \mu_0}{s/\sqrt{n}}$ ~ t distribution with (n-1) degrees of freedom
Test Statistic: $t = \dfrac{\bar{X} - \mu_0}{s/\sqrt{n}}$
Decision Rule: Reject H0 if p-value < α Else Do Not Reject H0
EXAMPLE 1
Example 2
After a series of high-profile TV Ads, has our market share increased from the previous level of 20%?
HR has successfully implemented some team initiatives. Has the satisfaction level gone up?
The politician has announced some populist measure. Has his popularity index gone up?
Suppose in fact μ > 12 liters
Event:
Sample Mean > 12.1645 ⇒ Stop production ⇒ Right Decision
Sample Mean ≤ 12.1645 ⇒ Continue production ⇒ Wrong Decision (Type II error)
Lesson -7
Compare
Productivity of two shifts
Customer satisfaction levels of two competing mobile service providers
Volume of liquid dispensed at the bottling plant before and after overhaul of machinery
Marital Happiness Levels of married couples.
Sugar Levels before and after insulin injection
Proportion of successful cold calls before and after Voice Modulation training
Defect Rate before and after the implementation of a Six Sigma project
Practical Considerations
Testing μ1 - μ2
Let
Four Cases
Testing p1 - p2
Let
• {X1} be the measurements for the 1st population with proportion of success p1
• {X2} be the measurements for the 2nd population with proportion of success p2
• TS Ltd does a form of Level 0 support for major consumer goods manufacturers.
TS operates two shifts, wherein customers call in toll-free, and TS operatives register the issues
along with the customer details.
• There are a few metrics that the clients want the call center to track
The monthly Average Call Handling Time (AHT) for each operator
1. The SLA for the monthly AHT across the team is 50 seconds
2. The SLA for the variance of the monthly AHT across the team is 9 seconds²
The Scenario – II
• TS management has received complaints regarding the quality of service, especially the curtness of the
operators
• Operators were re-trained on good telephone etiquette and on why the script must be followed
• Kumar also brought a sharp focus on basic metrics and published a weekly dashboard
The Data – I
Management informed Kumar that Shift 1 had better supervisors compared to Shift 2. He felt that this
would be reflected in the AHTs.
The tab “Known Variances” in AHT.xls has sample data of AHTs for each shift for the 1st month. The sample
statistics appear below.
Shift 1 Shift 2
The supervisors informed him that the standard deviations for Shift 1 and Shift 2 were 5.5 s and 6.5 s
respectively. Kumar felt it was reasonable to work with these estimates.
Kumar realized that both teams are well below the 1st SLA. But can we conclude at α = 1%, that Shift 1 is
doing better than Shift 2 for this SLA?
3. Right-Tail Test
4. As n1 & n2 ≥ 30, the sampling distribution is $\dfrac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \sim N(0, 1)$
7. Since the test statistic falls in the critical region, we reject H0.
There is sufficient statistical evidence to infer that the average AHT for the entire Shift 1 is greater than the
average AHT for the entire Shift 2.
Three months have passed since Kumar joined. He wants to know whether Shift 2 has caught up with Shift
1 with regard to AHT.
Kumar realizes that the earlier standard deviations may not be applicable. He also believes that because of
the standardization of processes that have been implemented, the variances may be equal.
The tab “Unknown Variances” in AHT.xls has sample data of AHTs for each shift for the 3rd month. The
sample statistics appear below.
                 Shift 1        Shift 2
n                17 operators   34 operators
X̄                41.8824 s      51.5882 s
s²               27.9853 s²     24.4314 s²
Test at 99% confidence level whether Shift 2 has caught up with Shift 1.
Kumar was surprised that Shift 2 was in fact doing better than Shift 1.
He wondered whether his assumption that the variances are equal may be wrong. He decided to test the
same hypothesis taking variances to be unknown but not equal.
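Both versions of the test use the month-3 summary statistics. The equal-variance version pools s1² and s2² and uses n1 + n2 − 2 = 49 degrees of freedom. The unequal-variance (Welch) version keeps separate standard errors and uses the Welch-Satterthwaite degrees of freedom, about 30.2, usually rounded down to 30. The test statistics come out at about −6.46 and −6.31. Both are far beyond the 1% critical values in a t table, which are all below 2.8 for these degrees of freedom, so the two versions reach the same conclusion.

```python
import math

# Month-3 summary statistics (AHT in seconds)
n1, xbar1, s1_sq = 17, 41.8824, 27.9853   # Shift 1
n2, xbar2, s2_sq = 34, 51.5882, 24.4314   # Shift 2

# Variances unknown but equal: pooled variance, df = n1 + n2 - 2
sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
t_pooled = (xbar1 - xbar2) / math.sqrt(sp_sq * (1 / n1 + 1 / n2))
df_pooled = n1 + n2 - 2

# Variances unknown and unequal (Welch): separate standard errors
a, b = s1_sq / n1, s2_sq / n2
t_welch = (xbar1 - xbar2) / math.sqrt(a + b)
df_welch = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))

print(round(sp_sq, 3), round(t_pooled, 3), df_pooled)
print(round(t_welch, 3), round(df_welch, 1))
```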
The tests have shown that Shift 2 has done exceedingly well. They have met the SLA of 50 seconds while
Shift 1 has regressed.
Kumar wanted to understand the efficacy of his interventions by identifying operators in Shift 2 who were
common in the samples from the 1st and 3rd months
The tab “Matched Samples Shift 2” in AHT.xls has the sample data that Kumar wants. The sample statistics
appear below.
Has Shift 2 significantly improved in the 3rd month? Test at the 1% significance level.
Kumar was still not convinced that Shift 2 had improved so much while Shift 1 had slipped. He decided to
test whether the proportion of competent operators was the same in the two shifts.
He defined the acceptable range of AHT as [50 – 3, 50 + 3], where 50 seconds was the target AHT and 3
seconds was the target standard deviation of the AHT.
He asked his EA to test whether the proportions were the same using the sample data for the 3rd month at
α = 1%, and submit the Summary Report.
The tab “Proportions” in AHT.xls has the data and the sample statistics appear below.
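                 Shift 1        Shift 2
n                17 operators   34 operators
# of Successes   2              20
p̄                0.1176         0.5882
Pooled estimate under H0: p̄ = (2 + 20)/(17 + 34) = 0.4314, q̄ = 1 − p̄
$\dfrac{\bar{p}_1 - \bar{p}_2}{\sqrt{\bar{p}\bar{q}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \dfrac{\bar{p}_1 - \bar{p}_2}{0.1471} \sim N(0, 1)$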
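A sketch of the pooled two-proportion z statistic using the counts above. The pooled p̄ is 22/51 ≈ 0.4314 and the standard error is ≈ 0.1471, which gives z ≈ −3.20. For this two-tailed test at α = 1%, |z| exceeds 2.576, so the equal-proportions hypothesis would be rejected.

```python
import math

x1, n1 = 2, 17    # Shift 1: operators with AHT in [47, 53]
x2, n2 = 20, 34   # Shift 2

p1, p2 = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)                 # pooled estimate under H0: p1 = p2
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se

print(round(p_pool, 4), round(se, 4), round(z, 2))
```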
Lesson -8
• The average is 1 liter as claimed by the bottling plant but what about the variance?
• The average strength in the production of 1mg Amaryl is 1 mg but what about the variance of drug
weight?
The project manager of Shift 2 was excited that Shift 2 had improved so much while Shift 1 had slipped. He
also claimed that they were meeting the second SLA – the variance was less than the target 9 s²
Summary Data
The tab “variances” in AHT.xls has the data and the sample statistics appear below.
Shift 1 Shift 2
n 17 Operators 34 Operators
Summary Report
3. Right-Tail Test
Sampling Distribution of s1²/s2²
For each pair of samples, s1²/s2² is computed
The distribution of s1²/s2² is an F distribution with v1 = n1 − 1 and v2 = n2 − 1 degrees of freedom (when σ1² = σ2²)
The Test Statistic will be F = s1²/s2²
1. H0: σ1² ≥ σ2² & Ha: σ1² < σ2², One-Tailed Test (Lower Tail or Left Tail)
2. H0: σ1² ≤ σ2² & Ha: σ1² > σ2², One-Tailed Test (Upper Tail or Right Tail)
• Label the populations so that the Right Tailed Test will prevail!
• If Ha: σ²Men < σ²Women, label Women as Population 1 and Men as Population 2 so that Ha: σ1² > σ2²
Example
Kumar was still not convinced that Shift 2 had improved so much while Shift 1 had slipped.
He wanted to test at the 10% significance level whether the variances of the two shifts were equal for the 3rd month
Note that this is a Two-Tailed Test
The population with the greater sample variance will be labelled 1
Summary Data
The tab “variances” in AHT.xls has the data and the sample statistics appear below.
     Shift 1        Shift 2
n    17 Operators   34 Operators
Summary Report
4. Sampling Distribution: $s_1^2 / s_2^2 \sim F(16, 33)$
5. Test Statistic: $s_1^2 / s_2^2 = 27.9853 / 24.4314 = 1.1454 \approx 1.15$
7. Conclusion: Since the Test Statistic does not fall in the critical region, do not reject H0.
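The variance-ratio statistic can be sketched in Python as follows. It assumes, from the F(16, 33) sampling distribution above, that the sample variance 27.9853 belongs to Shift 1 (n = 17) and 24.4314 to Shift 2 (n = 34); `variance_ratio_test` is a hypothetical helper name. The critical value still has to be read from an F table (or, if SciPy is installed, from `scipy.stats.f.ppf(1 - alpha / 2, df1, df2)` for the two-tailed case).

```python
def variance_ratio_test(var_a, n_a, var_b, n_b):
    """Label the sample with the larger variance as population 1.

    Returns (F statistic, numerator DoF, denominator DoF).
    """
    if var_a >= var_b:
        return var_a / var_b, n_a - 1, n_b - 1
    return var_b / var_a, n_b - 1, n_a - 1


# Shift 1 (n = 17) and Shift 2 (n = 34), 3rd month
f_stat, df1, df2 = variance_ratio_test(27.9853, 17, 24.4314, 34)
print(f"F = {f_stat:.4f} with ({df1}, {df2}) degrees of freedom")
```

Putting the larger variance in the numerator means only the upper-tail critical value is ever needed.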
Example
Test at α = 5% whether the variance of Shift 2 is less than the variance of Shift 1 in the 3rd month.
• Test Statistic: $s_1^2 / s_2^2$, with Shift 1 labelled as Population 1 so that $H_a: \sigma_1^2 > \sigma_2^2$
Summary Data
The tab “variances” in AHT.xls has the data and the sample statistics appear below.
     Shift 1        Shift 2
n    17 Operators   34 Operators
Summary Report
4. Sampling Distribution: $s_1^2 / s_2^2 \sim F(16, 33)$
5. Test Statistic: $s_1^2 / s_2^2 = 27.9853 / 24.4314 = 1.1454 \approx 1.15$
7. Conclusion: Since the Test Statistic does not fall in the critical region, do not reject H0.
There is no evidence that the variance of Shift 2 is less than the variance of Shift 1.
Lesson – 9
The χ² Test
• Test of Independence
• Test of Homogeneity
The Framework
• Categorical Data
Kumar’s, a large supermarket chain, wanted to minimize the time customers spent at check-out time – there
were separate counters for cash and plastic payments.
The fresh MBA graduate wondered whether customers chose the mode of payment based on the bill size.
Accordingly he grouped the billing amount into 3 categories:
< 500, 500 – 2000 and >2000.
He collected the data from the ERP and created the following contingency table.
Concern: “Whether customers chose the mode of payment based on the bill size”
This is equivalent to
Are the two variables, Bill Size and Mode of Payment, dependent?
Y: Bill Size: Less than Rs.500; Between Rs.500 and Rs.2000; Greater than Rs.2000
2. Select a random sample and record $o_{ij}$, the observed frequency, for each cell of the contingency table
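Once the observed frequencies are recorded, the expected frequency of each cell under H0 is (row total × column total) / n, and the statistic is χ² = Σ (o_ij − e_ij)² / e_ij with (r − 1)(c − 1) degrees of freedom. A minimal Python sketch; the 2 × 3 table is hypothetical (rows: Cash, Plastic; columns: the three bill-size groups), since the actual contingency table is not reproduced in these notes:

```python
def chi_square_independence(observed):
    """Chi-square statistic and DoF for an r x c contingency table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o_ij in enumerate(row):
            e_ij = row_totals[i] * col_totals[j] / n  # expected count under H0
            chi2 += (o_ij - e_ij) ** 2 / e_ij
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df


# Hypothetical counts: rows = Cash, Plastic; columns = <500, 500-2000, >2000
table = [[30, 20, 10],
         [20, 30, 40]]
chi2, df = chi_square_independence(table)
print(f"chi2 = {chi2:.3f}, df = {df}")
```

The same arithmetic gives the statistic for the test of homogeneity; only the sampling scheme and the wording of H0 change. The result is compared with the χ² table value (5.991 for α = 0.05 and 2 DoF).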
Example I
Kumar’s, a large supermarket chain, wanted to minimize the time customers spent at check-out
time – there were separate counters for cash and plastic payments.
The fresh MBA graduate wondered whether customers chose the mode of payment based on the
bill size. Accordingly he grouped the billing amount into 3 categories: < 500, 500 – 2000 and >2000.
He collected the data from the ERP and created the following contingency table.
Bill Size
The Report
Y: Bill Size: Less than Rs.500; Between Rs.500 and Rs.2000; Greater than Rs.2000
2. α = 0.05
Test of Homogeneity
Example 2
Bill Size
• $H_0: p_{11} = p_{21} = \dots = p_{r1}$
• …
• $H_0: p_{1c} = p_{2c} = \dots = p_{rc}$
• H0: p12 = p22 pertaining to Bills of amount between 500 & 2000
Versus
Kumar’s, a large supermarket chain, wanted to minimize the time customers spent at check-out time –
there were separate counters for cash and plastic payments.
The fresh MBA graduate wondered whether customers chose the mode of payment based on the bill size.
Accordingly he grouped the billing amount into 3 categories: < 500, 500 – 2000 and >2000.
The ERP was down. So he collected the following data from each check-out counter
Bill Size
The Report
Y: Bill Size: Less than Rs.500; Between Rs.500 and Rs.2000; Greater than Rs.2000
1. H0: p11 = p21 (Bills < 500), H0: p12 = p22 (Bills between 500 & 2000) & H0: p13 = p23 (Bills > 2000) Versus Ha: At
least one H0 is false
2. α = 0.05
The three soft-drink majors have market shares of 25%, 35% and 40%
One major went on an ad blitz
Have the market shares changed?
The fresh MBA graduate at Kumar’s noted that H0 was rejected in the previous tests. An article in a
business journal discussed cash payment versus plastic payment in a general scenario. From the report,
the graduate inferred that 15% of payments made by plastic were for bills less than 500, 25% were for bills
between 500 and 2000 and 60% were for bills over 2000.
He wondered whether that was the case at Kumar’s. To test this hypothesis, he collected the following
data.
Bill Size   < 500   500 – 2000   > 2000
Plastic     100     200          500
Test the hypothesis at the 5% significance level
Let p1 be the proportion of payments made by plastic for bills < 500
Let p2 be the proportion of payments made by plastic for bills between 500 & 2000
Let p3 be the proportion of payments made by plastic for bills > 2000
H0: p1 = 0.15, p2 = 0.25, p3 = 0.60
Ha: At least one equality is false
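A minimal Python sketch of the goodness-of-fit computation for this example, using the observed counts above and the hypothesized proportions 0.15, 0.25 and 0.60. The critical value 5.991 is the χ² table value for α = 0.05 with k − 1 = 2 degrees of freedom; the helper name is hypothetical:

```python
def chi_square_gof(observed, hypothesized_props):
    """Chi-square goodness-of-fit statistic, DoF and expected counts."""
    n = sum(observed)
    expected = [n * p for p in hypothesized_props]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, len(observed) - 1, expected


observed = [100, 200, 500]  # plastic payments: <500, 500-2000, >2000
chi2, df, expected = chi_square_gof(observed, [0.15, 0.25, 0.60])
CRITICAL_5PCT_DF2 = 5.991   # chi-square table, alpha = 0.05, df = 2

print(f"expected = {expected}, chi2 = {chi2:.3f}, df = {df}")
print("reject H0" if chi2 > CRITICAL_5PCT_DF2 else "do not reject H0")
```

With n = 800 the expected counts are 120, 200 and 480, giving χ² ≈ 4.17 < 5.991, so on these figures H0 would not be rejected at the 5% level.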
The Report
The Consumer Forum wants to test the effect of 4 fuel additives on mileage
Does an MBA specialization have any effect on the starting salary?
The effect of 5 diets on liver cholesterol
The Call center needs to evaluate 3 different training methods
Online Retailer: Do sharper visuals of products lead to higher sales?
Catalogue Retailers: Which call to action leads to higher sales?
Statistical Studies
Dependent variable
– It is a continuous variable
Test units
– The control group consists of the test units that do not receive any treatment; it acts as a benchmark
Statistical Designs
The Hypotheses
H0: μ1 = μ2 = μ3 = … . = μk
Rejecting H0 means that at least two population means have different values.
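$X_i \sim N(\mu_i, \sigma)$, i = 1, 2, 3, …, k
Three Estimators of σ²
We have 3 populations, $X_1, X_2, X_3 \sim N(\mu, \sigma)$ under H0, & 3 samples, one from each population
1. Combine all 3 samples and compute the sample variance $s^2 = \dfrac{\sum_{j}\sum_{i}(x_{ij} - \bar{\bar{x}})^2}{n_T - 1}$
2. (Between-Treatments Estimate) $MSTR = \dfrac{SSTR}{k-1} = \dfrac{\sum_{j} n_j(\bar{x}_j - \bar{\bar{x}})^2}{k-1}$
3. (Within-Treatments Estimate) $MSE = \dfrac{SSE}{n_T - k} = \dfrac{\sum_{j}\sum_{i}(x_{ij} - \bar{x}_j)^2}{n_T - k}$
• If H0 is true, all three are unbiased estimators of σ². If H0 is false, only the third (MSE) remains unbiased; the second (MSTR) overestimates σ², and so does the first, because it includes the between-treatments variation.
∴ The test statistic is the ratio of the 2nd to the 3rd, F = MSTR / MSE!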
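Mean Square due to Treatments (MSTR)
MSTR (Mean Square due to Treatments) denotes the weighted variance of the sample means:
$MSTR = \dfrac{\sum_{j=1}^{k} n_j(\bar{x}_j - \bar{\bar{x}})^2}{k-1}$
• Under H0, $SSTR/\sigma^2 = (k-1)\,MSTR/\sigma^2$ follows a χ² distribution with (k − 1) DoF
MSE (Mean Square Error) denotes the weighted mean of the sample variances:
$MSE = \dfrac{\sum_{j=1}^{k}\sum_{i=1}^{n_j}(x_{ij} - \bar{x}_j)^2}{n_T - k} = \dfrac{\sum_{j=1}^{k}(n_j - 1)s_j^2}{n_T - k}$
• $SSE/\sigma^2 = (n_T - k)\,MSE/\sigma^2$ follows a χ² distribution with (nT − k) DoF
$MSTR = \dfrac{SSTR}{k-1}$ & $MSE = \dfrac{SSE}{n_T - k}$
Under H0, SSTR/σ² and SSE/σ² are independent χ² variables with (k − 1) and (nT − k) DoF respectively, so $F = MSTR / MSE \sim F(k-1,\, n_T - k)$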
Computations Required
F = MSTR / MSE
$MSTR = \dfrac{SSTR}{k-1}$ & $MSE = \dfrac{SSE}{n_T - k}$
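The computations above can be sketched in Python. This is a minimal one-way ANOVA under the stated assumptions (normal populations with a common σ); the three groups in the usage example are hypothetical, not the Shift data, and the F critical value must still come from an F(k − 1, nT − k) table:

```python
def one_way_anova(groups):
    """Return (MSTR, MSE, F, (df_between, df_within)) for k samples."""
    k = len(groups)
    n_t = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_t
    means = [sum(g) / len(g) for g in groups]
    # SSTR: weighted squared deviations of the sample means from the grand mean
    sstr = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # SSE: squared deviations of each observation from its own sample mean
    sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    mstr = sstr / (k - 1)
    mse = sse / (n_t - k)
    return mstr, mse, mstr / mse, (k - 1, n_t - k)


# Hypothetical samples from k = 3 treatments
mstr, mse, f_stat, dof = one_way_anova([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(f"MSTR = {mstr}, MSE = {mse}, F = {f_stat} with DoF {dof}")
```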
Preliminary Remarks
Factor: Shift
Test Statistic = MSTR / MSE = 1.76
Since the Test Statistic does not fall in the critical region, do not reject H0
Lesson – 11
Linear Regression
Scatter Plots
y = β0 + β1x + ε,
Or E(y) = β0 + β1x
o where:
o β0 and β1 are the parameters of the model,
o ε is a random variable called the error term
o The model represents a linear relationship.
o β0 is the y intercept of the regression line.
o β1 is the slope of the regression line.
o E(y) is the expected value of y for a given x value
ŷ = b0 + b1x
where ŷ is the estimated value of y, and b0 and b1 are the sample estimates of β0 and β1
Kumar’s Clothing Emporium periodically has a special week-end sale. As part of the advertising campaign
Kumar runs one or more TV commercials on Friday preceding the sale. Data from a sample of 5 previous
sales are shown below.
Correlation Coefficient
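$r = \dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2}\,\sqrt{\sum (y_i - \bar{y})^2}} = \dfrac{s_{xy}}{s_x s_y}$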
Scatter Plots
Coefficient of Determination
R² - Coefficient of Determination
$SST = \sum (y_i - \bar{y})^2$
$SSR = \sum (\hat{y}_i - \bar{y})^2$
$SSE = \sum (y_i - \hat{y}_i)^2$
where SST = SSR + SSE, and
$R^2 = \dfrac{SSR}{SST}$
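A minimal Python sketch of the least-squares fit and the sum-of-squares decomposition, assuming the usual formulas b1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)², b0 = ȳ − b1x̄ and s² = SSE / (n − 2) as the estimate of σ². The five (x, y) pairs are hypothetical, not the Kumar's Clothing data:

```python
def fit_and_decompose(x, y):
    """Least-squares line plus the SST = SSR + SSE decomposition."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * xi for xi in x]
    sst = sum((yi - y_bar) ** 2 for yi in y)
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    return {"b0": b0, "b1": b1, "SST": sst, "SSR": ssr, "SSE": sse,
            "R2": ssr / sst, "s2": sse / (n - 2)}  # s2 estimates sigma^2


result = fit_and_decompose([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])  # hypothetical data
print(result)
```

For these values SST = 6.0 = SSR + SSE = 3.6 + 2.4, so R² = 0.6.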
R² - Coefficient of Determination
Computing R²: Example
Interpreting R² & r
Model Assumptions I
y = β0 + β1x + ε
Model Assumptions II
y = β0 + β1x + ε
2. The variance of ε, denoted by σ², is the same for all values of the independent variable.
• X is a deterministic variable
Model Assumptions IV
y = β0 + β1x + ε
Estimate for σ²: $s^2 = MSE = \dfrac{SSE}{n - 2}$
Understanding the Summary Output
Lesson – 12
Three types of coffee beans are required: Bean 1, Bean 2 & Bean 3
He makes a profit of
o Objective Function
o Maximizing Profit or Minimizing Cost
o The objective function must be expressed as an algebraic expression
o This expression must be linear
o Decision Variables
o Variables whose values can be controlled by the manager
o # of units to produce
o Constraints
o Resource availability
o Company Policy
o Each constraint must be expressed as an algebraic expression
o This expression must be linear
Towards a Solution
o A feasible solution: A set of values of the decision variables that satisfies all the constraints.
o The Feasible Region: The set of all feasible solutions
o An optimal solution: A feasible solution that gives the optimal value of the objective function (largest objective
function value when maximizing or smallest when minimizing).
o Special Cases
o An LPP may have
Exactly one solution
Infinite number of solutions
No solution
An unbounded solution
o Problem Modeling is the process of translating a verbal description of the problem into a
mathematical statement.
o The Process
o Understand the problem thoroughly
o Describe the objective
o Describe each constraint
o Get a Sign off on the Problem Statement from the management
o Define the decision variables
o Express the objective function algebraically in terms of the decision variables
o Express the constraints algebraically in terms of the decision variables
o Finally use a computer package to solve the problem
Introduction to Linear Programming
Decision Variables
Objective Function
Constraints
The objective function and the constraints must be expressed as linear functions of the decision variables
The neighbourhood shop produces two blends of coffee: Nilgiri AA & Nilgiri A. Three types of coffee beans
are required.
Standard Form
– Bounded
– Unbounded
Even if the feasible region is unbounded, the LPP may have an optimal solution
– An unbounded solution
Infeasibility
Unbounded Solution
• Unique solution
• Alternative solutions
Lesson – 14
o Post-Optimality Analysis measures the robustness of the solution – how sensitive the solution is to
changes in the input data
o This is important
o When the business environment is subject to change
o Because it is difficult to get precise data
o Questions we will examine are:
o If C is the objective coefficient for a decision variable, what is the Range of Optimality for C
over which the optimal solution does not change (although the optimal value of the
objective function may change).
o If the right-hand side of a constraint changes within the Range of Feasibility, how much will
the objective function change by (Shadow Price)
o If a decision variable is not in the solution (i.e. its value is 0), what is the change (Reduced
Cost) to be made to its coefficient in the objective function so that it enters the solution
o Excel Solver provides the relevant information
Range of Optimality
The Range of Optimality for an Objective Function coefficient is the range of values this coefficient can
assume without changing the current solution.
A narrow Range of Optimality for a decision variable should be a cause for concern, especially if the
objective coefficient is near the endpoints of the range.
o The Shadow Price of a Resource (Constraint) is the change in the optimal objective value for a unit
change in the Resource (Constraint)
o As the RHS increases / decreases, other constraints will become binding and limit the change in the
value of the objective function.
o Thus the Shadow Price is effective only in the range where the current binding constraints remain
binding and the non-binding constraints remain non-binding – The Range of Feasibility
o Note that there is a positive effect if the feasible space increases and a negative effect otherwise
Quartiles Definition
Quartiles are a set of three values that divide a data set into four parts such that each part has an equal number of data
values.
Method 1
Step 1: First, arrange the data set in ascending order and calculate its median to divide the data set into two halves.
Step 2: Make a data set of each half and do not consider the median in either of the two parts.
Step 3: Now calculate the median of each lower and upper half of the sets. The median of the lower half of the set is called the
first quartile, and the median of the upper half of the set is called the third quartile.
Example: Consider the production of whole milk powder from 2008 to 2018 in the UK in millions.
Method 2
Step 1: First, arrange the data set in ascending order and calculate its median to divide the data set into two halves.
Step 2: Make a data set of each half and include the median in both parts.
Step 3: Now, calculate the median of each half. The median of the lower half is called the first quartile, and that of the upper half is
called the third quartile.
Example: Consider the production of whole milk powder from 2008 to 2018 in the UK in millions.
The data set has 11 values. Hence, the median is calculated as follows
IQR = Q3 – Q1
Outlier limits: Q1 – 1.5 × IQR and Q3 + 1.5 × IQR
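Both quartile methods can be sketched in Python with the standard `statistics.median` function. The data 1, 2, …, 11 are hypothetical (the milk powder figures are not reproduced here); for an even number of values the two methods coincide, because there is no single middle value to include or exclude:

```python
from statistics import median


def quartiles(data, include_median):
    """Return (Q1, median, Q3) by splitting the sorted data into halves.

    include_median=False is Method 1; include_median=True is Method 2.
    """
    values = sorted(data)
    half = len(values) // 2
    if len(values) % 2 == 0:      # even n: no single middle value, methods agree
        lower, upper = values[:half], values[half:]
    elif include_median:          # Method 2: the median goes into both halves
        lower, upper = values[:half + 1], values[half:]
    else:                         # Method 1: the median is left out of both halves
        lower, upper = values[:half], values[half + 1:]
    return median(lower), median(values), median(upper)


data = range(1, 12)  # hypothetical 11 values: 1, 2, ..., 11
print("Method 1:", quartiles(data, include_median=False))
print("Method 2:", quartiles(data, include_median=True))
```

Method 1 gives Q1 = 3 and Q3 = 9 here, while Method 2 gives Q1 = 3.5 and Q3 = 8.5, so the IQR (and the outlier limits) can differ slightly between methods.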
Chebyshev’s Theorem
At least (1 - 1/z2) of the items in any data set will be within z standard deviations of the mean, where z is any value greater
than 1.
Chebyshev’s theorem requires z > 1, but z need not be an integer.
• At least 75% of the data values must be within 2 standard deviations from the mean
• At least 89% of the data values must be within 3 standard deviations from the mean
• At least 94% of the data values must be within 4 standard deviations from the mean
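Chebyshev’s bound is simple enough to tabulate directly. A minimal Python sketch; the function name is illustrative, and z = 1.5 is included to show that non-integer values of z are allowed:

```python
def chebyshev_min_fraction(z):
    """Minimum fraction of any data set within z standard deviations of the mean."""
    if z <= 1:
        raise ValueError("Chebyshev's theorem requires z > 1")
    return 1 - 1 / z ** 2


for z in (1.5, 2, 3, 4):
    print(f"z = {z}: at least {chebyshev_min_fraction(z):.1%}")
```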
Descriptive Statistics
Describing Central Tendency
• In addition to describing the shape of a distribution, we want to describe
the data set’s central tendency
• A measure of central tendency represents the center or middle of the data
What is Average?
• If 'n' observations need to be replaced by a single value that represents them, the average is the
most suitable number.
• It is a number around which all the observations lie.
The Mean
Geometric Mean
Harmonic Mean
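The three means can be compared with Python’s standard `statistics` module (Python 3.8 or later for `fmean` and `geometric_mean`). The data are hypothetical; for positive values the arithmetic mean is always at least the geometric mean, which is at least the harmonic mean:

```python
from statistics import fmean, geometric_mean, harmonic_mean

data = [1, 2, 4]  # hypothetical positive observations

am = fmean(data)           # (1 + 2 + 4) / 3
gm = geometric_mean(data)  # (1 * 2 * 4) ** (1 / 3)
hm = harmonic_mean(data)   # 3 / (1/1 + 1/2 + 1/4)

print(f"AM = {am:.4f}, GM = {gm:.4f}, HM = {hm:.4f}")
```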
Dispersion
Quartile Deviation
Inter Quartile Range = (Q3 – Q1)
Quartile deviation = (Q3 – Q1)/2.
It is also known as the semi-interquartile range.
Comprehensive Examination
(EC-3 Regular)
Q.1 Set. (A) Explain and Compare- a) Covariance and Correlation, b) Normal Distribution and Sampling
Distribution, and c) One-tail and Two-tail hypothesis tests. Do the comparison in a table with
columns and rows, that is- side-by-side comparison.
[9]
[Common instructions for all questions- Upload only hand-written material; only hand-written material
will be evaluated. 2. Do not type the answer in the space provided below the question in the exam portal.
3. Do not attach any screenshot or file of EXCEL/PDF/PPT/any software].
a)
Covariance:-
Covariance is a statistical measure that shows whether two variables are related by measuring
how the variables change in relation to each other. This is clear when you break down the word.
Co- as a prefix often indicates some sort of joint action (like co-workers, co-owners, coordinate)
and variance refers to variation or change. So, covariance measures how two things change
together. It tells you if there is a relationship between two things and which direction that
relationship is in.
Correlation:-
Correlation, like covariance, is a measure of how two variables change in relation to each other,
but it goes one step further than covariance in that correlation tells how strong the relationship is.
Both covariance and correlation measure the relationship and the dependency between
two variables.
Covariance indicates the direction of the linear relationship between variables.
Correlation measures both the strength and direction of the linear relationship between
two variables.
Correlation values are standardized.
Covariance values are not standardized
b).
Normal distribution:-
Normal distribution, also known as the Gaussian distribution, is a probability distribution that is
symmetric about the mean, showing that data near the mean are more frequent in occurrence
than data far from the mean. In graph form, normal distribution will appear as a bell curve.
sampling distribution:-
If the population is normally distributed, the sampling distribution of the sample mean will be normal. If the
population is not normally distributed, the sampling distribution of the sample mean will be approximately
normal when the samples taken are large (by the Central Limit Theorem).
c).
A one-tailed test results from an alternative hypothesis which specifies a direction. i.e. when the
alternative hypothesis states that the parameter is in fact either bigger or smaller than the value
specified in the null hypothesis.
A two-tailed hypothesis test is designed to show whether the sample mean is significantly greater than or
significantly less than the mean of a population. The two-tailed test gets its name from testing the area
under both tails (sides) of a normal distribution.
In other words, if the alternative hypothesis is non-directional, the test of the null hypothesis is two-tailed,
and the critical region lies in both tails.
...
Comparison Chart.
Basis of Comparison              One-tailed Test   Two-tailed Test
Sign in alternative hypothesis   > or <            ≠
Q.2 Set. (A) ThirdEyeCare NGO has provided spectacles on a no-profit-no-loss basis to 60 workers who are
involved in precision jobs. A summary of the number of spectacles provided to the workers is
given in the table below
Profession Frequency
Jewellery making 14
Embroidery 8
Wood carving 10
Miniature painting 9
Stone carving 11
Watch restoration 8
Is there evidence that the NGO was fair in its distribution of spectacles? Use alpha=0.05. Assume the
proportions of workers in the 6 professions are equal in the population. (Do this problem using formulas (no
Excel or any other software utilities). Clearly write the hypothesis, all formulas, all steps, and all
calculations. Underline the final result).
[6]
Step 1: H0: The spectacles were distributed fairly, i.e. p1 = p2 = … = p6 = 1/6 for the 6 professions.
Ha: At least one profession’s proportion differs from 1/6.
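A worked sketch of the remaining steps, assuming the χ² goodness-of-fit test with equal expected counts under H0 and the table value χ²(0.05, 5) = 11.070:

```latex
\begin{aligned}
e_i &= 60 \times \tfrac{1}{6} = 10 \quad \text{for each of the 6 professions} \\
\chi^2 &= \sum_{i=1}^{6} \frac{(o_i - e_i)^2}{e_i}
       = \frac{(14-10)^2 + (8-10)^2 + (10-10)^2 + (9-10)^2 + (11-10)^2 + (8-10)^2}{10} \\
       &= \frac{16 + 4 + 0 + 1 + 1 + 4}{10} = 2.6 \\
\text{DoF} &= 6 - 1 = 5, \qquad \chi^2_{0.05,\,5} = 11.070
\end{aligned}
```

Since 2.6 < 11.070, H0 is not rejected: the data give no evidence that the distribution was unfair.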
Q.3 Set. (A) StartUp Storage Co. has launched a new model of mobile battery in the market. Its
advertisement claims that the average life of the new model is 600 minutes under standard
operating conditions.
StartUp’s new model performance has surprised the mobile battery industry. The R&D
department of MoreLife, the largest manufacturer of mobile phone batteries, purchased 10
batteries manufactured by StartUp and tested them in its lab under standard operating
conditions. The results of the tests are given below-
Life (minutes)
630
620
650
620
600
590
640
590
580
630
Count= 10
Sum= 6150
Sample variance= 561.11
Test the claim made by StartUp’s advertisement. Use alpha =0.05. (Do this problem using formulas (no
Excel or any other software utilities). Clearly write the hypothesis, all formulas, all steps, and all
calculations. Underline the final result on the answer sheet).
[7]
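A minimal Python check of the one-sample t test, assuming a two-tailed test of H0: μ = 600 versus Ha: μ ≠ 600 and the t-table value t(0.025, 9) = 2.262:

```python
import math
from statistics import mean, stdev

lives = [630, 620, 650, 620, 600, 590, 640, 590, 580, 630]  # minutes
MU_0 = 600      # advertised mean life
T_CRIT = 2.262  # t table: alpha = 0.05 two-tailed, df = n - 1 = 9

n = len(lives)
x_bar = mean(lives)  # 6150 / 10 = 615
s = stdev(lives)     # sample standard deviation, sqrt(561.11)
t_stat = (x_bar - MU_0) / (s / math.sqrt(n))

print(f"x_bar = {x_bar}, s = {s:.3f}, t = {t_stat:.3f}")
print("reject H0" if abs(t_stat) > T_CRIT else "do not reject H0")
```

Here t ≈ 2.00, which lies inside ±2.262, so under these assumptions the advertised mean of 600 minutes is not rejected at α = 0.05.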
Q.4 Set. (A) WeTrainWell Consultants has imparted training to 50 randomly selected production workers
of a packaging material manufacturer. Before proceeding to train the remaining 950 workers,
the manufacturer would like to know whether the training by WeTrainWell changes productivity.
The productivity of 6 randomly selected workers before they underwent training and another 6
workers who underwent training is given in the table below-
Before After
40 50
35 40
35 55
45 50
40 35
45 70
Sum 240 300
Sample Stdev 4.47 12.25
Should the manufacturer ask WeTrainWell to train the remaining 950 workers? Use
alpha=0.05. Assume equal variance. (Do this problem using formulas (no Excel or any other
software utilities). Clearly write the hypothesis, all formulas, all steps, and all calculations.
Underline the final result).
[7]
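Because the before and after groups are different workers, a pooled (equal-variance) two-sample t test applies. A minimal Python sketch, assuming a two-tailed test of H0: μ_after = μ_before and the t-table value t(0.025, 10) = 2.228:

```python
import math
from statistics import mean, variance

before = [40, 35, 35, 45, 40, 45]  # productivity, workers not yet trained
after = [50, 40, 55, 50, 35, 70]   # productivity, a different set of trained workers
T_CRIT = 2.228                     # t table: alpha = 0.05 two-tailed, df = 6 + 6 - 2 = 10

n1, n2 = len(before), len(after)
# pooled variance under the equal-variance assumption
sp2 = ((n1 - 1) * variance(before) + (n2 - 1) * variance(after)) / (n1 + n2 - 2)
se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
t_stat = (mean(after) - mean(before)) / se

print(f"pooled variance = {sp2}, t = {t_stat:.3f}")
print("reject H0" if abs(t_stat) > T_CRIT else "do not reject H0")
```

The pooled variance is 85 and t ≈ 1.88 < 2.228, so under these assumptions the sample does not show a significant change in productivity at α = 0.05.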
Q.5 Set. (A) SpendMore, a credit card company would like to know whether there is a relationship between
the age of the customers and their spending. The results of 5 randomly selected customers are
given in the table below- [8]
Age Spending
2 3
3 5
4 6
5 6
8 7
(a) What is the Covariance and Coefficient of Correlation between Age and Spending?
(b) What is the Covariance and Coefficient of Correlation between Spending and Age?
(c) What is the Slope and the Intercept of Simple Linear Regression equation considering Age as
an independent variable (X).
(d) Draw a neat (approximate is ok) scatter chart with the regression line on it.
(Do this problem using formulas--- no Excel function/utility, no utility of any other software. Clearly
write all formulas, all steps, and all calculations. Underline the final result).
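A minimal Python sketch of parts (a) to (c), assuming the sample covariance (divisor n − 1); the population covariance would use n instead and give 12.2 / 5 = 2.44. Covariance and correlation are symmetric in their two arguments, which answers part (b):

```python
import math

age = [2, 3, 4, 5, 8]
spending = [3, 5, 6, 6, 7]


def sample_cov(x, y):
    """Sample covariance with divisor n - 1."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)


cov_age_spend = sample_cov(age, spending)  # part (a)
cov_spend_age = sample_cov(spending, age)  # part (b): same value
r = cov_age_spend / math.sqrt(sample_cov(age, age) * sample_cov(spending, spending))

# part (c): least-squares line with Age as X
b1 = cov_age_spend / sample_cov(age, age)
b0 = sum(spending) / len(spending) - b1 * sum(age) / len(age)

print(f"cov = {cov_age_spend:.2f}, r = {r:.4f}, y_hat = {b0:.4f} + {b1:.4f} x")
```

This gives a covariance of 3.05, r ≈ 0.874, slope b1 ≈ 0.5755 and intercept b0 ≈ 2.8679.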
Q.6 Set. (A) OnlyForMen Garments Co. produces three designs of men’s shirts- Fancy, Office, and Casual.
The material required to produce a Fancy shirt is 2m, an Office shirt is 2.5m, and a Casual shirt
is 1.25m. The manpower required to produce a Fancy shirt is 3 hours, an Office shirt is 2 hours,
and a Casual shirt is 1 hour.
In the meeting held for planning production quantities for the next month, the production manager
informed that a maximum of 3000 hours of manpower will be available, and the purchase manager
informed that a maximum of 5000 m of material will be available. The marketing department
reminded that a minimum of 900 nos. of Office shirts and a minimum of 500 nos. of Casual shirts
must be produced to meet prior commitments, and the demand for Fancy shirts will not exceed 1200
shirts and that of Casual shirts will not exceed 600 shirts. The marketing manager also informed that
the selling prices will remain same in the next month- Rs 1,500 for a Fancy shirt, Rs 1,200 for an
Office shirt and Rs 800 for a Casual shirt.
Write a set of linear programming equations to determine the number of Fancy, Office, and Casual
shirts to be produced with an aim to maximize revenue. [8]
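One possible formulation, with decision variables F, O and C introduced here for the numbers of Fancy, Office and Casual shirts produced next month (the variable names are an assumption):

```latex
\begin{aligned}
\text{Maximize } & Z = 1500F + 1200O + 800C && \text{(revenue, Rs)} \\
\text{subject to } & 2F + 2.5O + 1.25C \le 5000 && \text{(material, m)} \\
& 3F + 2O + C \le 3000 && \text{(manpower, hours)} \\
& O \ge 900, \quad C \ge 500 && \text{(prior commitments)} \\
& F \le 1200, \quad C \le 600 && \text{(demand limits)} \\
& F,\ O,\ C \ge 0
\end{aligned}
```

Every term is linear in F, O and C, so this is a valid LP that can then be solved with a package such as Excel Solver.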
1. Explain important similarities and differences between the following, give an example and make charts
a. Normal distribution and t distribution
b. Box plot and histogram
c.
The standard normal (z) distribution is used when the population standard deviation is known, or when the
sample size is large. If the population standard deviation is unknown and the sample size is small,
Student’s t distribution is used instead; it has heavier tails because estimating σ by s adds variability.
The t distribution depends on its degrees of freedom (n − 1), whereas the standard normal distribution has
no degrees-of-freedom parameter.
2.
Q) The average time spent by a guest at QM resort is 12 days with a variance of 4 days². What is the probability
that a guest selected at random will stay less than 8 days, between 7 and 10 days, and more than 13
days?
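A minimal Python sketch using `statistics.NormalDist` (Python 3.8+). It assumes the length of stay is normally distributed, which the question implies but does not state, and converts the variance of 4 to σ = 2:

```python
from statistics import NormalDist

stay = NormalDist(mu=12, sigma=2)  # variance 4 -> sigma = 2 (normality assumed)

p_less_8 = stay.cdf(8)                  # P(X < 8) = P(Z < -2)
p_7_to_10 = stay.cdf(10) - stay.cdf(7)  # P(7 < X < 10) = P(-2.5 < Z < -1)
p_more_13 = 1 - stay.cdf(13)            # P(X > 13) = P(Z > 0.5)

print(f"{p_less_8:.4f}, {p_7_to_10:.4f}, {p_more_13:.4f}")
```

These come to roughly 0.0228, 0.1524 and 0.3085.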
Q. The probability that a candidate gets selected for Commando training is 2%. What is the
probability that from a group of 3 friends, a) 2 friends get selected, and b) all 3 friends get selected?
Also, make a probability tree and show every relevant detail on the tree.
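The probability tree has one level per friend with branches S (selected, 0.02) and N (not selected, 0.98). A minimal Python sketch that enumerates its 8 leaves, assuming the three selections are independent:

```python
from itertools import product
from math import comb

P_SELECT = 0.02

# Each leaf of the tree is one path such as "SNS"; its probability is the
# product of the branch probabilities along the path.
leaves = {}
for path in product("SN", repeat=3):
    prob = 1.0
    for branch in path:
        prob *= P_SELECT if branch == "S" else 1 - P_SELECT
    leaves["".join(path)] = prob

p_two = sum(p for path, p in leaves.items() if path.count("S") == 2)
p_three = leaves["SSS"]

# Cross-check against the binomial formula C(n, k) p^k (1 - p)^(n - k)
assert abs(p_two - comb(3, 2) * P_SELECT**2 * (1 - P_SELECT)) < 1e-15
print(f"P(exactly 2) = {p_two:.6f}, P(all 3) = {p_three:.6f}")
```

This gives 3 × 0.02² × 0.98 = 0.001176 for exactly two friends and 0.02³ = 0.000008 for all three.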
Q. QM Mobiles, a manufacturer of mobile handsets, gets batteries from MNC Batteries Ltd.
MNC Batteries has two manufacturing plants, located in South Korea and Japan. Past
records show that 2% of the batteries supplied by the South Korean plant are defective
and that 3% of the batteries supplied by the Japanese plant are defective. MNC supplies
80% of the requirements of mobile manufacturers from its South Korean plant and
remaining 20% from its Japanese plant.
(A) If a battery selected by QM Mobiles at random turns out to be good quality, what is the
probability that it was supplied from the Japanese plant?
(B) If a battery selected by QM Mobiles at random turns out to be defective, what is the
probability that it was from the Korean plant?
(C) Draw a neat probability tree(s)/network(s) and show all probabilities and conditional
probabilities.
ANSWER:
(A)
Probability that a battery is of good quality = probability that the Japanese plant supplies a good battery
+ probability that the South Korean plant supplies a good battery
= 0.2*0.97 + 0.8*0.98
= 0.978
Now,
Probability that a battery was supplied from the Japanese plant, given that it is of good quality =
probability that the Japanese plant supplies a good battery / probability that a selected battery is of good
quality
= 0.2*0.97/0.978
= 0.198
Probability that a good-quality battery was supplied from the Japanese plant =
0.198
(B)
Probability that a battery is defective = probability that the Japanese plant supplies a defective battery
+ probability that the South Korean plant supplies a defective battery
= 0.2*0.03 + 0.8*0.02
= 0.022
Now,
Probability that a battery was supplied from the Korean plant, given that it is defective =
probability that the Korean plant supplies a defective battery / probability that a selected battery is
defective
= 0.8*0.02/0.022
= 0.727
Probability that a defective battery was supplied from the Korean plant =
0.727
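Bayes’ theorem for both parts can be checked with a short Python sketch; the dictionaries simply encode the plant shares and defect rates given in the question:

```python
PRIORS = {"South Korea": 0.8, "Japan": 0.2}         # share of supply
DEFECT_RATE = {"South Korea": 0.02, "Japan": 0.03}  # P(defective | plant)


def posterior(plant, defective):
    """P(plant | observed quality) by Bayes' theorem."""
    def likelihood(p):
        rate = DEFECT_RATE[p]
        return rate if defective else 1 - rate

    total = sum(PRIORS[p] * likelihood(p) for p in PRIORS)  # P(quality)
    return PRIORS[plant] * likelihood(plant) / total


p_japan_given_good = posterior("Japan", defective=False)
p_korea_given_defective = posterior("South Korea", defective=True)
print(f"{p_japan_given_good:.3f}, {p_korea_given_defective:.3f}")
```

The results are about 0.198 for part (A) and 0.727 for part (B); the denominators are 0.978 and 0.022 respectively.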
(C)
Tree Diagram:
Q. A manufacturer of washing machines plans to introduce a new entry-level model to boost
its sales. The company has hired two independent market research firms, MRA and MRB,
to estimate the proportion of households that currently use washing machines. MRA
conducted a survey of 400 households and found that 20% of the households currently
use washing machines. MRB conducted a survey of 600 households and found that 18%
of the households currently use washing machines.
Ans :-
The 95% confidence intervals for the two surveys are different because the two surveys are independent samples
with different sample proportions (20% vs 18%) and different sample sizes (400 vs 600), so both their point
estimates and their margins of error differ.
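A minimal Python sketch of the two intervals, assuming the usual large-sample interval p̂ ± 1.96 √(p̂(1 − p̂)/n) for 95% confidence:

```python
import math

Z_95 = 1.96  # standard normal value for 95% confidence


def proportion_ci(p_hat, n, z=Z_95):
    """Large-sample confidence interval for a population proportion."""
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


mra = proportion_ci(0.20, 400)
mrb = proportion_ci(0.18, 600)
print(f"MRA: ({mra[0]:.4f}, {mra[1]:.4f})  MRB: ({mrb[0]:.4f}, {mrb[1]:.4f})")
```

This gives roughly (0.161, 0.239) for MRA and (0.149, 0.211) for MRB; the MRB interval is narrower because its sample is larger.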
Q. The details of the patients treated by the cough or cold clinic are given in the table below. Make
neat contingency tables and show on the tables the joint probabilities, marginal probabilities
and conditional probabilities.
Q. The Insurance Regulatory Authority regularly conducts surveys on the coverage of
insurance. Latest survey indicates that 30% of population has medical insurance.
Suppose a random sample of 3 persons is selected.
a. Draw a neat and complete probability tree for this problem. Show all possible
outcomes and their probabilities.
b. What is the probability that exactly one person has medical insurance coverage? Show
all calculations.
c. What is the probability that exactly two persons have medical insurance coverage?
Show all calculations.
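A worked sketch for parts (b) and (c), assuming the three persons are independent with p = 0.3 each, so the number with insurance is binomial with n = 3:

```latex
\begin{aligned}
P(\text{exactly one}) &= \binom{3}{1}(0.3)^1(0.7)^2 = 3 \times 0.3 \times 0.49 = 0.441 \\
P(\text{exactly two}) &= \binom{3}{2}(0.3)^2(0.7)^1 = 3 \times 0.09 \times 0.7 = 0.189
\end{aligned}
```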
Mid-Semester Test
(EC-2 Regular)
Course No. : MBA ZC417
Course Title : QUANTITATIVE METHODS
Nature of Exam : Open Book
Weightage : 35% No. of Pages =4
Duration : 2 Hours
Date of Exam : Saturday, 12/03/2022 (AN) No. of Questions = 5
Note:
1. Please follow all the Instructions to Candidates given on the cover page of the answer book.
2. All parts of a question should be answered consecutively. Each answer should start from a fresh page.
3. Assumptions made, if any, should be stated clearly at the beginning of your answer.
Q.1 The table below gives 8 summary measures of the duration of stay of 60 patients who were discharged by
MaxiMax Hospital in the last week. [7]
(a) Draw a hand-drawn (approximate is ok), neat, and well-labeled Boxplot chart.
(b) Discuss two or more insights drawn from the Boxplot chart (give precise answers, preferably in points).
(c) Is the maximum number of days of stay an outlier?
(d) Comment on the importance of 3.9 days value, given in the table, to the hospital administrator (give precise
answers, preferably in points).
ANSWER
(a): A box plot, also known as a box-and-whisker diagram, presents five summary measures of the data: the minimum,
first quartile, median, third quartile and maximum.
It contains a box whose bottom side represents the first quartile and whose top side represents the third quartile. The desired
box plot is shown below:
(b):
The maximum stay is 16 days, so no patient stayed in the hospital for more than 16 days.
The 1st and 3rd quartiles are 3 and 8 days respectively, so about 25% of patients stayed less than 3
days and about 75% of patients stayed less than 8 days.
The median is 5 days, so about 50% of patients stayed less than 5 days.
(c):
In a box plot, the outlier limits are Q1 - 1.5*IQR and Q3 + 1.5*IQR, i.e. any value that lies below (Q1 - 1.5*IQR) or
above (Q3 + 1.5*IQR) is called an outlier, where Q1 and Q3 are the 1st and 3rd quartiles respectively and IQR is the
Interquartile Range, the difference between the 3rd and 1st quartiles, i.e. IQR = Q3 - Q1.
As the 1st and 3rd quartiles are 3 and 8 respectively, the Interquartile Range (IQR) is:
IQR = Q3 - Q1
= 8 - 3 = 5.
Upper outlier limit = Q3 + 1.5*IQR = 8 + 1.5*5 = 15.5.
As the maximum value, 16, lies above the upper outlier limit of 15.5, it is an outlier.
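The outlier check in part (c) can be sketched in Python; `outlier_fences` is a hypothetical helper using the 1.5 × IQR rule stated above:

```python
def outlier_fences(q1, q3, k=1.5):
    """Return (lower, upper) outlier limits using the k * IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


lower, upper = outlier_fences(q1=3, q3=8)  # quartiles from the hospital table
max_stay = 16
print(f"limits = ({lower}, {upper}); 16 days is an outlier: {not lower <= max_stay <= upper}")
```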
(d):
The sample standard deviation, 's', indicates how far the data points are from the mean value (average value) of the
sample.
The standard deviation is 3.9 days. This means that individual stays typically differ from the average stay of 6 days
(the mean value) by about 3.9 days. It gives the administrator an approximate idea of how much bed occupancy may vary,
and hence of the number of occupied and empty beds in the hospital next week.
(Do only by hand; do not type the answers. Draw a box over the final numerical answer. Wherever applicable,
show every step, every formula, and every calculation. If MS Excel is used, then mention 100% correct Excel
function. Screenshot of any software is not acceptable).
Q.2 ABC Retail sells Apples, Bananas and Chocolates in 50 stores in a metro city. A representative data of the
purchases made by 20 customers is given in the table below- [7]
Customer No. Apples Bananas Chocolates
Customer #1 Yes Yes Yes
Customer #2 No No Yes
Customer #3 Yes No No
Customer #4 No Yes Yes
Customer #5 Yes Yes No
Customer #6 No Yes No
Customer #7 Yes Yes No
Customer #8 Yes No No
Customer #9 Yes No No
Customer #10 No Yes Yes
Customer #11 No Yes Yes
Customer #12 Yes Yes Yes
Customer #13 Yes No No
Customer #14 No Yes No
Customer #15 No Yes No
Customer #16 Yes Yes Yes
Customer #17 Yes No No
Customer #18 Yes No No
Customer #19 No Yes Yes
Customer #20 Yes No Yes
(a) Make a table of joint and conditional probabilities for the sale of Apples and Bananas.
(b) Make a table of conditional probabilities for the sale of Apples, given that a customer has or has not
purchased Bananas.
(c) What is the probability that a customer will buy Apples and Bananas, given that the customer has purchased
Chocolates?
ANSWER
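Below is a minimal Python sketch for checking the hand calculation. The question asks for the working to be done by hand, so treat this only as a check. It transcribes the 20-customer table into letter codes in the order (Apples, Bananas, Chocolates), where Y = Yes and N = No. It then computes the joint probabilities of Apples and Bananas with their marginals. Next, it computes the conditional probabilities P(Apples | Bananas) = joint / marginal. Finally, it computes P(Apples and Bananas | Chocolates) by restricting to customers who bought Chocolates. The variable names are illustrative only.

```python
from fractions import Fraction

# Purchases of customers #1-#20 transcribed from the table:
# letters are (Apples, Bananas, Chocolates), Y = Yes, N = No.
rows = [
    "YYY", "NNY", "YNN", "NYY", "YYN",  # customers 1-5
    "NYN", "YYN", "YNN", "YNN", "NYY",  # customers 6-10
    "NYY", "YYY", "YNN", "NYN", "NYN",  # customers 11-15
    "YYY", "YNN", "YNN", "NYY", "YNY",  # customers 16-20
]
data = [tuple(ch == "Y" for ch in r) for r in rows]
n = len(data)

# (a) Joint probabilities P(Apples = a and Bananas = b), plus marginals
joint = {
    (a, b): Fraction(sum(1 for A, B, _ in data if A == a and B == b), n)
    for a in (True, False)
    for b in (True, False)
}
p_apples = {a: joint[(a, True)] + joint[(a, False)] for a in (True, False)}
p_bananas = {b: joint[(True, b)] + joint[(False, b)] for b in (True, False)}

# (b) Conditional probabilities P(Apples = a | Bananas = b) = joint / marginal of Bananas
cond = {(a, b): joint[(a, b)] / p_bananas[b] for (a, b) in joint}

# (c) P(Apples and Bananas | Chocolates) = n(A and B and C) / n(C)
with_choc = [(A, B) for A, B, C in data if C]
p_ab_given_c = Fraction(sum(1 for A, B in with_choc if A and B), len(with_choc))

label = {True: "Yes", False: "No"}
for (a, b), p in joint.items():
    print(f"P(Apples={label[a]}, Bananas={label[b]}) = {p}; "
          f"P(Apples={label[a]} | Bananas={label[b]}) = {cond[(a, b)]}")
print(f"P(Apples and Bananas | Chocolates) = {p_ab_given_c}")
```

With the table as transcribed, the counts are 5 (Apples and Bananas), 7 (Apples only), 7 (Bananas only) and 1 (neither). This gives, for example, P(Apples | Bananas) = 5/12 and P(Apples and Bananas | Chocolates) = 3/9 = 1/3.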
Q.3 FireFox General Insurance Company provides insurance services to oil exploration, processing, and
distribution firms. OilInMotion, an oil distribution company, has insured three newly acquired oil tankers
(named Ace, King and Jack). Past records show that the probability that an oil tanker catches fire in a year is 0.05.
Assume incidents of fire are independent. [7]
(a) What is the probability that exactly one of the three tankers will catch fire in a year?
(b) What is the probability that both King and Ace will catch fire in a year?
(c) What is the probability that FireFox gets an insurance claim for King or Jack?
Tanker   P(catches fire)   P(does not catch fire)
Ace      0.05              0.95
King     0.05              0.95
Jack     0.05              0.95
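A minimal Python sketch for checking the three answers by enumeration is shown below. It assumes, as the question states, that fires are independent with probability 0.05 per tanker per year. It lists all 8 fire / no-fire combinations for (Ace, King, Jack) and multiplies the per-tanker probabilities for each combination. It then adds up the combinations that satisfy each event. The names `P_FIRE` and `outcome_prob` are illustrative.

```python
from itertools import product

P_FIRE = 0.05                       # given: chance that one tanker catches fire in a year
TANKERS = ("Ace", "King", "Jack")   # index 0 = Ace, 1 = King, 2 = Jack

def outcome_prob(outcome):
    """Probability of one combination (True = fire), using independence."""
    p = 1.0
    for fire in outcome:
        p *= P_FIRE if fire else 1 - P_FIRE
    return p

# All 2**3 = 8 fire / no-fire combinations for (Ace, King, Jack)
outcomes = list(product((True, False), repeat=len(TANKERS)))

# (a) exactly one tanker catches fire
p_exactly_one = sum(outcome_prob(o) for o in outcomes if sum(o) == 1)
# (b) both King and Ace catch fire (Jack may or may not)
p_king_and_ace = sum(outcome_prob(o) for o in outcomes if o[0] and o[1])
# (c) a claim for King or Jack: at least one of the two catches fire
p_king_or_jack = sum(outcome_prob(o) for o in outcomes if o[1] or o[2])

print(f"(a) {p_exactly_one:.6f}")
print(f"(b) {p_king_and_ace:.6f}")
print(f"(c) {p_king_or_jack:.6f}")
```

The enumeration should agree with the closed forms:
- (a) 3 × 0.05 × 0.95² = 0.135375
- (b) 0.05 × 0.05 = 0.0025
- (c) P(K) + P(J) - P(K and J) = 0.05 + 0.05 - 0.0025 = 0.0975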
Q.4 The amount of powdered spice filled by FillAndChill, an automatic machine, differs from one packet to
another. The mean amount of chili powder filled by the machine is 160g with a standard deviation of 2g. Assume
the amount of chili powder filled by the machine is Normally distributed. [7]
(a) What is the probability that a randomly chosen packet weighs between 158g and 163g?
(b) What is the probability that the weight of a randomly chosen packet is more than 155g?
(c) If packets that weigh less than 158g are sold for Rs 80 and all other packets for Rs 100, what is the
expected revenue per packet?
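Each part reduces to standardising with z = (x - μ)/σ and reading the Normal CDF. In Excel this is NORM.DIST(x, 160, 2, TRUE); for example, part (a) is NORM.DIST(163,160,2,TRUE) - NORM.DIST(158,160,2,TRUE). The minimal Python sketch below checks the same quantities with the standard library's `statistics.NormalDist` (Python 3.8+). The helper `z` is an illustrative name.

```python
from statistics import NormalDist

MEAN_G, SD_G = 160, 2   # grams, from the question
fill = NormalDist(mu=MEAN_G, sigma=SD_G)

def z(x):
    """Standardise a weight x: z = (x - mean) / sd."""
    return (x - MEAN_G) / SD_G

# (a) P(158 < X < 163) = P(-1 < Z < 1.5)
p_a = fill.cdf(163) - fill.cdf(158)
# (b) P(X > 155) = P(Z > -2.5)
p_b = 1 - fill.cdf(155)
# (c) E(revenue) = 80 * P(X < 158) + 100 * P(X >= 158)
p_light = fill.cdf(158)
expected_revenue = 80 * p_light + 100 * (1 - p_light)

print(z(158), z(163), z(155))   # -1.0 1.5 -2.5
print(f"(a) {p_a:.4f}  (b) {p_b:.4f}  (c) Rs {expected_revenue:.2f}")
```

The printed values should be close to the z-table results:
- (a) 0.9332 - 0.1587 ≈ 0.7745
- (b) ≈ 0.9938
- (c) 80 × 0.1587 + 100 × 0.8413 ≈ Rs 96.83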
Q.5 The Aviation Regulatory Authority (ARA) plans to make it mandatory for commercial airlines to publish on
their websites the average delay of their flights.
The delays of a randomly selected sample of 16 flights of flight no. F-16 and 12 flights of flight no. AN-12 are
given in Table-1, and Table-2 gives 11 summary measures of the data. [7]
(a) What are the point estimate and the interval estimate of the mean delay of flight no. F-16 and of flight
no. AN-12, at the 95% Confidence Level?
(b) Why is an interval estimate preferred over a point estimate (give a precise answer)?
(c) What do the Skewness values of F-16 and AN-12 given in Table-2 indicate (give precise answers)?
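For part (a), the point estimate is the sample mean x̄. Because the population standard deviation is unknown and the samples are small, a common choice for the interval estimate is the t-interval x̄ ± t(0.025, n-1) × s/√n. This assumes delays are roughly Normally distributed. In Excel, T.INV.2T(0.05, n-1) gives the critical value and CONFIDENCE.T(0.05, s, n) gives the margin of error. The minimal Python sketch below applies the formula. Table-2 is not reproduced in these notes, so the mean and standard deviation used here are HYPOTHETICAL placeholders. The t critical values (2.131 for df = 15, 2.201 for df = 11) are standard t-table entries.

```python
import math

# Two-sided 95% critical values t(0.025, df) from a standard t-table,
# for the two sample sizes in the question (df = n - 1).
T_CRIT_95 = {15: 2.131, 11: 2.201}

def t_interval_95(mean, sd, n):
    """95% interval estimate of a mean with sigma unknown: mean +/- t * sd / sqrt(n)."""
    t = T_CRIT_95[n - 1]
    margin = t * sd / math.sqrt(n)
    return mean - margin, mean + margin

# HYPOTHETICAL summary values for illustration only; substitute each flight's
# sample mean and sample standard deviation from Table-2.
example_mean, example_sd = 20.0, 8.0
lo16, hi16 = t_interval_95(example_mean, example_sd, 16)  # F-16 sample size
lo12, hi12 = t_interval_95(example_mean, example_sd, 12)  # AN-12 sample size
print(f"n=16: point estimate {example_mean}, 95% CI ({lo16:.3f}, {hi16:.3f})")
print(f"n=12: point estimate {example_mean}, 95% CI ({lo12:.3f}, {hi12:.3f})")
```

With the same s, the n = 12 interval comes out wider than the n = 16 interval. This is because both the larger t value and the smaller √n increase the margin of error.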
_________