Assignment Module 1
Priyanka Sindhwani
Question 1: Members of a mafia group gathered at a secret hideaway. The city police come to know about the
meeting and plan to arrest the leader of the group. The police know that the members of the mafia will leave one
by one in random order for security reasons from the hideaway. As soon as the police arrest one of them, the other
members would be alerted and would flee. For this reason, the police will arrest a member only if they are reasonably sure that he
is the leader of the gang. The police know that the leader is the tallest member of the gang, and they also know
that the gang consists of 20 members. How can the police maximize the probability of arresting the gang leader?
Answer 1:
K = the number of members who are allowed to leave before the police are ready to make an arrest; it is chosen so that the chance of arresting the leader is maximum
Approximating the summation in equation 1 by an integral, the optimum is K ≈ n/e = 20/2.718 ≈ 7
Therefore we let the first 7 persons leave undisturbed, only observing their heights.
After that, we arrest the first person who is taller than every gangster who has already left.
The chance of arresting the leader is then about 37% (1/e); for exactly 20 members it works out to roughly 38%.
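The stopping rule above can be checked numerically. The following is a minimal sketch, assuming the 20 heights are distinct and the leaving order is uniformly random; the function name `win_probability` is illustrative and not part of the original answer.

```python
from fractions import Fraction

def win_probability(n: int, k: int) -> Fraction:
    """Chance of arresting the tallest of n members when the first k are only observed."""
    if k == 0:
        return Fraction(1, n)  # arresting the very first person who leaves
    # The tallest leaves at position j with probability 1/n and is arrested only
    # if the tallest of the first j-1 members was among the first k (prob k/(j-1)).
    return Fraction(k, n) * sum(Fraction(1, j - 1) for j in range(k + 1, n + 1))

n = 20
best_k = max(range(n), key=lambda k: win_probability(n, k))
best_p = float(win_probability(n, best_k))
print(best_k, round(best_p, 4))
```

Under these assumptions the maximum is reached when 7 members are allowed to pass, in line with the n/e rule of thumb.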
Question 2: Alcohol checks are regularly conducted near MG road Bangalore. Drivers are first subjected to
breath test; if the test is positive then driver is taken for a blood test. Blood test will reveal whether the driver has
been driving under the influence of alcohol. The breath test yields positive results among 95% of drunken drivers
and yields positive results among 8% of sober drivers. According to current statistics one out of every 25 drivers
on the road drive under the influence of alcohol. Calculate the probability of a randomly tested driver being
unnecessarily subjected to a blood test after a positive breath test?
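Answer 2: Bayes' theorem assumes that there is no correlation between how often the given information is reported and the outcome, whether or not we are aware of such a correlation.
P(D) :: Probability that a driver is drunk = 1/25 = 0.04
P(+|D) :: Probability that test is positive given driver is drunk = 0.95
P(+|D') :: Probability that test is positive given driver is not drunk = 0.08
Bayes' theorem: P(D'|+) = P(+|D') P(D') / [P(+|D') P(D') + P(+|D) P(D)]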
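A minimal sketch of the Bayes calculation, using only the figures stated in the question. The variable names are illustrative. Two readings of "unnecessarily subjected to a blood test" are computed, since the question can be read either way: the share of positive breath tests that belong to sober drivers, and the share of all tested drivers who are sober and test positive.

```python
p_drunk = 1 / 25            # one out of every 25 drivers
p_sober = 1 - p_drunk
p_pos_given_drunk = 0.95    # breath test positive among drunken drivers
p_pos_given_sober = 0.08    # breath test positive among sober drivers

# Total probability of a positive breath test
p_pos = p_pos_given_drunk * p_drunk + p_pos_given_sober * p_sober

# Reading 1: sober given a positive breath test (Bayes' theorem)
p_sober_given_pos = p_pos_given_sober * p_sober / p_pos

# Reading 2: a randomly tested driver is sober AND tests positive
p_sober_and_pos = p_pos_given_sober * p_sober

print(round(p_sober_given_pos, 3), round(p_sober_and_pos, 4))
```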
Question 3: The number of insurance claims per day follows a Poisson distribution at a rate of 22 claims per day.
Calculate the probability that the number of claims exceeds 30 in a day. If the chance of fraudulent claim is 0.05,
calculate the probability that there will be at least 2 fraudulent claims in any given day.
Answer 3. Part 1
Mean/μ = 22
P = 0.05
Probability that the number of claims exceeds 30 in a day (Excel formula): 1 - POISSON(x, mean, cumulative) = 1 - POISSON(30, 22, TRUE)
Part 2
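Both parts can be sketched with the Poisson cumulative probability. This is an illustration, not the original Excel workbook: `poisson_cdf` is a helper written for this sketch, and Part 2 assumes each claim is fraudulent independently with probability 0.05, so that the number of fraudulent claims per day is itself Poisson with mean 22 × 0.05 = 1.1.

```python
import math

def poisson_cdf(k: int, mean: float) -> float:
    """P(X <= k) for a Poisson variable with the given mean."""
    return sum(math.exp(-mean) * mean**i / math.factorial(i) for i in range(k + 1))

claims_mean = 22
p_fraud = 0.05

# Part 1: more than 30 claims in a day
p_more_than_30 = 1 - poisson_cdf(30, claims_mean)

# Part 2: at least 2 fraudulent claims (assumed Poisson with mean 1.1)
fraud_mean = claims_mean * p_fraud
p_at_least_2_fraud = 1 - poisson_cdf(1, fraud_mean)

print(round(p_more_than_30, 4), round(p_at_least_2_fraud, 4))
```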
Question 4: Read the case study, “HR Analytics at Scalene-works”, and answer the following questions.
1. Use a goodness of fit approach to check whether the “offered increase in CTC” in the case follows a normal
distribution.
2. Assume that the offered increase in CTC follows a normal distribution, what is the probability that offered
increase in CTC is more than 50%?
3. Is there a statistical evidence to suggest that the notice period has different influence on the joining of the job
by applicants? Use an appropriate statistical test to answer question.
4. Check whether the expected increase in CTC is different for men and women using an appropriate statistical
test.
Part1: (Use a goodness of fit approach to check whether the “offered increase in CTC” in the case follows a
normal distribution)
Answer:
The first step was to clean the data. For this question we only deleted blank values; deleting definite outliers
made no difference to the end result
Assumption:
Bin No Lower Upper Observed Expected (o-e)^2/e
1 -61 -48 18 8.2 11.7
2 -47 -34 60 41.9 7.8
3 -33 -20 156 161.7 0.2
4 -19 -6 385 472.0 16.0
5 -5 8 1036 1042.9 0.0
6 9 22 1769 1743.6 0.4
7 23 36 2790 2206.1 154.5
8 37 50 3069 2112.6 432.9
9 51 64 923 1531.1 241.5
10 65 78 588 839.8 75.5
11 79 92 317 348.6 2.9
12 93 106 185 109.5 52.1
13 107 120 121 26.0 347.0
14 121 134 52 4.7 479.2
Total 1821.8
Critical value (CHIINV, alpha = 0.05, df = 14 - 2 - 1 = 11) = 19.67514
As per the decision rule, the p-value < 0.05 and the chi-square value (1821.8) exceeds the critical value (19.68). Therefore we reject
the null, which establishes that the data do not follow a normal distribution
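The chi-square statistic can be recomputed from the Observed and Expected columns of the table. This is a sketch under the assumption that the Expected counts are the rounded values printed in the table, so the total differs slightly from the 1821.8 reported there; the critical value is the one quoted in the answer.

```python
observed = [18, 60, 156, 385, 1036, 1769, 2790, 3069, 923, 588, 317, 185, 121, 52]
expected = [8.2, 41.9, 161.7, 472.0, 1042.9, 1743.6, 2206.1, 2112.6,
            1531.1, 839.8, 348.6, 109.5, 26.0, 4.7]

# Contribution of each bin: (o - e)^2 / e
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi_square = sum(contributions)

# Degrees of freedom: bins - estimated parameters (mean, std. deviation) - 1
df = len(observed) - 2 - 1
critical_value = 19.67514  # as quoted in the answer for alpha = 0.05

reject_normality = chi_square > critical_value
print(round(chi_square, 1), df, reject_normality)
```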
Part2: Assume that the offered increase in CTC follows a normal distribution, what is the probability that offered
increase in CTC is more than 50%?
Answer:
The probability is obtained by computing the area under the normal curve for an offered increase of at most 50% and
subtracting it from 1, using the following statistics:
Mean 37.77202
Median 34.48
Mode 42.86
Dispersion
Std. Deviation 35.59332
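A short sketch of the tail-area calculation, assuming the mean and standard deviation read 37.77202 and 35.59332 (the trailing digits are wrapped onto separate lines in the output above).

```python
from statistics import NormalDist

mean_increase = 37.77202   # assumed reading of the wrapped "Mean" value
std_increase = 35.59332    # assumed reading of the wrapped "Std. Deviation" value

offered_increase = NormalDist(mu=mean_increase, sigma=std_increase)

# P(offered increase in CTC > 50%) = 1 - P(offered increase <= 50%)
p_more_than_50 = 1 - offered_increase.cdf(50)
print(round(p_more_than_50, 4))
```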
Part 3: Is there a statistical evidence to suggest that the notice period has different influence on the joining of the
job by applicants? Use an appropriate statistical test to answer question.
Answer
Decision rule: If p-value > 0.05 (equivalently, the chi-square statistic < critical value), we retain the null hypothesis
Alpha : 0.05
Observed
Notice Period   Joined   Not Joined   Total
0               1571     221          1792
30              4743     1542         6285
45              453      265          718
60              1369     995          2364
75              94       97           191
90              461      440          901
120             34       48           82
Total           8725     3608         12333

Expected
Notice Period   Joined   Not Joined   Total
0               1268     524          1792
30              4446     1839         6285
45              508      210          718
60              1672     692          2364
75              135      56           191
90              637      264          901
120             58       24           82
Total           8725     3608         12333
As per the decision rule, the p-value < 0.05 and the chi-square statistic exceeds the critical value. Therefore we reject the null, thus stating
that the notice period does influence whether applicants join.
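The expected counts and the chi-square statistic for the test of independence can be reproduced from the observed table. A minimal sketch: the critical value 12.592 is the standard chi-square value for alpha = 0.05 with (7 - 1) × (2 - 1) = 6 degrees of freedom and is not taken from the original answer.

```python
# Observed counts per notice period: (joined, not joined)
observed = {
    0: (1571, 221), 30: (4743, 1542), 45: (453, 265), 60: (1369, 995),
    75: (94, 97), 90: (461, 440), 120: (34, 48),
}

col_totals = [sum(row[c] for row in observed.values()) for c in range(2)]
grand_total = sum(col_totals)

chi_square = 0.0
expected = {}
for period, row in observed.items():
    row_total = sum(row)
    # Expected count under independence: row total * column total / grand total
    expected[period] = tuple(row_total * ct / grand_total for ct in col_totals)
    chi_square += sum((o - e) ** 2 / e for o, e in zip(row, expected[period]))

df = (len(observed) - 1) * (len(col_totals) - 1)
critical_value = 12.592  # chi-square, alpha = 0.05, df = 6
reject_independence = chi_square > critical_value
print(round(chi_square, 1), df, reject_independence)
```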
Part4. Check whether the expected increase in CTC is different for men and women using an appropriate
statistical test.
Answer
From the data set we take gender (male, female) and expected increase in CTC. We assign random numbers and then draw
equal-sized samples for both groups; in this case we took a sample of 100 males and a sample of 100
females with their respective expected increases in CTC
The fourth assumption is that a reasonably large sample size is used. A larger sample size means that the
distribution of results should approach a normal bell-shaped curve.
Next step is to check the variance of the sample, this is done through F-test
Hypothesis:
H0: σ₁² = σ₂²
H1: σ₁² ≠ σ₂²
Decision rule: if p-value > 0.05, retain the null; otherwise reject it
As the p-value (0.03) < 0.05, we reject the null and therefore conduct the unequal-variance (Welch) t-test
Hypothesis: H0: μ men - μ female = 0
H1: μ men - μ female != 0
Decision rule: as this is a two-tailed t-test, if p-value > 0.05 (equivalently, |t Stat| < t Critical two-tail), retain the null;
otherwise reject it
                              Variable 1    Variable 2
Mean                          42.2451       45.5484
Variance                      3070.561108   3936.102561
Observations                  100           100
Hypothesized Mean Difference  0
df                            195
t Stat                        -0.394632076
P(T<=t) one-tail              0.34677286
t Critical one-tail           1.65270531
P(T<=t) two-tail              0.693545719
t Critical two-tail           1.972204051
As per the decision rule, the p-value (0.69) > 0.05 and |t Stat| (0.39) < t Critical two-tail (1.97). Therefore we retain the null,
thus stating that the mean expected increase in CTC is the same for both men and women.
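The t Stat and df in the table can be reproduced from the summary statistics alone. A minimal sketch of the unequal-variance (Welch) calculation; the function name `welch_t` is illustrative, and the inputs are the values printed in the table (with the wrapped variance digits rejoined).

```python
from math import sqrt

def welch_t(mean1, var1, n1, mean2, var2, n2, hypothesized_diff=0.0):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    se1, se2 = var1 / n1, var2 / n2          # squared standard errors
    t_stat = (mean1 - mean2 - hypothesized_diff) / sqrt(se1 + se2)
    df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t_stat, df

# Variable 1 and Variable 2 as printed in the table
t_stat, df = welch_t(42.2451, 3070.561108, 100, 45.5484, 3936.102561, 100)

t_critical_two_tail = 1.972204051  # from the table
retain_null = abs(t_stat) < t_critical_two_tail
print(round(t_stat, 4), round(df), retain_null)
```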
Question 5
Read the case: ”A Dean’s Dilemma: Selection of Students for the MBA Program”, and answer the
following questions.
1. Carryout descriptive analytics (use different data visualization approaches). What insights you are able
to gain using descriptive analytics about the students who are placed and not placed?
2. In a random selection of 20 students, what is the probability that exactly 5 students are not placed?
What is the probability that at least 5 students are not placed?
3. Consider only the data of students who were placed and answer the following questions:
a. Is there a statistical evidence to suggest that the average salary of students with average score of 60
marks in SSLC is less than the average salary of students with an average score of more than 60 marks in
SSLC? Use the appropriate statistical test.
b. The Dean, Easwaran Iyer, believes that the male students earn at least 10000 more than female students
per annum. Do an appropriate test to validate this belief.
c. Students from CBSE Board (in SSC) are given higher priority during the admission, is this admission
policy justified? Justify your answer.
4. What will be your recommendations to Dr. Easwaran Iyer based on your response to questions 1, 2 and
3.
Part 1: Carryout descriptive analytics (use different data visualization approaches). What insights you are
able to gain using descriptive analytics about the students who are placed and not placed?
Fewer female students get placed compared to male students. It was also observed
that, on average, the salary received by female students is 11% less than that of male students
Placed %
Graduation subject      Marketing and Finance   Management & HR   Marketing & IB
Arts                    75.00%                  77.78%
Commerce                80.46%                  78.57%            100%
Computer Applications   75.00%                  75.00%
Engineering             94.74%                  71.43%            50%
Management              75.00%                  84.72%            71%
Others                  0.00%                   100.00%
Science                 100.00%                 76.92%
All the students who scored 80 or more in communication got placed; however, for scores below 80 there
doesn’t seem to be any relationship between communication score and placement
Students who passed HSC from CBSE have a placement rate of 84%, whereas the others range from 77-79%
A degree in engineering doesn’t play any significant role in getting placed
Part 2.In a random selection of 20 students, what is the probability that exactly 5 students are not placed?
What is the probability that at least 5 students are not placed?
Answer 2.
This question can be addressed with Bernoulli trials, whose number of successes follows a Binomial distribution. The
assumptions are as follows:
There are only two outcomes, 1 or 0, i.e., success or failure, each time
If the probability of success is p then the probability of failure is (1-p), and this remains the same across
each successive trial.
The probabilities are not affected by the outcomes of other trials, which means the trials are independent.
Calculation that exactly 5 students are not placed
No. of trials = 20
x = 5
From the data set, success probability (probability that a student is not placed) p = 79/391 = 0.202046036
Calculation that at least 5 students are not placed
No. of trials = 20
x >= 5, computed as 1 - P(x <= 4)
p = 0.202046036
1 - BINOMDIST(4, 20, 0.202046, TRUE) = 0.379
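Both binomial probabilities can be sketched directly from the definition. The helper `binom_pmf` is written for this illustration; "success" here means a student is not placed, with p = 79/391 as in the answer.

```python
from math import comb

def binom_pmf(x: int, n: int, p: float) -> float:
    """P(exactly x successes in n independent trials)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n_students = 20
p_not_placed = 79 / 391

# Exactly 5 of the 20 students are not placed
p_exactly_5 = binom_pmf(5, n_students, p_not_placed)

# At least 5 are not placed = 1 - P(0..4 not placed)
p_at_least_5 = 1 - sum(binom_pmf(x, n_students, p_not_placed) for x in range(5))

print(round(p_exactly_5, 4), round(p_at_least_5, 4))
```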
a.Is there a statistical evidence to suggest that the average salary of students with average score of 60
marks in SSLC is less than the average salary of students with an average score of more than 60 marks in
SSLC? Use the appropriate statistical test.
Answer a
As the data set has more than 30 observations, we assume the data follow a normal distribution and proceed with a
t-test. To check whether the two groups have equal or unequal variances, we first conduct an F-test with the following
hypotheses:
H0: σ₁² = σ₂²
H1: σ₁² ≠ σ₂²
                      Value 1     Value 2
Mean                  279908.5    264935.7798
Variance              6.53E+09    12843523615
Observations          201         109
df                    200         108
F                     0.508227
P(F<=f) one-tail      1.9E-05
F Critical one-tail   0.761805
As the p-value < 0.05, we reject the null; the variances are not equal. Therefore we proceed with the
unequal-variance t-test
Setting the Hypothesis
                              Variable 1   Variable 2
Mean                          279908.5     264935.7798
Variance                      6.53E+09     12843523615
Observations                  201          109
Hypothesized Mean Difference  15059.41
df                            169
t Stat                        -0.00707
P(T<=t) one-tail              0.497182
t Critical one-tail           1.65392
P(T<=t) two-tail              0.994364
t Critical two-tail           1.9741
Looking at the values of only one tail: t Stat < t Critical and the p-value > 0.05, so we retain the null
hypothesis. Thus there is no evidence that the average salary of students with an average score of 60 or less
is lower than that of students with an average score of more than 60
b. The Dean, Easwaran Iyer, believes that the male students earn at least 10000 more than female
students per annum. Do an appropriate test to validate this belief.
To conduct this test we first collected random equal sample of male and female.
Decision rule : retain null if p value >0.05
data: mv2$MANDF
P = 14.5, p-value = 0.1056
As p-value > 0.05, therefore we retain the null. Thus data is normally distributed
Statistical test: a two-sample t-test; based on H0, it will be a left one-tailed t-test
Decision rule: if p-value > 0.05 or t Stat < t Critical, retain the null
Male Female
Mean 284241.9 253068.0412
Variance 9.89E+09 5504236572
Observations 215 97
Hypothesized Mean Difference 10000
df 243
t Stat 2.089079
P(T<=t) one-tail 0.018871
t Critical one-tail 1.651148
P(T<=t) two-tail 0.037741
t Critical two-tail 1.969774
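The t Stat (2.09) is positive because the sample difference in mean salary (284241.9 - 253068.0 = 31173.9) is larger than the hypothesized 10,000. For a left one-tailed test the p-value is therefore 1 - 0.0189 = 0.981 > 0.05, so we retain the null.
Therefore the data do not contradict the Dean's belief that male students earn at least 10,000 more than female students per annum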
c. Students from CBSE Board (in SSC) are given higher priority during the admission, is this admission
policy justified? Justify your answer.
We compare the salaries of the placed students with respect to their SSC board
We assume the data are normally distributed, as we randomly drew samples for both CBSE and the other
boards, and proceed with a t-test. To check whether the variances are equal, we first conduct the F-test
                      CBSE       OTHERS
Mean                  276793.6   273582.6
Variance              1.25E+10   7.1E+09
Observations          94         218
df                    93         217
F                     1.766925
P(F<=f) one-tail      0.000373
F Critical one-tail   1.322861
As the p-value is < 0.05, we reject the null, thus concluding that the two groups have unequal variances
Decision rule: if p-value >= 0.05 or t Stat < t Critical, retain the null
                              CBSE       OTHERS
Mean                          276793.6   273582.5688
Variance                      1.25E+10   7102566884
Observations                  94         218
Hypothesized Mean Difference  3211.048
df                            140
t Stat                        1.66E-08
P(T<=t) one-tail              0.5
t Critical one-tail           1.655811
P(T<=t) two-tail              1
t Critical two-tail           1.977054
The p-value >= 0.05 and t Stat (1.66E-08) < t Critical (1.65). Thus we retain the null.
Conclusion: The admission strategy needs to change, as the retained null hypothesis states that students from
other boards do at least as well as CBSE students
4. What will be your recommendations to Dr. Easwaran Iyer based on your response to questions 1, 2 and
3.
Answer. Based on the findings from the above questions, we recommend that the following points be taken into
consideration:
Commerce students should be given preference or encouraged to take up Marketing & IB, as the
placement rate of commerce students is 100% in Marketing & IB
Engineering and Commerce students should be given preference or encouraged to take up Marketing &
Finance
For both Marketing & Finance and Marketing & IB, people with more experience should be given
preference
Students should not be selected on the basis of their boards, meaning CBSE vs ICSE should not be the deciding
criterion
All students who scored above 80 in communication got placed. Therefore a communication test can be included as a
pre-interview criterion
Question 6
Read the case Hawthrone Plastics Inc (uploaded in Moodle). Answer the following questions:
a. Which process should Mr. Nelson specify for the manufacture of polypropylene strappings?
b. How much of premium price per pound could Nelson afford to pay, to acquire raw material which was
guaranteed of the long molecular chain variety?
a.Which process should Mr. Nelson specify for the manufacture of polypropylene strappings?
Answer a.
Cost per pound
                       Process 1   Process 2   Process 3
Variable cost          0.13        0.15        0.17
Set up cost            0.002       0.007       0.012
Clean up cost          0           0.003       0.0025
Raw Material cost      0.25        0.25        0.25
Total cost per batch   38200       40950       43450

Revenue per Batch
Avg quality 50,000
For average Quality 60,000

LM: Long Molecule
SM: Small Molecule
LC: Long Chain
SC: Small Chain
RM: Raw Material
To answer the above question, we built a decision tree for each of the three processes to determine
the maximum earnings that can be obtained from each
Process1:
We will proceed with Process 2 with test, as it gives earnings of $12,450, which is more than the other
processes
PROCESS 3:
b.How much of premium price per pound could Nelson afford to pay, to acquire raw material
which was guaranteed of the long molecular chain variety?
To calculate the premium, we first take the process that yields the maximum profit when producing
the long chain material. As the raw material is already guaranteed to be LM, we would also save on the test cost
Calculations
c. What further analysis would you suggest?
The case study already mentions that Hawthrone wants to start its own line of business. We therefore consider what happens if it
starts its own plastic strapping business instead of taking the work from another company.
If Hawthrone gets into the business of plastic strapping, the earnings from high quality are 100000 and from average
quality 75000
These values have been calculated by entering the new revenue into the already profitable decision tree, that is,
Process 2 with test
Process 2:
Variable cost          0.15
Set up cost            0.007
Clean up cost          0.003
Raw Material cost      0.25
Total cost per batch   40950
Profit is 14200, which is 114%, if Hawthrone produces and sells its own strapping
Question 7: An educational psychologist wants to check the claims of some spiritualists that a daily practice of
meditation among the students will improve the academic achievement of the students. To control the
experiment for academic aptitude, pairs of college students with similar grade point averages (GPA) are
randomly assigned to either a group that receives daily training in meditation or a group that doesn’t receive
training in meditation. At the end of the experiment which lasts for one term, the following GPAs were
reported for 10 pairs of participants:
Decision rule: pvalue >0.05, retain the null
data: medi$Observed
P = 1, p-value = 0.9098
Observed
Pair Number   Meditation   No-Meditation   Total
1             4            3.75            7.75
2             2.65         2.75            5.4
3             3.65         3.45            7.1
4             2.55         2.11            4.66
5             3.2          3.21            6.41
6             3.6          3.25            6.85
7             2.9          2.58            5.48
8             3.41         3.28            6.69
9             3.33         3.35            6.68
10            2.9          2.65            5.55
Total         32.19        30.38           62.57

Expected
Pair Number   Meditation   No-Meditation   Total
1             3.99         3.762906        7.75
2             2.78         2.621895        5.40
3             3.65         3.447307        7.10
4             2.40         2.262599        4.66
5             3.30         3.112287        6.41
6             3.52         3.325923        6.85
7             2.82         2.660738        5.48
8             3.44         3.248237        6.69
9             3.44         3.243382        6.68
10            2.86         2.694726        5.55
Total         32.19        30.38           62.57

Observed   Expected   (o-e)   (o-e)^2/e
4          3.99       0.01    0.00
2.65       2.78       -0.13   0.01
3.65       3.65       0.00    0.00
2.55       2.40       0.15    0.01
3.2        3.30       -0.10   0.00
3.6        3.52       0.08    0.00
2.9        2.82       0.08    0.00
3.41       3.44       -0.03   0.00
3.33       3.44       -0.11   0.00
2.9        2.86       0.04    0.00
3.75       3.76       -0.01   0.00
2.75       2.62       0.13    0.01
3.45       3.45       0.00    0.00
2.11       2.26       -0.15   0.01
3.21       3.11       0.10    0.00
3.25       3.33       -0.08   0.00
2.58       2.66       -0.08   0.00
3.28       3.25       0.03    0.00
3.35       3.24       0.11    0.00
2.65       2.69       -0.04   0.00
Chi square : 0.055
Critical value : 16.92
p-value : 1.00
As per the decision rule, if p-value > 0.05, we retain the null. Therefore our
null is retained, which concludes data is normally distributed
For the current data set we proceed with the paired t-test, with the following hypotheses
H0: μm <= μnm
H1: μm > μnm
Decision rule: if p-value >= 0.05 or t Stat < t Critical, retain the null
                              Meditation    No-Meditation
Mean                          3.219         3.038
Variance                      0.218321111   0.245728889
Observations                  10            10
Pearson Correlation           0.933799791
Hypothesized Mean Difference  0.181
df                            9
t Stat                        0.0
P(T<=t) one-tail              0.5
t Critical one-tail           1.833112933
P(T<=t) two-tail              1
t Critical two-tail           2.262157163
Retaining the null as per the one-tail paired t-test: p-value (0.5) > 0.05 and t Stat (0.00) < t Critical (1.83).
Thus we can say that meditation doesn’t add to the GPA scores of students
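As a hedged cross-check, the paired t statistic can be recomputed from the ten GPA pairs listed in the Observed table. This sketch assumes a hypothesized mean difference of 0 (the usual null for "no improvement") rather than the 0.181 entered in the table above, which equals the observed mean difference and therefore forces the t Stat to 0. Under that assumption the statistic comes out near 3.23, above the one-tail critical value of 1.83, so the conclusion is sensitive to this choice.

```python
from math import sqrt
from statistics import mean, stdev

meditation    = [4, 2.65, 3.65, 2.55, 3.2, 3.6, 2.9, 3.41, 3.33, 2.9]
no_meditation = [3.75, 2.75, 3.45, 2.11, 3.21, 3.25, 2.58, 3.28, 3.35, 2.65]

# Paired test works on the within-pair differences
differences = [m - nm for m, nm in zip(meditation, no_meditation)]
mean_diff = mean(differences)

hypothesized_diff = 0.0  # assumption of this sketch, not the value used in the table
t_stat = (mean_diff - hypothesized_diff) / (stdev(differences) / sqrt(len(differences)))

t_critical_one_tail = 1.833112933  # from the table, df = 9
print(round(mean_diff, 3), round(t_stat, 3), t_stat > t_critical_one_tail)
```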
Question 8: A Car tyres supplier claims that the average life of the tyre is more than 50,000 kilometers. Based
on a sample of 50 tyres, the mean was estimated as 51,200 and the standard deviation 1500.
a. Use an appropriate hypothesis test to check whether the claim by the supplier is true.
b. If the actual average life is 48,000 kilometers, calculate the type II error and the power of hypothesis test.
Answer a:
Hypothesis :
H0: Average life of tyre is <=50000
H1: Average life of tyre is >50000
Deciding on the test: As the standard deviation is given and the sample size is > 30, we can assume that the sample mean is
normally distributed and go ahead with the Z-test
Values: X= 51200
μ- population mean = 50000
σ – sample standard deviation = 1500
n -sample size = 50
p - alpha value = 0.05
df = 50-1 = 49
Decision rule: If Z > 1.64 (this is Z critical, with alpha 0.05), reject the null
Formula: $Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$
Z = (51200 - 50000) / (1500 / sqrt(50)) ≈ 5.66
As Z > 1.64, the null hypothesis is rejected as per the decision rule. The sample therefore supports the supplier's claim that the average life of
the tyre is more than 50,000 km.
b. If the actual average life is 48,000 kilometers, calculate the type II error and the power of hypothesis test
TYPE 2 ERROR : When the null hypothesis is false and you fail to reject it, you make a type II error. The
probability of making a type II error is β, which depends on the power of the test.
The probability of rejecting the null hypothesis when it is false is equal to 1–β. This value is the power of
the test.
Values:
μ1: Actual Population Mean = 48000
μ0: Hypothesized Mean = 50000
σ = 1500
n = 50
Alpha = 0.05
H0: μ >= 50000
H1: μ < 50000
Decision rule: H0 will be rejected whenever Xbar is less than the critical value Xcrit, derived from
Formula 1: $Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$, which gives $X_{crit} = 50000 - 1.645 \times 1500 / \sqrt{50} \approx 49651$
Formula to calculate Beta: $\beta = P\left[Z > \frac{X_{crit} - \mu_1}{\sigma / \sqrt{n}}\right] = P\left[Z > \frac{49651 - 48000}{1500 / \sqrt{50}}\right] = P[Z > 7.78]$
Determining the area in the normal distribution curve for Z > 7.78:
Beta = 1 - NORM.S.DIST(7.78, TRUE) = 3.55E-15 (this is almost 0), so the power of the test, 1 - Beta, is approximately 1.
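The Z statistic of part a and the Type II error of part b can be sketched together. The variable names are illustrative; the tail area uses `math.erfc` instead of `1 - cdf` so that the very small probability does not vanish in floating-point rounding, and part b follows the left-tailed hypotheses stated above.

```python
from math import sqrt, erfc
from statistics import NormalDist

mu0, mu1 = 50000, 48000        # hypothesized and actual mean life (km)
sigma, n, alpha = 1500, 50, 0.05
sample_mean = 51200

std_error = sigma / sqrt(n)

# Part a: Z statistic for the supplier's claim
z_stat = (sample_mean - mu0) / std_error

# Part b: H0 is rejected when Xbar falls below x_crit (left-tailed test)
z_crit = NormalDist().inv_cdf(1 - alpha)          # about 1.645
x_crit = mu0 - z_crit * std_error

# Beta = P(fail to reject H0 | true mean = mu1) = P(Z > (x_crit - mu1) / SE)
z_beta = (x_crit - mu1) / std_error
beta = 0.5 * erfc(z_beta / sqrt(2))
power = 1 - beta

print(round(z_stat, 2), round(x_crit), round(z_beta, 2), beta, power)
```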
Question 9. Twenty-five overweight people were assigned three different programs for weight loss, namely: 1.
Diet, 2. Exercise and 3. Modification of eating behaviour. The weight changes are recorded after 3 months and
are shown in the table below. In the table, positive value indicates weight loss and negative value indicates
weight gain
S.No   Diet   Exercise   Modification of Eating Behaviour
1      0      -3         10
2      4      -1         1
3      3      8          0
4      5      4          12
5      -3     2          18
6      10     3          4
7             0          -2
8             4          5
9             -2         3
10                       4
Answer :
As the data set is very small, we were not able to establish that the data are normal through the various
tests conducted. The result of one such test is given below
data: QUESTION$X0
P = 12, p-value = 0.03479
Therefore, for this question we proceed with a non-parametric test, the Kruskal-Wallis test
The procedure for the test involves pooling the observations from the k samples into one combined sample,
keeping track of which sample each observation comes from, and then ranking them from lowest to highest, from 1 to N,
where N = n1 + n2 + ... + nk.
Hypothesis :
Decision rule : If the observed value of H is greater than or equal to the critical value, we reject H0 in favor of
H1; if the observed value of H is less than the critical value we do not reject H0.
S.No   Diet   Exercise   Modification of Eating Behaviour
1      0      -3         10
2      4      -1         1
3      3      8          0
4      5      4          12
5      -3     2          18
6      10     3          4
7             0          -2
8             4          5
9             -2         3
10                       4

Rank
S.No   Diet (N1)   Exercise (N2)   Modification of Eating Behaviour (N3)
1      7           1.5             22.5
2      16          5               9
3      12          21              7
4      19.5        16              24
5      1.5         10              25
6      22.5        12              16
7                  7               3.5
8                  16              19.5
9                  3.5             12
10                                 16
N3 Modification of Eating Behaviour 10 154.5 2387.025
N 25 4327
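The ranking and the Kruskal-Wallis H statistic can be sketched from the data table. This is an illustration only: it assumes the column assignment implied by the rank table (6 Diet, 9 Exercise and 10 Modification observations), uses average ranks for ties without a tie correction, and compares H with 5.991, the standard chi-square critical value for alpha = 0.05 and k - 1 = 2 degrees of freedom, which is not stated in the original answer.

```python
groups = {
    "Diet": [0, 4, 3, 5, -3, 10],
    "Exercise": [-3, -1, 8, 4, 2, 3, 0, 4, -2],
    "Modification": [10, 1, 0, 12, 18, 4, -2, 5, 3, 4],
}

# Pool all observations and give tied values the average of their positions
pooled = sorted(v for values in groups.values() for v in values)
positions = {}
for pos, value in enumerate(pooled, start=1):
    positions.setdefault(value, []).append(pos)
rank_of = {value: sum(p) / len(p) for value, p in positions.items()}

# Rank sum per group
rank_sums = {name: sum(rank_of[v] for v in values) for name, values in groups.items()}

# H = 12 / (N (N + 1)) * sum(R_i^2 / n_i) - 3 (N + 1)
N = len(pooled)
h_stat = 12 / (N * (N + 1)) * sum(
    rank_sums[name] ** 2 / len(values) for name, values in groups.items()
) - 3 * (N + 1)

critical_value = 5.991  # chi-square, alpha = 0.05, df = 2
print(rank_sums, round(h_stat, 3), h_stat >= critical_value)
```

The rank sum for Modification of Eating Behaviour reproduces the 154.5 shown in the table above.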
END OF ASSIGNMENT