Assignment: Module 1

Hypothesis testing: 12th August 2016

Priyanka Sindhwani

Question 1: Members of a mafia group gathered at a secret hideaway. The city police come to know about the
meeting and plan to arrest the leader of the group. The police know that, for security reasons, the members of
the mafia will leave the hideaway one by one in random order. As soon as the police arrest one of them, the other
members would be alerted and would flee. For this reason, the police would arrest a member only if they are
reasonably sure that he is the leader of the gang. The police know that the leader is the tallest member of the
gang, and they also know that the gang consists of 20 members. How can the police maximize the probability of
arresting the gang leader?

Answer 1:

This is an optimal stopping problem (the classic "secretary problem").

N = 20 (number of gangsters)

K = number of members the police let pass while only observing their heights

Nobody among the first K members is arrested. After that, the police arrest the first member who is taller than everyone seen so far.

Equation 1: P(K) = Σ P(A) * P(B), summed over the positions i = K+1, ..., N

A = the leader leaves at position i; B = the leader is the one arrested, given that he leaves at position i

P(A) = 1/N for every position. For i <= K, P(B) = 0, because nobody is arrested while the police only observe.

For i > K, the leader is arrested only if the tallest of the first i-1 members was among the first K, so P(B) = K/(i-1):

For i = K+1, P(A)*P(B) = (1/N) * (K/K) = 1/N

For i = K+2, P(A)*P(B) = (1/N) * (K/(K+1))

For i = K+3, P(A)*P(B) = (1/N) * (K/(K+2))

Similarly, for i = N, P(A)*P(B) = (1/N) * (K/(N-1))

Approximating the sum in Equation 1 by an integral, we get

Equation 2: P(K) = (K/N) * ln(N/K)

Maximizing Equation 2 by differentiating with respect to x = K/N:

Equation 3: P'(x) = -ln(x) - 1 = -ln(K/N) - 1

Setting Equation 3 equal to 0 to find the maximum:

ln(K/N) = -1, so K/N = e^-1 = 1/2.718 = 0.3679

With N = 20, K = N * e^-1 = 20 * 0.3679 = 7.4

Therefore we would let the first 7 members leave undisturbed, only observing their heights.

After that, we arrest the first member who is taller than all the members who left before him. The chance of
arresting the leader is then about 37% (the value of Equation 2 at K/N = e^-1).
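
The integral in Equation 2 is an approximation, so for N = 20 the cut-off can be checked against the exact sum in Equation 1. The sketch below is only an illustration: the function name `success_probability` is made up here, and it assumes the leave order is uniformly random, as the question states.

```python
from fractions import Fraction

def success_probability(n: int, k: int) -> Fraction:
    """Exact chance of arresting the leader when the first k members are only observed."""
    if k == 0:
        # Arresting the very first member succeeds only if he happens to be the leader.
        return Fraction(1, n)
    # Leader at position i is arrested only if the tallest of the first i-1 was among the first k.
    return sum(Fraction(1, n) * Fraction(k, i - 1) for i in range(k + 1, n + 1))

n = 20
best_k = max(range(n), key=lambda k: success_probability(n, k))
best_p = float(success_probability(n, best_k))
print(best_k, round(best_p, 4))
```

Under these assumptions the exact optimum is also to let 7 members pass, with a success probability of roughly 38%, slightly above the 37% given by the continuous approximation.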

Question 2: Alcohol checks are regularly conducted near MG road Bangalore. Drivers are first subjected to
breath test; if the test is positive then driver is taken for a blood test. Blood test will reveal whether the driver has
been driving under the influence of alcohol. The breath test yields positive results among 95% of drunken drivers
and yields positive results among 8% of sober drivers. According to current statistics one out of every 25 drivers
on the road drive under the influence of alcohol. Calculate the probability of a randomly tested driver being
unnecessarily subjected to a blood test after a positive breath test?

Answer 2: We use Bayes' theorem. A blood test is unnecessary when the breath test is positive although the driver
is sober, so the required probability is P(D'|+).

P(D): Probability that the driver is drunk = 1/25 = 0.04

P(D'): Probability that the driver is not drunk = 1 - P(D) = 0.96

P(+): Probability that the breath test is positive

P(+|D'): Probability that the test is positive given the driver is not drunk = 0.08

P(+|D): Probability that the test is positive given the driver is drunk = 0.95

BAYES' THEOREM

P(D'|+) = P(D') * P(+|D') / P(+)

By total probability, P(+) = P(D) * P(+|D) + P(D') * P(+|D'), so

P(D'|+) = P(D') * P(+|D') / [P(D) * P(+|D) + P(D') * P(+|D')]

Calculating P(+) = (0.04 * 0.95) + (0.96 * 0.08) = 0.038 + 0.0768 = 0.1148

Calculating P(D'|+) = (0.96 * 0.08) / 0.1148 = 0.669, or 67%

The probability of a randomly tested driver being unnecessarily subjected to a blood test after a positive breath
test is 0.669, or about 67%.
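
The same arithmetic can be reproduced in a few lines. This is a minimal sketch using only the rates quoted in the question; the variable names are illustrative.

```python
# Rates taken from the question text.
p_drunk = 1 / 25
p_pos_given_drunk = 0.95
p_pos_given_sober = 0.08

p_sober = 1 - p_drunk
# Total probability of a positive breath test.
p_pos = p_drunk * p_pos_given_drunk + p_sober * p_pos_given_sober
# Bayes' theorem: share of positive breath tests that come from sober drivers.
p_sober_given_pos = p_sober * p_pos_given_sober / p_pos
print(round(p_pos, 4), round(p_sober_given_pos, 3))
```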

Question 3: The number of insurance claims per day follows a Poisson distribution at a rate of 22 claims per day.
Calculate the probability that the number of claims exceeds 30 in a day. If the chance of fraudulent claim is 0.05,
calculate the probability that there will be at least 2 fraudulent claims in any given day.

Answer 3. Part 1

The Poisson distribution has the following properties:

The mean of the distribution is equal to μ.

The variance is also equal to μ.

From the question:

Mean/μ = 22

x = 30 (we need P(X > 30) = 1 - P(X <= 30))

Probability that the number of claims exceeds 30 in a day (Excel formula): POISSON(x, mean, cumulative)

POISSON(30, 22, TRUE): this gives the cumulative probability of at most 30 claims, P(X <= 30), which is 0.95948

Therefore the probability that the claims exceed 30 is (1 - 0.95948) = 0.04052

Probability of claims exceeding 30 is 0.0405

Part 2

Chance of a fraudulent claim (p) = 0.05

Probability of at least 2 fraudulent claims per day

Mean/μ = 22 * 0.05 = 1.1

x = 1 (we need P(X >= 2) = 1 - P(X <= 1))

POISSON(1, 1.1, TRUE) = 0.6990, therefore P(at least 2 fraudulent claims) = (1 - 0.6990) = 0.30

Probability of at least 2 fraudulent claims in a day is 0.30
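
As a cross-check on the Excel values, the cumulative Poisson probability can be summed term by term. The helper `poisson_cdf` below is a hypothetical name written for this sketch, not a library function.

```python
import math

def poisson_cdf(k: int, mean: float) -> float:
    """P(X <= k) for X ~ Poisson(mean), summed term by term."""
    return sum(math.exp(-mean) * mean**i / math.factorial(i) for i in range(k + 1))

claims_per_day = 22
p_more_than_30 = 1 - poisson_cdf(30, claims_per_day)

# Fraudulent claims: rate 22 * 0.05 = 1.1 per day.
fraud_rate = claims_per_day * 0.05
p_at_least_2_fraud = 1 - poisson_cdf(1, fraud_rate)
print(round(p_more_than_30, 4), round(p_at_least_2_fraud, 4))
```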

Question 4: Read the case study, “HR Analytics at Scalene-works”, and answer the following questions.

1. Use a goodness of fit approach to check whether the “offered increase in CTC” in the case follows a normal
distribution.

2. Assume that the offered increase in CTC follows a normal distribution, what is the probability that offered
increase in CTC is more than 50%?

3. Is there a statistical evidence to suggest that the notice period has different influence on the joining of the job
by applicants? Use an appropriate statistical test to answer question.

4. Check whether the expected increase in CTC is different for men and women using an appropriate statistical
test.

Part1: (Use a goodness of fit approach to check whether the “offered increase in CTC” in the case follows a
normal distribution)

Answer:

Step 1: Clean the data. For this question we deleted only the blank values; deleting definite outliers made no
difference to the end result.

(We rounded the values in the data set so that each fits into a bin.)

Assumptions:

The data are the observed frequencies.

The degrees of freedom depend on the number of categories, not on the sample size: df = number of bins - 1 - number
of estimated parameters. Here the mean and standard deviation are estimated from the data, so df = 14 - 1 - 2 = 11.
It is always a right-tailed test.
The test statistic has a chi-square distribution.
The value of the test statistic doesn't change if the order of the categories is switched.

Step 2: Setting up the Hypothesis

H0: The data is consistent with a normal distribution
H1: The data is not consistent with a normal distribution

Step 3: Decision rule

If p-value > 0.05 (equivalently, chi-square statistic < critical value), we retain the null hypothesis
Mean/μ = 34.3
Std. deviation/σ = 26.25
No. of observations (N) = 11469
No. of bins = 1 + 3.3*Log10(N) = 14
Bin width = (Max - Min)/No. of bins = 13
Alpha: 0.05
Df = 11

Bin No   Lower   Upper   Observed   Expected   (o-e)^2/e
1        -61     -48     18         8.2        11.7
2        -47     -34     60         41.9       7.8
3        -33     -20     156        161.7      0.2
4        -19     -6      385        472.0      16.0
5        -5      8       1036       1042.9     0.0
6        9       22      1769       1743.6     0.4
7        23      36      2790       2206.1     154.5
8        37      50      3069       2112.6     432.9
9        51      64      923        1531.1     241.5
10       65      78      588        839.8      75.5
11       79      92      317        348.6      2.9
12       93      106     185        109.5      52.1
13       107     120     121        26.0       347.0
14       121     134     52         4.7        479.2
Total                                          1821.8

Critical value (CHIINV(0.05, 11)) = 19.67514

Chi-square p-value: 0.00E+00

As per the decision rule, p-value < 0.05 and the chi-square statistic (1821.8) > critical value (19.68). Therefore we
reject the null, which establishes that the data do not follow a normal distribution.

We also ran the Pearson chi-square and Lilliefors (Kolmogorov-Smirnov) tests for normality, which gave the
following results:

Pearson chi-square normality test: p-value < 2.2e-16
Lilliefors (Kolmogorov-Smirnov) normality test: p-value < 2.2e-16

Reconfirming that the data are not consistent with a normal distribution
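
The chi-square statistic can be recomputed from the binned table. This sketch uses the rounded expected counts printed in the table, so its total lands a few units away from the reported 1821.8 (the last bin is especially sensitive to rounding); the critical value is the CHIINV figure quoted in the text.

```python
# Observed and expected counts for the 14 bins, copied from the table.
observed = [18, 60, 156, 385, 1036, 1769, 2790, 3069, 923, 588, 317, 185, 121, 52]
expected = [8.2, 41.9, 161.7, 472.0, 1042.9, 1743.6, 2206.1, 2112.6,
            1531.1, 839.8, 348.6, 109.5, 26.0, 4.7]

chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# df = bins - 1 - 2 estimated parameters (mean and standard deviation).
df = len(observed) - 1 - 2
critical_value = 19.67514  # CHIINV(0.05, 11) as quoted in the text
reject_normality = chi_square > critical_value
print(round(chi_square, 1), df, reject_normality)
```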

Part2: Assume that the offered increase in CTC follows a normal distribution, what is the probability that offered
increase in CTC is more than 50%?

Answer:

The probability of an offered increase of more than 50% is 1 minus the area under the normal curve up to 50%.

Descriptive statistics of the offered increase in CTC:

Mean             37.77202
Median           34.48
Mode             42.86
Dispersion
Std. Deviation   35.59332

Calculation: 1 - NORMDIST(x = 50, mean = 37.77, std. dev = 35.59, cumulative = TRUE)

= 37% (probability that the offered increase is above 50%)
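
The NORMDIST calculation can be sketched with the error function from the standard library. The helper name `normal_cdf` is illustrative; the mean and standard deviation are the values listed above.

```python
import math

def normal_cdf(x: float, mean: float, sd: float) -> float:
    """P(X <= x) for a normal distribution, via the error function."""
    return 0.5 * (1 + math.erf((x - mean) / (sd * math.sqrt(2))))

# Upper-tail probability of an offered increase above 50%.
p_above_50 = 1 - normal_cdf(50, mean=37.77202, sd=35.59332)
print(round(p_above_50, 3))
```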

Part 3: Is there a statistical evidence to suggest that the notice period has different influence on the joining of the
job by applicants? Use an appropriate statistical test to answer question.

Answer

Setting up the Hypothesis

H0: Notice period doesn't have an influence on joining

H1: Notice period has an influence on joining

Decision rule: If p-value > 0.05 (equivalently, chi-square statistic < critical value), we retain the null hypothesis

Alpha: 0.05

Df = (No. of rows - 1) * (No. of columns - 1) = 6 * 1 = 6

Observed

Notice Period   Joined   Not Joined   Total
0               1571     221          1792
30              4743     1542         6285
45              453      265          718
60              1369     995          2364
75              94       97           191
90              461      440          901
120             34       48           82
Total           8725     3608         12333

Expected

Notice Period   Joined   Not Joined   Total
0               1268     524          1792
30              4446     1839         6285
45              508      210          718
60              1672     692          2364
75              135      56           191
90              637      264          901
120             58       24           82
Total           8725     3608         12333

Chi square computation

Notice Period   Outcome      Observed   Expected   (o-e)   (o-e)^2/e
0               Joined       1571       1268       303     73
30              Joined       4743       4446       297     20
45              Joined       453        508        -55     6
60              Joined       1369       1672       -303    55
75              Joined       94         135        -41     13
90              Joined       461        637        -176    49
120             Joined       34         58         -24     10
0               Not Joined   221        524        -303    175
30              Not Joined   1542       1839       -297    48
45              Not Joined   265        210        55      14
60              Not Joined   995        692        303     133
75              Not Joined   97         56         41      30
90              Not Joined   440        264        176     118
120             Not Joined   48         24         24      24

Chi square = 768

Critical Value: CHIINV(0.05,6) = 12.59 (approximately 13)

p-value: CHISQ.TEST(Observed,Expected) = 1.12E-155

As per the decision rule, p-value < 0.05 and the chi-square statistic (768) > critical value (12.59). Therefore we
reject the null, thus stating that

notice period has an influence on joining
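
The expected counts and the chi-square statistic follow mechanically from the observed table. The sketch below derives them from the row and column totals; because it uses unrounded expected counts, the statistic may differ by a unit or two from the 768 obtained with rounded cells.

```python
# Observed (joined, not joined) counts by notice period in days, from the table.
observed = {
    0: (1571, 221), 30: (4743, 1542), 45: (453, 265), 60: (1369, 995),
    75: (94, 97), 90: (461, 440), 120: (34, 48),
}

col_totals = [sum(row[j] for row in observed.values()) for j in range(2)]
grand_total = sum(col_totals)

chi_square = 0.0
for row in observed.values():
    row_total = sum(row)
    for j, obs in enumerate(row):
        # Expected count under independence: row total * column total / grand total.
        expected = row_total * col_totals[j] / grand_total
        chi_square += (obs - expected) ** 2 / expected

df = (len(observed) - 1) * (len(col_totals) - 1)
print(round(chi_square), df)
```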

Part4. Check whether the expected increase in CTC is different for men and women using an appropriate
statistical test.

Answer
From the data set we take gender and expected increase in CTC. We assign random numbers and then draw samples of
equal size for men and women; in this case we took a sample of 100 men and a sample of 100 women, with their
respective values.

Next step: with these two samples we conduct a t-test.

Assumptions made for a t-test:
The first assumption is that the data are measured on a continuous or ordinal scale.
The second assumption is that of a simple random sample: the data are collected from a
representative, randomly selected portion of the total population.
The third assumption is that the data, when plotted, follow a normal, bell-shaped
distribution curve.

[Histogram: expected increase vs. frequency; the histogram follows a normal distribution]

The fourth assumption is that a reasonably large sample size is used. A larger sample size means that the
distribution of results should approach a normal bell-shaped curve.

Next step is to compare the variances of the two samples; this is done through an F-test

Hypothesis:
H0: σ1^2 = σ2^2
H1: σ1^2 != σ2^2

Decision rule: if p-value > 0.05, then retain the null, or else reject it

F test to compare two variances (done in R)

F = 0.64507, num df = 99, denom df = 99, p-value = 0.03024
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
0.4340292 0.9587232
sample estimates:
ratio of variances
0.6450689

As p-value (0.03) < 0.05, we reject the null and therefore conduct the unequal-variance (Welch) t-test

Hypothesis: H0: μ men - μ female = 0
H1: μ men - μ female != 0

Decision rule: this is a two-tailed t-test; if p-value > 0.05 (equivalently, |t Stat| < t Critical two-tail), then
retain the null, or else reject it

t-Test: Two-Sample Assuming Unequal Variances

                               Variable 1     Variable 2
Mean                           42.2451        45.5484
Variance                       3070.561108    3936.102561
Observations                   100            100
Hypothesized Mean Difference   0
Df                             195
t Stat                         -0.394632076
P(T<=t) one-tail               0.34677286
t Critical one-tail            1.65270531
P(T<=t) two-tail               0.693545719
t Critical two-tail            1.972204051

As per the decision rule, the two-tail p-value (0.69) > 0.05 and |t Stat| (0.39) < t Critical two-tail (1.97).
Therefore we retain the null.
Thus there is no statistical evidence that the mean expected increase in CTC differs between men and women.
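
The t Stat and Df in the Excel output can be reproduced from the summary statistics alone. This is a minimal sketch of the Welch statistic and the Welch-Satterthwaite degrees of freedom; the variable names are illustrative and the inputs are the means, variances and sample sizes reported in the table.

```python
import math

# Summary statistics from the t-test table (Variable 1 and Variable 2).
mean_1, var_1, n_1 = 42.2451, 3070.561108, 100
mean_2, var_2, n_2 = 45.5484, 3936.102561, 100

# Welch t statistic: difference in means over the unpooled standard error.
se_1, se_2 = var_1 / n_1, var_2 / n_2
t_stat = (mean_1 - mean_2) / math.sqrt(se_1 + se_2)

# Welch-Satterthwaite approximation for the degrees of freedom.
welch_df = (se_1 + se_2) ** 2 / (se_1**2 / (n_1 - 1) + se_2**2 / (n_2 - 1))
print(round(t_stat, 4), round(welch_df))
```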

Question 5
Read the case: ”A Dean’s Dilemma: Selection of Students for the MBA Program”, and answer the
following questions.
1. Carry out descriptive analytics (use different data visualization approaches). What insights are you able
to gain using descriptive analytics about the students who are placed and not placed?

2. In a random selection of 20 students, what is the probability that exactly 5 students are not placed?
What is the probability that at least 5 students are not placed?

3. Consider only the data of students who were placed and answer the following questions:
a. Is there a statistical evidence to suggest that the average salary of students with average score of 60
marks in SSLC is less than the average salary of students with an average score of more than 60 marks in
SSLC? Use the appropriate statistical test.

b. The Dean, Easwaran Iyer, believes that the male students earn at least 10000 more than female students
per annum. Do an appropriate test to validate this belief.

c. Students from CBSE Board (in SSC) are given higher priority during the admission, is this admission
policy justified? Justify your answer.

4. What will be your recommendations to Dr. Easwaran Iyer based on your response to questions 1, 2 and
3.

Part 1: Carry out descriptive analytics (use different data visualization approaches). What insights are you
able to gain using descriptive analytics about the students who are placed and not placed?

Insights from the charts:

In Marketing & IB, the fraction of students not getting placed is higher than in the other 2 specializations of
the MBA.

Fewer female students get placed than male students. It was also observed that the average salary received by
female students is 11% less than that of male students.

The majority of students who had Commerce as their subject in graduation prefer to take up Marketing & Finance as
their specialization; similarly, Marketing & HR is taken up by students who had Management as their subject in
graduation.

a) The average salary received by students with a specialization in Marketing & IB and 1 year of experience is
88% higher than that of freshers.

b) In Marketing & Finance, more experience also brings a higher salary.

c) For students with the Marketing & HR specialization, prior experience doesn't have any relevant effect on
the salary.

Placed %

Graduation subject      Marketing and Finance   Management & HR   Marketing & IB
Arts                    75.00%                  77.78%
Commerce                80.46%                  78.57%            100%
Computer Applications   75.00%                  75.00%
Engineering             94.74%                  71.43%            50%
Management              75.00%                  84.72%            71%
Others                  0.00%                   100.00%
Science                 100.00%                 76.92%

A few more observations from the data

All the students who scored 80 or more in communication got placed; for scores below 80, however, there
doesn't seem to be any relationship between communication score and placement.
Students who passed HSC from CBSE have a placement rate of 84%, whereas the others range from 77-79%.
A degree in engineering doesn't play any significant role in getting placed.

Part 2.In a random selection of 20 students, what is the probability that exactly 5 students are not placed?
What is the probability that at least 5 students are not placed?

Answer 2.

This question can be addressed by treating each student as a Bernoulli trial, so that the number of students not
placed follows a Binomial distribution. The assumptions are as follows:

There are only two outcomes, 1 or 0, i.e., success or failure, each time. Here "success" means that the student is
not placed.

If the probability of success is p, then the probability of failure is (1-p), and this remains the same across
each successive trial.

The probabilities are not affected by the outcomes of other trials, which means the trials are independent.

Calculation of the probability that exactly 5 students are not placed

No. of trials (N) = 20

x = 5

From the data set we get the success probability (p) = 79/391 = 0.202046036

P(X = 5): BINOMDIST(5, 20, 0.202046, FALSE) = 0.177

Answer 1: The probability that exactly 5 students are not placed is 0.177

Calculation of the probability that at least 5 students are not placed

No. of trials = 20

P(X >= 5) = 1 - P(X <= 4), so x = 4 in the cumulative formula

p = 0.202046036

1 - BINOMDIST(4, 20, 0.202046, TRUE) = 0.379

Answer 2: The probability that at least 5 students are not placed is 0.379
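
Both binomial probabilities can be checked directly from the probability mass function. This is a small sketch in which "success" means "not placed", with p = 79/391 as above; `binom_pmf` is a helper written for this illustration.

```python
import math

def binom_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for X ~ Binomial(n, p)."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)

n_students = 20
p_not_placed = 79 / 391

p_exactly_5 = binom_pmf(5, n_students, p_not_placed)
# P(X >= 5) = 1 - P(X <= 4)
p_at_least_5 = 1 - sum(binom_pmf(x, n_students, p_not_placed) for x in range(5))
print(round(p_exactly_5, 3), round(p_at_least_5, 3))
```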

a.Is there a statistical evidence to suggest that the average salary of students with average score of 60
marks in SSLC is less than the average salary of students with an average score of more than 60 marks in
SSLC? Use the appropriate statistical test.

Answer a

Average salary of students with an average score of 60 marks or less = μ60

Average salary of students with an average score of more than 60 = μover60

As each sample has more than 30 observations, we assume the sample means are approximately normally distributed
and proceed with a t-test. To check whether the two groups have equal or unequal variances, we first conduct an
F-test with the following hypothesis:

H0: σ1^2 = σ2^2

H1: σ1^2 != σ2^2

Decision rule: If p-value > 0.05, then retain the null

F-Test Two-Sample for Variances

                      Value 1     Value 2
Mean                  279908.5    264935.7798
Variance              6.53E+09    12843523615
Observations          201         109
df                    200         108
F                     0.508227
P(F<=f) one-tail      1.9E-05
F Critical one-tail   0.761805

As p-value < 0.05, we reject the null; thus the variances are not equal. Therefore we proceed with the
unequal-variance t-test

Setting the Hypothesis

H0: μ60 >= μover60

H1: μ60 < μover60

Decision rule: this is a left-tailed test; reject the null if t Stat < -t Critical one-tail, otherwise retain the null

t-Test: Two-Sample Assuming Unequal Variances

                               Variable 1   Variable 2
Mean                           279908.5     264935.7798
Variance                       6.53E+09     12843523615
Observations                   201          109
Hypothesized Mean Difference   15059.41
df                             169
t Stat                         -0.00707
P(T<=t) one-tail               0.497182
t Critical one-tail            1.65392
P(T<=t) two-tail               0.994364
t Critical two-tail            1.9741

Looking at the values of only one tail: t Stat (-0.007) is not below -t Critical (-1.65), and the p-value (0.497) > 0.05,
so we retain the null hypothesis. Thus there is no statistical evidence that the average salary of students with an
average score of 60 or less is less than that of students with an average score of more than 60

b. The Dean, Easwaran Iyer, believes that the male students earn at least 10000 more than female
students per annum. Do an appropriate test to validate this belief.

To conduct this test we first collected random samples of male and female students.

Then we tested the samples for normality, with

H0: data is normally distributed

H1: data is not normally distributed

Decision rule: retain the null if p-value > 0.05

Pearson chi-square normality test

data: mv2$MANDF
P = 14.5, p-value = 0.1056

As p-value > 0.05, we retain the null. Thus the data are treated as normally distributed

Therefore we proceed with the unequal-variance t-test.

Formulating the Hypothesis:

H0: μm - μf >= 10000

H1: μm - μf < 10000

Statistical test: a two-sample t-test; based on H0, it is a left-tailed, one-tailed t-test

Decision rule: reject the null only if t Stat < -t Critical one-tail (left-tail p-value < 0.05); otherwise retain
the null

t-Test: Two-Sample Assuming Unequal Variances

                               Male        Female
Mean                           284241.9    253068.0412
Variance                       9.89E+09    5504236572
Observations                   215         97
Hypothesized Mean Difference   10000
df                             243
t Stat                         2.089079
P(T<=t) one-tail               0.018871
t Critical one-tail            1.651148
P(T<=t) two-tail               0.037741
t Critical two-tail            1.969774

The t Stat (2.09) is positive, so it does not fall in the left-tail rejection region (t Stat < -1.65); the reported
one-tail p-value (0.019) is the area in the right tail, which is the tail opposite to H1. Therefore we retain the null.

Thus the data do not contradict the Dean's belief: the sample difference in mean salary between men and women
(284241.9 - 253068.0 = 31173.9) is above 10,000

c. Students from CBSE Board (in SSC) are given higher priority during the admission, is this admission
policy justified? Justify your answer.

We compare the salaries of the placed students by their SSC board

We assume the data are normally distributed, as we randomly drew the data sets for both CBSE and the other
boards, and proceed with a t-test. To decide between the equal- and unequal-variance t-test, we first conduct an
F-test

Formulating Hypothesis for F-test:

H0: σ1^2 = σ2^2
H1: σ1^2 != σ2^2

Decision rule: if p-value >= 0.05, then retain the null

F-Test Two-Sample for Variances

                      CBSE        OTHERS
Mean                  276793.6    273582.6
Variance              1.25E+10    7.1E+09
Observations          94          218
df                    93          217
F                     1.766925
P(F<=f) one-tail      0.000373
F Critical one-tail   1.322861

As p-value is < 0.05, we reject the null, thus concluding that the two groups have unequal variances

Setting the Hypothesis for unequal variance t-test

Average salary of students from CBSE = μcbse

Average salary of students from other boards = μothers

H0: μcbse <= μothers

H1: μcbse > μothers

Statistical test: Right tailed t test

Decision: if p-value >= 0.05, retain the null or if T-stat < T-critical value, retain the null

t-Test: Two-Sample Assuming Unequal Variances

                               CBSE        OTHERS
Mean                           276793.6    273582.5688
Variance                       1.25E+10    7102566884
Observations                   94          218
Hypothesized Mean Difference   3211.048
Df                             140
t Stat                         1.66E-08
P(T<=t) one-tail               0.5
t Critical one-tail            1.655811
P(T<=t) two-tail               1
t Critical two-tail            1.977054

p-value >= 0.05 and also t Stat (1.66E-08) < t Critical (1.65). Thus we retain the null.

Conclusion: The strategy needs to change, as the null hypothesis is retained: there is no evidence that CBSE
students earn a higher average salary than students from other boards, so giving them higher priority is not justified

4. What will be your recommendations to Dr. Easwaran Iyer based on your response to questions 1, 2 and
3.

Answer. As per the findings from the above questions, we would recommend that the following points be taken into
consideration

Commerce students should be given preference or encouraged to take up Marketing & IB, as the
placement of students with Commerce is 100% in Marketing & IB
Engineering and Commerce students should be given preference or encouraged to take up Marketing &
Finance
For both Marketing & Finance and Marketing & IB, people with more experience should be given
preference
Students should not be selected on the basis of their boards, meaning CBSE vs ICSE should not be the deciding
criterion
As all students who scored 80 or more in communication got placed, a communication test can be included as a
pre-interview criterion

Question 6
Read the case Hawthrone Plastics Inc (uploaded in Moodle). Answer the following questions:

a. Which process should Mr. Nelson specify for the manufacture of polypropylene strappings?

b. How much of premium price per pound could Nelson afford to pay, to acquire raw material which was
guaranteed of the long molecular chain variety?

c. What further analysis would you suggest?

a.Which process should Mr. Nelson specify for the manufacture of polypropylene strappings?

Answer a.

Cost per pound

                       Process 1   Process 2   Process 3
Variable cost          0.13        0.15        0.17
Set up cost            0.002       0.007       0.012
Clean up cost          0           0.003       0.0025
Raw Material cost      0.25        0.25        0.25
Total cost per batch   38200       40950       43450

Revenue per Batch

Avg quality    50,000
High quality   60,000

Abbreviations: LM: Long Molecule; SM: Small Molecule; LC: Long Chain; SC: Small Chain; RM: Raw Material

To answer the above question, we built a decision tree for each of the three processes, to decide
how much maximum earning can be expected from each

Process 1: [decision tree figure]

Process 2: [decision tree figure]

Process 3: [decision tree figure]

We will be proceeding with Process 2 with test, as that gives an earning of $12,450, which is more than the other
processes

b.How much of premium price per pound could Nelson afford to pay, to acquire raw material
which was guaranteed of the long molecular chain variety?

To calculate the premium, we first take the process that yields the maximum profit when the raw material is
guaranteed to be of the long chain variety. As it is already guaranteed to be LM, we would also save on the test cost

Calculations

Process 3 with LC (60,000 - expenditure of Process 3)        16550
Process 2 output                                             12450
Difference (Process 3 with long chain - Process 2 output)    4100
Difference per pound                                         0.041
Raw material cost per pound                                  0.25
Raw material cost per pound, long chain                      0.29

So up to about 4 cents more per pound can be paid
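
The premium calculation is plain arithmetic and can be sketched as below. The batch size of 100,000 pounds is an assumption: it is not stated in this answer but is implied by the figures (a difference of 4100 per batch corresponding to 0.041 per pound).

```python
# Figures taken from the calculation table above.
revenue_high_quality = 60_000
cost_process_3 = 43_450
earning_process_2_with_test = 12_450

# Assumption: batch size implied by 4100 per batch = 0.041 per pound.
batch_pounds = 100_000

earning_process_3_long_chain = revenue_high_quality - cost_process_3
premium_per_pound = (earning_process_3_long_chain - earning_process_2_with_test) / batch_pounds
max_raw_material_price = 0.25 + premium_per_pound
print(earning_process_3_long_chain, premium_per_pound, round(max_raw_material_price, 3))
```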

c. What further analysis would you suggest?

As the study already mentions that Hawthrone wants to start its own line of business, we analyse what happens if
it starts the plastic strapping business on its own instead of taking it up from another company.

If Hawthrone gets into the business of plastic strapping, the revenue per batch is 100000 for high quality and
75000 for average quality

The earnings below have been calculated by putting the new revenue into the already profitable decision tree, that is
Process 2 with test

Process 2:

Revenue per Batch

Avg quality    75000
High quality   100000

Earnings in $

Current earning    26650
Previous earning   12450
Profit             14200

Process 2

Variable cost          0.15
Set up cost            0.007
Clean up cost          0.003
Raw Material cost      0.25
Total cost per batch   40950

The additional profit is 14200, which is 114% of the previous earning, if Hawthrone produces and sells its own
strapping
Question 7: An educational psychologist wants to check the claims of some spiritualists that a daily practice of
meditation among the students will improve the academic achievement of the students. To control the
experiment for academic aptitude, pairs of college students with similar grade point averages (GPA) are
randomly assigned to either a group that receives daily training in meditation or a group that doesn’t receive
training in meditation. At the end of the experiment which lasts for one term, the following GPAs were
reported for 10 pairs of participants:

Pair Number Meditation No-Meditation

1 4 3.75
2 2.65 2.75
3 3.65 3.45
4 2.55 2.11
5 3.2 3.21
6 3.6 3.25
7 2.9 2.58
8 3.41 3.28
9 3.33 3.35
10 2.9 2.65

Answer 7: We first need to establish whether the data are normally distributed

Hypothesis test: H0: The data set is normally distributed

H1: The data set is not normally distributed

Decision rule: if p-value > 0.05, retain the null

To check normality, we conducted the following tests

1. Pearson normality test

Pearson chi-square normality test

data: medi$Observed
P = 1, p-value = 0.9098

As per the decision rule, p-value > 0.05, therefore we retain the null

2. Chi-square test (goodness of fit)

Observed

Pair Number   Meditation   No-Meditation   Total
1             4            3.75            7.75
2             2.65         2.75            5.4
3             3.65         3.45            7.1
4             2.55         2.11            4.66
5             3.2          3.21            6.41
6             3.6          3.25            6.85
7             2.9          2.58            5.48
8             3.41         3.28            6.69
9             3.33         3.35            6.68
10            2.9          2.65            5.55
Total         32.19        30.38           62.57

Expected

Pair Number   Meditation   No-Meditation   Total
1             3.99         3.762906        7.75
2             2.78         2.621895        5.40
3             3.65         3.447307        7.10
4             2.40         2.262599        4.66
5             3.30         3.112287        6.41
6             3.52         3.325923        6.85
7             2.82         2.660738        5.48
8             3.44         3.248237        6.69
9             3.44         3.243382        6.68
10            2.86         2.694726        5.55
Total         32.19        30.38           62.57

Chi square computation

Group           Observed   Expected   (o-e)   (o-e)^2/e
Meditation      4          3.99       0.01    0.00
Meditation      2.65       2.78       -0.13   0.01
Meditation      3.65       3.65       0.00    0.00
Meditation      2.55       2.40       0.15    0.01
Meditation      3.2        3.30       -0.10   0.00
Meditation      3.6        3.52       0.08    0.00
Meditation      2.9        2.82       0.08    0.00
Meditation      3.41       3.44       -0.03   0.00
Meditation      3.33       3.44       -0.11   0.00
Meditation      2.9        2.86       0.04    0.00
No-Meditation   3.75       3.76       -0.01   0.00
No-Meditation   2.75       2.62       0.13    0.01
No-Meditation   3.45       3.45       0.00    0.00
No-Meditation   2.11       2.26       -0.15   0.01
No-Meditation   3.21       3.11       0.10    0.00
No-Meditation   3.25       3.33       -0.08   0.00
No-Meditation   2.58       2.66       -0.08   0.00
No-Meditation   3.28       3.25       0.03    0.00
No-Meditation   3.35       3.24       0.11    0.00
No-Meditation   2.65       2.69       -0.04   0.00

Chi square: 0.055

Critical value: 16.92

p-value: 1.00

As per the decision rule, if p-value > 0.05, we retain the null. Therefore our null is retained, which concludes
that the data are normally distributed

For the current data set we proceed with the paired t-test, which has the following assumptions

The dependent variable should be measured on a continuous scale

The independent variable should consist of two categorical "related groups" or "matched pairs"
The distribution of the differences in the dependent variable between the two related groups should
be approximately normally distributed

Thus establishing that we can run a paired t-test on the data set

GPA score of students who meditate = μm

GPA score of students who do not meditate = μnm

Setting the Hypothesis

H0: μm <= μnm

H1: μm > μnm

Decision rule: if p-value >= 0.05, retain the null or if T-stat < T-critical value, retain the null

The hypothesized mean difference under H0 is 0. (Setting it to the observed mean difference of 0.181 forces the
t Stat to 0 and the one-tail p-value to 0.5, whatever the data.) With a mean difference of 0.181 and a variance of the
paired differences of 0.0315, t Stat = 0.181 / Sqrt(0.0315/10) = 3.23.

t-Test: Paired Two Sample for Means

                               Meditation    No-Meditation
Mean                           3.219         3.038
Variance                       0.218321111   0.245728889
Observations                   10            10
Pearson Correlation            0.933799791
Hypothesized Mean Difference   0
df                             9
t Stat                         3.23
P(T<=t) one-tail               about 0.005
t Critical one-tail            1.833112933
t Critical two-tail            2.262157163

As per the one-tail paired t-test, t Stat (3.23) > t Critical (1.83) and the p-value (about 0.005) < 0.05, so we
reject the null.

Thus the data suggest that meditation improves the GPA scores of students
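
The paired t statistic can be computed directly from the ten pairs listed in the question. The sketch below tests against a mean difference of 0, as stated in H0: μm <= μnm; a hypothesized difference equal to the observed 0.181 would give a t Stat of 0 by construction.

```python
import math
from statistics import mean, stdev

# GPAs of the 10 matched pairs, from the question.
meditation = [4, 2.65, 3.65, 2.55, 3.2, 3.6, 2.9, 3.41, 3.33, 2.9]
no_meditation = [3.75, 2.75, 3.45, 2.11, 3.21, 3.25, 2.58, 3.28, 3.35, 2.65]

diffs = [m - n for m, n in zip(meditation, no_meditation)]
mean_diff = mean(diffs)

# Paired t statistic against a hypothesized mean difference of 0.
t_stat = mean_diff / (stdev(diffs) / math.sqrt(len(diffs)))
t_critical_one_tail = 1.833112933  # df = 9, alpha = 0.05, as quoted in the Excel output
print(round(mean_diff, 3), round(t_stat, 2), t_stat > t_critical_one_tail)
```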

Question 8: A Car tyres supplier claims that the average life of the tyre is more than 50,000 kilometers. Based
on a sample of 50 tyres, the mean was estimated as 51,200 and the standard deviation 1500.
a. Use an appropriate hypothesis test to check whether the claim by the supplier is true.

b. If the actual average life is 48,000 kilometers, calculate the type II error and the power of hypothesis test.

Answer a:

Hypothesis:
H0: Average life of tyre is <= 50000
H1: Average life of tyre is > 50000

Deciding on the test: As the standard deviation is given and the sample size is > 30, we can assume that the sample
mean is approximately normally distributed and go ahead with a Z-test

Values: Xbar (sample mean) = 51200
μ (hypothesized population mean) = 50000
σ (standard deviation) = 1500
n (sample size) = 50
α (significance level) = 0.05
df = 50-1 = 49

Decision rule: If Z > 1.64 (this is z critical for a one-tailed test with alpha 0.05), reject the null

Formula: Z = (Xbar - μ) / (σ / Sqrt(n))

Calculation: Z = (51200 - 50000) / (1500 / Sqrt(50)) = 1200 / 212.13

Z = 5.66

As Z > 1.64, as per the decision rule we reject the null hypothesis. The sample therefore supports the supplier's
claim that the average life of the tyre is more than 50,000 km.

b. If the actual average life is 48,000 kilometers, calculate the type II error and the power of hypothesis test

TYPE 2 ERROR: When the null hypothesis is false and you fail to reject it, you make a type II error. The
probability of making a type II error is β.
The probability of rejecting the null hypothesis when it is false is equal to 1-β. This value is the power of
the test.

Values:
Actual Population Mean = 48000
Hypothesized Mean = 50000

σ = 1500
n = 50
Alpha = 0.05
H0: μ >= 50000
H1: μ < 50000

Compute Beta when μ = μ1 = 48000

μ = μ0 = 50000

Decision rule: H0 will be rejected whenever Xbar is less than the critical value given by

Xcritical = μ0 - Zcrit * σ / Sqrt(n) = 50000 - 1.645 * 1500 / Sqrt(50) = 49651.04

The null hypothesis will be rejected for Xbar <= 49651.04

Determining the Type II error when μ = μ1 = 48000.

Definition of Type II error: P(H0 not rejected | μ = 48000)

= P(Xbar > 49651.04 | μ = 48000)

Formula: Z = (Xbar - μ) / (σ / Sqrt(n))

Beta = P[Z > (Xcritical - μ1) / (σ / Sqrt(n))] = P[Z > (49651.04 - 48000) / (1500 / Sqrt(50))] = P[Z > 7.78]

Determining the area under the normal distribution curve for Z > 7.78

Therefore, Beta = 1 - NORM.S.DIST(7.78, TRUE) = 3.55E-15 (this is practically 0)

Thus Type II Error = 0

Power of test = 1 - Beta (3.55E-15) = 1
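
Both parts of this question reduce to a few standard-normal calculations. The sketch below uses the complementary error function for the upper tail; `normal_sf` is a helper written for this illustration, and 1.645 is the one-tail critical value used in the text.

```python
import math

def normal_sf(z: float) -> float:
    """Upper-tail probability P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))

sample_mean, mu0, sigma, n = 51_200, 50_000, 1_500, 50
std_error = sigma / math.sqrt(n)

# Part a: right-tailed Z-test of H0: mu <= 50000.
z_stat = (sample_mean - mu0) / std_error

# Part b: left-tailed rejection boundary, then beta when the true mean is 48000.
z_crit = 1.645
x_crit = mu0 - z_crit * std_error
beta = normal_sf((x_crit - 48_000) / std_error)
power = 1 - beta
print(round(z_stat, 2), round(x_crit, 2), beta, power)
```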

Question 9. Twenty-five overweight people were assigned three different programs for weight loss, namely: 1.
Diet, 2. Exercise and 3. Modification of eating behaviour. The weight changes are recorded after 3 months and
are shown in the table below. In the table, positive value indicates weight loss and negative value indicates
weight gain

S.No   Diet   Exercise   Modification of Eating Behaviour
1      0      -3         10
2      4      -1         1
3      3      8          0
4      5      4          12
5      -3     2          18
6      10     3          4
7      0                 -2
8      4                 5
9      -2                3
10                       4

Answer :

As the data set is very small, we were not able to establish that the data are normally distributed through the
tests conducted. Results are given below.

Pearson chi-square normality test

data: QUESTION$X0
P = 12, p-value = 0.03479

Since the p-value is below 0.05, normality is rejected. Therefore, for this question we proceed with a
non-parametric test, the Kruskal-Wallis test.

Assumptions of the Kruskal-Wallis test:

The Kruskal-Wallis test is a nonparametric (distribution-free) test; we use it when we are unable to
establish the assumptions of ANOVA
The k samples are random and independent
There are 5 or more measurements per sample
The probability distribution is continuous

The procedure for the test involves pooling the observations from the k samples into one combined sample,
keeping track of which sample each observation comes from, and then ranking lowest to highest from 1 to N,
where N = n1+n2 + ...+ nk.
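
The pooling and ranking step can be sketched in Python as follows. This is a minimal illustration, not the method used in the assignment (which was worked in a spreadsheet): the helper name `average_ranks` is hypothetical, tied values are assumed to receive the average of the ranks they span, and the assignment of values to the three programs is inferred from the group counts of 9, 6 and 10.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j to the end of the run of equal values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


# Weight changes per program (column assignment inferred from the counts)
samples = {
    "Diet": [0, 4, 3, 5, -3, 10, 0, 4, -2],
    "Exercise": [-3, -1, 8, 4, 2, 3],
    "Modification": [10, 1, 0, 12, 18, 4, -2, 5, 3, 4],
}

# Pool all observations, remembering which sample each one came from
pooled = [(name, v) for name, vals in samples.items() for v in vals]
ranks = average_ranks([v for _, v in pooled])

# Sum the ranks within each sample
rank_sums = {name: 0.0 for name in samples}
for (name, _), r in zip(pooled, ranks):
    rank_sums[name] += r

print(rank_sums)
```

The three rank sums must add up to N(N+1)/2 = 325 for N = 25, which is a quick check on the ranking.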

Hypothesis :

H0: The population medians are equal

H1: The population medians are not all equal

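Formula for the Kruskal-Wallis H statistic:

H = [12 / (N * (N + 1))] * Σ (Rj^2 / nj) – 3 * (N + 1)

where Rj is the sum of the ranks in sample j, nj is the number of observations in sample j, and N = n1 + n2 + ... + nk.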

Decision rule : If the observed value of H is greater than or equal to the critical value, we reject H0 in favor of
H1; if the observed value of H is less than the critical value we do not reject H0.


Rank

S.No   Diet (N1)   Exercise (N2)   Modification of Eating Behaviour (N3)
1      7           1.5             22.5
2      16          5               9
3      12          21              7
4      19.5        16              24
5      1.5         10              25
6      22.5        12              16
7      7                           3.5
8      16                          19.5
9      3.5                         12
10                                 16

Sample                                 Count   Sum     (Sum^2)/Count
N1 Diet                                9       105     1225
N2 Exercise                            6       65.5    715.0416667
N3 Modification of Eating Behaviour    10      154.5   2387.025
N  Total                               25              4327

Calculating the H statistic through the given formula:

H = [12 / (25 * (25 + 1))] * 4327 – 3 * (25 + 1) = 1.88

H statistic = 1.88
Critical value = CHIINV(0.05,2) = 5.99
p-value = CHIDIST(1.88,2) = 0.39

H <= critical value, therefore we fail to reject the null hypothesis. There is no significant difference in
weight loss between the three weight loss programs.
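
The H statistic, critical value and p-value can be reproduced with the sketch below, starting from the counts and rank sums in the table. The variable names are illustrative, and the closed form exp(-x/2) for the chi-square upper tail is used on the assumption that there are exactly 2 degrees of freedom (three groups); it does not hold for other degrees of freedom.

```python
from math import exp, log

# Counts and rank sums as reported in the summary table
counts = {"Diet": 9, "Exercise": 6, "Modification": 10}
rank_sums = {"Diet": 105.0, "Exercise": 65.5, "Modification": 154.5}

n_total = sum(counts.values())  # N = 25
sum_term = sum(rank_sums[g] ** 2 / counts[g] for g in counts)  # about 4327

# Kruskal-Wallis H without tie correction, as in the worked answer
h = 12 / (n_total * (n_total + 1)) * sum_term - 3 * (n_total + 1)

df = len(counts) - 1  # 2 degrees of freedom
alpha = 0.05

# For a chi-square distribution with df = 2 the upper tail is exp(-x / 2)
p_value = exp(-h / 2)
critical = -2 * log(alpha)  # inverse of the same tail formula

print(f"H = {h:.2f}, critical value = {critical:.2f}, p-value = {p_value:.2f}")
print("Reject H0" if h >= critical else "Fail to reject H0")
```

Library routines such as R's kruskal.test or SciPy's scipy.stats.kruskal apply a correction for tied ranks, so they would report a slightly larger H than the hand calculation; with this data the conclusion stays the same.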

………………………………………………………………………………………………………………
………………………………………………………………………………………………………………
END OF ASSIGNMENT

