
Advanced Statistics Project - Jayant Chandra

The document contains information about several statistical analyses performed on different datasets. 1) It analyzes injury data from different football positions and calculates probabilities related to injuries for specific positions. 2) It examines probability calculations for types of accidents at a nuclear power plant that could result in radiation leaks. 3) It looks at breaking strength data for gunny bags used for cement packaging and calculates proportions falling in different strength ranges. 4) It analyzes exam grade data from a training course and calculates probabilities related to student scores. 5) It provides data on stone hardness and examines whether a company is justified in believing a batch of stones may not be suitable for printing based on a minimum hardness level.

Advanced Statistics Project

By

Jayant Chandra
Problem 1

A physiotherapist with a male football team is interested in studying the relationship between
foot injuries and the positions at which the players play, from the data collected:

                      Striker   Forward   Attacking Midfielder   Winger   Total
Players Injured          45        56              24              20      145
Players Not Injured      32        38              11               9       90
Total                    77        94              35              29      235

1.1 What is the probability that a randomly chosen player would suffer an injury?

The probability that a randomly chosen player would suffer an injury ≈ 61.7%
Number of players who suffered an injury = 145
Total number of players = 235
The probability that a randomly chosen player would suffer an injury = 145 / 235 = 0.6170

1.2 What is the probability that a player is a forward or a winger?

The probability that a randomly chosen player is a forward or a winger ≈ 52%

Number of players who are a forward or a winger = 94 + 29 = 123
Total number of players = 235
The probability that a randomly chosen player is a forward or a winger = 123 / 235 = 0.5234

1.3 What is the probability that a randomly chosen player plays in a striker position and has a
foot injury?

The probability that a randomly chosen player plays in a striker position and has a foot injury
≈ 19%

Number of players who play in a striker position and have a foot injury = 45

Total number of players = 235

P(Striker and foot injury) = 45 / 235 = 0.1915

1.4 What is the probability that a randomly chosen injured player is a striker?

The probability that a randomly chosen injured player is a striker ≈ 31%

Number of injured players who are strikers = 45
Number of injured players = 145

The probability that a randomly chosen injured player is a striker = 45 / 145 = 0.3103

1.5 What is the probability that a randomly chosen injured player is either a forward or an
attacking midfielder?

The probability that a randomly chosen injured player is either a forward or an attacking
midfielder ≈ 55%

Number of injured players who are either a forward or an attacking midfielder = 56 + 24 = 80

Number of injured players = 145

The probability that a randomly chosen injured player is either a forward or an attacking
midfielder = 80 / 145 = 0.5517
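
The five answers to Problem 1 can be reproduced directly from the injury table. The following is a minimal Python sketch; the dictionary layout and variable names are illustrative choices and not part of the original analysis.

```python
# Injury counts by playing position, copied from the table in Problem 1.
injured = {"Striker": 45, "Forward": 56, "Attacking Midfielder": 24, "Winger": 20}
not_injured = {"Striker": 32, "Forward": 38, "Attacking Midfielder": 11, "Winger": 9}

total_injured = sum(injured.values())                       # 145
total_players = total_injured + sum(not_injured.values())   # 235


def position_total(position):
    """Injured plus not-injured players at one position."""
    return injured[position] + not_injured[position]


# 1.1 P(injured)
p_injured = total_injured / total_players
# 1.2 P(forward or winger): the positions are mutually exclusive, so counts add
p_forward_or_winger = (position_total("Forward") + position_total("Winger")) / total_players
# 1.3 P(striker and injured): joint probability over all players
p_striker_and_injured = injured["Striker"] / total_players
# 1.4 P(striker given injured): conditional, so divide by injured players only
p_striker_given_injured = injured["Striker"] / total_injured
# 1.5 P(forward or attacking midfielder given injured)
p_fwd_or_mid_given_injured = (
    injured["Forward"] + injured["Attacking Midfielder"]
) / total_injured

for label, p in [
    ("1.1", p_injured),
    ("1.2", p_forward_or_winger),
    ("1.3", p_striker_and_injured),
    ("1.4", p_striker_given_injured),
    ("1.5", p_fwd_or_mid_given_injured),
]:
    print(f"{label}: {p:.4f}")
```

Note that 1.3 and 1.4 use the same numerator (45) but different denominators: all players for the joint probability, injured players only for the conditional one.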
Problem 2

An independent research organization is trying to estimate the probability that an accident at a
nuclear power plant will result in radiation leakage. The types of accidents possible at the plant
are fire hazards, mechanical failure, or human error. The research organization also knows that
two or more types of accidents cannot occur simultaneously.

According to the studies carried out by the organization, the probability of a radiation leak in
case of a fire is 20%, the probability of a radiation leak in case of a mechanical failure is 50%, and the
probability of a radiation leak in case of a human error is 10%. The studies also showed the
following:

• The probability of a radiation leak occurring simultaneously with a fire is 0.1%.

• The probability of a radiation leak occurring simultaneously with a mechanical failure is 0.15%.

• The probability of a radiation leak occurring simultaneously with a human error is 0.12%.

On the basis of the information available, answer the questions below:

The probability of a radiation leak, given that there is a fire, is 0.20. This is a Bayes' theorem
situation. The probability of an event A occurring given that B has already occurred is
P(A/B) = P(A ∩ B)/P(B), where P(A/B) denotes the conditional probability of A given B.

Defining the events:

F = fire
M = mechanical failure
H = human error
R = radiation leak
N = no accident

Given probabilities:
P(R/F) = 0.2
P(R/M) = 0.5
P(R/H) = 0.1
P(R ∩ F) = 0.001
P(R ∩ M) = 0.0015
P(R ∩ H) = 0.0012
2.1 What are the probabilities of a fire, a mechanical failure, and a human error respectively?

P(F) = P(R ∩ F) / P(R/F) = 0.001/0.2 = 0.005
P(M) = P(R ∩ M) / P(R/M) = 0.0015/0.5 = 0.003
P(H) = P(R ∩ H) / P(R/H) = 0.0012/0.1 = 0.012

• The probability of a fire is 0.5%
• The probability of a mechanical failure is 0.3%
• The probability of a human error is 1.2%

2.2 What is the probability of a radiation leak?

Since the only types of accidents possible at the plant are fire hazards, mechanical failure, or
human error, a radiation leak cannot occur when there is no accident (N):
P(N) = 1 - (0.005 + 0.003 + 0.012) = 0.98
P(R/N) = 0
P(R ∩ N) = P(R/N) P(N) = 0
By the law of total probability:
P(R) = P(R ∩ F) + P(R ∩ M) + P(R ∩ H) + P(R ∩ N)
P(R) = 0.001 + 0.0015 + 0.0012 + 0
P(R) = 0.0037

• So we conclude that the probability of a radiation leak is 0.37%
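
The calculations in 2.1 and 2.2 can be checked with a short Python sketch. The dictionary keys and variable names below are illustrative assumptions; the numbers are the ones given in the problem statement.

```python
# Given: P(R/accident) and the joint probabilities P(R ∩ accident).
p_leak_given = {"fire": 0.20, "mechanical": 0.50, "human": 0.10}
p_leak_and = {"fire": 0.001, "mechanical": 0.0015, "human": 0.0012}

# 2.1: rearrange P(R/A) = P(R ∩ A) / P(A) to get P(A) = P(R ∩ A) / P(R/A).
p_accident = {cause: p_leak_and[cause] / p_leak_given[cause] for cause in p_leak_and}

# Probability that no accident occurs (accident types are mutually exclusive).
p_no_accident = 1 - sum(p_accident.values())

# 2.2: law of total probability. A leak without an accident has probability 0,
# so P(R) is just the sum of the three joint probabilities.
p_leak = sum(p_leak_and.values())

for cause, p in p_accident.items():
    print(f"P({cause}) = {p:.4f}")
print(f"P(no accident) = {p_no_accident:.4f}")
print(f"P(leak) = {p_leak:.4f}")
```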

2.3 Suppose there has been a radiation leak in the reactor for which the definite cause is not
known. What is the probability that it has been caused by:

• A Fire.

• A Mechanical Failure.

• A Human Error.

The probability that the radiation leak was caused by a fire is
P(F/R) = P(R ∩ F) / P(R) = 0.001/0.0037 = 0.270

The probability that the radiation leak was caused by a mechanical failure is
P(M/R) = P(R ∩ M) / P(R) = 0.0015/0.0037 = 0.405

The probability that the radiation leak was caused by a human error is
P(H/R) = P(R ∩ H) / P(R) = 0.0012/0.0037 = 0.324

• Probability that the radiation leak has been caused by a fire is 0.2703 ≈ 27.0%
• Probability that the radiation leak has been caused by a mechanical failure is 0.4054 ≈ 40.5%
• Probability that the radiation leak has been caused by a human error is 0.3243 ≈ 32.4%
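
The posterior probabilities in 2.3 follow from dividing each joint probability by the total leak probability. A minimal Python sketch, with illustrative variable names, is:

```python
# Joint probabilities P(R ∩ cause) given in the problem statement.
p_leak_and = {"fire": 0.001, "mechanical": 0.0015, "human": 0.0012}

# Total probability of a leak (no leak is possible without an accident).
p_leak = sum(p_leak_and.values())

# Bayes' theorem: P(cause/R) = P(R ∩ cause) / P(R)
posterior = {cause: joint / p_leak for cause, joint in p_leak_and.items()}

for cause, p in posterior.items():
    print(f"P({cause} / leak) = {p:.4f} ({p:.1%})")
```

Because the three causes are the only ways a leak can occur, the three posterior probabilities sum to 1, which is a quick sanity check on the result.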
Problem 3:

The breaking strength of gunny bags used for packaging cement is normally distributed with a
mean of 5 kg per sq. centimeter and a standard deviation of 1.5 kg per sq. centimeter. The
quality team of the cement company wants to know the following about the packaging material
to better understand wastage or pilferage within the supply chain. Answer the questions below
based on the given information. (Provide an appropriate visual representation of your answers,
without which marks will be deducted.)

3.1 What proportion of the gunny bags have a breaking strength less than 3.17 kg per sq cm?

μ (Mean) = 5
σ (Standard Deviation) = 1.5
X (Gunny Bag Strength) = 3.17
Z = (X - μ)/σ = (3.17 - 5)/1.5 = -1.22
P(Z < -1.22) = 0.1112

• It can be interpreted as about 11.1% of gunny bags having a breaking strength less than 3.17 kg
per sq. cm.
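
As a check on 3.1, the Z value and the CDF lookup can be reproduced in Python. This sketch uses `statistics.NormalDist` from the standard library as a stand-in for a Z-table or `stats.norm.cdf`; the variable names are illustrative.

```python
from statistics import NormalDist

# Breaking strength ~ Normal(mean = 5, sd = 1.5), in kg per sq. cm.
strength = NormalDist(mu=5, sigma=1.5)

x = 3.17
z = (x - strength.mean) / strength.stdev   # standardised value
p_below = strength.cdf(x)                  # P(X < 3.17), same as P(Z < z)

print(f"Z = {z:.2f}")
print(f"P(X < {x}) = {p_below:.4f}")
```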

3.2 What proportion of the gunny bags have a breaking strength of at least 3.6 kg per sq. cm?

Z = (3.6 - 5)/1.5 = -0.9333
P(Z > -0.9333) = 1 - P(Z < -0.9333) = 0.8247

• Thus we conclude that 82.47% of gunny bags have a breaking strength of at least
3.6 kg/sq. cm.
3.3 What proportion of the gunny bags have a breaking strength between 5 and 5.5 kg per sq. cm?

Z1 = (5 - 5)/1.5 = 0
Z2 = (5.5 - 5)/1.5 = 0.3333
P(0 < Z < 0.3333) = P(Z < 0.3333) - P(Z < 0) = 0.6306 - 0.5 = 0.1306

• Thus we conclude that 13.06% of gunny bags have a breaking strength between 5 and
5.5 kg/sq. cm.

3.4 What proportion of the gunny bags have a breaking strength NOT between 3 and 7.5 kg per
sq. cm?

z3 = (3 - 5)/1.5 = -1.3333
z4 = (7.5 - 5)/1.5 = 1.6667
P(3 < X < 7.5) = stats.norm.cdf(z4) - stats.norm.cdf(z3) = 0.86099
P(X not between 3 and 7.5) = 1 - 0.86099 = 0.13901

• Thus we conclude that the proportion of gunny bags having a breaking strength not between 3
and 7.5 kg per sq. cm is 13.9%
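
The proportions in 3.2 to 3.4 all come from the same normal CDF, combined in three different ways (upper tail, interval, and complement of an interval). A minimal sketch using the standard-library `statistics.NormalDist`, with illustrative variable names, is:

```python
from statistics import NormalDist

# Breaking strength ~ Normal(mean = 5, sd = 1.5), in kg per sq. cm.
strength = NormalDist(mu=5, sigma=1.5)

# 3.2: upper tail, P(X >= 3.6) = 1 - P(X < 3.6)
p_at_least_3_6 = 1 - strength.cdf(3.6)

# 3.3: interval, P(5 < X < 5.5) = P(X < 5.5) - P(X < 5)
p_between_5_and_5_5 = strength.cdf(5.5) - strength.cdf(5)

# 3.4: complement of an interval, 1 - P(3 < X < 7.5)
p_outside_3_and_7_5 = 1 - (strength.cdf(7.5) - strength.cdf(3))

print(f"3.2: {p_at_least_3_6:.4f}")
print(f"3.3: {p_between_5_and_5_5:.4f}")
print(f"3.4: {p_outside_3_and_7_5:.4f}")
```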
Problem 4:
Grades of the final examination in a training course are found to be normally distributed, with a
mean of 77 and a standard deviation of 8.5. Based on the given information answer the
questions below.

4.1 What is the probability that a randomly chosen student gets a grade below 85 on this exam?

➢ μ (Mean) = 77
➢ σ (Standard Deviation) = 8.5
➢ X = 85
➢ Z Value (Z) = (X - μ)/σ = (85 - 77)/8.5 = 0.9412; CDF value = 0.8267

Conclusion: the probability that a randomly chosen student gets a grade below 85 on this exam is
82.67%

4.2 What is the probability that a randomly selected student scores between 65 and 87?

➢ μ (Mean) = 77
➢ σ (Standard Deviation) = 8.5
➢ X1 = 65
➢ X2 = 87
Z Value (Z1) = (X1 - μ)/σ = (65 - 77)/8.5 = -1.4118
Z Value (Z2) = (X2 - μ)/σ = (87 - 77)/8.5 = 1.1765
P(65 < X < 87) = P(Z < Z2) - P(Z < Z1) = 0.8013

The probability that a randomly selected student scores between 65 and 87 is 80.13%
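
The two probabilities in 4.1 and 4.2 can be verified with the standard-library `statistics.NormalDist`. This is a minimal sketch with illustrative variable names, not the original notebook code.

```python
from statistics import NormalDist

# Exam grades ~ Normal(mean = 77, sd = 8.5)
grades = NormalDist(mu=77, sigma=8.5)

# 4.1: P(grade < 85)
p_below_85 = grades.cdf(85)

# 4.2: P(65 < grade < 87) = P(grade < 87) - P(grade < 65)
p_between_65_and_87 = grades.cdf(87) - grades.cdf(65)

print(f"4.1: {p_below_85:.4f}")
print(f"4.2: {p_between_65_and_87:.4f}")
```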

4.3 What should be the passing cut-off so that 75% of the students clear the exam?

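➢ μ (Mean) = 77
➢ σ (Standard Deviation) = 8.5
For 75% of the students to clear the exam, the cut-off must be the 25th percentile of the grade
distribution, because the top 75% of scores lie above it. We therefore evaluate the PPF (inverse
CDF) at 0.25:
Cut-off = μ + Z(0.25) × σ = 77 + (-0.6745)(8.5) ≈ 71.27

Conclusion: the passing cut-off so that 75% of the students clear the exam is 71.27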
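
The cut-off in 4.3 is an inverse-CDF (PPF) lookup rather than a CDF lookup. A minimal sketch using `statistics.NormalDist.inv_cdf` from the standard library, with illustrative variable names, is:

```python
from statistics import NormalDist

# Exam grades ~ Normal(mean = 77, sd = 8.5)
grades = NormalDist(mu=77, sigma=8.5)

pass_fraction = 0.75
# 75% of students must score above the cut-off, so 25% score below it.
cutoff = grades.inv_cdf(1 - pass_fraction)

# Sanity check: the fraction of students at or above the cut-off.
fraction_passing = 1 - grades.cdf(cutoff)

print(f"Passing cut-off: {cutoff:.2f}")
print(f"Fraction passing: {fraction_passing:.2f}")
```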
Problem 5:

Zingaro stone printing is a company that specializes in printing images or patterns on polished
or unpolished stones. However, for the optimum level of printing of the image the stone surface
has to have a Brinell's hardness index of at least 150. Recently, Zingaro has received a batch of
polished and unpolished stones from its clients. Use the data provided to answer the following
(assuming a 5% significance level);

5.1 Earlier experience of Zingaro with this particular client is favorable as the stone surface was
found to be of adequate hardness. However, Zingaro has reason to believe now that the
unpolished stones may not be suitable for printing. Do you think Zingaro is justified in thinking
so?

Step 1: Define null and alternative hypotheses

Stones with a Brinell's hardness index (BHN) of at least 150 are suitable for printing, which
means any stone with BHN < 150 is not suitable for printing.

H0: µ(Unpolished) ≥ 150
HA: µ(Unpolished) < 150

Step 2: Decide the significance level

The significance level (α) is given as 5%, so α = 0.05. The standard deviations of the two samples are:
Unpolished = 33.041804
Treated and Polished = 15.58735

Step 3: Identify the test statistic

Sample Size (n) = 75
Because the hypotheses in 5.1 compare the mean of a single sample (the unpolished stones) against
the fixed value 150, a one-sample, left-tailed t-test (for example, scipy.stats.ttest_1samp) is the
appropriate test. The two-sample independent t-test applies to the comparison in 5.2.

Step 4: Calculating the p-value and test statistic

For the two-sample comparison we use scipy.stats.ttest_ind, which calculates the t-test for the
means of two independent samples. The function returns the t statistic and, by default, a
two-tailed p-value for the null hypothesis that the two independent samples have identical average
(expected) values. By default, the test assumes that the populations have identical variances.

5.2 Is the mean hardness of the polished and unpolished stones the same?

H0: Avg Brinell's hardness index (Unpolished) = Avg Brinell's hardness index (Polished and Treated)
HA: Avg Brinell's hardness index (Unpolished) ≠ Avg Brinell's hardness index (Polished and Treated)
Significance level: 5%

The T statistic is: -3.242232050141406
The corresponding p-value is: 0.001465515019462831

Since the p-value < 0.05, we reject the null hypothesis (H0) that the mean hardness of the
polished and unpolished stones is the same.

We have enough evidence to conclude that the mean hardness of the polished and unpolished
stones is not equal.
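
The raw hardness data is not reproduced in this report, so the test itself cannot be rerun here. What can be illustrated is how an equal-variance two-sample t statistic is built from summary values. The sketch below assumes that each group has n = 75 stones and that the two reported standard deviations are sample standard deviations; both are assumptions, and the variable names are illustrative.

```python
import math

# Summary values reported in Problem 5. ASSUMPTION: 75 stones in each group.
n_unpolished = n_polished = 75
sd_unpolished = 33.041804
sd_polished = 15.58735
t_reported = -3.242232050141406

# Degrees of freedom and pooled variance for an equal-variance two-sample t-test.
df = n_unpolished + n_polished - 2
pooled_var = (
    (n_unpolished - 1) * sd_unpolished**2 + (n_polished - 1) * sd_polished**2
) / df

# Standard error of the difference between the two sample means.
std_error = math.sqrt(pooled_var * (1 / n_unpolished + 1 / n_polished))

# t = (mean_unpolished - mean_polished) / std_error, so under these assumptions
# the reported t statistic implies this difference in sample means:
implied_mean_diff = t_reported * std_error

print(f"df = {df}")
print(f"standard error = {std_error:.4f}")
print(f"implied mean difference = {implied_mean_diff:.2f}")
```

Under these assumptions the negative t statistic corresponds to the unpolished stones having the lower sample mean. The two standard deviations differ by a factor of about two, so the equal-variance assumption is worth checking against the data.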
Problem 6:

Aquarius health club, one of the largest and most popular cross-fit gyms in the country has been
advertising a rigorous program for body conditioning. The program is considered successful if
the candidate is able to do more than 5 push-ups, as compared to when he/she enrolled in the
program. Using the sample data provided can you conclude whether the program is successful?
(Consider the level of Significance as 5%)

Note that this is a problem of the paired-t-test. Since the claim is that the training will make a
difference of more than 5, the null and alternative hypotheses must be formed accordingly.

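Ans:-
H0: The mean number of push-ups is the same before and after enrolling (mean difference = 0)
H1: The mean number of push-ups after enrolling is higher than before

At the 5% significance level (95% confidence)

dbar = 5.55
std dev = 2.872281
n = 100
Df = 99

Formula for paired t test

T = dbar / (std dev / √n)

T statistic = -19.32261
P value = 1.14602E-35

Since the p-value < 0.05, we reject the null hypothesis (H0) that the number of push-ups before
and after enrolling is the same.

The magnitude of the reported statistic, 19.32, equals 5.55 / (2.872281 / √100), so it tests
against a mean difference of 0, whereas the program's claim is an improvement of more than 5
push-ups. Testing H0: mean difference ≤ 5 against H1: mean difference > 5 with the same summary
values gives T = (5.55 - 5) / (2.872281 / √100) ≈ 1.91, which exceeds the one-tailed critical
value of about 1.66 for 99 degrees of freedom at the 5% level. We therefore have enough evidence
to conclude that the training has made a difference of more than 5 push-ups.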
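
The paired t statistic can be recomputed from the summary values alone. The sketch below is an illustration with a hypothetical helper, `paired_t_statistic`, that takes the hypothesised mean difference as a parameter; it evaluates the statistic against 0 and against the claimed improvement of 5. The critical value is taken from a standard t-table rather than computed.

```python
import math


def paired_t_statistic(mean_diff, sd_diff, n, hypothesised_diff=0.0):
    """t statistic for a paired t-test computed from summary values."""
    std_error = sd_diff / math.sqrt(n)
    return (mean_diff - hypothesised_diff) / std_error


# Summary values reported in Problem 6.
dbar = 5.55
sd = 2.872281
n = 100

t_vs_zero = paired_t_statistic(dbar, sd, n)                        # H0: difference = 0
t_vs_five = paired_t_statistic(dbar, sd, n, hypothesised_diff=5)   # H0: difference <= 5

# One-tailed critical value for alpha = 0.05 and 99 degrees of freedom,
# read from a t-table (approximately 1.66).
T_CRITICAL = 1.66

print(f"t against 0: {t_vs_zero:.4f}")
print(f"t against 5: {t_vs_five:.4f}")
print("Reject H0 (difference <= 5):", t_vs_five > T_CRITICAL)
```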
Problem 7:

Dental implant data: The hardness of metal implants in dental cavities depends on multiple factors, such as the
method of implant, the temperature at which the metal is treated, the alloy used, as well as on the dentists, who may
favour one method above another and may work better in their favourite method. The response is the variable of
interest.

1. Test whether there is any difference among the dentists on the implant hardness. State the null and
alternative hypotheses. Note that both types of alloys cannot be considered together. You must state the null and
alternative hypotheses separately for the two types of alloys.

Ans -1

H0: The mean response for the hardness of dental implant is the same for all the 5 dentists,
provided the dentists are using type 1 alloy

H1: For at least 1 dentist using type 1 alloy, the mean response for the hardness of dental
implant is different

H0: The mean response for the hardness of dental implant is the same for all the 5 dentists,
provided the dentists are using type 2 alloy

H1: For at least 1 dentist using type 2 alloy, the mean response for the hardness of dental
implant is different
2. Before the hypotheses may be tested, state the required assumptions. Are the
assumptions fulfilled? Comment separately on both alloy types.

Ans - Assumptions for 2 way ANOVA are:

The dependent variable should be measured at a continuous level.

Each independent variable should consist of 2 or more categorical groups.

There should not be any significant outliers.

3. Irrespective of your conclusion in 2, we will continue with the testing procedure. What
do you conclude regarding whether implant hardness depends on dentists? Clearly state your
conclusion. If the null hypothesis is rejected, is it possible to identify which pairs of dentists
differ?

Ans -

Since the p-value is greater than 0.05, the null hypothesis is retained and the implant hardness
is not dependent on dentists.

4. Now test whether there is any difference among the methods on the hardness of dental
implant, separately for the two types of alloys. What are your conclusions? If the null hypothesis
is rejected, is it possible to identify which pairs of methods differ?
Ans-

Since the p-value is less than 0.05, the null hypothesis is rejected and
therefore the implant hardness is dependent on the method used.

5. Now test whether there is any difference among the temperature levels on the hardness
of dental implant, separately for the two types of alloys. What are your conclusions? If the null
hypothesis is rejected, is it possible to identify which levels of temperatures differ?

Ans-

The p-value for the Temp variable is greater than 0.05; therefore the null
hypothesis is retained and the implant hardness is not dependent on the
temperature.

6. Consider the interaction effect of dentist and method and comment on the interaction
plot, separately for the two types of alloys?
Ans -
