Advanced Statistics Project - Jayant Chandra
By
Jayant Chandra
Problem 1
A physiotherapist with a male football team is interested in studying the relationship between
foot injuries and the positions at which the players play from the data collected
           Striker   Forward   Attacking Midfielder   Winger   Total
Total      77        94        35                     29       235
1.1 What is the probability that a randomly chosen player would suffer an injury?
The probability that a randomly chosen player would suffer an injury = 61.7%
Number of players who would suffer an injury = 145
Total number of players = 235
The probability that a randomly chosen player would suffer an injury = 145 / 235 = 0.6170
1.3 What is the probability that a randomly chosen player plays in a striker position and has a
foot injury?
The probability that a randomly chosen player plays in a striker position and has a foot injury
= 19%
Number of players who play in a striker position and have a foot injury = 45
Total number of players = 235
P(Striker who has a foot injury) = 45 / 235 = 0.1915
1.4 What is the probability that a randomly chosen injured player is a striker?
Number of injured players who are strikers = 45
Number of injured players = 145
The probability that a randomly chosen injured player is a striker = 45 / 145 = 0.3103
1.5 What is the probability that a randomly chosen injured player is either a forward or an
attacking midfielder?
The probability that a randomly chosen injured player is either a forward or an attacking
midfielder = 55%
The probability that a randomly chosen injured player is either a forward or an attacking
midfielder = 80 / 145 = 0.5517
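The four Problem 1 answers can be checked with a few lines of Python. This is a minimal sketch that uses only the counts quoted in the answers above (145, 235, 45 and 80); the variable names are illustrative and not part of the original data set.

```python
from fractions import Fraction

# Counts quoted in the answers above
total_players = 235
injured = 145
injured_strikers = 45
injured_forward_or_am = 80  # injured forwards + injured attacking midfielders

# 1.1 P(injured)
p_injured = Fraction(injured, total_players)
# 1.3 P(striker and injured)
p_striker_and_injured = Fraction(injured_strikers, total_players)
# 1.4 P(striker | injured)
p_striker_given_injured = Fraction(injured_strikers, injured)
# 1.5 P(forward or attacking midfielder | injured)
p_fwd_or_am_given_injured = Fraction(injured_forward_or_am, injured)

results = {
    "1.1 P(injured)": p_injured,
    "1.3 P(striker and injured)": p_striker_and_injured,
    "1.4 P(striker | injured)": p_striker_given_injured,
    "1.5 P(forward or attacking midfielder | injured)": p_fwd_or_am_given_injured,
}
for label, p in results.items():
    print(f"{label} = {p} = {float(p):.4f}")
```

Note that 1.3 divides by all 235 players (a joint probability), while 1.4 and 1.5 divide by the 145 injured players (conditional probabilities).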
Problem 2
According to the studies carried out by the organization, the probability of a radiation leak in
case of a fire is 20%, the probability of a radiation leak in case of a mechanical failure is 50%,
and the probability of a radiation leak in case of a human error is 10%. The studies also showed
the following:
• The probability of a radiation leak occurring simultaneously with a human error is 0.12%.
The probability of a radiation leak given that there is a fire is 0.20. This is a conditional
probability (Bayes' theorem) situation: the probability of an event A occurring given that B
has already occurred is P(A/B) = P(A ∩ B)/P(B)
Given probabilities:
P(R/F) = 0.2
P(R/M) = 0.5
P(R/H) = 0.1
P(R ∩ F) = 0.001
P(R ∩ M) = 0.0015
P(R ∩ H) = 0.0012
2.1 What are the probabilities of a fire, a mechanical failure, and a human error respectively?
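Rearranging P(R/F) = P(R ∩ F)/P(F) gives the probability of each type of accident:
P(F) = P(R ∩ F)/P(R/F) = 0.001/0.2 = 0.005
P(M) = P(R ∩ M)/P(R/M) = 0.0015/0.5 = 0.003
P(H) = P(R ∩ H)/P(R/H) = 0.0012/0.1 = 0.012
The only types of accident possible at the plant are fire, mechanical failure, and human
error. Let N denote the case of no accident, in which no radiation leak can occur:
P(N) = 1 - (0.005 + 0.003 + 0.012) = 0.98
P(R/N) = 0
P(R ∩ N) = P(R/N)P(N) = 0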
By the law of total probability:
P(R) = P(R ∩ F) + P(R ∩ M) + P(R ∩ H) + P(R ∩ N)
P(R) = 0.001 + 0.0015 + 0.0012 + 0
P(R) = 0.0037
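The arithmetic above can be reproduced with a short sketch. It assumes only the conditional and joint probabilities listed under "Given probabilities"; the dictionary keys and variable names are illustrative.

```python
# Given: P(R/cause) and P(R ∩ cause) for each type of accident
p_leak_given = {"fire": 0.2, "mechanical": 0.5, "human": 0.1}
p_leak_and = {"fire": 0.001, "mechanical": 0.0015, "human": 0.0012}

# P(cause) = P(R ∩ cause) / P(R/cause)
p_cause = {c: p_leak_and[c] / p_leak_given[c] for c in p_leak_given}

# N = no accident; a leak is assumed impossible in that case, so P(R ∩ N) = 0
p_none = 1 - sum(p_cause.values())
p_leak_and_none = 0.0 * p_none

# Law of total probability: P(R) is the sum of the joint probabilities
p_leak = sum(p_leak_and.values()) + p_leak_and_none

for cause, p in p_cause.items():
    print(f"P({cause}) = {p:.4f}")
print(f"P(no accident) = {p_none:.4f}")
print(f"P(radiation leak) = {p_leak:.4f}")
```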
2.3 Suppose there has been a radiation leak in the reactor for which the definite cause is not
known. What is the probability that it has been caused by:
• A Fire.
• A Mechanical Failure.
• A Human Error.
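One way to answer 2.3 is to apply Bayes' theorem, P(cause/R) = P(R ∩ cause)/P(R), to each cause in turn. The sketch below assumes the joint probabilities given earlier in this problem and that a leak can only arise from one of the three accident types; the names are illustrative.

```python
# Joint probabilities P(R ∩ cause) as given in the problem
p_leak_and = {"fire": 0.001, "mechanical": 0.0015, "human": 0.0012}

# P(R) by the law of total probability (no leak is possible without an accident)
p_leak = sum(p_leak_and.values())

# Bayes' theorem: P(cause/R) = P(R ∩ cause) / P(R)
posterior = {cause: p / p_leak for cause, p in p_leak_and.items()}

for cause, p in posterior.items():
    print(f"P({cause} / leak) = {p:.4f}")
```

With these inputs the posterior probabilities come to roughly 0.27 for a fire, 0.41 for a mechanical failure, and 0.32 for a human error, and they sum to 1.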
Problem 3:
The breaking strength of gunny bags used for packaging cement is normally distributed with a
mean of 5 kg per sq. centimeter and a standard deviation of 1.5 kg per sq. centimeter. The
quality team of the cement company wants to know the following about the packaging material
to better understand wastage or pilferage within the supply chain. Answer the questions below
based on the given information. (Provide an appropriate visual representation of your answers,
without which marks will be deducted.)
3.1 What proportion of the gunny bags have a breaking strength less than 3.17 kg per sq cm?
μ (Mean) = 5
σ (Standard Deviation) = 1.5
X (Gunny Bag Strength) = 3.17
Z = (X - μ)/σ = (3.17 - 5)/1.5 = -1.22
P(Z < -1.22) = 0.1112 (CDF value)
• It can be interpreted as 11.12% of the gunny bags having a breaking strength less than
3.17 kg per sq. cm.
3.2 What proportion of the gunny bags have a breaking strength at least 3.6 kg per sq cm.?
z = (3.6 - 5)/1.5 = -0.9333
P(Z ≥ -0.9333) = 1 - P(Z < -0.9333)
• Thus we conclude that 82.46% of the gunny bags have a breaking strength of at least
3.6 kg per sq. cm.
3.3 What proportion of the gunny bags have a breaking strength between 5 and 5.5 kg per sq
cm.?
3.4 What proportion of the gunny bags have a breaking strength NOT between 3 and 7.5 kg per
sq cm.?
z3 = (3 - 5)/1.5 = -1.3333
z4 = (7.5 - 5)/1.5 = 1.6667
stats.norm.cdf(z4) - stats.norm.cdf(z3) = 0.86099
1 - 0.86099 = 0.13901
• Thus we conclude that the proportion of gunny bags with a breaking strength not
between 3 and 7.5 kg per sq. cm is 13.9%.
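The four proportions in this problem can be cross-checked with the sketch below. It uses the standard library's `statistics.NormalDist` as a stand-in for the `stats.norm.cdf` calls quoted above (an assumption made so the example runs without SciPy), and the variable names are illustrative.

```python
from statistics import NormalDist

# Breaking strength ~ Normal(mean = 5, sd = 1.5), in kg per sq. cm
strength = NormalDist(mu=5, sigma=1.5)

# 3.1 P(X < 3.17)
p_below_317 = strength.cdf(3.17)
# 3.2 P(X >= 3.6) = 1 - P(X < 3.6)
p_at_least_36 = 1 - strength.cdf(3.6)
# 3.3 P(5 < X < 5.5)
p_5_to_55 = strength.cdf(5.5) - strength.cdf(5)
# 3.4 P(X < 3 or X > 7.5) = 1 - P(3 < X < 7.5)
p_not_3_to_75 = 1 - (strength.cdf(7.5) - strength.cdf(3))

print(f"3.1 below 3.17:          {p_below_317:.4f}")
print(f"3.2 at least 3.6:        {p_at_least_36:.4f}")
print(f"3.3 between 5 and 5.5:   {p_5_to_55:.4f}")
print(f"3.4 not between 3, 7.5:  {p_not_3_to_75:.4f}")
```

Passing the raw strengths to a distribution with the stated mean and standard deviation is equivalent to standardising to z first and using the standard normal CDF.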
Problem 4:
Grades of the final examination in a training course are found to be normally distributed, with a
mean of 77 and a standard deviation of 8.5. Based on the given information answer the
questions below.
4.1 What is the probability that a randomly chosen student gets a grade below 85 on this exam?
➢ μ (Mean) = 77
➢ σ (Standard Deviation) = 8.5
➢ X = 85
➢ Z Value (Z) = (X - μ)/σ = (85 - 77)/8.5 = 0.94; CDF value = 0.8267
Conclusion: the probability that a randomly chosen student gets a grade below 85 on this exam
is 82.67%.
4.2 What is the probability that a randomly selected student scores between 65 and 87?
➢ μ (Mean) = 77
➢ σ (Standard Deviation) = 8.5
➢ X1 = 65
➢ X2 = 87
Z1 = (X1 - μ)/σ = (65 - 77)/8.5 = -1.4118
Z2 = (X2 - μ)/σ = (87 - 77)/8.5 = 1.1765
P(Z1 < Z < Z2) = CDF(Z2) - CDF(Z1) = 0.8012
The probability that a randomly selected student scores between 65 and 87 is 80.12%.
4.3 What should be the passing cut-off so that 75% of the students clear the exam?
➢ μ (Mean) = 77
➢ σ (Standard Deviation) = 8.5
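We need the PPF (percent point function, the inverse of the CDF) at 0.25, because the top 75%
of students are those who score above the 25th percentile.
The minimum score required for 75% of the students to pass is 71.26.
Conclusion: the passing cut-off so that 75% of the students clear the exam is 71.26.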
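The three Problem 4 answers can be reproduced with the sketch below. It uses the standard library's `statistics.NormalDist`, whose `inv_cdf` method plays the role of the PPF mentioned above (an assumption made so the example runs without SciPy); the variable names are illustrative.

```python
from statistics import NormalDist

# Exam grades ~ Normal(mean = 77, sd = 8.5)
grades = NormalDist(mu=77, sigma=8.5)

# 4.1 P(grade < 85)
p_below_85 = grades.cdf(85)
# 4.2 P(65 < grade < 87)
p_65_to_87 = grades.cdf(87) - grades.cdf(65)
# 4.3 Cut-off at the 25th percentile, so that 75% of students score above it
cutoff = grades.inv_cdf(0.25)

print(f"4.1 P(grade < 85)      = {p_below_85:.4f}")
print(f"4.2 P(65 < grade < 87) = {p_65_to_87:.4f}")
print(f"4.3 passing cut-off    = {cutoff:.2f}")
```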
Problem 5:
Zingaro stone printing is a company that specializes in printing images or patterns on polished
or unpolished stones. However, for the optimum level of printing of the image the stone surface
has to have a Brinell's hardness index of at least 150. Recently, Zingaro has received a batch of
polished and unpolished stones from its clients. Use the data provided to answer the following
(assuming a 5% significance level);
5.1 Earlier experience of Zingaro with this particular client is favorable as the stone surface was
found to be of adequate hardness. However, Zingaro has reason to believe now that the
unpolished stones may not be suitable for printing. Do you think Zingaro is justified in thinking
so?
The problem states that stones with a Brinell hardness number (BHN) of at least 150 are suitable
for printing, which means any stone with BHN < 150 is not suitable for printing.
We use scipy.stats.ttest_ind to compute the t-test for the means of two independent samples of
observations. The function returns the t statistic and the two-tailed p-value. It is a two-sided
test of the null hypothesis that two independent samples have identical average (expected)
values. By default the test assumes that the populations have identical variances.
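To make the equal-variance test described above concrete, the sketch below computes the pooled two-sample t statistic by hand. The hardness readings are hypothetical placeholders, not the project's data, and `pooled_t_statistic` is an illustrative helper. Calling `scipy.stats.ttest_ind(a, b)` with its default `equal_var=True` computes the same statistic and additionally returns the two-tailed p-value.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(a, b):
    """Return (t, degrees of freedom) for two independent samples,
    assuming both populations share the same variance."""
    n_a, n_b = len(a), len(b)
    dof = n_a + n_b - 2
    # Pooled estimate of the common variance (sample variances, ddof = 1)
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / dof
    std_err = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(a) - mean(b)) / std_err, dof


# Hypothetical Brinell hardness readings, for illustration only
unpolished = [130.0, 142.0, 151.0, 128.0, 139.0, 147.0]
polished = [148.0, 152.0, 160.0, 145.0, 155.0, 150.0]

t_stat, dof = pooled_t_statistic(unpolished, polished)
print(f"t = {t_stat:.4f}, degrees of freedom = {dof}")
```

A negative t here only reflects that the first sample has the lower mean; the p-value from the t distribution with the printed degrees of freedom is then compared with the 5% significance level.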
5.2 Is the mean hardness of the polished and unpolished stones the same?
H0: Avg Brinell's hardness index (Unpolished) = Avg Brinell's hardness index (Polished and Treated)
HA: Avg Brinell's hardness index (Unpolished) ≠ Avg Brinell's hardness index (Polished and Treated)
Significance level: 5%
We have enough evidence to conclude that the mean hardness of the polished and unpolished
stones is not equal.
Problem 6:
Aquarius health club, one of the largest and most popular cross-fit gyms in the country has been
advertising a rigorous program for body conditioning. The program is considered successful if
the candidate is able to do more than 5 push-ups, as compared to when he/she enrolled in the
program. Using the sample data provided can you conclude whether the program is successful?
(Consider the level of Significance as 5%)
Note that this is a problem of the paired-t-test. Since the claim is that the training will make a
difference of more than 5, the null and alternative hypotheses must be formed accordingly.
Ans:-
H0: The number of push-ups before and
At the 5% significance level (95% confidence)
dbar = 5.55
std dev = 2.872281
n = 100
Df = 99
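The summary statistics above are enough to compute the paired t statistic. The sketch below assumes that dbar is the mean of the (after - before) differences and that the hypotheses are H0: mean difference ≤ 5 against HA: mean difference > 5, as the note on the paired t-test suggests. The critical value is the approximate upper 5% point of the t distribution with 99 degrees of freedom taken from standard tables, not a figure from the original report.

```python
from math import sqrt

# Summary statistics quoted above
d_bar = 5.55        # mean of the paired differences (assumed after - before)
s_d = 2.872281      # standard deviation of the differences
n = 100             # number of pairs
mu_0 = 5            # hypothesised difference under H0

# One-sample t statistic on the differences
std_err = s_d / sqrt(n)
t_stat = (d_bar - mu_0) / std_err
dof = n - 1

# Approximate one-sided 5% critical value for 99 degrees of freedom (t table)
T_CRIT_ONE_SIDED = 1.660
reject_h0 = t_stat > T_CRIT_ONE_SIDED

print(f"t = {t_stat:.4f}, degrees of freedom = {dof}")
print(f"Reject H0 at the 5% level: {reject_h0}")
```

Because the alternative is one-sided, the statistic is compared with the upper-tail critical value only; a two-tailed p-value from software would need to be halved before comparing it with 0.05.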
Dental implant data: The hardness of metal implant in dental cavities depends on multiple factors, such as the
method of implant, the temperature at which the metal is treated, the alloy used as well as on the dentists who may
favour one method above another and may work better in his/her favourite method. The response is the variable of
interest.
1. Test whether there is any difference among the dentists on the implant hardness. State the null and
alternative hypotheses. Note that both types of alloys cannot be considered together. You must state the null and
alternative hypotheses separately for the two types of alloys.
Ans -1
H0: The mean response for the hardness of dental implant is the same for all the 5 dentists,
provided the dentists are using type 1 alloy
H1: For at least 1 dentist the mean response for the hardness of dental implant is
different, provided the dentists are using type 1 alloy
H0: The mean response for the hardness of dental implant is the same for all the 5 dentists,
provided the dentists are using type 2 alloy
H1: For at least 1 dentist the mean response for the hardness of dental implant is
different, provided the dentists are using type 2 alloy
2. Before the hypotheses may be tested, state the required assumptions. Are the
assumptions fulfilled? Comment separately on both alloy types.
3. Irrespective of your conclusion in 2, we will continue with the testing procedure. What
do you conclude regarding whether implant hardness depends on dentists? Clearly state your
conclusion. If the null hypothesis is rejected, is it possible to identify which pairs of dentists
differ?
Ans -
The null hypothesis is retained: there is no evidence that the implant hardness depends on the dentist.
4. Now test whether there is any difference among the methods on the hardness of dental
implant, separately for the two types of alloys. What are your conclusions? If the null hypothesis
is rejected, is it possible to identify which pairs of methods differ?
Ans-
Since the p-value is less than 0.05, the null hypothesis is rejected and
therefore the implant hardness is dependent on the method used.
5. Now test whether there is any difference among the temperature levels on the hardness
of dental implant, separately for the two types of alloys. What are your conclusions? If the null
hypothesis is rejected, is it possible to identify which levels of temperatures differ?
Ans-
The p-value for the Temp variable is greater than 0.05, therefore the null
hypothesis is retained and the implant hardness is not dependent on the
temperature.
6. Consider the interaction effect of dentist and method and comment on the interaction
plot, separately for the two types of alloys?
Ans -