Stats Presentation
STATISTICS
TEAM MEMBERS:
ADITYA SRIVASTAVA
PURUSHOTTAM KUMAR
ROHAN KULDHAR
RUCHI BHATT
SHASHANK KUMAR PANDEY
Agenda
Use core statistical concepts to solve real-world problem
statements.
Visualize a real-world industry scenario where these
statistical concepts can be applied.
Perform detailed statistical analysis on various concepts
of descriptive statistics, probability distributions and
inferential statistics including confidence intervals and
hypothesis testing.
AGES OF 30 CUSTOMERS WHO ORDERED AN EV SCOOTER FROM ZEN AUTOMOTIVES
We have a dataset represented as a pandas Series. The data consists of the following
values: 42, 44, 62, 35, 20, 30, 56, 20, 23, 41, 55, 22, 31, 27, 66, 21, 18, 24, 42, 25, 32, 50, 31, 26,
36, 39, 40, 18, 36, and 22.
Calculate the Pearson coefficient of skewness and comment on the skewness of the
data.
The Pearson coefficient of skewness, 3 × (mean − median) / standard deviation, is 0.68.
The skewness is greater than 0, so the data is positively skewed (right-skewed).
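A minimal sketch of this calculation follows. It assumes the slide used Pearson's second (median-based) coefficient with the population standard deviation (the np.std default), since that combination reproduces 0.68. It uses Python's standard `statistics` module rather than pandas.

```python
from statistics import mean, median, pstdev

ages = [42, 44, 62, 35, 20, 30, 56, 20, 23, 41, 55, 22, 31, 27, 66,
        21, 18, 24, 42, 25, 32, 50, 31, 26, 36, 39, 40, 18, 36, 22]

# Pearson's second coefficient of skewness: 3 * (mean - median) / sd.
# pstdev is the population standard deviation (divides by n), which is
# an assumption about what the slide used.
skew = 3 * (mean(ages) - median(ages)) / pstdev(ages)
print(round(skew, 2))
```

Using the sample standard deviation (`statistics.stdev`) instead gives a slightly smaller value, but the sign and the conclusion are unchanged.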
Count the number of data values that fall within two standard deviations of the mean.
Compare this with the answer from Chebyshev’s Theorem.
There are 28 values (93%) that fall within two standard deviations of the mean.
Chebyshev’s Theorem guarantees that at least 75% of the values fall within 34.47 ± 26.64,
a range of 7.8 to 61.1.
Conversely, no more than 25% fall outside that range. The observed 93% is consistent with, and well above, this lower bound.
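The count and the Chebyshev bound can be checked with a short sketch. It assumes the sample standard deviation (n − 1 denominator), which reproduces the ±26.63 half-width quoted on the slide.

```python
from statistics import mean, stdev

ages = [42, 44, 62, 35, 20, 30, 56, 20, 23, 41, 55, 22, 31, 27, 66,
        21, 18, 24, 42, 25, 32, 50, 31, 26, 36, 39, 40, 18, 36, 22]

k = 2                                   # number of standard deviations
m, s = mean(ages), stdev(ages)          # sample standard deviation (n - 1)
low, high = m - k * s, m + k * s

inside = [a for a in ages if low <= a <= high]
share = len(inside) / len(ages)

# Chebyshev: at least 1 - 1/k^2 of any distribution lies within k sd of the mean.
chebyshev_bound = 1 - 1 / k**2

print(f"range: {low:.1f} to {high:.1f}")
print(f"inside: {len(inside)} of {len(ages)} ({share:.0%}), bound: {chebyshev_bound:.0%}")
```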
What is the probability that a person ordering an EV scooter is above 50 years old?
There are 4 values greater than 50.
The probability that a buyer is above 50 is 4/30 ≈ 0.1333 (13.33%).
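As a quick check, the empirical probability is just the relative frequency of ages above 50 in the sample:

```python
ages = [42, 44, 62, 35, 20, 30, 56, 20, 23, 41, 55, 22, 31, 27, 66,
        21, 18, 24, 42, 25, 32, 50, 31, 26, 36, 39, 40, 18, 36, 22]

# "Above 50" is strict, so the customer aged exactly 50 is not counted.
above_50 = [a for a in ages if a > 50]
p_above_50 = len(above_50) / len(ages)

print(above_50, round(p_above_50, 4))
```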
Create a probability distribution of the data and visualize it appropriately.
What is the shape of the distribution of this dataset? Create an appropriate graph to
determine that. Take 100 random samples with replacement from this dataset of size 5
each. Create a sampling distribution of the mean age of customers. Compare with other
sampling distributions of sample size 10, 15, 20, 25, 30. State your observations. Does it
corroborate the Central Limit Theorem?
THE RESULTING HISTOGRAMS SHOW THAT THE SAMPLING DISTRIBUTION OF THE MEAN AGE OF CUSTOMERS
BECOMES MORE SYMMETRIC AND BELL-SHAPED AS THE SAMPLE SIZE INCREASES. THEREFORE, WE CAN
CONCLUDE THAT THE CENTRAL LIMIT THEOREM IS CORROBORATED BY THE SAMPLING DISTRIBUTIONS
IN THIS EXAMPLE.
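The resampling step can be sketched as below. The seed and the helper name `sample_means` are illustrative assumptions, and plotting is left out: the sketch only prints the centre and spread of each sampling distribution, which should stay near the data mean and shrink roughly like σ/√n as the sample size grows.

```python
import random
from statistics import mean, pstdev

ages = [42, 44, 62, 35, 20, 30, 56, 20, 23, 41, 55, 22, 31, 27, 66,
        21, 18, 24, 42, 25, 32, 50, 31, 26, 36, 39, 40, 18, 36, 22]

rng = random.Random(0)  # arbitrary seed, only for reproducibility

def sample_means(n, draws=100):
    """Means of `draws` samples of size n, drawn with replacement."""
    return [mean(rng.choices(ages, k=n)) for _ in range(draws)]

spread = {}
for n in (5, 10, 15, 20, 25, 30):
    means = sample_means(n)
    spread[n] = pstdev(means)   # spread of the sampling distribution
    print(f"n={n:2d}  mean of means={mean(means):.2f}  sd of means={spread[n]:.2f}")
```

Passing each list of means to a histogram function (for example matplotlib's `hist`) would give the plots described on the slide.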
Treat this dataset as a binomial distribution where p is the probability that a person
ordering an EV is above 50 years of age. What is the probability that, out of a random
sample of 10 buyers, exactly 6 are above 50 years of age?
The probability that exactly 6 out of a random sample of 10 buyers are above 50 years of age, with p = 4/30 from the age data, is:
stats.binom.pmf(k=6, n=10, p=4/30) ≈ 0.00067
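The same probability can be written out directly from the binomial formula C(n, k) · p^k · (1 − p)^(n − k). This sketch takes p = 4/30 from the age data (4 of 30 buyers above 50) and uses only the standard library, as a cross-check on the scipy call.

```python
from math import comb

p = 4 / 30      # share of buyers above 50 in the sample
n, k = 10, 6    # sample size and number of "successes"

# Binomial probability mass function written out explicitly
prob = comb(n, k) * p**k * (1 - p) ** (n - k)
print(f"{prob:.5f}")
```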
Compute a 95% confidence interval for the true mean age of the population of EV
scooter buyers for the dataset using an appropriate distribution. (State the reasons for
using a z or t distribution.)
stats.norm.interval(0.95,loc=data.mean(),scale=np.std(data)/np.sqrt(30))
(29.7811, 39.1521)
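Since the sample size is 30 (n ≥ 30), we use the z distribution, treating the sample standard deviation as an estimate of the population standard deviation. Strictly, the population standard deviation is unknown, so a t distribution with 29 degrees of freedom would give a slightly wider interval.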
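A standard-library sketch of the z interval is shown here as a cross-check. It mirrors the slide's choice of the population standard deviation (np.std with its default ddof=0) and uses `statistics.NormalDist` in place of `stats.norm.interval`.

```python
from math import sqrt
from statistics import NormalDist, mean, pstdev

ages = [42, 44, 62, 35, 20, 30, 56, 20, 23, 41, 55, 22, 31, 27, 66,
        21, 18, 24, 42, 25, 32, 50, 31, 26, 36, 39, 40, 18, 36, 22]

confidence = 0.95
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 1.96

# Standard error of the mean; pstdev matches np.std(data) with ddof=0
se = pstdev(ages) / sqrt(len(ages))
ci = (mean(ages) - z * se, mean(ages) + z * se)

print(tuple(round(x, 4) for x in ci))
```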
A data scientist wants to estimate with 95% confidence the proportion of people who
own an EV in the population. A recent study showed that 20% of people interviewed
had an EV. The data scientist wants to be accurate within 2% of the true proportion.
Find the minimum sample size necessary.
n = (z**2 * p * (1-p)) / (d**2) = (1.96**2 * 0.2 * 0.8) / (0.02**2) ≈ 1536.64, so the minimum sample size is 1537.
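Plugging the stated requirements into the formula (95% confidence so z ≈ 1.96, p = 0.20, margin d = 0.02) can be sketched as follows; the result is rounded up because a sample size must be a whole number.

```python
from math import ceil
from statistics import NormalDist

confidence = 0.95
p = 0.20     # prior estimate of the proportion
d = 0.02     # desired margin of error

z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 1.96
n = z**2 * p * (1 - p) / d**2
min_n = ceil(n)   # round up to the next whole respondent

print(round(n, 2), min_n)
```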
A researcher claims that currently 20% of the population own EVs. Test his claim
with alpha = 0.05 if, out of a random sample of 30 two-wheeler owners, only 5 own an
EV.
z_score = (p_samp-p_hyp)/np.sqrt(p_hyp*(1-p_hyp)/n) = -0.46
The critical value for a two-tailed test at a 0.05 significance level is ±1.96. Since the
calculated test statistic (-0.46) is not in the rejection region, we fail to reject the null
hypothesis.
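The one-proportion z test above can be sketched with the standard library. The variable names follow the slide where possible; `x`, `z_crit` and `reject` are illustrative additions.

```python
from math import sqrt
from statistics import NormalDist

n, x = 30, 5        # sample size and number of EV owners in the sample
p_hyp = 0.20        # claimed population proportion (H0)
alpha = 0.05

p_samp = x / n
# Standard error uses the hypothesised proportion, as in the slide's formula
z_score = (p_samp - p_hyp) / sqrt(p_hyp * (1 - p_hyp) / n)

z_crit = NormalDist().inv_cdf(1 - alpha / 2)        # two-tailed critical value
p_value = 2 * NormalDist().cdf(-abs(z_score))       # two-tailed p-value
reject = abs(z_score) > z_crit

print(round(z_score, 2), round(z_crit, 2), round(p_value, 2), reject)
```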
Assume you are working for a Consumer Protection Agency that looks at complaints
raised by customers in the transportation industry. Say you have been receiving
complaints about the mileage of the latest EV launched by Zen Automotives. Zen
allows you to randomly test 40 of its new EVs for mileage. Zen claims that the new
EVs get a mileage of 96 kmpl on the highway. Your results show a mean of 91.3 kmpl and
a standard deviation of 14.4.
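We perform a two-tailed t test with H0: μ = 96 and H1: μ ≠ 96.
t = (s_avg-p_avg)/(std/np.sqrt(40)) = (91.3-96)/(14.4/np.sqrt(40)) = -2.064
p_value = stats.t.sf(np.abs(t),39) * 2
Since the p-value < 0.05, we reject the null hypothesis: the sample does not support Zen's claim of 96 kmpl.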
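A runnable version of this test, assuming scipy is available, might look like the sketch below. The names `claimed_mean` and `sample_sd` are illustrative stand-ins for the slide's `p_avg` and `std`.

```python
from math import sqrt
from scipy import stats

n = 40
claimed_mean = 96.0    # Zen's claim (H0)
s_avg = 91.3           # sample mean
sample_sd = 14.4       # sample standard deviation

# One-sample t statistic with n - 1 degrees of freedom
t = (s_avg - claimed_mean) / (sample_sd / sqrt(n))
p_value = stats.t.sf(abs(t), df=n - 1) * 2   # two-tailed

print(round(t, 3), round(p_value, 4), p_value < 0.05)
```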
After more complaints you decide to test the variability of the mileage on the highway.
On questioning Zen’s quality control engineer, you find that they are claiming a
standard deviation of 7.2. Test the claim about the standard deviation.
chi2_stat = (sample_size - 1) * sample_var / null_var
p_value = chi2.sf(chi2_stat, df=sample_size-1)
The p-value is very small (far below the significance level of 0.05), which means that we have
strong evidence to reject the claim of Zen's quality control engineer that the standard
deviation of the mileage is 7.2.
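The slide does not spell out the inputs, so this sketch assumes they carry over from the mileage test: a sample of 40 vehicles with a sample standard deviation of 14.4. Like the slide's `chi2.sf` call, it is a right-tailed test (H1: the true standard deviation is larger than 7.2).

```python
from scipy.stats import chi2

sample_size = 40        # assumed: same 40 vehicles as the mileage test
sample_sd = 14.4        # assumed: sample standard deviation from that test
claimed_sd = 7.2        # Zen's claimed standard deviation (H0)

sample_var = sample_sd**2
null_var = claimed_sd**2

# Chi-square statistic for a single variance: (n - 1) * s^2 / sigma0^2
chi2_stat = (sample_size - 1) * sample_var / null_var
p_value = chi2.sf(chi2_stat, df=sample_size - 1)   # right-tail probability

print(round(chi2_stat, 1), p_value < 0.05)
```

Because the sample variance is four times the claimed variance, the statistic is 39 × 4 = 156, far beyond what 39 degrees of freedom would produce by chance.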
A study claims that 10% of all customers for an EV scooter are above 50 years of age.
Using the Normal approximation of a Binomial distribution, find the probability that in a
random sample of 300 prospective customers exactly 25 will be above 50 years of age.
mean = n*p = 30, sd = np.sqrt(n*p*(1-p)) ≈ 5.196
Z1 = (24.5-mean)/sd, Z2 = (25.5-mean)/sd (continuity correction for exactly 25)
P_Z1 = stats.norm.cdf(Z1), P_Z2 = stats.norm.cdf(Z2)
P_Z2 - P_Z1 = 0.048
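The continuity-corrected approximation can be sketched with the standard library, using mean = n·p and sd = √(n·p·(1 − p)). The exact binomial probability is computed alongside purely as a sanity check; it is not part of the slide.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p, k = 300, 0.10, 25

mu = n * p                       # binomial mean
sd = sqrt(n * p * (1 - p))       # binomial standard deviation
approx_dist = NormalDist(mu, sd)

# Continuity correction: P(X = 25) is approximated by P(24.5 < X < 25.5)
approx = approx_dist.cdf(k + 0.5) - approx_dist.cdf(k - 0.5)

# Exact binomial probability for comparison
exact = comb(n, k) * p**k * (1 - p) ** (n - k)

print(round(approx, 3), round(exact, 3))
```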
Thank you!