Basic - Statsitics - Level 1
Basic - Statsitics - Level 1
Q2) Identify the Data types, which were among the following
Nominal, Ordinal, Interval, Ratio.
Data Data Type
Gender Nominal
High School Class Ranking Ordinal
Celsius Temperature Interval
Weight Ratio
Hair Color Nominal
Socioeconomic Status Ordinal
Fahrenheit Temperature Interval
Height Ratio
Type of living accommodation Ordinal
Level of Agreement Ordinal
IQ (Intelligence Scale) Ratio
Sales Figures Ratio
Blood Group Nominal
Time Of Day Ordinal
Time on a Clock with Hands Interval
Number of Children Nominal
Religious Preference Nominal
Barometer Pressure Interval
SAT Scores Interval
Years of Education Ordinal
Q3) Three Coins are tossed, find the probability that two heads and one tail are
obtained?
Ans:
P (Two heads and one tail) = N (Event (Two heads and one tail)) / N (Event (Three
coins tossed))
= 3/8 = 0.375 = 37.5%
Q4) Two Dice are rolled, find the probability where sum is
a) Equal to 1
b) Less than or equal to 4
c) Divisible by 2 and 3
Ans:
Number of possible outcomes for the above event is
N (Event (Two dice rolled)) = 6^2 = 36
a.) P (sum is Equal to 1) = ‘0’ zero null nada none.
b.) P (Sum is less than or equal to 4) = N (Event (Sum is less than or equal to
4)) / N (Event (Two dice rolled))
= 6 / 36 = 1/6 = 0.166 = 16.66%
c.) P (Sum is divisible by 2 and 3) = N (Event (Sum is divisible by 2 and 3)) / N
(Event (Two dice rolled))
= 6 / 36 = 1/6 = 0.16 = 16.66%
Q5) A bag contains 2 red, 3 green and 2 blue balls. Two balls are drawn at random.
What is the probability that none of the balls drawn is blue?
Ans: Total number of balls =7 balls
N (Event (2 balls are drawn randomly from bag) = 7! / 2! * 5!
= (7*6*5*4*3*2*1) /
(2*1) * (5*4*3*2*1)
N (Event (2 balls are drawn randomly from bag) = (7*6)/ (2*1) = 21
If none of them drawn 2 balls are blue = 7 – 2 = 5
N (Event (None of the balls drawn is blue) = 5! / 2! * 3! = (5*4) / (2*1)
= 10
P (None of the balls drawn is blue) = N (Event (None of the balls drawn is blue) /
N (Event (2 balls are drawn randomly from
bag)
= 10 / 21
Q6) Calculate the Expected number of candies for a randomly selected child
Below are the probabilities of count of candies for children (ignoring the nature of
the child-Generalized view)
CHILD Candies count Probability
A 1 0.015
B 4 0.20
C 3 0.65
D 5 0.005
E 6 0.01
F 2 0.120
Child A – probability of having 1 candy = 0.015.
Child B – probability of having 4 candies = 0.20
Ans:
0.015+0.8+1.95+0.025+0.06+0.24 = 3.09
Q7) Calculate Mean, Median, Mode, Variance, Standard Deviation, Range &
comment about the values / draw inferences, for the given dataset
- For Points, Score, Weigh>
Find Mean, Median, Mode, Variance, Standard Deviation, and Range and
also Comment about the values/ Draw some inferences.
Use Q7.csv file
Ans:
Mean for Points = 3.59, Score = 3.21 and Weigh = 17.84
Median for Points = 3.69, Score = 3.32 and Weigh = 17.71
Mode for Points = 3.07, Score = 3.44 and Weigh = 17.02
Variance for Points = 0.28, Score = 0.95, Weigh = 3.19
Standard Deviation for Points = 0.53, Score = 0.97, Weigh = 1.78
Range [Min-Max] for Points [3.59 – 4.93], Score [3.21 – 5.42] and Weigh [17.84 –
22.9]
Draw Inferences
Q8) Calculate Expected Value for the problem below
a) The weights (X) of patients at a clinic (in pounds), are
108, 110, 123, 134, 135, 145, 167, 187, 199
Assume one of the patients is chosen at random. What is the Expected Value
of the Weight of that patient?
Ans: Expected value = Sum (X * Probability of X)
= (1/9)(108)+ (1/9)(110)+ (1/9)(123)+ (1/9)(134)+ (1/9)(145)+ (1/9)(167)+ (1/9)
(187)+ (1/9)(199)
= 145.33
Q9) Calculate Skewness, Kurtosis & draw inferences on the following data
Cars speed and distance
Use Q9_a.csv
Ans:
q9a = pd.read_csv("C:/Users/Moin Dalvi/Documents/EXcelR Study and
Assignment Material/Data Science Assignments/Basic Statistics 1/Q9_a.csv",
index_col = 'Index')
print('For Cars Speed', "Skewness value=", np.round(q9a.speed.skew(),2), 'and' ,
'Kurtosis value=', np.round(q9a.dist.skew(),2))
For Cars Speed Skewness value= -0.12 and Kurtosis value= 0.81
Draw an Inference from the distribution of data for Boxplot 1 with respect Boxplot
2.
Ans: First there are no outliers. Second both the box plot shares the same median
that is approximately in a range between 275 to 250 and they are normally
distributed with zero to no skewness neither at the minimum or maximum whisker
range.
Q 20) Calculate probability from the given dataset for the below cases
P(MPG>38)= 0.348
b. P(MPG<40)
Ans: prob_MPG_less_than_40 = np.round(stats.norm.cdf(40, loc =
q20.MPG.mean(), scale = q20.MPG.std()),3)
print('P(MPG<40)=',prob_MPG_less_than_40)
P(MPG<40)= 0.729
c. P (20<MPG<50)
Ans: prob_MPG_greater_than_20 = np.round(1-stats.norm.cdf(20, loc =
q20.MPG.mean(), scale = q20.MPG.std()),3)
print('p(MPG>20)=',(prob_MPG_greater_than_20))
p(MPG>20)= 0.943
prob_MPG_greaterthan20_and_lessthan50= (prob_MPG_less_than_50) -
(prob_MPG_greater_than_20)
print('P(20<MPG<50)=',(prob_MPG_greaterthan20_and_lessthan50))
P(20<MPG<50)= 0.013000000000000012
Q 24) A Government company claims that an average light bulb lasts 270 days.
A researcher randomly selects 18 bulbs for testing. The sampled bulbs last an
average of 260 days, with a standard deviation of 90 days. If the CEO's claim were
true, what is the probability that 18 randomly selected bulbs would have an
average life of no more than 260 days
Hint:
rcode pt(tscore,df)
df degrees of freedom
Ans: import numpy as np
Import scipy as stats
t_score = (x - pop mean) / (sample standard daviation / square root of sample size)
(260-270)/90/np.sqrt(18))
t_score = -0.471
stats.t.cdf(t_score, df = 17)
0.32 = 32%