Basic Statistics 1 - Assignment - Vivek T
Q2) Identify the data type of each of the following variables, choosing from Nominal, Ordinal, Interval and Ratio.
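Data                              Data Type
Gender                            Nominal
High School Class Ranking         Ordinal
Celsius Temperature               Interval
Weight                            Ratio
Hair Color                        Nominal
Socioeconomic Status              Ordinal
Fahrenheit Temperature            Interval
Height                            Ratio
Type of Living Accommodation      Nominal
Level of Agreement                Ordinal
IQ (Intelligence Scale)           Interval
Sales Figures                     Ratio
Blood Group                       Nominal
Time of Day                       Interval
Time on a Clock with Hands        Interval
Number of Children                Ratio
Religious Preference              Nominal
Barometer Pressure                Ratio
SAT Scores                        Interval
Years of Education                Ratio
Celsius and Fahrenheit temperatures, clock time, IQ and SAT scores have meaningful differences but no true zero, so they are interval; weight, height, sales, pressure and counts such as number of children or years of education have a true zero, so they are ratio.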
Q3) Three coins are tossed. Find the probability that two heads and one tail are obtained.
Ans: 3/8 (the favorable outcomes are HHT, HTH and THH, out of 2^3 = 8 equally likely outcomes)
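A brute-force check of Q3, sketched in Python rather than the R used elsewhere in this assignment: enumerate all 2^3 outcomes and count those with exactly two heads. The variable names are illustrative only.

```python
from fractions import Fraction
from itertools import product

# All 2**3 = 8 equally likely outcomes of tossing three fair coins.
outcomes = list(product("HT", repeat=3))

# Favorable outcomes: exactly two heads (and therefore exactly one tail).
favorable = [o for o in outcomes if o.count("H") == 2]

p_two_heads_one_tail = Fraction(len(favorable), len(outcomes))
print(p_two_heads_one_tail)  # 3/8
```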
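Q4) Two dice are rolled. Find the probability that the sum is
a) Equal to 1 Ans: 0 (the smallest possible sum is 2)
b) Less than or equal to 4 Ans: 6/36 = 1/6 (sum 2: 1 way, sum 3: 2 ways, sum 4: 3 ways)
c) Divisible by both 2 and 3 Ans: a sum divisible by 2 and 3 is divisible by 6, i.e. 6 (5 ways) or 12 (1 way), so 6/36 = 1/6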
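Q4 can be checked by enumerating all 36 ordered outcomes. This Python sketch uses a hypothetical `prob` helper and also computes the "divisible by 2 or 3" reading for comparison.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely ordered outcomes of rolling two fair six-sided dice.
sums = [a + b for a, b in product(range(1, 7), repeat=2)]

def prob(event):
    """Probability that the sum satisfies `event`, as an exact fraction."""
    return Fraction(sum(1 for s in sums if event(s)), len(sums))

p_equal_1 = prob(lambda s: s == 1)                        # impossible
p_at_most_4 = prob(lambda s: s <= 4)                      # sums 2, 3, 4
p_div_2_and_3 = prob(lambda s: s % 6 == 0)                # "and" <=> divisible by 6
p_div_2_or_3 = prob(lambda s: s % 2 == 0 or s % 3 == 0)   # the "or" reading
print(p_equal_1, p_at_most_4, p_div_2_and_3, p_div_2_or_3)  # 0 1/6 1/6 2/3
```

Adding 18/36 (even sums) and 12/36 (multiples of 3) would count the 6 outcomes whose sum is divisible by 6 twice. Even under the "or" reading the answer is 24/36 = 2/3, not 5/6.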
Q5) A bag contains 2 red, 3 green and 2 blue balls. Two balls are drawn at
random. What is the probability that none of the balls drawn is blue?
Ans: Both balls must come from the 5 non-blue balls (2 red + 3 green), so P(no blue) = 5C2 / 7C2 = 10/21.
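The same count, sketched in Python with `math.comb` (Python 3.8+). The variable names are illustrative.

```python
from fractions import Fraction
from math import comb

red, green, blue = 2, 3, 2
total = red + green + blue

# Choose both balls from the 5 non-blue balls, out of all C(7, 2) possible pairs.
p_no_blue = Fraction(comb(red + green, 2), comb(total, 2))
print(p_no_blue)  # 10/21
```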
Q6) Calculate the expected number of candies for a randomly selected child.
Below are the probabilities of the count of candies for children (ignoring the nature of the child; generalized view).
Child   Candies count (x)   Probability p(x)
A       1                   0.015
B       4                   0.20
C       3                   0.65
D       5                   0.005
E       6                   0.01
F       2                   0.120
For example, child A has probability 0.015 of having 1 candy, and child B has probability 0.20 of having 4 candies.
Ans: The expected number of candies for a randomly selected child is
E(X) = Σ x·p(x) = 1(0.015) + 4(0.20) + 3(0.65) + 5(0.005) + 6(0.01) + 2(0.120) = 3.09
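A Python check of the expected value. It uses exact fractions, so the probabilities can be verified to sum to 1 and the result carries no floating-point rounding. The `dist` mapping simply restates the Q6 table.

```python
from fractions import Fraction

# Candies count x -> probability p(x), for children A to F in the Q6 table.
dist = {
    1: Fraction("0.015"), 4: Fraction("0.20"), 3: Fraction("0.65"),
    5: Fraction("0.005"), 6: Fraction("0.01"), 2: Fraction("0.120"),
}

assert sum(dist.values()) == 1  # a valid probability distribution sums to 1

# E(X) = sum of x * p(x) over all outcomes.
expected = sum(x * p for x, p in dist.items())
print(float(expected))  # 3.09
```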
Q7) For the given dataset (columns Points, Score and Weigh), calculate the mean, median, mode, variance, standard deviation and range, then comment on the values and draw inferences.
Ans: Using R:
> mean(Q7$Weigh)
[1] 17.84875
> mode(Q7$Weigh)   # base R mode() gives the storage mode, not the most frequent value
[1] "numeric"
> median(Q7$Weigh)
[1] 17.71
> range(Q7$Weigh)
[1] 14.5 22.9
> var(Q7$Weigh)
[1] 3.193166
> sd(Q7$Weigh)
[1] 1.786943
> mean(Q7$Points)
[1] 3.596563
> mode(Q7$Points)   # storage mode only, not the statistical mode
[1] "numeric"
> median(Q7$Points)
[1] 3.695
> range(Q7$Points)
[1] 2.76 4.93
> var(Q7$Points)
[1] 0.2858814
> sd(Q7$Points)
[1] 0.5346787
> mean(Q7$Score)
[1] 3.21725
> mode(Q7$Score)   # storage mode only, not the statistical mode
[1] "numeric"
> median(Q7$Score)
[1] 3.325
> range(Q7$Score)
[1] 1.513 5.424
> var(Q7$Score)
[1] 0.957379
> sd(Q7$Score)
[1] 0.9784574
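Base R's `mode()` reports an object's storage mode ("numeric"), not the most frequent value, so the Q7 mode is still missing. The first part of this Python sketch is a hypothetical `describe` helper that computes the full Q7 summary, using `statistics.multimode` (Python 3.8+) for the statistical mode. It runs on made-up data because the Q7 dataset is not reproduced here. The second part turns the R values reported above into simple inferences. Comparing mean with median hints at skew, and the coefficient of variation (sd / mean) compares spread across columns with different scales.

```python
import statistics

def describe(values):
    """Hypothetical helper: the Q7 summary statistics for one numeric column."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.multimode(values),     # all most-frequent values
        "variance": statistics.variance(values),  # sample variance, like R's var()
        "sd": statistics.stdev(values),           # sample SD, like R's sd()
        "range": (min(values), max(values)),      # like R's range()
    }

# Made-up illustrative data, NOT the Q7 dataset.
print(describe([2.5, 3.1, 3.1, 3.6, 4.0, 4.4]))

# (mean, median, sd) as reported by R above.
reported = {
    "Weigh": (17.84875, 17.71, 1.786943),
    "Points": (3.596563, 3.695, 0.5346787),
    "Score": (3.21725, 3.325, 0.9784574),
}

inferences = {}
for name, (mean, median, sd) in reported.items():
    # mean > median hints at a right skew, mean < median at a left skew.
    skew_hint = "right" if mean > median else "left" if mean < median else "none"
    cv = round(sd / mean, 2)  # coefficient of variation: spread relative to the mean
    inferences[name] = (skew_hint, cv)
    print(f"{name}: skew hint = {skew_hint}, CV = {cv}")
```

With these numbers, Weigh's mean is slightly above its median, which is a mild right-skew hint. Points and Score have means slightly below their medians, which are mild left-skew hints. Score is the most spread out relative to its mean (CV about 0.30, against 0.10 for Weigh and 0.15 for Points). Mean versus median is only a rough indicator; a skewness statistic is more reliable.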
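> skewness(Q9$dist)
[1] 0.7824835   Positive skewness, so dist is right-skewed: most of the data lies on the left, with a long tail to the right.
> kurtosis(Q9$dist)
[1] 3.248019   Above 3 (the value for a normal distribution when kurtosis is not reduced by 3), so dist is leptokurtic: more peaked with heavier tails than normal, not wide and flat.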
b. SP and Weight (WT)
Ans:
> skewness(Q9_b$SP)
[1] 1.581454   Positive, so SP is right-skewed.
> kurtosis(Q9_b$SP)
[1] 5.723521   Well above 3, so SP is strongly leptokurtic (sharply peaked, heavy tails).
> skewness(Q9_b$WT)
[1] -0.6033099   Negative, so WT is left-skewed.
> kurtosis(Q9_b$WT)
[1] 3.819466   Above 3, so WT is also leptokurtic (more peaked than normal).
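The R output does not say which package supplied `skewness()` and `kurtosis()`. This Python sketch shows the definitions behind the numbers, assuming the moments-package convention: divide-by-n central moments, with kurtosis not reduced by 3, so a normal distribution scores 3. The function names are hypothetical, and the sample lists are made up for illustration.

```python
def moment_skewness(x):
    """Skewness m3 / m2**1.5 using divide-by-n central moments."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    return m3 / m2 ** 1.5

def pearson_kurtosis(x):
    """Kurtosis m4 / m2**2; about 3 for normal data (excess kurtosis = value - 3)."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return m4 / m2 ** 2

symmetric = [1, 2, 3, 4, 5]            # made-up symmetric, flat sample
right_skewed = [1, 1, 1, 2, 2, 3, 10]  # made-up sample with a long right tail
print(moment_skewness(symmetric), pearson_kurtosis(symmetric))  # 0.0 and about 1.7
print(moment_skewness(right_skewed))                            # positive
```

The flat sample has kurtosis well below 3 (platykurtic). This is the "wide, not peaked" case, and it is the opposite of the values above 3 reported for dist, SP and WT.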
Positive skewness
Ans: The population standard deviation is not known, so we use the t-distribution to determine the confidence interval for the given data.
$\bar{X} = 200$ pounds, $S = 30$ pounds, $n = 2000$
$\bar{X} \pm t_{1-\alpha/2,\,n-1}\,\frac{S}{\sqrt{n}}$
Confidence interval for 94% ($\alpha = 0.06$, so $1 - \alpha/2 = 0.97$), using R:
R code: qt(0.97, 1999) = 1.88
Substituting the values into the equation:
$200 \pm 1.88 \times \frac{30}{\sqrt{2000}} = 200 \pm 1.26$
Hence the 94% confidence interval is [198.74, 201.26] pounds.
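A Python check of the interval arithmetic. Python's standard library has no t-distribution, so this sketch takes t = 1.88 from the R output above. As a sanity check it uses `statistics.NormalDist`, since with 1999 degrees of freedom the t quantile is very close to the normal z quantile.

```python
import math
from statistics import NormalDist

x_bar, s, n = 200, 30, 2000   # sample mean (pounds), sample SD (pounds), sample size
confidence = 0.94
alpha = 1 - confidence

t_crit = 1.88  # from R: qt(0.97, 1999)
# Normal approximation to the t quantile; adequate here because df = 1999 is large.
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

margin = t_crit * s / math.sqrt(n)  # t * S / sqrt(n)
ci = (x_bar - margin, x_bar + margin)
print(f"z check: {z_crit:.3f}")
print(f"94% CI: [{ci[0]:.2f}, {ci[1]:.2f}]")  # [198.74, 201.26]
```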
34,36,36,38,38,39,39,40,40,41,41,41,41,42,42,45,49,56
1) Find the mean, median, variance and standard deviation.
Ans: Mean = 41, Median = 40.5, Variance = 25.52941, SD = 5.052664
2) What can we say about the student marks?
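One way to approach part 2 is to look at the quartiles and outliers as well as the summary values. This Python sketch reproduces part 1 and then applies Tukey's 1.5 × IQR rule. It uses `method="inclusive"` quartiles, which match R's default `quantile()` (type 7).

```python
import statistics

marks = [34, 36, 36, 38, 38, 39, 39, 40, 40, 41, 41, 41, 41, 42, 42, 45, 49, 56]

mean = statistics.mean(marks)
median = statistics.median(marks)
variance = statistics.variance(marks)  # sample variance (n - 1), as in R's var()
sd = statistics.stdev(marks)           # sample SD, as in R's sd()

# Quartiles; "inclusive" corresponds to R's default quantile() type 7.
q1, _, q3 = statistics.quantiles(marks, n=4, method="inclusive")
iqr = q3 - q1
lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [m for m in marks if m < lower_fence or m > upper_fence]

print(mean, median, round(variance, 5), round(sd, 6))
print(q1, q3, outliers)  # 38.25 41.75 [49, 56]
```

The middle half of the students scored between 38.25 and 41.75. The marks 49 and 56 lie above the upper fence of 47 and pull the mean (41) slightly above the median (40.5), so the marks are mildly right-skewed.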
Q18) Answer the questions below using the boxplot visualization.
Draw an inference from the distribution of data for Boxplot 1 with respect to Boxplot 2.
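Both plots suggest that the data are approximately normally distributed.
Box plot 1 may represent a sample, and box plot 2 a population or a larger sample.
Neither plot shows outliers.
Q1 is the 25th percentile and Q3 the 75th percentile, so in both plots the box (the IQR) spans the middle 50% of the data. This is true of every boxplot, so on its own it does not show normality. The case for a symmetric, roughly normal shape (mean ≈ median ≈ mode) rests on the median sitting in the center of each box and the two whiskers having similar lengths.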
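The box always spans Q1 to Q3 (the middle 50%) whatever the shape of the data, while symmetry shows up in where the median sits inside the box. This Python sketch uses a hypothetical `box_summary` helper on made-up data, because the values behind the Q18 plots are not given. It computes the quartiles, Tukey's 1.5 × IQR outlier fences and Bowley's quartile skewness, which is 0 when the median is centered in the box.

```python
import statistics

def box_summary(values):
    """Quartiles, Tukey fences and Bowley (quartile) skewness; assumes IQR > 0."""
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "q1": q1, "median": med, "q3": q3, "iqr": iqr,
        "outliers": [v for v in values if v < low or v > high],
        # 0 when the median is centered in the box; > 0 right skew, < 0 left skew.
        "bowley_skew": (q3 + q1 - 2 * med) / iqr,
    }

# Made-up data for illustration only: one symmetric, one right-skewed sample.
print(box_summary([1, 2, 3, 4, 5, 6, 7, 8, 9]))
print(box_summary([1, 1, 2, 2, 3, 6, 9, 12, 30]))
```

For the symmetric sample the quartile skewness is 0 and there are no outliers. For the skewed sample it is about 0.71, and 30 is flagged as an outlier. In both cases the box still covers the middle half of the data.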
Q20) Calculate the probability from the given dataset for the cases below.
Hint: R code: pt(tscore, df)