
Basic Statistics 1 - Assignment - Vivek T

This document contains questions related to data types, probability, statistics, and data analysis. It includes questions about identifying data types, calculating probabilities, finding measures of central tendency and dispersion for datasets, checking for normal distributions, and interpreting boxplots and other visualizations. The questions cover concepts like discrete vs continuous data, nominal vs ordinal vs ratio variables, mean, median, mode, variance, standard deviation, skewness, kurtosis, and confidence intervals.

Q1) Identify the Data type for the Following:

Activity                                Data Type
Number of beatings from Wife            Discrete
Results of rolling a die                Discrete
Weight of a person                      Continuous
Weight of Gold                          Continuous
Distance between two places             Continuous
Length of a leaf                        Continuous
Dog's weight                            Continuous
Blue Color                              Categorical
Number of kids                          Discrete
Number of tickets in Indian railways    Discrete
Number of times married                 Discrete
Gender (Male or Female)                 Nominal or categorical

Q2) Identify the data type of each of the following as Nominal, Ordinal, Interval, or Ratio.
Data                            Data Type
Gender                          Nominal
High School Class Ranking       Ordinal
Celsius Temperature             Interval
Weight                          Ratio
Hair Color                      Nominal
Socioeconomic Status            Ordinal
Fahrenheit Temperature          Interval
Height                          Ratio
Type of living accommodation    Nominal
Level of Agreement              Ordinal
IQ (Intelligence Scale)         Interval
Sales Figures                   Ratio
Blood Group                     Nominal
Time Of Day                     Interval
Time on a Clock with Hands      Interval
Number of Children              Ratio
Religious Preference            Nominal
Barometer Pressure              Ratio
SAT Scores                      Interval
Years of Education              Ratio

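Q3) Three coins are tossed. Find the probability that two heads and one tail are obtained.
Ans: Of the 2^3 = 8 equally likely outcomes, three (HHT, HTH, THH) have exactly two heads and one tail, so the probability is 3/8.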
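The count behind this answer can be checked by brute force. The following is a minimal sketch in base R that enumerates every outcome of three tosses; the variable names are illustrative and not part of the assignment.

```r
# Enumerate all 2^3 equally likely outcomes of three coin tosses.
outcomes <- expand.grid(c1 = c("H", "T"),
                        c2 = c("H", "T"),
                        c3 = c("H", "T"),
                        stringsAsFactors = FALSE)

# Count the heads in each outcome (each row is one outcome).
heads <- rowSums(outcomes == "H")

# Share of outcomes with exactly two heads (and therefore one tail).
p_two_heads <- mean(heads == 2)
print(p_two_heads)
```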
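Q4) Two dice are rolled. Find the probability that the sum is
a) Equal to 1 Ans: 0 (the smallest possible sum is 2)
b) Less than or equal to 4 Ans: 6/36 = 1/6
c) Divisible by 2 and 3 Ans: a sum divisible by both 2 and 3 must be divisible by 6, so only the sums 6 (5 ways) and 12 (1 way) qualify: 6/36 = 1/6. If the question instead means divisible by 2 or 3, the answer is 18/36 + 12/36 - 6/36 = 24/36 = 2/3.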
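These dice probabilities can be verified by listing all 36 outcomes. The sketch below, in base R, reads part (c) as "divisible by both 2 and 3" and also computes the "2 or 3" reading for comparison; the variable names are illustrative.

```r
# All 36 equally likely sums of two fair dice, as a 6 x 6 matrix.
sums <- outer(1:6, 1:6, "+")

p_equal_1   <- mean(sums == 1)        # part (a)
p_at_most_4 <- mean(sums <= 4)        # part (b)
p_div_2_and_3 <- mean(sums %% 6 == 0) # part (c): divisible by both 2 and 3
p_div_2_or_3  <- mean(sums %% 2 == 0 | sums %% 3 == 0) # alternative reading

print(c(p_equal_1, p_at_most_4, p_div_2_and_3, p_div_2_or_3))
```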

Q5) A bag contains 2 red, 3 green and 2 blue balls. Two balls are drawn at
random. What is the probability that none of the balls drawn is blue?
Ans: Both balls must come from the 5 non-blue balls, so the probability that none of the balls drawn is blue is 5C2/7C2 = 10/21.
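The same ratio of combinations can be evaluated with base R's choose(). This is only a quick check of the arithmetic; the variable names are illustrative.

```r
# Ways to pick 2 balls from the 5 non-blue balls (2 red + 3 green).
favourable <- choose(5, 2)

# Ways to pick any 2 balls from all 7.
total <- choose(7, 2)

p_no_blue <- favourable / total
print(p_no_blue)
```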

Q6) Calculate the expected number of candies for a randomly selected child.
Below are the probabilities of the candy counts for the children (ignoring the nature of the child; a generalized view).
CHILD   Candies count (x)   Probability p(x)
A       1                   0.015
B       4                   0.20
C       3                   0.65
D       5                   0.005
E       6                   0.01
F       2                   0.120
For example, Child A has a probability of 0.015 of having 1 candy, and Child B has a probability of 0.20 of having 4 candies.
Ans: The expected number of candies for a randomly selected child is
Summation(x*p(x)) = 1*0.015 + 4*0.20 + 3*0.65 + 5*0.005 + 6*0.01 + 2*0.120 = 3.09
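The weighted sum above can be reproduced in a few lines of R. This sketch uses the values from the table and first confirms that the probabilities add up to 1; the vector names are illustrative.

```r
# Candy counts and their probabilities, in the order A..F from the table.
candies <- c(A = 1, B = 4, C = 3, D = 5, E = 6, F = 2)
prob    <- c(0.015, 0.20, 0.65, 0.005, 0.01, 0.120)

# A valid probability distribution must sum to 1.
total_prob <- sum(prob)

# Expected value: sum of x * p(x).
expected_candies <- sum(candies * prob)
print(expected_candies)
```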
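Q7) Calculate Mean, Median, Mode, Variance, Standard Deviation, and Range for the Points, Score, and Weigh columns of the given dataset, and comment on the values / draw inferences.
Ans: Using R. Note that R's built-in mode() returns the storage mode of an object (here "numeric"), not the statistical mode, so the mode() output below does not answer the "Mode" part of the question.
> mean(Q7$Weigh)
[1] 17.84875
> mode(Q7$Weigh)
[1] "numeric"
> median(Q7$Weigh)
[1] 17.71
> range(Q7$Weigh)
[1] 14.5 22.9
> var(Q7$Weigh)
[1] 3.193166
> sd(Q7$Weigh)
[1] 1.786943

> mean(Q7$Points)
[1] 3.596563
> mode(Q7$Points)
[1] "numeric"
> median(Q7$Points)
[1] 3.695
> range(Q7$Points)
[1] 2.76 4.93
> var(Q7$Points)
[1] 0.2858814
> sd(Q7$Points)
[1] 0.5346787

> mean(Q7$Score)
[1] 3.21725
> mode(Q7$Score)
[1] "numeric"
> median(Q7$Score)
[1] 3.325
> range(Q7$Score)
[1] 1.513 5.424
> var(Q7$Score)
[1] 0.957379
> sd(Q7$Score)
[1] 0.9784574

Inferences: For Weigh, the mean (17.85) is close to the median (17.71), and the range is 22.9 - 14.5 = 8.4. For Points and Score, the mean is slightly below the median (3.60 vs 3.695 and 3.22 vs 3.325), which suggests a mild left skew; their ranges are 2.17 and 3.911. Score has the largest spread relative to its mean (sd 0.98 on a mean of 3.22).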
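Because mode() in R reports an object's storage mode rather than its most frequent value, the statistical mode needs a small helper. The following is a minimal sketch; stat_mode is a hypothetical name, not a base R function, and the example vector is made up purely for illustration.

```r
# Hypothetical helper: return the most frequent value(s) of a numeric vector.
# If several values tie for the highest count, all of them are returned.
stat_mode <- function(x) {
  counts <- table(x)
  as.numeric(names(counts)[counts == max(counts)])
}

# Illustrative data only (not taken from the Q7 dataset).
example_values <- c(1, 2, 2, 3, 3, 3, 4)
example_mode <- stat_mode(example_values)
print(example_mode)

# With the assignment's data frame loaded, the call would look like:
# stat_mode(Q7$Weigh)
```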

Q8) Calculate Expected Value for the problem below

a) The weights (X) of patients at a clinic (in pounds) are
108, 110, 123, 134, 135, 145, 167, 187, 199
Assume one of the patients is chosen at random. What is the Expected
Value of the Weight of that patient?
Ans: Each patient is equally likely to be chosen, so the expected value of the weight is the mean of all the patients' weights, i.e.
1308/9 = 145.33 pounds
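Since each of the nine patients is chosen with probability 1/9, the expected value equals the arithmetic mean. A short R check, with illustrative variable names:

```r
# Patient weights in pounds, as listed in the question.
weights <- c(108, 110, 123, 134, 135, 145, 167, 187, 199)

# Each patient has probability 1/9 of being picked.
p <- rep(1 / length(weights), length(weights))

# Expected value as a probability-weighted sum; identical to mean(weights).
expected_weight <- sum(weights * p)
print(round(expected_weight, 2))
```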
Q9)
a. Calculate Skewness, Kurtosis & draw inferences on the following data
Cars speed and distance
Ans: Skewness and kurtosis were computed with the moments package in R. Its kurtosis() reports Pearson's kurtosis, for which a normal distribution scores 3.
> skewness(Q9$speed)
[1] -0.1139548   # negative: slightly left-skewed (longer left tail, values concentrated on the right)
> hist(Q9$speed)
> kurtosis(Q9$speed)
[1] 2.422853   # below 3: flatter than a normal curve, with lighter tails

> skewness(Q9$dist)
[1] 0.7824835   # positive: right-skewed (longer right tail, values concentrated on the left)

> kurtosis(Q9$dist)
[1] 3.248019   # slightly above 3: slightly heavier tails than a normal curve
b. SP and Weight (WT)

Ans:
> skewness(Q9_b$SP)
[1] 1.581454   # positive: SP is right-skewed
> kurtosis(Q9_b$SP)
[1] 5.723521   # well above 3: sharp peak with heavy tails
> skewness(Q9_b$WT)
[1] -0.6033099   # negative: WT is left-skewed

> kurtosis(Q9_b$WT)
[1] 3.819466   # above 3: more peaked than a normal curve, with heavier tails
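To make the interpretation of these numbers concrete, the moment-based formulas can be written out in base R. This is a sketch of the standard definitions (third and fourth central moments divided by powers of the second), which are understood to be what the moments package computes; skew_m and kurt_m are hypothetical names and the example vector is illustrative.

```r
# Moment-based skewness: m3 / m2^(3/2), using central moments with divisor n.
skew_m <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# Pearson's kurtosis: m4 / m2^2. A normal distribution gives 3,
# so "excess kurtosis" is this value minus 3.
kurt_m <- function(x) {
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2
}

# Illustrative symmetric data: skewness should be 0.
example_x <- c(1, 2, 3, 4, 5)
example_skew <- skew_m(example_x)
example_kurt <- kurt_m(example_x)
print(c(example_skew, example_kurt, excess = example_kurt - 3))
```

A value such as 2.42 therefore corresponds to an excess kurtosis of about -0.58, and 5.72 to about +2.72.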

Q10) Draw inferences about the following boxplot & histogram


Ans (histogram):

- The 50-100 weight bin has the highest frequency (180).
- The 100-150 weight bin has a frequency of 120.
- The 0-50 weight bin has a frequency of 80.
- The 350-400 weight bin has a very low frequency (5).
- The data is right-skewed (positive skewness).
- The data is not normally distributed.

Ans (boxplot):

- 7 outliers are present in the box plot.
- Positive skewness, i.e. the data is right-skewed.
- The data is not normally distributed.
- Q1 is smaller than Q3.

Q11) Suppose we want to estimate the average weight of an adult male in
Mexico. We draw a random sample of 2,000 men from a population of
3,000,000 men and weigh them. We find that the average person in our
sample weighs 200 pounds, and the standard deviation of the sample is 30
pounds. Calculate the 94%, 98%, and 96% confidence intervals.

Ans: The population standard deviation is unknown, so we use the t-distribution to determine the confidence interval of the given data.
$\bar{X} = 200$ pounds, $S = 30$ pounds, $n = 2000$
$$\bar{X} \pm t_{1-\alpha/2,\,n-1}\,\frac{S}{\sqrt{n}}$$
Confidence interval for 94%:
R code: qt(0.97, 1999) = 1.88
Substituting values in the equation:
$$200 \pm 1.88 \cdot \frac{30}{\sqrt{2000}} = 200 \pm 1.26$$
Hence the confidence interval for 94% is [198.74, 201.26]

Confidence interval for 96%:
R code: qt(0.98, 1999) = 2.05
Substituting values in the equation:
$$200 \pm 2.05 \cdot \frac{30}{\sqrt{2000}} = 200 \pm 1.38$$
Hence the confidence interval for 96% is [198.62, 201.38]

Confidence interval for 98%:
R code: qt(0.99, 1999) = 2.328
Substituting values in the equation:
$$200 \pm 2.328 \cdot \frac{30}{\sqrt{2000}} = 200 \pm 1.56$$
Hence the confidence interval for 98% is [198.44, 201.56]
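The three intervals follow the same recipe, so they can be wrapped in one small function. The sketch below assumes the two-sided t-interval used in this answer; ci_t is a hypothetical helper name, not a built-in.

```r
# Hypothetical helper: two-sided t confidence interval for a mean,
# given the sample mean, sample standard deviation, sample size and level.
ci_t <- function(xbar, s, n, level) {
  alpha <- 1 - level
  t_crit <- qt(1 - alpha / 2, df = n - 1)  # e.g. level 0.94 -> qt(0.97, n - 1)
  xbar + c(-1, 1) * t_crit * s / sqrt(n)
}

# Values from the question: mean 200, sd 30, n = 2000.
ci_94 <- ci_t(200, 30, 2000, 0.94)
ci_96 <- ci_t(200, 30, 2000, 0.96)
ci_98 <- ci_t(200, 30, 2000, 0.98)
print(round(rbind(ci_94, ci_96, ci_98), 2))
```

A higher confidence level uses a larger critical value, so the interval widens from 94% to 98%.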

Q12) Below are the scores obtained by a student in tests

34,36,36,38,38,39,39,40,40,41,41,41,41,42,42,45,49,56
1) Find mean, median, variance, standard deviation?
Ans: Mean=41, Median=40.5, Variance=25.52941, SD=5.052664
2) What can we say about the student marks?

Ans: The average of the student's marks is 41.
The marks range from 34 to 56.
The mode is 41.
Most of the scores (14 of 18) lie between 36 and 42.
The mean (41) is slightly above the median (40.5), so the marks are mildly right-skewed, pulled up by the high scores of 49 and 56.
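These summary values can be reproduced directly in R. Note that var() and sd() use the sample (n - 1) denominator, which is what the reported variance of 25.52941 corresponds to. Variable names are illustrative.

```r
# Test scores from the question.
scores <- c(34, 36, 36, 38, 38, 39, 39, 40, 40,
            41, 41, 41, 41, 42, 42, 45, 49, 56)

score_summary <- c(mean     = mean(scores),
                   median   = median(scores),
                   variance = var(scores),  # sample variance (divides by n - 1)
                   sd       = sd(scores))
print(score_summary)

# How many scores fall between 36 and 42 inclusive.
n_between_36_42 <- sum(scores >= 36 & scores <= 42)
print(n_between_36_42)
```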
Q13) What is the nature of skewness when mean, median of data are equal?
Ans: When the mean, median and mode are equal, there is no skewness, i.e. the distribution is symmetric. A normal distribution is one example, but equal mean and median alone do not prove that the data is normally distributed.
Q14) What is the nature of skewness when mean > median?
Ans: If the mean is greater than the median, the distribution is positively skewed (right-skewed).
Q15) What is the nature of skewness when median > mean?
Ans: If the mean is less than the median, the distribution is negatively skewed (left-skewed).
Q16) What does a positive kurtosis value indicate for data?
Ans: A positive (excess) kurtosis value indicates that the distribution has heavier tails and a sharper peak than the normal distribution.

Q17) What does a negative kurtosis value indicate for data?

Ans: A negative (excess) kurtosis value means that the distribution is flatter, with lighter tails, than a normal curve with the same mean and standard deviation.

Q18) Answer the below questions using the below boxplot visualization.

What can we say about the distribution of the data?

- No outliers
- Q1 (about 10) is less than Q3 (about 18)
- Median is between 15 and 16
- The middle 50% of the data lies in the range 10 to 18
- The data does not follow a normal distribution
- The data is left-skewed
What is nature of skewness of the data? Ans: Left skewness (the median lies closer to Q3 than to Q1)
What will be the IQR of the data (approximately)? Ans: IQR = Q3 - Q1 = 18 - 10 = 8

Q19) Comment on the below Boxplot visualizations?

Draw an inference from the distribution of data for Boxplot 1 with respect to
Boxplot 2.
- Both plots suggest that the data is approximately normally distributed (symmetric, with mean = median = mode).
- Box plot 1 could be for a sample, and box plot 2 for the population or for a sample of larger size.
- No outliers.
- In both plots Q1 marks the 25th percentile and Q3 the 75th, so the IQR covers the middle 50% of the data. This holds for every box plot and is not, by itself, evidence of normality.
Q 20) Calculate probability from the given dataset for the below cases

Data set: Cars.csv

Calculate the probability of MPG of Cars for the below cases.
MPG <- Cars$MPG
a. P(MPG > 38)
b. P(MPG < 40)
c. P(20 < MPG < 50)
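The source gives no worked answer for this question. One possible approach, sketched below, is to treat MPG as approximately normal with the sample mean and standard deviation and read the probabilities off pnorm(). That normality assumption is not stated in the assignment, mpg_probabilities is a hypothetical helper, and the demo vector is made up purely to show the call.

```r
# Hypothetical helper: normal-approximation probabilities for an MPG vector.
# ASSUMPTION: MPG is roughly normal with the sample mean and sd.
mpg_probabilities <- function(mpg) {
  m <- mean(mpg)
  s <- sd(mpg)
  c(p_gt_38    = 1 - pnorm(38, mean = m, sd = s),
    p_lt_40    = pnorm(40, mean = m, sd = s),
    p_20_to_50 = pnorm(50, mean = m, sd = s) - pnorm(20, mean = m, sd = s))
}

# Made-up illustrative values (mean 34, sd 4), NOT the Cars.csv data.
demo_mpg <- c(30, 34, 38)
demo_probs <- mpg_probabilities(demo_mpg)
print(demo_probs)

# With the real data the call would be:
# Cars <- read.csv("Cars.csv"); mpg_probabilities(Cars$MPG)
```

An alternative that needs no distributional assumption is the empirical proportion, for example mean(MPG > 38).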

Q 21) Check whether the data follows normal distribution

a) Check whether the MPG of Cars follows Normal Distribution
Dataset: Cars.csv
Ans: The MPG of cars does not follow the normal distribution.
b) Check whether the Adipose Tissue (AT) and Waist Circumference (Waist)
from the wc-at data set follow Normal Distribution
Dataset: wc-at.csv
Ans: The variable Waist circumference (Waist) does not follow the normal
distribution.

The variable 'AT' (adipose tissue) follows the normal distribution.
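The answers above are stated without the method used to reach them. One common way to check normality in R is a Shapiro-Wilk test, usually paired with a Q-Q plot. The sketch below is an assumption about how such a check could be done, not the author's actual procedure; normality_check is a hypothetical helper and the demo data is simulated.

```r
# Hypothetical helper: Shapiro-Wilk normality check on a numeric vector.
# A small p-value (below alpha) is evidence against normality.
normality_check <- function(x, alpha = 0.05) {
  x <- x[!is.na(x)]
  result <- shapiro.test(x)  # valid for 3 <= length(x) <= 5000
  list(statistic    = unname(result$statistic),
       p_value      = result$p.value,
       looks_normal = result$p.value > alpha)
}

# Simulated data for illustration only.
set.seed(42)
demo_check <- normality_check(rnorm(100))
print(demo_check)

# With the assignment's files loaded, the calls would look like:
# normality_check(Cars$MPG)
# qqnorm(Cars$MPG); qqline(Cars$MPG)   # visual check
```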


Q 22) Calculate the Z scores of 90% confidence interval, 94% confidence
interval, 60% confidence interval
Ans:
qnorm(0.95) # Z score for 90% confidence interval is 1.64485
qnorm(0.97) # Z score for 94% confidence interval is 1.8808
qnorm(0.80) # Z score for 60% confidence interval is 0.8416

Q 23) Calculate the t scores of 95% confidence interval, 96% confidence
interval, 99% confidence interval for sample size of 25
Ans:
qt(0.975,24) # t score for 95% confidence interval is 2.0639
qt(0.98,24) # t score for 96% confidence interval is 2.1715
qt(0.995,24) # t score for 99% confidence interval is 2.7969
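The probabilities passed to qnorm() and qt() come from splitting the leftover probability equally between the two tails: a confidence level c leaves (1 - c)/2 in each tail, so the quantile is taken at 1 - (1 - c)/2. A minimal sketch of that conversion, where z_crit and t_crit are hypothetical helper names:

```r
# Hypothetical helpers: two-sided critical values for a confidence level.
z_crit <- function(level) {
  qnorm(1 - (1 - level) / 2)
}

t_crit <- function(level, n) {
  qt(1 - (1 - level) / 2, df = n - 1)  # degrees of freedom = sample size - 1
}

z_values <- sapply(c(0.90, 0.94, 0.60), z_crit)
t_values <- sapply(c(0.95, 0.96, 0.99), t_crit, n = 25)
print(round(z_values, 4))
print(round(t_values, 4))
```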
Q 24) A Government company claims that an average light bulb lasts 270
days. A researcher randomly selects 18 bulbs for testing. The sampled bulbs
last an average of 260 days, with a standard deviation of 90 days. If the
CEO's claim were true, what is the probability that 18 randomly selected
bulbs would have an average life of no more than 260 days?

Hint:

R code: pt(tscore, df)

df: degrees of freedom
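Following the hint, the t score is the gap between the sample mean and the claimed mean, measured in standard errors, and pt() then gives the left-tail probability. The source leaves this question unanswered, so the sketch below is one way to carry out the hinted calculation; variable names are illustrative.

```r
# Values from the question.
claimed_mean <- 270
sample_mean  <- 260
sample_sd    <- 90
n            <- 18

# t score: (sample mean - claimed mean) / standard error of the mean.
t_score <- (sample_mean - claimed_mean) / (sample_sd / sqrt(n))

# Probability of a sample mean of 260 days or less if the claim is true.
p_at_most_260 <- pt(t_score, df = n - 1)
print(c(t_score = t_score, probability = p_at_most_260))
```

The t score is about -0.47 with 17 degrees of freedom, which gives a probability of roughly 0.32, so a sample average this low would not be unusual if the claim were true.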
