
2a EDA

The document outlines an assignment on Exploratory Data Analysis, requiring detailed in-line answers and executable code submissions. It includes various statistical problems involving calculations of mean, median, mode, variance, and probabilities related to datasets and scenarios. Guidelines emphasize the importance of thorough documentation and adherence to submission protocols.

Uploaded by

cikihi9288

2a.

Exploratory Data Analysis


Instructions:
Please share your answers filled in-line in the word document. Submit code
separately wherever applicable.

Please ensure you update all the details:


Name: _Naveen M____________ Batch ID: _11/09/2023-10AM__________
Topic: Exploratory Data Analysis

Guidelines:
1. An assignment submission is considered complete only when the correct and executable
code(s) is submitted along with the documentation explaining the method and results.
Failing to submit either of those will be considered an invalid submission and will not be
considered a correct submission.
2. Ensure that you submit your assignments correctly. Resubmission is not allowed.
3. After submission, you can evaluate your work by referring to the keys provided (these will
be available only after submission).

Hints: Follow CRISP-ML(Q) methodology steps, wherever appropriate.


1. Data Understanding: work on each feature of the dataset to create a data
dictionary as displayed in the image below:

Make a table as shown above and provide information about the features such as its data
type and its relevance to the model building. And if not relevant, provide reasons and a
description of the feature.

Problem Statements:
Q1) Calculate Mean, and Standard Deviation using Python code & draw inferences on the
following data. Refer to the Datasets attachment for the data file.
Hint: [Insights drawn from the data such as data is normally distributed/not, outliers, measures
like mean, median, mode, variance, std. deviation]
a. Car’s speed and distance

© 360DigiTMG. All Rights Reserved.


Skewness(speed)= -0.895425 Skewness(dist)= 1.290763

From the plot above, we can say that speed is negatively skewed,
whereas distance is positively skewed.
Speed appears platykurtic, and dist is strongly leptokurtic.
From the histogram above, we can say that the data is not symmetrically
distributed.

Ans) Mean(speed)= 11.407407 Mean(dist)= 27.666667


Median(speed)=12.0 Median(dist)=26.0
Variance(speed)= 10.404558 Variance(dist)= 291.153846
Std.deviation(speed)= 3.225610 Std.deviation(dist)= 17.063231



b. Top Speed (SP) and Weight (WT)
Skewness: SP = 1.611450, WT = -0.614753

We can say that top speed is positively skewed, as most of the data is
on the left side, whereas weight is slightly negatively skewed.

Mean: SP = 121.540272, WT = 32.412577
Median: SP = 118.208698, WT = 32.734518
Std. deviation: SP = 14.181432, WT = 7.492813
Variance: SP = 201.113002, WT = 56.142247



Q2) Below are the scores obtained by a student on tests.
34, 36, 36, 38, 38, 39, 39, 40, 40, 41, 41, 41, 41, 42, 42, 45, 49, 56
1) Find the mean, median and mode, variance, and standard deviation.
2) What can we say about the student marks?
3) What can you say about the Expected value for the student score?
ANS) Mean=41.0, Median=40.5, Mode=41.0, Std=5.052664, Var= 25.529412

- From the plot above, we can say that the mean score is 41, which is slightly
greater than the median (40.5).
- Most of the scores lie between 38 and 42, and there are two outliers, 49 and 56.
- With each score equally likely, the expected value of the score equals the mean, 41.
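The figures for Q2 can be reproduced with Python's standard `statistics` module. This is a minimal sketch, not necessarily the code used for the submission; it assumes the sample (n - 1) formulas for variance and standard deviation, which is what the reported values correspond to.

```python
import statistics

# Test scores from Q2
scores = [34, 36, 36, 38, 38, 39, 39, 40, 40, 41, 41, 41, 41, 42, 42, 45, 49, 56]

mean = statistics.mean(scores)          # arithmetic mean
median = statistics.median(scores)      # average of the two middle scores (n is even)
mode = statistics.mode(scores)          # most frequent score
variance = statistics.variance(scores)  # sample variance, n - 1 in the denominator
std_dev = statistics.stdev(scores)      # sample standard deviation

print(f"mean={mean}, median={median}, mode={mode}")
print(f"variance={variance:.6f}, std_dev={std_dev:.6f}")
```

Using `statistics.pvariance` and `statistics.pstdev` instead would give the population versions, which are slightly smaller.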

Q3) Three Coins are tossed, find the probability that two heads and one tail are obtained.
ANS) When three coins are tossed, the total number of possible combinations is 2^3 = 8.

These combinations are HHH, HHT, HTH, THH, TTH, THT, HTT, TTT. The combinations
which have two heads and one tail are HHT, HTH, THH, which makes them 3 in
number. The probability of getting two heads and one tail in the toss of three coins
simultaneously is 3/8 or 0.375.
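The count for Q3 can be checked by enumerating the sample space directly. The sketch below assumes fair, independent coins, so that all 8 outcomes are equally likely.

```python
from fractions import Fraction
from itertools import product

# All 2^3 equally likely outcomes of tossing three coins
outcomes = list(product("HT", repeat=3))

# Outcomes with exactly two heads (and therefore exactly one tail)
favourable = [o for o in outcomes if o.count("H") == 2]

probability = Fraction(len(favourable), len(outcomes))
print(["".join(o) for o in favourable], probability, float(probability))
```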
Q4) Two Dice are rolled, find the probability that the sum is
a) Equal to 1
b) Less than or equal to 4
c) Sum is divisible by 2 and 3
ANS) Total possible outcomes = 6^2 = 36
a) Favourable outcomes (sum equal to 1) = 0 {i.e., not possible, as the sum is always at least 2}
Required probability = 0/36 = 0
b) Favourable outcomes (sum less than or equal to 4) = 6 {i.e. (1,1)(1,2)(2,1)(1,3)(2,2)(3,1)}
Required probability = 6/36 = 1/6
c) Favourable outcomes (sum divisible by 2 and 3, i.e. sum equal to 6 or 12) = 6
Required probability = 6/36 = 1/6
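As a cross-check on Q4, the 36 outcomes can be enumerated and each event counted as it is worded in the question. This is a minimal sketch assuming two fair six-sided dice; the helper name `probability` is only illustrative.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely (die1, die2) outcomes
rolls = list(product(range(1, 7), repeat=2))

def probability(event):
    """Probability that the sum of the two dice satisfies `event`."""
    hits = sum(1 for roll in rolls if event(sum(roll)))
    return Fraction(hits, len(rolls))

p_sum_is_1 = probability(lambda s: s == 1)            # a) sum equal to 1
p_sum_at_most_4 = probability(lambda s: s <= 4)       # b) sum less than or equal to 4
p_div_by_2_and_3 = probability(lambda s: s % 6 == 0)  # c) divisible by both 2 and 3

print(p_sum_is_1, p_sum_at_most_4, p_div_by_2_and_3)
```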



Q5) A bag contains 2 red, 3 green, and 2 blue balls. Two balls are drawn at random. What is the
probability that none of the balls drawn is blue?
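ANS) The bag holds 2R + 3G + 2B = 7 balls, of which 5 are not blue.
P(first ball not blue) = 5/7, P(second ball not blue, given the first is not blue) = 4/6
P(none blue) = 5/7 * 4/6 = 20/42, i.e., 10/21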
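The same Q5 result follows from counting combinations: choose both balls from the 5 non-blue ones, out of all ways to choose 2 from 7. The sketch assumes the two balls are drawn without replacement.

```python
from fractions import Fraction
from math import comb

red, green, blue = 2, 3, 2
total_balls = red + green + blue  # 7 balls in the bag
non_blue = red + green            # 5 balls that are not blue

# Ways to pick 2 non-blue balls divided by ways to pick any 2 balls
p_no_blue = Fraction(comb(non_blue, 2), comb(total_balls, 2))
print(p_no_blue, float(p_no_blue))
```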

Q6) Calculate the Expected number of candies for a randomly selected child:
Below are the probabilities of the count of candies for children (ignoring the nature of the child-
Generalized view)
i. Child A – the probability of having 1 candy is 0.015.
ii. Child B – the probability of having 4 candies is 0.2.

CHILD   Candies count   Probability
A       1               0.015
B       4               0.20
C       3               0.65
D       5               0.005
E       6               0.01
F       2               0.12
ANS) Expected number of candies for randomly selected child = 1*0.015+ 4*0.20+3*0.65+
5*0.005+ 6*0.01 +2*0.120
Expected number of candies for randomly selected child =3.09
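The expected value in Q6 is the probability-weighted sum of the candy counts. The sketch below stores the table as a dictionary (the variable names are illustrative) and also confirms that the probabilities sum to 1.

```python
# child -> (candies count, probability), taken from the Q6 table
candies = {
    "A": (1, 0.015),
    "B": (4, 0.20),
    "C": (3, 0.65),
    "D": (5, 0.005),
    "E": (6, 0.01),
    "F": (2, 0.12),
}

# A valid probability distribution must sum to 1
total_probability = sum(p for _, p in candies.values())

# E[X] = sum of x * P(x)
expected_candies = sum(count * p for count, p in candies.values())

print(round(total_probability, 6), round(expected_candies, 6))
```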

Q7) Calculate Mean, Median, Mode, Variance, Standard Deviation, and Range & comment
about the values / draw inferences, for the given dataset.
- For Points, Score, Weigh
Find Mean, Median, Mode, Variance, Standard Deviation, and Range and comment on the
values/ Draw some inferences.

Dataset: Refer to Hands-on Material in LMS - Data Types EDA assignment snapshot of the
dataset is given above.



ANS) Points: Mean= 3.596563, Median= 3.695, Mode= “numeric”, Variance= 0.2858814, Standard
deviation= 0.5346787.
Score: Mean= 3.21725, Median= 3.325, Mode= “numeric”, Variance= 0.957379, Standard deviation=
0.9784574.
Weigh: Mean= 17.84875, Median= 17.71, Mode= “numeric”, Variance= 3.193166, Standard
deviation= 1.786943.
Note: The mean values of ‘Points’ and ‘Score’ are close to each other. The reported mode,
“numeric”, is the storage type of the column rather than its most frequent value, so the
statistical mode still has to be computed for each feature.

Q8) Calculate the Expected Value for the problem below.


a) The weights (X) of patients at a clinic (in pounds), are.
108, 110, 123, 134, 135, 145, 167, 187, 199
Assume one of the patients is chosen at random. What is the Expected Value of the
Weight of that patient?
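ANS) Each of the 9 patients is equally likely to be chosen, so p(x) = 1/9.
Expected Value = ∑[x.p(x)]
= (108 + 110 + 123 + 134 + 135 + 145 + 167 + 187 + 199) / 9 = 1308 / 9
Expected value = 145.33 (rounding p(x) to 0.11 before multiplying understates this as 143.88)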
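For Q8, the effect of rounding the probability can be seen by computing the expectation with the exact value 1/9 and with the rounded value 0.11. This is a minimal sketch; the variable names are illustrative.

```python
from fractions import Fraction

# Patient weights in pounds (Q8)
weights = [108, 110, 123, 134, 135, 145, 167, 187, 199]

# Each patient is equally likely to be chosen
p_exact = Fraction(1, len(weights))
expected_exact = sum(w * p_exact for w in weights)

# Same sum with the probability rounded to two decimals
expected_rounded = sum(w * 0.11 for w in weights)

print(float(expected_exact), round(expected_rounded, 2))
```

With equal probabilities, the expected value is simply the arithmetic mean of the weights.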
Q9) Look at the data given below. Plot the data, find the outliers, and find out: μ, σ, σ²
Hint: [Use a plot that shows the data distribution, and skewness along with the outliers; also use
Python code to evaluate measures of centrality and spread]

Name of company Measure X


Allied Signal 24.23%
Bankers Trust 25.53%
General Mills 25.41%
ITT Industries 24.14%
J.P.Morgan & Co. 29.62%
Lehman Brothers 28.25%
Marriott 25.81%
MCI 24.39%
Merrill Lynch 40.26%
Microsoft 32.95%
Morgan Stanley 91.36%
Sun Microsystems 25.99%
Travelers 39.42%
US Airways 26.71%
Warner-Lambert 35.00%

ANS) Mean=33.27133333333333
Std= 16.945400921222028
Variance=287.1466123809524



From the plot above, we can say that the median value is near 26.5, that the data
is positively skewed, and that the distribution is asymmetrical. Morgan Stanley
(91.36%) lies far above the other values and is an outlier.
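The Q9 measures can be reproduced, and the outlier flagged, with the standard library alone. The sketch below assumes sample (n - 1) variance and uses the common 1.5 × IQR box-plot rule for outliers; that rule, and the default quartile method of `statistics.quantiles`, are choices made here rather than something the assignment specifies.

```python
import statistics

# Measure X in percent, from the Q9 table
measure_x = {
    "Allied Signal": 24.23, "Bankers Trust": 25.53, "General Mills": 25.41,
    "ITT Industries": 24.14, "J.P.Morgan & Co.": 29.62, "Lehman Brothers": 28.25,
    "Marriott": 25.81, "MCI": 24.39, "Merrill Lynch": 40.26, "Microsoft": 32.95,
    "Morgan Stanley": 91.36, "Sun Microsystems": 25.99, "Travelers": 39.42,
    "US Airways": 26.71, "Warner-Lambert": 35.00,
}
values = list(measure_x.values())

mu = statistics.mean(values)         # mean
sigma = statistics.stdev(values)     # sample standard deviation
sigma_sq = statistics.variance(values)  # sample variance

# Quartiles and the 1.5 * IQR fences used by a box plot
q1, q2, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1
lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr

outliers = {name: v for name, v in measure_x.items()
            if v < lower_fence or v > upper_fence}

print(f"mean={mu:.4f}, std={sigma:.4f}, variance={sigma_sq:.4f}")
print("median:", q2, "outliers:", outliers)
```

A box plot or histogram of `values` (for example with matplotlib) would show the same single point far to the right of the rest.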

Q10) AT&T was running commercials in 1990 aimed at luring back customers who had
switched to one of the other long-distance phone service providers. One such commercial shows
a businessman trying to reach Phoenix and mistakenly getting Fiji, where a half-naked native on
a beach responds incomprehensibly in Polynesian. When asked about this advertisement, AT&T
admitted that the portrayed incident did not actually take place but added that this was an
enactment of something that “could happen.” Suppose that one in 200 long-distance telephone
calls is misdirected.

What is the probability that at least one in five attempted telephone calls reaches the wrong
number? (Assume independence of attempts.)
Hint: [Using the Probability formula evaluate the probability of one call being wrong out of five
attempted calls]
ANS) One in 200 long-distance telephone calls is misdirected, so:
p = probability of a call being misdirected = 1/200
q = probability of a call not being misdirected = 1 - 1/200 = 199/200
Number of calls: n = 5
Binomial formula: P(X = x) = nCx * p^x * q^(n-x), where nCx = n! / (x! * (n - x)!)
Exactly one of the five calls misdirected: P(X = 1) = 5C1 * (1/200)^1 * (199/200)^4 = 0.0245037
At least one of the five calls misdirected: P(X >= 1) = 1 - P(X = 0) = 1 - (199/200)^5 = 0.0247512
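For Q10, "exactly one" and "at least one" misdirected call are different events, and both can be evaluated from the binomial formula. This is a minimal sketch assuming independent calls, as the question states; the helper name `binomial_pmf` is illustrative.

```python
from math import comb

n = 5        # attempted calls
p = 1 / 200  # probability that a single call is misdirected

def binomial_pmf(k, n, p):
    """P(X = k) for a binomial(n, p) random variable."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Exactly one misdirected call out of five
p_exactly_one = binomial_pmf(1, n, p)

# At least one misdirected call: complement of "no call is misdirected"
p_at_least_one = 1 - binomial_pmf(0, n, p)

print(f"{p_exactly_one:.7f} {p_at_least_one:.7f}")
```

The two values are close because the chance of two or more misdirected calls in five attempts is very small.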

Q11) Returns on a certain business venture, to the nearest $1,000, are known to follow the
following probability distribution.
X        P(x)
-2,000   0.1
-1,000   0.1
0        0.2
1,000    0.2
2,000    0.3
3,000    0.1

(i) What is the most likely monetary outcome of the business venture?
Hint: [The outcome is most likely the expected returns of the venture]

(ii) Is the venture likely to be successful? Explain.


Hint: [Probability of % of the venture being a successful one]

(iii) What is the long-term average earning of business ventures of this kind? Explain.
Hint: [Here, the expected return to the venture is considered as the
required average]

(iv) What is a good measure of the risk involved in a venture of this kind? Compute
this measure.
Hint: [Risk here stems from the possible variability in the expected returns,
therefore, name the risk measure for this venture]

ANS)
(i) The most likely monetary outcome of the business venture is $2,000, as it has the maximum
probability, 0.3.
(ii) The venture is successful if X is positive, that is, if X is 1000, 2000 or 3000. The
probability of this is 0.2 + 0.3 + 0.1 = 0.6.

As 0.6 > 0.5, the venture is likely to be successful.

(iii) Long-term average earning of business ventures of this kind = E(X)

E(X) = ∑ X.P(X) = $ 800

(iv) A good measure of the risk involved in the venture is the variance (or standard deviation) of the returns.

Var (X) = E(X²) - { E(X) }²

= 2800000 - 800²

= 2160000 (quite high)

SD = √Var ≈ $ 1470

As the variability is quite high, the risk is high.
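The four parts of Q11 can be computed directly from the probability table. The sketch below is one way to do it; it treats a strictly positive return as "success", which is the assumption made in part (ii).

```python
import math

# Return X (in dollars) -> probability P(x), from the Q11 table
returns = {-2000: 0.1, -1000: 0.1, 0: 0.2, 1000: 0.2, 2000: 0.3, 3000: 0.1}

# (i) Most likely outcome: the return with the highest probability
most_likely = max(returns, key=returns.get)

# (ii) Probability of success, taking success to mean a positive return
p_success = sum(p for x, p in returns.items() if x > 0)

# (iii) Long-term average earning: E[X] = sum of x * P(x)
expected = sum(x * p for x, p in returns.items())

# (iv) Risk: Var(X) = E[X^2] - (E[X])^2, and its square root
second_moment = sum(x**2 * p for x, p in returns.items())
variance = second_moment - expected**2
std_dev = math.sqrt(variance)

print(most_likely, round(p_success, 2), round(expected, 2))
print(round(variance, 2), round(std_dev, 2))
```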

Hints:
For each assignment, the solution should be submitted in the below format.
1. Research and Perform all possible steps for obtaining the solution.
2. For Statistics calculations, an explanation of the solutions should be documented in detail
along with codes. Use the same word document to fill in your explanation.
Must follow these guidelines:
2.1 Be thorough with the concepts of Probability, Probability Distributions, Business
Moments, and Univariate & Bivariate visualizations.
2.2 For True/False Questions, or short answer type questions explanation is a must.
2.3 Python code for Univariate Analysis (histogram, box plot, bar plots, etc.) the data
distribution is to be attached.
3. All the codes (executable programs) should execute without errors
4. Code modularization should be followed
5. Each line of code should have comments explaining the logic and why you are using that
function
