
13. Exploratory Data Analysis

The document discusses exploratory data analysis (EDA) and provides 13 questions on EDA concepts such as skewness, kurtosis, outliers and distributions. It asks the reader to calculate various statistical measures from given datasets and to draw inferences from histograms, boxplots and other visualizations. R and Python code is required wherever applicable.

Exploratory Data Analysis

Instructions:
Please share your answers filled inline in the Word document. Submit the Python and R code
files wherever applicable.

Please ensure you update all the details:


Name: Pranjal Kumar
Batch Id: DSWEMOB 030421
Topic: Exploratory Data Analysis

Problem Statements:
Q1) Calculate Skewness, Kurtosis using R/Python code & draw inferences on the following data.
Hint: [Insights drawn from the data such as data is normally distributed/not, outliers, measures
like mean, median, mode, variance, std. deviation]
a. Cars speed and distance

Ans: The calculations are based on the Excel dataset attached with the assignment.

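library(moments)                   # provides skewness() and kurtosis()
Speed <- read.csv(file.choose())   # choose the speed/distance CSV file
attach(Speed)                      # make the columns (e.g. dist) accessible by name
mean(dist)                         # mean
median(dist)                       # median
skewness(dist)                     # skewness (0 for a symmetric distribution)
kurtosis(dist)                     # kurtosis (3 for a normal distribution)
var(dist)                          # variance
sd(dist)                           # standard deviation
hist(dist)                         # histogram
boxplot(dist)                      # boxplot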

© 2013 - 2020 360DigiTMG. All Rights Reserved.


> mean(dist)
[1] 42.98
> median(dist)
[1] 36
> skewness(dist)
[1] 0.7824835
> kurtosis(dist)
[1] 3.248019
> var(dist) # variance
[1] 664.0608
> sd(dist) #standard deviation
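[1] 25.76938

Inference:
• Mean (42.98) > median (36) and skewness ≈ 0.78 > 0, so the distance data is positively (right) skewed and not normally distributed.
• Kurtosis ≈ 3.25 is slightly above 3, the value for a normal distribution, so the tails are slightly heavier than normal.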
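Skewness and kurtosis have several slightly different definitions. The values above (kurtosis ≈ 3.25 on the scale where a normal distribution scores 3) are consistent with the moment-based formulas g1 = m3/m2^1.5 and b2 = m4/m2^2, where m_k is the k-th central moment with divisor n; R's `moments` package uses these. The Python sketch below implements only those two formulas with the standard library. The function names are illustrative, and other tools (for example `scipy.stats.kurtosis`, which defaults to excess kurtosis) may report different numbers for the same data.

```python
from typing import Sequence


def central_moment(data: Sequence[float], k: int) -> float:
    """k-th central moment with divisor n (population form)."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** k for x in data) / n


def skewness(data: Sequence[float]) -> float:
    """Moment skewness g1 = m3 / m2**1.5; 0 for symmetric data."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5


def kurtosis(data: Sequence[float]) -> float:
    """Pearson kurtosis b2 = m4 / m2**2; 3 for a normal distribution."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2


if __name__ == "__main__":
    toy = [1, 2, 3, 4, 5]  # hypothetical symmetric toy sample, not the assignment data
    print(skewness(toy), kurtosis(toy))  # symmetric data gives skewness 0
```

To cross-check the R results, pass the distance column (read, for example, with Python's `csv` module) to `skewness` and `kurtosis`.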

b. Top Speed (SP) and Weight (WT)

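library(moments)                   # provides skewness() and kurtosis()
Weight <- read.csv(file.choose())  # choose the CSV file with the SP and WT columns
attach(Weight)                     # make the columns (SP, WT) accessible by name
mean(WT)                           # mean
median(WT)                         # median
skewness(WT)                       # skewness (0 for a symmetric distribution)
kurtosis(WT)                       # kurtosis (3 for a normal distribution)
var(WT)                            # variance
sd(WT)                             # standard deviation
hist(WT)                           # histogram
boxplot(WT)                        # boxplot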
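Q1(b) asks for the measures for both Top Speed (SP) and Weight (WT), while the R code above summarises only WT. A minimal Python sketch, assuming the attached file is a CSV whose header names the columns `SP` and `WT` (the file layout is an assumption), that computes the mean, median, sample variance and sample standard deviation of each column. `summarise_columns` is a hypothetical helper name.

```python
import csv
import statistics
from typing import Dict, Iterable, Sequence


def summarise_columns(
    lines: Iterable[str], columns: Sequence[str] = ("SP", "WT")
) -> Dict[str, Dict[str, float]]:
    """Mean, median, sample variance and sample SD for each named CSV column."""
    rows = list(csv.DictReader(lines))  # the first line is taken as the header
    summary = {}
    for col in columns:
        values = [float(row[col]) for row in rows]
        summary[col] = {
            "mean": statistics.mean(values),
            "median": statistics.median(values),
            "variance": statistics.variance(values),  # divisor n - 1, like R's var()
            "sd": statistics.stdev(values),           # divisor n - 1, like R's sd()
        }
    return summary
```

For example, `with open(path, newline="") as f: print(summarise_columns(f))`, where `path` is the location of the attached CSV. Skewness and kurtosis can be added per column with the moment formulas m3/m2^1.5 and m4/m2^2.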


Q2) Draw inferences about the following boxplot & histogram.
Hint: [Insights drawn from the plots about the data such as whether data is normally
distributed/not, outliers, measures like mean, median, mode, variance, std. deviation]

Sol:
• The data is not normally distributed.
• It is positively skewed, because it has a long tail towards the right, and it is unimodal.

Sol:
• The data is not normally distributed.
• It is positively skewed.
• There are outliers in the data.
• The median is not in the middle of the box; it lies nearer to Q1.

Q3) Below are the scores obtained by a student in tests


34,36,36,38,38,39,39,40,40,41,41,41,41,42,42,45,49,56
1) Find mean, median, variance, standard deviation.
2) What can we say about the student marks? [Hint: Looking at the various measures
calculated above whether the data is normal/skewed or if outliers are present].
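Ans:
library(moments)                   # provides skewness() and kurtosis()
Marks <- read.csv(file.choose())   # choose the CSV file holding the scores (column Mks.)
attach(Marks)                      # make the column Mks. accessible by name
mean(Marks$Mks.)                   # mean
median(Mks.)                       # median
skewness(Mks.)                     # skewness (0 for a symmetric distribution)
kurtosis(Mks.)                     # kurtosis (3 for a normal distribution)
var(Mks.)                          # variance
sd(Mks.)                           # standard deviation
hist(Mks.)                         # histogram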
> mean(Marks$Mks.)
[1] 41

> median(Mks.)
[1] 40.5
> skewness(Mks.)
[1] 1.542885
> kurtosis(Mks.)
[1] 5.621631
> var(Mks.) # variance
[1] 25.52941
> sd(Mks.) #standard deviation
[1] 5.052664
> hist(Mks.) #histogram
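Because the Q3 scores are listed in the question, the R results can be cross-checked without the CSV file. A minimal Python sketch using only the standard library; it assumes moment-based skewness (m3/m2^1.5) and Pearson kurtosis (m4/m2^2), the convention that matches the R output above, with sample (n - 1) variance and standard deviation as in R's `var()` and `sd()`.

```python
import statistics

# Scores from Q3
scores = [34, 36, 36, 38, 38, 39, 39, 40, 40, 41, 41, 41, 41, 42, 42, 45, 49, 56]

n = len(scores)
mean = statistics.mean(scores)
median = statistics.median(scores)
variance = statistics.variance(scores)  # sample variance, divisor n - 1
sd = statistics.stdev(scores)           # sample standard deviation

# Population central moments (divisor n), used by moment skewness/kurtosis
m2 = sum((x - mean) ** 2 for x in scores) / n
m3 = sum((x - mean) ** 3 for x in scores) / n
m4 = sum((x - mean) ** 4 for x in scores) / n
skewness = m3 / m2 ** 1.5
kurtosis = m4 / m2 ** 2

print(f"mean={mean} median={median} var={variance:.5f} sd={sd:.6f}")
print(f"skewness={skewness:.6f} kurtosis={kurtosis:.6f}")
```

Run as a script, the printed values should agree with the R output above to the displayed precision.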


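• Mean (41) > median (40.5) and skewness ≈ 1.54 > 0, so the marks are positively skewed and not normally distributed.
• Kurtosis ≈ 5.62 is well above 3, indicating heavy tails.
• Outliers are present: most scores lie between 38 and 42, while the high scores 49 and 56 lie beyond the upper boxplot fence (Q3 + 1.5 × IQR) and pull the mean above the median.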
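The outlier claim can be checked with the 1.5 × IQR rule that boxplots use: any point below Q1 - 1.5·IQR or above Q3 + 1.5·IQR is flagged. A minimal sketch on the Q3 scores; `iqr_outliers` is a hypothetical helper, and `statistics.quantiles(..., method="inclusive")` interpolates quartiles the same way as R's default `quantile()` (type 7). R's `boxplot()` uses Tukey's hinges instead (38 and 42 for these scores), which moves the fences slightly, but both approaches flag 49 and 56.

```python
import statistics

scores = [34, 36, 36, 38, 38, 39, 39, 40, 40, 41, 41, 41, 41, 42, 42, 45, 49, 56]


def iqr_outliers(data, k=1.5):
    """Flag points outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's boxplot rule)."""
    # "inclusive" interpolation matches R's default quantile() (type 7)
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return (q1, q3, iqr, lower, upper), [x for x in data if x < lower or x > upper]


bounds, outliers = iqr_outliers(scores)
print("Q1, Q3, IQR, lower fence, upper fence:", bounds)
print("Outliers:", outliers)
```

Here Q1 = 38.25, Q3 = 41.75 and IQR = 3.5, so the upper fence is 47 and only the scores 49 and 56 lie beyond it.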

Q5) What is the nature of skewness when the mean and median of the data are equal?
Ans-There is no skewness: the distribution is symmetric, as in a normal distribution.

Q6) What is the nature of skewness when mean > median?


Ans-Positive skewness
Q7) What is the nature of skewness when median > mean?
Ans-Negative skewness
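The rules in Q5 to Q7 can be seen on small samples. This sketch uses hypothetical toy data (not from the assignment) and compares the mean, the median and the moment skewness m3/m2^1.5. The comparison of mean and median is a rule of thumb that usually holds for unimodal data but can fail for some distributions.

```python
import statistics


def moment_skewness(data):
    """Moment skewness m3 / m2**1.5 using population (divisor n) moments."""
    n = len(data)
    mu = statistics.mean(data)
    m2 = sum((x - mu) ** 2 for x in data) / n
    m3 = sum((x - mu) ** 3 for x in data) / n
    return m3 / m2 ** 1.5


# Hypothetical toy samples, chosen only to illustrate the three cases
samples = {
    "symmetric": [1, 2, 3, 4, 5],     # mean == median
    "right tail": [1, 2, 3, 4, 20],   # mean > median
    "left tail": [-20, 1, 2, 3, 4],   # mean < median
}
for name, data in samples.items():
    mean, median = statistics.mean(data), statistics.median(data)
    print(f"{name}: mean={mean} median={median} skewness={moment_skewness(data):.3f}")
```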
Q8) What does a positive kurtosis value indicate about the data?
Ans- Positive values of (excess) kurtosis indicate that a distribution is more peaked and has heavier (thicker) tails than a normal distribution.
Q9) What does a negative kurtosis value indicate about the data?
Ans- Negative values of (excess) kurtosis indicate that a distribution is flatter and has thinner tails than a normal distribution.
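Q8 and Q9 refer to the sign of excess kurtosis, m4/m2^2 - 3, which is 0 for a normal distribution (tools that report Pearson kurtosis, such as R's `moments::kurtosis`, show 3 for a normal distribution instead). A small sketch on two hypothetical toy samples: an evenly spread sample gives a negative value, while a sample with a sharp central peak and two extreme points gives a positive one.

```python
def excess_kurtosis(data):
    """Excess kurtosis m4 / m2**2 - 3 (0 for a normal distribution)."""
    n = len(data)
    mu = sum(data) / n
    m2 = sum((x - mu) ** 2 for x in data) / n
    m4 = sum((x - mu) ** 4 for x in data) / n
    return m4 / m2 ** 2 - 3


flat = [1, 2, 3, 4, 5]                    # evenly spread, no tails
heavy = [-10, 0, 0, 0, 0, 0, 0, 0, 0, 10]  # sharp peak at 0 plus two extreme points
print(excess_kurtosis(flat))   # negative (platykurtic)
print(excess_kurtosis(heavy))  # positive (leptokurtic)
```

For these samples the values are about -1.3 and 2.0 respectively.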
Q10) Answer the below questions using the below boxplot visualization.

What can we say about the distribution of the data?


Ans: The distribution of the data is skewed.
What is the nature of the skewness of the data?
Ans: Negative skewness.
What will be the IQR of the data (approximately)?
Ans: About 8.

Q11) Comment on the boxplot visualizations below.

Draw an inference from the distribution of data for Boxplot 1 with respect to Boxplot 2.
Hint: [On comparing both the plots, and check if the data is normally distributed/not, outliers
present, skewness etc.]

• Both boxplots indicate normally distributed (symmetric) data.
• Both have the same median.
• No outliers are present in either case.

Q12)

Answer the following three questions based on the boxplot above.


(i) What is inter-quartile range of this dataset? [Hint: IQR = Q3 – Q1]
In one line, explain what this value implies. (Hint: Based on IQR definition)
7. The middle 50% of the data lies within a range of about 7 units.
(ii) What can we say about the skewness of this dataset?
Positively skewed.
(iii) If it were found that the data point with the value 25 is 2.5, how would the new
boxplot be affected?
(Hint: On changing the data point from 25 to 2.5 in the data, how is it different from the
current one.)

Q13)

Answer the following three questions based on the histogram above.
(i) Where would the mode of this dataset lie? Hint: [In terms of values On Y-axis]
The mode lies at about 20 (in terms of values on the Y-axis).
(ii) Comment on the skewness of the dataset
Positive skewness
(iii) Suppose that the above histogram and the boxplot in question 2 are plotted for
the same dataset. Explain how these graphs complement each other in providing
information about any dataset. Hint: [Visualizing both the plots, draw the
insights]

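• The data is not normally distributed.
• The data is positively skewed.
• The data contains outliers.
• The histogram shows the overall shape of the distribution (here a long right tail), while the boxplot shows the median, the quartiles and the individual outliers; together they confirm the skewness and show which points are unusual.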
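Since the underlying data for these plots is not given, this sketch reuses the Q3 scores to show how the two views complement each other: the text histogram shows the shape (a peak in the 40 to 44 bin with a thin tail to the right), while the five-number summary gives the exact median and quartiles that a boxplot is drawn from. The helper names are hypothetical.

```python
import statistics
from collections import Counter

# Scores from Q3, used here only as an example dataset
scores = [34, 36, 36, 38, 38, 39, 39, 40, 40, 41, 41, 41, 41, 42, 42, 45, 49, 56]


def text_histogram(data, width=5):
    """Print one row of '*' per bin of the given width, including empty bins."""
    counts = Counter((x // width) * width for x in data)  # bin start -> count
    first = (min(data) // width) * width
    for start in range(first, max(data) + 1, width):
        print(f"{start:>3}-{start + width - 1:<3} {'*' * counts[start]}")
    return counts


def five_number_summary(data):
    """Min, Q1, median, Q3, max: the summary a boxplot is built from."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return min(data), q1, median, q3, max(data)


counts = text_histogram(scores)
print("min, Q1, median, Q3, max:", five_number_summary(scores))
```

The histogram reveals the empty 50 to 54 bin before the score 56, and the quartiles from the five-number summary give the fences that confirm it is an outlier.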


Hints:
For each assignment, the solution should be submitted in the below format
1. Research and perform all possible steps for obtaining the solution
2.
3. For Statistics calculations, explanation of the solutions should be documented in black and
white along with the codes.
Must follow these guidelines:
3.1. Be thorough with the concepts of probability and the Central Limit Theorem, and perform the
calculations stepwise
3.2. For true/false or short-answer questions, an explanation is a must

3.3. R & Python code for univariate analysis of the data distribution (histogram, box plot, bar
plots etc.) must be attached
4. All the code (executable programs) should execute without errors
5. Code modularization should be followed
6. Each line of code should have comments explaining the logic and why you are using it
