Graphical Representation

The document outlines instructions for a data visualization assignment, including requirements for submission and guidelines for creating various plots. It covers univariate, bivariate, and multivariate graphs, explaining the necessary plots and their interpretations. Additionally, it discusses skewness and probability distributions, emphasizing the relationships between mean, median, and mode in different distributions.

Uploaded by

cikihi9288


2a.

Graphical Representation
Instructions:
Please share your answers filled in-line in the word document. Submit code
separately wherever applicable.

Please ensure you update all the details:

Name: _Naveen M_____ Batch ID: _11/09/2023-10AM________
Topic: Data Visualization

Guidelines:
1. An assignment submission is considered complete only when correct, executable code is
submitted along with documentation explaining the method and results. A submission missing either
of these is invalid and will not be counted as a correct submission.

2. Ensure that you submit your assignments correctly. Resubmission is not allowed.

3. After submission, you can evaluate your work against the keys provided (available
only after submission).

Hints: Follow the CRISP-ML(Q) methodology steps wherever appropriate.

1. Data Understanding: work on each feature of the dataset to create a data
dictionary as displayed in the image below:

Make a table as shown above and provide information about the features such as its data
type and its relevance to the model building. And if not relevant, provide reasons and a
description of the feature.
Problem Statements:

1. Univariate plots for UNIV data (Plot must have Title, X & Y label)
A) Plot numerical column with 3 different plots ?
B) What are bin parameters? What are the methods to define the number of bins and
bin sizes ?

© 360DigiTMG. All Rights Reserved.

C) Why do density plots exceed the range values of the column ?
D) Plot categorical columns by taking unique values ?
ANS) A) #Required libraries
import pandas as pd
import matplotlib.pyplot as plt

#Reading the data

df=pd.read_csv('C:/Users/Naveen/Desktop/Data Preprocessing Dataset/education.csv')

#Histogram
plt.hist(df.workex)
plt.title('Histogram of workex')
plt.xlabel('workex')
plt.ylabel('Frequency')
plt.show()

#Boxplot
plt.boxplot(df.workex)
plt.title('Boxplot of workex')
plt.xlabel('workex')
plt.ylabel('Value')
plt.show()

#Violin plot
plt.violinplot(df.workex)
plt.title('Violin plot of workex')
plt.xlabel('workex')
plt.ylabel('Value')
plt.show()

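B) Bins divide the range of the data into intervals in a histogram. The bin parameters are the
number of bins and the bin width (or, equivalently, the bin edges); in matplotlib they are set through
the bins argument of plt.hist. Choosing them well matters: too few bins hide the shape of the
data, and too many make the plot noisy. With n observations, common rules are:
Square Root Choice: number of bins k = ceil(sqrt(n))

Sturges' Formula: number of bins k = ceil(log2(n)) + 1

Scott's Rule: bin width h = 3.49 * s / n^(1/3), where s is the sample standard deviation

Freedman-Diaconis Rule: bin width h = 2 * IQR / n^(1/3), where IQR is the interquartile range

For the two width-based rules, the number of bins is the data range divided by h, rounded up.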
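
The four rules named above can be compared on a small sample. The following is a minimal sketch using only the Python standard library; the function name suggested_bins and the sample data are illustrative assumptions, not part of the assignment dataset.

```python
import math
import statistics


def suggested_bins(values):
    """Return the number of bins suggested by four common rules."""
    n = len(values)
    data_range = max(values) - min(values)

    # Quartiles for the interquartile range (default 'exclusive' method).
    q1, _, q3 = statistics.quantiles(values, n=4)

    # Scott and Freedman-Diaconis give a bin WIDTH, not a bin count.
    scott_width = 3.49 * statistics.stdev(values) * n ** (-1 / 3)
    fd_width = 2 * (q3 - q1) * n ** (-1 / 3)

    return {
        "square_root": math.ceil(math.sqrt(n)),
        "sturges": math.ceil(math.log2(n)) + 1,
        "scott": math.ceil(data_range / scott_width),
        "freedman_diaconis": math.ceil(data_range / fd_width),
    }


# Hypothetical sample: the values 1.0 to 100.0 (not the assignment data).
sample = [float(v) for v in range(1, 101)]
bins = suggested_bins(sample)
print(bins)
```

Any of the resulting counts could then be passed to plt.hist through its bins argument. The rules often disagree, so it is reasonable to try more than one and keep the plot that shows the shape of the data most clearly.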

C) Why do density plots exceed the range values of the column?

Density plots created with kernel density estimation (KDE) can extend beyond the range of
the data because of how the estimate is built. KDE places a smooth kernel (usually a
Gaussian curve) on every observation and adds the kernels together. The kernels centred on the
smallest and largest values spill past those values, so the estimated density has tails
outside the observed range. The density plot is therefore a smoothed estimate of the
underlying probability density function, not a plot of the raw data.
This extension is a feature of the KDE smoothing process and is not an error. If you want to
focus on the observed values, you can limit the x-axis of the plot to the observed data range.
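
To make the point concrete, here is a minimal sketch of a Gaussian KDE written with the standard library only. The function gaussian_kde, the sample values and the bandwidth of 0.5 are all illustrative assumptions; plotting libraries pick the bandwidth automatically and will give different numbers.

```python
import math


def gaussian_kde(data, bandwidth):
    """Return a density function built as an average of Gaussian kernels."""
    n = len(data)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        # Each observation contributes one bell curve centred on itself.
        return norm * sum(
            math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data
        )

    return density


# Hypothetical sample whose observed range is 1.0 to 4.0.
data = [1.0, 2.0, 2.5, 3.0, 4.0]
density = gaussian_kde(data, bandwidth=0.5)  # bandwidth chosen by hand

below_min = density(min(data) - 0.5)  # evaluated at 0.5, outside the data
above_max = density(max(data) + 0.5)  # evaluated at 4.5, outside the data
print(below_min, above_max)
```

Both values are greater than zero even though no observation lies at 0.5 or 4.5, which is exactly why the plotted curve runs past the ends of the data. In seaborn, the cut and clip parameters of kdeplot control how far the curve is drawn beyond the data.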
D) #categorical_data is assumed to be a categorical column (a pandas Series) taken from df
#value_counts() returns each unique value together with its count,
#so labels and bar heights stay aligned
value_counts = categorical_data.value_counts()

plt.bar(value_counts.index.astype(str), value_counts.values)
plt.title("Bar Plot of Categorical Column")
plt.xlabel("Categories")
plt.ylabel("Counts")
plt.xticks(rotation=45) # Rotate x-axis labels for better readability
plt.show()
2. Bivariate graphs for UNIV data (Plot must be readable [use rotation], have all labels)
A) Plot 2 numerical columns with scatter plot [use grid] ?
B) 2 Different plots for plotting a numerical column with a categorical column (bar,
line) ?
C) How are bar plots different from histogram?
ANS) A) #Required libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

#Reading the data

df=pd.read_csv('C:/Users/Naveen/Desktop/Data.csv')

#Plotting
plt.scatter(df.age,df.Salaries)
plt.title('Scatterplot of numerical columns')
plt.xlabel('age')
plt.ylabel('Salaries')
plt.grid()
plt.show()

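B) #Bar graph: mean Salaries for each category of Sex
mean_salary=df.groupby('Sex')['Salaries'].mean()
plt.bar(mean_salary.index,mean_salary.values)
plt.title('Bar graph of mean Salaries by Sex')
plt.xlabel('Sex')
plt.ylabel('Mean Salaries')
plt.xticks(rotation=45)
plt.show()

#Line graph: Salaries (mean with confidence band) against Sex
sns.lineplot(data=df,x='Sex',y='Salaries')
plt.title('Line graph of Salaries by Sex')
plt.xticks(rotation=45)
plt.show()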
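C) A bar chart compares values across categories: each bar stands for one category, the bars
are separated by gaps, and they can be reordered without changing the meaning. A histogram shows
the distribution of one continuous numerical variable: the x-axis is divided into bins, each bar's
height is the number of values in that bin, and the bars are adjacent because the bins cover a
continuous range, so their order is fixed. Both are useful for summarizing large amounts of data.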
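
The difference shows up in how the bar heights are computed. The sketch below uses only the standard library; the helper names and the small lists of values are made up for illustration and are not taken from the assignment data.

```python
from collections import Counter


def bar_chart_heights(categories):
    """Bar chart: one bar per distinct category, height = its count."""
    return dict(Counter(categories))


def histogram_heights(values, bin_edges):
    """Histogram: one bar per interval [edge_i, edge_i+1) of a numeric axis.

    The last bin also includes its right edge.
    """
    counts = [0] * (len(bin_edges) - 1)
    for v in values:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            in_bin = bin_edges[i] <= v < bin_edges[i + 1]
            if in_bin or (is_last and v == bin_edges[-1]):
                counts[i] += 1
                break
    return counts


# Hypothetical data for illustration only.
sexes = ["F", "M", "M", "F", "M"]
ages = [21, 25, 34, 38, 40, 47]

bar_heights = bar_chart_heights(sexes)
hist_heights = histogram_heights(ages, bin_edges=[20, 30, 40, 50])
print(bar_heights)   # counts per category
print(hist_heights)  # counts per age interval
```

The bar chart needs no bin edges because the categories define the bars. The histogram heights change whenever the bin edges change, which is why the choice of bins matters.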

3. Plot multivariate graphs (correlation heatmap, pairplot)

A) Plot for only numerical data ?

B) Plot multivariate graphs for both numerical and categorical columns ?
C) What does it mean when a correlation value says 1? When it is negative? When it is
zero?
ANS) A) #Required libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

#Reading the data

df=pd.read_csv('C:/Users/Naveen/Desktop/Data.csv')
numerical_column=df[['Salaries','age']]
numerical_column_cor=numerical_column.corr()
#Plotting
sns.heatmap(numerical_column_cor,annot=True)
plt.title('Correlation heatmap of numerical columns')
plt.show()

sns.pairplot(numerical_column)
plt.show()

B) #Required libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

#Reading the data

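df=pd.read_csv('C:/Users/Naveen/Desktop/Data.csv')
#Correlation needs numbers, so categorical columns are one-hot encoded first
df_cor=pd.get_dummies(df,dtype=float).corr()
#Plotting
sns.heatmap(df_cor,annot=True)
plt.show()
#Colour the pairplot by the categorical column Sex
sns.pairplot(df,hue='Sex')
plt.show()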
C) A correlation value of 1: This indicates a perfect positive linear correlation. The two
variables move in the same direction, and every point lies exactly on a straight line with
positive slope.
A negative correlation value: This means the two variables tend to move in opposite directions.
When one increases, the other tends to decrease. The closer the value is to -1, the stronger the
relationship; exactly -1 is a perfect negative linear correlation, with every point on a straight
line with negative slope.
A correlation value of 0: This means there is no linear relationship between the two variables.
It does not mean the variables are unrelated: a non-linear relationship (for example, a
U-shaped one) can give a correlation of 0. It is therefore good practice to plot the data as well
as compute the coefficient.
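
The three cases can be checked numerically. This is a minimal sketch of the Pearson correlation coefficient using only the standard library; the function name pearson_r and the small example series are illustrative assumptions.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


x = [-2, -1, 0, 1, 2]

r_pos = pearson_r(x, [2 * v + 1 for v in x])  # straight line, rising
r_neg = pearson_r(x, [-3 * v for v in x])     # straight line, falling
r_zero = pearson_r(x, [v ** 2 for v in x])    # U-shape, clearly related

print(r_pos, r_neg, r_zero)
```

The last case is the important one: y is completely determined by x, yet the coefficient is 0 because the relationship is not linear.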

4. Plot Skewness & Probability distribution for each column of marks data. (Hist, box, density)
A) What is normally distributed and What will be the relationship between mean,
median & mode ?
B) Which data variables are positively skewed and What will be the relationship
between mean, median & mode
C) What are negatively skewed/distributed and What will be the relationship between
mean, median & mode
D) What are the distinctive differences between skewness and distribution?

ANS) #Required libraries

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

#Reading the data
df=pd.read_csv('C:/Users/Naveen/Desktop/education.csv')

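#Skewness of each column (positive = right-skewed, negative = left-skewed)
print(df[['workex','gmat']].skew())

#Histogram
plt.hist(df[['workex','gmat']],label=['workex','gmat'])
plt.legend()
plt.show()

#Boxplot
plt.boxplot(df[['workex','gmat']])
plt.xticks([1,2],['workex','gmat'])
plt.show()

#Density plot
sns.kdeplot(data=df[['workex','gmat']])
plt.show()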

A) The normal distribution is a symmetrical, bell-shaped distribution in
which the mean, median and mode are all equal.
B) In a positively skewed (right-skewed) frequency distribution, the long tail lies to the
right. The mean is typically greater than the median, and the median is typically greater
than the mode (mean > median > mode).
C) In a negatively skewed (left-skewed) frequency distribution, the long tail lies to the
left. The mean is typically less than the median, and the median is typically less than the
mode (mean < median < mode).
D) Skewness is a measure of the asymmetry in the distribution of data. It tells us
whether the data is skewed to the left (negatively skewed), skewed to the right
(positively skewed), or approximately symmetric (no skew).
A distribution, in statistics, describes how data values are spread: the set of values
a variable can take and how often each value occurs. Skewness is therefore a single number
that summarizes one property of a distribution, while the distribution is the full description.
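
The relationships between mean, median and mode can be checked on a small example. The sketch below uses only the standard library; the two lists are made-up values chosen to be clearly skewed, and moment_skewness is an illustrative helper that computes the moment-based skewness coefficient (pandas uses a bias-adjusted variant, so its numbers differ slightly).

```python
import statistics


def moment_skewness(values):
    """Third standardized moment: > 0 for a right tail, < 0 for a left tail."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population standard deviation
    return sum((v - mean) ** 3 for v in values) / (n * sd ** 3)


def summary(values):
    """Mean, median, mode and skewness of a sample."""
    return {
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "skewness": moment_skewness(values),
    }


# Hypothetical samples (not the marks data).
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 12]
left_skewed = [13 - v for v in right_skewed]  # mirror image of the first

right = summary(right_skewed)
left = summary(left_skewed)
print(right)
print(left)
```

In the right-skewed sample the few large values pull the mean above the median, and the median sits above the mode. Mirroring the data reverses both the sign of the skewness and the ordering of the three measures.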
