DS Lecture 3b Graphical Displays

The document discusses various graphical displays used in data science, including box plots, histograms, scatter plots, and quantile plots. It explains how to create box plots, detect outliers using the interquartile range (IQR), and provides examples using datasets like the IRIS dataset. Additionally, it highlights insights gained from these visualizations, such as correlations between different features.

Data Science

Graphical Displays
Box Plot, Histogram, Line Chart, Parallel Coordinate Plot and Scatter Plot

Lecture No. 3b

National University of Computer and Emerging Sciences, Lahore
Graphic Displays of Basic Statistical Descriptions

• Boxplot: graphic display of the five-number summary
• Histogram: the x-axis shows values, the y-axis represents frequencies
• Quantile plot: each value x_i is paired with f_i, indicating that approximately 100·f_i % of the data are ≤ x_i
• Quantile-quantile (q-q) plot: graphs the quantiles of one univariate distribution against the corresponding quantiles of another
• Scatter plot: each pair of values is treated as a pair of coordinates and plotted as a point in the plane
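
As a rough illustration of the quantile plot pairing, the sketch below sorts a sample and assigns each x_i a fraction f_i. The slide does not say how f_i is computed, so the formula f_i = (i - 0.5) / n is an assumption (it is one common convention). The function name and the sample values are made up for illustration.

```python
def quantile_plot_points(values):
    """Pair each sorted value x_i with f_i = (i - 0.5) / n (assumed convention)."""
    xs = sorted(values)
    n = len(xs)
    # i runs from 1 to n, so f_i lies strictly between 0 and 1
    return [((i - 0.5) / n, x) for i, x in enumerate(xs, start=1)]

sample = [12, 6, 4, 9, 8]  # hypothetical data
points = quantile_plot_points(sample)
for f, x in points:
    print(f"f = {f:.2f}  x = {x}")
```

Plotting f on the horizontal axis against x on the vertical axis would give the quantile plot for this sample.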
Box and Whisker Plot (aka Boxplot)
Boxplot is helpful in a lot of ways! It can be used to find outliers!
What is an outlier? An extreme value that is at an abnormal distance from other values.
What distance is abnormal?

[Figure: box plot labelled MIN, Q1, Median, Q3, MAX, with the IQR marked]
Outlier Detection with IQR

[Figure: box plot labelled MIN, Q1, Median, Q3, MAX, with the IQR marked]

Useful Article:
https://www.thoughtco.com/what-is-the-interquartile-range-rule-3126244
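
One widely used answer to "what distance is abnormal?" is the 1.5 × IQR rule: a value is flagged as an outlier if it lies more than 1.5 × IQR below Q1 or above Q3. The sketch below is a minimal illustration, not the lecture's own code. It assumes quartiles are taken as the medians of the lower and upper halves of the sorted data (other conventions can give slightly different quartiles and fences), and the function names are hypothetical. The sample is the one listed later in these slides under "Box Plot Example".

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q3) as medians of the lower and upper halves (assumed convention)."""
    xs = sorted(values)
    half = len(xs) // 2          # needs at least two values
    lower = xs[:half]            # values below the median position
    upper = xs[-half:]           # values above the median position
    return median(lower), median(upper)

def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    return [x for x in values if x < low_fence or x > high_fence]

data = [18, 34, 76, 29, 15, 41, 46, 25, 54, 38, 20, 32, 43, 22]
q1, q3 = quartiles(data)
print("Q1 =", q1, "Q3 =", q3, "IQR =", q3 - q1)
print("outliers:", iqr_outliers(data))
```

With this convention Q1 = 22 and Q3 = 43, so the fences sit at -9.5 and 74.5 and only the value 76 is flagged.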
Boxplots with Outliers

Drawing a Box Plot.
Example 1: Draw a Box plot for the data below

4, 4, 5, 6, 8, 8, 8, 9, 9, 9, 10, 12

Lower Quartile (Q1) = 5½
Median (Q2) = 8
Upper Quartile (Q3) = 9

[Figure: box plot drawn on an axis from 4 to 12]
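
The values in Example 1 can be checked with a short script. This is only a sketch: it assumes the quartiles are the medians of the lower and upper halves of the sorted data, which is the convention the worked example appears to follow, and the function name is hypothetical.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using medians of the two halves."""
    xs = sorted(values)
    half = len(xs) // 2          # the middle value is left out when the count is odd
    q1 = median(xs[:half])
    q3 = median(xs[-half:])
    return xs[0], q1, median(xs), q3, xs[-1]

example_1 = [4, 4, 5, 6, 8, 8, 8, 9, 9, 9, 10, 12]
print(five_number_summary(example_1))
```

For Example 1 this gives a minimum of 4, Q1 = 5.5, median = 8, Q3 = 9 and a maximum of 12, which are the five positions needed to draw the box and whiskers.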
Drawing a Box Plot.
Example 2: Draw a Box plot for the data below

3, 4, 4, 6, 8, 8, 8, 9, 10, 10, 15

Lower Quartile (Q1) = 4
Median (Q2) = 8
Upper Quartile (Q3) = 10

[Figure: box plot drawn on an axis from 3 to 15]
Drawing a Box Plot.
Question: Stuart recorded the heights in cm of boys in his class as shown below. Draw a box plot for this data.

137, 148, 155, 158, 165, 166, 166, 171, 171, 173, 175, 180, 184, 186, 186

Lower Quartile (QL) = 158
Median (Q2) = 171
Upper Quartile (QU) = 180

[Figure: box plot drawn on an axis from 130 cm to 190 cm]


Drawing a Box Plot.
Question: Gemma recorded the heights in cm of girls in the same class and constructed a box plot from the data. The box plots for both boys and girls are shown below. Use the box plots to choose some correct statements comparing heights of boys and girls in the class. Justify your answers.

[Figure: box plots for Boys and Girls drawn on a shared axis from 130 cm to 190 cm]

1. The girls are taller on average.
2. The boys are taller on average.
3. The girls show less variability in height.
4. The boys show less variability in height.
5. The smallest person is a girl.
6. The tallest person is a boy.
Example
Boxplot for All Attributes of IRIS dataset
Insights from Boxplots of IRIS Dataset
• Setosa has smaller feature values and less spread
• Versicolor has average feature values and a moderate spread
• Virginica is widely spread, with large feature values
• Each plot clearly shows the central (median) value for each feature (sepal length and width, petal length and width)
Box Plot Example
18, 34, 76, 29, 15, 41, 46, 25, 54, 38, 20, 32, 43, 22
worksheet

Finding the median, quartiles and inter-quartile range.
Example 1: Find the median and quartiles for the data below.
12, 6, 4, 9, 8, 4, 9, 8, 5, 9, 8, 10

Example 2: Find the median and quartiles for the data below.
6, 3, 9, 8, 4, 10, 8, 4, 15, 8, 10

Worksheet 1
Scatter plot
• Provides a first look at bivariate data to see clusters of points, outliers, etc.
• Each pair of values is treated as a pair of coordinates and plotted as points in the plane
Positively and Negatively Correlated Data

• The left half fragment is positively correlated
• The right half is negatively correlated
Uncorrelated Data
Scatter Plot
Scatter Plot Matrix
• red = Setosa, blue = Versicolor, green = Virginica
Scatter Plot and Parallel Coordinates
Insights
• High correlation between the petal length and petal width columns.
• Setosa has both low petal length and width.
• Versicolor has both average petal length and width.
• Virginica has both high petal length and width.
• Sepal width for Setosa is high and sepal length is low.
• Versicolor has average values for sepal dimensions.
• Virginica has small sepal width but large sepal length.
What is correlation?
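
One common way to make this concrete is the Pearson correlation coefficient r, which ranges from -1 (perfect negative linear relationship) through 0 (no linear relationship) to +1 (perfect positive linear relationship). The slides do not name a specific measure, so using Pearson's r here is an assumption, and the function name and data below are invented for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

x = [1, 2, 3, 4, 5]            # hypothetical data
rising = [2, 4, 6, 8, 10]      # grows with x
falling = [10, 8, 6, 4, 2]     # shrinks as x grows
print(round(pearson_r(x, rising), 3))
print(round(pearson_r(x, falling), 3))
```

A value near +1 corresponds to the positively correlated scatter plot shown earlier, a value near -1 to the negatively correlated one, and a value near 0 to uncorrelated data.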
Boxplot Analysis

• Five-number summary of a distribution
  • Minimum, Q1, Median, Q3, Maximum
• Boxplot
  • Data is represented with a box
  • The ends of the box are at the first and third quartiles, i.e., the height of the box is IQR
  • The median is marked by a line within the box
  • Whiskers: two lines outside the box extended to Minimum and Maximum
