
DAY6 Pandas Seaborn

The document provides an overview of data analysis and visualization using Python, focusing on the NumPy and Pandas libraries. It explains the importance of NumPy for numerical computing, detailing its features like ndarray, array operations, and data types, as well as the advantages of using NumPy arrays over Python lists. Additionally, it introduces Pandas as a high-level library for data manipulation, describing its data structures, such as Series and DataFrame, and their functionalities.

Uploaded by

goldivijaybala51
Copyright
© © All Rights Reserved

Data Analysis and Visualisation with Python
Numerical Python (NumPy)
• NumPy is the most foundational package for numerical computing in Python.
• If you are going to work on data analysis or machine learning projects, then
having a solid understanding of NumPy is nearly mandatory.
• Indeed, many other libraries, such as pandas and scikit-learn, use NumPy’s
array objects as the lingua franca for data exchange.
• One reason NumPy is so important for numerical computations is that it is designed for efficiency with large arrays of data. The reasons for this include:
- It stores data internally in a contiguous block of memory, independent of other built-in Python objects.
- It performs complex computations on entire arrays without the need for for loops.
What you’ll find in NumPy
• ndarray: an efficient multidimensional array providing fast array-orientated
arithmetic operations and flexible broadcasting capabilities.
• Mathematical functions for fast operations on entire arrays of data without
having to write loops.
• Tools for reading/writing array data to disk and working with memory-
mapped files.
• Linear algebra, random number generation, and Fourier transform
capabilities.
• A C API for connecting NumPy with libraries written in C, C++, and FORTRAN. This is one reason why Python is a popular language for wrapping legacy codebases.
The NumPy ndarray: A multi-dimensional array object
• The NumPy ndarray object is a fast and flexible container for large
data sets in Python.
• NumPy arrays are a bit like Python lists, but are still a very different
beast at the same time.
• Arrays enable you to store multiple items of the same data type. It is the facilities around the array object that make NumPy so convenient for performing maths and data manipulations.
Ndarray vs. lists
• By now, you are familiar with Python lists and how incredibly useful
they are.
• So, you may be asking yourself:

“I can store numbers and other objects in a Python list and do all sorts
of computations and manipulations through list comprehensions, for-
loops etc. What do I need a NumPy array for?”

• There are very significant advantages of using NumPy arrays over lists.
Creating a NumPy array
• To understand these advantages, let's create an array.
• One of the most common ways to create a NumPy array is to build one from a list by passing it to the np.array() function.
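As a minimal sketch of creating an array from a list (the variable names and values here are illustrative assumptions, not taken from the slides):

```python
import numpy as np

# A plain Python list (illustrative values)
values = [0, 1, 2, 3, 4]

# Passing the list to np.array() creates a one-dimensional ndarray
arr = np.array(values)

print(type(arr))
print(arr)
```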
Differences between lists and ndarrays
• The key difference between an array and a list is that arrays are designed to handle vectorised operations, while Python lists are not.
• That means that, if you apply a function, it is performed on every item in the array, rather than on the whole array object.
• Let's suppose you want to add the number 2 to every item in a list. The intuitive way to do this is to add 2 to the list directly, but Python raises a TypeError if you try.

• That was not possible with a list, but you can do that on an array:

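The contrast described above can be sketched as follows. The list contents are an assumption chosen for illustration:

```python
import numpy as np

numbers = [0, 1, 2, 3, 4]  # illustrative values

# Adding an integer to a list is not supported and raises a TypeError
try:
    numbers + 2
except TypeError as err:
    print("list + 2 failed:", err)

# With a list you need a loop or a list comprehension
list_plus_two = [n + 2 for n in numbers]

# With an array the addition is applied to every element
arr = np.array(numbers)
arr_plus_two = arr + 2
print(arr_plus_two)
```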
• Note that, once a NumPy array is created, you cannot increase its size.
• To do so, you will have to create a new array.
Create a 2D array from a list of lists
• You can pass a list of lists to create a matrix-like 2D array.
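A short sketch of building a 2D array from a list of lists, using made-up values:

```python
import numpy as np

# Each inner list becomes one row of the 2D array (illustrative values)
rows = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
arr2d = np.array(rows)

print(arr2d)
print(arr2d.shape)  # (number of rows, number of columns)
```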
The dtype argument
• You can specify the data type by setting the dtype argument.
• Some of the most commonly used NumPy dtypes are: float, int, bool, str, and object.
The astype method
• You can also convert an array to a different data type using the astype() method.

• Remember that, unlike lists, all items in an array have to be of the same
type.
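The following sketch illustrates both the dtype argument and the astype() method. The array contents are assumptions chosen for illustration:

```python
import numpy as np

# Set the data type when the array is created
arr_float = np.array([[0, 1, 2], [3, 4, 5]], dtype=float)
print(arr_float.dtype)

# astype() returns a new array with the requested type
arr_int = arr_float.astype(int)
arr_str = arr_float.astype(str)
print(arr_int.dtype, arr_str.dtype)
```

Note that astype() does not change the original array; it returns a converted copy.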
dtype=‘object’
• However, if you are uncertain about what data type your array will
hold, or if you want to hold characters and numbers in the same array,
you can set the dtype as 'object'.

The tolist() method
• You can always convert an array into a list using the tolist() method.
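A minimal sketch of an object array and its conversion back to a list, with illustrative values:

```python
import numpy as np

# dtype=object lets one array hold numbers and characters together
mixed = np.array([1, "a", 2.5], dtype=object)
print(mixed.dtype)

# tolist() converts the array back into a plain Python list
as_list = mixed.tolist()
print(as_list)
```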
Inspecting a NumPy array
• NumPy arrays have a range of built-in attributes that allow you to inspect different aspects of an array, such as its shape, number of dimensions, size, and data type.
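As a sketch, the attributes below are the ones most commonly used for inspection. The example array is an assumption:

```python
import numpy as np

arr2d = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]], dtype=float)

print("shape:", arr2d.shape)  # rows and columns
print("ndim:", arr2d.ndim)    # number of dimensions
print("size:", arr2d.size)    # total number of items
print("dtype:", arr2d.dtype)  # data type of the items
```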
Extracting specific items from an array
• You can extract portions of an array using indices, much like when you're working with lists.
• Unlike lists, however, arrays accept one index or slice per dimension inside the square brackets, separated by commas.
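A short sketch of indexing along two dimensions at once, using an illustrative array:

```python
import numpy as np

arr2d = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

# First two rows and first two columns
top_left = arr2d[:2, :2]
print(top_left)

# Single item: row 1, column 2 (both zero-based)
item = arr2d[1, 2]
print(item)
```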
Boolean indexing
• A boolean index array has the same shape as the array to be filtered, but it contains only True and False values.
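Boolean indexing can be sketched as follows. The threshold of 6 and the array values are arbitrary choices for illustration:

```python
import numpy as np

arr2d = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

# The comparison produces a boolean array with the same shape
mask = arr2d > 6
print(mask)

# Indexing with the mask keeps only the items where the mask is True
selected = arr2d[mask]
print(selected)
```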
Pandas
• Pandas, like NumPy, is one of the most popular Python libraries for data analysis.
• It is a high-level abstraction over the lower-level NumPy, whose core is implemented largely in C.
• Pandas provides high-performance, easy-to-use data structures and data analysis tools.
• There are two main structures used by pandas: data frames and series.
Indices in a pandas series
• A pandas series is similar to a list, but differs in the fact that a series associates a label
with each element. This makes it look like a dictionary.
• If an index is not explicitly provided by the user, pandas creates a RangeIndex ranging
from 0 to N-1.
• Each series object also has a data type.

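A minimal sketch of a series created without an explicit index, using made-up values:

```python
import pandas as pd

# No index is given, so pandas creates a RangeIndex from 0 to N-1
s = pd.Series([10, 20, 30, 40])

print(s)
print(s.index)
print(s.dtype)
```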
• As you may suspect by this point, a series has ways to extract all of
the values in the series, as well as individual elements by index.


• You can also provide an index manually.


• It is easy to retrieve several elements of a series by their indices or
make group assignments.

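The points above about values, manual indices, and group assignment can be sketched like this. The labels and numbers are assumptions for illustration:

```python
import pandas as pd

# Provide the index manually
s = pd.Series([10, 20, 30, 40], index=["a", "b", "c", "d"])

print(s.values)          # all values as a NumPy array
print(s["b"])            # a single element by its label
print(s[["a", "c"]])     # several elements by their labels

# Group assignment: set several elements at once
s[["a", "c"]] = 0
print(s)
```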
Filtering and maths operations
• Filtering and maths operations are easy with Pandas as well.

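A small sketch of filtering and arithmetic on a series, with arbitrary values and an arbitrary threshold:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5])

# Filtering: keep only the elements greater than 2
filtered = s[s > 2]
print(filtered)

# Maths operations are applied element by element
doubled = s * 2
print(doubled)
```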
Data Handling using Pandas - 1
Data Structures in Pandas
Two important data structures of pandas are Series and DataFrame.
1. Series
A Series is a one-dimensional, array-like structure with homogeneous data.
For example, a Series can hold a collection of integers.
Basic features of a Series are:
• Homogeneous data
• Size immutable
• Values of data mutable
2. DataFrame
A DataFrame is a two-dimensional, array-like structure with heterogeneous data.

Sr. No.  Admn No.  Student Name            Class  Section  Gender  Date of Birth
1        001284    NIDHI MANDAL            I      A        Girl    07/08/2010
2        001285    SOUMYADIP BHATTACHARYA  I      A        Boy     24/02/2011
3        001286    SHREYAANG SHANDILYA     I      A        Boy     29/12/2010

Basic features of a DataFrame are:
• Heterogeneous data
• Size mutable
• Data mutable
Pandas data frame
• Simplistically, a data frame is a table, with rows and columns.
• Each column in a data frame is a series object.
• Rows consist of elements inside series.

Case ID  Variable one  Variable two  Variable three
1        123           ABC           10
2        456           DEF           20
3        789           XYZ           30
Creating a Pandas data frame
• Pandas data frames can be constructed using Python dictionaries.
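As a minimal sketch of building a data frame from a dictionary: each key becomes a column name and each value becomes that column's data. The column names and all values below are invented placeholders, not the data used in the slides:

```python
import pandas as pd

# Placeholder data: keys become column names
df = pd.DataFrame({
    "country": ["Alpha", "Beta", "Gamma"],
    "population": [1.0, 2.0, 3.0],
    "area": [100, 200, 300],
})

print(df)

# Each column of the data frame is a Series
print(type(df["population"]))
```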
• You can also create a data frame from a list.

• You can check the type of a column with the type() function; each column is a Series.
• A Pandas data frame object has two indices: a column index and a row index.
• Again, if you do not provide one, Pandas will create a RangeIndex from 0 to N-1.
• There are numerous ways to provide row indices explicitly.
• For example, you could provide an index when creating a data frame, or set one later at runtime.
• The index can also be given a name, such as 'country code'.
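Both ways of providing a row index can be sketched as below. The data and the two-letter codes are invented placeholders; only the index name 'country code' comes from the slides:

```python
import pandas as pd

data = {
    "country": ["Alpha", "Beta", "Gamma"],
    "population": [1.0, 2.0, 3.0],
}

# Option 1: provide the index when creating the data frame
df_a = pd.DataFrame(data, index=["AL", "BE", "GA"])

# Option 2: assign the index afterwards and give it a name
df_b = pd.DataFrame(data)
df_b.index = ["AL", "BE", "GA"]
df_b.index.name = "country code"

print(df_b)
```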
• Row access using the index can be performed in several ways.
• First, you could use .loc[] and provide an index label.
• Second, you could use .iloc[] and provide an index number.
• Particular rows and columns can be selected in the same way.
• You can give .loc[] two arguments, an index list and a column list; slicing is supported as well.
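A short sketch of label-based and position-based access. The data frame is an invented placeholder; note that .loc and .iloc use square brackets because they are indexers rather than functions:

```python
import pandas as pd

df = pd.DataFrame(
    {"population": [1.0, 2.0, 3.0], "area": [100, 200, 300]},
    index=["AL", "BE", "GA"],
)

row_by_label = df.loc["BE"]        # row with index label 'BE'
row_by_position = df.iloc[1]       # second row, by position

# Index list and column name together
subset = df.loc[["AL", "GA"], "population"]

# Label slices with .loc include both end points
sliced = df.loc["AL":"BE", :]

print(row_by_label, row_by_position, subset, sliced, sep="\n\n")
```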
Pandas Series
A Series is like a one-dimensional array capable of holding data of any type (integer, string, float, Python objects, etc.).
A Series can be created using the constructor.
Syntax: pandas.Series(data, index, dtype, copy)
A Series can be created from:
1. Array (ndarray)
2. Dict
3. Scalar value or constant
Pandas Series
Create an Empty Series
e.g.
import pandas as pseries
# Pass dtype explicitly; recent pandas versions default an empty Series to object
s = pseries.Series(dtype='float64')
print(s)

Output
Series([], dtype: float64)
Pandas Series
Create a Series from ndarray

Without index
e.g.
import pandas as pd1
import numpy as np1
data = np1.array(['a','b','c','d'])
s = pd1.Series(data)
print(s)

Output
0    a
1    b
2    c
3    d
dtype: object

With index position
e.g.
import pandas as p1
import numpy as np1
data = np1.array(['a','b','c','d'])
s = p1.Series(data, index=[100,101,102,103])
print(s)

Output
100    a
101    b
102    c
103    d
dtype: object
Pandas Series
Create a Series from dict

Eg.1 (without index)
import pandas as pd1
import numpy as np1
data = {'a' : 0., 'b' : 1., 'c' : 2.}
s = pd1.Series(data)
print(s)

Output
a    0.0
b    1.0
c    2.0
dtype: float64

Eg.2 (with index)
import pandas as pd1
import numpy as np1
data = {'a' : 0., 'b' : 1., 'c' : 2.}
s = pd1.Series(data, index=['b','c','d','a'])
print(s)

Output
b    1.0
c    2.0
d    NaN
a    0.0
dtype: float64
Create a Series from Scalar
e.g.
import pandas as pd1
import numpy as np1
s = pd1.Series(5, index=[0, 1, 2, 3])
print(s)
Output
0    5
1    5
2    5
3    5
dtype: int64
Pandas Series
Maths operations with Series
e.g.
import pandas as pd1
s = pd1.Series([1,2,3])
t = pd1.Series([1,2,4])
u = s + t   # addition operation
print(u)
u = s * t   # multiplication operation
print(u)

Output (addition)
0    2
1    4
2    7
dtype: int64

Output (multiplication)
0     1
1     4
2    12
dtype: int64
Pandas Series
Head function
e.g.
import pandas as pd1
s = pd1.Series([1,2,3,4,5], index=['a','b','c','d','e'])
print(s.head(3))

Output
a    1
b    2
c    3
dtype: int64
Pandas Series
tail function
e.g.
import pandas as pd1
s = pd1.Series([1,2,3,4,5], index=['a','b','c','d','e'])
print(s.tail(3))

Output
c    3
d    4
e    5
dtype: int64
Visit : python.mykvs.in for regular updates
Accessing Data from Series with indexing and slicing
e.g.
import pandas as pd1
s = pd1.Series([1,2,3,4,5], index=['a','b','c','d','e'])
print(s.iloc[0])   # item at position 0 (s[0] on a labelled index is deprecated in recent pandas)
print(s[:3])       # first 3 items
print(s[-3:])      # last 3 items
Output
1
a    1
b    2
c    3
dtype: int64
c    3
d    4
e    5
dtype: int64
Pandas Series
Retrieve Data Using Label as (Index)
e.g.
import pandas as pd1
s = pd1.Series([1,2,3,4,5], index=['a','b','c','d','e'])
print(s[['c','d']])

Output
c    3
d    4
dtype: int64
Pandas Series
Retrieve Data from selection
There are three methods for data selection:
• loc gets rows (or columns) with particular labels from the index.
• iloc gets rows (or columns) at particular positions in the index (so it only takes integers).
• ix usually tries to behave like loc but falls back to behaving like iloc if a label is not present in the index.
ix was deprecated in pandas 0.20 and removed in pandas 1.0; use loc and iloc instead.
Pandas Series
Retrieve Data from selection
e.g.
>>> s = pd.Series(np.nan, index=[49,48,47,46,45, 1, 2, 3, 4, 5])
>>> s.iloc[:3]   # slice the first three rows
49   NaN
48   NaN
47   NaN
>>> s.loc[:3]    # slice up to and including label 3
49   NaN
48   NaN
47   NaN
46   NaN
45   NaN
1    NaN
2    NaN
3    NaN
>>> s.ix[:3]     # the integer is in the index so s.ix[:3] works like loc (pandas before 1.0 only)
49   NaN
48   NaN
47   NaN
46   NaN
45   NaN
1    NaN
2    NaN
3    NaN
Pandas DataFrame
It is a two-dimensional data structure, just like any table (with rows and columns).
Basic Features of DataFrame
• Columns may be of different types
• Size can be changed (mutable)
• Labeled axes (rows / columns)
• Arithmetic operations on rows and columns
It can be created using the constructor:
pandas.DataFrame(data, index, columns, dtype, copy)
Pandas DataFrame
Create DataFrame
It can be created from the following:
• Lists
• dict
• Series
• NumPy ndarrays
• Another DataFrame
Create an Empty DataFrame
e.g.
import pandas as pd1
df1 = pd1.DataFrame()
print(df1)
Output
Empty DataFrame
Columns: []
Index: []
Pandas DataFrame
Create a DataFrame from Lists
e.g.1
import pandas as pd1
data1 = [1,2,3,4,5]
df1 = pd1.DataFrame(data1)
print(df1)
Output
   0
0  1
1  2
2  3
3  4
4  5
e.g.2
import pandas as pd1
data1 = [['Freya',10],['Mohak',12],['Dwivedi',13]]
df1 = pd1.DataFrame(data1, columns=['Name','Age'])
print(df1)
Output
      Name  Age
0    Freya   10
1    Mohak   12
2  Dwivedi   13
To store the numeric values as float, convert the numeric column after creation (passing dtype=float to the constructor fails on the string column in recent pandas versions):
df1['Age'] = df1['Age'].astype(float)
Pandas DataFrame
Create a DataFrame from Dict of ndarrays / Lists
e.g.1
import pandas as pd1
data1 = {'Name':['Freya', 'Mohak'],'Age':[9,10]}
df1 = pd1.DataFrame(data1)
print(df1)
Output
    Name  Age
0  Freya    9
1  Mohak   10
Write below as 3rd statement in above prog
for indexing
df1 = pd1.DataFrame(data1,
Pandas DataFrame
Create a DataFrame from List of Dicts
e.g.1
import pandas as pd1
data1 = [{'x': 1, 'y': 2},{'x': 5, 'y': 4, 'z': 5}]
df1 = pd1.DataFrame(data1)
print(df1)
Output
   x  y    z
0  1  2  NaN
1  5  4  5.0
Write below as 3rd statement in above program for indexing
df1 = pd1.DataFrame(data1, index=['first', 'second'])
Pandas DataFrame
Create a DataFrame from Dict of Series
e.g.1
import pandas as pd1
d1 = {'one' : pd1.Series([1, 2, 3], index=['a', 'b', 'c']),
      'two' : pd1.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df1 = pd1.DataFrame(d1)
print(df1)
Output
   one  two
a  1.0    1
b  2.0    2
c  3.0    3
d  NaN    4
Column Selection -> print(df1['one'])
Adding a new column by passing a Series ->
df1['three'] = pd1.Series([10,20,30], index=['a','b','c'])
Adding a new column using the existing columns' values ->
df1['four'] = df1['one'] + df1['three']
Create a DataFrame from .txt file
Having a text file './inputs/dist.txt' as:
1 1 12.92
1 2 90.75
1 3 60.90
2 1 71.34
Pandas ships with built-in reader functions. For example, pandas.read_table is a good way to read a tabular data file (optionally in chunks).
import pandas
# sep=r'\s+' splits on whitespace; it replaces delim_whitespace=True, which is deprecated in recent pandas
df = pandas.read_table('./inputs/dist.txt', sep=r'\s+', names=('A', 'B', 'C'))
This creates a DataFrame object with a column named A of type int64, B of int64, and C of float64.
Create a DataFrame from a csv (comma separated value) file / import data from a csv file
e.g.
Suppose filename.csv contains the following data:
Date,"price","factor_1","factor_2"
2012-06-11,1600.20,1.255,1.548
2012-06-12,1610.02,1.258,1.554
import pandas as pd
# Read data from file 'filename.csv'
# (in the same directory that your Python program is based)
# Control delimiters, rows, column names with read_csv
data = pd.read_csv("filename.csv")
# Preview the first 1 line of the loaded data
print(data.head(1))
Pandas DataFrame
Column addition
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
c = [7,8,9]
df['C'] = c

Column Deletion
del df1['one']    # Deleting the first column using the del statement
df1.pop('two')    # Deleting another column using the pop() method
Rename columns
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
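The renaming step itself is not shown in the slide. One common way to rename columns is the rename() method, sketched below; the new names 'a' and 'b' are hypothetical choices for illustration:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

# rename() maps old column names to new ones and returns a new DataFrame
# (the target names 'a' and 'b' are hypothetical)
renamed = df.rename(columns={"A": "a", "B": "b"})

print(renamed.columns.tolist())
```

The original data frame keeps its column names unless the result is assigned back to df.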
Pandas DataFrame
Row Selection, Addition, and Deletion
# Selection by Label
import pandas as pd1
d1 = {'one' : pd1.Series([1, 2, 3], index=['a', 'b', 'c']),
      'two' : pd1.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df1 = pd1.DataFrame(d1)
print(df1.loc['b'])
Output
one    2.0
two    2.0
Name: b, dtype: float64
Pandas DataFrame
# Selection by integer location
import pandas as pd1
d1 = {'one' : pd1.Series([1, 2, 3], index=['a', 'b', 'c']),
      'two' : pd1.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df1 = pd1.DataFrame(d1)
print(df1.iloc[2])

Output
one    3.0
two    3.0
Name: c, dtype: float64
Pandas DataFrame
Addition of Rows
import pandas as pd1
df1 = pd1.DataFrame([[1, 2], [3, 4]], columns = ['a','b'])
df2 = pd1.DataFrame([[5, 6], [7, 8]], columns = ['a','b'])
# DataFrame.append() was removed in pandas 2.0; use concat() instead
df1 = pd1.concat([df1, df2])
print(df1)

Deletion of Rows
# Drop rows with label 0 (this removes every row whose label is 0)
df1 = df1.drop(0)
Pandas DataFrame
Iterate over rows in a dataframe
e.g.
import pandas as pd1
import numpy as np1
raw_data1 = {'name': ['freya', 'mohak'],
'age': [10
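Since the slide's example is cut off, here is a separate minimal sketch of row iteration using iterrows(). The second age value and the variable names are assumptions, not recovered from the slide:

```python
import pandas as pd

# Illustrative data (the second age value is an assumption)
df = pd.DataFrame({"name": ["freya", "mohak"], "age": [10, 12]})

rows_seen = []
# iterrows() yields (index label, row as a Series) pairs
for index, row in df.iterrows():
    print(index, row["name"], row["age"])
    rows_seen.append((index, row["name"], row["age"]))
```

Row iteration is convenient for small tables; for large ones, vectorised column operations are usually faster.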
Pandas DataFrame
Head & Tail
head() returns the first n rows (observe the index values). The default number of rows to display is five, but you may pass a custom number. tail() returns the last n rows.
e.g.
import pandas as pd
import numpy as np
# Create a dictionary of Series
d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack']),
     'Age':pd.Series([25,26,25,23,30,29,23]),
     'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}
# Create a DataFrame
df = pd.DataFrame(d)
print("Our data frame is:")
print(df)
print("The first two rows of the data frame are:")
print(df.head(2))
Pandas DataFrame
Indexing a DataFrame using .loc[ ] :
The .loc[ ] indexer selects data by the labels of the rows and columns.
# import the pandas library and alias it as pd
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(8, 4),
                  index = ['a','b','c','d','e','f','g','h'], columns = ['A',
Export Pandas DataFrame to a CSV File
e.g.
import pandas as pd

cars = {'Brand': ['Honda Civic','Toyota Corolla','Ford Focus','Audi A4'],
        'Price': [22000,25000,27000,35000]
        }

df = pd.DataFrame(cars, columns=['Brand', 'Price'])

df.to_csv(r'C:\export_dataframe.csv', index=False, header=True)

print(df)
Filtering
• Filtering is performed using so-called Boolean arrays.
Deleting columns
• You can delete a column using the drop() method.
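Filtering rows and deleting a column can be sketched as follows, using an invented placeholder data frame and an arbitrary threshold:

```python
import pandas as pd

df = pd.DataFrame({
    "country": ["Alpha", "Beta", "Gamma"],
    "population": [1.0, 2.0, 3.0],
    "area": [100, 200, 300],
})

# The comparison gives a Boolean Series; indexing with it keeps the True rows
large = df[df["population"] > 1.5]
print(large)

# drop() returns a new data frame without the named column
without_area = df.drop(columns=["area"])
print(without_area)
```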
Reading from and writing to a file
• Pandas supports many popular file formats including CSV, XML, HTML,
Excel, SQL, JSON, etc.
• Out of all of these, CSV is the file format that you will work with the most.
• You can read in the data from a CSV file using the read_csv() function.

• Similarly, you can write a data frame to a csv file with the to_csv()
function.
• Pandas has the capacity to do much more than what we have
covered here, such as grouping data and even data visualisation.
• However, as with NumPy, we don’t have enough time to cover every
aspect of pandas here.
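As a small sketch of the read_csv() and to_csv() pair mentioned above, the example below writes a data frame to CSV text and reads it back. It uses an in-memory buffer instead of a real file, which is an assumption made so the example runs anywhere:

```python
import io
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

# Write the data frame as CSV text (index=False leaves out the row labels)
buffer = io.StringIO()
df.to_csv(buffer, index=False)

# Rewind the buffer and read the CSV text back into a new data frame
buffer.seek(0)
df_back = pd.read_csv(buffer)
print(df_back)
```

With a real file, a path such as a CSV filename would be passed in place of the buffer.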
Seaborn
• Matplotlib is a powerful, but sometimes unwieldy, Python library.
• Seaborn provides a high-level interface to Matplotlib and makes it
easier to produce graphs like the one on the right.
• Some IDEs incorporate elements of this “under the hood” nowadays.
Benefits of Seaborn
• Seaborn makes the following easy:
- Using default themes that are aesthetically pleasing.
- Setting custom colour palettes.
- Making attractive statistical plots.
- Easily and flexibly displaying distributions.
- Visualising information from matrices and DataFrames.
• The last three points have led to Seaborn becoming the exploratory data analysis tool of choice for many Python users.
Plotting with Seaborn
• One of Seaborn's greatest strengths is its diversity of plotting
functions.
• Most plots can be created with one line of code.
• For example….
Histograms
• Allow you to plot the distributions of numeric variables.
Other types of graphs: Creating a scatter plot
• The parts of the plotting call are:
- Seaborn's "linear model plot" function, used here for creating a scatter graph.
- The name of the variable we want on the x-axis.
- The name of the variable we want on the y-axis.
- The name of our dataframe, fed to the data= argument.
• The example uses Seaborn's function for fitting and plotting a regression line, hence lmplot(). (Seaborn 0.9 and later also provide a dedicated scatterplot() function.)
• Seaborn makes it easy to alter plots.
• To remove the regression line, we pass the fit_reg=False argument.
The hue argument
• Another useful argument in Seaborn is hue, which enables us to use a variable to colour code our data points.
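A minimal sketch of such a scatter plot is shown below. The column names (Attack, Defense, Stage) and the values are assumptions standing in for the dataset used in the slides, and the non-interactive "Agg" backend is chosen only so the example runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; an assumption for headless runs

import pandas as pd
import seaborn as sns

# Small stand-in dataset (illustrative values, assumed column names)
df = pd.DataFrame({
    "Attack": [49, 62, 82, 52, 64, 84],
    "Defense": [49, 63, 83, 43, 58, 78],
    "Stage": [1, 2, 3, 1, 2, 3],
})

# fit_reg=False removes the regression line; hue colours points by Stage
grid = sns.lmplot(x="Attack", y="Defense", data=df, fit_reg=False, hue="Stage")
```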
Factor plots
• Make it easy to separate plots by categorical classes. (In Seaborn 0.9 and later, factorplot() is named catplot().)
• In the example:
- Colour by stage.
- Separate by stage.
- Generate using a swarmplot.
- Rotate the x-tick labels by 45 degrees.
A box plot
• The total, stage, and legendary entries are not combat stats, so we should remove them.
• Pandas makes this easy: we use its .drop() method to create a new dataframe that doesn't include the variables we don't want.
Seaborn’s theme
• Seaborn has a number of themes you can use to alter the appearance
of plots.
• For example, we can use “whitegrid” to add grid lines to our boxplot.
Violin plots
• Violin plots are useful alternatives to box plots.
• They show the distribution of a variable through the thickness of the violin.
• Here, we visualise the distribution of attack by Pokémon's primary type:
• Dragon types tend to have higher Attack stats than Ghost types, but they also have greater
variance. But there is something not right here….
• The colours!
Seaborn’s colour palettes
• Seaborn allows us to easily set custom colour palettes by providing it with an ordered list of colour hex values.
• We first create our colours list.
• Then we pass the colours list to the palette= argument.
• Because of the limited number of observations, we could also use a swarm plot.
• Here, each data point is an observation, but data points are grouped together by the variable listed on the x-axis.
Overlapping plots
• Both of these show similar information, so it might be useful to overlap them:
- Set the size of the plot canvas.
- Remove the bars from inside the violins.
- Make the bars black and slightly transparent.
- Give the graph a title.

Data wrangling with Pandas
• What if we wanted to create such a plot that included all of the other
stats as well?
• In our current dataframe, all of the variables are in different columns:
• If we want to visualise all stats, then we’ll have to “melt” the
dataframe.
• We use the .drop() method again to re-create the dataframe without these three variables.
• The melt call takes:
- The dataframe we want to melt.
- The variables to keep; all others will be melted.
- A name for the new, melted, variable.
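The melting step can be sketched with pandas alone. The data frame below is a tiny stand-in with assumed column names and made-up values, and only three stat columns are used to keep it short:

```python
import pandas as pd

# Stand-in for the stats dataframe (assumed columns, made-up values)
stats_df = pd.DataFrame({
    "Name": ["A", "B"],
    "Type 1": ["Grass", "Fire"],
    "HP": [45, 39],
    "Attack": [49, 52],
    "Defense": [49, 43],
})

melted_df = pd.melt(
    stats_df,
    id_vars=["Name", "Type 1"],  # variables to keep
    var_name="Stat",             # name for the new, melted variable
)

print(melted_df)
```

Each original row contributes one melted row per stat column, so two rows and three stats give six rows here.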

• All 6 of the stat columns have been "melted" into one, and the new Stat column indicates the original stat (HP, Attack, Defense, Sp. Attack, Sp. Defense, or Speed).
• It's hard to see here, but each Pokemon now has 6 rows of data; hence melted_df has 6 times as many rows.
• This graph could be made to look nicer with a few tweaks:
- Enlarge the plot.
- Separate points by hue.
- Use our special Pokemon colour palette.
- Adjust the y-axis.
- Move the legend box outside of the graph and place it to the right.
Plotting all data: Empirical cumulative distribution functions (ECDFs)
• An alternative way of visualising the distribution of a variable in a large dataset is to use an ECDF.
• Here we have an ECDF that shows the distribution of attack strengths of Pokemon.
• The x-value of an ECDF is the quantity you are measuring, i.e. attack strength.
• The y-value is the fraction of data points that have a value less than or equal to the corresponding x-value. For example:
- 75% of Pokemon have an attack level of 90 or less.
- 20% of Pokemon have an attack level of 50 or less.
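The x and y values of an ECDF can be computed with NumPy alone, as sketched below. The helper name ecdf and the attack values are assumptions for illustration:

```python
import numpy as np

def ecdf(values):
    """Return sorted values (x) and the fraction of points <= each x (y)."""
    x = np.sort(np.asarray(values))
    y = np.arange(1, len(x) + 1) / len(x)
    return x, y

# Illustrative attack values (not the real dataset)
attack = np.array([49, 62, 82, 52, 64, 84, 48, 63, 83, 30])
x, y = ecdf(attack)

# Fraction of observations with an attack of 62 or less
frac_le_62 = np.mean(attack <= 62)
print(frac_le_62)
```

Plotting y against x as points or a step line gives the ECDF curve.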
Plotting an ECDF
• You can also plot multiple ECDFs on the same plot.
• As an example, here we have an ECDF for Pokemon attack, speed, and defence levels.
• We can see here that defence levels tend to be a little less than the other two.
The usefulness of ECDFs
• It is often quite useful to plot the ECDF first as part of your workflow.
• It shows all the data and gives a complete picture as to how the data
are distributed.
Heatmaps
• Useful for visualising matrix-like data.
• Here, we’ll plot the correlation of the stats_df variables
Bar plot
• Visualises the distributions of categorical variables.
• In the example, the x-ticks are rotated by 45 degrees.

Joint Distribution Plot
• Joint distribution plots combine information from scatter plots and
histograms to give you detailed information for bi-variate distributions.
Any questions?
