DAY6 Pandas Seaborn

The document provides an overview of data analysis and visualization using Python, focusing on the NumPy and Pandas libraries. It explains the importance of NumPy for numerical computing, detailing its features like ndarray, array operations, and data types, as well as the advantages of using NumPy arrays over Python lists. Additionally, it introduces Pandas as a high-level library for data manipulation, describing its data structures, such as Series and DataFrame, and their functionalities.

Uploaded by

goldivijaybala51

Data Analysis and Visualisation with Python
Numerical Python (NumPy)
• NumPy is the most foundational package for numerical computing in Python.
• If you are going to work on data analysis or machine learning projects, then
having a solid understanding of NumPy is nearly mandatory.
• Indeed, many other libraries, such as pandas and scikit-learn, use NumPy’s
array objects as the lingua franca for data exchange.
• One reason NumPy is so important for numerical computation is that
it is designed for efficiency with large arrays of data.
The reasons for this include:
- It stores data internally in a contiguous block of memory,
independent of other built-in Python objects.
- It performs complex computations on entire arrays without the
need for for loops.
What you’ll find in NumPy
• ndarray: an efficient multidimensional array providing fast array-orientated
arithmetic operations and flexible broadcasting capabilities.
• Mathematical functions for fast operations on entire arrays of data without
having to write loops.
• Tools for reading/writing array data to disk and working with memory-
mapped files.
• Linear algebra, random number generation, and Fourier transform
capabilities.
• A C API for connecting NumPy with libraries written in C, C++, and
FORTRAN. This is one reason Python is a popular language for wrapping
legacy codebases.
The NumPy ndarray: A multi-dimensional array object
• The NumPy ndarray object is a fast and flexible container for large
data sets in Python.
• NumPy arrays are a bit like Python lists, but are still a very different
beast at the same time.
• Arrays enable you to store multiple items of the same data type. It is
the facilities around the array object that make NumPy so convenient
for performing math and data manipulations.
Ndarray vs. lists
• By now, you are familiar with Python lists and how incredibly useful
they are.
• So, you may be asking yourself:

“I can store numbers and other objects in a Python list and do all sorts
of computations and manipulations through list comprehensions, for
loops etc. What do I need a NumPy array for?”

• There are very significant advantages to using NumPy arrays over lists.
Creating a NumPy array
• To understand these advantages, let's create an array.
• One of the most common ways to create a NumPy array is to create one
from a list by passing it to the np.array() function.
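A minimal sketch of this step, assuming NumPy is imported under the conventional alias np. The variable names and values are illustrative and are not taken from the original slide.

```python
import numpy as np

# A plain Python list (illustrative values).
numbers = [0, 1, 2, 3, 4]

# Passing the list to np.array() returns a new ndarray.
arr = np.array(numbers)

print(type(arr))
print(arr)
```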
Differences between lists and ndarrays
• The key difference between an array and a list is that arrays are
designed to handle vectorised operations, while Python lists are not.
• That means that, if you apply a function to an array, it is performed on
every item in the array, rather than on the whole array object.
• Suppose you want to add the number 2 to every item in a list.
The intuitive way to do this is to add 2 to the list directly.
• That is not possible with a list (Python raises a TypeError), but you can
do it with an array.
• Note that, once a NumPy array is created, you
cannot increase its size.
• To do so, you will have to create a new array.
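The points above can be sketched as follows. The names and values are illustrative assumptions; np.append() is used here only as one way of building a new, larger array.

```python
import numpy as np

values = [0, 1, 2, 3, 4]
arr = np.array(values)

# Adding a number to a list is not supported and raises TypeError.
try:
    values + 2
    list_add_failed = False
except TypeError:
    list_add_failed = True

# With a list, each item has to be handled explicitly.
list_plus_two = [v + 2 for v in values]

# With an array, the addition is applied to every item at once.
arr_plus_two = arr + 2

# An existing array cannot grow; np.append() returns a new, larger array.
bigger = np.append(arr, [5, 6])

print(list_add_failed, list_plus_two, arr_plus_two, arr.size, bigger.size)
```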
Create a 2D array from a list of lists
• You can pass a list of lists to create a matrix-like 2D array.
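A short sketch of building a 2D array from a list of lists. The values are illustrative.

```python
import numpy as np

# Each inner list becomes one row of the 2D array.
rows = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
arr2d = np.array(rows)

print(arr2d)
print(arr2d.shape)  # (number of rows, number of columns)
```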
The dtype argument
• You can specify the data type by setting the dtype argument.
• Some of the most commonly used NumPy dtypes are: float, int,
bool, str, and object.
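As a sketch, the dtype argument can be passed when the array is created. The input values are illustrative.

```python
import numpy as np

rows = [[0, 1, 2], [3, 4, 5]]

# Store the integers as floating-point numbers.
arr_float = np.array(rows, dtype="float")

# Any non-zero number becomes True, zero becomes False.
arr_bool = np.array([1, 0, 10], dtype="bool")

print(arr_float)
print(arr_bool)
```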
The astype method
• You can also convert an array to a different data type using the astype()
method.
• Remember that, unlike lists, all items in an array have to be of the same
type.
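A minimal sketch of astype(), using illustrative values. Note that astype() returns a new array and leaves the original unchanged.

```python
import numpy as np

arr_float = np.array([1.5, 2.7, 3.0])

# Converting to int drops the fractional part.
arr_int = arr_float.astype("int")

# Converting to str turns each item into text.
arr_str = arr_int.astype("str")

print(arr_int)
print(arr_str)
```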
dtype='object'
• However, if you are uncertain about what data type your array will
hold, or if you want to hold characters and numbers in the same array,
you can set the dtype as 'object'.
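A sketch of an object array holding mixed items. The values are illustrative.

```python
import numpy as np

# With dtype="object" each item keeps its own Python type.
mixed = np.array([1, "a", 2.5], dtype="object")

print(mixed)
print([type(item).__name__ for item in mixed])
```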
The tolist() method
• You can always convert an array into a list using the tolist()
method.
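For example, with illustrative values:

```python
import numpy as np

arr2d = np.array([[1, 2], [3, 4]])

# tolist() returns nested Python lists with plain Python numbers.
as_list = arr2d.tolist()

print(as_list)
```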
Inspecting a NumPy array
• NumPy arrays have a range of built-in attributes that allow you to
inspect different aspects of an array, such as its shape, data type, size,
and number of dimensions.
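A sketch of the most common inspection attributes, on an illustrative 3 x 4 array:

```python
import numpy as np

arr2d = np.array([[1, 2, 3, 4], [3, 4, 5, 6], [5, 6, 7, 8]], dtype="float")

print("shape:", arr2d.shape)  # rows and columns
print("dtype:", arr2d.dtype)  # data type of the items
print("size:", arr2d.size)    # total number of items
print("ndim:", arr2d.ndim)    # number of dimensions
```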
Extracting specific items from an array
• You can extract portions of the array using indices, much like when
you’re working with lists.
• Unlike lists, however, arrays can optionally accept as many parameters
in the square brackets as there are dimensions.
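A sketch of indexing a 2D array with one index per dimension. The array is illustrative.

```python
import numpy as np

arr2d = np.array([[1, 2, 3, 4], [3, 4, 5, 6], [5, 6, 7, 8]])

# First two rows and first two columns.
first_two = arr2d[:2, :2]

# A single item: row 1, column 2 (both counted from 0).
single = arr2d[1, 2]

print(first_two)
print(single)
```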
Boolean indexing
• A boolean index array has the same shape as the array to be filtered,
but it contains only True and False values.
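A sketch of boolean indexing on an illustrative array. The threshold of 4 is an arbitrary choice.

```python
import numpy as np

arr2d = np.array([[1, 2, 3, 4], [3, 4, 5, 6], [5, 6, 7, 8]])

# The comparison yields a boolean array of the same shape.
mask = arr2d > 4

# Indexing with the mask keeps only the items where it is True.
selected = arr2d[mask]

print(mask)
print(selected)
```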
Pandas
• Pandas, like NumPy, is one of the most popular Python libraries for
data analysis.
• It is a high-level abstraction over the lower-level NumPy, which is
largely written in C.
• Pandas provides high-performance, easy-to-use data structures and
data analysis tools.
• There are two main data structures used by pandas: data frames and
series.
Indices in a pandas series
• A pandas series is similar to a list, but differs in the fact that a series associates a label
with each element. This makes it look like a dictionary.
• If an index is not explicitly provided by the user, pandas creates a RangeIndex ranging
from 0 to N-1.
• Each series object also has a data type.
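A sketch of a series created without an explicit index, assuming pandas is imported as pd. The values are illustrative.

```python
import pandas as pd

s = pd.Series([1, 3, 5, 7])

print(s.index)   # a RangeIndex from 0 to N-1
print(s.dtype)   # the data type of the series
print(s.values)  # all values as a NumPy array
print(s[0])      # the element with label 0
```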

• As you may suspect by this point, a series has ways to extract all of
the values in the series, as well as individual elements by index.


• You can also provide an index manually.

• It is easy to retrieve several elements of a series by their indices or
make group assignments.
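A sketch of a series with a manual index, retrieving several elements and assigning to a group of labels. The labels and values are illustrative.

```python
import pandas as pd

# Provide the index labels explicitly.
s = pd.Series([1, 3, 5, 7], index=["a", "b", "c", "d"])

# Retrieve several elements by passing a list of labels.
subset = s[["a", "c"]]

# Group assignment: set several elements at once.
s[["a", "b"]] = 0

print(subset)
print(s)
```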

Filtering and maths operations
• Filtering and maths operations are easy with Pandas as well.
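A sketch of filtering and arithmetic on a series. The condition and values are illustrative.

```python
import pandas as pd

s = pd.Series([1, 3, 5, 7], index=["a", "b", "c", "d"])

# Keep only the elements greater than 2.
filtered = s[s > 2]

# Arithmetic is applied to every element.
doubled = s * 2

print(filtered)
print(doubled)
```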

Data Handling using Pandas - 1
Data Structures in Pandas
The two important data structures of pandas are Series and DataFrame.
1. Series
A Series is a one-dimensional, array-like structure with homogeneous data.
For example, a series may be a collection of integers.

Basic features of a Series are:

• Homogeneous data
• Size Immutable
• Values of Data Mutable
2. DataFrame
A DataFrame is a two-dimensional, array-like structure with heterogeneous data.
SR. No.  Admn No  Student Name            Class  Section  Gender  Date Of Birth
1        001284   NIDHI MANDAL            I      A        Girl    07/08/2010
2        001285   SOUMYADIP BHATTACHARYA  I      A        Boy     24/02/2011
3        001286   SHREYAANG SHANDILYA     I      A        Boy     29/12/2010
Basic features of a DataFrame are:
• Heterogeneous data
• Size Mutable
• Data Mutable
Pandas data frame
• Simplistically, a data frame is a table, with rows and columns.
• Each column in a data frame is a series object.
• Rows consist of elements inside series.

Case ID Variable one Variable two Variable 3

1 123 ABC 10
2 456 DEF 20
3 789 XYZ 30
Creating a Pandas data frame
• Pandas data frames can be constructed using Python dictionaries.
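A sketch of building a data frame from a dictionary, reusing the small example table shown earlier. Each dictionary key becomes a column name and each list becomes a column.

```python
import pandas as pd

data = {
    "Case ID": [1, 2, 3],
    "Variable one": [123, 456, 789],
    "Variable two": ["ABC", "DEF", "XYZ"],
    "Variable 3": [10, 20, 30],
}
df = pd.DataFrame(data)

# Selecting a single column returns a Series.
column = df["Variable one"]

print(df)
print(type(column))
```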
• You can also create a data frame from a list.

• You can ascertain the type of a column with the type() function.

• A Pandas data frame object has two indices: a column index and a row
index.
• Again, if you do not provide one, Pandas will create a RangeIndex
from 0 to N-1.
• There are numerous ways to provide row indices explicitly.
• For example, you could provide an index when creating a data frame,
• or set one afterwards on an existing data frame.
• An index can also be given a name, such as ‘country code’.
• Row access using the index can be performed in several ways.
• First, you could use .loc[] and provide an index label.
• Second, you could use .iloc[] and provide an index number.
• Particular rows and columns can also be selected this way.
• You can pass .loc[] two arguments, a list of index labels and a list of
column names; slicing is supported as well.
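A sketch of row and column access with .loc[] and .iloc[]. The row labels c1, c2 and c3 are hypothetical and chosen only for illustration; the column values come from the small example table shown earlier.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Variable one": [123, 456, 789],
        "Variable two": ["ABC", "DEF", "XYZ"],
        "Variable 3": [10, 20, 30],
    },
    index=["c1", "c2", "c3"],  # hypothetical row labels
)

# One row by label, and the same row by integer position.
row_by_label = df.loc["c2"]
row_by_position = df.iloc[1]

# A list of row labels and a list of column names.
block = df.loc[["c1", "c3"], ["Variable one", "Variable 3"]]

# Label slices include both end points.
sliced = df.loc["c1":"c2", "Variable one":"Variable two"]

print(row_by_label)
print(block)
print(sliced)
```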
Pandas Series
A Series is like a one-dimensional array capable of holding data
of any type (integer, string, float, Python objects, etc.).
A Series can be created using the constructor.
Syntax: pandas.Series(data, index, dtype, copy)
A Series can be created from:
1. Array (ndarray)
2. Dict
3. Scalar value or constant
Create an Empty Series

e.g.

import pandas as pseries

s = pseries.Series()
print(s)

Output
Series([], dtype: float64)
(In pandas 2.0 and later, the dtype shown for an empty Series is object.)
Create a Series from ndarray
Without index
e.g.
import pandas as pd1
import numpy as np1
data = np1.array(['a','b','c','d'])
s = pd1.Series(data)
print(s)
Output
0 a
1 b
2 c
3 d
With index position
e.g.
import pandas as p1
import numpy as np1
data = np1.array(['a','b','c','d'])
s = p1.Series(data, index=[100,101,102,103])
print(s)
Output
100 a
101 b
102 c
103 d
Create a Series from dict
Eg.1 (without index)
import pandas as pd1
import numpy as np1
data = {'a' : 0., 'b' : 1., 'c' : 2.}
s = pd1.Series(data)
print(s)
Output
a 0.0
b 1.0
c 2.0
dtype: float64
Eg.2 (with index)
import pandas as pd1
import numpy as np1
data = {'a' : 0., 'b' : 1., 'c' : 2.}
s = pd1.Series(data, index=['b','c','d','a'])
print(s)
Output
b 1.0
c 2.0
d NaN
a 0.0
dtype: float64
Create a Series from Scalar
e.g.
import pandas as pd1
import numpy as np1
s = pd1.Series(5, index=[0, 1, 2, 3])
print(s)
Output
0 5
1 5
2 5
3 5
dtype: int64
Maths operations with Series
e.g.
import pandas as pd1
s = pd1.Series([1,2,3])
t = pd1.Series([1,2,4])
u = s + t # addition operation
print(u)
u = s * t # multiplication operation
print(u)
Output
0 2
1 4
2 7
dtype: int64
0 1
1 4
2 12
dtype: int64
Head function
e.g.

import pandas as pd1
s = pd1.Series([1,2,3,4,5], index = ['a','b','c','d','e'])
print(s.head(3))

Output
a 1
b 2
c 3
dtype: int64
tail function
e.g.

import pandas as pd1
s = pd1.Series([1,2,3,4,5], index = ['a','b','c','d','e'])
print(s.tail(3))

Output
c 3
d 4
e 5
dtype: int64
Accessing Data from Series with indexing and slicing
e.g.
import pandas as pd1
s = pd1.Series([1,2,3,4,5], index = ['a','b','c','d','e'])
print(s.iloc[0]) # for 0 index position (s[0] is deprecated for a non-integer index in recent pandas)
print(s[:3]) # for first 3 index values
print(s[-3:]) # slicing for last 3 index values
Output
1
a 1
b 2
c 3
dtype: int64
c 3
d 4
e 5
dtype: int64
Retrieve Data Using Label (as Index)
e.g.

import pandas as pd1

s = pd1.Series([1,2,3,4,5], index = ['a','b','c','d','e'])
print(s[['c','d']])

Output
c 3
d 4
dtype: int64
Retrieve Data from selection
There are three methods for data selection:
• loc gets rows (or columns) with particular labels from the index.
• iloc gets rows (or columns) at particular positions in the index (so it
only takes integers).
• ix usually tries to behave like loc but falls back to behaving like iloc
if a label is not present in the index.
ix was deprecated in pandas 0.20 and removed in pandas 1.0; use loc and
iloc instead.
Retrieve Data from selection
e.g.
>>> s = pd.Series(np.nan, index=[49,48,47,46,45, 1, 2, 3, 4, 5])
>>> s.iloc[:3] # slice the first three rows
49 NaN
48 NaN
47 NaN
>>> s.loc[:3] # slice up to and including label 3
49 NaN
48 NaN
47 NaN
46 NaN
45 NaN
1 NaN
2 NaN
3 NaN
>>> s.ix[:3] # the integer is in the index so s.ix[:3] works like loc (pandas versions before 1.0 only)
49 NaN
48 NaN
47 NaN
46 NaN
45 NaN
1 NaN
2 NaN
3 NaN
Pandas DataFrame
It is a two-dimensional data structure, just like any table
(with rows & columns).
Basic Features of DataFrame
• Columns may be of different types
• Size can be changed (mutable)
• Labeled axes (rows / columns)
• Arithmetic operations on rows and columns
It can be created using the constructor:

pandas.DataFrame(data, index, columns, dtype, copy)
Create DataFrame
It can be created from the following:
• Lists
• dict
• Series
• NumPy ndarrays
• Another DataFrame
Create an Empty DataFrame
e.g.
import pandas as pd1
df1 = pd1.DataFrame()
print(df1)
Output
Empty DataFrame
Columns: []
Index: []
Create a DataFrame from Lists
e.g.1
import pandas as pd1
data1 = [1,2,3,4,5]
df1 = pd1.DataFrame(data1)
print(df1)
Output
   0
0  1
1  2
2  3
3  4
4  5
e.g.2
import pandas as pd1
data1 = [['Freya',10],['Mohak',12],['Dwivedi',13]]
df1 = pd1.DataFrame(data1, columns=['Name','Age'])
print(df1)
Output
      Name  Age
0    Freya   10
1    Mohak   12
2  Dwivedi   13
To store the numeric values as float, convert the column after creating the
DataFrame (recent pandas versions raise an error when dtype=float is passed
to the constructor for data that contains strings):
df1['Age'] = df1['Age'].astype(float)
Create a DataFrame from Dict of ndarrays / Lists
e.g.1
import pandas as pd1
data1 = {'Name':['Freya', 'Mohak'],'Age':[9,10]}
df1 = pd1.DataFrame(data1)
print(df1)
Output
    Name  Age
0  Freya    9
1  Mohak   10
Write below as the 3rd statement in the above program for indexing:
df1 = pd1.DataFrame(data1,
Create a DataFrame from List of Dicts
e.g.1
import pandas as pd1
data1 = [{'x': 1, 'y': 2},{'x': 5, 'y': 4, 'z': 5}]
df1 = pd1.DataFrame(data1)
print(df1)
Output
   x  y    z
0  1  2  NaN
1  5  4  5.0

Write below as the 3rd statement in the above program for indexing:

df1 = pd1.DataFrame(data1, index=['first', 'second'])
Create a DataFrame from Dict of Series
e.g.1
import pandas as pd1
d1 = {'one' : pd1.Series([1, 2, 3], index=['a', 'b', 'c']),
'two' : pd1.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df1 = pd1.DataFrame(d1)
print(df1)
Output
   one  two
a  1.0    1
b  2.0    2
c  3.0    3
d  NaN    4
Column selection:
print(df1['one'])
Adding a new column by passing a Series:
df1['three'] = pd1.Series([10,20,30], index=['a','b','c'])
Adding a new column using the values of existing columns:
df1['four'] = df1['one'] + df1['three']
Create a DataFrame from .txt file
Given a text file './input/dists.txt' containing:
1 1 12.92
1 2 90.75
1 3 60.90
2 1 71.34
Pandas is shipped with built-in reader functions. For example,
pandas.read_table is a good way to read a tabular data file (also in
chunks).
import pandas
# sep=r'\s+' splits on any whitespace (delim_whitespace=True is deprecated in recent pandas)
df = pandas.read_table('./input/dists.txt', sep=r'\s+',
names=('A', 'B', 'C'))
This creates a DataFrame object with columns A and B of type int64 and
column C of type float64.
Create a DataFrame from a CSV (comma-separated values) file / import
data from a CSV file
e.g.
Suppose the file filename.csv contains the following data:
Date,"price","factor_1","factor_2"
2012-06-11,1600.20,1.255,1.548
2012-06-12,1610.02,1.258,1.554
import pandas as pd
# Read data from file 'filename.csv'
# (in the same directory as your Python program)
# Control delimiters, rows, column names with read_csv
data = pd.read_csv("filename.csv")
# Preview the first 1 line of the loaded data
print(data.head(1))
Column addition
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
c = [7,8,9]
df['C'] = c

Column Deletion
del df1['one'] # Deleting the first column using the del statement
df1.pop('two') # Deleting another column using the pop() method
Rename columns
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
Row Selection, Addition, and Deletion
#Selection by Label
import pandas as pd1
d1 = {'one' : pd1.Series([1, 2, 3], index=['a', 'b', 'c']),
'two' : pd1.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df1 = pd1.DataFrame(d1)
print(df1.loc['b'])
Output
one 2.0
two 2.0
Name: b, dtype: float64
#Selection by integer location
import pandas as pd1
d1 = {'one' : pd1.Series([1, 2, 3], index=['a', 'b', 'c']),
'two' : pd1.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df1 = pd1.DataFrame(d1)
print(df1.iloc[2])

Output
one 3.0
two 3.0
Name: c, dtype: float64
Addition of Rows
import pandas as pd1

df1 = pd1.DataFrame([[1, 2], [3, 4]], columns = ['a','b'])

df2 = pd1.DataFrame([[5, 6], [7, 8]], columns = ['a','b'])

# DataFrame.append() was removed in pandas 2.0; concat() does the same job
df1 = pd1.concat([df1, df2])
print(df1)

Deletion of Rows
# Drop rows with label 0
df1 = df1.drop(0)
Iterate over rows in a dataframe
e.g.
import pandas as pd1
import numpy as np1
raw_data1 = {'name': ['freya', 'mohak'], 'age': [10
Head & Tail
head() returns the first n rows (observe the index values). The default number of
rows to display is five, but you may pass a custom number. tail() returns the
last n rows. e.g.
import pandas as pd
import numpy as np
# Create a dictionary of Series
d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack']),
'Age':pd.Series([25,26,25,23,30,29,23]),
'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}
# Create a DataFrame
df = pd.DataFrame(d)
print("Our data frame is:")
print(df)
print("The first two rows of the data frame are:")
print(df.head(2))
Indexing a DataFrame using .loc[ ] :
This indexer selects data by the labels of the rows and columns.
#import the pandas library and aliasing as pd
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(8, 4),
index = ['a','b','c','d','e','f','g','h'], columns = ['A',
Export Pandas DataFrame to a CSV File
e.g.
import pandas as pd

cars = {'Brand': ['Honda Civic','Toyota Corolla','Ford Focus','Audi A4'],
'Price': [22000,25000,27000,35000]}

df = pd.DataFrame(cars, columns=['Brand', 'Price'])

df.to_csv(r'C:\export_dataframe.csv', index=False, header=True)

print(df)
Filtering
• Filtering is performed using so-called Boolean arrays.
Deleting columns
• You can delete a column using the drop() function.
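A sketch of both operations on the small example table shown earlier. The filter condition is an arbitrary illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Variable one": [123, 456, 789],
        "Variable two": ["ABC", "DEF", "XYZ"],
        "Variable 3": [10, 20, 30],
    }
)

# Filtering: a boolean Series selects the rows where it is True.
mask = df["Variable 3"] > 10
filtered = df[mask]

# Deleting a column: drop() returns a new data frame by default.
without_col = df.drop(columns=["Variable two"])

print(filtered)
print(without_col)
```

Because drop() returns a new data frame, the original df still contains the dropped column unless the result is assigned back to it.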
Reading from and writing to a file
• Pandas supports many popular file formats including CSV, XML, HTML,
Excel, SQL, JSON, etc.
• Out of all of these, CSV is the file format that you will work with the most.
• You can read in the data from a CSV file using the read_csv() function.

• Similarly, you can write a data frame to a csv file with the to_csv()
function.
• Pandas has the capacity to do much more than what we have
covered here, such as grouping data and even data visualisation.
• However, as with NumPy, we don’t have enough time to cover every
aspect of pandas here.
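The CSV round trip with to_csv() and read_csv() can be sketched as follows. To keep the example self-contained, it writes to an in-memory io.StringIO buffer instead of a file on disk; with a real file you would pass a path to both functions. The data values are a subset of the cars example above.

```python
import io

import pandas as pd

df = pd.DataFrame(
    {"Brand": ["Honda Civic", "Toyota Corolla"], "Price": [22000, 25000]}
)

# Write the data frame as CSV text, without the row index.
buffer = io.StringIO()
df.to_csv(buffer, index=False)

# Rewind the buffer and read the CSV text back into a new data frame.
buffer.seek(0)
loaded = pd.read_csv(buffer)

print(loaded)
```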
Seaborn
• Matplotlib is a powerful, but sometimes unwieldy, Python library.
• Seaborn provides a high-level interface to Matplotlib and makes it
easier to produce graphs like the one on the right.
• Some IDEs incorporate elements of this “under the hood” nowadays.
Benefits of Seaborn
• Seaborn offers:
- Default themes that are aesthetically pleasing.
- Custom colour palettes.
- Attractive statistical plots.
- Easy and flexible display of distributions.
- Visualisation of information from matrices and DataFrames.
• The last three points have led to Seaborn becoming the exploratory
data analysis tool of choice for many Python users.
Plotting with Seaborn
• One of Seaborn's greatest strengths is its diversity of plotting
functions.
• Most plots can be created with one line of code.
• For example….
Histograms
• Allow you to plot the distributions of numeric variables.
Other types of graphs: Creating a scatter plot
- Seaborn “linear model plot” function for creating a scatter graph
- Name of variable we want on the x-axis
- Name of variable we want on the y-axis
- Name of our dataframe, fed to the “data=” argument
• Older versions of Seaborn did not have a dedicated scatter plot function
(scatterplot() was added in version 0.9).
• We used Seaborn's function for fitting and plotting a regression line;
hence lmplot()
• However, Seaborn makes it easy to alter plots.
• To remove the regression line, we use the fit_reg=False argument
The hue argument
• Another useful feature of Seaborn is the hue argument, which
enables us to use a variable to colour code our data points.
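A minimal sketch of a scatter plot with lmplot(), with the regression line removed and the points coloured by a variable. The small data frame is made up for illustration; the column names Attack, Defense and Stage are assumptions about the dataset used in the slides. The Agg backend is selected only so that the sketch runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen so no display is needed

import pandas as pd
import seaborn as sns

# Illustrative values only, not the real dataset.
df = pd.DataFrame(
    {
        "Attack": [49, 62, 82, 52, 64, 84],
        "Defense": [49, 63, 83, 43, 58, 78],
        "Stage": [1, 2, 3, 1, 2, 3],
    }
)

# fit_reg=False removes the regression line; hue colours points by Stage.
grid = sns.lmplot(x="Attack", y="Defense", data=df, fit_reg=False, hue="Stage")
```

In an interactive session, calling matplotlib.pyplot.show() afterwards would display the figure.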
Factor plots
• Make it easy to separate plots by categorical classes. (In Seaborn 0.9
and later, factorplot() is named catplot().)

Colour by stage.
Separate by stage.
Generate using a swarmplot.
Rotate axis on x-ticks by 45 degrees.
A box plot
• The total, stage, and legendary entries are not combat stats, so we should
remove them.
• Pandas makes this easy: we use the .drop() method to create a new
dataframe that doesn’t include the variables we don’t want.
Seaborn’s theme
• Seaborn has a number of themes you can use to alter the appearance
of plots.
• For example, we can use “whitegrid” to add grid lines to our boxplot.
Violin plots
• Violin plots are useful alternatives to box plots.
• They show the distribution of a variable through the thickness of the violin.
• Here, we visualise the distribution of attack by Pokémon's primary type:
• Dragon types tend to have higher Attack stats than Ghost types, but they also have greater
variance. But there is something not right here….
• The colours!
Seaborn’s colour palettes
• Seaborn allows us to easily set custom colour palettes by providing it
with an ordered list of colour hex values.
• We first create our colours list.
• Then we pass our colours list to the palette= argument.
• Because of the limited number of observations, we could also use a
swarm plot.
• Here, each data point is an observation, but data points are grouped
together by the variable listed on the x-axis.
Overlapping plots
• Both of these show similar information, so it might be useful to
overlap them.
Set size of print canvas.

Remove bars from inside the violins

Make bars black and slightly transparent

Give the graph a title

Data wrangling with Pandas
• What if we wanted to create such a plot that included all of the other
stats as well?
• In our current dataframe, all of the variables are in different columns:
• If we want to visualise all stats, then we’ll have to “melt” the
dataframe.
We use the .drop() method again to re-create the dataframe without these
three variables.
The dataframe we want to melt.
The variables to keep; all others will be melted.
A name for the new, melted, variable.

• All 6 of the stat columns have been "melted" into one, and
the new Stat column indicates the original stat (HP, Attack,
Defense, Sp. Attack, Sp. Defense, or Speed).
• It's hard to see here, but each Pokémon now has 6 rows of
data; hence melted_df has 6 times as many rows as the original dataframe.
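A sketch of the melt step with pd.melt(). The names stats_df, melted_df and Stat follow the slides; the Name and Type 1 columns and the two rows of values are assumptions made for illustration.

```python
import pandas as pd

# Two illustrative rows; the real dataset has many more.
stats_df = pd.DataFrame(
    {
        "Name": ["Bulbasaur", "Charmander"],
        "Type 1": ["Grass", "Fire"],
        "HP": [45, 39],
        "Attack": [49, 52],
        "Defense": [49, 43],
        "Sp. Attack": [65, 60],
        "Sp. Defense": [65, 50],
        "Speed": [45, 65],
    }
)

# Keep Name and Type 1; melt every other column into Stat/value pairs.
melted_df = pd.melt(stats_df, id_vars=["Name", "Type 1"], var_name="Stat")

print(stats_df.shape)
print(melted_df.shape)
print(melted_df.head())
```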
• This graph could be made to look nicer with a few tweaks.

Enlarge the plot.

Separate points by hue.

Use our special Pokemon colour palette.
Adjust the y-axis.
Move the legend box outside of the graph and place it to the right.
Plotting all data: Empirical cumulative distribution functions (ECDFs)
• An alternative way of visualising the distribution of a variable in a
large dataset is to use an ECDF.
• Here we have an ECDF that shows the distribution of attack strengths
of Pokémon.
• The x-value of an ECDF is the quantity you are measuring, i.e. attack
strength.
• The y-value is the fraction of data points that have a value less than or
equal to the corresponding x-value. For example:
75% of Pokémon have an attack level of 90 or less.
20% of Pokémon have an attack level of 50 or less.
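A sketch of how the x- and y-values of an ECDF can be computed with NumPy. The helper name ecdf and the attack values are assumptions made for illustration, not the workshop's own code or data.

```python
import numpy as np


def ecdf(data):
    """Return the sorted values and, for each, the fraction of points <= it."""
    x = np.sort(np.asarray(data))
    y = np.arange(1, len(x) + 1) / len(x)
    return x, y


# Illustrative attack values only.
attack = [49, 62, 82, 52, 64, 84, 48, 63, 83, 30]
x, y = ecdf(attack)

# Fraction of values that are 62 or less, computed directly as a check.
fraction_at_most_62 = np.mean(np.asarray(attack) <= 62)

print(x)
print(y)
print(fraction_at_most_62)
```

Plotting y against x as points, for example with matplotlib, gives the ECDF curve described above.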
Plotting an ECDF
• You can also plot multiple ECDFs
on the same plot.
• As an example, here we have an
ECDF for Pokémon attack, speed,
and defence levels.
• We can see here that defence
levels tend to be a little less than
the other two.
The usefulness of ECDFs
• It is often quite useful to plot the ECDF first as part of your workflow.
• It shows all the data and gives a complete picture as to how the data
are distributed.
Heatmaps
• Useful for visualising matrix-like data.
• Here, we’ll plot the correlation of the stats_df variables
Bar plot
• Visualises the distributions of categorical variables.

Rotates the x-ticks 45 degrees

Joint Distribution Plot
• Joint distribution plots combine information from scatter plots and
histograms to give you detailed information for bi-variate distributions.
Any questions?
