
20 June BA Class

Python for Researchers and Analysts

Chapter 19. Data Visualization in Pandas

19.1 Introduction

Pandas offers a variety of graphing techniques for visualizing data, built on the plotting capabilities of Matplotlib. Seaborn is a separate
library that also builds on Matplotlib; Pandas does not use it internally. Here are some of the key graphing tools available in Pandas.

19.2 Line Plot

Let's start by creating a sample DataFrame with hypothetical economic data for Pakistan over several years.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Sample Data
data = {
'Year': pd.date_range(start='2010', periods=10, freq='Y'),
'GDP_Growth': np.random.uniform(2, 6, 10),
'Inflation_Rate': np.random.uniform(3, 10, 10),
'Population': np.linspace(180, 220, 10), # in millions
'Exports': np.random.uniform(20, 30, 10) # in billion USD
}
df = pd.DataFrame(data)
df.set_index('Year', inplace=True)
df
            GDP_Growth  Inflation_Rate  Population    Exports
Year
2010-12-31    5.658384        4.546454  180.000000  23.276857
2011-12-31    2.278775        5.113731  184.444444  20.557508
2012-12-31    2.208520        4.196979  188.888889  25.525849
2013-12-31    4.522695        5.451623  193.333333  25.146137
2014-12-31    4.225519        7.077512  197.777778  24.713077
2015-12-31    2.396435        8.213365  202.222222  23.764426
2016-12-31    4.094125        8.115253  206.666667  20.770149
2017-12-31    4.093561        9.806631  211.111111  23.251311
2018-12-31    4.356584        6.302788  215.555556  28.127393
2019-12-31    3.897462        8.207297  220.000000  24.078683

Let's draw a line plot of the GDP_Growth column.

df['GDP_Growth'].plot(title="GDP Growth Rates")


plt.ylabel('Percentage change in GDP')
plt.show()
There are several options available to customize the plot in Pandas and Matplotlib.

19.2.1 Marker Styles

The table below shows the available marker options with their description.

| Marker | Description |
|--------|---------------|
| 'o' | Circle |
| 's' | Square |
| '^' | Triangle Up |
| 'D' | Diamond |
| 'v' | Triangle Down |
| '*' | Star |
| '+' | Plus |
| ',' | Pixel |
| '.' | Point |

df['GDP_Growth'].plot(title="GDP Growth Rates",marker="D")


plt.ylabel('Percentage change in GDP')
plt.show()

19.2.2 Line Styles

The table below shows the available line styles with their description.

| Linestyle | Description |
|--------------|--------------|
| '-' | Solid line |
| '--' | Dashed line |
| '-.' | Dash-dot line |
| ':' | Dotted line |

df['GDP_Growth'].plot(title="GDP Growth Rates",style="-.")


plt.ylabel('Percentage change in GDP')
plt.show()

19.2.3 Colors

The table below shows the single-letter color codes with their description.

| Color | Description |
|------|-----------------|
| 'b' | Blue |
| 'g' | Green |
| 'r' | Red |
| 'c' | Cyan |
| 'm' | Magenta |
| 'y' | Yellow |
| 'k' | Black |
| 'w' | White |

df['GDP_Growth'].plot(title="GDP Growth Rates",color="r")


plt.ylabel('Percentage change in GDP')
plt.show()
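
The marker, line style, and color options from the three tables above can be combined in a single call. The following is a minimal sketch, assuming a small Series of made-up growth figures (the name `gdp` and its values are illustrative and do not come from the DataFrame above):

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical GDP growth figures in percent, for illustration only
gdp = pd.Series([3.5, 4.0, 4.5, 3.8, 4.2],
                index=[2010, 2011, 2012, 2013, 2014],
                name='GDP_Growth')

# Diamond markers, a dash-dot line, and green color in one call
ax = gdp.plot(title='GDP Growth Rates', marker='D', linestyle='-.', color='g')
ax.set_ylabel('Percentage change in GDP')
plt.show()
```

The keyword arguments are passed on to Matplotlib, so any marker, line style, or color code from the tables should work the same way.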

19.2.4 Linewidth


df['GDP_Growth'].plot(title="GDP Growth Rates",
color="r",
linewidth=2)
plt.ylabel('Percentage change in GDP')
plt.show()

19.2.5 Grid


df['GDP_Growth'].plot(title="GDP Growth Rates",
color="r",
linewidth=2,
grid=True)
plt.ylabel('Percentage change in GDP')
plt.show()

19.2.6 Label and Legend

df[['GDP_Growth','Inflation_Rate']].plot(title="GDP Growth and Inflation Rates",
                                         linewidth=2,
                                         grid=True)
plt.ylabel('Percent')
plt.legend(['GDP','Inflation'])  # Default column names are used if no labels are given
plt.show()

19.2.7 Figure Size

df[['GDP_Growth','Inflation_Rate']].plot(title="GDP Growth and Inflation Rates",
                                         linewidth=3,
                                         figsize=(10,6),
                                         grid=True)
plt.ylabel('Percent')
plt.legend(['GDP','Inflation'])  # Default column names are used if no labels are given
plt.show()

19.2.8 Subplots

df[['GDP_Growth', 'Inflation_Rate']].plot(kind='line',
                                          subplots=True,
                                          title='GDP Growth and Inflation')
plt.show()

19.2.9 X-label and Y-label


df['GDP_Growth'].plot(kind='line', title='GDP Growth Over Time')
plt.xlabel('Year')
plt.ylabel('GDP Growth (%)')
plt.show()

19.2.10 Rotation (rot)

df['GDP_Growth'].plot(kind='line', title='GDP Growth Over Time',rot=90)
plt.xlabel('Year')
plt.ylabel('GDP Growth (%)')
plt.show()

19.3 Bar Plot


A bar plot (or bar chart) is a type of graph that represents categorical data with rectangular bars. Each bar's height or length is proportional to
the value it represents. Bar plots are useful for comparing different groups or categories, displaying frequencies, counts, or other measures
such as mean or sum.

19.3.1 Key Characteristics of a Bar Plot


• Categorical Data: Bar plots are typically used for categorical data, where each category is represented by a bar.
• Bars: Bars can be vertical (column chart) or horizontal (bar chart). The length or height of the bar corresponds to the value of the category it
represents.
• Axes: One axis (usually the x-axis for vertical bar plots) represents the categories, while the other axis (usually the y-axis for vertical bar plots)
represents the values.
Example
Let's rebuild the DataFrame with fixed values and create a bar plot of GDP growth for each year.

import pandas as pd
import matplotlib.pyplot as plt

# Sample Data
data = {
'Year': [2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019],
'GDP_Growth': [3.5, 4.0, 4.5, 3.8, 4.2, 5.0, 5.5, 5.8, 5.2, 5.0],
'Inflation_Rate': [4.2, 3.8, 5.0, 4.5, 4.0, 3.5, 3.8, 4.2, 4.0, 3.9]
}
df = pd.DataFrame(data)
df.set_index('Year', inplace=True)

# Bar plot example


df['GDP_Growth'].plot(kind='bar', title='GDP Growth Over Time')
plt.xlabel('Year')
plt.ylabel('GDP Growth (%)')
plt.show()

19.3.2 Width of the bars


# Adjusting width of the bars
df['GDP_Growth'].plot(kind='bar',
width=0.7,
title='Bar Width Example')
plt.xlabel('Year')
plt.ylabel('Percentage')
plt.show()

19.3.3 Stacked Bar Plot


df[['GDP_Growth','Inflation_Rate']].plot(kind='bar',
                                         stacked=True,
                                         title='Bar Stacked Example')
plt.xlabel('Year')
plt.ylabel('Percentage')
plt.show()

19.3.4 Colormap


The following table summarizes the available colormaps in Matplotlib and their color schemes. You can use these colormaps to enhance the
visual representation of your data plots.

| Colormap | Colors |
|----------|--------|
| viridis | Yellow-green-blue |
| plasma | Yellow-orange-purple |
| inferno | Yellow-orange-red-black |
| magma | Yellow-pink-dark purple-black |
| cividis | Blue-yellow |
| Greys | White-grey-black |
| Purples | Light purple-dark purple |
| Blues | Light blue-dark blue |
| Greens | Light green-dark green |
| Oranges | Light orange-dark orange |
| Reds | Light red-dark red |
| YlOrBr | Yellow-orange-brown |
| YlOrRd | Yellow-orange-red |
| OrRd | Orange-red |
| PuRd | Purple-red |
| RdPu | Red-purple |
| BuPu | Blue-purple |
| GnBu | Green-blue |
| PuBu | Purple-blue |
| YlGnBu | Yellow-green-blue |
| PuBuGn | Purple-blue-green |
| BuGn | Blue-green |
| YlGn | Yellow-green |
| PiYG | Pink-yellow-green |
| PRGn | Purple-green |
| BrBG | Brown-green |
| PuOr | Purple-orange |
| RdGy | Red-grey |
| RdBu | Red-blue |
| RdYlBu | Red-yellow-blue |
| RdYlGn | Red-yellow-green |
| Spectral | Red-yellow-green-blue |
| coolwarm | Blue-white-red |
| bwr | Blue-white-red |
| seismic | Blue-white-red |
| twilight | Blue-purple-yellow |
| twilight_shifted | Purple-yellow-blue |
| hsv | Red-yellow-green-cyan-blue-magenta |
| Pastel1 | Pastel colors |
| Pastel2 | Pastel colors |
| Paired | Paired colors |
| Accent | Distinct colors |
| Dark2 | Darker colors |
| Set1 | Distinct colors |
| Set2 | Softer colors |
| Set3 | Pastel colors |
| tab10 | 10 distinct colors |
| tab20 | 20 distinct colors |
| tab20b | 20 colors (five hues, four shades each) |
| tab20c | 20 colors (five hues, four shades each) |
| flag | Red-white-blue |
| prism | Rainbow colors |
| ocean | Blues and cyans |
| gist_earth | Earth tones |
| terrain | Earth tones |
| gist_stern | Blues and purples |
| gnuplot | Purples and blues |
| gnuplot2 | Reds and blues |
| CMRmap | Various colors |
| cubehelix | Black to white with color |
| brg | Blue-red-green |
| gist_rainbow | Rainbow colors |
| rainbow | Rainbow colors |
| jet | Blue-cyan-yellow-red |
| nipy_spectral | Various colors |
| gist_ncar | Various colors |

df.plot(kind="bar", colormap="flag")
plt.show()

19.3.5 Transparency of the bars


df.plot(kind="bar",colormap="flag",alpha=0.4)
plt.show()

19.4 Horizontal Bar Graph

Horizontal bar plots use kind='barh'. The options for barh (horizontal bar plots) in Pandas are generally the same as those for bar (vertical
bar plots); only the orientation changes. We will use the same DataFrame to illustrate this graph.

df.plot(kind="barh", width=0.8, figsize=(6, 10))
plt.show()

19.5 Histogram


A histogram is a graphical representation of the distribution of a dataset. It groups data into bins and displays the frequency of data points in
each bin. This type of plot is useful for understanding the underlying distribution of data and identifying patterns such as skewness, kurtosis,
and the presence of outliers.
To create a histogram in Pandas, you can use the plot method with kind='hist' :

df['column_name'].plot(kind='hist', **options)

Example
Here's an example of a histogram plot:

import pandas as pd
import matplotlib.pyplot as plt

# Sample Data
data = {
'GDP_Growth': [3.5, 4.0, 4.5, 3.8, 4.2, 5.0, 5.5, 5.8, 5.2, 5.0, 4.8]
}
df = pd.DataFrame(data)

# Histogram Plot
df['GDP_Growth'].plot(kind='hist', bins=5, title='GDP Growth Histogram')
plt.xlabel('GDP Growth')
plt.ylabel('Frequency')
plt.show()

19.5.1 Adjusting the Number of Bins

df['GDP_Growth'].plot(kind='hist', bins=8, title='GDP Growth Histogram')
plt.xlabel('GDP Growth')
plt.ylabel('Frequency')
plt.show()

data={
'Name':["A","b","c","d","e"],
'Income':[1000,2000,1500,2200,1800]
}
df1=pd.DataFrame(data)
df1
  Name  Income
0    A    1000
1    b    2000
2    c    1500
3    d    2200
4    e    1800

df1.plot(kind="hist", bins=4, edgecolor="black")
plt.show()

df1.plot(kind="hist", bins=8, edgecolor="black")
plt.show()

19.5.2 Grouping by a Column
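
One way to draw a separate histogram for each group is the `by` argument of `DataFrame.hist`. The following is a minimal sketch, assuming a made-up `Group` column and illustrative income values (neither appears in the DataFrames above):

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical incomes for two groups (illustrative values only)
df_income = pd.DataFrame({
    'Group': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'Income': [1000, 1200, 1500, 1100, 2000, 2200, 1800, 2100],
})

# One histogram panel per distinct value of the 'Group' column
axes = df_income.hist(column='Income', by='Group', bins=4, edgecolor='black')
plt.show()
```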

19.5.3 Adding a Grid

19.5.4 Setting X and Y Axis Limits
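
Axis limits can be passed to the plot method through the `xlim` and `ylim` arguments. A minimal sketch, assuming the GDP growth values that are visible in the histogram example above and limits chosen purely for illustration:

```python
import pandas as pd
import matplotlib.pyplot as plt

# GDP growth values (percent); the limits below are illustrative choices
gdp = pd.Series([3.5, 4.0, 4.5, 3.8, 4.2, 5.0, 5.5, 5.8, 5.2, 5.0, 4.8],
                name='GDP_Growth')

# Restrict the x-axis to 3-6 and the y-axis to 0-5
ax = gdp.plot(kind='hist', bins=5, xlim=(3, 6), ylim=(0, 5),
              title='GDP Growth Histogram with Axis Limits')
ax.set_xlabel('GDP Growth')
plt.show()
```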

19.5.5 Cumulative Histogram
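
A cumulative histogram shows, for each bin, the count of all observations up to and including that bin. A minimal sketch, assuming the same illustrative GDP growth values and relying on Matplotlib's `cumulative` keyword, which Pandas forwards to the underlying histogram call:

```python
import pandas as pd
import matplotlib.pyplot as plt

# GDP growth values (percent), used only for illustration
gdp = pd.Series([3.5, 4.0, 4.5, 3.8, 4.2, 5.0, 5.5, 5.8, 5.2, 5.0, 4.8],
                name='GDP_Growth')

# Each bar accumulates the counts of all bins to its left
ax = gdp.plot(kind='hist', bins=5, cumulative=True,
              title='Cumulative GDP Growth Histogram')
ax.set_xlabel('GDP Growth')
ax.set_ylabel('Cumulative frequency')
plt.show()
```

The last bar reaches the total number of observations, which is a quick check that the plot is cumulative.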

19.5.6 Customizing Colors and Edges
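
Bar fill and outline colors can be set with the `color` and `edgecolor` arguments. A minimal sketch, assuming illustrative GDP growth values and arbitrarily chosen colors:

```python
import pandas as pd
import matplotlib.pyplot as plt

# GDP growth values (percent), used only for illustration
gdp = pd.Series([3.5, 4.0, 4.5, 3.8, 4.2, 5.0, 5.5, 5.8, 5.2, 5.0, 4.8],
                name='GDP_Growth')

# Light blue bars with a black outline
ax = gdp.plot(kind='hist', bins=5, color='skyblue', edgecolor='black',
              title='GDP Growth Histogram')
ax.set_xlabel('GDP Growth')
plt.show()
```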

19.6 Box Plot


A box plot, also known as a box-and-whisker plot, is a graphical representation of the distribution of a dataset that displays the dataset's
minimum, first quartile (Q1), median, third quartile (Q3), and maximum. It helps to identify outliers and the spread of the data.

import pandas as pd
import matplotlib.pyplot as plt
# Sample Data
data = {
'City': ['Karachi', 'Lahore', 'Islamabad', 'Quetta', 'Peshawar'],
'Temperature': [35, 40, 37, 30, 38],
'Humidity': [55, 60, 50, 45, 65]
}
df = pd.DataFrame(data)

# Box Plot
df[['Temperature', 'Humidity']].plot(kind='box', title='Temperature and Humidity')
plt.ylabel('Values')
plt.show()

19.6.1 Grouping by a Column

import pandas as pd
import matplotlib.pyplot as plt

# Creating the dataset: ten values for each of four cities
data = {
    'City': ['Karachi'] * 10 + ['Lahore'] * 10 + ['Islamabad'] * 10 + ['Quetta'] * 10,
    'Values': [23, 45, 56, 78, 67, 34, 89, 54, 23, 77,
               34, 47, 89, 65, 23, 56, 78, 45, 34, 88,
               12, 56, 78, 34, 22, 44, 66, 88, 45, 67,
               22, 44, 66, 88, 23, 56, 78, 45, 67, 34]
}

df = pd.DataFrame(data)
df.head()
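
A grouped box plot draws one box per category. A minimal sketch using `DataFrame.boxplot` with its `by` argument, assuming a small made-up dataset (the column names follow the example above, the values are illustrative):

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical measurements for three cities, four values each
df_city = pd.DataFrame({
    'City': ['Karachi'] * 4 + ['Lahore'] * 4 + ['Islamabad'] * 4,
    'Values': [23, 45, 56, 78, 34, 47, 89, 65, 12, 56, 78, 34],
})

# One box per city, drawn on a shared axis
ax = df_city.boxplot(column='Values', by='City')
plt.ylabel('Values')
plt.show()
```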

19.6.2 Customizing Font Size


import pandas as pd
import matplotlib.pyplot as plt
# Sample Data
data = {
'City': ['Karachi', 'Lahore', 'Islamabad', 'Quetta', 'Peshawar'],
'Temperature': [35, 40, 37, 30, 38],
'Humidity': [55, 60, 50, 45, 65]
}
df = pd.DataFrame(data)


19.6.3 Customizing Colors
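
For box plots, the `color` argument can also be a dictionary keyed by `boxes`, `whiskers`, `medians`, and `caps`. A minimal sketch, assuming the temperature and humidity data from this section and arbitrarily chosen colors:

```python
import pandas as pd
import matplotlib.pyplot as plt

df_weather = pd.DataFrame({
    'Temperature': [35, 40, 37, 30, 38],
    'Humidity': [55, 60, 50, 45, 65],
})

# Colors per box element; the specific color names are illustrative choices
colors = {'boxes': 'DarkGreen', 'whiskers': 'DarkOrange',
          'medians': 'DarkBlue', 'caps': 'Gray'}

ax = df_weather.plot(kind='box', color=colors, title='Custom Box Plot Colors')
ax.set_ylabel('Values')
plt.show()
```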

19.6.4 Filling Boxes with Color

19.6.5 Showing the Mean
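
A box plot shows the median by default; the mean can be added as an extra marker. A minimal sketch, assuming the temperature and humidity data from this section and Matplotlib's `showmeans` keyword, which Pandas forwards to the underlying box plot call:

```python
import pandas as pd
import matplotlib.pyplot as plt

df_weather = pd.DataFrame({
    'Temperature': [35, 40, 37, 30, 38],
    'Humidity': [55, 60, 50, 45, 65],
})

# showmeans=True adds a marker at the mean of each column
ax = df_weather.plot(kind='box', showmeans=True,
                     title='Box Plot with Means')
ax.set_ylabel('Values')
plt.show()
```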

19.6.6 Hiding the Box

19.6.7 Hiding Outliers
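
Outlier points (fliers) can be suppressed so that the plot focuses on the box and whiskers. A minimal sketch, assuming a made-up temperature column with one extreme value and Matplotlib's `showfliers` keyword, which Pandas forwards to the underlying box plot call:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical temperatures; 95 is a deliberately extreme value
df_temp = pd.DataFrame({'Temperature': [30, 32, 35, 37, 38, 40, 95]})

# showfliers=False leaves the outlier out of the drawing
ax = df_temp.plot(kind='box', showfliers=False,
                  title='Box Plot without Outliers')
ax.set_ylabel('Values')
plt.show()
```

The outlier is still part of the data; it is only hidden from the plot, and the y-axis rescales to the whiskers.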

19.6.8 Horizontal Box Plot

19.6.9 Custom Box Positions

19.6.10 Custom Outlier Symbols

19.7 Area Plot


An area plot is a type of plot that displays quantitative data visually using filled areas. It is similar to a line plot but the area between the line and
the axis is filled with color, making it useful for showing trends over time and comparing multiple datasets.

import pandas as pd
import matplotlib.pyplot as plt

# Sample Data
data = {
'Year': [2016, 2017, 2018, 2019, 2020],
'Production_A': [100, 150, 200, 180, 300],
'Production_B': [80, 130, 170, 220, 260]
}
df = pd.DataFrame(data)

# Area Plot
df.set_index('Year').plot(kind='area', title='Production Over Years')
plt.xlabel('Year')
plt.ylabel('Production')
plt.show()

The following table shows a list of commonly used options in an area plot.

| Option | Description | Example Syntax | Notes |
|------|-----------|---------------|-------------|
| stacked | Whether to stack the areas | df.plot(kind='area', stacked=True) | Default is True |
| alpha | Transparency level of the areas | df.plot(kind='area', alpha=0.6) | Values range from 0 (transparent) to 1 (opaque) |
| colormap | Colormap to use for the areas | df.plot(kind='area', colormap='viridis') | Uses the specified colormap |
| fontsize | Font size for the tick labels | df.plot(kind='area', fontsize=12) | Sets the font size for tick labels |
| grid | Whether to show the grid | df.plot(kind='area', grid=True) | Adds a grid to the plot |
| legend | Whether to display the legend | df.plot(kind='area', legend=True) | Shows the legend |
| logx | Logarithmic scale on x-axis | df.plot(kind='area', logx=True) | Changes the x-axis to a logarithmic scale |
| logy | Logarithmic scale on y-axis | df.plot(kind='area', logy=True) | Changes the y-axis to a logarithmic scale |
| xlim | Limits for the x-axis | df.plot(kind='area', xlim=(2016, 2020)) | Sets the limits for the x-axis |
| ylim | Limits for the y-axis | df.plot(kind='area', ylim=(0, 350)) | Sets the limits for the y-axis |
| title | Title of the plot | df.plot(kind='area', title='Area Plot Example') | Adds a title to the plot |
| xlabel | Label for the x-axis | df.plot(kind='area', xlabel='Year') | Sets the label for the x-axis |
| ylabel | Label for the y-axis | df.plot(kind='area', ylabel='Production') | Sets the label for the y-axis |

19.8 Scatter Plot


A scatter plot is a type of plot that displays values for typically two variables for a set of data. The data is displayed as a collection of points,
each having the value of one variable determining the position on the horizontal axis and the value of the other variable determining the position
on the vertical axis. Scatter plots are useful for identifying relationships or correlations between variables.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Generating synthetic data with a strong positive correlation


np.random.seed(0)
temperature = np.random.randint(25, 45, size=15)
humidity = temperature + np.random.randint(-5, 5, size=15)

# Sample Data
data = {
'City': ['Karachi', 'Lahore', 'Islamabad', 'Quetta', 'Peshawar', 'Raw
'Sialkot', 'Sargodha', 'Bahawalpur', 'Sukkur', 'Jhelum'],
'Temperature': temperature,
'Humidity': humidity
}
df = pd.DataFrame(data)
df.head()

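
A scatter plot needs the names of the columns to place on the x-axis and y-axis. A minimal sketch, assuming synthetic temperature and humidity values generated as in the cell above (the `City` column is left out here because the plot does not need it):

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Synthetic, positively correlated data (illustrative only)
np.random.seed(0)
temperature = np.random.randint(25, 45, size=15)
humidity = temperature + np.random.randint(-5, 5, size=15)
df_scatter = pd.DataFrame({'Temperature': temperature, 'Humidity': humidity})

# x and y take column names
ax = df_scatter.plot(kind='scatter', x='Temperature', y='Humidity',
                     title='Temperature vs Humidity')
plt.show()
```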

19.8.1 Specifying Marker Size
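
Marker size is controlled by the `s` argument, which accepts a single number or one value per point. A minimal sketch, assuming a few made-up observations and an arbitrary scaling factor of 3 so that more humid points are drawn larger:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical observations (illustrative values only)
df_scatter = pd.DataFrame({
    'Temperature': [28, 31, 35, 38, 41, 44],
    'Humidity': [25, 33, 34, 40, 39, 47],
})

# One marker size per point, scaled from the Humidity column
ax = df_scatter.plot(kind='scatter', x='Temperature', y='Humidity',
                     s=df_scatter['Humidity'] * 3,
                     title='Marker Size Example')
plt.show()
```

Passing a single number, for example `s=80`, gives every marker the same size.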

19.8.2 Setting Marker Color

19.8.3 Applying a Colormap
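
To color points by the value of a column, pass the column name as `c` and choose a colormap with `colormap`. A minimal sketch, assuming a few made-up observations and the viridis colormap as an arbitrary choice:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical observations (illustrative values only)
df_scatter = pd.DataFrame({
    'Temperature': [28, 31, 35, 38, 41, 44],
    'Humidity': [25, 33, 34, 40, 39, 47],
})

# Points are colored by Humidity; Pandas adds a colorbar automatically
ax = df_scatter.plot(kind='scatter', x='Temperature', y='Humidity',
                     c='Humidity', colormap='viridis',
                     title='Colormap Example')
plt.show()
```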

19.9 Pie Chart


A pie chart is a circular statistical graphic divided into slices to illustrate numerical proportions. Each slice represents a category, and the size
of the slice corresponds to the proportion of that category relative to the whole.

import pandas as pd
import matplotlib.pyplot as plt

# Sample Data
data = {
'City': ['Karachi', 'Lahore', 'Islamabad', 'Quetta', 'Peshawar'],
'Population': [14910352, 11021000, 1014825, 1001205, 1970042]
}
df = pd.DataFrame(data)

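
A pie chart can be drawn from a single numeric column once the category names are used as the index, so that they become the slice labels. A minimal sketch, assuming the city population data above (setting `City` as the index is a choice made here for labeling):

```python
import pandas as pd
import matplotlib.pyplot as plt

df_pop = pd.DataFrame({
    'City': ['Karachi', 'Lahore', 'Islamabad', 'Quetta', 'Peshawar'],
    'Population': [14910352, 11021000, 1014825, 1001205, 1970042],
})

# Index values become the slice labels
ax = df_pop.set_index('City')['Population'].plot(kind='pie',
                                                 title='Population by City')
ax.set_ylabel('')  # hide the default column-name label on the y-axis
plt.show()
```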

19.9.1 Adding Percentage Labels

Refer to the following table for autopct options.

| Syntax | Explanation | Example |
|-------|---------------|-------|
| '%1.1f%%' | Shows values as percentages with one decimal place | df.plot(kind='pie', y='Population', autopct='%1.1f%%') |
| '%1.0f%%' | Shows values as percentages with no decimal place | df.plot(kind='pie', y='Population', autopct='%1.0f%%') |
| '%1.2f%%' | Shows values as percentages with two decimal places | df.plot(kind='pie', y='Population', autopct='%1.2f%%') |
| '%d%%' | Shows the value as an integer percentage | df.plot(kind='pie', y='Population', autopct='%d%%') |
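
The `autopct` formats in the table can be tried on the city population data. A minimal sketch, assuming `City` is set as the index so that the slices are labeled, and using the one-decimal format:

```python
import pandas as pd
import matplotlib.pyplot as plt

df_pop = pd.DataFrame({
    'City': ['Karachi', 'Lahore', 'Islamabad', 'Quetta', 'Peshawar'],
    'Population': [14910352, 11021000, 1014825, 1001205, 1970042],
})

# autopct writes each slice's share of the total inside the slice
ax = df_pop.set_index('City').plot(kind='pie', y='Population',
                                   autopct='%1.1f%%', legend=False,
                                   title='Population Share by City')
ax.set_ylabel('')
plt.show()
```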

19.9.2 Adjusting Percentage Distance

19.9.3 Adding Shadow

19.9.4 Exploding Slices
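
Exploding a slice pulls it away from the center to emphasize it. A minimal sketch, assuming the city population data and Matplotlib's `explode` keyword, which Pandas forwards to the underlying pie call; the offset of 0.1 for the first slice is an arbitrary choice:

```python
import pandas as pd
import matplotlib.pyplot as plt

df_pop = pd.DataFrame({
    'City': ['Karachi', 'Lahore', 'Islamabad', 'Quetta', 'Peshawar'],
    'Population': [14910352, 11021000, 1014825, 1001205, 1970042],
})

# One offset per slice, as a fraction of the radius; only Karachi is moved
explode = [0.1, 0, 0, 0, 0]

ax = df_pop.set_index('City')['Population'].plot(kind='pie', explode=explode,
                                                 title='Exploded Slice Example')
ax.set_ylabel('')
plt.show()
```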

19.9.5 Customizing Colors

19.9.6 Setting Start Angle

19.9.7 Custom Labels for Slices

19.10 Hexbin Plot


A hexbin plot is a two-dimensional histogram plot that represents the counts of observations that fall into hexagonal bins. It is useful for
visualizing the relationship between two numerical variables when you have a large number of data points.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Sample Data
np.random.seed(0)
n = 1000
x = np.random.standard_normal(n)
y = 2.0 + 3.0 * x + 4.0 * np.random.standard_normal(n)
df = pd.DataFrame({'x': x, 'y': y})

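# Hexbin Plot (the gridsize value is an illustrative choice)
df.plot(kind='hexbin', x='x', y='y', gridsize=25, title='Hexbin Plot')
plt.show()

19.11 KDE and Density Plot

A kernel density estimate (KDE) plot is a method for visualizing the distribution of a continuous variable. It represents the data using a continuous probability density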
curve. KDE is particularly useful for visualizing the shape of the data distribution and for comparing multiple distributions.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Sample Data
np.random.seed(0)
data = np.random.randn(1000)
df = pd.DataFrame({'data': data})

# KDE Plot
df['data'].plot(kind='kde', title='KDE Plot')
plt.show()

19.12 Creating Multiple Plots in a Single Figure


To create multiple plots in a single figure in Pandas, you can use the subplots parameter of the plot method or use Matplotlib's
plt.subplot function. This allows you to arrange multiple plots within the same figure. Below are two examples.
Example
Let's create a DataFrame with some sample data representing different metrics for major cities in Pakistan.

import pandas as pd
import matplotlib.pyplot as plt

# Sample Data
data = {
'City': ['Karachi', 'Lahore', 'Islamabad', 'Quetta', 'Peshawar'],
'Population': [14910352, 11021000, 1014825, 1001205, 1970042],
'Area': [3527, 1772, 906, 2653, 1257],
'GDP': [164, 102, 55, 20, 30],
'Literacy Rate': [77, 74, 88, 48, 62]
}
df = pd.DataFrame(data)
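
The two approaches described above can be sketched as follows. This is a minimal illustration, assuming the city data from this section; it uses Matplotlib's `plt.subplots` (which creates the whole grid of axes at once) rather than `plt.subplot`, and the choice of columns and bar charts is arbitrary:

```python
import pandas as pd
import matplotlib.pyplot as plt

df_cities = pd.DataFrame({
    'City': ['Karachi', 'Lahore', 'Islamabad', 'Quetta', 'Peshawar'],
    'Population': [14910352, 11021000, 1014825, 1001205, 1970042],
    'Area': [3527, 1772, 906, 2653, 1257],
    'GDP': [164, 102, 55, 20, 30],
    'Literacy Rate': [77, 74, 88, 48, 62],
})

# Approach 1: let Pandas create one panel per column with subplots=True
axes_auto = df_cities.set_index('City')[['Area', 'GDP']].plot(
    kind='bar', subplots=True, layout=(1, 2), figsize=(10, 4), legend=False)
plt.show()

# Approach 2: create the axes with Matplotlib and pass each one via ax=
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(10, 4))
df_cities.plot(kind='bar', x='City', y='Population', ax=axes[0],
               title='Population', legend=False)
df_cities.plot(kind='bar', x='City', y='Literacy Rate', ax=axes[1],
               title='Literacy Rate (%)', legend=False, color='g')
fig.tight_layout()
plt.show()
```

The first approach is shorter when every panel uses the same plot type; the second gives full control over what goes into each panel.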
