20 June BA Class
19.1 Introduction
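Pandas provides built-in plotting methods for visualizing data. They are thin wrappers around Matplotlib, which pandas uses as its default
plotting backend, so any pandas plot can be customized further with ordinary Matplotlib calls. Seaborn is a separate library that also builds on
Matplotlib; pandas itself does not use it. The sections below cover the main plot types available through the plot() method.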
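As a quick check of how pandas plotting relates to Matplotlib, the sketch below plots a small DataFrame and inspects the object that comes back. The toy data and column names are made up for illustration, and the Agg backend line is only there so the snippet also runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data, not taken from the class material
df = pd.DataFrame({"a": [1, 3, 2], "b": [2, 1, 4]})

# DataFrame.plot returns a Matplotlib Axes object
ax = df.plot(kind="line", title="Toy example")
print(type(ax).__module__)

# Because it is a regular Axes, Matplotlib methods work on it directly
ax.set_ylabel("Value")
```

Since the return value is a Matplotlib Axes, the plt.xlabel, plt.ylabel and plt.show calls used in the later cells act on the same figure that pandas created.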
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Sample Data
data = {
'Year': pd.date_range(start='2010', periods=10, freq='Y'),
'GDP_Growth': np.random.uniform(2, 6, 10),
'Inflation_Rate': np.random.uniform(3, 10, 10),
'Population': np.linspace(180, 220, 10), # in millions
'Exports': np.random.uniform(20, 30, 10) # in billion USD
}
df = pd.DataFrame(data)
df.set_index('Year', inplace=True)
df
import pandas as pd
import matplotlib.pyplot as plt
# Sample Data
data = {
'Year': [2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019],
'GDP_Growth': [3.5, 4.0, 4.5, 3.8, 4.2, 5.0, 5.5, 5.8, 5.2, 5.0],
'Inflation_Rate': [4.2, 3.8, 5.0, 4.5, 4.0, 3.5, 3.8, 4.2, 4.0, 3.9]
}
df = pd.DataFrame(data)
df.set_index('Year', inplace=True)
| Colormap | Description |
|----------|-------------|
| viridis | Yellow-green-blue |
| plasma | Yellow-orange-purple |
| inferno | Yellow-orange-red-black |
| magma | Yellow-pink-dark purple-black |
| cividis | Blue-yellow |
| Greys | White-grey-black |
| OrRd | Orange-red |
| PuRd | Purple-red |
| RdPu | Red-purple |
| BuPu | Blue-purple |
| GnBu | Green-blue |
| PuBu | Purple-blue |
| YlGnBu | Yellow-green-blue |
| PuBuGn | Purple-blue-green |
| BuGn | Blue-green |
| YlGn | Yellow-green |
| PiYG | Pink-yellow-green |
| PRGn | Purple-green |
| BrBG | Brown-green |
| PuOr | Purple-orange |
| RdGy | Red-grey |
| RdBu | Red-blue |
| RdYlBu | Red-yellow-blue |
| RdYlGn | Red-yellow-green |
| Spectral | Red-yellow-green-blue |
| coolwarm | Blue-white-red |
| bwr | Blue-white-red |
| seismic | Blue-white-red |
| twilight | Blue-purple-yellow |
| twilight_shifted | Purple-yellow-blue |
| hsv | Red-yellow-green-cyan-blue-magenta |
| Pastel1 | Pastel colors |
| Pastel2 | Pastel colors |
| Paired | Paired colors |
| flag | Red-white-blue |
| prism | Rainbow colors |
| ocean | Blues and cyans |
| gist_earth | Earth tones |
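A small sketch of how these names are used: each one is a string that can be passed to the colormap argument of plot(). The snippet first checks a handful of names from the list against Matplotlib's registry and then applies one to a bar chart. The three-row DataFrame is a shortened, illustrative copy of the sample data, and the choice of 'viridis' is arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# A few names from the list above; any registered name works the same way
names = ["viridis", "plasma", "coolwarm", "Spectral", "Pastel1", "flag"]
registered = set(plt.colormaps())  # names Matplotlib currently knows about
missing = [name for name in names if name not in registered]
print("Not registered:", missing)

# Shortened copy of the sample data (first three years only)
df = pd.DataFrame(
    {"GDP_Growth": [3.5, 4.0, 4.5], "Inflation_Rate": [4.2, 3.8, 5.0]},
    index=pd.Index([2010, 2011, 2012], name="Year"),
)

# Each column gets a color sampled from the chosen colormap
ax = df.plot(kind="bar", colormap="viridis")
```

A misspelled colormap name raises an error at plot time, so checking against plt.colormaps() first can save a confusing traceback.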
df.plot(kind="bar",colormap="flag")
df.plot(kind="barh",width=0.8,figsize=(6,10))
df['column_name'].plot(kind='hist', **options)
import pandas as pd
import matplotlib.pyplot as plt
# Sample Data
data = {
'GDP_Growth': [3.5, 4.0, 4.5, 3.8, 4.2, 5.0, 5.5, 5.8, 5.2, 5.0, 4.8],  # remaining values are cut off in the source
}
df = pd.DataFrame(data)
# Histogram Plot
df['GDP_Growth'].plot(kind='hist', bins=5, title='GDP Growth Histogram')
plt.xlabel('GDP Growth')
plt.ylabel('Frequency')
plt.show()
19.5.1 Adjusting the Number of Bins
df['GDP_Growth'].plot(kind='hist', bins=8, title='GDP Growth Histogram')
plt.xlabel('GDP Growth')
plt.ylabel('Frequency')
plt.show()
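To see what changing bins actually does, the sketch below computes the bin edges and counts directly with numpy.histogram, which splits the range between the minimum and maximum into equal-width intervals. It uses the ten complete GDP growth values from the earlier sample data, so the counts are not guaranteed to match the histogram cell above, whose list is cut off.

```python
import numpy as np

# Ten GDP growth values from the earlier sample data
gdp_growth = np.array([3.5, 4.0, 4.5, 3.8, 4.2, 5.0, 5.5, 5.8, 5.2, 5.0])

results = {}
for bins in (5, 8):
    # counts has `bins` entries; edges has `bins + 1` entries
    counts, edges = np.histogram(gdp_growth, bins=bins)
    results[bins] = (counts, edges)
    width = edges[1] - edges[0]
    print(f"bins={bins}: counts={counts.tolist()}, bin width={width:.4f}")
```

More bins means narrower intervals and a more detailed but noisier picture; with only a handful of observations, a large bin count tends to leave some bins empty.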
data = {
'Name': ["A", "b", "c", "d", "e"],
'Income': [1000, 2000, 1500, 2200, 1800]
}
df1 = pd.DataFrame(data)
df1
Name Income
0 A 1000
1 b 2000
2 c 1500
3 d 2200
4 e 1800
df1.plot(kind="hist",bins=4,edgecolor="black")
df1.plot(kind="hist",bins=8,edgecolor='black')
# Box Plot
df[['Temperature', 'Humidity']].plot(kind='box', title='Temperature and Hu
plt.ylabel('Values')
plt.show()
df = pd.DataFrame(data)
df.head()
import pandas as pd
import matplotlib.pyplot as plt
# Sample Data
data = {
'Year': [2016, 2017, 2018, 2019, 2020],
'Production_A': [100, 150, 200, 180, 300],
'Production_B': [80, 130, 170, 220, 260]
}
df = pd.DataFrame(data)
# Area Plot
df.set_index('Year').plot(kind='area', title='Production Over Years')
plt.xlabel('Year')
plt.ylabel('Production')
plt.show()
The following table lists commonly used options for an area plot.

| Option | Description | Example Syntax | Notes |
|--------|-------------|----------------|-------|
| stacked | Whether to stack the areas | `df.plot(kind='area', stacked=True)` | Default is True |
| alpha | Transparency level of the areas | `df.plot(kind='area', alpha=0.6)` | Values range from 0 (transparent) to 1 (opaque) |
| colormap | Colormap to use for the areas | `df.plot(kind='area', colormap='viridis')` | Uses the specified colormap |
| fontsize | Font size for the tick labels | `df.plot(kind='area', fontsize=12)` | Sets the font size for tick labels |
| grid | Whether to show the grid | `df.plot(kind='area', grid=True)` | Adds a grid to the plot |
| legend | Whether to display the legend | `df.plot(kind='area', legend=True)` | Shows the legend |
| logx | Logarithmic scale on x-axis | `df.plot(kind='area', logx=True)` | Changes the x-axis to a logarithmic scale |
| logy | Logarithmic scale on y-axis | `df.plot(kind='area', logy=True)` | Changes the y-axis to a logarithmic scale |
| xlim | Limits for the x-axis | `df.plot(kind='area', xlim=(2016, 2020))` | Sets the limits for the x-axis |
| ylim | Limits for the y-axis | `df.plot(kind='area', ylim=(0, 350))` | Sets the limits for the y-axis |
| title | Title of the plot | `df.plot(kind='area', title='Area Plot Example')` | Adds a title to the plot |
| xlabel | Label for the x-axis | `df.plot(kind='area', xlabel='Year')` | Sets the label for the x-axis |
| ylabel | Label for the y-axis | `df.plot(kind='area', ylabel='Production')` | Sets the label for the y-axis |
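Several of these options can be combined in one call. The sketch below reuses the production data from the area plot example and switches to an unstacked, semi-transparent layout; the particular option values (alpha of 0.5, the y-axis limit of 350) are illustrative choices rather than recommendations.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Same production data as the area plot example
df = pd.DataFrame({
    "Year": [2016, 2017, 2018, 2019, 2020],
    "Production_A": [100, 150, 200, 180, 300],
    "Production_B": [80, 130, 170, 220, 260],
}).set_index("Year")

# Unstacked areas overlap, so transparency keeps both series visible
ax = df.plot(
    kind="area",
    stacked=False,
    alpha=0.5,
    grid=True,
    ylim=(0, 350),
    title="Production Over Years",
    xlabel="Year",
    ylabel="Production",
)
```

With stacked=True (the default) the top edge of the chart shows the combined total, whereas stacked=False lets each series be read against the y-axis on its own.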
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Sample Data
data = {
'City': ['Karachi', 'Lahore', 'Islamabad', 'Quetta', 'Peshawar', 'Raw
'Sialkot', 'Sargodha', 'Bahawalpur', 'Sukkur', 'Jhelum'],
'Temperature': temperature,
'Humidity': humidity
}
df = pd.DataFrame(data)
df.head()
import pandas as pd
import matplotlib.pyplot as plt
# Sample Data
data = {
'City': ['Karachi', 'Lahore', 'Islamabad', 'Quetta', 'Peshawar'],
'Population': [14910352, 11021000, 1014825, 1001205, 1970042]
}
df = pd.DataFrame(data)
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Sample Data
np.random.seed(0)
n = 1000
x = np.random.standard_normal(n)
y = 2.0 + 3.0 * x + 4.0 * np.random.standard_normal(n)
df = pd.DataFrame({'x': x, 'y': y})
# Hexbin Plot
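The plotting call for this cell is missing from the notes. A minimal sketch of what it might look like is given below, assuming the same simulated data; the gridsize and cmap values are assumptions chosen for illustration, not the values used in class.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed inside a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Same simulated data as the cell above
np.random.seed(0)
n = 1000
x = np.random.standard_normal(n)
y = 2.0 + 3.0 * x + 4.0 * np.random.standard_normal(n)
df = pd.DataFrame({'x': x, 'y': y})

# Each hexagon is colored by the number of points that fall inside it.
# gridsize (hexagons along the x direction) and cmap are assumed values.
ax = df.plot(kind='hexbin', x='x', y='y', gridsize=25, cmap='viridis')
ax.set_title('Hexbin Plot')
```

A hexbin plot is a useful alternative to a scatter plot when there are many overlapping points, since the color scale shows density that individual markers would hide.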
19.11 KDE and Density Plot
A kernel density estimate (KDE) plot visualizes the distribution of a continuous variable as a smooth probability density curve rather than as
discrete bins. It is particularly useful for seeing the shape of a distribution and for comparing several distributions on the same axes. In
pandas, kind='kde' and kind='density' are aliases for the same plot, and both require SciPy to be installed.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Sample Data
np.random.seed(0)
data = np.random.randn(1000)
df = pd.DataFrame({'data': data})
# KDE Plot
df['data'].plot(kind='kde', title='KDE Plot')
plt.show()
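To make the idea of a density curve more concrete, the sketch below builds a Gaussian KDE by hand with NumPy: one bell-shaped bump is centered on every observation and the bumps are averaged. The bandwidth uses Scott's rule of thumb, which is an assumption of this sketch; the grid size of 400 points is also an arbitrary choice.

```python
import numpy as np

# Simulated standard-normal sample, similar in spirit to the cell above
rng = np.random.default_rng(0)
samples = rng.standard_normal(1000)

# Scott's rule of thumb for the bandwidth (assumed for this sketch)
bandwidth = samples.std(ddof=1) * len(samples) ** (-1 / 5)

# Evaluation grid extending a little beyond the observed range
grid = np.linspace(samples.min() - 3 * bandwidth,
                   samples.max() + 3 * bandwidth, 400)

# One Gaussian kernel per observation, evaluated on the grid, then averaged
z = (grid[:, None] - samples[None, :]) / bandwidth
density = np.exp(-0.5 * z**2).sum(axis=1) / (
    len(samples) * bandwidth * np.sqrt(2 * np.pi)
)

# A probability density should integrate to roughly 1 (trapezoid rule)
area = float(np.sum((density[1:] + density[:-1]) / 2 * np.diff(grid)))
print(f"bandwidth={bandwidth:.3f}, area under curve={area:.3f}")
```

A smaller bandwidth gives a bumpier curve that follows individual points, while a larger one gives a smoother curve that may hide real structure. In pandas the bandwidth can be changed through the bw_method argument, for example df['data'].plot(kind='kde', bw_method=0.3).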
import pandas as pd
import matplotlib.pyplot as plt
# Sample Data
data = {
'City': ['Karachi', 'Lahore', 'Islamabad', 'Quetta', 'Peshawar'],
'Population': [14910352, 11021000, 1014825, 1001205, 1970042],
'Area': [3527, 1772, 906, 2653, 1257],
'GDP': [164, 102, 55, 20, 30],
'Literacy Rate': [77, 74, 88, 48, 62]
}
df = pd.DataFrame(data)