Plotting Directly With Matplotlib: Objectives
Estimated time needed: 15 minutes
Objectives
After completing this lab you will be able to:
Table of Contents
1. Import Libraries
2. Fetching Data
3. Line Plot
4. Scatter Plot
5. Bar Plot
6. Histogram
7. Pie Chart
8. Sub-plotting
Import Libraries
Import the matplotlib library.
import pandas as pd
import matplotlib.pyplot as plt
# use the inline backend to generate the plots within the browser
%matplotlib inline
Fetching Data
Dataset: Immigration to Canada from 1980 to 2013 - International migration flows to and
from selected countries - The 2015 revision, from the United Nations website.
In this lab, we will focus on the Canadian immigration data and use the already cleaned
dataset, which can be fetched from here.
For a quick refresher on your Pandas skills, you can refer to the lab in which this dataset is
cleaned: Data pre-processing with Pandas.
In [3]: df_can.head()
Out[3]:
   Country         Continent  Region           DevName             1980  1981  1982  1983  1984  1985  ...
0  Afghanistan     Asia       Southern Asia    Developing regions    16    39    39    47    71   340  ...
1  Albania         Europe     Southern Europe  Developed regions      1     0     0     0     0     0  ...
2  Algeria         Africa     Northern Africa  Developing regions    80    67    71    69    63    44  ...
3  American Samoa  Oceania    Polynesia        Developing regions     0     1     0     0     0     0  ...
4  Andorra         Europe     Southern Europe  Developed regions      0     0     0     0     0     0  ...
5 rows × 39 columns
Let's find out how many entries there are in our dataset.
(195, 39)
Set the country name as the index, which is useful for quickly looking up countries with the .loc method.
# Let's view the first five elements and see how the dataframe was changed
df_can.head()
Out[5]:
                Continent  Region           DevName             1980  1981  1982  1983  1984  1985  1986
Country
Afghanistan     Asia       Southern Asia    Developing regions    16    39    39    47    71   340   496
Albania         Europe     Southern Europe  Developed regions      1     0     0     0     0     0     1
Algeria         Africa     Northern Africa  Developing regions    80    67    71    69    63    44    69
American Samoa  Oceania    Polynesia        Developing regions     0     1     0     0     0     0     0
Andorra         Europe     Southern Europe  Developed regions      0     0     0     0     0     0     2
5 rows × 38 columns
Notice that the country names now serve as indices.
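As a minimal sketch of the indexing step, the snippet below builds a small stand-in dataframe (the name df_demo is hypothetical; the three rows are copied from the head() output above), sets 'Country' as the index, and looks values up with .loc.

```python
import pandas as pd

# Small stand-in for df_can; df_demo is a hypothetical name used only here.
df_demo = pd.DataFrame({
    "Country": ["Afghanistan", "Albania", "Algeria"],
    "Continent": ["Asia", "Europe", "Africa"],
    "1980": [16, 1, 80],
    "1981": [39, 0, 67],
})

# Make the country name the row label.
df_demo.set_index("Country", inplace=True)

# Label-based lookup: a whole row, then a single cell.
algeria_row = df_demo.loc["Algeria"]
algeria_1980 = df_demo.loc["Algeria", "1980"]
print(algeria_1980)
```

After set_index, 'Country' is no longer a regular column, which is why the column count drops by one.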
Line Plot
A line plot displays the relationship between two continuous variables over a continuous
interval, showing the trend or pattern of the data.
Let's create a line plot to visualize the trend of immigrants to Canada from 1980 to 2013.
We need the year-wise total of immigrants:
first, we will create a new dataframe containing only the year columns,
then we will apply sum() to that dataframe.
You can create a line plot directly on the axes by calling the plot() function.
In [8]: #As years is in the array format, you will be required to map it to str for plotting
#y=list(map(str, years))
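The steps listed above can be sketched as follows. This is an illustration on invented numbers, not the lab's own cell: df_demo and its values are hypothetical, and only four year columns are used to keep it short.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch also runs outside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for df_can: three countries, four year columns, invented values.
years = list(map(str, range(1980, 1984)))
df_demo = pd.DataFrame(
    [[10, 20, 30, 40], [1, 2, 3, 4], [5, 5, 5, 5]],
    index=["A", "B", "C"],
    columns=years,
)

# Keep only the year columns, then sum down the rows: one total per year.
total_immigrants = df_demo[years].sum()

# Plot the series directly on the axes; the index supplies the x values.
fig, ax = plt.subplots()
ax.plot(total_immigrants)
ax.set_title("Immigrants per year (demo data)")
ax.set_xlabel("Years")
ax.set_ylabel("Total Immigrants")
```

Here the index still holds strings, so matplotlib treats the years as categories rather than numbers.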
The plot function populated the x-axis with the index values (years), and
the y-axis with the column values (population).
However, notice how the years were not displayed because they are of type
string.
Therefore, let's change the type of the index values to integer for plotting.
plt.show()
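A short sketch of the index conversion described above, on an invented series (total_demo is a hypothetical name): each string label is mapped through int() so that the x-axis becomes numeric.

```python
import pandas as pd

# Invented yearly totals whose labels are strings, as they would be when
# taken from column names.
total_demo = pd.Series([16, 27, 38], index=["1980", "1981", "1982"])

# Convert every label to an integer.
total_demo.index = total_demo.index.map(int)
print(total_demo.index.tolist())
```

With integer labels, calls such as plt.xlim(1975, 2015) refer to actual years rather than category positions.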
Let's include a background grid and a legend, and try changing the limits on
the axis.
#limits on x-axis
plt.xlim(1975, 2015) #or ax.set_xlim()
#Enabling Grid
plt.grid(True) #or ax.grid()
#Legend
plt.legend(["Immigrants"]) #or ax.legend()
#Display the plot
plt.show()
In 2010, Haiti suffered a catastrophic magnitude 7.0 earthquake. The quake caused
widespread devastation and loss of life, and about three million people were affected by this
natural disaster. As part of Canada's humanitarian effort, the Government of Canada stepped
up its effort in accepting refugees from Haiti. We can quickly visualize this effort using a line
plot:
You will be required to create a dataframe where the 'Country' is equal to 'Haiti',
covering the years 1980 - 2013.
You will also be required to transpose the new dataframe into a series for plotting.
You might also have to change the type of the series index to integer for a better-looking
plot.
Then create fig and ax and call the plot() function on the data.
ax.plot(haiti)
#Legend
plt.legend(["Immigrants"]) #or ax.legend()
You can also specify the ticks to be displayed on the plot like this -
ax.set_xticks(list(range(1980, 2015,5)))
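Putting the Haiti steps together, one possible sketch is shown below. It assumes the dataframe is indexed by country, and it uses invented numbers over four years; df_demo and the 'Elsewhere' row are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch also runs outside a notebook
import matplotlib.pyplot as plt
import pandas as pd

years = list(map(str, range(2008, 2012)))
# Hypothetical stand-in for df_can indexed by country; values are invented.
df_demo = pd.DataFrame(
    [[100, 120, 300, 250], [7, 8, 9, 10]],
    index=pd.Index(["Haiti", "Elsewhere"], name="Country"),
    columns=years,
)

# One-row dataframe for Haiti, restricted to the year columns.
df_haiti = df_demo.loc[["Haiti"], years]

# Transpose so the years run down the rows, then take the single column as a series.
haiti = df_haiti.T["Haiti"]
haiti.index = haiti.index.map(int)

fig, ax = plt.subplots()
ax.plot(haiti)
ax.set_title("Immigrants from Haiti (demo data)")
ax.set_xlabel("Years")
ax.set_ylabel("Number of Immigrants")
ax.set_xticks(list(range(2008, 2012)))  # one tick per year in this short range
ax.legend(["Immigrants"])
```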
Scatter Plot
A scatter plot visualizes the relationship between two continuous variables, displaying
individual data points as dots on a two-dimensional plane, allowing for the examination of
patterns, clusters, and correlations.
Let's create a scatter plot to visualize the trend of immigrants to Canada from 1980 to
2013.
We need the year-wise total of immigrants:
first, we will create a new dataframe containing only the year columns,
then we will apply sum() to that dataframe.
You can create a scatter plot directly on ax by calling the scatter() function.
#add title
plt.title('Immigrants between 1980 to 2013')
#add labels
plt.xlabel('Years')
plt.ylabel('Total Immigrants')
#including grid
plt.grid(True)
total_immigrants.index = total_immigrants.index.map(int)
#add title
plt.title('Immigrants between 1980 to 2013')
#add labels
plt.xlabel('Years')
plt.ylabel('Total Immigrants')
#including grid
plt.grid(True)
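A minimal sketch of the scatter steps, on invented totals (total_demo is a hypothetical name). Unlike plot(), scatter() needs the x and y values passed separately.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch also runs outside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Invented yearly totals with integer year labels.
total_demo = pd.Series([16, 27, 38, 49], index=[1980, 1981, 1982, 1983])

fig, ax = plt.subplots(figsize=(8, 4))
# x comes from the index (years), y from the values (totals).
ax.scatter(total_demo.index, total_demo.values, marker="o", color="blue")
ax.set_title("Immigrants per year (demo data)")
ax.set_xlabel("Years")
ax.set_ylabel("Total Immigrants")
ax.grid(True)
```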
Bar Plot
A bar plot represents categorical data with rectangular bars, where the height of each bar
corresponds to the value of a specific category, making it suitable for comparing values
across different categories.
Let's create a bar plot to visualize the top 5 countries that contributed the most immigrants
to Canada from 1980 to 2013.
We will create a new dataframe containing only the year columns,
then we will apply sum() to the dataframe and create a separate dataframe for the top five
countries.
You can then use the names of the countries to label each bar on the plot.
Out[19]: ['India',
'China',
'United Kingdom of Great Britain and Northern Ireland',
'Philippines',
'Pakistan']
The third name is too long to fit on the x-axis as a label. Let's fix this using indexing.
In [20]: label[2]='UK'
label
ax.bar(label,df_bar_5['Total'], label=label)
ax.set_title('Immigration Trend of Top 5 Countries')
ax.set_ylabel('Number of Immigrants')
ax.set_xlabel('Countries')
plt.show()
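The top-five selection can be sketched as below. The country names and numbers are invented placeholders, and computing 'Total' as a row-wise sum is an assumption about how that column was built.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch also runs outside a notebook
import matplotlib.pyplot as plt
import pandas as pd

years = ["1980", "1981"]
# Hypothetical data: seven placeholder countries with invented values.
df_demo = pd.DataFrame(
    {"1980": [50, 40, 30, 20, 10, 5, 1], "1981": [51, 41, 31, 21, 11, 6, 2]},
    index=["Alpha", "Beta", "A Very Long Official Country Name",
           "Delta", "Epsilon", "Zeta", "Eta"],
)

# Row-wise total over the year columns, then the five largest totals.
df_demo["Total"] = df_demo[years].sum(axis=1)
df_top5 = df_demo.sort_values("Total", ascending=False).head(5)

# Use the country names as bar labels and shorten the long one by position.
label = list(df_top5.index)
label[2] = "Long"

fig, ax = plt.subplots(figsize=(10, 4))
ax.bar(label, df_top5["Total"])
ax.set_title("Top 5 Countries (demo data)")
ax.set_xlabel("Countries")
ax.set_ylabel("Number of Immigrants")
```

Sorting with ascending=True instead would put the smallest totals first, which is the variation the question below asks for.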
Question:
Create a bar plot of the 5 countries that contributed the least to immigration to
Canada from 1980 to 2013.
In [22]: #Sorting the dataframe on 'Total' in ascending order
df_can.sort_values(['Total'], ascending=True, axis=0, inplace=True)
ax.bar(label, df_least5_bar['Total'],label=label)
ax.set_title('Immigration Trend of Bottom 5 Countries')
ax.set_ylabel('Number of Immigrants')
ax.set_xlabel('Countries')
plt.show()
Histogram
A histogram is a way of representing the frequency distribution of a numeric dataset. It
partitions the x-axis into bins, assigns each data point in the dataset to a bin, and then
counts the number of data points assigned to each bin. The y-axis therefore shows the
frequency, that is, the number of data points in each bin. Note that we can change the bin
size, and usually one needs to tweak it so that the distribution is displayed nicely.
Let's find out the frequency distribution of the number (population) of new immigrants
from the various countries to Canada in 2013.
In [23]: df_country = df_can.groupby(['Country'])['2013'].sum().reset_index()
#you can check the arrays in count with indexing count[0] for count, count[1] fo
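The binning described above can be illustrated with NumPy and matplotlib on a handful of invented values. This is a sketch of the idea rather than the lab's cell; the array values and the choice of four bins are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch also runs outside a notebook
import matplotlib.pyplot as plt
import numpy as np

# Invented data points standing in for per-country immigrant counts.
values = np.array([0, 1, 2, 3, 10, 11, 25, 30, 31, 40])

# np.histogram returns the count per bin and the bin edges.
count, bin_edges = np.histogram(values, bins=4)
print(count)      # number of points in each bin
print(bin_edges)  # boundaries of the bins

# Draw the histogram with the same edges and put the ticks on them.
fig, ax = plt.subplots()
n, edges, patches = ax.hist(values, bins=bin_edges)
ax.set_xticks(bin_edges)
ax.set_xlabel("Number of Immigrants")
ax.set_ylabel("Number of Countries")
```

Every bin is half-open except the last one, which also includes its right edge, so the value 40 falls in the final bin.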
Country  Denmark  Norway  Sweden
1984          93      31     128
1985          73      54     158
1986          93      56     187
1994          93      60     192
1996          70      70     161
1998          63      31     123
1999          81      36     170
2000          93      56     138
2001          81      78     184
2002          70      74     149
2003          89      77     161
2004          89      73     129
2005          62      57     205
2007          97      73     193
2009          81      75     167
2010          92      46     159
2011          93      49     134
2012          94      53     140
2013          81      59     140
Question:
What is the immigration distribution for China and India for years 2000 to 2013?
Pie Chart
A pie chart represents the proportion or percentage distribution of different categories
in a dataset using sectors of a circular pie.
Let's create a pie chart representing the 'Total Immigrants' for the years 1980 to 1984.
In [28]: fig,ax=plt.subplots()
#Pie on immigrants
ax.pie(total_immigrants[0:5], labels=years[0:5],
colors = ['gold','blue','lightgreen','coral','cyan'],
autopct='%1.1f%%',explode = [0,0,0,0,0.1]) #using explode to highlight t
First, you will have to group the data over continents and get the sum on total. Then
you can pass this data to the pie function
0 Africa 618948
1 Asia 3317794
2 Europe 1410947
5 Oceania 55174
In [30]: fig,ax=plt.subplots(figsize=(10, 4))
#Pie on immigrants
ax.pie(df_con['Total'], colors = ['gold','blue','lightgreen','coral','cyan','r
autopct='%1.1f%%', pctdistance=1.25)
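The group-then-plot approach for continents can be sketched as follows, using a hypothetical four-row dataframe with invented totals.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch also runs outside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for df_can; country names and totals are invented.
df_demo = pd.DataFrame(
    {"Continent": ["Asia", "Asia", "Europe", "Africa"],
     "Total": [30, 20, 40, 10]},
    index=["P", "Q", "R", "S"],
)

# Sum 'Total' within each continent; reset_index turns the result back
# into a dataframe with 'Continent' as a regular column.
df_con = df_demo.groupby("Continent")["Total"].sum().reset_index()

fig, ax = plt.subplots(figsize=(10, 4))
ax.pie(df_con["Total"], labels=df_con["Continent"], autopct="%1.1f%%")
ax.axis("equal")  # keep the pie circular
ax.set_title("Continent-wise distribution (demo data)")
```

groupby sorts the group keys, so the wedges follow alphabetical continent order.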
Sub-plotting
Let us explore how to display more than one plot on the same figure by specifying the
number of rows and columns to the subplots function.
For instance, let’s create a line and scatter plot in one row
plt.subplots()
You can use the same functions with which you plotted the line and scatter plots at the
start of this lab.
Both subplots will share the same y-axis, as the data on the y-axis is the same.
So, set the 'sharey' parameter to True in the code below. Also notice the use of
'suptitle'.
In [31]: # Create a figure with two axes in a row
axs[0].set_ylabel("Number of Immigrants")
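A sketch of the one-row, two-column layout with a shared y-axis, using invented totals (total_demo is a hypothetical name, and the suptitle text is an assumption):

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch also runs outside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Invented yearly totals with integer year labels.
total_demo = pd.Series([16, 27, 38, 49], index=[1980, 1981, 1982, 1983])

# One row, two columns; sharey=True links the y-axis limits of both axes.
fig, axs = plt.subplots(1, 2, sharey=True, figsize=(8, 4))

# Left axes: line plot.
axs[0].plot(total_demo)
axs[0].set_title("Line plot on immigrants")
axs[0].set_ylabel("Number of Immigrants")

# Right axes: scatter plot of the same data.
axs[1].scatter(total_demo.index, total_demo.values)
axs[1].set_title("Scatter plot on immigrants")

# suptitle sets one title for the whole figure, above both axes.
title = fig.suptitle("Subplotting Example (demo data)")
```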
In [32]: # Create a figure with two axes - one row, two columns
fig = plt.figure(figsize=(8,4))
# Add the first subplot (left)
axs1 = fig.add_subplot(1, 2, 1)
#Plotting in first axes - the left one
axs1.plot(total_immigrants)
axs1.set_title("Line plot on immigrants")
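The same layout can also be built one axes at a time with add_subplot(rows, columns, position). The sketch below adds a second axes on the right; the data and the name axs2 are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch also runs outside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Invented yearly totals with integer year labels.
total_demo = pd.Series([16, 27, 38, 49], index=[1980, 1981, 1982, 1983])

fig = plt.figure(figsize=(8, 4))

# Position 1 of a 1 x 2 grid: the left axes.
axs1 = fig.add_subplot(1, 2, 1)
axs1.plot(total_demo)
axs1.set_title("Line plot on immigrants")

# Position 2 of the same grid: the right axes.
axs2 = fig.add_subplot(1, 2, 2)
axs2.scatter(total_demo.index, total_demo.values)
axs2.set_title("Scatter plot on immigrants")

fig.tight_layout()  # avoid overlapping titles and tick labels
```

Positions are numbered row by row starting from 1, so a 2 x 2 grid would use positions 1 to 4.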
Author
Dev Agnihotri