
Python Practical Guide 2


Rajarata University of Sri Lanka

Department of Computing

DATA VISUALIZATION WITH MATPLOTLIB

Table of Contents
Principles of Analytic Graphics
Activities
Activity 01: Line Chart
Activity 02: Scatter Plot
Activity 3: Using Data in MS Excel
Activity 4: Histogram
Activity 5(i): Bubble Chart
Activity 5(ii)
Activity 6: Emulating ggplot
Activity 7: Multiple Subplots
Activity 8: Exporting Plots
Activity 9(i): Bar Chart
Activity 9 (ii): Differentiate between the networks by applying different colors
Activity 10: Scatter Chart with trend line (Using Plotly Express)
Activity 11: Scatter Chart with multiple subplot (Using Plotly Express)
Activity 12: Animated Chart
Activity 13: Map
Activity 14: Map with geopandas library
Principles of Analytic Graphics

• Principle 1: Show comparisons
• Principle 2: Show causality, mechanism, explanation, systemic structure
• Principle 3: Show multivariate data
• Principle 4: Integration of evidence
• Principle 5: Describe and document the evidence with appropriate labels, scales, sources, etc.
• Principle 6: Content is king

• A common visualization library in Python is matplotlib. You need to import matplotlib before you can use it in Python.
• If the matplotlib and pandas libraries are not available, please install them. You can refer to the link below for guidance.
(Link: https://www.youtube.com/watch?v=YDqcGxo_4WQ)
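Before starting the activities, it can help to confirm which libraries are already installed. The following is a minimal sketch using only the Python standard library. The function name missing_packages and the REQUIRED list are illustrative choices, not part of the original guide.

```python
import importlib.util

# Packages used in this guide (assumed list; geopandas and statsmodels
# are only needed for some of the later activities).
REQUIRED = ["matplotlib", "pandas", "openpyxl", "plotly"]


def missing_packages(names):
    """Return the package names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    # pip is the usual installer; adapt the command if you use conda.
    print("Install with: pip install " + " ".join(missing))
else:
    print("All required packages are available.")
```

If the script lists any packages, run the printed command in a terminal and then run the check again.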


Activities

Activity 01: Line Chart

Enter the code below and check the output.

import matplotlib.pyplot as plt

year = [1950, 1970, 1990, 2010]


pop = [2.519, 3.692, 5.263, 6.972]

plt.plot(year, pop)
plt.show()

Output:
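Principle 5 asks for labels and scales, which the basic line chart does not yet have. The sketch below adds them to the same data. The title text and the unit "billions" are assumptions about what the numbers represent, and the Agg backend line is only there so the script also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to open a window
import matplotlib.pyplot as plt

year = [1950, 1970, 1990, 2010]
pop = [2.519, 3.692, 5.263, 6.972]

fig, ax = plt.subplots()
ax.plot(year, pop, marker="o")

# Labels and title (assumed wording) document what the chart shows.
ax.set_title("Population over time")
ax.set_xlabel("Year")
ax.set_ylabel("Population (billions)")

# Put ticks exactly on the years that were measured.
ax.set_xticks(year)

plt.show()  # does nothing under the Agg backend
```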


Activity 02: Scatter Plot

import matplotlib.pyplot as plt

x = [5, 7, 8, 7, 2, 17, 2, 9, 4, 11, 12, 9, 6]

y = [99, 86, 87, 88, 100, 86, 103, 87, 94, 78, 77, 85, 86]

plt.scatter(x, y, c="blue")

# To show the plot


plt.show()


Activity 3: Using Data in MS Excel


Note: Install the openpyxl library.
Create the sample data below in an MS Excel sheet. Then type the code and check the output.

Country               Female Literacy  Fertility  Population
Afghanistan           53%              1.8        50
Albania               42%              2.4        100
Algeria               57%              3.3        150
Andorra               61%              3.5        200
Angola                41%              1.8        250
Anguilla              44%              2.1        300
Antigua and Barbuda   48%              1.7        350
Argentina             62%              3.2        50
Armenia               28%              2.2        100
Australia             35%              1.7        150
Austria               45%              1.8        200
Azerbaijan            48%              2.4        250
Bahamas               25%              3.3        300
Bahrain               56%              3.5        350
Bangladesh            24%              1.8        50
Barbados              51%              2.2        100
Chad                  57%              1.8        150
Chile                 42%              2.4        200
China                 57%              3.3        250
Colombia              61%              3.5        300
Comoros               41%              1.8        350
Congo                 44%              2.2        200
Cook Islands          28%              1.7        250
Costa Rica            35%              1.8        300
Côte d'Ivoire         45%              2.4        150
Croatia               48%              3.3        200
Cuba                  25%              3.5        250
Cyprus                56%              1.8        300
Czechia               24%              2.2        350
Denmark               35%              2.4        150
Djibouti              45%              1.7        200
Dominica              48%              3.2        250
Dominican Republic    25%              2.2        450
Ecuador               56%              1.7        350
Egypt                 35%              1.8        400
El Salvador           45%              2.4        120
Equatorial Guinea     48%              3.3        350
Eritrea               25%              3.5        150
Estonia               56%              1.8        200
Eswatini              45%              2.2        250
Ethiopia              48%              1.8        100
Fiji                  25%              2.4        300
Finland               56%              3.3        350
France                24%              3.5        150
Gabon                 35%              1.8        200
Gambia                45%              2.4        250
Georgia               48%              3.3        300
Germany               25%              3.5        120
Ghana                 56%              1.8        150
Greece                35%              2.2        200
Grenada               45%              1.8        250
Guatemala             48%              2.2        300
Guinea                25%              1.8        300
Guinea-Bissau         56%              2.4        350
Guyana                24%              1.8        150
Haiti                 35%              2.4        200
Holy See              45%              3.3        200
Honduras              48%              3.5        300
Hungary               25%              1.8        120
Iceland               56%              2.2        400
India                 35%              1.8        420


import matplotlib.pyplot as plt
import pandas as pd

# Change the location of the Excel file according to yours
data = pd.ExcelFile('C:\\Users\\HpUser\\Downloads\\population_literacy.xlsx')
population_literacy = data.parse('Sheet1')

# Rename columns to remove white spaces and inconsistent capitalization
population_literacy.rename(columns={'Country ': 'Country', 'Continent': 'Continent',
                                    'female literacy': 'Female Literacy', 'fertility': 'Fertility',
                                    'population': 'Population'}, inplace=True)

plt.scatter(population_literacy['Female Literacy'], population_literacy['Fertility'])
plt.show()

Output
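The renaming step in this activity lists every column by hand. As an alternative, the sketch below cleans all headers in one pass and also converts literacy values that were typed as text such as "53%". It uses a small in-memory table instead of the Excel file, and the messy header names are assumptions about what a hand-made sheet might contain.

```python
import pandas as pd

# Hypothetical raw sheet: messy headers and literacy stored as text.
raw = pd.DataFrame({
    "Country ": ["Afghanistan", "Albania", "Algeria"],
    "female literacy": ["53%", "42%", "57%"],
    "fertility": [1.8, 2.4, 3.3],
    "population": [50, 100, 150],
})

# Strip surrounding spaces and apply title case to every header at once.
clean = raw.rename(columns=lambda name: name.strip().title())

# Convert "53%" to the number 53.0 so it can be used on a numeric axis.
clean["Female Literacy"] = clean["Female Literacy"].str.rstrip("%").astype(float)

print(clean.columns.tolist())
print(clean["Female Literacy"].tolist())
```

If the cells are formatted as percentages in Excel rather than typed as text, pandas normally reads them as fractions (for example 0.53), and the conversion line is not needed.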


Activity 4: Histogram

import matplotlib.pyplot as plt
import pandas as pd

# Change the location of the Excel file according to yours
data = pd.ExcelFile('C:\\Users\\Hp User\\Downloads\\population_literacy.xlsx')
population_literacy = data.parse('Sheet1')

# Rename columns to remove white spaces and inconsistent capitalization
population_literacy.rename(columns={'Country ': 'Country', 'Continent': 'Continent',
                                    'female literacy': 'Female Literacy', 'fertility': 'Fertility',
                                    'population': 'Population'}, inplace=True)
#plt.scatter(population_literacy['Female Literacy'], population_literacy['Fertility'])

# Fill missing values in the Population column with the median value
population_literacy['Population'] = population_literacy['Population'].fillna(
    population_literacy['Population'].median())

plt.hist(population_literacy['Population'], bins=5)
plt.show()

Output
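To see what the median fill and bins=5 do numerically, the sketch below applies the same two steps to a short, made-up population column and counts the values per bin with numpy.histogram, which is the function matplotlib's hist relies on for binning. The sample values are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical population column with two missing entries.
population = pd.Series([50, 100, None, 200, 250, 300, 350, None])

# Same fill as in the activity: missing values become the column median.
median_value = population.median()
filled = population.fillna(median_value)

# bins=5 splits the range from the minimum to the maximum into 5 equal intervals.
counts, edges = np.histogram(filled, bins=5)

print("Median used for filling:", median_value)
print("Bin edges:", edges)
print("Values per bin:", counts)
```

Each entry of counts is the height of one bar in the histogram, and edges holds the six boundaries of the five bins.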


Activity 5(i): Bubble Chart

import matplotlib.pyplot as plt
import pandas as pd

# Change the location of the Excel file according to yours
data = pd.ExcelFile('C:\\Users\\Hp User\\Downloads\\population_literacy.xlsx')
population_literacy = data.parse('Sheet1')

# Rename columns to remove white spaces and inconsistent capitalization
population_literacy.rename(columns={'Country ': 'Country', 'Continent': 'Continent',
                                    'female literacy': 'Female Literacy', 'fertility': 'Fertility',
                                    'population': 'Population'}, inplace=True)

# Marker size and color both depend on the fertility value
plt.scatter(population_literacy['Female Literacy'], population_literacy['Fertility'],
            s=population_literacy['Fertility'] ** 3, marker='o',
            c=population_literacy['Fertility'])

plt.show()

Output


Activity 5(ii)

Change the marker to marker='x' and check the output

Output
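The marker change can also be tried without the Excel file. The sketch below uses the first five rows of the sample table, with literacy written as fractions (an assumption about how the percentages are stored), and adds a colorbar so the color scale is documented. The Agg backend line is only there so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to open a window
import matplotlib.pyplot as plt

# First five rows of the sample table (literacy as fractions, an assumption).
literacy = [0.53, 0.42, 0.57, 0.61, 0.41]
fertility = [1.8, 2.4, 3.3, 3.5, 1.8]

# Marker area grows with the cube of fertility, as in Activity 5(i).
sizes = [value ** 3 for value in fertility]

fig, ax = plt.subplots()
points = ax.scatter(literacy, fertility, s=sizes, marker="x", c=fertility)

# The colorbar explains which color corresponds to which fertility value.
fig.colorbar(points, ax=ax, label="Fertility")

plt.show()  # does nothing under the Agg backend
```

Because the fertility values are small, the markers stay small even after cubing. Multiplying sizes by a constant is one way to enlarge them.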


Activity 6: Emulating ggplot

import matplotlib.pyplot as plt
import pandas as pd

# Emulate ggplot
plt.style.use('ggplot')

# Change the location of the Excel file according to yours
data = pd.ExcelFile('C:\\Users\\HpUser\\Downloads\\population_literacy.xlsx')
population_literacy = data.parse('Sheet1')

plt.scatter(population_literacy['Female Literacy'], population_literacy['Fertility'],
            s=population_literacy['Fertility'] ** 4, marker='o',
            c=population_literacy['Fertility'])

# Add title
plt.title('Female Literacy vs. Fertility')

# Add x axis label
plt.xlabel('Literacy')

# Add y axis label
plt.ylabel('# of Children')

plt.show()

Output
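plt.style.use changes the style for every plot that follows. When only one figure should look like ggplot, a style context can be used instead. The sketch below is one possible way to do that, with three made-up data points in place of the Excel data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to open a window
import matplotlib.pyplot as plt

# List the style names shipped with the installed matplotlib version.
print("ggplot available:", "ggplot" in plt.style.available)

# The ggplot style applies only to figures created inside this block.
with plt.style.context("ggplot"):
    fig, ax = plt.subplots()
    ax.scatter([0.53, 0.42, 0.57], [1.8, 2.4, 3.3])  # illustrative points
    ax.set_title("Female Literacy vs. Fertility")
    ax.set_xlabel("Literacy")
    ax.set_ylabel("# of Children")

plt.show()  # does nothing under the Agg backend
```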


Activity 7: Multiple Subplots

import matplotlib.pyplot as plt
import pandas as pd

year = [1950, 1970, 1990, 2010]
pop = [2.519, 3.692, 5.263, 6.972]

# Change the location of the Excel file according to yours
data = pd.ExcelFile('C:\\Users\\Hp User\\Downloads\\population_literacy.xlsx')
population_literacy = data.parse('Sheet1')

# Create figure and array ax containing the subplots
fig, ax = plt.subplots(nrows=2, ncols=2)

# Access the first subplot: upper left
plt.subplot(2, 2, 1)
plt.plot(year, pop)

# Access the second subplot: upper right
plt.subplot(2, 2, 2)
plt.scatter(population_literacy['Female Literacy'], population_literacy['Fertility'])

# Access the third subplot: lower left
plt.subplot(2, 2, 3)
plt.hist(population_literacy['Population'], bins=5)

# Access the fourth subplot: lower right
plt.subplot(2, 2, 4)
plt.scatter(population_literacy['Female Literacy'], population_literacy['Fertility'],
            s=population_literacy['Fertility'] ** 3, marker='o',
            c=population_literacy['Fertility'])
plt.show()
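
The array returned by plt.subplots can also be indexed directly, which avoids the repeated plt.subplot calls. The sketch below shows this alternative with a few values from the sample table in place of the Excel file. Literacy is written as fractions, which is an assumption, and the figure size is an arbitrary choice.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to open a window
import matplotlib.pyplot as plt

year = [1950, 1970, 1990, 2010]
pop = [2.519, 3.692, 5.263, 6.972]

# A few rows of the sample table (literacy as fractions, an assumption).
literacy = [0.53, 0.42, 0.57, 0.61, 0.41]
fertility = [1.8, 2.4, 3.3, 3.5, 1.8]
population = [50, 100, 150, 200, 250]

# axes is a 2x2 array; axes[row, column] selects one subplot.
fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(8, 6))

axes[0, 0].plot(year, pop)                    # upper left
axes[0, 1].scatter(literacy, fertility)       # upper right
axes[1, 0].hist(population, bins=5)           # lower left
axes[1, 1].scatter(literacy, fertility,       # lower right
                   s=[value ** 3 for value in fertility], c=fertility)

fig.tight_layout()  # keep tick labels of neighbouring subplots from overlapping
plt.show()          # does nothing under the Agg backend
```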


Activity 8: Exporting Plots

import matplotlib.pyplot as plt
import pandas as pd

year = [1950, 1970, 1990, 2010]
pop = [2.519, 3.692, 5.263, 6.972]

# Change the location of the Excel file according to yours
data = pd.ExcelFile('C:\\Users\\Hp User\\Downloads\\population_literacy.xlsx')
population_literacy = data.parse('Sheet1')

# Create figure and array ax containing the subplots
fig, ax = plt.subplots(nrows=2, ncols=2)

# Access the first subplot: upper left
plt.subplot(2, 2, 1)
plt.plot(year, pop)

# Access the second subplot: upper right
plt.subplot(2, 2, 2)
plt.scatter(population_literacy['Female Literacy'], population_literacy['Fertility'])

# Access the third subplot: lower left
plt.subplot(2, 2, 3)
plt.hist(population_literacy['Population'], bins=5)

# Access the fourth subplot: lower right
plt.subplot(2, 2, 4)
plt.scatter(population_literacy['Female Literacy'], population_literacy['Fertility'],
            s=population_literacy['Fertility'] ** 3, marker='o',
            c=population_literacy['Fertility'])

# Save the figure to a file (change the location according to yours)
plt.savefig('D:\\subplot.png')
plt.show()
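
savefig accepts options that control the quality of the exported image. The sketch below saves a simple line chart to the system temporary folder, which is a hypothetical location chosen so the example runs on any machine. The dpi value is an arbitrary choice.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt

year = [1950, 1970, 1990, 2010]
pop = [2.519, 3.692, 5.263, 6.972]

fig, ax = plt.subplots()
ax.plot(year, pop)

# Hypothetical output location: the system temporary folder.
out_path = os.path.join(tempfile.gettempdir(), "subplot_example.png")

# dpi sets the resolution; bbox_inches="tight" trims surplus margins.
fig.savefig(out_path, dpi=150, bbox_inches="tight")

print("Saved to:", out_path)
print("File exists:", os.path.exists(out_path))
```

The file extension decides the format, so a name ending in .pdf or .svg produces a vector image instead of a PNG. As in the activity, saving before calling plt.show() is the safe order.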


Activity 9(i): Bar Chart

import pandas as pd
import plotly.express as px

# Change the location of the CSV file according to yours
Phone_Data = pd.read_csv('C:\\Users\\Hp User\\Downloads\\Phone_Data.csv')

# Get total duration for each network
total_duration_by_network = Phone_Data.groupby('network')['duration'].sum()

# Convert Series into DataFrame as required by Plotly Express
total_duration_by_network = total_duration_by_network.to_frame('duration').reset_index()

bar_chart = px.bar(total_duration_by_network, x="network", y="duration")
bar_chart.show()
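
The aggregation step can be checked on its own before plotting. The sketch below uses a small made-up call log (the network names and durations are hypothetical, not taken from Phone_Data.csv) and passes as_index=False so that groupby returns a DataFrame directly, without the separate to_frame and reset_index calls.

```python
import pandas as pd

# Hypothetical call log standing in for Phone_Data.csv.
phone_data = pd.DataFrame({
    "network": ["NetA", "NetB", "NetA", "NetC", "NetB"],
    "duration": [10.0, 5.0, 20.0, 7.5, 2.5],
})

# as_index=False keeps "network" as an ordinary column in the result.
totals = phone_data.groupby("network", as_index=False)["duration"].sum()

print(totals)
```

The resulting table has one row per network and can be passed to px.bar with x="network" and y="duration" in the same way as in the activity.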


Activity 9 (ii): Differentiate between the networks by applying different colors

import pandas as pd
import plotly.express as px

# Change the location of the CSV file according to yours
Phone_Data = pd.read_csv('C:\\Users\\Hp User\\Downloads\\Phone_Data.csv')

# Get total duration for each network
total_duration_by_network = Phone_Data.groupby('network')['duration'].sum()

# Convert Series into DataFrame as required by Plotly Express
total_duration_by_network = total_duration_by_network.to_frame('duration').reset_index()

# color="network" gives each network its own bar color
bar_chart = px.bar(total_duration_by_network, x="network", y="duration", color="network")
bar_chart.show()

Activity 10: Scatter Chart with trend line (Using Plotly Express)

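Scatter plots support linear and non-linear trend lines. The example below fits a linear trend line using ordinary least squares (trendline="ols").

Note: Install the statsmodels library, which Plotly Express needs for trend lines.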

import plotly.express as px

df = px.data.tips()
fig = px.scatter(df, x="total_bill", y="tip", trendline="ols")
fig.show()
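
To illustrate what an ordinary least squares trend line is, the sketch below fits a straight line to a few made-up bill and tip values with numpy.polyfit. This is a stand-in for illustration only: Plotly Express computes its trend line with statsmodels, and the numbers here are not from the tips dataset.

```python
import numpy as np

# Hypothetical bill and tip amounts (not taken from px.data.tips()).
total_bill = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
tip = np.array([1.5, 3.1, 4.4, 6.2, 7.3])

# A degree-1 polynomial fit is a least squares straight line: tip = slope * bill + intercept.
slope, intercept = np.polyfit(total_bill, tip, deg=1)

# Points on the trend line, one for each observed bill.
trend = slope * total_bill + intercept

print("slope:", round(slope, 3))
print("intercept:", round(intercept, 3))
```

The slope estimates how much the tip grows per unit of bill, and the trend values are what the line in the chart would pass through.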


Activity 11: Scatter Chart with multiple subplot (Using Plotly Express)

Scatter charts are easy to plot with Plotly Express. Multiple subplots can be created directly from the scatter function by using the facet_col and facet_row arguments.

import plotly.express as px
df = px.data.tips()
fig = px.scatter(df, x="total_bill", y="tip", color="smoker",
facet_col="sex", facet_row="time")
fig.show()


Activity 12: Animated Chart

import plotly.express as px

df = px.data.gapminder()

fig = px.bar(df, x="continent", y="pop", color="continent",
             animation_frame="year", animation_group="country",
             range_y=[0, 4000000000])
fig.show()

Please open the attached file Animated Chart_1.wmv to see the animation.


Activity 13: Map

Generate a map from a preloaded dataset in Plotly Express. Use the px.data.gapminder() function to load the gapminder dataset. Gapminder is a dataset about life expectancy in countries around the world.

import plotly.express as px
df = px.data.gapminder().query("year==2007")
fig = px.scatter_geo(df, locations="iso_alpha", color="continent",
hover_name="country", size="pop",
projection="natural earth")
fig.show()


Activity 14: Map with geopandas library

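Generate a choropleth map from datasets preloaded in Plotly Express. Use px.data.election() to load the election results and px.data.election_geojson() to load the district boundaries in GeoJSON format.

Note: Install the geopandas library.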


import plotly.express as px
import geopandas as gpd

df = px.data.election()
geo_df = gpd.GeoDataFrame.from_features(
px.data.election_geojson()["features"]
).merge(df, on="district").set_index("district")

fig = px.choropleth(geo_df,
geojson=geo_df.geometry,
locations=geo_df.index,
color="Joly",
projection="mercator")
fig.update_geos(fitbounds="locations", visible=False)
fig.show()

Output:
