01 - Lesson - Visualization - Jupyter Notebook

Learning Data Science: Visualization

Uploaded by

almamalik

Data Visualization
Key skill today

“The ability to take data - to be able to understand it, to process it, to extract value
from it, to visualize it, to communicate it - that’s going to be a hugely important skill
in the next decades.”

Hal Varian (Google’s Chief Economist) (https://en.wikipedia.org/wiki/Hal_Varian)

Data Visualization for a Data Scientist

1. Data Quality: Explore data quality including identifying outliers
2. Data Exploration: Understand the data by visualizing it
3. Data Presentation: Present results

The power of Data Visualization

Consider the following data

What is the connection?
See any patterns?

In [2]: import pandas as pd

In [8]: sample = pd.read_csv('files/sample_corr.csv')

In [9]: sample

Out[9]:
           x         y
0   1.105722  1.320945
1   1.158193  1.480131
2   1.068022  1.173479
3   1.131291  1.294706
4   1.125997  1.293024
5   1.037332  0.977393
6   1.051670  1.040798
7   0.971699  0.977604
8   1.102914  1.127956
9   1.164161  1.431070
10  1.161464  1.344481
11  1.080161  1.191159
12  0.996044  0.997308
13  1.143305  1.412850
14  1.062949  1.139761
15  1.149252  1.455886
16  1.190105  1.489407
17  1.026498  1.153031
18  1.110015  1.329586
19  1.077741  1.277995

Visualizing the same data

Let's try to visualize the data.

Matplotlib (https://matplotlib.org) is an easy-to-use visualization library for Python.

In notebooks, you get started with:

import matplotlib.pyplot as plt
%matplotlib inline

In [12]: import matplotlib.pyplot as plt
%matplotlib inline

In [13]: sample.plot.scatter(x='x', y='y')

Out[13]: <AxesSubplot:xlabel='x', ylabel='y'>
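One way to put a number on the pattern the scatter plot reveals is to fit a straight line. The sketch below is not part of the original notebook: it retypes the 20 rows printed in Out[9] and uses NumPy's `polyfit`, which is an assumption here since the lesson itself only uses pandas.

```python
import numpy as np

# The 20 (x, y) rows printed in Out[9], retyped by hand.
x = np.array([1.105722, 1.158193, 1.068022, 1.131291, 1.125997,
              1.037332, 1.051670, 0.971699, 1.102914, 1.164161,
              1.161464, 1.080161, 0.996044, 1.143305, 1.062949,
              1.149252, 1.190105, 1.026498, 1.110015, 1.077741])
y = np.array([1.320945, 1.480131, 1.173479, 1.294706, 1.293024,
              0.977393, 1.040798, 0.977604, 1.127956, 1.431070,
              1.344481, 1.191159, 0.997308, 1.412850, 1.139761,
              1.455886, 1.489407, 1.153031, 1.329586, 1.277995])

# A degree-1 polynomial fit is an ordinary least-squares straight line.
slope, intercept = np.polyfit(x, y, 1)
print(f"y ~ {slope:.2f} * x + {intercept:.2f}")
```

A positive slope confirms what the plot suggests: larger x values tend to come with larger y values.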

In [ ]:

What Data Visualization gives

Absorb information quickly
Improve insights
Make faster decisions

Data Quality

Is the data quality good enough to use?

Consider the dataset: files/sample_height.csv

Check for missing values

.isna() (https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.isna.html) followed by .any() (https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.any.html): checks for missing values and returns True for each column that contains at least one

data.isna().any()
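As a minimal sketch of what this check reports, the example below uses a small hypothetical DataFrame (not the lesson's dataset) with one missing height. It also shows `.isna().sum()`, which counts the missing values per column instead of only flagging them.

```python
import numpy as np
import pandas as pd

# Hypothetical data with a single missing height; not the lesson's file.
df = pd.DataFrame({"height": [163.3, np.nan, 171.1],
                   "age": [30, 32, 44]})

has_missing = df.isna().any()  # one boolean per column
n_missing = df.isna().sum()    # number of missing values per column

print(has_missing)
print(n_missing)
```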

Visualize data
Notice: you need to know something about the data
We know that it contains human heights in centimeters
This can be checked with a histogram

In [14]: data = pd.read_csv('files/sample_height.csv')

In [15]: data.head()

Out[15]: height

0 129.150282

1 163.277930

2 173.965641

3 168.933825

4 171.075462

In [17]: data.isna().any()

Out[17]: height False

dtype: bool

In [18]: data.plot.hist()

Out[18]: <AxesSubplot:ylabel='Frequency'>
In [19]: data[data['height'] < 50]

Out[19]: height

17 1.913196

22 1.629159

23 1.753424

27 1.854795

50 1.914587

60 1.642295

73 1.804588

82 1.573621

91 1.550227

94 1.660700

97 1.675962

98 1.712382
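All rows returned above lie between roughly 1.5 and 1.9, which suggests (but does not prove) that these heights were recorded in meters rather than centimeters. The sketch below shows one possible repair under that assumption. The cutoff of 3 and the variable names are hypothetical choices, and only a few values from the outputs above are retyped.

```python
import pandas as pd

# A few values copied from Out[15] and Out[19]; not the full dataset.
heights = pd.DataFrame({"height": [129.150282, 163.277930, 173.965641,
                                   1.913196, 1.629159, 1.753424]})

# Assumption: anything below 3 is a height in meters, not centimeters.
suspect = heights["height"] < 3

# Work on a copy so the raw data stays available for comparison.
cleaned = heights.copy()
cleaned.loc[suspect, "height"] = cleaned.loc[suspect, "height"] * 100

print(cleaned)
```

Whether to convert or drop such rows depends on how the data was collected, so it is worth confirming the unit with the data source before applying a fix like this.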

In [ ]:

Identifying outliers
Consider the dataset: files/sample_age.csv

Visualize with a histogram

This gives fast insights

Describe the data

describe() (https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.describe.html):
Computes summary statistics (count, mean, std, min, quartiles, max) for the DataFrame

data.describe()

In [20]: data = pd.read_csv('files/sample_age.csv')

In [21]: data.head()

Out[21]: age

0 30.175921

1 32.002551

2 44.518393

3 56.247751

4 33.111986

In [22]: data.describe()

Out[22]: age

count 100.000000

mean 42.305997

std 29.229478

min 18.273781

25% 31.871113

50% 39.376896

75% 47.779303

max 314.000000

In [23]: data.plot.hist()

Out[23]: <AxesSubplot:ylabel='Frequency'>
In [24]: data[data['age'] > 150]

Out[24]: age

31 314.0
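The notebook finds the outlier with a fixed threshold (age > 150). A common alternative, shown here as a sketch and not as the notebook's method, is the 1.5 x IQR rule, which derives the threshold from the data itself. The ages below are illustrative: a few come from the `head()` output above and the rest are invented for the demo.

```python
import pandas as pd

# Illustrative ages; only some are taken from the outputs above.
ages = pd.Series([30.175921, 32.002551, 44.518393, 56.247751,
                  33.111986, 18.3, 47.5, 314.0], name="age")

q1 = ages.quantile(0.25)
q3 = ages.quantile(0.75)
iqr = q3 - q1  # interquartile range

# Values further than 1.5 * IQR outside the quartiles are flagged.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr
outliers = ages[(ages < lower) | (ages > upper)]

print(outliers)
```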

Data Exploration

Data Visualization
Absorb information quickly
Improve insights
Make faster decisions

World Bank
The World Bank (https://www.worldbank.org/en/home) is a great source of datasets

CO2 per capita

Let's explore this dataset: EN.ATM.CO2E.PC (https://data.worldbank.org/indicator/EN.ATM.CO2E.PC)
Already available here: files/WorldBank-ATM.CO2E.PC_DS2.csv

Explore typical Data Visualizations

Simple plot
Set title
Set labels
Adjust axis

Read the data

In [26]: data = pd.read_csv('files/WorldBank-ATM.CO2E.PC_DS2.csv', index_col=0)
data.head()

Out[26]: ABW AFE AFG AFW AGO ALB AND ARB ARE

Year

1960 204.631696 0.906060 0.046057 0.090880 0.100835 1.258195 NaN 0.609268 0.119037

1961 208.837879 0.922474 0.053589 0.095283 0.082204 1.374186 NaN 0.662618 0.109136

1962 226.081890 0.930816 0.073721 0.096612 0.210533 1.439956 NaN 0.727117 0.163542

1963 214.785217 0.940570 0.074161 0.112376 0.202739 1.181681 NaN 0.853116 0.175833

1964 207.626699 0.996033 0.086174 0.133258 0.213562 1.111742 NaN 0.972381 0.132815

5 rows × 266 columns

In [ ]:

Simple plot

.plot() Creates a simple plot of data

This gives you an idea of the data

In [27]: data['USA'].plot()

Out[27]: <AxesSubplot:xlabel='Year'>

In [ ]:

Adding title and labels

Arguments

title='Title' adds the title

xlabel='X label' adds or changes the X-label
ylabel='Y label' adds or changes the Y-label

In [20]: data['USA'].plot(title='CO2 per capita in USA', ylabel='CO2 per capita')

Out[20]: <AxesSubplot:title={'center':'CO2 per capita in USA'}, xlabel='Year', ylabel='CO2 per capita'>

In [ ]:

Adding axis range

xlim=(min, max) or xlim=min Sets the x-axis range (a single value sets only the lower limit)

ylim=(min, max) or ylim=min Sets the y-axis range (a single value sets only the lower limit)

In [21]: data['USA'].plot(title='CO2 per capita in USA', ylabel='CO2 per capita', ylim=0)

Out[21]: <AxesSubplot:title={'center':'CO2 per capita in USA'}, xlabel='Year', ylabel='CO2 per capita'>
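The title, label, and axis-range arguments can be combined in one call. The sketch below uses a tiny Series of placeholder values (not World Bank data) so it runs without the lesson's files. The `matplotlib.use("Agg")` line is only there to make it work outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed inside a notebook
import pandas as pd

# Placeholder values for illustration only.
series = pd.Series([20.1, 19.4, 17.2, 15.5],
                   index=pd.Index([2000, 2005, 2010, 2015], name="Year"))

# .plot() returns the matplotlib Axes, which can be inspected or adjusted.
ax = series.plot(title="CO2 per capita (placeholder values)",
                 xlabel="Year",
                 ylabel="CO2 per capita",
                 ylim=(0, 25))

print(ax.get_title(), ax.get_ylim())
```

Starting the y-axis at 0 keeps small changes from looking larger than they are, which is why the notebook sets `ylim=0`.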

In [ ]:

Comparing data
Explore USA and WLD (the World Bank code for the world aggregate)
In [25]: data[['USA', 'WLD']].plot(ylim=0)

Out[25]: <AxesSubplot:xlabel='Year'>

In [ ]:

Set the figure size

figsize=(width, height) in inches

In [27]: data[['USA', 'DNK', 'WLD']].plot(ylim=0, figsize=(20,6))

Out[27]: <AxesSubplot:xlabel='Year'>

In [ ]:

Bar plot
.plot.bar() Create a bar plot

In [28]: data['USA'].plot.bar(figsize=(20,6))

Out[28]: <AxesSubplot:xlabel='Year'>
In [29]: data[['USA', 'WLD']].plot.bar(figsize=(20,6))

Out[29]: <AxesSubplot:xlabel='Year'>

Plot a range of data

.loc[from:to] Apply this to the DataFrame to select a range of index labels (both ends inclusive)

In [30]: data[['USA', 'WLD']].loc[2000:].plot.bar(figsize=(20,6))

Out[30]: <AxesSubplot:xlabel='Year'>
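Label slicing with `.loc` includes both ends, unlike positional slicing with `.iloc`. The sketch below illustrates the difference on a small hypothetical DataFrame indexed by year.

```python
import pandas as pd

# Hypothetical data indexed by year.
df = pd.DataFrame({"value": [1.0, 2.0, 3.0, 4.0, 5.0]},
                  index=pd.Index([1998, 1999, 2000, 2001, 2002], name="Year"))

both_ends = df.loc[1999:2001]  # label slice: 1999, 2000 and 2001 included
open_ended = df.loc[2000:]     # from 2000 to the last row
by_position = df.iloc[1:3]     # positional slice: end position excluded

print(both_ends.index.tolist())
print(open_ended.index.tolist())
print(by_position.index.tolist())
```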

In [ ]:

Histogram
.plot.hist() Create a histogram
bins=<number of bins> Specify the number of bins in the histogram.
In [34]: data['USA'].plot.hist(figsize=(20,6), bins=7)

Out[34]: <AxesSubplot:ylabel='Frequency'>
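To see what `bins=7` does to the data, the counts can be computed without drawing anything. This sketch uses NumPy's `histogram` on placeholder values (not the lesson's data). With an integer `bins`, the range from the minimum to the maximum is split into that many equal-width intervals.

```python
import numpy as np

# Placeholder values for illustration only.
values = np.array([14.8, 15.5, 16.0, 17.2, 18.9, 19.4, 20.1, 21.0, 22.3])

# counts[i] is the number of values between edges[i] and edges[i + 1].
counts, edges = np.histogram(values, bins=7)

print(counts)
print(edges)
```

Seven bins always produce eight edges. Changing `bins` changes how coarse or fine the picture is, so it is worth trying a few values.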

In [ ]:

Pie chart
.plot.pie() Creates a Pie Chart

In [35]: df = pd.Series(data=[3, 5, 7], index=['Data1', 'Data2', 'Data3'])

df.plot.pie()

Out[35]: <AxesSubplot:ylabel='None'>

In [ ]:

Value counts and pie charts

A simple chart of values above/below a threshold
.value_counts() Counts occurrences of values in a Series (or DataFrame column)
A few arguments to .plot.pie()
colors=<list of colors>
labels=<list of labels>
title='<title>'
ylabel='<label>'
autopct='%1.1f%%' sets percentages on chart

In [43]: (data['USA'] < 17.5).value_counts().plot.pie(colors=['r', 'g'], labels=['>=17.5',

Out[43]: <AxesSubplot:title={'center':'CO2 per capita'}, ylabel='USA'>
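One caveat with passing a `labels` list: `.value_counts()` sorts by frequency, so which of True or False comes first depends on the data. A sketch of a more robust variant, using placeholder values and hypothetical label strings, maps the booleans to labels before counting.

```python
import pandas as pd

# Placeholder values for illustration only.
co2 = pd.Series([19.2, 18.1, 16.4, 15.0, 20.3])

below = co2 < 17.5
# Attach the label to each value first, so the order of the counts no longer matters.
counts = below.map({True: "<17.5", False: ">=17.5"}).value_counts()

print(counts)
```

Calling `counts.plot.pie(autopct='%1.1f%%')` then takes the wedge labels from the index, so no separate `labels` list is needed.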

In [ ]:

Scatter plot
Assume we want to investigate if GDP per capita and CO2 per capita are correlated
Data available in 'files/co2_gdp_per_capita.csv'
.plot.scatter(x=<label>, y=<label>) Create a scatter plot
.corr() Compute pairwise correlation of columns (docs: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.corr.html)

In [44]: data = pd.read_csv('files/co2_gdp_per_capita.csv', index_col=0)

data.head()

Out[44]: CO2 per capita GDP per capita

AFE 0.933541 1507.861055

AFG 0.200151 568.827927

AFW 0.515544 1834.366604

AGO 0.887380 3595.106667

ALB 1.939732 4433.741739

In [46]: data.plot.scatter(x='CO2 per capita', y='GDP per capita')

Out[46]: <AxesSubplot:xlabel='CO2 per capita', ylabel='GDP per capita'>

In [47]: data.corr()

Out[47]: CO2 per capita GDP per capita

CO2 per capita 1.000000 0.633178

GDP per capita 0.633178 1.000000
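By default `.corr()` computes the Pearson correlation coefficient. The sketch below computes it by hand on the five rows shown in `data.head()` and compares the result with pandas. Because it uses only five rows, the value differs from the 0.633178 obtained on the full dataset.

```python
import numpy as np
import pandas as pd

# The five rows printed in Out[44]; not the full dataset.
df = pd.DataFrame(
    {"CO2 per capita": [0.933541, 0.200151, 0.515544, 0.887380, 1.939732],
     "GDP per capita": [1507.861055, 568.827927, 1834.366604,
                        3595.106667, 4433.741739]},
    index=["AFE", "AFG", "AFW", "AGO", "ALB"])

x = df["CO2 per capita"]
y = df["GDP per capita"]

# Pearson r: covariance of x and y divided by the product of their spreads.
dx = x - x.mean()
dy = y - y.mean()
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

r_pandas = df.corr().loc["CO2 per capita", "GDP per capita"]
print(r_manual, r_pandas)
```

A value near 1 means the two columns rise together and a value near 0 means no linear relationship. Correlation alone does not show that one causes the other.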

In [ ]:

Data Presentation
This is about making data easy to digest

The message
Assume we want to give a picture of how US CO2 per capita compares to the rest of the world

Preparation

Let's take 2017 (as more recent data is incomplete)

What are the mean, max, and min CO2 per capita in the world?

In [54]: data = pd.read_csv('files/WorldBank-ATM.CO2E.PC_DS2.csv', index_col=0)

In [55]: data.head()

Out[55]: ABW AFE AFG AFW AGO ALB AND ARB ARE

Year

1960 204.631696 0.906060 0.046057 0.090880 0.100835 1.258195 NaN 0.609268 0.119037 2.38

1961 208.837879 0.922474 0.053589 0.095283 0.082204 1.374186 NaN 0.662618 0.109136 2.45

1962 226.081890 0.930816 0.073721 0.096612 0.210533 1.439956 NaN 0.727117 0.163542 2.53

1963 214.785217 0.940570 0.074161 0.112376 0.202739 1.181681 NaN 0.853116 0.175833 2.33

1964 207.626699 0.996033 0.086174 0.133258 0.213562 1.111742 NaN 0.972381 0.132815 2.55

5 rows × 266 columns

In [ ]:

In [56]: year = 2017
data.loc[year].describe()

Out[56]: count 239.000000

mean 4.154185
std 4.575980
min 0.028010
25% 0.851900
50% 2.667119
75% 6.158644
max 32.179371
Name: 2017, dtype: float64

In [ ]:

And in the US?

In [57]: data.loc[year]['USA']

Out[57]: 14.8058824221278

In [ ]:

How can we tell a story?

US is above the mean

US is not the max
It is above the 75th percentile (the upper quartile)
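These three statements can be checked directly against the numbers printed above. The sketch below retypes the relevant values from Out[56] and Out[57]. The dictionary and variable names are hypothetical.

```python
# Values copied from Out[56] (describe) and Out[57] (USA in 2017).
stats = {"mean": 4.154185, "75%": 6.158644, "max": 32.179371}
usa = 14.8058824221278

above_mean = usa > stats["mean"]
above_upper_quartile = usa > stats["75%"]
is_max = usa >= stats["max"]

# How many times the world mean is the US value?
ratio_to_mean = usa / stats["mean"]

print(above_mean, above_upper_quartile, is_max, round(ratio_to_mean, 2))
```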
Some more advanced matplotlib

In [58]: ax = data.loc[year].plot.hist(bins=15, facecolor='green')

# Label the axes, then point an arrow at the bar that contains the US value (about 14.8)
ax.set_xlabel('CO2 per capita')
ax.set_ylabel('Number of countries')
ax.annotate("USA", xy=(15, 5), xytext=(15, 30),
            arrowprops=dict(arrowstyle="->",
                            connectionstyle="arc3"))

Out[58]: Text(15, 30, 'USA')
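In the cell above, the arrow tip `xy=(15, 5)` and the text position `xytext=(15, 30)` are chosen by eye. The sketch below instead derives the arrow tip from the histogram counts, so the arrow lands on top of the right bar. It uses randomly generated placeholder data, not the World Bank file, and the variable names are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed inside a notebook
import numpy as np
import pandas as pd

# Placeholder data: 200 random skewed values, for illustration only.
rng = np.random.default_rng(0)
values = pd.Series(rng.exponential(scale=4.0, size=200))

marker_x = 15.0  # x position to point at (the US value is about 14.8)
n_bins = 15
ax = values.plot.hist(bins=n_bins, facecolor="green")

# Recompute the bin counts to find the height of the bar containing marker_x.
counts, edges = np.histogram(values, bins=n_bins)
bar = int(np.clip(np.searchsorted(edges, marker_x, side="right") - 1,
                  0, n_bins - 1))  # clamp in case marker_x is outside the data
bar_top = counts[bar]

ax.set_xlabel("CO2 per capita")
ax.set_ylabel("Number of countries")
note = ax.annotate("USA",
                   xy=(marker_x, bar_top),                           # arrow tip
                   xytext=(marker_x, bar_top + 0.3 * counts.max()),  # label
                   arrowprops=dict(arrowstyle="->", connectionstyle="arc3"))
```

`xy` is where the arrow points and `xytext` is where the label is drawn. Both are given in data coordinates by default.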

Creative storytelling with data visualization

Check out this video: https://www.youtube.com/watch?v=jbkSRLYSojo

In [60]: from IPython.display import YouTubeVideo

YouTubeVideo('jbkSRLYSojo')

Out[60]:

In [ ]:
