2-2 Intermediate Python - Chapter 2 Dictionaries Pandas
INTERMEDIATE PYTHON
Hugo Bowne-Anderson
Data Scientist at DataCamp
List
countries = ["afghanistan", "albania", "algeria"]
pop = [30.55, 2.77, 39.21]
ind_alb = countries.index("albania")
ind_alb
1
pop[ind_alb]
2.77
• Not convenient
• Not intuitive
Dictionary
pop = [30.55, 2.77, 39.21]
countries = ["afghanistan", "albania", "algeria"]
...
world = {"afghanistan":30.55, "albania":2.77, "algeria":39.21}
world["albania"]
2.77
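The two lookups can be compared side by side. This is a minimal sketch: building the dictionary with dict(zip(...)) is a shortcut assumed here for brevity, whereas the slides write the dictionary out by hand, and the names pop_via_lists and pop_via_dict are illustrative only.

```python
# The two parallel lists from the slide (populations in millions).
countries = ["afghanistan", "albania", "algeria"]
pop = [30.55, 2.77, 39.21]

# List approach: find the position first, then reuse it on the other list.
ind_alb = countries.index("albania")
pop_via_lists = pop[ind_alb]

# Dictionary approach: pair each country (key) with its population (value).
# dict(zip(...)) is an assumed shortcut; a literal {...} works the same way.
world = dict(zip(countries, pop))
pop_via_dict = world["albania"]

print(pop_via_lists, pop_via_dict)
```

Both approaches return the same value, but the dictionary needs one step and no bookkeeping of positions.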
Questions?
Dictionaries, Part 2
Recap
world = {"afghanistan":30.55, "albania":2.77, "algeria":39.21}
world["albania"]
2.77
world = {"afghanistan":30.55, "albania":2.77, "algeria":39.21, "albania":2.81}
world
{'afghanistan': 30.55, 'albania': 2.81, 'algeria': 39.21}
Recap
• Keys have to be "immutable" objects
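A short sketch of what "immutable" means in practice. The example keys and the variable names valid_keys and error_message are made up for illustration.

```python
# Strings, numbers and tuples cannot be changed after creation,
# so Python accepts them as dictionary keys.
valid_keys = {"two": "world", 0: "hello", (1, 2): "a tuple key"}

# A list can be changed after creation, so it is rejected as a key.
try:
    invalid = {["just", "to", "test"]: "value"}
except TypeError as err:
    error_message = str(err)

print(len(valid_keys))
print(error_message)
```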
Dictionary
world["sealand"] = 0.000027
world
{'afghanistan': 30.55, 'albania': 2.81, 'algeria': 39.21, 'sealand': 2.7e-05}
"sealand" in world
True
Dictionary
world["sealand"] = 0.000028
world
{'afghanistan': 30.55, 'albania': 2.81, 'algeria': 39.21, 'sealand': 2.8e-05}
del(world["sealand"])
world
{'afghanistan': 30.55, 'albania': 2.81, 'algeria': 39.21}
List vs. Dictionary
List: select, update, and remove with []
Dictionary: select, update, and remove with []
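The shared [] syntax can be sketched as follows. The only difference is what goes inside the brackets: an integer position for a list, a key for a dictionary. The name pop_list is hypothetical, and the values reuse the population figures from the earlier slides.

```python
# List: elements are addressed by integer position.
pop_list = [30.55, 2.77, 39.21]
second = pop_list[1]        # select
pop_list[1] = 2.81          # update
del pop_list[0]             # remove; later elements shift one position down

# Dictionary: elements are addressed by key.
world = {"afghanistan": 30.55, "albania": 2.77, "algeria": 39.21}
albania = world["albania"]  # select
world["albania"] = 2.81     # update
del world["afghanistan"]    # remove; the other keys are unaffected

print(pop_list)
print(world)
```

After the removal, the list positions have changed while the dictionary keys still point at the same values.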
Tabular dataset examples
Datasets in Python
• 2D NumPy array?
o One data type
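The "one data type" limitation can be seen in a small sketch. The two rows below reuse the country names and areas from the brics table that appears later; the variable names are illustrative.

```python
import numpy as np

# Mixing text and numbers in one array: NumPy picks a single type that can
# hold everything, so the numbers are stored as strings.
mixed = np.array([["Brazil", 8.516], ["Russia", 17.10]])
print(mixed.dtype)

# A purely numeric array keeps a numeric type.
numeric = np.array([[8.516, 200.4], [17.10, 143.5]])
print(numeric.dtype)
```

Because the mixed array holds strings only, arithmetic on its numeric column no longer works directly, which is why a different tool is needed for tables with columns of different types.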
• pandas!
o High level data manipulation tool
o Wes McKinney
o Built on NumPy
o DataFrame
DataFrame
brics
import pandas as pd
brics = pd.DataFrame(dict)
DataFrame from Dictionary (2)
brics
area capital country population
BR 8.516 Brasilia Brazil 200.40
RU 17.100 Moscow Russia 143.50
IN 3.286 New Delhi India 1252.00
CH 9.597 Beijing China 1357.00
SA 1.221 Pretoria South Africa 52.98
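One way the table above could be built from a dictionary is sketched below. The dictionary contents are taken from the printed table; the variable name data is an assumption (the slide passes a variable called dict, which is avoided here because it shadows Python's built-in type), and setting brics.index is assumed to be the step that produces the BR, RU, ... row labels.

```python
import pandas as pd

# Keys become column labels, values become the column contents.
data = {
    "country": ["Brazil", "Russia", "India", "China", "South Africa"],
    "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"],
    "area": [8.516, 17.10, 3.286, 9.597, 1.221],
    "population": [200.4, 143.5, 1252.0, 1357.0, 52.98],
}
brics = pd.DataFrame(data)

# Replace the default 0..4 row labels with country codes.
brics.index = ["BR", "RU", "IN", "CH", "SA"]
print(brics)
```

Recent pandas versions keep the columns in the dictionary's key order, so the printed column order may differ from the alphabetical order shown in the output above.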
DataFrame from CSV file
brics.csv
,country,capital,area,population
BR,Brazil,Brasilia,8.516,200.4
RU,Russia,Moscow,17.10,143.5
IN,India,New Delhi,3.286,1252
CH,China,Beijing,9.597,1357
SA,South Africa,Pretoria,1.221,52.98
brics = pd.read_csv("path/to/brics.csv")
brics
Unnamed: 0 country capital area population
0 BR Brazil Brasilia 8.516 200.40
1 RU Russia Moscow 17.100 143.50
2 IN India New Delhi 3.286 1252.00
3 CH China Beijing 9.597 1357.00
4 SA South Africa Pretoria 1.221 52.98
DataFrame from CSV file
brics = pd.read_csv("path/to/brics.csv", index_col = 0)
brics
country capital area population
BR Brazil Brasilia 8.516 200.40
RU Russia Moscow 17.100 143.50
IN India New Delhi 3.286 1252.00
CH China Beijing 9.597 1357.00
SA South Africa Pretoria 1.221 52.98
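The effect of index_col can be tried without a file on disk. In this sketch the CSV text from the brics.csv slide is placed in a string and read through io.StringIO, which is an assumption made to keep the example self-contained; with a real file, the path is passed instead.

```python
import io

import pandas as pd

# Same content as brics.csv; the first header field is empty.
csv_text = """,country,capital,area,population
BR,Brazil,Brasilia,8.516,200.4
RU,Russia,Moscow,17.10,143.5
IN,India,New Delhi,3.286,1252
CH,China,Beijing,9.597,1357
SA,South Africa,Pretoria,1.221,52.98
"""

# Without index_col, the codes become an ordinary column named "Unnamed: 0".
default_index = pd.read_csv(io.StringIO(csv_text))

# With index_col=0, the first column is used as the row labels.
labelled = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(default_index.columns.tolist())
print(labelled.index.tolist())
```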
Questions?
Pandas, Part 2
brics
import pandas as pd
brics = pd.read_csv("path/to/brics.csv", index_col = 0)
brics
country capital area population
BR Brazil Brasilia 8.516 200.40
RU Russia Moscow 17.100 143.50
IN India New Delhi 3.286 1252.00
CH China Beijing 9.597 1357.00
SA South Africa Pretoria 1.221 52.98
Index and select data
• Square brackets
• Advanced methods
o loc
o iloc
Column Access [ ]
brics["country"]
BR Brazil
RU Russia
IN India
CH China
SA South Africa
Name: country, dtype: object
Column Access [ ]
type(brics["country"])
pandas.core.series.Series
• 1D labelled array
Column Access [ ]
brics[["country"]]
country
BR Brazil
RU Russia
IN India
CH China
SA South Africa
Column Access [ ]
type(brics[["country"]])
pandas.core.frame.DataFrame
Column Access [ ]
brics[["country", "capital"]]
country capital
BR Brazil Brasilia
RU Russia Moscow
IN India New Delhi
CH China Beijing
SA South Africa Pretoria
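The difference between single and double square brackets can be checked in one place. This sketch rebuilds brics from a dictionary instead of reading the CSV file, purely so that it runs on its own; the variable names as_series, as_frame and two_columns are illustrative.

```python
import pandas as pd

brics = pd.DataFrame(
    {
        "country": ["Brazil", "Russia", "India", "China", "South Africa"],
        "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"],
        "area": [8.516, 17.10, 3.286, 9.597, 1.221],
        "population": [200.4, 143.5, 1252.0, 1357.0, 52.98],
    },
    index=["BR", "RU", "IN", "CH", "SA"],
)

# Single brackets with one label: a Series (1D labelled array).
as_series = brics["country"]

# Double brackets (a list of labels): a DataFrame, even for one column.
as_frame = brics[["country"]]
two_columns = brics[["country", "capital"]]

print(type(as_series).__name__, type(as_frame).__name__)
print(two_columns.shape)
```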
Row Access [ ]
brics[1:4]
country capital area population
RU Russia Moscow 17.100 143.5
IN India New Delhi 3.286 1252.0
CH China Beijing 9.597 1357.0
• pandas
o loc (label-based)
o iloc (integer position-based)
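A small sketch of why square brackets are limited for rows: a slice selects rows by position, but a single label is looked up among the columns. brics is rebuilt from a dictionary here so the example runs on its own, and the names middle_rows and row_label_worked are hypothetical.

```python
import pandas as pd

brics = pd.DataFrame(
    {
        "country": ["Brazil", "Russia", "India", "China", "South Africa"],
        "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"],
        "area": [8.516, 17.10, 3.286, 9.597, 1.221],
        "population": [200.4, 143.5, 1252.0, 1357.0, 52.98],
    },
    index=["BR", "RU", "IN", "CH", "SA"],
)

# A slice inside [] selects rows by integer position; the end (4) is excluded.
middle_rows = brics[1:4]

# A single label inside [] is searched among the column labels,
# so the row label "RU" is not found there.
try:
    brics["RU"]
    row_label_worked = True
except KeyError:
    row_label_worked = False

print(middle_rows.index.tolist())
print(row_label_worked)
```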
Row Access loc
brics.loc["RU"]
country Russia
capital Moscow
area 17.1
population 143.5
Name: RU, dtype: object
brics.loc[["RU"]]
country capital area population
RU Russia Moscow 17.1 143.5
• DataFrame
Row Access loc
brics.loc[["RU", "IN", "CH"]]
country capital area population
RU Russia Moscow 17.100 143.5
IN India New Delhi 3.286 1252.0
CH China Beijing 9.597 1357.0
Row & Column loc
brics.loc[["RU", "IN", "CH"], ["country", "capital"]]
country capital
RU Russia Moscow
IN India New Delhi
CH China Beijing
Row & Column loc
brics.loc[:, ["country", "capital"]]
country capital
BR Brazil Brasilia
RU Russia Moscow
IN India New Delhi
CH China Beijing
SA South Africa Pretoria
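The label-based selections can be collected in one runnable sketch. brics is rebuilt from a dictionary so the example does not depend on the CSV file, and the result names (row_series, row_frame, subset, all_rows) are illustrative.

```python
import pandas as pd

brics = pd.DataFrame(
    {
        "country": ["Brazil", "Russia", "India", "China", "South Africa"],
        "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"],
        "area": [8.516, 17.10, 3.286, 9.597, 1.221],
        "population": [200.4, 143.5, 1252.0, 1357.0, 52.98],
    },
    index=["BR", "RU", "IN", "CH", "SA"],
)

# One row label: the row comes back as a Series.
row_series = brics.loc["RU"]

# A list of row labels: the result stays a DataFrame.
row_frame = brics.loc[["RU"]]

# Row labels and column labels together, separated by a comma.
subset = brics.loc[["RU", "IN", "CH"], ["country", "capital"]]

# A bare colon keeps all rows.
all_rows = brics.loc[:, ["country", "capital"]]

print(subset)
```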
Recap
• Square brackets
o Column access brics[["country", "capital"]]
o Row access: only through slicing brics[1:4]
• loc (label-based)
o Row access brics.loc[["RU", "IN", "CH"]]
o Column access brics.loc[:, ["country", "capital"]]
o Row & Column access brics.loc[["RU", "IN", "CH"], ["country", "capital"]]
Row Access iloc
brics.loc[["RU"]]
brics.iloc[[1]]
brics.iloc[[1,2,3]]
Row & Column iloc
brics.iloc[[1,2,3], [0, 1]]
country capital
RU Russia Moscow
IN India New Delhi
CH China Beijing
Row & Column iloc
brics.iloc[:, [0,1]]
country capital
BR Brazil Brasilia
RU Russia Moscow
IN India New Delhi
CH China Beijing
SA South Africa Pretoria
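As a quick check, each iloc call can be compared with its loc counterpart. The sketch below rebuilds brics from a dictionary so that it runs on its own; DataFrame.equals is used for the comparison, and the variable names are illustrative.

```python
import pandas as pd

brics = pd.DataFrame(
    {
        "country": ["Brazil", "Russia", "India", "China", "South Africa"],
        "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"],
        "area": [8.516, 17.10, 3.286, 9.597, 1.221],
        "population": [200.4, 143.5, 1252.0, 1357.0, 52.98],
    },
    index=["BR", "RU", "IN", "CH", "SA"],
)

# Row 1 by position is the row labelled "RU".
one_row_same = brics.iloc[[1]].equals(brics.loc[["RU"]])

# Rows 1-3 and columns 0-1 by position versus the same selection by label.
by_position = brics.iloc[[1, 2, 3], [0, 1]]
by_label = brics.loc[["RU", "IN", "CH"], ["country", "capital"]]
subset_same = by_position.equals(by_label)

# All rows, first two columns.
columns_same = brics.iloc[:, [0, 1]].equals(brics.loc[:, ["country", "capital"]])

print(one_row_same, subset_same, columns_same)
```

The two accessors return the same data; they differ only in whether rows and columns are named by label or by integer position.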
Let's practice!