The document provides an overview of using Pandas Series to analyze the population of the G7 countries. It explains how to create Series from lists and dictionaries, access elements, and perform operations such as conditional selection and aggregation. The document also highlights the similarities between Series and other data structures like numpy arrays and Python dictionaries.
2 - 2 Pandas Series
sai11/28, 4:85 PM 2.2 Pandas Series to Students
import pandas as pd
import numpy as np
Pandas Series
We'll start by analyzing "The Group of Seven", a political forum formed by Canada, France,
Germany, Italy, Japan, the United Kingdom and the United States. We'll begin with
population, and for that, we'll use a pandas.Series object.
# In millions
g7_pop = pd.Series([35.467, 63.951, 80.940, 60.665, 127.061, 64.511, 318.523])
g7_pop
0     35.467
1     63.951
2     80.940
3     60.665
4    127.061
5     64.511
6    318.523
dtype: float64
Someone reading this might not know that we're representing population in millions of
inhabitants. A Series can have a name, which documents its purpose:
g7_pop.name = 'G7 Population in millions'
g7_pop
0     35.467
1     63.951
2     80.940
3     60.665
4    127.061
5     64.511
6    318.523
Name: G7 Population in millions, dtype: float64
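Setting `name` after creating the Series is one option. The minimal sketch below, reusing the same population values, shows that passing `name=` to the `pd.Series` constructor gives the same result; the variable names `named_later` and `named_upfront` are hypothetical, chosen only for the comparison.

```python
import pandas as pd

# G7 populations in millions, same values as above
values = [35.467, 63.951, 80.940, 60.665, 127.061, 64.511, 318.523]

# Option 1: set the name after creating the Series
named_later = pd.Series(values)
named_later.name = 'G7 Population in millions'

# Option 2: pass the name to the constructor
named_upfront = pd.Series(values, name='G7 Population in millions')

print(named_later.name == named_upfront.name)  # True
print(named_later.equals(named_upfront))       # True (same values and index)
```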
Series are pretty similar to numpy arrays:
g7_pop.dtype
dtype('float64')
g7_pop.values
array([ 35.467,  63.951,  80.94 ,  60.665, 127.061,  64.511, 318.523])
They're actually backed by numpy arrays:
type(g7_pop.values)
numpy.ndarray
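Besides `.values`, a Series also offers `.to_numpy()`, which the pandas documentation recommends in newer code for getting at the underlying array. A small sketch, assuming the unlabeled Series from above (the name `arr` is just for illustration):

```python
import numpy as np
import pandas as pd

g7_pop = pd.Series([35.467, 63.951, 80.940, 60.665, 127.061, 64.511, 318.523],
                   name='G7 Population in millions')

# .values (used above) and .to_numpy() both expose the data as an ndarray
arr = g7_pop.to_numpy()

print(type(arr))   # <class 'numpy.ndarray'>
print(arr.dtype)   # float64
print(arr.shape)   # (7,)
```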
They look like simple Python lists or numpy arrays, but they're actually more similar
to Python dicts.
A Series has an index, similar to the automatic index assigned to Python's lists:
g7_pop
0     35.467
1     63.951
2     80.940
3     60.665
4    127.061
5     64.511
6    318.523
Name: G7 Population in millions, dtype: float64
g7_pop[0]
35.467
g7_pop[1]
63.951
g7_pop.index
RangeIndex(start=0, stop=7, step=1)
l = ['a', 'b', 'c']
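To make the comparison with the list `l` concrete, here is a small sketch; the Series `s` is hypothetical, built from `l` only to show that a list and a Series with the default index answer to the same positions:

```python
import pandas as pd

l = ['a', 'b', 'c']
s = pd.Series(l)  # hypothetical Series built from the list, default RangeIndex

# Position 1 refers to the same element in both
print(l[1], s[1])                            # b b
print(list(s.index))                         # [0, 1, 2]
print(list(s.index) == list(range(len(l))))  # True
```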
But, in contrast to lists, we can explicitly define the index:
g7_pop.index = [
    'Canada',
    'France',
    'Germany',
    'Italy',
    'Japan',
    'United Kingdom',
    'United States',
]
g7_pop
Canada             35.467
France             63.951
Germany            80.940
Italy              60.665
Japan             127.061
United Kingdom     64.511
United States     318.523
Name: G7 Population in millions, dtype: float64
Compare it with the following table:
(Expressed in millions)
Canada             35.467
France             63.951
Germany            80.94
Italy              60.665
Japan             127.061
United Kingdom     64.511
United States     318.523
We can say that a Series looks like an "ordered dictionary". In fact, we can create a Series
out of a dictionary:
pd.Series({
    'Canada': 35.467,
    'France': 63.951,
    'Germany': 80.94,
    'Italy': 60.665,
    'Japan': 127.061,
    'United Kingdom': 64.511,
    'United States': 318.523
}, name='G7 Population in millions')
Canada             35.467
France             63.951
Germany            80.940
Italy              60.665
Japan             127.061
United Kingdom     64.511
United States     318.523
Name: G7 Population in millions, dtype: float64
pd.Series(
    [35.467, 63.951, 80.94, 60.665, 127.061, 64.511, 318.523],
    index=['Canada', 'France', 'Germany', 'Italy', 'Japan', 'United Kingdom',
           'United States'],
    name='G7 Population in millions')
Canada             35.467
France             63.951
Germany            80.940
Italy              60.665
Japan             127.061
United Kingdom     64.511
United States     318.523
Name: G7 Population in millions, dtype: float64
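The "ordered dictionary" comparison goes further than construction. The sketch below exercises a few dict-like behaviours of a Series using only standard Series methods (`in`, `get`, `keys`, `to_dict`); the values are the ones from the table above.

```python
import pandas as pd

g7_pop = pd.Series({
    'Canada': 35.467,
    'France': 63.951,
    'Germany': 80.94,
    'Italy': 60.665,
    'Japan': 127.061,
    'United Kingdom': 64.511,
    'United States': 318.523,
}, name='G7 Population in millions')

# `in` checks the index labels (like dict keys), not the values
print('Canada' in g7_pop)          # True
print(35.467 in g7_pop)            # False

# .get() falls back to a default for a missing label, like dict.get()
print(g7_pop.get('Spain', 0.0))    # 0.0

# Keys keep the insertion order of the dictionary
print(list(g7_pop.keys())[:2])     # ['Canada', 'France']
print(g7_pop.to_dict()['Japan'])   # 127.061
```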
You can also create a Series out of another Series, specifying the index labels to keep.
Labels missing from the original Series (here, Spain) get NaN:
pd.Series(g7_pop, index=['France', 'Germany', 'Italy', 'Spain'])
France     63.951
Germany    80.940
Italy      60.665
Spain         NaN
Name: G7 Population in millions, dtype: float64
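Passing an existing Series plus a new `index` to the constructor selects (and, where needed, fills with NaN) the requested labels. The same selection can be written with `Series.reindex`; a sketch, where `labels`, `subset` and `via_reindex` are illustrative names:

```python
import pandas as pd

g7_pop = pd.Series(
    [35.467, 63.951, 80.94, 60.665, 127.061, 64.511, 318.523],
    index=['Canada', 'France', 'Germany', 'Italy', 'Japan',
           'United Kingdom', 'United States'],
    name='G7 Population in millions')

labels = ['France', 'Germany', 'Italy', 'Spain']

# Constructor form used above
subset = pd.Series(g7_pop, index=labels)

# The same selection with Series.reindex()
via_reindex = g7_pop.reindex(labels)

print(subset.equals(via_reindex))  # True (NaN in the same place counts as equal)
print(subset.isna().tolist())      # [False, False, False, True]
```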
g7_pop
Canada             35.467
France             63.951
Germany            80.940
Italy              60.665
Japan             127.061
United Kingdom     64.511
United States     318.523
Name: G7 Population in millions, dtype: float64
g7_pop['Canada']
35.467
g7_pop['Japan']
127.061
Numeric positions can also be used, with the iloc attribute:
g7_pop.iloc[0]
35.467
g7_pop.iloc[-1]
318.523
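Being explicit helps avoid ambiguity: `.loc` selects by label and `.iloc` by position. As far as we know, recent pandas releases (2.1 and later) also warn when plain `[]` with an integer is used as a position on a non-integer index, which is another reason to prefer the explicit accessors. A sketch using the labelled Series:

```python
import pandas as pd

g7_pop = pd.Series(
    [35.467, 63.951, 80.94, 60.665, 127.061, 64.511, 318.523],
    index=['Canada', 'France', 'Germany', 'Italy', 'Japan',
           'United Kingdom', 'United States'],
    name='G7 Population in millions')

# .loc selects by label, .iloc by position
print(g7_pop.loc['Canada'])   # 35.467
print(g7_pop.iloc[0])         # 35.467

# Lists work with both accessors and return a Series
print(g7_pop.loc[['Italy', 'France']].tolist())  # [60.665, 63.951]
print(g7_pop.iloc[[0, 1]].index.tolist())        # ['Canada', 'France']
```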
Selecting multiple elements at once:
g7_pop[['Italy', 'France']]
Italy     60.665
France    63.951
Name: G7 Population in millions, dtype: float64
(The result is another Series)
g7_pop.iloc[[0, 1]]
Canada    35.467
France    63.951
Name: G7 Population in millions, dtype: float64
Slicing also works, but note that in Pandas, when slicing by label, the upper limit is also
included:
g7_pop['Canada': 'Italy']
Canada     35.467
France     63.951
Germany    80.940
Italy      60.665
Name: G7 Population in millions, dtype: float64
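The inclusive upper limit applies to label slicing only. Positional slicing with `.iloc` keeps the usual Python rule and excludes the end position, as this sketch shows (`by_label` and `by_position` are illustrative names):

```python
import pandas as pd

g7_pop = pd.Series(
    [35.467, 63.951, 80.94, 60.665, 127.061, 64.511, 318.523],
    index=['Canada', 'France', 'Germany', 'Italy', 'Japan',
           'United Kingdom', 'United States'],
    name='G7 Population in millions')

by_label = g7_pop.loc['Canada':'Italy']  # end label 'Italy' is included
by_position = g7_pop.iloc[0:3]           # end position 3 is excluded, like a list

print(by_label.index.tolist())     # ['Canada', 'France', 'Germany', 'Italy']
print(by_position.index.tolist())  # ['Canada', 'France', 'Germany']
```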
Conditional selection (boolean arrays)
The same boolean array techniques we applied to numpy arrays can be used for
Pandas Series:
g7_pop
Canada             35.467
France             63.951
Germany            80.940
Italy              60.665
Japan             127.061
United Kingdom     64.511
United States     318.523
Name: G7 Population in millions, dtype: float64
g7_pop > 70
Canada            False
France            False
Germany            True
Italy             False
Japan              True
United Kingdom    False
United States      True
Name: G7 Population in millions, dtype: bool
g7_pop[g7_pop > 70]
Germany           80.940
Japan            127.061
United States    318.523
Name: G7 Population in millions, dtype: float64
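The comparison `g7_pop > 70` is itself a Series of booleans with the same index, so it can be stored in a variable and reused. Because `True` counts as 1, summing it counts the matches. A sketch, where `mask` is an illustrative name:

```python
import pandas as pd

g7_pop = pd.Series(
    [35.467, 63.951, 80.94, 60.665, 127.061, 64.511, 318.523],
    index=['Canada', 'France', 'Germany', 'Italy', 'Japan',
           'United Kingdom', 'United States'],
    name='G7 Population in millions')

mask = g7_pop > 70                  # boolean Series aligned on the same index

print(mask.dtype)                   # bool
print(int(mask.sum()))              # 3 (each True counts as 1)
print(g7_pop[mask].index.tolist())  # ['Germany', 'Japan', 'United States']
```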
g7_pop.mean()
107.30257142857144
g7_pop[g7_pop > g7_pop.mean()]
Japan            127.061
United States    318.523
Name: G7 Population in millions, dtype: float64
g7_pop.std()
97.24996987121581
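To combine boolean conditions, Pandas uses these element-wise operators instead of
Python's keywords:
~   not
|   or
&   and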
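Two details matter when combining conditions: each comparison needs its own parentheses, because `&` and `|` bind more tightly than `>` and `<`, and Python's own `and`/`or` cannot be used, because they ask a Series for a single True/False value. A sketch (`big_or_small`, `not_big` and `raised` are illustrative names):

```python
import pandas as pd

g7_pop = pd.Series(
    [35.467, 63.951, 80.94, 60.665, 127.061, 64.511, 318.523],
    index=['Canada', 'France', 'Germany', 'Italy', 'Japan',
           'United Kingdom', 'United States'],
    name='G7 Population in millions')

# Parenthesize each comparison before combining with | or &
big_or_small = (g7_pop > 80) | (g7_pop < 40)
not_big = ~(g7_pop > 80)

print(g7_pop[big_or_small].index.tolist())
# ['Canada', 'Germany', 'Japan', 'United States']
print(g7_pop[not_big].index.tolist())
# ['Canada', 'France', 'Italy', 'United Kingdom']

# Python's `and` needs one truth value, which a Series can't provide
raised = False
try:
    (g7_pop > 80) and (g7_pop < 200)
except ValueError:
    raised = True
print(raised)  # True
```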
g7_pop[(g7_pop > g7_pop.mean() - g7_pop.std() / 2) | (g7_pop > g7_pop.mean() + g7_pop.std() / 2)]
France             63.951
Germany            80.940
Italy              60.665
Japan             127.061
United Kingdom     64.511
United States     318.523
Name: G7 Population in millions, dtype: float64
Operations and methods
Series also support vectorized operations and aggregation functions, just like numpy arrays:
g7_pop
Canada             35.467
France             63.951
Germany            80.940
Italy              60.665
Japan             127.061
United Kingdom     64.511
United States     318.523
Name: G7 Population in millions, dtype: float64
g7_pop.mean()
107.30257142857144
np.log(g7_pop)
Canada            3.568603
France            4.158117
Germany           4.393708
Italy             4.105367
Japan             4.844667
United Kingdom    4.166836
United States     5.763695
Name: G7 Population in millions, dtype: float64
g7_pop['France': 'Italy'].mean()
68.51866666666666
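Arithmetic between a Series and a scalar is applied element-wise and keeps the index labels, and aggregations such as `sum()`, `idxmax()` and `idxmin()` work alongside `mean()`. A sketch computing each country's share of the G7 total (the name `share` is illustrative):

```python
import pandas as pd

g7_pop = pd.Series(
    [35.467, 63.951, 80.94, 60.665, 127.061, 64.511, 318.523],
    index=['Canada', 'France', 'Germany', 'Italy', 'Japan',
           'United Kingdom', 'United States'],
    name='G7 Population in millions')

# Element-wise arithmetic: percent of the G7 total, index labels preserved
share = g7_pop / g7_pop.sum() * 100

print(round(g7_pop.sum(), 3))  # 751.118
print(g7_pop.idxmax())         # United States (label of the largest value)
print(g7_pop.idxmin())         # Canada (label of the smallest value)
print(round(share.sum(), 6))   # 100.0
```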
Boolean arrays
(They work the same way as in numpy.)
g7_pop
Canada             35.467
France             63.951
Germany            80.940
Italy              60.665
Japan             127.061
United Kingdom     64.511
United States     318.523
Name: G7 Population in millions, dtype: float64
g7_pop > 80
Canada            False
France            False
Germany            True
Italy             False
Japan              True
United Kingdom    False
United States      True
Name: G7 Population in millions, dtype: bool
g7_pop[g7_pop > 80]
Germany           80.940
Japan            127.061
United States    318.523
Name: G7 Population in millions, dtype: float64
g7_pop[(g7_pop > 80) | (g7_pop < 40)]
Canada            35.467
Germany           80.940
Japan            127.061
United States    318.523
Name: G7 Population in millions, dtype: float64
g7_pop[(g7_pop > 80) & (g7_pop < 200)]
Germany     80.940
Japan      127.061
Name: G7 Population in millions, dtype: float64
g7_pop['Canada'] = 40.5
g7_pop
Canada             40.500
France             63.951
Germany            80.940
Italy              60.665
Japan             127.061
United Kingdom     64.511
United States     318.523
Name: G7 Population in millions, dtype: float64
g7_pop.iloc[-1] = 500
g7_pop
Canada             40.500
France             63.951
Germany            80.940
Italy              60.665
Japan             127.061
United Kingdom     64.511
United States     500.000
Name: G7 Population in millions, dtype: float64
g7_pop[g7_pop < 70]
Canada            40.500
France            63.951
Italy             60.665
United Kingdom    64.511
Name: G7 Population in millions, dtype: float64
g7_pop[g7_pop < 70] = 99.99
g7_pop
Canada             99.990
France             99.990
Germany            80.940
Italy              99.990
Japan             127.061
United Kingdom     99.990
United States     500.000
Name: G7 Population in millions, dtype: float64
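The assignments above modify `g7_pop` in place, so the original figures are lost. A sketch of two ways to keep them, starting from the original values rather than the edited ones: work on a `.copy()`, or use `Series.where`, which returns a new Series that keeps values where the condition holds and substitutes the given value elsewhere. The names `original`, `modified` and `alternative` are illustrative.

```python
import pandas as pd

original = pd.Series(
    [35.467, 63.951, 80.94, 60.665, 127.061, 64.511, 318.523],
    index=['Canada', 'France', 'Germany', 'Italy', 'Japan',
           'United Kingdom', 'United States'],
    name='G7 Population in millions')

# Option 1: mutate a copy, leaving `original` untouched
modified = original.copy()
modified[modified < 70] = 99.99

# Option 2: where() keeps values where the condition is True,
# and uses 99.99 everywhere else
alternative = original.where(original >= 70, 99.99)

print(original['Canada'])            # 35.467 (unchanged)
print(modified['Canada'])            # 99.99
print(modified.equals(alternative))  # True
```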