Pandas - Basics - Practice: Consider The Following Python Dictionary Data and Python List Labels

The document describes various operations performed on a pandas DataFrame containing bird observation data. The DataFrame is created from a dictionary of data and list of index labels. Summary statistics are displayed and various data selections, filters, and transformations are applied, such as calculating group means, sorting, and replacing values.

Uploaded by

Abhishek Kumar

pandas_basics_practice

April 7, 2021

Consider the following Python dictionary data and Python list labels:
data = {'birds': ['Cranes', 'Cranes', 'plovers', 'spoonbills', 'spoonbills', 'Cranes', 'plovers', 'Cranes', 'spoonbills', 'spoonbills'],
        'age': [3.5, 4, 1.5, np.nan, 6, 3, 5.5, np.nan, 8, 4],
        'visits': [2, 4, 3, 4, 3, 4, 2, 2, 3, 2],
        'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
1. Create a DataFrame birds from the dictionary data, using the list labels as its index.

[1]: import pandas as pd
     import numpy as np

     data = {'birds': ['Cranes', 'Cranes', 'plovers', 'spoonbills', 'spoonbills',
                       'Cranes', 'plovers', 'Cranes', 'spoonbills', 'spoonbills'],
             'age': [3.5, 4, 1.5, np.nan, 6, 3, 5.5, np.nan, 8, 4],
             'visits': [2, 4, 3, 4, 3, 4, 2, 2, 3, 2],
             'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no',
                          'no']}

     labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']

     df = pd.DataFrame(data, index=labels)
     df

[1]: birds age visits priority


a Cranes 3.5 2 yes
b Cranes 4.0 4 yes
c plovers 1.5 3 no
d spoonbills NaN 4 yes
e spoonbills 6.0 3 no
f Cranes 3.0 4 no
g plovers 5.5 2 no
h Cranes NaN 2 yes
i spoonbills 8.0 3 no
j spoonbills 4.0 2 no

2. Display a summary of the basic information about the birds DataFrame and its data.

[114]: df.columns

[114]: Index(['birds', 'age', 'visits', 'priority'], dtype='object')

[115]: df.describe()

[115]: age visits


count 8.000000 10.000000
mean 4.437500 2.900000
std 2.007797 0.875595
min 1.500000 2.000000
25% 3.375000 2.000000
50% 4.000000 3.000000
75% 5.625000 3.750000
max 8.000000 4.000000
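
For a summary that also covers the non-numeric columns, `DataFrame.info()` is a common complement to `describe()`. The following is a minimal sketch that rebuilds the same DataFrame and captures the `info()` text in a buffer; the names `buf` and `summary` are introduced here for illustration only.

```python
import io

import numpy as np
import pandas as pd

data = {
    "birds": ["Cranes", "Cranes", "plovers", "spoonbills", "spoonbills",
              "Cranes", "plovers", "Cranes", "spoonbills", "spoonbills"],
    "age": [3.5, 4, 1.5, np.nan, 6, 3, 5.5, np.nan, 8, 4],
    "visits": [2, 4, 3, 4, 3, 4, 2, 2, 3, 2],
    "priority": ["yes", "yes", "no", "yes", "no", "no", "no", "yes", "no", "no"],
}
labels = list("abcdefghij")
df = pd.DataFrame(data, index=labels)

# info() writes to stdout by default; passing a buffer lets us keep the text.
buf = io.StringIO()
df.info(buf=buf)
summary = buf.getvalue()
print(summary)  # index range, per-column non-null counts, dtypes, memory usage
```

The non-null count for `age` makes the two missing values visible at a glance, which `describe()` only shows indirectly through its `count` row.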

3. Print the first 2 rows of the birds dataframe

[43]: df.iloc[[0,1]]

[43]: birds age visits priority


labels
a Cranes 3.5 2 yes
b Cranes 4.0 4 yes
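
The same two rows can also be obtained with `DataFrame.head`, which is often the more idiomatic choice for "first n rows". A small sketch, assuming the DataFrame is rebuilt as in exercise 1 (`first_two` is an illustrative name):

```python
import numpy as np
import pandas as pd

data = {
    "birds": ["Cranes", "Cranes", "plovers", "spoonbills", "spoonbills",
              "Cranes", "plovers", "Cranes", "spoonbills", "spoonbills"],
    "age": [3.5, 4, 1.5, np.nan, 6, 3, 5.5, np.nan, 8, 4],
    "visits": [2, 4, 3, 4, 3, 4, 2, 2, 3, 2],
    "priority": ["yes", "yes", "no", "yes", "no", "no", "no", "yes", "no", "no"],
}
df = pd.DataFrame(data, index=list("abcdefghij"))

# head(n) returns the first n rows by position.
first_two = df.head(2)
print(first_two)
```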

4. Print all the rows with only ‘birds’ and ‘age’ columns from the dataframe

[52]: df[['birds','age']]

[52]: birds age


labels
a Cranes 3.5
b Cranes 4.0
c plovers 1.5
d spoonbills NaN
e spoonbills 6.0
f Cranes 3.0
g plovers 5.5
h Cranes NaN
i spoonbills 8.0
j spoonbills 4.0

5. Select rows 2, 3 and 7 (counting from 1, i.e. labels 'b', 'c' and 'g') and the columns ['birds', 'age', 'visits']

[60]: df.loc[['b','c','g'],['birds','age','visits']]

[60]: birds age visits


labels
b Cranes 4.0 4
c plovers 1.5 3
g plovers 5.5 2
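
The cell above selects rows by label. If the rows are known only by position, the same selection can be sketched with `iloc`; labels 'b', 'c' and 'g' sit at zero-based positions 1, 2 and 6. The variable names below are illustrative, and `Index.get_indexer` is used to turn column names into positions.

```python
import numpy as np
import pandas as pd

data = {
    "birds": ["Cranes", "Cranes", "plovers", "spoonbills", "spoonbills",
              "Cranes", "plovers", "Cranes", "spoonbills", "spoonbills"],
    "age": [3.5, 4, 1.5, np.nan, 6, 3, 5.5, np.nan, 8, 4],
    "visits": [2, 4, 3, 4, 3, 4, 2, 2, 3, 2],
    "priority": ["yes", "yes", "no", "yes", "no", "no", "no", "yes", "no", "no"],
}
df = pd.DataFrame(data, index=list("abcdefghij"))

cols = ["birds", "age", "visits"]
row_positions = [1, 2, 6]  # zero-based positions of labels 'b', 'c', 'g'

# iloc is purely positional, so column names are converted to positions first.
by_position = df.iloc[row_positions, df.columns.get_indexer(cols)]
print(by_position)
```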

6. Select the rows where the number of visits is less than 4

[59]: filt=df['visits']<4
df[filt]

[59]: birds age visits priority


labels
a Cranes 3.5 2 yes
c plovers 1.5 3 no
e spoonbills 6.0 3 no
g plovers 5.5 2 no
h Cranes NaN 2 yes
i spoonbills 8.0 3 no
j spoonbills 4.0 2 no

7. Select the rows where the age is missing (i.e. NaN), keeping only the columns ['birds', 'visits']

[2]: filt = df['age'].isnull()
     df[filt][['birds','visits']]

[2]: birds visits


d spoonbills 4
h Cranes 2
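
Row filter and column selection can also be combined in a single `.loc` call, which avoids chaining two indexing operations. A minimal sketch on the same data (`missing_age` is an illustrative name):

```python
import numpy as np
import pandas as pd

data = {
    "birds": ["Cranes", "Cranes", "plovers", "spoonbills", "spoonbills",
              "Cranes", "plovers", "Cranes", "spoonbills", "spoonbills"],
    "age": [3.5, 4, 1.5, np.nan, 6, 3, 5.5, np.nan, 8, 4],
    "visits": [2, 4, 3, 4, 3, 4, 2, 2, 3, 2],
    "priority": ["yes", "yes", "no", "yes", "no", "no", "no", "yes", "no", "no"],
}
df = pd.DataFrame(data, index=list("abcdefghij"))

# Boolean mask for the rows, list of names for the columns, one .loc call.
missing_age = df.loc[df["age"].isna(), ["birds", "visits"]]
print(missing_age)
```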

8. Select the rows where the bird is 'Cranes' and the age is less than 4

[68]: filt = (df['birds']=='Cranes') & (df['age']<4)
      df[filt]

[68]: birds age visits priority


labels
a Cranes 3.5 2 yes
f Cranes 3.0 4 no
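
The same compound condition can be written as a string expression with `DataFrame.query`. This is only an alternative spelling of the mask above, sketched on the same data; `young_cranes` is an illustrative name.

```python
import numpy as np
import pandas as pd

data = {
    "birds": ["Cranes", "Cranes", "plovers", "spoonbills", "spoonbills",
              "Cranes", "plovers", "Cranes", "spoonbills", "spoonbills"],
    "age": [3.5, 4, 1.5, np.nan, 6, 3, 5.5, np.nan, 8, 4],
    "visits": [2, 4, 3, 4, 3, 4, 2, 2, 3, 2],
    "priority": ["yes", "yes", "no", "yes", "no", "no", "no", "yes", "no", "no"],
}
df = pd.DataFrame(data, index=list("abcdefghij"))

# Column names are referenced directly inside the query string.
# Rows with a NaN age never satisfy "age < 4", so row 'h' is excluded.
young_cranes = df.query("birds == 'Cranes' and age < 4")
print(young_cranes)
```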

9. Select the rows where the age is between 2 and 4 (inclusive)

[70]: filt = (df['age']>=2) & (df['age']<=4)
      df[filt]

[70]: birds age visits priority


labels
a Cranes 3.5 2 yes
b Cranes 4.0 4 yes
f Cranes 3.0 4 no
j spoonbills 4.0 2 no
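
An inclusive range test like this one can also be expressed with `Series.between`, which includes both endpoints by default. A short sketch on the same data (`in_range` is an illustrative name):

```python
import numpy as np
import pandas as pd

data = {
    "birds": ["Cranes", "Cranes", "plovers", "spoonbills", "spoonbills",
              "Cranes", "plovers", "Cranes", "spoonbills", "spoonbills"],
    "age": [3.5, 4, 1.5, np.nan, 6, 3, 5.5, np.nan, 8, 4],
    "visits": [2, 4, 3, 4, 3, 4, 2, 2, 3, 2],
    "priority": ["yes", "yes", "no", "yes", "no", "no", "no", "yes", "no", "no"],
}
df = pd.DataFrame(data, index=list("abcdefghij"))

# between(2, 4) is True where 2 <= age <= 4; NaN ages yield False.
in_range = df[df["age"].between(2, 4)]
print(in_range)
```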

10. Find the total number of visits for the bird 'Cranes'

[71]: df[df['birds']=='Cranes']['visits'].sum()

[71]: 12

11. Calculate the mean age for each type of bird in the dataframe.

[76]: birds_grp=df.groupby('birds')
birds_grp['age'].mean()

[76]: birds
Cranes 3.5
plovers 3.5
spoonbills 6.0
Name: age, dtype: float64
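
Because `mean()` skips missing values, it can help to show how many ages actually went into each group mean. The sketch below, on the same data, uses `agg` to report the mean next to the count of non-missing ages (`age_stats` is an illustrative name):

```python
import numpy as np
import pandas as pd

data = {
    "birds": ["Cranes", "Cranes", "plovers", "spoonbills", "spoonbills",
              "Cranes", "plovers", "Cranes", "spoonbills", "spoonbills"],
    "age": [3.5, 4, 1.5, np.nan, 6, 3, 5.5, np.nan, 8, 4],
    "visits": [2, 4, 3, 4, 3, 4, 2, 2, 3, 2],
    "priority": ["yes", "yes", "no", "yes", "no", "no", "no", "yes", "no", "no"],
}
df = pd.DataFrame(data, index=list("abcdefghij"))

# 'count' counts only non-NaN ages, so it shows the sample size behind each mean.
age_stats = df.groupby("birds")["age"].agg(["mean", "count"])
print(age_stats)
```

For example, the Cranes mean of 3.5 is computed from three ages (3.5, 4 and 3), since one of the four Cranes rows has no age recorded.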

12. Append a new row ‘k’ to dataframe with your choice of values for each column.
Then delete that row to return the original DataFrame.

[106]: df.loc['k']=['Sparrow',3,4,'yes']
df

[106]: birds age visits priority


labels
a Cranes 3.5 2 yes
b Cranes 4.0 4 yes
c plovers 1.5 3 no
d spoonbills NaN 4 yes
e spoonbills 6.0 3 no
f Cranes 3.0 4 no
g plovers 5.5 2 no
h Cranes NaN 2 yes
i spoonbills 8.0 3 no
j spoonbills 4.0 2 no
k Sparrow 3.0 4 yes

[111]: df=df.drop('k')
df

[111]: birds age visits priority


labels
a Cranes 3.5 2 yes
b Cranes 4.0 4 yes
c plovers 1.5 3 no
d spoonbills NaN 4 yes
e spoonbills 6.0 3 no
f Cranes 3.0 4 no
g plovers 5.5 2 no
h Cranes NaN 2 yes
i spoonbills 8.0 3 no
j spoonbills 4.0 2 no
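
The cells above modify `df` in place and then reassign it. A non-mutating variant is sketched below: the new row is built as a one-row DataFrame, joined with `pd.concat`, and removed again with `drop`, leaving the original untouched. The names `new_row`, `extended` and `restored` are illustrative.

```python
import numpy as np
import pandas as pd

data = {
    "birds": ["Cranes", "Cranes", "plovers", "spoonbills", "spoonbills",
              "Cranes", "plovers", "Cranes", "spoonbills", "spoonbills"],
    "age": [3.5, 4, 1.5, np.nan, 6, 3, 5.5, np.nan, 8, 4],
    "visits": [2, 4, 3, 4, 3, 4, 2, 2, 3, 2],
    "priority": ["yes", "yes", "no", "yes", "no", "no", "no", "yes", "no", "no"],
}
df = pd.DataFrame(data, index=list("abcdefghij"))

# One-row DataFrame with the same columns, indexed by the new label 'k'.
new_row = pd.DataFrame(
    {"birds": ["Sparrow"], "age": [3.0], "visits": [4], "priority": ["yes"]},
    index=["k"],
)

extended = pd.concat([df, new_row])   # 11 rows; df itself is unchanged
restored = extended.drop(index="k")   # back to the original 10 rows

print(restored.equals(df))
```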

13. Find the number of birds of each type in the dataframe (counts)

[73]: df['birds'].value_counts()

[73]: spoonbills 4
Cranes 4
plovers 2
Name: birds, dtype: int64

14. Sort the dataframe (birds) first by the values in the 'age' column in descending order, then by
the values in the 'visits' column in ascending order.

[116]: df.sort_values(by=['age','visits'],ascending=[False,True])

[116]: birds age visits priority


i spoonbills 8.0 3 no
e spoonbills 6.0 3 no
g plovers 5.5 2 no
j spoonbills 4.0 2 no
b Cranes 4.0 4 yes
a Cranes 3.5 2 yes
f Cranes 3.0 4 no
c plovers 1.5 3 no
h Cranes NaN 2 yes
d spoonbills NaN 4 yes
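
In the output above, the rows with a missing age land at the bottom even though the sort is descending; `sort_values` places NaN last by default. The sketch below, on the same data, shows the `na_position` parameter moving them to the top instead (variable names are illustrative).

```python
import numpy as np
import pandas as pd

data = {
    "birds": ["Cranes", "Cranes", "plovers", "spoonbills", "spoonbills",
              "Cranes", "plovers", "Cranes", "spoonbills", "spoonbills"],
    "age": [3.5, 4, 1.5, np.nan, 6, 3, 5.5, np.nan, 8, 4],
    "visits": [2, 4, 3, 4, 3, 4, 2, 2, 3, 2],
    "priority": ["yes", "yes", "no", "yes", "no", "no", "no", "yes", "no", "no"],
}
df = pd.DataFrame(data, index=list("abcdefghij"))

# Default behaviour: NaN ages are placed last regardless of sort direction.
sorted_default = df.sort_values(by=["age", "visits"], ascending=[False, True])

# na_position="first" moves the rows with a missing age to the top.
sorted_na_first = df.sort_values(
    by=["age", "visits"], ascending=[False, True], na_position="first"
)
print(sorted_na_first)
```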

15. Replace the priority column values: 'yes' should become 1 and 'no' should become 0

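[101]: def replace_priority(x):
           # Encode 'yes' as 1 and anything else as 0
           if x == 'yes':
               return 1
           else:
               return 0

       # apply() returns a new Series, so assign it back to the column
       df['priority'] = df['priority'].apply(replace_priority)
       df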

[101]: birds age visits priority


labels
a trumpeters 3.5 2 1
b trumpeters 4.0 4 1
c plovers 1.5 3 0
d spoonbills NaN 4 1
e spoonbills 6.0 3 0
f trumpeters 3.0 4 0
g plovers 5.5 2 0
h trumpeters NaN 2 1
i spoonbills 8.0 3 0
j spoonbills 4.0 2 0

16. In the ‘birds’ column, change the ‘Cranes’ entries to ‘trumpeters’.

[91]: df['birds']=df['birds'].replace({'Cranes':'trumpeters'})
df

[91]: birds age visits priority


labels
a trumpeters 3.5 2 yes
b trumpeters 4.0 4 yes
c plovers 1.5 3 no
d spoonbills NaN 4 yes
e spoonbills 6.0 3 no
f trumpeters 3.0 4 no
g plovers 5.5 2 no
h trumpeters NaN 2 yes
i spoonbills 8.0 3 no
j spoonbills 4.0 2 no
