Python Filtering
Filtering
In [1]:
import numpy as np
import pandas as pd
In [2]:
# Create a Dictionary
d = {
'Name':['Amarend','Ajay','Preety','Rakesh','Raju','Shyam',
'Kiran','Rishi','Prem','Raj','Ravina','Premjit'],
'Exam':['Semester 1','Semester 1','Semester 1','Semester 1','Semester 1','Semester 1',
'Semester 2','Semester 2','Semester 2','Semester 2','Semester 2','Semester 2'],
'Subject':['Mathematics','Mathematics','Mathematics','Science','Science','Science',
'Mathematics','Mathematics','Mathematics','Science','Science','Science'],
'Score':[62,47,55,74,31,77,85,63,42,67,89,81]}
# Create a dataframe
df = pd.DataFrame(d,columns=['Name','Exam','Subject','Score'])
df
Out[2]:
In [5]:
df['Name']
Out[5]:
0 Amarend
1 Ajay
2 Preety
3 Rakesh
4 Raju
5 Shyam
6 Kiran
7 Rishi
8 Prem
9 Raj
10 Ravina
11 Premjit
Name: Name, dtype: object
In [18]:
df[['Name', 'Score']]
Out[18]:
Name Score
0 Amarend 62
1 Ajay 47
2 Preety 55
3 Rakesh 74
4 Raju 31
5 Shyam 77
6 Kiran 85
7 Rishi 63
8 Prem 42
9 Raj 67
10 Ravina 89
11 Premjit 81
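The two selections above differ in what they return, which a small check can make explicit. This is a minimal sketch on a three-row stand-in for the dataframe (the shortened data and the variable names `names` and `subset` are illustrative assumptions, not part of the notebook).

```python
import pandas as pd

# Three-row stand-in for the tutorial's dataframe (illustrative only)
df = pd.DataFrame({
    'Name': ['Amarend', 'Ajay', 'Preety'],
    'Score': [62, 47, 55],
})

names = df['Name']               # a single label returns a Series
subset = df[['Name', 'Score']]   # a list of labels returns a DataFrame

print(type(names).__name__)      # Series
print(type(subset).__name__)     # DataFrame
```

Passing a one-element list such as df[['Name']] also returns a DataFrame, which is useful when later code expects two-dimensional input.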
In [6]:
df[:2]
Out[6]:
In [7]:
df.head(2)
Out[7]:
In [20]:
df[-2:]
Out[20]:
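The row slices above are positional, so they should match head and tail. A short sketch that checks this, assuming only the Name and Score columns from the dictionary defined earlier (the variable names are hypothetical):

```python
import pandas as pd

# Name and Score columns from the tutorial's dictionary
df = pd.DataFrame({
    'Name': ['Amarend', 'Ajay', 'Preety', 'Rakesh', 'Raju', 'Shyam',
             'Kiran', 'Rishi', 'Prem', 'Raj', 'Ravina', 'Premjit'],
    'Score': [62, 47, 55, 74, 31, 77, 85, 63, 42, 67, 89, 81],
})

first_two = df[:2]    # rows at positions 0 and 1
last_two = df[-2:]    # rows at the last two positions

# Slicing by position gives the same rows as head() and tail()
print(first_two.equals(df.head(2)))
print(last_two.equals(df.tail(2)))
```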
In [21]:
Out[21]:
In [22]:
Out[22]:
In [31]:
Out[31]:
Name Score
0 Amarend 62
6 Kiran 85
7 Rishi 63
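The code for this cell did not survive the export. One filter that reproduces the output above is shown here as a sketch; the condition (Mathematics rows with a score above 60) is inferred from the result and is an assumption, not necessarily what the notebook ran.

```python
import pandas as pd

# Same data as the dictionary defined at the top of the notebook
d = {
    'Name': ['Amarend', 'Ajay', 'Preety', 'Rakesh', 'Raju', 'Shyam',
             'Kiran', 'Rishi', 'Prem', 'Raj', 'Ravina', 'Premjit'],
    'Exam': ['Semester 1'] * 6 + ['Semester 2'] * 6,
    'Subject': (['Mathematics'] * 3 + ['Science'] * 3) * 2,
    'Score': [62, 47, 55, 74, 31, 77, 85, 63, 42, 67, 89, 81],
}
df = pd.DataFrame(d, columns=['Name', 'Exam', 'Subject', 'Score'])

# Assumed condition: combine two boolean Series with &,
# wrapping each comparison in parentheses
mask = (df['Subject'] == 'Mathematics') & (df['Score'] > 60)

# Keep the matching rows, then only the Name and Score columns
result = df[mask][['Name', 'Score']]
print(result)
```

The parentheses matter because & binds more tightly than == and > in Python.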
In pandas, there are multiple ways to filter rows. The code above can also be written as shown below.
This method is more elegant and readable, and you don't need to mention the dataframe name.
2/24/2020 3.2_filtering - Jupyter Notebook
In [33]:
Out[33]:
loc is short for location. All three methods return the same output; they are just different ways of
filtering rows.
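The cells for the alternative methods are empty in this export, so the sketch below is a reconstruction under assumptions: it takes the three methods to be boolean indexing, DataFrame.query, and loc, and it reuses an inferred condition (Mathematics rows with a score above 60).

```python
import pandas as pd

# Same data as the dictionary defined at the top of the notebook
d = {
    'Name': ['Amarend', 'Ajay', 'Preety', 'Rakesh', 'Raju', 'Shyam',
             'Kiran', 'Rishi', 'Prem', 'Raj', 'Ravina', 'Premjit'],
    'Exam': ['Semester 1'] * 6 + ['Semester 2'] * 6,
    'Subject': (['Mathematics'] * 3 + ['Science'] * 3) * 2,
    'Score': [62, 47, 55, 74, 31, 77, 85, 63, 42, 67, 89, 81],
}
df = pd.DataFrame(d, columns=['Name', 'Exam', 'Subject', 'Score'])

# Method 1: boolean indexing, repeating the dataframe name per column
by_mask = df[(df['Subject'] == 'Mathematics') & (df['Score'] > 60)]

# Method 2: query, where columns are referred to by name inside a string
by_query = df.query('Subject == "Mathematics" and Score > 60')

# Method 3: loc with the same boolean condition
by_loc = df.loc[(df['Subject'] == 'Mathematics') & (df['Score'] > 60)]

# All three select the same rows
print(by_mask.equals(by_query) and by_mask.equals(by_loc))
```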
In [36]:
Out[36]:
loc selects rows by index label, whereas iloc selects rows by position in the index, so it only
takes integers. Let's create some sample data for illustration.
In [38]:
Out[38]:
col1
9 1
8 3
7 5
6 7
0 9
1 11
2 13
3 15
4 17
5 19
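The cell that built this sample frame is missing from the export. One construction that reproduces the output above is sketched here; the use of np.arange and an explicit index list is an assumption about how x was created.

```python
import numpy as np
import pandas as pd

# Odd numbers 1..19 as values, with a deliberately unordered integer index
# so that labels and positions disagree
x = pd.DataFrame({'col1': np.arange(1, 20, 2)},
                 index=[9, 8, 7, 6, 0, 1, 2, 3, 4, 5])
print(x)
```

Because the index labels are integers in a different order from the row positions, this frame makes the difference between loc and iloc visible.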
In [39]:
x.iloc[0:5]
Out[39]:
col1
9 1
8 3
7 5
6 7
0 9
In [40]:
x.loc[0:5]
Out[40]:
col1
0 9
1 11
2 13
3 15
4 17
5 19
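The two outputs above differ because iloc slices by position and excludes the end position, while loc slices by label and includes the end label. A small sketch that checks both behaviours, assuming x is built as an odd-number column with the unordered index shown earlier:

```python
import numpy as np
import pandas as pd

# Assumed construction of the sample frame
x = pd.DataFrame({'col1': np.arange(1, 20, 2)},
                 index=[9, 8, 7, 6, 0, 1, 2, 3, 4, 5])

by_position = x.iloc[0:5]  # positions 0..4; position 5 is excluded
by_label = x.loc[0:5]      # labels 0 through 5; label 5 is included

print(list(by_position.index))  # [9, 8, 7, 6, 0]
print(list(by_label.index))     # [0, 1, 2, 3, 4, 5]

# The same integer means different rows for the two accessors
print(x.loc[9, 'col1'])   # row labelled 9 is the first row: 1
print(x.iloc[9, 0])       # row at position 9 is the last row: 19
```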
In [41]:
In [3]:
df.head()
Out[3]:
In [9]:
# df.sort_values('Name')
In [ ]: