
Python Filtering

The document is a Jupyter Notebook that demonstrates various methods of filtering data in a pandas DataFrame using Python. It includes examples of filtering based on conditions, selecting specific columns, and comparing the use of loc and iloc functions. The notebook provides a practical guide for users to manipulate and analyze data effectively.

Uploaded by paulrajarshi7

2/24/2020 3.2_filtering - Jupyter Notebook

Filtering
In [1]:

import numpy as np
import pandas as pd

In [2]:

# Create a dictionary
d = {
    'Name': ['Amarend', 'Ajay', 'Preety', 'Rakesh', 'Raju', 'Shyam',
             'Kiran', 'Rishi', 'Prem', 'Raj', 'Ravina', 'Premjit'],
    'Exam': ['Semester 1', 'Semester 1', 'Semester 1', 'Semester 1', 'Semester 1', 'Semester 1',
             'Semester 2', 'Semester 2', 'Semester 2', 'Semester 2', 'Semester 2', 'Semester 2'],
    'Subject': ['Mathematics', 'Mathematics', 'Mathematics', 'Science', 'Science', 'Science',
                'Mathematics', 'Mathematics', 'Mathematics', 'Science', 'Science', 'Science'],
    'Score': [62, 47, 55, 74, 31, 77, 85, 63, 42, 67, 89, 81],
}

# Create a dataframe
df = pd.DataFrame(d, columns=['Name', 'Exam', 'Subject', 'Score'])
df

Out[2]:

       Name        Exam      Subject  Score
0   Amarend  Semester 1  Mathematics     62
1      Ajay  Semester 1  Mathematics     47
2    Preety  Semester 1  Mathematics     55
3    Rakesh  Semester 1      Science     74
4      Raju  Semester 1      Science     31
5     Shyam  Semester 1      Science     77
6     Kiran  Semester 2  Mathematics     85
7     Rishi  Semester 2  Mathematics     63
8      Prem  Semester 2  Mathematics     42
9       Raj  Semester 2      Science     67
10   Ravina  Semester 2      Science     89
11  Premjit  Semester 2      Science     81

View a column of the dataframe in pandas:

localhost:8888/notebooks/Machine Learning/Python/3.2_filtering.ipynb 1/7



In [5]:

df['Name']

Out[5]:

0 Amarend
1 Ajay
2 Preety
3 Rakesh
4 Raju
5 Shyam
6 Kiran
7 Rishi
8 Prem
9 Raj
10 Ravina
11 Premjit
Name: Name, dtype: object

View two or more columns of the dataframe in pandas:

In [18]:

df[['Name', 'Score']]

Out[18]:

       Name  Score
0   Amarend     62
1      Ajay     47
2    Preety     55
3    Rakesh     74
4      Raju     31
5     Shyam     77
6     Kiran     85
7     Rishi     63
8      Prem     42
9       Raj     67
10   Ravina     89
11  Premjit     81
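
The two selection forms above return different types: single brackets with one column name give a Series, while double brackets with a list of names give a DataFrame, even if the list holds only one name. A minimal sketch, using a shortened three-row version of the notebook's data (the names one_col and one_col_frame are illustrative, not from the notebook):

```python
import pandas as pd

# Shortened version of the notebook's data, for illustration only
df = pd.DataFrame({
    'Name': ['Amarend', 'Ajay', 'Preety'],
    'Score': [62, 47, 55],
})

one_col = df['Name']          # single brackets -> pandas Series
one_col_frame = df[['Name']]  # list of names -> DataFrame with one column

print(type(one_col).__name__)        # Series
print(type(one_col_frame).__name__)  # DataFrame
print(one_col_frame.shape)           # (3, 1)
```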

View the first two rows of the dataframe in pandas:


In [6]:

df[:2]

Out[6]:

      Name        Exam      Subject  Score
0  Amarend  Semester 1  Mathematics     62
1     Ajay  Semester 1  Mathematics     47

In [7]:

df.head(2)

Out[7]:

      Name        Exam      Subject  Score
0  Amarend  Semester 1  Mathematics     62
1     Ajay  Semester 1  Mathematics     47

View the last two rows of the dataframe in pandas:

In [20]:

df[-2:]

Out[20]:

       Name        Exam  Subject  Score
10   Ravina  Semester 2  Science     89
11  Premjit  Semester 2  Science     81
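
Just as df.head(2) mirrors df[:2] for the first rows, the tail method mirrors the negative slice for the last rows. A small sketch on a four-row sample taken from the notebook's data (the variable names are illustrative):

```python
import pandas as pd

# Four rows from the notebook's data, for illustration only
df = pd.DataFrame({
    'Name': ['Kiran', 'Rishi', 'Ravina', 'Premjit'],
    'Score': [85, 63, 89, 81],
})

last_two_slice = df[-2:]    # slice syntax, as used in the notebook
last_two_tail = df.tail(2)  # equivalent method call

print(last_two_tail)
print(last_two_slice.equals(last_two_tail))  # True
```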

Filter pandas dataframe by column value

Method 1 : DataFrame Way


In [21]:

# based on one condition
df1 = df[df['Score'] > 60]
df1

Out[21]:

       Name        Exam      Subject  Score
0   Amarend  Semester 1  Mathematics     62
3    Rakesh  Semester 1      Science     74
5     Shyam  Semester 1      Science     77
6     Kiran  Semester 2  Mathematics     85
7     Rishi  Semester 2  Mathematics     63
9       Raj  Semester 2      Science     67
10   Ravina  Semester 2      Science     89
11  Premjit  Semester 2      Science     81

In [22]:

# based on multiple conditions
# (bracket access and attribute access select the same columns)
df1A = df[(df['Score'] > 60) & (df['Subject'] == 'Mathematics')]
df1B = df[(df.Score > 60) & (df.Subject == 'Mathematics')]
#df1A
df1B

Out[22]:

      Name        Exam      Subject  Score
0  Amarend  Semester 1  Mathematics     62
6    Kiran  Semester 2  Mathematics     85
7    Rishi  Semester 2  Mathematics     63
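
The parentheses around each condition matter: in Python, & binds more tightly than > and ==, so leaving them out changes how the expression is parsed and typically raises an error. The same pattern extends to | (or) and ~ (not). A minimal sketch on a five-row sample of the notebook's data; the names high, maths, both, either and not_maths are illustrative:

```python
import pandas as pd

# Five rows from the notebook's data, for illustration only
df = pd.DataFrame({
    'Name': ['Amarend', 'Ajay', 'Rakesh', 'Raju', 'Kiran'],
    'Subject': ['Mathematics', 'Mathematics', 'Science', 'Science', 'Mathematics'],
    'Score': [62, 47, 74, 31, 85],
})

# Build each boolean mask separately, then combine them
high = df['Score'] > 60
maths = df['Subject'] == 'Mathematics'

both = df[high & maths]    # AND: score above 60 and Mathematics
either = df[high | maths]  # OR: score above 60 or Mathematics
not_maths = df[~maths]     # NOT: every subject except Mathematics

print(both['Name'].tolist())       # ['Amarend', 'Kiran']
print(either['Name'].tolist())     # ['Amarend', 'Ajay', 'Rakesh', 'Kiran']
print(not_maths['Name'].tolist())  # ['Rakesh', 'Raju']
```

Naming the masks first also avoids the precedence issue, since each comparison is evaluated on its own line.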

In [31]:

# Select only a few columns under some conditions
df1C = df[(df.Score > 60) & (df.Subject == 'Mathematics')][['Name', 'Score']]
df1C

Out[31]:

      Name  Score
0  Amarend     62
6    Kiran     85
7    Rishi     63
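
The row filter and the column selection above are two chained indexing steps. One possible alternative is to pass the row mask and the column list to loc in a single call. A sketch on a five-row sample of the notebook's data (mask, subset and chained are illustrative names):

```python
import pandas as pd

# Five rows from the notebook's data, for illustration only
df = pd.DataFrame({
    'Name': ['Amarend', 'Ajay', 'Rakesh', 'Kiran', 'Rishi'],
    'Subject': ['Mathematics', 'Mathematics', 'Science', 'Mathematics', 'Mathematics'],
    'Score': [62, 47, 74, 85, 63],
})

mask = (df['Score'] > 60) & (df['Subject'] == 'Mathematics')

# Rows and columns selected in one loc call
subset = df.loc[mask, ['Name', 'Score']]

# The chained form used in the notebook gives the same result
chained = df[mask][['Name', 'Score']]

print(subset)
print(subset.equals(chained))  # True
```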

Method 2 : Query Function

pandas offers multiple ways to perform filtering. The code above can also be written as shown below. This method is elegant and more readable, and you don't need to repeat the dataframe name every time you specify a column (variable).

In [33]:

df2 = df.query('Score > 60 & Subject == "Mathematics"')
df2

Out[33]:

      Name        Exam      Subject  Score
0  Amarend  Semester 1  Mathematics     62
6    Kiran  Semester 2  Mathematics     85
7    Rishi  Semester 2  Mathematics     63
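
Inside a query string, the word and can be used in place of &, and Python variables can be referenced with an @ prefix instead of hard-coding values. A minimal sketch on a five-row sample of the notebook's data; min_score and subject are hypothetical variable names introduced here:

```python
import pandas as pd

# Five rows from the notebook's data, for illustration only
df = pd.DataFrame({
    'Name': ['Amarend', 'Ajay', 'Rakesh', 'Kiran', 'Rishi'],
    'Subject': ['Mathematics', 'Mathematics', 'Science', 'Mathematics', 'Mathematics'],
    'Score': [62, 47, 74, 85, 63],
})

# Hypothetical variables holding the filter values
min_score = 60
subject = 'Mathematics'

# @name refers to a Python variable rather than a column
result = df.query('Score > @min_score and Subject == @subject')

print(result['Name'].tolist())  # ['Amarend', 'Kiran', 'Rishi']
```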

Method 3 : loc function

loc is short for location. All three methods return the same output; they are just different ways of filtering rows.

In [36]:

df3 = df.loc[(df.Score > 60) & (df.Subject == 'Mathematics')]
df3

Out[36]:

      Name        Exam      Subject  Score
0  Amarend  Semester 1  Mathematics     62
6    Kiran  Semester 2  Mathematics     85
7    Rishi  Semester 2  Mathematics     63
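
The claim that the three methods agree can be checked directly with DataFrame.equals. A short sketch on a five-row sample of the notebook's data (the sample is an assumption for brevity; the variable names follow the notebook):

```python
import pandas as pd

# Five rows from the notebook's data, for illustration only
df = pd.DataFrame({
    'Name': ['Amarend', 'Ajay', 'Rakesh', 'Kiran', 'Rishi'],
    'Subject': ['Mathematics', 'Mathematics', 'Science', 'Mathematics', 'Mathematics'],
    'Score': [62, 47, 74, 85, 63],
})

df1 = df[(df.Score > 60) & (df.Subject == 'Mathematics')]      # Method 1
df2 = df.query('Score > 60 and Subject == "Mathematics"')      # Method 2
df3 = df.loc[(df.Score > 60) & (df.Subject == 'Mathematics')]  # Method 3

# equals compares values, index and dtypes
print(df1.equals(df2) and df2.equals(df3))  # True
```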

Difference between loc and iloc function

loc selects rows by index label, whereas iloc selects rows by integer position in the index, so it takes integer positions rather than labels. Let's create some sample data for illustration.


In [38]:

x = pd.DataFrame({"col1": np.arange(1, 20, 2)}, index=[9, 8, 7, 6, 0, 1, 2, 3, 4, 5])
x

Out[38]:

   col1
9     1
8     3
7     5
6     7
0     9
1    11
2    13
3    15
4    17
5    19

iloc - Index Position

In [39]:

x.iloc[0:5]

Out[39]:

   col1
9     1
8     3
7     5
6     7
0     9

loc - Index Label


In [40]:

x.loc[0:5]

Out[40]:

   col1
0     9
1    11
2    13
3    15
4    17
5    19

Note : x.loc[0:5] returns 6 rows, because the row labeled 5 (the 6th element of the slice) is included.

This is because loc does not select by position. It considers index labels only, which can also be strings, and a label slice includes both its start and end points.
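
The point about alphabetic labels can be illustrated with a small string-indexed frame. This is a sketch with made-up data (the frame s and its values are assumptions, not part of the notebook): a loc slice includes both end labels, while an iloc slice excludes the end position, as ordinary Python slicing does.

```python
import pandas as pd

# Made-up frame with string labels, for illustration only
s = pd.DataFrame({'col1': [10, 20, 30, 40, 50]},
                 index=['a', 'b', 'c', 'd', 'e'])

by_label = s.loc['b':'d']  # labels b, c, d: both endpoints included
by_position = s.iloc[1:3]  # positions 1 and 2: end position excluded

print(by_label.index.tolist())     # ['b', 'c', 'd']
print(by_position.index.tolist())  # ['b', 'c']
```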

In [41]:

# more examples - (offline) Data Analytics - Preprocessing 4

In [3]:

df.head()

Out[3]:

      Name        Exam      Subject  Score
0  Amarend  Semester 1  Mathematics     62
1     Ajay  Semester 1  Mathematics     47
2   Preety  Semester 1  Mathematics     55
3   Rakesh  Semester 1      Science     74
4     Raju  Semester 1      Science     31

In [9]:

#df.sort_values('Name')  # DataFrame has no sortby method; sort_values sorts by a column

In [ ]:
