
Boolean Indexing

Uploaded by melikakhajeh94
Copyright © All Rights Reserved

What is Boolean Indexing?

Boolean indexing is a powerful feature in Pandas that lets you select rows or columns
of a DataFrame based on a condition. It is called "boolean" because the condition is
evaluated to a boolean value (True or False) for each row or element it is applied to,
and those True/False values decide which data is selected.
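Here is a minimal sketch of the "one boolean per element" idea. It uses a small made-up DataFrame, and the columns x and y are hypothetical. Comparing a single column gives a boolean Series with one value per row. Comparing the whole numeric DataFrame gives a boolean DataFrame of the same shape.

```python
import pandas as pd

# Made-up numeric data for illustration
df = pd.DataFrame({"x": [1, -2, 3], "y": [10, 20, -30]})

# One column compared with a scalar: a boolean Series, one value per row
row_mask = df["x"] > 0
print(row_mask.tolist())  # [True, False, True]

# The whole DataFrame compared with a scalar: a boolean DataFrame, one value per element
elem_mask = df > 0
print(elem_mask.values.tolist())  # [[True, True], [False, True], [True, False]]
```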

How to use Boolean Indexing?

The basic syntax for boolean indexing is:

```python
df[condition]
```

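Here, df is the DataFrame and condition is a boolean expression. In the most common case,
condition is a boolean Series with one entry per row, aligned on the DataFrame's index.
df[condition] then returns only the rows where it is True. If condition is instead a
boolean DataFrame with the same shape as df, df[condition] keeps the full shape and
replaces every value whose mask entry is False with NaN (equivalent to df.where(condition)).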
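The common row-filtering case is sketched below, again with hypothetical data. The mask is a boolean Series aligned on the index. df[mask], or the more explicit df.loc[mask], returns only the rows marked True.

```python
import pandas as pd

# Made-up data with string row labels
df = pd.DataFrame({"x": [1, -2, 3], "y": [10, 20, -30]}, index=["r0", "r1", "r2"])

# A boolean Series aligned on df's index: one True/False per row
mask = df["x"] > 0

# df[mask] keeps only the rows where the mask is True
positive_x = df[mask]
print(positive_x.index.tolist())  # ['r0', 'r2']

# df.loc[mask] is an equivalent, more explicit spelling
same = df.loc[mask]
```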

When to use Boolean Indexing?

You can use boolean indexing in a variety of situations, such as:

- Selecting rows or columns based on specific conditions (e.g., values above or below a certain threshold)
- Filtering out missing or null values
- Creating a new DataFrame with a subset of the original data
- Performing data cleaning or preprocessing tasks
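The sketch below shows the filtering and cleaning uses listed above. The sensor and reading columns are invented for illustration. It uses notna() to drop missing values and .copy() to make the subset an independent DataFrame. A mask inside .loc then overwrites only the matching cells.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value and one negative reading
df = pd.DataFrame({"sensor": ["a", "b", "c", "d"],
                   "reading": [1.5, np.nan, -0.5, 2.0]})

# Filtering out missing values: keep rows whose reading is not NaN
clean = df[df["reading"].notna()].copy()  # .copy() makes an independent subset

# Data cleaning: set negative readings to 0, touching only the masked cells
clean.loc[clean["reading"] < 0, "reading"] = 0.0

print(clean["reading"].tolist())  # [1.5, 0.0, 2.0]
```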

Which type of data can use Boolean Indexing?

Boolean indexing can be used with Pandas DataFrames, Series, and Index objects.
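Here is a short sketch of the same idea on a Series and on an Index, using made-up values. In both cases, a boolean array of matching length picks out the elements marked True.

```python
import pandas as pd

# Series: the mask is a boolean Series with the same index
s = pd.Series([5, 12, 7, 20], index=["w", "x", "y", "z"])
big = s[s > 6]
print(big.to_dict())  # {'x': 12, 'y': 7, 'z': 20}

# Index: a boolean mask of the same length selects labels
idx = pd.Index(["apple", "banana", "avocado", "cherry"])
a_words = idx[idx.str.startswith("a")]
print(list(a_words))  # ['apple', 'avocado']
```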

Syntax for Boolean Indexing

The syntax for boolean indexing is flexible and can be used in various ways. Here are
some examples:

- Selecting rows where a condition is True:

```python
df[df['column_name'] > 0]
```

- Selecting columns where a condition on the column labels is True (here a string comparison,
which keeps every column whose name sorts after 'column_name'):

```python
df.loc[:, df.columns > 'column_name']
```

- Using the & operator for AND conditions (rows where both conditions are True):

```python
df[(df['column_name'] > 0) & (df['other_column'] < 10)]
```

- Using the | operator for OR conditions:

```python
df[(df['column_name'] > 0) | (df['other_column'] < 10)]
```

- Using the ~ operator for NOT conditions:

```python
df[~(df['column_name'] > 0)]
```
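The combinations above can be checked on a tiny hypothetical frame. Its columns reuse the placeholder names column_name and other_column. The last line shows one way to select rows and columns at the same time: pass a row mask and a column mask to .loc.

```python
import pandas as pd

# Hypothetical frame reusing the placeholder column names from the examples
df = pd.DataFrame({"column_name": [-1, 2, 5, 0],
                   "other_column": [3, 15, 8, 20]})

a = df["column_name"] > 0      # [False, True, True, False]
b = df["other_column"] < 10    # [True, False, True, False]

both = df[a & b]     # AND: row 2 only
either = df[a | b]   # OR: rows 0, 1, 2
neither = df[~a]     # NOT: rows 0, 3

# Rows and columns at once: a row mask and a column mask inside .loc
subset = df.loc[a, df.columns == "other_column"]

print(both.index.tolist(), either.index.tolist(), neither.index.tolist())
# [2] [0, 1, 2] [0, 3]
```

Each comparison needs its own parentheses because &, | and ~ bind more tightly than > and <. The Python keywords and, or and not cannot be used instead. They try to reduce a whole Series to a single True or False and raise a ValueError.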

Explaining outliers = df[(df < lower) | (df > upper)]

Now, let's break down a specific example:

```python
outliers = df[(df < lower) | (df > upper)]
```

Here, lower and upper are thresholds (scalars, or Series indexed by column name). The
expression uses a boolean DataFrame as the mask to flag every value in df that is either
less than lower or greater than upper.

Here's what's happening step by step:

1. df < lower creates a boolean DataFrame with the same shape as df, where each element
is True if the corresponding value in df is less than lower, and False otherwise.
2. df > upper creates another boolean DataFrame with the same shape as df, where each
element is True if the corresponding value in df is greater than upper, and False otherwise.
3. The | operator performs an element-wise OR between the two boolean DataFrames. The
result is a new boolean DataFrame in which each element is True if either condition is True.
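4. The resulting boolean DataFrame is used to index into df. Because the mask is a DataFrame
rather than a row Series, no rows are dropped. df[mask] behaves like df.where(mask) and
returns a DataFrame outliers with the same shape as df. The outlier values are kept and
every other value is replaced by NaN.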
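The difference between a row mask and an element mask is easy to check on a small made-up frame. The scalar values of lower and upper below are assumptions. Indexing with the boolean DataFrame keeps the shape and fills non-outliers with NaN, exactly like df.where(mask). Collapsing the mask with .any(axis=1) turns it into a row mask that selects whole rows.

```python
import pandas as pd

# Hypothetical numeric data and assumed scalar bounds
df = pd.DataFrame({"a": [1, 50, 3], "b": [-20, 4, 5]})
lower, upper = 0, 10

mask = (df < lower) | (df > upper)   # boolean DataFrame, same shape as df

# Indexing with a boolean DataFrame behaves like df.where(mask):
# same shape, non-outlier cells become NaN
outliers = df[mask]
assert outliers.shape == df.shape
assert outliers.equals(df.where(mask))

# To get whole rows containing at least one outlier, reduce the mask per row
outlier_rows = df[mask.any(axis=1)]
print(outlier_rows.index.tolist())  # [0, 1]
```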

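In summary, outliers = df[(df < lower) | (df > upper)] keeps every value of df that is
less than lower or greater than upper, sets all other values to NaN, and assigns the result
to a new DataFrame outliers. To select whole rows that contain at least one outlier, reduce
the mask per row first: df[((df < lower) | (df > upper)).any(axis=1)].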
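The original does not say how lower and upper are chosen. One common option, assumed here purely for illustration, is to set them per column with the 1.5 * IQR rule. Comparing a DataFrame with a Series then aligns the Series on the column labels, so each column is tested against its own bounds.

```python
import pandas as pd

# Hypothetical data; the IQR rule is an assumed way to choose the bounds
df = pd.DataFrame({"a": [10, 12, 11, 13, 95],
                   "b": [1.0, 1.2, 0.9, -8.0, 1.1]})

q1 = df.quantile(0.25)   # Series indexed by column name
q3 = df.quantile(0.75)
iqr = q3 - q1
lower = q1 - 1.5 * iqr   # per-column lower bound
upper = q3 + 1.5 * iqr   # per-column upper bound

# DataFrame-vs-Series comparison aligns the Series on the columns,
# so each column is compared with its own bounds
mask = (df < lower) | (df > upper)
outliers = df[mask]                        # outlier cells kept, others NaN
rows_with_outliers = df[mask.any(axis=1)]
print(rows_with_outliers.index.tolist())   # [3, 4]
```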
