Boolean Indexing
Boolean indexing is a powerful feature in pandas that lets you select specific rows
or columns from a DataFrame based on a condition. It's called "boolean" because the
condition evaluates to a boolean value (True or False) for each element it is applied
to, and the resulting mask of True/False values decides what is kept.
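As a minimal sketch of the idea, the snippet below builds a mask and then uses it to select rows. The DataFrame, its column names, and the threshold of 30 are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({"name": ["Ann", "Bob", "Cid"], "age": [25, 41, 33]})

# The comparison is evaluated element by element and yields a boolean Series.
mask = df["age"] > 30

# Passing the mask to df[...] keeps only the rows where the mask is True.
over_30 = df[mask]
```

Here mask holds one True/False value per row, and over_30 contains only the rows for which that value is True.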
Boolean indexing is commonly used for:
- Selecting rows or columns based on specific conditions (e.g., values above or below a
  certain threshold)
- Filtering out missing or null values
- Creating a new DataFrame with a subset of the original data
- Performing data cleaning or preprocessing tasks
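The uses listed above might look like the following in practice. This is only a sketch: the readings DataFrame, its columns, the threshold of 3.0, and the fill value of 0.0 are all assumptions chosen for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical sensor readings, invented for this example.
readings = pd.DataFrame(
    {"sensor": ["a", "b", "c", "d"], "value": [0.5, np.nan, 7.2, 3.1]}
)

# Condition on a threshold: NaN compares as False, so it is excluded too.
above = readings[readings["value"] > 3.0]

# Filter out missing values with notna().
present = readings[readings["value"].notna()]

# Create an independent DataFrame from a subset of the original data.
subset = readings[readings["value"].notna()].copy()

# Data cleaning: overwrite the masked cells through .loc.
cleaned = readings.copy()
cleaned.loc[cleaned["value"].isna(), "value"] = 0.0
```

Calling .copy() on the subset makes it explicit that later changes to subset are not meant to affect readings.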
Boolean indexing can be used with Pandas DataFrames, Series, and Index objects.
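For instance, the same pattern can be applied to a Series and to an Index. The values and labels below are arbitrary and serve only as a sketch.

```python
import pandas as pd

# Boolean indexing on a Series keeps the matching elements and their labels.
s = pd.Series([3, 8, 1, 9], index=["w", "x", "y", "z"])
big = s[s > 5]

# Comparing an Index yields a boolean array that can index the Index itself.
idx = pd.Index([10, 15, 20, 25])
above_12 = idx[idx > 12]
```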
The syntax for boolean indexing is flexible: a mask can be passed in square brackets or
to .loc, and several conditions can be combined with & (and), | (or), and ~ (not).
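A few common forms are sketched below. The DataFrame and its column names are hypothetical, and these snippets are illustrations of the syntax rather than a reconstruction of any particular original example.

```python
import pandas as pd

# Hypothetical weather data, invented for this example.
df = pd.DataFrame(
    {
        "city": ["Oslo", "Lima", "Rome", "Kyiv"],
        "temp": [4, 19, 15, 7],
        "rain": [True, False, False, True],
    }
)

# 1. A single condition in square brackets selects rows.
warm = df[df["temp"] > 10]

# 2. .loc takes a row mask and, optionally, a column selection.
warm_cities = df.loc[df["temp"] > 10, "city"]

# 3. & combines conditions with AND; each condition needs its own parentheses.
warm_dry = df[(df["temp"] > 10) & (~df["rain"])]

# 4. | combines conditions with OR.
extreme = df[(df["temp"] < 5) | (df["temp"] > 18)]

# 5. ~ negates a boolean mask.
not_rainy = df[~df["rain"]]

# 6. isin() builds a mask from membership in a list of values.
selected = df[df["city"].isin(["Oslo", "Rome"])]
```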
Consider the expression outliers = df[(df < lower) | (df > upper)], which uses boolean
indexing to pick out the values in df that are either less than lower or greater than
upper. It works in four steps:
1. df < lower creates a boolean array with the same shape as df, where each element
is True if the corresponding value in df is less than lower, and False otherwise.
2. df > upper creates another boolean array with the same shape as df, where each
element is True if the corresponding value in df is greater than upper,
and False otherwise.
3. The | operator performs an element-wise OR operation between the two boolean arrays.
This creates a new boolean array where each element is True if either of the conditions
is True. The parentheses around each comparison are required because | binds more
tightly than < and > in Python.
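4. The resulting boolean array (the mask) is used to index into df. If df is a Series,
only the elements where the mask is True are kept. If df is a DataFrame, the mask has
the same shape as df, so the result keeps that shape and every value where the mask is
False is replaced with NaN. To keep whole rows that contain at least one outlier, reduce
the mask to one value per row first, for example with mask.any(axis=1).
In summary, outliers = df[(df < lower) | (df > upper)] keeps the values of df that are
either less than lower or greater than upper and assigns the result to outliers. For a
Series the result is a filtered Series; for a DataFrame it is a DataFrame of the same
shape with NaN in place of the values that are not outliers.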
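The difference between the Series and DataFrame cases can be checked with a small sketch. The bounds lower = 0 and upper = 100 and the sample values are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical bounds; in practice they might come from quantiles or domain rules.
lower, upper = 0, 100

# Series: boolean indexing keeps only the matching elements.
values = pd.Series([12, -5, 48, 130, 77])
outlier_values = values[(values < lower) | (values > upper)]

# DataFrame: the mask has the same shape as df.
df = pd.DataFrame({"a": [12, -5, 48], "b": [130, 60, 77]})
mask = (df < lower) | (df > upper)

# Indexing with a DataFrame-shaped mask keeps the shape and inserts NaN
# wherever the mask is False.
masked = df[mask]

# Reducing the mask to one boolean per row selects whole rows instead.
outlier_rows = df[mask.any(axis=1)]
```

In this sketch masked still has three rows and two columns, while outlier_rows contains only the rows that hold at least one out-of-range value.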