Pandas 3

Uploaded by

cr.lucianoperez

Manipulating Pandas Dataframes

Once you have loaded data into your Pandas dataframe, you might need to
further manipulate the data and perform a variety of functions such as filtering
certain columns, dropping the others, selecting a subset of rows or columns,
sorting the data, finding unique values, and so on. You are going to study all
these functions in this chapter.

You will first see how to select data via indexing and slicing, followed by a
section on how to drop unwanted rows or columns from your data. You will
then study how to filter your desired rows and columns. The chapter
concludes with an explanation of sorting and finding unique values from a
Pandas dataframe.

3.1. Selecting Data Using Indexing and Slicing

Indexing refers to fetching data using the index or column labels of a Pandas
dataframe. Slicing, on the other hand, refers to selecting a contiguous range
of rows or columns using those same indexing techniques.

In this section, you will see the different techniques of indexing and slicing
Pandas dataframes.

You will be using the Titanic dataset for this section, which you can import
via the Seaborn library’s load_dataset() method, as shown in the script below.

Script 1:

import matplotlib.pyplot as plt

import seaborn as sns

# sets the default style for plotting

sns.set_style("darkgrid")

titanic_data = sns.load_dataset('titanic')
titanic_data.head()

Output:

3.1.1. Selecting Data Using Brackets []

One of the simplest ways to select data from various columns is by using
square brackets. To get column data in the form of a series from a Pandas
dataframe, you need to pass the column name inside square brackets that
follow the Pandas dataframe name.

The following script selects records from the class column of the Titanic
dataset.

Script 2:

print(titanic_data["class"])
type(titanic_data["class"])

Output:

0 Third
1 First
2 Third
3 First
4 Third
…
886 Second
887 First
888 Third
889 First
890 Third
Name: class, Length: 891, dtype: category
Categories (3, object): ['First', 'Second', 'Third']
Out[2]:
pandas.core.series.Series
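
The return type depends on what you put inside the brackets. As a minimal
sketch, assuming a small hand-made dataframe (the people variable and its
three rows are made up for illustration and are not the real Titanic data),
the snippet below contrasts a bare column name with a one-element list of
column names.

```python
import pandas as pd

# Hypothetical three-row stand-in for the Titanic data (not the real dataset).
people = pd.DataFrame({
    "class": ["Third", "First", "Third"],
    "sex": ["male", "female", "female"],
    "age": [22.0, 38.0, 26.0],
})

single = people["class"]      # a bare column name returns a Series
wrapped = people[["class"]]   # a one-element list returns a one-column DataFrame

print(type(single).__name__)   # Series
print(type(wrapped).__name__)  # DataFrame
print(wrapped.shape)           # (3, 1)
```

The second form is handy when later code expects a dataframe even though only
one column is needed.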

You can select multiple columns by passing a list of column names (each one a
string) inside the square brackets, which is why two pairs of brackets appear
in the script. You will then get a Pandas dataframe with the specified
columns, as shown below.

Script 3:

print(type(titanic_data[["class", "sex", "age"]]))

titanic_data[["class", "sex", "age"]]

Output:

You can also filter rows based on column values. To do this, pass the filter
condition inside the square brackets. For instance, the script below returns
all records from the Titanic dataset where the sex column contains the value
“male.”

Script 4:
my_df = titanic_data[titanic_data["sex"] == "male"]
my_df.head()

Output:

You can specify multiple conditions inside the square brackets. The following
script returns those records where the sex column contains the string “male,”
while the class column contains the string “First.”

Script 5:

my_df = titanic_data[(titanic_data["sex"] == "male") & (titanic_data["class"] == "First")]

my_df.head()

Output:

You can also use the isin() method to filter records against a set of values.
Note that isin() tests membership in the list you pass, not a numeric range.
For instance, the script below filters all records where the age column
contains the value 20, 21, or 22.

Script 6:

ages = [20, 21, 22]
age_dataset = titanic_data[titanic_data["age"].isin(ages)]
age_dataset.head()
Output:
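
The filtering techniques from this section can be tried without downloading
anything. The following is a minimal sketch on a made-up four-row dataframe
(the people variable and its values are assumptions for illustration). It
also shows between(), which is not used in this chapter but is the usual
choice when you want an actual range rather than a list of values.

```python
import pandas as pd

# Hypothetical data; one age is missing on purpose.
people = pd.DataFrame({
    "sex": ["male", "female", "male", "male"],
    "class": ["First", "First", "Third", "First"],
    "age": [20.0, 21.0, 35.0, None],
})

# Each comparison yields a Boolean Series. Wrap each one in parentheses,
# because & and | bind more tightly than == in Python.
first_class_men = people[(people["sex"] == "male") & (people["class"] == "First")]

# isin() tests membership in a collection of values; a missing age never matches.
aged_20_to_22 = people[people["age"].isin([20, 21, 22])]

# between() checks an inclusive numeric range instead.
in_range = people[people["age"].between(20, 22)]

print(list(first_class_men.index))  # [0, 3]
print(list(aged_20_to_22.index))    # [0, 1]
print(list(in_range.index))         # [0, 1]
```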

3.1.2. Indexing and Slicing Using loc Function

The loc indexer of a Pandas dataframe (called the loc function in this
chapter) selects rows and columns by label and can also be used to filter
records in the dataframe.

To create a dummy dataframe used as an example in this section, run the
following script:

Script 7:

import pandas as pd

scores = [
    {'Subject': 'Mathematics', 'Score': 85, 'Grade': 'B', 'Remarks': 'Good'},
    {'Subject': 'History', 'Score': 98, 'Grade': 'A', 'Remarks': 'Excellent'},
    {'Subject': 'English', 'Score': 76, 'Grade': 'C', 'Remarks': 'Fair'},
    {'Subject': 'Science', 'Score': 72, 'Grade': 'C', 'Remarks': 'Fair'},
    {'Subject': 'Arts', 'Score': 95, 'Grade': 'A', 'Remarks': 'Excellent'},
]

my_df = pd.DataFrame(scores)
my_df.head()

Output:

Let’s now see how to filter records. To select the row whose index label is 2
in the my_df dataframe, you need to pass 2 inside the square brackets that
follow the loc function. Here is an example:

Script 8:

print(my_df.loc[2])
type(my_df.loc[2])

In the output below, you can see data from the row with index label 2 (the
third row) in the form of a series.

Output:

Subject English
Score 76
Grade C
Remarks Fair
Name: 2, dtype: object
Out[7]:
pandas.core.series.Series

You can also specify a range of index labels with the loc function. Unlike
ordinary Python slices, a loc slice includes both endpoints, so the following
script returns the records with indexes 2, 3, and 4.

Script 9:

my_df.loc[2:4]

Output:

Along with filtering rows, you can also specify which columns to return with
the loc function.

The following script filters the values in columns Grade and Score in the
rows from index 2 to 4.

Script 10:

my_df.loc[2:4, ["Grade", "Score"]]

Output:

In addition to passing default integer indexes, you can also pass named or
labeled indexes to the loc function.

Let’s create a dataframe with named indexes. Run the following script to do
so:

Script 11:
import pandas as pd

my_df = pd.DataFrame(scores, index = ["Student1", "Student2", "Student3", "Student4", "Student5"])

my_df

From the output below, you can see that the my_df dataframe now contains
named indexes, e.g., Student1, Student2, etc.

Output:

Let’s now filter a record using Student1 as the index value in the loc function.

Script 12:

my_df.loc["Student1"]

Output:

Subject Mathematics
Score 85
Grade B
Remarks Good
Name: Student1, dtype: object

As shown below, you can specify multiple named indexes in a list to the loc
function. The script below filters records with indexes Student1 and
Student2.

Script 13:

index_list = ["Student1", "Student2"]

my_df.loc[index_list]

Output:

You can also find the value in a particular column while filtering records
using a named index.

The script below returns the value in the Grade column for the record with the
named index Student1.

Script 14:

my_df.loc["Student1", "Grade"]

Output:

'B'

As you did with the default integer index, you can specify a range of records
using the named indexes within the loc function. Both endpoints of the range
are included in the result.

The following script returns values in the Grade column for the indexes
from Student1 to Student2.

Script 15:

my_df.loc["Student1":"Student2", "Grade"]

Output:

Student1 B
Student2 A
Name: Grade, dtype: object

Let’s see another example.

The following script returns values in the Grade column for the indexes
from Student1 to Student4.

Script 16:

my_df.loc["Student1":"Student4", "Grade"]

Output:

Student1 B
Student2 A
Student3 C
Student4 C
Name: Grade, dtype: object
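
Label ranges behave differently from ordinary Python slices, which is easy to
overlook. As a small sketch, assuming a stand-alone series of grades (the
grades and names variables are made up for illustration), the snippet below
compares a loc label slice with a plain list slice over the same labels.

```python
import pandas as pd

# Hypothetical series that mirrors the Grade column used in this section.
grades = pd.Series(
    ["B", "A", "C", "C", "A"],
    index=["Student1", "Student2", "Student3", "Student4", "Student5"],
    name="Grade",
)

# A label slice with loc includes BOTH endpoints.
label_slice = grades.loc["Student2":"Student4"]
print(list(label_slice.index))   # ['Student2', 'Student3', 'Student4']

# A plain Python slice over the same labels excludes the upper bound.
names = list(grades.index)
print(names[1:3])                # ['Student2', 'Student3']
```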

You can also pass a list of Boolean values to the loc function, one value per
row. The rows whose entry is True are selected.

For instance, the following script returns only the fourth record, since all
the values in the list passed to the loc function are False except the one at
position 3 (the fourth row).

Script 17:

my_df.loc[[False, False, False, True, False]]

Output:

You can also pass dataframe conditions to the loc function. A condition
evaluates to a Boolean series with one True or False value per row, which can
be used to index the loc function in the same way as the Boolean list in the
previous script.

Before you see how the loc function uses conditions, let’s see the outcome of
a basic condition in a Pandas dataframe. The script below returns index names
along with True or False values depending on whether the Score column
contains a value greater than 80 or not.

Script 18:

my_df["Score"] > 80

The output contains Boolean values: the indexes Student1, Student2, and
Student5 are True.

Output:

Student1 True
Student2 True
Student3 False
Student4 False
Student5 True
Name: Score, dtype: bool
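
Because the result of a condition is an ordinary series, you can inspect it
before using it as a filter. The following is a minimal sketch, assuming a
cut-down version of the dummy dataframe with only the Score column (the
scores_df and mask names are chosen for illustration).

```python
import pandas as pd

# Hypothetical one-column frame with the same scores and named indexes.
scores_df = pd.DataFrame(
    {"Score": [85, 98, 76, 72, 95]},
    index=["Student1", "Student2", "Student3", "Student4", "Student5"],
)

mask = scores_df["Score"] > 80   # Boolean Series that shares the frame's index

print(mask.dtype)                # bool
print(int(mask.sum()))           # True counts as 1, so this prints 3
print(list(mask[mask].index))    # ['Student1', 'Student2', 'Student5']

# The ~ operator inverts the mask, keeping the rows that fail the condition.
below = scores_df[~mask]
print(list(below.index))         # ['Student3', 'Student4']
```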

Now, let’s pass the condition my_df["Score"] > 80 to the loc function.

Script 19:

my_df.loc[my_df["Score"] > 80]

In the output, you can see records with the indexes Student1, Student2, and
Student5.

Output:

You can pass multiple conditions to the loc function. For instance, the script
below returns those rows where the Score column contains a value greater
than 80, and the Remarks column contains the string Excellent.

Script 20:

my_df.loc[(my_df["Score"] > 80) & (my_df["Remarks"] == "Excellent")]

Output:

You can also specify the column names to fetch values from, along with a
condition.

For example, the script below returns values from the Score and Grade
columns, where the Score column contains a value greater than 80.

Script 21:

my_df.loc[my_df["Score"] > 80, ["Score", "Grade"]]

Output:

Finally, you can set values for all the columns in a row using the loc function.
For instance, the following script sets every column of the record at index
Student4 to 90. Note that this overwrites the text columns (Subject, Grade,
and Remarks) as well as the Score column.

Script 22:

my_df.loc["Student4"] = 90
my_df

Output:
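
In practice you will often want to change a single cell or a single column
rather than a whole row. As a hedged sketch, assuming a rebuilt copy of the
dummy dataframe (scores_df is a made-up name, and the rule that scores of 90
or more earn an A is an assumption for illustration), loc accepts a row
selector and a column label on the left side of an assignment.

```python
import pandas as pd

# Hypothetical copy of the dummy data with named indexes.
scores_df = pd.DataFrame(
    {
        "Subject": ["Mathematics", "History", "English", "Science", "Arts"],
        "Score": [85, 98, 76, 72, 95],
        "Grade": ["B", "A", "C", "C", "A"],
    },
    index=["Student1", "Student2", "Student3", "Student4", "Student5"],
)

# Update one cell: row label first, column label second.
scores_df.loc["Student4", "Score"] = 90

# Update one column for every row that matches a condition.
scores_df.loc[scores_df["Score"] >= 90, "Grade"] = "A"

print(scores_df.loc["Student4"].tolist())   # ['Science', 90, 'A']
```

The Subject value for Student4 is left untouched because only the named
columns are assigned.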

3.1.3. Indexing and Slicing Using iloc Function

You can also use the iloc function for selecting and slicing records. However,
unlike the loc function, which selects by index label (whether the labels are
strings or integers), the iloc function accepts only integer positions,
counted from 0.

The following script creates a dummy dataframe for this section.

Script 23:

import pandas as pd

my_df = pd.DataFrame(scores)
my_df.head()

Output:

Let’s filter the record at index 3 (row 4).

Script 24:

my_df.iloc[3]

The script above returns a series.

Output:

Subject Science
Score 72
Grade C
Remarks Fair
Name: 3, dtype: object

If you want to select a single row as a dataframe rather than as a series, you
need to wrap the index in its own pair of square brackets, inside the square
brackets that follow the iloc function, as shown below.

Script 25:

my_df.iloc[[3]]

Output:

You can pass multiple indexes to the iloc function to select multiple records.
Here is an example:

Script 26:

my_df.iloc[[2, 3]]

Output:

You can also pass a range of indexes. In this case, the records from the lower
bound up to, but not including, the upper bound will be selected.

For instance, the script below returns records from index 2 to index 3 (1 less
than 4).

Script 27:

my_df.iloc[2:4]

Output:

In addition to specifying indexes, you can also pass column numbers (starting
from 0) to the iloc method.

The following script returns values from columns number 0 and 1 for the
records at indexes 2 and 3.

Script 28:

my_df.iloc[[2, 3], [0, 1]]

Output:

You can also pass a range of rows and a range of columns to select. The script
below selects rows 2 and 3 and columns 0 and 1, since the upper bound of each
range is excluded.

Script 29:

my_df.iloc[2:4, 0:2]

Output:
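
The difference between label-based and position-based selection is easiest to
see when the labels do not match the positions. The following is a minimal
sketch, assuming a made-up dataframe whose index runs 10, 20, 30, 40, 50
(both the scores_df name and these labels are assumptions for illustration).

```python
import pandas as pd

scores_df = pd.DataFrame(
    {
        "Subject": ["Mathematics", "History", "English", "Science", "Arts"],
        "Score": [85, 98, 76, 72, 95],
    },
    index=[10, 20, 30, 40, 50],   # labels deliberately differ from positions
)

by_position = scores_df.iloc[2:4]   # positions 2 and 3; position 4 is excluded
by_label = scores_df.loc[20:40]     # labels 20, 30 and 40; both ends included

print(list(by_position.index))      # [30, 40]
print(list(by_label.index))         # [20, 30, 40]
print(scores_df.iloc[0]["Subject"]) # Mathematics (first row, whatever its label)
```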

3.2. Dropping Rows and Columns with the drop() Method

Apart from selecting rows and columns using the loc and iloc functions, you can
also use the drop() method to drop unwanted rows and columns from your
dataframe while keeping the rest of the rows and columns.

3.2.1. Dropping Rows

The following script creates a dummy dataframe that you will use in this
section.

Script 30:

import pandas as pd

my_df = pd.DataFrame(scores)
my_df.head()

Output:

The following script drops records at indexes 1 and 4.

Script 31:

my_df2 = my_df.drop([1, 4])
my_df2.head()

Output:

From the output above, you can see that the indexes are not in sequence since
you have dropped indexes 1 and 4.

You can reset the dataframe indexes so that they start from 0 again, using the
reset_index() method.

Let’s call the reset_index() method on the my_df2 dataframe. Here, the value
True for the inplace parameter specifies that you want to reset the index
in place without assigning the result to any new variable.

Script 32:

my_df2.reset_index(inplace=True)
my_df2.head()

Output:

The above output shows that the indexes have been reset. Also, you can see
that a new column named index has been added, which contains the original
index. If you only want to reset the indexes without creating this new
column, you can do so by passing True as the value for the drop parameter of
the reset_index() method.

Let’s again drop some rows and reset the index using the reset_index() method
by passing True as the value for the drop parameter. See the following two
scripts:

Script 33:

my_df2 = my_df.drop([1, 4])
my_df2.head()

Output:

Script 34:

my_df2.reset_index(inplace=True, drop=True)

my_df2.head()

Output:

By default, the drop() method doesn’t drop rows in place. Instead, it returns
a new dataframe with the rows removed, which you have to assign to a variable
if you want to keep it.

For instance, if you drop the records at indexes 1, 3, and 4 using the following
script and then print the first rows of the dataframe, you will see that the
rows are not removed from the original dataframe.

Script 35:

my_df.drop([1, 3, 4])
my_df.head()

Output:

If you want to drop rows in place, you need to pass True as the value for the
inplace parameter, as shown in the script below:

Script 36:

my_df.drop([1, 3, 4], inplace=True)

my_df.head()

Output:
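
The row-dropping workflow described above can be condensed into a few lines.
As a minimal sketch, assuming a two-column version of the dummy data (the
scores_df, kept, and renumbered names are made up for illustration), the
snippet below shows that drop() leaves the original dataframe untouched and
that reset_index(drop=True) renumbers the remaining rows.

```python
import pandas as pd

scores_df = pd.DataFrame({
    "Subject": ["Mathematics", "History", "English", "Science", "Arts"],
    "Score": [85, 98, 76, 72, 95],
})

kept = scores_df.drop([1, 4])              # new frame; scores_df is unchanged
renumbered = kept.reset_index(drop=True)   # fresh 0..n-1 index, old labels discarded

print(len(scores_df))            # 5
print(list(kept.index))          # [0, 2, 3]
print(list(renumbered.index))    # [0, 1, 2]
```

Assigning the result to a new variable, as done here, keeps the original data
available for later steps.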

3.2.2. Dropping Columns

You can also drop columns using the drop() method.

The following script creates a dummy dataframe for this section.

Script 37:

import pandas as pd

my_df = pd.DataFrame(scores)
my_df.head()

Output:

To drop columns via the drop() method, you need to pass the list of columns
to the drop() method, along with 1 as the value for the axis parameter of the
drop method.

The following script drops the columns Subject and Grade from our dummy
dataframe.

Script 38:

my_df2 = my_df.drop(["Subject", "Grade"], axis=1)

my_df2.head()

Output:

You can also drop columns in place from a dataframe by passing inplace=True,
as shown in the script below.

Script 39:

my_df.drop(["Subject", "Grade"], axis=1, inplace=True)

my_df.head()

Output:
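
The drop() method also accepts a columns keyword, which some readers find
clearer than remembering that axis 1 means columns. The following sketch,
assuming a two-row version of the dummy data (scores_df and the result names
are made up for illustration), shows that both spellings give the same
result.

```python
import pandas as pd

scores_df = pd.DataFrame({
    "Subject": ["Mathematics", "History"],
    "Score": [85, 98],
    "Grade": ["B", "A"],
    "Remarks": ["Good", "Excellent"],
})

with_axis = scores_df.drop(["Subject", "Grade"], axis=1)
with_keyword = scores_df.drop(columns=["Subject", "Grade"])

print(list(with_keyword.columns))       # ['Score', 'Remarks']
print(with_axis.equals(with_keyword))   # True
```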

3.3. Filtering Rows and Columns with Filter Method

The drop() method drops the unwanted records, while the filter() method does
the reverse: it keeps only the rows or columns whose labels you specify. Note
that filter() matches on index and column labels, not on the values stored in
the dataframe.

3.3.1. Filtering Rows

Run the following script to create a dummy dataframe for this section.
Script 40:

import pandas as pd

my_df = pd.DataFrame(scores)
my_df.head()

Output:

To keep specific rows using the filter() method, you need to pass the list of
row indexes to keep to the filter() method of the Pandas dataframe. Along with
that, you need to pass 0 as the value for the axis parameter of the filter()
method. Here is an example. The script below keeps only the rows with indexes
1, 3, and 4 from the Pandas dataframe.

Script 41:

my_df2 = my_df.filter([1, 3, 4], axis=0)

my_df2.head()

Output:

You can also reset indexes after filtering data using the reset_index() method,
as shown in the following script:

Script 42:

my_df2 = my_df2.reset_index(drop=True)
my_df2.head()

Output:

3.3.2. Filtering Columns

The dummy dataframe for this section is created using the following script:

Script 43:

import pandas as pd

my_df = pd.DataFrame(scores)
my_df.head()
Output:

To filter columns using the filter() method, you need to pass the list of column
names to the filter() method. Furthermore, you need to set 1 as the value for
the axis parameter.

The script below keeps only the Score and Grade columns from your dummy
dataframe.

Script 44:

my_df2 = my_df.filter(["Score", "Grade"], axis=1)

my_df2.head()

Output:
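
Besides an explicit list of labels, filter() can match labels by substring or
by regular expression through its like and regex parameters. As a hedged
sketch, assuming a two-row version of the dummy data (scores_df and the
result names are made up for illustration), the snippet below uses all three
options on the column labels.

```python
import pandas as pd

scores_df = pd.DataFrame({
    "Subject": ["Mathematics", "History"],
    "Score": [85, 98],
    "Grade": ["B", "A"],
    "Remarks": ["Good", "Excellent"],
})

by_items = scores_df.filter(items=["Score", "Grade"])   # exact column labels
by_like = scores_df.filter(like="S", axis=1)            # labels containing "S"
by_regex = scores_df.filter(regex="s$", axis=1)         # labels ending in "s"

print(list(by_items.columns))   # ['Score', 'Grade']
print(list(by_like.columns))    # ['Subject', 'Score']
print(list(by_regex.columns))   # ['Remarks']
```

In every case the match is made against the labels, so the number of rows
stays the same.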

3.4. Sorting Dataframes

You can also sort records in your Pandas dataframe based on values in a
particular column. Let’s see how to do this.

For this section, you will be using the Titanic dataset, which you can import
from the Seaborn library using the following script:

Script 45:

import matplotlib.pyplot as plt

import seaborn as sns

# sets the default style for plotting

sns.set_style("darkgrid")

titanic_data = sns.load_dataset('titanic')
titanic_data.head()

Output:

To sort the Pandas dataframe, you can use the sort_values() method of the
Pandas dataframe. The list of columns used for sorting needs to be passed to
the by parameter of the sort_values() method.

The following script sorts the Titanic dataset in ascending order of the
passenger’s age.

Script 46:

age_sorted_data = titanic_data.sort_values(by=['age'])
age_sorted_data.head()

Output:

To sort in descending order, you need to pass False as the value for the
ascending parameter of the sort_values() method.

The following script sorts the dataset by descending order of age.

Script 47:

age_sorted_data = titanic_data.sort_values(by=['age'], ascending = False)

age_sorted_data.head()

Output:

You can also pass multiple columns to the by parameter of the sort_values()
method. In such a case, the dataset will be sorted by the first column, and in
the case of equal values for two or more records, the dataset will be sorted by
the second column and so on.

The following script sorts the data by age and then by fare, both in
descending order.

Script 48:

age_sorted_data = titanic_data.sort_values(by=['age','fare'], ascending = False)

age_sorted_data.head()
Output:
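
The ascending parameter also accepts a list with one value per sort column,
and missing values are placed last unless you say otherwise. The following is
a minimal sketch on a made-up four-row dataframe (the passengers variable and
its values are assumptions for illustration, not the Titanic data).

```python
import pandas as pd

# Hypothetical data; one age is missing on purpose.
passengers = pd.DataFrame({
    "age": [30.0, 22.0, 30.0, None],
    "fare": [10.0, 7.25, 50.0, 8.05],
})

# Oldest first; among passengers of equal age, cheapest fare first.
ordered = passengers.sort_values(by=["age", "fare"], ascending=[False, True])
print(list(ordered.index))       # [0, 2, 1, 3]

# Rows with a missing age go last by default; na_position moves them first.
nulls_first = passengers.sort_values(by="age", na_position="first")
print(nulls_first.index[0])      # 3
```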

3.5. Pandas Unique and Count Functions

In this section, you will see how you can get a list of unique values, the
number of all unique values, and records per unique value from a column in a
Pandas dataframe.

You will be using the Titanic dataset once again, which you can load via the
following script.

Script 49:

import matplotlib.pyplot as plt

import seaborn as sns

# sets the default style for plotting

sns.set_style("darkgrid")

titanic_data = sns.load_dataset('titanic')
titanic_data.head()

Output:

To get all the unique values in a column, you can use the unique() method. The
script below returns all the unique values from the class column of the
Titanic dataset.

Script 50:

titanic_data["class"].unique()

Output:

['Third', 'First', 'Second']
Categories (3, object): ['Third', 'First', 'Second']

To get the count of unique values, you can use the nunique() method, as shown
in the script below.

Script 51:

titanic_data["class"].nunique()

Output:

To get the count of non-null values for all the columns in your dataset, you
may call the count() method on the Pandas dataframe. The following script
prints the count of the total number of non-null values in all the columns of the
Titanic dataset.

Script 52:

titanic_data.count()

Output:

survived 891
pclass 891
sex 891
age 714
sibsp 891
parch 891
fare 891
embarked 889
class 891
who 891
adult_male 891
deck 203
embark_town 889
alive 891
alone 891
dtype: int64

Finally, if you want to find the number of records for all the unique values in a
dataframe column, you may use the value_counts() function.

The script below returns counts of records for all the unique values in the
class column.

Script 53:

titanic_data["class"].value_counts()

Output:

Third 491
First 216
Second 184
Name: class, dtype: int64
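
The four methods in this section treat missing values differently, which can
explain counts that do not add up. As a small sketch, assuming a made-up
six-item series with one missing entry (the classes variable and its values
are assumptions for illustration), the snippet below compares them.

```python
import pandas as pd

# Hypothetical column with one missing value.
classes = pd.Series(
    ["Third", "First", "Third", None, "Second", "Third"], name="class"
)

distinct = classes.unique()        # distinct values, the missing one included
n_distinct = classes.nunique()     # missing values are ignored by default
n_present = classes.count()        # number of non-null entries
per_value = classes.value_counts() # most frequent first, missing values excluded

print(len(distinct))               # 4
print(n_distinct)                  # 3
print(n_present)                   # 5
print(per_value.index[0])          # Third

# dropna=False adds a row for the missing entries, so the counts sum to 6.
print(int(classes.value_counts(dropna=False).sum()))   # 6
```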

Hands-on Time – Exercises

Now, it is your turn. Follow the instructions in the exercises below to
check your understanding of Pandas dataframe manipulation techniques that
you learned in this chapter. The answers to these questions are given at the
end of the book.

Exercise 3.1
Question 1:

Which function is used to sort Pandas dataframe by a column value?

A. sort_dataframe()
B. sort_rows()
C. sort_values()
D. sort_records()

Question 2:

To filter columns from a Pandas dataframe, you have to pass a list of column
names to one of the following methods:
A. filter()
B. filter_columns()
C. apply_filter ()
D. None of the above

Question 3:

To drop the second and fourth rows from a Pandas dataframe named my_df,
you can use the following script:
A. my_df.drop([2,4])
B. my_df.drop([1,3])
C. my_df.delete([2,4])
D. my_df.delete([1,3])

Exercise 3.2

From the Titanic dataset, filter all the records where the fare is greater than 20
and the passenger traveled alone. You can access the Titanic dataset using the
following Seaborn command:

import seaborn as sns

titanic_data = sns.load_dataset('titanic')
