How to Sort a Pandas DataFrame?
Last Updated: 02 Dec, 2024
Pandas provides the sort_values() method for sorting a DataFrame by one or more columns. It can sort in ascending or descending order, control where missing values are placed, and apply custom sorting logic through a key function. To see how sorting works, start with a simple example:
1. Sort DataFrame by One Column Value
To sort a DataFrame by a single column, call the sort_values() method and pass the column name to the by parameter.
Python
import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
'Age': [25, 30, 35, 40],
'Score': [85, 90, 95, 80]}
df = pd.DataFrame(data)
# Sorting by 'Age' in ascending order
sorted_df = df.sort_values(by='Age')
print(sorted_df)
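Output:
      Name  Age  Score
0    Alice   25     85
1      Bob   30     90
2  Charlie   35     95
3    David   40     80
In this example, the DataFrame is sorted by the Age column in ascending order. The rows were already in age order, so their order does not change. The rest of this section looks at how the method works in more detail.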
Sorting places related rows next to each other, which makes a large dataset easier to scan, rank, and compare. In Pandas, the sort_values() method sorts a DataFrame by one or more columns. By default, it sorts in ascending order and returns a new DataFrame, and its behavior can be customized with several parameters.
Key Parameters of sort_values():
- by: The column name, or list of column names, to sort by.
- ascending: Boolean or list of booleans (default True). If False, sorts in descending order. A list sets the order for each column in by.
- inplace: If True, sorts the original DataFrame in place and returns None; otherwise returns a new sorted DataFrame (default False).
- na_position: Where to place NaN values: 'first' or 'last' (default 'last').
- ignore_index: If True, relabels the index of the result as 0, 1, ..., n-1 instead of keeping the original labels (default False).
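The effect of ignore_index and inplace is easy to miss when reading the parameter list, so here is a minimal sketch using the same sample data as the first example. The variable names by_score and result are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [25, 30, 35, 40],
    "Score": [85, 90, 95, 80],
})

# ignore_index=True relabels the sorted rows 0..n-1 instead of keeping the old labels
by_score = df.sort_values(by="Score", ignore_index=True)
print(by_score)

# inplace=True sorts df itself and returns None, so the return value is not useful
result = df.sort_values(by="Score", ascending=False, inplace=True)
print(result)
print(df)
```

Because inplace=True returns None, writing df = df.sort_values(..., inplace=True) would replace the DataFrame with None. Assigning the result of a regular call is usually the easier pattern to read.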
By default, sorting is done in ascending order. To sort in descending order, set the ascending parameter to False.
Python
import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
'Age': [25, 30, 35, 40],
'Score': [85, 90, 95, 80]}
df = pd.DataFrame(data)
# Sorting by 'Age' in descending order
sorted_df = df.sort_values(by='Age', ascending=False)
print(sorted_df)
Output:
      Name  Age  Score
3    David   40     80
2  Charlie   35     95
1      Bob   30     90
0    Alice   25     85
2. Sort DataFrame by Multiple Columns
Sometimes you need to sort by more than one criterion. For example, you might want to sort by age and then by score. To do this, pass a list of column names to the by parameter.
Python
import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
'Age': [25, 30, 35, 40],
'Score': [85, 90, 95, 80]}
df = pd.DataFrame(data)
# Sorting by 'Age', then by 'Score', both in ascending order
sorted_df = df.sort_values(by=['Age', 'Score'])
print(sorted_df)
Output:
      Name  Age  Score
0    Alice   25     85
1      Bob   30     90
2  Charlie   35     95
3    David   40     80
This sorts by Age first and, where there are ties (same age), sorts those rows by Score. In this data every age is unique, so Score never has to break a tie. You can also specify a different sort order for each column by passing a list of boolean values to the ascending parameter.
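To see the second column actually break ties, and to show a per-column sort order, here is a small sketch. The data is made up for illustration and deliberately repeats ages; it is not taken from the examples above.

```python
import pandas as pd

# Illustrative data with repeated ages so that 'Score' decides the order within each age
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eve"],
    "Age": [30, 25, 30, 25, 30],
    "Score": [85, 90, 95, 80, 88],
})

# 'Age' ascending, then 'Score' descending within each age group
sorted_df = df.sort_values(by=["Age", "Score"], ascending=[True, False])
print(sorted_df)
```

The ascending list must have the same length as the by list, with one boolean per column.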
3. Sort DataFrame with Missing Values
When a dataset contains missing values, you can control where they appear with the na_position parameter of sort_values(). By default, missing values are placed last, but you can place them first if needed.
Python
import pandas as pd
data_with_nan = {"Name": ["Alice", "Bob", "Charlie", "David"],"Age": [28, 22, None, 22]}
df_nan = pd.DataFrame(data_with_nan)
# Sort by 'Age', placing missing values first
sorted_df = df_nan.sort_values(by="Age", na_position="first")
print(sorted_df)
Output:
      Name   Age
2  Charlie   NaN
1      Bob  22.0
3    David  22.0
0    Alice  28.0
The na_position='first' option moves rows with NaN values to the top during sorting.
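One detail worth checking is how na_position interacts with ascending: the position of NaN rows is set by na_position alone and is not flipped by a descending sort. A minimal sketch, using made-up ages with no duplicates so the expected order is unambiguous:

```python
import pandas as pd

# Illustrative data: one missing age, all other ages distinct
df_nan = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [28, 22, None, 31],
})

# Descending sort with the default na_position='last': NaN still ends up at the bottom
desc_default = df_nan.sort_values(by="Age", ascending=False)
print(desc_default)

# Descending sort with NaN explicitly placed first
desc_first = df_nan.sort_values(by="Age", ascending=False, na_position="first")
print(desc_first)
```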
Choosing the Sorting Algorithm
Pandas lets you choose the sorting algorithm with the kind parameter. The default is 'quicksort'. The available options are:
- 'quicksort': A divide-and-conquer algorithm that selects a "pivot" element and partitions the data into elements smaller than the pivot and elements greater than it. It is fast on average but not stable.
- 'mergesort': Divides the data into smaller subarrays, sorts them, and then merges them back together in sorted order. It is stable.
- 'heapsort': A comparison-based algorithm that builds a heap data structure and repeatedly extracts the largest or smallest element to reorder the data. It is not stable.
- 'stable': Requests a stable sort without naming a specific algorithm.
For DataFrames, kind is only applied when sorting on a single column or label.
To demonstrate the main benefit of the 'mergesort' algorithm, its stability, the next example includes duplicate values in the column being sorted.
Python
import pandas as pd
# Create a DataFrame with duplicate 'Age' values
data = {
"Name": ["Alice", "Bob", "Charlie", "David", "Eve"],
"Age": [28, 22, 25, 22, 28],
"Score": [85, 90, 95, 80, 88]
}
df = pd.DataFrame(data)
# Sort the DataFrame by 'Age' using the 'mergesort' algorithm
sorted_df = df.sort_values(by='Age', kind='mergesort')
print(sorted_df)
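Output:
      Name  Age  Score
1      Bob   22     90
3    David   22     80
2  Charlie   25     95
0    Alice   28     85
4      Eve   28     88
Stability ensures that the relative order of rows with equal values in the sorting column is preserved. Here Bob stays ahead of David and Alice stays ahead of Eve, just as in the original DataFrame.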
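A common reason to care about stability is sorting in two passes: sort by the secondary column first, then do a stable sort by the primary column. The sketch below assumes the same five-row data as the mergesort example; the names by_score, two_pass, and one_step are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eve"],
    "Age": [28, 22, 25, 22, 28],
    "Score": [85, 90, 95, 80, 88],
})

# Pass 1: order rows by 'Score', highest first
by_score = df.sort_values(by="Score", ascending=False)

# Pass 2: stable sort by 'Age'; rows with equal 'Age' keep their order from pass 1
two_pass = by_score.sort_values(by="Age", kind="mergesort")
print(two_pass)

# The same ordering expressed as a single multi-column sort
one_step = df.sort_values(by=["Age", "Score"], ascending=[True, False])
print(one_step)
```

With an unstable algorithm, the second pass would be free to reorder rows that share an age, so the two results would not be guaranteed to match.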
Custom Sorting with Key Functions
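You can also apply custom sorting logic using the key parameter. The key function receives the whole column as a Series and must return a Series of the same length, and the rows are then ordered by the returned values. For example, to sort strings while ignoring case: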
Python
import pandas as pd
data = {
"Name": ["Alice", "Bob", "Charlie", "David", "Eve"],
"Age": [28, 22, 25, 22, 28],
"Score": [85, 90, 95, 80, 88]
}
df = pd.DataFrame(data)
sorted_df = df.sort_values(by='Name', key=lambda col: col.str.lower())
print(sorted_df)
Output:
      Name  Age  Score
0    Alice   28     85
1      Bob   22     90
2  Charlie   25     95
3    David   22     80
4      Eve   28     88
This sorts the names alphabetically without regard to case. Every name in this example is capitalized the same way, so the result matches a plain sort by Name.
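The difference only becomes visible with mixed-case data. A minimal sketch, using made-up names where some start with a lowercase letter:

```python
import pandas as pd

# Illustrative data with inconsistent capitalization
df = pd.DataFrame({"Name": ["bob", "Alice", "david", "Charlie"]})

# Default comparison: uppercase letters sort before lowercase ones
default_order = df.sort_values(by="Name")
print(default_order)

# With the key function, names are compared by their lowercase form
ignore_case = df.sort_values(by="Name", key=lambda col: col.str.lower())
print(ignore_case)
```

The key function only affects the comparison; the values stored in the Name column keep their original capitalization.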
Key Takeaways:
- sort_values() is versatile and allows sorting by one or multiple columns.
- You can control whether sorting is ascending or descending using the ascending parameter.
- Missing values (NaN) can be placed at either the beginning or end using the na_position parameter.
- Custom sorting logic can be applied using the key parameter.