
Finding Quantile and Decile Ranks of a Pandas DataFrame Column
Quantile and decile ranks are commonly used statistical measures that describe the position of an observation relative to the rest of a dataset. A quantile (percentile) rank expresses that position as a fraction between 0 and 1, while a decile rank assigns the observation to one of ten groups numbered 1 to 10. In this technical blog, we will explore how to find both for a Pandas DataFrame column in Python.
Installation and Syntax
pip install pandas
The syntax for finding the quantile and decile ranks of a Pandas DataFrame column is as follows:
# For finding quantile rank
df['column_name'].rank(pct=True)
# For finding decile rank (rank() has no bins parameter, so pd.cut() is used)
pd.cut(df['column_name'], 10, labels=range(1, 11)).astype(int)
Algorithm
Load the data into a Pandas DataFrame.
Select the column for which you want to find the quantile and decile ranks.
Use the rank() method with the pct parameter set to True to find the quantile rank of each observation in the column.
Use the pd.cut() function with 10 bins and the labels 1 to 10 to find the decile rank of each observation in the column.
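The steps above can be sketched end to end as follows. This is a minimal illustration, not a definitive implementation: the column name score and its values are made up for the example, and the decile step uses pd.cut() in the same way as the examples later in this article.

```python
import pandas as pd

# Step 1: load the data (hypothetical values; "score" is an assumed column name).
df = pd.DataFrame({"score": [12, 45, 7, 33, 28, 51, 19, 40, 3, 60]})

# Step 2: select the column to rank.
col = df["score"]

# Step 3: quantile rank, a fraction in (0, 1].
df["score_quantile_rank"] = col.rank(pct=True)

# Step 4: decile rank 1..10 from ten equal-width bins over the value range.
df["score_decile_rank"] = pd.cut(col, bins=10, labels=range(1, 11)).astype(int)

print(df)
```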
Example 1
import pandas as pd

# Create a DataFrame
data = {'A': [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]}
df = pd.DataFrame(data)

# Find the quantile rank
df['A_quantile_rank'] = df['A'].rank(pct=True)
print(df)
Output
    A  A_quantile_rank
0   1              0.1
1   3              0.2
2   5              0.3
3   7              0.4
4   9              0.5
5  11              0.6
6  13              0.7
7  15              0.8
8  17              0.9
9  19              1.0
We create a Pandas DataFrame with one column A containing 10 integers and then find the quantile rank of each observation in the A column using the rank() method with the pct parameter set to True. Each quantile rank is the observation's ordinary rank divided by the number of observations, so ten distinct values produce ranks that step evenly from 0.1 to 1.0. We store the quantile ranks in a new column A_quantile_rank and print the resulting DataFrame.
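To make the meaning of pct=True concrete, the short sketch below compares it with an ordinary rank divided by the number of non-null values. The data are hypothetical and include a tie to show that, with the default method='average', tied values share the mean of their ranks.

```python
import pandas as pd

# Hypothetical data with a tie (20 appears twice).
s = pd.Series([10, 20, 20, 30])

pct_rank = s.rank(pct=True)    # default method="average"
manual = s.rank() / s.count()  # ordinary rank divided by the non-null count

print(pct_rank.tolist())  # [0.25, 0.625, 0.625, 1.0]

# The two computations agree up to floating-point tolerance.
print((pct_rank - manual).abs().max() < 1e-12)
```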
Example 2
import pandas as pd

# Create a DataFrame
data = {'A': [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]}
df = pd.DataFrame(data)

# Find the decile rank
n = 10
df['A_decile_rank'] = pd.cut(df['A'], n, labels=range(1, n+1)).astype(int)
print(df)
Output
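    A  A_decile_rank
0   1              1
1   3              2
2   5              3
3   7              4
4   9              5
5  11              6
6  13              7
7  15              8
8  17              9
9  19             10
We create a Pandas DataFrame with one column A containing 10 integers. We then find the decile rank of each observation in the A column using pd.cut(), which splits the range of A into n = 10 equal-width bins labeled 1 to 10; astype(int) converts the resulting categorical labels to integers. Because the values are evenly spaced, each bin receives exactly one observation. Note that pd.cut() bins by value rather than by count, so on unevenly distributed data the bins will not each hold 10% of the observations; pd.qcut() bins by quantile instead. We store the decile ranks in a new column A_decile_rank and print the resulting DataFrame.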
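The pd.cut() call in Example 2 builds equal-width bins, which coincide with deciles only when the data are spread evenly. As a hedged illustration, the sketch below puts pd.cut() next to pd.qcut(), which builds quantile-based bins, on a small made-up series containing one extreme value. The variable names are chosen for this example only.

```python
import pandas as pd

# Hypothetical skewed data: nine small values and one extreme value.
s = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
labels = range(1, 11)

# Equal-width bins: the extreme value stretches the range, so the
# nine small values all fall into the first bin.
by_width = pd.cut(s, 10, labels=labels).astype(int)

# Quantile-based bins: each bin holds about 10% of the observations.
by_quantile = pd.qcut(s, 10, labels=labels).astype(int)

print(pd.DataFrame({"value": s, "cut": by_width, "qcut": by_quantile}))
```

If the goal is groups of roughly equal size, pd.qcut() is usually the closer match; pd.cut() is appropriate when the bins should represent equal slices of the value range.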
Example 3
import pandas as pd
import numpy as np

# Create a DataFrame
np.random.seed(42)
data = {'A': np.random.normal(0, 1, 1000), 'B': np.random.normal(5, 2, 1000)}
df = pd.DataFrame(data)

# Find the quantile rank of column A
df['A_quantile_rank'] = df['A'].rank(pct=True)

# Find the decile rank of column B
n = 10
df['B_decile_rank'] = pd.cut(df['B'], n, labels=range(1, n+1)).astype(int)

# Print the resulting DataFrame
print(df)
Output
            A         B  A_quantile_rank  B_decile_rank
0    0.496714  7.798711            0.693              8
1   -0.138264  6.849267            0.436              7
2    0.647689  5.119261            0.750              5
3    1.523030  3.706126            0.929              4
4   -0.234153  6.396447            0.405              6
..        ...       ...              ...            ...
995 -0.281100  7.140300            0.384              7
996  1.797687  4.946957            0.960              5
997  0.640843  3.236251            0.746              4
998 -0.571179  4.673866            0.276              5
999  0.572583  3.510195            0.718              4

[1000 rows x 4 columns]
We start with a Pandas DataFrame with two columns A and B, each containing 1000 randomly generated, normally distributed values. We then find the quantile rank of the A column using the rank() method with the pct parameter set to True and store the resulting ranks in a new column A_quantile_rank. We also find the decile rank of the B column using pd.cut() with n = 10 equal-width bins labeled 1 to 10 and store the result in a new column B_decile_rank. Because the bins have equal width and the data are normally distributed, the middle bins hold far more observations than the outer ones. Finally, we print the resulting DataFrame.
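Decile ranks can also be derived directly from ranks, which ties the two measures in this article together. The sketch below is one possible approach under stated assumptions: it uses rank(method='first') to obtain distinct ordinal ranks and integer-valued arithmetic to map them onto the groups 1 to 10. The seed and the column name B_decile_from_rank are arbitrary choices for the illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed for reproducibility
df = pd.DataFrame({"B": rng.normal(5, 2, 1000)})

n_bins = 10

# Ordinal rank 1..N; method="first" breaks ties by order of appearance.
ordinal = df["B"].rank(method="first")

# Map ranks 1..N onto deciles 1..10 using floor division on whole numbers.
df["B_decile_from_rank"] = ((ordinal - 1) * n_bins // len(df) + 1).astype(int)

# With 1000 rows, each decile receives exactly 100 observations.
print(df["B_decile_from_rank"].value_counts().sort_index())
```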
Applications
Identifying outliers in a dataset
Ranking observations in a dataset
Comparing observations in a dataset
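As a small illustration of the first application, the sketch below flags observations whose quantile rank falls in either tail. The data and the 0.10 and 0.90 thresholds are assumptions made for this example. A rank-based rule only marks positions in the tails, so flagged values are candidates for review rather than confirmed outliers.

```python
import pandas as pd

# Hypothetical measurements with one very low and one very high value.
values = pd.Series([10, 12, 11, 13, 12, 95, 11, 10, 13, 12,
                    11, 12, 10, 13, 11, 12, 13, 10, 11, -40])

pct = values.rank(pct=True)

# Assumed thresholds: flag the lowest and highest 10% by rank.
lower, upper = 0.10, 0.90
flagged = values[(pct < lower) | (pct > upper)]

print(flagged)
```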
Conclusion
This technical blog examined how to get the quantile and decile ranks of a Pandas DataFrame column in Python: the rank() method with the pct parameter set to True gives the quantile rank, and pd.cut() with 10 labeled bins gives a decile rank based on equal-width bins. Knowing these ranks helps in data analysis and visualization, since they make it easier to understand a dataset's distribution and spot outliers.