How to Count Distinct Values of a Pandas DataFrame Column?
Last Updated: 02 Dec, 2024
Let's discuss how to count distinct values of a Pandas DataFrame column.
Using pandas.unique()
You can use pd.unique() to get all unique values in a column. To count them, apply len() to the result. This method is useful when you want both the distinct values themselves and their count.
Python
import pandas as pd
# Create DataFrame
df = pd.DataFrame({
    'height': [165, 165, 164, 158, 167, 160, 158, 165],
    'weight': [63.5, 64, 63.5, 54, 63.5, 62, 64, 64],
    'age': [20, 22, 22, 21, 23, 22, 20, 21]
}, index=['Steve', 'Ria', 'Nivi', 'Jane', 'Kate', 'Lucy', 'Ram', 'Niki'])
# Count unique values in 'height' column using unique()
n = len(pd.unique(df['height']))
print("Number of unique values in 'height':", n)
Output
Number of unique values in 'height': 5
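The example above prints only the count. As a minimal sketch of getting both the values and the count, the snippet below uses the equivalent Series.unique() method on a reduced, single-column copy of the same data. The variable names are illustrative.

```python
import pandas as pd

# Reduced copy of the article's data: only the 'height' column
df = pd.DataFrame({'height': [165, 165, 164, 158, 167, 160, 158, 165]})

# Series.unique() returns the distinct values in order of first appearance
values = df['height'].unique()
n = len(values)

print("Unique values:", values.tolist())
print("Number of unique values:", n)
```

Because the values come back in order of first appearance rather than sorted, apply sorted() to the result if you need them in ascending order.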
In addition to pandas.unique(), there are several other ways to count distinct values in a Pandas DataFrame.
Using nunique()
The nunique() method counts the distinct values in each column, which makes it a quick way to summarize one or more columns. By default it ignores missing (NaN) values.
Python
import pandas as pd
# Create DataFrame
df = pd.DataFrame({
    'height': [165, 165, 164, 158, 167, 160, 158, 165],
    'weight': [63.5, 64, 63.5, 54, 63.5, 62, 64, 64],
    'age': [20, 22, 22, 21, 23, 22, 20, 21]
}, index=['Steve', 'Ria', 'Nivi', 'Jane', 'Kate', 'Lucy', 'Ram', 'Niki'])
# Count unique values in each column using nunique()
n = df.nunique()
print("Number of unique values in each column:\n", n)
Output
Number of unique values in each column:
 height    5
weight    4
age       4
dtype: int64
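Since the question is usually about one specific column, it may help to see nunique() called on a single column and on a subset of columns. The sketch below also illustrates the dropna parameter. The Series with a missing value is a hypothetical example and is not part of the article's data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'height': [165, 165, 164, 158, 167, 160, 158, 165],
    'age': [20, 22, 22, 21, 23, 22, 20, 21]
})

# Distinct count for a single column (returns an integer)
n_height = df['height'].nunique()

# Distinct counts for a subset of columns (returns a Series)
n_subset = df[['height', 'age']].nunique()

# Hypothetical column with a missing value, to show the dropna parameter
s = pd.Series([63.5, 64.0, np.nan, 63.5])
n_without_nan = s.nunique()              # NaN is ignored by default
n_with_nan = s.nunique(dropna=False)     # NaN counted as one distinct value

print(n_height, n_subset.to_dict(), n_without_nan, n_with_nan)
```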
Using Series.value_counts()
value_counts() returns the frequency of each unique value in a column. Use it when you want the distribution of values as well as the distinct count: the length of the result is the number of unique values.
Python
import pandas as pd
# Create DataFrame
df = pd.DataFrame({
    'height': [165, 165, 164, 158, 167, 160, 158, 165],
    'weight': [63.5, 64, 63.5, 54, 63.5, 62, 64, 64],
    'age': [20, 22, 22, 21, 23, 22, 20, 21]
}, index=['Steve', 'Ria', 'Nivi', 'Jane', 'Kate', 'Lucy', 'Ram', 'Niki'])
# Count unique values in 'height' column using value_counts()
counts = df['height'].value_counts()
print("Number of unique values in 'height':", len(counts))
Output
Number of unique values in 'height': 5
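The example above uses only the length of the result. As a rough sketch of the "distribution" side, the snippet below looks up individual frequencies and the most frequent value on a single-column copy of the same data. The variable names are illustrative.

```python
import pandas as pd

# Reduced copy of the article's data: only the 'height' column
df = pd.DataFrame({'height': [165, 165, 164, 158, 167, 160, 158, 165]})

# Index = distinct values, values = how often each one occurs
counts = df['height'].value_counts()

n = len(counts)                  # number of distinct values
freq_165 = counts.loc[165]       # how many times 165 occurs
most_common = counts.idxmax()    # value with the highest frequency

print(counts)
print("Distinct:", n, "| 165 occurs:", freq_165, "| most common:", most_common)
```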
Using a For Loop
A for loop can count unique values manually by tracking which values have already been visited. This is typically slower than the built-in Pandas methods, but it is useful when you need custom logic for deciding what counts as distinct.
Python
import pandas as pd
# Create DataFrame
df = pd.DataFrame({
    'height': [165, 165, 164, 158, 167, 160, 158, 165],
    'weight': [63.5, 64, 63.5, 54, 63.5, 62, 64, 64],
    'age': [20, 22, 22, 21, 23, 22, 20, 21]
}, index=['Steve', 'Ria', 'Nivi', 'Jane', 'Kate', 'Lucy', 'Ram', 'Niki'])
# Count unique values in 'height' column using a for loop
cnt = 0
visited = []
for value in df['height']:
    if value not in visited:
        visited.append(value)
        cnt += 1
print("Number of unique values in 'height':", cnt)
print("Unique values:", visited)
Output
Number of unique values in 'height': 5
Unique values: [165, 164, 158, 167, 160]
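Checking membership in a list gets slower as the list grows. One possible variation, sketched below, keeps a set alongside the list so that each lookup is fast while the order of first appearance is preserved. It assumes the column contains no missing values, and the names seen and ordered are illustrative.

```python
import pandas as pd

# Reduced copy of the article's data: only the 'height' column
df = pd.DataFrame({'height': [165, 165, 164, 158, 167, 160, 158, 165]})

seen = set()     # fast membership checks
ordered = []     # distinct values in order of first appearance

for value in df['height']:
    if value not in seen:
        seen.add(value)
        ordered.append(value)

print("Number of unique values in 'height':", len(seen))
print("Unique values:", ordered)
```

If the order does not matter, len(set(df['height'])) gives the same count in a single expression.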
Using drop_duplicates() Method
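drop_duplicates() removes repeated values and returns the remaining distinct values as a new Series or DataFrame, keeping the index label of each value's first occurrence. It is a good alternative to unique() when you want to keep working with the distinct values as a Pandas object. Counting the entries of the result gives the number of distinct values.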
Python
import pandas as pd
# Create DataFrame
df = pd.DataFrame({
    'height': [165, 165, 164, 158, 167, 160, 158, 165],
    'weight': [63.5, 64, 63.5, 54, 63.5, 62, 64, 64],
    'age': [20, 22, 22, 21, 23, 22, 20, 21]
}, index=['Steve', 'Ria', 'Nivi', 'Jane', 'Kate', 'Lucy', 'Ram', 'Niki'])
# Count unique values in 'height' column using drop_duplicates()
unique_values = df['height'].drop_duplicates()
print("Unique values in 'height':", unique_values)
print("Number of unique values in 'height':", unique_values.count())
Output
Unique values in 'height': Steve    165
Nivi     164
Jane     158
Kate     167
Lucy     160
Name: height, dtype: int64
Number of unique values in 'height': 5
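One detail worth knowing about the example above: count() skips missing values, while len() does not. The article's data has no missing values, so both give 5 there. The sketch below uses a hypothetical Series with NaN entries to show where the two differ.

```python
import numpy as np
import pandas as pd

# Hypothetical column with missing entries (not part of the article's data)
s = pd.Series([165, 164, np.nan, 165, np.nan])

# Repeated NaN entries collapse into a single NaN row
distinct = s.drop_duplicates()

n_non_missing = distinct.count()    # NaN is excluded from the count
n_including_nan = len(distinct)     # NaN is kept as one distinct entry

print(n_non_missing, n_including_nan)
```

Which of the two is appropriate depends on whether a missing value should count as a distinct value in your analysis.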