Grouping Categorical Variables in Pandas Dataframe

Last Updated : 15 Jul, 2025

First, we need to understand what categorical variables are in pandas. Categorical is a data type available in the pandas library for Python. A categorical variable takes its values from a limited, usually fixed, set of categories. Examples of categorical variables are gender, blood group and language. One key difference from numeric variables is that arithmetic operations cannot be performed on categorical values.

A DataFrame of categorical values can be created by passing dtype="category" to the DataFrame constructor.

Python3
# importing pandas as pd
import pandas as pd

# Create the dataframe with categorical variables;
# dtype="category" converts every column
df = pd.DataFrame({'A': ['a', 'b', 'c', 'c', 'a', 'b'],
                   'B': [0, 1, 1, 0, 1, 0]},
                  dtype="category")

# show the data types
print(df.dtypes)

Output: both columns, A and B, are reported with the category dtype.

One important point is that the columns do not share the same categories. The conversion is done column by column: column A gets the categories a, b and c, while column B gets the categories 0 and 1.

Often we need to group categorical data. This is done with the groupby() method in pandas. When the grouping columns are categorical, the result can contain every combination of their categories, including combinations that never occur in the data; this is controlled by the observed parameter of groupby(). groupby() only splits the rows into groups, so we follow it with an aggregate function that specifies how each group is summarized. Some aggregate functions are mean(), sum() and count().

Now let us apply groupby() together with the count() function.

Python3
# initial state
print(df)

# counting the number of rows in each category of 'A'
print(df.groupby(['A']).count().reset_index())

Output (figures not reproduced here): the DataFrame, and the result grouped by column 'A'.

Now, one more example, this time with the mean() function. Here only column A is converted to categorical and the other columns stay numerical. The mean of column C is calculated for each combination of the values of column A and column B.
Python3
# importing pandas as pd
import pandas as pd

# Create the dataframe
df = pd.DataFrame({'A': ['a', 'b', 'c', 'c', 'a', 'b'],
                   'B': [0, 1, 1, 0, 1, 0],
                   'C': [7, 8, 9, 5, 3, 6]})

# change the datatype of column 'A' to the category data type
df['A'] = df['A'].astype('category')

# initial state
print(df)

# calculating the mean of 'C' for
# all combinations of A and B
print(df.groupby(['A', 'B']).mean().reset_index())

Output (figures not reproduced here): the DataFrame, and the result grouped by both columns 'A' and 'B'.

Other aggregate functions are applied in the same way after groupby().

Author: vipul1501
Article Tags : Python, Python-pandas, Python pandas-dataFrame
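As a minimal sketch of the closing remark about other aggregate functions, the snippet below reuses the article's second DataFrame and applies sum, min and max in one call through agg(). It also illustrates the point about category combinations: the extra category 'd' is an assumption added only for this illustration, and observed is passed explicitly because its default has differed across pandas versions.

```python
import pandas as pd

# Same data as the second example in the article
df = pd.DataFrame({'A': ['a', 'b', 'c', 'c', 'a', 'b'],
                   'B': [0, 1, 1, 0, 1, 0],
                   'C': [7, 8, 9, 5, 3, 6]})
df['A'] = df['A'].astype('category')

# Several aggregates of column 'C' for each category of 'A'
summary = df.groupby('A', observed=True)['C'].agg(['sum', 'min', 'max'])
print(summary)

# Hypothetical extra category 'd' that never occurs in the data
df['A'] = df['A'].cat.add_categories(['d'])

# observed=False keeps unused categories in the result (count is 0 for 'd')
all_cats = df.groupby('A', observed=False)['C'].count()
print(all_cats)

# observed=True keeps only the categories that actually appear
seen_only = df.groupby('A', observed=True)['C'].count()
print(seen_only)
```

Passing observed explicitly keeps the result the same regardless of the installed pandas version, which matters most when a categorical column carries categories that are absent from the rows being grouped.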