To count distinct values, use nunique() in Pandas. We will group by one column, count the distinct values of a second column, and find the sum of a third using NumPy sum().
First, import the required libraries −
import pandas as pd
import numpy as np
Create a DataFrame with 3 columns. The columns have duplicate values −
dataFrame = pd.DataFrame(
   {
      "Car": ['BMW', 'Audi', 'BMW', 'Lexus', 'Lexus'],
      "Place": ['Delhi', 'Bangalore', 'Delhi', 'Chandigarh', 'Chandigarh'],
      "Units": [100, 150, 50, 110, 90]
   }
)
Count distinct values in the aggregation agg() with nunique. Here, "Place" is counted with nunique, while "Units" is totalled with NumPy sum() −
dataFrame = dataFrame.groupby("Car").agg({"Units": np.sum, "Place": pd.Series.nunique})
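The same aggregation can also be written with string aliases and named aggregation, which lets you choose the output column names. The following is a minimal sketch, not part of the original example; the names df, summary, total_units and distinct_places are chosen here only for illustration. Recent Pandas versions may emit a FutureWarning when np.sum is passed to agg(), and the string "sum" is one way to avoid that −

```python
import pandas as pd

df = pd.DataFrame(
   {
      "Car": ['BMW', 'Audi', 'BMW', 'Lexus', 'Lexus'],
      "Place": ['Delhi', 'Bangalore', 'Delhi', 'Chandigarh', 'Chandigarh'],
      "Units": [100, 150, 50, 110, 90]
   }
)

# named aggregation: output_column=(input_column, aggregation)
# total_units and distinct_places are illustrative names
summary = df.groupby("Car").agg(
   total_units=("Units", "sum"),
   distinct_places=("Place", "nunique")
)

print(summary)
```

This form does not need NumPy, and the result columns carry descriptive names instead of reusing "Units" and "Place".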
Example
Following is the code −
import pandas as pd
import numpy as np

# create a DataFrame with duplicate values in each column
dataFrame = pd.DataFrame(
   {
      "Car": ['BMW', 'Audi', 'BMW', 'Lexus', 'Lexus'],
      "Place": ['Delhi', 'Bangalore', 'Delhi', 'Chandigarh', 'Chandigarh'],
      "Units": [100, 150, 50, 110, 90]
   }
)

print("DataFrame ...")
print(dataFrame)

# count distinct in aggregation with nunique
dataFrame = dataFrame.groupby("Car").agg({"Units": np.sum, "Place": pd.Series.nunique})

print("\nUpdated DataFrame ...")
print(dataFrame)
Output
This will produce the following output −
DataFrame ...
     Car       Place  Units
0    BMW       Delhi    100
1   Audi   Bangalore    150
2    BMW       Delhi     50
3  Lexus  Chandigarh    110
4  Lexus  Chandigarh     90

Updated DataFrame ...
       Units  Place
Car
Audi     150      1
BMW      150      1
Lexus    200      1
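In the output above, every car appears in only one place, so nunique returns 1 for each group. To see how nunique differs from a plain row count, the following sketch uses a small hypothetical DataFrame (the sales data and the extra 'Mumbai' row are assumptions made for illustration, not part of the example above) −

```python
import pandas as pd

# hypothetical data: BMW is listed three times, in two different places
sales = pd.DataFrame(
   {
      "Car": ['BMW', 'BMW', 'BMW', 'Audi'],
      "Place": ['Delhi', 'Mumbai', 'Delhi', 'Bangalore']
   }
)

# count gives the number of rows per group, nunique gives the number of distinct values
counts = sales.groupby("Car")["Place"].agg(["count", "nunique"])

print(counts)
```

For BMW, count is 3 because there are three rows, while nunique is 2 because 'Delhi' is repeated.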