BUG: DataFrameGroupBy.sum ignores min_count for boolean data type #34051

Closed
@dsaxton

Description

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of pandas.
  • (optional) I have confirmed this bug exists on the master branch of pandas.

Behavior is from master:

import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": pd.array([True, True])})
df.groupby("a").sum(min_count=2)

gives

      b
a      
1  True
2  True

but expected output is

      b
a      
1  <NA>
2  <NA>
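
The expected output follows from the documented meaning of min_count: a group with fewer than min_count non-NA values should sum to NA. As a point of comparison, here is a minimal sketch on a plain float64 column (the column name x and the data are made up for illustration, not taken from the report):

```python
import pandas as pd

# Illustrative data: group 1 has two values, group 2 has only one.
df = pd.DataFrame({"a": [1, 1, 2], "x": [1.0, 2.0, 3.0]})

result = df.groupby("a")["x"].sum(min_count=2)

# Group 1 meets the threshold of two non-NA values and is summed;
# group 2 falls below it, so its result is NaN.
print(result)
```

In the reported example every group holds a single row, so with min_count=2 both groups fall below the threshold.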

It looks to me like there's an attempt to compute a Cythonized result which fails, after which point the min_count argument is forgotten.
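
To make that hypothesis concrete, here is a pure-Python sketch of the failure mode. Every function name below is hypothetical and none of this is pandas source code; it only shows how a fallback path can silently drop a keyword argument when the fast path raises.

```python
def fast_sum(values, min_count=0):
    # Hypothetical stand-in for a fast path that rejects some dtypes.
    if any(isinstance(v, bool) for v in values):
        raise NotImplementedError("dtype not supported by the fast path")
    valid = [v for v in values if v is not None]
    return sum(valid) if len(valid) >= min_count else None


def slow_sum(values, min_count=0):
    # Hypothetical generic fallback that does understand min_count.
    valid = [v for v in values if v is not None]
    return sum(valid) if len(valid) >= min_count else None


def buggy_group_sum(values, min_count=0):
    try:
        return fast_sum(values, min_count=min_count)
    except NotImplementedError:
        # min_count is not forwarded, so the fallback uses its default of 0.
        return slow_sum(values)


def fixed_group_sum(values, min_count=0):
    try:
        return fast_sum(values, min_count=min_count)
    except NotImplementedError:
        # Forwarding min_count keeps both paths consistent.
        return slow_sum(values, min_count=min_count)
```

With a single boolean value and min_count=2, buggy_group_sum returns a sum while fixed_group_sum returns None, which mirrors the difference between the observed and expected outputs above.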

Activity

On May 7, 2020:
  • Labels added: Needs Triage (issue that has not been reviewed by a pandas team member), NA - MaskedArrays (related to pd.NA and nullable extension arrays), Numeric Operations (arithmetic, comparison, and logical operations)
  • Label removed: Needs Triage
On May 10, 2020:
  • Added to the 1.1 milestone
simonjayhawkins (Member) commented on May 11, 2020

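This also occurs for Timedelta. The group below has only two non-NaT values, so sum(min_count=3) should return NaT, but it returns 3 days: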

>>> import pandas as pd
>>>
>>> pd.__version__
'1.1.0.dev0+1537.g6be51cb65b'
>>>
>>> df = pd.DataFrame(
...     {"foo": ["a", "a", "a"], "bar": [pd.Timedelta("1D"), pd.Timedelta("2D"), None]}
... )
>>>
>>> df
  foo    bar
0   a 1 days
1   a 2 days
2   a    NaT
>>>
>>> grp = df.groupby("foo")["bar"]
>>>
>>> grp.sum(min_count=3)
foo
a   3 days
Name: bar, dtype: timedelta64[ns]
>>>
Metadata

Assignees: No one assigned
Labels: Bug, Groupby, NA - MaskedArrays, Numeric Operations
Participants: @jreback, @dsaxton, @simonjayhawkins