Closed
Labels: Numeric Operations (Arithmetic, Comparison, and Logical operations)
Description
Right now, the output shape and dtype of DataFrame.describe for object columns depend on whether the DataFrame is empty.
Code Sample, a copy-pastable example if possible
In [75]: x = pd.DataFrame({"A": ['a', np.nan, np.nan]})

In [76]: x.describe()
Out[76]:
        A
count   1
unique  1
top     a
freq    1

In [77]: x.iloc[:0].describe()
Out[77]:
        A
count   0
unique  0
Problem description
This leads to instability in the output dtypes and shape.
Would people prefer that we use np.nan or None for top and freq in this case? I believe there's no ambiguity, since we drop missing values before computing.
While the output consistency would be nice, it's not clear to me what's actually best for users here.
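
For illustration, here is a rough sketch of what the np.nan option could look like from the user's side. This is not a proposed implementation: describe_stable and EXPECTED_ROWS are hypothetical names made up for this example, and the sketch simply reindexes the result so that any statistic describe leaves out shows up as NaN.

```python
import numpy as np
import pandas as pd

# Rows that describe() reports for a non-empty object column.
EXPECTED_ROWS = ["count", "unique", "top", "freq"]


def describe_stable(df):
    """Hypothetical helper: describe() with a fixed set of rows.

    reindex() adds any missing statistic as NaN, so the result has
    the same shape whether or not the frame is empty.
    """
    return df.describe().reindex(EXPECTED_ROWS)


# dtype=object pins the column to the object case discussed here.
x = pd.DataFrame({"A": ["a", np.nan, np.nan]}, dtype=object)

full = describe_stable(x)
empty = describe_stable(x.iloc[:0])

print(full)
print(empty)
```

With this, both results have four rows, and top and freq are missing values in the empty case. Whether np.nan or None is the better filler is still the open question above.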