BUG: ValueError: cannot convert float NaN to integer when resetting MultiIndex with NaT values #36541

Closed
@ssche

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

In [18]: ix = pd.MultiIndex.from_tuples([(pd.NaT, 1), (pd.NaT, 2)], names=['a', 'b'])

In [19]: ix
Out[19]: 
MultiIndex([('NaT', 1),
            ('NaT', 2)],
           names=['a', 'b'])

In [20]: d = pd.DataFrame({'x': [11, 12]}, index=ix)

In [21]: d
Out[21]: 
        x
a   b    
NaT 1  11
    2  12

In [22]: d.reset_index()
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-22-4653618060e8> in <module>
----> 1 d.reset_index()

~/envs/pandas-test/lib/python3.8/site-packages/pandas/core/frame.py in reset_index(self, level, drop, inplace, col_level, col_fill)
   4851                     name = tuple(name_lst)
   4852                 # to ndarray and maybe infer different dtype
-> 4853                 level_values = _maybe_casted_values(lev, lab)
   4854                 new_obj.insert(0, name, level_values)
   4855 

~/envs/pandas-test/lib/python3.8/site-packages/pandas/core/frame.py in _maybe_casted_values(index, labels)
   4784                     dtype = index.dtype
   4785                     fill_value = na_value_for_dtype(dtype)
-> 4786                     values = construct_1d_arraylike_from_scalar(
   4787                         fill_value, len(mask), dtype
   4788                     )

~/envs/pandas-test/lib/python3.8/site-packages/pandas/core/dtypes/cast.py in construct_1d_arraylike_from_scalar(value, length, dtype)
   1556 
   1557         subarr = np.empty(length, dtype=dtype)
-> 1558         subarr.fill(value)
   1559 
   1560     return subarr

ValueError: cannot convert float NaN to integer

Problem description

Since groupby(..., dropna=False) was introduced, a MultiIndex containing NaT values is more likely to occur, which exposes a few issues that previously went undetected. I discovered this one while looking for a workaround for another dropna=False related issue (#36060 (comment)).

Further investigation suggests that the cause may be numpy not accepting pd.NaT as a fill value. The following code reproduces the failure that occurs inside construct_1d_arraylike_from_scalar:

In [33]: a = np.empty(2, dtype='datetime64[ns]')

In [34]: a.fill(pd.NaT)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-34-9fa9ff66da99> in <module>
----> 1 a.fill(pd.NaT)

ValueError: cannot convert float NaN to integer

which led me to propose this fix:

--- a/pandas/core/dtypes/cast.py
+++ b/pandas/core/dtypes/cast.py
@@ -1559,6 +1559,12 @@ def construct_1d_arraylike_from_scalar(
             dtype = np.dtype("object")
             if not isna(value):
                 value = ensure_str(value)
+        elif isinstance(dtype, np.dtype) and dtype.kind == "M" and value is NaT:
+            # can't fill sub array directly with pandas' NaT:
+            #
+            # > a.fill(pd.NaT)
+            # ValueError: cannot convert float NaN to integer
+            value = np.datetime64("NaT")
 
         subarr = np.empty(length, dtype=dtype)
         subarr.fill(value)

The "value is NaT" check could possibly be extended to isna(value), which would also cover other missing-value scalars.
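
As a rough illustration of the idea behind the patch, here is a standalone sketch using the isna(value) variant of the check. The helper name fill_from_scalar is hypothetical and only stands in for the relevant branch of construct_1d_arraylike_from_scalar; it is not the pandas implementation.

```python
import numpy as np
import pandas as pd


def fill_from_scalar(value, length, dtype):
    """Hypothetical stand-in for the scalar-fill step of
    construct_1d_arraylike_from_scalar (not the pandas implementation)."""
    dtype = np.dtype(dtype)
    if dtype.kind == "M" and pd.isna(value):
        # Swap a missing scalar such as pd.NaT for numpy's own NaT,
        # which ndarray.fill accepts for datetime64 arrays.
        value = np.datetime64("NaT")
    subarr = np.empty(length, dtype=dtype)
    subarr.fill(value)
    return subarr


filled = fill_from_scalar(pd.NaT, 2, "datetime64[ns]")
dated = fill_from_scalar(np.datetime64("2020-01-01"), 3, "datetime64[ns]")
```

With this substitution the array keeps its datetime64[ns] dtype and every element is NaT, while non-missing values pass through unchanged.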

Expected Output

d.reset_index() should return without raising a ValueError.
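
A possible regression check for the expected behaviour, reusing the example from the code sample above. This is only a sketch: it assumes a pandas version that contains a fix, and on the affected version reported here the reset_index() call raises the ValueError instead.

```python
import pandas as pd

ix = pd.MultiIndex.from_tuples([(pd.NaT, 1), (pd.NaT, 2)], names=["a", "b"])
d = pd.DataFrame({"x": [11, 12]}, index=ix)

# Assumes a pandas build with the fix applied; raises ValueError otherwise.
result = d.reset_index()

# The index levels become ordinary columns; level "a" stays all-missing.
assert list(result.columns) == ["a", "b", "x"]
assert result["a"].isna().all()
assert result["b"].tolist() == [1, 2]
assert result["x"].tolist() == [11, 12]
```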

Output of pd.show_versions()

INSTALLED VERSIONS

commit : 15539fa
python : 3.8.5.final.0
python-bits : 64
OS : Darwin
OS-release : 19.6.0
Version : Darwin Kernel Version 19.6.0: Thu Jun 18 20:49:00 PDT 2020; root:xnu-6153.141.1~1/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : None
LOCALE : en_AU.UTF-8

pandas : 1.2.0.dev0+453.g15539fa62.dirty
numpy : 1.19.2
pytz : 2020.1
dateutil : 2.8.1
pip : 20.1.1
setuptools : 46.4.0
Cython : 0.29.21
pytest : 6.0.2
hypothesis : 5.35.3
sphinx : 3.2.1
blosc : 1.9.2
feather : None
xlsxwriter : 1.3.4
lxml.etree : 4.5.2
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.18.1
pandas_datareader: None
bs4 : 4.9.1
bottleneck : 1.3.2
fsspec : 0.8.2
fastparquet : 0.4.1
gcsfs : 0.7.1
matplotlib : 3.3.2
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.5
pandas_gbq : None
pyarrow : 1.0.1
pytables : None
pyxlsb : None
s3fs : 0.5.1
scipy : 1.5.2
sqlalchemy : 1.3.19
tables : 3.6.1
tabulate : 0.8.7
xarray : 0.16.1
xlrd : 1.2.0
xlwt : 1.3.0
numba : 0.51.2

Labels: Bug, Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate), MultiIndex
