Description
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- (optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample, a copy-pastable example
In [18]: ix = pd.MultiIndex.from_tuples([(pd.NaT, 1), (pd.NaT, 2)], names=['a', 'b'])
In [19]: ix
Out[19]:
MultiIndex([('NaT', 1),
            ('NaT', 2)],
           names=['a', 'b'])
In [20]: d = pd.DataFrame({'x': [11, 12]}, index=ix)
In [21]: d
Out[21]:
        x
a   b
NaT 1  11
    2  12
In [22]: d.reset_index()
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-22-4653618060e8> in <module>
----> 1 d.reset_index()

~/envs/pandas-test/lib/python3.8/site-packages/pandas/core/frame.py in reset_index(self, level, drop, inplace, col_level, col_fill)
   4851                     name = tuple(name_lst)
   4852                 # to ndarray and maybe infer different dtype
-> 4853                 level_values = _maybe_casted_values(lev, lab)
   4854                 new_obj.insert(0, name, level_values)
   4855

~/envs/pandas-test/lib/python3.8/site-packages/pandas/core/frame.py in _maybe_casted_values(index, labels)
   4784                     dtype = index.dtype
   4785                     fill_value = na_value_for_dtype(dtype)
-> 4786                     values = construct_1d_arraylike_from_scalar(
   4787                         fill_value, len(mask), dtype
   4788                     )

~/envs/pandas-test/lib/python3.8/site-packages/pandas/core/dtypes/cast.py in construct_1d_arraylike_from_scalar(value, length, dtype)
   1556
   1557         subarr = np.empty(length, dtype=dtype)
-> 1558         subarr.fill(value)
   1559
   1560     return subarr

ValueError: cannot convert float NaN to integer
Problem description
With the introduction of groupby(..., dropna=False), MultiIndexes containing NaT values are more likely to occur, which exposes a few issues that previously went undetected. This issue was discovered while looking for a workaround for another dropna=False related issue (#36060 (comment)).
Further investigation suggests that this may be caused by numpy not accepting pd.NaT as a fill value. The following code reproduces the error raised in construct_1d_arraylike_from_scalar:
In [33]: a = np.empty(2, dtype='datetime64[ns]')
In [34]: a.fill(pd.NaT)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-34-9fa9ff66da99> in <module>
----> 1 a.fill(pd.NaT)

ValueError: cannot convert float NaN to integer
This led me to propose the following fix:
--- a/pandas/core/dtypes/cast.py
+++ b/pandas/core/dtypes/cast.py
@@ -1559,6 +1559,12 @@ def construct_1d_arraylike_from_scalar(
             dtype = np.dtype("object")
             if not isna(value):
                 value = ensure_str(value)
+        elif isinstance(dtype, np.dtype) and dtype.kind == "M" and value is NaT:
+            # can't fill sub array directly with pandas' NaT:
+            #
+            # > a.fill(pd.NaT)
+            # ValueError: cannot convert float NaN to integer
+            value = np.datetime64("NaT")
 
         subarr = np.empty(length, dtype=dtype)
         subarr.fill(value)
The "value is NaT" check could possibly be extended to isna(value).
...
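To illustrate the isna(value) variant outside of pandas internals, here is a minimal standalone sketch. The helper name fill_from_scalar is hypothetical and only mimics the last two lines of construct_1d_arraylike_from_scalar; it assumes dtype is a plain numpy dtype (not a tz-aware pandas dtype) and is not the actual pandas implementation.

```python
import numpy as np
import pandas as pd


def fill_from_scalar(value, length, dtype):
    """Hypothetical stand-in for the tail of construct_1d_arraylike_from_scalar."""
    dtype = np.dtype(dtype)  # assumption: a plain numpy dtype is passed in
    if dtype.kind == "M" and pd.isna(value):
        # Replace any missing value (pd.NaT, None, np.nan) with numpy's own NaT,
        # which ndarray.fill accepts for datetime64 arrays.
        value = np.datetime64("NaT")
    subarr = np.empty(length, dtype=dtype)
    subarr.fill(value)
    return subarr


filled = fill_from_scalar(pd.NaT, 2, "datetime64[ns]")
print(filled.dtype, np.isnat(filled).all())
```

Using isna(value) rather than an identity check against NaT would also cover None and float NaN being passed as the fill value for a datetime64 array; whether that is desirable inside pandas is an open question.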
Expected Output
No ValueError being raised
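The expected behaviour could be captured as a small regression check, sketched below. The assertions reflect my reading of what reset_index should return for the example above (the index levels become ordinary columns, with column a entirely missing); this is not an existing pandas test, and on a build without the fix the reset_index call itself raises.

```python
import pandas as pd

ix = pd.MultiIndex.from_tuples([(pd.NaT, 1), (pd.NaT, 2)], names=["a", "b"])
d = pd.DataFrame({"x": [11, 12]}, index=ix)

# Should complete without a ValueError once the fix is in place.
result = d.reset_index()

# The former index levels come first, followed by the data column.
assert list(result.columns) == ["a", "b", "x"]
assert result["a"].isna().all()
assert result["b"].tolist() == [1, 2]
assert result["x"].tolist() == [11, 12]
```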
Output of pd.show_versions()
INSTALLED VERSIONS
commit : 15539fa
python : 3.8.5.final.0
python-bits : 64
OS : Darwin
OS-release : 19.6.0
Version : Darwin Kernel Version 19.6.0: Thu Jun 18 20:49:00 PDT 2020; root:xnu-6153.141.1~1/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : None
LOCALE : en_AU.UTF-8
pandas : 1.2.0.dev0+453.g15539fa62.dirty
numpy : 1.19.2
pytz : 2020.1
dateutil : 2.8.1
pip : 20.1.1
setuptools : 46.4.0
Cython : 0.29.21
pytest : 6.0.2
hypothesis : 5.35.3
sphinx : 3.2.1
blosc : 1.9.2
feather : None
xlsxwriter : 1.3.4
lxml.etree : 4.5.2
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.18.1
pandas_datareader: None
bs4 : 4.9.1
bottleneck : 1.3.2
fsspec : 0.8.2
fastparquet : 0.4.1
gcsfs : 0.7.1
matplotlib : 3.3.2
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.5
pandas_gbq : None
pyarrow : 1.0.1
pytables : None
pyxlsb : None
s3fs : 0.5.1
scipy : 1.5.2
sqlalchemy : 1.3.19
tables : 3.6.1
tabulate : 0.8.7
xarray : 0.16.1
xlrd : 1.2.0
xlwt : 1.3.0
numba : 0.51.2