BUG: check_dtype=False in assert_series_equal is not returning expected results for datetime & timedelta types in pandas-2.0 #52449

Closed
@galipremsagar

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

In [1]: import pandas as pd

In [2]: s = pd.Series([1000213, 2131232, 21312331], dtype='datetime64[s]')

In [3]: s
Out[3]: 
0   1970-01-12 13:50:13
1   1970-01-25 16:00:32
2   1970-09-04 16:05:31
dtype: datetime64[s]

In [4]: p = s.astype('datetime64[ms]')

In [6]: p
Out[6]: 
0   1970-01-12 13:50:13
1   1970-01-25 16:00:32
2   1970-09-04 16:05:31
dtype: datetime64[ms]

In [7]: s
Out[7]: 
0   1970-01-12 13:50:13
1   1970-01-25 16:00:32
2   1970-09-04 16:05:31
dtype: datetime64[s]

In [8]: pd.testing.assert_series_equal(s, p)          # Failure: works as expected, since the dtypes are different
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In[8], line 1
----> 1 pd.testing.assert_series_equal(s, p)

    [... skipping hidden 2 frame]

File /nvme/0/pgali/envs/cudfdev/lib/python3.10/site-packages/pandas/_testing/asserters.py:596, in raise_assert_detail(obj, message, left, right, diff, first_diff, index_values)
    593 if first_diff is not None:
    594     msg += f"\n{first_diff}"
--> 596 raise AssertionError(msg)

AssertionError: Attributes of Series are different

Attribute "dtype" are different
[left]:  datetime64[s]
[right]: datetime64[ms]

In [9]: pd.testing.assert_series_equal(s, p, check_dtype=False)       # I expect this to not raise, because we are asking for the dtypes to be ignored and the data as seen above is perfectly identical.
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In[9], line 1
----> 1 pd.testing.assert_series_equal(s, p, check_dtype=False)

    [... skipping hidden 1 frame]

File /nvme/0/pgali/envs/cudfdev/lib/python3.10/site-packages/pandas/_testing/asserters.py:741, in assert_extension_array_equal(left, right, check_dtype, index_values, check_exact, rtol, atol, obj)
    732     assert_attr_equal("dtype", left, right, obj=f"Attributes of {obj}")
    734 if (
    735     isinstance(left, DatetimeLikeArrayMixin)
    736     and isinstance(right, DatetimeLikeArrayMixin)
   (...)
    739     # Avoid slow object-dtype comparisons
    740     # np.asarray for case where we have a np.MaskedArray
--> 741     assert_numpy_array_equal(
    742         np.asarray(left.asi8),
    743         np.asarray(right.asi8),
    744         index_values=index_values,
    745         obj=obj,
    746     )
    747     return
    749 left_na = np.asarray(left.isna())

    [... skipping hidden 1 frame]

File /nvme/0/pgali/envs/cudfdev/lib/python3.10/site-packages/pandas/_testing/asserters.py:666, in assert_numpy_array_equal.<locals>._raise(left, right, err_msg)
    664     diff = diff * 100.0 / left.size
    665     msg = f"{obj} values are different ({np.round(diff, 5)} %)"
--> 666     raise_assert_detail(obj, msg, left, right, index_values=index_values)
    668 raise AssertionError(err_msg)

File /nvme/0/pgali/envs/cudfdev/lib/python3.10/site-packages/pandas/_testing/asserters.py:596, in raise_assert_detail(obj, message, left, right, diff, first_diff, index_values)
    593 if first_diff is not None:
    594     msg += f"\n{first_diff}"
--> 596 raise AssertionError(msg)

AssertionError: Series are different

Series values are different (100.0 %)
[index]: [0, 1, 2]
[left]:  [1000213, 2131232, 21312331]
[right]: [1000213000, 2131232000, 21312331000]

Issue Description

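With the newly introduced datetime64 & timedelta64 time resolutions, it is possible to hold identical data in different dtypes. So when we pass check_dtype=False to assert_series_equal, we expect identical data to pass and not raise an error. Instead, as the traceback above shows, the raw integer values (asi8) are compared without accounting for the differing units.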
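
The mismatch can be illustrated with NumPy alone. This is a minimal sketch, not the pandas code path itself: the variable names are made up for illustration, and it only shows that the same instants have different int64 payloads at different resolutions, which is what a unit-unaware integer comparison trips over.

```python
import numpy as np

# The same three instants, stored at second and at millisecond resolution.
secs = np.array([1000213, 2131232, 21312331], dtype="datetime64[s]")
millis = secs.astype("datetime64[ms]")

# The raw int64 payloads differ by a factor of 1000.
raw_secs = secs.view("i8")
raw_millis = millis.view("i8")

# Comparing the datetime64 arrays promotes both to a common unit first,
# so the instants compare equal even though the payloads do not.
same_instants = bool((secs == millis).all())
same_payload = bool((raw_secs == raw_millis).all())

print(same_instants, same_payload)
```

A comparison that honors check_dtype=False would presumably need to behave like the first check (unit-aware) rather than the second.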

Expected Behavior

In [9]: pd.testing.assert_series_equal(s, p, check_dtype=False) # Passes.

In [10]: pd.testing.assert_series_equal(s, p, check_dtype=True) # Raises error
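
Until the behavior changes, one possible workaround is to cast both sides to a common resolution before comparing. The sketch below assumes pandas >= 2.0 and timezone-naive datetime64 data; `assert_series_equal_any_unit` is a hypothetical helper name, not part of pandas, and casting to "ns" can fail or overflow for dates outside the nanosecond range.

```python
import pandas as pd


def assert_series_equal_any_unit(left, right, unit="ns", **kwargs):
    """Hypothetical helper: compare naive datetime64 Series regardless of unit.

    Both inputs are cast to the same resolution, so only the instants
    (and index, name, etc.) are compared, not the storage unit.
    """
    target = f"datetime64[{unit}]"
    pd.testing.assert_series_equal(
        left.astype(target), right.astype(target), **kwargs
    )


s = pd.Series([1000213, 2131232, 21312331], dtype="datetime64[s]")
p = s.astype("datetime64[ms]")

# Does not raise: the instants are identical once the units match.
assert_series_equal_any_unit(s, p)
```

Genuinely different values still raise an AssertionError, since the helper only removes the unit difference and otherwise defers to pd.testing.assert_series_equal.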

Installed Versions

INSTALLED VERSIONS

commit : c2a7f1a
python : 3.10.10.final.0
python-bits : 64
OS : Linux
OS-release : 4.15.0-76-generic
Version : #86-Ubuntu SMP Fri Jan 17 17:24:28 UTC 2020
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 2.0.0rc1
numpy : 1.23.5
pytz : 2023.3
dateutil : 2.8.2
setuptools : 67.6.1
pip : 23.0.1
Cython : 0.29.33
pytest : 7.2.2
hypothesis : 6.70.1
sphinx : 5.3.0
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.12.0
pandas_datareader: None
bs4 : 4.12.0
bottleneck : None
brotli :
fastparquet : None
fsspec : 2023.3.0
gcsfs : None
matplotlib : None
numba : 0.56.4
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 11.0.0
pyreadstat : None
pyxlsb : None
s3fs : 2023.3.0
scipy : 1.10.1
snappy :
sqlalchemy : 1.4.46
tables : None
tabulate : 0.9.0
xarray : None
xlrd : None
zstandard : None
tzdata : 2023.3
qtpy : None
pyqt5 : None

Labels: Bug; Non-Nano (datetime64/timedelta64 with non-nanosecond resolution)
