What’s new in 2.3.1 (July 7, 2025)

These are the changes in pandas 2.3.1. See Release notes for a full changelog including other versions of pandas.

Improvements and fixes for the StringDtype

Most changes in this release are related to StringDtype, which will become the default string dtype in pandas 3.0. See Upcoming changes in pandas 3.0 for more details.

Comparisons between different string dtypes

In previous versions, comparing Series of different string dtypes (e.g. pd.StringDtype("pyarrow", na_value=pd.NA) against pd.StringDtype("python", na_value=np.nan)) could produce an inconsistent result dtype or incorrectly raise an error (GH 60639). pandas now uses the hierarchy

object < (python, NaN) < (pyarrow, NaN) < (python, NA) < (pyarrow, NA)

to determine the result dtype when different string dtypes are compared: the operand whose dtype ranks higher in this hierarchy determines the result. Some examples:

When pd.StringDtype("pyarrow", na_value=pd.NA) is compared against any other string dtype, the result will always be boolean[pyarrow].
When pd.StringDtype("python", na_value=pd.NA) is compared against pd.StringDtype("pyarrow", na_value=np.nan), the result will be boolean, the NumPy-backed nullable extension array.
When pd.StringDtype("python", na_value=pd.NA) is compared against pd.StringDtype("python", na_value=np.nan), the result will be boolean, the NumPy-backed nullable extension array.
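As a rough model of the rule above, the plain-Python sketch below ranks dtypes by the stated hierarchy and lets the higher-ranked operand decide the result. The names `HIERARCHY`, `RESULT_DTYPE`, `winning_dtype` and `comparison_result_dtype`, and the encoding of each dtype as a `(storage, na_value)` tuple, are hypothetical and not part of the pandas API. The two NA result dtypes follow the examples above; mapping the NaN variants and object dtype to NumPy `bool` is an assumption of this sketch.

```python
# Hypothetical model of the string-dtype precedence used for comparisons.
# Each string dtype is encoded as a (storage, na_value) tuple; "object"
# stands for object dtype. This is not pandas API.
HIERARCHY = [
    "object",
    ("python", "NaN"),
    ("pyarrow", "NaN"),
    ("python", "NA"),
    ("pyarrow", "NA"),
]

# Result dtype keyed by the winning (higher-ranked) dtype.
# The two NA entries follow the release notes; the "bool" entries for the
# NaN variants and object dtype are an assumption of this sketch.
RESULT_DTYPE = {
    "object": "bool",
    ("python", "NaN"): "bool",
    ("pyarrow", "NaN"): "bool",
    ("python", "NA"): "boolean",
    ("pyarrow", "NA"): "boolean[pyarrow]",
}


def winning_dtype(left, right):
    """Return whichever operand dtype ranks higher in HIERARCHY."""
    return max(left, right, key=HIERARCHY.index)


def comparison_result_dtype(left, right):
    """Return the modeled result dtype of comparing two string dtypes."""
    return RESULT_DTYPE[winning_dtype(left, right)]


if __name__ == "__main__":
    print(comparison_result_dtype(("pyarrow", "NA"), ("python", "NaN")))   # boolean[pyarrow]
    print(comparison_result_dtype(("python", "NA"), ("pyarrow", "NaN")))   # boolean
    print(comparison_result_dtype(("python", "NA"), ("python", "NaN")))    # boolean
```

In this model, swapping the operands never changes the result dtype, because only each dtype's rank matters.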

Index set operations ignore empty RangeIndex and object dtype Index

When the future.infer_string option is enabled, Index set operations (such as union or intersection) now ignore the dtype of an empty RangeIndex or an empty object-dtype Index when determining the dtype of the resulting Index (GH 60797).

This ensures that combining such an empty Index with strings correctly infers the string dtype instead of defaulting to object dtype. For example:

>>> import pandas as pd
>>> pd.options.future.infer_string = True
>>> df = pd.DataFrame()
>>> df.columns.dtype  # default RangeIndex for empty columns
dtype('int64')
>>> df["a"] = [1, 2, 3]
>>> df.columns.dtype  # new columns use string dtype instead of object dtype
<StringDtype(na_value=nan)>
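The dtype rule can also be sketched without pandas. In the hypothetical model below, each operand is reduced to an `IndexInfo` record, and an empty `RangeIndex` or empty object-dtype operand is dropped before the result dtype is chosen. `IndexInfo`, `is_ignorable` and `set_op_result_dtype` are illustrative names, not pandas API. Falling back to object dtype on any remaining mismatch, and keeping the left dtype when both operands are ignorable, are simplifying assumptions; pandas' actual common-dtype resolution covers many more cases.

```python
from dataclasses import dataclass


# Hypothetical description of an Index operand; not pandas API.
@dataclass
class IndexInfo:
    kind: str    # e.g. "RangeIndex" or "Index"
    dtype: str   # e.g. "int64", "object", "str"
    length: int


def is_ignorable(idx: IndexInfo) -> bool:
    """An empty RangeIndex or an empty object-dtype Index does not influence the dtype."""
    return idx.length == 0 and (idx.kind == "RangeIndex" or idx.dtype == "object")


def set_op_result_dtype(left: IndexInfo, right: IndexInfo) -> str:
    """Model the result dtype of a union/intersection under future.infer_string."""
    relevant = [idx for idx in (left, right) if not is_ignorable(idx)]
    if not relevant:
        # Assumption of this sketch: with nothing informative, keep the left dtype.
        return left.dtype
    dtypes = {idx.dtype for idx in relevant}
    if len(dtypes) == 1:
        return dtypes.pop()
    # Simplification: any remaining mismatch falls back to object dtype.
    return "object"


if __name__ == "__main__":
    # Mirrors the example above: empty RangeIndex columns combined with a new "a" label.
    empty_columns = IndexInfo("RangeIndex", "int64", 0)
    new_label = IndexInfo("Index", "str", 1)
    print(set_op_result_dtype(empty_columns, new_label))  # str
```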

Bug fixes

Bug in DataFrameGroupBy.min(), DataFrameGroupBy.max(), Resampler.min(), and Resampler.max() where all-NA values of string dtype would return float instead of string dtype (GH 60810)
Bug in DataFrame.join() incorrectly downcasting object-dtype indexes (GH 61771)
Bug in DataFrame.sum() with axis=1, DataFrameGroupBy.sum() or SeriesGroupBy.sum() with skipna=True, and Resampler.sum() with all NA values of StringDtype resulted in 0 instead of the empty string "" (GH 60229)
Fixed bug in DataFrame.explode() and Series.explode() where the methods would fail with dtype="str" (GH 61623)
Fixed bug in unpickling objects pickled in pandas versions pre-2.3.0 that used StringDtype (GH 61763)
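Returning `""` for an all-NA string sum matches the identity element of string concatenation, just as `0` is the identity for numeric addition. The plain-Python sketch below illustrates that reasoning only; `skipna_string_sum` is a hypothetical helper, `None` stands in for a missing value, and it does not reproduce pandas internals.

```python
import operator
from functools import reduce


def skipna_string_sum(values):
    """Concatenate the non-missing strings, skipping None (a stand-in for NA).

    Starting the reduction from "" means an input with no non-missing
    values returns the empty string rather than a numeric 0.
    """
    present = [v for v in values if v is not None]
    return reduce(operator.add, present, "")


if __name__ == "__main__":
    print(skipna_string_sum(["a", None, "b"]))    # ab
    print(repr(skipna_string_sum([None, None])))  # ''
```

For comparison, Python's built-in `sum([])` returns `0`, the numeric identity, which is the kind of result the fix replaces for string data.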

Contributors

A total of 10 people contributed patches to this release. People with a “+” by their names contributed a patch for the first time.

David Krych
Irv Lustig
Joris Van den Bossche
Lumberbot (aka Jack)
Marc Garcia
Matthew Roeschke
Pandas Development Team
Ralf Gommers
Richard Shadrach
jbrockmendel