What’s new in 2.3.1 (July 7, 2025)#
These are the changes in pandas 2.3.1. See Release notes for a full changelog including other versions of pandas.
Improvements and fixes for the StringDtype#
Most changes in this release are related to StringDtype, which will become the default string dtype in pandas 3.0. See Upcoming changes in pandas 3.0 for more details.
Comparisons between different string dtypes#
In previous versions, comparing Series of different string dtypes (e.g. pd.StringDtype("pyarrow", na_value=pd.NA) against pd.StringDtype("python", na_value=np.nan)) would produce an inconsistent result dtype or incorrectly raise (GH 60639). pandas will now use the hierarchy
object < (python, NaN) < (pyarrow, NaN) < (python, NA) < (pyarrow, NA)
in determining the result dtype when different string dtypes are compared. Some examples:
When pd.StringDtype("pyarrow", na_value=pd.NA) is compared against any other string dtype, the result will always be boolean[pyarrow].
When pd.StringDtype("python", na_value=pd.NA) is compared against pd.StringDtype("pyarrow", na_value=np.nan), the result will be boolean, the NumPy-backed nullable extension array.
When pd.StringDtype("python", na_value=pd.NA) is compared against pd.StringDtype("python", na_value=np.nan), the result will be boolean, the NumPy-backed nullable extension array.
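The hierarchy can be pictured as a ranked list in which the higher-ranked operand decides the result. The sketch below is only an illustrative model in plain Python, not pandas' internal implementation: the names HIERARCHY, DOCUMENTED_RESULT, dominant_dtype and comparison_result are hypothetical, and the result mapping covers only the two cases spelled out in the examples above.

```python
# Illustrative model only: this is NOT how pandas implements the rule.
# All names here are hypothetical helpers for this sketch.

# Lowest to highest priority, in the order given by the release note.
HIERARCHY = [
    "object",
    ("python", "NaN"),
    ("pyarrow", "NaN"),
    ("python", "NA"),
    ("pyarrow", "NA"),
]

# Result dtypes stated explicitly in the examples above; other cases
# are deliberately left out rather than guessed.
DOCUMENTED_RESULT = {
    ("pyarrow", "NA"): "boolean[pyarrow]",
    ("python", "NA"): "boolean",
}


def dominant_dtype(left, right):
    """Return whichever operand ranks higher in HIERARCHY."""
    return max(left, right, key=HIERARCHY.index)


def comparison_result(left, right):
    """Return the documented result dtype, or None if the note does not state it."""
    return DOCUMENTED_RESULT.get(dominant_dtype(left, right))


print(comparison_result(("pyarrow", "NA"), ("python", "NaN")))  # boolean[pyarrow]
print(comparison_result(("python", "NA"), ("pyarrow", "NaN")))  # boolean
print(comparison_result(("python", "NA"), ("python", "NaN")))   # boolean
```

Under this reading, the operand order does not matter: only the higher-ranked dtype of the pair determines the outcome.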
Index set operations ignore empty RangeIndex and object dtype Index#
When enabling the future.infer_string option, Index set operations (like union or intersection) will now ignore the dtype of an empty RangeIndex or empty Index with object dtype when determining the dtype of the resulting Index (GH 60797).
This ensures that combining such an empty Index with strings will infer the string dtype correctly, rather than defaulting to object dtype. For example:
>>> import pandas as pd
>>> pd.options.future.infer_string = True
>>> df = pd.DataFrame()
>>> df.columns.dtype
dtype('int64') # default RangeIndex for empty columns
>>> df["a"] = [1, 2, 3]
>>> df.columns.dtype
<StringDtype(na_value=nan)> # new columns use string dtype instead of object dtype
Bug fixes#
Bug in DataFrameGroupBy.min(), DataFrameGroupBy.max(), Resampler.min(), and Resampler.max() where all NA values of string dtype would return float instead of string dtype (GH 60810)
Bug in DataFrame.join() incorrectly downcasting object-dtype indexes (GH 61771)
Bug in DataFrame.sum() with axis=1, DataFrameGroupBy.sum() or SeriesGroupBy.sum() with skipna=True, and Resampler.sum() where all NA values of StringDtype resulted in 0 instead of the empty string "" (GH 60229)
Fixed bug in DataFrame.explode() and Series.explode() where methods would fail with dtype="str" (GH 61623)
Fixed bug in unpickling objects pickled in pandas versions pre-2.3.0 that used StringDtype (GH 61763)
Contributors#
A total of 10 people contributed patches to this release. People with a “+” by their names contributed a patch for the first time.
David Krych
Irv Lustig
Joris Van den Bossche
Lumberbot (aka Jack)
Marc Garcia
Matthew Roeschke
Pandas Development Team
Ralf Gommers
Richard Shadrach
jbrockmendel