pd.Series.reindex is not thread safe. #25870

@allComputableThings

Code Sample, a copy-pastable example if possible

import traceback
import pandas as pd
import numpy as np
from multiprocessing.pool import ThreadPool

def f(arg):
    s,idx = arg
    try:
        # s.loc[idx]   # No problem
        s.reindex(idx) # Fails
    except Exception:
        traceback.print_exc()
    return None


def gen_args(n=10000):
    a = np.arange(0, 3000000)
    for i in xrange(n):
        if i%1000 == 0:
            # print "?",i
            s = pd.Series(data=a, index=a)
            f((s,a)) # <<< LOOK. IT WORKS HERE!!!
        yield s, np.arange(0,1000)

# for arg in gen_args():
#     f(arg)   # Works just fine

t = ThreadPool(4)
for result in t.imap(f, gen_args(), chunksize=1):
    pass

Problem description

pd.Series.reindex fails when called from multiple threads.

This is a little surprising, since I'm not asking for any writes to the series.

The error also seems bogus: 'cannot reindex from a duplicate axis'. The series index has no duplicate labels, and a call to s.reindex succeeded in the main thread before reindex failed in the pool's threads.

  File "<ipython-input-8-4121235a46fa>", line 6, in f
    s.reindex(idx).values # Fails
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/series.py", line 2681, in reindex
    return super(Series, self).reindex(index=index, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/generic.py", line 3023, in reindex
    fill_value, copy).__finalize__(self)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/generic.py", line 3041, in _reindex_axes
    copy=copy, allow_dups=False)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/generic.py", line 3145, in _reindex_with_indexers
    copy=copy)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/internals.py", line 4139, in reindex_indexer
    self.axes[axis]._can_reindex(indexer)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/indexes/base.py", line 2944, in _can_reindex
    raise ValueError("cannot reindex from a duplicate axis")
ValueError: cannot reindex from a duplicate axis
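
A minimal sketch of a possible workaround, not a confirmed fix: it assumes the failure comes from several threads touching lazily built, shared index state at the same time, which this report does not establish. The helper `safe_reindex` and the lock `_reindex_lock` are hypothetical names, and the sketch is written for Python 3 (the report above uses Python 2.7). It checks that the index really is unique, then serializes every reindex call on the shared series behind one lock.

```python
import threading
from multiprocessing.pool import ThreadPool

import numpy as np
import pandas as pd

# Hypothetical module-level lock: only one thread may reindex at a time.
_reindex_lock = threading.Lock()


def safe_reindex(s, idx):
    """Call s.reindex(idx) while holding the lock (hypothetical helper)."""
    with _reindex_lock:
        return s.reindex(idx)


def run(n_tasks=8, size=100000):
    a = np.arange(size)
    s = pd.Series(data=a, index=a)
    idx = np.arange(1000)

    # The index has no duplicate labels, so the ValueError is unexpected.
    assert s.index.is_unique

    def worker(_):
        # Every task shares the same series, as in the report.
        return len(safe_reindex(s, idx))

    pool = ThreadPool(4)
    try:
        return pool.map(worker, range(n_tasks))
    finally:
        pool.close()
        pool.join()


lengths = run()
```

The lock gives up any parallelism inside reindex itself, so this only makes sense when the rest of each task does enough work to be worth threading.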

Expected Output

Program should output nothing.

Output of pd.show_versions()

```
INSTALLED VERSIONS
------------------
commit: None
python: 2.7.15.candidate.1
python-bits: 64
OS: Linux
OS-release: 4.15.0-46-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None

pandas: 0.22.0
pytest: None
pip: 18.1
setuptools: 40.6.2
Cython: 0.29.1
numpy: 1.16.1
scipy: 1.2.0
pyarrow: None
xarray: None
IPython: 5.0.0
sphinx: None
patsy: 0.5.1
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: None
tables: None
numexpr: 2.6.8
feather: None
matplotlib: 2.1.0
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.6.0
html5lib: 0.9999999
sqlalchemy: 1.2.17
pymysql: None
psycopg2: 2.7.7 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
```

Labels: Duplicate Report (duplicate issue or pull request), Performance (memory or execution speed performance)
