Closed
Labels
Duplicate Report: duplicate issue or pull request
Performance: memory or execution speed performance
Description
Code Sample, a copy-pastable example if possible
import traceback
import pandas as pd
import numpy as np
from multiprocessing.pool import ThreadPool

def f(arg):
    s, idx = arg
    try:
        # s.loc[idx] # No problem
        s.reindex(idx) # Fails
    except Exception:
        traceback.print_exc()
    return None

def gen_args(n=10000):
    a = np.arange(0, 3000000)
    for i in xrange(n):
        if i % 1000 == 0:
            # print "?",i
            s = pd.Series(data=a, index=a)
            f((s, a)) # <<< LOOK. IT WORKS HERE!!!
        yield s, np.arange(0, 1000)

# for arg in gen_args():
#     f(arg) # Works just fine

t = ThreadPool(4)
for result in t.imap(f, gen_args(), chunksize=1):
    pass
Problem description
pd.Series.reindex fails when called from multiple threads.
This is surprising, since the code only reads from the series and performs no writes.
The error also seems bogus: 'cannot reindex from a duplicate axis'. The series has no duplicate index values, and I was able to call s.reindex(idx) in the main thread before the same call failed in the pool's thread.
  File "<ipython-input-8-4121235a46fa>", line 6, in f
    s.reindex(idx).values # Fails
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/series.py", line 2681, in reindex
    return super(Series, self).reindex(index=index, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/generic.py", line 3023, in reindex
    fill_value, copy).__finalize__(self)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/generic.py", line 3041, in _reindex_axes
    copy=copy, allow_dups=False)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/generic.py", line 3145, in _reindex_with_indexers
    copy=copy)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/internals.py", line 4139, in reindex_indexer
    self.axes[axis]._can_reindex(indexer)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/indexes/base.py", line 2944, in _can_reindex
    raise ValueError("cannot reindex from a duplicate axis")
ValueError: cannot reindex from a duplicate axis
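A possible stopgap, sketched under an assumption this report does not establish: that the failure comes from unsynchronized shared state on the Series' index when several threads call reindex on the same object. If that is the case, serializing the calls with a lock should avoid it, at the cost of losing parallelism for the reindex step. The names `_reindex_lock`, `f_locked` and `run` are hypothetical, the sizes are reduced to keep the run short, and the sketch is written for Python 3 (`range` instead of `xrange`).

```python
import threading
from multiprocessing.pool import ThreadPool

import numpy as np
import pandas as pd

# Hypothetical lock (not part of the original report): it serializes
# reindex calls so that no two threads use the shared index at once.
_reindex_lock = threading.Lock()


def f_locked(arg):
    """Same work as f() in the report, but performed under the lock."""
    s, idx = arg
    with _reindex_lock:
        return s.reindex(idx)


def gen_args(n=200, size=100000):
    """Yield (series, labels) pairs; the same Series is shared across tasks."""
    a = np.arange(0, size)
    s = None
    for i in range(n):
        if i % 50 == 0:
            # Build a fresh Series periodically, as in the original sample.
            s = pd.Series(data=a, index=a)
        yield s, np.arange(0, 1000)


def run(n=200, workers=4):
    """Run the locked reindex over a thread pool and collect the results."""
    pool = ThreadPool(workers)
    try:
        return list(pool.imap(f_locked, gen_args(n), chunksize=1))
    finally:
        pool.close()
        pool.join()


if __name__ == "__main__":
    results = run()
    print(len(results))
```

This only sidesteps the symptom. It does not explain why a read-only call fails, and other operations on a shared Series would need the same guard if they are affected.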
Expected Output
Program should output nothing.
Output of pd.show_versions()
```
INSTALLED VERSIONS
------------------
commit: None
python: 2.7.15.candidate.1
python-bits: 64
OS: Linux
OS-release: 4.15.0-46-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None
pandas: 0.22.0
pytest: None
pip: 18.1
setuptools: 40.6.2
Cython: 0.29.1
numpy: 1.16.1
scipy: 1.2.0
pyarrow: None
xarray: None
IPython: 5.0.0
sphinx: None
patsy: 0.5.1
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: None
tables: None
numexpr: 2.6.8
feather: None
matplotlib: 2.1.0
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.6.0
html5lib: 0.9999999
sqlalchemy: 1.2.17
pymysql: None
psycopg2: 2.7.7 (dt dec pq3 ext lo64)
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
```