
Assign to df with repeated column fails #6120

Closed
@dbew


If a DataFrame has repeated (non-unique) column names, some assignments fail.

df = pd.DataFrame(np.random.randn(10,2), columns=['that', 'that'])

df2
Out[10]: 
   that  that
0     1     1
1     1     1
2     1     1
3     1     1
4     1     1
5     1     1
6     1     1
7     1     1
8     1     1
9     1     1

[10 rows x 2 columns]

The columns hold float data, and assigning a float scalar to the duplicated label works:

df['that'] = 1.0
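Conceptually, assigning a scalar to a duplicated label should overwrite every column that carries that label. The following is a minimal pure-Python sketch of that semantics, not pandas' implementation; `positions_for` and `assign_scalar` are hypothetical helpers operating on a toy list-of-columns store.

```python
def positions_for(labels, key):
    """Return every column position whose label equals key (hypothetical helper)."""
    hits = [i for i, label in enumerate(labels) if label == key]
    if not hits:
        raise KeyError(key)
    return hits


def assign_scalar(columns, labels, key, value):
    """Overwrite every column labelled `key` with `value` in a toy column store."""
    for pos in positions_for(labels, key):
        columns[pos] = [value] * len(columns[pos])
    return columns


labels = ['that', 'that']
columns = [[0.5, -1.2, 2.0], [1.1, 0.3, -0.7]]
assign_scalar(columns, labels, 'that', 1.0)
print(columns)  # [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
```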

However, assigning an integer scalar fails with an error and leaves the DataFrame broken (e.g. a subsequent repr also fails):

df2['that'] = 1
Traceback (most recent call last):
  File "/users/is/dbew/pyenvs/timeseries/lib/python2.7/site-packages/ipython-1.1.0_1_ahl1-py2.7.egg/IPython/core/interactiveshell.py", line 2830, in run_code
    exec code_obj in self.user_global_ns, self.user_ns
  File "<ipython-input-11-8701f5b0efe4>", line 1, in <module>
    df2['that'] = 1
  File "/users/is/dbew/pyenvs/timeseries/lib/python2.7/site-packages/pandas-0.13.0_292_g4dcecb0-py2.7-linux-x86_64.egg/pandas/core/frame.py", line 1879, in __setitem__
    self._set_item(key, value)
  File "/users/is/dbew/pyenvs/timeseries/lib/python2.7/site-packages/pandas-0.13.0_292_g4dcecb0-py2.7-linux-x86_64.egg/pandas/core/frame.py", line 1960, in _set_item
    NDFrame._set_item(self, key, value)
  File "/users/is/dbew/pyenvs/timeseries/lib/python2.7/site-packages/pandas-0.13.0_292_g4dcecb0-py2.7-linux-x86_64.egg/pandas/core/generic.py", line 1057, in _set_item
    self._data.set(key, value)
  File "/users/is/dbew/pyenvs/timeseries/lib/python2.7/site-packages/pandas-0.13.0_292_g4dcecb0-py2.7-linux-x86_64.egg/pandas/core/internals.py", line 2968, in set
    _set_item(item, arr[None, :])
  File "/users/is/dbew/pyenvs/timeseries/lib/python2.7/site-packages/pandas-0.13.0_292_g4dcecb0-py2.7-linux-x86_64.egg/pandas/core/internals.py", line 2927, in _set_item
    self._add_new_block(item, arr, loc=None)
  File "/users/is/dbew/pyenvs/timeseries/lib/python2.7/site-packages/pandas-0.13.0_292_g4dcecb0-py2.7-linux-x86_64.egg/pandas/core/internals.py", line 3108, in _add_new_block
    new_block = make_block(value, self.items[loc:loc + 1].copy(),
TypeError: unsupported operand type(s) for +: 'slice' and 'int'
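The last frame points at the immediate cause: `loc` is a `slice`, not an integer. For a non-unique index, `Index.get_loc` returns a slice when the duplicates are contiguous (and a boolean mask otherwise), so an expression written for an integer position such as `loc + 1` raises. A pure-Python sketch of that mismatch, assuming a hypothetical `get_loc` that imitates only the int-or-slice part of that contract, plus a hypothetical `as_slice` normalizer:

```python
def get_loc(labels, key):
    """Hypothetical stand-in for Index.get_loc: int if the label is unique,
    slice if its duplicates are contiguous (the mask case is omitted)."""
    hits = [i for i, label in enumerate(labels) if label == key]
    if not hits:
        raise KeyError(key)
    if len(hits) == 1:
        return hits[0]
    if hits == list(range(hits[0], hits[-1] + 1)):
        return slice(hits[0], hits[-1] + 1)
    raise NotImplementedError("non-contiguous duplicates would need a boolean mask")


def as_slice(loc):
    """Normalize an int-or-slice location into a slice before indexing."""
    return loc if isinstance(loc, slice) else slice(loc, loc + 1)


labels = ['that', 'that']
loc = get_loc(labels, 'that')  # slice(0, 2, None)

# The `items[loc:loc + 1]` pattern assumes an integer position:
try:
    labels[loc:loc + 1]
except TypeError as exc:
    print(exc)  # unsupported operand type(s) for +: 'slice' and 'int'

print(labels[as_slice(loc)])  # ['that', 'that']
```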

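I stepped through the code, and it looks like most code paths handle repeated columns correctly, except the code that reallocates arrays when the dtype changes (here, from float to int). At that point the traceback shows `loc` is a slice rather than an integer, so `loc + 1` raises.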
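A toy model of that reallocation step, assuming a hypothetical `ToyBlockManager` that keeps one block per dtype name (far simpler than pandas' real internals): when the new value has a different dtype, every position carrying the label has to leave its old block and join the new one, so the code must loop over all matching positions instead of assuming a single integer location.

```python
from collections import defaultdict


class ToyBlockManager:
    """Hypothetical, heavily simplified model: one block per dtype name,
    each block maps column position -> list of values."""

    def __init__(self, labels, columns, dtype):
        self.labels = list(labels)
        self.blocks = defaultdict(dict)
        for pos, col in enumerate(columns):
            self.blocks[dtype][pos] = list(col)

    def set_scalar(self, key, value):
        new_dtype = type(value).__name__
        positions = [i for i, label in enumerate(self.labels) if label == key]
        if not positions:
            raise KeyError(key)
        # Move *every* duplicate position, not just the first one.
        for pos in positions:
            for dtype, block in list(self.blocks.items()):
                if pos in block:
                    nrows = len(block.pop(pos))
                    if not block:  # drop blocks that became empty
                        del self.blocks[dtype]
                    self.blocks[new_dtype][pos] = [value] * nrows
                    break

    def dtypes(self):
        by_pos = {pos: dtype for dtype, block in self.blocks.items() for pos in block}
        return [by_pos[i] for i in range(len(self.labels))]


mgr = ToyBlockManager(['that', 'that'], [[0.5, -1.2], [1.1, 0.3]], 'float')
mgr.set_scalar('that', 1)  # int into float columns: dtype changes
print(mgr.dtypes())        # ['int', 'int']
print(sorted(mgr.blocks))  # ['int']
```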

I've tested this against pandas 0.13.0 and the latest master. Here is the installed-versions output when running on master:

commit: None
python: 2.7.3.final.0
python-bits: 64
OS: Linux
OS-release: 2.6.18-308.el5
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB

pandas: 0.13.0-292-g4dcecb0
Cython: 0.16
numpy: 1.7.1
scipy: 0.9.0
statsmodels: None
patsy: None
scikits.timeseries: None
dateutil: 1.5
pytz: None
bottleneck: 0.6.0
tables: 2.3.1-1
numexpr: 2.0.1
matplotlib: 1.1.1
openpyxl: None
xlrd: 0.8.0
xlwt: None
xlsxwriter: None
sqlalchemy: None
lxml: 2.3.6
bs4: None
html5lib: None
bq: None
apiclient: None

Labels: Bug, Indexing (related to indexing on series/frames, not to indexes themselves)
