Assigning to a DataFrame with a repeated column label fails (#6120)

If a DataFrame has a repeated (non-unique) column label, some assignments to that label fail.

``` python
import numpy as np
import pandas as pd

# float data with a repeated column label
df = pd.DataFrame(np.random.randn(10, 2), columns=['that', 'that'])

df2
Out[10]: 
   that  that
0     1     1
1     1     1
2     1     1
3     1     1
4     1     1
5     1     1
6     1     1
7     1     1
8     1     1
9     1     1

[10 rows x 2 columns]
```
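For reference, here is a minimal sketch of what a repeated column label means in pandas: selecting the duplicated label returns a DataFrame holding every matching column rather than a single Series. The frame is a stand-in built for illustration, not the `df2` from the session above.

```python
import numpy as np
import pandas as pd

# Two columns that share the label 'that'
df = pd.DataFrame(np.random.randn(10, 2), columns=['that', 'that'])

print(df.columns.is_unique)  # False: the label is repeated

# Selecting the repeated label returns both columns at once
selected = df['that']
print(type(selected).__name__, selected.shape)  # DataFrame (10, 2)
```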

The data is float, and assigning a float scalar to the repeated label works:

``` python
df['that'] = 1.0
```
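In the float case the dtype of each column stays `float64`, so the assignment never changes dtype. A small check on a stand-in frame (not the original session's data):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(10, 2), columns=['that', 'that'])
before = list(df.dtypes)

# Float scalar into float columns: the dtype does not change
df['that'] = 1.0

after = list(df.dtypes)
print(before == after)            # True
print((df.values == 1.0).all())   # True: both columns were set
```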

However, assigning an integer scalar raises an error and leaves the DataFrame broken (for example, a subsequent repr also fails):

``` python
df2['that'] = 1
Traceback (most recent call last):
  File "/users/is/dbew/pyenvs/timeseries/lib/python2.7/site-packages/ipython-1.1.0_1_ahl1-py2.7.egg/IPython/core/interactiveshell.py", line 2830, in run_code
    exec code_obj in self.user_global_ns, self.user_ns
  File "<ipython-input-11-8701f5b0efe4>", line 1, in <module>
    df2['that'] = 1
  File "/users/is/dbew/pyenvs/timeseries/lib/python2.7/site-packages/pandas-0.13.0_292_g4dcecb0-py2.7-linux-x86_64.egg/pandas/core/frame.py", line 1879, in __setitem__
    self._set_item(key, value)
  File "/users/is/dbew/pyenvs/timeseries/lib/python2.7/site-packages/pandas-0.13.0_292_g4dcecb0-py2.7-linux-x86_64.egg/pandas/core/frame.py", line 1960, in _set_item
    NDFrame._set_item(self, key, value)
  File "/users/is/dbew/pyenvs/timeseries/lib/python2.7/site-packages/pandas-0.13.0_292_g4dcecb0-py2.7-linux-x86_64.egg/pandas/core/generic.py", line 1057, in _set_item
    self._data.set(key, value)
  File "/users/is/dbew/pyenvs/timeseries/lib/python2.7/site-packages/pandas-0.13.0_292_g4dcecb0-py2.7-linux-x86_64.egg/pandas/core/internals.py", line 2968, in set
    _set_item(item, arr[None, :])
  File "/users/is/dbew/pyenvs/timeseries/lib/python2.7/site-packages/pandas-0.13.0_292_g4dcecb0-py2.7-linux-x86_64.egg/pandas/core/internals.py", line 2927, in _set_item
    self._add_new_block(item, arr, loc=None)
  File "/users/is/dbew/pyenvs/timeseries/lib/python2.7/site-packages/pandas-0.13.0_292_g4dcecb0-py2.7-linux-x86_64.egg/pandas/core/internals.py", line 3108, in _add_new_block
    new_block = make_block(value, self.items[loc:loc + 1].copy(),
TypeError: unsupported operand type(s) for +: 'slice' and 'int'
```
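A minimal regression check for the failing case might look like the sketch below. The function name is hypothetical, and it assumes the intended behaviour is that the integer scalar is broadcast to every column labelled `that` and that the frame remains printable afterwards. On the versions reported here, the assignment line is where the `TypeError` was raised.

```python
import numpy as np
import pandas as pd


def check_int_assign_to_duplicate_columns():
    """Hypothetical regression check for this report."""
    # Float data with a repeated column label
    df = pd.DataFrame(np.random.randn(10, 2), columns=['that', 'that'])

    # Integer scalar: the dtype changes from float to int.
    # On the reported versions this line raised TypeError.
    df['that'] = 1

    # Both columns should now hold the new value...
    assert list(df.columns) == ['that', 'that']
    assert (df.values == 1).all()
    # ...and the frame should still be usable afterwards
    repr(df)
    return df


result = check_int_assign_to_duplicate_columns()
print(result.dtypes.tolist())
```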

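I stepped through the code, and most places seem to handle repeated columns correctly, except the code that reallocates arrays when the dtype changes. In the last frame of the traceback, `loc` is a `slice` (the location of the repeated label) rather than an integer, so `loc + 1` raises.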
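This matches how `Index.get_loc` reports labels: a unique label gives an integer position, a contiguous run of repeats gives a slice, and non-contiguous repeats give a boolean mask. A small sketch of that behaviour and of the resulting `TypeError` when code assumes an integer:

```python
import pandas as pd

# Unique label: get_loc gives an integer position
print(pd.Index(['this', 'that']).get_loc('that'))  # 1

# Repeated, contiguous label: get_loc gives a slice
loc = pd.Index(['that', 'that']).get_loc('that')
print(loc)  # slice(0, 2, None)

# Repeated, non-contiguous label: get_loc gives a boolean mask
mask = pd.Index(['that', 'other', 'that']).get_loc('that')
print(mask.tolist())  # [True, False, True]

# Code that assumes an integer, like `loc + 1`, fails on the slice
try:
    loc + 1
except TypeError as exc:
    print(exc)  # unsupported operand type(s) for +: 'slice' and 'int'
```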

I've tested this against pandas 0.13.0 and the latest master. Here's the output of installed versions when running on the master:

```
commit: None
python: 2.7.3.final.0
python-bits: 64
OS: Linux
OS-release: 2.6.18-308.el5
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB

pandas: 0.13.0-292-g4dcecb0
Cython: 0.16
numpy: 1.7.1
scipy: 0.9.0
statsmodels: None
patsy: None
scikits.timeseries: None
dateutil: 1.5
pytz: None
bottleneck: 0.6.0
tables: 2.3.1-1
numexpr: 2.0.1
matplotlib: 1.1.1
openpyxl: None
xlrd: 0.8.0
xlwt: None
xlsxwriter: None
sqlalchemy: None
lxml: 2.3.6
bs4: None
html5lib: None
bq: None
apiclient: None
```

