ER: multiindexes in no-data DataFrame constructor should yield a nan frame #4078

Closed
cpcloud opened this issue Jun 28, 2013 · 9 comments · Fixed by #5089
Labels: Bug; Error Reporting (incorrect or improved errors from pandas); Indexing (related to indexing on series/frames, not to indexes themselves)

@cpcloud
Member

cpcloud commented Jun 28, 2013

as it stands

In [24]: tuples = [(randint(10), randint(10)) for _ in range(10)]

In [25]: mi = MultiIndex.from_tuples(tuples)

In [26]: df = DataFrame(index=mi, columns=mi)

fails, whereas single level indexes work fine

In [27]: df = DataFrame(index=range(10), columns=range(10))

In [28]: df
Out[28]:
     0    1    2    3    4    5    6    7    8    9
0  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
1  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
2  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
3  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
4  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
5  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
6  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
7  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
8  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
9  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN  NaN
@jtratner
Contributor

Or at least a non-crazy error:

---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
<ipython-input-11-243e9e4b4a0c> in <module>()
----> 1 df = DataFrame(index=mi, columns=mi)

../pandas/pandas/core/frame.py in __init__(self, data, index, columns, dtype, copy)
    395             mgr = self._init_mgr(data, index, columns, dtype=dtype, copy=copy)
    396         elif isinstance(data, dict):
--> 397             mgr = self._init_dict(data, index, columns, dtype=dtype)
    398         elif isinstance(data, ma.MaskedArray):
    399             mask = ma.getmaskarray(data)

../pandas/pandas/core/frame.py in _init_dict(self, data, index, columns, dtype)
    526
    527         return _arrays_to_mgr(arrays, data_names, index, columns,
--> 528                               dtype=dtype)
    529
    530     def _init_ndarray(self, values, index, columns, dtype=None,

../python/pandas/pandas/core/frame.py in _arrays_to_mgr(arrays, arr_names, index, columns, dtype)
   5703     axes = [_ensure_index(columns), _ensure_index(index)]
   5704
-> 5705     return create_block_manager_from_arrays(arrays, arr_names, axes)
   5706
   5707 def extract_index(data):

../pandas/pandas/core/internals.pyc in create_block_manager_from_arrays(arrays, names, axes)
   2232 def create_block_manager_from_arrays(arrays, names, axes):
   2233     try:
-> 2234         blocks = form_blocks(arrays, names, axes)
   2235         mgr = BlockManager(blocks, axes)
   2236         mgr._consolidate_inplace()

../pandas/pandas/core/internals.pyc in form_blocks(arrays, names, axes)
   2315
   2316     if len(object_items) > 0:
-> 2317         object_blocks = _simple_blockify(object_items, items, np.object_, is_unique=is_unique)
   2318         blocks.extend(object_blocks)
   2319

../pandas/pandas/core/internals.pyc in _simple_blockify(tuples, ref_items, dtype, is_unique)
   2334 def _simple_blockify(tuples, ref_items, dtype, is_unique=True):
   2335     """ return a single array of a block that has a single dtype; if dtype is not None, coerce to this dtype """
-> 2336     block_items, values, placement = _stack_arrays(tuples, ref_items, dtype)
   2337
   2338     # CHECK DTYPE?

../pandas/pandas/core/internals.pyc in _stack_arrays(tuples, ref_items, dtype)
   2395         items = _ensure_index([ n for n in names if n in ref_items ])
   2396         if len(items) != len(stacked):
-> 2397             raise Exception("invalid names passed _stack_arrays")
   2398
   2399     return items, stacked, placement

Exception: invalid names passed _stack_arrays

@jreback
Contributor

jreback commented Jun 28, 2013

you don't like the error msg? (I am not sure I even remember what it's for!)

@jtratner
Contributor

@jreback :P


@cpcloud
Member Author

cpcloud commented Aug 14, 2013

just got to the bottom of this: it only happens when the index contains duplicate values. it works fine when there are no duplicates

@ghost ghost assigned jreback Sep 28, 2013
@jreback
Contributor

jreback commented Oct 2, 2013

does this fail for you on current master?

@cpcloud
Member Author

cpcloud commented Oct 2, 2013

yes but only when things aren't sorted and there are duplicates:

In [12]: tuples = [(3, 3), (2, 3), (3, 3)]

In [13]: mi = MultiIndex.from_tuples(tuples)

In [14]: df = DataFrame(index=mi,columns=mi)
Exception: invalid names passed _stack_arrays

while

In [15]: tuples = [(2, 3), (3, 3), (3, 3)]

In [16]: mi = MultiIndex.from_tuples(tuples)

In [17]: df = DataFrame(index=mi,columns=mi)

works fine
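
The difference between the two cases above can be checked directly on the indexes. This is only a sketch: `describe_index` is a hypothetical helper, not part of pandas, and it assumes a reasonably recent pandas where `MultiIndex.is_unique` and `MultiIndex.is_monotonic_increasing` are available.

```python
import pandas as pd


def describe_index(tuples):
    """Report the two properties discussed above for a MultiIndex.

    Hypothetical helper for illustration; not part of pandas.
    """
    mi = pd.MultiIndex.from_tuples(tuples)
    return {
        # True when at least one tuple appears more than once
        "has_duplicates": not mi.is_unique,
        # True when the tuples are in non-decreasing lexicographic order
        "is_sorted": bool(mi.is_monotonic_increasing),
    }


# The ordering reported as failing: duplicates present, not sorted
failing = describe_index([(3, 3), (2, 3), (3, 3)])
# The ordering reported as working: duplicates present, but sorted
working = describe_index([(2, 3), (3, 3), (3, 3)])

print("failing case:", failing)
print("working case:", working)
```

Both indexes contain the duplicate `(3, 3)`; only the sort order differs, which matches the observation that duplicates alone are not enough to trigger the exception.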

@jreback
Contributor

jreback commented Oct 2, 2013

ahh...this is the case @jtratner was talking about the other day...

@jreback
Contributor

jreback commented Oct 2, 2013

ok....i'll just put tests up (and not merge)
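
A test for this could look roughly like the sketch below. It is not necessarily the test that was added to pandas; `all_nan_frame` is a hypothetical helper, and the assertions assume a pandas version that includes the fix (on the version discussed in this thread the unsorted case raised instead).

```python
import pandas as pd


def all_nan_frame(tuples):
    """Build a no-data DataFrame whose index and columns share one MultiIndex.

    Hypothetical helper for illustration; not part of pandas.
    """
    mi = pd.MultiIndex.from_tuples(tuples)
    return pd.DataFrame(index=mi, columns=mi)


# Sorted-with-duplicates and unsorted-with-duplicates orderings from this thread
cases = [
    [(2, 3), (3, 3), (3, 3)],
    [(3, 3), (2, 3), (3, 3)],
]

for tuples in cases:
    df = all_nan_frame(tuples)
    # One row and one column per tuple, duplicates included
    assert df.shape == (len(tuples), len(tuples))
    # With no data passed, every cell is expected to be missing
    assert df.isna().to_numpy().all()
```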

@cpcloud
Member Author

cpcloud commented Oct 2, 2013

yes ... this happened during my refactor of read_html bc skiprows=1, header=[0, 1] and tupleize_cols=False are the trifecta of doom

3 participants