ENH: Make categorical repr nicer. #4368
Added tests. |
@jseabold this is nice......can you hook up travis? (prob just need to flip the switch), then
|
Should be going now. Ping me if it's not. Maybe needed a setup lag. |
does not appear to have taken..... |
Appears to be going now. |
Well, I don't see the banner here, but I see it running on travis. I dunno. |
travis usually takes a couple of minutes to actually start, that's how it rolls |
i see the banner |
Anyone know off the top of their head the 2.6 errors? I assume it's a unicode comparison issue... |
that kind of looks like a bug python 2.6:
python 2.7
|
or possibly a difference in numpy since categorical repr calls np.array_repr...my python 2.6 has np 1.6.1 and py 2.7 up there has 1.7.1 |
def _repr_footer(self):
    levheader = 'Levels (%d): ' % len(self.levels)
    #TODO: should max_line_width respect a setting?
    levstring = np.array_repr(self.levels, max_line_width=60)
yep...this should be com.pprint_thing(self.levels)
or maybe self.levels.format()
self.levels.format() does the correct thing
I'm almost inclined to just fix the tests here and live with the numpy inconsistency unless there's another way around this.
pprint_thing drops the object name and I like knowing levels is an Index. E.g.,
>>> np.array_repr(np.arange(3))
'array([0, 1, 2])'
>>> com.pprint_thing(np.arange(3))
u'[0, 1, 2]'
what's wrong with self.levels.format()?
It's a list (for 1.8.x)? So either array_repr fails with an error or I'm back to pprint_thing. Either way I lose the Index([...]) information.
Or of course, leaving it, I lose the Index info. I could just do something like "Index(%s)" but I hoped to avoid this in case levels is ever not an Index for any reason, but I guess the tests will catch a change like that now.
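For illustration, the hard-coded "Index(%s)" idea might look roughly like the sketch below. It is plain Python, not the code in this PR: `repr_footer` and its `wrapper` parameter are hypothetical names, and the levels are formatted with `str()` as an assumption, so the output does not depend on how a given NumPy version prints object arrays.

```python
def repr_footer(levels, wrapper="Index"):
    """Build a 'Levels (n): Index([...])' footer line (hypothetical helper)."""
    # Format each level ourselves instead of relying on np.array_repr,
    # whose output for object arrays differs between NumPy versions.
    body = ", ".join(str(level) for level in levels)
    # The wrapper name is hard-coded by default, which is the trade-off
    # mentioned above: it would be wrong if levels were ever not an Index.
    return "Levels (%d): %s([%s])" % (len(levels), wrapper, body)


print(repr_footer(["a", "b", "c"]))  # Levels (3): Index([a, b, c])
```

Passing a different `wrapper` would cover the case where levels is some other container, at the cost of the caller having to know the type name.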
oh duh sorry yes you're right.
In a delicious bit of irony I'm the one that changed the unicode repr of numpy object arrays.
yep |
Thanks. For the record, y'all beat my attempts to build Python 2.6, the whole numpy stack, and pandas in the background of my work. |
@jseabold building the stack is not fun. i get strange issues with |
hah....we have had our fair share of issues with py2.6 lately! here's a fun bug in python (that was actually not fixed): http://bugs.python.org/issue2325 |
Yes, I switched to openblas recently and my hacked together numpy/distutils is not working for scipy (numpy tests pass) for 2.6. |
Fixed the tests. |
gets a bit unwieldy for a large number of levels (i tried 100), but my opinion here doesn't matter that much i never use |
I left a TODO in there for this. It's admittedly not handled, but I can't imagine having categorical variables with too many categories. The degrees of freedom loss is too prohibitive in estimation. Maybe it could be useful in some machine learning contexts, but I don't know how ML people use factors in R. I suspect they don't. |
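One way the TODO could eventually be handled is to elide the middle of a long level list. The sketch below is only an assumption about what that might look like: `format_levels` and `max_levels` are hypothetical names, and nothing like this is part of the PR.

```python
def format_levels(levels, max_levels=10):
    """Render levels as '[a, b, ..., y, z]', eliding the middle if too long."""
    items = [str(level) for level in levels]
    if len(items) <= max_levels:
        return "[%s]" % ", ".join(items)
    # Show the first half of the budget from the front and the rest
    # from the back, with a single '...' marker in between.
    head = max_levels // 2
    tail = max_levels - head
    shown = items[:head] + ["..."] + items[len(items) - tail:]
    return "[%s]" % ", ".join(shown)


print(format_levels(range(100), max_levels=4))  # [0, 1, ..., 98, 99]
```

A real implementation would more likely take its limit from a display option than from a keyword argument.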
totally fine by me. like i said, i never found a need for this |
fine by me |
Is it too late to make the case for the repr being valid python (so it can eval itself) :s ? |
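For reference, "the repr can eval itself" means the round trip below holds. This is a toy sketch with a hypothetical `Factor` class, not the pandas Categorical; it only shows the property being asked about.

```python
class Factor(object):
    """Toy stand-in for a categorical type (hypothetical, not pandas)."""

    def __init__(self, labels, levels):
        self.labels = list(labels)
        self.levels = list(levels)

    def __repr__(self):
        # Emit a valid constructor call so that eval(repr(x)) rebuilds x.
        return "Factor(labels=%r, levels=%r)" % (self.labels, self.levels)

    def __eq__(self, other):
        return (isinstance(other, Factor)
                and self.labels == other.labels
                and self.levels == other.levels)


original = Factor([0, 1, 0], ["a", "b"])
rebuilt = eval(repr(original))
print(rebuilt == original)  # True
```

The nicer multi-line repr in this PR gives up that property in exchange for readability.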
@jseabold this almost got lost... can you rebase and we can merge it in.... thxs |
Rebased. There was a merge conflict, which I didn't check too closely, so make sure tests pass. |
Any idea what's going on here? |
Looks like a circular import introduced in 85f191c? |
Make looking at Categorical types a little nicer. Needs some tests still, but works fine locally so far.