Conversation

cpcloud
Member

@cpcloud cpcloud commented Jun 3, 2014

closes #7220

@@ -165,10 +168,11 @@ class _HtmlFrameParser(object):
See each method's respective documentation for details on their
functionality.
"""
-    def __init__(self, io, match, attrs):
+    def __init__(self, io, match, attrs, encoding):
Contributor

should this default to None, then set to utf-8? (or just not set and leave as None)

Member Author

it defaults to None (from the read_html entry point) because I didn't want to enforce an encoding if bs4 or lxml can parse it from HTML meta information.

Contributor

k...sounds good
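
For readers following this thread, here is a minimal sketch of the behavior discussed above: an `encoding` argument that defaults to `None`, in which case the charset declared in the HTML meta information is used if one is present. The helper `decode_html`, its `fallback` parameter, and the regex are hypothetical and only illustrate the idea; this is not the code in this PR, and bs4 and lxml do their own, more thorough detection.

```python
import re

# Hypothetical helper, not part of pandas: finds a charset declared in a
# <meta> tag, e.g. <meta charset="latin-1"> or the http-equiv form.
_META_CHARSET = re.compile(rb"""<meta[^>]+charset=["']?([\w-]+)""", re.IGNORECASE)


def decode_html(raw, encoding=None, fallback="utf-8"):
    """Decode raw HTML bytes.

    An explicit ``encoding`` always wins. With ``encoding=None`` the
    charset from the document's meta information is used if declared,
    otherwise ``fallback`` (an assumption made for this sketch).
    """
    if encoding is None:
        match = _META_CHARSET.search(raw)
        encoding = match.group(1).decode("ascii") if match else fallback
    return raw.decode(encoding)
```

Leaving the default as `None` rather than hard-coding `utf-8` means a caller only has to pass an encoding when the document does not declare one or declares it incorrectly.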

@cpcloud
Member Author

cpcloud commented Jun 3, 2014

@klonuo would you like to try out this branch on your data and see if it's to your liking?

@jreback
Contributor

jreback commented Jun 3, 2014

don't you have to: cc @klonuo ?

@cpcloud
Member Author

cpcloud commented Jun 3, 2014

i did

@@ -1,5 +1,8 @@
# encoding: utf8
Member Author

note to self: remove this either after passes on travis or before merge

@klonuo
Contributor

klonuo commented Jun 3, 2014

@cpcloud I just tried it and it works fine

Thanks for your patience

@cpcloud
Member Author

cpcloud commented Jun 3, 2014

great! thanks for the report. keep the issues coming. promise i won't bite anymore :)

@jreback jreback added this to the 0.14.1 milestone Jun 3, 2014
@cpcloud cpcloud self-assigned this Jun 3, 2014
@cpcloud
Member Author

cpcloud commented Jun 4, 2014

@jreback going to merge this after whatsnew

@jreback
Contributor

jreback commented Jun 4, 2014

yep

cpcloud added a commit that referenced this pull request Jun 4, 2014
UNI/HTML/WIP: add encoding argument to read_html
@cpcloud cpcloud merged commit 89983c3 into pandas-dev:master Jun 4, 2014
@cpcloud cpcloud deleted the html-encoding branch June 4, 2014 14:03
Labels
Enhancement IO HTML read_html, to_html, Styler.apply, Styler.applymap Unicode Unicode strings
Development

Successfully merging this pull request may close these issues.

Suggestions for html table parsing
3 participants