unicode to ascii

December 17, 2010

Problem

I had the following Unicode string: “Kellemes Ünnepeket!”, and I wanted to simplify it to “Kellemes Unnepeket!”, that is, to replace “Ü” with “U”. Most of the strings were plain ASCII; only some of them were Unicode.

Solution

import unicodedata

title = ...   # get the string somehow
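try:
    # if the title is a unicode string, normalize it:
    # NFKD splits 'Ü' into 'U' plus a combining mark, and
    # encode('ascii', 'ignore') then drops the mark
    title = unicodedata.normalize('NFKD', title).encode('ascii', 'ignore')
except TypeError:
    # normalize() raises TypeError for a non-unicode string => OK, do nothing
    pass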
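
The snippet above relies on normalize() rejecting non-unicode strings. As a rough sketch of the same idea for Python 3 (an assumption, the original post does not name a Python version), the two cases can be told apart explicitly. The function name to_ascii is hypothetical, and byte strings are assumed to be UTF-8 encoded:

```python
import unicodedata


def to_ascii(title):
    """Return title as a plain ASCII str with accents stripped.

    Hypothetical helper, not part of the original post.
    """
    if isinstance(title, bytes):
        # Assumption: byte strings are UTF-8 encoded.
        title = title.decode('utf-8')
    # NFKD splits 'Ü' into 'U' followed by a combining diaeresis (U+0308).
    decomposed = unicodedata.normalize('NFKD', title)
    # 'ignore' drops the combining mark and any other non-ASCII character;
    # decode() turns the resulting bytes back into a str.
    return decomposed.encode('ascii', 'ignore').decode('ascii')


print(to_ascii('Kellemes Ünnepeket!'))  # prints: Kellemes Unnepeket!
```

Note that characters without a decomposition, such as “ß” or “ø”, are not transliterated but silently dropped, so this approach only suits accented letters.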

Credits

I used the following resources:

Categories: python