The attached patch stops MySQLdb from converting binary
CHAR and binary VARCHAR fields to unicode objects when
one has connected with use_unicode=True. The patch is
against 1.2.1c3 and also applies against 1.2.0.
Your patch causes char and varchar columns with a binary
collation to be returned as array('c',...). I find that, for
example, on the 5.0 (and probably 4.1) privilege tables, a
lot of the fields have utf8_bin collation, which causes
these fields to be returned as arrays, which means they are
not properly decoded (left in utf8 encoding).
Unless you can provide some use cases where this is
required, I'll need to remove this.
I actually wanted only to not convert CHAR / VARCHAR
columns with collation 'BINARY' to unicode objects; these
columns are then "converted" to the type BINARY or
VARBINARY (like TEXT with collation 'BINARY' is converted
to BLOB).
I think the patch accomplished this, but I didn't test for
'*_bin' collations, where I would want the conversion to
unicode objects to happen.
Is there a way to do this?
Thanks & Regards,
Milan
I'm rethinking this a bit. Binary columns really can't be
returned as unicode strings, since they may contain invalid
unicode data, so they need to be returned either as
array('c') or string, and I'm inclined lately to make the
latter choice user- or site-configurable, since a lot of
people hate array('c') (I think it's annoying too).
I just tested 1.2.2b1 and saw that my VARBINARY fields are
now returned as byte (binary) string :) (which I like
better than array('c') too btw)
I also tested VARCHAR and text with utf8_bin, and saw that
they are also returned as byte string. I think we agreed
that utf8_bin should be decoded, and I verified that the
MySQL manual also says it's UTF-8 data. And I also found
out that the manual tells us how to distinguish VARBINARY
from VARCHAR etc :)
First, this page clears something up:
<URL:http://dev.mysql.com/doc/refman/4.1/en/charset-binary-op.html>
It says that instead of the BINARY /attribute/ we must now
(in 4.1) use the BINARY /character set/ for binary data. In
4.1, the binary attribute causes the character set's _bin
collation to be used (e.g. utf8_bin). This is why VARCHAR
with _bin has that attribute set.
<URL:http://dev.mysql.com/doc/refman/4.1/en/c-api-datatypes.html>
only knows one constant each for the VARCHAR/VARBINARY,
CHAR/BINARY and TEXT/BLOB types, but tells how to
distinguish them anyway:
To distinguish between binary and non-binary data for
string data types, check whether the charsetnr value is
63. If so, the character set is binary, which indicates
binary rather than non-binary data. This is how to
distinguish between BINARY and CHAR, VARBINARY and
VARCHAR, and BLOB and TEXT.
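The rule quoted above could be sketched in Python roughly as follows. This is only an illustration and not MySQLdb's actual converter code: the function name, the constant name and the way the field's charsetnr reaches the function are all assumptions made for the example; only the value 63 comes from the manual text quoted above.

```python
# Hypothetical sketch, not taken from MySQLdb itself.
# 63 is the charsetnr the MySQL manual gives for the binary character set.
BINARY_CHARSETNR = 63


def convert_string_value(raw, charsetnr, encoding="utf-8"):
    """Decode a raw column value unless the column is truly binary.

    raw       -- the bytes received from the server for one field
    charsetnr -- the charsetnr reported for that field (assumed to be
                 available to the caller from the result metadata)
    encoding  -- the connection character set (assumed here to be UTF-8)
    """
    if charsetnr == BINARY_CHARSETNR:
        # BINARY / VARBINARY / BLOB: leave the bytes untouched.
        return raw
    # CHAR / VARCHAR / TEXT, including *_bin collations such as utf8_bin:
    # the data is text in the connection encoding, so decode it.
    return raw.decode(encoding)
```

With this check a VARCHAR column with utf8_bin collation (which reports a charsetnr other than 63) would be decoded, while a VARBINARY column would come back as a byte string.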
I have no intention of writing a patch myself, as I don't
need it for now, but I think you might be interested ;)
Regards,
Milan
Patch to fix converting binary chars and binary varchars to unicode objects
Logged In: YES
user_id=71372
Your patch, or a variation, has been applied to the current CVS tree.
Logged In: YES
user_id=71372
I don't think there is a way to distinguish between the two
cases with the C API as both will set BINARY_FLAG/FLAG.BINARY.
Also see:
http://dev.mysql.com/doc/refman/5.0/en/charset-binary-op.html
Logged In: YES
user_id=71372
Wow, that really sucks.
Logged In: YES
user_id=71372
Originator: NO
I don't really know what to do with this at this point.