roadmap.markdown: Summarized Unicode handling, and made minor corrections.

author: Joey Adams 2011-03-18 15:53:16 +0000
committer: Joey Adams 2011-03-18 15:53:16 +0000
commit: 59b46ebff2133b363ae3b174714726cf1c8640a4 (patch)
tree: c71a46597ca521e8e0db6d84a8f7eebee540252e
parent: 5228d8477bbcb74bba287b94ac043a04a7c9b77b (diff)
1 files changed, 10 insertions, 2 deletions
diff --git a/roadmap.markdown b/roadmap.markdown
index a416533..aea302a 100644
--- a/roadmap.markdown
+++ b/roadmap.markdown
@@ -15,7 +15,14 @@ The core traits of this particular implementation is/will be as follows:
  * Binary send/recv will not be implemented.  There is no standard binary representation for JSON at this time (BSON is *not* one-to-one with JSON—Binary JSON is a misnomer).
  * The text representation will be optimized, not preserved verbatim (as it currently is).  For example, `[   "\u006A" ]` will become `["j"]`.  I believe that users generally care a lot more about the content than the formatting of JSON, and that any users interested in preserving the formatting can just use TEXT.
 
-Unicode will be handled as follows:
+In a nutshell, character set handling follows two principles:
+
+ * Escapes are converted to characters to save space when it is possible and efficient to do so.
+ * Characters are escaped as necessary to prevent encoding conversion errors.
+
+The JSON datatype behaves ideally with respect to encodings when both the client and server encodings are UTF-8.  When the client encoding is neither UTF-8, nor SQL_ASCII, nor the same as the server encoding, a performance penalty is incurred to prevent encoding conversion errors.
+
+More specifically:
 
  * On input:
 
@@ -28,6 +35,7 @@ Unicode will be handled as follows:
        - The client encoding is UTF-8.  Escaping is not necessary because the client can encode all Unicode codepoints.
        - The client encoding and the server encoding are the same.  Escaping is not necessary because the client can encode all codepoints the server can encode.
        - The server encoding is SQL_ASCII.  This encoding tells PostgreSQL to shirk transcoding in favor of speed.  It wasn't unescaped on input, so don't worry about escaping on output.
+       - The client encoding is SQL_ASCII.  This encoding tells PostgreSQL to not perform encoding conversion.
     - Otherwise, (no matter how expensive it is) all non-ASCII characters are escaped.
 
  * When encoding a string to JSON on the server, as in:
@@ -40,4 +48,4 @@ Unicode will be handled as follows:
 
         SELECT from_json($$ "\u266b" $$);
 
-   It has to be unescaped, of course.  If the server encoding lacks the codepoint (including if the server encoding is SQL_ASCII), an error is thrown.  As far as I know, PostgreSQL does not provide a fast path for converting individual codepoints to/from non-UTF-8 encodings, so string extraction will be a little slower if the server encoding is not UTF-8 or SQL_ASCII.
+   It has to be unescaped, of course.  If the server encoding lacks the codepoint (including if the server encoding is SQL_ASCII), an error is thrown.  As far as I know, PostgreSQL does not provide a fast path for converting individual codepoints to/from non-UTF-8 encodings, so string extraction will be slower if the server encoding is not UTF-8 or SQL_ASCII.
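
The output-side rule described in this patch (skip escaping in four cases, otherwise escape every non-ASCII character) can be sketched roughly as follows. This is an illustrative Python model, not the extension's actual C code: the function names `needs_escaping` and `escape_non_ascii` are hypothetical, and encoding names are assumed to be plain strings such as `UTF8`, `LATIN1`, or `SQL_ASCII`.

```python
def needs_escaping(client_encoding: str, server_encoding: str) -> bool:
    """Return True when non-ASCII characters should be escaped on output.

    Hypothetical helper mirroring the four "no escaping needed" cases
    listed in the roadmap; any other combination falls through to True.
    """
    # Normalize spelling so "UTF-8" and "utf8" compare equal (assumption).
    client = client_encoding.upper().replace("-", "")
    server = server_encoding.upper().replace("-", "")

    if client == "UTF8":
        return False  # client can encode every Unicode codepoint
    if client == server:
        return False  # client can encode whatever the server can
    if server == "SQL_ASCII":
        return False  # nothing was unescaped on input
    if client == "SQL_ASCII":
        return False  # no encoding conversion is performed
    return True


def escape_non_ascii(json_text: str) -> str:
    """Replace each non-ASCII character with a JSON \\uXXXX escape.

    Codepoints above U+FFFF are written as a UTF-16 surrogate pair,
    which is how JSON represents them in escaped form.
    """
    out = []
    for ch in json_text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)
        elif cp < 0x10000:
            out.append("\\u%04x" % cp)
        else:
            cp -= 0x10000
            high = 0xD800 + (cp >> 10)
            low = 0xDC00 + (cp & 0x3FF)
            out.append("\\u%04x\\u%04x" % (high, low))
    return "".join(out)
```

Under these assumptions, a LATIN1 client talking to a UTF8 server would take the slow path, and `escape_non_ascii` would turn a stored `♫` back into `\u266b` before the text is sent. Valid JSON only contains non-ASCII characters inside strings, so escaping them across the whole text is safe in this sketch.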