author     Joey Adams    2011-01-21 08:38:48 +0000
committer  Joey Adams    2011-01-21 08:38:48 +0000
commit     e6ba3b7e0c044aa088e204b0cb89b86ae00d5989 (patch)
tree       16fae58b983a7a3b33a84f77e6c521ae9df0e189
parent     97e50382f3ccc7f452a6ae402c824d3e5df3d4d8 (diff)
Added roadmap.markdown
-rw-r--r--  roadmap.markdown  20
1 files changed, 20 insertions, 0 deletions
diff --git a/roadmap.markdown b/roadmap.markdown
new file mode 100644
index 0000000..d3c7b54
--- /dev/null
+++ b/roadmap.markdown
@@ -0,0 +1,20 @@
+Preface: This document records some decisions about minor and not-so-minor details of the JSON datatype. It establishes the direction I would like to pursue with this implementation, but it should not be treated as set in stone.
+
+The core traits of this particular implementation are/will be as follows:
+
+ * JSON is a TEXT-like datatype. Value wrapping and extraction require explicit use of the functions `from_json` and `to_json` (see the sketch after this list).
+ * The JSON datatype allows top-level scalar values (number, string, true, false, null). `"hello"` is technically not a JSON document according to RFC 4627. However, allowing scalar toplevels tends to be more useful than not allowing them (e.g. `select json_path('[1,2,3]', '$[*]');` ).
+ * The datatype's on-disk format is JSON-formatted text, in the server encoding. Although a binary encoding could theoretically be more size-efficient, I believe an optimized text-based representation is pretty darn size-efficient too. It's also easier to implement :-)
+ * Binary send/recv will not be implemented. There is no standard binary representation for JSON at this time (BSON is *not* one-to-one with JSON—Binary JSON is a misnomer).
+ * The text representation will be optimized, not preserved verbatim (as it currently is). For example, `[ "\u006A" ]` will become `["j"]`. I believe that users generally care a lot more about the content than the formatting of JSON, and that any users interested in preserving the formatting can just use TEXT.
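+
+A minimal sketch of these behaviors follows. It uses the function names proposed above (`from_json`, `to_json`); the `::json` casts assume the type's ordinary input function accepts a literal, as in the `json_path` example, and the call signatures and results shown in the comments are illustrative rather than final:
+
+    -- explicit wrapping and extraction
+    select to_json('hello');            -- "hello"  (text wrapped as a JSON string)
+    select from_json($$ "hello" $$);    -- hello    (JSON string unwrapped to text)
+
+    -- a scalar toplevel is accepted by the input function
+    select $$ 3.14 $$::json;            -- 3.14
+
+    -- the stored text is normalized, not preserved verbatim
+    select $$ [ "\u006A" ] $$::json;    -- ["j"]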
+
+Unicode will be handled as follows:
+
+ * On input, if and only if the server encoding is UTF-8, Unicode escapes above the ASCII range will be unescaped. For example, `"\u266b"` will be condensed to `"♫"` if the server encoding is UTF-8, but left as `"\u266b"` if it is not.
+ * On output, if the client encoding is neither UTF-8 nor equivalent to the server encoding, and if the server encoding is not SQL_ASCII, then all non-ASCII characters will be escaped, no matter how expensive it is.
+ * If the server encoding is SQL_ASCII, Unicode escapes above ASCII will neither be created nor unescaped.
+ * When extracting a string that contains one or more non-ASCII escapes, as in:
+
+ select from_json($$ "\u266b" $$);
+
+   It has to be unescaped, of course. If the server encoding lacks the codepoint (including if the server encoding is SQL_ASCII), an error will be thrown. As far as I know, PostgreSQL does not provide a fast path for converting individual codepoints to/from non-UTF-8 encodings, so string extraction will be a little slower if the server encoding is not UTF-8. A short sketch of the intended behavior follows this list.
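+
+A minimal sketch of how these rules are intended to play out, reusing the example above (the `::json` casts and the results shown in the comments, including the error, are illustrative rather than actual PostgreSQL output):
+
+    -- server encoding UTF8: input condenses the escape, and extraction unescapes it
+    select $$ "\u266b" $$::json;        -- "♫"
+    select from_json($$ "\u266b" $$);   -- ♫
+
+    -- server encoding LATIN1 (which lacks U+266B): input keeps the escape,
+    -- but extraction must unescape it and therefore raises an error
+    select $$ "\u266b" $$::json;        -- "\u266b"
+    select from_json($$ "\u266b" $$);   -- ERROR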