author     Joey Adams    2011-01-21 08:38:48 +0000
committer  Joey Adams    2011-01-21 08:38:48 +0000
commit     e6ba3b7e0c044aa088e204b0cb89b86ae00d5989 (patch)
tree       16fae58b983a7a3b33a84f77e6c521ae9df0e189
parent     97e50382f3ccc7f452a6ae402c824d3e5df3d4d8 (diff)
Added roadmap.markdown
-rw-r--r--  roadmap.markdown  20
1 files changed, 20 insertions, 0 deletions
diff --git a/roadmap.markdown b/roadmap.markdown
new file mode 100644
index 0000000..d3c7b54
--- /dev/null
+++ b/roadmap.markdown
@@ -0,0 +1,20 @@
+Preface: This document records some decisions about minor and not-so-minor details of the JSON datatype. It establishes the direction I would like to pursue with this implementation, but it should not be treated as set in stone.
+
+The core traits of this particular implementation are/will be as follows:
+
+ * JSON is a TEXT-like datatype. Value wrapping and extraction require explicit use of the functions `from_json` and `to_json` (see the sketch after this list).
+ * The JSON datatype allows top-level scalar values (number, string, true, false, null). `"hello"` is technically not a JSON document according to RFC 4627. However, allowing scalar toplevels tends to be more useful than not allowing them (e.g. `select json_path('[1,2,3]', '$[*]');` ).
+ * The datatype's on-disk format is JSON-formatted text, in the server encoding. Although a binary encoding could theoretically be more size-efficient, I believe an optimized text-based representation is pretty darn size-efficient too. It's also easier to implement :-)
+ * Binary send/recv will not be implemented. There is no standard binary representation for JSON at this time (BSON is *not* one-to-one with JSON—Binary JSON is a misnomer).
+ * The text representation will be optimized, not preserved verbatim (as it currently is). For example, `[ "\u006A" ]` will become `["j"]`. I believe that users generally care a lot more about the content than the formatting of JSON, and that any users interested in preserving the formatting can just use TEXT.
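+
+A minimal sketch of these behaviors follows. It uses the function names proposed above (`from_json`, `to_json`); the `::json` casts assume the type's ordinary input function accepts a literal, as in the `json_path` example, and the call signatures and results shown in the comments are illustrative rather than final:
+
+    -- explicit wrapping and extraction
+    select to_json('hello');            -- "hello"  (text wrapped as a JSON string)
+    select from_json($$ "hello" $$);    -- hello    (JSON string unwrapped to text)
+
+    -- a scalar toplevel is accepted by the input function
+    select $$ 3.14 $$::json;            -- 3.14
+
+    -- the stored text is normalized, not preserved verbatim
+    select $$ [ "\u006A" ] $$::json;    -- ["j"]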
+
+Unicode will be handled as follows:
+
+ * On input, if and only if the server encoding is UTF-8, Unicode escapes above the ASCII range will be unescaped. For example, `"\u266b"` will be condensed to `"♫"` if the server encoding is UTF-8, but left as `"\u266b"` if it is not.
+ * On output, if the client encoding is neither UTF-8 nor equivalent to the server encoding, and if the server encoding is not SQL_ASCII, then all non-ASCII characters will be escaped, no matter how expensive it is.
+ * If the server encoding is SQL_ASCII, Unicode escapes above ASCII will neither be created nor unescaped.
+ * When extracting a string that contains one or more non-ASCII escapes, as in:
+
+ select from_json($$ "\u266b" $$);
+
+   It has to be unescaped, of course. If the server encoding lacks the codepoint (including if the server encoding is SQL_ASCII), an error will be thrown. As far as I know, PostgreSQL does not provide a fast path for converting individual codepoints to/from non-UTF-8 encodings, so string extraction will be a little slower if the server encoding is not UTF-8. A short sketch of the intended behavior follows this list.
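+
+A minimal sketch of how these rules are intended to play out, reusing the example above (the `::json` casts and the results shown in the comments, including the error, are illustrative rather than actual PostgreSQL output):
+
+    -- server encoding UTF8: input condenses the escape, and extraction unescapes it
+    select $$ "\u266b" $$::json;        -- "♫"
+    select from_json($$ "\u266b" $$);   -- ♫
+
+    -- server encoding LATIN1 (which lacks U+266B): input keeps the escape,
+    -- but extraction must unescape it and therefore raises an error
+    select $$ "\u266b" $$::json;        -- "\u266b"
+    select from_json($$ "\u266b" $$);   -- ERROR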