author     Joey Adams    2011-01-21 08:38:48 +0000
committer  Joey Adams    2011-01-21 08:38:48 +0000
commit     e6ba3b7e0c044aa088e204b0cb89b86ae00d5989
tree       16fae58b983a7a3b33a84f77e6c521ae9df0e189
parent     97e50382f3ccc7f452a6ae402c824d3e5df3d4d8
Added roadmap.markdown
-rw-r--r--  roadmap.markdown | 20
1 file changed, 20 insertions, 0 deletions
diff --git a/roadmap.markdown b/roadmap.markdown
new file mode 100644
index 0000000..d3c7b54
--- /dev/null
+++ b/roadmap.markdown
@@ -0,0 +1,20 @@
+Preface: This document merely records some decisions about minor and not-so-minor details of the JSON datatype. It establishes the direction I would like to pursue with this implementation, but it should not be interpreted as inflexible.
+
+The core traits of this particular implementation are/will be as follows:
+
+ * JSON is a TEXT-like datatype. Value wrapping and extraction require explicit use of the functions `from_json` and `to_json`.
+ * The JSON datatype allows top-level scalar values (number, string, true, false, null). `"hello"` is technically not a JSON document according to RFC 4627; however, allowing scalar toplevels tends to be more useful than disallowing them (e.g. `select json_path('[1,2,3]', '$[*]');`).
+ * The datatype's on-disk format is JSON-formatted text, in the server encoding. Although a binary encoding could theoretically be more size-efficient, I believe an optimized text-based representation is pretty darn size-efficient too. It's also easier to implement :-)
+ * Binary send/recv will not be implemented. There is no standard binary representation for JSON at this time (BSON is *not* one-to-one with JSON; "Binary JSON" is a misnomer).
+ * The text representation will be optimized, not preserved verbatim (as it currently is). For example, `[ "\u006A" ]` will become `["j"]`. I believe that users generally care a lot more about the content of JSON than its formatting, and that any users interested in preserving the formatting can just use TEXT.
+
+Unicode will be handled as follows:
+
+ * On input, if and only if the server encoding is UTF-8, Unicode escapes above the ASCII range will be unescaped. For example, `"\u266b"` will be condensed to `"♫"` if the server encoding is UTF-8, and left as `"\u266b"` if it is not.
+ * On output, if the client encoding is neither UTF-8 nor equivalent to the server encoding, and the server encoding is not SQL_ASCII, then all non-ASCII characters will be escaped, no matter how expensive that is.
+ * If the server encoding is SQL_ASCII, Unicode escapes above the ASCII range will never be created or unescaped.
+ * When extracting a string in which a non-ASCII escape occurs, as in:
+
+       select from_json($$ "\u266b" $$);
+
+   It has to be unescaped, of course. If the server encoding lacks the codepoint (including when the server encoding is SQL_ASCII), an error will be thrown. As far as I know, PostgreSQL does not provide a fast path for converting individual codepoints to/from non-UTF-8 encodings, so string extraction will be a little slower if the server encoding is not UTF-8.
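
For concreteness, here is a brief illustrative sketch of the usage described in the roadmap above. It assumes the planned `json` type and the `from_json`, `to_json`, and `json_path` functions named in the document; the exact signatures and the results shown in the comments are expectations drawn from this roadmap, not the behavior of any released code.

    -- Wrap a SQL text value as JSON; a scalar top level is allowed (assumed interface).
    select to_json('hello');              -- expected: "hello"

    -- Extract a text value from a JSON string; \u266b is unescaped to ♫ when the
    -- server encoding is UTF-8, and an error is raised if the server encoding
    -- cannot represent the codepoint (assumed behavior).
    select from_json($$ "\u266b" $$);     -- expected: ♫

    -- Scalar top levels keep results like this directly usable; json_path is the
    -- function mentioned in the roadmap, with semantics assumed here.
    select json_path('[1,2,3]', '$[*]');  -- expected: the values 1, 2, 3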