Json Final
Juan L. Reutter
Pontificia Universidad Católica de Chile and IMFD Chile
Domagoj Vrgoč
Pontificia Universidad Católica de Chile and IMFD Chile
Abstract
Despite the fact that JSON is currently one of the most popular formats for
exchanging data on the Web, there are very few studies on this topic and
there is no agreement upon a theoretical framework for dealing with JSON.
Therefore in this paper we propose a formal data model for JSON documents
and, based on the common features present in available systems using JSON,
we define a lightweight query language allowing us to navigate through JSON
documents, study the complexity of basic computational tasks associated
with this language, and compare its expressive power with practical languages
for managing JSON data.
Keywords: JSON, Schema languages, Navigation
1. Introduction
JavaScript Object Notation (JSON) [29, 19] is a lightweight format based
on the data types of the JavaScript programming language. In their essence,
JSON documents are dictionaries consisting of key-value pairs, where the
value can again be a JSON document, thus allowing an arbitrary level of
nesting. An example of a JSON document is given in Figure 1. As we can see
here, apart from simple dictionaries, JSON also supports arrays and atomic
types such as numbers and strings. Arrays and dictionaries can again contain
arbitrary JSON documents, thus making the format fully compositional.
Due to its simplicity, and the fact that it is easily readable both by hu-
mans and by machines, JSON is quickly becoming one of the most popular
formats for exchanging data on the Web. This is particularly evident in
Web services communicating with their users through an Application Pro-
gramming Interface (API), as JSON is currently the predominant format for
sending API requests and responses over the HTTP protocol. Additionally,
the JSON format is widely used in database systems built around the NoSQL
paradigm (see e.g. [38, 3, 42]), or graph databases (see e.g. [48]).
Despite its popularity, the coverage of the JSON format in
the research literature is very sparse, and to the best of our knowledge,
there is still no agreement on the correct theoretical framework for JSON,
and no formalisation of the core query features which JSON systems should
support. While some preliminary studies do exist [40, 33, 10, 44, 27], as far
as we are aware, no attempt to describe a theoretical basis for JSON has
been made by the research community. Therefore, the main objective of this
paper is to formally define an appropriate data model for JSON, identify
the key querying features provided by the existing JSON systems, study the
complexity of basic tasks associated with these query features such as evaluation
and satisfiability, and compare existing JSON languages with respect to the
type of features they support.
In order to define the data model, we examine the key characteristics of
JSON documents and how they are used in practice. As a result we obtain
a tree-shaped structure very similar to the ordered data-tree model of XML
[8], but with some key differences. The first difference is that JSON trees are
deterministic by design, as each key can appear at most once inside a single
nesting level of a dictionary. This has various implications at the time of
querying JSON documents: on one hand we sometimes deal with languages
far simpler than XML, but on the other hand this key restriction can make
static analysis more complicated. Next, arrays are explicitly present in JSON,
which is not the case in XML. Of course, the ordered structure of XML could
be used to simulate arrays, but the defining feature of each JSON dictionary
is that it is unordered, thus dictating the nodes of our tree to be typed
accordingly. And finally, JSON values are again JSON objects, thus making
equality comparisons much more complex than in case of XML, since we
are now comparing subtrees, and not just atomic values. We cover all of
these features of JSON in more detail in the paper, and we also argue that,
while technically possible (albeit, in a very awkward manner), coding JSON
documents using XML might not be the best solution in practice.
Having a formal data model for JSON documents in place, we then con-
sider the problem of querying JSON. As there is currently no agreed upon
query language in place, we examine an array of practical JSON systems,
ranging from programming languages such as Python [23], XPath analogues
for JSON such as JSONPath [24], or fully operational JSON databases such
as MongoDB [38], and isolate what we consider to be key concepts for access-
ing JSON documents. As we will see, the main focus in many systems is on
navigating the structure of a JSON tree, therefore we propose a navigational
logic for JSON documents based on similar approaches from the realm of
XML [21], or graph databases [4, 32]. We then show how our logic captures
common use cases for JSON, extend it with additional features, and demon-
strate that it respects the “lightweight nature” of the JSON format, since
it can be evaluated very efficiently, and it also has reasonable complexity of
main static tasks. Interestingly, sometimes we can reuse results devised for
other similar languages such as XPath or Propositional Dynamic Logic, but
the nature of JSON and the functionalities present in query languages also
demand new approaches or a refinement of these techniques.
Finally, since theoretical study of JSON is still in its early stages, we close
with a series of open problems and directions for future research.
Related work. We argue in this paper that JSON needs a formal model
and that we need to study logics for JSON query processing. However, the
analysis of this logic is related to, and draws from, a wide body of research
about XML documents, XPath query answering, tree automata, and its var-
ious extensions. We remark that in this paper we are more interested in the
complexity of evaluating formulas in our logic, and (to a lesser degree) in the
problem of checking whether such formulas are satisfiable. It would also be
interesting to have a complete analysis of particular operators of our logic,
and how including or excluding them from the language affects the evaluation
and the satisfiability problem, similarly to what was done for the case of
XPath (see e.g. [5, 21]). There is plenty of work on XML processing that
could be reused for this analysis, but we think such considerations are out of
the scope of this paper.
On the other hand, we believe that JSON offers some unique challenges
that were not considered in the XML setting. First of all, JSON introduces a
kind of deterministic navigation, stemming from the restriction that key:value
pairs in JSON are unique in a specific level of the dictionary. In turn, this
means that the evaluation problem that we study has a different flavour
than the usual XML evaluation problem, as far as we are aware (see e.g.
[26, 18, 5]). Similarly, to process the equality operators in the JSON logic
we propose, one must deal with subtree comparisons. Nonetheless, we can
still extend some of the known techniques that work in the XML setting to
JSON. For instance, to evaluate a query that uses subtree comparisons, we
can preprocess our document by deploying a DAG-like compression such as
the one introduced in [14], in order to reduce subtree comparisons to data
comparisons. For satisfiability we know that this operator can easily lead
to undecidability as long as some form of recursion is allowed [49, 34], and
even local equality tests can be troublesome [20, 22]. We confirm this with a
simple proof in this paper. The extensions of the basic JSON logic with non-determinism
and/or recursion look much more like known languages such
as Propositional Dynamic Logic or XPath. For the evaluation of this logic
we offer a strategy based on compiling subtree equalities as node data tests,
and then invoke known bounds for XML [9] or PDL [2, 15]. For satisfiability
we offer just two results, one for the full logic and another one for the full
logic without equalities, but there are several results for XML that could
help us pinpoint the behaviour of each particular operator, starting with [36]
for a simple version in which arrays are allowed nondeterminism, or porting
any of the fragments studied in e.g. [39, 18, 5, 22]. One could also try to
study more complex problems such as minimization of queries which were
considered in the XML context (see e.g. [17]), however, that is not the main
focus of our paper.
Organisation. We introduce the required notation and define the appropriate
data model for JSON in Section 2, and in Section 3 we
present our logic, together with an inspection of JSON query languages that
fuelled the design choices of the logic. Section 4 presents a complete analysis
of some of the algorithmic properties of our logic, namely the evaluation and
satisfiability problems. Our conclusions and the directions for future work
are discussed in Section 5.
Remark. Some of the results in this paper have previously appeared in
a conference proceedings [11]. However, this paper introduces substantial
new material. The analysis of JSON languages has been expanded, and this
version includes a series of backwards comparisons between our logic and
languages developed for JSON systems. The formalisation of MongoDB’s
projection is also new to this paper, as well as some of the proofs for the
results about evaluation and satisfiability.
objects or arrays, thus allowing the documents an arbitrary level of nesting.
A JSON document (or just document) is any JSON value. From now on we
will use the term JSON document and JSON value interchangeably.
1 Some JSON systems use the dot notation, where J.key and J.n are the equivalents of
J[key] and J[n].
over each of v1 , . . . , vn . We also mention that some JSON databases
feature FLWR expressions that support this type of iteration (see
e.g. [41, 10]).
2. JSON query languages normally provide a declarative alternative for
this iteration in the form of a wildcard or a similar argument that would
be matched to any key in a navigation instruction, and therefore extend
navigation instructions so that they return a set of values. For example,
the instruction J[*] would retrieve the set {v1 , . . . , vn} of JSON values,
since the wildcard * matches any of the keys s1 , . . . , sn .
To incorporate these primitives into our model, we add two more princi-
ples for navigating documents:
If J is a JSON object, then one should be able to iterate over all of its
values.
If J is a JSON array, then one should be able to iterate over all of its
elements.
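A minimal Python sketch of these two iteration principles, assuming a document is held as the value returned by json.loads. The helper name iterate is hypothetical; it mirrors the wildcard J[*] and returns a list rather than a set, to keep the order of arrays visible.

```python
import json

def iterate(doc):
    """Values reached by the wildcard J[*]; `iterate` is a hypothetical helper name."""
    if isinstance(doc, dict):   # JSON object: its values (the keys are not returned)
        return list(doc.values())
    if isinstance(doc, list):   # JSON array: its elements, in array order
        return list(doc)
    return []                   # strings and numbers have nothing to iterate over

J = json.loads('{"first": "John", "last": "Doe"}')
print(iterate(J))
```

Note that the helper exposes only values, never keys, which matches the access pattern discussed next.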
At this point it is worth noting a few consequences of
processing JSON according to these primitives, as they play an important
role when formalising JSON query languages.
First, note that we do not have a way of obtaining the keys of JSON
objects. For example, if J is the object {"first":"John", "last":"Doe"}
we could issue the instruction J[first] to obtain the value of the pair
"first":"John", which is the string value "John", or J[last] to obtain
the value of the pair "last":"Doe". However, there is no instruction that
can retrieve the keys inside this document (i.e. "first" and "last" in this
case).
Similarly, for the case of the arrays, the access is essentially a random
access: we can access the i-th element of an array using an expression of the
form J[i], and most of the time there are additional primitives to access the
first or the last element of arrays. For instance, J[-1] usually defines the
last element of the array, and J[-k], the kth element from last to first. Most
of the time, it is also possible to navigate through arrays by using a range of
indices. For instance, J[3,6] navigates from index 3 to index 6 (returning
all these elements in this particular order; i.e. returning a sub-array of J).
Likewise, J[0,-1] navigates from the first to the last element in the array.
Sometimes it is also possible to navigate the array in reverse using negative
indices; e.g. J[-1,-4] returns the last four elements of J in reverse order.
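The array access patterns above can be sketched in Python as follows. This is only an illustration under stated assumptions: indices are 0-based (as in J[0,-1]), both ends of a range are included, and the helper names resolve, element and index_range are hypothetical. Actual systems differ in these details.

```python
def resolve(arr, i):
    """Turn a possibly negative index into a 0-based position.

    Assumes -len(arr) <= i < len(arr); no bounds checking is done here.
    """
    return i if i >= 0 else len(arr) + i

def element(arr, i):
    """J[i]: random access; J[-1] is the last element, J[-k] the k-th from the end."""
    return arr[resolve(arr, i)]

def index_range(arr, i, j):
    """J[i,j]: elements from index i to index j, both ends included (an assumption)."""
    a, b = resolve(arr, i), resolve(arr, j)
    if a <= b:
        return arr[a:b + 1]
    return arr[b:a + 1][::-1]   # start lies after end: walk the array in reverse

J = ["a", "b", "c", "d", "e", "f", "g"]
print(element(J, -1), index_range(J, 3, 6), index_range(J, -1, -4))
```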
2.2. JSON trees
JSON objects are by definition compositional: each JSON object is a
set of key-value pairs, in which values can again be JSON objects. This
naturally suggests using a tree-shaped structure to model JSON documents.
However, this structure must preserve the compositional nature of the JSON
specification. That is, if each node of the tree structure represents a JSON
document, then the children of each node must represent the documents
nested within it. For instance, consider the following JSON document J.
{
  "name": {
    "first": "John",
    "last": "Doe"
  },
  "age": 32
}
As explained before, this document is a JSON object which contains two
keys: "name" and "age". Furthermore, the value of the key "name" is an-
other JSON document and the value of the key "age" is the number 32.
There are in total five JSON values inside this object: the complete document
itself, plus the literals 32, "John" and "Doe", and the object
{"first":"John", "last":"Doe"} stored under the key "name". So what should a tree representation of
the document J look like? If we are to preserve the compositional structure
of the JSON specification, then the most natural representation is by using
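the following edge-labelled tree:
[Figure: the edge-labelled tree of J. The root has two outgoing edges labelled "name" and "age". The edge "age" leads to a leaf with value 32; the edge "name" leads to a node with two outgoing edges, "first" and "last", leading to leaves with values "John" and "Doe".]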
The root of the tree represents the entire document. The two edges labelled
"name" and "age" represent two keys inside this JSON object, and they lead
to nodes representing their respective values. In the case of the key "age"
this is just a number, while in the case of "name" we obtain another JSON
object that is represented as a subtree of the entire tree.
Finally, we need to enforce the property that no object can have two keys
with the same name, thus making the model deterministic in some sense,
since each node will have only one child reachable by an edge with a specific
label. Let us briefly summarise the properties of our model so far.
Labelled edges. Edges in our model are labelled by the keys forming the
key-value pairs of objects. This means that we can directly follow the label
of edges when issuing JSON navigation instructions, and also means that
information about keys is represented in a different medium than JSON values
(labels for the former, nodes for the latter). This is in line with the way
JSON navigation instructions work, as one can only retrieve values of key-
value pairs, but not the keys themselves. To comply with the JSON standard,
we enforce that two edges leaving the same node cannot have the same label
(but edge labels may be repeated across the document as long as they leave
from different nodes).
Compositional structure. One of the advantages of our tree representation is
that each of its subtrees itself represents a JSON document. In fact, the
five possible subtrees of the tree above correspond to the five JSON values
present in the JSON document J.
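As a small check of this correspondence, the following Python sketch enumerates one JSON value per subtree of the document J from above. The function name json_values is hypothetical, and the document is assumed to be loaded with json.loads.

```python
import json

def json_values(doc):
    """Yield doc and every JSON value nested inside it, one per subtree."""
    yield doc
    if isinstance(doc, dict):
        children = doc.values()
    elif isinstance(doc, list):
        children = doc
    else:
        children = []           # strings and numbers are leaves
    for child in children:
        yield from json_values(child)

J = json.loads('{"name": {"first": "John", "last": "Doe"}, "age": 32}')
values = list(json_values(J))
print(len(values))
```

For the document J this yields the complete document, the object under "name", and the three literals.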
Atomic values. Finally, some elements of a JSON document are actual values,
such as numbers or strings. For this reason, the leaves of the tree that
correspond to numbers and strings will also be assigned a value they carry.
Nodes that are leaves and are not assigned a value represent empty objects:
that is, documents of the form {}.
Although this model is simple and conceptually clear, we are missing a
way of representing arrays. Indeed, consider again the document from Figure
1 (call this document J2 ). In J2 the value of the key "hobbies" is an array:
another feature explicitly present in the JSON standard that thus needs to
be reflected in our model.
As arrays are ordered, this might suggest that we can have some nodes
whose children form an ordered list of siblings, much like in the case of XML.
But this would not be conceptually correct, for the following two reasons.
First, as we have explained, JSON navigation instructions use random access
to access elements in arrays. For example, the navigation instruction used
to retrieve an element of an array is of the form J2[hobbies][i], aimed at
obtaining the i-th element of the array under the key "hobbies". But more
importantly, we do not want to treat arrays as a list because lists naturally
suggest that one can readily navigate from one element to its neighbours.
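[Figure 2: tree representation of the document J2. The root has outgoing edges labelled "name", "hobbies" and "age". The edge "age" leads to a leaf with value 32, the edge "name" leads to a node with outgoing edges "first" and "last", and the edge "hobbies" leads to a node with outgoing edges labelled 1 and 2.]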
On the contrary, in JSON the only way to obtain the next element in an
array is via the iteration of all its elements (or by randomly accessing this
element through its position). We choose to model JSON arrays as nodes
whose children are accessed by axes labelled with natural numbers reflecting
their position in the array. Namely, in the case of JSON document J2 above
we obtain the representation shown in Figure 2.
Having arrays defined in this way allows us still to treat the child edges of
our tree as navigational axes: before we used a key such as "age" to traverse
an edge, and now we use the number labelling the edge to traverse it and
arrive at the child.
Formal definition. As our model is a tree, we will use tree domains as its
base. A tree domain is a prefix-closed subset of $\mathbb{N}^*$, where each node of the
form $n \cdot i$, with $i \in \mathbb{N}$, represents a child node of $n$. Without loss of generality
we assume that for all tree domains $D$, if $D$ contains a node $n \cdot i$, for $n \in \mathbb{N}^*$,
then $D$ contains all $n \cdot j$ with $0 \le j < i$.
Let $\Sigma$ be an alphabet. A JSON tree over $\Sigma$ is a structure
$J = (D, Obj, Arr, Str, Num, A, O, val)$, where $D$ is a tree domain that is
partitioned by $Obj$, $Arr$, $Str$ and $Num$, $O \subseteq Obj \times \Sigma^* \times D$ is the object-child
relation, $A \subseteq Arr \times \mathbb{N} \times D$ is the array-child relation, $val : Str \cup Num \to \Sigma^* \cup \mathbb{N}$
is the string and number value function, and where the following holds:
1. For each node $n \in Obj$ and child $n \cdot i$ of $n$, $O$ contains exactly one triple
$(n, w, n \cdot i)$, for a word $w \in \Sigma^*$.
2. The first two components of $O$ form a key: if $(n, w, n \cdot i)$ and $(n, w, n \cdot j)$
are in $O$, then $i = j$.
3. For each node $n \in Arr$ and child $n \cdot i$ of $n$, with $i \in \mathbb{N}$, $A$ contains the
triple $(n, i, n \cdot i)$.
4. If $n$ is in $Str$ or $Num$ then $D$ does not contain nodes of the form $n \cdot u$.
5. The value function assigns to each string node in $Str$ a value in $\Sigma^*$ and
to each number node in $Num$ a natural number.
The usage of a tree domain is standard, and we have elected to explicitly
partition the domain into four types of nodes: Obj for objects, Arr for arrays,
Str for strings and Num for numbers. The first and the second condition
specify that edges between objects and their children are labelled with words,
but children are uniquely identified by the word on the edge from their parent
(i.e. the edge label acts as a key to identify a child). The third condition
specifies that the edges between arrays and their children are labelled with
the number representing the order of children. The fourth condition simply
states that strings and numbers must be leaves in our trees, and the fifth
condition describes the value function val.
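To make the definition concrete, here is a minimal Python sketch that builds the components of a JSON tree from a parsed document. It is an illustration, not part of the formal development: nodes are encoded as tuples of natural numbers (the root is the empty tuple), children are numbered from 0, and the function name build_tree and the dictionary kinds are hypothetical.

```python
def build_tree(doc):
    """Return (D, kinds, O, A, val) for a Python value obtained from JSON text."""
    D, O, A, val = set(), set(), set(), {}
    kinds = {"Obj": set(), "Arr": set(), "Str": set(), "Num": set()}

    def visit(value, node):
        D.add(node)
        if isinstance(value, dict):
            kinds["Obj"].add(node)
            for i, (key, child) in enumerate(value.items()):
                O.add((node, key, node + (i,)))   # edge labelled with the key
                visit(child, node + (i,))
        elif isinstance(value, list):
            kinds["Arr"].add(node)
            for i, child in enumerate(value):
                A.add((node, i, node + (i,)))     # edge labelled with the position
                visit(child, node + (i,))
        elif isinstance(value, str):
            kinds["Str"].add(node)
            val[node] = value
        elif isinstance(value, int) and not isinstance(value, bool) and value >= 0:
            kinds["Num"].add(node)
            val[node] = value
        else:
            raise ValueError("only objects, arrays, strings and natural numbers are modelled")

    visit(doc, ())   # the root is the empty sequence
    return D, kinds, O, A, val

J = {"name": {"first": "John", "last": "Doe"}, "age": 32}
D, kinds, O, A, val = build_tree(J)
print(sorted(D))
```

Since a Python dictionary cannot hold the same key twice, condition 2 holds by construction in this encoding.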
Throughout this paper we will use the terms JSON tree and JSON document
interchangeably; that is, when we are given a JSON document we will assume
that it is represented as a JSON tree. As already mentioned above, one
important feature of our model is that when looking at any node of the tree, the
subtree rooted at this node is again a valid JSON document. We can therefore define,
for a JSON tree J and a node n in J, a function json(n) which returns the
subtree of J rooted at n. Since this subtree is again a JSON tree, the value
of json(n) is always a valid JSON document.
2. JSON trees are deterministic. The property of JSON trees which im-
poses that all keys of each object have to be distinct makes JSON trees
deterministic in the sense that if we have the key name, there can be
at most one node reachable through an edge labelled with this key.
On the other hand, XML trees are nondeterministic since there are no
labels on the edges, and a node can have multiple children. As we will
see, the deterministic nature of JSON documents makes some problems
easier and others more difficult than in the XML setting.
3. Value is not just in the node, but is the entire subtree rooted at that
node. Another fundamental difference is that in XML when we talk
about values we normally refer to the value of an attribute in a node.
On the contrary, it is common for systems using JSON data to allow
comparisons of the full subtree of a node with a nested JSON document,
or even comparing two nodes themselves in terms of their subtrees. To
be fair, in XML one could also argue this to be true, but unlike in
the case of XML, these “structural” comparisons are intrinsic in most
JSON query languages, as we discuss in the following sections.2
On the other hand, it is certainly possible to code JSON documents us-
ing the XML data format. In fact, the model of ordered unranked trees with
labels and attributes, which serves as the base of XML, was shown to be
powerful enough to code some very expressive database formats, such as re-
lational and even graph data. However, both models have enough differences
to justify a study of the JSON standard on its own. This is particularly
evident when considering navigation through JSON documents, where keys
in each object have to be unique, thus allowing us to obtain values very ef-
ficiently. On the other hand, coding JSON data as XML data would require
us to have keys as node labels.
2 This is not to say that the techniques involved in studying problems arising from JSON
subtrees must be fundamentally different from those developed for XML or tree automata.
In fact, comparing subtrees has already been studied for automata in e.g. [49, 34, 16].
lines, about how documents are accessed. As a result the syntax and op-
erations between systems vary so much that it would be almost impossible
to compare them. Hence, it would be desirable to identify a common core
of functionalities shared between these systems, or at least a general picture
of what such query languages look like. Therefore we begin this section by
reviewing the most common operations available in current JSON systems.
Here we mainly focus on the subdocument selecting functionalities of
JSON query languages. By subdocument selecting we mean functionalities
that are capable of finding or highlighting specific parts within JSON docu-
ments, either to be returned immediately or to be combined as new JSON
documents. As our work is not intended to be a survey, we have not re-
viewed all possible systems available to date. However, we take inspiration
from MongoDB’s query language (which arguably has served as a basis for
many other systems as well, see e.g. [3, 42, 47]), as well as JSONPath [24]
and SQL++ [41], two other query languages that have been proposed by the
community.
Based on this, we propose a navigational logic that can serve as a common
core to define a standard way of querying JSON data. We then define several
extensions of this logic, such as allowing nondeterminism or recursion. To
justify our claim that our logic is a common core of the languages, we show
how our logic captures the navigational functionalities of the languages we
originally reviewed.
the w-th value of J, if J is an array and w is an integer, or
3 For a detailed study of other functionalities MongoDB offers see e.g. [10]. Note that
in [10] the authors do not consider the find function though.
Operator             Semantics (when evaluated over a document J)
N: $exists:true      true if J N is nonempty
N: $exists:false     true if J N is empty
N: $eq:D             true if J N is equal to D
N: $ne:D             true if J N is not equal to D
N: $in:A             true if J N is equal to one of the elements in array A
N: $nin:A            true if J N is different from all elements in array A
N: $type:<type>      true if J N is of type <type>
N: $all:A            true if J N is an array containing all elements in A
N: $size:s           true if J N is an array with s elements
N: $elemMatch:Q      true if J N is an array and one of its elements
                     matches a new query Q
Figure 3: Some comparison operators in Mongo. These comparisons are used together
with a navigation condition N , and here D is any JSON document and A is a JSON
array. We are omitting $gt, $gte, $lt and $lte (greater, greater-or-equal, lower and
lower-or-equal than), comparison of strings and regular expressions, operators for bitwise
comparison, geospatial operators and operators that deal with comments.
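To illustrate how a few of these comparisons behave, the following Python sketch evaluates them over a parsed document. It is not MongoDB's implementation. The names MISSING, navigate and holds are hypothetical, the navigation condition is assumed to be a list of keys and array positions, and the treatment of a missing value (only $exists:false holds) is an assumption of this sketch. The sample document and its "hobbies" values are also made up for the example.

```python
MISSING = object()   # hypothetical sentinel: the navigation condition selects nothing

def navigate(doc, path):
    """Follow a list of keys (str) and array positions (int) from the root of doc."""
    for step in path:
        if isinstance(doc, dict) and isinstance(step, str) and step in doc:
            doc = doc[step]
        elif isinstance(doc, list) and isinstance(step, int) and 0 <= step < len(doc):
            doc = doc[step]
        else:
            return MISSING
    return doc

def holds(doc, path, op, arg):
    """Evaluate one comparison; for a missing value only $exists:false holds (assumption)."""
    v = navigate(doc, path)
    if op == "$exists":
        return (v is not MISSING) == arg
    if v is MISSING:
        return False
    if op == "$eq":
        return v == arg                      # deep equality of the whole subtree
    if op == "$in":
        return any(v == a for a in arg)
    if op == "$size":
        return isinstance(v, list) and len(v) == arg
    if op == "$all":
        return isinstance(v, list) and all(a in v for a in arg)
    raise ValueError("operator not covered by this sketch: " + op)

J = {"name": {"first": "John", "last": "Doe"}, "age": 32, "hobbies": ["fishing", "yoga"]}
print(holds(J, ["name", "first"], "$eq", "John"))
```

Python's equality on dictionaries ignores key order, which agrees with the unordered nature of JSON objects.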
4 Strictly speaking, negation is not allowed in this fashion, but it can be simulated using
the allowed NOR operation.
JSONPath                  Intended meaning
$                         The root of the tree
@                         The current node being processed
*                         Any child
..                        Any descendant
.'key'                    Value of the key 'key'
['key_1',...,'key_k']     Value of any of the keys 'key_i'
[i]                       The ith element of an array
[i_1,...,i_k]             All elements i_j (1 ≤ j ≤ k) of the array
[i:j]                     Any element between position i and j
Table 1: Atomic navigation expressions in JSONPath. When selecting the ith element of
the array, a negative i means the ith element from the last (see Section 2.1). Similarly,
negative indices in the expression [i:j] mean traversing the array in reverse.
of the find function is projection, and is used to construct and return different
documents. We will show how to use our framework to formalise MongoDB
queries with projection later on, in Section 3.4.
Query languages inspired by XPath. The languages we analysed thus
far offer very simple navigational features. However, the need for more
complex features was also recognized, such as nondeterministic navigation,
expression filters and arbitrary-depth nesting through recursion.
As a result, an adaptation of the XML query language XPath to the
context of the JSON specification, called JSONPath [24, 1] was introduced
and implemented.
JSONPath uses expressions to traverse JSON documents in the same way
XPath works with XML trees. Expressions start with the context variable $,
signalling the outermost layer of the document, and navigate the tree repeat-
edly using one of the atomic navigational axes in Table 1. A navigational
JSONPath expression is simply a concatenation of atomic navigational ex-
pressions, and its semantics is equivalent to executing these expressions in
sequence.
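The sequential reading of navigational expressions can be sketched in Python as follows. This is an illustration only and not a JSONPath implementation. Expressions are given as lists of (axis, argument) pairs instead of being parsed, the function names are hypothetical, and the axis ".." is read here as descendant-or-self, which is an assumption made so that an expression such as $..key also finds keys of the root. The sample document is made up.

```python
def children(v):
    """All children of a value: object values or array elements."""
    if isinstance(v, dict):
        return list(v.values())
    if isinstance(v, list):
        return list(v)
    return []

def descendants_or_self(v):
    """The value itself followed by all values nested in it, in document order."""
    out = [v]
    for c in children(v):
        out.extend(descendants_or_self(c))
    return out

def step(values, axis, arg=None):
    """Apply one atomic axis to every value reached so far."""
    result = []
    for v in values:
        if axis == "*":
            result.extend(children(v))
        elif axis == "..":
            result.extend(descendants_or_self(v))
        elif axis == "key" and isinstance(v, dict) and arg in v:
            result.append(v[arg])
        elif axis == "index" and isinstance(v, list) and -len(v) <= arg < len(v):
            result.append(v[arg])
    return result

def evaluate(doc, steps):
    """Execute the atomic expressions in sequence, starting at the root ($)."""
    values = [doc]
    for axis, arg in steps:
        values = step(values, axis, arg)
    return values

doc = {"store": {"book": [{"title": "A", "price": 8}, {"title": "B", "price": 12}]}}
print(evaluate(doc, [("key", "store"), ("key", "book"), ("*", None), ("key", "title")]))
```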
Just as with XPath, JSONPath expressions allow for filtering paths, by
the usage of the expression ?(). Filters can be either paths or a comparison
between JSONPath expressions and/or values. They are constructed by using
a context variable @ to refer to the current node in the navigation. For
example, we can use $..book[?(@.isbn)] to obtain all objects with a key
book, and such that this object contains a key-value pair with isbn as key.
Or we can use $..book[?(@.price<10)] to specify that the book contains a
key-value pair of the form "price":n, with n a number less than 10. Finally,
context variables can be mixed inside filters, so for example the expression
$..[?(@.key2=$.key1)] searches for the nodes where it is true that the value
of the key key2 of the current node (retrieved using the operation @.key2)
equals the value of the key key1 of the root of our document (the operation
$.key1).
JSONPath does not offer an official semantics other than a considerable
number of examples, and thus we can only give an overview of the semantics
of this language. We view JSONPath expressions as queries that return a
number of nodes from a given document. Given a JSONPath expression α,
the evaluation α(J) over a document J would then be all nodes that can be
reached by navigating through the axes of Table 1, as if they were JSON
navigation instructions (one can understand these axes as a form of non-
deterministic navigation instruction). If a filter is encountered during the
navigation, then a check must be performed to make sure that the current
node satisfies this filter. In turn, a filter of the form ?( e ), for an expression
e, is satisfied if e returns at least one value. Likewise, a filter of the form ?(
e1 = e2 ) is satisfied if there are nodes n1 and n2 belonging to the evaluation
of e1 and e2, respectively, and n1 = n2 . Other comparisons are defined in
the same way. Note that JSONPath also offers a special built-in function
length such that @.length returns the length of the current node in case
that it is an array (and an error otherwise).
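The two kinds of filters can be sketched in Python as predicates over a root document and a current value. This is a sketch under assumptions: expressions are modelled as functions returning lists of values, all helper names are hypothetical, and equality is checked on the JSON values found at the two nodes, which is one possible reading of the comparison described above.

```python
def key_of_current(key):
    """Models @.key: the value under `key` in the current node, if any."""
    return lambda root, current: (
        [current[key]] if isinstance(current, dict) and key in current else [])

def key_of_root(key):
    """Models $.key: the value under `key` in the root of the document, if any."""
    return lambda root, current: (
        [root[key]] if isinstance(root, dict) and key in root else [])

def exists_filter(e):
    """?(e): satisfied if e returns at least one value."""
    return lambda root, current: len(e(root, current)) > 0

def equals_filter(e1, e2):
    """?(e1 = e2): satisfied if some value of e1 equals some value of e2."""
    return lambda root, current: any(
        a == b for a in e1(root, current) for b in e2(root, current))

def select(root, candidates, flt):
    """Keep the candidate values that satisfy the filter."""
    return [c for c in candidates if flt(root, c)]

root = {"key1": 5, "items": [{"key2": 5}, {"key2": 7}, {"isbn": "x"}]}
same = equals_filter(key_of_current("key2"), key_of_root("key1"))
print(select(root, root["items"], same))
```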
Query languages inspired by FLWR or relational expressions. There
are several proposals to construct query languages that can merge, join and
even produce new JSON documents. Most of them are inspired either by
XQuery (such as JSONiq [45]) or SQL (such as SQL++ [41]). These lan-
guages have of course a lot of intricate features, and to the best of our knowl-
edge have not been formally studied. However, in terms of JSON navigation
they all seem to support basic JSON navigation instructions and not much
more.
Based on these features, we first introduce a logic capturing basic queries
provided by navigation instructions and conditions, and then extend it with
non-determinism and recursion so that it captures more powerful querying
formalisms.
3.2. Deterministic JSON logic
The first logic we introduce is meant to capture JSON navigation instruc-
tions and other deterministic forms of querying such as most of MongoDB’s
find function conditions. We call this logic JSON navigation logic, or JNL for
short. We believe that this logic, although not very powerful, is interesting in
its own right, as it leads to very lightweight algorithms and implementations,
which is one of the aims of the JSON data format.
As often done in XML [21] and graph data [32], we define our logic in
terms of unary and binary formulas.
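Definition 1 (JSON navigational logic). Unary formulas $\varphi, \psi$ and binary
formulas $\alpha, \beta$ of the JSON navigational logic (JNL for short) are
expressions satisfying the grammar
$$\alpha, \beta := \langle \varphi \rangle \mid X_w \mid X_i \mid \alpha \circ \beta \mid \varepsilon \mid r$$
$$\varphi, \psi := \top \mid \neg\varphi \mid \varphi \wedge \psi \mid \varphi \vee \psi \mid [\alpha] \mid EQ(\alpha, A) \mid EQ(\alpha, \beta)$$
where $w$ is a word in $\Sigma^*$, $i$ is a natural number and $A$ is an arbitrary JSON
document.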
Intuitively, binary operators allow us to move through the document (they
connect two nodes of a JSON tree), and unary formulas check whether a
property is true at some node of our tree. For instance, $X_w$ and $X_i$ allow basic
navigation by accessing the value of the key named $w$, or the $i$th element of an
array respectively, and $r$ is used to access the root of a document. They can
subsequently be combined using composition or boolean operations to form
more complex navigation expressions. Unary formulas serve as tests if some
property holds at the part of the document we are currently reading. These
also include the operator $[\alpha]$ allowing us to test if some binary condition is
true starting at a current node (similarly, $\langle \varphi \rangle$ allows us to combine node tests
with navigation). Finally, the comparison operators $EQ(\alpha, A)$ and $EQ(\alpha, \beta)$
simulate XPath style tests which check whether a current node can reach
a node whose value is $A$, or if two paths can reach nodes with the same
value. The difference with XML though, is that this value is again a JSON
document and thus a subtree of the original tree.
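The semantics of binary formulas is given by the relation $[\![\alpha]\!]_J$, for a
binary formula $\alpha$ and a document $J$, and it selects pairs of nodes of $J$:
$[\![\langle \varphi \rangle]\!]_J = \{(n, n) \mid n \in [\![\varphi]\!]_J\}$.
$[\![X_w]\!]_J = \{(n, n') \mid (n, w, n') \in O\}$.
$[\![X_i]\!]_J = \{(n, n') \mid (n, i, n') \in A\}$, for $i \in \mathbb{N}$.
$[\![\alpha \circ \beta]\!]_J = [\![\alpha]\!]_J \circ [\![\beta]\!]_J$.
$[\![\varepsilon]\!]_J = \{(n, n) \mid n \text{ is a node in } J\}$.
$[\![r]\!]_J = \{(n, n) \mid n \text{ is the root of } J\}$.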
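The clauses for binary formulas can be read as operations on sets of node pairs. The Python sketch below follows them literally over a small hand-built tree. It is a naive illustration, not an efficient evaluation algorithm: the tuple encoding of formulas, the node names and the function names are all hypothetical, and a node test is assumed to be given directly as the set of nodes satisfying its unary formula.

```python
def compose(r1, r2):
    """Relational composition: pairs (a, c) with (a, b) in r1 and (b, c) in r2."""
    return {(a, c) for (a, b) in r1 for (b2, c) in r2 if b == b2}

def eval_binary(alpha, tree):
    """Evaluate a binary formula, encoded as a tuple, to a set of node pairs."""
    kind = alpha[0]
    if kind == "X":        # X_w for a string label, X_i for an integer label
        label = alpha[1]
        rel = tree["A"] if isinstance(label, int) else tree["O"]
        return {(n, m) for (n, l, m) in rel if l == label}
    if kind == "comp":     # composition of two binary formulas
        return compose(eval_binary(alpha[1], tree), eval_binary(alpha[2], tree))
    if kind == "eps":      # epsilon: every node paired with itself
        return {(n, n) for n in tree["D"]}
    if kind == "root":     # r: the root paired with itself, as in the clause above
        return {(n, n) for n in tree["D"] if n == tree["root"]}
    if kind == "test":     # node test; alpha[1] is the precomputed set of satisfying nodes
        return {(n, n) for n in alpha[1]}
    raise ValueError("unknown binary formula: " + str(kind))

# Tree of {"name": {"first": "John", "last": "Doe"}, "age": 32} with made-up node names.
tree = {
    "D": {"r", "n", "a", "f", "l"},
    "root": "r",
    "O": {("r", "name", "n"), ("r", "age", "a"), ("n", "first", "f"), ("n", "last", "l")},
    "A": set(),
}
name_first = ("comp", ("X", "name"), ("X", "first"))
print(eval_binary(name_first, tree))
```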
For the semantics of the unary operators, let us assume that $D$ is the domain
of $J$.
$[\![\top]\!]_J = D$.
$[\![\neg\varphi]\!]_J = D \setminus [\![\varphi]\!]_J$.
or equivalently,
Typically, most systems allow jumping to the last element of an array,
or the j-th element counting from the last to the first. To simulate this we
can allow binary expressions of the form $X_i$, for an integer $i < 0$, where $-1$
states the last position of the array, and $-j$ states the j-th position starting
from the last to the first. Having this dual operator would not change any
of our results, but we prefer to leave it out for the sake of readability.
JNL and JSON navigation instructions. As promised, we can easily
encode JSON navigation instructions with JNL: we iteratively replace every
entry of the form $J[\mathit{key}]$ with $X_{\mathit{key}}$ and $J[n]$ with $X_n$.
3.3. Extensions
Although the base proposal for JNL captures the deterministic spirit of
JSON navigation, it is somewhat limited in expressive power. First of all, it
fails to express one of the basic query primitives we desire from a JSON
language: iteration through JSON values (see Subsection 2.1). Second, JNL as
defined in Section 3.2 can capture neither MongoDB’s find function nor
the base proposal of JSONPath [24], as it cannot traverse arbitrary paths.
Here we propose two natural extensions: the ability to non-deterministically
select which child of a node is selected, and the ability to traverse paths of ar-
bitrary length. In the following subsection we show how these extensions add
enough expressive power to capture the most popular JSON query languages
in existence today.
Non-determinism. The path operators $X_w$ and $X_i$ can be easily extended
so that they return more than a single child; namely, we can permit matching
of regular expressions and intervals, instead of simple words and array
positions. Similarly, we can add disjunction to binary formulas to allow them
to choose which path they traverse.
Formally, Non-deterministic JSON logic, or NJNL, extends binary formulas
of JNL by the following grammar:
$\alpha, \beta := \langle\varphi\rangle \mid X_e \mid X_{i:j} \mid \alpha \circ \beta \mid \alpha \cup \beta \mid \varepsilon$
where $e$ is a subset of $\Sigma^*$ (given as a regular expression), and
$i \le j$ are natural numbers, or $j = +\infty$ (signifying that we want any
element of the array following $i$). The semantics of the new path operators
is as follows:
Notice that NJNL can easily express iteration through JSON values: to
iterate over the values of some object we use the expression $X_{\Sigma^*}$,
and to iterate through the elements of an array, we simply use the expression
$X_{1:+\infty}$. As when defining JNL, here we assume that the indices in
$X_{i:j}$ are positive. It is straightforward to extend their semantics in
order to allow iterating backwards, as discussed previously (see Section 2.1).
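The two non-deterministic steps can be illustrated with a minimal Python
sketch. It assumes that JSON values are held as plain Python dicts and lists,
that array positions are 1-based (as suggested by the expression
$X_{1:+\infty}$), and that $e$ is written as a Python regular expression. The
names step_key and step_range are hypothetical and are not taken from any
existing JSON library.

```python
import re
from typing import Any, Iterator, Optional


def step_key(node: Any, pattern: str) -> Iterator[Any]:
    """X_e: yield every child of an object whose key fully matches `pattern`."""
    if isinstance(node, dict):
        for key, child in node.items():
            if re.fullmatch(pattern, key):
                yield child


def step_range(node: Any, i: int, j: Optional[int] = None) -> Iterator[Any]:
    """X_{i:j}: yield array elements at positions i..j (1-based, inclusive).

    j=None plays the role of +infinity (assumption of this sketch).
    """
    if isinstance(node, list):
        upper = len(node) if j is None else min(j, len(node))
        for pos in range(max(i, 1), upper + 1):
            yield node[pos - 1]


doc = {"name": {"first": "John", "last": "Doe"}, "hobbies": ["fishing", "yoga"]}
values_of_name = list(step_key(doc["name"], ".*"))   # iterate over an object
all_hobbies = list(step_range(doc["hobbies"], 1))    # iterate over an array
```

Both functions return nothing when applied to a node of the wrong kind, which
mirrors the fact that object edges and array edges never mix in a JSON tree.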
Recursion. While NJNL is enough to capture Mongo’s find function, we
are missing one more ingredient to be able to fully capture JSONPath: the
ability to traverse down to any descendant of a node, not just to its children.
We go a bit further, and allow exploring not just the descendant query, but
any path of arbitrary length that can be described by subsequently applying
any formula an arbitrary number of times. We capture this by adding a
Kleene star to our logic. Recursive JNL, or RJNL, allows $(\alpha)^*$ as a binary
formula (as usual we normally omit the brackets when the precedence of
operators is clear). The semantics of $(\alpha)^*$ is given by
of the tree (and its neighbours). For example, MongoDB includes primitives
to see that a certain array contains n elements, or that a node is of a given
type (string, object, array, etc.); other proposals include testing whether all
the elements of an array are different [30], testing that a certain property is
present in an object, etc. Since our goal is to understand the navigational
capabilities, we do not cover these node tests in full detail. However, adding
some of them to our logic is straightforward: we simply create more unary
predicates that are to be tested in a given node, just as we did when including
the equality predicate EQ.
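Condition N : C             Equivalent NJNL expression $\psi_{N:C}$
N: {$exists: true}          $[\varphi_N]$
N: {$exists: false}         $\neg[\varphi_N]$
N: {$eq: D}                 $\mathrm{EQ}(\varphi_N, D)$
N: {$ne: D}                 $\neg\mathrm{EQ}(\varphi_N, D)$
N: {$in: A}                 $\mathrm{EQ}(\varphi_N, a_1) \vee \cdots \vee \mathrm{EQ}(\varphi_N, a_n)$
N: {$nin: A}                $\neg\mathrm{EQ}(\varphi_N, a_1) \wedge \cdots \wedge \neg\mathrm{EQ}(\varphi_N, a_n)$
N: {$all: A}                $\mathrm{EQ}(\varphi_N \circ X_{1:+\infty}, a_1) \wedge \cdots \wedge \mathrm{EQ}(\varphi_N \circ X_{1:+\infty}, a_n)$
N: {$size: s}               $[\varphi_N \circ X_s] \wedge \neg[\varphi_N \circ X_{s+1}]$
N: {$elemMatch: Q}          $[\varphi_N \circ X_{1:+\infty} \circ \langle\varphi_Q\rangle]$
Table 2: Encoding a navigation condition of the form N : C, where N is a JSON
navigation instruction and C a condition, with an NJNL expression. Here
$\varphi_N$ is the JNL expression obtained from N in Proposition 1. Note that
all operators except for $all and $elemMatch require only deterministic
navigation.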
JSON tree $J$ with root $r$, the result of evaluating N : C over $J$ returns
true if and only if the root $r$ of $J$ is in $\llbracket\psi_{N:C}\rrbracket_J$.
obtaining a single node n from J. We then prune from J everything which
is not an ancestor or a descendant of n. The final projection combines all
the structures obtained for each of the <path>:1 pairs, to form the desired
query answer.
Thus, if we want to retrieve only the first name in our John Doe document,
plus his age, we would write
db.collection.find({},{"name.first":1, age:1}),
and obtain the document
{
  "name": {
    "first": "John"
  },
  "age": 32
}
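The pruning behind this example can be sketched in Python. This is only an
illustration under simplifying assumptions: documents are Python dicts, paths
are dotted sequences of object keys (no array positions, so the relabelling
of array children is not needed), and system fields that MongoDB may add to
its answers are ignored. The function name project is hypothetical.

```python
import copy
from typing import Any, Dict, List


def project(doc: Dict[str, Any], paths: List[str]) -> Dict[str, Any]:
    """Keep, for each path, the keys leading to the selected node plus its subtree."""
    result: Dict[str, Any] = {}
    for path in paths:
        keys = path.split(".")
        node: Any = doc
        for key in keys:
            if not isinstance(node, dict) or key not in node:
                break  # the path selects no node: it contributes nothing
            node = node[key]
        else:
            # The path resolved to `node`: rebuild the spine, then copy the subtree.
            dst = result
            for key in keys[:-1]:
                dst = dst.setdefault(key, {})
            dst[keys[-1]] = copy.deepcopy(node)
    return result


john = {"name": {"first": "John", "last": "Doe"}, "age": 32,
        "hobbies": ["fishing", "yoga"]}
answer = project(john, ["name.first", "age"])
```

Combining the per-path results inside a single dictionary plays the role of
the union of structures used in the formal definition of projection.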
We define the result of the expression <path>:1 over a JSON tree $J$,
denoted $J_{path}$, formally as follows. First, let $\varphi_{path}$ be the
JNL formula obtained from the navigational expression path in Proposition 1.
If $(r, n)$ is not in $\llbracket\varphi_{path}\rrbracket_J$ for any node $n$,
then $J_{path}$ is the empty data structure. Otherwise, let $n$ be the unique
node such that $(r, n) \in \llbracket\varphi_{path}\rrbracket_J$. Let
$D_{path}$ be the set containing all the nodes from $J$ that either (i) lie on
the unique path from the root of $J$ to $n$, or (ii) are in the subtree
$\mathit{json}(n)$. The expression <path>:1, when evaluated over the JSON tree
$J = (D, Obj, Arr, Str, Num, A, O, val)$, produces a structure $J_{path}$,
which is the substructure of $J$ with the domain $D_{path}$ (that is, the
elements from $Obj$ in this structure are $Obj \cap D_{path}$, and similarly
for the other components of $J$). Notice that, while $J_{path}$ has a tree
structure, it is not necessarily a JSON tree (e.g. when the expression path
selects an element of an array different from the initial element, $D_{path}$
is not even a tree domain).
To define the semantics of the projection operator, we have to show
how to combine the results of evaluating multiple paths. More precisely,
if the projection operation consists of pairs <path_1>:1, <path_2>:1,
..., <path_n>:1, then denote by $\{J_1, \ldots, J_n\}$ the set
$\{J_{path_1}, \ldots, J_{path_n}\}$ obtained by evaluating <path_1>:1,
<path_2>:1, ..., <path_n>:1 over a document $J$ as described above.
Furthermore, denote each data structure $J_i$ as
$J_i = (D_i, Obj_i, Arr_i, Str_i, Num_i, A_i, O_i, val_i)$. Let $J^{\cup}$ be
the tuple
$$\Big(\bigcup_i D_i,\ \bigcup_i Obj_i,\ \bigcup_i Arr_i,\ \bigcup_i Str_i,\ \bigcup_i Num_i,\ \bigcup_i A_i,\ \bigcup_i O_i,\ \bigcup_i val_i\Big),$$
where we abuse notation by treating each function $val_i$ as the relation
containing all pairs $(x, val_i(x))$ (so that it can be unified like the rest
of the components). Since all the expressions are evaluated over the same
document $J$, the structure $J^{\cup}$ is a tree, in the sense that it has
exactly one root and the children of each of its nodes are given by one of the
relations $\bigcup_i Obj_i$ or $\bigcup_i Arr_i$.
However, $J^{\cup}$ is not a JSON document yet: the set $\bigcup_i D_i$ is not
necessarily a tree domain, and because of this the array-child relation
$\bigcup_i A_i$ does not satisfy the condition of our definition, as one may
find e.g. a node $n$ with a child $n \cdot 2$ but no child $n \cdot 0$ nor
$n \cdot 1$. However, we can relabel the nodes in the domain and update all
relations accordingly. That is, let $D^*$ be the (unique) tree domain such
that the tree represented by $D^*$ is isomorphic to the tree represented by
$\bigcup_i D_i$, and let $h$ be the isomorphism from $\bigcup_i D_i$ to $D^*$.
Now define $Obj^*$ as, informally, the union $\bigcup_i h(Obj_i)$: the set
that contains a node $h(n)$ for each node $n$ in any of the $Obj_i$'s. Define
$Arr^*$, $Str^*$ and $Num^*$ in the same way. Furthermore, let $A^*$ map each
node $h(n)$ to any of the children included in any of the relations $A_i$: a
set that contains the triple $(h(n), i, h(n) \cdot i)$ for each triple
$(n, j, n')$ in any of the $A_i$'s, assuming $h(n') = h(n) \cdot i$. Likewise,
define $O^*$ as the set that contains a triple $(h(n), w, h(n'))$ for each
triple $(n, w, n')$ in any of the $O_i$'s. Finally, let $val^*$ be the
function that assigns to each node $n$ the value $val_i(h^{-1}(n))$, if there
exists an $1 \le i \le n$ for which $val_i(h^{-1}(n))$ is defined, and is
undefined otherwise.
The resulting JSON document is then defined as the tree
JSONPath                 Intended meaning                        RNJNL equivalent
$                        The root of the tree                    $r$
@                        The current node being processed        $\varepsilon$
*                        Any child                               $X_{\Sigma^*} \cup X_{1:+\infty}$
..                       Any descendant                          $(X_{\Sigma^*} \cup X_{1:+\infty})^*$
.'key'                   Value of the key 'key'                  $X_{key}$
['key_1',...,'key_k']    Value of one of the keys 'key_i'        $X_{key_1} \cup \ldots \cup X_{key_k}$
[i]                      The ith element of an array             $X_i$
[i_1,...,i_k]            The element i_j of the array            $X_{i_1} \cup \ldots \cup X_{i_k}$
[i:j]                    Any element between position i and j    $X_{i:j}$
It is easy to see that the semantics is well defined in both cases, and
that one can compute the JSON document resulting from these projections in
Ptime. However, one naturally wonders whether one could extend these
instructions to, for example, allow for a combination of pairs <path>:1 and
<path>:0, or allow a finer interplay between filtering and projecting, to
issue instructions such as remove all nodes where the age is less than 18. We
believe this is an interesting ground for future work, as there are also fun-
damental questions regarding the expressive power of these transformations,
and the possible interactions with schema definitions.
As we mentioned, the only axis for which we need recursion is the
descendant axis (..). As far as equality tests are concerned, the most common
use of filters in JSONPath, which test if the current node, or some of its
descendants, equals either a fixed value or another descendant of the current
node, is precisely the same as the semantics of the EQ operation in JNL, so
one can easily obtain a similar result for JSONPath formulas that include
equality comparisons. To support more involved tests, and the use of the
in-built function .length, we would have to extend RNJNL with these
capabilities explicitly, as is usually the case.
[Footnote 5] Here we assume that JNL expressions $X_i$ and $X_{i:j}$ can use
negative indices, and extend their semantics accordingly.
Proof: For this proof we will assume that the JSON tree is stored through
a series of pointers (i.e. in a way that the tree data structure is usually
implemented). Each pointer in this representation will additionally carry its
label (i.e. the key), so that we can directly access a child of a node using a
particular label. For instance, we can ask for the name child of the root of the
JSON tree from Figure 2 to obtain the node to which the pointer labelled
name points. Note that we can easily implement this in such a way that
asking for a child with a specific key has cost $O(1)$, assuming that the key
itself is treated as a single symbol. The latter is a reasonable assumption,
since in practice the size of the JSON document is almost never dominated
by the size of the keys that are used. Alternatively, we could simply put
a fixed upper bound on the number of symbols used in each key for this
assumption to hold true.
We will furthermore assume that the JNL query is given by its parse tree,
which is again a reasonable assumption, as database systems usually store
the queries in this form internally.
Next, we make three observations that will be key to obtaining the evaluation
algorithm. First, note that it is sufficient to solve the evaluation problem
for the root of a JSON tree. Indeed, testing if
$n \in \llbracket\varphi\rrbracket_J$ for some formula $\varphi$ is the same as
testing if this holds at the root of the tree $\mathit{json}(n)$, i.e. the
subtree of $J$ rooted at $n$. Second, given a binary formula $\alpha$ and a
node $n$ in a JSON tree $J$, there can exist at most one node $n'$ in $J$ such
that $(n, n') \in \llbracket\alpha\rrbracket_J$. The latter follows from the
fact that path formulas in JNL are simply concatenations of symbols (or unary
tests which hold at one precise node), and that a JSON tree is deterministic,
i.e. it can have at most one path with a pre-defined labelling of the edges.
Third, we will heavily rely on the fact that each unary JNL formula is a
boolean combination of the operators $\top$, $[\alpha]$,
$\mathrm{EQ}(\alpha, \beta)$ and $\mathrm{EQ}(\alpha, A)$.
To make the presentation easier to follow, we will present the algorithm for
JNL fragments of increasing complexity, starting with the simplest fragment
not allowing the EQ operator, nor the use of $\langle\cdot\rangle$ inside the
operator $[\alpha]$; that is, we do not allow the nesting of unary and binary
formulas. The syntax of this fragment is given below:
$\alpha, \beta := X_w \mid X_i \mid \alpha \circ \beta \mid \varepsilon$
$\varphi, \psi := \top \mid \neg\varphi \mid \varphi \wedge \psi \mid \varphi \vee \psi \mid [\alpha]$
The key observation here is that our formula is a boolean combination of
the operators $\top$ and $[\alpha]$, where $\alpha$ is a simple concatenation
of key names or array positions (and $\varepsilon$). We can therefore view it
as a propositional formula where all the variables are different, and each
operator $[\alpha]$ or $\top$ corresponds to one variable. What then remains
is to evaluate each $[\alpha]$, seeing whether it holds true at the root of
the tree $J$, and using this information evaluate the propositional formula.
Since the tree $J$ is deterministic, and $\alpha$ is simply a concatenation of
key names, array positions and $\varepsilon$, checking whether $[\alpha]$ is
true can clearly be done in time $O(|\alpha|)$ by following the appropriate
pointers from the root of $J$. Therefore, the total time needed to evaluate
$\varphi$ corresponds to the sum of $O(|\alpha|)$, for each operator
$[\alpha]$ in $\varphi$, plus the time needed to evaluate the propositional
formula, which can be done in time $O(|\varphi|)$, thus giving the total time
bounded by $O(2 \cdot |\varphi|) = O(|\varphi|)$.
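The procedure for this simplest fragment can be sketched as follows. The
sketch assumes that JSON trees are Python dicts and lists, that array
positions are 1-based, and that formulas are given as nested tuples standing
in for the parse tree; the tuple tags ("top", "not", "and", "or", "path") and
the function names are hypothetical conventions chosen for this illustration.

```python
from typing import Any, List, Tuple, Union

Step = Union[str, int]  # a key name (X_w) or an array position (X_i)


def follow(node: Any, steps: List[Step]) -> Tuple[bool, Any]:
    """Follow a deterministic path; return (found, target node)."""
    for step in steps:
        if isinstance(step, str):
            if not isinstance(node, dict) or step not in node:
                return False, None
            node = node[step]
        else:
            if not isinstance(node, list) or not 1 <= step <= len(node):
                return False, None
            node = node[step - 1]
    return True, node


def holds(formula: tuple, root: Any) -> bool:
    """Evaluate a unary formula of the simplest fragment at the root."""
    op = formula[0]
    if op == "top":
        return True
    if op == "not":
        return not holds(formula[1], root)
    if op == "and":
        return holds(formula[1], root) and holds(formula[2], root)
    if op == "or":
        return holds(formula[1], root) or holds(formula[2], root)
    if op == "path":  # the operator [alpha], alpha a list of steps
        return follow(root, formula[1])[0]
    raise ValueError(f"unknown operator: {op}")


doc = {"name": {"first": "John", "last": "Doe"}, "age": 32,
       "hobbies": ["fishing", "yoga"]}
phi = ("and", ("path", ["name", "first"]), ("not", ("path", ["hobbies", 3])))
result = holds(phi, doc)
```

Each [alpha] is resolved with one dictionary or list lookup per step, so, with
constant-time lookups, the work is proportional to the size of the formula.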
Next we show how this algorithm can be extended when we allow nesting
of subformulas; namely, if we allow the operator $\langle\psi\rangle$ inside
binary formulas. In this case, we can proceed similarly as when nesting is not
present. More precisely, each binary subformula $\langle\psi\rangle$ occurring
in $\varphi$ is again a boolean combination of $\top$ and $[\alpha]$, with the
difference that $\alpha$ can again contain $\langle\psi'\rangle$, with $\psi'$
a unary formula, and so on recursively. In this case, we can use the same
algorithm as when nesting is not present, but with potential recursive calls
to itself. That is, our formula $\varphi$ is going to be a boolean combination
of $\top$'s and $[\alpha]$'s, so we proceed by treating it again as a
propositional formula where we need to find the value of the $[\alpha]$'s.
When processing each $[\alpha]$ we rely on the fact that $\alpha$ is a
concatenation of key names, array positions, $\varepsilon$, and
$\langle\psi\rangle$, for $\psi$ a unary formula. We therefore process
$\alpha$ by remembering where we currently are in the document $J$ (i.e. we
remember the unique node where each part of the concatenation in $\alpha$
starts), and if we encounter $\langle\psi\rangle$ in $\alpha$, we evaluate
$\psi$ recursively at this node using the same procedure, until its value can
be computed because it contains no further nesting (i.e. we are treating a
formula with no $\langle\psi'\rangle$ inside its binary subformulas). It can
be shown by an easy induction on the nesting depth that this indeed works in
time $O(|\varphi|)$. The base case is when $\varphi$ contains no nesting, so
the result follows from the algorithm above. If we assume that the claim holds
for each formula where the nesting depth is $n$, then a formula of nesting
depth $n + 1$ can easily be evaluated, since each nested subformula
$\langle\psi\rangle$ can be evaluated in time $O(|\psi|)$ by the induction
hypothesis, so our evaluation algorithm, which simply treats $\varphi$ as a
propositional formula whose truth value depends on the value of (top level)
operators $[\alpha]$, can process each $\alpha$ as before (i.e. by following
the required concatenation of symbols inside the tree), while each of its
$\langle\psi\rangle$ subformulas is evaluated in time $O(|\psi|)$, giving the
total time of $O(|\varphi|)$.
Finally, we show how to treat the equality operators $\mathrm{EQ}(\alpha, \beta)$
and $\mathrm{EQ}(\alpha, A)$. For this, observe that checking whether two JSON
trees $J_1$ and $J_2$ are equal can be done in time $O(\max\{|J_1|, |J_2|\})$
by traversing the two trees in parallel (following pointers from the root).
Now, to evaluate $\mathrm{EQ}(\alpha, A)$ inside a formula, we start at some
node $n$ of the tree $J$. We can then compute the unique node $n'$ such that
$(n, n') \in \llbracket\alpha\rrbracket_J$ as in the case when no equality
tests are used (or recursively as above if $\alpha$ contains equality tests).
Having this $n'$, we can then compare the subtree of $J$ rooted at $n'$ with
$A$ in time $O(|A|)$ as described above, so it is bounded by the size of the
subformula. If the two trees are equal we mark the node corresponding to
$\mathrm{EQ}(\alpha, A)$ in the parse tree of $\varphi$ with true, and if they
are not equal, or $n'$ does not exist, with false. Notice that we still
maintain the $O(|\varphi|)$ bound when processing these types of queries.
Handling $\mathrm{EQ}(\alpha, \beta)$ cannot be done in a similar manner,
because it would involve checking for equality of two JSON subtrees, which can
take time comparable to the size of those trees, an arbitrary number of times.
To reduce the complexity, if any $\mathrm{EQ}(\alpha, \beta)$ operator is
present then we first preprocess the tree, assigning colors to nodes such that
$n$ and $n'$ are assigned the same color if the subtrees starting from these
nodes are the same (in other words, if $\mathit{json}(n) = \mathit{json}(n')$).
Notice that this preprocessing phase only needs $|J|$ colors, and we can do it
in linear time in the size of $J$ by first processing all leaves of $J$, then
all nodes one level above, and so on: checking for a level only needs to take
into account the children of the nodes at this level, and their respective
colors. Now, to evaluate $\mathrm{EQ}(\alpha, \beta)$, we simply need to check
whether the node retrieved by $\alpha$ is colored with the same color as the
node retrieved by $\beta$, which can be done in constant time. This gives us
a total time in Op|J | |ϕ|q, as we first need to do the linear preprocessing
and then proceed with the evaluation of $\varphi$. $\Box$
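The coloring step can be illustrated with a short Python sketch. It is a
variation of the level-by-level procedure described in the proof: instead of
processing levels explicitly, it uses a post-order traversal, which also
guarantees that the colors of all children are known before their parent is
colored. Nodes are identified by their path from the root, and the names
color_subtrees, table and colors are hypothetical. The recursion assumes
documents of moderate depth.

```python
from typing import Any, Dict, Tuple

Path = Tuple[Any, ...]


def color_subtrees(root: Any) -> Dict[Path, int]:
    """Map each node (given by its path from the root) to a color such that
    two nodes get the same color exactly when their subtrees are equal."""
    table: Dict[tuple, int] = {}   # signature of a subtree -> color
    colors: Dict[Path, int] = {}   # path of a node -> color

    def visit(node: Any, path: Path) -> int:
        if isinstance(node, dict):
            # Keys are unordered, so sort the (key, child color) pairs.
            sig = ("obj", tuple(sorted(
                (k, visit(v, path + (k,))) for k, v in node.items())))
        elif isinstance(node, list):
            sig = ("arr", tuple(
                visit(v, path + (i,)) for i, v in enumerate(node, start=1)))
        else:
            sig = (type(node).__name__, node)  # atomic value
        color = table.setdefault(sig, len(table))
        colors[path] = color
        return color

    visit(root, ())
    return colors


doc = {"a": {"x": [1, 2]}, "b": {"x": [1, 2]}, "c": {"x": [2, 1]}}
colors = color_subtrees(doc)
same = colors[("a",)] == colors[("b",)]
```

After this preprocessing, comparing two subtrees reduces to comparing two
integers, which is what makes the equality test constant time.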
Next, we move to satisfiability. Here we can easily get NP-hardness from
the fact that JNL can emulate propositional formulas. However, we show
that this lower bound holds even for the positive fragment of JNL. It might
be surprising that the positive fragment is not always trivially satisfiable, but
this holds due to the fact that each key in an object is unique, so a formula
of the form
$[X_a \circ \langle[X_1]\rangle] \wedge [X_a \circ \langle[X_b]\rangle]$
is unsatisfiable because it forces the value of the key $a$ to be both an
array and an object at the same time. We also show a matching NP upper bound,
which is again not direct due to the presence of numbers in our logic.
Proof: Membership. This is a standard guess and check algorithm for NP.
More precisely, if ϕ is a satisfiable formula, the document satisfying ϕ can
have height at most |ϕ|, and its width at each level is at most the number
of operators appearing at this level in the formula. A small complication
is caused by having array positions written in binary, so constructing this
many array elements might be exponential in the length of the input number.
However, not all the array elements need to be materialized, as we can simply
sort the required numbers at each level of our formula and start enumerating
them from 1 again. This gives us a polynomial size witness for the formula,
since we only need to materialize the array elements that are mentioned in the
formula itself. It is straightforward to see that a formula where the numbers
are reassigned in this way is satisfiable if and only if the original formula is
satisfiable. We can now simply guess a witness of polynomial size and using
Proposition 4 check whether it satisfies the formula.
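The renumbering of array positions used in the membership argument can be
sketched as follows. The helper name compress_positions is hypothetical; it
takes the positions mentioned at one level of the formula and maps them to
consecutive positions starting from 1 while preserving their relative order.

```python
from typing import Dict, Iterable


def compress_positions(positions: Iterable[int]) -> Dict[int, int]:
    """Map the array positions mentioned in a formula to 1, 2, ..., k,
    keeping their relative order (duplicates collapse to one position)."""
    ranked = sorted(set(positions))
    return {pos: rank for rank, pos in enumerate(ranked, start=1)}


mapping = compress_positions([1000000, 7, 7, 42])
```

With this mapping a witness only needs as many array elements as there are
distinct positions in the formula, regardless of how large the numbers are.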
Hardness. The reduction is from 3SAT. Let $\varphi$ be a propositional
formula in 3CNF using variables $p_1, \ldots, p_n$ and clauses
$C_1, \ldots, C_m$. For each $p_i$ we define the formula
$\theta_{p_i} = (X_{p_i} \circ \langle X_1\rangle) \vee (X_{p_i} \circ \langle X_w\rangle)$,
with $w$ a fresh string, with the intention of allowing, as models, all
valuations of each of the $p_i$'s: if the value under key $p_i$ is an array,
then we interpret this as $p_i$ being assigned the value true, and if it is an
object we interpret this as $p_i$ being assigned the value false. Moreover,
for each clause $C_j$ that uses variables $a$, $b$ and $c$, define
$\gamma_{C_j}$ as $X_a \circ S_a \vee X_b \circ S_b \vee X_c \circ S_c$, where
each of $S_a$, $S_b$ and $S_c$ is $\langle X_1\rangle$ if $a$ (respectively,
$b$ or $c$) appears positively in $C_j$, and $\langle X_w\rangle$ otherwise.
Recall that all edges leaving array nodes are labelled with natural
numbers, and all edges leaving object nodes are labelled with strings.
This means that for any document $J$ it must be the case that
$\llbracket\langle X_1\rangle\rrbracket_J$ and
$\llbracket\langle X_a\rangle\rrbracket_J$ are always disjoint. Moreover,
recall that an object cannot have two pairs with identical keys, so a node in
a JSON tree cannot have two children under an edge with the same label. With
these two remarks it is then immediate to see that $\varphi$ is satisfiable if
and only if the following JNL expression is satisfiable:
$$\bigwedge_{1 \le i \le n} [\theta_{p_i}] \ \wedge \bigwedge_{1 \le \ell \le m} [\gamma_{C_\ell}]$$
Proof: When our formula does not use the $\mathrm{EQ}(\alpha, \beta)$ operator,
we can reuse the classical model checking algorithm from PDL [2, 15] that runs
in time $O(|J| \cdot |F|)$, since our logic is a syntactic variant of PDL and
JSON trees can be viewed as a generalisation of Kripke structures. Some small
changes are needed though in order to accommodate the specifics of the JSON
format and of our syntax. First, arrays can be treated as usual nodes, with
the edges accessing them being labelled by numbers. Second, for the formula
$X_e$, where $e$ is a regular expression, JNL traverses a single edge (i.e.
the regular expression is not applied as the Kleene star over the formulas).
To accommodate this, we can mark each edge of our tree with an expression $e$
such that the label of the edge belongs to the language of $e$. Since checking
membership of a label $l$ in the language of the expression $e$ can be done
in $O(|e| \cdot |l|)$, and the sum of the lengths of all the labels is less
than the size of the model (as we have to at least write them all down), this
can be done in $O(|e| \cdot |J|)$. We now repeat this for every regular
expression appearing in our JNL formula. Since the number of expressions is
linear in the size of the formula, the preprocessing takes linear time.
Finally, we need the preprocessing that employs as many colors as the number
of $\mathrm{EQ}(\alpha, A)$ operators appearing in $\varphi$, and that colors
the nodes $n$ such that $\mathit{json}(n) = A$. With this preprocessing, we
can treat the fact that $\mathit{json}(n) = A$ as a unary node test, and then
process $\mathrm{EQ}(\alpha, A)$ as the formula that processes $\alpha$ and
then checks that the reached node satisfies this unary test. We can now run
the classical PDL model checking over this extended structure treating regular
expressions and numbers as ordinary edge labels.
When the $\mathrm{EQ}(\alpha, \beta)$ operator is used we again need the
linear preprocessing to reduce subtree equality to data tests (checking if the
color of a node is the same as the color of another node). With this model,
the evaluation of RNJNL can be dealt with using the same techniques for
evaluating XPath under said data tests. We recall the
$O(|J| \cdot 2^{|\varphi|})$ bound from [9].
Interestingly, we can also show an $O(|J|^3 \cdot |\varphi|)$ bound, and
actually this bound holds even for the more general problem that takes as
input a JSON tree $J$, a JNL formula (with recursion and equality tests) $F$,
and computes the relation (or set if $F$ is unary) $\llbracket F\rrbracket_J$.
Before describing the algorithm for evaluation, we give some observations.
First, notice that for a JSON tree $J$, we measure the size $|J|$ as the number
of nodes in $J$ (we implicitly assume that the size of the edge labels is
dominated by this number; alternatively, one could also count the sizes of
edge labels to contribute to the size of the model without changing much).
This remains true if we view $J$ as a graph, since we will have a graph with
$|J|$ nodes and $|J| - 1$ edges (since $J$ is a tree). Therefore standard graph
traversal algorithms such as breadth-first and depth-first search will run in
time $O(|J|)$ over a JSON
tree. Next, we will assume that there is an ordering of the elements of a
JSON tree J. This is easily obtained by e.g. running a graph traversal
algorithm over our tree, and will allow us to sort the results of our query.
Finally, we will assume that there is a total ordering on the key names and
array positions, with numbers coming before any string, and strings being
ordered according to the lexicographical ordering.
Considering this, the first step of our algorithm will be to pre-compute
the equality relation with a linear preprocessing, assigning colors to nodes
with the same subtrees just as we did in the proof of Proposition 4, so that
we can test if $\mathit{json}(n) = \mathit{json}(n')$ in constant time, for two
nodes $n, n'$.
Next we compute the relation $E(A)$, of all the nodes equal to the fixed
JSON document $A$, for each operator $\mathrm{EQ}(\alpha, A)$ appearing in $F$.
We can color all nodes equivalent to a single document $A$ just as we did with
the linear preprocessing above. We therefore need one new color per document
$A$ appearing in $\varphi$, which can be done in $O(|J| \cdot |\varphi|)$.
Finally, we will pre-compute the relations $\llbracket X_e\rrbracket_J$, for
every operator $X_e$ appearing in our input formula $F$. This can easily be
done with a single pass over the tree $J$, where checking if the label of the
edge coming into the current node belongs to $e$ can be done in time
$O(|e|)$. Therefore the entire algorithm takes time at most
$O(|J| \cdot |F|)$. Similarly, pre-computing the relation
$\llbracket X_{i:j}\rrbracket_J$ for each operator $X_{i:j}$ appearing in $F$
can be done in $O(|J| \cdot |F|)$. Note that all of these relations have size
at most $O(|J|)$, since they are defined over the edges of our tree $J$.
We now solve the evaluation problem using a dynamic programming algo-
rithm that processes the parse tree of our formula F in a bottom-up fashion,
and computes, for every binary sub-expression $\alpha$ of $F$, the binary
relation $\llbracket\alpha\rrbracket_J$. Similarly, we compute, for every
unary sub-expression $\varphi$ of $F$, the set $\llbracket\varphi\rrbracket_J$.
Clearly, if each such relation can be computed within time
$O(|J|^3 \cdot |\varphi|)$, the evaluation problem can be solved within the
required time.
We now discuss how to obtain the desired time bound. The algorithm
is similar to an algorithm used for evaluating regular expressions on graphs
[31, 32], and can be described inductively as follows.
The base cases for binary expressions, that is, computing
$\llbracket\alpha\rrbracket_J$ where $\alpha$ is one of $\varepsilon$, $X_e$,
or $X_{i:j}$, are straightforward, since $\varepsilon$ simply defines the
diagonal relation, and we already have all the $X_e$ and $X_{i:j}$
pre-computed. Similarly, the base case for unary expressions, that is,
computing $\llbracket\top\rrbracket_J$, is trivial as well.
For the induction step we need to consider binary expressions of the form
$\langle\varphi\rangle$, $\alpha \circ \beta$, $\alpha \cup \beta$, and
$\alpha^*$. Also, we need to consider node expressions of the form
$\neg\varphi$, $\varphi \wedge \psi$, $\varphi \vee \psi$, $[\beta]$,
$\mathrm{EQ}(\alpha, \beta)$, and $\mathrm{EQ}(\alpha, A)$.
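In the case of binary expressions, the case $\langle\varphi\rangle$ is trivial
because $\llbracket\varphi\rrbracket_J$ contains at most $|J|$ elements. For
$\alpha \cup \beta$ we can first sort both relations
$\llbracket\alpha\rrbracket_J$ and $\llbracket\beta\rrbracket_J$ (costing
$O(|J|^2 \log |J|)$ time since they are of size at most $|J|^2$) and then
compute $\llbracket\alpha \cup \beta\rrbracket_J$ while performing a single
pass over $\llbracket\alpha\rrbracket_J$ and $\llbracket\beta\rrbracket_J$.
For $\alpha \circ \beta$ the relation
$\llbracket\alpha \circ \beta\rrbracket_J$ is the composition
$\llbracket\alpha\rrbracket_J \circ \llbracket\beta\rrbracket_J$, which can be
obtained by computing the natural join of $\llbracket\alpha\rrbracket_J$ with
$\llbracket\beta\rrbracket_J$ by sorting the first relation on the second
attribute, and the second one on the first one. Computing
$\llbracket\alpha^*\rrbracket_J$ amounts to computing the reflexive-transitive
closure of $\llbracket\alpha\rrbracket_J$, which can be done in time
$O(|J|^3)$ by Warshall’s algorithm.
In the case of unary expressions, $\neg\varphi$, $\varphi \wedge \psi$,
$\varphi \vee \psi$, and $[\beta]$ are straightforward to evaluate in the
desired time. The most interesting cases here involve the operator EQ.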
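The operations on binary relations used in this induction step can be
sketched in Python. The sketch assumes nodes are hashable identifiers and
relations are sets of pairs; it uses a hash-based join for composition rather
than the sort-based join described above, and a row-wise variant of Warshall's
algorithm for the closure. The function names compose and star are
hypothetical. Union needs no helper, since it is the built-in set union.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple

Node = Hashable
Rel = Set[Tuple[Node, Node]]


def compose(r: Rel, s: Rel) -> Rel:
    """Pairs (a, c) such that (a, b) is in r and (b, c) is in s for some b."""
    by_first: Dict[Node, Set[Node]] = {}
    for b, c in s:
        by_first.setdefault(b, set()).add(c)
    return {(a, c) for a, b in r for c in by_first.get(b, ())}


def star(r: Rel, nodes: Iterable[Node]) -> Rel:
    """Reflexive-transitive closure of r over the given nodes.

    Assumes every node mentioned in r also appears in `nodes`.
    """
    nodes = list(nodes)
    reach: Dict[Node, Set[Node]] = {n: {n} for n in nodes}
    for a, b in r:
        reach[a].add(b)
    for k in nodes:            # Warshall: allow k as an intermediate node
        for i in nodes:
            if k in reach[i]:
                reach[i] |= reach[k]
    return {(i, j) for i in nodes for j in reach[i]}


nodes = [0, 1, 2, 3]
child = {(0, 1), (1, 2), (0, 3)}   # edges of a small tree rooted at 0
grandchild = compose(child, child)
descendant_or_self = star(child, nodes)
```

On a tree with $|J|$ nodes the closure loop performs at most $|J|^2$ set
unions of size at most $|J|$, which matches the cubic bound quoted above.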
Computing $\llbracket\mathrm{EQ}(\alpha, \beta)\rrbracket_J$ from
$\llbracket\alpha\rrbracket_J$ and $\llbracket\beta\rrbracket_J$ can be done
similarly to how one performs a sort-merge join. First, we sort the relations
$\llbracket\alpha\rrbracket_J$ and $\llbracket\beta\rrbracket_J$ on the first
attribute in time $O(|J|^2 \log |J|)$. Then, for each of the $|J|$ possible
nodes $n$ (in increasing order), we can compute in time $O(|J|)$ the sets
$D_{n,1} = \{n_1 \mid (n, n_1) \in \llbracket\alpha\rrbracket_J\}$ and
$D_{n,2} = \{n_2 \mid (n, n_2) \in \llbracket\beta\rrbracket_J\}$. Since both
$D_{n,1}$ and $D_{n,2}$ have at most $|J|$ elements, and using our
pre-computed equality relation we can test in constant time if
$\mathit{json}(n_1) = \mathit{json}(n_2)$, it can be tested in time
$O(|J|^2)$ if the two sets have an element with the same value. The result
contains all such elements, and can therefore be computed in time $O(|J|^3)$.
The case of $\mathrm{EQ}(\alpha, A)$ is similar, since we have pre-computed
the relation $E(A)$ containing all the elements in $J$ equal to $A$.
$\Box$
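As an illustration of the equality case, the following Python sketch computes
the set of nodes satisfying EQ(alpha, beta) from the two relations and a
pre-computed coloring. It assumes that `color` maps every node to the color of
its subtree (equal subtrees share a color), and it groups reachable colors
with dictionaries instead of sorting, which is a simplification of the
sort-merge procedure in the proof. The name eq_nodes is hypothetical.

```python
from typing import Dict, Hashable, Set, Tuple

Node = Hashable
Rel = Set[Tuple[Node, Node]]


def eq_nodes(alpha: Rel, beta: Rel, color: Dict[Node, int]) -> Set[Node]:
    """Nodes n that reach, via alpha and via beta, two nodes with equal subtrees."""
    colors_a: Dict[Node, Set[int]] = {}
    colors_b: Dict[Node, Set[int]] = {}
    for n, target in alpha:
        colors_a.setdefault(n, set()).add(color[target])
    for n, target in beta:
        colors_b.setdefault(n, set()).add(color[target])
    # n qualifies when some color is reachable through both relations.
    return {n for n, cs in colors_a.items() if cs & colors_b.get(n, set())}


color = {0: 0, 1: 1, 2: 1, 3: 2}      # nodes 1 and 2 have equal subtrees
alpha = {(0, 1), (3, 3)}
beta = {(0, 2), (3, 1)}
matching = eq_nodes(alpha, beta, color)
```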
For satisfiability the situation is radically different, as the combination
of recursion, non-determinism and the binary equalities ends up being too
difficult to handle. The following proposition can be shown by adapting
similar results for tree automata using equality and inequality constraints (see
e.g. [16], proposition 4.2.10), but for completeness we give here a reduction
from the halting problem of two counter machines.
$\delta(q) = (\text{IF } C_i = 0 \text{ THEN } q_1, \text{ ELSE } q_2)$, stating
that, when in the state $q$, the machine should move to the state $q_1$ if the
counter $C_i$ contains a zero, and move to the state $q_2$ otherwise.
The triple consisting of the current state of the machine, and the values
in the two counters is called the configuration of the machine. The machine
starts in the initial configuration where the state is $q_0$ and
$C_1 = C_2 = 0$. It then starts executing the instructions of the relation
$\delta$. The halting problem for two-counter machines asks if there is a
sequence of transitions in $\delta$ starting with the initial configuration,
and such that the configuration reached at the end of this sequence has the
state $q_f$ and $C_1 = C_2 = 0$. The configuration with the state $q_f$ and
$C_1 = C_2 = 0$ is called accepting. As shown in e.g. [28], this problem is
undecidable.
For our reduction, given a two-counter machine M , we need to define
a formula ϕ that is satisfiable if and only if M can reach the accepting
configuration when started in the initial configuration. We will encode a
configuration (inside a run) of a two-counter machine M using a JSON object
that has precisely four keys:
the key state, whose value is the current state of the machine repre-
sented as a string; and
the key next, whose value is another JSON object representing a con-
figuration of our machine.
The values of keys c1 and c2 will be nested JSON objects with a single
key a, whose depth represents the value of the corresponding counter. The
empty counter is represented by the string "0". For instance, if $C_1 = 2$,
our coding will have c1:{"a":{"a":"0"}}, and if $C_2 = 0$, then our encoding
will have c2:"0". Therefore, if we are in a configuration where $q$ is the
current state and the values of the counters are $C_1 = C_2 = 1$, then we will
code this configuration with the following JSON document:
{
"state": "q",
"c1": {"a": "0"},
"c2": {"a": "0"},
"next": {...}
}
The idea here is that the key "next" is either an encoding of a valid
configuration that follows the current one, or is not present in the document
(signalling that we have reached an accepting configuration).
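The encoding of configurations can be made concrete with a small Python
sketch. The function names encode_counter and encode_run are hypothetical, and
a run is assumed to be given as a list of (state, C1, C2) triples; the sketch
only builds the JSON encoding and does not check that consecutive
configurations respect the transition function.

```python
from typing import Any, List, Optional, Tuple


def encode_counter(value: int) -> Any:
    """Counter value n becomes n nested objects with key "a" around "0"."""
    encoded: Any = "0"
    for _ in range(value):
        encoded = {"a": encoded}
    return encoded


def encode_run(configs: List[Tuple[str, int, int]]) -> Optional[dict]:
    """Chain configurations through the key "next"; the last one has no "next"."""
    doc: Optional[dict] = None
    for state, c1, c2 in reversed(configs):
        current = {"state": state,
                   "c1": encode_counter(c1),
                   "c2": encode_counter(c2)}
        if doc is not None:
            current["next"] = doc
        doc = current
    return doc


run = encode_run([("q0", 0, 0), ("q", 1, 1)])
```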
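We now define a formula $\varphi_M$ that will accept precisely such encodings
of valid runs of a counter machine $M$. We let
$\varphi_M = \langle F_{init} \circ F_{transition} \circ F_{accept}\rangle$,
where:
$F_{init} = \varepsilon \circ \langle \mathrm{EQ}(X_{c1}, \text{"0"}) \wedge \mathrm{EQ}(X_{c2}, \text{"0"}) \wedge \mathrm{EQ}(X_{state}, \text{"}q_0\text{"})\rangle \circ X_{next}$,
checks that we are in the initial configuration at the root of our JSON
document;
$F_{accept} = \varepsilon \circ \langle \mathrm{EQ}(X_{state}, \text{"}q_f\text{"})\rangle$,
checks that at the end we reach the accepting configuration; and
$F_{transition} = (\varepsilon \circ \langle \bigvee_q \varphi_q\rangle \circ X_{next})^*$,
checks that the transition from one configuration to the next is done
correctly according to $\delta$.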
The formulas ϕq code the transition δ pq q as follows:
if δ pq q pIN C Ci , q 1 q, then
ϕq EQpXc , Xnext Xc Xaq^
i i
if δ pq q pDEC Ci , q 1 q, then
ϕq EQpXstate , ”q”q ^ EQpXnext Xstate , ”q 1 ”q^
EQpXc Xa , Xnext Xc q _ EQpXc , ”0”q
i i i
©
rXc Xas ^ EQpXnext Xstate, ”q2”q
i
EQpXc , Xnext Xc q
i i
The idea here is to use the current state to check that the value of the key
next is correct according to the transition function δ. With this definition
at hand it is straightforward to see that M has an accepting run if and only
if the formula ϕM is satisfiable, since every JSON document satisfying ϕM
has to contain as a subdocument a JSON document that correctly codes a
accepting run. $\Box$
As is usually the case in PDL-like languages such as XPath [5, 35],
the undecidability is caused by the presence of the equality test
$\mathrm{EQ}(\alpha, \beta)$. Indeed, it was shown several times in the
literature that XPath without data tests has an Exptime-complete
satisfiability problem. Most notably, the techniques of [5] can be transferred
directly to our case.
5. Future perspectives
In this work we present a first attempt to formally study the JSON data
format. To this end, we describe the underlying data model for abstracting
JSON documents, and introduce logical formalisms which capture the way
JSON data is accessed and controlled in practice. Through our results we
emphasise how the new features present in the JSON standard affect classical
results known from the XML context. While some of these features have been
considered in the past (e.g. comparing subtrees [6, 49], or an infinite number
of keys [7]), it is still not entirely clear how these properties mix with the
deterministic structure of JSON trees, thus providing an interesting ground
for future work.
Apart from these fundamental problems that need to be tackled, there is
also a series of practical aspects of the JSON format that we did not consider.
In particular, we identify three areas that we believe warrant further study,
and where the formal framework we propose could be of use in understanding
the underlying problems.
MongoDB’s projection. Our formalisation of MongoDB’s projection
opens up several lines of work. First, there is the issue of understanding
to what extent this projection operator can be extended without losing
its good algorithmic properties. Indeed, the projection
in MongoDB is quite limited in expressive power, and does not allow a lot
of interaction between filtering and projecting. There are also fundamental
questions regarding the expressive power of these transformations, and the
possible interactions with schema definitions. Finally, there is the issue of
understanding the real need for a JSON-to-JSON query language, and to
what extent JNL, augmented with projection, satisfies these needs.
A standard query language for JSON data. The popularity of the
format constantly feeds the creation of new systems capable of dealing with
JSON data. More often than not, these systems come up with their own ver-
sion of a query language. The lack of a standardised query language taxes
the JSON environment with the cost of learning several of these languages,
makes benchmarking these systems a rather complicated effort, and prevents
the community from adopting or refining the fastest querying algorithms
available. Of course, the creation of a standard language requires
finding common ground amongst the different languages currently available,
which would be much easier to do in our framework, using JNL as a common
language to compare them. We are convinced that JNL is an
interesting core for a future standardisation of a JSON query language.
Streaming. Another important line of future work is streaming. Indeed, the
widespread use of JSON documents as a means of communicating information
through the Web demands the usage of streaming techniques to query JSON
documents or validate documents against schemas. Streaming applications
will most likely be related to APIs, in order to be able to query data fetched
from an API without having to store the data (for example if we are in a
mobile environment). In contrast with XML (see, e.g., [46]), we suspect that
deterministic JNL can be evaluated in a streaming
context with constant memory requirements when tree equality is excluded
from the language.
Acknowledgements
Reutter and Vrgoč were funded by the Millennium Institute for Foun-
dational Research on Data. Bourhis and Vrgoč were partially funded by
the STIC AMSUD project Foundations of Graph Structured Data (Fog).
Bourhis was partially funded by the DeLTA project (ANR-16-CE40-0007).
Reutter was also funded by CONICYT FONDECYT regular project number
1170866.
References
[1] Jayway JsonPath. https://github.com/json-path/JsonPath, 2017.
[4] Pablo Barceló, Jorge Pérez, and Juan L. Reutter. Relative expressiveness
of nested regular expressions. In Proceedings of the 6th Alberto Mendel-
zon International Workshop on Foundations of Data Management, Ouro
Preto, Brazil, June 27-30, 2012, pages 180–195, 2012.
[5] Michael Benedikt, Wenfei Fan, and Floris Geerts. XPath satisfiability
in the presence of DTDs. Journal of the ACM (JACM), 55(2):8, 2008.
[6] Bruno Bogaert and Sophie Tison. Equality and disequality constraints
on direct subterms in tree automata. In STACS 92, 9th Annual Sym-
posium on Theoretical Aspects of Computer Science, Cachan, France,
February 13-15, 1992, Proceedings, pages 161–171, 1992.
[7] Adrien Boiret, Vincent Hugot, Joachim Niehren, and Ralf Treinen. De-
terministic automata for unordered trees. In Proceedings Fifth Interna-
tional Symposium on Games, Automata, Logics and Formal Verification,
GandALF 2014, Verona, Italy, September 10-12, 2014., pages 189–202,
2014.
[9] Mikolaj Bojańczyk and Pawel Parys. XPath evaluation in linear time.
Journal of the ACM (JACM), 58(4):17, 2011.
[10] Elena Botoeva, Diego Calvanese, Benjamin Cogrel, and Guohui Xiao.
Expressivity and complexity of MongoDB queries. In 21st International
Conference on Database Theory, ICDT 2018, March 26-29, 2018, Vi-
enna, Austria, pages 9:1–9:23, 2018.
[11] Pierre Bourhis, Juan L Reutter, Fernando Suárez, and Domagoj Vrgoč.
JSON: data model, query languages and schema specification. In PODS,
pages 123–135, 2017.
[12] Tim Bray. The JavaScript Object Notation (JSON) Data Interchange
Format. 2014.
[14] Peter Buneman, Martin Grohe, and Christoph Koch. Path queries on
compressed XML. In Proceedings of the 29th international conference on
Very large data bases-Volume 29, pages 141–152. VLDB Endowment,
2003.
[18] Wojciech Czerwiński, Wim Martens, Pawel Parys, and Marcin Przy-
bylko. The (almost) complete guide to tree pattern containment. In
Proceedings of the 34th ACM SIGMOD-SIGACT-SIGAI Symposium on
Principles of Database Systems, pages 117–130. ACM, 2015.
[21] Diego Figueira. Reasoning on words and trees with data. (Raisonnement
sur mots et arbres avec données). PhD thesis, École normale supérieure
de Cachan, France, 2010.
[25] Georg Gottlob, Christoph Koch, and Reinhard Pichler. Efficient
algorithms for processing XPath queries. ACM Trans. Database Syst.,
30(2):444–491, 2005.
[26] Georg Gottlob, Christoph Koch, Reinhard Pichler, and Luc Segoufin.
The complexity of XPath query evaluation and XML typing. Journal of
the ACM (JACM), 52(2):284–335, 2005.
[27] Jan Hidders, Jan Paredaens, and Jan Van den Bussche. J-Logic: Logical
foundations for JSON querying. In Proceedings of the 36th ACM
SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Sys-
tems, pages 137–149. ACM, 2017.
[29] Internet Engineering Task Force (IETF). The JavaScript Object Notation
(JSON) Data Interchange Format. https://tools.ietf.org/html/rfc7159,
March 2014.
[32] Leonid Libkin, Wim Martens, and Domagoj Vrgoč. Querying graphs
with data. J. ACM, 63(2):14, 2016.
[33] Zhen Hua Liu, Beda Christoph Hammerschmidt, and Doug McMa-
hon. JSON data management: supporting schema-less development in
RDBMS. In International Conference on Management of Data, SIG-
MOD 2014, Snowbird, UT, USA, June 22-27, 2014, pages 1247–1258,
2014.
[36] Gerome Miklau and Dan Suciu. Containment and equivalence for a
fragment of XPath. J. ACM, 51(1):2–45, 2004.
[39] Frank Neven and Thomas Schwentick. XPath containment in the presence
of disjunction, DTDs, and variables. In International Conference on
Database Theory, pages 315–329. Springer, 2003.
[41] Kian Win Ong, Yannis Papakonstantinou, and Romain Vernoux. The
SQL++ semi-structured data model and query language: A capabilities
survey of SQL-on-Hadoop, NoSQL and NewSQL databases. CoRR,
abs/1405.3631, 2014.
[42] OrientDB LTD. The OrientDB database. http://orientdb.com/,
2016.
[43] Pawel Parys. XPath evaluation in linear time with polynomial combined
complexity. In Proceedings of the Twenty-Eighth ACM SIGMOD-
SIGACT-SIGART Symposium on Principles of Database Systems,
PODS 2009, June 29 - July 1, 2009, Providence, Rhode Island, USA,
pages 55–64, 2009.
[44] Felipe Pezoa, Juan L. Reutter, Fernando Suárez, Martín Ugarte, and
Domagoj Vrgoč. Foundations of JSON schema. In Proceedings of the
25th International Conference on World Wide Web, WWW 2016, Mon-
treal, Canada, April 11 - 15, 2016, pages 263–273, 2016.
[49] Karianto Wong and Christof Löding. Unranked tree automata with sib-
ling equalities and disequalities. In Automata, Languages and Program-
ming, 34th International Colloquium, ICALP 2007, Wroclaw, Poland,
July 9-13, 2007, Proceedings, pages 875–887, 2007.