Commit e02d44b

Support JSON negative array subscripts everywhere
Previously, there was an inconsistency across json/jsonb operators that operate on datums containing JSON arrays -- only some operators supported negative array count-from-the-end subscripting. Specifically, only a new-to-9.5 jsonb deletion operator had support (the new "jsonb - integer" operator). This inconsistency seemed likely to be counter-intuitive to users. To fix, allow all places where the user can supply an integer subscript to accept a negative subscript value, including path-orientated operators and functions, as well as other extraction operators. This will need to be called out as an incompatibility in the 9.5 release notes, since it's possible that users are relying on certain established extraction operators changed here yielding NULL in the event of a negative subscript.

For the json type, this requires adding a way of cheaply getting the total JSON array element count ahead of time when parsing arrays with a negative subscript involved, necessitating an ad-hoc lex and parse. This is followed by a "conversion" from a negative subscript to its equivalent positive-wise value using the count. From there on, it's as if a positive-wise value was originally provided.

Note that there is still a minor inconsistency here across jsonb deletion operators. Unlike the aforementioned new "-" deletion operator that accepts an integer on its right hand side, the new "#-" path-orientated deletion variant does not throw an error when it appears like an array subscript (input that could be recognized as an integer literal) is being used on an object, which is wrong-headed. The reason for not being stricter is that it could be the case that an object pair happens to have a key value that looks like an integer; in general, these two possibilities are impossible to differentiate with rhs path text[] argument elements. However, we still don't allow the "#-" path-orientated deletion operator to perform array-style subscripting. Rather, we just return the original left operand value in the event of a negative subscript (which seems analogous to how the established "jsonb/json #> text[]" path-orientated operator may yield NULL in the event of an invalid subscript).

In passing, make setPathArray() stricter about not accepting cases where there are trailing non-numeric garbage bytes rather than a clean NUL byte. This means, for example, that strings like "10e10" are now not accepted as an array subscript of 10 by some new-to-9.5 path-orientated jsonb operators (e.g. the new #- operator).

Finally, remove dead code for jsonb subscript deletion; arguably, this should have been done in commit b81c7b4.

Peter Geoghegan and Andrew Dunstan
1 parent 0fc94a5 commit e02d44b

10 files changed, +231 -29 lines changed

doc/src/sgml/func.sgml

+12 -4
@@ -10177,7 +10177,8 @@ table2-mapping
 <row>
 <entry><literal>-&gt;</literal></entry>
 <entry><type>int</type></entry>
-<entry>Get JSON array element (indexed from zero)</entry>
+<entry>Get JSON array element (indexed from zero, negative
+integers count from the end)</entry>
 <entry><literal>'[{"a":"foo"},{"b":"bar"},{"c":"baz"}]'::json-&gt;2</literal></entry>
 <entry><literal>{"c":"baz"}</literal></entry>
 </row>
@@ -10230,7 +10231,10 @@ table2-mapping
 returning <type>text</>, which coerce the value to text.
 The field/element/path extraction operators return NULL, rather than
 failing, if the JSON input does not have the right structure to match
-the request; for example if no such element exists.
+the request; for example if no such element exists. The
+field/element/path extraction operators that accept integer JSON
+array subscripts all support negative subscripting from the end of
+arrays.
 </para>
 </note>
 <para>
@@ -10318,7 +10322,8 @@ table2-mapping
 <row>
 <entry><literal>#-</literal></entry>
 <entry><type>text[]</type></entry>
-<entry>Delete the field or element with specified path</entry>
+<entry>Delete the field or element with specified path (for
+JSON arrays, negative integers count from the end)</entry>
 <entry><literal>'["a", {"b":1}]'::jsonb #- '{1,b}'</literal></entry>
 </row>
 </tbody>
@@ -10858,6 +10863,9 @@ table2-mapping
 <replaceable>create_missing</replaceable> is true ( default is
 <literal>true</>) and the item
 designated by <replaceable>path</replaceable> does not exist.
+As with the path orientated operators, negative integers that
+appear in <replaceable>path</replaceable> count from the end
+of JSON arrays.
 </entry>
 <entry><para><literal>jsonb_set('[{"f1":1,"f2":null},2,null,3]', '{0,f1}','[2,3,4]', false)</literal>
 </para><para><literal>jsonb_set('[{"f1":1,"f2":null},2]', '{0,f3}','[2,3,4]')</literal>
@@ -10872,7 +10880,7 @@ table2-mapping
 <entry><para><type>text</type></para></entry>
 <entry>
 Returns <replaceable>from_json</replaceable>
-as indented json text.
+as indented JSON text.
 </entry>
 <entry><literal>jsonb_pretty('[{"f1":1,"f2":null},2,null,3]')</literal></entry>
 <entry>

src/backend/utils/adt/json.c

+39
@@ -340,6 +340,45 @@ pg_parse_json(JsonLexContext *lex, JsonSemAction *sem)

 }

+/*
+ * json_count_array_elements
+ *
+ * Returns number of array elements in lex context at start of array token
+ * until end of array token at same nesting level.
+ *
+ * Designed to be called from array_start routines.
+ */
+int
+json_count_array_elements(JsonLexContext *lex)
+{
+    JsonLexContext copylex;
+    int count;
+
+    /*
+     * It's safe to do this with a shallow copy because the lexical routines
+     * don't scribble on the input. They do scribble on the other pointers etc,
+     * so doing this with a copy makes that safe.
+     */
+    memcpy(&copylex, lex, sizeof(JsonLexContext));
+    copylex.strval = NULL; /* not interested in values here */
+    copylex.lex_level++;
+
+    count = 0;
+    lex_expect(JSON_PARSE_ARRAY_START, &copylex, JSON_TOKEN_ARRAY_START);
+    if (lex_peek(&copylex) != JSON_TOKEN_ARRAY_END)
+    {
+        do
+        {
+            count++;
+            parse_array_element(&copylex, &nullSemAction);
+        }
+        while (lex_accept(&copylex, JSON_TOKEN_COMMA, NULL));
+    }
+    lex_expect(JSON_PARSE_ARRAY_NEXT, &copylex, JSON_TOKEN_ARRAY_END);
+
+    return count;
+}
+
 /*
  * Recursive Descent parse routines. There is one for each structural
  * element in a json document:
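
The idea behind json_count_array_elements, a cheap look-ahead pass that counts elements at one nesting level without building any values, can be illustrated outside the backend. The following is a simplified stand-in written for this note, not the patch's lexer: count_top_level_elements is a hypothetical name, it works on a plain C string, and it assumes the input is well-formed JSON.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical, simplified stand-in for the counting pass: given text that
 * starts at '[', count the elements of that array without building values.
 * Assumes well-formed JSON. Returns -1 if the text does not start with '['
 * or ends before the matching ']'.
 */
static int
count_top_level_elements(const char *s)
{
    int         depth = 0;
    int         commas = 0;         /* commas seen at the outer level */
    int         in_string = 0;
    int         seen_value = 0;     /* anything inside the outer array? */

    assert(s != NULL);
    if (*s != '[')
        return -1;

    for (; *s != '\0'; s++)
    {
        char        c = *s;

        if (in_string)
        {
            if (c == '\\' && s[1] != '\0')
                s++;                /* skip the escaped character */
            else if (c == '"')
                in_string = 0;
            continue;
        }

        switch (c)
        {
            case '"':
                in_string = 1;
                if (depth == 1)
                    seen_value = 1;
                break;
            case '[':
            case '{':
                if (depth == 1)
                    seen_value = 1;
                depth++;
                break;
            case ']':
            case '}':
                if (--depth == 0)
                    return seen_value ? commas + 1 : 0;
                break;
            case ',':
                if (depth == 1)
                    commas++;
                break;
            case ' ':
            case '\t':
                break;
            case '\n':
            case '\r':
                break;
            default:
                if (depth == 1)
                    seen_value = 1;
                break;
        }
    }
    return -1;                      /* ran out of input */
}
```

Like the function in the patch, the sketch leaves its input untouched, so the caller can carry on with its main parse from the same position once the count is known.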

src/backend/utils/adt/jsonfuncs.c

+75 -25
@@ -597,6 +597,17 @@ jsonb_array_element(PG_FUNCTION_ARGS)
     if (!JB_ROOT_IS_ARRAY(jb))
         PG_RETURN_NULL();

+    /* Handle negative subscript */
+    if (element < 0)
+    {
+        uint32 nelements = JB_ROOT_COUNT(jb);
+
+        if (-element > nelements)
+            PG_RETURN_NULL();
+        else
+            element += nelements;
+    }
+
     v = getIthJsonbValueFromContainer(&jb->root, element);
     if (v != NULL)
         PG_RETURN_JSONB(JsonbValueToJsonb(v));
@@ -629,6 +640,17 @@ jsonb_array_element_text(PG_FUNCTION_ARGS)
     if (!JB_ROOT_IS_ARRAY(jb))
         PG_RETURN_NULL();

+    /* Handle negative subscript */
+    if (element < 0)
+    {
+        uint32 nelements = JB_ROOT_COUNT(jb);
+
+        if (-element > nelements)
+            PG_RETURN_NULL();
+        else
+            element += nelements;
+    }
+
     v = getIthJsonbValueFromContainer(&jb->root, element);
     if (v != NULL)
     {
@@ -719,7 +741,7 @@ get_path_all(FunctionCallInfo fcinfo, bool as_text)
     /*
      * we have no idea at this stage what structure the document is so
      * just convert anything in the path that we can to an integer and set
-     * all the other integers to -1 which will never match.
+     * all the other integers to INT_MIN which will never match.
      */
     if (*tpath[i] != '\0')
     {
@@ -728,13 +750,13 @@ get_path_all(FunctionCallInfo fcinfo, bool as_text)

         errno = 0;
         ind = strtol(tpath[i], &endptr, 10);
-        if (*endptr == '\0' && errno == 0 && ind <= INT_MAX && ind >= 0)
+        if (*endptr == '\0' && errno == 0 && ind <= INT_MAX && ind >= INT_MIN)
             ipath[i] = (int) ind;
         else
-            ipath[i] = -1;
+            ipath[i] = INT_MIN;
     }
     else
-        ipath[i] = -1;
+        ipath[i] = INT_MIN;
 }

 result = get_worker(json, tpath, ipath, npath, as_text);
@@ -752,14 +774,15 @@ get_path_all(FunctionCallInfo fcinfo, bool as_text)
  *
  * json: JSON object (in text form)
  * tpath[]: field name(s) to extract
- * ipath[]: array index(es) (zero-based) to extract
+ * ipath[]: array index(es) (zero-based) to extract, accepts negatives
  * npath: length of tpath[] and/or ipath[]
  * normalize_results: true to de-escape string and null scalars
  *
  * tpath can be NULL, or any one tpath[] entry can be NULL, if an object
  * field is not to be matched at that nesting level. Similarly, ipath can
- * be NULL, or any one ipath[] entry can be -1, if an array element is not
- * to be matched at that nesting level.
+ * be NULL, or any one ipath[] entry can be INT_MIN if an array element is
+ * not to be matched at that nesting level (a json datum should never be
+ * large enough to have -INT_MIN elements due to MaxAllocSize restriction).
  */
 static text *
 get_worker(text *json,
@@ -964,6 +987,17 @@ get_array_start(void *state)
          */
         _state->result_start = _state->lex->token_start;
     }
+
+    /* INT_MIN value is reserved to represent invalid subscript */
+    if (_state->path_indexes[lex_level] < 0 &&
+        _state->path_indexes[lex_level] != INT_MIN)
+    {
+        /* Negative subscript -- convert to positive-wise subscript */
+        int nelements = json_count_array_elements(_state->lex);
+
+        if (-_state->path_indexes[lex_level] <= nelements)
+            _state->path_indexes[lex_level] += nelements;
+    }
 }

 static void
@@ -1209,9 +1243,30 @@ get_jsonb_path_all(FunctionCallInfo fcinfo, bool as_text)
         errno = 0;
         lindex = strtol(indextext, &endptr, 10);
         if (endptr == indextext || *endptr != '\0' || errno != 0 ||
-            lindex > INT_MAX || lindex < 0)
+            lindex > INT_MAX || lindex < INT_MIN)
             PG_RETURN_NULL();
-        index = (uint32) lindex;
+
+        if (lindex >= 0)
+        {
+            index = (uint32) lindex;
+        }
+        else
+        {
+            /* Handle negative subscript */
+            uint32 nelements;
+
+            /* Container must be array, but make sure */
+            if ((container->header & JB_FARRAY) == 0)
+                elog(ERROR, "not a jsonb array");
+
+            nelements = container->header & JB_CMASK;
+
+            if (-lindex > nelements)
+                PG_RETURN_NULL();
+            else
+                index = nelements + lindex;
+        }
+
         jbvp = getIthJsonbValueFromContainer(container, index);
     }
     else
@@ -3411,10 +3466,8 @@ jsonb_delete_idx(PG_FUNCTION_ARGS)
     it = JsonbIteratorInit(&in->root);

     r = JsonbIteratorNext(&it, &v, false);
-    if (r == WJB_BEGIN_ARRAY)
-        n = v.val.array.nElems;
-    else
-        n = v.val.object.nPairs;
+    Assert (r == WJB_BEGIN_ARRAY);
+    n = v.val.array.nElems;

     if (idx < 0)
     {
@@ -3431,14 +3484,10 @@ jsonb_delete_idx(PG_FUNCTION_ARGS)

     while ((r = JsonbIteratorNext(&it, &v, true)) != 0)
     {
-        if (r == WJB_ELEM || r == WJB_KEY)
+        if (r == WJB_ELEM)
         {
             if (i++ == idx)
-            {
-                if (r == WJB_KEY)
-                    JsonbIteratorNext(&it, &v, true); /* skip value */
                 continue;
-            }
         }

         res = pushJsonbValue(&state, r, r < WJB_BEGIN_ARRAY ? &v : NULL);
@@ -3657,7 +3706,7 @@ IteratorConcat(JsonbIterator **it1, JsonbIterator **it2,
  * If newval is null, the element is to be removed.
  *
  * If create is true, we create the new value if the key or array index
- * does not exist. All path elemnts before the last must already exist
+ * does not exist. All path elements before the last must already exist
  * whether or not create is true, or nothing is done.
  */
 static JsonbValue *
@@ -3818,7 +3867,8 @@ setPathArray(JsonbIterator **it, Datum *path_elems, bool *path_nulls,

     errno = 0;
     lindex = strtol(c, &badp, 10);
-    if (errno != 0 || badp == c || lindex > INT_MAX || lindex < INT_MIN)
+    if (errno != 0 || badp == c || *badp != '\0' || lindex > INT_MAX ||
+        lindex < INT_MIN)
         idx = nelems;
     else
         idx = lindex;
@@ -3829,7 +3879,7 @@ setPathArray(JsonbIterator **it, Datum *path_elems, bool *path_nulls,
     if (idx < 0)
     {
         if (-idx > nelems)
-            idx = -1;
+            idx = INT_MIN;
         else
             idx = nelems + idx;
     }
@@ -3838,12 +3888,12 @@ setPathArray(JsonbIterator **it, Datum *path_elems, bool *path_nulls,
         idx = nelems;

     /*
-     * if we're creating, and idx == -1, we prepend the new value to the array
-     * also if the array is empty - in which case we don't really care what
-     * the idx value is
+     * if we're creating, and idx == INT_MIN, we prepend the new value to the
+     * array also if the array is empty - in which case we don't really care
+     * what the idx value is
      */

-    if ((idx == -1 || nelems == 0) && create && (level == path_len - 1))
+    if ((idx == INT_MIN || nelems == 0) && create && (level == path_len - 1))
     {
         Assert(newval != NULL);
         addJsonbToParseState(st, newval);
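
Several hunks in this file share one pattern: a text[] path element is parsed with strtol, anything that is not a clean integer is rejected, and INT_MIN is reserved as the "never matches" marker now that -1 is a valid subscript. A minimal standalone sketch of that pattern follows. parse_path_subscript is a hypothetical name, and folding the checks into one helper is an assumption of this sketch, since the patch keeps them inline at each call site.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse one path element as an array subscript.
 * The whole string must be a base-10 integer that fits in an int.
 * INT_MIN is returned as the "no match" marker, so an input equal to
 * INT_MIN is indistinguishable from a rejected one.
 */
static int
parse_path_subscript(const char *text)
{
    char       *end;
    long        val;

    assert(text != NULL);

    errno = 0;
    val = strtol(text, &end, 10);

    /*
     * Reject range errors, empty input, and trailing garbage such as the
     * "e10" in "10e10", where strtol alone would stop after "10".
     */
    if (errno != 0 || end == text || *end != '\0' ||
        val > INT_MAX || val <= INT_MIN)
        return INT_MIN;

    return (int) val;
}
```

The `*end != '\0'` test is the strictness the commit message describes: without it, "10e10" would be read as subscript 10.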

src/include/utils/jsonapi.h

+7
@@ -103,6 +103,13 @@ typedef struct JsonSemAction
  */
 extern void pg_parse_json(JsonLexContext *lex, JsonSemAction *sem);

+/*
+ * json_count_array_elements performs a fast secondary parse to determine the
+ * number of elements in passed array lex context. It should be called from an
+ * array_start action.
+ */
+extern int json_count_array_elements(JsonLexContext *lex);
+
 /*
  * constructors for JsonLexContext, with or without strval element.
  * If supplied, the strval element will contain a de-escaped version of

src/test/regress/expected/json.out

+14
@@ -569,6 +569,14 @@ WHERE json_type = 'array';
 "two"
 (1 row)

+SELECT test_json -> -1
+FROM test_json
+WHERE json_type = 'array';
+?column?
+----------
+{"f1":9}
+(1 row)
+
 SELECT test_json -> 2
 FROM test_json
 WHERE json_type = 'object';
@@ -698,6 +706,12 @@ select '{"a": [{"b": "c"}, {"b": "cc"}]}'::json -> 1;

 (1 row)

+select '{"a": [{"b": "c"}, {"b": "cc"}]}'::json -> -1;
+?column?
+----------
+
+(1 row)
+
 select '{"a": [{"b": "c"}, {"b": "cc"}]}'::json -> 'z';
 ?column?
 ----------

src/test/regress/expected/json_1.out

+14
@@ -569,6 +569,14 @@ WHERE json_type = 'array';
 "two"
 (1 row)

+SELECT test_json -> -1
+FROM test_json
+WHERE json_type = 'array';
+?column?
+----------
+{"f1":9}
+(1 row)
+
 SELECT test_json -> 2
 FROM test_json
 WHERE json_type = 'object';
@@ -698,6 +706,12 @@ select '{"a": [{"b": "c"}, {"b": "cc"}]}'::json -> 1;

 (1 row)

+select '{"a": [{"b": "c"}, {"b": "cc"}]}'::json -> -1;
+?column?
+----------
+
+(1 row)
+
 select '{"a": [{"b": "c"}, {"b": "cc"}]}'::json -> 'z';
 ?column?
 ----------
