Re: SQL/JSON features for v15
От | Nikita Glukhov |
---|---|
Тема | Re: SQL/JSON features for v15 |
Дата | |
Msg-id | [email protected] обсуждение исходный текст |
Ответ на | Re: SQL/JSON features for v15 (Andres Freund <[email protected]>) |
Ответы |
Re: SQL/JSON features for v15
|
Список | pgsql-hackers |
Hi,
On 16.08.2022 01:38, Andres Freund wrote:
Continuation from the thread at https://postgr.es/m/20220811171740.m5b4h7x63g4lzgrk%40awork3.anarazel.de I started hacking on this Friday. I think there's some relatively easy improvements that make the code substantially more understandable, at least for me, without even addressing the structural stuff.
I also started hacking Friday, hacked all weekend, and now have a new version of the patch. I received your message when I finished writing of mine, so I will try answer your new questions only in next message. But in short, I can say that some things like ExecEvalJsonExprSubtrans() were fixed. I took Amit's patch and tried to simplify execution further. Explanation of the patches is at the very end of message. Next, I try to answer some of previous questions. On Aug 2, 2022 at 9:39 AM Andres Freund <[email protected]> wrote: > The whole coercion stuff just seems incredibly clunky (in a > slightly different shape before this patch). > ExecEvalJsonExprItemCoercion() calls ExecPrepareJsonItemCoercion(), > which gets a pointer to one of the per-type elements in > JsonItemCoercionsState, dispatching on the type of the json > object. Then we later call ExecGetJsonItemCoercion() (via a > convoluted path), which again will dispatch on the type > (extracting the json object again afaics!), to then somehow > eventually get the coerced value. I think it might be possible > to make this a bit simpler, by not leaving anything > coercion-related in ExecEvalJsonExpr(). On 2022-08-02 12:05:55 +0900, Amit Langote wrote: > I left some pieces there, because I thought the error of not finding an > appropriate coercion must be thrown right away as the code in > ExecEvalJsonExpr() does after calling ExecGetJsonItemCoercion(). > ExecPrepareJsonItemCoercion() is called later when it's time to > actually evaluate the coercion. If we move the error path to > ExecPrepareJsonItemCoercion(), both ExecGetJsonItemCoercion() and the > error path code in ExecEvalJsonExpr() will be unnecessary. I will > give that a try. The first dispatch is done only for throwing error about missing cast without starting subtransaction in which second dispatch is executed. I agree, this is bad that result of first dispatch is not used later, and I have removed second dispatch. > I don't understand the design of what needs to have error handling, > and what not. > I don't think subtransactions per-se are a fundamental problem. > I think the error handling implementation is ridiculously complicated, > and while I started to hack on improving it, I stopped when I really > couldn't understand what errors it actually needs to handle when and > why. Here is the diagram that may help to understand error handling in SQL/JSON functions (I hope it will be displayed correctly): JSON path ------- expression \ ->+-----------+ SQL/JSON +----------+ Result PASSING args ------->| JSON path |--> item or --->| Output |-> SQL ->| executor | JSONB .->| Coercion | value / +-----------+ datum | +----------+ JSON + - - - -+ | | | | Context ->: FORMAT : v v | v item : JSON : error? empty? | error? + - - - -+ | | | | | | +----------+ | / v | | ON EMPTY |--> SQL --' / error? | +----------+ value / | | | / \ | v / \ \ error? / \ \ | / \______ \ | _____________/ \ \ | / v v v v +----------+ +----------+ | Output | Result | ON ERROR |--->| Coercion |--> SQL +----------+ +----------+ value | | V V EXCEPTION EXCEPTION The first dashed box "FORMAT JSON" used for parsing JSON is absent in our implementation, because we support only jsonb type which is pre-parsed. This could be used in queries like that: JSON_VALUE('invalid json', '$' DEFAULT 'error' ON ERROR) => 'error' JSON path executor already has error handling and does not need subtransactions. We had to add functions like numeric_add_opt_error() which return error flag instead of throwing exceptions. On Aug 10, 2022 at 3:57 AM Andres Freund <[email protected]> wrote: >>> One way this code could be drastically simplified is to force all >>> type-coercions to go through the "io coercion" path, which could be >>> implemented as a single execution step (which thus could trivially >>> start/finish a subtransaction) and would remove a lot of the >>> complicated code around coercions. >> Could you please clarify how you think we might do the io coercion >> wrapped with a subtransaction all as a single execution step? I >> would've thought that we couldn't do the sub-transaction without >> leaving ExecInterpExpr() anyway, so maybe you meant the io coercion >> itself was done using some code outside ExecInterpExpr()? >> The current JsonExpr code does it by recursively calling >> ExecInterpExpr() using the nested ExprState expressly for the >> coercion. > The basic idea is to rip out all the type-dependent stuff out and > replace it with a single JSON_IOCERCE step, which has a parameter > about whether to wrap things in a subtransaction or not. That step > would always perform the coercion by calling the text output function > of the input and the text input function of the output. On Aug 3, 2022 at 12:00 AM Andres Freund <[email protected]> wrote: > But we don't need to wrap arbitrary evaluation in a subtransaction - > afaics the coercion calls a single function, not an arbitrary > expression? SQL standard says that scalar SQL/JSON items are converted to SQL type through CAST(corresponding_SQL_type_for_item AS returning_type). Our JSON_VALUE implementation supports arbitrary output types that can have specific CASTs from numeric, bool, datetime, which we can't emulate with simple I/O coercion. But supporting of arbitrary types may be dangerous, because SQL standard denotes only a limited set of types: The <data type> contained in the explicit or implicit <JSON returning clause> shall be a <predefined type> that identifies a character string data type, numeric data type, boolean data type, or datetime data type. I/O coercion will not even work in the following simple case: JSON_VALUE('1.23', '$' RETURNING int) It is expected to return 1::int, like ordinary cast 1.23::numeric::int. Exceptions may come not only from coercions. Expressions in DEFAULT ON EMPTY can also throw exceptions, which also must be handled. Here is excerpt from ISO/IEC 19075-6:2021(E) "Part 6: Support for JSON", which explains SQL standard features in human-readable manner: 6.4.3 JSON_VALUE: <JSON value error behavior> specifies what to do if there is an unhandled error. Unhandled errors can arise if there is an input conversion error (for example, if the context item cannot be parsed), an error returned by the SQL/JSON path engine, or an output conversion error. The choices are the same as for <JSON value empty behavior>. When using DEFAULT <value expression> for either the empty or error behavior, what happens if the <value expression> raises an exception? The answer is that an error during empty behavior "falls through" to the error behavior. If the error behavior itself has an error, there is no further recourse but to raise the exception. So, we need to support execution of arbitrary expressions inside a subtransaction, and do not try to somehow simplify coercions. In Amit's fix, wrapping DEFAULT ON EMPTY into subtransactions was lost, mainly because there were no tests for this case. The following test should not fall on the second row: SELECT JSON_VALUE(jsonb '1', '$.a' RETURNING int DEFAULT 1 / x ON EMPTY DEFAULT 2 ON ERROR) FROM (VALUES (1::int), (0)) x(x); json_value ------------ 1 2 I have added this test in 0003. On Aug 3, 2022 at 12:00 AM Andres Freund <[email protected]> wrote: > On 2022-08-02 12:05:55 +0900, Amit Langote wrote: >> I am not really sure if different coercions may be used >> in the same query over multiple evaluations of the same JSON path >> expression, but maybe that's also possible. > Even if the type can change, I don't think that means we need to have > space for multiple types at the same time - there can't be multiple > coercions happening at the same time, otherwise there could be two > coercions of the same type as well. So we don't need memory for > every coercion type. Only the one item coercion is used in execution of JSON_VALUE(). Multiple coercions could be executed, if we supported quite useful SRF JSON_QUERY() using "RETURNING SETOF type" (I had this idea for a long time, but I didn't dare to implement it). I don't understand what "memory" you mean. If we will not emit all possible expressions statically, we will need to generate them dynamically at run-time, and this could be hardly acceptable. In the last version of the fix there is only 4 bytes (int jump) of additional state space per coercion. On Aug 6, 2022 at 5:37 Andres Freund <[email protected]> wrote: > There's one layer of subtransactions in one of the paths in > ExecEvalJsonExpr(), another in ExecEvalJson(). Some paths of > ExecEvalJsonExpr() go through subtransactions, others don't. Really, there is only one level of subtransactions. Outer subtransaction may be used for FORMAT JSON handling which always requires subtransaction at the beginning of expression execution. Inner subtransactions are conditional, they are started only and when there is no outer subtransaction. Now, outer subtransactions are not used at all, ExecEvalJsonNeedsSubtransaction(NULL) always returns false. (AFAIR, FORMAT JSON was present in older version of SQL/JSON patches, then it was removed, but outer subtransactions were not). In the last version of the fix I have removed them completely and moved inner subtransactions into a separate executor step (see below). The description of the patches: 0001 - Fix returning of json[b] domains in JSON_VALUE() (This may require a separate thread.) I found a bug in returning json[b] domains in JSON_VALUE(). json[b] has special processing in JSON_VALUE, bypassing oridinary SQL/JSON item type => SQL type coercions. But json[b] domains miss this processing: CREATE DOMAIN jsonb_not_null AS jsonb NOT NULL; SELECT JSON_VALUE('"123"', '$' RETURNING jsonb); "123" SELECT JSON_VALUE( '123', '$' RETURNING jsonb); 123 SELECT JSON_VALUE('"123"', '$' RETURNING jsonb_not_null); 123 SELECT JSON_VALUE( '123', '$' RETURNING jsonb_not_null ERROR ON ERROR); ERROR: SQL/JSON item cannot be cast to target type Fixed by examinating output base type in parse_expr.c and skipping allocation of item coercions, what later will be a signal for special processing in ExecEvalJsonExpr(). 0002 - Add EEOP_SUBTRANS executor step On 2022-08-02 12:05:55 +0900, Amit Langote wrote: > So, the problem with inlining coercion evaluation into the main parent > JsonExpr's is that it needs to be wrapped in a sub-transaction to > catch any errors and return NULL instead. I don't know a way to wrap > ExprEvalStep evaluation in a sub-transaction to achieve that effect. I also don't know way to run subtransactions without recursion in executor, but I still managed to elimiate subsidary ExprStates. I have introduced new EEOP_SUBTRANS step which executes its subsequent steps in a subtransaction. It recursively calls a new variant of ExecInterpExpr() in which starting stepno is passed. The return from subtransaction is done with EEOP_DONE step that emitted after subexpression. This step can be reused for other future expressions, that's why it has no JSON prefix in its name (you could see recent message in the thread about casts with default values, which are missing in PostgreSQL). But for JIT I still had to construct additional ExprState with a function compiled from subexpression steps. 0003 - Simplify JsonExpr execution: - New EEOP_SUBTRANS was used to wrap individual coercion expressions: after execution it jumps to "done" or "onerror" step - JSONEXPR_ITEM_COERCE step was removed - JSONEXPR_COERCE split into JSONEXPR_IOCOERCE and JSONEXPR_POPULATE - Removed all JsonExprPostEvalState - JSONEXPR step simply returns jump address to one of its possible continuations: done, onempty, onerror, coercion, coercion_subtrans, io_coercion or one of item_coercions - Fixed JsonExprNeedsSubTransaction(): considired more cases - Eliminated transactions on Const expressions
-- Nikita Glukhov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Вложения
В списке pgsql-hackers по дате отправления: