SlideShare a Scribd company logo
JSONB in PostgreSQL
Working with JSON in PostgreSQL vs. MongoDB
Dharshan Rangegowda
Founder, ScaleGrid.io | @dharshanrg
What is JSON?
● JSON stands for Javascript object
notation.
● Open standard format RFC 7159.
● Most popular format to store and
exchange documents.
Working with JSON in PostgreSQL vs. MongoDB
Why does PostgreSQL need to care about JSON?
• Schema flexibility
• Dealing with transient or changing columns.
• Nested objects
• Might not need to deserialize to query.
• Handling objects from other systems
• E.g. Stripe transaction
Working with JSON in PostgreSQL vs. MongoDB
PostgreSQL + JSON Timeline
Working with JSON in PostgreSQL vs. MongoDB
PostgreSQL JSON Support
• Wave 1: PostgreSQL 9.2 (2012) added support for the JSON datatype
• Text field with JSON validation
• No index support
• Wave 2: PostgreSQL 9.4 (2014) added support for JSONB datatype
• Binary data structure to store JSON
• Index support
Working with JSON in PostgreSQL vs. MongoDB
PostgreSQL JSON Support
• Wave 3: PostgreSQL 12 (2019) added support for SQL/JSON standard
• JSONPath support
• Powerful query and projection engine for JSON data
• Further improvements to JSONPath in PostgreSQL 13
• JSON roadmap
Working with JSON in PostgreSQL vs. MongoDB
JSON vs. JSONB
• JSONB is what you should be using (in most cases)
• However, there are some scenarios where JSON is useful:
• JSON preserves the original formatting (a.k.a whitespace)
• JSON preserves ordering of the keys
• JSON preserves duplicate keys
• JSON is faster to ingest vs. JSONB
Working with JSON in PostgreSQL vs. MongoDB
JSONB Anti Patterns
● What is the best way to use JSONB?
○ Do we even need columns any more?
○ Why not just use <int id, jsonb data>?
● JSONB has some high-level limitations you need to
be aware of:
○ Statistics collection
○ Storage bloat
● Commonly occurring fields should be stored as
columns.
○ Use JSONB for variable or intermittent columns.
Working with JSON in PostgreSQL vs. MongoDB
JSONB Anti Patterns
● PostgreSQL collect stats on column data distribution
○ Most common values (MCV)
○ Fraction of null values
○ Histogram of distribution
● No column statistics collected for JSONB
○ Query planner doesn’t have stats to make smart decisions
○ Could make wrong choice – cross join vs hash join etc
● More details in blog post - When To Avoid JSONB
In A PostgreSQL Schema
Working with JSON in PostgreSQL vs. MongoDB
JSONB Anti Patterns
● Storage bloat
○ Keys are stored in the data (Similar to MongoDB mmapv1)
○ Use smaller key names to reduce footprint
○ Relies on TOAST compression
○ Sample table with 1M rows (11GB of data)
○ PostgreSQL - 8.7 GB
○ MongoDB Snappy – 8GB, Zlib – 5.3 GB
Working with JSON in PostgreSQL vs. MongoDB
JSONB & TOAST
● If the size of your column exceeds the
TOAST_TUPLE_THRESHOLD (2KB default) data
could be moved to out of line storage - TOAST
● TOAST also provides compression (pglz)
○ Decent Compression
○ MongoDB WiredTiger snappy/zlib is potentially better
● To access the data it needs to be De’TOASTed
○ Could result in performance overhead
Working with JSON in PostgreSQL vs. MongoDB
JSONB Data Structures
Working with JSON in PostgreSQL vs. MongoDB
Images courtesy: https://erthalion.info/2017/12/21/advanced-json-benchmarks/
BSON Data Structures
Working with JSON in PostgreSQL vs. MongoDB
JSONB Operators
Working with JSON in PostgreSQL vs. MongoDB
Operator Description
->, ->> Get JSON object field by key
@>, <@ Does the left JSONB value contain the right JSONB path/value entries at
the top level?
?, ?!, ?& Does the string exist as a top-level key within the JSON value?
@@, @@> JSONPath operators
Full list of operators can be found in the docs – JSONB op table
JSONB Functions
• PostgreSQL provides a wide variety of functions to create and process
JSON data
• Creation functions
• Processing functions
Working with JSON in PostgreSQL vs. MongoDB
MongoDB Query language
• Query language based on JSON syntax
• db.books.find( {} ) , db.books.find( { publisher: "D" } )
• Array operators
• db.books.find( { tags: ["red", "blank"] } )
• AND and OR operators
• db.books.find( { $or: [ { publisher: "A" }, { criticrating: { $lt: 30 } } ] } )
Working with JSON in PostgreSQL vs. MongoDB
MongoDB Query language
• Query nested documents
• db.books.find( { "size.uom": "in" } )
• Query an Array of objects
• db.books.find( { 'instock.qty': { $lte: 20 } } ))
• Project fields to return from query
• db.books.find( {prints: 1}, { $or: [ { publisher: "A" }, { criticrating: { $lt: 30 } } ] } )
Working with JSON in PostgreSQL vs. MongoDB
JSONB Indexes
• JSONB provides a wide array of options to index your JSON data.
• We are going to dig into three types of indexes:
• GIN
• BTREE
• HASH
Working with JSON in PostgreSQL vs. MongoDB
JSONB Indexes : GIN
• GIN stands for “Generalized
Inverted Indexes”
• GIN supports two operator classes
• jsonb_ops
• ?, ?|, ?&, @>, @@, @?
• [Index each key and value]
• jsonb_pathops
• @>, @@, @?
• [Index only the values]
Copyright © ScaleGrid.io
JSON sample data
Fictional book database (apologies to any librarians in the audience):
Working with JSON in PostgreSQL vs. MongoDB
demo=# d+ books
Table "public.books"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
--------+------------------------+-----------+----------+---------+----------+--------------+-------------
id | integer | | not null | | plain | |
author | character varying(255) | | | | extended | |
isbn | character varying(25) | | | | extended | |
rating | integer | | | | plain | |
data | jsonb | | | | extended | |
Indexes:
"books_pkey" PRIMARY KEY, btree (id)
…..
JSON sample data
Working with JSON in PostgreSQL vs. MongoDB
demo=# select jsonb_pretty(data) from books where id = 1000021;
jsonb_pretty
------------------------------------
{ +
"tags": { +
"nk906585": { +
"ik844766": "iv364087"+
} +
}, +
"prints": [ +
{ +
"price": 100, +
"style": "hc" +
}, +
{ +
"price": 50, +
"style": "pb" +
} +
], +
"braille": false, +
"keywords": [ +
"abc", +
"kef", +
"keh" +
], +
"hardcover": true, +
"publisher": "nVxJVA8Bwx", +
"criticrating": 2 +
}
JSONB Indexes: GIN - ?
Find all books that are available in braille? Let’s create the GIN index on the ‘data’ JSONB column:
Working with JSON in PostgreSQL vs. MongoDB
CREATE INDEX datagin ON books USING gin (data);
demo=# select * from books where data ? 'braille';
id | author | isbn | rating | data
---------+-----------------+------------+--------+---------------------------------------------------------------------------------------------------------------------
---------------------------------
------------------
1000005 | XEI7xShT8bPu6H7 | 2kD5XJDZUF | 0 | {"tags": {"nk455671": {"ik937456": "iv506075"}}, "braille": true, "keywords": ["abc", "kef", "keh"], "hardcover": false,
"publisher": "zSfZIAjGGs", "
criticrating": 4}
.....
demo=# explain analyze select * from books where data ? 'braille';
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on books (cost=12.75..1005.25 rows=1000 width=158) (actual time=0.033..0.039 rows=15 loops=1)
Recheck Cond: (data ? 'braille'::text)
Heap Blocks: exact=2
-> Bitmap Index Scan on datagin (cost=0.00..12.50 rows=1000 width=0) (actual time=0.022..0.022 rows=15 loops=1)
Index Cond: (data ? 'braille'::text)
Planning Time: 0.102 ms
Execution Time: 0.067 ms
(7 rows)
JSONB Indexes: GIN - ?
What if we wanted to find books that were in braille or in hardcover?
Working with JSON in PostgreSQL vs. MongoDB
demo=# explain analyze select * from books where data ?| array['braille','hardcover'];
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on books (cost=16.75..1009.25 rows=1000 width=158) (actual time=0.029..0.035 rows=15 loops=1)
Recheck Cond: (data ?| '{braille,hardcover}'::text[])
Heap Blocks: exact=2
-> Bitmap Index Scan on datagin (cost=0.00..16.50 rows=1000 width=0) (actual time=0.023..0.023 rows=15 loops=1)
Index Cond: (data ?| '{braille,hardcover}'::text[])
Planning Time: 0.138 ms
Execution Time: 0.057 ms
(7 rows)
JSONB Indexes: GIN
GIN index supports the “existence” operators only on “top level” keys. If the key is not at the top level, then
the index will not be used.
Working with JSON in PostgreSQL vs. MongoDB
demo=# select * from books where data->'tags' ? 'nk455671';
id | author | isbn | rating | data
---------+-----------------+------------+--------+---------------------------------------------------------------------------------------------------------------------
---------------------------------
------------------
1000005 | XEI7xShT8bPu6H7 | 2kD5XJDZUF | 0 | {"tags": {"nk455671": {"ik937456": "iv506075"}}, "braille": true, "keywords": ["abc", "kef", "keh"], "hardcover": false,
"publisher": "zSfZIAjGGs", "
criticrating": 4}
685122 | GWfuvKfQ1PCe1IL | jnyhYYcF66 | 3 | {"tags": {"nk455671": {"ik615925": "iv253423"}}, "publisher": "b2NwVg7VY3", "criticrating": 0}
(2 rows)
demo=# explain analyze select * from books where data->'tags' ? 'nk455671';
QUERY PLAN
----------------------------------------------------------------------------------------------------------
Seq Scan on books (cost=0.00..38807.29 rows=1000 width=158) (actual time=0.018..270.641 rows=2 loops=1)
Filter: ((data -> 'tags'::text) ? 'nk455671'::text)
Rows Removed by Filter: 1000017
Planning Time: 0.078 ms
Execution Time: 270.728 ms
(5 rows)
JSONB Indexes: GIN
The way to check for existence in nested docs is to use “Expression indexes”. Let’s create an index on
data->tags:
Working with JSON in PostgreSQL vs. MongoDB
CREATE INDEX datatagsgin ON books USING gin (data->'tags');
demo=# select * from books where data->'tags' ? 'nk455671';
id | author | isbn | rating | data
---------+-----------------+------------+--------+-----------------------------------------------------------------------------------------------------------
1000005 | XEI7xShT8bPu6H7 | 2kD5XJDZUF | 0 | {"tags": {"nk455671": {"ik937456": "iv506075"}}, "braille": true, "keywords": ["abc", "kef", "keh"], "hardcover": false,
"publisher": "zSfZIAjGGs", "
criticrating": 4}
685122 | GWfuvKfQ1PCe1IL | jnyhYYcF66 | 3 | {"tags": {"nk455671": {"ik615925": "iv253423"}}, "publisher": "b2NwVg7VY3", "criticrating": 0}
(2 rows)
demo=# explain analyze select * from books where data->'tags' ? 'nk455671';
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on books (cost=12.75..1007.75 rows=1000 width=158) (actual time=0.031..0.035 rows=2 loops=1)
Recheck Cond: ((data ->'tags'::text) ? 'nk455671'::text)
Heap Blocks: exact=2
-> Bitmap Index Scan on datatagsgin (cost=0.00..12.50 rows=1000 width=0) (actual time=0.021..0.021 rows=2 loops=1)
Index Cond: ((data ->'tags'::text) ? 'nk455671'::text)
Planning Time: 0.098 ms
Execution Time: 0.061 ms
(7 rows)
JSONB Indexes: GIN - @>
The “path” operator can be used for multi-level queries of your JSON data. Let’s use it similar to the ?
operator.
Working with JSON in PostgreSQL vs. MongoDB
select * from books where data @> '{"braille":true}'::jsonb;
demo=# explain analyze select * from books where data @> '{"braille":true}'::jsonb;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on books (cost=16.75..1009.25 rows=1000 width=158) (actual time=0.040..0.048 rows=6 loops=1)
Recheck Cond: (data @> '{"braille": true}'::jsonb)
Rows Removed by Index Recheck: 9
Heap Blocks: exact=2
-> Bitmap Index Scan on datagin (cost=0.00..16.50 rows=1000 width=0) (actual time=0.030..0.030 rows=15 loops=1)
Index Cond: (data @> '{"braille": true}'::jsonb)
Planning Time: 0.100 ms
Execution Time: 0.076 ms
(8 rows)
JSONB Indexes: GIN - @>
The "path" operator can be used for multi level queries of your JSON data.
Working with JSON in PostgreSQL vs. MongoDB
demo=# select * from books where data @> '{"publisher":"XlekfkLOtL"}'::jsonb;
id | author | isbn | rating | data
-----+-----------------+------------+--------+-------------------------------------------------------------------------------------
346 | uD3QOvHfJdxq2ez | KiAaIRu8QE | 1 | {"tags": {"nk88": {"ik37": "iv161"}}, "publisher": "XlekfkLOtL", "criticrating": 3}
(1 row)
demo=# explain analyze select * from books where data @> '{"publisher":"XlekfkLOtL"}'::jsonb;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on books (cost=16.75..1009.25 rows=1000 width=158) (actual time=0.491..0.492 rows=1 loops=1)
Recheck Cond: (data @> '{"publisher": "XlekfkLOtL"}'::jsonb)
Heap Blocks: exact=1
-> Bitmap Index Scan on datagin (cost=0.00..16.50 rows=1000 width=0) (actual time=0.092..0.092 rows=1 loops=1)
Index Cond: (data @> '{"publisher": "XlekfkLOtL"}'::jsonb)
Planning Time: 0.090 ms
Execution Time: 0.523 ms
JSONB Indexes: GIN - @>
The JSON queries can be nested to many levels. You can also use the ># operation but GIN does not
support it.
Working with JSON in PostgreSQL vs. MongoDB
demo=# select * from books where data @> '{"tags":{"nk455671":{"ik937456":"iv506075"}}}'::jsonb;
id | author | isbn | rating | data
---------+-----------------+------------+--------+---------------------------------------------------------------------------------------------------------------------
---------------------------------
------------------
1000005 | XEI7xShT8bPu6H7 | 2kD5XJDZUF | 0 | {"tags": {"nk455671": {"ik937456": "iv506075"}}, "braille": true, "keywords": ["abc", "kef", "keh"], "hardcover": false,
"publisher": "zSfZIAjGGs", "
criticrating": 4}
(1 row)
JSONB Indexes: GIN - jsonb_pathops
GIN also supports a “pathops” option to reduce the size of the GIN index.
From the docs:
“The technical difference between a jsonb_ops and a jsonb_path_ops GIN index is that the former creates
independent index items for each key and value in the data, while the latter creates index items only for
each value in the data.”
On my small dataset of 1M books, you can see that the pathops GIN index is smaller – you should test
with your dataset to understand the savings.
Working with JSON in PostgreSQL vs. MongoDB
CREATE INDEX dataginpathops ON books USING gin (data jsonb_path_ops);
public | dataginpathops | index | sgpostgres | books | 67 MB |
public | datatagsgin | index | sgpostgres | books | 84 MB |
JSONB Indexes: GIN - jsonb_pathops
Let’s rerun our query from before with the pathops index:
Working with JSON in PostgreSQL vs. MongoDB
demo=# select * from books where data @> '{"tags":{"nk455671":{"ik937456":"iv506075"}}}'::jsonb;
id | author | isbn | rating | data
---------+-----------------+------------+--------+---------------------------------------------------------------------------------------------------------------------
---------------------------------
------------------
1000005 | XEI7xShT8bPu6H7 | 2kD5XJDZUF | 0 | {"tags": {"nk455671": {"ik937456": "iv506075"}}, "braille": true, "keywords": ["abc", "kef", "keh"], "hardcover": false,
"publisher": "zSfZIAjGGs", "
criticrating": 4}
(1 row)
demo=# explain select * from books where data @> '{"tags":{"nk455671":{"ik937456":"iv506075"}}}'::jsonb;
QUERY PLAN
-----------------------------------------------------------------------------------------
Bitmap Heap Scan on books (cost=12.75..1005.25 rows=1000 width=158)
Recheck Cond: (data @> '{"tags": {"nk455671": {"ik937456": "iv506075"}}}'::jsonb)
-> Bitmap Index Scan on dataginpathops (cost=0.00..12.50 rows=1000 width=0)
Index Cond: (data @> '{"tags": {"nk455671": {"ik937456": "iv506075"}}}'::jsonb)
(4 rows)
JSONB Indexes: GIN - jsonb_pathops
The “jsonb_pathops” option supports only the @> operator.
Smaller index but more limited scenarios.
The following queries below can no longer leverage the GIN index:
Working with JSON in PostgreSQL vs. MongoDB
select * from books where data ? 'tags'; => Sequential scan
select * from books where data @> '{"tags" :{}}'; => Sequential scan
select * from books where data @> '{"tags" :{"k7888":{}}}' => Sequential scan
JSONB Indexes: B-tree
• B-tree indexes are the most common index type in relational databases.
• If you index an entire JSONB column with a B-tree index, the only useful
operators are the comparison operators:
• =, <, <=, >, >=
• Can be used only for whole object comparisons.
• Very limited use case.
Working with JSON in PostgreSQL vs. MongoDB
JSONB Indexes: B-tree
• Use B-tree “Expression indexes”
• B-tree expression indexes can support the common comparison operators '=', '<', '>', '>=', '<=‘ (which
GIN doesn't support).
• Retrieve all books with a data->criticrating > 4.
Working with JSON in PostgreSQL vs. MongoDB
demo=# select * from books where data->'criticrating' > 4;
ERROR: operator does not exist: jsonb >= integer
LINE 1: select * from books where data->'criticrating’ > 4;
^
HINT: No operator matches the given name and argument types. You might need to add explicit type casts.
#Lets cast JSONB to integer
demo=# select * from books where (data->'criticrating')::int4 > 4;
#If you are using a version prior to pg11 you need to query as text and then cast
demo=# select * from books where (data->>'criticrating')::int4 > 4;
JSONB Indexes: B-tree
For expression indexes, the index needs to be an exact match with the query expression:
Working with JSON in PostgreSQL vs. MongoDB
demo=# CREATE INDEX criticrating ON books USING BTREE (((data->'criticrating')::int4));
CREATE INDEX
demo=# explain analyze select * from books where (data->'criticrating')::int4 = 3;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
Index Scan using criticrating on books (cost=0.42..4626.93 rows=5000 width=158) (actual time=0.069..70.221 rows=199883 loops=1)
Index Cond: (((data -> 'criticrating'::text))::integer = 3)
Planning Time: 0.103 ms
Execution Time: 79.019 ms
(4 rows)
demo=# explain analyze select * from books where (data->'criticrating')::int4 = 3;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
Index Scan using criticrating on books (cost=0.42..4626.93 rows=5000 width=158) (actual time=0.069..70.221 rows=199883 loops=1)
Index Cond: (((data -> 'criticrating'::text))::integer = 3)
Planning Time: 0.103 ms
Execution Time: 79.019 ms
(4 rows)
1
From above we can see that the BTREE index is being used as expected.
JSONB Indexes: HASH
• If you are only interested in the "=" operator, then Hash indexes become interesting.
• Hash indexes tend to be smaller than B-tree indexes.
Working with JSON in PostgreSQL vs. MongoDB
CREATE INDEX publisherhash ON books USING HASH ((data->'publisher'));
demo=# select * from books where data->'publisher' = 'XlekfkLOtL'
demo-# ;
id | author | isbn | rating | data
-----+-----------------+------------+--------+-------------------------------------------------------------------------------------
346 | uD3QOvHfJdxq2ez | KiAaIRu8QE | 1 | {"tags": {"nk88": {"ik37": "iv161"}}, "publisher": "XlekfkLOtL", "criticrating": 3}
(1 row)
demo=# explain analyze select * from books where data->'publisher' = 'XlekfkLOtL';
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
Index Scan using publisherhash on books (cost=0.00..2.02 rows=1 width=158) (actual time=0.016..0.017 rows=1 loops=1)
Index Cond: ((data -> 'publisher'::text) = 'XlekfkLOtL'::text)
Planning Time: 0.080 ms
Execution Time: 0.035 ms
(4 rows)
JSONB Indexes: GIN - Trigram
• PostgreSQL supports string matching using Trigram indexes.
• Trigrams are basically words broken up into sequences of 3 letters.
• We can search for any arbitrary regex (not just left anchored).
Working with JSON in PostgreSQL vs. MongoDB
CREATE EXTENSION pg_trgm;
CREATE INDEX publisher ON books USING GIN ((data->'publisher') gin_trgm_ops);
demo=# select * from books where data->'publisher' LIKE '%I0UB%';
id | author | isbn | rating | data
----+-----------------+------------+--------+---------------------------------------------------------------------------------
4 | KiEk3xjqvTpmZeS | EYqXO9Nwmm | 0 | {"tags": {"nk3": {"ik1": "iv1"}}, "publisher": "MI0UBqZJDt", "criticrating": 1}
(1 row)
demo=# explain analyze select * from books where data->'publisher' LIKE '%I0UB%';
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on books (cost=9.78..111.28 rows=100 width=158) (actual time=0.033..0.033 rows=1 loops=1)
Recheck Cond: ((data -&gt; 'publisher'::text) ~~ '%I0UB%'::text)
Heap Blocks: exact=1
-> Bitmap Index Scan on publisher (cost=0.00..9.75 rows=100 width=0) (actual time=0.025..0.025 rows=1 loops=1)
Index Cond: ((data -&gt; 'publisher'::text) ~~ '%I0UB%'::text)
Planning Time: 0.213 ms
Execution Time: 0.058 ms
(7 rows)
JSONB Indexes: GIN - Arrays
• GIN indexes are great for indexing arrays.
• Indexing and searching the keyword array.
Working with JSON in PostgreSQL vs. MongoDB
CREATE INDEX keywords ON books USING GIN ((data->'keywords') jsonb_path_ops);
demo=# select * from books where data->'keywords' @> '["abc", "keh"]'::jsonb;
id | author | isbn | rating | data
---------+-----------------+------------+--------+---------------------------------------------------------------------------------------------------------------------
--------------
1000003 | zEG406sLKQ2IU8O | viPdlu3DZm | 4 | {"tags": {"nk263020": {"ik203820": "iv817928"}}, "keywords": ["abc", "kef", "keh"], "publisher": "7NClevxuTM",
"criticrating": 2}
1000004 | GCe9NypHYKDH4rD | so6TQDYzZ3 | 4 | {"tags": {"nk780341": {"ik397357": "iv632731"}}, "keywords": ["abc", "kef", "keh"], "publisher": "fqaJuAdjP5",
"criticrating": 2}
(2 rows)
demo=# explain analyze select * from books where data->'keywords' @> '["abc", "keh"]'::jsonb;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on books (cost=54.75..1049.75 rows=1000 width=158) (actual time=0.026..0.028 rows=2 loops=1)
Recheck Cond: ((data -> 'keywords'::text) @> '["abc", "keh"]'::jsonb)
Heap Blocks: exact=1
-> Bitmap Index Scan on keywords (cost=0.00..54.50 rows=1000 width=0) (actual time=0.014..0.014 rows=2 loops=1)
Index Cond: ((data -> 'keywords'::text) @&amp;amp;amp;amp;amp;gt; '["abc", "keh"]'::jsonb)
Planning Time: 0.131 ms
Execution Time: 0.063 ms
(7 rows)
SQL/JSON
• SQL standard added support for JSON – SQL Standard-2016 (SQL/JSON).
• SQL/JSON Data model
• JSONPath
• SQL/JSON functions
• With PG12 release, PostgreSQL has one of the best implementations of
SQL/JSON.
Working with JSON in PostgreSQL vs. MongoDB
SQL/JSON 2016
● A sequence of SQL/JSON items, each item can be (recursively) any of:
○ SQL/JSON scalar — non-null value of SQL types: Unicode character string, numeric, Boolean
or datetime.
○ SQL/JSON null, value that is distinct from any value of any SQL type (not the same as NULL).
○ SQL/JSON arrays, ordered list of zero or more SQL/JSON items — SQL/JSON element
○ SQL/JSON objects — unordered collections of zero or more SQL/JSON members.
■ (key, SQL/JSON item)
Working with JSON in PostgreSQL vs. MongoDB
JSONPath
Working with JSON in PostgreSQL vs. MongoDB
.key Returns an object member with the specified key
[*] Wildcard array element accessor that returns all array elements
.* Wildcard member accessor that returns the values of all members located at the top level of
the current object
.** Recursive wildcard member accessor that processes all levels of the JSON hierarchy of the
current object and returns all the member values, regardless of their nesting level
JSONPath allows you to specify an expression (using a syntax similar to the
property access notation in Javascript) to query or project your JSON data.
SQL/JSON Functions
● PG 12 provides several functions to use JSONPATH to query your JSON
data
○ jsonb_path_exists - Checks whether JSON path returns any item for the
specified JSON value
○ jsonb_path_match - Returns the result of JSON path predicate check for
the specified JSON value.
○ jsonb_path_query - Gets all JSON items returned by JSON path for the
specified JSON value.
Working with JSON in PostgreSQL vs. MongoDB
JSONPath
Finding books by publisher?
Working with JSON in PostgreSQL vs. MongoDB
demo=# select * from books where data @@ '$.publisher == "ktjKEZ1tvq"';
id | author | isbn | rating | data
---------+-----------------+------------+--------+---------------------------------------------------------------------------------------------------------------------
-------------
1000001 | 4RNsovI2haTgU7l | GwSoX67gLS | 2 | {"tags": {"nk542369": {"ik55240": "iv305393"}}, "keywords": ["abc", "def", "geh"], "publisher": "ktjKEZ1tvq",
"criticrating": 0}
(1 row)
demo=# explain analyze select * from books where data @@ '$.publisher == "ktjKEZ1tvq"';
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on books (cost=21.75..1014.25 rows=1000 width=158) (actual time=0.123..0.124 rows=1 loops=1)
Recheck Cond: (data @@ '($."publisher" == "ktjKEZ1tvq")'::jsonpath)
Heap Blocks: exact=1
-&amp;amp;amp;amp;gt; Bitmap Index Scan on datagin (cost=0.00..21.50 rows=1000 width=0) (actual time=0.110..0.110 rows=1 loops=1)
Index Cond: (data @@ '($."publisher" == "ktjKEZ1tvq")'::jsonpath)
Planning Time: 0.137 ms
Execution Time: 0.194 ms
(7 rows)
JSONPath
Add a JSONPath filter:
Working with JSON in PostgreSQL vs. MongoDB
select * from books where jsonb_path_exists(data,'$.publisher ?(@ == "ktjKEZ1tvq")');
Build complicated filter expressions:
select * from books where jsonb_path_exists(data, '$.prints[*] ?(@.style=="hc" && @.price == 100)');
Index support for JSONPath is very limited.
demo=# explain analyze select * from books where jsonb_path_exists(data,'$.publisher ?(@ == "ktjKEZ1tvq")');
QUERY PLAN
------------------------------------------------------------------------------------------------------------
Seq Scan on books (cost=0.00..36307.24 rows=333340 width=158) (actual time=0.019..480.268 rows=1 loops=1)
Filter: jsonb_path_exists(data, '$."publisher"?(@ == "ktjKEZ1tvq")'::jsonpath, '{}'::jsonb, false)
Rows Removed by Filter: 1000028
Planning Time: 0.095 ms
Execution Time: 480.348 ms
(5 rows)
JSONPath: Projection JSON
Select the last element of the array
Working with JSON in PostgreSQL vs. MongoDB
demo=# select jsonb_path_query(data, '$.prints[$.size()]') from books where id = 1000029;
jsonb_path_query
------------------------------
{"price": 50, "style": "pb"}
(1 row)
Select only the hardcover prints from the array
demo=# select jsonb_path_query(data, '$.prints[*] ?(@.style=="hc")') from books where id = 1000029;
jsonb_path_query
-------------------------------
{"price": 100, "style": "hc"}
(1 row)
We can also chain the filters
demo=# select jsonb_path_query(data, '$.prints[*] ?(@.style=="hc") ?(@.price ==100)') from books where id = 1000029;
jsonb_path_query
-------------------------------
{"price": 100, "style": "hc"}
(1 row)
Roadmap
● Improvements to the JSONPath implementation in PG13
● Future Roadmap
Working with JSON in PostgreSQL vs. MongoDB
Questions?
Working with JSON in PostgreSQL vs. MongoDB

More Related Content

PDF
How to Design Indexes, Really
Karwin Software Solutions LLC
 
PDF
PostgreSQL Deep Internal
EXEM
 
ODP
OpenGurukul : Database : PostgreSQL
Open Gurukul
 
PDF
Json in Postgres - the Roadmap
EDB
 
PDF
Webscale PostgreSQL - JSONB and Horizontal Scaling Strategies
Jonathan Katz
 
PPTX
JSON and the Oracle Database
Maria Colgan
 
PPTX
MySQL Indexing - Best practices for MySQL 5.6
MYXPLAIN
 
PDF
How to Use JSON in MySQL Wrong
Karwin Software Solutions LLC
 
How to Design Indexes, Really
Karwin Software Solutions LLC
 
PostgreSQL Deep Internal
EXEM
 
OpenGurukul : Database : PostgreSQL
Open Gurukul
 
Json in Postgres - the Roadmap
EDB
 
Webscale PostgreSQL - JSONB and Horizontal Scaling Strategies
Jonathan Katz
 
JSON and the Oracle Database
Maria Colgan
 
MySQL Indexing - Best practices for MySQL 5.6
MYXPLAIN
 
How to Use JSON in MySQL Wrong
Karwin Software Solutions LLC
 

What's hot (20)

PDF
Introduction to Apache Calcite
Jordan Halterman
 
PDF
Solving PostgreSQL wicked problems
Alexander Korotkov
 
PDF
MongodB Internals
Norberto Leite
 
PDF
Apache Spark Core—Deep Dive—Proper Optimization
Databricks
 
PDF
Kevin Kempter PostgreSQL Backup and Recovery Methods @ Postgres Open
PostgresOpen
 
PDF
Dive into PySpark
Mateusz Buśkiewicz
 
PDF
PostgreSQL Performance Tuning
elliando dias
 
PPTX
Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in Spark
Bo Yang
 
PDF
Get to know PostgreSQL!
Oddbjørn Steffensen
 
PDF
[pgday.Seoul 2022] PostgreSQL구조 - 윤성재
PgDay.Seoul
 
PDF
MS-SQL SERVER ARCHITECTURE
Douglas Bernardini
 
PPTX
Indexing with MongoDB
MongoDB
 
PPTX
Apache Spark Architecture
Alexey Grishchenko
 
PDF
Spark shuffle introduction
colorant
 
PDF
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Databricks
 
PDF
MongoDB Schema Design (Event: An Evening with MongoDB Houston 3/11/15)
MongoDB
 
PDF
MySQL Index Cookbook
MYXPLAIN
 
PDF
Apache Calcite Tutorial - BOSS 21
Stamatis Zampetakis
 
PDF
Apache Arrow Flight: A New Gold Standard for Data Transport
Wes McKinney
 
PDF
Building an open data platform with apache iceberg
Alluxio, Inc.
 
Introduction to Apache Calcite
Jordan Halterman
 
Solving PostgreSQL wicked problems
Alexander Korotkov
 
MongodB Internals
Norberto Leite
 
Apache Spark Core—Deep Dive—Proper Optimization
Databricks
 
Kevin Kempter PostgreSQL Backup and Recovery Methods @ Postgres Open
PostgresOpen
 
Dive into PySpark
Mateusz Buśkiewicz
 
PostgreSQL Performance Tuning
elliando dias
 
Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in Spark
Bo Yang
 
Get to know PostgreSQL!
Oddbjørn Steffensen
 
[pgday.Seoul 2022] PostgreSQL구조 - 윤성재
PgDay.Seoul
 
MS-SQL SERVER ARCHITECTURE
Douglas Bernardini
 
Indexing with MongoDB
MongoDB
 
Apache Spark Architecture
Alexey Grishchenko
 
Spark shuffle introduction
colorant
 
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Databricks
 
MongoDB Schema Design (Event: An Evening with MongoDB Houston 3/11/15)
MongoDB
 
MySQL Index Cookbook
MYXPLAIN
 
Apache Calcite Tutorial - BOSS 21
Stamatis Zampetakis
 
Apache Arrow Flight: A New Gold Standard for Data Transport
Wes McKinney
 
Building an open data platform with apache iceberg
Alluxio, Inc.
 
Ad

Similar to Working with JSON Data in PostgreSQL vs. MongoDB (20)

PDF
Oh, that ubiquitous JSON !
Alexander Korotkov
 
PPTX
You, me, and jsonb
Arielle Simmons
 
PDF
Jsquery - the jsonb query language with GIN indexing support
Alexander Korotkov
 
PDF
Postgres vs Mongo / Олег Бартунов (Postgres Professional)
Ontico
 
PPT
The NoSQL Way in Postgres
EDB
 
PDF
NoSQL Best Practices for PostgreSQL / Дмитрий Долгов (Mindojo)
Ontico
 
PPTX
Power JSON with PostgreSQL
EDB
 
PDF
There is Javascript in my SQL
PGConf APAC
 
PDF
Postgrtesql as a NoSQL Document Store - The JSON/JSONB data type
Jumping Bean
 
PDF
No sql way_in_pg
Vibhor Kumar
 
PDF
PG Day'14 Russia, Работа со слабо-структурированными данными в PostgreSQL, Ол...
pgdayrussia
 
PPTX
How to Use JSONB in PostgreSQL for Product Attributes Storage
techprane
 
PPTX
PostgreSQL 9.4 JSON Types and Operators
Nicholas Kiraly
 
PPT
Do More with Postgres- NoSQL Applications for the Enterprise
EDB
 
PDF
NoSQL on ACID - Meet Unstructured Postgres
EDB
 
PDF
Postgre(No)SQL - A JSON journey
Nicola Moretto
 
PDF
Mathias test
Mathias Stjernström
 
PDF
NoSQL для PostgreSQL: Jsquery — язык запросов
CodeFest
 
PDF
JSON Processing in the Database using PostgreSQL 9.4 :: Data Wranglers DC :: ...
Ryan B Harvey, CSDP, CSM
 
PPTX
NoSQL on ACID: Meet Unstructured Postgres
EDB
 
Oh, that ubiquitous JSON !
Alexander Korotkov
 
You, me, and jsonb
Arielle Simmons
 
Jsquery - the jsonb query language with GIN indexing support
Alexander Korotkov
 
Postgres vs Mongo / Олег Бартунов (Postgres Professional)
Ontico
 
The NoSQL Way in Postgres
EDB
 
NoSQL Best Practices for PostgreSQL / Дмитрий Долгов (Mindojo)
Ontico
 
Power JSON with PostgreSQL
EDB
 
There is Javascript in my SQL
PGConf APAC
 
Postgrtesql as a NoSQL Document Store - The JSON/JSONB data type
Jumping Bean
 
No sql way_in_pg
Vibhor Kumar
 
PG Day'14 Russia, Работа со слабо-структурированными данными в PostgreSQL, Ол...
pgdayrussia
 
How to Use JSONB in PostgreSQL for Product Attributes Storage
techprane
 
PostgreSQL 9.4 JSON Types and Operators
Nicholas Kiraly
 
Do More with Postgres- NoSQL Applications for the Enterprise
EDB
 
NoSQL on ACID - Meet Unstructured Postgres
EDB
 
Postgre(No)SQL - A JSON journey
Nicola Moretto
 
Mathias test
Mathias Stjernström
 
NoSQL для PostgreSQL: Jsquery — язык запросов
CodeFest
 
JSON Processing in the Database using PostgreSQL 9.4 :: Data Wranglers DC :: ...
Ryan B Harvey, CSDP, CSM
 
NoSQL on ACID: Meet Unstructured Postgres
EDB
 
Ad

More from ScaleGrid.io (7)

PDF
What’s the Best PostgreSQL High Availability Framework? PAF vs. repmgr vs. Pa...
ScaleGrid.io
 
PDF
Redis vs. MongoDB: Comparing In-Memory Databases with Percona Memory Engine
ScaleGrid.io
 
PPTX
Introduction to Redis Data Structures: Sets
ScaleGrid.io
 
PPTX
Introduction to Redis Data Structures: Sorted Sets
ScaleGrid.io
 
PPTX
Cassandra vs. MongoDB
ScaleGrid.io
 
PPTX
Introduction to Redis Data Structures: Hashes
ScaleGrid.io
 
PPTX
Introduction to Redis Data Structures
ScaleGrid.io
 
What’s the Best PostgreSQL High Availability Framework? PAF vs. repmgr vs. Pa...
ScaleGrid.io
 
Redis vs. MongoDB: Comparing In-Memory Databases with Percona Memory Engine
ScaleGrid.io
 
Introduction to Redis Data Structures: Sets
ScaleGrid.io
 
Introduction to Redis Data Structures: Sorted Sets
ScaleGrid.io
 
Cassandra vs. MongoDB
ScaleGrid.io
 
Introduction to Redis Data Structures: Hashes
ScaleGrid.io
 
Introduction to Redis Data Structures
ScaleGrid.io
 

Recently uploaded (20)

PPTX
Comunidade Salesforce São Paulo - Desmistificando o Omnistudio (Vlocity)
Francisco Vieira Júnior
 
PDF
Doc9.....................................
SofiaCollazos
 
PDF
Software Development Company | KodekX
KodekX
 
PPT
Coupa-Kickoff-Meeting-Template presentai
annapureddyn
 
PPTX
What-is-the-World-Wide-Web -- Introduction
tonifi9488
 
PPTX
Dev Dives: Automate, test, and deploy in one place—with Unified Developer Exp...
AndreeaTom
 
PDF
Research-Fundamentals-and-Topic-Development.pdf
ayesha butalia
 
PDF
NewMind AI Weekly Chronicles - July'25 - Week IV
NewMind AI
 
PDF
Trying to figure out MCP by actually building an app from scratch with open s...
Julien SIMON
 
PDF
Using Anchore and DefectDojo to Stand Up Your DevSecOps Function
Anchore
 
PDF
Software Development Methodologies in 2025
KodekX
 
PPTX
AI and Robotics for Human Well-being.pptx
JAYMIN SUTHAR
 
PDF
Google I/O Extended 2025 Baku - all ppts
HusseinMalikMammadli
 
PDF
Cloud-Migration-Best-Practices-A-Practical-Guide-to-AWS-Azure-and-Google-Clou...
Artjoker Software Development Company
 
PDF
Oracle AI Vector Search- Getting Started and what's new in 2025- AIOUG Yatra ...
Sandesh Rao
 
PDF
Event Presentation Google Cloud Next Extended 2025
minhtrietgect
 
PPTX
How to Build a Scalable Micro-Investing Platform in 2025 - A Founder’s Guide ...
Third Rock Techkno
 
PPTX
The-Ethical-Hackers-Imperative-Safeguarding-the-Digital-Frontier.pptx
sujalchauhan1305
 
PPTX
OA presentation.pptx OA presentation.pptx
pateldhruv002338
 
PDF
BLW VOCATIONAL TRAINING SUMMER INTERNSHIP REPORT
codernjn73
 
Comunidade Salesforce São Paulo - Desmistificando o Omnistudio (Vlocity)
Francisco Vieira Júnior
 
Doc9.....................................
SofiaCollazos
 
Software Development Company | KodekX
KodekX
 
Coupa-Kickoff-Meeting-Template presentai
annapureddyn
 
What-is-the-World-Wide-Web -- Introduction
tonifi9488
 
Dev Dives: Automate, test, and deploy in one place—with Unified Developer Exp...
AndreeaTom
 
Research-Fundamentals-and-Topic-Development.pdf
ayesha butalia
 
NewMind AI Weekly Chronicles - July'25 - Week IV
NewMind AI
 
Trying to figure out MCP by actually building an app from scratch with open s...
Julien SIMON
 
Using Anchore and DefectDojo to Stand Up Your DevSecOps Function
Anchore
 
Software Development Methodologies in 2025
KodekX
 
AI and Robotics for Human Well-being.pptx
JAYMIN SUTHAR
 
Google I/O Extended 2025 Baku - all ppts
HusseinMalikMammadli
 
Cloud-Migration-Best-Practices-A-Practical-Guide-to-AWS-Azure-and-Google-Clou...
Artjoker Software Development Company
 
Oracle AI Vector Search- Getting Started and what's new in 2025- AIOUG Yatra ...
Sandesh Rao
 
Event Presentation Google Cloud Next Extended 2025
minhtrietgect
 
How to Build a Scalable Micro-Investing Platform in 2025 - A Founder’s Guide ...
Third Rock Techkno
 
The-Ethical-Hackers-Imperative-Safeguarding-the-Digital-Frontier.pptx
sujalchauhan1305
 
OA presentation.pptx OA presentation.pptx
pateldhruv002338
 
BLW VOCATIONAL TRAINING SUMMER INTERNSHIP REPORT
codernjn73
 

Working with JSON Data in PostgreSQL vs. MongoDB

  • 1. JSONB in PostgreSQL Working with JSON in PostgreSQL vs. MongoDB Dharshan Rangegowda Founder, ScaleGrid.io | @dharshanrg
  • 2. What is JSON? ● JSON stands for Javascript object notation. ● Open standard format RFC 7159. ● Most popular format to store and exchange documents. Working with JSON in PostgreSQL vs. MongoDB
  • 3. Why does PostgreSQL need to care about JSON? • Schema flexibility • Dealing with transient or changing columns. • Nested objects • Might not need to deserialize to query. • Handling objects from other systems • E.g. Stripe transaction Working with JSON in PostgreSQL vs. MongoDB
  • 4. PostgreSQL + JSON Timeline Working with JSON in PostgreSQL vs. MongoDB
  • 5. PostgreSQL JSON Support • Wave 1: PostgreSQL 9.2 (2012) added support for the JSON datatype • Text field with JSON validation • No index support • Wave 2: PostgreSQL 9.4 (2014) added support for JSONB datatype • Binary data structure to store JSON • Index support Working with JSON in PostgreSQL vs. MongoDB
  • 6. PostgreSQL JSON Support • Wave 3: PostgreSQL 12 (2019) added support for SQL/JSON standard • JSONPath support • Powerful query and projection engine for JSON data • Further improvements to JSONPath in PostgreSQL 13 • JSON roadmap Working with JSON in PostgreSQL vs. MongoDB
  • 7. JSON vs. JSONB • JSONB is what you should be using (in most cases) • However, there are some scenarios where JSON is useful: • JSON preserves the original formatting (a.k.a whitespace) • JSON preserves ordering of the keys • JSON preserves duplicate keys • JSON is faster to ingest vs. JSONB Working with JSON in PostgreSQL vs. MongoDB
  • 8. JSONB Anti Patterns ● What is the best way to use JSONB? ○ Do we even need columns any more? ○ Why not just use <int id, jsonb data>? ● JSONB has some high-level limitations you need to be aware of: ○ Statistics collection ○ Storage bloat ● Commonly occurring fields should be stored as columns. ○ Use JSONB for variable or intermittent columns. Working with JSON in PostgreSQL vs. MongoDB
  • 9. JSONB Anti Patterns ● PostgreSQL collect stats on column data distribution ○ Most common values (MCV) ○ Fraction of null values ○ Histogram of distribution ● No column statistics collected for JSONB ○ Query planner doesn’t have stats to make smart decisions ○ Could make wrong choice – cross join vs hash join etc ● More details in blog post - When To Avoid JSONB In A PostgreSQL Schema Working with JSON in PostgreSQL vs. MongoDB
  • 10. JSONB Anti Patterns ● Storage bloat ○ Keys are stored in the data (Similar to MongoDB mmapv1) ○ Use smaller key names to reduce footprint ○ Relies on TOAST compression ○ Sample table with 1M rows (11GB of data) ○ PostgreSQL - 8.7 GB ○ MongoDB Snappy – 8GB, Zlib – 5.3 GB Working with JSON in PostgreSQL vs. MongoDB
  • 11. JSONB & TOAST ● If the size of your column exceeds the TOAST_TUPLE_THRESHOLD (2KB default) data could be moved to out of line storage - TOAST ● TOAST also provides compression (pglz) ○ Decent Compression ○ MongoDB WiredTiger snappy/zlib is potentially better ● To access the data it needs to be De’TOASTed ○ Could result in performance overhead Working with JSON in PostgreSQL vs. MongoDB
  • 12. JSONB Data Structures Working with JSON in PostgreSQL vs. MongoDB Images courtesy: https://erthalion.info/2017/12/21/advanced-json-benchmarks/
  • 13. BSON Data Structures Working with JSON in PostgreSQL vs. MongoDB
  • 14. JSONB Operators Working with JSON in PostgreSQL vs. MongoDB Operator Description ->, ->> Get JSON object field by key @>, <@ Does the left JSONB value contain the right JSONB path/value entries at the top level? ?, ?!, ?& Does the string exist as a top-level key within the JSON value? @@, @@> JSONPath operators Full list of operators can be found in the docs – JSONB op table
  • 15. JSONB Functions • PostgreSQL provides a wide variety of functions to create and process JSON data • Creation functions • Processing functions Working with JSON in PostgreSQL vs. MongoDB
  • 16. MongoDB Query language • Query language based on JSON syntax • db.books.find( {} ) , db.books.find( { publisher: "D" } ) • Array operators • db.books.find( { tags: ["red", "blank"] } ) • AND and OR operators • db.books.find( { $or: [ { publisher: "A" }, { criticrating: { $lt: 30 } } ] } ) Working with JSON in PostgreSQL vs. MongoDB
  • 17. MongoDB Query language • Query nested documents • db.books.find( { "size.uom": "in" } ) • Query an Array of objects • db.books.find( { 'instock.qty': { $lte: 20 } } )) • Project fields to return from query • db.books.find( {prints: 1}, { $or: [ { publisher: "A" }, { criticrating: { $lt: 30 } } ] } ) Working with JSON in PostgreSQL vs. MongoDB
  • 18. JSONB Indexes • JSONB provides a wide array of options to index your JSON data. • We are going to dig into three types of indexes: • GIN • BTREE • HASH Working with JSON in PostgreSQL vs. MongoDB
  • 19. JSONB Indexes : GIN • GIN stands for “Generalized Inverted Indexes” • GIN supports two operator classes • jsonb_ops • ?, ?|, ?&, @>, @@, @? • [Index each key and value] • jsonb_pathops • @>, @@, @? • [Index only the values] Copyright © ScaleGrid.io
  • 20. JSON sample data Fictional book database (apologies to any librarians in the audience): Working with JSON in PostgreSQL vs. MongoDB demo=# d+ books Table "public.books" Column | Type | Collation | Nullable | Default | Storage | Stats target | Description --------+------------------------+-----------+----------+---------+----------+--------------+------------- id | integer | | not null | | plain | | author | character varying(255) | | | | extended | | isbn | character varying(25) | | | | extended | | rating | integer | | | | plain | | data | jsonb | | | | extended | | Indexes: "books_pkey" PRIMARY KEY, btree (id) …..
  • 21. JSON sample data Working with JSON in PostgreSQL vs. MongoDB demo=# select jsonb_pretty(data) from books where id = 1000021; jsonb_pretty ------------------------------------ { + "tags": { + "nk906585": { + "ik844766": "iv364087"+ } + }, + "prints": [ + { + "price": 100, + "style": "hc" + }, + { + "price": 50, + "style": "pb" + } + ], + "braille": false, + "keywords": [ + "abc", + "kef", + "keh" + ], + "hardcover": true, + "publisher": "nVxJVA8Bwx", + "criticrating": 2 + }
  • 22. JSONB Indexes: GIN - ? Find all books that are available in braille? Let’s create the GIN index on the ‘data’ JSONB column: Working with JSON in PostgreSQL vs. MongoDB CREATE INDEX datagin ON books USING gin (data); demo=# select * from books where data ? 'braille'; id | author | isbn | rating | data ---------+-----------------+------------+--------+--------------------------------------------------------------------------------------------------------------------- --------------------------------- ------------------ 1000005 | XEI7xShT8bPu6H7 | 2kD5XJDZUF | 0 | {"tags": {"nk455671": {"ik937456": "iv506075"}}, "braille": true, "keywords": ["abc", "kef", "keh"], "hardcover": false, "publisher": "zSfZIAjGGs", " criticrating": 4} ..... demo=# explain analyze select * from books where data ? 'braille'; QUERY PLAN --------------------------------------------------------------------------------------------------------------------- Bitmap Heap Scan on books (cost=12.75..1005.25 rows=1000 width=158) (actual time=0.033..0.039 rows=15 loops=1) Recheck Cond: (data ? 'braille'::text) Heap Blocks: exact=2 -> Bitmap Index Scan on datagin (cost=0.00..12.50 rows=1000 width=0) (actual time=0.022..0.022 rows=15 loops=1) Index Cond: (data ? 'braille'::text) Planning Time: 0.102 ms Execution Time: 0.067 ms (7 rows)
  • 23. JSONB Indexes: GIN - ? What if we wanted to find books that were in braille or in hardcover? Working with JSON in PostgreSQL vs. MongoDB demo=# explain analyze select * from books where data ?| array['braille','hardcover']; QUERY PLAN --------------------------------------------------------------------------------------------------------------------- Bitmap Heap Scan on books (cost=16.75..1009.25 rows=1000 width=158) (actual time=0.029..0.035 rows=15 loops=1) Recheck Cond: (data ?| '{braille,hardcover}'::text[]) Heap Blocks: exact=2 -> Bitmap Index Scan on datagin (cost=0.00..16.50 rows=1000 width=0) (actual time=0.023..0.023 rows=15 loops=1) Index Cond: (data ?| '{braille,hardcover}'::text[]) Planning Time: 0.138 ms Execution Time: 0.057 ms (7 rows)
  • 24. JSONB Indexes: GIN GIN index supports the “existence” operators only on “top level” keys. If the key is not at the top level, then the index will not be used. Working with JSON in PostgreSQL vs. MongoDB demo=# select * from books where data->'tags' ? 'nk455671'; id | author | isbn | rating | data ---------+-----------------+------------+--------+--------------------------------------------------------------------------------------------------------------------- --------------------------------- ------------------ 1000005 | XEI7xShT8bPu6H7 | 2kD5XJDZUF | 0 | {"tags": {"nk455671": {"ik937456": "iv506075"}}, "braille": true, "keywords": ["abc", "kef", "keh"], "hardcover": false, "publisher": "zSfZIAjGGs", " criticrating": 4} 685122 | GWfuvKfQ1PCe1IL | jnyhYYcF66 | 3 | {"tags": {"nk455671": {"ik615925": "iv253423"}}, "publisher": "b2NwVg7VY3", "criticrating": 0} (2 rows) demo=# explain analyze select * from books where data->'tags' ? 'nk455671'; QUERY PLAN ---------------------------------------------------------------------------------------------------------- Seq Scan on books (cost=0.00..38807.29 rows=1000 width=158) (actual time=0.018..270.641 rows=2 loops=1) Filter: ((data -> 'tags'::text) ? 'nk455671'::text) Rows Removed by Filter: 1000017 Planning Time: 0.078 ms Execution Time: 270.728 ms (5 rows)
  • 25. JSONB Indexes: GIN The way to check for existence in nested docs is to use “Expression indexes”. Let’s create an index on data->tags: Working with JSON in PostgreSQL vs. MongoDB CREATE INDEX datatagsgin ON books USING gin (data->'tags'); demo=# select * from books where data->'tags' ? 'nk455671'; id | author | isbn | rating | data ---------+-----------------+------------+--------+----------------------------------------------------------------------------------------------------------- 1000005 | XEI7xShT8bPu6H7 | 2kD5XJDZUF | 0 | {"tags": {"nk455671": {"ik937456": "iv506075"}}, "braille": true, "keywords": ["abc", "kef", "keh"], "hardcover": false, "publisher": "zSfZIAjGGs", " criticrating": 4} 685122 | GWfuvKfQ1PCe1IL | jnyhYYcF66 | 3 | {"tags": {"nk455671": {"ik615925": "iv253423"}}, "publisher": "b2NwVg7VY3", "criticrating": 0} (2 rows) demo=# explain analyze select * from books where data->'tags' ? 'nk455671'; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------ Bitmap Heap Scan on books (cost=12.75..1007.75 rows=1000 width=158) (actual time=0.031..0.035 rows=2 loops=1) Recheck Cond: ((data ->'tags'::text) ? 'nk455671'::text) Heap Blocks: exact=2 -> Bitmap Index Scan on datatagsgin (cost=0.00..12.50 rows=1000 width=0) (actual time=0.021..0.021 rows=2 loops=1) Index Cond: ((data ->'tags'::text) ? 'nk455671'::text) Planning Time: 0.098 ms Execution Time: 0.061 ms (7 rows)
  • 26. JSONB Indexes: GIN - @> The “path” operator can be used for multi-level queries of your JSON data. Let’s use it similar to the ? operator. Working with JSON in PostgreSQL vs. MongoDB select * from books where data @> '{"braille":true}'::jsonb; demo=# explain analyze select * from books where data @> '{"braille":true}'::jsonb; QUERY PLAN --------------------------------------------------------------------------------------------------------------------- Bitmap Heap Scan on books (cost=16.75..1009.25 rows=1000 width=158) (actual time=0.040..0.048 rows=6 loops=1) Recheck Cond: (data @> '{"braille": true}'::jsonb) Rows Removed by Index Recheck: 9 Heap Blocks: exact=2 -> Bitmap Index Scan on datagin (cost=0.00..16.50 rows=1000 width=0) (actual time=0.030..0.030 rows=15 loops=1) Index Cond: (data @> '{"braille": true}'::jsonb) Planning Time: 0.100 ms Execution Time: 0.076 ms (8 rows)
  • 27. JSONB Indexes: GIN - @> The "path" operator can be used for multi level queries of your JSON data. Working with JSON in PostgreSQL vs. MongoDB demo=# select * from books where data @> '{"publisher":"XlekfkLOtL"}'::jsonb; id | author | isbn | rating | data -----+-----------------+------------+--------+------------------------------------------------------------------------------------- 346 | uD3QOvHfJdxq2ez | KiAaIRu8QE | 1 | {"tags": {"nk88": {"ik37": "iv161"}}, "publisher": "XlekfkLOtL", "criticrating": 3} (1 row) demo=# explain analyze select * from books where data @> '{"publisher":"XlekfkLOtL"}'::jsonb; QUERY PLAN -------------------------------------------------------------------------------------------------------------------- Bitmap Heap Scan on books (cost=16.75..1009.25 rows=1000 width=158) (actual time=0.491..0.492 rows=1 loops=1) Recheck Cond: (data @> '{"publisher": "XlekfkLOtL"}'::jsonb) Heap Blocks: exact=1 -> Bitmap Index Scan on datagin (cost=0.00..16.50 rows=1000 width=0) (actual time=0.092..0.092 rows=1 loops=1) Index Cond: (data @> '{"publisher": "XlekfkLOtL"}'::jsonb) Planning Time: 0.090 ms Execution Time: 0.523 ms
  • 28. JSONB Indexes: GIN - @> The JSON queries can be nested to many levels. You can also use the ># operation but GIN does not support it. Working with JSON in PostgreSQL vs. MongoDB demo=# select * from books where data @> '{"tags":{"nk455671":{"ik937456":"iv506075"}}}'::jsonb; id | author | isbn | rating | data ---------+-----------------+------------+--------+--------------------------------------------------------------------------------------------------------------------- --------------------------------- ------------------ 1000005 | XEI7xShT8bPu6H7 | 2kD5XJDZUF | 0 | {"tags": {"nk455671": {"ik937456": "iv506075"}}, "braille": true, "keywords": ["abc", "kef", "keh"], "hardcover": false, "publisher": "zSfZIAjGGs", " criticrating": 4} (1 row)
  • 29. JSONB Indexes: GIN - jsonb_pathops GIN also supports a “pathops” option to reduce the size of the GIN index. From the docs: “The technical difference between a jsonb_ops and a jsonb_path_ops GIN index is that the former creates independent index items for each key and value in the data, while the latter creates index items only for each value in the data.” On my small dataset of 1M books, you can see that the pathops GIN index is smaller – you should test with your dataset to understand the savings. Working with JSON in PostgreSQL vs. MongoDB CREATE INDEX dataginpathops ON books USING gin (data jsonb_path_ops); public | dataginpathops | index | sgpostgres | books | 67 MB | public | datatagsgin | index | sgpostgres | books | 84 MB |
  • 30. JSONB Indexes: GIN - jsonb_pathops Let’s rerun our query from before with the pathops index: Working with JSON in PostgreSQL vs. MongoDB demo=# select * from books where data @> '{"tags":{"nk455671":{"ik937456":"iv506075"}}}'::jsonb; id | author | isbn | rating | data ---------+-----------------+------------+--------+--------------------------------------------------------------------------------------------------------------------- --------------------------------- ------------------ 1000005 | XEI7xShT8bPu6H7 | 2kD5XJDZUF | 0 | {"tags": {"nk455671": {"ik937456": "iv506075"}}, "braille": true, "keywords": ["abc", "kef", "keh"], "hardcover": false, "publisher": "zSfZIAjGGs", " criticrating": 4} (1 row) demo=# explain select * from books where data @> '{"tags":{"nk455671":{"ik937456":"iv506075"}}}'::jsonb; QUERY PLAN ----------------------------------------------------------------------------------------- Bitmap Heap Scan on books (cost=12.75..1005.25 rows=1000 width=158) Recheck Cond: (data @> '{"tags": {"nk455671": {"ik937456": "iv506075"}}}'::jsonb) -> Bitmap Index Scan on dataginpathops (cost=0.00..12.50 rows=1000 width=0) Index Cond: (data @> '{"tags": {"nk455671": {"ik937456": "iv506075"}}}'::jsonb) (4 rows)
  • 31. JSONB Indexes: GIN - jsonb_pathops The “jsonb_pathops” option supports only the @> operator. Smaller index but more limited scenarios. The following queries below can no longer leverage the GIN index: Working with JSON in PostgreSQL vs. MongoDB select * from books where data ? 'tags'; => Sequential scan select * from books where data @> '{"tags" :{}}'; => Sequential scan select * from books where data @> '{"tags" :{"k7888":{}}}' => Sequential scan
  • 32. JSONB Indexes: B-tree • B-tree indexes are the most common index type in relational databases. • If you index an entire JSONB column with a B-tree index, the only useful operators are the comparison operators: • =, <, <=, >, >= • Can be used only for whole object comparisons. • Very limited use case. Working with JSON in PostgreSQL vs. MongoDB
  • 33. JSONB Indexes: B-tree • Use B-tree “Expression indexes” • B-tree expression indexes can support the common comparison operators '=', '<', '>', '>=', '<=‘ (which GIN doesn't support). • Retrieve all books with a data->criticrating > 4. Working with JSON in PostgreSQL vs. MongoDB demo=# select * from books where data->'criticrating' > 4; ERROR: operator does not exist: jsonb >= integer LINE 1: select * from books where data->'criticrating’ > 4; ^ HINT: No operator matches the given name and argument types. You might need to add explicit type casts. #Lets cast JSONB to integer demo=# select * from books where (data->'criticrating')::int4 > 4; #If you are using a version prior to pg11 you need to query as text and then cast demo=# select * from books where (data->>'criticrating')::int4 > 4;
  • 34. JSONB Indexes: B-tree For expression indexes, the index needs to be an exact match with the query expression: Working with JSON in PostgreSQL vs. MongoDB demo=# CREATE INDEX criticrating ON books USING BTREE (((data->'criticrating')::int4)); CREATE INDEX demo=# explain analyze select * from books where (data->'criticrating')::int4 = 3; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------- Index Scan using criticrating on books (cost=0.42..4626.93 rows=5000 width=158) (actual time=0.069..70.221 rows=199883 loops=1) Index Cond: (((data -> 'criticrating'::text))::integer = 3) Planning Time: 0.103 ms Execution Time: 79.019 ms (4 rows) demo=# explain analyze select * from books where (data->'criticrating')::int4 = 3; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------- Index Scan using criticrating on books (cost=0.42..4626.93 rows=5000 width=158) (actual time=0.069..70.221 rows=199883 loops=1) Index Cond: (((data -> 'criticrating'::text))::integer = 3) Planning Time: 0.103 ms Execution Time: 79.019 ms (4 rows) 1 From above we can see that the BTREE index is being used as expected.
  • 35. JSONB Indexes: HASH • If you are only interested in the "=" operator, then Hash indexes become interesting. • Hash indexes tend to be smaller than B-tree indexes. Working with JSON in PostgreSQL vs. MongoDB CREATE INDEX publisherhash ON books USING HASH ((data->'publisher')); demo=# select * from books where data->'publisher' = 'XlekfkLOtL' demo-# ; id | author | isbn | rating | data -----+-----------------+------------+--------+------------------------------------------------------------------------------------- 346 | uD3QOvHfJdxq2ez | KiAaIRu8QE | 1 | {"tags": {"nk88": {"ik37": "iv161"}}, "publisher": "XlekfkLOtL", "criticrating": 3} (1 row) demo=# explain analyze select * from books where data->'publisher' = 'XlekfkLOtL'; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------- Index Scan using publisherhash on books (cost=0.00..2.02 rows=1 width=158) (actual time=0.016..0.017 rows=1 loops=1) Index Cond: ((data -> 'publisher'::text) = 'XlekfkLOtL'::text) Planning Time: 0.080 ms Execution Time: 0.035 ms (4 rows)
  • 36. JSONB Indexes: GIN - Trigram • PostgreSQL supports string matching using Trigram indexes. • Trigrams are basically words broken up into sequences of 3 letters. • We can search for any arbitrary regex (not just left anchored). Working with JSON in PostgreSQL vs. MongoDB CREATE EXTENSION pg_trgm; CREATE INDEX publisher ON books USING GIN ((data->'publisher') gin_trgm_ops); demo=# select * from books where data->'publisher' LIKE '%I0UB%'; id | author | isbn | rating | data ----+-----------------+------------+--------+--------------------------------------------------------------------------------- 4 | KiEk3xjqvTpmZeS | EYqXO9Nwmm | 0 | {"tags": {"nk3": {"ik1": "iv1"}}, "publisher": "MI0UBqZJDt", "criticrating": 1} (1 row) demo=# explain analyze select * from books where data->'publisher' LIKE '%I0UB%'; QUERY PLAN -------------------------------------------------------------------------------------------------------------------- Bitmap Heap Scan on books (cost=9.78..111.28 rows=100 width=158) (actual time=0.033..0.033 rows=1 loops=1) Recheck Cond: ((data -&gt; 'publisher'::text) ~~ '%I0UB%'::text) Heap Blocks: exact=1 -> Bitmap Index Scan on publisher (cost=0.00..9.75 rows=100 width=0) (actual time=0.025..0.025 rows=1 loops=1) Index Cond: ((data -&gt; 'publisher'::text) ~~ '%I0UB%'::text) Planning Time: 0.213 ms Execution Time: 0.058 ms (7 rows)
  • 37. JSONB Indexes: GIN - Arrays • GIN indexes are great for indexing arrays. • Indexing and searching the keyword array. Working with JSON in PostgreSQL vs. MongoDB CREATE INDEX keywords ON books USING GIN ((data->'keywords') jsonb_path_ops); demo=# select * from books where data->'keywords' @> '["abc", "keh"]'::jsonb; id | author | isbn | rating | data ---------+-----------------+------------+--------+--------------------------------------------------------------------------------------------------------------------- -------------- 1000003 | zEG406sLKQ2IU8O | viPdlu3DZm | 4 | {"tags": {"nk263020": {"ik203820": "iv817928"}}, "keywords": ["abc", "kef", "keh"], "publisher": "7NClevxuTM", "criticrating": 2} 1000004 | GCe9NypHYKDH4rD | so6TQDYzZ3 | 4 | {"tags": {"nk780341": {"ik397357": "iv632731"}}, "keywords": ["abc", "kef", "keh"], "publisher": "fqaJuAdjP5", "criticrating": 2} (2 rows) demo=# explain analyze select * from books where data->'keywords' @> '["abc", "keh"]'::jsonb; QUERY PLAN --------------------------------------------------------------------------------------------------------------------- Bitmap Heap Scan on books (cost=54.75..1049.75 rows=1000 width=158) (actual time=0.026..0.028 rows=2 loops=1) Recheck Cond: ((data -> 'keywords'::text) @> '["abc", "keh"]'::jsonb) Heap Blocks: exact=1 -> Bitmap Index Scan on keywords (cost=0.00..54.50 rows=1000 width=0) (actual time=0.014..0.014 rows=2 loops=1) Index Cond: ((data -> 'keywords'::text) @&amp;amp;amp;amp;amp;gt; '["abc", "keh"]'::jsonb) Planning Time: 0.131 ms Execution Time: 0.063 ms (7 rows)
  • 38. SQL/JSON • SQL standard added support for JSON – SQL Standard-2016 (SQL/JSON). • SQL/JSON Data model • JSONPath • SQL/JSON functions • With PG12 release, PostgreSQL has one of the best implementations of SQL/JSON. Working with JSON in PostgreSQL vs. MongoDB
  • 39. SQL/JSON 2016 ● A sequence of SQL/JSON items, each item can be (recursively) any of: ○ SQL/JSON scalar — non-null value of SQL types: Unicode character string, numeric, Boolean or datetime. ○ SQL/JSON null, value that is distinct from any value of any SQL type (not the same as NULL). ○ SQL/JSON arrays, ordered list of zero or more SQL/JSON items — SQL/JSON element ○ SQL/JSON objects — unordered collections of zero or more SQL/JSON members. ■ (key, SQL/JSON item) Working with JSON in PostgreSQL vs. MongoDB
  • 40. JSONPath Working with JSON in PostgreSQL vs. MongoDB .key Returns an object member with the specified key [*] Wildcard array element accessor that returns all array elements .* Wildcard member accessor that returns the values of all members located at the top level of the current object .** Recursive wildcard member accessor that processes all levels of the JSON hierarchy of the current object and returns all the member values, regardless of their nesting level JSONPath allows you to specify an expression (using a syntax similar to the property access notation in Javascript) to query or project your JSON data.
  • 41. SQL/JSON Functions ● PG 12 provides several functions to use JSONPATH to query your JSON data ○ jsonb_path_exists - Checks whether JSON path returns any item for the specified JSON value ○ jsonb_path_match - Returns the result of JSON path predicate check for the specified JSON value. ○ jsonb_path_query - Gets all JSON items returned by JSON path for the specified JSON value. Working with JSON in PostgreSQL vs. MongoDB
  • 42. JSONPath Finding books by publisher? Working with JSON in PostgreSQL vs. MongoDB demo=# select * from books where data @@ '$.publisher == "ktjKEZ1tvq"'; id | author | isbn | rating | data ---------+-----------------+------------+--------+--------------------------------------------------------------------------------------------------------------------- ------------- 1000001 | 4RNsovI2haTgU7l | GwSoX67gLS | 2 | {"tags": {"nk542369": {"ik55240": "iv305393"}}, "keywords": ["abc", "def", "geh"], "publisher": "ktjKEZ1tvq", "criticrating": 0} (1 row) demo=# explain analyze select * from books where data @@ '$.publisher == "ktjKEZ1tvq"'; QUERY PLAN -------------------------------------------------------------------------------------------------------------------- Bitmap Heap Scan on books (cost=21.75..1014.25 rows=1000 width=158) (actual time=0.123..0.124 rows=1 loops=1) Recheck Cond: (data @@ '($."publisher" == "ktjKEZ1tvq")'::jsonpath) Heap Blocks: exact=1 -&amp;amp;amp;amp;gt; Bitmap Index Scan on datagin (cost=0.00..21.50 rows=1000 width=0) (actual time=0.110..0.110 rows=1 loops=1) Index Cond: (data @@ '($."publisher" == "ktjKEZ1tvq")'::jsonpath) Planning Time: 0.137 ms Execution Time: 0.194 ms (7 rows)
  • 43. JSONPath Add a JSONPath filter: Working with JSON in PostgreSQL vs. MongoDB select * from books where jsonb_path_exists(data,'$.publisher ?(@ == "ktjKEZ1tvq")'); Build complicated filter expressions: select * from books where jsonb_path_exists(data, '$.prints[*] ?(@.style=="hc" && @.price == 100)'); Index support for JSONPath is very limited. demo=# explain analyze select * from books where jsonb_path_exists(data,'$.publisher ?(@ == "ktjKEZ1tvq")'); QUERY PLAN ------------------------------------------------------------------------------------------------------------ Seq Scan on books (cost=0.00..36307.24 rows=333340 width=158) (actual time=0.019..480.268 rows=1 loops=1) Filter: jsonb_path_exists(data, '$."publisher"?(@ == "ktjKEZ1tvq")'::jsonpath, '{}'::jsonb, false) Rows Removed by Filter: 1000028 Planning Time: 0.095 ms Execution Time: 480.348 ms (5 rows)
  • 44. JSONPath: Projection JSON Select the last element of the array Working with JSON in PostgreSQL vs. MongoDB demo=# select jsonb_path_query(data, '$.prints[$.size()]') from books where id = 1000029; jsonb_path_query ------------------------------ {"price": 50, "style": "pb"} (1 row) Select only the hardcover prints from the array demo=# select jsonb_path_query(data, '$.prints[*] ?(@.style=="hc")') from books where id = 1000029; jsonb_path_query ------------------------------- {"price": 100, "style": "hc"} (1 row) We can also chain the filters demo=# select jsonb_path_query(data, '$.prints[*] ?(@.style=="hc") ?(@.price ==100)') from books where id = 1000029; jsonb_path_query ------------------------------- {"price": 100, "style": "hc"} (1 row)
  • 45. Roadmap ● Improvements to the JSONPath implementation in PG13 ● Future Roadmap Working with JSON in PostgreSQL vs. MongoDB
  • 46. Questions? Working with JSON in PostgreSQL vs. MongoDB