Hackingpg Present
Stephen Frost
Crunchy Data
PGConf.EU 2018
October 24, 2018
PostgreSQL Source Code
Hacking PostgreSQL
Final Code
PostgreSQL Subsystems
Hacking the PostgreSQL Way
Directory     Description
port          Backend-specific, platform-specific hacks
postmaster    The "main" PG process that always runs, answers requests, hands off connections
regex         Henry Spencer’s regex library, also used by Tcl, maintained more or less by PG now
replication   Backend components to support replication, shipping WAL, reading it back in, etc.
rewrite       Query rewrite engine, used with RULEs, also handles Row-Level Security
snowball      Snowball stemming, used with full-text search
statistics    Extended Statistics system (CREATE STATISTICS)
storage       Storage layer, handles most direct file I/O, support for large objects, etc.
tcop          "Traffic Cop": this is what gets the actual queries, runs them, etc.
tsearch       Full-Text Search engine
utils         Various backend utility components, caching system, memory manager, etc.
What is a Parser?
stmt :
AlterEventTrigStmt
| AlterCollationStmt
| AlterDatabaseStmt
...
| CopyStmt
...
COPY productions
These are the other COPY productions
copy_from:
FROM { $$ = true; }
| TO { $$ = false; }
;
opt_program:
PROGRAM { $$ = true; }
| /* EMPTY */ { $$ = false; }
;
...
copy_file_name:
Sconst { $$ = $1; }
| STDIN { $$ = NULL; }
| STDOUT { $$ = NULL; }
;
COPY productions
Multi-value productions look like this
copy_generic_opt_list:
copy_generic_opt_elem
{
$$ = list_make1($1);
}
| copy_generic_opt_list ',' copy_generic_opt_elem
{
$$ = lappend($1, $3);
}
;
copy_generic_opt_elem:
ColLabel copy_generic_opt_arg
{
$$ = makeDefElem($1, $2, @1);
}
;
copy_generic_opt_arg:
opt_boolean_or_string { $$ = (Node *) makeString($1); }
| NumericOnly { $$ = (Node *) $1; }
| '*' { $$ = (Node *) makeNode(A_Star); }
| '(' copy_generic_opt_arg_list ')' { $$ = (Node *) $2; }
COPY productions
Note the C template code in the grammar
Compiled as part of the overall parser in gram.c
"$$" is "this node"
"$1" is whatever the first value resolves to
"$3" is whatever the third value resolves to
copy_generic_opt_list:
copy_generic_opt_elem
{
$$ = list_make1($1);
}
| copy_generic_opt_list ',' copy_generic_opt_elem
{
$$ = lappend($1, $3);
}
;
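A minimal C sketch of what the two actions above do when the parser reduces a comma-separated list. DemoList, demo_list_make1(), demo_lappend() and demo_parse_three() are hypothetical stand-ins written for this example; they are not the real List type or the real list_make1()/lappend() from pg_list.h, and they use malloc rather than PostgreSQL's memory contexts.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for PostgreSQL's List (NOT the pg_list.h type). */
typedef struct DemoList
{
    int     length;     /* number of elements in use */
    int     capacity;   /* allocated slots */
    void  **elements;   /* element pointers, in order */
} DemoList;

/* Analogue of list_make1(): build a one-element list. */
DemoList *
demo_list_make1(void *datum)
{
    DemoList *list = malloc(sizeof(DemoList));

    assert(list != NULL);
    list->capacity = 4;
    list->length = 1;
    list->elements = malloc(list->capacity * sizeof(void *));
    assert(list->elements != NULL);
    list->elements[0] = datum;
    return list;
}

/* Analogue of lappend(): append at the end, growing the array when full. */
DemoList *
demo_lappend(DemoList *list, void *datum)
{
    if (list == NULL)
        return demo_list_make1(datum);
    if (list->length == list->capacity)
    {
        list->capacity *= 2;
        list->elements = realloc(list->elements,
                                 list->capacity * sizeof(void *));
        assert(list->elements != NULL);
    }
    list->elements[list->length++] = datum;
    return list;
}

/*
 * The sequence of reductions for an input like "a, b, c":
 *   first alternative:   $$ = list_make1($1)    gives [a]
 *   second alternative:  $$ = lappend($1, $3)   gives [a, b], then [a, b, c]
 */
DemoList *
demo_parse_three(void *a, void *b, void *c)
{
    DemoList *result = demo_list_make1(a);  /* $1 = a */

    result = demo_lappend(result, b);       /* $1 = [a],    $3 = b */
    result = demo_lappend(result, c);       /* $1 = [a, b], $3 = c */
    return result;
}
```

The left-recursive rule means the list grows one element per reduction, and "$2" (the comma) is simply ignored by the action.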
copy_opt_item:
BINARY
{
$$ = makeDefElem("format", (Node *)makeString("binary"), @1);
}
| OIDS
{
$$ = makeDefElem("oids", (Node *)makeInteger(true), @1);
}
| FREEZE
{
$$ = makeDefElem("freeze", (Node *)makeInteger(true), @1);
}
...
ProcessCopyOptions(CopyState cstate,
...
}
+ else if (strcmp(defel->defname, "compressed") == 0)
+ {
+#ifdef HAVE_LIBZ
+ if (cstate->compressed)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("conflicting or redundant options")));
+ cstate->compressed = defGetBoolean(defel);
+#else
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("Not compiled with zlib support.")));
+#endif
+ }
else if (strcmp(defel->defname, "oids") == 0)
...
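The option-handling pattern in the diff above can be tried outside the backend with a small sketch. Everything here is an assumption made for illustration: DemoOption, DemoCopyState and demo_process_options() are hypothetical names, and errors are returned as strings instead of being raised with ereport().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a DefElem carrying a boolean value. */
typedef struct DemoOption
{
    const char *name;
    bool        value;
} DemoOption;

/* Hypothetical stand-in for the relevant fields of CopyState. */
typedef struct DemoCopyState
{
    bool        compressed;
    bool        freeze;
} DemoCopyState;

/*
 * Walk the option list and fill in the state.  Returns NULL on success or
 * an error message.  As in the diff, an option is rejected when its flag
 * has already been set by an earlier occurrence.
 */
const char *
demo_process_options(DemoCopyState *state, const DemoOption *opts, size_t nopts)
{
    assert(state != NULL);

    for (size_t i = 0; i < nopts; i++)
    {
        if (strcmp(opts[i].name, "compressed") == 0)
        {
            if (state->compressed)
                return "conflicting or redundant options";
            state->compressed = opts[i].value;
        }
        else if (strcmp(opts[i].name, "freeze") == 0)
        {
            if (state->freeze)
                return "conflicting or redundant options";
            state->freeze = opts[i].value;
        }
        else
            return "option not recognized";
    }
    return NULL;
}
```

Passing "compressed" twice with a true value returns the "conflicting or redundant options" message; a single occurrence sets the flag and returns NULL.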
Is that it?
Not hardly.
Further changes to copy.c for a COMPRESSED state
Changes to track gzFile instead of FILE*
Also have to use gzread()/gzwrite()
Documentation updates in doc/src/sgml/ref/copy.sgml
Regression test updates
Resulting diffstat:
doc/src/sgml/ref/copy.sgml | 12 ++
src/backend/commands/copy.c | 458 +++++++++++++++++++++++++++++++++++++++++++++++++++-----
src/backend/parser/gram.y | 9 +-
src/backend/storage/file/fd.c | 97 ++++++++++++
src/include/parser/kwlist.h | 1 +
src/include/storage/fd.h | 9 ++
src/test/regress/input/copy.source | 20 +++
src/test/regress/output/copy.source | 18 +++
8 files changed, 583 insertions(+), 41 deletions(-)
PostgreSQL Subsystems
Memory management
Error logging / cleanup
Linked lists (multiple ways...)
Catalog lookups
Nodes
Datums and Tuples
Memory Management
Nodes
Datums
Tuples
Tuples - continued
Other Subsystems
Selection of Subsystems
Simple Linked List implementation - pg_list.h, list.c
Integrated/inline doubly- and singly-linked lists - ilist.h, ilist.c
Binary Heap implementation - binaryheap.c
Hopcroft-Karp maximum cardinality algorithm for bipartite graphs - bipartite_match.c
Bloom Filter - bloomfilter.c
Dynamic Shared Memory Based Hash Tables - dshash.c
HyperLogLog cardinality estimator - hyperloglog.c
Knapsack problem solver - knapsack.c
Pairing Heap implementation - pairingheap.c
Red-Black binary tree - rbtree.c
String handling - stringinfo.c
pgsql-hackers
Primary mailing list for discussion of PostgreSQL development
Get a PostgreSQL Account at https://postgresql.org/account
Subscribe at https://lists.postgresql.org
Discuss your ideas and thoughts about how to improve PostgreSQL
Watch for others working on similar capabilities
Try to think about general answers, not specific
Be supportive of other ideas and approaches
What happened to COPY ... COMPRESSED ?
Send and receive COPY data from program instead
COPY ... PROGRAM 'zcat ...'
Not quite identical but large overlap
Simpler in a few ways than direct zlib support
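A rough sketch of the idea behind reading COPY data from a program instead of linking a compression library: run a command and consume its standard output as the data stream. This uses POSIX popen() as a stand-in and is not how the backend implements COPY ... PROGRAM; demo_read_from_program() is a hypothetical helper written for this example.

```c
#define _POSIX_C_SOURCE 200809L   /* for popen()/pclose() */
#include <assert.h>
#include <stdio.h>

/*
 * Run "command" through the shell and read up to bufsize - 1 bytes of its
 * standard output into buf, NUL-terminating the result.  Returns the number
 * of bytes read, or -1 if the command could not be started or exited with
 * a nonzero status.
 */
long
demo_read_from_program(const char *command, char *buf, size_t bufsize)
{
    FILE   *pipe;
    size_t  nread;

    assert(buf != NULL && bufsize > 0);

    pipe = popen(command, "r");
    if (pipe == NULL)
        return -1;

    nread = fread(buf, 1, bufsize - 1, pipe);
    buf[nread] = '\0';

    if (pclose(pipe) != 0)
        return -1;
    return (long) nread;
}
```

With a command such as zcat on a compressed file, the caller sees only the decompressed stream, which is why this approach needs no gzFile or gzread()/gzwrite() handling in the reading code.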
Code Style
Git crash-course
Clone the repo:
git clone https://git.postgresql.org/git/postgresql.git
Creates postgresql directory as a git repo
cd into postgresql
Create a branch to work on
git checkout -b myfeature
Creates a local branch called myfeature
Hack on PostgreSQL! Make changes!
Commit changes and build a diff
git add <files changed>
git commit
git branch --set-upstream-to=origin/master myfeature
git format-patch @{u} --stdout > myfeature.patch
Patch format
Context diff or git-diff
Ideally, pick whichever is more readable for the patch
Multiple patches go in one email; do not send one email per patch
Include in email to -hackers
Description of the patch
Regression tests
Documentation updates
pg_dump support, if appropriate
Register patch on https://commitfest.postgresql.org
Questions?
Thanks!