Lists: | pgsql-hackers |
---|
From: | Mats Kindahl <mats(at)timescale(dot)com> |
---|---|
To: | PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Coccinelle for PostgreSQL development [1/N]: coccicheck.py |
Date: | 2025-01-07 19:44:55 |
Message-ID: | CA+14426e8dbmMjGLu8jO8CQAb9-FKiM-CQhvQHUB=3OnJwWpzQ@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
I got some time over during the holidays, so I spent some of it
doing something I've been thinking about for a while.
For those of you that are not aware of it: Coccinelle is a tool for pattern
matching and text transformation for C code and can be used for detection
of problematic programming patterns and to make complex, tree-wide patches
easy. It is aware of the structure of C code and is better suited to make
complicated changes than what is possible using normal text substitution
tools like Sed and Perl.
Coccinelle have been successfully been used in the Linux project since 2008
and is now an established tool for Linux development and a large number of
semantic patches have been added to the source tree to capture everything
from generic issues (like eliminating the redundant A in expressions like
"!A || (A && B)") to more Linux-specific problems like adding a missing
call to kfree().
Although PostgreSQL is nowhere the size of the Linux kernel, it is
nevertheless of a significant size and would benefit from incorporating
Coccinelle into the development. I noticed it's been used in a few cases
way back (like 10 years back) to fix issues in the PostgreSQL code, but I
thought it might be useful to make it part of normal development practice
to, among other things:
- Identify and correct bugs in the source code both during development and
review.
- Make large-scale changes to the source tree to improve the code based on
new insights.
- Encode and enforce APIs by ensuring that function calls are used
correctly.
- Use improved coding patterns for more efficient code.
- Allow extensions to automatically update code for later PostgreSQL
versions.
To that end, I created a series of patches to show how it could be used in
the PostgreSQL tree. It is a lot easier to discuss concrete code and I
split it up into separate messages since that makes it easier to discuss
each individual patch. The series contains code to make it easy to work
with Coccinelle during development and reviews, as well as examples of
semantic patches that capture problems, demonstrate how to make large-scale
changes, how to enforce APIs, and also improve some coding patterns.
This first patch contains the coccicheck.py script, which is a
re-implementation of the coccicheck script that the Linux kernel uses. We
cannot immediately use the coccicheck script since it is quite closely tied
to the Linux source code tree and we need to have something that both
supports autoconf and Meson. Since Python seems to be used more and more in
the tree, it seems to be the most natural choice. (I have no strong opinion
on what language to use, but think it would be good to have something that
is as platform-independent as possible.)
The intention is that we should be able to use the Linux semantic patches
directly, so it supports the "Requires" and "Options" keywords, which can
be used to require a specific version of spatch(1) and add options to the
execution of that semantic patch, respectively.
--
Best wishes,
Mats Kindahl, Timescale
Attachment | Content-Type | Size |
---|---|---|
0001-Add-initial-coccicheck-script.v1.patch | text/x-patch | 7.1 KB |
From: | Andrew Dunstan <andrew(at)dunslane(dot)net> |
---|---|
To: | Mats Kindahl <mats(at)timescale(dot)com>, PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Coccinelle for PostgreSQL development [1/N]: coccicheck.py |
Date: | 2025-01-08 10:42:09 |
Message-ID: | [email protected] |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
On 2025-01-07 Tu 2:44 PM, Mats Kindahl wrote:
> I got some time over during the holidays, so I spent some of it
> doing something I've been thinking about for a while.
>
> For those of you that are not aware of it: Coccinelle is a tool for
> pattern matching and text transformation for C code and can be used
> for detection of problematic programming patterns and to make complex,
> tree-wide patches easy. It is aware of the structure of C code and is
> better suited to make complicated changes than what is possible using
> normal text substitution tools like Sed and Perl.
>
> Coccinelle have been successfully been used in the Linux project since
> 2008 and is now an established tool for Linux development and a large
> number of semantic patches have been added to the source tree to
> capture everything from generic issues (like eliminating the redundant
> A in expressions like "!A || (A && B)") to more Linux-specific
> problems like adding a missing call to kfree().
>
> Although PostgreSQL is nowhere the size of the Linux kernel, it is
> nevertheless of a significant size and would benefit from
> incorporating Coccinelle into the development. I noticed it's been
> used in a few cases way back (like 10 years back) to fix issues in the
> PostgreSQL code, but I thought it might be useful to make it part of
> normal development practice to, among other things:
>
> - Identify and correct bugs in the source code both during development
> and review.
> - Make large-scale changes to the source tree to improve the code
> based on new insights.
> - Encode and enforce APIs by ensuring that function calls are used
> correctly.
> - Use improved coding patterns for more efficient code.
> - Allow extensions to automatically update code for later PostgreSQL
> versions.
>
> To that end, I created a series of patches to show how it could be
> used in the PostgreSQL tree. It is a lot easier to discuss concrete
> code and I split it up into separate messages since that makes it
> easier to discuss each individual patch. The series contains code to
> make it easy to work with Coccinelle during development and reviews,
> as well as examples of semantic patches that capture problems,
> demonstrate how to make large-scale changes, how to enforce APIs, and
> also improve some coding patterns.
>
> This first patch contains the coccicheck.py script, which is a
> re-implementation of the coccicheck script that the Linux kernel uses.
> We cannot immediately use the coccicheck script since it is quite
> closely tied to the Linux source code tree and we need to have
> something that both supports autoconf and Meson. Since Python seems to
> be used more and more in the tree, it seems to be the most natural
> choice. (I have no strong opinion on what language to use, but think
> it would be good to have something that is as platform-independent as
> possible.)
>
> The intention is that we should be able to use the Linux semantic
> patches directly, so it supports the "Requires" and "Options"
> keywords, which can be used to require a specific version of spatch(1)
> and add options to the execution of that semantic patch, respectively.
>
Please don't start multiple threads like this. If you want to submit a
set of patches for a single feature, send them all as attachments in a
single email. Otherwise this just makes life hard for threading email
readers.
cheers
andrew
--
Andrew Dunstan
EDB: https://www.enterprisedb.com
From: | Mats Kindahl <mats(at)timescale(dot)com> |
---|---|
To: | Andrew Dunstan <andrew(at)dunslane(dot)net> |
Cc: | PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Coccinelle for PostgreSQL development [1/N]: coccicheck.py |
Date: | 2025-01-08 11:40:50 |
Message-ID: | CA+14427gUW_OHPSCbzd2-LtTya+mCtH5TftBbRfrhvk1LZr1Ag@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Wed, Jan 8, 2025 at 11:42 AM Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>
> On 2025-01-07 Tu 2:44 PM, Mats Kindahl wrote:
> > I got some time over during the holidays, so I spent some of it
> > doing something I've been thinking about for a while.
> >
> > For those of you that are not aware of it: Coccinelle is a tool for
> > pattern matching and text transformation for C code and can be used
> > for detection of problematic programming patterns and to make complex,
> > tree-wide patches easy. It is aware of the structure of C code and is
> > better suited to make complicated changes than what is possible using
> > normal text substitution tools like Sed and Perl.
> >
> > Coccinelle have been successfully been used in the Linux project since
> > 2008 and is now an established tool for Linux development and a large
> > number of semantic patches have been added to the source tree to
> > capture everything from generic issues (like eliminating the redundant
> > A in expressions like "!A || (A && B)") to more Linux-specific
> > problems like adding a missing call to kfree().
> >
> > Although PostgreSQL is nowhere the size of the Linux kernel, it is
> > nevertheless of a significant size and would benefit from
> > incorporating Coccinelle into the development. I noticed it's been
> > used in a few cases way back (like 10 years back) to fix issues in the
> > PostgreSQL code, but I thought it might be useful to make it part of
> > normal development practice to, among other things:
> >
> > - Identify and correct bugs in the source code both during development
> > and review.
> > - Make large-scale changes to the source tree to improve the code
> > based on new insights.
> > - Encode and enforce APIs by ensuring that function calls are used
> > correctly.
> > - Use improved coding patterns for more efficient code.
> > - Allow extensions to automatically update code for later PostgreSQL
> > versions.
> >
> > To that end, I created a series of patches to show how it could be
> > used in the PostgreSQL tree. It is a lot easier to discuss concrete
> > code and I split it up into separate messages since that makes it
> > easier to discuss each individual patch. The series contains code to
> > make it easy to work with Coccinelle during development and reviews,
> > as well as examples of semantic patches that capture problems,
> > demonstrate how to make large-scale changes, how to enforce APIs, and
> > also improve some coding patterns.
> >
> > This first patch contains the coccicheck.py script, which is a
> > re-implementation of the coccicheck script that the Linux kernel uses.
> > We cannot immediately use the coccicheck script since it is quite
> > closely tied to the Linux source code tree and we need to have
> > something that both supports autoconf and Meson. Since Python seems to
> > be used more and more in the tree, it seems to be the most natural
> > choice. (I have no strong opinion on what language to use, but think
> > it would be good to have something that is as platform-independent as
> > possible.)
> >
> > The intention is that we should be able to use the Linux semantic
> > patches directly, so it supports the "Requires" and "Options"
> > keywords, which can be used to require a specific version of spatch(1)
> > and add options to the execution of that semantic patch, respectively.
> >
>
>
> Please don't start multiple threads like this. If you want to submit a
> set of patches for a single feature, send them all as attachments in a
> single email. Otherwise this just makes life hard for threading email
> readers.
>
My apologies, I thought this would make it easier to discuss and review the
code. I will send a single email in the future.
Should I resend this as a single email with all the patches?
--
Best wishes,
Mats Kindahl, Timescale
From: | Aleksander Alekseev <aleksander(at)timescale(dot)com> |
---|---|
To: | PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org> |
Cc: | Mats Kindahl <mats(at)timescale(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net> |
Subject: | Re: Coccinelle for PostgreSQL development [1/N]: coccicheck.py |
Date: | 2025-01-14 15:18:56 |
Message-ID: | CAJ7c6TPqocH10pbUfJah_dC1o9aT1RFWBW-HchcbUZ56x6KC=w@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
Hi,
> My apologies, I thought this would make it easier to discuss and review the code. I will send a single email in the future.
>
> Should I resend this as a single email with all the patches?
IMO the best solution would be re-submitting all the patches to this
thread. Also please make sure the patchset is registered on the
nearest open CF [1] This will ensure that the patchset is built on our
CI (aka cfbot [2]) and will not be lost.
[1]: https://commitfest.postgresql.org/
[2]: http://cfbot.cputube.org/
--
Best regards,
Aleksander Alekseev
From: | Mats Kindahl <mats(at)timescale(dot)com> |
---|---|
To: | Aleksander Alekseev <aleksander(at)timescale(dot)com> |
Cc: | PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net> |
Subject: | Re: Coccinelle for PostgreSQL development [1/N]: coccicheck.py |
Date: | 2025-01-18 19:44:00 |
Message-ID: | CA+14425Q2u14FgqndYyHwmUds+v+9Sg6kyUqKMFsC2Gb+H2Bnw@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Tue, Jan 14, 2025 at 4:19 PM Aleksander Alekseev <
aleksander(at)timescale(dot)com> wrote:
> IMO the best solution would be re-submitting all the patches to this
> thread. Also please make sure the patchset is registered on the
> nearest open CF [1] This will ensure that the patchset is built on our
> CI (aka cfbot [2]) and will not be lost.
>
> [1]: https://commitfest.postgresql.org/
> [2]: http://cfbot.cputube.org/
>
>
Thank you Aleksander,
Here is a new post with all patches attached and all comments combined.
For those of you that are not aware of it: Coccinelle is a tool for pattern
matching and text transformation for C code and can be used for detection
of problematic programming patterns and to make complex, tree-wide patches
easy. It is aware of the structure of C code and is better suited to make
complicated changes than what is possible using normal text substitution
tools like Sed and Perl. I've noticed it's been used at a few cases way
back to fix issues.[1]
Coccinelle have been successfully been used in the Linux project since 2008
and is now an established tool for Linux development and a large number of
semantic patches have been added to the source tree to capture everything
from generic issues (like eliminating the redundant A in expressions like
"!A || (A && B)") to more Linux-specific problems like adding a missing
call to kfree().
Although PostgreSQL is nowhere the size of the Linux kernel, it is
nevertheless of a significant size and would benefit from incorporating
Coccinelle into the development. I noticed it's been used in a few cases
way back (like 10 years back) to fix issues in the PostgreSQL code, but I
thought it might be useful to make it part of normal development practice
to, among other things:
- Identify and correct bugs in the source code both during development and
review.
- Make large-scale changes to the source tree to improve the code based on
new insights.
- Encode and enforce APIs by ensuring that function calls are used
correctly.
- Use improved coding patterns for more efficient code.
- Allow extensions to automatically update code for later PostgreSQL
versions.
To that end, I created a series of patches to show how it could be used in
the PostgreSQL tree. It is a lot easier to discuss concrete code and I
split it up into separate messages since that makes it easier to discuss
each individual patch. The series contains code to make it easy to work
with Coccinelle during development and reviews, as well as examples of
semantic patches that capture problems, demonstrate how to make large-scale
changes, how to enforce APIs, and also improve some coding patterns.
The first three patches contain the coccicheck.py script and the
integration with the build system (both Meson and Autoconf).
# Coccicheck Script
It is a re-implementation of the coccicheck script that the Linux kernel
uses. We cannot immediately use the coccicheck script since it is quite
closely tied to the Linux source code tree and we need to have something
that both supports Autoconf and Meson. Since Python seems to be used more
and more in the tree, it seems to be the most natural choice. (I have no
strong opinion on what language to use, but think it would be good to have
something that is as platform-independent as possible.)
The intention is that we should be able to use the Linux semantic patches
directly, so it supports the "Requires" and "Options" keywords, which can
be used to require a specific version of spatch(1) and add options to the
execution of that semantic patch, respectively.
# Autoconf support
The changes to Autoconf modifies the configure.ac and related files (in
particular Makefile.global.in). At this point, I have deliberately not
added support for pgxs so extensions cannot use coccicheck through the
PostgreSQL installation. This is something that we can add later though.
The semantic patches are expected to live in cocci/ directory under the
root and the patch uses the pattern cocci/**/*.cocci to find all semantic
patches. Right now there are no subdirectories for the semantic patches,
but this might be something we want to add to create different categories
of scripts.
The coccicheck target is used in the same way as for the Linux kernel, that
is, to generate and apply all patches suggested by the semantic patches,
you type:
make coccicheck MODE=patch | patch -p1
Linux as support for a few more variables: V to set the verbosity, J to use
multiple jobs for processing the semantic patches, M to select a different
directory to apply the semantic patches to, and COCCI to use a single
specific semantic patch rather than all available. I have not added support
for this right now, but if you think this is valuable, it should be
straightforward to add.
I used autoconf 2.69, as mentioned in configure.ac, but that generate a
bigger diff than I expected. Any advice here is welcome.
# Meson Support
The support for Meson is done by adding three coccicheck targets: one for
each mode. To apply all patches suggested by the semantic patches using
ninja (as is done in [2]), you type the following in the build directory
generated by Meson (e.g., the "build/" subdirectory).
ninja coccicheck-patch | patch -p1 -d ..
If you want to pass other flags you have to set the SPFLAGS environment
variable when calling ninja:
SPFLAGS=--debug ninja coccicheck-report
# Semantic Patch: Wrong type for palloc()
This is the first example of a semantic patch and shows how to capture and
fix a common problem.
If you use an palloc() to allocate memory for an object (or an array of
objects) and by mistake type something like:
StringInfoData *info = palloc(sizeof(StringInfoData*));
You will not allocate enough memory for storing the object. This semantic
patch catches any cases where you are either allocating an array of objects
or a single object that do not have corret types in this sense, more
precisely, it captures assignments to a variable of type T* where palloc()
uses sizeof(T) either alone or with a single expression (assuming this is
an array count).
The semantic patch is overzealous in the sense that using the wrong typedef
will suggest a change (this can be seen in the patch). Although the sizes
of these are the same, it is probably be better to just follow the
convention of always using the type "T*" for any "palloc(sizeof(T))" since
the typedef can change at any point and would then introduce a bug.
Coccicheck can easily fix this for you, so it is straightforward to enforce
this. It also simplifies other automated checking to follow this convention.
We don't really have any real bugs as a result from this, but we have one
case where an allocation of "sizeof(LLVMBasicBlockRef*)" is allocated to an
"LLVMBasicBlockRef*", which strictly speaking is not correct (it should be
"sizeof(LLVMBasicBlockRef)"). However, since they are both pointers, there
is no risk of incorrect allocation size. One typedef usage that does not
match.
# Semantic Patch: Introduce palloc_array() and palloc_object() where
possible
This is an example of a large-scale refactoring to improve the code.
For PostgreSQL 16, Peter extended the palloc()/pg_malloc() interface in
commit 2016055a92f to provide more type-safety, but these functions are not
widely used. This semantic patch captures and replaces all uses of palloc()
where palloc_array() or palloc_object() could be used instead. It
deliberately does not touch cases where it is not clear that the
replacement can be done.
[1]: https://coccinelle.gitlabpages.inria.fr/website/
[2]: https://www.postgresql.org/docs/current/install-meson.html
--
Best wishes,
Mats Kindahl, Timescale
Attachment | Content-Type | Size |
---|---|---|
0004-Add-semantic-patch-for-sizeof-using-palloc.v2.patch | text/x-patch | 3.3 KB |
0001-Add-initial-coccicheck-script.v2.patch | text/x-patch | 7.1 KB |
0002-Create-coccicheck-target-for-autoconf.v2.patch | text/x-patch | 9.1 KB |
0003-Add-meson-build-for-coccicheck.v2.patch | text/x-patch | 4.5 KB |
0005-Add-script-for-palloc_array.v2.patch | text/x-patch | 560.4 KB |
From: | Michael Paquier <michael(at)paquier(dot)xyz> |
---|---|
To: | Mats Kindahl <mats(at)timescale(dot)com> |
Cc: | Aleksander Alekseev <aleksander(at)timescale(dot)com>, PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net> |
Subject: | Re: Coccinelle for PostgreSQL development [1/N]: coccicheck.py |
Date: | 2025-01-19 01:10:05 |
Message-ID: | [email protected] |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Sat, Jan 18, 2025 at 08:44:00PM +0100, Mats Kindahl wrote:
> For PostgreSQL 16, Peter extended the palloc()/pg_malloc() interface in
> commit 2016055a92f to provide more type-safety, but these functions are not
> widely used. This semantic patch captures and replaces all uses of palloc()
> where palloc_array() or palloc_object() could be used instead. It
> deliberately does not touch cases where it is not clear that the
> replacement can be done.
I am not sure how much a dependency to coccicheck would cost (usually
such changes should require a case-by-case analysis rather than a
blind automation), but palloc_array() and palloc_object() are
available down to v13.
Based on this argument, it would be tempting to apply this rule
across the stable branches to reduce conflict churn. However this is
an improvement in readability, like the talloc() things as Peter has
mentioned, hence it should be a HEAD-only thing. I do like the idea
of forcing more the object-palloc style on HEAD in the tree in some
areas of the code, even if it would come with some backpatching cost
for existing code.
Thoughts? Perhaps this has been discussed previously?
--
Michael
From: | Mats Kindahl <mats(at)timescale(dot)com> |
---|---|
To: | Michael Paquier <michael(at)paquier(dot)xyz> |
Cc: | Aleksander Alekseev <aleksander(at)timescale(dot)com>, PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net> |
Subject: | Re: Coccinelle for PostgreSQL development [1/N]: coccicheck.py |
Date: | 2025-01-20 07:33:12 |
Message-ID: | CA+14425pzingPHHKyqhe6PnPzV0GUnUP0vzuUFwfMRddV--wvQ@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Sun, Jan 19, 2025 at 2:10 AM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
> On Sat, Jan 18, 2025 at 08:44:00PM +0100, Mats Kindahl wrote:
> > For PostgreSQL 16, Peter extended the palloc()/pg_malloc() interface in
> > commit 2016055a92f to provide more type-safety, but these functions are
> not
> > widely used. This semantic patch captures and replaces all uses of
> palloc()
> > where palloc_array() or palloc_object() could be used instead. It
> > deliberately does not touch cases where it is not clear that the
> > replacement can be done.
>
> I am not sure how much a dependency to coccicheck would cost (usually
> such changes should require a case-by-case analysis rather than a
> blind automation), but palloc_array() and palloc_object() are
> available down to v13.
This script is intended to be conservative in that it should not do
replacements that are not clearly suitable for palloc_array/palloc_object.
Since the intention is that it should automatically generate patches on
cases that can be improved, I think the best strategy is to be conservative
and not do replacements unless it is clear that it is a good replacement.
> Based on this argument, it would be tempting to apply this rule
> across the stable branches to reduce conflict churn. However this is
> an improvement in readability, like the talloc() things as Peter has
> mentioned, hence it should be a HEAD-only thing.
I would argue that it is a HEAD-only thing. Main reason is that backports
always risk creating extra work, even if they look innocent, and it is
usually better to only backport patches that *really* need to be backported.
> I do like the idea
> of forcing more the object-palloc style on HEAD in the tree in some
> areas of the code, even if it would come with some backpatching cost
> for existing code.
>
> Thoughts? Perhaps this has been discussed previously?
>
My main reasoning around this patch is that the palloc_array and
palloc_object were introduced for a reason, in this case for type-safety
and readability, and not using it widely in the code base sort of defeats
the purpose of adding the functions at all. Doing it manually is a chore,
but with Coccinelle we can do these kinds of large rewrites easily.
--
Best wishes,
Mats Kindahl, Timescale
From: | Mats Kindahl <mats(at)timescale(dot)com> |
---|---|
To: | Aleksander Alekseev <aleksander(at)timescale(dot)com> |
Cc: | PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net> |
Subject: | Re: Coccinelle for PostgreSQL development [1/N]: coccicheck.py |
Date: | 2025-03-02 13:26:08 |
Message-ID: | CA+144273GOMq5NOc+ULbwxTk+Z1_1rwcpMs0gr8Yr6r9T=OjvQ@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Sat, Jan 18, 2025 at 8:44 PM Mats Kindahl <mats(at)timescale(dot)com> wrote:
> On Tue, Jan 14, 2025 at 4:19 PM Aleksander Alekseev <
> aleksander(at)timescale(dot)com> wrote:
>
>> IMO the best solution would be re-submitting all the patches to this
>> thread. Also please make sure the patchset is registered on the
>> nearest open CF [1] This will ensure that the patchset is built on our
>> CI (aka cfbot [2]) and will not be lost.
>>
>> [1]: https://commitfest.postgresql.org/
>> [2]: http://cfbot.cputube.org/
>>
>>
>
Hi all,
Here is a new set of patches rebased on the latest version of the Postgres.
I decided to just include the semantic patches in each patch since it is
trivial to generate the patch and build using:
ninja coccicheck-patch | patch -d .. -p1 && ninja
I repeat the description from the previous patch set and add comments where
things have changed, but I have also added two semantic patches, which are
described last.
For those of you that are not aware of it: Coccinelle is a tool for pattern
> matching and text transformation for C code and can be used for detection
> of problematic programming patterns and to make complex, tree-wide patches
> easy. It is aware of the structure of C code and is better suited to make
> complicated changes than what is possible using normal text substitution
> tools like Sed and Perl. I've noticed it's been used at a few cases way
> back to fix issues.[1]
>
> Coccinelle have been successfully been used in the Linux project since
> 2008 and is now an established tool for Linux development and a large
> number of semantic patches have been added to the source tree to capture
> everything from generic issues (like eliminating the redundant A in
> expressions like "!A || (A && B)") to more Linux-specific problems like
> adding a missing call to kfree().
>
> Although PostgreSQL is nowhere the size of the Linux kernel, it is
> nevertheless of a significant size and would benefit from incorporating
> Coccinelle into the development. I noticed it's been used in a few cases
> way back (like 10 years back) to fix issues in the PostgreSQL code, but I
> thought it might be useful to make it part of normal development practice
> to, among other things:
>
> - Identify and correct bugs in the source code both during development and
> review.
> - Make large-scale changes to the source tree to improve the code based on
> new insights.
> - Encode and enforce APIs by ensuring that function calls are used
> correctly.
> - Use improved coding patterns for more efficient code.
> - Allow extensions to automatically update code for later PostgreSQL
> versions.
>
> To that end, I created a series of patches to show how it could be used in
> the PostgreSQL tree. It is a lot easier to discuss concrete code and I
> split it up into separate messages since that makes it easier to discuss
> each individual patch. The series contains code to make it easy to work
> with Coccinelle during development and reviews, as well as examples of
> semantic patches that capture problems, demonstrate how to make large-scale
> changes, how to enforce APIs, and also improve some coding patterns.
>
> The first three patches contain the coccicheck.py script and the
> integration with the build system (both Meson and Autoconf).
>
> # Coccicheck Script
>
> It is a re-implementation of the coccicheck script that the Linux kernel
> uses. We cannot immediately use the coccicheck script since it is quite
> closely tied to the Linux source code tree and we need to have something
> that both supports Autoconf and Meson. Since Python seems to be used more
> and more in the tree, it seems to be the most natural choice. (I have no
> strong opinion on what language to use, but think it would be good to have
> something that is as platform-independent as possible.)
>
> The intention is that we should be able to use the Linux semantic patches
> directly, so it supports the "Requires" and "Options" keywords, which can
> be used to require a specific version of spatch(1) and add options to the
> execution of that semantic patch, respectively.
>
I have added support for using multiple jobs similar to how "make -jN"
works. This is also supported by the autoconf and ninja builds
> # Autoconf support
>
> The changes to Autoconf modifies the configure.ac and related files (in
> particular Makefile.global.in). At this point, I have deliberately not
> added support for pgxs so extensions cannot use coccicheck through the
> PostgreSQL installation. This is something that we can add later though.
>
> The semantic patches are expected to live in cocci/ directory under the
> root and the patch uses the pattern cocci/**/*.cocci to find all semantic
> patches. Right now there are no subdirectories for the semantic patches,
> but this might be something we want to add to create different categories
> of scripts.
>
> The coccicheck target is used in the same way as for the Linux kernel,
> that is, to generate and apply all patches suggested by the semantic
> patches, you type:
>
> make coccicheck MODE=patch | patch -p1
>
> Linux as support for a few more variables: V to set the verbosity, J to
> use multiple jobs for processing the semantic patches, M to select a
> different directory to apply the semantic patches to, and COCCI to use a
> single specific semantic patch rather than all available. I have not added
> support for this right now, but if you think this is valuable, it should be
> straightforward to add.
>
> I used autoconf 2.69, as mentioned in configure.ac, but that generate a
> bigger diff than I expected. Any advice here is welcome.
>
Using the parameter "JOBS" allow you to use multiple jobs, e.g.:
make coccicheck MODE=patch JOBS=4 | patch -p1
> # Meson Support
>
> The support for Meson is done by adding three coccicheck targets: one for
> each mode. To apply all patches suggested by the semantic patches using
> ninja (as is done in [2]), you type the following in the build directory
> generated by Meson (e.g., the "build/" subdirectory).
>
> ninja coccicheck-patch | patch -p1 -d ..
>
> If you want to pass other flags you have to set the SPFLAGS environment
> variable when calling ninja:
>
> SPFLAGS=--debug ninja coccicheck-report
>
If you want to use multiple jobs, you use something like this:
JOBS=4 ninja coccicheck-patch | patch -d .. -p1
> # Semantic Patch: Wrong type for palloc()
>
> This is the first example of a semantic patch and shows how to capture and
> fix a common problem.
>
> If you use an palloc() to allocate memory for an object (or an array of
> objects) and by mistake type something like:
>
> StringInfoData *info = palloc(sizeof(StringInfoData*));
>
> You will not allocate enough memory for storing the object. This semantic
> patch catches any cases where you are either allocating an array of objects
> or a single object that do not have corret types in this sense, more
> precisely, it captures assignments to a variable of type T* where palloc()
> uses sizeof(T) either alone or with a single expression (assuming this is
> an array count).
>
> The semantic patch is overzealous in the sense that using the wrong
> typedef will suggest a change (this can be seen in the patch). Although the
> sizes of these are the same, it is probably be better to just follow the
> convention of always using the type "T*" for any "palloc(sizeof(T))" since
> the typedef can change at any point and would then introduce a bug.
> Coccicheck can easily fix this for you, so it is straightforward to enforce
> this. It also simplifies other automated checking to follow this convention.
>
> We don't really have any real bugs as a result from this, but we have one
> case where an allocation of "sizeof(LLVMBasicBlockRef*)" is allocated to an
> "LLVMBasicBlockRef*", which strictly speaking is not correct (it should be
> "sizeof(LLVMBasicBlockRef)"). However, since they are both pointers, there
> is no risk of incorrect allocation size. One typedef usage that does not
> match.
>
> # Semantic Patch: Introduce palloc_array() and palloc_object() where
> possible
>
> This is an example of a large-scale refactoring to improve the code.
>
> For PostgreSQL 16, Peter extended the palloc()/pg_malloc() interface in
> commit 2016055a92f to provide more type-safety, but these functions are not
> widely used. This semantic patch captures and replaces all uses of palloc()
> where palloc_array() or palloc_object() could be used instead. It
> deliberately does not touch cases where it is not clear that the
> replacement can be done.
>
# Semantic Patch: replace code with pg_cmp_*
This is an example of a large-scale refactoring to improve the code.
In commit 3b42bdb4716 and 6b80394781c overflow-safe comparison functions
were introduced, but they are not widely used. This semantic patch
identifies some of the more common cases and replaces them with calls to
the corresponding pg_cmp_* function.
The patches give a few instructions of improvement in performance when
checking with Godbolt. It's not much, but since it is so easy to apply
them, it might still be worthwhile.
# Semantic Patch: Replace dynamic allocation of StringInfo with
StringInfoData
Use improved coding patterns for more efficient code.
This semantic patch replaces uses of StringInfo with StringInfoData where
the info is dynamically allocated but (optionally) freed at the end of the
block. This will avoid one dynamic allocation that otherwise has to be
dealt with.
For example, this code:
StringInfo info = makeStringInfo();
...
appendStringInfo(info, ...);
...
return do_stuff(..., info->data, ...);
Can be replaced with:
StringInfoData info;
initStringInfo(&info);
...
appendStringInfo(&info, ...);
...
return do_stuff(..., info.data, ...);
It does not do a replacement in these cases:
- If the variable is assigned to an expression. In this case, the
pointer can "leak" outside the function either through a global variable or
a parameter assignment.
- If an assignment is done to the expression. This cannot leak the data,
but could mean a value-assignment of a structure, so we avoid this case.
- If the pointer is returned.
The cases that this semantic patch fixed when I uploaded the first version
of the other patches seems to have been dealt with, but having it as part
of the code base prevents such cases from surfacing again.
> [1]: https://coccinelle.gitlabpages.inria.fr/website/
> [2]: https://www.postgresql.org/docs/current/install-meson.html
>
--
Best wishes,
Mats Kindahl, Timescale
Attachment | Content-Type | Size |
---|---|---|
0005-Semantic-patch-for-palloc_array-and-palloc_object.v3.patch | text/x-patch | 4.2 KB |
0003-Add-meson-build-for-coccicheck.v3.patch | text/x-patch | 4.6 KB |
0004-Semantic-patch-for-sizeof-using-palloc.v3.patch | text/x-patch | 1.7 KB |
0006-Semantic-patch-for-pg_cmp_-functions.v3.patch | text/x-patch | 4.4 KB |
0007-Semantic-patch-to-use-stack-allocated-StringInfoData.v3.patch | text/x-patch | 4.4 KB |
0001-Add-initial-coccicheck-script.v3.patch | text/x-patch | 8.4 KB |
0002-Create-coccicheck-target-for-autoconf.v3.patch | text/x-patch | 9.1 KB |
From: | Mats Kindahl <mats(at)timescale(dot)com> |
---|---|
To: | Aleksander Alekseev <aleksander(at)timescale(dot)com> |
Cc: | PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net> |
Subject: | Re: Coccinelle for PostgreSQL development [1/N]: coccicheck.py |
Date: | 2025-03-10 07:08:09 |
Message-ID: | CA+14427LS9jKY2FuLwE5extZHK5DDzjFa_YG88yQ3k16WZ=How@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Sun, Mar 2, 2025 at 2:26 PM Mats Kindahl <mats(at)timescale(dot)com> wrote:
> On Sat, Jan 18, 2025 at 8:44 PM Mats Kindahl <mats(at)timescale(dot)com> wrote:
>
>> On Tue, Jan 14, 2025 at 4:19 PM Aleksander Alekseev <
>> aleksander(at)timescale(dot)com> wrote:
>>
>>> IMO the best solution would be re-submitting all the patches to this
>>> thread. Also please make sure the patchset is registered on the
>>> nearest open CF [1] This will ensure that the patchset is built on our
>>> CI (aka cfbot [2]) and will not be lost.
>>>
>>> [1]: https://commitfest.postgresql.org/
>>> [2]: http://cfbot.cputube.org/
>>>
>>>
>>
> Hi all,
>
> Here is a new set of patches rebased on the latest version of the
> Postgres. I decided to just include the semantic patches in each patch
> since it is trivial to generate the patch and build using:
>
>
> ninja coccicheck-patch | patch -d .. -p1 && ninja
>
>
> I repeat the description from the previous patch set and add comments
> where things have changed, but I have also added two semantic patches,
> which are described last.
>
> For those of you that are not aware of it: Coccinelle is a tool for
>> pattern matching and text transformation for C code and can be used for
>> detection of problematic programming patterns and to make complex,
>> tree-wide patches easy. It is aware of the structure of C code and is
>> better suited to make complicated changes than what is possible using
>> normal text substitution tools like Sed and Perl. I've noticed it's been
>> used at a few cases way back to fix issues.[1]
>>
>> Coccinelle have been successfully been used in the Linux project since
>> 2008 and is now an established tool for Linux development and a large
>> number of semantic patches have been added to the source tree to capture
>> everything from generic issues (like eliminating the redundant A in
>> expressions like "!A || (A && B)") to more Linux-specific problems like
>> adding a missing call to kfree().
>>
>> Although PostgreSQL is nowhere the size of the Linux kernel, it is
>> nevertheless of a significant size and would benefit from incorporating
>> Coccinelle into the development. I noticed it's been used in a few cases
>> way back (like 10 years back) to fix issues in the PostgreSQL code, but I
>> thought it might be useful to make it part of normal development practice
>> to, among other things:
>>
>> - Identify and correct bugs in the source code both during development
>> and review.
>> - Make large-scale changes to the source tree to improve the code based
>> on new insights.
>> - Encode and enforce APIs by ensuring that function calls are used
>> correctly.
>> - Use improved coding patterns for more efficient code.
>> - Allow extensions to automatically update code for later PostgreSQL
>> versions.
>>
>> To that end, I created a series of patches to show how it could be used
>> in the PostgreSQL tree. It is a lot easier to discuss concrete code and I
>> split it up into separate messages since that makes it easier to discuss
>> each individual patch. The series contains code to make it easy to work
>> with Coccinelle during development and reviews, as well as examples of
>> semantic patches that capture problems, demonstrate how to make large-scale
>> changes, how to enforce APIs, and also improve some coding patterns.
>>
>> The first three patches contain the coccicheck.py script and the
>> integration with the build system (both Meson and Autoconf).
>>
>> # Coccicheck Script
>>
>> It is a re-implementation of the coccicheck script that the Linux kernel
>> uses. We cannot immediately use the coccicheck script since it is quite
>> closely tied to the Linux source code tree and we need to have something
>> that both supports Autoconf and Meson. Since Python seems to be used more
>> and more in the tree, it seems to be the most natural choice. (I have no
>> strong opinion on what language to use, but think it would be good to have
>> something that is as platform-independent as possible.)
>>
>> The intention is that we should be able to use the Linux semantic patches
>> directly, so it supports the "Requires" and "Options" keywords, which can
>> be used to require a specific version of spatch(1) and add options to the
>> execution of that semantic patch, respectively.
>>
>
> I have added support for using multiple jobs similar to how "make -jN"
> works. This is also supported by the autoconf and ninja builds
>
>
>> # Autoconf support
>>
>> The changes to Autoconf modifies the configure.ac and related files (in
>> particular Makefile.global.in). At this point, I have deliberately not
>> added support for pgxs so extensions cannot use coccicheck through the
>> PostgreSQL installation. This is something that we can add later though.
>>
>> The semantic patches are expected to live in cocci/ directory under the
>> root and the patch uses the pattern cocci/**/*.cocci to find all semantic
>> patches. Right now there are no subdirectories for the semantic patches,
>> but this might be something we want to add to create different categories
>> of scripts.
>>
>> The coccicheck target is used in the same way as for the Linux kernel,
>> that is, to generate and apply all patches suggested by the semantic
>> patches, you type:
>>
>> make coccicheck MODE=patch | patch -p1
>>
>> Linux as support for a few more variables: V to set the verbosity, J to
>> use multiple jobs for processing the semantic patches, M to select a
>> different directory to apply the semantic patches to, and COCCI to use a
>> single specific semantic patch rather than all available. I have not added
>> support for this right now, but if you think this is valuable, it should be
>> straightforward to add.
>>
>> I used autoconf 2.69, as mentioned in configure.ac, but that generate a
>> bigger diff than I expected. Any advice here is welcome.
>>
>
> Using the parameter "JOBS" allow you to use multiple jobs, e.g.:
>
> make coccicheck MODE=patch JOBS=4 | patch -p1
>
>
>
>> # Meson Support
>>
>> The support for Meson is done by adding three coccicheck targets: one for
>> each mode. To apply all patches suggested by the semantic patches using
>> ninja (as is done in [2]), you type the following in the build directory
>> generated by Meson (e.g., the "build/" subdirectory).
>>
>> ninja coccicheck-patch | patch -p1 -d ..
>>
>> If you want to pass other flags you have to set the SPFLAGS environment
>> variable when calling ninja:
>>
>> SPFLAGS=--debug ninja coccicheck-report
>>
>
> If you want to use multiple jobs, you use something like this:
>
> JOBS=4 ninja coccicheck-patch | patch -d .. -p1
>
>
>
>> # Semantic Patch: Wrong type for palloc()
>>
>> This is the first example of a semantic patch and shows how to capture
>> and fix a common problem.
>>
>> If you use an palloc() to allocate memory for an object (or an array of
>> objects) and by mistake type something like:
>>
>> StringInfoData *info = palloc(sizeof(StringInfoData*));
>>
>> You will not allocate enough memory for storing the object. This semantic
>> patch catches any cases where you are either allocating an array of objects
>> or a single object that do not have corret types in this sense, more
>> precisely, it captures assignments to a variable of type T* where palloc()
>> uses sizeof(T) either alone or with a single expression (assuming this is
>> an array count).
>>
>> The semantic patch is overzealous in the sense that using the wrong
>> typedef will suggest a change (this can be seen in the patch). Although the
>> sizes of these are the same, it is probably be better to just follow the
>> convention of always using the type "T*" for any "palloc(sizeof(T))" since
>> the typedef can change at any point and would then introduce a bug.
>> Coccicheck can easily fix this for you, so it is straightforward to enforce
>> this. It also simplifies other automated checking to follow this convention.
>>
>> We don't really have any real bugs as a result from this, but we have one
>> case where an allocation of "sizeof(LLVMBasicBlockRef*)" is allocated to an
>> "LLVMBasicBlockRef*", which strictly speaking is not correct (it should be
>> "sizeof(LLVMBasicBlockRef)"). However, since they are both pointers, there
>> is no risk of incorrect allocation size. One typedef usage that does not
>> match.
>>
>> # Semantic Patch: Introduce palloc_array() and palloc_object() where
>> possible
>>
>> This is an example of a large-scale refactoring to improve the code.
>>
>> For PostgreSQL 16, Peter extended the palloc()/pg_malloc() interface in
>> commit 2016055a92f to provide more type-safety, but these functions are not
>> widely used. This semantic patch captures and replaces all uses of palloc()
>> where palloc_array() or palloc_object() could be used instead. It
>> deliberately does not touch cases where it is not clear that the
>> replacement can be done.
>>
>
> # Semantic Patch: replace code with pg_cmp_*
>
> This is an example of a large-scale refactoring to improve the code.
>
> In commit 3b42bdb4716 and 6b80394781c overflow-safe comparison functions
> were introduced, but they are not widely used. This semantic patch
> identifies some of the more common cases and replaces them with calls to
> the corresponding pg_cmp_* function.
>
> The patches give a few instructions of improvement in performance when
> checking with Godbolt. It's not much, but since it is so easy to apply
> them, it might still be worthwhile.
>
> # Semantic Patch: Replace dynamic allocation of StringInfo with
> StringInfoData
>
> Use improved coding patterns for more efficient code.
>
> This semantic patch replaces uses of StringInfo with StringInfoData where
> the info is dynamically allocated but (optionally) freed at the end of the
> block. This will avoid one dynamic allocation that otherwise has to be
> dealt with.
>
> For example, this code:
>
> StringInfo info = makeStringInfo();
> ...
> appendStringInfo(info, ...);
> ...
> return do_stuff(..., info->data, ...);
>
> Can be replaced with:
>
> StringInfoData info;
> initStringInfo(&info);
> ...
> appendStringInfo(&info, ...);
> ...
> return do_stuff(..., info.data, ...);
>
> It does not do a replacement in these cases:
>
> - If the variable is assigned to an expression. In this case, the
> pointer can "leak" outside the function either through a global variable or
> a parameter assignment.
> - If an assignment is done to the expression. This cannot leak the
> data, but could mean a value-assignment of a structure, so we avoid this
> case.
> - If the pointer is returned.
>
> The cases that this semantic patch fixed when I uploaded the first version
> of the other patches seems to have been dealt with, but having it as part
> of the code base prevents such cases from surfacing again.
>
>
>> [1]: https://coccinelle.gitlabpages.inria.fr/website/
>> [2]: https://www.postgresql.org/docs/current/install-meson.html
>>
> --
> Best wishes,
> Mats Kindahl, Timescale
>
Hi all,
There was a problem with the meson.build file causing errors in the build
farm (because spatch is not installed), so here is a new set of patches.
Only the one with the meson.build has changed, but I am unsure how patches
are picked up, so adding a new version of all files here.
--
Best wishes,
Mats Kindahl, Timescale
Attachment | Content-Type | Size |
---|---|---|
0001-Add-initial-coccicheck-script.v4.patch | text/x-patch | 8.4 KB |
0003-Add-meson-build-for-coccicheck.v4.patch | text/x-patch | 4.6 KB |
0004-Semantic-patch-for-sizeof-using-palloc.v4.patch | text/x-patch | 1.7 KB |
0002-Create-coccicheck-target-for-autoconf.v4.patch | text/x-patch | 9.1 KB |
0005-Semantic-patch-for-palloc_array-and-palloc_object.v4.patch | text/x-patch | 4.2 KB |
0007-Semantic-patch-to-use-stack-allocated-StringInfoData.v4.patch | text/x-patch | 4.4 KB |
0006-Semantic-patch-for-pg_cmp_-functions.v4.patch | text/x-patch | 4.4 KB |