Drop YAJL dependency #3308

Open
mikelolasagasti opened this issue Nov 26, 2024 · 19 comments
Labels
2.x Related to ModSecurity version 2.x 3.x Related to ModSecurity version 3.x

Comments

@mikelolasagasti

The yajl library has been unmaintained upstream[1] since 2015. The last published release contains multiple CVEs (CVE-2023-33460, CVE-2022-24795, CVE-2017-16516), and fixes have had to be carried by downstream distributions and third parties. This is not an ideal situation. While a fork exists[2], it is unclear how widely adopted or blessed it is downstream.

As the maintainer of libmodsecurity for Fedora[3] and EPEL, I recently found that the yajl library will not be shipped with RHEL 10. This means a new maintainer would be required to add it to EPEL. The previous maintainer for RHEL & EPEL has recommended moving away from yajl[4].

Currently, libmodsecurity can be built without yajl, but I understand that making it mandatory is considered desirable, as per #3144 and #3151.

Given the security concerns and the upstream status of yajl, I recommend opening a discussion on dropping yajl as a dependency and exploring alternative JSON libraries, such as JSON-C, which is actively maintained and more widely adopted.

[1] https://github.com/lloyd/yajl
[2] https://github.com/robohack/yajl/
[3] https://src.fedoraproject.org/rpms/libmodsecurity
[4] https://lists.fedoraproject.org/archives/list/[email protected]/message/YPFHPOKAND3RZR7ZKWTDHUQEESG6IUJ3/

@mikelolasagasti mikelolasagasti added the 3.x Related to ModSecurity version 3.x label Nov 26, 2024
@airween
Member

airween commented Nov 26, 2024

Agree, we should change.

What do you think about the "official" YAML library?

@airween airween added the 2.x Related to ModSecurity version 2.x label Nov 26, 2024
@airween
Member

airween commented Nov 26, 2024

Note, I added label [2.x] too, because mod_security2 is also affected.

@dune73
Member

dune73 commented Nov 26, 2024

This does not sound good. Thank you for bringing this to our attention, @mikelolasagasti.

@berrange

What do you think about the "official" YAML library?

I'm the Fedora yajl maintainer quoted in the link above. While YAML may be a superset of JSON, I'd not recommend using a YAML parser instead of a native JSON parser. It'll perform worse, and potentially open the code up to attacks that are impossible with pure JSON (aka the "Billion laughs attack" in YAML context). I can vouch for using json-c as an alternative to yajl, as that's what was successfully adopted in projects I work on. Be slightly cautious about jansson - when we previously tried it, we found it was not able to cope with the full numeric range of the uint64 type - if that's important to modsecurity's needs.
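
The uint64 caveat above can be illustrated without any JSON library. The sketch below only shows why a parser that stores integers in a signed 64-bit type cannot represent the upper half of the uint64 range; it does not test or describe the current behavior of jansson or any other library. The function names `fits_in_int64` and `parse_uint64` are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Returns 1 if the decimal text fits in a signed long long (64-bit on
 * common platforms), 0 otherwise. */
int fits_in_int64(const char *text)
{
    char *end = NULL;
    assert(text != NULL);
    errno = 0;
    (void)strtoll(text, &end, 10);
    return errno != ERANGE && end != text && *end == '\0';
}

/* Returns 1 and stores the value if the text is a non-negative decimal
 * that fits in uint64_t, 0 otherwise. */
int parse_uint64(const char *text, uint64_t *out)
{
    char *end = NULL;
    unsigned long long v;
    assert(text != NULL && out != NULL);
    if (text[0] == '-')
        return 0;               /* strtoull would silently negate the value */
    errno = 0;
    v = strtoull(text, &end, 10);
    if (errno == ERANGE || end == text || *end != '\0')
        return 0;
    *out = (uint64_t)v;
    return 1;
}
```

With these helpers, the string "18446744073709551615" (UINT64_MAX) is rejected by the signed check but accepted by the unsigned one, which is the kind of value worth including in any comparison of candidate libraries.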

@airween
Member

airween commented Nov 26, 2024

Hi @berrange,

While YAML may be a superset of JSON, I'd not recommend using a YAML parser instead of a native JSON parser.

You're right, that was my mistake - sorry, and thank you.

@dune73
Member

dune73 commented Nov 26, 2024

Thank you for this very valuable input @berrange. That puts the replacement discussion on a far better base.

@marcstern

These are the libraries that are reported (by their authors and third-parties) as the fastest:

It would be interesting to check their functionalities for our intended use:

  • validation
  • callback to store ARGS (or should we avoid that and use only the resulting JSON objects?)
  • memory footprint

@dune73
Member

dune73 commented Nov 27, 2024

There are a few subtle behavioral quirks of yajl that should be examined in alternative libraries too, and then documented, just in case.

The behavior with an empty request body, for example. Off the top of my head, I am no longer sure whether yajl rejects that and other parsers are OK with it, or whether it's the other way around. But I've seen this problem in production before.

@marcstern

This would definitely need extensive test cases
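
One possible shape for such test cases is a small table of edge-case bodies that is run through two parsers, counting where they disagree. This is only a sketch under assumptions: `json_check_fn`, `count_disagreements`, `lenient_check` and `strict_check` are hypothetical names, and the two checkers are toy stand-ins, not real parsers. In practice each would wrap a real library such as yajl or a candidate replacement.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* A checker returns 1 if it accepts the body as JSON, 0 if it rejects it. */
typedef int (*json_check_fn)(const char *body, size_t len);

/* Edge-case bodies worth pinning down before swapping parsers. */
static const char *const edge_cases[] = {
    "",      /* empty request body */
    "   ",   /* whitespace only */
    "{}",    /* empty object */
    "[]",    /* empty array */
    "null",  /* bare scalar at top level */
};
#define EDGE_CASE_COUNT (sizeof edge_cases / sizeof edge_cases[0])

/* Counts the edge cases on which two checkers disagree. */
size_t count_disagreements(json_check_fn a, json_check_fn b)
{
    size_t n = 0;
    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < EDGE_CASE_COUNT; i++) {
        size_t len = strlen(edge_cases[i]);
        if (a(edge_cases[i], len) != b(edge_cases[i], len))
            n++;
    }
    return n;
}

/* Toy stand-in: accepts anything, including an empty body. */
int lenient_check(const char *body, size_t len)
{
    (void)body;
    (void)len;
    return 1;
}

/* Toy stand-in: rejects bodies that contain no non-whitespace byte. */
int strict_check(const char *body, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (!isspace((unsigned char)body[i]))
            return 1;
    return 0;
}
```

Any nonzero result from `count_disagreements` marks a behavior difference that would need to be either documented or deliberately accepted before a switch.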

@airween
Member

airween commented Feb 12, 2025

Just a quick remark for this issue:

  • I started to write a small tool that can help compare different JSON implementations; it is still a work in progress
  • I've contacted the YAJL developer, but after (almost) two weeks there is still no reply
    • Perhaps we should contact other contributors and fork the YAJL repository, then fix the open issues...?

Other ideas?

@berrange

I've contacted the YAJL developer, but after (almost) two weeks there is still no reply

AFAICT they haven't replied to anyone asking about YAJL on github in many years. I see no notable github activity on their account in 10 years.

* Perhaps we should contact other contributors and fork YAJL repository, then fix the available issues...?

The thing about yajl forks is that there are already many to choose from, each with a random selection of changes, but none seem to gain critical mass/attention.

Personally I'd just recommend letting yajl fade out. There are already two actively maintained alternative plain C JSON libraries - jansson & json-c - json-c slightly more active, as well as glib-json for apps which happen to use GLib. Between them they ought to be able to satisfy JSON needs for C applications, unless there's an unavoidable need for historical bug compatibility with YAJL. Even bug compatibility is questionable since JSON ought to be used/consumed in a way that ensures it is interoperable across impls, so it doesn't feel like further investment in YAJL is a long term benefit.

@airween
Member

airween commented Feb 12, 2025

AFAICT they haven't replied to anyone asking about YAJL on github in many years. I see no notable github activity on their account in 10 years.

I saw that somewhere, but gave it a try anyway...

* Perhaps we should contact other contributors and fork YAJL repository, then fix the available issues...?

The thing about yajl forks is that there are already many to choose from, each with a random selection of changes, but none seem to gain critical mass/attention.

That's sad.

Personally I'd just recommend letting yajl fade out. There are already two actively maintained alternative plain C JSON libraries - jansson & json-c - json-c slightly more active, as well as glib-json for apps which happen to use GLib. Between them they ought to be able to satisfy JSON needs for C applications, unless there's an unavoidable need for historical bug compatibility with YAJL. Even bug compatibility is questionable since JSON ought to be used/consumed in a way that ensures it is interoperable across impls, so it doesn't feel like further investment in YAJL is a long term benefit.

If we have no other choice, we must switch, but honestly I'd like to avoid that. If we do switch, I do not see any reason to keep compatibility with old bugs; I agree with you.

And thanks for following this issue.

@airween
Member

airween commented Feb 13, 2025

Just a side note: it seems the Debian package maintainer made a fork and is trying to contribute the code. This is the upstream of the official Debian package.

@airween
Member

airween commented Mar 13, 2025

I did some research on this topic, and now I'm completely disappointed. Honestly, I think it's harder to make a good decision than I thought.

Here are the libraries that I checked.

| Library | Has GH organization | Language | Has SAX parser | Speed | Last commit | Last release | Has Debian package |
|---|---|---|---|---|---|---|---|
| json-c | | C | | slow (it uses traditional lexer and parser) | 2025.01.09 | 2024.09.15 | |
| simdjson | | C++ (no C extension, need to write a wrapper) | | extreme fast | 2025.03.06 | 2025.02.14 | |
| Jansson | | C | | slower than json-c | 2024.03.31 | 2021.09.09 | |
| cJSON | | C | | fast | 2024.09.23 | 2024.05.13 | |
| RapidJSON | | C++ (no C extension) | | fast | 2025.02.05 | 2016.08.25 | |
| yyjson | | C | | fast | 2025.03.12 | 2024.07.09 | ✅ (After Trixie) |
| Ultrajson | | C | | fast | 2025.03.01 | 2024.05.14 | |
| JSMN | | C | | fast | 2021.10.14 | 2019.06.23 | |
| Centijson | | C | | ? | 2024.01.20 | - | |
| cJSON | | C | | ? | 2024.09.23 | 2024.05.13 | |

I know the list is far from complete, but I have no more time. If anyone can do similar research, please send me the results and I will add them to this sheet.

Here is a good collection of available parsers, but I'm not sure that's a complete list.

Aspects that I took into account:

  • has a GH organization: perhaps this is a slightly odd criterion, but learning from the current situation (the YAJL developer does not respond to anyone, does not merge PRs, ...), I'd be happier if the project belonged to an organization
  • language: it would be nice if the library also supported the C language; if the library is written in C++ without native C support, we can write a wrapper, but then we bring in a C++ compiler as a dependency
  • has SAX parser: YAJL uses a SAX parser, and I think we should stay with this method, especially for huge payloads
  • speed: this is a bit of a vague point; I didn't find any exact figures and just asked ChatGPT, but I think we should keep it in mind
  • last commit and last release: I think this is obvious; these parameters show how active the project is
  • has Debian package: yes, I'm a Debian user, and I didn't have enough time to check whether other distributions have a package; but I think it's an important indicator whether the user can install dependencies from a package (especially since the reason for this issue is that the YAJL library no longer exists in the mentioned distribution).

If anyone has other aspects that we should consider, please leave a comment.

Any help to continue this research is welcome.

Update on 2025.03.27:

  • json-c does not support SAX parsing
  • fixed Ultrajson URL
  • added JSMN
  • added Centijson
  • added cJSON

Update on 2025.03.28:

  • Jansson does not support SAX parsing

Update on 2025.01.01:

  • yyjson has Debian package (after Trixie)

@airween
Member

airween commented Mar 31, 2025

To all: I've reviewed the available JSON parsers and organized them by my criteria - see this comment.

There are some important requirements that the candidate must meet:

  • it must be usable from native C - this is necessary for mod_security2; I don't want to write our own wrapper, because then the code would gain a new dependency, a C++ compiler
  • it must have a SAX parser - with a DOM parser, the processor builds the whole tree in memory; I think that would be a waste
  • it must be available as an operating system package - I think this needs no special explanation: the original reason for this issue was that a distribution removed the YAJL packages.
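
To make the SAX versus DOM distinction concrete, here is a minimal sketch in plain C of an event-driven scanner: it reports tokens to a callback as it walks the buffer and never builds a tree. This is an illustration only. It is not the API of YAJL or of any library listed above, it performs no grammar validation, and all names (`json_scan`, `json_event_cb`, `collect_stats`, and so on) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Event kinds emitted by the scanner. */
typedef enum { EV_OBJ_START, EV_OBJ_END, EV_ARR_START, EV_ARR_END,
               EV_STRING, EV_SCALAR } json_event;

/* Receives one event plus the raw token bytes (not NUL-terminated).
 * Returning 0 aborts the scan early. */
typedef int (*json_event_cb)(void *ctx, json_event ev,
                             const char *tok, size_t len);

static int is_delim(char c)
{
    return c == ' ' || c == '\t' || c == '\n' || c == '\r' ||
           c == ',' || c == ':' || c == '{' || c == '}' ||
           c == '[' || c == ']' || c == '"';
}

/* Streams events to cb without building a tree. Returns 1 on completion,
 * 0 if the callback aborted or a string is unterminated. */
int json_scan(const char *buf, size_t len, json_event_cb cb, void *ctx)
{
    size_t i = 0;
    assert(buf != NULL && cb != NULL);
    while (i < len) {
        char c = buf[i];
        if (c == '{' || c == '}' || c == '[' || c == ']') {
            json_event ev = c == '{' ? EV_OBJ_START : c == '}' ? EV_OBJ_END
                          : c == '[' ? EV_ARR_START : EV_ARR_END;
            if (!cb(ctx, ev, buf + i, 1)) return 0;
            i++;
        } else if (c == '"') {
            size_t start = ++i;                 /* skip the opening quote */
            while (i < len && buf[i] != '"')    /* step over escape pairs */
                i += (buf[i] == '\\' && i + 1 < len) ? 2 : 1;
            if (i >= len) return 0;             /* unterminated string */
            if (!cb(ctx, EV_STRING, buf + start, i - start)) return 0;
            i++;                                /* skip the closing quote */
        } else if (is_delim(c)) {
            i++;                                /* whitespace, ',' or ':' */
        } else {
            size_t start = i;                   /* number, true, false, null */
            do { i++; } while (i < len && !is_delim(buf[i]));
            if (!cb(ctx, EV_SCALAR, buf + start, i - start)) return 0;
        }
    }
    return 1;
}

/* Example consumer: counts string tokens and tracks nesting depth. */
typedef struct { int strings; int depth; int max_depth; } scan_stats;

int collect_stats(void *ctx, json_event ev, const char *tok, size_t len)
{
    scan_stats *s = ctx;
    (void)tok;
    (void)len;
    if (ev == EV_STRING) s->strings++;
    if (ev == EV_OBJ_START || ev == EV_ARR_START)
        if (++s->depth > s->max_depth) s->max_depth = s->depth;
    if (ev == EV_OBJ_END || ev == EV_ARR_END) s->depth--;
    return 1;
}
```

The consumer keeps only three integers of state regardless of payload size, which is the property that matters for huge request bodies; a DOM parser would instead allocate a node for every value before the caller sees anything.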

Based on the sheet above (inside the comment), there is only ONE JSON library which fits these requirements: JSMN. Unfortunately, that is, one could say, also an unmaintained codebase. The last commit was on 2021.10.14, and the last release was on 2019.06.23. This is very sad.

Actually, I don't see a solution. I reviewed all the libraries listed here (see the C libraries; you need to scroll down), but I couldn't find a usable one.

Meanwhile, I created a small tool which emulates the engines' behavior (it is closest to mod_security2's behavior). If anyone wants to try adding a new library, feel free to do so. And if anyone knows of a library that is missing from the list above and would be a good candidate, please share it and I will add it.

The tool is here:
https://github.com/digitalwave/jsonbench

Also, if you think I forgot to add a usable library above, please let me know.

@mikelolasagasti
Author

yyjson seems to have an option to read data using YYJSON_READ_INSITU without having to parse the whole DOM document. It's not SAX, but could it be close enough for modsecurity's needs?

Also, yyjson is packaged for Debian Testing and many other distributions:

  • https://tracker.debian.org/pkg/yyjson
  • https://repology.org/project/yyjson/versions

@mikelolasagasti
Author

Looks like the i3 project is also exploring alternatives to YAJL: i3/i3#6257

@airween
Member

airween commented Apr 1, 2025

yyjson seems to have an option to read data using YYJSON_READ_INSITU without having to parse the whole DOM document. It's not SAX, but could it be close enough for modsecurity's needs?

I have to check this, thanks. By the way, if you want to take a look and/or try it, feel free to add yyjson to the jsonbench tool.

Also, yyjson is packaged for Debian Testing and many other distributions:

* https://tracker.debian.org/pkg/yyjson
* https://repology.org/project/yyjson/versions

Ah, thanks - now I see that it was added in May of last year, so it's part of SID and testing (which is currently Trixie). That's a good sign.

Thanks again for your help!

@airween
Member

airween commented Apr 1, 2025

Looks like the i3 project is also exploring alternatives to YAJL: i3/i3#6257

Yes, and it seems they are trying yyjson too.

Thanks, now I think it's worth a try.
