Landrum StateOfTheToolkit
Greg Landrum
@[email protected]
@greg_landrum.bsky.social
What’s new in the last year?
Adoption / usage
Unlike with web apps or commercial software,
adoption is tricky to measure for open-source
tools, but let’s try.
Usage: Conda install counts (by operating system)
Last 12 months
Data collected using the
condastats package
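A minimal sketch of how per-platform counts like these could be pulled with condastats. It assumes the `pkg_platform` function from `condastats.cli` with its `start_month`, `end_month` and `monthly` arguments; the helper names `last_12_months` and `conda_installs_by_platform` are hypothetical and are not taken from the slides. The condastats call needs network access, so it is kept in a separate function with a lazy import.

```python
from datetime import date


def last_12_months(today):
    """Return (start_month, end_month) as 'YYYY-MM' strings covering
    the 12 complete calendar months before `today`."""
    # The last complete month is the one before the current month.
    end_year, end_month = today.year, today.month - 1
    if end_month == 0:
        end_year, end_month = end_year - 1, 12
    # Count months on a single axis, then step back 11 months.
    start_index = end_year * 12 + (end_month - 1) - 11
    start_year, start_month0 = divmod(start_index, 12)
    return (
        f"{start_year:04d}-{start_month0 + 1:02d}",
        f"{end_year:04d}-{end_month:02d}",
    )


def conda_installs_by_platform(package="rdkit", today=None):
    """Hypothetical wrapper: monthly conda download counts per platform.

    Assumes condastats is installed and the public package-download
    dataset is reachable; returns whatever condastats returns.
    """
    # Imported lazily so the date helper works without condastats.
    from condastats.cli import pkg_platform

    start, end = last_12_months(today or date.today())
    return pkg_platform(package, start_month=start, end_month=end, monthly=True)
```

Calling `conda_installs_by_platform()` would fetch the last 12 complete months for the `rdkit` package; the exact numbers shown on the slides are not reproduced here.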
Usage: Conda install counts (by operating system)
Usage: Conda install counts (by python version)
Usage: PyPI
rdkit-js usage:
Beyond download counts: what about other approaches for
looking at adoption?
Usage in other open-source projects (updated 2021)
● Shape-IT - shape-based alignment
● DockOnSurf - high-throughput code to find stable geometries for molecules on surfaces
● https://datamol.io/ - A Python library to intuitively manipulate molecules.
● Scopy - Python library for desirable HTS/VS database design
● ChEMBL Structure Pipeline - ChEMBL protocols used to standardise and salt strip molecules.
● FPSim2 - Simple package for fast molecular similarity searches.
● stk (docs, paper) - a Python library for building, manipulating, analyzing and automatic design of molecules.
● OpenFF - Open source approach for better force fields
● gpusimilarity - GPU implementation of fingerprint similarity searching
● Samson Connect - Software for adaptive modeling and simulation of nanosystems
● mol_frame - Chemical Structure Handling for Dask and Pandas DataFrames
● mmpdb 2.0 - matched molecular pair database generation and analysis
● CheTo - Chemical topic modeling
● OCEAN - web-tool for target-prediction of chemical structures which uses ChEMBL as datasource
● Coot - software for macromolecular model building, model completion and validation
● DeepChem - deep learning toolkit for drug discovery
● sdf2ppt - Reads an SDFile and displays molecules as image grid in powerpoint/openoffice presentation.
● chemfp
● PYPL - Simple cartridge that lets you call Python scripts from Oracle PL/SQL.
● WONKA - Tool for analysis and interrogation of protein-ligand crystal structures
● OOMMPPAA - Tool for directed synthesis and data analysis based on protein-ligand crystal structures
● chemicalite - SQLite integration for the RDKit
● django-rdkit - Django integration for the RDKit
● … more ...
Usage in online tools/resources
● ChEMBL
● ZINC
● Google Patents
● PDBe
● Enamine
● TeachOpenCADD
Disclaimer: this info is from public statements made by people associated with those projects.
I have almost certainly forgotten someone.
Usage in commercial tools
● Amazon Web Services
● Collaborative Drug Discovery
● Cresset Software
● Dalke Scientific Software
● Datagrok
● Glysade
● MedChemica
● NextMove Software
● Schrödinger
● SCM
● Wolfram Research
Disclaimer: this info is from public statements made by people from those companies.
I have almost certainly forgotten someone.
Other adoption measures
● Mailing lists: ~250 messages to
rdkit-discuss from 2022.09 - 2023.08
Community
The heart of any
successful open-source
project
Support
● Web searches
● Mailing list
● GitHub discussions
● Commercial support
Community support
GitHub community stats
Contributions to the GitHub issue tracker in the last year
AlanKerstjens Arch4ngel21 AttilaVM Boilermaker14 ChemRMB CreamyLong
DavidACosgrove Efim-Shats Hikoyu Hong-Rui JLVarjo JackFang0815 KrisVolkova
Leocontreas LiuCMU MariaDolotova OleinikovasV SPKorhonen StLeonidas UnixJunkie
ValeryPolyakov andresilvapimentel autodataming bddap ben-ikt bjonnh-work bp-kelley
bradakta bwolfe-benchling bzoracler cdvonbargen chloechow chmnk dangthatsright
davidegraff davidoskky diogomart eguidotti eloyfelix gayverjr gedeck giordano greglandrum
jasondbiggs jepdavidson jmyounk jones-gareth juius kienerj koalaaaaaaaaa kovalp
lavoisiermod lhyuen liushili0319 lounsbrough lpravda luwei0917 maclandrol mapengsen
mcneela mpagni12 oleksii-dukhno-bayer pablo-arantes peastman ptosco pwging13
rachelnwalker radchenkods rmrmg roccomoretti sagitter sakoht shortydutchie
sitanshubhunia spparel trallnag vfscalfani zpincus
How you can contribute/help: non-developers
● Use the code in your own projects and provide feedback:
■ Good bug reports
■ Ideas for improvements
■ Positive feedback via the mailing list/GitHub discussions
● Answer questions on the mailing list/GitHub discussions
● Improve the documentation
■ in-code documentation
■ the “Getting started in Python” book
■ the “RDKit Book” reference
■ the “Cookbook”
● Write blog posts (either your own or for the RDKit blog)
● Contribute interesting scripts/libraries for the Contrib
folder
● Pay someone else to work on RDKit code [1]
[1] It’s generally a good idea to check with Greg or one of the maintainers before adding significant new functionality.
Sustainability: the bus problem
https://commons.wikimedia.org/wiki/File:Postauto_susten.jpg
Sustainability: the bus problem
RDKit maintainers:
- Greg
- Brian Kelley (Relay Therapeutics)
- Ricardo Rodriguez Schmidt (Schrödinger)
- Paolo Tosco (Novartis)
Most frequent code contributors in the last year
Merged pull request contributors in the last year
Maintenance work in the last year
We started tracking maintenance/cleanup work with the
2019.09 release.
For the 2023.03 and 2023.09 releases, there have been >45
“cleanup” issues/PRs merged:
Greg Landrum 15
Paolo Tosco 13
Ric 5
David Cosgrove 3
Riccardo Vianello 2
github-actions[bot] 1
Vedran Miletić 1
Rocco Moretti 1
Juuso Lehtivarjo 1
Jonathan Bisson 1
Iren Azra Azra Coskun 1
Gareth Jones 1
Eisuke Kawashima 1
Dan N 1
Roadmap
Still, some parts of the way forward are pretty obvious...
Making sure all the pieces required to
build a good compound registration
system are there
Performance improvements
Taking big steps forward…
Some things are hard...
Technology changes (i.e. taking advantage of new C++ or
Python versions) are tricky: which operating systems/compilers
are people using?
… what we’re doing about it
Try to minimize hard external dependencies
Thinking about changing the RDKit release model
Motivation: make new functionality available sooner
Current:
● Feature releases twice a year, e.g. 2023.03
■ Possibly including backwards-incompatible changes
● Patch releases every 4-6 weeks, e.g. 2023.03.2
■ Only bug fixes, but these can still change results
Possible alternative:
● Major releases twice a year, e.g. 2023.09
■ Possibly including backwards-incompatible changes
● Minor releases every 4-6 weeks, e.g. 2023.09.2
■ Include bug fixes (can change results)
■ Include backwards-compatible new features
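As an illustration of what the version strings in either scheme encode, here is a small sketch that splits a release string such as 2023.03.2 into its year.month series and patch number. The helper names (`parse_release`, `release_series`, `crosses_series`) are hypothetical and are not part of the RDKit API. Under both models described above, only a move between series (e.g. 2023.03.x to 2023.09.x) may bring backwards-incompatible changes.

```python
import re

# Matches 'YYYY.MM' with an optional '.P' patch component.
_VERSION_RE = re.compile(r"^(\d{4})\.(\d{1,2})(?:\.(\d+))?")


def parse_release(version):
    """Split 'YYYY.MM[.P]' into (year, month, patch).

    `patch` is None when the string names only the series, e.g. '2023.03'.
    """
    match = _VERSION_RE.match(version)
    if match is None:
        raise ValueError(f"not a YYYY.MM[.P] release string: {version!r}")
    year, month, patch = match.groups()
    return int(year), int(month), (int(patch) if patch is not None else None)


def release_series(version):
    """Return the twice-yearly series a release belongs to, e.g. '2023.03'."""
    year, month, _ = parse_release(version)
    return f"{year:04d}.{month:02d}"


def crosses_series(old, new):
    """True if moving from `old` to `new` changes the release series,
    which is where backwards-incompatible changes may appear."""
    return release_series(old) != release_series(new)
```

For example, `crosses_series("2023.09.1", "2023.09.2")` is False, while `crosses_series("2023.03.2", "2023.09.2")` is True.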
State of the RDKit?