Python EuroQSAR2008

The document discusses Python's suitability for computational chemistry and compares various cheminformatics toolkits available for Python. It recommends OEChem, RDKit, and CDK as capable toolkits for high-end users, and OpenBabel and RDKit as good options for more general use due to their community support and expanding feature sets.

Python for Computational Chemistry

Andrew Dalke <[email protected]>, Andrew Dalke Scientific AB, Göteborg, Sweden

(Wherein I describe how Python is the best choice of high-level programming language for computational chemistry.)

Timeline of cheminformatics toolkits*
*(runs on Unix and supports SMILES and SMARTS)

[Timeline figure. Axis: 1995 and earlier, then yearly from 1996 to 2008. Entries as labeled in the figure:]

Daylight: C and Fortran
DayPerl
DaySWIG: Tcl, Python and more
PyDaylight: higher-level Python API
frowns: Python; API based on PyDaylight
Babel: (not a library)
(OBabel) OELib: C++, +Python (third-party package)
OEChem: C++, +Python, +Java; +Ogham & Lexichem
OpenBabel: +Python, Perl; +Java, Ruby
Pybel: higher-level Python API
RDKit: C++/Python - internal library; public release on Sourceforge
cinfony: abstraction API
JOELib: for Java; API based on OELib
CDK: Java; part of JChemDraw

Legend: Is a wrapper. Developer moved between projects. Accessible from the C version of Python. Accessible from the Java version of Python (Jython).

* Python with COM extensions should work just fine with Accord SDK on Windows. I have no experience with it.

Guidelines

OEChem and its sister libraries for molecular modeling are fast, flexible, powerful and complete (except for fingerprints). It is designed for high-end users who know the nuances of cheminformatics. Expensive. My choice for C++, Java and Python.

CDK is the toolkit to use if you are on the JDK and OEChem is too pricey. It has a strong structure and structural biology component, close ties with 2D and 3D display programs, and integration with Bioclipse, Taverna, and Knime.

RDKit is relatively new and with a small user community. The software engineering skills are the best of the free projects. Includes 2D layout, 2D→3D, QSAR, forcefield, shape and machine learning components. Worth a look!

OpenBabel is the most community driven. Its strength is file format conversion, for both small molecules and biomolecules. It is expanding towards more modeling support, including several forcefield implementations. Often used as a test-bed for new algorithms. Code quality is variable, reflecting the diverse contributor base.

Do not use the Daylight toolkit for new code. It is expensive, there's very little new development, and you can get nearly all of its functionality elsewhere.

Answers to the question: “How do I do _____ in Python?”


Plotting

Use matplotlib. It produces great plots, is easy to use, and has a big support community.

    from pylab import *
    # data_helper is the helper module from the matplotlib examples
    from data_helper import get_daily_data
    intc, msft = get_daily_data()

    delta1 = diff(intc.open)/intc.open[0]

    # size in points ^2
    volume = (15*intc.volume[:-2]/intc.volume[0])**2
    close = 0.003*intc.close[:-2]/0.003*intc.open[:-2]
    scatter(delta1[:-1], delta1[1:], c=close, s=volume, alpha=0.75)

    ticks = arange(-0.06, 0.061, 0.02)
    xticks(ticks)
    yticks(ticks)
    xlabel(r'$\Delta_i$', fontsize=20)
    ylabel(r'$\Delta_{i+1}$', fontsize=20)
    title('Volume and percent change')
    grid(True)

    show()

Scatterplot example from matplotlib

2D Depiction

Your chemists want ChemDraw.

But if you have to depict a structure you can use OEChem, RDKit or CDK. There's even BKChem, which is a 2D editor for Python, but I've never used it. If you're only interested in depictions then also consider the command-line depictors from Molinspiration and CACTVS.

[Depiction images, labeled: molinspiration, CACTVS, RDKit, OASA, CDK]

Thanks to Noel O'Boyle for the pointers and the images, from his blog article "Cheminformatics toolkit face-off - Depiction Part 2"

3D Structure visualization

Probably PyMol.

People have diverse personal preferences about their choice of structure viewer. There are so many, even restricting myself to those programmable in Python. The best general purpose choice is PyMol, but Chimera and VMD (for trajectory visualization) are also good.

[Image, labeled: PyMol]

MD and QM

In Python? Don't.

That's the realm of FORTRAN and C/C++. You'll not find much Python there. There's nMOLDYN, BALLView and PyQuante but you'll probably want one of the more well-known and used programs.

Matrix math, Fourier transforms, ODEs, and other deep math topics

Use numpy and scipy.

Clustering, SVM, and other machine learning. And R.

Some popular packages are Shogun, libsvm and PyCluster. Plenty are available; it's mostly a matter of finding a good quality one. But it seems the best code is in R (a programming language for doing math) and not Python. Solution? Use RPy to exchange data between Python and R.

[Screenshot of Shogun for Python]

Web applications

Use Django.

There are other options, like Zope and TurboGears, but if you don't already know about them then the answer is Django.

If you're doing Javascript programming, use jQuery. I also like MochiKit because it makes Javascript programming feel like Python.

Excel

If you're on Windows, try the win32com interface to control Excel. You can use it to open and read a spreadsheet for you, modify data in an existing spreadsheet, make charts, and more.

If you want to read an Excel "csv" file, use the "csv" module from Python's standard library.

If you're on unix and want to read an Excel file, try xlrd and then let me know. I've only read about it.

Databases

If you're developing a Django web application then use its ORM.

If you like writing SQL statements, use the DB-API library for your database of choice. (Got Oracle? Install cx_Oracle to connect to it from Python.)

If you like object interfaces, use SQLAlchemy.

Recent versions of Python include an in-process relational DBMS called SQLite. It's easy to use and it might make more sense to have a SQLite database in your project instead of a set of flat files.

GUI Programming

It seems that all my clients these days want web applications so I don't have much experience here.

There are two main choices: wxPython and Qt. I like the Qt API and functionality better. I've known people who used wxPython and complained about how the APIs are always in flux, especially for the spreadsheet table.

On the other hand, the PyChem developers (a multivariate analysis package, shown below) used wxPython and found it very useful.

[Screenshot, labeled: PyChem]

Command-line wrappers

Use the subprocess module.

And likely a bunch of parsing, guess work, error-handling, and perhaps a dash of evil genius. But start with subprocess.

Interfaces to compiled libraries

My favorite is ctypes, which lets Python call C libraries directly without compiling glue code. It needs some hand-written code but at least it's Python code.

SWIG is best if you have a lot of code and want to automate building the interface, or if you want bindings to multiple languages. OEChem and OpenBabel use SWIG.

Boost.Python helps make interfaces to C++ code. It's hand-written code, in C++, but it does a very good job converting between C++ and Python expectations. RDKit uses Boost.Python.

.Net or Silverlight

You're cutting edge, aren't you?

IronPython is an implementation of Python for .Net, and it's being developed by Microsoft. However, the chemistry toolkits haven't caught up with you. I haven't heard of any .Net libraries with cheminformatics support. Word on the street is that OpenBabel is working towards it.

You might be lucky and get IronClad to work. It's a way for IronPython code to call CPython extensions.

Training

I teach Python courses for cheminformatics, designed for working scientists who want to get better training in that aspect of their research. If you are interested, email me or see:

http://dalkescientific.com/training/
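The SQLite suggestion in the Databases answer can be sketched roughly as follows. This is a minimal illustration, not code from the poster: the table layout, the function names (build_compound_db, names_lighter_than) and the three sample records are all assumptions made for the example. It uses only the sqlite3 module from Python's standard library.

```python
import sqlite3


def build_compound_db(rows):
    """Create an in-memory SQLite database of (name, smiles, mol_weight) rows.

    The schema is hypothetical; pass a filename to sqlite3.connect()
    instead of ":memory:" to keep the data on disk.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE compounds ("
        "name TEXT PRIMARY KEY, smiles TEXT, mol_weight REAL)"
    )
    # Parameterized inserts avoid hand-built SQL strings.
    conn.executemany("INSERT INTO compounds VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def names_lighter_than(conn, max_weight):
    """Return compound names with mol_weight below max_weight, sorted by name."""
    cursor = conn.execute(
        "SELECT name FROM compounds WHERE mol_weight < ? ORDER BY name",
        (max_weight,),
    )
    return [name for (name,) in cursor]


# Illustrative records that might otherwise live in a flat file.
rows = [
    ("ethanol", "CCO", 46.07),
    ("benzene", "c1ccccc1", 78.11),
    ("caffeine", "Cn1cnc2c1c(=O)n(C)c(=O)n2C", 194.19),
]
conn = build_compound_db(rows)
light = names_lighter_than(conn, 100.0)
print(light)
```

Compared with a set of flat files, the query replaces a hand-written loop over parsed lines, and the same SQL carries over to a larger database through its DB-API module.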

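The "start with subprocess" advice for command-line wrappers might look something like the sketch below. Everything here is an assumption for illustration: the helper names (run_tool, parse_counts) are hypothetical, and the external "tool" is a stand-in that uses the Python interpreter itself to echo each input SMILES with its string length, so the example runs without any chemistry program installed. It relies on subprocess.run, which needs Python 3.7 or later for capture_output.

```python
import subprocess
import sys


def run_tool(args, input_text=""):
    """Run an external program, feed it input_text on stdin, return its stdout.

    Raises RuntimeError with the captured stderr if the exit code is non-zero.
    """
    result = subprocess.run(
        args, input=input_text, capture_output=True, text=True
    )
    if result.returncode != 0:
        raise RuntimeError(
            f"{args[0]} exited with {result.returncode}: {result.stderr.strip()}"
        )
    return result.stdout


def parse_counts(output):
    """Parse lines of the form '<smiles> <integer>' into a dict."""
    counts = {}
    for line in output.splitlines():
        if not line.strip():
            continue  # tolerate blank lines in tool output
        smiles, value = line.split()
        counts[smiles] = int(value)
    return counts


# Stand-in for a real command-line program (hypothetical behavior):
# prints each non-empty input line followed by its length.
STAND_IN_TOOL = [
    sys.executable,
    "-c",
    "import sys\n"
    "for line in sys.stdin:\n"
    "    s = line.strip()\n"
    "    if s:\n"
    "        print(s, len(s))\n",
]

counts = parse_counts(run_tool(STAND_IN_TOOL, "CCO\nc1ccccc1\n"))
print(counts)
```

The parsing and error handling are where the guess work tends to go; a real wrapper would swap the stand-in command for the actual program and adjust parse_counts to that program's output format.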