Python For Chemistry in 21 Days: Minutes
Word of warning!!
My mental eye could now distinguish larger structures, of
manifold conformation; long rows, sometimes more closely
fitted together; all twining and twisting in snakelike motion.
But look! What was that? One of the snakes had seized
hold of its own tail, and the form whirled mockingly before
my eyes. As if by the flash of lightning I awoke... Let us learn
to dream, gentlemen.
Friedrich August Kekulé (1829-1896)
What is Python?
For a computer scientist...
a high-level programming language
interpreted (byte-compiled)
dynamic typing
object-oriented
For everyone else...
a scripting language (like Perl or Ruby) released by
Guido van Rossum in 1991
easy to learn
easy to read (!)
named after Cambridge comedians
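As a small illustration of "dynamic typing" and "object-oriented" from the list above, here is a minimal sketch. The Molecule class and its describe method are hypothetical names made up for this example, and the g/mol unit is an assumption about how the weight is expressed.

```python
# Dynamic typing: the same name can be rebound to values of different types.
value = 60.0995          # a float
value = "60.0995"        # now a string; no declaration is needed
weight = float(value)    # explicit conversion back to a number


class Molecule(object):
    """Hypothetical minimal class, for illustration only."""

    def __init__(self, name, weight):
        self.name = name
        self.weight = weight

    def describe(self):
        # Format the name and the weight rounded to two decimals
        return "%s (%.2f g/mol)" % (self.name, self.weight)


mol = Molecule("1,2-Diaminoethane", weight)
summary = mol.describe()
print(summary)
```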
The Great Debate
Sir Lancelot:
We were in the nick of time. You were in great Perl.
Sir Galahad:
I don't think I was.
Sir Lancelot:
You were, Sir Galahad. You were in terrible Perl.
Sir Galahad:
Look, let me go back in there and face the Perl.
Sir Lancelot:
No, it's too perilous.
At least one object-oriented programming language, e.g., Python, C++, Java.
Web-based application development (design/construction/maintenance)
UNIX, UNIX scripting & Linux OS
linear regression
(and more)
pylab
pychem: Using scipy for chemoinformatics
http://www.redbrick.dcu.ie/~noel/RversusPython.html
Python and R
rpy module allows Python programs to interface with R
have the best of both worlds
access to the statistical functions of R
access to the numerous modules available for Python
can program in Python, instead of in R!!
Python
>>> from rpy import r
>>> x = [5.05, 6.75, 3.21, 2.66]
>>> y = [1.65, 26.5, -5.93, 7.96]
>>> print r.lsfit(x,y)['coefficients']
{'X': 5.3935773611970212, 'Intercept': -16.281127993087839}
R
> x <- c(5.05, 6.75, 3.21, 2.66)
> y <- c(1.65, 26.5, -5.93, 7.96)
> lsfit(x, y)$coefficients
Intercept         X
-16.281128  5.393577
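To see where the two coefficients come from, the same straight-line fit can be sketched in plain Python using the textbook least-squares formulas. This is not how rpy or lsfit computes the fit internally, and least_squares_line is a hypothetical helper name; the x and y values are the ones from the slide.

```python
def least_squares_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = float(len(xs))
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of x-y cross products
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


x = [5.05, 6.75, 3.21, 2.66]
y = [1.65, 26.5, -5.93, 7.96]
intercept, slope = least_squares_line(x, y)
print("Intercept: %.6f  X: %.6f" % (intercept, slope))
```

The printed values should agree with the 'Intercept' and 'X' entries returned by lsfit.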
Python and R
Problem: Analyse a hierarchical clustering
Solution: Use R to cluster, and Python to analyse the merge object of the cluster
> hc$merge
      [,1] [,2]
 [1,]  -32  -33
 [2,]  -39  -71
 [3,]  -43  -47
 [4,]  -10  -55
 [5,]  -19  -36
 [6,]   -5  -24
 [7,]  -62  -63
 [8,]  -74  -75
 [9,]  -35  -76
[10,]   -1  -84
[11,]  -41  -42
[12,]  -83  -96
[13,]   -2  -29
[14,]   -7  -21
[15,]  -61    4
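One way the Python side of this analysis might look is sketched below. It relies on the convention R's hclust uses for the merge matrix: a negative entry -k stands for observation k, and a positive entry s stands for the cluster formed at step s. The function name clusters_from_merge is hypothetical, and the merge rows are typed in by hand from the output above rather than fetched through rpy.

```python
def clusters_from_merge(merge):
    """Return the members of the cluster created at each merge step.

    merge is a list of (a, b) pairs in R's hclust convention:
    -k is observation k, s > 0 is the cluster created at step s (1-based).
    """
    clusters = []
    for a, b in merge:
        members = []
        for entry in (a, b):
            if entry < 0:
                members.append(-entry)               # a single observation
            else:
                members.extend(clusters[entry - 1])  # an earlier cluster
        clusters.append(sorted(members))
    return clusters


merge = [(-32, -33), (-39, -71), (-43, -47), (-10, -55), (-19, -36),
         (-5, -24), (-62, -63), (-74, -75), (-35, -76), (-1, -84),
         (-41, -42), (-83, -96), (-2, -29), (-7, -21), (-61, 4)]
clusters = clusters_from_merge(merge)
print(clusters[14])   # members of the cluster formed at step 15
```

Step 15 is the first row with a positive entry: it joins observation 61 to the cluster formed at step 4.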
Problem 1
Graphically show the distribution of molecular weights of
molecules in an SD file. The molecular weight is stored in
a field of the SD file.
1,2-Diaminoethane
MOE2004 3D
6 3 0 0 0 0 0 0 0 0999 V2000
-0.6900 -0.6620 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.5850 -1.9590 0.8240 H 0 0 0 0 0 0 0 0 0 0 0 0
0.5350 1.5040 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0
0.6060 2.0500 0.8460 H 0 0 0 0 0 0 0 0 0 0 0 0
1.3040 0.8520 -0.0460 H 0 0 0 0 0 0 0 0 0 0 0 0
1 2 1 0 0 0 0
1 3 1 0 0 0 0
1 4 1 0 0 0 0
M END
> <chem.name>
1,2-Diaminoethane
> <molecular.weight>
60.0995
$$$$
Object-oriented approach
object: SD file
attributes: fields, name, atoms
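A minimal sketch of what such an object could look like follows. SDRecord is a hypothetical class, not the module used in the talk, and the parsing is a tolerant heuristic rather than a strict SD file reader. The sample text is abridged from the 1,2-Diaminoethane entry shown earlier.

```python
class SDRecord(object):
    """Hypothetical container for one SD file entry (illustration only)."""

    def __init__(self, text):
        lines = text.splitlines()
        self.name = lines[0].strip()   # the name is the first line
        self.atoms = []                # element symbols, in file order
        self.fields = {}               # data fields, keyed by field name
        in_molfile = True
        for i, line in enumerate(lines):
            parts = line.split()
            if parts == ["M", "END"]:
                in_molfile = False     # connection table is finished
            elif in_molfile and i > 0 and len(parts) >= 4 and parts[3].isalpha():
                self.atoms.append(parts[3])   # heuristic: x y z symbol ...
            elif line.startswith("> <") and i + 1 < len(lines):
                key = line[line.index("<") + 1:line.rindex(">")]
                self.fields[key] = lines[i + 1].strip()


sample = """1,2-Diaminoethane
MOE2004 3D
6 3 0 0 0 0 0 0 0 0999 V2000
-0.6900 -0.6620 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.5350 1.5040 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0
1 2 1 0 0 0 0
M END
> <chem.name>
1,2-Diaminoethane
> <molecular.weight>
60.0995
$$$$"""

record = SDRecord(sample)
print(record.name, record.atoms, record.fields["molecular.weight"])
```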
Solution
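from rpy import r

inputfile = "mddr_complete.sd"
allmolweights = []
# The loop that reads each molecular.weight value into allmolweights
# is not shown on the slide.
r.png(file="molwt_r.png")
r.hist(allmolweights, xlab="Mol. weights", main="MDDR", col="red")
r.dev_off()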
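The step that fills allmolweights does not appear in the text above. One possible way to do it in plain Python is sketched here, assuming each field value sits on the line directly after its "> <field>" header. The helper names field_values and histogram_counts are hypothetical, the sample keeps only the data fields of each record, and two of the three weights are made-up values for illustration.

```python
def field_values(sd_text, field):
    """Yield the value of one data field for every record that has it."""
    tag = "<%s>" % field
    lines = sd_text.splitlines()
    for i, line in enumerate(lines[:-1]):
        if line.startswith(">") and tag in line:
            yield lines[i + 1].strip()   # value is on the following line


def histogram_counts(values, bin_width):
    """Count values per bin; keys are the lower edge of each bin."""
    counts = {}
    for value in values:
        start = int(value // bin_width) * bin_width
        counts[start] = counts.get(start, 0) + 1
    return counts


sample = "\n".join([
    "> <molecular.weight>", "60.0995", "$$$$",   # from the slide
    "> <molecular.weight>", "75.5", "$$$$",      # hypothetical value
    "> <molecular.weight>", "151.2", "$$$$",     # hypothetical value
])
allmolweights = [float(v) for v in field_values(sample, "molecular.weight")]
counts = histogram_counts(allmolweights, 50)
print(sorted(counts.items()))
```

For a real file, sd_text would be the contents of the SD file read from disk, and allmolweights could then be handed to r.hist as in the solution.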
Problem 2
Every molecule in an SD file is missing the name. To be
compatible with proprietary program X, we need to set the
name equal to the value of the field chem.name.
(MISSING NAME!)
MOE2004 3D
6 3 0 0 0 0 0 0 0 0999 V2000
-0.6900 -0.6620 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.5850 -1.9590 0.8240 H 0 0 0 0 0 0 0 0 0 0 0 0
0.5350 1.5040 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0
0.6060 2.0500 0.8460 H 0 0 0 0 0 0 0 0 0 0 0 0
1.3040 0.8520 -0.0460 H 0 0 0 0 0 0 0 0 0 0 0 0
1 2 1 0 0 0 0
1 3 1 0 0 0 0
1 4 1 0 0 0 0
M END
> <chem.name>
1,2-Diaminoethane
> <molecular.weight>
60.0995
$$$$
Solution
inputfile = open("mddr_complete.sd")
outputfile = open("mddr_withnames.sd", "w")
# The loop that copies each molecule and sets its name from chem.name
# is not shown on the slide.
inputfile.close()
outputfile.close()
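The renaming step itself is not shown above. A minimal sketch of one way to do it on the text of an SD file follows, assuming the name belongs on the first line of each record and records end with a "$$$$" line. The function name set_names_from_field is hypothetical, and the sample record is cut down to a few lines.

```python
def set_names_from_field(sd_text, field="chem.name"):
    """Return SD text where line 1 of each record holds the field value."""
    tag = "<%s>" % field
    out, record = [], []
    for line in sd_text.splitlines():
        record.append(line)
        if line.strip() == "$$$$":               # end of one record
            for i, rec_line in enumerate(record[:-1]):
                if rec_line.startswith(">") and tag in rec_line:
                    record[0] = record[i + 1].strip()
                    break
            out.extend(record)
            record = []
    out.extend(record)                           # trailing lines, unchanged
    return "\n".join(out) + "\n"


sample = "\n".join([
    "",                      # the missing name
    "MOE2004 3D",
    "M END",
    "> <chem.name>",
    "1,2-Diaminoethane",
    "$$$$",
])
fixed = set_names_from_field(sample)
print(fixed.splitlines()[0])
```

Reading the input file into a string and writing the returned text to the output file would complete the solution.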
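# Jython; the import paths assume the CDK jar is on the classpath
import java.io
from org.openscience import cdk
from org.openscience.cdk.io import CMLReader
from org.openscience.cdk.ringsearch import SSSRFinder

def getNumRings(molXmlValue):
    # Convert the CML string to a CDK molecule
    reader = CMLReader(java.io.StringReader(molXmlValue))
    chemFile = reader.read(cdk.ChemFile())
    cdkMol = chemFile.getChemSequence(0).getChemModel(0).getSetOfMolecules().getMolecule(0)
    # Calculate the number of rings (smallest set of smallest rings)
    sssrFinder = SSSRFinder(cdkMol)
    sssr = sssrFinder.findSSSR().size()
    return sssr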
3D visualisation
VTK (Visualization Toolkit) from Kitware
open source, freely available
scalar, tensor, vector and volumetric methods
advanced modeling techniques such as implicit modelling,
polygon reduction, mesh smoothing, cutting, contouring, and
Delaunay triangulation
MayaVi
easy-to-use GUI for VTK, written in Python
can create input files and visualise them using Python scripts
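As a rough sketch of "creating input files from Python", the function below builds a small file in the legacy ASCII VTK format (POLYDATA with points and lines), which VTK-based viewers can generally read. The function name vtk_polydata is hypothetical. The two points are the C and N coordinates from the SD example earlier, and the single line stands for the bond between them.

```python
def vtk_polydata(title, points, bonds):
    """Return legacy-format VTK text with atoms as points, bonds as lines."""
    lines = ["# vtk DataFile Version 3.0", title, "ASCII",
             "DATASET POLYDATA", "POINTS %d float" % len(points)]
    lines += ["%.4f %.4f %.4f" % tuple(p) for p in points]
    # Each line cell is "2 i j": two point indices, counted from zero
    lines.append("LINES %d %d" % (len(bonds), 3 * len(bonds)))
    lines += ["2 %d %d" % (i, j) for i, j in bonds]
    return "\n".join(lines) + "\n"


points = [(-0.6900, -0.6620, 0.0000),   # C
          (0.5350, 1.5040, 0.0000)]     # N
bonds = [(0, 1)]
vtk_text = vtk_polydata("1,2-Diaminoethane fragment", points, bonds)
print(vtk_text)
```

Writing vtk_text to a file with a .vtk extension gives something that can be opened in a viewer.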
Demo
Python Resources
http://www.python.org
Guido's Tutorial
http://www.python.org/doc/current/tut/tut.html
O'Reilly's Learning Python or Visual Quickstart Guide to Python
Make sure it's Python 2.3 or 2.4 though
For Windows, consider the Enthought edition
http://www.enthought.com/