Simulating Bio Molecules With Python

Python and C serve as the basis for a molecular modeling toolkit called MMTK. MMTK provides standard molecular simulation techniques like molecular dynamics in a ready-to-use form and also allows for new techniques to be easily implemented. MMTK uses Python for its high-level components and C for computationally intensive low-level tasks to balance rapid development and efficient execution.

Uploaded by

Taty Mamy
Copyright
© Attribution Non-Commercial (BY-NC)

Simulating Biomolecules with Python

Category: Science
Keywords: Data Visualization, Biology, Computational Chemistry
Title: Simulating Biomolecules with Python
Author: Konrad Hinsen
Date: 2005-04-20
Website: http://dirac.cnrs-orleans.fr/MMTK/
Summary: Python and C serve as the basis for a molecular modeling toolkit.

Background
The Molecular Modeling Toolkit (MMTK) is an open source Python library for molecular
modeling and simulation with a focus on biomolecular systems, written in a mixture of
Python and C. It provides standard techniques such as Molecular Dynamics or normal
mode calculations in a ready-to-use form, but also provides a basis of low-level
operations on top of which new techniques can easily be implemented.

I started developing MMTK in 1996. I had some experience with mainstream simulation
packages for biomolecules that were written in Fortran and had their origins in the 1970s.
Those packages were too cumbersome to use and in particular to modify and extend.
Since my research work is focused on the development of new simulation techniques,
modifiability was a particularly important criterion.

Figure: Dynamic deformation of the chaperone protein GroEL, obtained with the MMTK-based
interactive program DomainFinder.

Characteristic features of biomolecular simulations that had to be taken into account are
the long execution times of some simulation techniques (several weeks are not
uncommon) and the complexity of the data structures describing biomolecules.

Choice of languages
The choice of Python plus C was made after an evaluation of various languages. I was
rapidly convinced that only a mixture of a high-level interpreted language and a CPU-
efficient compiled language could meet my seemingly conflicting requirements of rapid
development and efficient execution.

For the high-level part, Tcl was ruled out because it could not handle the complex data
structures required by the project. Perl was ruled out because of its unpleasant syntax
(this was of course a subjective choice), and because of its badly integrated OO
mechanism. Python scored high in readability, OO support, library support, and
integration with compiled languages. Moreover, Numerical Python had just been released
and was an important building block for my developments.

For the low-level part, Fortran 77 was eliminated because of its archaic character, lack of
memory management, and portability issues in C-Fortran interfacing. C++ was a
candidate, but ultimately not chosen because portability between compilers was still an
issue in 1996, and because I considered the benefits of C++ for the small amount of
compiled code in the project insufficient to compensate for the complexity of the
language.

Library architecture
The architecture of MMTK is clearly Python-driven. To the user, it presents itself as a
pure Python library. The C code in MMTK was written from scratch in the form of
Python extension modules that only handle the few time-critical aspects: evaluation of
interaction energies, and long-running iterative algorithms such as energy minimization
and Molecular Dynamics, which run without any Python-related overhead. Extensive use
is made of Numerical Python, LAPACK, and the netCDF library. MMTK provides multi-
threading support for shared memory parallel machines, and MPI-based parallelization
for distributed memory machines.
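
The split described above can be pictured with a small example. The following pure-Python sketch is not MMTK code: it is a minimal velocity-Verlet integrator for a one-dimensional harmonic oscillator, and the names harmonic_force, total_energy, and velocity_verlet are hypothetical. Its inner loop is the kind of long-running iteration that a toolkit such as MMTK would move into a C extension module, while the few lines that set up and call it would stay in Python.

```python
def harmonic_force(x, k=1.0):
    """Force on a particle in the potential U(x) = 0.5 * k * x**2."""
    return -k * x


def total_energy(x, v, mass=1.0, k=1.0):
    """Kinetic plus potential energy of the oscillator."""
    return 0.5 * mass * v * v + 0.5 * k * x * x


def velocity_verlet(x, v, force, mass=1.0, dt=0.01, steps=1000):
    """Advance position x and velocity v by `steps` velocity-Verlet steps."""
    f = force(x)
    for _ in range(steps):
        v += 0.5 * dt * f / mass  # first half-step velocity update
        x += dt * v               # full-step position update
        f = force(x)              # force at the new position
        v += 0.5 * dt * f / mass  # second half-step velocity update
    return x, v


if __name__ == "__main__":
    # High-level "script" part: choose initial conditions and run.
    x0, v0 = 1.0, 0.0
    x1, v1 = velocity_verlet(x0, v0, harmonic_force, dt=0.01, steps=1000)
    print("energy before:", total_energy(x0, v0))
    print("energy after: ", total_energy(x1, v1))
```

With these assumed parameters, the two printed energies should agree closely, which is the usual quick check that an integrator of this kind is behaving.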

The biggest part of MMTK is a set of classes that describe atoms and molecules and
manage a database of molecules and fragments. Biomolecules (proteins, DNA, and RNA)
are handled by subclasses of the generic Molecule class. Another important subset of
MMTK implements schemas for calculating interaction energies (called somewhat
incorrectly "force fields" in the simulation community). I/O-related code is the third pillar
of MMTK. It reads and writes a few popular file formats plus its own trajectory format
that is based on the netCDF format. Unlike other trajectory file formats, MMTK's
netCDF files are both binary (and thus compact) and portable between platforms,
and they moreover permit efficient access to nearly arbitrary subsets of the data.
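
The idea of handling biomolecules through subclasses of a generic molecule class can be sketched as follows. This is an illustration under stated assumptions, not MMTK's actual class definitions: the classes Atom, Molecule, and Protein below, their constructors, and their methods are all hypothetical and much simpler than a real toolkit would need.

```python
class Atom:
    """A point mass with a name and a Cartesian position."""

    def __init__(self, name, mass, position):
        self.name = name
        self.mass = mass
        self.position = tuple(position)


class Molecule:
    """Generic container of atoms."""

    def __init__(self, name, atoms):
        self.name = name
        self.atoms = list(atoms)

    def mass(self):
        return sum(a.mass for a in self.atoms)

    def center_of_mass(self):
        total = self.mass()
        return tuple(
            sum(a.mass * a.position[i] for a in self.atoms) / total
            for i in range(3)
        )


class Protein(Molecule):
    """Biomolecule specialization: atoms are grouped into residues."""

    def __init__(self, name, residues):
        # residues: list of (residue_name, [Atom, ...]) pairs
        self.residues = list(residues)
        atoms = [a for _, group in self.residues for a in group]
        super().__init__(name, atoms)

    def sequence(self):
        return [res_name for res_name, _ in self.residues]


if __name__ == "__main__":
    # Toy two-residue "protein" with one atom per residue.
    gly = ("GLY", [Atom("CA", 12.0, (0.0, 0.0, 0.0))])
    ala = ("ALA", [Atom("CA", 12.0, (2.0, 0.0, 0.0))])
    peptide = Protein("dipeptide", [gly, ala])
    print(peptide.sequence(), peptide.mass(), peptide.center_of_mass())
```

The point of the design is that generic operations such as mass() and center_of_mass() are written once on the base class, while the subclass only adds what is specific to biomolecules, here the residue sequence.
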

Figure: Snapshot from a Molecular Dynamics simulation of lysozyme in water, run with MMTK.

Modularity and extensibility were important design criteria. Algorithms, energy terms,
and specializations of the data types can be added without having to modify the MMTK
code. The design of MMTK as a library, rather than a closed program, is essential for
many applications.
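
One common way to obtain this kind of extensibility is to let the library sum over energy-term objects that share a small interface, so that user code can supply new terms. The sketch below assumes such a design for illustration only. EnergyTerm, HarmonicBond, ForceField, and HarmonicRestraint are hypothetical names, not MMTK classes, and the source does not say that MMTK is organized this way.

```python
import math


class EnergyTerm:
    """Interface: subclasses return an energy for a list of positions."""

    def energy(self, positions):
        raise NotImplementedError


class HarmonicBond(EnergyTerm):
    """U = 0.5 * k * (r - r0)**2 for the distance r between atoms i and j."""

    def __init__(self, i, j, r0, k):
        self.i, self.j, self.r0, self.k = i, j, r0, k

    def energy(self, positions):
        r = math.dist(positions[self.i], positions[self.j])
        return 0.5 * self.k * (r - self.r0) ** 2


class ForceField:
    """Sums whatever energy terms it is given."""

    def __init__(self, terms):
        self.terms = list(terms)

    def energy(self, positions):
        return sum(term.energy(positions) for term in self.terms)


# "User" code: a new term added without editing the classes above.
class HarmonicRestraint(EnergyTerm):
    """Tethers atom i to a fixed reference point with U = 0.5 * k * d**2."""

    def __init__(self, i, reference, k):
        self.i, self.reference, self.k = i, tuple(reference), k

    def energy(self, positions):
        d = math.dist(positions[self.i], self.reference)
        return 0.5 * self.k * d * d


if __name__ == "__main__":
    positions = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
    force_field = ForceField([
        HarmonicBond(0, 1, r0=1.0, k=2.0),
        HarmonicRestraint(0, reference=(0.0, 0.0, 1.0), k=4.0),
    ])
    print("total energy:", force_field.energy(positions))
```

Because ForceField only relies on the energy() method, the restraint defined in user code is treated exactly like the built-in bond term.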

An important aspect of biomolecular simulations is visualization. MMTK delegates this
task to external tools. Two visualization programs, VMD and PyMOL, are particularly
well integrated.

Most MMTK users access the library from simple Python scripts, but MMTK has also
been used as a basis for end-user programs with graphical user interfaces, such as
nMOLDYN and DomainFinder.

MMTK currently consists of about 18,000 lines of Python code, 12,000 lines of hand-
written C code, and some machine-generated C code. The majority of the code was
developed by one person over eight years as part of a research activity. Two modules,
some functions, and many ideas were contributed by the user community.

Practical experience
MMTK and other Python libraries have been the basis for all my research projects for ten
years. Many of these projects would not have been possible without the rapid prototyping
that is characteristic of Python. In methodological work, development and testing time is
essential: an idea that can be tried out in an afternoon will be tried out, whereas an idea
that requires a week of work for evaluation is often put aside.

As with all open source projects, the size of the MMTK user community can only be
estimated indirectly. The mailing list for MMTK users currently has 175 members, and
the scientific publication that describes MMTK to computational chemists has been cited
30 times.

About the author


Konrad Hinsen is a researcher in theoretical physics working for the French Centre
National de la Recherche Scientifique (CNRS). He was involved in the Numerical Python
project and is the author of ScientificPython, a general-purpose library of scientific
Python code.