57 Plant Metabolomics (158pages) Biotechnology in Agriculture and Forestry
Edited by
T. Nagata (Managing Editor)
H. Lörz
J. M. Widholm
Biotechnology in Agriculture and Forestry
Volumes already published and in preparation are listed at the end of this book.
Biotechnology in
Agriculture and Forestry 57
Plant Metabolomics
Edited by
K. Saito, R.A. Dixon, and L. Willmitzer
Series Editors
Professor Dr. Toshiyuki Nagata
University of Tokyo
Graduate School of Science
Department of Biological Sciences
7-3-1 Hongo, Bunkyo-ku
Tokyo 113-0033, Japan
Professor Dr. Horst Lörz
Universität Hamburg
Biozentrum Klein Flottbek
Zentrum für Angewandte Molekularbiologie der Pflanzen (AMP II)
Ohnhorststraße 18
22609 Hamburg, Germany
Professor Dr. Jack M. Widholm
University of Illinois
285A E.R. Madigan Laboratory
Department of Crop Sciences
1201 W. Gregory
Urbana, IL 61801, USA
Volume Editors
Professor Dr. Kazuki Saito
Chiba University
Graduate School of Pharmaceutical Sciences
Yayoi-cho 1-33, Inage-ku
Chiba 263-8522, Japan;
RIKEN Plant Science Center
Yokohama 230-0045, Japan
Professor Dr. Richard A. Dixon
Plant Biology Division
Samuel Roberts Noble Foundation
2510 Sam Noble Parkway
Ardmore, OK 73401, USA
Professor Dr. Lothar Willmitzer
Max Planck Institute of Molecular Plant Physiology
Am Mühlenberg 1
14476 Golm, Germany
ISSN 0934-943X
ISBN-10 3-540-29781-2 Springer Berlin Heidelberg New York
ISBN-13 978-3-540-29781-9 Springer Berlin Heidelberg New York
This work is subject to copyright. All rights reserved, whether the whole or part of the material is concerned,
specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction
on microfilm or in any other way, and storage in data banks. Duplication of this publication or parts thereof
is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current
version, and permission for use must always be obtained from Springer. Violations are liable for prosecution
under the German Copyright Law.
Springer is a part of Springer Science + Business Media
springer.com
© Springer-Verlag Berlin Heidelberg 2006
Printed in Germany
The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply,
even in the absence of a specific statement, that such names are exempt from the relevant protective laws and
regulations and therefore free for general use.
Editor: Dr. Dieter Czeschlik, Heidelberg, Germany
Desk Editor: Dr. Andrea Schlitzberger, Heidelberg, Germany
Cover design: design&production GmbH, Heidelberg, Germany
Typesetting and production: LE-TEX Jelonek, Schmidt & Vöckler GbR, Leipzig, Germany
Printed on acid-free paper
Preface
Section II Bioinformatics
II.3 Map Editor for the Atomic Reconstruction of Metabolism (ARM)
M. Arita, Y. Fujiwara, and Y. Nakanishi
1 Introduction
2 Definition of Metabolic Information
3 Metabolic Map Editor
4 Applications
5 Conclusions
References
6 Conclusions
References
L. Achnine
Plant Biology Division, Samuel Roberts Noble Foundation, 2510 Sam Noble
Parkway, Ardmore, OK 73401, USA
P. Ahiahonu
Phenomenome Discoveries Inc., 204–407 Downey Road, Saskatoon,
Saskatchewan, Canada S7N 4L8
M. Altaf-Ul-Amin
Department of Bioinformatics and Genomics, Graduate School of
Information Science, Nara Institute of Science and Technology (NAIST),
Takayama-cho 8916–5, Ikoma, Nara 630–0101, Japan
M. Arita
Department of Computational Biology, Graduate School of Frontier Sciences,
The University of Tokyo, 5–1–5 Kashiwanoha, Kashiwa, 277–8561 Japan,
e-mail: [email protected]
H. Asahi
Department of Bioinformatics and Genomics, Graduate School of
Information Science, Nara Institute of Science and Technology (NAIST),
Takayama-cho 8916–5, Ikoma, Nara 630–0101, Japan
M.H. Beale
The National Centre for Plant and Microbial Metabolomics, Rothamsted
Research, West Common, Harpenden, Herts. AL5 2JQ, UK,
e-mail: [email protected]
R.J. Bino
Plant Research International, P.O. Box 16, 6700 AA Wageningen, The
Netherlands
C. Böttcher
Leibniz Institute of Plant Biochemistry, Weinberg 3, 06120 Halle/Saale,
Germany
XIV List of Contributors
P.M. Bramley
School of Biological Sciences, Royal Holloway, University of London, Egham,
Surrey, TW20 0EX, UK
F. Carrari
Max Planck Institute of Molecular Plant Physiology, Am Mühlenberg 1,
14476, Golm, Germany
Y.H. Choi
Division of Pharmacognosy, Section Metabolomics, Institute of Biology,
Leiden University, P.O. Box 9502, 2300RA, Leiden, The Netherlands
S. Clemens
Leibniz Institute of Plant Biochemistry, Weinberg 3, 06120 Halle/Saale,
Germany, e-mail: [email protected]
R.B. Croteau
Institute of Biological Chemistry, Washington State University, Pullman, WA
99164, USA
T. Daskalchuk
Phenomenome Discoveries Inc., 204–407 Downey Road, Saskatoon,
Saskatchewan, Canada S7N 4L8, e-mail: [email protected]
B.E. Deavours
Plant Biology Division, Samuel Roberts Noble Foundation, 2510 Sam Noble
Parkway, Ardmore, OK 73401, USA
R.A. Dixon
Plant Biology Division, Samuel Roberts Noble Foundation, 2510 Sam Noble
Parkway, Ardmore, OK 73401, USA, e-mail: [email protected]
A.R. Fernie
Max Planck Institute of Molecular Plant Physiology, Am Mühlenberg 1,
14476, Golm, Germany, e-mail: [email protected]
H. Foerster
Carnegie Institution, Department of Plant Biology, 260 Panama Street,
Stanford, CA 94305, USA
M. Franz
Leibniz Institute of Plant Biochemistry, Weinberg 3, 06120 Halle/Saale,
Germany
P.D. Fraser
School of Biological Sciences, Royal Holloway, University of London, Egham,
Surrey, TW20 0EX, UK, e-mail: [email protected]
Y. Fujiwara
Department of Computational Biology, Graduate School of Frontier Sciences,
The University of Tokyo, 5–1–5 Kashiwanoha, Kashiwa, 277–8561 Japan
E. Fukusaki
Department of Biotechnology, Graduate School of Engineering, Osaka Univ,
2–1 Yamadaoka, Suita, 565–0871, Japan,
e-mail: [email protected]
I.A. Graham
CNAP, Department of Biology (Area 7), University of York, PO Box 373, York
YO10 5YW, UK
J. Gullberg
Umeå Plant Science Centre, Department of Forest Genetics and Plant
Physiology, Swedish University of Agricultural Sciences, SE-901 87 Umeå,
Sweden
R.D. Hall
Plant Research International, P.O. Box 16, 6700 AA Wageningen, The
Netherlands, e-mail: [email protected]
D. Heath
Phenomenome Discoveries Inc., 204–407 Downey Road, Saskatoon,
Saskatchewan, Canada S7N 4L8
M.Y. Hirai
RIKEN Plant Science Center, Suehiro-cho 1–7–22, Tsurumi-ku, Yokohama,
Kanagawa 230–0045, Japan
T. Hirayama
International Graduate School of Arts and Sciences, Yokohama City
University, 1-7-29 Suehiro, Tsurumi-ku, Yokohama, 230-0045 Japan
T. Ikegami
Department of Polymer Science and Engineering, Kyoto Institute of
Technology, Matsugasaki, Sakyo-ku, Kyoto, 606–8585, Japan
A.I. Johansson
Umeå Plant Science Centre, Department of Forest Genetics and Plant
Physiology, Swedish University of Agricultural Sciences, SE-901 87 Umeå, Sweden
P. Jonsson
Research Group for Chemometrics; Organic Chemistry, Department of
Chemistry, Umeå University, SE-901 87 Umeå, Sweden
K.-M. Oksman-Caldentey
VTT Biotechnology, P.O. Box 1500, 02044 VTT, Finland,
e-mail: Kirsi-Marja.Oksman@vtt.fi
S. Kanaya
Department of Bioinformatics and Genomics, Graduate School of
Information Science, Nara Institute of Science and Technology (NAIST),
Takayama-cho 8916–5, Ikoma, Nara 630–0101, Japan,
e-mail: [email protected]
R.E.B. Ketchum
Institute of Biological Chemistry, Washington State University, Pullman, WA
99164, USA, e-mail: [email protected]
J. Kikuchi
RIKEN Plant Science Center, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama,
230-0045 Japan, e-mail: [email protected]
H.K. Kim
Division of Pharmacognosy, Section Metabolomics, Institute of Biology,
Leiden University, P.O. Box 9502, 2300RA, Leiden, The Netherlands,
e-mail: [email protected]
J. Kopka
Max Planck Institute of Molecular Plant Physiology, Am Mühlenberg 1, 14476
Potsdam-Golm, Germany, e-mail: [email protected]
K. Kurokawa
Department of Bioinformatics and Genomics, Graduate School of
Information Science, Nara Institute of Science and Technology (NAIST),
Takayama-cho 8916–5, Ikoma, Nara 630–0101, Japan
T.R. Larson
CNAP, Department of Biology (Area 7), University of York, PO Box 373, York
YO10 5YW, UK, e-mail: [email protected]
B. Mehrotra
Virginia Bioinformatics Institute, Virginia Polytechnic Institute and State
University, Washington St., MC 0477, Blacksburg, Virginia 24061, USA
P. Mendes
Virginia Bioinformatics Institute, Virginia Polytechnic Institute and State
University, Washington St., MC 0477, Blacksburg, Virginia 24061, USA,
e-mail: [email protected]
T. Moritz
Umeå Plant Science Centre, Department of Forest Genetics and Plant
Physiology, Swedish University of Agricultural Sciences, SE-901 87 Umeå,
Sweden, e-mail: [email protected]
Y. Nakamura
Department of Bioinformatics and Genomics, Graduate School of
Information Science, Nara Institute of Science and Technology (NAIST),
Takayama-cho 8916–5, Ikoma, Nara 630–0101, Japan
Y. Nakanishi
Intec Web and Genome Informatics Corporation
M. Naoumkina
Plant Biology Division, Samuel Roberts Noble Foundation, 2510 Sam Noble
Parkway, Ardmore, OK 73401, USA
D. Ohta
Department of Plant Genes and Physiology, Graduate School of Agriculture
and Biological Sciences, Osaka Prefecture University, Gakuen-cho 1–1, Sakai,
Osaka 599–8531, Japan
M. Orešič
VTT Biotechnology, P.O. Box 1500, 02044 VTT, Finland
S.Y. Rhee
Carnegie Institution, Department of Plant Biology, 260 Panama Street,
Stanford, CA 94305, USA, e-mail: [email protected]
H. Rischer
VTT Biotechnology, P.O. Box 1500, 02044 VTT, Finland
K. Saito
Chiba University, Graduate School of Pharmaceutical Sciences, Yayoi-cho
1-33, Chiba 263-8522, Japan
RIKEN Plant Science Center, Suehiro-cho 1–7–22, Tsurumi-ku, Yokohama,
Kanagawa 230–0045, Japan, e-mail: [email protected]
N. Sakurai
Kazusa DNA Research Institute, 2–6–7 Kazusa-kamatari, Kisarazu, Chiba
292–0818, Japan
N. Schauer
Max Planck Institute of Molecular Plant Physiology, Am Mühlenberg 1,
14476, Golm, Germany
D. Scheel
Leibniz Institute of Plant Biochemistry, Weinberg 3, 06120 Halle/Saale,
Germany
D. Shibata
Kazusa DNA Research Institute, 2–6–7 Kazusa-kamatari, Kisarazu, Chiba
292–0818, Japan, e-mail: [email protected]
Y. Shinbo
New Energy and Industrial Technology Development Organization, Toshima,
Tokyo 170–6028, Japan
L.W. Sumner
The Samuel Roberts Noble Foundation, 2510 Sam Noble Parkway, Ardmore,
OK 73401, USA, e-mail: [email protected]
H. Suzuki
Kazusa DNA Research Institute, 2–6–7 Kazusa-kamatari, Kisarazu, Chiba
292–0818, Japan
N. Tanaka
Department of Polymer Science and Engineering, Kyoto Institute of
Technology, Matsugasaki, Sakyo-ku, Kyoto, 606–8585, Japan
C. Tissier
Carnegie Institution, Department of Plant Biology, 260 Panama Street,
Stanford, CA 94305, USA
T. Tohge
RIKEN Plant Science Center, Suehiro-cho 1–7–22, Tsurumi-ku, Yokohama,
Kanagawa 230–0045, Japan
T. Tokimatsu
Kazusa DNA Research Institute, 2–6–7 Kazusa-kamatari, Kisarazu, Chiba
292–0818, Japan
R.N. Trethewey
metanomics GmbH and metanomics Health GmbH, Tegeler Weg 33, 10589
Berlin, Germany, e-mail: [email protected]
J. Trygg
Research Group for Chemometrics; Organic Chemistry, Department of
Chemistry, Umeå University, SE-901 87 Umeå, Sweden
E. v. Roepenack-Lahaye
Leibniz Institute of Plant Biochemistry, Weinberg 3, 06120 Halle/Saale,
Germany
H.A. Verhoeven
Plant Research International, P.O. Box 16, 6700 AA Wageningen, The
Netherlands
R. Verpoorte
Division of Pharmacognosy, Section Metabolomics, Institute of Biology,
Leiden University, P.O. Box 9502, 2300RA, Leiden, The Netherlands
J.L. Ward
The National Centre for Plant and Microbial Metabolomics, Rothamsted
Research, West Common, Harpenden, Herts. AL5 2JQ, UK,
e-mail: [email protected]
L. Willmitzer
Max Planck Institute of Molecular Plant Physiology, Am Mühlenberg 1,
14476, Golm, Germany
E. Willscher
Leibniz Institute of Plant Biochemistry, Weinberg 3, 06120 Halle/Saale,
Germany
Y. Yamazaki
Phenomenome Discoveries Inc., 204–407 Downey Road, Saskatoon,
Saskatchewan, Canada S7N 4L8
P. Zhang
Carnegie Institution, Department of Plant Biology, 260 Panama Street,
Stanford, CA 94305, USA
I.1 Gas Chromatography Mass Spectrometry
J. Kopka1
1 Introduction
GC-MS technology has been used for decades in studies that aim at the exact
quantification of metabolite pool sizes and metabolite fluxes. Exact
quantification has traditionally focused on a single metabolite or a small set
of predefined target metabolites. Today GC-MS is one of the most widely applied
technology platforms in modern metabolomic studies. Since early applications in
unravelling the mode of action of herbicides (Sauter et al. 1988), it has
experienced a renaissance (Fig. 1) in post-genomic, high-throughput
fingerprinting and metabolite profiling of genetically modified (e.g. Roessner
et al. 2001a,b, 2002; Fernie et al. 2004) or experimentally challenged plant
samples (e.g. Cook et al. 2004; Kaplan et al. 2004; Urbanczyk-Wochniak and
Fernie 2005). Metabolic phenotyping and the analysis of respective phenocopies
by metabolite profiling have become an integral part of plant functional
genomics (Fiehn et al. 2000b; Roessner et al. 2002; Fernie et al. 2004). The
essence of metabolite profiling, namely the non-biased screening of biological
samples for changes in metabolite levels relative to control samples, has been
thoroughly discussed earlier and is clearly distinguished from fingerprinting
approaches and from the concept of exact quantification (Fiehn et al. 2000b;
Sumner et al. 2003; Birkemeyer et al. 2005).
GC-MS-based metabolome profiling analysis is on the verge of becoming
a routine technology. This fact substantially contributes to the development
of metabolomics as a fourth integral part of the Rosetta stone for functional
genomics and molecular physiology (Trethewey et al. 1999; Fiehn et al. 2000b;
Trethewey 2004). Nevertheless, GC-MS technology is already challenged again
by new bottlenecks and demands for improved data sets which are optimised
for the mathematical modelling tools currently developed in the fields of bioin-
formatics and biological systems analysis.
The challenges of modern, multi-parallel, GC-MS based metabolite anal-
ysis are manifold: (i) automation of sample preparation, wet chemistry and
data processing after acquisition for increased throughput and reproducibil-
ity, (ii) extension of the analytical scope of metabolomics studies, for example
by combined analysis of single samples using multiple analytical technol-
ogy platforms, and combined analysis with the proteome and transcriptome
1 Max Planck Institute of Molecular Plant Physiology, Am Mühlenberg 1, 14476 Potsdam-Golm,
Germany, e-mail: [email protected]
Fig. 1. Literature survey of publications associating the concepts “metabolite”, “profiling”,
and “gas chromatography”, performed in January 2005. A total of ∼500 citations, excluding
conference proceedings, abstracts and book chapters, were found. The frequency of publications
in all biological sciences (open circles) is compared with the contribution of the plant
metabolomics community (closed circles)
One of the major criticisms and pitfalls of metabolome analyses is best
explained by so-called matrix effects. This well-known term describes
unexpected losses or increased recovery of metabolites in complex extracts
compared to pure authentic preparations. On the one hand, matrix effects are
caused by the presence of compounds which specifically inhibit either the
extraction or the chemical analysis of metabolites. On the other hand, positive
matrix effects can stabilise otherwise labile compounds in the presence of
suitable chemicals. Typical examples are the suppression effects of soft
ionization techniques, for example electrospray ionization (ESI) or matrix
assisted laser desorption ionization (MALDI).
Electron-impact ionization (EI) typically used in GC-MS profiling is not sus-
ceptible to suppression. Instead GC injection is the crucial step which may
Fig. 2. Mass spectra of deuterated and 13C-labelled MSTs help structural elucidation and recovery
analysis of metabolites. Labelled and non-labelled MSTs of Glycine N,N-di(trimethylsilyl)-,
trimethylsilyl ester are shown. Oryza sativa L. cv. Nipponbare was labelled in vivo using
deuterated water or 13CO2. MSTs representing the fully labelled mass isotopomers demonstrate
the presence of two carbon atoms (left panel) and two non-exchangeable hydrogen atoms (right
panel). Mass fragments which exhibited a mass shift of 1 amu (red) or 2 amu (blue) are indicated
Fig. 3a–c. Mass spectral deconvolution of deuterated mass isotopomers. Succinic acid
di(trimethylsilyl) ester was partially labelled in vivo by exposing Oryza sativa L. cv.
Nipponbare to deuterated water. Metabolite profiles were performed on a Pegasus II GC-TOF-MS
system (LECO, St. Joseph, MI, USA) with 20 scans s−1. Mass spectra were deconvoluted using
ChromaTOF software version 1.00, with baseline offset just above noise, smoothing and peak
width set to 10 and 2 scans, respectively: a selected ion traces of non-deuterated (D0,
m/z = 247) and deuterated (D1–4, m/z = 248–251) M+ − 15 mass fragments. Mass fragments at 252
and 253 amu are carbon mass isotopomers of D4; b peak area compared to deconvoluted peak
height. Peak area integration does not allow differentiation of contributions by carbon mass
isotopomers; c deconvoluted mass spectra of D0–4. Inset shows partial deconvolution of D0–4
carbon mass isotopomers and missing carbon mass isotopomers of D1–3
For this purpose, fully saturating 13C in vivo labelling was developed using
yeast, which is one of the most important organisms in systems biology (e.g.
Stephanopoulos et al. 2004). Metabolites of yeast were demonstrated to be fully
labelled when the yeast was provided with an exclusive carbon source, such as
U-13C-glucose (Mashego et al. 2004; Birkemeyer et al. 2005). Refer to
Birkemeyer et al. (2005) for a detailed discussion of potential applications of
13C-labelled metabolomes. Similar approaches are possible in plants (Figs. 2
and 3).
In short, standardised in vivo labelled extracts of yeast or other
microorganisms can substitute for the rather small number of chemically
synthesised mass isotopomers used in earlier studies (Fiehn et al. 2000a;
Gullberg et al. 2004). Typically a standardised labelled reference sample is
combined in equal amounts with non-labelled, experimentally challenged samples.
The advantages of this approach are that (i) a mass isotopomer is present for
all identified and also all hitherto non-identified metabolites, (ii) the
concentration of each mass isotopomer is inherently adjusted to the endogenous
metabolite concentration, (iii) metabolic components can easily be
distinguished from laboratory contaminations, and (iv) the recovery of all
metabolic components can be determined with the appropriate mass isotopomer.
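The use of a labelled reference extract for relative quantification and recovery can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of the chapter: the function names, the peak intensities, and the idea of comparing the labelled signal in a spiked sample with the same amount of reference extract analysed alone are all hypothetical.

```python
def isotopomer_ratio(unlabelled_intensity, labelled_intensity):
    """Response ratio of the endogenous (unlabelled) metabolite to its
    co-eluting labelled mass isotopomer from the reference extract."""
    if labelled_intensity <= 0:
        raise ValueError("labelled reference signal must be positive")
    return unlabelled_intensity / labelled_intensity


def recovery(labelled_in_sample, labelled_in_reference_alone):
    """Fraction of the labelled isotopomer recovered from the sample matrix,
    relative to the same amount of reference extract analysed alone."""
    if labelled_in_reference_alone <= 0:
        raise ValueError("reference-only signal must be positive")
    return labelled_in_sample / labelled_in_reference_alone


# Hypothetical peak intensities (arbitrary units) for one metabolite.
control_ratio = isotopomer_ratio(8000.0, 4000.0)   # control sample
treated_ratio = isotopomer_ratio(12000.0, 3000.0)  # challenged sample

# Dividing the two ratios cancels losses that affect the labelled and
# unlabelled forms equally.
fold_change = treated_ratio / control_ratio
print(fold_change)                # 2.0
print(recovery(3000.0, 4000.0))   # 0.75
```

Because every sample carries the same labelled reference, the ratio rather than the raw intensity is compared between samples, which is what makes the result tolerant of matrix effects that act on both forms alike.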
Thus metabolite profiling will reach the same level of standardisation as
transcriptome and proteome experiments, which utilize differential fluorescent
probes or differential isotope coded tagging, respectively. In conclusion,
comprehensive in vivo isotope labelling will help to establish quantitative
between-laboratory comparability of GC-MS based metabolome experiments. More
importantly, we expect metabolome experiments with full mass isotopomer
standardisation to be independent of the mass spectrometric platform as well,
e.g. CE-MS, LC-MS, or possibly even MALDI-TOF-MS.
Routine GC-MS profiling analysis (Fiehn et al. 2000b; Roessner et al. 2000)
has an upper size exclusion limit which is roughly equivalent to a persily-
lated trisaccharide derivative (MW:1296), hexatriacontane (MW:506), or hen-
triacontanoic acid trimethylsilylester (MW:523). Even though it may appear
tempting, metabolite and analyte are best not defined by molecular weight.
time. Retention time indices (RI), based on homologous series of internal
reference substances, such as n-alkanes, were introduced early to aid GC
analyses (Kováts 1958). Use of an n-alkane RI system in GC-MS metabolite
profiling substantially improves the reproducibility of the chromatography axis.
The currently achievable accuracy of RI prediction was recently investigated
in three different profiling laboratories which use the same type of capillary
column but different GC-MS systems (Schauer et al. 2005). In this investiga-
tion the possibility of predicting RIs of more than 100 identified analytes was
tested. Mathematical regression resulted in an average accuracy of ±5.4 RI
units.
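How a retention time is converted into an n-alkane retention index can be sketched as follows. This is a minimal illustration, assuming the linear interpolation between bracketing n-alkanes that is commonly used for temperature-programmed runs (the isothermal Kováts index interpolates the logarithms of adjusted retention times instead). The function name and the retention times are hypothetical and are not taken from the study cited above.

```python
from bisect import bisect_right


def linear_retention_index(rt, alkane_rts):
    """Retention index of an analyte eluting at `rt`.

    `alkane_rts` maps the carbon number of each n-alkane standard to its
    retention time; times must increase with carbon number. The index is
    100 times the (interpolated) carbon number.
    """
    carbons = sorted(alkane_rts)
    times = [alkane_rts[c] for c in carbons]
    if len(times) < 2:
        raise ValueError("at least two n-alkane standards are required")
    if not times[0] <= rt <= times[-1]:
        raise ValueError("retention time lies outside the n-alkane range")

    # Index of the n-alkane eluting at or just before the analyte.
    i = min(bisect_right(times, rt) - 1, len(times) - 2)
    n, n_next = carbons[i], carbons[i + 1]
    t_n, t_next = times[i], times[i + 1]
    return 100.0 * (n + (n_next - n) * (rt - t_n) / (t_next - t_n))


# Hypothetical retention times in minutes for C10, C12 and C15.
alkanes = {10: 5.0, 12: 7.0, 15: 10.0}
print(linear_retention_index(6.0, alkanes))   # 1100.0
print(linear_retention_index(10.0, alkanes))  # 1500.0
```

Because the index is expressed relative to co-injected standards, shifts in absolute retention time between runs or instruments largely cancel out.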
The IC intensity axis in GC-MS is standardised for high vs low mass discrim-
ination. The GC-MS tuning includes processes which ensure constant ratios
of high vs low mass intensities. However, mass spectra which are recorded by
either QUAD-MS or fast scanning GC-TOF-MS detection may differ in this
respect. Fast scanning GC-TOF-MS systems (e.g. Pegasus II MS system, LECO,
St. Joseph, MI, USA) have increased sensitivity for small mass fragments and
reduced sensitivity in the high mass range.
Fig. 4. Synthetic and representative GC-MS profiles of Oryza sativa L. cv. Nipponbare leaves:
A – 132 identified MSTs representing 109 known metabolites; B – 12 added internal
standard substances; C – 148 unidentified MSTs which match previous MSRI library entries;
D – all previously observed MSTs present in the MSRI library at GMD
(http://csbdb.mpimp-golm.mpg.de/gmd.html)
Acknowledgements. I would like to thank A.R. Fernie and A. Erban, Max Planck Institute of
Molecular Plant Physiology, Potsdam-Golm, Germany, for critically reading and discussing my
manuscript. My thanks extend to Prof. Dr. Le Tran Binh, Institute of Biotechnology (IBT), Hanoi,
Vietnam, for sharing his expertise in rice cultivation. This work was supported by the Max Planck
Society and the Bundesministerium für Bildung und Forschung (BMBF), grant PTJ-BIO/0312854.
References
Ausloos P, Clifton CL, Lias SG, Mikaya AI, Stein SE, Tchekhovskoi DV, Sparkman OD, Zaikin V,
Zhu D (1999) The critical evaluation of a comprehensive mass spectral library. J Am Soc Mass
Spectrom 10:287–299
Barsch A, Patschkowski T, Niehaus K (2004) Comprehensive metabolite profiling of Sinorhizobium
meliloti using gas chromatography-mass spectrometry. Funct Integrat Genomics 4:219–230
Bino RJ, Hall RD, Fiehn O, Kopka J, Saito K, Draper J, Nikolau BJ, Mendes P, Roessner-
Tunali U, Beale MH, Trethewey RN, Lange BM, Wurtele ES, Sumner LW (2004) Potential
of metabolomics as a functional genomics tool. Trends Plant Sci 9:418–425
Birkemeyer C, Kolasa A, Kopka J (2003) Comprehensive chemical derivatization for gas
chromatography-mass spectrometry-based multi-targeted profiling of the major phytohor-
mones. J Chromatogr A 993:89–102
Sinha AE, Fraga CG, Prazen BJ, Synovec RE (2004a) Trilinear chemometric analysis of
two-dimensional comprehensive gas chromatography-time-of-flight mass spectrometry data.
J Chromatogr A 1027:269–277
Sinha AE, Hope JL, Prazen BJ, Nilsson EJ, Jack RM, Synovec RE (2004b) Algorithm for locating
analytes of interest based on mass spectral similarity in GC×GC-TOF-MS data: analysis of
metabolites in human infant urine. J Chromatogr A 1058:209–215
Sinha AE, Prazen BJ, Synovec RE (2004c) Trends in chemometric analysis of comprehensive
two-dimensional separations. Anal Bioanal Chem 378:1948–1951
Stein SE (1999) An integrated method for spectrum extraction and compound identification from
gas chromatography/mass spectrometry data. J Am Soc Mass Spectrom 10:770–781
Steinhauser D, Usadel B, Luedemann A, Thimm O, Kopka J (2004) CSB.DB: a comprehensive
systems-biology database. Bioinformatics 20:3647–3651
Stephanopoulos G, Alper H, Moxley J (2004) Exploiting biological complexity for strain improve-
ment through systems biology. Nat Biotechnol 22:1261–1267
Strelkov S, von Elstermann M, Schomburg D (2004) Comprehensive analysis of metabolites
in Corynebacterium glutamicum by gas chromatography/mass spectrometry. Biol Chem
385:853–861
Sumner LW, Mendes P, Dixon RA (2003) Plant metabolomics: large-scale phytochemistry in the
functional genomics era. Phytochemistry 62:817–836
Toyo’oka T (1999) Modern derivatization methods for separation science. Wiley, New York
Trethewey RN (2004) Metabolite profiling as an aid to metabolic engineering in plants. Curr Opin
Plant Biol 7:196–201
Trethewey RN, Krotzky AJ, Willmitzer L (1999) Metabolic profiling: a Rosetta stone for genomics?
Curr Opin Plant Biol 2:83–85
Urbanczyk-Wochniak E, Fernie AR (2005) Metabolic profiling reveals altered nitrogen nutrient
regimes have diverse effects on the metabolism of hydroponically-grown tomato (Solanum
lycopersicum) plants. J Exp Bot 56:309–321
Urbanczyk-Wochniak E, Luedemann A, Kopka J, Selbig J, Roessner-Tunali U, Willmitzer L,
Fernie AR (2003) Parallel analysis of transcript and metabolic profiles: a new approach in
systems biology. EMBO Reports 4:989–993
Van Deursen MM, Beens J, Janssen HG, Leclercq PA, Cramers CA (2000) Evaluation of time-of-
flight mass spectrometric detection for fast gas chromatography. J Chromatogr A 878:205–213
Veriotti T, Sacks R (2001) High-speed GC and GC/time-of-flight MS of lemon and lime oil samples.
Anal Chem 73:4395–4402
Vreuls RJJ, Dallüge J, Brinkman UAT (1999) Gas chromatography – time-of-flight mass spec-
trometry for sensitive determination of organic microcontaminants. J Microcolumn Sep
11:663–675
Wagner C, Sefkow M, Kopka J (2003) Construction and application of a mass spectral and retention
time index database generated from plant GC/EI-TOF-MS metabolite profiles. Phytochemistry
62:887–900
Weckwerth W, Loureiro ME, Wenzel K, Fiehn O (2004a) Differential metabolic networks unravel
the effects of silent plant phenotypes. Proc Natl Acad Sci USA 101:7809–7814
Weckwerth W, Wenzel K, Fiehn O (2004b) Process for the integrated extraction, identification and
quantification of metabolites, proteins and RNA to reveal their co-regulation in biochemical
networks. Proteomics 4:78–83
I.2 Current Status and Forward Looking Thoughts
on LC/MS Metabolomics
L.W. Sumner1
1 Introduction
The plant metabolome is quite complex with current estimates on the order
of 15,000 metabolites within a given species and over 200,000 different metabo-
lites within the plant kingdom (Dixon 2001; Hartman et al. 2005). Due to the
chemical complexity of the plant metabolome, it is generally accepted that
a single analytical technique will not provide comprehensive visualization of
the metabolome, and therefore, multiple technologies are generally employed.
The selection of the most suitable technology is generally a compromise be-
tween speed, chemical selectivity and instrumental sensitivity. Tools such as
nuclear magnetic resonance spectroscopy (NMR) are rapid, highly selective,
and non-destructive, but have relatively lower sensitivity. Other methods such
as capillary electrophoresis (CE) coupled to laser induced fluorescence (LIF)
detection are highly sensitive, but have limited chemical selectivity. Chromato-
graphically coupled mass spectrometry methods such as gas chromatography
(GC)/mass spectrometry (MS) and liquid chromatography (LC)/MS offer the
best combination of sensitivity and selectivity, and therefore are central to
most metabolomics approaches. Mass selective detection provides highly
specific chemical information, including molecular mass and/or characteristic
fragment ion information, that is directly related to chemical structure.
This information can be utilized for compound identification through spectral
matching with data compiled in libraries for authentic compounds or used for
de novo structural elucidation. Further, chemically selective MS information
can be obtained from extremely small metabolite quantities, with limits of
detection at the pmol and fmol level for many primary and secondary plant
metabolites.
GC/MS has proven capability for profiling large numbers of metabolites
with reports covering several hundred to slightly more than a thousand var-
ious components (Fiehn et al. 2000; Roessner et al. 2000, 2001; Birkemeyer
et al. 2003; Wagner et al. 2003; Broeckling et al. 2005; Schauer et al. 2005;
Welthagen et al. 2005). The term component is used because many metabolites
yield more than one derivatized component in the GC/MS analysis. The
achievable range and number of
metabolites profiled by GC/MS can be attributed to the high separation effi-
ciencies of long (30−60 m) capillary GC columns (i. e. N ≥ 250,000 for 60 m).
These high efficiencies enable the separation of very complex mixtures, and
with mass selective detection, qualitative identification of a significant pro-
portion of these compounds is achievable. This makes GC/MS a very effi-
cient and cost effective metabolomics tool. A major prerequisite for GC/MS
is sample volatility which is necessary to enable separation in the gas phase.
Analytes may be innately volatile or chemically derivatized to yield volatile
compounds. Unfortunately, there exist a large number of metabolites which
are not amenable to GC/MS even following derivatization. These include com-
pounds such as phenylpropanoid and other natural product glycosides whose
labile glycosidic bonds degrade during heating and vaporization. Thus, alter-
native techniques are necessary and especially so for the study of secondary
metabolism.
2 Chromatography Theory
Currently, the chromatographic performance of HPLC, relative to GC and CE,
is lower, and there is a significant need for improvement. However, to dis-
cuss this issue and possible improvements in detail, several terms must be
defined. A number of quantifiers are used to assess chromatographic perfor-
mance. These include resolution (Rs ), selectivity (α), efficiency (N), and peak
capacity (n) which are defined below:
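1. Resolution (Rs) is a quantifier of the degree of separation between mixture
components, i.e. two peaks with retention times ta and tb and peak widths at
the base wa and wb. A resolution of 1 means that the base widths of two
adjacent peaks just touch; values of about 1.5 or more are generally
regarded as complete baseline resolution. Resolution can also be expressed
as a function of the theoretical plate number (N), selectivity (α) and
capacity factor (k2), as defined below in Eq. (1):
$$R_s = \frac{2(t_b - t_a)}{w_a + w_b} = \frac{2\,\Delta t_R}{w_a + w_b}, \qquad R_s = \frac{\sqrt{N}}{4}\left(\frac{\alpha - 1}{\alpha}\right)\left(\frac{k_2}{1 + k_2}\right) \tag{1}$$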
2. Selectivity (α), which is also referred to as the separation factor, is the
ratio of the retention or capacity factors (k) of two peaks. The capacity
factor is a relative retention parameter that has been normalized using the
void elution time (tv) or volume (Vv) and is therefore independent of column
geometry – see Eq. (2). The void value is the elution volume or time of an
unretained component. The selectivity parameter provides a quantifier of the
relative separation of two components. Selectivity can be altered based on
the chemical composition of the stationary phase, stationary phase
manufacturer, mobile phase, and pH:
$$\alpha = \frac{k_2}{k_1}, \qquad k_1 = \frac{t_1 - t_v}{t_v} = \frac{V_1 - V_v}{V_v} \tag{2}$$
3. Column efficiency is usually quantified based upon a column’s theoretical
plate number (N) which is unitless and a measure of band broadening per
unit time – see Eq. (3). This can be practically quantified using retention
time (tR ) and peak width. Peak width can be defined at the base (Wb ) or at
half height (w1/2 ) as they are directly related if one assumes a Gaussian peak
shape, i. e. Wb = 1.698 w1/2 = 4σ where σ equals the standard deviation of
the peak. Alternatively, plate number can be calculated using the column
resolution (R) and selectivity (α).
4. Separation efficiency is also quantified using a normalized theoretical plate
number based on column length, i. e. (N/L) with units of plates/m. The
theoretical plate number can be dramatically increased by decreasing the
peak width. Plate number and efficiency are also related to particle size (dp )
and column length (L) as described below:
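$$N = \left(\frac{t_R}{\sigma}\right)^2 = 16\left(\frac{t_R}{W_b}\right)^2 = 5.54\left(\frac{t_R}{w_{1/2}}\right)^2 = \frac{16R^2}{(1-\alpha)^2} = \frac{L}{d_p} \tag{3}$$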
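As a minimal sketch of how Eqs. (1)–(3) are applied in practice, the Python functions below compute resolution, capacity factor, selectivity and plate number from retention times and peak widths. The function names and the example retention data are illustrative assumptions, not values taken from this chapter.

```python
import math


def resolution(t_a, t_b, w_a, w_b):
    """Eq. (1), first form: resolution from retention times and base widths."""
    return 2.0 * (t_b - t_a) / (w_a + w_b)


def capacity_factor(t_r, t_v):
    """Eq. (2): retention normalized by the void time t_v."""
    return (t_r - t_v) / t_v


def selectivity(k1, k2):
    """Eq. (2): ratio of the capacity factors of two peaks."""
    return k2 / k1


def plates_from_base_width(t_r, w_b):
    """Eq. (3): plate number from the peak width at the base."""
    return 16.0 * (t_r / w_b) ** 2


def plates_from_half_height(t_r, w_half):
    """Eq. (3): plate number from the peak width at half height."""
    return 5.54 * (t_r / w_half) ** 2


def resolution_from_plates(n, alpha, k2):
    """Eq. (1), second form: resolution from N, selectivity and k2."""
    return (math.sqrt(n) / 4.0) * ((alpha - 1.0) / alpha) * (k2 / (1.0 + k2))


# Hypothetical example: times and widths in minutes.
t_v, t_a, t_b, width = 1.0, 5.0, 5.5, 0.4
k1, k2 = capacity_factor(t_a, t_v), capacity_factor(t_b, t_v)
alpha = selectivity(k1, k2)
n_plates = plates_from_base_width(t_b, width)
rs_direct = resolution(t_a, t_b, width, width)
rs_from_n = resolution_from_plates(n_plates, alpha, k2)
print(rs_direct, rs_from_n, alpha, n_plates)
```

When both peaks have the same base width and N is taken from the second peak, the two forms of Eq. (1) give the same resolution, which is a convenient consistency check on measured data.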
case scenario; however in practice this value is seldom achieved and more
realistic peak capacities are between 100 and 200. Thus, current HPLC
technologies are limiting the comprehensive scope of metabolomics. Separation
efficiencies can be improved by altering selectivity, increasing column lengths,
reducing particle sizes, increasing temperature, and/or using alternative column
materials. Alternatively, the utilization of multidimensional chromatography
offers increased HPLC peak capacities of greater than 1000 to provide more
comprehensive coverage of plant natural products (Tanaka et al. 2004). Each
of these methods to increase HPLC efficiency is discussed below.
Typically, improving selectivity is the best approach to improving chro-
matographic resolution. Selectivity is based upon the chemical or physical
interaction properties that are fundamental to the separation process. More
precisely, the separation selectivity of specific components can be optimized
by the appropriate choice of column materials, mobile phases, and/or man-
ufacturer. Various generic and proprietary materials are available for vari-
ous chromatographic modes for HPLC. Example modes include ion-exchange,
normal-phase, reverse-phase, hydrophilic interaction, and size exclusion
chromatography. Not all HPLC columns are equal: particles, particle sizes,
surface modification chemistries, surface coverage, and packing processes
vary significantly from manufacturer to manufacturer. These parameters
dramatically influence chromatographic performance.
Often selectivity is optimized for a targeted set of analytes as a means of
increasing resolution. However, in more complex mixtures associated with
global metabolomics-based approaches, improved selectivity for one class of
compounds often results in decreased selectivity for others. Thus, techniques
(e. g. reverse-phase chromatography) with a broad range of selectivity are most
likely to be the best choices for metabolomics.
One of the simplest means of increasing resolution is to increase the number
of theoretical plates. Since the plate number is directly proportional to the
column length (Eq. (3)), one needs only to increase the column length to
increase resolution. However, Eq. (1) tells us that R is proportional to the
square root of N. Thus, to achieve a 2× increase in resolution, we would
have to quadruple the column length. For example, a 250 mm long column would
need to be extended to 1 m for a twofold increase in resolution, and even
a 25-fold extension to 625 cm (i.e. 25 × 25 cm) would yield only a fivefold
increase. Unfortunately, this is not a practical solution as the operating
pressure is directly proportional to the column length. Equation (5) defines
the relationship between pressure (ΔP), column length (L), analyte diffusion
coefficient (Dm ), particle size (dp ), mobile-phase viscosity (η), and column
permeability (K o ):
$$\Delta P = \frac{L\,v\,D_m\,\eta}{d_p\,K^{o}} \tag{5}$$
column length (25 cm)² would require an operational pressure of 75,000 p.s.i.
(i. e. 3,000 p.s.i. × 25). Although this illustrates the advantage of very high
pressure liquid chromatography which has been achieved by select groups
using custom apparatuses (MacNair et al. 1997, 1999; Tolley et al. 2001; Patel et
al. 2004; Shen et al. 2005), commercial pumps do not operate at these pressures
(most commercial HPLC pumps have a 5,000-p.s.i. limit). Therefore, significant
resolution enhancements achieved through longer columns are out of reach for
most researchers. With that said, several companies (e.g. Waters and JASCO)
have recently introduced 15,000-p.s.i. HPLC pumps.
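The trade-off between column length, resolution and pressure can be sketched numerically. The snippet below assumes only the two proportionalities stated in the text (resolution scales with the square root of column length, pressure scales linearly with it) and the 3,000 p.s.i. starting pressure implied by the 75,000 p.s.i. example; the function names are hypothetical.

```python
import math


def resolution_gain(length_factor):
    """Relative resolution gain when column length (and hence N) is scaled."""
    return math.sqrt(length_factor)


def required_pressure(base_pressure_psi, length_factor):
    """Operating pressure after scaling column length, all else unchanged."""
    return base_pressure_psi * length_factor


def max_gain_at_limit(base_pressure_psi, pump_limit_psi):
    """Largest resolution gain reachable before the pump limit is hit."""
    return resolution_gain(pump_limit_psi / base_pressure_psi)


BASE_PSI = 3000.0  # assumed operating pressure of the original 25 cm column

for factor in (4, 25):
    print(factor, resolution_gain(factor), required_pressure(BASE_PSI, factor))

for limit in (5000.0, 15000.0):
    print(limit, max_gain_at_limit(BASE_PSI, limit))
```

Under these assumptions a 5,000 p.s.i. pump permits roughly a 1.3-fold gain in resolution from column length alone, and a 15,000 p.s.i. pump roughly a 2.2-fold gain.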
Equation (5) reveals that the pressure differential is proportional to the
mobile phase viscosity (η). Thus, lowering of the mobile phase viscosity (η)
by increasing the temperature can lower the operational pressure and allow
the use of longer columns for resolution enhancement (Djordjevic et al. 1998,
1999, 2000). Selectivity is also affected by temperature and additional efficiency
can be achieved by heating alone. However, one must ensure analyte thermal
stability if elevated temperature separations are to be employed.
Equation (5) also shows that the pressure is a function of the column per-
meability (K o ). New monolithic columns offer greater permeability and lower
pressures, thus allowing for the use of longer columns. The continuous bed
stationary phases of these columns consist of porous polymeric materials gen-
erated from silica or organic materials such as acrylamide, styrene, acrylate,
or methacrylate monomers which result in lower back-pressure than packed
particles. The lower back-pressure allows for the use of longer columns and
hence greater efficiencies. Several groups have reported on the use of up to
1 m capillary columns (Que and Novotny 2002; Legido-Quigley et al. 2003;
Tolstikov et al. 2003; Tanaka et al. 2004) and this technology looks promis-
ing.
Plate number and efficiency are also related to particle size (dp ) and column
length (L) as shown in Eq. (3). This equation shows that decreasing the particle
size increases the theoretical plate number/efficiency (MacNair et al. 1997,
1999; Tolley et al. 2001; Shen et al. 2005). However, Eq. (5) shows again that
pressure increases with smaller particle size. Fortunately, new commercial
ultra-high pressure liquid chromatography pumps (UPLC) are now available
from multiple manufacturers that allow the use of smaller particles in the range
of 1−2 μm. These instruments offer substantial resolution enhancements with
plate numbers on the order of several hundred thousand and peak capacities in
excess of 400 (Wilson et al. 2005). In addition to increased resolution, UPLC also
offers higher speed separations as the optimum flow velocity has a significantly
broader range which allows for increased flow rates without significant loss of
resolution (Wilson et al. 2005). Estimates of up to ninefold increases in flow
rates without significant loss of resolution have been suggested (Wilson et al.
2005). It is important to note that ultra-high pressure separations result in
increased frictional heating; however this can be reduced by down-scaling the
chromatography dimensions with the heating being negligible in columns of
less than 1 mm (MacNair et al. 1997).
28 L.W. Sumner
2004) and more recently applied to metabolite analyses (Kapron et al. 2005).
Extension of this concept to metabolomics will surely occur.
The above text discusses multidimensional chromatographic approaches in
an on-line context. However, multidimensional approaches can also be pur-
sued in an off-line, multiplexed, or parallel approach. For example, fractions
can be collected off-line using a separate HPLC. The fractions can then be
concentrated and reinjected onto an on-line LC/MS system. Alternatively,
fractions of the same samples could be injected onto a series of parallel
systems using different methods (e.g. GC/MS, LC/MS, or various selective modes
of each performed with different column selectivities). This is our current
approach.
For example, samples are fractionated and/or enriched and then the polar and
lipophilic fractions are analyzed by GC/MS. In addition methanolic extracts
are analyzed for phenolic/saponin content. An interesting concept would be to
design a multiplexed system, with multiple chromatographic-mass spectrom-
etry systems operating in an integrated manner. For example, a multiplexed
chip system with each chip having a slightly different selectivity and indepen-
dent mass analyzer could be designed to increase the comprehensive coverage.
Such a system with on-line enrichment could also be used to address dynamic
range limitations that currently exist for specific compound classes such as
phytohormones.
If higher resolution chromatography is obtained, mass analyzers must also
be employed with compatible scan speeds to record data for compounds
eluting in very short temporal periods. It is expected that LC peak widths of
1−5 s will be routine in the very near future. For accurate quantification, it is
commonly accepted that the sampling rate should be sufficient to capture 10
data points across the eluting peak, with higher sampling rates being beneficial.
Thus, for a 1 s wide peak the sampling interval should be no longer than 0.1 s,
i.e. the sampling rate should be at least 10 Hz. This is achievable with current
TOF-MS analyzers. It is worth mentioning that quadrupole based mass analyzers,
including traps, can approach these speeds; however, TOF mass spectrometers
equipped with delayed extraction and ion reflectrons also offer improved mass
accuracy over quadrupoles.
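The sampling requirement follows directly from the peak width and the desired number of points per peak. A minimal sketch, with hypothetical function names and the 10-point rule quoted above as the default:

```python
def min_sampling_rate_hz(peak_width_s, points_per_peak=10):
    """Lowest acquisition rate that places the requested points on a peak."""
    return points_per_peak / peak_width_s


def max_sampling_interval_s(peak_width_s, points_per_peak=10):
    """Longest time between spectra that still meets the requirement."""
    return peak_width_s / points_per_peak


for width in (1.0, 5.0):  # the 1-5 s peak widths anticipated in the text
    print(width, min_sampling_rate_hz(width), max_sampling_interval_s(width))
```

The narrowest expected peak sets the requirement for the whole run, so an acquisition method would normally be configured for the 1 s case.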
Improvements in the accuracy of the mass analyzer can further enhance
metabolite differentiation, elemental composition determination, identifica-
tion, and allow for the profiling of greater numbers of metabolites. Mass
accuracy is directly related to the mass resolution or the ability of the mass
analyzer to resolve compounds of different m/z values. Mass resolution is de-
fined in Eq. (6) and is a function of mass (M) divided by the peak width (ΔM)
which is most commonly defined at half-height:
$$R_m = \frac{M}{\Delta M} \tag{6}$$
Often, LC/MS is performed with ion-traps or quadrupole mass analyzers that
yield mass accuracies in the range of 1.0−0.1 Da. Unfortunately, many
metabolites have similar nominal masses which cannot be differentiated at this level
of mass accuracy. For example, the important natural products genistein and
medicarpin have the same nominal mass of 270, but have different accurate
(monoisotopic) masses of 270.0528 (C15H10O5) and 270.0892 (C16H14O4),
respectively, due to different chemical compositions. If one could measure
their mass with sufficient accuracy, then one could differentiate these
compounds in the mass domain even if they could not be physically separated
in the chromatographic domain. This mass differentiation can be achieved at
a mass resolution (M/ΔM) greater than approximately 7,400. Heavier compounds
separated by the same absolute mass difference, such as rutin
(C27H30O16 = 610.1534) and hesperidin (C28H34O15 = 610.1898), would require
a higher mass resolution of approximately 16,800 for their differentiation.
Mass resolutions on the order of 10,000 can be achieved with modern TOF-MS
analyzers,
and resolutions in excess of 100,000 with sub-part-per-million mass accuracies
(i.e. less than 0.001 Da at m/z 1,000) are achievable with Fourier transform
ion cyclotron resonance mass spectrometry (FTMS). Newer technologies, such as
Thermo Electron Corporation’s Orbitrap, are currently surfacing that also offer
high-resolution solutions. Although high resolution accurate mass measurements
have great advantages, this technology is still rather costly.
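The following is a minimal sketch of the M/ΔM arithmetic of Eq. (6) for two compounds that share a nominal mass. The elemental formulae are those given above; the monoisotopic atomic masses are standard values supplied here as an assumption, and the function names are illustrative. Results depend on the mass scale chosen (monoisotopic versus average atomic masses), so they need not coincide exactly with rounded figures quoted in the text.

```python
# Standard monoisotopic atomic masses (Da); supplied here, not from the chapter.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "O": 15.99491462}


def monoisotopic_mass(formula):
    """Mass of a neutral molecule given as {element: count}."""
    return sum(MONOISOTOPIC[element] * count for element, count in formula.items())


def required_resolution(mass_1, mass_2):
    """Eq. (6): mean mass divided by the mass difference to be resolved."""
    return ((mass_1 + mass_2) / 2.0) / abs(mass_1 - mass_2)


genistein = monoisotopic_mass({"C": 15, "H": 10, "O": 5})
medicarpin = monoisotopic_mass({"C": 16, "H": 14, "O": 4})
rutin = monoisotopic_mass({"C": 27, "H": 30, "O": 16})
hesperidin = monoisotopic_mass({"C": 28, "H": 34, "O": 15})

print(genistein, medicarpin, required_resolution(genistein, medicarpin))
print(rutin, hesperidin, required_resolution(rutin, hesperidin))
```

Both pairs differ by the same composition change (CH4 versus O), so the absolute mass difference is identical and the required resolution simply scales with mass.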
Interestingly, a significant argument can be made that accurate mass mea-
surements significantly reduce the need for ultra-high resolution separations
due to the enhanced separation in the mass domain. However, if the
chromatography step is omitted or compressed significantly, then ion suppression,
competitive ionization, and other matrix effects become increasingly
problematic. I personally believe that both improved chromatographic
resolution and accurate mass measurements offer the best solution and that the
combination of these techniques will provide greater comprehension and con-
fidence in our ability to profile the metabolome. Further, I also believe that the
needed magnitude of enhancements in chromatographic resolution can only
be achieved with multidimensional approaches.
References
Bino RJ, Hall RD, Fiehn O, Kopka J, Saito K, Draper J, Nikolau BJ, Mendes P, Roessner-
Tunali U, Beale MH, Trethewey RN, Lange BM, Wurtele ES, Sumner LW (2004) Potential
of metabolomics as a functional genomics tool. Trends Plant Sci 9:418–425
Birkemeyer C, Kolasa A, Kopka J (2003) Comprehensive chemical derivatization for gas
chromatography-mass spectrometry-based multi-targeted profiling of the major phytohor-
mones. J Chromatogr A 993:89–102
Broeckling CD, Huhman DV, Farag MA, Smith JT, May GD, Mendes P, Dixon RA, Sumner LW
(2005) Metabolic profiling of Medicago truncatula cell cultures reveals the effects of biotic
and abiotic elicitors on metabolism. J Exp Bot 56:323–336
Chester T, Parcher JF (2001) Blurring the Boundaries. Science 291:502–503
Chester T, Pinkston J (2002) Supercritical fluid and unified chromatography. Anal Chem 74:2801–
2811
Dixon RA (2001) Phytochemistry in the genomics and post-genomics eras. Phytochemistry
57:145–148
Djordjevic N, Houdiere F, Fowler P (1998) High temperature and temperature programming in
capillary HPLC. Biomed Chromatogr 12:153–154
Djordjevic N, Fitzpatrick F, Houdiere F, Lerch G, Rozing G (2000) High temperature and temper-
ature programming in capillary electrochromatography. J Chromatogr A 887:245–252
Djordjevic NM, Fowler PWJ, Houdiere F (1999) High temperature and temperature programming
in high-performance liquid chromatography: Instrumental considerations. J Microcolumn
Separ 11:403–413
Evans C, Jorgenson J (2004) Multidimensional LC-LC and LC-CE for high-resolution separations
of biological molecules. Anal Bioanal Chem 378:1952–1961
Fiehn O, Kopka J, Dormann P, Altmann T, Trethewey RN, Willmitzer L (2000) Metabolite profiling
for plant functional genomics. Nat Biotechnol 18:1142–1161
Guevremont R (2004) High-field asymmetric waveform ion mobility spectrometry: a new tool
for mass spectrometry. J Chromatogr A 1058:3–19
Gygi SP, Rochon Y, Franza BR, Aebersold R (1999) Correlation between protein and mRNA
abundance in yeast. Mol Cell Biol 19:1720–1730
Hartman T, Kutchan TM, Strack D (2005) Evolution of metabolic diversity. Phytochemistry
66:1198–1199
Kapron J, Jemal M, Duncan G, Kolakowski B, Purves R (2005) Removal of metabolite interfer-
ence during liquid chromatography/tandem mass spectrometry using high-field asymmetric
waveform ion mobility spectrometry. Rapid Commun Mass Spectrom 19:1979–1983
Lee Y, Hoaglund-Hyzera C, Srebalus Barnes C, Hilderbrand A, Valentine S, Clemmer D (2002)
Development of high-throughput liquid chromatography injected ion mobility quadrupole
time-of-flight techniques for analysis of complex peptide mixtures. J Chromatogr B Anal
Technol Biomed Life Sci 782:343–351
Legido-Quigley C, Marlin N, Melin V, Manz A, Smith N (2003) Advances in capillary electrochro-
matography and micro-high performance liquid chromatography monolithic columns for
separation science. Electrophoresis 24:917–944
Liu X, Plasencia M, Ragg S, Valentine S, Clemmer D (2004) Development of high throughput
dispersive LC-ion mobility-TOFMS techniques for analysing the human plasma proteome.
Brief Funct Genomic Proteomic 3:177–186
Luo Z, Xiong Y, Parcher J (2003) Chromatography with dynamically created liquid “stationary”
phases: methanol and carbon dioxide. Anal Chem 75:3557–3562
MacNair J, Lewis K, Jorgenson J (1997) Ultrahigh-pressure reversed-phase liquid chromatography
in packed capillary columns. Anal Chem 69:983–989
MacNair J, Patel K, Jorgenson J (1999) Ultrahigh-pressure reversed-phase capillary liquid chro-
matography: isocratic and gradient elution using columns packed with 1.0-micron particles.
Anal Chem 71:700–708
Matz L, Dion H, Hill H (2002) Evaluation of capillary liquid chromatography-electrospray ioniza-
tion ion mobility spectrometry with mass spectrometry detection. J Chromatogr A 946:59–68
Mondello L, Lewis AC, Bartle KD (2002) Multidimensional Chromatography. Wiley, Chichester,
UK
Oliver DJ, Nikolau B, Wurtele ES (2002) Functional genomics: high-throughput mRNA, protein,
and metabolite analyses. Metab Eng 4:98–106
Patel K, Jerkovich A, Link J, Jorgenson J (2004) In-depth characterization of slurry packed capillary
columns with 1.0-microm nonporous particles using reversed-phase isocratic ultrahigh-
pressure liquid chromatography. Anal Chem 76:5777–5786
Que A, Novotny M (2002) Separation of neutral saccharide mixtures with capillary electrochro-
matography using hydrophilic monolithic columns. Anal Chem 74:5184–5191
Roessner U, Wagner C, Kopka J, Trethewey RN, Willmitzer L (2000) Simultaneous analysis of
metabolites in potato tuber by gas chromatography-mass spectrometry. Plant J 23:131–142
Roessner U, Luedemann A, Brust D, Fiehn O, Linke T, Willmitzer L, Fernie AR (2001) Metabolic
profiling allows comprehensive phenotyping of genetically or environmentally modified plant
systems. Plant Cell 13:11–29
Schauer N, Steinhauser D, Strelkov S, Schomburg D, Allison G, Moritz T, Lundgren K, Roessner-
Tunali U, Forbes M, Willmitzer L, Fernie A, Kopka J (2005) GC-MS libraries for the rapid
identification of metabolites in complex biological samples. FEBS Lett 579:1332–1337
Shen Y, Zhang R, Moore R, Kim J, Metz T, Hixson K, Zhao R, Livesay E, Udseth H, Smith R
(2005) Automated 20 kpsi RPLC-MS and MS/MS with chromatographic peak capacities of
1000–1500 and capabilities in proteomics and metabolomics. Anal Chem 77:3090–3100
Shvartsburg A, Tang K, Smith R (2005) Optimization of the design and operation of FAIMS
analyzers. J Am Soc Mass Spectrom 16:2–12
Somerville C, Dangl J (2000) Plant biology in 2010. Science 290:2077–2078
Somerville C, Somerville S (1999) Plant functional genomics. Science 285:380–383
Sumner LW, Mendes P, Dixon RA (2003) Plant metabolomics: large-scale phytochemistry in the
functional genomics era. Phytochemistry 62:817–836
Takats Z, Wiseman JM, Gologan B, Cooks RG (2004) Mass spectrometry sampling under ambient
conditions with desorption electrospray ionization. Science 306:471–473
Tanaka N, Kimura H, Tokuda D, Hosoya K, Ikegami T, Ishizuka N, Minakuchi H, Nakanishi K, Shin-
tani Y, Furuno M, Cabrera K (2004) Simple and comprehensive two-dimensional reversed-
phase HPLC using monolithic silica columns. Anal Chem 76:1273–1281
Tolley L, Jorgenson J, Moseley M (2001) Very high pressure gradient LC/MS/MS. Anal Chem
73:2985–2991
Tolstikov VV, Lommen A, Nakanishi K, Tanaka N, Fiehn O (2003) Monolithic silica-based
capillary reversed-phase liquid chromatography/electrospray mass spectrometry for plant
metabolomics. Anal Chem 75:6737–6740
Trethewey RN (2001) Gene discovery via metabolic profiling. Curr Opin Biotechnol 12:135–138
Trethewey RN, Krotzky AJ, Willmitzer L (1999) Metabolic profiling: a Rosetta stone for genomics?
Curr Opin Plant Biol 2:83–85
Verbeck G, Ruotolo B, Sawyer H, Gillig K, Russell D (2002) A fundamental introduction to ion
mobility mass spectrometry applied to the analysis of biomolecules. J Biomol Tech 13:56–61
Wagner C, Sefkow M, Kopka J (2003) Construction and application of a mass spectral and
retention time index database generated from plant GC/EI-TOF-MS metabolite profiles.
Phytochemistry 62:887–900
Washburn M, Wolters D, Yates J (2001) Large-scale analysis of the yeast proteome by multidi-
mensional protein identification technology. Nat Biotechnol 19:242–247
Weckwerth W (2003) Metabolomics in systems biology. Annu Rev Plant Biol 54:669–689
Wells P, Zhou S, Parcher J (2002) Gas-liquid chromatography with a volatile “stationary” liquid
phase. Anal Chem 74:2103–2111
Wells P, Zhou S, Parcher J (2003) Unified chromatography with CO2-based binary mobile phases.
Anal Chem 75:18A–24A
Welthagen W, Shellie RA, Spranger J, Ristow M, Zimmermann R, Fiehn O (2005) Comprehensive
two-dimensional gas chromatography-time-of-flight mass spectrometry (GC×GC-TOF) for
high resolution metabolomics: biomarker discovery on spleen tissue extracts of obese NZO
compared to lean C57BL/6 mice. Metabolomics 1:65–73
Wilson I, Nicholson J, Castro-Perez J, Granger J, Johnson K, Smith B, Plumb R (2005) High reso-
lution “ultra performance” liquid chromatography coupled to oa-TOF mass spectrometry as
a tool for differential metabolic pathway profiling in functional genomic studies. J Proteome
Res 4:591–598
Wolters D, Washburn M, Yates J (2001) An automated multidimensional protein identification
technology for shotgun proteomics. Anal Chem 73:5683–5690
I.3 Plant Metabolomics Strategies Based upon
Quadrupole Time of Flight Mass Spectrometry
(QTOF-MS)
H.A. Verhoeven1,2 , C.H. Ric de Vos1,2 , R.J. Bino1,2 , and R.D. Hall1,2
1 Introduction
2 The Technology
Time of flight was already introduced in the early 1960s but was quickly re-
placed by other approaches. This was due to the lack of sufficiently fast elec-
tronics needed to process data on a nanosecond scale. Thirty years later in
the early 1990s, the development of high megahertz and even gigahertz digital
circuits led to the dramatic increase in the application of TOF technology. This,
combined with new developments in the area of sample introduction and ion-
isation of (macro)molecules, has subsequently led to many new applications
of (TOF-based) mass spectrometry in the fields of biology and pharmacy.
A TOF instrument serves as the main mass analyser, and its principle is
based on ions with different mass/charge ratios having different flight times in
a field-free drift zone once they have been accelerated by a very short electric
pulse from the electrodes of an accelerator: lighter ions travel faster through
the measurement chamber than the heavier ones. A thorough discussion on
the physical principles can be found in, for example, Guilhaus (1995). In most
TOF instrument designs, ions are detected using Micro Channel Plates (MCP),
which, on capturing the ions, generate a cascade of electrons to amplify the
signal so that it can be detected by the associated electronics (see Fig. 1 for
a schematic representation of the various parts). Several ion recorders have
been used with the various designs of hybrid TOF mass spectrometer. The two
most widely used are the time-to-digital converter (TDC) and the transient
recorder or analogue to digital converter (ADC) (Chernushevich et al. 2001).
The type of detector affects both the dynamic range of the signals that can be
measured and also the mass accuracy. In a TDC, every individual ion generates
a pulse. This pulse is shaped into a digital signal, of which the rising flank is
used for timing. The time passed since the start of the ion accelerating pulse
and its arrival at the MCP is stored in the memory. This system is very accurate
over the entire mass range and is optimally suited for the accurate timing of
low ion counts. It is, however, less suitable or even unsuitable for the detection
of ions arriving simultaneously at the MCP since these will be recorded as
being single events and this will thus lead to an underestimation of the signal.
TDCs also suffer from an additional limitation concerning the detection of
ions. During the time required to process one pulse, the detector is ‘blind’ to
new incoming pulses. This so-called dead time not only leads to a further
underestimation of the signal, but also causes a shift in the observed m/z
value towards lower values. This can lead to serious deviations from the true
accurate mass at high to very high signal intensities. These problems occur to
a lesser extent in instruments equipped with an ADC, since these machines can
sample the analog output of the MCP at very high frequencies, thus providing
multiple data points per observed m/z value. In this way, multiple ions arriving
at the same time will lead to a linear increase in peak area. In some designs, TDC
and ADC are both used to combine the high mass accuracy of the TDC at low
ion rates, with the high dynamic range and accurate m/z value measurements
of an ADC at high ion rates.
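The underestimation caused by simultaneous ion arrivals at a TDC can be illustrated with a simple counting model. The sketch below assumes, as a simplification not taken from this chapter, that the number of ions of a given m/z arriving per accelerator pulse is Poisson distributed and that the TDC registers at most one event per pulse for that m/z. It does not model the accompanying shift in observed m/z, and the function names are hypothetical.

```python
import math


def tdc_expected_counts(mean_ions_per_pulse, n_pulses):
    """Events a TDC would record if it sees at most one ion per pulse."""
    return n_pulses * (1.0 - math.exp(-mean_ions_per_pulse))


def tdc_corrected_rate(observed_counts, n_pulses):
    """Invert the model: estimate the true mean number of ions per pulse."""
    fraction = observed_counts / n_pulses
    if fraction >= 1.0:
        raise ValueError("detector fully saturated; rate cannot be recovered")
    return -math.log(1.0 - fraction)


n_pulses = 10000
for true_rate in (0.01, 0.1, 0.5, 2.0):
    true_ions = true_rate * n_pulses
    recorded = tdc_expected_counts(true_rate, n_pulses)
    print(true_rate, true_ions, recorded, tdc_corrected_rate(recorded, n_pulses))
```

At low ion rates the recorded and true counts almost coincide, which is why a TDC performs well for weak signals; the gap widens quickly as the mean number of ions per pulse approaches one.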
(Q)TOF-MS in plant metabolics 35
Fig. 1. Schematic diagram showing the main components of a typical quadrupole/time of flight
configuration. Ions enter the instrument on the left, and pass through the first quadrupole.
This can be operated either in ion transfer mode, which allows all ions to pass, or in selective
mode, which is used for precursor scanning and alignment of the different quadrupoles. The
ion transfer is a special quadrupole, intended to separate the operating pressures in the different
compartments of the quadrupole section. In the collision cell, a collision gas can be present in
order to induce fragmentation of the incoming ions. When no gas is present molecular ions will
be detected. Subsequently, molecular ions and/or charged fragments enter the TOF tube, where
they are collected and pushed into the drift zone. During this transition, ions are accelerated in
an electric field in the accelerator assembly, consisting of several ion lenses, which determines
their kinetic energy. The ions now all follow a trajectory towards the reflectron, consisting of
a pile of cylindrical ion lenses at different potentials, which causes them to be repelled towards
the detector, the microchannel plate (MCP). Here the ions strike the surface of the detector, which finally
converts the arrival of every single ion into a measurable electric current. Additional electronics
is required to process the electrical signals and the timing between pushing the ions into the drift
zone and their arrival at the detector
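The timing between the push and the arrival at the detector is what encodes m/z. As a minimal sketch, the function below uses the textbook relation for a linear, field-free drift region, in which an ion accelerated through a potential U acquires kinetic energy zeU = ½mv². The 1 m drift length and 5 kV acceleration voltage are hypothetical defaults, and the reflectron and orthogonal geometry of a real instrument are ignored.

```python
import math

ELEMENTARY_CHARGE_C = 1.602176634e-19
DALTON_KG = 1.66053906660e-27


def flight_time_us(mz, drift_length_m=1.0, accel_voltage_v=5000.0):
    """Flight time in microseconds for an ion of the given m/z (Da per charge)."""
    mass_per_charge_kg = mz * DALTON_KG
    # From z*e*U = 0.5*m*v**2: velocity depends only on mass per unit charge.
    velocity = math.sqrt(2.0 * ELEMENTARY_CHARGE_C * accel_voltage_v / mass_per_charge_kg)
    return drift_length_m / velocity * 1e6


for mz in (100.0, 400.0, 1000.0):
    print(mz, flight_time_us(mz))
```

Because flight time scales with the square root of m/z, a fourfold increase in m/z only doubles the flight time, which is why nanosecond-scale timing electronics are needed to separate ions of similar mass.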
Despite their high mass accuracy, resolution and sensitivity, the application
of TOF instruments in structure elucidation is quite limited due to the
absence of filtering and scanning capabilities. Consequently, hybrid instruments
have since been designed to cope with these shortcomings. These machines
include the addition of ion trap(s), quadrupoles or combinations thereof, to
the basic TOF analyser. One key example is the now increasingly well-known
and widely used QTOF system. These instruments rely on the combination
of two or more quadrupoles with a TOF analyser. The first quadrupole (Q1)
serves as a mass filter or ion tunnel, depending on the operational mode, with
the second quadrupole (Q2) serving as the collision cell for the fragmentation
of the ions which have passed through Q1. This fragmentation is achieved
using an electric field to accelerate the ions, in combination with a collision
gas such as nitrogen or argon. Fragmentation can be controlled by varying the
(very low) pressure of the collision gas and/or by varying the collision energy
through altering the acceleration voltage of the cell. Collisions with the gas
molecules also result in a cooling of the ions, as part of their kinetic
energy is transferred to the gas. This results in a more homogeneous energy
distribution of the individual ions, which in turn improves the mass accuracy
of the instrument. The ions and/or ion fragments are subsequently collected in
the accelerator part of the TOF instrument where a very short pulse is applied
to the electrodes of the chamber to eject the ions. In the case of orthogonal
ejection, the differences in kinetic energy along the z-axis will be smaller
than in the case of forward ejection. The differences in kinetic energy are
also further compensated in the reflectron lens, which repels the ions towards
the detector. Here, ions with higher energy penetrate further into the
reflectron than lower energy ions and so follow a longer path, which reduces
the difference in arrival time. A number of variations on this basic design
have been created.
These include, for example, the modification of the second quadrupole into
a linear ion trap with axial ejection through the addition of a number of ex-
tra ion lenses (Hager 2002). As a result, new possibilities are created, such as
the ability to store specific ions, which can be selectively ejected for complex
MS/MS or MSn analyses (Hager 2002). The high mass accuracy can be further
improved by using an internal (reference) standard that is sampled at regular
intervals throughout the entire analysis period. This reference is then used
to correct the instrument calibration on-the-fly (lock mass correction). Such
a capacity for continuous (re)calibration is particularly useful, if not essential,
in the case of long series of chromatographic runs where excellent, long-term
stability of the mass accuracy can be continuously achieved down to ±5 ppm.
This is significant as mass accuracy at or below this level allows us to predict
the chemical composition of a given ion by using the small known differences
in atomic masses of the various atomic elements. In this way, a first predic-
tion can be made about the nature and identity of the molecular component.
Combined with other data (retention time, nitrogen rule, stable isotope
distribution of 13C, etc.), this can then enable the list of possible molecular
identities to
be reduced even further and thus come closer to translating MS output into
named metabolites.
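How a ±5 ppm mass window narrows the list of possible elemental compositions can be sketched as a brute-force search. The example below is an illustration only: it considers neutral molecules containing C, H, N and O, uses standard monoisotopic atomic masses supplied here, applies a simple upper bound on hydrogen count, and ignores adducts, isotope patterns and the electron mass. The function names and search limits are hypothetical.

```python
# Standard monoisotopic atomic masses (Da); supplied here, not from the chapter.
M_C, M_H, M_N, M_O = 12.0, 1.00782503, 14.00307401, 15.99491462


def ppm_error(measured, theoretical):
    """Signed mass error in parts per million."""
    return (measured - theoretical) / theoretical * 1e6


def format_formula(c, h, n, o):
    """Build a formula string, omitting elements with a zero count."""
    parts = (("C", c), ("H", h), ("N", n), ("O", o))
    return "".join(f"{symbol}{count}" for symbol, count in parts if count)


def candidate_formulae(measured_mass, tol_ppm=5.0, max_c=30, max_n=4, max_o=20):
    """List (formula, mass) pairs within tol_ppm of the measured neutral mass."""
    window = measured_mass * tol_ppm * 1e-6
    hits = []
    for c in range(max_c + 1):
        for n in range(max_n + 1):
            for o in range(max_o + 1):
                heavy = c * M_C + n * M_N + o * M_O
                if heavy > measured_mass + window:
                    break  # adding more oxygen only increases the mass
                h = round((measured_mass - heavy) / M_H)
                if h < 0 or h > 2 * c + n + 2:
                    continue  # more hydrogens than valence allows
                mass = heavy + h * M_H
                if abs(mass - measured_mass) <= window:
                    hits.append((format_formula(c, h, n, o), mass))
    return hits


hits = candidate_formulae(270.0528)
for formula, mass in hits:
    print(formula, mass, ppm_error(270.0528, mass))
```

Additional constraints of the kind mentioned above, such as the nitrogen rule or the 13C isotope ratio, would be applied as further filters on the returned list.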
Combining the results obtained from several biological samples into a sin-
gle comparative analysis is an arduous task that requires the precise align-
ment and matching of peaks representing the same compound over all chro-
matograms. Due to its relatively robust chromatography and compound sepa-
ration efficiency, GC-(TOF)-MS of derivatized extracts is at present generally
preferred over LC-MS in metabolomic studies (Fiehn et al. 2000; Roessner et al.
2001a,b; Fernie et al. 2004). Nevertheless, GC-MS is less suitable for semi-polar
compounds among which are key classes of plant (secondary) metabolites
including flavonoids, (glyco-)alkaloids, glucosinolates and saponins. Recent
advances in techniques for improving resolution in LC by using capillary elec-
trophoresis (Soga et al. 2002), hydrophilic interaction columns (Tolstikov and
Fiehn 2002) and monolithic columns (Tolstikov et al. 2003) demonstrate the
high potential which TOF technology has for LC-MS to complement GC-MS in
unravelling metabolic profiles.
3 Data Analysis
Data analysis is perhaps the most crucial step in any metabolomics strategy
and the importance of bioinformatics tools should not be underestimated. In
a standard (ideal) approach, a whole range of standards would be used to assist
in identifying through simple linkage, which peaks in an MS output represent
which metabolites. However, as the vast majority of metabolites present in
complex plant extracts are as yet unknown and are not commercially available,
as is especially true for the secondary plant metabolites, this approach is
unfeasible at present for a true untargeted metabolomics approach. Another
strategy is therefore required which enables the automated and essentially
blind direct comparison of large numbers of spectra. Since most datasets are
very complicated, dedicated metabolomics software is needed for this purpose.
Some of this software is already available but more still needs to be developed
and this represents a major task for the next five years.
Data manipulation is essential for reliable metabolomics analyses and spe-
cial attention has to be paid to aspects such as baseline correction and noise
elimination. In addition, in the case of LC-MS, particular attention also needs
to be given to reliable correction of local drifts in retention time and accurate
mass. Different compositions of eluant can cause significant variation in base-
line especially when using steep LC gradients. For the successful correction of
such baseline fluctuations, the chromatogram has to contain a region without
strong peaks. Digital filtering will enable the elimination of excess noise which
would otherwise lead to the generation of erroneous (false) peaks. Some re-
cent software packages are able to deal with a number of these problems in
TOF data analysis. Another key element is the need to correct for retention
time fluctuations. Unlike capillary gas chromatography which is generally very
stable, liquid chromatography often suffers from relatively large, non-linear
(localised) fluctuations in retention time. This can be due to small differences
in pH, temperature, or the co-elution of components which interact differently
with the stationary phase. Consequently, this problem prevents a simple direct
comparison of different samples. A number of algorithms have been designed
to correct for this phenomenon. One such approach, based on photodiode
array type data, uses correlation optimised warping of the chromatograms to
achieve alignment of shifted peaks in the chromatograms (Nielsen et al. 1998).
For MS data, MetAlign™ software, in contrast, uses specific mass peaks with
strong local maxima throughout the chromatogram as ‘landmark peaks’ with
which to correct for chromatographic shifts over the entire series of analyses
(Vorst et al. 2005). After correction, unbiased, direct spectral comparisons
based on mass peak intensities are possible, and contrasting mass signals can
be reliably identified and extracted. Differential chromatograms are produced
from which all unchanging peaks have been removed to reveal the true extent
of the differences between two (groups of) samples in one or both directions.
This dedicated software can automatically handle hundreds of full scan MS
datasets obtained by either LC or GC, and is independent of type of mass
38 H.A. Verhoeven, C.H. Ric de Vos, R.J. Bino, and R.D. Hall
autosampler directly into the ion source and all ions with the corresponding
charge are then analysed by the MS. In this case, TOF instruments have a clear
advantage over scanning quadrupole instruments because their significantly
higher resolution allows for the simultaneous detection of many ion species
and, because no scanning is required, every individual ion can theoretically
be captured. This inevitably results in a very rich mass spectrum which is
further complicated by the many interactions which can occur between the
different components of the sample during ionization. Furthermore, unstable
ions can cause additional extensive ion vapour phase interactions. For these
reasons, there was initially considerable scepticism about the potential value of
DFI approaches for reliable metabolic analysis; these phenomena are extensively
discussed elsewhere (Kebarle 2000; King et al. 2000). However, recent
publications have shown that the mass spectral data obtained in this way are
actually highly reproducible and can be used effectively for fast screening of
complex extracts (Aharoni et al. 2002; Goodacre et al. 2002, 2003; Castrillo et
al. 2003; Verhoeven et al., in preparation).
Data processing is in many cases the bottleneck for the successful deployment
of this technology, and many applications rely on a dedicated approach to
data processing. This was clearly demonstrated, for example, by the MS analysis
of unfractionated plant extracts of Pharbitis leaf sap (Goodacre et al. 2003).
Correct data processing of the complex mass spectra was found to be crucial for
reliable discrimination between the different physiological treatments used.
Experiments performed in our laboratory resulted in similar conclusions. Five
commercially available extracts from Salix were analysed using DFI in a QTOF
MS in positive mode. A single total ion count (TIC) injection peak was ob-
served, and all the masses obtained were combined into a single mass spectrum
per sample. The aligned spectra were processed for noise elimination, baseline
correction and then centroided to obtain the accurate masses of each m/z peak.
These were then aligned in the m/z dimension using exact masses of known
metabolites to correct for small fluctuations in exact mass due to unavoidable
minor (thermal) drift in the TOF tube. Intensities of the m/z peaks were log
transformed, and exported to GeneMaths™ for multivariate analysis. Principal
Component Analysis (PCA) revealed first (Fig. 2a) that the sample replicates
(Samples 1 and 2) cluster close together reflecting the high reproducibility of
the extraction and mass profiling techniques. Sample 3 is also clearly simi-
lar in overall composition to Sample 5, whereas Sample 4 is clearly distinct
from all others. Sample 4 was found to have come from a different supplier.
Differences in sample composition were readily detected by selecting the m/z
values which were responsible for the separation of the samples in the PCA
(Fig. 2b). This example indicates the usefulness of rapid screening for quality
control of complex extracts without the need for more dedicated but
time-consuming LC separation. In a similar manner, Goodacre et al. (2002) used
a rapid DFI approach to compare olive oil samples and to test for adulteration.
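The last two steps of the workflow described above, log transformation of the peak intensities followed by PCA, can be sketched as follows. This is only an illustration on synthetic data: it uses NumPy rather than the GeneMaths™ package named in the text, and the function name `pca_scores`, the +1 offset applied before taking the logarithm, and the simulated intensity table (five samples, of which the fourth differs in ten mass peaks) are all assumptions made for the example.

```python
import numpy as np


def pca_scores(intensities, n_components=2):
    """PCA of a samples-by-peaks intensity table after log transformation.

    Returns the sample scores, the peak loadings and the fraction of the
    total variance explained by each retained component.
    """
    # Log-transform; the +1 offset (an assumption) avoids log(0) for absent peaks.
    x = np.log10(np.asarray(intensities, dtype=float) + 1.0)
    # Centre every m/z peak (column) on its mean over all samples.
    x = x - x.mean(axis=0)
    # PCA via singular value decomposition of the centred matrix.
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    loadings = vt[:n_components].T
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, loadings, explained[:n_components]


# Synthetic example: 5 samples x 50 mass peaks with 2% multiplicative noise.
rng = np.random.default_rng(0)
base = rng.uniform(100.0, 10000.0, size=50)
odd_profile = base.copy()
odd_profile[:10] *= 20.0  # the fourth sample differs in the first ten peaks

samples = np.vstack([
    base * rng.normal(1.0, 0.02, 50),
    base * rng.normal(1.0, 0.02, 50),
    base * rng.normal(1.0, 0.02, 50),
    odd_profile * rng.normal(1.0, 0.02, 50),
    base * rng.normal(1.0, 0.02, 50),
])

scores, loadings, explained = pca_scores(samples)

# Peaks with the largest absolute loading on PC1 drive the sample separation.
top_peaks = np.argsort(np.abs(loadings[:, 0]))[-10:]
print("PC1 scores:", np.round(scores[:, 0], 2))
print("peaks driving PC1:", sorted(top_peaks.tolist()))
```

Ranking the peaks by their loading on the separating component is one simple way of selecting the m/z values responsible for the grouping of the samples; real data would first need the noise elimination, baseline correction and mass alignment steps described in the text.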
Fig. 2. a PCA plot of the entire set of detected mass peaks of 5 Salix samples. Samples 1 and 2
were experimental replicates taken from plant extracts of the same origin, but with different
batch numbers. Samples 3, 4 and 5 were samples of unknown, but different origin. This figure
shows that experimental variation (Samples 1 and 2) is low, Samples 3 and 5 are highly similar
with respect to their overall composition while Sample 4 is distinctly different from the rest being
placed on the other side of the PCA plot. b Detailed PCA of all mass peaks in the Samples 1 to 5.
The area responsible for the grouping and positioning of Samples 3 and 5 in the bottom right
quadrant is highlighted, and the corresponding mass peaks are shown as their logarithmic ratio
on the right together with the m/z values of each. Light grey: low abundant mass peaks, dark grey:
highly abundant mass peaks
detected on-line prior to the molecules entering the MS. It is this combination
of technologies which has made QTOF-based MS analysis a popular choice.
4.2.1 HPLC-PDA-ESI-QTOF-MS
Fig. 4. a Absorbance spectrum of chromatographic peak at retention time 23.48 min. b QTOF-
ESI-MS/MS spectrum of chromatographic peak at retention time 23.48 min. Observed accurate
mass of parent ion [M+H]+ = 611.1596 corresponds to an elemental composition of C27H31O21
(−2.6 ppm) and its fragments obtained correspond to C21H21O16 (+1.9 ppm) and C15H11O7
(−3.9 ppm)
References
Aharoni A, de Vos CHR, Verhoeven HA, Maliepaard CA, Kruppa G, Bino RJ, Goodenowe DB
(2002) Nontargeted metabolome analysis by use of Fourier Transform Ion Cyclotron Mass
Spectrometry. OMICS 6:217–234
Bino RJ, de Vos CHR, Lieberman M, Hall RD, Bovy A, Jonker HH, Tikunov Y, Lommen A, Moco S,
Levin I (2005) The light-hyperresponsive high pigment-2dg mutation of tomato: alterations
in the fruit metabolome. New Phytologist 166:427–438
Castrillo JI, Hayes A, Mohammed S, Gaskell SJ, Oliver SG (2003) An optimized protocol for
metabolome analysis in yeast using direct infusion electrospray mass spectrometry. Phyto-
chemistry 62:929–937
Chernushevich IV, Loboda AV, Thomson BA (2001) An introduction to quadrupole-time-of-flight
mass spectrometry. J Mass Spectrom 36:849–865
Fernie AR (2003) Metabolome characterization in plant system analysis. Funct Plant Biol
30:111–120
Fernie AR, Trethewey RN, Krotzky AJ, Willmitzer L (2004) Metabolic profiling: from diagnostics
to systems biology. Nature Rev Mol Cell Biol 5:763–769
Fiehn O (2001) Combining genomics, metabolome analysis and biochemical modelling to un-
derstand metabolic networks. Comp Funct Genom 2:155–168
Fiehn O (2002) Metabolomics – the link between genotypes and phenotypes. Plant Mol Biol
48:155–171
Fiehn O, Kopka J, Dörmann P, Altmann T, Trethewey RN, Willmitzer L (2000) Metabolic profiling
for plant functional genomics. Nat Biotechnol 18:1157–1161
Goodacre R, Vaidyanathan S, Bianchi G, Kell DB (2002) Metabolic profiling using direct infusion
electrospray ionisation mass spectrometry for the characterisation of olive oils. Analyst
127:1457–1462
Goodacre R, York EV, Heald JK, Scott IM (2003) Chemometric discrimination of unfractionated
plant extracts analyzed by electrospray mass spectrometry. Phytochemistry 62:859–863
Goodacre R, Vaidyanathan S, Dunn WB, Harrigan GG, Kell DB (2004) Metabolomics by
numbers: acquiring and understanding global metabolomics data. Trends Biotechnol
22:245–252
Guilhaus M (1995) Principles and instrumentation in Time-of-flight mass spectrometry. J Mass
Spectrom 30:1519–1532
Hager JW (2002) A new linear ion trap mass spectrometer. Rapid Commun Mass Spectrom
16:512–526
Hall RD, de Vos CHR, Verhoeven HA, Bino RJ (2005) Metabolomics for the assessment of
functional diversity and quality traits in plants. In: Harrigan G, Vaidyanathan S, Goodacre R
(eds) Metabolic profiling. Kluwer Acad Publ, Dordrecht, Netherlands pp 31–44
Jander G, Norris SR, Joshi V, Fraga M, Rugg A, Yu S, Li L, Last RL (2004) Application of a high-
throughput HPLC-MS/MS assay to Arabidopsis mutant screening; evidence that threonine
aldolase plays a role in seed nutritional quality. Plant J 39:465–475
Kebarle P (2000) A brief overview of the present status of the mechanisms involved in electrospray
mass spectrometry. J Mass Spectrom 35:804–817
King R, Bonfiglio R, Fernandez-Metzler C, Miller-Stein C, Olah T (2000) Mechanistic investigation
of ionization suppression in electrospray ionization. J Am Soc Mass Spectrom 11:942–950
LeGall G, DuPont MS, Mellon FA, Davis AL, Collins GJ, Verhoeyen ME, Colquhoun IJ (2003) Char-
acterization and content of flavonoid glycosides in genetically-modified tomato (Lycopersicon
esculentum) fruits. J Agric Food Chem 51:2438–2446
Levin I, Frankel P, Gilboa N, Tanny S, Lalazar A (2003) The tomato dark green mutation is a novel
allele of the tomato homolog of the DEETIOLATED1 gene. TAG 106:454–460
Markham KR (1989) Flavones, flavonols and their glycosides. In: Dey PM, Harborne JB (eds)
Methods in plant biochemistry, vol 1. Academic Press, San Diego, USA, pp 197–235
Muir S, Collins GJ, Robinson S, Hughes S, Bovy A, de Vos CHR, van Tunen AJ, Verhoeyen ME
(2001) Overexpression of petunia chalcone isomerase in tomato results in fruit containing
increased levels of flavonoids. Nat Biotechnol 19:470–474
Nielsen N-PV, Carstensen JM, Smedsgaard J (1998) Aligning of single and multiple wavelength
chromatographic profiles for chemometric data analysis using correlation optimised warp-
ing. J Chromatogr A 805:17–35
Roessner U, Willmitzer L, Fernie AR (2001a) High-resolution metabolic phenotyping of geneti-
cally and environmentally diverse potato tuber systems. Identification of phenocopies. Plant
Physiol 127:746–764
Roessner U, Luedemann A, Brust D, Fiehn O, Linke T, Willmitzer L, Fernie A (2001b) Metabolic
profiling allows comprehensive phenotyping of genetically or environmentally modified plant
systems. Plant Cell 13:11–29
Soga T, Ueno Y, Naraoka H, Matsuda K, Tomita M, Nishioka T (2002) Pressure-assisted capillary
electrophoresis electrospray ionization mass spectrometry for analysis of multivalent anions.
Anal Chem 74:6224–6229
Sumner LW, Mendes P, Dixon RA (2003) Plant metabolomics: large-scale phytochemistry in the
functional genomics era. Phytochemistry 62:817–836
Tolstikov VV, Fiehn O (2002) Analysis of highly polar compounds of plant origin: combination
of hydrophilic interaction chromatography and electrospray ion trap mass spectrometry. Anal
Biochem 301:298–307
Tolstikov VV, Lommen A, Nakanishi K, Tanaka N, Fiehn O (2003) Monolithic silica-based cap-
illary reversed phase liquid chromatography / electrospray mass spectrometry for plant
metabolomics. Anal Chem 75:6737–6740
Van Tuinen A, de Vos CHR, Hall RD, van der Plas LHW, Bowler C, Bino RJ (2005) Use of
metabolomics for identification of tomato genotypes with enhanced nutritional value derived
from natural light-hyperresponsive mutants. In: Jaiwal PK (ed) Improving the nutritional and
therapeutic qualities of plants. (Plant Metabolic Engineering & Molecular pharming.) SciTech
Publishers, Raleigh, USA (in press)
Vorst OF, de Vos CHR, Lommen A, Staps RV, Visser RGF, Bino RJ, Hall RD (2005) A non-directed
approach to the differential analysis of multiple LC/MS-derived metabolic profiles.
Metabolomics 1:169–180
Weckwerth W, Tolstikov V, Fiehn O (2001) Metabolomic characterization of transgenic potato
plants using GC/TOF and LC/MS analysis reveals silent metabolic phenotypes. Abstract:
Proceedings of the 49th ASMS Conference on Mass spectrometry and Allied Topics (1–2)
Wolff JC, Eckers C, Sage AB, Giles K, Bateman R (2001) Accurate mass liquid chromatography /
mass spectrometry on quadrupole orthogonal acceleration time-of-flight mass analyzers
using switching between separate sample and reference sprays. 2 Applications using the
dual-electrospray ion source. Anal Chem 73:2605–2612
I.4 Capillary HPLC
T. Ikegami1 , E. Fukusaki2 , and N. Tanaka1
1 Introduction
Micro HPLC systems with a monolithic silica capillary column possess the
following advantages:
1. Small consumption of stationary and mobile phases
2. High detection sensitivity for a given amount of sample
3. High-speed separation with low pressure drop
4. The possible use of a long column (1–2 m) that can provide around
100,000–200,000 theoretical plates
along with some disadvantages:
Table 1. Column sizes, flow rates, linear velocities, and degrees of sample dilution
Here, the features of monolithic silica capillary columns and the optimization
of separation conditions will be described. The use of monolithic silica columns
consisting of network silica skeletons and through-pores for micro HPLC was
reported recently (Minakuchi et al. 1996; Tanaka et al. 2001). Monolithic silica
capillary columns were reported to provide better separation efficiencies than
particle-packed columns, and the use of these columns for proteomics and
Fig. 1. Scanning electron microscope images of monolithic silica prepared from sol-gel methods:
a monolithic silica prepared in a test tube; b,c monolithic silica prepared in 50 μm ID fused silica
capillary; d monolithic silica prepared in 100 μm ID fused silica capillary; e monolithic silica
prepared in 200 μm ID fused silica capillary tube
Fig. 2. Chromatograms obtained for alkylbenzenes (C6H5(CH2)nH, n = 0–6) by: a–d C18
monolithic silica capillary columns; e particle-packed column (5 μm silica-C18 particles, Mightysil
RP18)
injection (Taniguchi and Murata 2002). Moreover, the use of weak eluents for
sample injection is also effective in increasing the separation efficiency: in the
case of reversed-phase HPLC, the sample solution can be prepared in a water-rich
solvent (Ikegami et al. 2004).
4 Two-Dimensional HPLC
Peak capacity (PC), given by Eq. (7), expresses the separation ability of
a chromatographic system as the number of solutes that can potentially be separated.
The retention times of the first and the last solute are given as t1 and tR,
respectively, in Eq. (7). Separation methods such as ultrahigh-pressure liquid
chromatography (UHPLC) and supercritical fluid chromatography (SFC) can
produce a PC of ca. 300/h (Shen and Lee 1998; MacNair et al. 1999), while
a conventional HPLC system gives a PC of 100–200/h. In order to achieve
a far larger PC using conventional HPLC systems, multidimensional separation
systems were shown to be effective. When two chromatographic systems with
PCx and PCy are combined to form a two-dimensional (2D) chromatography
system, the PC of the total system can be theoretically estimated as the product of
the two PC values, as in Eq. (8) (Giddings 1991):
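As a numerical illustration of the two relations referred to here, the short sketch below computes a one-dimensional peak capacity and the theoretical peak capacity of a 2D system. The product rule for the 2D system is taken from the text. Eq. (7) itself is not reproduced in this section, so the one-dimensional expression used here, PC = 1 + (√N/4) ln(tR/t1) for N theoretical plates at unit resolution, is an assumption based on the commonly quoted isocratic form; the function names and example values are likewise hypothetical.

```python
import math


def peak_capacity(plates, t_first, t_last):
    """Isocratic peak capacity at unit resolution (assumed form of Eq. 7).

    plates  -- number of theoretical plates N
    t_first -- retention time of the first solute (t1)
    t_last  -- retention time of the last solute (tR), same units as t_first
    """
    if plates <= 0 or t_first <= 0 or t_last < t_first:
        raise ValueError("need N > 0 and 0 < t_first <= t_last")
    return 1.0 + math.sqrt(plates) / 4.0 * math.log(t_last / t_first)


def peak_capacity_2d(pc_x, pc_y):
    """Theoretical peak capacity of a 2D system: product of both dimensions."""
    return pc_x * pc_y


# Hypothetical example: 10,000 plates, first and last solute at 1 and 10 min.
pc_1d = peak_capacity(10_000, 1.0, 10.0)
print(f"1D peak capacity: {pc_1d:.1f}")

# Two dimensions with peak capacities of 50 and 20 give 1000 in theory.
print("2D peak capacity:", peak_capacity_2d(50, 20))
```

The product is an upper limit that assumes the two separation mechanisms are fully independent; remixing of separated peaks in the 2nd-D injector loop lowers the value achieved in practice.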
Fig. 3. Replicate injections of an Arabidopsis leaf methanol extract on capillary monolithic C18 columns in positive ionization fullscan MS, given as base peak
chromatograms. Upper panel 0.2 mm ID, 300 mm long; middle panel 0.2 mm ID, 600 mm long; lower panel 0.2 mm ID, 900 mm long column; t0 , void volume
Fig. 4. a Tubing connection at 2nd-D injector of simple 2D-HPLC. b Tubing connection of two
six-port valves used as 2nd-D injector
fraction from the 1st-D column is loaded and temporarily kept in a loop of the
2nd-D injector that results in mixing of separated peaks, but the flow rates of
two HPLC systems can be controlled independently. The 2nd-D separation can
be carried out at very high flow rate (for example, 10 ml/min for a 4.6 mm ID
column) throughout the separation. The simplest 2D-HPLC system (Fig. 4a) produced
PC = 1000 in reversed-phase mode. When two six-port valves or a ten-port
valve are used at the 2nd-D HPLC (Fig. 4b), all fractions can be subjected to
separation on the 2nd-D column. This provides a comprehensive 2D-HPLC system
and results in so-called group separation, in which solutes with similar structural
features appear as a group. Because of the fast flow rate in the 2nd-D separation using
a 4.6 mm ID column, the 2D-HPLC system consumed a large volume of mobile phase
solvent. In order to reduce the consumption of mobile phases, a sufficiently fast,
simple 2D-HPLC system using capillary columns has been examined (Kimura et al.
2004). The use of a capillary column in the 2nd-D leads to lower solvent consumption
and better MS detectability compared to a larger-sized column. Figure 5a shows
a 2D chromatogram for the tryptic digest of bovine serum albumin (BSA)
obtained from total ion monitoring by ESI-time-of-flight (TOF)-MS. From the
1st-D (2.1 mm ID, 5.0 cm long), 18 fractions were injected at 2-min intervals
into the 2nd-D reversed-phase system (4.6 mm ID, 2.5 cm long), generating
18 chromatograms that were used to produce a 2D chromatogram. Figure 5b
shows a 2D chromatogram obtained for the separation of the tryptic digest of BSA
using a capillary column (100 μm ID, 10 cm long) in the 2nd-D separation. The
number of spots distinguishable in the vertical direction in Fig. 5b was greater than
that in Fig. 5a. This is due to the higher column efficiency and longer gradient
time in the 2nd-D, along with the greater MS detection sensitivity at the nearly
optimum flow rate (3 μL/min) on the capillary column, the greater amount of
sample introduced to the 2nd-D column because of the longer fractionation
interval, and the smaller extent of dilution due to the use of a small-diameter
column (Kimura et al. 2004).
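The way in which a series of consecutive 2nd-D chromatograms is assembled into a 2D chromatogram can be sketched as follows. This is a minimal illustration, not the data system used for Fig. 5: it assumes that the detector signal is recorded as one continuous trace at a fixed sampling rate and that every 2nd-D run lasts exactly one fractionation interval. The function name `fold_to_2d`, the 1 Hz sampling rate and the synthetic trace are hypothetical.

```python
import numpy as np


def fold_to_2d(signal, sampling_hz, modulation_s):
    """Fold a continuous detector trace into a (fractions x time) matrix.

    Row i holds the 2nd-D chromatogram of the i-th fraction taken from the
    1st-D column; data points of an incomplete final cycle are discarded.
    """
    signal = np.asarray(signal, dtype=float)
    points_per_cycle = int(round(sampling_hz * modulation_s))
    if points_per_cycle < 1:
        raise ValueError("modulation period shorter than one data point")
    n_fractions = signal.size // points_per_cycle
    usable = signal[: n_fractions * points_per_cycle]
    return usable.reshape(n_fractions, points_per_cycle)


# 18 fractions injected at 2-min intervals, sampled at an assumed 1 Hz.
trace = np.arange(18 * 120, dtype=float)  # placeholder for a real TIC trace
plane = fold_to_2d(trace, sampling_hz=1.0, modulation_s=120.0)
print(plane.shape)  # rows: 1st-D fraction, columns: 2nd-D retention time
```

Plotting the matrix as an image or contour map, with the fraction number on one axis and the 2nd-D retention time on the other, gives the kind of spot pattern described for Fig. 5.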
Fig. 5. Two-dimensional separation of a tryptic digest of BSA by simple 2D-HPLC. 1st-D: MCI
CQK-31S column (2.1 mm ID, 50 mm long), flow rate 50 μl/min. a 2nd-D: monolithic silica-C18
column (4.6 mm ID, 25 mm long), flow rate 5.0 ml/min; b 2nd-D: C18 monolithic column (0.1 mm
ID, 100 mm long), flow rate in the capillary column 3.0 μl/min with a split flow/injection, linear
velocity in the column 7.7 mm/s. ESI-TOF-MS detection, total ion chromatogram for a mass
range 400–2000
Fig. 6. Comparison of chromatograms of an Arabidopsis thaliana leaf methanol extract, obtained by HILIC-LC mode (top panel) and reversed-phase mode
(bottom panel). Conditions: (top panel) TSK Gel Amide 80, 4.6 mm ID, 150 mm long, gradient elution from MeCN to ammonium acetate buffer (6.5 mmol/l,
pH 5.5), MeCN content (%) (time, min) 100 → 100(5) → 90(8) → 60(75) → 0(80); (bottom panel) C18 column, 4.6 mm ID, 150 mm long, gradient elution
from ammonium acetate buffer (6.5 mmol/l, pH 5.5) to MeCN, MeCN content (%) (time, min) 0 → 0(15) → 95(40) → 100(60) → 100(80)
the compositions of the mobile phases that control the retention order are total
opposites of each other. Capillary columns for HILIC LC are under development.
6 Outlook
Routine use of micro HPLC will require the development of several important
components: the reproducible preparation of high-performance columns, small-volume
pumps and gradient systems, and an improved injection system.
Subjects to be studied are the development of high-performance monolithic
silica columns for a variety of separation modes, multidimensional microLC
systems, and optimization of the interface between LC and MS instruments.
Large peak capacities realized by highly efficient microHPLC systems or mul-
tidimensional HPLC will greatly contribute to metabolomics studies when
coupled with MS instruments and stable isotope dilution methodology.
References
Alpert AJ, Shukla M, Shukla AK, Zieske LR, Yuen SW, Ferguson MAJ, Mehlert A, Pauly M, Orlando
R (1994) Hydrophilic-interaction chromatography of complex carbohydrates. J Chromatogr
A 676:191–202
Bamba T, Fukusaki E, Nakazawa Y, Kobayashi A (2004) Rapid and high-resolution analysis of
geometric polyprenol homologues by connected octadecylsilylated monolithic silica columns
in high-performance liquid chromatography. J Sep Sci 27:293–296
Bristow PA, Knox JH (1977) Standardization of test conditions for high performance liquid
chromatography columns. Chromatographia 10:279–289
Bushey MM, Jorgenson JW (1990) Automated instrumentation for comprehensive two-
dimensional high-performance liquid chromatography of proteins. Anal Chem 62:161–167
Cabrera K (2004) Application of silica-based monolithic HPLC columns. J Sep Sci 27:843–852
Fukusaki E, Harada K, Bamba T, Kobayashi A (2005) An isotope effect on the comparative
quantification of flavonoids by means of methylation-based stable isotope dilution coupled
with capillary liquid chromatograph/mass spectrometry. J Biosci Bioeng 99:75–77
Giddings JC (1965) Dynamics of chromatography, part 1. Principles and theory. Dekker, New
York
Giddings JC (1991) Unified separation science. Wiley-Interscience, New York, pp 126–128
Han DK, Eng J, Zhou H, Aebersold R (2001) Quantitative profiling of differentiation-induced
microsomal proteins using isotope-coded affinity tags and mass spectrometry. Nat Biotechnol
19:946–951
Ikegami T, Dicks E, Kobayashi H, Morisaka H, Tokuda D, Cabrera K, Hosoya H, Tanaka N (2004)
How to utilize the true performance of monolithic silica columns. J Sep Sci 27:1292–1302
Ishizuka N, Minakuchi H, Nakanishi K, Soga N, Nagayama H, Hosoya K, Tanaka N (2000) Perfor-
mance of a monolithic silica column in a capillary under pressure-driven and electrodriven
conditions. Anal Chem 72:1275–1280
Ishizuka N, Kobayashi H, Minakuchi H, Nakanishi K, Hirao K, Hosoya K, Ikegami T, Tanaka N
(2002) Monolithic silica columns for high-efficiency separations by high-performance liquid
chromatography. J Chromatogr A 960:85–96
Taniguchi H, Murata Y (2002) The newest protocol of proteomics 9, Capillary HPLC. Cell Tech
21:1332–1343
Tolstikov VV, Fiehn O (2002) Analysis of highly polar compounds of plant origin: combination of
hydrophilic interaction chromatography and electrospray ion trap mass spectrometry. Anal
Biochem 301:298–307
Tolstikov VV, Lommen A, Nakanishi K, Tanaka N, Fiehn O (2003) Monolithic silica-based
capillary reversed-phase liquid chromatography/electrospray mass spectrometry for plant
metabolomics. Anal Chem 75:6737–6740
Tomita M, Nishioka T (2003) Frontier of metabolomics. Springer, Berlin Heidelberg New York
Venkatramani CJ, Zelechonok Y (2003) An automated orthogonal two-dimensional liquid chro-
matograph. Anal Chem 75:3484–3494
Wagner K, Miliotis T, Marko-Varga G, Bischoff R, Unger KK (2002) An automated on-line multidi-
mensional HPLC system for protein and peptide mapping with integrated sample preparation.
Anal Chem 74:809–820
Yoshida T (1997) Peptide separation in normal phase liquid chromatography. Anal Chem 69:3038–
3043
Zhang R, Sioma CS, Wang S, Regnier FE (2001) Fractionation of isotopically labeled peptides in
quantitative proteomics. Anal Chem 73:5142–5149
I.5 Capillary HPLC Coupled to Electrospray Ionization
Quadrupole Time-of-flight Mass Spectrometry
S. Clemens, C. Böttcher, M. Franz, E. Willscher,
E. v. Roepenack-Lahaye, and D. Scheel1
1 Introduction
Metabolite profiling in the pre-metabolomics era of the early 1970s to the late
1990s as well as the pioneering metabolomics projects since the late 1990s
have been predominantly GC-MS based. GC-MS techniques are robust and
well-established. Many primary metabolites (e. g. organic acids, sugars, amino
acids, sugar alcohols) can easily be derivatized and are therefore amenable to
GC-MS analysis. Also, spectral databases and deconvolution algorithms are
available, which help to extract meaningful information. Early on, however, it
was obvious that no single analytical technique would be sufficient to achieve
comprehensive coverage of the metabolome (Sumner et al. 2003). As stated
from the beginning and reiterated since, the chemical diversity of metabolites
makes it virtually impossible to detect all compound classes in one “catch”
(Goodacre et al. 2004; Dunn et al. 2005). That is why even the first reports
describing GC-MS-based metabolomics platforms emphasized the need to
develop complementary LC-MS platforms (Roessner et al. 2000). LC-MS covers
in principle a much wider mass range and should allow one to target many
compound classes not detectable by GC-MS. Furthermore, there is usually
no need for derivatization and LC-MS offers superior options to elucidate
unknown metabolites structurally. Particular fractions can easily be collected
for NMR analysis and metabolites/molecular ions can be further analyzed by
tandem-MS or even MSn. The adoption of LC-MS approaches for metabolomics
was hampered, however, by the fact that LC-MS developed into a routine
technology only rather recently, i. e. in the 1990s (Niessen 1999a).
One might argue that the need for LC-MS-based profiling is even more
pressing in plant science. An exceptionally rich and diverse secondary metabolism is
a hallmark of plant biology. Lacking the ability to avoid or to retreat from
unfavorable conditions or potential foes, plants have evolved an enormous
metabolic plasticity, which allows them to respond dynamically to environ-
mental changes through the synthesis and/or degradation of particular com-
pounds. This is complemented by the accumulation of various pre-formed
defenses against microbial attack and other threats (Dixon 2001). Further-
more, many so-called secondary metabolites also apparently play major roles
in primary developmental processes and as signaling molecules. Flavonoids
and their biosynthesis, for instance, have long been investigated because of
their role in flower pigmentation, UV protection, or pathogen defense
(Winkel-Shirley 2001). More recent work demonstrated that flavonoids negatively
regulate auxin transport and are required for pollen germination (Taylor and
Grotewold 2005).
1 Leibniz Institute of Plant Biochemistry, Weinberg 3, 06120 Halle/Saale, Germany, e-mail:
[email protected]
A large fraction of plant secondary metabolites has been classically analyzed
by LC techniques, predominantly through separation on reversed phase mate-
rial. Thus, it is a straightforward concept to combine this with state-of-the-art
mass spectrometry in order to develop powerful metabolomics platforms that
cover important compound classes such as phenylpropanoids or alkaloids.
A look at Arabidopsis thaliana, the most important plant model species, can
illustrate the need for and the potential of LC-MS profiling. Because A. thaliana
has no history of use as a medicinal plant, it initially attracted little attention
from natural product chemists. As a consequence, only a few secondary
metabolites had been identified 10 years ago. In the course of genome sequencing,
however, it became increasingly clear that A. thaliana should produce
thousands of different compounds. The Arabidopsis genome encodes a myriad
of proteins likely to be involved in secondary metabolism (d’Auria and Ger-
shenzon 2005). There are more than 270 cytochrome P450 genes, more than
100 glycosyl transferase genes, about 50 glutathione S-transferase genes, to
name a few. For most of the encoded enzymes we do not know substrates or
products.
The first major challenge for metabolomics is the huge chemical diversity
of the metabolome. The second lies in the fact that – as indicated above for
Arabidopsis thaliana – most of the metabolites in any given higher eukaryote
are unknown. Current estimates are in the range of 4000–20,000 metabolites
for a given species (Fernie et al. 2004). Unlike for proteins, genome sequences
do not allow one to deduce the structures of the metabolites. Instead, the
structures have to be elucidated, because standards are available for only a very
minor portion of the metabolites. Thus, the future success of metabolomics
will also be determined by the ability to identify metabolites reliably and
to establish the metabolomes of the important model species. Again, this is
a particularly daunting task for plants and filamentous fungi, organisms that
synthesize huge numbers of secondary metabolites, many of which might only
be synthesized in certain cell types or at particular developmental stages. LC-
MS, especially in the combination of quadrupole and time-of-flight analysis in
modern hybrid instruments, holds the promise to meet this challenge as well.
Structural information can in principle be obtained in three different ways:
(i) by determining the elemental composition through the accurate mass, (ii)
by exploiting the information provided by in-source fragmentation, and (iii)
by performing targeted collision-induced dissociation (CID)-MS. In contrast,
GC-MS-based profiling faces severe limitations when it comes to de novo
identification of unknown compounds (Fiehn 2002). Molecular ions are rarely
detected because most analytes are derivatized and molecules are fragmented
by the electron impact ionization.
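Route (i), narrowing down the elemental composition from the accurate mass, rests on a simple calculation: the theoretical monoisotopic m/z of each candidate formula is compared with the measured value and the deviation is expressed in ppm. The sketch below illustrates this under stated assumptions: the small table of monoisotopic masses, the function names, the measured value of 303.0505 and the two candidate cation formulas are chosen purely for illustration and are not taken from the instrument software.

```python
import re

# Monoisotopic masses (u) of the most abundant isotopes; illustrative subset.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.00307401,
    "O": 15.99491462,
    "S": 31.97207117,
}
ELECTRON = 0.00054858  # mass of an electron (u)


def formula_mass(formula):
    """Monoisotopic mass of a simple formula such as 'C15H11O7'."""
    mass = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        mass += MONOISOTOPIC[element] * (int(count) if count else 1)
    return mass


def cation_mz(ion_formula, charge=1):
    """m/z of a cation whose formula already includes the added proton(s)."""
    return (formula_mass(ion_formula) - charge * ELECTRON) / charge


def ppm_error(observed_mz, theoretical_mz):
    """Relative mass deviation in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


# Hypothetical measurement and two candidate [M+H]+ compositions.
observed = 303.0505
for candidate in ("C15H11O7", "C16H15O6"):
    theoretical = cation_mz(candidate)
    print(f"{candidate}: m/z {theoretical:.4f}, "
          f"error {ppm_error(observed, theoretical):+.1f} ppm")
```

With a mass accuracy of a few ppm, only the first candidate remains plausible in this example. Whether the electron mass is subtracted for the ion is a matter of convention that shifts the result by roughly 1 to 2 ppm in this mass range, so it should be stated when ppm values are reported.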
Capillary LC-ESI-QTOF-MS-based profiling 67
mass spectrometer. Often these two options are combined. In capillary liquid
chromatography the flow rate is reduced to meet the optimum flow rate range
characteristic for many ESI interfaces. Splitting occurs – if at all – prior to
chromatography between the pump and the column. Chromatography is per-
formed at low flow rates of 2−20 μL/min (Abian et al. 1999). Column diameters
are typically between 80 and 800 μm. In principle, MS is a mass-flow-sensitive
detection method because the response is proportional to the actual number of
molecules reaching the detector. However, at a constant flow rate under
atmospheric pressure ionization conditions, MS acts as a concentration-sensitive
detector, i. e. the signal is proportional to the analyte concentration in the
eluent (Niessen 1999a). The smaller diameter of a capillary column as compared
to a regular 4.6-mm analytical column combined with a lower flow rate allows
the use of much smaller sample volumes and lower sample concentrations.
Furthermore, depending on the design of the ESI interface, a reduced flow rate
can result in higher sensitivity because the smaller primary droplets formed
give an enhanced ionization yield (Wilm and Mann 1994). Thus, since the
mid 1990s there has been a trend towards miniaturization of the LC (Abian et
al. 1999), although the better sensitivity – i. e. lower concentration detection
limits – is partly offset by the need to reduce the injection volume and by the
lower capacity of the column.
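The relation between column diameter, flow rate and peak concentration can be made concrete with a short calculation. The sketch below assumes the usual geometric scaling, in which the flow rate is reduced in proportion to the column cross-section so that the linear velocity stays constant, and in which the same injected mass elutes in a peak volume that scales in the same way. Equal column length, porosity and efficiency are assumed, and the function names and the 0.3 mm example column are hypothetical.

```python
def scaled_flow_rate(flow_ul_min, id_from_mm, id_to_mm):
    """Flow rate giving the same linear velocity on a column of different ID."""
    return flow_ul_min * (id_to_mm / id_from_mm) ** 2


def concentration_gain(id_from_mm, id_to_mm):
    """Factor by which the peak concentration rises for the same injected mass."""
    return (id_from_mm / id_to_mm) ** 2


# Going from a 4.6 mm analytical column at 1 mL/min to a 0.3 mm capillary.
flow = scaled_flow_rate(1000.0, 4.6, 0.3)
gain = concentration_gain(4.6, 0.3)
print(f"capillary flow rate: {flow:.2f} uL/min")  # about 4.25 uL/min
print(f"peak concentration gain: {gain:.0f}x")    # about 235x
```

The scaled flow rate falls within the 2 to 20 μL/min range quoted above. The theoretical gain in concentration is not fully realized in practice, because the injection volume and the loading capacity have to be reduced along with the column diameter.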
It is advisable to inject as small a volume as can be delivered reproducibly,
in a solvent of low eluotropic strength. Otherwise, retention on the stationary
phase is incomplete and many compounds will elute partly in the flow-through.
Furthermore, separation can be seriously disturbed, resulting
in asymmetrical peak shapes and altered retention times. Figure 1 shows
the extracted ion chromatograms, which correspond to the molecular ion
of 4-glucopyranosyloxybenzoyl choline, a secondary metabolite identified in
methanolic seed extracts, injected in either 2 μL 80% methanol (a) or 2 μL 10%
Fig. 1. Influence of solvent on the retention and separation. Extracted ion chromatograms (XIC
412.0–412.5) showing the altered retention behaviour of 4-glucopyranosyloxybenzoyl choline
from a seed extract upon injection in different injection solvent mixtures: a 80% methanol –
a fraction of the metabolite elutes in the flow-through (tR = 6.60 min); b 10% methanol –
diastereomers are retained on the column and baseline-separated (tR = 16.80, tR = 20.73 min)
Capillary LC-ESI-QTOF-MS-based profiling 69
Fig. 2. Effects of ion source potentials on sensitivity and degree of in-source fragmentation:
a schematic overview of the differentially pumped (evacuated) interface between ion source and
mass spectrometer of an API QSTAR Pulsar Hybrid LC/MS system: curtain plate (CP), curtain
gas (CGs), orifice (OR), ring (RNG), skimmer (SK); b definition of electrical potentials applied in
the interfacial region: declustering potential (DP), focusing potential (FP); c,d breakdown curves
for hirsutin (c) and rutin (d) obtained in DP1 and DP2 ramping experiments
retention times, accurate mass and intensity. Self-made macros are then needed
to normalize and align the peak lists and to compare intensities (von Roepenack-Lahaye
et al. 2004). These latter steps are covered by the software MetAlign,
developed by Arjen Lommen (www.metalign.nl; Tolstikov et al. 2003), which
aligns and compares sets of chromatograms to identify differentially abundant
mass signals. Reproducibility of the retention times of capillary LC is, in our
experience, high enough to allow accurate alignment of chromatograms.
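The normalize, align and compare steps can be sketched as follows. This is a minimal illustration, not the macros used by the authors and not the MetAlign algorithm: peaks are normalized to the total intensity of their list, matched greedily within retention time and m/z tolerances, and compared as intensity ratios. The `Peak` class, the function names and the tolerance values are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Peak:
    rt_min: float      # retention time in minutes
    mz: float          # accurate mass (m/z)
    intensity: float


def normalize(peaks):
    """Scale intensities so that each peak list sums to 1 (one possible choice)."""
    total = sum(p.intensity for p in peaks)
    return [Peak(p.rt_min, p.mz, p.intensity / total) for p in peaks]


def align(reference, sample, rt_tol=0.2, mz_tol=0.01):
    """Greedily pair each reference peak with the closest unused sample peak
    that lies within both the retention time and the m/z tolerance."""
    unused = list(sample)
    pairs = []
    for ref in reference:
        candidates = [s for s in unused
                      if abs(s.rt_min - ref.rt_min) <= rt_tol
                      and abs(s.mz - ref.mz) <= mz_tol]
        if not candidates:
            continue  # no counterpart found: a possible qualitative difference
        best = min(candidates, key=lambda s: abs(s.rt_min - ref.rt_min))
        unused.remove(best)
        pairs.append((ref, best))
    return pairs


def intensity_ratios(pairs):
    """Return (m/z, sample/reference intensity ratio) for each aligned pair."""
    return [(ref.mz, smp.intensity / ref.intensity) for ref, smp in pairs]


if __name__ == "__main__":
    # Made-up peak lists, for illustration only.
    run_a = normalize([Peak(5.10, 301.14, 800.0), Peak(12.40, 455.20, 200.0)])
    run_b = normalize([Peak(5.15, 301.141, 500.0), Peak(12.45, 455.203, 500.0)])
    for mz, ratio in intensity_ratios(align(run_a, run_b)):
        print(f"m/z {mz:.3f}: ratio {ratio:.2f}")
```

A greedy match of this kind only works if retention times are reproducible enough that the tolerance window rarely contains more than one candidate.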
72 S. Clemens et al.
Fig. 4. Most of the known biosynthetic classes of A. thaliana secondary metabolites are de-
tectable by CapLC-ESI-QTOF-MS. CapLC-ESI(+/-)-CID-MS spectra of representative metabolites
detected in methanolic extracts of different A. thaliana tissues such as leaves, seeds and roots
(for details on the biosynthetic classes and representative metabolites see text)
3.2 Quantification
Fig. 5. Evaluation of matrix effects through mixing of extracts of different origin. Signal ratios for
mass signals in crosswise matrix-diluted and solvent-diluted leaf and root extracts obtained by
CapLC-ESI(+)-TOF-MS measurements. A total of 45 mass signals were analyzed in two different
mixtures each. A ratio of “signal after dilution with methanol” to “signal after dilution with
root/leaf extract” above 1 is indicative of signal enhancement through the other extract, a ratio
below 1 of signal suppression. Most of the mass signals showed a ratio of 1 ± 0.4 (equals about
two times the technical variation). This threshold of ±40% is indicated by vertical lines
matrix effects will probably be smaller than in this pilot experiment because
matrices will be less diverse than a root and leaf extract. Information obtained
by analyses such as these can be used to weigh the data obtained in a profiling
experiment and to add “confidence tags” to each metabolite. Factored into
such “confidence tags” should also be the results on the dynamic range and
the degree of variability observed over many experiments. We conclude that
the sensitivity and range of CapLC-ESI-QTOF-MS clearly make it a powerful
approach for the identification of qualitative differences between samples but
that – as predicted by Chernushevich et al. (2001) – quantitative analysis is also
feasible provided analyte-dependent effects can be detected and corrected for.
All available data suggest that overall the analytical variation is smaller than
the biological variation – as is the case for the more established metabolomics
approaches (Dunn et al. 2005).
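The idea of attaching a “confidence tag” to each mass signal can be sketched using the ±40% threshold described in the legend of Fig. 5. The function names and tag wording below are hypothetical, and the sketch only flags a deviation from 1; it deliberately does not interpret the direction of the deviation.

```python
def matrix_ratio(signal_solvent_diluted, signal_matrix_diluted):
    """Ratio of the signal after dilution with solvent to the signal after
    dilution with the other extract, as described for Fig. 5."""
    return signal_solvent_diluted / signal_matrix_diluted


def confidence_tag(ratio, tolerance=0.4):
    """Tag a mass signal according to whether its ratio lies within 1 +/- tolerance.

    The default tolerance of 0.4 is the threshold quoted in the figure legend
    (about twice the technical variation).
    """
    if abs(ratio - 1.0) <= tolerance:
        return "within tolerance"
    return "possible matrix effect"


if __name__ == "__main__":
    # Made-up signal intensities, for illustration only.
    signals = {"m/z 412.2": (1000.0, 920.0), "m/z 611.2": (1000.0, 450.0)}
    for name, (solvent, matrix) in signals.items():
        ratio = matrix_ratio(solvent, matrix)
        print(f"{name}: ratio {ratio:.2f} -> {confidence_tag(ratio)}")
```

In practice such a tag would be combined with information on the dynamic range and on the variability observed over many experiments, as noted above.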
Our experience with respect to the potential of this technique for metabolomics
can in part be validated by taking a look at recent applications of LC-MS in
general and of CapLC-ESI-QTOF-MS in particular. In occupational toxicology
the superiority of LC-MS-MS with respect to sensitivity is now being exploited
for the determination of trace and ultra-trace amounts of biomarkers of expo-
sure (Manini et al. 2004). Quantification of low-abundance molecules in highly
variable complex matrices is considered feasible, provided that precautions
such as those outlined above are taken. Also, many novel metabolites have
been identified and minor metabolic routes for well-known occupational haz-
ards have been uncovered (Manini et al. 2004). Similarly, the mass accuracy
and sensitivity of QTOF-MS coupled to liquid chromatography is now being
applied to the elucidation of unknown environmental micro-contaminants in,
for instance, water samples (Ibanez et al. 2005). Studies of this kind face chal-
lenges similar to those of metabolomics experiments. The emerging picture is
that CapLC-ESI-QTOF-MS can be routinely applied (von Roepenack-Lahaye
et al. 2004; Bino et al. 2005) and has high potential not only for the iden-
tification of selected molecules but for a highly sensitive, robust metabolite
profiling that achieves very good coverage of the metabolome. Obviously this
technology will be undergoing continuous validation and improvement. De-
veloping the profiling is an iterative process. Any progress made with respect
to the availability of standards or reference compounds, the identification of
metabolites, the linear range or the possible matrix effects for a particular mass
signal has to be used to increase further the accuracy of quantification. Also,
protocols for extraction, selective enrichment of metabolites and chromato-
References
Abian J, Oosterkamp AJ, Gelpi E (1999) Comparison of conventional, narrow-bore and capillary
liquid chromatography mass spectrometry for electrospray ionization mass spectrometry:
Practical considerations. J Mass Spectrom 34:244–254
Bino RJ, Hall RD, Fiehn O, Kopka J, Saito K, Draper J, Nikolau BJ, Mendes P, Roessner-
Tunali U, Beale MH, Trethewey RN, Lange BM, Wurtele ES, Sumner LW (2004) Potential
of metabolomics as a functional genomics tool. Trends Plant Sci 9:418–425
Bino RJ, Ric de Vos CH, Lieberman M, Hall RD, Bovy A, Jonker HH, Tikunov Y, Lommen A,
Moco S, Levin I (2005) The light hyperresponsive high pigment-2dg mutation of tomato:
alterations in the fruit metabolome. New Phytol 166:427–438
Birkemeyer C, Luedemann A, Wagner C, Erban A, Kopka J (2005) Metabolome analysis: the
potential of in vivo labeling with stable isotopes for metabolite profiling. Trends Biotechnol
23:28–33
Chernushevich IV, Loboda AV, Thomson BA (2001) An introduction to quadrupole-time-of-flight
mass spectrometry. J Mass Spectrom 36:849–865
D’Auria JC, Gershenzon J (2005) The secondary metabolism of Arabidopsis thaliana: growing
like a weed. Curr Opin Plant Biol 8:308–316
Dixon RA (2001) Natural products and plant disease resistance. Nature 411:843–847
Dunn WB, Bailey NJ, Johnson HE (2005) Measuring the metabolome: current analytical tech-
nologies. Analyst 130:606–625
Fernie AR, Trethewey RN, Krotzky AJ, Willmitzer L (2004) Metabolite profiling: from diagnostics
to systems biology. Nat Rev Mol Cell Biol 5:763–769
Fiehn O (2002) Metabolomics – the link between genotypes and phenotypes. Plant Mol Biol
48:155–171
Fiehn O, Kopka J, Dormann P, Altmann T, Trethewey R, Willmitzer L (2000) Metabolite profiling
for plant functional genomics. Nature Biotechnol 18:1157–1161
Goodacre R, Vaidyanathan S, Dunn WB, Harrigan GG, Kell DB (2004) Metabolomics by numbers:
acquiring and understanding global metabolite data. Trends Biotechnol 22:245–252
Halket JM, Waterman D, Przyborowska AM, Patel RK, Fraser PD, Bramley PM (2005) Chemical
derivatization and mass spectral libraries in metabolic profiling by GC/MS and LC/MS/MS.
J Exp Bot 56:219–243
Hirai MY, Klein M, Fujikawa Y, Yano M, Goodenowe DB, Yamazaki Y, Kanaya S, Nakamura Y,
Kitayama M, Suzuki H, Sakurai N, Shibata D, Tokuhisa J, Reichelt M, Gershenzon J, Pa-
penbrock J, Saito K (2005) Elucidation of gene-to-gene and metabolite-to-gene networks in
1 Introduction
1 The National Centre for Plant and Microbial Metabolomics, Rothamsted Research, West Common,
Harpenden, Herts. AL5 2JQ, UK
Fig. 1. a 600 MHz 1H NMR spectrum of 4:1 (D2O:CD3OD) extract of freeze-dried Arabidopsis
thaliana Col-0 tissue. b Expanded portion of the spectrum featuring the carbohydrate anomeric
proton region highlighting differences in carbohydrate concentrations between Col-0 and the
starch biosynthesis mutants pgm-1 and adg-1. Labelled peaks: 1-sucrose, 2-maltose, 3-glucose.
c PCA scores plot illustrating differences observed between Col-0 and the adg-1/pgm-1 mutants
(filled squares – Col-0, open triangles – pgm-1, filled circles – adg-1). d Loadings plot of PC1
depicting the ‘spectra’ of compounds responsible for differences between Col-0 and the mutants.
e Loadings plot of PC2 depicting the differences between adg-1 and pgm-1
mutants in the starch biosynthesis pathway (pgm-1 and adg-1) can be seen.
These differences can be interpreted in relation to the known function of
the enzymes missing in the mutants. However, in most plant metabolomic
applications many hundreds of similar spectra are collected. Simultaneous
classical interpretation of these numbers of individual spectra is not possible,
but the use of chemometrics (see next section) allows the spectroscopist to
84 J.L. Ward and M.H. Beale
3 Data Analysis
NMR-based metabolomic datasets are very large, both in terms of the number
of datapoints per sample (typically 32 k or 64 k), and also the number of samples
and resulting spectra acquired (from dozens to thousands, often including
replicates). In order to draw conclusions and make comparisons between large
numbers of spectra, automated strategies must be employed for the analysis
and interpretation of such data once they have been acquired. The literature
on data analysis is extensive and will only be briefly discussed here. Interested
readers are directed to the review on pattern recognition methods (Lindon et
al. 2001) for further coverage of some of the issues.
Data manipulation typically starts with some form of ‘bucketing’ or ‘bin-
ning’ whereby the spectrum is split into discrete regions (typically between
0.01 and 0.04 ppm in width), which are then integrated to return a list of inte-
gral values for each spectrum. Whilst this reduces the resolution of the data, it
has the advantage of removing small chemical shift changes due to slight pH
variation between samples. Increasingly, work is being carried out using all of
the datapoints in the spectrum by employing an algorithm to align the peaks,
eliminating any unwanted variation (Stoyanova et al. 2004). NMR data are usually
analysed initially using multivariate statistical methods such as Principal
Component Analysis (PCA). PCA is a data visualisation method, useful for observing
groupings within large datasets. There are a number of commercially
available software products that carry out PCA and other related multivariate
analyses. One that has been widely used for NMR data is SIMCA-P (Umetrics,
Sweden). A PCA model can be displayed in a graphical fashion as a “scores”
plot as shown in Fig. 1c. This example compares 1H spectra collected from
polar extracts of wild-type Arabidopsis thaliana Col-0 with those from the two
mutants in starch biosynthesis (pgm-1 and adg-1). This plot is useful for observing
any groupings in the data set and, in addition, will highlight outliers that
may be due to errors in sample preparation, instrumentation parameters, etc.
The coefficients by which the original variables must be multiplied to obtain the
scores are called “loadings”. Thus, “loadings plots” (e.g. Fig. 1d,e) can be used to
detect and display the spectral areas responsible for the separation in the data,
and can be interpreted as positive and negative NMR spectra of the compounds
responsible for the differences between the clusters. The numerical value of the
loading of a given variable on a PC indicates how much the variable has in common
with that component (Massart et al. 1988). In Fig. 1d, PC1 represents the
NMR spectra of compounds differing between the wild type and both mutants,
whilst PC2 (Fig. 1e) represents the (smaller) difference between the mutants.
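The bucketing and PCA steps described above can be sketched with NumPy. This is a minimal illustration under stated assumptions, not the procedure implemented in SIMCA-P: buckets are formed by summing intensities in fixed-width chemical shift windows (a simple stand-in for integration), and PCA is computed by singular value decomposition of the mean-centred data. The function names, the 0.04 ppm default and the synthetic spectra are assumptions.

```python
import numpy as np


def bin_spectrum(ppm, intensity, lo, hi, width=0.04):
    """Sum intensities in fixed-width chemical shift buckets between lo and hi (ppm)."""
    n_bins = int(round((hi - lo) / width))
    edges = np.linspace(lo, hi, n_bins + 1)
    # np.histogram with weights returns the summed intensity per bucket.
    binned, _ = np.histogram(ppm, bins=edges, weights=intensity)
    return binned


def pca(X, n_components=2):
    """PCA of a (samples x buckets) matrix via SVD of the mean-centred data."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]   # one row per spectrum
    loadings = Vt[:n_components].T                    # one row per bucket
    explained = (S ** 2 / np.sum(S ** 2))[:n_components]
    return scores, loadings, explained


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ppm = np.linspace(0.0, 10.0, 4096)

    def synthetic_spectrum(peak_height):
        # Made-up spectrum: one Gaussian peak at 5.2 ppm plus noise.
        peak = peak_height * np.exp(-((ppm - 5.2) ** 2) / (2 * 0.01 ** 2))
        return peak + rng.normal(0.0, 0.01, ppm.size)

    spectra = [synthetic_spectrum(1.0) for _ in range(5)]
    spectra += [synthetic_spectrum(3.0) for _ in range(5)]
    X = np.array([bin_spectrum(ppm, s, 0.0, 10.0) for s in spectra])
    scores, loadings, explained = pca(X)
    print("PC1 scores:", np.round(scores[:, 0], 2))
    print("fraction of variance explained:", np.round(explained, 3))
```

With this convention the scores equal the mean-centred data multiplied by the loadings, which matches the definition of loadings given in the text. Plotting one column of `loadings` against the bucket positions gives a loadings plot of the kind shown in Fig. 1d,e.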
NMR Spectroscopy in Plant Metabolomics 85
4 Two-dimensional NMR
13C- and 15N-labelling (Fox et al. 1995; Prabhu et al. 1996; Schleucher et al.
1998). The analysis of stable isotope labelling in pulse-chase and time-course
experiments can also provide quantitative information on metabolic fluxes
although this is restricted to simple linear pathways that are reasonably close
to the entry point of the label into metabolism. For complicated pathways it
may be useful to examine the distribution of the label once the system has
reached a steady state.
6 Hyphenated NMR
cartridge concentrates the analytes. There are still relatively few publications
using this technology, although an application of LC-SPE-NMR to the detection
of compounds from oregano was recently reported (Exarchou et al. 2003). Very
recently the technology was used for the rapid identification of antioxidants in
complex commercial rosemary extracts (Pukalskas et al. 2005). In this work,
all major compounds present in the extract were collected on SPE cartridges
after their separation and analysed by both NMR and ESI-MS. LC-SPE-NMR
using post column solid-phase extraction was also applied to the direct analysis
of phenolic compounds in the polar fraction of olive oil (Christophoridou et
al. 2005). As well as the identification of simple phenolic acids, lignans and
flavonoids the technique enabled the identification of several new phenolic
compounds not previously reported as constituents of olive oil.
References
Aranìbar N, Singh BJ, Stockton GW, Ott KH (2001) Automated mode-of-action detection by
metabolic profiling. Biochem Biophys Res Commun 286:150–155
Bollard ME, Stanley EG, Lindon JC, Nicholson JK, Holmes E (2005) NMR-based metabonomic
approaches for evaluating physiological influences on biofluid composition. NMR Biomed
18:143–162
Buddrus J, Herzog H (1980) Coupling of HPLC and NMR.1. Analysis of flowing liquid-
chromatographic fractions by proton magnetic-resonance. Org Magn Reson 13:153–155
Charlton AJ, Farrington WHH, Brereton P (2002) Application of 1H-NMR and multivariate
statistics for screening complex mixtures: quality control and authenticity of instant coffee.
J Agric Food Chem 50:3098–3103
Charlton A, Allnutt T, Holmes S, Chisholm J, Bean S, Ellis N, Mullineaux P, Oehlschlager S (2003)
NMR profiling of transgenic peas. Plant Biotech J 2:27–35
Choi HY, Kim HK, Hazekamp A, Erkelens C, Lefeber AWM, Verpoorte R (2004a) Metabolomic differentiation
of Cannabis sativa cultivars using 1H NMR spectroscopy and principal component
analysis. J Nat Prod 67:953–957
Choi H, Choi HY, Verberne M, Lefeber AMW, Erkelens C, Verpoorte R (2004b) Metabolic fingerprinting
of wild type and transgenic tobacco plants by 1H NMR and multivariate analysis
technique. Phytochemistry 65:857–864
Choi HY, Tapias Casas E, Kim KH, Lefeber AMW, Erkelens C, Verhoeven JTH, Brzin J, Zel J,
Verpoorte R (2004c) Metabolic discrimination of Catharanthus roseus leaves infected by Phytoplasma
using 1H-NMR spectroscopy and multivariate data analysis. Plant Physiol 135:2398–2410
Christophoridou S, Dais P, Tseng L-H, Spraul M (2005) Separation and identification of phenolic
compounds in olive oil by coupling high-performance liquid chromatography with post-column
solid-phase extraction to nuclear magnetic resonance spectroscopy (LC-SPE-NMR).
J Agric Food Chem 53:4667–4679
Cornah JE, Germain V, Ward JL, Beale MH, Smith SM (2004) Lipid utilisation, gluconeogenesis
and seedling growth in Arabidopsis mutants lacking the glyoxylate cycle enzyme malate
synthase. J Biol Chem 279:42916–42923
Defernez M, Gunning YM, Parr AJ, Shepherd LVT, Davies HV, Colquhoun IJ (2004) NMR and
HPLC-UV profiling of potatoes with genetic modifications to metabolic pathways. J Agric
Food Chem 52:6075–6085
Exarchou V, Godejohann M, van Beek TA, Gerothanassis IP, Vervoort J (2003) LC-UV-solid-
phase extraction-NMR-MS combined with a cryogenic flow probe and its application to the
identification of compounds present in Greek oregano. Anal Chem 75:6288–6294
Fan TMW, Lane AN, Pedler J, Crowley D, Higashi RM (1997) Comprehensive analysis of organic
ligands in whole root exudates using nuclear magnetic resonance and gas chromatography-
mass spectrometry. Anal Biochem 251:57–68
Fox GG, Ratcliffe RG, Robinson SA, Stewart GR (1995) Evidence for deamination by glutamate-
dehydrogenase in higher-plants – commentary. Can J Bot 73:1112–1115
Frederich M, Choi YH, Angenot L, Harnischfeger G, Lefeber AWM, Verpoorte R (2004)
Metabolomic analysis of Strychnos nux-vomica, Strychnos icaja and Strychnos ignatii extracts
by 1H nuclear magnetic resonance spectrometry and multivariate analysis techniques.
Phytochemistry 65:1993–2001
Gavaghan CL, Wilson ID, Nicholson JK (2002) Physiological variation in metabolic phenotyping
and functional genomic studies: use of orthogonal signal correction and PLS-DA. FEBS Lett
530:191–196
Gil AM, Duarte I, Cabrita E, Goodfellow BJ, Spraul M, Kerssebaum R (2004) Exploratory applica-
tions of diffusion ordered spectroscopy to liquid foods: an aid towards spectral assignment.
Anal Chim Acta 506:215–223
Hostettmann K, Wolfender JL (2001) Applications of liquid chromatography/UV/MS and liq-
uid chromatography/NMR for the online identification of plant metabolites. In: Tringali C
(ed) Bioactive compounds from natural products-isolation, characterisation and biological
properties. Taylor and Francis, London, pp 31–68
Keun HC, Ebbels TMD, Antti H, Bollard ME, Beckonert O, Holmes E, Lindon JC, Nicholson JK
(2003) Improved analysis of multivariate data by variable stability scaling: application to
NMR-based metabolic profiling. Anal Chim Acta 490:265–276
Kikuchi J, Shinozaki K, Hirayama T (2004) Stable isotope labelling of Arabidopsis thaliana for an
NMR-based metabolomics approach. Plant Cell Physiol 45:1099–1104
Le Gall G, Colquhoun IJ, Davis AL, Collins GJ, Verhoeyen ME (2003) Metabolite profiling of
tomato (Lycopersicon esculentum) using 1H NMR spectroscopy as a tool to detect potential
unintended effects following a genetic modification. J Agric Food Chem 51:2447–2456
Le Gall G, Metzdorff SB, Pedersen J, Bennett RN, Colquhoun IJ (2005) Metabolite profiling of
Arabidopsis thaliana (L.) plants transformed with an antisense chalcone synthase gene.
Metabolomics 1:181–198
Lewis J, Baker JM, Beale MH, Ward JL (2003) Metabolite profiling of GM plants: the impor-
tance of robust experimental design and execution. In: Nap JP, Atanassov A, Stiekema WJ
(eds) Genomics for biosafety in plant biotechnology. NATO science series I, 359. IOS Press,
Amsterdam, pp 47–57
Lindon JC, Nicholson JK, Holmes E, Everett JR (2000) Metabonomics: metabolic processes studied
by NMR spectroscopy. Concepts Magn Reson 12:289–320
Lindon JC, Holmes E, Nicholson JK (2001) Pattern recognition methods and applications in
biomedical magnetic resonance. Prog Nucl Magn Reson Spectrosc 39:1–40
Lindon JC, Holmes E, Nicholson JK (2004) Toxicological applications of magnetic resonance. Prog
Nucl Magn Reson Spectrosc 45:109–143
Manetti C, Bianchetti C, Bizzari M, Casciani L, Castro C, d’Ascenzo G, Delfini M, di Cocco ME,
Lagana A, Miccheli A, Motto M, Conti F (2004) NMR-based metabonomic study of transgenic
maize. Phytochemistry 65:3187–3198
Massart DL, Vandeginste BGM, Deming SN, Michotte Y, Kauffman L (1988) Chemometrics:
a textbook. Elsevier, Amsterdam
Nicholson JK, Lindon JC, Holmes E (1999) ‘Metabonomics’: understanding the metabolic re-
sponses of living systems to pathophysiological stimuli via multivariate statistical analysis of
biological NMR spectroscopic data. Xenobiotica 29:1181–1189
Noteborn HPJM, Lommen A, van der Jagt RC, Weseman JM (2000) Chemical fingerprinting for
the evaluation of unintended secondary metabolic changes in transgenic food crops. J Biotech
77:103–114
Prabhu V, Chatson KB, Abrams GD, King J (1996) C-13 nuclear magnetic resonance detection
of interactions of serine hydroxymethyltransferase with C1-tetrahydrofolate synthase and
glycine decarboxylase complex activities in Arabidopsis. Plant Physiol 112:207–216
Pukalskas A, van Beek TA, de Waard P (2005) Development of a triple hyphenated HPLC-radical
scavenging detection-DAD-SPE-NMR system for the rapid identification of antioxidants in
complex plant extracts. J Chromatography A 1074:81–88
Ratcliffe RG, Shachar-Hill Y (2001) Probing plant metabolism with NMR. Annu Rev Physiol Plant
Mol Biol 52:499–526
Ratcliffe RG, Shachar-Hill Y (2005) Revealing metabolic phenotypes in plants: inputs from NMR
analysis. Biol Rev 80:27–43
Ratcliffe RG, Roscher A, Shachar-Hill Y (2001) Plant NMR spectroscopy. Prog Nucl Magn Reson
Spectrosc 39:267–300
Roberts JKM (2000) NMR adventures in the metabolic labyrinth within plants. Trends Plant Sci
5:30–34
Roscher NJ, Kruger NJ, Ratcliffe RG (2000) Strategies for metabolic flux analysis in plants using
isotope labelling. J Biotechnol 77:81–102
Schaefer J, Skokut TA, Stejskal EO, McKay RA, Varner JE (1981) Estimation of protein-turnover
in soybean leaves using magic angle double cross-polarization N-15 nuclear magnetic-
resonance. J Biol Chem 256:1574–1579
Schleucher J, Vanderveer PJ, Sharkey TD (1998) Export of carbon from chloroplasts at night.
Plant Physiol 118:1439–1445
Stoyanova R, Nicholls AW, Nicholson JK, Lindon JC, Brown TR (2004) Automatic alignment of
individual peaks in large high-resolution spectral data sets. J Magn Reson 170:329–335
Viant MR (2003) Improved methods for the acquisition and interpretation of NMR metabolomic
data. Biochem Biophys Res Commun 310:943–948
Vogels JTWE, Terwel L, Tas AC, van den Berg F, Dukel F, van der Greef J (1996) Detection of
adulteration in orange juices by a new screening method using proton NMR spectroscopy in
combination with pattern recognition techniques. J Agric Food Chem 44:175–180
Vogler B, Klaiber I, Roos G, Walter CU, Hiller W, Sandor P, Kraus W (1998) Combination of
LC-MS and LC-NMR as a tool for the structure determination of natural products. J Nat Prod
61:175–178
Ward JL, Harris C, Lewis J, Beale MH (2003) Assessment of 1H NMR spectroscopy and multivariate
analysis as a technique for metabolite fingerprinting of Arabidopsis thaliana. Phytochemistry
62:949–957
Wolfender JL, Rodriguez S, Hostettmann K, Hiller W (1997) Liquid chromatography/ultra-
violet/mass spectrometric and liquid chromatography/nuclear magnetic resonance spec-
troscopic analysis of crude extracts of Gentianaceae species. Phytochem Anal 8:97–104
Wolfender JL, Ndjoko K, Hostettmann K (2001) The potential of LC-NMR in phytochemical
analysis. Phytochem Anal 12:2–22
I.7 Hetero-nuclear NMR-based Metabolomics
J. Kikuchi1,2,3 and T. Hirayama2,3,4,5
1 Introduction
Novel methods for measurement of living systems are making new break-
throughs in life science. In the era of the metabolome (analysis of all mea-
surable metabolites), a mass spectrometry (MS)-based approach is considered
to be the major technology (Aharoni et al. 2002; Fiehn 2002; Sumner et al.
2003), whereas a nuclear magnetic resonance (NMR)-based method is frequently
regarded as a minor technology due to its low sensitivity. However,
we intend to strengthen the NMR-based approach by exploiting the advantages of NMR
measurement, such as reliable quantification, non-invasive measurement, localized
in vivo spectroscopy, selectivity for nuclear environments, and suitability for
structure analysis of diverse biomolecules including stereoisomers. Attractive
NMR-based metabolic analyses can be achieved by uniform stable isotope
labeling of organisms allowing the application of multi-dimensional NMR ex-
periments that have been used in protein structure determination (Kikuchi et
al. 2004; Kikuchi and Hirayama 2005). Using these novel methods, the dynamic
molecular networks inside cells and tissues will be dissected.
230-0045 Japan
5 Laboratory of Plant Molecular Biology, RIKEN Tsukuba Institute, 3-1-1 Koyadai, Tsukuba, 305-0074 Japan
(Ernst 1992; Claridge 1999). NMR spectroscopy provides many new insights
into the physiology of higher plants. The evolution of this particular application
of NMR can be traced back to the ground-breaking 13C NMR studies using
magic-angle spinning methods (Schaefer and Stejskal 1976). The subsequent
developments of the technique and its applications have been charted at regular
intervals in the review literature, and although not as widely exploited as its
proponents might wish, NMR is now becoming an established technique in the
armory of plant biochemists.
3 1H-NMR-based Metabolomics
NMR signals are highly reproducible, and quantitative assessment of each
metabolite in a sample is therefore guaranteed. In contrast, MS-signals are
sometimes less quantitative due to problems of matrix effects (“ion suppres-
sion” or “ion enhancement”) (Mei et al. 2003; Mallet et al. 2004). Because
NMR is a nondestructive technique, it is easy to combine NMR analysis with
a complementary technique such as gas chromatography/MS or liquid chro-
matography/MS (Corcoran and Spraul 2003; Ott et al. 2003). In contrast to
these applications in which numerous specific metabolites can be identified in
complex mixtures, other investigators have addressed the question of whether
computer-aided comparisons of the 1H NMR spectra of partially fractionated
extracts can yield statistically meaningful metabolic fingerprints of the ex-
tracted tissue. Using this approach, it was possible to show that there were min-
imal compositional differences between certain transgenic and non-transgenic
tobacco varieties, but only after accounting for the substantial effects of exter-
nal factors (Choi et al. 2004).
The nitrogen atom has two magnetic isotopes, 14N and 15N, and both can be
useful for the detection of metabolites in vivo and in extracts. The practicality of
detecting the naturally abundant (99.63%) 14N isotope was first demonstrated
in root tissues, and subsequently in vivo 14N NMR has mainly been used for the
analysis of ammonium and nitrate. The low natural abundance of
the 15N isotope (0.37%) rules out the detection of unlabeled metabolites, but
after labeling with [15N]ammonium or [15N]nitrate it is possible to use in vivo
15N NMR to detect amino acids, as well as certain secondary products. NMR
methods are relatively insensitive, so only signals from compounds present at
relatively high levels (concentrations of at least 10 μmol/L) can be detected in
spectra (Krishnan et al. 2005). Since metabolic engineering often results in the
accumulation of relatively high concentrations of metabolites, this insensitivity
is often not as restrictive for compound detection and identification as it is in
other areas of biochemistry.
In recent years, hetero-nuclear NMR methods and their spectral editing technologies
have developed rapidly. For example, careful selection of window
functions and baseline corrections of two-dimensional (2D) spectra yielded
improved signal dispersion and line shapes of cross peaks, permitting clear
subtraction of 2D spectra, a procedure that is technically difficult and time-consuming
with conventional 1D-NMR technology (Defernez and Colquhoun 2003). With
the methodology used in recent protein NMR studies, differences in the molecular
composition between wild type and mutant strains can be easily quantified.
Therefore, we think advanced technologies in NMR analysis combined
with stable isotope labeling are a useful tool for metabolomic analysis. We report
here stable isotope labeling experiments in Arabidopsis using carbon or nitrogen,
two of the major elemental constituents of organic compounds (Kikuchi et al.
2004; Kikuchi and Hirayama 2005). Figure 1 shows the basic concept of NMR-based
plant metabolomics proposed in this study. The NMR-based approach
has an advantage when comparing different samples. Spectral subtraction between
different mutants or stimuli enables differences in metabolite levels between
samples to be quantified.
Fig. 1. Comparison of the conventional metabolomics approach (left: PCA-based) and our hetero-nuclear
NMR metabolomics approach (right: multi-dimensional NMR-based)
Fig. 2. a Example of how spectral subtraction can be used to differentiate environmental stress
responses between WT and gek1 (Hirayama et al. 2004) mutant. b For NMR spectroscopy,
5 mg of freshly frozen samples were heated with 0.5 mL H2O and centrifuged at 15,000 g for
5 min to remove insoluble fractions. After adding 50 μL of 2H2O for NMR lock, supernatants
were transferred into 5-mm NMR tubes. The spectra were measured on a Bruker DRX-500
spectrometer equipped with a 1H inverse probe with triple axis gradient. A total of 200 complex
f1 (13C) and 1024 complex f2 (1H) points were recorded with 64 scans per f1 increment. The
spectral widths were 12,000 Hz and 8400 Hz for f1 and f2, respectively. c To quantify the signal
intensities, a Lorentzian-to-Gaussian window with a Lorentzian line width of 10 Hz and a Gaussian
line width of 15 Hz was applied in both dimensions, prior to Fourier transformation. A fifth order
polynomial baseline correction was subsequently applied in the f1 dimension (Kikuchi et al.
2002). The indirect dimension was zero-filled to 2048 points in the final data matrix. NMR
spectra were processed using NMRPipe software (Delaglio et al. 1995). Quantitative 2D-spectral
subtraction was accomplished by editing a macro program of the NMRPipe software. Signal
assignments are highlighted next to the corresponding cross peaks
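The principle of 2D spectral subtraction can be sketched with NumPy. This is not the NMRPipe macro referred to in the legend: it assumes that two processed spectra are already available as arrays on the same f1 x f2 grid, and it normalizes each to its total intensity before subtraction, which is one possible choice and an assumption of this sketch. The function names are hypothetical.

```python
import numpy as np


def difference_spectrum(reference, sample):
    """Subtract a reference 2D spectrum from a sample 2D spectrum.

    Both inputs are processed spectra on the same f1 x f2 grid. Each is
    scaled to unit total intensity first (an assumed normalization), so the
    result shows relative gains (positive) and losses (negative).
    """
    ref = np.asarray(reference, dtype=float)
    smp = np.asarray(sample, dtype=float)
    if ref.shape != smp.shape:
        raise ValueError("spectra must share the same f1 x f2 grid")
    return smp / smp.sum() - ref / ref.sum()


def largest_changes(diff, n=3):
    """Return the (f1, f2) indices of the n largest absolute differences."""
    order = np.argsort(np.abs(diff), axis=None)[::-1][:n]
    return [tuple(int(i) for i in np.unravel_index(k, diff.shape)) for k in order]


if __name__ == "__main__":
    # Made-up 4 x 4 "spectra" in which one cross peak is stronger in the mutant.
    wild_type = np.ones((4, 4))
    mutant = np.ones((4, 4))
    mutant[1, 2] = 5.0
    diff = difference_spectrum(wild_type, mutant)
    print("largest change at (f1, f2) index:", largest_changes(diff, n=1)[0])
```

Because the subtraction is done point by point, it depends on both spectra having been acquired and processed with identical parameters, as described in the legend.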
Fig. 3. Development of the 1H–15N HSQC spectrum measured in living 15N-labeled seeds that
was induced by soaking the dry seeds in water (pictures shown at the bottom). A total of 128
complex f1 (15N) and 1024 complex f2 (1H) points were recorded with 96 scans per f1 increment.
The spectral widths were 4500 Hz and 8400 Hz for f1 and f2, respectively. Two spectra: a dark
at 0 h; b light at 78 h are shown for comparison. Signal assignments are highlighted next to the
corresponding cross peaks
Acknowledgements. This work was supported in part by RIKEN GSC Internal Collaborations
(No. 830-56625), by CREST (No. A88-54366), Japan Science and Technology Agency to J.K., T.H.
We also acknowledge Grants-in-Aid for Scientific Research (No. 15710171, to J.K.; No. 15570045,
to T.H.) from the Ministry of Education, Science, Sports and Culture of Japan.
References
Aharoni A, Ric de Vos CH, Verhoeven HA, Maliepaard CA, Kruppa G, Bino R, Goodenowe DB
(2002) Nontargeted metabolome analysis by use of Fourier Transform Ion Cyclotron Mass
Spectrometry. OMICS 6:217–234
Bodenhausen G, Ruben DJ (1980) Natural abundance nitrogen-15 NMR by enhanced hetero-
nuclear spectroscopy. Chem Phys Lett 69:185–189
Choi H-K, Choi YH, Verberne M, Lefeber AWM, Erkelens C, Verpoorte R (2004) Metabolic
fingerprinting of wild type and transgenic tobacco plants by 1H NMR and multivariate
analysis technique. Phytochemistry 65:857–864
Claridge TDW (1999) High-resolution NMR techniques in organic chemistry. Elsevier Science,
London, UK
Corcoran O, Spraul M (2003) LC-NMR-MS in drug discovery. Drug Discov Today 8:624–631
Defernez M, Colquhoun IJ (2003) Factors affecting the robustness of metabolite fingerprinting
using 1H NMR spectra. Phytochemistry 62:1009–1017
Delaglio F, Grzesiek S, Vuister GW, Zhu G, Pfeifer J, Bax A (1995) NMRPipe: a multidimensional
spectral processing system based on UNIX pipes. J Biomol NMR 6:277–293
Ernst RR (1992) Nuclear magnetic resonance Fourier transform spectroscopy. Angew Chem Int
Ed Engl 31:805–823
Fiehn O (2002) Metabolomics – the link between genotypes and phenotypes. Plant Mol Biol
48:155–171
Grzesiek S, Bax A (1993) The importance of not saturating H2O in protein NMR. Application to
sensitivity enhancement and NOE measurements. J Am Chem Soc 115:12593–12594
Hare PD, Cress WA, van Staden J (1998) Dissecting the roles of osmolyte accumulation during
stress. Plant Cell Environ 21:535–553
Hinse C, Richter A, Provenzani J, Stöckigt J (2003) In vivo monitoring of alkaloid metabolism
in hybrid plant cell cultures by 2D cryo-probe NMR without labeling. Bioorg Med Chem
11:3913–3919
Hirayama T, Fujishige N, Kunii N, Iuchi S, Shinozaki K (2004) A novel ethanol hypersensitive
mutant of Arabidopsis. Plant Cell Physiol 45:703–711
Horiuchi T, Takahashi M, Kikuchi J, Yokoyama S, Maeda H (2005) Effect of dielectric properties
of solvents on the quality factor for a beyond 900 MHz cryogenic probe model. J Magn Reson
174:34–42
Kikuchi J, Asakura T (1999) Use of 13C conformation-dependent chemical shifts to elucidate
the local structure of a large protein with homologous domains in solution and solid state.
J Biochem Biophys Method 38:203–208
Kikuchi J, Hirayama T (2005) Novel methods for uniform stable isotope labeling in plant and
animal systems for a hetero-nuclear NMR based metabolomics. 1st Int Metabol Meeting
Kikuchi J, Williamson MP, Shimada K, Asakura T (2000) Structure and dynamics of photosyn-
thetic membrane-bound proteins in Rhodobacter sphaeroides, studied with solid-state NMR
spectroscopy. Photosyn Res 63:259–267
Kikuchi J, Iwahara J, Kigawa T, Murakami T, Okazaki T, Yokoyama S (2002) Solution structure
determination of the two DNA-binding domains in the Schizosaccharomyces pombe Abp1 protein
by a combination of dipolar coupling and diffusion anisotropy restraints. J Biomol NMR
22:333–347
Kikuchi J, Shinozaki K, Hirayama T (2004) Stable isotope labeling of Arabidopsis thaliana for
a hetero-nuclear NMR-based metabolomics approach. Plant Cell Physiol 45:1099–1104
Kiyoshi T, Maeda H, Kikuchi J, Ito Y, Hirota H, Yokoyama S, Ito S, Miki T, Hamada M, Ozaki O,
Hayashi S, Kurihara N, Suematsu H, Yoshikawa M, Matsumoto S, Sato A, Wada H (2004)
Present status of 920 MHz high-resolution NMR spectrometers. IEEE Trans Appl Supercond
14:1608–1612
Krishnan P, Kruger NJ, Ratcliffe RJ (2005) Metabolic finger printing and profiling in plants by
NMR. J Exp Bot 56:255–265
Mallet CR, Lu A, Mazzeo JR (2004) A study of ion suppression effects in electrospray ioniza-
tion from mobile phase additives and solid-phase extracts. Rapid Commun Mass Spectrom
18:49–58
Mei H, Hsieh Y, Nardo C, Xu X, Wang S, Ng K, Korfmacher WA (2003) Investigation of matrix ef-
fects in bioanalytical high-performance liquid chromatography/tandem mass spectrometric
assays: application to drug discovery. Rapid Commun Mass Spectrom 17:97–103
Hetero-nuclear NMR-based Metabolomics 101
1 Introduction
not yet received the crucial support from the publishing world. Metabolomics
is no different, and recently two proposals have been published to define stan-
dards for plant metabolomic data and metadata, ArMet (Jenkins et al. 2004)
and MIAMET (Bino et al. 2004). Invariably, these attempts build upon the
MIAME standard and hopefully will soon allow unequivocal specification of
systems biology data. (The existing Systems Biology Markup Language, SBML
(Hucka et al. 2003), is a standard for specifying systems biology models, rather
than data.)
In the remainder of this chapter we will delineate ongoing efforts in our lab-
oratory pertaining to integrating metabolomics and other functional genomics
data. These efforts have arisen from our participation in plant systems biology
studies of Medicago truncatula and Vitis vinifera, and also of the yeast Saccha-
romyces cerevisiae. All of these are large team efforts and we acknowledge all
our collaborators for their vital role in these projects (see below).
2 Databases
Fig. 1. High level schema of the integrative functional genomics database DOME. Metadata
tables are used to provide context to the actual experimental data. Raw data from microarray,
2D-PAGE-MS, and various metabolomics technologies are stored in separate tables. These data
are transformed by their appropriate normalization methods, keeping intermediate values, and
finally arriving at numerical summaries of each sample (such as means and standard deviations),
which are then comparable across all technologies and stored in the sp summary table. It is the
data in sp summary that is then processed with higher-level statistical analyses or visualizations.
It is also at this level that background information about the known molecular biology of the
system (B-Net) can be integrated
from the analysis. These coordinates are context specific, and could be retention
time of a separation, mass-to-charge ratio, chemical shift, wavenumber, etc.
Another objective with this naming convention is to allow for future analyses
that attempt to establish identity between these unknown metabolites. It is
expected that many of these are observed in different studies in separate
laboratories, and through the inclusion of the analysis coordinates in their
names it becomes easier to recognize that two unknowns may actually be the
same molecule. For example, if several studies consistently identify a peak
in GC-MS (using the same extraction and analysis parameters) with the same
retention time and main ion mass, then these peaks may well represent the same
metabolite. By assigning names derived from this scheme, it then also becomes
possible to create lists of molecular entities that have been observed and not
yet identified (a kind of “orphan” list for metabolomics).
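The naming idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the name format, the `UnknownPeak` class, the `may_be_same` helper and the tolerance values are all hypothetical, since the text does not specify the exact syntax of the naming scheme or any matching thresholds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UnknownPeak:
    """An unidentified metabolite described only by its analysis coordinates."""
    technique: str         # e.g. "GCMS"
    retention_time: float  # minutes
    main_ion_mz: float     # m/z of the main ion

    def name(self) -> str:
        # Hypothetical name format: the coordinates are embedded in the name.
        return (f"unknown_{self.technique}"
                f"_rt{self.retention_time:.2f}_mz{self.main_ion_mz:.1f}")


def may_be_same(a: UnknownPeak, b: UnknownPeak,
                rt_tol: float = 0.05, mz_tol: float = 0.5) -> bool:
    """Flag two unknowns as candidates for the same molecule.

    The tolerances are illustrative assumptions, and the comparison is only
    meaningful when extraction and analysis parameters were the same.
    """
    return (a.technique == b.technique
            and abs(a.retention_time - b.retention_time) <= rt_tol
            and abs(a.main_ion_mz - b.main_ion_mz) <= mz_tol)


def orphan_list(peaks):
    """Sorted list of distinct names of observed but unidentified entities."""
    return sorted({p.name() for p in peaks})
```

Two peaks reported by different laboratories can then be compared with `may_be_same`, and `orphan_list` gives the kind of "orphan" list mentioned in the text.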
Another unresolved issue encountered in our projects is that
the same metabolite in a sample may be observed by more than one
technique. The problem then is which quantification one should
choose if they do not agree. This is complicated by the fact that when metabolites
appear in an analysis, they may not have been present in the original sample,
but instead result from an artifact of the extraction method. Another reason
could be that the same metabolite might be present in different locations in
the cell, leading to the metabolite being isolated in two separate pools. In the
latter case the two pools should both be represented in the database, while
in the case of artifacts, one should use only the more accurate quantification.
This issue results in a need for careful annotation of metabolomics results,
but also requires special structures in the database schema that are capable of
representing several pools of a single metabolite.
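One possible shape for such a schema is sketched below with Python's built-in `sqlite3` module. It is not the DOME schema: the table names, column names and the example values (citrate, the locations and the numbers) are invented for illustration. The sketch only shows that a metabolite can own several pools, and that each quantification can be annotated as a suspected artifact.

```python
import sqlite3

# Hypothetical schema: one metabolite, many pools, many quantifications per pool.
SCHEMA = """
CREATE TABLE metabolite (
    id   INTEGER PRIMARY KEY,
    name TEXT UNIQUE NOT NULL
);
CREATE TABLE pool (
    id            INTEGER PRIMARY KEY,
    metabolite_id INTEGER NOT NULL REFERENCES metabolite(id),
    location      TEXT NOT NULL,
    UNIQUE (metabolite_id, location)
);
CREATE TABLE quantification (
    id          INTEGER PRIMARY KEY,
    pool_id     INTEGER NOT NULL REFERENCES pool(id),
    technique   TEXT NOT NULL,
    value       REAL NOT NULL,
    is_artifact INTEGER NOT NULL DEFAULT 0,
    note        TEXT
);
"""


def add_measurement(conn, metabolite, location, technique, value,
                    is_artifact=False, note=None):
    """Store one quantification, creating the metabolite and pool if needed."""
    conn.execute("INSERT OR IGNORE INTO metabolite(name) VALUES (?)", (metabolite,))
    met_id = conn.execute(
        "SELECT id FROM metabolite WHERE name = ?", (metabolite,)).fetchone()[0]
    conn.execute(
        "INSERT OR IGNORE INTO pool(metabolite_id, location) VALUES (?, ?)",
        (met_id, location))
    pool_id = conn.execute(
        "SELECT id FROM pool WHERE metabolite_id = ? AND location = ?",
        (met_id, location)).fetchone()[0]
    conn.execute(
        "INSERT INTO quantification(pool_id, technique, value, is_artifact, note) "
        "VALUES (?, ?, ?, ?, ?)",
        (pool_id, technique, value, int(is_artifact), note))


def pools_of(conn, metabolite):
    """Return (location, technique, value) rows, leaving out flagged artifacts."""
    return conn.execute(
        "SELECT p.location, q.technique, q.value "
        "FROM metabolite m "
        "JOIN pool p ON p.metabolite_id = m.id "
        "JOIN quantification q ON q.pool_id = p.id "
        "WHERE m.name = ? AND q.is_artifact = 0 "
        "ORDER BY p.location, q.technique",
        (metabolite,)).fetchall()


# Made-up example values, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
add_measurement(conn, "citrate", "mitochondrion", "GC-MS", 1.8)
add_measurement(conn, "citrate", "vacuole", "GC-MS", 5.2)
add_measurement(conn, "citrate", "vacuole", "NMR", 9.9,
                is_artifact=True, note="suspected extraction artifact")
rows = pools_of(conn, "citrate")
```

Keeping the artifact flag on the quantification rather than on the pool lets both cases from the text coexist: genuinely separate pools remain visible, while a measurement judged to be an artifact is kept for the record but excluded from queries.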
3 Data Visualization
are present in a certain map, such that the researcher can then quickly ob-
serve their levels organized according to how we believe the biochemistry is
organized. This could help in understanding how changes in mRNA or protein
levels affect the level of metabolites in a certain pathway or network. However
this is not as straightforward as it may seem: the changes in level of mRNA are
likely very different from changes in protein levels or changes in metabolite
levels. Cells cannot tolerate large changes of many metabolites, while mRNA
levels can change widely without much toxicity. Thus, in order to visualize
the expression of metabolites, mRNA, and proteins in the same biochemical
map, they need to be expressed on different scales, or otherwise normalized to
some comparable scale. A problem with thinking about data as part of some
biochemical map (“pathway”) is that it is likely that molecules in the map are
also involved in other interactions not depicted there. Therefore, looking at
a particular slice of a network could be highly misleading. It has been shown
that the concentrations of metabolites next to each other in a metabolic map do
not necessarily have high correlation (Steuer et al. 2003; Camacho et al. 2005),
strengthening this point. In order to understand a change in the level of a par-
ticular metabolite, it may be more useful to view the expression changes of all
enzymes (i. e. their protein and mRNA levels) linked with that metabolite. For
this we have developed the concept of metabolite neighborhood maps (Xing Li
et al. 2002), which are local views of the biochemical network and consist of all
the reactions that affect the metabolite of interest, including all the metabolites
and enzymes that take part in those reactions. BROME has a large number of
maps available, from these neighborhood maps to the nice pathway maps of
the KEGG system (Kanehisa et al. 2004).
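The neighborhood idea can be illustrated with a small Python sketch. It is not the BROME implementation: the `neighborhood` function, the dictionary layout and the three-reaction toy network are assumptions made for this example. Given a list of reactions, the function collects every reaction in which the metabolite of interest takes part, together with all metabolites and enzymes of those reactions.

```python
def neighborhood(reactions, metabolite):
    """Local view of a network around one metabolite.

    `reactions` maps a reaction name to a dict with the keys
    "substrates", "products" and "enzymes" (a hypothetical layout).
    """
    hits = {
        name: r for name, r in reactions.items()
        if metabolite in r["substrates"] or metabolite in r["products"]
    }
    metabolites, enzymes = set(), set()
    for r in hits.values():
        metabolites.update(r["substrates"])
        metabolites.update(r["products"])
        enzymes.update(r["enzymes"])
    return {
        "reactions": sorted(hits),
        "metabolites": sorted(metabolites),
        "enzymes": sorted(enzymes),
    }


# Toy fragment of glycolysis, used only to exercise the function.
toy_network = {
    "hexokinase": {"substrates": ["glucose", "ATP"],
                   "products": ["G6P", "ADP"], "enzymes": ["HXK"]},
    "phosphoglucose isomerase": {"substrates": ["G6P"],
                                 "products": ["F6P"], "enzymes": ["PGI"]},
    "phosphofructokinase": {"substrates": ["F6P", "ATP"],
                            "products": ["F16BP", "ADP"], "enzymes": ["PFK"]},
}
g6p_map = neighborhood(toy_network, "G6P")
```

The enzymes returned for a neighborhood are the ones whose mRNA and protein levels would be displayed next to the metabolite.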
4 Data Analysis
Analysis of metabolomic data can use the same multivariate statistical meth-
ods that are widely used in microarray data analysis. These methods can be
either supervised, where each sample or variable (molecule) is associated with
an already known class, or unsupervised, where there is no pre-classification
of the data (Mendes 2002; Sumner et al. 2003; Goodacre et al. 2004). Unsuper-
vised methods are widely popular, and the most used are principal component
analysis (PCA), hierarchical clustering (HCA), k-means clustering, and self-
organizing maps (SOM). Unsupervised analyses are mostly guided by the vari-
ance and covariance (or correlation) in the data sets, so they are good at finding
patterns therein; however nothing guarantees, other than a careful experimen-
tal design, that the largest variance is indeed a result of the perturbation rather
than other unwanted effects. On the other hand, supervised analyses are guided
by the pre-existing knowledge provided by the researcher and so are usually
based on discrimination, a property that is more related to the consistency
of the members in a class, and the differences between classes. Supervised
112 B. Mehrotra and P. Mendes
5 Conclusion
Acknowledgements. We thank our collaborators John Cushman, Grant Cramer, Rick Dixon, Greg
May, David Schooley, Vladimir Shulaev, and Lloyd Sumner for the excellent experimental and
analytical data collection work in their laboratories. We also thank our colleagues Xing Li and
Aejaaz Kamal for their work on the DOME and BROME systems, and Diogo Camacho and Alberto
de la Fuente for the metabolite correlation analysis. We acknowledge the generous support of
the National Science Foundation’s Plant Genome Research Program (awards DBI–0109732 and
DBI–0217653).
References
Allen J, Davey HM, Broadhurst D et al. (2003) High-throughput classification of yeast mutants
for functional genomics using metabolic footprinting. Nat Biotechnol 21:692–696
Ashburner M, Ball CA, Blake JA et al. (2000) Gene ontology: tool for the unification of biology.
The Gene Ontology Consortium. Nat Genet 25:25–29
Bairoch A, Apweiler R, Wu CH et al. (2005) The Universal Protein Resource (UniProt). Nucleic
Acids Res 33:D154–D159
Bino RJ, Hall RD, Fiehn O et al. (2004) Potential of metabolomics as a functional genomics tool.
Trends Plant Sci 9:418–425
Brazma A, Hingamp P, Quackenbush J et al. (2001) Minimum information about a microarray
experiment (MIAME)-toward standards for microarray data. Nature Genet 29:365–371
Broeckling CD, Huhman DV, Farag MA et al. (2005) Metabolic profiling of Medicago truncatula
cell cultures reveals the effects of biotic and abiotic elicitors on metabolism. J Exp Bot
56:323–336
Buckingham J (1994) Dictionary of natural products. Chapman and Hall/CRC, London
Bundy JG, Willey TL, Castell RS, Ellar DJ, Brindle KM (2005) Discrimination of pathogenic clinical
isolates and laboratory strains of Bacillus cereus by NMR-based metabolomic profiling. FEMS
Microbiol Lett 242:127–136
Camacho D, de la Fuente A, Mendes P (2005) The origin of correlations in metabolomics data.
Metabolomics 1:53–63
Davidson SB, Overton C, Buneman P (1995) Challenges in integrating biological data sources.
J Comput Biol 2:557–572
Davies T (1998) The new Automated Mass Spectrometry Deconvolution and Identification System
(AMDIS). Spectroscopy, Europe 10:24–27
Dwight SS, Balakrishnan R, Christie KR et al. (2004) Saccharomyces genome database: underlying
principles and organisation. Brief Bioinform 5:9–22
Fernie AR, Trethewey RN, Krotzky AJ, Willmitzer L (2004) Metabolite profiling: from diagnostics
to systems biology. Nat Rev Mol Cell Biol 5:763–769
Gavaghan CL, Wilson ID, Nicholson JK (2002) Physiological variation in metabolic phenotyping
and functional genomic studies: use of orthogonal signal correction and PLS-DA. FEBS Lett
530:191–196
Goodacre R (2005) Making sense of the metabolome using evolutionary computation: seeing the
wood with the trees. J Exp Bot 56:245–254
Goodacre R, Shann B, Gilbert RJ et al. (2000) Detection of the dipicolinic acid biomarker in Bacil-
lus spores using Curie-point pyrolysis mass spectrometry and Fourier transform infrared
spectroscopy. Anal Chem 72:119–127
Goodacre R, Vaidyanathan S, Dunn WB, Harrigan GG, Kell DB (2004) Metabolomics by numbers:
acquiring and understanding global metabolite data. Trends Biotechnol 22:245–252
Hucka M, Beale M, Fiehn O et al. (2003) The systems biology markup language (SBML): a medium
for representation and exchange of biochemical network models. Bioinformatics 19:524–531
Jenkins H, Hardy N, Beckmann M et al. (2004) A proposed framework for the description of plant
metabolomics experiments and their results. Nat Biotechnol 22:1601–1606
Johnson HE, Broadhurst D, Goodacre R, Smith AR (2003) Metabolic fingerprinting of salt-stressed
tomatoes. Phytochemistry 62:919–928
Jonsson P, Broadhurst D, Goodacre R et al. (2004) A strategy for identifying differences in large
series of metabolomic samples analyzed by GC/MS. Anal Chem 76:1738–1745
Kanehisa M, Goto S, Kawashima S, Okuno Y, Hattori M (2004) The KEGG resource for deciphering
the genome. Nucleic Acids Res 32:D277–D280
Kell DB (2004) Metabolomics and systems biology: making sense of the soup. Curr Opin Microbiol
7:296–307
Kitano H (2002) Systems biology: a brief overview. Science 295:1662–1664
Krieger CJ, Zhang P, Mueller LA et al. (2004) MetaCyc: a multiorganism database of metabolic
pathways and enzymes. Nucleic Acids Res 32:D438–D442
Lange BM, Ghassemian M (2005) Comprehensive post-genomic data analysis approaches inte-
grating biochemical pathway maps. Phytochemistry 66:413–451
Lee Y, Tsai J, Sunkara S et al. (2005) The TIGR Gene Indices: clustering and assembling EST and
known genes and integration with eukaryotic genomes. Nucleic Acids Res 33:D71–D74
Luyf AC, de Gast J, van Kampen AH (2002) Visualizing metabolic activity on a genome-wide
scale. Bioinformatics 18:813–818
Mendes P (2001) Modeling large scale biological systems from functional genomic data: parameter
estimation. In: Kitano H (ed) Foundations of systems biology. MIT Press, Cambridge, MA,
pp 163–186
Mendes P (2002) Emerging bioinformatics for the metabolome. Brief Bioinform 3:134–145
Mendes P, de la Fuente A, Hoops S (2002) Bioinformatics and computational biology for plant
functional genomics. Rec Adv Phytochem 36:1–13
Mueller LA, Zhang P, Rhee SY (2003) AraCyc: a biochemical pathway database for Arabidopsis.
Plant Physiol 132:453–460
Oliver DJ, Nikolau B, Wurtele ES (2002) Functional genomics: high-throughput mRNA, protein,
and metabolite analyses. Metabolic Eng 4:98–106
Orchard S, Hermjakob H, Julian RK et al. (2004) Common interchange standards for proteomics
data: Public availability of tools and schema. Proteomics 4:490–491
Purohit PV, Rocke DM, Viant MR, Woodruff DL (2004) Discrimination models using variance-
stabilizing transformation of metabolomic NMR data. Omics 8:118–130
Quackenbush J (2001) Computational analysis of microarray data. Nat Rev Genet 2:418–427
Raamsdonk LM, Teusink B, Broadhurst D et al. (2001) A functional genomics strategy that uses
metabolome data to reveal the phenotype of silent mutations. Nature Biotechnol 19:45–50
Shannon P, Markiel A, Ozier O et al. (2003) Cytoscape: a software environment for integrated
models of biomolecular interaction networks. Genome Res 13:2498–2504
Shi H, Paolucci U, Vigneau-Callahan KE, Milbury PE, Matson WR, Kristal BS (2004) Development
of biomarkers based on diet-dependent metabolic serotypes: practical issues in development
of expert system-based classification models in metabolomic studies. Omics 8:197–208
Steuer R, Kurths J, Fiehn O, Weckwerth W (2003) Observing and interpreting correlations in
metabolomic networks. Bioinformatics 19:1019–1026
Sumner LW, Mendes P, Dixon RA (2003) Plant metabolomics: large-scale phytochemistry in the
functional genomics era. Phytochemistry 62:817–836
Bioinformatics Approaches to Integrate Metabolomics and Other Systems Biology Data 115
Taylor CF, Paton NW, Garwood KL et al. (2003) A systematic approach to modeling, capturing,
and disseminating proteomics experimental data. Nat Biotechnol 21:247–254
Thimm O, Bläsing O, Gibon Y et al. (2004) MAPMAN: a user-driven tool to display genomics data
sets onto diagrams of metabolic pathways and other biological processes. Plant J 37:914–939
Verhoeckx KC, Bijlsma S, Jespersen S et al. (2004) Characterization of anti-inflammatory com-
pounds using transcriptomics, proteomics, and metabolomics in combination with multi-
variate data analysis. Int Immunopharmacol 4:1499–1514
Wagner C, Sefkow M, Kopka J (2003) Construction and application of a mass spectral and
retention time index database generated from plant GC/EI-TOF-MS metabolite profiles.
Phytochemistry 62:887–900
Weckwerth W (2003) Metabolomics in systems biology. Annu Rev Plant Biol 54:669–689
Wittig U, de Beuckelaer A (2001) Analysis and comparison of metabolic pathway databases. Brief
Bioinform 2:126–142
Xing Li X, Brazhnik O, Kamal A et al. (2002) Databases and visualization for metabolomics. In:
Harrigan GG, Goodacre R (eds) Metabolic profiling: its role in biomarker discovery and gene
function analysis. Kluwer Academic Publ, Boston, pp 293–309
II.2 Chemometrics in Metabolomics – An Introduction
J. Trygg¹, J. Gullberg², A.I. Johansson², P. Jonsson¹, and T. Moritz²
1 Introduction
Fig. 1. Overview of the basic categories of chemometrics analysis: A overview of data structure;
B classification and discriminant analysis; C regression analysis
Most of us can only grasp the effect of one factor at a time in our minds, and
that often leads us into the inefficient COST approach. We need the mathe-
matics (and the computer) to keep track of the factors and their combinations.
In summary, (1) all factors are varied together over a set of experimental
runs, (2) noise is decreased by means of averaging, and (3) the functional space is
efficiently mapped, so that interactions and synergisms become visible.
1. What do I want? – formulate question(s) stating the objectives and goals of
the investigation. For example, identify factors (e. g. temperature, day length,
nutrition) and factor ranges (e. g. 15–25 °C, 6–12 h, 1–10 mmol N/L) that
affect flowering time.
2. Screening design – finding out a little about many factors. Which factors
are the dominating ones in controlling flowering time? Screening designs
provide simple models with information about dominating variables, and
information about ranges. Pareto’s principle states that 20% of the data
(factors) account for 80% of the information. Different types of screening
designs exist – which one to choose depends on the problem. The most common
one is the fractional factorial design (Fig. 2). The full factorial design
is a set of experimental runs where every level of a factor is investigated at
both levels of all the other factors. It requires $N = 2^{k}$ runs for
k factors. Investigating more than five factors with the full factorial design
can in some cases become time consuming, i. e. $2^{5} = 32$, $2^{6} = 64$, $2^{7} = 128$
experiments, etc. Instead, performing a fractional factorial design reduces
that number quickly without the loss of too much information regarding the
estimation of factors involved. Fractional factorial design takes advantage
of the fact that three-way and higher interactions are seldom significant. It
requires only $N = 2^{k-p}$ runs for k factors, where p is set manually.
For example, five factors can be run in only $2^{5-2} = 8$ experiments instead
of the $2^{5} = 32$ experiments of the full factorial design. The downside
of not performing all experiments is that confounding patterns are present.
In other words, the estimated effects are not “pure” but instead mixed with
higher degree interaction effects. This loss of information is the price we
pay for the reduction of the number of experiments. The degree of confounding is
determined by the choice of p.
Fig. 2. Example of a full factorial design of experiments (DOE) for investigating how three factors
(temperature, day length and nutrition) control flowering time. Varying the three factors at two
levels (coded as +/-) requires $2^{3} = 8$ experiments + center points. Each experiment in
the design is marked with a circle in the figure. Evaluating the results from
such an experimental design reveals the influence of each of the different factors separately and
also any interactions between them. DOE is the only feasible approach to separate cause and
effect from each other
3. Response surface modeling (RSM) and optimization (few factors) – after
screening the factors involved in, e. g. determination of flowering time or
derivatization of metabolites, the goal of the investigation is usually to
create a valid map of the experimental domain (local space) given by the
significant factors and their ranges. This is done with a quadratic polynomial
model. Such higher-order models are more complex, and therefore
also require more experiments per factor than screening designs. Different
types of RSM designs include central composite designs, Box-Behnken
designs and D-optimal designs (see, e. g. Lundstedt et al. 1998 for more
information).
4. Robustness testing – in robustness testing of, for instance, an analytical
method, the aim is to explore how sensitive the responses are to small
changes in the factor settings, e. g. temperature. Ideally, a robustness test
should show that the responses are not sensitive to small fluctuations in
the factors, that is, the results are the same for all experiments. Robust-
ness testing is usually applied as the last test just before the release of
a product or a method. The fractional factorial design is usually applied
here.
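The run counts quoted for the screening designs can be reproduced with a short Python sketch. The function names are hypothetical, and the generators D = AB and E = AC are one common textbook choice for a two-level design with five factors in eight runs; the chapter does not say which generators were used. Levels are coded as -1 and +1, and center points are left out.

```python
from itertools import product


def full_factorial(k):
    """All 2**k combinations of k two-level factors, coded -1 / +1."""
    return [list(run) for run in product((-1, 1), repeat=k)]


def fractional_factorial(base_k, generators):
    """Two-level fractional factorial design.

    A full design in `base_k` factors is extended with one extra column per
    generator; each generator is a tuple of base-column indices whose
    element-wise product defines the extra factor (e.g. (0, 1) means D = A*B).
    """
    runs = []
    for base in full_factorial(base_k):
        extra = []
        for gen in generators:
            value = 1
            for idx in gen:
                value *= base[idx]
            extra.append(value)
        runs.append(base + extra)
    return runs


full = full_factorial(3)                          # 2**3 = 8 runs, 3 factors
frac = fractional_factorial(3, [(0, 1), (0, 2)])  # 2**(5-2) = 8 runs, 5 factors
```

Because the fourth column is by construction the product of the first two, the main effect of the fourth factor cannot be separated from the interaction between the first and second factors. That is the confounding described in step 2.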
two widely used methods that can handle incomplete, noisy and collinear data
structures.
Fig. 3. (1) Each row (representing one biological sample) in a data table with K = 3 variables
can be represented as one point in a K = 3 dimensional space. The position of that point is
given by the coordinates given by the values in each of the K = 3 variables. (2) Repeating this
for all rows (samples) in a data table produces a swarm of points in K = 3 dimensional space.
Points (samples) that are close to each other have more similar biological properties than points
that are far apart. (3) Projection methods such as PCA find a representative low-dimensional
plane (here two-dimensional) that is a good summary of the variation in the X data table (swarm
of points). (4) This model plane can then be visualised in scatter plots (A) and provides an
overview, e. g. if there are any groupings, trends or outliers in the data. For example in the figure
(A) there is a clear separation between the Arabidopsis wild type and mutant. It is also possible
to understand the reason for this separation by looking at the direction of the model plane
with respect to the original axes (original variables). These are summarized in the PCA model
loadings, P (B)
loading plot (Fig. 3). This is a powerful tool for understanding the underlying
patterns in the data.
The PCA model can be expressed as
Model of X: $X = TP^{\mathrm{T}} + E$
where T contains the scores, P defines the loadings, and E represents the residual
matrix. The residual matrix E contains the residuals for each sample between
its point in K-dimensional space and its point on the model plane. The residuals
are important for detection of outliers and for defining the model boundaries
(see Fig. 4).
Fig. 4. PCA summarises all variation in X into a few new variables called scores T. These new
variables are linearly weighted combinations of the original X-variables. The loadings P contain
the weights used for each X-variable and thus reveal the influence of individual X-variables
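The decomposition into scores, loadings and residuals can be made concrete with a minimal NumPy sketch. It assumes that X is column-centered before modeling and computes the model through a singular value decomposition, which is one standard way to obtain a PCA. The function name `pca` and the random test matrix are illustrative and are not taken from the chapter.

```python
import numpy as np


def pca(X, n_components):
    """PCA of a samples-by-variables matrix via SVD.

    Returns scores T, loadings P and residuals E such that
    Xc = T @ P.T + E, where Xc is the column-centered X.
    """
    Xc = X - X.mean(axis=0)                       # column centering (assumed)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    T = U[:, :n_components] * s[:n_components]    # scores, one row per sample
    P = Vt[:n_components].T                       # loadings, one row per variable
    E = Xc - T @ P.T                              # what the model plane misses
    return T, P, E


rng = np.random.default_rng(0)
X = rng.normal(size=(10, 5))      # synthetic data: 10 samples, 5 variables
T, P, E = pca(X, n_components=2)

# Squared residual distance of each sample to the model plane;
# unusually large values point to possible outliers.
residual_ss = np.sum(E ** 2, axis=1)
```

Plotting the two columns of `T` against each other gives a score plot, and plotting the two columns of `P` gives the corresponding loading plot.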
The PLS method is used instead of the PCA method when additional knowledge
about each sample exists, the Y matrix, e. g. genotype of each sample (wild
type/mutant). The sample information according to the design matrix from
the Design of Experiments (see Sect. 2.1) is often used as a Y matrix. Hence,
PLS represents the regression analogy of PCA working with two matrices,
X and Y (Wold et al. 1984). It is one of the most common methods when
a quantitative relationship between a descriptor matrix X and a response
matrix Y is sought. The Y matrix can contain both quantitative (e. g. glucose
concentration) and qualitative (genotype) information. This additional sample
information in Y is used by the PLS method to focus the model plane to capture
the Y-related variation in X, e. g. separation between genotypes, rather than
providing an overall view of all variation in the data as done by the PCA
model. In addition, the PLS method can also be used to predict the properties
(Y-values) of new unknown samples, e. g. predict the glucose concentration or
genotype.
The Y matrix consists of the same number of rows as the X matrix. Each
column in Y indicates a certain property, e. g. glucose concentration or genotype,
for each sample. When Y contains qualitative information such as genotype, the
number of columns in Y equals the number of classes. Each row in Y describes
the group membership of that sample, where “1” indicates that the sample belongs
to the class and “0” that it does not. When Y is qualitative, the PLS method is called
PLS Discriminant Analysis (PLS-DA), to distinguish it from the situation when
Y is quantitative.
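The construction of a qualitative Y matrix is simple enough to show directly. The sketch below uses a hypothetical helper, `dummy_y`, and made-up sample labels; it only illustrates the 0/1 coding described above and does not fit a PLS-DA model.

```python
def dummy_y(labels):
    """Build a 0/1 class-membership matrix from a list of class labels.

    Returns the matrix (one row per sample, one column per class)
    and the class order used for the columns.
    """
    classes = sorted(set(labels))
    Y = [[1 if label == c else 0 for c in classes] for label in labels]
    return Y, classes


# Made-up sample labels for illustration.
genotypes = ["WT", "WT", "max3", "max3", "WT"]
Y, classes = dummy_y(genotypes)
```

Each row of `Y` contains exactly one "1", and the number of columns equals the number of classes, as stated in the text.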
We will work through a metabolomics example using GC/MS data from the
analysis of Arabidopsis extracts. Shoots of higher plants are characterized by
axillary branching, where the shoot branches develop from shoot meristems
located between a leaf and the shoot stem. The control of axillary shoot growth
(branching) is not well understood, but it is known that several internal fac-
tors such as the plant hormones IAA and cytokinins are involved (McSteen
and Leyser 2005). Mutation screens in Arabidopsis have identified four loci
involved in the repression of axillary bud growth, MAX1–4. Based on the mu-
tants, it is now suggested that an unknown transmittable substance might be
involved in controlling branching (see McSteen and Leyser 2005). The biosyn-
thesis of this compound in Arabidopsis is catalyzed by a number of MAX
(more-axillary growth) proteins.
We have used a metabolomics approach to classify and identify the metabolic
differences between the MAX-mutants. Root samples from WT, max3 and max4
mutants were analysed by GC/TOFMS as described by Gullberg et al. (2004).
The GC/MS data were processed by hierarchical multivariate curve resolution
(Jonsson et al. 2005a), and the obtained X-matrix was thereafter subjected to
PCA and PLS-DA analysis. The GC/MS processing resulted in 514 resolved peak
areas. Log transformation, column centering and scaling to unit variance were
applied to the resolved peak areas (X-matrix) prior to modeling, and two dummy
Y-variables were constructed based on the class belonging of each sample to the
genotypes, WT and max3. The PLS-DA model score plot is shown in Fig. 5A.
The score plot reveals the relationship among the samples. The figure shows
that the model plane clearly separates the two genotypes.
Fig. 5. A PLS-DA score plot from the analysis of metabolite profiles in roots of Arabidopsis WT,
max3 and max4. The PLS-DA model is based on WT and max3. The X-matrix was centered and
scaled to unit variance. The explained variation in the X-matrix (R²X) is 0.74, the explained
variation in the Y matrix (R²Y) is 0.99 and the predictive ability according to sevenfold
cross-validation (Q²Y) is 0.84. R²X is the cumulative modelled variation in X, R²Y is the cumulative
modelled variation in Y and Q²Y is the cumulative predicted variation in Y, according to
cross-validation. The range of these parameters is 0–1, where 1 indicates a perfect fit. B Based on the
model, max4 samples were predicted into the model, showing that max3 and max4 are very similar
regarding metabolic content (compare position in score plot A)
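The three preprocessing steps named above (log transformation, column centering and scaling to unit variance) can be sketched with NumPy as follows. The details are assumptions: the chapter does not state the base of the logarithm or how zero peak areas were handled, so the base-10 logarithm, the `offset` parameter and the handling of constant columns are illustrative choices.

```python
import numpy as np


def preprocess(peak_areas, offset=1.0):
    """Log-transform, center and scale a samples-by-peaks matrix.

    `offset` is added before taking logs so that zero areas stay finite
    (an assumption, not a detail given in the text).
    """
    X = np.log10(np.asarray(peak_areas, dtype=float) + offset)
    X = X - X.mean(axis=0)            # column centering
    sd = X.std(axis=0, ddof=1)        # sample standard deviation per column
    sd[sd == 0] = 1.0                 # leave constant columns at zero
    return X / sd                     # unit variance scaling


rng = np.random.default_rng(1)
areas = rng.lognormal(mean=8.0, sigma=1.0, size=(12, 6))  # synthetic peak areas
X_scaled = preprocess(areas)
```

After this step every resolved peak has mean 0 and variance 1, so abundant and scarce metabolites carry equal weight in the subsequent PCA or PLS-DA model.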
To validate the model results, predictions were made for the genotype max4,
using the calculated PLS-DA model based on the other sample set (WT and
max3). The resulting PLS-DA score plot (Fig. 5B) placed
max4 closer to max3 than to WT. This is consistent with the fact
that the max3 genotype is very similar to max4, with the MAX3 and MAX4
proteins using the same substrate (Schwarz et al. 2005). Interpretation of the first
weight vector (w1) from the PLS-DA model, as described by Trygg and Wold
(2002), together with the 99% confidence intervals calculated using jack-knifing
(Martens and Martens 2000), highlighted 64 significant variables (metabolites)
differing between WT and max3. Assessing the importance of these metabolites is part
of the biological validation of the data set. The statistical validation was done
by prediction of the max4 mutants into the WT/max3 model. Both types of
validation are important for a multivariate data set.
Multivariate projection methods, e. g. PCA and PLS, represent a useful and versatile
technology for modelling, monitoring and prediction of complex problems
and data structures encountered within metabolomics and other ‘omics’
disciplines. The common denominator is that high complexity data tables are
generated and that these data tables can be analysed and interpreted by means
of chemometric methods. The principal component analysis (PCA) method
summarizes the variation in a data table X into a model plane (the scores T).
A scatter plot of these scores gives an overview of the samples (observations)
and how they relate to each other, e. g. if there are groupings or trends or
deviating samples and so on. In order to interpret the patterns found in a score
plot one examines the corresponding loading plot (P). The loadings P reveal
how each variable contributes to the separation among samples in the model
plane and also gives insights into the relative importance of each variable.
However, one fundamental requirement is that the data contain relevant
information regarding our biological question. In other words, how do we maximise
the information content in the data? The traditional way, to Change One
Factor at a Time, i. e. the COST approach, is not recommended. Design of Experiments
(DOE) is the methodology of how to plan and conduct experiments
in order to maximize the information in the data with the fewest runs.
A proper experimental design will reveal the influence of each of the different
factors separately and also any interactions between them. DOE is the only
feasible approach to separate cause and effect from each other. DOE in
combination with chemometric analysis is therefore a powerful way of planning,
conducting and evaluating metabolomics experiments.
Acknowledgements. The Swedish Research Council, Wallenberg Consortium North (WCN), the
Kempe foundation, EU strategic funding, Knut and Alice Wallenberg Foundation (JT) and Strategic
Research Funding (SSF) are acknowledged for financial support. We thank Professor Ottoline Leyser,
York, UK, for allowing us to show data from the max-mutant project, and Dr. Miyako Kusano,
RIKEN Plant Science Centre, Yokohama, Japan, for the initial analysis of metabolites in the
max-mutants.
References
Allen J, Davey HM, Broadhurst D, Heald JK, Rowland JJ, Oliver SG, Kell DB (2003) High-
throughput classification of yeast mutants for functional genomics using metabolic foot-
printing. Nature Biotechnol 21:692–696
Andreev VP, Rejtar T, Chen HS, Moskovets EV, Ivanov AR, Karger BL (2003) A universal denoising
and peak picking algorithm for LC-MS based on matched filtration in the chromatographic
time domain. Anal Chem 75:6314–6326
Dunn WB, Bailey NJC, Johnson HE (2005) Measuring the metabolome: current analytical tech-
nologies. Analyst 130:606–625
Eriksson L, Johansson E, Kettaneh-Wold N, Wold S (2001) Multi and megavariate data analysis.
Umetrics (www.umetrics.com), ISBN 91–973730-1-X
Gullberg J, Jonsson P, Nordström A, Sjöström M, Moritz T (2004) Optimisation of preparation
of plant samples for metabolic profiling by GC-MS. Anal Biochem 331:283–295
Halket JM, Przyborowska A, Stein SE, Mallard WG, Down S, Chalmers RA (1999) Deconvolution
gas chromatography mass spectrometry of urinary organic acids - potential for pattern recog-
nition and automated identification of metabolic disorders. Rapid Commun Mass Spectrom
13:279–284
Idborg-Björkman H, Edlund PO, Kvalheim OM, Schuppe-Koistinen I, Jacobsson SP (2003)
Screening of biomarkers in rat urine using LC/electrospray ionization-MS and two-way
data analysis. Anal Chem 75:4784–4792
Jackson JE (1991) A users guide to principal components. Wiley, New York
Jonsson P, Gullberg J, Nordström A, Kowalczyk M, Sjöström M, Moritz T (2004) A strategy for
extracting information from large series of non-processed complex GC/MS data. Anal Chem
76:1738–1745
Jonsson P, Johansson AI, Gullberg J, Trygg J, A J, Grung B, Marklund S, Sjöström M, Antti H,
Moritz T (2005a) High-throughput data analysis for detecting and identifying differences
between samples in GC/MS-based metabolomic analyses. Anal Chem 77:5635–5642
Jonsson P, Bruce SJ, Moritz T, Trygg J, Sjöström M, Plumb R, Granger J, Maibaum E, Nicholson JK,
Holmes E, Antti H (2005b) Extraction, interpretation and validation of information for
comparing samples in metabolic LC/MS data sets. Analyst 130:701–707
Lundstedt T, Seifert E, Abramo L, Thelin B, Nyström A, Pettersen J, Bergman R (1998) Experi-
mental design and optimization. Chem Intel Lab Systems 42:3–40
Martens H, Martens M (2000) Modified Jack-knife estimation of parameter uncertainty in bilinear
modelling by partial least squares regression (PLSR). Food Qual Pref 11:5–16
McSteen P, Leyser O (2005) Shoot branching. Annu Rev Plant Biol 56:353–374
Schauer N, Steinhauser D, Strelkov S, Schomburg D, Allison G, Moritz T, Lundgren K, Roessner-
Tunali U, Forbes MG, Willmitzer L et al (2005) GC-MS libraries for the rapid identification
of metabolites in complex biological samples. FEBS Lett 579:1332–1337
Schwartz S, Qin XQ, Loewen MC (2005) The biochemical characterization of two Carotenoid
cleavage enzymes from Arabidopsis indicates that a carotenoid-derived compound inhibits
lateral branching. J Biol Chem 279:46940–46945
Sumner LW, Mendes P, Dixon RA (2003) Plant metabolomics: large-scale phytochemistry in the
functional genomics era. Phytochemistry 62:817–836
Trygg J (2002) O2-PLS for qualitative and quantitative analysis in multivariate calibration.
J Chemometr 16:283–293
Trygg J, Wold S (2002) Orthogonal projections to latent structures (O-PLS). J Chemometrics
16:119–128
Wold S, Ruhe A, Wold H, Dunn WJ III (1984) The collinearity problem in linear regression.
The partial least squares (PLS) approach to generalized inverses. SIAM J Sci Statist Comput
5:735–743
II.3 Map Editor for the Atomic Reconstruction
of Metabolism (ARM)
M. Arita1,2 , Y. Fujiwara1 , and Y. Nakanishi3
1 Introduction
Fig. 2. Three atomic mappings in the reaction glucose + ATP = glucose 6-phosphate + ADP. The
mapping between ATP and ADP (shown in solid lines) is common to all phosphorylation reactions
with ATP and ADP
3.1 Overview
The design principle of the map editor is that users can flexibly integrate a se-
quence of atomic mappings (not reactions) into existing metabolic pathways
to form metabolic maps (Fig. 3). First, users are expected to search metabolic
pathways using the associated database that stores enzymatic reactions, their
atomic mappings and molecular structures. The searched pathways (sequences
of atomic mappings) are transferred to the main window where their layout
can be freely edited as in a conventional graphical drawing editor. The advan-
tage of our editor over conventional editors such as Microsoft PowerPoint is
that users can import metabolic objects (e. g. compound structures and re-
actions) from the background database: although on-screen it appears as if
only graphical objects for compound structures and reactions are imported,
more information is processed in the background. For example, importing
one enzymatic reaction on the screen implicitly invokes the integration of its
associated atomic mappings into the already drawn metabolic map so that the
route of any atom in the new reaction can be traced seamlessly on the resulting
metabolic map.
Fig. 3. Screen-shot of the Map Editor. Pathways are searched by typing molecular names in the
input fields (Number 1). The search results are listed (Number 2). Users can drag pathways into
the main canvas
The background database for atomic information stores three types of metabolic
data: molecular structures, reaction formulas, and their atomic mappings.
All data are freely available in text format from the website
http://www.metabolome.jp/download.html.
Molecular structures are registered in the MOL-file format (MDL Information
Systems; its description is downloadable from http://www.mdli.com/). The
MOL-file format is the de facto standard to describe molecular structures; an
example is shown in Fig. 4. Each MOL-file describes one molecular structure
as a list of atoms with their XYZ coordinates and their chemical bonds. The
chirality of carbon is specified using one integer value for each corresponding
carbon atom. Information on the display of chirality (in thick and shaded lines)
is specified using other integer values. The metabolic editor does not use the
XYZ coordinates written in the format; rather, it applies its own drawing
algorithm to assign XYZ positions (Arita 2005).
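The atom list and bond list described above can be read with a few lines of code. The following is a minimal sketch, assuming the fixed-width layout of the widely used V2000 connection table (counts on the fourth line, then one line per atom and one per bond); the column positions should be checked against the format description before relying on them. The sample record is a made-up three-atom fragment with arbitrary coordinates, not an entry from the ARM database, and the names Atom, Bond and parse_mol are hypothetical.

```python
from typing import List, NamedTuple, Tuple

class Atom(NamedTuple):
    symbol: str
    x: float
    y: float
    z: float

class Bond(NamedTuple):
    first: int    # 1-based index of the first atom
    second: int   # 1-based index of the second atom
    order: int    # 1 = single, 2 = double, ...
    stereo: int   # display of chirality (e.g. wedge or hashed bond)

# Made-up record: three header lines, a counts line, atoms, bonds, terminator.
SAMPLE_MOL = "\n".join([
    "hypothetical C-C-O fragment",
    "  sketch",
    "",
    "  3  2  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    1.5000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    2.2000    1.2000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0",
    "  1  2  1  0",
    "  2  3  1  0",
    "M  END",
])

def parse_mol(text: str) -> Tuple[List[Atom], List[Bond]]:
    """Parse the atom block and bond block of a V2000-style MOL record."""
    lines = text.splitlines()
    counts = lines[3]
    n_atoms, n_bonds = int(counts[0:3]), int(counts[3:6])
    atoms = [
        Atom(line[31:34].strip(), float(line[0:10]), float(line[10:20]), float(line[20:30]))
        for line in lines[4:4 + n_atoms]
    ]
    bonds = [
        Bond(int(line[0:3]), int(line[3:6]), int(line[6:9]), int(line[9:12]))
        for line in lines[4 + n_atoms:4 + n_atoms + n_bonds]
    ]
    return atoms, bonds

atoms, bonds = parse_mol(SAMPLE_MOL)
```

Because the editor assigns its own layout, a reader of this kind could ignore the coordinates and keep only the element symbols, the atom order and the bond list.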
As in other metabolic databases, enzymatic reactions are described using
compound names. Reaction formulas were obtained from the Enzyme Nomenclature
of the International Union of Biochemistry and Molecular Biology
(http://www.chem.qmw.ac.uk/iubmb/enzyme/). In each reaction, the order of
molecules on the left- and right-hand side was manually rearranged so that
the atomic mappings can be computationally detected by comparing molecular
structures sequentially from left to right (Fig. 5). For details on structure
comparison to compute atomic mappings refer to Arita (2003).
Fig. 4. MOL-file format for L-alanine. In the ARM database, carbon atoms in the atom-block
are ordered according to the IUPAC positions for molecular structures. Only carbon atoms are
correctly ordered
Fig. 5. Schematic view of reaction EC 6.3.5.2. The reaction formula is written as “ATP + XMP +
L-glutamine + H2O => AMP + GMP + L-glutamic acid + pyrophosphate” in the database so that the
molecular structures roughly correspond one-to-one (top to bottom)
Using molecular structures, atomic mappings were pre-computed for all
registered reactions and, after manual verification, correct mapping results
were stored in the database. Details, including the accuracy of the mapping
computation, were described previously (Arita 2003). Although the mapping
was computed for all atomic elements except hydrogen, the results were reg-
istered only for carbon, nitrogen, and sulfur atoms due to ambiguities in the
mapping of the rest of the elements.
The map editor is equipped with a search engine for metabolic pathways.
Given source and target metabolites, the engine computes logically possible
pathways between these metabolites, from the shortest to pathways of any
length. Although pathway length is measured by the number of reaction steps,
an arbitrary value can be assigned in the algorithm used. In other words, the
engine can compute any pathway throughout which at least one carbon (or
nitrogen, sulfur) atom is conserved. An arbitrary combination of pathways can
be visualized by dragging a searched pathway into the main canvas window
(Fig. 3). When a pathway is dragged into the canvas, it is merged with the already
drawn network (Fig. 6). Although its initial layout is automatically assigned,
a user can freely rearrange the orientation or location of any metabolic object
by using the mouse.
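A search that returns pathways in order of increasing number of reaction steps can be sketched with a breadth-first enumeration. The example below is only an illustration under stated assumptions: the network is a made-up set of molecules A to E, pathway length is the plain step count, pathways that revisit a molecule are rejected, and the atom-conservation check performed by the actual engine is left out. The function name search_pathways and the max_steps limit are hypothetical.

```python
from collections import deque

# Made-up network: each molecule maps to the molecules reachable in one reaction step.
NETWORK = {
    "A": ["B", "D"],
    "B": ["A", "D"],
    "D": ["B", "E"],
    "E": [],
}

def search_pathways(network, source, target, max_steps):
    """Yield pathways from source to target, shortest first, never revisiting a molecule."""
    queue = deque([[source]])
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            yield path
            continue
        if len(path) - 1 >= max_steps:      # number of reaction steps used so far
            continue
        for nxt in network.get(path[-1], []):
            if nxt not in path:             # skip local loops such as A -> B -> A
                queue.append(path + [nxt])

paths = list(search_pathways(NETWORK, "A", "E", max_steps=5))
# paths == [["A", "D", "E"], ["A", "B", "D", "E"]]
```

Because the queue is processed in order of path length, the two-step pathway is reported before the three-step one; replacing the step count with per-reaction weights would require a priority queue instead.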
Fig. 6. The network generated by merging two pathways from xylulose 5-phosphate (X5P) to
erythrose 4-phosphate. Every time a pathway is dropped, only the difference from the existing
network is drawn. Carbon 3 in X5P is traced in this example (shown with blank arrows)
The unique function of the map editor is its ability to trace a particular
atom on the map. Since each metabolic object on the map is linked with its
atomic information in the database, the logical tracing of each atomic position
is possible by transitive calculation of the atomic mappings in the network.
A user needs only to mouse-click a particular atom on the network to see its
traces (Fig. 6).
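The transitive calculation of atomic mappings can be sketched as a graph traversal over (molecule, atom position) pairs. This is a minimal illustration, assuming a made-up mapping table for molecules S, P, Q and R; it is not the data model of the ARM database, and the function name trace_atom is hypothetical. Each pair is visited once, so the traversal also terminates on cyclic networks.

```python
from collections import deque

# Made-up atomic mappings: (molecule, carbon position) -> positions reached after one reaction.
ATOM_MAP = {
    ("S", 1): [("P", 2)],
    ("S", 2): [("P", 1)],
    ("P", 1): [("Q", 1)],
    ("P", 2): [("Q", 2), ("R", 1)],
    ("Q", 2): [("S", 1)],          # closes a cycle back to the starting atom
}

def trace_atom(atom_map, start):
    """Return every (molecule, position) reachable from start with its minimal step count."""
    steps = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nxt in atom_map.get(current, []):
            if nxt not in steps:            # each atomic position is expanded only once
                steps[nxt] = steps[current] + 1
                queue.append(nxt)
    return steps

trace = trace_atom(ATOM_MAP, ("S", 1))
# trace == {("S", 1): 0, ("P", 2): 1, ("Q", 2): 2, ("R", 1): 2}
```

Clicking an atom on the map corresponds to choosing the start pair; the keys of the result are the positions that would be highlighted.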
4 Applications
Metabolic pathways contain two types of cycles in terms of tracing atoms: cycles
where carbon atoms are exchanged in each round (e. g. the tricarboxylic acid
(TCA) cycle), and cycles where all carbon atoms are conserved (e. g. the urea
cycle). If reversible, a single reaction involving two identical molecules (e. g.
2 pyruvate = 2-acetolactate + CO2) can form the former type of cycle by itself.
Likewise, the latter type of cycle can be formed by any reversible reaction. Cyclic
pathways may be biochemically meaningful, but in practice, their existence is
problematic in searching metabolic pathways. Since pathways are searched
and output according to the number of reaction steps in our system, short
cycles drastically increase the number of spurious pathways with local loops.
To eliminate such futile pathways, our pathway-search algorithm discards all
pathways that visit the same molecule multiple times. However, this constraint
is too strict for searching all possibly existing pathways. For example, users
studying the TCA cycle may want to analyze carbon traces that go round the
cycle multiple times.
To support the atomic analysis of cyclic pathways, a metabolic map editor
is indispensable. First, users search the pathways of interest and paste them
into the main window using the edit function (i. e., model selection). Then,
a particular atom can be interactively traced within the selected set of reactions.
Since the target model is highly constrained, it is feasible to compute pathways
visiting the same compound multiple times.
Because of the shared metabolites between the glycolytic and pentose phos-
phate pathways, the atomic traces of a particular carbon atom often become
hard to follow. This is the case for the correspondence between the C-1 of
glucose and the C-1 of pentose 5-phosphate. By clicking the corresponding
atomic position in the map editor, the atomic trace can be grasped at a glance.
Although the entry point of the pentose phosphate pathway decarboxylates
the C-1 position of glucose, this position is identical to the C-1 of fructose 6-
phosphate through glycolysis. Thus, in the pentose phosphate pathway, the C-1
Recently, metabolome analysis has been facilitated by the rapid technical
progress made in mass spectrometry (MS). To detect lipid molecules, for
example, an effective strategy is to couple MS with liquid chromatography-
electrospray ionization (Houjou et al. 2004). More than 1000 glycerophospho-
lipid species can be quantitated in a single assay in less than 2 h (R. Taguchi,
personal communication). However, the efficient analysis of such large-scale
data sets poses a vexing problem. Network visualization remains the first step
for gaining an overview of the data; however, the traditional metabolic map
is not suitable for visualization because it contains abstract notations. For
example, the ‘phosphatidyl group’ contains two fatty acids of variable lengths
(usually 12–24 carbon atoms) and degrees of unsaturation (usually 0–6 double
bonds, depending on the length). Since the experimentally confirmed fatty acids
in a phosphatidyl group comprise more than 30 species, the number
of actual phosphatidyl species may be as many as its square, i. e., ≈ 1000. To
visualize the distribution of the spectrum of molecular species, our map editor
supports an interactive instantiation of abstract moieties. For each abstract
notation using ‘R-group’, a user can assign a list of molecules as its possi-
ble instantiation. The map editor can also display an integer value for each
molecule (such as the concentration, mass, logP, etc.). Given the list of possible
instantiations for each R-group and their corresponding concentrations (i. e.,
metabolomic data), the editor can display the percentage fraction of candidate
molecules. When multiple R-groups exist, the amount to be displayed will be
the integration of all possible assignments. In a phosphatidyl group, various
fatty acids can be linked at R1 and R2 positions of glycerol phosphate. With
a mouse click, the candidate list for the R1 (or R2) position is displayed together
with the relevant percentage fractions. The percentage for
docosahexaenoic acid (DHA) in the R1 group, for example, is calculated as the sum
of all phosphatidyl molecules that have DHA at R1. When DHA is chosen for R1, the
percentage list for R2 consists of fully instantiated molecular species that have
DHA at R1 and another fatty acid at R2.
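The percentage calculation for R-groups can be made concrete with a small sketch. The concentrations below are made-up numbers for four fully instantiated species, written in the common carbons:double-bonds shorthand (22:6 for DHA). Normalising the R2 list within the species that carry the chosen R1 fatty acid is an assumption of this example; the text does not state the reference total. The function name position_percentages is hypothetical.

```python
from collections import defaultdict

# Made-up concentrations of fully instantiated species, keyed by (fatty acid at R1, fatty acid at R2).
CONCENTRATIONS = {
    ("16:0", "18:1"): 40.0,
    ("16:0", "22:6"): 10.0,
    ("22:6", "16:0"): 30.0,
    ("22:6", "18:1"): 20.0,
}

def position_percentages(conc, position, fixed=None):
    """Percentage of each fatty acid at `position` (0 = R1, 1 = R2).

    `fixed` optionally maps a position to a chosen fatty acid; only species
    matching every fixed choice are counted (assumed normalisation).
    """
    fixed = fixed or {}
    totals = defaultdict(float)
    for species, amount in conc.items():
        if all(species[pos] == fatty_acid for pos, fatty_acid in fixed.items()):
            totals[species[position]] += amount   # sum over all matching species
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {fa: 100.0 * amount / grand_total for fa, amount in totals.items()}

r1 = position_percentages(CONCENTRATIONS, 0)                           # candidate list for R1
r2_given_dha = position_percentages(CONCENTRATIONS, 1, {0: "22:6"})    # R2 list once DHA is chosen at R1
```

With these numbers DHA accounts for 50% at R1 (30 + 20 out of 100), and once DHA is fixed at R1 the R2 list splits 60% to 16:0 and 40% to 18:1.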
Due to the abstract notations for molecules, lipid metabolism is a particu-
larly unspecific part in the traditional metabolic map. The computer-assisted
metabolic map is indispensable to visualize the metabolomic data of such
pathways.
5 Conclusions
The map editor is not only a tool for visualizing metabolic pathways, but
is a necessary component for the systematic and modular understanding of
species- and context-dependent metabolic networks. Since the software system
is linked with atomic-level information in the background database, users can
trace any atomic position on any metabolic network they draw. It is a desir-
able realization of a pathway database. Most web-based pathway databases do
not support the users’ own arrangement of networks, although in computer
science, the definition of a database system is ‘a collection of information
organized in such a way that a computer program can quickly select desired data
in a desired arrangement’. Our map editor compensates for this drawback, and
represents a step forward to a more flexible analysis of large-scale biological
information.
Acknowledgements. The ongoing analysis of lipid metabolism is a joint effort with Prof. Ryo
Taguchi at The University of Tokyo. The authors thank Ursula Petralia for editing the manuscript.
This work was supported by The Ministry of Education, Culture, Sports, Science and Technology
(MEXT), Grant-in-Aid for Scientific Research on Priority Areas.
References
Arita M (2003) In silico atomic tracing by substrate-product relationships in Escherichia coli
intermediary metabolism. Genome Res 13(11):2455–2466
Arita M (2005) Introduction to the ARM database: database on chemical transformations in
metabolism for tracing pathways. In: Tomita M, Nishioka T (eds) Metabolomics: the frontier
of systems biology. Springer, Berlin Heidelberg New York, pp 193–211
Arita M, Robert M, Tomita M (2005) All systems go: launching cell simulation fueled by integrated
experimental biology data. Curr Opin Biotechnol 16(3):344–349
Berg JM, Tymoczko JL, Stryer L (2002) Biochemistry, 5th edn. Freeman, New York
Hood L (2003) Systems biology: integrating technology, biology, and computation. Mech Ageing
Dev 124:9–16
Houjou T, Yamatani K, Nakanishi H, Imagawa M, Shimizu T, Taguchi R (2004) Rapid and selective
identification of molecular species in phosphatidylcholine and sphingomyelin by conditional
neutral loss scanning and MS3. Rapid Commun Mass Spectrom 18(24):3123–3130
Kanehisa M, Goto S, Kawashima S, Okuno Y, Hattori M (2004) The KEGG resource for deciphering
the genome. Nucleic Acids Res 32 (Database Issue):D277–D280
Keseler IM, Collado-Vides J, Gama-Castro S, Ingraham J, Paley S, Paulsen IT, Peralta-Gil M,
Karp PD (2005) EcoCyc: a comprehensive database resource for Escherichia coli. Nucleic
Acids Res 33 (Database Issue):D334–337
Ma HW, Zhao XM, Yuan YJ, Zeng AP (2004) Decomposition of metabolic network into func-
tional modules based on the global connectivity structure of reaction graph. Bioinformatics
20(12):1870–1876
Mavrovouniotis ML (1992) Computer-aided synthesis of biochemical pathways. Biotechnol Bio-
eng 36:1119–1132
Mendes P (2002) Emerging bioinformatics for the metabolome. Brief Bioinform 3(2):134–145
Papin JA, Reed JL, Palsson BO (2004) Hierarchical thinking in network biology: the unbiased
modularization of biochemical networks. Trends Biochem Sci 29(12):641–647
Ravasz E, Somera AL, Mongru DA, Oltvai ZN, Barabasi AL (2002) Hierarchical organization of
modularity in metabolic networks. Science 297(5586):1551–1555