Food Metabolome Review
Am J Clin Nutr 2014;99:1286–308. Printed in USA. © 2014 American Society for Nutrition
enzymes), red blood cells (fatty acids, carotenoids, and hemoglobin adducts), and to a lesser extent in urine (polyphenols, vitamins, inorganic compounds, and amino acids). Some of these biomarkers correspond to nutrients and bioactive compounds and have been used to compare status or exposure. Some have been used as surrogate biomarkers of food intake, as follows: polyphenols, carotenoids, and vitamin C for fruit and vegetables (17, 18); alkylresorcinols for whole-grain cereals (19, 20); isoflavones for soy (21); amino acids and fatty acids for meat (22, 23); fatty acids for dairy products and fish (22, 24); and polyphenols for tea and wine (18, 25) (Table 1). Dietary biomarkers include not only natural food constituents but also certain food additives such as iodine in milk (26) or food contaminants such as polychlorinated biphenyls in fatty fish (27). These latter biomarkers are often specific to certain populations who consume these additives or where consistent levels of contamination are observed.

Other biomarkers are directly derived from the digestion and gut absorption of food constituents or are endogenous metabolites that have been altered by exposure to specific nutrients. For instance, serotonin metabolism is altered by acute alcohol intake (28), the activity of selenium-containing enzymes such as erythrocyte glutathione peroxidase depends on selenium intake, and ceramide synthase is inhibited by exposure to fumonisin mycotoxins (29).

Pharmacokinetics and reliability of dietary biomarkers
Dietary biomarkers are not without their limitations. They may be altered because of possible interactions with genetic factors, physiologic or health status (eg, age or obesity) (30), dietary factors such as fats for lipophilic biomarkers (31), and lifestyle factors such as alcohol intake or smoking (32). Their concentrations also vary over time according to their pharmacokinetic properties. A higher intraindividual variability is expected for biomarkers with a short half-life (20, 33). Intraindividual variability leads to exposure measurement errors when the objective is to characterize habitual exposure in epidemiologic studies and small numbers of measurements are available across subjects. Some of the biomarkers listed in Table 1 have half-lives that do not exceed 24 h [polyphenols, alkylresorcinols, and amino acids (34, 35)]. These biomarkers may thus be useful only in populations who regularly and frequently consume these dietary sources. Lipophilic markers (carotenoids, lipids) (36) or biomarkers associated with erythrocytes (folate, fatty acids) (29) have longer half-lives (weeks to months) because of the equilibrium of biomarkers between blood and fatty tissues, or because of their integration into erythrocytes. Some dietary compounds such as isothiocyanates and acrylamide also form adducts with blood albumin and hemoglobin (37, 38), with half-lives varying between 3 and 8 wk, and may be used as longer-term biomarkers. Protein adducts with dietary compounds have received limited attention thus far. Adductomics appears to be particularly promising for the discovery of these adduct biomarkers (39, 40).

Biomarker sensitivity and specificity
Dietary biomarkers should have sufficient sensitivity to measure exposures within ranges commonly found in the populations of interest. Intervention studies are essential to address this question and to evaluate the relation between exposure and biomarker concentrations (17, 41). Biomarkers such as vitamin C or selenium in erythrocyte glutathione peroxidase show saturable effects and may not be suitable for use at high levels of exposure (29, 42). Conversely, some biomarkers are present at concentrations too low to be reliably detected at low levels of exposure. For example, some biomarkers of alcohol abuse were not appropriate to evaluate low to moderate levels of alcohol consumption (43).

Specificity is another essential characteristic of biomarkers. Some biomarkers can be highly specific for a particular food (Table 1). Proline betaine and lycopene are well-established biomarkers for citrus fruit and tomato products, respectively (44, 45). Other biomarkers may be common to several foods or characteristic of an entire food group. Vitamin C and a number of carotenoids and flavonoids are common to many fruit and vegetables. Vitamin C or the sum of carotenoids or flavonoids has been used as a generic biomarker for fruit and vegetable intake (18, 45).

Single biomarker or combinations of biomarkers
Traditionally, single biomarkers have been used to characterize complex dietary exposures such as consumption of a whole food group or intake of a group of compounds with related biological activities. Two examples show the limits of such global assays. Vitamin C used as a biomarker for fruit and vegetable intake is present in a large number of fruit and vegetables, but its content varies widely according to species, varieties, and food-processing methods. It is also widely used as an additive and dietary supplement. The Folin assay, commonly used to estimate total polyphenols in foods (46), has also been applied to urine samples to compare polyphenol intake (47), but such use may be inappropriate because of the presence of interfering reducing metabolites in such complex biological matrices (46).

In contrast to these global assays, analytic approaches based on the estimation of combinations of dietary constituents may provide more accurate measurements of dietary exposure. The ratios of 2 alkylresorcinols characteristic of whole-grain wheat or rye were found to be good indicators of the relative consumption of these cereals (20, 48). However, there are very few such examples in which combinations of biomarkers were used to improve the specificity of dietary exposure measurements. Metabolomics constitutes a comprehensive approach to identify new panels of biomarkers that are specific or common to particular foods or food groups, as shown recently for citrus fruit (49). This should greatly improve the assessment of exposure to classes of food bioactive compounds, food groups, or dietary patterns.

THE FOOD METABOLOME IN THE OMICS ERA
Metabolomics can be described as the application of high-throughput analytic chemistry technologies [liquid chromatography–mass spectrometry (LC-MS)⁴, nuclear magnetic resonance

⁴Abbreviations used: dbNP, Nutritional Phenotype Database; ECMDB, E. coli Metabolome Database; FDR, false discovery rate; FooDB, Food Component Database; GC-MS, gas chromatography–mass spectrometry; HMDB, Human Metabolome Database; LC-MS, liquid chromatography–mass spectrometry; MS, mass spectrometry; MSI, Metabolomics Standards Initiative; MWAS, metabolome-wide association study; NMR, nuclear magnetic resonance spectroscopy; PCA, principal components analysis; PLS-DA, partial least-squares discriminant analysis; TMAO, trimethylamine-N-oxide; YMDB, Yeast Metabolome Database.
Fruit
  Apple: Kaempferol, isorhamnetin, m-coumaric acid, phloretin
  Orange: Caffeic acid, hesperetin, proline betaine
  Grapefruit: Naringenin
  Citrus fruit: Ascorbic acid, β-cryptoxanthin, hesperetin, naringenin, proline betaine, vitamin A, zeaxanthin
  Fruit (total): 4-O-Methylgallic acid, β-cryptoxanthin, carotenoids (mix), flavonoids (mix), gallic acid, hesperetin, isorhamnetin, kaempferol, lutein, lycopene, naringenin, phloretin, vitamin A, vitamin C, zeaxanthin
Vegetables
  Carrot: α-Carotene
  Tomato: Carotenoids (mix), lycopene, lutein
  Vegetables, leafy: Ascorbic acid, β-carotene, carotenoids (mix)
  Vegetables, root: Ascorbic acid, α-carotene, β-carotene
  Vegetables (total): Ascorbic acid, α-carotene, β-carotene, β-cryptoxanthin, carotenoids (mix), enterolactone, lutein, lycopene
Fruit and vegetables (total): α-Carotene, apigenin, ascorbic acid, β-carotene, β-cryptoxanthin, carotenoids (mix), eriodictyol, flavonoids (mix), hesperetin, hippuric acid, lutein, lycopene, naringenin, phloretin, phytoene, zeaxanthin
Cereal products
  Whole-grain rye: 5-Heptadecylresorcinol, 5-pentacosylresorcinol, 5-tricosylresorcinol
  Whole-grain wheat: 5-Heneicosylresorcinol, 5-tricosylresorcinol, alkylresorcinols (mix)
  Whole-grain cereals (total): 5-Heneicosylresorcinol, 3,5-dihydroxybenzoic acid, 3-(3,5-dihydroxyphenyl)-1-propanoic acid, 5-pentacosylresorcinol, 5-tricosylresorcinol, alkylresorcinols (mix)
Seeds
  Soy products: Daidzein, genistein, isoflavones (mix), O-desmethylangolensin
Meats
  Meat: 1-Hydroxypyrene glucuronide, 1-methylhistidine
  Meat, beef: Pentadecylic acid
Animal products (total): 1-Methylhistidine, 3-methylhistidine, margaric acid, pentadecylic acid, phytanic acid
Dairy products
  Milk, dairy products: Iodine, margaric acid, pentadecylic acid, phytanic acid
Fish
  Fatty: DHA, EPA, long-chain ω-3 PUFAs, polychlorinated biphenyl toxic equivalents, pentachlorodibenzofuran, polychlorinated biphenyl 126, polychlorinated biphenyl 153, ω-3 PUFAs
  Lean: Long-chain ω-3 PUFAs
Beverages (nonalcoholic)
  Tea: 4-O-Methylgallic acid, gallic acid, kaempferol
  Coffee: Chlorogenic acid
Beverages (alcoholic)
  Wine: 4-O-Methylgallic acid, caffeic acid, gallic acid, resveratrol metabolites
  Beverages (alcoholic) (total): 5-Hydroxytryptophol/5-hydroxyindole-3-acetic acid, carbohydrate-deficient transferrin, ethyl glucuronide, γ-glutamyltransferase, aspartate aminotransferase, alanine aminotransferase

¹Data were extracted from the Exposome-Explorer database (V Neveu, DS Wishart, and A Scalbert, unpublished data, 2014).
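
As a minimal sketch of how the specificity argument could be checked against the table above, the following Python snippet inverts a small food-to-biomarker mapping and separates biomarkers listed for a single food from those shared by several foods. The mapping is a hand-copied subset of the rows for individual foods. The names `FOOD_BIOMARKERS`, `invert`, and `split_by_specificity` are hypothetical and do not belong to any published tool. Being listed for only one food in this subset does not by itself establish specificity in real diets.

```python
from collections import defaultdict

# Hand-copied subset of the table above (individual foods only).
# Illustrative and not exhaustive; an assumption of this sketch.
FOOD_BIOMARKERS = {
    "apple": ["kaempferol", "isorhamnetin", "m-coumaric acid", "phloretin"],
    "orange": ["caffeic acid", "hesperetin", "proline betaine"],
    "grapefruit": ["naringenin"],
    "tomato": ["carotenoids (mix)", "lycopene", "lutein"],
    "tea": ["4-O-methylgallic acid", "gallic acid", "kaempferol"],
    "wine": ["4-O-methylgallic acid", "caffeic acid", "gallic acid",
             "resveratrol metabolites"],
    "coffee": ["chlorogenic acid"],
}


def invert(food_biomarkers):
    """Map each biomarker to the sorted list of foods it is listed for."""
    foods_by_marker = defaultdict(set)
    for food, markers in food_biomarkers.items():
        for marker in markers:
            foods_by_marker[marker].add(food)
    return {marker: sorted(foods) for marker, foods in foods_by_marker.items()}


def split_by_specificity(food_biomarkers):
    """Return (single_food, shared) dictionaries of biomarker -> foods."""
    inverted = invert(food_biomarkers)
    single = {m: f for m, f in inverted.items() if len(f) == 1}
    shared = {m: f for m, f in inverted.items() if len(f) > 1}
    return single, shared


single_food, shared = split_by_specificity(FOOD_BIOMARKERS)
print("Listed for one food only:", sorted(single_food))
print("Shared between foods:", shared)
```

In this subset, proline betaine and lycopene each appear for a single food, in line with their description as well-established biomarkers, whereas kaempferol and gallic acid are shared between foods and would need to be combined with other markers to point to one source.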
spectroscopy (NMR), gas chromatography–mass spectrometry (GC-MS)] directed at characterizing the metabolome (ie, the small molecules associated with metabolism). Its development follows that of genomics, transcriptomics, and proteomics. Although not as rapid in development or as high-throughput as its omics cousins, metabolomics led a sea change in how small molecules could and should be analyzed. Rather than being limited to measuring only one or a few compounds at a time, new metabolomic technologies allowed researchers to measure hundreds or even thousands of metabolites at a time. This newly found capacity to measure so many chemicals at once led to a number of metabolomic projects, all launched in the mid-2000s, aimed at identifying the metabolomes of microbes (50), plants (51), and humans (52–54). These projects typically used LC-MS, GC-MS, NMR, or a combination of all 3 techniques to identify and/or quantify as many metabolites as possible in cells, tissues, and biofluids of the organisms of interest. These comprehensive metabolomic studies were also complemented by a number of much more specific metabolomic studies aimed at characterizing the metabolic responses of humans to the intake of various foods or food constituents such as soy (55), citrus fruit (44), nuts (56), meats (57), and tea (58).

The food metabolome as part of the human metabolome
It was through these early metabolome studies that scientists realized that the human metabolome was not as small or as simple as first imagined. In particular, noticeable differences in human metabolomes could be detected that appeared to depend strongly on diet, sex, health status, genetics, kinetics, physiology, and age, with diet being most important (59–62). This dietary dependence was not unexpected, but it was not anticipated to be so complicated. Unlike laboratory animals, humans are free-living omnivores who, in fact, eat other metabolomes. Furthermore, humans are exposed to a huge variety of “chemical environments” associated with the various foods we consume. Thus, the human metabolome is not just a single entity but consists of several components (Figure 1), including the following: 1) the endogenous metabolome (consisting of chemicals needed for, or excreted from, cellular metabolism), 2) the food metabolome (consisting of essential and nonessential chemicals derived from foods after digestion and subsequent metabolism by the tissues and the microbiota), 3) other xenobiotics derived from drugs, and 4) xenobiotics derived from environmental or workplace chemicals.

The exact size and composition of these different human metabolomes are difficult to ascertain. Minimally, the human metabolome contains 50,000 different detectable compounds (9, 63), but as instrument sensitivity and separation technologies improve, this number is expected to increase. Up to 200,000 different metabolites are estimated to occur in the plant kingdom, and combinations of several hundreds of secondary metabolites generally characterize each edible plant (6, 64, 65). Furthermore, the composition often depends on the body compartment, tissue, or biofluid to which one refers. For instance, many food or drug constituents that might be found in the mouth or stomach are chemically identical to the compounds isolated from the intact food or drug. On the other hand, food constituents found in blood, urine, or other excreta are often metabolically transformed in the liver, kidney, or intestine to metabolites that are very different from the parent compound. This adds greatly to the diversity of the food metabolome. However, in some cases, the parent compounds are broken down to such an extent that their end products are actually identical to chemicals that the body produces naturally. The importance of the gut microbiota in contributing metabolites to the human metabolome has also recently emerged (50, 66). Some metabolites, typically vitamins, certain essential amino acids, and a few fatty acids, are specifically microbial in origin (~100 such compounds in total are known at this time). However, a large majority of the metabolites produced by the gut microbiota are derived from the biotransformation of both the endogenous metabolome and the food metabolome and are therefore an integral part of these 2 metabolomes. These microbial metabolites include short-chain fatty acids, secondary bile acids, protein and amino acid metabolites, as well as plant polyphenol metabolites (67).

Metabolism of food constituents
Knowledge of the metabolism of food constituents is critical to understanding the origin of the biotransformed fraction of the food metabolome. It is also essential if we wish to use food metabolites as nutritional biomarkers or as a means to monitor food consumption. In this regard, it is useful to review how food chemicals can be metabolized. Food constituents can be metabolized in 3 different ways: 1) they can be digested in the mouth, stomach, and small intestine into simple nutrients that can be absorbed through the gut barrier; 2) they can be further transformed by host tissues, especially the liver and kidney; or 3) they can be processed by the gut microbiota in the large intestine.

The first category of food constituents consists of intermediary metabolites formed by digestion of lipids, polysaccharides, and proteins. Most of these compounds are common to all living organisms and identical to human endogenous metabolites. They cannot generally be used as dietary biomarkers because of their common identity and the impossibility of tracing their dietary origin. The possible exceptions are the essential amino acids, essential fatty acids, most vitamins, and minerals, which cannot be produced by humans and must originate from external dietary sources.

The second way that food constituents can be metabolized is through transformation by host tissues. Food compounds that are not useful for basic metabolism or that do not correspond to familiar endogenous metabolites are treated as “foreign” or as xenobiotics. Examples of exogenous food constituents include polyphenols, alkaloids, carotenoids, chlorophylls, artificial colors, artificial flavors, natural volatiles for flavoring/aroma, and Maillard reaction products formed during cooking. The human body maintains a complex defense system consisting of dozens of enzymes and membrane transporters to recognize these foreign and potentially toxic chemicals and to neutralize them by rapid biotransformation and/or elimination. Classically, the biotransformation process consists of 2 types of chemical reactions, phase I and phase II transformations, both of which occur primarily in the liver, kidney, and intestine. Phase I transformations typically involve oxidation of compounds via cytochrome P450 enzymes as well as dehydrogenation or hydrolysis by various dehydrogenases, esterases, and amidases. On the other hand, phase II transformations consist of chemical modifications such as methylation (by methyltransferases), sulfation (by sulfotransferases), acetylation (by N-acetyltransferases), glucuronidation (by UDP-glucuronosyltransferases), and amino acid conjugation (by glutathione or glycyl transferases). A recent meta-analysis (68) of the metabolic fate of >1000 xenobiotics showed that cytochrome P450–catalyzed oxidations (40%) and UDP-glucuronosyltransferase glucuronidations (14%) were the most common, followed by reactions involving dehydrogenases (8%), hydrolases (7%), glutathione-S-transferases (6%), and sulfatases (5%). In fact, there are >300 different empirical rules that allow one to predict the fate of metabolites on the basis of their chemical structure (69). Many of the metabolites derived from the biotransformation of food components have not been well characterized. For polyphenols, >230 phase I/II metabolites have been identified and associated with the consumption of specific polyphenol-containing foods (70). The yields of phase I/II reactions are often very high (68, 71), and host-transformed metabolites retain many of the features of their parent compounds. Consequently, these exogenously derived metabolites can be quite useful as specific food biomarkers.

The third way that food metabolites may be transformed is through microbial metabolism. Microbes have a very different set of enzymes from mammals, and given that there are >1000 different species of microbes in the human gut (72), there is an enormous diversity of enzymatic processes that act on food-derived compounds. The gut microbiota is particularly adept at processing polyphenols to phenolic breakdown products. For instance, depending on the predominant microbiota, polyphenols can be transformed by ring cleavage to a variety of aromatic compounds such as benzoate and various derivatives of hydroxyphenylacetic and hydroxypropionic acids. These phenolic acids can be further conjugated to glycine as in hippurate. The gut microbiota also processes indigestible carbohydrates through a variety of fermentative pathways yielding short-chain fatty acids such as butyric acid and propionic acid. Certain microbial metabolites can be useful as food biomarkers, although there is a complex relation between the food source, the predominant gut microbial species, and the resulting food
may not be sufficiently specific for the test food in population studies, because regular diets may include other foods containing precursors of the same biomarkers. For instance, in a cross-sectional analysis of a whole-diet intervention study it was only possible to verify 23% of potential biomarkers observed in previous-meal studies (81).

Cross-sectional studies can therefore play an important role in biomarker discovery. Low and high consumers are selected from food intake data collected by using food-frequency questionnaires, food diaries, or other dietary assessment tools. Comparison of these groups can lead to the identification of biomarkers that are reflective of habitual intake, provided that these biomarkers have a sufficient half-life in the organism or that the foods are regularly consumed. Although these and other studies have shown the potential of cross-sectional studies, care needs to be taken because many of the foods consumed are highly correlated and there is a risk of identifying biomarkers that are not specific to the particular food of interest unless their identity and specific occurrence in the considered foods are established. Notwithstanding, cross-sectional studies are excellent resources that are currently underused for dietary biomarker discovery.

Novel dietary biomarkers identified through a metabolomic approach
An extensive list of potential dietary biomarkers discovered by metabolomics is presented in Table 3. Markers associated with the consumption of foods, nutrients, or diets have been identified. Successful studies include the identification of proline betaine as a marker of citrus intake (49, 80). This marker was first identified in small-scale acute feeding studies and validated in free-living subjects in 2 independent studies (44, 80). It was confirmed in a cross-sectional study that used untargeted metabolomics (49) and played an important role in discriminating noncompliant individuals in a dietary pattern study of Nordic compared with habitual diets (106). In these same studies, screening of urinary profiles for predicted metabolites of citrus fruit also led to the identification of some terpenoids and flavonoids as biomarkers of citrus food intake as well as of intake of citrus-flavored sweets. This illustrates the importance of previous knowledge on food composition and on metabolism of food constituents for annotating unknown discriminating ions in untargeted metabolomic studies.

Trimethylamine-N-oxide (TMAO) was found to be a putative biomarker for meat intake or for meat-containing diets in several studies (102–104), but it has also been reported as a biomarker of fish intake by other authors (82, 107) and shown to be more responsive to intake of fish than meat (85). Several dietary precursors of TMAO such as choline or carnitine have been described (108), and care should be taken when interpreting variations in TMAO concentrations in populations.

The state of validation of biomarkers listed in Table 3 varies widely. Proline betaine is a good example of a well-validated citrus fruit biomarker. Other biomarkers, particularly those identified in controlled intervention studies, may prove to be less robust in populations because of the possible existence of a variety of precursors as seen for TMAO, or the occurrence of the same precursor in various foods. Food-derived biomarkers such as caffeic acid sulfate or methylepicatechin sulfate, which were found to discriminate consumers of raspberries (82), may not be that useful in epidemiologic studies because both their parent metabolites (caffeic acid and epicatechin) have been described in a variety of foods of plant origin (70).

For this reason, it may be particularly advisable to look for characteristic dietary biomarkers directly in cross-sectional studies. However, the chances of identifying robust biomarkers will rely both on the sensitivity of the analytic equipment used and on the quality of the dietary data against which metabolic profiles are correlated. Both 24-h dietary recalls and food-frequency questionnaires have been used, and new biomarkers for citrus fruit intake or coffee were successfully identified (49, 88) (Table 3). The use of food-frequency questionnaires may directly lead to the identification of biomarkers of habitual dietary exposure, but the lower accuracy and lower number of foods documented may limit their value for such discovery studies (105).

With the exception of 2 studies on dietary fiber and milk protein diet, all discovery studies were conducted on urine samples as opposed to blood samples (Table 3). The reason for this is partly technical because of the higher concentrations of food-derived metabolites in urine as compared with blood and because of the lack of interfering proteins. This contrasts with the preferred use of blood biospecimens to measure biomarkers of nutritional status in epidemiologic studies. More metabolomic studies using blood samples should be carried out because of the more common availability of plasma or serum samples in biobanks. Also, lipophilic biomarkers, which may be more stable over time (see Pharmacokinetics and reliability of dietary biomarkers section), are more likely to be found in blood. Regression analyses of the concentrations of 363 metabolites in plasma with a number of dietary variables measured with a food-frequency questionnaire showed the highest correlations with phospholipid concentrations (109). Furthermore, chain length and degree of saturation of fatty acids in glycerophosphatidylcholines were associated with intake of specific foods or nutrients such as fish and dietary fiber.

It is important to point out that the identities of many of the proposed biomarkers in Table 3 (marked with an asterisk) have not been fully validated with proper chemical standards because these standards are often not commercially available. In addition, no standard yet exists to report chemical identification of biomarkers in metabolomic studies (110). For this reason, it is often difficult to evaluate the degree of confidence in biomarker identification.

Analysis of the food metabolome
Analyzing the food metabolome is a particularly challenging task for 3 reasons. First, it comprises a much greater chemical diversity than any other part of the metabolome (see Food metabolome and metabolite databases section). Second, the food metabolome spans a huge range of concentrations, from picomolar or nanomolar concentrations for some contaminants or phytochemical metabolites to millimolar concentrations for nutrients such as sugars. Third, many components of the food metabolome are unknown. Indeed, the metabolism for a large proportion of nonnutrients in humans has never been studied and the chemical structures of their circulating metabolites have not been identified. Until recently, the food metabolome was typically analyzed through targeted methods optimized for specific compounds or families of nutrients or nonnutrients, such as
TABLE 3 (Continued)
Tentative dietary biomarkers identified through untargeted metabolomic approaches in human dietary intervention studies and cross-sectional studies¹

Dietary factor and study type | No. of subjects | Comparison | Dietary assessment tool | Biospecimen | Analytic technique | Biomarker | Reference
CS | 107 | Consumers/nonconsumers | 24-h dietary record | U (24-h) | LC-Q-Tof | 4-Ethyl-5-aminopyrocatechol sulfate,* 4-ethyl-5-methylaminopyrocatechol- | (81)
Coffee
CS | 18 | Consumers/nonconsumers | | U (fasting) | LC-Q-Tof | N-Methylpyridinium, trigonelline | (88)
AI | 9 | Before/after | NA | U (kinetics) | LC-Q-Tof | N-Methylpyridinium, trigonelline | (88)
CS | 68 | H/M/L | FFQ | U (spot, 24-h, fasting) | FIE-FTICR-MS | Dihydrocaffeic acid | (87)
Chamomile tea
SMTI | 14 | Before/after | NA | U (spot) | NMR | Hippuric acid* | (89)
Black tea
AI | 3 | Before/after | NA | U (24-h) | NMR | Hippuric acid,* gallic acid, 1,3-dihydroxyphenyl-2-O-sulfate* | (90)
Tea (black and green)
STI | 17 | Consumers/control | NA | U (24-h) | NMR | Hippuric acid,* 1,3-dihydroxyphenyl-2-O-sulfate* | (58)
Green tea
AI | 8 | Before/after | NA | U (kinetics) | NMR | Hippuric acid* | (91)
Black tea
AI | 20 | Consumers/control | NA | U (kinetics) | NMR | Hippuric acid,* 4-hydroxyhippuric acid,* 1,3-dihydroxyphenyl-2-O-sulfate,* gallic acid, 4-O-methylgallic acid* | (92)
methoxyhydroxyphenylvalerolactone-glucuronide,* hydroxyphenylvalerolactone-glucuronide* and -sulfate,* 5-(hydroxymethoxyphenyl)valeric acid-sulfate,* 4-hydroxy-5-(phenyl)valeric acid-sulfate*
Chocolate (solid or drink)
CS | 107 | Consumers/nonconsumers | 24-h dietary record | U (24-h) | LC-Q-Tof | 6-Amino-5-(N-methylformylamino)-1-methyluracil,* theobromine, 7-methyluric acid | (81)
Almond-skin extract
AI | 24 | Before/after | NA | U (kinetics) | LC-Q-Tof | (Epi)catechin-sulfate,* O-methyl-(epi)catechin-sulfate,* naringenin-O-glucuronide,* 5-(hydroxyphenyl)-γ-valerolactone-glucuronide* and -sulfate,* 5-(dihydroxyphenyl)-γ-valerolactone-glucuronide* and -sulfate,* 5-(trihydroxyphenyl)-γ-valerolactone-glucuronide,* -sulfate | (98)
CS | 107 | Consumers/nonconsumers | 24-h dietary record | U (24-h) | LC-Q-Tof | 5-Hydroxyindole-3-acetic acid | (81)
Nutrients
Dietary fiber
SMTI | 77 | H/L | Dietary record | U (24-h) | NMR | Hippuric acid* | (99)
SMTI | 25 | Consumers/control | NA | P (fasting) | LC-Q-Tof | 2-Aminophenol-sulfate, 2,6-dihydroxybenzoic acid, hydroxynuategenin-glucuronide* | (100)
Whey protein isolate
SMTI | 12 | Consumers/control | NA | P (sequential) | LC-Q-Tof | Tryptophan, phenylalanine, kynurenine, γ-Glu-Leu | (101)
Whey hydrolysate
SMTI | 12 | Consumers/control | NA | P (sequential) | LC-Q-Tof | Methionine sulphoxide, cyclo(Pro-Thr), cyclo(Ala-Ile), cyclo(Phe-Val), β-Asp-Leu, pGlu-Pro | (101)
Diets
Omnivorous diet
SMTI | 12 | Consumers/control | NA | U (24-h) | NMR | Taurine,* carnitine,* acetylcarnitine,* 1-methylhistidine,* 3-methylhistidine,* trimethylamine-N-oxide* | (102)
Vegetarian diet
SMTI | 12 | Consumers/control | NA | U (24-h) | NMR | p-Hydroxyphenylacetate* | (102)
Meat protein diet
SMTI | 24 | Before/after | NA | U (24-h) | NMR | Trimethylamine-N-oxide,* histidine* | (103)
Seafood
AI | 17 | Consumers/control | NA | U (kinetics) | LC-Q-Tof | Trimethylamine-N-oxide | (85)
(Continued)
THE FOOD METABOLOME: A WINDOW OVER DIETARY EXPOSURE 1297
[TABLE 3 (Continued): the remaining rows on this page were not recoverable from the extracted text. Surviving fragments: dietary factors "Lactovegetarian diet" and "diet (citrus, cruciferous vegetables, soy)"; study types SMTI and CS; numbers of subjects 161, 161, 107, 24, 10, and 60; comparison "Consumers/control"; dietary assessment tools "Questionnaire" and NA; biospecimens U (fasting), S (fasting), and U (spot); analytic technique NMR; biomarker "Short-chain fatty acids*"; references (103)–(106).]

¹*No standard was used to confirm the identity of the biomarker. AI, acute intervention; CS, cross-sectional; FFQ, food-frequency questionnaire; FIE, flow injection electrospray; FTICR, Fourier transform ion cyclotron resonance; GC, gas chromatography; H/L, high and low (intake); H/M/L, high, medium, and low (intake); LC, liquid chromatography; MS, mass spectrometry; NA, not applicable; NMR, nuclear magnetic resonance spectroscopy; P, plasma; Q, quadrupole; S, serum; SMTI, short- and medium-term intervention; Tof, time-of-flight; U, urine.

lipids, organic acids, sugars, flavonoids, or carotenoids. However, the combination of available targeted analysis methods is still far from covering the whole chemical space of the food metabolome. In principle, untargeted metabolomics provides
remains a continuing challenge. It is essentially impossible to use standards or isotopically labeled references to quantify the thousands of compounds in the food metabolome. New approaches are being developed with isotope labeling and multiple reaction monitoring–based profiling for families of compounds sharing distinctive chemical functionalities (118). Labeled reagents targeted at these functionalities or particular multiple reaction monitoring transitions could be used to specifically measure selected fractions of the food metabolome such as amines, phenols, glucuronides, or mercapturic acid derivatives. These advances may allow researchers to target larger areas of the food metabolome chemical space with the use of standardized quantitative methods.

Analysis of metabolomic data

The metabolic profile of raw data generated by the spectrometric analysis of biological samples can be analyzed in several steps (119, 120). These include data preprocessing, data alignment, data normalization, and signal correction followed by the analysis through various statistical methods. There are a number of different software tools available for these tasks; most vendors have their proprietary software, but highly efficient freeware programs, Web servers, and add-on software also exist. For NMR, an example is the Interval Correlation Optimized shift algorithm produced for Matlab (121), and for LC-MS data alignment freeware such as XCMS (122), MZmine (119, 123), and MetAlign (124) are widely used.

The preprocessing step is software dependent and typically includes data reduction methods such as centroiding of mass spectra or analog-to-digital conversion of NMR, infrared, or UV/visible spectra. Preprocessing also includes translation of data formats and data export. The next step is data alignment. It is crucial to align the different sample profiles, which do not match exactly because of small variations in retention times, masses, or chemical shifts. All available software tools differ in their peak picking algorithms. There is only a 50–70% overlap between the peaks detected by different packages from the same raw data set, even with similar settings (125). Additional markers may be observed by using additional software or simply by altering software settings. Another major difference between packages is the presence or absence of so-called gap filling, a routine to revisit the raw data for any peak that has not been detected in a sample when it was found in others. The lack of a gap-filling algorithm creates major problems for normalization and for statistical analysis. An ideal food intake marker would have a zero value in control samples from volunteers who did not consume the food; in this case, the gap-filling routine helps to estimate the background noise in the peak area.

The output from the peak detection and data alignment steps is typically a matrix of samples and features with the intensity as the values within the matrix structure. A feature here denotes any distinct peak in the data set, regardless of whether it represents a known, unknown, or even an artifact ion. In LC-MS profiling, the features are characterized by a retention time and a mass (m/z) value. Such a feature may be a compound’s parent ion, but just as frequently it represents an adduct ion or a fragment from a compound. In NMR and in most other digitized spectral data the single features are part of spectral shapes that usually have local maxima and minima. For both kinds of data the fine structure of the data contains additional information that is useful for identifying compounds and structures in the samples and is therefore particularly important for characterizing the food metabolome.

Metabolic profiling data may be analyzed by using univariate or multivariate statistical methods. Statistical analysis of untargeted metabolomic data is often an initial step in the biomarker discovery process that should not be confused with hypothesis testing, because there is no a priori hypothesis. In dietary intervention studies with single foods, the contrast observed for a good biomarker can be large, sometimes even infinite, making it possible to work robustly with small sample sets and discriminate potential intake biomarkers from more subtle changes in endogenous metabolites (126). In cross-sectional studies this large contrast seldom applies, but approximate dose-response relations from food-frequency questionnaires may help in the identification of food intake biomarkers.

Multivariate analysis is most commonly used for explorative analysis of metabolic profiling data (127). As opposed to univariate analysis, multivariate analysis can be performed in an unsupervised manner (ie, without including information on group assignment for the analysis). This provides an objective assessment of the principal patterns in the data set (eg, intake or no intake of a specific food component or diet). Unsupervised analysis such as principal components analysis (PCA) should always be the starting point for explorative multivariate analysis to ascertain that there is an overall segregation into a food-related pattern. The features associated with any pattern can be shown by the loadings in a PCA plot; however, PCA is generally not well suited to identify the most prominent part of the pattern. Sparse PCA overcomes this limitation (128, 129). Clustering methods are also widely used for subdividing and ordering a data set into groups of data with a high degree of similarity. Hierarchical clustering generates a dendrogram in which neighboring samples share the greatest similarity and neighboring features are those most closely related. This provides useful biological information and unsupervised groupings of the data set (130).

Supervised multivariate analysis is commonly the next step in many data analysis methods but has a strong tendency to overfit the data. Even random data will usually segregate and show a “marker pattern” after supervised analysis (131). Careful validation with the use of techniques such as permutation testing and cross-validation is therefore always necessary. There are a large number of supervised methods (120, 127), with the most commonly used analysis for comparing 2 groups being partial least-squares discriminant analysis (PLS-DA) (132) or one of its several variants. In complex nutritional studies it may be useful to combine ANOVA separations of factors with PLS-DA (133, 134) or use multilevel PLS-DA to reduce the influence of interindividual variation (135). Some multivariate methods such as PLS are mainly used to fit the data to a continuous variable. This is useful to explore the relation of any features in the profiling data set with an external variable (eg, intake of a specific food based on a questionnaire or any biological outcome marker) (121). In addition, for these prediction models very careful validation is required and their global ability to predict a specific food intake has to be assessed in separate studies.

Univariate analysis is supervised; that is, a hypothesis regarding a difference between groups is implicit. Any marker identified by this approach should therefore also be independently validated in a separate study. For univariate analysis used in exploration of new food intake biomarkers it is important to set a reasonable threshold for false discovery rates
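
Setting a false discovery rate threshold for univariate screening can be sketched as follows. This is a minimal illustration only: it assumes the Benjamini-Hochberg procedure as the false discovery rate method (the text does not name one) and one P value per feature from a prior univariate test. The function names and the example P values are hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted P values (q values) in input order.

    Assumption: each entry is the P value of one feature (eg, one
    retention time/m/z pair) from a univariate test between groups.
    """
    m = len(p_values)
    # Feature indices ordered from the smallest to the largest P value.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value downward so that each q value is the
    # minimum of p * m / rank over the current rank and all higher ranks.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


def select_features(p_values: Sequence[float], fdr: float = 0.05) -> List[int]:
    """Return indices of features whose q value does not exceed `fdr`."""
    q_values = benjamini_hochberg(p_values)
    return [i for i, q in enumerate(q_values) if q <= fdr]


if __name__ == "__main__":
    # Hypothetical P values for five features.
    example_p = [0.001, 0.02, 0.03, 0.5, 0.9]
    print(select_features(example_p, fdr=0.1))
```

With the hypothetical P values above and a threshold of 0.1, features 0, 1, and 2 are retained. Features that pass such a threshold remain candidates only and would still need independent validation in a separate study.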
dietary polyphenols (144). A large number of polyphenol metabolites such as glucuronides of 5-(3′,4′-dihydroxyphenyl)-γ-valerolactone and sulfate esters of methylated (epi)catechin could thus be easily annotated and some fully structurally elucidated by using a combination of MS fragmentation and NMR (163).

Moreover, recently developed bioinformatics approaches aim to narrow the number of possible candidate structures that match with an unknown query metabolite by taking into account the chemical and biological background of the sample (164). For example, it is more likely that a metabolite excreted in urine is more polar as a result of phase II reactions. This has predictable consequences for its expected mass and chromatographic behavior, which can be used to mine metabolomic data sets. It is expected that these various software tools will be beneficial in the hunt for metabolite entities represented by the food metabolome.

PERSPECTIVES FOR FUTURE APPLICATIONS OF THE FOOD METABOLOME

Discovering disease-related dietary factors

MWASs have been proposed as useful tools for discovering low-molecular-weight biomarkers that are predictive of either causal exposures or disease progression (59, 165, 166). In fact, MWASs can be regarded as a special case of the exposome-wide association study, which investigates disease associations with all exposures to low- and high-molecular-weight compounds (167). Given the thousands of potentially important exposures to consider, MWASs and exposome-wide association studies move away from knowledge-driven designs that focus on a priori hypotheses about particular exposures toward data-driven designs using untargeted or semitargeted sets of analytes (167). In either case, potentially useful biomarkers may be identified through rigorous comparisons of quantitative or semiquantitative profiles of biospecimens obtained from subjects with and without a particular disease (59). Because diets and lifestyle strongly affect the metabolome, any pending disease may lead to reverse causation in MWASs; study design and interpretation must therefore take into account the common responses to early signs of disease in the population under study and other potential confounders.

This biomarker discovery process is shown in Figure 2. With a focus on the food metabolome and associated biomarkers of potentially causal dietary exposures, the figure includes both semitargeted and untargeted designs. In the semitargeted approach, preliminary cross-sectional studies are developed to connect dietary records with the food metabolome and thereby identify dietary biomarkers that are highly correlated with the consumption of particular foods. A good example of this approach is given by Saadatian-Elahi et al (168), who correlated food consumption, as determined by 24-h dietary recall, with plasma concentrations of 22 fatty acids determined by gas chromatography in 3000 subjects from the European Prospective Investigation into Cancer and Nutrition cohort. Strong correlations between regional dietary factors and fatty acid concentrations allowed components of the food metabolome to be used as predictor variables in a prospective investigation of gastric cancer in the European Prospective Investigation into Cancer and Nutrition cohort. Three fatty acids (oleic acid, α-linolenic acid, and di-homo-γ-linolenic acid) were found to be associated with the risk of gastric cancer (169). These associations were tentatively explained by either different amounts of dietary intake or differential fatty acid metabolism in cases and controls.

The alternative untargeted approach makes no a priori assumptions regarding sources of exposure that are causal for a particular disease but instead relies on comparisons of comprehensive profiles of metabolomic features between cases and controls to find discriminating exposure biomarkers. Once these exposure biomarkers have been identified, follow-up studies are performed to determine their sources (167), and those related to dietary factors would be regarded as disease-associated dietary biomarkers (Figure 2). The agnostic nature of the untargeted design allows all potentially useful biomarkers to be identified, including not only dietary biomarkers but also those related to endogenous factors (including the microbiota), pollution, and drugs as well as biomarkers of disease progression. A good example of the untargeted approach is given by Holmes et al (59) and Bictash et al (166) who used untargeted NMR of >4000 urine specimens from the INTERMAP study to investigate potentially causal factors for high blood pressure across geographically diverse populations. The investigators showed that metabolite concentrations differed substantially between Asian and Western populations, suggesting important effects of diet and related risk factors, including the microbiota, on the risk of coronary artery disease and stroke. Three highly discriminating biomarkers were identified, namely alanine, which was directly correlated with blood pressure, and formate and hippurate, both of which were inversely correlated with blood pressure. All of these discriminatory biomarkers point to dietary sources, sometimes in combination with cometabolism by gut microbiota. For example, alanine is associated with diets that emphasize animal products rather than vegetables, and hippurate has been associated with microbiota colonization of the gut (170).

A more recent example of the untargeted approach is provided by a series of articles from Stanley Hazen’s group at the Cleveland Clinic (108, 171, 172). In their initial untargeted LC-MS/MS investigation (171), the authors showed that the nutrient choline, along with its major metabolites, betaine and TMAO, were associated with risks of cardiovascular disease, particularly TMAO. Then, by using an elegant set of targeted follow-up

FIGURE 2. The food metabolome and discovery of food-related biomarkers associated with diseases. Both semitargeted and untargeted approaches are shown. Disease-validated biomarkers are shown in bold letters.
standards in general, such as the MSI working groups (189), the Metabolomics Forum (190), and several others, but none of these relate specifically to the food metabolome. The First International Workshop on the Food Metabolome was a first occasion for all researchers active in the field to meet and make propositions for future research. These propositions are summarized here.

Coordination of dietary studies

The food metabolome is exceedingly complex because it encompasses metabolites derived from as many metabolomes as there are edible species. Therefore, a particularly focused community effort is necessary to reach our ultimate goal of full coverage for all foods and all food metabolites. A large number of studies with different designs will be necessary to validate each dietary marker. For example, many studies have been conducted with oranges (Table 3), but a broad coverage of all citrus and many other fruit as well as kinetic studies have been necessary to interpret proline betaine as a short-term marker of citrus that is dominated by orange and orange juice intake (80). Similar work is needed and could be a shared effort for many other food groups, including cruciferous and apiaceous vegetables, pomes, cheeses, meats, fish, and others.

A large concerted action or open-project network would be needed to help prioritize needs for novel markers and focus on areas in which drugs have largely failed and where diet and nutrition show promise to prevent or cure diseases. More discussion is clearly needed between laboratory scientists, nutritionists, and epidemiologists to address this question in a rational way. Such a network might share information on current research plans to avoid redundancy, share known as well as unidentified markers related to specific foods, or even form a shared workflow pipeline for dietary studies, data analysis, and metabolite identification. In addition, the constitution of a database describing resources of high-quality human samples collected in various dietary intervention studies developed for other purposes would also be extremely useful. This information is partly accessible in a database such as ClinicalTrials.gov (191), but no indication is given on the availability of biospecimens. These samples would prove very useful for biomarker validation purposes and would save a lot of effort and money otherwise needed to replicate such clinical studies. An example of a local, but open, sample repository for experimental studies including nutrition is the CUBE biobank, which covers samples from a single university (www.cube.ku.dk). An umbrella of such local repositories could be one possible way forward to improve reuse of samples for biomarker validation studies.

Software tools

A comprehensive set of software tools has been developed and shared to help the scientific community that covers every step in data processing and analysis. Most of these are not specific to the food metabolome analysis (see Analysis of metabolomic data and Software tools for annotation of the food metabolome sections). However, for the identification of food-derived metabolites, additional software developments are needed, particularly for in silico prediction of the metabolism of compounds found in foods. Some commercial software exists for the pharmaceutical area (192, 193) and covers many phase I and II reactions. However, many compounds in foods have structures that are uncommon in pharmaceuticals. Food constituents may be degraded by specialized enzymes and may also be extensively metabolized by the gut microbiota, so these metabolic pathways need to be covered as well. This work will require a large community effort to develop software to predict structures from all possible metabolites from any food compound, including their conjugates; such a tool could be further combined with software that performs in silico fragmentation to predict daughter ions and additional prediction tools for predicting physicochemical properties such as polarity and hence retention time. Prediction of absorption, distribution, and excretion of the food compounds and of their metabolites would be an additional area that would help the food metabolome community. Systematic in silico–predicted metabolites could also be stored in food metabolome databases.

Databases

The human metabolome database has recently expanded to include compounds found in common foods because these are, at least initially before metabolism, also present within the human body (9). Databases specific for the food metabolome are still largely missing apart from Phenol-Explorer, a database on all known polyphenol metabolites (70). The development of similar databases for other classes of food compounds will likely require a coordinated effort from many researchers active in various fields. These databases should provide spectral data for the food-derived metabolites in each class and any information useful for their identification. When not available, in silico–predicted mass fragmentation spectra could be calculated and also stored, as is done in SciFinder (151). The same databases could additionally allow metabolites to be linked with their food precursors, as well as with their possible dietary sources (70). The involvement of food scientists will be essential to provide this information.

Study repositories with processed metabolomic data

To shape consensus and create openness in the evolving field of metabolomics, it is important to share data and information on food metabolome studies, as is done in many other biomedical fields (194–196). Indeed, for many funding agencies, this is becoming a key condition of funding. One such initiative is the Metabolights database, which aims to shape a fully open-access, shared database for metabolomic studies (197). Raw and processed data and metadata can be uploaded and curated before deposition into the Metabolights core database, which then makes the information accessible through the Internet. A similar ongoing but conceptually broader initiative is the Nutritional Phenotype Database (dbNP) (198), initiated by the Nutrigenomics Organization. The dbNP can hold data from several omics platforms, including metabolomics, together with study metadata in a searchable format. It is open access and builds on private accounts for uploading and analyzing data with the possibility of open sharing when data can be released for others. Both dbNP and Metabolights provide several online software tools to help in data curation and analysis.

The storage of searchable, annotated, raw analytic data files with well-documented dietary metadata from human intervention or cross-sectional studies will facilitate the comparison of raw or preprocessed data with previously obtained spectral data of food-derived metabolites. Such a repository that contains all unknowns detected in previous food metabolome studies would be a precious aid to identify the most robust dietary biomarkers. The