metabolites
Article
MetaboAnalystR 3.0: Toward an Optimized Workflow
for Global Metabolomics
Zhiqiang Pang 1, Jasmine Chong 1, Shuzhao Li 2 and Jianguo Xia 1,3,*
1 Institute of Parasitology, McGill University, 21111 Lakeshore Road, Ste Anne de Bellevue, QC H9X 3V9,
Canada; [email protected] (Z.P.); [email protected] (J.C.)
2 The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT 06032, USA;
[email protected]
3 Department of Animal Science, McGill University, 21111 Lakeshore Road, Ste Anne de Bellevue,
QC H9X 3V9, Canada
* Correspondence: [email protected]; Tel.: +1-(514)-398-8668
Received: 16 April 2020; Accepted: 3 May 2020; Published: 7 May 2020
Keywords: global metabolomics; peak detection; batch effects; pathway activity prediction
1. Introduction
Global or untargeted metabolomics is increasingly used to investigate metabolic changes of
various biological or environmental systems in an unbiased manner [1,2]. Liquid chromatography
coupled to high-resolution mass spectrometry (LC-HRMS) has become the main workhorse for
global metabolomics [3,4]. The typical LC-HRMS metabolomics workflow involves spectral data collection,
raw data processing, and statistical and functional analysis [5]. A wide array of bioinformatics tools
have been developed to address one or several of these steps [5,6]. Despite significant progress
made in recent years, critical issues remain with regard to several key steps involved in the current
metabolomics workflow.
The first issue is related to peak detection during raw spectra processing. Improving the ability to
extract real compound signals and reduce noise is crucial to avoid noise inflation prior to statistical
and functional analyses. Default parameters provided by common spectra processing tools are not
applicable to all experiments [7], and misuse of parameters can lead to significant issues in data
quality [8]. To mitigate this issue, commercial tools such as Waters MassLynx™ and open-source
software such as XCMS [9] and MZmine [10] allow users to specify multiple parameters to define
LC-MS scan signals as chromatographic peaks. Although useful, such manual configuration assumes
users are familiar with the experiments, which is often not the case. To facilitate the process, several
tools and protocols have been developed for optimizing parameters for spectra processing. For instance,
Isotopologue Parameter Optimization (IPO) is an R package designed to estimate the best parameters
for XCMS [11]. While effective, its stepwise optimization based on the entire spectra is very
time-consuming: IPO can often take days to weeks to compute the optimized parameters.
Another recent tool is AutoTuner [12], which optimizes peak widths based on pre-defined extracted
ion chromatograms (EICs). Although it is more computationally efficient than IPO, the EICs it relies on
are not verified, which can introduce errors. Aside from these tools, Design of Experiment (DoE)
strategies based on diluted samples provide a relatively time-saving protocol for parameter optimization,
but require an extra series of diluted standards to be prepared [13]. Another optimization strategy,
One Variable at A Time (OVAT) [14], aims to minimize the coefficient of variation of peaks
within a group, but in our experience this method takes even more computational time than IPO.
The second issue is batch effects, which are commonly associated with large-scale clinical or
population studies in which samples are analyzed in different batches or over a long time period [15,16].
Over the course of spectral collection, chromatographic conditions can change and baselines can
drift [17]. To address this issue, several types of batch correction methods have been developed
based on quality control (QC) samples, QC metabolites, internal standards, matrix factorization,
or location-scale normalization [18]. These methods rest on different assumptions and have their
own advantages and limitations. Selecting a suitable batch correction method is critical, as it has a
significant impact on downstream statistical and functional analysis.
Finally, biological interpretation of metabolomics data typically requires metabolites to be
identified before functional analysis. This process is very time-consuming and remains a key
bottleneck in global metabolomics [19,20]. The mummichog algorithm introduced the concept
of predicting pathway activity from ranked LC-MS peaks based on matching patterns of putatively
annotated metabolites [21]. The algorithm is available as Python scripts [22]. To support the broad
R user community, previous versions of MetaboAnalystR [5,23] implemented mummichog v1.08.
The recently released mummichog version 2 adds several improvements, including the use of retention time
(RT) to refine the grouping of signals into empirical compounds (ECs). Including retention time
reduces false-positive annotations and thereby increases the accuracy of pathway activity prediction.
Here, we introduce version 3.0 of MetaboAnalystR. Compared to its predecessor, version 3.0 has
three key features: (1) efficient parameter optimization for spectral peak picking; (2) automatic selection
of an optimal batch correction approach from 12 well-established methods; and (3) incorporation of
retention time coupled with updated pathway libraries for improved pathway activity prediction.
The performance of these new features is assessed in three case studies below.
2. Results
MetaboAnalystR 3.0 aims to provide an efficient pipeline to support end-to-end analysis of
LC-HRMS metabolomics data in a high-throughput manner. This open-source R package is freely
available from its GitHub repository [24]. Detailed tutorials, manuals, example datasets, and R scripts
are also included in the repository. The key enhancements to the global metabolomics workflow in
MetaboAnalystR 3.0 are summarized in Figure 1.
Figure 1. MetaboAnalystR 3.0 provides an optimized workflow for global metabolomics: optimized
peak picking, automated batch effect correction, and improved pathway activity prediction.
Metabolites 2020, 10, 186 3 of 14
Figure 2. Time consumed by One Variable at A Time (OVAT), Isotopologue Parameter Optimization
(IPO), MetaboAnalystR, and AutoTuner for parameter optimization on three different datasets.
The evaluations were performed on a desktop computer (Ubuntu 18.04.3 with an Intel® Core™
i7-4790 CPU and 32 GB of memory).
distribution. Peaks with a cor estimate over 0.9 and p value less than 0.05 are considered Gaussian Peaks.
XCMS run with different parameter settings (default, IPO, and AutoTuner) performed differently in
the peak simulation, whereas peaks picked by MetaboAnalystR 3.0 had the highest Gaussian peak
ratio of all the strategies.
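The Gaussian-peak criterion can be illustrated with a short, self-contained Python sketch. It is not the MetaboAnalystR code. As simplifying assumptions, each peak's reference Gaussian is obtained by moment matching (intensity-weighted mean and variance of retention time) rather than by curve fitting, and the p value comes from the standard two-sided t test for a Pearson correlation, evaluated with a small incomplete-beta routine so that no third-party library is needed. The thresholds follow the Methods definition (correlation ≥ 0.9, p ≤ 0.05), and all function names are hypothetical.

```python
import math


def _betacf(a, b, x, max_iter=200, eps=3e-14):
    """Continued fraction for the regularized incomplete beta (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        even = m * (b - m) * x / ((qam + m2) * (a + m2))
        odd = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        for aa in (even, odd):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def _betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    bt = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                  + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b


def pearson_test(x, y):
    """Pearson r and its two-sided p value (t test with n - 2 degrees of freedom)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0, 1.0  # a constant profile carries no shape information
    r = sxy / math.sqrt(sxx * syy)
    if abs(r) >= 1.0:
        return r, 0.0
    df = n - 2
    t2 = r * r * df / (1.0 - r * r)
    return r, _betai(df / 2.0, 0.5, df / (df + t2))


def is_gaussian_peak(rts, intensities, min_cor=0.9, max_p=0.05):
    """Correlate an EIC peak with a moment-matched Gaussian (assumed simplification)."""
    total = sum(intensities)
    mu = sum(t * i for t, i in zip(rts, intensities)) / total
    var = sum(i * (t - mu) ** 2 for t, i in zip(rts, intensities)) / total
    if var == 0:
        return False
    height = max(intensities)
    model = [height * math.exp(-((t - mu) ** 2) / (2.0 * var)) for t in rts]
    r, p = pearson_test(intensities, model)
    return r >= min_cor and p <= max_p


def gaussian_ratio(peaks):
    """GR = number of Gaussian peaks / number of all peaks; peaks = [(rts, intensities)]."""
    flags = [is_gaussian_peak(rts, ints) for rts, ints in peaks]
    return sum(flags) / len(flags)


rts = [float(t) for t in range(21)]
gaussian = [100.0 * math.exp(-((t - 10.0) ** 2) / 8.0) for t in rts]  # sigma = 2
zigzag = [80.0 if int(t) % 2 else 5.0 for t in rts]  # noise-like, no peak shape
print(is_gaussian_peak(rts, gaussian), is_gaussian_peak(rts, zigzag))
print(gaussian_ratio([(rts, gaussian), (rts, zigzag)]))
```

A least-squares Gaussian fit would be closer to typical peak-shape checks. Moment matching keeps the sketch dependency-free and is adequate for well-resolved, single-apex peaks.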
Table 1. Qualitative peak picking results of the different tools using different settings.
Figure 3. Assessment of the performance of different tools utilizing the NIST 1950 serum dilution series.
(A) Reliability Index (RI) vs. processing speed for three optimization strategies compared to the default.
(B) A bar graph showing the number of peaks with good linearity (p < 0.001).
As shown in Figure 3A, compared to the default (no optimization), IPO produces the highest RI
value (6252), but at the cost of speed (316 min in total). MetaboAnalystR 3.0 achieves
both a good RI (5658) and acceptable speed (49 min in total for optimization and data
processing). AutoTuner is the fastest for optimization and data processing, but its improvement in
RI is marginal. The numbers of peaks that meet the linearity criterion (p < 0.001) are summarized in Figure 3B.
MetaboAnalystR 3.0 produced the largest number of linear peaks of all the options.
Figure 4. Performance evaluation using Inflammatory Bowel Disease (IBD) data. Principal Component
Analysis (PCA) of peaks profiled with (A) default parameters and (B) optimized parameters.
(C) Performance of batch effect correction by different strategies. Among them, EigenMS performed
best (indicated by *). (D) PCA of the optimized and batch-corrected data.
Because the QC samples are a homogeneous mixture of all of the patients’ samples, they are
expected to lie in the center of the PCA plot as a tight cluster. However, this was not the case using the
default parameters (Figure 4A). Using optimized parameters, these pooled QC samples were better
mixed with the other samples (Figure 4B). However, both Figure 4A and 4B showed systematic variation
among these samples, suggesting batch effects in this large-scale study. MetaboAnalystR 3.0 therefore
applied batch effect correction with each of the ComBat, Analysis of Covariance (ANCOVA), WaveICA, Quality
Control-robust LOESS signal correction (QC-RLSC), and EigenMS methods. The PCA
distances among all QC samples are summarized in Figure 4C, which indicates that the best correction
was achieved by EigenMS, a method based on singular value decomposition to detect and correct
for systematic bias [38]. After applying EigenMS, QCs were tightly clustered together and biological
samples were clustered based on their biological origins (Figure 4D), supporting the
utility of the batch effect correction method selected by MetaboAnalystR 3.0.
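The selection rule illustrated in Figure 4C can be sketched in Python with NumPy. This is a hypothetical illustration, not MetaboAnalystR's implementation. It assumes that each candidate correction returns a samples × features matrix, that features are autoscaled before PCA so that the matrices are comparable, and that "PCA distance among QC samples" means the mean pairwise Euclidean distance between QC scores on PC1 and PC2. The naive per-batch centring in the toy example only stands in for the real correction methods.

```python
import itertools

import numpy as np


def pca_scores(data, n_components=2):
    """Autoscale features, then project samples (rows) onto the leading PCs via SVD."""
    std = data.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # avoid division by zero for constant features
    scaled = (data - data.mean(axis=0)) / std
    u, s, _ = np.linalg.svd(scaled, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


def qc_spread(data, is_qc):
    """Mean pairwise Euclidean distance between QC samples in PC1-PC2 space."""
    qc_scores = pca_scores(data)[is_qc]
    pairs = itertools.combinations(qc_scores, 2)
    return float(np.mean([np.linalg.norm(a - b) for a, b in pairs]))


def pick_best_correction(candidates, is_qc):
    """Return the name of the corrected matrix whose QC samples cluster most tightly."""
    return min(candidates, key=lambda name: qc_spread(candidates[name], is_qc))


# Toy data: 20 samples x 50 features in two batches; every 5th sample is a pooled QC.
rng = np.random.default_rng(0)
batch = np.repeat([0, 1], 10)
is_qc = np.arange(20) % 5 == 0
shift = np.outer(batch, rng.normal(3.0, 1.0, size=50))  # additive batch offset
raw = shift + rng.normal(size=(20, 50))
raw[is_qc] = shift[is_qc] + rng.normal(scale=0.1, size=(is_qc.sum(), 50))

# A deliberately naive stand-in "correction": centre each batch on its own mean.
centred = raw.copy()
for b in (0, 1):
    centred[batch == b] -= centred[batch == b].mean(axis=0)

best = pick_best_correction({"none": raw, "batch_centring": centred}, is_qc)
print(best)  # the batch-centred matrix is expected to win on this toy data
```

Autoscaling gives every matrix the same total variance. A smaller QC spread therefore indicates tighter QC clustering relative to the overall variation, rather than a simple change of scale.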
Predicting pathway activities directly from LC-HRMS peaks can significantly accelerate biological
discoveries in global metabolomics. We have previously implemented mummichog v1.08 within
MetaboAnalystR 2.0. Now, MetaboAnalystR 3.0 has incorporated a major update of mummichog
(v2.0) with retention time integration. To demonstrate the improvements to biological interpretation
stemming from both the optimized pre-processing steps and the updated mummichog algorithm,
we applied both versions of the mummichog algorithm using the human BiGG and Edinburgh Model
pathway library (“hsa_mfn”) to compare the biological significance detected by the original pipeline
(default peak parameters and non-corrected data, as shown in Figure S1) versus the optimized pipeline.
For the comparison between Crohn’s disease (CD) patients and non-IBD controls, a total of 3048 features were identified using
the optimized pipeline and 2364 features using the non-optimized pipeline. For the non-optimized
dataset, mummichog v1.08 identified no significant pathways (Gamma-adjusted p value < 0.05),
while mummichog v2.0 identified 16 significantly different pathways (Tables S3 and S4). Similarly,
for the optimized dataset, mummichog v1.08 identified only nine significantly perturbed pathways,
whereas v2.0 identified 17 (Table 2). These results indicate that mummichog v2.0,
by integrating RT information to group related m/z features into empirical compounds, reveals
more biological insights than its predecessor. Moreover, mummichog results (both v1.08 and v2.0) for
the optimized and non-optimized datasets consistently identified differences in Bile acid biosynthesis,
Vitamin D metabolism, and Vitamin E metabolism between CD patients and non-IBD controls. The details
of the pathways identified are summarized in Tables S3–S6. Finally, both versions of the mummichog
algorithm consistently identified a higher total number of pathways for the optimized dataset
than for the non-optimized dataset. This highlights the importance of data calibration for improving the
detection of true biological signals. The other comparisons (ulcerative colitis vs. non-IBD controls)
showed similar results (Figure S2).
Table 2. Pathway enrichment results (top 20, Crohn’s disease vs. non-IBD) generated by
mummichog v1.08 and v2.0. Non-significant pathways (p value > 0.05) are shown in grey text.
3. Discussion
The previous version (v2.0) of MetaboAnalystR provided an end-to-end workflow to process raw
LC-HRMS metabolomics data [5]. This new version (v3.0) has further enhanced three key steps of
this workflow by focusing on efficient optimization for peak picking, improved batch effect correction,
and more meaningful putative compound annotations for pathway analysis.
Parameter optimization remains a computational bottleneck in current raw LC-HRMS spectral data
processing. Most tools rely on users to manually adjust the default parameters, which is inconvenient
as users need to be very familiar with their MS instruments and experimental setup. The key concept
of our optimization strategy is to use, instead of the complete spectra, a subset of spectra drawn from
multiple regions of interest (ROIs) that are enriched for real peaks. These ROIs are selected based on
the characteristics of the eluted compounds’ peaks across the whole chromatogram, so that the extracted
peaks cover a wide m/z range (see Materials and Methods for more detail). The subsequent optimization
is performed on peaks in these ROIs. One potential criticism we anticipate is a “bias” toward
high-intensity peaks. However, this is generally not the case: low-intensity peaks are sufficiently represented
in these ROIs due to the sparse nature of LC-HRMS spectra (see Figure 5 in Materials and Methods).
By focusing computational resources on real signals instead of noise, our approach substantially
accelerates the process for practical applications. In addition, users can manually adjust the default
m/z or RT window used to select ROIs. The qualitative and quantitative efficacy of this approach
has been demonstrated on two benchmark datasets. In particular, a significant improvement in the
identification of true peak features was observed using a benchmark dataset of known standards [25].
This improvement results from jointly emphasizing Gaussian peak shape and peak group stability,
rather than focusing only on the number of detected isotopes. The quantitative improvement
achieved with the parameters optimized by MetaboAnalystR 3.0 was also illustrated using the NIST SRM 1950
datasets. It should be noted that these data contain only two replicates for each concentration, which is
a limiting factor for this validation.
Finally, the IBD data were first processed using the optimized parameters, followed by batch
correction based on QC samples. The PCA revealed clear clustering according to the different IBD
groups. Furthermore, more metabolic pathways were reported when using our optimized metabolomics
workflow. Most of these pathways have been linked to IBD in previous studies, including the
bile acid [28,29], vitamin E [30], vitamin D3 [31,32], galactose [33], glycerophospholipid [33],
fatty acid [29,34], and hyaluronan [35] metabolism pathways. Similarly, other comparisons between the
different IBD groups also yielded more perturbed metabolic pathways with our optimized workflow
in MetaboAnalystR 3.0.
Using the IBD samples, we also compared the performance of the mummichog algorithm
implemented in MetaboAnalystR 2.0 with that in MetaboAnalystR 3.0. The main difference between
the two implementations is that retention time information is integrated when performing putative
compound annotation. This step moves pathway enrichment from the compound space to the empirical
compound space formed by grouping co-eluting m/z features. Our results show that the new version
improves both the number and quality of significant pathways that can be identified: the perturbed
pathways it identified are more consistent with the IBD literature discussed above.
Figure 5. The selection process of regions of interest (ROIs) that are enriched for true peak signals.
(A) Red dashed lines mark the boundaries of the bins within which the sliding windows search for the
regions containing the most signal points. The whole spectrum is divided evenly into four bins. Four m/z
windows (light red areas) slide in parallel, one within each bin, and the window with the highest summed
scan intensity is retained in each bin. (B) The RT window (light red area) slides across the entire RT
dimension to find the retention time region with the highest scan signal intensity. (C) The MS
scan signals at the intersection of the selected m/z and RT windows, forming four ROIs. (D) Zoomed-in view of
the ROIs (note that low-intensity peaks are still abundant).
4. Conclusions
MetaboAnalystR 1.0 provided the comprehensive statistical and functional analysis underlying
the MetaboAnalyst web application, while MetaboAnalystR 2.0 extended v1.0 with comprehensive
raw LC-MS data processing and pathway activity prediction from MS peaks. MetaboAnalystR 3.0
has further enhanced three key aspects of the LC-MS data processing workflow including parameter
optimization for peak picking, adaptive batch effect correction, and improved annotation of putative
compounds for pathway activity prediction. MetaboAnalystR 3.0 represents our latest efforts toward
developing an efficient pipeline for high-throughput global metabolomics.
intensity sum within each bin will be retained, as shown in Figure 5A. Second, along the RT dimension,
the sliding window method is applied again to scan the signal intensity, returning the window
with the highest summed intensity (Figure 5B). Synthetic spectra are created from the returned ROIs defined by
the two dimensions (m/z and RT). Peaks are extracted from the synthetic spectra to simulate standards
across the whole m/z range (Figure 5C). These ROIs are enriched for true peaks, which are characterized
by overall high-intensity signals distributed across the window. Importantly, the ROIs still
contain a sufficient number of low-intensity signals for optimization, as shown in Figure 5D. The RT
sliding window can also be adjusted manually to cover any percentage in (0, 100%] of the RT dimension
to further reduce potential bias. If internal standards or quality control metabolites are
included in the user’s samples, peaks with specific m/z and/or RT values can be extracted or removed
with the modes named “mz_specific” or “rt_specific”.
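The two-step ROI search can be sketched in plain Python as below. The sketch assumes that scan signals are available as (m/z, RT, intensity) triples. The window width, step sizes, and RT fraction are illustrative values chosen for the toy data, not MetaboAnalystR defaults, and the function names are hypothetical.

```python
def best_window(points, axis, lo, hi, width, step):
    """Slide a half-open window [start, start + width) from lo to hi along `axis`
    (0 = m/z, 1 = RT) and return the window with the largest intensity sum."""
    best, best_sum = (lo, lo + width), -1.0
    start = lo
    while start + width <= hi + 1e-9:
        total = sum(p[2] for p in points if start <= p[axis] < start + width)
        if total > best_sum:
            best, best_sum = (start, start + width), total
        start += step
    return best


def select_rois(points, n_bins=4, mz_width=5.0, mz_step=1.0, rt_fraction=0.3, rt_step=5.0):
    """Return one list of (m/z, RT, intensity) points per ROI."""
    mz_lo, mz_hi = min(p[0] for p in points), max(p[0] for p in points)
    rt_lo, rt_hi = min(p[1] for p in points), max(p[1] for p in points)
    bin_size = (mz_hi - mz_lo) / n_bins
    # Step 1: keep the best m/z window inside each of n_bins equal-width m/z bins.
    mz_windows = []
    for i in range(n_bins):
        b_lo = mz_lo + i * bin_size
        b_hi = b_lo + bin_size
        in_bin = [p for p in points if b_lo <= p[0] <= b_hi]
        mz_windows.append(best_window(in_bin, 0, b_lo, b_hi, min(mz_width, bin_size), mz_step))
    # Step 2: keep the best RT window covering rt_fraction of the RT range.
    rt_width = rt_fraction * (rt_hi - rt_lo)
    rt_start, rt_end = best_window(points, 1, rt_lo, rt_hi, rt_width, rt_step)
    # Step 3: each ROI is the intersection of one m/z window with the RT window.
    return [
        [p for p in points if mz_a <= p[0] < mz_b and rt_start <= p[1] < rt_end]
        for mz_a, mz_b in mz_windows
    ]


# Toy data: a flat background grid plus one strong peak at m/z 150, RT 300 s.
points = [(float(mz), float(rt), 1.0) for mz in range(100, 501, 2) for rt in range(0, 601, 30)]
points.append((150.0, 300.0, 1000.0))
rois = select_rois(points)
print(len(rois), [len(roi) for roi in rois])
```

The sketch scans each window by brute force for clarity. A real implementation would sort or bin the scan points once so that each window sum can be updated incrementally.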
$$QS = \frac{RP^{3/2}}{\text{all peaks} - LIP} \times GR^{2} \times QcoE$$
where RP is the number of reliable peaks and LIP is the number of low-intensity peaks, as defined by IPO according to the
isotopes detected by CAMERA. Briefly, RP refers to peaks with detectable isotopes; “all peaks” is the number of
all detected peaks, reliable and unreliable; and LIP refers to peaks whose isotopes have too low an
intensity (less than the average of the lowest 3% of peak intensities in the spectra).
Unlike in IPO, the exponent for RP was lowered to 1.5 to reduce the sensitivity of peak picking
and to avoid the inflation of noise. GR is the Gaussian peak ratio; an exponent of 2 was
empirically chosen to put more emphasis on peak shape. QcoE is the quality coefficient. GR and
QcoE are defined below.
$$GR = \frac{\text{Gaussian Peaks}}{\text{all peaks}}$$
where Gaussian Peaks is the number of peaks whose shapes follow a Gaussian distribution
(correlation estimate ≥ 0.9 and p value ≤ 0.05).
where RCS is the retention time correction score and GS is the grouping score, both as defined by
IPO [11]. Briefly, they evaluate the retention time shift and the number of peaks within a peak
group, respectively. Higher RCS and GS values mean that more stable and reliable peaks have been
included and grouped as a peak feature. CV, the coefficient of variation, refers to the CV of peak
intensity within a group, as described by Manier et al. [14]. This index captures the variability of peak
intensity within a group. RCS, GS, and CV are normalized using the unit-based method and weighted by
0.4, 0.4, and 0.2, respectively, to form QcoE, which is further normalized to the range 0 to 1.
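A minimal Python sketch of how QcoE and QS could be computed for a set of candidate parameter combinations follows. It assumes that "unit-based" normalization means min-max scaling across the candidates. The text above does not state whether a lower CV is treated as better before weighting, so the sketch follows it literally and adds a hypothetical `invert_cv` switch. The candidate values in the example are invented for illustration.

```python
def min_max(values):
    """Unit-based (min-max) normalization to [0, 1]; constant input maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def quality_coefficients(rcs, gs, cv, invert_cv=False):
    """QcoE per candidate: 0.4*RCS + 0.4*GS + 0.2*CV on normalized scores, rescaled to [0, 1]."""
    rcs_n, gs_n, cv_n = min_max(rcs), min_max(gs), min_max(cv)
    if invert_cv:  # hypothetical switch: reward low intensity variation instead
        cv_n = [1.0 - v for v in cv_n]
    weighted = [0.4 * r + 0.4 * g + 0.2 * c for r, g, c in zip(rcs_n, gs_n, cv_n)]
    return min_max(weighted)


def quality_score(rp, all_peaks, lip, gr, qcoe):
    """QS = RP^(3/2) / (all peaks - LIP) * GR^2 * QcoE."""
    return rp ** 1.5 / (all_peaks - lip) * gr ** 2 * qcoe


# Three hypothetical candidate parameter sets with invented peak statistics.
candidates = [
    {"rp": 400, "all_peaks": 1500, "lip": 100, "gr": 0.6},
    {"rp": 450, "all_peaks": 1800, "lip": 150, "gr": 0.7},
    {"rp": 380, "all_peaks": 1200, "lip": 80, "gr": 0.5},
]
qcoe = quality_coefficients(rcs=[0.8, 0.9, 0.7], gs=[0.6, 0.8, 0.5], cv=[0.3, 0.2, 0.4])
scores = [quality_score(qcoe=q, **c) for c, q in zip(candidates, qcoe)]
best = scores.index(max(scores))
print(best, [round(s, 3) for s in scores])
```

Because QcoE is rescaled to [0, 1] across the candidates, the weakest candidate always receives QcoE = 0 and therefore QS = 0. QS is thus meaningful only for ranking candidates within one search, not as an absolute quality measure.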
The SetPeakParam function provides initial parameters for different platforms, including Ultra
Performance Liquid Chromatography (UPLC)-Q-Exactive (Q/E) Orbitrap, UPLC-Quadrupole
Time-of-Flight (Q/TOF), UPLC-Triple TOF (T/TOF), UPLC-Ion Trap, UPLC-G2-S, High-Performance
Liquid Chromatography (HPLC)-Q/TOF, HPLC-Ion Trap, HPLC-Orbitrap, and HPLC-Single
Quadrupole (S/Q). The best parameter combination is the one that produces the greatest number
of reliable peaks whose shapes follow a Gaussian distribution and that form stable peak groups,
as defined by the Quality Score formula. This step is performed in parallel on multiple cores to
accelerate the process.
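The search over parameter combinations can be sketched as an exhaustive grid evaluated in parallel. In this hypothetical Python sketch, the parameter names are illustrative (loosely modelled on XCMS centWave settings) rather than MetaboAnalystR arguments, and `toy_quality_score` stands in for running peak picking on the ROIs and computing QS. Threads are used so that the example runs anywhere. For CPU-bound scoring, a `concurrent.futures.ProcessPoolExecutor` can be passed instead, provided that the scoring function is defined at module level so it can be pickled.

```python
import itertools
from concurrent.futures import ThreadPoolExecutor


def parameter_grid(space):
    """Expand {name: [values, ...]} into a list of {name: value} combinations."""
    names = sorted(space)
    return [dict(zip(names, combo)) for combo in itertools.product(*(space[n] for n in names))]


def best_parameters(space, score_fn, max_workers=4, executor_cls=ThreadPoolExecutor):
    """Score every combination in parallel and return (best_params, best_score)."""
    grid = parameter_grid(space)
    with executor_cls(max_workers=max_workers) as pool:
        scores = list(pool.map(score_fn, grid))
    best = max(range(len(grid)), key=scores.__getitem__)  # first index on ties
    return grid[best], scores[best]


def toy_quality_score(params):
    """Hypothetical stand-in for 'pick peaks in the ROIs with params, then compute QS'."""
    return -((params["ppm"] - 10) ** 2) - (params["min_peakwidth"] - 5) ** 2


space = {"ppm": [5, 10, 15], "min_peakwidth": [2, 5, 8], "snthresh": [3, 10]}
params, score = best_parameters(space, toy_quality_score)
print(params, score)
```

An exhaustive grid is the simplest search to parallelize, because each combination is scored independently. Its cost grows multiplicatively with each added parameter, which is why restricting scoring to the ROIs matters.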
Category                        Methods
QC Sample Independent           ComBat [37], WaveICA [18], EigenMS [38]
QC Sample Dependent             QC-RLSC [16], ANCOVA [39]
QC Metabolite Dependent         RUV-random [40], RUV2 [41], RUVSeq [42]
Internal Standards Dependent    NOMIS [43], CCMN [44]
(1) All m/z features are matched to potential compounds, considering isotopes and adducts. Then,
for each compound, all matching m/z features are split into ECs according to whether they fall within
an expected retention time window. By default, the retention time window (in seconds) is
calculated as the maximum retention time × 0.02. This produces the initial EC list. Users can
customize either the retention time fraction (rt_frac, default 0.02) or the retention time tolerance
directly (rt_tol) in the UpdateInstrumentParameters function.
(2) ECs are merged if they have the same m/z, matched form/ion, and retention time. This results in
the merged empirical compounds list.
(3) When primary ions are enforced (the force_primary_ion option in the UpdateInstrumentParameters
function), only ECs containing at least one primary ion are kept. The primary ions considered are
‘M+H[1+]’, ‘M+Na[1+]’, ‘M−H2O+H[1+]’, ‘M−H[−]’, ‘M−2H[2−]’, ‘M−H2O−H[−]’, ‘M+H [1+]’,
‘M+Na [1+]’, ‘M−H2O+H [1+]’, ‘M−H [1−]’, ‘M−2H [2−]’, and ‘M−H2O−H[1−]’. This produces
the final EC list.
(4) Pathway libraries are converted from “Compound” space to “Empirical Compound” space.
This is done by converting all compounds in each pathway to all empirical compound matches.
Then, the mummichog/GSEA algorithm works as before to calculate pathway enrichment.
(5) To use the updated algorithm, set the version parameter in SetPeakEnrichMethod to “v2”.
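Steps (1)–(4) can be illustrated with a small Python sketch. It is not the mummichog or MetaboAnalystR code, and it rests on several assumptions. The way features are split by retention time is assumed here: RT-sorted matches are grouped greedily, and a new EC starts whenever a feature lies more than one window from the first member of the current group. Only a subset of the primary-ion labels above is used, written with ASCII hyphens. The features, compounds, and pathways in the example are invented.

```python
from collections import defaultdict

# Subset of the primary-ion labels listed above (ASCII hyphens used for readability).
PRIMARY_IONS = {"M+H[1+]", "M+Na[1+]", "M-H2O+H[1+]", "M-H[-]", "M-2H[2-]", "M-H2O-H[-]"}


def build_empirical_compounds(features, matches, rt_frac=0.02, rt_tol=None):
    """features: {feature_id: (mz, rt)}; matches: [(feature_id, compound_id, ion_form)].

    Returns {ec_id: {"members": frozenset of (feature_id, ion_form), "compounds": set}}.
    """
    # The RT window defaults to max RT * rt_frac unless an explicit tolerance is given.
    window = rt_tol if rt_tol is not None else rt_frac * max(rt for _, rt in features.values())

    # Step 1: per compound, split its matched features into RT groups (greedy, RT-sorted).
    by_compound = defaultdict(list)
    for fid, cpd, ion in matches:
        by_compound[cpd].append((features[fid][1], fid, ion))
    initial = []
    for cpd, hits in by_compound.items():
        hits.sort()
        group, anchor = [], None
        for rt, fid, ion in hits:
            if group and rt - anchor > window:
                initial.append((cpd, frozenset(group)))
                group = []
            if not group:
                anchor = rt
            group.append((fid, ion))
        initial.append((cpd, frozenset(group)))

    # Step 2: merge ECs with identical members (same feature m/z, RT and ion form).
    merged = defaultdict(set)
    for cpd, members in initial:
        merged[members].add(cpd)

    # Step 3: keep only ECs that contain at least one primary ion.
    ecs = {}
    for members, cpds in merged.items():
        if any(ion in PRIMARY_IONS for _, ion in members):
            ecs[f"EC{len(ecs) + 1}"] = {"members": members, "compounds": cpds}
    return ecs


def pathway_to_ec_space(pathways, ecs):
    """Step 4: map each pathway's compounds onto the ECs that contain any of them."""
    return {
        name: {ec_id for ec_id, ec in ecs.items() if ec["compounds"] & set(cpds)}
        for name, cpds in pathways.items()
    }


# Invented toy input: f1/f2 co-elute; f3 has the same m/z as f1 but elutes much later.
features = {"f1": (181.07, 120.0), "f2": (203.05, 121.0), "f3": (181.07, 600.0), "f4": (163.06, 121.5)}
matches = [
    ("f1", "glucose", "M+H[1+]"), ("f2", "glucose", "M+Na[1+]"), ("f3", "glucose", "M+H[1+]"),
    ("f1", "fructose", "M+H[1+]"), ("f2", "fructose", "M+Na[1+]"),
    ("f4", "lactate", "M+K[1+]"),  # non-primary ion only, so this EC is dropped
]
ecs = build_empirical_compounds(features, matches)  # window = 600 * 0.02 = 12 s
pathways = {"glycolysis": ["glucose", "lactate"], "fructose_metabolism": ["fructose"]}
ec_pathways = pathway_to_ec_space(pathways, ecs)
print(ecs, ec_pathways)
```

In the toy run, glucose is split into two ECs because f3 elutes far from f1 and f2. The isomers glucose and fructose share one merged EC, so both pathways now draw on the same empirical compound rather than on two independent compound hits.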
References
1. Hartl, J.; Kiefer, P.; Kaczmarczyk, A.; Mittelviefhaus, M.; Meyer, F.; Vonderach, T.; Hattendorf, B.; Jenal, U.;
Vorholt, J.A. Untargeted metabolomics links glutathione to bacterial cell cycle progression. Nat. Metab. 2020,
2, 153–166. [CrossRef]
2. Garza, D.R.; Van Verk, M.C.; Huynen, M.A.; Dutilh, B.E. Towards predicting the environmental metabolome
from metagenomics with a mechanistic model. Nat. Microbiol. 2018, 3, 456–460. [CrossRef]
3. Wang, M.; Carver, J.J.; Phelan, V.; Sanchez, L.M.; Garg, N.; Peng, Y.; Nguyen, N.D.; Watrous, J.; A Kapono, C.;
Luzzatto-Knaan, T.; et al. Sharing and community curation of mass spectrometry data with Global Natural
Products Social Molecular Networking. Nat. Biotechnol. 2016, 34, 828–837. [CrossRef]
4. Uppal, K.; Walker, D.I.; Liu, K.; Li, S.; Go, Y.-M.; Jones, D.P. Computational Metabolomics: A Framework for
the Million Metabolome. Chem. Res. Toxicol. 2016, 29, 1956–1975. [CrossRef]
5. Chong, J.; Yamamoto, M.; Xia, J. MetaboAnalystR 2.0: From Raw Spectra to Biological Insights. Metabolites
2019, 9, 57. [CrossRef]
6. De Bruycker, K.; Welle, A.; Hirth, S.; Blanksby, S.J.; Barner-Kowollik, C. Mass spectrometry as a tool to
advance polymer science. Nat. Rev. Chem. 2020, 1–12. [CrossRef]
7. Albóniga, O.E.; González, O.; Alonso, R.M.; Xu, Y.; Goodacre, R. Optimization of XCMS parameters for
LC–MS metabolomics: An assessment of automated versus manual tuning and its effect on the final results.
Metabolomics 2020, 16, 14. [CrossRef]
8. Nash, W.; Dunn, W.B. From mass to metabolite in human untargeted metabolomics: Recent advances in
annotation of metabolites applying liquid chromatography-mass spectrometry data. TrAC Trends Anal. Chem.
2019, 120, 115324. [CrossRef]
9. Smith, C.A.; Want, E.J.; O’Maille, G.; Abagyan, R.; Siuzdak, G. XCMS: Processing Mass Spectrometry Data
for Metabolite Profiling Using Nonlinear Peak Alignment, Matching, and Identification. Anal. Chem. 2006,
78, 779–787. [CrossRef]
10. Pluskal, T.; Castillo, S.; Villar-Briones, A.; Orešič, M. MZmine 2: Modular framework for processing,
visualizing, and analyzing mass spectrometry-based molecular profile data. BMC Bioinform. 2010, 11, 395.
[CrossRef]
11. Libiseller, G.; Dvorzak, M.; Kleb, U.; Gander, E.; Eisenberg, T.; Madeo, F.; Neumann, S.; Trausinger, G.;
Sinner, F.; Pieber, T.; et al. IPO: A tool for automated optimization of XCMS parameters. BMC Bioinform.
2015, 16, 736. [CrossRef]
12. McLean, C.; Kujawinski, E.B. AutoTuner: High Fidelity and Robust Parameter Selection for Metabolomics
Data Processing. Anal. Chem. 2020. [CrossRef] [PubMed]
13. Zheng, H.; Clausen, M.R.; Dalsgaard, T.K.; Mortensen, G.; Bertram, H. Time-Saving Design of Experiment
Protocol for Optimization of LC-MS Data Processing in Metabolomic Approaches. Anal. Chem. 2013, 85,
7109–7116. [CrossRef]
14. Manier, S.K.; Keller, A.; Meyer, M.R. Automated optimization of XCMS parameters for improved peak
picking of liquid chromatography-mass spectrometry data using the coefficient of variation and parameter
sweeping for untargeted metabolomics. Drug Test. Anal. 2018, 11, 752–761. [CrossRef]
15. Lloyd-Price, J.; Arze, C.; Ananthakrishnan, A.N.; Schirmer, M.; Avila-Pacheco, J.; Poon, T.W.; Andrews, E.;
Ajami, N.J.; Bonham, K.S.; IBDMDB Investigators; et al. Multi-omics of the gut microbial ecosystem in
inflammatory bowel diseases. Nature 2019, 569, 655–662. [CrossRef]
16. Dunn, W.B.; Broadhurst, D.; Begley, P.; Zelená, E.; Francis-McIntyre, S.; Anderson, N.; Brown, M.; Knowles, J.;
Halsall, A.; The Human Serum Metabolome (HUSERMET) Consortium; et al. Procedures for large-scale
metabolic profiling of serum and plasma using gas chromatography and liquid chromatography coupled to
mass spectrometry. Nat. Protoc. 2011, 6, 1060–1083. [CrossRef]
17. Li, B.; Tang, J.; Yang, Q.; Li, S.; Cui, X.; Li, Y.; Chen, Y.; Xue, W.W.; Li, X.; Zhu, F. NOREVA: Normalization
and evaluation of MS-based metabolomics data. Nucleic Acids Res. 2017, 45, W162–W170. [CrossRef]
18. Deng, K.; Zhang, F.; Tan, Q.; Huang, Y.; Song, W.; Rong, Z.; Zhu, Z.-J.; Li, K.; Li, Z. WaveICA: A novel
algorithm to remove batch effects for large-scale untargeted metabolomics data based on wavelet analysis.
Anal. Chim. Acta 2019, 1061, 60–69. [CrossRef]
19. Domingo-Almenara, X.; Montenegro-Burke, J.R.; Benton, H.P.; Siuzdak, G. Annotation: A Computational
Solution for Streamlining Metabolomics Analysis. Anal. Chem. 2017, 90, 480–489. [CrossRef]
20. Chaleckis, R.; Meister, I.; Zhang, P.; E Wheelock, C. Challenges, progress and promises of metabolite
annotation for LC–MS-based metabolomics. Curr. Opin. Biotechnol. 2019, 55, 44–50. [CrossRef]
21. Li, S.; Park, Y.H.; Duraisingham, S.; Strobel, F.H.; Khan, N.; Soltow, Q.A.; Jones, D.P.; Pulendran, B. Predicting
Network Activity from High Throughput Metabolomics. PLoS Comput. Biol. 2013, 9, e1003123. [CrossRef]
22. Li, S. Mummichog. Available online: https://github.com/shuzhao-li/mummichog (accessed on 1
March 2020).
23. Chong, J.; Xia, J. MetaboAnalystR: An R package for flexible and reproducible analysis of metabolomics data.
Bioinformatics 2018, 34, 4313–4314. [CrossRef]
24. Pang, Z.; Chong, J.; Li, S.; Xia, J. MetaboAnalystR 3.0: Toward an Optimized Workflow for Global
Metabolomics. Metabolites 2020, 10, 186. [CrossRef]
25. Li, Z.; Lu, Y.; Guo, Y.; Cao, H.; Wang, Q.; Shui, W. Comprehensive evaluation of untargeted metabolomics
data processing software in feature detection, quantification and discriminating marker selection. Anal. Chim.
Acta 2018, 1029, 50–57. [CrossRef] [PubMed]
26. Simón-Manso, Y.; Lowenthal, M.S.; Kilpatrick, L.E.; Sampson, M.; Telu, K.H.; Rudnick, P.A.; Mallard, W.G.;
Bearden, D.W.; Schock, T.B.; Tchekhovskoi, D.V.; et al. Metabolite Profiling of a NIST Standard Reference
Material for Human Plasma (SRM 1950): GC-MS, LC-MS, NMR, and Clinical Laboratory Analyses, Libraries,
and Web-Based Resources. Anal. Chem. 2013, 85, 11725–11731. [CrossRef] [PubMed]
27. Eliasson, M.; Rännar, S.; Madsen, R.B.; Donten, M.A.; Marsden-Edwards, E.; Moritz, T.; Shockcor, J.P.;
Johansson, E.; Trygg, J. Strategy for Optimizing LC-MS Data Processing in Metabolomics: A Design of
Experiments Approach. Anal. Chem. 2012, 84, 6869–6876. [CrossRef] [PubMed]
28. Cantero, J.M.B.; Flores, E.I.; Alcalde, B.G.; Ortega, E.M.; Muret, F.R.M.; Asenjo, E.C.; Casas, J.A.V. Bile acid
malabsorption in patients with chronic diarrhea and Crohn’s disease. Revista Española de Enfermedades
Digestivas 2018, 111, 40–45. [CrossRef]
29. Uchiyama, K.; Kishi, H.; Komatsu, W.; Nagao, M.; Ohhira, S.; Kobashi, G. Lipid and Bile Acid Dysmetabolism
in Crohn’s Disease. J. Immunol. Res. 2018, 2018, 1–6. [CrossRef]
30. Kuroki, F.; Iida, M.; Tominaga, M.; Matsumoto, T.; Kanamoto, K.; Fujishima, M. Is vitamin E depleted in
Crohn’s disease at initial diagnosis? Dig. Dis. 1994, 12, 248–254. [CrossRef]
31. Narula, N.; Cooray, M.; Anglin, R.; Muqtadir, Z.; Narula, A.; Marshall, J.K. Impact of High-Dose Vitamin
D3 Supplementation in Patients with Crohn’s Disease in Remission: A Pilot Randomized Double-Blind
Controlled Study. Dig. Dis. Sci. 2016, 62, 448–455. [CrossRef]
32. Dionne, S.; Calderon, M.R.; White, J.H.; Memari, B.; Elimrani, I.; Adelson, B.; Piccirillo, C.; Seidman, E.G.
Differential effect of vitamin D on NOD2- and TLR-induced cytokines in Crohn’s disease. Mucosal Immunol.
2014, 7, 1405–1415. [CrossRef] [PubMed]
33. Scoville, E.A.; Allaman, M.M.; Brown, C.T.; Motley, A.K.; Horst, S.N.; Williams, C.S.; Koyama, T.; Zhao, Z.;
Adams, D.W.; Beaulieu, D.B.; et al. Alterations in lipid, amino acid, and energy metabolism distinguish
Crohn’s disease from ulcerative colitis and control subjects by serum metabolomic profiling. Metabolomics
2017, 14, 17. [CrossRef] [PubMed]
34. Kolacek, M.; Paduchova, Z.; Dvorakova, M.; Zitnanova, I.; Cierna, I.; Durackova, Z.; Muchova, J. Effect
of natural polyphenols on thromboxane levels in children with Crohn’s disease. Bratisl. Med J. 2019, 120,
924–928. [CrossRef] [PubMed]
35. Petrey, A.C.; De La Motte, C.A. Hyaluronan in inflammatory bowel disease: Cross-linking inflammation and
coagulation. Matrix Biol. 2019, 314–323. [CrossRef] [PubMed]
36. Ramette, A. Multivariate analyses in microbial ecology. FEMS Microbiol. Ecol. 2007, 62, 142–160. [CrossRef]
37. Johnson, W.E.; Li, C.; Rabinovic, A. Adjusting batch effects in microarray expression data using empirical
Bayes methods. Biostatistics 2006, 8, 118–127. [CrossRef]
38. Karpievitch, Y.V.; Nikolic, S.B.; Wilson, R.; Sharman, J.E.; Edwards, L.M. Metabolomics Data Normalization
with EigenMS. PLoS ONE 2014, 9, e116221. [CrossRef]
39. Wehrens, R.; Hageman, J.A.; Van Eeuwijk, F.; Kooke, R.; Flood, P.J.; Wijnker, E.; Keurentjes, J.J.; Lommen, A.;
Van Eekelen, H.D.L.M.; Hall, R.D.; et al. Improved batch correction in untargeted MS-based metabolomics.
Metabolomics 2016, 12, 88. [CrossRef]
40. De Livera, A.M.; Sysi-Aho, M.; Jacob, L.; Gagnon-Bartsch, J.A.; Castillo, S.; Simpson, J.A.; Speed, T.P.
Statistical Methods for Handling Unwanted Variation in Metabolomics Data. Anal. Chem. 2015, 87,
3606–3615. [CrossRef]
41. De Livera, A.M.; Dias, D.A.; De Souza, D.P.; Rupasinghe, T.; Pyke, J.; Tull, D.; Roessner, U.; McConville, M.;
Speed, T.P. Normalizing and Integrating Metabolomics Data. Anal. Chem. 2012, 84, 10768–10776. [CrossRef]
42. Risso, D.; Ngai, J.; Speed, T.P.; Dudoit, S. Normalization of RNA-seq data using factor analysis of control
genes or samples. Nat. Biotechnol. 2014, 32, 896–902. [CrossRef] [PubMed]
43. Sysi-Aho, M.; Katajamaa, M.; Yetukuri, L.; Orešič, M. Normalization method for metabolomics data using
optimal selection of multiple internal standards. BMC Bioinform. 2007, 8, 93. [CrossRef]
44. Redestig, H.; Fukushima, A.; Stenlund, H.; Moritz, T.; Arita, M.; Saito, K.; Kusano, M. Compensation for
Systematic Cross-Contribution Improves Normalization of Mass Spectrometry Based Metabolomics Data.
Anal. Chem. 2009, 81, 7974–7980. [CrossRef] [PubMed]
45. Mahieu, N.G.; Patti, G.J. Systems-Level Annotation of a Metabolomics Data Set Reduces 25 000 Features to
Fewer than 1000 Unique Metabolites. Anal. Chem. 2017, 89, 10397–10406. [CrossRef]
46. Chambers, M.C.; MacLean, B.; Burke, R.; Amodei, D.; Ruderman, D.L.; Neumann, S.; Gatto, L.; Fischer, B.;
Pratt, B.; Egertson, J.; et al. A cross-platform toolkit for mass spectrometry and proteomics. Nat. Biotechnol.
2012, 30, 918–920. [CrossRef] [PubMed]
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution
(CC BY) license (http://creativecommons.org/licenses/by/4.0/).