s13059 015 0637 X
Abstract
Background: Pouchitis is common after ileal pouch-anal anastomosis (IPAA) surgery for ulcerative colitis (UC). Similar
to inflammatory bowel disease (IBD), both host genetics and the microbiota are implicated in its pathogenesis. We use
the IPAA model of IBD to associate mucosal host gene expression with mucosal microbiomes and clinical outcomes.
We analyze host transcriptomic data and 16S rRNA gene sequencing data from paired biopsies from IPAA patients with
UC and familial adenomatous polyposis. To achieve power for a genome-wide microbiome-transcriptome association
study, we use principal component analysis for transcript and clade reduction, and identify significant co-variation
between clades and transcripts.
Results: Host transcripts co-vary primarily with biopsy location and inflammation, while microbes co-vary primarily
with antibiotic use. Transcript-microbe associations are surprisingly modest, but the most strongly microbially-associated
host transcript pattern is enriched for complement cascade genes and for the interleukin-12 pathway. Activation of
these host processes is inversely correlated with Sutterella, Akkermansia, Bifidobacteria, and Roseburia abundance, and
positively correlated with Escherichia abundance.
Conclusions: This study quantifies the effects of inflammation, antibiotic use, and biopsy location upon the
microbiome and host transcriptome during pouchitis. Understanding these effects is essential for basic biological
insights as well as for well-designed and adequately-powered studies. Additionally, our study provides a method for
profiling host-microbe interactions with appropriate statistical power using high-throughput sequencing, and suggests
that cross-sectional changes in gut epithelial transcription are not a major component of the host-microbiome
regulatory interface during pouchitis.
by probiotic use [3] but antibiotics have shown somewhat mixed results in their efficacy for treating Crohn’s disease
(CD) and UC [10,11]. This combination of physiological similarities and genetic differences makes pouchitis
an appropriate model in which to examine the interplay of inflammatory disease, gut microbes, and host
gene activity [12].

While it is known that both host genetics and the microbiome influence the development of pouchitis, precisely
how they interact is less well-understood. Following IPAA surgery, the mucosal structure of the J-pouch
becomes more colon-like; villous structures become more shallow, mucin expression changes [13], and the
microbial community becomes functionally more similar to a colonic community [14]. It is unclear, however,
whether pouchitis is a recurrence of UC that manifests as the host postoperative ileum and microbiome
collectively become more colon-like, or a unique disease with characteristics of both CD and UC. However, by
simultaneously measuring the microbiome and host transcriptome, we may begin to understand the relationships
between microbiota, host, and disease pathogenesis.

To gain insight into these host-microbe interactions in the epithelial mucosa, we have collected paired host
transcriptome and microbial metagenome data from a large J-pouch cohort, allowing us to measure whether
elevated or depleted host epithelial transcripts are associated with specific microbial clades. While other studies
have applied sequencing to the IPAA microbiome, these had small numbers of patients [14,15] or did not
concurrently examine host gene expression [9,16]. Likewise, few studies have comprehensively measured the IPAA
host microbiome and transcriptome [17,18]. To the best of our knowledge, ours is the first study to examine
both. In this study we use the IPAA model to study the relationship between the IPAA microbiome and host
gene expression. We have recruited a large population of patients having undergone IPAA at Mount Sinai
Hospital, a large, tertiary care referral center in Toronto, Canada. These subjects were identified as part of a wider
study investigating the etiology of pouch complications. Thus, this cohort had a wide variety of both molecular
and clinical data available for analysis, including detailed information regarding postsurgical outcomes.

The gut microbiome in this cohort was most affected by inter-individual differences in antibiotic usage, while
epithelial transcription was more strongly influenced by tissue location (pouch vs. pre-pouch ileum). A very small
proportion of microbial or transcriptional variation was explained by host-microbe correspondences, in that
associations of the host transcriptome with the microbiome were relatively modest in comparison to other effects. We
developed a dimensionality reduction process to ensure appropriate statistical power for testing these associations,
due to the large number of transcripts and operational taxonomic units (OTUs) observed relative to number of
samples, comparable to the analysis methods necessary for eQTL or similar studies [19-21]. After employing both
supervised and unsupervised data reduction methods, we used multivariate linear modeling to identify significant
associations between microbes, transcripts, and environment, as described above, as well as between the overall
patterns of host transcription and microbial composition. These were primarily related to level of host inflammation
as, for example, the most microbially-associated host transcript pattern (gPC9) was enriched for complement and
IL-12 components in GSEA analysis (Additional file 1C). Finally, discriminant modeling of pouchitis outcome by
linear discriminant analysis proved to be ineffective using either microbial composition, transcriptional activity, or
both, in antibiotic-free samples.

Results
A multivariate model for co-analysis of host epithelial tissue gene expression, gut tissue-associated microbiome
structure, and cohort characteristics and clinical phenotype

Table 1 Demographic and clinical characteristics of IPAA cohort
Characteristic                                                        Patient cohort (n = 265)
Age at recruitment, years (mean, range)                               47 (18–76)
Gender (% female)                                                     135 (50.5)
Time since ileostomy closure (mean years, range)                      12 (1–40)
Smoking (% at recruitment)                                            24 (9.2)
Antibiotic use previous month (%)                                     78 (29.4)
Distribution of patients in phenotypic outcome groups, number (%)
    FAP                                                               32 (12)
    NP                                                                72 (27)
    CP                                                                27 (10)
    CDL                                                               34 (13)
    AP                                                                69 (26)
All recruited patients had IPAA surgery >1 year prior to recruitment except for two, whose previous diagnoses
were pouchitis and FAP, respectively.

In order to better understand the relationships between the host and microbiome after IPAA surgery, we
measured host gene expression by microarray [17] and the microbial community using the 16S rRNA gene [9]
(referred to hereafter as 16S) in a large, metadata-rich, cross-sectional cohort. The cohort consisted of 265
patients (51% women) aged between 18 and 78 years (median age, 48 years; Table 1). Patients who had surgical
management of UC or FAP were included, and all patients had IPAA surgery at least 1 year prior to biopsy
collection for this study. Patients were classified as FAP (Familial Adenomatous Polyposis), No Pouchitis, Acute
Morgan et al. Genome Biology (2015) 16:67 Page 3 of 15
Pouchitis, Chronic Pouchitis, or Crohn’s Disease-Like Inflammation (see Methods for criteria). Most patients
were biopsied in both the pouch (P) and in the pre-pouch ileum (PPI). After quality control, there was host gene
expression and microbiome data obtained by microarray and 16S analysis from a total of 255 samples representing
204 individuals (Methods, Figure 1); these comprised 196 PPI samples and 59 pouch samples.

Between-tissue variation is high for host gene expression but low for the microbiome
Previous studies in a subset of this cohort demonstrated that there were few differences in the microbiome between
pouch and PPI samples [9], yet a great deal of variability was observed between these sites in the tissue
transcriptome [9,17]. As expected, we observed that the Bray-Curtis distance for microbial profiles between locations
was much lower than between individuals, indicating that the microbial profiles of pouch and PPI were similar
(Additional file 2). In contrast, the within-site variation in gene expression based on Pearson correlation was nearly
as great as the between-individual variation, indicating that tissue location (pouch vs. PPI) was a large source of
transcriptional variation.

Dimensionality reduction for well-powered multi-omic data integration in a human cohort
In order to improve power to associate microbial composition with host transcriptional activity, we reduced
Figure 1 Overview of data analysis. (A) Data were acquired from a cohort of 265 UC and FAP patients who had IPAA surgery at least 1 year
previously. Biopsies were collected from each patient from both the pre-pouch ileum and J-pouch. The host transcriptome was profiled using
cDNA microarrays, and the microbiome was profiled by sequencing the V4 region of the 16S gene. Data were then subjected to unsupervised
reduction and linear modeling (B), and to supervised reduction and linear discriminant analysis (C). (B) After quality control, data dimensionality
was reduced to maximize statistical power prior to linear modeling. After filtering low-variance transcripts, principal component analysis was used
to create nine gene principal components (gPCs) to account for 50% of the variance in the transcriptome data. OTUs were filtered for minimum
abundance and for presence in at least three samples. PCA was then used to create nine clade principal components (cPCs) explaining 50% of
the variance in OTU data. Multivariate association with linear modeling was then used to test for associations between clades and transcripts that
were significant after adjusting for metadata (inflammation, antibiotic use, and outcome). (C) In an alternative data reduction approach, a list of
449 genes was curated from IBD genome-wide association studies [4] and host genes that physically interact with bacteria [22]. The expression
profiles of these 449 genes were further reduced by k-medoid clustering into 75 medoids, each representing a cluster of genes with similar
expression profiles. Abundant microbial clades were hierarchically clustered, and one representative from each cluster was chosen. Linear
discriminant analysis was used to measure which genes and clades were most discriminant between clinical outcomes. (See also Additional file 1,
Additional file 2, and Additional file 3A to C).
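The OTU filtering and the 50% variance cutoff described in the Figure 1 caption can be sketched as follows. This is a minimal illustration in Python and not the authors' pipeline: the function names and the toy abundance table are hypothetical, the mean-abundance cutoff of 0.005 is taken from the Results text, and the per-component variances are supplied directly rather than computed from real data.

```python
from itertools import accumulate


def filter_otus(abundance, min_mean=0.005, min_samples=3):
    """Keep OTUs that are abundant (mean relative abundance > min_mean)
    and present (non-zero) in at least min_samples samples.

    abundance maps an OTU name to a list of relative abundances,
    one value per sample.
    """
    kept = {}
    for otu, values in abundance.items():
        mean = sum(values) / len(values)
        present = sum(1 for v in values if v > 0)
        if mean > min_mean and present >= min_samples:
            kept[otu] = values
    return kept


def n_components_for_variance(component_variances, target=0.5):
    """Smallest number of leading principal components whose cumulative
    share of the total variance reaches the target fraction."""
    total = sum(component_variances)
    ordered = sorted(component_variances, reverse=True)
    for k, cumulative in enumerate(accumulate(ordered), start=1):
        if cumulative / total >= target:
            return k
    return len(ordered)


# Toy data (hypothetical): three OTUs measured in four samples.
toy_abundance = {
    "otu_a": [0.20, 0.10, 0.30, 0.00],      # abundant and in 3 samples: kept
    "otu_b": [0.04, 0.00, 0.00, 0.00],      # abundant on average, but in 1 sample
    "otu_c": [0.001, 0.002, 0.001, 0.003],  # present everywhere, but rare
}
kept = filter_otus(toy_abundance)

# Toy variances (hypothetical) for four principal components.
n_pcs = n_components_for_variance([4.0, 3.0, 2.0, 1.0], target=0.5)
print(sorted(kept), n_pcs)
```

In the study the same cutoff logic yields nine gPCs and nine cPCs; the toy numbers here only show how the threshold is applied.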
the dimensionality of both host and microbial features. We first calculated that given a true covariance of 0.5 in
the data between microbial abundance and gene expression, it would be possible to perform a maximum of 10^4
pairwise tests and retain 90% power and an alpha equal to 0.05 using Bonferroni correction (Additional file 1A).
Thus, it was necessary to reduce 19,908 host transcripts and 6,999 observed OTUs to 10^4 tests, or approximately
100 transcripts and 100 clades of interest.

We pursued several broad strategies to achieve this goal. First, we limited our analysis of OTUs to only
those that were both present in multiple individuals and abundant, with mean abundance >0.005 (see Methods).
Second, we employed both further unsupervised and supervised strategies for data reduction prior to our
downstream analysis, which included multivariate linear modeling (which aimed to associate microbes with host
transcripts) and linear discriminant analysis (which aimed to determine which microbes and transcripts were most
discriminant of clinical outcome; Figure 1).

For unsupervised dimensionality reduction of microbial data, after OTUs were abundance-filtered, we applied
a variance-stabilizing arcsine square-root transformation, then used principal component analysis to reduce these
filtered, abundant clades to nine clade principal components (cPCs) that explained 50% of observed variance
(Figure 1). The loadings of each cPC represent a pattern of highly correlated microbial abundances (Additional
file 1D; Additional file 3A, B). For supervised clade reduction, we further reduced the filtered list of microbial
clades by hierarchically clustering it, then selecting the lowest-mean-abundance representative from each cluster.
This had the practical effect of removing redundant higher-order taxonomic clades from the list of taxa, and
it reduced the total number of microbial clades to 45 (Figure 1).

Supervised transcript reduction aimed to focus upon host genes of particular prior interest, specifically those
that had been previously implicated in IBD, pouchitis, or host-microbe interactions. Thus, we curated a set of 174
IBD-associated genes [4], 272 bacterially-interacting genes [22], and 12 pouchitis-related genes from the literature
(Methods), and the expression profiles of these genes were clustered into 75 gene medoids, each of which represented
one or several similarly-expressed genes (Additional file 3C). For unsupervised reduction of transcripts, we first
filtered all host transcripts to remove the two quintiles of genes whose expression varied the least across all subjects.
Next, we used principal component analysis to reduce the remaining 11,945 host transcripts to a collection of nine
transcript principal components (gPCs) explaining 50% of all observed variance. Again, the loadings of each
principal component represent a pattern of highly correlated transcript abundances.

Through these data reduction methods, we transformed 19,908 host transcripts and 6,999 observed
OTUs into a total of 138 features. There were nine transcript principal components and nine clade principal
components, which had been chosen in an unsupervised manner. In addition, there were 75 gene medoids and 45
clades, which had been selected in a more supervised manner. These 138 features were used for subsequent
analysis.

Tissue location and antibiotic use induce the greatest changes in host gene expression and microbiome
composition, respectively
After initial gene and clade reduction, in order to provide an initial visualization of the relationships between
gPCs, cPCs, medoids of interest, inflammation, antibiotic use, and clinical outcome, we generated a biplot using
the Breadcrumbs package ([23], Figure 2). The strongest data separation effect corresponded to antibiotic use,
which was highly correlated both with the chronic pouchitis phenotype and with abundant Enterococcus, which
is frequently resistant to both metronidazole and ciprofloxacin [24,25]. In contrast, high expression of gPC8
was inversely correlated with antibiotic use (Figure 2). Crohn’s disease-like inflammation was modestly
associated with increased Enterobacteriaceae, while high expression of gPC9 was associated with more abundant
Sutterella and beneficial Clostridia, including Ruminococcus and Blautia. The transcript patterns gPC1, gPC9,
and gPC6 were most closely associated with FAP or no pouchitis (Figure 2).

Next, we quantified the proportions of the microbiome and total host transcriptome that were affected by tissue
location (pouch vs. PPI), clinical outcome, antibiotic use, and inflammation, using univariate association tests of each
transcript and each clade with the metadata. The extent of shift is summarized as the percentage of transcriptome
or microbiome features differentially expressed at FDR <0.05 (Table 2; Additional file 3D to I). As previously
shown [17], host transcripts were most strongly associated with location, followed by inflammation, with little or no
association with antibiotic use. When we subjected the differentially-expressed transcripts between pouch and
PPI to gene ontology enrichment analysis by GOrilla [26], the transcript category most significantly affected was
transporters (Additional file 4). The transcriptional differences between pouch and PPI are described in detail
by Kabakchiev et al. [17]. In contrast, differential expression of microbial clades was strongly associated with
antibiotics, but very few clades were differentially expressed in association with inflammation or tissue type (Table 2;
Additional file 3D, E, I). Large differences in microbial significance (for example, 41% of microbes in PPI
significantly affected by antibiotics vs. 2% in pouch) are likely
Figure 2 Biplot of clades, genes, and study metadata. Non-metric multidimensional scaling (NMDS) of clade abundances was used to position
samples and show samples relatively enriched in specific clades (purple). Arrows represent host transcripts (brown) and metadata (blue), which
include antibiotic use and clinical outcome. Arrow coordinates are determined by averaging the coordinates of each sample containing a specific
metadata, and show the central tendency of the metadata. Samples are color-coded according to inflammation, which ranges from none (green)
to high (red). This figure was created with PPI-only samples.
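The arrow placement described in the Figure 2 caption, where each metadata arrow points at the average ordination position of the samples carrying that metadata value, can be sketched as below. This is an illustrative Python sketch only: the function name and the toy coordinates are hypothetical, and it does not reproduce the Breadcrumbs package.

```python
def metadata_arrow(sample_coords, has_metadata):
    """Return the (x, y) tip of a metadata arrow as the mean ordination
    coordinate of all samples that carry the metadata value."""
    selected = [xy for xy, flag in zip(sample_coords, has_metadata) if flag]
    if not selected:
        raise ValueError("no sample carries this metadata value")
    n = len(selected)
    mean_x = sum(x for x, _ in selected) / n
    mean_y = sum(y for _, y in selected) / n
    return (mean_x, mean_y)


# Toy ordination coordinates (hypothetical) for four samples, and a flag
# marking which samples carry the metadata value, e.g. antibiotic use.
coords = [(0.0, 0.0), (2.0, 4.0), (4.0, 2.0), (10.0, 10.0)]
on_antibiotics = [True, True, True, False]
arrow = metadata_arrow(coords, on_antibiotics)
print(arrow)
```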
Figure 3 The relationship between clades and metadata in univariate analysis. The major metadata in the cohort were antibiotic use,
inflammation, tissue (pouch or PPI), and outcome (AP, NP, CP, FAP, or CDL). Univariate linear discriminant analysis effect size analysis was
performed on each of these variables. Antibiotic use was associated with the greatest number of perturbations in the microbiome, causing broad
decreases in the Clostridia, Bacteroides, Tenericutes, and Betaproteobacteria, and increases in the Lactobacillales, Actinobacteria, and
Gammaproteobacteria. Because the antibiotic effect size was very large and affected most clades, LDA effects for inflammation (ring 2), tissue
types (ring 3), and outcomes (rings 4, 5, and 6) were calculated after stratifying for antibiotic use. Color intensity of ring corresponds to the
taxonomic level at which the LDA effect is significant (P <0.05), from phylum (least intense) to genus (most intense).
antibiotic-free samples (Additional file 5C). Escherichia were positively associated with inflammation, while the
Actinobacteria were negatively associated. The genus Sutterella and generally higher levels of Bacteroidetes
were strongly associated with the outcome FAP even after accounting for antibiotic use. Actinomycetales and
Flavobacteria were weakly associated with the PPI. However, antibiotic effects on the microbiota were
much stronger and more widespread than effects due to tissue, inflammation, or clinical outcome.

Host gene expression is not a major determinant of pouch microbial community composition
Following data reduction, in order to measure gene-clade associations, we used MaAsLin [5,28] to apply a
multivariate linear model which controlled for the effects of antibiotic use and inflammation (see Methods). Although
pouch and PPI microbiome profiles were highly similar within the same individual, pouch-PPI transcriptomes were
not. Under these circumstances, we did not expect any gain in power for detecting microbiome-transcriptome
associations from the addition of PPI samples by inclusion of a random effect for individual to the linear model. Thus, we
excluded the relatively small number of paired pouch samples from association testing (Figure 1B). The
supervised (curated gene) and unsupervised (gPC/cPC) gene lists were run through MaAsLin independently; only the
unsupervised results were significant (Figure 4).

The only gPCs significantly associated with cPCs were gPC8 and gPC9 (q <0.25). The top loadings of gPC9
reflected reduced expression of the complement cascade (CFI, C2, and CFB), interferon regulatory factor 1,
interferon-induced guanylate binding protein, and the leukocyte chemotaxis factor CCL2, indicating that high
expression of gPC9 may correspond to a lower overall state of inflammation. Indeed, when samples were
stratified by clinical outcome, gPC9 was lowest-expressed in patients with Crohn’s disease-like inflammation, and
highest-expressed in patients with FAP (Additional file 5A). The top loadings of gPC8 included reduced
expression of the lipopolysaccharide-activated p38 MAP kinase Map2K6 and of PLA2G10, which is involved in calcium
and fat-mediated inflammatory signaling and eicosanoid release; thus, gPC8 may also be related to inflammation.
However, when stratified by antibiotic use or clinical outcome, gPC8 was less differentially expressed than
gPC9 (Additional file 5A, Additional file 3B).

A total of four clade cPCs were associated with gPC8 and gPC9: cPC1, cPC3, cPC6, and cPC8. The loadings of
cPC1, which accounted for 15% of the observed variance, show several features apparently corresponding to
antibiotic use: increased Enterobacteriaceae abundance, a broad decrease in Bacteroides and Firmicutes, and
among the highest abundance of Enterococcus (Figure 4). Indeed, cPC1 was also more abundant in patients who
had been taking antibiotics (Additional file 5A). cPC3
Figure 4 Results of multivariate linear modeling. Principal component analysis was used to reduce the data into nine gPCs and cPCs that
explained 50% of total transcriptional and microbial variation. The top six loadings for each cPC (left) and gPC (middle) are shown; orange and
blue indicate increases or decreases in expression, respectively. (Right) MaAsLin [5,28] was used for multivariate linear analysis of associations
between cPCs and gPCs while controlling for the effects of inflammation, tissue location, and antibiotic use. Black/gray scale corresponds to the
significance of the association, while blue / orange corresponds to the direction. See also Additional file 5.
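Associations in this analysis are reported as q-values (for example, q <0.25 for the gPC-cPC associations). A minimal sketch of how q-values can be obtained from a list of p-values is given below. It assumes the Benjamini-Hochberg procedure as the false discovery rate correction; the function name, the pair labels, and the p-values are hypothetical toy inputs and are not results from this study, and the sketch does not reproduce MaAsLin itself.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg q-values in the same order as pvalues."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


# Toy p-values (hypothetical) for four feature-pair tests.
toy_pvalues = {"pair_1": 0.01, "pair_2": 0.04, "pair_3": 0.03, "pair_4": 0.5}
toy_qvalues = benjamini_hochberg(list(toy_pvalues.values()))

# Pairs that pass a q < 0.25 threshold.
significant = [name for name, q in zip(toy_pvalues, toy_qvalues) if q < 0.25]
print(significant)
```

Because the correction scales with the number of tests, reducing the feature space before testing directly lowers the q-value assigned to any given p-value.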
featured the lowest levels of Bifidobacterium. cPC1 and cPC3 were negatively associated with gPC8 and gPC9;
thus, these patterns indicate that an antibiotic-signature microbiome was associated with higher potentially
inflammatory gene expression. However, in contrast to cPC1, cPC3 was not differentially abundant when
stratified by outcome or antibiotic use (Additional file 5A). The most remarkable feature of cPC6 loadings was its
high abundance of Akkermansia, a beneficial mucin-utilizing microbe [29]; cPC6 was also evenly distributed
among outcomes and antibiotic use (Additional file 5A). cPC8 loadings were noteworthy for their high
abundance of the genus Sutterella, and lower abundance of cPC8 was associated with chronic pouchitis and
antibiotic use (Additional file 5A). While some studies have associated Sutterella with autism [30,31], in our cohort,
it was associated with the healthy FAP outcome (Figure 3). A recent study also found that Sutterella was decreased in
new-onset Crohn’s disease [32].

Together, the linear relationship between host transcripts and microbes was generally modest, representing
approximately 25% of total variance, as variation is driven primarily by location and by antibiotic use, respectively.
However, these data represent the strongest transcript-microbe associations in the cohort after variation from
antibiotic and tissue has been factored out. The strongest relationships we observed appear to be associated with
inflammation-associated loadings. Other potential relationships may be better explored with additional samples
for more statistical power.

Using a joint host-microbe model to segregate pouch outcome
It is of great clinical interest to know whether host transcripts, microbes, or some combination thereof can be
used to distinguish clinical outcomes. To explore this question, we used linear discriminant analysis (LDA) to
identify which combinations of genes and microbes were most able to cross-sectionally segregate clinical outcome
in a training set, then assessed accuracy in cross-validation (see Methods). Because antibiotic use was highly
asymmetrical across clinical outcomes (Additional file 5B) and highly predictive of the chronic pouchitis outcome, we
limited this analysis to those samples without antibiotic use (Additional file 6).

CDL and CP were best discriminated by this model, particularly with respect to FAP (Figure 5). However,
accuracy was low upon cross-validation (mean AUC 0.57 across all outcomes and models, Additional file
6A), primarily due to the model’s lower discrimination of AP and NP outcomes. These represent the extremes
of outcome phenotypes in several respects, particularly with respect to inflammation. While this is also
true for antibiotic usage (highly prevalent in CDL and rare in FAP), this analysis specifically excluded all
samples from antibiotic-treated patients, as these proved to be very well-discriminated using microbial
profiles alone. Indeed, when antibiotic-treated samples were included, discrimination accuracy for the CDL
(AUC 0.67), CP (AUC 0.88), and FAP (AUC 0.71) outcomes was much higher based solely on models of
microbiome profiles (Additional file 6B). When we examined the separation ability of the LDs (Figure 5,
Additional file 6C), they were most discriminant between FAP and CDL.

Discussion
Although this study and many others have observed that the mucosal microbiome is highly variable between any
two individuals [33,34], the host mucosal transcriptome appears to be a surprisingly small correlate of this
variation in microbial community composition. Here, the
Figure 5 Linear discriminant analysis for clinical outcome. Linear discriminant analysis was used to determine which genes and clades
were most discriminant between clinical outcomes after controlling for antibiotic use. All samples with antibiotic use were removed
prior to analysis, and an LDA fitting model with leave-one-out cross-validation was used. (A, B) The separation of clinical outcomes by LD1 and
LD2. See also (Additional file 6).
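The leave-one-out cross-validation mentioned in the Figure 5 caption can be sketched as follows. To keep the example self-contained, a simple nearest-centroid classifier is swapped in for LDA; that substitution, the function names, and the toy feature vectors are all assumptions made for illustration, so this shows only the cross-validation loop and not the authors' discriminant model.

```python
def nearest_centroid_predict(train_x, train_y, x):
    """Predict the label whose class centroid is closest to x.
    This is a stand-in classifier, not linear discriminant analysis."""
    sums, counts = {}, {}
    for features, label in zip(train_x, train_y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for j, value in enumerate(features):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1

    def squared_distance(label):
        centroid = [s / counts[label] for s in sums[label]]
        return sum((c - v) ** 2 for c, v in zip(centroid, x))

    return min(sums, key=squared_distance)


def leave_one_out_accuracy(xs, ys):
    """Hold out each sample in turn, fit on the rest, and score the
    held-out prediction; return the fraction predicted correctly."""
    correct = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if nearest_centroid_predict(train_x, train_y, xs[i]) == ys[i]:
            correct += 1
    return correct / len(xs)


# Toy feature vectors (hypothetical) for two well-separated outcome groups.
toy_x = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]]
toy_y = ["FAP", "FAP", "FAP", "CDL", "CDL", "CDL"]
accuracy = leave_one_out_accuracy(toy_x, toy_y)
print(accuracy)
```

Each sample is scored by a model that never saw it, which is why cross-validated accuracy can be much lower than the apparent separation seen when all samples are fitted together.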
transcriptome showed large variation between the pre-pouch ileum and the pouch within the same individual;
for example, there were significant differences in the expression of amino acid, heme, and metal ion transporters
(Additional file 4). Despite these large transcriptional differences between tissue locations, the microbial
community within each individual remained similar between these two environments. It is important to note that our
methods would not resolve sub-genus-level differences in the mucosal communities, and that mucosal
communities are likely to show less homogeneity over greater biogeographic distances in the GI tract [35]. However,
these findings suggest that the composition of an individual’s microbiome in adulthood may not be shaped by
local transcriptional activity on a long-term basis, but rather by factors such as initial early life colonization
events [36-39] or diet [40] over time spans relevant for disease development. Conversely, inter-individual
differences in the microbiome appear not to drive correspondingly large changes in gene expression.

As expected, the largest effect on the microbiome is antibiotic use. Metronidazole, the antibiotic most
commonly used to treat pouchitis, kills anaerobic bacteria by damaging their DNA [41], thus profoundly decreasing
the populations of Bacteroidetes and Clostridiaceae. The resistance of facultative anaerobes to metronidazole is
much more variable; Gardnerella is highly susceptible [41], while Eikenella is highly resistant [42], and
resistance in Propionibacterium appears to correlate with the presence of nim genes [43]. In our data from the pelvic
pouch, the Bacteroidetes and Clostridiaceae appeared to be displaced by facultative anaerobes such as the
Lactobacillales (for example, Enterococcus and Streptococcus) and Gammaproteobacteria (for example, Pasteurellaceae).
Enterococcus genomes are highly recombinant and remarkable as a reservoir of antibiotic resistance, and thus a
public health concern [44]. Their metronidazole resistance is well-known [45-47], and they are becoming increasingly
resistant to ciprofloxacin [48-50], which is an antibiotic of choice for pouchitis. Although the antibiotic-resistance
profiles of human-associated Pasteurella have been much less widely described, a study of swine-associated
Pasteurella strains found that they were highly resistant to metronidazole (but not quinolones) [51], which is consistent
with our observations.

We found in univariate analysis that after accounting for the effects of antibiotic use, pouch inflammation
influenced relatively few taxa; specifically, it enriched for Escherichia, while there were non-specific
inflammation-associated decreases in the class Actinobacteria and in the phylum Bacteroidetes (Figure 3). This is consistent
with Escherichia’s role as a facultative anaerobe that is frequently enriched in Crohn’s disease [5,52]. Inasmuch
as many microbial surveys of CD patients have found no species as consistently overrepresented in IBD as
Escherichia, and this overrepresentation appears to be a feature of later IBD rather than early IBD [32], it is possible that
Escherichia is unique among the intestinal microbiota in its ability to thrive in chronic redox stress. It has
recently been shown that nitrate respiration in the inflamed host gut is at least one of the mechanisms by
which Escherichia may gain an advantage [53]. Alternatively, our ability to associate microbes with
inflammation may be reduced by perturbations already induced in the microbiome as, for example, by pouch surgery prior
to sampling.

The transcript pattern gPC9 demonstrated the broadest range of associations identified between host transcription
and microbial community structure. Its individual gene loading components (including complement cascade,
immune cell adhesion, p38 MAP kinase genes) were functionally associated with inflammation, but expression of
gPC9 itself was not correlated with the clinical inflammatory score (rs = 0.02) (Additional file 5D). There was a
slightly greater negative correlation between gPC9 and the abundance of Escherichia (rs = −0.29) (Additional
file 5E). gPC9 was positively associated with cPC6; the most abundant clade in this cPC was Akkermansia,
which has previously been associated with improvement of metabolic syndrome and DSS colitis [29,54], as well
as increased susceptibility to Salmonella [55]. Taken together, sub-clinical inflammation may thus be inducing
a modest effect on the microbiome that is detectable in these data and in a corresponding host
transcriptional response, even prior to being histologically detectable.

Dimensionality reduction was a key component in making this study possible; as with genome-wide association
studies or eQTL associations, naive testing of all possible hypotheses would require an exceptionally large cohort.
As this is rarely possible in practice, we used principal component analysis for unsupervised data reduction, and
k-medoids clustering of a curated gene list for supervised data reduction. Other recent papers [40,56-58] have
employed similar clustering-based data reduction strategies to find signal in relatively small datasets. These
results also underscore the importance of designing microbial association studies to include an explicit, up-front
power analysis and of having realistic expectations about the effect sizes to be observed; they are likely to be modest
effects, similar to GWAS, rather than large effects. Here, for example, the strongest microbe-transcript correlations
were approximately 0.2 to 0.3, and it would have been impossible for significant associations to survive correction
for multiple hypothesis testing if all genes and clades were simultaneously analyzed. This must be anticipated when
planning studies to ensure they are designed with appropriate sample sizes.
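The sample-size argument above can be made concrete with a rough calculation. The sketch below uses the Fisher z approximation for the power of a two-sided correlation test with a Bonferroni-adjusted significance threshold. The approximation, the function name, and the default parameters are assumptions chosen for illustration; this is not the power calculation reported in Additional file 1A.

```python
import math
from statistics import NormalDist


def samples_needed(r, n_tests, alpha=0.05, power=0.90):
    """Approximate number of samples needed to detect a correlation r
    with a two-sided test after Bonferroni correction for n_tests,
    using the Fisher z approximation (an assumption of this sketch)."""
    if not 0 < abs(r) < 1:
        raise ValueError("r must be non-zero and strictly between -1 and 1")
    normal = NormalDist()
    z_alpha = normal.inv_cdf(1 - (alpha / n_tests) / 2)
    z_power = normal.inv_cdf(power)
    return math.ceil(((z_alpha + z_power) / math.atanh(abs(r))) ** 2 + 3)


reduced_tests = 100 * 100      # roughly 100 transcript features x 100 clade features
all_pairs = 19_908 * 6_999     # every host transcript against every observed OTU

for label, n_tests in [("reduced", reduced_tests), ("all pairs", all_pairs)]:
    for r in (0.2, 0.3, 0.5):
        print(f"{label:9s} r={r}: n >= {samples_needed(r, n_tests)}")
```

Under these assumptions the required cohort grows as the number of tests rises and as the expected correlation shrinks, which is the trade-off that motivates reducing features before testing.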
Finally, discriminating clinical outcome based on the microbiome and transcriptome was a complex problem intractable to LDA analysis. While chronic pouchitis could be accurately distinguished after the fact based on antibiotic use (Additional file 6), this is not clinically useful. Cross-sectional data may particularly limit the utility of LDA for exploring this problem, given the high degree of between-individual variation in microbiota and the temporal nature of pouchitis and antibiotic use. While it is clearly not feasible to biopsy subjects repeatedly over short periods of time, it would be reasonable to study the relationship between microbiota and onset of chronic pouchitis with longitudinal stool collection. More stable markers, such as SNPs and serum antibodies, may also have better utility in classifying postoperative pouch outcomes [2].

Conclusion
In conclusion, the primary influences upon host gene expression and the microbiome appeared to be distinct by several measures in this cohort. We observed modest associations between groups of host transcripts involved in inflammation and clades such as Sutterella, Akkermansia, and Bifidobacterium, but these were not among the greatest sources of variation in community structure or gene expression. Instead, the former was greatly influenced by pharmaceutical treatments (specifically antibiotics), and the latter by tissue location. Thus, while pouchitis clinical outcomes were well-differentiated by naive linear discriminant analysis, this was due almost exclusively to differences in antibiotic usage among outcomes and may be a problem better-suited to longitudinal data. Although we are able to observe significant host-transcript associations, the effect sizes are modest, indicating that other factors, such as initial host colonization and diet, are also significant influencers of microbial composition. To distinguish these effects, we will need additional data from well-powered studies.

Methods
Patient cohort
Patients having undergone proctocolectomy with ileal pouch-anal anastomosis (IPAA) for treatment of UC or FAP at least 1 year prior to enrollment were recruited at Mount Sinai Hospital (Toronto, Canada). Individuals with a diagnosis of CD were excluded. Patients underwent pouch endoscopy with biopsy, and completed a questionnaire encompassing demographic and clinical elements. Physicians documented the appearance of the pouch using specific evaluation criteria outlined in the pouchitis activity score (PAS). Specifically, to numerically score inflammation, the severity of objective traits was graded (erythema, friability, and ulceration at the time of endoscopy, and polymorphonuclear leukocyte infiltration and ulceration by histology) according to the numeric scale described by Tyler et al. [9], and the inflammation score was defined as the sum of these traits. A total inflammation score of 14 was possible, but any score over 3 was considered inflamed. Subjects were classified based on postsurgical phenotypic outcome using a combination of long-term history following surgery and inflammatory activity at the time of pouch endoscopy, as has been previously described [9]: Familial Adenomatous Polyposis (FAP) with no inflammatory complications post-surgery; No Pouchitis (NP) with no previous documented episodes of pouchitis and no evidence of pouchitis at the time of pouchoscopy; Acute Pouchitis (AP) based on historical or current documentation of inflammation of the pouch resolving after a single course of antibiotics; Chronic Pouchitis (CP), including antibiotic-dependent and antibiotic-refractory patients who required either prolonged (>1 month) antibiotic therapy, medical intervention for pouchitis more than three times per year, or the use of second- or third-line medications (5-ASA, steroids, immunomodulators, biologics); or Crohn’s disease-like phenotype (CDL) based on a patient developing an abscess or fistula more than 1 year following ileostomy closure, or inflammation in the afferent limb or proximal small bowel. Subject recruitment and study procedures were approved by and carried out in accordance with the Research Ethics Board of Mount Sinai Hospital (Toronto, Canada), with the following tracking information: 08-0180-E: Genetic, Serologic and Microbial Factors Related to Patterns of Ileal Inflammation (IPAA). Informed consent was obtained from all subjects immediately prior to the initial sample collection in compliance with our Research Ethics Board study approval. All experimental methods are compliant with the Helsinki Declaration.

For this cohort, antibiotic use was reported as ‘true’ if patients had taken antibiotics in the 30 days prior to biopsy collections. The vast majority of antibiotic use was for pouchitis, and was either metronidazole, ciprofloxacin, or a combination of both. A very small number of pouch patients (two to three) were on vancomycin instead of more standard antibiotics. Antibiotic use was also reported as ‘true’ if the patient had taken antibiotics for a non-IBD purpose in the past 30 days (for example, amoxicillin for oral surgery).

Sample collection
Tissue biopsies were obtained from the mid-portion of the pouch and the PPI during pouchoscopy. One biopsy from each site was immediately placed into a sterile, empty freezer vial and snap frozen in liquid nitrogen for subsequent microbial analysis. Two additional biopsies from each site were placed into RNAlater (Qiagen) for host transcriptomic analysis. Study samples were stored
long-term at −80°C. Two biopsies were also taken for histological analysis as per standard clinical practice at our institution. Inflammation was measured according to the objective and location-specific components from the pouchitis activity score (PAS) [59] as previously described [9,17].

Host RNA extraction and microarray gene expression analysis
The biopsy samples were immediately suspended in RNAlater (QIAGEN) stabilizing reagent upon collection to deter RNA degradation and were stored at −80°C. Total RNA was extracted with the miRNeasy Mini Kit (Qiagen) in two batches. A NanoDrop 1000 (Thermo Fisher Scientific) and Bioanalyzer 2100 (Agilent) were used to determine RNA concentration, quality and purity. Only samples with an RNA integrity number (RIN) greater than or equal to 5.0 were considered for further analysis [60].

From samples that passed quality control, 400 ng of RNA was amplified with the Ambion WT Expression Kit (Ambion). A total of 5.5 μg of cDNA per sample was then labeled and hybridized to Human Gene 1.0 ST arrays (Affymetrix) in a Fluidics Station 450 (Affymetrix), utilizing standard protocol FS450_0007 with the GeneChip WT Terminal Labeling and Controls Kit (Affymetrix) and GeneChip Hybridization, Wash, and Stain Kit (Affymetrix). The GeneChip Scanner 3000 (Affymetrix) was used to scan the completed arrays. Summarized probe cell intensity data were generated with an Affymetrix GeneChip Command Console. Finally, probe-level summarization files were produced, and the data were background-adjusted, normalized, and log-transformed with the robust multiarray average (RMA) algorithm in Affymetrix Expression Console [61].

The empirical Bayes (EB) method described by Johnson et al. [62] was applied to the normalized data to correct for batch effects which may have resulted from a non-linear sample extraction and microarray processing schedule. Finally, duplicate and ambiguous Affymetrix probesets (Release 32) as well as those no longer mapping to a gene in the current human genome build (GRCh37.p5) were removed from further analysis. This filter retained 19,908 probesets from the original 33,297.

Microbial DNA extraction and sequencing
Community DNA extraction
Total microbial DNA was extracted from biopsies in two batches using the DNeasy blood and tissue kit (Qiagen), with an additional bead beating step to ensure adequate cell lysis. Bead beating was performed using both 5 mm stainless steel beads to disrupt tissue (Qiagen 69989) and glass beads (Mo-Bio, Mississauga, ON, Canada) to disrupt bacterial cells, in conjunction with the FastPrep tissue homogenizer (MP Biomedicals, Santa Ana, CA, USA) set to speed 6 for 30 s. Additional enzymatic lysis was conducted through the addition of proteinase K (as per the Qiagen protocol) and incubation of samples at 95°C.

16S profiling and sequencing
The 16S gene dataset consists of Illumina MiSeq sequences targeting the V4 variable region. Detailed protocols used for 16S amplification and sequencing are as previously described [63]. In brief, genomic DNA was subjected to 16S amplifications using primers designed to incorporate both the Illumina adapters and a sample barcode sequence, allowing directional sequencing that covers variable region V4 (Primers: 515F [GTGCCAGCMGCCGCGGTAA] and 806R [GGACTACHVGGGTWTCTAAT]). PCR mixtures contained 10 μL of diluted template (1:50), 10 μL of HotMasterMix with the HotMaster Taq DNA Polymerase (5 Prime), and 5 μL of primer mix (2 μM of each primer). The cycling conditions consisted of an initial denaturation of 94°C for 3 min, followed by 30 cycles of denaturation at 94°C for 45 s, annealing at 50°C for 60 s, extension at 72°C for 5 min, and a final extension at 72°C for 10 min. Amplicons were quantified on the Caliper LabChipGX (PerkinElmer, Waltham, MA, USA), pooled in equimolar concentrations, and size selected (375–425 bp) on the Pippin Prep (Sage Sciences, Beverly, MA, USA) to reduce non-specific amplification products from host DNA. Finally, an Agilent Bioanalyzer (2100 DNA 1000 chips) (Agilent Technologies, Santa Clara, CA, USA) was used to determine the final concentration and size distribution of the library. Sequencing was performed on the Illumina MiSeq v2 platform, according to the manufacturer’s specifications, with addition of 5% PhiX, generating paired-end reads of 175 bp in length in each direction.

Bioinformatic processing of sequences
The overlapping paired-end reads were stitched together (approximately 97 bp overlap), size selected to reduce non-specific amplification products from host DNA (225–275 bp), and further processed in a data curation pipeline implemented in QIIME 1.5.0 as pick_reference_otus.py [64]. In brief, this pipeline picks OTUs using a reference-based method and constructs an OTU table. Taxonomy is assigned using the Greengenes predefined taxonomy map of reference sequence OTUs to taxonomy [65]. The resulting OTU tables are checked for mislabeling [66] and contamination [67], and are then used for further microbial community analysis and visualization. A mean sequence depth of 29,914 sequences/sample was obtained, and samples with fewer than 3,000 filtered sequences were excluded from analysis.
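The read-stitching arithmetic and the depth cutoff described above can be checked with a short Python sketch. The read length, overlap, size window, and minimum depth are the values quoted in the text; the function names and sample identifiers are hypothetical, and the sketch is not part of the QIIME pipeline itself.

```python
READ_LENGTH = 175         # paired-end read length in each direction (bp)
OVERLAP = 97              # approximate overlap between mates (bp)
SIZE_WINDOW = (225, 275)  # in silico size selection window (bp)
MIN_DEPTH = 3000          # minimum filtered sequences per sample


def merged_length(read_length, overlap):
    """Length of a stitched read pair: both mates minus the shared overlap."""
    return 2 * read_length - overlap


def passes_size_selection(length, window=SIZE_WINDOW):
    """True if a stitched read falls inside the size selection window."""
    low, high = window
    return low <= length <= high


def keep_samples(depths, min_depth=MIN_DEPTH):
    """Drop samples with fewer than min_depth filtered sequences.

    depths maps a (hypothetical) sample identifier to its read count.
    """
    return {sample: d for sample, d in depths.items() if d >= min_depth}


if __name__ == "__main__":
    length = merged_length(READ_LENGTH, OVERLAP)
    print(length, passes_size_selection(length))
    print(keep_samples({"sample_a": 2999, "sample_b": 29914}))
```

Two 175 bp mates sharing about 97 bp give a stitched amplicon of about 253 bp, which sits in the middle of the 225–275 bp window.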
Power calculations and gene/microbial feature selection
Initial power calculation
Power estimation was performed by simulation of correlated variable pairs with standard normal distribution and a sample size of 196. The 90th percentile of raw P values of the Spearman correlation test was calculated as a function of true covariance of the variables. The number of allowable tests for 90% power and 5% type I error rate was estimated by Bonferroni correction, 0.05 divided by the 90th percentile calculated as above. The number of allowable tests increases with the assumed true covariance of the variable pair, but is approximately 100 for a true covariance of 0.35, and 10^5 for a true covariance of 0.45 (Additional file 1A). This analysis was performed by the associated corpower.Rnw script.

Microbial feature reduction
The data were first filtered by removing OTUs without at least three counts in at least three samples. Next, OTUs were hierarchically summed at all taxonomic levels, and these counts were normalized to relative abundance. Features were then filtered again to require a mean abundance across all samples of at least 0.005, and an abundance of 0.05 in at least one sample. This left 129 features, to which we applied unsupervised (PCA) and supervised (hierarchical clustering) reduction. For PCA, a variance-stabilizing arcsine square-root transformation was applied. Next, standard Principal Component Analysis of scaled features was used to capture major axes of variation, keeping enough components to account for 50% of variance. The previously documented ‘horseshoe effect’ in Principal Component Analysis of compositional data [68] was present (Additional file 1B) but was not so extreme as to overly diminish the utility of Principal Component Analysis. Interpretation of microbial principal components was guided by a loadings plot (Figure 1B, Additional file 3A and B, Additional file 1E). PCA reduced the 129 clades to nine cPCs. For supervised feature reduction to allow pairwise comparison to host transcriptome features, we performed hierarchical clustering of clades with abundance of at least 10^-4 in 10% of samples, 1 minus Pearson correlation dissimilarity measure, and default options for the hclust R function, then finally cutting the tree at height 0.5 and selecting the feature with smallest mean. This approach was confirmed visually to select reasonable microbial representatives (Figure 1C). This analysis was performed by the associated preparePCLfiles.Rnw script. It reduced the total number of features from 129 to 45.

Host transcriptome feature reduction
Supervised feature reduction: Targeted gene selection was applied to the transcriptomic data in order to reduce its dimensionality. In a first wave of filtering, 174 genes prioritized as IBD-associated in the most recent and largest genome wide association study of the disease [4] were selected for further statistical analysis. In addition, 272 genes which were previously shown to physically interact with bacterial partners from Bacillus anthracis, Francisella tularensis, and Yersinia pestis based on yeast two-hybrid experiments [22] were also chosen. Preselected genes were then aggregated into 75 clusters based on their co-expression pattern using the Pearson metric and semi-supervised Ward clustering [69]. A representative gene was selected from each cluster by the k-medoids algorithm [70]. Finally, due to their importance to the pathogenesis of IBD, the following genes were manually curated and added to the existing medoids: NOD2, IL23R, PTPN22, FUT2, NFKB1, MMEL1, IFNG, IL10, IL1RN, CD14, IL8, TLR1, TNF, and NOX3.

Unsupervised transcript reduction: Principal component analysis of host transcriptome data was performed on all PPI and pouch samples, keeping a sufficient number of components to account for 50% of variance. The only filter applied to whole-transcriptome data for PCA was to remove transcripts with variance below the median variance of all transcripts (that is, filtering out the lower-variance half of transcripts). Interpretation of the principal component axes was assisted by inspection of the top 25 genes by magnitude of loadings, and by Enrichment Analysis using the wilcoxGST function of the limma package with ‘C2.CP.biocarta’ v3.1 mSigDB pathways [71] (Additional file 1C). This analysis was performed by the associated PCA.Rnw script.

Major phenotypic associations of the microbiome and host transcriptome
A linear model was fit for each microbial clade and for each transcript separately, with respect to antibiotics (yes/no), outcome (NP, CP, CDL, AP, and FAP), inflammation (0–13), and tissue location (pouch/PPI), using the lm R function. Nominal statistical significance of each feature was assessed by analysis of variance F-test of the fit. For the effect of tissue location, all 255 pouch and pre-pouch ileum (PPI) samples were used; for antibiotics, inflammation, and outcome, the PPI samples from each of the 196 individuals were used. The latter tests were repeated using all samples, with a random intercept for individual, using the glmmPQL function of the MASS R package. This analysis was performed for whole transcriptome data, and for all microbial clades passing the ‘3 counts in 3 samples’ filter described above, by the associated sourcesOfVariation.Rnw script.
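The '3 counts in 3 samples' filter, the abundance filters, and the arcsine square-root transformation described under Microbial feature reduction can be sketched in Python as follows. The thresholds are those stated in the text; the function names and the toy count table are hypothetical, and the hierarchical summing of OTUs across taxonomic levels is omitted, so this is a simplified illustration rather than the authors' preparePCLfiles.Rnw script.

```python
import math


def filter_counts(counts, min_count=3, min_samples=3):
    """Keep features with at least min_count reads in at least min_samples samples.

    counts maps a feature name to a list of per-sample read counts.
    """
    return {f: c for f, c in counts.items()
            if sum(1 for v in c if v >= min_count) >= min_samples}


def to_relative_abundance(counts):
    """Normalize each sample (column) so that its features sum to 1."""
    n = len(next(iter(counts.values())))
    totals = [sum(c[j] for c in counts.values()) for j in range(n)]
    return {f: [c[j] / totals[j] if totals[j] else 0.0 for j in range(n)]
            for f, c in counts.items()}


def filter_abundance(rel, min_mean=0.005, min_peak=0.05):
    """Keep features with mean abundance >= min_mean and a peak >= min_peak."""
    return {f: a for f, a in rel.items()
            if sum(a) / len(a) >= min_mean and max(a) >= min_peak}


def asin_sqrt(rel):
    """Variance-stabilizing arcsine square-root transform of proportions."""
    return {f: [math.asin(min(1.0, math.sqrt(v))) for v in a]
            for f, a in rel.items()}


if __name__ == "__main__":
    toy = {  # hypothetical counts for four samples
        "clade_A": [50, 60, 40, 70],
        "clade_B": [3, 3, 3, 0],
        "clade_C": [1, 0, 2, 0],
        "clade_D": [47, 37, 57, 30],
    }
    kept = filter_abundance(to_relative_abundance(filter_counts(toy)))
    print(sorted(kept))
    print(asin_sqrt(kept)["clade_A"])
```

In this toy table, clade_C fails the count filter and clade_B passes it but never reaches an abundance of 0.05 in any sample, so only clade_A and clade_D would be carried forward to PCA or clustering.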
Using biplots to visualize associations between transcripts, clades, and metadata
We used the scriptBiplotTSV.R script from the Breadcrumbs software package [23] to generate a biplot showing the relationships between clades, metadata, and transcripts of interest (Figure 2). This script plots a tsv (transposed PCL) file as a biplot. The positioning of sample markers and clade text is generated by non-metric multidimensional scaling (R Vegan package). The metadata are represented by arrows, labeled by text at the head of the arrow. Arrow coordinates are determined by the coordinates of the samples and show the central tendency of the metadata.

Using multivariate analysis with linear modeling to model host/microbe metadata associations
MaAsLin (multivariate analysis with linear modeling) [5,28] was used to find associations between microbes, transcripts, and metadata. As many of the strongest univariate associations in this dataset (for example, chronic pouchitis and abundant Enterococcus) would be obviously due to either antibiotic use or inflammation, and thus of less interest than associations which were not directly attributable to either, we used a multivariate linear model to correct for antibiotic use, FAP/nonFAP outcome, and inflammation score. The model used was gene ~ clade + antibiotic + ISCORE + OutcomeFAP/nonFAP, with arcsin-square root variance stabilizing transformation of clade. Bonferroni false discovery correction was used with a threshold of q <0.25. Input files used for MaAsLin are available from [72].

Discriminant assessment of host/microbe interactions in pouchitis outcomes
Linear discriminant analysis (LDA) was used to discriminate clinical outcome (AP, CP, NP, FAP, CDL) based on expression patterns of 75 gene medoids and 45 clades. As there were many more PPI samples (196) than pouch samples (59), to ensure all samples were equally represented, only PPI samples were used. Because antibiotic use was not uniformly distributed across outcomes, we removed all samples with recent antibiotic use for discrimination of clinical outcome. This left 55 AP samples, 18 CDL samples, five CP samples, 20 FAP samples, and 46 NP samples for LDA analysis. Discrimination models were fit with three different sets of covariates: transcripts only, clades only, and transcripts plus clades together. Model fitting and assessment of discrimination by 10-fold cross validation were performed using the R package ‘caret,’ within the script ldaprediction.Rnw from [73]. Ten-fold cross-validation was used to calculate accuracy of discrimination. For each clinical outcome and each model (transcripts only, clades only, and clades + transcripts), a ROC plot was constructed using the roc function from the pROC library, using the 10-fold cross-validated posterior probabilities from the lda function of the MASS library. Ninety-five percent confidence intervals were estimated using the ci function from the pROC package (Additional file 6).

Data availability
16S sequence data for this project have been filtered to remove human sequences and are publicly available as Bioproject PRJNA269954; dbGaP accession number: phs000659.v1.p1 contains a subset of these data. Microarray data are available from GEO as GSE65270; GSE40292 contains a subset of these data. Metadata are available at [74].

Additional files

Additional file 1: Figure S1. Data reduction. (A) (Top) 90th percentile of raw P values of Spearman correlation test, as a function of true covariance between the variables. Variables are standard normal distributed, so covariance equals Pearson product moment. (Bottom) Number of tests possible to retain 90% power and alpha equal to 0.05, using Bonferroni correction. Variables are standard normal distributed, so covariance equals Pearson product moment. (B) Principal component analysis for cPC1 and cPC2. The documented ‘horseshoe effect’ is noticeable, but not extreme. (C) Gene set enrichment analysis (GSEA) was used to detect categories for which the gPCs were enriched and assist in interpretation (see Methods). Only gPCs and gene sets with at least one significant P value after Bonferroni correction (q <0.1) are shown. (D) The top 25 loading values for each clade principal component. The blue/orange scale bar corresponds to a decrease or increase in the relative abundance of the clade in the principal component.

Additional file 2: Figure S2. The transcriptome and microbiome in paired samples. The Pearson correlation was calculated for host transcripts in all paired pouch-PPI samples, and the Bray-Curtis distance was calculated for all microbiome samples. Ordinations were calculated for Bray-Curtis and for (1-Pearson correlation). Paired samples are connected with a line on ordinations. Plots show the difference between samples between locations for genes (top) and for microbes (bottom).

Additional file 3: Supplementary data tables. A: The top 25 loadings for each clade principal component (cPC). B: The top 25 loadings for each gene (host transcript) principal component (gPC). C: The list of 75 gene medoids, each of which represents a cluster of genes with a similar expression profile. D: List of P values of differential expression in pouch for all metadata for all clades. E: List of P values of differential expression in pre-pouch ileum for all metadata for all clades. F: List of P values of differential expression in pouch for all metadata for all genes. G: List of P values of differential expression in pre-pouch ileum for all metadata for all genes. H: List of P values of differential expression in all samples for all metadata for all genes, calculated using random intercept of individual. I: List of P values of differential expression in all samples for all metadata for all clades, calculated using random intercept of individual.

Additional file 4: Figure S3. GOrilla analysis. GOrilla was used to measure for functional enrichment between genes differentially expressed in pouch and pre-pouch ileum (Additional file 3). There was a major difference in transporter expression between the two sites.

Additional file 5: Figure S4. Data stratification. (A) cPC1, cPC3, cPC6, cPC8, gPC8, and gPC9 were the principal components that significantly associated with one another in multivariate linear analysis. This figure shows the expression of each of these components in PPI samples when stratified by antibiotic use and by clinical outcome. (B) This figure shows the distribution of antibiotic use in the cohort, stratified by sample type (pouch vs. PPI) and clinical outcome. (C) The distribution of Enterococcaceae in samples, stratified by clinical outcome and antibiotic use. It is abundant
almost exclusively in chronic pouchitis patients with recent antibiotic use. (D) gPC9, plotted relative to patient histological inflammation score. (E) Escherichia abundance, plotted relative to gPC9.

Additional file 6: Figure S5. Linear discriminant analysis for discrimination of clinical outcome. (A) Summary of LDA prediction for samples without antibiotics. Top: Areas under the curve for LDA discrimination models. A single model was fit with 5-level response. Ten-fold cross-validated class probabilities for each level (AP, CDL, CP, NP, FAP) were used to construct ROC plots for that outcome. Ninety-five percent confidence intervals were estimated using the ci function from the pROC package. Bottom: Individual ROC plots for each possible outcome, using genes only, clades only, and genes + clades. For each model, the ROC plot was constructed using the roc function from the pROC library, from 10-fold cross-validated posterior probabilities from the lda function of the MASS library. (B) Summary of LDA prediction using all samples (with and without antibiotics). These were calculated as described in (A). (C) LDA score scatterplots for the phenotypes show which LDAs discriminate for which phenotypes. Only the scatterplots for antibiotic-free samples are shown. Scatterplots for genes (left) and for clades (right) are shown. Scatterplots are colored for visualization. (D) Linear discriminant loadings plots show which genes and microbes are most elevated or decreased in LDs 1 to 4 (and are thus most discriminant).

Competing interests
CH is a member of the scientific advisory board for SeresHealth™. The other authors declare that they have no competing interests.

Authors’ contributions
CH, RJX, MS, and DG conceived and designed the study. BK and AT collected the experimental data. DG generated sequencing data. JS coordinated patient data. RM classified patient phenotypes. XCM and LW analyzed data. XCM, LW, TT, and BK performed computational analysis. XCM, CH, MS, BK, AT, LW, and RJX interpreted the data. XCM and CH drafted the manuscript. All authors have read and approved the manuscript for publication.

Acknowledgements
This study was supported by grants NIH R01HG005969, DBI-1053486 (National Science Foundation), and PLF-5972-GD (Danone Research) to CH, and funding from Crohn’s and Colitis Canada, CIHR, and Zane Cohen Centre for Digestive Diseases and Mount Sinai Hospital to MS.

Author details
1Department of Biostatistics, Harvard T. H. Chan School of Public Health, 655 Huntington Ave, Boston, MA 02115, USA. 2The Broad Institute of MIT and Harvard, 415 Main St, Cambridge, MA 02142, USA. 3Mount Sinai Hospital, Zane Cohen Centre for Digestive Diseases, University of Toronto, 600 University Ave, Toronto, ON M5G 1X5, Canada. 4City University of New York School of Public Health, Hunter College, 2180 3rd Ave Rm 538, New York, NY 10035-4003, USA.

Received: 9 October 2014 Accepted: 18 March 2015

References
1. Landy J, Al-Hassi HO, McLaughlin SD, Knight SC, Ciclitira PJ, Nicholls RJ, et al. Etiology of pouchitis. Inflamm Bowel Dis. 2012;18:1146–55.
2. Tyler AD, Milgrom R, Stempak JM, Xu W, Brumell JH, Muise AM, et al. The NOD2insC polymorphism is associated with worse outcome following ileal pouch-anal anastomosis for ulcerative colitis. Gut. 2013;62:1433–9.
3. McLaughlin SD, Clark SK, Tekkis PP, Nicholls RJ, Ciclitira PJ. The bacterial pathogenesis and treatment of pouchitis. Therap Adv Gastroenterol. 2010;3:335–48.
4. Jostins L, Ripke S, Weersma RK, Duerr RH, McGovern DP, Hui KY, et al. Host-microbe interactions have shaped the genetic architecture of inflammatory bowel disease. Nature. 2012;491:119–24.
5. Morgan XC, Tickle TL, Sokol H, Gevers D, Devaney KL, Ward DV, et al. Dysfunction of the intestinal microbiome in inflammatory bowel disease and treatment. Genome Biol. 2012;13:R79.
6. Baumgart M, Dogan B, Rishniw M, Weitzman G, Bosworth B, Yantiss R, et al. Culture independent analysis of ileal mucosa reveals a selective increase in invasive Escherichia coli of novel phylogeny relative to depletion of Clostridiales in Crohn's disease involving the ileum. Isme J. 2007;1:403–18.
7. Joossens M, Huys G, Cnockaert M, De Preter V, Verbeke K, Rutgeerts P, et al. Dysbiosis of the faecal microbiota in patients with Crohn’s disease and their unaffected relatives. Gut. 2011;60:631–7.
8. Ott SJ, Musfeldt M, Wenderoth DF, Hampe J, Brant O, Folsch UR, et al. Reduction in diversity of the colonic mucosa associated bacterial microflora in patients with active inflammatory bowel disease. Gut. 2004;53:685–93.
9. Tyler AD, Knox N, Kabakchiev B, Milgrom R, Kirsch R, Cohen Z, et al. Characterization of the gut-associated microbiome in inflammatory pouch complications following ileal pouch-anal anastomosis. PLoS One. 2013;8:e66934.
10. Khan KJ, Ullman TA, Ford AC, Abreu MT, Abadir A, Marshall JK, et al. Antibiotic therapy in inflammatory bowel disease: a systematic review and meta-analysis. Am J Gastroenterol. 2011;106:661–73.
11. Wang SL, Wang ZR, Yang CQ. Meta-analysis of broad-spectrum antibiotic therapy in patients with active inflammatory bowel disease. Exp Ther Med. 2012;4:1051–6.
12. Wu H, Shen B. Pouchitis: lessons for inflammatory bowel disease. Curr Opin Gastroenterol. 2009;25:314–22.
13. de Silva HJ, Millard PR, Kettlewell M, Mortensen NJ, Prince C, Jewell DP. Mucosal characteristics of pelvic ileal pouches. Gut. 1991;32:61–5.
14. Young VB, Raffals LH, Huse SM, Vital M, Dai D, Schloss PD, et al. Multiphasic analysis of the temporal development of the distal gut microbiota in patients following ileal pouch anal anastomosis. Microbiome. 2013;1:9.
15. McLaughlin SD, Walker AW, Churcher C, Clark SK, Tekkis PP, Johnson MW, et al. The bacteriology of pouchitis: a molecular phylogenetic analysis using 16S rRNA gene cloning and sequencing. Ann Surg. 2010;252:90–8.
16. Zella GC, Hait EJ, Glavan T, Gevers D, Ward DV, Kitts CL, et al. Distinct microbiome in pouchitis compared to healthy pouches in ulcerative colitis and familial adenomatous polyposis. Inflamm Bowel Dis. 2011;17:1092–100.
17. Kabakchiev B, Tyler A, Stempak JM, Milgrom R, Silverberg MS. Downregulation of expression of xenobiotic efflux genes is associated with pelvic pouch inflammation in ulcerative colitis. Inflamm Bowel Dis. 2014;20:1157–64.
18. Ben-Shachar S, Yanai H, Baram L, Elad H, Meirovithz E, Ofer A, et al. Gene expression profiles of ileal inflammatory bowel disease correlate with disease phenotype and advance understanding of its immunopathogenesis. Inflamm Bowel Dis. 2013;19:2509–21.
19. Ringner M. What is principal component analysis? Nat Biotech. 2008;26:303–4.
20. Alter O, Brown PO, Botstein D. Singular value decomposition for genome-wide expression data processing and modeling. Proc Natl Acad Sci U S A. 2000;97:10101–6.
21. Biswas S, Storey JD, Akey JM. Mapping gene expression quantitative trait loci by singular value decomposition and independent component analysis. BMC Bioinformatics. 2008;9:244.
22. Dyer MD, Neff C, Dufford M, Rivera CG, Shattuck D, Bassaganya-Riera J, et al. The human-bacterial pathogen protein interaction networks of Bacillus anthracis, Francisella tularensis, and Yersinia pestis. PLoS One. 2010;5:e12089.
23. Breadcrumbs. [http://huttenhower.sph.harvard.edu/biobakery/breadcrumbs].
24. Nagy E, Foldes J. Inactivation of metronidazole by Enterococcus faecalis. J Antimicrob Chemother. 1991;27:63–70.
25. Perry JD, Ford M, Gould FK. Susceptibility of enterococci to ciprofloxacin. J Antimicrob Chemother. 1994;34:297–8.
26. Eden E, Navon R, Steinfeld I, Lipson D, Yakhini Z. GOrilla: a tool for discovery and visualization of enriched GO terms in ranked gene lists. BMC Bioinformatics. 2009;10:48.
27. Segata N, Izard J, Waldron L, Gevers D, Miropolsky L, Garrett WS, et al. Metagenomic biomarker discovery and explanation. Genome Biol. 2011;12:R60.
28. MaAsLiN. [http://huttenhower.sph.harvard.edu/maaslin].
29. Everard A, Belzer C, Geurts L, Ouwerkerk JP, Druart C, Bindels LB, et al. Cross-talk between Akkermansia muciniphila and intestinal epithelium controls diet-induced obesity. Proc Natl Acad Sci U S A. 2013;110:9066–71.
30. Williams BL, Hornig M, Parekh T, Lipkin WI. Application of novel PCR-based methods for detection, quantitation, and phylogenetic characterization of Sutterella species in intestinal biopsy samples from children with autism and gastrointestinal disturbances. mBio. 2012;3:e00261–11.
31. Wang L, Christophersen CT, Sorich MJ, Gerber JP, Angley MT, Conlon MA. Increased abundance of Sutterella spp. and Ruminococcus torques in feces of children with autism spectrum disorder. Mol Autism. 2013;4:42.
32. Gevers D, Kugathasan S, Denson LA, Vazquez-Baeza Y, Van Treuren W, Ren B, et al. The treatment-naive microbiome in new-onset Crohn’s disease. Cell Host Microbe. 2014;15:382–92.
33. Human Microbiome Project C. Structure, function and diversity of the healthy human microbiome. Nature. 2012;486:207–14.
34. Qin J, Li R, Raes J, Arumugam M, Burgdorf KS, Manichanh C, et al. A human gut microbial gene catalogue established by metagenomic sequencing. Nature. 2010;464:59–65.
35. Yasuda K, Oh K, Ren B, Tickle TL, Franzosa EA, Wachtman LM, et al. Biogeography of the intestinal mucosal and lumenal microbiome in the rhesus macaque. Cell Host Microbe. 2015;17:385–91.
36. Sharon I, Morowitz MJ, Thomas BC, Costello EK, Relman DA, Banfield JF. Time series community genomics analysis reveals rapid shifts in bacterial species, strains, and phage during infant gut colonization. Genome Res. 2013;23:111–20.
37. Koenig JE, Spor A, Scalfone N, Fricker AD, Stombaugh J, Knight R, et al. Succession of microbial consortia in the developing infant gut microbiome. Proc Natl Acad Sci U S A. 2011;108:4578–85.
54. Kang CS, Ban M, Choi EJ, Moon HG, Jeon JS, Kim DK, et al. Extracellular vesicles derived from gut microbiota, especially Akkermansia muciniphila, protect the progression of dextran sulfate sodium-induced colitis. PLoS One. 2013;8:e76520.
55. Ganesh BP, Klopfleisch R, Loh G, Blaut M. Commensal Akkermansia muciniphila exacerbates gut inflammation in Salmonella Typhimurium-infected gnotobiotic mice. PLoS One. 2013;8:e74963.
56. Race AM, Steven RT, Palmer AD, Styles IB, Bunch J. Memory efficient principal component analysis for the dimensionality reduction of large mass spectrometry imaging data sets. Anal Chem. 2013;85:3071–8.
57. Engreitz JM, Daigle Jr BJ, Marshall JJ, Altman RB. Independent component analysis: mining microarray data for fundamental human gene expression modules. J Biomed Inform. 2010;43:932–44.
58. Korkeila EA, Sundstrom J, Pyrhonen S, Syrjanen K. Carbonic anhydrase IX, hypoxia-inducible factor-1alpha, ezrin and glucose transporter-1 as predictors of disease outcome in rectal cancer: multivariate Cox survival models following data reduction by principal component analysis of the clinicopathological predictors. Anticancer Res. 2011;31:4529–35.
38. Makino H, Kushiro A, Ishikawa E, Kubota H, Gawad A, Sakai T, et al. 59. Heuschen UA, Autschbach F, Allemeyer EH, Zollinger AM, Heuschen G,
Mother-to-infant transmission of intestinal bifidobacterial strains has Uehlein T, et al. Long-term follow-up after ileoanal pouch procedure:
an impact on the early development of vaginally delivered infant's algorithm for diagnosis, classification, and management of pouchitis. Dis
microbiota. PLoS One. 2013;8:e78331. Colon Rectum. 2001;44:487–99.
39. Kostic AD, Gevers D, Siljander H, Vatanen T, Hyotylainen T, Hamalainen AM, 60. Schroeder A, Mueller O, Stocker S, Salowsky R, Leiber M, Gassmann M, et al.
et al. The dynamics of the human infant gut microbiome in The RIN: an RNA integrity number for assigning integrity values to RNA
development and in progression toward type 1 diabetes. Cell Host measurements. BMC Mol Biol. 2006;7:3.
Microbe. 2015;17:260–73. 61. Irizarry RA, Hobbs B, Collin F, Beazer-Barclay YD, Antonellis KJ, Scherf U, et al.
40. David LA, Maurice CF, Carmody RN, Gootenberg DB, Button JE, Wolfe BE, Exploration, normalization, and summaries of high density oligonucleotide
et al. Diet rapidly and reproducibly alters the human gut microbiome. array probe level data. Biostatistics. 2003;4:249–64.
Nature. 2014;505:559–63. 62. Johnson WE, Li C, Rabinovic A. Adjusting batch effects in microarray
41. Lofmark S, Edlund C, Nord CE. Metronidazole is still the drug of choice for expression data using empirical Bayes methods. Biostatistics. 2007;8:118–27.
treatment of anaerobic infections. Clin Infect Dis. 2010;50:S16–23. 63. Caporaso JG, Lauber CL, Walters WA, Berg-Lyons D, Huntley J, Fierer N, et al.
42. Sheng WS, Hsueh PR, Hung CC, Teng LJ, Chen YC, Luh KT. Clinical features Ultra-high-throughput microbial community analysis on the Illumina HiSeq
of patients with invasive Eikenella corrodens infections and microbiological and MiSeq platforms. ISME J. 2012;6:1621–4.
characteristics of the causative isolates. Eur J Clin Microbiol Infect Dis. 64. Caporaso JG, Kuczynski J, Stombaugh J, Bittinger K, Bushman FD, Costello
2001;20:231–6. EK, et al. QIIME allows analysis of high-throughput community sequencing
43. Lubbe MM, Stanley K, Chalkley LJ. Prevalence of nim genes in anaerobic/ data. Nat Methods. 2010;7:335–6.
facultative anaerobic bacteria isolated in South Africa. FEMS Microbiol Lett. 65. McDonald D, Price MN, Goodrich J, Nawrocki EP, DeSantis TZ, Probst A,
1999;172:79–83. et al. An improved Greengenes taxonomy with explicit ranks for ecological
44. de Been M, van Schaik W, Cheng L, Corander J, Willems RJ. Recent and evolutionary analyses of bacteria and archaea. ISME J. 2012;6:610–8.
recombination events in the core genome are associated with adaptive 66. Knights D, Kuczynski J, Koren O, Ley RE, Field D, Knight R, et al. Supervised
evolution in Enterococcus faecium. Genome Biol Evol. 2013;5:1524–35. classification of microbiota mitigates mislabeling errors. ISME J. 2011;5:570–3.
45. Rams TE, Feik D, Mortensen JE, Degener JE, van Winkelhoff AJ. Antibiotic 67. Knights D, Kuczynski J, Charlson ES, Zaneveld J, Mozer MC, Collman RG,
susceptibility of periodontal Enterococcus faecalis. J Periodontol. 2013;84:1026–33. et al. Bayesian community-wide culture-independent microbial source
tracking. Nat Methods. 2011;8:761–3.
46. Lucas GM, Lechtzin N, Puryear DW, Yau LL, Flexner CW, Moore RD.
68. Legendre P, Gallagher E. Ecologically meaningful transformations for
Vancomycin-resistant and vancomycin-susceptible enterococcal
ordination of species data. Oecologia. 2001;129:271–80.
bacteremia: comparison of clinical features and outcomes. Clin Infect
69. Ward JH. Hierarchical grouping to optimize an objective function. J Am Stat
Dis. 1998;26:1127–33.
Assoc. 1963;58:236–44.
47. Rafii F, Wynne R, Heinze TM, Paine DD. Mechanism of metronidazole-
70. Hastie T, Tibshirani R, Friedman J. The elements of statistical learning: data
resistance by isolates of nitroreductase-producing Enterococcus gallinarum
mining, inference and prediction. 2nd ed. New York: Springer; 2009.
and Enterococcus casseliflavus from the human intestinal tract. FEMS
71. Gene Set Enrichment Analysis. [http://www.broadinstitute.org/gsea]
Microbiol Lett. 2003;225:195–200.
72. Pouchitis. [https://bitbucket.org/biobakery/pouchitis-public/]
48. Jia W, Li G, Wang W. Prevalence and antimicrobial resistance of
73. Pouchitis Source. [https://bitbucket.org/biobakery/pouchitis-public/src]
Enterococcus species: a hospital-based study in China. Int J Environ Res
74. Pouchitis2015. [http://huttenhower.sph.harvard.edu/pouchitis2015]
Public Health. 2014;11:3424–42.
49. Sadowy E, Sienko A, Gawryszewska I, Bojarska A, Malinowska K, Hryniewicz
W. High abundance and diversity of antimicrobial resistance determinants
among early vancomycin-resistant Enterococcus faecium in Poland. Eur J
Clin Microbiol Infect Dis. 2013;32:1193–203. Submit your next manuscript to BioMed Central
50. Sreeja S, Babu PRS, Prathab AG. The prevalence and the characterization of and take full advantage of:
the enterococcus species from various clinical samples in a tertiary care
hospital. J Clin Diagn Res. 2012;6:1486–8.
• Convenient online submission
51. Gutierrez Martin CB, Rodriguez Ferri EF. In vitro susceptibility of Pasteurella
multocida subspecies multocida strains isolated from swine to 42 • Thorough peer review
antimicrobial agents. Zentralbl Bakteriol. 1993;279:387–93. • No space constraints or color figure charges
52. Elliott TR, Hudspith BN, Wu G, Cooley M, Parkes G, Quinones B, et al.
Quantification and characterization of mucosa-associated and intracellular • Immediate publication on acceptance
Escherichia coli in inflammatory bowel disease. Inflamm Bowel Dis. • Inclusion in PubMed, CAS, Scopus and Google Scholar
2013;19:2326–38. • Research which is freely available for redistribution
53. Winter SE, Winter MG, Xavier MN, Thiennimitr P, Poon V, Keestra AM, et al.
Host-derived nitrate boosts growth of E. coli in the inflamed gut. Science.
2013;339:708–11. Submit your manuscript at
www.biomedcentral.com/submit