Identification of Drought Stress-Responsive Genes in Rice (Oryza sativa) by Meta-Analysis of Microarray Data
https://doi.org/10.1007/s12041-020-01195-w
RESEARCH ARTICLE
PREETI SIROHI, BIRENDRA S. YADAV, SHADMA AFZAL, ASHUTOSH MANI and NAND K. SINGH*
Department of Biotechnology, Motilal Nehru National Institute of Technology Allahabad, Prayagraj 211 004, India
*For correspondence. E-mail: [email protected], [email protected].
Abstract. Meta-analysis provides systematic access to previously studied microarray datasets and can recognize common signatures of stresses. Three different datasets of abiotic stresses on rice were used for meta-analysis. These microarray datasets were normalized to adjust for technical variation, as opposed to biological differences between the samples. A t-test was performed to identify the differentially-expressed genes (DEGs) between stressed and normal samples. Gene ontology enrichment analysis revealed the functional distribution of DEGs under different stress conditions. Further analysis was carried out using the software RICE NET DB, and the enriched terms were divided into three categories: biological process (homoiothermy and protein amino acid phosphorylation), cellular component (nucleus and membrane), and molecular function (zinc ion binding and DNA binding). The study revealed that 5686 genes were consistently differentially expressed in Oryza sativa (2089 upregulated and 3597 downregulated). The lowest P value (P = 0.003756) among upregulated DEGs was observed for naringenin, 2-oxoglutarate 3-dioxygenase protein. The lowest P value (P = 0.002866816) among the downregulated DEGs was recorded for retrotransposon protein. The network constructed from 48 genes revealed 10 hub genes that are connected with topological genes. These hub genes are stress-responsive genes that may also be regarded as marker genes for the drought stress response. Our study reports a new set of hub genes (reference genes) that potentially have a significant role in the development of stress-tolerant rice.
Keywords. meta-analysis; drought; abiotic stress; downregulated; upregulated; differentially expressed genes; hub genes.
drought affected the total DNA methylation pattern, which amounted to an average of 12.1% methylation differences when calculated across different tissues, genotypes and developmental stages (Saraswat et al. 2017). In plants, the cell wall is composed of cellulose connected to hemicellulose. Besides these major components, it also contains trace amounts of phenolics, esterase, pectin, expansin and several other proteins. Drought and other stresses cause the production of ROS, leading to crosslinking of phenolics with glycoproteins and resulting in hardening of the cell wall (Tenhaken 2014). This hardening of the cell wall ultimately reduces the leaf surface area and the photosynthetic rate per unit area (Basu et al. 2016).

Plants have diverse mechanisms to withstand drought at the cellular and physiological levels. At the physiological level, they reduce water loss through a decreased rate of diffusion through stomata and other parts, increase water absorption with an enhanced root system, and develop smaller, succulent leaves to diminish water deficiency. Drought stress in plants can be managed through a few tactics such as marker-assisted selection, breeding and mass screening, and an extracellular spray of hormones and osmoprotectants (Fang and Xiong 2015).

Plants experiencing biotic or abiotic stresses independently, with positive and negative influences, have shown crosstalk between them (Abuqamar et al. 2009). These studies have offered opportunities to improve the tolerance of plants to different individual stresses (Mao et al. 2010). Currently, gene expression data from separate experiments on biotic and abiotic stress have been exploited to find shared stress-responsive genes (Shaik and Ramakrishna 2013, 2014). Plants in nature are generally challenged by various biotic and abiotic stresses. Plants capable of tolerating two or more independent stresses are not necessarily capable of tolerating these stresses together (Atkinson and Urwin 2012; Ramegowda and Kumar 2015). Reports have shown that plants cope with coinciding biotic and abiotic stresses through responses that cannot be inferred from studies in which each stress is applied separately (Bostock et al. 2014). The tolerance of plants to coinciding biotic and abiotic stresses needs to be understood, because sufficient work has not been done for this purpose. One answer to this problem is to combine data from different stress conditions and find specific markers of the response to stress through meta-analysis.

Meta-analysis of available data has the potential to explore transcriptomic studies (Feichtinger et al. 2012). It is based on the statistical analysis of multiple studies with similar experimental conditions, from which one can identify variation and the reason for it. Comparing studies on a statistical basis increases the reliability of the outcome for a given set of data, which may take the form of differentially-expressed genes (DEGs) (Ramasamy et al. 2008).

Keeping the above in mind, this work was designed to study abiotic stresses on plants together by comparing microarray datasets of different stresses on rice (Oryza sativa). The objective of this study was that a meta-analysis of freely accessible microarray datasets of different biotic and abiotic stresses on rice can recognize a common signature of stresses, applying a meta-analysis method to various microarray datasets, which then validates the signature in individual datasets (Daves et al. 2011).

Materials and methods

Microarray datasets selection

Publicly available microarray studies were searched in the Gene Expression Omnibus (GEO) database using several keywords and their combinations, such as ‘O. sativa, abiotic stress, gene expression, microarray and genome’, for candidate O. sativa gene expression datasets. The data collected were further screened for eligibility and duplicate data were excluded (figure 1).

The available experimental datasets and their corresponding experimental conditions were restricted to gene expression profiling in O. sativa between control and stressed conditions. The data of stressed samples were collected after administration of stress conditions. In the present study, the meta-analysis was performed as per the guidelines of the PRISMA statement. Data were accessed from original studies having a GEO accession number, analysis platform, number of cases and controls, gene expression data and related references (Yang et al. 2014). Among the datasets, two were collected from root tissues and one from rice seedlings. The dataset with GEO ID GSE36661 was subjected to drought stress, while GSE62308 and GSE64576 were subjected to ABA-regulated drought stress (table 1).

Data curation

In data curation, normalization of the available data and related parameters is a crucial step in comparing microarray datasets. Direct comparison between datasets from various sources is very difficult; the differences arise mainly from the use of different platforms, gene nomenclature and tissue used as a control. Variation in normalization may distort the comparative outcome and reduce the reliability of the computed expression changes of candidate genes. As a consequence, there may be a need to consider a globally accepted normalization procedure for minimal inconsistency. The Z score transformation method is a reliable and sensitive tool to compute the expression intensities of each probe in gene expression profiles, and is computed by the given formula.

$$ \text{Z score} = \frac{x_i - \bar{x}}{\delta}, $$

where $x_i$ denotes raw data for each gene; $\bar{x}$ denotes average gene intensity within a single experiment and $\delta$ denotes
Drought stress-responsive genes in rice Page 3 of 10 35
Figure 1. Flowchart of the selection process for the microarray datasets used in the meta-analysis.
Table 1. Microarray datasets used in the meta-analysis.
GEO accession   Samples   Platform                       Tissue     Stress
GSE36661        3:3       Affymetrix rice genome array   Root       Disease and drought
GSE62308        2:2       Affymetrix rice genome array   Seedling   ABA-regulated drought stress
GSE64576        4:4       Affymetrix rice genome array   Root       ABA-regulated drought stress
standard deviation (SD) of all measured intensities (Yang et al. 2014).

Statistical analysis

To identify the DEGs between stressed and control samples, statistical significance analysis of microarrays (SAM) was used. Microsoft Excel 2010 was used to analyze the data. A one-tailed, paired t-test was performed to validate the significance of DEGs with a ‘comparative difference’ score for the screened candidate genes (Yang et al. 2014). The ratio of the average expression change between expression states to the standard deviation of values for that gene is termed the P value (Hou et al. 2014). DEGs were selected from those genes which showed at least two-fold changes, equivalent to a false discovery rate (FDR) < 0.01 (Tusher et al. 2001).

Venn diagram

Venn diagrams were prepared using the online tool Venny v2.1. Each microarray dataset was normalized and fold change values were calculated. Genes with fold change > 2 were selected as upregulated DEGs and genes with fold change < 2 were selected as downregulated DEGs. The lists of selected reference IDs were pasted into the online tool, which provides the common upregulated and common downregulated DEGs among all three microarray datasets. Venn diagrams
were also constructed for tissues vs conditions, as shown in figure 2.

The functional distribution and biological significance of the screened DEGs were further analysed on RICE NET DB (Narsai et al. 2013). This online tool performs gene ontology (GO) enrichment analysis and finds the biological significance of candidate genes in a wide variety of datasets (Madrid et al. 2012).

Coexpression networks are useful for associating genes that are involved in the same biological pathway or that are part of the same protein complexes (Moyano et al. 2015). The coexpression network was constructed for genes with fold change greater than ±1.5 and P value less than 0.05 by constructing the correlation coefficient matrix. Genes with a Pearson's correlation coefficient above the cut-off value, i.e. ±0.95, were used to construct the biological network using the expression correlation and network analyzer tools of Cytoscape (Shannon et al. 2003). In the biological network, nodes represent genes and edges represent the connectivity between genes. Hub genes (the most connected genes in the biological network) were also screened using the cytoHubba plugin of Cytoscape (Chin et al. 2014).
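The network construction described above can be illustrated with a minimal Python sketch. It is not the Cytoscape or cytoHubba implementation: it assumes that an edge is drawn whenever the absolute Pearson correlation between two genes exceeds 0.95, and that hub genes are ranked simply by degree. The function names and the four expression profiles are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length, non-constant profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def coexpression_edges(profiles, cutoff=0.95):
    """Connect every gene pair whose |r| exceeds the cut-off (assumed rule)."""
    edges = []
    for gene_a, gene_b in combinations(sorted(profiles), 2):
        r = pearson(profiles[gene_a], profiles[gene_b])
        if abs(r) > cutoff:
            edges.append((gene_a, gene_b, r))
    return edges


def top_hubs(edges, k=10):
    """Rank genes by degree (number of edges); ties are broken by gene name."""
    degree = {}
    for gene_a, gene_b, _ in edges:
        degree[gene_a] = degree.get(gene_a, 0) + 1
        degree[gene_b] = degree.get(gene_b, 0) + 1
    return sorted(degree.items(), key=lambda item: (-item[1], item[0]))[:k]


# Hypothetical expression profiles (one value per sample), not data from the study.
profiles = {
    "gene1": [1.0, 2.0, 3.0, 4.0],
    "gene2": [2.0, 4.0, 6.0, 8.0],
    "gene3": [4.0, 3.0, 2.0, 1.0],
    "gene4": [1.0, 3.0, 2.0, 4.0],
}
edges = coexpression_edges(profiles)
hubs = top_hubs(edges)
```

In this toy input, gene1, gene2 and gene3 are perfectly (anti)correlated and form a fully connected triangle, whereas gene4 correlates with the others at only ±0.8 and stays outside the network.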
Figure 3. Venn diagrams summarizing microarray data analysis: (a) Venn diagram showing number of upregulated and (b) downregulated
genes in rice after drought stress. Only genes with log2 fold change above 2 were considered for this analysis.
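The Z score normalization, fold-change filter and overlap steps described under 'Data curation' and 'Venn diagram' can be sketched in Python as follows. Several choices are assumptions rather than the authors' procedure: the population standard deviation is used for the Z score, downregulated genes are taken as those with fold change below the negative of the cut-off, and the gene names and fold-change values are hypothetical.

```python
from statistics import mean, pstdev


def z_scores(intensities):
    """Z score = (x_i - mean) / SD for every probe within one experiment."""
    centre = mean(intensities)
    spread = pstdev(intensities)  # assumption: population SD of all intensities
    return [(x - centre) / spread for x in intensities]


def split_degs(fold_changes, cutoff=2.0):
    """Return (upregulated, downregulated) gene sets for one dataset."""
    up = {gene for gene, fc in fold_changes.items() if fc > cutoff}
    # Assumption: a symmetric negative cut-off defines downregulation.
    down = {gene for gene, fc in fold_changes.items() if fc < -cutoff}
    return up, down


def common_degs(datasets, cutoff=2.0):
    """Intersect the DEG sets of all datasets, as a Venn diagram centre would."""
    ups, downs = zip(*(split_degs(d, cutoff) for d in datasets))
    return set.intersection(*ups), set.intersection(*downs)


# Hypothetical fold changes for three datasets, not values from the study.
datasets = [
    {"geneA": 3.1, "geneB": -4.2, "geneC": 0.4},
    {"geneA": 2.6, "geneB": -2.9, "geneC": 2.2},
    {"geneA": 5.0, "geneB": -3.3, "geneC": -0.1},
]
common_up, common_down = common_degs(datasets)
normalized = z_scores([2.0, 4.0, 6.0])
```

Only genes that pass the cut-off in every dataset survive the intersection, which is why geneC, upregulated in a single dataset, drops out.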
Top 10 downregulated DEGs (values listed column by column):
P value: 0.002866816, 0.006254085, 0.008920827, 0.009377673, 0.009379355, 0.012599032, 0.013500274, 0.014619313, 0.015094976, 0.015256946
Fold change: -4.286387491, -8.718961239, -2.317901962, -5.779275181, -3.883709153, -3.167526416, -3.991934519, -5.591710319, -3.387832634, -3.829875411
Description: Cytochrome P450; Expressed protein
Locus: LOC_Os12g30150, LOC_Os05g50730, LOC_Os04g24310, LOC_Os04g24319, LOC_Os04g24478, LOC_Os05g32730, LOC_Os06g32310, LOC_Os07g43180, LOC_Os06g08690, LOC_Os06g08710, LOC_Os09g31040
Probe ID: OsAffx.19894.1.S1_at, OsAffx.26087.1.S1_at, OsAffx.21309.1.S1_at, OsAffx.4679.1.S1_at, Os.46470.2.S1_x_at, Os.16233.1.S1_a_at

water shortage, depleting soil moisture and causing adverse effects on plants. Drought drastically impairs plant growth and development, with considerable reductions in crop growth rate and biomass accumulation. The meta-analysis approach integrates DEGs from microarray datasets that were expressed consistently with statistical significance, followed by GO enrichment analysis. High-throughput transcriptomic data enable meta-analysis of multiple datasets, which leads to the discovery of robust candidate genes for stress (Fang et al. 2015). In this study, we identified DEGs by comparing transcriptomic responses in stressed and normal rice. Most important candidate genes were identified in
upregulated DEGs express naringenin, 2-oxoglutarate 3-dioxygenase protein. The predicted subcellular location of this protein is cytosol chloroplast. This protein is involved in the flavonoid biosynthesis pathway, which is a part of secondary metabolite biosynthesis. The DEG with the lowest P value (P = 0.002866816) among the downregulated DEGs expresses a retrotransposon protein, which is a member of the cytochrome P450 protein family. Retrotransposons are regulators of gene expression mediated through an RNA intermediate (Elbarbary et al. 2016). Therefore, during drought conditions, plants become incapable of synthesizing proteins helpful in overcoming the stress condition (tables 2 & 3).

The top 10 significantly downregulated and upregulated DEGs are listed in tables 2 & 3. Among the most upregulated DEGs, most genes encode proteins such as heat shock protein STI, heat shock protein DnaJ, U-box domain-containing heat shock protein and hsp20/alpha crystallin family protein. In response to drought stress, plants develop protective strategies to cope with it. Synthesis of heat shock proteins is one of the protective tools of plants, providing a defense against drought and heat stress (Virdi et al. 2015).

Similarly, a few downregulated DEGs are observed in response to drought stress, which regulate proteins like
Figure 4. The top 10 enriched GO terms of upregulated DEGs. (a) Biological process for DEGs; (b) cellular component for DEGs;
(c) molecular function for DEGs.
Figure 5. The top 10 enriched GO terms of downregulated DEGs. (a) Biological process for DEGs; (b) cellular component for DEGs;
(c) molecular function for DEGs.
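The enrichment values behind figures 4 and 5 are reported as 'hyper P'. Assuming this refers to a one-sided hypergeometric test, which the text does not state explicitly, the calculation can be sketched as below. The function name, its parameters and the example counts are hypothetical and are not taken from RICE NET DB.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, hits):
    """P(X >= hits) when `sample` genes are drawn without replacement from a
    background of `population` genes, `annotated` of which carry the GO term."""
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical counts: 10 background genes, 4 annotated, 3 DEGs, 2 of them annotated.
p_value = hypergeom_enrichment_p(population=10, annotated=4, sample=3, hits=2)
```

A very small tail probability printed to four decimals appears as 0.0000, which is one possible reading of the values shown in the text.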
retrotransposon protein, diterpene phytoalexin precursor biosynthetic process pathway and ent-kaurene synthase, chloroplast precursor, and CAMK_CAMK_like.47 (CAMK includes calcium/calmodulin-dependent protein kinases). Calmodulin (CaM) acts as an integrator of different stress signalling pathways, which allows plants to maintain homeostasis between different cellular processes (Virdi et al. 2015). Under drought stress, plants encounter osmotic stress, which further induces a chain of responses at the molecular and cellular levels. Due to osmotic stress, the concentration of cytosolic Ca2+ increases, which transduces Ca2+ signals. Ca2+ signalling induces appropriate cellular responses to overcome the damage caused by drought stress (Zeng et al. 2015).

Functional annotation

The biological significance of the DEGs could be understood by performing GO enrichment analysis for O. sativa. Gene ontology provides a typical descriptive model for functional annotation and categorization of gene set information. GO groups are arranged into three categories, namely biological process, cellular component and molecular function. Genes meeting the 1% significance level (P < 0.01) were selected and tested against the background set of all genes with GO annotations. The biological process, molecular function and cellular component categories were investigated separately with the web-based software RICE NET DB. For biological process, the significantly enriched GO terms were protein amino acid phosphorylation (GO:0006468, hyper P = 0.0000) (figure 4a) and homoiothermy (GO:0042309, hyper P = 0.0000) (figure 5a); for cellular component, the enriched GO terms were membrane (GO:0016020, hyper P = 0.0000) (figure 4b) and nucleus (GO:0005634, hyper P = 0.0000) (figure 5b); and for molecular function, the enriched GO terms were DNA binding (GO:0003677, hyper P = 0.0000) (figure 4c) and zinc ion binding (GO:0008270, hyper P = 0.0000) (figure 5c).

Based on the GO analysis, the upregulated genes mostly performed DNA binding, i.e. transcription factors localized in the membrane and participating in protein amino acid phosphorylation, while downregulated genes mostly performed zinc ion binding, localized in the nucleus and participating in homoiothermy, which indicated
that the temperature regulation of the plant was affected by downregulation of the biological process homoiothermy. DNA-binding transcription factors such as CaM7 regulate the plant response to light signals to reduce the probable damage caused by drought stress. Post-translational modifications (PTMs) involve protein phosphorylation, which changes protein function, protein-protein interaction and cellular localization. Phosphorylated drought-responsive proteins play a major role in signalling, transcription and photosynthesis, as well as in protein synthesis. The investigation of physiological, molecular and proteomic studies related to drought-responsive traits gives insights for further understanding of plant drought tolerance (Wang et al. 2016).

Biological network analysis

A set of 52 genes having fold change > ±1.5 and P value < 0.05 was taken for construction of the correlation coefficient matrix. Of the 52 genes, 48 cleared the cut-off and the biological network was constructed (figure 6). Different parameters of the network obtained from microarray expression data of rice during drought stress are shown in table 4. The clustering coefficient is low, which represents the property of a biological network.

Table 4. Different parameters of the biological network obtained from microarray expression data of rice on the basis of Pearson's correlation coefficient during exposure to drought stress, using Cytoscape software.
Parameter                     Value
Clustering coefficient        0.3409
Shortest path                 337
Characteristic path length    1.7032
Average no. of neighbours     7.958
Network density               0
No. of nodes                  48

The top 10 hub genes identified from the network, with their degree, are shown in table 5. The maximum connectivity, i.e. degree of a gene, was 12 during drought stress. These genes are differentially expressed under different conditions and are the most connected genes, having a key role in different biological processes and molecular functions (figure 7). The hub genes identified during network analysis were alpha amylase isozyme 3D, which belongs to the glycosyl hydrolase 13 family, and the myb family transcription factor APL, which is required for phloem identity and regulates the expression of transcription factor NAC045 (which directs sieve element enucleation and cytosol degradation); they may also activate the transcription of specific genes involved in phosphate uptake or assimilation. Heat stress transcription factors (transcriptional regulators) that specifically bind the DNA of heat shock promoter elements (HSE) were identified as hub genes. OsWRKY71 (a transcription factor), identified as a hub gene, might function as a transcriptional regulator in rice defense signalling pathways. WRKY proteins are a large family of transcription factors that mainly participate in plant biotic stress responses; therefore they are responsible for the development of drought stress tolerance in rice (Liu et al. 2006).

Drought creates water deficiency in rice plants, which affects their physiological functions, as rice requires a large amount of water for its physiological functions. Hence, it
Figure 6. Biological network constructed on the basis of Pearson’s coefficient correlation using the expression data of rice under drought
stress.
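As a consistency check on the network parameters in table 4, the density of an undirected network without self-loops can be related to the average number of neighbours. The definition below is the standard graph-theoretic one and is assumed here, since the text does not state how the tabulated value was computed; $N$ (number of nodes), $E$ (number of edges) and $\bar{k}$ (average number of neighbours) are symbols introduced for this sketch.

$$ E = \frac{\bar{k}\,N}{2} \approx \frac{7.958 \times 48}{2} \approx 191, \qquad \text{density} = \frac{2E}{N(N-1)} = \frac{\bar{k}}{N-1} \approx \frac{7.958}{47} \approx 0.169 $$

Under this assumption, the reported average of 7.958 neighbours over 48 nodes corresponds to a density of about 0.17, so the tabulated value of 0 may reflect rounding or display precision.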
Table 5. Hub genes obtained from biological network of drought stress data of rice (O. sativa).
Figure 7. Biological networks constructed from expression data of rice under drought stress with hub genes, i.e. most connected genes.
adversely affects the yield of the rice crop. To overcome the problem of drought and develop drought-resistant/tolerant rice varieties, knowledge of the morphological, biochemical and molecular mechanisms involved in the response of rice to drought is very important for rice breeders (Nahar et al. 2016).

In conclusion, 5686 genes were consistently differentially expressed in O. sativa, among which 2089 genes were upregulated and 3597 genes were downregulated. The meta-analysis based on gene expression data of stressed rice has shown the fundamental differences between normal and stressed rice, including DEGs along with their biological functions, and may help to identify potential candidate genes for abiotic stress. Under drought stress, proteins like heat shock protein STI, heat shock protein DnaJ, U-box domain-containing heat shock protein and hsp20/alpha crystallin family protein were found upregulated to increase the defense mechanism of plants. On the other hand, proteins like retrotransposon protein, diterpene phytoalexin precursor biosynthetic process pathway and ent-kaurene synthase, chloroplast precursor, and CAMK_CAMK_like.47 (CAMK includes calcium/calmodulin-dependent protein kinases) were found downregulated to enhance the plant's adaptation during stress. In response to drought stress, intracellular Ca2+ levels change and induce signalling pathways which help plants to cope with the changing environmental conditions. CaM is one of the important proteins that decode Ca2+ signals and regulate the activities of diverse proteins. The heat stress transcription factors play a pivotal role in the drought stress response by regulating heat shock elements/promoters and help the plants to overcome this situation. This study gives researchers a broad view of the available microarray datasets, which can be used to find out how plants overcome different stresses/diseases. The identified hub genes also provide a platform to develop drought-tolerant rice varieties.