FemaleLiver 02 NetworkConstr Blockwise
2.c Dealing with large data sets: block-wise network construction and
module detection
Peter Langfelder and Steve Horvath
November 25, 2014
Contents
0 Preliminaries: setting up the R session
Important note: The code below uses parallel computation where multiple cores are available. This works well
when R is run from a terminal or from the Graphical User Interface (GUI) shipped with R itself, but at present it
does not work with RStudio and possibly other third-party R environments. If you use RStudio or other third-party
R environments, skip the enableWGCNAThreads() call below.
# Display the current working directory
getwd();
# If necessary, change the path below to the directory where the data files are stored.
# "." means current directory. On Windows use a forward slash / instead of the usual \.
workingDir = ".";
setwd(workingDir);
# Load the WGCNA package
library(WGCNA)
# The following setting is important, do not omit.
options(stringsAsFactors = FALSE);
# Allow multi-threading within WGCNA. This helps speed up certain calculations.
# At present this call is necessary.
# Any error here may be ignored but you may want to update WGCNA if you see one.
# Caution: skip this line if you run RStudio or other third-party R environments.
# See note above.
enableWGCNAThreads()
# Load the data saved in the first part
lnames = load(file = "FemaleLiver-01-dataInput.RData");
# The variable lnames contains the names of loaded variables.
lnames
We have loaded the variables datExpr and datTraits containing the expression and trait data, respectively.
The result of analyzing the network topology for a range of candidate soft-thresholding powers is shown in Fig. 1.
We choose the power 6, which is the lowest power for which the scale-free topology fit index curve flattens out
upon reaching a high value (in this case, roughly 0.90).
[Figure 1. Left panel: "Scale independence", y-axis "Scale Free Topology Model Fit, signed R^2", x-axis "Soft Threshold (power)". Right panel: "Mean connectivity", y-axis "Mean Connectivity", x-axis "Soft Threshold (power)".]
Figure 1: Analysis of network topology for various soft-thresholding powers. The left panel shows the scale-free fit
index (y-axis) as a function of the soft-thresholding power (x-axis). The right panel displays the mean connectivity
(degree, y-axis) as a function of the soft-thresholding power (x-axis).
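The analysis summarized in Fig. 1 can be sketched as follows. This is a minimal sketch, assuming the WGCNA function pickSoftThreshold is applied to the candidate powers that appear as labels in the figure. The variable names powers, sft and signedR2 are illustrative, and the guard simply skips the call when datExpr or the WGCNA package is not available in the session.

```r
# Candidate powers matching the labels in Fig. 1: 1..10, then 12..20 in steps of 2
powers = c(1:10, seq(from = 12, to = 20, by = 2))

# Run only in a session where datExpr has been loaded (see the Preliminaries)
if (exists("datExpr") && requireNamespace("WGCNA", quietly = TRUE)) {
  # Scale-free topology fit and mean connectivity for each candidate power
  sft = WGCNA::pickSoftThreshold(datExpr, powerVector = powers, verbose = 5)
  # Signed R^2: fit index (column 2) given the sign of the negative slope (column 3)
  signedR2 = -sign(sft$fitIndices[, 3]) * sft$fitIndices[, 2]
  # One row per power: power, signed fit index, mean connectivity (column 5)
  print(cbind(power = sft$fitIndices[, 1],
              signedR2 = signedR2,
              meanK = sft$fitIndices[, 5]))
}
```

Reading the printed table from the top, the chosen power is the first one at which the signed fit index stops increasing appreciably.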
In the call to blockwiseModules we have chosen the soft thresholding power 6, a relatively large minimum module size
of 30, and a medium sensitivity to cluster splitting (deepSplit=2). The parameter mergeCutHeight is the threshold
for merging of modules. We have also instructed the function to return numeric, rather than color, labels for
modules, and to save the Topological Overlap Matrix. The output of the function may seem somewhat cryptic, but it
is easy to use. For example, bwnet$colors contains the module assignment, and bwnet$MEs contains the module
eigengenes of the modules.
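A call with the settings just described might look like the sketch below. The power, minimum module size, deepSplit and maximum block size are the values stated in the text; the value of mergeCutHeight and the base name for the saved TOM files are assumptions made for illustration, not values confirmed here. The guard skips the call when datExpr or the WGCNA package is not available.

```r
# Settings named in the text
softPower = 6          # soft-thresholding power chosen from Fig. 1
minModuleSize = 30     # relatively large minimum module size
deepSplit = 2          # medium sensitivity to cluster splitting
maxBlockSize = 2000    # deliberately small, to illustrate the block-wise analysis

# Assumptions (not stated in the text)
mergeCutHeight = 0.25                     # illustrative module merging threshold
tomFileBase = "femaleMouseTOM-blockwise"  # hypothetical base name for TOM files

# Run only in a session where datExpr has been loaded (see the Preliminaries)
if (exists("datExpr") && requireNamespace("WGCNA", quietly = TRUE)) {
  bwnet = WGCNA::blockwiseModules(datExpr,
              maxBlockSize = maxBlockSize,
              power = softPower,
              minModuleSize = minModuleSize,
              deepSplit = deepSplit,
              mergeCutHeight = mergeCutHeight,
              numericLabels = TRUE,          # numeric rather than color labels
              saveTOMs = TRUE,               # keep the TOM of each block on disk
              saveTOMFileBase = tomFileBase,
              verbose = 3)
  # Module sizes; label 0 collects genes outside of all modules
  print(table(bwnet$colors))
}
```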
A word of caution for the readers who would like to adapt this code for their own data. The function
blockwiseModules has many parameters, and in this example most of them are left at their default value. We
have attempted to provide reasonable default values, but they may not be appropriate for the particular data set
the reader wishes to analyze. We encourage the user to read the help file provided within the package in the R
environment and experiment with tweaking the network construction and module detection parameters. The potential
reward is, of course, better (biologically more relevant) results of the analysis.
A second word of caution concerns block size. The parameter maxBlockSize tells the function the size of the
largest block that the reader’s computer can handle. In this example we have set the maximum block size to 2000
to illustrate the block-wise analysis and its results, but this value is needlessly small for most modern
computers; the default is 5000, which is appropriate for most modern desktops. If the reader has access to a large
workstation with more than 4 GB of memory, the parameter maxBlockSize can be increased. A 16 GB workstation should
handle up to 20000 probes; a 32 GB workstation should handle perhaps 30000. A standard 4 GB desktop or laptop may
handle up to 8000-10000 probes, depending on the operating system and other running programs. In general it is
preferable to analyze a data set in as few blocks as possible.
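The memory guidance above can be motivated with simple arithmetic: a block of n probes requires n x n matrices, and R stores each numeric entry as an 8-byte double. The sketch below computes the size of one such matrix. The helper name matrixGiB is hypothetical, and how many matrices are held in memory at the same time depends on the implementation, so the figures are best read as a lower bound per matrix rather than a total requirement.

```r
# Size in GiB of a single n x n matrix of doubles (8 bytes per entry)
matrixGiB = function(nProbes) nProbes^2 * 8 / 2^30

# Block sizes mentioned in the text
blockSizes = c(2000, 5000, 10000, 20000, 30000)
print(data.frame(maxBlockSize = blockSizes,
                 GiBPerMatrix = round(matrixGiB(blockSizes), 2)))
```

A single 20000 x 20000 matrix already occupies about 3 GiB, which is consistent with the advice that blocks of this size call for a workstation with substantially more memory than a standard desktop.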
Below we will compare the results of this analysis to the results of Section 2.a in which all genes were analyzed
in a single block. To make the comparison easier, we relabel the block-wise module labels so that modules with a
significant overlap with single-block modules have the same label:
# Load the results of single-block analysis
load(file = "FemaleLiver-02-networkConstruction-auto.RData");
# Relabel blockwise modules
bwLabels = matchLabels(bwnet$colors, moduleLabels);
# Convert labels to colors for plotting
bwModuleColors = labels2colors(bwLabels)
To see how many modules were identified and what the module sizes are, one can use table(bwLabels). Its output is
> table(bwLabels)
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
142 472 470 479 271 327 130 209 153 121 100 100 104 77 73 81 40 42 34 91
20
84
and indicates that there are 20 modules, labeled 1 through 20. The label 0 is reserved for genes outside of all
modules. The hierarchical clustering dendrograms (trees) used for the module identification for each block are
returned in bwnet$dendrograms[[1]], bwnet$dendrograms[[2]]. The dendrograms can be displayed together with the
color assignment using the following code:
# open a graphics window
sizeGrWindow(6,6)
# Plot the dendrogram and the module colors underneath for block 1
plotDendroAndColors(bwnet$dendrograms[[1]], bwModuleColors[bwnet$blockGenes[[1]]],
                    "Module colors", main = "Gene dendrogram and module colors in block 1",
                    dendroLabels = FALSE, hang = 0.03,
                    addGuide = TRUE, guideHang = 0.05)
# Plot the dendrogram and the module colors underneath for block 2
plotDendroAndColors(bwnet$dendrograms[[2]], bwModuleColors[bwnet$blockGenes[[2]]],
                    "Module colors", main = "Gene dendrogram and module colors in block 2",
                    dendroLabels = FALSE, hang = 0.03,
                    addGuide = TRUE, guideHang = 0.05)
The resulting plots are shown in Fig. 2. We note that if the user would like to change some of the tree cut, module
membership, and module merging criteria, the package provides the function recutBlockwiseTrees that can apply
modified criteria without having to recompute the network and the clustering dendrogram, thus saving a substantial
amount of time.
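As a rough illustration of re-cutting, the sketch below passes the components stored in bwnet back to recutBlockwiseTrees with modified criteria. The three new parameter values are arbitrary choices made for this example only, and the call assumes that the TOM files saved by the block-wise run are still present in the working directory. The argument list of recutBlockwiseTrees should be checked against the help file of the installed WGCNA version.

```r
# Modified criteria (arbitrary values, chosen only for illustration)
newDeepSplit = 3
newMinModuleSize = 20
newMergeCutHeight = 0.20

# Run only after the block-wise analysis has produced bwnet
if (exists("bwnet") && exists("datExpr") &&
    requireNamespace("WGCNA", quietly = TRUE)) {
  bwnetRecut = WGCNA::recutBlockwiseTrees(datExpr,
                   goodSamples = bwnet$goodSamples,
                   goodGenes = bwnet$goodGenes,
                   blocks = bwnet$blocks,
                   TOMFiles = bwnet$TOMFiles,        # TOMs saved by the earlier run
                   dendrograms = bwnet$dendrograms,  # reuse the existing trees
                   deepSplit = newDeepSplit,
                   minModuleSize = newMinModuleSize,
                   mergeCutHeight = newMergeCutHeight,
                   numericLabels = TRUE)
  # Module sizes under the modified criteria
  print(table(bwnetRecut$colors))
}
```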
[Figure 2. Panels: "Gene dendrogram and module colors in block 1" and "Gene dendrogram and module colors in block 2"; y-axis "Height"; color row "Module colors".]
Figure 2: Clustering dendrograms of genes, with dissimilarity based on topological overlap, together with assigned
module colors. There is one gene dendrogram per block.
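The single-block gene dendrogram, plotted together with the module colors from the single-block and the block-wise
analyses, is shown in Fig. 3. Visual inspection confirms that there is excellent agreement between the
single-block and the block-wise module assignment.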
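A plot of the kind shown in Fig. 3 could be produced along the following lines. This is a sketch under assumptions: it supposes that the single-block results file loaded earlier provides the single-block dendrogram under the name geneTree (a name not confirmed in this section) in addition to moduleColors. The row labels and the title are taken from the figure itself.

```r
# Labels of the two color rows, as they appear in Fig. 3
rowLabels = c("Single block", "2 blocks")

# Run only when the single-block and block-wise results are both available
if (exists("geneTree") && exists("moduleColors") && exists("bwModuleColors") &&
    requireNamespace("WGCNA", quietly = TRUE)) {
  # open a graphics window
  WGCNA::sizeGrWindow(12, 9)
  # Single-block dendrogram with both sets of module colors underneath
  WGCNA::plotDendroAndColors(geneTree,
                             cbind(moduleColors, bwModuleColors),
                             rowLabels,
                             main = "Single block gene dendrogram and module colors",
                             dendroLabels = FALSE, hang = 0.03,
                             addGuide = TRUE, guideHang = 0.05)
}
```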
We now verify that module eigengenes of modules that correspond to one another in the single-block and block-wise
approaches are extremely similar. We first calculate the module eigengenes based on the single block and block-wise
module colors:
singleBlockMEs = moduleEigengenes(datExpr, moduleColors)$eigengenes;
blockwiseMEs = moduleEigengenes(datExpr, bwModuleColors)$eigengenes;
Next we match the single-block and block-wise eigengenes by name and calculate the correlations of the corresponding
eigengenes:
single2blockwise = match(names(singleBlockMEs), names(blockwiseMEs))
signif(diag(cor(blockwiseMEs[, single2blockwise], singleBlockMEs)), 3)
The result is
> signif(diag(cor(blockwiseMEs[, single2blockwise], singleBlockMEs)), 3)
MEblack MEblue MEbrown MEcyan MEgreen
[Figure 3. Panel: "Single block gene dendrogram and module colors"; y-axis "Height"; color rows "Single block" and "2 blocks".]
Figure 3: Clustering dendrogram of genes obtained in the single-block analysis in Section 2.a, together with module
colors determined in the single-block analysis and the module colors determined in the block-wise analysis. There is
excellent agreement between the single-block and block-wise network construction and module detection.
Each number above represents the correlation of a single-block eigengene with its corresponding block-wise
counterpart. The correlations are all very close to 1 (the turquoise eigengene changed orientation), again
indicating that the block-wise and single-block analyses lead to very similar results.
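The matching step can be tried on a small, self-contained example. The data below are randomly generated for illustration only and have nothing to do with the liver data set; the names singleToy and blockToy are hypothetical. The toy block-wise eigengenes are stored in a different column order and one of them is flipped in sign, which mimics an eigengene that changed orientation.

```r
set.seed(1)
nSamples = 20

# Toy "single-block" eigengenes (made-up data)
singleToy = data.frame(MEblue = rnorm(nSamples),
                       MEbrown = rnorm(nSamples),
                       MEturquoise = rnorm(nSamples))

# Toy "block-wise" eigengenes: different column order, small noise,
# and the turquoise eigengene flipped in sign
noise = function() rnorm(nSamples, sd = 0.01)
blockToy = data.frame(MEturquoise = -singleToy$MEturquoise + noise(),
                      MEblue = singleToy$MEblue + noise(),
                      MEbrown = singleToy$MEbrown + noise())

# Align the block-wise columns to the single-block column order by name
idx = match(names(singleToy), names(blockToy))
# Correlation of each eigengene with its counterpart
corrs = diag(cor(blockToy[, idx], singleToy))
print(signif(corrs, 3))
# A correlation near -1 indicates a flipped orientation, so compare absolute values
print(signif(abs(corrs), 3))
```

Matching by name before taking the diagonal matters: without it, the diagonal of the correlation matrix would pair eigengenes of unrelated modules whenever the two data frames order their columns differently.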