Convolutional Neural Networks With Alternately Updated Clique

Abstract

Improving information flow in deep networks helps to ease the training difficulties and utilize parameters more efficiently. Here we propose a new convolutional neural network architecture with alternately updated clique (CliqueNet). In contrast to prior networks, there are both forward and backward connections between any two layers in the same block. The layers are constructed as a loop and are updated alternately. The CliqueNet has some unique properties. Each layer is both the input and output of any other layer in the same block, so that the information flow among layers is maximized. During propagation, the newly updated layers are concatenated to re-update the previously updated layers, and the parameters are reused multiple times. This recurrent feedback structure is able to bring higher-level visual information back to refine low-level filters and achieve spatial attention. We analyze the features generated at different stages and observe that using refined features leads to a better result. We adopt a multi-scale feature strategy that effectively avoids the progressive growth of parameters. Experiments on image recognition datasets including CIFAR-10, CIFAR-100, SVHN and ImageNet show that our proposed models achieve state-of-the-art performance with fewer parameters.¹

∗ Corresponding author
¹ Code address: http://github.com/iboing/CliqueNet

Figure 1. An illustration of a block with 4 layers. Any layer is both the input and output of another one. Node 0 denotes the input layer of this block.

1. Introduction

In recent years, the structure and topology of deep neural networks have attracted significant research interest, since convolutional neural network (CNN) based models have achieved huge success in a wide range of computer vision tasks. A notable trend of these CNN architectures is that the layers are going deeper, from AlexNet [23] with 5 convolutional layers, the VGG network and GoogLeNet with 19 and 22 layers, respectively [32, 36], to recent ResNets [13] whose deepest model has more than one thousand layers. However, inappropriately designed deep networks make it hard for later layers to access the gradient information from previous layers, which may cause gradient vanishing and parameter redundancy problems [17, 18].

Successfully adopted in ResNet [13] and Highway Network [34], the skip connection is an efficient way to make top layers accessible to the information from bottom layers and to ease the network training at the same time, due to its relief of the gradient vanishing problem. The residual block structure in ResNet [13] also inspired a series of ResNet variations, including ResNeXt [40], WRN [41], PolyNet [44], etc. To further activate the gradient and information flow in networks, DenseNet [17] is a newly proposed structure, where any layer in a block is the output of all preceding layers and the input of all subsequent layers.
Recent studies show that the skip connection mechanism can be extrapolated as a recurrent neural network (RNN) or LSTM [14] when weights are shared among different layers [27, 5, 21]. In this way, the deep residual network is treated as a long sequence and hidden units are linked by skip connections. While this recurrent structure benefits feature re-usage and iterative learning, the residual information is restricted to neighboring layers and cannot be considered across multiple layers, because the recurrence only happens once at each single layer.

The attention mechanism is another focus of recent studies on network structure [39, 37, 1, 28] and applications [3, 29, 24, 8]. When people watch a picture or a scene, the information on the target is better captured if we re-look at or re-think the target with additional attention. In cognition theory, the activity of a neuron in the visual cortex is influenced by the responses of other cortical areas transferred through feedback connections [19, 15]. This motivates the introduction of feedback into deep networks [35, 42]. The feedback connections that bring back higher-level semantic information in a top-down manner are able to re-weight the focus and suppress the non-relevant neuron activations of background and noise.

Inspired by the recurrent structure and attention mechanism, in this study we propose a new convolutional neural network architecture with alternately updated clique (CliqueNet). In contrast to prior network structures, there are both forward and feedback connections between any two layers in the same block. As illustrated in Figure 1, the layers in a Clique Block are constructed as a clique and are updated alternately. Concretely, several previous layers are concatenated to update the next layer, after which the newly updated layer is concatenated to re-update the previous layers, so that information flow and the feedback mechanism are maximized. Each layer in a block is both the input and output of another one, which means the layers are more densely connected than in DenseNets [17]. We adopt a multi-scale feature strategy to compose the final representation with the block features at different map sizes.

The CliqueNet architecture has some unique properties. Intuition suggests that our proposal is parameter-demanding, because given a block with n layers, DenseNet [17] needs C_n^2 groups of parameters, while ours needs A_n^2 (C and A denote the combination and permutation operators, respectively). However, the filters in DenseNet increase linearly as the depth rises [5], which may lead to a rapid growth of parameters. In our architecture, only the Stage-II feature in each block is fed into the next block. It turns out that this is a more parameter-efficient way. In addition, traditional neural networks add a new layer together with its corresponding parameters. As for CliqueNet, the weights among layers in a block keep recycling during propagation. The layers can be updated alternately multiple times so that a deeper representation space is attained with a fixed number of parameters.
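For instance, with n = 5 layers per block (the setting of Table 1 below), the counts are C_5^2 = (5 · 4)/2 = 10 weight groups for a Dense Block versus A_5^2 = 5 · 4 = 20 for a Clique Block; although the clique owns twice as many weight groups at equal depth, only the Stage-II feature is passed on and the number of filters per layer stays fixed, so the overall parameter count remains comparable (see Table 4).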
CliqueNet also shows a strong ability for representation learning due to the combination of the recurrent structure and the feedback mechanism. In each Clique Block, both the forward and feedback connections are densely connected. The information flow is maximized and feature maps are repeatedly refined by attention. We show that our network architecture can suppress the activations of background and noise, and achieve competitive results without resorting to data augmentation. The contributions of this study are listed as follows:

• We propose a new convolutional neural network architecture called CliqueNet, which incorporates both forward and backward connections between any two layers in the same block. The layers, constructed as a loop, are updated alternately. The CliqueNet, which combines both a recurrent structure and an attention mechanism, is able to maximize information flow and achieve feature refinement. We show that the refined features are more discriminative and lead to a better performance.

• We adopt a multi-scale feature strategy that effectively circumvents the progressive increment of parameters, despite the extra feedback connections.

• We conduct experiments on four benchmark datasets including CIFAR-10, CIFAR-100, SVHN and ImageNet to demonstrate the superiority of our models.

2. Related Work

A number of deep networks with large model capacity have been proposed. For widening the network, the Inception modules in GoogLeNet [36] fuse features of different map sizes to construct a multi-scale representation. Multi-column nets [6] and Deeply-Fused Nets [38] also use a fusion strategy and have a wide network structure. Wide residual networks [41] increase the width and decrease the depth to improve the performance, while FractalNet [25] deepens and widens at the same time. However, simply widening the network tends to consume more runtime and memory [44]. For deepening the networks, skip connections or shortcut paths are widely adopted strategies to ease the network training [13, 34]. In [18], it is shown that some of the layers in ResNets are dispensable and cause parameter redundancy, so the authors randomly drop a subset of layers to ease the training and achieve a better performance. To further increase information flow, DenseNets [17] replace the identity mapping in the residual block by a concatenating operation, so that new feature learning can be reinforced while keeping old feature re-usage. In line with this view, dual path networks (DPN) [5] are proposed to combine the advantages of both the residual path and the densely connected path.

Both the residual path and the densely connected path correspond to a recurrent propagation, and their success has been attributed to the recurrent structure and iterative refinement [27, 11, 21]. Studies incorporating recurrent connections into CNNs also show superiority in object recognition [26], scene parsing [31] and some other tasks. CliqueNet differs from these structures in that the iterative mechanism
exists in each step of the propagation, instead of just between neighboring layers or from the top layer to the bottom layer; all layers in a block participate in the recurrent loop so that the filters are communicated sufficiently and the blocks play the roles of both information carrier and refiner.

Recent studies have embraced the attention mechanism as an effective technique to strengthen the neurons that feature the target and thus improve the performance. It has proved fruitful in many applications, including image recognition [37, 8], image captioning [3], image-text matching [29], and saliency detection [24]. In general, visual attention can be achieved by formulating an optimization problem [1], weighting the activations spatially or channel-wise [3, 16], and introducing feedback connections [39, 35, 42]. In [42], the model makes consecutive decisions for a more accurate prediction via feedback connections; the input of the next decision is based on the output of the last decision. Experiments show that the top-down propagation is capable of refining lower-level features and improving classification performance [35], especially on datasets with noise and occlusion [39, 28]. But how to design a proper attention mechanism and boost the supervision between layers requires further exploration.

There are also some studies that design attention mechanisms tied with recurrent neural networks [28, 24, 8]. A recent report [2] tries to propose a loopy net, but it just repeats the skip connections and does not make layers communicate. The loopy inference adopted in [4, 45] shares a similar motivation with our work. However, they do not incorporate feedback connections, which are important for feature refinement. CliqueNet enables true cycling because of the alternate propagation. Although alternate updating has been an important method in optimization theory [9], it has not been introduced into deep learning. To the best of our knowledge, we are the first to use updated layers to re-update previous layers alternately, with these layers constructing a loop that cycles for multiple times.

3. CliqueNet Architecture

The CliqueNet architecture has two main ingredients: the block with alternately updated clique (Clique Block) to enable feature refinement, and the multi-scale feature strategy that facilitates parameter efficiency.

Figure 2. A CliqueNet with three blocks. The input layer together with the Stage-II feature in each block are concatenated to be the block feature, and form part of the final representation after global pooling. The Stage-II feature passes through transition layers, which include a convolution and an average pooling to change map sizes, and then becomes the input of the next block.

3.1. Clique Block

In order to maximize the information flow among layers, we design the Clique Block. Any two layers in the same block are connected bidirectionally except for the input node. Compared with the Dense Block [17], where each layer is the output of all previous layers and the input of all subsequent layers, the Clique Block makes each layer both the input and output of any other layer. The propagation of a Clique Block with 5 layers is illustrated in Table 1. At the first stage, the input layer (X_0) initializes all layers in this block by single directional connections. Each updated layer is concatenated to update the next layer. From the second stage, the layers begin updating alternately. All layers except the top layer to be updated are concatenated as the bottom layer, and their corresponding parameters are also concatenated. Accordingly, the i-th (i ≥ 1) layer in the k-th (k ≥ 2) loop can be formulated as:

X_i^{(k)} = g\left( \sum_{l<i} W_{li} * X_l^{(k)} + \sum_{m>i} W_{mi} * X_m^{(k-1)} \right)    (1)

where * denotes the convolution operation with parameters W, and g is the non-linear activation function. W_{ij} is reused in different stages. Each layer always receives feedback information from the layers that are updated more recently. This achieves a spatial attention mechanism due to the top-down refinement brought by each propagation. The recurrent feedback structure ensures that the communication is maximized among all layers in the block.
Bottom Layers | Weights | Top Layer | Feature
X_0 | {W_01} | X_1^(1) | Stage-I
{X_0, X_1^(1)} | {W_02, W_12} | X_2^(1) | Stage-I
{X_0, X_1^(1), X_2^(1)} | {W_03, W_13, W_23} | X_3^(1) | Stage-I
{X_0, X_1^(1), X_2^(1), X_3^(1)} | {W_04, W_14, W_24, W_34} | X_4^(1) | Stage-I
{X_0, X_1^(1), X_2^(1), X_3^(1), X_4^(1)} | {W_05, W_15, W_25, W_35, W_45} | X_5^(1) | Stage-I
{X_2^(1), X_3^(1), X_4^(1), X_5^(1)} | {W_21, W_31, W_41, W_51} | X_1^(2) | Stage-II
{X_3^(1), X_4^(1), X_5^(1), X_1^(2)} | {W_32, W_42, W_52, W_12} | X_2^(2) | Stage-II
{X_4^(1), X_5^(1), X_1^(2), X_2^(2)} | {W_43, W_53, W_13, W_23} | X_3^(2) | Stage-II
{X_5^(1), X_1^(2), X_2^(2), X_3^(2)} | {W_54, W_14, W_24, W_34} | X_4^(2) | Stage-II
{X_1^(2), X_2^(2), X_3^(2), X_4^(2)} | {W_15, W_25, W_35, W_45} | X_5^(2) | Stage-II
···

Table 1. A diagram of CliqueNet's propagation in a block with 5 layers. W_ij denotes the weight parameters from X_i to X_j and is reused across stages. "{}" denotes the concatenation operator. The Stage-II feature is transited as the input layer (X_0) of the next block.
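To make the two-stage propagation of Eq. (1) and Table 1 concrete, the following is a minimal PyTorch sketch, not the authors' released implementation: it uses a plain ReLU for g, omits batch normalization, and replaces the concatenation-plus-single-convolution form with the equivalent sum of per-pair convolutions.

import torch
import torch.nn as nn

class CliqueBlock(nn.Module):
    """Minimal sketch of an alternately updated Clique Block (Eq. 1 / Table 1)."""

    def __init__(self, in_channels, num_layers=5, filters=36):
        super().__init__()
        self.num_layers = num_layers
        # W_{0j}: from the input node X_0 to every layer (used in Stage-I only).
        self.from_input = nn.ModuleList(
            [nn.Conv2d(in_channels, filters, kernel_size=3, padding=1)
             for _ in range(num_layers)]
        )
        # W_{ij}, i != j, i,j >= 1: one convolution per ordered pair, reused in every stage.
        self.pairwise = nn.ModuleDict({
            f"{i}_{j}": nn.Conv2d(filters, filters, kernel_size=3, padding=1)
            for i in range(1, num_layers + 1)
            for j in range(1, num_layers + 1) if i != j
        })
        self.act = nn.ReLU(inplace=True)

    def forward(self, x0, num_stages=2):
        # Stage-I: X_0 plus the already updated layers initialize X_1 .. X_n.
        feats = {}
        for j in range(1, self.num_layers + 1):
            pre = self.from_input[j - 1](x0)
            for i in range(1, j):
                pre = pre + self.pairwise[f"{i}_{j}"](feats[i])
            feats[j] = self.act(pre)
        # Later stages: each layer is re-updated from the most recent version of all
        # other layers, reusing the same W_{ij}; the input node is no longer involved.
        for _ in range(num_stages - 1):
            for j in range(1, self.num_layers + 1):
                pre = 0
                for i in range(1, self.num_layers + 1):
                    if i != j:
                        pre = pre + self.pairwise[f"{i}_{j}"](feats[i])
                feats[j] = self.act(pre)
        # Stage-II feature of the block: concatenation of the refined layers.
        return torch.cat([feats[j] for j in range(1, self.num_layers + 1)], dim=1)

# Usage: block = CliqueBlock(in_channels=64, num_layers=5, filters=36)
#        y = block(torch.randn(2, 64, 32, 32))   # y has shape (2, 5 * 36, 32, 32)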
3.2. Feature at Different Stages

We analyze the features produced at different stages, and adopt a multi-scale feature strategy to avoid the rapid increment of parameters.

The first stage is used to initialize all layers in the block, and the layers are refined repeatedly from the second stage on. Given that the Stage-II feature is refined with attention and assimilates more high-level visual information, we concatenate the Stage-II feature together with the input layer of each block as the block feature, which is then fed to the loss function after global pooling. Only the Stage-II feature is fed into the next block as its input layer X_0; see Figure 2. In this way, the final representation is characterized by multi-scale feature maps, and the dimensionality in each block does not increase progressively. Because higher-stage propagation comes with more computational cost and amplifies the model complexity, we only consider the first two stages.

For the purpose of analyzing the features generated at different stages, we conduct experiments on the CIFAR-10 dataset (with no data augmentation) using different versions of CliqueNets. As Table 2 shows, the CliqueNet (I+I) only considers the Stage-I feature. The CliqueNet (I+II) uses the Stage-I feature and the input layer as the block feature fed to the loss function, but transits the Stage-II feature into the next block. The CliqueNet (II+II) adopts our aforementioned strategy. They all have 3 blocks with 5 layers in each block, and each layer contains 36 filters. The experimental settings follow [17]. The main results are shown in Figure 3. It is found that the introduction of the Stage-II feature indeed leads to a better result by a significant margin. We adopt the CliqueNet (II+II) structure for the following experiments.

Figure 3. Training and testing curves of different versions of CliqueNets. The learning rate is divided by 10 at epoch 150 and 225.
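The multi-scale readout described above can be sketched as follows; this is a simplified illustration in which CliqueBlock refers to the sketch after Table 1, the transition is a bare 1 × 1 convolution with 2 × 2 average pooling (without the optional attentional transition of Section 3.3), and all module names are assumptions.

import torch
import torch.nn as nn

class MultiScaleCliqueNet(nn.Module):
    """Sketch: concatenate (input layer, Stage-II feature) of every block,
    globally pool each block feature, and classify on the concatenation."""

    def __init__(self, num_classes=10, num_layers=5, filters=36):
        super().__init__()
        self.stem = nn.Conv2d(3, 64, kernel_size=3, padding=1)   # X_0 of block 1
        channels_in = [64, num_layers * filters, num_layers * filters]
        # CliqueBlock is the sketch from Section 3.1.
        self.blocks = nn.ModuleList(
            [CliqueBlock(c, num_layers, filters) for c in channels_in]
        )
        # Transition: 1x1 convolution + 2x2 average pooling on the Stage-II feature.
        self.transitions = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(num_layers * filters, num_layers * filters, kernel_size=1),
                nn.AvgPool2d(2),
            )
            for _ in range(2)
        ])
        block_feature_dims = [c + num_layers * filters for c in channels_in]
        self.classifier = nn.Linear(sum(block_feature_dims), num_classes)

    def forward(self, x):
        x0 = self.stem(x)
        pooled = []
        for idx, block in enumerate(self.blocks):
            stage2 = block(x0)                            # Stage-II feature
            block_feat = torch.cat([x0, stage2], dim=1)   # input layer + Stage-II
            pooled.append(block_feat.mean(dim=(2, 3)))    # global average pooling
            if idx < len(self.transitions):
                x0 = self.transitions[idx](stage2)        # only Stage-II moves on
        return self.classifier(torch.cat(pooled, dim=1))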
3.3. Extra Techniques

In addition to the structures mentioned above, we consider some techniques to help strengthen the model and improve the state of the art. In the experimental section, we conduct experiments with and without these additional techniques to show the effectiveness of our model.

Attentional transition. The CliqueNet includes feedback connections to refine lower-level activations using higher-level visual information. This attention mechanism weights the feature maps spatially to weaken the noise and background. Channel-wise attention, adopted in [3, 37, 16], also benefits recognition problems because it recalibrates
different filters to prevent overfitting and inspire new feature learning. In CliqueNet, we incorporate a channel-wise attention mechanism in the transition layers, following the method proposed in [16]. As depicted in Figure 4, the filters are globally averaged after the convolution in the transition. They are followed by two fully connected (FC) layers. The first FC layer has half the number of filters and is activated by the Relu function. The second FC layer has the same number of filters and is activated by the Sigmoid function, so that the activation is scaled into [0, 1] and acts on the input layer by filter-wise multiplication. Different from [16], which sets this module at each residual layer, we only add it to the transition layers in order to adjust the filters passed into the next block.

Figure 4. A schema for attentional transition. The transition layer consists of convolution and pooling. The filter-wise multiplication happens after the convolution and before the down pooling. W, H and C are the width, height and channels of the feature maps.
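A minimal PyTorch sketch of such an attentional transition as described above (a hedged illustration, not the authors' released code: the halving of filters in the first FC layer follows the text, while module and variable names are assumptions):

import torch
import torch.nn as nn

class AttentionalTransition(nn.Module):
    """Sketch: 1x1 convolution, channel-wise (filter-wise) attention, 2x2 average pooling."""

    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=1)
        self.fc1 = nn.Linear(channels, channels // 2)   # half the filters, Relu
        self.fc2 = nn.Linear(channels // 2, channels)   # back to C filters, Sigmoid
        self.pool = nn.AvgPool2d(2)

    def forward(self, x):
        x = self.conv(x)
        w = x.mean(dim=(2, 3))                # global average pooling: (N, C)
        w = torch.relu(self.fc1(w))
        w = torch.sigmoid(self.fc2(w))        # scaled into [0, 1]
        x = x * w.view(w.size(0), -1, 1, 1)   # filter-wise multiplication
        return self.pool(x)                   # down pooling into the next block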
Bottleneck and compression. Bottleneck is an effective way to decrease the number of parameters and provide further potential to enlarge the model capacity. It is conjectured [41] that the bottleneck architecture is suitable for deeper networks and large datasets like ImageNet, and recent studies have embraced bottleneck for a better performance [13, 17, 37, 5]. So we introduce bottleneck into our large models. The 3 × 3 convolution kernels in each block are replaced by 1 × 1 kernels that produce a middle layer, after which a 3 × 3 convolution layer follows to produce the top layer. The middle layer and the top layer contain the same number of feature maps. Compression is another tool, adopted in [17], to make the model more compact. Instead of compressing the number of filters in transition layers as they do, we only compress the features that are fed to the loss function, i.e. the Stage-II feature concatenated with its input layer. The models with compression have an extra convolutional layer with 1 × 1 kernel size before global pooling. It generates half the number of filters to enhance model compactness and keep the dimensionality of the final feature in a proper range.
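A sketch of these two tools, under the same simplifying assumptions as before (plain ReLU, no batch normalization, illustrative names):

import torch.nn as nn

def bottleneck_layer(in_channels, filters):
    # A 1x1 convolution produces a middle layer, then a 3x3 convolution produces the
    # top layer; both contain the same number of feature maps.
    return nn.Sequential(
        nn.Conv2d(in_channels, filters, kernel_size=1),
        nn.ReLU(inplace=True),
        nn.Conv2d(filters, filters, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
    )

def compression_layer(block_feature_channels):
    # Extra 1x1 convolution before global pooling, generating half the filters.
    return nn.Conv2d(block_feature_channels, block_feature_channels // 2, kernel_size=1)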
3.4. Implementation

In our experiments, we test our models on the benchmark datasets without the aforementioned extra techniques to show the effectiveness of CliqueNet, and further improve the state-of-the-art performance with them. There are two structure parameters: the total number of layers in all blocks, T, and the number of filters per layer, k. For our models without bottleneck, the convolution layers in each block have 3 × 3 kernels and are padded by one pixel to keep the feature maps the same size. Blocks are linked by transition layers, where a convolution layer with 1 × 1 kernel size is followed by 2 × 2 average pooling. All convolutions are performed in a unit composed of three consecutive operations: batch normalization [20], Relu, and the convolution. The Stage-II features with their input layers from all blocks are concatenated after global pooling, and end with a fully connected layer with softmax.
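The convolution unit above could be sketched as follows (a minimal illustration of the BN-Relu-Conv ordering described in this section; the default kernel size follows the 3 × 3 setting without bottleneck):

import torch.nn as nn

def conv_unit(in_channels, out_channels, kernel_size=3, padding=1):
    # Each convolution is wrapped as batch normalization -> Relu -> convolution.
    return nn.Sequential(
        nn.BatchNorm2d(in_channels),
        nn.ReLU(inplace=True),
        nn.Conv2d(in_channels, out_channels, kernel_size=kernel_size, padding=padding),
    )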
For experiments on CIFAR and SVHN, there are three blocks in total, in which the feature map sizes are 32 × 32, 16 × 16, and 8 × 8, respectively. Before entering the first block, the input images pass through a 3 × 3 convolution with the output channels set to 64 as the input layer (X_0) of the first block. As for ImageNet, we use four blocks with bottleneck and compression, and compare our results with and without attentional transition. The initial transition has a 7 × 7 convolution with stride 2 and 3 × 3 max pooling with stride 2 on the 224 × 224 input images. Our four network structures on ImageNet are shown in Table 3.

Layer | S0 | S1 | S2 | S3
Convolution (112 × 112) | conv (7 × 7), 64, stride 2
Pooling (56 × 56) | max pool (3 × 3), stride 2
Block 1 (56 × 56) | 36 × 5 | 36 × 5 | 36 × 5 | 40 × 6
Transition | conv (1 × 1), avg pool (2 × 2)
Block 2 (28 × 28) | 64 × 6 | 80 × 6 | 80 × 5 | 80 × 6
Transition | conv (1 × 1), avg pool (2 × 2)
Block 3 (14 × 14) | 100 × 6 | 120 × 6 | 150 × 6 | 160 × 6
Transition | conv (1 × 1), avg pool (2 × 2)
Block 4 (7 × 7) | 80 × 6 | 100 × 6 | 120 × 6 | 160 × 6

Table 3. Structures on ImageNet. The first number in each block is the number of filters per layer, and the second denotes the number of layers in this block.
Model A B C FLOPs Params CIFAR-10 CIFAR-100 SVHN
Recurrent CNN [26] - - - - 1.86M 8.69 31.75 1.80
Stochastic Depth ResNet [18] - - - - 1.7M 11.66 37.8 1.75
dasNet [35] - - - - - 9.22 33.78 -
FractalNet [25] - - - - 38.6M 7.33 28.2 1.87
DenseNet (k = 12, T = 36) [17] - - - 0.53G 1.0M 7.00 27.55 1.79
DenseNet (k = 12, T = 96) [17] - - - 3.54G 7.0M 5.77 23.79 1.67
DenseNet (k = 24, T = 96) [17] - - - 13.78G 27.2M 5.83 23.42 1.59
CliqueNet (k = 36, T = 12) - - - 0.91G 0.94M 5.93 27.32 1.77
CliqueNet (k = 64, T = 15) - - - 4.21G 4.49M 5.12 23.98 1.62
CliqueNet (k = 80, T = 15) - - - 6.45G 6.94M 5.10 23.32 1.56
CliqueNet (k = 80, T = 18) - - - 9.45G 10.14M 5.06 23.14 1.51
DenseNet (k = 12, T = 96) [17] - X X 0.58G 0.8M 5.92 24.15 1.76
DenseNet (k = 24, T = 246) [17] - X X 10.84G 15.3M 5.19 19.64 1.74
CliqueNet (k = 36, T = 12) X - - 0.91G 0.98M 5.8 26.41 -
CliqueNet (k = 36, T = 12) - - X 0.98G 1.04M 5.69 26.45 -
CliqueNet (k = 36, T = 12) X - X 0.98G 1.08M 5.61 25.55 1.69
CliqueNet (k = 80, T = 15) X - X 6.88G 8M 5.17 22.78 1.53
CliqueNet (k = 150, T = 30) X X X 8.49G 10.02M 5.06 21.83 1.64
Table 4. Error rates (%) on CIFAR-10, CIFAR-100, and SVHN without any data augmentation. In CliqueNets and DenseNets, k is the number of filters per layer, and T is the total number of layers in three blocks. "A, B, C" represent attentional transition, bottleneck and compression, respectively; an "X" marks that the corresponding technique is used. The FLOPs of DenseNets are calculated by ourselves.
4. Experiments

We evaluate the CliqueNet on benchmark classification datasets, including CIFAR-10, CIFAR-100, SVHN and ImageNet, and compare our results with the state of the art.

4.1. Datasets and Training Details

CIFAR. The CIFAR-10 and CIFAR-100 datasets [22] both consist of 32 × 32 colored images. The CIFAR-10 dataset consists of 60,000 images in 10 classes, with 6,000 images in each class. There are 50,000 images for training and 10,000 images for testing. The CIFAR-100 dataset is similar to CIFAR-10 but has 100 classes, each of which contains 600 images. For data normalization, we preprocess the datasets by subtracting the mean and dividing by the standard deviation.

SVHN. The Street View House Number (SVHN) dataset [30] contains 32 × 32 colored images of house numbers cropped from Google Street View. There are 73,257 images in the training set, 26,032 in the testing set and 531,131 digits for additional training. Following the common practice [41, 18, 25, 17], we use all training samples without augmentation and divide images by 255 for normalization. We report the lowest error rate on the testing set.

ImageNet. We also conduct experiments on the ILSVRC 2012 dataset [7], which contains 1.2 million training images, 50,000 validation images, and 100,000 test images with 1,000 classes. Following [13, 17], we adopt the standard data augmentation for the training set. A 224 × 224 crop is randomly sampled from each image or its horizontal flip. The images are normalized into [0, 1] using mean values and standard deviations. We report the single-crop error rate on the validation set.

Training Details. For a fair comparison, we do not perform much hyper-parameter tuning, and most of our training strategies follow [13, 17]. We train our models using stochastic gradient descent (SGD) with 0.9 Nesterov momentum and 10^-4 weight decay. The parameters are initialized according to [12] and the weights of the fully connected layer use Xavier initialization [10]. For CIFAR and SVHN, we train for 300 epochs and 40 epochs, respectively, with a batch size of 64. The learning rate is set to 0.1 initially and is divided by 10 at 50% and 75% of the training procedure. In contrast to ImageNet, the experiments on CIFAR and SVHN do not use any data augmentation, and we add a dropout layer [33] with dropout rate 0.2 after each convolution layer, following [17]. For ImageNet, we train our models for 100 epochs and drop the learning rate by 0.1 at epochs 30, 60, and 90. Because we only have a server with 4 GPUs and are constrained by GPU memory, the batch size is 160 for our models on ImageNet, instead of 256 as in most studies.
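A sketch of this optimization setup for the CIFAR schedule (300 epochs, so the learning-rate drops land at epochs 150 and 225; the variable model stands for any CliqueNet variant and the helper name is an assumption):

import torch

def make_optimizer(model, epochs=300):
    # SGD with 0.9 Nesterov momentum and 1e-4 weight decay, initial learning rate 0.1.
    optimizer = torch.optim.SGD(
        model.parameters(), lr=0.1, momentum=0.9, nesterov=True, weight_decay=1e-4
    )
    # Divide the learning rate by 10 at 50% and 75% of training.
    scheduler = torch.optim.lr_scheduler.MultiStepLR(
        optimizer, milestones=[epochs // 2, epochs * 3 // 4], gamma=0.1
    )
    return optimizer, scheduler

# Usage: optimizer, scheduler = make_optimizer(model); call scheduler.step() once per epoch.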
4.2. Results on CIFAR and SVHN

Our experimental results on CIFAR and SVHN are shown in Table 4. The first part of the table includes some methods proposed before DenseNets and some other studies that also incorporate feedback connections or an attention mechanism. The second and third parts compare the CliqueNets with
DenseNets when both have no extra techniques. The last two parts show the situation with extra techniques. The best and the second best results are marked in red bold and bold, respectively.

Without extra techniques. The first three parts show that, when extra techniques are not considered, CliqueNets outperform most previous methods on CIFAR-10, CIFAR-100, and SVHN with significantly fewer parameters. Because the layers in CliqueNet are re-updated yet contribute features in each cycle, the depth of CliqueNet is much shallower than that of other models. For our smallest model, CliqueNet (36-12) (representing k = 36 and T = 12), each block contains 4 layers. It has the same number of filters per block, 144, as DenseNet (12-36), but reduces the error rate from 7% to 5.93% on CIFAR-10 with slightly fewer parameters than its counterpart DenseNet (12-36). Although the ResNet with stochastic depth [18] achieved a slightly better performance on SVHN with 1.7M parameters than CliqueNet (36-12), our model drops the error rate on CIFAR-10 and CIFAR-100 by a large margin. As the model capacity grows, we find that the performance of CliqueNets keeps improving without overfitting. Our model CliqueNet (80-15) already achieves the state of the art on the three datasets, and even outperforms the DenseNets that use extra techniques on CIFAR-10 and SVHN. It has only 6.94M parameters, which is a quarter of DenseNet (24-96) with 27.2M parameters, and a half of DenseNet (24-246) using bottleneck and compression with 15.3M parameters.

With extra techniques. The CliqueNets realize a spatial attention mechanism due to their recurrent feedback propagation. When armed with channel-wise attention, they achieve an improved performance. This is demonstrated by the CliqueNet (36-12) with attentional transition, which obtains a better result on CIFAR-10 and CIFAR-100 with slightly more parameters. Compression has a similar effect by making the model more compact. It is shown that the attentional transition is compatible with compression. The CliqueNet (36-12) with both attentional transition and compression leads to a better result than its original version and than the versions with only attentional transition or only compression. Compared with its counterpart DenseNet (12-36), it drops the error rate by 1.39% on CIFAR-10, 2% on CIFAR-100, and 0.1% on SVHN, with just 0.08M more parameters. The CliqueNet (80-15) with attentional transition and compression also improves on its original version, and raises the state of the art on SVHN to 1.53% with 8M parameters, while the previously best result of 1.59% on SVHN, achieved by DenseNet (24-96), requires three times more parameters. The bottleneck architecture is effective for saving parameters, and our largest model, CliqueNet (150-30) with bottleneck, further improves the performance on CIFAR-10 and CIFAR-100 while increasing the parameter and computation cost only moderately.

Model | Params | top-1 | top-5
ResNet-18 [13] | 11.7M | 30.43 | 10.76
CliqueNet-S0∗ | 5.7M | 27.52 | 8.98
ResNet-34 [13] | 21.8M | 26.73 | 8.74

Table 5. Single-crop error rates (%) on ImageNet. The ∗ indicates the models without attentional transition.

4.3. Results on ImageNet

Because we have limited computational resources and can only spread a batch among 4 GPUs, we use a batch size of 160 on ImageNet, instead of 256 as in most studies. Although a smaller batch size impairs the performance when training for the same number of epochs, the CliqueNets achieve results on ImageNet comparable with ResNets and DenseNets; see Table 5. This indicates that our proposed models can also be applied to large datasets.

The CliqueNet-S0∗ and CliqueNet-S1∗ outperform ResNet-18 and ResNet-34 with only half of their parameters. Larger models also achieve results on par with the state of the art performed by ResNets and DenseNets. When the attentional transition is considered, the CliqueNet contains both spatial attention and channel-wise attention, and has a better performance accordingly. The CliqueNet-S2 and CliqueNet-S3 both reduce the top-1 error rate by about 1% compared with their original versions, CliqueNet-S2∗ and CliqueNet-S3∗, which do not have attentional transition.

Figure 5. Visualization of the weights in the first block of pre-trained DenseNet (left) and CliqueNet (right), obtained by calculating the average absolute value of W_ij. Node 0 denotes the input layer of the block.
4.4. Further Discussion