Cloud Final Report PDF

This document proposes three variations of the DenseNet CNN called PartialDenseNet that aim to reduce computational cost while retaining performance on image classification tasks. The PartialDenseNets were built upon DenseNet, a state-of-the-art network, by eliminating potentially useless connections between layers. Experiments on the PartialDenseNets analyzed the influence of different connection characteristics and training curves. Finally, the best performing PartialDenseNet was implemented in a website for online image classification using cloud computing resources.


Image Classification using SparseNet CNN

Tiangyang Liu, Yicheng Wang, Su Pu

Electrical and Computer Engineering
University of Florida
Gainesville, Florida

Abstract: Analyzing and classifying images is a hot topic in today's computer vision field, but it can consume substantial computational resources. Cloud computing enables users and enterprises from different fields to share the same resources to process big data and run large algorithms. The Convolutional Neural Network (CNN) has been the dominant approach to image classification since 2012, and the DenseNet CNN is the most recent state-of-the-art structure. We propose three variations of DenseNet that reduce the computational cost while retaining performance. We cut off inefficient connections between layers, then conducted several experiments to examine connection features, multiform blocks, growth, learning curves, etc. Finally, we built a dynamic website that uses the best pretrained network to perform image classification. All computations benefit from cloud computing technology.

Keywords: cloud computing; CIFAR-10; DenseNet; Partial DenseNet

I. INTRODUCTION
Analyzing and classifying images is a hot topic in today's computer vision field and a crucial module in robotics. The associated dataset ImageNet, a project led by Fei-Fei Li, is the largest and most prestigious academic resource on images. Since 2010, a subset of about 1.2 million ImageNet images in 1000 categories has served as the database for the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), the most important annual competition on image classification. The state-of-the-art result from this competition has improved so quickly that the overall accuracy can now exceed 95%. High as this accuracy is, it places heavy demands on computational resources: both calculation and storage can take up great amounts of time and space, which slows down design and development. Fortunately, cloud computing technology is now mature enough to help researchers handle high-performance computation. Specifically, cloud computing relieves the pressure on computing hardware when processing big data and large algorithms by granting users paid access to remote clusters. It saves companies the cost of purchasing more servers and provides an almost unlimited amount of secured storage. By using third-party data centers such as AWS and Google Cloud, users and enterprises from different fields can share the same resources and dynamically adjust their requests according to their needs.

In the past few decades, various image classification methods such as decision trees [1], SVM [2], and the fuzzy algorithm [3] were proposed, but they are limited to tasks on small datasets and therefore have little practical value. The Convolutional Neural Network (CNN) [4], which evolved from the Artificial Neural Network (ANN), describes how a machine can understand and learn objects from pictures, and is a comparatively advanced method. In machine learning, a CNN is a type of feed-forward ANN in which the connection pattern among neurons is inspired by the organization of the animal visual system. Each visual neuron responds to overlapping regions of the visual field, which can be expressed mathematically as a convolution operation; this is the basis of CNN. A CNN consists of multiple layers of convolutional receptive fields and may include local and global pooling layers that combine the outputs of neuron clusters. The convolutional layers extract useful information from images, while the pooling layers condense the representation by discarding redundant information. CNN gave such a boost to ILSVRC that in 2012 it [5] raised the best recognition rate from 74% to 84% and became a hot research topic in turn. Numerous variations of the CNN in [5] have been published: some constructed wide and deep networks to increase accuracy; others [6-9] improved neuron efficiency to save parameters. While convolutional structures and pooling schemes are the two main aspects people delve into, classification accuracy is the single most important metric to compete on.
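The convolution and pooling steps described above can be illustrated with a minimal pure-Python sketch. The 5x5 toy image, the 2x2 kernel, and the function names are illustrative assumptions, not part of the paper's implementation; as in most CNN libraries, the kernel is applied without flipping.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def max_pool(feature_map, size=2):
    """Keep the largest value in each non-overlapping size x size window."""
    return [
        [
            max(feature_map[i + a][j + b]
                for a in range(size) for b in range(size))
            for j in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for i in range(0, len(feature_map) - size + 1, size)
    ]


# Toy 5x5 "image" whose pixel at row i, column j is 5*i + j (assumed data).
image = [[5 * i + j for j in range(5)] for i in range(5)]
kernel = [[0, 1], [1, 0]]  # hypothetical 2x2 filter

features = conv2d_valid(image, kernel)  # 4x4 feature map
pooled = max_pool(features)             # 2x2 condensed map
print(pooled)
```

The convolution turns the 5x5 input into a 4x4 feature map, and the pooling step condenses it to 2x2 by keeping only the strongest response in each window.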
We believe that CNN still has much room to develop. This work, the PartialDenseNet, is built upon a variation of CNN called DenseNet [10], which is the most recent state-of-the-art network evaluated on CIFAR-10/100. PartialDenseNet keeps the basic framework of DenseNet but eliminates potentially useless connections in order to train image classifiers that are faster and still accurate. The contributions of this work are: 1) we investigated the principles of DenseNet to design three partially dense variations; 2) we analyzed and compared PartialDenseNets using multiple metrics; 3) we discussed the influence of different training curves; 4) we implemented a PartialDenseNet in a website for online image classification. In the following sections, we first review prior work on image classification and the progress of CNN, then describe the architectures of the three proposed variations. Next, we conduct experiments to examine the influence of connection characteristics and training curves, with detailed explanation and discussion. Finally, we summarize this work and describe the website implementation.
II. RELATED WORK

Image classification refers to the task of extracting class-based information from rasterized images. Advanced classification methods can be grouped into pixel algorithms, subpixel algorithms, field algorithms, contextual approaches, knowledge-based algorithms, and combinative approaches; the method is selected according to the scenario. Among the best-known methods are decision trees [1], SVM [2], and the fuzzy algorithm [3]. Decision trees showed their potential in [1] on a land cover mapping problem. Univariate, multivariate, and hybrid decision trees were tested for classification accuracy and outperformed both the maximum likelihood method and linear discriminant function classifiers. The decision tree was considered the best choice in remote sensing applications because of its simple, explicit, and intuitive structure; its nonparametric nature allows flexible and robust operation even with noisy inputs. SVM [2] was introduced to solve pattern recognition problems and achieved great success. The fuzzy algorithm in [3] helps the K-nearest-neighbor decision rule in situations where knowledge of the probabilities is unavailable. While these methods all once achieved state-of-the-art results, they can only solve image classification tasks on small datasets. For larger datasets such as CIFAR-10/100, which are more complicated and practical, a new method must be introduced.
CNN originated in the 1990s and has grown quickly in this decade: since 2012, CNN has held a dominant position in ILSVRC and has become a hot research topic in turn. The Neural Network (NN) simulates how the human brain works and therefore has great potential in computer vision and pattern recognition. CNN inherits the advantages of NN but can extract additional useful information from pictures, such as continuity; it also uses convolution and pooling to enable precise extraction and condensation of the inputs. The work in [4], published in 1998, was the first attempt to use a CNN, called LeNet-5, in document recognition. Back then it was not widely known, partly because GPUs were not yet developed well enough for CNN to show its ability, and partly because traditional methods were already good enough for tasks on small datasets. As a result, the superiority of CNN was obscured. In 2012, [5] used CNN to compete in ILSVRC and achieved a state-of-the-art result: while the second place reached 74% accuracy, this implementation reached 84%, a big step in image classification. This CNN used 60M parameters to construct five convolutional layers, three max pooling layers, and three fully connected layers.
Numerous works building on [5] have been published. Szegedy et al. [6, 7] concentrated on increasing the width and depth of CNN while keeping the computational budget constant. The motivation was that, although increased model size tends to translate into immediate quality gains in most cases, efficiency and low parameter counts remain crucial in scenarios such as mobile vision. This 22-layer CNN was implemented with factorized convolution and aggressive regularization, and won first place in 2014. A deep CNN inevitably comes with a large number of parameters and can therefore be difficult to train. He et al. [8] presented a residual learning framework to ease the training of deep CNNs: it explicitly reformulates the layers as residual functions with reference to the layer inputs, instead of unreferenced functions. As a result, this method increases performance dramatically, especially in very deep CNNs. CNN shares multiple features with the visual part of the human brain but differs in that CNN has a purely feed-forward architecture, while the brain has many recurrent structures. Liang and Hu [9] explored this idea and proposed a variation called Recurrent CNN by connecting convolutional layers with recurrent modules. The advantages of this structure include high utilization of context information and a high recognition rate with few parameters: 0.67M parameters achieve a small 7.37% error on CIFAR-10/100. While [5] illustrated the importance of depth for performance, the variations above [6-9] increased depth only modestly and instead adopted various techniques to increase the efficiency of each neuron.
The most recent state-of-the-art result on CIFAR-10/100 image classification was published in [10] using a DenseNet CNN. The DenseNet structure originated from the observation that short connections between layers close to the input and layers close to the output enable accurate and efficient training of deep CNNs. In DenseNet, each layer is directly connected to all following layers in a feed-forward fashion, with its feature maps passed as inputs to all subsequent layers. Specifically, it uses a 12-layer dense block as a unit in the Network In Network (NIN) framework, with 40 layers in total; between blocks there are batch normalization, ReLU activation, and pooling operations. The advantages of this model include preventing vanishing gradients, encouraging feature reuse, reducing the number of parameters, simple generalization, and state-of-the-art results on all five mainstream benchmarks. The proposed PartialDenseNet builds on the achievements of DenseNet: it keeps the shortcut connections as well as the NIN architecture, but each layer accepts only the previous several layers, instead of all previous layers, as the input to its convolution, which reduces parameters and saves a large amount of computation.
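The difference between the two connection patterns can be sketched as a choice of which earlier outputs feed a given layer. This is only an illustration under stated assumptions: the function name is hypothetical, index 0 is taken to stand for the block input, and the partial variant is modeled as a sliding window over the most recent outputs.

```python
def source_layers(layer, n=None):
    """Return indices of the earlier outputs that feed `layer`.

    Index 0 stands for the block input and index k for the output of layer k.
    n=None gives DenseNet-style connectivity (all earlier outputs);
    an integer n keeps only the n most recent ones (assumed windowing).
    """
    earlier = list(range(layer))
    return earlier if n is None else earlier[-n:]


for layer in (1, 4, 12):
    print(layer, source_layers(layer), source_layers(layer, n=3))
```

With dense connectivity the list grows with the layer index, whereas with a fixed window it never exceeds n entries.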
III. ARCHITECTURE

A. DenseNet
CNN, a dominant approach to visual object recognition, was invented about 20 years ago, and people have long observed that increasing depth can translate into improved performance. Nevertheless, CNNs stayed shallow and did not reach 100 layers until 2015, when HighwayNet [11] and ResNet [8] were introduced. The challenge of training a deep network is that, as the inputs and gradients pass through so many layers, useful information can be lost. HighwayNet and ResNet addressed this problem by bypassing feature maps to later layers via so-called identity connections. Building on these works, StochasticNet [12] dropped random layers from ResNet to allow efficient flow of information, while FractalNet [13] incorporated a fractal structure with a large nominal depth but many shortcuts. DenseNet [10] was motivated by what these deep structures have in common: they all have short connections from layers close to the input to layers close to the output.
DenseNet ensures maximum information flow by connecting every layer to all others. Specifically, each layer accepts the feature maps from all previous layers as inputs, while its own output feature maps are passed to all subsequent layers. The mapping from input to output is a composite function consisting of three consecutive operations: Batch Normalization (BN), Rectified Linear Unit (ReLU), and convolution. The number of feature maps output by one composite function is referred to as the growth rate, which is 12 in the provided example. In that example, a 40-layer DenseNet is divided into three dense blocks of 12 layers each, i.e. the NIN framework; between blocks there are convolution and pooling modules that serve as transition layers where the feature maps are resized. In the final network training, a simple staircase training curve was adopted, but more effective learning schedules might bring further gains in accuracy.
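As a small worked example of how the input of a dense layer grows with depth, the sketch below counts the feature maps a layer receives. The growth rate of 12 and the 12-layer blocks come from the text; the initial width of 16 is taken from the worked example in Section III-B, and the function name is an assumption for illustration.

```python
def dense_input_width(layer, block_input=16, growth=12):
    """Feature maps seen by the 1-based `layer` in a densely connected stack.

    The layer reads the initial input plus `growth` maps from each of the
    `layer - 1` layers before it.
    """
    return block_input + growth * (layer - 1)


LAYERS_PER_BLOCK = 12

# Last layer of the first block: 16 + 12*11.
print(dense_input_width(LAYERS_PER_BLOCK))

# Last layer of the third block, assuming (as the paper's count in
# Section III-B does) that every accumulated map is carried across blocks:
# 16 + 12*35.
print(dense_input_width(3 * LAYERS_PER_BLOCK))
```

The two printed widths correspond to the 148 and 436 images quoted in Section III-B.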
B. Variation 1: PartialDenseNet
First, we discuss the connection features within a single block. In DenseNet, a node accepts all its preceding layers as inputs, processes them with the composite function, and finally passes the output to all subsequent layers. That is, in a DenseNet block of 12 layers with a growth of 12 and an input of 16 images, the last node of the block accepts 16 + 12*11 = 148 images and generates 12 images from them. Considering the NIN structure, the last node of the last block accepts 16 + 12*(12*2 + 11) = 436 images, which is a huge batch of data to process. We acknowledge the high performance of DenseNet but question the necessity of full connection, which requires up to O(L^2) links in a block of L layers. A substitute should maintain the good features of DenseNet, including non-vanishing gradients, feature reuse, a low parameter count, and simple generalization, but should cost less time to train. One method is to feed a node with just the preceding N layers instead of all layers, and to make the output of the block some selected N layers plus the block input, instead of all layers.
Fig. 1 illustrates this idea: the left graph represents DenseNet, while the right one shows the connections of the proposed PartialDenseNet when N equals 3, which requires just O(L) links in a block of L layers. Specifically, in most layers we accept just 12*3 = 36 images as input to generate the next 12 images, which is fairly small compared to the average of (16+436)/2 = 226 images in the fully connected version. We then simplify by assuming that the number of images to process is positively related to the computational cost, and therefore negatively related to efficiency. This simplification is not precise, because it ignores the resizing effect of pooling and, in the next variations, the influence of growth. Table I compares these metrics for different values of N, i.e. the number of connected layers, where the average input as a function of N is given by

$$\text{Avg. input} = \frac{\bigl(16 + 12N + 6(N-1)\bigr)N + 12(12-N)N}{12} \qquad (1)$$

Fig. 1. DenseNet vs. PartialDenseNet

TABLE I. COMPUTATIONAL COST AND EFFICIENCY VS. NUMBER OF CONNECTED LAYERS

Conn. layers   Avg. input   Comp. cost   Efficiency
1              13           0.058        16.950
2              28           0.122        8.169
3              43           0.190        5.256
4              59           0.263        3.809
5              77           0.339        2.948
6              95           0.420        2.379
7              114          0.506        1.977
8              135          0.596        1.678
9              156          0.690        1.449
10             178          0.789        1.267
11             201          0.892        1.121
12             226          1.000        1.000
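Eq. (1) and the derived columns of Table I can be evaluated with a few lines of Python. This is a sketch under one assumption that is not stated explicitly in the text: the computational cost is taken to be the average input divided by the fully connected average of 226, and the efficiency its reciprocal, which is how the table's columns appear to be related.

```python
def average_input(n):
    """Evaluate Eq. (1) for n connected layers (constants as in the paper)."""
    return ((16 + 12 * n + 6 * (n - 1)) * n + 12 * (12 - n) * n) / 12


DENSE_AVERAGE = average_input(12)  # fully connected case

for n in range(1, 13):
    avg = average_input(n)
    cost = avg / DENSE_AVERAGE     # assumed definition of relative cost
    efficiency = 1 / cost          # assumed definition of efficiency
    print(f"{n:2d}  {avg:7.2f}  {cost:.3f}  {efficiency:.3f}")
```

Because the script works with unrounded averages, a few printed values may differ from Table I in the last digit.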

From Table I we see that N must be selected carefully: N should not be so large that the computational cost rises quickly, nor so small that information and accuracy are lost. From experience, 4 to 6 is an appropriate range for N, so we implemented an instance where N equals 4 in the experiment section.
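The O(L^2) versus O(L) claim about the number of links can be checked numerically. The sketch below is illustrative only: it assumes that layer k can read at most k earlier outputs (the block input included) and that the partial variant caps this at N; the function name is hypothetical.

```python
def count_links(layers, n=None):
    """Count incoming links in one block.

    Layer k (1-based) can read k earlier outputs, block input included.
    With n=None all of them are linked; otherwise at most n are (assumed cap).
    """
    return sum(k if n is None else min(k, n) for k in range(1, layers + 1))


for layers in (12, 24, 48):
    print(layers, count_links(layers), count_links(layers, n=4))
```

Doubling the block length roughly quadruples the dense link count, while the capped count grows only about linearly.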

Fig. 2. Network in network

Fig. 3. Final Architecture of PartialDenseNet

C. Variation 2: Multiform Blocks
We then reconsider the relationship between the three blocks in the NIN structure; see Fig. 2. In a PartialDenseNet where N equals 4, the input of block A is 16 images, so its output is 16+4*12=64 images. The input of block B is 64 images, so its output is 64+4*12=112 images. Each block thus increases the number of images by the same constant. Nevertheless, the information flow close to the input, i.e. in block A, has gone through only a few operations, while the information flow close to the output, i.e. in block C, has already passed through numerous convolutions. We therefore argue that block A may contain much more authentic information than block C, so it is wise to give A a large N to retain high-fidelity information and give C a small N to increase speed. Based on this idea, we modified PartialDenseNet by using multiform blocks. In the experiment, an instance was implemented where block A has N=12, block B has N=6, and block C has N=4.
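The block-by-block arithmetic above can be reproduced with a short helper. It is a sketch assuming, as the worked numbers in the text do, that each block emits its input plus N layers of `growth` images each; the function and parameter names are illustrative.

```python
def block_outputs(n_per_block, growth_per_block, block_input=16):
    """Output width of each block when a block emits its input plus N new layers."""
    widths = []
    width = block_input
    for n, growth in zip(n_per_block, growth_per_block):
        width += n * growth
        widths.append(width)
    return widths


print(block_outputs([4, 4, 4], [12, 12, 12]))   # uniform blocks, N = 4
print(block_outputs([12, 6, 4], [12, 12, 12]))  # multiform blocks
```

For the multiform instance, the second entry is the width that block C receives as its input.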
D. Variation 3: Increasing Growth
The growth rate is also worth examining. In a PartialDenseNet, the growth is set to 12 by default, which means that every layer produces 12 images as output regardless of its input. However, the input of A is just 16 images, while the input of C has accumulated 16+12*12+6*12=232 images. Since the input of C contains more information, we may apply more convolutional kernels to it to extract more information from the 232 images, which is expressed by increasing the growth rate. This idea gives an instance where block A has a growth of 8, block B has a growth of 12, and block C has a growth of 16. Overall, the architecture of PartialDenseNet is given in Fig. 3. The average input is calculated as 93 images, which costs 0.412 computational units compared to 1 for DenseNet. We anticipate that this roughly doubles the training speed while achieving almost the same accuracy as DenseNet.
IV. IMPLEMENTATION
A. Overview
We used TensorFlow as the platform to test our architecture. Based on the architecture above, we mainly ran two types of experiments: varying the number of feed layers and varying the learning rate.
B. TensorFlow
TensorFlow is an open-source library for machine learning, started by Google Brain and first released on November 9, 2015. A distinctive feature of TensorFlow is that it uses data flow graphs for numerical computation. The user can adjust the data flow graph and create a custom calculation process. In addition, TensorFlow works on different platforms: GPUs increase the speed of computation, and mobile support stimulates the development of the machine intelligence industry. TensorFlow 0.8 even provides distributed computing, which means that the machine learning process can run on parallel nodes to increase speed.
There are also many other good open-source machine learning libraries, such as Caffe and MXNet. Since TensorFlow has more learning resources and supports a variety of neural networks, we chose it as our final library.
C. Architectures
We used three of the architectures above: DenseNet, 6-5-4 PartialDenseNet, and 4-4-4 PartialDenseNet.
We kept the same learning rate and 300 iterations, and compared the error rate on the test data and the convergence time. The fully fed DenseNet is expected to reach the highest accuracy because it preserves the most information between layers. However, since even the last layer in a block is connected to the first layer, there may be much redundant information, which wastes a great deal of training time. Taking the CIFAR-10 database as an example, DenseNet needs almost 10 days of training on our 27-CPU-core machine, which would make it impractical to apply to larger databases such as ImageNet.
D. Learning Rate
We tried three learning-rate curves because a constant learning rate did not work well. At first we used a constant 0.1, which gave low accuracy with jitter. We then tried the smaller rates 0.01 and 0.001; the results showed that 0.001 reached the best accuracy but took a long time to converge. In this section, we therefore explain three momentum learning methods.
The first is a basic momentum learning method that reduces the learning rate in steps: as shown in Fig. 1, the initial learning rate is 0.1, it changes to 0.01 after 150 iterations, and it becomes 0.001 at 225 iterations.
The second method is a gradually decaying learning rate, where LR as a function of the epoch is LR=0.1/(1+epoch*0.5).

Fig. 4. LR=0.1/(1+epoch*0.5)

The third is a cyclical triangular learning rate, an idea taken from [14]. That paper introduced the cyclical learning rate (CLR) and achieved near-optimal classification accuracy without tuning. In this experiment, we set the maximum bound (max_lr) to 0.1, the minimum bound (base_lr) to 0.0005, and the step size to 25.

Fig. 5. Cyclical Triangle LR

V. RESULT ANALYSIS
In the experiment, we recorded the error rate on the test data and the convergence time during training. In Table II, we compare the original DenseNet with the architectures we proposed.
A. Architecture

TABLE II. COMPARISON OF DIFFERENT ARCHITECTURES

Architecture                Accuracy on test data (%)   Training time (hours)
Traditional 3 layers CNN    86                          7.9
ResNet                      88.7                        -
DenseNet                    92.7                        240
6-5-4 PartialDenseNet       -                           -
4-4-4 PartialDenseNet       91.2                        47.6

As Table II shows, DenseNet achieves the highest accuracy, but its training time is unbearable. In contrast, the 6-5-4 and 4-4-4 PartialDenseNets converge within one fifth of the time at the cost of only 0.5% accuracy. The convergence time partly reflects the number of parameters of the architecture.
The results of our structure show that the original DenseNet carries a lot of redundant information that has almost no influence on the final accuracy on the test data.
B. Learning Rate
In the second experiment, we compare the accuracy on the test data for different learning-rate methods. We keep the architecture the same across methods, using only the 4-4-4 PartialDenseNet because it converges fastest.

Fig. 6. Compare of Large and Small LR

Fig. 9 shows the accuracy from epoch 1 to 300 when we use the first momentum learning method. Before 150 iterations the LR remains 0.1; as Fig. 8 (a) shows, the learning step is so large that training cannot reach the lowest loss point. So in Fig. 8, after around 50 iterations the accuracy just oscillates and no longer improves. After 150 epochs we change the learning rate to 0.01, and the accuracy increases dramatically. When the learning rate changes to 0.001 at 225 epochs, there is another improvement in accuracy.

TABLE III. COMPARISON OF DIFFERENT LEARNING RATES

Learning method                             Accuracy on test data (%)
0.1 all the time                            86
Basic 0.1, 0.01, 0.001 Momentum Learning    91.2
LR=0.1/(1+epoch*0.5)                        -
Cyclical Triangle                           -
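The three learning-rate schedules compared in this work can be written down compactly. The sketch below is not the authors' training code: the function names are illustrative, the switch points are assumed to fall exactly at epochs 150 and 225, and the triangular schedule uses the formula from Smith [14] with the step size applied to epochs rather than iterations.

```python
import math


def step_lr(epoch):
    """Staircase schedule: 0.1, then 0.01 from epoch 150, then 0.001 from epoch 225."""
    if epoch < 150:
        return 0.1
    if epoch < 225:
        return 0.01
    return 0.001


def decay_lr(epoch):
    """Gradual decay LR = 0.1 / (1 + epoch * 0.5)."""
    return 0.1 / (1 + epoch * 0.5)


def triangular_lr(epoch, base_lr=0.0005, max_lr=0.1, step_size=25):
    """Triangular cyclical schedule in the form given by Smith [14]."""
    cycle = math.floor(1 + epoch / (2 * step_size))
    x = abs(epoch / step_size - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1 - x)


for epoch in (0, 25, 50, 150, 225):
    print(epoch, step_lr(epoch), decay_lr(epoch), triangular_lr(epoch))
```

With a step size of 25, the triangular schedule climbs from base_lr to max_lr over 25 epochs and returns to base_lr over the next 25, so one full cycle spans 50 epochs.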

Fig. 7. Accuracy with Basic 0.1, 0.01, 0.001 Momentum Learning

VI. WEBPAGE
A. Overview
Motivation: We would like to demonstrate the results of the image classification models through a website, built to get other people interested in this field. Since CIFAR-10 has only 10 classes, it is not enough to show the variety of knowledge that image classification can provide; therefore, we built the demo website on a model we downloaded from the Internet [15]. We also present our CIFAR-10 training model on the next page, without the training interface. Finally, we introduce ourselves on the "about us" page.

B. Implementation modules:
1) Front end: HTML and CSS
HTML provides the basic structure of the whole webpage, and CSS arranges the layout of the different elements.
2) Back end: PHP
We built the model on a local server. PHP is used in the file system to issue commands that run the ImageNet model, and it returns the data from the server to the website.

Fig. 8. Webpage Framework

C. Website Pages:
1) index.php page
The main page consists of four parts: the headline, the navigation bar, the ImageNet interface, and the Wikipedia page. All four parts are written in HTML and arranged by CSS. Only retrieving the result requires the PHP file functions. The navigation bar links to the other pages of the demo website. The ImageNet interface includes the upload and reload parts; it uses a PHP form, and the data is written to a text file for further use. The Wikipedia part takes the keyword extracted from the result and searches the Internet for that word; we use an iframe to embed the resulting window in the webpage.

Fig. 9. index page

2) file upload page
This PHP page handles the uploaded file. After receiving the form from the index page, the PHP page triggers a shell script on the server. Because the server does not have permission to run the local Python library directly, we created a continuously listening shell script that runs in the background; when a command arrives from the PHP page, the listening shell starts the Python script to compute the result for the image.

Fig. 10. upload page

After the result has been computed, it is written to a text file. We set the file upload page to refresh after 11 seconds, based on the running time of the Python program in the background. The Wikipedia part also receives the resulting name and updates the Wikipedia link, so people can learn more about the image they uploaded, which is a great learning process for many people. However, we did not have enough time to resize images to suit the calculation model; that will be our future work.
D. Cifar10 page:
CIFAR-10 is the main training dataset in this project. We did not use it as our testing model because the dataset contains only 10 classes, as mentioned before. Therefore, we created this page to show our final report: the PDF is embedded in the webpage and can be scrolled to view in full. We use the embed code from Scribd.com [16] to realize this function.
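The request flow described for the file upload page (a background listener that picks up a request, runs the classifier, and writes the result to a text file) can be sketched as follows. The paper's listener is a shell script; this Python version is only an analogous illustration, and the file names, function names, and stub classifier are all hypothetical.

```python
import os
import time


def classify_stub(image_path):
    """Hypothetical stand-in for the Python classification script."""
    return "unknown"


def process_pending(request_file, result_file, classify=classify_stub):
    """Handle one queued request; return True if a result was written."""
    if not os.path.exists(request_file):
        return False
    with open(request_file) as handle:
        image_path = handle.read().strip()
    os.remove(request_file)  # consume the request exactly once
    with open(result_file, "w") as handle:
        handle.write(classify(image_path) + "\n")
    return True


def listen(request_file, result_file, poll_seconds=1.0):
    """Poll forever, like the background listener described in the text."""
    while True:
        if not process_pending(request_file, result_file):
            time.sleep(poll_seconds)
```

In this arrangement the web page only writes the request file and later reads the result file, so it never needs permission to run the model itself.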

VII. CONCLUSION

VIII. FUTURE WORK

IX. ACKNOWLEDGMENT
In this paper we only use the 40-layer DenseNet, which reaches only 93% accuracy in the original paper. With the 100-layer architecture we could achieve almost 95% accuracy, but at the cost of a much longer training time, which goes against our purpose.
Although we got access to 4 GPUs on HiPerGator, we were still unable to install cuDNN for GPU training. Thus we could only use CPUs to train on a smaller database, CIFAR-10, rather than ImageNet. In addition, the training times in our paper are only comparable among the architectures we designed; it is meaningless to compare them with architectures in other papers that were trained with GPUs.
Although the setup did not succeed, we still thank the system administrators of HiPerGator for spending one month on environment setup. We also thank Dr. Damon L. Woodard for providing private access to HiPerGator.

REFERENCES
[1] Friedl, Mark A., and Carla E. Brodley. "Decision tree classification of land cover from remotely sensed data." Remote Sensing of Environment 61.3 (1997): 399-409.
[2] Joachims, Thorsten. "Text categorization with support vector machines: Learning with many relevant features." European Conference on Machine Learning. Springer Berlin Heidelberg, 1998.
[3] Keller, James M., Michael R. Gray, and James A. Givens. "A fuzzy k-nearest neighbor algorithm." IEEE Transactions on Systems, Man, and Cybernetics 4 (1985): 580-585.
[4] LeCun, Yann, et al. "Gradient-based learning applied to document recognition." Proceedings of the IEEE 86.11 (1998): 2278-2324.
[5] Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "Imagenet classification with deep convolutional neural networks." Advances in Neural Information Processing Systems. 2012.
[6] Szegedy, Christian, et al. "Going deeper with convolutions." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.
[7] Szegedy, Christian, et al. "Rethinking the inception architecture for computer vision." arXiv preprint arXiv:1512.00567 (2015).
[8] He, Kaiming, et al. "Deep residual learning for image recognition." arXiv preprint arXiv:1512.03385 (2015).
[9] Liang, Ming, and Xiaolin Hu. "Recurrent convolutional neural network for object recognition." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.
[10] Huang, Gao, Zhuang Liu, and Kilian Q. Weinberger. "Densely connected convolutional networks." arXiv preprint arXiv:1608.06993 (2016).
[11] Srivastava, Rupesh Kumar, Klaus Greff, and Jürgen Schmidhuber. "Highway networks." arXiv preprint arXiv:1505.00387 (2015).
[12] Huang, Gao, et al. "Deep networks with stochastic depth." arXiv preprint arXiv:1603.09382 (2016).
[13] Larsson, Gustav, Michael Maire, and Gregory Shakhnarovich. "FractalNet: Ultra-Deep Neural Networks without Residuals." arXiv preprint arXiv:1605.07648 (2016).
[14] Smith, Leslie N. "Cyclical Learning Rates for Training Neural Networks."
[15] https://www.tensorflow.org/
[16] https://zh.scribd.com/
