Cloud Final Report
Tiangyang Liu
Yicheng Wang
Su Pu
Keywords—cloud computing; CIFAR-10; DenseNet; PartialDenseNet
I. INTRODUCTION
Analyzing and classifying images is a hot topic in today's computer vision field and a crucial module in robotics. ImageNet, the associated dataset led by Fei-Fei Li, is the largest and most prestigious academic image resource. Since 2010, a 1.2-million-image subset of ImageNet spanning 1000 categories has served as the database for the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), the most important annual competition in image classification. The state-of-the-art results from this competition have improved so quickly that overall accuracy now exceeds 95%. High as the accuracy is, the demands on computational resources are critical: both calculation and storage consume large amounts of time and space, which slows down design and development. Fortunately, cloud computing technology is now mature enough to help researchers with high-performance computation. Specifically, cloud computing relieves the pressure on local hardware when processing big data and heavy algorithms by granting users paid access to remote clusters. It saves companies the cost of purchasing more servers and provides an almost unlimited amount of secured storage. By using third-party data centers such as AWS and Google Cloud, users and enterprises from different fields can share the same resources and dynamically adjust their requests according to need.
II. RELATED WORK
Image classification refers to the task of extracting class-based information from rasterized images. Advanced classification methods can be grouped into per-pixel algorithms, subpixel algorithms, per-field algorithms, contextual approaches, knowledge-based algorithms, and combinations of these, with the appropriate method selected for each scenario. Notable methods include the decision tree [1], the SVM [2], and the fuzzy algorithm [3]. The decision tree revealed its potential in [1] for land-mapping problems: univariate, multivariate, and hybrid decision trees were compared on classification accuracy and outperformed both the maximum likelihood method and linear discriminant function classifiers. It was considered the best choice for remote sensing applications thanks to its simple, explicit, and intuitive structure, while its nonparametric nature enables flexible and robust operation even on noisy inputs. The SVM [2] was introduced to solve pattern recognition problems and achieved great success. The fuzzy algorithm in [3] assists the K-nearest-neighbor decision rule in situations where knowledge of the underlying probabilities is missing. While each of these methods once achieved state-of-the-art results, they only handle image classification on small datasets. For larger, more complicated, and more practical datasets such as CIFAR-10/100, a different method must be introduced.
The CNN originated in the 1990s and has grown rapidly in the past decade: since 2011, CNNs have dominated ILSVRC and have in turn become a hot research topic. The Neural Network (NN) simulates how the human brain works and therefore has deep potential in computer vision and pattern recognition. The CNN inherits the advantages of the NN but can additionally extract useful information, such as spatial continuity, from images; it also features convolution and pooling schemes that enable precise extraction and condensation of the inputs. Published in 1998, [4] was the first attempt to use a CNN, called LeNet-5, for document recognition. Back then it drew little attention, partly because GPUs were not yet developed enough for CNNs to exhibit their ability, and partly because traditional methods were already good enough for tasks on small datasets. As a result, the superiority of the CNN was obscured. In 2012, article [5] took up the CNN to challenge ILSVRC and achieved a state-of-the-art result: while the second-place entry reached 74% accuracy, this implementation reached 84%, a big step forward in image classification. This CNN used 60M parameters to construct five convolutional layers, three max-pooling layers, and three fully connected layers.
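As a reference for scale, here is a minimal PyTorch sketch of such a five-conv, three-pool, three-FC stack. It is an illustrative reconstruction: the channel and kernel sizes follow the original AlexNet configuration, which this report does not spell out.

import torch.nn as nn

# AlexNet-style stack: five convolutional layers, three max-pooling
# layers, and three fully connected layers (~61M parameters for
# 227x227 RGB inputs and 1000 classes; sizes follow the AlexNet
# paper, not this report).
alexnet_like = nn.Sequential(
    nn.Conv2d(3, 96, kernel_size=11, stride=4), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Flatten(),
    nn.Linear(256 * 6 * 6, 4096), nn.ReLU(),
    nn.Linear(4096, 4096), nn.ReLU(),
    nn.Linear(4096, 1000),
)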
Numerous works built upon [5] have been published. Szegedy et al. [6, 7] concentrated on increasing the width and depth of the CNN while keeping the computational budget constant. This idea came from the observation that, although increased model size tends to translate into immediate quality gains in most cases, efficiency and low parameter counts remain crucial in resource-constrained settings.
III. ARCHITECTURE
A. DenseNet
The CNN, invented some 20 years ago, is the dominant approach to visual object recognition, and it has long been observed that increased depth translates into improved performance. Nevertheless, CNNs remained shallow and did not reach 100 layers until 2015, when HighwayNet [11] and ResNet [8] were introduced. The challenge of training a deep network is that, as inputs and gradients pass through many layers, effective information can be lost. HighwayNet and ResNet address this problem by bypassing feature maps to later layers via so-called identity connections. Building on these works, StochasticNet [12] drops random layers from ResNet to allow efficient information flow, while FractalNet [13] repeatedly joins parallel sub-paths of different depths, keeping short paths for gradient flow while increasing nominal depth.
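To make the dense connectivity concrete, here is a minimal sketch of a DenseNet dense block in PyTorch. It is an illustrative reconstruction: the growth rate of 12 follows the original CIFAR DenseNet configuration and is an assumption, and the PartialDenseNet variant evaluated below is not reproduced here. The 4-4-4 and 6-5-4 names used later presumably refer to the number of layers in three successive blocks.

import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    """Dense block: layer i receives the concatenation of the block
    input and the outputs of all preceding layers, so information and
    gradients reach every layer directly."""
    def __init__(self, num_layers, in_channels, growth_rate=12):
        super().__init__()
        self.layers = nn.ModuleList()
        for i in range(num_layers):
            channels = in_channels + i * growth_rate
            # DenseNet's composite function: BN -> ReLU -> 3x3 conv.
            self.layers.append(nn.Sequential(
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels, growth_rate, kernel_size=3,
                          padding=1, bias=False),
            ))

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            features.append(layer(torch.cat(features, dim=1)))
        return torch.cat(features, dim=1)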
TABLE I. CONNECTED LAYERS, AVERAGE INPUT COMPUTATIONAL COST, AND EFFICIENCY

N     Conn. layers    Avg. input comp. cost    Efficiency
1     13              0.058                    16.950
2     28              0.122                    8.169
3     43              0.190                    5.256
4     59              0.263                    3.809
5     77              0.339                    2.948
6     95              0.420                    2.379
7     114             0.506                    1.977
8     135             0.596                    1.678
9     156             0.690                    1.449
10    178             0.789                    1.267
11    201             0.892                    1.121
12    226             1.000                    1.000
TABLE II. RESULT OF THE 4-4-4 PARTIALDENSENET

Architecture             Accuracy on test data (%)    Training time (hours)
4-4-4 PartialDenseNet    91.2                         47.6
B. Learning Rate
In the second experiment, we compare the accuracy on the test data under different learning rate methods. We keep the architecture the same across methods, using the 4-4-4 PartialDenseNet because it converges fastest.
The third method is a cyclical triangle learning rate, an idea we take from [14]. That paper introduced the cyclical learning rate (CLR) and achieved near-optimal classification accuracy without tuning. In this experiment, we set the maximum bound (max_lr) to 0.1, the minimum bound (base_lr) to 0.0005, and the step size to 25.
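A minimal sketch of the triangular CLR policy from [14] with the bounds above; the helper name and the choice of stepping once per iteration are our own assumptions.

import math

def triangular_clr(it, base_lr=0.0005, max_lr=0.1, step_size=25):
    """Triangular cyclical learning rate from [14]: the rate climbs
    linearly from base_lr to max_lr over step_size steps, then falls
    back, repeating every 2 * step_size steps."""
    cycle = math.floor(1 + it / (2 * step_size))
    x = abs(it / step_size - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1 - x)

# Example: the rate peaks at step 25 and returns to base_lr at step 50.
assert abs(triangular_clr(25) - 0.1) < 1e-9
assert abs(triangular_clr(50) - 0.0005) < 1e-9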
V. RESULT ANALYSIS
TABLE III. COMPARISON OF DIFFERENT ARCHITECTURES

Architecture               Accuracy on test data (%)    Training time (hours)
Traditional 3-layer CNN    86                           7.9
ResNet                     88.7                         —
DenseNet                   92.7                         240
6-5-4 PartialDenseNet      —                            —
TABLE IV. COMPARISON OF DIFFERENT LEARNING RATE METHODS

Learning method                 Accuracy on test data (%)
0.1 all the time                86
Basic 0.1, 0.01, 0.001          91.2
Momentum learning               —
LR = 0.1/(1 + 0.5*epoch)        —
Cyclical triangle               —
VI. WEBPAGE
A. Overview
Motivation: We would like to demonstrate the results of our image classification models, so this website is built to help other people develop an interest in this field. Since CIFAR-10 has only 10 classes, it is not enough to show the variety of knowledge image classification can provide; therefore, we built the demo website around a model we downloaded from the Internet [15]. We also present our CIFAR-10 training model on the next page, without a training interface. Finally, we introduce ourselves on the About Us page.
B. Implementation Modules
1) Front end: HTML and CSS
HTML provides the basic structure of the whole webpage, and CSS arranges the layout of the different structural elements.
2) Back end: PHP
We built the model on a local server. PHP is used in the file system to issue the command that runs the ImageNet model, and it passes the resulting data from the server back to the website.
C. Website Pages
1) index.php page
The main page consists of four parts: the headline, the navigation bar, the ImageNet interface, and the Wikipedia pane. All four parts are written in HTML and arranged by CSS; only the retrieval of the result needs the PHP file functions. The navigation bar links to the other pages of the site.
After the result has been computed, it is written to a text file, and we set the file-upload page to refresh after 11 seconds, based on the running time of the back-stage Python program. The Wikipedia pane also receives the predicted class name and updates the Wikipedia link, so visitors can learn more about the image they uploaded, which makes for a great learning experience. However, we did not have enough time to resize uploaded images to suit the calculation model; that is left as future work.
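As an illustration of the back-stage flow described above, here is a minimal Python sketch. The file paths, the output file name, and the choice of model are hypothetical placeholders; the site actually uses a model downloaded from the Internet [15].

import torch
from torchvision import models, transforms
from PIL import Image

# Preprocessing expected by torchvision's ImageNet models.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

weights = models.ResNet50_Weights.IMAGENET1K_V1
model = models.resnet50(weights=weights)
model.eval()

# Classify the uploaded image ("uploads/input.jpg" is a placeholder).
img = Image.open("uploads/input.jpg").convert("RGB")
with torch.no_grad():
    logits = model(preprocess(img).unsqueeze(0))
label = weights.meta["categories"][int(logits.argmax(dim=1))]

# Write the top-1 class name to the text file the PHP page polls.
with open("result.txt", "w") as f:
    f.write(label)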
D. CIFAR-10 Page
CIFAR-10 is our main training dataset in this project; the reason we did not use it for the testing demo is that, as noted above, the dataset contains only 10 classes. We therefore created this page to show our final report: the PDF is embedded in the webpage and can be scrolled to view in full, using embed code from Scribd.com [16].
VII. CONCLUSION
In this paper we use only a 40-layer DenseNet, which reached about 93% accuracy in the original paper. A 100-layer architecture could achieve almost 95% accuracy but with much longer training time, which goes against our purpose.
Although we had access to 4 GPUs on Hipergator, we were unable to install cuDNN for GPU training. Thus we could only use CPUs and trained on CIFAR-10, a smaller database than ImageNet. Moreover, the training times in our paper are comparable only among the architectures we designed; it is meaningless to compare them with architectures in other papers that were trained on GPUs.
IX. ACKNOWLEDGMENT
Though the setup did not succeed, we still thank the system admin of Hipergator for spending one month on environment setup. We also thank Dr. Damon L. Woodard for providing private access to Hipergator.
REFERENCES
[1]
[2]
[3]
[4]
[5]
[6]
[7]
[8]
[9]
[10]
[11]
[12]
[13]
[14]
[15]
[16]