Deep Learning: Techniques and Applications

Debasmita Bhoumik

MTech 3rd Semester


Roll No : 97/CSM/160001
Registration No : 221-1221-0303-10

Seminar Paper
Contents

1 Introduction
1.1 What Makes Deep Learning State-of-the-Art?
1.2 What is the Difference Between Deep Learning and Machine Learning?

2 Literature review

3 Techniques
3.1 Where Machine Learning fails there comes Deep Learning
3.2 How a Deep Neural Network Learns
3.3 Convolutional Neural Network
3.3.1 Working with the previous example
3.3.1.1 Filtering
3.3.1.2 Pooling
3.3.1.3 Normalization
3.3.1.4 Deep stacking
3.3.1.5 Fully connected layer
3.3.1.6 Gradient descent

4 Conclusion

References

List of Figures

1.1 Comparison of results in ImageNet
1.2 Comparison of results with SVM
1.3 Comparison of results using MNIST dataset

3.1 Example with machine learning
3.2 Example with machine learning: Trickier cases
3.3 Example with machine learning: What computers see
3.4 Example with machine learning: Matching is poor
3.5 Convolutional Neural Network
3.6 Example with CNN
3.7 Example with CNN: features
3.8 Example with CNN: matching
3.9 Example with CNN: Filtering
3.10 Example with CNN: Convolution layer
3.11 Example with CNN: After Convolution layer
3.12 Example with CNN: After Pooling layer
3.13 Rectified Linear Units
3.14 Example with CNN: After Rectified Linear Units
3.15 Deep stacking
3.16 Fully connected layer
3.17 Fully connected layer: Outcome
3.18 Back Propagation
3.19 Flow of the CNN
3.20 Gradient descent

Chapter 1

Introduction

Learning is a process by which a system improves its performance from experience. Since 2006, deep learning has emerged as a new area of machine learning [], impacting a wide range of signal and information processing work in both traditional and new areas. Many traditional machine learning and signal processing techniques exploit shallow architectures, which contain a single layer of nonlinear feature transformation. Examples of shallow architectures are conventional hidden Markov models (HMMs) [], maximum entropy (MaxEnt) models [], support vector machines (SVMs) [], kernel regression [], and multilayer perceptrons (MLPs) with a single hidden layer [].
A standard neural network (NN) consists of many simple, connected processors called neurons, each producing a sequence of real-valued activations. Input neurons are activated through sensors perceiving the environment; other neurons are activated through weighted connections from previously active neurons. Human information processing mechanisms (e.g., vision and speech), however, suggest the need for deep architectures [] that extract complex structure and build internal representations from rich sensory inputs (e.g., natural images and their motion, speech, and music). It is natural to believe that the state of the art in processing these types of media signals can be advanced if efficient and effective deep learning algorithms are developed. Signal processing systems with deep architectures are composed of many layers of nonlinear processing stages, where the output of each lower layer is fed to its immediate higher layer as input.
Deep learning is a type of machine learning in which a model learns to perform classification tasks directly from images, text, or sound. It is usually implemented using a neural network architecture. The term "deep" refers to the number of layers in the network: the more layers, the deeper the network. Traditional neural networks contain only 2 or 3 layers, while deep networks can have hundreds.
A few examples of deep learning at work: a self-driving vehicle slows down as it approaches a pedestrian crosswalk, an ATM rejects a counterfeit bank note, and a smartphone app gives an instant translation of a foreign street sign. Deep learning is especially well suited to identification applications such as face recognition, text translation, voice recognition, and advanced driver assistance systems, including lane classification and traffic sign recognition.
In this paper we look at some of the techniques used in deep learning and their corresponding applications.

1.1 What Makes Deep Learning State-of-the-Art?


In a word, accuracy [Fig: 1.1]. Advanced tools and techniques have dramatically improved deep learning algorithms, to the point where they can outperform humans at classifying images on some benchmarks.
Three technology enablers make this degree of accuracy possible:

Easy access to massive sets of labeled data:
Data sets such as ImageNet [] and PASCAL VOC [] are freely available, and are useful for training on many different types of objects.

Increased computing power:
High-performance GPUs accelerate training on the massive amounts of data needed for deep learning, reducing training time from weeks to hours.

Pretrained models built by experts:
Models such as AlexNet [] can be retrained to perform new recognition tasks using a technique called transfer learning []. While AlexNet was trained on 1.3 million high-resolution images to recognize 1000 different objects, accurate transfer learning can be achieved with much smaller datasets.

Figure 1.1: Comparison of results in ImageNet

Kim, Guk Bae, et al. (2017) show how deep learning can outperform other methods. They used 1200 regions of interest (from GE or Siemens scanners) to compare shallow and deep learning for classifying the patterns of interstitial lung diseases (ILDs). They employed a convolutional neural network (CNN) with six learnable layers, consisting of four convolution layers and two fully connected layers, and compared its results with those of a shallow learner, a support vector machine (SVM). The CNN classifier showed [Fig: 1.2] significantly better accuracy than the SVM classifier by 69

Figure 1.2: Comparison of results with SVM

In 2006, Hinton et al. obtained a groundbreaking result using Deep Belief Networks. On the MNIST data set they achieved an error rate as low as 1.25%, which is much lower than that of shallow learning methods [Fig: 1.3].

Figure 1.3: Comparison of results using MNIST dataset

1.2 What is the Difference Between Deep Learning and Machine Learning?

Deep learning is a subtype of machine learning. With traditional machine learning, you manually extract the relevant features of an image. With deep learning, you feed the raw images directly into a deep neural network that learns the features automatically. Deep learning often requires hundreds of thousands or millions of images for the best results, whereas traditional machine learning can produce good results with small data sets. Deep learning is also computationally intensive and typically requires a high-performance GPU.

Chapter 2

Literature review

Deep Belief Networks (DBNs) were the first effective deep learning method, introduced by Hinton et al. []. In this work, they successfully trained a DBN and used it to tackle the task of handwritten digit recognition. Until then, learning in densely connected, directed belief nets with multiple hidden layers had been very difficult. They proposed a layer-by-layer, unsupervised method to train such a model by stacking restricted Boltzmann machines (RBMs), i.e., two-layer networks with one visible and one hidden layer, on top of each other. The presented algorithm was both fast and efficient. Shortly after, the same model was further studied and extended by Bengio et al. [], gaining considerable popularity. Since then, DBNs have been used for, among other tasks, object recognition, acoustic modeling, and speech recognition.
Han et al. [] proposed Mesh Convolutional Restricted Boltzmann Machines (MCRBMs) for learning highly discriminative 3D features from 3D meshes. The learned features were designed to preserve the structure between local regions and can be used as local or global features. A novel raw representation of the local region, called Local Function Energy Distribution (LFED), was provided as input to the network. In addition, multiple MCRBMs were combined to form a deeper model, named Mesh Convolutional Deep Belief Network (MCDBN).
Following the success in computer vision, the first applications of deep learning to clinical data were in image processing, especially the analysis of brain Magnetic Resonance Imaging (MRI) scans to predict Alzheimer's disease and its variations [3, 1]. In other medical domains, Convolutional Neural Networks (CNNs) were used to infer a hierarchical representation of low-field knee MRI scans to automatically segment cartilage and predict the risk of osteoarthritis [4].
In the era of big data, transforming biomedical big data into valuable knowledge has been one of the most important challenges in bioinformatics. The application of deep learning in bioinformatics to gain insight from data has been emphasized in both academia and industry. Deep Neural Networks (DNNs) have been widely applied in protein structure prediction research. Heffernan et al. [2] applied a sparse auto-encoder (SAE) to protein amino acid sequences to solve prediction problems for secondary structure, torsion angle, and accessible surface area. In another study, Spencer et al. [5] applied a DBN to amino acid sequences along with PSSM and Atchley factors to predict protein secondary structure. Stober et al. [6] used a CNN to classify the rhythm type and genre of the music that participants listened to, based on electroencephalography recordings.

Chapter 3

Techniques

3.1 Where Machine Learning fails there comes Deep Learning

Machine learning is the field of study that gives computers the ability to learn without being explicitly programmed. The workflow is:
(i) take some data,
(ii) train a model on that data, and
(iii) use the trained model to make predictions on new data.
Now suppose we have to classify an image (a set of pixels) as either an X or an O. First we look at the conventional machine learning approach [Fig: 3.1].
The situation is trickier in some cases [Fig: 3.2].

Figure 3.1: Example with machine learning

Figure 3.2: Example with machine learning: Trickier cases

With pixel-by-pixel matching, it is hard to decide whether the image is an X or an O [Fig: 3.3], because the computer only sees whether or not each individual pixel is set [Fig: 3.4].

Figure 3.3: Example with machine learning: What computers see

Figure 3.4: Example with machine learning: Matching is poor

This method therefore does not work well here. For improved matching we will explore the deep learning mechanism.
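
To make the weakness of pixel-by-pixel matching concrete, here is a minimal sketch. The 5x5 image, the +1/-1 pixel encoding, and the helper names `shift_right` and `pixel_match_fraction` are illustrative assumptions, not taken from the figures.

```python
# Hypothetical 5x5 image of an X: +1 = ink pixel, -1 = background pixel.
X_TEMPLATE = [
    [ 1, -1, -1, -1,  1],
    [-1,  1, -1,  1, -1],
    [-1, -1,  1, -1, -1],
    [-1,  1, -1,  1, -1],
    [ 1, -1, -1, -1,  1],
]


def shift_right(image):
    """Move every row one pixel to the right, filling with background."""
    return [[-1] + row[:-1] for row in image]


def pixel_match_fraction(a, b):
    """Fraction of pixel positions at which the two images agree."""
    total = len(a) * len(a[0])
    same = sum(1 for row_a, row_b in zip(a, b)
               for pa, pb in zip(row_a, row_b) if pa == pb)
    return same / total


shifted_x = shift_right(X_TEMPLATE)
identical_score = pixel_match_fraction(X_TEMPLATE, X_TEMPLATE)
shifted_score = pixel_match_fraction(X_TEMPLATE, shifted_x)
print(identical_score, shifted_score)
```

In this sketch the shifted image still shows most of an X, yet fewer than half of its pixels agree with the template, which is why whole-image matching is fragile.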

3.2 How a Deep Neural Network Learns


Let's say we have a set of images where each image contains one of four different categories of object, and we want the deep learning network to automatically recognize which object is in each image. We label the images in order to have training data for the network. Using this training data, the network can then start to understand each object's specific features and associate them with the corresponding category. Each layer in the network takes in data from the previous layer, transforms it, and passes it on. The network increases the complexity and detail of what it is learning from layer to layer. The network learns directly from the data; we have no influence over which features are learned.

3.3 Convolutional Neural Network


A convolutional neural network (CNN, or ConvNet) is one of the most popular algorithms for deep learning with images and video. Like other neural networks, a CNN is composed of an input layer, an output layer, and many hidden layers in between. These layers perform one of three types of operations on the data: convolution, pooling, or rectified linear unit (ReLU). Convolution puts the input images through a set of convolutional filters, each of which activates certain features from the images. Pooling simplifies the output by performing nonlinear downsampling, reducing the number of parameters that the network needs to learn. The rectified linear unit (ReLU) allows for faster and more effective training by mapping negative values to zero and keeping positive values unchanged. These three operations are repeated over tens or hundreds of layers, with each layer learning to detect different features.

Figure 3.5: Convolutional Neural Network

After feature detection, the architecture of a CNN shifts to classification. The next-to-last layer is a fully connected (FC) layer that outputs a vector of K dimensions, where K is the number of classes that the network will be able to predict. This vector contains a score for each class of the image being classified. The final layer of the CNN architecture uses a softmax function to turn these scores into class probabilities, which form the classification output.
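
As an illustration of the final step, the following is a minimal sketch of a softmax function, assuming the fully connected layer produces K raw scores. The function name, the example scores, and the max-subtraction trick for numerical stability are assumptions added here, not part of the original text.

```python
import math


def softmax(scores):
    """Turn K raw class scores into K probabilities that sum to 1."""
    # Subtracting the maximum does not change the result but avoids overflow.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for the two classes "X" and "O".
probabilities = softmax([2.0, 0.5])
print(probabilities)
```

The class with the largest score always receives the largest probability, so the predicted class is unchanged; softmax only rescales the scores so they can be read as probabilities.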

3.3.1 Working with the previous example


Let us return to the previous example: we have to classify an image as either an X or an O [Fig: 3.1, Fig: 3.2]. Instead of matching the entire picture pixel by pixel, a convolutional neural network (ConvNet) matches pieces of the image [Fig: 3.6].

Figure 3.6: Example with CNN

So the ConvNet looks for three kinds of features [Fig: 3.7].

Figure 3.7: Example with CNN: features

The matching is done as shown in Fig: 3.8.

Figure 3.8: Example with CNN: matching

3.3.1.1 Filtering
The math behind the match is Filtering. The steps are:
1. Line up the feature and the image patch.
2. Multiply each image pixel by the corresponding feature pixel.
3. Add them up.
4. Divide by the total number of pixels in the feature.
The steps are shown graphically in Fig: 3.9.
After the convolution layer has been applied with the first feature, we get Fig: 3.10.
After the convolution layer has been applied with all the feature pieces, the final result is Fig: 3.11.
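
The four filtering steps above can be sketched in plain Python as follows. The 5x5 image, the 3x3 diagonal feature, the +1/-1 pixel encoding, and the function names are illustrative assumptions rather than the exact data in the figures.

```python
def match_patch(feature, patch):
    """Steps 2-4: multiply pixel by pixel, add up, divide by pixel count."""
    pixel_count = len(feature) * len(feature[0])
    total = sum(f * p
                for feature_row, patch_row in zip(feature, patch)
                for f, p in zip(feature_row, patch_row))
    return total / pixel_count


def convolve(image, feature):
    """Step 1 repeated: line the feature up with every possible patch."""
    fh, fw = len(feature), len(feature[0])
    filtered = []
    for r in range(len(image) - fh + 1):
        row = []
        for c in range(len(image[0]) - fw + 1):
            patch = [image_row[c:c + fw] for image_row in image[r:r + fh]]
            row.append(match_patch(feature, patch))
        filtered.append(row)
    return filtered


# Hypothetical 5x5 image of an X: +1 = ink pixel, -1 = background pixel.
image = [
    [ 1, -1, -1, -1,  1],
    [-1,  1, -1,  1, -1],
    [-1, -1,  1, -1, -1],
    [-1,  1, -1,  1, -1],
    [ 1, -1, -1, -1,  1],
]
# Hypothetical 3x3 feature: a diagonal stroke from top-left to bottom-right.
diagonal = [
    [ 1, -1, -1],
    [-1,  1, -1],
    [-1, -1,  1],
]

filtered = convolve(image, diagonal)
for row in filtered:
    print(row)
```

A value of 1.0 in the filtered image marks a position where the feature matches the patch exactly; values closer to zero or negative indicate weaker matches.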

3.3.1.2 Pooling
Pooling shrinks the image stack. The steps are:
1. Pick a window size (usually 2 or 3).
2. Pick a stride (usually 2).
3. Walk your window across your filtered images.
4. From each window, take the maximum value.

After the pooling layer has been applied to all the filtered images, the final result is Fig: 3.12.
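
A minimal sketch of the pooling steps, using a window size of 2 and a stride of 2. The input values and the choice to let windows at the border simply cover fewer pixels are assumptions made for illustration.

```python
def max_pool(image, window=2, stride=2):
    """Walk a window across the image and keep the maximum of each window."""
    rows, cols = len(image), len(image[0])
    pooled = []
    for r in range(0, rows, stride):
        pooled_row = []
        for c in range(0, cols, stride):
            # Windows that reach past the border are truncated (assumption).
            values = [image[i][j]
                      for i in range(r, min(r + window, rows))
                      for j in range(c, min(c + window, cols))]
            pooled_row.append(max(values))
        pooled.append(pooled_row)
    return pooled


# Hypothetical 4x4 filtered image.
filtered_image = [
    [ 0.1,  0.5, -0.2,  0.3],
    [ 0.7, -0.1,  0.0,  0.9],
    [-0.3,  0.2,  0.4, -0.6],
    [ 0.6,  0.1, -0.5,  0.8],
]

pooled = max_pool(filtered_image)
print(pooled)
```

The 4x4 input shrinks to 2x2 while the strongest response in each window is kept, so the result is smaller but still records where each feature matched best.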

3.3.1.3 Normalization
For normalization, we change every negative value to zero using a ReLU (Rectified Linear Unit). ReLU is an activation function defined as A(x) = max(0, x).
It outputs x if x is positive and 0 otherwise [Fig: 3.13].

Figure 3.9: Example with CNN: Filtering

After the ReLU layer has been applied, the final result is Fig: 3.14.
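
The ReLU step can be sketched in a few lines. The function names and the sample values are assumptions chosen for illustration.

```python
def relu(x):
    """A(x) = max(0, x): keep positive values, map negative values to zero."""
    return max(0.0, x)


def relu_layer(image):
    """Apply ReLU to every value of a filtered image."""
    return [[relu(value) for value in row] for row in image]


# Hypothetical 2x2 filtered image with positive and negative values.
rectified = relu_layer([[0.77, -0.11], [-0.5, 0.0]])
print(rectified)
```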

3.3.1.4 Deep stacking


These three layers (convolution, pooling, and ReLU) can be repeated several (or many) times [Fig: 3.15].

3.3.1.5 Fully connected layer


In this layer every value gets a vote. The vote depends on how strongly a value predicts X or O [Fig: 3.16]. The feature values vote on X or O [Fig: 3.17]. The voting weights in fully connected layers come from back propagation [Fig: 3.18], which is driven by the error:
Error = right answer − actual answer

The entire flow of the convolutional neural network is shown in Fig: 3.19.
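
The voting idea can be sketched as a weighted sum per class. The feature values, the weights, and the use of the summed absolute difference as the total error are hypothetical choices for illustration; in a trained network the weights would come from back propagation.

```python
def fully_connected(values, weights):
    """Each input value casts a weighted vote for every class."""
    return {label: sum(w * v for w, v in zip(class_weights, values))
            for label, class_weights in weights.items()}


# Hypothetical flattened feature values after the last pooling layer.
values = [0.9, 0.1, 0.8, 0.2]
# Hypothetical voting weights: one weight per input value and class.
weights = {
    "X": [0.5, 0.0, 0.5, 0.0],
    "O": [0.0, 0.5, 0.0, 0.5],
}

votes = fully_connected(values, weights)
prediction = max(votes, key=votes.get)

# Error = right answer - actual answer, computed per class for an image of an X.
right_answer = {"X": 1.0, "O": 0.0}
errors = {label: right_answer[label] - votes[label] for label in votes}
total_error = sum(abs(e) for e in errors.values())
print(prediction, votes, total_error)
```

Training then amounts to adjusting the entries of `weights` so that `total_error` becomes as small as possible over all labeled images.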

3.3.1.6 Gradient descent


Figure 3.10: Example with CNN: Convolution layer

Gradient descent is a way to find a local minimum of a function. We start with an initial guess of the solution and take the gradient of the function at that point. We then step the solution in the negative direction of the gradient and repeat the process. With a suitable learning rate, the algorithm converges toward a point where the gradient is zero, which typically corresponds to a local minimum [Fig: 3.20].
To modify the weight, we use the update rule
$$w = w - \alpha \times \frac{dJ(w)}{dw}$$
where $\alpha$ is the learning rate and $\frac{dJ(w)}{dw}$ is the derivative of $J(w)$ with respect to $w$.
By definition, the derivative is the slope of a function at a point, i.e. the slope of the tangent, which is the height divided by the width of the triangle shown in Figure 3.20. The figure shows the case where the derivative is positive, so a positive value is subtracted from w. A step to the left is therefore taken, which decreases the parameter w.

In our case, for each feature pixel and voting weight, we adjust it up and down a bit and see how the error changes. Finally, we choose the weights that give the minimum error.
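
A minimal sketch of this procedure for a single weight follows. The quadratic error function `J`, the starting value, the learning rate, and the step count are assumptions chosen for illustration; the derivative is estimated by nudging the weight up and down a bit, as described above.

```python
def numerical_derivative(J, w, eps=1e-6):
    """Estimate dJ/dw by adjusting w up and down a bit."""
    return (J(w + eps) - J(w - eps)) / (2 * eps)


def gradient_descent(J, w, alpha=0.1, steps=100):
    """Repeat the update w = w - alpha * dJ(w)/dw a fixed number of times."""
    for _ in range(steps):
        w = w - alpha * numerical_derivative(J, w)
    return w


def J(w):
    """Hypothetical error function with its minimum at w = 3."""
    return (w - 3.0) ** 2


w_final = gradient_descent(J, w=0.0)
print(w_final)
```

Starting to the right of the minimum, where the derivative is positive, the same update moves w to the left; starting to the left, it moves w to the right. In both cases the error decreases as long as the learning rate is small enough.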

Figure 3.11: Example with CNN: After Convolution layer

Figure 3.12: Example with CNN: After Pooling layer

Figure 3.13: Rectified Linear Units

Figure 3.14: Example with CNN: After Rectified Linear Units

Figure 3.15: Deep stacking

Figure 3.16: Fully connected layer

Figure 3.17: Fully connected layer: Outcome

Figure 3.18: Back Propagation

Figure 3.19: Flow of the CNN

Figure 3.20: Gradient descent

Chapter 4

Conclusion

In this paper we have discussed how a convolutional neural network, a deep learning technique, can improve on existing machine learning algorithms. The example used here is a toy example. Several ConvNet/DNN toolkits already work well in the real world, including Caffe (Berkeley Vision and Learning Center), CNTK (Microsoft), Deeplearning4j (Skymind), TensorFlow (Google), Theano (University of Montreal + broad community), and Torch (Ronan Collobert).
There are a few limitations, though: ConvNets only capture the local spatial patterns in data. If the data can't be made to look like an image, ConvNets are less useful.
It is often said that deep learning works well, but it is nowhere clearly explained why deep learning works well. In a recent paper (July 21, 2017), Henry et al. [] showed how the success of deep learning could depend not only on mathematics but also on physics. They explored how properties frequently encountered in physics, such as symmetry, locality, compositionality, and polynomial log-probability, translate into exceptionally simple neural networks. Also in 2017, another theory began to open the black box of deep learning: an idea called the information bottleneck [] is helping to explain the puzzling success of today's artificial-intelligence algorithms, and might also explain how human brains learn. As a recent development, IBM Research [] reported record deep learning performance with new software technology. Using hundreds of NVIDIA GPUs, it achieved a record image recognition accuracy of 33.8% on 7.5M images from the ImageNet-22k dataset. The previous best published result was 29.8%, by Microsoft. An increase of 4 percentage points in accuracy is a big leap forward, whereas typical improvements in the past have been less than 1%. The distributed deep learning (DDL) approach not only improved accuracy but also trained a ResNet-101 neural network model in just 7 hours, by leveraging tens of servers equipped with hundreds of NVIDIA GPUs, whereas Microsoft took 10 days to train the same model. To conclude, deep neural networks are powerful, and they are trainable given sufficiently fast hardware.

References

[1] Tom Brosch, Roger Tam, Alzheimer's Disease Neuroimaging Initiative, et al. Manifold learning of brain MRIs by deep learning. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 633–640. Springer, 2013.

[2] Rhys Heffernan, Kuldip Paliwal, James Lyons, Abdollah Dehzangi, Alok Sharma, Jihua Wang, Abdul Sattar, Yuedong Yang, and Yaoqi Zhou. Improving prediction of secondary structure, local backbone angles, and solvent accessible surface area of proteins by iterative deep learning. Scientific Reports, 5:11476, 2015.

[3] Siqi Liu, Sidong Liu, Weidong Cai, Sonia Pujol, Ron Kikinis, and Dagan Feng. Early diagnosis of Alzheimer's disease with deep learning. In Biomedical Imaging (ISBI), 2014 IEEE 11th International Symposium on, pages 1015–1018. IEEE, 2014.

[4] Adhish Prasoon, Kersten Petersen, Christian Igel, François Lauze, Erik Dam, and Mads Nielsen. Deep feature learning for knee cartilage segmentation using a triplanar convolutional neural network. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 246–253. Springer, 2013.

[5] Matt Spencer, Jesse Eickholt, and Jianlin Cheng. A deep learning network approach to ab initio protein secondary structure prediction. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 12(1):103–112, 2015.

[6] Sebastian Stober, Daniel J Cameron, and Jessica A Grahn. Using convolutional neural networks to recognize rhythm stimuli from electroencephalography recordings. In Advances in Neural Information Processing Systems, pages 1449–1457, 2014.
