Deep Learning Techniques: An Overview
1 Introduction
Deep learning techniques, which implement deep neural networks, became popular due to the increase in high-performance computing facilities. Deep learning achieves higher power and flexibility due to its ability to process a large number of features when dealing with unstructured data. A deep learning algorithm passes the data through several layers; each layer extracts features progressively and passes them to the next layer. Initial layers extract low-level features, and succeeding layers combine features to form a complete representation. Section 2 gives an overview of the evolution of deep learning models. Section 3 provides a brief idea about the different learning approaches, such as supervised learning, unsupervised learning, and hybrid learning. Supervised learning uses labeled data to train the neural network. In unsupervised learning, the network uses unlabeled data and learns the recurring patterns. Hybrid learning combines supervised and unsupervised methods to get a better result. Deep learning can be implemented using different architectures, such as Unsupervised Pre-trained Networks, Convolutional Neural Networks, Recurrent Neural Networks, and Recursive Neural Networks, which are described in Section 4. Section 5 introduces various training methods and optimization techniques that help in achieving better results. Section 6 describes frameworks that offer a better programming environment for developing deep learning models. Despite the various challenges in deep learning applications, many exciting applications that may rule the world are briefed in Section 7.
Deep learning architectures perform better than simple ANNs, even though the training time of deep structures is higher than that of an ANN. However, training time can be reduced using methods such as transfer learning and GPU computing. One of the factors that decides the success of a neural network is the careful design of the network architecture. Some of the relevant deep learning architectures are discussed below.
Autoencoders: Autoencoders are used for dimensionality reduction, novelty detection, and anomaly detection problems. In an autoencoder, the first layer is built as an encoding layer and its transpose as a decoder. The layer is trained to recreate the input using an unsupervised method. After training, the weights of that layer are fixed, and the process moves to the subsequent layer until all the layers of the deep net are pre-trained. We then return to the original problem that we want to solve with the deep net (classification/regression) and optimize it with stochastic gradient descent, starting from the weights learned during pre-training.
An autoencoder network consists of two parts [7]. The input is translated to a latent space representation by the encoder, which can be denoted as:

h = f(x) (2)

The input is reconstructed from the latent space representation by the decoder, which can be denoted as:

r = g(h) (3)

In essence, autoencoders can be described as in equation (4), where r is the decoded output, which should be similar to the input x:

g(f(x)) = r (4)
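As an illustration of equations (2)-(4), the following is a minimal sketch of an autoencoder in PyTorch; the layer sizes, latent dimension, and training hyperparameters are illustrative assumptions, not values taken from the paper.

import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    """Minimal autoencoder: h = f(x) is the encoder, r = g(h) the decoder."""
    def __init__(self, input_dim=784, latent_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, latent_dim), nn.ReLU())
        self.decoder = nn.Linear(latent_dim, input_dim)

    def forward(self, x):
        h = self.encoder(x)      # latent representation, h = f(x)
        r = self.decoder(h)      # reconstruction, r = g(h)
        return r

model = Autoencoder()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

x = torch.randn(64, 784)         # stand-in for a batch of unlabeled data
for _ in range(10):
    optimizer.zero_grad()
    loss = loss_fn(model(x), x)  # reconstruction error between g(f(x)) and x
    loss.backward()
    optimizer.step()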
Deep Belief Networks: The first step in training a deep belief network is to learn features using the first layer. The activations of the trained features are then used to train the next layer, and this continues until the final layer. Restricted Boltzmann Machines (RBMs) are used to train the layers of a Deep Belief Network (DBN), and a feed-forward network is used for fine-tuning. A DBN learns hidden patterns globally, unlike other deep nets in which each layer learns complex patterns progressively [19].
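A minimal sketch of this greedy layer-wise pre-training, using scikit-learn's BernoulliRBM as a stand-in implementation (the library choice, layer sizes, and hyperparameters are assumptions for illustration, not taken from the paper):

import numpy as np
from sklearn.neural_network import BernoulliRBM

# Greedy layer-wise pre-training of a two-layer DBN.
X = np.random.rand(500, 64)          # stand-in for unlabeled training data in [0, 1]

rbm1 = BernoulliRBM(n_components=32, learning_rate=0.05, n_iter=20)
rbm1.fit(X)                          # learn features with the first layer
H1 = rbm1.transform(X)               # activations of the trained features

rbm2 = BernoulliRBM(n_components=16, learning_rate=0.05, n_iter=20)
rbm2.fit(H1)                         # train the next layer on those activations
H2 = rbm2.transform(H1)

# H2 would then feed a feed-forward network for supervised fine-tuning.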
Learning Rate Decay: Adjusting the learning rate increases the performance and reduces the training time of stochastic gradient descent algorithms. The widely used technique is to reduce the learning rate gradually: large changes are made at the beginning, and the learning rate is then reduced gradually over the course of training. This allows fine-tuning of the weights in the later stages [7].
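A minimal sketch of one such schedule, step-wise decay (the initial rate, decay factor, and step size below are illustrative assumptions):

# Start with large updates, then reduce the learning rate as training proceeds.
def decayed_lr(initial_lr, epoch, decay_factor=0.5, step=10):
    """Halve the learning rate every `step` epochs."""
    return initial_lr * (decay_factor ** (epoch // step))

for epoch in range(30):
    lr = decayed_lr(0.1, epoch)
    # ... run one epoch of stochastic gradient descent with this rate ...
    if epoch % 10 == 0:
        print(f"epoch {epoch}: learning rate = {lr:.4f}")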
Dropout: The overfitting problem in deep neural networks can be addressed using the dropout technique. This method is applied by randomly dropping units and their connections during training [9]. Dropout offers an effective regularization method to reduce overfitting and lower the generalization error. Dropout gives improved performance on supervised learning tasks in computer vision, computational biology, document classification, and speech recognition [1].
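A minimal sketch of dropout as a regularizer in PyTorch; the layer sizes and the 0.5 drop probability are illustrative assumptions.

import torch
import torch.nn as nn

net = nn.Sequential(
    nn.Linear(100, 64),
    nn.ReLU(),
    nn.Dropout(p=0.5),   # randomly zero units (and their connections) during training
    nn.Linear(64, 10),
)

net.train()              # training mode: dropout is active
y_train = net(torch.randn(8, 100))

net.eval()               # evaluation mode: dropout is disabled
y_eval = net(torch.randn(8, 100))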
Skip-gram: The idea is to take a window containing n terms, train the neural network by skipping one of these terms, and then use the model to predict the skipped term [13].
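A minimal sketch of how such training pairs could be generated (the window size and example sentence are assumptions; this only prepares the data, it does not train the network):

# Build training pairs from a sliding window of n terms: one term in each
# window is skipped and becomes the prediction target, the remaining terms
# are the input context.
def skipgram_pairs(tokens, n=3):
    pairs = []
    for i in range(len(tokens) - n + 1):
        window = tokens[i:i + n]
        for j in range(n):
            target = window[j]                      # the skipped term
            context = window[:j] + window[j + 1:]   # the other terms in the window
            pairs.append((context, target))
    return pairs

print(skipgram_pairs("deep learning extracts features progressively".split()))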
PyTorch: PyTorch can be used for building deep neural networks as well as executing tensor computations. PyTorch is a Python-based package that provides tensor computations and delivers a framework to create computational graphs [22].
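A minimal sketch of PyTorch tensor computation and its computational graph (shapes and values below are arbitrary examples):

import torch

x = torch.randn(3, 4, requires_grad=True)
w = torch.randn(4, 2, requires_grad=True)

y = (x @ w).relu().sum()           # tensor computation, recorded in the graph
y.backward()                       # gradients computed by traversing that graph

print(x.grad.shape, w.grad.shape)  # torch.Size([3, 4]) torch.Size([4, 2])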
Caffe: Caffe was developed by Yangqing Jia and is also open source. Caffe stands out from other frameworks in its speed of processing and learning from images. The Caffe Model Zoo facilitates access to pre-trained models, which enables us to solve various problems effortlessly [22].
8 Conclusion
Deep learning is evolving rapidly, and there are still a number of problems that it can be used to solve. Even though a full understanding of the working of deep learning is still a mystery, we can make machines smarter using deep learning, sometimes even smarter than humans. The aim now is to develop deep learning models that work on mobile devices to make applications smarter and more intelligent. May deep learning be devoted to the betterment of humanity and thus make the world a better place to live.
References
1. Alessandro Achille and Stefano Soatto. Information dropout: Learning optimal representations through noisy computation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(12):2897–2905, 2018. doi: 10.1109/TPAMI.2017.2784440.
2. Filippo Maria Bianchi, Enrico Maiorino, Michael C Kampffmeyer, Antonello Rizzi,
and Robert Jenssen. An overview and comparative analysis of recurrent neural
networks for short term load forecasting. arXiv preprint arXiv:1705.04378, 2017.
3. Li Deng, Dong Yu, et al. Deep learning: methods and applications. Foundations and Trends in Signal Processing, 7(3–4):197–387, 2014. doi: 10.1007/978-981-13-3459-7_3.
4. Dumitru Erhan, Yoshua Bengio, Aaron Courville, Pierre-Antoine Manzagol, Pascal Vincent, and Samy Bengio. Why does unsupervised pre-training help deep learning? Journal of Machine Learning Research, 11(Feb):625–660, 2010.
5. Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley,
Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets.
In Advances in neural information processing systems, pages 2672–2680, 2014.
6. Palash Goyal, Sumit Pandey, and Karan Jain. Introduction to natural language processing and deep learning. In Deep Learning for Natural Language Processing, pages 1–74. Springer, 2018. doi: 10.1007/978-1-4842-3685-7_1.
7. Nathan Hubens. Deep inside: Autoencoders - towards data science, Apr 2018.
8. Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating
deep network training by reducing internal covariate shift. arXiv preprint
arXiv:1502.03167, 2015.
9. Anil K Jain. Data clustering: 50 years beyond k-means. Pattern recognition letters,
31(8):651–666, 2010. doi: 10.1016/j.patrec.2009.09.011.
10. Sotiris Kotsiantis and Dimitris Kanellopoulos. Association rules mining: A recent overview. GESTS International Transactions on Computer Science and Engineering, 32(1):71–82, 2006.
11. Sotiris B Kotsiantis, I Zaharakis, and P Pintelas. Supervised machine learning: A
review of classification techniques. Emerging artificial intelligence applications in
computer engineering, 160:3–24, 2007.
12. Quoc V Le et al. A tutorial on deep learning part 2: Autoencoders, convolutional
neural networks and recurrent neural networks. Google Brain, pages 1–20, 2015.
13. Chaochun Liu, Yaliang Li, Hongliang Fei, and Ping Li. Deep skip-gram networks
for text classification. In Proceedings of the 2019 SIAM International Conference
on Data Mining, pages 145–153. SIAM, 2019.
14. Jonathan Lorraine and David Duvenaud. Stochastic hyperparameter optimization
through hypernetworks. arXiv preprint arXiv:1802.09419, 2018.
15. Risto Miikkulainen, Jason Liang, Elliot Meyerson, Aditya Rawal, Daniel Fink, Olivier Francon, Bala Raju, Hormoz Shahrzad, Arshak Navruzyan, Nigel Duffy, et al. Evolving deep neural networks. In Artificial Intelligence in the Age of Neural Networks and Brain Computing, pages 293–312. Elsevier, 2019. doi: 10.1016/B978-0-12-815480-9.00015-3.
16. Volodymyr Mnih, Adria Puigdomenech Badia, Mehdi Mirza, Alex Graves, Timothy Lillicrap, Tim Harley, David Silver, and Koray Kavukcuoglu. Asynchronous methods for deep reinforcement learning. In International Conference on Machine Learning, pages 1928–1937, 2016.
17. Sinno Jialin Pan and Qiang Yang. A survey on transfer learning. IEEE
Transactions on knowledge and data engineering, 22(10):1345–1359, 2009. doi:
10.1109/TKDE.2009.191.
18. Abhishek Panigrahi, Yueru Chen, and C-C Jay Kuo. Analysis on gradient propagation in batch normalized residual networks. arXiv preprint arXiv:1812.00342, 2018.
19. Ruslan Salakhutdinov and Geoffrey Hinton. Semantic hashing. International Journal of Approximate Reasoning, 50(7):969–978, 2009. doi: 10.1016/j.ijar.2008.11.006.
20. Bernhard Scholkopf and Alexander J Smola. Learning with kernels: support vector
machines, regularization, optimization, and beyond. MIT press, 2001.
21. George AF Seber and Alan J Lee. Linear regression analysis, volume 329. John
Wiley & Sons, 2012.
22. Pulkit Sharma. Top 5 deep learning frameworks, their applications, and comparisons!, May 2019.
23. Toshihiro Takahashi. Statistical max pooling with deep learning, July 3 2018. US
Patent 10,013,644.
24. Haohan Wang and Bhiksha Raj. On the origin of deep learning. arXiv preprint arXiv:1702.07800, 2017.
25. Rikiya Yamashita, Mizuho Nishio, Richard Kinh Gian Do, and Kaori Togashi.
Convolutional neural networks: an overview and application in radiology. Insights
into imaging, 9(4):611–629, 2018. doi: 10.1007/s13244-018-0639-9.