Deep Learning Application Pros and Cons Over Algorithm
1. School of Information Engineering, Jiangxi University of Science and Technology, Ganzhou, China, [email protected].
2. School of Information Engineering, Jiangxi University of Science and Technology, Ganzhou, China, [email protected].
3. Faculty of Technology, School of Engineering and Sustainable Development, De Montfort University, Leicester, UK, [email protected]
4. School of Information Engineering, Jiangxi University of Science and Technology, Ganzhou, China, [email protected].
Abstract
Deep learning is a new area of machine learning research. Deep learning technology applies nonlinear, multi-layer transformations of model abstraction to large databases. Recent developments show that deep learning has spread into various fields and has greatly contributed to artificial intelligence so far. This article reviews the contributions and new applications of deep learning. The main target of this review is to summarize the key points so that scholars can analyse the applications and algorithms; the review then investigates the main applications and the algorithms they use. In addition, the advantages of the deep learning method, with its hierarchical and nonlinear functioning, are introduced and compared to traditional algorithms in common applications. The following three criteria were taken into consideration when choosing the areas of application: (1) the expertise or knowledge of the authors; (2) the successful application of deep learning technology has already changed the field, as in voice recognition, chat robots, search technology and vision; and (3) deep learning can have a significant impact on the application domain and benefit from recent research, as in natural language and text processing, information retrieval and the multimodal information processing resulting from multitask deep learning. This review provides a general overview of a new concept and the growing benefits and popularity of deep learning, which can help researchers and students interested in deep learning methods.
Keywords: Deep learning, face recognition, speech recognition, medical image recognition, character recognition.
Copyright © 2022 Ata Jahangir Moshayedi et al., licensed to EAI. This is an open access article distributed under the terms of the
Creative Commons Attribution license, which permits unlimited use, distribution and reproduction in any medium so long as the
original work is properly cited.
doi: 10.4108/airo.v1i.19
of DL are explored, and a chronological advancement in research has been shown in each field. This review paper represents the significance of research in DL and shows how it will soon become the world's future. Also, its advantages and disadvantages are discussed. In the next section, some accredited researches are described. The article discusses facial recognition, speech recognition, image recognition or handwriting recognition, virtual assistants, chatbots, healthcare, entertainment and music and, finally, robotics. Detailed discussion is also given for the facial recognition, speech recognition and healthcare sectors, where DL is used more than in other applications. Finally, the popular techniques used in DL are summarized.

2. Research on the Applications of DL

2.1. DL on Facial Recognition

Liu et al. (2017) [1] proposed a novel approach towards face recognition using Convolutional Neural Networks (CNNs). They used the angular softmax loss (A-Softmax loss) to enable CNNs to learn discriminative face features with an angular margin. They showed that the A-Softmax loss provides a suitable geometric interpretation by constraining the learned features to be discriminative on a hyper-spherical manifold, which essentially matches the prior that face features also lie on nonlinear manifolds. This connection makes A-Softmax very effective for learning discriminative face features.
Luan Tran et al, (2017) [2] have proposed the Disentangled Representation learning Generative Adversarial Network (DR-GAN) with three novelties. First, in addition to image synthesis, the encoder-decoder structure of the generator allows DR-GAN to learn generative and discriminative representations. Secondly, this representation is explicitly disentangled from other facial variations such as pose, through the pose code provided to the decoder and the pose estimation in the discriminator. Third, DR-GAN can take one or more images as input to generate a unified representation and any number of synthesized images. Quantitative and qualitative evaluations on controlled and in-the-wild databases show that DR-GAN is superior to the state of the art.
Yang et al, (2017) [3] proposed a Neural Aggregation Network (NAN) for video face recognition. The network takes a face video or a face image set (with a variable number of face images) as its input and generates a compact, fixed-size feature representation for recognition. The entire network consists of two modules. The feature embedding module is a deep CNN, which maps each face image to a feature vector. The aggregation module is composed of two attention blocks, which adaptively aggregate the feature vectors to form a single feature inside the convex hull that they span. Due to the attention mechanism, the aggregation is invariant to the image order. Their NAN uses a standard classification or verification loss for training, without any additional supervision signals. They found that it automatically learns to favour high-quality face images while rejecting low-quality images, such as those with blur, occlusion, or incorrect exposure of the human face.
Ding et al, (2017) [4] present FaceNet2ExpNet, a novel idea to train an expression recognition network based on static images. They first proposed a new distribution function to model the high-level neurons of the expression network. On this basis, a two-stage training algorithm is designed. In the pre-training stage, they train the convolutional layers of the expression network, regularized by the face network; in the refinement stage, they attach fully connected layers to the pre-trained convolutional layers and jointly train the entire network. The visualization results show that the model trained with their method captures improved high-level expression semantics. Evaluation on four public expression databases, CK+, Oulu-CASIA, TFD and SFEW, shows that their method achieves better results than other methods.
Ding et al, (2017) [5] proposed a comprehensive framework based on CNNs to overcome the challenges of Video-based Face Recognition (VFR). First, to learn blur-robust face representations, they artificially blur training data composed of clear still images to make up for the lack of real-world video training data. Using training data composed of still images and artificially blurred data, the CNN is encouraged to automatically learn blur-insensitive features. Secondly, to enhance the robustness of CNN features to pose changes and occlusions, they proposed a trunk-branch ensemble CNN model (TBE-CNN), which extracts complementary information from the overall face image and patches cropped around the facial parts. TBE-CNN is an end-to-end model that efficiently extracts features by sharing low-level and mid-level convolutional layers between the trunk and branch networks. Third, to further improve the discrimination ability of the representation learned by TBE-CNN, they proposed an improved triplet loss function. Systematic experiments prove the effectiveness of the proposed technique. Most impressively, TBE-CNN achieved state-of-the-art performance on three popular video face databases (PaSC, COX Face and YouTube Faces). Using the proposed technology, this research won first place in the BTAS 2016 video person recognition evaluation.
Nech et al, (2017) [6] posed some questions: if all algorithms for face recognition are very different, are large datasets the real key to more accurate facial recognition? Where does facial recognition need to improve? In pursuit of answers to these sorts of questions, the researchers created a benchmark, MF2, which requires all algorithms to be trained on the same data and tested at the million scale. MF2 is a large public collection with 672K identities and 4.7 million photos, aiming to create a level playing field for large-scale face recognition. They compare their results with the results of two other large benchmarks, MegaFace Challenge and MS-Celeb-1M, in which groups can train on any private or public, and large or small
collection. Some key findings are: (1) the algorithms trained on MF2 can reach the most advanced level, and the results are comparable to those trained on large-scale private sets; (2) some algorithms perform well after training on MF2; (3) as with MegaFace, accuracy on age invariance is lower, and future tests may require greater age variation per identity or adjustments to the algorithms.
Wang et al. (2018) [7] have stated that the traditional softmax loss of deep CNNs usually lacks discriminative power. Hence, they proposed a new loss function, the Large Margin Cosine Loss (LMCL), to realize this idea differently. They use L2 normalization of the features and weight vectors to eliminate radial variations, re-express the softmax loss as a cosine loss, and introduce a cosine margin term to further maximize the decision margin in angular space. As a result, the smallest intra-class variance and the largest inter-class variance are achieved through normalisation and maximisation of the cosine decision margin. They call this approach CosFace. In the MegaFace challenge, the CosFace approach shows its superiority on both the identification and verification tasks under both protocols.
Mittal et al. (2018) [8] designed a mobile application that performs real-time multiple face recognition using CNNs. Their model implemented a ResNet network. It tries to learn a mapping from face images to a compact 128-D Euclidean space where the distance between two vectors represents the face similarity. This method optimizes the Inception model using a triplet loss function and curriculum learning. They used the Labeled Faces in the Wild dataset for training. With this training, they classified the faces after extracting features. Finally, with data augmentation, their system applies various transformations to the photos, and identification was achieved. They found that recognition precision dropped as the number of people in the photos or videos increased.
Cheng et al, (2018) [9] have proposed two face recognition methods based on multiple CNN classifiers. These methods have the same CNN structure but different training sets. The first method used random sampling with replacement to train each single CNN on a different training set generated from the original data set. The second method generated different training sets by enlarging and reducing the image size of the original data set. Both methods finally used voting to merge the single-CNN results. Experiments showed that the ensemble CNN classifier is better than a single CNN classifier, with face recognition accuracy as high as 99.5%. Deng et al. (2019) [10] introduced ArcFace, which uses an additive angular margin loss on deep CNNs for facial recognition. Their experimentation showed that ArcFace surpasses CosFace, SphereFace, centre loss, VGGFace and other open-source face recognition methods by a fair margin. They have also shown comprehensive comparisons with other types of losses in facial recognition, including each study's code.

2.2. DL on Speech Recognition

Rabiner (1997) [11] presented the current functions of speech recognition systems, how they are used in today's services and applications, and how they will evolve over time into the next generation of voice recognition services. He pointed out that there are different types of speech recognition, including spontaneous, conversational, hybrid recognition, etc. He also showed that the vocabulary size goes up to 64,000 words but comes with an error rate as high as 50%. He stated that although the main application of speech recognition would be telecommunication in the future, it could be extended to services like voice dialling, voice banking, voice assistance, call-completion agent technology, customer-care computer-telephone integration, voice dictation and so on.
Li et al, (2004) [12] have mentioned several challenges in speech recognition. There is a big gap between human and machine speech recognition, and without underlying recognition technology that can provide sufficient robustness and low error rates, mainstream adoption of speech recognition is impossible. One of their programs is a computational model based on pronunciation, the science of language, and some fundamental aspects of human linguistic communication: based on the general principles of computational phonetics and phonology, a quantitative statistical model is established, and advanced algorithms can be developed to optimize the model parameters automatically; these parameters have physical significance. Another program is based on human speech perception, many aspects of which run contrary to the frame-by-frame spectral-envelope acoustic information used in all the main speech recognition systems. This technique aims to develop a new acoustic front end that does not depend on the spectral envelope and uses several fronts over the whole temporal-frequency plane. The researchers state that overcoming the challenge of making speech recognition systems robust in noisy acoustic environments, and the challenge of creating workable recognition systems for natural, free-style speech, is the ultimate challenge for the field. They also mentioned a few existing directions for these challenges and showed the architecture of Microsoft's Speech Recognition System.
Quoc et al. (2012) [13] proposed robust speech recognition based on a binaural speech enhancement system as a pre-processing step. The system uses existing de-redundancy techniques, followed by a noise removal algorithm based on spatial masking, in which only signals from the desired direction are retained by using a threshold angle. Here, they urge consideration of adaptive calculation, where the threshold is first learned in a few noise frames and then updated frame by frame. Their robust speech recognition algorithm is based on binaural sound source separation and missing data. The simulation results show that, because the target speech data obtained by the binaural sound source separation method can eliminate noise and interference, the algorithm can significantly
improve the speech recognition performance in a complex acoustic environment. The result of speech recognition in a real environment shows the effectiveness of this method.
Miao et al. (2013) [14] studied the application of the deep maxout network (DMN) to large vocabulary continuous speech recognition (LVCSR) tasks. They experimented on the challenging Babel corpus. They concluded that, compared with DNNs, DMNs could improve the performance of hybrid and BNF systems under limited LP conditions. Their conclusions further include that stacked denoising autoencoder (SDA)-based pre-training is effective for DMN initialization and brings benefits when the DMN becomes truly deep, and that a DMN can be used as a sparse feature extractor to generate hierarchical high-level representations. They mentioned that in future work they are interested in researching Restricted Boltzmann Machines (RBMs) for DMN initialization, which requires a probabilistic pooling strategy to achieve a fully generative model, and that they hope to extend the idea of sparse feature extraction to bottleneck features (BNF) and generate sparse bottleneck features for tandem systems.
Based on the Gaussian mixture model and an improved N-best speech recognition algorithm, an improved lattice-based speech keyword recognition system is proposed by Xiao Xi et al, (2015) [15]. First, they used tests to evaluate different simplified structures of the Gaussian mixture model. Then, an N-best token passing algorithm is proposed based on the classic token passing algorithm and the unique pronunciation rules of Chinese. These two modifications improve the performance of 1-best and N-best candidate speech recognition. Finally, a keyword recognition system based on the N-best lattice was developed to verify the effectiveness of these improvements.
Noda et al. (2015) [16] introduced a connectionist Hidden Markov Model (HMM) system for noise-robust AVSR. First, a deep denoising autoencoder is used to obtain audio features with strong noise robustness: by preparing training data for the network as pairs of audio features degraded in several consecutive steps and the corresponding clean features, the network is trained to produce denoised features from the corresponding noise-degraded features. Secondly, a CNN is used to extract visual features from the raw mouth-region images: by preparing the CNN training data as pairs of raw images and corresponding phoneme labels, the network is trained to predict phoneme labels from the corresponding mouth-region input images. Finally, a multi-stream hidden Markov model (MSHMM) is used to integrate the independently trained audio and video HMMs with their respective features. Comparing the denoised features with normal mel-frequency cepstral coefficients (MFCCs) as HMM audio features, their isolated-word recognition results show that, under a 10 dB signal-to-noise ratio (SNR) on the audio input, the denoised features can obtain a word recognition rate gain of about 65%.
Attention-based recurrent networks have been successfully applied to many tasks, such as handwriting synthesis, machine translation, image caption generation, and visual object classification. Researchers have introduced an extension of the attention-based recurrent network to make it suitable for speech recognition: learning to recognize speech can be seen as learning to generate a sequence (the transcription) given another sequence (the speech).
Chorowski et al, (2015) [17] proposed and evaluated an end-to-end trainable speech recognition architecture based on a hybrid attention mechanism that combines content and location information to select the next position in the input sequence for decoding. An appealing feature of the model is that it can recognize utterances much longer than those seen during training. In conclusion, their work provides two new ideas for the attention mechanism: a better normalization method that produces smoother alignments, and a general principle for extracting and using features from previous alignments. They state that both methods may be applied to fields other than speech recognition. However, despite the great achievements made in the past few decades, natural and robust human-machine voice interaction still seems out of reach, especially in challenging environments with significant noise and reverberation. To improve robustness, modern speech recognizers usually use an acoustic model based on a recurrent neural network (RNN), which can naturally take advantage of large time contexts and long-term speech modulations. Therefore, it is very meaningful to continue studying appropriate technologies to improve the effectiveness of RNNs in processing speech signals.
Ravanelli et al. (2018) [18] modified one of the most popular RNN models, the gated recurrent unit (GRU), and proposed a simplified architecture that is very effective for ASR. The work contributed on two fronts: first, they analyzed the role played by the reset gate and showed that there is significant redundancy with the update gate; they suggested removing the former from the GRU design, resulting in a more efficient and compact single-gate model. Then, they suggested replacing the hyperbolic tangent with a rectified linear unit activation. This change combines well with batch normalization, which helps the model learn long-term dependencies without numerical problems. The results show that the proposed light GRU architecture reduces per-epoch training time by more than 30% compared with the standard GRU and consistently improves recognition accuracy across different tasks, input features, noise conditions, and ASR paradigms, from a standard DNN-HMM speech recognizer to an end-to-end connectionist temporal classification model.

2.3. DL in Image Recognition

Image recognition is a subclass or a derivation of computer vision and AI, representing a set of image detection and analysis methods to achieve automation of specific tasks: identifying places, people, objects and many other types of items in a picture and drawing conclusions through analysis [19-20].
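At the core of the CNN-based recognizers discussed in this review is the 2-D convolution, which slides a small filter over the image and responds strongly wherever a local pattern, such as an edge, is present. The following is a minimal, illustrative pure-Python sketch; the 4x4 image and the Sobel-style vertical-edge filter are hypothetical values chosen for the example, not taken from any of the cited systems:

```python
def conv2d(image, kernel):
    """Valid 2-D convolution (cross-correlation) of a grayscale image with a kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = [[0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            # Multiply the kernel with the image patch under it and sum.
            out[i][j] = sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw)
            )
    return out

# A tiny image whose right half is bright: it contains one vertical edge.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Sobel-style filter that responds to left-to-right intensity changes.
sobel_x = [
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1],
]
features = conv2d(image, sobel_x)  # every window here straddles the edge
```

Stacking many such filtered maps, interleaved with nonlinearities and pooling, yields the hierarchical feature representations that the networks surveyed above learn automatically instead of relying on hand-designed filters.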
Today, more and more people are using pictures to represent and convey information, and it is convenient for us to learn much information from pictures. Image recognition is an important and widely used area of research. For image recognition issues such as handwriting classification, we should know how to use the data to represent the image. The data here are not raw pixels but image features with a higher-level representation; the quality of the extracted features is very important for the results [21-23].
Wu et al, (2015) [24] applied DL to handwritten character recognition and explored two common DL algorithms, the CNN and the deep belief network (DBN), on the MNIST database and on a database of actual handwritten characters. On MNIST, the CNN and DBN were 99.28% and 98.12% accurate, respectively, and their classification accuracies on the database of actual handwritten characters were 92.91% and 91.66%. The experimental results show that DL has a good ability to learn features: there is no need to extract features manually, and DL helps to understand the essential characteristics of the data better.
Elleuch et al, (2016) propose a method that combines two classifiers, a CNN and an SVM, to identify Arabic characters from handwritten text. CNNs with dropout regularization are used not only in this paper but also in other projects such as [25-29].
Pham et al. [30] used an RNN with dropout in one of the first projects to detect handwritten text and tested it on a range of handwriting databases. Sun et al, [31] used deep LSTMs to detect Chinese handwriting. Their methods were evaluated on the CASIA-OLHWDB data set; compared to the most advanced method, the relative error on the test set is reduced by more than 30% [32].
Yang et al, (2016) [33] recommend combining a neural network classifier with Style Transfer Mapping (STM) for unsupervised writer adaptation, which only requires writer-specific unlabelled data and is more general and efficient than supervised adaptation. To increase the performance of the neural network classifiers, they used techniques such as dropout, momentum and deep supervision strategies. Experiments on CASIA-OLHWDB, an online Chinese handwriting database, show that this method effectively improves classification accuracy.
Islam et al (2018) [34] present an image-based food recognition method. They used CNNs to classify food images. The project used 16,643 images to classify a food dataset containing different food categories with 92.86% accuracy.

2.4. Virtual Assistants

A virtual assistant, also known as an AI assistant or digital assistant, is an application that understands natural language voice instructions and performs tasks for the user. These tasks are usually performed by a personal assistant or secretary and include dictating, reading texts or emails aloud, finding phone numbers, scheduling, making phone calls, and reminding the end user of appointments. Currently, popular virtual assistants include Amazon Alexa, Apple's Siri, Google Assistant, and Microsoft's Cortana, the digital assistant built into Windows Phone 8.1 and Windows 10 [35-36]. Although the definition focuses on the digital form of virtual assistants, the term virtual assistant, or virtual personal assistant, is also often used to describe contract workers who work from home and perform the administrative tasks usually handled by administrative assistants or secretaries [37-39]. Virtual assistants can also be compared with another type of consumer-oriented AI programming, called smart advisors: smart advisors are subject-oriented, while virtual assistants are task-oriented. Virtual assistants are typically cloud-based apps that are nowadays integrated into almost all devices by default. Three of these apps are Siri on Apple, Cortana on Microsoft, and Google Assistant on Android. There are also devices dedicated to virtual assistance; Amazon's Alexa, Google Assistant, Microsoft's Cortana, and Apple's Siri on Apple Home are the most popular. Usually, these devices have a wake signal: for Google Assistant it is "Hey Google" or "Okay Google", and for Cortana it is "Hey Cortana". The LEDs on the device indicate to the user that it is ready to receive orders. Often these are simple language requests such as "Tell me a joke", "Play some music", or "Buy a shirt"; such requests are processed and stored in the Amazon cloud. The technology that powers virtual assistants requires large amounts of data to feed the artificial intelligence (AI) platforms behind them, including ML, DL, natural language processing, and speech recognition platforms. As the user interacts with the virtual assistant, the AI programming uses complex algorithms to learn from the data input and better predict the end user's needs. Virtual assistants usually perform simple tasks for end users, such as adding tasks to the calendar; providing information that would usually be searched for in a web browser; or controlling and checking the status of smart home devices, including lights, air conditioners, ovens etc.
Users also ask virtual assistants to make and receive calls, create text messages, get directions, listen to the news and weather forecasts, find hotels or restaurants, check flight reservations, listen to music, or play games.
Page et al, (2017) [40] showed how an artificially intelligent virtual assistant helps students navigate the road to college. This paper shows a pure application of an AI-powered VA to reduce workload in real life. The paper states that they use conversational AI to provide personalized, SMS-based outreach and guidance to thousands of college freshmen, helping them complete each task that requires support and thereby effectively supporting them. The system, which was tested through field trials at Georgia State University (GSU), was called Pounce. GSU reported that students receiving treatment via the virtual assistant showed a greater success rate in meeting pre-entry requirements, while over the same period the pressure on university teachers was greatly reduced.
Kepuska et al, (2018) [41] have designed a prototype system that showcases the future of current Virtual
Personal Assistants (VPAs). Their application used a of pathological signs suggestive of respiratory disease,
multi-modal dialogue system that processes two or more reducing the diagnostic error, increasing efficiency, and
combined user input modes, such as voice, image, video, quality of care in the medical field, reducing efforts of
touch, manual gestures, gaze, and body movement, to radiologists.
design the next generation of VPAs model. The new model
of VPAs will be used to increase human-machine
interaction by using different technologies, such as gesture 2.5. Chatbots
recognition, image and video recognition, voice
recognition, many knowledge-based on conversations and Chatbots are programs that use natural languages to
dialogues and a common knowledge base. In addition, the interact with users. This technology has been gaining
new VPA system can be used in various other application popularity since the 1960s [46]. Chatbots had been in use
areas, including education aid, medical assistance, robots by almost every field of business, e-commerce,
and vehicles, systems for people with disabilities, home organization out there. Prominent uses include as a Tool to
automation and secure access control. Learn and Practice a Language, as a tool to retrieve
Giancarlo Iannizzotto et al. (2018) [42] combine some of information, as a tool to conduct business, as a tool to
the most advanced technologies in the fields of computer provide customer support etc. Where there is no voice
vision, DL, speech generation and recognition, and AI into supported virtual assistant or where users cannot use voice-
the virtual assistant architecture of the smart home based VA, Chatbots are the way to go. One recent example
automation system. The proposed assistant is efficient and is Microsoft’s Bing Chatbox on their search engine [47],
customizable, and the implemented prototype runs on a which uses AI. And chatbots have been on the rise since
low-cost, small-sized Raspberry PI 3 device. The system 2016 at an unprecedented speed with the introduction of
was integrated with an open-source home automation Microsoft’s Cortana, Amazon’s Alexa etc [48].
environment for testing purposes and ran for a few days Chatbots made with traditional methods such as
while encouraging people to interact with it and proving conditional programming are not as efficient as using AI
that it was accurate, reliable, and attractive. and DL. The current-day DL method has made training
Dipanshu et al, (2020) [43] designed a system that can models for Natural Language Processing (NLP) far easier
successfully use the integrated webcam to capture gestures, than before, which is exactly why chatbots and similar
process them, convert them into text format, and display them on the input frame; on a call command, the text is converted into audio format. The audio becomes the query of the virtual assistant, and the audio output is converted back into text format and displayed on the screen again. This system is built to showcase a gesture-based virtual assistant targeted at people with speaking or hearing disabilities. The authors used DL and TensorFlow along with OpenCV to accomplish their goal. Vishnu et al. (2021) [44] proposed a CNN-based solution that utilizes transfer learning for developing a scene recognition Android application. They used MobileNet to implement the CNN model with the TensorFlow ML framework, and GPU hardware acceleration is used in the application to reduce inference time. They trained the system on images with occlusion, different illumination, and background clutter to improve robustness, training on the MIT Indoor Recognition dataset and then using it for testing. The main goal is to showcase how DL can further expedite the advancement of current-generation virtual assistants and even expand their territory into the robotics field with AI. Carnier et al. (2021) [45] presented ToraxIA, a virtual assistant for radiologists based on DL applied to chest X-rays. The system was trained with over 240,000 chest X-ray images and can detect several pathological signs with an accuracy of about 97%. It was even used to detect COVID-19 signs by training the system to differentiate between normal X-ray images and images from the COVID-19 dataset. Ultimately, this paper presents a virtual assistant with DL capabilities for automatic detection.

Chatbots have evolved along similar lines, and fields like VA can take advantage of it [49]. ELIZA was one of the first chatbots, built as a program to pass the Turing test [50], but it was simple and entirely built on a rule-based framework. Other chatbots such as A.L.I.C.E. and PARRY were also built using the same technology [36, 49]. Jabberwocky was one of the first chatbots built with AI to conduct humorous conversations [51]. Hybrid chatbots with recurrent neural networks (RNNs) and long short-term memory (LSTM) prove more efficient in many cases [49].

Wu et al. (2019) [52] describe the different ways chatbots are built. They show how retrieval-based chatbots are built with single-turn response selection and multi-turn response selection, and they draw various conclusions by comparing the different ways DL is used in building chatbots: for example, in single-turn response selection the neural tensor is a powerful matching function, and combining information from different sources can be very beneficial for matching sequences.

Research has accelerated since 2016 due to the sudden rise of human-interaction programs, and various methods and frameworks have been in development since then to make the job even smoother. Notable examples are the Deep Learning to Respond (DL2R) model [53] with MLP matching and the coupled LSTM [54] with MAP matching. Conversational AI for chit-chat programs, or chatbots, is catching on as it becomes more and more involved in mass business domains, because big data technology is also on the rise [55].
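Retrieval-based response selection, as surveyed above, reduces to scoring every candidate response against the user message with a matching function and returning the highest-scoring one. The sketch below stands in a simple bag-of-words cosine match for the neural matching functions discussed in [52]-[54]; the reply pool and whitespace tokenization are illustrative assumptions, not part of any cited system.

```python
# Minimal sketch of single-turn response selection for a retrieval-based
# chatbot: score every candidate reply against the user message and
# return the best match. Real systems replace this bag-of-words cosine
# score with a learned neural matching function.
import math
from collections import Counter

def cosine(a, b):
    # Cosine similarity between two term-frequency vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def select_response(message, candidates):
    # Single-turn selection: the reply with the highest matching score wins.
    query = Counter(message.lower().split())
    return max(candidates, key=lambda c: cosine(query, Counter(c.lower().split())))

replies = [  # illustrative response pool, not from any cited system
    "the weather today is sunny",
    "your order has been shipped",
    "i can help you reset your password",
]
print(select_response("how do i reset my password", replies))
# -> i can help you reset your password
```

Multi-turn selection differs only in that the query side encodes the whole conversation context rather than a single message; the scoring-and-argmax structure is the same.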
2.6. DL in Healthcare

Health care is the maintenance or improvement of health through the prevention, diagnosis, and treatment of, and recovery from, illness, injury, and other physical and mental impairments. It is a definition everyone knows, because it concerns our well-being. In this area, the convergence of technologies makes this goal easier to achieve for people from all walks of life. The implications of DL in healthcare have been increasing over the past six years [56-58]. This has mostly happened because of the increase in Big Data analytics and research, IoT advances, and ML advancements [59, 60]. CNNs can be trained for medical imaging thanks to the massive advancements of computer vision with DL, and both CNNs and RNNs can be used to predict health problems [56]. DNNs can be used in genomics for research in genetics, gene splicing, generating new gene sequences, etc. [61]. Table 1 shows various applications of DL in health care.

Table 1. A brief list of DL Applications in Healthcare [62]

Type | Application | Model | Ref.
Medical imaging | Early diagnosis of Alzheimer's disease by brain magnetic resonance imaging | Stacked Sparse AE | [63]
Medical imaging | Brain magnetic resonance imaging detects the diversity of Alzheimer's disease variant patterns | RBM | [64]
Medical imaging | Automatic segmentation of knee cartilage | CNN | [65]
 | Model the longitudinal sequence of serum uric acid measurements to indicate multiple population subtypes and distinguish the characteristics of uric acid in gout and acute leukemia | Stacked AE | [78]
 | Doctor AI: use the patient's medical history to predict the diagnosis and medication for follow-up visits | GRU RNN | [79]
 | Deepr: an end-to-end system for predicting unplanned readmissions after discharge | CNN | [80]
 | Predicting disease onset from longitudinal laboratory tests | LSTM RNN | [81]
 | De-identification of patient clinical records | LSTM RNN | [82]
Genetics | Predicting chromatin markers from DNA sequences | CNN | [83]
Genetics | Basset: an open-source platform for predicting DNase I hypersensitivity in multiple cell types and quantifying the effect of SNVs on chromatin accessibility | CNN | [84]
Genetics | DeepBind: predicting the specificity of DNA- and RNA-binding proteins | CNN | [85]
Genetics | Predicting methylation status in single-cell bisulfite sequencing studies | CNN | [86]
Genetics | Estimation of the prevalence of … | CNN | [87]
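Several of the genomics rows in Table 1 ([83]-[86]) apply CNNs directly to raw DNA sequence. The core operation is a 1-D convolution over a one-hot encoded sequence, where each filter acts as a motif detector. In the sketch below the filter for the motif "TATA" is hand-set purely for illustration; in the cited systems, many such filters are learned from data.

```python
# Sketch of the core CNN operation used on DNA sequence: a 1-D
# convolution over a one-hot encoding, with one filter acting as a
# motif detector. The TATA filter is hand-set for illustration only;
# real models learn many such filters during training.
BASES = "ACGT"

def one_hot(seq):
    # Encode each base as a 4-dimensional indicator vector.
    return [[1.0 if b == base else 0.0 for base in BASES] for b in seq]

def conv1d(x, filt):
    # Valid 1-D convolution: dot product of the filter with every window.
    k = len(filt)
    return [
        sum(x[i + j][c] * filt[j][c] for j in range(k) for c in range(4))
        for i in range(len(x) - k + 1)
    ]

tata = one_hot("TATA")  # filter that scores exact matches of "TATA" highest
scores = conv1d(one_hot("GGTATACG"), tata)
print(scores)                     # -> [2.0, 0.0, 4.0, 0.0, 2.0]
print(scores.index(max(scores)))  # motif found at position 2
```

A trained model stacks many learned filters, applies a nonlinearity and pooling, and feeds the result to further layers, but the window-scoring operation shown here is the building block.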
system includes a scene recognition system and hardware modules. When users watch movies and animations at home, these modules provide the users with tactile sensations. The recognition module uses Google Cloud Vision to detect common scene elements in the video, such as fire, explosions, wind, and rain. Once the scene recognition system has produced a detection result, the system generates the corresponding tactile or haptic sensation. Their system seamlessly integrates DL, auditory signals, and touch to provide an enhanced viewing experience, and it is one of the more recent pieces of research showing how DL can be used in media consumption.

But DL has been in use for a longer period in digital games. Digital game-based learning, or DGBL, is an established learning method for learning about player goals and playing patterns, used to analyze the in-game approaches of tutorial bots and feedback systems [97]. Popular games like ATARI, Racing, DOOM, Minecraft, and StarCraft all use DL, albeit in imperfect implementations that can cause problems such as unwanted NPC behaviour [98], to enhance the user experience and tailor the game to the consumer. Prediction is the future of DL [99]: player data is used to predict future game content, delivered as DLC, which is the current business model of the game industry. Gudmundsson et al. (2018) [100] showed that it is possible to predict the next most "human" action and, from that, update the future contents of the game. They used Candy Crush Saga (CCS) and Candy Crush Soda Saga (CCSS) player data to learn human playtesting methods and generate key metrics for future content creation.

Another major advancement in the entertainment industry with the help of DL is the deepfake, a technology in which an algorithm learns the facial characteristics of a person and copies them onto a video or image of another individual; in simpler words, it refaces someone. Deepfakes can make movie production more efficient, and we may watch movies without subtitles in our native language [101].

DL is also being used in the music and audio sector. Generating realistic original music is time-consuming and requires energy, and technology has made it easier: DL models trained on MIDI files can learn a music genre and generate realistic music [102]. Online music streaming platforms like Spotify, YouTube Music, and Apple Music rely on AI and DL for music suggestions to their users, and novel ways to do the same task keep arising, like T-RECSYS [103]. Services like Google and Shazam use DL to extract music information from audio [104].

2.9. DL in Robotics

In the past five years, DL technology has completely changed many aspects of computer vision and has been rapidly adopted in robotics [105-110]. Punjani et al. (2015) [111] show in their research that the performance of a DNN in modelling the dynamics of a remote-controlled helicopter is about 60% higher than that of other methods. DL is now being actively used in computational neuroscience to model decision-making processes for applications like self-driving cars, traffic light control, and robotics. DL is likewise used in industrial robots for purposes like object identification and picking [112]. Le et al. (2019) [113] demonstrated this using a Kinect and a Denso robot, teaching the robot to identify and pick up USB packs using both offline and online methods. In the era of online purchasing and e-commerce, where goods are delivered right to the doorstep, object picking or bin picking is very important; Amazon conducted the first competition on this exact challenge using DL in 2015 [114]. Outside of concentrated fields like industrial robotics, work with DL has also been done in fields like civil engineering and architecture. McLaughlin et al. (2020) [115] used a CNN and LIDAR to collect and analyze 3D spatial data, which a robot used to automatically quantify defects in concrete bridges. Researchers have also used similar architectures, i.e. MobileNetV2 and DeepLabV3, to automatically assess concrete delamination [116-120]. A summary of each application is given in figure 1.

Figure 1. DL Techniques used in different fields

As figure 1 shows, the most popular DL technique is the CNN: every field's research begins with CNNs, and other techniques are then incorporated on top of them. The second most used technique is the RNN, or recurrent neural network; CNNs and RNNs seem to go hand in hand in all fields. The expansion of DL has been spreading steadily over the world, steady but not slow. As seen above, AI and DL have been touching all aspects of normal life; from education to business to industry to entertainment, very soon all fields will be using ML to complete their fundamental tasks, and it looks like a blessing. The main advantages of DL are as follows:
- Features are automatically derived and optimized to achieve the desired results; they do not need to be extracted in advance, which avoids time-consuming feature-engineering techniques.
- Robustness to natural variations in the data.
- The same neural-network-based approach can be applied to many applications and different types of data.
- GPUs can be used to perform large-scale parallel calculations; the approach scales to large amounts of data and can provide better performance results when the data is large.
- The DL architecture is flexible and can be adapted to new problems in the future.

But nothing is without drawbacks. Some of the downsides of DL are as follows:
- It takes a lot of data to be better than other techniques.
- The cost of training a model is very high due to complex data models; besides, DL requires expensive GPUs and hundreds of machines, which increases the cost for users.

3. Conclusions

DL is surging today, as major breakthroughs in artificial neural networks have prompted companies in all industries to implement DL solutions as part of their strategy over the past few years. From chatbots in client service to image and object recognition in retail and more, DL has opened up many new and complex AI applications. In recent years, DL has been particularly appealing to many organizations as more and more publicly accessible pre-trained models have been developed. However, this does not mean that DL answers all questions related to ML. One of the chief advantages of DL is its ability to solve complex problems that require discovering hidden patterns in the data and a deep understanding of the relationships between a large number of interrelated variables. DL algorithms can learn hidden patterns from the data on their own, combine them, and build more efficient decision rules. DL is excellent for complex tasks that often require processing large amounts of unstructured data, such as image classification, natural language processing, and speech recognition. However, classical ML may be better for simple tasks involving more direct feature engineering that do not require unstructured data processing.

In this paper, several fields in which DL shines have been described: image recognition, object detection, healthcare, education, entertainment, music, robotics, etc. On the whole, DL builds on four main network families: CNNs, recurrent neural networks (RNNs), generative adversarial networks (GANs), and reinforcement learning (RL). CNNs are mainly used in image and video applications like face recognition and object detection; RNNs are applicable to text classification and commercial applications like exchanges and image captioning. In computer vision and natural language processing (NLP), GANs are mostly used for non-real images like deepfake videos, text-to-image conversion, and photo editing. And the last one, RL, is used as the basis of …

There is no standard theory to guide the choice of the appropriate DL tool, as it requires knowledge of topology, training methods, and other parameters, so it is not easy for less qualified people to adopt. It is also not easy to interpret outputs based purely on learning, and a classifier is needed to make sense of them; CNN-based algorithms perform these tasks. DL is currently known as the hot topic in the fields of ML and social cognition. This article discusses face recognition, speech recognition, Chinese character recognition, neural networks, and chat robots, and we also discussed the future impact of these applications on society. In humans and animals, much learning is unsupervised: we tend to discover the structure of the world through perception rather than by being told the name of each object. Human vision is a powerful system that continuously samples the optical array in an intelligent, task-specific manner, using a tiny, high-resolution fovea and super-large, low-resolution surrounds. We expect that, in the long run, many advances in vision will come from systems that are trained end to end and combine convolutional networks with RNNs, with the RNNs using reinforcement learning to decide where to look. Systems that combine DL and reinforcement learning are still in their infancy. Natural language understanding is another space in which DL will have a huge impact in the next few years. Ultimately, major advances can be achieved through systems that combine representation learning with advanced reasoning; language and handwriting recognition have already adopted deep learning alongside straightforward reasoning. DL is extensively used in the field of medical image recognition, face recognition, character recognition, and many other areas: it trains a model on a given set of information to complete a specific task on new information, whereas traditional medical image recognition is inefficient in feature extraction and data processing, and its popularization effect is not ideal. These application domains are changing with each passing day, and a large number of the latest technical domains have also joined these technologies.
Acknowledgements.
This work was supported by Jiangxi University of Science and
Technology, 341000, Ganzhou, P.R China, [grant number
205200100460].
References

[1] Liu W, Wen Y, Yu Z, Li M, Raj B, Song L, Sphereface: Deep hypersphere embedding for face recognition. in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017; 212–220.
[2] Tran L, Yin X, Liu X, Disentangled representation learning GAN for pose-invariant face recognition. in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017; 1415–1424.
[3] Yang J, et al., Neural aggregation network for video face recognition. in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017; 4362–4371.
[4] Ding H, Zhou SK, Chellappa R, Facenet2expnet: Regularizing a deep face recognition net for expression recognition. in 2017 12th IEEE International Conference on Automatic Face & Gesture Recognition, 2017; 118–126.
[5] Ding C, Tao D, Trunk-branch ensemble convolutional neural networks for video-based face recognition. IEEE Trans. Pattern Anal. Mach. Intell., 2017; 40(4):1002–1014.
[6] Nech A, Kemelmacher-Shlizerman I, Level playing field for million scale face recognition. in Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, 2017; 3406–3415, DOI: 10.1109/CVPR.2017.363.
[7] Wang H, et al., Cosface: Large margin cosine loss for deep face recognition. in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018; 5265–5274.
[8] Mittal S, Agarwal S, Nigam MJ, Real time multiple face recognition: A deep learning approach. in ACM International Conference Proceeding Series, 2018; 70–76, DOI: 10.1145/3299852.3299853.
[9] Cheng WC, Wu TY, Li DW, Ensemble convolutional neural networks for face recognition. ACM Int. Conf. Proceeding Ser., 2018; 40(4):1002–1014. DOI: 10.1145/3302425.3302459.
[10] Deng J, Guo J, Xue N, Zafeiriou S, Arcface: Additive angular margin loss for deep face recognition. in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019; 4690–4699.
[11] Rabiner LR, Applications of speech recognition in the area of telecommunications. in 1997 IEEE Workshop on Automatic Speech Recognition and Understanding Proceedings, 1997; 501–510.
[12] Deng L, Huang X, Challenges in adopting speech recognition. Commun. ACM, 2004; 47(1):69–75.
[13] Quoc CN, Tien DT, Dang KN, Huu BN, Robust speech recognition based on binaural speech enhancement system as a preprocessing step. ACM Int. Conf. Proceeding Ser., 2012; 91–96. DOI: 10.1145/2350716.2350732.
[14] Miao Y, Metze F, Rawat S, Deep maxout networks for low-resource speech recognition. in 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, 2013; 398–403.
[15] Xi X, Jingqian W, Improved lattice-based speech keyword spotting algorithm. J. Tsinghua Univ. Science Technol., 2015; 55(5):508–513.
[16] Noda K, Yamaguchi Y, Nakadai K, Okuno HG, Ogata T, Audio-visual speech recognition using deep learning. Appl. Intell., 2015; 42(4):722–737.
[17] Chorowski J, Bahdanau D, Serdyuk D, Cho K, Bengio Y, Attention-based models for speech recognition. arXiv Prepr. arXiv1506.07503, 2015.
[18] Ravanelli M, Brakel P, Omologo M, Bengio Y, Light gated recurrent units for speech recognition. IEEE Trans. Emerg. Top. Comput. Intell., 2018; 2(2):92–102.
[19] Image Recognition: A Complete Guide. Deepomatic, Jan. 2019, Accessed: Sep. 25, 2021. https://deepomatic.com/en/what-is-image-recognition.
[20] Torkian N, et al., Synthesis and characterization of Ag-ion-exchanged zeolite/TiO2 nanocomposites for antibacterial applications and photocatalytic degradation of antibiotics. Environmental Research, 2022; 207:112157.
[21] Pak M, Kim S, A review of deep learning in image recognition. in 2017 4th International Conference on Computer Applications and Information Processing Technology (CAIPT), 2017; 1–3, DOI: 10.1109/CAIPT.2017.8320684.
[22] Khan AT, Cao X, Li Z, Li S, Evolutionary Computation Based Real-time Robot Arm Path-planning Using Beetle Antennae Search. EAI Endorsed Trans AI Robotics, 2022; 1:1–10.
[23] Chen Z, Walters J, Xiao G, Li S, An Enhanced GRU Model with Application to Manipulator Trajectory Tracking. EAI Endorsed Trans AI Robotics, 2022; 1:1–11.
[24] Wu M, Chen L, Image recognition based on deep learning. in 2015 Chinese Automation Congress (CAC), 2015; 542–546, DOI: 10.1109/CAC.2015.7382560.
[25] Miao Y, Metze F, Improving low-resource CD-DNN-HMM using dropout and multilingual DNN training. in Interspeech, 2013; 13:2237–2241.
[26] Shen J, Shafiq MO, Deep Learning Convolutional Neural Networks with Dropout - A Parallel Approach. in 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA), 2018; 572–577, DOI: 10.1109/ICMLA.2018.00092.
[27] Maalej R, Tagougui N, Kherallah M, Online Arabic Handwriting Recognition with Dropout Applied in Deep Recurrent Neural Networks. in 2016 12th IAPR Workshop on Document Analysis Systems (DAS), 2016; 417–421, DOI: 10.1109/DAS.2016.49.
[28] Mansoorianfar M, et al., Amorphous/crystalline phase control of nanotubular TiO2 membranes via pressure-engineered anodizing. Materials & Design, 2021; 198:109314.
[29] Mansoorianfar M, et al., MXene–laden bacteriophage: A new antibacterial candidate to control bacterial contamination in water. Chemosphere, 2022; 290:133383.
[30] Pham V, Bluche T, Kermorvant C, Louradour J, Dropout Improves Recurrent Neural Networks for Handwriting Recognition. in 2014 14th International Conference on Frontiers in Handwriting Recognition, 2014; 285–290, DOI: 10.1109/ICFHR.2014.55.
[31] Sun L, Su T, Liu C, Wang R, Deep LSTM Networks for Online Chinese Handwriting Recognition. in 2016 15th International Conference on Frontiers in Handwriting Recognition (ICFHR), 2016; 271–276, DOI: 10.1109/ICFHR.2016.0059.
[32] Elleuch M, Maalej R, Kherallah M, A New Design Based-SVM of the CNN Classifier Architecture with Dropout for Offline Arabic Handwritten Recognition. Procedia Comput. Sci., 2016; 80:1712–1723. DOI: 10.1016/j.procs.2016.05.512.
[33] Yang HM, Zhang XY, Yin F, Luo Z, Liu CL, Unsupervised Adaptation of Neural Networks for Chinese Handwriting Recognition. in 2016 15th International Conference on Frontiers in Handwriting Recognition (ICFHR), 2016; 512–517, DOI: 10.1109/ICFHR.2016.0100.
[34] Islam MT, Karim Siddique BMN, Rahman S, Jabid T, Image Recognition with Deep Learning. in 2018 International Conference on Intelligent Informatics and Biomedical Sciences (ICIIBMS), 2018; 3:106–110, DOI: 10.1109/ICIIBMS.2018.8550021.
[35] Botelho B, Virtual assistant (AI assistant). TechTarget, 2017. https://searchcustomerexperience.techtarget.com/definition/virtual-assistant-AI-assistant (accessed Aug. 21, 2021).
[36] Akkara S, T J, PI Controller Based Switching Reluctance Motor Drives using Smart Bacterial Foraging Algorithm. EAI Endorsed Trans AI Robotics, 2022; 1:1–8.
[37] Duermyer R, Definition and examples of virtual assistant. The Balance Small Business, 2021. https://www.thebalancesmb.com/virtual-assistant-1794441 (accessed Aug. 21, 2021).
[38] Mourtas S, Katsikis V, Kasimis C, Feedback Control Systems Stabilization Using a Bio-inspired Neural Network. EAI Endorsed Trans AI Robotics, 2022; 1:1–13.
[39] Nourouzi S, Kolahdooz A, Bakhshi-Juybari M, Hosseinipour SJ, Effect of temperature on the microstructure of semi-solid casting in cooling slope method. Aerospace Mechanics Journal, 2013; 9(3):55–63.
[40] Page LC, Gehlbach H, How an Artificially Intelligent Virtual Assistant Helps Students Navigate the Road to College. AERA Open, 2017; 3(4):2332858417749220. DOI: 10.1177/2332858417749220.
[41] Kepuska V, Bohouta G, Next-generation of virtual personal assistants (Microsoft Cortana, Apple Siri, Amazon Alexa and Google Home). in 2018 IEEE 8th Annual Computing and Communication Workshop and Conference (CCWC), 2018; 99–103.
[42] Iannizzotto G, Lo Bello L, Nucita A, Grasso GM, A vision and speech enabled, customizable, virtual assistant for smart environments. in 2018 11th International Conference on Human System Interaction (HSI), 2018; 50–56.
[43] Someshwar D, Bhanushali D, Chaudhari V, Nadkarni S, Implementation of Virtual Assistant with Sign Language using Deep Learning and TensorFlow. in 2020 Second International Conference on Inventive Research in Computing Applications (ICIRCA), 2020; 595–600.
[44] Vishnu R, Krishna Prakash N, Mobile Application-Based Virtual Assistant Using Deep Learning. in Soft Computing and Signal Processing, 2021; 609–617.
[45] Carnier M, Albertti R, Gavidia L, Severeyn E, La Cruz A, ToraxIA: Virtual Assistant for Radiologists Based on Deep Learning from Chest X-Ray. in Artificial Intelligence, Computer and Software Engineering Advances, 2021; 49–63.
[46] Shawar BA, Atwell E, Chatbots: are they really useful?. in LDV Forum, 2007; 22(1):29–49.
[47] Parmar M, Microsoft Bing search is getting its own AI-powered assistant. Windows Latest, 2021. https://www.windowslatest.com/2021/05/31/microsoft-bing-search-is-getting-its-own-ai-powered-assistant/ (accessed Aug. 27, 2021).
[48] Dale R, The return of the chatbots. Nat. Lang. Eng., 2016; 22(5):811–817. DOI: 10.1017/S1351324916000243.
[49] Bhagwat VA, Deep Learning for Chatbots. 2018.
[50] Weizenbaum J, ELIZA — a Computer Program for the Study of Natural Language Communication between Man and Machine. Commun. ACM, 1983; 26(1):23–28, DOI: 10.1145/357980.357991.
[51] Benoit R, Making a Clever Intelligent Agent: The Theory behind the Implementation, vol. 3. 2009.
[52] Wu W, Yan R, Deep chit-chat: Deep learning for chatbots. in Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, 2019; 1413–1414.
[53] Yan R, Song Y, Wu H, Learning to respond with deep neural networks for retrieval-based human-computer conversation system. in Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2016; 55–64.
[54] Yan R, Zhao D, Coupled context modeling for deep chit-chat: towards conversations between human and computer. in Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2018; 2574–2583.
[55] Yan R, 'Chitty-Chitty-Chat Bot': Deep Learning for Conversational AI. in IJCAI, 2018; 18:5520–5526.
[56] Esteva A, et al., A guide to deep learning in healthcare. Nat. Med., 2019; 25(1):24–29. DOI: 10.1038/s41591-018-0316-z.
[57] Hojjati-Najafabadi A, et al., Antibacterial and photocatalytic behaviour of green synthesis of Zn0.95Ag0.05O nanoparticles using herbal medicine extract. Ceramics International, 2021; 47(22):31617–31624.
[58] Hojjati-Najafabadi A, et al., A Tramadol Drug Electrochemical Sensor Amplified by Biosynthesized Au Nanoparticle Using Mentha aquatic Extract and Ionic Liquid. Topics in Catalysis, 2021.
[59] Ardabili S, Mosavi A, Várkonyi-Kóczy AR, Advances in machine learning modeling reviewing hybrid and ensemble methods. in International Conference on Global Research and Education, 2019; 215–227.
[60] Martis RJ, Gurupur VP, Lin H, Islam A, Fernandes SL, Recent advances in big data analytics, internet of things and machine learning. Elsevier, 2018.
[61] Zou J, Huss M, Abid A, Mohammadi P, Torkamani A, Telenti A, A primer on deep learning in genomics. Nat. Genet., 2019; 51(1):12–18, DOI: 10.1038/s41588-018-0295-5.
[62] Miotto R, Wang F, Wang S, Jiang X, Dudley JT, Deep learning for healthcare: review, opportunities and challenges. Brief. Bioinform., 2018; 19(6):1236–1246. DOI: 10.1093/bib/bbx044.
[63] Liu S, Liu S, Cai W, Pujol S, Kikinis R, Feng D, Early diagnosis of Alzheimer's disease with deep learning. in 2014 IEEE 11th International Symposium on Biomedical Imaging (ISBI), 2014; 1015–1018.
[64] Brosch T, Tam R, Initiative ADN, Manifold learning of brain MRIs by deep learning. in International Conference on Medical Image Computing and Computer-Assisted Intervention, 2013; 633–640.
[65] Prasoon A, Petersen K, Igel C, Lauze F, Dam E, Nielsen M, Deep feature learning for knee cartilage segmentation using a triplanar convolutional neural network. in International Conference on Medical Image Computing and Computer-Assisted Intervention, 2013; 246–253.
[66] Yoo Y, Brosch T, Traboulsee A, Li DKB, Tam R, Deep learning of image features from unlabeled data for multiple sclerosis lesion segmentation. in International Workshop on Machine Learning in Medical Imaging, 2014; 117–124.
[67] Cheng JZ, et al., Computer-aided diagnosis with deep learning architecture: applications to breast lesions in US images and pulmonary nodules in CT scans. Sci. Rep., 2016; 6:24454.
[68] Gulshan V, et al., Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. JAMA, 2016; 316(22):2402–2410.
[69] Esteva A, et al., Dermatologist-level classification of skin cancer with deep neural networks. Nature, 2017; 542(7639):115–118.
[70] Cheng Y, Wang F, Zhang P, Hu J, Risk prediction with electronic health records: A deep learning approach. in Proceedings of the 2016 SIAM International Conference on Data Mining, 2016; 432–440.
[71] Lipton ZC, Kale DC, Elkan C, Wetzel R, Learning to diagnose with LSTM recurrent neural networks. arXiv Prepr. arXiv1511.03677, 2015.
[72] Pham T, Tran T, Phung D, Venkatesh S, Deepcare: A deep dynamic memory model for predictive medicine. in Pacific-Asia Conference on Knowledge Discovery and Data Mining, 2016; 30–41.
[73] Miotto R, Li L, Kidd BA, Dudley JT, Deep patient: an unsupervised representation to predict the future of patients from the electronic health records. Sci. Rep., 2016; 6(1):1–10.
[74] Miotto R, Li L, Dudley JT, Deep learning to predict patient future diseases from the electronic health records. in European Conference on Information Retrieval, 2016; 768–774.
[75] Liang Z, Zhang G, Huang JX, Hu QV, Deep learning for healthcare decision making with EMRs. in 2014 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2014; 556–559.
[76] Tran T, Nguyen TD, Phung D, Venkatesh S, Learning vector representation of medical objects via EMR-driven nonnegative restricted Boltzmann machines (eNRBM). J. Biomed. Inform., 2015; 54:96–105.
[77] Che Z, Kale D, Li W, Bahadori MT, Liu Y, Deep computational phenotyping. in Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2015; 507–516.
[78] Lasko TA, Denny JC, Levy MA, Computational phenotype discovery using unsupervised feature learning over noisy, sparse, and irregular clinical data. PLoS One, 2013; 8(6):66341.
[79] Choi E, Bahadori MT, Schuetz A, Stewart WF, Sun J, Doctor AI: Predicting clinical events via recurrent neural networks. in Machine Learning for Healthcare Conference, 2016; 301–318.
[80] Nguyen P, Tran T, Wickramasinghe N, Venkatesh S, Deepr: a convolutional net for medical records. IEEE J. Biomed. Heal. Informatics, 2016; 21(1):22–30.
[81] Razavian N, Marcus J, Sontag D, Multi-task prediction of disease onsets from longitudinal laboratory tests. in Machine Learning for Healthcare Conference, 2016; 73–100.
[82] Dernoncourt F, Lee JY, Uzuner O, Szolovits P, De-identification of patient notes with recurrent neural networks. J. Am. Med. Informatics Assoc., 2017; 24(3):596–606.
[83] Zhou J, Troyanskaya OG, Predicting effects of noncoding variants with deep learning–based sequence model. Nat. Methods, 2015; 12(10):931–934.
[84] Kelley DR, Snoek J, Rinn JL, Basset: learning the regulatory code of the accessible genome with deep convolutional neural networks. Genome Res., 2016; 26(7):990–999.
[85] Alipanahi B, Delong A, Weirauch MT, Frey BJ, Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning. Nat. Biotechnol., 2015; 33(8):831–838.
[86] Angermueller C, Lee HJ, Reik W, Stegle O, Accurate prediction of single-cell DNA methylation states using deep learning. BioRxiv, 2016; 55715.
[87] Koh PW, Pierson E, Kundaje A, Denoising genome-wide histone ChIP-seq with convolutional neural networks. Bioinformatics, 2017; 33(14):i225–i233.
[88] Fakoor R, Ladhak F, Nazi A, Huber M, Using deep learning to enhance cancer diagnosis and classification. in Proceedings of the International Conference on Machine Learning, 2013; 28:3937–3949.
[89] Lyons J, et al., Predicting backbone Cα angles and dihedrals from protein sequences by stacked sparse auto-encoder deep neural network. J. Comput. Chem., 2014; 35(28):2040–2046.
[90] Hammerla NY, Halloran S, Plötz T, Deep, convolutional, and recurrent models for human activity recognition using wearables. arXiv Prepr. arXiv1604.08880, 2016.
[91] Zhu J, Pande A, Mohapatra P, Han JJ, Using deep learning for energy expenditure estimation with wearable sensors. in 2015 17th International Conference on E-health Networking, Application & Services (HealthCom), 2015; 501–506.
[92] Jindal V, Birjandtalab J, Pouyan MB, Nourani M, An adaptive deep learning approach for PPG-based identification. in 2016 38th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), 2016; 6401–6404.
[93] Nurse E, Mashford BS, Yepes AJ, Kiral-Kornek I, Harrer S, Freestone DR, Decoding EEG and LFP signals using deep learning: heading TrueNorth. in Proceedings of the ACM International Conference on Computing Frontiers, 2016; 259–266.
[94] Sathyanarayana A, et al., Sleep quality prediction from wearable data using deep learning. JMIR mHealth uHealth, 2016; 4(4):125.
[95] Top Use Cases for AI in Media and Entertainment. Dataiku, Sep. 22, 2021. [Online]. Available: https://www.dataiku.com/stories/ai-in-media-and-entertainment/.
[96] Chou CH, Su YS, Hsu CJ, Lee KC, Han PH, Design of Desktop Audiovisual Entertainment System with Deep Learning and Haptic Sensations. Symmetry (Basel), 2020; 12(10). DOI: 10.3390/sym12101718.
[97] Erhel S, Jamet E, Digital game-based learning: Impact of instructions and feedback on motivation and learning effectiveness. Comput. Educ., 2013; 67:156–167. DOI: 10.1016/j.compedu.2013.02.019.
[98] Justesen N, Bontrager P, Togelius J, Risi S, Deep learning for video game playing. IEEE Trans. Games, 2019; 12(1):1–20.
[99] Ghosh P, The Future of Deep Learning. DATAVERSITY, Sep. 22, 2020. [Online]. Available: https://www.dataversity.net/the-future-of-deep-learning/.
[100] Gudmundsson SF, et al., Human-Like Playtesting with Deep Learning. in 2018 IEEE Conference on Computational Intelligence and Games (CIG), 2018; 1–8, DOI: 10.1109/CIG.2018.8490442.
[101] Usukhbayar B, Homer S, Deepfake Videos: The Future of Entertainment. 2020.
[102] Kulkarni R, Gaikwad R, Sugandhi R, Kulkarni P, Kone S, Survey on deep learning in music using GAN. Int. J. Eng. Res. Technol., 2019; 8(9):646–648.
[103] Fessahaye F, et al., T-RECSYS: A novel music recommendation system using deep learning. in 2019 IEEE International Conference on Consumer Electronics (ICCE), 2019; 1–6.
[104] Nam J, Choi K, Lee J, Chou SY, Yang YH, Deep learning for audio-based music classification and tagging: Teaching computers to distinguish rock from Bach. IEEE Signal Process. Mag., 2018; 36(1):41–51.
[105] Sünderhauf N, et al., The limits and potentials of deep learning for robotics. Int. J. Rob. Res., 2018; 37(4–5):405–420.
[106] Moshayedi AJ, Roy AS, Liao L, PID Tuning Method on AGV (Automated Guided Vehicle) Industrial Robot. 2020; 12(4):53–66.
[107] Moshayedi AJ, Gharpure DC, Development of position monitoring system for studying performance of wind tracking algorithms. in 7th German Conference on Robotics, 2012; 32:161–164.
[108] Moshayedi AJ, Fard SS, Liao L, Eftekhari SA, Design and development of pipe inspection robot meant for