
A Survey of Deep Learning Techniques for Mobile Robot Applications

Jahanzaib Shabbir and Tarique Anwer

Abstract—Advancements in deep learning over the years have attracted research into how deep artificial neural networks can be used in robotic systems. On this basis, the following survey presents a discussion of the applications, benefits, and obstacles of deep learning for physical robotic systems, using modern research as examples. The survey summarizes current research, with specific focus on the gains and obstacles as they relate to robotics. This is followed by a primer on how notable deep learning structures can be used in robotics, with relevant examples. The next section covers the practical considerations robotics researchers face when applying deep learning neural networks. Finally, the survey presents the shortcomings of deep learning, solutions to mitigate them, and a discussion of future trends. The intention of this survey is to show how recent advancements in the broader robotics field can inspire additional research into applying deep learning in robotics.

Index Terms—Deep learning, robotic vision, navigation, autonomous driving, deep reinforcement learning, algorithms for robotic perception, semi-supervised and self-supervised learning, deep learning architectures, multimodal learning, decision making and control.

1 INTRODUCTION

1.1 Defining Deep Learning in the Context of Robotic Systems

Deep learning is defined as the field of science that involves training extensive artificial neural networks with complex functions, for example nonlinear dynamics, to transform data from a raw, high-dimensional, multimodal state into one that can be understood by a robotic system [1]. However, deep learning entails certain shortcomings that affect physical robotic systems: generating training data is generally costly, and sub-optimal performance during training poses a risk in certain applications. Yet, even with such difficulties, robotics researchers are finding creative options, for instance leveraging training data through digital manipulation, automating training, and using multiple deep neural networks to improve performance and lower training time [2]. The idea of using machine learning to control robots requires humans to be willing to give up a certain measure of control. This seems counterintuitive at first, but the gain is that the system can begin learning on its own [3]. This makes such systems capable of adaptation, so that they can ultimately improve beyond what direct human control would achieve. Deep neural networks are therefore well suited for use with robots, since they are flexible and can be used in frameworks that other machine learning models cannot support [4]. For a long time, the most notable optimization method for neural networks has been stochastic gradient descent; more recently, improved techniques such as RMSProp and Adam have gained widespread use. Each of the many types of deep learning models is built by stacking several layers of regression models [5]. Within these models, distinct types of layers have evolved for many aims.
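To make the optimizer comparison above concrete, the sketch below implements one parameter update each for vanilla SGD, RMSProp, and Adam in plain NumPy. It is a minimal illustration, not code from the surveyed systems; the learning rates and decay constants are typical default values, and `grad` is assumed to be the gradient of the loss with respect to a parameter vector `w`.

```python
import numpy as np

def sgd_step(w, grad, lr=0.01):
    # Vanilla stochastic gradient descent: step against the gradient.
    return w - lr * grad

def rmsprop_step(w, grad, cache, lr=0.001, decay=0.9, eps=1e-8):
    # RMSProp: scale the step by a running average of squared gradients.
    cache = decay * cache + (1 - decay) * grad**2
    return w - lr * grad / (np.sqrt(cache) + eps), cache

def adam_step(w, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    # Adam: combine momentum (m) with RMSProp-style scaling (v),
    # plus bias correction for the first steps (t starts at 1).
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad**2
    m_hat = m / (1 - b1**t)
    v_hat = v / (1 - b2**t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v
```

The per-parameter step scaling in RMSProp and Adam is what typically makes them converge faster than plain SGD on the poorly conditioned loss surfaces common in deep networks.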

1.2 Forms of Deep Learning Applied to Mobile Robotic Systems

One type of layer that demands specific mention is the convolutional layer. Unlike traditional layers that are fully connected, convolutional layers apply the same weights across the entire input space. This brings about a significant reduction in the overall number of weights in the neural network, which is especially vital for images, which normally consist of hundreds of thousands to millions of pixels that require processing [6]. Processing such images with fully connected layers would need on the order of 100K² to 1M² weights connecting each layer, which makes the approach entirely impractical. The inspiration for convolutional layers came from cortical neurons in the visual cortex, which respond only to stimuli within a receptive field. Since convolution approximates this behavior, convolutional layers can be expected to excel at image processing assignments [7]. The pioneering research on neural networks with convolutional layers used image recognition tasks, building on the advancements of the ImageNet recognition competitions around 2012. The lessons learned in this period generated widespread interest, as convolutional layers proved able to attain super-human recognition of images [8]. Currently, convolutional neural networks have become well known and highly effective as a deep learning model for many image-based applications. These applications comprise semantic image segmentation, scaling images using super-resolution, scene recognition, object localization within images, human gesture recognition and facial recognition [9] [10]. Images are not the only form of signal on which convolutional neural networks excel. Their capability extends to any form of signal that demonstrates spatiotemporal proximity, for example speech recognition as well as speech and audio synthesis [11]. Naturally, they have also started to dominate the domain of signal processing and are heavily used in robotics, for instance in pedestrian detection with the use of LIDAR, micro-Doppler signatures, and depth-map estimation [12] [13]. Recent projects have even started to integrate signals from several modalities and combine them for unified recognition and perception [14]. Ultimately, the philosophy that prevails in the deep learning community is that every component of a complex system can be taught to "learn." Therefore, the actual power of deep learning comes not from applying just one of the described structures as a part of a robotic system, but from connecting components of all these structures to form a complete system that learns entirely [15]. This is the point where deep learning starts to make its impact, such that each component of the system learns as part of the whole and is capable of adapting in sophisticated ways. Indeed, neuroscientists have started recognizing that many of the patterns evolving in the deep learning community, and in artificial intelligence at large, mirror patterns previously evolved in the human brain [16]. In the process of learning complex, high-dimensional and novel dynamics, the analysis of derivatives within these complex dynamics needs human expertise. However, this process normally consumes a lot of time and can bring about a trade-off between the dimensionality and the tractability of states [17]. Making these models robust to unforeseen effects is therefore challenging, and in most cases full state information is unknown. Systems that can rapidly and autonomously adapt to new dynamics are required to solve problems such as moving over surfaces with unknown or uncertain attributes, managing interaction in a new environment, or compensating for degrading robot subsystems [18]. Therefore, we need methods that can handle hundreds or thousands of degrees of freedom and cope with the high measures of uncertainty inherent to partial state information. On the other hand, learning control policies in dynamic environments and dynamic control systems can accommodate high degrees of freedom for applications such as swarm robotics, anthropomorphic hands, robot vision, autonomous robot driving and robotic arm manipulation [19]. However, despite the advancements gained over years of active research, robust and general solutions for tasks such as moving over deformable surfaces or navigating complex geometries with the use of tools and actuator systems have remained elusive, all the more so in novel scenarios. This shortcoming includes the kinematic and path planning tasks inherent in advanced movement [20]. In terms of advanced object recognition, on the other hand, deep neural networks have proved increasingly adept at the recognition and classification of objects. Examples of advanced applications include recognition of deformed objects and estimation of their state and pose for movement, semantic task and path specification (for example, moving around the table) [20]. They also include recognizing the attributes of an object and its surface, whereby, for instance, a sharp object could present a danger to human collaborators in certain environments such as rough terrain. In the face of such difficulties, deep learning models can be used in the approximation of functions from sample input-output pairs. These may be the most general-purpose deep learning structures, since there are several distinct functions in robotics which researchers can approximate from sample observations [21]. Examples of these observations entail mapping from actions to the corresponding changes in state, mapping changes in state to the actions that can cause them, or mapping from force to motion. While in certain cases particular physical equations for such functions may already be defined, there are several situations where the environment is too complex for such equations to generate acceptable accuracy [22]. In such scenarios, learning approximations of functions from sample observations can yield significantly better accuracy. Notably, the approximated functions do not need to be continuous. Function approximation models are also excellent at classification tasks, for instance determining the type of obstacle in front of a robot, the overall path planning strategy best suited to the present environment, or the state of a certain complex object with which the robot is interacting [23]. Furthermore, a function approximation deep learning architecture using rectifiers can model the highly coupled dynamics of an autonomous mobile robot, solving otherwise challenging analytic-derivative and system identification problems. Deep neural networks have superseded other models in detection and perception because they can operate directly on high-dimensional input rather than needing feature vectors hand-engineered by humans [24]. This lowers the dependence on humans, such that additional training time can be partially offset by lower initial engineering effort [25]. Extraction of meaning from video or still imagery is another application where deep learning has made impressive progress. This process demands simultaneously addressing four factors of object detection within a single deep neural network: feature extraction, motion handling, classification, and articulation and occlusion handling [26]. Unified systems limit suboptimal interactions between normally separate subsystems by predicting the physical results of dynamic scenes using vision alone. This mirrors the ability of humans to predict the results of a dynamic scene from visual information, for example a rock falling down and impacting another rock [27]. It is on this premise that deep learning has been identified as effective for managing multimodal data in robotic sensor applications, including integrating vision and haptic sensor data, and incorporating depth and image information from RGB-D cameras. Due to their extensive number of meta-parameters, deep neural networks have evolved somewhat of a reputation of being challenging for non-experts to use effectively [28]. However, such parameters also provide significant flexibility, which is a vital factor in their general success. Training deep neural networks therefore requires the user to develop at least an elementary familiarity with many concepts. Applying these techniques will specifically help in tackling advanced object recognition challenges while reducing the extent of the engineering changes required [29].
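As a rough illustration of the convolutional weight sharing described above, the sketch below compares the parameter count of a single fully connected layer against a convolutional layer on a modest 224×224 RGB image. The layer sizes are illustrative assumptions, not figures from the survey.

```python
# Fully connected: every input pixel connects to every hidden unit.
h, w, c = 224, 224, 3          # input image: height, width, channels
hidden = 4096                  # assumed width of a dense hidden layer
fc_weights = h * w * c * hidden
print(f"fully connected: {fc_weights:,} weights")   # ~616 million

# Convolutional: one small kernel is reused across the whole image.
k, in_ch, out_ch = 3, 3, 64    # 3x3 kernel, 3 input channels, 64 filters
conv_weights = k * k * in_ch * out_ch
print(f"convolutional:   {conv_weights:,} weights") # 1,728
```

The five-orders-of-magnitude gap is why convolutional layers make image-scale inputs tractable for the mobile-robot perception tasks discussed above.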

2 DEEP LEARNING FOR ROBOTIC PERCEPTION

2.1 Current Robotic Perception Trends

Although current trends lean toward deep and big models, a simplified neural network with just a single hidden layer and a basic sigmoid-shaped activation function will train faster and provide a baseline against which to measure any deeper model's improvements. When we use deeper models, leaky rectifiers can normally promote faster training by lowering the impact of the diminishing-gradient problem and improving accuracy through their simple monotonic derivatives [30]. Furthermore, since models with additional weights have increased flexibility to overfit training data, regularization is a vital technique in training the best model. An elastic net, for example, is a combination of well-established regularization methods used to promote robustness against weight saturation while also promoting sparsity in the weights [31]. However, newer regularization methods, including dropout and drop-connect, have attained even better empirical outcomes. Many regularization methods also exist specifically to improve the robustness of autoencoders. Special-purpose layers can likewise make a significant difference in deep neural networks [32]. It is a common method to alternate between convolutional and max-pooling layers. These pooling layers lower the general number of weights in the network and also allow the model to recognize objects independent of where they are placed in the visual field [33]. On the other hand, batch normalization can provide significant improvements in the rate of convergence by ensuring the gradient remains in a range that affects the weights of all neurons. In addition, residual layers can allow a deeper, and consequently more flexible, model to be trained [34]. To make effective use of deep learning models, it is vital to train on one or many General-Purpose Graphics Processing Units, since other methods of parallelizing deep neural networks have been tried but none have yet provided performance gains comparable to those of GPGPUs [35]. Until recent years, robots were long confined to industrial environments. In industrial environments, robotic systems are pre-programmed with repetitive assignments, lack the capability for autonomy, and as such operate on the basis of a structured approach. Such an environment demands no adaptivity of a mobile robot, since it eliminates the need for autonomy. Mobile robots, on the other hand, operate in less structured environments where they have to make their own decisions, such as navigating paths, determining whether objects are obstacles, recognizing images and audio, and mapping their environments. As such, surviving and adapting in the real world is more complex for any robotic system than in the industrial setting, since the risk of failure, system error, external factors, obstacles, corrupt data, human error and unrecognizable environments is more prevalent [15].
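The layer types enumerated above (leaky rectifiers, max pooling, batch normalization, dropout) compose naturally in modern frameworks. The sketch below stacks them in a small PyTorch image model; the architecture is an illustrative assumption, not one proposed by the survey.

```python
import torch.nn as nn

# A small perception backbone combining the layer types discussed:
# convolution + batch norm + leaky ReLU, max pooling, and dropout.
model = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1),
    nn.BatchNorm2d(32),          # stabilizes and speeds up convergence
    nn.LeakyReLU(0.01),          # mitigates the diminishing gradient
    nn.MaxPool2d(2),             # fewer weights, translation tolerance
    nn.Conv2d(32, 64, kernel_size=3, padding=1),
    nn.BatchNorm2d(64),
    nn.LeakyReLU(0.01),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Dropout(0.5),             # regularization against overfitting
    nn.Linear(64 * 56 * 56, 10), # assumes 224x224 inputs, 10 classes
)
```

Alternating convolution and pooling stages, as noted above, is what lets the final dense layer stay small enough to train on commodity GPGPU hardware.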
2.2 Machine Learning Usage in Training Robotic System Perception

The difference between deep learning and machine learning is that deep learning places emphasis on a subset of machine learning resources and methods and uses them to solve any difficulty that requires "thought," whether human or artificial. Deep learning can also be introduced as a means of making sense of data with the use of multiple abstraction layers. In the course of the training process, deep neural networks learn to discover useful patterns with which to digitally represent data such as sounds and images. This is specifically why we observe more advances in the areas of image recognition and natural language processing originating from deep learning [36]. It is against this backdrop that deep learning has taken the forefront position in helping researchers develop breakthrough methods for the perception capabilities of robotic systems [16]. In more simplified language, perception refers to the functionality of a robot being able to detect its surroundings. It is therefore heavily reliant on multiple sources of sensory information. With traditional robot technology, however, extracting data from raw sensors through rudimentarily constructed sensing pipelines meant these old methods were limited by constraints in adapting to generic settings [17]. In situations where these robotic systems faced dynamic environments, they operated in an unstructured manner by combining hybrid and autonomous functionality to process information about their surroundings [18]. As such, with deep learning came the introduction of new methods of processing data from robotic sensors about a robotic system's surroundings, using the faculty known as perception. These methods span robot motion, perception, human interaction, manipulation and grasping, automation, self-supervision, self-training and learning, as well as robot vision [19]. Deep learning models utilize automated-action technology at only half the cost by using supervised learning to attain their goals [20]. For example, in order to perfect an image recognition application, a neural net must be trained with a collection of labeled data. Unsupervised learning, on the other hand, is how deep learning allows for the discovery of new patterns and insights, tackling problems with little or no prior indication of what the results perceived by a robot should be [21]. The method by which a mobile robot detects its environment through perception is via definitive decision-making policies [20]. For instance, mobile robots using deep learning are able to navigate rationally by using motion and track-precision sensors driven by machine learning algorithms [21]. However, in difficult environments such as a congested room, the accuracy with which they perceive their environment is limited. Deep learning based solutions can tackle this challenge by using artificial intelligence, high-performance computational hardware, and processing layers known as deep convolutional neural networks to successfully decipher intricate environment-perception difficulties [22]. The various obstacles that exist in a robot's environment are an indication of the immense high-dimensional data processing capability required of a robotic system to perceive its surroundings [23]. By using self-supervised, semi-supervised and fully supervised training coupled with learning, robotic systems can exploit their machine learning and pattern recognition capabilities to process raw data such as images, objects, semantics and audio, as well as natural language, in real time. This segment of deep learning uses feed-forward artificial neural networks to successfully analyze visual imagery. It also uses multilayer perceptrons based on designs that need little preprocessing [22]. In comparison to other image classification algorithms, it uses minimal pre-processing, which means that the network learns its filters through automated procedures, unlike traditional algorithms whose filters are manually engineered. Not relying on prior knowledge and human effort in the design of features is therefore a major advantage [24]. In general, this makes deep learning capable of extracting multi-level attributes from raw sensor data in a direct manner, with no need for human-assisted robotic control [24]. For researchers, the implication is that deep learning programming libraries, such as TensorFlow and Theano for Python, Caffe for C++, darch in R, CNTK, ConvNetJS for Javascript, and Deeplearning4j built on C++ and Java, among others, are extremely useful in providing robotic systems with a platform for sensory data analysis and environment learning using deep learning algorithms [25]. Robotic system perception concerns auxiliary functions within mobile robotics that are vital to interacting with a robot's environment. Sensing and intelligent perception are among the vital applications, since they determine the performance of a robotic system, which depends largely on how the robot's sensors perform. Modern sensors and their functionality can provide impressive robot perception, which is the foundation of self-adaptive behavior as well as robotic artificial intelligence [26]. The process of changing from sensory input to control output using sensory-motor control structures presents a big difficulty in robotic perception [27]. Some of the vital mobile robot components include the manipulator, comprising many joints and connections, a locomotion device, sensors, a controller and an end effector [37]. Mobile robots are automated systems capable of moving; they have the ability to move around their surroundings and are not fixed to a single physical location. With these features, mobile robots are able to perceive their environments using sensory data and autonomous control commands [28].
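To ground the supervised-learning workflow described above, the sketch below shows one epoch of training an image classifier on labeled data in PyTorch. The model, data loader, and hyperparameters are placeholders assumed for illustration.

```python
import torch
import torch.nn as nn

def train_one_epoch(model, loader, lr=1e-3):
    """One pass over a labeled dataset: predict, score, update."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    model.train()
    for images, labels in loader:      # loader yields labeled batches
        optimizer.zero_grad()
        logits = model(images)         # forward pass
        loss = loss_fn(logits, labels) # compare against the labels
        loss.backward()                # backpropagate
        optimizer.step()               # gradient-based update
```

The same loop structure applies regardless of which of the libraries listed above is used; only the API names change.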
2.3 Shortcomings of Deep Learning in Robotic Perception

However, certain challenges remain unresolved in these robotic systems, particularly in the areas of perception and intelligent control. Some of these challenges are reflected in the process needing a lot of data to progressively train and teach the algorithms. Large datasets are required to ensure machines deliver the desired outcomes. In the same way that the human brain needs rich experience to learn and deduce information, artificial neural networks also need abundant amounts of data. This means that for more powerful abstractions, more parameters are required, and hence more data. Another challenge is the tendency of neural networks to overfit, whereby in certain cases there is a sharp distinction between the error on the training set and the error encountered on new, unseen datasets. This problem arises when a model's relative number of parameters makes it fail to perform reliably: the model merely memorizes training examples and fails to generalize to new situations and new datasets.
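A standard way to detect the memorization failure just described is to hold out data and watch the gap between training and validation error. The sketch below is a generic early-stopping loop, not a method from the survey; `fit_epoch` and `evaluate` are assumed helper functions standing in for a real training pipeline.

```python
def train_with_early_stopping(model, train_set, val_set, patience=5):
    """Stop training once validation error stops improving,
    a simple guard against the overfitting described above."""
    best_val, stale = float("inf"), 0
    history = []
    while stale < patience:
        train_err = fit_epoch(model, train_set)   # assumed helper
        val_err = evaluate(model, val_set)        # assumed helper
        history.append((train_err, val_err))
        # A growing train/validation gap signals memorization.
        if val_err < best_val:
            best_val, stale = val_err, 0
        else:
            stale += 1
    return history
```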

2.4 How Mobile Robotic Perception Models Can Be Used to Attain Complete Situational Awareness

These capabilities will be surveyed through deep learning perception model algorithms that determine how a robot responds to dynamic changes within an environment [29]. The basis of these models revolves around control-theory-affiliated paradigms, for instance system stability, control, and observation [30]. This theory states that a robotic system is able to perceive its environment by using hierarchical extensions or enhancements of learning, maximizing the range of its sensor capabilities while using path planning algorithms to maneuver around obstacles or along paths [31]. Most real-time mapping algorithms are concerned with the acquisition of compact 3D maps of indoor settings with the use of range and imaging sensors [32]. The process of developing models of a robot's environment is a vital problem to deal with, all the more so in regard to managing its workspace, especially when it is shared with other machinery [33]. A mobile robot interacts with the environment by using control systems which define structures or obstacles as geometrical areas, so as to cover all the likely configurations of the robot. Objects and structures are defined as parallelepipeds, spheres, planes, and cylinders. With such a simplified model, the mobile robot system can define many geometrical areas of this nature to cover nearly all objects within its surroundings, for example moving objects and stationary items such as furniture and machinery. Therefore, we propose elementary geometrical volumes as a means of modeling a mobile robot's perception of its environment [34]. This method will allow the robot to move within an environment with the certainty of not colliding with forbidden regions; these regions must already be defined, declared and activated in order for the approach to work correctly [35].

The geometrical perception algorithm checks whether the robot end effector is within a controlled area or warning zone. This check is done against the geometrical areas already defined and stored by the user [38]. Similarly, in this context, the position of a dynamic object gives the perception system the possibility of associating geometrical regions with arbitrary moving points [39]. These points can be read from external sensors such as encoders. Speed control is undertaken with the use of geometrical-area blocks able to detect the shape typology and choose the correct movement law to be used, so as to modify the robot override and avoid collisions with user-defined zones [40]. The speed override is transformed smoothly when the robot end effector enters a spherical zone, in accordance with the perception law in (1) [3]:

$$
V = \begin{cases}
v_0 \cdot \dfrac{d - r}{\delta}, & r \le d \le r + \delta, \\
v_0, & d > r + \delta, \\
0, & d < r.
\end{cases} \qquad (1)
$$

In this case, V represents the robot end effector's actual speed override, v_0 the old override, and d the distance between the robot end effector and the main spherical area; the thickness of the warning zone is represented as δ, while r is the sphere radius [41]. When the robot meets a cylindrical zone, the speed override is subjected to the perception law in (2) [3]:

$$
V = \begin{cases}
v_0 \cdot \dfrac{d_1 - r}{R - r}, & 0 \le z \le h, \\
v_0 \cdot \dfrac{d_2}{\delta}, & h \le z \le h + \delta \ \text{or} \ -\delta \le z < 0, \ p \in \mathrm{cyl}, \\
v_0 \cdot \dfrac{d_3}{\delta}, & h \le z \le h + \delta \ \text{or} \ -\delta \le z < 0, \ p \notin \mathrm{cyl}.
\end{cases} \qquad (2)
$$

In the perception law above, h represents the cylinder height, the position of the robot end effector is denoted as p, and the distance between the center of the cylinder and the position of the robot is denoted as d_1, while d_2 represents the distance between the top or bottom base of the cylinder and the position of the robot [42]. In addition, the least distance between the position of the robot and the top/bottom circumference points of the cylinder base is denoted as d_3. Furthermore, the robot speed override coincides with the previous speed override when the robot end effector is outside the warning zone [43].
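Read as code, perception law (1) is just a piecewise speed scaling by distance to the sphere. The sketch below implements it directly; the geometry values in the example call are arbitrary assumptions, and the cylindrical law (2) would follow the same piecewise pattern.

```python
def spherical_override(v0, d, r, delta):
    """Speed override near a spherical forbidden zone, per law (1).

    v0:    previous speed override
    d:     distance from end effector to the sphere center
    r:     sphere radius
    delta: thickness of the warning zone
    """
    if d < r:                      # inside the forbidden sphere: stop
        return 0.0
    if d > r + delta:              # outside the warning zone: unchanged
        return v0
    return v0 * (d - r) / delta    # inside the warning shell: ramp down

# Example: end effector 0.15 m from a 0.10 m sphere, 0.10 m warning shell
print(spherical_override(v0=1.0, d=0.15, r=0.10, delta=0.10))  # 0.5
```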
3 DEEP LEARNING FOR ROBOTIC CONTROL AND EXPLORATION

3.1 How Autonomous Robotic Systems Use Deep Learning to Control and Explore Their Environments

Realizing the benefits of autonomous robot exploration presents robotics researchers with many applications of considerable community and financial impact [44]. Robotics research has traditionally relied on perfect knowledge and control of the environment [45]. The problems related to unstructured environments are an outcome of the high-dimensional state space, as well as the inherent ambiguity in mapping sensory perceptions onto particular states. It should be noted that the high dimensionality of the state space represents the most basic difficulty as robots leave the highly controlled environments of a laboratory and enter unstructured surroundings. For example, autonomous unmanned aerial vehicles have used deep learning to classify terrain and overcome exploration shortcomings by generating control commands for their human operator, so as to adapt to a given trade-off [46]. The major hypothesis of this approach is that for mobile robots to succeed in unstructured surroundings, they must carefully choose assignment-specific attributes and identify the relevant structures in real time, lowering their state space without impacting the performance of their exploration objectives [47]. Robots perform assignments by exploring their surroundings. As such, given our focus on autonomous mobile exploration, we shall direct most attention to exploration in the service of movement, that is to say, collision-free movement for end-effector placement [48]. The challenge of generating such movement is an example of the problem faced in motion planning. Motion planning for robotic systems with many degrees of freedom is computationally challenging even in highly structured environments, due to the high-dimensional configuration space [49]. In addition, unstructured environments impose added difficulties on motion generation in comparison to the traditional motion planning process. Furthermore, in environments that are unstructured, a robot possesses only limited knowledge of its environment; objects can change from an unknown to a known state for the robot through manipulation assignments using the end effector [50]. However, this capability can be challenged by a constrained trajectory, such that the mobile robot is unable to reach a particular location with ease [51]. Each of these problems makes the challenge of motion generation more complex. The explicit coordination of planning and sensing necessary to manage dynamic environments increases the dimension of the state space. In addition, robotic assignment requirements impose stringent terms in the form of high-frequency feedback [52]. Therefore, existing motion planners tend to make assumptions that are highly restrictive for unstructured environments, since they are computationally difficult to satisfy in terms of gaining operator-to-robot feedback [53]. These assumptions, as well as the computational difficulty, are an outcome of motion planning's foundation in high-dimensional configuration spaces, which makes it poorly suited to solving such problems. Planners can instead address this paradigm by using only workspace information for collision avoidance. Nearly all real-world surroundings, however, comprise a considerable measure of structure [54]. For instance, buildings are segmented into hallways, doors, and rooms; outdoor environments comprise paths, streets, and intersections; while objects such as tables, shelves and chairs have more favorable approach directions. This information, however, is neglected when robot exploration planners operate exclusively on the basis of a configuration space [55]. Therefore, most robotic exploration planners end up operating under the assumption that the environment is perfectly defined and remains static in the course of planning [56].
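As a concrete reference point for the configuration-space planning discussed above, the sketch below is a bare-bones rapidly-exploring random tree (RRT) in a 2D configuration space. It is a generic textbook construction, not an algorithm proposed by the survey; the circular-obstacle collision test is a stand-in for a real environment model.

```python
import math, random

def rrt(start, goal, is_free, step=0.05, iters=2000, goal_tol=0.05):
    """Minimal 2D RRT: grow a tree of collision-free configurations
    toward random samples until the goal region is reached."""
    nodes, parent = [start], {0: None}
    for _ in range(iters):
        sample = (random.random(), random.random())  # unit-square C-space
        # Nearest existing node to the sample.
        i = min(range(len(nodes)),
                key=lambda j: math.dist(nodes[j], sample))
        nx, ny = nodes[i]
        d = math.dist((nx, ny), sample)
        if d == 0:
            continue
        new = (nx + step * (sample[0] - nx) / d,
               ny + step * (sample[1] - ny) / d)
        if not is_free(new):                          # collision check
            continue
        nodes.append(new)
        parent[len(nodes) - 1] = i
        if math.dist(new, goal) < goal_tol:           # goal region reached
            path, k = [], len(nodes) - 1
            while k is not None:
                path.append(nodes[k])
                k = parent[k]
            return path[::-1]
    return None  # no path found within the sampling budget

# Toy usage: a single circular obstacle in the middle of the space.
free = lambda q: math.dist(q, (0.5, 0.5)) > 0.2
print(rrt((0.1, 0.1), (0.9, 0.9), free) is not None)
```

In two dimensions this converges quickly; the point made above is that the same sampling scheme degrades sharply as the configuration-space dimension grows, which is why purely configuration-space planners struggle with high-degree-of-freedom robots.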

4 DEEP LEARNING IN ROBOTIC NAVIGATION AND AUTONOMOUS DRIVING

Using deep learning to attain autonomous driving assignments is not the perfectly controlled and modeled task most people think it is. Instead, it needs optimal perceptual capabilities [57]. The process of perceiving a robot's environment and interpreting the acquired information allows it to understand the condition of its surroundings, devise plans to change that state, and observe how its actions impact its environment. In unstructured environments, recognition of objects has proven highly challenging, given the immense volumes of sensor data and the increased variation of objects within similar object categories, for example a paved versus an unpaved road [58]. Deep learning brings machine learning motion-capturing abilities as well as optimal perceptual functionalities. This is a prerequisite for several vital robot applications, for instance flexible manufacturing, planetary exploration, collaboration with human experts, and elder care [59]. The challenge of driving in an environment includes problems of moving the robot past varied obstacles by pushing and pulling. Even in structured environments, automated driving is difficult due to the complexity of the related state space [60]. This state space comprises the appearance, dimension, position and weight of objects within the scene. It also comprises several other relevant attributes which provide indications of where to pull, push or grasp, as well as the level of force to apply [61]. Deep learning and associated machine learning techniques have collectively improved the performance of robots in decision making by embedding within these systems the capability to choose between possible actions and to determine the required parameters for their controllers.

4.1 How Autonomous Robotic Systems Use Deep Learning to Navigate Their Environments

Autonomous driving in unstructured environments faces many challenges which do not exist in structured environments. In unstructured environments, object attributes needed for driving cannot be defined a priori. Information concerning objects has to be gained through sensors, even though these are normally ambiguous and therefore introduce uncertainty and provide redundant information. Furthermore, autonomous driving in unstructured and dynamic surroundings normally requires responding in a timely fashion to a rapidly changing environment [62]. Problems of autonomous driving can be simplified through the exploitation of the structure inherent to human surroundings. Most objects in the real world are designed to perform certain functions with the intention of being used by humans [44]. As an outcome, several real-world objects share common attributes which allude to their intended usage. By placing emphasis on these assignment-related object attributes, the complexity of autonomous driving is lowered. For example, visual data can be analyzed to identify the few points which correspond to favorable locations at which a robot can maneuver. Furthermore, since movement features are similar across many objects, robots can be trained to identify them. As a consequence, the state space that requires exploration in order to move is significantly reduced [63]. Researchers also normally make assumptions to lower the complexity of autonomous driving in unstructured environments. For instance, it is normally assumed that full models of objects in the environment are available a priori or can be gained through sensors, and that the environment remains the same in the course of interaction [64]. In practical terms, however, it is impossible to provide autonomous driving with full a priori models of the actual world. Yet perfect models are not a prerequisite for successful autonomous driving [65]. Mobile robot driving can be guided using existing structures in the world, in most cases those which are easy to perceive. As such, by leveraging this structure, the complexity of autonomous driving in unstructured environments is lowered significantly. Similarly, understanding the intrinsic degrees of freedom of objects in an environment can also lower the complexity of autonomous driving in unstructured surroundings [45].
5 SEMI-SUPERVISED AND SELF-SUPERVISED LEARNING FOR ROBOTICS

Imitation-based learning is a promising approach to tackling difficult robotic assignments, for instance autonomous navigation. However, it needs human supervision to oversee the process of training and to send correct control commands to robots without feedback. This form of procedure is prone to failure and high cost [66]. Therefore, in order to lower human involvement and limit the manual data labeling of autonomous robotic navigation using imitation learning, the techniques of semi-supervised and self-supervised learning can be introduced. It should be noted, however, that these techniques need to operate according to a multi-sensory design approach. The solution should comprise a suboptimal sensor policy founded on sensor fusion and automatic labeling of states the robot could encounter [67]. This is also aimed at eliminating human supervision in the course of the learning process [68] [69]. Furthermore, a recording policy needs to be developed to throttle the adversarial impact of too much data being learned from the suboptimal sensor policy. As such, this solution will equip the robot with the capability of achieving near-human performance in most of its assignments [70]. It is also capable of surpassing human performance in situations with unexpected outcomes, for instance hardware failure or human operator error. Furthermore, the semi-supervised method can be considered a solution to the problem of track classification in congested environments such as a room. This problem entails classifying objects undergoing segmentation and tracking without using class models [71]. Therefore, we introduce semi-supervised learning as a technique capable of solving this problem by iteratively training a classifier and extracting vital training examples from the unlabeled data, exploiting the tracking information. In addition, the process also involves evaluating large multi-class difficulties presented by data sourced from congested artificial environments such as a street [72]. As such, when provided with manually labeled training tracks of individual object classes, semi-supervised learning can exploit thousands of training tracks, in contrast to self-supervised learning [73]. In addition, when also provided with augmented unlabeled data, semi-supervised learning has demonstrated the capability of outperforming the self-supervised learning method. In this case, semi-supervised learning presents itself as the simplest algorithmic approach to speeding up the incremental updating of boosted classifiers, lowering the learning time by a factor of three [74].
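The iterative classifier-training scheme described above is essentially self-training: fit on the labeled tracks, score the unlabeled pool, and absorb only the confident predictions. The sketch below shows that loop for any scikit-learn-style probabilistic classifier; the 0.95 confidence threshold and round count are assumed values, not figures from the cited work.

```python
import numpy as np

def self_training(clf, X_lab, y_lab, X_unlab, thresh=0.95, rounds=5):
    """Semi-supervised self-training: iteratively fold confidently
    labeled examples from the unlabeled pool into the training set."""
    X_lab, y_lab = X_lab.copy(), y_lab.copy()
    for _ in range(rounds):
        clf.fit(X_lab, y_lab)                  # train on current labels
        if len(X_unlab) == 0:
            break
        proba = clf.predict_proba(X_unlab)     # score the unlabeled pool
        conf = proba.max(axis=1)
        keep = conf >= thresh                  # keep confident predictions
        if not keep.any():
            break
        X_lab = np.vstack([X_lab, X_unlab[keep]])
        y_lab = np.concatenate([y_lab, proba[keep].argmax(axis=1)])
        X_unlab = X_unlab[~keep]               # shrink the unlabeled pool
    return clf
```

The confidence threshold plays the role of the recording policy mentioned above: it throttles how much weakly labeled data is allowed to influence the classifier.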

6 MULTIMODAL DEEP LEARNING METHODS FOR ROBOTIC VISION AND CONTROL

Performing tasks in imperfectly controlled and modeled surroundings means robots need optimal multimodal capabilities. The procedure of perceiving the environment and interpreting the gained information allows robots to perceive the state of the environment, devise methods to alter that state, and observe the impacts of their actions on the environment.

6.1 How Autonomous Robotic Systems Use Semi-Supervised and Self-Supervised Learning to Learn Their Environments

The environment of a robot can be controlled to varying degrees. In principle, less constrained environments are more difficult to perceive. In the real world, with its unstructured and dynamic surroundings such as vegetation, landscape and terrain, a mobile robot's perception needs to be capable of handling this unknown environment using its sensor modalities. Moreover, even without the introduction of uncertainty, sensors are ambiguous in themselves [75]. For example, a lemon and a soccer ball can look similar from a certain perspective. In addition, a cup may be invisible if the cupboard is shut, and it can be challenging to tell the difference between a remote control and a cell phone if they are both facing down. These factors all contribute to the challenge of perceiving the state of the environment. Furthermore, advances in face recognition, for example, normally operate under assumptions concerning the position and orientation of the individual in the image. The outcomes of object segmentation are normally founded on the capability of telling the difference between an object and the background on the basis of differences in color [76]. In addition, object recognition is normally reduced to computing similarities to a limited collection of given objects. In an unstructured environment, on the other hand, position and orientation are uncontrollable, assumptions concerning color and shading are problematic to justify, and the range of likely objects the robot could encounter is intractable [77].

6.2 How Autonomous Robotic Systems Use Multimodal and Deep Learning Methods to Perceive Their Environments

Therefore, in order to tackle perception in unstructured surroundings, robots need to be able to lower the state space that requires analysis, facilitating certain perceptual assignments by limiting uncertainty and thereby lowering the dimensionality of the state space. For instance, in order to compute the distance of objects in an environment, robots need to relate depth to visual information [78]. This is normally done with the use of a stereo vision system, solving the correspondence challenge between two static 2D images. Addressing the correspondence problem, however, is complicated by noise, many likely matches, and uncertainty in calibrating the camera [79]. On the other hand, a system capable of capturing at least three view angles in one image lowers the state space by reducing a multi-sensor system to a single sensor. Furthermore, in an unstructured environment, recognition of objects has proven highly challenging [57]. This is due to large volumes of sensor data and increased variation within objects of a similar category. In this case, object recognition is a high-dimensional challenge. However, even in the face of these challenges, objects in the same category do share similar attributes. By applying this insight, robots are able to place emphasis on only a minimal subset of the state space that comprises the most relevant characteristics for classification [58].
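For reference, the stereo correspondence step described above is available off the shelf: given a rectified left/right image pair, a block-matching search yields a disparity map, from which depth follows as focal length × baseline / disparity. The sketch below uses OpenCV; the file names and camera parameters are assumed placeholders, not values from the survey.

```python
import cv2
import numpy as np

# Rectified grayscale stereo pair (placeholder file names).
left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

# Classic block matching over the two static 2D images.
matcher = cv2.StereoBM_create(numDisparities=64, blockSize=15)
disparity = matcher.compute(left, right).astype(np.float32) / 16.0

# Depth from disparity, assuming known calibration:
# depth = focal_length_px * baseline_m / disparity.
focal_px, baseline_m = 700.0, 0.12     # assumed camera parameters
valid = disparity > 0
depth = np.zeros_like(disparity)
depth[valid] = focal_px * baseline_m / disparity[valid]
```

The noise and ambiguous-match problems noted above show up here as invalid (non-positive) disparities, which is why the depth computation is masked to valid pixels only.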

7 APPLICATION OF DEEP MODELS TO PROBLEMS IN VISION AND ROBOTICS

This overview of machine learning applications in robotics highlights five major areas where considerable impacts have been made by robotic technologies, both currently and at the development stage for long-term use. While by no means exhaustive, the aim of this summary is to give the reader a preview of the kinds of machine learning applications that exist within robotics and to motivate extended research in these and other fields [80]. The growth of big data, that is, the visual information available on the internet, including annotated images and video, has pushed forward advancements in computer vision, which have in turn assisted in extending machine-based learning systems to prediction learning methods such as those presented by research at Carnegie Mellon [81]. That work unveiled offshoot examples, such as anomaly detection using supervised learning, which have been applied in building systems capable of searching for and assessing damage in silicon wafers with the use of convolutional neural networks [81]. In addition, extrasensory technologies, for instance lidar, ultrasound, and radar, such as those developed by Nvidia, are also propelling the creation of 360-degree vision-based systems for autonomous vehicles and UAVs [82]. Imitation learning, which is closely associated with observational learning, is also a field characterized by reinforcement learning, that is, the difficulty of getting an agent to act so as to maximize rewards. One example is Bayesian or probabilistic models, which stand out as a common machine learning method used as an integral component of field robotics, where attributes of mobility in fields such as construction and rescue, together with inverse optimal control methods, have been utilized in humanoid robotics, off-road terrain navigation, and legged locomotion [83]. Self-supervised learning is another method, one that allows robots to generate their own training instances so as to refine performance. It has been integrated into robots and optical devices, for instance in the detection and rejection of objects such as dust and snow, identification of obstacles, vehicle dynamics modeling, and 3D-scene analysis [84]. Assistive and medical technologies are another application, where assistive robots entail devices capable of sensing, processing sensory data, and undertaking actions that benefit individuals with disabilities. Even as smart assistive technologies exist for the overall population, for instance driver assistance resources, movement therapy robots provide diagnostic or therapeutic gains. Furthermore, multi-agent learning, concerning coordination and negotiation, is a vital component of machine learning based robots, also known as agents. This method is broadly utilized in games, where agents are capable of adapting to a transforming landscape of other robots or agents and of searching for equilibrium strategies [76].

The interdisciplinary arena of computer vision is concerned with how computers can be made to attain a high measure of understanding from digital imagery and video. Computer vision assignments comprise techniques for acquiring, processing, analyzing and understanding digital imagery, as well as extracting high-dimensional information from the actual world so as to yield numerical or symbolic data, for example in the form of decisions [85]. Artificial intelligence areas concerned with autonomous planning or deliberative robotic system navigation demand a thorough perception of such settings, since information about the environment can be provided by a computer vision system acting as a vision sensor [57]. Artificial intelligence and computer vision therefore share other fields, for instance pattern recognition and learning methods, and consequently computer vision is in certain cases viewed as a component of artificial intelligence. Another touchpoint of computer vision is solid state physics, since a large portion of computer vision systems rely on image sensors that detect electromagnetic radiation, normally in the form of either visible or infra-red light [81] [86]. These sensors operate according to quantum physics, while the procedure by which light interacts with surfaces is better explained by optics. Such intricate inner workings demonstrate how even complex image sensors need quantum mechanics to provide a full understanding of the image formation process. Furthermore, another application of computer vision lies in the multiple measurement challenges in physics that can be tackled by utilizing computer vision, for instance fluid motion [39].
8 BENEFITS AND DRAWBACKS OF DEEP LEARNING AS APPLIED IN MOBILE ROBOTS

8.1 Benefits of Deep Learning in the Context of Mobile Robots

The gains of deep learning, as a member of the wider family of machine learning techniques founded on learning representations of data, as opposed to assignment-specific algorithms, through supervised, unsupervised and semi-supervised learning, lie in a structured interpretation of information processing and communication patterns, which can be viewed as attempts at defining a relation between multiple stimuli and the related neuronal responses [87]. Deep learning architectures, for instance deep neural networks, deep belief networks, and recurrent neural networks, have been utilized in arenas including computer vision, natural language processing, social network filtering, speech recognition, bioinformatics and audio recognition. In these fields, deep learning architectures have produced outcomes comparable to, and in certain cases surpassing, human expertise. Furthermore, deep learning algorithms utilize a cascade of many layers of nonlinear processing units for feature extraction and transformation, with each layer taking the output of the previous layer as its input [88].

Deep reinforcement learning proposes a conceptually simple and lightweight framework that uses asynchronous gradient descent to optimize deep neural network controllers. The asynchronous variants of standard reinforcement learning algorithms show that parallel actor-learners have a stabilizing influence on training, which allows the neural network controllers to be trained successfully [19]. It is on this premise that asynchronous variants are presented as the most appealing deep reinforcement learning approach.

8.2 Drawbacks of Deep Learning in the Context of Mobile Robots

A drawback of deep learning in applied robotics concerns the storage of the agent's data in a replay memory so that it can be batched or sampled at random from varied time-steps. Aggregating over memory in this way reduces non-stationarity and decorrelates updates, but at the same time it limits the techniques to off-policy reinforcement learning algorithms [23].
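The replay memory mentioned above is, mechanically, just a bounded buffer of transitions sampled at random for each update. The sketch below is a minimal generic version, not tied to any cited system; the transition fields follow the usual (state, action, reward, next state, done) assumption.

```python
import random
from collections import deque

class ReplayMemory:
    """Bounded store of past transitions; random sampling decorrelates
    updates but ties learning to off-policy methods, as noted above."""
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)  # oldest entries fall off

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        # Uniformly mix transitions from varied time-steps.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```

Asynchronous actor-learners, as discussed in Section 8.1, achieve a similar decorrelation without the buffer, which is what frees them from the off-policy restriction.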

[6] … “watching” unconstrained videos from the world wide web,” 2015.
[7] D. C. Cireşan, U. Meier, L. M. Gambardella, and J. Schmidhuber, “Deep, big, simple neural nets for handwritten digit recognition,” Neural Computation, vol. 22, no. 12, pp. 3207–3220, 2010.
[8] J. Lu, V. Behbood, P. Hao, H. Zuo, S. Xue, and G. Zhang, “Transfer learning using computational intelligence: A survey,” Knowledge-Based Systems, vol. 80, pp. 14–23, 2015.
[9] M. Turan, J. Shabbir, H. Araujo, E. Konukoglu, and M. Sitti, “A deep learning based fusion of rgb camera information and magnetic localization information for endoscopic capsule robots,” International Journal of Intelligent Robotics and Applications, vol. 1, no. 4, pp. 442–450, 2017.
[10] M. I. Jordan and T. M. Mitchell, “Machine learning: Trends, perspectives, and prospects,” Science, vol. 349, no. 6245, pp. 255–260, 2015.
[11] M. Turan, Y. Almalioglu, H. Araujo, E. Konukoglu, and M. Sitti, “Deep endovo: A recurrent convolutional neural network (rcnn) based visual odometry approach for endoscopic capsule robots,” Neurocomputing, vol. 275, pp. 1861–1870, 2018.
[12] N. Sünderhauf, S. Shirazi, A. Jacobson, F. Dayoub, E. Pepperell, B. Upcroft, and M. Milford, “Place recognition with convnet landmarks: Viewpoint-robust, condition-robust, training-free,” in Proceedings of Robotics: Science and Systems XII, 2015.
[13] M. Turan, Y. Y. Pilavci, R. Jamiruddin, H. Araujo, E. Konukoglu, and M. Sitti, “A fully dense and globally consistent 3d map reconstruction approach for gi tract to enhance therapeutic relevance of the endoscopic capsule robot,” arXiv preprint arXiv:1705.06524, 2017.
[14] M. Turan, Y. Y. Pilavci, I. Ganiyusufoglu, H. Araujo, E. Konukoglu, and M. Sitti, “Sparse-then-dense alignment-based 3d map reconstruction method for endoscopic capsule robots,” Machine Vision and Applications, vol. 29, no. 2, pp. 345–359, 2018.
[15] B. M. Lake, R. Salakhutdinov, and J. B. Tenenbaum, “Human-level concept learning through probabilistic program induction,” Science, vol. 350, no. 6266, pp. 1332–1338, 2015.
[16] S. Marsland, Machine Learning: An Algorithmic Perspective, ser. Chapman & Hall/CRC Machine Learning & Pattern Recognition. Boca Raton, FL: CRC Press, 2009.
[17] I. Lenz, H. Lee, and A. Saxena, “Deep learning for detecting robotic grasps,” The International Journal of Robotics Research, vol. 34, no. 4–5, pp. 705–724, 2015.
[18] Z. Ghahramani, “Probabilistic machine learning and artificial intelligence,” Nature, vol. 521, no. 7553, p. 452, 2015.
[19] Y. Duan, X. Chen, R. Houthooft, J. Schulman, and P. Abbeel, “Benchmarking deep reinforcement learning for continuous control,” in International Conference on Machine Learning, 2016, pp. 1329–1338.
[20] S. Levine, P. Pastor, A. Krizhevsky, and D. Quillen, “Learning hand-eye coordination for robotic grasping with large-scale data collection,” in International Symposium on Experimental Robotics. Springer, 2016, pp. 173–184.
[21] Y. Yang, C. Fermuller, Y. Li, and Y. Aloimonos, “Grasp type revisited: A modern perspective on a classical feature for vision,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 400–408.
[22] C. Dong, C. C. Loy, K. He, and X. Tang, “Image super-resolution using deep convolutional networks,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 2, pp. 295–307, 2016.
[23] A. Gongal, S. Amatya, M. Karkee, Q. Zhang, and K. Lewis, “Sensors and systems for fruit detection and localization: A review,” Computers and Electronics in Agriculture, vol. 116, pp. 8–19, 2015.
[24] A. M. Nguyen, J. Yosinski, and J. Clune, “Innovation engines: Automated creativity and improved stochastic optimization via deep learning,” in Proceedings of the 2015 Annual Conference on Genetic and Evolutionary Computation. ACM, 2015, pp. 959–966.
[25] Y. Du, W. Wang, and L. Wang, “Hierarchical recurrent neural network for skeleton based action recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1110–1118.
[26] Y. Tang, “Deep learning using linear support vector machines,” arXiv preprint arXiv:1306.0239, 2013.
[27] N. Sünderhauf, S. Shirazi, F. Dayoub, B. Upcroft, and M. Milford, “On the performance of convnet features for place recognition,” in Intelligent Robots and Systems (IROS), 2015 IEEE/RSJ International Conference on. IEEE, 2015, pp. 4297–4304.
[28] C. Chen, A. Seff, A. Kornhauser, and J. Xiao, “Deepdriving: Learning affordance for direct perception in autonomous driving,” in Computer Vision (ICCV), 2015 IEEE International Conference on. IEEE, 2015, pp. 2722–2730.
[29] J. Schmidhuber, “Deep learning in neural networks: An overview,” Neural Networks, vol. 61, pp. 85–117, 2015.
[30] H. Brighton and H. Selina, Introducing Artificial Intelligence: A Graphic Guide, ser. Introducing... Icon Books Limited, 2015. [Online]. Available: https://books.google.com.pk/books?id=4GxGCgAAQBAJ
[31] J. Bai, Y. Wu, J. Zhang, and F. Chen, “Subset based deep learning for rgb-d object recognition,” Neurocomputing, vol. 165, pp. 280–292, 2015.
[32] V. Veeriah, N. Zhuang, and G.-J. Qi, “Differential recurrent neural networks for action recognition,” in Computer Vision (ICCV), 2015 IEEE International Conference on. IEEE, 2015, pp. 4041–4049.
[33] R. Xu, C. Xiong, W. Chen, and J. J. Corso, “Jointly modeling deep video and compositional text to bridge vision and language in a unified framework,” 2015.
[34] K. Narasimhan, T. Kulkarni, and R. Barzilay, “Language understanding for text-based games using deep reinforcement learning,” arXiv preprint arXiv:1506.08941, 2015.
[35] J. Wu, I. Yildirim, J. J. Lim, B. Freeman, and J. Tenenbaum, “Galileo: Perceiving physical object properties by integrating a physics engine with deep learning,” in Advances in Neural Information Processing Systems, 2015, pp. 127–135.
[36] M. Turan, E. P. Ornek, N. Ibrahimli, C. Giracoglu, Y. Almalioglu, M. F. Yanik, and M. Sitti, “Unsupervised odometry and depth learning for endoscopic capsule robots,” arXiv preprint arXiv:1803.01047, 2018.
[37] M. Turan, Y. Almalioglu, H. Araujo, T. Cemgil, and M. Sitti, “Endosensorfusion: Particle filtering-based multi-sensory data fusion with switching state-space model for endoscopic capsule robots using recurrent neural network kinematics,” arXiv preprint arXiv:1709.03401, 2017.
[38] R. Lun and W. Zhao, “A survey of applications and human motion recognition with microsoft kinect,” International Journal of Pattern Recognition and Artificial Intelligence, vol. 29, no. 05, p. 1555008, 2015.
[39] J. Kuen, K. M. Lim, and C. P. Lee, “Self-taught learning of a deep invariant representation for visual tracking via temporal slowness principle,” Pattern Recognition, vol. 48, no. 10, pp. 2964–2982, 2015.
[40] Y. Hou, H. Zhang, and S. Zhou, “Convolutional neural network-based image representation for visual loop closure detection,” in Information and Automation, 2015 IEEE International Conference on. IEEE, 2015, pp. 2238–2245.
[41] Y. Qian, J. Dong, W. Wang, and T. Tan, “Deep learning for steganalysis via convolutional neural networks,” in Media Watermarking, Security, and Forensics 2015, vol. 9409. International Society for Optics and Photonics, 2015, p. 94090J.
[42] E. Tzeng, J. Hoffman, T. Darrell, and K. Saenko, “Simultaneous deep transfer across domains and tasks,” in Computer Vision (ICCV), 2015 IEEE International Conference on. IEEE, 2015, pp. 4068–4076.
[43] L. Pinto, D. Gandhi, Y. Han, Y.-L. Park, and A. Gupta, “The curious robot: Learning visual representations via physical interactions,” in European Conference on Computer Vision. Springer, 2016, pp. 3–18.
[44] B. M. Lake, T. D. Ullman, J. B. Tenenbaum, and S. J. Gershman, “Building machines that learn and think like people,” Behavioral and Brain Sciences, vol. 40, 2017.
[45] G. Chen, D. Clarke, M. Giuliani, A. Gaschler, and A. Knoll, “Combining unsupervised learning and discrimination for 3d action recognition,” Signal Processing, vol. 110, pp. 67–81, 2015.
[46] J. Wulff and M. J. Black, “Efficient sparse-to-dense optical flow estimation using a learned basis and layers,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 120–130.
[47] J.-R. Ruiz-Sarmiento, C. Galindo, and J. Gonzalez-Jimenez, “Scene object recognition for mobile robots through semantic knowledge and probabilistic graphical models,” Expert Systems with Applications, vol. 42, no. 22, pp. 8805–8816, 2015.
[48] N. Das, E. Ohn-Bar, and M. M. Trivedi, “On performance evaluation of driver hand detection algorithms: Challenges, dataset, and metrics,” in Intelligent Transportation Systems (ITSC), 2015 IEEE 18th International Conference on. IEEE, 2015, pp. 2953–2958.
[49] R. Salakhutdinov, “Learning deep generative models,” Annual Review of Statistics and Its Application, vol. 2, pp. 361–385, 2015.
[50] J. Schmidhuber, “On learning to think: Algorithmic information theory for novel combinations of reinforcement learning controllers and recurrent neural world models,” arXiv preprint arXiv:1511.09249, 2015.
[51] M. Turan, Y. Almalioglu, H. Araujo, E. Konukoglu, and M. Sitti, “A non-rigid map fusion-based direct slam method for endoscopic capsule robots,” International Journal of Intelligent Robotics and Applications, vol. 1, no. 4, pp. 399–409, 2017.
[52] T. Chen, Z. Chen, Q. Shi, and X. Huang, “Road marking detection and classification using machine learning algorithms,” in Intelligent Vehicles Symposium (IV), 2015 IEEE. IEEE, 2015, pp. 617–621.
[53] M. Vrigkas, C. Nikou, and I. A. Kakadiaris, “A review of human activity recognition methods,” Frontiers in Robotics and AI, vol. 2, p. 28, 2015.
[54] O. K. Oyedotun and A. Khashman, “Deep learning in vision-based static hand gesture recognition,” Neural Computing and Applications, vol. 28, no. 12, pp. 3941–3951, 2017.
[55] R. K. Moore, “From talking and listening robots to intelligent communicative machines.”
[56] G. E. Hinton, S. Osindero, and Y.-W. Teh, “A fast learning algorithm for deep belief nets,” Neural Computation, vol. 18, no. 7, pp. 1527–1554, 2006.
[57] Y. Zhu, R. Mottaghi, E. Kolve, J. J. Lim, A. Gupta, L. Fei-Fei, and A. Farhadi, “Target-driven visual navigation in indoor scenes using deep reinforcement learning,” in Robotics and Automation (ICRA), 2017 IEEE International Conference on. IEEE, 2017, pp. 3357–3364.
[58] F. Cruz, J. Twiefel, S. Magg, C. Weber, and S. Wermter, “Interactive reinforcement learning through speech guidance in a domestic scenario,” in Neural Networks (IJCNN), 2015 International Joint Conference on. IEEE, 2015, pp. 1–8.
[59] A. Vinciarelli, A. Esposito, E. André, F. Bonin, M. Chetouani, J. F. Cohn, M. Cristani, F. Fuhrmann, E. Gilmartin, Z. Hammal et al., “Open challenges in modelling, analysis and synthesis of human behaviour in human–human and human–machine interactions,” Cognitive Computation, vol. 7, no. 4, pp. 397–413, 2015.
[60] J. Doshi, Z. Kira, and A. Wagner, “From deep learning to episodic memories: Creating categories of visual experiences,” in Proceedings of the Third Annual Conference on Advances in Cognitive Systems (ACS), 2015, p. 15.
[61] X. Wang, D. Fouhey, and A. Gupta, “Designing deep networks for surface normal estimation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 539–547.
[62] H. Cuayáhuitl, S. Keizer, and O. Lemon, “Strategic dialogue management via deep reinforcement learning,” arXiv preprint arXiv:1511.08099, 2015.
[63] E. Ohn-Bar and M. M. Trivedi, “Looking at humans in the age of self-driving and highly automated vehicles,” IEEE Transactions on Intelligent Vehicles, vol. 1, no. 1, pp. 90–104, 2016.
[64] J. Wei, H. Liu, G. Yan, and F. Sun, “Robotic grasping recognition using multi-modal deep extreme learning machine,” Multidimensional Systems and Signal Processing, vol. 28, no. 3, pp. 817–833, 2017.
[65] M. Mathieu, C. Couprie, and Y. LeCun, “Deep multi-scale video prediction beyond mean square error,” arXiv preprint arXiv:1511.05440, 2015.
[66] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, “Rethinking the inception architecture for computer vision,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 2818–2826.
[67] M. Turan, Y. Almalioglu, E. Konukoglu, and M. Sitti, “A deep learning based 6 degree-of-freedom localization method for endoscopic capsule robots,” arXiv preprint arXiv:1705.05435, 2017.
[68] J. Tang, C. Deng, and G.-B. Huang, “Extreme learning machine for multilayer perceptron,” IEEE Transactions on Neural Networks and Learning Systems, vol. 27, no. 4, pp. 809–821, 2016.
[69] M. Turan, Y. Almalioglu, H. Araujo, E. Konukoglu, and M. Sitti, “A non-rigid map fusion-based rgb-depth slam method for endoscopic capsule robots,” arXiv preprint arXiv:1705.05444, 2017.
[70] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, “Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs,” arXiv preprint arXiv:1606.00915, 2016.
[71] J.-C. Chen, V. M. Patel, and R. Chellappa, “Unconstrained face verification using deep cnn features,” in Applications of Computer Vision (WACV), 2016 IEEE Winter Conference on. IEEE, 2016, pp. 1–9.
[72] N. Papernot, P. McDaniel, S. Jha, M. Fredrikson, Z. B. Celik, and A. Swami, “The limitations of deep learning in adversarial settings,” in Security and Privacy (EuroS&P), 2016 IEEE European Symposium on. IEEE, 2016, pp. 372–387.
[73] S. Levine, C. Finn, T. Darrell, and P. Abbeel, “End-to-end training of deep visuomotor policies,” The Journal of Machine Learning Research, vol. 17, no. 1, pp. 1334–1373, 2016.
[74] G. Huang, Y. Sun, Z. Liu, D. Sedra, and K. Q. Weinberger, “Deep networks with stochastic depth,” in European Conference on Computer Vision. Springer, 2016, pp. 646–661.
[75] S. Niekum, S. Osentoski, G. Konidaris, S. Chitta, B. Marthi, and A. G. Barto, “Learning grounded finite-state representations from unstructured demonstrations,” The International Journal of Robotics Research, vol. 34, no. 2, pp. 131–157, 2015.
[76] C. Devin, A. Gupta, T. Darrell, P. Abbeel, and S. Levine, “Learning modular neural network policies for multi-task and multi-robot transfer,” in Robotics and Automation (ICRA), 2017 IEEE International Conference on. IEEE, 2017, pp. 2169–2176.
[77] C. Finn, X. Y. Tan, Y. Duan, T. Darrell, S. Levine, and P. Abbeel, “Deep spatial autoencoders for visuomotor learning,” in Robotics and Automation (ICRA), 2016 IEEE International Conference on. IEEE, 2016, pp. 512–519.
[78] A. A. Rusu, M. Vecerik, T. Rothörl, N. Heess, R. Pascanu, and R. Hadsell, “Sim-to-real robot learning from pixels with progressive nets,” arXiv preprint arXiv:1610.04286, 2016.
[79] S. Mohamed and D. J. Rezende, “Variational information maximisation for intrinsically motivated reinforcement learning,” in Advances in Neural Information Processing Systems, 2015, pp. 2125–2133.
[80] D. Maturana and S. Scherer, “Voxnet: A 3d convolutional neural network for real-time object recognition,” in Intelligent Robots and Systems (IROS), 2015 IEEE/RSJ International Conference on. IEEE, 2015, pp. 922–928.
[81] Y. Zhang, K. Sohn, R. Villegas, G. Pan, and H. Lee, “Improving object detection with deep convolutional networks via bayesian optimization and structured prediction,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 249–258.
[82] M. Turan, Y. Almalioglu, E. P. Ornek, H. Araujo, M. F. Yanik, and M. Sitti, “Magnetic-visual sensor fusion-based dense 3d reconstruction and localization for endoscopic capsule robots,” arXiv preprint arXiv:1803.01048, 2018.
[83] C. Finn and S. Levine, “Deep visual foresight for planning robot motion,” in Robotics and Automation (ICRA), 2017 IEEE International Conference on. IEEE, 2017, pp. 2786–2793.
[84] V. Campos, A. Salvador, X. Giro-i Nieto, and B. Jou, “Diving deep into sentiment: Understanding fine-tuned cnns for visual sentiment prediction,” in Proceedings of the 1st International Workshop on Affect & Sentiment in Multimedia. ACM, 2015, pp. 57–62.
[85] S. Gu, E. Holly, T. Lillicrap, and S. Levine, “Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates,” in Robotics and Automation (ICRA), 2017 IEEE International Conference on. IEEE, 2017, pp. 3389–3396.
[86] M. Turan, Y. Almalioglu, H. Gilbert, A. E. Sari, U. Soylu, and M. Sitti, “Endo-vmfusenet: deep visual-magnetic sensor fusion approach for uncalibrated, unsynchronized and asymmetric endoscopic capsule robot localization data,” arXiv preprint arXiv:1709.06041, 2017.
[87] J. Sanchez-Riera, K.-L. Hua, Y.-S. Hsiao, T. Lim, S. C. Hidayati, and W.-H. Cheng, “A comparative study of data fusion for rgb-d based visual recognition,” Pattern Recognition Letters, vol. 73, pp. 1–6, 2016.
[88] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin et al., “Tensorflow: Large-scale machine learning on heterogeneous distributed systems,” arXiv preprint arXiv:1603.04467, 2016.