3014 IEEE TRANSACTIONS ON FUZZY SYSTEMS, VOL. 29, NO. 10, OCTOBER 2021

Olfactory-Based Navigation via Model-Based Reinforcement Learning and Fuzzy Inference Methods

Lingxiao Wang, Student Member, IEEE, Shuo Pang, Member, IEEE, and Jinlong Li

Abstract—This article presents an olfactory-based navigation algorithm for using a mobile robot to locate an odor source in a turbulent flow environment. We analogize the odor source localization as a reinforcement learning problem. During the odor plume tracing process, the belief state in a partially observable Markov decision process model is adapted to generate a source probability map that estimates possible odor source locations, and a hidden Markov model is employed to produce a plume distribution map that predicts plume propagation areas. Both source and plume estimations are fed to the robot, and a decision-making approach based on fuzzy inference is designed to dynamically fuse information from two maps and to balance the exploitation and exploration of the search. After assigning the fused information to reward functions, a value iteration based path planning algorithm is presented to solve for the optimal action policy. Compared with other commonly used olfactory-based navigation algorithms, such as moth-inspired and Bayesian inference methods, simulation results show that the proposed method is more intelligent and efficient.

Index Terms—Fuzzy theory, odor source localization (OSL), olfactory-based navigation, partially observable Markov decision process (POMDP).

Manuscript received February 8, 2020; revised May 29, 2020 and July 10, 2020; accepted July 17, 2020. Date of publication July 24, 2020; date of current version October 6, 2021. (Corresponding author: Shuo Pang.)
Lingxiao Wang and Shuo Pang are with the Department of Electrical Engineering and Computer Science, Embry-Riddle Aeronautical University, Daytona Beach, FL 32114 USA (e-mail: [email protected]; [email protected]).
Jinlong Li is with the Shanghai Jiaotong University, Shanghai 201101, China (e-mail: [email protected]).
Color versions of one or more of the figures in this article are available online at https://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TFUZZ.2020.3011741
1063-6706 © 2020 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.
Authorized licensed use limited to: Nanyang Technological University Library. Downloaded on January 31,2023 at 08:40:58 UTC from IEEE Xplore. Restrictions apply.

I. INTRODUCTION

OLFACTION, also known as the sense of smell, is an important sensing ability for animals to perform life-essential activities, such as homing, foraging, mate-seeking, and evading predators. Inspired by olfactory capabilities of animals, an autonomous vehicle or a mobile robot, equipped with odor-detection sensors (e.g., chemical sensors), could locate an odor (or volatile chemical) source in an unknown environment. The technology of employing a robot to find an odor source is referred to as odor source localization (OSL) [1]. Some practical OSL applications that are frequently quoted include monitoring air pollution [2], locating chemical gas leaks [3], locating unexploded mines and bombs [4], and detecting biological phenomena such as underwater hydrothermal vents [5].

An effective navigation method is crucial for an OSL problem. Like image-based navigation methods, which use the information extracted from images as the reference to locate and navigate a robot, olfactory-based navigation methods detect odor plumes as cues to guide a robot toward an odor source. The challenging part of this navigation problem is to estimate plume locations, which are related not only to the molecular diffusion that takes plumes away from the odor source but also to the advection of airflow [6].

The most straightforward olfactory-based navigation approach is chemotaxis [7], which commands the robot to move along the gradient of odor plume concentrations. A common implementation of this method is to install a pair of chemical sensors on the left and right sides of a robot, and the robot is commanded to steer toward the side with the higher concentration [8]. Many experiments [9]–[12] have proved that the chemotaxis method is effective when the odor source is placed in an environment with laminar (i.e., low Reynolds numbers) flows. However, this method is not applicable in an environment with turbulent flows (i.e., high Reynolds numbers), since odor plumes are congregated into packets and the gradient of concentration is a patchy and intermittent signal [13].

Alternatively, two other categories of olfactory-based navigation strategies have been proposed, namely bio-inspired methods and engineering-based methods. A bio-inspired method directs the robot to mimic animal behaviors, such as mate-seeking behaviors of male moths, which could successfully locate a female moth by tracking pheromones over a long distance [14]. To complete this task, a male moth follows a “surge/casting” behavior pattern: it will fly upwind (surge) when it detects pheromones and traverse the wind (casting) when pheromones are absent. Ryohei et al. [15] generalized the “surge/casting” model and implemented it on a ground wheeled vehicle to find an odor source in a laminar flow environment. Li et al. [16] implemented this method on an autonomous underwater vehicle to search for an underwater chemical source. Experiment results [17] proved the validity of this method.

By contrast, an engineering-based method does not follow a fixed behavior pattern. It utilizes math and physics approaches to model odor plume distribution and estimates possible odor source locations. Then, a path planner is employed to guide the robot toward the estimated source location. Methods that produce a source probability map, i.e., a map that indicates the possibility of an area containing an odor source, are various, such
as Bayesian inference [18], particle filter [19], hidden Markov
model (HMM) [20], and partially observable Markov decision
process (POMDP) [21]. In the planning procedure, artificial
potential field [22], and deterministic policy gradient [23] are
feasible algorithms to design path planners, which generate a
search path that guides the robot to the estimated target. Besides,
Vergassola et al. [24] presented the “infotaxis” method, which
uses information entropy to guide the robot searching for an odor
source. In this method, the robot was commanded to select the
movement that reduces the information uncertainty of the odor
source.
Comparing existing olfactory-based navigation strategies, the
limitation of bio-inspired methods is that the robot lacks the
capability of estimating odor plume locations. Thus, when
odor plumes are not detected, the robot can only perform a
time-consuming “casting” behavior to recover plumes. As for
engineering-based methods, if the robot is source seeking ori-
ented (e.g. [25]), the search efficiency is not ideal, since the
source probability map is unreliable before the robot acquires
enough odor source information. On the other hand, the search
result is also not desired if the robot is plume seeking oriented
(e.g., [23]), since it leans toward detecting plumes instead of
locating the odor source. The research niche that our approach fits is to let the robot estimate both odor source and odor plume locations and fuse the two estimations as the target to guide the robot. In this way, the robot not only searches for the odor source location but can also quickly recover from plume nondetection events when it does not observe plumes.

Fig. 1. Framework of the proposed olfactory-based navigation method.

Reinforcement learning (RL) algorithms are widely implemented in the field of artificial intelligence (AI). For instance, AlphaGo [26], an AI robot based on RL methods, defeated a couple of the best professional human players in the game of Go. An RL algorithm models interactions between an agent and the environment: an agent receives rewards by performing actions, and the goal of the agent is to take the action that maximizes the cumulative reward [27]. The framework of an RL algorithm is similar to an OSL problem: an agent could be considered as a robot that aims to find an odor source in an unknown environment. By appropriately defining reward functions, the robot is driven to choose actions that are beneficial to locate an odor source. The optimal policy is adapted as a search path that leads the robot to the maximal reward location.

In this article, an olfactory-based navigation method for use on a ground mobile robot is presented. The proposed method contains two main procedures, i.e., modeling and planning. In the modeling procedure, odor source and plume estimations are obtained. Specifically, belief states in a POMDP model are adapted to represent a source probability map, from which the robot estimates possible odor source locations. Besides, a plume distribution map that predicts odor plume propagation areas is obtained from an HMM-based method. A fuzzy inference approach is designed to dynamically fuse the information from two maps, and the combined information is assigned to reward functions. In the planning procedure, a search route is determined based on the generated reward functions. The value iteration method is adopted to solve the RL problem and produces the optimal policy, which is a search route that leads the robot to the maximum reward location, i.e., the location that contains the most odor source information. Modeling and planning procedures are repeated until reward functions converge, which is considered as the completion of an OSL problem.

II. OVERVIEW OF THE PROPOSED OLFACTORY-BASED NAVIGATION METHOD

It is commonly accepted that an OSL problem can be divided into three phases, namely plume finding, plume tracing, and source declaration [28]. In the first phase, the robot searches for the presence of odor plumes. After the robot detects plumes, it switches to the plume tracing phase, which follows plumes as cues to find the odor source. In the source declaration phase, the robot recognizes the odor source and declares the odor source location.

The scheme of the proposed olfactory-based navigation method is presented in Fig. 1. In the plume finding phase, a “zigzag” search route presented in [16] is adopted to detect plumes for the first time, and the source mapping algorithm that estimates odor source locations (see Section III-C5) is activated simultaneously. After the robot detects plumes for the first time, the robot switches to the plume tracing phase, in which the source mapping algorithm sustains and the plume mapping


algorithm that predicts plume propagation areas (see Section III-D) is activated. Results from source and plume mappings are
combined via a fuzzy controller to form reward functions (see
Section IV-A). Then, a search route is generated by the value
iteration based path planning algorithm (see Section IV-B). In the
plume tracing phase, aforementioned algorithms keep updating
until reward functions converge, which indicates that the robot
finds the odor source location.
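
The phase switching described above can be sketched as a small state machine. This is only an illustrative sketch, not the authors' implementation: the names `Phase` and `next_phase` and the two Boolean flags are assumptions introduced here, and the mapping, fusion, and planning steps that run inside the plume tracing phase are omitted.

```python
from enum import Enum, auto


class Phase(Enum):
    """The three OSL phases named in the text (identifiers are hypothetical)."""
    PLUME_FINDING = auto()
    PLUME_TRACING = auto()
    SOURCE_DECLARATION = auto()


def next_phase(phase, plume_detected, rewards_converged):
    """Return the phase that follows `phase` after one search step.

    plume_detected: True once the robot has sensed a plume (assumed flag).
    rewards_converged: True once reward functions stop changing (assumed flag).
    """
    if phase is Phase.PLUME_FINDING and plume_detected:
        # First detection ends the zigzag search and starts plume tracing.
        return Phase.PLUME_TRACING
    if phase is Phase.PLUME_TRACING and rewards_converged:
        # Converged rewards indicate that the source location has been found.
        return Phase.SOURCE_DECLARATION
    return phase
```

In such a sketch, the source mapping update would run in both the first and second phases, while plume mapping, fuzzy fusion, and value iteration would run only while the phase is `Phase.PLUME_TRACING`.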
Specifically, the proposed method can be separated into two principal procedures, namely modeling and planning. In the modeling procedure, source estimates are obtained from belief states in a POMDP. The motivation of using belief states is that under the POMDP framework, belief states can estimate uncertainties of an environment in a stochastic fashion, i.e., the agent cannot directly observe states, but it can estimate the current state through the belief state, which is the probability of the agent being in a state. To adapt the POMDP framework in an OSL problem, the odor source location can be used to define hidden states, since it is unknown to the robot; actions can be considered as possible moving directions of the robot; observations can be adapted as plume detection and nondetection events; and the belief state can be interpreted as the probability of an area containing the odor source, i.e., a source probability map. Besides, to construct a plume distribution map, which estimates plume propagation areas, an HMM-based method is adopted.

In the planning procedure, robot searching routes are determined. In an RL algorithm, reward functions determine behaviors of an agent and stipulate how we want the agent to accomplish its objective. In the proposed method, reward functions are expected to contain the information that not only conjectures odor source locations but also estimates plume propagation areas, since source and plume estimations are instructive for the robot to either exploit or explore the odor source location. Thus, a source probability map and a plume distribution map are combined to form reward functions. Instead of combining them in a fixed proportional pattern (e.g., [29]), a fuzzy controller is designed to identify the current search condition and dynamically balance weights of two maps (exploitation or exploration). Then, a value iteration based path planning algorithm is adopted to solve for the optimal policy, i.e., the optimal search route that leads the robot toward the location containing the most odor source information.

III. MODELING

A. Search Area

In this work, the OSL is considered as a 2-D problem, since the aimed implementation platform of the proposed olfactory-based navigation method is a ground mobile robot. For computational feasibility, the search area is modeled as a rectangular grid with $m$ cells in a row and $n$ cells in a column as shown in Fig. 2. The size of a cell is defined as $L_x \times L_y$, where $L_x$ and $L_y$ are the length and width of a cell in the $x$ and $y$ directions, respectively. A vector $\mathbf{C} = [C_1, C_2, \ldots, C_M]$ is defined to store cell indexes, where $M = mn$. Besides, $C_i$ can also represent the position of a cell, such that $C_i = (x_i, y_i)$ is the center point of a cell $C_i$, $i \in [1, M]$. The odor source is placed in one of $M$ cells, and its location is unknown to the robot.

Fig. 2. Search area defined in the OSL problem.

Let $f \in [1, m]$ count over cells in the $x$ direction and $g \in [1, n]$ count over cells in the $y$ direction. Then, a cell index $i \in [1, M]$ can be represented as $i = f + (g - 1)m$. Conversely, if $i$ is given, $f$ and $g$ can be calculated as follows:

$$\begin{aligned} f(i) &= \operatorname{rem}(i - 1, m) + 1 \\ g(i) &= \operatorname{int}(i - 1, m) + 1 \end{aligned} \tag{1}$$

where $\operatorname{rem}(n, m)$ and $\operatorname{int}(n, m)$ are the remainder of $n$ being divided by $m$ and the greatest integer that is less than or equal to $n/m$, respectively. Thus, two cell representations, i.e., $C_i$ and $C_{(f,g)}$, are equivalent.

B. Probabilities of Detecting and Not Detecting Plumes

1) Plume Model: In a turbulent flow environment, movements of odor plumes follow a random walk superimposed on the airflow advection, which can be expressed as [18]

$$\dot{\mathbf{X}}(t) = \mathbf{U}(\mathbf{X}, t) + \mathbf{N}(t) \tag{2}$$

where $\mathbf{X} = (x_p, y_p)$ is the odor plume location, $\mathbf{U} = (u_x, u_y)$ is the mean wind velocity, which transports a plume as the whole body (i.e., advection), and $\mathbf{N} = (n_x, n_y)$ denotes the random walk velocity, which stirs filaments inside a plume and changes the plume shape (i.e., diffusion). $\mathbf{N}$ can be modeled as a Gaussian random process with zero mean and $\sigma^2 = (\sigma_x^2, \sigma_y^2)$ variance, where $\sigma_x$ and $\sigma_y$ are strengths of the random walk velocity in the $x$ and $y$ directions, respectively. It should be noted that the position of an odor plume is chiefly determined by the advection $\mathbf{U}$, since the strength of mean wind velocity is much higher than that of the random walk velocity [30].

If the odor source releases a single odor plume at time $t_l$, the location of the odor plume at time $t_k$ ($t_l < t_k$) can be calculated by integrating (2)

$$\mathbf{X}(t_l, t_k) = \mathbf{X}_s(t_l) + \int_{t_l}^{t_k} \mathbf{U}(\mathbf{X}(\tau), \tau)\,d\tau + \int_{t_l}^{t_k} \mathbf{N}(\tau)\,d\tau \tag{3}$$
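
The grid indexing in (1) and the plume model in (2) and (3) can be sketched numerically as follows. This is a minimal illustration under stated assumptions rather than the authors' code: the function names are hypothetical, the mean wind is taken as constant and uniform, and (2) is integrated with a fixed step `dt` using an Euler-Maruyama scheme, so that the accumulated random-walk displacement has variance $(t_k - t_l)\sigma^2$.

```python
import math
import random


def cell_to_fg(i, m):
    """Eq. (1): map a 1-based cell index i to (f, g) for m cells per row."""
    return (i - 1) % m + 1, (i - 1) // m + 1


def fg_to_cell(f, g, m):
    """Inverse mapping i = f + (g - 1) * m."""
    return f + (g - 1) * m


def simulate_plume(source, wind, sigma, n_steps, dt, rng):
    """Integrate Eq. (2) for a single plume released at `source`.

    source: (x, y) release position.
    wind:   (u_x, u_y), assumed constant and uniform (a simplification).
    sigma:  (sigma_x, sigma_y), random-walk strengths.
    rng:    a random.Random instance, passed in for reproducibility.
    Returns the list of plume positions, one per time step.
    """
    x, y = source
    path = [(x, y)]
    for _ in range(n_steps):
        # Advection by the mean wind plus a Gaussian random-walk increment
        # whose standard deviation scales with sqrt(dt).
        x += wind[0] * dt + rng.gauss(0.0, sigma[0] * math.sqrt(dt))
        y += wind[1] * dt + rng.gauss(0.0, sigma[1] * math.sqrt(dt))
        path.append((x, y))
    return path
```

For example, `simulate_plume((0.0, 0.0), (1.0, 0.0), (0.1, 0.1), 100, 0.1, random.Random(0))` would produce one sample path that drifts downwind while spreading laterally; with `sigma = (0.0, 0.0)` the path reduces to pure advection.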


In (3), $\mathbf{X}_s(t_l)$ is the odor source location. If the odor source continuously releases plumes in the time interval $[t_l, t_k]$, assuming that the release rate is $G$ plumes per second, there are $G(t_k - t_l)$ plumes released, and positions of released plumes can be denoted by $\mathbf{P}(t_l, t_k) = [\mathbf{X}(t_l), \mathbf{X}(t_l + d\tau), \mathbf{X}(t_l + 2d\tau), \ldots, \mathbf{X}(t_k)]$, where $d\tau = 1/G$.

2) Single Released Odor Plume: This section presents the probability of detecting a plume in an arbitrary cell given that the source only releases a single plume.

Let $\mathbf{s}(t_l, t_k) = (s_x, s_y) = \int_{t_l}^{t_k} \mathbf{U}(\mathbf{X}(\tau), \tau)\,d\tau$, where $s_x$ and $s_y$ are plume advection distances in the $x$ and $y$ directions, respectively. Let $\mathbf{W}(t_l, t_k) = (w_x, w_y) = \int_{t_l}^{t_k} \mathbf{N}(\tau)\,d\tau$, which is a Gaussian random process with zero mean and $(t_k - t_l)\sigma^2$ variance. Thus, (3) can be rewritten as follows:

$$\mathbf{X}(t_l, t_k) = \mathbf{X}_s(t_l) + \mathbf{s}(t_l, t_k) + \mathbf{W}(t_l, t_k). \tag{4}$$

It is worth mentioning that $\mathbf{s}(t_l, t_k)$ is approximated by integrating the sensed airflow velocities at the robot position from $t_l$ to $t_k$ since the global airflow information is absent: $\mathbf{s}(t_l, t_k) = (s_x, s_y) \approx \sum_{q=l}^{k-1} \mathbf{u}(\mathbf{X}_v(t_q), t_q)\,dt$, where $\mathbf{u}(\mathbf{X}_v(t), t)$ denotes airflow measurements at the location $\mathbf{X}_v(t)$ (i.e., the robot location at time $t$). This approximation will introduce additional errors, but, since the global airflow information is unavailable, this assumption is acceptable.

If the odor plume is propagated to a cell $C_j$ at the time $t_k$, the estimated odor source location $\hat{\mathbf{X}}_s = (x_s, y_s)$ can be obtained by solving (4)

$$\hat{\mathbf{X}}_s(t_l, t_k) = \mathbf{X}(t_k) - \mathbf{s}(t_l, t_k) - \mathbf{W}(t_l, t_k) \tag{5}$$

where $\mathbf{X}(t_k) = (x_j, y_j)$ is the plume location at time $t_k$, which is inside the cell $C_j$. Note that, since $\mathbf{X}(t_k) - \mathbf{s}(t_l, t_k)$ is a constant and $\mathbf{W}(t_l, t_k)$ is a Gaussian random variable with zero mean and $(t_k - t_l)\sigma^2$ variance, $\hat{\mathbf{X}}_s$ is also a Gaussian random variable with $\mathbf{X}(t_k) - \mathbf{s}(t_l, t_k)$ mean and $(t_k - t_l)\sigma^2$ variance. Thus, the probability density functions (PDFs) of $\hat{\mathbf{X}}_s$ in the $x$ and $y$ directions are as follows:

$$f(x_s) = \frac{e^{-\frac{(x_j - s_x - x_s)^2}{2(t_k - t_l)\sigma_x^2}}}{\sqrt{2\pi(t_k - t_l)\sigma_x^2}}, \qquad f(y_s) = \frac{e^{-\frac{(y_j - s_y - y_s)^2}{2(t_k - t_l)\sigma_y^2}}}{\sqrt{2\pi(t_k - t_l)\sigma_y^2}}. \tag{6}$$

Since $x$ and $y$ directions are orthogonal, the joint PDF of $\hat{\mathbf{X}}_s$ is $f(x_s) \times f(y_s)$. Then, the probability of the estimated odor source being located in an arbitrary cell can be calculated by integrating the PDF over positions in that cell.

Let $p_{ij}(t_l, t_k)$ denote the probability of there being an odor source in a cell $C_i$ that released a single odor plume at time $t_l$ given that the odor plume is in the cell $C_j$ at time $t_k$. Thus, $p_{ij}(t_l, t_k)$ can be calculated as follows:

$$\begin{aligned}
p_{ij}(t_l, t_k)
&= \int_{x_s \in C_i} \int_{y_s \in C_i} \frac{e^{-\frac{(x_j - s_x - x_s)^2}{2(t_k - t_l)\sigma_x^2}}\, e^{-\frac{(y_j - s_y - y_s)^2}{2(t_k - t_l)\sigma_y^2}}}{\sqrt{2\pi(t_k - t_l)\sigma_x^2}\,\sqrt{2\pi(t_k - t_l)\sigma_y^2}}\, dx_s\, dy_s \\
&= \int_{x_i - \frac{L_x}{2}}^{x_i + \frac{L_x}{2}} \int_{y_i - \frac{L_y}{2}}^{y_i + \frac{L_y}{2}} \frac{e^{-\frac{(x_j - s_x - x_s)^2}{2(t_k - t_l)\sigma_x^2}}\, e^{-\frac{(y_j - s_y - y_s)^2}{2(t_k - t_l)\sigma_y^2}}}{2\pi(t_k - t_l)\sigma_x\sigma_y}\, dx_s\, dy_s \\
&= \int_{-\frac{L_x}{2}}^{\frac{L_x}{2}} \int_{-\frac{L_y}{2}}^{\frac{L_y}{2}} \frac{e^{-\frac{(x_j - x_i - s_x - x_s)^2}{2(t_k - t_l)\sigma_x^2}}\, e^{-\frac{(y_j - y_i - s_y - y_s)^2}{2(t_k - t_l)\sigma_y^2}}}{2\pi(t_k - t_l)\sigma_x\sigma_y}\, dx_s\, dy_s
\end{aligned} \tag{7}$$

which is a function of the relative positions of the cell $C_j$ (i.e., plume position) and the cell $C_i$ (i.e., possible odor source position), plume advection distances $\mathbf{s}(t_l, t_k)$, and the plume propagation time $t_k - t_l$. In the algorithm implementation, $p_{ij}(t_l, t_k)$ is approximated as

$$p_{ij}(t_l, t_k) = L_x L_y\, \frac{e^{-\frac{(x_j - x_i - s_x)^2}{2(t_k - t_l)\sigma_x^2}}\, e^{-\frac{(y_j - y_i - s_y)^2}{2(t_k - t_l)\sigma_y^2}}}{2\pi(t_k - t_l)\sigma_x\sigma_y} \tag{8}$$

for the calculation efficiency. This approximation will introduce additional errors when the cell size is large (i.e., $L_x \gg \sigma_x$ and $L_y \gg \sigma_y$), but in this work, the cell size is small (i.e., $L_x \approx \sigma_x$ and $L_y \approx \sigma_y$); thus, the produced errors are negligible.

One feature of the olfactory sensing device is that it has trivial false-alarm rates, but high missed-detection rates [18]. To model this mechanism, let $\beta$ denote the probability of the robot successfully detecting plumes given that there are detectable plumes at the chemical sensor position. Thus, the probability of detecting a single released plume in $C_j$ at $t_k$ that was released from $C_i$ at $t_l$ is $\beta p_{ij}(t_l, t_k)$, and the probability of not detecting this plume is $1 - \beta p_{ij}(t_l, t_k)$.

3) Continuous Released Odor Plumes: If the odor source continuously releases odor plumes from $t_l$ to $t_k$, what is the probability of detecting and not detecting odor plumes? To answer this question, the value of $t_l$ should be clarified.

As the plume traveling time $t_k - t_l$ increases, the value of $\mathbf{s}(t_l, t_k)$ grows correspondingly, but $\mathbf{s}(t_l, t_k)$ should not exceed the size of the search area since the robot cannot detect odor plumes outside of it. The value of $t_l$ is initialized as 0 because the airflow velocity is unavailable before the search, and as the search progresses (i.e., $t_k$ increases), the distance between $t_l$ and $t_k$ should be constrained (i.e., $t_k - t_l < h$, where $h$ is the maximum length of recorded flow history) to satisfy the aforementioned restraint. Therefore, $t_l$ is defined as follows:

$$t_l = \max(0, t_k - h + 1). \tag{9}$$

In Section III-B2, the probability of not detecting a plume in a cell $C_j$ at time $t_k$ due to a single odor plume release from a cell $C_i$ at time $t_l$ is calculated as $1 - \beta p_{ij}(t_l, t_k)$. Thus, if all release times within $[t_l, t_k]$ are accounted for, the probability of not


detecting plumes in a cell $C_j$ at time $t_k$ due to the continuous plume release from a cell $C_i$ is

$$\kappa_{ij}(t_l, t_k) = \prod_{t_l}^{t_{k-1}} \left[1 - \beta p_{ij}(t_l, t_k)\right]. \tag{10}$$

Since plume detection and nondetection events are complementary, the probability of detecting plumes under the same condition (i.e., continuous plume release) is $1 - \kappa_{ij}(t_l, t_k)$.

C. Source Mapping

Belief states in a POMDP model are adapted as source estimations, which are used to construct a source probability map. A basic POMDP model can be defined by a tuple $(S, A, \Omega, P, O, R, b_0)$ as follows [31]:
• $S$ is a state space.
• $A$ is an action space.
• $\Omega$ is an observation space.
• $P$ are state transition probabilities between states.
• $O$ are observation probabilities.
• $R$ is the reward function defined on the transitions.
• $b_0$ is an initial probability distribution over states.

As shown in Fig. 3, at each time-step, the agent receives an observation ($o$, $o \in \Omega$) at the current state ($s$, $s \in S$), and after performing an action ($a$, $a \in A$), the agent is transferred to a new state ($s'$, $s' \in S$) according to the state transition probability $P(s'|s, a)$ and receives a new observation ($o'$, $o' \in \Omega$) with the observation probability $O(a, s', o')$ and a reward $R(s, a)$. Since states are hidden to the agent, a probability distribution over states is defined as the belief state $b(s)$, which indicates the probability of the agent being in a particular state $s$, and the initial belief state is $b_0$.

Fig. 3. Basic POMDP model.

To illustrate the proposed source mapping algorithm, the rest of this section presents an approach that adapts elements in a basic POMDP model to the context of an OSL problem.

1) State Space: States in a basic POMDP model are hidden to the agent, i.e., the agent does not know which state it is in. In an OSL problem, the actual odor source location is unknown to the robot. Thus, we define states as the actual odor source location, which can be represented by a length-$M$ ($M$ is the number of cells in the search area) vector of Boolean values indicating whether each cell contains the odor source. If the Boolean value is 1, then the corresponding cell contains the odor source; otherwise, this value is 0. For instance, $s_1 = [1, 0, \ldots, 0]$ indicates that the odor source is in the cell $C_1$, $s_2 = [0, 1, \ldots, 0]$ represents that the odor source is in the cell $C_2$, etc. Since the odor source could be located in an arbitrary cell inside the search area, the state space is represented as $S = \{s_1, s_2, \ldots, s_M\}$.

2) Action Space: The action space defines possible actions that an agent could select. As shown in Fig. 4, at the center cell $C_{(f,g)}$, the robot could select one of eight actions and enter the corresponding cell around it.

Fig. 4. Action space. The robot location is at the center cell. Arrows indicate possible actions that the robot can take.

In this work, an action is represented by the destination cell. For example, $a_2$ in Fig. 4 can be represented as $a_2 = C_{(f,g-1)}$ since the destination cell of this action is $C_{(f,g-1)}$. Thus, the action space can be represented as $A = \{a_1 = C_{(f-1,g-1)}, a_2 = C_{(f,g-1)}, \ldots, a_8 = C_{(f+1,g+1)}\}$.

3) State Transition Probabilities: The location of the odor source is stationary in this work, i.e., the odor source cannot move. Thus, the state transition probability is 1 if the new state is the same as the old state; otherwise, this probability is 0

$$P(s' = s_i \mid s = s_j, a) = \begin{cases} 1, & i = j \\ 0, & i \neq j \end{cases} \tag{11}$$

where $i, j \in [1, M]$.

4) Observation Space and Probabilities: When the robot enters a cell $C_j$, it could or could not detect odor plumes. Thus, two observation states are defined in the observation space $\Omega = \{d, \bar{d}\}$, namely the plume detection event $d$ and the plume nondetection event $\bar{d}$. A fixed plume concentration threshold is adopted to identify two events, i.e., a plume detection event is confirmed when the sensed concentration is higher than the threshold; otherwise, a plume nondetection event is confirmed.

When the robot enters a cell $C_j$, the probability of detecting continuous released plumes is $1 - \kappa_{ij}(t_l, t_k)$ and the probability of not detecting these plumes is $\kappa_{ij}(t_l, t_k)$ as defined in Section III-B3. Thus, the observation probability $O(a, s', o')$ is defined as follows:

$$O(a = C_j, s' = s_i, o') = \begin{cases} 1 - \kappa_{ij}(t_l, t_k), & o' = d \\ \kappa_{ij}(t_l, t_k), & o' = \bar{d} \end{cases} \tag{12}$$

where $i, j \in [1, M]$.

5) Belief States: In a basic POMDP model, after taking an action $a$ and transferring to a new state $s'$, the belief state of the new state $b(s')$ is updated based on the old belief state $b(s)$, the observation probability $O$, and the state transition probability


$P$, which is represented as [32]

$$b(s') = \frac{O(a, s', o') \sum_{s \in S} P(s'|s, a)\, b(s)}{\sum_{s \in S} \sum_{s' \in S} O(a, s', o')\, P(s'|s, a)\, b(s)}. \tag{13}$$

In an OSL problem, the belief state can be interpreted as the probability of the robot believing that there is an odor source in a cell. Based on the defined observation probability (12) and the state transition probability (11), (13) can be rewritten as follows:

$$b(s' = s_i) = \begin{cases} \dfrac{(1 - \kappa_{ij}(t_l, t_k))\, b(s_i)}{\sum_{i=1}^{M} (1 - \kappa_{ij}(t_l, t_k))\, b(s_i)}, & o' = d \\[2ex] \dfrac{\kappa_{ij}(t_l, t_k)\, b(s_i)}{\sum_{i=1}^{M} \kappa_{ij}(t_l, t_k)\, b(s_i)}, & o' = \bar{d}. \end{cases} \tag{14}$$

The above equations iteratively update belief states depending on plume detection and nondetection events. The initial belief state $b_0$ is defined as $1/M$, since the prior information about the odor source location is unavailable to the robot before it starts the search. However, if information regarding the source location is available prior to the search, it could be exploited through an appropriate distribution of $b_0$ that reflects this prior knowledge.

In summary, by calculating belief states over all states, a source probability map $\mathbf{b}$ that estimates the source location is obtained as follows:

$$\mathbf{b} = [b(s_1), b(s_2), \ldots, b(s_M)]. \tag{15}$$

D. Plume Mapping

The plume mapping algorithm produces a plume distribution map, which indicates possible plume propagation areas. With the produced source probability map and the recorded airflow history, an HMM-based plume mapping algorithm [20] is presented in this section.

Let $\alpha_j(t_0, t_k)$ denote the probability that a cell $C_j$ contains the detectable odor plume at time $t_k$ due to the continuous plume release by the source starting at $t_0$, where $t_0$ is the initial time that the robot records airflow measurements. Denote

$$\boldsymbol{\alpha}(t_0, t_k) = [\alpha_1(t_0, t_k), \alpha_2(t_0, t_k), \ldots, \alpha_M(t_0, t_k)] \tag{16}$$

as the vector storing this variable for each cell, which is a plume distribution map at the current time $t_k$.

Introduce the variable $\bar{\alpha}_j(t_0, t_k)$ representing the probability that a cell $C_j$ contains a detectable odor plume at $t_k$ due to a single plume release at time $t_0$. Define

$$\bar{\boldsymbol{\alpha}}(t_0, t_k) = [\bar{\alpha}_1(t_0, t_k), \bar{\alpha}_2(t_0, t_k), \ldots, \bar{\alpha}_M(t_0, t_k)] \tag{17}$$

as the vector form of $\bar{\alpha}_j(t_0, t_k)$. At $t_0$, the plume propagation has not occurred yet and plumes are at the odor source location; therefore, $\bar{\boldsymbol{\alpha}}(t_0, t_0) = \mathbf{b}$ since the actual odor source location is unknown. To find $\bar{\alpha}_j(t_0, t_1)$, which is the probability of a cell $C_j$ containing plumes after one time step, the plume transitions from all other cells to the cell $C_j$ must be considered, i.e.,

$$\bar{\alpha}_j(t_0, t_1) = \sum_{k=1}^{M} \bar{\alpha}_k(t_0, t_0)\, a_{kj}(t_0) \tag{18}$$

where $a_{kj}(t_0)$ denotes the probability of the one step transition of odor plumes from a cell $C_k$ at time $t_0$ to another cell $C_j$ at time $t_1$, which can be obtained from the airflow history [20]. In addition, (18) can be rewritten in a vector notation

$$\bar{\boldsymbol{\alpha}}(t_0, t_1) = \bar{\boldsymbol{\alpha}}(t_0, t_0)\, \bar{\mathbf{A}}(t_0) \tag{19}$$

where $\bar{\mathbf{A}}(t_0) = [a_{kj}(t_0)] \in \mathbb{R}^{M \times M}$ is the matrix form of $a_{kj}(t_0)$. Moreover, for an arbitrary plume propagation period, i.e., $(t_k - t_0)$, $\bar{\boldsymbol{\alpha}}(t_0, t_k)$ can be calculated as follows:

$$\bar{\boldsymbol{\alpha}}(t_0, t_k) = \begin{cases} \mathbf{0}, & \text{for } t_k < t_0 \\ \mathbf{b}\mathbf{I}, & \text{for } t_k = t_0 \\ \mathbf{b}\boldsymbol{\Phi}(t_0, t_k), & \text{for } t_k > t_0 \end{cases} \tag{20}$$

where $\mathbf{I}$ is the identity matrix with the size of $M \times M$ and $\boldsymbol{\Phi}(t_0, t_{k+1}) = \prod_{q=0}^{k} \bar{\mathbf{A}}(t_q)$.

For the continuous plume release scenario, $\boldsymbol{\alpha}(t_0, t_k)$ can be derived from the single release case $\bar{\boldsymbol{\alpha}}(t_0, t_k)$ by considering all release times from $t_0$ to $t_k$

$$\boldsymbol{\alpha}(t_0, t_k) = \frac{1}{k+1} \sum_{q=0}^{k} \bar{\boldsymbol{\alpha}}(t_q, t_k) \tag{21}$$

where $1/(k+1)$ is the normalization factor to maintain $\|\boldsymbol{\alpha}(t_0, t_k)\|_1 = 1$. With (20), (21) can be further reduced to

$$\begin{aligned}
\boldsymbol{\alpha}(t_0, t_k)
&= \frac{1}{k+1} \sum_{q=0}^{k} \mathbf{b}\boldsymbol{\Phi}(t_q, t_k) \\
&= \frac{\mathbf{b}}{k+1} \left[ \boldsymbol{\Phi}(t_k, t_k) + \sum_{q=0}^{k-1} \boldsymbol{\Phi}(t_q, t_k) \right] \\
&= \frac{\mathbf{b}}{k+1} \left[ \mathbf{I} + \sum_{q=0}^{k-1} \boldsymbol{\Phi}(t_q, t_k) \right] \\
&= \mathbf{b}\boldsymbol{\Psi}(t_0, t_k),
\end{aligned} \tag{22}$$

where $\boldsymbol{\Psi}(t_0, t_k) = \frac{1}{k+1}\left[\mathbf{I} + \sum_{q=0}^{k-1} \boldsymbol{\Phi}(t_q, t_k)\right]$, which can be iteratively updated as follows:

$$\boldsymbol{\Psi}(t_0, t_k) = \frac{1}{k+1} \left[ \mathbf{I} + k\,\boldsymbol{\Psi}(t_0, t_{k-1})\, \bar{\mathbf{A}}(t_{k-1}) \right]. \tag{23}$$

Note that, since $\bar{\mathbf{A}}(t_{k-1})$ relates to the latest airflow measurement, (23) encapsulates the airflow history over the entire search time (i.e., from $t_0$ to $t_k$).

In summary, when the source probability map $\mathbf{b}$ is available at the current time $t_k$, a plume propagation map $\boldsymbol{\alpha}(t_0, t_k)$ can be obtained by (22), and if $\mathbf{b}$ is updated in the next time step, by updating $\boldsymbol{\Psi}$ with the latest airflow measurements, a renewed plume distribution map based on the new source probability map can be obtained.

IV. PLANNING

A. Generate Reward Functions With Fuzzy Inference

After the source probability map $\mathbf{b}$ and the plume distribution map $\boldsymbol{\alpha}(t_0, t_k)$ are obtained, information from two maps is fused and assigned to reward functions. The information provided by two maps is complementary for determining robot behaviors,


Fig. 5. Structure of the proposed fuzzy controller. ρ and δT are sensed plume
concentration and plume non-detection period, respectively, and λ is the fusion
coefficient.

i.e., the robot either moves to the estimated source location or to the possible plume areas. Thus, a weighted superposition pattern is adopted to combine the two maps, and values from the two maps are normalized with min–max normalization before the combination.

Define four constants, $b_{\max}$, $b_{\min}$, $\alpha_{\max}$, and $\alpha_{\min}$, as the maximal and minimal values in $b$ and $\alpha(t_0, t_k)$, respectively. These constants can be determined before computing reward functions. For an action that moves the robot into a cell $C_j$, the reward function is

$$R(s, a = C_j) = \lambda\,\frac{b(s_j) - b_{\min}}{b_{\max} - b_{\min}} + (1 - \lambda)\,\frac{\alpha_j(t_0, t_k) - \alpha_{\min}}{\alpha_{\max} - \alpha_{\min}} \tag{24}$$

where λ ∈ [0, 1] is the fusion coefficient that controls the balance of the two maps. When λ > 0.5, the source probability map dominates the reward functions, which results in the robot surging to the estimated source location, i.e., exploitation. Conversely, when λ < 0.5, the robot moves to possible plume areas since the plume distribution map outweighs the source probability map in reward functions, i.e., exploration. An ideal value of λ should be adaptive for different search circumstances to generate the optimal search objective.

Fig. 6. Fuzzy sets and membership functions of antecedents and consequent. (a) Sensed plume concentration, ρ. (b) Plume nondetection period, δT. (c) Fusion coefficient, λ.
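As a rough illustration of how the recursion (23), the plume map (22), and the weighted superposition (24) fit together, the following Python sketch may help. It is not the authors' implementation: the function names, the three-cell toy transition matrix, and the example source map are hypothetical, and the transition matrix is assumed to be row-stochastic with entries $a_{kj}$.

```python
import numpy as np


def update_psi(psi_prev, a_bar_prev, k):
    """Recursion (23): Psi(t0, tk) from Psi(t0, tk-1) and A_bar(tk-1), for k >= 1."""
    m = psi_prev.shape[0]
    return (np.eye(m) + k * psi_prev @ a_bar_prev) / (k + 1)


def min_max(x):
    """Min-max normalization to [0, 1]; a constant map is sent to all zeros."""
    span = x.max() - x.min()
    return np.zeros_like(x) if span == 0 else (x - x.min()) / span


def fuse_rewards(b, alpha, lam):
    """Reward (24) for entering each cell: weighted sum of the two normalized maps."""
    return lam * min_max(b) + (1.0 - lam) * min_max(alpha)


# Toy setting (hypothetical): three cells in a row, flow carries plumes 0 -> 1 -> 2.
a_bar = np.array([[0.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0],
                  [0.0, 0.0, 1.0]])
b = np.array([0.7, 0.2, 0.1])        # source probability map (made-up values)

psi = np.eye(3)                      # Psi(t0, t0) = I
for k in (1, 2):
    psi = update_psi(psi, a_bar, k)  # the same a_bar is reused each step for brevity
alpha = b @ psi                      # plume distribution map, as in (22)

reward_exploit = fuse_rewards(b, alpha, lam=0.9)  # favours the likely source cell
reward_explore = fuse_rewards(b, alpha, lam=0.1)  # favours the likely plume cell
```

In this toy case a large λ places the highest reward on the cell with the largest source probability, while a small λ shifts it to the downstream cell where plumes are expected to accumulate.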
The primary hurdle in determining the value of λ is the vagueness of search circumstances. A critical question is under what conditions the robot should choose exploration or exploitation as its search objective. Attempts to use mathematical methods to quantitatively analyze differences of search conditions and assign a precise λ for different search circumstances are hard to implement due to uncertainties of the source location and the search environment.

Inspired by implementations of fuzzy theory in the fields of decision making [33] and data classification [34], which successfully handle problems with vagueness and uncertainties, a fuzzy inference approach is employed. In fuzzy theory, vague variables and environments can be handled in a deterministic manner via linguistic descriptions and rules. By analyzing sensor measurements, such as plume concentrations, search circumstances are expected to be identified. Then, the corresponding λ that dynamically balances the exploitation and exploration of the odor source information can be generated based on defined fuzzy rules.

As shown in Fig. 5, procedures of the proposed fuzzy controller include fuzzification, defining fuzzy rules, and defuzzification [35].

1) Fuzzification: Fuzzification is the process that changes real scalar values of antecedents and consequent (i.e., inputs and outputs) into fuzzy values, which express the degree to which scalar values belong to a fuzzy set.

In this work, inputs are utilized to conjecture the distance from the robot to the odor source and the output is the value of λ. If the robot is close to the source, it surges toward the source (λ > 0.5); otherwise, it tends to detect plumes to gather more information (λ < 0.5). The sensed plume concentration at the robot location $\rho(X_v(t), t)$ (for simplicity, we use ρ to denote plume concentration measurements in the rest of this article) is set as an input, and due to the existence of local concentration maxima along the plume trajectory [36], the plume nondetection period δT, i.e., the time interval between two detection events, is added to the inputs. Since the positions of local concentration maxima are time varying in a turbulent flow environment [37], if the robot senses consecutive high odor concentrations in a short period, it is very likely that the robot is near the source.

Fig. 6 shows plots of membership functions of inputs and the output, which are determined based on the distribution


of measured data from experiments. As shown in Fig. 6, all membership functions are triangular. Three fuzzy sets have been defined to cover the universe of discourse of the sensed plume concentration ρ, namely low (L), medium (M), and high (H). The universe of discourse of the plume nondetection period δT is also covered by three fuzzy sets, namely short (Sh), averaged (Av), and long (Lo). For the output λ, five fuzzy sets, namely very small (VS), small (S), middle (MI), large (L), and very large (VL), are defined to cover its universe of discourse.

2) Fuzzy Rules: Fuzzy rules in the fuzzy inference theory are presented in an “IF–THEN” format and determine search strategies of the robot. In this work, fuzzy rules are designed based on moth odor searching behaviors [38]. As mentioned, previous researchers [16], [17], [39] have summarized these behaviors into a “surge/casting” model and demonstrated the validity of implementing this model on robots in OSL problems. Borrowing this idea, we want the robot to explore if plumes are absent (like the moth’s casting behavior) and to exploit when the robot is in plumes (like the moth’s surge behavior). To achieve this mechanism, the distance from the robot to the odor source is estimated and monitored: if the robot is far from the source, the robot inclines to find plumes, i.e., exploration; otherwise, the robot tends to search for the source, i.e., exploitation.

The tendency of changing λ is as follows: when ρ is high and δT is short, the robot is very likely close to the odor source; thus, λ is large. On the other hand, when ρ is low and δT is long, the robot is probably far from the source; thus, λ is small. In an “IF–THEN” format, the above rules are represented as follows:
F1 = {IF ρ is H AND δT is Sh, THEN λ is VL}
F2 = {IF ρ is L AND δT is Lo, THEN λ is VS}.
Enumerating all possible combinations of antecedents and the corresponding consequents yields a rule table (see Table I).

TABLE I
LIST OF FUZZY RULES

3) Defuzzification: The centroid method [40] is chosen as the defuzzification algorithm, which can be expressed as follows:

$$U_0 = \frac{\sum_{i=1}^{n} U_i \cdot \mu(U_i)}{\sum_{i=1}^{n} \mu(U_i)} \tag{25}$$

where $U_0$ is the output (i.e., the value of λ), $i \in [1, 9]$ is the index of rules, $\mu(U_i)$ is the truth value of the result membership function for the $i$th rule, and $U_i$ is the value where the result membership function is maximum over the output variable fuzzy set range.

Fig. 7 presents the result of the proposed fuzzy controller. In general, a small ρ and a long δT produce a small λ value that emphasizes the plume mapping information in reward functions, and the opposite combination of ρ and δT (i.e., a large ρ and a short δT) provides a large λ that prioritizes the source mapping information in reward functions.

Fig. 7. Result of the proposed fuzzy controller. In the plot, the horizontal axes are two inputs, the sensed odor concentration ρ and the plume non-detection period δT, and the vertical axis is the output, the fusion coefficient λ.

B. Solve for the Optimal Policy

After reward functions are determined, a search route is generated in the planning procedure. Given the current reward functions, we adopt a value iteration method (see Algorithm 1) to quickly determine the optimal policy, i.e., the search route. The motivation for using this method is to reduce the processing time, which allows the robot to respond promptly to new plume observations. Solving the POMDP [41] directly could also yield the search route, but, considering the large size of the hidden state space defined in our POMDP, this approach becomes time-consuming and intractable. The ability to solve for the search path quickly is one of our main concerns, since the ultimate goal of this article is implementing this algorithm on a mobile robot, which has limited onboard computational resources.

Algorithm 1: Value Iteration Based Planning Algorithm.
  Initialize value functions $V(C_i) = 0$, $i \in [1, M]$
  Calculate reward functions $R(s, a)$ for all cells based on (24)
  Set the convergence tolerance ε and initialize Δ = ∞
  while Δ ≥ ε do
    Δ = 0
    for $i \in [1, M]$ do
      $v = V(C_i)$
      $V(C_i) = \max_{a \in A} (R(s, a) + \gamma V(a))$
      Δ = max(Δ, $|v − V(C_i)|$)
    end for
  end while
  Generate the optimal policy $\pi^* = \arg\max V(C_i)$

As shown in Algorithm 1, value functions of all cells are initialized as 0. If the robot is currently located in a cell $C_i$, it could choose one of eight actions (see Fig. 4) and enter the corresponding cell. Note that, based on the reliable maneuverability of the ground robot, it is assumed that the transition of


robot positions is deterministic, i.e., after taking an action, the robot can correctly enter the corresponding cell. Based on the Bellman optimality equation [27], the value function of a cell $C_i$ can be calculated by

$$V(C_i) = \max_{a \in A}\,\big(R(s, a) + \gamma V(a)\big) \tag{26}$$

where γ ∈ [0, 1] is the discounting factor that penalizes future rewards. In this work, the larger γ is, the broader the region the robot considers in planning the search route. In implementations, we set γ to be 0.9.
Fig. 8. Search area in the simulation program.

A convergence tolerance ε is defined to check whether or not value functions converge. The maximal update of value functions (i.e., Δ in Algorithm 1) is compared with ε, and value functions are considered as converged if Δ < ε; otherwise, value functions keep updating. To balance the tradeoff between the navigation performance and the processing time, we set ε as $10^{-6}$ in experiments. After value functions converge, the optimal policy is obtained by selecting the optimal action with the maximal value function. Thus, a series of optimal actions can be obtained between the robot's current position and the maximal reward location, which is a search route that guides the robot toward the position with the most odor source information.
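The planning step can be sketched in Python as follows. This is a minimal illustration of value iteration per (26) with deterministic moves and a greedy route read-out, not the authors' code: the function names, the 5 × 5 toy reward map, and the treatment of grid borders (moves that leave the grid are simply dropped) are assumptions, and the eight moves are assumed to correspond to the eight actions of Fig. 4. The grid is assumed to be at least 2 × 2 so that every cell has a neighbour.

```python
import numpy as np

# Eight neighbouring moves, assumed here to correspond to the eight actions of Fig. 4.
MOVES = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def neighbours(cell, shape):
    """Cells reachable in one move; moves that leave the grid are dropped."""
    r, c = cell
    return [(r + dr, c + dc) for dr, dc in MOVES
            if 0 <= r + dr < shape[0] and 0 <= c + dc < shape[1]]


def value_iteration(reward, gamma=0.9, eps=1e-6):
    """In-place sweeps of (26) until the largest update Delta drops below eps."""
    values = np.zeros(reward.shape)
    while True:
        delta = 0.0
        for cell in np.ndindex(*reward.shape):
            old = values[cell]
            values[cell] = max(reward[n] + gamma * values[n]
                               for n in neighbours(cell, reward.shape))
            delta = max(delta, abs(old - values[cell]))
        if delta < eps:
            return values


def greedy_route(values, reward, start, gamma=0.9):
    """Follow the best action from start until the maximal-reward cell is reached."""
    goal = np.unravel_index(np.argmax(reward), reward.shape)
    goal = (int(goal[0]), int(goal[1]))
    route = [start]
    # The length guard stops the walk if the greedy actions never reach the goal.
    while route[-1] != goal and len(route) < reward.size:
        route.append(max(neighbours(route[-1], reward.shape),
                         key=lambda n: reward[n] + gamma * values[n]))
    return route


# Toy reward map (hypothetical): a single rewarding cell in the top-right corner.
reward = np.zeros((5, 5))
reward[0, 4] = 1.0
values = value_iteration(reward)
route = greedy_route(values, reward, start=(4, 0))
```

With this toy map the route runs diagonally from the start corner to the rewarding cell. On a full reward map from (24), the greedy actions need not head straight for the single maximal cell, which is why the route loop carries a length guard.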
Fig. 9. Airflow fields and corresponding odor plume trajectories in the simulation with different environmental settings. (a) Laminar flows, $U_0 = (1, 0)$ m/s and ς = 0. (b) Turbulent flows, $U_0 = (3, 0.5)$ m/s and ς = 30.

TABLE II
VALUES OF PARAMETERS IN GAUSSIAN NOISES IN SENSOR MEASUREMENTS

It should be noted that this optimal policy is obtained based on the current reward functions and is not permanent. In every time step, new observations, i.e., plume detection and nondetection events, update reward functions via (24), and a new optimal policy is determined via Algorithm 1. The new policy overwrites the old one, which allows the robot to promptly adjust its search targets (exploration or exploitation) and trajectories to fit new observations. In general, the overall search trajectory is a combination of a sequence of optimal policies generated from different reward functions.
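Each replanning step needs a fresh fusion coefficient λ from the fuzzy controller of Section IV-A. The Python sketch below illustrates one way such a controller could be put together, with triangular membership functions, rule firing strengths, and the weighted average of (25). Most numbers in it are placeholders: the membership breakpoints, the input ranges, the output peak positions, the use of min for AND, and six of the nine rules are assumptions, since Fig. 6 and Table I are not reproduced in the text. Only the rules F1 and F2 and the low ρ, short δT case reported in the Group 1 test come from the article.

```python
def tri(x, left, peak, right):
    """Triangular membership; left == peak or peak == right gives an edge set."""
    if x <= left or x >= right:
        return 1.0 if x == peak else 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# ASSUMED breakpoints (normalized concentration, seconds); the real ones are in Fig. 6.
RHO_SETS = {"L": (0.0, 0.0, 0.5), "M": (0.0, 0.5, 1.0), "H": (0.5, 1.0, 1.0)}
DT_SETS = {"Sh": (0.0, 0.0, 10.0), "Av": (0.0, 10.0, 20.0), "Lo": (10.0, 20.0, 20.0)}
# ASSUMED peak positions U_i of the five output sets of lambda.
LAMBDA_PEAK = {"VS": 0.0, "S": 0.25, "MI": 0.5, "L": 0.75, "VL": 1.0}

# (H, Sh) -> VL and (L, Lo) -> VS are rules F1 and F2; (L, Sh) -> MI matches the
# Group 1 test. The remaining six entries are hypothetical placeholders.
RULES = {
    ("L", "Sh"): "MI", ("L", "Av"): "S", ("L", "Lo"): "VS",
    ("M", "Sh"): "L", ("M", "Av"): "MI", ("M", "Lo"): "S",
    ("H", "Sh"): "VL", ("H", "Av"): "L", ("H", "Lo"): "MI",
}


def fusion_coefficient(rho, dt):
    """Weighted average (25) of output peaks, weighted by rule firing strengths."""
    rho = min(max(rho, 0.0), 1.0)    # clamp inputs to the assumed ranges
    dt = min(max(dt, 0.0), 20.0)
    num = den = 0.0
    for (rho_set, dt_set), out_set in RULES.items():
        # AND is taken as min here, which is an assumption of this sketch.
        strength = min(tri(rho, *RHO_SETS[rho_set]), tri(dt, *DT_SETS[dt_set]))
        num += LAMBDA_PEAK[out_set] * strength
        den += strength
    return num / den
```

Because the assumed input sets overlap so that memberships sum to one, at least one rule always fires and the denominator stays positive. The returned λ can then be passed to the reward fusion in (24) before Algorithm 1 is rerun.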

V. EXPERIMENTS

A. Simulation Setup

1) Simulated Environment: The proposed olfactory-based navigation algorithm is evaluated in a simulation program that was designed based on [6]. In this simulation program, time-varying plume trajectories in different flow environments can be customized and created. Other researchers, such as [18], [20], and [42], also employed this simulator as the evaluation tool for their works.

Fig. 8 shows the simulated search area, where the size is 100 × 100 m². Over the search area, a grid is constructed with 40 × 25 cells in the x and y directions, respectively. An odor source is located at (20, 0) m (at the cell $C_{332}$) and releases 10 plumes per second. Released plumes form a circular plume trajectory as plotted by a grey-scale patchy trail. Local airflow vectors are presented by arrows in the background, where the tail of the arrow points to the airflow direction and the length of an arrow indicates the strength of airflow velocity. In the simulator, airflows are calculated from time-varying boundary conditions that are generated by a mean flow ($U_0$) and Gaussian white noises (zero mean and ς variance). Thus, by altering values of boundary condition variables (i.e., $U_0$ and ς), varying amplitudes of airflow fields can be created.

Fig. 9 shows snapshots of two simulated airflow fields. In the left diagram [see Fig. 9(a)], $U_0 = (1, 0)$ m/s and ς = 0, a laminar airflow field is created, and in the right diagram [see Fig. 9(b)], $U_0 = (3, 0.5)$ m/s and ς = 30, a turbulent airflow field is produced.

2) Vehicle Assumptions: Compared to the large scale of the search area, the size of the robot is negligible. Thus, the robot is approximated as a single point in the simulation. It is assumed that the robot is equipped with a chemical sensor, an anemometer, and a positioning sensor, which measure plume concentrations, wind speeds and directions in the inertial frame, and the robot position in the inertial frame, respectively. Measurements of all sensors are corrupted with Gaussian white noises to imitate real-world applications, where parameters of noises are listed in Table II (mmpv: million molecules per cm³). The proposed olfactory-based navigation algorithm is operated on an onboard computer to process sensor readings and produce the target heading θ and


velocity v, which are limited to the ranges [−180°, 180°] and [0.6, 1] m/s, respectively. During the search, the robot follows heading and velocity commands to find the odor source.
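The command limits above can be enforced with a small helper, sketched here in Python. The function name and the choice to wrap (rather than clip) the heading are assumptions of this sketch; note that the wrap maps a heading of exactly 180° to −180°, which is the same direction.

```python
def limit_command(theta_deg, v, v_min=0.6, v_max=1.0):
    """Wrap a heading into [-180, 180) degrees and clip the speed to [v_min, v_max]."""
    theta = (theta_deg + 180.0) % 360.0 - 180.0
    speed = min(max(v, v_min), v_max)
    return theta, speed
```

For instance, a raw command of 190° at 1.5 m/s becomes −170° at 1 m/s under these limits.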
3) Experiment Designs: To evaluate the performance of the
proposed navigation method, around 60 tests have been con-
ducted on the simulation program.
These tests can be separated into three groups. In the first group of tests, the proposed source mapping, plume mapping, and fusion algorithms are implemented in a laminar flow environment to verify their validity. Source and plume estimations
are presented and compared with actual source and plume loca-
tions. Tests in the second group are carried out to investigate the
effectiveness of implementing the proposed navigation method
in a turbulent flow environment. Results are compared with a
moth-inspired method [16] and a Bayesian inference method
[18], which are typical plume tracing approaches in categories
of bio-inspired and engineering-based methods, respectively. In
the last group of tests, the robustness of the proposed naviga-
tion method is evaluated. Various search conditions, including
varying initial search positions and airflow fields, are designed.
Similar to the second group of tests, implementation results of
the proposed navigation method are compared with two typical
plume tracing methods.

B. Group 1: Implementation in a Laminar Flow Environment

This section demonstrates the results of implementing the proposed source mapping (see Section III-C5), plume mapping (see Section III-D), and fusion algorithms (see Section IV) in the simulation. The robot starts the OSL task at (60, −40) m (or in the cell $C_{64}$) in a laminar flow environment, where $U_0 = (2, 0)$ m/s and ς = 5.

Fig. 10 shows results of the above algorithms after the robot detects odor plumes for the first time. To visualize algorithm results, cells in the search area are painted with various colors, where darker cells have higher values (red: largest, white: lowest). Depending on the implemented algorithm, the value of a cell could be the probability of containing the source (i.e., the result of source mapping), the probability of carrying plumes (i.e., the result of plume mapping), or the reward function (i.e., the result of the fusion algorithm).

For the source mapping algorithm, it can be observed in Fig. 10(a) that possible source locations are narrowed to upflow areas of the plume detection location, and as shown in Fig. 10(b), the cell with the maximal probability of containing the odor source is close to the actual odor source location. For the plume mapping algorithm, Fig. 10(c) and (d) illustrate plume estimations and the plume distribution map, respectively. As shown in these two diagrams, plume estimations correctly overlap with the actual plume trajectory, and the cell with the peak probability of containing plumes is located at the upflow area, which is a reasonable estimation of plume propagation areas.

For the fusion algorithm, due to the low sensed plume concentration ρ and the short plume nondetection period δT, the fusion coefficient λ is middle according to the defined fuzzy rules. Fig. 10(e) presents the distribution of reward functions over the search area. It can be seen that reward functions cover the plume trajectory, and the maximal reward location (i.e., the target of the path planning algorithm), as shown in Fig. 10(f), is at the upflow area of the last plume detection location. Test results reveal that reward functions are instructive for the robot to collect more odor source information, since the robot is expected to detect more plumes at the maximal reward location.

Fig. 10. Results of the source mapping, plume mapping, and fusion algorithms after the robot detects plumes for the first time. In the left column of diagrams [i.e., (a), (c), and (e)], cells in the search area are painted with different colors to reflect various values of algorithm results. In the right column of diagrams [i.e., (b), (d), and (f)], the horizontal plane is the grid that covers the search area, and the star mark in the horizontal plane indicates the actual odor source location. (a) Source estimations. (b) Source probability map. (c) Plume estimations. (d) Plume distribution map. (e) Reward functions. (f) The plot of reward functions.

C. Group 2: Implementation in a Turbulent Flow Environment

In this test, the proposed navigation method is implemented in an environment with turbulent flows ($U_0 = (3, 0.5)$ m/s and ς = 30). Fig. 11 shows reward functions over the search area at different times in an OSL task and the plot of the fusion coefficient λ. Similar to the first group of tests, cells in the search area are painted with different colors, where the darker color indicates the higher value in reward functions and vice versa. The robot starts at the cell $C_{64}$ and adopts a “zigzag” search trajectory at the beginning of the OSL task, where the source mapping algorithm is activated simultaneously. As shown in Fig. 11(a), the source mapping algorithm excludes possible source locations from the upflow areas of the robot (white areas), since the robot does not detect plumes. At 55 s, the robot detects


Fig. 11. Robot search trajectories and reward functions at different time steps in an environment with turbulent flows. The plot of the fusion coefficient λ versus
the search time t is presented at the center of the diagram, where cross marks indicate plume detection events. Diagrams around the center plot are robot trajectories
and reward functions at different time steps. For each of these diagrams, the robot trajectory is represented by the trail of dark arrows; the grey-scale patchy trail
in the middle of the background indicates the simulated plume trajectory; cells are painted with colors according to their reward values, where darker cells have
higher reward values (red: largest, white: smallest).

plumes for the first time, and the plume mapping and fusion algorithms are activated to generate reward functions.

From 55 to 172 s, the robot is in the exploration state, where the robot is encouraged to detect plumes and gather odor source information. Specifically, it can be observed in the plot of λ that the value of λ fluctuates between 0 and 0.5 due to the low sensed odor concentration ρ and the long plume nondetection period δT. As a result, plume estimations outweigh source estimations in reward functions, which drives the robot to seek plumes. Note that, at 96 and 97 s, λ rises to 0.75 but quickly falls to 0.5. This is because the robot senses a local concentration maximum at 96 s, but the successive sensed concentration is not as high as the previous one, which contributes to the drop of λ. After successively detecting high concentration plumes at 175 s, the robot is in the exploitation state. At 175 s, λ rises to 0.83, which indicates that the robot is near the source, and the robot surges toward the estimated source location. At 200 s, reward functions converge to a single cell $C_{292}$ [i.e., the red cell in Fig. 11(h)], which overlaps the actual odor source location, and the robot successfully finds the odor source location.

A moth-inspired method [16] and a Bayesian inference method [18] are implemented in the same environment for comparison with the proposed method. Fig. 12 shows search trajectories, and Table III compares search times and travel distances of the three navigation methods. As shown in Fig. 12(a), the moth-inspired method fails to find the odor source within the time limit (500 s). The primary reason is that the plume trajectory alters rapidly in a turbulent flow environment; thus, the robot can barely stay in plumes and surge upflow to seek the odor source. For the Bayesian inference method, it can be observed in Fig. 12(b) that due to the lack of plume estimations, the robot constantly circulates and tries to detect new plumes after the second plume detection event (i.e., the middle cross on the search trajectory). As a result, the robot spends a lot of time recovering from plume nondetection events. By contrast, the proposed navigation method achieves the best performance with the shortest search time and travel distance. Compared to the search trajectory of the Bayesian inference method, the robot implemented with the proposed navigation method detects more plumes, and after several plume detection events, the robot surges toward the estimated source location and eventually finds the odor source. Results in this test verify the validity of implementing the proposed method in a turbulent flow environment.

TABLE III
SEARCH TIME AND TRAVEL DISTANCE OF THREE NAVIGATION METHODS IN A TURBULENT FLOW ENVIRONMENT
-: Fail to locate the source within 500 s.


Fig. 12. Search trajectories of three navigation methods in a turbulent flow environment. (a) Moth-inspired method. (b) Bayesian-based method. (c) Proposed method.

D. Group 3: Varying Search Conditions

In this test, the robustness of the proposed navigation method in varying search conditions is investigated. Two scenarios have been designed in this test, which are listed in the following.
1) Scenario 1: the robot starts an OSL task at different initial positions in a turbulent flow environment ($U_0 = (2, 0)$ m/s and ς = 15).
2) Scenario 2: the robot starts at the same initial position, but airflow fields are varying ($u_x \in [1, 3]$ m/s, $u_y \in [-0.5, 0.5]$ m/s, and ς ∈ [10, 30]).

1) Results of Scenario 1: Six tests have been conducted in Scenario 1, and Fig. 13 presents search trajectories of all tests. It can be observed in Fig. 13 that all search trajectories terminate at the actual source location, which is marked by a star, i.e., the robot correctly finds the odor source in all tests, which demonstrates the validity of the proposed navigation method with varying initial search positions.

Fig. 13. Search trajectories of the proposed navigation method at different initial positions. The robot starts an OSL task from (a) (45, 50) m. (b) (5, −30) m. (c) (30, 30) m. (d) (55, 20) m. (e) (70, 45) m. (f) (70, −45) m.

2) Results of Scenario 2: In order to evaluate the performance of the proposed navigation method, the moth-inspired method and the Bayesian-based method are also implemented and compared in this test. For each navigation method, the test is repeated 15 times in environments with varying airflow conditions. Table IV presents airflow conditions of the 15 tests and search times of the three navigation methods in the corresponding environment, and Table V lists statistical results of all tests. It can be seen from Table IV that the moth-inspired method rarely succeeds in finding the odor source in turbulent flow environments. Compared to the Bayesian inference method, the robot with the proposed method achieves a higher success rate (100% vs. 80%) and a shorter averaged search time (199.5 s vs. 299.1 s), which illustrates the effectiveness of the proposed method in varying airflow environments.

E. Discussion

Results in the above tests reveal the effectiveness of the proposed olfactory-based navigation method. However, it is worth mentioning that a limitation of our method is the lack of global wind measurements. For this reason, as mentioned in Section III-B2, wind measurements at robot positions are


utilized to estimate plume advection distances. This approximation introduces additional errors if the robot is in a highly turbulent flow environment, i.e., wind directions and velocities have a large variance in space. It can be observed in Table IV that the search time grows significantly if wind fields are highly turbulent (e.g., Tests 4, 7, and 11). This issue could be alleviated with a multiagent search algorithm. By employing multiple robots, wind information at different locations is obtained, and a comprehensive wind map over the search area could be derived. The design and implementation of the multiagent search algorithm is one of our prospective research directions.

TABLE IV
ENVIRONMENTAL SETTINGS AND SEARCH TIMES OF DIFFERENT NAVIGATION METHODS
-: Fail to locate the source within 500 s.

TABLE V
STATISTICAL RESULTS OF REPEATED TESTS AND THE COMPARISON OF THREE NAVIGATION METHODS

VI. CONCLUSION

An olfactory-based navigation algorithm based on model-based RL and fuzzy inference methods was presented in this article. The OSL problem was modeled as a model-based RL problem, in which belief states in a POMDP model are adapted to generate a source probability map, and a plume distribution map is constructed via an HMM-based method. The information from both maps was fused by a fuzzy-inference-based controller and assigned to reward functions, and the value iteration method was adopted to solve for the optimal policy. Experiment results showed that the proposed navigation method was valid in turbulent flow environments. Moreover, compared to the moth-inspired method and the Bayesian-based method, the proposed method is more effective and intelligent in turbulent flow environments.

REFERENCES
[1] G. Kowadlo and R. A. Russell, “Robot odor localization: A taxonomy and survey,” Int. J. Robot. Res., vol. 27, no. 8, pp. 869–894, 2008.
[2] M. Dunbabin and L. Marques, “Robots for environmental monitoring: Significant advancements and applications,” IEEE Robot. Autom. Mag., vol. 19, no. 1, pp. 24–39, Mar. 2012.
[3] S. Soldan, G. Bonow, and A. Kroll, “Robogasinspector-a mobile robotic system for remote leak sensing and localization in large industrial environments: Overview and first results,” IFAC Proc. Vol., vol. 45, no. 8, pp. 33–38, 2012.
[4] R. A. Russell, “Robotic location of underground chemical sources,” Robotica, vol. 22, no. 1, pp. 109–115, 2004.
[5] G. Ferri, M. V. Jakuba, and D. R. Yoerger, “A novel method for hydrothermal vents prospecting using an autonomous underwater robot,” in Proc. IEEE Int. Conf. Robot. Autom., 2008, pp. 1055–1060.
[6] J. A. Farrell, J. Murlis, X. Long, W. Li, and R. T. Cardé, “Filament-based atmospheric dispersion model to achieve short time-scale structure of odor plumes,” Environ. Fluid Mech., vol. 2, no. 1–2, pp. 143–169, 2002.
[7] H. Ishida, K.-I. Suetsugu, T. Nakamoto, and T. Moriizumi, “Study of autonomous mobile sensing system for localization of odor source using gas sensors and anemometric sensors,” Sensors Actuators A, Phys., vol. 45, no. 2, pp. 153–157, 1994.
[8] G. Sandini, G. Lucarini, and M. Varoli, “Gradient driven self-organizing systems,” in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst., 1993, vol. 1, pp. 429–432.
[9] F. W. Grasso, T. R. Consi, D. C. Mountain, and J. Atema, “Biomimetic robot lobster performs chemo-orientation in turbulence using a pair of spatially separated sensors: Progress and challenges,” Robot. Auton. Syst., vol. 30, no. 1–2, pp. 115–131, 2000.
[10] R. A. Russell, A. Bab-Hadiashar, R. L. Shepherd, and G. G. Wallace, “A comparison of reactive robot chemotaxis algorithms,” Robot. Auton. Syst., vol. 45, no. 2, pp. 83–97, 2003.
[11] A. Lilienthal and T. Duckett, “Experimental analysis of gas-sensitive Braitenberg vehicles,” Adv. Robot., vol. 18, no. 8, pp. 817–834, 2004.
[12] H. Ishida, G. Nakayama, T. Nakamoto, and T. Moriizumi, “Controlling a gas/odor plume-tracking robot based on transient responses of gas sensors,” IEEE Sensors J., vol. 5, no. 3, pp. 537–545, Jun. 2005.
[13] J. Murlis and C. Jones, “Fine-scale structure of odour plumes in relation to insect orientation to distant pheromone and other attractant sources,” Physiol. Entomol., vol. 6, no. 1, pp. 71–86, 1981.
[14] R. T. Cardé and A. Mafra-Neto, “Mechanisms of flight of male moths to pheromone,” in Insect Pheromone Research. Berlin, Germany: Springer, 1997, pp. 275–290.
[15] R. Kanzaki, N. Sugi, and T. Shibuya, “Self-generated zigzag turning of Bombyx mori males during pheromone-mediated upwind walking (Physiology),” Zool. Sci., vol. 9, no. 3, pp. 515–527, 1992.
[16] W. Li, J. A. Farrell, S. Pang, and R. M. Arrieta, “Moth-inspired chemical plume tracing on an autonomous underwater vehicle,” IEEE Trans. Robot., vol. 22, no. 2, pp. 292–307, Aug. 2006.
[17] J. A. Farrell, S. Pang, and W. Li, “Chemical plume tracing via an autonomous underwater vehicle,” IEEE J. Ocean. Eng., vol. 30, no. 2, pp. 428–442, Apr. 2005.
[18] S. Pang and J. A. Farrell, “Chemical plume source localization,” IEEE Trans. Syst., Man, Cybern., B, vol. 36, no. 5, pp. 1068–1080, Oct. 2006.
[19] J.-G. Li, Q.-H. Meng, Y. Wang, and M. Zeng, “Odor source localization using a mobile robot in outdoor airflow environments with a particle filter algorithm,” Auton. Robots, vol. 30, no. 3, pp. 281–292, 2011.
[20] J. A. Farrell, S. Pang, and W. Li, “Plume mapping via hidden Markov methods,” IEEE Trans. Syst., Man, Cybern., Part B, vol. 33, no. 6, pp. 850–863, Dec. 2003.
[21] H. Jiu, Y. Chen, W. Deng, and S. Pang, “Underwater chemical plume tracing based on partially observable Markov decision process,” Int. J. Adv. Robot. Syst., vol. 16, no. 2, 2019. [Online]. Available: https://doi.org/10.1177/1729881419831874
[22] H.-F. Jiu, S. Pang, J.-L. Li, and B. Han, “Odor plume source localization with a pioneer 3 mobile robot in an indoor airflow environment,” in Proc. IEEE Southeastcon, 2014, pp. 1–6.
[23] H. Hu, S. Song, and C. P. Chen, “Plume tracing via model-free reinforcement learning method,” IEEE Trans. Neural Netw. Learn. Syst., vol. 30, no. 8, pp. 2515–2527, Aug. 2019.


[24] M. Vergassola, E. Villermaux, and B. I. Shraiman, “‘Infotaxis’ as a strategy for searching without gradients,” Nature, vol. 445, no. 7126, 2007, Art. no. 406.
[25] S. Pang and F. Zhu, “Reactive planning for olfactory-based mobile robots,” in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst., 2009, pp. 4375–4380.
[26] D. Silver et al., “Mastering the game of Go with deep neural networks and tree search,” Nature, vol. 529, no. 7587, 2016, Art. no. 484.
[27] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. Cambridge, MA, USA: MIT Press, 2018.
[28] W. Naeem, R. Sutton, and J. Chudley, “Chemical plume tracing and odour source localisation by autonomous vehicles,” J. Navig., vol. 60, no. 2, pp. 173–190, 2007.
[29] F. Rahbar, A. Marjovi, and A. Martinoli, “An algorithm for odor source localization based on source term estimation,” in Proc. Int. Conf. Robot. Autom., 2019, pp. 973–979.
[30] B. Luo, Q.-H. Meng, J.-Y. Wang, and M. Zeng, “A flying odor compass to autonomously locate the gas source,” IEEE Trans. Instrum. Meas., vol. 67, no. 1, pp. 137–149, Jan. 2018.
[31] O. Sigaud and O. Buffet, Markov Decision Processes in Artificial Intelligence. New York, NY, USA: Wiley, 2013.
[32] M. L. Littman, “A tutorial on partially observable Markov decision processes,” J. Math. Psychol., vol. 53, no. 3, pp. 119–125, 2009.
[33] G. C. Sousa and B. K. Bose, “A fuzzy set theory based control of a phase-controlled converter dc machine drive,” IEEE Trans. Industry Appl., vol. 30, no. 1, pp. 34–44, Jan./Feb. 1994.
[34] M. Pratama, J. Lu, and G. Zhang, “Evolving type-2 fuzzy classifier,” IEEE Trans. Fuzzy Syst., vol. 24, no. 3, pp. 574–589, Jun. 2016.
[35] L. A. Zadeh, “Fuzzy sets,” Inf. Control, vol. 8, no. 3, pp. 338–353, 1965.
[36] J. P. Crimaldi, M. B. Wiley, and J. R. Koseff, “The relationship between mean and instantaneous structure in turbulent passive scalar plumes,” J. Turbulence, vol. 3, no. 14, pp. 1–24, 2002.
[37] J. Elkinton, R. Cardé, and C. Mason, “Evaluation of time-average dispersion models for estimating pheromone concentration in a deciduous forest,” J. Chem. Ecol., vol. 10, no. 7, pp. 1081–1108, 1984.
[38] R. T. Cardé and M. A. Willis, “Navigational strategies used by insects to find distant, wind-borne sources of odor,” J. Chem. Ecol., vol. 34, no. 7, pp. 854–866, 2008.
[39] S. Shigaki, T. Sakurai, N. Ando, D. Kurabayashi, and R. Kanzaki, “Time-varying moth-inspired algorithm for chemical plume tracing in turbulent environment,” IEEE Robot. Autom. Lett., vol. 3, no. 1, pp. 76–83, Jan. 2018.
[40] B. K. Bose, “Expert system, fuzzy logic, and neural network applications in power electronics and motion control,” Proc. IEEE, vol. 82, no. 8, pp. 1303–1323, 1994.
[41] S. Ross, J. Pineau, S. Paquet, and B. Chaib-Draa, “Online planning algorithms for POMDPs,” J. Artif. Intell. Res., vol. 32, pp. 663–704, 2008.
[42] Q. Lu, Q.-L. Han, and S. Liu, “A cooperative control framework for a collective decision on movement behaviors of particles,” IEEE Trans. Evol. Comput., vol. 20, no. 6, pp. 859–873, Dec. 2016.

Lingxiao Wang (Student Member, IEEE) received the B.S. degree in electrical engineering from the Civil Aviation University of China, Tianjin, China, in 2015 and the M.S. degree in electrical and computer engineering from Embry-Riddle Aeronautical University, Daytona Beach, FL, USA, in 2017. He is currently working toward the Ph.D. degree with the Department of Electrical Engineering and Computer Science, Embry-Riddle Aeronautical University.
His current research interests include autonomous systems, olfactory-based navigation methods, and artificial intelligence.

Shuo Pang (Member, IEEE) received the B.S. degree in electrical engineering from Harbin Engineering University, Harbin, China, in 1997, and the M.S. and Ph.D. degrees in electrical engineering from the University of California, Riverside, CA, USA, in 2001 and 2004, respectively.
He is currently an Associate Professor with the Department of Electrical Engineering and Computer Science, Embry-Riddle Aeronautical University, Daytona Beach, FL, USA. His research interests span theoretical algorithm development and application-driven intelligent systems. His current research interests also include embedded systems, robotics, and artificial intelligence techniques for autonomous vehicles, i.e., autonomous vehicle chemical plume tracing, autonomous vehicle online mapping, and planning.

Jinlong Li is currently working toward the Ph.D. degree in naval architecture and ocean engineering with Shanghai Jiao Tong University, Shanghai, China.
His current research interests include online mapping and planning in chemical plume tracing via an autonomous vehicle and computational fluid dynamics.
