Olfactory-Based Navigation Via Model-Based Reinforcement Learning and Fuzzy Inference Methods
Abstract—This article presents an olfactory-based navigation algorithm for using a mobile robot to locate an odor source in a turbulent flow environment. We formulate odor source localization as a reinforcement learning problem. During the odor plume tracing process, the belief state in a partially observable Markov decision process model is adapted to generate a source probability map that estimates possible odor source locations, and a hidden Markov model is employed to produce a plume distribution map that estimates plume propagation areas. Both source and plume estimations are fed to the robot, and a decision-making approach based on fuzzy inference is designed to dynamically fuse information from the two maps and to balance the exploitation and exploration of the search. After assigning the fused information to reward functions, a value iteration based path planning algorithm is presented to solve for the optimal action policy. Simulation results show that the proposed method is more intelligent and efficient than other commonly used olfactory-based navigation algorithms, such as moth-inspired and Bayesian inference methods.

Index Terms—Fuzzy theory, odor source localization (OSL), olfactory-based navigation, partially observable Markov decision process (POMDP).

I. INTRODUCTION

OLFACTION, also known as the sense of smell, is an important sensing ability for animals to perform life-essential activities, such as homing, foraging, mate-seeking, and evading predators. Inspired by olfactory capabilities of animals, an autonomous vehicle or a mobile robot, equipped with odor-detection sensors (e.g., chemical sensors), could locate an odor (or volatile chemical) source in an unknown environment. The technology of employing a robot to find an odor source is referred to as odor source localization (OSL) [1]. Some practical OSL applications that are frequently quoted include monitoring air pollution [2], locating chemical gas leaks [3], locating unexploded mines and bombs [4], and detecting biological phenomena such as underwater hydrothermal vents [5].

An effective navigation method is crucial for an OSL problem. Like image-based navigation methods, which use the information extracted from images as the reference to locate and navigate a robot, olfactory-based navigation methods detect odor plumes as cues to guide a robot toward an odor source. The challenging part of this navigation problem is to estimate plume locations, which are related not only to the molecular diffusion that takes plumes away from the odor source but also to the advection of airflow [6].

The most straightforward olfactory-based navigation approach is chemotaxis [7], which commands the robot to move along the gradient of odor plume concentrations. A common implementation of this method is to install a pair of chemical sensors on the left and right sides of a robot, and the robot is commanded to steer toward the side with the higher concentration [8]. Many experiments [9]–[12] have shown that the chemotaxis method is effective when the odor source is placed in an environment with laminar (i.e., low Reynolds numbers) flows. However, this method is not applicable in an environment with turbulent flows (i.e., high Reynolds numbers), since odor plumes are congregated into packets and the gradient of concentration is a patchy and intermittent signal [13].

Alternatively, two other categories of olfactory-based navigation strategies have been proposed, namely bio-inspired methods and engineering-based methods. A bio-inspired method directs the robot to mimic animal behaviors, such as mate-seeking behaviors of male moths, which could successfully locate a female moth by tracking pheromones over a long distance [14]. To complete this task, a male moth follows a “surge/casting” behavior pattern: it will fly upwind (surge) when it detects pheromones and traverse the wind (casting) when pheromones are absent. Kanzaki et al. [15] generalized the “surge/casting” model and implemented it on a ground wheeled vehicle to find an odor source in a laminar flow environment. Li et al. [16] implemented this method on an autonomous underwater vehicle to search for an underwater chemical source. Experiment results [17] proved the validity of this method.

By contrast, an engineering-based method does not follow a fixed behavior pattern. It utilizes math and physics approaches to model odor plume distribution and estimates possible odor source locations. Then, a path planner is employed to guide the robot moving toward the estimated source location. Methods that produce a source probability map, i.e., a map that indicates the

Manuscript received February 8, 2020; revised May 29, 2020 and July 10, 2020; accepted July 17, 2020. Date of publication July 24, 2020; date of current version October 6, 2021. (Corresponding author: Shuo Pang.)
Lingxiao Wang and Shuo Pang are with the Department of Electrical Engineering and Computer Science, Embry-Riddle Aeronautical University, Daytona Beach, FL 32114 USA (e-mail: [email protected]; [email protected]).
Jinlong Li is with the Shanghai Jiaotong University, Shanghai 201101, China (e-mail: [email protected]).
Color versions of one or more of the figures in this article are available online at https://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TFUZZ.2020.3011741
1063-6706 © 2020 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.
Authorized licensed use limited to: Nanyang Technological University Library. Downloaded on January 31,2023 at 08:40:58 UTC from IEEE Xplore. Restrictions apply.
WANG et al.: OLFACTORY-BASED NAVIGATION VIA MODEL-BASED REINFORCEMENT LEARNING AND FUZZY INFERENCE METHODS 3015
3016 IEEE TRANSACTIONS ON FUZZY SYSTEMS, VOL. 29, NO. 10, OCTOBER 2021
where $X_s(t_l)$ is the odor source location. If the odor source continuously releases plumes in the time interval $[t_l, t_k]$, assuming that the release rate is $G$ plumes per second, there are $G(t_k - t_l)$ plumes released, and positions of released plumes can be denoted by $P(t_l, t_k) = [X(t_l), X(t_l + d\tau), X(t_l + 2d\tau), \ldots, X(t_k)]$, where $d\tau = 1/G$.

2) Single Released Odor Plume: This section presents the

$p_{ij}(t_l, t_k)$ can be calculated as follows:

$$p_{ij}(t_l, t_k) = \int_{x_s \in C_i} \int_{y_s \in C_i} \frac{e^{-\frac{(x_j - s_x - x_s)^2}{2(t_k - t_l)\sigma_x^2}}}{\sqrt{2\pi(t_k - t_l)\sigma_x^2}} \cdot \frac{e^{-\frac{(y_j - s_y - y_s)^2}{2(t_k - t_l)\sigma_y^2}}}{\sqrt{2\pi(t_k - t_l)\sigma_y^2}} \, dx_s \, dy_s$$
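Because the integrand for $p_{ij}(t_l, t_k)$ is a product of two one-dimensional Gaussian densities, the double integral separates into two interval integrals that can be written with the error function. The following Python sketch illustrates this under stated assumptions: the cell $C_i$ is an axis-aligned rectangle given by hypothetical bounds `(x_lo, x_hi, y_lo, y_hi)`, $(x_j, y_j)$ is taken to be a representative point of $C_j$, and $s_x$, $s_y$ are treated as known offsets, since their definition is not part of this excerpt. All function names are illustrative, not the authors' code.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def gaussian_interval(center, lo, hi, variance):
    """Integrate the N(0, variance) density of (center - x_s) for x_s in [lo, hi]."""
    sd = math.sqrt(variance)
    return normal_cdf((center - lo) / sd) - normal_cdf((center - hi) / sd)


def plume_transition_probability(xj, yj, sx, sy, cell_i, dt, sigma_x, sigma_y):
    """Evaluate p_ij for elapsed time dt = t_k - t_l > 0.

    cell_i = (x_lo, x_hi, y_lo, y_hi) is an assumed rectangular cell C_i.
    The x and y integrals are independent, so p_ij is their product.
    """
    x_lo, x_hi, y_lo, y_hi = cell_i
    px = gaussian_interval(xj - sx, x_lo, x_hi, dt * sigma_x ** 2)
    py = gaussian_interval(yj - sy, y_lo, y_hi, dt * sigma_y ** 2)
    return px * py
```

For example, `plume_transition_probability(0.0, 0.0, 0.0, 0.0, (-1.0, 1.0, -1.0, 1.0), 1.0, 1.0, 1.0)` integrates a unit-variance Gaussian over one standard deviation on each axis, so each factor is the familiar one-sigma mass.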
Fig. 3. Basic POMDP model.
Fig. 4. Action space. The robot location is at the center cell. Arrows indicate possible actions that the robot can take.
detecting plumes in a cell $C_j$ at time $t_k$ due to the continuous plume release from a cell $C_i$ is

$$\kappa_{ij}(t_l, t_k) = \prod_{t_l}^{t_{k-1}} \left[1 - \beta p_{ij}(t_l, t_k)\right]. \tag{10}$$

Since plume detection and nondetection events are complementary, the probability of detecting plumes under the same condition (i.e., continuous plume release) is $1 - \kappa_{ij}(t_l, t_k)$.

C. Source Mapping

Belief states in a POMDP model are adapted as source estimations, which are used to construct a source probability map. A basic POMDP model can be defined by a tuple $(S, A, \Omega, P, O, R, b_0)$ as follows [31]:
• $S$ is a state space.
• $A$ is an action space.
• $\Omega$ is an observation space.
• $P$ are state transition probabilities between states.
• $O$ are observation probabilities.
• $R$ is the reward function defined on the transitions.
• $b_0$ is an initial probability distribution over states.

As shown in Fig. 3, at each time-step, the agent receives an observation ($o$, $o \in \Omega$) at the current state ($s$, $s \in S$), and after performing an action ($a$, $a \in A$), the agent is transferred to a new state ($s'$, $s' \in S$) according to the state transition probability $P(s' \mid s, a)$ and receives a new observation ($o'$, $o' \in \Omega$) with the observation probability $O(a, s', o')$ and a reward $R(s, a)$. Since states are hidden to the agent, a probability distribution over states is defined as the belief state $b(s)$, which indicates the probability of the agent being in a particular state $s$, and the initial belief state is $b_0$.

To illustrate the proposed source mapping algorithm, the rest of this section presents an approach that adapts elements in a basic POMDP model to the context of an OSL problem.

1) State Space: States in a basic POMDP model are hidden to the agent, i.e., the agent does not know which state it is in. In an OSL problem, the actual odor source location is unknown to the robot. Thus, we define states as the actual odor source location, which can be represented by a length-$M$ vector of Boolean values ($M$ is the number of cells in the search area) indicating whether each cell contains the odor source. If the Boolean value is 1, then the corresponding cell contains the odor source; otherwise, this value is 0. For instance, $s_1 = [1, 0, \ldots, 0]$ indicates that the odor source is in the cell $C_1$, $s_2 = [0, 1, \ldots, 0]$ represents that the odor source is in the cell $C_2$, etc. Since the odor source could be located in an arbitrary cell inside the search area, the state space is represented as $S = \{s_1, s_2, \ldots, s_M\}$.

2) Action Space: The action space defines possible actions that an agent could select. As shown in Fig. 4, at the center cell $C_{(f,g)}$, the robot could select one of eight actions and enter the corresponding cell around it. In this work, an action is represented by the destination cell. For example, $a_2$ in Fig. 4 can be represented as $a_2 = C_{(f,g-1)}$ since the destination cell of this action is $C_{(f,g-1)}$. Thus, the action space can be represented as $A = \{a_1 = C_{(f-1,g-1)}, a_2 = C_{(f,g-1)}, \ldots, a_8 = C_{(f+1,g+1)}\}$.

3) State Transition Probabilities: The location of the odor source is stationary in this work, i.e., the odor source cannot move. Thus, the state transition probability is 1 if the new state is the same as the old state; otherwise, this probability is 0

$$P(s' = s_i \mid s = s_j, a) = \begin{cases} 1, & i = j \\ 0, & i \neq j \end{cases} \tag{11}$$

where $i, j \in [1, M]$.

4) Observation Space and Probabilities: When the robot enters a cell $C_j$, it could or could not detect odor plumes. Thus, two observation states are defined in the observation space $\Omega = \{d, \bar{d}\}$, namely the plume detection event $d$ and the plume nondetection event $\bar{d}$. A fixed plume concentration threshold is adopted to identify the two events, i.e., a plume detection event is confirmed when the sensed concentration is higher than the threshold; otherwise, a plume nondetection event is confirmed. When the robot enters a cell $C_j$, the probability of detecting continuously released plumes is $1 - \kappa_{ij}(t_l, t_k)$ and the probability of not detecting these plumes is $\kappa_{ij}(t_l, t_k)$ as defined in Section III-B3. Thus, the observation probability $O(a, s', o')$ is defined as follows:

$$O(a = C_j, s' = s_i, o') = \begin{cases} 1 - \kappa_{ij}(t_l, t_k), & o' = d \\ \kappa_{ij}(t_l, t_k), & o' = \bar{d} \end{cases} \tag{12}$$

where $i, j \in [1, M]$.

5) Belief States: In a basic POMDP model, after taking an action $a$ and transferring to a new state $s'$, the belief state of the new state $b(s')$ is updated based on the old belief state $b(s)$, the observation probability $O$, and the state transition probability
$P$, which is represented as [32]

$$b(s') = \frac{O(a, s', o') \sum_{s \in S} P(s' \mid s, a)\, b(s)}{\sum_{s \in S} \sum_{s' \in S} O(a, s', o')\, P(s' \mid s, a)\, b(s)}. \tag{13}$$

In an OSL problem, the belief state can be interpreted as the probability of the robot believing that there is an odor source in a cell. Based on the defined observation probability (12) and the state transition probability (11), (13) can be rewritten as follows:

$$b(s' = s_i) = \begin{cases} \dfrac{(1 - \kappa_{ij}(t_l, t_k))\, b(s_i)}{\sum_{i=1}^{M} (1 - \kappa_{ij}(t_l, t_k))\, b(s_i)}, & o' = d \\[2ex] \dfrac{\kappa_{ij}(t_l, t_k)\, b(s_i)}{\sum_{i=1}^{M} \kappa_{ij}(t_l, t_k)\, b(s_i)}, & o' = \bar{d}. \end{cases} \tag{14}$$

The above equations iteratively update belief states depending on plume detection and nondetection events. The initial belief state $b_0$ is defined as $1/M$, since the prior information about the odor source location is unavailable to the robot before it starts the search. However, if information regarding the source location is available prior to the search, it could be exploited through an appropriate distribution of $b_0$ that reflects this prior knowledge.

In summary, by calculating belief states over all states, a source probability map $b$ that estimates the source location is obtained as follows:

$$b = [b(s_1), b(s_2), \ldots, b(s_M)]. \tag{15}$$

D. Plume Mapping

The plume mapping algorithm produces a plume distribution map, which indicates possible plume propagation areas. With the produced source probability map and the recorded airflow history, an HMM-based plume mapping algorithm [20] is presented in this section.

Let $\alpha_j(t_0, t_k)$ denote the probability that a cell $C_j$ contains the detectable odor plume at time $t_k$ due to the continuous plume release by the source starting at $t_0$, where $t_0$ is the initial time that the robot records airflow measurements. Denote

$$\alpha(t_0, t_k) = [\alpha_1(t_0, t_k), \alpha_2(t_0, t_k), \ldots, \alpha_M(t_0, t_k)] \tag{16}$$

as the vector storing this variable for each cell, which is a plume distribution map at the current time $t_k$.

Introduce the variable $\bar{\alpha}_j(t_0, t_k)$ representing the probability that a cell $C_j$ contains a detectable odor plume at $t_k$ due to a single plume release at time $t_0$. Define

$$\bar{\alpha}(t_0, t_k) = [\bar{\alpha}_1(t_0, t_k), \bar{\alpha}_2(t_0, t_k), \ldots, \bar{\alpha}_M(t_0, t_k)] \tag{17}$$

as the vector form of $\bar{\alpha}_j(t_0, t_k)$. At $t_0$, the plume propagation has not occurred yet and plumes are at the odor source location; therefore, $\bar{\alpha}(t_0, t_0) = b$ since the actual odor source location is unknown. To find $\bar{\alpha}_j(t_0, t_1)$, which is the probability of a cell $C_j$ containing plumes after one time step, the plume transitions from all other cells to the cell $C_j$ must be considered, i.e.,

$$\bar{\alpha}_j(t_0, t_1) = \sum_{k=1}^{M} \bar{\alpha}_k(t_0, t_0)\, a_{kj}(t_0) \tag{18}$$

where $a_{kj}(t_0)$ denotes the probability of the one step transition of odor plumes from a cell $C_k$ at time $t_0$ to another cell $C_j$ at time $t_1$, which can be obtained from the airflow history [20]. In addition, (18) can be rewritten in a vector notation

$$\bar{\alpha}(t_0, t_1) = \bar{\alpha}(t_0, t_0)\, \bar{A}(t_0) \tag{19}$$

where $\bar{A}(t_0) = [a_{kj}(t_0)] \in \mathbb{R}^{M \times M}$ is the matrix form of $a_{kj}(t_0)$. Moreover, for an arbitrary plume propagation period, i.e., $(t_k - t_0)$, $\bar{\alpha}(t_0, t_k)$ can be calculated as follows:

$$\bar{\alpha}(t_0, t_k) = \begin{cases} 0, & \text{for } t_k < t_0 \\ bI, & \text{for } t_k = t_0 \\ b\Phi(t_0, t_k), & \text{for } t_k > t_0 \end{cases} \tag{20}$$

where $I$ is the identity matrix with the size of $M \times M$ and $\Phi(t_0, t_{k+1}) = \prod_{q=0}^{k} \bar{A}(t_q)$.

For the continuous plume release scenario, $\alpha(t_0, t_k)$ can be derived from the single release case $\bar{\alpha}(t_0, t_k)$ by considering all the release times from $t_0$ to $t_k$

$$\alpha(t_0, t_k) = \frac{1}{k+1} \sum_{q=0}^{k} \bar{\alpha}(t_q, t_k) \tag{21}$$

where $1/(k+1)$ is the normalization factor to maintain $\|\alpha(t_0, t_k)\|_1 = 1$. With (20), (21) can be further reduced to

$$\begin{aligned} \alpha(t_0, t_k) &= \frac{1}{k+1} \sum_{q=0}^{k} b\Phi(t_q, t_k) \\ &= \frac{b}{k+1} \left[ \Phi(t_k, t_k) + \sum_{q=0}^{k-1} \Phi(t_q, t_k) \right] \\ &= \frac{b}{k+1} \left[ I + \sum_{q=0}^{k-1} \Phi(t_q, t_k) \right] \\ &= b\Psi(t_0, t_k), \end{aligned} \tag{22}$$

where $\Psi(t_0, t_k) = \frac{1}{k+1}\left[I + \sum_{q=0}^{k-1} \Phi(t_q, t_k)\right]$, which can be iteratively updated as follows:

$$\Psi(t_0, t_k) = \frac{1}{k+1} \left[ I + k\Psi(t_0, t_{k-1})\bar{A}(t_{k-1}) \right]. \tag{23}$$

Note that, since $\bar{A}(t_{k-1})$ relates to the latest airflow measurement, (23) encapsulates the airflow history over the entire search time (i.e., from $t_0$ to $t_k$).

In summary, when the source probability map $b$ is available at the current time $t_k$, a plume propagation map $\alpha(t_0, t_k)$ can be obtained by (22), and if $b$ is updated in the next time step, by updating $\Psi$ with the latest airflow measurements, a renewed plume distribution map based on the new source probability map can be obtained.

IV. PLANNING

A. Generate Reward Functions With Fuzzy Inference

After the source probability map $b$ and the plume distribution map $\alpha(t_0, t_k)$ are obtained, information from the two maps is fused and assigned to reward functions. The information provided by the two maps is complementary for determining robot behaviors,
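The source-map update in (14) and the plume-map recursion in (22) and (23) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the authors' implementation: maps are plain lists indexed by cell, `kappa_j[i]` holds $\kappa_{ij}(t_l, t_k)$ for the cell $C_j$ the robot has just entered, `a_prev` is the transition matrix $\bar{A}(t_{k-1})$ supplied by the caller, and the fallback for an observation with zero total likelihood is a choice made here, not taken from the article.

```python
def update_belief(belief, kappa_j, detected):
    """Eq. (14): Bayes update of the source probability map b.

    belief[i]  : current b(s_i)
    kappa_j[i] : probability of NOT detecting plumes in C_j if the source is in C_i
    detected   : True for a detection event, False for a nondetection event
    """
    likelihood = [(1.0 - k) if detected else k for k in kappa_j]
    unnormalized = [w * b for w, b in zip(likelihood, belief)]
    total = sum(unnormalized)
    if total == 0.0:
        # Assumption: keep the old map if the observation has zero likelihood.
        return list(belief)
    return [u / total for u in unnormalized]


def mat_mul(a, b):
    """Product of two square matrices stored as lists of rows."""
    m = len(a)
    return [[sum(a[r][q] * b[q][c] for q in range(m)) for c in range(m)]
            for r in range(m)]


def update_psi(psi_prev, a_prev, k):
    """Eq. (23): Psi(t0, tk) = (I + k * Psi(t0, t_{k-1}) * A(t_{k-1})) / (k + 1)."""
    m = len(psi_prev)
    prod = mat_mul(psi_prev, a_prev)
    return [[((1.0 if r == c else 0.0) + k * prod[r][c]) / (k + 1)
             for c in range(m)] for r in range(m)]


def plume_map(belief, psi):
    """Eq. (22): alpha(t0, tk) = b * Psi(t0, tk), a row vector times a matrix."""
    m = len(belief)
    return [sum(belief[r] * psi[r][c] for r in range(m)) for c in range(m)]
```

The recursion starts from $\Psi(t_0, t_0) = I$, which follows from (22) with $k = 0$; each new airflow measurement then costs one matrix product, and the plume map is refreshed by a single vector-matrix product whenever $b$ changes.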
Fig. 5. Structure of the proposed fuzzy controller. ρ and δT are sensed plume
concentration and plume non-detection period, respectively, and λ is the fusion
coefficient.
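As a rough illustration of how such a controller could map $\rho$ and $\delta T$ to $\lambda$, the Python sketch below combines triangular membership functions, a nine-rule table, and the weighted-peak form of defuzzification given in (25). Several parts are assumptions made only for this example and are not taken from the article: both inputs are normalized to [0, 1], the membership breakpoints in `RHO_SETS` and `DT_SETS` are hypothetical, the output peaks in `LAMBDA_PEAKS` are hypothetical, the fuzzy AND is taken as the minimum, and the rule table is hypothetical except for the two rules F1 and F2 that the text states explicitly.

```python
def triangle(x, left, peak, right):
    """Triangular membership; left == peak or peak == right gives a shoulder."""
    if x < left or x > right:
        return 0.0
    if x == peak:
        return 1.0
    if x < peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Hypothetical breakpoints on inputs normalized to [0, 1].
RHO_SETS = {"L": (0.0, 0.0, 0.5), "M": (0.0, 0.5, 1.0), "H": (0.5, 1.0, 1.0)}
DT_SETS = {"Sh": (0.0, 0.0, 0.5), "Av": (0.0, 0.5, 1.0), "Lo": (0.5, 1.0, 1.0)}

# Hypothetical peak value U_i of each output fuzzy set for lambda.
LAMBDA_PEAKS = {"VS": 0.0, "S": 0.25, "MI": 0.5, "L": 0.75, "VL": 1.0}

# Only (H, Sh) -> VL and (L, Lo) -> VS come from the text; the rest are placeholders.
RULES = {
    ("H", "Sh"): "VL", ("H", "Av"): "L", ("H", "Lo"): "MI",
    ("M", "Sh"): "L", ("M", "Av"): "MI", ("M", "Lo"): "S",
    ("L", "Sh"): "MI", ("L", "Av"): "S", ("L", "Lo"): "VS",
}


def fusion_coefficient(rho, delta_t):
    """Eq. (25): sum of U_i * mu(U_i) over rules, divided by the sum of mu(U_i)."""
    numerator = 0.0
    denominator = 0.0
    for (rho_set, dt_set), out_set in RULES.items():
        # Rule truth value: fuzzy AND of the two antecedents, taken as the minimum.
        strength = min(triangle(rho, *RHO_SETS[rho_set]),
                       triangle(delta_t, *DT_SETS[dt_set]))
        numerator += LAMBDA_PEAKS[out_set] * strength
        denominator += strength
    # Assumption: fall back to a neutral value if no rule fires.
    return numerator / denominator if denominator > 0.0 else 0.5
```

With these placeholder sets, a high concentration and a short nondetection period drive $\lambda$ toward 1, and a low concentration with a long nondetection period drives it toward 0, which matches the qualitative behavior the controller is meant to produce.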
TABLE I
LIST OF FUZZY RULES

of measured data from experiments. As shown in Fig. 6, all membership functions are triangular. Three fuzzy sets have been defined to cover the universe of discourse of the sensed plume concentration $\rho$, namely low (L), medium (M), and high (H). The universe of discourse of the plume nondetection period $\delta T$ is also covered by three fuzzy sets, namely short (Sh), averaged (Av), and long (Lo). For the output $\lambda$, five fuzzy sets, namely very small (VS), small (S), middle (MI), large (L), and very large (VL), are defined to cover its universe of discourse.

2) Fuzzy Rules: Fuzzy rules in the fuzzy inference theory are presented in an “IF–THEN” format, which determine search strategies of the robot. In this work, fuzzy rules are designed based on moth odor searching behaviors [38]. As mentioned, previous researchers [16], [17], [39] have summarized these behaviors into a “surge/casting” model and demonstrated the validity of implementing this model on robots in OSL problems. Borrowing this idea, we want the robot to explore if plumes are absent (like the moth’s casting behavior) and to exploit when the robot is in plumes (like the moth’s surge behavior). To achieve this mechanism, the distance from the robot to the odor source is estimated and monitored: if the robot is far from the source, the robot is inclined to find plumes, i.e., exploration; otherwise, the robot tends to search for the source, i.e., exploitation. The value of $\lambda$ changes as follows: when $\rho$ is high and $\delta T$ is short, the robot is very likely close to the odor source; thus, $\lambda$ is large. On the other hand, when $\rho$ is low and $\delta T$ is long, the robot is probably far from the source; thus, $\lambda$ is small. In an “IF–THEN” format, the above rules are represented as follows:

$F_1$ = {IF $\rho$ is H AND $\delta T$ is Sh, THEN $\lambda$ is VL}
$F_2$ = {IF $\rho$ is L AND $\delta T$ is Lo, THEN $\lambda$ is VS}.

Enumerating all possible combinations of antecedents and the corresponding consequents, a rule table (see Table I) can be obtained.

3) Defuzzification: The centroid method [40] is chosen as the defuzzification algorithm, which can be expressed as follows:

$$U_0 = \frac{\sum_{i=1}^{n} U_i \cdot \mu(U_i)}{\sum_{i=1}^{n} \mu(U_i)} \tag{25}$$

where $U_0$ is the output (i.e., the value of $\lambda$), $i$ is the index of rules, $i \in [1, 9]$, $\mu(U_i)$ is the truth value of the result membership function for the $i$th rule, and $U_i$ is the value where the result membership function is maximum over the output variable fuzzy set range.

Fig. 7 presents the result of the proposed fuzzy controller. In general, a small $\rho$ and a long $\delta T$ produce a small $\lambda$ value that emphasizes the plume mapping information in reward functions, and the opposite combination of $\rho$ and $\delta T$ (i.e., a large $\rho$ and a short $\delta T$) provides a large $\lambda$ that prioritizes the source mapping information in reward functions.

Fig. 7. Result of the proposed fuzzy controller. In the plot, the horizontal axes are two inputs, the sensed odor concentration $\rho$ and the plume non-detection period $\delta T$, and the vertical axis is the output, the fusion coefficient $\lambda$.

B. Solve for the Optimal Policy

After reward functions are determined, a search route is generated in the planning procedure. Given the current reward functions, we adopt a value iteration method (see Algorithm 1) to quickly determine the optimal policy, i.e., the search route. The motivation for using this method is to reduce the processing time, which allows the robot to respond to new plume observations in a timely manner. Solving the POMDP [41] is also a feasible way to obtain the search route, but, considering the large size of the hidden state space defined in our POMDP, this approach becomes time-consuming and intractable. The ability to quickly solve for the search path is one of our main concerns, since the ultimate goal of this article is implementing this algorithm on a mobile robot, which has limited onboard computational resources.

Algorithm 1: Value Iteration Based Planning Algorithm.
1: Initialize Value Functions $V(C_i) = 0$, $i \in [1, M]$
2: Calculate Reward Functions $R(s, a)$ for all cells based on (24)
3: Set the convergence tolerance $\epsilon$
4: while $\Delta \geq \epsilon$ do
5:   $\Delta = 0$
6:   for $i \in [1, M]$ do
7:     $v = V(C_i)$
8:     $V(C_i) = \max_{a \in A}(R(s, a) + \gamma V(a))$
9:     $\Delta = \max(\Delta, |v - V(C_i)|)$
10:   end for
11: end while
12: Generate the optimal policy $\pi^* = \arg\max V(C_i)$

As shown in Algorithm 1, value functions of all cells are initialized as 0. If the robot is currently located in a cell $C_i$, it could choose one of eight actions (see Fig. 4) and enter the corresponding cell. Note that, based on the reliable maneuverability of the ground robot, it is assumed that the transition of
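The sweep in Algorithm 1 can be illustrated with a short Python sketch on a rectangular grid of cells. It rests on assumptions that are not confirmed by this excerpt: the reward $R(s, a)$ is read as a value attached to the destination cell of action $a$ (the reward definition in (24) is not shown here), cells on the border simply have fewer than eight actions, and the discount factor `gamma` and tolerance `tol` are illustrative defaults. The grid needs at least two cells so that every cell has a neighbor.

```python
def neighbors(cell, rows, cols):
    """Destination cells of the (up to) eight actions available from `cell`."""
    f, g = cell
    return [(f + df, g + dg)
            for df in (-1, 0, 1) for dg in (-1, 0, 1)
            if (df, dg) != (0, 0) and 0 <= f + df < rows and 0 <= g + dg < cols]


def value_iteration(reward, gamma=0.9, tol=1e-6):
    """In-place value iteration over all cells, following the loop of Algorithm 1.

    reward[f][g] is the assumed reward for entering cell (f, g).
    """
    rows, cols = len(reward), len(reward[0])
    value = [[0.0] * cols for _ in range(rows)]
    while True:
        delta = 0.0
        for f in range(rows):
            for g in range(cols):
                old = value[f][g]
                value[f][g] = max(reward[a][b] + gamma * value[a][b]
                                  for a, b in neighbors((f, g), rows, cols))
                delta = max(delta, abs(old - value[f][g]))
        if delta < tol:  # stop once the largest change in a sweep is below tol
            return value


def greedy_action(cell, reward, value, gamma=0.9):
    """Destination cell that maximizes reward plus discounted value."""
    rows, cols = len(reward), len(reward[0])
    return max(neighbors(cell, rows, cols),
               key=lambda c: reward[c[0]][c[1]] + gamma * value[c[0]][c[1]])
```

A typical use is to recompute `value_iteration` whenever the reward grid changes and then move the robot to `greedy_action` of its current cell. Because each sweep touches every cell once and checks at most eight neighbors, the cost per sweep grows linearly with the number of cells.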
Fig. 11. Robot search trajectories and reward functions at different time steps in an environment with turbulent flows. The plot of the fusion coefficient λ versus
the search time t is presented at the center of the diagram, where cross marks indicate plume detection events. Diagrams around the center plot are robot trajectories
and reward functions at different time steps. For each of these diagrams, the robot trajectory is represented by the trail of dark arrows; the grey-scale patchy trail
in the middle of the background indicates the simulated plume trajectory; cells are painted with colors according to their reward values, where darker cells have
higher reward values (red: largest, white: smallest).
TABLE III
SEARCH TIME AND TRAVEL DISTANCE OF THREE NAVIGATION METHODS IN A TURBULENT FLOW ENVIRONMENT
-: Fail to locate the source within 500 s.

plumes for the first time, and the plume mapping and fusion algorithms are activated to generate reward functions.

From 55 to 172 s, the robot is in the exploration phase, where the robot is encouraged to detect plumes and gather odor source information. Specifically, it can be observed in the plot of $\lambda$ that the value of $\lambda$ fluctuates between 0 and 0.5 due to the low sensed odor concentration $\rho$ and the long plume nondetection period $\delta T$. As a result, plume estimations outweigh source estimations in reward functions, which drives the robot to seek
Fig. 13. Search trajectories of the proposed navigation method at different initial positions. The robot starts an OSL task from (a) (45, 50) m. (b) (5, −30) m. (c) (30, 30) m. (d) (55, 20) m. (e) (70, 45) m. (f) (70, −45) m.
-: Fail to locate the source within 500 s.

TABLE V
STATISTICAL RESULTS OF REPEATED TESTS AND THE COMPARISON OF THREE NAVIGATION METHODS

utilized to estimate plume advection distances. This approximation introduces additional errors if the robot is in a highly turbulent flow environment, i.e., wind directions and velocities have a huge variance in space. It can be observed in Table IV that the searching time grows significantly if wind fields are highly turbulent (e.g., Tests 4, 7, and 11). This issue could be alleviated with a multiagent searching algorithm. By employing multiple robots, wind information at different locations is obtained, and a comprehensive wind map over the searching area could be derived. The design and implementation of the multiagent searching algorithm is one of our prospective research directions.

VI. CONCLUSION

An olfactory-based navigation algorithm based on model-based RL and fuzzy inference methods was presented in this article. The OSL problem was modeled as a model-based RL problem, in which belief states in a POMDP model are adapted to generate a source probability map, and a plume distribution map is constructed via an HMM-based method. The information from both maps was fused by a fuzzy controller and assigned to reward functions, and the value iteration method was adopted to solve for the optimal policy. Experimental results showed that the proposed navigation method was valid in turbulent flow environments. Besides, compared to the moth-inspired

REFERENCES
[1] G. Kowadlo and R. A. Russell, “Robot odor localization: A taxonomy and survey,” Int. J. Robot. Res., vol. 27, no. 8, pp. 869–894, 2008.
[2] M. Dunbabin and L. Marques, “Robots for environmental monitoring: Significant advancements and applications,” IEEE Robot. Autom. Mag., vol. 19, no. 1, pp. 24–39, Mar. 2012.
[3] S. Soldan, G. Bonow, and A. Kroll, “Robogasinspector-a mobile robotic system for remote leak sensing and localization in large industrial environments: Overview and first results,” IFAC Proc. Vol., vol. 45, no. 8, pp. 33–38, 2012.
[4] R. A. Russell, “Robotic location of underground chemical sources,” Robotica, vol. 22, no. 1, pp. 109–115, 2004.
[5] G. Ferri, M. V. Jakuba, and D. R. Yoerger, “A novel method for hydrothermal vents prospecting using an autonomous underwater robot,” in Proc. IEEE Int. Conf. Robot. Autom., 2008, pp. 1055–1060.
[6] J. A. Farrell, J. Murlis, X. Long, W. Li, and R. T. Cardé, “Filament-based atmospheric dispersion model to achieve short time-scale structure of odor plumes,” Environ. Fluid Mech., vol. 2, no. 1–2, pp. 143–169, 2002.
[7] H. Ishida, K.-I. Suetsugu, T. Nakamoto, and T. Moriizumi, “Study of autonomous mobile sensing system for localization of odor source using gas sensors and anemometric sensors,” Sensors Actuators A, Phys., vol. 45, no. 2, pp. 153–157, 1994.
[8] G. Sandini, G. Lucarini, and M. Varoli, “Gradient driven self-organizing systems,” in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst., 1993, vol. 1, pp. 429–432.
[9] F. W. Grasso, T. R. Consi, D. C. Mountain, and J. Atema, “Biomimetic robot lobster performs chemo-orientation in turbulence using a pair of spatially separated sensors: Progress and challenges,” Robot. Auton. Syst., vol. 30, no. 1–2, pp. 115–131, 2000.
[10] R. A. Russell, A. Bab-Hadiashar, R. L. Shepherd, and G. G. Wallace, “A comparison of reactive robot chemotaxis algorithms,” Robot. Auton. Syst., vol. 45, no. 2, pp. 83–97, 2003.
[11] A. Lilienthal and T. Duckett, “Experimental analysis of gas-sensitive Braitenberg vehicles,” Adv. Robot., vol. 18, no. 8, pp. 817–834, 2004.
[12] H. Ishida, G. Nakayama, T. Nakamoto, and T. Moriizumi, “Controlling a gas/odor plume-tracking robot based on transient responses of gas sensors,” IEEE Sensors J., vol. 5, no. 3, pp. 537–545, Jun. 2005.
[13] J. Murlis and C. Jones, “Fine-scale structure of odour plumes in relation to insect orientation to distant pheromone and other attractant sources,” Physiol. Entomol., vol. 6, no. 1, pp. 71–86, 1981.
[14] R. T. Cardé and A. Mafra-Neto, “Mechanisms of flight of male moths to pheromone,” in Insect Pheromone Research. Berlin, Germany: Springer, 1997, pp. 275–290.
[15] R. Kanzaki, N. Sugi, and T. Shibuya, “Self-generated zigzag turning of Bombyx mori males during pheromone-mediated upwind walking (Physiology),” Zool. Sci., vol. 9, no. 3, pp. 515–527, 1992.
[16] W. Li, J. A. Farrell, S. Pang, and R. M. Arrieta, “Moth-inspired chemical plume tracing on an autonomous underwater vehicle,” IEEE Trans. Robot., vol. 22, no. 2, pp. 292–307, Aug. 2006.
[17] J. A. Farrell, S. Pang, and W. Li, “Chemical plume tracing via an autonomous underwater vehicle,” IEEE J. Ocean. Eng., vol. 30, no. 2, pp. 428–442, Apr. 2005.
[18] S. Pang and J. A. Farrell, “Chemical plume source localization,” IEEE Trans. Syst., Man, Cybern., B, vol. 36, no. 5, pp. 1068–1080, Oct. 2006.
[19] J.-G. Li, Q.-H. Meng, Y. Wang, and M. Zeng, “Odor source localization using a mobile robot in outdoor airflow environments with a particle filter algorithm,” Auton. Robots, vol. 30, no. 3, pp. 281–292, 2011.
[20] J. A. Farrell, S. Pang, and W. Li, “Plume mapping via hidden Markov methods,” IEEE Trans. Syst., Man, Cybern., Part B, vol. 33, no. 6, pp. 850–863, Dec. 2003.
[21] H. Jiu, Y. Chen, W. Deng, and S. Pang, “Underwater chemical plume tracing based on partially observable Markov decision process,” Int. J. Adv. Robot. Syst., vol. 16, no. 2, 2019. [Online]. Available: https://doi.org/10.1177/1729881419831874
[22] H.-F. Jiu, S. Pang, J.-L. Li, and B. Han, “Odor plume source localization with a pioneer 3 mobile robot in an indoor airflow environment,” in Proc. IEEE Southeastcon, 2014, pp. 1–6.
[23] H. Hu, S. Song, and C. P. Chen, “Plume tracing via model-free reinforcement learning method,” IEEE Trans. Neural Netw. Learn. Syst., vol. 30, no. 8, pp. 2515–2527, Aug. 2019.
[24] M. Vergassola, E. Villermaux, and B. I. Shraiman, “‘Infotaxis’ as a strategy for searching without gradients,” Nature, vol. 445, no. 7126, 2007, Art. no. 406.
[25] S. Pang and F. Zhu, “Reactive planning for olfactory-based mobile robots,” in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst., 2009, pp. 4375–4380.
[26] D. Silver et al., “Mastering the game of Go with deep neural networks and tree search,” Nature, vol. 529, no. 7587, 2016, Art. no. 484.
[27] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. Cambridge, MA, USA: MIT Press, 2018.
[28] W. Naeem, R. Sutton, and J. Chudley, “Chemical plume tracing and odour source localisation by autonomous vehicles,” J. Navig., vol. 60, no. 2, pp. 173–190, 2007.
[29] F. Rahbar, A. Marjovi, and A. Martinoli, “An algorithm for odor source localization based on source term estimation,” in Proc. Int. Conf. Robot. Autom., 2019, pp. 973–979.
[30] B. Luo, Q.-H. Meng, J.-Y. Wang, and M. Zeng, “A flying odor compass to autonomously locate the gas source,” IEEE Trans. Instrum. Meas., vol. 67, no. 1, pp. 137–149, Jan. 2018.
[31] O. Sigaud and O. Buffet, Markov Decision Processes in Artificial Intelligence. New York, NY, USA: Wiley, 2013.
[32] M. L. Littman, “A tutorial on partially observable Markov decision processes,” J. Math. Psychol., vol. 53, no. 3, pp. 119–125, 2009.
[33] G. C. Sousa and B. K. Bose, “A fuzzy set theory based control of a phase-controlled converter dc machine drive,” IEEE Trans. Industry Appl., vol. 30, no. 1, pp. 34–44, Jan./Feb. 1994.
[34] M. Pratama, J. Lu, and G. Zhang, “Evolving type-2 fuzzy classifier,” IEEE Trans. Fuzzy Syst., vol. 24, no. 3, pp. 574–589, Jun. 2016.
[35] L. A. Zadeh, “Fuzzy sets,” Inf. Control, vol. 8, no. 3, pp. 338–353, 1965.
[36] J. P. Crimaldi, M. B. Wiley, and J. R. Koseff, “The relationship between mean and instantaneous structure in turbulent passive scalar plumes,” J. Turbulence, vol. 3, no. 14, pp. 1–24, 2002.
[37] J. Elkinton, R. Cardé, and C. Mason, “Evaluation of time-average dispersion models for estimating pheromone concentration in a deciduous forest,” J. Chem. Ecol., vol. 10, no. 7, pp. 1081–1108, 1984.
[38] R. T. Cardé and M. A. Willis, “Navigational strategies used by insects to find distant, wind-borne sources of odor,” J. Chem. Ecol., vol. 34, no. 7, pp. 854–866, 2008.
[39] S. Shigaki, T. Sakurai, N. Ando, D. Kurabayashi, and R. Kanzaki, “Time-varying moth-inspired algorithm for chemical plume tracing in turbulent environment,” IEEE Robot. Autom. Lett., vol. 3, no. 1, pp. 76–83, Jan. 2018.
[40] B. K. Bose, “Expert system, fuzzy logic, and neural network applications in power electronics and motion control,” Proc. IEEE, vol. 82, no. 8, pp. 1303–1323, 1994.
[41] S. Ross, J. Pineau, S. Paquet, and B. Chaib-Draa, “Online planning algorithms for POMDPs,” J. Artif. Intell. Res., vol. 32, pp. 663–704, 2008.
[42] Q. Lu, Q.-L. Han, and S. Liu, “A cooperative control framework for a collective decision on movement behaviors of particles,” IEEE Trans. Evol. Comput., vol. 20, no. 6, pp. 859–873, Dec. 2016.

Lingxiao Wang (Student Member, IEEE) received the B.S. degree in electrical engineering from the Civil Aviation University of China, Tianjin, China, in 2015 and the M.S. degree in electrical and computer engineering from Embry-Riddle Aeronautical University, Daytona Beach, FL, USA, in 2017. He is currently working toward the Ph.D. degree with the Department of Electrical Engineering and Computer Science, Embry-Riddle Aeronautical University.
His current research interests include autonomous systems, olfactory-based navigation methods, and artificial intelligence.

Shuo Pang (Member, IEEE) received the B.S. degree in electrical engineering from Harbin Engineering University, Harbin, China, in 1997, and the M.S. and Ph.D. degrees in electrical engineering from the University of California, Riverside, CA, USA, in 2001 and 2004, respectively.
He is currently an Associate Professor with the Department of Electrical Engineering and Computer Science, Embry-Riddle Aeronautical University, Daytona Beach, FL, USA. His research spans theoretical-algorithm development and application-driven intelligent systems. His current research interests also include embedded systems, robotics, and artificial intelligence techniques for autonomous vehicles, such as autonomous vehicle chemical plume tracing, autonomous vehicle online mapping, and planning.

Jinlong Li is currently working toward the Ph.D. degree in naval architecture and ocean engineering with Shanghai Jiao Tong University, Shanghai, China.
His current research interests include online mapping and planning in chemical plume tracing via an autonomous vehicle and computational fluid dynamics.