Human-Robot Collaborative Minimum Time Search through Sub-priors in Ant Colony Optimization

Oscar Gil and Alberto Sanfeliu

Work supported under the European project CANOPIES with grant number H2020-ICT-2020-2-101016906 and JST Moonshot R&D Grant Number JPMJMS2011-85. The authors are with the Institut de Robòtica i Informàtica Industrial (CSIC-UPC), Llorens Artigas 4-6, 08028 Barcelona, Spain. {ogil, sanfeliu}@iri.upc.edu

Abstract

Human-Robot Collaboration (HRC) has become a highly promising research area owing to recent breakthroughs in Artificial Intelligence (AI) and Human-Robot Interaction (HRI), among other reasons. This growth increases the need for multi-agent algorithms that can also handle human preferences. This paper presents an extension of the Ant Colony Optimization (ACO) meta-heuristic to solve the Minimum Time Search (MTS) task when humans and robots search for an object together. The proposed model consists of two main blocks. The first is a convolutional neural network (CNN) that provides the prior probabilities of where an object may be from a segmented image. The second is the Sub-prior MTS-ACO algorithm (SP-MTS-ACO), which takes as inputs the prior probabilities and the particular search preferences of the agents, expressed as different sub-priors, to generate search plans for all agents. The model has been tested in real experiments for the joint search of an object through a Vizanti web-based visualization on a tablet computer. The designed interface allows communication between a human and our humanoid robot, named IVO. The results show an improvement in the users' perception of the search without loss of efficiency.

I INTRODUCTION

For thousands of years, humankind has relied on collaboration between individuals to perform tasks as well as possible in a wide variety of situations. Today, the increasing use of robots in many settings and for many tasks, for instance in assistive robotics [1] or educational robotics [2], makes it increasingly useful to improve Human-Robot Interaction (HRI) and Human-Robot Collaboration (HRC) systems [3].

Social-aware robot navigation [4] and path planning algorithms become requirements for HRC in cases where robot navigation is involved. In these cases, the communication between a robot and humans can be implicit or explicit and the participants can take different roles to accomplish the task. Side-by-side navigation [5], human-robot handover [6] and object transportation [7] are typical cases that involve implicit or explicit communication between agents.

However, there is a lack of HRC in most multi-agent systems that play an essential role in areas like Search and Rescue (SAR) [8], where a collaborative group of robots tries to find a target in an environment. These environments can require different types of robots like unmanned underwater vehicles (UUVs) [9], unmanned aerial vehicles (UAVs) [10], or unmanned ground vehicles (UGVs). Very few approaches include this collaboration in the task [11, 12].

Figure 1: Human-Robot Collaborative Search with the IVO robot. The left picture shows the IVO robot and a person searching for an object. The right picture shows the tablet computer interface with the search plans information and the human participant’s preferences.

This work tackles the human-robot collaborative search for an object in urban environments from the point of view of Probabilistic Search (PS) and Optimal Search Theory [13]. To this end, the approach presented in this work is based on the Ant Colony Optimization (ACO) meta-heuristic, used in [14] to solve the Minimum Time Search (MTS) problem by minimizing the Expected Time (ET) to find an object. The human preferences in the task are taken into account through a Human-Informed Robot Planning system to obtain plans for all agents. Approaches to other tasks have been shown to benefit from this type of planning [15, 16]. To simplify the problem in this work, the implementation of the algorithm assumes the same dynamic and sensing capabilities for all agents but different preferences.

The main novelties and contributions of this work are:

  • A new version of an MTS-ACO algorithm has been developed that takes into account learned human preferences to create search plans and allows adaptation to particular humans with a novel Human-Informed Planning using sub-priors.

  • A Probabilistic Map Predictor based on a Convolutional Neural Network (CNN) has been developed that can learn about the most likely areas where people would look for lost objects using small datasets.

  • An HRC system with an interface that ensures communication between multiple agents and devices (refer to Fig. 1).

  • An evaluation of a real case of HRC between a person and a robot to test the viability of the proposed methods and measure the participants’ perception.

The remainder of this paper is organized as follows. In Sec. II, the related work is introduced. Sec. III describes the theoretical approaches. In Sec. IV, the simulation results are presented. In Sec. V, the real-life experiment results are presented. Finally, in Sec. VI, the conclusions are provided.

II BACKGROUND

In this section, the ACO meta-heuristic, which is used for the approach presented here, is briefly explained.

II-A The ACO Meta-heuristic

ACO is a bio-inspired meta-heuristic proposed by M. Dorigo [17] to solve NP-hard combinatorial optimization problems. It has been widely applied to the Travelling Salesman Problem (TSP). The algorithm simulates the foraging strategy of ants through a mathematical model. Ants try different routes in a graph, $G=(C,L)$, to find food in the set of nodes $C$ and deposit pheromones, $\tau_{ij}$, each time they take an arc $(i,j)$, which connects nodes $i$ and $j$ and belongs to the set $L$. For each ant, pheromones increase the probability of choosing the shorter arcs, so arcs with more pheromones are visited more often. After a while, most of the ants travel along the shorter or optimal route. In addition, a pheromone evaporation rate, $\rho$, makes it possible to find new, better routes:

$$\tau_{ij}\longleftarrow(1-\rho)\,\tau_{ij}\quad\forall(i,j)\in L \tag{1}$$

Additionally, ACO includes a heuristic, $\eta_{ij}$, to encourage the best arcs during specific optimization processes. Taking into account the pheromones and the heuristic, the probability that an ant $k$ chooses an arc $(i,j)$ is:

$$p_{ij}^{k}=\frac{[\tau_{ij}]^{\alpha}[\eta_{ij}]^{\beta}}{\sum\limits_{l=1}^{C_{i}}[\tau_{ij}]^{\alpha}[\eta_{ij}]^{\beta}} \tag{2}$$

where $C_{i}$ is the number of available nodes from node $i$, and the parameters $\alpha$ and $\beta$ set the relative influence of the pheromones and the heuristic.
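The selection rule in (2) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names, the dictionary-based representation of the arcs leaving node $i$, and the default values of `alpha` and `beta` are assumptions, and the sum in the denominator is read as running over the arcs from $i$ to each of its available nodes.

```python
import random


def transition_probabilities(tau_i, eta_i, alpha=1.0, beta=2.0):
    """Probability of moving from node i to each available node (Eq. 2).

    tau_i, eta_i: dicts mapping each available neighbour j to the pheromone
    and heuristic value of arc (i, j). Names and defaults are hypothetical.
    """
    weights = {j: (tau_i[j] ** alpha) * (eta_i[j] ** beta) for j in tau_i}
    total = sum(weights.values())
    if total == 0.0:
        # Degenerate case (all weights zero): fall back to a uniform choice.
        return {j: 1.0 / len(weights) for j in weights}
    return {j: w / total for j, w in weights.items()}


def choose_next_node(tau_i, eta_i, alpha=1.0, beta=2.0, rng=random):
    """Sample the next node of an ant according to the probabilities above."""
    probs = transition_probabilities(tau_i, eta_i, alpha, beta)
    nodes = list(probs)
    return rng.choices(nodes, weights=[probs[j] for j in nodes], k=1)[0]
```

Larger values of `alpha` make the ants follow the pheromone more closely, while larger values of `beta` give more weight to the heuristic.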

There are different ACO algorithms [18] designed to improve the optimization aspects. Some approaches, like Ant System (AS), MAX-MIN Ant System (MMAS) [19], and Ant Colony System (ACS), have been designed to optimize in discrete spaces using graphs. Other approaches, such as ACO for continuous domains (ACO$_{\mathbb{R}}$), can be used in continuous spaces. This work is centered on the MMAS algorithm due to the promising results obtained with UAVs [14].

Each approach has different equations to deposit pheromones. The rule used to deposit pheromones after the evaporation in MMAS is:

$$\tau_{ij}\longleftarrow\tau_{ij}+\Delta\tau_{ij}^{best} \tag{3}$$

where $\Delta\tau_{ij}^{best}=1/C_{b}$ is the pheromone applied to the arcs of the path with the minimum value of the cost function, $C_{b}$. In each optimization iteration, a group of ants produces different paths with different costs. The pheromone update can be randomly performed using the best iteration cost or the best-so-far cost.
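A minimal sketch of one pheromone update, combining the evaporation in (1) with the deposit in (3), could look as follows. The function name, the dictionary of arcs and the parameter values are assumptions made for illustration. The clamping to `[tau_min, tau_max]` is the standard MMAS bound on pheromone values; it is not described in the text above and its limits here are arbitrary.

```python
def mmas_update(tau, best_path, best_cost, rho=0.1, tau_min=0.01, tau_max=10.0):
    """Apply evaporation (Eq. 1) and the best-path deposit (Eq. 3) in place.

    tau: dict mapping arcs (i, j) to pheromone values.
    best_path: list of nodes of the path selected for the update.
    best_cost: cost C_b of that path (must be positive).
    rho, tau_min, tau_max: hypothetical values, not taken from the paper.
    """
    # Evaporation on every arc of the graph.
    for arc in tau:
        tau[arc] *= (1.0 - rho)
    # Deposit 1 / C_b on the arcs of the selected path.
    deposit = 1.0 / best_cost
    for arc in zip(best_path, best_path[1:]):
        tau[arc] = tau.get(arc, 0.0) + deposit
    # Standard MMAS clamping of the pheromone values.
    for arc in tau:
        tau[arc] = min(tau_max, max(tau_min, tau[arc]))
    return tau
```

Passing either the best path of the current iteration or the best path found so far as `best_path` selects between the two update variants mentioned above.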

III RELATED WORK

In this section, an overview of the relevant topics related to this work is provided. These topics are a description of Probabilistic Optimal Search and different methods for Human-Robot Collaborative Search.

III-A PS algorithms

PS is an area that considers probabilistic maps of the environment to find a target [13]. It has been broadly applied for military purposes and in SAR to find lost people using UAVs or UGVs. In [20], a Bayesian perspective with a greedy algorithm is proposed for maritime environments. Most models have adopted this Bayesian outlook using different optimization algorithms [21, 22, 14, 23]. The utility functions optimized by these methods are normally the "cumulative" probability of detection or the Expected Time (ET) to find the target. Sometimes other functions, such as the expended energy or the number of collisions, are also used. When ET is the main criterion, the search is an MTS problem.

III-B Human-Robot Collaborative Search

Some approaches consider HRC to find objects or people. In SAR, human-robot teams normally use these approaches to search for lost people. In [24], a web interface is used to manage the search task assignment for the teams. This interface offers autonomous partitioning to assign tasks and allows the users to change it. In [25], UAVs search for people in disaster environments with two systems: one semi-autonomous and one fully autonomous. In the semi-autonomous system, Human-Informed Robot Planning is performed, in which human preferences can modify the robot's plan.

Specifically, in [12], a robot and a person share a task representation through a smartphone interface [26] to perform the search together. The approach uses different Social Reward Sources to enable the HRI during the task. These rewards are used to construct an objective function that is optimized online with a Monte Carlo Tree Search planner using Rapidly-exploring Random Trees for each agent. This approach considers only uniform probability maps and does not consider segmented areas or the ET as a criterion.

Unlike our approach, the aforementioned methods do not combine previous knowledge about where the object could be lost, individual human preferences and the ET criterion.

IV OUR APPROACH

IV-A Problem Formulation

This approach is focused on the human-robot collaborative search for a lost object outdoors. The prior information used to find the object is a segmented top-view image where approximately equidistant nodes are sampled to build a graph $G$, considering the map obstacles. To sample the nodes, the space is divided into a grid of squares. The nodes are sampled at the centroid of the squares if there is no obstacle within 40 cm; otherwise, they are sampled at the centroid of the area not covered by the obstacle in that square or at a vertex of the square. This method ensures better exploration close to obstacle edges than uniform sampling. The segmented image is used to obtain a prior probability map $p({\bf x}_{0}^{t})$ of the target location in a 2D map, ${\bf x}_{0}^{t}=(x_{0}^{t},y_{0}^{t})$, at step $k=0$.

During the search process, the $M$ agents can only move in $G$, and a static target is considered, so the probabilistic Markov model for the target is $p({\bf x}_{k}^{t}|{\bf x}_{k-1}^{t})=\mathbb{I}$. At each step $k$, agents can perform an observation ${\bf z}_{k}$, and the probability map is updated using Bayes' rule and the previous observations ${\bf z}_{1:k-1}$:

$$p({\bf x}_{k}^{t}|{\bf z}_{1:k})=\frac{p({\bf z}_{k}|{\bf x}_{k}^{t})\,p({\bf x}_{k}^{t}|{\bf z}_{1:k-1})}{\int p({\bf z}_{k}|{\bf x}_{k}^{t})\,p({\bf x}_{k}^{t}|{\bf z}_{1:k-1})\,d{\bf x}_{k}^{t}} \tag{4}$$

where $p({\bf z}_{k}|{\bf x}_{k}^{t})$ is the observation model. To simplify the problem, an ideal circular sensor model is assumed. In this model, the probability of detecting, at step $k$, an object located in the free space is:

$$p({\bf z}_{k}=D_{k}|{\bf x}_{k}^{t})=I_{A_{w}}H(R_{w}-r_{w}) \tag{5}$$

where $D_{k}$ is a detection at step $k$, $H$ is the Heaviside function, $R_{w}$ is the visibility radius considered for agent $w$, and $r_{w}$ is the Euclidean distance (ED) between the sensor of agent $w$ and ${\bf x}_{k}^{t}$. $I_{A_{w}}$ is the indicator function for agent $w$ of the non-occluded area, $A_{w}$, delimited by $R_{w}$. The agent $w$ is defined as $w=\min\{1\leq m\leq M\,|\,I_{A_{m}}(R_{m}-r_{m})>0\}$, where $r_{m}$ is the ED between the sensor of agent $m$ and ${\bf x}_{k}^{t}$.

This sensor model accounts for obstacles and occlusions through $I_{A_{m}}$. To compute $A_{m}$, ray tracing is performed so that the occluded area and the obstacle area are excluded from $A_{m}$.
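On a discretized map, the update in (4) with the ideal sensor in (5) reduces to zeroing the cells that the agents observe without detecting the object and, if needed, renormalizing. The sketch below illustrates this under several assumptions: the map is a dictionary of cells, all agents share one radius, the Heaviside function is taken as 1 at zero, and the ray-traced occlusion test is replaced by a hypothetical `visible` callback that defaults to "always visible".

```python
import math


def detection_likelihood(cell, agents, radius, visible=lambda agent, cell: True):
    """Ideal circular sensor (Eq. 5): 1 if some agent sees the cell, else 0."""
    for agent in agents:
        if math.dist(agent, cell) <= radius and visible(agent, cell):
            return 1.0
    return 0.0


def update_after_no_detection(prob_map, agents, radius,
                              visible=lambda agent, cell: True):
    """Bayes update (Eq. 4) for a step in which nothing was detected.

    prob_map: dict mapping a cell (x, y) to its probability.
    Returns the unnormalized map and the normalized posterior.
    """
    unnormalized = {
        cell: p * (1.0 - detection_likelihood(cell, agents, radius, visible))
        for cell, p in prob_map.items()
    }
    total = sum(unnormalized.values())
    posterior = {cell: (p / total if total > 0.0 else 0.0)
                 for cell, p in unnormalized.items()}
    return unnormalized, posterior
```

The unnormalized map is returned as well because the probability of detection at each step is later computed from it.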

When some agents are humans, other models for object detection that combine the human field of view [27] with an estimation of how humans spin their heads while searching could be considered to obtain more realistic results. This is out of this work’s scope. For this reason, a circular detection model is enough to evaluate how the approach proposed here considers human preferences in simulated and real cases.

The goal of this approach is to find the optimal paths for the agents in the graph that minimize the ET (solve the MTS problem). This expectation is defined by:

$$ET=\sum_{k=1}^{\infty}k\,p_{k} \tag{6}$$

where $p_{k}$ is the probability of finding the object at step $k$ if it has not been previously detected, ${\bf z}_{1:k-1}={\overline{D}}_{1:k-1}$, in the 2D map $S$. $p_{k}$ can be computed with $\tilde{p}$, the unnormalized version of the probability map, as in [14]:

$$p_{k}=\int_{S}p({\bf z}_{k}=D_{k}|{\bf x}_{k}^{t})\,\tilde{p}({\bf x}_{k}^{t}|{\overline{D}}_{1:k-1})\,d{\bf x}_{k}^{t}\quad\forall{\bf x}_{k}^{t}\in S \tag{7}$$

The ET computation has to be limited to a finite horizon $N$. This approximation, applied to (6), does not guarantee that an optimal path will be obtained if $N$ is not large enough to reduce the probability of the map to zero. For this reason, a different way to compute the ET, derived in [13], is used to obtain optimal paths for arbitrary horizons:

$$ET=\sum_{k=1}^{N}[1-P(t\leq k)]\,\Delta t \tag{8}$$

where $P(t\leq k)=\sum_{t=1}^{k}p_{t}$ is the cumulative probability of finding the object during the steps $t\leq k$, and $\Delta t$ is the time between steps, which is taken to be 1 in this formulation.
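As a small worked illustration of (8), the following sketch computes the truncated ET from a list of per-step detection probabilities $p_{1},\dots,p_{N}$. The function name and the list-based input are assumptions made for the example.

```python
from itertools import accumulate


def expected_time(p_steps, dt=1.0):
    """Truncated expected time of Eq. (8).

    p_steps: detection probabilities p_1 .. p_N along a candidate plan.
    dt: time between steps (1 in the formulation above).
    """
    cumulative = accumulate(p_steps)  # P(t <= k) for k = 1 .. N
    return sum((1.0 - P_k) * dt for P_k in cumulative)
```

For example, `expected_time([0.5, 0.5])` gives 0.5, whereas a plan that only finds the object at the last of three steps, `expected_time([0.0, 0.0, 1.0])`, gives 2.0, so earlier detections are rewarded. When the probabilities sum to one within the horizon and $\Delta t=1$, this value equals the sum in (6) minus one step, so both expressions rank such plans identically.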

IV-B System Overview

Figure 2: System-Overview. The Segmented Map is used to predict the Probability Map and generate the restricted areas. These elements combined with the preferred areas are used to obtain the optimal paths with the Sub-Prior MTS ACO in a common representation for the IVO robot and Humans.

The whole centralized model is shown in Fig. 2. There are 2 main blocks explained in the next subsections: the Probabilistic Map Predictor and the Sub-Prior MTS-ACO algorithm.

The HRC is defined in this system through the next steps:

  • Communication of preferred areas: The human provides the robot with the preferred areas.

  • The robot provides the search plans: The robot provides the plans that consider the preferred areas.

  • Confirmation: The human can agree and confirm to start the search or return to the first step to get another plan.

IV-C Probabilistic Map Predictor

The model used to perform the prediction is a CNN with dense blocks, designed in [28] to predict occupancy grids. As a first step, different patches, ${\bf X}_{p}$, are obtained from a segmented bird's-eye view image, $I_{s}$, with 14 representative semantic classes of urban environments. During the training process, the CNN uses as ground truth a probability map $p_{GT}({\bf x}_{0}^{t})$ associated with a segmented area and takes the patches as inputs:

$${\bf Y}_{p}=\mathrm{CNN}({\bf X}_{p}) \tag{9}$$

The output layer provides patches, ${\bf Y}_{p}$, where each pixel is the output of a spatial softmax function and represents the probability of finding a lost object. Then, the probability map, $p({\bf x}_{0}^{t})$, is reconstructed from the patches by averaging them where they intersect.
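The reconstruction step can be pictured with the following sketch, which places each output patch at its position in the full map and averages the pixels covered by more than one patch. The patch representation (top-left corner plus a 2-D list of values) and the function name are assumptions; whether the paper renormalizes the merged map afterwards is not stated, so no renormalization is applied here.

```python
def reconstruct_map(patches, height, width):
    """Merge overlapping patches by averaging the pixels where they intersect.

    patches: list of (row, col, values), where (row, col) is the top-left
    corner of the patch in the full map and values is a 2-D list.
    Pixels covered by no patch are left at 0.
    """
    total = [[0.0] * width for _ in range(height)]
    count = [[0] * width for _ in range(height)]
    for row, col, values in patches:
        for dr, line in enumerate(values):
            for dc, value in enumerate(line):
                total[row + dr][col + dc] += value
                count[row + dr][col + dc] += 1
    return [[total[r][c] / count[r][c] if count[r][c] else 0.0
             for c in range(width)]
            for r in range(height)]
```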

Figure 3: Labelling Interface for Users. Participants select the areas by marking the vertices of a polygon with the computer mouse over the segmented image until the polygon is closed. The semantic classes are shown in the image.

To obtain a dataset to train the model, an interface was designed with the Tkinter Python library in which 16 people selected, for 22 segmented images, the areas where they would look for the object (refer to Fig. 3). Since the number of images is small, data augmentation is carried out to obtain additional patches with which to train the model and to reach a low error in validation and testing.

The target considered is an object the size of a smartphone. This specification is given to the participants at the beginning and conditions the areas marked by the participants to this type of object. Participants are encouraged to select only the first areas in which they would search. They are not asked to select a specific number of areas or to do so in any order of preference. Nor are they limited in the size of the areas they select. During the process, they can also see in the interface the real top-view image corresponding to the segmented image and the 14 classes. The average of the selected areas for each map is normalized and used as $p_{GT}({\bf x}_{0}^{t})$.

IV-D Sub-prior MTS-ACO

This model takes $p({\bf x}_{0}^{t})$ and the segmented map as inputs to generate the agents' optimal paths. Unlike [14], restricted areas are considered for each agent depending on their traversability limitations. To consider these areas, a different spatial graph is built for each agent. These graphs are constructed from the original graph (the one without restricted areas) by removing the nodes that lie in the restricted areas of each agent.
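Building the per-agent graphs amounts to a node filter on the original graph. A minimal sketch, assuming the graph is stored as an adjacency dictionary and each agent's restricted nodes are already known (both representations are hypothetical):

```python
def agent_graph(graph, restricted_nodes):
    """Return a copy of the graph without the agent's restricted nodes.

    graph: dict mapping each node to the list of its neighbours.
    restricted_nodes: iterable of nodes the agent cannot traverse.
    """
    restricted = set(restricted_nodes)
    return {
        node: [n for n in neighbours if n not in restricted]
        for node, neighbours in graph.items()
        if node not in restricted
    }
```

Calling this once per agent yields one graph per agent while leaving the original graph untouched.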

Another difference from ACO algorithms such as the one in [14] is that, to consider individual preferences in the search process without losing the common objective, a different probability map is considered for each agent $m$. This map is called a sub-prior, $p_{m}({\bf x}_{0}^{t})$. All the sub-priors are normalized to $1/M$, where $M$ is the number of agents. The global probability map, or prior distribution, is then the sum of the sub-priors.
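The normalization of the sub-priors and their sum into the global prior can be sketched as follows. The dictionary-of-cells representation and the function name are assumptions made for illustration; how each agent's raw map is obtained from the preferred areas is outside this sketch.

```python
def normalize_sub_priors(sub_priors):
    """Scale each agent's map to sum to 1/M and build the global prior.

    sub_priors: list with one dict per agent, mapping cells to
    non-negative weights. Returns (scaled sub-priors, global prior).
    """
    M = len(sub_priors)
    scaled = []
    for sub_prior in sub_priors:
        total = sum(sub_prior.values())
        if total <= 0.0:
            raise ValueError("each sub-prior needs some positive mass")
        scaled.append({cell: v / (total * M) for cell, v in sub_prior.items()})
    # The global prior is the cell-wise sum of the scaled sub-priors.
    cells = set().union(*scaled)
    prior = {cell: sum(sp.get(cell, 0.0) for sp in scaled) for cell in cells}
    return scaled, prior
```

Because every sub-prior carries a mass of $1/M$, the global prior sums to one regardless of how the agents' areas overlap.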

Figure 4: Gaussian sub-priors. The left image shows a probability map of 2 Gaussian functions in blue. The white lines inside the Gaussians are obstacles where the probability is zero. The right images are the 2 Gaussian functions separately as sub-priors to allow the distribution of the search task between 2 agents.

In Fig. 4, an example of sub-priors is presented. In this case, a prior probability map with two 2-D Gaussian functions is divided between 2 agents into 2 sub-priors. In multi-agent systems with humans, the criterion used to perform this division depends on the negotiation process between the agents.

In contrast to the original ACO, as a consequence of having a different graph and a different probability map for each agent, this approach considers a different pheromone matrix for each agent, $\tau_{ij}^{m}$. As a result, the probabilities $p_{ij}^{km}$, calculated with (2), depend on the agent $m$. A condition is imposed in the optimization process to encourage paths of similar length in a not very restrictive way. In each step, an ant $k$ has to choose between two options. The first is to choose a node for the agent with the shortest path. The second is to choose a node for another, randomly selected agent. The probability of choosing the first option is higher than that of the second. This condition is imposed because, in the optimization process, all the search agents are assumed to have the same dynamic and sensing capabilities.
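The path-balancing condition can be sketched as a biased choice of which agent's path the ant extends next. The function name and the value of `p_shortest` are assumptions: the text only states that the first option is more likely than the second, without giving the probability.

```python
import random


def pick_agent_to_extend(path_lengths, p_shortest=0.8, rng=random):
    """Pick which agent's path the ant extends in the current step.

    path_lengths: list with the current path length of each agent.
    p_shortest: probability of extending the shortest path; the value is
    hypothetical and only needs to be above 0.5 to favour that option.
    """
    agents = range(len(path_lengths))
    shortest = min(agents, key=path_lengths.__getitem__)
    others = [m for m in agents if m != shortest]
    if not others or rng.random() < p_shortest:
        return shortest
    return rng.choice(others)
```

Once the agent $m$ is picked, the next node of its path would be sampled with (2) using that agent's own graph and pheromone matrix $\tau_{ij}^{m}$.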

Optionally, a heuristic matrix, $\eta_{ij}^{m}$, can be considered for each agent in cases where the heuristic depends on the agent. This occurs, for example, when the MTS heuristic proposed in [14] is used with this approach.

When an ant generates the agents' paths, $p_{k}$ is computed with the sub-priors of each agent, $p_{m}({\bf x}_{k}^{t}|{\overline{D}}_{1:k-1})$:

$$p_{k}=\sum_{m=1}^{M}\int_{S}I_{A_{m}}H(R_{m}-r_{m})\,\tilde{p}_{m}({\bf x}_{k}^{t}|{\overline{D}}_{1:k-1})\,d{\bf x}_{k}^{t} \tag{10}$$

The ET computed in this way, with the unnormalized sub-priors, p~msubscript~𝑝𝑚\tilde{p}_{m}over~ start_ARG italic_p end_ARG start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT, can be called Expected Sub-prior Time (EST) and it is normally different from the one computed in (8), leading to different optimal paths in the optimization.

Figure 5: Sub-prior MTS-ACO planning of 2 agents exchanging sub-priors. This figure presents the search plans for 2 agents (red and blue) in 2 different cases for the same map exchanging their sub-priors.

Intuitively, the sub-priors represent areas that guide the formation of the agents' paths. The EST represents an ET constrained by the condition that each agent visits his or her sub-prior.
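As a rough illustration of (10), the following Python sketch evaluates a discretized $p_k$ for one time step. It is only a sketch under stated assumptions: the map is a regular grid, the integral becomes a sum over cells, the ideal sensor $H(R_m - r_m)$ is a hard distance threshold, the indicator $I_{A_m}$ is taken as 1 everywhere, and each agent only removes mass from its own sub-prior. The names `detection_step` and `cell_xy` and the 3x3 example map are hypothetical.

```python
import numpy as np


def detection_step(sub_priors, positions, radius, cell_xy):
    """Discrete analogue of (10) for one time step k.

    sub_priors: list of M arrays (rows, cols), unnormalized sub-prior per agent.
    positions:  list of M (x, y) agent positions at step k.
    radius:     visibility radius R_m (same for all agents here, an assumption).
    cell_xy:    array (rows, cols, 2) with the coordinates of each cell centre.
    Returns p_k and the sub-priors with the observed mass removed.
    """
    p_k = 0.0
    updated = []
    for belief, pos in zip(sub_priors, positions):
        r = np.linalg.norm(cell_xy - np.asarray(pos, dtype=float), axis=-1)
        seen = r <= radius            # H(R_m - r_m): ideal sensor
        p_k += belief[seen].sum()     # mass of agent m's own sub-prior detected
        remaining = belief.copy()
        remaining[seen] = 0.0         # non-detection update of that sub-prior
        updated.append(remaining)
    return p_k, updated


# Hypothetical 3x3 map with a uniform prior split evenly between 2 agents.
xs = np.arange(3, dtype=float)
cell_xy = np.stack(np.meshgrid(xs, xs, indexing="ij"), axis=-1)
prior = np.full((3, 3), 1.0 / 9.0)
sub_priors = [prior / 2, prior / 2]
p_k, sub_priors_next = detection_step(sub_priors, [(0, 0), (2, 2)], 1.0, cell_xy)
```

In this toy case each agent sees 3 cells of its own sub-prior, so the step collects one third of the total probability. Because every agent integrates only its own $\tilde{p}_m$, a path that stays far from the assigned sub-prior collects little mass, which is what pulls each path towards its area.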

Fig. 5 shows an example of how the sub-priors guide the generation of the paths. In that figure, different search plans computed using the Sub-prior MTS-ACO (specifically, using the MMAS algorithm) are represented for 2 agents in the same map. When the upper-right Gaussian is assigned to the red agent, the Sub-prior MTS-ACO generates a red path close to that area. When the central Gaussian is assigned to the red agent, the red path is around that area and the blue path is around the upper-right Gaussian.

The Sub-prior MTS-ACO has a computational complexity of $O(IC^{2})$. The cost depends on the number of iterations, $I$, and the number of nodes, $C$. The number of agents only affects the EST computation because the number of sub-priors that are updated is the same as the number of agents. For the same search map, more agents do not add a significant computational cost. On the other hand, a larger map considerably increases the number of nodes needed, which is an important limitation in terms of Computation Time (CT).
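To make the dependence on the map size concrete, consider, as an illustrative assumption that is not taken from the experiments, a square map of side $L$ discretized with a grid distance $d$:

```latex
C \approx \left(\frac{L}{d}\right)^{2}
\quad\Longrightarrow\quad
O\!\left(IC^{2}\right) = O\!\left(I\,\frac{L^{4}}{d^{4}}\right)
```

Under this assumption, doubling the side of the map at a fixed grid distance multiplies the cost of each iteration by roughly 16, whereas adding agents leaves $C$ unchanged.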

Although the algorithm assumes a static environment, it can be used in environments with people or other relatively small elements that do not cause significant occlusions, provided that the agents have obstacle avoidance capabilities.

V SIMULATIONS AND REAL-LIFE EXPERIMENTS

Figure 6: Simulation maps. (a) M1 map, 40x40 m. (b) M2 map, 40x40 m. (c) Uniform probability map of M2. (d) Gaussian probability map of M2. M1 and M2 are shown respectively in the top images from left to right. The bottom maps are the uniform and Gaussian probability maps of M2. The agents' paths are in blue and red.

V-A Sub-prior MTS-ACO Results

The Sub-prior MTS-ACO model has been validated through simulations in different maps using the MMAS algorithm and 2 heuristic functions: the one used to solve the TSP, $\eta_{TSP} = 1/d_{ij}$, which depends on the distance between nodes, and the one used in [14], $\eta_{MTS}$, which considers regions of probability in different directions. The comparison between heuristics has been performed in 2 different maps, M1 and M2 (refer to Fig. 6). The metrics used to compare the heuristics are the ET, the CT, and the Path Distance (PD). The comparison is also performed with and without sub-priors.
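The TSP heuristic only needs the pairwise distances between nodes, so it can be computed once before the optimization. A minimal sketch, assuming nodes given as 2D coordinates (the function name `tsp_heuristic` and the example nodes are hypothetical):

```python
import numpy as np


def tsp_heuristic(nodes_xy):
    """Return eta with eta[i, j] = 1 / d_ij and 0 on the diagonal."""
    nodes_xy = np.asarray(nodes_xy, dtype=float)
    diff = nodes_xy[:, None, :] - nodes_xy[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)   # Euclidean distance d_ij
    eta = np.zeros_like(dist)
    off_diag = dist > 0                    # avoid dividing by zero for i == j
    eta[off_diag] = 1.0 / dist[off_diag]
    return eta


eta = tsp_heuristic([(0.0, 0.0), (3.0, 4.0), (0.0, 2.0)])
```

The matrix is symmetric and independent of the probability map, in contrast with $\eta_{MTS}$, which depends on where the remaining probability lies.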

TABLE I: ET, computation time (CT) and path distance (PD) obtained in 400 generations of 10 ants using the 2 heuristics in M1 and M2 with the uniform (U) and the Gaussian (G) probability maps, adding sub-priors (S).
Map     | ET (s) TSP | ET (s) MTS | CT (s) TSP | CT (s) MTS | PD (m) TSP | PD (m) MTS
M1/U    | 80.98      | 94.60      | 44         | 168        | 371.42     | 839.45
M1/U/S  | 84.14      | 116.95     | 47         | 184        | 361.05     | 730.43
M1/G    | 39.33      | 37.01      | 44         | 156        | 419.67     | 785.91
M1/G/S  | 38.77      | 35.15      | 48         | 184        | 416.46     | 828.24
M2/U    | 142.23     | 165.32     | 116        | 460        | 673.88     | 1202.69
M2/U/S  | 155.47     | 237.15     | 123        | 560        | 685        | 1129.86
M2/G    | 75.04      | 92.53      | 112        | 452        | 621.16     | 1098.57
M2/G/S  | 80.50      | 86.32      | 136        | 544        | 655.41     | 1176.21

The mean results of 10 trials (see Table I) show that the TSP heuristic outperforms the MTS heuristic in uniform probability maps in terms of ET. The MTS heuristic only gives slightly better results in the case where the probability is concentrated in 2 gaussians and the map has no obstacles (M1). In all the cases, the CT is much lower for the TSP heuristic because it is not necessary to compute the heuristic for each optimization step and the PD is also lower. For these reasons, the TSP heuristic is chosen to perform real-life experiments. The results in CT are shown without parallelization of the algorithm.

The sub-priors in the uniform probability map are $p(\mathbf{x}_0^t)/2$. In the Gaussian probability map, each sub-prior is a 2D Gaussian assigned to the closest agent. When the sub-priors are used, the results show a small increase in ET and CT. In the case of the 2 Gaussians in M1, the ET is lower because the Gaussians are associated with the closest agent. If the agents' sub-priors are exchanged, the ET increases considerably. These cases are sub-optimal compared with using only the MMAS because the optimum is conditioned on the sub-priors and the optimal EST is not exactly the optimal ET. Nevertheless, the sub-priors make it possible to take human preferences into account without affecting the performance too much.
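The two ways of building sub-priors described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the start positions and the Gaussian centres are hypothetical, and "closest" is assumed to mean the Euclidean distance from each agent's start position to the Gaussian mean.

```python
import numpy as np


def uniform_sub_priors(prior, n_agents):
    """Split a prior evenly: each agent receives prior / n_agents."""
    return [prior / n_agents for _ in range(n_agents)]


def assign_gaussians(gaussian_means, agent_starts):
    """Return, for each Gaussian, the index of the closest agent."""
    means = np.asarray(gaussian_means, dtype=float)
    starts = np.asarray(agent_starts, dtype=float)
    dist = np.linalg.norm(means[:, None, :] - starts[None, :, :], axis=-1)
    return dist.argmin(axis=1)


# Hypothetical 40x40 m map with 2 agents starting in opposite corners.
halves = uniform_sub_priors(np.full((2, 2), 0.25), 2)
owners = assign_gaussians([(30.0, 30.0), (15.0, 20.0)],
                          [(0.0, 0.0), (40.0, 40.0)])
```

Here the first Gaussian is assigned to the second agent and the second Gaussian to the first agent. Exchanging the sub-priors corresponds to swapping these assignments, which forces each agent to travel to the farther Gaussian.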

V-B Real-life Experiments

To check the feasibility of this approach in a real system and the ability to manage human preferences in a searching task, real-life outdoor experiments have been performed. The experiment is a searching task carried out by our humanoid robot IVO and a person, in which they have to find a figure made up of 3 Parcheesi tiles on the ground, as in [12]. IVO is an urban land-based robot designed to interact with humans in tasks that involve object manipulation and navigation. To navigate, IVO uses four omnidirectional wheels, a 3D-LiDAR, two 2D-LiDARs, and a RealSense stereo camera to detect holes and ramps. IVO also has a touchscreen to communicate with people.

To enable communication between IVO and the person during the search process, a web-based visualization tool for the Robot Operating System (ROS) called Vizanti (footnote 1: Vizanti documentation: http://wiki.ros.org/vizanti) is used. This visualization is a user-friendly version of RVIZ that can be opened in a browser, so it can be used on multiple devices connected to the same local network.

Using Vizanti and the Sub-prior MTS-ACO algorithm, a ROS implementation has been built with 3 main nodes:

  • Search Planner: This node computes an ordered node list using the MMAS and the sub-priors.

  • Goal Sequencer: This node takes the node lists to generate waypoints as successive goals used for the ROS Navigation Stack.

  • Aco Gui Manager: This node launches the Vizanti interface and communicates with the search planner node to send the user preferences, draw the paths, and manage the experiment logic.
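The step performed by the Goal Sequencer, turning an ordered node list into successive metric goals, can be sketched in plain Python. This is only an illustration: the row-major node indexing, the function name `nodes_to_waypoints`, the grid width and the map origin are hypothetical, and the actual dispatch of each goal to the ROS Navigation Stack is omitted.

```python
def nodes_to_waypoints(node_ids, n_cols, spacing, origin=(0.0, 0.0)):
    """Map row-major grid node ids to (x, y) goals in metres."""
    ox, oy = origin
    waypoints = []
    for node in node_ids:
        row, col = divmod(node, n_cols)   # position of the node in the grid
        waypoints.append((ox + col * spacing, oy + row * spacing))
    return waypoints


# Hypothetical plan over a grid with 6 columns and 3.5 m between nodes.
goals = nodes_to_waypoints([0, 1, 7], n_cols=6, spacing=3.5)
```

Each waypoint would then be sent as the next navigation goal once the previous one has been reached.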

The experiment takes around 25-30 min and consists of an initial explanation about the task and the Vizanti interface and 2 search phases in the same map:

  • First Search: Here the MMAS without sub-priors is used to provide the agents' paths. The 'init' button is pressed in the interface to see the paths and 'start' is pressed to begin the search with the robot. The robot shows the same interface on its screen, so the buttons can also be pressed on the robot. The participant has to follow the green path while searching for the object. The participant can also see in the interface the red robot path, the robot position (red point), and their own position (green point). When the participant finds the object, he or she has to press 'object found' to communicate it, and the search finishes. If the robot finds the object, or the object is not found before the robot has finished its path, a message appears to indicate the end of the search.

  • Second Search: In this case, before the search, the participant has to press the button with a square and draw rectangles over the ROS map in the Vizanti interface. The rectangles represent the preferred areas where the person wants to search. After drawing the areas, the participant has to press ’replan’ and wait until the paths computed using the MMAS with sub-priors appear. Then, after pressing ’start’ the search begins.

Before starting the experiment, the task is explained and a sheet is provided with the segmented map and the semantic classes. Between the two rounds, the participant has to fill out the first part of a questionnaire and finish it at the end.

It is important to remark that the order of the search phases is the same for all participants to allow them to familiarize themselves with the interface at a basic level on the first search before using it to give their preferences. This order could induce a bias in the participants’ perception that could affect the questionnaire answers. More experiments in future work are required to test whether such bias exists.

The experiment has been performed with 20 participants under the approval of the ethics committee of the Universitat Politècnica de Catalunya (UPC) (footnote 2: Committee website: https://comite-etica.upc.edu/en). The volunteers are of legal age and in full use of their mental faculties. At the beginning of the experiment, they signed an informed consent form after having received the relevant information regarding the experiment. Additionally, they have accepted that all the information collected during the experiments will be treated anonymously for academic purposes.

To perform the experiment, a covered outdoor area of 21 x 27 m inside the Barcelona Robot Lab has been considered. The search area is the left part of the map shown in Fig. 1. The robot does not use a sensor to detect the object and the person is not detected because the perception systems are not in the scope of this article. For this reason, the object position is provided to ROS and the robot finds the object when its distance to the object is less than 2.5 m. The person's position during the searching task is marked by hand in Vizanti by a third person. A video with explanations about the experiment and some examples has been prepared (footnote 3: Experiment example: https://youtu.be/b0J57hXV7ic).

TABLE II: Average values and standard deviation in real experiments for different metrics to evaluate differences between the two search phases.
Metric                  | 1st search    | 2nd search
ET (s)                  | 15.78 ± 0.00  | 20.94 ± 6.56
RST (s)                 | 69.05 ± 52.40 | 86.35 ± 47.90
% Robot finds           | 35            | 15
% Person finds          | 50            | 75
% Not found             | 15            | 10
$\overline{v}_r$ (m/s)  | 0.35 ± 0.02   | 0.33 ± 0.03
$\overline{v}_p$ (m/s)  | 0.44 ± 0.15   | 0.51 ± 0.14
DD (m)                  | 0.65 ± 0.76   | 0.46 ± 0.26
% CA                    | 39.95 ± 25.47 | 70.52 ± 21.51

The results of the experiments are shown in Table II. The first metric used is the ET of the paths shown in the interface. The Real Search Time (RST) is the average time elapsed until the object is found or the timeout is reached. % Robot Finds (%RF) and % Person Finds (%PF) are respectively the percentage of times the robot and the person find the object. % Not Found (%NF) is the percentage of times the object is not found. $\overline{v}_r$ and $\overline{v}_p$ are respectively the average velocities of the robot and the person during the search task. The Divergence Distance (DD) is the average minimum distance between the plan shown in the interface and the agents' real position. The DD measures how accurately the person follows the plan displayed on the interface during the search. % Considered Areas (%CA) is the average percentage of the plan shown in the interface that is inside the preferred areas selected by the participants. The %CA indicates the extent to which the individual's preferences are taken into account in the plans displayed in the interface.
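The DD and %CA definitions can be illustrated with a short sketch. It rests on assumptions that the text does not fix: the plan is represented by a list of sampled points, the preferred areas are axis-aligned rectangles given as (xmin, ymin, xmax, ymax), and the function names and sample values are hypothetical.

```python
import numpy as np


def divergence_distance(plan_xy, track_xy):
    """Average, over tracked positions, of the distance to the nearest plan point."""
    plan = np.asarray(plan_xy, dtype=float)
    track = np.asarray(track_xy, dtype=float)
    dist = np.linalg.norm(track[:, None, :] - plan[None, :, :], axis=-1)
    return dist.min(axis=1).mean()


def considered_areas_percent(plan_xy, rectangles):
    """Percent of plan points inside any rectangle (xmin, ymin, xmax, ymax)."""
    plan = np.asarray(plan_xy, dtype=float)
    inside = np.zeros(len(plan), dtype=bool)
    for xmin, ymin, xmax, ymax in rectangles:
        inside |= ((plan[:, 0] >= xmin) & (plan[:, 0] <= xmax)
                   & (plan[:, 1] >= ymin) & (plan[:, 1] <= ymax))
    return 100.0 * inside.mean()


# Hypothetical plan, tracked positions and one preferred rectangle.
plan = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
track = [(0.0, 0.5), (2.0, 1.0)]
dd = divergence_distance(plan, track)
ca = considered_areas_percent(plan, [(1.5, -1.0, 3.5, 1.0)])
```

With these sample values the tracked positions are 0.5 m and 1.0 m away from the plan, and half of the plan points fall inside the rectangle.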

When preferred areas are provided, the ET increases because the plan is less optimal than in the first search, and the % CA also increases because the new plans consider human preferences. The RST is much longer than the ET because equal velocities of 0.5 m/s were assumed for the agents in the optimization process and the ideal sensor given by (5) was used for the agents. The results show that these assumptions are not fulfilled in this real scenario: IVO's velocity is lower than the participant's velocity for safety reasons, and the ideal sensor model is not a good approximation for people in real scenarios. The reduction in the DD during the second search can be explained by people's increased experience in locating themselves on the interface.

V-C User’s Study

A User's Study has been performed to test the following hypotheses:

  • H1 - ”Participants’ perception of IVO changes in the second search with respect to the first one.”

  • H2 - ”The HRC in the planning process to obtain the search paths improves the participants’ search experience.”

Both hypotheses are conditional on the fact that the order of the search phases is the same for all participants. Experiments in which the order of the searches is randomized could produce different results.

To obtain the participants' information and test the hypotheses, a questionnaire in Spanish and English with 5 sections has been presented:

  • Demographic Data: In this section, the participant's name, academic level and age are taken. After the experiment, the data is anonymized. The average age of participants was 28.19 years with a standard deviation of 4.35 years.

  • IVO perception after the first search: This section evaluates the participant's perception of the robot after the first search. To evaluate the perception, questions from [29] and [30] are rated on a 7-point Likert scale to evaluate 4 attributes: Warmth, Competence, Discomfort and Anthropomorphism.

  • IVO perception after the second search: This section evaluates the participant's perception of the robot after the second search. The questions are the same as in the previous section.

  • Interface perception: The fourth section evaluates the interface perception using the System Usability Scale (SUS) [31].

  • Preferred method: At the end of the questionnaire, there is a last question to select one of the 2 search methods as the preferred one.

Figure 7: Participants’ perception of IVO. The mean attribute values appear in green for the first search and brown for the second search. The error bars indicate the standard deviation, and p is the p-value of the tests.

To test the hypotheses, the second and third section results are compared. The average Cronbach's alpha obtained for the attributes Warmth, Competence, Discomfort and Anthropomorphism in these sections is respectively $\alpha = (0.81, 0.61, 0.60, 0.64)$. The average values of Warmth, Competence and Anthropomorphism are compared using a paired-sample t-test. Discomfort is the only attribute that has not passed the Shapiro-Wilk test, so a Wilcoxon test is used to compare the average Discomfort. The test results are summarized in Fig. 7.
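For reference, Cronbach's alpha for one attribute can be computed from the item scores with the standard formula $\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_i \sigma_i^2}{\sigma_T^2}\right)$, where $k$ is the number of items, $\sigma_i^2$ the variance of item $i$ and $\sigma_T^2$ the variance of the summed score. A minimal sketch with hypothetical answers (not the data collected in the experiments):

```python
import numpy as np


def cronbach_alpha(scores):
    """scores: array (participants, items) with the answers for one attribute."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_var = scores.var(axis=0, ddof=1).sum()    # sum of item variances
    total_var = scores.sum(axis=1).var(ddof=1)     # variance of the summed score
    return k / (k - 1) * (1.0 - item_var / total_var)


# Hypothetical answers of 4 participants to 3 items of one attribute.
alpha = cronbach_alpha([[5, 6, 5], [3, 3, 4], [6, 7, 6], [2, 3, 2]])
```

Values close to 1 indicate that the items of an attribute vary together across participants.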

H1 is fulfilled for Warmth with a p-value $p < 0.1$, and for Competence with $p < 0.05$. In both cases, the second search shows a more positive participants' perception of IVO. On the other hand, for Discomfort and Anthropomorphism, the results show a very similar perception in the 2 cases.
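The choice between a paired-sample t-test and a Wilcoxon test for each attribute can be sketched with SciPy. This is an assumption-laden illustration: the normality check is applied here to the paired differences with a 0.05 threshold, which the text does not specify, and the scores are hypothetical values, not the experimental data.

```python
import numpy as np
from scipy import stats


def compare_paired(first, second, alpha=0.05):
    """Return the name of the test used and its p-value for paired samples."""
    first = np.asarray(first, dtype=float)
    second = np.asarray(second, dtype=float)
    # Normality of the paired differences decides which test is used.
    _, p_normal = stats.shapiro(second - first)
    if p_normal > alpha:
        name, result = "paired t-test", stats.ttest_rel(first, second)
    else:
        name, result = "wilcoxon", stats.wilcoxon(first, second)
    return name, float(result.pvalue)


# Hypothetical attribute scores of 10 participants after each search.
first = np.array([4.0, 5.0, 3.0, 4.5, 5.5, 4.0, 3.5, 5.0, 4.2, 4.8])
diffs = np.array([0.5, 1.0, 0.25, 0.75, 1.5, -0.3, 0.6, 1.25, 0.1, 0.9])
test_name, p_value = compare_paired(first, first + diffs)
```

The returned p-value would then be compared with the significance levels used for H1.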

H2 is fulfilled because of the increase in Warmth and Competence in the second search. The results in the last section of the questionnaire also support it: 85.7 % of participants chose as the preferred method the one where they select the preferred areas.

The results also show that participants do not perceive a high Warmth or Anthropomorphism. These results are consistent with the fact that the person who searches does not interact too much with the robot. Most of the interaction is performed using the tablet interface. However, participants consider that the robot is very competent and the Discomfort is very low. This may occur due to the near-zero path overlapping, which prevents agents from getting in each other’s way.

The interface perception has been very positive. The obtained result, 82.06, indicates good overall usability. This is consistent with the fact that the participants interact more with the interface than with IVO.

V-D Implementation Details

The MMAS parameters considered are $\alpha = 1$, $\beta = 6$ and $\rho = 0.002$. The graph used is a grid with a 7x7 neighborhood for each cell. In the experiments' first search, plans are pre-calculated with 1200 iterations. In the second search, a parallelized optimization is performed with 10 ants across 300 iterations, which takes around 80 s. The visibility radius is 2.5 m for all agents, the grid distance is 3.5 m and the maximum probability not covered by the plans is 0.014.
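To show where these parameters act, the sketch below writes the standard ACO transition rule and the MMAS pheromone update [19] with the reported values of $\alpha$, $\beta$ and $\rho$. It is not the authors' implementation: the pheromone limits `tau_min` and `tau_max`, the deposit amount and the toy inputs are hypothetical.

```python
import numpy as np

ALPHA, BETA, RHO = 1.0, 6.0, 0.002  # values reported in the text


def transition_probabilities(tau_row, eta_row, allowed):
    """Probability of moving to each node: tau^alpha * eta^beta over allowed nodes."""
    tau_row = np.asarray(tau_row, dtype=float)
    eta_row = np.asarray(eta_row, dtype=float)
    weights = np.where(allowed, tau_row ** ALPHA * eta_row ** BETA, 0.0)
    return weights / weights.sum()


def mmas_update(tau, best_edges, deposit, tau_min, tau_max):
    """Evaporate, reinforce the edges of the best plan and clamp the pheromone."""
    tau = (1.0 - RHO) * tau               # evaporation (returns a new array)
    for i, j in best_edges:
        tau[i, j] += deposit              # only the best plan deposits pheromone
    return np.clip(tau, tau_min, tau_max)


# Toy example: 3 candidate nodes, the last one not allowed.
probs = transition_probabilities(np.ones(3), [1.0, 0.5, 0.25],
                                 np.array([True, True, False]))
tau_new = mmas_update(np.ones((2, 2)), [(0, 1)], 0.5, 0.1, 1.2)
```

With $\beta = 6$ the heuristic dominates the choice: a node at twice the distance is 64 times less likely to be selected when the pheromone levels are equal. The small $\rho$ makes the pheromone evaporate slowly.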

To generate probability maps, the CNN-31 model [28] has been trained with the original hyper-parameters and a batch size of 32. The inputs are 14 patches of 64x64 pixels, one for each semantic class. The output is a 64x64 patch with the probabilities. The dataset consists of 22 segmented images close to the Barcelona Robot Lab: 15 images for training, 4 for validation and 3 for testing. Data augmentation is performed through 90° rotations to obtain 6584 patches for training. The Mean Square Error (MSE) for training, validation and testing is respectively 0.017, 0.021 and 0.013. The probability map used for the experiments is one of the test maps.
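The rotation-based augmentation can be sketched with NumPy. The sketch assumes that inputs are stored as arrays of shape (14, 64, 64), that the target is rotated together with the inputs, and that the unrotated patch is kept as one of the 4 orientations. The function name is hypothetical and the patch extraction that leads to the reported number of training patches is not reproduced.

```python
import numpy as np


def rotations_90(inputs, target):
    """Yield the 4 rotations (0, 90, 180, 270 degrees) of an input/target pair.

    inputs: array (14, 64, 64), one channel per semantic class.
    target: array (64, 64) with the probabilities.
    """
    for k in range(4):
        # Rotate only the spatial axes so that channels stay aligned.
        yield np.rot90(inputs, k, axes=(1, 2)), np.rot90(target, k)


# Toy pair in which channel 0 of the input equals the target.
x = np.zeros((14, 64, 64))
x[0, 0, 5] = 1.0
y = np.zeros((64, 64))
y[0, 5] = 1.0
pairs = list(rotations_90(x, y))
```

Rotating inputs and target with the same `k` keeps every semantic class aligned with the probability it explains.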

VI CONCLUSIONS

The Sub-prior MTS-ACO has been presented as a feasible solution to incorporate human preferences for the search task in real experiments with a robot and a person. The results show that the algorithm can combine the prior knowledge with the particular preferences of the agents to create search plans without a significant loss of performance. Moreover, an interface to obtain a dataset that allows learning probability maps of the search area using a basic segmented map has been proposed. The small dataset obtained has been enough to train a CNN with a low MSE. As a third contribution, a Vizanti interface has been presented to enable HRC during the search task.

References

  • [1] M. Ienca, F. Jotterand, C. Vică, and B. Elger, “Social and assistive robotics in dementia care: ethical recommendations for research and practice,” International Journal of Social Robotics, vol. 8, pp. 565–573, 2016.
  • [2] F. Tanaka, T. Takahashi, S. Matsuzoe, N. Tazawa, and M. Morita, “Telepresence robot helps children in communicating with teachers who speak a different language,” in Proceedings of the 2014 ACM/IEEE International Conference on Human-Robot Interaction, ser. HRI ’14.   New York, NY, USA: Association for Computing Machinery, 2014, p. 399–406.
  • [3] A. Ajoudani, A. M. Zanchettin, S. Ivaldi, A. Albu-Schäffer, K. Kosuge, and O. Khatib, “Progress and prospects of the human-robot collaboration,” Autonomous Robots, vol. 42, no. 5, pp. 957–975, 2018.
  • [4] P. Singamaneni, P. Bachiller-Burgos, M. L.J., A. Garrell, A. Sanfeliu, A. Spalanzani, and R. Alami, “A survey on socially aware robot navigation: Taxonomy and future challenges,” International Journal Robotics Research, 2024.
  • [5] E. Repiso, A. Garrell, and A. Sanfeliu, “People’s adaptive side-by-side model evolved to accompany groups of people by social robots,” IEEE Robotics and Automation Letters, vol. 5, no. 2, pp. 2387–2394, 2020.
  • [6] J. Laplaza, A. Garrell, F. Moreno-Noguer, and A. Sanfeliu, “Context and intention for 3d human motion prediction: Experimentation and user study in handover tasks,” in 2022 31st IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), 2022, pp. 630–635.
  • [7] J. E. Dominguez, N. Rodriguez, and A. Sanfeliu, “Perception-intention-action cycle in human-robot collaborative tasks: The collaborative lightweight object transportation use-case,” International Journal of Social Robotics, pp. 1–30, 2024.
  • [8] J. P. Queralta, J. Taipalmaa, B. Can Pullinen, V. K. Sarker, T. Nguyen Gia, H. Tenhunen, M. Gabbouj, J. Raitoharju, and T. Westerlund, “Collaborative multi-robot search and rescue: Planning, coordination, perception, and active vision,” IEEE Access, vol. 8, pp. 191 617–191 643, 2020.
  • [9] V. Yordanova and B. Gips, “Coverage path planning with track spacing adaptation for autonomous underwater vehicles,” IEEE Robotics and Automation Letters, vol. 5, no. 3, pp. 4774–4780, 2020.
  • [10] E. Yanmaz, “Joint or decoupled optimization: Multi-uav path planning for search and rescue,” Ad Hoc Networks, vol. 138, p. 103018, 2023.
  • [11] Y.-J. Zheng, Y.-C. Du, W.-G. Sheng, and H.-F. Ling, “Collaborative human–uav search and rescue for missing tourists in nature reserves,” INFORMS Journal on Applied Analytics, vol. 49, no. 5, pp. 371–383, 2019.
  • [12] M. Dalmasso, J. E. Domínguez-Vidal, I. J. Torres-Rodríguez, P. Jiménez, A. Garrell, and A. Sanfeliu, “Shared task representation for human–robot collaborative navigation: The collaborative search case,” International Journal of Social Robotics, pp. 1–27, 2023.
  • [13] L. D. Stone, Theory of optimal search.   Elsevier, 1976.
  • [14] S. Perez-Carabaza, E. Besada-Portas, J. A. Lopez-Orozco, and J. M. de la Cruz, “Ant colony optimization for multi-uav minimum time search in uncertain domains,” Applied Soft Computing, vol. 62, pp. 789–806, 2018.
  • [15] M. Chen, S. Nikolaidis, H. Soh, D. Hsu, and S. Srinivasa, “Planning with trust for human-robot collaboration,” in Proceedings of the 2018 ACM/IEEE international conference on human-robot interaction, 2018, pp. 307–315.
  • [16] M. T. Shaikh and M. A. Goodrich, “A measure to match robot plans to human intent: A case study in multi-objective human-robot path-planning,” in 2020 29th IEEE International Conference on Robot and Human Interactive Communication (RO-MAN).   IEEE, 2020, pp. 1033–1040.
  • [17] M. Dorigo and G. Di Caro, “Ant colony optimization: a new meta-heuristic,” in Proceedings of the 1999 Congress on Evolutionary Computation-CEC99 (Cat. No. 99TH8406), vol. 2, 1999, pp. 1470–1477 Vol. 2.
  • [18] M. Dorigo and T. Stützle, Ant Colony Optimization.   The MIT Press, 06 2004. [Online]. Available: https://doi.org/10.7551/mitpress/1290.001.0001
  • [19] T. Stützle and H. H. Hoos, “Max–min ant system,” Future generation computer systems, vol. 16, no. 8, pp. 889–914, 2000.
  • [20] F. Bourgault, T. Furukawa, and H. F. Durrant-Whyte, “Optimal search for a lost target in a bayesian world,” Field and Service Robotics: Recent Advances in Research and Applications, pp. 209–222, 2006.
  • [21] P. Lanillos, E. Besada-Portas, G. Pajares, and J. J. Ruz, “Minimum time search for lost targets using cross entropy optimization,” in 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2012, pp. 602–609.
  • [22] S. Perez-Carabaza, E. Besada-Portas, J. A. Lopez-Orozco, and J. M. de la Cruz, “A real world multi-uav evolutionary planner for minimum time target detection,” in Proceedings of the Genetic and Evolutionary Computation Conference 2016, ser. GECCO ’16.   New York, NY, USA: Association for Computing Machinery, 2016, p. 981–988.
  • [23] S. Perez-Carabaza, J. Bermudez-Ortega, E. Besada-Portas, J. A. Lopez-Orozco, and J. M. de la Cruz, “A multi-uav minimum time search planner based on acor,” in Proceedings of the Genetic and Evolutionary Computation Conference, ser. GECCO ’17.   New York, NY, USA: Association for Computing Machinery, 2017, p. 35–42.
  • [24] R. Williams, “Collaborative multi-robot multi-human teams in search and rescue.” in Proceedings of the International ISCRAM Conference, vol. 17, 2020.
  • [25] S. Papaioannou, P. S. Kolios, C. G. Panayiotou, and M. M. Polycarpou, “Synergising human-like responses and machine intelligence for planning in disaster response,” ArXiv, vol. abs/2404.09877, 2024.
  • [26] J. E. Domínguez-Vidal, I. J. Torres-Rodríguez, A. Garrell, and A. Sanfeliu, “User-friendly smartphone interface to share knowledge in human-robot collaborative search tasks,” in 2021 30th IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), 2021, pp. 913–918.
  • [27] R. Vonthein, S. Rauscher, J. Paetzold, K. Nowomiejska, E. Krapp, A. Hermann, B. Sadowski, C. Chaumette, J. M. Wild, and U. Schiefer, “The normal age-corrected and reaction time–corrected isopter derived by semi-automated kinetic perimetry,” Ophthalmology, vol. 114, no. 6, pp. 1065–1072.e2, 2007.
  • [28] J. Doellinger, M. Spies, and W. Burgard, “Predicting occupancy distributions of walking humans with convolutional neural networks,” IEEE Robotics and Automation Letters, vol. 3, no. 3, pp. 1522–1528, 2018.
  • [29] C. Bartneck, D. Kulić, E. Croft, and S. Zoghbi, “Measurement instruments for the anthropomorphism, animacy, likeability, perceived intelligence, and perceived safety of robots,” International journal of social robotics, vol. 1, pp. 71–81, 2009.
  • [30] C. M. Carpinella, A. B. Wyman, M. A. Perez, and S. J. Stroessner, “The robotic social attributes scale (rosas): Development and validation,” in 2017 12th ACM/IEEE International Conference on Human-Robot Interaction (HRI), 2017, pp. 254–262.
  • [31] J. Brooke et al., “Sus-a quick and dirty usability scale,” Usability evaluation in industry, vol. 189, no. 194, pp. 4–7, 1996.