Robotics

Total of 43 entries

[1] arXiv:2407.17502 [pdf, other]: Title: Meta-Reinforcement Learning for Universal Quadrupedal Locomotion Control

Fabrizio Di Giuro, Fatemeh Zargarbashi, Jin Cheng, Dongho Kang, Bhavya Sukhija, Stelian Coros

Comments: The supplementary video is available at this https URL

Subjects: Robotics (cs.RO)

This work presents a deep reinforcement learning-based approach to develop a policy for robot-agnostic locomotion control. Our method involves training an agent equipped with memory, implemented as a recurrent policy, on a diverse set of procedurally generated quadruped robots. We demonstrate that the policies trained by our framework transfer seamlessly to both simulated and real-world quadrupeds not encountered during training, maintaining high-quality motion across platforms. Through a series of simulation and hardware experiments, we highlight the critical role of the recurrent unit in enabling generalization, rapid adaptation to changes in the robot's dynamic properties, and sample efficiency.
[2] arXiv:2407.17515 [pdf, other]: Title: Quality Diversity for Robot Learning: Limitations and Future Directions

Sumeet Batra, Bryon Tjanaka, Stefanos Nikolaidis, Gaurav Sukhatme

Comments: Accepted to GECCO 2024

Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)

Quality Diversity (QD) has shown great success in discovering high-performing, diverse policies for robot skill learning. While current benchmarks have led to the development of powerful QD methods, we argue that new paradigms must be developed to facilitate open-ended search and generalizability. In particular, many methods focus on learning diverse agents that each move to a different xy position in MAP-Elites-style bounded archives. Here, we show that such tasks can be accomplished with a single, goal-conditioned policy paired with a classical planner, achieving O(1) space complexity w.r.t. the number of policies and generalization to task variants. We hypothesize that this approach is successful because it extracts task-invariant structural knowledge by modeling a relational graph between adjacent cells in the archive. We motivate this view with emerging evidence from computational neuroscience and explore connections between QD and models of cognitive maps in human and other animal brains. We conclude with a discussion exploring the relationships between QD and cognitive maps, and propose future research directions inspired by cognitive maps towards future generalizable algorithms capable of truly open-ended search.
[3] arXiv:2407.17516 [pdf, other]: Title: Amplifying the Kinematics of Origami Mechanisms With Spring Joints

Malcolm Smith

Comments: 14 pages, 10 figures

Subjects: Robotics (cs.RO); Algebraic Geometry (math.AG)

Due to its rigid foldability and predictable kinematics, the reverse fold is the fundamental mechanism behind some of the most well-known origami kinematic structures, including the Miura Ori, Yoshimura, and waterbomb patterns. However, the reverse fold only has one parameter to control its behavior: the starting fold angle. In this paper I introduce an alternative to the traditional reverse fold, based on the spring into action pattern, called the spring joint. This novel rigidly foldable mechanism is able to couple multiple reverse folds into a compact space to amplify the kinematic output of a traditional reverse fold by up to ten times, and to add one parameter for each reverse fold, giving more programmatic control of origami structures. Methods of parameterizing the starting angle, the path of travel, and the axis of motion are also introduced. Unfortunately, this versatility comes at the cost of a large buildup of layers, making the spring joint impractical for thick origami mechanisms. To solve this problem, I also introduce a modular alternative to the spring joint that has no additional layers, with the same kinematic properties. Both of these mechanisms are tested as replacements for the reverse fold in both traditional and custom origami structures.
[4] arXiv:2407.17617 [pdf, other]: Title: Adaptive Robot Detumbling of a Non-Rigid Satellite

Longsen Gao, Claus Danielson, Rafael Fierro

Comments: This paper has been accepted by the 63rd IEEE Conference on Decision and Control (CDC 2024) as a regular paper

Subjects: Robotics (cs.RO)

Stabilizing satellites, particularly those with uncertain flexible dynamics, has become a pressing concern in control and robotics. These uncertainties, especially the dynamics of a third-party client satellite, significantly complicate the stabilization task. This paper introduces a novel adaptive detumbling method to handle non-rigid satellites with unknown motion dynamics (translation and rotation). The distinctive feature of our approach is that we model the non-rigid tumbling satellite as a two-link serial chain with unknown stiffness and damping, in contrast to previous detumbling research, which considers the satellite a rigid body. We develop a novel adaptive robotics approach to detumble the satellite by using two space tugs as servicers despite the uncertain dynamics in the post-capture case. Notably, the stiffness properties and other physical parameters, including the mass and inertia of the two links, remain unknown to the servicers. Our proposed method addresses the challenges in detumbling tasks and paves the way for advanced manipulation of non-rigid satellites with uncertain dynamics.
[5] arXiv:2407.17683 [pdf, other]: Title: RL-augmented MPC Framework for Agile and Robust Bipedal Footstep Locomotion Planning and Control

Seung Hyeon Bang, Carlos Arribalzaga Jové, Luis Sentis

Comments: 8 pages, 7 figures

Subjects: Robotics (cs.RO)

This paper proposes an online bipedal footstep planning strategy that combines model predictive control (MPC) and reinforcement learning (RL) to achieve agile and robust bipedal maneuvers. While MPC-based foot placement controllers have demonstrated their effectiveness in achieving dynamic locomotion, their performance is often limited by the use of simplified models and assumptions. To address this challenge, we develop a novel foot placement controller that leverages a learned policy to bridge the gap between the use of a simplified model and the more complex full-order robot system. Specifically, our approach employs a unique combination of an ALIP-based MPC foot placement controller for sub-optimal footstep planning and the learned policy for refining footstep adjustments, enabling the resulting footstep policy to capture the robot's whole-body dynamics effectively. This integration synergizes the predictive capability of MPC with the flexibility and adaptability of RL. We validate the effectiveness of our framework through a series of experiments using the full-body humanoid robot DRACO 3. The results demonstrate significant improvements in dynamic locomotion performance, including better tracking of a wide range of walking speeds, reliable turning, and traversal of challenging terrains, while preserving the robustness and stability of the walking gaits compared to the baseline ALIP-based MPC approach.
[6] arXiv:2407.17709 [pdf, other]: Title: PGD-VIO: An Accurate Plane-Aided Visual-Inertial Odometry with Graph-Based Drift Suppression

Yidi Zhang, Fulin Tang, Zewen Xu, Yihong Wu, Pengju Ma

Subjects: Robotics (cs.RO)

Generally, high-level features provide more geometrical information compared to point features, which can be exploited to further constrain motions. Planes are commonplace in man-made environments, offering an active means to reduce drift, due to their extensive spatial and temporal observability. To make full use of planar information, we propose a novel visual-inertial odometry (VIO) using an RGBD camera and an inertial measurement unit (IMU), effectively integrating point and plane features in an extended Kalman filter (EKF) framework. Depth information of point features is leveraged to improve the accuracy of point triangulation, while plane features serve as direct observations added into the state vector. Notably, to benefit long-term navigation, a novel graph-based drift detection strategy is proposed to search for overlapping and identical structures in the plane map so that the cumulative drift is subsequently suppressed. The experimental results on two public datasets demonstrate that our system outperforms state-of-the-art methods in localization accuracy while generating a compact and consistent plane map, free of expensive global bundle adjustment and loop closing techniques.
[7] arXiv:2407.17840 [pdf, other]: Title: Complex picking via entanglement of granular mechanical metamaterials

Ashkan Rezanejad, Mostafa Mousa, Matthew Howard, Antonio Elia Forte

Subjects: Robotics (cs.RO); Applied Physics (physics.app-ph)

When objects are packed in a cluster, physical interactions are unavoidable. Such interactions emerge because of the objects' geometric features; some of these features promote entanglement, while others create repulsion. When entanglement occurs, the cluster exhibits a global, complex behaviour, which arises from the stochastic interactions between objects. We hereby refer to such a cluster as an entangled granular metamaterial. We investigate the geometrical features of the objects which make up the cluster, henceforth referred to as grains, that maximise entanglement. We hypothesise that a cluster composed of grains with a high propensity to tangle will also show a propensity to interact with a second cluster of tangled objects. To demonstrate this, we use the entangled granular metamaterials to perform complex robotic picking tasks, where conventional grippers struggle. We employ an electromagnet to attract the metamaterial (ferromagnetic) and drop it onto a second cluster of objects (targets, non-ferromagnetic). When the electromagnet is re-activated, the entanglement ensures that both the metamaterial and the targets are picked, with varying degrees of physical engagement that strongly depend on geometric features. Interestingly, although the metamaterial's structural arrangement is random, it creates repeatable and consistent interactions with a second tangled medium, enabling robust picking of the latter.
[8] arXiv:2407.17936 [pdf, other]: Title: Goal Estimation-based Adaptive Shared Control for Brain-Machine Interfaces Remote Robot Navigation

Tomoka Muraoka, Tatsuya Aoki, Masayuki Hirata, Tadahiro Taniguchi, Takato Horii, Takayuki Nagai

Subjects: Robotics (cs.RO)

In this study, we propose a shared control method for teleoperated mobile robots using brain-machine interfaces (BMI). The control commands generated through BMI for robot operation face issues of low input frequency, discreteness, and uncertainty due to noise. To address these challenges, our method estimates the user's intended goal from their commands and uses this goal to generate auxiliary commands through the autonomous system that are both at a higher input frequency and more continuous. Furthermore, by defining the confidence level of the estimation, we adaptively calculate the weights for combining user and autonomous commands, thus achieving shared control.
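
The confidence-weighted combination described in this abstract can be illustrated with a minimal sketch. The abstract does not give the weighting rule, so the linear blend below, the function name `blend_commands`, and the use of the confidence value directly as the blending weight are all assumptions made for illustration, not the authors' method.

```python
import numpy as np

def blend_commands(user_cmd, auto_cmd, confidence):
    """Blend a user command with an autonomous command (hypothetical rule).

    `confidence` is the assumed confidence of the goal estimate in [0, 1];
    a higher value gives the autonomous command more weight.
    """
    alpha = float(np.clip(confidence, 0.0, 1.0))  # keep the weight in [0, 1]
    user_cmd = np.asarray(user_cmd, dtype=float)
    auto_cmd = np.asarray(auto_cmd, dtype=float)
    # Convex combination: alpha = 0 is pure user control, alpha = 1 is pure autonomy.
    return (1.0 - alpha) * user_cmd + alpha * auto_cmd
```

With a low confidence the output stays close to the user's command, so the operator keeps authority while the goal estimate is still uncertain.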
[9] arXiv:2407.17942 [pdf, other]: Title: A Novel Perception Entropy Metric for Optimizing Vehicle Perception with LiDAR Deployment

Yongjiang He, Peng Cao, Zhongling Su, Xiaobo Liu

Subjects: Robotics (cs.RO); Information Theory (cs.IT)

Developing an effective evaluation metric is crucial for accurately and swiftly measuring LiDAR perception performance. One major issue is the lack of metrics that can simultaneously generate fast and accurate evaluations based on either object detection or point cloud data. In this study, we propose a novel LiDAR perception entropy metric based on the probability of vehicle grid occupancy (PE-VGOP). This metric reflects the influence of point cloud distribution on vehicle detection performance. Based on this, we also introduce a LiDAR deployment optimization model, which is solved using a differential evolution-based particle swarm optimization algorithm. A comparative experiment demonstrated that the proposed PE-VGOP offers a correlation of more than 0.98 with vehicle detection ground truth in evaluating LiDAR perception performance. Furthermore, compared to the base deployment, field experiments indicate that the proposed optimization model can significantly enhance the perception capabilities of various types of LiDARs, including RS-16, RS-32, and RS-80. Notably, it achieves a 25% increase in detection recall for the RS-32 LiDAR.
[10] arXiv:2407.17944 [pdf, other]: Title: Time-Optimal Planning for Long-Range Quadrotor Flights: An Automatic Optimal Synthesis Approach

Chao Qin, Jingxiang Chen, Yifan Lin, Abhishek Goudar, Angela P. Schoellig, Hugh H.-T. Liu

Comments: 19 pages, 19 figures

Subjects: Robotics (cs.RO); Systems and Control (eess.SY)

Time-critical tasks such as drone racing typically cover large operation areas. However, it is difficult and computationally intensive for current time-optimal motion planners to accommodate long flight distances since a large yet unknown number of knot points is required to represent the trajectory. We present a polynomial-based automatic optimal synthesis (AOS) approach that can address this challenge. Our method not only achieves superior time optimality but also maintains a consistently low computational cost across different ranges while considering the full quadrotor dynamics. First, we analyze the properties of time-optimal quadrotor maneuvers to determine the minimal number of polynomial pieces required to capture the dominant structure of time-optimal trajectories. This enables us to represent substantially long minimum-time trajectories with a minimal set of variables. Then, a robust optimization scheme is developed to handle arbitrary start and end conditions as well as intermediate waypoints. Extensive comparisons show that our approach is faster than the state-of-the-art approach by orders of magnitude with comparable time optimality. Real-world experiments further validate the quality of the resulting trajectories, demonstrating aggressive time-optimal maneuvers with a peak velocity of 8.86 m/s.
[11] arXiv:2407.17967 [pdf, other]: Title: Lightweight Language-driven Grasp Detection using Conditional Consistency Model

Nghia Nguyen, Minh Nhat Vu, Baoru Huang, An Vuong, Ngan Le, Thieu Vo, Anh Nguyen

Comments: Accepted at IROS 2024

Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)

Language-driven grasp detection is a fundamental yet challenging task in robotics with various industrial applications. In this work, we present a new approach for language-driven grasp detection that leverages the concept of lightweight diffusion models to achieve fast inference time. By integrating diffusion processes with grasping prompts in natural language, our method can effectively encode visual and textual information, enabling more accurate and versatile grasp positioning that aligns well with the text query. To overcome the long inference time problem in diffusion models, we leverage the image and text features as the condition in the consistency model to reduce the number of denoising timesteps during inference. Extensive experimental results show that our method outperforms other recent grasp detection methods and lightweight diffusion models by a clear margin. We further validate our method in real-world robotic experiments to demonstrate its fast inference time capability.
[12] arXiv:2407.18009 [pdf, other]: Title: Egocentric Robots in a Human-Centric World? Exploring Group-Robot-Interaction in Public Spaces

Ana Müller, Anja Richert

Comments: Accepted at the workshop on advancing Group Understanding and robots' adaptive behavior (GROUND), held at the Robotics Science and Systems (RSS) Conference, 2024

Subjects: Robotics (cs.RO)

The deployment of social robots in real-world scenarios is increasing, supporting humans in various contexts. However, they still struggle to grasp social dynamics, especially in public spaces, sometimes resulting in violations of social norms, such as interrupting human conversations. This behavior, originating from a limited processing of social norms, might be perceived as robot-centered. Understanding social dynamics, particularly in group-robot-interactions (GRI), underscores the need for further research and development in human-robot-interaction (HRI). Enhancing the interaction abilities of social robots, especially in GRIs, can improve their effectiveness in real-world applications on a micro-level, as group interactions lead to increased motivation and comfort. In this study, we assessed the influence of the interaction condition (dyadic vs. triadic) on the perceived extraversion (ext.) of social robots in public spaces. The research involved 40 HRIs, including 24 dyadic (i.e., one human and one robot) interactions and 16 triadic interactions, which involve at least three entities, including the robot.
[13] arXiv:2407.18043 [pdf, other]: Title: YOCO: You Only Calibrate Once for Accurate Extrinsic Parameter in LiDAR-Camera Systems

Tianle Zeng, Dengke He, Feifan Yan, Meixi He

Comments: IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT

Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)

In a multi-sensor fusion system composed of cameras and LiDAR, precise extrinsic calibration contributes to the system's long-term stability and accurate perception of the environment. However, methods based on extracting and registering corresponding points still face challenges in terms of automation and precision. This paper proposes a novel fully automatic extrinsic calibration method for LiDAR-camera systems that circumvents the need for corresponding point registration. In our approach, a novel algorithm to extract the required LiDAR correspondence points is proposed. This method can effectively filter out irrelevant points by computing the orientation of plane point clouds and extracting points by applying distance- and density-based thresholds. We avoid the need for corresponding point registration by introducing extrinsic parameters between the LiDAR and camera into the projection of extracted points and constructing co-planar constraints. These parameters are then optimized to solve for the extrinsics. We validated our method across multiple sets of LiDAR-camera systems. In synthetic experiments, our method demonstrates superior performance compared to current calibration techniques. Real-world data experiments further confirm the precision and robustness of the proposed algorithm, with average rotation and translation calibration errors between LiDAR and camera of less than 0.05 degrees and 0.015 m, respectively. This method enables automatic and accurate extrinsic calibration in a single step, emphasizing the potential of calibration algorithms beyond using corresponding point registration to enhance the automation and precision of LiDAR-camera system calibration.
[14] arXiv:2407.18140 [pdf, other]: Title: Influence Vectors Control for Robots Using Cellular-like Binary Actuators

Alexandre Girard, Jean-Sébastien Plante

Journal-ref: IEEE Transactions on Robotics (Volume: 30, Issue: 3, June 2014)

Subjects: Robotics (cs.RO)

Robots using cellular-like redundant binary actuators could outmatch electric-gearmotor robotic systems in terms of reliability, force-to-weight ratio and cost. This paper presents a robust, fault-tolerant control scheme that is designed to meet the control challenges encountered by such robots, i.e., discrete actuator inputs, complex system modeling and cross-coupling between actuators. In the proposed scheme, a desired vectorial system output, such as a position or a force, is commanded by recruiting actuators based on their influence vectors on the output. No analytical model of the system is needed; influence vectors are identified experimentally by sequentially activating each actuator. For position control tasks, the controller uses a probabilistic approach and a genetic algorithm to determine an optimal combination of actuators to recruit. For motion control tasks, the controller uses a sliding mode approach and an independent recruiting decision for each actuator. Experimental results on a four-degree-of-freedom binary manipulator with twenty actuators confirm the method's effectiveness, and its ability to tolerate massive perturbations and numerous actuator failures.
[15] arXiv:2407.18240 [pdf, other]: Title: CodedVO: Coded Visual Odometry

Sachin Shah, Naitri Rajyaguru, Chahat Deep Singh, Christopher Metzler, Yiannis Aloimonos

Comments: 7 pages, 4 figures, IEEE ROBOTICS AND AUTOMATION LETTERS

Journal-ref: IEEE ROBOTICS AND AUTOMATION LETTERS, 2024

Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)

Autonomous robots often rely on monocular cameras for odometry estimation and navigation. However, the scale ambiguity problem presents a critical barrier to effective monocular visual odometry. In this paper, we present CodedVO, a novel monocular visual odometry method that overcomes the scale ambiguity problem by employing custom optics to physically encode metric depth information into imagery. By incorporating this information into our odometry pipeline, we achieve state-of-the-art performance in monocular visual odometry with a known scale. We evaluate our method in diverse indoor environments and demonstrate its robustness and adaptability. We achieve a 0.08 m average trajectory error in odometry evaluation on the ICL-NUIM indoor odometry dataset.

[16] arXiv:2407.17518 (cross-list from cs.AI) [pdf, other]: Title: Driving pattern interpretation based on action phases clustering

Xue Yao, Simeon C. Calvert, Serge P. Hoogendoorn

Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO); Applications (stat.AP); Machine Learning (stat.ML)

Current approaches to identifying driving heterogeneity face challenges in comprehending fundamental patterns from the perspective of underlying driving behavior mechanisms. The concept of Action phases was proposed in our previous work, capturing the diversity of driving characteristics with physical meanings. This study presents a novel framework to further interpret driving patterns by classifying Action phases in an unsupervised manner. In this framework, a Resampling and Downsampling Method (RDM) is first applied to standardize the length of Action phases. Then the clustering calibration procedure including "Feature Selection", "Clustering Analysis", "Difference/Similarity Evaluation", and "Action phases Re-extraction" is iteratively applied until all differences among clusters and similarities within clusters reach the pre-determined criteria. Application of the framework using real-world datasets revealed six driving patterns in the I80 dataset, labeled as "Catch up", "Keep away", and "Maintain distance", with both "Stable" and "Unstable" states. Notably, Unstable patterns are more numerous than Stable ones. "Maintain distance" is the most common among Stable patterns. These observations align with the dynamic nature of driving. Two patterns, "Stable keep away" and "Unstable catch up", are missing in the US101 dataset, which is in line with our expectations as this dataset was previously shown to have less heterogeneity. This demonstrates the potential of driving patterns in describing driving heterogeneity. The proposed framework promises advantages in addressing label scarcity in supervised learning and enhancing tasks such as driving behavior modeling and driving trajectory prediction.
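
Standardizing the length of variable-length Action phases, as this abstract describes, can be illustrated with a minimal sketch. The abstract does not specify how RDM works internally, so the linear interpolation below and the function name `resample_to_length` are assumptions standing in for it, not the authors' procedure.

```python
import numpy as np

def resample_to_length(series, target_len):
    """Resample a 1-D sequence to `target_len` samples by linear interpolation.

    Hypothetical stand-in for a length-standardization step: shorter
    sequences are upsampled and longer ones are downsampled.
    """
    series = np.asarray(series, dtype=float)
    # Place the original and the new samples on a common [0, 1] axis.
    old_x = np.linspace(0.0, 1.0, num=len(series))
    new_x = np.linspace(0.0, 1.0, num=target_len)
    return np.interp(new_x, old_x, series)
```

Once every phase has the same number of samples, the phases can be stacked into a single array and passed to a clustering routine.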
[17] arXiv:2407.17673 (cross-list from cs.CV) [pdf, other]: Title: CRASAR-U-DROIDs: A Large Scale Benchmark Dataset for Building Alignment and Damage Assessment in Georectified sUAS Imagery

Thomas Manzini, Priyankari Perali, Raisa Karnik, Robin Murphy

Comments: 16 Pages, 7 Figures, 6 Tables

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Robotics (cs.RO)

This document presents the Center for Robot Assisted Search And Rescue - Uncrewed Aerial Systems - Disaster Response Overhead Inspection Dataset (CRASAR-U-DROIDs) for building damage assessment and spatial alignment collected from small uncrewed aerial systems (sUAS) geospatial imagery. This dataset is motivated by the increasing use of sUAS in disaster response, the lack of previous work in utilizing high-resolution geospatial sUAS imagery for machine learning and computer vision models, the lack of alignment with operational use cases, and the hope of enabling further investigations between sUAS and satellite imagery. The CRASAR-U-DROIDs dataset consists of fifty-two (52) orthomosaics from ten (10) federally declared disasters (Hurricane Ian, Hurricane Ida, Hurricane Harvey, Hurricane Idalia, Hurricane Laura, Hurricane Michael, Musset Bayou Fire, Mayfield Tornado, Kilauea Eruption, and Champlain Towers Collapse) spanning 67.98 square kilometers (26.245 square miles), containing 21,716 building polygons and damage labels, and 7,880 adjustment annotations. The imagery was tiled and presented in conjunction with overlaid building polygons to a pool of 130 annotators who provided human judgments of damage according to the Joint Damage Scale. These annotations were then reviewed via a two-stage review process in which building polygon damage labels were first reviewed individually and then again by committee. Additionally, the building polygons have been aligned spatially to precisely overlap with the imagery to enable more performant machine learning models to be trained. It appears that CRASAR-U-DROIDs is the largest labeled dataset of sUAS orthomosaic imagery.
[18] arXiv:2407.17757 (cross-list from cs.CV) [pdf, other]: Title: CRASH: Crash Recognition and Anticipation System Harnessing with Context-Aware and Temporal Focus Attentions

Haicheng Liao, Haoyu Sun, Huanming Shen, Chengyue Wang, Kahou Tam, Chunlin Tian, Li Li, Chengzhong Xu, Zhenning Li

Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)

Accurately and promptly predicting accidents among surrounding traffic agents from camera footage is crucial for the safety of autonomous vehicles (AVs). This task presents substantial challenges stemming from the unpredictable nature of traffic accidents, their long-tail distribution, the intricacies of traffic scene dynamics, and the inherently constrained field of vision of onboard cameras. To address these challenges, this study introduces a novel accident anticipation framework for AVs, termed CRASH. It seamlessly integrates five components: object detector, feature extractor, object-aware module, context-aware module, and multi-layer fusion. Specifically, we develop the object-aware module to prioritize high-risk objects in complex and ambiguous environments by calculating the spatial-temporal relationships between traffic agents. In parallel, the context-aware module is devised to extend global visual information from the temporal to the frequency domain using the Fast Fourier Transform (FFT) and capture fine-grained visual features of potential objects and broader context cues within traffic scenes. To capture a wider range of visual cues, we further propose a multi-layer fusion that dynamically computes the temporal dependencies between different scenes and iteratively updates the correlations between different visual features for accurate and timely accident prediction. Evaluated on three real-world datasets, namely the Dashcam Accident Dataset (DAD), the Car Crash Dataset (CCD), and the AnAn Accident Detection (A3D) dataset, our model surpasses existing top baselines in critical evaluation metrics like Average Precision (AP) and mean Time-To-Accident (mTTA). Importantly, its robustness and adaptability are particularly evident in challenging driving scenarios with missing or limited training data, demonstrating significant potential for application in real-world autonomous driving systems.
[19] arXiv:2407.17766 (cross-list from cs.MA) [pdf, other]: Title: Strategic Pseudo-Goal Perturbation for Deadlock-Free Multi-Agent Navigation in Social Mini-Games

Abhishek Jha, Tanishq Gupta, Sumit Singh Rawat, Girish Kumar

Subjects: Multiagent Systems (cs.MA); Robotics (cs.RO)

This work introduces a Strategic Pseudo-Goal Perturbation (SPGP) technique, a novel approach to resolve deadlock situations in multi-agent navigation scenarios. Leveraging the robust framework of Safety Barrier Certificates, our method integrates a strategic perturbation mechanism that guides agents through social mini-games where deadlock and collision occur frequently. The method adopts a strategic calculation process where agents, upon encountering a deadlock, select a pseudo-goal within a predefined radius around the current position to resolve the deadlock among agents. The calculation is based on a controlled strategic algorithm, ensuring that deviation towards the pseudo-goal is both purposeful and effective in resolving the deadlock. Once the agent reaches the pseudo-goal, it resumes the path towards the original goal, thereby enhancing navigational efficiency and safety. Experimental results demonstrate SPGP's efficacy in reducing deadlock instances and improving overall system throughput in a variety of multi-agent navigation scenarios.
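
Selecting a pseudo-goal within a predefined radius of the agent's current position can be illustrated with a minimal sketch. The abstract does not describe the strategic selection rule, so the uniform random sampling below and the function name `sample_pseudo_goal` are assumptions used as a placeholder, not the SPGP algorithm itself.

```python
import math
import random

def sample_pseudo_goal(position, radius, rng=None):
    """Return a 2-D point within `radius` of `position`.

    Hypothetical placeholder: the point is drawn uniformly over the disc,
    whereas the described method chooses it strategically.
    """
    if rng is None:
        rng = random.Random()
    angle = rng.uniform(0.0, 2.0 * math.pi)
    # The square root makes samples uniform over the disc area
    # instead of clustering near the centre.
    r = radius * math.sqrt(rng.random())
    return (position[0] + r * math.cos(angle),
            position[1] + r * math.sin(angle))
```

An agent that detects a deadlock would steer towards the returned point and switch back to its original goal once it arrives there.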
[20] arXiv:2407.17905 (cross-list from cs.CV) [pdf, other]: Title: StreamMOS: Streaming Moving Object Segmentation with Multi-View Perception and Dual-Span Memory

Zhiheng Li, Yubo Cui, Jiexi Zhong, Zheng Fang

Comments: 8 pages, 7 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)

Moving object segmentation based on LiDAR is a crucial and challenging task for autonomous driving and mobile robotics. Most approaches explore spatio-temporal information from LiDAR sequences to predict moving objects in the current frame. However, they often focus on transferring temporal cues in a single inference and regard every prediction as independent of others. This may cause inconsistent segmentation results for the same object in different frames. To overcome this issue, we propose a streaming network with a memory mechanism, called StreamMOS, to build the association of features and predictions among multiple inferences. Specifically, we utilize a short-term memory to convey historical features, which can be regarded as a spatial prior of moving objects and adopted to enhance current inference by temporal fusion. Meanwhile, we build a long-term memory to store previous predictions and exploit them to refine the present forecast at voxel and instance levels through voting. Besides, we present a multi-view encoder with cascade projection and asymmetric convolution to extract motion features of objects in different representations. Extensive experiments validate that our algorithm achieves competitive performance on the SemanticKITTI and Sipailou Campus datasets. Code will be released at this https URL.
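
Refining the current prediction by voting over stored past predictions can be illustrated with a minimal voxel-level sketch. The class `VotingMemory`, the dictionary representation of a prediction, and the plain majority vote are assumptions for illustration; the abstract does not detail StreamMOS's actual voting scheme.

```python
from collections import Counter, deque

class VotingMemory:
    """Keep the last `max_frames` predictions and refine new ones by majority vote."""

    def __init__(self, max_frames=5):
        # Oldest predictions are dropped automatically once the memory is full.
        self.history = deque(maxlen=max_frames)

    def update(self, prediction):
        """`prediction` maps a voxel id to a label such as 'moving' or 'static'."""
        self.history.append(dict(prediction))
        votes = {}
        for past in self.history:
            for voxel, label in past.items():
                votes.setdefault(voxel, Counter())[label] += 1
        # Only voxels present in the current frame are refined and returned.
        return {voxel: votes[voxel].most_common(1)[0][0] for voxel in prediction}
```

A single inconsistent frame is outvoted by the stored history, which is the kind of temporal consistency the long-term memory is meant to provide.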
[21] arXiv:2407.17980 (cross-list from cs.AI) [pdf, other]: Title: Personalized and Context-aware Route Planning for Edge-assisted Vehicles

Dinesh Cyril Selvaraj, Falko Dressler, Carla Fabiana Chiasserini

Subjects: Artificial Intelligence (cs.AI); Robotics (cs.RO)

Conventional route planning services typically offer the same routes to all drivers, focusing primarily on a few standardized factors such as travel distance or time, overlooking individual driver preferences. With the inception of autonomous vehicles expected in the coming years, where vehicles will rely on routes decided by such planners, there arises a need to incorporate the specific preferences of each driver, ensuring personalized navigation experiences. In this work, we propose a novel approach based on graph neural networks (GNNs) and deep reinforcement learning (DRL), aimed at customizing routes to suit individual preferences. By analyzing the historical trajectories of individual drivers, we classify their driving behavior and associate it with relevant road attributes as indicators of driver preferences. The GNN is capable of representing the road network as graph-structured data effectively, while DRL is capable of making decisions utilizing reward mechanisms to optimize route selection with factors such as travel costs, congestion level, and driver satisfaction. We evaluate our proposed GNN-based DRL framework using a real-world road network and demonstrate its ability to accommodate driver preferences, offering a range of route options tailored to individual drivers. The results indicate that our framework can select routes that accommodate drivers' preferences with up to a 17% improvement compared to a generic route planner, and reduce the travel time by 33% (afternoon) and 46% (evening) relative to the shortest distance-based approach.
[22] arXiv:2407.18022 (cross-list from cs.NE) [pdf, other]: Title: Learning mental states estimation through self-observation: a developmental synergy between intentions and beliefs representations in a deep-learning model of Theory of Mind

Francesca Bianco, Silvia Rigato, Maria Laura Filippetti, Dimitri Ognibene

Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO)

Theory of Mind (ToM), the ability to attribute beliefs, intentions, or mental states to others, is a crucial feature of human social interaction. In complex environments, where the human sensory system reaches its limits, behaviour is strongly driven by our beliefs about the state of the world around us. Accessing others' mental states, e.g., beliefs and intentions, allows for more effective social interactions in natural contexts. Yet, these variables are not directly observable, making understanding ToM a challenging quest of interest for different fields, including psychology, machine learning and robotics. In this paper, we contribute to this topic by showing a developmental synergy between learning to predict low-level mental states (e.g., intentions, goals) and attributing high-level ones (i.e., beliefs). Specifically, we assume that learning beliefs attribution can occur by observing one's own decision processes involving beliefs, e.g., in a partially observable environment. Using a simple feed-forward deep learning model, we show that, when learning to predict others' intentions and actions, more accurate predictions can be acquired earlier if beliefs attribution is learnt simultaneously. Furthermore, we show that the learning performance improves even when observed actors have a different embodiment than the observer and the gain is higher when observing beliefs-driven chunks of behaviour. We propose that our computational approach can inform the understanding of human social cognitive development and be relevant for the design of future adaptive social robots able to autonomously understand, assist, and learn from human interaction partners in novel natural environments and tasks.
[23] arXiv:2407.18038 (cross-list from cs.CV) [pdf, other]: Title: TiCoSS: Tightening the Coupling between Semantic Segmentation and Stereo Matching within A Joint Learning Framework

Guanfeng Tang, Zhiyuan Wu, Rui Fan

Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)

Semantic segmentation and stereo matching, respectively analogous to the ventral and dorsal streams in our human brain, are two key components of autonomous driving perception systems. Addressing these two tasks with separate networks is no longer the mainstream direction in developing computer vision algorithms, particularly with the recent advances in large vision models and embodied artificial intelligence. The trend is shifting towards combining them within a joint learning framework, especially emphasizing feature sharing between the two tasks. The major contributions of this study lie in comprehensively tightening the coupling between semantic segmentation and stereo matching. Specifically, this study introduces three novelties: (1) a tightly coupled, gated feature fusion strategy, (2) a hierarchical deep supervision strategy, and (3) a coupling tightening loss function. The combined use of these technical contributions results in TiCoSS, a state-of-the-art joint learning framework that simultaneously tackles semantic segmentation and stereo matching. Through extensive experiments on the KITTI and vKITTI2 datasets, along with qualitative and quantitative analyses, we validate the effectiveness of our developed strategies and loss function, and demonstrate its superior performance compared to prior arts, with a notable increase in mIoU by over 9%. Our source code will be publicly available at mias.group/TiCoSS upon publication.
[24] arXiv:2407.18099 (cross-list from eess.SY) [pdf, other]: Title: Pose, Velocity and Landmark Position Estimation Using IMU and Bearing Measurements

Miaomiao Wang, Abdelhamid Tayebi

Comments: 8 pages, 3 figures

Subjects: Systems and Control (eess.SY); Robotics (cs.RO)

This paper investigates the estimation problem of the pose (orientation and position) and linear velocity of a rigid body, as well as the landmark positions, using an inertial measurement unit (IMU) and a monocular camera. First, we propose a globally exponentially stable (GES) linear time-varying (LTV) observer for the estimation of body-frame landmark positions and velocity, using IMU and monocular bearing measurements. Thereafter, using the gyro measurements, some landmarks known in the inertial frame and the estimates from the LTV observer, we propose a nonlinear pose observer on $\mathrm{SO}(3)\times \mathbb{R}^3$. The overall estimation system is shown to be almost globally asymptotically stable (AGAS) using the notion of almost global input-to-state stability (ISS). Interestingly, we show that with the knowledge (in the inertial frame) of a small number of landmarks, we can recover (under some conditions) the unknown positions (in the inertial frame) of a large number of landmarks. Numerical simulation results are presented to illustrate the performance of the proposed estimation scheme.
[25] arXiv:2407.18145 (cross-list from cs.CV) [pdf, other]: Title: Taxonomy-Aware Continual Semantic Segmentation in Hyperbolic Spaces for Open-World Perception

Julia Hindel, Daniele Cattaneo, Abhinav Valada

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Robotics (cs.RO)

Semantic segmentation models are typically trained on a fixed set of classes, limiting their applicability in open-world scenarios. Class-incremental semantic segmentation aims to update models with emerging new classes while preventing catastrophic forgetting of previously learned ones. However, existing methods impose strict rigidity on old classes, reducing their effectiveness in learning new incremental classes. In this work, we propose Taxonomy-Oriented Poincaré-regularized Incremental-Class Segmentation (TOPICS) that learns feature embeddings in hyperbolic space following explicit taxonomy-tree structures. This supervision provides plasticity for old classes, updating ancestors based on new classes while integrating new classes at fitting positions. Additionally, we maintain implicit class relational constraints on the geometric basis of the Poincaré ball. This ensures that the latent space can continuously adapt to new constraints while maintaining a robust structure to combat catastrophic forgetting. We also establish eight realistic incremental learning protocols for autonomous driving scenarios, where novel classes can originate from known classes or the background. Extensive evaluations of TOPICS on the Cityscapes and Mapillary Vistas 2.0 benchmarks demonstrate that it achieves state-of-the-art performance. We make the code and trained models publicly available at this http URL.
[26] arXiv:2407.18178 (cross-list from cs.CV) [pdf, other]: Title: PianoMime: Learning a Generalist, Dexterous Piano Player from Internet Demonstrations

Cheng Qian, Julen Urain, Kevin Zakka, Jan Peters

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Robotics (cs.RO)

In this work, we introduce PianoMime, a framework for training a piano-playing agent using internet demonstrations. The internet is a promising source of large-scale demonstrations for training our robot agents. In particular, for the case of piano-playing, YouTube is full of videos of professional pianists playing a myriad of songs. In our work, we leverage these demonstrations to learn a generalist piano-playing agent capable of playing arbitrary songs. Our framework is divided into three parts: a data preparation phase to extract the informative features from the YouTube videos, a policy learning phase to train song-specific expert policies from the demonstrations, and a policy distillation phase to distil the policies into a single generalist agent. We explore different policy designs to represent the agent and evaluate the influence of the amount of training data on the generalization capability of the agent to novel songs not available in the dataset. We show that we are able to learn a policy with up to 56% F1 score on unseen songs.
[27] arXiv:2407.18180 (cross-list from physics.bio-ph) [pdf, other]: Title: Passive wing deployment and retraction in beetles and flapping microrobots

Hoang-Vu Phan, Hoon Cheol Park, Dario Floreano

Comments: 20 pages, 10 figures

Subjects: Biological Physics (physics.bio-ph); Robotics (cs.RO)

Birds, bats and many insects can tuck their wings against their bodies at rest and deploy them to power flight. Whereas birds and bats use well-developed pectoral and wing muscles and tendons, how insects control these movements remains unclear, as mechanisms of wing deployment and retraction vary among insect species. Beetles (Coleoptera) display one of the most complex wing mechanisms. For example, in rhinoceros beetles, the wing deployment initiates by fully opening the elytra and partially releasing the hindwings from the abdomen. Subsequently, the beetle starts flapping, elevates the hindwings at the bases, and unfolds the wingtips in an origami-like fashion. Whilst the origami-like folds have been extensively explored, limited attention has been given to the hindwing base deployment and retraction, which are believed to be driven by thoracic muscles. Using high-speed cameras and robotic flapping-wing models, here we demonstrate that rhinoceros beetles can effortlessly elevate the hindwings to flight position without the need for muscular activity. We show that opening the elytra triggers a spring-like partial release of the hindwings from the body, allowing the clearance needed for subsequent flapping motion that brings the hindwings into flight position. The results also show that after flight, beetles can leverage the elytra to push the hindwings back into the resting position, further strengthening the hypothesis of a passive deployment mechanism. Finally, we validate the hypothesis with a flapping microrobot that passively deploys its wings for stable controlled flight and retracts them neatly upon landing, which offers a simple yet effective approach to the design of insect-like flying micromachines.

[28] arXiv:2305.01648 (replaced) [pdf, other]: Title: Manipulator as a Tail: Promoting Dynamic Stability for Legged Locomotion

Huang Huang, Antonio Loquercio, Ashish Kumar, Neerja Thakkar, Ken Goldberg, Jitendra Malik

Subjects: Robotics (cs.RO)

Is an arm on a legged robot a liability or an asset for locomotion? Biological systems evolved additional limbs beyond legs that facilitate postural control. This work shows how a manipulator can be an asset for legged locomotion at high speeds or under external perturbations, where the arm serves beyond manipulation. Since the system has 15 degrees of freedom (twelve for the legged robot and three for the arm), off-the-shelf reinforcement learning (RL) algorithms struggle to learn effective locomotion policies. Inspired by Bernstein's neurophysiological theory of animal motor learning, we develop an incremental training procedure that initially freezes some degrees of freedom and gradually releases them, using behaviour cloning (BC) from an early learning procedure to guide optimization in later learning. Simulation experiments show that our policy increases the success rate by up to 61 percentage points over the baselines. Simulation and real robot experiments suggest that our policy learns to use the arm as a tail to initiate robot turning at high speeds and to stabilize the quadruped under external perturbations. Quantitatively, in simulation experiments, we cut the failure rate by up to 43.6% during high-speed turning and by up to 31.8% for the quadruped under external forces, compared to using a locked arm.
[29] arXiv:2402.10885 (replaced) [pdf, other]: Title: 3D Diffuser Actor: Policy Diffusion with 3D Scene Representations

Tsung-Wei Ke, Nikolaos Gkanatsios, Katerina Fragkiadaki

Comments: First two authors contributed equally

Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Diffusion policies are conditional diffusion models that learn robot action distributions conditioned on the robot and environment state. They have recently been shown to outperform both deterministic and alternative action distribution learning formulations. 3D robot policies use 3D scene feature representations aggregated from a single or multiple camera views using sensed depth. They have been shown to generalize better than their 2D counterparts across camera viewpoints. We unify these two lines of work and present 3D Diffuser Actor, a neural policy equipped with a novel 3D denoising transformer that fuses information from the 3D visual scene, a language instruction and proprioception to predict the noise in noised 3D robot pose trajectories. 3D Diffuser Actor sets a new state-of-the-art on RLBench with an absolute performance gain of 18.1% over the current SOTA on a multi-view setup and an absolute gain of 13.1% on a single-view setup. On the CALVIN benchmark, it improves over the current SOTA by a 9% relative increase. It also learns to control a robot manipulator in the real world from a handful of demonstrations. Through thorough comparisons with the current SOTA policies and ablations of our model, we show 3D Diffuser Actor's design choices dramatically outperform 2D representations, regression and classification objectives, absolute attentions, and holistic non-tokenized 3D scene embeddings.
[30] arXiv:2403.11761 (replaced) [pdf, other]: Title: BEVCar: Camera-Radar Fusion for BEV Map and Object Segmentation

Jonas Schramm, Niclas Vödisch, Kürsat Petek, B Ravi Kiran, Senthil Yogamani, Wolfram Burgard, Abhinav Valada

Comments: Accepted for the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2024

Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)

Semantic scene segmentation from a bird's-eye-view (BEV) perspective plays a crucial role in facilitating planning and decision-making for mobile robots. Although recent vision-only methods have demonstrated notable advancements in performance, they often struggle under adverse illumination conditions such as rain or nighttime. While active sensors offer a solution to this challenge, the prohibitively high cost of LiDARs remains a limiting factor. Fusing camera data with automotive radars offers a less expensive alternative but has received less attention in prior research. In this work, we aim to advance this promising avenue by introducing BEVCar, a novel approach for joint BEV object and map segmentation. The core novelty of our approach lies in first learning a point-based encoding of raw radar data, which is then leveraged to efficiently initialize the lifting of image features into the BEV space. We perform extensive experiments on the nuScenes dataset and demonstrate that BEVCar outperforms the current state of the art. Moreover, we show that incorporating radar information significantly enhances robustness in challenging environmental conditions and improves segmentation performance for distant objects. To foster future research, we provide the weather split of the nuScenes dataset used in our experiments, along with our code and trained models at this http URL.
[31] arXiv:2404.01537 (replaced) [pdf, other]: Title: Are Doppler Velocity Measurements Useful for Spinning Radar Odometry?

Daniil Lisus, Keenan Burnett, David J. Yoon, Richard Poulton, John Marshall, Timothy D. Barfoot

Comments: 8 pages, 7 figures, 2 tables, submitted to Robotics and Automation Letters (RA-L)

Subjects: Robotics (cs.RO)

Spinning, frequency-modulated continuous-wave (FMCW) radars with 360 degree coverage have been gaining popularity for autonomous-vehicle navigation. However, unlike 'fixed' automotive radar, commercially available spinning radar systems typically do not produce radial velocities due to the lack of repeated measurements in the same direction and the fundamental hardware setup. To make these radial velocities observable, we modified the firmware of a commercial spinning radar to use triangular frequency modulation. In this paper, we develop a novel way to use this modulation to extract radial Doppler velocity measurements from single raw radar intensity scans without any required data association. We show that these noisy, error-prone measurements contain enough information to provide good ego-velocity estimates, and incorporate these estimates into different modern odometry pipelines. We extensively evaluate the pipelines on over 110 km of driving data in progressively more geometrically challenging autonomous-driving environments. We show that Doppler velocity measurements improve odometry in well-defined geometric conditions and enable it to continue functioning even in severely geometrically degenerate environments, such as long tunnels.
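The abstract does not describe how the Doppler velocities are extracted from raw intensity scans, so the following is only a minimal sketch of the textbook relation that makes triangular modulation useful: the beat frequencies measured on the up-chirp and down-chirp can be combined to separate range from radial velocity. The function names, parameter names, and the sign convention (positive velocity means an approaching target) are assumptions made for illustration, not the authors' implementation.

```python
C = 299_792_458.0  # speed of light in m/s


def beat_frequencies(range_m, radial_velocity, bandwidth_hz, ramp_time_s, wavelength_m):
    """Forward model (assumed convention): beat frequencies on the up- and down-chirp."""
    f_range = 2.0 * range_m * bandwidth_hz / (C * ramp_time_s)  # range-induced beat
    f_doppler = 2.0 * radial_velocity / wavelength_m            # Doppler shift
    return f_range - f_doppler, f_range + f_doppler             # (up-chirp, down-chirp)


def range_and_velocity(f_up, f_down, bandwidth_hz, ramp_time_s, wavelength_m):
    """Invert the forward model: the mean gives range, the half-difference gives velocity."""
    f_range = 0.5 * (f_up + f_down)
    f_doppler = 0.5 * (f_down - f_up)
    range_m = C * ramp_time_s * f_range / (2.0 * bandwidth_hz)
    radial_velocity = 0.5 * wavelength_m * f_doppler
    return range_m, radial_velocity


# Round trip with illustrative (hypothetical) radar parameters.
f_up, f_down = beat_frequencies(50.0, 10.0, 1e9, 1e-3, 0.004)
est_range, est_velocity = range_and_velocity(f_up, f_down, 1e9, 1e-3, 0.004)
```

With a single sawtooth chirp only one beat frequency is available per direction, so range and Doppler cannot be separated this way; the second ramp of the triangle supplies the missing equation.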
[32] arXiv:2404.10658 (replaced) [pdf, other]: Title: Trajectory Planning Using Reinforcement Learning for Interactive Overtaking Maneuvers in Autonomous Racing Scenarios

Levent Ögretmen, Mo Chen, Phillip Pitschi, Boris Lohmann

Comments: 8 pages, accepted to be published at the 27th IEEE International Conference on Intelligent Transportation Systems, September 24 - 27, 2024, Edmonton, Canada

Subjects: Robotics (cs.RO)

Conventional trajectory planning approaches for autonomous racing are based on the sequential execution of prediction of the opposing vehicles and subsequent trajectory planning for the ego vehicle. If the opposing vehicles do not react to the ego vehicle, they can be predicted accurately. However, if there is interaction between the vehicles, the prediction loses its validity. For high interaction, instead of a planning approach that reacts exclusively to the fixed prediction, a trajectory planning approach is required that incorporates the interaction with the opposing vehicles. This paper demonstrates the limitations of a widely used conventional sampling-based approach within a highly interactive blocking scenario. We show that high success rates are achieved for less aggressive blocking behavior but that the collision rate increases with more significant interaction. We further propose a novel Reinforcement Learning (RL)-based trajectory planning approach for racing that explicitly exploits the interaction with the opposing vehicle without requiring a prediction. In contrast to the conventional approach, the RL-based approach achieves high success rates even for aggressive blocking behavior. Furthermore, we propose a novel safety layer (SL) that intervenes when the trajectory generated by the RL-based approach is infeasible. In that event, the SL generates a sub-optimal but feasible trajectory, avoiding termination of the scenario because no valid solution was found.
[33] arXiv:2406.16087 (replaced) [pdf, other]: Title: Imperative Learning: A Self-supervised Neural-Symbolic Learning Framework for Robot Autonomy

Chen Wang, Kaiyi Ji, Junyi Geng, Zhongqiang Ren, Taimeng Fu, Fan Yang, Yifan Guo, Haonan He, Xiangyu Chen, Zitong Zhan, Qiwei Du, Shaoshu Su, Bowen Li, Yuheng Qiu, Yi Du, Qihang Li, Yifan Yang, Xiao Lin, Zhipeng Zhao

Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)

Data-driven methods such as reinforcement and imitation learning have achieved remarkable success in robot autonomy. However, their data-centric nature still hinders them from generalizing well to ever-changing environments. Moreover, collecting large datasets for robotic tasks is often impractical and expensive. To overcome these challenges, we introduce a new self-supervised neural-symbolic (NeSy) computational framework, imperative learning (IL), for robot autonomy, leveraging the generalization abilities of symbolic reasoning. The framework of IL consists of three primary components: a neural module, a reasoning engine, and a memory system. We formulate IL as a special bilevel optimization (BLO), which enables reciprocal learning over the three modules. This overcomes the label-intensive obstacles associated with data-driven approaches and takes advantage of symbolic reasoning concerning logical reasoning, physical principles, geometric analysis, etc. We discuss several optimization techniques for IL and verify their effectiveness in five distinct robot autonomy tasks including path planning, rule induction, optimal control, visual odometry, and multi-robot routing. Through various experiments, we show that IL can significantly enhance robot autonomy capabilities and we anticipate that it will catalyze further research across diverse domains.
[34] arXiv:2406.17249 (replaced) [pdf, other]: Title: SlideSLAM: Sparse, Lightweight, Decentralized Metric-Semantic SLAM for Multi-Robot Navigation

Xu Liu, Jiuzhou Lei, Ankit Prabhu, Yuezhan Tao, Igor Spasojevic, Pratik Chaudhari, Nikolay Atanasov, Vijay Kumar

Comments: Xu Liu, Jiuzhou Lei, and Ankit Prabhu contributed equally to this work. This is a preliminary release and is subject to improvement

Subjects: Robotics (cs.RO)

This paper develops a real-time decentralized metric-semantic Simultaneous Localization and Mapping (SLAM) approach that leverages a sparse and lightweight object-based representation to enable a heterogeneous robot team to autonomously explore 3D environments featuring indoor, urban, and forested areas without relying on GPS. We use a hierarchical metric-semantic representation of the environment, including high-level sparse semantic maps of object models and low-level voxel maps. We leverage the informativeness and viewpoint invariance of the high-level semantic map to obtain an effective semantics-driven place-recognition algorithm for inter-robot loop closure detection across aerial and ground robots with different sensing modalities. A communication module is designed to track each robot's own observations and those of other robots whenever communication links are available. Such observations are then used to construct a merged map. Our framework enables real-time decentralized operations onboard robots, allowing them to opportunistically leverage communication. We integrate and deploy our proposed framework on three types of aerial and ground robots. Extensive experimental results show an average inter-robot localization error of approximately 20 cm in position and 0.2 degrees in orientation, an object mapping F1 score consistently over 0.9, and a communication packet size of merely 2-3 megabytes per kilometer trajectory with as many as 1,000 landmarks. The project website can be found at this https URL.
[35] arXiv:2407.10959 (replaced) [pdf, other]: Title: A unified theory and statistical learning approach for traffic conflict detection

Yiru Jiao, Simeon C. Calvert, Sander van Cranenburgh, Hans van Lint

Comments: 21 pages, 9 figures, prepared for submission

Subjects: Robotics (cs.RO); Machine Learning (stat.ML)

This study proposes a unified theory and statistical learning approach for traffic conflict detection, addressing the long-existing call for a consistent and comprehensive methodology to evaluate the collision risk emerging in road user interactions. The proposed theory assumes context-dependent probabilistic collision risk and frames conflict detection as assessing this risk by statistical learning of extreme events in daily interactions. Experiments using real-world trajectory data are conducted in this study, where a unified metric of conflict is trained with lane-changing interactions on German highways and applied to near-crash events from the 100-Car Naturalistic Driving Study in the U.S. Results of the experiments demonstrate that the trained metric provides effective collision warnings, generalises across distinct datasets and traffic environments, covers a broad range of conflicts, and delivers a long-tailed distribution of conflict intensity. Reflecting on these results, the unified theory ensures consistent evaluation by a generic formulation that encompasses varying assumptions of traffic conflicts; the statistical learning approach then enables a comprehensive consideration of influencing factors such as motion states of road users, environment conditions, and participant characteristics. Therefore, the theory and learning approach jointly provide an explainable and adaptable methodology for conflict detection among different road users and across various interaction scenarios. This promises to reduce accidents and improve overall traffic safety, by enhanced safety assessment of traffic infrastructures, more effective collision warning systems for autonomous driving, and a deeper understanding of road user behaviour in different traffic conditions.
[36] arXiv:2407.12197 (replaced) [pdf, other]: Title: Towards Interpretable Visuo-Tactile Predictive Models for Soft Robot Interactions

Enrico Donato, Thomas George Thuruthel, Egidio Falotico

Comments: IEEE RAS EMBS 10th International Conference on Biomedical Robotics and Biomechatronics (BioRob 2024)

Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)

Autonomous systems face the intricate challenge of navigating unpredictable environments and interacting with external objects. The successful integration of robotic agents into real-world situations hinges on their perception capabilities, which involve amalgamating world models and predictive skills. Effective perception models build upon the fusion of various sensory modalities to probe the surroundings. Deep learning applied to raw sensory modalities offers a viable option. However, learning-based perceptive representations become difficult to interpret. This challenge is particularly pronounced in soft robots, where the compliance of structures and materials makes prediction even harder. Our work addresses this complexity by harnessing a generative model to construct a multi-modal perception model for soft robots and to leverage proprioceptive and visual information to anticipate and interpret contact interactions with external objects. A suite of tools to interpret the perception model is furnished, shedding light on the fusion and prediction processes across multiple sensory inputs after the learning phase. We will delve into the outlooks of the perception model and its implications for control purposes.
[37] arXiv:2407.13842 (replaced) [pdf, other]: Title: Language-Driven 6-DoF Grasp Detection Using Negative Prompt Guidance

Toan Nguyen, Minh Nhat Vu, Baoru Huang, An Vuong, Quan Vuong, Ngan Le, Thieu Vo, Anh Nguyen

Comments: Accepted at ECCV 2024

Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)

6-DoF grasp detection has been a fundamental and challenging problem in robotic vision. While previous works have focused on ensuring grasp stability, they often do not consider human intention conveyed through natural language, hindering effective collaboration between robots and users in complex 3D environments. In this paper, we present a new approach for language-driven 6-DoF grasp detection in cluttered point clouds. We first introduce Grasp-Anything-6D, a large-scale dataset for the language-driven 6-DoF grasp detection task with 1M point cloud scenes and more than 200M language-associated 3D grasp poses. We further introduce a novel diffusion model that incorporates a new negative prompt guidance learning strategy. The proposed negative prompt strategy directs the detection process toward the desired object while steering away from unwanted ones given the language input. Our method enables an end-to-end framework where humans can command the robot to grasp desired objects in a cluttered scene using natural language. Intensive experimental results show the effectiveness of our method in both benchmarking experiments and real-world scenarios, surpassing other baselines. In addition, we demonstrate the practicality of our approach in real-world robotic applications. Our project is available at this https URL.
[38] arXiv:2311.13976 (replaced) [pdf, other]: Title: Low Latency Instance Segmentation by Continuous Clustering for LiDAR Sensors

Andreas Reich, Mirko Maehlisch

Comments: Accompanying Video: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)

Low-latency instance segmentation of LiDAR point clouds is crucial in real-world applications because it serves as an initial and frequently-used building block in a robot's perception pipeline, where every task adds further delay. Particularly in dynamic environments, this total delay can result in significant positional offsets of dynamic objects, as seen in highway scenarios. To address this issue, we employ a new technique, which we call continuous clustering. Unlike most existing clustering approaches, which use a full revolution of the LiDAR sensor, we process the data stream in a continuous and seamless fashion. Our approach does not rely on the concept of complete or partial sensor rotations with multiple discrete range images; instead, it views the range image as a single and infinitely horizontally growing entity. Each new column of this continuous range image is processed as soon as it is available. Obstacle points are clustered to existing instances in real time, and the algorithm checks at high frequency which instances are complete so that they can be published without waiting for the completion of the revolution or some other integration period. In the case of rotating sensors, no problematic discontinuities between the points of the end and the start of a scan are observed. In this work we describe the two-layered data structure and the corresponding algorithm for continuous clustering. It is able to achieve an average latency of just 5 ms with respect to the latest timestamp of all points in the cluster. We are publishing the source code at this https URL.
[39] arXiv:2402.16598 (replaced) [pdf, other]: Title: PCR-99: A Practical Method for Point Cloud Registration with 99% Outliers

Seong Hun Lee, Javier Civera, Patrick Vandewalle

Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)

We propose a robust method for point cloud registration that can handle both unknown scales and extreme outlier ratios. Our method, dubbed PCR-99, uses a deterministic 3-point sampling approach with two novel mechanisms that significantly boost the speed: (1) an improved ordering of the samples based on pairwise scale consistency, prioritizing the point correspondences that are more likely to be inliers, and (2) an efficient outlier rejection scheme based on triplet scale consistency, prescreening bad samples and reducing the number of hypotheses to be tested. Our evaluation shows that, up to 98% outlier ratio, the proposed method achieves comparable performance to the state of the art. At 99% outlier ratio, however, it outperforms the state of the art for both known-scale and unknown-scale problems. Especially for the latter, we observe a clear superiority in terms of robustness and speed.
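The abstract names pairwise scale consistency without giving its formula. A minimal sketch of the invariant it presumably relies on: under a similarity transform, every pair of inlier correspondences has the same ratio of target-to-source distances, namely the unknown scale. The function `pairwise_scale_ratios` and the toy data below are hypothetical, and the sketch does not reproduce PCR-99's actual ordering or rejection rules.

```python
import numpy as np


def pairwise_scale_ratios(src, dst):
    """Return an (N, N) matrix of ||dst_i - dst_j|| / ||src_i - src_j||.

    src and dst are (N, 3) arrays of putative correspondences. The diagonal
    is set to NaN because a point has no distance ratio with itself.
    """
    d_src = np.linalg.norm(src[:, None, :] - src[None, :, :], axis=-1)
    d_dst = np.linalg.norm(dst[:, None, :] - dst[None, :, :], axis=-1)
    with np.errstate(divide="ignore", invalid="ignore"):
        ratios = d_dst / d_src
    np.fill_diagonal(ratios, np.nan)
    return ratios


# Toy example: five inliers under scale 2.5, plus one corrupted correspondence.
rng = np.random.default_rng(0)
src = rng.normal(size=(6, 3))
angle = 0.7
rotation = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                     [np.sin(angle),  np.cos(angle), 0.0],
                     [0.0,            0.0,           1.0]])
dst = 2.5 * src @ rotation.T + np.array([1.0, -2.0, 0.5])
dst[5] += np.array([30.0, -40.0, 50.0])  # make the last correspondence an outlier

ratios = pairwise_scale_ratios(src, dst)
```

In this toy case, all ratios among the first five correspondences equal the true scale, while ratios involving the corrupted one do not, which is the kind of signal that could be used to prioritize likely inliers.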
[40] arXiv:2403.12856 (replaced) [pdf, other]: Title: Equivariant Ensembles and Regularization for Reinforcement Learning in Map-based Path Planning

Mirco Theile, Hongpeng Cao, Marco Caccamo, Alberto L. Sangiovanni-Vincentelli

Comments: Accepted at IROS 2024. A video can be found here: this https URL. The code is available at this https URL

Subjects: Machine Learning (cs.LG); Robotics (cs.RO)

In reinforcement learning (RL), exploiting environmental symmetries can significantly enhance efficiency, robustness, and performance. However, ensuring that the deep RL policy and value networks are respectively equivariant and invariant to exploit these symmetries is a substantial challenge. Related works try to design networks that are equivariant and invariant by construction, limiting them to a very restricted library of components, which in turn hampers the expressiveness of the networks. This paper proposes a method to construct equivariant policies and invariant value functions without specialized neural network components, which we term equivariant ensembles. We further add a regularization term for adding inductive bias during training. In a map-based path planning case study, we show how equivariant ensembles and regularization benefit sample efficiency and performance.
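The abstract does not spell out how the ensemble is built. One common construction, shown here as a sketch under that assumption, is to average an arbitrary policy over all symmetry transformations of the input while mapping each output back through the inverse transformation. The example assumes a square grid map, the four-element rotation group, and four actions ordered up, right, down, left; `base_policy` is a hypothetical stand-in for any network.

```python
import numpy as np


def base_policy(grid, weights):
    """Hypothetical non-equivariant policy: one linear layer followed by softmax."""
    logits = weights @ grid.ravel()
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()


def ensemble_policy(grid, weights):
    """Average the base policy over the four 90-degree rotations of the map.

    Actions are indexed 0=up, 1=right, 2=down, 3=left. Rotating the map
    counter-clockwise k times maps action a to (a - k) % 4, so the output for
    the rotated map is rolled by k to express it in the original frame.
    """
    probs = np.zeros(4)
    for k in range(4):
        p_k = base_policy(np.rot90(grid, k), weights)
        probs += np.roll(p_k, k)  # np.roll(p_k, k)[a] == p_k[(a - k) % 4]
    return probs / 4.0


rng = np.random.default_rng(1)
weights = rng.normal(size=(4, 25))
grid = rng.normal(size=(5, 5))
p = ensemble_policy(grid, weights)
p_rotated = ensemble_policy(np.rot90(grid), weights)
```

Here `p_rotated[(a - 1) % 4]` equals `p[a]` for every action `a`, even though `base_policy` itself has no such property. The cost is four forward passes per decision instead of one.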
[41] arXiv:2403.13129 (replaced) [pdf, other]: Title: Better Call SAL: Towards Learning to Segment Anything in Lidar

Aljoša Ošep, Tim Meinhardt, Francesco Ferroni, Neehar Peri, Deva Ramanan, Laura Leal-Taixé

Comments: Accepted to ECCV 2024

Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)

We propose the SAL (Segment Anything in Lidar) method consisting of a text-promptable zero-shot model for segmenting and classifying any object in Lidar, and a pseudo-labeling engine that facilitates model training without manual supervision. While the established paradigm for Lidar Panoptic Segmentation (LPS) relies on manual supervision for a handful of object classes defined a priori, we utilize 2D vision foundation models to generate 3D supervision "for free". Our pseudo-labels consist of instance masks and corresponding CLIP tokens, which we lift to Lidar using calibrated multi-modal data. By training our model on these labels, we distill the 2D foundation models into our Lidar SAL model. Even without manual labels, our model achieves 91% of the fully supervised state of the art in class-agnostic segmentation and 54% in zero-shot Lidar Panoptic Segmentation. Furthermore, we outperform several baselines that do not distill but only lift image features to 3D. More importantly, we demonstrate that SAL supports arbitrary class prompts, can be easily extended to new datasets, and shows significant potential to improve with increasing amounts of self-labeled data. Code and models are available at this https URL.
[42] arXiv:2407.05433 (replaced) [pdf, other]: Title: An efficient algorithm for solving linear equality-constrained LQR problems

João Sousa-Pinto, Dominique Orban

Comments: 6 pages

Subjects: Optimization and Control (math.OC); Robotics (cs.RO)

We present a new algorithm for solving linear-quadratic regulator (LQR) problems with linear equality constraints, also known as constrained LQR (CLQR) problems. Our method's sequential runtime is linear in the number of stages and constraints, and its parallel runtime is logarithmic in the number of stages. The main technical contribution of this paper is the derivation of parallelizable techniques for eliminating the linear equality constraints while preserving the standard positive (semi-)definiteness requirements of LQR problems.
[43] arXiv:2407.17413 (replaced) [pdf, other]: Title: $A^*$ for Graphs of Convex Sets

Kaarthik Sundar, Sivakumar Rathinam

Subjects: Optimization and Control (math.OC); Artificial Intelligence (cs.AI); Robotics (cs.RO)

We present a novel algorithm that fuses the existing convex-programming based approach with heuristic information to find optimality guarantees and near-optimal paths for the Shortest Path Problem in the Graph of Convex Sets (SPP-GCS). Our method, inspired by $A^*$, initiates a best-first-like procedure from a designated subset of vertices and iteratively expands it until further growth is neither possible nor beneficial. Traditionally, obtaining solutions with bounds for an optimization problem involves solving a relaxation, modifying the relaxed solution to a feasible one, and then comparing the two solutions to establish bounds. However, for SPP-GCS, we demonstrate that reversing this process can be more advantageous, especially with Euclidean travel costs. In other words, we initially employ $A^*$ to find a feasible solution for SPP-GCS, then solve a convex relaxation restricted to the vertices explored by $A^*$ to obtain a relaxed solution, and finally, compare the solutions to derive bounds. We present numerical results to highlight the advantages of our algorithm over the existing approach in terms of the sizes of the convex programs solved and computation time.

Robotics

New submissions for Friday, 26 July 2024 (showing 15 of 15 entries)

Cross submissions for Friday, 26 July 2024 (showing 12 of 12 entries)

Replacement submissions for Friday, 26 July 2024 (showing 16 of 16 entries)