
Showing 1–50 of 55 results for author: Guan, T

Searching in archive cs.
  1. arXiv:2408.12825  [pdf, other]

    cs.CV

    MergeUp-augmented Semi-Weakly Supervised Learning for WSI Classification

    Authors: Mingxi Ouyang, Yuqiu Fu, Renao Yan, ShanShan Shi, Xitong Ling, Lianghui Zhu, Yonghong He, Tian Guan

    Abstract: Recent advancements in computational pathology and artificial intelligence have significantly improved whole slide image (WSI) classification. However, the gigapixel resolution of WSIs and the scarcity of manual annotations present substantial challenges. Multiple instance learning (MIL) is a promising weakly supervised learning approach for WSI classification. Recent research revealed employing…

    Submitted 23 August, 2024; originally announced August 2024.

  2. arXiv:2407.07764  [pdf, other]

    cs.CV

    PosFormer: Recognizing Complex Handwritten Mathematical Expression with Position Forest Transformer

    Authors: Tongkun Guan, Chengyu Lin, Wei Shen, Xiaokang Yang

    Abstract: Handwritten Mathematical Expression Recognition (HMER) has wide applications in human-machine interaction scenarios, such as digitized education and automated offices. Recently, sequence-based models with encoder-decoder architectures have been commonly adopted to address this task by directly predicting LaTeX sequences of expression images. However, these methods only implicitly learn the syntax…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV2024

  3. arXiv:2406.18054  [pdf, other]

    eess.IV cs.CV

    Leveraging Pre-trained Models for FF-to-FFPE Histopathological Image Translation

    Authors: Qilai Zhang, Jiawen Li, Peiran Liao, Jiali Hu, Tian Guan, Anjia Han, Yonghong He

    Abstract: The two primary types of Hematoxylin and Eosin (H&E) slides in histopathology are Formalin-Fixed Paraffin-Embedded (FFPE) and Fresh Frozen (FF). FFPE slides offer high quality histopathological images but require a labor-intensive acquisition process. In contrast, FF slides can be prepared quickly, but the image quality is relatively poor. Our task is to translate FF images into FFPE style, thereb…

    Submitted 26 June, 2024; originally announced June 2024.

  4. arXiv:2406.10900  [pdf, other]

    cs.CV cs.CL

    AUTOHALLUSION: Automatic Generation of Hallucination Benchmarks for Vision-Language Models

    Authors: Xiyang Wu, Tianrui Guan, Dianqi Li, Shuaiyi Huang, Xiaoyu Liu, Xijun Wang, Ruiqi Xian, Abhinav Shrivastava, Furong Huang, Jordan Lee Boyd-Graber, Tianyi Zhou, Dinesh Manocha

    Abstract: Large vision-language models (LVLMs) hallucinate: certain context cues in an image may trigger the language module's overconfident and incorrect reasoning on abnormal or hypothetical objects. Though a few benchmarks have been developed to investigate LVLM hallucinations, they mainly rely on hand-crafted corner cases whose fail patterns may hardly generalize, and finetuning on them could undermine…

    Submitted 16 June, 2024; originally announced June 2024.

  5. arXiv:2406.00672  [pdf, other]

    cs.CV

    Task-oriented Embedding Counts: Heuristic Clustering-driven Feature Fine-tuning for Whole Slide Image Classification

    Authors: Xuenian Wang, Shanshan Shi, Renao Yan, Qiehe Sun, Lianghui Zhu, Tian Guan, Yonghong He

    Abstract: In the field of whole slide image (WSI) classification, multiple instance learning (MIL) serves as a promising approach, commonly decoupled into feature extraction and aggregation. In this paradigm, our observation reveals that discriminative embeddings are crucial for aggregation to the final prediction. Among all feature updating strategies, task-oriented ones can capture characteristics specifi…

    Submitted 2 June, 2024; originally announced June 2024.

  6. arXiv:2405.05363  [pdf, other]

    cs.CV cs.RO

    LOC-ZSON: Language-driven Object-Centric Zero-Shot Object Retrieval and Navigation

    Authors: Tianrui Guan, Yurou Yang, Harry Cheng, Muyuan Lin, Richard Kim, Rajasimman Madhivanan, Arnie Sen, Dinesh Manocha

    Abstract: In this paper, we present LOC-ZSON, a novel Language-driven Object-Centric image representation for object navigation tasks within complex scenes. We propose an object-centric image representation and corresponding losses for visual-language model (VLM) fine-tuning, which can handle complex object-level queries. In addition, we design a novel LLM-based augmentation and prompt templates for stabilit…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: Accepted to ICRA 2024

  7. arXiv:2404.12777  [pdf, other]

    cs.CV

    EfficientGS: Streamlining Gaussian Splatting for Large-Scale High-Resolution Scene Representation

    Authors: Wenkai Liu, Tao Guan, Bin Zhu, Lili Ju, Zikai Song, Dan Li, Yuesong Wang, Wei Yang

    Abstract: In the domain of 3D scene representation, 3D Gaussian Splatting (3DGS) has emerged as a pivotal technology. However, its application to large-scale, high-resolution scenes (exceeding 4k$\times$4k pixels) is hindered by the excessive computational requirements for managing a large number of Gaussians. Addressing this, we introduce 'EfficientGS', an advanced approach that optimizes 3DGS for high-res…

    Submitted 19 April, 2024; originally announced April 2024.

  8. arXiv:2404.03187  [pdf, other]

    cs.CV

    AGL-NET: Aerial-Ground Cross-Modal Global Localization with Varying Scales

    Authors: Tianrui Guan, Ruiqi Xian, Xijun Wang, Xiyang Wu, Mohamed Elnoor, Daeun Song, Dinesh Manocha

    Abstract: We present AGL-NET, a novel learning-based method for global localization using LiDAR point clouds and satellite maps. AGL-NET tackles two critical challenges: bridging the representation gap between image and point modalities for robust feature matching, and handling inherent scale discrepancies between the global view and the local view. To address these challenges, AGL-NET leverages a unified network…

    Submitted 4 April, 2024; originally announced April 2024.

  9. arXiv:2403.13235  [pdf, other]

    cs.RO

    AMCO: Adaptive Multimodal Coupling of Vision and Proprioception for Quadruped Robot Navigation in Outdoor Environments

    Authors: Mohamed Elnoor, Kasun Weerakoon, Adarsh Jagan Sathyamoorthy, Tianrui Guan, Vignesh Rajagopal, Dinesh Manocha

    Abstract: We present AMCO, a novel navigation method for quadruped robots that adaptively combines vision-based and proprioception-based perception capabilities. Our approach uses three cost maps (a general knowledge map, a traversability history map, and a current proprioception map), which are derived from a robot's vision and proprioception data, and couples them to obtain a coupled traversability cost map for…

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: 8 pages

  10. arXiv:2403.11193  [pdf, other]

    cs.CV

    Neural Markov Random Field for Stereo Matching

    Authors: Tongfan Guan, Chen Wang, Yun-Hui Liu

    Abstract: Stereo matching is a core task for many computer vision and robotics applications. Despite their dominance in traditional stereo methods, the hand-crafted Markov Random Field (MRF) models lack sufficient modeling accuracy compared to end-to-end deep models. While deep learning representations have greatly improved the unary terms of the MRF models, the overall accuracy is still severely limited by…

    Submitted 21 March, 2024; v1 submitted 17 March, 2024; originally announced March 2024.

    Comments: Accepted to CVPR 2024

  11. arXiv:2403.10858  [pdf, other]

    cs.CV

    RetMIL: Retentive Multiple Instance Learning for Histopathological Whole Slide Image Classification

    Authors: Hongbo Chu, Qiehe Sun, Jiawen Li, Yuxuan Chen, Lizhong Zhang, Tian Guan, Anjia Han, Yonghong He

    Abstract: Histopathological whole slide image (WSI) analysis with deep learning has become a research focus in computational pathology. The current paradigm is mainly based on multiple instance learning (MIL), in which approaches with Transformer as the backbone are well discussed. These methods convert WSI tasks into sequence tasks by representing patches as tokens in the WSI sequence. However, the feature…

    Submitted 16 March, 2024; originally announced March 2024.

    Comments: under review

  12. arXiv:2403.09606  [pdf, ps, other]

    cs.CL cs.AI

    Large Language Models and Causal Inference in Collaboration: A Comprehensive Survey

    Authors: Xiaoyu Liu, Paiheng Xu, Junda Wu, Jiaxin Yuan, Yifan Yang, Yuhang Zhou, Fuxiao Liu, Tianrui Guan, Haoliang Wang, Tong Yu, Julian McAuley, Wei Ai, Furong Huang

    Abstract: Causal inference has shown potential in enhancing the predictive accuracy, fairness, robustness, and explainability of Natural Language Processing (NLP) models by capturing causal relationships among variables. The emergence of generative Large Language Models (LLMs) has significantly impacted various NLP domains, particularly through their advanced reasoning capabilities. This survey focuses on e…

    Submitted 14 March, 2024; originally announced March 2024.

  13. arXiv:2403.07719  [pdf, other]

    cs.CV

    Dynamic Graph Representation with Knowledge-aware Attention for Histopathology Whole Slide Image Analysis

    Authors: Jiawen Li, Yuxuan Chen, Hongbo Chu, Qiehe Sun, Tian Guan, Anjia Han, Yonghong He

    Abstract: Histopathological whole slide image (WSI) classification has become a foundational task in medical microscopic image processing. Prevailing approaches involve learning WSIs as instance-bag representations, emphasizing significant instances but struggling to capture the interactions between instances. Additionally, conventional graph representation methods utilize explicit spatial positions to co…

    Submitted 12 March, 2024; originally announced March 2024.

    Comments: Accepted by CVPR 2024

  14. arXiv:2402.10340  [pdf, other]

    cs.RO cs.AI

    Highlighting the Safety Concerns of Deploying LLMs/VLMs in Robotics

    Authors: Xiyang Wu, Souradip Chakraborty, Ruiqi Xian, Jing Liang, Tianrui Guan, Fuxiao Liu, Brian M. Sadler, Dinesh Manocha, Amrit Singh Bedi

    Abstract: In this paper, we highlight the critical issues of robustness and safety associated with integrating large language models (LLMs) and vision-language models (VLMs) into robotics applications. Recent works focus on using LLMs and VLMs to improve the performance of robotics tasks, such as manipulation and navigation. Despite these improvements, analyzing the safety of such systems remains underexplo…

    Submitted 16 June, 2024; v1 submitted 15 February, 2024; originally announced February 2024.

  15. arXiv:2312.05490  [pdf, other]

    cs.CV

    Shapley Values-enabled Progressive Pseudo Bag Augmentation for Whole Slide Image Classification

    Authors: Renao Yan, Qiehe Sun, Cheng Jin, Yiqing Liu, Yonghong He, Tian Guan, Hao Chen

    Abstract: In computational pathology, whole slide image (WSI) classification presents a formidable challenge due to its gigapixel resolution and limited fine-grained annotations. Multiple instance learning (MIL) offers a weakly supervised solution, yet refining instance-level information from bag-level labels remains complex. While most of the conventional MIL methods use attention scores to estimate instan…

    Submitted 15 May, 2024; v1 submitted 9 December, 2023; originally announced December 2023.

    Comments: submitted to IEEE TRANSACTIONS ON MEDICAL IMAGING

  16. arXiv:2312.05286  [pdf, other]

    cs.CV

    Bridging Synthetic and Real Worlds for Pre-training Scene Text Detectors

    Authors: Tongkun Guan, Wei Shen, Xue Yang, Xuehui Wang, Xiaokang Yang

    Abstract: Existing scene text detection methods typically rely on extensive real data for training. Due to the lack of annotated real images, recent works have attempted to exploit large-scale labeled synthetic data (LSD) for pre-training text detectors. However, a synth-to-real domain gap emerges, further limiting the performance of text detectors. Differently, in this work, we propose FreeReal, a real-dom…

    Submitted 10 July, 2024; v1 submitted 8 December, 2023; originally announced December 2023.

    Comments: Accepted by ECCV2024

  17. arXiv:2310.14566  [pdf, other]

    cs.CV cs.CL

    HallusionBench: An Advanced Diagnostic Suite for Entangled Language Hallucination and Visual Illusion in Large Vision-Language Models

    Authors: Tianrui Guan, Fuxiao Liu, Xiyang Wu, Ruiqi Xian, Zongxia Li, Xiaoyu Liu, Xijun Wang, Lichang Chen, Furong Huang, Yaser Yacoob, Dinesh Manocha, Tianyi Zhou

    Abstract: We introduce HallusionBench, a comprehensive benchmark designed for the evaluation of image-context reasoning. This benchmark presents significant challenges to advanced large visual-language models (LVLMs), such as GPT-4V(Vision), Gemini Pro Vision, Claude 3, and LLaVA-1.5, by emphasizing nuanced understanding and interpretation of visual data. The benchmark comprises 346 images paired with 1129…

    Submitted 25 March, 2024; v1 submitted 23 October, 2023; originally announced October 2023.

    Comments: Accepted to CVPR 2024

  18. arXiv:2310.07191  [pdf, other]

    cs.CG math.NA

    $pκ$-Curves: Interpolatory curves with curvature approximating a parabola

    Authors: Zhihao Wang, Juan Cao, Tuan Guan, Zhonggui Chen, Yongjie Jessica Zhang

    Abstract: This paper introduces a novel class of fair and interpolatory curves called $pκ$-curves. These curves are comprised of smoothly stitched Bézier curve segments, where the curvature distribution of each segment is made to closely resemble a parabola, resulting in an aesthetically pleasing shape. Moreover, each segment passes through an interpolated point at a parameter where the parabola has an extr…

    Submitted 11 October, 2023; originally announced October 2023.

  19. arXiv:2307.06344  [pdf, other]

    q-bio.QM cs.CV eess.IV

    The Whole Pathological Slide Classification via Weakly Supervised Learning

    Authors: Qiehe Sun, Jiawen Li, Jin Xu, Junru Cheng, Tian Guan, Yonghong He

    Abstract: Due to its superior efficiency in utilizing annotations and addressing gigapixel-sized images, multiple instance learning (MIL) has shown great promise as a framework for whole slide image (WSI) classification in digital pathology diagnosis. However, existing methods tend to focus on advanced aggregators with different structures, often overlooking the intrinsic features of H&E pathological slide…

    Submitted 12 July, 2023; originally announced July 2023.

  20. arXiv:2306.10003  [pdf, other]

    cs.CV

    C2F2NeUS: Cascade Cost Frustum Fusion for High Fidelity and Generalizable Neural Surface Reconstruction

    Authors: Luoyuan Xu, Tao Guan, Yuesong Wang, Wenkai Liu, Zhaojie Zeng, Junle Wang, Wei Yang

    Abstract: There is an emerging effort to combine the two popular 3D frameworks using Multi-View Stereo (MVS) and Neural Implicit Surfaces (NIS) with a specific focus on the few-shot / sparse view setting. In this paper, we introduce a novel integration scheme that combines the multi-view stereo with neural signed distance function representations, which potentially overcomes the limitations of both methods.…

    Submitted 14 August, 2023; v1 submitted 16 June, 2023; originally announced June 2023.

    Comments: Accepted by ICCV2023

  21. arXiv:2306.06236  [pdf, other]

    cs.MA cs.LG cs.RO

    iPLAN: Intent-Aware Planning in Heterogeneous Traffic via Distributed Multi-Agent Reinforcement Learning

    Authors: Xiyang Wu, Rohan Chandra, Tianrui Guan, Amrit Singh Bedi, Dinesh Manocha

    Abstract: Navigating safely and efficiently in dense and heterogeneous traffic scenarios is challenging for autonomous vehicles (AVs) due to their inability to infer the behaviors or intentions of nearby drivers. In this work, we introduce a distributed multi-agent reinforcement learning (MARL) algorithm that can predict trajectories and intents in dense and heterogeneous traffic scenarios. Our approach for…

    Submitted 21 August, 2023; v1 submitted 9 June, 2023; originally announced June 2023.

  22. arXiv:2305.12437  [pdf, other]

    cs.CV

    SCP: Soft Conditional Prompt Learning for Aerial Video Action Recognition

    Authors: Xijun Wang, Ruiqi Xian, Tianrui Guan, Fuxiao Liu, Dinesh Manocha

    Abstract: We present a new learning approach, Soft Conditional Prompt Learning (SCP), which leverages the strengths of prompt learning for aerial video action recognition. Our approach is designed to predict the action of each agent by helping the models focus on the descriptions or instructions associated with actions in the input videos for aerial/robot visual perception. Our formulation supports various…

    Submitted 28 August, 2024; v1 submitted 21 May, 2023; originally announced May 2023.

    Comments: IROS2024

  23. arXiv:2303.17778  [pdf, other]

    cs.CV

    CrossLoc3D: Aerial-Ground Cross-Source 3D Place Recognition

    Authors: Tianrui Guan, Aswath Muthuselvam, Montana Hoover, Xijun Wang, Jing Liang, Adarsh Jagan Sathyamoorthy, Damon Conover, Dinesh Manocha

    Abstract: We present CrossLoc3D, a novel 3D place recognition method that solves a large-scale point matching problem in a cross-source setting. Cross-source point cloud data corresponds to point sets captured by depth sensors with different accuracies or from different distances and perspectives. We address the challenges in terms of developing 3D place recognition methods that account for the representati…

    Submitted 29 September, 2023; v1 submitted 30 March, 2023; originally announced March 2023.

  24. arXiv:2303.14502  [pdf, other]

    cs.RO

    VERN: Vegetation-aware Robot Navigation in Dense Unstructured Outdoor Environments

    Authors: Adarsh Jagan Sathyamoorthy, Kasun Weerakoon, Tianrui Guan, Mason Russell, Damon Conover, Jason Pusey, Dinesh Manocha

    Abstract: We propose a novel method for autonomous legged robot navigation in densely vegetated environments with a variety of pliable/traversable and non-pliable/untraversable vegetation. We present a novel few-shot learning classifier that can be trained on a few hundred RGB images to differentiate flora that can be navigated through, from the ones that must be circumvented. Using the vegetation classific…

    Submitted 25 March, 2023; originally announced March 2023.

    Comments: 8 Pages, 5 figures

  25. AZTR: Aerial Video Action Recognition with Auto Zoom and Temporal Reasoning

    Authors: Xijun Wang, Ruiqi Xian, Tianrui Guan, Celso M. de Melo, Stephen M. Nogar, Aniket Bera, Dinesh Manocha

    Abstract: We propose a novel approach for aerial video action recognition. Our method is designed for videos captured using UAVs and can run on edge or mobile devices. We present a learning-based approach that uses customized auto zoom to automatically identify the human target and scale it appropriately. This makes it easier to extract the key features and reduces the computational overhead. We also presen…

    Submitted 2 March, 2023; originally announced March 2023.

    Comments: Accepted for publication at ICRA 2023

  26. arXiv:2211.00288  [pdf, other]

    cs.CV

    Self-supervised Character-to-Character Distillation for Text Recognition

    Authors: Tongkun Guan, Wei Shen, Xue Yang, Qi Feng, Zekun Jiang, Xiaokang Yang

    Abstract: When handling complicated text images (e.g., irregular structures, low resolution, heavy occlusion, and uneven illumination), existing supervised text recognition methods are data-hungry. Although these methods employ large-scale synthetic text images to reduce the dependence on annotated real images, the domain gap still limits the recognition performance. Therefore, exploring the robust text fea…

    Submitted 18 August, 2023; v1 submitted 1 November, 2022; originally announced November 2022.

    Comments: Accepted by ICCV2023

  27. arXiv:2209.07725  [pdf, other]

    cs.RO cs.CV

    VINet: Visual and Inertial-based Terrain Classification and Adaptive Navigation over Unknown Terrain

    Authors: Tianrui Guan, Ruitao Song, Zhixian Ye, Liangjun Zhang

    Abstract: We present a visual and inertial-based terrain classification network (VINet) for robotic navigation over different traversable surfaces. We use a novel navigation-based labeling scheme for terrain classification and generalization on unknown surfaces. Our proposed perception method and adaptive scheduling control framework can make predictions according to terrain navigation properties and lead t…

    Submitted 1 March, 2023; v1 submitted 16 September, 2022; originally announced September 2022.

  28. arXiv:2209.05722  [pdf, other]

    cs.RO

    GrASPE: Graph based Multimodal Fusion for Robot Navigation in Unstructured Outdoor Environments

    Authors: Kasun Weerakoon, Adarsh Jagan Sathyamoorthy, Jing Liang, Tianrui Guan, Utsav Patel, Dinesh Manocha

    Abstract: We present a novel trajectory traversability estimation and planning algorithm for robot navigation in complex outdoor environments. We incorporate multimodal sensory inputs from an RGB camera, 3D LiDAR, and the robot's odometry sensor to train a prediction model to estimate candidate trajectories' success probabilities based on partially reliable multi-modal sensor observations. We encode high-di…

    Submitted 16 May, 2023; v1 submitted 13 September, 2022; originally announced September 2022.

  29. arXiv:2207.13848  [pdf, other]

    cs.DC cs.LG cs.PF math.NA

    Predicting the Output Structure of Sparse Matrix Multiplication with Sampled Compression Ratio

    Authors: Zhaoyang Du, Yijin Guan, Tianchan Guan, Dimin Niu, Nianxiong Tan, Xiaopeng Yu, Hongzhong Zheng, Jianyi Meng, Xiaolang Yan, Yuan Xie

    Abstract: Sparse general matrix multiplication (SpGEMM) is a fundamental building block in numerous scientific applications. One critical task of SpGEMM is to compute or predict the structure of the output matrix (i.e., the number of nonzero elements per output row) for efficient memory allocation and load balance, which impact the overall performance of SpGEMM. Existing work either precisely calculates the…

    Submitted 27 July, 2022; originally announced July 2022.

    Comments: This paper has been submitted to the IEEE International Conference on Parallel and Distributed Systems (ICPADS). 8 pages, 2 figures, 3 tables

    ACM Class: F.2.1; G.3; D.1.3; G.1.3

  30. arXiv:2206.07244  [pdf, other]

    cs.DC

    OpSparse: a Highly Optimized Framework for Sparse General Matrix Multiplication on GPUs

    Authors: Zhaoyang Du, Yijin Guan, Tianchan Guan, Dimin Niu, Linyong Huang, Hongzhong Zheng, Yuan Xie

    Abstract: Sparse general matrix multiplication (SpGEMM) is an important and expensive computation primitive in many real-world applications. Due to SpGEMM's inherent irregularity and the vast diversity of its input matrices, developing a high-performance SpGEMM implementation on modern processors such as GPUs is challenging. The state-of-the-art SpGEMM libraries (i.e., nsparse and spECK) adopt several alg…

    Submitted 14 June, 2022; originally announced June 2022.

    Comments: This paper was submitted to IEEE Access on May 7, 2022, and is currently under review by IEEE Access. 20 pages, 11 figures, 5 tables

    MSC Class: 68-02; 68W10; 65F50

    ACM Class: D.1.3; G.1.3

  31. arXiv:2206.06611  [pdf, other]

    cs.DC cs.MS cs.PF

    Accelerating CPU-Based Sparse General Matrix Multiplication With Binary Row Merging

    Authors: Zhaoyang Du, Yijin Guan, Tianchan Guan, Dimin Niu, Hongzhong Zheng, Yuan Xie

    Abstract: Sparse general matrix multiplication (SpGEMM) is a fundamental building block for many real-world applications. Since SpGEMM is a well-known memory-bound application with vast and irregular memory accesses, memory access efficiency is of critical importance for SpGEMM's performance. Yet, the existing methods give little consideration to the memory subsystem and achieve suboptimal…

    Submitted 19 August, 2022; v1 submitted 14 June, 2022; originally announced June 2022.

    Comments: This work has been accepted by IEEE Access (DOI:10.1109/ACCESS.2022.3193937). 12 pages, 6 figures, 2 tables

    MSC Class: 68-02; 68W10; 65F50

    ACM Class: D.1.3; G.1.3

  32. arXiv:2206.05840  [pdf, other]

    cs.LG cs.AI

    GAN based Data Augmentation to Resolve Class Imbalance

    Authors: Sairamvinay Vijayaraghavan, Terry Guan, Jason Song

    Abstract: The number of credit card fraud cases has been growing as technology advances and people take advantage of it. Therefore, it is very important to implement a robust and effective method to detect such fraud. Machine learning algorithms are appropriate for these tasks since they try to maximize the accuracy of predictions and hence can be relied upon. However, there is an impending flaw where in ma…

    Submitted 12 June, 2022; originally announced June 2022.

  33. arXiv:2205.03517  [pdf, other]

    cs.RO

    AdaptiveON: Adaptive Outdoor Local Navigation Method For Stable and Reliable Actions

    Authors: Jing Liang, Kasun Weerakoon, Tianrui Guan, Nare Karapetyan, Dinesh Manocha

    Abstract: We present a novel outdoor navigation algorithm to generate stable and efficient actions to navigate a robot to reach a goal. We use a multi-stage training pipeline and show that our approach produces policies that result in stable and reliable robot navigation on complex terrains. Based on the Proximal Policy Optimization (PPO) algorithm, we developed a novel method to achieve multiple capabiliti…

    Submitted 6 December, 2022; v1 submitted 6 May, 2022; originally announced May 2022.

    Comments: 10 pages

  34. arXiv:2203.15459  [pdf, other]

    cs.SE

    Influence of Communication Among Shared Developers on the Productivity of Open Source Software Projects

    Authors: Sairamvinay Vijayaraghavan, Jinxiao Song, Terry Guan, Seongwoo Choi, Sutej Kulkarni

    Abstract: Many software developers rely on open source software for developing their applications and writing their source code. Measuring an independent project's overall productivity is still an open problem for many technology companies. In this project, we aim to bridge this gap by analyzing which features are most important for predicting productivity. We have chosen to collec…

    Submitted 25 March, 2022; originally announced March 2022.

  35. arXiv:2203.10694  [pdf, other]

    cs.CV

    FAR: Fourier Aerial Video Recognition

    Authors: Divya Kothandaraman, Tianrui Guan, Xijun Wang, Sean Hu, Ming Lin, Dinesh Manocha

    Abstract: We present an algorithm, Fourier Activity Recognition (FAR), for UAV video activity recognition. Our formulation uses a novel Fourier object disentanglement method to innately separate out the human agent (which is typically small) from the background. Our disentanglement technique operates in the frequency domain to characterize the extent of temporal change of spatial pixels, and exploits convol…

    Submitted 18 July, 2022; v1 submitted 20 March, 2022; originally announced March 2022.

    Comments: ECCV 2022 Poster paper

  36. arXiv:2203.03382  [pdf, other]

    cs.CV

    Self-supervised Implicit Glyph Attention for Text Recognition

    Authors: Tongkun Guan, Chaochen Gu, Jingzheng Tu, Xue Yang, Qi Feng, Yudi Zhao, Xiaokang Yang, Wei Shen

    Abstract: The attention mechanism has become the de facto module in scene text recognition (STR) methods, due to its capability of extracting character-level representations. These methods can be summarized into implicit attention-based and supervised attention-based methods, depending on how the attention is computed, i.e., implicit attention and supervised attention are learned from sequence-level text anno…

    Submitted 15 May, 2023; v1 submitted 7 March, 2022; originally announced March 2022.

    Comments: CVPR2023

  37. arXiv:2202.12873  [pdf, other]

    cs.RO

    TerraPN: Unstructured Terrain Navigation using Online Self-Supervised Learning

    Authors: Adarsh Jagan Sathyamoorthy, Kasun Weerakoon, Tianrui Guan, Jing Liang, Dinesh Manocha

    Abstract: We present TerraPN, a novel method that learns the surface properties (traction, bumpiness, deformability, etc.) of complex outdoor terrains directly from robot-terrain interactions through self-supervised learning, and uses it for autonomous robot navigation. Our method uses RGB images of terrain surfaces and the robot's velocities as inputs, and the IMU vibrations and odometry errors experienced…

    Submitted 22 June, 2022; v1 submitted 25 February, 2022; originally announced February 2022.

    Comments: 10 pages, 6 figures

  38. Learning to be a Statistician: Learned Estimator for Number of Distinct Values

    Authors: Renzhi Wu, Bolin Ding, Xu Chu, Zhewei Wei, Xiening Dai, Tao Guan, Jingren Zhou

    Abstract: Estimating the number of distinct values (NDV) in a column is useful for many tasks in database systems, such as columnstore compression and data profiling. In this work, we focus on how to derive accurate NDV estimations from random (online/offline) samples. Such efficient estimation is critical for tasks where it is prohibitive to scan the data even once. Existing sample-based estimators typical…

    Submitted 6 February, 2022; originally announced February 2022.

    Comments: Published at International Conference on Very Large Data Bases (VLDB) 2022

    Journal ref: PVLDB, 15(2): 272 - 284, 2022

  39. Industrial Scene Text Detection with Refined Feature-attentive Network

    Authors: Tongkun Guan, Chaochen Gu, Changsheng Lu, Jingzheng Tu, Qi Feng, Kaijie Wu, Xinping Guan

    Abstract: Detecting the marking characters of industrial metal parts remains challenging due to low visual contrast, uneven illumination, corroded character structures, and cluttered background of metal part images. Affected by these factors, bounding boxes generated by most existing methods locate low-contrast text areas inaccurately. In this paper, we propose a refined feature-attentive network (RFN) to s…

    Submitted 29 March, 2022; v1 submitted 25 October, 2021; originally announced October 2021.

  40. arXiv:2109.06250  [pdf, other]

    cs.RO

    TNS: Terrain Traversability Mapping and Navigation System for Autonomous Excavators

    Authors: Tianrui Guan, Zhenpeng He, Ruitao Song, Dinesh Manocha, Liangjun Zhang

    Abstract: We present a terrain traversability mapping and navigation system (TNS) for autonomous excavator applications in an unstructured environment. We use an efficient approach to extract terrain features from RGB images and 3D point clouds and incorporate them into a global map for planning and navigation. Our system can adapt to changing environments and update the terrain information in real-time. Mo…

    Submitted 5 May, 2022; v1 submitted 13 September, 2021; originally announced September 2021.

  41. arXiv:2104.11896  [pdf, other]

    cs.CV

    M3DeTR: Multi-representation, Multi-scale, Mutual-relation 3D Object Detection with Transformers

    Authors: Tianrui Guan, Jun Wang, Shiyi Lan, Rohan Chandra, Zuxuan Wu, Larry Davis, Dinesh Manocha

    Abstract: We present a novel architecture for 3D object detection, M3DeTR, which combines different point cloud representations (raw, voxels, bird's-eye view) with different feature scales based on multi-scale feature pyramids. M3DeTR is the first approach that unifies multiple point cloud representations, feature scales, as well as models mutual relationships between point clouds simultaneously using transfo…

    Submitted 22 October, 2021; v1 submitted 24 April, 2021; originally announced April 2021.

  42. arXiv:2103.04233  [pdf, other]

    cs.RO cs.CV

    GANav: Efficient Terrain Segmentation for Robot Navigation in Unstructured Outdoor Environments

    Authors: Tianrui Guan, Divya Kothandaraman, Rohan Chandra, Adarsh Jagan Sathyamoorthy, Kasun Weerakoon, Dinesh Manocha

    Abstract: We propose GANav, a novel group-wise attention mechanism to identify safe and navigable regions in off-road terrains and unstructured environments from RGB images. Our approach classifies terrains based on their navigability levels using coarse-grained semantic segmentation. Our novel group-wise attention loss enables any backbone network to explicitly focus on the different groups' features with…

    Submitted 17 June, 2022; v1 submitted 6 March, 2021; originally announced March 2021.

  43. arXiv:2009.13631  [pdf, other]

    cs.DB

    Tempura: A General Cost Based Optimizer Framework for Incremental Data Processing (Extended Version)

    Authors: Zuozhi Wang, Kai Zeng, Botong Huang, Wei Chen, Xiaozong Cui, Bo Wang, Ji Liu, Liya Fan, Dachuan Qu, Zhenyu Hou, Tao Guan, Chen Li, Jingren Zhou

    Abstract: Incremental processing is widely-adopted in many applications, ranging from incremental view maintenance, stream computing, to recently emerging progressive data warehouse and intermittent query processing. Despite many algorithms developed on this topic, none of them can produce an incremental plan that always achieves the best performance, since the optimal plan is data dependent. In this paper,…

    Submitted 28 September, 2020; originally announced September 2020.

    Comments: 19 pages, 8 figures. The short version of this paper is accepted at VLDB 2021 (PVLDB Volume 14, Issue 1)

    ACM Class: H.2.4

  44. Ultrasound Liver Fibrosis Diagnosis using Multi-indicator guided Deep Neural Networks

    Authors: Jiali Liu, Wenxuan Wang, Tianyao Guan, Ningbo Zhao, Xiaoguang Han, Zhen Li

    Abstract: Accurate analysis of the fibrosis stage plays very important roles in follow-up of patients with chronic hepatitis B infection. In this paper, a deep learning framework is presented for automatically liver fibrosis prediction. On contrary of previous works, our approach can take use of the information provided by multiple ultrasound images. An indicator-guided learning mechanism is further propose…

    Submitted 10 September, 2020; originally announced September 2020.

    Comments: Jiali Liu and Wenxuan Wang are equal contribution

    Journal ref: Machine Learning in Medical Imaging 2019

  45. arXiv:2004.10976  [pdf, other]

    cs.RO

    OF-VO: Efficient Navigation among Pedestrians Using Commodity Sensors

    Authors: Jing Liang, Yi-Ling Qiao, Tianrui Guan, Dinesh Manocha

    Abstract: We present a modified velocity-obstacle (VO) algorithm that uses probabilistic partial observations of the environment to compute velocities and navigate a robot to a target. Our system uses commodity visual sensors, including a mono-camera and a 2D Lidar, to explicitly predict the velocities and positions of surrounding obstacles through optical flow estimation, object detection, and sensor fusio…

    Submitted 8 June, 2021; v1 submitted 23 April, 2020; originally announced April 2020.

    Comments: 10 pages

  46. arXiv:2004.06042  [pdf, other]

    cs.CV cs.LG eess.IV

    Adversarial Style Mining for One-Shot Unsupervised Domain Adaptation

    Authors: Yawei Luo, Ping Liu, Tao Guan, Junqing Yu, Yi Yang

    Abstract: We aim at the problem named One-Shot Unsupervised Domain Adaptation. Unlike traditional Unsupervised Domain Adaptation, it assumes that only one unlabeled target sample can be available when learning to adapt. This setting is realistic but more challenging, in which conventional adaptation approaches are prone to failure due to the scarce of unlabeled target data. To this end, we propose a novel A…

    Submitted 13 April, 2020; originally announced April 2020.

    Comments: Preprint

  47. arXiv:2003.05395  [pdf, other]

    cs.RO

    Frozone: Freezing-Free, Pedestrian-Friendly Navigation in Human Crowds

    Authors: Adarsh Jagan Sathyamoorthy, Utsav Patel, Tianrui Guan, Dinesh Manocha

    Abstract: We present Frozone, a novel algorithm to deal with the Freezing Robot Problem (FRP) that arises when a robot navigates through dense scenarios and crowds. Our method senses and explicitly predicts the trajectories of pedestrians and constructs a Potential Freezing Zone (PFZ); a spatial zone where the robot could freeze or be obtrusive to humans. Our formulation computes a deviation velocity to avo…

    Submitted 11 March, 2020; originally announced March 2020.

    Comments: 9 pages, 6 figures

  48. arXiv:2002.03038  [pdf, other]

    cs.RO

    DenseCAvoid: Real-time Navigation in Dense Crowds using Anticipatory Behaviors

    Authors: Adarsh Jagan Sathyamoorthy, Jing Liang, Utsav Patel, Tianrui Guan, Rohan Chandra, Dinesh Manocha

    Abstract: We present DenseCAvoid, a novel navigation algorithm for navigating a robot through dense crowds and avoiding collisions by anticipating pedestrian behaviors. Our formulation uses visual sensors and a pedestrian trajectory prediction algorithm to track pedestrians in a set of input frames and provide bounding boxes that extrapolate the pedestrian positions in a future time. Our hybrid approach com…

    Submitted 7 February, 2020; originally announced February 2020.

    Comments: 8 pages, 5 figures

  49. arXiv:1912.01118  [pdf, other]

    cs.RO

    Forecasting Trajectory and Behavior of Road-Agents Using Spectral Clustering in Graph-LSTMs

    Authors: Rohan Chandra, Tianrui Guan, Srujan Panuganti, Trisha Mittal, Uttaran Bhattacharya, Aniket Bera, Dinesh Manocha

    Abstract: We present a novel approach for traffic forecasting in urban traffic scenarios using a combination of spectral graph analysis and deep learning. We predict both the low-level information (future trajectories) as well as the high-level information (road-agent behavior) from the extracted trajectory of each road-agent. Our formulation represents the proximity between the road agents using a weighted…

    Submitted 5 August, 2020; v1 submitted 2 December, 2019; originally announced December 2019.

    Comments: Accepted to RAL/IROS 2020 as a dual journal and conference submission

  50. arXiv:1905.02607  [pdf, other]

    cs.SI cs.LG

    PocketCare: Tracking the Flu with Mobile Phones using Partial Observations of Proximity and Symptoms

    Authors: Wen Dong, Tong Guan, Bruno Lepri, Chunming Qiao

    Abstract: Mobile phones provide a powerful sensing platform that researchers may adopt to understand proximity interactions among people and the diffusion, through these interactions, of diseases, behaviors, and opinions. However, it remains a challenge to track the proximity-based interactions of a whole community and then model the social diffusion of diseases and behaviors starting from the observations…

    Submitted 7 May, 2019; originally announced May 2019.