Showing 1–50 of 166 results for author: Chellappa, R

  1. arXiv:2407.18428  [pdf, other]

    cs.LG cs.AI cs.CV

    Weighted Risk Invariance: Domain Generalization under Invariant Feature Shift

    Authors: Gina Wong, Joshua Gleason, Rama Chellappa, Yoav Wald, Anqi Liu

    Abstract: Learning models whose predictions are invariant under multiple environments is a promising approach for out-of-distribution generalization. Such models are trained to extract features $X_{\text{inv}}$ where the conditional distribution $Y \mid X_{\text{inv}}$ of the label given the extracted features does not change across environments. Invariant models are also supposed to generalize to shifts in…

    Submitted 25 July, 2024; originally announced July 2024.

  2. arXiv:2407.09413  [pdf, other]

    cs.CL cs.AI cs.CV

    SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers

    Authors: Shraman Pramanick, Rama Chellappa, Subhashini Venugopalan

    Abstract: Seeking answers to questions within long scientific research articles is a crucial area of study that aids readers in quickly addressing their inquiries. However, existing question-answering (QA) datasets based on scientific papers are limited in scale and focus solely on textual content. To address this limitation, we introduce SPIQA (Scientific Paper Image Question Answering), the first large-sc…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: preprint

  3. arXiv:2405.18784  [pdf, other]

    cs.CV

    LP-3DGS: Learning to Prune 3D Gaussian Splatting

    Authors: Zhaoliang Zhang, Tianchen Song, Yongjae Lee, Li Yang, Cheng Peng, Rama Chellappa, Deliang Fan

    Abstract: Recently, 3D Gaussian Splatting (3DGS) has become one of the mainstream methodologies for novel view synthesis (NVS) due to its high quality and fast rendering speed. However, as a point-based scene representation, 3DGS potentially generates a large number of Gaussians to fit the scene, leading to high memory usage. Improvements that have been proposed require either an empirical and preset prunin…

    Submitted 29 May, 2024; originally announced May 2024.

  4. arXiv:2404.09432  [pdf, other]

    cs.CV cs.AI cs.LG

    The 8th AI City Challenge

    Authors: Shuo Wang, David C. Anastasiu, Zheng Tang, Ming-Ching Chang, Yue Yao, Liang Zheng, Mohammed Shaiqur Rahman, Meenakshi S. Arya, Anuj Sharma, Pranamesh Chakraborty, Sanjita Prajapati, Quan Kong, Norimasa Kobori, Munkhjargal Gochoo, Munkh-Erdene Otgonbold, Fady Alnajjar, Ganzorig Batnasan, Ping-Yang Chen, Jun-Wei Hsieh, Xunlei Wu, Sameer Satish Pusegaonkar, Yizhou Wang, Sujit Biswas, Rama Chellappa

    Abstract: The eighth AI City Challenge highlighted the convergence of computer vision and artificial intelligence in areas like retail, warehouse settings, and Intelligent Traffic Systems (ITS), presenting significant research opportunities. The 2024 edition featured five tracks, attracting unprecedented interest from 726 teams in 47 countries and regions. Track 1 dealt with multi-target multi-camera (MTMC)…

    Submitted 14 April, 2024; originally announced April 2024.

    Comments: Summary of the 8th AI City Challenge Workshop in conjunction with CVPR 2024

  5. arXiv:2403.16365  [pdf, other]

    cs.LG cs.CR cs.CV

    Generating Potent Poisons and Backdoors from Scratch with Guided Diffusion

    Authors: Hossein Souri, Arpit Bansal, Hamid Kazemi, Liam Fowl, Aniruddha Saha, Jonas Geiping, Andrew Gordon Wilson, Rama Chellappa, Tom Goldstein, Micah Goldblum

    Abstract: Modern neural networks are often trained on massive datasets that are web scraped with minimal human inspection. As a result of this insecure curation pipeline, an adversary can poison or backdoor the resulting model by uploading malicious data to the internet and waiting for a victim to scrape and train on it. Existing approaches for creating poisons and backdoors start with randomly sampled clea…

    Submitted 24 March, 2024; originally announced March 2024.

  6. arXiv:2403.12960  [pdf, other]

    cs.CV

    FaceXFormer: A Unified Transformer for Facial Analysis

    Authors: Kartik Narayan, Vibashan VS, Rama Chellappa, Vishal M. Patel

    Abstract: In this work, we introduce FaceXformer, an end-to-end unified transformer model for a comprehensive range of facial analysis tasks such as face parsing, landmark detection, head pose estimation, attributes recognition, and estimation of age, gender, race, and landmarks visibility. Conventional methods in face analysis have often relied on task-specific designs and preprocessing techniques, which l…

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: Project page: https://kartik-3004.github.io/facexformer_web/

  7. arXiv:2403.04926  [pdf, other]

    cs.CV

    BAGS: Blur Agnostic Gaussian Splatting through Multi-Scale Kernel Modeling

    Authors: Cheng Peng, Yutao Tang, Yifan Zhou, Nengyu Wang, Xijun Liu, Deming Li, Rama Chellappa

    Abstract: Recent efforts in using 3D Gaussians for scene reconstruction and novel view synthesis can achieve impressive results on curated benchmarks; however, images captured in real life are often blurry. In this work, we analyze the robustness of Gaussian-Splatting-based methods against various image blur, such as motion blur, defocus blur, downscaling blur, etc. Under these degradations, Gaussian-Splat…

    Submitted 24 March, 2024; v1 submitted 7 March, 2024; originally announced March 2024.

  8. arXiv:2402.08113  [pdf, other]

    cs.CL cs.HC

    Addressing cognitive bias in medical language models

    Authors: Samuel Schmidgall, Carl Harris, Ime Essien, Daniel Olshvang, Tawsifur Rahman, Ji Woong Kim, Rojin Ziaei, Jason Eshraghian, Peter Abadir, Rama Chellappa

    Abstract: There is increasing interest in the application of large language models (LLMs) to the medical field, in part because of their impressive performance on medical exam questions. While promising, exam questions do not reflect the complexity of real patient-doctor interactions. In reality, physicians' decisions are shaped by many complex factors, such as patient compliance, personal experience, ethical…

    Submitted 20 February, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

  9. arXiv:2402.06106  [pdf, other]

    cs.CV

    CLR-Face: Conditional Latent Refinement for Blind Face Restoration Using Score-Based Diffusion Models

    Authors: Maitreya Suin, Rama Chellappa

    Abstract: Recent generative-prior-based methods have shown promising blind face restoration performance. They usually project the degraded images to the latent space and then decode high-quality faces either by single-stage latent optimization or directly from the encoding. Generating fine-grained facial details faithful to inputs remains a challenging problem. Most existing methods produce either overly sm…

    Submitted 8 February, 2024; originally announced February 2024.

  10. arXiv:2401.15900  [pdf, other]

    cs.CV

    MV2MAE: Multi-View Video Masked Autoencoders

    Authors: Ketul Shah, Robert Crandall, Jie Xu, Peng Zhou, Marian George, Mayank Bansal, Rama Chellappa

    Abstract: Videos captured from multiple viewpoints can help in perceiving the 3D structure of the world and benefit computer vision tasks such as action recognition, tracking, etc. In this paper, we present a method for self-supervised learning from synchronized multi-view videos. We use a cross-view reconstruction task to inject geometry information in the model. Our approach is based on the masked autoenc…

    Submitted 29 January, 2024; originally announced January 2024.

  11. arXiv:2312.14976  [pdf, other]

    cs.CV cs.CY

    Gaussian Harmony: Attaining Fairness in Diffusion-based Face Generation Models

    Authors: Basudha Pal, Arunkumar Kannan, Ram Prabhakar Kathirvel, Alice J. O'Toole, Rama Chellappa

    Abstract: Diffusion models have achieved great progress in face generation. However, these models amplify the bias in the generation process, leading to an imbalance in distribution of sensitive attributes such as age, gender and race. This paper proposes a novel solution to this problem by balancing the facial attributes of the generated images. We mitigate the bias by localizing the means of the facial at…

    Submitted 21 December, 2023; originally announced December 2023.

  12. arXiv:2312.12423  [pdf, other]

    cs.CV cs.AI

    Jack of All Tasks, Master of Many: Designing General-purpose Coarse-to-Fine Vision-Language Model

    Authors: Shraman Pramanick, Guangxing Han, Rui Hou, Sayan Nag, Ser-Nam Lim, Nicolas Ballas, Qifan Wang, Rama Chellappa, Amjad Almahairi

    Abstract: The ability of large language models (LLMs) to process visual inputs has given rise to general-purpose vision systems, unifying various vision-language (VL) tasks by instruction tuning. However, due to the enormous diversity in input-output formats in the vision domain, existing general-purpose models fail to successfully integrate segmentation and multi-image inputs with coarse-level tasks into a…

    Submitted 19 June, 2024; v1 submitted 19 December, 2023; originally announced December 2023.

    Comments: CVPR 2024 Highlight

  13. arXiv:2312.02914  [pdf, other]

    cs.CV cs.LG

    Unsupervised Video Domain Adaptation with Masked Pre-Training and Collaborative Self-Training

    Authors: Arun Reddy, William Paul, Corban Rivera, Ketul Shah, Celso M. de Melo, Rama Chellappa

    Abstract: In this work, we tackle the problem of unsupervised domain adaptation (UDA) for video action recognition. Our approach, which we call UNITE, uses an image teacher model to adapt a video student model to the target domain. UNITE first employs self-supervised pre-training to promote discriminative feature learning on target domain videos using a teacher-guided masked distillation objective. We then…

    Submitted 20 April, 2024; v1 submitted 5 December, 2023; originally announced December 2023.

    Comments: Accepted at CVPR 2024. 13 pages, 4 figures. Approved for public release: distribution unlimited

  14. arXiv:2312.02290  [pdf, other]

    cs.CV

    You Can Run but not Hide: Improving Gait Recognition with Intrinsic Occlusion Type Awareness

    Authors: Ayush Gupta, Rama Chellappa

    Abstract: While gait recognition has seen many advances in recent years, the occlusion problem has largely been ignored. This problem is especially important for gait recognition from uncontrolled outdoor sequences at range - since any small obstruction can affect the recognition system. Most current methods assume the availability of complete body information while extracting the gait features. When parts…

    Submitted 4 December, 2023; originally announced December 2023.

    Comments: This work has been accepted to WACV 2024 as an Oral paper

  15. arXiv:2311.17074  [pdf, other]

    cs.CV

    Self-Supervised Learning of Whole and Component-Based Semantic Representations for Person Re-Identification

    Authors: Siyuan Huang, Yifan Zhou, Ram Prabhakar, Xijun Liu, Yuxiang Guo, Hongrui Yi, Cheng Peng, Rama Chellappa, Chun Pong Lau

    Abstract: Person Re-Identification (ReID) is a challenging problem, focusing on identifying individuals across diverse settings. However, previous ReID methods primarily concentrated on a single domain or modality, such as Clothes-Changing ReID (CC-ReID) and video ReID. Real-world ReID is not constrained by factors like clothes or input types. Recent approaches emphasize learning semantics through pre-tr…

    Submitted 14 March, 2024; v1 submitted 27 November, 2023; originally announced November 2023.

  16. arXiv:2311.16497  [pdf, other]

    cs.CV

    GaitContour: Efficient Gait Recognition based on a Contour-Pose Representation

    Authors: Yuxiang Guo, Anshul Shah, Jiang Liu, Ayush Gupta, Rama Chellappa, Cheng Peng

    Abstract: Gait recognition holds the promise to robustly identify subjects based on walking patterns instead of appearance information. In recent years, this field has been dominated by learning methods based on two principal input representations: dense silhouette masks or sparse pose keypoints. In this work, we propose a novel, point-based Contour-Pose representation, which compactly expresses both body s…

    Submitted 14 March, 2024; v1 submitted 27 November, 2023; originally announced November 2023.

  17. arXiv:2311.15551  [pdf, other]

    cs.CV cs.AI cs.CR cs.LG eess.IV

    Instruct2Attack: Language-Guided Semantic Adversarial Attacks

    Authors: Jiang Liu, Chen Wei, Yuxiang Guo, Heng Yu, Alan Yuille, Soheil Feizi, Chun Pong Lau, Rama Chellappa

    Abstract: We propose Instruct2Attack (I2A), a language-guided semantic attack that generates semantically meaningful perturbations according to free-form language instructions. We make use of state-of-the-art latent diffusion models, where we adversarially guide the reverse diffusion process to search for an adversarial latent code conditioned on the input image and text instruction. Compared to existing no…

    Submitted 27 November, 2023; originally announced November 2023.

    Comments: under submission, code coming soon

  18. arXiv:2311.09753  [pdf, other]

    cs.CV

    DIFFNAT: Improving Diffusion Image Quality Using Natural Image Statistics

    Authors: Aniket Roy, Maiterya Suin, Anshul Shah, Ketul Shah, Jiang Liu, Rama Chellappa

    Abstract: Diffusion models have advanced generative AI significantly in terms of editing and creating naturalistic images. However, efficiently improving generated image quality is still of paramount interest. In this context, we propose a generic "naturalness" preserving loss function, viz., kurtosis concentration (KC) loss, which can be readily applied to any standard diffusion model pipeline to elevate t…

    Submitted 16 November, 2023; originally announced November 2023.

  19. arXiv:2311.05725  [pdf, other]

    cs.CV

    Whole-body Detection, Recognition and Identification at Altitude and Range

    Authors: Siyuan Huang, Ram Prabhakar Kathirvel, Chun Pong Lau, Rama Chellappa

    Abstract: In this paper, we address the challenging task of whole-body biometric detection, recognition, and identification at distances of up to 500m and large pitch angles of up to 50 degrees. We propose an end-to-end system evaluated on diverse datasets, including the challenging Biometric Recognition and Identification at Range (BRIAR) dataset. Our approach involves pre-training the detector on common im…

    Submitted 9 November, 2023; originally announced November 2023.

  20. arXiv:2310.19909  [pdf, other]

    cs.CV cs.LG

    Battle of the Backbones: A Large-Scale Comparison of Pretrained Models across Computer Vision Tasks

    Authors: Micah Goldblum, Hossein Souri, Renkun Ni, Manli Shu, Viraj Prabhu, Gowthami Somepalli, Prithvijit Chattopadhyay, Mark Ibrahim, Adrien Bardes, Judy Hoffman, Rama Chellappa, Andrew Gordon Wilson, Tom Goldstein

    Abstract: Neural network based computer vision systems are typically built on a backbone, a pretrained or randomly initialized feature extractor. Several years ago, the default option was an ImageNet-trained convolutional neural network. However, the recent past has seen the emergence of countless backbones pretrained using various algorithms and datasets. While this abundance of choice has led to performan…

    Submitted 19 November, 2023; v1 submitted 30 October, 2023; originally announced October 2023.

    Comments: Accepted to NeurIPS 2023

  21. arXiv:2310.03103  [pdf, other]

    cs.LG

    Learning to Prompt Your Domain for Vision-Language Models

    Authors: Guoyizhe Wei, Feng Wang, Anshul Shah, Rama Chellappa

    Abstract: Prompt learning has recently become a very efficient transfer learning paradigm for Contrastive Language Image Pretraining (CLIP) models. Compared with fine-tuning the entire encoder, prompt learning can obtain highly competitive results by optimizing only a small number of parameters, which presents considerably exciting benefits for federated learning applications that prioritize communication…

    Submitted 29 August, 2024; v1 submitted 4 October, 2023; originally announced October 2023.

  22. arXiv:2310.00116  [pdf, other]

    cs.LG cs.AI

    Certified Robustness via Dynamic Margin Maximization and Improved Lipschitz Regularization

    Authors: Mahyar Fazlyab, Taha Entesari, Aniket Roy, Rama Chellappa

    Abstract: To improve the robustness of deep classifiers against adversarial perturbations, many approaches have been proposed, such as designing new architectures with better robustness properties (e.g., Lipschitz-capped networks), or modifying the training process itself (e.g., min-max optimization, constrained learning, or regularization). These approaches, however, might not be effective at increasing th…

    Submitted 12 March, 2024; v1 submitted 29 September, 2023; originally announced October 2023.

    Comments: 37th Conference on Neural Information Processing Systems (NeurIPS 2023)

  23. arXiv:2309.16650  [pdf, other]

    cs.RO cs.CV

    ConceptGraphs: Open-Vocabulary 3D Scene Graphs for Perception and Planning

    Authors: Qiao Gu, Alihusein Kuwajerwala, Sacha Morin, Krishna Murthy Jatavallabhula, Bipasha Sen, Aditya Agarwal, Corban Rivera, William Paul, Kirsty Ellis, Rama Chellappa, Chuang Gan, Celso Miguel de Melo, Joshua B. Tenenbaum, Antonio Torralba, Florian Shkurti, Liam Paull

    Abstract: For robots to perform a wide variety of tasks, they require a 3D representation of the world that is semantically rich, yet compact and efficient for task-driven perception and planning. Recent approaches have attempted to leverage features from large vision-language models to encode semantics in 3D representations. However, these approaches tend to produce maps with per-point feature vectors, whi…

    Submitted 28 September, 2023; originally announced September 2023.

    Comments: Project page: https://concept-graphs.github.io/ Explainer video: https://youtu.be/mRhNkQwRYnc

  24. arXiv:2307.14578  [pdf, other]

    cs.CV

    GADER: GAit DEtection and Recognition in the Wild

    Authors: Yuxiang Guo, Cheng Peng, Ram Prabhakar, Chun Pong Lau, Rama Chellappa

    Abstract: Gait recognition holds the promise of robustly identifying subjects based on their walking patterns instead of color information. While previous approaches have performed well for curated indoor scenes, they have significantly impeded applicability in unconstrained situations, e.g. outdoor, long distance scenes. We propose an end-to-end GAit DEtection and Recognition (GADER) algorithm for human au…

    Submitted 26 July, 2023; originally announced July 2023.

  25. arXiv:2307.05463  [pdf, other]

    cs.CV

    EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone

    Authors: Shraman Pramanick, Yale Song, Sayan Nag, Kevin Qinghong Lin, Hardik Shah, Mike Zheng Shou, Rama Chellappa, Pengchuan Zhang

    Abstract: Video-language pre-training (VLP) has become increasingly important due to its ability to generalize to various vision and language tasks. However, existing egocentric VLP frameworks utilize separate video and language encoders and learn task-specific cross-modal information only during fine-tuning, limiting the development of a unified system. In this work, we introduce the second generation of e…

    Submitted 18 August, 2023; v1 submitted 11 July, 2023; originally announced July 2023.

    Comments: Published in ICCV 2023

  26. arXiv:2306.08213  [pdf, other]

    cs.CV cs.AI

    SMC-UDA: Structure-Modal Constraint for Unsupervised Cross-Domain Renal Segmentation

    Authors: Zhusi Zhong, Jie Li, Lulu Bi, Li Yang, Ihab Kamel, Rama Chellappa, Xinbo Gao, Harrison Bai, Zhicheng Jiao

    Abstract: Medical image segmentation based on deep learning often fails when deployed on images from a different domain. The domain adaptation methods aim to solve domain-shift challenges, but still face some problems. The transfer learning methods require annotation on the target domain, and the generative unsupervised domain adaptation (UDA) models ignore domain-specific representations, whose generated q…

    Submitted 13 June, 2023; originally announced June 2023.

    Comments: conference

  27. arXiv:2305.13625  [pdf, other]

    cs.CV cs.CR

    DiffProtect: Generate Adversarial Examples with Diffusion Models for Facial Privacy Protection

    Authors: Jiang Liu, Chun Pong Lau, Rama Chellappa

    Abstract: The increasingly pervasive facial recognition (FR) systems raise serious concerns about personal privacy, especially for billions of users who have publicly shared their photos on social media. Several attempts have been made to protect individuals from being identified by unauthorized FR systems utilizing adversarial attacks to generate encrypted face images. However, existing methods suffer from…

    Submitted 28 May, 2023; v1 submitted 22 May, 2023; originally announced May 2023.

    Comments: Code will be available at https://github.com/joellliu/DiffProtect/

  28. arXiv:2305.13548  [pdf, ps, other]

    cs.CV cs.CR

    Attribute-Guided Encryption with Facial Texture Masking

    Authors: Chun Pong Lau, Jiang Liu, Rama Chellappa

    Abstract: The increasingly pervasive facial recognition (FR) systems raise serious concerns about personal privacy, especially for billions of users who have publicly shared their photos on social media. Several attempts have been made to protect individuals from unauthorized FR systems utilizing adversarial attacks to generate encrypted face images to protect users from being identified by FR systems. Howe…

    Submitted 22 May, 2023; originally announced May 2023.

  29. arXiv:2304.07500  [pdf, other]

    cs.CV

    The 7th AI City Challenge

    Authors: Milind Naphade, Shuo Wang, David C. Anastasiu, Zheng Tang, Ming-Ching Chang, Yue Yao, Liang Zheng, Mohammed Shaiqur Rahman, Meenakshi S. Arya, Anuj Sharma, Qi Feng, Vitaly Ablavsky, Stan Sclaroff, Pranamesh Chakraborty, Sanjita Prajapati, Alice Li, Shangru Li, Krishna Kunadharaju, Shenxin Jiang, Rama Chellappa

    Abstract: The AI City Challenge's seventh edition emphasizes two domains at the intersection of computer vision and artificial intelligence - retail business and Intelligent Traffic Systems (ITS) - that have considerable untapped potential. The 2023 challenge had five tracks, which drew a record-breaking number of participation requests from 508 teams across 46 countries. Track 1 was a brand new track that…

    Submitted 15 April, 2023; originally announced April 2023.

    Comments: Summary of the 7th AI City Challenge Workshop in conjunction with CVPR 2023

  30. arXiv:2304.05387  [pdf, other]

    cs.CV

    MOST: Multiple Object localization with Self-supervised Transformers for object discovery

    Authors: Sai Saketh Rambhatla, Ishan Misra, Rama Chellappa, Abhinav Shrivastava

    Abstract: We tackle the challenging task of unsupervised object localization in this work. Recently, transformers trained with self-supervised learning have been shown to exhibit object localization properties without being trained for this task. In this work, we present Multiple Object localization with Self-supervised Transformers (MOST) that uses features of transformers trained using self-supervised lea…

    Submitted 26 August, 2023; v1 submitted 11 April, 2023; originally announced April 2023.

    Comments: Accepted to ICCV2023 as an Oral. Project webpage: https://rssaketh.github.io/most

  31. arXiv:2304.00387  [pdf, other]

    cs.CV

    HaLP: Hallucinating Latent Positives for Skeleton-based Self-Supervised Learning of Actions

    Authors: Anshul Shah, Aniket Roy, Ketul Shah, Shlok Kumar Mishra, David Jacobs, Anoop Cherian, Rama Chellappa

    Abstract: Supervised learning of skeleton sequence encoders for action recognition has received significant attention in recent times. However, learning such encoders without labels continues to be a challenging problem. While prior works have shown promising results by applying contrastive learning to pose sequences, the quality of the learned representations is often observed to be closely tied to data au…

    Submitted 1 April, 2023; originally announced April 2023.

    Comments: To be presented at CVPR 2023

  32. arXiv:2303.10280  [pdf, other]

    cs.CV

    Synthetic-to-Real Domain Adaptation for Action Recognition: A Dataset and Baseline Performances

    Authors: Arun V. Reddy, Ketul Shah, William Paul, Rohita Mocharla, Judy Hoffman, Kapil D. Katyal, Dinesh Manocha, Celso M. de Melo, Rama Chellappa

    Abstract: Human action recognition is a challenging problem, particularly when there is high variability in factors such as subject appearance, backgrounds and viewpoint. While deep neural networks (DNNs) have been shown to perform well on action recognition tasks, they typically require large amounts of high-quality labeled data to achieve robust performance across a variety of conditions. Synthetic data h…

    Submitted 1 August, 2024; v1 submitted 17 March, 2023; originally announced March 2023.

    Comments: ICRA 2023. The first two authors contributed equally. Dataset available at: https://github.com/reddyav1/RoCoG-v2

  33. arXiv:2301.00794  [pdf, other]

    cs.CV

    STEPs: Self-Supervised Key Step Extraction and Localization from Unlabeled Procedural Videos

    Authors: Anshul Shah, Benjamin Lundell, Harpreet Sawhney, Rama Chellappa

    Abstract: We address the problem of extracting key steps from unlabeled procedural videos, motivated by the potential of Augmented Reality (AR) headsets to revolutionize job training and performance. We decompose the problem into two steps: representation learning and key steps extraction. We propose a training objective, Bootstrapped Multi-Cue Contrastive (BMC2) loss to learn discriminative representations…

    Submitted 9 September, 2023; v1 submitted 2 January, 2023; originally announced January 2023.

    Comments: Accepted at ICCV 2023

  34. arXiv:2212.08969  [pdf, other]

    cs.CV

    A Brief Survey on Person Recognition at a Distance

    Authors: Chrisopher B. Nalty, Neehar Peri, Joshua Gleason, Carlos D. Castillo, Shuowen Hu, Thirimachos Bourlai, Rama Chellappa

    Abstract: Person recognition at a distance entails recognizing the identity of an individual appearing in images or videos collected by long-range imaging systems such as drones or surveillance cameras. Despite recent advances in deep convolutional neural networks (DCNNs), this remains challenging. Images or videos collected by long-range cameras often suffer from atmospheric turbulence, blur, low-resolutio…

    Submitted 17 December, 2022; originally announced December 2022.

    Comments: This work has been accepted to the IEEE Asilomar Conference on Signals, Systems, and Computers (ACSSC) 2022

  35. arXiv:2212.05404  [pdf, other]

    cs.CV

    Cap2Aug: Caption guided Image to Image data Augmentation

    Authors: Aniket Roy, Anshul Shah, Ketul Shah, Anirban Roy, Rama Chellappa

    Abstract: Visual recognition in a low-data regime is challenging and often prone to overfitting. To mitigate this issue, several data augmentation strategies have been proposed. However, standard transformations, e.g., rotation, cropping, and flipping provide limited semantic variations. To this end, we propose Cap2Aug, an image-to-image diffusion model-based data augmentation strategy using image captions…

    Submitted 6 November, 2023; v1 submitted 10 December, 2022; originally announced December 2022.

  36. arXiv:2210.09305  [pdf, other]

    cs.LG cs.CR

    Thinking Two Moves Ahead: Anticipating Other Users Improves Backdoor Attacks in Federated Learning

    Authors: Yuxin Wen, Jonas Geiping, Liam Fowl, Hossein Souri, Rama Chellappa, Micah Goldblum, Tom Goldstein

    Abstract: Federated learning is particularly susceptible to model poisoning and backdoor attacks because individual users have direct control over the training data and model updates. At the same time, the attack power of an individual user is limited because their updates are quickly drowned out by those of many other users. Existing attacks do not account for future behaviors of other users, and thus requ…

    Submitted 17 October, 2022; originally announced October 2022.

    Comments: Code is available at https://github.com/YuxinWenRick/thinking-two-moves-ahead

  37. arXiv:2210.05117  [pdf, other]

    eess.IV cs.CV

    DA-VSR: Domain Adaptable Volumetric Super-Resolution For Medical Images

    Authors: Cheng Peng, S. Kevin Zhou, Rama Chellappa

    Abstract: Medical image super-resolution (SR) is an active research area that has many potential applications, including reducing scan time, bettering visual understanding, increasing robustness in downstream tasks, etc. However, applying deep-learning-based SR approaches for clinical applications often encounters issues of domain inconsistency, as the test data may be acquired by different machines or on d…

    Submitted 10 October, 2022; originally announced October 2022.

    Comments: MICCAI2021

  38. arXiv:2210.04135  [pdf, other]

    cs.CV cs.LG cs.MM

    VoLTA: Vision-Language Transformer with Weakly-Supervised Local-Feature Alignment

    Authors: Shraman Pramanick, Li Jing, Sayan Nag, Jiachen Zhu, Hardik Shah, Yann LeCun, Rama Chellappa

    Abstract: Vision-language pre-training (VLP) has recently proven highly effective for various uni- and multi-modal downstream applications. However, most existing end-to-end VLP methods use high-resolution image-text box data to perform well on fine-grained region-level tasks, such as object detection, segmentation, and referring expression comprehension. Unfortunately, such high-resolution images with accu…

    Submitted 29 October, 2023; v1 submitted 8 October, 2022; originally announced October 2022.

    Comments: Published in TMLR 2023

  39. arXiv:2210.04050  [pdf, other]

    cs.CV

    Multi-Modal Human Authentication Using Silhouettes, Gait and RGB

    Authors: Yuxiang Guo, Cheng Peng, Chun Pong Lau, Rama Chellappa

    Abstract: Whole-body-based human authentication is a promising approach for remote biometrics scenarios. Current literature focuses on either body recognition based on RGB images or gait recognition based on body shapes and walking patterns; both have their advantages and drawbacks. In this work, we propose Dual-Modal Ensemble (DME), which combines both RGB and silhouette data to achieve more robust perform…

    Submitted 8 October, 2022; originally announced October 2022.

  40. arXiv:2208.08049  [pdf, other]

    cs.CV

    PDRF: Progressively Deblurring Radiance Field for Fast and Robust Scene Reconstruction from Blurry Images

    Authors: Cheng Peng, Rama Chellappa

    Abstract: We present Progressively Deblurring Radiance Field (PDRF), a novel approach to efficiently reconstruct high quality radiance fields from blurry images. While current State-of-The-Art (SoTA) scene reconstruction methods achieve photo-realistic rendering results from clean source views, their performances suffer when the source views are affected by blur, which is commonly observed for images in the…

    Submitted 16 August, 2022; originally announced August 2022.

  41. arXiv:2208.08048  [pdf, other]

    eess.IV cs.CV

    REGAS: REspiratory-GAted Synthesis of Views for Multi-Phase CBCT Reconstruction from a single 3D CBCT Acquisition

    Authors: Cheng Peng, Haofu Liao, S. Kevin Zhou, Rama Chellappa

    Abstract: It is a long-standing challenge to reconstruct Cone Beam Computed Tomography (CBCT) of the lung under respiratory motion. This work takes a step further to address a challenging setting in reconstructing a multi-phase 4D lung image from just a single 3D CBCT acquisition. To this end, we introduce REspiratory-GAted Synthesis of views, or REGAS. REGAS proposes a self-supervised method to synthesize t…

    Submitted 16 August, 2022; originally announced August 2022.

  42. arXiv:2205.07613  [pdf, other]

    cs.CV

    Scalable Vehicle Re-Identification via Self-Supervision

    Authors: Pirazh Khorramshahi, Vineet Shenoy, Rama Chellappa

    Abstract: As Computer Vision technologies become more mature for intelligent transportation applications, it is time to ask how efficient and scalable they are for large-scale and real-time deployment. Among these technologies is Vehicle Re-Identification which is one of the key elements in city-scale vehicle analytics systems. Many state-of-the-art solutions for vehicle re-id mostly focus on improving the…

    Submitted 16 May, 2022; originally announced May 2022.

  43. arXiv:2204.13861  [pdf, other]

    cs.CV

    Where in the World is this Image? Transformer-based Geo-localization in the Wild

    Authors: Shraman Pramanick, Ewa M. Nowara, Joshua Gleason, Carlos D. Castillo, Rama Chellappa

    Abstract: Predicting the geographic location (geo-localization) from a single ground-level RGB image taken anywhere in the world is a very challenging problem. The challenges include huge diversity of images due to different environmental scenarios, drastic variation in the appearance of the same location depending on the time of the day, weather, season, and more importantly, the prediction is made from a…

    Submitted 25 July, 2022; v1 submitted 28 April, 2022; originally announced April 2022.

    Comments: Accepted in ECCV 2022

  44. arXiv:2204.10380  [pdf, other]

    cs.CV

    The 6th AI City Challenge

    Authors: Milind Naphade, Shuo Wang, David C. Anastasiu, Zheng Tang, Ming-Ching Chang, Yue Yao, Liang Zheng, Mohammed Shaiqur Rahman, Archana Venkatachalapathy, Anuj Sharma, Qi Feng, Vitaly Ablavsky, Stan Sclaroff, Pranamesh Chakraborty, Alice Li, Shangru Li, Rama Chellappa

    Abstract: The 6th edition of the AI City Challenge specifically focuses on problems in two domains where there is tremendous unlocked potential at the intersection of computer vision and artificial intelligence: Intelligent Traffic Systems (ITS), and brick and mortar retail businesses. The four challenge tracks of the 2022 AI City Challenge received participation requests from 254 teams across 27 countries.…

    Submitted 9 June, 2022; v1 submitted 21 April, 2022; originally announced April 2022.

    Comments: Summary of the 6th AI City Challenge Workshop in conjunction with CVPR 2022. arXiv admin note: text overlap with arXiv:2104.12233

  45. arXiv:2204.07841  [pdf, other]

    cs.CV cs.AI cs.MM

    Multi-Modal Few-Shot Object Detection with Meta-Learning-Based Cross-Modal Prompting

    Authors: Guangxing Han, Long Chen, Jiawei Ma, Shiyuan Huang, Rama Chellappa, Shih-Fu Chang

    Abstract: We study multi-modal few-shot object detection (FSOD) in this paper, using both few-shot visual examples and class semantic information for detection, which are complementary to each other by definition. Most of the previous works on multi-modal FSOD are fine-tuning-based which are inefficient for online applications. Moreover, these methods usually require expertise like class names to extract cl…

    Submitted 27 March, 2023; v1 submitted 16 April, 2022; originally announced April 2022.

    Comments: 17 pages

  46. arXiv:2204.07442  [pdf, other]

    cs.CV cs.AI

    Scalable and Real-time Multi-Camera Vehicle Detection, Re-Identification, and Tracking

    Authors: Pirazh Khorramshahi, Vineet Shenoy, Michael Pack, Rama Chellappa

    Abstract: Multi-camera vehicle tracking is one of the most complicated tasks in Computer Vision as it involves distinct tasks including Vehicle Detection, Tracking, and Re-identification. Despite the challenges, multi-camera vehicle tracking has immense potential in transportation applications including speed, volume, origin-destination (O-D), and routing data generation. Several recent works have addressed…

    Submitted 15 April, 2022; originally announced April 2022.

  47. arXiv:2203.04292  [pdf, other]

    eess.IV cs.CV cs.LG

    Towards performant and reliable undersampled MR reconstruction via diffusion model sampling

    Authors: Cheng Peng, Pengfei Guo, S. Kevin Zhou, Vishal Patel, Rama Chellappa

    Abstract: Magnetic Resonance (MR) image reconstruction from under-sampled acquisition promises faster scanning time. To this end, current State-of-The-Art (SoTA) approaches leverage deep neural networks and supervised training to learn a recovery model. While these approaches achieve impressive performances, the learned model can be fragile on unseen degradation, e.g. when given a different acceleration fac…

    Submitted 10 March, 2022; v1 submitted 7 March, 2022; originally announced March 2022.

  48. arXiv:2201.04620  [pdf, other]

    cs.CV

    SparseDet: Improving Sparsely Annotated Object Detection with Pseudo-positive Mining

    Authors: Saksham Suri, Sai Saketh Rambhatla, Rama Chellappa, Abhinav Shrivastava

    Abstract: Training with sparse annotations is known to reduce the performance of object detectors. Previous methods have focused on proxies for missing ground truth annotations in the form of pseudo-labels for unlabeled boxes. We observe that existing methods suffer at higher levels of sparsity in the data due to noisy pseudo-labels. To prevent this, we propose an end-to-end system that learns to separate t…

    Submitted 26 August, 2023; v1 submitted 12 January, 2022; originally announced January 2022.

    Comments: Accepted at ICCV 2023. Project webpage: https://www.cs.umd.edu/~sakshams/SparseDet. The first two authors contributed equally

  49. arXiv:2112.11610  [pdf, other]

    cs.CV

    EyePAD++: A Distillation-based approach for joint Eye Authentication and Presentation Attack Detection using Periocular Images

    Authors: Prithviraj Dhar, Amit Kumar, Kirsten Kaplan, Khushi Gupta, Rakesh Ranjan, Rama Chellappa

    Abstract: A practical eye authentication (EA) system targeted for edge devices needs to perform authentication and be robust to presentation attacks, all while remaining compute and latency efficient. However, existing eye-based frameworks a) perform authentication and Presentation Attack Detection (PAD) independently and b) involve significant pre-processing steps to extract the iris region. Here, we intro…

    Submitted 28 December, 2021; v1 submitted 21 December, 2021; originally announced December 2021.

  50. arXiv:2112.11450  [pdf, other]

    cs.LG cs.AI cs.CV

    Max-Margin Contrastive Learning

    Authors: Anshul Shah, Suvrit Sra, Rama Chellappa, Anoop Cherian

    Abstract: Standard contrastive learning approaches usually require a large number of negatives for effective unsupervised learning and often exhibit slow convergence. We suspect this behavior is due to the suboptimal selection of negatives used for offering contrast to the positives. We counter this difficulty by taking inspiration from support vector machines (SVMs) to present max-margin contrastive learni…

    Submitted 21 December, 2021; originally announced December 2021.

    Comments: Accepted at AAAI 2022