Showing 1–50 of 69 results for author: Nag, S

  1. arXiv:2407.02389

    cs.CV cs.AI cs.CL cs.LG cs.MM

    SafaRi: Adaptive Sequence Transformer for Weakly Supervised Referring Expression Segmentation

    Authors: Sayan Nag, Koustava Goswami, Srikrishna Karanam

    Abstract: Referring Expression Segmentation (RES) aims to provide a segmentation mask of the target object in an image referred to by the text (i.e., referring expression). Existing methods require large-scale mask annotations. Moreover, such approaches do not generalize well to unseen/zero-shot scenarios. To address the aforementioned issues, we propose a weakly-supervised bootstrapping architecture for RE…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: Accepted at ECCV 2024

  2. arXiv:2407.01851

    cs.CV cs.AI cs.LG eess.AS

    Meerkat: Audio-Visual Large Language Model for Grounding in Space and Time

    Authors: Sanjoy Chowdhury, Sayan Nag, Subhrajyoti Dasgupta, Jun Chen, Mohamed Elhoseiny, Ruohan Gao, Dinesh Manocha

    Abstract: Leveraging Large Language Models' remarkable proficiency in text-based tasks, recent works on Multi-modal LLMs (MLLMs) extend them to other modalities like vision and audio. However, the progress in these directions has been mostly focused on tasks that only require a coarse-grained understanding of the audio-visual semantics. We present Meerkat, an audio-visual LLM equipped with a fine-grained un…

    Submitted 3 July, 2024; v1 submitted 1 July, 2024; originally announced July 2024.

    Comments: Accepted at ECCV 2024

  3. arXiv:2406.04673

    cs.CV cs.AI cs.MM eess.AS

    MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models

    Authors: Sanjoy Chowdhury, Sayan Nag, K J Joseph, Balaji Vasan Srinivasan, Dinesh Manocha

    Abstract: Music is a universal language that can communicate emotions and feelings. It forms an essential part of the whole spectrum of creative media, ranging from movies to social media posts. Machine learning models that can synthesize music are predominantly conditioned on textual descriptions of it. Inspired by how musicians compose music not just from a movie script, but also through visualizations, w…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: Accepted at CVPR 2024 as Highlight paper. Webpage: https://schowdhury671.github.io/melfusion_cvpr2024/

  4. arXiv:2403.19113

    cs.CL cs.AI

    FACTOID: FACtual enTailment fOr hallucInation Detection

    Authors: Vipula Rawte, S. M Towhidul Islam Tonmoy, Krishnav Rajbangshi, Shravani Nag, Aman Chadha, Amit P. Sheth, Amitava Das

    Abstract: The widespread adoption of Large Language Models (LLMs) has facilitated numerous benefits. However, hallucination is a significant concern. In response, Retrieval Augmented Generation (RAG) has emerged as a highly promising paradigm to improve LLM outputs by grounding them in factual information. RAG relies on textual entailment (TE) or similar methods to check if the text produced by LLMs is supp…

    Submitted 27 March, 2024; originally announced March 2024.

  5. arXiv:2403.18341

    cs.CL

    IterAlign: Iterative Constitutional Alignment of Large Language Models

    Authors: Xiusi Chen, Hongzhi Wen, Sreyashi Nag, Chen Luo, Qingyu Yin, Ruirui Li, Zheng Li, Wei Wang

    Abstract: With the rapid development of large language models (LLMs), aligning LLMs with human values and societal norms to ensure their reliability and safety has become crucial. Reinforcement learning with human feedback (RLHF) and Constitutional AI (CAI) have been proposed for LLM alignment. However, these methods require either heavy human annotations or explicitly pre-defined constitutions, which are l…

    Submitted 27 March, 2024; originally announced March 2024.

    Comments: NAACL 2024

  6. arXiv:2403.06021

    cs.IR cs.LG

    Hierarchical Query Classification in E-commerce Search

    Authors: Bing He, Sreyashi Nag, Limeng Cui, Suhang Wang, Zheng Li, Rahul Goutam, Zhen Li, Haiyang Zhang

    Abstract: E-commerce platforms typically store and structure product information and search data in a hierarchy. Efficiently categorizing user search queries into a similar hierarchical structure is paramount in enhancing user experience on e-commerce platforms as well as news curation and academic research. The significance of this task is amplified when dealing with sensitive query categorization or criti…

    Submitted 9 March, 2024; originally announced March 2024.

    Comments: Published at: the ACM Web Conference 2024 in the industry track (WWW'24)

  7. arXiv:2403.05435

    cs.CV eess.IV eess.SP

    OmniCount: Multi-label Object Counting with Semantic-Geometric Priors

    Authors: Anindya Mondal, Sauradip Nag, Xiatian Zhu, Anjan Dutta

    Abstract: Object counting is pivotal for understanding the composition of scenes. Previously, this task was dominated by class-specific methods, which have gradually evolved into more adaptable class-agnostic strategies. However, these strategies come with their own set of limitations, such as the need for manual exemplar input and multiple passes for multiple categories, resulting in significant inefficien…

    Submitted 20 March, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  8. arXiv:2403.05174

    cs.LG

    VTruST: Controllable value function based subset selection for Data-Centric Trustworthy AI

    Authors: Soumi Das, Shubhadip Nag, Shreyyash Sharma, Suparna Bhattacharya, Sourangshu Bhattacharya

    Abstract: Trustworthy AI is crucial to the widespread adoption of AI in high-stakes applications with fairness, robustness, and accuracy being some of the key trustworthiness metrics. In this work, we propose a controllable framework for data-centric trustworthy AI (DCTAI)- VTruST, that allows users to control the trade-offs between the different trustworthiness metrics of the constructed training datasets.…

    Submitted 8 March, 2024; originally announced March 2024.

    Comments: Accepted in ICLR 2024 DMLR workshop

  9. arXiv:2312.12423

    cs.CV cs.AI

    Jack of All Tasks, Master of Many: Designing General-purpose Coarse-to-Fine Vision-Language Model

    Authors: Shraman Pramanick, Guangxing Han, Rui Hou, Sayan Nag, Ser-Nam Lim, Nicolas Ballas, Qifan Wang, Rama Chellappa, Amjad Almahairi

    Abstract: The ability of large language models (LLMs) to process visual inputs has given rise to general-purpose vision systems, unifying various vision-language (VL) tasks by instruction tuning. However, due to the enormous diversity in input-output formats in the vision domain, existing general-purpose models fail to successfully integrate segmentation and multi-image inputs with coarse-level tasks into a…

    Submitted 19 June, 2024; v1 submitted 19 December, 2023; originally announced December 2023.

    Comments: CVPR 2024 Highlight

  10. arXiv:2312.05407

    cs.CV

    Active Learning Guided Federated Online Adaptation: Applications in Medical Image Segmentation

    Authors: Md Shazid Islam, Sayak Nag, Arindam Dutta, Miraj Ahmed, Fahim Faisal Niloy, Amit K. Roy-Chowdhury

    Abstract: Data privacy, storage, and distribution shifts are major bottlenecks in medical image analysis. Data cannot be shared across patients, physicians, and facilities due to privacy concerns, usually requiring each patient's data to be analyzed in a discreet setting at a near real-time pace. However, one would like to take advantage of the accumulated knowledge across healthcare facilities as the compu…

    Submitted 8 December, 2023; originally announced December 2023.

  11. arXiv:2312.01564

    cs.LG cs.AI cs.CL cs.CV

    APoLLo: Unified Adapter and Prompt Learning for Vision Language Models

    Authors: Sanjoy Chowdhury, Sayan Nag, Dinesh Manocha

    Abstract: The choice of input text prompt plays a critical role in the performance of Vision-Language Pretrained (VLP) models such as CLIP. We present APoLLo, a unified multi-modal approach that combines Adapter and Prompt learning for Vision-Language models. Our method is designed to substantially improve the generalization capabilities of VLP models when they are fine-tuned in a few-shot setting. We intro…

    Submitted 3 December, 2023; originally announced December 2023.

    Comments: Accepted at EMNLP 2023 (Main track)

  12. arXiv:2311.05198

    cs.CV

    Adaptive-Labeling for Enhancing Remote Sensing Cloud Understanding

    Authors: Jay Gala, Sauradip Nag, Huichou Huang, Ruirui Liu, Xiatian Zhu

    Abstract: Cloud analysis is a critical component of weather and climate science, impacting various sectors like disaster management. However, achieving fine-grained cloud analysis, such as cloud segmentation, in remote sensing remains challenging due to the inherent difficulties in obtaining accurate labels, leading to significant labeling errors in training data. Existing methods often assume the availabil…

    Submitted 9 November, 2023; originally announced November 2023.

    Comments: Accepted at the TCCML Workshop at NeurIPS 2023

  13. arXiv:2308.14115

    cs.CL

    Situated Natural Language Explanations

    Authors: Zining Zhu, Haoming Jiang, Jingfeng Yang, Sreyashi Nag, Chao Zhang, Jie Huang, Yifan Gao, Frank Rudzicz, Bing Yin

    Abstract: Natural language is among the most accessible tools for explaining decisions to humans, and large pretrained language models (PLMs) have demonstrated impressive abilities to generate coherent natural language explanations (NLE). The existing NLE research perspectives do not take the audience into account. An NLE can have high textual quality, but it might not accommodate audiences' needs and prefe…

    Submitted 24 March, 2024; v1 submitted 27 August, 2023; originally announced August 2023.

  14. arXiv:2308.07293

    cs.SD cs.LG eess.AS

    DiffSED: Sound Event Detection with Denoising Diffusion

    Authors: Swapnil Bhosale, Sauradip Nag, Diptesh Kanojia, Jiankang Deng, Xiatian Zhu

    Abstract: Sound Event Detection (SED) aims to predict the temporal boundaries of all the events of interest and their class labels, given an unconstrained audio sample. Taking either the split-and-classify (i.e., frame-level) strategy or the more principled event-level modeling approach, all existing methods consider the SED problem from the discriminative learning perspective. In this work, we reformulate t…

    Submitted 16 August, 2023; v1 submitted 14 August, 2023; originally announced August 2023.

  15. arXiv:2307.10763

    cs.CV cs.AI cs.LG eess.IV

    Actor-agnostic Multi-label Action Recognition with Multi-modal Query

    Authors: Anindya Mondal, Sauradip Nag, Joaquin M Prada, Xiatian Zhu, Anjan Dutta

    Abstract: Existing action recognition methods are typically actor-specific due to the intrinsic topological and apparent differences among the actors. This requires actor-specific pose estimation (e.g., humans vs. animals), leading to cumbersome model design complexity and high maintenance costs. Moreover, they often focus on learning the visual modality alone and single-label classification whilst neglecti…

    Submitted 10 January, 2024; v1 submitted 20 July, 2023; originally announced July 2023.

    Comments: Published at the 2023 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), Paris, France

  16. arXiv:2307.05463

    cs.CV

    EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone

    Authors: Shraman Pramanick, Yale Song, Sayan Nag, Kevin Qinghong Lin, Hardik Shah, Mike Zheng Shou, Rama Chellappa, Pengchuan Zhang

    Abstract: Video-language pre-training (VLP) has become increasingly important due to its ability to generalize to various vision and language tasks. However, existing egocentric VLP frameworks utilize separate video and language encoders and learn task-specific cross-modal information only during fine-tuning, limiting the development of a unified system. In this work, we introduce the second generation of e…

    Submitted 18 August, 2023; v1 submitted 11 July, 2023; originally announced July 2023.

    Comments: Published in ICCV 2023

  17. arXiv:2306.02680

    cs.CL cs.LG cs.SD eess.AS

    BeAts: Bengali Speech Acts Recognition using Multimodal Attention Fusion

    Authors: Ahana Deb, Sayan Nag, Ayan Mahapatra, Soumitri Chattopadhyay, Aritra Marik, Pijush Kanti Gayen, Shankha Sanyal, Archi Banerjee, Samir Karmakar

    Abstract: Spoken languages often utilise intonation, rhythm, intensity, and structure, to communicate intention, which can be interpreted differently depending on the rhythm of speech of their utterance. These speech acts provide the foundation of communication and are unique in expression to the language. Recent advancements in attention-based models, demonstrating their ability to learn powerful represent…

    Submitted 5 June, 2023; originally announced June 2023.

    Comments: Accepted at INTERSPEECH 2023

  18. arXiv:2304.00733

    cs.CV

    Unbiased Scene Graph Generation in Videos

    Authors: Sayak Nag, Kyle Min, Subarna Tripathi, Amit K. Roy Chowdhury

    Abstract: The task of dynamic scene graph generation (SGG) from videos is complicated and challenging due to the inherent dynamics of a scene, temporal fluctuation of model predictions, and the long-tailed distribution of the visual relationships in addition to the already existing challenges in image-based SGG. Existing methods for dynamic SGG have primarily focused on capturing spatio-temporal context usi…

    Submitted 29 June, 2023; v1 submitted 3 April, 2023; originally announced April 2023.

    Comments: Published in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023

  19. arXiv:2303.14863

    cs.CV cs.AI cs.LG cs.MM

    DiffTAD: Temporal Action Detection with Proposal Denoising Diffusion

    Authors: Sauradip Nag, Xiatian Zhu, Jiankang Deng, Yi-Zhe Song, Tao Xiang

    Abstract: We propose a new formulation of temporal action detection (TAD) with denoising diffusion, DiffTAD in short. Taking as input random temporal proposals, it can yield action proposals accurately given an untrimmed long video. This presents a generative modeling perspective, against previous discriminative learning manners. This capability is achieved by first diffusing the ground-truth proposals to r…

    Submitted 14 July, 2023; v1 submitted 26 March, 2023; originally announced March 2023.

    Comments: ICCV 2023; Code available at https://github.com/sauradip/DiffusionTAD

  20. arXiv:2303.09695

    cs.CV cs.GR cs.MM

    PersonalTailor: Personalizing 2D Pattern Design from 3D Garment Point Clouds

    Authors: Sauradip Nag, Anran Qi, Xiatian Zhu, Ariel Shamir

    Abstract: Garment pattern design aims to convert a 3D garment to the corresponding 2D panels and their sewing structure. Existing methods rely either on template fitting with heuristics and prior assumptions, or on model learning with complicated shape parameterization. Importantly, both approaches do not allow for personalization of the output garment, which today has increasing demands. To fill this deman…

    Submitted 11 August, 2023; v1 submitted 16 March, 2023; originally announced March 2023.

    Comments: Technical Report

  21. arXiv:2303.05556

    cs.CV

    An Evaluation of Non-Contrastive Self-Supervised Learning for Federated Medical Image Analysis

    Authors: Soumitri Chattopadhyay, Soham Ganguly, Sreejit Chaudhury, Sayan Nag, Samiran Chattopadhyay

    Abstract: Privacy and annotation bottlenecks are two major issues that profoundly affect the practicality of machine learning-based medical image analysis. Although significant progress has been made in these areas, these issues are not yet fully resolved. In this paper, we seek to tackle these concerns head-on and systematically explore the applicability of non-contrastive self-supervised learning (SSL) al…

    Submitted 9 March, 2023; originally announced March 2023.

  22. arXiv:2303.02245

    cs.CV

    Exploring Self-Supervised Representation Learning For Low-Resource Medical Image Analysis

    Authors: Soumitri Chattopadhyay, Soham Ganguly, Sreejit Chaudhury, Sayan Nag, Samiran Chattopadhyay

    Abstract: The success of self-supervised learning (SSL) has mostly been attributed to the availability of unlabeled yet large-scale datasets. However, in a specialized domain such as medical imaging which is a lot different from natural images, the assumption of data availability is unrealistic and impractical, as the data itself is scanty and found in small databases, collected for specific prognosis tasks…

    Submitted 28 June, 2023; v1 submitted 3 March, 2023; originally announced March 2023.

    Comments: Accepted at IEEE ICIP 2023

  23. ViTA: A Vision Transformer Inference Accelerator for Edge Applications

    Authors: Shashank Nag, Gourav Datta, Souvik Kundu, Nitin Chandrachoodan, Peter A. Beerel

    Abstract: Vision Transformer models, such as ViT, Swin Transformer, and Transformer-in-Transformer, have recently gained significant traction in computer vision tasks due to their ability to capture the global relation between features which leads to superior performance. However, they are compute-heavy and difficult to deploy in resource-constrained edge devices. Existing hardware accelerators, including t…

    Submitted 17 February, 2023; originally announced February 2023.

    Comments: Accepted at ISCAS 2023

    Journal ref: 2023 IEEE International Symposium on Circuits and Systems (ISCAS), Monterey, CA, USA, 2023, pp. 1-5

  24. arXiv:2211.14924

    cs.CV

    Post-Processing Temporal Action Detection

    Authors: Sauradip Nag, Xiatian Zhu, Yi-Zhe Song, Tao Xiang

    Abstract: Existing Temporal Action Detection (TAD) methods typically take a pre-processing step in converting an input varying-length video into a fixed-length snippet representation sequence, before temporal boundary estimation and action classification. This pre-processing step would temporally downsample the video, reducing the inference resolution and hampering the detection performance in the original…

    Submitted 3 March, 2023; v1 submitted 27 November, 2022; originally announced November 2022.

    Comments: CVPR 2023; Code available at https://github.com/sauradip/GAP

  25. arXiv:2211.14905

    cs.CV cs.AI cs.CL cs.LG cs.MM

    Multi-Modal Few-Shot Temporal Action Detection

    Authors: Sauradip Nag, Mengmeng Xu, Xiatian Zhu, Juan-Manuel Perez-Rua, Bernard Ghanem, Yi-Zhe Song, Tao Xiang

    Abstract: Few-shot (FS) and zero-shot (ZS) learning are two different approaches for scaling temporal action detection (TAD) to new classes. The former adapts a pretrained vision model to a new task represented by as few as a single video per class, whilst the latter requires no training examples by exploiting a semantic description of the new class. In this work, we introduce a new multi-modality few-shot…

    Submitted 27 March, 2023; v1 submitted 27 November, 2022; originally announced November 2022.

    Comments: Technical Report

  26. arXiv:2210.15075

    cs.CV

    IDEAL: Improved DEnse locAL Contrastive Learning for Semi-Supervised Medical Image Segmentation

    Authors: Hritam Basak, Soumitri Chattopadhyay, Rohit Kundu, Sayan Nag, Rammohan Mallipeddi

    Abstract: Due to the scarcity of labeled data, Contrastive Self-Supervised Learning (SSL) frameworks have lately shown great potential in several medical image analysis tasks. However, the existing contrastive mechanisms are sub-optimal for dense pixel-level segmentation tasks due to their inability to mine local features. To this end, we extend the concept of metric learning to the segmentation task, using…

    Submitted 2 March, 2023; v1 submitted 26 October, 2022; originally announced October 2022.

    Comments: Paper accepted for publication at IEEE ICASSP 2023

  27. arXiv:2210.04135

    cs.CV cs.LG cs.MM

    VoLTA: Vision-Language Transformer with Weakly-Supervised Local-Feature Alignment

    Authors: Shraman Pramanick, Li Jing, Sayan Nag, Jiachen Zhu, Hardik Shah, Yann LeCun, Rama Chellappa

    Abstract: Vision-language pre-training (VLP) has recently proven highly effective for various uni- and multi-modal downstream applications. However, most existing end-to-end VLP methods use high-resolution image-text box data to perform well on fine-grained region-level tasks, such as object detection, segmentation, and referring expression comprehension. Unfortunately, such high-resolution images with accu…

    Submitted 29 October, 2023; v1 submitted 8 October, 2022; originally announced October 2022.

    Comments: Published in TMLR 2023

  28. arXiv:2208.00955

    cs.CV

    Large-Scale Product Retrieval with Weakly Supervised Representation Learning

    Authors: Xiao Han, Kam Woh Ng, Sauradip Nag, Zhiyu Qu

    Abstract: Large-scale weakly supervised product retrieval is a practically useful yet computationally challenging problem. This paper introduces a novel solution for the eBay Visual Search Challenge (eProduct) held at the Ninth Workshop on Fine-Grained Visual Categorisation (FGVC9) of CVPR 2022. This competition presents two challenges: (a) E-commerce is a drastically fine-grained domain including…

    Submitted 1 August, 2022; originally announced August 2022.

    Comments: FGVC9 CVPR2022

  29. arXiv:2207.08184

    cs.CV cs.AI cs.CL cs.MM

    Zero-Shot Temporal Action Detection via Vision-Language Prompting

    Authors: Sauradip Nag, Xiatian Zhu, Yi-Zhe Song, Tao Xiang

    Abstract: Existing temporal action detection (TAD) methods rely on large training data including segment-level annotations, limited to recognizing previously seen classes alone during inference. Collecting and annotating a large training set for each class of interest is costly and hence unscalable. Zero-shot TAD (ZS-TAD) resolves this obstacle by enabling a pre-trained model to recognize any unseen action…

    Submitted 17 July, 2022; originally announced July 2022.

    Comments: ECCV 2022; Code available at https://github.com/sauradip/STALE

  30. arXiv:2207.07059

    cs.CV cs.AI cs.LG cs.MM

    Semi-Supervised Temporal Action Detection with Proposal-Free Masking

    Authors: Sauradip Nag, Xiatian Zhu, Yi-Zhe Song, Tao Xiang

    Abstract: Existing temporal action detection (TAD) methods rely on a large number of training data with segment-level annotations. Collecting and annotating such a training set is thus highly expensive and unscalable. Semi-supervised TAD (SS-TAD) alleviates this problem by leveraging unlabeled videos freely available at scale. However, SS-TAD is also a much more challenging problem than supervised TAD, and…

    Submitted 14 July, 2022; originally announced July 2022.

    Comments: ECCV 2022; Code available at https://github.com/sauradip/SPOT

  31. arXiv:2207.06580

    cs.CV cs.AI cs.LG cs.MM

    Proposal-Free Temporal Action Detection via Global Segmentation Mask Learning

    Authors: Sauradip Nag, Xiatian Zhu, Yi-Zhe Song, Tao Xiang

    Abstract: Existing temporal action detection (TAD) methods rely on generating an overwhelmingly large number of proposals per video. This leads to complex model designs due to proposal generation and/or per-proposal action instance evaluation and the resultant high computational cost. In this work, for the first time, we propose a proposal-free Temporal Action detection model with Global Segmentation mask (…

    Submitted 19 August, 2022; v1 submitted 13 July, 2022; originally announced July 2022.

    Comments: ECCV 2022; Code available at https://github.com/sauradip/TAGS

  32. ACLNet: An Attention and Clustering-based Cloud Segmentation Network

    Authors: Dhruv Makwana, Subhrajit Nag, Onkar Susladkar, Gayatri Deshmukh, Sai Chandra Teja R, Sparsh Mittal, C Krishna Mohan

    Abstract: We propose a novel deep learning model named ACLNet, for cloud segmentation from ground images. ACLNet uses both a deep neural network and a machine learning (ML) algorithm to extract complementary features. Specifically, it uses EfficientNet-B0 as the backbone, "à trous spatial pyramid pooling" (ASPP) to learn at multiple receptive fields, and "global attention module" (GAM) to extract fine-grained d…

    Submitted 13 July, 2022; originally announced July 2022.

    Comments: 11 pages, 3 figures, 5 tables, Published in Remote Sensing Letters

    Journal ref: volume 13, pages 865-875, year 2022

  33. WaferSegClassNet -- A Light-weight Network for Classification and Segmentation of Semiconductor Wafer Defects

    Authors: Subhrajit Nag, Dhruv Makwana, Sai Chandra Teja R, Sparsh Mittal, C Krishna Mohan

    Abstract: As the integration density and design intricacy of semiconductor wafers increase, the magnitude and complexity of defects in them are also on the rise. Since the manual inspection of wafer defects is costly, an automated artificial intelligence (AI) based computer-vision approach is highly desired. The previous works on defect analysis have several limitations, such as low accuracy and the need fo…

    Submitted 3 July, 2022; originally announced July 2022.

    Comments: 11 pages, 2 figures, 7 tables, Published in Computers in Industry

    Journal ref: Volume 142, 2022, 103720, ISSN 0166-3615

  34. arXiv:2207.00506

    cs.CV cs.CG

    How Far Can I Go ? : A Self-Supervised Approach for Deterministic Video Depth Forecasting

    Authors: Sauradip Nag, Nisarg Shah, Anran Qi, Raghavendra Ramachandra

    Abstract: In this paper we present a novel self-supervised method to anticipate the depth estimate for a future, unobserved real-world urban scene. This work is the first to explore self-supervised learning for estimation of monocular depth of future unobserved frames of a video. Existing works rely on a large number of annotated samples to generate the probabilistic prediction of depth for unseen frames. H…

    Submitted 8 July, 2022; v1 submitted 1 July, 2022; originally announced July 2022.

    Comments: Accepted in ML4AD Workshop, NeurIPS 2021

  35. arXiv:2111.07042

    cs.RO eess.SY

    Agile Satellite Planning for Multi-Payload Observations for Earth Science

    Authors: Rich Levinson, Sreeja Nag, Vinay Ravindra

    Abstract: We present planning challenges, methods and preliminary results for a new model-based paradigm for earth observing systems in adaptive remote sensing. Our heuristically guided constraint optimization planner produces coordinated plans for multiple satellites, each with multiple instruments (payloads). The satellites are agile, meaning they can quickly maneuver to change viewing angles in response…

    Submitted 13 November, 2021; originally announced November 2021.

    Journal ref: International Workshop on Planning & Scheduling for Space (IWPSS) 2021

  36. arXiv:2110.10552

    cs.CV cs.LG cs.MM

    Few-Shot Temporal Action Localization with Query Adaptive Transformer

    Authors: Sauradip Nag, Xiatian Zhu, Tao Xiang

    Abstract: Existing temporal action localization (TAL) works rely on a large number of training videos with exhaustive segment-level annotation, preventing them from scaling to new classes. As a solution to this problem, few-shot TAL (FS-TAL) aims to adapt a model to a new class represented by as few as a single video. Existing FS-TAL methods assume trimmed training videos for new classes. However, this setti…

    Submitted 20 October, 2021; originally announced October 2021.

    Comments: BMVC 2021

  37. arXiv:2109.04572

    cs.LG cs.AI physics.data-an

    Deciphering Environmental Air Pollution with Large Scale City Data

    Authors: Mayukh Bhattacharyya, Sayan Nag, Udita Ghosh

    Abstract: Air pollution poses a serious threat to sustainable environmental conditions in the 21st century. Its importance in determining the health and living standards in urban settings is only expected to increase with time. Various factors ranging from artificial emissions to natural phenomena are known to be primary causal agents or influencers behind rising air pollution levels. However, the lack of l…

    Submitted 15 June, 2022; v1 submitted 9 September, 2021; originally announced September 2021.

    Comments: Accepted as an Oral Spotlight Paper at the International Joint Conference on Artificial Intelligence (IJCAI) 2022

  38. arXiv:2108.09598

    cs.LG cs.AI cs.CV cs.NE

    SERF: Towards better training of deep neural networks using log-Softplus ERror activation Function

    Authors: Sayan Nag, Mayukh Bhattacharyya

    Abstract: Activation functions play a pivotal role in determining the training dynamics and neural network performance. The widely adopted activation function ReLU, despite being simple and effective, has a few disadvantages, including the Dying ReLU problem. In order to tackle such problems, we propose a novel activation function called Serf which is self-regularized and nonmonotonic in nature. Like Mish, Serf…

    Submitted 24 August, 2021; v1 submitted 21 August, 2021; originally announced August 2021.

  39. arXiv:2108.00340

    cs.CV

    Reconstruction guided Meta-learning for Few Shot Open Set Recognition

    Authors: Sayak Nag, Dripta S. Raychaudhuri, Sujoy Paul, Amit K. Roy-Chowdhury

    Abstract: In many applications, we are constrained to learn classifiers from very limited data (few-shot classification). The task becomes even more challenging if it is also required to identify samples from unknown categories (open-set classification). Learning a good abstraction for a class with very few samples is extremely difficult, especially under open-set settings. As a result, open-set recognition…

    Submitted 30 September, 2023; v1 submitted 31 July, 2021; originally announced August 2021.

    Comments: Accepted for publication in IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)

  40. arXiv:2105.12247

    cs.LG cs.AI cs.CG cs.CV stat.ML

    GraphVICRegHSIC: Towards improved self-supervised representation learning for graphs with a hyrbid loss function

    Authors: Sayan Nag

    Abstract: Self-supervised learning and pre-training strategies have developed over the last few years especially for Convolutional Neural Networks (CNNs). Recently application of such methods can also be noticed for Graph Neural Networks (GNNs). In this paper, we have used a graph based self-supervised learning strategy with different loss functions (Barlow Twins [Zbontar et al., 2021], HSIC [Tsai et al., 2021],…

    Submitted 26 November, 2021; v1 submitted 25 May, 2021; originally announced May 2021.

    Comments: Paper Accepted in the Weakly Supervised Representation Learning Workshop, IJCAI 2021 (IJCAI2021-WSRL)

  41. arXiv:2102.06038

    cs.SD cs.CL eess.AS

    A Fractal Approach to Characterize Emotions in Audio and Visual Domain: A Study on Cross-Modal Interaction

    Authors: Sayan Nag, Uddalok Sarkar, Shankha Sanyal, Archi Banerjee, Souparno Roy, Samir Karmakar, Ranjan Sengupta, Dipak Ghosh

    Abstract: It is already known that both auditory and visual stimuli are able to convey emotions in the human mind to different extents. The strength or intensity of the emotional arousal varies depending on the type of stimulus chosen. In this study, we try to investigate the emotional arousal in a cross-modal scenario involving both auditory and visual stimuli while studying their source characteristics. A robus…

    Submitted 11 February, 2021; originally announced February 2021.

  42. arXiv:2102.06003  [pdf]

    cs.SD cs.CL eess.AS

    Language Independent Emotion Quantification using Non linear Modelling of Speech

    Authors: Uddalok Sarkar, Sayan Nag, Chirayata Bhattacharya, Shankha Sanyal, Archi Banerjee, Ranjan Sengupta, Dipak Ghosh

    Abstract: At present emotion extraction from speech is a very important issue due to its diverse applications. Hence, it becomes absolutely necessary to obtain models that take into consideration the speaking styles of a person, vocal tract information, timbral qualities and other congenital information regarding his voice. Our speech production system is a nonlinear system like most other real world system…

    Submitted 11 February, 2021; originally announced February 2021.

  43. arXiv:2102.00616  [pdf]

    cs.SD cs.LG cs.MM eess.AS

    Neural Network architectures to classify emotions in Indian Classical Music

    Authors: Uddalok Sarkar, Sayan Nag, Medha Basu, Archi Banerjee, Shankha Sanyal, Ranjan Sengupta, Dipak Ghosh

    Abstract: Music is often considered the language of emotions. It has long been known to elicit emotions in human beings, and thus categorizing music based on the type of emotions it induces in human beings is a very intriguing topic of research. When the task comes to classify emotions elicited by Indian Classical Music (ICM), it becomes much more challenging because of the inherent ambiguity associated wi…

    Submitted 31 January, 2021; originally announced February 2021.

  44. arXiv:2012.05694  [pdf]

    cs.CV cs.AI cs.GR cs.LG physics.data-an

    Lookahead optimizer improves the performance of Convolutional Autoencoders for reconstruction of natural images

    Authors: Sayan Nag

    Abstract: Autoencoders are a class of artificial neural networks which have gained a lot of attention in the recent past. Using the encoder block of an autoencoder the input image can be compressed into a meaningful representation. Then a decoder is employed to reconstruct the compressed representation back to a version which looks like the input image. It has plenty of applications in the field of data com…

    Submitted 2 December, 2020; originally announced December 2020.

  45. arXiv:2006.15100  [pdf, other]

    cs.LG eess.SP stat.ML

    E2GC: Energy-efficient Group Convolution in Deep Neural Networks

    Authors: Nandan Kumar Jha, Rajat Saini, Subhrajit Nag, Sparsh Mittal

    Abstract: The number of groups ($g$) in group convolution (GConv) is selected to boost the predictive performance of deep neural networks (DNNs) in a compute and parameter efficient manner. However, we show that naive selection of $g$ in GConv creates an imbalance between the computational complexity and degree of data reuse, which leads to suboptimal energy efficiency in DNNs. We devise an optimum group si…

    Submitted 26 June, 2020; originally announced June 2020.

    Comments: Accepted as a conference paper in 2020 33rd International Conference on VLSI Design and 2020 19th International Conference on Embedded Systems (VLSID)

    ACM Class: I.5.1; I.5.2; I.5.5; C.0

    Journal ref: VLSID (2020) 155-160

  46. arXiv:2005.12524  [pdf]

    cs.CV cs.MM

    A New Unified Method for Detecting Text from Marathon Runners and Sports Players in Video

    Authors: Sauradip Nag, Palaiahnakote Shivakumara, Umapada Pal, Tong Lu, Michael Blumenstein

    Abstract: Detecting text located on the torsos of marathon runners and sports players in video is a challenging issue due to poor quality and adverse effects caused by flexible/colorful clothing, and different structures of human bodies or actions. This paper presents a new unified method for tackling the above challenges. The proposed method fuses gradient magnitude and direction coherence of text pixels i…

    Submitted 26 May, 2020; originally announced May 2020.

    Comments: Accepted in Pattern Recognition, Elsevier

  47. arXiv:2004.08248  [pdf]

    eess.AS cs.SD nlin.CD q-bio.NC

    Acoustical classification of different speech acts using nonlinear methods

    Authors: Chirayata Bhattacharyya, Sourya Sengupta, Sayan Nag, Shankha Sanyal, Archi Banerjee, Ranjan Sengupta, Dipak Ghosh

    Abstract: A recitation is a way of combining the words together so that they have a sense of rhythm and thus an emotional content is imbibed within. In this study we envisaged to answer these questions in a scientific manner taking into consideration 5 (five) well known Bengali recitations of different poets conveying a variety of moods ranging from joy to sorrow. The clips were recited as well as read (in…

    Submitted 5 August, 2020; v1 submitted 15 April, 2020; originally announced April 2020.

    Comments: 6 pages, 2 figures; Proceedings of WESPAC 2018, New Delhi, India, November 11-15, 2018

  48. arXiv:2004.07820  [pdf]

    cs.SD cs.CL eess.AS

    Speaker Recognition in Bengali Language from Nonlinear Features

    Authors: Uddalok Sarkar, Soumyadeep Pal, Sayan Nag, Chirayata Bhattacharya, Shankha Sanyal, Archi Banerjee, Ranjan Sengupta, Dipak Ghosh

    Abstract: At present Automatic Speaker Recognition system is a very important issue due to its diverse applications. Hence, it becomes absolutely necessary to obtain models that take into consideration the speaking style of a person, vocal tract information, timbral qualities of his voice and other congenital information regarding his voice. The study of Bengali speech recognition and speaker identification…

    Submitted 15 April, 2020; originally announced April 2020.

    Comments: arXiv admin note: text overlap with arXiv:1612.00171, arXiv:1601.07709

  49. arXiv:2004.02071  [pdf, ps, other]

    cs.CL

    Incorporating Bilingual Dictionaries for Low Resource Semi-Supervised Neural Machine Translation

    Authors: Sreyashi Nag, Mihir Kale, Varun Lakshminarasimhan, Swapnil Singhavi

    Abstract: We explore ways of incorporating bilingual dictionaries to enable semi-supervised neural machine translation. Conventional back-translation methods have shown success in leveraging target side monolingual data. However, since the quality of back-translation models is tied to the size of the available parallel corpora, this could adversely impact the synthetically generated sentences in a low resou…

    Submitted 4 April, 2020; originally announced April 2020.

  50. arXiv:1912.05014  [pdf, other]

    cs.CV cs.LG cs.MM

    Hybrid Style Siamese Network: Incorporating style loss in complementary apparels retrieval

    Authors: Mayukh Bhattacharyya, Sayan Nag

    Abstract: Image Retrieval grows to be an integral part of fashion e-commerce ecosystem as it keeps expanding in multitudes. Other than the retrieval of visually similar items, the retrieval of visually compatible or complementary items is also an important aspect of it. Normal Siamese Networks tend to work well on complementary items retrieval. But it fails to identify low level style features which make it…

    Submitted 9 June, 2020; v1 submitted 23 November, 2019; originally announced December 2019.

    Comments: Paper Accepted in the Third Workshop on Computer Vision for Fashion, Art and Design, CVPR 2020