Showing 1–50 of 65 results for author: Krishnamurthy, B

  1. arXiv:2405.00942  [pdf, other]

    cs.CV cs.CL

    LLaVA Finds Free Lunch: Teaching Human Behavior Improves Content Understanding Abilities Of LLMs

    Authors: Somesh Singh, Harini S I, Yaman K Singla, Veeky Baths, Rajiv Ratn Shah, Changyou Chen, Balaji Krishnamurthy

    Abstract: Communication is defined as "Who says what to whom with what effect." A message from a communicator generates downstream receiver effects, also known as behavior. Receiver behavior, being a downstream effect of the message, carries rich signals about it. Even after carrying signals about the message, the behavior data is often ignored while training large language models. We show that training LLM…

    Submitted 16 May, 2024; v1 submitted 1 May, 2024; originally announced May 2024.

  2. arXiv:2402.01155  [pdf, other]

    cs.CL

    CABINET: Content Relevance based Noise Reduction for Table Question Answering

    Authors: Sohan Patnaik, Heril Changwal, Milan Aggarwal, Sumit Bhatia, Yaman Kumar, Balaji Krishnamurthy

    Abstract: Table understanding capability of Large Language Models (LLMs) has been extensively studied through the task of question-answering (QA) over tables. Typically, only a small part of the whole table is relevant to derive the answer for a given question. The irrelevant parts act as noise and are distracting information, resulting in sub-optimal performance due to the vulnerability of LLMs to noise. T…

    Submitted 13 February, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

    Comments: Accepted at ICLR 2024 (spotlight)

  3. arXiv:2311.10995  [pdf, other]

    cs.CV cs.CL

    Behavior Optimized Image Generation

    Authors: Varun Khurana, Yaman K Singla, Jayakumar Subramanian, Rajiv Ratn Shah, Changyou Chen, Zhiqiang Xu, Balaji Krishnamurthy

    Abstract: The last few years have witnessed great success on image generation, which has crossed the acceptance thresholds of aesthetics, making it directly applicable to personal and commercial applications. However, images, especially in marketing and advertising applications, are often created as a means to an end as opposed to just aesthetic concerns. The goal can be increasing sales, getting more click…

    Submitted 18 November, 2023; originally announced November 2023.

  4. arXiv:2311.05451  [pdf, other]

    cs.CL cs.CY cs.LG

    All Should Be Equal in the Eyes of Language Models: Counterfactually Aware Fair Text Generation

    Authors: Pragyan Banerjee, Abhinav Java, Surgan Jandial, Simra Shahid, Shaz Furniturewala, Balaji Krishnamurthy, Sumit Bhatia

    Abstract: Fairness in Language Models (LMs) remains a longstanding challenge, given the inherent biases in training data that can be perpetuated by models and affect the downstream tasks. Recent methods employ expensive retraining or attempt debiasing during inference by constraining model outputs to contrast from a reference set of biased templates or exemplars. Regardless, they don't address the primary go…

    Submitted 9 November, 2023; originally announced November 2023.

    Comments: The first four authors contributed equally to the work

  5. arXiv:2309.00378  [pdf, other]

    cs.CL cs.CV cs.HC

    Long-Term Ad Memorability: Understanding & Generating Memorable Ads

    Authors: Harini S I, Somesh Singh, Yaman K Singla, Aanisha Bhattacharyya, Veeky Baths, Changyou Chen, Rajiv Ratn Shah, Balaji Krishnamurthy

    Abstract: Marketers spend billions of dollars on advertisements, but to what end? At purchase time, if customers cannot recognize the brand for which they saw an ad, the money spent on the ad is essentially wasted. Despite its importance in marketing, until now, there has been no large-scale study on the memorability of ads. All previous memorability studies have been conducted on short-term recall on speci…

    Submitted 20 July, 2024; v1 submitted 1 September, 2023; originally announced September 2023.

  6. arXiv:2309.00359  [pdf, other]

    cs.CL cs.CV

    Large Content And Behavior Models To Understand, Simulate, And Optimize Content And Behavior

    Authors: Ashmit Khandelwal, Aditya Agrawal, Aanisha Bhattacharyya, Yaman K Singla, Somesh Singh, Uttaran Bhattacharya, Ishita Dasgupta, Stefano Petrangeli, Rajiv Ratn Shah, Changyou Chen, Balaji Krishnamurthy

    Abstract: Shannon and Weaver's seminal information theory divides communication into three levels: technical, semantic, and effectiveness. While the technical level deals with the accurate reconstruction of transmitted symbols, the semantic and effectiveness levels deal with the inferred meaning and its effect on the receiver. Large Language Models (LLMs), with their wide generalizability, make some progres…

    Submitted 16 March, 2024; v1 submitted 1 September, 2023; originally announced September 2023.

  7. arXiv:2308.11239  [pdf, other]

    cs.CV

    LOCATE: Self-supervised Object Discovery via Flow-guided Graph-cut and Bootstrapped Self-training

    Authors: Silky Singh, Shripad Deshmukh, Mausoom Sarkar, Balaji Krishnamurthy

    Abstract: Learning object segmentation in image and video datasets without human supervision is a challenging problem. Humans easily identify moving salient objects in videos using the gestalt principle of common fate, which suggests that what moves together belongs together. Building upon this idea, we propose a self-supervised object discovery approach that leverages motion and appearance information to p…

    Submitted 2 December, 2023; v1 submitted 22 August, 2023; originally announced August 2023.

    Comments: Accepted to British Machine Vision Conference (BMVC) 2023

  8. arXiv:2307.04392  [pdf, other]

    cs.CV

    FODVid: Flow-guided Object Discovery in Videos

    Authors: Silky Singh, Shripad Deshmukh, Mausoom Sarkar, Rishabh Jain, Mayur Hemani, Balaji Krishnamurthy

    Abstract: Segmentation of objects in a video is challenging due to the nuances such as motion blurring, parallax, occlusions, changes in illumination, etc. Instead of addressing these nuances separately, we focus on building a generalizable solution that avoids overfitting to the individual intricacies. Such a solution would also help us save enormous resources involved in human annotation of video corpora.…

    Submitted 10 July, 2023; originally announced July 2023.

    Comments: CVPR 2023 (L3D-IVU workshop)

  9. arXiv:2306.16503  [pdf, other]

    cs.LG cs.AI

    SARC: Soft Actor Retrospective Critic

    Authors: Sukriti Verma, Ayush Chopra, Jayakumar Subramanian, Mausoom Sarkar, Nikaash Puri, Piyush Gupta, Balaji Krishnamurthy

    Abstract: The two-time scale nature of SAC, which is an actor-critic algorithm, is characterised by the fact that the critic estimate has not converged for the actor at any given time, but since the critic learns faster than the actor, it ensures eventual consistency between the two. Various strategies have been introduced in literature to learn better gradient estimates to help achieve better convergence.…

    Submitted 28 June, 2023; originally announced June 2023.

    Comments: Accepted at RLDM 2022

  10. arXiv:2305.09758  [pdf, other]

    cs.CV cs.CL

    A Video Is Worth 4096 Tokens: Verbalize Videos To Understand Them In Zero Shot

    Authors: Aanisha Bhattacharya, Yaman K Singla, Balaji Krishnamurthy, Rajiv Ratn Shah, Changyou Chen

    Abstract: Multimedia content, such as advertisements and story videos, exhibit a rich blend of creativity and multiple modalities. They incorporate elements like text, visuals, audio, and storytelling techniques, employing devices like emotions, symbolism, and slogans to convey meaning. There is a dearth of large annotated training datasets in the multimedia domain hindering the development of supervised le…

    Submitted 26 October, 2023; v1 submitted 16 May, 2023; originally announced May 2023.

    Comments: Accepted to EMNLP-23 TL;DR: Video understanding lags far behind NLP; LLMs excel in zero-shot. Our approach utilizes LLMs to verbalize videos, creating stories for zero-shot video understanding. This yields state-of-the-art results across five datasets, covering fifteen tasks

  11. arXiv:2305.09258  [pdf, other]

    cs.IR cs.CL

    HyHTM: Hyperbolic Geometry based Hierarchical Topic Models

    Authors: Simra Shahid, Tanay Anand, Nikitha Srikanth, Sumit Bhatia, Balaji Krishnamurthy, Nikaash Puri

    Abstract: Hierarchical Topic Models (HTMs) are useful for discovering topic hierarchies in a collection of documents. However, traditional HTMs often produce hierarchies where lower-level topics are unrelated and not specific enough to their higher-level topics. Additionally, these methods can be computationally expensive. We present HyHTM - a Hyperbolic geometry based Hierarchical Topic Models - that addres…

    Submitted 16 May, 2023; originally announced May 2023.

    Comments: This paper is accepted in Findings of the Association for Computational Linguistics (2023)

  12. arXiv:2305.06677  [pdf, other]

    cs.CL cs.AI cs.LG

    INGENIOUS: Using Informative Data Subsets for Efficient Pre-Training of Language Models

    Authors: H S V N S Kowndinya Renduchintala, Krishnateja Killamsetty, Sumit Bhatia, Milan Aggarwal, Ganesh Ramakrishnan, Rishabh Iyer, Balaji Krishnamurthy

    Abstract: A salient characteristic of pre-trained language models (PTLMs) is a remarkable improvement in their generalization capability and emergence of new capabilities with increasing model capacity and pre-training dataset size. Consequently, we are witnessing the development of enormous models pushing the state-of-the-art. It is, however, imperative to realize that this inevitably leads to prohibitivel…

    Submitted 19 October, 2023; v1 submitted 11 May, 2023; originally announced May 2023.

  13. arXiv:2305.04073  [pdf, other]

    cs.AI cs.LG

    Explaining RL Decisions with Trajectories

    Authors: Shripad Vilasrao Deshmukh, Arpan Dasgupta, Balaji Krishnamurthy, Nan Jiang, Chirag Agarwal, Georgios Theocharous, Jayakumar Subramanian

    Abstract: Explanation is a key component for the adoption of reinforcement learning (RL) in many real-world decision-making problems. In the literature, the explanation is often provided by saliency attribution to the features of the RL agent's state. In this work, we propose a complementary approach to these explanations, particularly for offline RL, where we attribute the policy decisions of a trained RL…

    Submitted 22 January, 2024; v1 submitted 6 May, 2023; originally announced May 2023.

    Comments: Published at International Conference on Learning Representations (ICLR), 2023

  14. arXiv:2303.15122  [pdf, other]

    cs.CV

    Parameter Efficient Local Implicit Image Function Network for Face Segmentation

    Authors: Mausoom Sarkar, Nikitha SR, Mayur Hemani, Rishabh Jain, Balaji Krishnamurthy

    Abstract: Face parsing is defined as the per-pixel labeling of images containing human faces. The labels are defined to identify key facial regions like eyes, lips, nose, hair, etc. In this work, we make use of the structural consistency of the human face to propose a lightweight face-parsing method using a Local Implicit Function network, FP-LIIF. We propose a simple architecture having a convolutional enc…

    Submitted 27 March, 2023; originally announced March 2023.

    Comments: Accepted at CVPR 2023

  15. arXiv:2302.05721  [pdf, other]

    cs.HC cs.CL

    Synthesizing Human Gaze Feedback for Improved NLP Performance

    Authors: Varun Khurana, Yaman Kumar Singla, Nora Hollenstein, Rajesh Kumar, Balaji Krishnamurthy

    Abstract: Integrating human feedback in models can improve the performance of natural language processing (NLP) models. Feedback can be either explicit (e.g. ranking used in training language models) or implicit (e.g. using human cognitive signals in the form of eye tracking). Prior eye tracking and NLP research reveal that cognitive processes, such as human scanpaths, gleaned from human gaze patterns aid in…

    Submitted 11 February, 2023; originally announced February 2023.

    Comments: Accepted at European Chapter of the Association for Computational Linguistics (EACL)

  16. arXiv:2301.06928  [pdf, other]

    cs.LG cs.AI

    Towards Estimating Transferability using Hard Subsets

    Authors: Tarun Ram Menta, Surgan Jandial, Akash Patil, Vimal KB, Saketh Bachu, Balaji Krishnamurthy, Vineeth N. Balasubramanian, Chirag Agarwal, Mausoom Sarkar

    Abstract: As transfer learning techniques are increasingly used to transfer knowledge from the source model to the target task, it becomes important to quantify which source models are suitable for a given target task without performing computationally expensive fine tuning. In this work, we propose HASTE (HArd Subset TransfErability), a new strategy to estimate the transferability of a source model to a pa…

    Submitted 17 January, 2023; originally announced January 2023.

    Comments: First three authors contributed equally

  17. arXiv:2211.10157  [pdf, other]

    cs.CV cs.AI

    UMFuse: Unified Multi View Fusion for Human Editing applications

    Authors: Rishabh Jain, Mayur Hemani, Duygu Ceylan, Krishna Kumar Singh, Jingwan Lu, Mausoom Sarkar, Balaji Krishnamurthy

    Abstract: Numerous pose-guided human editing methods have been explored by the vision community due to their extensive practical applications. However, most of these methods still use an image-to-image formulation in which a single image is given as input to produce an edited image as output. This objective becomes ill-defined in cases when the target pose differs significantly from the input pose. Existing…

    Submitted 28 March, 2023; v1 submitted 17 November, 2022; originally announced November 2022.

    Comments: 8 pages, 6 figures

    ACM Class: I.4; I.5

  18. arXiv:2211.08540  [pdf, other]

    cs.CV cs.AI

    VGFlow: Visibility guided Flow Network for Human Reposing

    Authors: Rishabh Jain, Krishna Kumar Singh, Mayur Hemani, Jingwan Lu, Mausoom Sarkar, Duygu Ceylan, Balaji Krishnamurthy

    Abstract: The task of human reposing involves generating a realistic image of a person standing in an arbitrary conceivable pose. There are multiple difficulties in generating perceptually accurate images, and existing methods suffer from limitations in preserving texture, maintaining pattern coherence, respecting cloth boundaries, handling occlusions, manipulating skin generation, etc. These difficulties a…

    Submitted 28 March, 2023; v1 submitted 13 November, 2022; originally announced November 2022.

    Comments: Selected for publication in CVPR2023

    ACM Class: I.4; I.5

  19. arXiv:2210.11728  [pdf, other]

    cs.CV

    Distilling the Undistillable: Learning from a Nasty Teacher

    Authors: Surgan Jandial, Yash Khasbage, Arghya Pal, Vineeth N Balasubramanian, Balaji Krishnamurthy

    Abstract: The inadvertent stealing of private/sensitive information using Knowledge Distillation (KD) has been getting significant attention recently and has guided subsequent defense efforts considering its critical nature. Recent work Nasty Teacher proposed to develop teachers which can not be distilled or imitated by models attacking it. However, the promise of confidentiality offered by a nasty teacher…

    Submitted 21 October, 2022; originally announced October 2022.

    Comments: Published in main track of ECCV 2022, 17 pages with references, 5 figures, 6 tables

    Journal ref: ECCV 2022

  20. arXiv:2209.06584  [pdf, other]

    cs.CV

    One-Shot Doc Snippet Detection: Powering Search in Document Beyond Text

    Authors: Abhinav Java, Shripad Deshmukh, Milan Aggarwal, Surgan Jandial, Mausoom Sarkar, Balaji Krishnamurthy

    Abstract: Active consumption of digital documents has yielded scope for research in various applications, including search. Traditionally, searching within a document has been cast as a text matching problem ignoring the rich layout and visual cues commonly present in structured documents, forms, etc. To that end, we ask a mostly unexplored question: "Can we search for other similar snippets present in a ta…

    Submitted 12 September, 2022; originally announced September 2022.

  21. arXiv:2208.09626  [pdf, other]

    cs.CL cs.CV

    Persuasion Strategies in Advertisements

    Authors: Yaman Kumar Singla, Rajat Jha, Arunim Gupta, Milan Aggarwal, Aditya Garg, Tushar Malyan, Ayush Bhardwaj, Rajiv Ratn Shah, Balaji Krishnamurthy, Changyou Chen

    Abstract: Modeling what makes an advertisement persuasive, i.e., eliciting the desired response from consumer, is critical to the study of propaganda, social psychology, and marketing. Despite its importance, computational modeling of persuasion in computer vision is still in its infancy, primarily due to the lack of benchmark datasets that can provide persuasion-strategy labels associated with ads. Motivat…

    Submitted 6 May, 2023; v1 submitted 20 August, 2022; originally announced August 2022.

    Comments: Accepted at AAAI-23

  22. arXiv:2208.06458  [pdf, other]

    cs.CL cs.LG

    LM-CORE: Language Models with Contextually Relevant External Knowledge

    Authors: Jivat Neet Kaur, Sumit Bhatia, Milan Aggarwal, Rachit Bansal, Balaji Krishnamurthy

    Abstract: Large transformer-based pre-trained language models have achieved impressive performance on a variety of knowledge-intensive tasks and can capture factual knowledge in their parameters. We argue that storing large amounts of knowledge in the model parameters is sub-optimal given the ever-growing amounts of knowledge and resource requirements. We posit that a more efficient alternative is to provid…

    Submitted 12 August, 2022; originally announced August 2022.

    Comments: Published at Findings of NAACL, 2022

  23. arXiv:2207.09714  [pdf, other]

    cs.LG cs.AI cs.MA q-bio.PE q-bio.QM

    Differentiable Agent-based Epidemiology

    Authors: Ayush Chopra, Alexander Rodríguez, Jayakumar Subramanian, Arnau Quera-Bofarull, Balaji Krishnamurthy, B. Aditya Prakash, Ramesh Raskar

    Abstract: Mechanistic simulators are an indispensable tool for epidemiology to explore the behavior of complex, dynamic infections under varying conditions and navigate uncertain environments. Agent-based models (ABMs) are an increasingly popular simulation paradigm that can represent the heterogeneity of contact interactions with granular detail and agency of individual behavior. However, conventional ABM…

    Submitted 21 May, 2023; v1 submitted 20 July, 2022; originally announced July 2022.

    Comments: Appears in AAMAS 2023 and ICML AI4ABM 2022 (best paper award)

  24. arXiv:2206.05912  [pdf, other]

    cs.CV

    INDIGO: Intrinsic Multimodality for Domain Generalization

    Authors: Puneet Mangla, Shivam Chandhok, Milan Aggarwal, Vineeth N Balasubramanian, Balaji Krishnamurthy

    Abstract: For models to generalize under unseen domains (a.k.a. domain generalization), it is crucial to learn feature representations that are domain-agnostic and capture the underlying semantics that makes up an object category. Recent advances towards weakly supervised vision-language models that learn holistic representations from cheap weakly supervised noisy text annotations have shown their ability on…

    Submitted 13 June, 2022; originally announced June 2022.

    Comments: Under Submission

  25. arXiv:2206.05706  [pdf, other]

    cs.CL

    CoSe-Co: Text Conditioned Generative CommonSense Contextualizer

    Authors: Rachit Bansal, Milan Aggarwal, Sumit Bhatia, Jivat Neet Kaur, Balaji Krishnamurthy

    Abstract: Pre-trained Language Models (PTLMs) have been shown to perform well on natural language tasks. Many prior works have leveraged structured commonsense present in the form of entities linked through labeled relations in Knowledge Graphs (KGs) to assist PTLMs. Retrieval approaches use KG as a separate static module which limits coverage since KGs contain finite knowledge. Generative methods train PTL…

    Submitted 17 June, 2022; v1 submitted 12 June, 2022; originally announced June 2022.

    Comments: Accepted at NAACL 2022 (main conference)

  26. arXiv:2205.03859  [pdf, other]

    cs.CV cs.LG

    On Conditioning the Input Noise for Controlled Image Generation with Diffusion Models

    Authors: Vedant Singh, Surgan Jandial, Ayush Chopra, Siddharth Ramesh, Balaji Krishnamurthy, Vineeth N. Balasubramanian

    Abstract: Conditional image generation has paved the way for several breakthroughs in image editing, generating stock photos and 3-D object generation. This continues to be a significant area of interest with the rise of new state-of-the-art methods that are based on diffusion models. However, diffusion models provide very little control over the generated image, which led to subsequent works exploring tech…

    Submitted 8 May, 2022; originally announced May 2022.

    Comments: Accepted at the workshop on AI for Content Creation at CVPR 2022

  27. arXiv:2111.11692  [pdf, other]

    cs.MA

    Status-quo policy gradient in Multi-Agent Reinforcement Learning

    Authors: Pinkesh Badjatiya, Mausoom Sarkar, Nikaash Puri, Jayakumar Subramanian, Abhishek Sinha, Siddharth Singh, Balaji Krishnamurthy

    Abstract: Individual rationality, which involves maximizing expected individual returns, does not always lead to high-utility individual or group outcomes in multi-agent problems. For instance, in multi-agent social dilemmas, Reinforcement Learning (RL) agents trained to maximize individual rewards converge to a low-utility mutually harmful equilibrium. In contrast, humans evolve useful strategies in such s…

    Submitted 23 November, 2021; originally announced November 2021.

  28. arXiv:2110.04421  [pdf, other]

    cs.MA cs.LG

    DeepABM: Scalable, efficient and differentiable agent-based simulations via graph neural networks

    Authors: Ayush Chopra, Esma Gel, Jayakumar Subramanian, Balaji Krishnamurthy, Santiago Romero-Brufau, Kalyan S. Pasupathy, Thomas C. Kingsley, Ramesh Raskar

    Abstract: We introduce DeepABM, a framework for agent-based modeling that leverages geometric message passing of graph neural networks for simulating action and interactions over large agent populations. Using DeepABM allows scaling simulations to large agent populations in real-time and running them efficiently on GPU architectures. To demonstrate the effectiveness of DeepABM, we build DeepABM-COVID simula…

    Submitted 8 October, 2021; originally announced October 2021.

    Comments: Accepted at Winter Simulation Conference 2021

  29. arXiv:2109.12406  [pdf, other]

    cs.CL cs.AI cs.LG

    MINIMAL: Mining Models for Data Free Universal Adversarial Triggers

    Authors: Swapnil Parekh, Yaman Singla Kumar, Somesh Singh, Changyou Chen, Balaji Krishnamurthy, Rajiv Ratn Shah

    Abstract: It is well known that natural language models are vulnerable to adversarial attacks, which are mostly input-specific in nature. Recently, it has been shown that there also exist input-agnostic attacks in NLP models, called universal adversarial triggers. However, existing methods to craft universal triggers are data intensive. They require large amounts of data samples to generate adversarial trig…

    Submitted 25 September, 2021; originally announced September 2021.

  30. arXiv:2109.07001  [pdf, other]

    cs.CV

    ZFlow: Gated Appearance Flow-based Virtual Try-on with 3D Priors

    Authors: Ayush Chopra, Rishabh Jain, Mayur Hemani, Balaji Krishnamurthy

    Abstract: Image-based virtual try-on involves synthesizing perceptually convincing images of a model wearing a particular garment and has garnered significant research interest due to its immense practical applicability. Recent methods involve a two stage process: i) warping of the garment to align with the model ii) texture fusion of the warped garment and target model to generate the try-on output. Issues…

    Submitted 14 September, 2021; originally announced September 2021.

    Comments: Accepted at ICCV 2021

  31. arXiv:2109.03813  [pdf, other]

    cs.AI

    Video2Skill: Adapting Events in Demonstration Videos to Skills in an Environment using Cyclic MDP Homomorphisms

    Authors: Sumedh A Sontakke, Sumegh Roychowdhury, Mausoom Sarkar, Nikaash Puri, Balaji Krishnamurthy, Laurent Itti

    Abstract: Humans excel at learning long-horizon tasks from demonstrations augmented with textual commentary, as evidenced by the burgeoning popularity of tutorial videos online. Intuitively, this capability can be separated into 2 distinct subtasks - first, dividing a long-horizon demonstration sequence into semantically meaningful events; second, adapting such events into meaningful behaviors in one's own…

    Submitted 9 September, 2021; v1 submitted 8 September, 2021; originally announced September 2021.

  32. arXiv:2109.00928  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Speaker-Conditioned Hierarchical Modeling for Automated Speech Scoring

    Authors: Yaman Kumar Singla, Avykat Gupta, Shaurya Bagga, Changyou Chen, Balaji Krishnamurthy, Rajiv Ratn Shah

    Abstract: Automatic Speech Scoring (ASS) is the computer-assisted evaluation of a candidate's speaking proficiency in a language. ASS systems face many challenges like open grammar, variable pronunciations, and unstructured or semi-structured content. Recent deep learning approaches have shown some promise in this domain. However, most of these approaches focus on extracting features from a single audio, ma…

    Submitted 30 August, 2021; originally announced September 2021.

    Comments: Published in CIKM 2021

  33. arXiv:2107.04419  [pdf, other]

    cs.LG

    Form2Seq : A Framework for Higher-Order Form Structure Extraction

    Authors: Milan Aggarwal, Hiresh Gupta, Mausoom Sarkar, Balaji Krishnamurthy

    Abstract: Document structure extraction has been a widely researched area for decades with recent works performing it as a semantic segmentation task over document images using fully-convolution networks. Such methods are limited by image resolution due to which they fail to disambiguate structures in dense regions which appear commonly in forms. To mitigate this, we propose Form2Seq, a novel sequence-to-se…

    Submitted 9 July, 2021; originally announced July 2021.

    Comments: This paper has been presented at EMNLP 2020

  34. arXiv:2107.04396  [pdf, other]

    cs.CV

    Multi-Modal Association based Grouping for Form Structure Extraction

    Authors: Milan Aggarwal, Mausoom Sarkar, Hiresh Gupta, Balaji Krishnamurthy

    Abstract: Document structure extraction has been a widely researched area for decades. Recent work in this direction has been deep learning-based, mostly focusing on extracting structure using fully convolution NN through semantic segmentation. In this work, we present a novel multi-modal approach for form structure extraction. Given simple elements such as textruns and widgets, we extract higher-order stru…

    Submitted 9 July, 2021; originally announced July 2021.

    Comments: This work has been accepted and presented at WACV 2020

  35. arXiv:2105.06956  [pdf, other]

    cs.LG

    Information-theoretic Evolution of Model Agnostic Global Explanations

    Authors: Sukriti Verma, Nikaash Puri, Piyush Gupta, Balaji Krishnamurthy

    Abstract: Explaining the behavior of black box machine learning models through human interpretable rules is an important research area. Recent work has focused on explaining model behavior locally i.e. for specific predictions as well as globally across the fields of vision, natural language, reinforcement learning and data science. We present a novel model-agnostic approach that derives rules to globally e…

    Submitted 14 May, 2021; originally announced May 2021.

  36. arXiv:2012.04256  [pdf, other]

    cs.CV

    Data InStance Prior (DISP) in Generative Adversarial Networks

    Authors: Puneet Mangla, Nupur Kumari, Mayank Singh, Balaji Krishnamurthy, Vineeth N Balasubramanian

    Abstract: Recent advances in generative adversarial networks (GANs) have shown remarkable progress in generating high-quality images. However, this gain in performance depends on the availability of a large amount of training data. In limited data regimes, training typically diverges, and therefore the generated samples are of low quality and lack diversity. Previous works have addressed training in low dat…

    Submitted 21 September, 2021; v1 submitted 8 December, 2020; originally announced December 2020.

    Comments: Accepted at WACV 2022

  37. arXiv:2012.01524  [pdf, other]

    cs.CL

    TAN-NTM: Topic Attention Networks for Neural Topic Modeling

    Authors: Madhur Panwar, Shashank Shailabh, Milan Aggarwal, Balaji Krishnamurthy

    Abstract: Topic models have been widely used to learn text representations and gain insight into document corpora. To perform topic discovery, most existing neural models either take document bag-of-words (BoW) or sequence of tokens as input followed by variational inference and BoW reconstruction to learn topic-word distribution. However, leveraging topic-word distribution for learning better features duri…

    Submitted 9 July, 2021; v1 submitted 2 December, 2020; originally announced December 2020.

    Comments: Accepted as a long paper at ACL 2021 (Oral)

  38. arXiv:2010.09893  [pdf, other]

    cs.CV

    LT-GAN: Self-Supervised GAN with Latent Transformation Detection

    Authors: Parth Patel, Nupur Kumari, Mayank Singh, Balaji Krishnamurthy

    Abstract: Generative Adversarial Networks (GANs) coupled with self-supervised tasks have shown promising results in unconditional and semi-supervised image generation. We propose a self-supervised approach (LT-GAN) to improve the generation quality and diversity of images by estimating the GAN-induced transformation (i.e. transformation induced in the generated images by perturbing the latent space of gener…

    Submitted 19 October, 2020; originally announced October 2020.

    Comments: Accepted at WACV2021

  39. arXiv:2010.02556  [pdf, other]

    cs.LG cs.AI cs.CL

    SHERLock: Self-Supervised Hierarchical Event Representation Learning

    Authors: Sumegh Roychowdhury, Sumedh A. Sontakke, Nikaash Puri, Mausoom Sarkar, Milan Aggarwal, Pinkesh Badjatiya, Balaji Krishnamurthy, Laurent Itti

    Abstract: Temporal event representations are an essential aspect of learning among humans. They allow for succinct encoding of the experiences we have through a variety of sensory inputs. Also, they are believed to be arranged hierarchically, allowing for an efficient representation of complex long-horizon experiences. Additionally, these representations are acquired in a self-supervised manner. Analogously…

    Submitted 22 August, 2022; v1 submitted 6 October, 2020; originally announced October 2020.

    Comments: Accepted at ICPR '22

  40. arXiv:2009.01485  [pdf, other]

    cs.CV cs.AI

    SAC: Semantic Attention Composition for Text-Conditioned Image Retrieval

    Authors: Surgan Jandial, Pinkesh Badjatiya, Pranit Chawla, Ayush Chopra, Mausoom Sarkar, Balaji Krishnamurthy

    Abstract: The ability to efficiently search for images is essential for improving the user experiences across various products. Incorporating user feedback, via multi-modal inputs, to navigate visual search can help tailor retrieved results to specific user queries. We focus on the task of text-conditioned image retrieval that utilizes support text feedback alongside a reference image to retrieve images tha…

    Submitted 19 October, 2021; v1 submitted 3 September, 2020; originally announced September 2020.

    Comments: Surgan Jandial, Pinkesh Badjatiya, Pranit Chawla, and Ayush Chopra contributed equally to this work. Work accepted at WACV 2022

  41. arXiv:2006.13593  [pdf, other]

    cs.CV

    Retrospective Loss: Looking Back to Improve Training of Deep Neural Networks

    Authors: Surgan Jandial, Ayush Chopra, Mausoom Sarkar, Piyush Gupta, Balaji Krishnamurthy, Vineeth Balasubramanian

    Abstract: Deep neural networks (DNNs) are powerful learning machines that have enabled breakthroughs in several domains. In this work, we introduce a new retrospective loss to improve the training of deep neural network models by utilizing the prior experience available in past model states during training. Minimizing the retrospective loss, along with the task-specific loss, pushes the parameter state at t… ▽ More

    Submitted 24 June, 2020; originally announced June 2020.

    Comments: Accepted at KDD 2020; The first two authors contributed equally

  42. arXiv:2006.06082  [pdf, other]

    cs.CY cs.AI cs.LG

    Towards Integrating Fairness Transparently in Industrial Applications

    Authors: Emily Dodwell, Cheryl Flynn, Balachander Krishnamurthy, Subhabrata Majumdar, Ritwik Mitra

    Abstract: Numerous Machine Learning (ML) bias-related failures in recent years have led to scrutiny of how companies incorporate aspects of transparency and accountability in their ML lifecycles. Companies have a responsibility to monitor ML processes for bias and mitigate any bias detected, ensure business product integrity, preserve customer loyalty, and protect brand image. Challenges specific to industr… ▽ More

    Submitted 13 February, 2021; v1 submitted 10 June, 2020; originally announced June 2020.

    Comments: 14 pages, 4 figures

  43. arXiv:2004.15014  [pdf, other]

    cs.CV

    SimPropNet: Improved Similarity Propagation for Few-shot Image Segmentation

    Authors: Siddhartha Gairola, Mayur Hemani, Ayush Chopra, Balaji Krishnamurthy

    Abstract: Few-shot segmentation (FSS) methods perform image segmentation for a particular object class in a target (query) image, using a small set of (support) image-mask pairs. Recent deep neural network based FSS methods leverage high-dimensional feature similarity between the foreground features of the support images and the query image features. In this work, we demonstrate gaps in the utilization of t… ▽ More

    Submitted 2 May, 2020; v1 submitted 30 April, 2020; originally announced April 2020.

    Comments: An updated version of this work was accepted at IJCAI 2020

  44. arXiv:2002.06544  [pdf, other]

    cs.CL cs.LG

    Exploring Neural Models for Parsing Natural Language into First-Order Logic

    Authors: Hrituraj Singh, Milan Aggrawal, Balaji Krishnamurthy

    Abstract: Semantic parsing is the task of obtaining machine-interpretable representations from natural language text. We consider one such formal representation, First-Order Logic (FOL), and explore the capability of neural models in parsing English sentences to FOL. We model FOL parsing as a sequence-to-sequence mapping task where given a natural language sentence, it is encoded into an intermediate repres… ▽ More

    Submitted 16 February, 2020; originally announced February 2020.

    Comments: 11 Pages, 2 Figures

  45. arXiv:2001.06265  [pdf, other]

    cs.CV cs.LG eess.IV

    SieveNet: A Unified Framework for Robust Image-Based Virtual Try-On

    Authors: Surgan Jandial, Ayush Chopra, Kumar Ayush, Mayur Hemani, Abhijeet Kumar, Balaji Krishnamurthy

    Abstract: Image-based virtual try-on for fashion has gained considerable attention recently. The task requires trying on a clothing item on a target model image. An efficient framework for this is composed of two stages: (1) warping (transforming) the try-on cloth to align with the pose and shape of the target model, and (2) a texture transfer module to seamlessly integrate the warped try-on cloth onto the… ▽ More

    Submitted 17 January, 2020; originally announced January 2020.

    Comments: Accepted at IEEE WACV 2020

  46. arXiv:2001.05458  [pdf, other]

    cs.AI cs.GT cs.LG

    Inducing Cooperative behaviour in Sequential-Social dilemmas through Multi-Agent Reinforcement Learning using Status-Quo Loss

    Authors: Pinkesh Badjatiya, Mausoom Sarkar, Abhishek Sinha, Siddharth Singh, Nikaash Puri, Jayakumar Subramanian, Balaji Krishnamurthy

    Abstract: In social dilemma situations, individual rationality leads to sub-optimal group outcomes. Several human engagements can be modeled as sequential (multi-step) social dilemmas. However, in contrast to humans, Deep Reinforcement Learning agents trained to optimize individual rewards in sequential social dilemmas converge to selfish, mutually harmful behavior. We introduce a status-quo loss (SQLoss)… ▽ More

    Submitted 13 February, 2020; v1 submitted 15 January, 2020; originally announced January 2020.

  47. arXiv:2001.05166  [pdf, other]

    cs.LG cs.HC stat.ML

    ShapeVis: High-dimensional Data Visualization at Scale

    Authors: Nupur Kumari, Siddarth R., Akash Rupela, Piyush Gupta, Balaji Krishnamurthy

    Abstract: We present ShapeVis, a scalable visualization technique for point cloud data inspired by topological data analysis. Our method captures the underlying geometric and topological structure of the data in a compressed graphical representation. Much success has been reported with the data visualization technique Mapper, which discretely approximates the Reeb graph of a filter function on the data. Howe… ▽ More

    Submitted 21 January, 2020; v1 submitted 15 January, 2020; originally announced January 2020.

    Comments: Accepted at WWW 2020

  48. arXiv:1912.12191  [pdf, other]

    cs.CV cs.AI

    Explain Your Move: Understanding Agent Actions Using Specific and Relevant Feature Attribution

    Authors: Nikaash Puri, Sukriti Verma, Piyush Gupta, Dhruv Kayastha, Shripad Deshmukh, Balaji Krishnamurthy, Sameer Singh

    Abstract: As deep reinforcement learning (RL) is applied to more tasks, there is a need to visualize and understand the behavior of learned agents. Saliency maps explain agent behavior by highlighting the features of the input state that are most relevant for the agent in taking an action. Existing perturbation-based approaches to compute saliency often highlight regions of the input that are not relevant t… ▽ More

    Submitted 3 April, 2020; v1 submitted 23 December, 2019; originally announced December 2019.

    Comments: Accepted at the International Conference on Learning Representations (ICLR) 2020

  49. arXiv:1912.00466  [pdf, other]

    cs.LG cs.CR cs.CV stat.ML

    A Method for Computing Class-wise Universal Adversarial Perturbations

    Authors: Tejus Gupta, Abhishek Sinha, Nupur Kumari, Mayank Singh, Balaji Krishnamurthy

    Abstract: We present an algorithm for computing class-specific universal adversarial perturbations for deep neural networks. Such perturbations can induce misclassification in a large fraction of images of a specific class. Unlike previous methods that use iterative optimization for computing a universal perturbation, the proposed method employs a perturbation that is a linear function of weights of the neu… ▽ More

    Submitted 1 December, 2019; originally announced December 2019.

  50. arXiv:1911.13073  [pdf, other]

    cs.CV cs.LG eess.IV

    Attributional Robustness Training using Input-Gradient Spatial Alignment

    Authors: Mayank Singh, Nupur Kumari, Puneet Mangla, Abhishek Sinha, Vineeth N Balasubramanian, Balaji Krishnamurthy

    Abstract: Interpretability is an emerging area of research in trustworthy machine learning. Safe deployment of machine learning systems mandates that the prediction and its explanation be reliable and robust. Recently, it has been shown that explanations can be manipulated easily by adding visually imperceptible perturbations to the input while keeping the model's prediction intact. In this work, we st… ▽ More

    Submitted 18 July, 2020; v1 submitted 29 November, 2019; originally announced November 2019.

    Comments: ECCV 2020, Code at https://github.com/nupurkmr9/Attributional-Robustness