Search | arXiv e-print repository

Development of an interactive GUI using MATLAB for the detection of type and stage of Breast Tumor

Abstract: Breast cancer is one of the most common types of cancer and is diagnosed mainly in women; women are more prone to it than men. Breast lumps are classified into two main groups: cancerous and non-cancerous. A cancerous lump can spread via the lobules, ducts, areola, and stroma to various organs of the body. Non-cancerous breast lumps are less harmful, but they should be monitored under proper diagnosis so that they do not turn into cancerous lumps. Breast lumps are diagnosed using mammograms, ultrasonic images, and MRI images. For a better diagnosis, doctors sometimes also recommend a biopsy, but any unforeseen anomalies occurring there may give rise to an inaccurate test report. To avoid these discrepancies, processing the mammogram images is considered one of the most reliable methods. In the proposed method, a MATLAB GUI is developed and sample images of breast lumps are placed in the respective axes. With the help of sliders, the actual breast lump image is compared with the stored sample images, and the history of the breast lump is then generated in real time in the form of a test report.

Submitted 29 June, 2024; originally announced July 2024.

arXiv:2406.14290 [pdf, ps, other]

Examining the Implications of Deepfakes for Election Integrity

Authors: Hriday Ranka, Mokshit Surana, Neel Kothari, Veer Pariawala, Pratyay Banerjee, Aditya Surve, Sainath Reddy Sankepally, Raghav Jain, Jhagrut Lalwani, Swapneel Mehta

Abstract: It is becoming cheaper to launch disinformation operations at scale using AI-generated content, in particular 'deepfake' technology. We have observed instances of deepfakes in political campaigns, where generated content is employed to both bolster the credibility of certain narratives (reinforcing outcomes) and manipulate public perception to the detriment of targeted candidates or causes (adversarial outcomes). We discuss the threats from deepfakes in politics, highlight model specifications underlying different types of deepfake generation methods, and contribute an accessible evaluation of the efficacy of existing detection methods. We provide this as a summary for lawmakers and civil society actors to understand how the technology may be applied in light of existing policies regulating its use. We highlight the limitations of existing detection mechanisms and discuss the areas where policies and regulations are required to address the challenges of deepfakes.

Submitted 20 June, 2024; originally announced June 2024.

Comments: Accepted at the AAAI 2024 conference, AI for Credible Elections Workshop-AI4CE 2024

arXiv:2406.09598 [pdf, other]

Introducing HOT3D: An Egocentric Dataset for 3D Hand and Object Tracking

Authors: Prithviraj Banerjee, Sindi Shkodrani, Pierre Moulon, Shreyas Hampali, Fan Zhang, Jade Fountain, Edward Miller, Selen Basol, Richard Newcombe, Robert Wang, Jakob Julian Engel, Tomas Hodan

Abstract: We introduce HOT3D, a publicly available dataset for egocentric hand and object tracking in 3D. The dataset offers over 833 minutes (more than 3.7M images) of multi-view RGB/monochrome image streams showing 19 subjects interacting with 33 diverse rigid objects, multi-modal signals such as eye gaze or scene point clouds, as well as comprehensive ground truth annotations including 3D poses of objects, hands, and cameras, and 3D models of hands and objects. In addition to simple pick-up/observe/put-down actions, HOT3D contains scenarios resembling typical actions in a kitchen, office, and living room environment. The dataset is recorded by two head-mounted devices from Meta: Project Aria, a research prototype of light-weight AR/AI glasses, and Quest 3, a production VR headset sold in millions of units. Ground-truth poses were obtained by a professional motion-capture system using small optical markers attached to hands and objects. Hand annotations are provided in the UmeTrack and MANO formats and objects are represented by 3D meshes with PBR materials obtained by an in-house scanner. We aim to accelerate research on egocentric hand-object interaction by making the HOT3D dataset publicly available and by co-organizing public challenges on the dataset at ECCV 2024. The dataset can be downloaded from the project website: https://facebookresearch.github.io/hot3d/.

Submitted 13 June, 2024; originally announced June 2024.

arXiv:2405.10431 [pdf, other]

Thinking Fair and Slow: On the Efficacy of Structured Prompts for Debiasing Language Models

Authors: Shaz Furniturewala, Surgan Jandial, Abhinav Java, Pragyan Banerjee, Simra Shahid, Sumit Bhatia, Kokil Jaidka

Abstract: Existing debiasing techniques are typically training-based or require access to the model's internals and output distributions, so they are inaccessible to end-users looking to adapt LLM outputs for their particular needs. In this study, we examine whether structured prompting techniques can offer opportunities for fair text generation. We evaluate a comprehensive end-user-focused iterative framework of debiasing that applies System 2 thinking processes for prompts to induce logical, reflective, and critical text generation, with single, multi-step, instruction, and role-based variants. By systematically evaluating many LLMs across many datasets and different prompting strategies, we show that the more complex System 2-based Implicative Prompts significantly improve over other techniques demonstrating lower mean bias in the outputs with competitive performance on the downstream tasks. Our work offers research directions for the design and the potential of end-user-focused evaluative frameworks for LLM use.

Submitted 16 May, 2024; originally announced May 2024.

Comments: The first two authors have equal contribution

arXiv:2403.15924 [pdf, other]

Perception and Control of Surfing in Virtual Reality using a 6-DoF Motion Platform

Authors: Premankur Banerjee, Jason Cherin, Jayati Upadhyay, Jason Kutch, Heather Culbertson

Abstract: The paper presents a system for simulating surfing in Virtual Reality (VR), emphasizing the recreation of aquatic motions and user-initiated propulsive forces using a 6-Degree of Freedom (DoF) motion platform. We present an algorithmic approach to accurately render surfboard kinematics and interactive paddling dynamics, validated through experimental evaluation with $N=17$ participants. Results indicate that the system effectively reproduces various acceleration levels, the perception of which is independent of users' body posture. We additionally found that the presence of ocean ripples amplifies the perception of acceleration. This system aims to enhance the realism and interactivity of VR surfing, laying a foundation for future advancements in surf therapy and interactive aquatic VR experiences.

Submitted 23 March, 2024; originally announced March 2024.

arXiv:2403.13672 [pdf, other]

Machine Learning Optimized Approach for Parameter Selection in MESHFREE Simulations

Authors: Paulami Banerjee, Mohan Padmanabha, Chaitanya Sanghavi, Isabel Michel, Simone Gramsch

Abstract: Meshfree simulation methods are emerging as compelling alternatives to conventional mesh-based approaches, particularly in the fields of Computational Fluid Dynamics (CFD) and continuum mechanics. In this publication, we provide a comprehensive overview of our research combining Machine Learning (ML) and Fraunhofer's MESHFREE software (www.meshfree.eu), a powerful tool utilizing a numerical point cloud in a Generalized Finite Difference Method (GFDM). This tool enables the effective handling of complex flow domains, moving geometries, and free surfaces, while allowing users to finely tune local refinement and quality parameters for an optimal balance between computation time and results accuracy. However, manually determining the optimal parameter combination poses challenges, especially for less experienced users. We introduce a novel ML-optimized approach, using active learning, regression trees, and visualization on MESHFREE simulation data, demonstrating the impact of input combinations on results quality and computation time. This research contributes valuable insights into parameter optimization in meshfree simulations, enhancing accessibility and usability for a broader user base in scientific and engineering applications.

Submitted 20 March, 2024; originally announced March 2024.

arXiv:2402.00295 [pdf]

Comparative Evaluation of Traditional and Deep Learning-Based Segmentation Methods for Spoil Pile Delineation Using UAV Images

Authors: Sureka Thiruchittampalam, Bikram P. Banerjee, Nancy F. Glenn, Simit Raval

Abstract: The stability of mine dumps is contingent upon the precise arrangement of spoil piles, taking into account their geological and geotechnical attributes. Yet, on-site characterisation of individual piles poses a formidable challenge. The utilisation of image-based techniques for spoil pile characterisation, employing remotely acquired data through unmanned aerial systems, is a promising complementary solution. Image processing tasks, such as object-based classification and feature extraction, depend upon effective segmentation. This study refines and juxtaposes various segmentation approaches, specifically colour-based and morphology-based techniques. The objective is to enhance and evaluate avenues for object-based analysis for spoil characterisation within the context of mining environments. Furthermore, a comparative analysis is conducted between conventional segmentation approaches and those rooted in deep learning methodologies. Among the diverse segmentation approaches evaluated, the morphology-based deep learning segmentation approach, Segment Anything Model (SAM), exhibited superior performance in comparison to other approaches. This outcome underscores the efficacy of incorporating advanced morphological and deep learning techniques for accurate and efficient spoil pile characterisation. The findings of this study contribute valuable insights to the optimisation of segmentation strategies, thereby advancing the application of image-based techniques for the characterisation of spoil piles in mining environments.

Submitted 31 January, 2024; originally announced February 2024.

arXiv:2311.11214 [pdf]

Infrared image identification method of substation equipment fault under weak supervision

Authors: Anjali Sharma, Priya Banerjee, Nikhil Singh

Abstract: This study presents a weakly supervised method for identifying faults in infrared images of substation equipment. It utilizes the Faster RCNN model for equipment identification, enhancing detection accuracy through modifications to the model's network structure and parameters. The method is exemplified through the analysis of infrared images captured by inspection robots at substations. Performance is validated against manually marked results, demonstrating that the proposed algorithm significantly enhances the accuracy of fault identification across various equipment types.

Submitted 18 November, 2023; originally announced November 2023.

arXiv:2311.05451 [pdf, other]

All Should Be Equal in the Eyes of Language Models: Counterfactually Aware Fair Text Generation

Authors: Pragyan Banerjee, Abhinav Java, Surgan Jandial, Simra Shahid, Shaz Furniturewala, Balaji Krishnamurthy, Sumit Bhatia

Abstract: Fairness in Language Models (LMs) remains a longstanding challenge, given the inherent biases in training data that can be perpetuated by models and affect the downstream tasks. Recent methods employ expensive retraining or attempt debiasing during inference by constraining model outputs to contrast from a reference set of biased templates or exemplars. Regardless, they do not address the primary goal of fairness: to maintain equitability across different demographic groups. In this work, we posit that inferencing LMs to generate unbiased output for one demographic under a context ensues from being aware of outputs for other demographics under the same context. To this end, we propose Counterfactually Aware Fair InferencE (CAFIE), a framework that dynamically compares the model understanding of diverse demographics to generate more equitable sentences. We conduct an extensive empirical evaluation using base LMs of varying sizes and across three diverse datasets and find that CAFIE outperforms strong baselines. CAFIE produces fairer text and strikes the best balance between fairness and language modeling capability.

Submitted 9 November, 2023; originally announced November 2023.

Comments: The first four authors contributed equally to the work

arXiv:2310.20174 [pdf, other]

GraphTransformers for Geospatial Forecasting of Hurricane Trajectories

Authors: Pallavi Banerjee, Satyaki Chakraborty

Abstract: In this paper we introduce a novel framework for trajectory prediction of geospatial sequences using GraphTransformers. When viewed across several sequences, we observed that a graph structure automatically emerges between different geospatial points that is often not taken into account for such sequence modeling tasks. We show that by leveraging this graph structure explicitly, geospatial trajectory prediction can be significantly improved. Our GraphTransformer approach improves significantly upon a state-of-the-art Transformer-based baseline on HURDAT, a dataset where we are interested in predicting the trajectory of a hurricane on a 6-hourly basis.

Submitted 26 November, 2023; v1 submitted 31 October, 2023; originally announced October 2023.

arXiv:2310.00836 [pdf, other]

Towards LogiGLUE: A Brief Survey and A Benchmark for Analyzing Logical Reasoning Capabilities of Language Models

Authors: Man Luo, Shrinidhi Kumbhar, Ming shen, Mihir Parmar, Neeraj Varshney, Pratyay Banerjee, Somak Aditya, Chitta Baral

Abstract: Logical reasoning is fundamental for humans yet presents a substantial challenge in the domain of Artificial Intelligence. Initially, researchers used Knowledge Representation and Reasoning (KR) systems that did not scale and required non-trivial manual effort. Recently, the emergence of large language models (LLMs) has demonstrated the ability to overcome various limitations of formal Knowledge Representation (KR) systems. Consequently, there's a growing interest in using LLMs for logical reasoning via natural language. This work strives to understand the proficiency of LLMs in logical reasoning by offering a brief review of the latest progress in this area, with a focus on the logical reasoning datasets, tasks, and the methods adopted to utilize LLMs for reasoning. To offer a thorough analysis, we have compiled a benchmark titled LogiGLUE. This includes 24 varied datasets encompassing deductive, abductive, and inductive reasoning. Utilizing LogiGLUE as a foundation, we have trained an instruction fine-tuned language model, resulting in LogiT5. We study single-task training, multi-task training, and the "chain-of-thought" knowledge distillation fine-tuning technique to assess the performance of the model across the different logical reasoning categories. We also assess various LLMs using LogiGLUE, and the findings indicate that LLMs excel most in abductive reasoning, followed by deductive reasoning, while they are least effective at inductive reasoning. We aim to shed light on the capabilities and potential pathways for enhancing logical reasoning proficiency in LLMs, paving the way for more advanced and nuanced developments in this critical field.

Submitted 30 March, 2024; v1 submitted 1 October, 2023; originally announced October 2023.

Comments: Work in progress

arXiv:2308.04407 [pdf, other]

Chrisimos: A useful Proof-of-Work for finding Minimal Dominating Set of a graph

Authors: Diptendu Chatterjee, Prabal Banerjee, Subhra Mazumdar

Abstract: Hash-based Proof-of-Work (PoW) used in the Bitcoin Blockchain leads to high energy consumption and resource wastage. In this paper, we aim to re-purpose the energy by replacing the hash function with real-life problems having commercial utility. We propose Chrisimos, a useful Proof-of-Work where miners are required to find a minimal dominating set for real-life graph instances. A miner who is able to output the smallest dominating set for the given graph within the block interval time wins the mining game. We also propose a new chain selection rule that ensures the security of the scheme. Thus our protocol also realizes a decentralized minimal dominating set solver for any graph instance. We provide formal proof of correctness and show via experimental results that the block interval time is within feasible bounds of hash-based PoW.

Submitted 13 September, 2023; v1 submitted 8 August, 2023; originally announced August 2023.

Comments: 20 pages, 3 figures. An abridged version of the paper got accepted in The International Symposium on Intelligent and Trustworthy Computing, Communications, and Networking (ITCCN-2023) held in conjunction with the 22nd IEEE International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom-2023)

arXiv:2304.03805 [pdf, other]

Correcting Model Misspecification via Generative Adversarial Networks

Authors: Pronoma Banerjee, Manasi V Gude, Rajvi J Sampat, Sharvari M Hedaoo, Soma Dhavala, Snehanshu Saha

Abstract: Machine learning models are often misspecified in the likelihood, which leads to a lack of robustness in the predictions. In this paper, we introduce a framework for correcting likelihood misspecifications in several paradigm agnostic noisy prior models and test the model's ability to remove the misspecification. The "ABC-GAN" framework introduced is a novel generative modeling paradigm, which combines Generative Adversarial Networks (GANs) and Approximate Bayesian Computation (ABC). This new paradigm assists the existing GANs by incorporating any subjective knowledge available about the modeling process via ABC, as a regularizer, resulting in a partially interpretable model that operates well under low data regimes. At the same time, unlike any Bayesian analysis, the explicit knowledge need not be perfect, since the generator in the GAN can be made arbitrarily complex. ABC-GAN eliminates the need for summary statistics and distance metrics as the discriminator implicitly learns them and enables simultaneous specification of multiple generative models. The model misspecification is simulated in our experiments by introducing noise of various biases and variances. The correction term is learnt via the ABC-GAN, with skip connections, referred to as skipGAN. The strength of the skip connection indicates the amount of correction needed or how misspecified the prior model is. Based on a simple experimental setup, we show that the ABC-GAN models not only correct the misspecification of the prior, but also perform as well as or better than the respective priors under noisier conditions. In this proposal, we show that ABC-GANs get the best of both worlds.

Submitted 7 April, 2023; originally announced April 2023.

arXiv:2303.14994 [pdf, other]

Analysis of DNA sequences through local distribution of nucleotides in strategic neighborhoods

Authors: Probir Mondal, Pratyay Banerjee, Krishnendu Basuli

Abstract: We propose a new alignment-free algorithm by constructing a compact vector representation on $\mathbb{R}^{24}$ of a DNA sequence of arbitrary length. Each component of this vector is obtained from a representative sequence, the elements of which are the values realized by a function $\Gamma$. $\Gamma$ acts on neighborhoods of arbitrary radius that are located at strategic positions within the DNA sequence and carries complete information about the local distribution of frequencies of the nucleotides as a consequence of the uniqueness of prime factorization of integers. The algorithm exhibits linear time complexity and consumes very little memory. The two natural parameters characterizing the radius and location of the neighborhoods are fixed by comparing the phylogenetic tree with the benchmark for full genome sequences of fish mtDNA datasets. Using these fitting parameters, the method is applied to analyze a number of genome sequences from benchmark and other standard datasets. The algorithm proves to be computationally efficient compared to Co-phylog and CD-MAWS when applied over a certain range of a simulated dataset.

Submitted 16 January, 2024; v1 submitted 27 March, 2023; originally announced March 2023.

Comments: 12 pages, 5 figures

arXiv:2301.10165 [pdf, other]

Lexi: Self-Supervised Learning of the UI Language

Authors: Pratyay Banerjee, Shweti Mahajan, Kushal Arora, Chitta Baral, Oriana Riva

Abstract: Humans can learn to operate the user interface (UI) of an application by reading an instruction manual or how-to guide. Along with text, these resources include visual content such as UI screenshots and images of application icons referenced in the text. We explore how to leverage this data to learn generic visio-linguistic representations of UI screens and their components. These representations are useful in many real applications, such as accessibility, voice navigation, and task automation. Prior UI representation models rely on UI metadata (UI trees and accessibility labels), which is often missing, incompletely defined, or not accessible. We avoid such a dependency, and propose Lexi, a pre-trained vision and language model designed to handle the unique features of UI screens, including their text richness and context sensitivity. To train Lexi we curate the UICaption dataset consisting of 114k UI images paired with descriptions of their functionality. We evaluate Lexi on four tasks: UI action entailment, instruction-based UI image retrieval, grounding referring expressions, and UI entity recognition.

Submitted 23 January, 2023; originally announced January 2023.

Comments: EMNLP (Findings) 2022

arXiv:2212.03866 [pdf, other]

Learning Action-Effect Dynamics for Hypothetical Vision-Language Reasoning Task

Authors: Shailaja Keyur Sampat, Pratyay Banerjee, Yezhou Yang, Chitta Baral

Abstract: 'Actions' play a vital role in how humans interact with the world. Thus, autonomous agents that would assist us in everyday tasks also require the capability to perform 'Reasoning about Actions & Change' (RAC). This has been an important research direction in Artificial Intelligence (AI) in general, but the study of RAC with visual and linguistic inputs is relatively recent. CLEVR_HYP (Sampat et al., 2021) is one such testbed for hypothetical vision-language reasoning with actions as the key focus. In this work, we propose a novel learning strategy that can improve reasoning about the effects of actions. We implement an encoder-decoder architecture to learn the representation of actions as vectors. We combine the aforementioned encoder-decoder architecture with existing modality parsers and a scene graph question answering model to evaluate our proposed system on the CLEVR_HYP dataset. We conduct thorough experiments to demonstrate the effectiveness of our proposed approach and discuss its advantages over previous baselines in terms of performance, data efficiency, and generalization capability.

Submitted 7 December, 2022; originally announced December 2022.

Comments: 11 pages, 9 figures; Accepted at Findings of EMNLP 2022. arXiv admin note: substantial text overlap with arXiv:2212.03433

arXiv:2212.03433 [pdf, other]

Learning Action-Effect Dynamics from Pairs of Scene-graphs

Authors: Shailaja Keyur Sampat, Pratyay Banerjee, Yezhou Yang, Chitta Baral

Abstract: 'Actions' play a vital role in how humans interact with the world. Thus, autonomous agents that would assist us in everyday tasks also require the capability to perform 'Reasoning about Actions & Change' (RAC). Recently, there has been growing interest in the study of RAC with visual and linguistic inputs. Graphs are often used to represent the semantic structure of the visual content (i.e. objects, their attributes and relationships among objects), commonly referred to as scene-graphs. In this work, we propose a novel method that leverages the scene-graph representation of images to reason about the effects of actions described in natural language. We experiment with the existing CLEVR_HYP (Sampat et al., 2021) dataset and show that our proposed approach is effective in terms of performance, data efficiency, and generalization capability compared to existing models.

Submitted 6 December, 2022; originally announced December 2022.

Comments: 5 pages, 6 figures; Accepted at 3rd Workshop on Graphs and more Complex structures for Learning and Reasoning (GCLR) workshop, AAAI 2023

arXiv:2211.11181 [pdf]

doi 10.1016/j.ijmst.2022.09.022

A review of laser scanning for geological and geotechnical applications in underground mining

Authors: Sarvesh Kumar Singh, Bikram Pratap Banerjee, Simit Raval

Abstract: Laser scanning can provide timely assessments of mine sites despite adverse challenges in the operational environment. Although there are several published articles on laser scanning, there is a need to review them in the context of underground mining applications. To this end, a holistic review of laser scanning is presented including progress in 3D scanning systems, data capture/processing techniques and primary applications in underground mines. Laser scanning technology has advanced significantly in terms of mobility and mapping, but there are constraints in coherent and consistent data collection at certain mines due to feature deficiency, dynamics, and environmental influences such as dust and water. Studies suggest that laser scanning has matured over the years for change detection, clearance measurements and structure mapping applications. However, there is scope for improvements in lithology identification, surface parameter measurements, logistic tracking and autonomous navigation. Laser scanning has the potential to provide real-time solutions, but the lack of infrastructure in underground mines for data transfer, geodetic networking and processing capacity remains a limiting factor. Nevertheless, laser scanners are becoming an integral part of mine automation thanks to their affordability, accuracy and mobility, which should support their widespread usage in years to come.

Submitted 20 November, 2022; originally announced November 2022.

arXiv:2210.11790 [pdf, other]

FoSR: First-order spectral rewiring for addressing oversquashing in GNNs

Authors: Kedar Karhadkar, Pradeep Kr. Banerjee, Guido Montúfar

Abstract: Graph neural networks (GNNs) are able to leverage the structure of graph data by passing messages along the edges of the graph. While this allows GNNs to learn features depending on the graph structure, for certain graph topologies it leads to inefficient information propagation and a problem known as oversquashing. This has recently been linked with the curvature and spectral gap of the graph. On the other hand, adding edges to the message-passing graph can lead to increasingly similar node representations and a problem known as oversmoothing. We propose a computationally efficient algorithm that prevents oversquashing by systematically adding edges to the graph based on spectral expansion. We combine this with a relational architecture, which lets the GNN preserve the original graph structure and provably prevents oversmoothing. We find experimentally that our algorithm outperforms existing graph rewiring methods in several graph classification tasks.

Submitted 15 February, 2023; v1 submitted 21 October, 2022; originally announced October 2022.

Comments: 21 pages, accepted to ICLR 2023

arXiv:2208.03471 [pdf, other]

Oversquashing in GNNs through the lens of information contraction and graph expansion

Authors: Pradeep Kr. Banerjee, Kedar Karhadkar, Yu Guang Wang, Uri Alon, Guido Montúfar

Abstract: The quality of signal propagation in message-passing graph neural networks (GNNs) strongly influences their expressivity, as has been observed in recent works. In particular, for prediction tasks relying on long-range interactions, recursive aggregation of node features can lead to an undesired phenomenon called "oversquashing". We present a framework for analyzing oversquashing based on information contraction. Our analysis is guided by a model of reliable computation due to von Neumann that lends a new insight into oversquashing as signal quenching in noisy computation graphs. Building on this, we propose a graph rewiring algorithm aimed at alleviating oversquashing. Our algorithm employs a random local edge flip primitive motivated by an expander graph construction. We compare the spectral expansion properties of our algorithm with those of an existing curvature-based non-local rewiring strategy. Synthetic experiments show that while our algorithm in general has a slower rate of expansion, it is overall computationally cheaper, preserves the node degrees exactly and never disconnects the graph.

Submitted 6 August, 2022; originally announced August 2022.

Comments: 8 pages, 5 figures; Accepted at the 58th Annual Allerton Conference on Communication, Control, and Computing

arXiv:2204.10982 [pdf, ps, other]

Continuity and Additivity Properties of Information Decompositions

Authors: Johannes Rauh, Pradeep Kr. Banerjee, Eckehard Olbrich, Guido Montúfar, Jürgen Jost

Abstract: Information decompositions quantify how the Shannon information about a given random variable is distributed among several other random variables. Various requirements have been proposed that such a decomposition should satisfy, leading to different candidate solutions. Curiously, however, only two of the original requirements that determined the Shannon information have been considered, namely monotonicity and normalization. Two other important properties, continuity and additivity, have not been considered. In this contribution, we focus on the mutual information of two finite variables $Y,Z$ about a third finite variable $S$ and check which of the decompositions satisfy these two properties. While most of them satisfy continuity, only one of them is both continuous and additive.

Submitted 9 July, 2023; v1 submitted 22 April, 2022; originally announced April 2022.

Comments: 17 pages

MSC Class: 94A15; 94A17

Journal ref: International Journal of Approximate Reasoning, 2023

arXiv:2204.10869 [pdf, other]

Identity Preserving Loss for Learned Image Compression

Authors: Jiuhong Xiao, Lavisha Aggarwal, Prithviraj Banerjee, Manoj Aggarwal, Gerard Medioni

Abstract: Deep learning model inference on embedded devices is challenging due to the limited availability of computation resources. A popular alternative is to perform model inference on the cloud, which requires transmitting images from the embedded device to the cloud. Image compression techniques are commonly employed in such cloud-based architectures to reduce transmission latency over low bandwidth networks. This work proposes an end-to-end image compression framework that learns domain-specific features to achieve higher compression ratios than standard HEVC/JPEG compression techniques while maintaining accuracy on downstream tasks (e.g., recognition). Our framework does not require fine-tuning of the downstream task, which allows us to drop in any off-the-shelf downstream task model without retraining. We choose faces as an application domain due to the ready availability of datasets and off-the-shelf recognition models as representative downstream tasks. We present a novel Identity Preserving Reconstruction (IPR) loss function which achieves Bits-Per-Pixel (BPP) values that are ~38% and ~42% of CRF-23 HEVC compression for LFW (low-resolution) and CelebA-HQ (high-resolution) datasets, respectively, while maintaining parity in recognition accuracy. The superior compression ratio is achieved as the model learns to retain the domain-specific features (e.g., facial features) while sacrificing details in the background. Furthermore, images reconstructed by our proposed compression model are robust to changes in downstream model architectures. We show at-par recognition performance on the LFW dataset with an unseen recognition model while retaining a lower BPP value of ~38% of CRF-23 HEVC compression.

Submitted 26 April, 2022; v1 submitted 22 April, 2022; originally announced April 2022.

Comments: Accepted by CVPR 2022 Workshop on New Trends in Image Restoration and Enhancement and Challenges

arXiv:2203.16682 [pdf, other]

To Find Waldo You Need Contextual Cues: Debiasing Who's Waldo

Authors: Yiran Luo, Pratyay Banerjee, Tejas Gokhale, Yezhou Yang, Chitta Baral

Abstract: We present a debiased dataset for the Person-centric Visual Grounding (PCVG) task first proposed by Cui et al. (2021) in the Who's Waldo dataset. Given an image and a caption, PCVG requires pairing up a person's name mentioned in a caption with a bounding box that points to the person in the image. We find that the original Who's Waldo dataset compiled for this task contains a large number of biased samples that are solvable simply by heuristic methods; for instance, in many cases the first name in the sentence corresponds to the largest bounding box, or the sequence of names in the sentence corresponds to an exact left-to-right order in the image. Naturally, training models on such biased data leads to overestimated performance on the benchmark. To ensure that models are correct for the right reasons, we design automated tools to filter and debias the original dataset by ruling out all examples of insufficient context, such as those with no verb or with a long chain of conjunct names in their captions. Our experiments show that our new sub-sampled dataset contains less bias, with much lower heuristic performance and wider gaps between heuristic and supervised methods. We also demonstrate that the same benchmark model trained on our debiased training set outperforms the one trained on the original biased (and larger) training set on our debiased test set. We argue our debiased dataset offers the PCVG task a more practical baseline for reliable benchmarking and future improvements.

Submitted 30 March, 2022; originally announced March 2022.

Comments: Accepted at ACL 2022 (Short Paper)

arXiv:2203.10533 [pdf, other]

doi 10.1109/TNSM.2022.3230768

Strategic Analysis of Griefing Attack in Lightning Network

Authors: Subhra Mazumdar, Prabal Banerjee, Abhinandan Sinha, Sushmita Ruj, Bimal Roy

Abstract: Hashed Timelock Contract (HTLC) in Lightning Network is susceptible to a griefing attack. An attacker can block several channels and stall payments by mounting this attack. A state-of-the-art countermeasure, Hashed Timelock Contract with Griefing-Penalty (HTLC-GP) is found to work under the classical assumption of participants being either honest or malicious but fails for rational participants. To address the gap, we introduce a game-theoretic model for analyzing griefing attacks in HTLC. We use this model to analyze griefing attacks in HTLC-GP and conjecture that it is impossible to design an efficient protocol that will penalize a malicious participant with the current Bitcoin scripting system. We study the impact of the penalty on the cost of mounting the attack and observe that HTLC-GP is weakly effective in disincentivizing the attacker in certain conditions. To further increase the cost of attack, we introduce the concept of \emph{guaranteed minimum compensation}, denoted as $ζ$, and modify HTLC-GP into $\textrm{HTLC-GP}^ζ$. By experimenting on several instances of Lightning Network, we observe that the total coins locked in the network drops to $28\%$ for $\textrm{HTLC-GP}^ζ$, unlike in HTLC-GP where total coins locked does not drop below $40\%$. These results justify that $\textrm{HTLC-GP}^ζ$ is better than HTLC-GP to counter griefing attacks.

Submitted 20 December, 2022; v1 submitted 20 March, 2022; originally announced March 2022.

Comments: 17 pages, Accepted in IEEE Transactions on Network and Service Management (Special Issue Advances on Blockchain)

arXiv:2201.04933 [pdf, other]

Machine Learning-enhanced Efficient Spectroscopic Ellipsometry Modeling

Authors: Ayush Arunachalam, S. Novia Berriel, Parag Banerjee, Kanad Basu

Abstract: In recent years, Machine Learning (ML) has been adopted extensively in a wide range of real-world applications, from computer vision to data mining and drug discovery. In this paper, we utilize ML to facilitate efficient film fabrication, specifically Atomic Layer Deposition (ALD). To advance ALD process development, which is used to generate thin films, and to accelerate its adoption in industry, it is imperative to understand the underlying atomistic processes. Towards this end, in situ techniques for monitoring film growth, such as Spectroscopic Ellipsometry (SE), have been proposed. However, in situ SE is associated with complex hardware and, hence, is resource intensive. To address these challenges, we propose an ML-based approach to expedite film thickness estimation. The proposed approach promises faster data acquisition, reduced hardware complexity and easier integration of spectroscopic ellipsometry for in situ monitoring of film thickness deposition. Our experimental results involving SE of TiO2 demonstrate that the proposed ML-based approach achieves thickness prediction accuracies of 88.76% within +/-1.5 nm and 85.14% within +/-0.5 nm intervals. Furthermore, we report accuracies of up to 98% at lower thicknesses, which is a significant improvement over existing SE-based analysis, thereby making our solution a viable option for thickness estimation of ultrathin films.

Submitted 8 February, 2022; v1 submitted 1 January, 2022; originally announced January 2022.

arXiv:2110.12231 [pdf, other]

Learning curves for Gaussian process regression with power-law priors and targets

Authors: Hui Jin, Pradeep Kr. Banerjee, Guido Montúfar

Abstract: We characterize the power-law asymptotics of learning curves for Gaussian process regression (GPR) under the assumption that the eigenspectrum of the prior and the eigenexpansion coefficients of the target function follow a power law. Under similar assumptions, we leverage the equivalence between GPR and kernel ridge regression (KRR) to show the generalization error of KRR. Infinitely wide neural networks can be related to GPR with respect to the neural network GP kernel and the neural tangent kernel, which in several cases is known to have a power-law spectrum. Hence our methods can be applied to study the generalization error of infinitely wide neural networks. We present toy experiments demonstrating the theory.

Submitted 27 November, 2021; v1 submitted 23 October, 2021; originally announced October 2021.

Comments: 76 pages, 7 tables, 6 figures

arXiv:2110.08438 [pdf, other]

Unsupervised Natural Language Inference Using PHL Triplet Generation

Authors: Neeraj Varshney, Pratyay Banerjee, Tejas Gokhale, Chitta Baral

Abstract: Transformer-based models achieve impressive performance on numerous Natural Language Inference (NLI) benchmarks when trained on respective training datasets. However, in certain cases, training samples may not be available or collecting them could be time-consuming and resource-intensive. In this work, we address the above challenge and present an explorative study on unsupervised NLI, a paradigm in which no human-annotated training samples are available. We investigate it under three settings: PH, P, and NPH that differ in the extent of unlabeled data available for learning. As a solution, we propose a procedural data generation approach that leverages a set of sentence transformations to collect PHL (Premise, Hypothesis, Label) triplets for training NLI models, bypassing the need for human-annotated training data. Comprehensive experiments with several NLI datasets show that the proposed approach results in accuracies of up to 66.75%, 65.9%, 65.39% in PH, P, and NPH settings respectively, outperforming all existing unsupervised baselines. Furthermore, fine-tuning our model with as little as ~0.1% of the human-annotated training dataset (500 instances) leads to 12.2% higher accuracy than the model trained from scratch on the same 500 instances. Supported by this superior performance, we conclude with a recommendation for collecting high-quality task-specific data.

Submitted 15 March, 2022; v1 submitted 15 October, 2021; originally announced October 2021.

Comments: ACL 2022 Findings

arXiv:2110.07165 [pdf, other]

Semantically Distributed Robust Optimization for Vision-and-Language Inference

Authors: Tejas Gokhale, Abhishek Chaudhary, Pratyay Banerjee, Chitta Baral, Yezhou Yang

Abstract: Analysis of vision-and-language models has revealed their brittleness under linguistic phenomena such as paraphrasing, negation, textual entailment, and word substitutions with synonyms or antonyms. While data augmentation techniques have been designed to mitigate these failure modes, methods that can integrate this knowledge into the training pipeline remain under-explored. In this paper, we present \textbf{SDRO}, a model-agnostic method that utilizes a set of linguistic transformations in a distributed robust optimization setting, along with an ensembling technique to leverage these transformations during inference. Experiments on benchmark datasets with images (NLVR$^2$) and video (VIOLIN) demonstrate performance improvements as well as robustness to adversarial attacks. Experiments on binary VQA explore the generalizability of this method to other V\&L tasks.

Submitted 14 March, 2022; v1 submitted 14 October, 2021; originally announced October 2021.

Comments: Findings of ACL 2022; code available at https://github.com/ASU-APG/VLI_SDRO

arXiv:2109.04014 [pdf, other]

Weakly-Supervised Visual-Retriever-Reader for Knowledge-based Question Answering

Authors: Man Luo, Yankai Zeng, Pratyay Banerjee, Chitta Baral

Abstract: Knowledge-based visual question answering (VQA) requires answering questions with external knowledge in addition to the content of images. One dataset that is mostly used in evaluating knowledge-based VQA is OK-VQA, but it lacks a gold standard knowledge corpus for retrieval. Existing work leverages different knowledge bases (e.g., ConceptNet and Wikipedia) to obtain external knowledge. Because the knowledge bases vary, it is hard to fairly compare models' performance. To address this issue, we collect a natural language knowledge base that can be used for any VQA system. Moreover, we propose a Visual Retriever-Reader pipeline to approach knowledge-based VQA. The visual retriever aims to retrieve relevant knowledge, and the visual reader seeks to predict answers based on given knowledge. We introduce various ways to retrieve knowledge using text and images and two reader styles: classification and extraction. Both the retriever and reader are trained with weak supervision. Our experimental results show that a good retriever can significantly improve the reader's performance on the OK-VQA challenge. The code and corpus are provided in https://github.com/luomancs/retriever\_reader\_for\_okvqa.git

Submitted 8 September, 2021; originally announced September 2021.

Comments: accepted at EMNLP 2021

arXiv:2109.01934 [pdf, other]

Weakly Supervised Relative Spatial Reasoning for Visual Question Answering

Authors: Pratyay Banerjee, Tejas Gokhale, Yezhou Yang, Chitta Baral

Abstract: Vision-and-language (V\&L) reasoning necessitates perception of visual concepts such as objects and actions, understanding semantics and language grounding, and reasoning about the interplay between the two modalities. One crucial aspect of visual reasoning is spatial understanding, which involves understanding relative locations of objects, i.e.\ implicitly learning the geometry of the scene. In this work, we evaluate the faithfulness of V\&L models to such geometric understanding, by formulating the prediction of pair-wise relative locations of objects as a classification as well as a regression task. Our findings suggest that state-of-the-art transformer-based V\&L models lack sufficient abilities to excel at this task. Motivated by this, we design two objectives as proxies for 3D spatial reasoning (SR) -- object centroid estimation, and relative position estimation, and train V\&L with weak supervision from off-the-shelf depth estimators. This leads to considerable improvements in accuracy for the "GQA" visual question answering challenge (in fully supervised, few-shot, and O.O.D settings) as well as improvements in relative spatial reasoning. Code and data will be released \href{https://github.com/pratyay-banerjee/weak_sup_vqa}{here}.

Submitted 4 September, 2021; originally announced September 2021.

Comments: Accepted to ICCV 2021. PaperId : ICCV2021-10857 Copyright transferred to IEEE ICCV. DOI will be updated later

arXiv:2105.14357 [pdf, other]

Constructing Flow Graphs from Procedural Cybersecurity Texts

Authors: Kuntal Kumar Pal, Kazuaki Kashihara, Pratyay Banerjee, Swaroop Mishra, Ruoyu Wang, Chitta Baral

Abstract: Following procedural texts written in natural languages is challenging. We must read the whole text to identify the relevant information or identify the instruction flows to complete a task, which is prone to failures. If such texts are structured, we can readily visualize instruction-flows, reason or infer a particular step, or even build automated systems to help novice agents achieve a goal. However, this structure recovery task is a challenge because of such texts' diverse nature. This paper proposes to identify relevant information from such texts and generate information flows between sentences. We built a large annotated procedural text dataset (CTFW) in the cybersecurity domain (3154 documents). This dataset contains valuable instructions regarding software vulnerability analysis experiences. We performed extensive experiments on CTFW with our LM-GNN model variants in multiple settings. To show the generalizability of both this task and our method, we also experimented with procedural texts from two other domains (Maintenance Manual and Cooking), which are substantially different from cybersecurity. Our experiments show that a Graph Convolution Network with BERT sentence embeddings outperforms BERT in all three domains.

Submitted 29 May, 2021; originally announced May 2021.

Comments: 13 pages, 5 pages, accepted in the Findings of ACL 2021

arXiv:2105.12392 [pdf, other]

Unsupervised Pronoun Resolution via Masked Noun-Phrase Prediction

Authors: Ming Shen, Pratyay Banerjee, Chitta Baral

Abstract: In this work, we propose Masked Noun-Phrase Prediction (MNPP), a pre-training strategy to tackle pronoun resolution in a fully unsupervised setting. First, we evaluate our pre-trained model on various pronoun resolution datasets without any finetuning. Our method outperforms all previous unsupervised methods on all datasets by large margins. Second, we proceed to a few-shot setting where we finetune our pre-trained model on WinoGrande-S and XS separately. Our method outperforms the RoBERTa-large baseline by large margins, while also achieving a higher AUC score after further finetuning on the remaining three official splits of WinoGrande.

Submitted 28 May, 2021; v1 submitted 26 May, 2021; originally announced May 2021.

Comments: Accepted to ACL 2021

arXiv:2105.01747 [pdf, ps, other]

doi 10.1109/ISIT45174.2021.9517960

Information Complexity and Generalization Bounds

Authors: Pradeep Kr. Banerjee, Guido Montúfar

Abstract: We present a unifying picture of PAC-Bayesian and mutual information-based upper bounds on the generalization error of randomized learning algorithms. As we show, Tong Zhang's information exponential inequality (IEI) gives a general recipe for constructing bounds of both flavors. We show that several important results in the literature can be obtained as simple corollaries of the IEI under different assumptions on the loss function. Moreover, we obtain new bounds for data-dependent priors and unbounded loss functions. Optimizing the bounds gives rise to variants of the Gibbs algorithm, for which we discuss two practical examples for learning with neural networks, namely, Entropy- and PAC-Bayes-SGD. Further, we use an Occam's factor argument to show a PAC-Bayesian bound that incorporates second-order curvature information of the training loss.

Submitted 23 October, 2021; v1 submitted 4 May, 2021; originally announced May 2021.

Comments: To appear in 2021 IEEE International Symposium on Information Theory (ISIT); 23 pages

MSC Class: 68Q32; 68T05; 94A15

ACM Class: I.2.6; G.3

arXiv:2103.12801 [pdf, other]

Variable Name Recovery in Decompiled Binary Code using Constrained Masked Language Modeling

Authors: Pratyay Banerjee, Kuntal Kumar Pal, Fish Wang, Chitta Baral

Abstract: Decompilation is the procedure of transforming binary programs into a high-level representation, such as source code, for human analysts to examine. While modern decompilers can reconstruct and recover much information that is discarded during compilation, inferring variable names is still extremely difficult. Inspired by recent advances in natural language processing, we propose a novel solution to infer variable names in decompiled code based on Masked Language Modeling, Byte-Pair Encoding, and neural architectures such as Transformers and BERT. Our solution takes \textit{raw} decompiler output, the less semantically meaningful code, as input, and enriches it using our proposed \textit{finetuning} technique, Constrained Masked Language Modeling. Using Constrained Masked Language Modeling introduces the challenge of predicting the number of masked tokens for the original variable name. We address this \textit{count of token prediction} challenge with our post-processing algorithm. Compared to the state-of-the-art approaches, our trained VarBERT model is simpler and performs much better. We evaluated our model on an existing large-scale data set with 164,632 binaries and showed that it can predict variable names identical to the ones present in the original source code up to 84.15\% of the time.

Submitted 23 March, 2021; originally announced March 2021.

Comments: Work In Progress

arXiv:2103.11263 [pdf, other]

Self-Supervised Test-Time Learning for Reading Comprehension

Authors: Pratyay Banerjee, Tejas Gokhale, Chitta Baral

Abstract: Recent work on unsupervised question answering has shown that models can be trained with procedurally generated question-answer pairs and can achieve performance competitive with supervised methods. In this work, we consider the task of unsupervised reading comprehension and present a method that performs "test-time learning" (TTL) on a given context (text passage), without requiring training on large-scale human-authored datasets containing \textit{context-question-answer} triplets. This method operates directly on a single test context, uses self-supervision to train models on synthetically generated question-answer pairs, and then infers answers to unseen human-authored questions for this context. Our method achieves accuracies competitive with fully supervised methods and significantly outperforms current unsupervised methods. TTL methods with a smaller model are also competitive with the current state-of-the-art in unsupervised reading comprehension.

Submitted 20 March, 2021; originally announced March 2021.

Comments: Accepted to NAACL 2021

arXiv:2102.10731 [pdf]

doi 10.3390/rs13163145

Three dimensional unique identifier based automated georeferencing and coregistration of point clouds in underground environment

Authors: Sarvesh Kumar Singh, Bikram Pratap Banerjee, Simit Raval

Abstract: Spatially and geometrically accurate laser scans are essential in modelling infrastructure for applications in civil, mining and transportation. Monitoring of underground or indoor environments such as mines or tunnels is challenging due to unavailability of a sensor positioning framework, complicated structurally symmetric layouts, repetitive features and occlusions. Current practices largely include a manual selection of discernable reference points for georeferencing and coregistration purpose. This study aims at overcoming these practical challenges in underground or indoor laser scanning. The developed approach involves automatically and uniquely identifiable three dimensional unique identifiers (3DUIDs) in laser scans, and a 3D registration (3DReG) workflow. Field testing of the method in an underground tunnel has been found accurate, effective and efficient. Additionally, a method for automatically extracting roadway tunnel profile has been exhibited. The developed 3DUID can be used in roadway profile extraction, guided automation, sensor calibration, reference targets for routine survey and deformation monitoring.

Submitted 21 February, 2021; originally announced February 2021.

Comments: 26 pages, 10 figures

ACM Class: I.4.9

Journal ref: Remote Sensing. 2021; 13(16):3145

arXiv:2012.09938 [pdf, other]

Can Transformers Reason About Effects of Actions?

Authors: Pratyay Banerjee, Chitta Baral, Man Luo, Arindam Mitra, Kuntal Pal, Tran C. Son, Neeraj Varshney

Abstract: A recent work has shown that transformers are able to "reason" with facts and rules in a limited setting where the rules are natural language expressions of conjunctions of conditions implying a conclusion. Since this suggests that transformers may be used for reasoning with knowledge given in natural language, we do a rigorous evaluation of this with respect to a common form of knowledge and its corresponding reasoning -- the reasoning about effects of actions. Reasoning about action and change has been a top focus in the knowledge representation subfield of AI from the early days of AI and more recently it has been a highlight aspect in common sense question answering. We consider four action domains (Blocks World, Logistics, Dock-Worker-Robots and a Generic Domain) in natural language and create QA datasets that involve reasoning about the effects of actions in these domains. We investigate the ability of transformers to (a) learn to reason in these domains and (b) transfer that learning from the generic domains to the other domains.

Submitted 17 December, 2020; originally announced December 2020.

arXiv:2012.03354 [pdf, other]

Maximizing Social Welfare in a Competitive Diffusion Model

Authors: Prithu Banerjee, Wei Chen, Laks V. S. Lakshmanan

Abstract: Influence maximization (IM) has garnered a lot of attention in the literature owing to applications such as viral marketing and infection containment. It aims to select a small number of seed users to adopt an item such that adoption propagates to a large number of users in the network. Competitive IM focuses on the propagation of competing items in the network. Existing works on competitive IM have several limitations. (1) They fail to incorporate economic incentives in users' decision making in item adoptions. (2) Majority of the works aim to maximize the adoption of one particular item, and ignore the collective role that different items play. (3) They focus mostly on one aspect of competition -- pure competition. To address these concerns we study competitive IM under a utility-driven propagation model called UIC, and study social welfare maximization. The problem in general is not only NP-hard but also NP-hard to approximate within any constant factor. We, therefore, devise instant dependent efficient approximation algorithms for the general case as well as a $(1-1/e-ε)$-approximation algorithm for a restricted setting. Our algorithms outperform different baselines on competitive IM, both in terms of solution quality and running time on large real networks under both synthetic and real utility configurations.

Submitted 6 December, 2020; originally announced December 2020.

arXiv:2012.02356 [pdf, other]

WeaQA: Weak Supervision via Captions for Visual Question Answering

Authors: Pratyay Banerjee, Tejas Gokhale, Yezhou Yang, Chitta Baral

Abstract: Methodologies for training visual question answering (VQA) models assume the availability of datasets with human-annotated \textit{Image-Question-Answer} (I-Q-A) triplets. This has led to heavy reliance on datasets and a lack of generalization to new types of questions and scenes. Linguistic priors along with biases and errors due to annotator subjectivity have been shown to percolate into VQA models trained on such samples. We study whether models can be trained without any human-annotated Q-A pairs, but only with images and their associated textual descriptions or captions. We present a method to train models with synthetic Q-A pairs generated procedurally from captions. Additionally, we demonstrate the efficacy of spatial-pyramid image patches as a simple but effective alternative to dense and costly object bounding box annotations used in existing VQA models. Our experiments on three VQA benchmarks demonstrate the efficacy of this weakly-supervised approach, especially on the VQA-CP challenge, which tests performance under changing linguistic priors.

Submitted 28 May, 2021; v1 submitted 3 December, 2020; originally announced December 2020.

Comments: Accepted in Findings of ACL 2021

arXiv:2010.04833 [pdf, other]

Pandemic Lessons -- Devising an assessment framework to analyse policies for sustainability

Authors: Pradipta Banerjee, Subhrabrata Choudhury

Abstract: COVID-19 pandemic has sharply projected the globally persistent multi-dimensional fundamental challenges in securing general socio-economic wellbeing of the society. The problems intensify with increasing population densities and also vary with several socio-economic-geo-cultural activity parameters. These problems directly highlight the urgent need for accomplishing the interdependent United Nations Sustainable Development Goals (SDGs) to ensure that in future we do not enter into vicious loops of contracting newer zoonotic viruses and need not search for their vaccines while incurring socio-economic havoc. Behavioural changes in human activities/responses are indispensable for achieving the interdependent SDGs. Using root cause analysis approach, we have developed a yearly assessment framework for viably analysing and identifying requisite region-specific downstream/upstream socio-economic policies to reach the SDGs. The framework makes use of an infographic bar chart representation based on the normalised values of 20 human activity/impact parameters classified under three categories as - negative, limiting and positive. With a holistic view encompassing the SDGs, we illustrate through this framework the impact and urgent need of region-specific human behavioural reforms. This framework enables the foresight about policies regarding their potential in bringing down the negative parameter values to the desired zero level for accomplishing the SDGs through planetary health.

Submitted 24 May, 2021; v1 submitted 9 October, 2020; originally announced October 2020.

Comments: 11 pages

arXiv:2010.03677 [pdf, other]

Agent Based Computational Model Aided Approach to Improvise the Inequality-Adjusted Human Development Index (IHDI) for Greater Parity in Real Scenario Assessments

Authors: Pradipta Banerjee, Subhrabrata Choudhury

Abstract: To design, evaluate and tune policies for all-inclusive human development, the primary requisite is to assess the true state of affairs of the society. Statistical indices like GDP, Gini Coefficients have been developed to accomplish the evaluation of the socio-economic systems. They have remained prevalent in the conventional economic theories but little do they have in the offing regarding true well-being and development of humans. Human Development Index (HDI) and thereafter Inequality-adjusted Human Development Index (IHDI) has been the path changing composite-index having the focus on human development. However, even though its fundamental philosophy has an all-inclusive human development focus, the composite-indices appear to be unable to grasp the actual assessment in several scenarios. This happens due to the dynamic non-linearity of social-systems where superposition principle cannot be applied between all of its inputs and outputs of the system as the system's own attributes get altered upon each input. We would discuss the apparent shortcomings and probable refinement of the existing index using an agent based computational system model approach.

Submitted 7 October, 2020; originally announced October 2020.

Comments: 8 pages, 4 figures

arXiv:2009.11033 [pdf, other]

Reliable, Fair and Decentralized Marketplace for Content Sharing Using Blockchain

Authors: Prabal Banerjee, Chander Govindarajan, Praveen Jayachandran, Sushmita Ruj

Abstract: Content sharing platforms such as YouTube and Vimeo have promoted pay per view models for artists to monetize their content. Yet, artists remain at the mercy of centralized platforms that control content listing and advertisement, with little transparency and fairness in terms of number of views or revenue. On the other hand, consumers are distanced from the publishers and cannot authenticate originality of the content. In this paper, we develop a reliable and fair platform for content sharing without a central facilitator. The platform is built as a decentralized data storage layer to store and share content in a fault-tolerant manner, where the peers also participate in a blockchain network. The blockchain is used to manage content listings and as an auditable and fair marketplace transaction processor that automatically pays out the content creators and the storage facilitators using smart contracts. We demonstrate the system with the blockchain layer built on Hyperledger Fabric and the data layer built on Tahoe-LAFS, and show that our design is practical and scalable with low overheads.

Submitted 23 September, 2020; originally announced September 2020.

arXiv:2009.08566 [pdf, other]

MUTANT: A Training Paradigm for Out-of-Distribution Generalization in Visual Question Answering

Authors: Tejas Gokhale, Pratyay Banerjee, Chitta Baral, Yezhou Yang

Abstract: While progress has been made on the visual question answering leaderboards, models often utilize spurious correlations and priors in datasets under the i.i.d. setting. As such, evaluation on out-of-distribution (OOD) test samples has emerged as a proxy for generalization. In this paper, we present MUTANT, a training paradigm that exposes the model to perceptually similar, yet semantically distinct mutations of the input, to improve OOD generalization, such as the VQA-CP challenge. Under this paradigm, models utilize a consistency-constrained training objective to understand the effect of semantic changes in input (question-image pair) on the output (answer). Unlike existing methods on VQA-CP, MUTANT does not rely on the knowledge about the nature of train and test answer distributions. MUTANT establishes a new state-of-the-art accuracy on VQA-CP with a $10.57\%$ improvement. Our work opens up avenues for the use of semantic input mutations for OOD generalization in question answering.

Submitted 15 October, 2020; v1 submitted 17 September, 2020; originally announced September 2020.

Comments: Accepted to EMNLP 2020, Long Papers

arXiv:2005.09327 [pdf, other]

Griefing-Penalty: Countermeasure for Griefing Attack in Lightning Network

Authors: Subhra Mazumdar, Prabal Banerjee, Sushmita Ruj

Abstract: Lightning Network can execute an unlimited number of off-chain payments, without incurring the cost of recording each of them in the blockchain. However, conditional payments in such networks are susceptible to Griefing Attack. In this attack, an adversary doesn't resolve the payment with the intention of blocking channel capacity of the network. We propose an efficient countermeasure for the attack, known as Griefing-Penalty. If any party in the network mounts a griefing attack, it needs to pay a penalty proportional to the collateral cost of executing a payment. The penalty is used for compensating affected parties in the network. We propose a new payment protocol HTLC-GP or Hashed Timelock Contract with Griefing-Penalty to demonstrate the utility of the countermeasure. Upon comparing our protocol with existing payment protocol Hashed Timelock Contract, we observe that the average revenue earned by the attacker decreases substantially for HTLC-GP as compared to HTLC. We also study the impact of path length for routing a transaction and rate of griefing-penalty on the budget invested by an adversary for mounting the attack. The budget needed for mounting griefing attack in HTLC-GP is 12 times more than the budget needed by attacker in HTLC, given that each payment instance is routed via a path of hop count 20.

Submitted 16 June, 2021; v1 submitted 19 May, 2020; originally announced May 2020.

Comments: 29 pages, 20 figures, 2 table, A preliminary version of the paper was accepted in the proceedings of The 19th IEEE International Conference on Trust, Security and Privacy in Computing and Communications (IEEE TrustCom 2020), DOI Bookmark: 10.1109/TrustCom50675.2020.00138

arXiv:2005.04612 [pdf, other]

A machine learning based heuristic to predict the efficacy of online sale

Authors: Aditya Vikram Singhania, Saronyo Lal Mukherjee, Ritajit Majumdar, Akash Mehta, Priyanka Banerjee, Debasmita Bhoumik

Abstract: It is difficult to decide upon the efficacy of an online sale simply from the discount offered on commodities. Different features have different influence on the price of a product which must be taken into consideration when determining the significance of a discount. In this paper we have proposed a machine learning based heuristic to quantify the \textit{"significance"} of the discount offered on any commodity. Our proposed technique can quantify the significance of the discount based on features and the original price, and hence can guide a buyer during a sale season by predicting the efficacy of the sale. We have applied this technique on the Flipkart Summer Sale dataset using Support Vector Machine, which predicts the efficacy of the sale with an accuracy of 91.11\%. Our result shows that very few mobile phones have a significant discount during the Flipkart Summer Sale.

Submitted 10 May, 2020; originally announced May 2020.

Comments: Paper selected for Oral presentation at the 2nd International Conference on Emerging Technologies in Data Mining and Information Security (IEMIS 2020). Will appear in Springer Advances in Intelligent Systems and Computing (AISC) Series

arXiv:2005.00316 [pdf, other]

Self-supervised Knowledge Triplet Learning for Zero-shot Question Answering

Authors: Pratyay Banerjee, Chitta Baral

Abstract: The aim of all Question Answering (QA) systems is to be able to generalize to unseen questions. Current supervised methods are reliant on expensive data annotation. Moreover, such annotations can introduce unintended annotator bias which makes systems focus more on the bias than the actual task. In this work, we propose Knowledge Triplet Learning (KTL), a self-supervised task over knowledge graphs. We propose heuristics to create synthetic graphs for commonsense and scientific knowledge. We propose methods of how to use KTL to perform zero-shot QA and our experiments show considerable improvements over large pre-trained transformer models.

Submitted 17 September, 2020; v1 submitted 1 May, 2020; originally announced May 2020.

Comments: Accepted to EMNLP 2020 Long Papers

arXiv:2004.03101 [pdf, other]

Knowledge Fusion and Semantic Knowledge Ranking for Open Domain Question Answering

Authors: Pratyay Banerjee, Chitta Baral

Abstract: Open Domain Question Answering requires systems to retrieve external knowledge and perform multi-hop reasoning by composing knowledge spread over multiple sentences. In the recently introduced open domain question answering challenge datasets, QASC and OpenBookQA, we need to perform retrieval of facts and compose facts to correctly answer questions. In our work, we learn a semantic knowledge ranking model to re-rank knowledge retrieved through Lucene based information retrieval systems. We further propose a "knowledge fusion model" which leverages knowledge in BERT-based language models with externally retrieved knowledge and improves the knowledge understanding of the BERT-based language models. On both OpenBookQA and QASC datasets, the knowledge fusion model with semantically re-ranked knowledge outperforms previous attempts.

Submitted 17 April, 2020; v1 submitted 6 April, 2020; originally announced April 2020.

Comments: 9 pages. 4 figures, 4 tables

arXiv:2003.05162 [pdf, other]

Video2Commonsense: Generating Commonsense Descriptions to Enrich Video Captioning

Authors: Zhiyuan Fang, Tejas Gokhale, Pratyay Banerjee, Chitta Baral, Yezhou Yang

Abstract: Captioning is a crucial and challenging task for video understanding. In videos that involve active agents such as humans, the agent's actions can bring about myriad changes in the scene. Observable changes such as movements, manipulations, and transformations of the objects in the scene, are reflected in conventional video captioning. Unlike images, actions in videos are also inherently linked to social aspects such as intentions (why the action is taking place), effects (what changes due to the action), and attributes that describe the agent. Thus for video understanding, such as when captioning videos or when answering questions about videos, one must have an understanding of these commonsense aspects. We present the first work on generating commonsense captions directly from videos, to describe latent aspects such as intentions, effects, and attributes. We present a new dataset "Video-to-Commonsense (V2C)" that contains $\sim9k$ videos of human agents performing various actions, annotated with 3 types of commonsense descriptions. Additionally we explore the use of open-ended video-based commonsense question answering (V2C-QA) as a way to enrich our captions. Both the generation task and the QA task can be used to enrich video captions.

Submitted 7 January, 2023; v1 submitted 11 March, 2020; originally announced March 2020.

Comments: EMNLP 2020. V2C Website: https://asu-apg.github.io/Video2Commonsense/

arXiv:2003.03446 [pdf, other]

Natural Language QA Approaches using Reasoning with External Knowledge

Authors: Chitta Baral, Pratyay Banerjee, Kuntal Kumar Pal, Arindam Mitra

Abstract: Question answering (QA) in natural language (NL) has been an important aspect of AI from its early days. Winograd's "councilmen" example in his 1972 paper and McCarthy's Mr. Hug example of 1976 highlight the role of external knowledge in NL understanding. While Machine Learning has been the go-to approach in NL processing as well as NL question answering (NLQA) for the last 30 years, recently there has been an increasingly emphasized thread on NLQA where external knowledge plays an important role. The challenges inspired by Winograd's councilmen example, and recent developments such as the Rebooting AI book, various NLQA datasets, research on knowledge acquisition in the NLQA context, and their use in various NLQA models have brought the issue of NLQA using "reasoning" with external knowledge to the forefront. In this paper, we present a survey of the recent work on them. We believe our survey will help establish a bridge between multiple fields of AI, especially between (a) the traditional fields of knowledge representation and reasoning and (b) the field of NL understanding and NLQA.

Submitted 6 March, 2020; originally announced March 2020.

Comments: 6 pages, 3 figures, Work in Progress

arXiv:2002.08325 [pdf, other]

VQA-LOL: Visual Question Answering under the Lens of Logic

Authors: Tejas Gokhale, Pratyay Banerjee, Chitta Baral, Yezhou Yang

Abstract: Logical connectives and their implications on the meaning of a natural language sentence are a fundamental aspect of understanding. In this paper, we investigate whether visual question answering (VQA) systems trained to answer a question about an image, are able to answer the logical composition of multiple such questions. When put under this \textit{Lens of Logic}, state-of-the-art VQA models have difficulty in correctly answering these logically composed questions. We construct an augmentation of the VQA dataset as a benchmark, with questions containing logical compositions and linguistic transformations (negation, disjunction, conjunction, and antonyms). We propose our {Lens of Logic (LOL)} model which uses question-attention and logic-attention to understand logical connectives in the question, and a novel Fréchet-Compatibility Loss, which ensures that the answers of the component questions and the composed question are consistent with the inferred logical operation. Our model shows substantial improvement in learning logical compositions while retaining performance on VQA. We suggest this work as a move towards robustness by embedding logical connectives in visual understanding.

Submitted 15 July, 2020; v1 submitted 19 February, 2020; originally announced February 2020.

Comments: Accepted to ECCV 2020

Showing 1–50 of 80 results for author: Banerjee, P