-
Cage: Hardware-Accelerated Safe WebAssembly
Authors:
Martin Fink,
Dimitrios Stavrakakis,
Dennis Sprokholt,
Soham Chakraborty,
Jan-Erik Ekberg,
Pramod Bhatotia
Abstract:
WebAssembly (WASM) is an immensely versatile and increasingly popular compilation target. It executes applications written in several languages (e.g., C/C++) with near-native performance in various domains (e.g., mobile, edge, cloud). Despite WASM's sandboxing feature, which isolates applications from other instances and the host platform, WASM does not inherently provide any memory safety guarant…
▽ More
WebAssembly (WASM) is an immensely versatile and increasingly popular compilation target. It executes applications written in several languages (e.g., C/C++) with near-native performance in various domains (e.g., mobile, edge, cloud). Despite WASM's sandboxing feature, which isolates applications from other instances and the host platform, WASM does not inherently provide any memory safety guarantees for applications written in low-level, unsafe languages.
To this end, we propose Cage, a hardware-accelerated toolchain for WASM that supports unmodified applications compiled to WASM and utilizes diverse Arm hardware features aiming to enrich the memory safety properties of WASM. Precisely, Cage leverages Arm's Memory Tagging Extension (MTE) to (i)~provide spatial and temporal memory safety for heap and stack allocations and (ii)~improve the performance of WASM's sandboxing mechanism. Cage further employs Arm's Pointer Authentication (PAC) to prevent leaked pointers from being reused by other WASM instances, thus enhancing WASM's security properties.
We implement our system based on 64-bit WASM. We provide a WASM compiler and runtime with support for Arm's MTE and PAC. On top of that, Cage's LLVM-based compiler toolchain transforms unmodified applications to provide spatial and temporal memory safety for stack and heap allocations and prevent function pointer reuse. Our evaluation on real hardware shows that Cage incurs minimal runtime ($<5.8\,\%$) and memory ($<3.7\,\%$) overheads and can improve the performance of WASM's sandboxing mechanism, achieving a speedup of over $5.1\,\%$, while offering efficient memory safety guarantees.
△ Less
Submitted 21 August, 2024;
originally announced August 2024.
-
Quantum Inverse Contextual Vision Transformers (Q-ICVT): A New Frontier in 3D Object Detection for AVs
Authors:
Sanjay Bhargav Dharavath,
Tanmoy Dam,
Supriyo Chakraborty,
Prithwiraj Roy,
Aniruddha Maiti
Abstract:
The field of autonomous vehicles (AVs) predominantly leverages multi-modal integration of LiDAR and camera data to achieve better performance compared to using a single modality. However, the fusion process encounters challenges in detecting distant objects due to the disparity between the high resolution of cameras and the sparse data from LiDAR. Insufficient integration of global perspectives wi…
▽ More
The field of autonomous vehicles (AVs) predominantly leverages multi-modal integration of LiDAR and camera data to achieve better performance compared to using a single modality. However, the fusion process encounters challenges in detecting distant objects due to the disparity between the high resolution of cameras and the sparse data from LiDAR. Insufficient integration of global perspectives with local-level details results in sub-optimal fusion performance.To address this issue, we have developed an innovative two-stage fusion process called Quantum Inverse Contextual Vision Transformers (Q-ICVT). This approach leverages adiabatic computing in quantum concepts to create a novel reversible vision transformer known as the Global Adiabatic Transformer (GAT). GAT aggregates sparse LiDAR features with semantic features in dense images for cross-modal integration in a global form. Additionally, the Sparse Expert of Local Fusion (SELF) module maps the sparse LiDAR 3D proposals and encodes position information of the raw point cloud onto the dense camera feature space using a gating point fusion approach. Our experiments show that Q-ICVT achieves an mAPH of 82.54 for L2 difficulties on the Waymo dataset, improving by 1.88% over current state-of-the-art fusion methods. We also analyze GAT and SELF in ablation studies to highlight the impact of Q-ICVT. Our code is available at https://github.com/sanjay-810/Qicvt Q-ICVT
△ Less
Submitted 20 August, 2024;
originally announced August 2024.
-
Exploiting Air Quality Monitors to Perform Indoor Surveillance: Academic Setting
Authors:
Prasenjit Karmakar,
Swadhin Pradhan,
Sandip Chakraborty
Abstract:
Changing public perceptions and government regulations have led to the widespread use of low-cost air quality monitors in modern indoor spaces. Typically, these monitors detect air pollutants to augment the end user's understanding of her indoor environment. Studies have shown that having access to one's air quality context reinforces the user's urge to take necessary actions to improve the air ov…
▽ More
Changing public perceptions and government regulations have led to the widespread use of low-cost air quality monitors in modern indoor spaces. Typically, these monitors detect air pollutants to augment the end user's understanding of her indoor environment. Studies have shown that having access to one's air quality context reinforces the user's urge to take necessary actions to improve the air over time. Thus, user's activities significantly influence the indoor air quality. Such correlation can be exploited to get hold of sensitive indoor activities from the side-channel air quality fluctuations. This study explores the odds of identifying eight indoor activities (i.e., enter, exit, fan on, fan off, AC on, AC off, gathering, eating) in a research lab with an in-house low-cost air quality monitoring platform named DALTON. Our extensive data collection and analysis over three months shows 97.7% classification accuracy in our dataset.
△ Less
Submitted 11 August, 2024;
originally announced August 2024.
-
Evaluating Large Language Models for Automatic Register Transfer Logic Generation via High-Level Synthesis
Authors:
Sneha Swaroopa,
Rijoy Mukherjee,
Anushka Debnath,
Rajat Subhra Chakraborty
Abstract:
The ever-growing popularity of large language models (LLMs) has resulted in their increasing adoption for hardware design and verification. Prior research has attempted to assess the capability of LLMs to automate digital hardware design by producing superior-quality Register Transfer Logic (RTL) descriptions, particularly in Verilog. However, these tests have revealed that Verilog code production…
▽ More
The ever-growing popularity of large language models (LLMs) has resulted in their increasing adoption for hardware design and verification. Prior research has attempted to assess the capability of LLMs to automate digital hardware design by producing superior-quality Register Transfer Logic (RTL) descriptions, particularly in Verilog. However, these tests have revealed that Verilog code production using LLMs at current state-of-the-art lack sufficient functional correctness to be practically viable, compared to automatic generation of programs in general-purpose programming languages such as C, C++, Python, etc. With this as the key insight, in this paper we assess the performance of a two-stage software pipeline for automated Verilog RTL generation: LLM based automatic generation of annotated C++ code suitable for high-level synthesis (HLS), followed by HLS to generate Verilog RTL. We have benchmarked the performance of our proposed scheme using the open-source VerilogEval dataset, for four different industry-scale LLMs, and the Vitis HLS tool. Our experimental results demonstrate that our two-step technique substantially outperforms previous proposed techniques of direct Verilog RTL generation by LLMs in terms of average functional correctness rates, reaching score of 0.86 in pass@1 metric.
△ Less
Submitted 5 August, 2024;
originally announced August 2024.
-
A Dataset for Multi-intensity Continuous Human Activity Recognition through Passive Sensing
Authors:
Argha Sen,
Anirban Das,
Swadhin Pradhan,
Sandip Chakraborty
Abstract:
Human activity recognition (HAR) is essential in healthcare, elder care, security, and human-computer interaction. The use of precise sensor data to identify activities passively and continuously makes HAR accessible and ubiquitous. Specifically, millimeter wave (mmWave) radar is promising for passive and continuous HAR due to its ability to penetrate non-metallic materials and provide high-resolu…
▽ More
Human activity recognition (HAR) is essential in healthcare, elder care, security, and human-computer interaction. The use of precise sensor data to identify activities passively and continuously makes HAR accessible and ubiquitous. Specifically, millimeter wave (mmWave) radar is promising for passive and continuous HAR due to its ability to penetrate non-metallic materials and provide high-resolution wireless sensing. Although mmWave sensors are effective at capturing macro-scale activities, like exercising, they fail to capture micro-scale activities, such as typing. In this paper, we introduce mmDoppler, a novel dataset that utilizes off-the-shelf (COTS) mmWave radar in order to capture both macro and micro-scale human movements using a machine-learning driven signal processing pipeline. The dataset includes seven subjects performing 19 distinct activities and employs adaptive doppler resolution to enhance activity recognition. By adjusting the radar's doppler resolution based on the activity type, our system captures subtle movements more precisely. mmDoppler includes range-doppler heatmaps, offering detailed motion dynamics, with data collected in a controlled environment with single as well as multiple subjects performing activities simultaneously. The dataset aims to bridge the gap in HAR systems by providing a more comprehensive and detailed resource for improving the robustness and accuracy of mmWave radar activity recognition.
△ Less
Submitted 30 July, 2024;
originally announced July 2024.
-
Engineering an Efficient Approximate DNF-Counter
Authors:
Mate Soos,
Uddalok Sarkar,
Divesh Aggarwal,
Sourav Chakraborty,
Kuldeep S. Meel,
Maciej Obremski
Abstract:
Model counting is a fundamental problem in many practical applications, including query evaluation in probabilistic databases and failure-probability estimation of networks. In this work, we focus on a variant of this problem where the underlying formula is expressed in the Disjunctive Normal Form (DNF), also known as #DNF. This problem has been shown to be #P-complete, making it often intractable…
▽ More
Model counting is a fundamental problem in many practical applications, including query evaluation in probabilistic databases and failure-probability estimation of networks. In this work, we focus on a variant of this problem where the underlying formula is expressed in the Disjunctive Normal Form (DNF), also known as #DNF. This problem has been shown to be #P-complete, making it often intractable to solve exactly. Much research has therefore focused on obtaining approximate solutions, particularly in the form of $(\varepsilon, δ)$ approximations.
The primary contribution of this paper is a new approach, called pepin, an approximate #DNF counter that significantly outperforms prior state-of-the-art approaches. Our work is based on the recent breakthrough in the context of the union of sets in the streaming model. We demonstrate the effectiveness of our approach through extensive experiments and show that it provides an affirmative answer to the challenge of efficiently computing #DNF.
△ Less
Submitted 29 July, 2024;
originally announced July 2024.
-
FedAR: Addressing Client Unavailability in Federated Learning with Local Update Approximation and Rectification
Authors:
Chutian Jiang,
Hansong Zhou,
Xiaonan Zhang,
Shayok Chakraborty
Abstract:
Federated learning (FL) enables clients to collaboratively train machine learning models under the coordination of a server in a privacy-preserving manner. One of the main challenges in FL is that the server may not receive local updates from each client in each round due to client resource limitations and intermittent network connectivity. The existence of unavailable clients severely deteriorate…
▽ More
Federated learning (FL) enables clients to collaboratively train machine learning models under the coordination of a server in a privacy-preserving manner. One of the main challenges in FL is that the server may not receive local updates from each client in each round due to client resource limitations and intermittent network connectivity. The existence of unavailable clients severely deteriorates the overall FL performance. In this paper, we propose , a novel client update Approximation and Rectification algorithm for FL to address the client unavailability issue. FedAR can get all clients involved in the global model update to achieve a high-quality global model on the server, which also furnishes accurate predictions for each client. To this end, the server uses the latest update from each client as a surrogate for its current update. It then assigns a different weight to each client's surrogate update to derive the global model, in order to guarantee contributions from both available and unavailable clients. Our theoretical analysis proves that FedAR achieves optimal convergence rates on non-IID datasets for both convex and non-convex smooth loss functions. Extensive empirical studies show that FedAR comprehensively outperforms state-of-the-art FL baselines including FedAvg, MIFA, FedVARP and Scaffold in terms of the training loss, test accuracy, and bias mitigation. Moreover, FedAR also depicts impressive performance in the presence of a large number of clients with severe client unavailability.
△ Less
Submitted 26 July, 2024;
originally announced July 2024.
-
A Flexible and Scalable Approach for Collecting Wildlife Advertisements on the Web
Authors:
Juliana Barbosa,
Sunandan Chakraborty,
Juliana Freire
Abstract:
Wildlife traffickers are increasingly carrying out their activities in cyberspace. As they advertise and sell wildlife products in online marketplaces, they leave digital traces of their activity. This creates a new opportunity: by analyzing these traces, we can obtain insights into how trafficking networks work as well as how they can be disrupted. However, collecting such information is difficul…
▽ More
Wildlife traffickers are increasingly carrying out their activities in cyberspace. As they advertise and sell wildlife products in online marketplaces, they leave digital traces of their activity. This creates a new opportunity: by analyzing these traces, we can obtain insights into how trafficking networks work as well as how they can be disrupted. However, collecting such information is difficult. Online marketplaces sell a very large number of products and identifying ads that actually involve wildlife is a complex task that is hard to automate. Furthermore, given that the volume of data is staggering, we need scalable mechanisms to acquire, filter, and store the ads, as well as to make them available for analysis. In this paper, we present a new approach to collect wildlife trafficking data at scale. We propose a data collection pipeline that combines scoped crawlers for data discovery and acquisition with foundational models and machine learning classifiers to identify relevant ads. We describe a dataset we created using this pipeline which is, to the best of our knowledge, the largest of its kind: it contains almost a million ads obtained from 41 marketplaces, covering 235 species and 20 languages. The source code is publicly available at \url{https://github.com/VIDA-NYU/wildlife_pipeline}.
△ Less
Submitted 26 July, 2024;
originally announced July 2024.
-
Empowering the Quantum Cloud User with QRIO
Authors:
Shmeelok Chakraborty,
Yuewen Hou,
Ang Chen,
Gokul Subramanian Ravi
Abstract:
Quantum computing is moving swiftly from theoretical to practical applications, making it crucial to establish a significant quantum advantage. Despite substantial investments, access to quantum devices is still limited, with users facing issues like long wait times and inefficient resource management. Unlike the mature cloud solutions for classical computing, quantum computing lacks effective inf…
▽ More
Quantum computing is moving swiftly from theoretical to practical applications, making it crucial to establish a significant quantum advantage. Despite substantial investments, access to quantum devices is still limited, with users facing issues like long wait times and inefficient resource management. Unlike the mature cloud solutions for classical computing, quantum computing lacks effective infrastructure for resource optimization. We propose a Quantum Resource Infrastructure Orchestrator (QRIO), a state-of-the-art cloud resource manager built on Kubernetes that is tailored to quantum computing. QRIO seeks to democratize access to quantum devices by providing customizable, user-friendly, open-source resource management. QRIO's design aims to ensure equitable access, optimize resource utilization, and support diverse applications, thereby speeding up innovation and making quantum computing more accessible and efficient to a broader user base. In this paper, we discuss QRIO's various features and evaluate its capability in several representative usecases.
△ Less
Submitted 25 July, 2024; v1 submitted 24 July, 2024;
originally announced July 2024.
-
Can Watermarking Large Language Models Prevent Copyrighted Text Generation and Hide Training Data?
Authors:
Michael-Andrei Panaitescu-Liess,
Zora Che,
Bang An,
Yuancheng Xu,
Pankayaraj Pathmanathan,
Souradip Chakraborty,
Sicheng Zhu,
Tom Goldstein,
Furong Huang
Abstract:
Large Language Models (LLMs) have demonstrated impressive capabilities in generating diverse and contextually rich text. However, concerns regarding copyright infringement arise as LLMs may inadvertently produce copyrighted material. In this paper, we first investigate the effectiveness of watermarking LLMs as a deterrent against the generation of copyrighted texts. Through theoretical analysis an…
▽ More
Large Language Models (LLMs) have demonstrated impressive capabilities in generating diverse and contextually rich text. However, concerns regarding copyright infringement arise as LLMs may inadvertently produce copyrighted material. In this paper, we first investigate the effectiveness of watermarking LLMs as a deterrent against the generation of copyrighted texts. Through theoretical analysis and empirical evaluation, we demonstrate that incorporating watermarks into LLMs significantly reduces the likelihood of generating copyrighted content, thereby addressing a critical concern in the deployment of LLMs. Additionally, we explore the impact of watermarking on Membership Inference Attacks (MIAs), which aim to discern whether a sample was part of the pretraining dataset and may be used to detect copyright violations. Surprisingly, we find that watermarking adversely affects the success rate of MIAs, complicating the task of detecting copyrighted text in the pretraining dataset. Finally, we propose an adaptive technique to improve the success rate of a recent MIA under watermarking. Our findings underscore the importance of developing adaptive methods to study critical problems in LLMs with potential legal implications.
△ Less
Submitted 24 July, 2024;
originally announced July 2024.
-
Indoor Air Quality Dataset with Activities of Daily Living in Low to Middle-income Communities
Authors:
Prasenjit Karmakar,
Swadhin Pradhan,
Sandip Chakraborty
Abstract:
In recent years, indoor air pollution has posed a significant threat to our society, claiming over 3.2 million lives annually. Developing nations, such as India, are most affected since lack of knowledge, inadequate regulation, and outdoor air pollution lead to severe daily exposure to pollutants. However, only a limited number of studies have attempted to understand how indoor air pollution affec…
▽ More
In recent years, indoor air pollution has posed a significant threat to our society, claiming over 3.2 million lives annually. Developing nations, such as India, are most affected since lack of knowledge, inadequate regulation, and outdoor air pollution lead to severe daily exposure to pollutants. However, only a limited number of studies have attempted to understand how indoor air pollution affects developing countries like India. To address this gap, we present spatiotemporal measurements of air quality from 30 indoor sites over six months during summer and winter seasons. The sites are geographically located across four regions of type: rural, suburban, and urban, covering the typical low to middle-income population in India. The dataset contains various types of indoor environments (e.g., studio apartments, classrooms, research laboratories, food canteens, and residential households), and can provide the basis for data-driven learning model research aimed at coping with unique pollution patterns in developing countries. This unique dataset demands advanced data cleaning and imputation techniques for handling missing data due to power failure or network outages during data collection. Furthermore, through a simple speech-to-text application, we provide real-time indoor activity labels annotated by occupants. Therefore, environmentalists and ML enthusiasts can utilize this dataset to understand the complex patterns of the pollutants under different indoor activities, identify recurring sources of pollution, forecast exposure, improve floor plans and room structures of modern indoor designs, develop pollution-aware recommender systems, etc.
△ Less
Submitted 17 August, 2024; v1 submitted 19 July, 2024;
originally announced July 2024.
-
Exploring Indoor Air Quality Dynamics in Developing Nations: A Perspective from India
Authors:
Prasenjit Karmakar,
Swadhin Pradhan,
Sandip Chakraborty
Abstract:
Indoor air pollution is a major issue in developing countries such as India and Bangladesh, exacerbated by factors like traditional cooking methods, insufficient ventilation, and cramped living conditions, all of which elevate the risk of health issues like lung infections and cardiovascular diseases. With the World Health Organization associating around 3.2 million annual deaths globally to house…
▽ More
Indoor air pollution is a major issue in developing countries such as India and Bangladesh, exacerbated by factors like traditional cooking methods, insufficient ventilation, and cramped living conditions, all of which elevate the risk of health issues like lung infections and cardiovascular diseases. With the World Health Organization associating around 3.2 million annual deaths globally to household air pollution, the gravity of the problem is clear. Yet, extensive empirical studies exploring these unique patterns and indoor pollutions extent are missing. To fill this gap, we carried out a six months long field study involving over 30 households, uncovering the complexity of indoor air pollution in developing countries, such as the longer lingering time of VOCs in the air or the significant influence of air circulation on the spatiotemporal distribution of pollutants. We introduced an innovative IoT air quality sensing platform, the Distributed Air QuaLiTy MONitor (DALTON ), explicitly designed to meet the needs of these nations, considering factors like cost, sensor type, accuracy, network connectivity, power, and usability. As a result of a multi-device deployment, the platform identifies pollution hot-spots in low and middle-income households in developing nations. It identifies best practices to minimize daily indoor pollution exposure. Our extensive qualitative survey estimates an overall system usability score of 2.04, indicating an efficient system for air quality monitoring.
△ Less
Submitted 19 July, 2024;
originally announced July 2024.
-
Discovering governing equation in structural dynamics from acceleration-only measurements
Authors:
Calvin Alvares,
Souvik Chakraborty
Abstract:
Over the past few years, equation discovery has gained popularity in different fields of science and engineering. However, existing equation discovery algorithms rely on the availability of noisy measurements of the state variables (i.e., displacement {and velocity}). This is a major bottleneck in structural dynamics, where we often only have access to acceleration measurements. To that end, this…
▽ More
Over the past few years, equation discovery has gained popularity in different fields of science and engineering. However, existing equation discovery algorithms rely on the availability of noisy measurements of the state variables (i.e., displacement {and velocity}). This is a major bottleneck in structural dynamics, where we often only have access to acceleration measurements. To that end, this paper introduces a novel equation discovery algorithm for discovering governing equations of dynamical systems from acceleration-only measurements. The proposed algorithm employs a library-based approach for equation discovery. To enable equation discovery from acceleration-only measurements, we propose a novel Approximate Bayesian Computation (ABC) model that prioritizes parsimonious models. The efficacy of the proposed algorithm is illustrated using {four} structural dynamics examples that include both linear and nonlinear dynamical systems. The case studies presented illustrate the possible application of the proposed approach for equation discovery of dynamical systems from acceleration-only measurements.
△ Less
Submitted 18 July, 2024;
originally announced July 2024.
-
Enhancing Split Computing and Early Exit Applications through Predefined Sparsity
Authors:
Luigi Capogrosso,
Enrico Fraccaroli,
Giulio Petrozziello,
Francesco Setti,
Samarjit Chakraborty,
Franco Fummi,
Marco Cristani
Abstract:
In the past decade, Deep Neural Networks (DNNs) achieved state-of-the-art performance in a broad range of problems, spanning from object classification and action recognition to smart building and healthcare. The flexibility that makes DNNs such a pervasive technology comes at a price: the computational requirements preclude their deployment on most of the resource-constrained edge devices availab…
▽ More
In the past decade, Deep Neural Networks (DNNs) achieved state-of-the-art performance in a broad range of problems, spanning from object classification and action recognition to smart building and healthcare. The flexibility that makes DNNs such a pervasive technology comes at a price: the computational requirements preclude their deployment on most of the resource-constrained edge devices available today to solve real-time and real-world tasks. This paper introduces a novel approach to address this challenge by combining the concept of predefined sparsity with Split Computing (SC) and Early Exit (EE). In particular, SC aims at splitting a DNN with a part of it deployed on an edge device and the rest on a remote server. Instead, EE allows the system to stop using the remote server and rely solely on the edge device's computation if the answer is already good enough. Specifically, how to apply such a predefined sparsity to a SC and EE paradigm has never been studied. This paper studies this problem and shows how predefined sparsity significantly reduces the computational, storage, and energy burdens during the training and inference phases, regardless of the hardware platform. This makes it a valuable approach for enhancing the performance of SC and EE applications. Experimental results showcase reductions exceeding 4x in storage and computational complexity without compromising performance. The source code is available at https://github.com/intelligolabs/sparsity_sc_ee.
△ Less
Submitted 16 July, 2024;
originally announced July 2024.
-
Approximate Degree Composition for Recursive Functions
Authors:
Sourav Chakraborty,
Chandrima Kayal,
Rajat Mittal,
Manaswi Paraashar,
Nitin Saurabh
Abstract:
Determining the approximate degree composition for Boolean functions remains a significant unsolved problem in Boolean function complexity. In recent decades, researchers have concentrated on proving that approximate degree composes for special types of inner and outer functions. An important and extensively studied class of functions are the recursive functions, i.e.~functions obtained by composi…
▽ More
Determining the approximate degree composition for Boolean functions remains a significant unsolved problem in Boolean function complexity. In recent decades, researchers have concentrated on proving that approximate degree composes for special types of inner and outer functions. An important and extensively studied class of functions are the recursive functions, i.e.~functions obtained by composing a base function with itself a number of times. Let $h^d$ denote the standard $d$-fold composition of the base function $h$.
The main result of this work is to show that the approximate degree composes if either of the following conditions holds:
\begin{itemize}
\item The outer function $f:\{0,1\}^n\to \{0,1\}$ is a recursive function of the form $h^d$, with $h$ being any base function and $d= Ω(\log\log n)$.
\item The inner function is a recursive function of the form $h^d$, with $h$ being any constant arity base function (other than AND and OR) and $d= Ω(\log\log n)$, where $n$ is the arity of the outer function.
\end{itemize}
In terms of proof techniques, we first observe that the lower bound for composition can be obtained by introducing majority in between the inner and the outer functions. We then show that majority can be \emph{efficiently eliminated} if the inner or outer function is a recursive function.
△ Less
Submitted 11 July, 2024;
originally announced July 2024.
-
MTL-Split: Multi-Task Learning for Edge Devices using Split Computing
Authors:
Luigi Capogrosso,
Enrico Fraccaroli,
Samarjit Chakraborty,
Franco Fummi,
Marco Cristani
Abstract:
Split Computing (SC), where a Deep Neural Network (DNN) is intelligently split with a part of it deployed on an edge device and the rest on a remote server is emerging as a promising approach. It allows the power of DNNs to be leveraged for latency-sensitive applications that do not allow the entire DNN to be deployed remotely, while not having sufficient computation bandwidth available locally. I…
▽ More
Split Computing (SC), where a Deep Neural Network (DNN) is intelligently split with a part of it deployed on an edge device and the rest on a remote server is emerging as a promising approach. It allows the power of DNNs to be leveraged for latency-sensitive applications that do not allow the entire DNN to be deployed remotely, while not having sufficient computation bandwidth available locally. In many such embedded systems scenarios, such as those in the automotive domain, computational resource constraints also necessitate Multi-Task Learning (MTL), where the same DNN is used for multiple inference tasks instead of having dedicated DNNs for each task, which would need more computing bandwidth. However, how to partition such a multi-tasking DNN to be deployed within a SC framework has not been sufficiently studied. This paper studies this problem, and MTL-Split, our novel proposed architecture, shows encouraging results on both synthetic and real-world data. The source code is available at https://github.com/intelligolabs/MTL-Split.
△ Less
Submitted 8 July, 2024;
originally announced July 2024.
-
Predicting Visual Attention in Graphic Design Documents
Authors:
Souradeep Chakraborty,
Zijun Wei,
Conor Kelton,
Seoyoung Ahn,
Aruna Balasubramanian,
Gregory J. Zelinsky,
Dimitris Samaras
Abstract:
We present a model for predicting visual attention during the free viewing of graphic design documents. While existing works on this topic have aimed at predicting static saliency of graphic designs, our work is the first attempt to predict both spatial attention and dynamic temporal order in which the document regions are fixated by gaze using a deep learning based model. We propose a two-stage m…
▽ More
We present a model for predicting visual attention during the free viewing of graphic design documents. While existing works on this topic have aimed at predicting static saliency of graphic designs, our work is the first attempt to predict both spatial attention and dynamic temporal order in which the document regions are fixated by gaze using a deep learning based model. We propose a two-stage model for predicting dynamic attention on such documents, with webpages being our primary choice of document design for demonstration. In the first stage, we predict the saliency maps for each of the document components (e.g. logos, banners, texts, etc. for webpages) conditioned on the type of document layout. These component saliency maps are then jointly used to predict the overall document saliency. In the second stage, we use these layout-specific component saliency maps as the state representation for an inverse reinforcement learning model of fixation scanpath prediction during document viewing. To test our model, we collected a new dataset consisting of eye movements from 41 people freely viewing 450 webpages (the largest dataset of its kind). Experimental results show that our model outperforms existing models in both saliency and scanpath prediction for webpages, and also generalizes very well to other graphic design documents such as comics, posters, mobile UIs, etc. and natural images.
△ Less
Submitted 2 July, 2024;
originally announced July 2024.
-
A Simple Representation of Tree Covering Utilizing Balanced Parentheses and Efficient Implementation of Average-Case Optimal RMQs
Authors:
Kou Hamada,
Sankardeep Chakraborty,
Seungbum Jo,
Takuto Koriyama,
Kunihiko Sadakane,
Srinivasa Rao Satti
Abstract:
Tree covering is a technique for decomposing a tree into smaller-sized trees with desirable properties, and has been employed in various succinct data structures. However, significant hurdles stand in the way of a practical implementation of tree covering: a lot of pointers are used to maintain the tree-covering hierarchy and many indices for tree navigational queries consume theoretically negligi…
▽ More
Tree covering is a technique for decomposing a tree into smaller-sized trees with desirable properties, and has been employed in various succinct data structures. However, significant hurdles stand in the way of a practical implementation of tree covering: a lot of pointers are used to maintain the tree-covering hierarchy and many indices for tree navigational queries consume theoretically negligible yet practically vast space. To tackle these problems, we propose a simple representation of tree covering using a balanced parenthesis representation. The key to the proposal is the observation that every micro tree splits into at most two intervals on the BP representation. Utilizing the representation, we propose several data structures that represent a tree and its tree cover, which consequently allow micro tree compression with arbitrary coding and efficient tree navigational queries. We also applied our data structure to average-case optimal RMQ by Munro et al.~[ESA 2021] and implemented the RMQ data structure. Our RMQ data structures spend less than $2n$ bits and process queries in a practical time on several settings of the performance evaluation, reducing the gap between theoretical space complexity and actual space consumption. We also implement tree navigational operations while using the same amount of space as the RMQ data structures. We believe the representation can be widely utilized for designing practically memory-efficient data structures based on tree covering.
△ Less
Submitted 7 August, 2024; v1 submitted 29 June, 2024;
originally announced July 2024.
-
On Fourier analysis of sparse Boolean functions over certain Abelian groups
Authors:
Sourav Chakraborty,
Swarnalipa Datta,
Pranjal Dutta,
Arijit Ghosh,
Swagato Sanyal
Abstract:
Given an Abelian group G, a Boolean-valued function f: G -> {-1,+1}, is said to be s-sparse, if it has at most s-many non-zero Fourier coefficients over the domain G. In a seminal paper, Gopalan et al. proved "Granularity" for Fourier coefficients of Boolean valued functions over Z_2^n, that have found many diverse applications in theoretical computer science and combinatorics. They also studied s…
▽ More
Given an Abelian group G, a Boolean-valued function f: G -> {-1,+1}, is said to be s-sparse, if it has at most s-many non-zero Fourier coefficients over the domain G. In a seminal paper, Gopalan et al. proved "Granularity" for Fourier coefficients of Boolean valued functions over Z_2^n, that have found many diverse applications in theoretical computer science and combinatorics. They also studied structural results for Boolean functions over Z_2^n which are approximately Fourier-sparse. In this work, we obtain structural results for approximately Fourier-sparse Boolean valued functions over Abelian groups G of the form,G:= Z_{p_1}^{n_1} \times ... \times Z_{p_t}^{n_t}, for distinct primes p_i. We also obtain a lower bound of the form 1/(m^{2}s)^ceiling(phi(m)/2), on the absolute value of the smallest non-zero Fourier coefficient of an s-sparse function, where m=p_1 ... p_t, and phi(m)=(p_1-1) ... (p_t-1). We carefully apply probabilistic techniques from Gopalan et al., to obtain our structural results, and use some non-trivial results from algebraic number theory to get the lower bound.
We construct a family of at most s-sparse Boolean functions over Z_p^n, where p > 2, for arbitrarily large enough s, where the minimum non-zero Fourier coefficient is 1/omega(n). The "Granularity" result of Gopalan et al. implies that the absolute values of non-zero Fourier coefficients of any s-sparse Boolean valued function over Z_2^n are 1/O(s). So, our result shows that one cannot expect such a lower bound for general Abelian groups.
Using our new structural results on the Fourier coefficients of sparse functions, we design an efficient testing algorithm for Fourier-sparse Boolean functions, thata requires poly((ms)^phi(m),1/epsilon)-many queries. Further, we prove an Omega(sqrt{s}) lower bound on the query complexity of any adaptive sparsity testing algorithm.
△ Less
Submitted 26 June, 2024;
originally announced June 2024.
-
SAIL: Self-Improving Efficient Online Alignment of Large Language Models
Authors:
Mucong Ding,
Souradip Chakraborty,
Vibhu Agrawal,
Zora Che,
Alec Koppel,
Mengdi Wang,
Amrit Bedi,
Furong Huang
Abstract:
Reinforcement Learning from Human Feedback (RLHF) is a key method for aligning large language models (LLMs) with human preferences. However, current offline alignment approaches like DPO, IPO, and SLiC rely heavily on fixed preference datasets, which can lead to sub-optimal performance. On the other hand, recent literature has focused on designing online RLHF methods but still lacks a unified conc…
▽ More
Reinforcement Learning from Human Feedback (RLHF) is a key method for aligning large language models (LLMs) with human preferences. However, current offline alignment approaches like DPO, IPO, and SLiC rely heavily on fixed preference datasets, which can lead to sub-optimal performance. On the other hand, recent literature has focused on designing online RLHF methods but still lacks a unified conceptual formulation and suffers from distribution shift issues. To address this, we establish that online LLM alignment is underpinned by bilevel optimization. By reducing this formulation to an efficient single-level first-order method (using the reward-policy equivalence), our approach generates new samples and iteratively refines model alignment by exploring responses and regulating preference labels. In doing so, we permit alignment methods to operate in an online and self-improving manner, as well as generalize prior online RLHF methods as special cases. Compared to state-of-the-art iterative RLHF methods, our approach significantly improves alignment performance on open-sourced datasets with minimal computational overhead.
△ Less
Submitted 21 June, 2024;
originally announced June 2024.
-
A Dual Attention-aided DenseNet-121 for Classification of Glaucoma from Fundus Images
Authors:
Soham Chakraborty,
Ayush Roy,
Payel Pramanik,
Daria Valenkova,
Ram Sarkar
Abstract:
Deep learning and computer vision methods are nowadays predominantly used in the field of ophthalmology. In this paper, we present an attention-aided DenseNet-121 for classifying normal and glaucomatous eyes from fundus images. It involves the convolutional block attention module to highlight relevant spatial and channel features extracted by DenseNet-121. The channel recalibration module further…
▽ More
Deep learning and computer vision methods are nowadays predominantly used in the field of ophthalmology. In this paper, we present an attention-aided DenseNet-121 for classifying normal and glaucomatous eyes from fundus images. It involves the convolutional block attention module to highlight relevant spatial and channel features extracted by DenseNet-121. The channel recalibration module further enriches the features by utilizing edge information along with the statistical features of the spatial dimension. For the experiments, two standard datasets, namely RIM-ONE and ACRIMA, have been used. Our method has shown superior results than state-of-the-art models. An ablation study has also been conducted to show the effectiveness of each of the components. The code of the proposed work is available at: https://github.com/Soham2004GitHub/DADGC.
△ Less
Submitted 21 June, 2024;
originally announced June 2024.
-
Is poisoning a real threat to LLM alignment? Maybe more so than you think
Authors:
Pankayaraj Pathmanathan,
Souradip Chakraborty,
Xiangyu Liu,
Yongyuan Liang,
Furong Huang
Abstract:
Recent advancements in Reinforcement Learning with Human Feedback (RLHF) have significantly impacted the alignment of Large Language Models (LLMs). The sensitivity of reinforcement learning algorithms such as Proximal Policy Optimization (PPO) has led to new line work on Direct Policy Optimization (DPO), which treats RLHF in a supervised learning framework. The increased practical use of these RLH…
▽ More
Recent advancements in Reinforcement Learning with Human Feedback (RLHF) have significantly impacted the alignment of Large Language Models (LLMs). The sensitivity of reinforcement learning algorithms such as Proximal Policy Optimization (PPO) has led to new line work on Direct Policy Optimization (DPO), which treats RLHF in a supervised learning framework. The increased practical use of these RLHF methods warrants an analysis of their vulnerabilities. In this work, we investigate the vulnerabilities of DPO to poisoning attacks under different scenarios and compare the effectiveness of preference poisoning, a first of its kind. We comprehensively analyze DPO's vulnerabilities under different types of attacks, i.e., backdoor and non-backdoor attacks, and different poisoning methods across a wide array of language models, i.e., LLama 7B, Mistral 7B, and Gemma 7B. We find that unlike PPO-based methods, which, when it comes to backdoor attacks, require at least 4\% of the data to be poisoned to elicit harmful behavior, we exploit the true vulnerabilities of DPO more simply so we can poison the model with only as much as 0.5\% of the data. We further investigate the potential reasons behind the vulnerability and how well this vulnerability translates into backdoor vs non-backdoor attacks.
△ Less
Submitted 19 June, 2024; v1 submitted 17 June, 2024;
originally announced June 2024.
-
DIPPER: Direct Preference Optimization to Accelerate Primitive-Enabled Hierarchical Reinforcement Learning
Authors:
Utsav Singh,
Souradip Chakraborty,
Wesley A. Suttle,
Brian M. Sadler,
Vinay P Namboodiri,
Amrit Singh Bedi
Abstract:
Learning control policies to perform complex robotics tasks from human preference data presents significant challenges. On the one hand, the complexity of such tasks typically requires learning policies to perform a variety of subtasks, then combining them to achieve the overall goal. At the same time, comprehensive, well-engineered reward functions are typically unavailable in such problems, whil…
▽ More
Learning control policies to perform complex robotics tasks from human preference data presents significant challenges. On the one hand, the complexity of such tasks typically requires learning policies to perform a variety of subtasks, then combining them to achieve the overall goal. At the same time, comprehensive, well-engineered reward functions are typically unavailable in such problems, while limited human preference data often is; making efficient use of such data to guide learning is therefore essential. Methods for learning to perform complex robotics tasks from human preference data must overcome both these challenges simultaneously. In this work, we introduce DIPPER: Direct Preference Optimization to Accelerate Primitive-Enabled Hierarchical Reinforcement Learning, an efficient hierarchical approach that leverages direct preference optimization to learn a higher-level policy and reinforcement learning to learn a lower-level policy. DIPPER enjoys improved computational efficiency due to its use of direct preference optimization instead of standard preference-based approaches such as reinforcement learning from human feedback, while it also mitigates the well-known hierarchical reinforcement learning issues of non-stationarity and infeasible subgoal generation due to our use of primitive-informed regularization inspired by a novel bi-level optimization formulation of the hierarchical reinforcement learning problem. To validate our approach, we perform extensive experimental analysis on a variety of challenging robotics tasks, demonstrating that DIPPER outperforms hierarchical and non-hierarchical baselines, while ameliorating the non-stationarity and infeasible subgoal generation issues of hierarchical reinforcement learning.
△ Less
Submitted 16 June, 2024;
originally announced June 2024.
-
Language Models are Crossword Solvers
Authors:
Soumadeep Saha,
Sutanoya Chakraborty,
Saptarshi Saha,
Utpal Garain
Abstract:
Crosswords are a form of word puzzle that require a solver to demonstrate a high degree of proficiency in natural language understanding, wordplay, reasoning, and world knowledge, along with adherence to character and length constraints. In this paper we tackle the challenge of solving crosswords with Large Language Models (LLMs). We demonstrate that the current generation of state-of-the art (SoT…
▽ More
Crosswords are a form of word puzzle that require a solver to demonstrate a high degree of proficiency in natural language understanding, wordplay, reasoning, and world knowledge, along with adherence to character and length constraints. In this paper we tackle the challenge of solving crosswords with Large Language Models (LLMs). We demonstrate that the current generation of state-of-the art (SoTA) language models show significant competence at deciphering cryptic crossword clues, and outperform previously reported SoTA results by a factor of 2-3 in relevant benchmarks. We also develop a search algorithm that builds off this performance to tackle the problem of solving full crossword grids with LLMs for the very first time, achieving an accuracy of 93\% on New York Times crossword puzzles. Contrary to previous work in this area which concluded that LLMs lag human expert performance significantly, our research suggests this gap is a lot narrower.
△ Less
Submitted 14 June, 2024; v1 submitted 13 June, 2024;
originally announced June 2024.
-
Neural-g: A Deep Learning Framework for Mixing Density Estimation
Authors:
Shijie Wang,
Saptarshi Chakraborty,
Qian Qin,
Ray Bai
Abstract:
Mixing (or prior) density estimation is an important problem in machine learning and statistics, especially in empirical Bayes $g$-modeling where accurately estimating the prior is necessary for making good posterior inferences. In this paper, we propose neural-$g$, a new neural network-based estimator for $g$-modeling. Neural-$g$ uses a softmax output layer to ensure that the estimated prior is a…
▽ More
Mixing (or prior) density estimation is an important problem in machine learning and statistics, especially in empirical Bayes $g$-modeling where accurately estimating the prior is necessary for making good posterior inferences. In this paper, we propose neural-$g$, a new neural network-based estimator for $g$-modeling. Neural-$g$ uses a softmax output layer to ensure that the estimated prior is a valid probability density. Under default hyperparameters, we show that neural-$g$ is very flexible and capable of capturing many unknown densities, including those with flat regions, heavy tails, and/or discontinuities. In contrast, existing methods struggle to capture all of these prior shapes. We provide justification for neural-$g$ by establishing a new universal approximation theorem regarding the capability of neural networks to learn arbitrary probability mass functions. To accelerate convergence of our numerical implementation, we utilize a weighted average gradient descent approach to update the network parameters. Finally, we extend neural-$g$ to multivariate prior density estimation. We illustrate the efficacy of our approach through simulations and analyses of real datasets. A software package to implement neural-$g$ is publicly available at https://github.com/shijiew97/neuralG.
△ Less
Submitted 9 June, 2024;
originally announced June 2024.
-
PhyPlan: Generalizable and Rapid Physical Task Planning with Physics Informed Skill Networks for Robot Manipulators
Authors:
Mudit Chopra,
Abhinav Barnawal,
Harshil Vagadia,
Tamajit Banerjee,
Shreshth Tuli,
Souvik Chakraborty,
Rohan Paul
Abstract:
Given the task of positioning a ball-like object to a goal region beyond direct reach, humans can often throw, slide, or rebound objects against the wall to attain the goal. However, enabling robots to reason similarly is non-trivial. Existing methods for physical reasoning are data-hungry and struggle with complexity and uncertainty inherent in the real world. This paper presents PhyPlan, a novel…
▽ More
Given the task of positioning a ball-like object to a goal region beyond direct reach, humans can often throw, slide, or rebound objects against the wall to attain the goal. However, enabling robots to reason similarly is non-trivial. Existing methods for physical reasoning are data-hungry and struggle with complexity and uncertainty inherent in the real world. This paper presents PhyPlan, a novel physics-informed planning framework that combines physics-informed neural networks (PINNs) with modified Monte Carlo Tree Search (MCTS) to enable embodied agents to perform dynamic physical tasks. PhyPlan leverages PINNs to simulate and predict outcomes of actions in a fast and accurate manner and uses MCTS for planning. It dynamically determines whether to consult a PINN-based simulator (coarse but fast) or engage directly with the actual environment (fine but slow) to determine optimal policy. Given an unseen task, PhyPlan can infer the sequence of actions and learn the latent parameters, resulting in a generalizable approach that can rapidly learn to perform novel physical tasks. Evaluation with robots in simulated 3D environments demonstrates the ability of our approach to solve 3D-physical reasoning tasks involving the composition of dynamic skills. Quantitatively, PhyPlan excels in several aspects: (i) it achieves lower regret when learning novel tasks compared to the state-of-the-art, (ii) it expedites skill learning and enhances the speed of physical reasoning, (iii) it demonstrates higher data efficiency compared to a physics un-informed approach.
△ Less
Submitted 22 April, 2024;
originally announced June 2024.
-
Transfer Q Star: Principled Decoding for LLM Alignment
Authors:
Souradip Chakraborty,
Soumya Suvra Ghosal,
Ming Yin,
Dinesh Manocha,
Mengdi Wang,
Amrit Singh Bedi,
Furong Huang
Abstract:
Aligning foundation models is essential for their safe and trustworthy deployment. However, traditional fine-tuning methods are computationally intensive and require updating billions of model parameters. A promising alternative, alignment via decoding, adjusts the response distribution directly without model updates to maximize a target reward $r$, thus providing a lightweight and adaptable frame…
▽ More
Aligning foundation models is essential for their safe and trustworthy deployment. However, traditional fine-tuning methods are computationally intensive and require updating billions of model parameters. A promising alternative, alignment via decoding, adjusts the response distribution directly without model updates to maximize a target reward $r$, thus providing a lightweight and adaptable framework for alignment. However, principled decoding methods rely on oracle access to an optimal Q-function ($Q^*$), which is often unavailable in practice. Hence, prior SoTA methods either approximate this $Q^*$ using $Q^{π_{\texttt{sft}}}$ (derived from the reference $\texttt{SFT}$ model) or rely on short-term rewards, resulting in sub-optimal decoding performance. In this work, we propose Transfer $Q^*$, which implicitly estimates the optimal value function for a target reward $r$ through a baseline model $ρ_{\texttt{BL}}$ aligned with a baseline reward $ρ_{\texttt{BL}}$ (which can be different from the target reward $r$). Theoretical analyses of Transfer $Q^*$ provide a rigorous characterization of its optimality, deriving an upper bound on the sub-optimality gap and identifying a hyperparameter to control the deviation from the pre-trained reference $\texttt{SFT}$ model based on user needs. Our approach significantly reduces the sub-optimality gap observed in prior SoTA methods and demonstrates superior empirical performance across key metrics such as coherence, diversity, and quality in extensive tests on several synthetic and real datasets.
△ Less
Submitted 30 May, 2024;
originally announced May 2024.
-
Gamified AI Approch for Early Detection of Dementia
Authors:
Paramita Kundu Maji,
Soubhik Acharya,
Priti Paul,
Sanjay Chakraborty,
Saikat Basu
Abstract:
This paper aims to develop a new deep learning-inspired gaming approach for early detection of dementia. This research integrates a robust convolutional neural network (CNN)-based model for early dementia detection using health metrics data as well as facial image data through a cognitive assessment-based gaming application. We have collected 1000 data samples of health metrics dataset from Apollo…
▽ More
This paper aims to develop a new deep learning-inspired gaming approach for early detection of dementia. This research integrates a robust convolutional neural network (CNN)-based model for early dementia detection using health metrics data as well as facial image data through a cognitive assessment-based gaming application. We have collected 1000 data samples of health metrics dataset from Apollo Diagnostic Center Kolkata that is labeled as either demented or non-demented for the training of MOD-1D-CNN for the game level 1 and another dataset of facial images containing 1800 facial data that are labeled as either demented or non-demented is collected by our research team for the training of MOD-2D-CNN model in-game level 2. In our work, the loss for the proposed MOD-1D-CNN model is 0.2692 and the highest accuracy is 70.50% for identifying the dementia traits using real-life health metrics data. Similarly, the proposed MOD-2D-CNN model loss is 0.1755 and the highest accuracy is obtained here 95.72% for recognizing the dementia status using real-life face-based image data. Therefore, a rule-based weightage method is applied to combine both the proposed methods to achieve the final decision. The MOD-1D-CNN and MOD-2D-CNN models are more lightweight and computationally efficient alternatives because they have a significantly lower number of parameters when compared to the other state-of-the-art models. We have compared their accuracies and parameters with the other state-of-the-art deep learning models.
△ Less
Submitted 26 May, 2024;
originally announced May 2024.
-
FLIPHAT: Joint Differential Privacy for High Dimensional Sparse Linear Bandits
Authors:
Sunrit Chakraborty,
Saptarshi Roy,
Debabrota Basu
Abstract:
High dimensional sparse linear bandits serve as an efficient model for sequential decision-making problems (e.g. personalized medicine), where high dimensional features (e.g. genomic data) on the users are available, but only a small subset of them are relevant. Motivated by data privacy concerns in these applications, we study the joint differentially private high dimensional sparse linear bandit…
▽ More
High dimensional sparse linear bandits serve as an efficient model for sequential decision-making problems (e.g. personalized medicine), where high dimensional features (e.g. genomic data) on the users are available, but only a small subset of them are relevant. Motivated by data privacy concerns in these applications, we study the joint differentially private high dimensional sparse linear bandits, where both rewards and contexts are considered as private data. First, to quantify the cost of privacy, we derive a lower bound on the regret achievable in this setting. To further address the problem, we design a computationally efficient bandit algorithm, \textbf{F}orgetfu\textbf{L} \textbf{I}terative \textbf{P}rivate \textbf{HA}rd \textbf{T}hresholding (FLIPHAT). Along with doubling of episodes and episodic forgetting, FLIPHAT deploys a variant of Noisy Iterative Hard Thresholding (N-IHT) algorithm as a sparse linear regression oracle to ensure both privacy and regret-optimality. We show that FLIPHAT achieves optimal regret up to logarithmic factors. We analyze the regret by providing a novel refined analysis of the estimation error of N-IHT, which is of parallel interest.
△ Less
Submitted 16 July, 2024; v1 submitted 22 May, 2024;
originally announced May 2024.
-
Towards A Comprehensive Assessment of AI's Environmental Impact
Authors:
Srija Chakraborty
Abstract:
Artificial Intelligence, machine learning (AI/ML) has allowed exploring solutions for a variety of environmental and climate questions ranging from natural disasters, greenhouse gas emission, monitoring biodiversity, agriculture, to weather and climate modeling, enabling progress towards climate change mitigation. However, the intersection of AI/ML and environment is not always positive. The recen…
▽ More
Artificial Intelligence, machine learning (AI/ML) has allowed exploring solutions for a variety of environmental and climate questions ranging from natural disasters, greenhouse gas emission, monitoring biodiversity, agriculture, to weather and climate modeling, enabling progress towards climate change mitigation. However, the intersection of AI/ML and environment is not always positive. The recent surge of interest in ML, made possible by processing very large volumes of data, fueled by access to massive compute power, has sparked a trend towards large-scale adoption of AI/ML. This interest places tremendous pressure on natural resources, that are often overlooked and under-reported. There is a need for a framework that monitors the environmental impact and degradation from AI/ML throughout its lifecycle for informing policymakers, stakeholders to adequately implement standards and policies and track the policy outcome over time. For these policies to be effective, AI's environmental impact needs to be monitored in a spatially-disaggregated, timely manner across the globe at the key activity sites. This study proposes a methodology to track environmental variables relating to the multifaceted impact of AI around datacenters using openly available energy data and globally acquired satellite observations. We present a case study around Northern Virginia, United States that hosts a growing number of datacenters and observe changes in multiple satellite-based environmental metrics. We then discuss the steps to expand this methodology for comprehensive assessment of AI's environmental impact across the planet. We also identify data gaps and formulate recommendations for improving the understanding and monitoring AI-induced changes to the environment and climate.
△ Less
Submitted 22 May, 2024;
originally announced May 2024.
-
Diagnosing and Predicting Autonomous Vehicle Operational Safety Using Multiple Simulation Modalities and a Virtual Environment
Authors:
Joe Beck,
Shean Huff,
Subhadeep Chakraborty
Abstract:
Even as technology and performance gains are made in the sphere of automated driving, safety concerns remain. Vehicle simulation has long been seen as a tool to overcome the cost associated with a massive amount of on-road testing for development and discovery of safety critical "edge-cases". However, purely software-based vehicle models may leave a large realism gap between their real-world count…
▽ More
Even as technology and performance gains are made in the sphere of automated driving, safety concerns remain. Vehicle simulation has long been seen as a tool to overcome the cost associated with a massive amount of on-road testing for development and discovery of safety critical "edge-cases". However, purely software-based vehicle models may leave a large realism gap between their real-world counterparts in terms of dynamic response, and highly realistic vehicle-in-the-loop (VIL) simulations that encapsulate a virtual world around a physical vehicle may still be quite expensive to produce and similarly time intensive as on-road testing. In this work, we demonstrate an AV simulation test bed that combines the realism of vehicle-in-the-loop (VIL) simulation with the ease of implementation of model-in-the-loop (MIL) simulation. The setup demonstrated in this work allows for response diagnosis for the VIL simulations. By observing causal links between virtual weather and lighting conditions that surround the virtual depiction of our vehicle, the vision-based perception model and controller of Openpilot, and the dynamic response of our physical vehicle under test, we can draw conclusions regarding how the perceived environment contributed to vehicle response. Conversely, we also demonstrate response prediction for the MIL setup, where the need for a physical vehicle is not required to draw richer conclusions around the impact of environmental conditions on AV performance than could be obtained with VIL simulation alone. These combine for a simulation setup with accurate real-world implications for edge-case discovery that is both cost effective and time efficient to implement.
△ Less
Submitted 13 May, 2024;
originally announced May 2024.
-
Generative flow induced neural architecture search: Towards discovering optimal architecture in wavelet neural operator
Authors:
Hartej Soin,
Tapas Tripura,
Souvik Chakraborty
Abstract:
We propose a generative flow-induced neural architecture search algorithm. The proposed approach devices simple feed-forward neural networks to learn stochastic policies to generate sequences of architecture hyperparameters such that the generated states are in proportion with the reward from the terminal state. We demonstrate the efficacy of the proposed search algorithm on the wavelet neural ope…
▽ More
We propose a generative flow-induced neural architecture search algorithm. The proposed approach devices simple feed-forward neural networks to learn stochastic policies to generate sequences of architecture hyperparameters such that the generated states are in proportion with the reward from the terminal state. We demonstrate the efficacy of the proposed search algorithm on the wavelet neural operator (WNO), where we learn a policy to generate a sequence of hyperparameters like wavelet basis and activation operators for wavelet integral blocks. While the trajectory of the generated wavelet basis and activation sequence is cast as flow, the policy is learned by minimizing the flow violation between each state in the trajectory and maximizing the reward from the terminal state. In the terminal state, we train WNO simultaneously to guide the search. We propose to use the exponent of the negative of the WNO loss on the validation dataset as the reward function. While the grid search-based neural architecture generation algorithms foresee every combination, the proposed framework generates the most probable sequence based on the positive reward from the terminal state, thereby reducing exploration time. Compared to reinforcement learning schemes, where complete episodic training is required to get the reward, the proposed algorithm generates the hyperparameter trajectory sequentially. Through four fluid mechanics-oriented problems, we illustrate that the learned policies can sample the best-performing architecture of the neural operator, thereby improving the performance of the vanilla wavelet neural operator.
△ Less
Submitted 11 May, 2024;
originally announced May 2024.
-
Towards Neural Synthesis for SMT-Assisted Proof-Oriented Programming
Authors:
Saikat Chakraborty,
Gabriel Ebner,
Siddharth Bhat,
Sarah Fakhoury,
Sakina Fatima,
Shuvendu Lahiri,
Nikhil Swamy
Abstract:
Proof-oriented programs mix computational content with proofs of program correctness. However, the human effort involved in programming and proving is still substantial, despite the use of Satisfiability Modulo Theories (SMT) solvers to automate proofs in languages such as F*.
Seeking to spur research on using AI to automate the construction of proof-oriented programs, we curate a dataset of 600…
▽ More
Proof-oriented programs mix computational content with proofs of program correctness. However, the human effort involved in programming and proving is still substantial, despite the use of Satisfiability Modulo Theories (SMT) solvers to automate proofs in languages such as F*.
Seeking to spur research on using AI to automate the construction of proof-oriented programs, we curate a dataset of 600K lines of open-source F* programs and proofs, including software used in production systems ranging from Windows and Linux, to Python and Firefox. Our dataset includes around 32K top-level F* definitions, each representing a type-directed program and proof synthesis problem -- producing a definition given a formal specification expressed as an F* type. We provide a program-fragment checker that queries F* to check the correctness of candidate solutions. We believe this is the largest corpus of SMT-assisted program proofs coupled with a reproducible program-fragment checker.
Grounded in this dataset, we investigate the use of AI to synthesize programs and their proofs in F*, with promising results. Our main finding in that the performance of fine-tuned smaller language models (such as Phi-2 or StarCoder) compare favorably with large language models (such as GPT-4), at a much lower computational cost. We also identify various type-based retrieval augmentation techniques and find that they boost performance significantly. With detailed error analysis and case studies, we identify potential strengths and weaknesses of models and techniques and suggest directions for future improvements.
△ Less
Submitted 2 May, 2024;
originally announced May 2024.
-
Human Factors in Model-Driven Engineering: Future Research Goals and Initiatives for MDE
Authors:
Grischa Liebel,
Jil Klünder,
Regina Hebig,
Christopher Lazik,
Inês Nunes,
Isabella Graßl,
Jan-Philipp Steghöfer,
Joeri Exelmans,
Julian Oertel,
Kai Marquardt,
Katharina Juhnke,
Kurt Schneider,
Lucas Gren,
Lucia Happe,
Marc Herrmann,
Marvin Wyrich,
Matthias Tichy,
Miguel Goulão,
Rebekka Wohlrab,
Reyhaneh Kalantari,
Robert Heinrich,
Sandra Greiner,
Satrio Adi Rukmono,
Shalini Chakraborty,
Silvia Abrahão
, et al. (1 additional authors not shown)
Abstract:
Purpose: Software modelling and Model-Driven Engineering (MDE) is traditionally studied from a technical perspective. However, one of the core motivations behind the use of software models is inherently human-centred. Models aim to enable practitioners to communicate about software designs, make software understandable, or make software easier to write through domain-specific modelling languages.…
▽ More
Purpose: Software modelling and Model-Driven Engineering (MDE) is traditionally studied from a technical perspective. However, one of the core motivations behind the use of software models is inherently human-centred. Models aim to enable practitioners to communicate about software designs, make software understandable, or make software easier to write through domain-specific modelling languages. Several recent studies challenge the idea that these aims can always be reached and indicate that human factors play a role in the success of MDE. However, there is an under-representation of research focusing on human factors in modelling. Methods: During a GI-Dagstuhl seminar, topics related to human factors in modelling were discussed by 26 expert participants from research and industry. Results: In breakout groups, five topics were covered in depth, namely modelling human aspects, factors of modeller experience, diversity and inclusion in MDE, collaboration and MDE, and teaching human-aware MDE. Conclusion: We summarise our insights gained during the discussions on the five topics. We formulate research goals, questions, and propositions that support directing future initiatives towards an MDE community that is aware of and supportive of human factors and values.
△ Less
Submitted 29 April, 2024;
originally announced April 2024.
-
MD-NOMAD: Mixture density nonlinear manifold decoder for emulating stochastic differential equations and uncertainty propagation
Authors:
Akshay Thakur,
Souvik Chakraborty
Abstract:
We propose a neural operator framework, termed mixture density nonlinear manifold decoder (MD-NOMAD), for stochastic simulators. Our approach leverages an amalgamation of the pointwise operator learning neural architecture nonlinear manifold decoder (NOMAD) with mixture density-based methods to estimate conditional probability distributions for stochastic output functions. MD-NOMAD harnesses the a…
▽ More
We propose a neural operator framework, termed mixture density nonlinear manifold decoder (MD-NOMAD), for stochastic simulators. Our approach leverages an amalgamation of the pointwise operator learning neural architecture nonlinear manifold decoder (NOMAD) with mixture density-based methods to estimate conditional probability distributions for stochastic output functions. MD-NOMAD harnesses the ability of probabilistic mixture models to estimate complex probability and the high-dimensional scalability of pointwise neural operator NOMAD. We conduct empirical assessments on a wide array of stochastic ordinary and partial differential equations and present the corresponding results, which highlight the performance of the proposed framework.
△ Less
Submitted 24 April, 2024;
originally announced April 2024.
-
Neural Operator induced Gaussian Process framework for probabilistic solution of parametric partial differential equations
Authors:
Sawan Kumar,
Rajdip Nayek,
Souvik Chakraborty
Abstract:
The study of neural operators has paved the way for the development of efficient approaches for solving partial differential equations (PDEs) compared with traditional methods. However, most of the existing neural operators lack the capability to provide uncertainty measures for their predictions, a crucial aspect, especially in data-driven scenarios with limited available data. In this work, we p…
▽ More
The study of neural operators has paved the way for the development of efficient approaches for solving partial differential equations (PDEs) compared with traditional methods. However, most of the existing neural operators lack the capability to provide uncertainty measures for their predictions, a crucial aspect, especially in data-driven scenarios with limited available data. In this work, we propose a novel Neural Operator-induced Gaussian Process (NOGaP), which exploits the probabilistic characteristics of Gaussian Processes (GPs) while leveraging the learning prowess of operator learning. The proposed framework leads to improved prediction accuracy and offers a quantifiable measure of uncertainty. The proposed framework is extensively evaluated through experiments on various PDE examples, including Burger's equation, Darcy flow, non-homogeneous Poisson, and wave-advection equations. Furthermore, a comparative study with state-of-the-art operator learning algorithms is presented to highlight the advantages of NOGaP. The results demonstrate superior accuracy and expected uncertainty characteristics, suggesting the promising potential of the proposed framework.
△ Less
Submitted 23 April, 2024;
originally announced April 2024.
-
Lessons Learned in Performing a Trustworthy AI and Fundamental Rights Assessment
Authors:
Marjolein Boonstra,
Frédérick Bruneault,
Subrata Chakraborty,
Tjitske Faber,
Alessio Gallucci,
Eleanore Hickman,
Gerard Kema,
Heejin Kim,
Jaap Kooiker,
Elisabeth Hildt,
Annegret Lamadé,
Emilie Wiinblad Mathez,
Florian Möslein,
Genien Pathuis,
Giovanni Sartor,
Marijke Steege,
Alice Stocco,
Willy Tadema,
Jarno Tuimala,
Isabel van Vledder,
Dennis Vetter,
Jana Vetter,
Magnus Westerlund,
Roberto V. Zicari
Abstract:
This report shares the experiences, results and lessons learned in conducting a pilot project ``Responsible use of AI'' in cooperation with the Province of Friesland, Rijks ICT Gilde-part of the Ministry of the Interior and Kingdom Relations (BZK) (both in The Netherlands) and a group of members of the Z-Inspection$^{\small{\circledR}}$ Initiative. The pilot project took place from May 2022 throug…
▽ More
This report shares the experiences, results and lessons learned in conducting a pilot project ``Responsible use of AI'' in cooperation with the Province of Friesland, Rijks ICT Gilde-part of the Ministry of the Interior and Kingdom Relations (BZK) (both in The Netherlands) and a group of members of the Z-Inspection$^{\small{\circledR}}$ Initiative. The pilot project took place from May 2022 through January 2023. During the pilot, the practical application of a deep learning algorithm from the province of Frŷslan was assessed. The AI maps heathland grassland by means of satellite images for monitoring nature reserves. Environmental monitoring is one of the crucial activities carried on by society for several purposes ranging from maintaining standards on drinkable water to quantifying the CO2 emissions of a particular state or region. Using satellite imagery and machine learning to support decisions is becoming an important part of environmental monitoring. The main focus of this report is to share the experiences, results and lessons learned from performing both a Trustworthy AI assessment using the Z-Inspection$^{\small{\circledR}}$ process and the EU framework for Trustworthy AI, and combining it with a Fundamental Rights assessment using the Fundamental Rights and Algorithms Impact Assessment (FRAIA) as recommended by the Dutch government for the use of AI algorithms by the Dutch public authorities.
△ Less
Submitted 22 April, 2024;
originally announced April 2024.
-
Soil Fertility Prediction Using Combined USB-microscope Based Soil Image, Auxiliary Variables, and Portable X-Ray Fluorescence Spectrometry
Authors:
Shubhadip Dasgupta,
Satwik Pate,
Divya Rathore,
L. G. Divyanth,
Ayan Das,
Anshuman Nayak,
Subhadip Dey,
Asim Biswas,
David C. Weindorf,
Bin Li,
Sergio Henrique Godinho Silva,
Bruno Teixeira Ribeiro,
Sanjay Srivastava,
Somsubhra Chakraborty
Abstract:
This study explored the application of portable X-ray fluorescence (PXRF) spectrometry and soil image analysis to rapidly assess soil fertility, focusing on critical parameters such as available B, organic carbon (OC), available Mn, available S, and the sulfur availability index (SAI). Analyzing 1,133 soil samples from various agro-climatic zones in Eastern India, the research combined color and t…
▽ More
This study explored the application of portable X-ray fluorescence (PXRF) spectrometry and soil image analysis to rapidly assess soil fertility, focusing on critical parameters such as available B, organic carbon (OC), available Mn, available S, and the sulfur availability index (SAI). Analyzing 1,133 soil samples from various agro-climatic zones in Eastern India, the research combined color and texture features from microscopic soil images, PXRF data, and auxiliary soil variables (AVs) using a Random Forest model. Results indicated that integrating image features (IFs) with auxiliary variables (AVs) significantly enhanced prediction accuracy for available B (R^2 = 0.80) and OC (R^2 = 0.88). A data fusion approach, incorporating IFs, AVs, and PXRF data, further improved predictions for available Mn and SAI with R^2 values of 0.72 and 0.70, respectively. The study demonstrated how these integrated technologies have the potential to provide quick and affordable options for soil testing, opening up access to more sophisticated prediction models and a better comprehension of the fertility and health of the soil. Future research should focus on the application of deep learning models on a larger dataset of soil images, developed using soils from a broader range of agro-climatic zones under field condition.
△ Less
Submitted 17 April, 2024;
originally announced April 2024.
-
LLM-based Test-driven Interactive Code Generation: User Study and Empirical Evaluation
Authors:
Sarah Fakhoury,
Aaditya Naik,
Georgios Sakkas,
Saikat Chakraborty,
Shuvendu K. Lahiri
Abstract:
Large language models (LLMs) have shown great potential in automating significant aspects of coding by producing natural code from informal natural language (NL) intent. However, given NL is informal, it does not lend easily to checking that the generated code correctly satisfies the user intent. In this paper, we propose a novel interactive workflow TiCoder for guided intent clarification (i.e.,…
▽ More
Large language models (LLMs) have shown great potential in automating significant aspects of coding by producing natural code from informal natural language (NL) intent. However, given NL is informal, it does not lend easily to checking that the generated code correctly satisfies the user intent. In this paper, we propose a novel interactive workflow TiCoder for guided intent clarification (i.e., partial formalization) through tests to support the generation of more accurate code suggestions. Through a mixed methods user study with 15 programmers, we present an empirical evaluation of the effectiveness of the workflow to improve code generation accuracy. We find that participants using the proposed workflow are significantly more likely to correctly evaluate AI generated code, and report significantly less task-induced cognitive load. Furthermore, we test the potential of the workflow at scale with four different state-of-the-art LLMs on two python datasets, using an idealized proxy for a user feedback. We observe an average absolute improvement of 38.43% in the pass@1 code generation accuracy for both datasets and across all LLMs within 5 user interactions, in addition to the automatic generation of accompanying unit tests.
△ Less
Submitted 15 April, 2024;
originally announced April 2024.
-
Dynamic Ego-Velocity estimation Using Moving mmWave Radar: A Phase-Based Approach
Authors:
Argha Sen,
Soham Chakraborty,
Soham Tripathy,
Sandip Chakraborty
Abstract:
Precise ego-motion measurement is crucial for various applications, including robotics, augmented reality, and autonomous navigation. In this poster, we propose mmPhase, an odometry framework based on single-chip millimetre-wave (mmWave) radar for robust ego-motion estimation in mobile platforms without requiring additional modalities like the visual, wheel, or inertial odometry. mmPhase leverages…
▽ More
Precise ego-motion measurement is crucial for various applications, including robotics, augmented reality, and autonomous navigation. In this poster, we propose mmPhase, an odometry framework based on single-chip millimetre-wave (mmWave) radar for robust ego-motion estimation in mobile platforms without requiring additional modalities like the visual, wheel, or inertial odometry. mmPhase leverages a phase-based velocity estimation approach to overcome the limitations of conventional doppler resolution. For real-world evaluations of mmPhase we have developed an ego-vehicle prototype. Compared to the state-of-the-art baselines, mmPhase shows superior performance in ego-velocity estimation.
△ Less
Submitted 15 April, 2024;
originally announced April 2024.
-
GENESIS-RL: GEnerating Natural Edge-cases with Systematic Integration of Safety considerations and Reinforcement Learning
Authors:
Hsin-Jung Yang,
Joe Beck,
Md Zahid Hasan,
Ekin Beyazit,
Subhadeep Chakraborty,
Tichakorn Wongpiromsarn,
Soumik Sarkar
Abstract:
In the rapidly evolving field of autonomous systems, the safety and reliability of the system components are fundamental requirements. These components are often vulnerable to complex and unforeseen environments, making natural edge-case generation essential for enhancing system resilience. This paper presents GENESIS-RL, a novel framework that leverages system-level safety considerations and rein…
▽ More
In the rapidly evolving field of autonomous systems, the safety and reliability of the system components are fundamental requirements. These components are often vulnerable to complex and unforeseen environments, making natural edge-case generation essential for enhancing system resilience. This paper presents GENESIS-RL, a novel framework that leverages system-level safety considerations and reinforcement learning techniques to systematically generate naturalistic edge cases. By simulating challenging conditions that mimic the real-world situations, our framework aims to rigorously test entire system's safety and reliability. Although demonstrated within the autonomous driving application, our methodology is adaptable across diverse autonomous systems. Our experimental validation, conducted on high-fidelity simulator underscores the overall effectiveness of this framework.
△ Less
Submitted 27 March, 2024;
originally announced March 2024.
-
Decoding the visual attention of pathologists to reveal their level of expertise
Authors:
Souradeep Chakraborty,
Dana Perez,
Paul Friedman,
Natallia Sheuka,
Constantin Friedman,
Oksana Yaskiv,
Rajarsi Gupta,
Gregory J. Zelinsky,
Joel H. Saltz,
Dimitris Samaras
Abstract:
We present a method for classifying the expertise of a pathologist based on how they allocated their attention during a cancer reading. We engage this decoding task by developing a novel method for predicting the attention of pathologists as they read whole-slide Images (WSIs) of prostate and make cancer grade classifications. Our ground truth measure of a pathologists' attention is the x, y and z…
▽ More
We present a method for classifying the expertise of a pathologist based on how they allocated their attention during a cancer reading. We engage this decoding task by developing a novel method for predicting the attention of pathologists as they read whole-slide Images (WSIs) of prostate and make cancer grade classifications. Our ground truth measure of a pathologists' attention is the x, y and z (magnification) movement of their viewport as they navigated through WSIs during readings, and to date we have the attention behavior of 43 pathologists reading 123 WSIs. These data revealed that specialists have higher agreement in both their attention and cancer grades compared to general pathologists and residents, suggesting that sufficient information may exist in their attention behavior to classify their expertise level. To attempt this, we trained a transformer-based model to predict the visual attention heatmaps of resident, general, and specialist (GU) pathologists during Gleason grading. Based solely on a pathologist's attention during a reading, our model was able to predict their level of expertise with 75.3%, 56.1%, and 77.2% accuracy, respectively, better than chance and baseline models. Our model therefore enables a pathologist's expertise level to be easily and objectively evaluated, important for pathology training and competency assessment. Tools developed from our model could also be used to help pathology trainees learn how to read WSIs like an expert.
△ Less
Submitted 25 March, 2024;
originally announced March 2024.
-
Self-supervised co-salient object detection via feature correspondence at multiple scales
Authors:
Souradeep Chakraborty,
Dimitris Samaras
Abstract:
Our paper introduces a novel two-stage self-supervised approach for detecting co-occurring salient objects (CoSOD) in image groups without requiring segmentation annotations. Unlike existing unsupervised methods that rely solely on patch-level information (e.g. clustering patch descriptors) or on computation heavy off-the-shelf components for CoSOD, our lightweight model leverages feature correspo…
▽ More
Our paper introduces a novel two-stage self-supervised approach for detecting co-occurring salient objects (CoSOD) in image groups without requiring segmentation annotations. Unlike existing unsupervised methods that rely solely on patch-level information (e.g. clustering patch descriptors) or on computation heavy off-the-shelf components for CoSOD, our lightweight model leverages feature correspondences at both patch and region levels, significantly improving prediction performance. In the first stage, we train a self-supervised network that detects co-salient regions by computing local patch-level feature correspondences across images. We obtain the segmentation predictions using confidence-based adaptive thresholding. In the next stage, we refine these intermediate segmentations by eliminating the detected regions (within each image) whose averaged feature representations are dissimilar to the foreground feature representation averaged across all the cross-attention maps (from the previous stage). Extensive experiments on three CoSOD benchmark datasets show that our self-supervised model outperforms the corresponding state-of-the-art models by a huge margin (e.g. on the CoCA dataset, our model has a 13.7% F-measure gain over the SOTA unsupervised CoSOD model). Notably, our self-supervised model also outperforms several recent fully supervised CoSOD models on the three test datasets (e.g., on the CoCA dataset, our model has a 4.6% F-measure gain over a recent supervised CoSOD model).
△ Less
Submitted 3 July, 2024; v1 submitted 17 March, 2024;
originally announced March 2024.
-
Could We Generate Cytology Images from Histopathology Images? An Empirical Study
Authors:
Soumyajyoti Dey,
Sukanta Chakraborty,
Utso Guha Roy,
Nibaran Das
Abstract:
Automation in medical imaging is quite challenging due to the unavailability of annotated datasets and the scarcity of domain experts. In recent years, deep learning techniques have solved some complex medical imaging tasks like disease classification, important object localization, segmentation, etc. However, most of the task requires a large amount of annotated data for their successful implemen…
▽ More
Automation in medical imaging is quite challenging due to the unavailability of annotated datasets and the scarcity of domain experts. In recent years, deep learning techniques have solved some complex medical imaging tasks like disease classification, important object localization, segmentation, etc. However, most of the task requires a large amount of annotated data for their successful implementation. To mitigate the shortage of data, different generative models are proposed for data augmentation purposes which can boost the classification performances. For this, different synthetic medical image data generation models are developed to increase the dataset. Unpaired image-to-image translation models here shift the source domain to the target domain. In the breast malignancy identification domain, FNAC is one of the low-cost low-invasive modalities normally used by medical practitioners. But availability of public datasets in this domain is very poor. Whereas, for automation of cytology images, we need a large amount of annotated data. Therefore synthetic cytology images are generated by translating breast histopathology samples which are publicly available. In this study, we have explored traditional image-to-image transfer models like CycleGAN, and Neural Style Transfer. Further, it is observed that the generated cytology images are quite similar to real breast cytology samples by measuring FID and KID scores.
△ Less
Submitted 16 March, 2024;
originally announced March 2024.
-
Fuzzy Rank-based Late Fusion Technique for Cytology image Segmentation
Authors:
Soumyajyoti Dey,
Sukanta Chakraborty,
Utso Guha Roy,
Nibaran Das
Abstract:
Cytology image segmentation is quite challenging due to its complex cellular structure and multiple overlapping regions. On the other hand, for supervised machine learning techniques, we need a large amount of annotated data, which is costly. In recent years, late fusion techniques have given some promising performances in the field of image classification. In this paper, we have explored a fuzzy-…
▽ More
Cytology image segmentation is quite challenging due to its complex cellular structure and multiple overlapping regions. On the other hand, for supervised machine learning techniques, we need a large amount of annotated data, which is costly. In recent years, late fusion techniques have given some promising performances in the field of image classification. In this paper, we have explored a fuzzy-based late fusion techniques for cytology image segmentation. This fusion rule integrates three traditional semantic segmentation models UNet, SegNet, and PSPNet. The technique is applied on two cytology image datasets, i.e., cervical cytology(HErlev) and breast cytology(JUCYT-v1) image datasets. We have achieved maximum MeanIoU score 84.27% and 83.79% on the HErlev dataset and JUCYT-v1 dataset after the proposed late fusion technique, respectively which are better than that of the traditional fusion rules such as average probability, geometric mean, Borda Count, etc. The codes of the proposed model are available on GitHub.
△ Less
Submitted 16 March, 2024;
originally announced March 2024.
-
Incentivized Exploration of Non-Stationary Stochastic Bandits
Authors:
Sourav Chakraborty,
Lijun Chen
Abstract:
We study incentivized exploration for the multi-armed bandit (MAB) problem with non-stationary reward distributions, where players receive compensation for exploring arms other than the greedy choice and may provide biased feedback on the reward. We consider two different non-stationary environments: abruptly-changing and continuously-changing, and propose respective incentivized exploration algor…
▽ More
We study incentivized exploration for the multi-armed bandit (MAB) problem with non-stationary reward distributions, where players receive compensation for exploring arms other than the greedy choice and may provide biased feedback on the reward. We consider two different non-stationary environments: abruptly-changing and continuously-changing, and propose respective incentivized exploration algorithms. We show that the proposed algorithms achieve sublinear regret and compensation over time, thus effectively incentivizing exploration despite the nonstationarity and the biased or drifted feedback.
△ Less
Submitted 16 March, 2024;
originally announced March 2024.
-
Equivalence Testing: The Power of Bounded Adaptivity
Authors:
Diptarka Chakraborty,
Sourav Chakraborty,
Gunjan Kumar,
Kuldeep S. Meel
Abstract:
Equivalence testing, a fundamental problem in the field of distribution testing, seeks to infer if two unknown distributions on $[n]$ are the same or far apart in the total variation distance. Conditional sampling has emerged as a powerful query model and has been investigated by theoreticians and practitioners alike, leading to the design of optimal algorithms albeit in a sequential setting (also…
▽ More
Equivalence testing, a fundamental problem in the field of distribution testing, seeks to infer if two unknown distributions on $[n]$ are the same or far apart in the total variation distance. Conditional sampling has emerged as a powerful query model and has been investigated by theoreticians and practitioners alike, leading to the design of optimal algorithms albeit in a sequential setting (also referred to as adaptive tester). Given the profound impact of parallel computing over the past decades, there has been a strong desire to design algorithms that enable high parallelization. Despite significant algorithmic advancements over the last decade, parallelizable techniques (also termed non-adaptive testers) have $\tilde{O}(\log^{12}n)$ query complexity, a prohibitively large complexity to be of practical usage. Therefore, the primary challenge is whether it is possible to design algorithms that enable high parallelization while achieving efficient query complexity.
Our work provides an affirmative answer to the aforementioned challenge: we present a highly parallelizable tester with a query complexity of $\tilde{O}(\log n)$, achieved through a single round of adaptivity, marking a significant stride towards harmonizing parallelizability and efficiency in equivalence testing.
△ Less
Submitted 7 March, 2024;
originally announced March 2024.
-
Evaluating Large Language Models as Virtual Annotators for Time-series Physical Sensing Data
Authors:
Aritra Hota,
Soumyajit Chatterjee,
Sandip Chakraborty
Abstract:
Traditional human-in-the-loop-based annotation for time-series data like inertial data often requires access to alternate modalities like video or audio from the environment. These alternate sources provide the necessary information to the human annotator, as the raw numeric data is often too obfuscated even for an expert. However, this traditional approach has many concerns surrounding overall co…
▽ More
Traditional human-in-the-loop-based annotation for time-series data like inertial data often requires access to alternate modalities like video or audio from the environment. These alternate sources provide the necessary information to the human annotator, as the raw numeric data is often too obfuscated even for an expert. However, this traditional approach has many concerns surrounding overall cost, efficiency, storage of additional modalities, time, scalability, and privacy. Interestingly, recent large language models (LLMs) are also trained with vast amounts of publicly available alphanumeric data, which allows them to comprehend and perform well on tasks beyond natural language processing. Naturally, this opens up a potential avenue to explore LLMs as virtual annotators where the LLMs will be directly provided the raw sensor data for annotation instead of relying on any alternate modality. Naturally, this could mitigate the problems of the traditional human-in-the-loop approach. Motivated by this observation, we perform a detailed study in this paper to assess whether the state-of-the-art (SOTA) LLMs can be used as virtual annotators for labeling time-series physical sensing data. To perform this in a principled manner, we segregate the study into two major phases. In the first phase, we investigate the challenges an LLM like GPT-4 faces in comprehending raw sensor data. Considering the observations from phase 1, in the next phase, we investigate the possibility of encoding the raw sensor data using SOTA SSL approaches and utilizing the projected time-series data to get annotations from the LLM. Detailed evaluation with four benchmark HAR datasets shows that SSL-based encoding and metric-based guidance allow the LLM to make more reasonable decisions and provide accurate annotations without requiring computationally expensive fine-tuning or sophisticated prompt engineering.
△ Less
Submitted 14 April, 2024; v1 submitted 2 March, 2024;
originally announced March 2024.
-
PhyPlan: Compositional and Adaptive Physical Task Reasoning with Physics-Informed Skill Networks for Robot Manipulators
Authors:
Harshil Vagadia,
Mudit Chopra,
Abhinav Barnawal,
Tamajit Banerjee,
Shreshth Tuli,
Souvik Chakraborty,
Rohan Paul
Abstract:
Given the task of positioning a ball-like object to a goal region beyond direct reach, humans can often throw, slide, or rebound objects against the wall to attain the goal. However, enabling robots to reason similarly is non-trivial. Existing methods for physical reasoning are data-hungry and struggle with complexity and uncertainty inherent in the real world. This paper presents PhyPlan, a novel…
▽ More
Given the task of positioning a ball-like object to a goal region beyond direct reach, humans can often throw, slide, or rebound objects against the wall to attain the goal. However, enabling robots to reason similarly is non-trivial. Existing methods for physical reasoning are data-hungry and struggle with complexity and uncertainty inherent in the real world. This paper presents PhyPlan, a novel physics-informed planning framework that combines physics-informed neural networks (PINNs) with modified Monte Carlo Tree Search (MCTS) to enable embodied agents to perform dynamic physical tasks. PhyPlan leverages PINNs to simulate and predict outcomes of actions in a fast and accurate manner and uses MCTS for planning. It dynamically determines whether to consult a PINN-based simulator (coarse but fast) or engage directly with the actual environment (fine but slow) to determine optimal policy. Evaluation with robots in simulated 3D environments demonstrates the ability of our approach to solve 3D-physical reasoning tasks involving the composition of dynamic skills. Quantitatively, PhyPlan excels in several aspects: (i) it achieves lower regret when learning novel tasks compared to state-of-the-art, (ii) it expedites skill learning and enhances the speed of physical reasoning, (iii) it demonstrates higher data efficiency compared to a physics un-informed approach.
△ Less
Submitted 24 February, 2024;
originally announced February 2024.
-
A Statistical Analysis of Wasserstein Autoencoders for Intrinsically Low-dimensional Data
Authors:
Saptarshi Chakraborty,
Peter L. Bartlett
Abstract:
Variational Autoencoders (VAEs) have gained significant popularity among researchers as a powerful tool for understanding unknown distributions based on limited samples. This popularity stems partly from their impressive performance and partly from their ability to provide meaningful feature representations in the latent space. Wasserstein Autoencoders (WAEs), a variant of VAEs, aim to not only im…
▽ More
Variational Autoencoders (VAEs) have gained significant popularity among researchers as a powerful tool for understanding unknown distributions based on limited samples. This popularity stems partly from their impressive performance and partly from their ability to provide meaningful feature representations in the latent space. Wasserstein Autoencoders (WAEs), a variant of VAEs, aim to not only improve model efficiency but also interpretability. However, there has been limited focus on analyzing their statistical guarantees. The matter is further complicated by the fact that the data distributions to which WAEs are applied - such as natural images - are often presumed to possess an underlying low-dimensional structure within a high-dimensional feature space, which current theory does not adequately account for, rendering known bounds inefficient. To bridge the gap between the theory and practice of WAEs, in this paper, we show that WAEs can learn the data distributions when the network architectures are properly chosen. We show that the convergence rates of the expected excess risk in the number of samples for WAEs are independent of the high feature dimension, instead relying only on the intrinsic dimension of the data distribution.
△ Less
Submitted 23 February, 2024;
originally announced February 2024.