-
Lossy Catalytic Computation
Authors:
Chetan Gupta,
Rahul Jain,
Vimal Raj Sharma,
Raghunath Tewari
Abstract:
A catalytic Turing machine is a variant of a Turing machine in which there exists an auxiliary tape in addition to the input tape and the work tape. This auxiliary tape is initially filled with arbitrary content. The machine can read and write on the auxiliary tape, but it is constrained to restore its initial content when it halts. Studying such a model and finding its powers and limitations has practical applications.
In this paper, we study catalytic Turing machines with O(log n)-sized work tape and polynomial-sized auxiliary tape that are allowed to lose at most constant many bits of the auxiliary tape when they halt. We show that such catalytic Turing machines can only decide the same set of languages as standard catalytic Turing machines with the same size work and auxiliary tape.
Submitted 26 August, 2024;
originally announced August 2024.
-
PAL: A Variability-Aware Policy for Scheduling ML Workloads in GPU Clusters
Authors:
Rutwik Jain,
Brandon Tran,
Keting Chen,
Matthew D. Sinclair,
Shivaram Venkataraman
Abstract:
Large-scale computing systems are increasingly using accelerators such as GPUs to enable peta- and exa-scale levels of compute to meet the needs of Machine Learning (ML) and scientific computing applications. Given the widespread and growing use of ML, including in some scientific applications, optimizing these clusters for ML workloads is particularly important. However, recent work has demonstrated that accelerators in these clusters can suffer from performance variability and this variability can lead to resource under-utilization and load imbalance. In this work, we focus on how cluster schedulers, which are used to share accelerator-rich clusters across many concurrent ML jobs, can embrace performance variability to mitigate its effects. Our key insight to address this challenge is to characterize which applications are more likely to suffer from performance variability and take that into account while placing jobs on the cluster. We design a novel cluster scheduler, PAL, which uses performance variability measurements and application-specific profiles to improve job performance and resource utilization. PAL also balances performance variability with locality to ensure jobs are spread across as few nodes as possible. Overall, PAL significantly improves GPU-rich cluster scheduling: across traces for six ML workload applications spanning image, language, and vision models with a variety of variability profiles, PAL improves geomean job completion time by 42%, cluster utilization by 28%, and makespan by 47% over existing state-of-the-art schedulers.
Submitted 21 August, 2024;
originally announced August 2024.
-
Dynamical response and time correlation functions in random quantum systems
Authors:
Sudhir Ranjan Jain,
Pierre Gaspard
Abstract:
Time-dependent response and correlation functions are studied in random quantum systems composed of infinitely many parts without mutual interaction and defined with statistically independent random matrices. The latter are taken within the three Wigner-Dyson universality classes. In these systems, the response functions are shown to be exactly given by statistical averages over the random-matrix ensemble. Analytical results are obtained for the time dependence of the mean response and correlation functions at zero and positive temperatures. At long times, the mean correlation functions are shown to have a power-law decay for GOE at positive temperatures, but for GUE and GSE at zero temperature. Otherwise, the decay is much faster in time. In relation to these power-law decays, the associated spectral densities have a dip around zero frequency. The diagrammatic method is developed to obtain higher-order response functions and the third-order response function is explicitly calculated. The response to impulsive perturbations is also considered. In addition, the quantum fluctuations of the correlation function in individual members of the ensemble are characterised in terms of their probability distribution, which is shown to change with the temperature.
Submitted 18 August, 2024;
originally announced August 2024.
-
Markov Balance Satisfaction Improves Performance in Strictly Batch Offline Imitation Learning
Authors:
Rishabh Agrawal,
Nathan Dahlin,
Rahul Jain,
Ashutosh Nayyar
Abstract:
Imitation learning (IL) is notably effective for robotic tasks where directly programming behaviors or defining optimal control costs is challenging. In this work, we address a scenario where the imitator relies solely on observed behavior and cannot make environmental interactions during learning. It does not have additional supplementary datasets beyond the expert's dataset nor any information about the transition dynamics. Unlike state-of-the-art (SOTA) IL methods, this approach tackles the limitations of conventional IL by operating in a more constrained and realistic setting. Our method uses the Markov balance equation and introduces a novel conditional density estimation-based imitation learning framework. It employs conditional normalizing flows for transition dynamics estimation and aims at satisfying a balance equation for the environment. Through a series of numerical experiments on Classic Control and MuJoCo environments, we demonstrate consistently superior empirical performance compared to many SOTA IL algorithms.
Submitted 17 August, 2024;
originally announced August 2024.
-
Phishing Codebook: A Structured Framework for the Characterization of Phishing Emails
Authors:
Tarini Saka,
Rachiyta Jain,
Kami Vaniea,
Nadin Kökciyan
Abstract:
Phishing is one of the most prevalent and expensive types of cybercrime faced by organizations and individuals worldwide. Most prior research has focused on various technical features and traditional representations of text to characterize phishing emails. There is a significant knowledge gap about the qualitative traits embedded in them, which could be useful in a range of phishing mitigation tasks. In this paper, we dissect the structure of phishing emails to gain a better understanding of the factors that influence human decision-making when assessing suspicious emails and identify a novel set of descriptive features. For this, we employ an iterative qualitative coding approach to identify features that are descriptive of the emails. We developed the "Phishing Codebook", a structured framework to systematically extract key information from phishing emails, and we apply this codebook to a publicly available dataset of 503 phishing emails collected between 2015 and 2021. We present key observations and challenges related to phishing attacks delivered indirectly through legitimate services, the challenge of recurring and long-lasting scams, and the variations within campaigns used by attackers to bypass rule-based filters. Furthermore, we provide two use cases to show how the Phishing Codebook is useful in identifying similar phishing emails and in creating well-tailored responses to end-users. We share the Phishing Codebook and the annotated benchmark dataset to help researchers have a better understanding of phishing emails.
Submitted 16 August, 2024;
originally announced August 2024.
-
Quantum cloning transformation unlocks the potential of W class of states in a secret sharing protocol
Authors:
Rashi Jain,
Satyabrata Adhikari
Abstract:
One of the most challenging problems is to share a secret because the sender does not trust the receiver completely. Thus, the sender provides one part of the information to the receiver and shares the other part of the information with a third party on whom the sender can rely. The secret can be revealed when the receiver and the third party agree to cooperate. This is the essence of the secret-sharing protocol. A lot of studies have been done on it using the three-qubit GHZ state, and only a few works have involved the W state. In this work, we introduce a quantum secret sharing protocol exploiting a three-qubit W class of state shared between three parties, Alice (Sender), Bob (Mediator), and Charlie (Receiver). In the proposed protocol, the shared state parameters and the secret are linked in such a way that it is very difficult to factor them. We will show that these parameters can be factored out easily if the receiver uses a quantum cloning machine (QCM) and thus can retrieve the secret. We find that the protocol is probabilistic and have calculated the probability of success of the protocol. Further, we establish the relation between the success probability and the efficiency of the QCM. In general, we find that the efficiency of the constructed QCM is greater than or equal to $\frac{1}{3}$, but we have shown that its efficiency can be enhanced when the parameters of the shared state are used as the parameters of the QCM. Moreover, we derived the linkage between the probability of success and the amount of entanglement in the shared W class of state. We analyzed the obtained result and found that even a less entangled W class of state can also play a vital role in the proposed secret-sharing scheme.
Submitted 13 August, 2024;
originally announced August 2024.
-
Multi-tree Quantum Routing in Realistic Topologies
Authors:
Zebo Yang,
Ali Ghubaish,
Raj Jain,
Ramana Kompella,
Hassan Shapourian
Abstract:
In entanglement distribution networks, communication between two nodes necessitates the generation of end-to-end entanglement by entanglement swapping at intermediate nodes. Efficiently creating end-to-end entanglements over long distances is a key objective. In our prior study on asynchronous routing, we enhanced these entanglement rates by leveraging solely the local knowledge of the entanglement links of a node. This was achieved by creating a tree structure, particularly a destination-oriented directed acyclic graph (DODAG) or a spanning tree, eliminating synchronous operations and conserving unused entanglement links. In this article, we present a multi-tree approach with multiple DODAGs designed to improve end-to-end entanglement rates in large-scale networks, specifically catering to a range of network topologies, including grids and barbells, as well as realistic topologies found in research testbeds like ESnet and Internet2. Our simulations show a marked improvement in end-to-end entanglement rates for specific topologies compared to the single-tree method. This study underscores the promise of asynchronous routing schemes in quantum networks, highlighting the effectiveness of asynchronous routing across different network topologies and proposing a superior routing tactic.
Submitted 12 August, 2024;
originally announced August 2024.
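As a rough illustration of the tree structure named in the abstract above, the following sketch builds a destination-oriented DAG over an undirected link graph by ranking nodes by hop count to the destination. This is only an assumed, simplified construction: the function name `build_dodag`, the hop-count rank, and the adjacency-list input are hypothetical and are not taken from the paper, which may rank nodes differently and operates on entanglement links rather than static edges.

```python
from collections import deque
from typing import Dict, List, Tuple


def build_dodag(
    adj: Dict[int, List[int]], destination: int
) -> Tuple[Dict[int, int], Dict[int, List[int]]]:
    """Rank every reachable node by hop distance to `destination` (BFS),
    then orient each node toward its neighbors with a strictly lower rank.

    Hypothetical helper for illustration only.
    """
    rank = {destination: 0}
    queue = deque([destination])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in rank:
                rank[neighbor] = rank[node] + 1
                queue.append(neighbor)

    # A node's parents are the neighbors exactly one hop closer to the
    # destination; edges always point "downhill", so no cycle can form.
    parents = {
        node: sorted(nb for nb in adj[node] if rank.get(nb) == rank[node] - 1)
        for node in rank
    }
    return rank, parents


if __name__ == "__main__":
    # 2x2 grid: 0-1, 0-2, 1-3, 2-3, with node 3 as the destination.
    grid = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
    rank, parents = build_dodag(grid, destination=3)
    print(rank)
    print(parents)
```

A multi-tree variant in this simplified picture would call `build_dodag` once per destination and keep the resulting parent maps side by side.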
-
Securing Tomorrow's Smart Cities: Investigating Software Security in Internet of Vehicles and Deep Learning Technologies
Authors:
Ridhi Jain,
Norbert Tihanyi,
Mohamed Amine Ferrag
Abstract:
Integrating Deep Learning (DL) techniques in the Internet of Vehicles (IoV) introduces many security challenges and issues that require thorough examination. This literature review delves into the inherent vulnerabilities and risks associated with DL in IoV systems, shedding light on the multifaceted nature of security threats. Through an extensive analysis of existing research, we explore potential threats posed by DL algorithms, including adversarial attacks, data privacy breaches, and model poisoning. Additionally, we investigate the impact of DL on critical aspects of IoV security, such as intrusion detection, anomaly detection, and secure communication protocols. Our review emphasizes the complexities of ensuring the robustness, reliability, and trustworthiness of DL-based IoV systems, given the dynamic and interconnected nature of vehicular networks. Furthermore, we discuss the need for novel security solutions tailored to address these challenges effectively and enhance the security posture of DL-enabled IoV environments. By offering insights into these critical issues, this chapter aims to stimulate further research, innovation, and collaboration in securing DL techniques within the context of the IoV, thereby fostering a safer and more resilient future for vehicular communication and connectivity.
Submitted 23 July, 2024;
originally announced July 2024.
-
avaTTAR: Table Tennis Stroke Training with On-body and Detached Visualization in Augmented Reality
Authors:
Dizhi Ma,
Xiyun Hu,
Jingyu Shi,
Mayank Patel,
Rahul Jain,
Ziyi Liu,
Zhengzhe Zhu,
Karthik Ramani
Abstract:
Table tennis stroke training is a critical aspect of player development. We designed a new augmented reality (AR) system, avaTTAR, for table tennis stroke training. The system provides both "on-body" (first-person view) and "detached" (third-person view) visual cues, enabling users to visualize target strokes and correct their attempts effectively with this dual-perspective setup. By employing a combination of pose estimation algorithms and IMU sensors, avaTTAR captures and reconstructs the 3D body pose and paddle orientation of users during practice, allowing real-time comparison with expert strokes. Through a user study, we affirm avaTTAR's capacity to amplify player experience and training results.
Submitted 26 July, 2024; v1 submitted 22 July, 2024;
originally announced July 2024.
-
A Perspective on Foundation Models for the Electric Power Grid
Authors:
Hendrik F. Hamann,
Thomas Brunschwiler,
Blazhe Gjorgiev,
Leonardo S. A. Martins,
Alban Puech,
Anna Varbella,
Jonas Weiss,
Juan Bernabe-Moreno,
Alexandre Blondin Massé,
Seong Choi,
Ian Foster,
Bri-Mathias Hodge,
Rishabh Jain,
Kibaek Kim,
Vincent Mai,
François Mirallès,
Martin De Montigny,
Octavio Ramos-Leaños,
Hussein Suprême,
Le Xie,
El-Nasser S. Youssef,
Arnaud Zinflou,
Alexander J. Belvi,
Ricardo J. Bessa,
Bishnu Prasad Bhattari
, et al. (2 additional authors not shown)
Abstract:
Foundation models (FMs) currently dominate news headlines. They employ advanced deep learning architectures to extract structural information autonomously from vast datasets through self-supervision. The resulting rich representations of complex systems and dynamics can be applied to many downstream applications. Therefore, FMs can find uses in electric power grids, challenged by the energy transition and climate change. In this paper, we call for the development of, and state why we believe in, the potential of FMs for electric grids. We highlight their strengths and weaknesses amidst the challenges of a changing grid. We argue that an FM learning from diverse grid data and topologies could unlock transformative capabilities, pioneering a new approach in leveraging AI to redefine how we manage complexity and uncertainty in the electric grid. Finally, we discuss a power grid FM concept, namely GridFM, based on graph neural networks and show how different downstream tasks benefit.
Submitted 12 July, 2024;
originally announced July 2024.
-
iMIV: in-Memory Integrity Verification for NVM
Authors:
Rajat Jain,
Aravinda Prasad,
Sreenivas Subramoney,
Arkaprava Basu
Abstract:
Non-volatile Memory (NVM) could bridge the gap between memory and storage. However, NVMs are susceptible to data remanence attacks. Thus, multiple security metadata must persist along with the data to protect the confidentiality and integrity of NVM-resident data. Persisting Bonsai Merkle Tree (BMT) nodes, critical for data integrity, can add significant overheads due to the need to write large amounts of metadata off-chip to the bandwidth-constrained NVMs.
We propose iMIV for low-overhead, fine-grained integrity verification through in-memory computing. We argue that memory-intensive integrity verification operations (BMT updates and verification) should be employed close to the NVM to limit off-chip data movement. We design iMIV based on typical NVDIMM designs that have an onboard logic chip with a trusted encryption engine, separate from the untrusted storage media. iMIV reduces the performance overheads from 205% to 55% when integrity verification operations are offloaded to NVM compared to when all the security operations are employed at the memory controller.
Submitted 12 July, 2024;
originally announced July 2024.
-
From Directed Steiner Tree to Directed Polymatroid Steiner Tree in Planar Graphs
Authors:
Chandra Chekuri,
Rhea Jain,
Shubhang Kulkarni,
Da Wei Zheng,
Weihao Zhu
Abstract:
In the Directed Steiner Tree (DST) problem the input is a directed edge-weighted graph $G=(V,E)$, a root vertex $r$ and a set $S \subseteq V$ of $k$ terminals. The goal is to find a min-cost subgraph that connects $r$ to each of the terminals. DST admits an $O(\log^2 k/\log \log k)$-approximation in quasi-polynomial time, and an $O(k^ε)$-approximation for any fixed $ε > 0$ in polynomial time. Resolving the existence of a polynomial-time poly-logarithmic approximation is a major open problem in approximation algorithms. In a recent work, Friggstad and Mousavi [ICALP 2023] obtained a simple and elegant polynomial-time $O(\log k)$-approximation for DST in planar digraphs via Thorup's shortest path separator theorem. We build on their work and obtain several new results on DST and related problems.
- We develop a tree embedding technique for rooted problems in planar digraphs via an interpretation of the recursion in Friggstad and Mousavi [ICALP 2023]. Using this we obtain polynomial-time poly-logarithmic approximations for Group Steiner Tree, Covering Steiner Tree, and the Polymatroid Steiner Tree problems in planar digraphs. All these problems are hard to approximate to within a factor of $Ω(\log^2 n/\log \log n)$ even in trees.
- We prove that the natural cut-based LP relaxation for DST has an integrality gap of $O(\log^2 k)$ in planar graphs. This is in contrast to general graphs where the integrality gap of this LP is known to be $Ω(k)$ and $Ω(n^δ)$ for some fixed $δ> 0$.
- We combine the preceding results with density based arguments to obtain poly-logarithmic approximations for the multi-rooted versions of the problems in planar digraphs. For DST our result improves the $O(R + \log k)$ approximation of Friggstad and Mousavi [ICALP 2023] when $R= ω(\log^2 k)$.
Submitted 1 July, 2024;
originally announced July 2024.
-
Assessing the Value of Coupling Thermal Energy Storage with Air-Source Heat Pumps for Residential Space Heating in U.S. Cities
Authors:
An T. Pham,
Bryan Kinzer,
Ritvik Jain,
Rohini Bala Chandran,
Michael T. Craig
Abstract:
Widespread air source heat pump (ASHP) adoption faces several challenges that on-site thermal energy storage (TES), particularly thermochemical salt hydrate TES, can mitigate. No techno-economic analyses for salt-hydrate-based TES in residential applications exist. We quantify the residential space heating value of four salt hydrate TES materials - MgSO4, MgCl2, K2CO3, and SrBr2 - coupled with ASHPs across 4,800 representative households in 12 U.S. cities by embedding salt-hydrate-specific Ragone plots into a techno-economic model of coupled ASHP-TES operations. In Detroit, salt hydrate TES is projected to reduce household annual electricity costs by up to $\$$241 (8$\%$). Cost savings from TES can differ by over an order of magnitude between households and salt hydrates. We identify the most promising salt in this study, SrBr2, due to its high energy density and low humidification parasitic load. Break-even capital costs of SrBr2-based TES range from $\$$13/kWh to $\$$17/kWh, making it the only salt hydrate studied to reach and exceed the U.S. Department of Energy's $\$$15/kWh TES cost target. Sensitivities highlight the importance of variable TES sizing and efficiency losses in the value of TES.
Submitted 29 June, 2024;
originally announced July 2024.
-
Fuzzing at Scale: The Untold Story of the Scheduler
Authors:
Ivica Nikolic,
Racchit Jain
Abstract:
How to search for bugs in 1,000 programs using a pre-existing fuzzer and a standard PC? We consider this problem and show that a well-designed strategy that determines which programs to fuzz and for how long can greatly impact the number of bugs found across the programs. In fact, the impact of employing an effective strategy is comparable to that of utilizing a state-of-the-art fuzzer. The considered problem is referred to as fuzzing at scale, and the strategy as the scheduler. We show that besides a naive scheduler, which allocates equal fuzz time to all programs, we can consider dynamic schedulers that adjust time allocation based on the ongoing fuzzing progress of individual programs. Such schedulers are superior because they lead both to a higher total number of found bugs and to a higher number of found bugs for most programs. The performance gap between naive and dynamic schedulers can be as wide as (or even wider than) the gap between two fuzzers. Our findings thus suggest that the problem of advancing schedulers is fundamental for fuzzing at scale. We develop several schedulers and leverage the most sophisticated one to simultaneously fuzz our newly compiled benchmark of around 5,000 Ubuntu programs, and detect 4,908 bugs.
Submitted 26 June, 2024;
originally announced June 2024.
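To make the naive versus dynamic distinction in the abstract above concrete, here is a toy sketch. Everything in it is an assumption made for illustration: programs are modeled as lists of "discovery times" (the cumulative fuzz time at which each bug would surface), and the dynamic policy shown is a simple greedy rule (give the next time slice to the program whose last slice found the most new bugs). It is not one of the schedulers developed in the paper, and the names `naive_scheduler`, `dynamic_scheduler`, and `total_bugs` are hypothetical.

```python
from typing import Dict, List


def bugs_found(discovery_times: List[int], fuzz_time: int) -> int:
    """Count bugs whose (simulated) discovery time fits in the time spent."""
    return sum(1 for t in discovery_times if t <= fuzz_time)


def naive_scheduler(programs: Dict[str, List[int]], budget: int) -> Dict[str, int]:
    """Allocate an equal share of the fuzz-time budget to every program."""
    share = budget // len(programs)
    return {name: share for name in programs}


def dynamic_scheduler(
    programs: Dict[str, List[int]], budget: int, slice_len: int = 1
) -> Dict[str, int]:
    """Greedy toy policy: the next slice goes to the program whose most
    recent slice yielded the most new bugs; ties go to the least-fuzzed one."""
    spent = {name: 0 for name in programs}
    # Infinite initial gain forces one exploratory slice per program.
    last_gain = {name: float("inf") for name in programs}
    remaining = budget
    while remaining >= slice_len:
        name = max(programs, key=lambda n: (last_gain[n], -spent[n]))
        before = bugs_found(programs[name], spent[name])
        spent[name] += slice_len
        last_gain[name] = bugs_found(programs[name], spent[name]) - before
        remaining -= slice_len
    return spent


def total_bugs(programs: Dict[str, List[int]], allocation: Dict[str, int]) -> int:
    """Total bugs found across all programs under a given time allocation."""
    return sum(bugs_found(programs[n], allocation[n]) for n in programs)


if __name__ == "__main__":
    # Simulated targets: "a" is bug-rich, "b" has no bugs, "c" has one deep bug.
    targets = {"a": [1, 2, 3, 4, 5, 6], "b": [], "c": [10]}
    for scheduler in (naive_scheduler, dynamic_scheduler):
        allocation = scheduler(targets, 9)
        print(scheduler.__name__, allocation, total_bugs(targets, allocation))
```

The greedy rule is deliberately crude: it abandons a program after a single unproductive slice, so it can miss deep bugs such as the one in `"c"`. A real scheduler would need some mechanism to revisit such programs.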
-
Towards a data-driven and scalable approach for window operation detection in multi-family residential buildings
Authors:
Juliet Nwagwu Ume-Ezeoke,
Kopal Nihar,
Catherine Gorle,
Rishee Jain
Abstract:
Natural cooling, utilizing non-mechanical cooling, presents a low-carbon and low-cost way to provide thermal comfort in residential buildings. However, designing naturally cooled buildings requires a clear understanding of how opening and closing windows affect occupants' comfort. Predicting when and why occupants open windows is a challenging task, often relying on specialized sensors and building-specific training data. This limits the scalability of natural cooling solutions. Here, we propose a novel unsupervised method that utilizes easily deployable off-the-shelf temperature and humidity sensors to detect window operations. The effectiveness of our approach is evaluated using an empirical dataset and compared with a state-of-the-art support vector machine (SVM) model. The results demonstrate that our proposed method outperforms the SVM on key indicators, except when indoor and outdoor temperatures have small differences. Unlike the SVM's sensitivity to time series characteristics, our proposed method relies solely on indoor temperature and exhibits robust performance in pilot studies, making it a promising candidate for developing a highly scalable and generalizable window operation detection model. This work demonstrates the potential of unsupervised data-driven methods for understanding window operations in residential buildings. By enabling more accurate modeling of naturally cooled buildings, our work aims to facilitate the widespread adoption of this low-cost and low-carbon technology.
Submitted 20 June, 2024;
originally announced June 2024.
-
Multimodal Segmentation for Vocal Tract Modeling
Authors:
Rishi Jain,
Bohan Yu,
Peter Wu,
Tejas Prabhune,
Gopala Anumanchipalli
Abstract:
Accurate modeling of the vocal tract is necessary to construct articulatory representations for interpretable speech processing and linguistics. However, vocal tract modeling is challenging because many internal articulators are occluded from external motion capture technologies. Real-time magnetic resonance imaging (RT-MRI) allows measuring precise movements of internal articulators during speech, but annotated datasets of MRI are limited in size due to time-consuming and computationally expensive labeling methods. We first present a deep labeling strategy for the RT-MRI video using a vision-only segmentation approach. We then introduce a multimodal algorithm using audio to improve segmentation of vocal articulators. Together, we set a new benchmark for vocal tract modeling in MRI video segmentation and use this to release labels for a 75-speaker RT-MRI dataset, increasing the amount of labeled public RT-MRI data of the vocal tract by over a factor of 9. The code and dataset labels can be found at rishiraij.github.io/multimodal-mri-avatar/.
Submitted 22 June, 2024;
originally announced June 2024.
-
Exponential Time Approximation for Coloring 3-Colorable Graphs
Authors:
Venkatesan Guruswami,
Rhea Jain
Abstract:
The problem of efficiently coloring $3$-colorable graphs with few colors has received much attention on both the algorithmic and inapproximability fronts. We consider exponential time approximations, in which given a parameter $r$, we aim to develop an $r$-approximation algorithm with the best possible runtime, providing a tradeoff between runtime and approximation ratio. In this vein, an algorithm to $O(n^\varepsilon)$-color $3$-colorable graphs in time $2^{Θ(n^{1-2\varepsilon}\log(n))}$ is given in (Atserias and Dalmau, SODA 2022).
We build on tools developed in (Bansal et al., Algorithmica, 2019) to obtain an algorithm to color $3$-colorable graphs with $O(r)$ colors in $\exp\left(\tilde{O}\left(\frac {n\log^{11/2}r} {r^3}\right)\right)$ time, asymptotically improving upon the bound given by Atserias and Dalmau.
Submitted 21 June, 2024;
originally announced June 2024.
-
Examining the Implications of Deepfakes for Election Integrity
Authors:
Hriday Ranka,
Mokshit Surana,
Neel Kothari,
Veer Pariawala,
Pratyay Banerjee,
Aditya Surve,
Sainath Reddy Sankepally,
Raghav Jain,
Jhagrut Lalwani,
Swapneel Mehta
Abstract:
It is becoming cheaper to launch disinformation operations at scale using AI-generated content, in particular 'deepfake' technology. We have observed instances of deepfakes in political campaigns, where generated content is employed to both bolster the credibility of certain narratives (reinforcing outcomes) and manipulate public perception to the detriment of targeted candidates or causes (adversarial outcomes). We discuss the threats from deepfakes in politics, highlight model specifications underlying different types of deepfake generation methods, and contribute an accessible evaluation of the efficacy of existing detection methods. We provide this as a summary for lawmakers and civil society actors to understand how the technology may be applied in light of existing policies regulating its use. We highlight the limitations of existing detection mechanisms and discuss the areas where policies and regulations are required to address the challenges of deepfakes.
Submitted 20 June, 2024;
originally announced June 2024.
-
Online Bandit Learning with Offline Preference Data
Authors:
Akhil Agnihotri,
Rahul Jain,
Deepak Ramachandran,
Zheng Wen
Abstract:
Reinforcement Learning with Human Feedback (RLHF) is at the core of fine-tuning methods for generative AI models for language and images. Such feedback is often sought as rank or preference feedback from human raters, as opposed to eliciting scores since the latter tends to be very noisy. On the other hand, RL theory and algorithms predominantly assume that reward feedback is available. In particular, approaches for online learning that can be helpful in adaptive data collection via active learning cannot incorporate offline preference data. In this paper, we adopt a finite-armed linear bandit model as a prototypical model of online learning. We assume that an offline preference dataset is available, generated by an expert of unknown 'competence'. We propose $\texttt{warmPref-PS}$, a posterior sampling algorithm for online learning that can be warm-started with an offline dataset with noisy preference feedback. We show that by modeling the competence of the expert that generated it, we are able to use such a dataset most effectively. We support our claims with novel theoretical analysis of its Bayesian regret, as well as extensive empirical evaluation of an approximate algorithm which performs substantially better than baselines (almost 25 to 50% regret reduction in our studies).
Submitted 13 June, 2024;
originally announced June 2024.
-
e-COP: Episodic Constrained Optimization of Policies
Authors:
Akhil Agnihotri,
Rahul Jain,
Deepak Ramachandran,
Sahil Singla
Abstract:
In this paper, we present the $\texttt{e-COP}$ algorithm, the first policy optimization algorithm for constrained Reinforcement Learning (RL) in episodic (finite horizon) settings. Such formulations are applicable when there are separate sets of optimization criteria and constraints on a system's behavior. We approach this problem by first establishing a policy difference lemma for the episodic setting, which provides the theoretical foundation for the algorithm. Then, we propose to combine a set of established and novel solution ideas to yield the $\texttt{e-COP}$ algorithm that is easy to implement and numerically stable, and provide a theoretical guarantee on optimality under certain scaling assumptions. Through extensive empirical analysis using benchmarks in the Safety Gym suite, we show that our algorithm has similar or better performance than SoTA (non-episodic) algorithms adapted for the episodic setting. The scalability of the algorithm opens the door to its application in safety-constrained Reinforcement Learning from Human Feedback for Large Language or Diffusion Models.
Submitted 13 June, 2024;
originally announced June 2024.
-
DocSynthv2: A Practical Autoregressive Modeling for Document Generation
Authors:
Sanket Biswas,
Rajiv Jain,
Vlad I. Morariu,
Jiuxiang Gu,
Puneet Mathur,
Curtis Wigington,
Tong Sun,
Josep Lladós
Abstract:
While the generation of document layouts has been extensively explored, comprehensive document generation encompassing both layout and content presents a more complex challenge. This paper delves into this advanced domain, proposing a novel approach called DocSynthv2 through the development of a simple yet effective autoregressive structured model. Our model, distinct in its integration of both layout and textual cues, marks a step beyond existing layout-generation approaches. By focusing on the relationship between the structural elements and the textual content within documents, we aim to generate cohesive and contextually relevant documents without any reliance on visual components. Through experimental studies on our curated benchmark for the new task, we demonstrate the ability of our model combining layout and textual information in enhancing the generation quality and relevance of documents, opening new pathways for research in document creation and automated design. Our findings emphasize the effectiveness of autoregressive models in handling complex document generation tasks.
Submitted 12 June, 2024;
originally announced June 2024.
-
MemeGuard: An LLM and VLM-based Framework for Advancing Content Moderation via Meme Intervention
Authors:
Prince Jha,
Raghav Jain,
Konika Mandal,
Aman Chadha,
Sriparna Saha,
Pushpak Bhattacharyya
Abstract:
In the digital world, memes present a unique challenge for content moderation due to their potential to spread harmful content. Although detection methods have improved, proactive solutions such as intervention are still limited, with current research focusing mostly on text-based content, neglecting the widespread influence of multimodal content like memes. Addressing this gap, we present MemeGuard, a comprehensive framework leveraging Large Language Models (LLMs) and Visual Language Models (VLMs) for meme intervention. MemeGuard harnesses a specially fine-tuned VLM, VLMeme, for meme interpretation, and a multimodal knowledge selection and ranking mechanism (MKS) for distilling relevant knowledge. This knowledge is then employed by a general-purpose LLM to generate contextually appropriate interventions. Another key contribution of this work is the Intervening Cyberbullying in Multimodal Memes (ICMM) dataset, a high-quality, labeled dataset featuring toxic memes and their corresponding human-annotated interventions. We leverage ICMM to test MemeGuard, demonstrating its proficiency in generating relevant and effective responses to toxic memes.
Submitted 8 June, 2024;
originally announced June 2024.
-
Nuclear Data to Quantify Urca Cooling in Accreting Neutron Stars
Authors:
Rahul Jain
Abstract:
Neutron stars in Low Mass X-ray Binaries (LMXBs) can accrete matter onto their surface from the companion star. Transiently accreting neutron stars go through alternating phases of active accretion outbursts and quiescence. X-ray observations during the quiescence phase show a drop in X-ray luminosity with the time in quiescence. This is also inferred as the drop in surface temperature or the cooling of accreting neutron stars in quiescence. Analyzing these cooling curves reveals a great deal of information about the structure and composition of neutron stars. However, model-observation comparisons of such cooling curves are challenging - partly due to observational uncertainties, and partly due to incomplete knowledge of heating mechanisms during accretion outbursts. This situation is further exacerbated by the recent discovery of Urca cooling in the neutron star crust. These are cycles that alternate between electron-capture and beta-decay to produce a large flux of neutrinos and anti-neutrinos. These freely stream out of the star and carry energy with them, essentially cooling the neutron star crust without changing the composition. As a result, it is necessary to accurately quantify the strength of Urca cooling to constrain the heat sources in neutron star crusts and facilitate better model-observation comparisons of the cooling curves.
Submitted 4 June, 2024;
originally announced June 2024.
-
Harvard Undergraduate Survey on Generative AI
Authors:
Shikoh Hirabayashi,
Rishab Jain,
Nikola Jurković,
Gabriel Wu
Abstract:
How has generative AI impacted the experiences of college students? We study the influence of AI on the study habits, class choices, and career prospects of Harvard undergraduates (n=326), finding that almost 90% of students use generative AI. For roughly 25% of these students, AI has begun to substitute for attending office hours and completing required readings. Half of students are concerned that AI will negatively impact their job prospects, and over half of students wish that Harvard had more classes on the future impacts of AI. We also investigate students' outlook on the broader social implications of AI, finding that half of students are worried that AI will increase economic inequality, and 40% believe that extinction risk from AI should be treated as a global priority with the same urgency as pandemics and nuclear war. Around half of students who have taken a class on AI expect AI to exceed human capabilities on almost all tasks within 30 years. We make some recommendations to the Harvard community in light of these results.
Submitted 7 August, 2024; v1 submitted 2 June, 2024;
originally announced June 2024.
-
Shotluck Holmes: A Family of Efficient Small-Scale Large Language Vision Models For Video Captioning and Summarization
Authors:
Richard Luo,
Austin Peng,
Adithya Vasudev,
Rishabh Jain
Abstract:
Video is an increasingly prominent and information-dense medium, yet it poses substantial challenges for language models. A typical video consists of a sequence of shorter segments, or shots, that collectively form a coherent narrative. Each shot is analogous to a word in a sentence where multiple data streams of information (such as visual and auditory data) must be processed simultaneously. Comprehension of the entire video requires not only understanding the visual-audio information of each shot but also requires that the model links the ideas between each shot to generate a larger, all-encompassing story. Despite significant progress in the field, current works often overlook videos' more granular shot-by-shot semantic information. In this project, we propose a family of efficient large language vision models (LLVMs) to boost video summarization and captioning called Shotluck Holmes. By leveraging better pretraining and data collection strategies, we extend the abilities of existing small LLVMs from being able to understand a picture to being able to understand a sequence of frames. Specifically, we show that Shotluck Holmes achieves better performance than state-of-the-art results on the Shot2Story video captioning and summary task with significantly smaller and more computationally efficient models.
Submitted 31 May, 2024;
originally announced May 2024.
-
Modified Six State Cryptographic Protocol with Entangled Ancilla Component States
Authors:
Rashi Jain,
Satyabrata Adhikari
Abstract:
In a realistic situation, it is very difficult to communicate securely between two distant parties without introducing any disturbances. These disturbances might occur either due to external noise or due to the interference of an eavesdropper sitting between the sender and the receiver. In this work, we probe whether a secret key can be generated even if the eavesdropper is able to construct an entangled ancilla state in such a way that she can extract information from the intercepted qubit. To achieve this task, we consider and modify the six-state QKD protocol in which Eve can construct the unitary transformation that may make all ancilla components entangled at the output. Then, we calculate the mutual information between Alice and Bob and between Alice and Eve, and identify the region where the secret key is generated even in the presence of Eve. We find that, in general, the mutual information of Alice and Eve depends not only on the disturbance D but also on the concurrence of the ancilla component states. We have further shown that it is possible to derive the disturbance-free mutual information of Alice and Eve, if Eve manipulates her entangled ancilla state in a particular manner. Thus, in this way, we are able to show that a secret key can be generated between Alice and Bob even if the disturbance is large enough. Moreover, we show that Bruss's six-state QKD protocol fails to generate the secret key in the region where the modified six-state QKD protocol can generate the secret key.
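The comparison of mutual informations described in this abstract can be illustrated with a minimal sketch. It assumes, as a textbook simplification not taken from the paper, that the Alice-Bob channel behaves like a binary symmetric channel with error rate D and uniform input bits, so that their mutual information is 1 - h(D), and that a key is possible when this exceeds the Alice-Eve mutual information. The function names are hypothetical, and Eve's information is passed in as a plain number because the abstract does not give the paper's expression for it.

```python
import math


def binary_entropy(p: float) -> float:
    """Shannon entropy h(p) in bits of a binary variable with P(1) = p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def mutual_info_alice_bob(disturbance: float) -> float:
    """I(A:B) = 1 - h(D), assuming a binary symmetric channel with error rate D."""
    return 1.0 - binary_entropy(disturbance)


def key_possible(disturbance: float, info_alice_eve: float) -> bool:
    """Assumed criterion: Bob must know more about Alice's bits than Eve does.

    info_alice_eve is a placeholder value supplied by the caller; the paper's
    own formula (which depends on the ancilla concurrence) is not reproduced here.
    """
    return mutual_info_alice_bob(disturbance) > info_alice_eve
```

For example, `key_possible(0.0, 0.5)` holds because an undisturbed channel gives Alice and Bob one full bit per use, while `key_possible(0.5, 0.1)` does not because a disturbance of 1/2 leaves them with no shared information.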
Submitted 27 May, 2024;
originally announced May 2024.
-
Pure Exploration for Constrained Best Mixed Arm Identification with a Fixed Budget
Authors:
Dengwang Tang,
Rahul Jain,
Ashutosh Nayyar,
Pierluigi Nuzzo
Abstract:
In this paper, we introduce the constrained best mixed arm identification (CBMAI) problem with a fixed budget. This is a pure exploration problem in a stochastic finite armed bandit model. Each arm is associated with a reward and multiple types of costs from unknown distributions. Unlike the unconstrained best arm identification problem, the optimal solution for the CBMAI problem may be a randomized mixture of multiple arms. The goal thus is to find the best mixed arm that maximizes the expected reward subject to constraints on the expected costs with a given learning budget $N$. We propose a novel, parameter-free algorithm, called the Score Function-based Successive Reject (SFSR) algorithm, that combines the classical successive reject framework with a novel score-function-based rejection criterion based on linear programming theory to identify the optimal support. We provide a theoretical upper bound on the mis-identification probability (of the support of the best mixed arm) and show that it decays exponentially in the budget $N$ and some constants that characterize the hardness of the problem instance. We also develop an information theoretic lower bound on the error probability that shows that these constants appropriately characterize the problem difficulty. We validate this empirically on a number of average and hard instances.
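For orientation, the classical successive reject framework that this abstract builds on can be sketched as follows. This is the standard unconstrained version that returns a single best arm and rejects by lowest empirical mean; it is not the SFSR algorithm, whose score function and handling of cost constraints are not described in the abstract. The names `successive_rejects` and `pull` are hypothetical.

```python
import math
from typing import Callable


def successive_rejects(pull: Callable[[int], float], num_arms: int, budget: int) -> int:
    """Classical Successive Rejects for unconstrained best arm identification.

    pull(arm) returns one reward sample; budget is the total number of pulls
    allowed. In each of the num_arms - 1 phases, every surviving arm is
    sampled equally and the arm with the lowest empirical mean is rejected.
    """
    if budget <= num_arms:
        raise ValueError("budget must exceed the number of arms")
    # Normalising constant of the phase-length schedule.
    log_bar = 0.5 + sum(1.0 / i for i in range(2, num_arms + 1))
    active = list(range(num_arms))
    totals = [0.0] * num_arms
    counts = [0] * num_arms
    prev_n = 0
    for phase in range(1, num_arms):
        # Cumulative number of pulls per surviving arm by the end of this phase.
        n_k = math.ceil((budget - num_arms) / (log_bar * (num_arms + 1 - phase)))
        for arm in active:
            for _ in range(n_k - prev_n):
                totals[arm] += pull(arm)
                counts[arm] += 1
        prev_n = n_k
        # Plain empirical-mean rejection rule (an assumption; SFSR uses a score function).
        worst = min(active, key=lambda a: totals[a] / counts[a])
        active.remove(worst)
    return active[0]
```

With a noiseless `pull` such as `lambda a: [0.1, 0.9, 0.5][a]` and a budget of 30, the procedure rejects arm 0, then arm 2, and returns arm 1.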
Submitted 23 May, 2024;
originally announced May 2024.
-
Service Mesh: Architectures, Applications, and Implementations
Authors:
Behrooz Farkiani,
Raj Jain
Abstract:
The scalability and flexibility of microservice architecture have led to major changes in cloud-native application architectures. However, the complexity of managing thousands of small services written in different languages and handling the exchange of data between them has caused significant management challenges. Service mesh is a promising solution that could mitigate these problems by introducing an overlay layer on top of the services. In this paper, we first study the architecture and components of a service mesh. Then, we review two important service mesh implementations and discuss how the service mesh could be helpful in other areas, including 5G.
Submitted 22 May, 2024;
originally announced May 2024.
-
Empathy Through Multimodality in Conversational Interfaces
Authors:
Mahyar Abbasian,
Iman Azimi,
Mohammad Feli,
Amir M. Rahmani,
Ramesh Jain
Abstract:
Agents represent one of the fastest-emerging applications of Large Language Models (LLMs) and Generative AI, with their effectiveness hinging on multimodal capabilities to navigate complex user environments. Conversational Health Agents (CHAs), a prime example of this, are redefining healthcare by offering nuanced support that transcends textual analysis to incorporate emotional intelligence. This paper introduces an LLM-based CHA engineered for rich, multimodal dialogue, especially in the realm of mental health support. It adeptly interprets and responds to users' emotional states by analyzing multimodal cues, thus delivering contextually aware and empathetically resonant verbal responses. Our implementation leverages the versatile openCHA framework, and our comprehensive evaluation involves neutral prompts expressed in diverse emotional tones: sadness, anger, and joy. We evaluate the consistency and repeatability of the planning capability of the proposed CHA. Furthermore, human evaluators critique the CHA's empathic delivery, with findings revealing a striking concordance between the CHA's outputs and evaluators' assessments. These results affirm the indispensable role of vocal (soon multimodal) emotion recognition in strengthening the empathetic connection built by CHAs, cementing their place at the forefront of interactive, compassionate digital health solutions.
Submitted 7 May, 2024;
originally announced May 2024.
-
Do Neutral Prompts Produce Insecure Code? FormAI-v2 Dataset: Labelling Vulnerabilities in Code Generated by Large Language Models
Authors:
Norbert Tihanyi,
Tamas Bisztray,
Mohamed Amine Ferrag,
Ridhi Jain,
Lucas C. Cordeiro
Abstract:
This study provides a comparative analysis of state-of-the-art large language models (LLMs), analyzing how likely they are to generate vulnerabilities when writing simple C programs using a neutral zero-shot prompt. We address a significant gap in the literature concerning the security properties of code produced by these models without specific directives. N. Tihanyi et al. introduced the FormAI dataset at PROMISE '23, containing 112,000 GPT-3.5-generated C programs, with over 51.24% identified as vulnerable. We expand that work by introducing the FormAI-v2 dataset comprising 265,000 compilable C programs generated using various LLMs, ranging from robust models such as Google's GEMINI-pro, OpenAI's GPT-4, and TII's 180 billion-parameter Falcon to Meta's specialized 13 billion-parameter CodeLLama2 and various other compact models. Each program in the dataset is labelled based on the vulnerabilities detected in its source code through formal verification using the Efficient SMT-based Context-Bounded Model Checker (ESBMC). This technique eliminates false positives by delivering a counterexample and ensures the exclusion of false negatives by completing the verification process. Our study reveals that at least 63.47% of the generated programs are vulnerable. The differences between the models are minor, as they all display similar coding errors with slight variations. Our research highlights that while LLMs offer promising capabilities for code generation, deploying their output in a production environment requires risk assessment and validation.
Submitted 28 April, 2024;
originally announced April 2024.
-
LEMDA: A Novel Feature Engineering Method for Intrusion Detection in IoT Systems
Authors:
Ali Ghubaish,
Zebo Yang,
Aiman Erbad,
Raj Jain
Abstract:
Intrusion detection systems (IDS) for Internet of Things (IoT) systems can use AI-based models to ensure secure communications. IoT systems tend to have many connected devices producing massive amounts of data with high dimensionality, which requires complex models. Complex models have notorious problems such as overfitting, low interpretability, and high computational complexity. Adding a model complexity penalty (i.e., regularization) can ease overfitting, but it barely helps interpretability and computational efficiency. Feature engineering can solve these issues; hence, it has become critical for IDS in large-scale IoT systems to reduce the size and dimensionality of data, resulting in less complex models with excellent performance, smaller data storage, and fast detection. This paper proposes a new feature engineering method called LEMDA (Light feature Engineering based on the Mean Decrease in Accuracy). LEMDA applies exponential decay and an optional sensitivity factor to select and create the most informative features. The proposed method has been evaluated and compared to other feature engineering methods using three IoT datasets and four AI/ML models. The results show that LEMDA improves the F1 score performance of all the IDS models by an average of 34% and reduces the average training and detection times in most cases.
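The Mean Decrease in Accuracy that LEMDA takes its name from can be sketched as a plain permutation-importance computation: shuffle one feature column at a time and record how much the accuracy of a fixed predictor drops. This is only the generic MDA idea under that assumption. LEMDA's exponential decay, sensitivity factor, and feature-creation step are not specified in the abstract and are not reproduced here. All names below (`mean_decrease_in_accuracy`, `predict`, the toy data) are hypothetical.

```python
import random
from typing import Callable, List, Sequence


def accuracy(predict: Callable[[Sequence[float]], int],
             rows: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Fraction of rows for which the predictor matches the label."""
    correct = sum(1 for row, y in zip(rows, labels) if predict(row) == y)
    return correct / len(labels)


def mean_decrease_in_accuracy(predict: Callable[[Sequence[float]], int],
                              rows: Sequence[Sequence[float]],
                              labels: Sequence[int],
                              n_repeats: int = 5, seed: int = 0) -> List[float]:
    """Average accuracy drop per feature when that feature's column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    drops = []
    for j in range(len(rows[0])):
        total_drop = 0.0
        for _ in range(n_repeats):
            column = [row[j] for row in rows]
            rng.shuffle(column)  # break the link between feature j and the label
            permuted = [list(row[:j]) + [v] + list(row[j + 1:])
                        for row, v in zip(rows, column)]
            total_drop += baseline - accuracy(predict, permuted, labels)
        drops.append(total_drop / n_repeats)
    return drops


# Toy data: feature 0 determines the label, feature 1 is unrelated.
rows = [[i % 2, (i * 7) % 5] for i in range(200)]
labels = [row[0] for row in rows]
drops = mean_decrease_in_accuracy(lambda row: row[0], rows, labels)
```

In this toy setup the predictor ignores feature 1, so shuffling it changes nothing, while shuffling feature 0 destroys most of the predictor's accuracy. A feature-selection step would then keep the features with the largest drops.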
Submitted 20 April, 2024;
originally announced April 2024.
-
Approximation Algorithms for Hop Constrained and Buy-at-Bulk Network Design via Hop Constrained Oblivious Routing
Authors:
Chandra Chekuri,
Rhea Jain
Abstract:
We consider two-cost network design models in which edges of the input graph have an associated cost and length. We build upon recent advances in hop-constrained oblivious routing to obtain two sets of results.
We address multicommodity buy-at-bulk network design in the nonuniform setting. Existing poly-logarithmic approximations are based on the junction tree approach [CHKS09,KN11]. We obtain a new polylogarithmic approximation via a natural LP relaxation. This establishes an upper bound on its integrality gap and affirmatively answers an open question raised in [CHKS09]. The rounding is based on recent results in hop-constrained oblivious routing [GHZ21], and this technique yields a polylogarithmic approximation in more general settings such as set connectivity. Our algorithm for buy-at-bulk network design is based on an LP-based reduction to hop constrained network design for which we obtain LP-based bicriteria approximation algorithms.
We also consider a fault-tolerant version of hop constrained network design where one wants to design a low-cost network to guarantee short paths between a given set of source-sink pairs even when k-1 edges can fail. This model has been considered in network design [GL17,GML18,AJL20] but no approximation algorithms were known. We obtain polylogarithmic bicriteria approximation algorithms for the single-source setting for any fixed k. We build upon the single-source algorithm and the junction-tree approach to obtain an approximation algorithm for the multicommodity setting when at most one edge can fail.
Submitted 25 April, 2024;
originally announced April 2024.
-
Robust and composable device-independent quantum protocols for oblivious transfer and bit commitment
Authors:
Rishabh Batra,
Sayantan Chakraborty,
Rahul Jain,
Upendra Kapshikar
Abstract:
We present robust and composable device-independent quantum protocols for oblivious transfer (OT) and bit commitment (BC) using Magic Square devices. We assume there is no long-term quantum memory, that is, after a finite time interval, referred to as DELAY, the states stored in the devices decohere. By robustness, which is a highlight of our protocols, we mean that the protocols are correct and secure even when devices are slightly off from their ideal specifications (the faulty but non-malicious regime). This is an important property, since in the real world, devices would certainly have small manufacturing errors and cannot be expected to be ideal. To the best of our understanding and knowledge, none of the known DI protocols for OT and BC in the literature are robust; they cannot guarantee correctness in the faulty but non-malicious regime. Our protocols are sequentially composable and hence can be used as building blocks to construct larger protocols, while still preserving security guarantees.
Submitted 17 April, 2024;
originally announced April 2024.
-
A new approach to construct minimal linear codes over $\mathbb{F}_{3}$
Authors:
Wajid M. Shaikh,
Rupali S. Jain,
B. Surendranath Reddy,
Bhagyashri S. Patil,
Sahar M. A. Maqbol
Abstract:
In this article, we present two new approaches to construct minimal linear codes of dimension $n+1$ over $\mathbb{F}_{3}$ using characteristic and ternary functions. We also obtain the weight distributions of these constructed minimal linear codes. We further show that a specific class of these codes violates the Ashikhmin-Barg condition.
Submitted 10 April, 2024;
originally announced April 2024.
-
Commitments are equivalent to one-way state generators
Authors:
Rishabh Batra,
Rahul Jain
Abstract:
One-way state generators (OWSG) are natural quantum analogs to classical one-way functions. We show that $O\left(\frac{n}{\log(n)}\right)$-copy OWSGs ($n$ represents the input length) are equivalent to $poly(n)$-copy OWSG and to quantum commitments. Since known results show that $o\left(\frac{n}{\log(n)}\right)$-copy OWSG cannot imply commitments, this shows that $O\left(\frac{n}{\log(n)}\right)$-copy OWSGs are the weakest OWSGs from which we can get commitments (and hence much of quantum cryptography).
Our construction follows along the lines of Håstad, Impagliazzo, Levin and Luby [HILL], who obtained classical pseudorandom generators (PRG) from classical one-way functions (OWF), however with crucial modifications. Our construction, when applied to the classical case, provides an alternative to the construction provided by [HILL]. Since we do not argue conditioned on the output of the one-way function, our construction and analysis are arguably simpler and may be of independent interest.
Submitted 17 April, 2024; v1 submitted 4 April, 2024;
originally announced April 2024.
-
NLP at UC Santa Cruz at SemEval-2024 Task 5: Legal Answer Validation using Few-Shot Multi-Choice QA
Authors:
Anish Pahilajani,
Samyak Rajesh Jain,
Devasha Trivedi
Abstract:
This paper presents our submission to the SemEval 2024 Task 5: The Legal Argument Reasoning Task in Civil Procedure. We present two approaches to solving the task of legal answer validation, given an introduction to the case, a question and an answer candidate. Firstly, we fine-tuned pre-trained BERT-based models and found that models trained on domain knowledge perform better. Secondly, we performed few-shot prompting on GPT models and found that reformulating the answer validation task to be a multiple-choice QA task remarkably improves the performance of the model. Our best submission is a BERT-based model that achieved 7th place out of 20.
Submitted 3 April, 2024;
originally announced April 2024.
-
DE-HNN: An effective neural model for Circuit Netlist representation
Authors:
Zhishang Luo,
Truong Son Hy,
Puoya Tabaghi,
Donghyeon Koh,
Michael Defferrard,
Elahe Rezaei,
Ryan Carey,
Rhett Davis,
Rajeev Jain,
Yusu Wang
Abstract:
The run-time for optimization tools used in chip design has grown with the complexity of designs to the point where it can take several days to go through one design cycle, which has become a bottleneck. Designers want fast tools that can quickly give feedback on a design. Using the input and output data of the tools from past designs, one can attempt to build a machine learning model that predicts the outcome of a design in significantly shorter time than running the tool. The accuracy of such models is affected by the representation of the design data, which is usually a netlist that describes the elements of the digital circuit and how they are connected. Graph representations for the netlist together with graph neural networks have been investigated for such models. However, the characteristics of netlists pose several challenges for existing graph learning frameworks, due to the large number of nodes and the importance of long-range interactions between nodes. To address these challenges, we represent the netlist as a directed hypergraph and propose a Directional Equivariant Hypergraph Neural Network (DE-HNN) for the effective learning of (directed) hypergraphs. Theoretically, we show that our DE-HNN can universally approximate any node or hyperedge based function that satisfies certain permutation equivariant and invariant properties natural for directed hypergraphs. We compare the proposed DE-HNN with several state-of-the-art (SOTA) machine learning models for (hyper)graphs and netlists, and show that the DE-HNN significantly outperforms them in predicting the outcome of optimized place-and-route tools directly from the input netlists. Our source code and the netlists data used are publicly available at https://github.com/YusuLab/chips.git
Submitted 16 April, 2024; v1 submitted 30 March, 2024;
originally announced April 2024.
-
One-Shot Non-Catalytic Distributed Purity Distillation
Authors:
Sayantan Chakraborty,
Rahul Jain,
Pranab Sen
Abstract:
Pure states are an important resource in many quantum information processing protocols. However, even making a fixed pure state, say $|0\rangle$, in the laboratory requires a considerable amount of effort. Often one ends up with a mixed state $ρ$ whose classical description is nevertheless known. Hence it is important to develop protocols that extract a fixed pure state from a known mixed state. In this work, we study the problem of extracting a fixed pure state $|0\rangle^{A'} |0\rangle^{B'}$ from a known pure state $ρ^{AB}$ distributed between two parties $A$ and $B$. Here, $A'$, $B'$ are subspaces of $A$, $B$ and the total amount of purity extracted is $\log |A'| + \log |B'|$. The parties can borrow local pure ancilla, apply local unitary operations and send a message from $A$ to $B$ through a dephasing channel. If local pure ancilla is borrowed, it must be subtracted in order to properly account for the purity extracted. We obtain the most efficient achievable bounds on one shot distributed purity extraction, in terms of the rate of local ancilla borrowed by the protocol, while distilling pure qubits at the best known rate. Our protocols borrow little to no local pure ancilla. Our bounds improve upon the existing bounds for this problem in both one shot as well as asymptotic iid settings. In particular they subsume all the asymptotic iid results of Devetak and Krovi-Devetak. In addition, we derive upper bounds for the rate of distillation in the one shot setting, which nearly match our achievable bounds.
Submitted 25 March, 2024;
originally announced March 2024.
-
Approximation Algorithms for Network Design in Non-Uniform Fault Models
Authors:
Chandra Chekuri,
Rhea Jain
Abstract:
The Survivable Network Design problem (SNDP) is a well-studied problem, motivated by the design of networks that are robust to faults under the assumption that any subset of edges up to a specific number can fail. We consider non-uniform fault models where the subset of edges that fail can be specified in different ways. Our primary interest is in the flexible graph connectivity model, in which the edge set is partitioned into safe and unsafe edges. The goal is to design a network that has desired connectivity properties under the assumption that only unsafe edges up to a specific number can fail. We also discuss the bulk-robust model and the relative survivable network design model. While SNDP admits a 2-approximation, the approximability of problems in these more complex models is much less understood even in special cases. We make two contributions.
Our first set of results are in the flexible graph connectivity model. Motivated by a conjecture that a constant factor approximation is feasible when the robustness parameters are fixed constants, we consider two important special cases, namely the single pair case, and the global connectivity case. For both these, we obtain constant factor approximations in several parameter ranges of interest. These are based on an augmentation framework and via decomposing the families of cuts that need to be covered into a small number of uncrossable families.
Our second set of results are poly-logarithmic approximations for the bulk-robust model when the "width" of the given instance (the maximum number of edges that can fail in any particular scenario) is fixed. Via this, we derive corresponding approximations for the flexible graph connectivity model and the relative survivable network design model. The results are obtained via two algorithmic approaches and they have different tradeoffs in terms of the approximation ratio and generality.
Submitted 22 March, 2024;
originally announced March 2024.
-
Quantum Channel Simulation in Fidelity is no more difficult than State Splitting
Authors:
Michael X. Cao,
Rahul Jain,
Marco Tomamichel
Abstract:
Characterizing the minimal communication needed for quantum channel simulation is a fundamental task in quantum information theory. In this paper, we show that, in fidelity, quantum channel simulation can be directly achieved via quantum state splitting without using a technique known as the de Finetti reduction, and thus provide a pair of tighter one-shot bounds. Using these bounds, we also recover the quantum reverse Shannon theorem in a much simpler way.
Submitted 24 June, 2024; v1 submitted 21 March, 2024;
originally announced March 2024.
-
Construction of Minimal Binary Linear Codes of dimension $n+3$
Authors:
Wajid M. Shaikh,
Rupali S. Jain,
B. Surendranath Reddy,
Bhagyashri S. Patil
Abstract:
In this paper, we give a generic construction of a binary linear code of dimension $n+3$ and derive necessary and sufficient conditions for the constructed code to be minimal. Using this generic construction, we construct a new family of minimal binary linear codes from a special class of Boolean functions violating the Ashikhmin-Barg condition. We also obtain the weight distribution of the constructed minimal binary linear codes.
Submitted 20 March, 2024;
originally announced March 2024.
-
Knowing Your Nonlinearities: Shapley Interactions Reveal the Underlying Structure of Data
Authors:
Divyansh Singhvi,
Andrej Erkelens,
Raghav Jain,
Diganta Misra,
Naomi Saphra
Abstract:
Measuring nonlinear feature interaction is an established approach to understanding complex patterns of attribution in many models. In this paper, we use Shapley Taylor interaction indices (STII) to analyze the impact of underlying data structure on model representations in a variety of modalities, tasks, and architectures. Considering linguistic structure in masked and auto-regressive language models (MLMs and ALMs), we find that STII increases within idiomatic expressions and that MLMs scale STII with syntactic distance, relying more on syntax in their nonlinear structure than ALMs do. Our speech model findings reflect the phonetic principle that the openness of the oral cavity determines how much a phoneme varies based on its context. Finally, we study image classifiers and illustrate that feature interactions intuitively reflect object boundaries. Our wide range of results illustrates the benefits of interdisciplinary work and domain expertise in interpretability research.
Submitted 19 March, 2024;
originally announced March 2024.
-
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
Authors:
Gemini Team,
Petko Georgiev,
Ving Ian Lei,
Ryan Burnell,
Libin Bai,
Anmol Gulati,
Garrett Tanzer,
Damien Vincent,
Zhufeng Pan,
Shibo Wang,
Soroosh Mariooryad,
Yifan Ding,
Xinyang Geng,
Fred Alcober,
Roy Frostig,
Mark Omernick,
Lexi Walker,
Cosmin Paduraru,
Christina Sorokin,
Andrea Tacchetti,
Colin Gaffney,
Samira Daruki,
Olcan Sercinoglu,
Zach Gleicher,
Juliette Love
, et al. (1110 additional authors not shown)
Abstract:
In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state-of-the-art in long-document QA, long-video QA and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k). Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on completing their tasks achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content.
Submitted 8 August, 2024; v1 submitted 8 March, 2024;
originally announced March 2024.
-
Less is More: Hop-Wise Graph Attention for Scalable and Generalizable Learning on Circuits
Authors:
Chenhui Deng,
Zichao Yue,
Cunxi Yu,
Gokce Sarar,
Ryan Carey,
Rajeev Jain,
Zhiru Zhang
Abstract:
While graph neural networks (GNNs) have gained popularity for learning circuit representations in various electronic design automation (EDA) tasks, they face challenges in scalability when applied to large graphs and exhibit limited generalizability to new designs. These limitations make them less practical for addressing large-scale, complex circuit problems. In this work we propose HOGA, a novel attention-based model for learning circuit representations in a scalable and generalizable manner. HOGA first computes hop-wise features per node prior to model training. Subsequently, the hop-wise features are solely used to produce node representations through a gated self-attention module, which adaptively learns important features among different hops without involving the graph topology. As a result, HOGA is adaptive to various structures across different circuits and can be efficiently trained in a distributed manner. To demonstrate the efficacy of HOGA, we consider two representative EDA tasks: quality of results (QoR) prediction and functional reasoning. Our experimental results indicate that (1) HOGA reduces estimation error over conventional GNNs by 46.76% for predicting QoR after logic synthesis; (2) HOGA improves reasoning accuracy by 10.0% over GNNs for identifying functional blocks on unseen gate-level netlists after complex technology mapping; (3) the training time for HOGA decreases almost linearly with an increase in computing resources.
Submitted 10 April, 2024; v1 submitted 2 March, 2024;
originally announced March 2024.
-
ChatDiet: Empowering Personalized Nutrition-Oriented Food Recommender Chatbots through an LLM-Augmented Framework
Authors:
Zhongqi Yang,
Elahe Khatibi,
Nitish Nagesh,
Mahyar Abbasian,
Iman Azimi,
Ramesh Jain,
Amir M. Rahmani
Abstract:
The profound impact of food on health necessitates advanced nutrition-oriented food recommendation services. Conventional methods often lack the crucial elements of personalization, explainability, and interactivity. While Large Language Models (LLMs) bring interpretability and explainability, their standalone use falls short of achieving true personalization. In this paper, we introduce ChatDiet, a novel LLM-powered framework designed specifically for personalized nutrition-oriented food recommendation chatbots. ChatDiet integrates personal and population models, complemented by an orchestrator, to seamlessly retrieve and process pertinent information. The personal model leverages causal discovery and inference techniques to assess personalized nutritional effects for a specific user, whereas the population model provides generalized information on food nutritional content. The orchestrator retrieves, synergizes and delivers the output of both models to the LLM, providing tailored food recommendations designed to support targeted health outcomes. The result is a dynamic delivery of personalized and explainable food recommendations, tailored to individual user preferences. Our evaluation of ChatDiet includes a compelling case study, where we establish a causal personal model to estimate individual nutrition effects. Our assessments, including a food recommendation test showcasing a 92% effectiveness rate, coupled with illustrative dialogue examples, underscore ChatDiet's strengths in explainability, personalization, and interactivity.
Submitted 16 March, 2024; v1 submitted 18 February, 2024;
originally announced March 2024.
-
EROS: Entity-Driven Controlled Policy Document Summarization
Authors:
Joykirat Singh,
Sehban Fazili,
Rohan Jain,
Md Shad Akhtar
Abstract:
Privacy policy documents have a crucial role in educating individuals about the collection, usage, and protection of users' personal data by organizations. However, they are notorious for their lengthy, complex, and convoluted language, especially involving privacy-related entities. Hence, they pose a significant challenge to users who attempt to comprehend an organization's data usage policy. In this paper, we propose to enhance the interpretability and readability of policy documents by using controlled abstractive summarization -- we require the generated summaries to include critical privacy-related entities (e.g., data and medium) and the organization's rationale (e.g., target and reason) in collecting those entities. To achieve this, we develop PD-Sum, a policy-document summarization dataset with marked privacy-related entity labels. Our proposed model, EROS, identifies critical entities through a span-based entity extraction model and employs them to control the information content of the summaries using proximal policy optimization (PPO). Comparison shows encouraging improvement over various baselines. Furthermore, we furnish qualitative and human evaluations to establish the efficacy of EROS.
Submitted 29 February, 2024;
originally announced March 2024.
-
Studying Differential Mental Health Expressions in India
Authors:
Khushi Shelat,
Sunny Rai,
Devansh R Jain,
Kishen Sivabalan,
Young Min Cho,
Maitreyi Redkar,
Samindara Sawant,
Sharath Chandra Guntuku
Abstract:
Psychosocial stressors and the symptomatology of mental disorders vary across cultures. However, current understandings of mental health expressions on social media are predominantly derived from studies in WEIRD (Western, Educated, Industrialized, Rich, and Democratic) contexts. In this paper, we analyze mental health posts on Reddit made by individuals in India, to identify variations in online depression language specific to the Indian context compared to users from the Rest of the World (ROW). Unlike in Western samples, we observe that mental health discussions in India additionally express sadness, use negation, are present-focused, and are related to work and achievement. Illness is uniquely correlated to India, indicating the association between depression and physical health in Indian patients. Two clinical psychologists validated the findings from social media posts and found 95% of the top 20 topics associated with mental health discussions as prevalent in Indians. Significant linguistic variations in online mental health-related language in India compared to ROW emphasize the importance of developing precision-targeted interventions that are culturally appropriate.
Submitted 16 June, 2024; v1 submitted 18 February, 2024;
originally announced February 2024.
-
Construction of Linear Codes from the Unit Graph $G(\mathbb{Z}_{n}\oplus \mathbb{Z}_{m})$
Authors:
Wajid M. Shaikh,
Rupali S. Jain,
B. Surendranath Reddy
Abstract:
In this paper, we develop Python code for generating the unit graph $G(\mathbb{Z}_{n}\oplus\mathbb{Z}_{m})$ for any integers $m$ and $n$. For any prime $r$, we construct $r$-ary linear codes from the incidence matrix of the unit graph $G(\mathbb{Z}_{n}\oplus\mathbb{Z}_{m})$, where $n$ and $m$ are each either a power of a prime or a product of prime powers. We also prove that the minimum distance of the dual of the constructed codes is either 3 or 4. Finally, we state two conjectures on linear codes constructed from the unit graph $G(\mathbb{Z}_{n}\oplus \mathbb{Z}_{m})$ for any integers $m$ and $n$.
Submitted 17 February, 2024;
originally announced February 2024.
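The abstract above mentions Python code for generating the unit graph but the listing does not show it. The following is an independent minimal sketch, not the authors' code. It assumes the standard definition: vertices are the elements of $\mathbb{Z}_{n}\oplus\mathbb{Z}_{m}$, and two distinct vertices are adjacent when their sum is a unit, i.e. both components are coprime to their moduli. Loops are excluded, and the vertex-by-edge orientation of the incidence matrix is also an assumption.

```python
from itertools import combinations, product
from math import gcd

def unit_graph(n, m):
    """Return (vertices, edges) of the unit graph G(Z_n + Z_m).

    Assumption: distinct u, v are adjacent iff u + v is a unit of the ring,
    i.e. gcd(first component, n) == 1 and gcd(second component, m) == 1.
    """
    vertices = list(product(range(n), range(m)))

    def is_unit(a, b):
        return gcd(a, n) == 1 and gcd(b, m) == 1

    edges = [(u, v) for u, v in combinations(vertices, 2)
             if is_unit((u[0] + v[0]) % n, (u[1] + v[1]) % m)]
    return vertices, edges

def incidence_matrix(vertices, edges):
    """Vertex-by-edge 0/1 incidence matrix as a list of rows."""
    index = {v: i for i, v in enumerate(vertices)}
    rows = [[0] * len(edges) for _ in vertices]
    for j, (u, v) in enumerate(edges):
        rows[index[u]][j] = 1
        rows[index[v]][j] = 1
    return rows

# Small illustrative instance: G(Z_2 + Z_3).
vertices, edges = unit_graph(2, 3)
H = incidence_matrix(vertices, edges)
```

Reducing the entries of `H` modulo a prime $r$ would give a generator matrix of the kind the abstract refers to, under the orientation assumed here.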
-
Chain of Logic: Rule-Based Reasoning with Large Language Models
Authors:
Sergio Servantez,
Joe Barrow,
Kristian Hammond,
Rajiv Jain
Abstract:
Rule-based reasoning, a fundamental type of legal reasoning, enables us to draw conclusions by accurately applying a rule to a set of facts. We explore causal language models as rule-based reasoners, specifically with respect to compositional rules - rules consisting of multiple elements which form a complex logical expression. Reasoning about compositional rules is challenging because it requires multiple reasoning steps, and attending to the logical relationships between elements. We introduce a new prompting method, Chain of Logic, which elicits rule-based reasoning through decomposition (solving elements as independent threads of logic), and recomposition (recombining these sub-answers to resolve the underlying logical expression). This method was inspired by the IRAC (Issue, Rule, Application, Conclusion) framework, a sequential reasoning approach used by lawyers. We evaluate chain of logic across eight rule-based reasoning tasks involving three distinct compositional rules from the LegalBench benchmark and demonstrate it consistently outperforms other prompting methods, including chain of thought and self-ask, using open-source and commercial language models.
Submitted 23 February, 2024; v1 submitted 15 February, 2024;
originally announced February 2024.
-
Knowledge-Infused LLM-Powered Conversational Health Agent: A Case Study for Diabetes Patients
Authors:
Mahyar Abbasian,
Zhongqi Yang,
Elahe Khatibi,
Pengfei Zhang,
Nitish Nagesh,
Iman Azimi,
Ramesh Jain,
Amir M. Rahmani
Abstract:
Effective diabetes management is crucial for maintaining health in diabetic patients. Large Language Models (LLMs) have opened new avenues for diabetes management, facilitating their efficacy. However, current LLM-based approaches are limited by their dependence on general sources and lack of integration with domain-specific knowledge, leading to inaccurate responses. In this paper, we propose a knowledge-infused LLM-powered conversational health agent (CHA) for diabetic patients. We customize and leverage the open-source openCHA framework, enhancing our CHA with external knowledge and analytical capabilities. This integration involves two key components: 1) incorporating the American Diabetes Association dietary guidelines and the Nutritionix information and 2) deploying analytical tools that enable nutritional intake calculation and comparison with the guidelines. We compare the proposed CHA with GPT4. Our evaluation includes 100 diabetes-related questions on daily meal choices and assessing the potential risks associated with the suggested diet. Our findings show that the proposed agent demonstrates superior performance in generating responses to manage essential nutrients.
Submitted 28 February, 2024; v1 submitted 15 February, 2024;
originally announced February 2024.