Search | arXiv e-print repository

UKAN: Unbound Kolmogorov-Arnold Network Accompanied with Accelerated Library

Authors: Alireza Moradzadeh, Lukasz Wawrzyniak, Miles Macklin, Saee G. Paliwal

Abstract: In this work, we present a GPU-accelerated library for the underlying components of Kolmogorov-Arnold Networks (KANs), along with an algorithm to eliminate bounded grids in KANs. The GPU-accelerated library reduces the computational complexity of Basis Spline (B-spline) evaluation by a factor of $\mathcal{O}$(grid size) compared to existing codes, enabling batch computation for large-scale learnin… ▽ More In this work, we present a GPU-accelerated library for the underlying components of Kolmogorov-Arnold Networks (KANs), along with an algorithm to eliminate bounded grids in KANs. The GPU-accelerated library reduces the computational complexity of Basis Spline (B-spline) evaluation by a factor of $\mathcal{O}$(grid size) compared to existing codes, enabling batch computation for large-scale learning. To overcome the limitations of traditional KANs, we introduce Unbounded KANs (UKANs), which eliminate the need for a bounded grid and a fixed number of B-spline coefficients. To do so, we replace the KAN parameters (B-spline coefficients) with a coefficient generator (CG) model. The inputs to the CG model are designed based on the idea of an infinite symmetric grid extending from negative infinity to positive infinity. The positional encoding of grid group, a sequential collection of B-spline grid indexes, is fed into the CG model, and coefficients are consumed by the efficient implementation (matrix representations) of B-spline functions to generate outputs. We perform several experiments on regression, classification, and generative tasks, which are promising. In particular, UKAN does not require data normalization or a bounded domain for evaluation. Additionally, our benchmarking results indicate the superior memory and computational efficiency of our library compared to existing codes. △ Less

Submitted 20 August, 2024; originally announced August 2024.

Comments: 10 pages, 7 figures, 4 tables

arXiv:2406.07770 [pdf, other]

DualBind: A Dual-Loss Framework for Protein-Ligand Binding Affinity Prediction

Authors: Meng Liu, Saee Gopal Paliwal

Abstract: Accurate prediction of protein-ligand binding affinities is crucial for drug development. Recent advances in machine learning show promising results on this task. However, these methods typically rely heavily on labeled data, which can be scarce or unreliable, or they rely on assumptions like Boltzmann-distributed data that may not hold true in practice. Here, we present DualBind, a novel framewor… ▽ More Accurate prediction of protein-ligand binding affinities is crucial for drug development. Recent advances in machine learning show promising results on this task. However, these methods typically rely heavily on labeled data, which can be scarce or unreliable, or they rely on assumptions like Boltzmann-distributed data that may not hold true in practice. Here, we present DualBind, a novel framework that integrates supervised mean squared error (MSE) with unsupervised denoising score matching (DSM) to accurately learn the binding energy function. DualBind not only addresses the limitations of DSM-only models by providing more accurate absolute affinity predictions but also improves generalizability and reduces reliance on labeled data compared to MSE-only models. Our experimental results demonstrate that DualBind excels in predicting binding affinities and can effectively utilize both labeled and unlabeled data to enhance performance. △ Less

Submitted 11 June, 2024; originally announced June 2024.

Comments: Preprint, work in progress

arXiv:2406.00855 [pdf, other]

LinkLogic: A New Method and Benchmark for Explainable Knowledge Graph Predictions

Authors: Niraj Kumar-Singh, Gustavo Polleti, Saee Paliwal, Rachel Hodos-Nkhereanye

Abstract: While there are a plethora of methods for link prediction in knowledge graphs, state-of-the-art approaches are often black box, obfuscating model reasoning and thereby limiting the ability of users to make informed decisions about model predictions. Recently, methods have emerged to generate prediction explanations for Knowledge Graph Embedding models, a widely-used class of methods for link predi… ▽ More While there are a plethora of methods for link prediction in knowledge graphs, state-of-the-art approaches are often black box, obfuscating model reasoning and thereby limiting the ability of users to make informed decisions about model predictions. Recently, methods have emerged to generate prediction explanations for Knowledge Graph Embedding models, a widely-used class of methods for link prediction. The question then becomes, how well do these explanation systems work? To date this has generally been addressed anecdotally, or through time-consuming user research. In this work, we present an in-depth exploration of a simple link prediction explanation method we call LinkLogic, that surfaces and ranks explanatory information used for the prediction. Importantly, we construct the first-ever link prediction explanation benchmark, based on family structures present in the FB13 dataset. We demonstrate the use of this benchmark as a rich evaluation sandbox, probing LinkLogic quantitatively and qualitatively to assess the fidelity, selectivity and relevance of the generated explanations. We hope our work paves the way for more holistic and empirical assessment of knowledge graph prediction explanation methods in the future. △ Less

Submitted 2 June, 2024; originally announced June 2024.

Comments: 12 pages, 4 figures in main text. For code and data, see https://github.com/niraj17singh/LinkLogic

ACM Class: I.2.4

arXiv:2405.12842 [pdf, other]

SmartFlow: Robotic Process Automation using LLMs

Authors: Arushi Jain, Shubham Paliwal, Monika Sharma, Lovekesh Vig, Gautam Shroff

Abstract: Robotic Process Automation (RPA) systems face challenges in handling complex processes and diverse screen layouts that require advanced human-like decision-making capabilities. These systems typically rely on pixel-level encoding through drag-and-drop or automation frameworks such as Selenium to create navigation workflows, rather than visual understanding of screen elements. In this context, we p… ▽ More Robotic Process Automation (RPA) systems face challenges in handling complex processes and diverse screen layouts that require advanced human-like decision-making capabilities. These systems typically rely on pixel-level encoding through drag-and-drop or automation frameworks such as Selenium to create navigation workflows, rather than visual understanding of screen elements. In this context, we present SmartFlow, an AI-based RPA system that uses pre-trained large language models (LLMs) coupled with deep-learning based image understanding. Our system can adapt to new scenarios, including changes in the user interface and variations in input data, without the need for human intervention. SmartFlow uses computer vision and natural language processing to perceive visible elements on the graphical user interface (GUI) and convert them into a textual representation. This information is then utilized by LLMs to generate a sequence of actions that are executed by a scripting engine to complete an assigned task. To assess the effectiveness of SmartFlow, we have developed a dataset that includes a set of generic enterprise applications with diverse layouts, which we are releasing for research use. Our evaluations on this dataset demonstrate that SmartFlow exhibits robustness across different layouts and applications. SmartFlow can automate a wide range of business processes such as form filling, customer service, invoice processing, and back-office operations. SmartFlow can thus assist organizations in enhancing productivity by automating an even larger fraction of screen-based workflows. The demo-video and dataset are available at https://smartflow-4c5a0a.webflow.io/. △ Less

Submitted 21 May, 2024; originally announced May 2024.

Comments: 32nd ACM International Conference on Information and Knowledge Management

arXiv:2405.12742 [pdf, other]

Multi-Subject Personalization

Authors: Arushi Jain, Shubham Paliwal, Monika Sharma, Vikram Jamwal, Lovekesh Vig

Abstract: Creative story illustration requires a consistent interplay of multiple characters or objects. However, conventional text-to-image models face significant challenges while producing images featuring multiple personalized subjects. For example, they distort the subject rendering, or the text descriptions fail to render coherent subject interactions. We present Multi-Subject Personalization (MSP) to… ▽ More Creative story illustration requires a consistent interplay of multiple characters or objects. However, conventional text-to-image models face significant challenges while producing images featuring multiple personalized subjects. For example, they distort the subject rendering, or the text descriptions fail to render coherent subject interactions. We present Multi-Subject Personalization (MSP) to alleviate some of these challenges. We implement MSP using Stable Diffusion and assess our approach against other text-to-image models, showcasing its consistent generation of good-quality images representing intended subjects and interactions. △ Less

Submitted 21 May, 2024; originally announced May 2024.

Comments: 2023 Conference on Neural Information Processing Systems

arXiv:2405.12531 [pdf, other]

CustomText: Customized Textual Image Generation using Diffusion Models

Authors: Shubham Paliwal, Arushi Jain, Monika Sharma, Vikram Jamwal, Lovekesh Vig

Abstract: Textual image generation spans diverse fields like advertising, education, product packaging, social media, information visualization, and branding. Despite recent strides in language-guided image synthesis using diffusion models, current models excel in image generation but struggle with accurate text rendering and offer limited control over font attributes. In this paper, we aim to enhance the s… ▽ More Textual image generation spans diverse fields like advertising, education, product packaging, social media, information visualization, and branding. Despite recent strides in language-guided image synthesis using diffusion models, current models excel in image generation but struggle with accurate text rendering and offer limited control over font attributes. In this paper, we aim to enhance the synthesis of high-quality images with precise text customization, thereby contributing to the advancement of image generation models. We call our proposed method CustomText. Our implementation leverages a pre-trained TextDiffuser model to enable control over font color, background, and types. Additionally, to address the challenge of accurately rendering small-sized fonts, we train the ControlNet model for a consistency decoder, significantly enhancing text-generation performance. We assess the performance of CustomText in comparison to previous methods of textual image generation on the publicly available CTW-1500 dataset and a self-curated dataset for small-text generation, showcasing superior results. △ Less

Submitted 21 May, 2024; originally announced May 2024.

Comments: Accepted by AI for Content Creation (AI4CC) workshop at CVPR 2024

arXiv:2212.03720 [pdf, other]

Pseudo-Riemannian Embedding Models for Multi-Relational Graph Representations

Authors: Saee Paliwal, Angus Brayne, Benedek Fabian, Maciej Wiatrak, Aaron Sim

Abstract: In this paper we generalize single-relation pseudo-Riemannian graph embedding models to multi-relational networks, and show that the typical approach of encoding relations as manifold transformations translates from the Riemannian to the pseudo-Riemannian case. In addition we construct a view of relations as separate spacetime submanifolds of multi-time manifolds, and consider an interpolation bet… ▽ More In this paper we generalize single-relation pseudo-Riemannian graph embedding models to multi-relational networks, and show that the typical approach of encoding relations as manifold transformations translates from the Riemannian to the pseudo-Riemannian case. In addition we construct a view of relations as separate spacetime submanifolds of multi-time manifolds, and consider an interpolation between a pseudo-Riemannian embedding model and its Wick-rotated Riemannian counterpart. We validate these extensions in the task of link prediction, focusing on flat Lorentzian manifolds, and demonstrate their use in both knowledge graph completion and knowledge discovery in a biological domain. △ Less

Submitted 2 December, 2022; originally announced December 2022.

Comments: 11 pages, 3 figures, AKBC 2022 conference

Journal ref: 4th Conference on Automated Knowledge Base Construction 2022

arXiv:2203.06873 [pdf, other]

TSR-DSAW: Table Structure Recognition via Deep Spatial Association of Words

Authors: Arushi Jain, Shubham Paliwal, Monika Sharma, Lovekesh Vig

Abstract: Existing methods for Table Structure Recognition (TSR) from camera-captured or scanned documents perform poorly on complex tables consisting of nested rows / columns, multi-line texts and missing cell data. This is because current data-driven methods work by simply training deep models on large volumes of data and fail to generalize when an unseen table structure is encountered. In this paper, we… ▽ More Existing methods for Table Structure Recognition (TSR) from camera-captured or scanned documents perform poorly on complex tables consisting of nested rows / columns, multi-line texts and missing cell data. This is because current data-driven methods work by simply training deep models on large volumes of data and fail to generalize when an unseen table structure is encountered. In this paper, we propose to train a deep network to capture the spatial associations between different word pairs present in the table image for unravelling the table structure. We present an end-to-end pipeline, named TSR-DSAW: TSR via Deep Spatial Association of Words, which outputs a digital representation of a table image in a structured format such as HTML. Given a table image as input, the proposed method begins with the detection of all the words present in the image using a text-detection network like CRAFT which is followed by the generation of word-pairs using dynamic programming. These word-pairs are highlighted in individual images and subsequently, fed into a DenseNet-121 classifier trained to capture spatial associations such as same-row, same-column, same-cell or none. Finally, we perform post-processing on the classifier output to generate the table structure in HTML format. We evaluate our TSR-DSAW pipeline on two public table-image datasets -- PubTabNet and ICDAR 2013, and demonstrate improvement over previous methods such as TableNet and DeepDeSRT. △ Less

Submitted 14 March, 2022; originally announced March 2022.

Comments: 6 pages, 1 figure, 1 table, ESANN 2021 proceedings, European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning. Online event, 6-8 October 2021, i6doc.com publ., ISBN 978287587082-7

Journal ref: In ESANN 2021 proceedings, pages 257-262

arXiv:2109.03849 [pdf, other]

OSSR-PID: One-Shot Symbol Recognition in P&ID Sheets using Path Sampling and GCN

Authors: Shubham Paliwal, Monika Sharma, Lovekesh Vig

Abstract: Piping and Instrumentation Diagrams (P&ID) are ubiquitous in several manufacturing, oil and gas enterprises for representing engineering schematics and equipment layout. There is an urgent need to extract and digitize information from P&IDs without the cost of annotating a varying set of symbols for each new use case. A robust one-shot learning approach for symbol recognition i.e., localization fo… ▽ More Piping and Instrumentation Diagrams (P&ID) are ubiquitous in several manufacturing, oil and gas enterprises for representing engineering schematics and equipment layout. There is an urgent need to extract and digitize information from P&IDs without the cost of annotating a varying set of symbols for each new use case. A robust one-shot learning approach for symbol recognition i.e., localization followed by classification, would therefore go a long way towards this goal. Our method works by sampling pixels sequentially along the different contour boundaries in the image. These sampled points form paths which are used in the prototypical line diagram to construct a graph that captures the structure of the contours. Subsequently, the prototypical graphs are fed into a Dynamic Graph Convolutional Neural Network (DGCNN) which is trained to classify graphs into one of the given symbol classes. Further, we append embeddings from a Resnet-34 network which is trained on symbol images containing sampled points to make the classification network more robust. Since, many symbols in P&ID are structurally very similar to each other, we utilize Arcface loss during DGCNN training which helps in maximizing symbol class separability by producing highly discriminative embeddings. The images consist of components attached on the pipeline (straight line). The sampled points segregated around the symbol regions are used for the classification task. The proposed pipeline, named OSSR-PID, is fast and gives outstanding performance for recognition of symbols on a synthetic dataset of 100 P&ID diagrams. We also compare our method against prior-work on a real-world private dataset of 12 P&ID sheets and obtain comparable/superior results. Remarkably, it is able to achieve such excellent performance using only one prototypical example per symbol. △ Less

Submitted 8 September, 2021; originally announced September 2021.

Journal ref: International Joint Conference on Neural Network (IJCNN), 2021

arXiv:2109.03794 [pdf, other]

doi 10.1007/978-3-030-75015-2_17

Digitize-PID: Automatic Digitization of Piping and Instrumentation Diagrams

Authors: Shubham Paliwal, Arushi Jain, Monika Sharma, Lovekesh Vig

Abstract: Digitization of scanned Piping and Instrumentation diagrams(P&ID), widely used in manufacturing or mechanical industries such as oil and gas over several decades, has become a critical bottleneck in dynamic inventory management and creation of smart P&IDs that are compatible with the latest CAD tools. Historically, P&ID sheets have been manually generated at the design stage, before being scanned… ▽ More Digitization of scanned Piping and Instrumentation diagrams(P&ID), widely used in manufacturing or mechanical industries such as oil and gas over several decades, has become a critical bottleneck in dynamic inventory management and creation of smart P&IDs that are compatible with the latest CAD tools. Historically, P&ID sheets have been manually generated at the design stage, before being scanned and stored as PDFs. Current digitization initiatives involve manual processing and are consequently very time consuming, labour intensive and error-prone.Thanks to advances in image processing, machine and deep learning techniques there are emerging works on P&ID digitization. However, existing solutions face several challenges owing to the variation in the scale, size and noise in the P&IDs, sheer complexity and crowdedness within drawings, domain knowledge required to interpret the drawings. This motivates our current solution called Digitize-PID which comprises of an end-to-end pipeline for detection of core components from P&IDs like pipes, symbols and textual information, followed by their association with each other and eventually, the validation and correction of output data based on inherent domain knowledge. A novel and efficient kernel-based line detection and a two-step method for detection of complex symbols based on a fine-grained deep recognition technique is presented in the paper. In addition, we have created an annotated synthetic dataset, Dataset-P&ID, of 500 P&IDs by incorporating different types of noise and complex symbols which is made available for public use (currently there exists no public P&ID dataset). We evaluate our proposed method on this synthetic dataset and a real-world anonymized private dataset of 12 P&ID sheets. Results show that Digitize-PID outperforms the existing state-of-the-art for P&ID digitization. △ Less

Submitted 8 September, 2021; originally announced September 2021.

Comments: 13 pages

Journal ref: Trends and Applications in Knowledge Discovery and Data Mining. 168-180, PAKDD 2021

arXiv:2106.08678 [pdf, other]

Directed Graph Embeddings in Pseudo-Riemannian Manifolds

Authors: Aaron Sim, Maciej Wiatrak, Angus Brayne, Páidí Creed, Saee Paliwal

Abstract: The inductive biases of graph representation learning algorithms are often encoded in the background geometry of their embedding space. In this paper, we show that general directed graphs can be effectively represented by an embedding model that combines three components: a pseudo-Riemannian metric structure, a non-trivial global topology, and a unique likelihood function that explicitly incorpora… ▽ More The inductive biases of graph representation learning algorithms are often encoded in the background geometry of their embedding space. In this paper, we show that general directed graphs can be effectively represented by an embedding model that combines three components: a pseudo-Riemannian metric structure, a non-trivial global topology, and a unique likelihood function that explicitly incorporates a preferred direction in embedding space. We demonstrate the representational capabilities of this method by applying it to the task of link prediction on a series of synthetic and real directed graphs from natural language applications and biology. In particular, we show that low-dimensional cylindrical Minkowski and anti-de Sitter spacetimes can produce equal or better graph representations than curved Riemannian manifolds of higher dimensions. △ Less

Submitted 16 June, 2021; originally announced June 2021.

Comments: Accepted at ICML 2021

arXiv:2101.04727 [pdf, other]

doi 10.18653/v1/2020.alvr-1.5

Latent Alignment of Procedural Concepts in Multimodal Recipes

Authors: Hossein Rajaby Faghihi, Roshanak Mirzaee, Sudarshan Paliwal, Parisa Kordjamshidi

Abstract: We propose a novel alignment mechanism to deal with procedural reasoning on a newly released multimodal QA dataset, named RecipeQA. Our model is solving the textual cloze task which is a reading comprehension on a recipe containing images and instructions. We exploit the power of attention networks, cross-modal representations, and a latent alignment space between instructions and candidate answer… ▽ More We propose a novel alignment mechanism to deal with procedural reasoning on a newly released multimodal QA dataset, named RecipeQA. Our model is solving the textual cloze task which is a reading comprehension on a recipe containing images and instructions. We exploit the power of attention networks, cross-modal representations, and a latent alignment space between instructions and candidate answers to solve the problem. We introduce constrained max-pooling which refines the max-pooling operation on the alignment matrix to impose disjoint constraints among the outputs of the model. Our evaluation result indicates a 19\% improvement over the baselines. △ Less

Submitted 12 January, 2021; originally announced January 2021.

Comments: Published in ALVR 2020, a workshop in ACL 2020

ACM Class: I.2.7

Journal ref: Proceedings of the First Workshop on Advances in Language and Vision Research 2020 (26-31)

arXiv:2001.01469 [pdf, other]

TableNet: Deep Learning model for end-to-end Table detection and Tabular data extraction from Scanned Document Images

Authors: Shubham Paliwal, Vishwanath D, Rohit Rahul, Monika Sharma, Lovekesh Vig

Abstract: With the widespread use of mobile phones and scanners to photograph and upload documents, the need for extracting the information trapped in unstructured document images such as retail receipts, insurance claim forms and financial invoices is becoming more acute. A major hurdle to this objective is that these images often contain information in the form of tables and extracting data from tabular s… ▽ More With the widespread use of mobile phones and scanners to photograph and upload documents, the need for extracting the information trapped in unstructured document images such as retail receipts, insurance claim forms and financial invoices is becoming more acute. A major hurdle to this objective is that these images often contain information in the form of tables and extracting data from tabular sub-images presents a unique set of challenges. This includes accurate detection of the tabular region within an image, and subsequently detecting and extracting information from the rows and columns of the detected table. While some progress has been made in table detection, extracting the table contents is still a challenge since this involves more fine grained table structure(rows & columns) recognition. Prior approaches have attempted to solve the table detection and structure recognition problems independently using two separate models. In this paper, we propose TableNet: a novel end-to-end deep learning model for both table detection and structure recognition. The model exploits the interdependence between the twin tasks of table detection and table structure recognition to segment out the table and column regions. This is followed by semantic rule-based row extraction from the identified tabular sub-regions. The proposed model and extraction approach was evaluated on the publicly available ICDAR 2013 and Marmot Table datasets obtaining state of the art results. Additionally, we demonstrate that feeding additional semantic features further improves model performance and that the model exhibits transfer learning across datasets. Another contribution of this paper is to provide additional table structure annotations for the Marmot data, which currently only has annotations for table detection. △ Less

Submitted 6 January, 2020; originally announced January 2020.

arXiv:1901.11383 [pdf, other]

Automatic Information Extraction from Piping and Instrumentation Diagrams

Authors: Rohit Rahul, Shubham Paliwal, Monika Sharma, Lovekesh Vig

Abstract: One of the most common modes of representing engineering schematics are Piping and Instrumentation diagrams (P&IDs) that describe the layout of an engineering process flow along with the interconnected process equipment. Over the years, P&ID diagrams have been manually generated, scanned and stored as image files. These files need to be digitized for purposes of inventory management and updation,… ▽ More One of the most common modes of representing engineering schematics are Piping and Instrumentation diagrams (P&IDs) that describe the layout of an engineering process flow along with the interconnected process equipment. Over the years, P&ID diagrams have been manually generated, scanned and stored as image files. These files need to be digitized for purposes of inventory management and updation, and easy reference to different components of the schematics. There are several challenging vision problems associated with digitizing real world P&ID diagrams. Real world P&IDs come in several different resolutions, and often contain noisy textual information. Extraction of instrumentation information from these diagrams involves accurate detection of symbols that frequently have minute visual differences between them. Identification of pipelines that may converge and diverge at different points in the image is a further cause for concern. Due to these reasons, to the best of our knowledge, no system has been proposed for end-to-end data extraction from P&ID diagrams. However, with the advent of deep learning and the spectacular successes it has achieved in vision, we hypothesized that it is now possible to re-examine this problem armed with the latest deep learning models. To that end, we present a novel pipeline for information extraction from P&ID sheets via a combination of traditional vision techniques and state-of-the-art deep learning models to identify and isolate pipeline codes, pipelines, inlets and outlets, and for detecting symbols. This is followed by association of the detected components with the appropriate pipeline. The extracted pipeline information is used to populate a tree-like data-structure for capturing the structure of the piping schematics. We evaluated proposed method on a real world dataset of P&ID sheets obtained from an oil firm and have obtained promising results. △ Less

Submitted 28 January, 2019; originally announced January 2019.

Journal ref: IEEE ICPRAM 2019

arXiv:1507.05398 [pdf, other]

Generating Binary Optimal Codes Using Heterogeneous Parallel Computing

Authors: Srajan Paliwal, Saurabh Tiwary, Bhaskar Chaudhury, Manish K. Gupta

Abstract: Generation of optimal codes is a well known problem in coding theory. Many computational approaches exist in the literature for finding record breaking codes. However generating codes with long lengths $n$ using serial algorithms is computationally very expensive, for example the worst case time complexity of a Greedy algorithm is $\mathcal{O}(n\; 4^n)$. In order to improve the efficiency of gener… ▽ More Generation of optimal codes is a well known problem in coding theory. Many computational approaches exist in the literature for finding record breaking codes. However generating codes with long lengths $n$ using serial algorithms is computationally very expensive, for example the worst case time complexity of a Greedy algorithm is $\mathcal{O}(n\; 4^n)$. In order to improve the efficiency of generating codes with long lengths, we propose and investigate some parallel algorithms using General Purpose Graphic Processing Units (GPGPU). This paper considers the implementation of parallel Greedy algorithm using GPGPU-CUDA (Computed Unified Device Architecture) framework and discusses various optimization techniques to accelerate the GPU code. The performance achieved for optimized parallel implementations is more than two to three orders of magnitude faster than that of serial implementation and shows a great potential of GPGPU in the field of coding theory applications. △ Less

Submitted 20 July, 2015; originally announced July 2015.

Comments: 8 pages, draft

Showing 1–15 of 15 results for author: Paliwal, S