Search | arXiv e-print repository

Pose Estimation from Camera Images for Underwater Inspection

Authors: Luyuan Peng, Hari Vishnu, Mandar Chitre, Yuen Min Too, Bharath Kalyan, Rajat Mishra, Soo Pieng Tan

Abstract: High-precision localization is pivotal in underwater reinspection missions. Traditional localization methods like inertial navigation systems, Doppler velocity loggers, and acoustic positioning face significant challenges and are not cost-effective for some applications. Visual localization is a cost-effective alternative in such cases, leveraging the cameras already equipped on inspection vehicles to estimate poses from images of the surrounding scene. Amongst these, machine learning-based pose estimation from images shows promise in underwater environments, performing efficient relocalization using models trained based on previously mapped scenes. We explore the efficacy of learning-based pose estimators in both clear and turbid water inspection missions, assessing the impact of image formats, model architectures and training data diversity. We innovate by employing novel view synthesis models to generate augmented training data, significantly enhancing pose estimation in unexplored regions. Moreover, we enhance localization accuracy by integrating pose estimator outputs with sensor data via an extended Kalman filter, demonstrating improved trajectory smoothness and accuracy.

Submitted 23 July, 2024; originally announced July 2024.

Comments: Submitted to IEEE Journal of Oceanic Engineering
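
The fusion step mentioned in the abstract above can be illustrated with a scalar, linear Kalman filter, which is the special case an extended Kalman filter reduces to when the motion and measurement models are linear. This is only a sketch under assumptions that do not come from the paper: a one-dimensional position state, a velocity input (for example from a Doppler velocity logger), a position measurement from a pose estimator, and illustrative noise variances `q` and `r`. All function names are hypothetical.

```python
def predict(x, p, velocity, dt, q):
    """Propagate the position estimate with a measured velocity.

    x: position estimate, p: its variance,
    q: process-noise variance added per step (assumed value).
    """
    return x + velocity * dt, p + q


def update(x, p, z, r):
    """Correct the prediction with a position measurement z.

    r: measurement-noise variance of the pose estimator (assumed value).
    """
    gain = p / (p + r)            # Kalman gain in the scalar case
    x_new = x + gain * (z - x)    # move the estimate toward the measurement
    p_new = (1.0 - gain) * p      # uncertainty shrinks after the update
    return x_new, p_new


if __name__ == "__main__":
    x, p = 0.0, 1.0
    x, p = predict(x, p, velocity=0.5, dt=1.0, q=0.1)
    x, p = update(x, p, z=1.0, r=1.1)
    print(x, p)
```

The corrected estimate always lies between the prediction and the measurement, weighted by their variances, which is the mechanism behind the smoother trajectories the abstract reports.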

arXiv:2405.07007 [pdf, ps, other]

A New Algorithm for Computing Branch Number of Non-Singular Matrices over Finite Fields

Authors: P. R. Mishra, Yogesh Kumar, Susanta Samanta, Atul Gaur

Abstract: The notion of branch numbers of a linear transformation is crucial for both linear and differential cryptanalysis. The number of non-zero elements in a state difference or linear mask directly correlates with the active S-Boxes. The differential or linear branch number indicates the minimum number of active S-Boxes in two consecutive rounds of an SPN cipher, specifically for differential or linear cryptanalysis, respectively. This paper presents a new algorithm for computing the branch number of non-singular matrices over finite fields. The algorithm is based on the existing classical method but demonstrates improved computational complexity compared to its predecessor. We conduct a comparative study of the proposed algorithm and the classical approach, providing an analytical estimation of the algorithm's complexity. Our analysis reveals that the computational complexity of our algorithm is the square root of that of the classical approach.

Submitted 11 May, 2024; originally announced May 2024.
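
For orientation, the differential branch number of a matrix M is the minimum of wt(x) + wt(Mx) over all non-zero input vectors x, where wt counts non-zero entries. The sketch below computes it by exhaustive search, which corresponds to the classical baseline the abstract refers to, not to the paper's improved algorithm. The field GF(2^4) with reduction polynomial x^4 + x + 1, the example matrices, and all function names are assumptions chosen for illustration.

```python
from itertools import product


def gf_mul(a, b, m=4, poly=0b10011):
    """Multiply a and b in GF(2^m), reducing by the given polynomial."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & (1 << m):   # degree reached m: reduce
            a ^= poly
    return result


def mat_vec(matrix, vec, m=4, poly=0b10011):
    """Matrix-vector product over GF(2^m); addition is XOR."""
    out = []
    for row in matrix:
        acc = 0
        for coeff, x in zip(row, vec):
            acc ^= gf_mul(coeff, x, m, poly)
        out.append(acc)
    return out


def weight(vec):
    """Number of non-zero entries."""
    return sum(1 for x in vec if x != 0)


def differential_branch_number(matrix, m=4, poly=0b10011):
    """Exhaustive search over all non-zero inputs (cost grows as 2^(m*n))."""
    n = len(matrix)
    best = None
    for vec in product(range(1 << m), repeat=n):
        if not any(vec):
            continue
        total = weight(vec) + weight(mat_vec(matrix, vec, m, poly))
        if best is None or total < best:
            best = total
    return best


if __name__ == "__main__":
    print(differential_branch_number([[1, 1], [1, 2]]))  # an MDS example
    print(differential_branch_number([[1, 0], [0, 1]]))  # identity matrix
```

The exhaustive loop visits every non-zero vector, so its cost is exponential in the matrix order; that cost is what the abstract's square-root improvement is measured against.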

arXiv:2405.05354 [pdf, other]

Transfer-LMR: Heavy-Tail Driving Behavior Recognition in Diverse Traffic Scenarios

Authors: Chirag Parikh, Ravi Shankar Mishra, Rohan Chandra, Ravi Kiran Sarvadevabhatla

Abstract: Recognizing driving behaviors is important for downstream tasks such as reasoning, planning, and navigation. Existing video recognition approaches work well for common behaviors (e.g. "drive straight", "brake", "turn left/right"). However, the performance is sub-par for underrepresented/rare behaviors typically found in the tail of the behavior class distribution. To address this shortcoming, we propose Transfer-LMR, a modular training routine for improving the recognition performance across all driving behavior classes. We extensively evaluate our approach on the METEOR and HDD datasets, which contain a rich yet heavy-tailed distribution of driving behaviors and span diverse traffic scenarios. The experimental results demonstrate the efficacy of our approach, especially for recognizing underrepresented/rare driving behaviors.

Submitted 8 May, 2024; originally announced May 2024.

arXiv:2405.00717 [pdf, other]

Exploring News Summarization and Enrichment in a Highly Resource-Scarce Indian Language: A Case Study of Mizo

Authors: Abhinaba Bala, Ashok Urlana, Rahul Mishra, Parameswari Krishnamurthy

Abstract: Obtaining sufficient information in one's mother tongue is crucial for satisfying the information needs of the users. While high-resource languages have abundant online resources, the situation is less than ideal for very low-resource languages. Moreover, the insufficient reporting of vital national and international events continues to be a worry, especially in languages with scarce resources, like Mizo. In this paper, we conduct a study to investigate the effectiveness of a simple methodology designed to generate a holistic summary for Mizo news articles, which leverages English-language news to supplement and enhance the information related to the corresponding news events. Furthermore, we make available 500 Mizo news articles and corresponding enriched holistic summaries. Human evaluation confirms that our approach significantly enhances the information coverage of Mizo news articles. The Mizo dataset and code can be accessed at https://github.com/barvin04/mizo_enrichment

Submitted 25 April, 2024; originally announced May 2024.

Comments: Accepted at LREC-COLING2024 WILDRE Workshop

ACM Class: I.2.7

arXiv:2404.15774 [pdf, other]

Toward Physics-Aware Deep Learning Architectures for LiDAR Intensity Simulation

Authors: Vivek Anand, Bharat Lohani, Gaurav Pandey, Rakesh Mishra

Abstract: Autonomous vehicles (AVs) heavily rely on LiDAR perception for environment understanding and navigation. LiDAR intensity provides valuable information about the reflected laser signals and plays a crucial role in enhancing the perception capabilities of AVs. However, accurately simulating LiDAR intensity remains a challenge due to the unavailability of material properties of the objects in the environment, and complex interactions between the laser beam and the environment. The proposed method aims to improve the accuracy of intensity simulation by incorporating physics-based modalities within the deep learning framework. One of the key entities that captures the interaction between the laser beam and the objects is the angle of incidence. In this work we demonstrate that the addition of the LiDAR incidence angle as a separate input to the deep neural networks significantly enhances the results. We present a comparative study between two prominent deep learning architectures: U-NET, a Convolutional Neural Network (CNN), and Pix2Pix, a Generative Adversarial Network (GAN). We implemented these two architectures for the intensity prediction task and used SemanticKITTI and VoxelScape datasets for experiments. The comparative analysis reveals that both architectures benefit from the incidence angle as an additional input. Moreover, the Pix2Pix architecture outperforms U-NET, especially when the incidence angle is incorporated.

Submitted 24 April, 2024; originally announced April 2024.

Comments: 7 pages, 7 figures

arXiv:2404.09561 [pdf, ps, other]

Generalization the parameters of minimal linear codes over the ring $\mathbb{Z}_{p^l}$ and $\mathbb{Z}_{{p_1}{p_2}}$

Authors: Biplab Chatterjee, Ratnesh Kumar Mishra

Abstract: In this article, we introduce a condition that is both necessary and sufficient for a linear code to achieve minimality when analyzed over the rings $\mathbb{Z}_{n}$. The fundamental inquiry in minimal linear codes is the existence of a $[m,k]$ minimal linear code where $k$ is less than or equal to $m$. W. Lu et al. (see \cite{nine}) showed that there exists a positive integer $m(k;q)$ such that for $m\geq m(k;q)$ a minimal linear code of length $m$ and dimension $k$ over a finite field $\mathbb{F}_q$ must exist. They give upper and lower bounds for $m(k;q)$. In this manuscript, we establish both an upper and a lower bound for $m(k;p^l)$ and $m(k;p_1p_2)$ within the rings $\mathbb{Z}_{p^l}$ and $\mathbb{Z}_{p_1p_2}$, respectively.

Submitted 26 June, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

arXiv:2404.08250 [pdf, other]

doi 10.1007/s12190-024-02142-z

A Systematic Construction Approach for All $4\times 4$ Involutory MDS Matrices

Authors: Yogesh Kumar, P. R. Mishra, Susanta Samanta, Atul Gaur

Abstract: Maximum distance separable (MDS) matrices play a crucial role not only in coding theory but also in the design of block ciphers and hash functions. Of particular interest are involutory MDS matrices, which facilitate the use of a single circuit for both encryption and decryption in hardware implementations. In this article, we present several characterizations of involutory MDS matrices of even order. Additionally, we introduce a new matrix form for obtaining all involutory MDS matrices of even order and compare it with other matrix forms available in the literature. We then propose a technique to systematically construct all $4 \times 4$ involutory MDS matrices over a finite field $\mathbb{F}_{2^m}$. This method significantly reduces the search space by focusing on involutory MDS class representative matrices, leading to the generation of all such matrices within a substantially smaller set compared to considering all $4 \times 4$ involutory matrices. Specifically, our approach involves searching for these representative matrices within a set of cardinality $(2^m-1)^5$. Through this method, we provide an explicit enumeration of the total number of $4 \times 4$ involutory MDS matrices over $\mathbb{F}_{2^m}$ for $m=3,4,\ldots,8$.

Submitted 17 June, 2024; v1 submitted 12 April, 2024; originally announced April 2024.

Journal ref: Journal of Applied Mathematics and Computing, 14 Jun 2024

arXiv:2403.15529 [pdf, other]

LimGen: Probing the LLMs for Generating Suggestive Limitations of Research Papers

Authors: Abdur Rahman Bin Md Faizullah, Ashok Urlana, Rahul Mishra

Abstract: Examining limitations is a crucial step in the scholarly research reviewing process, revealing aspects where a study might lack decisiveness or require enhancement. This aids readers in considering broader implications for further research. In this article, we present a novel and challenging task of Suggestive Limitation Generation (SLG) for research papers. We compile a dataset called LimGen, encompassing 4068 research papers and their associated limitations from the ACL anthology. We investigate several approaches to harness large language models (LLMs) for producing suggestive limitations, by thoroughly examining the related challenges, practical insights, and potential opportunities. Our LimGen dataset and code can be accessed at https://github.com/arbmf/LimGen.

Submitted 14 June, 2024; v1 submitted 22 March, 2024; originally announced March 2024.

Comments: Accepted at ECML-PKDD 2024

arXiv:2403.10372 [pdf, other]

Construction of all MDS and involutory MDS matrices

Authors: Yogesh Kumar, P. R. Mishra, Susanta Samanta, Kishan Chand Gupta, Atul Gaur

Abstract: In this paper, we propose two algorithms for a hybrid construction of all $n\times n$ MDS and involutory MDS matrices over a finite field $\mathbb{F}_{p^m}$, respectively. The proposed algorithms effectively narrow down the search space to identify $(n-1) \times (n-1)$ MDS matrices, facilitating the generation of all $n \times n$ MDS and involutory MDS matrices over $\mathbb{F}_{p^m}$. To the best of our knowledge, existing literature lacks methods for generating all $n\times n$ MDS and involutory MDS matrices over $\mathbb{F}_{p^m}$. In our approach, we introduce a representative matrix form for generating all $n\times n$ MDS and involutory MDS matrices over $\mathbb{F}_{p^m}$. The determination of these representative MDS matrices involves searching through all $(n-1)\times (n-1)$ MDS matrices over $\mathbb{F}_{p^m}$. Our contributions extend to proving that the count of all $3\times 3$ MDS matrices over $\mathbb{F}_{2^m}$ is precisely $(2^m-1)^5(2^m-2)(2^m-3)(2^{2m}-9\cdot 2^m+21)$. Furthermore, we explicitly provide the count of all $4\times 4$ MDS and involutory MDS matrices over $\mathbb{F}_{2^m}$ for $m=2, 3, 4$.

Submitted 15 March, 2024; originally announced March 2024.
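
The closed-form count of $3\times 3$ MDS matrices quoted in the abstract above can be evaluated directly, and for the smallest field it can be cross-checked by brute force. The sketch below does both, using the standard criterion that a square matrix is MDS exactly when every square submatrix is non-singular. The GF(4) representation (integers 0 to 3, reduction polynomial x^2 + x + 1) and all function names are assumptions made for illustration; the exhaustive check is not the paper's construction algorithm.

```python
from itertools import combinations, product


def count_3x3_mds_formula(m):
    """Evaluate the closed-form count from the abstract for q = 2^m."""
    q = 2 ** m
    return (q - 1) ** 5 * (q - 2) * (q - 3) * (q * q - 9 * q + 21)


def gf4_mul(a, b):
    """Multiply in GF(4) = GF(2)[x]/(x^2 + x + 1), elements encoded 0..3."""
    result = 0
    for _ in range(2):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0b100:
            a ^= 0b111
    return result


def det2(a, b, c, d):
    """Determinant of [[a, b], [c, d]]; subtraction equals XOR in char 2."""
    return gf4_mul(a, d) ^ gf4_mul(b, c)


def is_mds_3x3(mat):
    """MDS iff all 1x1, 2x2 and 3x3 minors are non-zero."""
    if any(x == 0 for row in mat for x in row):
        return False
    for r0, r1 in combinations(range(3), 2):
        for c0, c1 in combinations(range(3), 2):
            if det2(mat[r0][c0], mat[r0][c1], mat[r1][c0], mat[r1][c1]) == 0:
                return False
    # Cofactor expansion along the first row (signs vanish in char 2).
    det = 0
    for j, (c0, c1) in enumerate([(1, 2), (0, 2), (0, 1)]):
        minor = det2(mat[1][c0], mat[1][c1], mat[2][c0], mat[2][c1])
        det ^= gf4_mul(mat[0][j], minor)
    return det != 0


def count_3x3_mds_bruteforce_gf4():
    """Enumerate all 3x3 matrices over GF(4) with non-zero entries."""
    count = 0
    for e in product(range(1, 4), repeat=9):
        if is_mds_3x3([e[0:3], e[3:6], e[6:9]]):
            count += 1
    return count


if __name__ == "__main__":
    for m in (2, 3, 4):
        print(m, count_3x3_mds_formula(m))
    print("brute force over GF(4):", count_3x3_mds_bruteforce_gf4())
```

For m = 2 the formula gives 3^5 · 2 · 1 · 1 = 486, and the exhaustive enumeration over GF(4) agrees with it.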

arXiv:2403.08360 [pdf, other]

Improved Image-based Pose Regressor Models for Underwater Environments

Authors: Luyuan Peng, Hari Vishnu, Mandar Chitre, Yuen Min Too, Bharath Kalyan, Rajat Mishra

Abstract: We investigate the performance of image-based pose regressor models in underwater environments for relocalization. Leveraging PoseNet and PoseLSTM, we regress a 6-degree-of-freedom pose from single RGB images with high accuracy. Additionally, we explore data augmentation with stereo camera images to improve model accuracy. Experimental results demonstrate that the models achieve high accuracy in both simulated and clear waters, promising effective real-world underwater navigation and inspection applications.

Submitted 13 March, 2024; originally announced March 2024.

Comments: Presented at AUV Symposium 2022

arXiv:2402.14558 [pdf, other]

LLMs with Industrial Lens: Deciphering the Challenges and Prospects -- A Survey

Authors: Ashok Urlana, Charaka Vinayak Kumar, Ajeet Kumar Singh, Bala Mallikarjunarao Garlapati, Srinivasa Rao Chalamala, Rahul Mishra

Abstract: Large language models (LLMs) have become the secret ingredient driving numerous industrial applications, showcasing their remarkable versatility across a diverse spectrum of tasks. From natural language processing and sentiment analysis to content generation and personalized recommendations, their unparalleled adaptability has facilitated widespread adoption across industries. This transformative shift driven by LLMs underscores the need to explore the underlying associated challenges and avenues for enhancement in their utilization. In this paper, our objective is to unravel and evaluate the obstacles and opportunities inherent in leveraging LLMs within an industrial context. To this end, we conduct a survey involving a group of industry practitioners, develop four research questions derived from the insights gathered, and examine 68 industry papers to address these questions and derive meaningful conclusions.

Submitted 22 February, 2024; originally announced February 2024.

Comments: 25 pages, 7 figures

arXiv:2402.13571 [pdf]

Multilingual Coreference Resolution in Low-resource South Asian Languages

Authors: Ritwik Mishra, Pooja Desur, Rajiv Ratn Shah, Ponnurangam Kumaraguru

Abstract: Coreference resolution involves the task of identifying text spans within a discourse that pertain to the same real-world entity. While this task has been extensively explored in the English language, there has been a notable scarcity of publicly accessible resources and models for coreference resolution in South Asian languages. We introduce a Translated dataset for Multilingual Coreference Resolution (TransMuCoRes) in 31 South Asian languages using off-the-shelf tools for translation and word-alignment. Nearly all of the predicted translations successfully pass a sanity check, and 75% of English references align with their predicted translations. Using multilingual encoders, two off-the-shelf coreference resolution models were trained on a concatenation of TransMuCoRes and a Hindi coreference resolution dataset with manual annotations. The best performing model achieved a score of 64 and 68 for LEA F1 and CoNLL F1, respectively, on our test-split of Hindi golden set. This study is the first to evaluate an end-to-end coreference resolution model on a Hindi golden set. Furthermore, this work underscores the limitations of current coreference evaluation metrics when applied to datasets with split antecedents, advocating for the development of more suitable evaluation metrics.

Submitted 23 March, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

Comments: Accepted at LREC-COLING 2024

arXiv:2402.10115 [pdf, other]

Generating Visual Stimuli from EEG Recordings using Transformer-encoder based EEG encoder and GAN

Authors: Rahul Mishra, Arnav Bhavsar

Abstract: In this study, we tackle a modern research challenge within the field of perceptual brain decoding, which revolves around synthesizing images from EEG signals using an adversarial deep learning framework. The specific objective is to recreate images belonging to various object categories by leveraging EEG recordings obtained while subjects view those images. To achieve this, we employ a Transformer-encoder based EEG encoder to produce EEG encodings, which serve as inputs to the generator component of the GAN network. Alongside the adversarial loss, we also incorporate perceptual loss to enhance the quality of the generated images.

Submitted 15 February, 2024; originally announced February 2024.

arXiv:2402.09095 [pdf, other]

FedSiKD: Clients Similarity and Knowledge Distillation: Addressing Non-i.i.d. and Constraints in Federated Learning

Authors: Yousef Alsenani, Rahul Mishra, Khaled R. Ahmed, Atta Ur Rahman

Abstract: In recent years, federated learning (FL) has emerged as a promising technique for training machine learning models in a decentralized manner while also preserving data privacy. The non-independent and identically distributed (non-i.i.d.) nature of client data, coupled with constraints on client or edge devices, presents significant challenges in FL. Furthermore, learning across a high number of communication rounds can be risky and potentially unsafe for model exploitation. Traditional FL approaches may suffer from these challenges. Therefore, we introduce FedSiKD, which incorporates knowledge distillation (KD) within a similarity-based federated learning framework. As clients join the system, they securely share relevant statistics about their data distribution, promoting intra-cluster homogeneity. This enhances optimization efficiency and accelerates the learning process, effectively transferring knowledge between teacher and student models and addressing device constraints. FedSiKD outperforms state-of-the-art algorithms by achieving higher accuracy, exceeding by 25% and 18% for highly skewed data at $α= {0.1,0.5}$ on the HAR and MNIST datasets, respectively. Its faster convergence is illustrated by a 17% and 20% increase in accuracy within the first five rounds on the HAR and MNIST datasets, respectively, highlighting its early-stage learning proficiency. Code is publicly available and hosted on GitHub (https://github.com/SimuEnv/FedSiKD).

Submitted 14 February, 2024; originally announced February 2024.

Comments: 11 pages, 10 figures Under Review - IEEE Transactions on Information Forensics & Security

arXiv:2402.07054 [pdf]

HNMblock: Blockchain technology powered Healthcare Network Model for epidemiological monitoring, medical systems security, and wellness

Authors: Naresh Kshetri, Rahul Mishra, Mir Mehedi Rahman, Tanja Steigner

Abstract: In the ever-evolving healthcare sector, the widespread adoption of Internet of Things and wearable technologies facilitates remote patient monitoring. However, the existing client/server infrastructure poses significant security and privacy challenges, necessitating strict adherence to healthcare data regulations. To combat these issues, a decentralized approach is imperative, and blockchain technology emerges as a compelling solution for strengthening Internet of Things and medical systems security. This paper introduces HNMblock, a model that elevates the realms of epidemiological monitoring, medical system security, and wellness enhancement. By harnessing the transparency and immutability inherent in blockchain, HNMblock empowers real-time, tamper-proof tracking of epidemiological data, enabling swift responses to disease outbreaks. Furthermore, it fortifies the security of medical systems through advanced cryptographic techniques and smart contracts, with a paramount focus on safeguarding patient privacy. HNMblock also fosters personalized healthcare, encouraging patient involvement and data-informed decision-making. The integration of blockchain within the healthcare domain, as exemplified by HNMblock, holds the potential to revolutionize data management, epidemiological surveillance, and wellness, as meticulously explored in this research article.

Submitted 10 February, 2024; originally announced February 2024.

Comments: 7 pages, 3 figures

arXiv:2402.00260 [pdf, other]

Human-mediated Large Language Models for Robotic Intervention in Children with Autism Spectrum Disorders

Authors: Ruchik Mishra, Karla Conn Welch, Dan O Popa

Abstract: The robotic intervention for individuals with Autism Spectrum Disorder (ASD) has generally used pre-defined scripts to deliver verbal content during one-to-one therapy sessions. This practice restricts the use of robots to limited, pre-mediated instructional curricula. In this paper, we increase robot autonomy in one such robotic intervention for children with ASD by implementing perspective-taking teaching. Our approach uses large language models (LLM) to generate verbal content as texts and then deliver it to the child via robotic speech. In the proposed pipeline, we teach perspective-taking through which our robot takes up three roles: initiator, prompter, and reinforcer. We adopted the GPT-2 + BART pipelines to generate social situations, ask questions (as initiator), and give options (as prompter) when required. The robot encourages the child by giving positive reinforcement for correct answers (as a reinforcer). In addition to our technical contribution, we conducted ten-minute sessions with domain experts simulating an actual perspective teaching session, with the researcher acting as a child participant. These sessions validated our robotic intervention pipeline through surveys, including those from NASA TLX and GodSpeed. We used BERTScore to compare our GPT-2 + BART pipeline with an all-GPT-2 pipeline and found the performance of the former to be better. Based on the responses by the domain experts, the robot session demonstrated higher performance with no additional increase in mental or physical demand, temporal demand, effort, or frustration compared to a no-robot session. We also concluded that the domain experts perceived the robot as ideally safe, likable, and reliable.

Submitted 9 July, 2024; v1 submitted 31 January, 2024; originally announced February 2024.

Comments: This work is submitted for possible publication

arXiv:2401.14111 [pdf, other]

Image Synthesis with Graph Conditioning: CLIP-Guided Diffusion Models for Scene Graphs

Authors: Rameshwar Mishra, A V Subramanyam

Abstract: Advancements in generative models have sparked significant interest in generating images while adhering to specific structural guidelines. Scene graph to image generation is one such task of generating images which are consistent with the given scene graph. However, the complexity of visual scenes poses a challenge in accurately aligning objects based on specified relations within the scene graph. Existing methods approach this task by first predicting a scene layout and generating images from these layouts using adversarial training. In this work, we introduce a novel approach to generate images from scene graphs which eliminates the need to predict intermediate layouts. We leverage pre-trained text-to-image diffusion models and CLIP guidance to translate graph knowledge into images. Towards this, we first pre-train our graph encoder to align graph features with CLIP features of corresponding images using GAN-based training. Further, we fuse the graph features with the CLIP embedding of object labels present in the given scene graph to create a graph-consistent, CLIP-guided conditioning signal. In the conditioning input, object embeddings provide the coarse structure of the image and graph features provide structural alignment based on relationships among objects. Finally, we fine-tune a pre-trained diffusion model with the graph-consistent conditioning signal using reconstruction and CLIP alignment losses. Elaborate experiments reveal that our method outperforms existing methods on the standard benchmarks of the COCO-stuff and Visual Genome datasets.

Submitted 22 July, 2024; v1 submitted 25 January, 2024; originally announced January 2024.

arXiv:2312.15954 [pdf, ps, other]

On two-dimensional minimal linear codes over the rings $\mathbb{Z}_{p^n}$

Authors: Biplab Chatterjee, Ratnesh Kumar Mishra

Abstract: In this paper we study two-dimensional minimal linear codes over the ring $\mathbb{Z}_{p^n}$ (where $p$ is prime). We show that if the generator matrix $G$ of the two-dimensional linear code $M$ contains $p^n+p^{n-1}$ column vectors of the following types {\scriptsize{$u_{l_1}\begin{pmatrix} 1\\ 0 \end{pmatrix}$, $u_{l_2}\begin{pmatrix} 0\\1 \end{pmatrix}$, $u_{l_3}\begin{pmatrix} 1\\u_1 \end{pmatrix}$, $u_{l_4}\begin{pmatrix} 1\\u_2 \end{pmatrix}$,...,$u_{l_{p^n-p^{n-1}+2}} \begin{pmatrix} 1\\u_{p^n-p^{n-1}} \end{pmatrix}$, $u_{l_{p^n-p^{n-1}+3}}\begin{pmatrix} d_1 \\ 1 \end{pmatrix}$, $u_{l_{p^n-p^{n-1}+4}}\begin{pmatrix} d_2\\ 1 \end{pmatrix}$,..., $u_{l_{p^n+1}}\begin{pmatrix} d_{p^{n-1}-1}\\1 \end{pmatrix}$, $u_{l_{p^n+2}}\begin{pmatrix} 1\\d_1 \end{pmatrix}$, $u_{l_{p^n+3}}\begin{pmatrix} 1\\d_2 \end{pmatrix}$,...,$u_{l_{p^n+p^{n-1}}}\begin{pmatrix} 1 \\d_{p^{n-1}-1} \end{pmatrix}$}}, where $u_i$ and $d_j$ are distinct units and zero divisors respectively in the ring $\mathbb{Z}_{p^n}$ for $1\leq i \leq p^n+p^{n-1}$, $1\leq j \leq p^{n-1}-1$ and additionally, denote $u_{l_i}$ as units in $\mathbb{Z}_{p^n}$, then the module generated by $G$ is a minimal linear code. We also show that if any one column vector of the above types is entirely absent from $G$, then the generated module is not a minimal linear code.

Submitted 3 March, 2024; v1 submitted 26 December, 2023; originally announced December 2023.

arXiv:2312.07709 [pdf, other]

Majority is Not Required: A Rational Analysis of the Private Double-Spend Attack from a Sub-Majority Adversary

Authors: Yanni Georghiades, Rajesh Mishra, Karl Kreder, Sriram Vishwanath

Abstract: We study the incentives behind double-spend attacks on Nakamoto-style Proof-of-Work cryptocurrencies. In these systems, miners are allowed to choose which transactions to reference with their block, and a common strategy for selecting transactions is to simply choose those with the highest fees. This can be problematic if these transactions originate from an adversary with substantial (but less than 50\%) computational power, as high-value transactions can present an incentive for a rational adversary to attempt a double-spend attack if they expect to profit. The most common mechanism for deterring double-spend attacks is for the recipients of large transactions to wait for additional block confirmations (i.e., to increase the attack cost). We argue that this defense mechanism is not satisfactory, as the security of the system is contingent on the actions of its users. Instead, we propose that defending against double-spend attacks should be the responsibility of the miners; specifically, miners should limit the amount of transaction value they include in a block (i.e., reduce the attack reward). To this end, we model cryptocurrency mining as a mean-field game in which we augment the standard mining reward function to simulate the presence of a rational, double-spending adversary. We design and implement an algorithm which characterizes the behavior of miners at equilibrium, and we show that miners who use the adversary-aware reward function accumulate more wealth than those who do not.
We show that the optimal strategy for honest miners is to limit the amount of value transferred by each block such that the adversary's expected profit is 0. Additionally, we examine Bitcoin's resilience to double-spend attacks. Assuming a 6 block confirmation time, we find that an attacker with at least 25% of the network mining power can expect to profit from a double-spend attack.

Submitted 12 December, 2023; originally announced December 2023.

arXiv:2311.09212 [pdf, other]

Controllable Text Summarization: Unraveling Challenges, Approaches, and Prospects -- A Survey

Authors: Ashok Urlana, Pruthwik Mishra, Tathagato Roy, Rahul Mishra

Abstract: Generic text summarization approaches often fail to address the specific intent and needs of individual users. Recently, scholarly attention has turned to the development of summarization methods that are more closely tailored and controlled to align with specific objectives and user needs. Despite a growing corpus of controllable summarization research, there is no comprehensive survey available that thoroughly explores the diverse controllable attributes employed in this context, delves into the associated challenges, and investigates the existing solutions. In this survey, we formalize the Controllable Text Summarization (CTS) task, categorize controllable attributes according to their shared characteristics and objectives, and present a thorough examination of existing datasets and methods within each category. Moreover, based on our findings, we uncover limitations and research gaps, while also exploring potential solutions and future directions for CTS. We release our detailed analysis of CTS papers at https://github.com/ashokurlana/controllable_text_summarization_survey.

Submitted 27 May, 2024; v1 submitted 15 November, 2023; originally announced November 2023.

Comments: 21 pages, 6 figures, Accepted in ACL Findings 2024

ACM Class: I.2.7

arXiv:2310.05229 [pdf, other]

Design Verification of the Quantum Control Stack

Authors: Seyed Amir Alavi, Samin Ishtiaq, Nick Johnson, Rojalin Mishra, Dwaraka Oruganti Nagalakshmi, Asher Pearl, Jan Snoeijs

Abstract: This paper describes the verification of the classical software and hardware stack that is used to control cold atom- and superconducting-based quantum computing hardware. The paper serves both as an introduction to quantum computing and to how classical device verification techniques can be employed there. Two main challenges in building a quantum control stack are generating precise deterministic-timing operations at the edge and scaled-out processing in the middle layer. Both challenges are to do with a certain kind of functional performance correctness. And, as usual, the design lives under tight power, memory and latency constraints. The quantum control stack is a complex interaction of algorithms, software runtimes and digital hardware. We take inspiration from modern software approaches to engineering, such as continuous integration and hardware automation, to quickly ship experimental features to customers in the field.

Submitted 8 October, 2023; originally announced October 2023.

Comments: In DVCon Europe 2023

ACM Class: D.1; C.1

arXiv:2309.15886 [pdf, ps, other]

Projection based fuzzy least squares twin support vector machine for class imbalance problems

Authors: M. Tanveer, Ritik Mishra, Bharat Richhariya

Abstract: Class imbalance is a major problem in many real world classification tasks. Due to the imbalance in the number of samples, the support vector machine (SVM) classifier gets biased toward the majority class. Furthermore, these samples are often observed with a certain degree of noise. Therefore, to remove these problems we propose a novel fuzzy based approach to deal with class imbalanced as well as noisy datasets. We propose two approaches to address these problems. The first approach is based on the intuitionistic fuzzy membership, termed as robust energy-based intuitionistic fuzzy least squares twin support vector machine (IF-RELSTSVM). Furthermore, we introduce the concept of hyperplane-based fuzzy membership in our second approach, where the final classifier is termed as robust energy-based fuzzy least square twin support vector machine (F-RELSTSVM). By using this technique, the membership values are based on a projection based approach, where the data points are projected on the hyperplanes. The performance of the proposed algorithms is evaluated on several benchmark and synthetic datasets. The experimental results show that the proposed IF-RELSTSVM and F-RELSTSVM models outperform the baseline algorithms. Statistical tests are performed to check the significance of the proposed algorithms. The results show the applicability of the proposed algorithms on noisy as well as imbalanced datasets.

Submitted 27 September, 2023; originally announced September 2023.

arXiv:2307.08237 [pdf, other]

doi 10.1145/3580305.3599763

A Look into Causal Effects under Entangled Treatment in Graphs: Investigating the Impact of Contact on MRSA Infection

Authors: Jing Ma, Chen Chen, Anil Vullikanti, Ritwick Mishra, Gregory Madden, Daniel Borrajo, Jundong Li

Abstract: Methicillin-resistant Staphylococcus aureus (MRSA) is a type of bacteria resistant to certain antibiotics, making it difficult to prevent MRSA infections. Among decades of efforts to conquer infectious diseases caused by MRSA, many studies have been proposed to estimate the causal effects of close contact (treatment) on MRSA infection (outcome) from observational data. In this problem, the treatment assignment mechanism plays a key role as it determines the patterns of missing counterfactuals -- the fundamental challenge of causal effect estimation. Most existing observational studies for causal effect learning assume that the treatment is assigned individually for each unit. However, on many occasions, the treatments are pairwisely assigned for units that are connected in graphs, i.e., the treatments of different units are entangled. Neglecting the entangled treatments can impede the causal effect estimation. In this paper, we study the problem of causal effect estimation with treatment entangled in a graph. Despite a few explorations for entangled treatments, this problem still remains challenging due to the following challenges: (1) the entanglement brings difficulties in modeling and leveraging the unknown treatment assignment mechanism; (2) there may exist hidden confounders which lead to confounding biases in causal effect estimation; (3) the observational data is often time-varying.
To tackle these challenges, we propose a novel method NEAT, which explicitly leverages the graph structure to model the treatment assignment mechanism, and mitigates confounding biases based on the treatment assignment modeling. We also extend our method into a dynamic setting to handle time-varying observational data. Experiments on both synthetic datasets and a real-world MRSA dataset validate the effectiveness of the proposed method, and provide insights for future applications.

Submitted 17 July, 2023; originally announced July 2023.

arXiv:2307.05909 [pdf]

Exploring AI Tool's Versatile Responses: An In-depth Analysis Across Different Industries and Its Performance Evaluation

Authors: Hitesh Mohapatra, Soumya Ranjan Mishra

Abstract: AI Tool is a large language model (LLM) designed to generate human-like responses in natural language conversations. It is trained on a massive corpus of text from the internet, which allows it to leverage a broad understanding of language, general knowledge, and various domains. AI Tool can provide information, engage in conversations, assist with tasks, and even offer creative suggestions. The underlying technology behind AI Tool is a transformer neural network. Transformers excel at capturing long-range dependencies in text, making them well-suited for language-related tasks. AI Tool has 175 billion parameters, making it one of the largest and most powerful LLMs to date. This work presents an overview of AI Tool's responses on various sectors of industry. Further, the responses of AI Tool have been cross-verified with human experts in the corresponding fields. To validate the performance of AI Tool, a few explicit parameters have been considered and the evaluation has been done. This study will help the research community and other users to understand the uses of AI Tool and its interaction pattern. The results of this study show that AI Tool is able to generate human-like responses that are both informative and engaging. However, it is important to note that AI Tool can occasionally produce incorrect or nonsensical answers. It is therefore important to critically evaluate the information that AI Tool provides and to verify it from reliable sources when necessary.
Overall, this study suggests that AI Tool is a promising new tool for natural language processing, and that it has the potential to be used in a wide variety of applications.

Submitted 21 August, 2023; v1 submitted 12 July, 2023; originally announced July 2023.

ACM Class: I.2.6

arXiv:2307.02480 [pdf, other]

doi 10.21227/av6q-jj17

A Dataset of Inertial Measurement Units for Handwritten English Alphabets

Authors: Hari Prabhat Gupta, Rahul Mishra

Abstract: This paper presents an end-to-end methodology for collecting datasets to recognize handwritten English alphabets by utilizing Inertial Measurement Units (IMUs) and leveraging the diversity present in the Indian writing style. The IMUs are utilized to capture the dynamic movement patterns associated with handwriting, enabling more accurate recognition of alphabets. The Indian context introduces various challenges due to the heterogeneity in writing styles across different regions and languages. By leveraging this diversity, the collected dataset and the collection system aim to achieve higher recognition accuracy. Some preliminary experimental results demonstrate the effectiveness of the dataset in accurately recognizing handwritten English alphabets in the Indian context. This research can be extended; it contributes to the field of pattern recognition and offers valuable insights for developing improved systems for handwriting recognition, particularly in diverse linguistic and cultural contexts.

Submitted 5 July, 2023; originally announced July 2023.

Comments: 10 pages, 12 figures

arXiv:2307.01309 [pdf, other]

doi 10.1109/ACIIW59127.2023.10388187

Social Impressions of the NAO Robot and its Impact on Physiology

Authors: Ruchik Mishra, Karla Conn Welch

Abstract: The social applications of robots possess intrinsic challenges with respect to social paradigms and heterogeneity of different groups. These challenges can be in the form of social acceptability, anthropomorphism, likeability, past experiences with robots etc. In this paper, we have considered a group of neurotypical adults to describe how different voices and motion types of the NAO robot can have an effect on the perceived safety, anthropomorphism, likeability, animacy, and perceived intelligence of the robot. In addition, prior robot experience has also been taken into consideration to perform this analysis using a one-way Analysis of Variance (ANOVA). Further, we also demonstrate that these different modalities instigate different physiological responses in the person. This classification has been done using two different deep learning approaches, 1) Convolutional Neural Network (CNN), and 2) Gramian Angular Fields on the Blood Volume Pulse (BVP) data recorded. Both of these approaches achieve better than the chance accuracy of 25% for a 4-class classification.

Submitted 3 July, 2023; originally announced July 2023.

Comments: Accepted for the Special Track on Affective Robotics (AFFRO) of ACII 2023

Journal ref: 2023 11th International Conference on Affective Computing and Intelligent Interaction Workshops and Demos (ACIIW)

arXiv:2306.10165 [pdf, other]

Data Selection for Fine-tuning Large Language Models Using Transferred Shapley Values

Authors: Stephanie Schoch, Ritwick Mishra, Yangfeng Ji

Abstract: Although Shapley values have been shown to be highly effective for identifying harmful training instances, dataset size and model complexity constraints limit the ability to apply Shapley-based data valuation to fine-tuning large pre-trained language models. To address this, we propose TS-DShapley, an algorithm that reduces computational cost of Shapley-based data valuation through: 1) an efficient sampling-based method that aggregates Shapley values computed from subsets for valuation of the entire training set, and 2) a value transfer method that leverages value information extracted from a simple classifier trained using representations from the target language model. Our experiments applying TS-DShapley to select data for fine-tuning BERT-based language models on benchmark natural language understanding (NLU) datasets show that TS-DShapley outperforms existing data selection methods. Further, TS-DShapley can filter fine-tuning data to increase language model performance compared to training with the full fine-tuning dataset.

Submitted 16 June, 2023; originally announced June 2023.

Comments: Accepted to ACL SRW 2023

arXiv:2306.04207 [pdf, ps, other]

Resource Aware Clustering for Tackling the Heterogeneity of Participants in Federated Learning

Authors: Rahul Mishra, Hari Prabhat Gupta, Garvit Banga

Abstract: Federated Learning is a training framework that enables multiple participants to collaboratively train a shared model while preserving data privacy and minimizing communication overhead. The heterogeneity of devices and networking resources of the participants delays the training and aggregation in federated learning. This paper proposes a federated learning approach to manoeuvre the heterogeneity among the participants using resource aware clustering. The approach begins with the server gathering information about the devices and networking resources of participants, after which resource aware clustering is performed to determine the optimal number of clusters using Dunn Indices. The mechanism of participant assignment is then introduced, and the expression of communication rounds required for model convergence in each cluster is mathematically derived. Furthermore, a master-slave technique is introduced to improve the performance of the lightweight models in the clusters using knowledge distillation. Finally, experimental evaluations are conducted to verify the feasibility and effectiveness of the approach and to compare it with state-of-the-art techniques.

Submitted 7 June, 2023; originally announced June 2023.

Comments: 13 pages, 4 figures

arXiv:2304.09574 [pdf]

doi 10.1007/s11277-023-10351-1

The State-of-the-Art in Air Pollution Monitoring and Forecasting Systems using IoT, Big Data, and Machine Learning

Authors: Amisha Gangwar, Sudhakar Singh, Richa Mishra, Shiv Prakash

Abstract: The quality of air is closely linked with the life quality of humans, plantations, and wildlife. It needs to be monitored and preserved continuously. Transportations, industries, construction sites, generators, fireworks, and waste burning have a major percentage in degrading the air quality. These sources are required to be used in a safe and controlled manner. Using traditional laboratory analysis or installing bulk and expensive models every few miles is no longer efficient. Smart devices are needed for collecting and analyzing air data. The quality of air depends on various factors, including location, traffic, and time. Recent researches are using machine learning algorithms, big data technologies, and the Internet of Things to propose a stable and efficient model for the stated purpose. This review paper focuses on studying and compiling recent research in this field and emphasizes the Data sources, Monitoring, and Forecasting models. The main objective of this paper is to provide the astuteness of the researches happening to improve the various aspects of air polluting models. Further, it casts light on the various research issues and challenges also.

Submitted 19 April, 2023; originally announced April 2023.

Comments: 30 pages, 11 figures, Wireless Personal Communications. Wireless Pers Commun (2023)

Report number: WIRE-D-22-01442-R1

arXiv:2303.15745 [pdf, other]

On Feature Scaling of Recursive Feature Machines

Authors: Arunav Gupta, Rohit Mishra, William Luu, Mehdi Bouassami

Abstract: In this technical report, we explore the behavior of Recursive Feature Machines (RFMs), a type of novel kernel machine that recursively learns features via the average gradient outer product, through a series of experiments on regression datasets. When successively adding random noise features to a dataset, we observe intriguing patterns in the Mean Squared Error (MSE) curves with the test MSE exhibiting a decrease-increase-decrease pattern. This behavior is consistent across different dataset sizes, noise parameters, and target functions. Interestingly, the observed MSE curves show similarities to the "double descent" phenomenon observed in deep neural networks, hinting at a new connection between RFMs and neural network behavior. This report lays the groundwork for future research into this peculiar behavior.

Submitted 28 March, 2023; originally announced March 2023.

arXiv:2212.06498 [pdf, other]

Active Vibration Fluidization for Granular Jamming Grippers

Authors: Cameron Coombe, James Brett, Raghav Mishra, Gary W. Delaney, David Howard

Abstract: Granular jamming has recently become popular in soft robotics with widespread applications including industrial gripping, surgical robotics and haptics. Previous work has investigated the use of various techniques that exploit the nature of granular physics to improve jamming performance, however this is generally underrepresented in the literature compared to its potential impact. We present the first research that exploits vibration-based fluidisation actively (e.g., during a grip) to elicit bespoke performance from granular jamming grippers. We augment a conventional universal gripper with a computer-controlled audio exciter, which is attached to the gripper via a 3D printed mount, and build an automated test rig to allow large-scale data collection to explore the effects of active vibration. We show that vibration in soft jamming grippers can improve holding strength. In a series of studies, we show that frequency and amplitude of the waveforms are key determinants to performance, and that jamming performance is also dependent on temporal properties of the induced waveform. We hope to encourage further study focused on active vibrational control of jamming in soft robotics to improve performance and increase diversity of potential applications.

Submitted 13 December, 2022; originally announced December 2022.

Comments: arXiv admin note: substantial text overlap with arXiv:2109.10496

arXiv:2209.15614 [pdf, other]

doi 10.1109/ISIT50566.2022.9834589

TinyTurbo: Efficient Turbo Decoders on Edge

Authors: S Ashwin Hebbar, Rajesh K Mishra, Sravan Kumar Ankireddy, Ashok V Makkuva, Hyeji Kim, Pramod Viswanath

Abstract: In this paper, we introduce a neural-augmented decoder for Turbo codes called TINYTURBO. TINYTURBO has complexity comparable to the classical max-log-MAP algorithm but has much better reliability than the max-log-MAP baseline and performs close to the MAP algorithm. We show that TINYTURBO exhibits strong robustness on a variety of practical channels of interest, such as EPA and EVA channels, which are included in the LTE standards. We also show that TINYTURBO strongly generalizes across different rates, blocklengths, and trellises. We verify the reliability and efficiency of TINYTURBO via over-the-air experiments.

Submitted 30 September, 2022; originally announced September 2022.

Comments: 10 pages, 6 figures. Published at the 2022 IEEE International Symposium on Information Theory (ISIT)

Journal ref: "TinyTurbo: Efficient Turbo Decoders on Edge," 2022 IEEE International Symposium on Information Theory (ISIT), 2022, pp. 2797-2802

arXiv:2209.15560 [pdf, ps, other]

Designing and Training of Lightweight Neural Networks on Edge Devices using Early Halting in Knowledge Distillation

Authors: Rahul Mishra, Hari Prabhat Gupta

Abstract: Automated feature extraction capability and significant performance of Deep Neural Networks (DNN) make them suitable for Internet of Things (IoT) applications. However, deploying DNN on edge devices becomes prohibitive due to the colossal computation, energy, and storage requirements. This paper presents a novel approach for designing and training lightweight DNN using large-size DNN. The approach considers the available storage, processing speed, and maximum allowable processing time to execute the task on edge devices. We present a knowledge distillation based training procedure to train the lightweight DNN to achieve adequate accuracy. During the training of lightweight DNN, we introduce a novel early halting technique, which preserves network resources and thus speeds up the training procedure. Finally, we present empirical and real-world evaluations to verify the effectiveness of the proposed approach under different constraints using various edge devices.

Submitted 30 September, 2022; originally announced September 2022.

Comments: 13 pages, 7 figures, 11 tables

arXiv:2209.01417 [pdf, ps, other]

Suppressing Noise from Built Environment Datasets to Reduce Communication Rounds for Convergence of Federated Learning

Authors: Rahul Mishra, Hari Prabhat Gupta, Tanima Dutta, Sajal K. Das

Abstract: Smart sensing provides an easier and convenient data-driven mechanism for monitoring and control in the built environment. Data generated in the built environment are privacy sensitive and limited. Federated learning is an emerging paradigm that provides privacy-preserving collaboration among multiple participants for model training without sharing private and limited data. The noisy labels in the datasets of the participants degrade the performance and increase the number of communication rounds for convergence of federated learning. Such large communication rounds require more time and energy to train the model. In this paper, we propose a federated learning approach to suppress the unequal distribution of the noisy labels in the dataset of each participant. The approach first estimates the noise ratio of the dataset for each participant and normalizes the noise ratio using the server dataset. The proposed approach can handle bias in the server dataset and minimizes its impact on the participants' dataset. Next, we calculate the optimal weighted contributions of the participants using the normalized noise ratio and influence of each participant. We further derive the expression to estimate the number of communication rounds required for the convergence of the proposed approach. Finally, experimental results demonstrate the effectiveness of the proposed approach over existing techniques in terms of the communication rounds and achieved performance in the built environment.

Submitted 3 September, 2022; originally announced September 2022.

Comments: 11 pages, 5 figures

arXiv:2205.05882 [pdf]

doi 10.1109/DASA54658.2022.9765017

E-Mail Assistant -- Automation of E-Mail Handling and Management using Robotic Process Automation

Authors: Arpit Khare, Sudhakar Singh, Richa Mishra, Shiv Prakash, Pratibha Dixit

Abstract: In this paper, a workflow for designing a bot using Robotic Process Automation (RPA), associated with Artificial Intelligence (AI) that is used for information extraction, classification, etc., is proposed. The bot is equipped with many features that make email handling a stress-free job. It automatically logs into the mailbox through secured channels, distinguishes between the useful and not useful emails, classifies the emails into different labels, downloads the attached files, creates different directories, and stores the downloaded files into relevant directories. It moves the not useful emails into the trash. Further, the bot can also be trained to rename the attached files with the names of the sender/applicant in case of a job application for the sake of convenience. The bot is designed and tested using the UiPath tool to improve the performance of the system. The paper also discusses the further possible functionalities that can be added on to the bot.

Submitted 12 May, 2022; originally announced May 2022.

Comments: 7 pages, 4 figures, Accepted in DASA 2022

Report number: 1570792902 ACM Class: I.2.1

arXiv:2201.01486 [pdf]

doi 10.1007/978-3-030-96040-7_48

Sign Language Recognition System using TensorFlow Object Detection API

Authors: Sharvani Srivastava, Amisha Gangwar, Richa Mishra, Sudhakar Singh

Abstract: Communication is defined as the act of sharing or exchanging information, ideas or feelings. To establish communication between two people, both of them are required to have knowledge and understanding of a common language. But in the case of deaf and dumb people, the means of communication are different. Deaf is the inability to hear and dumb is the inability to speak. They communicate using sign language among themselves and with normal people but normal people do not take seriously the importance of sign language. Not everyone possesses the knowledge and understanding of sign language which makes communication difficult between a normal person and a deaf and dumb person. To overcome this barrier, one can build a model based on machine learning. A model can be trained to recognize different gestures of sign language and translate them into English. This will help a lot of people in communicating and conversing with deaf and dumb people. The existing Indian Sign Language Recognition systems are designed using machine learning algorithms with single and double-handed gestures but they are not real-time. In this paper, we propose a method to create an Indian Sign Language dataset using a webcam and then using transfer learning, train a TensorFlow model to create a real-time Sign Language Recognition system. The system achieves a good level of accuracy even with a limited size dataset.

Submitted 2 March, 2022; v1 submitted 5 January, 2022; originally announced January 2022.

Comments: 14 pages, 5 figures, ANTIC 2021

arXiv:2111.03547 [pdf, other]

doi 10.1145/3459637.3482376

POSHAN: Cardinal POS Pattern Guided Attention for News Headline Incongruence

Authors: Rahul Mishra, Shuo Zhang

Abstract: Automatic detection of click-bait and incongruent news headlines is crucial to maintaining the reliability of the Web and has raised much research attention. However, most existing methods perform poorly when news headlines contain contextually important cardinal values, such as a quantity or an amount. In this work, we focus on this particular case and propose a neural attention based solution, which uses a novel cardinal Part of Speech (POS) tag pattern based hierarchical attention network, namely POSHAN, to learn effective representations of sentences in a news article. In addition, we investigate a novel cardinal phrase guided attention, which uses word embeddings of the contextually-important cardinal value and neighbouring words. In the experiments conducted on two publicly available datasets, we observe that the proposed method gives appropriate significance to cardinal values and outperforms all the baselines. An ablation study of POSHAN shows that the cardinal POS-tag pattern-based hierarchical attention is very effective for the cases in which headlines contain cardinal values.

Submitted 5 November, 2021; originally announced November 2021.

Comments: Proceedings of the 30th ACM International Conference on Information and Knowledge Management (CIKM '21), November 1--5, 2021, Virtual Event, QLD, Australia

arXiv:2111.01105 [pdf]

FREGAN : an application of generative adversarial networks in enhancing the frame rate of videos

Authors: Rishik Mishra, Neeraj Gupta, Nitya Shukla

Abstract: A digital video is a collection of individual frames, while streaming the video the scene utilized the time slice for each frame. High refresh rate and high frame rate is the demand of all high technology applications. The action tracking in videos becomes easier and motion becomes smoother in gaming applications due to the high refresh rate. It provides a faster response because of less time in between each frame that is displayed on the screen. FREGAN (Frame Rate Enhancement Generative Adversarial Network) model has been proposed, which predicts future frames of a video sequence based on a sequence of past frames. In this paper, we investigated the GAN model and proposed FREGAN for the enhancement of frame rate in videos. We have utilized Huber loss as a loss function in the proposed FREGAN. It provided excellent results in super-resolution and we have tried to reciprocate that performance in the application of frame rate enhancement. We have validated the effectiveness of the proposed model on the standard datasets (UCF101 and RFree500). The experimental outcomes illustrate that the proposed model has a Peak signal-to-noise ratio (PSNR) of 34.94 and a Structural Similarity Index (SSIM) of 0.95.

Submitted 1 November, 2021; originally announced November 2021.

ACM Class: I.2.1

arXiv:2110.06948 [pdf, other]

doi 10.1007/JHEP03(2022)066

Challenges for Unsupervised Anomaly Detection in Particle Physics

Authors: Katherine Fraser, Samuel Homiller, Rashmish K. Mishra, Bryan Ostdiek, Matthew D. Schwartz

Abstract: Anomaly detection relies on designing a score to determine whether a particular event is uncharacteristic of a given background distribution. One way to define a score is to use autoencoders, which rely on the ability to reconstruct certain types of data (background) but not others (signals). In this paper, we study some challenges associated with variational autoencoders, such as the dependence on hyperparameters and the metric used, in the context of anomalous signal (top and $W$) jets in a QCD background. We find that the hyperparameter choices strongly affect the network performance and that the optimal parameters for one signal are non-optimal for another. In exploring the networks, we uncover a connection between the latent space of a variational autoencoder trained using mean-squared-error and the optimal transport distances within the dataset. We then show that optimal transport distances to representative events in the background dataset can be used directly for anomaly detection, with performance comparable to the autoencoders. Whether using autoencoders or optimal transport distances for anomaly detection, we find that the choices that best represent the background are not necessarily best for signal identification. These challenges with unsupervised anomaly detection bolster the case for additional exploration of semi-supervised or alternative approaches.

Submitted 13 October, 2021; originally announced October 2021.

Comments: 22 + 2 pages, 8 figures, 2 tables

arXiv:2109.10496 [pdf, other]

Vibration Improves Performance in Granular Jamming Grippers

Authors: Raghav Mishra, Tyson Philips, Gary W. Delaney, David Howard

Abstract: Granular jamming is a popular soft robotics technology that has seen recent widespread applications including industrial gripping, surgical robotics and haptics. However, to date the field has not fully exploited the fundamental science of the jamming phase transition, which has been rigorously studied in the field of statistical and condensed matter physics. This work introduces vibration as a means to improve the properties of granular jamming grippers through vibratory fluidisation and the exploitation of resonant modes within the granular material. We show that vibration in soft jamming grippers can improve holding strength, reduce the downwards force needed for the gripping action, and lead to a simplified setup where the second air pump, generally used for unjamming, could be removed. In a series of studies, we show that frequency and amplitude of the waveforms are key determinants to performance, and that jamming performance is also dependent on temporal properties of the induced waveform. We hope to encourage further study in transitioning fundamental jamming mechanisms into a soft robotics context to improve performance and increase diversity of applications for granular jamming grippers.

Submitted 21 September, 2021; originally announced September 2021.

Comments: This paper is under consideration for publication by IEEE and may be removed without notice

arXiv:2107.07579 [pdf, other]

A Channel Coding Benchmark for Meta-Learning

Authors: Rui Li, Ondrej Bohdal, Rajesh Mishra, Hyeji Kim, Da Li, Nicholas Lane, Timothy Hospedales

Abstract: Meta-learning provides a popular and effective family of methods for data-efficient learning of new tasks. However, several important issues in meta-learning have proven hard to study thus far. For example, performance degrades in real-world settings where meta-learners must learn from a wide and potentially multi-modal distribution of training tasks; and when distribution shift exists between meta-train and meta-test task distributions. These issues are typically hard to study since the shape of task distributions, and shift between them are not straightforward to measure or control in standard benchmarks. We propose the channel coding problem as a benchmark for meta-learning. Channel coding is an important practical application where task distributions naturally arise, and fast adaptation to new tasks is practically valuable. We use our MetaCC benchmark to study several aspects of meta-learning, including the impact of task distribution breadth and shift, which can be controlled in the coding problem. Going forward, MetaCC provides a tool for the community to study the capabilities and limitations of meta-learning, and to drive research on practically robust and effective meta-learners.

Submitted 2 December, 2021; v1 submitted 15 July, 2021; originally announced July 2021.

arXiv:2106.09493 [pdf, other]

Scalable Approach for Normalizing E-commerce Text Attributes (SANTA)

Authors: Ravi Shankar Mishra, Kartik Mehta, Nikhil Rasiwasia

Abstract: In this paper, we present SANTA, a scalable framework to automatically normalize E-commerce attribute values (e.g. "Win 10 Pro") to a fixed set of pre-defined canonical values (e.g. "Windows 10"). Earlier works on attribute normalization focused on fuzzy string matching (also referred as syntactic matching in this paper). In this work, we first perform an extensive study of nine syntactic matching algorithms and establish that 'cosine' similarity leads to best results, showing 2.7% improvement over commonly used Jaccard index. Next, we argue that string similarity alone is not sufficient for attribute normalization as many surface forms require going beyond syntactic matching (e.g. "720p" and "HD" are synonyms). While semantic techniques like unsupervised embeddings (e.g. word2vec/fastText) have shown good results in word similarity tasks, we observed that they perform poorly to distinguish between close canonical forms, as these close forms often occur in similar contexts. We propose to learn token embeddings using a twin network with triplet loss. We propose an embedding learning task leveraging raw attribute values and product titles to learn these embeddings in a self-supervised fashion. We show that providing supervision using our proposed task improves over both syntactic and unsupervised embeddings based techniques for attribute normalization. Experiments on a real-world attribute normalization dataset of 50 attributes show that the embeddings trained using our proposed approach obtain 2.3% improvement over best string matching and 19.3% improvement over best unsupervised embeddings.

Submitted 12 June, 2021; originally announced June 2021.

Comments: Accepted in ECNLP workshop of ACL-IJCNLP 2021 (https://sites.google.com/view/ecnlp)

arXiv:2103.10807 [pdf, other]

doi 10.13140/RG.2.2.32941.82403

Linear Coding for AWGN channels with Noisy Output Feedback via Dynamic Programming

Authors: Rajesh Mishra, Deepanshu Vasal, Hyeji Kim

Abstract: The optimal coding scheme for communicating a Gaussian message over an Additive White Gaussian noise (AWGN) channel with AWGN output feedback, with a limited number of transmissions is unknown. Even if we restrict the scope of the coding scheme to linear schemes, still, deriving the optimal coding scheme is a challenging task. The state-of-the-art linear scheme for channels with noisy feedback is by Chance and Love, where the coefficients of the linear scheme are numerically optimized based on unique observations [1]. In this paper, we introduce a new class of sequential linear schemes for this channel by introducing a novel linear state process at the transmitter and derive the optimal sequential scheme within this class of schemes in a closed-form by formulating a novel Dynamic Programming (DP). We empirically show that our scheme outperforms the state-of-the-art linear scheme in [1] for noisy feedback and coincides with the SK scheme for noiseless feedback. We also show that in communicating message bits as opposed to a Gaussian message, a learning-based approach further improves the reliability of sequential linear schemes. This problem is an instance of decentralized control without any common information and to the best of our knowledge the first such scenario where we can derive analytical solutions using a DP.

Submitted 23 May, 2022; v1 submitted 17 March, 2021; originally announced March 2021.

Comments: 27 pages, 10 figures

arXiv:2010.08570 [pdf, other]

Generating Fact Checking Summaries for Web Claims

Authors: Rahul Mishra, Dhruv Gupta, Markus Leippold

Abstract: We present SUMO, a neural attention-based approach that learns to establish the correctness of textual claims based on evidence in the form of text documents (e.g., news articles or Web documents). SUMO further generates an extractive summary by presenting a diversified set of sentences from the documents that explain its decision on the correctness of the textual claim. Prior approaches to address the problem of fact checking and evidence extraction have relied on simple concatenation of claim and document word embeddings as an input to claim driven attention weight computation. This is done so as to extract salient words and sentences from the documents that help establish the correctness of the claim. However, this design of claim-driven attention does not capture the contextual information in documents properly. We improve on the prior art by using improved claim and title guided hierarchical attention to model effective contextual cues. We show the efficacy of our approach on datasets concerning political, healthcare, and environmental issues.

Submitted 16 October, 2020; originally announced October 2020.

Comments: Accepted paper; The 2020 Conference on Empirical Methods in Natural Language Processing EMNLP - WNUT

MSC Class: 68T50 ACM Class: H.1.1; H.3.1; H.3.3

arXiv:2010.03954 [pdf, ps, other]

A Survey on Deep Neural Network Compression: Challenges, Overview, and Solutions

Authors: Rahul Mishra, Hari Prabhat Gupta, Tanima Dutta

Abstract: Deep Neural Network (DNN) has gained unprecedented performance due to its automated feature extraction capability. This high order performance leads to significant incorporation of DNN models in different Internet of Things (IoT) applications in the past decade. However, the colossal requirement of computation, energy, and storage of DNN models make their deployment prohibitive on resource constraint IoT devices. Therefore, several compression techniques were proposed in recent years for reducing the storage and computation requirements of the DNN model. These techniques on DNN compression have utilized a different perspective for compressing DNN with minimal accuracy compromise. It encourages us to make a comprehensive overview of the DNN compression techniques. In this paper, we present a comprehensive review of existing literature on compressing DNN model that reduces both storage and computation requirements. We divide the existing approaches into five broad categories, i.e., network pruning, sparse representation, bits precision, knowledge distillation, and miscellaneous, based upon the mechanism incorporated for compressing the DNN model. The paper also discussed the challenges associated with each category of DNN compression techniques. Finally, we provide a quick summary of existing work under each category with the future direction in DNN compression.

Submitted 5 October, 2020; originally announced October 2020.

Comments: 19 pages, 9 figures

arXiv:2010.03617 [pdf, other]

MuSeM: Detecting Incongruent News Headlines using Mutual Attentive Semantic Matching

Authors: Rahul Mishra, Piyush Yadav, Remi Calizzano, Markus Leippold

Abstract: Measuring the congruence between two texts has several useful applications, such as detecting the prevalent deceptive and misleading news headlines on the web. Many works have proposed machine learning based solutions such as text similarity between the headline and body text to detect the incongruence. Text similarity based methods fail to perform well due to different inherent challenges such as relative length mismatch between the news headline and its body content and non-overlapping vocabulary. On the other hand, more recent works that use headline guided attention to learn a headline derived contextual representation of the news body also result in convoluting overall representation due to the news body's lengthiness. This paper proposes a method that uses inter-mutual attention-based semantic matching between the original and synthetically generated headlines, which utilizes the difference between all pairs of word embeddings of words involved. The paper also investigates two more variations of our method, which use concatenation and dot-products of word embeddings of the words of original and synthetic headlines. We observe that the proposed method outperforms prior arts significantly for two publicly available datasets.

Submitted 7 October, 2020; originally announced October 2020.

Comments: Accepted paper; IEEE 2020 International Conference on Machine Learning and Applications (ICMLA)

MSC Class: 68T50 ACM Class: H.1.1; H.3.1; H.3.3

arXiv:2009.09088 [pdf, other]

An AI based talent acquisition and benchmarking for job

Authors: Rudresh Mishra, Ricardo Rodriguez, Valentin Portillo

Abstract: In a recruitment industry, selecting a best CV from a particular job post within a pile of thousand CV's is quite challenging. Finding a perfect candidate for an organization who can be fit to work within organizational culture is a difficult task. In order to help the recruiters to fill these gaps we leverage the help of AI. We propose a methodology to solve these problems by matching the skill graph generated from CV and Job Post. In this report our approach is to perform the business understanding in order to justify why such problems arise and how we intend to solve these problems using natural language processing and machine learning techniques. We limit our project only to solve the problem in the domain of the computer science industry.

Submitted 12 August, 2020; originally announced September 2020.

Comments: 26 pages, 23 figures. This paper is yet to be published in conferences

arXiv:2008.13116 [pdf, other]

Analysis, Modeling, and Representation of COVID-19 Spread: A Case Study on India

Authors: Rahul Mishra, Hari Prabhat Gupta, Tanima Dutta

Abstract: Coronavirus outbreak is one of the most challenging pandemics for the entire human population of the planet Earth. Techniques such as the isolation of infected persons and maintaining social distancing are the only preventive measures against the epidemic COVID-19. The actual estimation of the number of infected persons with limited data is an indeterminate problem faced by data scientists. There are a large number of techniques in the existing literature, including reproduction number, the case fatality rate, etc., for predicting the duration of an epidemic and infectious population. This paper presents a case study of different techniques for analysing, modeling, and representation of data associated with an epidemic such as COVID-19. We further propose an algorithm for estimating infection transmission states in a particular area. This work also presents an algorithm for estimating end-time of an epidemic from Susceptible Infectious and Recovered model. Finally, this paper presents empirical and data analysis to study the impact of transmission probability, rate of contact, infectious, and susceptible on the epidemic spread.

Submitted 30 August, 2020; originally announced August 2020.

Comments: 10 pages, 14 figures

arXiv:2006.02570 [pdf, other]

doi 10.3390/jimaging10020045

Exploration of Interpretability Techniques for Deep COVID-19 Classification using Chest X-ray Images

Authors: Soumick Chatterjee, Fatima Saad, Chompunuch Sarasaen, Suhita Ghosh, Valerie Krug, Rupali Khatun, Rahul Mishra, Nirja Desai, Petia Radeva, Georg Rose, Sebastian Stober, Oliver Speck, Andreas Nürnberger

Abstract: The outbreak of COVID-19 has shocked the entire world with its fairly rapid spread and has challenged different sectors. One of the most effective ways to limit its spread is the early and accurate diagnosis of infected patients. Medical imaging, such as X-ray and Computed Tomography (CT), combined with the potential of Artificial Intelligence (AI), plays an essential role in supporting medical personnel in the diagnosis process. Thus, in this article five different deep learning models (ResNet18, ResNet34, InceptionV3, InceptionResNetV2 and DenseNet161) and their ensemble, using majority voting have been used to classify COVID-19, pneumoniæ and healthy subjects using chest X-ray images. Multilabel classification was performed to predict multiple pathologies for each patient, if present. Firstly, the interpretability of each of the networks was thoroughly studied using local interpretability methods - occlusion, saliency, input X gradient, guided backpropagation, integrated gradients, and DeepLIFT, and using a global technique - neuron activation profiles. The mean Micro-F1 score of the models for COVID-19 classifications ranges from 0.66 to 0.875, and is 0.89 for the ensemble of the network models. The qualitative results showed that the ResNets were the most interpretable models. This research demonstrates the importance of using interpretability methods to compare different models before making a decision regarding the best performing model.

Submitted 24 January, 2024; v1 submitted 3 June, 2020; originally announced June 2020.

Journal ref: Journal of Imaging. 2024; 10(2):45

arXiv:2005.11853 [pdf, other]

Model-free Reinforcement Learning for Stochastic Stackelberg Security Games

Authors: Rajesh K Mishra, Deepanshu Vasal, Sriram Vishwanath

Abstract: In this paper, we consider a sequential stochastic Stackelberg game with two players, a leader and a follower. The follower has access to the state of the system while the leader does not. Assuming that the players act in their respective best interests, the follower's strategy is to play the best response to the leader's strategy. In such a scenario, the leader has the advantage of committing to a policy which maximizes its own returns given the knowledge that the follower is going to play the best response to its policy. Thus, both players converge to a pair of policies that form the Stackelberg equilibrium of the game. Recently, [1] provided a sequential decomposition algorithm to compute the Stackelberg equilibrium for such games which allow for the computation of Markovian equilibrium policies in linear time as opposed to double exponential, as before. In this paper, we extend the idea to an MDP whose dynamics are not known to the players, to propose an RL algorithm based on Expected Sarsa that learns the Stackelberg equilibrium policy by simulating a model of the MDP. We use particle filters to estimate the belief update for a common agent which computes the optimal policy based on the information which is common to both the players. We present a security game example to illustrate the policy learned by our algorithm.

Submitted 24 May, 2020; originally announced May 2020.

Showing 1–50 of 64 results for author: Mishra, R