Showing 1–26 of 26 results for author: Melnyk, I

Searching in archive cs.
  1. arXiv:2406.05882  [pdf, other]

    cs.LG stat.ML

    Distributional Preference Alignment of LLMs via Optimal Transport

    Authors: Igor Melnyk, Youssef Mroueh, Brian Belgodere, Mattia Rigotti, Apoorva Nitsure, Mikhail Yurochkin, Kristjan Greenewald, Jiri Navratil, Jerret Ross

    Abstract: Current LLM alignment techniques use pairwise human preferences at a sample level, and as such, they do not imply an alignment on the distributional level. We propose in this paper Alignment via Optimal Transport (AOT), a novel method for distributional preference alignment of LLMs. AOT aligns LLMs on unpaired preference data by making the reward distribution of the positive samples stochastically…

    Submitted 9 June, 2024; originally announced June 2024.

  2. arXiv:2403.11901  [pdf, other]

    cs.LG cs.AI

    Larimar: Large Language Models with Episodic Memory Control

    Authors: Payel Das, Subhajit Chaudhury, Elliot Nelson, Igor Melnyk, Sarath Swaminathan, Sihui Dai, Aurélie Lozano, Georgios Kollias, Vijil Chenthamarakshan, Jiří Navrátil, Soham Dan, Pin-Yu Chen

    Abstract: Efficient and accurate updating of knowledge stored in Large Language Models (LLMs) is one of the most pressing research challenges today. This paper presents Larimar - a novel, brain-inspired architecture for enhancing LLMs with a distributed episodic memory. Larimar's memory allows for dynamic, one-shot updates of knowledge without the need for computationally expensive re-training or fine-tunin…

    Submitted 21 August, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

    Comments: ICML 2024

  3. arXiv:2310.07132  [pdf, other]

    cs.LG math.ST q-fin.RM stat.ML

    Risk Aware Benchmarking of Large Language Models

    Authors: Apoorva Nitsure, Youssef Mroueh, Mattia Rigotti, Kristjan Greenewald, Brian Belgodere, Mikhail Yurochkin, Jiri Navratil, Igor Melnyk, Jerret Ross

    Abstract: We propose a distributional framework for benchmarking socio-technical risks of foundation models with quantified statistical significance. Our approach hinges on a new statistical relative testing based on first and second order stochastic dominance of real random variables. We show that the second order statistics in this test are linked to mean-risk models commonly used in econometrics and math…

    Submitted 9 June, 2024; v1 submitted 10 October, 2023; originally announced October 2023.

    Comments: ICML 2024

  4. arXiv:2304.10819  [pdf, other]

    cs.LG cs.AI stat.ML

    Auditing and Generating Synthetic Data with Controllable Trust Trade-offs

    Authors: Brian Belgodere, Pierre Dognin, Adam Ivankay, Igor Melnyk, Youssef Mroueh, Aleksandra Mojsilovic, Jiri Navratil, Apoorva Nitsure, Inkit Padhi, Mattia Rigotti, Jerret Ross, Yair Schiff, Radhika Vedpathak, Richard A. Young

    Abstract: Real-world data often exhibits bias, imbalance, and privacy risks. Synthetic datasets have emerged to address these issues. This paradigm relies on generative AI models to generate unbiased, privacy-preserving data while maintaining fidelity to the original data. However, assessing the trustworthiness of synthetic datasets and models is a critical challenge. We introduce a holistic auditing framew…

    Submitted 9 June, 2024; v1 submitted 21 April, 2023; originally announced April 2023.

    Comments: submitted

  5. arXiv:2211.10511  [pdf, other]

    cs.CL cs.LG

    Knowledge Graph Generation From Text

    Authors: Igor Melnyk, Pierre Dognin, Payel Das

    Abstract: In this work we propose a novel end-to-end multi-stage Knowledge Graph (KG) generation system from textual inputs, separating the overall process into two stages. The graph nodes are generated first using a pretrained language model, followed by a simple edge construction head, enabling efficient KG extraction from the text. For each stage we consider several architectural choices that can be used d…

    Submitted 18 November, 2022; originally announced November 2022.

    Comments: Findings of EMNLP 2022

  6. arXiv:2210.07144  [pdf, other]

    q-bio.BM cs.LG

    Reprogramming Pretrained Language Models for Antibody Sequence Infilling

    Authors: Igor Melnyk, Vijil Chenthamarakshan, Pin-Yu Chen, Payel Das, Amit Dhurandhar, Inkit Padhi, Devleena Das

    Abstract: Antibodies comprise the most versatile class of binding molecules, with numerous applications in biomedicine. Computational design of antibodies involves generating novel and diverse sequences, while maintaining structural consistency. Unique to antibodies, designing the complementarity-determining region (CDR), which determines the antigen binding affinity and specificity, creates its own unique…

    Submitted 19 June, 2023; v1 submitted 5 October, 2022; originally announced October 2022.

    Comments: ICML 2023

  7. arXiv:2210.03488  [pdf, other]

    q-bio.BM cs.LG

    AlphaFold Distillation for Protein Design

    Authors: Igor Melnyk, Aurelie Lozano, Payel Das, Vijil Chenthamarakshan

    Abstract: Inverse protein folding, the process of designing sequences that fold into a specific 3D structure, is crucial in bio-engineering and drug discovery. Traditional methods rely on experimentally resolved structures, but these cover only a small fraction of protein sequences. Forward folding models like AlphaFold offer a potential solution by accurately predicting structures from sequences. However,…

    Submitted 22 November, 2023; v1 submitted 5 October, 2022; originally announced October 2022.

    Comments: Preprint

  8. arXiv:2208.06665  [pdf, other]

    cs.LG

    Cloud-Based Real-Time Molecular Screening Platform with MolFormer

    Authors: Brian Belgodere, Vijil Chenthamarakshan, Payel Das, Pierre Dognin, Toby Kurien, Igor Melnyk, Youssef Mroueh, Inkit Padhi, Mattia Rigotti, Jarret Ross, Yair Schiff, Richard A. Young

    Abstract: With the prospect of automating a number of chemical tasks with high fidelity, chemical language processing models are emerging at a rapid speed. Here, we present a cloud-based real-time platform that allows users to virtually screen molecules of interest. For this purpose, molecular embeddings inferred from a recently proposed large chemical language model, named MolFormer, are leveraged. The pla…

    Submitted 13 August, 2022; originally announced August 2022.

    Comments: Paper accepted at ECML PKDD 2022 demo track

  9. arXiv:2111.06801  [pdf, other]

    q-bio.BM cs.CL

    Benchmarking deep generative models for diverse antibody sequence design

    Authors: Igor Melnyk, Payel Das, Vijil Chenthamarakshan, Aurelie Lozano

    Abstract: Computational protein design, i.e. inferring novel and diverse protein sequences consistent with a given structure, remains a major unsolved challenge. Recently, deep generative models that learn from sequences alone or from sequences and structures jointly have shown impressive performance on this task. However, those models appear limited in terms of modeling structural constraints, capturing en…

    Submitted 12 November, 2021; originally announced November 2021.

    Comments: Learning Meaningful Representations of Life Workshop paper at NeurIPS 2021

  10. arXiv:2108.12472  [pdf, other]

    cs.CL cs.LG

    ReGen: Reinforcement Learning for Text and Knowledge Base Generation using Pretrained Language Models

    Authors: Pierre L. Dognin, Inkit Padhi, Igor Melnyk, Payel Das

    Abstract: Automatic construction of relevant Knowledge Bases (KBs) from text, and generation of semantically meaningful text from KBs are both long-standing goals in Machine Learning. In this paper, we present ReGen, a bidirectional generation of text and graph leveraging Reinforcement Learning (RL) to improve performance. Graph linearization enables us to re-frame both tasks as a sequence to sequence gener…

    Submitted 27 August, 2021; originally announced August 2021.

    Comments: Accepted to appear in the main conference of EMNLP 2021

  11. arXiv:2106.13058  [pdf, other]

    cs.LG q-bio.BM

    Fold2Seq: A Joint Sequence(1D)-Fold(3D) Embedding-based Generative Model for Protein Design

    Authors: Yue Cao, Payel Das, Vijil Chenthamarakshan, Pin-Yu Chen, Igor Melnyk, Yang Shen

    Abstract: Designing novel protein sequences for a desired 3D topological fold is a fundamental yet non-trivial task in protein engineering. Challenges exist due to the complex sequence–fold relationship, as well as the difficulty of capturing the diversity of the sequences (therefore structures and functions) within a fold. To overcome these challenges, we propose Fold2Seq, a novel transformer-based genera…

    Submitted 24 June, 2021; originally announced June 2021.

    Comments: ICML 2021

  12. arXiv:2012.11696  [pdf, other]

    cs.CV cs.LG

    Image Captioning as an Assistive Technology: Lessons Learned from VizWiz 2020 Challenge

    Authors: Pierre Dognin, Igor Melnyk, Youssef Mroueh, Inkit Padhi, Mattia Rigotti, Jarret Ross, Yair Schiff, Richard A. Young, Brian Belgodere

    Abstract: Image captioning has recently demonstrated impressive progress largely owing to the introduction of neural network algorithms trained on curated datasets like MS-COCO. Often work in this field is motivated by the promise of deployment of captioning systems in practical applications. However, the scarcity of data and contexts in many competition datasets renders the utility of systems trained on the…

    Submitted 18 June, 2021; v1 submitted 21 December, 2020; originally announced December 2020.

    Comments: In submission to JAIR. Copyright may be transferred without notice, after which this version may no longer be accessible

  13. arXiv:2012.11691  [pdf, other]

    cs.CV cs.LG

    Alleviating Noisy Data in Image Captioning with Cooperative Distillation

    Authors: Pierre Dognin, Igor Melnyk, Youssef Mroueh, Inkit Padhi, Mattia Rigotti, Jarret Ross, Yair Schiff

    Abstract: Image captioning systems have made substantial progress, largely due to the availability of curated datasets like Microsoft COCO or VizWiz that have accurate descriptions of their corresponding images. Unfortunately, scarce availability of such cleanly labeled data results in trained algorithms producing captions that can be terse and idiosyncratically specific to details in the image. We propose…

    Submitted 21 December, 2020; originally announced December 2020.

    Comments: CVPR 2020 VizWiz Challenge

  14. arXiv:2011.01843  [pdf, other]

    cs.LG cs.AI

    Tabular Transformers for Modeling Multivariate Time Series

    Authors: Inkit Padhi, Yair Schiff, Igor Melnyk, Mattia Rigotti, Youssef Mroueh, Pierre Dognin, Jerret Ross, Ravi Nair, Erik Altman

    Abstract: Tabular datasets are ubiquitous in data science applications. Given their importance, it seems natural to apply state-of-the-art deep learning algorithms in order to fully unlock their potential. Here we propose neural network models that represent tabular time series that can optionally leverage their hierarchical structure. This results in two architectures for tabular time series: one for learn…

    Submitted 11 February, 2021; v1 submitted 3 November, 2020; originally announced November 2020.

    Comments: Accepted to ICASSP, 2021; https://github.com/IBM/TabFormer

  15. arXiv:2010.14660  [pdf, other]

    cs.CL cs.LG

    DualTKB: A Dual Learning Bridge between Text and Knowledge Base

    Authors: Pierre L. Dognin, Igor Melnyk, Inkit Padhi, Cicero Nogueira dos Santos, Payel Das

    Abstract: In this work, we present a dual learning approach for unsupervised text to path and path to text transfers in Commonsense Knowledge Bases (KBs). We investigate the impact of weak supervision by creating a weakly supervised dataset and show that even a slight amount of supervision can significantly improve the model performance and enable better-quality transfers. We examine different model archite…

    Submitted 27 October, 2020; originally announced October 2020.

    Comments: Equal Contributions of Authors Pierre L. Dognin, Igor Melnyk, and Inkit Padhi. Accepted at EMNLP'20

  16. arXiv:2009.02439  [pdf, other]

    cs.LG math.OC stat.ML

    Optimizing Mode Connectivity via Neuron Alignment

    Authors: N. Joseph Tatro, Pin-Yu Chen, Payel Das, Igor Melnyk, Prasanna Sattigeri, Rongjie Lai

    Abstract: The loss landscapes of deep neural networks are not well understood due to their high nonconvexity. Empirically, the local minima of these loss functions can be connected by a learned curve in model space, along which the loss remains nearly constant; a feature known as mode connectivity. Yet, current curve finding algorithms do not consider the influence of symmetry in the loss surface created by…

    Submitted 2 November, 2020; v1 submitted 4 September, 2020; originally announced September 2020.

    Comments: Accepted to NeurIPS 2020, 24 pages, 9 figures, code available at https://github.com/IBM/NeuronAlignment

    Journal ref: Advances in Neural Information Processing Systems, Volume 33, 2020

  17. arXiv:1902.04999  [pdf, other]

    cs.LG stat.ML

    Wasserstein Barycenter Model Ensembling

    Authors: Pierre Dognin, Igor Melnyk, Youssef Mroueh, Jerret Ross, Cicero Dos Santos, Tom Sercu

    Abstract: In this paper we propose to perform model ensembling in a multiclass or a multilabel learning setting using Wasserstein (W.) barycenters. Optimal transport metrics, such as the Wasserstein distance, allow incorporating semantic side information such as word embeddings. Using W. barycenters to find the consensus between models allows us to balance confidence and semantics in finding the agreement b…

    Submitted 13 February, 2019; originally announced February 2019.

    Comments: ICLR 2019

  18. arXiv:1810.05728  [pdf, other]

    cs.LG stat.ML

    Estimating Information Flow in Deep Neural Networks

    Authors: Ziv Goldfeld, Ewout van den Berg, Kristjan Greenewald, Igor Melnyk, Nam Nguyen, Brian Kingsbury, Yury Polyanskiy

    Abstract: We study the flow of information and the evolution of internal representations during deep neural network (DNN) training, aiming to demystify the compression aspect of the information bottleneck theory. The theory suggests that DNN training comprises a rapid fitting phase followed by a slower compression phase, in which the mutual information $I(X;T)$ between the input $X$ and internal representat…

    Submitted 30 May, 2019; v1 submitted 12 October, 2018; originally announced October 2018.

    Comments: Main text accepted to ICML 2019. This preprint contains the full version of that paper (including omitted appendices)

  19. arXiv:1805.07685  [pdf, other]

    cs.CL cs.LG

    Fighting Offensive Language on Social Media with Unsupervised Text Style Transfer

    Authors: Cicero Nogueira dos Santos, Igor Melnyk, Inkit Padhi

    Abstract: We introduce a new approach to tackle the problem of offensive language in online social media. Our approach uses unsupervised text style transfer to translate offensive sentences into non-offensive ones. We propose a new method for training encoder-decoders using non-parallel data that combines a collaborative classifier, attention and the cycle consistency loss. Experimental results on data from…

    Submitted 19 May, 2018; originally announced May 2018.

    Comments: ACL 2018

  20. arXiv:1805.00063  [pdf, other]

    cs.LG cs.CL cs.CV stat.ML

    Adversarial Semantic Alignment for Improved Image Captions

    Authors: Pierre L. Dognin, Igor Melnyk, Youssef Mroueh, Jarret Ross, Tom Sercu

    Abstract: In this paper we study image captioning as a conditional GAN training, proposing both a context-aware LSTM captioner and co-attentive discriminator, which enforces semantic alignment between images and captions. We empirically focus on the viability of two training methods: Self-critical Sequence Training (SCST) and Gumbel Straight-Through (ST) and demonstrate that SCST shows more stable gradient…

    Submitted 6 June, 2019; v1 submitted 30 April, 2018; originally announced May 2018.

    Comments: Authors Equal Contribution, CVPR 2019

  21. arXiv:1802.08323  [pdf, other]

    physics.comp-ph cs.LG physics.data-an stat.ML

    Deep learning algorithm for data-driven simulation of noisy dynamical system

    Authors: Kyongmin Yeo, Igor Melnyk

    Abstract: We present a deep learning model, DE-LSTM, for the simulation of a stochastic process with an underlying nonlinear dynamics. The deep learning model aims to approximate the probability density function of a stochastic process via numerical discretization and the underlying nonlinear dynamics is modeled by the Long Short-Term Memory (LSTM) network. It is shown that, when the numerical discretizatio…

    Submitted 5 September, 2018; v1 submitted 22 February, 2018; originally announced February 2018.

  22. arXiv:1711.09395  [pdf, other]

    cs.CL cs.AI cs.LG

    Improved Neural Text Attribute Transfer with Non-parallel Data

    Authors: Igor Melnyk, Cicero Nogueira dos Santos, Kahini Wadhawan, Inkit Padhi, Abhishek Kumar

    Abstract: Text attribute transfer using non-parallel data requires methods that can perform disentanglement of content and linguistic attributes. In this work, we propose multiple improvements over the existing approaches that enable the encoder-decoder framework to cope with the text attribute transfer from non-parallel data. We perform experiments on the sentiment transfer task using two datasets. For bot…

    Submitted 4 December, 2017; v1 submitted 26 November, 2017; originally announced November 2017.

    Comments: NIPS 2017 Workshop on Learning Disentangled Representations: from Perception to Control

  23. arXiv:1709.03159  [pdf, other]

    cs.LG stat.ML

    R2N2: Residual Recurrent Neural Networks for Multivariate Time Series Forecasting

    Authors: Hardik Goel, Igor Melnyk, Arindam Banerjee

    Abstract: Multivariate time-series modeling and forecasting is an important problem with numerous applications. Traditional approaches such as VAR (vector auto-regressive) models and more recent approaches such as RNNs (recurrent neural networks) are indispensable tools in modeling time-series data. In many multivariate time series modeling problems, there is usually a significant linear dependency componen…

    Submitted 10 September, 2017; originally announced September 2017.

  24. arXiv:1708.00308  [pdf, other]

    cs.CL cs.LG stat.ML

    SenGen: Sentence Generating Neural Variational Topic Model

    Authors: Ramesh Nallapati, Igor Melnyk, Abhishek Kumar, Bowen Zhou

    Abstract: We present a new topic model that generates documents by sampling a topic for one whole sentence at a time, and generating the words in the sentence using an RNN decoder that is conditioned on the topic of the sentence. We argue that this novel formalism will help us not only visualize and model the topical discourse structure in a document better, but also potentially lead to more interpretable t…

    Submitted 1 August, 2017; originally announced August 2017.

  25. arXiv:1602.06550  [pdf, other]

    cs.LG stat.AP stat.ML

    Semi-Markov Switching Vector Autoregressive Model-based Anomaly Detection in Aviation Systems

    Authors: Igor Melnyk, Arindam Banerjee, Bryan Matthews, Nikunj Oza

    Abstract: In this work we consider the problem of anomaly detection in heterogeneous, multivariate, variable-length time series datasets. Our focus is on the aviation safety domain, where data objects are flights and time series are sensor readings and pilot switches. In this context the goal is to detect anomalous flight segments, due to mechanical, environmental, or human factors in order to identify o…

    Submitted 28 February, 2016; v1 submitted 21 February, 2016; originally announced February 2016.

  26. arXiv:1407.3422  [pdf, other]

    stat.ML cs.LG

    A Spectral Algorithm for Inference in Hidden Semi-Markov Models

    Authors: Igor Melnyk, Arindam Banerjee

    Abstract: Hidden semi-Markov models (HSMMs) are latent variable models which allow latent state persistence and can be viewed as a generalization of the popular hidden Markov models (HMMs). In this paper, we introduce a novel spectral algorithm to perform inference in HSMMs. Unlike expectation maximization (EM), our approach correctly estimates the probability of a given observation sequence based on a set of…

    Submitted 28 February, 2016; v1 submitted 12 July, 2014; originally announced July 2014.