
Showing 1–14 of 14 results for author: Anwar, U

Searching in archive cs.
  1. arXiv:2406.12137  [pdf, other]

    cs.AI

    IDs for AI Systems

    Authors: Alan Chan, Noam Kolt, Peter Wills, Usman Anwar, Christian Schroeder de Witt, Nitarshan Rajkumar, Lewis Hammond, David Krueger, Lennart Heim, Markus Anderljung

    Abstract: AI systems are increasingly pervasive, yet information needed to decide whether and how to engage with them may not exist or be accessible. A user may not be able to verify whether a system has certain safety certifications. An investigator may not know whom to investigate when a system causes an incident. It may not be clear whom to contact to shut down a malfunctioning system. Across a number of…

    Submitted 18 July, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

    Comments: Work-in-progress

  2. arXiv:2404.09932  [pdf, other]

    cs.LG cs.AI cs.CL cs.CY

    Foundational Challenges in Assuring Alignment and Safety of Large Language Models

    Authors: Usman Anwar, Abulhair Saparov, Javier Rando, Daniel Paleka, Miles Turpin, Peter Hase, Ekdeep Singh Lubana, Erik Jenner, Stephen Casper, Oliver Sourbut, Benjamin L. Edelman, Zhaowei Zhang, Mario Günther, Anton Korinek, Jose Hernandez-Orallo, Lewis Hammond, Eric Bigelow, Alexander Pan, Lauro Langosco, Tomasz Korbak, Heidi Zhang, Ruiqi Zhong, Seán Ó hÉigeartaigh, Gabriel Recchia, Giulio Corsi, et al. (13 additional authors not shown)

    Abstract: This work identifies 18 foundational challenges in assuring the alignment and safety of large language models (LLMs). These challenges are organized into three different categories: scientific understanding of LLMs, development and deployment methods, and sociotechnical challenges. Based on the identified challenges, we pose 200+ concrete research questions.

    Submitted 15 April, 2024; originally announced April 2024.

  3. arXiv:2310.02743  [pdf, other]

    cs.LG

    Reward Model Ensembles Help Mitigate Overoptimization

    Authors: Thomas Coste, Usman Anwar, Robert Kirk, David Krueger

    Abstract: Reinforcement learning from human feedback (RLHF) is a standard approach for fine-tuning large language models to follow instructions. As part of this process, learned reward models are used to approximately model human preferences. However, as imperfect representations of the "true" reward, these learned reward models are susceptible to overoptimization. Gao et al. (2023) studied this phenomenon…

    Submitted 10 March, 2024; v1 submitted 4 October, 2023; originally announced October 2023.

    Comments: Accepted at ICLR 2024

  4. arXiv:2307.15217  [pdf, other]

    cs.AI cs.CL cs.LG

    Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback

    Authors: Stephen Casper, Xander Davies, Claudia Shi, Thomas Krendl Gilbert, Jérémy Scheurer, Javier Rando, Rachel Freedman, Tomasz Korbak, David Lindner, Pedro Freire, Tony Wang, Samuel Marks, Charbel-Raphaël Segerie, Micah Carroll, Andi Peng, Phillip Christoffersen, Mehul Damani, Stewart Slocum, Usman Anwar, Anand Siththaranjan, Max Nadeau, Eric J. Michaud, Jacob Pfau, Dmitrii Krasheninnikov, Xin Chen, et al. (7 additional authors not shown)

    Abstract: Reinforcement learning from human feedback (RLHF) is a technique for training AI systems to align with human goals. RLHF has emerged as the central method used to finetune state-of-the-art large language models (LLMs). Despite this popularity, there has been relatively little public work systematizing its flaws. In this paper, we (1) survey open problems and fundamental limitations of RLHF and rel…

    Submitted 11 September, 2023; v1 submitted 27 July, 2023; originally announced July 2023.

  5. arXiv:2211.14827  [pdf, other]

    cs.LG cs.AI stat.ML

    Domain Generalization for Robust Model-Based Offline Reinforcement Learning

    Authors: Alan Clark, Shoaib Ahmed Siddiqui, Robert Kirk, Usman Anwar, Stephen Chung, David Krueger

    Abstract: Existing offline reinforcement learning (RL) algorithms typically assume that training data is either: 1) generated by a known policy, or 2) of entirely unknown origin. We consider multi-demonstrator offline RL, a middle ground where we know which demonstrators generated each dataset, but make no assumptions about the underlying policies of the demonstrators. This is the most natural setting when…

    Submitted 27 November, 2022; originally announced November 2022.

    Comments: Accepted to the NeurIPS 2022 Workshops on Distribution Shifts and Offline Reinforcement Learning

  6. arXiv:2204.11848  [pdf, other]

    cs.CV cs.AI cs.IR cs.LG

    On Leveraging Variational Graph Embeddings for Open World Compositional Zero-Shot Learning

    Authors: Muhammad Umer Anwaar, Zhihui Pan, Martin Kleinsteuber

    Abstract: Humans are able to identify and categorize novel compositions of known concepts. The task in Compositional Zero-Shot learning (CZSL) is to learn composition of primitive concepts, i.e. objects and states, in such a way that even their novel compositions can be zero-shot classified. In this work, we do not assume any prior knowledge on the feasibility of novel compositions, i.e., open-world setting, w…

    Submitted 23 April, 2022; originally announced April 2022.

    Comments: Submitted to a conference

  7. arXiv:2202.03973  [pdf]

    cs.HC eess.SY

    Hearing Loss, Cognitive Load and Dementia: An Overview of Interrelation, Detection and Monitoring Challenges with Wearable Non-invasive Microwave Sensors

    Authors: Usman Anwar, Tughrul Arslan, Amir Hussain

    Abstract: This paper provides an overview of hearing loss effects on neurological function and progressive diseases; and explores the role of cognitive load monitoring to detect dementia. It also investigates the prospects of utilizing hearing aid technology to reverse cognitive decline and delay the onset of dementia, for the old age population. The interrelation between hearing loss, cognitive load and de…

    Submitted 8 February, 2022; originally announced February 2022.

    Comments: 5 pages, 1 figure, conference (submitted)

  8. arXiv:2101.03885  [pdf, other]

    cs.LG

    Variational Embeddings for Community Detection and Node Representation

    Authors: Rayyan Ahmad Khan, Muhammad Umer Anwaar, Omran Kaddah, Martin Kleinsteuber

    Abstract: In this paper, we study how to simultaneously learn two highly correlated tasks of graph analysis, i.e., community detection and node representation learning. We propose an efficient generative model called VECoDeR for jointly learning Variational Embeddings for Community Detection and node Representation. VECoDeR assumes that every node can be a member of one or more communities. The node embeddi…

    Submitted 11 January, 2021; originally announced January 2021.

  9. arXiv:2011.09999  [pdf, other]

    cs.LG cs.RO eess.SY

    Inverse Constrained Reinforcement Learning

    Authors: Usman Anwar, Shehryar Malik, Alireza Aghasi, Ali Ahmed

    Abstract: In real world settings, numerous constraints are present which are hard to specify mathematically. However, for the real world deployment of reinforcement learning (RL), it is critical that RL agents are aware of these constraints, so that they can act safely. In this work, we consider the problem of learning constraints from demonstrations of a constraint-abiding agent's behavior. We experimental…

    Submitted 21 May, 2021; v1 submitted 19 November, 2020; originally announced November 2020.

    Comments: Camera-ready version for ICML 2021

  10. arXiv:2010.11793  [pdf, other]

    cs.LG cs.AI

    Metapath- and Entity-aware Graph Neural Network for Recommendation

    Authors: Muhammad Umer Anwaar, Zhiwei Han, Shyam Arumugaswamy, Rayyan Ahmad Khan, Thomas Weber, Tianming Qiu, Hao Shen, Yuanting Liu, Martin Kleinsteuber

    Abstract: In graph neural networks (GNNs), message passing iteratively aggregates nodes' information from their direct neighbors while neglecting the sequential nature of multi-hop node connections. Such sequential node connections, e.g., metapaths, capture critical insights for downstream tasks. Concretely, in recommender systems (RSs), disregarding these insights leads to inadequate distillation of collabo…

    Submitted 1 April, 2021; v1 submitted 22 October, 2020; originally announced October 2020.

  11. arXiv:2006.11149  [pdf, other]

    cs.CV cs.IR

    Compositional Learning of Image-Text Query for Image Retrieval

    Authors: Muhammad Umer Anwaar, Egor Labintcev, Martin Kleinsteuber

    Abstract: In this paper, we investigate the problem of retrieving images from a database based on a multi-modal (image-text) query. Specifically, the query text prompts some modification in the query image and the task is to retrieve images with the desired modifications. For instance, a user of an E-Commerce platform is interested in buying a dress, which should look similar to her friend's dress, but the…

    Submitted 31 May, 2021; v1 submitted 19 June, 2020; originally announced June 2020.

    Comments: Published at IEEE WACV 2021

  12. arXiv:2004.01468  [pdf, other]

    cs.LG stat.ML

    Epitomic Variational Graph Autoencoder

    Authors: Rayyan Ahmad Khan, Muhammad Umer Anwaar, Martin Kleinsteuber

    Abstract: Variational autoencoder (VAE) is a widely used generative model for learning latent representations. Burda et al. in their seminal paper showed that learning capacity of VAE is limited by over-pruning. It is a phenomenon where a significant number of latent variables fail to capture any information about the input data and the corresponding hidden units become inactive. This adversely affects lear…

    Submitted 7 August, 2020; v1 submitted 3 April, 2020; originally announced April 2020.

  13. arXiv:2003.12159  [pdf, other]

    cs.LG stat.ML

    Learning To Solve Differential Equations Across Initial Conditions

    Authors: Shehryar Malik, Usman Anwar, Ali Ahmed, Alireza Aghasi

    Abstract: Recently, there has been a lot of interest in using neural networks for solving partial differential equations. A number of neural network-based partial differential equation solvers have been formulated which provide performances equivalent, and in some cases even superior, to classical solvers. However, these neural solvers, in general, need to be retrained each time the initial conditions or th…

    Submitted 19 April, 2020; v1 submitted 26 March, 2020; originally announced March 2020.

  14. arXiv:1907.10409  [pdf, other]

    cs.IR cs.LG

    Mend The Learning Approach, Not the Data: Insights for Ranking E-Commerce Products

    Authors: Muhammad Umer Anwaar, Dmytro Rybalko, Martin Kleinsteuber

    Abstract: Improved search quality enhances users' satisfaction, which directly impacts sales growth of an E-Commerce (E-Com) platform. Traditional Learning to Rank (LTR) algorithms require relevance judgments on products. In E-Com, getting such judgments poses an immense challenge. In the literature, it is proposed to employ user feedback (such as clicks, add-to-basket (AtB) clicks and orders) to generate r…

    Submitted 9 July, 2020; v1 submitted 24 July, 2019; originally announced July 2019.

    Comments: Accepted for ECML-PKDD 2020