Showing 1–4 of 4 results for author: Segerie, C

  1. arXiv:2406.01364

    cs.CR cs.AI cs.CL

    BELLS: A Framework Towards Future Proof Benchmarks for the Evaluation of LLM Safeguards

    Authors: Diego Dorn, Alexandre Variengien, Charbel-Raphaël Segerie, Vincent Corruble

    Abstract: Input-output safeguards are used to detect anomalies in the traces produced by Large Language Model (LLM) systems. These detectors are at the core of diverse safety-critical applications such as real-time monitoring, offline evaluation of traces, and content moderation. However, there is no widely recognized methodology to evaluate them. To fill this gap, we introduce the Benchmarks for the Eval…

    Submitted 3 June, 2024; originally announced June 2024.

  2. arXiv:2401.08999

    cs.AI cs.LG

    Continuous Time Continuous Space Homeostatic Reinforcement Learning (CTCS-HRRL) : Towards Biological Self-Autonomous Agent

    Authors: Hugo Laurencon, Yesoda Bhargava, Riddhi Zantye, Charbel-Raphaël Ségerie, Johann Lussange, Veeky Baths, Boris Gutkin

    Abstract: Homeostasis is a biological process by which living beings maintain their internal balance. Previous research suggests that homeostasis is a learned behavior. The recently introduced Homeostatic Regulated Reinforcement Learning (HRRL) framework attempts to explain this learned homeostatic behavior by linking Drive Reduction Theory and Reinforcement Learning. This linkage has been proven in the discre…

    Submitted 17 January, 2024; originally announced January 2024.

    Comments: This work is a result of the ongoing collaboration between the Cognitive Neuroscience Lab, BITS Pilani K K Birla Goa Campus, and Ecole Normale Superieure, Paris, France. This work is jointly supervised by Prof. Boris Gutkin and Prof. Veeky Baths. arXiv admin note: substantial text overlap with arXiv:2109.06580

  3. arXiv:2307.15217

    cs.AI cs.CL cs.LG

    Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback

    Authors: Stephen Casper, Xander Davies, Claudia Shi, Thomas Krendl Gilbert, Jérémy Scheurer, Javier Rando, Rachel Freedman, Tomasz Korbak, David Lindner, Pedro Freire, Tony Wang, Samuel Marks, Charbel-Raphaël Segerie, Micah Carroll, Andi Peng, Phillip Christoffersen, Mehul Damani, Stewart Slocum, Usman Anwar, Anand Siththaranjan, Max Nadeau, Eric J. Michaud, Jacob Pfau, Dmitrii Krasheninnikov, Xin Chen, et al. (7 additional authors not shown)

    Abstract: Reinforcement learning from human feedback (RLHF) is a technique for training AI systems to align with human goals. RLHF has emerged as the central method used to finetune state-of-the-art large language models (LLMs). Despite this popularity, there has been relatively little public work systematizing its flaws. In this paper, we (1) survey open problems and fundamental limitations of RLHF and rel…

    Submitted 11 September, 2023; v1 submitted 27 July, 2023; originally announced July 2023.

  4. arXiv:2109.06580

    cs.AI

    Continuous Homeostatic Reinforcement Learning for Self-Regulated Autonomous Agents

    Authors: Hugo Laurençon, Charbel-Raphaël Ségerie, Johann Lussange, Boris S. Gutkin

    Abstract: Homeostasis is a prevalent process by which living beings maintain their internal milieu around optimal levels. Multiple lines of evidence suggest that living beings learn to act to predictively ensure homeostasis (allostasis). A classical theory for such regulation is drive reduction, where the drive is a function of the difference between the current and the optimal internal state. The recently introduced h…

    Submitted 14 September, 2021; originally announced September 2021.