Showing 1–4 of 4 results for author: Xhonneux, S

  1. arXiv:2405.15589

    cs.LG cs.CR

    Efficient Adversarial Training in LLMs with Continuous Attacks

    Authors: Sophie Xhonneux, Alessandro Sordoni, Stephan Günnemann, Gauthier Gidel, Leo Schwinn

    Abstract: Large language models (LLMs) are vulnerable to adversarial attacks that can bypass their safety guardrails. In many domains, adversarial training has proven to be one of the most promising methods to reliably improve robustness against such attacks. Yet, in the context of LLMs, current methods for adversarial training are hindered by the high computational costs required to perform discrete advers…

    Submitted 21 June, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

    Comments: 19 pages, 4 figures

  2. arXiv:2402.09063

    cs.LG

    Soft Prompt Threats: Attacking Safety Alignment and Unlearning in Open-Source LLMs through the Embedding Space

    Authors: Leo Schwinn, David Dobre, Sophie Xhonneux, Gauthier Gidel, Stephan Günnemann

    Abstract: Current research in adversarial robustness of LLMs focuses on discrete input manipulations in the natural language space, which can be directly transferred to closed-source models. However, this approach neglects the steady progression of open-source models. As open-source models advance in capability, ensuring their safety also becomes increasingly imperative. Yet, attacks tailored to open-source…

    Submitted 14 February, 2024; originally announced February 2024.

    Comments: Trigger Warning: the appendix contains LLM-generated text with violence and harassment

  3. arXiv:2402.05723

    cs.LG cs.CR

    In-Context Learning Can Re-learn Forbidden Tasks

    Authors: Sophie Xhonneux, David Dobre, Jian Tang, Gauthier Gidel, Dhanya Sridhar

    Abstract: Despite significant investment into safety training, large language models (LLMs) deployed in the real world still suffer from numerous vulnerabilities. One perspective on LLM safety training is that it algorithmically forbids the model from answering toxic or harmful queries. To assess the effectiveness of safety training, in this work, we study forbidden tasks, i.e., tasks the model is designed…

    Submitted 8 February, 2024; originally announced February 2024.

    Comments: 19 pages, 7 figures

  4. arXiv:2206.04798

    cs.AI cs.LG

    A*Net: A Scalable Path-based Reasoning Approach for Knowledge Graphs

    Authors: Zhaocheng Zhu, Xinyu Yuan, Mikhail Galkin, Sophie Xhonneux, Ming Zhang, Maxime Gazeau, Jian Tang

    Abstract: Reasoning on large-scale knowledge graphs has been long dominated by embedding methods. While path-based methods possess the inductive capacity that embeddings lack, their scalability is limited by the exponential number of paths. Here we present A*Net, a scalable path-based method for knowledge graph reasoning. Inspired by the A* algorithm for shortest path problems, our A*Net learns a priority f…

    Submitted 8 November, 2023; v1 submitted 6 June, 2022; originally announced June 2022.

    Comments: NeurIPS 2023