NeMo Guardrails: A Toolkit for Controllable and Safe LLM Applications with Programmable Rails

Traian Rebedea; Razvan Dinu; Makesh Narsimhan Sreedhar; Christopher Parisien; Jonathan Cohen

doi:10.18653/v1/2023.emnlp-demo.40

Abstract

NeMo Guardrails is an open-source toolkit for easily adding programmable guardrails to LLM-based conversational systems. Guardrails (or rails for short) are a specific way of controlling the output of an LLM, such as not talking about topics considered harmful, following a predefined dialogue path, using a particular language style, and more. There are several mechanisms that allow LLM providers and developers to add guardrails that are embedded into a specific model at training time, e.g. using model alignment. Using a runtime inspired by dialogue management, NeMo Guardrails provides a different approach by allowing developers to add programmable rails to LLM applications; these rails are user-defined, independent of the underlying LLM, and interpretable. Our initial results show that the proposed approach can be used with several LLM providers to develop controllable and safe LLM applications using programmable rails.

Anthology ID: 2023.emnlp-demo.40
Volume: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: System Demonstrations
Month: December
Year: 2023
Address: Singapore
Editors: Yansong Feng, Els Lefever
Venue: EMNLP
Publisher: Association for Computational Linguistics
Pages: 431–445
URL: https://aclanthology.org/2023.emnlp-demo.40
DOI: 10.18653/v1/2023.emnlp-demo.40
Cite (ACL): Traian Rebedea, Razvan Dinu, Makesh Narsimhan Sreedhar, Christopher Parisien, and Jonathan Cohen. 2023. NeMo Guardrails: A Toolkit for Controllable and Safe LLM Applications with Programmable Rails. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 431–445, Singapore. Association for Computational Linguistics.
Cite (Informal): NeMo Guardrails: A Toolkit for Controllable and Safe LLM Applications with Programmable Rails (Rebedea et al., EMNLP 2023)
PDF: https://aclanthology.org/2023.emnlp-demo.40.pdf
Video: https://aclanthology.org/2023.emnlp-demo.40.mp4
