Paul Röttger

Cited by

	All	Since 2019
Citations	1259	1258
h-index	16	16
i10-index	20	20

Citations per year

Year	Citations
2021	18
2022	120
2023	336
2024	780

Public access (based on funding mandates)

1 article available
0 articles not available

Co-authors

Bertie Vidgen	Oxford, Turing	Verified email at rewire.online
Hannah Rose Kirk	University of Oxford	Verified email at oii.ox.ac.uk
Dirk Hovy	Bocconi University	Verified email at unibocconi.it
Janet B. Pierrehumbert	Prof. of Language Modelling, Univ. of Oxford Dept. of Engineering Science	Verified email at oerc.ox.ac.uk
Helen Margetts	Professor of Society and the Internet, University of Oxford	Verified email at oii.ox.ac.uk
Giuseppe Attanasio	Postdoctoral Researcher, Instituto de Telecomunicações	Verified email at lx.it.pt
Debora Nozza	Assistant Professor, Bocconi University	Verified email at unibocconi.it

Paul Röttger

Postdoctoral Researcher, Bocconi University

Verified email at unibocconi.it

Natural Language Processing, Large Language Models, Online Harms, AI Safety


Title	Authors	Venue	Cited by	Year
HateCheck: Functional Tests for Hate Speech Detection Models	P Röttger, B Vidgen, D Nguyen, Z Waseem, H Margetts, J Pierrehumbert	ACL 2021 (Main) - 🏆 Stanford HAI AI Audit Challenge, 2021	248	2021
Two Contrasting Data Annotation Paradigms for Subjective NLP Tasks	P Röttger, B Vidgen, D Hovy, JB Pierrehumbert	NAACL 2022 (Main), 2022	141	2022
The benefits, risks and bounds of personalizing the alignment of large language models to individuals	HR Kirk, B Vidgen, P Röttger, SA Hale	Nature Machine Intelligence, 2024	140*	2024
SemEval-2023 Task 10: Explainable Detection of Online Sexism	HR Kirk, W Yin, B Vidgen, P Röttger	ACL 2023 (Main) - 🏆 Best Task Paper, 2023	121	2023
Safety-tuned llamas: Lessons from improving the safety of large language models that follow instructions	F Bianchi, M Suzgun, G Attanasio, P Röttger, D Jurafsky, T Hashimoto, ...	ICLR 2024 (Poster), 2023	98	2023
XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models	P Röttger, HR Kirk, B Vidgen, G Attanasio, F Bianchi, D Hovy	NAACL 2024 (Main), 2023	89	2023
Temporal Adaptation of BERT and Performance on Downstream Document Classification: Insights from Social Media	P Röttger, JB Pierrehumbert	EMNLP 2021 (Findings), 2021	63	2021
Hatemoji: A test suite and adversarially-generated dataset for benchmarking and detecting emoji-based hate	HR Kirk, B Vidgen, P Röttger, T Thrush, SA Hale	NAACL 2022 (Main), 2021	57	2021
Multilingual HateCheck: Functional Tests for Multilingual Hate Speech Detection Models	P Röttger, H Seelawi, D Nozza, Z Talat, B Vidgen	WOAH at NAACL 2022, 2022	50	2022
The PRISM Alignment Project: What Participatory, Representative and Individualised Human Feedback Reveals About the Subjective and Multicultural Alignment of Large Language Models	HR Kirk, A Whitefield, P Röttger, A Bean, K Margatina, J Ciro, R Mosquera, ...	NeurIPS 2024 (Oral), 2024	34	2024
Political Compass or Spinning Arrow? Towards More Meaningful Evaluations for Values and Opinions in Large Language Models	P Röttger, V Hofmann, V Pyatkin, M Hinck, HR Kirk, H Schütze, D Hovy	ACL 2024 (Main) - 🏆 Outstanding Paper, 2024	32	2024
The Past, Present and Better Future of Feedback Learning in Large Language Models for Subjective Human Preferences and Values	HR Kirk, AM Bean, B Vidgen, P Röttger, SA Hale	EMNLP 2023 (Main), 2023	29	2023
"My Answer is C": First-Token Probabilities Do Not Match Text Answers in Instruction-Tuned Language Models	X Wang, B Ma, C Hu, L Weber-Genzel, P Röttger, F Kreuter, D Hovy, ...	ACL 2024 (Findings), 2024	22	2024
Introducing v0.5 of the AI Safety Benchmark from MLCommons	B Vidgen, A Agrawal, AM Ahmed, V Akinwande, N Al-Nuaimi, N Alfaraj, ...	arXiv, 2024	20	2024
SimpleSafetyTests: a Test Suite for Identifying Critical Safety Risks in Large Language Models	B Vidgen, HR Kirk, R Qian, N Scherrer, A Kannappan, SA Hale, P Röttger	arXiv, 2023	19	2023
The Ecological Fallacy in Annotation: Modelling Human Label Variation goes beyond Sociodemographics	M Orlikowski, P Röttger, P Cimiano, D Hovy	ACL 2023 (Main), 2023	19	2023
Data-Efficient Strategies for Expanding Hate Speech Detection into Under-Resourced Languages	P Röttger, D Nozza, F Bianchi, D Hovy	EMNLP 2022 (Main), 2022	15	2022
SafetyPrompts: a Systematic Review of Open Datasets for Evaluating and Improving Large Language Model Safety	P Röttger, F Pernisi, B Vidgen, D Hovy	arXiv preprint arXiv:2404.05399, 2024	14	2024
The Empty Signifier Problem: Towards Clearer Paradigms for Operationalising “Alignment” in Large Language Models	HR Kirk, B Vidgen, P Röttger, SA Hale	SoLaR at NeurIPS 2023, 2023	11	2023
Evaluating the Elementary Multilingual Capabilities of Large Language Models with MultiQ	C Holtermann, P Röttger, T Dill, A Lauscher	ACL 2024 (Findings), 2024	10	2024
