


Rylan Schaeffer
2020 – today
- 2025
- [i28] Rylan Schaeffer, Joshua Kazdan, John Hughes, Jordan Juravsky, Sara Price, Aengus Lynch, Erik Jones, Robert Kirk, Azalia Mirhoseini, Sanmi Koyejo:
  How Do Large Language Monkeys Get Their Power (Laws)? CoRR abs/2502.17578 (2025)
- [i27] Rylan Schaeffer, Punit Singh Koura, Binh Tang, Ranjan Subramanian, Aaditya K. Singh, Todor Mihaylov, Prajjwal Bhargava, Lovish Madaan, Niladri S. Chatterji, Vedanuj Goswami, Sergey Edunov, Dieuwke Hupkes, Sanmi Koyejo, Sharan Narang:
  Correlating and Predicting Human Evaluations of Language Models from Natural Language Processing Benchmarks. CoRR abs/2502.18339 (2025)
- [i26] Joshua Kazdan, Lisa Yu, Rylan Schaeffer, Chris Cundy, Sanmi Koyejo, Dvijotham Krishnamurthy:
  No, of course I can! Refusal Mechanisms Can Be Exploited Using Harmless Fine-Tuning Data. CoRR abs/2502.19537 (2025)
- 2024
- [c10] Cem Anil, Esin Durmus, Nina Panickssery, Mrinank Sharma, Joe Benton, Sandipan Kundu, Joshua Batson, Meg Tong, Jesse Mu, Daniel Ford, Francesco Mosconi, Rajashree Agrawal, Rylan Schaeffer, Naomi Bashkansky, Samuel Svenningsen, Mike Lambert, Ansh Radhakrishnan, Carson Denison, Evan Hubinger, Yuntao Bai, Trenton Bricken, Timothy Maxwell, Nicholas Schiefer, James Sully, Alex Tamkin, Tamera Lanham, Karina Nguyen, Tomek Korbak, Jared Kaplan, Deep Ganguli, Samuel R. Bowman, Ethan Perez, Roger B. Grosse, David Kristjanson Duvenaud:
  Many-shot Jailbreaking. NeurIPS 2024
- [i25] Minhao Jiang, Ken Ziyu Liu, Ming Zhong, Rylan Schaeffer, Siru Ouyang, Jiawei Han, Sanmi Koyejo:
  Investigating Data Contamination for Pre-training Language Models. CoRR abs/2401.06059 (2024)
- [i24] Rylan Schaeffer, Nika Zahedi, Mikail Khona, Dhruv Pai, Sang T. Truong, Yilun Du, Mitchell Ostrow, Sarthak Chandra, Andres Carranza, Ila Rani Fiete, Andrey Gromov, Sanmi Koyejo:
  Bridging Associative Memory and Probabilistic Modeling. CoRR abs/2402.10202 (2024)
- [i23] Matthias Gerstgrasser, Rylan Schaeffer, Apratim Dey, Rafael Rafailov, Henry Sleight, John Hughes, Tomasz Korbak, Rajashree Agrawal, Dhruv Pai, Andrey Gromov, Daniel A. Roberts, Diyi Yang, David L. Donoho, Sanmi Koyejo:
  Is Model Collapse Inevitable? Breaking the Curse of Recursion by Accumulating Real and Synthetic Data. CoRR abs/2404.01413 (2024)
- [i22] Rylan Schaeffer, Hailey Schoelkopf, Brando Miranda, Gabriel Mukobi, Varun Madan, Adam Ibrahim, Herbie Bradley, Stella Biderman, Sanmi Koyejo:
  Why Has Predicting Downstream Capabilities of Frontier AI Models with Scale Remained Elusive? CoRR abs/2406.04391 (2024)
- [i21] Rylan Schaeffer, Victor Lecomte, Dhruv Bhandarkar Pai, Andres Carranza, Berivan Isik, Alyssa Unell, Mikail Khona, Thomas E. Yerxa, Yann LeCun, SueYeon Chung, Andrey Gromov, Ravid Shwartz-Ziv, Sanmi Koyejo:
  Towards an Improved Understanding and Utilization of Maximum Manifold Capacity Representations. CoRR abs/2406.09366 (2024)
- [i20] Lovish Madaan, Aaditya K. Singh, Rylan Schaeffer, Andrew Poulton, Sanmi Koyejo, Pontus Stenetorp, Sharan Narang, Dieuwke Hupkes:
  Quantifying Variance in Evaluation Benchmarks. CoRR abs/2406.10229 (2024)
- [i19] Rylan Schaeffer, Mikail Khona, Sanmi Koyejo:
  In-Context Learning of Energy Functions. CoRR abs/2406.12785 (2024)
- [i18] Sunny Duan, Mikail Khona, Abhiram Iyer, Rylan Schaeffer, Ila R. Fiete:
  Uncovering Latent Memories: Assessing Data Leakage and Memorization Patterns in Large Language Models. CoRR abs/2406.14549 (2024)
- [i17] Anka Reuel, Ben Bucknall, Stephen Casper, Tim Fist, Lisa Soder, Onni Aarne, Lewis Hammond, Lujain Ibrahim, Alan Chan, Peter Wills, Markus Anderljung, Ben Garfinkel, Lennart Heim, Andrew Trask, Gabriel Mukobi, Rylan Schaeffer, Mauricio Baker, Sara Hooker, Irene Solaiman, Alexandra Sasha Luccioni, Nitarshan Rajkumar, Nicolas Moës, Jeffrey Ladish, Neel Guha, Jessica Newman, Yoshua Bengio, Tobin South, Alex Pentland, Sanmi Koyejo, Mykel J. Kochenderfer, Robert Trager:
  Open Problems in Technical AI Governance. CoRR abs/2407.14981 (2024)
- [i16] Rylan Schaeffer, Dan Valentine, Luke Bailey, James Chua, Cristóbal Eyzaguirre, Zane Durante, Joe Benton, Brando Miranda, Henry Sleight, John Hughes, Rajashree Agrawal, Mrinank Sharma, Scott Emmons, Sanmi Koyejo, Ethan Perez:
  When Do Universal Image Jailbreaks Transfer Between Vision-Language Models? CoRR abs/2407.15211 (2024)
- [i15] Joshua Kazdan, Rylan Schaeffer, Apratim Dey, Matthias Gerstgrasser, Rafael Rafailov, David L. Donoho, Sanmi Koyejo:
  Collapse or Thrive? Perils and Promises of Synthetic Data in a Self-Generating World. CoRR abs/2410.16713 (2024)
- [i14] Elyas Obbad, Iddah Mlauzi, Brando Miranda, Rylan Schaeffer, Kamal Obbad, Suhana Bedi, Sanmi Koyejo:
  ZIP-FIT: Embedding-Free Data Selection via Compression-Based Alignment. CoRR abs/2410.18194 (2024)
- [i13] Tony T. Wang, John Hughes, Henry Sleight, Rylan Schaeffer, Rajashree Agrawal, Fazl Barez, Mrinank Sharma, Jesse Mu, Nir Shavit, Ethan Perez:
  Jailbreak Defense in a Narrow Domain: Limitations of Existing Methods and a New Transcript-Classifier Approach. CoRR abs/2412.02159 (2024)
- [i12] John Hughes, Sara Price, Aengus Lynch, Rylan Schaeffer, Fazl Barez, Sanmi Koyejo, Henry Sleight, Erik Jones, Ethan Perez, Mrinank Sharma:
  Best-of-N Jailbreaking. CoRR abs/2412.03556 (2024)
- 2023
- [c9] Trenton Bricken, Rylan Schaeffer, Bruno A. Olshausen, Gabriel Kreiman:
  Emergence of Sparse Representations from Noise. ICML 2023: 3148-3191
- [c8] Rylan Schaeffer, Mikail Khona, Tzuhsuan Ma, Cristóbal Eyzaguirre, Sanmi Koyejo, Ila Fiete:
  Self-Supervised Learning of Representations for Space Generates Multi-Modular Grid Cells. NeurIPS 2023
- [c7] Rylan Schaeffer, Brando Miranda, Sanmi Koyejo:
  Are Emergent Abilities of Large Language Models a Mirage? NeurIPS 2023
- [c6] Boxin Wang, Weixin Chen, Hengzhi Pei, Chulin Xie, Mintong Kang, Chenhui Zhang, Chejian Xu, Zidi Xiong, Ritik Dutta, Rylan Schaeffer, Sang T. Truong, Simran Arora, Mantas Mazeika, Dan Hendrycks, Zinan Lin, Yu Cheng, Sanmi Koyejo, Dawn Song, Bo Li:
  DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models. NeurIPS 2023
- [i11] Rylan Schaeffer, Mikail Khona, Zachary Robertson, Akhilan Boopathy, Kateryna Pistunova, Jason W. Rocks, Ila Rani Fiete, Oluwasanmi Koyejo:
  Double Descent Demystified: Identifying, Interpreting & Ablating the Sources of a Deep Learning Puzzle. CoRR abs/2303.14151 (2023)
- [i10] Rylan Schaeffer, Brando Miranda, Sanmi Koyejo:
  Are Emergent Abilities of Large Language Models a Mirage? CoRR abs/2304.15004 (2023)
- [i9] Boxin Wang, Weixin Chen, Hengzhi Pei, Chulin Xie, Mintong Kang, Chenhui Zhang, Chejian Xu, Zidi Xiong, Ritik Dutta, Rylan Schaeffer, Sang T. Truong, Simran Arora, Mantas Mazeika, Dan Hendrycks, Zinan Lin, Yu Cheng, Sanmi Koyejo, Dawn Song, Bo Li:
  DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models. CoRR abs/2306.11698 (2023)
- [i8] Dhruv Pai, Andres Carranza, Rylan Schaeffer, Arnuv Tandon, Sanmi Koyejo:
  FACADE: A Framework for Adversarial Circuit Anomaly Detection and Evaluation. CoRR abs/2307.10563 (2023)
- [i7] Andres Carranza, Dhruv Pai, Rylan Schaeffer, Arnuv Tandon, Sanmi Koyejo:
  Deceptive Alignment Monitoring. CoRR abs/2307.10569 (2023)
- [i6] Rylan Schaeffer, Kateryna Pistunova, Samar Khanna, Sarthak Consul, Sanmi Koyejo:
  Invalid Logic, Equivalent Gains: The Bizarreness of Reasoning in Language Model Prompting. CoRR abs/2307.10573 (2023)
- [i5] Rylan Schaeffer:
  Pretraining on the Test Set Is All You Need. CoRR abs/2309.08632 (2023)
- [i4] Rylan Schaeffer, Mikail Khona, Tzuhsuan Ma, Cristóbal Eyzaguirre, Sanmi Koyejo, Ila Rani Fiete:
  Self-Supervised Learning of Representations for Space Generates Multi-Modular Grid Cells. CoRR abs/2311.02316 (2023)
- [i3] Victor Lecomte, Kushal Thaman, Trevor Chow, Rylan Schaeffer, Sanmi Koyejo:
  Incidental Polysemanticity. CoRR abs/2312.03096 (2023)
- 2022
- [c5] Rylan Schaeffer, Gabrielle Kaili-May Liu, Yilun Du, Scott Linderman, Ila Rani Fiete:
  Streaming Inference for Infinite Non-Stationary Clustering. CoLLAs 2022: 310-326
- [c4] Rylan Schaeffer, Yilun Du, Gabrielle K. Liu, Ila Fiete:
  Streaming Inference for Infinite Feature Models. ICML 2022: 19366-19387
- [c3] Rylan Schaeffer, Mikail Khona, Ila Fiete:
  No Free Lunch from Deep Learning in Neuroscience: A Case Study through Models of the Entorhinal-Hippocampal Circuit. NeurIPS 2022
- [i2] Rylan Schaeffer, Gabrielle Kaili-May Liu, Yilun Du, Scott Linderman, Ila Rani Fiete:
  Streaming Inference for Infinite Non-Stationary Clustering. CoRR abs/2205.01212 (2022)
- 2021
- [c2] Rylan Schaeffer, Blake Bordelon, Mikail Khona, Weiwei Pan, Ila Rani Fiete:
  Efficient online inference for nonparametric mixture models. UAI 2021: 2072-2081
- [i1] Rylan Schaeffer:
  An Algorithmic Theory of Metacognition in Minds and Machines. CoRR abs/2111.03745 (2021)
- 2020
- [c1] Rylan Schaeffer, Mikail Khona, Leenoy Meshulam, International Brain Laboratory, Ila Fiete:
  Reverse-engineering recurrent neural network solutions to a hierarchical inference task for mice. NeurIPS 2020

last updated on 2025-03-22 00:04 CET by the dblp team
all metadata released as open data under CC0 1.0 license