Kaiyue Wen

2020 – today

2025
[c11]
Zihan Qiu, Zeyu Huang, Bo Zheng, Kaiyue Wen, Zekun Wang, Rui Men, Ivan Titov, Dayiheng Liu, Jingren Zhou, Junyang Lin:
Demons in the Detail: On Implementing Load Balancing Loss for Training Specialized Mixture-of-Expert Models. ACL (1) 2025: 5005-5018
[c10]
Shengguang Wu, Fan-Yun Sun, Kaiyue Wen, Nick Haber:
Symmetrical Visual Contrastive Optimization: Aligning Vision-Language Models with Minimal Contrastive Images. ACL (1) 2025: 30284-30297
[c9]
Kaiyue Wen, Zhiyuan Li, Jason S. Wang, David Leo Wright Hall, Percy Liang, Tengyu Ma:
Understanding Warmup-Stable-Decay Learning Rates: A River Valley Loss Landscape View. ICLR 2025
[c8]
Kaiyue Wen, Xingyu Dang, Kaifeng Lyu:
RNNs are not Transformers (Yet): The Key Bottleneck on In-Context Retrieval. ICLR 2025
[c7]
Kaiyue Wen, Huaqing Zhang, Hongzhou Lin, Jingzhao Zhang:
From Sparse Dependence to Sparse Attention: Unveiling How Chain-of-Thought Enhances Transformer Sample Efficiency. ICLR 2025
[i18]
Zihan Qiu, Zeyu Huang, Bo Zheng, Kaiyue Wen, Zekun Wang, Rui Men, Ivan Titov, Dayiheng Liu, Jingren Zhou, Junyang Lin:
Demons in the Detail: On Implementing Load Balancing Loss for Training Specialized Mixture-of-Expert Models. CoRR abs/2501.11873 (2025)
[i17]
Amirhesam Abedsoltan, Huaqing Zhang, Kaiyue Wen, Hongzhou Lin, Jingzhao Zhang, Mikhail Belkin:
Task Generalization With AutoRegressive Compositional Structure: Can Learning From D Tasks Generalize to D^T Tasks? CoRR abs/2502.08991 (2025)
[i16]
Shengguang Wu, Fan-Yun Sun, Kaiyue Wen, Nick Haber:
Symmetrical Visual Contrastive Optimization: Aligning Vision-Language Models with Minimal Contrastive Images. CoRR abs/2502.13928 (2025)
[i15]
Jacob Mitchell Springer, Sachin Goyal, Kaiyue Wen, Tanishq Kumar, Xiang Yue, Sadhika Malladi, Graham Neubig, Aditi Raghunathan:
Overtrained Language Models Are Harder to Fine-Tune. CoRR abs/2503.19206 (2025)
[i14]
Xingyu Dang, Christina Baek, Kaiyue Wen, Zico Kolter, Aditi Raghunathan:
Weight Ensembling Improves Reasoning in Language Models. CoRR abs/2504.10478 (2025)
[i13]
Zihan Qiu, Zekun Wang, Bo Zheng, Zeyu Huang, Kaiyue Wen, Songlin Yang, Rui Men, Le Yu, Fei Huang, Suozhi Huang, Dayiheng Liu, Jingren Zhou, Junyang Lin:
Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free. CoRR abs/2505.06708 (2025)
[i12]
Songlin Yang, Yikang Shen, Kaiyue Wen, Shawn Tan, Mayank Mishra, Liliang Ren, Rameswar Panda, Yoon Kim:
PaTH Attention: Position Encoding via Accumulating Householder Transformations. CoRR abs/2505.16381 (2025)
[i11]
Jiazheng Li, Hong Lu, Kaiyue Wen, Zaiwen Yang, Jiaxuan Gao, Hongzhou Lin, Yi Wu, Jingzhao Zhang:
QuestA: Expanding Reasoning Capacity in LLMs via Question Augmentation. CoRR abs/2507.13266 (2025)
2024
[i10]
Kaiyue Wen, Xingyu Dang, Kaifeng Lyu:
RNNs are not Transformers (Yet): The Key Bottleneck on In-context Retrieval. CoRR abs/2402.18510 (2024)
[i9]
Kaiyue Wen, Zhiyuan Li, Jason S. Wang, David Hall, Percy Liang, Tengyu Ma:
Understanding Warmup-Stable-Decay Learning Rates: A River Valley Loss Landscape Perspective. CoRR abs/2410.05192 (2024)
[i8]
Kaiyue Wen, Huaqing Zhang, Hongzhou Lin, Jingzhao Zhang:
From Sparse Dependence to Sparse Attention: Unveiling How Chain-of-Thought Enhances Transformer Sample Efficiency. CoRR abs/2410.05459 (2024)
2023
[c6]
Kaiyue Wen, Tengyu Ma, Zhiyuan Li:
How Sharpness-Aware Minimization Minimizes Sharpness? ICLR 2023
[c5]
Kaiyue Wen, Jiaye Teng, Jingzhao Zhang:
Benign Overfitting in Classification: Provably Counter Label Noise with Larger Models. ICLR 2023
[c4]
Kaiyue Wen, Zhiyuan Li, Tengyu Ma:
Sharpness Minimization Algorithms Do Not Only Minimize Sharpness To Achieve Better Generalization. NeurIPS 2023
[c3]
Kaiyue Wen, Yuchen Li, Bingbin Liu, Andrej Risteski:
Transformers are uninterpretable with myopic methods: a case study with bounded Dyck grammars. NeurIPS 2023
[i7]
Haozhe Jiang, Kaiyue Wen, Yilei Chen:
Practically Solving LPN in High Noise Regimes Faster Using Neural Networks. CoRR abs/2303.07987 (2023)
[i6]
Kaiyue Wen, Zhiyuan Li, Tengyu Ma:
Sharpness Minimization Algorithms Do Not Only Minimize Sharpness To Achieve Better Generalization. CoRR abs/2307.11007 (2023)
[i5]
Kaiyue Wen, Yuchen Li, Bingbin Liu, Andrej Risteski:
Transformers are uninterpretable with myopic methods: a case study with bounded Dyck grammars. CoRR abs/2312.01429 (2023)
[i4]
Haozhe Jiang, Kaiyue Wen, Yilei Chen:
Practically Solving LPN in High Noise Regimes Faster Using Neural Networks. IACR Cryptol. ePrint Arch. 2023: 372 (2023)
2022
[c2]
Xiaozhi Wang, Kaiyue Wen, Zhengyan Zhang, Lei Hou, Zhiyuan Liu, Juanzi Li:
Finding Skill Neurons in Pre-trained Transformer-based Language Models. EMNLP 2022: 11132-11152
[c1]
Yusheng Su, Xiaozhi Wang, Yujia Qin, Chi-Min Chan, Yankai Lin, Huadong Wang, Kaiyue Wen, Zhiyuan Liu, Peng Li, Juanzi Li, Lei Hou, Maosong Sun, Jie Zhou:
On Transferability of Prompt Tuning for Natural Language Processing. NAACL-HLT 2022: 3949-3969
[i3]
Kaiyue Wen, Jiaye Teng, Jingzhao Zhang:
Realistic Deep Learning May Not Fit Benignly. CoRR abs/2206.00501 (2022)
[i2]
Kaiyue Wen, Tengyu Ma, Zhiyuan Li:
How Does Sharpness-Aware Minimization Minimize Sharpness? CoRR abs/2211.05729 (2022)
[i1]
Xiaozhi Wang, Kaiyue Wen, Zhengyan Zhang, Lei Hou, Zhiyuan Liu, Juanzi Li:
Finding Skill Neurons in Pre-trained Transformer-based Language Models. CoRR abs/2211.07349 (2022)