Researcher @ Kuaishou | M.E. @ UCAS | Focus on LLM Reasoning & Mixture-of-Experts
Popular repositories
- KlearReasoner (public)
  Klear-Reasoner: Advancing Reasoning Capability via Gradient-Preserving Clipping Policy Optimization
- LLaMA-MiLe-Loss (public)
  Code for A New Loss for Mitigating the Bias of Learning Difficulties in Generative Language Models
- chatgpt-comparison-detection-HC3-Plus (public)
  Code for HC3 Plus: A Semantic-Invariant Human ChatGPT Comparison Corpus
- ARPO (public, forked from RUC-NLPIR/ARPO)
  The official code for “Agentic Reinforced Policy Optimization” (ARPO), an agentic reinforcement learning (RL) algorithm.
  Python · 1
