About me

I am a final-year Ph.D. candidate in the Department of Electronic Engineering, Tsinghua University. My research focuses on reinforcement learning for large language models, particularly post-training alignment algorithms and autonomous coding agents.

I am advised by Prof. Yuan Shen at the Institute of Information Systems, Tsinghua University, and have collaborated with Prof. Yu Wang on collaborative intelligence systems. My theoretical work on RL optimization is conducted with Prof. Kaiqing Zhang and Prof. Tamer Başar. I received my B.Eng. from Tsinghua in 2021.

Previously, I interned at Moonshot AI, Baichuan AI, and MSRA, working on LLM training and alignment. Upon graduation in 2026, I will join the Qwen Team at Alibaba through the Alibaba Star Program, continuing my work on coding agents and RL-based post-training.

If our interests align, feel free to reach out: yan-yz17@tsinghua.org.cn

🔥News

2026.01: 🎉 We released Kimi-K2.5, delivering state-of-the-art coding and vision capabilities and a self-directed agent swarm paradigm. To date, it ranks #1 among open models on LMArena across the Text, Code, and Vision domains. Glad to be part of the achievement!
2025.11: 🎉 We released Kimi-K2-Thinking, the thinking version of Kimi-K2.
2025.07: 🎉 We released Kimi-K2, an open-source model with 1T total parameters.
2025.06: 🎉 We released Kimi-Dev-72B.
2025.04: 🎉 We released Kimi-VL and Kimi-VL-Thinking, lightweight but powerful MoE VLMs with reasoning capability.
2025.02: 💻 Joined Moonshot AI as a research intern, focusing on general RL for LLMs/mLLMs.
2025.01: 🎉 Paper accepted by ICLR 2025.
2024.10: 📖 Went to UIUC for a 6-month visit, hosted by Prof. Tamer Başar.
2024.06: 🎉 Paper accepted by ICML 2024.
2024.01: 🎉 Paper accepted by IEEE TSP.
2023.08: 💻 Joined Baichuan AI as a research intern, focusing on RLHF/alignment in LLMs.
2023.06: 🎉 Paper accepted by ICASSP 2023 (oral).
2022.06: 🎉 Paper accepted by ICRA 2022.
2021.12: 🎉 Papers accepted by ICASSP and INTERSPEECH.
2020.08: 💻 Joined MSRA as a research intern, focusing on text-to-speech generation.

📑Selected Publications

[ICLR25] 3D-Properties: Identifying Challenges in DPO and Charting a Path Forward, Y Yan, Y Miao, J Li, et al.
[ICML24] Exploring the LLM Journey from Cognition to Expression with Linear Representations, Y Yan, J Li, Y Zhang, D Yan.
[IEEE-TSP] Distributed Policy Gradient for Linear Quadratic Networked Control with Limited Communication Range, Y Yan, Y Shen.
[ICASSP23] Approximation Error Back-Propagation for Q-Function in Scalable Reinforcement Learning with Tree Dependence Structure, Y Yan, Y Dong, K Ma, et al.
[ICRA22] Relative Distributed Formation and Obstacle Avoidance with Multi-agent Reinforcement Learning, Y Yan, X Li, X Qiu, et al.
[INTERSPEECH21] AdaSpeech 3: Adaptive Text to Speech for Spontaneous Style, Y Yan, X Tan, B Li, et al.
[ICASSP21] AdaSpeech 2: Adaptive Text to Speech with Untranscribed Data, Y Yan, X Tan, B Li, et al.

😊Recent Preprints

[arXiv] Reward-Robust RLHF in LLMs, Y Yan, X Lou, J Li, et al.
[arXiv] Uncertainty-aware Reward Model: Teaching Reward Models to Know What is Unknown, X Lou, D Yan, W Shen, Y Yan, et al.

🏆Awards

🏅️Final Champion of the 2nd/3rd AI Arena Multi-agent Reinforcement Competition, Tencent Technology (腾讯开悟多智能体大赛). (2022, 2023)
🥈First Prize, second place at the ICRA RoboMaster University Sim2Real Challenge, DJI Technology (大疆Robomaster机器人大赛). (2022)
🥉Third place of the World University Math & Intelligence Competition, Chengdu FISU World University Games (成都大运会数智竞技项目, AI多智能体博弈赛道). (2023)
Comprehensive First-Class Scholarship for Ph.D. Student (2023, 2024)
Tsinghua Scholarship for Overseas Graduate Studies (2022, 2023)
Finalist Prize at the 1st Construction Robot Innovation Competition, Guoqiang Research Institute, Tsinghua University (2022)
Comprehensive Second-Class Scholarship for Ph.D. Student (2022)
First Prize in Electronic Design Contest, Tsinghua University (2019)
Second Prize in China Undergraduate Mathematical Contest in Modeling (2018)
Second Prize in Parts of the National College Student Physics Competition (2018)
Comprehensive First-Class Scholarship for Undergraduate Student, Tsinghua University (2018-2020)

📖Education

2021.09 - now, Ph.D., EE, Tsinghua University, China.
2024.10 - 2025.02, visiting scholar, UIUC, ECE, USA.
2018.11 - 2019.01, visiting student, University of Cambridge, UK.
2017.09 - 2021.06, Undergraduate, EE, Tsinghua University, China.
2014.09 - 2017.06, Hubei Wuchang Experimental High School, Wuhan, China.

💻Internships

2025.02 - now, Moonshot AI, Beijing.
2023.07 - 2024.09, Baichuan AI, Beijing.
2020.06 - 2021.08, MSRA, Machine Learning Group, Beijing.