About me

I’m a fourth-year Ph.D. candidate in the Department of Electronic Engineering, Tsinghua University. My research interests include reinforcement learning for both LLMs and robotics, and robust optimization theory in RL and MARL. I also have some experience with generative models, especially in NLP and speech.

I am very fortunate to be advised by Yuan Shen (沈渊) in SGroup at the Institute of Information Systems, EE (信息系统研究所), Tsinghua University. I am also very fortunate to collaborate with Yu Wang (汪玉) and to contribute to the development of the Collaborative Intelligence Group (智能协同团队). I also work with Kaiqing Zhang and Tamer Basar on the underlying theory of RL/MARL. Before that, I received my bachelor’s degree from the Department of Electronic Engineering, Tsinghua University.

Currently, I am an intern at Moonshot AI (月之暗面), working on RL with multimodal LLMs and developing K-series models for Kimi. Before that, I interned in the RLHF group at Baichuan AI (百川智能), mentored by Dong Yan (阎栋), and in the Machine Learning Group at Microsoft Research Asia, mentored by Xu Tan (谭旭), Tao Qin (秦涛) and Tieyan Liu (刘铁岩).

🌟With my Ph.D. expected to conclude in 2026, I am actively exploring job or internship opportunities for autumn 2025, particularly in areas such as LLMs, multimodal LMs, and embodied AI🤖️. If my expertise aligns with your interests, I would be delighted to connect (yan-yz17@tsinghua.org.cn).

🔥News

  • 2025.04:  🎉 We released Kimi-VL and Kimi-VL-Thinking, lightweight but powerful MoE VLMs with reasoning capability.
  • 2025.02:  💻 Joined Moonshot AI as a research intern, focusing on general RL for multimodal LLMs.
  • 2025.01:  🎉 Paper accepted by ICLR 2025.
  • 2024.10:  📖 Began a 6-month visit to UIUC, hosted by Tamer Basar.
  • 2024.06:  🎉 Paper accepted by ICML 2024.
  • 2024.01:  🎉 Paper accepted by IEEE TSP.
  • 2023.08:  💻 Joined Baichuan AI as a research intern, focusing on RLHF/alignment in LLMs.
  • 2023.06:  🎉 Paper accepted by ICASSP 2023 (oral).
  • 2022.06:  🎉 Paper accepted by ICRA 2022.
  • 2021.12:  🎉 Papers accepted by ICASSP and INTERSPEECH.
  • 2020.08:  💻 Joined MSRA as a research intern, focusing on text-to-speech generation.

📑Selected Publications

  • [ICLR25] 3D-Properties: Identifying Challenges in DPO and Charting a Path Forward, Y Yan, Y Miao, J Li, et al.
  • [ICML24] Exploring the LLM Journey from Cognition to Expression with Linear Representations, Y Yan, J Li, Y Zhang, D Yan.
  • [IEEE-TSP] Distributed Policy Gradient for Linear Quadratic Networked Control with Limited Communication Range, Y Yan, Y Shen.
  • [ICASSP23] Approximation Error Back-Propagation for Q-Function in Scalable Reinforcement Learning with Tree Dependence Structure, Y Yan, Y Dong, K Ma, et al.
  • [ICRA22] Relative Distributed Formation and Obstacle Avoidance with Multi-agent Reinforcement Learning, Y Yan, X Li, X Qiu, et al.
  • [INTERSPEECH21] AdaSpeech 3: Adaptive Text to Speech for Spontaneous Style, Y Yan, X Tan, B Li, et al.
  • [ICASSP21] AdaSpeech 2: Adaptive Text to Speech with Untranscribed Data, Y Yan, X Tan, B Li, et al.

😊Recent Preprints

  • [Arxiv] Reward-Robust RLHF in LLMs, Y Yan, X Lou, J Li, et al.
  • [Arxiv] Uncertainty-aware Reward Model: Teaching Reward Models to Know What is Unknown, X Lou, D Yan, W Shen, Y Yan, et al.

🏆Awards

  • 🏅️Final Champion of the 2nd/3rd AI Arena Multi-agent Reinforcement Competition, Tencent Technology (腾讯开悟多智能体大赛). (2022, 2023)
  • 🥈First Prize, second place at the ICRA RoboMaster University Sim2Real Challenge, DJI Technology (大疆Robomaster机器人大赛). (2022)
  • 🥉Third place of the World University Math & Intelligence Competition, Chengdu FISU World University Games (成都大运会数智竞技项目, AI多智能体博弈赛道). (2023)
  • Comprehensive First-Class Scholarship for Ph.D. Student (2023, 2024)
  • Tsinghua Scholarship for Overseas Graduate Studies (2022, 2023)
  • Finalist Prize at the 1st Construction Robot Innovation Competition, Guoqiang Research Institute, Tsinghua University (2022)
  • Comprehensive Second-Class Scholarship for Ph.D. Student (2022)
  • First Prize in Electronic Design Contest, Tsinghua University (2019)
  • Second Prize in China Undergraduate Mathematical Contest in Modeling (2018)
  • Second Prize in Parts of the National College Student Physics Competition (2018)
  • Comprehensive First-Class Scholarship for Undergraduate Student, Tsinghua University (2018-2020)

📖Education

  • 2021.09 - now, Ph.D., EE, Tsinghua University, China.
  • 2024.10 - 2025.02, visiting scholar, UIUC, ECE, USA.
  • 2018.11 - 2019.01, visiting student, University of Cambridge, UK.
  • 2017.09 - 2021.06, Undergraduate, EE, Tsinghua University, China.
  • 2014.09 - 2017.06, Hubei Wuchang Experimental High School, Wuhan, China.

💻Internships

  • 2025.02 - now, Moonshot AI, Beijing.
  • 2023.07 - 2024.09, Baichuan AI, Beijing.
  • 2020.06 - 2021.08, MSRA, Machine Learning Group, Beijing.