Liyuan Liu

Senior Researcher @ MSR

Hi there!

Welcome to Liyuan Lucas Liu (刘力源)’s webpage! I am a Senior Researcher at Microsoft Research. My Ph.D. advisor is Prof. Jiawei Han, and my undergraduate advisor is Prof. Linli Xu. My research aims to understand the underlying mechanisms of pretraining heuristics.

If you are going to visit Redmond, please let me buy you a bubble tea.


  • Pretraining Heuristics
  • Training Stability & Dynamics
  • Structures in Deep Learning


  • Ph.D. in Computer Science, 2024

    University of Illinois at Urbana-Champaign

  • B.Eng. in Computer Science, 2016

    University of Science and Technology of China

Things I do

... and want to do

The success of large-scale pretraining hinges on intricate engineering heuristics. While the empirical benefits of these heuristics are evident, their underlying mechanisms remain elusive. My research endeavors to demystify the mathematical principles underlying these pretraining heuristics, aiming to illuminate their mechanisms and potentially guide future algorithm developments.

Fun Facts

  • Received more than 3,000 GitHub stars in total! This ranks 2,041st among all GitHub users (according to Gitstar).
    • Although I suspect the Gitstar ranking is incomplete and outdated, it is still nice to have such encouragement.
  • Torch-Scope has been downloaded more than 32,000 times.
    • Although this number cannot reflect the actual number of users, there must be someone other than myself using this package.
  • Won the Topcoder Arabic NER challenge.
    • Since I know nothing about Arabic, my model has surely surpassed me on that. On second thought, why am I happy about being worse than a PC…
  • Love skiing (can do black runs, but still a rookie); DJI fan (proudly own Mavic Pro, Mavic Mini & Spark); love to watch Texas Hold’em (but seldom play); play Sheng Ji (双扣) and Mafia (狼人杀) with family & friends.

Selected Publications


(2023). Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs. Proceedings of the Twelfth International Conference on Learning Representations (ICLR 2024). Selected as Oral.


(2023). Bridging Discrete and Backpropagation: Straight-Through and Beyond. Proceedings of the Thirty-seventh Annual Conference on Neural Information Processing Systems (NeurIPS 2023). Selected as Oral.


(2022). Label Noise in Adversarial Training: A Novel Perspective to Study Robust Overfitting. Proceedings of the Thirty-sixth Annual Conference on Neural Information Processing Systems (NeurIPS 2022). Selected as Oral.


(2020). Understanding the Difficulty of Training Transformers. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020). Selected as Oral.


(2020). On the Variance of the Adaptive Learning Rate and Beyond. Proceedings of the Eighth International Conference on Learning Representations (ICLR 2020).


(2018). Efficient Contextualized Representation: Language Model Pruning for Sequence Labeling. Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP 2018). Selected as Oral.


(2018). Empower Sequence Labeling with Task-Aware Neural Language Model. Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI 2018). Selected as Oral.


Highlighted Honors


Winner of the Topcoder Arabic NER Challenge

Ranked 1st among 137 registrants and 220 submissions.

Guo Moruo Scholarship

Highest honor for USTC undergraduate students.

Google Excellent Scholarship

Only 58 graduate and undergraduate students shortlisted nationwide.

Reading Group

The latest reading group is available at Google Group. The reading lists are:

DMG Group Meeting

The latest DMG group meeting notification is available at Google Group.

PC Member

I’ve served as a PC Member for ACL2020, WWW2020, AAAI2020, IJCAI2020, EMNLP2019, LLD2019, ACL2019, NAACL2019, AAAI2019, and EMNLP2018.


… stay in touch!