FlashRL: 8Bit Rollouts, Full Power RL

Abstract

Rollout generation is a primary bottleneck in RL training, accounting for ~70% of total training time in DAPO-32B. FlashRL provides the first open-source, working RL recipe that applies quantized rollout generation while preserving downstream performance, via Truncated Importance Sampling (TIS). It can be installed with pip install flash-llm-rl and supports both INT8 and FP8 quantization, on both the latest GPUs (e.g., H100) and older ones (e.g., A100).
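To make the TIS correction concrete, below is a minimal sketch, not FlashRL's actual implementation, of truncated importance sampling applied to a token-level policy-gradient loss. The idea: the quantized rollout policy differs slightly from the full-precision training policy, so each token is reweighted by their (truncated) likelihood ratio. The function name, tensor names, and the threshold C are illustrative assumptions.

```python
# Minimal sketch of Truncated Importance Sampling (TIS) for quantized rollouts.
# NOT FlashRL's actual implementation; names and the threshold C are illustrative.
import torch

def tis_pg_loss(
    train_logprobs: torch.Tensor,    # log pi_train(a_t|s_t) from the full-precision training policy
    rollout_logprobs: torch.Tensor,  # log pi_rollout(a_t|s_t) from the quantized (INT8/FP8) rollout
    advantages: torch.Tensor,        # per-token advantage estimates
    C: float = 2.0,                  # truncation threshold for the importance ratio
) -> torch.Tensor:
    # Importance ratio between training and rollout policies; detached so it
    # acts as a fixed correction weight rather than part of the gradient path.
    ratio = torch.exp(train_logprobs - rollout_logprobs).detach()
    # Truncate (clip from above) to bound the variance of the estimator.
    weight = torch.clamp(ratio, max=C)
    # Standard policy-gradient surrogate, reweighted token by token.
    return -(weight * advantages * train_logprobs).mean()
```

Truncating the ratio from above trades a small bias for bounded variance, which is what lets training tolerate the rollout/training mismatch introduced by quantization.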

Liyuan Liu
Principal Researcher @ MSR

Understanding the underlying mechanisms of pretraining heuristics.