Adaptive

On the Transformer Growth for Progressive BERT Training

As the excessive pre-training cost arouses the need to improve efficiency, considerable efforts have been made to train BERT progressively--start from an inferior but low-cost model and gradually increase the computational complexity. Our objective …

Towards Adaptive Residual Network Training: A Neural-ODE Perspective

In pursuit of resource-economical machine learning, attempts have been made to dynamically adjust computation workloads in different training stages, i.e., starting with a shallow network and gradually increasing the model depth (and computation …