MINI-SEQUENCE TRANSFORMER: Optimizing Intermediate Memory for Long Sequences Training
Cheng Luo, Jiawei Zhao, Zhuoming Chen, Beidi Chen, Anima Anandkumar
Under review
RTP: Rethinking Tensor Parallelism with Memory Deduplication
Cheng Luo, Tianle Zhong, Geoffrey Fox
arXiv 2023
Towards Efficient Deep Neural Network Training by FPGA-Based Batch-Level Parallelism
Cheng Luo, Man-Kit Sit, Hongxiang Fan, Shuanglong Liu, Wayne Luk, Ce Guo
FCCM 2020