Research


MINI-SEQUENCE TRANSFORMER: Optimizing Intermediate Memory for Long Sequences Training

Cheng Luo, Jiawei Zhao, Zhuoming Chen, Beidi Chen, Anima Anandkumar

under review


RTP: Rethinking Tensor Parallelism with Memory Deduplication

Cheng Luo, Tianle Zhong, Geoffrey Fox

arXiv 2023


Towards Efficient Deep Neural Network Training by FPGA-Based Batch-Level Parallelism

Cheng Luo, Man-Kit Sit, Hongxiang Fan, Shuanglong Liu, Wayne Luk, Ce Guo

FCCM 2020