Xinyi Wan

@ufotalent

Joined on Nov 16, 2023

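  • Penghui Qi*, Xinyi Wan*, Guangxing Huang, Min Lin, Sea AI Lab (* Equal Contributions), 27 Feb, 2025
    DeepSeek open-sourced DualPipe on day 4 of their OpenSourceWeek. It is a co-design of Pipeline Parallelism (PP) and Expert Parallelism (EP) aimed at better training performance. In this post, we show that the Dual part of DualPipe causes 2× parameter redundancy. This redundancy is unnecessary and can be removed almost for free, with minimal impact on the schedule's other properties. The key is to convert DualPipe into a V-Shape schedule through a simple "cut-in-half" operation. We further show that when Expert Parallelism (EP) is not required, efficiency can be improved even more, ultimately yielding the pipeline-bubble-free ZBV schedule.
    Removing the Duplicated Parameters from DualPipe: note that the DualPipe schedule can be split into two mirror-symmetric halves, as shown in the figure below. For example, devices $0$ and $7$ hold the same pipeline stages and follow exactly the same schedule.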
  • DualPipe could be better without the Dual
    Penghui Qi*, Xinyi Wan*, Guangxing Huang, Min Lin, Sea AI Lab (* Equal Contributions), 27 Feb, 2025
    DeepSeek open-sourced DualPipe on day 4 of their OpenSourceWeek. It is a co-design of Pipeline Parallelism (PP) and Expert Parallelism (EP) for better training performance. In this post, we show that the Dual part of DualPipe is actually harmful because it introduces 2× parameter redundancy. That redundancy is unnecessary and can be removed almost for free, with only a very slight impact on the schedule's other properties. The trick is to transform DualPipe into a V-Shape schedule with a simple "cut-in-half" procedure. We further show that when Expert Parallelism (EP) is not required, efficiency can be improved further, which leads to the ZBV schedule.
    DISPATCH(F) -> MLP(F) -> COMBINE(F)
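The mirror-symmetry argument from the teasers above can be illustrated at the level of stage-to-device placement. The following is a minimal sketch that rests on several assumptions. It assumes a DualPipe-style placement in which, on N devices, device d holds stages d and N-1-d of an N-stage model, which is consistent with devices 0 and 7 sharing the same stages in the 8-device example. It also assumes that "cut-in-half" means keeping only one of the two mirror halves. The sketch ignores the timing of forward/backward chunks and all communication. The function names (`dualpipe_placement`, `cut_in_half`, `vshape_placement`, `redundancy`) are hypothetical and do not come from the post or from DeepSeek's code.

```python
from collections import Counter


def dualpipe_placement(num_devices: int) -> dict[int, list[int]]:
    """Assumed DualPipe-style bidirectional placement (hypothetical helper).

    The model is split into `num_devices` stages. Device d runs stage d for
    micro-batches entering at device 0 and stage (num_devices - 1 - d) for
    micro-batches entering at the last device, so each device stores two stages.
    """
    assert num_devices % 2 == 0, "mirror-symmetric placement assumes an even device count"
    return {d: sorted({d, num_devices - 1 - d}) for d in range(num_devices)}


def cut_in_half(placement: dict[int, list[int]]) -> dict[int, list[int]]:
    """Keep only the first half of the devices; the second half mirrors it."""
    half = len(placement) // 2
    return {d: placement[d] for d in range(half)}


def vshape_placement(num_devices: int) -> dict[int, list[int]]:
    """V-shape placement: 2 * num_devices stages, device d holds d and 2*num_devices-1-d."""
    return {d: [d, 2 * num_devices - 1 - d] for d in range(num_devices)}


def redundancy(placement: dict[int, list[int]]) -> float:
    """Stored stage copies divided by distinct stages (1.0 means no duplication)."""
    copies = Counter(s for stages in placement.values() for s in stages)
    return sum(copies.values()) / len(copies)


full = dualpipe_placement(8)
print(full[0], full[7])                 # [0, 7] [0, 7]  (devices 0 and 7 hold the same stages)
print(redundancy(full))                 # 2.0  (every stage is stored twice)
half = cut_in_half(full)
print(half == vshape_placement(4))      # True (first half is a 4-device V-shape placement)
print(redundancy(half))                 # 1.0  (no duplicated parameters)
```

Under these assumptions, keeping one mirror half of an 8-device bidirectional placement yields exactly the 4-device, 8-stage V-shape placement, with each stage's parameters stored once instead of twice. Whether the resulting timing remains as efficient is the schedule-level question that the post addresses. This placement-only model does not capture it.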