# PyTorch Distributed Training

###### tags: `distributed` `share`

- Start here: https://pytorch.org/tutorials/beginner/dist_overview.html
- https://zhuanlan.zhihu.com/p/129912419
- https://github.com/horovod/horovod
- https://github.com/microsoft/DeepSpeed

- Single machine, single GPU
- Single machine, multiple GPUs: DP (DataParallel)
- Single machine, multiple GPUs: DDP (DistributedDataParallel)
    - `python -m torch.distributed.launch`
    - multiprocessing: launching via `mp.spawn`
- Multiple machines, multiple GPUs
    - launch
    - slurm
        - see https://github.com/facebookresearch/swav.git
- Others
    - Sync BN: https://zhuanlan.zhihu.com/p/337732517
    - Backends: gloo, nccl
    - AMP (mixed-precision training): https://zhuanlan.zhihu.com/p/103685761
- References
    - https://github.com/facebookresearch/swav (a good example of how to write distributed training code)
    - https://github.com/pytorch/examples/blob/master/imagenet/main.py
    - https://zhuanlan.zhihu.com/p/267157806
    - https://github.com/pytorch/examples/blob/master/distributed/ddp/README.md
    - https://arxiv.org/abs/2006.15704
    - https://pytorch.org/tutorials/beginner/dist_overview.html
    - https://zhuanlan.zhihu.com/p/98535650
    - https://zhuanlan.zhihu.com/p/250471767
    - https://zhuanlan.zhihu.com/p/178402798
    - https://leimao.github.io/blog/PyTorch-Distributed-Training/
    - https://pytorch.org/tutorials/beginner/blitz/data_parallel_tutorial.html
    - https://www.cnblogs.com/ocean1100/p/9494640.html
    - https://zhuanlan.zhihu.com/p/76638962
    - https://github.com/open-mmlab/mmcv
    - https://zhuanlan.zhihu.com/p/68717029
    - https://pytorch.org/docs/stable/notes/randomness.html
    - ring-allreduce: https://andrew.gibiansky.com/blog/machine-learning/baidu-allreduce/
        - https://www.jianshu.com/p/8c0e7edbefb9
        - https://github.com/baidu-research/baidu-allreduce/blob/master/collectives.cu#L156
        - https://cloud.tencent.com/developer/article/1421382
        - https://sharzy.in/2020/01/08/distributed-dl.html
        - https://github.com/bharathgs/Awesome-Distributed-Deep-Learning#blogs
        - reduce + broadcast (parameter-server architecture)
        - ring allreduce architecture
        - https://zhuanlan.zhihu.com/p/79030485 (NCCL, Baidu, TensorFlow)
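
The outline contrasts reduce + broadcast with ring allreduce without showing how the ring variant moves data. The following is a minimal single-process sketch of a ring allreduce (sum), simulating the workers as plain Python lists. The function name `ring_allreduce`, the chunking scheme, and the step ordering are illustrative assumptions; this is not the code used by NCCL, Baidu's implementation, or `torch.distributed`.

```python
def ring_allreduce(worker_data):
    """Simulate a sum ring allreduce over equally sized per-worker vectors.

    Hypothetical helper for illustration only: `worker_data[r]` stands in
    for the gradient buffer held by worker (rank) r.
    """
    n = len(worker_data)
    length = len(worker_data[0])
    assert all(len(v) == length for v in worker_data), "buffers must match in size"

    # Work on copies so the caller's inputs are not mutated.
    bufs = [list(v) for v in worker_data]

    # Split the index range into n contiguous chunks (some may be empty).
    bounds = [(i * length) // n for i in range(n + 1)]

    def chunk(c):
        return range(bounds[c], bounds[c + 1])

    # Phase 1: scatter-reduce. In step s, rank r sends chunk (r - s) % n to
    # its right neighbour, which adds it into its own buffer.
    for s in range(n - 1):
        # Snapshot all payloads first so the sends in one step are "simultaneous".
        sends = []
        for r in range(n):
            c = (r - s) % n
            sends.append((c, [bufs[r][i] for i in chunk(c)]))
        for r, (c, payload) in enumerate(sends):
            dst = (r + 1) % n
            for i, v in zip(chunk(c), payload):
                bufs[dst][i] += v
    # Now rank r holds the fully reduced chunk (r + 1) % n.

    # Phase 2: allgather. In step s, rank r forwards chunk (r + 1 - s) % n,
    # and the right neighbour overwrites its copy with the reduced values.
    for s in range(n - 1):
        sends = []
        for r in range(n):
            c = (r + 1 - s) % n
            sends.append((c, [bufs[r][i] for i in chunk(c)]))
        for r, (c, payload) in enumerate(sends):
            dst = (r + 1) % n
            for i, v in zip(chunk(c), payload):
                bufs[dst][i] = v

    return bufs


if __name__ == "__main__":
    grads = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]]
    for rank, buf in enumerate(ring_allreduce(grads)):
        print(rank, buf)  # every rank prints [111, 222, 333, 444]
```

Each worker sends and receives only one chunk per step, for 2(n - 1) steps in total. The data each worker transfers therefore stays roughly constant as workers are added, whereas in the reduce + broadcast (parameter-server) layout the central node's traffic grows with the number of workers.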