06/25/20-07/02/20

# 06/25/20-07/02/20

I have read four papers, roughly within the past day. They are all about ML+Sys, and from my perspective they fall into two categories.

**1. Accelerating DNN Training on Distributed Systems**

* [SOSP'19][PipeDream: Generalized Pipeline Parallelism for DNN Training](https://cs.stanford.edu/~matei/papers/2019/sosp_pipedream.pdf)
* [MLSys'20][PLink: Discovering and Exploiting Datacenter Network Locality for Efficient Cloud-based Distributed Training](https://homes.cs.washington.edu/~arvind/papers/plink.pdf)

These two papers accelerate the training process on distributed systems using **systems** knowledge. DNN training has some specific characteristics: for example, a model has many layers (so computation can be divided), and each layer's result is critical for the next layer (so communication is frequent and important). The way to accelerate training is to leverage systems knowledge, for example by optimizing computation parallelism and reducing communication overhead. The above is about the first paper. The second paper mainly addresses link bandwidth limits in multi-tenant datacenter environments, using network knowledge.

**2. Programming Language Analysis with Machine Learning**

* [ICSE'17][To Type or Not to Type: Quantifying Detectable Bugs in JavaScript](http://earlbarr.com/publications/typestudy.pdf)
* [PLDI'20][Typilus: Neural Type Hints](https://arxiv.org/pdf/2004.10657.pdf)

These two papers address programming language analysis problems. They focus on type systems, both static and dynamic. For typed languages, the first paper points out that type checking can help detect bugs. The second paper does type prediction with machine learning.

From these four papers, I think machine learning can be either the problem (something to be optimized) or the tool (to solve problems in many systems areas, such as PL analysis and computer architecture). I think all these ideas are interesting, and they motivate me to learn more about them.
However, I don't have a comprehensive understanding of the details of these papers yet. There is still more to learn.

---

* [SOSP'19][Efficient Scalable Thread-Safety-Violation Detection](https://www.microsoft.com/en-us/research/uploads/prod/2019/09/sosp19-final193.pdf)

I also read one more paper, Shan's SOSP'19 best paper. It is about finding concurrency bugs. Maybe we can leverage machine learning to analyze multithreaded programs to find bugs or do other interesting things.
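
To make "concurrency bug" a bit more concrete for myself, here is a minimal Python sketch of what a thread-safety violation can look like: two threads are inside calls on the same shared object at the same time, and at least one of the calls is a write. The `OverlapMonitor` class and the `demo` function are hypothetical names I made up for illustration. This is not the detection algorithm from the paper, only a toy that flags overlapping calls.

```python
import threading
from contextlib import contextmanager


class OverlapMonitor:
    """Toy monitor (hypothetical, not the paper's method).

    Flags two different threads being inside calls on the same shared
    object at the same time when at least one call is a write.
    Nested calls from the same thread are not handled.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._active = {}     # thread id -> (operation name, is_write)
        self.violations = []  # list of (earlier op, later op) pairs

    @contextmanager
    def call(self, op, is_write):
        me = threading.get_ident()
        with self._lock:
            for other, (other_op, other_write) in self._active.items():
                # A conflict needs two different threads and at least one write.
                if other != me and (is_write or other_write):
                    self.violations.append((other_op, op))
            self._active[me] = (op, is_write)
        try:
            yield
        finally:
            with self._lock:
                del self._active[me]


def demo():
    monitor = OverlapMonitor()
    shared = {}
    writer_inside = threading.Event()
    reader_done = threading.Event()

    def writer():
        with monitor.call("dict.__setitem__", is_write=True):
            writer_inside.set()
            # Hold the call open so the overlap happens deterministically.
            reader_done.wait(timeout=5)
            shared["k"] = 1

    def reader():
        writer_inside.wait(timeout=5)
        with monitor.call("dict.get", is_write=False):
            shared.get("k")
        reader_done.set()

    threads = [threading.Thread(target=writer), threading.Thread(target=reader)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return monitor.violations


if __name__ == "__main__":
    print(demo())
```

The events are only there to force the two calls to overlap on every run. In a real program the overlap depends on timing, which is what makes such bugs hard to find.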