# 20191028-20191101 Weekly Report

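## 0. Quantization experiments with senior labmate Zhanhong (展宏)

Finished the code for independent per-layer scaling factors.

## 1. Survey of TORCH.SPARSE

![](https://i.imgur.com/RMeMVWi.png)

`torch.sparse` is still under development.

> for now sparse tensor support basically evolves with the needs of our respective projects. Basic GPU support is being worked on - **it will rely on cuSPARSE for some operations.**
> [Reference](https://discuss.pytorch.org/t/backprop-through-sparse-dense-matmul/1244/2)

### Brief summary of the benefit of sparse matrix multiplication on the GPU

Conclusion when running on the CPU: [Reference](https://towardsdatascience.com/sparse-matrices-in-pytorch-be8ecaccae6)

> 2 dense matrices always multiply faster than a sparse and dense matrix unless the sparse matrix has very low density. ‘Very low’ seems to be 1.5% and below.

**On the GPU the benefit is even less pronounced:** [Reference](https://towardsdatascience.com/sparse-matrices-in-pytorch-part-2-gpus-fd9cc0725b71)

The figure below compares matrix multiplication time on the **GPU** for four **different matrix sizes**. Blue: time for **sparse matrix multiplication** at **different densities**. Red: time for ordinary dense matrix multiplication (control group).

![](https://i.imgur.com/Latmlqg.png)

Sparse multiplication on the GPU pays off only when n is very large (e.g. n=4096) and the density is below roughly 1.5%.

**Conclusion: for now it is not practical for pruning tasks.**

> These issues with sparse matrices have been expressed by other Pytorch users, so here’s hoping that the devs come up with more efficient ways of handling sparse matrices.

## 2. Survey of Direct Feedback Alignment (DFA)

BP vs. FA vs. DFA

[Equation 6]

**BP**

+ Conventional BP relies on $W^T$ to propagate the error backward.

[Equation 7]

**FA**

+ Replaces $W^T$ with a **fixed** **random** matrix $B$ and still manages to train the network. (The results are somewhat worse, but it does train.)
+ Sounds rather magical.

[Equation 8]

**DFA**

+ Likewise does not rely on $W^T$ and uses a **random**, fixed $B$ instead. In addition, the weights $W_i$ of each layer are **not updated from the next layer's** error $\delta a_{i+1}$ but **directly from the output layer's** error $e$.
+ Advantage: the update of each layer is **independent**, so the updates can **run in parallel**.
+ Drawback: accuracy is lower than with BP.

![](https://i.imgur.com/4sd3lrJ.png)

![](https://i.imgur.com/nPFtr6v.png)

**The results below show that BP trains somewhat better:**

![](https://i.imgur.com/SnOJnLB.png)

![](https://i.imgur.com/cpO8SP0.png)

![](https://i.imgur.com/OFlbe9x.png)

## 3. Other

Finished a programming course assignment: [a pinyin input method based on an arbitrary character n-gram model](https://siahuat0727.github.io/2019/11/01/pinyin-input-method/). Put a little care into it, mostly as Python practice.

## Plan for next week

Keep studying DFA (in preparation for online training); after all, several DFA papers came out this year.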
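As a starting point for the DFA follow-up, the three feedback rules compared in section 2 (BP, FA, DFA) can be sketched in plain Python. This is only a minimal sketch under stated assumptions, not the code behind the figures above: it assumes a network with two tanh hidden layers, a linear output, and a squared-error loss, so that the output error is $e = \hat{y} - y$. The rules follow the usual two-hidden-layer formulation of feedback alignment as I understand it. All helper names (`matvec`, `hidden_deltas`, `rand_matrix`, and so on) and all layer sizes are hypothetical.

```python
import math
import random

def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def dtanh(a):
    """Elementwise derivative of tanh at the pre-activations a."""
    return [1.0 - math.tanh(x) ** 2 for x in a]

def hadamard(u, v):
    return [x * y for x, y in zip(u, v)]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def hidden_deltas(mode, e, a1, a2, W2, W3, B1, B2):
    """Return (delta_a1, delta_a2) for the two hidden layers.

    'bp' : error flows back through W3^T, then W2^T
    'fa' : fixed random B2 (n2 x n_out) and B1 (n1 x n2) replace W^T
    'dfa': fixed random B2 (n2 x n_out) and B1 (n1 x n_out), both fed by e
    """
    if mode == "bp":
        d2 = hadamard(matvec(transpose(W3), e), dtanh(a2))
        d1 = hadamard(matvec(transpose(W2), d2), dtanh(a1))
    elif mode == "fa":
        d2 = hadamard(matvec(B2, e), dtanh(a2))
        d1 = hadamard(matvec(B1, d2), dtanh(a1))  # still needs delta_a2
    elif mode == "dfa":
        d2 = hadamard(matvec(B2, e), dtanh(a2))
        d1 = hadamard(matvec(B1, e), dtanh(a1))   # needs only e
    else:
        raise ValueError(f"unknown mode: {mode}")
    return d1, d2

# Toy forward pass (sizes are arbitrary assumptions).
rng = random.Random(0)
n_in, n1, n2, n_out = 4, 5, 3, 2
W1 = rand_matrix(n1, n_in, rng)
W2 = rand_matrix(n2, n1, rng)
W3 = rand_matrix(n_out, n2, rng)
x = [rng.uniform(-1.0, 1.0) for _ in range(n_in)]
y = [1.0, 0.0]

a1 = matvec(W1, x)
h1 = [math.tanh(v) for v in a1]
a2 = matvec(W2, h1)
h2 = [math.tanh(v) for v in a2]
y_hat = matvec(W3, h2)                      # linear output layer
e = [p - t for p, t in zip(y_hat, y)]       # output error for squared loss

# Fixed random feedback matrices; note B1 has a different shape in FA and DFA.
B2 = rand_matrix(n2, n_out, rng)
B1_fa = rand_matrix(n1, n2, rng)
B1_dfa = rand_matrix(n1, n_out, rng)

d1_bp, d2_bp = hidden_deltas("bp", e, a1, a2, W2, W3, None, None)
d1_fa, d2_fa = hidden_deltas("fa", e, a1, a2, W2, W3, B1_fa, B2)
d1_dfa, d2_dfa = hidden_deltas("dfa", e, a1, a2, W2, W3, B1_dfa, B2)
```

In every mode the weight update would then be the outer product of a layer's delta with that layer's input, scaled by the learning rate. In the `dfa` branch neither hidden delta depends on the other or on any forward weight matrix, which is what makes per-layer parallel updates possible.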
