Doing Automatic Differentiation in the 21st Century? You Need Zygote.jl! - 鄭景文
===
[Slides](https://slides.com/peter-cheng/zygotejl#/)
## Automatic Differentiation (AD)
## How AD works
- Wengert List (Tape / Graph)
- Derivative Definition
- Chain Rules
Steps:
1. Get the Wengert List of the given expression
2. Transform each instruction in the Wengert List
3. Apply the chain rule
### Wengert List
- A list of expressions / instructions
- Each expression is transformed using the derivative definition
- EX: $f(x) = 5\sin(\log(x))$
- Wengert List of $f$
    - $y_1 = \log(x)$
    - $y_2 = \sin(y_1)$
    - $y_3 = 5 \times y_2$
- EX: $\frac{d}{dx}f(x)=\frac{5\cos(\log(x))}{x}$
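
As a small sketch in Julia (the helper name `f_wengert` is hypothetical), the Wengert list above is just the same computation written as straight-line code, one assignment per instruction:

```julia
# The Wengert list of f(x) = 5*sin(log(x)) written out explicitly:
# one primitive instruction per line, each assigned to a fresh name.
f(x) = 5 * sin(log(x))

function f_wengert(x)
    y1 = log(x)
    y2 = sin(y1)
    y3 = 5 * y2
    return y3
end

f_wengert(3.0) ≈ f(3.0)   # true
```

Differentiating each instruction and chaining them together gives the closed form $\frac{d}{dx}f(x)=\frac{5\cos(\log(x))}{x}$ shown above.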
## Different types of AD
- Forward mode
- Reverse mode
- Mixed mode, ...
### Forward mode
- Dual number
    - Mathematically similar to complex numbers, except the extra unit squares to zero: $\epsilon^2 = 0$
```julia
struct Dual{T<:Real} <: Real
    x::T   # value
    ϵ::T   # derivative (perturbation) part
end
```
- Chain rule multiplication starts from the input: $\frac{dy_1}{dx} \times \frac{dy_2}{dy_1} \times \dots$
- Computationally efficient for multivariable differentiation when there are more outputs than inputs (a minimal sketch follows below)
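
A minimal forward-mode sketch building on the `Dual` struct above: only the derivative rules needed for $f(x)=5\sin(\log(x))$ are defined, and `D` is a hypothetical helper whose name simply matches the `@code_warntype D(f, 3)` listing further down.

```julia
# Forward-mode AD via dual numbers: each value carries a perturbation ϵ,
# propagated by the chain rule (ϵ² = 0).
# Only the rules needed for f(x) = 5*sin(log(x)) are defined here.
Base.log(a::Dual) = Dual(log(a.x), a.ϵ / a.x)         # (log u)' = u'/u
Base.sin(a::Dual) = Dual(sin(a.x), a.ϵ * cos(a.x))    # (sin u)' = u'*cos(u)
Base.:*(c::Real, a::Dual) = Dual(c * a.x, c * a.ϵ)    # constant * dual

# Hypothetical helper: seed ϵ = 1 and read the derivative off the ϵ part.
D(f, x) = f(Dual(float(x), one(float(x)))).ϵ

f(x) = 5 * sin(log(x))
D(f, 3)   # ≈ 5 * cos(log(3)) / 3 ≈ 0.758
```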
### Reverse mode
More common in modern deep learning: backpropagation does two things
- reverse-mode differentiation
- updating the parameters

It starts from the point closest to the output.
Reverse mode is more efficient when the numbers of outputs and inputs differ greatly. For example, backpropagation usually has only one output (the loss function) but many input variables (the parameters).
- Tracker
    - Records every operation on the variables into a Wengert List
    - Derives the Wengert List w.r.t. the given variable
- Chain rule multiplication starts from the output: $\frac{dy}{dy_2} \times \frac{dy_2}{dy_1} \times \dots$
- Computationally efficient for multivariable differentiation when there are more inputs than outputs
    - The typical deep-learning situation (a minimal sketch of the tracker idea follows below)
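
A minimal sketch of the tracker idea (hypothetical names): the forward pass records one pullback per instruction onto a tape, and the backward pass walks the tape in reverse, starting from the output.

```julia
# Reverse mode by taping, for f(x) = 5*sin(log(x)).
function grad_f(x)
    tape = Function[]                              # recorded Wengert list, as pullbacks

    y1 = log(x);   push!(tape, dy -> dy / x)       # dy1/dx  = 1/x
    y2 = sin(y1);  push!(tape, dy -> dy * cos(y1)) # dy2/dy1 = cos(y1)
    y3 = 5 * y2;   push!(tape, dy -> 5 * dy)       # dy3/dy2 = 5

    dy = 1.0                                       # seed: d(output)/d(output) = 1
    for pullback in reverse(tape)
        dy = pullback(dy)                          # chain rule, from output to input
    end
    return dy                                      # == 5*cos(log(x))/x
end

grad_f(3.0)   # ≈ 0.758
```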
## Where is the Wengert List in Forward mode?
### Wengert underneath Dual Number
```julia
julia> @code_warntype f(3.)
Body::Float64
1 ─ %1 = invoke Main.log(_2::Float64)::Float64
│ %2 = invoke Main.sin(%1::Float64)::Float64
│ %3 = (Base.mul_float)(5.0, %2)::Float64
└── return %3
julia> @code_warntype D(f, 3)
Body::Float64
1 ─ %1 = (Base.sitofp)(Float64, x)::Float64 # sitofp: integer-to-float type conversion
│ %2 = invoke Base.Math.log(%1::Float64)::Float64
│ %3 = (Base.sitofp)(Float64, 1)::Float64
│ %4 = (Base.sitofp)(Float64, x)::Float64
│ %5 = (Base.div_float)(%3, %4)::Float64
│ %6 = invoke Main.sin(%2::Float64)::Float64
│ %7 = invoke Main.cos(%2::Float64)::Float64
│ %8 = (Base.mul_float)(%5, %7)::Float64
│ %9 = (Base.mul_float)(%6, 0.0)::Float64
│ %10 = (Base.mul_float)(5.0, %8)::Float64
│ %11 = (Base.add_float)(%9, %10)::Float64
└── return %11
```
## Why Julia
- Fast
- Really good compiler design (its intermediate representations are accessible, which Zygote builds on)
## Zygote.jl
- Source to source AD
- Supports control flow, recursion, closures, structs, dictionaries, ...

Installation: `import Pkg; Pkg.add("Zygote")`
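
For instance, once installed, `Zygote.gradient` differentiates plain Julia code directly; a quick check against the analytic derivative from the Wengert-list example above:

```julia
using Zygote

f(x) = 5 * sin(log(x))

gradient(f, 3.0)                                 # a tuple, one entry per argument
gradient(f, 3.0)[1] ≈ 5 * cos(log(3.0)) / 3.0    # true
```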
## Source to Source AD
Directly transforms one piece of code into another piece of code (here: the code that computes the gradient).
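
A conceptual sketch of what that means (hypothetical names, not Zygote's actual generated code): the original function is rewritten into a version that returns its value together with a pullback that runs the chain rule in reverse.

```julia
# Original code:
f(x) = 5 * sin(log(x))

# Hand-written sketch of the kind of code a source-to-source AD generates:
function f_and_pullback(x)
    y1 = log(x)
    y2 = sin(y1)
    y3 = 5 * y2
    function back(dy3)        # pullback: the Wengert list, differentiated in reverse
        dy2 = 5 * dy3
        dy1 = cos(y1) * dy2
        dx  = dy1 / x
        return dx
    end
    return y3, back
end

y, back = f_and_pullback(3.0)
back(1.0)   # ≈ 5 * cos(log(3)) / 3
```

Unlike the runtime tape in the tracker sketch above, this transformation is done once on the code itself, so the same gradient code can be reused for any input.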
## Differentiate SSA Form
### Julia Compile Process
![](http://blog.rogerluo.me/images/julia-compile-diagram.png)
## SSA Form
- Static Single Assignment Form
- Every variable is assigned exactly once
- Most variables come from function calls
- All control flow becomes branches
https://arxiv.org/pdf/1810.07951.pdf
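
A quick way to see this form (a small sketch): Julia's reflection macros print the IR, where each `%n` value is assigned exactly once and the conditional below becomes explicit goto branches.

```julia
# A tiny function with control flow, to inspect the SSA-style IR described above.
clip(x) = x > 1 ? 1.0 : x

@code_lowered clip(0.3)   # lowered IR: single assignments + goto branches
@code_typed  clip(0.3)    # typed SSA IR after type inference
```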
## Not just unrolling control flow
```julia
function pow(x, n)
    r = 1
    while n > 0
        n -= 1
        r *= x
    end
    return r
end
```
```julia
function grad_pow(x, n)
    r = 1
    Bs = Tuple{Int, Int}[]
    # forward pass: run the loop and record the intermediate values
    while n > 0
        push!(Bs, (r, x))
        r *= x
        n -= 1
    end
    # backward pass: walk the recorded values in reverse, applying the chain rule
    dx = 0
    dr = 1
    for i = length(Bs):-1:1
        (r, x) = Bs[i]
        dx += dr*r
        dr = dr*x
    end
    return dx
end
```
This reduces the number of calls.
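
With Zygote, a hand-written adjoint like `grad_pow` is not needed; `gradient` differentiates through the loop directly. A quick check (sketch):

```julia
using Zygote

# Same pow as above; Zygote differentiates through the while loop directly,
# generating gradient code along the lines of grad_pow rather than unrolling a tape.
function pow(x, n)
    r = 1
    while n > 0
        n -= 1
        r *= x
    end
    return r
end

gradient(x -> pow(x, 3), 2.0)   # (12.0,)  since d/dx x^3 = 3x^2 = 12 at x = 2
```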
## Zygote
- Compiles every Julia function into a differentiable one
- Easy to add gradient hooks (see the sketch below)
- Differentiable Programming!!
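
As a sketch of a gradient hook (assuming Zygote's `hook` utility, which applies a function to a value's gradient during the backward pass without affecting the forward value):

```julia
using Zygote

# Assumed API: Zygote.hook(f, x) leaves x unchanged in the forward pass and
# applies f to x's gradient in the backward pass (here we just print it).
g(x) = 5 * sin(log(Zygote.hook(x̄ -> (println("gradient of x = ", x̄); x̄), x)))

gradient(g, 3.0)
```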
## Differentiable Programming
![](https://fluxml.ai/assets/2019-03-05-dp-vs-rl/trebuchet-flow.png)
## Zygote + PyTorch
![](https://s3.amazonaws.com/media-p.slid.es/uploads/762879/images/6450610/2019-08-17_07-02-37_%E7%9A%84%E8%9E%A2%E5%B9%95%E6%93%B7%E5%9C%96.png)
## Zygote + XLA.jl
XLA: the compilation format used by Google TPUs.
Code can run on Google TPUs without being rewritten in TensorFlow.
https://arxiv.org/pdf/1810.09868.pdf
###### tags: `COSCUP2019` `Julia language` `IB501`