
Doing Automatic Differentiation in the 21st Century? You Need Zygote.jl! - 鄭景文


Slides

Automatic Differentiation (AD)

How AD works

  • Wengert List (Tape / Graph)
  • Derivative Definition
  • Chain Rules

Steps:

  1. Get the Wengert List of the given expression
  2. Transform each instruction in the Wengert List
  3. Apply the chain rule

Wengert List

  • A list of expressions / instructions
  • Transform each expression using the derivative definitions
  • EX: \(f(x) = 5\sin(\log(x))\)
    • Wengert List of \(f\) (see the sketch after this list)
    • \(y_1 = \log(x)\)
    • \(y_2 = \sin(y_1)\)
    • \(y_3 = 5 \times y_2\)
  • EX: \(\frac{d}{dx}f(x)=\frac{5\cos(\log(x))}{x}\)
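
As a minimal sketch (not taken verbatim from the slides), the Wengert list of \(f\) can be written out as ordinary Julia code, one instruction per assignment:

```julia
# Wengert list of f(x) = 5*sin(log(x)): each intermediate value is
# assigned exactly once.
function f(x)
    y1 = log(x)    # instruction 1
    y2 = sin(y1)   # instruction 2
    y3 = 5 * y2    # instruction 3
    return y3
end
```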

Different types of AD

  • Forward mode
  • Reverse mode
  • Mixed mode

Forward mode

  • Dual number (see the sketch after this list)
    • Mathematical analogue: like complex numbers, but with \(\epsilon^2 = 0\)
struct Dual{T<:Real} <: Real
  x::T   # value
  ϵ::T   # derivative (dual) part
end

  • Chain rule multiplication starts from the input: \(\frac{dy_1}{dx} \times \frac{dy_2}{dy_1}\)
  • Computationally efficient for multivariable differentiation with more outputs than inputs
    • Efficient when there are many outputs
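
A minimal sketch of how dual numbers drive forward mode, building on the `Dual` struct above and the `f` defined earlier (an illustrative assumption, not the talk's exact code; ForwardDiff.jl has the full implementation):

```julia
# Derivative rules for the primitives appearing in f (assumed sketch).
Base.log(d::Dual) = Dual(log(d.x), d.ϵ / d.x)             # (log u)' = u'/u
Base.sin(d::Dual) = Dual(sin(d.x), cos(d.x) * d.ϵ)        # (sin u)' = cos(u)·u'
Base.:*(a::Dual, b::Dual) = Dual(a.x * b.x, a.x * b.ϵ + a.ϵ * b.x)  # product rule
Base.:*(a::Real, b::Dual) = Dual(a * b.x, a * b.ϵ)        # constant × dual

# Seed the dual part with 1, run f, and read the derivative back out.
# The D(f, x) appearing in the @code_warntype example below is assumed
# to look roughly like this.
D(f, x) = f(Dual(x, one(x))).ϵ

# D(f, 3) ≈ 5cos(log(3))/3
```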

Reverse mode

More common in modern deep learning: backpropagation does two things

  • reverse mode (computing the gradients)
  • updating the parameters

Start from the point closest to the output.

Reverse mode is more efficient when the number of outputs is much smaller than the number of variables; for example, backpropagation usually has only one output (the loss function) but many variables (the parameters).

  • Tracker
    • Records every operation on variables to a Wengert List
    • Derives the Wengert List w.r.t. the given variable
  • Chain rule multiplication starts from the output: \(\frac{dy}{dy_2} \times \frac{dy_2}{dy_1}\)
  • Computationally efficient for multivariable differentiation with more inputs than outputs (see the sketch after this list)
    • the deep-learning situation
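
A hand-written sketch (for illustration only) of the reverse pass over the same Wengert list, multiplying the chain rule from the output back toward the input:

```julia
# Reverse-mode pass for f(x) = 5*sin(log(x)).
function df_reverse(x)
    # forward pass: evaluate and record the Wengert list
    y1 = log(x)
    y2 = sin(y1)
    y3 = 5 * y2
    # reverse pass: start from the output sensitivity dy3/dy3 = 1
    dy3 = 1.0
    dy2 = dy3 * 5          # dy3/dy2
    dy1 = dy2 * cos(y1)    # dy3/dy1 = dy3/dy2 * dy2/dy1
    dx  = dy1 / x          # dy3/dx  = dy3/dy1 * dy1/dx
    return dx
end

# df_reverse(3.0) ≈ 5cos(log(3))/3, matching the forward-mode result
```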

Where is the Wengert List in Forward mode?

The Wengert list is hidden underneath the dual number arithmetic.

julia> @code_warntype f(3.)
Body::Float64
1 ─ %1 = invoke Main.log(_2::Float64)::Float64
│   %2 = invoke Main.sin(%1::Float64)::Float64
│   %3 = (Base.mul_float)(5.0, %2)::Float64
└──      return %3

julia> @code_warntype D(f, 3)
Body::Float64
1 ─ %1  = (Base.sitofp)(Float64, x)::Float64 # sitofp: signed integer to floating-point conversion
│   %2  = invoke Base.Math.log(%1::Float64)::Float64
│   %3  = (Base.sitofp)(Float64, 1)::Float64
│   %4  = (Base.sitofp)(Float64, x)::Float64
│   %5  = (Base.div_float)(%3, %4)::Float64
│   %6  = invoke Main.sin(%2::Float64)::Float64
│   %7  = invoke Main.cos(%2::Float64)::Float64
│   %8  = (Base.mul_float)(%5, %7)::Float64
│   %9  = (Base.mul_float)(%6, 0.0)::Float64
│   %10 = (Base.mul_float)(5.0, %8)::Float64
│   %11 = (Base.add_float)(%9, %10)::Float64
└──       return %11


Why Julia

  • Fast
  • Really good compiler design

Zygote.jl

  • Source-to-source AD
  • Supports control flow, recursion, closures, structs, dictionaries, …

Installation: `import Pkg; Pkg.add("Zygote")`
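
A quick usage sketch: Zygote's `gradient` differentiates the same `f` directly.

```julia
using Zygote

f(x) = 5 * sin(log(x))

# gradient returns a tuple with one entry per argument
gradient(f, 3.0)   # (5cos(log(3))/3,)
```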

Source to Source AD

Directly transforms one piece of code into another piece of code (the code that computes the gradient).

Differentiate SSA Form

Julia Compilation Process

SSA Form

  • Static Single Assignment form
  • Every variable is assigned exactly once
  • Most variables come from function calls
  • All control flow becomes branches (see the sketch below)

https://arxiv.org/pdf/1810.07951.pdf
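
You can inspect this with Julia's reflection macros (a usage sketch with a hypothetical toy function):

```julia
# A toy function with control flow.
relu(x) = x > 0 ? x : zero(x)

# Prints the typed SSA IR: values %1, %2, ... are each assigned once,
# and the `if` has become explicit goto branches.
@code_typed relu(1.0)
```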

Not just unrolling control flow

function pow(x, n)
    r = 1
    while n > 0
        n -= 1
        r *= x
    end
    return r
end

# The corresponding gradient code: the forward pass records the intermediate
# values on a stack (Bs), and the reverse pass walks the recorded iterations
# backwards, applying the chain rule.
function grad_pow(x, n)
    r = 1
    Bs = Tuple{Int, Int}[]
    while n > 0
        push!(Bs, (r, x))   # record values needed by the reverse pass
        r *= x
        n -= 1
    end
    dx = 0
    dr = 1
    for i = length(Bs):-1:1   # replay the loop in reverse
        (r, x) = Bs[i]
        dx += dr*r
        dr = dr*x
    end
    return dx
end

This reduces the number of calls.
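
As a quick check (usage sketch), Zygote differentiates `pow`, `while` loop and all, and agrees with the hand-written `grad_pow`:

```julia
using Zygote

# d/dx x^3 at x = 2 is 3 * 2^2 = 12
gradient(x -> pow(x, 3), 2.0)   # (12.0,)
grad_pow(2, 3)                  # 12
```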

Zygote

  • Compiles every Julia function into a differentiable one
  • Easy to add gradient hooks (see the sketch after this list)
  • Differentiable Programming!!
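
A small sketch of a gradient hook, assuming Zygote's `Zygote.hook` utility: the hook function is applied to the gradient flowing back through `x`, e.g. to inspect or rescale it.

```julia
using Zygote

# Zygote.hook applies the given function to the gradient of x during the
# backward pass; here it just prints the incoming gradient.
g, = gradient(3.0) do x
    Zygote.hook(dx -> (println("grad of x = ", dx); dx), x)^2
end
# prints "grad of x = 6.0"; g == 6.0
```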

Differentiable Programming

Zygote + PyTorch

Zygote + XLA.jl

XLA: Google's compiler format for TPUs.
Code can run on Google TPUs without being rewritten in TensorFlow.
https://arxiv.org/pdf/1810.09868.pdf

tags: COSCUP2019 Julia language IB501