
Doing Automatic Differentiation in the 21st Century? You Need Zygote.jl! - 鄭景文


Slides

Automatic Differentiation (AD)

How AD works

  • Wengert List (Tape / Graph)
  • Derivative Definition
  • Chain Rules

Steps:

  1. Get the Wengert List of the given expression
  2. Transform each instruction in the Wengert List
  3. Apply the chain rule

Wengert List

  • A list of expressions / instructions
  • Transform each expression using the derivative definitions
  • EX: \(f(x) = 5\sin(\log(x))\)
    • Wengert List of \(f\) (see the sketch after this list)
    • \(y_1 = \log(x)\)
    • \(y_2 = \sin(y_1)\)
    • \(y_3 = 5 \times y_2\)
  • EX: \(\frac{d}{dx}f(x)=\frac{5\cos(\log(x))}{x}\)
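
As a minimal sketch (not taken verbatim from the slides), the Wengert list of \(f\) can be written out as ordinary Julia code, one instruction per assignment:

```julia
# Wengert list of f(x) = 5*sin(log(x)): each intermediate value is
# assigned exactly once.
function f(x)
    y1 = log(x)    # instruction 1
    y2 = sin(y1)   # instruction 2
    y3 = 5 * y2    # instruction 3
    return y3
end
```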

Different types of AD

  • Forward mode
  • Reverse mode
  • Mixed mode

Forward mode

  • Dual number (see the sketch after this list)
    • Mathematical analogue: like complex numbers, but with \(\epsilon^2 = 0\)
struct Dual{T<:Real} <: Real
  x::T   # value
  ϵ::T   # derivative (dual) part
end

  • Chain rule multiplication starts from the input: \(\frac{dy_1}{dx} \times \frac{dy_2}{dy_1}\)
  • Computationally efficient for multivariable differentiation with more outputs than inputs
    • Efficient when there are many outputs
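
A minimal sketch of how dual numbers drive forward mode, building on the `Dual` struct above and the `f` defined earlier (an illustrative assumption, not the talk's exact code; ForwardDiff.jl has the full implementation):

```julia
# Derivative rules for the primitives appearing in f (assumed sketch).
Base.log(d::Dual) = Dual(log(d.x), d.ϵ / d.x)             # (log u)' = u'/u
Base.sin(d::Dual) = Dual(sin(d.x), cos(d.x) * d.ϵ)        # (sin u)' = cos(u)·u'
Base.:*(a::Dual, b::Dual) = Dual(a.x * b.x, a.x * b.ϵ + a.ϵ * b.x)  # product rule
Base.:*(a::Real, b::Dual) = Dual(a * b.x, a * b.ϵ)        # constant × dual

# Seed the dual part with 1, run f, and read the derivative back out.
# The D(f, x) appearing in the @code_warntype example below is assumed
# to look roughly like this.
D(f, x) = f(Dual(x, one(x))).ϵ

# D(f, 3) ≈ 5cos(log(3))/3
```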

Reverse mode

More common in modern deep learning: backpropagation does two things

  • reverse mode (computing the gradients)
  • updating the parameters

Start from the point closest to the output.

Reverse mode is more efficient when the number of outputs is much smaller than the number of variables; for example, backpropagation usually has only one output (the loss function) but many variables (the parameters).

  • Tracker
    • Records every operation on variables to a Wengert List
    • Derives the Wengert List w.r.t. the given variable
  • Chain rule multiplication starts from the output: \(\frac{dy}{dy_2} \times \frac{dy_2}{dy_1}\)
  • Computationally efficient for multivariable differentiation with more inputs than outputs (see the sketch after this list)
    • the deep-learning situation
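
A hand-written sketch (for illustration only) of the reverse pass over the same Wengert list, multiplying the chain rule from the output back toward the input:

```julia
# Reverse-mode pass for f(x) = 5*sin(log(x)).
function df_reverse(x)
    # forward pass: evaluate and record the Wengert list
    y1 = log(x)
    y2 = sin(y1)
    y3 = 5 * y2
    # reverse pass: start from the output sensitivity dy3/dy3 = 1
    dy3 = 1.0
    dy2 = dy3 * 5          # dy3/dy2
    dy1 = dy2 * cos(y1)    # dy3/dy1 = dy3/dy2 * dy2/dy1
    dx  = dy1 / x          # dy3/dx  = dy3/dy1 * dy1/dx
    return dx
end

# df_reverse(3.0) ≈ 5cos(log(3))/3, matching the forward-mode result
```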

Where is the Wengert List in Forward mode?

The Wengert list is hidden underneath the dual number arithmetic.

julia> @code_warntype f(3.)
Body::Float64
1 ─ %1 = invoke Main.log(_2::Float64)::Float64
│   %2 = invoke Main.sin(%1::Float64)::Float64
│   %3 = (Base.mul_float)(5.0, %2)::Float64
└──      return %3

julia> @code_warntype D(f, 3)
Body::Float64
1 ─ %1  = (Base.sitofp)(Float64, x)::Float64 # sitofp: signed integer to floating-point conversion
│   %2  = invoke Base.Math.log(%1::Float64)::Float64
│   %3  = (Base.sitofp)(Float64, 1)::Float64
│   %4  = (Base.sitofp)(Float64, x)::Float64
│   %5  = (Base.div_float)(%3, %4)::Float64
│   %6  = invoke Main.sin(%2::Float64)::Float64
│   %7  = invoke Main.cos(%2::Float64)::Float64
│   %8  = (Base.mul_float)(%5, %7)::Float64
│   %9  = (Base.mul_float)(%6, 0.0)::Float64
│   %10 = (Base.mul_float)(5.0, %8)::Float64
│   %11 = (Base.add_float)(%9, %10)::Float64
└──       return %11


Why Julia

  • Fast
  • Really good compiler design

Zygote.jl

  • Source-to-source AD
  • Supports control flow, recursion, closures, structs, dictionaries, …

Installation: `import Pkg; Pkg.add("Zygote")`
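
A quick usage sketch: Zygote's `gradient` differentiates the same `f` directly.

```julia
using Zygote

f(x) = 5 * sin(log(x))

# gradient returns a tuple with one entry per argument
gradient(f, 3.0)   # (5cos(log(3))/3,)
```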

Source to Source AD

Directly transforms one piece of code into another piece of code (the code that computes the gradient).

Differentiate SSA Form

Julia Compilation Process

SSA Form

  • Static Single Assignment form
  • Every variable is assigned exactly once
  • Most variables come from function calls
  • All control flow becomes branches (see the sketch below)

https://arxiv.org/pdf/1810.07951.pdf
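
You can inspect this with Julia's reflection macros (a usage sketch with a hypothetical toy function):

```julia
# A toy function with control flow.
relu(x) = x > 0 ? x : zero(x)

# Prints the typed SSA IR: values %1, %2, ... are each assigned once,
# and the `if` has become explicit goto branches.
@code_typed relu(1.0)
```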

Not just unrolling control flow

function pow(x, n)
    r = 1
    while n > 0
        n -= 1
        r *= x
    end
    return r
end

# The corresponding gradient code: the forward pass records the intermediate
# values on a stack (Bs), and the reverse pass walks the recorded iterations
# backwards, applying the chain rule.
function grad_pow(x, n)
    r = 1
    Bs = Tuple{Int, Int}[]
    while n > 0
        push!(Bs, (r, x))   # record values needed by the reverse pass
        r *= x
        n -= 1
    end
    dx = 0
    dr = 1
    for i = length(Bs):-1:1   # replay the loop in reverse
        (r, x) = Bs[i]
        dx += dr*r
        dr = dr*x
    end
    return dx
end

This reduces the number of calls.
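
As a quick check (usage sketch), Zygote differentiates `pow`, `while` loop and all, and agrees with the hand-written `grad_pow`:

```julia
using Zygote

# d/dx x^3 at x = 2 is 3 * 2^2 = 12
gradient(x -> pow(x, 3), 2.0)   # (12.0,)
grad_pow(2, 3)                  # 12
```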

Zygote

  • Compiles every Julia function into a differentiable one
  • Easy to add gradient hooks (see the sketch after this list)
  • Differentiable Programming!!
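
A small sketch of a gradient hook, assuming Zygote's `Zygote.hook` utility: the hook function is applied to the gradient flowing back through `x`, e.g. to inspect or rescale it.

```julia
using Zygote

# Zygote.hook applies the given function to the gradient of x during the
# backward pass; here it just prints the incoming gradient.
g, = gradient(3.0) do x
    Zygote.hook(dx -> (println("grad of x = ", dx); dx), x)^2
end
# prints "grad of x = 6.0"; g == 6.0
```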

Differentiable Programming

Zygote + PyTorch

Zygote + XLA.jl

XLA: Google's compiler format for TPUs.
Code can run on Google TPUs without being rewritten in TensorFlow.
https://arxiv.org/pdf/1810.09868.pdf

tags: COSCUP2019 Julia language IB501