OpenAI Gym with Julia

OpenAI Gym with Julia
===

2018/06/30 Machine Learning Nagoya, 16th Study Session

antimon2 (後藤俊介)

<aside class="notes">This is the sponsor-slot(?) lightning-talk deck!</aside>

----

## Agenda

+ Self-introduction
+ Introducing Julia
+ OpenAI Gym with Julia

---

# Self-introduction

----

## Self-introduction

+ Name: 後藤俊介
+ Affiliation: **[有限会社来栖川電算](https://www.kurusugawa.jp)**
+ Communities: **[Machine Learning Nagoya](https://machine-learning.connpass.com/)**, NGK2017B, [Python Tokai](https://connpass.com/series/292/), Ruby Tokai, Rails Girls Nagoya (coach), …
+ Languages: **[Julia](https://julialang.org)**, Python, Scala (learning), Ruby, …
+ ![Twitter](https://i.imgur.com/HqouMIg.png) [@antimon2](https://twitter.com/antimon2) / ![Facebook](https://i.imgur.com/01nPd37.png) [antimon2](https://www.facebook.com/antimon2)
+ ![Github](https://i.imgur.com/yBKtii5.png) [antimon2](https://github.com/antimon2/) / ![Qiita](https://i.imgur.com/FxHMi64.png) [@antimon2](http://qiita.com/antimon2)

<aside class="notes">Today I'm talking about Julia for the first time in a while!</aside>

----

[![有限会社来栖川電算](https://i.imgur.com/8Kuhfel.png) https://www.kurusugawa.jp](https://www.kurusugawa.jp)

<aside class="notes">The sponsor slot!</aside>

----

[![Python Tokai 36th Study Session](https://i.imgur.com/HxA1AMu.png) https://connpass.com/event/88249/](https://connpass.com/event/88249/)

<aside class="notes">I'm scheduled to give a talk there on the theme "Machine Learning and Python"!</aside>

---

# Introducing Julia

<aside class="notes">Julia is great, Julia!</aside>

----

[![Julia](https://upload.wikimedia.org/wikipedia/commons/6/69/Julia_prog_language.svg)](https://julialang.org)

----

## What is Julia? (1)

+ [The Julia Language](https://julialang.org)
+ Latest: v0.6.3 (2018/05/29) / v0.7.0-beta (2018/06/24)
+ v1.0 is coming soon!
+ Strong at scientific and technical computing!
+ Fast! (LLVM JIT compilation)

<aside class="notes">When googling, search for <a href="https://www.google.co.jp/search?q=julialang">julialang</a>!</aside>

----

## What is Julia? (2)

> + Not messy inside like R,
> + not slow like Ruby,
> + not primitive or elephantine like Lisp,
> + not bizarre like Prolog,
> + not too rigid like Java,
> + and not too abstract like Haskell:
>
> a language that is just right

Source: http://www.slideshare.net/Nikoriks/julia-28059489/8

----

## What is Julia? (3)

> + As fast as C, yet dynamically typed like Ruby
> + Has macros that are handled on an equal footing with programs, like Lisp, and also allows intuitive mathematical notation like Matlab
> + Good at general-purpose programming like Python, at statistics like R, at string processing like Perl, at linear algebra like Matlab, and at combining multiple programs like the shell
> + Easy to learn for complete beginners, yet satisfying for the most advanced experts
> + Works interactively and can also be compiled

(Excerpted and paraphrased from [Why We Created Julia](http://julialang.org/blog/2012/02/why-we-created-julia))

<aside class="notes">In short: a language that picks the best parts of many languages!</aside>

----

## Main features

+ [Multiple dispatch](https://ja.wikipedia.org/wiki/%E5%A4%9A%E9%87%8D%E3%83%87%E3%82%A3%E3%82%B9%E3%83%91%E3%83%83%E3%83%81)
+ Dynamic type system
+ [Concurrent and parallel processing](https://docs.julialang.org/en/stable/manual/parallel-computing/), coroutines
+ Built-in package manager

<aside class="notes">The built-in package manager got a major redesign in v0.7, but that's a story for another time!</aside>

----

## Syntax and functions

<aside class="notes">From here on it's copy-and-paste from past slides. Feel free to skip ahead!</aside>

----

### Basic operations

```julia
julia> 1 + 2 - 3 * 4  # arithmetic (other than division)
-9

julia> 7 / 5  # `integer / integer` yields a floating-point number
1.4

julia> 7 ÷ 5  # `integer ÷ integer` yields an integer
1

julia> 2 ^ 10  # exponentiation is `^`
1024

julia> 123 & 234 | 345  # bitwise AND / bitwise OR
379

julia> 123 ⊻ 234  # bitwise XOR (== `xor(123, 234)`)
145
```

<aside class="notes">Dividing two integers gives a floating-point number! There is a separate integer-division operator <code>÷</code> (the counterpart of Python's <code>//</code>)! Exponentiation is <code>^</code> (not <code>**</code>)! <code>⊻</code> can be typed as <code>\veebar</code>+<kbd>Tab</kbd>! The <code>÷</code> above is <code>\div</code>+<kbd>Tab</kbd> (basically $TeX$ notation)!</aside>

----

### Arrays

```julia
julia> a = [1, 2, 3, 4, 5]
5-element Array{Int64,1}:
 1
 2
 3
 4
 5

julia> a[1]  # Julia is 1-origin
1

julia> println(a[2:3])  # ranges include both ends
[2, 3]
```

<aside class="notes">As long as you remember it's 1-origin, it's an ordinary array! <code>a:b</code> is the range (<code>Range</code>) notation. It includes both ends (same as Ruby's <code>a..b</code>)!</aside>

----

### Array comprehensions (1)

```julia
julia> a = [n^2 for n=1:5]
5-element Array{Int64,1}:
  1
  4
  9
 16
 25

julia> A = [x+10y for y=1:3, x=1:3]
3×3 Array{Int64,2}:
 11  12  13
 21  22  23
 31  32  33
```

<aside class="notes">The comprehension syntax resembles Python's! In addition, passing several comma-separated iterators to <code>for</code> creates arrays of two or more dimensions!</aside>

----

### Array comprehensions (2)

```julia
julia> [(a,b,c) for c=1:15,b=1:15,a=1:15 if a^2+a*b+b^2==c^2]
6-element Array{Tuple{Int64,Int64,Int64},1}:
 (3, 5, 7)
 (5, 3, 7)
 (6, 10, 14)
 (7, 8, 13)
 (8, 7, 13)
 (10, 6, 14)
```

<aside class="notes">As in Python, you can add a condition with <code>if</code>! Also as in Python, writing <code>(○ for ○=○)</code> instead of <code>[○ for ○=○]</code> returns a <code>Generator</code> rather than an array!</aside>

----

### Vectors

```julia
julia> x = [1., 2., 3.];

julia> y = [3., 1., 2.];

julia> x + y  # same as `x .+ y` (elementwise operation)
[4., 3., 5.]

julia> x .* y  # writing `x * y` here is an error
[3., 2., 6.]

julia> x ⋅ y  # inner product (dot product; same as `dot(x, y)`)
11.0

julia> x × y  # cross product (same as `cross(x, y)`)
[1., 7., -5.]
```

<aside class="notes">In Julia, a one-dimensional array is in fact treated as a vector! <code>⋅</code> is <code>\cdot</code>+<kbd>Tab</kbd>, <code>×</code> is <code>\times</code>+<kbd>Tab</kbd>! Like these and the earlier <code>÷</code> and <code>⊻</code>, Julia has many operators written with Unicode characters beyond ASCII (most of them derived from $TeX$). Other examples include the comparison operators <code>≤</code> <code>≥</code>, set membership <code>∈</code>, and the inclusion relation <code>⊆</code>.</aside>

----

### Matrices

```julia
julia> A = [1 2; 3 4]  # this notation comes from MATLAB/Octave
2×2 Array{Int64,2}:
 1  2
 3  4

julia> A.'  # `○.'` is the transpose notation (also from MATLAB/Octave)
2×2 Array{Int64,2}:
 1  3
 2  4
```

<aside class="notes">In Julia, a two-dimensional array is treated as a matrix!</aside>

----

### Matrix operations

```julia
julia> A = [1 2; 3 4]; B = [3 0; 0 6];

julia> A + B  # same as A .+ B
2×2 Array{Int64,2}:
 4   2
 3  10

julia> A * B  # matrix multiply
2×2 Array{Int64,2}:
 3  12
 9  24

julia> A .* B  # elementwise multiply
2×2 Array{Int64,2}:
 3   0
 0  24
```

<aside class="notes">For matrices, <code>*</code> is the ordinary matrix product. Handy!</aside>

----

### Broadcasting

```julia
julia> sin(0.1)
0.09983341664682815

julia> sin.([0.1, 0.2, 0.3, 0.4])
4-element Array{Float64,1}:
 0.0998334
 0.198669
 0.29552
 0.389418

julia> [0.1, 0.2, 0.3, 0.4] .^ 2
# => [0.01, 0.04, 0.09, 0.16]
```

<aside class="notes">Putting a <code>.</code> between the function name and <code>(</code> extends an ordinary function to arrays (broadcasting)! Writing <code>.</code> before an operator, as in <code>.^</code>, does the same (the <code>.+</code> and <code>.*</code> seen earlier are also broadcasting)!</aside>

----

### Function definitions

```julia
julia> f(x) = x^2 + 2x - 1
f (generic function with 1 method)

julia> f(1)
2

julia> f.(1:5)
# => [2, 7, 14, 23, 34]
```

<aside class="notes">Functions can be defined with intuitive, math-like notation! <code>2x</code> is shorthand for <code>2*x</code>: when there is no ambiguity, for example a literal followed by another identifier, Julia interprets it as multiplication! User-defined functions also support broadcasting automatically when you add <code>.</code>!</aside>

----

### Rational and complex numbers

```julia
julia> 1//2 == 0.5
true

julia> 1//2 - 1//3
1//6

julia> 1im ^ 2 == -1
true

julia> (1.0 + 0.5im) * (2.0 - 3.0im)
3.5 - 2.0im
```

<aside class="notes">Rational and complex numbers are supported out of the box. <code>//</code> is rational division (the result is a rational number) and <code>im</code> is the imaginary unit. Arithmetic on both works the usual way!</aside>

---

# OpenAI Gym with Julia

----

## OpenAI Gym

+ [OpenAI Gym](https://gym.openai.com/)
+ [OpenAI Gym (GitHub)](https://github.com/openai/gym)
+ A toolkit for reinforcement learning
+ Provides a unified interface (in Python)
+ Ships with several simulation environments (games, etc.)

<aside class="notes">A recap for those who attended today's study session!</aside>

----

## Gym.jl

+ [Gym.jl](https://github.com/ozanarkancan/Gym.jl)
+ [fork by antimon2](https://github.com/antimon2/Gym.jl)
+ A Julia wrapper for OpenAI Gym
+ Exposes the basic interface

<aside class="notes">The original Gym.jl isn't maintained much and had bugs, so I forked it!</aside>

----

### Installing Gym.jl

```julia
julia> VERSION
v"0.6.3"

julia> Pkg.clone("https://github.com/antimon2/Gym.jl.git")

julia> Pkg.checkout("Gym", "mln_ngy")

julia> Pkg.build("Gym")
```

<aside class="notes">Depending on your environment, the PyCall build may fail. In that case, use my fork of that one too!</aside>

----

### Checking that it works

```julia=
using Gym

env = GymEnv("CartPole-v0")

action_space = env.action_space
@show action_space.n  # => 2
obs_space = env.observation_space
@show obs_space.shape  # => (4,)
```

<aside class="notes">Behind the scenes it uses <code>PyCall</code> to obtain Gym's <code>Env</code>!</aside>

----

```julia=13
episode_count = 10
for i=1:episode_count
    total = 0
    ob = reset!(env)
    render(env)
    while true
        action = sample(env.action_space)
        ob, reward, done, information = step!(env, action)
        total += reward
        render(env)
        done && break
    end
    println("episode $i total Rewards: $total")
end
```

<aside class="notes">In this code, <code>reset!</code>, <code>render</code> and <code>step!</code> are the calls into the Gym API!</aside>

----

![Example display](https://i.imgur.com/tIFDr8R.png)

----

## Training (DQN)

----

### DQN in Julia

+ [Reinforce.jl](https://github.com/JuliaML/Reinforce.jl) (no longer updated?)
+ **[Knet.jl](https://github.com/denizyuret/Knet.jl)** (a DL framework; includes a sample that uses Gym.jl)
+ Use a Python framework (e.g. [ChainerRL](https://github.com/chainer/chainerrl)) via [PyCall](https://github.com/JuliaPy/PyCall.jl)

<aside class="notes">This time I went with Knet.jl!</aside>

----

### Knet.jl usage example

```julia=
using Knet

function predict(w, x)
    y = w[1] * x .+ w[2]
    return y
end

loss(w, x, y) = mean(abs2, y - predict(w, x))

lossgradient = grad(loss)

function train(model, data, optim)
    for (x, y) in data
        grads = lossgradient(model, x, y)
        update!(model, grads, optim)
    end
end
```

<aside class="notes">Almost plain Julia code (the API is just <code>grad</code> and <code>update!</code>) for quick and easy Deep Learning!</aside>

----

### DQN with Knet

----

#### mlp.jl (excerpt)

```julia=10
function predict_q(w, x; nh::Int=1)
    inp = x
    for i=1:nh
        inp = relu.(w["w_$i"] * inp .+ w["b_$i"])
    end
    q = w["w_out"] * inp .+ w["b_out"]
    return q
end
```

----

#### piecewise_schedule.jl (excerpt)

```julia=4
struct PiecewiseSchedule
    endpoints::Vector{Tuple{Int, Float64}}
end

function value(sch::PiecewiseSchedule, t::Int)
    for ((l_t, l), (r_t, r)) in zip(sch.endpoints[1:end-1], sch.endpoints[2:end])
        if l_t <= t < r_t
            α = (t - l_t) / (r_t - l_t)
            return l + α * (r - l)
        end
    end
    return sch.endpoints[end][2]
end
```

<aside class="notes">This computes the ε value used by the ε-greedy method!</aside>

----

#### replay_buffer.jl (excerpt) (1)

```julia=10
mutable struct ReplayBuffer
    size::Int
    storage::Vector{Any}
    next_idx::Int
end

ReplayBuffer(size::Int) = ReplayBuffer(size, Any[], 1)

length(buf::ReplayBuffer) = length(buf.storage)
```

<aside class="notes">The ReplayBuffer!</aside>

----

#### replay_buffer.jl (excerpt) (2)

```julia=18
function push!(buf::ReplayBuffer, obs_t, action, reward, obs_tp1, done)
    data = (obs_t, action, reward, obs_tp1, done)
    if buf.next_idx > length(buf)
        push!(buf.storage, data)
    else
        buf.storage[buf.next_idx] = data
    end
    buf.next_idx = mod1(buf.next_idx + 1, buf.size)
end
```

<aside class="notes">Grows automatically up to <code>buf.size</code>; beyond that, the oldest entries are overwritten cyclically!</aside>

----

#### replay_buffer.jl (excerpt) (3)

```julia=31
function sample_batch(buf::ReplayBuffer, batchsize::Int; stack::Int=1)
    idxes = randperm(length(buf))[1:batchsize]
    return encode_sample(buf, idxes; stack=stack)
end
# : (rest omitted)
```

<aside class="notes"><code>encode_sample</code> and everything after it is long, so it's omitted!</aside>

----

#### dqn.jl (excerpt) (1)

```julia=12
function loss(w, states, actions, targets; nh=1)
    qvals = predict_q(w, states; nh=nh)
    nrows = size(qvals, 1)
    index = actions .+ nrows .* (0:length(actions)-1)
    qpred = reshape(qvals[index], size(targets))
    mse = sum(abs2, targets .- qpred) / size(states, 2)
    return mse
end

lossgradient = gradloss(loss)
```

----

#### dqn.jl (excerpt) (2)

```julia=23
function train!(w, prms, states, actions, targets; nh=1)
    g, mse = lossgradient(w, states, actions, targets; nh=nh)
    update!(w, g, prms)
    return mse
end
```

<aside class="notes">Up to here it's almost identical to the earlier Knet.jl sample!</aside>

----

#### dqn.jl (excerpt) (3)

```julia=29
function dqn_learn(w, opts, env, buffer,
                   exploration, o)
    total = 0.0
    readytosave = save_interval = get(o, "save_interval", 10000)
    episode_rewards = Float32[]
    frames = Float32[]
    ob_t = reset!(env)
    n_hiddens = get(o, "n_hiddens", length(o["hiddens"]))::Int
    target_w = o["play"] || o["tupdate"] <= 1 ? w : deepcopy(w)
```

<aside class="notes">DQN (setup part)!</aside>

----

```julia=39
    for fnum = 1:o["frames"]
        o["render"] && render(env)
        ob_t_reshaped = reshape(ob_t, size(ob_t)..., 1)
        if !o["play"] && rand() < value(exploration, fnum)
            a = sample(env.action_space)
        else
            obses_t = encode_recent(buffer, ob_t_reshaped; stack=o["stack"])
            inp = convert(o["atype"], obses_t)
            qvals = predict_q(w, inp; nh=n_hiddens)
            a = indmax(Array(qvals)) - 1
        end

        ob_t, reward, done, _ = step!(env, a)
        total += reward
```

<aside class="notes">DQN (1): ε-greedy!</aside>

----

```julia=54
        if !o["play"]
            #process the raw ob
            ob_tp1_reshaped = reshape(ob_t, size(ob_t)..., 1)
            push!(buffer, ob_t_reshaped, a + 1, reward, ob_tp1_reshaped, done)

            if can_sample(buffer, o["bs"])
                obses_t, actions, rewards, obses_tp1, dones = sample_batch(buffer, o["bs"]; stack=o["stack"])
                obses_tp1 = convert(o["atype"], obses_tp1)
                #predict next q values with the target network
                nextq = Array(predict_q(target_w, obses_tp1; nh=n_hiddens))
                maxs = maximum(nextq, 1)
                nextmax = sum(nextq .* (nextq .== maxs), 1)
                targets = reshape(rewards, 1, :) .+ (o["gamma"] .* nextmax .* dones)
                mse = train!(w, opts, convert(o["atype"], obses_t), actions, convert(o["atype"], targets); nh=n_hiddens)
            end
```

<aside class="notes">DQN (2): Replay Buffer & Training!</aside>

----

```julia=76
            if o["tupdate"] > 1 && fnum % o["tupdate"] == 0
                target_w = deepcopy(w)
            end
        end
```

<aside class="notes">DQN (3): target update!</aside>

----

```julia=81
        if done
            ob_t = reset!(env)
            # (status output etc. omitted)
            push!(episode_rewards, total)
            push!(frames, fnum)
            total = 0.0
        end
    end
```

<aside class="notes">DQN (4): end-of-episode handling!</aside>

----

#### cartpole_train.jl (excerpt) (1)

```julia=
using Gym
using Knet
using ArgParse

include(joinpath(@__DIR__, "dqn",
                 "dqn.jl"))
```

<aside class="notes">Main (1): loading packages and modules!</aside>

----

#### cartpole_train.jl (excerpt) (2)

```julia=8
main() = main(ARGS)
main(args::String) = main(split(args))
function main(args::Vector{<:AbstractString})
    s = ArgParseSettings()
    # : (command-line argument handling omitted)
    o = parse_args(args, s)
    o["atype"] = Array{Float32}
```

<aside class="notes">Main (2): command-line argument handling (drastically abridged)!</aside>

----

```julia=16
    # train
    env = GymEnv(o["env_id"])
    INPUT = env.observation_space.shape[1] * o["stack"]
    OUTPUT = env.action_space.n
    w = DQN.init_weights(INPUT, o["hiddens"], OUTPUT, o["atype"])
    o["n_hiddens"] = length(o["hiddens"])
    opts = Dict(k => Rmsprop(lr=o["lr"]) for k in keys(w))
    buffer = DQN.ReplayBuffer(o["memory"])
    exploration = DQN.PiecewiseSchedule([(0, 1.0), (round(Int, o["frames"]/5), 0.1)])
    rewards, frames = DQN.dqn_learn(w, opts, env, buffer, exploration, o)
    close!(env)
end
```

<aside class="notes">Main (3): training!</aside>

----

#### cartpole_train.jl (excerpt) (3)

```julia=32
if abspath(PROGRAM_FILE) == @__FILE__
    main()
end
```

<aside class="notes">Main (4): entry point! This is the same idiom as Python's <code>if __name__ == '__main__': ...</code> or Ruby's <code>if $0 == __FILE__ ... end</code> (especially the latter)!</aside>

----

![Example run](https://i.imgur.com/oU8mX4B.jpg)

----

#### Commentary

+ [Full source code (GitHub)](https://github.com/antimon2/JuliaGymDemo)
+ The model can be saved and loaded, either for additional training or just to watch it move (for verification)
+ Specifying `--env_id` lets you train and verify on other environments too (if supported)
+ CNN support is in progress (separate branch)

---

## Reference links

----

### OpenAI Gym

+ [OpenAI Gym](https://gym.openai.com/)
+ [OpenAI Gym (GitHub)](https://github.com/openai/gym)
+ [Table of environments](https://github.com/openai/gym/wiki/Table-of-environments)
+ [Gym.jl (original)](https://github.com/ozanarkancan/Gym.jl)

----

### Knet.jl

+ [Knet.jl](https://github.com/denizyuret/Knet.jl)
+ [Juliaで機械学習：深層学習フレームワークKnet.jlを使ってみる - Qiita](https://qiita.com/cometscome_phys/items/f09e801bc5b3f57f6350)

----

### Other reinforcement learning resources

+ [ゼロからDeepまで学ぶ強化学習 - Qiita](https://qiita.com/icoxfog417/items/242439ecd1a477ece312)
+ [倒立振子で学ぶ DQN (Deep Q
Network) - Qiita](https://qiita.com/ashitani/items/bb393e24c20e83e54577)
+ [DQNをKerasとTensorFlowとOpenAI Gymで実装する](http://elix-tech.github.io/ja/2016/06/29/dqn-ja.html)

---

Thank you for your attention.
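
----

### Appendix: ε-greedy in isolation (sketch)

The exploration part of the DQN loop can be tried without Gym or Knet. The following is a minimal sketch, not the repository code: `LinearDecay`, `epsilon_at` and `epsilon_greedy` are hypothetical names, the 10,000-frame run length is an assumed value, and the code is written for Julia 1.x, where `argmax` replaces the `indmax` used in the v0.6 code of this deck.

```julia
# Hypothetical names throughout; an illustration only, not the repository code.
struct LinearDecay
    endpoints::Vector{Tuple{Int,Float64}}  # (frame, ε) pairs, sorted by frame
end

# Linearly interpolate ε between neighbouring endpoints;
# after the last endpoint, hold its ε value.
function epsilon_at(sch::LinearDecay, t::Int)
    for ((l_t, l), (r_t, r)) in zip(sch.endpoints[1:end-1], sch.endpoints[2:end])
        if l_t <= t < r_t
            α = (t - l_t) / (r_t - l_t)
            return l + α * (r - l)
        end
    end
    return sch.endpoints[end][2]
end

# With probability ε pick a random action index, otherwise the greedy one.
# Indices are 1-based; Gym actions are 0-based, so a caller would subtract 1.
function epsilon_greedy(qvals::AbstractVector{<:Real}, ε::Real)
    return rand() < ε ? rand(1:length(qvals)) : argmax(qvals)
end

# Assumed run of 10_000 frames: ε decays from 1.0 to 0.1 over the first fifth.
sch = LinearDecay([(0, 1.0), (2_000, 0.1)])
for t in (0, 1_000, 2_000, 5_000)
    println("frame $t: ε = ", epsilon_at(sch, t))
end
println("greedy action index: ", epsilon_greedy([0.1, 0.9, 0.3], 0.0))
```

Keeping the schedule separate from the action choice mirrors the split between `piecewise_schedule.jl` and `dqn.jl`: the schedule depends only on the frame number, so it can be checked on its own before any training run.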