ML-Agents Reading Notes
===

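GitHub: https://github.com/Unity-Technologies/ml-agents

Documentation (Docs): [Unity ML-Agents Toolkit Documentation](https://github.com/Unity-Technologies/ml-agents/blob/main/docs/Readme.md)
(I don't recommend the translated docs; they are badly out of date. Reading the English docs through Google Translate works well.)

Unity Manual: https://docs.unity3d.com/Packages/com.unity.ml-agents@2.1/manual/index.html

Installation & Set-up
---

### [Installation](https://github.com/Unity-Technologies/ml-agents/blob/main/docs/Installation.md)
- If you want to modify the package or use the latest examples, install via `git clone`
- What needs to be installed:
  - Unity: com.unity.ml-agents
  - Python:
    - mlagents: the built-in trainers, built on top of mlagents_envs
      - Ships with the `mlagents-learn` command-line tool
      - mlagents.trainers contains
        - demo_loader
          - demo_to_buffer
        - buffer
          - AgentBuffer
    - mlagents_envs: the most basic package, a lower-level layer that communicates with Unity
    - gym_unity (optional): OpenAI Gym interface

#### [Using Virtual Environment](https://github.com/Unity-Technologies/ml-agents/blob/main/docs/Using-Virtual-Environment.md)
- A Python venv tutorial; skip it if you already know venv
- My take: poetry or conda are worth trying, or just use Colab

Getting Started
---

### [Getting Started Guide](https://github.com/Unity-Technologies/ml-agents/blob/main/docs/Getting-Started.md)
- Uses 3D Balance Ball as the walkthrough example
  - Agent
    - One environment (scene) can contain several of them
  - Behavior Parameters: what decisions are based on
    - The vector of observations
    - Actions (continuous or discrete)
  - Max Step: the maximum number of steps before the Agent resets and starts over
- Walks through the training steps and options, and how to load a trained model
- A good starting point for getting familiar with the ML-Agents workflow

### [ML-Agents Toolkit Overview](https://github.com/Unity-Technologies/ml-agents/blob/main/docs/ML-Agents-Overview.md)
==Must-read for anyone writing code to clear the game!==
Covers the overview and detailed architecture of the ML-Agents Toolkit. Even if you don't use the official built-in Trainer, its machine-learning models are worth studying. Strengths: a platform for training agents that supports many training modes and scenarios and offers many features.

#### [Background: Unity](https://github.com/Unity-Technologies/ml-agents/blob/main/docs/Background-Unity.md)
- How to use Unity (common concepts, the physics engine, etc.)
- Skip it if you already know Unity

#### [Background: Machine Learning](https://github.com/Unity-Technologies/ml-agents/blob/main/docs/Background-Machine-Learning.md)
- Unsupervised / supervised / reinforcement learning
  - Reinforcement learning: learns a policy
    - ![](https://i.imgur.com/4yTeKts.png)
- Feature selection and model selection
- Deep learning can be applied to all of the above (unsupervised / supervised / reinforcement learning)
- Skip it if you already know ML

#### [Background: PyTorch](https://github.com/Unity-Technologies/ml-agents/blob/main/docs/Background-PyTorch.md)
- Uses PyTorch (models) and TensorBoard (visualization)
  - PyTorch supports TensorBoard (neat, right?)

Read the three Background articles above before continuing:
- The figures in the article are the key points; make sure you understand them
- The article uses an example game about training NPC behavior throughout
  - In the example game, a medic has to save as many soldiers as possible during a battle
- Key components
  - Learning Environment
    - Agent
      - Observes, executes Actions, and assigns rewards
    - Behavior (the "brain")
      - Talks to the machine-learning model: receives the observations sent by the Agent and decides which Action to take
      - Decision source
        - Training: the phase where the model is trained through Python
        - Heuristic: human control or hand-written rules
        - Inference: predictions from a trained model
    - Environment Parameters
      - Can be passed to Python through a Side Channel
  - Low-level Python API
    - The mlagents_envs package on the Python side
  - External Communicator
    - The communication channel between Unity and Python
  - Python Trainer: the mlagents package on the Python side
    - Ships with a command-line program: mlagents-learn
  - Gym Wrapper
    - The interface to OpenAI gym
  - Architecture diagram: ![](https://i.imgur.com/kgLvOUz.png)
- Training modes are either built-in or custom
  - The built-in examples come with a ready-made training setup
    - Python Trainer (you don't write it yourself; you run it with `mlagents-learn`)
      - `mlagents-learn` trains from a YAML config
    - Inference runs on any platform Unity supports (most of them)
      - See the Unity Inference Engine section below
  - The custom mode isn't covered much in the docs
    - See the [How to use the Python API](https://github.com/Unity-Technologies/ml-agents/blob/main/docs/Python-API.md) section below
    - It can even be turned into a gym environment
- Training scenarios
  - Single-Agent
  - Simultaneous Single-Agent: parallelized (shared Behavior) to speed up training
  - Adversarial Self-Play: like Zhou Botong fighting himself with both hands in Jin Yong's novels
  - Cooperative Multi-Agent
  - Competitive Multi-Agent
  - Ecosystem: a small world; autonomous-driving simulation belongs here
- The training methods below are all "built-in training", but they are also useful references for custom methods
- General-purpose training methods
  - Rewards are either extrinsic (from the environment) or intrinsic (generated by the learning model)
  - Models fall into two main categories:
    - Deep Reinforcement Learning
      - Learns by trial and error
      - Reinforcement Learning is abbreviated RL
      - Rewards at a normal frequency
        - PPO (Proximal Policy Optimization)
          - The default
        - SAC (Soft Actor-Critic)
          - Uses maximum-entropy reinforcement learning
      - Sparse rewards
        - Curiosity for Sparse-reward Environments
          - Uses forward and inverse models
        - RND for Sparse-reward Environments
          - Random Network Distillation (RND): "distillation" means a predictor network is trained to reproduce the outputs of a fixed, randomly initialized network, and its prediction error is used as an intrinsic reward
          - Uses a network with randomly generated weights plus a predictor model
    - Imitation Learning
      - Learns by imitating an experienced player (a Demo)
      - Demos can be recorded with the Demonstration Recorder
        - They contain Observations, Actions, and Rewards
      - Can be combined with RL to assist and speed up RL training
      - GAIL (Generative Adversarial Imitation Learning)
        - A discriminator judges whether an action came from the Agent or from the Demo
        - Can work without a reward
      - BC (Behavioral Cloning)
        - Works better the closer the desired behavior is to the Demo
        - Can be used for pre-training
      - Experiment results: ![](https://i.imgur.com/wu89enV.png)
- Training methods for specific environments
  - Self-Play
    - Symmetric: e.g., soccer
      - Can share a policy and beat its past self
    - Asymmetric: e.g., hide-and-seek
    - PPO is recommended
  - Multi-Agent
    - Cooperating toward a goal
    - MA-POCA (MultiAgent POsthumous Credit Assignment) can be used
      - Like a coach directing a soccer team; team members can also be swapped out
  - Curriculum Learning
    - Step by step, from simple to complex
      - That's how people learn too
    - Different curriculum lessons can be defined in the built-in YAML config
  - Training with environment parameter randomization
    - Makes the agent more robust to changes in the environment
    - How the parameters are randomized can be set in the built-in YAML config
- Model types
  - Vector inputs (e.g., position, velocity, rays)
    - Use a regular NN
  - Image inputs (multiple views and cameras supported)
    - Use a CNN
  - Variable-length vector inputs (e.g., several bullets)
    - Use attention models (a senior labmate has covered these)
  - When memory is needed (RNN)
    - Use LSTM
- Other features of the ML-Agents Toolkit
  - Run several built executables in parallel to speed things up (add `--num-envs=<n>`)
    - Ports are counted upward from `--base-port`
  - Record statistics while Unity runs
  - Custom Side Channels
    - Extra communication between Unity and Python
  - Custom Samplers for environment parameter randomization

### [Example Environments (for reference)](https://github.com/Unity-Technologies/ml-agents/blob/main/docs/Learning-Environment-Examples.md)
- Details of each example training scene

Creating Learning Environments
---
==Must-read for the Unity side! (especially the first three articles)==

### [Making a New Learning Environment](https://github.com/Unity-Technologies/ml-agents/blob/main/docs/Learning-Environment-Create-New.md)
How to build a learning environment and an Agent

- Install the ML-Agents package
  - com.unity.ml-agents
- Create the scene, including the Agent object
- Implement the Agent
  - Import the ML-Agents library and inherit from `Agent` instead of `MonoBehaviour`
  - Key methods (to be overridden)
    - Initialization: `OnEpisodeBegin()`
    - Observing the environment: `CollectObservations(VectorSensor sensor)`
    - Responding: `OnActionReceived(ActionBuffers actionBuffers)`
  - An episode is one round; it ends on success, failure, or timeout
  - Initialization (starting a new episode)
    - Goes in `OnEpisodeBegin()`
  - Observing the environment
    - Goes in `CollectObservations(VectorSensor sensor)`
    - `sensor.AddObservation(source)`
      - Accepts several data types
      - The exact format is still to be investigated
  - Responding (handling actions, rewards, and episode end)
    - Goes in `OnActionReceived(ActionBuffers actionBuffers)`
    - `actionBuffers` holds two arrays, `ContinuousActions` and `DiscreteActions`
    - Map each received action to a behavior in the game
    - Rewards
      - Call `SetReward(score)` where appropriate
    - Ending
      - Call `EndEpisode()` where appropriate to end the episode
- Final Unity Editor setup
  - ![](https://i.imgur.com/1J1i0WP.png)
  - Add two Script components to the Agent Object: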
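    - `Decision Requester` component
      - Sets how often `Agent.RequestDecision()` is called
        - `Agent.RequestDecision()` must be called whenever a decision is needed
        - Measured in steps
      - DecisionFrequency is key to training speed and success rate
        - Fewer decisions mean faster training and a higher success rate (in simple scenes)
      - You can also call it manually instead, in which case the Decision Requester component isn't needed
    - `Behavior Parameters` component
      - `Behavior Name`: the name
      - `Vector Observation`: settings for vector observations
        - `Space Size`: number of values
        - `Stacked Vectors`: set to 1 for now
      - `Actions`: action settings
        - `Continuous Actions`: number of continuous actions
        - `Discrete Branches`: number of discrete action branches
- Testing the scene
  - Implement `Heuristic(in ActionBuffers actionsOut)` for manual control
    - `actionsOut` corresponds to `actionBuffers` in `OnActionReceived()`
    - Map the arrow keys (`Input.GetAxis("Horizontal")` / `Input.GetAxis("Vertical")`, values from -1 to 1) and other keyboard and mouse input to the continuous / discrete actions in `actionsOut`
  - Set `Behavior Type` in the `Behavior Parameters` component to `Heuristic Only`
  - Press Play to control the Agent yourself and check that it is set up correctly
- Training
  - Train with the official `mlagents-learn` and a YAML config file
    - Syntax: `mlagents-learn <path-to-yaml> --run-id=<a-name>`
  - Tips from experience (may not apply to other scenes)
    - For small amounts of data, batch size and buffer size can be set smaller
    - DecisionFrequency is key to training speed and success rate
      - In simple scenes, the less often decisions are made, the faster training is and the higher the success rate
  - Use TensorBoard to watch training
    - cumulative_reward and value_estimate show how well the agent is learning
- Speeding up training (parallelism)
  - Put several copies of the training area in one Scene
    - Turn the training area into a Prefab first, then drag in as many copies as you like
    - Give them all the same Behavior Name
  - Run several Unity instances
    - Append `--num-envs=<count>` to the `mlagents-learn` command

### [Designing a Learning Environment](https://github.com/Unity-Technologies/ml-agents/blob/main/docs/Learning-Environment-Design.md)
An overview of designing a learning environment: scene design, the simulation, and basic Agent design (Designing Agents goes into more detail). To fully understand the ML-Agents Toolkit, you can read the API directly (my source-code reading notes are further below) or learn from the examples.

- The training / simulation process
  - The Academy drives these steps
  - Lifecycle
    1. Run the Academy's `OnEnvironmentReset` hook
    2. Run each Agent's `OnEpisodeBegin()`
    3. Collect observations through each Agent's `CollectObservations(VectorSensor sensor)`
    4. Each Agent's Policy (external, heuristic, or trained model) decides the next Action
    5. Run `OnActionReceived()` with that Action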
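    6. If `Max Step` is reached or `EndEpisode()` is called, `OnEpisodeBegin()` runs again and a new episode begins
  - You override several of the methods above yourself
- Organizing the Unity scene
  - Academy
    - The Academy orchestrates the Agents. It is a singleton, so there is only ever one (`Academy.Instance`)
    - Resetting the Academy
      - Attach a method from any Script to the `Academy.Instance.OnEnvironmentReset` hook
        - `Academy.Instance.OnEnvironmentReset += MethodName;`
      - When: it runs when Python calls `UnityEnvironment.reset()`
      - Example: at the start of a new episode, move the Agent to a new position and put the target at a random position
      - To make the trained policy generalize to more environments, you can change some environment factors here (e.g., the maze layout)
  - Multiple areas
    - Place several areas at once to parallelize and speed up training; just give their Agents the same `Behavior Name`
    - Design the scene so that it is easy to duplicate areas like this
      - See the examples, or the previous article (which builds areas quickly from a Prefab)
- Environment
  - Things to keep in mind when building a training environment
    - The training scene must open automatically when the Unity program starts
    - After each episode, the Academy must reset the scene (the `OnEnvironmentReset` hook can do this)
    - Every episode must be able to end, either by reaching `Max Step` or by calling `EndEpisode()`
- Environment parameters
  - Used for curriculum learning and environment parameter randomization
  - Both the Python API and the built-in trainers' YAML config can interact with them through the dedicated EnvironmentParameters Side Channel (which supports constants and several random samplers); see the docs for usage
  - Read them in `OnEpisodeBegin()` (or wherever appropriate) with `Academy.Instance.EnvironmentParameters.GetWithDefault(key, defaultFloatValue)`
- Agent
  - The Agent must be attached to a GameObject
  - Behavior Parameters must be configured
  - Methods to override:
    - `OnEpisodeBegin()`
      - Called on initialization or reset
    - `CollectObservations(VectorSensor sensor)`
      - Collects information about the environment
    - `OnActionReceived(ActionBuffers actionBuffers)`
      - Executes the Action chosen by the Policy and sets rewards
    - Behavior Parameters relate to the last two methods
  - You must decide how episodes end
    - Call `EndEpisode()` in `OnActionReceived()`
    - Or set the maximum step count `Max Steps`
  - See the next article, Designing Agents, for more
- Recording statistics
  - `StatsRecorder.Add(key, value [, aggregation method])`
  - Can be shown in TensorBoard or sent to the Python side through a Side Channel

#### [Designing Agents (not finished reading)](https://github.com/Unity-Technologies/ml-agents/blob/main/docs/Learning-Environment-Design-Agents.md)
Covers the following, each with recommended settings (Best Practices), which makes it very helpful for learning machine learning:
- How Agents work
- Kinds of observations and Actions
- How to give rewards
- Multiple Agents (Teams)
- **Recording Demonstrations**

For more detail on how things work internally, see the source-code reading notes below.
- Agent

### [Using an Executable Environment](https://github.com/Unity-Technologies/ml-agents/blob/main/docs/Learning-Environment-Executable.md)
- You can Build an executable first and train with it (training in the Unity Editor works too, of course)
- Can be used together with the Unity Environment Registry (explained below)
- Server Mode can be used (provided no rendering is needed)
- The article explains how to Build an executable
  - I don't think Run in background is needed, though
- With the Python API, just pass the executable path as `file_name` to UnityEnvironment

```=Python
from mlagents_envs.environment import UnityEnvironment
# file_name: path to the built executable ("path/to/env_name" is a placeholder)
env = UnityEnvironment(file_name="path/to/env_name")
env.close()  # shut down the Unity process when finished
```

- With the built-in `mlagents-learn`, run: `mlagents-learn <trainer-config-file> --env=<env_name> --run-id=<run-identifier>`
  - `<trainer-config-file>`: the YAML config file
  - `<env_name>`: path to the executable (without the extension)
  - `<run-identifier>`: distinguishes different training runs
- The trained model is saved to `results/<run-identifier>/<behavior_name>.onnx`
  - Move it somewhere suitable (suggestion: a TFModels folder in the project)
  - In the Inspector window, drag the model onto the Agent's Model field

### [ML-Agents Package Settings](https://github.com/Unity-Technologies/ml-agents/blob/main/docs/Package-Settings.md)
- Located at Edit > Project Settings... > ML-Agents
- Settings can apply globally or per scenario
  - e.g., one set of settings for training and another for inference

Training & Inference
---
The PAIA framework doesn't really need this part, but it may be worth testing whether an ONNX model we convert ourselves can be used.

### [Training ML-Agents (not finished reading)](https://github.com/Unity-Technologies/ml-agents/blob/main/docs/Training-ML-Agents.md)

#### [Training Configuration File (not finished reading)](https://github.com/Unity-Technologies/ml-agents/blob/main/docs/Training-Configuration-File.md)

#### [Using TensorBoard to Observe Training (not finished reading)](https://github.com/Unity-Technologies/ml-agents/blob/main/docs/Using-Tensorboard.md)

#### [Profiling Trainers (not finished reading)](https://github.com/Unity-Technologies/ml-agents/blob/main/docs/Profiling-Python.md)

### [Unity Inference Engine](https://github.com/Unity-Technologies/ml-agents/blob/main/docs/Unity-Inference-Engine.md)
- Barracuda lets Unity run machine-learning predictions
  - [Unity Barracuda Github](https://github.com/Unity-Technologies/barracuda-release)
  - [Unity Barracuda Manual](https://docs.unity3d.com/Packages/com.unity.barracuda@2.1/manual/index.html)
- ML-Agents only documents inference with models produced by its built-in trainers
- Uses ONNX (Open Neural Network Exchange), an open format for exchanging neural networks
  - Trained policy models are `.onnx` files
- To use a model you trained yourself
  - Convert it to ONNX
  - Follow certain conventions; see: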
    - [TensorNames.cs](https://github.com/Unity-Technologies/ml-agents/blob/main/com.unity.ml-agents/Runtime/Inference/TensorNames.cs)
    - [BarracudaModelParamLoader.cs](https://github.com/Unity-Technologies/ml-agents/blob/main/com.unity.ml-agents/Runtime/Inference/BarracudaModelParamLoader.cs)
  - If you'd rather not deal with these conventions, run the model with Barracuda yourself
- To use a model from the built-in trainers elsewhere
  - Refer to the ONNX format and the ML-Agents conventions yourself

Extending ML-Agents
---

### [Creating Custom Side Channels (not finished reading)](https://github.com/Unity-Technologies/ml-agents/blob/main/docs/Custom-SideChannels.md)

### [Creating Custom Samplers for Environment Parameter Randomization (not finished reading)](https://github.com/Unity-Technologies/ml-agents/blob/main/docs/Training-ML-Agents.md#defining-a-new-sampler-type)

Python Tutorial with Google Colab
---
These show how to use the Python API.

### [Using a UnityEnvironment (not finished reading)](https://colab.research.google.com/github/Unity-Technologies/ml-agents/blob/release_18_docs/colab/Colab_UnityEnvironment_1_Run.ipynb)
How to use the API during training; many basic operations are demonstrated!
- Get the Behavior Specs from the Environment

```=Python
from mlagents_envs.environment import UnityEnvironment
env = UnityEnvironment(file_name=None)  # None: connect to the Unity Editor (press Play)
env.reset()  # start the first episode
# We will only consider the first Behavior
behavior_name = list(env.behavior_specs)[0]
print(f"Name of the behavior : {behavior_name}")
spec = env.behavior_specs[behavior_name]
```

### [Q-Learning with a UnityEnvironment (not finished reading)](https://colab.research.google.com/github/Unity-Technologies/ml-agents/blob/release_18_docs/colab/Colab_UnityEnvironment_2_Train.ipynb)
How to use the API during training (with Q-learning as the example)
- Past observations are stored, and you have to implement that part yourself
  - Q-learning seems to need a memory (buffer) that stores past experiences

### [Using Side Channels on a UnityEnvironment (not finished reading)](https://colab.research.google.com/github/Unity-Technologies/ml-agents/blob/release_18_docs/colab/Colab_UnityEnvironment_3_SideChannel.ipynb)

Help
---

### [Migrating from earlier versions of ML-Agents (not finished reading)](https://github.com/Unity-Technologies/ml-agents/blob/main/docs/Migrating.md)
- What used to be called Brain is now called Behavior

### [Frequently Asked Questions](https://github.com/Unity-Technologies/ml-agents/blob/main/docs/FAQ.md)
- Mostly common environment-setup problems
  - Check the executable's permissions
  - How to fix connection timeouts
  - If the port is already in use, change it
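
The FAQ's advice about occupied ports can be sketched with the standard library alone. The sketch rests on one assumption: a UnityEnvironment listens on `base_port + worker_id`, with 5005 as the default base port for built executables (these notes state that ports count upward from 5005, and `UnityEnvironment` in mlagents_envs takes `base_port` and `worker_id` arguments). `port_is_free` and `first_free_worker_id` are hypothetical helpers, not part of mlagents_envs. A successful probe only means the port was free at that instant, so treat the result as a heuristic.

```python
import socket

DEFAULT_BASE_PORT = 5005  # assumed default base port for built executables


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can currently bind to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # Already in use (or, on Linux, possibly not yet released).
            return False
        return True


def first_free_worker_id(base_port: int = DEFAULT_BASE_PORT, max_workers: int = 64) -> int:
    """Hypothetical helper: smallest worker_id whose port base_port + worker_id is free.

    Mirrors the "count upward from the base port" rule; it does not reserve the port.
    """
    for worker_id in range(max_workers):
        if port_is_free(base_port + worker_id):
            return worker_id
    raise RuntimeError(
        f"no free port in range {base_port}..{base_port + max_workers - 1}"
    )


if __name__ == "__main__":
    wid = first_free_worker_id()
    print(f"worker_id={wid}, port={DEFAULT_BASE_PORT + wid}")
    # Then, for example (requires mlagents_envs and a built executable):
    # env = UnityEnvironment(file_name="path/to/env_name", worker_id=wid)
```

You can pass the returned value as `worker_id` so that several environments don't collide on the same port. A port that was just released on Linux may briefly fail the probe; this matches the caveat in the Limitations section. In that case the helper simply moves on to the next port.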
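
### [ML-Agents Glossary](https://github.com/Unity-Technologies/ml-agents/blob/main/docs/Glossary.md)
- Definitions of terms: Academy, Action, Agent, Decision, Editor, Environment, Experience, External Coordinator, FixedUpdate, Frame, Observation, Policy, Reward, State, Step, Trainer, Update

### [Limitations](https://github.com/Unity-Technologies/ml-agents/blob/main/docs/Limitations.md)
- Each package has some limitations to watch out for:
  - com.unity.ml-agents
    - Training is supported only on the three major OSes: Windows, Mac, Linux
    - For inference, any CPU works; GPU support on mobile devices depends on the device
    - Headless Mode (Server Mode) does not support visual observations
      - So there is no image-based training data in that mode
    - The Unity Inference Engine currently only supports models trained with the official trainers
      - If following the official conventions is too much hassle, run the model with Barracuda yourself
  - mlagents
    - Check the limitations if you use Multi-Agent or Self-Play
  - mlagents_envs
    - Communicates over a socket, so check your firewall settings for security
    - To run *multiple* UnityEnvironments, give each one a different port
    - The channel is unencrypted, so don't expose it
    - On Linux, a port is not released immediately after the program closes
  - gym_unity
    - Only works with a single Agent
- See the docs for the remaining limitations

API Docs
---

### [API Reference](https://github.com/Unity-Technologies/ml-agents/blob/main/docs/API-Reference.md)
- See the [Scripting API](https://docs.unity3d.com/Packages/com.unity.ml-agents@2.1/api/index.html); all the references are there

### [Python API Documentation (not finished reading)](https://github.com/Unity-Technologies/ml-agents/blob/main/docs/Python-API-Documentation.md)
- UnityEnvironment
  - Communicates over a socket (the port can be left unspecified), so check your firewall settings for security
  - To run multiple UnityEnvironments, give each one a different port
  - The channel is unencrypted, so don't expose it
  - On Linux, a port is not released immediately after the program closes
  - Inherits from `BaseEnv` (see the source-code reading notes below)
- UnityEnvRegistry
- SideChannel

### [How to use the Python API (not finished reading; needs more thought)](https://github.com/Unity-Technologies/ml-agents/blob/main/docs/Python-API.md)
- ==Must-read for the Python side!==
- You don't have to use the built-in `mlagents-learn` command
  - A junior labmate's helicopter project works this way
- ==Open question: what is the advantage of using ML-Agents this way, compared with writing our own socket code?==
  - We make good use of the Agent and recording parts, not the Trainer
  - In custom mode, Python has to be started first before inference can run
  - ML-Agents communicates over gRPC, with well-defined data structures
    - How each step works, the data transfer format, and the lifecycle

### [How to use the Unity Environment Registry](https://github.com/Unity-Technologies/ml-agents/blob/main/docs/Unity-Environment-Registry.md)
- A way to run ML-Agents training without installing the huge Unity Editor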
  - Essentially the program prebuilt for different platforms
- Different learning environments are defined in YAML

### [Wrapping Learning Environment as a Gym (+Baselines/Dopamine Integration) (not finished reading)](https://github.com/Unity-Technologies/ml-agents/blob/main/gym-unity/README.md)

Deprecated Docs
---
These are no longer maintained and probably won't be updated again; I'm keeping them in case they help.

### [Windows Anaconda Installation (not finished reading)](https://github.com/Unity-Technologies/ml-agents/blob/main/docs/Installation-Anaconda-Windows.md)

### [Using Docker (not finished reading)](https://github.com/Unity-Technologies/ml-agents/blob/main/docs/Using-Docker.md)

### [Training on the Cloud with Amazon Web Services (not finished reading)](https://github.com/Unity-Technologies/ml-agents/blob/main/docs/Training-on-Amazon-Web-Service.md)

### [Training on the Cloud with Microsoft Azure (not finished reading)](https://github.com/Unity-Technologies/ml-agents/blob/main/docs/Training-on-Microsoft-Azure.md)

### [Using the Video Recorder (not finished reading)](https://github.com/Unity-Technologies/video-recorder)

Other
---
Not covered by the official docs; compiled by me.
==Must-read for both the Python side and the Unity side!==

### Demonstration Recorder / Loader
Recording and loading demos (one demo corresponds to one Agent, I think)

- Demonstration Recorder (Unity side)
  - [Imitation Learning](https://github.com/Unity-Technologies/ml-agents/blob/main/docs/ML-Agents-Overview.md#imitation-learning)
  - [Designing Agents - Recording Demonstrations](https://github.com/Unity-Technologies/ml-agents/blob/main/docs/Learning-Environment-Design-Agents.md#recording-demonstrations)
  - [Imitation Learning and demo - old version](https://github.com/gzrjzcx/ML-agents/blob/master/docs/Training-Imitation-Learning.md)
  - [Unity ML-Agents - Demonstration Recorder for Imitation Learning](https://www.youtube.com/watch?v=Dhr4tHY3joE)
  - [How to Train a Machine Learning Agent via Demonstration](https://gamedevacademy.org/unity-machine-learning-training-tutorial/)
- Demonstration Loader (Python side), not covered in the official docs
  - mlagents.trainers.demo_loader
    - Reference: [mlagents/trainers/demo_loader.py](https://github.com/Unity-Technologies/ml-agents/blob/main/ml-agents/mlagents/trainers/demo_loader.py)
    - Function demo_to_buffer(file_path, sequence_length)
      - Description: reads a demo file and stores it in a training buffer
      - Parameter `file_path`: path to the `.demo` file
      - Parameter `sequence_length`: length of the trajectories used to fill the buffer
        - Setting it to `None` or 1 is recommended
        - [What is a "trajectory" in reinforcement learning?](https://ai.stackexchange.com/questions/7359/what-is-a-trajectory-in-reinforcement-learning/8955)
        - See [`resequence_and_append()`](https://github.com/Unity-Technologies/ml-agents/blob/main/ml-agents/mlagents/trainers/demo_loader.py#L94)
          - `batch_size=None`
          - `training_length=sequence_length`
        - See [`get_batch()`](https://github.com/Unity-Technologies/ml-agents/blob/main/ml-agents/mlagents/trainers/buffer.py#L143)
          - `batch_size=batch_size` (i.e., `None`)
          - `training_length` defaults to 1
            - `training_length=None` is automatically treated as 1
          - `sequential`: `If true and training_length is not None: the elements will not repeat in the sequence. [a,b,c,d,e] with training_length = 2 and sequential=True gives [[0,a],[b,c],[d,e]]. If sequential=False gives [[a,b],[b,c],[c,d],[d,e]]`
            - In our case it keeps its default value, True
      - Optional parameter expected_behavior_spec: the expected Behavior spec
      - Returns Tuple[BehaviorSpec, AgentBuffer]
        - BehaviorSpec: the structure of the Behavior
        - AgentBuffer: the data that was read
    - The recorded data is stored as serialized (binary) Protocol Buffers messages
      - `ParseFromString(data[pos : pos + next_pos])`
      - [Interpreting .demo files. #2440](https://github.com/Unity-Technologies/ml-agents/issues/2440)
      - [Faster loading of visual demonstrations? #2979](https://github.com/Unity-Technologies/ml-agents/issues/2979)

### How the Python side works

#### Buffer
Reference: [mlagents/trainers/buffer.py](https://github.com/Unity-Technologies/ml-agents/blob/main/ml-agents/mlagents/trainers/buffer.py)

Data read from a demo is stored in an AgentBuffer.
- mlagents.trainers.AgentBuffer
  - AgentBufferKey explained (for the rest, see the source directly)
    - `BufferKey`: accesses meta information
      - Common BufferKeys:
        - ACTION_MASK = "action_mask"
        - ==CONTINUOUS_ACTION = "continuous_action"==
        - ==DISCRETE_ACTION = "discrete_action"==
        - ==DONE = "done"==
        - ==ENVIRONMENT_REWARDS = "environment_rewards"==
          - This is where you get the Reward!
        - MASKS = "masks"
        - MEMORY = "memory"
        - GROUP_DONES = "group_dones"
        - GROUPMATE_REWARDS = "groupmate_reward"
        - GROUP_REWARD = "group_reward"
        - GROUP_CONTINUOUS_ACTION = "group_continuous_action"
        - GROUP_DISCRETE_ACTION = "group_discrete_aaction"
    - `(ObservationKeyPrefix, int)`: accesses observation data
      - Common ObservationKeyPrefix values:
        - ==OBSERVATION = "obs"==
        - GROUP_OBSERVATION = "group_obs"
    - `(RewardSignalKeyPrefix, name)`: accesses a reward signal
      - Common RewardSignalKeyPrefix values:
        - REWARDS = "rewards"
        - VALUE_ESTIMATES = "value_estimates"
      - name: `extrinsic`, `gail`, `curiosity`, `rnd`
  - AgentBufferField (inherits from list)
    - The recorded data stored for one field
    - `[index]`: accesses a value (usually an np.ndarray; for group data, actions, rewards, etc., a List of np.ndarray)
      - index: which recorded entry
  - AgentBuffer
    - The buffer produced by demo_to_buffer is an AgentBuffer
    - `[AgentBufferKey]`: accesses an AgentBufferField
      - Index by key first, then by entry
    - ==batch and shuffle: to be studied later==

#### Env
First see How to use the Python API and Python API Documentation above; for everything else, read the source.

Reference: [ml-agents-envs/mlagents_envs/base_env.py](https://github.com/Unity-Technologies/ml-agents/blob/main/ml-agents-envs/mlagents_envs/base_env.py)
- A Behavior's values and its meta information are kept separately
  - Values live in `DecisionStep` and `TerminalStep`
    - Observations are stored in `.obs` as a `List[np.ndarray]`
      - They correspond to `BehaviorSpec.observation_specs`, which is also a List
    - Rewards are stored in `.reward`
    - Plus some other information (Agent ID, group, etc.)
  - Actions you send must be packed into an `ActionTuple` according to the `ActionSpec`
  - Meta information is described by `BehaviorSpec` and `ActionSpec`
- DecisionStep
  - ==To be added==
- DecisionSteps
  - ==To be added==
- TerminalStep
  - ==To be added==
- TerminalSteps
  - ==To be added==
- ==BehaviorSpec: the Behavior's meta information==
  - `.observation_specs`: `List[ObservationSpec]`
    - My *guess* is that these are split by Vector, Camera, Raycast, and variable-length
  - `.action_spec`: `ActionSpec`
- ObservationSpec: describes one observation
  - `.shape`: `Tuple[int, ...]`
    - Length of each dimension
  - `.dimension_property`: `Tuple[DimensionProperty, ...]`
    - Type of each dimension
  - `.observation_type`: `ObservationType`
    - Whether it matters significantly for reaching the goal
  - `(.name)`: `str`
    - Optional field: the ISensor's name (`ISensor.GetName()`)
- ==DimensionProperty: the data type of a dimension==
  - UNSPECIFIED = 0
    - Unknown
  - NONE = 1
    - Ordinary ordered data that can be fed straight into a fully connected layer
  - TRANSLATIONAL_EQUIVARIANCE = 2
    - Translational equivariance (spatial structure); suited to a CNN
  - VARIABLE_SIZE = 4
    - Variable-length, unordered data; probably suited to attention
- ObservationType: the type of an observation (whether it matters significantly for reaching the goal)
  - DEFAULT = 0
    - Ordinary
  - GOAL_SIGNAL = 1
    - Information that matters significantly for reaching the goal
- ==ActionTuple: the object that holds action values==
  - Must be laid out according to the ActionSpec
    - Shapes are (n_agents, continuous_size) and (n_agents, discrete_size)
      - Both are 2-D arrays!
      - A branch does not add a dimension; it sets the number of possible values for *each* discrete action
    - Can hold actions for all Agents of a Behavior or for a single Agent (n_agents = 1)
    - dtypes: continuous is np.float32, discrete is np.int32
  - `.add_continuous(self, continuous: np.ndarray)`
  - `.add_discrete(self, discrete: np.ndarray)`
  - `.continuous(self) -> np.ndarray`
  - `.discrete(self) -> np.ndarray`
- ActionSpec: the action meta information
  - `.continuous_size`: `int`
    - Number of continuous actions (not grouped)
  - `.discrete_branches`: `Tuple[int, ...]`
    - Upper bound of each discrete action (how many possible values each branch has)
  - `.empty_action(self, n_agents: int) -> ActionTuple`
    - Creates an empty ActionTuple according to the ActionSpec
  - `.random_action(self, n_agents: int) -> ActionTuple`
    - Creates a random ActionTuple according to the ActionSpec
    - Continuous values are drawn at random from -1.0 to 1.0
    - Discrete values are drawn at random from 0 up to (but not including) each branch's size
- BaseEnv
  - ==To be added==

Reference: [ml-agents-envs/mlagents_envs/environment.py](https://github.com/Unity-Technologies/ml-agents/blob/main/ml-agents-envs/mlagents_envs/environment.py)
- UnityEnvironment
  - ==To be added==

### Low-level communication channel
Python and Unity communicate over [gRPC](https://grpc.io/).

Reference: [rpc_communicator.py](https://github.com/Unity-Technologies/ml-agents/blob/main/ml-agents-envs/mlagents_envs/rpc_communicator.py)
- The default port starts at 5005 and counts upward
- gRPC uses Protocol Buffers for its data structures by default

The data structures exchanged between Python and Unity are implemented with Protocol Buffers.

Reference: [Unity ML-Agents Protobuf Definitions](https://github.com/Unity-Technologies/ml-agents/tree/main/protobuf-definitions)

Useful background:
- [Protocol Buffers official site](https://developers.google.com/protocol-buffers)
- [Protobuf: more convenient, faster, and more compact than JSON (in Chinese)](https://yami.io/protobuf/)

observation.proto shows how observation data is transmitted:
- [observation.proto](https://github.com/Unity-Technologies/ml-agents/blob/main/protobuf-definitions/proto/mlagents_envs/communicator_objects/observation.proto)
  - e.g., you can see how PNG images are transmitted
- [observation_pb2.py](https://github.com/Unity-Technologies/ml-agents/blob/main/ml-agents-envs/mlagents_envs/communicator_objects/observation_pb2.py)
  - The Python file auto-generated by Protocol Buffers

==Tools that convert Protocol Buffers messages into Python-side data types==

Reference: [rpc_utils.py](https://github.com/Unity-Technologies/ml-agents/blob/main/ml-agents-envs/mlagents_envs/rpc_utils.py)
- Implements how received data is converted into data the low-level Python API can use
  - e.g., how images are converted and how Specs are converted
- Both the low-level API and the Demo Loader use `steps_from_proto` and `behavior_spec_from_proto` for conversion
  - Steps and BehaviorSpec are the two key points
  - Low Level API: [environment.py](https://github.com/Unity-Technologies/ml-agents/blob/main/ml-agents-envs/mlagents_envs/environment.py)
  - Demo Loader: [demo_loader.py](https://github.com/Unity-Technologies/ml-agents/blob/main/ml-agents/mlagents/trainers/demo_loader.py)

### How the Unity side works
The source code uses many `event Action` delegates that are run with `Invoke()`; see [[C#] Using Delegates and Events (in Chinese)](https://hackmd.io/@BKLiang/csharp_delegate_event). Note that `Invoke()` is synchronous, not asynchronous: it waits, and each handler runs to completion before the next one starts.

Note: when the source code contains

```=C#
using (TimerStack.Instance.Scoped("...")) {
    // code being timed
}
```

it measures how long that block takes to run (see [[C#] Tip #5: Why use `using` (in Chinese)](https://ithelp.ithome.com.tw/articles/10199186)); presumably it is for statistics.

#### Agent
Reference: [Class Agent](https://docs.unity3d.com/Packages/com.unity.ml-agents@2.1/api/Unity.MLAgents.Agent.html)

Reference: [Agent.cs](https://github.com/Unity-Technologies/ml-agents/blob/main/com.unity.ml-agents/Runtime/Agent.cs)

Calling `RequestAction()` signals that the Agent wants to take an action (without collecting new data). It sets `m_RequestAction = true` and relates to `AgentStep()`.

To trigger a decision on its own, use `RequestDecision()`, which relates to `SendInfo()` and processes newly collected information. It sets `m_RequestDecision = true` and also calls `RequestAction()`, so there is no need to call `RequestAction()` again (it makes no difference).

To reset the Agent, call `EndEpisode()`; besides reaching MaxStep, this is the other way to start over. `EndEpisode()` runs `_AgentReset()` and also signals that the episode has ended.

`SendInfo()` calls `SendInfoToBrain()`, which does the following. Afterward, `m_RequestDecision` is set back to `false`.
- Collects new data and updates the old information
- Communicates with the Policy to get the result of `IPolicy.RequestDecision(m_Info, sensors)`
- Records the information to the Demo Recorder

`DecideAction()` communicates with the Policy to get the result of `IPolicy.DecideAction()`, then calls `m_ActuatorManager.UpdateActions(actions)` to update the Actions of the Actuators in `m_ActuatorManager`.

`AgentStep()` executes all the Actions in `m_ActuatorManager`, or ends the episode; afterward `m_RequestAction` is set back to `false`.

`_AgentReset()` clears the previous data, resets the step count to 0, and calls `OnEpisodeBegin()` to start a new episode.

The methods above, plus a few others, are attached to these hooks:

```=C#
Academy.Instance.AgentIncrementStep += AgentIncrementStep;
Academy.Instance.AgentSendState += SendInfo;
Academy.Instance.DecideAction += DecideAction;
Academy.Instance.AgentAct += AgentStep;
Academy.Instance.AgentForceReset += _AgentReset;
```

These are all registered in `LazyInitialize()` and run whenever the Academy invokes the corresponding event.

Note: `OnEnable()` calls `LazyInitialize()`, and `LazyInitialize()` calls `Initialize()` (if it is overridden).

`m_Info` is a `struct AgentInfo` that holds Agent-related data, including observations, actions, and the current state.

`m_PolicyFactory` is the BehaviorParameters component, fetched from the Agent's GameObject.

`m_Brain` is an IPolicy, the decision provider; it picks the decision mode (Barracuda, Heuristic, Remote) based on `m_PolicyFactory`.

`SetModel(behaviorName, NNModel, InferenceDevice)` changes the Behavior Name and swaps the NN model. To change only the Behavior Name (don't change it through BehaviorParameters!), call `SetModel(behaviorName, null)`.

When the methods you override are called:
- `OnEpisodeBegin()` is called during `LazyInitialize()`
- `CollectObservations(VectorSensor sensor)` is called either in `SendInfoToBrain()`, before communicating with the Policy, or in `AgentStep()` when the episode terminates. This is the older approach; `ObservableAttribute` is now recommended
- `OnActionReceived(ActionBuffers actionBuffers)` is called from `m_ActuatorManager.ExecuteActions()` inside `AgentStep()`

#### Sensor
Sensors are responsible for collecting information.

Reference: [ISensor.cs](https://github.com/Unity-Technologies/ml-agents/blob/main/com.unity.ml-agents/Runtime/Sensors/ISensor.cs)

`GetName()` returns the name. If you don't provide one, the Sensors that ship with ML-Agents give themselves readable names.

In Agent.cs, `InitializeSensors()` calls `SensorUtils.SortSensors(sensors)`, so the information we receive is ordered by these Sensor names.

Reference: [ObservableAttribute.cs](https://github.com/Unity-Technologies/ml-agents/blob/main/com.unity.ml-agents/Runtime/Sensors/Reflection/ObservableAttribute.cs)

This is the recommended way to add vector values (instead of `CollectObservations(sensor)`). Usage:
`[Observable(name: null, numStackedObservations: 1)]`
If no name is provided, the naming convention is (Line 197):
`ObservableAttribute:TypeName.MemberName`

Reference: [VectorSensor.cs](https://github.com/Unity-Technologies/ml-agents/blob/main/com.unity.ml-agents/Runtime/Sensors/VectorSensor.cs)

If no name is provided, the naming convention is (Lines 28-31):
`VectorSensor_size` + number of values + `_GOAL_SIGNAL` (if it is a goal signal)

VectorSensorComponent can set the ObservationType property.

Reference: [StackingSensor.cs](https://github.com/Unity-Technologies/ml-agents/blob/main/com.unity.ml-agents/Runtime/Sensors/StackingSensor.cs)

If no name is provided, the naming convention is (Line 58):
`StackingSensor_size` + number of stacks + the wrapped Sensor's original name

Reference: [SensorComponent.cs](https://github.com/Unity-Technologies/ml-agents/blob/main/com.unity.ml-agents/Runtime/Sensors/SensorComponent.cs)

A collection of Sensors in Component form that can hold many Sensors; you must implement `ISensor[] CreateSensors()`.

Reference: [CameraSensorComponent.cs](https://github.com/Unity-Technologies/ml-agents/blob/main/com.unity.ml-agents/Runtime/Sensors/CameraSensorComponent.cs)

Wraps one CameraSensor. If no SensorName is provided, the naming convention is (Lines 27-38):
`CameraSensor`
Note: renaming it after it has started running does not change the original sort order.
It can set the ObservationType property.

Reference: [RenderTextureSensorComponent.cs](https://github.com/Unity-Technologies/ml-agents/blob/main/com.unity.ml-agents/Runtime/Sensors/RenderTextureSensorComponent.cs)

Wraps one RenderTextureSensor. If no SensorName is provided, the naming convention is (Lines 27-38):
`RenderTextureSensor`
注意：如果執行後來再去改名，並不會影響原本排序參考：[RayPerceptionSensor.cs](https://github.com/Unity-Technologies/ml-agents/blob/main/com.unity.ml-agents/Runtime/Sensors/RayPerceptionSensor.cs) RayPerceptionSensorComponent2D、RayPerceptionSensorComponent3D 裡面對應到一個 RayPerceptionSensor 如果 Component 沒有提供 SensorName，命名的慣例是（[RayPerceptionSensorComponentBase.cs](https://github.com/Unity-Technologies/ml-agents/blob/main/com.unity.ml-agents/Runtime/Sensors/RayPerceptionSensorComponentBase.cs#L14)）： `RayPerceptionSensor` 注意：如果執行後來再去改名，並不會影響原本排序資料的型態是 Vector，長度為： `(Observation Stacks) * (1 + 2 * Rays Per Direction) * (Num Detectable Tags + 2)` 關於資料擺放方式可以參考 `ToFloatArray()`（Line 218、322）一個 Ray 所存放的資訊（`ToFloatArray()` 上面的註解要看！）：先是 One-hot 的 HitTag，再來是 HasHit，再來是 HitFraction 所以才要 `Num Detectable Tags + 2` `HitFraction` 是把觀察到的長度除以最大可觀察長度的結果關於各個角度的擺放順序可以參考 [RayPerceptionSensorComponentBase.cs](https://github.com/Unity-Technologies/ml-agents/blob/main/com.unity.ml-agents/Runtime/Sensors/RayPerceptionSensorComponentBase.cs#L209) 的 `GetRayAngles()`：從中間（90 度）開始，右左右左越來越大來擺放，例如： `{ 90, 90 - delta, 90 + delta, 90 - 2*delta, 90 + 2*delta }` 回到 RayPerceptionSensor，從 `RayExtents()`、`PolarToCartesian3D()` 可知，經查證極坐標平面是 xz 平面，右方是 x，前方是 z，所以才要 `1 + 2 * Rays Per Direction` 參考：[GridSensorComponent.cs](https://github.com/Unity-Technologies/ml-agents/blob/main/com.unity.ml-agents/Runtime/Sensors/GridSensorComponent.cs) GridSensorComponent 裡面對應到一個 GridSensor 如果沒有提供 SensorName，命名的慣例是（Line 19~28）： `GridSensor` 注意：如果執行後來再去改名，並不會影響原本排序資料的型態是 Vector，長度為： `GridSize.x * GridSize.z * Num Detectable Tags` 關於資料擺放方式可以參考 [GridSensorBase.cs](https://github.com/Unity-Technologies/ml-agents/blob/main/com.unity.ml-agents/Runtime/Sensors/GridSensorBase.cs#L316)（`Write()`）參考：[BufferSensorComponent.cs](https://github.com/Unity-Technologies/ml-agents/blob/main/com.unity.ml-agents/Runtime/Sensors/BufferSensorComponent.cs) 裡面對應到一個 BufferSensor 如果沒有提供 SensorName，命名的慣例是（Line 27~38）： `BufferSensor` 注意：如果執行後來再去改名，並不會影響原本排序用 
`AppendObservation(float[] obs)` 加入觀察數值陣列（長度需要與給定的一樣） #### Actuator 這是負責執行動作的角色參考：[IActionReceiver.cs](https://github.com/Unity-Technologies/ml-agents/blob/main/com.unity.ml-agents/Runtime/Actuators/IActionReceiver.cs) `struct ActionBuffers` 是定義在這個檔案裡面 `IActionReceiver` 中有 `OnActionReceived(ActionBuffers actionBuffers)`、`WriteDiscreteActionMask(IDiscreteActionMask actionMask)`，是要被覆寫的 method 參考：[IActuator.cs](https://github.com/Unity-Technologies/ml-agents/blob/main/com.unity.ml-agents/Runtime/Actuators/IActuator.cs) `IActuator` 有繼承 `IActionReceiver` 等，再加上一些 `ActionSpec`、`Name` 之類的東西參考：[ActuatorManager.cs](https://github.com/Unity-Technologies/ml-agents/blob/main/com.unity.ml-agents/Runtime/Actuators/ActuatorManager.cs) `ActuatorManager` 其實就是 `IActuator` 的 List，提供一些像是 `UpdateActions(ActionBuffers actions)`、`ExecuteActions()` 等等的操作 #### DecisionRequester 參考：[DecisionRequester.cs](https://github.com/Unity-Technologies/ml-agents/blob/main/com.unity.ml-agents/Runtime/DecisionRequester.cs) 參考：[Class DecisionRequester](https://docs.unity3d.com/Packages/com.unity.ml-agents@2.1/api/Unity.MLAgents.DecisionRequester.html) 如果是固定頻率要執行，可以掛上 DecisionRequester，`MakeRequests(int academyStepCount)` 會定期執行 `Agent.RequestDecision()` 或是可以設定 `TakeActionsBetweenDecisions`（預設為 `true`）在一般的時候要不要執行 `Agent.RequestAction()` MakeRequests 會對到這個 hook： ``` Academy.Instance.AgentPreStep += MakeRequests ``` 在 Academy 中被 Invoke 的時候就會被執行到 #### BehaviorParameters 參考：[BehaviorParameters.cs](https://github.com/Unity-Technologies/ml-agents/blob/main/com.unity.ml-agents/Runtime/Policies/BehaviorParameters.cs) `GeneratePolicy()` 有寫到決策方式的選擇順序為何 #### Academy 參考：[Academy.cs](https://github.com/Unity-Technologies/ml-agents/blob/main/com.unity.ml-agents/Runtime/Academy.cs) 以下列出的 Hooks： `AgentPreStep`（DecisionRequester 的 MakeRequests）、 `AgentIncrementStep`（計算次數）、 `AgentSendState`（SendInfo）、 `DecideAction`（DecideAction）、 `AgentAct`（AgentStep）在 `EnvironmentStep()` 會依序被 Invoke（執行）到 `EnvironmentStep()` 在 

`EnvironmentStep()` is called from `AcademyFixedUpdateStepper.FixedUpdate()`. So ==with a DecisionRequester attached, one step corresponds to one FixedUpdate period==. However, if decision making or anything else takes a long time, the frame period grows, and each step also takes longer in real time.

Next, how resets work:
- `EnvironmentReset()` moves to a new environment. The Agents themselves are not reset, but the hooks in `OnEnvironmentReset` are invoked. Any Unity script can add a function to the `Academy.Instance.OnEnvironmentReset` hook.
- `ForcedFullReset()` calls `EnvironmentReset()` and also invokes `AgentForceReset`, so the Agents are reset as well.
- `ForcedFullReset()` runs at startup and in `OnResetCommand()`. `OnResetCommand()` is in turn triggered externally, by the communication with Python.

The Academy is created automatically when an Agent is initialized (`Agent.LazyInitialize()`), because that code accesses `Academy.Instance`.

`EnvironmentParameters` gives access to information about environment parameters through `Academy.Instance.EnvironmentParameters.GetWithDefault(key, defaultFloatValue)`.

#### EnvironmentParameters

Reference: [EnvironmentParameters.cs](https://github.com/Unity-Technologies/ml-agents/blob/main/com.unity.ml-agents/Runtime/EnvironmentParameters.cs)

Basic rule: every parameter is a float.
Both the Python API and the yaml config of the built-in trainer can interact with EnvironmentParameters through its dedicated Side Channel. The channel supports constants and several kinds of random sampling; see the documentation for usage.
`float GetWithDefault(string key, float defaultValue)` returns the value of a key. A default must be given in case the parameter was never set.

#### StatsRecorder

Reference: [StatsRecorder.cs](https://github.com/Unity-Technologies/ml-agents/blob/main/com.unity.ml-agents/Runtime/StatsRecorder.cs)

It creates a Side Channel, and the recorded values can be displayed in TensorBoard. Separating names with `/` creates nested stat names.
`Add(string key, float value, StatAggregationMethod aggregationMethod = StatAggregationMethod.Average)` adds a value to track.

### ONNX format (Open Neural Network Exchange)

Official site: https://onnx.ai/
GitHub: https://github.com/onnx
ONNX Runtime: https://onnxruntime.ai/
ONNX tutorials: https://github.com/onnx/tutorials

#### Tutorials
- [Three Levels of ML Software](https://ml-ops.org/content/three-levels-of-ml-software)
  - Which data exchange formats are similar to ONNX
- [Overview of ONNX and operators](https://medium.com/axinc-ai/overview-of-onnx-and-operators-9913540468ae)
- [Creating ONNX from scratch](https://towardsdatascience.com/creating-onnx-from-scratch-4063eab80fcd)
  - Quite detailed
- [Python Bindings for ONNX
Runtime](https://onnxruntime.ai/python/)
- [Tutorials: ORT Inferencing](https://onnxruntime.ai/docs/tutorials/inferencing/)
  - How to run inference in Python with an already trained ONNX model
  - `import onnxruntime as ort`
  - `sess = ort.InferenceSession(path_or_bytes)`
  - `sess.run([output_name], {input_name: x})`
  - Inputs can be Numpy arrays (tensors), dictionaries (maps), or Lists of Numpy arrays (sequences)
- [ONNX - Python API Overview](https://github.com/onnx/onnx/blob/master/docs/PythonAPIOverview.md)
  - How to convert a model to ONNX in Python
- [Convert your PyTorch model to ONNX](https://docs.microsoft.com/zh-tw/windows/ai/windows-ml/tutorials/pytorch-convert-model)
- [TORCH.ONNX](https://pytorch.org/docs/stable/onnx.html)
  - Some limitations to keep in mind:
    - Export is either trace-based or script-based
      - Script-based export puts the `@torch.jit.script` decorator in front of the method
    - NumPy is not supported
    - Indexing (getters and setters) is supported
    - Use a reasonably recent opset version
- [tf2onnx - Convert TensorFlow, Keras, Tensorflow.js and Tflite models to ONNX](https://github.com/onnx/tensorflow-onnx)
  - Some caveats are described in "How tf2onnx works"
  - [tf2onnx Support Status](https://github.com/onnx/tensorflow-onnx/blob/master/support_status.md)
    - Which opsets are currently supported
  - Entry points: `tf2onnx.convert.from_keras()`, `tf2onnx.convert.from_function()`, and `tf2onnx.convert.from_graph_def()`
- [sklearn-onnx: Convert your scikit-learn model into ONNX](http://onnx.ai/sklearn-onnx/)
  - `skl2onnx.convert_sklearn(model, initial_types=initial_types)` (the second positional parameter is `name`, so pass `initial_types` by keyword)
  - or `skl2onnx.to_onnx(model, training_dataset)`
- [ONNXMLTools](https://github.com/onnx/onnxmltools)
  - Supports Apple Core ML, Spark ML, LightGBM, libsvm, XGBoost, H2O, CatBoost, and more:
    - `convert_coreml(model, ...)`
    - `convert_libsvm(model, ...)`
    - `convert_catboost(model, ...)`
    - `convert_lightgbm(model, ...)`
    - `convert_sparkml(model, ...)`
    - `convert_xgboost(model, ...)`
    - `convert_h2o(model, ...)`
  - Keras, TensorFlow, and scikit-learn are also supported, but only as wrappers around tf2onnx and skl2onnx, so using the originals directly is recommended:
    - `convert_keras(model, ...)` corresponds to `from_keras()`
    - `convert_sklearn(model, ...)` corresponds to `convert_sklearn()`
    - `convert_tensorflow(model, ...)` corresponds to `from_graph_def()`
- [Apache MXNet
contrib.onnx](https://mxnet.incubator.apache.org/api/python/docs/api/contrib/onnx/index.html)
  - [Operator support and coverage](https://github.com/apache/incubator-mxnet/tree/v1.x/python/mxnet/onnx#operator-support-matrix)

#### Forum threads
- [Can't load external model to agent (from .pth convert to .onnx and then to .nn) #3398](https://github.com/Unity-Technologies/ml-agents/issues/3398)
- [Running a PyTorch (or even a TensorFlow) Model on Unity MLAgent](https://forum.unity.com/threads/running-a-pytorch-or-even-a-tensorflow-model-on-unity-mlagent.895631/)
  - The official team says they will not document this (using your own model) further. It can still work as long as you follow their rules (the naming requirements in TensorNames.cs).
- [Using ONNX Runtime inside Unity worked flawlessly however, and I am now using that instead of Barracuda.](https://github.com/Unity-Technologies/barracuda-release/issues/190)
  - Barracuda (what Unity uses to run ONNX) does not support quite a few operators (for example If)
  - You can try ONNX Runtime instead:
    - [Microsoft.ML.OnnxRuntime](https://www.nuget.org/packages/Microsoft.ML.OnnxRuntime/)
    - [NuGetForUnity](https://github.com/GlitchEnzo/NuGetForUnity)
  - Since ML Agents itself cannot be changed, the best option is still to connect to Python and run the model with ONNX Runtime there

#### Unity side

Reference: [TensorNames.cs](https://github.com/Unity-Technologies/ml-agents/blob/main/com.unity.ml-agents/Runtime/Inference/TensorNames.cs)
Parameters must match these names to have a "chance" of working.

Reference: [BarracudaModelParamLoader.cs](https://github.com/Unity-Technologies/ml-agents/blob/main/com.unity.ml-agents/Runtime/Inference/BarracudaModelParamLoader.cs)
These are the checks run when a model is loaded: version, whether tensors exist, shapes, and so on. Reading them explains what each restriction means.
For where the parameters go, follow model_serialization.py.
The parameters must be of type Tensor ([ModelLoader.cs](https://github.com/Unity-Technologies/barracuda-release/blob/release/2.1.0/Barracuda/Runtime/Core/ModelLoader.cs#L153)), so add them as such when designing the network.
In ML Agents 2.0 there is a memory for RNN-like models. Its size is stored in `TensorNames.MemorySize`, and 0 means no RNN-like model is used.
The Legacy parts are for ML Agents 1.0 and can be skipped.
`CheckPreviousActionShape()` is not used in ML Agents 2.0.

Reference: [Model.cs](https://github.com/Unity-Technologies/barracuda-release/blob/release/2.1.0/Barracuda/Runtime/Core/Model.cs)
This is the Model class in Barracuda, which ML Agents
uses as well, so it is worth reading.

Reference: [ModelLoader.cs](https://github.com/Unity-Technologies/barracuda-release/blob/release/2.1.0/Barracuda/Runtime/Core/ModelLoader.cs)
Barracuda supports tensors of at most 8 dimensions.

Reference: [BarracudaModelExtensions.cs](https://github.com/Unity-Technologies/ml-agents/blob/main/com.unity.ml-agents/Runtime/Inference/BarracudaModelExtensions.cs)
C# has extension methods: declare the first parameter of a static method as `this ExtendedType variable`, and the method can then be called as if it belonged to that type.
BarracudaModelExtensions is ML Agents' set of extensions to Barracuda.Model. `GetVersion()`, `CheckExpectedTensors()`, and others are used by BarracudaModelParamLoader.

Reference: [ModelRunner.cs](https://github.com/Unity-Technologies/ml-agents/blob/main/com.unity.ml-agents/Runtime/Inference/ModelRunner.cs)
This file contains the details of how the Model is executed, including all the important details of how Inputs and Outputs are passed.

#### Python side

Reference: [model_serialization.py](https://github.com/Unity-Technologies/ml-agents/blob/main/ml-agents/mlagents/trainers/torch/model_serialization.py#L149)
A very complete picture of how the ONNX file is generated. Because the export is for Inference, the batch dimension is 1.

Reference: [NCHW vs. NHWC](https://www.itread01.com/content/1545104706.html)
ML Agent and Barracuda use the NCHW convention.

Reference: [torch_policy.py](https://github.com/Unity-Technologies/ml-agents/blob/main/ml-agents/mlagents/trainers/policy/torch_policy.py#L57)
self.actor (Line 57, 70) is the network that actually gets converted.

Reference: [networks.py](https://github.com/Unity-Technologies/ml-agents/blob/main/ml-agents/mlagents/trainers/torch/networks.py)
SimpleActor (Line 572) and SharedActorCritic (Line 689) are used to create the actor.
`memory_size(self) -> int` in NetworkBody (Line 174) tells whether an RNN-like model is used. If one is, it returns the size of the memory that has to be passed back; 0 means no RNN-like model is used.
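
To make the NCHW convention and the inference batch dimension of 1 concrete, here is a stdlib-only sketch. It assumes, as I understand model_serialization.py, that visual observations arrive from Unity as (H, W, C) and are reordered to channel-first (C, H, W) for the exported model, with a batch dimension of 1 in front. The helper names are hypothetical, and the real export builds PyTorch tensors rather than tuples and nested lists.

```python
from typing import List, Sequence, Tuple


def to_channel_first(shape: Sequence[int]) -> Tuple[int, ...]:
    """(H, W, C) -> (C, H, W); non-visual shapes are returned unchanged (assumption)."""
    if len(shape) == 3:
        height, width, channels = shape
        return (channels, height, width)
    return tuple(shape)


def export_input_shape(obs_shape: Sequence[int], batch_size: int = 1) -> Tuple[int, ...]:
    """Prepend the batch dimension (1 for inference) to the channel-first shape."""
    return (batch_size,) + to_channel_first(obs_shape)


def hwc_to_chw(image: List[List[List[float]]]) -> List[List[List[float]]]:
    """Reorder pixel data image[y][x][c] into out[c][y][x] (NHWC -> NCHW without the batch axis)."""
    height = len(image)
    width = len(image[0]) if height else 0
    channels = len(image[0][0]) if width else 0
    return [
        [[image[y][x][c] for x in range(width)] for y in range(height)]
        for c in range(channels)
    ]


if __name__ == "__main__":
    print(export_input_shape((84, 84, 3)))  # (1, 3, 84, 84)
    print(export_input_shape((8,)))         # (1, 8)
    print(hwc_to_chw([[[1, 2], [3, 4]]]))   # [[[1, 3]], [[2, 4]]]
```

The design choice is to keep shape handling (`to_channel_first`) separate from data reordering (`hwc_to_chw`). When a model fails the shape checks in BarracudaModelParamLoader, that separation shows whether the layout or the values are at fault.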