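# Meeting Discussion

$$r_1=-(d-d_{prev})$$

$$r_2 =Rot_{22}-1.0$$

## 11/20 Meeting Notes

### Motivation

* Robot arm (7 DoF) grasping an object
* 4 FC layers: 64 -> 32 -> 16 -> 7
* For now, the beaker position can be assumed to be known
* For visual recognition, simply use the model that Prof. 朱宏國 has already trained
* The scenario design needs to be more complete (more models, more actions, ...)

## 11/12 Meeting Notes

## Camera Setup (with keyboard control of camera position) [Github Link](https://github.com/CT-Lab/deepbots-panda/tree/Camera)

![](https://i.imgur.com/kVJXNur.gif)

### How to Move Cameras

* CAM_A
    * Press <code>1</code> to turn right
    * Press <code>Q</code> to turn left
* CAM_B
    * Press <code>2</code> to turn right
    * Press <code>W</code> to turn left
* CAM_C
    * Press <code>3</code> to turn right
    * Press <code>E</code> to turn left
* CAM_D
    * Press <code>4</code> to turn right
    * Press <code>R</code> to turn left
* ALL
    * Press <code>5</code> to move all cameras up
    * Press <code>T</code> to move all cameras down

The <code>position_on_image</code> and <code>size_on_image</code> fields can be used to determine the bounding box of the object in the camera image; the units are pixels.

**Neural network structure: 4096 -> 2048 -> 1024 -> 512 -> 256 -> output**

**ActorNetwork and CriticNetwork**

### Train with DDPG pytorch [GitHub Link](https://github.com/CT-Lab/deepbots-panda/blob/Camera/Panda_RL/controllers/supervisorController/DDPGAgent.py)

Issue: in the new version, Rot[4] is not tied to scale, so it stays at 1.0.

## 10/30 Meeting Notes

* The RL middleware currently comes in three types
    1. pipe
    2. deepbots
    3. TCP/IP socket
* The RL agents are
    1. DDPG (with pipe)
    2. PPO, DQN, DDQN, DDPG (with deepbots)

Training goal: after gripping the beaker, lift it by one beaker height while keeping it from tipping over.

Task assignments:

- [ ] 廖軒裕, 石詠太: revise the observation input (PPO w/ deepbots); plot the reward in three separate charts to see the trends
- [ ] 楊鈞凱: DDPG in deepbots, camera #Hierarchical Reinforcement Learning
- [ ] 陳楚翔, 蘇晏鋒: lift while keeping the beaker level using IKPY; IKPY motion script
- [ ] 楊鈞愷: lift the beaker while keeping it level (DDPG dopamine w/ pipe)

## Moonshot Project (射月計畫) Meeting 10/16

no comment

## 10/23 Meeting Notes

* The RL middleware currently comes in three types
    1. pipe
    2. deepbots
    3. TCP/IP socket
* The RL agents are
    1. DDPG (with pipe)
    2. PPO (with deepbots)
    3. In development:
        1. DQN (with deepbots)
        2. DDQN (with deepbots)

Training goal: after gripping the beaker, lift it by one beaker height while keeping it from tipping over.

Task assignments:

- [ ] 楊鈞愷: lift the beaker while keeping it level (DDPG dopamine w/ pipe)
- [ ] 廖軒裕: revise the observation input (PPO w/ deepbots)
- [ ] 楊鈞凱: DDPG in deepbots and Hierarchical Reinforcement Learning
- [ ] 陳楚翔, 蘇晏鋒: after approaching, grip the beaker with the fingers -> move to another location -> pour water
- [ ] 石詠太: which task would you like to take on?

## 10/16 Meeting Notes

* The RL middleware currently comes in three types
    1. pipe
    2. deepbots
    3. TCP/IP socket
* The RL agents are
    1. DDPG (with pipe)
    2. PPO (with deepbots)
    3. In development:
        1. DQN (with deepbots)
        2. DDQN (with deepbots)

Training goal: after gripping the beaker, lift it by one beaker height while keeping it from tipping over.

1. After approaching the beaker with IK, use RL to place the beaker between the two fingers so that a rule-based routine can grip it directly
2. Move while holding the bottle and maintaining its orientation

Task assignments:

- [ ] 楊鈞凱, 石詠太: training goal 2 (PPO w/ deepbots)
- [ ] 廖軒裕 + 楊鈞愷: training goal 1 (DDPG dopamine w/ pipe)
- [ ] 陳楚翔 w/ 蘇晏鋒: after approaching, grip the beaker with the fingers --(wait for the move to another location (RL))--> pour the solution
- [ ] After gripping the beaker, lift it by one beaker height while keeping it from tipping over.

Discussion continues at next Friday's meeting.

## 9/25 Meeting Notes

* The RL middleware currently comes in three types
    1. pipe
    2. deepbots
    3. TCP/IP socket (testing stage)
* The RL agents are
    1. DDPG (with pipe)
    2. PPO (with deepbots)
    3. In development:
        1. DQN (with deepbots)
        2. DDQN (with deepbots)

Task assignments:

- [ ] 石詠太: keep testing Webots TCP/IP; if the tests succeed, it can replace pipe
- [ ] 楊鈞凱: implement other deepbots agents
- [ ] 廖軒裕 w/ 楊鈞愷: discuss DDPG; try hosting the NN in a separate location so that multiple Webots instances can be launched on one computer for training (each instance runs on a single CPU, so the load is light)
- [ ] 陳楚翔 w/ 蘇晏鋒: keep working on the rule-based environment and motions
- [ ] Other to-dos: set up the world prepared by 楚翔 & 晏鋒 as the new training environment and train the cup-gripping motion; compared with the original training goal, this greatly reduces the number of input features

__________________________________

* Moonshot Project (射月計畫) regular meeting: 台達 Room 306, 9/4, 2:00-4:00 PM
* 9/11: demo rehearsal
* Due 9/1: August monthly report, including expected progress for September to December
* Supporting-evidence form for results covering 6/11-9/15: due 9/10
* 9/16: results exhibition
* The items above will also be emailed to the students in charge

Professor's suggestions on deepbots, for the problem of an overly large action space:

1. Train with some of the axes fixed
2. Narrow the motor limits

## deepbots Package Explained

![](https://i.imgur.com/snnG5Oo.png)

## Reward Mechanism

1. Call getPosition on the target point TARGET to obtain its coordinates $T$ in the Cartesian coordinate system. Pass the seven joint angles to IKPY's [forward_kinematics](https://ikpy.readthedocs.io/en/latest/chain.html#ikpy.chain.Chain.forward_kinematics) and take `[0:3, 3]` of the result to obtain the end-effector's Cartesian coordinates $E$. Compute the norm of the difference with the default settings of np.linalg.norm (for a vector this is the 2-norm, which coincides with the Frobenius norm); this quantity is named **$\text{L2norm}_t$**

    $$\text{L2norm}_t=\lVert T_t - E_t \rVert_F = \left[\sum_{i=1}^{3} \lvert T_{t,i}-E_{t,i} \rvert^2\right]^{\frac{1}{2}}$$

2. Within a single trajectory $\tau$, the observation includes the **change in L2norm**; this quantity is named **dL2norm**

    $$\text{dL2norm}_t=\text{L2norm}_t-\text{L2norm}_{t-1}$$

## Obtaining the Observation

1. The robot arm sends each motor's rotation angle back to the supervisor, which records seven joint positions in total
2. The position of TARGET: three parameters
3. The position of the end-effector: three parameters
4. L2norm: one parameter

This gives 14 parameters in total.

### Discussion Topics

- [ ] Additions and changes to the observation space
> [motorPosition, targetPosition, endEffectorPosition, L2norm]
- [ ] Reward mechanism
> Consider whether to add a baseline, and what a reasonable penalty and bonus scheme would be
- [ ] Size of the neural network inside the agent
- [ ] Whether actions control the arm by changing velocity or by changing position
- [ ] Motor limit handling: whether to add a penalty for exceeding the motor limits or simply ignore out-of-range commands in code

```
     __________   ____________
    / _______/   /____   ____/
   / /               / /
  / /               / /
 / /_______        / /
/__________/      /__/
      __         ___      ______
     / /        / _ \    |  __  \
    / /        / / \ \   |  _____/
   / /        / /___\ \  |  __  \
  / /______  / _____  \  | |__| |
 /________/ /_/     \_\  |______/
```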
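As a rough illustration of the reward and observation definitions in these notes, the sketch below assembles the 14-value observation and the distance terms in plain Python. It is not the project's code: the function names `l2norm`, `build_observation`, and `distance_reward` are hypothetical, `math.dist` stands in for `np.linalg.norm` so the example has no dependencies (for 3-vectors both give the Euclidean distance), and it assumes that $d$ in $r_1=-(d-d_{prev})$ refers to L2norm, which the notes do not state explicitly.

```python
import math


def l2norm(target, end_effector):
    """Euclidean distance between the TARGET and end-effector positions."""
    return math.dist(target, end_effector)


def build_observation(motor_positions, target, end_effector):
    """Concatenate 7 motor positions, 3 target coordinates,
    3 end-effector coordinates and L2norm into a 14-value list."""
    if len(motor_positions) != 7:
        raise ValueError("expected 7 motor positions")
    if len(target) != 3 or len(end_effector) != 3:
        raise ValueError("positions must have 3 coordinates")
    distance = l2norm(target, end_effector)
    return [*motor_positions, *target, *end_effector, distance]


def distance_reward(l2_now, l2_prev):
    """r_1 = -(d - d_prev), assuming d is L2norm (i.e. r_1 = -dL2norm).
    Positive when the end-effector moved closer to the target."""
    return -(l2_now - l2_prev)


# Made-up values chosen so the distances come out as round numbers.
motors = [0.0] * 7
target = [0.0, 0.0, 0.0]
obs_prev = build_observation(motors, target, [3.0, 4.0, 0.0])
obs_now = build_observation(motors, target, [0.0, 4.0, 0.0])
r1 = distance_reward(obs_now[-1], obs_prev[-1])
print(len(obs_now), obs_prev[-1], obs_now[-1], r1)
```

In the real controller, the target position would come from the supervisor's getPosition call and the end-effector position from IKPY's forward_kinematics, as described under the reward mechanism; here they are passed in as plain lists to keep the sketch self-contained.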