# 6/26 Paper #1
## RL Prerequisites
[Powerful blog covering the major RL policy-gradient algorithms](https://lilianweng.github.io/lil-log/2018/04/08/policy-gradient-algorithms.html)
## Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments
### Abstract:
In a multi-agent environment, training each agent with a single-agent method (e.g. PPO or A2C) is unstable: several different policies keep updating during training, so the collected experience is hard to fit and training struggles to converge. If the other agents' actions are simply folded into the environment, then whenever those agents' policies change the environment effectively changes as well, so experience collected before an update cannot be treated as coming from the same distribution as experience collected after it, and the variance of the gradient estimates grows. The idea proposed here is that if each agent's critic can see global information during training, including the other agents' observations and their corresponding actions, the resulting Q values should be much more stable.
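Concretely, each agent i keeps a decentralized actor μ_i that only sees its own observation o_i, while its critic Q_i is centralized and conditions on the joint observation x together with all agents' actions. The actor is then updated with the gradient below (roughly as given in the paper; D is the replay buffer of joint transitions):

$$
\nabla_{\theta_i} J(\mu_i) = \mathbb{E}_{x, a \sim \mathcal{D}}\Big[\nabla_{\theta_i} \mu_i(a_i \mid o_i)\, \nabla_{a_i} Q_i^{\mu}(x, a_1, \ldots, a_N)\big|_{a_i = \mu_i(o_i)}\Big]
$$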
### Related Work
DPG: Deterministic Policy Gradient [Paper Link](https://hal.inria.fr/file/index/docid/938992/filename/dpg-icml2014.pdf)
DDPG: Deep Deterministic Policy Gradient [Paper Link](https://arxiv.org/abs/1509.02971)
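For reference, DPG/DDPG update a deterministic policy μ_θ by following the critic's action gradient; MADDPG essentially keeps this update but gives every agent its own centralized critic. The deterministic policy gradient is roughly:

$$
\nabla_{\theta} J(\mu_{\theta}) = \mathbb{E}_{s \sim \rho^{\mu}}\Big[\nabla_{\theta} \mu_{\theta}(s)\, \nabla_{a} Q^{\mu}(s, a)\big|_{a = \mu_{\theta}(s)}\Big]
$$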
### Motivation
(1) Learned policies can only use local information (each agent's own observations) at execution time.
(2) No differentiable model of the environment dynamics is assumed.
(3) No particular structure is assumed for the communication method between agents.
#### 3.1 Overview
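Centralized training with decentralized execution: the extra information (other agents' observations and actions) is only used by the critics during training, and each actor needs nothing but its local observation at execution time. The centralized critic of agent i is fit by TD regression onto a target computed with all agents' target policies μ'_j (a sketch following the paper; x' is the next joint observation):

$$
\mathcal{L}(\theta_i) = \mathbb{E}_{x, a, r, x'}\Big[\big(Q_i^{\mu}(x, a_1, \ldots, a_N) - y\big)^2\Big],
\qquad
y = r_i + \gamma\, Q_i^{\mu'}(x', a_1', \ldots, a_N')\Big|_{a_j' = \mu_j'(o_j)}
$$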

#### 3.2 Inferring Policies of Other Agents
Equation 1 → the loss each agent uses to learn an approximation of the other agents' policies (sketched below).
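Assuming Equation 1 refers to the approximate-policy loss from the paper: agent i fits an estimate of agent j's policy by maximum likelihood with an entropy bonus (λ weights the entropy term H), and then uses these estimates in place of the true target policies when computing the critic target:

$$
\mathcal{L}(\phi_i^j) = -\mathbb{E}_{o_j, a_j}\Big[\log \hat{\mu}_{\phi_i^j}(a_j \mid o_j) + \lambda\, H\big(\hat{\mu}_{\phi_i^j}\big)\Big],
\qquad
\hat{y} = r_i + \gamma\, Q_i^{\mu'}\Big(x', \hat{\mu}'^{\,1}_{\phi_i}(o_1), \ldots, \mu_i'(o_i), \ldots, \hat{\mu}'^{\,N}_{\phi_i}(o_N)\Big)
$$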

#### 3.3 Agents with Policy Ensembles
Equation 2 → the ensemble training objective over each agent's K sub-policies (sketched below).
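Assuming Equation 2 refers to the ensemble objective from the paper: each agent i trains an ensemble of K sub-policies, samples one uniformly per episode, and maximizes the resulting return; each sub-policy k keeps its own replay buffer D_i^(k):

$$
J_e(\mu_i) = \mathbb{E}_{k \sim \mathrm{unif}(1, K),\; s \sim p^{\mu},\; a \sim \mu_i^{(k)}}\big[R_i(s, a)\big],
\qquad
\nabla_{\theta_i^{(k)}} J_e(\mu_i) = \frac{1}{K}\, \mathbb{E}_{x, a \sim \mathcal{D}_i^{(k)}}\Big[\nabla_{\theta_i^{(k)}} \mu_i^{(k)}(a_i \mid o_i)\, \nabla_{a_i} Q_i^{\mu_i}(x, a_1, \ldots, a_N)\Big|_{a_i = \mu_i^{(k)}(o_i)}\Big]
$$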

### Experiment
### Implementation
First, the pseudocode:
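Below is a minimal PyTorch sketch of one MADDPG update step, standing in for the paper's pseudocode rather than reproducing it. The network sizes, learning rates, γ, τ, and the batch layout (obs/next_obs of shape [B, N, obs_dim], act of shape [B, N, act_dim], rew/done of shape [B, N]) are illustrative assumptions, and exploration noise, the replay buffer, and action squashing are omitted.

```python
import torch
import torch.nn as nn

n_agents, obs_dim, act_dim = 3, 8, 2   # illustrative sizes
gamma, tau = 0.95, 0.01                # discount and soft-update rate (assumed)

def mlp(in_dim, out_dim):
    return nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(),
                         nn.Linear(64, 64), nn.ReLU(),
                         nn.Linear(64, out_dim))

# Decentralized actors: each one sees only its own observation.
actors = [mlp(obs_dim, act_dim) for _ in range(n_agents)]
target_actors = [mlp(obs_dim, act_dim) for _ in range(n_agents)]
# Centralized critics: each one sees all observations and all actions.
critics = [mlp(n_agents * (obs_dim + act_dim), 1) for _ in range(n_agents)]
target_critics = [mlp(n_agents * (obs_dim + act_dim), 1) for _ in range(n_agents)]
for net, target in zip(actors + critics, target_actors + target_critics):
    target.load_state_dict(net.state_dict())

actor_opts = [torch.optim.Adam(a.parameters(), lr=1e-3) for a in actors]
critic_opts = [torch.optim.Adam(c.parameters(), lr=1e-3) for c in critics]

def maddpg_update(batch):
    obs, act, rew, next_obs, done = (batch[k] for k in
                                     ("obs", "act", "rew", "next_obs", "done"))
    B = obs.shape[0]
    # Joint next action from all target actors (used only by the critics).
    with torch.no_grad():
        next_act = torch.stack([ta(next_obs[:, j])
                                for j, ta in enumerate(target_actors)], dim=1)
    for i in range(n_agents):
        # Centralized critic update: regress Q_i(x, a_1..a_N) onto the TD target y.
        with torch.no_grad():
            q_next = target_critics[i](torch.cat([next_obs.reshape(B, -1),
                                                  next_act.reshape(B, -1)], -1)).squeeze(-1)
            y = rew[:, i] + gamma * (1.0 - done[:, i]) * q_next
        q = critics[i](torch.cat([obs.reshape(B, -1),
                                  act.reshape(B, -1)], -1)).squeeze(-1)
        critic_loss = nn.functional.mse_loss(q, y)
        critic_opts[i].zero_grad()
        critic_loss.backward()
        critic_opts[i].step()

        # Decentralized actor update: replace only agent i's action and ascend Q_i.
        joint_act = act.clone()
        joint_act[:, i] = actors[i](obs[:, i])
        actor_loss = -critics[i](torch.cat([obs.reshape(B, -1),
                                            joint_act.reshape(B, -1)], -1)).mean()
        actor_opts[i].zero_grad()
        actor_loss.backward()
        actor_opts[i].step()

        # Soft (Polyak) update of agent i's target networks.
        for p, tp in zip(list(actors[i].parameters()) + list(critics[i].parameters()),
                         list(target_actors[i].parameters()) + list(target_critics[i].parameters())):
            tp.data.mul_(1 - tau).add_(tau * p.data)

# Illustrative call with random tensors standing in for a replay-buffer sample.
B = 32
batch = {"obs": torch.randn(B, n_agents, obs_dim),
         "act": torch.randn(B, n_agents, act_dim),
         "rew": torch.randn(B, n_agents),
         "next_obs": torch.randn(B, n_agents, obs_dim),
         "done": torch.zeros(B, n_agents)}
maddpg_update(batch)
```

Compared with independent DDPG learners, the only structural change is that each critic's input concatenates every agent's observation and action, which is exactly the "extra information at training time" idea from the abstract.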
