## HyperNeat [[Paper]](https://axon.cs.byu.edu/~dan/778/papers/NeuroEvolution/stanley3**.pdf)[[lecture]](https://www.youtube.com/watch?v=-bWBCpZxmxU)

* If CPPNs are to evolve and represent connectivity patterns, the problem is how to interpret their output so that it effectively describes such a structure.
* Compositional Pattern Producing Networks (CPPNs)
    * In biological genetic encoding, the mapping between genotype and phenotype is indirect. The phenotype typically contains orders of magnitude more structural components than the genotype contains genes.

![](https://hackmd.io/_uploads/r19O78YF3.png)

Compositional Pattern Producing Networks (CPPNs) are a novel abstraction of development that can represent sophisticated repeating patterns in Cartesian space [49, 50]. Unlike most generative and developmental encodings, CPPNs do not require an explicit simulation of growth or local interaction, yet still realize their essential functions. This section reviews CPPNs, which will be augmented in this paper to represent connectivity patterns and ANNs.

![](https://hackmd.io/_uploads/rJTJNUtt2.png)

* Mapping Spatial Patterns to Connectivity Patterns
    * It turns out that there is an effective mapping between spatial and connectivity patterns that can elegantly exploit geometry. The main idea is to input into the CPPN the coordinates of the two points that define a connection, rather than inputting only the position of a single point as in Section 2.1 of the paper. The output is interpreted as the weight of the connection rather than the intensity of a point. This way, connections can be defined in terms of the locations that they connect, thereby taking into account the network's geometry. The CPPN in effect computes a four-dimensional function $CPPN(x_1, y_1, x_2, y_2) = w$, where the first node is at $(x_1, y_1)$ and the second node is at $(x_2, y_2)$. This formalism returns a weight for every connection between every pair of nodes in the grid, including recurrent connections. By convention, a connection is not expressed if the magnitude of its weight, which may be positive or negative, is below a minimal threshold $w_{min}$. The magnitudes of weights above this threshold are scaled to lie between zero and a maximum magnitude in the substrate. That way, the pattern produced by the CPPN can represent any network topology. (A minimal sketch of this substrate query appears after the figures below.)

**Connectivity Patterns Produced by Connective CPPNs**
![](https://hackmd.io/_uploads/HkqhELKKn.png)

**Activation Patterns of the Same Connective CPPN at Different Resolutions**
![](https://hackmd.io/_uploads/r1_JSLtY2.png)

**Connectivity Motifs of the Same Substrate at Different Locations**
![](https://hackmd.io/_uploads/rkfeHLFtn.png)
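To make the substrate query above concrete, here is a minimal sketch. The `toy_cppn` function, grid resolution, and the `w_min`/`w_max` values are illustrative assumptions, not taken from the paper; a real HyperNEAT implementation would instead query an evolved CPPN produced by NEAT.

```python
import numpy as np

def toy_cppn(x1, y1, x2, y2):
    # Stand-in for an evolved CPPN: any function of the two endpoint
    # coordinates works; this one composes a sine with a Gaussian of the
    # distance purely for illustration (output magnitude stays within 1).
    d = np.hypot(x2 - x1, y2 - y1)
    return np.sin(3.0 * d) * np.exp(-d ** 2)

def build_substrate_weights(cppn, resolution=5, w_min=0.2, w_max=3.0):
    # Lay substrate nodes on a grid in [-1, 1]^2 and query the CPPN for every
    # ordered pair (x1, y1, x2, y2). Connections whose output magnitude is
    # below w_min are not expressed; the rest are rescaled into (0, w_max],
    # keeping the sign.
    coords = [(x, y)
              for x in np.linspace(-1, 1, resolution)
              for y in np.linspace(-1, 1, resolution)]
    n = len(coords)
    weights = np.zeros((n, n))
    for i, (x1, y1) in enumerate(coords):
        for j, (x2, y2) in enumerate(coords):
            w = cppn(x1, y1, x2, y2)
            if abs(w) >= w_min:
                # assumes the CPPN output magnitude is at most 1
                weights[i, j] = np.sign(w) * (abs(w) - w_min) / (1.0 - w_min) * w_max
    return coords, weights

coords, W = build_substrate_weights(toy_cppn)
print(f"{np.count_nonzero(W)} of {W.size} possible connections expressed")
```

Because every weight is a function of node coordinates, the same CPPN can be re-queried at a higher `resolution` to produce a larger network with the same connectivity pattern, which is what the "different resolutions" figure above illustrates.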
### Conclusion

> Because HyperNeat is suited to training for gradually varying image-pattern generation and coordinate-position discrimination, it is not suitable for the self-driving car project.

## Deep Deterministic Policy Gradient (DDPG) [[Doc1]](https://arxiv.org/pdf/1509.02971v2.pdf)[[Doc2]](https://keras.io/examples/rl/ddpg_pendulum/)[[Github1]](https://github.com/DeeKay3/TORCS_DDPG)[[Github2]](https://github.com/djo10/deep-rl-ddpg-self-driving-car)[[Github3]](https://github.com/wpiszlogin/driver_critic)[[Github4]](https://github.com/cookbenjamin/DDPG)

* In DDPG, the target network is a counterpart of the actor-critic network, with the same structure and parameterization.
* DDPG is a model-free, off-policy actor-critic deep RL algorithm inspired by the Deep Q-Network (DQN) algorithm. It combines the advantages of policy-gradient methods and Q-learning to learn a deterministic policy for continuous action spaces.

![](https://hackmd.io/_uploads/HJcjsdqt2.png)

* During training, a DDPG agent:
    * Updates the actor and critic properties at each time step during learning.
    * Stores past experiences in a circular experience buffer, and updates the actor and critic using a mini-batch of experiences randomly sampled from the buffer.
    * Perturbs the action chosen by the policy with a stochastic noise model at each training step.

![](https://hackmd.io/_uploads/rJzKj_ctn.png)

* Actor and Critic Functions
    * To estimate the policy and value function, a DDPG agent maintains four function approximators (a minimal sketch of these appears at the end of this note):
        * Actor π(S;θ) — The actor, with parameters θ, takes observation S and returns the corresponding action that maximizes the long-term reward.
        * Target actor πt(S;θt) — To improve the stability of the optimization, the agent periodically updates the target actor parameters θt using the latest actor parameter values.
        * Critic Q(S,A;ϕ) — The critic, with parameters ϕ, takes observation S and action A as inputs and returns the corresponding expectation of the long-term reward.
        * Target critic Qt(S,A;ϕt) — To improve the stability of the optimization, the agent periodically updates the target critic parameters ϕt using the latest critic parameter values.
    * Both Q(S,A;ϕ) and Qt(S,A;ϕt) have the same structure and parameterization, and both π(S;θ) and πt(S;θt) have the same structure and parameterization.
    * For more information on creating actors and critics for function approximation, see Create Policies and Value Functions. During training, the agent tunes the parameter values in θ. After training, the parameters remain at their tuned values and the trained actor function approximator is stored in π(S).

![](https://hackmd.io/_uploads/B1oKo_cFn.png)

### Conclusion

> DDPG fits the self-driving car topic, but its structure is completely different from Neat, so the implementation would need a major rewrite.

{%hackmd @themes/dracula %}
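To make the four function approximators and their target updates from the "Actor and Critic Functions" section above concrete, here is a minimal Keras sketch loosely in the spirit of the pendulum example linked as [Doc2]. The layer sizes, `num_states`, `num_actions`, `action_bound`, and the soft-update rate `tau` are illustrative assumptions, not values taken from any of the linked repositories.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_actor(num_states, num_actions, action_bound):
    # pi(S; theta): observation -> deterministic action in [-action_bound, action_bound]
    inputs = layers.Input(shape=(num_states,))
    x = layers.Dense(256, activation="relu")(inputs)
    x = layers.Dense(256, activation="relu")(x)
    outputs = layers.Dense(num_actions, activation="tanh")(x) * action_bound
    return tf.keras.Model(inputs, outputs)

def build_critic(num_states, num_actions):
    # Q(S, A; phi): observation and action -> scalar estimate of long-term reward
    state_in = layers.Input(shape=(num_states,))
    action_in = layers.Input(shape=(num_actions,))
    x = layers.Concatenate()([state_in, action_in])
    x = layers.Dense(256, activation="relu")(x)
    x = layers.Dense(256, activation="relu")(x)
    return tf.keras.Model([state_in, action_in], layers.Dense(1)(x))

# Hypothetical sizes for a driving-style observation/action space
num_states, num_actions, action_bound, tau = 29, 3, 1.0, 0.005

actor = build_actor(num_states, num_actions, action_bound)
critic = build_critic(num_states, num_actions)

# Target networks: same structure, initialized with the same parameters
target_actor = build_actor(num_states, num_actions, action_bound)
target_critic = build_critic(num_states, num_actions)
target_actor.set_weights(actor.get_weights())
target_critic.set_weights(critic.get_weights())

def soft_update(target, source, tau):
    # theta_t <- tau * theta + (1 - tau) * theta_t, applied after each learning step
    target.set_weights([tau * w + (1.0 - tau) * tw
                        for w, tw in zip(source.get_weights(), target.get_weights())])

soft_update(target_actor, actor, tau)
soft_update(target_critic, critic, tau)
```

A small `tau` makes the targets track the learned networks slowly, which is the stabilizing trick the target actor/critic bullets above refer to; some implementations instead copy the parameters outright every fixed number of steps.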