# Reinforcement Learning via Fenchel-Rockafellar Duality

#### Author: [Sharath](https://sharathraparthy.github.io/)

**Note:** These notes are incomplete.

### Outline

This paper reviews the basic concepts of convex duality and explores the very useful Fenchel-Rockafellar duality in detail. The authors also discuss the applicability of this powerful technique to reinforcement learning.

### Introduction

In reinforcement learning, we traditionally operate in an MDP, and dynamic programming (DP) is often used to solve for the optimal solution. Well-known methods that build on DP include SARSA, Q-learning, and their deep-learning variants. But there is also another paradigm in RL based on linear programming, where the optimization problem couples a linear objective with linear constraints. In this paper, the authors discuss formulating RL as a convex objective with linear constraints.

### Convex Duality

In optimization, we can view any problem in two ways:
1. Primal problem
2. Dual problem

The advantage of duality is that it can offer a computational advantage, for instance when the primal has few variables but a large number of constraints. (TODO: Write more about duality here and why it is an important concept)

### Fenchel Conjugate

The *Fenchel conjugate* $f_{*}$ of a function $f: \Omega \rightarrow \mathbb{R}$ is defined as
$$f_{*}(y) = \max_{x \in \Omega} \langle x, y \rangle - f(x)$$
The interpretation is that the Fenchel conjugate $f_{*}(y)$ is simply the maximum amount by which the linear functional $\langle x, y \rangle$ (a dot product) exceeds $f(x)$.

Just for the sake of intuition, let's consider an example. Let $f(x) = x^{2} - 2x + 2$. The Fenchel conjugate can be written as
$$f_{*}(y) = \max_{x \in \Omega} \langle x, y \rangle - (x^2 - 2x + 2) = \max_{x \in \Omega} (xy - x^2 + 2x - 2)$$
To find the maximum, we differentiate the expression above with respect to $x$ and set it to zero. Doing so gives $x^{*} = \frac{y + 2}{2}$. Evaluating the objective at this point, we get
$$f_{*}(y) = \frac{y^{2}}{4} + y - 1$$

![](https://i.imgur.com/z4BIllM.jpg)

In the figure above, we overlay the line $yx$ on $f(x)$. For the chosen slope $y$, the vertical gap $yx - f(x)$ is maximized at $x = x^{*}$ (here at $x = 2$), and that maximum gap is exactly the Fenchel conjugate $f_{*}(y)$. (TODO: Make this more intuitive. I don't really get the intuition.)
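To sanity-check the closed form above, here is a quick numeric sketch (not from the paper; it only assumes NumPy, and the function names are illustrative). It approximates $f_{*}(y) = \max_x \, xy - f(x)$ by a grid search and compares the result against $\frac{y^2}{4} + y - 1$:

```python
import numpy as np

def f(x):
    # The example function from the text: f(x) = x^2 - 2x + 2
    return x**2 - 2*x + 2

def fenchel_conjugate(y, xs):
    # f_*(y) = max_x <x, y> - f(x), approximated by a grid search over xs
    return np.max(xs * y - f(xs))

xs = np.linspace(-10.0, 10.0, 100001)    # dense grid over a bounded domain
for y in [-1.0, 0.0, 2.0, 3.5]:
    numeric = fenchel_conjugate(y, xs)
    closed_form = y**2 / 4 + y - 1       # derived above via x* = (y + 2) / 2
    print(f"y = {y:+.1f}  grid = {numeric:.4f}  closed form = {closed_form:.4f}")
```

The two values agree up to the grid resolution, which confirms the maximizer $x^{*} = \frac{y+2}{2}$ and the resulting conjugate.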
### Indicator Function

It is defined as follows:
$$\delta_{C}(x) = \begin{cases} 0 & x \in C \\ \infty & \text{otherwise} \end{cases}$$
In constrained optimization problems, this function comes in handy for expressing the constraints compactly. For example, the optimization problem $\min_{Ax=0} f(x)$ can also be written as $\min_{x} f(x) + \delta_{\{0\}}(Ax)$.

### $f$-divergences

An $f$-divergence measures the discrepancy between two distributions. Formally, given two probability distributions $p$ and $q$, the $f$-divergence is written as
$$D_{f}(p || q) = \mathbb{E}_{z \sim q} \left[ f\left(\frac{p(z)}{q(z)}\right) \right]$$

#### Properties

1. **Non-negativity:** The $f$-divergence is always non-negative, and it is zero when the two distributions match.
2. **Monotonicity:** If $k$ is an arbitrary transition probability that transforms $P$ and $Q$ into $P_k$ and $Q_k$, then $D_{f}(P || Q) \geq D_{f}(P_k || Q_k)$.
3. **Convexity:** The $f$-divergence is jointly convex in $(p, q)$. (TODO: Check)

### Fenchel-Rockafellar Duality

This is a very useful tool, built on Fenchel conjugates, for tackling optimization problems. Let's say we have the primal problem
$$\min_{x \in \Omega} J_{P}(x) = f(x) + g(Ax),$$
where $f, g : \Omega \rightarrow \mathbb{R}$ are convex, lower semi-continuous functions and $A$ is a linear operator. Now, let's convert this primal problem into its dual. Writing $g$ in terms of its conjugate, $g(Ax) = \max_{y \in \Omega^{*}} \langle y, Ax \rangle - g_{*}(y)$, and interchanging the min and max (justified under the appropriate strong-duality conditions), we get
$$\begin{aligned}
\min_{x \in \Omega} f(x) + g(Ax) &= \max_{y \in \Omega^{*}} \min_{x \in \Omega} \{ f(x) + \langle y, Ax \rangle \} - g_{*}(y) \\
&= \max_{y \in \Omega^{*}} -\max_{x \in \Omega} \{ -f(x) + \langle -A_{*}y, x \rangle \} - g_{*}(y) \\
&= \max_{y \in \Omega^{*}} -f_{*}(-A_{*}y) - g_{*}(y),
\end{aligned}$$
where $A_{*}$ denotes the adjoint of $A$.
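As a quick illustration of this duality (not from the paper; a minimal sketch assuming the toy choices $f(x) = \|x\|^2$ and $g(z) = \|z - b\|^2$, for which both conjugates have closed forms), we can solve the primal and the dual of $\min_x f(x) + g(Ax)$ in closed form and check that their optimal values coincide:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 4, 3
A = rng.normal(size=(m, n))      # linear operator A : R^n -> R^m
b = rng.normal(size=m)

# Primal: min_x f(x) + g(Ax) with f(x) = ||x||^2 and g(z) = ||z - b||^2.
# Setting the gradient to zero gives (I + A^T A) x = A^T b.
x_star = np.linalg.solve(np.eye(n) + A.T @ A, A.T @ b)
primal = x_star @ x_star + np.sum((A @ x_star - b) ** 2)

# Conjugates: f_*(u) = ||u||^2 / 4 and g_*(y) = <b, y> + ||y||^2 / 4, so the
# dual is max_y -||A^T y||^2 / 4 - <b, y> - ||y||^2 / 4, whose maximizer
# solves (I + A A^T) y = -2 b.
y_star = np.linalg.solve(np.eye(m) + A @ A.T, -2 * b)
dual = -np.sum((A.T @ y_star) ** 2) / 4 - b @ y_star - y_star @ y_star / 4

print(f"primal optimum: {primal:.6f}")
print(f"dual optimum:   {dual:.6f}")   # matches the primal, as the duality predicts
```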