Bregman Cost Functions in Optimal Transport

###### tags: `one-offs` `transport` `bregman` # Bregman Cost Functions in Optimal Transport **Overview**: In this note, I present a problem in Optimal Transport which involves Bregman divergences, and sketch its solution. ## Problem Consider a measure transport problem with cost given by a Bregman divergence, i.e. \begin{align} c\left(x,y\right)=F\left(y\right)-F\left(x\right)-\left\langle \nabla F\left(x\right),y-x\right\rangle \end{align} with $F$ e.g. strictly convex and $C^{2}$, so that one is aiming to solve \begin{align} \min_{\gamma\in\Gamma\left(\mu,\nu\right)}\left\{ C\left(\gamma\right):=\int\gamma\left(\mathrm{d}x,\mathrm{d}y\right)\cdot c\left(x,y\right)\right\} . \end{align} **Task**: Show by a (nonlinear) change of coordinates that this is equivalent to solving an $L^{2}$ OT problem for another pair of probability measures. **Bonus Task**: Deduce a Brenier-type structure theorem for the $c$-optimal coupling between the measures (being appropriately generous with assumptions, etc.). ## Solution Write $\hat{x}=\nabla F\left(x\right)$, $\hat{\mu}=\left(\nabla F\right)_{\#}\mu$, so that $c\left(x,y\right)=:\hat{c}\left(\hat{x},y\right)=F^{*}\left(\hat{x}\right)+F\left(y\right)-\left\langle \hat{x},y\right\rangle$. Similarly, translate between $\Gamma\left(\mu,\nu\right)$ and $\Gamma\left(\hat{\mu},\nu\right)$ through the map $\nabla F\otimes\mathrm{Id}$, writing $\int\gamma\left(\mathrm{d}x,\mathrm{d}y\right)\cdot c\left(x,y\right) = \int\hat{\gamma}\left(\mathrm{d}\hat{x},\mathrm{d}y\right)\cdot\hat{c}\left(\hat{x},y\right)$. Note now that \begin{align} \hat{c}\left(\hat{x},y\right) &=F^{*}\left(\hat{x}\right)+F\left(y\right)-\left\langle \hat{x},y\right\rangle \\ &=\left(F^{*}\left(\hat{x}\right)-\frac{1}{2}\left|\hat{x}\right|_{2}^{2}\right)+\left(F\left(y\right)-\frac{1}{2}\left|y\right|_{2}^{2}\right)+\frac{1}{2}\left|x-y\right|_{2}^{2}, \end{align} and so \begin{align} C\left(\gamma\right)=\hat{C}\left(\hat{\gamma}\right)=&\hat{\mu}\left(F^{*}-\frac{1}{2}\left|\cdot\right|_{2}^{2}\right)+\nu\left(F-\frac{1}{2}\left|\cdot\right|_{2}^{2}\right)\\ &+\int\hat{\gamma}\left(\mathrm{d}\hat{x},\mathrm{d}y\right)\cdot\left(\frac{1}{2}\left|x-y\right|_{2}^{2}\right). \end{align} Noting that the first two terms on the right-hand side are independent of the chosen coupling, one sees that the cost is minimised by taking $\hat{\gamma}$ to be the $L^{2}$-optimal transport coupling between $\hat{\mu}$ and $\nu$. As such, naive 'Bregman-ising' of the optimal transport seems not to lead to essentially new problems. For the bonus problem, note that the Brenier theorem says that $L^{2}$-optimal transport from $\hat{\mu}$ to $\nu$ can be attained through a map of the form $T\left(x\right)=\nabla\Phi\left(x\right)$, for some convex $\Phi$. Noting that $\hat{\mu}$ is obtained from $\mu$ by transport under the map $\nabla F$, it follows that the Bregman-optimal mapping from $\mu$ to $\nu$ takes the form $\hat{T}\left(x\right)=\left(\nabla\Phi\circ\nabla F\right)\left(x\right)$, and we conclude.