# Optimization of the sound field problem with hyperparameters and a mixed signal model.
These are ideas I have had based on the model proposed in example #3 of the document.
The pressure field is represented by a linear combination of kernel functions [Ueno+ IWAENC 2018] for the incident component plus spherical wave functions for the scattered component:
\begin{align}
u(\boldsymbol{r}) &= \sum_{m=1}^M \alpha_m \kappa(\boldsymbol{r},\boldsymbol{r}_m) + \sum_{\nu=0}^{\infty}\sum_{\mu=-\nu}^{\nu} \mathring{u}_{\mathrm{sct},\nu,\mu} \varphi_{\mathrm{sct},\nu,\mu}(\boldsymbol{r}) \\
&\approx \sum_{m=1}^M \alpha_m \kappa(\boldsymbol{r},\boldsymbol{r}_m) + \sum_{\nu=0}^{N}\sum_{\mu=-\nu}^{\nu} \mathring{u}_{\mathrm{sct},\nu,\mu} \varphi_{\mathrm{sct},\nu,\mu}(\boldsymbol{r})
\end{align}
with the kernel function
\begin{align}
\kappa(\boldsymbol{r}_1,\boldsymbol{r}_2) = j_0 \left( \sqrt{\left( \mathrm{j} \rho \boldsymbol{\eta}_{\mathrm{pr}} - k(\boldsymbol{r}_1-\boldsymbol{r}_2) \right)^{\mathsf{T}} \left( \mathrm{j} \rho \boldsymbol{\eta}_{\mathrm{pr}} - k(\boldsymbol{r}_1-\boldsymbol{r}_2) \right) } \right)
\end{align}
where $\boldsymbol{\eta}_{\mathrm{pr}}\in\mathbb{S}^2$ is the prior source arrival direction and $\rho\in\mathbb{R}$ is a hyperparameter.
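As a quick sanity check, here is a minimal numpy sketch of the Gram matrix $\mathbf{K}$ induced by this kernel (the names `k_wave`, `eta`, `rho` and the vectorization are my own; note that the product $\boldsymbol{v}^\mathsf{T}\boldsymbol{v}$ is unconjugated, so the argument of $j_0$ is complex):
```python
import numpy as np

def j0(z):
    """Order-0 spherical Bessel function sin(z)/z, valid for complex z."""
    z = np.asarray(z, dtype=complex)
    safe = np.where(np.abs(z) < 1e-12, 1.0, z)    # avoid 0/0 at z = 0
    return np.where(np.abs(z) < 1e-12, 1.0, np.sin(safe) / safe)

def kernel_gram(R, k_wave, eta, rho):
    """K[i, j] = kappa(r_i, r_j) for an M x 3 array of positions R."""
    d = R[:, None, :] - R[None, :, :]              # pairwise r_i - r_j
    v = 1j * rho * eta[None, None, :] - k_wave * d
    return j0(np.sqrt(np.sum(v * v, axis=-1)))     # v^T v, no conjugation
```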
# Norm-based criteria
My first instinct is to pose this problem as a classical $\ell_p$-norm loss with $\ell_p$-norm regularization.
\begin{equation}
\min_{\boldsymbol{\alpha} \in \mathbb{C}^M,\ \mathring{\boldsymbol{u}}_\mathrm{sct}\in \mathbb{C}^{(N+1)^2}} \|\mathbf{s} - \mathbf{K}\boldsymbol{\alpha} -\boldsymbol{\Phi}\mathring{\boldsymbol{u}}_\mathrm{sct}\|_{p_1}^{p_1} + \lambda_1 \|\boldsymbol{\alpha}\|_{p_2}^{p_2} + \lambda_2\|\mathring{\boldsymbol{u}}_\mathrm{sct}\|_{p_3}^{p_3}
\end{equation}
In this case, $p_1$, $p_2$, and $p_3$ can of course take any value, presumably in the interval $(0,2]$, and not necessarily the same value for all three. However, two notable cases come to mind:
* $p_1=p_2=p_3=2$
* $p_2=p_3=1$
The case $p=2$ for all three gives us a closed-form solution to the optimization problem, while setting $p=1$ in the regularization terms returns sparse representations, which are more robust to noise and tend to perform better in practice.
## The $p_1=p_2=p_3=2$ case.
The advantage of the $\ell_2$ losses is that, in this case, we do have a closed-form solution to the problem. First, let's define the loss as a function:
\begin{equation}
\mathcal{J}(\boldsymbol{\alpha}, \mathring{\boldsymbol{u}}_\mathrm{sct};\rho) := \|\mathbf{s} - \mathbf{K}\boldsymbol{\alpha} -\boldsymbol{\Phi}\mathring{\boldsymbol{u}}_\mathrm{sct}\|_2^2 + \lambda_1 \|\boldsymbol{\alpha}\|_2^2 + \lambda_2\|\mathring{\boldsymbol{u}}_\mathrm{sct}\|_2^2
\end{equation}
We can obtain a closed-form solution thanks to a simple fact: the loss can be optimized separately for $\boldsymbol{\alpha}$ and $\mathring{\boldsymbol{u}}_\mathrm{sct}$. To optimize over $\boldsymbol{\alpha}$ alone, we define $\mathbf{s}_\mathrm{sct} = \mathbf{s} - \boldsymbol{\Phi}\mathring{\boldsymbol{u}}_\mathrm{sct}$, i.e., we treat $\mathring{\boldsymbol{u}}_\mathrm{sct}$ as already found, and consider the partial problem below:
\begin{equation}
\min_{\boldsymbol{\alpha}\in \mathbb{C}^M} \|\mathbf{s}_\mathrm{sct} - \mathbf{K}\boldsymbol{\alpha}\|_2^2 + \lambda_1\|\boldsymbol{\alpha}\|_2^2
\end{equation}
This is a classical ridge regression problem whose solution is known (recall that the Gram matrix $\mathbf{K}$ is Hermitian, so $\mathbf{K}^\mathsf{H}=\mathbf{K}$):
\begin{equation}
\boldsymbol{\alpha}_\mathrm{sct} = (\mathbf{K}^2+\lambda_1\mathbf{I})^{-1}\mathbf{K}\mathbf{s}_\mathrm{sct}
\end{equation}
In other words:
\begin{align}
\|\mathbf{s}-\mathbf{K}\boldsymbol{\alpha} -\boldsymbol{\Phi}\mathring{\boldsymbol{u}}_\mathrm{sct} \|_2^2+\lambda_1 \|\boldsymbol{\alpha}\|_2^2 &= \|\mathbf{s}_\mathrm{sct} - \mathbf{K}\boldsymbol{\alpha} \|_2^2+\lambda_1 \|\boldsymbol{\alpha}\|_2^2\\
&\geq \|\mathbf{s}_\mathrm{sct} - \mathbf{K}\boldsymbol{\alpha}_\mathrm{sct} \|_2^2+\lambda_1 \|\boldsymbol{\alpha}_\mathrm{sct}\|_2^2
\end{align}
The bound on the right can be reduced further using the explicit expression of $\boldsymbol{\alpha}_\mathrm{sct}$ and the eigendecomposition $\mathbf{K}=\mathbf{V} \boldsymbol{\Sigma}\mathbf{V}^\mathsf{H}$ (which exists since $\mathbf{K}$ is Hermitian) to obtain:
\begin{align}
\|\mathbf{s}_\mathrm{sct} - \mathbf{K}\boldsymbol{\alpha}_\mathrm{sct} \|_2^2+\lambda_1 \|\boldsymbol{\alpha}_\mathrm{sct}\|_2^2 &= \|\mathbf{s}_\mathrm{sct} - \mathbf{K}(\mathbf{K}^2 +\lambda_1\mathbf{I})^{-1}\mathbf{K}\mathbf{s}_\mathrm{sct}\|_2^2 + \lambda_1 \|(\mathbf{K}^2+\lambda_1\mathbf{I})^{-1}\mathbf{K}\mathbf{s}_\mathrm{sct}\|_2^2\\
&=\|(\mathbf{V}\mathbf{V}^\mathsf{H} - \mathbf{V}\boldsymbol{\Sigma}(\boldsymbol{\Sigma}^2+\lambda_1\mathbf{I})^{-1}\boldsymbol{\Sigma}\mathbf{V}^\mathsf{H})\mathbf{s}_\mathrm{sct}\|_2^2+\lambda_1 \|(\mathbf{K}^2+\lambda_1\mathbf{I})^{-1}\mathbf{K}\mathbf{s}_\mathrm{sct}\|_2^2
\end{align}
Since $\boldsymbol{\Sigma}$ is diagonal, it commutes with $(\boldsymbol{\Sigma}^2+\lambda_1\mathbf{I})^{-1}$, meaning:
\begin{align}
\boldsymbol{\Sigma}(\boldsymbol{\Sigma}^2+\lambda_1\mathbf{I})^{-1}\boldsymbol{\Sigma}&=\boldsymbol{\Sigma}^2(\boldsymbol{\Sigma}^2+\lambda_1\mathbf{I})^{-1} = (\boldsymbol{\Sigma}^2+\lambda_1\mathbf{I}-\lambda_1 \mathbf{I})(\boldsymbol{\Sigma}^2+\lambda_1\mathbf{I})^{-1}\\
&=\mathbf{I} - \lambda_1(\boldsymbol{\Sigma}^2+\lambda_1\mathbf{I})^{-1}
\end{align}
It follows that:
\begin{equation}
\mathbf{V}\mathbf{V}^\mathsf{H} - \mathbf{V}\boldsymbol{\Sigma}(\boldsymbol{\Sigma}^2+\lambda_1\mathbf{I})^{-1}\boldsymbol{\Sigma}\mathbf{V}^\mathsf{H} = \lambda_1 \mathbf{V}(\boldsymbol{\Sigma}^2+\lambda_1\mathbf{I})^{-1}\mathbf{V}^ \mathsf{H}=\lambda_1(\mathbf{K}^2 + \lambda_1\mathbf{I})^{-1},
\end{equation}
and thus,
\begin{align}
\|\mathbf{s}_\mathrm{sct} - \mathbf{K}\boldsymbol{\alpha}_\mathrm{sct} \|_2^2+\lambda_1 \|\boldsymbol{\alpha}_\mathrm{sct}\|_2^2 &= \|\lambda_1(\mathbf{K}^2 + \lambda_1\mathbf{I})^{-1}\mathbf{s}_\mathrm{sct}\|_2^2+\lambda_1\|(\mathbf{K}^2+\lambda_1\mathbf{I})^{-1}\mathbf{K}\mathbf{s}_\mathrm{sct}\|_2^2\\
&=\lambda_1^2\mathbf{s}_\mathrm{sct}^\mathsf{H}(\mathbf{K}^2+\lambda_1\mathbf{I})^{-2}\mathbf{s}_\mathrm{sct}+\lambda_1\mathbf{s}_\mathrm{sct}^\mathsf{H}\mathbf{K}(\mathbf{K}^2+\lambda_1\mathbf{I})^{-2}\mathbf{K}\mathbf{s}_\mathrm{sct}\\
&=\mathbf{s}_\mathrm{sct}^\mathsf{H}\mathbf{V}
\left ( \lambda^2_1(\boldsymbol{\Sigma}^2 + \lambda_1\mathbf{I})^{-2} + \lambda_1 \boldsymbol{\Sigma}(\boldsymbol{\Sigma}^2 + \lambda_1\mathbf{I})^{-2}\boldsymbol{\Sigma} \right )\mathbf{V}^\mathsf{H} \mathbf{s}_\mathrm{sct}\\
&=\lambda_1\mathbf{s}_\mathrm{sct}^\mathsf{H}\mathbf{V}(\boldsymbol{\Sigma}^2+\lambda_1 \mathbf{I})^{-1}\mathbf{V}^\mathsf{H} \mathbf{s}_\mathrm{sct}\\
&=\lambda_1\mathbf{s}_\mathrm{sct}^\mathsf{H}(\mathbf{K}^2+\lambda_1\mathbf{I})^{-1}\mathbf{s}_\mathrm{sct},
\end{align}
so we obtain the following bound:
\begin{align}
\mathcal{J}(\boldsymbol{\alpha}, \mathring{\boldsymbol{u}}_\mathrm{sct};\rho) &= \|\mathbf{s} - \mathbf{K}\boldsymbol{\alpha} -\boldsymbol{\Phi}\mathring{\boldsymbol{u}}_\mathrm{sct}\|_2^2 + \lambda_1 \|\boldsymbol{\alpha}\|_2^2 + \lambda_2\|\mathring{\boldsymbol{u}}_\mathrm{sct}\|_2^2\\
&=(\|\mathbf{s}_\mathrm{sct} - \mathbf{K}\boldsymbol{\alpha}\|_2^2+\lambda_1\|\boldsymbol{\alpha}\|_2^2) + \lambda_2\|\mathring{\boldsymbol{u}}_\mathrm{sct}\|_2^2\\
&\geq\lambda_1\mathbf{s}_\mathrm{sct}^\mathsf{H}(\mathbf{K}^2+\lambda_1\mathbf{I})^{-1}\mathbf{s}_\mathrm{sct}+\lambda_2\|\mathring{\boldsymbol{u}}_\mathrm{sct}\|_2^2.
\end{align}
We can define a simpler optimization criterion by remembering that $\boldsymbol{\alpha}_\mathrm{sct}$ is a function of $\mathring{\boldsymbol{u}}_\mathrm{sct}$ (through $\mathbf{s}_\mathrm{sct}$):
\begin{align}
\mathcal{J}_\mathrm{sct}(\mathring{\boldsymbol{u}}_\mathrm{sct}) = \mathcal{J}(\boldsymbol{\alpha}_\mathrm{sct}, \mathring{\boldsymbol{u}}_\mathrm{sct}) = \lambda_1\mathbf{s}_\mathrm{sct}^\mathsf{H}(\mathbf{K}^2+\lambda_1\mathbf{I})^{-1}\mathbf{s}_\mathrm{sct}+\lambda_2\|\mathring{\boldsymbol{u}}_\mathrm{sct}\|_2^2
\end{align}
And thus:
\begin{equation}
\mathcal{J}_\mathrm{sct}(\mathring{\boldsymbol{u}}_\mathrm{sct}) \leq \mathcal{J}(\boldsymbol{\alpha}, \mathring{\boldsymbol{u}}_\mathrm{sct})\ \forall \boldsymbol{\alpha}, \mathring{\boldsymbol{u}}_\mathrm{sct}\Rightarrow \inf_{\boldsymbol{\alpha}, \mathring{\boldsymbol{u}}_\mathrm{sct}} \mathcal{J}(\boldsymbol{\alpha}, \mathring{\boldsymbol{u}}_\mathrm{sct}) \geq \inf_{\mathring{\boldsymbol{u}}_\mathrm{sct}} \mathcal{J}_\mathrm{sct}(\mathring{\boldsymbol{u}}_\mathrm{sct})
\end{equation}
Therefore, if we find the value of $\mathring{\boldsymbol{u}}_\mathrm{sct}$ that minimizes $\mathcal{J}_\mathrm{sct}$, we have solved the problem. We now expand $\mathcal{J}_\mathrm{sct}$ into matrix form.
\begin{align}
\mathcal{J}_\mathrm{sct}(\mathring{\boldsymbol{u}}_\mathrm{sct}) &= \lambda_1(\mathbf{s}-\boldsymbol{\Phi}\mathring{\boldsymbol{u}}_\mathrm{sct})^\mathsf{H}(\mathbf{K}^2+\lambda_1\mathbf{I})^{-1}(\mathbf{s}-\boldsymbol{\Phi}\mathring{\boldsymbol{u}}_\mathrm{sct}) + \lambda_2\mathring{\boldsymbol{u}}_\mathrm{sct}^\mathsf{H}\mathring{\boldsymbol{u}}_\mathrm{sct}\\
&=\lambda_1\mathring{\boldsymbol{u}}_\mathrm{sct}^\mathsf{H}\boldsymbol{\Phi}^\mathsf{H}(\mathbf{K}^2+\lambda_1\mathbf{I})^{-1}\boldsymbol{\Phi}\mathring{\boldsymbol{u}}_\mathrm{sct}-\lambda_1\mathring{\boldsymbol{u}}^\mathsf{H}_\mathrm{sct}\boldsymbol{\Phi}^\mathsf{H}(\mathbf{K}^2+\lambda_1\mathbf{I})^{-1}\mathbf{s}\\
&\ \ \ \ \ -\lambda_1\mathbf{s}^\mathsf{H}(\mathbf{K}^2+\lambda_1\mathbf{I})^{-1}\boldsymbol{\Phi}\mathring{\boldsymbol{u}}_\mathrm{sct}+\lambda_1\mathbf{s}^\mathsf{H}(\mathbf{K}^2+\lambda_1\mathbf{I})^{-1}\mathbf{s}+\lambda_2\mathring{\boldsymbol{u}}_\mathrm{sct}^\mathsf{H}\mathring{\boldsymbol{u}}_\mathrm{sct}
\end{align}
This is a classic quadratic form, whose minimizer is known:
\begin{equation}
\mathring{\boldsymbol{u}}_\mathrm{opt} = \arg\min_{\mathring{\boldsymbol{u}}_\mathrm{sct}} \mathcal{J}_\mathrm{sct} (\mathring{\boldsymbol{u}}_\mathrm{sct})= \left (\boldsymbol{\Phi}^\mathsf{H}(\mathbf{K}^2+\lambda_1\mathbf{I})^{-1}\boldsymbol{\Phi} + \frac{\lambda_2}{\lambda_1}\mathbf{I} \right )^{-1}\boldsymbol{\Phi}^\mathsf{H}(\mathbf{K}^2+\lambda_1\mathbf{I})^{-1}\mathbf{s}
\end{equation}
Since the infimum of $\mathcal{J}_\mathrm{sct}$ is attained, it coincides with the infimum of $\mathcal{J}$, meaning we have a solution in closed form given as:
\begin{align}
&\mathring{\boldsymbol{u}}_\mathrm{opt} = \left (\boldsymbol{\Phi}^\mathsf{H}(\mathbf{K}^2+\lambda_1\mathbf{I})^{-1}\boldsymbol{\Phi} + \frac{\lambda_2}{\lambda_1}\mathbf{I} \right )^{-1}\boldsymbol{\Phi}^\mathsf{H}(\mathbf{K}^2+\lambda_1\mathbf{I})^{-1}\mathbf{s}\\
&\boldsymbol{\alpha}_\mathrm{opt} = (\mathbf{K}^2+\lambda_1\mathbf{I})^{-1} \mathbf{K}(\mathbf{s}-\boldsymbol{\Phi}\mathring{\boldsymbol{u}}_\mathrm{opt})
\end{align}
It might be reducible further, perhaps via the eigendecompositions of the matrices above.
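For reference, a direct numpy sketch of this closed form (matrix names follow the text; the explicit inverse is for clarity only, and the regularizer argument `R` is my own addition so the same routine can be reused with $\mathbf{D}_\mathrm{sct}$ in the gradient-based section below):
```python
import numpy as np

def solve_l2(K, Phi, s, lam1, lam2, R=None):
    """Closed-form solution of the p1 = p2 = p3 = 2 problem (sketch)."""
    if R is None:
        R = np.eye(Phi.shape[1])
    A = np.linalg.inv(K @ K + lam1 * np.eye(K.shape[0]))  # (K^2 + lam1 I)^{-1}
    u_opt = np.linalg.solve(Phi.conj().T @ A @ Phi + (lam2 / lam1) * R,
                            Phi.conj().T @ A @ s)
    alpha_opt = A @ K @ (s - Phi @ u_opt)
    return alpha_opt, u_opt
```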
## The $p_1=2$ and $p_2=p_3=1$ case.
I won't be as formal as with the previous case. Here, I propose using a surrogate function, leveraging the fact that we know the solution to the $\ell_2$ case to build the iterative steps: majorizing each $\ell_1$ term by a quadratic turns every iteration into a reweighted version of the $\ell_2$ problem above.
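Below is a minimal sketch of what I have in mind, assuming the standard majorizer $|x| \leq \frac{1}{2}\left(|x|^2/|x^{(t)}|+|x^{(t)}|\right)$ (tight at $x = x^{(t)}$): each iteration then reduces to a weighted ridge problem on the stacked model $[\mathbf{K}\ \boldsymbol{\Phi}]$. This is my own concretization rather than a worked derivation:
```python
import numpy as np

def irls_l1(K, Phi, s, lam1, lam2, n_iter=50, eps=1e-8):
    """Hypothetical IRLS/MM sketch for p1 = 2, p2 = p3 = 1."""
    A = np.hstack([K, Phi])                           # stacked model [K Phi]
    M = K.shape[1]
    lam = np.concatenate([np.full(K.shape[1], lam1),
                          np.full(Phi.shape[1], lam2)])
    x = np.linalg.lstsq(A, s, rcond=None)[0]          # warm start
    G, b = A.conj().T @ A, A.conj().T @ s
    for _ in range(n_iter):
        w = lam / (2.0 * np.maximum(np.abs(x), eps))  # majorizer weights
        x = np.linalg.solve(G + np.diag(w), b)        # weighted ridge step
    return x[:M], x[M:]                               # alpha, u_sct
```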
# RKHS-based criteria
Another way of wording the problem that would interest us is as a partial-RKHS, partial data-based regression problem. We introduce the optimization problem:
\begin{equation}
\min_{u \in \mathscr{H},\ \mathring{\boldsymbol{u}}_\mathrm{sct}\in \mathbb{C}^{(N+1)^2}} \sum_{m=1}^M |s(\mathbf{r}_m) - u(\mathbf{r}_m)-u_\mathrm{sct}(\mathbf{r}_m)|^{p_1} + \lambda_1 \|u\|_\mathscr{H}^{2} + \lambda_2\|\mathring{\boldsymbol{u}}_\mathrm{sct}\|_{p_2}^{p_2}
\end{equation}
The division into cases is similar, except that the norm of $u$ has to be quadratic because of the RKHS structure. Because we know $u$ lies in an RKHS, we can express it as a linear combination of kernel functions (representer theorem). Thus, an equivalent formulation is:
\begin{equation}
\min_{\boldsymbol{\alpha} \in \mathbb{C}^M,\ \mathring{\boldsymbol{u}}_\mathrm{sct}\in \mathbb{C}^{(N+1)^2}} \sum_{m=1}^M |s(\mathbf{r}_m) - u(\mathbf{r}_m)-u_\mathrm{sct}(\mathbf{r}_m)|^{p_1} + \lambda_1 \boldsymbol{\alpha}^\mathsf{H} \mathbf{K} \boldsymbol{\alpha} + \lambda_2\|\mathring{\boldsymbol{u}}_\mathrm{sct}\|_{p_2}^{p_2}
\end{equation}
## The $p_1=p_2=2$ case
For the most part, this derivation is very similar to the previous $\ell_2$ case. We know the form of $u$ due to the representer theorem, so I'll define a loss function:
\begin{equation}
\mathcal{J}(\boldsymbol{\alpha}, \mathring{\boldsymbol{u}}_\mathrm{sct}) := \|\mathbf{s} - \mathbf{K}\boldsymbol{\alpha} -\boldsymbol{\Phi}\mathring{\boldsymbol{u}}_\mathrm{sct}\|_2^2 + \lambda_1 \boldsymbol{\alpha}^\mathsf{H}\mathbf{K}\boldsymbol{\alpha} + \lambda_2\|\mathring{\boldsymbol{u}}_\mathrm{sct}\|_2^2
\end{equation}
We first act as though we have solved the problem for $u_\mathrm{sct}$ and focus on solving for $u$:
\begin{equation}
\min_{\boldsymbol{\alpha}\in \mathbb{C}^M}\|\mathbf{s}_\mathrm{sct}-\mathbf{K}\boldsymbol{\alpha}\|_2^2 + \lambda_1 \boldsymbol{\alpha}^\mathsf{H} \mathbf{K} \boldsymbol{\alpha}
\end{equation}
This is the kernel ridge regression formulation commonly used in our ONBA papers, and it has a known solution:
\begin{equation}
\boldsymbol{\alpha}_\mathrm{sct} = (\mathbf{K}+\lambda_1\mathbf{I})^{-1} \mathbf{s}_\mathrm{sct},
\end{equation}
which gives us a bound:
\begin{align}
\|\mathbf{s}_\mathrm{sct}-\mathbf{K}\boldsymbol{\alpha}\|_2^2 + \lambda_1 \boldsymbol{\alpha}^\mathsf{H} \mathbf{K} \boldsymbol{\alpha} &\geq \|\mathbf{s}_\mathrm{sct}-\mathbf{K}\boldsymbol{\alpha}_\mathrm{sct}\|_2^2 + \lambda_1 \boldsymbol{\alpha}_\mathrm{sct}^\mathsf{H} \mathbf{K} \boldsymbol{\alpha}_\mathrm{sct}\\
&= \|\mathbf{s}_\mathrm{sct}-\mathbf{K}(\mathbf{K}+\lambda_1\mathbf{I}) ^{-1}\mathbf{s}_\mathrm{sct}\|_2^2 + \lambda_1 \mathbf{s}_\mathrm{sct}^\mathsf{H}(\mathbf{K}+\lambda_1\mathbf{I}) ^{-1}\mathbf{K} (\mathbf{K}+\lambda_1\mathbf{I}) ^{-1}\mathbf{s}_\mathrm{sct}
\end{align}
Observe that:
\begin{align}
\mathbf{I} - \mathbf{K}(\mathbf{K}+\lambda_1\mathbf{I}) ^{-1}&=\mathbf{I} - (\mathbf{K}+\lambda_1\mathbf{I} - \lambda_1\mathbf{I})(\mathbf{K}+\lambda_1\mathbf{I}) ^{-1}=\lambda_1(\mathbf{K}+\lambda_1\mathbf{I}) ^{-1}
\end{align}
and
\begin{align}
(\mathbf{K}+\lambda_1\mathbf{I}) ^{-1}\mathbf{K}(\mathbf{K}+\lambda_1\mathbf{I}) ^{-1} &= (\mathbf{K}+\lambda_1\mathbf{I}) ^{-1}(\mathbf{K}+\lambda_1\mathbf{I}-\lambda_1\mathbf{I})(\mathbf{K}+\lambda_1\mathbf{I})^{-1}\\
&=(\mathbf{K}+\lambda_1\mathbf{I})^{-1}-\lambda_1(\mathbf{K}+\lambda_1\mathbf{I})^{-2}.
\end{align}
Therefore:
\begin{align}
\|\mathbf{s}_\mathrm{sct}-\mathbf{K}\boldsymbol{\alpha}\|_2^2 + \lambda_1 \boldsymbol{\alpha}^\mathsf{H} \mathbf{K} \boldsymbol{\alpha} &\geq \|\mathbf{s}_\mathrm{sct}-\mathbf{K}\boldsymbol{\alpha}_\mathrm{sct}\|_2^2 + \lambda_1 \boldsymbol{\alpha}_\mathrm{sct}^\mathsf{H} \mathbf{K} \boldsymbol{\alpha}_\mathrm{sct}\\
&= \lambda_1^2\mathbf{s}_\mathrm{sct}^\mathsf{H}(\mathbf{K}+\lambda_1\mathbf{I})^{-2}\mathbf{s}_\mathrm{sct}+\lambda_1\mathbf{s}_\mathrm{sct}^\mathsf{H}(\mathbf{K}+\lambda_1\mathbf{I})^{-1}\mathbf{K}(\mathbf{K}+\lambda_1\mathbf{I})^{-1}\mathbf{s}_\mathrm{sct}\\
&= \lambda_1\mathbf{s}_\mathrm{sct}^\mathsf{H}(\mathbf{K}+\lambda_1\mathbf{I})^{-1}\mathbf{s}_\mathrm{sct}
\end{align}
From this point onwards, the derivations are largely identical. We define an auxiliary loss:
\begin{align}
&\mathcal{J}_\mathrm{sct}(\mathring{\boldsymbol{u}}_\mathrm{sct}) = \mathcal{J}(\boldsymbol{\alpha}_\mathrm{sct}, \mathring{\boldsymbol{u}}_\mathrm{sct}) = \lambda_1\mathbf{s}_\mathrm{sct}^\mathsf{H}(\mathbf{K}+\lambda_1\mathbf{I})^{-1}\mathbf{s}_\mathrm{sct}+\lambda_2\|\mathring{\boldsymbol{u}}_\mathrm{sct}\|_2^2\\
&\mathcal{J}_\mathrm{sct}(\mathring{\boldsymbol{u}}_\mathrm{sct}) \leq \mathcal{J}(\boldsymbol{\alpha}, \mathring{\boldsymbol{u}}_\mathrm{sct})\ \forall \boldsymbol{\alpha}, \mathring{\boldsymbol{u}}_\mathrm{sct}\Rightarrow \inf_{\boldsymbol{\alpha}, \mathring{\boldsymbol{u}}_\mathrm{sct}} \mathcal{J}(\boldsymbol{\alpha}, \mathring{\boldsymbol{u}}_\mathrm{sct}) \geq \inf_{\mathring{\boldsymbol{u}}_\mathrm{sct}} \mathcal{J}_\mathrm{sct}(\mathring{\boldsymbol{u}}_\mathrm{sct})
\end{align}
We then open the function into a quadratic form:
\begin{align}
\mathcal{J}_\mathrm{sct}(\mathring{\boldsymbol{u}}_\mathrm{sct}) &= \lambda_1(\mathbf{s}-\boldsymbol{\Phi}\mathring{\boldsymbol{u}}_\mathrm{sct})^\mathsf{H}(\mathbf{K}+\lambda_1\mathbf{I})^{-1}(\mathbf{s}-\boldsymbol{\Phi}\mathring{\boldsymbol{u}}_\mathrm{sct}) + \lambda_2\mathring{\boldsymbol{u}}_\mathrm{sct}^\mathsf{H}\mathring{\boldsymbol{u}}_\mathrm{sct}\\
&=\lambda_1\mathring{\boldsymbol{u}}_\mathrm{sct}^\mathsf{H}\boldsymbol{\Phi}^\mathsf{H}(\mathbf{K}+\lambda_1\mathbf{I})^{-1}\boldsymbol{\Phi}\mathring{\boldsymbol{u}}_\mathrm{sct}-\lambda_1\mathring{\boldsymbol{u}}^\mathsf{H}_\mathrm{sct}\boldsymbol{\Phi}^\mathsf{H}(\mathbf{K}+\lambda_1\mathbf{I})^{-1}\mathbf{s}\\
&\ \ \ \ \ -\lambda_1\mathbf{s}^\mathsf{H}(\mathbf{K}+\lambda_1\mathbf{I})^{-1}\boldsymbol{\Phi}\mathring{\boldsymbol{u}}_\mathrm{sct}+\lambda_1\mathbf{s}^\mathsf{H}(\mathbf{K}+\lambda_1\mathbf{I})^{-1}\mathbf{s}+\lambda_2\mathring{\boldsymbol{u}}_\mathrm{sct}^\mathsf{H}\mathring{\boldsymbol{u}}_\mathrm{sct}
\end{align}
We then solve the quadratic form to get the closed-form solution:
\begin{align}
&\mathring{\boldsymbol{u}}_\mathrm{opt} = \left (\boldsymbol{\Phi}^\mathsf{H}(\mathbf{K}+\lambda_1\mathbf{I})^{-1}\boldsymbol{\Phi} + \frac{\lambda_2}{\lambda_1}\mathbf{I} \right )^{-1}\boldsymbol{\Phi}^\mathsf{H}(\mathbf{K}+\lambda_1\mathbf{I})^{-1}\mathbf{s}\\
&\boldsymbol{\alpha}_\mathrm{opt} = (\mathbf{K}+\lambda_1\mathbf{I})^{-1} (\mathbf{s}-\boldsymbol{\Phi}\mathring{\boldsymbol{u}}_\mathrm{opt})
\end{align}
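The corresponding numpy sketch differs from the normed case only in the resolvent: $(\mathbf{K}^2+\lambda_1\mathbf{I})^{-1}$ becomes $(\mathbf{K}+\lambda_1\mathbf{I})^{-1}$ and the extra factor $\mathbf{K}$ disappears from $\boldsymbol{\alpha}_\mathrm{opt}$ (again, `R` is my addition for later reuse):
```python
import numpy as np

def solve_rkhs(K, Phi, s, lam1, lam2, R=None):
    """Closed-form solution of the RKHS-regularized problem (sketch)."""
    if R is None:
        R = np.eye(Phi.shape[1])
    A = np.linalg.inv(K + lam1 * np.eye(K.shape[0]))  # (K + lam1 I)^{-1}
    u_opt = np.linalg.solve(Phi.conj().T @ A @ Phi + (lam2 / lam1) * R,
                            Phi.conj().T @ A @ s)
    alpha_opt = A @ (s - Phi @ u_opt)
    return alpha_opt, u_opt
```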
# Gradient-norm-based criteria
In the interest of promoting smoothness of the scattered field, we can also express the regularization of $u_\mathrm{sct}$ in terms of its gradients. That is, the objective becomes:
\begin{equation}
\min_{u \in \mathscr{H},\ \mathring{\boldsymbol{u}}_\mathrm{sct}\in \mathbb{C}^{(N+1)^2}} \sum_{m=1}^M |s(\mathbf{r}_m) - u(\mathbf{r}_m)-u_\mathrm{sct}(\mathbf{r}_m)|^{p_1} + \lambda_1 \mathrm{Reg}(u) + \lambda_2\sum_{m=1}^M\|\nabla u_\mathrm{sct}(\mathbf{r}_m)\|_{p_2}^{p_2},
\end{equation}
where $\mathrm{Reg}$ is a regularization term on $u$ (either the $\ell_2$ or the RKHS norm used above). By bounding the gradients, we control their magnitudes and obtain smoother functions as a result: smaller gradients produce less jarring transitions in the field values. This could be the exception where $p_2=2$ is preferred over $p_2=1$, since we do not expect as many edges.
## When $p_2=2$
In this case:
\begin{equation}
\sum_{m=1}^M\|\nabla u_\mathrm{sct}(\mathbf{r}_m)\|_2^2 = \sum_{m=1}^M \left | \frac{\partial u_\mathrm{sct}}{\partial \rho}(\mathbf{r}_m)\right |^2 + \sum_{m=1}^M \left | \frac{\partial u_\mathrm{sct}}{\partial \theta}(\mathbf{r}_m)\right |^2 +\sum_{m=1}^M \left | \frac{\partial u_\mathrm{sct}}{\partial \phi}(\mathbf{r}_m)\right |^2,
\end{equation}
where $[\rho\ \theta\ \phi]$ are the spherical coordinates of the vector $\mathbf{r} - \mathbf{r}_0$ (with a slight abuse of notation, $\rho$ here denotes the radial coordinate rather than the kernel hyperparameter). However, observe that:
\begin{equation}
\sum_{m=1}^M \left | \frac{\partial u_\mathrm{sct}}{\partial \zeta}(\mathbf{r}_m)\right |^2 = \sum_{m=1}^M \left |\sum_{\nu, \mu}^N \mathring{u}_{\mathrm{sct}, \nu,\mu}\frac{\partial \varphi_{\mathrm{sct},\nu, \mu}}{\partial \zeta}(\mathbf{r}_m)\right |^2 = \|\partial_\zeta \boldsymbol{\Phi}\, \mathring{\boldsymbol{u}}_\mathrm{sct}\|_2^2,
\end{equation}
where $\zeta \in \{\rho, \theta, \phi\}$ and $\partial_\zeta \boldsymbol{\Phi}$ is the matrix of partial derivatives of the scattering spherical wave functions with respect to the coordinate $\zeta$. Therefore:
\begin{equation}
\sum_{m=1}^M\|\nabla u_\mathrm{sct}(\mathbf{r}_m)\|_2^2 = \mathring{\boldsymbol{u}}_\mathrm{sct}^\mathsf{H}(\partial_\rho\boldsymbol{\Phi})^\mathsf{H}\partial_\rho\boldsymbol{\Phi}\mathring{\boldsymbol{u}}_\mathrm{sct} + \mathring{\boldsymbol{u}}_\mathrm{sct}^\mathsf{H}(\partial_\theta \boldsymbol{\Phi})^\mathsf{H}\partial_\theta\boldsymbol{\Phi}\mathring{\boldsymbol{u}}_\mathrm{sct} + \mathring{\boldsymbol{u}}_\mathrm{sct}^\mathsf{H}(\partial_\phi\boldsymbol{\Phi})^\mathsf{H}\partial_\phi\boldsymbol{\Phi}\mathring{\boldsymbol{u}}_\mathrm{sct}
\end{equation}
By defining $\mathbf{D}_\mathrm{sct} = (\partial_\rho\boldsymbol{\Phi})^\mathsf{H}\partial_\rho\boldsymbol{\Phi} + (\partial_\theta\boldsymbol{\Phi})^\mathsf{H}\partial_\theta\boldsymbol{\Phi} + (\partial_\phi\boldsymbol{\Phi})^\mathsf{H}\partial_\phi\boldsymbol{\Phi}$, which is positive semidefinite, the optimization problem can be rewritten as:
\begin{equation}
\min_{u \in \mathscr{H},\ \mathring{\boldsymbol{u}}_\mathrm{sct}\in \mathbb{C}^{(N+1)^2}} \sum_{m=1}^M |s(\mathbf{r}_m) - u(\mathbf{r}_m)-u_\mathrm{sct}(\mathbf{r}_m)|^{p_1} + \lambda_1 \mathrm{Reg}(u) + \lambda_2 \mathring{\boldsymbol{u}}_\mathrm{sct}^\mathsf{H}\mathbf{D}_\mathrm{sct}\mathring{\boldsymbol{u}}_\mathrm{sct},
\end{equation}
which is a form of Tikhonov regularization. Two notable cases come to mind:
### For $p_1=2$ and $\ell_2$-based regularization of $u$.
The calculations here largely mirror the norm-based regularization case, so I will only state the result:
\begin{align}
&\mathring{\boldsymbol{u}}_\mathrm{opt} = \left (\boldsymbol{\Phi}^\mathsf{H}(\mathbf{K}^2+\lambda_1\mathbf{I})^{-1}\boldsymbol{\Phi} + \frac{\lambda_2}{\lambda_1}\mathbf{D}_\mathrm{sct} \right )^{-1}\boldsymbol{\Phi}^\mathsf{H}(\mathbf{K}^2+\lambda_1\mathbf{I})^{-1}\mathbf{s}\\
&\boldsymbol{\alpha}_\mathrm{opt} = (\mathbf{K}^2+\lambda_1\mathbf{I})^{-1} \mathbf{K}(\mathbf{s}-\boldsymbol{\Phi}\mathring{\boldsymbol{u}}_\mathrm{opt})
\end{align}
### For $p_1=2$ and RKHS-based regularization of $u$.
The calculations here largely mirror the RKHS regularization case, so I will only state the result:
\begin{align}
&\mathring{\boldsymbol{u}}_\mathrm{opt} = \left (\boldsymbol{\Phi}^\mathsf{H}(\mathbf{K}+\lambda_1\mathbf{I})^{-1}\boldsymbol{\Phi} + \frac{\lambda_2}{\lambda_1}\mathbf{D}_\mathrm{sct} \right )^{-1}\boldsymbol{\Phi}^\mathsf{H}(\mathbf{K}+\lambda_1\mathbf{I})^{-1}\mathbf{s}\\
&\boldsymbol{\alpha}_\mathrm{opt} = (\mathbf{K}+\lambda_1\mathbf{I})^{-1} (\mathbf{s}-\boldsymbol{\Phi}\mathring{\boldsymbol{u}}_\mathrm{opt})
\end{align}
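Assuming the two solver sketches above, both gradient-regularized variants amount to swapping the identity regularizer for $\mathbf{D}_\mathrm{sct}$:
```python
# Hypothetical usage of the earlier solver sketches with the Tikhonov
# matrix D_sct in place of the identity regularizer.
alpha_opt, u_opt = solve_l2(K, Phi, s, lam1, lam2, R=D_sct)    # normed case
alpha_opt, u_opt = solve_rkhs(K, Phi, s, lam1, lam2, R=D_sct)  # RKHS case
```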
### Gradients of the spherical wave functions of the scattered field
Here, I calculate the gradients of the spherical wave functions of the scattered field in order to show that $\mathbf{D}_\mathrm{sct}$ can be computed analytically from the measurement positions.
We begin by expressing our coordinate system in terms of spherical coordinates. Take vector $\mathbf{r}$ to be:
\begin{equation}
\mathbf{r} = \mathbf{r}_0 + \rho \begin{bmatrix} \cos (\phi) \sin(\theta) \\ \sin(\phi) \sin(\theta) \\ \cos(\theta) \end{bmatrix},\ \rho \in \mathbb{R}_+,\ \phi\in [0, 2\pi),\ \theta \in [0, \pi],
\end{equation}
which can be used to express the spherical wave functions explicitly:
\begin{equation}
\varphi_{\mathrm{sct}, \nu, \mu}(\mathbf{r}) = h_\nu(\rho)Y_\nu^\mu(\theta, \phi)
\end{equation}
where $h_\nu$ is the spherical Hankel function (of the first kind) of order $\nu$, which can be written as:
\begin{equation}
h_\nu(\rho) = (-\mathrm{i})^{\nu+1}\frac{\mathrm{e}^{\mathrm{i}\rho}}{\rho}\sum_{l=0}^\nu \frac{(\nu + l)!}{(\nu - l)!} \frac{\mathrm{i}^l}{l! (2\rho)^l}
\end{equation}
and $Y_\nu^\mu$ is the spherical harmonic of order $\nu$ and mode $\mu$, which can be computed as:
\begin{equation}
Y_\nu^\mu(\theta, \phi) = \sqrt{\frac{2\nu+1}{4\pi}\frac{(\nu-\mu)!}{(\nu+\mu)!}}P_\nu^\mu(\cos (\theta))\mathrm{e}^{\mathrm{i}\mu \phi},
\end{equation}
where $P_\nu^\mu$ is the associated Legendre Polynomial of order $\nu$ and mode $\mu$. These polynomials can be written explicitly as:
\begin{equation}
P_\nu^\mu(x) = \frac{(-1)^\mu}{2^\nu \nu!}(1 - x^2)^{\frac{\mu}{2}}\frac{\mathrm{d}^{\nu+\mu}(x^2-1)^{\nu}}{\mathrm{d}x^{\nu+\mu}},
\end{equation}
from which we can see that $P_\nu^\mu =0$ if $|\mu|>\nu$, meaning $\varphi_{\mathrm{sct}, \nu, \mu} = 0$ when $|\mu|>\nu$. It is fairly easy to see that the spherical wave functions are separable, making the gradient (understood here as the vector of coordinate partial derivatives) a matter of computing a derivative in each variable:
\begin{equation}
\nabla \varphi_{\mathrm{sct},\nu,\mu}(\mathbf{r}) = \sqrt{\frac{2\nu+1}{4\pi}\frac{(\nu-\mu)!}{(\nu+\mu)!}}\begin{bmatrix} \frac{\mathrm{d}h_\nu}{\mathrm{d}\rho}(\rho)P_\nu^\mu(\cos(\theta)) \mathrm{e}^{\mathrm{i}\mu\phi}\\ -\sin(\theta)\, h_\nu(\rho) \frac{\mathrm{d}P_\nu^\mu}{\mathrm{d}x}(\cos(\theta)) \mathrm{e}^{\mathrm{i} \mu \phi} \\ h_\nu(\rho) P_\nu^\mu(\cos(\theta)) \frac{\mathrm{d}\mathrm{e}^{\mathrm{i}\mu \phi}}{\mathrm{d}\phi}(\phi)\end{bmatrix}.
\end{equation}
We can immediately note that the derivative with respect to $\phi$ is trivial:
\begin{equation}
\frac{\partial \varphi_{\mathrm{sct}, \nu, \mu}}{\partial \phi} = h_\nu(\rho) P_\nu^\mu(\cos(\theta)) \frac{\mathrm{d}\mathrm{e}^{\mathrm{i}\mu \phi}}{\mathrm{d}\phi}(\phi) = \mathrm{i}\mu\, h_\nu(\rho) P_\nu^\mu(\cos(\theta))\, \mathrm{e}^{\mathrm{i}\mu \phi} = \mathrm{i}\mu\, \varphi_{\mathrm{sct}, \nu, \mu}.
\end{equation}
The remaining partial derivatives will be expressed recursively. Since it is necessary to compute all relevant $h_\nu$ and $Y_\nu^\mu$ to calculate $\boldsymbol{\Phi}$, this should be more efficient than a closed formulation of the derivatives. We begin with the radial component:
\begin{align}
&\frac{\mathrm{d}h_0}{\mathrm{d}\rho}(\rho) = -h_1(\rho)\\
&\frac{\mathrm{d}h_\nu}{\mathrm{d} \rho}(\rho) = h_{\nu-1}(\rho) - \frac{\nu+1}{\rho}h_\nu (\rho),\ \nu\geq 1
\end{align}
This means the Hankel-function derivatives can be computed from the Hankel-function values already calculated for $\boldsymbol{\Phi}$. A similar, yet slightly more troublesome, parallel can be observed for the associated Legendre polynomials:
\begin{equation}
\frac{\mathrm{d} P_\nu^\mu}{\mathrm{d}x}(x) = \frac{1}{2\sqrt{1-x^2}}((\nu+\mu)(\nu-\mu+1)P_\nu^{\mu-1}(x) - P_\nu^{\mu+1}(x))
\end{equation}
At first glance, this derivative might seem troublesome. However, recall that the derivative we actually want is obtained with a chain rule:
\begin{align}
\frac{\partial \varphi_{\mathrm{sct},\nu, \mu}}{\partial \theta}&=\sqrt{\frac{2\nu+1}{4\pi}\frac{(\nu-\mu)!}{(\nu+\mu)!}}\,h_\nu(\rho)\,\frac{\mathrm{d}P_\nu^\mu(\cos (\theta))}{\mathrm{d} \theta}\,\mathrm{e}^{\mathrm{i}\mu \phi}\\
&=\sqrt{\frac{2\nu+1}{4\pi}\frac{(\nu-\mu)!}{(\nu+\mu)!}}\,h_\nu(\rho)\,\mathrm{e}^{\mathrm{i}\mu \phi}\, \frac{\mathrm{d}P_\nu^\mu}{\mathrm{d} x}(\cos (\theta))\,\frac{\mathrm{d}\cos(\theta)}{\mathrm{d}\theta}\\
&=-\sqrt{\frac{2\nu+1}{4\pi}\frac{(\nu-\mu)!}{(\nu+\mu)!}}\,h_\nu(\rho)\,\mathrm{e}^{\mathrm{i}\mu \phi}\,\sin(\theta)\,\frac{\mathrm{d}P_\nu^\mu}{\mathrm{d} x}(\cos (\theta)).
\end{align}
We isolate the $\theta$-dependent part to simplify the derivation:
\begin{equation}
\sin (\theta)\frac{\mathrm{d} P_\nu^\mu}{\mathrm{d}x}(\cos(\theta)) = \frac{1}{2}((\nu+\mu)(\nu-\mu+1)P_\nu^{\mu-1}(\cos (\theta)) - P_\nu^{\mu+1}(\cos(\theta))).
\end{equation}
We can therefore obtain the derivative as
\begin{align}
\frac{\partial \varphi_{\mathrm{sct},\nu, \mu}}{\partial \theta}=&-\sqrt{\frac{2\nu+1}{4\pi}\frac{(\nu-\mu)!}{(\nu+\mu)!}}\mathrm{e}^{\mathrm{i}\mu \phi}\sin(\theta)\frac{\mathrm{d}P_\nu^\mu}{\mathrm{d} x}(\cos (\theta))\,h_\nu(\rho)\\
=&\frac{1}{2} \left (\sqrt{\frac{2\nu+1}{4\pi}\frac{(\nu-\mu)!}{(\nu+\mu)!}}\mathrm{e}^{\mathrm{i}\mu \phi}P_\nu^{\mu+1}(\cos (\theta)) - (\nu+\mu)(\nu-\mu+1)\sqrt{\frac{2\nu+1}{4\pi}\frac{(\nu-\mu)!}{(\nu+\mu)!}}\mathrm{e}^{\mathrm{i}\mu \phi}P_\nu^{\mu-1}(\cos (\theta))\right )h_\nu(\rho).
\end{align}
We now simplify the two parcels separately, keeping the factor $(\nu+\mu)(\nu-\mu+1)$ attached to the second one:
\begin{align}
\sqrt{\frac{2\nu+1}{4\pi}\frac{(\nu-\mu)!}{(\nu+\mu)!}}\mathrm{e}^{\mathrm{i}\mu \phi}P_\nu^{\mu+1}(\cos (\theta))h_\nu(\rho)&=\sqrt{\frac{2\nu+1}{4\pi}\frac{(\nu - \mu)(\nu-\mu-1)!}{(\nu+\mu+1)!/(\nu +\mu +1)}}\mathrm{e}^{\mathrm{i}\mu \phi}P_\nu^{\mu+1}(\cos (\theta))h_\nu(\rho)\\
&=\sqrt{(\nu-\mu)(\nu+\mu+1)}\,\mathrm{e}^{-\mathrm{i}\phi}\sqrt{\frac{2\nu+1}{4\pi}\frac{(\nu-(\mu+1))!}{(\nu+(\mu+1))!}}\mathrm{e}^{\mathrm{i}(\mu+1) \phi}P_\nu^{\mu+1}(\cos (\theta))h_\nu(\rho)\\
&=\sqrt{(\nu-\mu)(\nu+\mu+1)}\,\mathrm{e}^{-\mathrm{i}\phi}\, \varphi_{\mathrm{sct}, \nu, \mu+1}(\mathbf{r}),
\end{align}
and
\begin{align}
(\nu+\mu)(\nu-\mu+1)\sqrt{\frac{2\nu+1}{4\pi}\frac{(\nu-\mu)!}{(\nu+\mu)!}}\mathrm{e}^{\mathrm{i}\mu \phi}P_\nu^{\mu-1}(\cos (\theta))h_\nu(\rho)&=(\nu+\mu)(\nu-\mu+1)\sqrt{\frac{2\nu+1}{4\pi}\frac{(\nu-(\mu-1))!/(\nu - \mu+1)}{(\nu +\mu)(\nu+\mu-1)!}}\mathrm{e}^{\mathrm{i}\mu \phi}P_\nu^{\mu-1}(\cos (\theta))h_\nu(\rho)\\
&=\sqrt{(\nu + \mu)(\nu - \mu +1)}\,\mathrm{e}^{\mathrm{i}\phi}\sqrt{\frac{2\nu+1}{4\pi}\frac{(\nu-(\mu-1))!}{(\nu+(\mu-1))!}}\mathrm{e}^{\mathrm{i}(\mu-1) \phi}P_\nu^{\mu-1}(\cos (\theta))h_\nu(\rho)\\
&=\sqrt{(\nu + \mu)(\nu - \mu +1)}\,\mathrm{e}^{\mathrm{i}\phi}\,\varphi_{\mathrm{sct}, \nu, \mu-1}(\mathbf{r}).
\end{align}
Combining both equalities above results in a recursive expression for the partial derivative:
\begin{equation}
\frac{\partial \varphi_{\mathrm{sct},\nu, \mu}}{\partial \theta}=\frac{1}{2}\left(\sqrt{(\nu - \mu)(\nu + \mu +1)}\,\mathrm{e}^{-\mathrm{i}\phi}\, \varphi_{\mathrm{sct}, \nu, \mu+1}-\sqrt{(\nu + \mu)(\nu - \mu +1)}\,\mathrm{e}^{\mathrm{i}\phi}\, \varphi_{\mathrm{sct}, \nu, \mu-1}\right),
\end{equation}
with the convention that $\varphi_{\mathrm{sct}, \nu, \mu\pm1}=0$ whenever $|\mu\pm 1|>\nu$.
With these techniques in mind, we can express the gradient of the spherical wave function in analytical form:
\begin{align}
&\frac{\partial \varphi_{\mathrm{sct},\nu, \mu}}{\partial \phi}(\rho, \theta, \phi) = \mathrm{i} \mu \varphi_{\mathrm{sct},\nu, \mu}(\rho, \theta, \phi)\\
&\frac{\partial \varphi_{\mathrm{sct},\nu, \mu}}{\partial \theta}(\rho, \theta, \phi) = \frac{1}{2}\left(\sqrt{(\nu - \mu)(\nu + \mu +1)}\,\mathrm{e}^{-\mathrm{i}\phi}\, \varphi_{\mathrm{sct}, \nu, \mu+1}(\rho, \theta, \phi)-\sqrt{(\nu + \mu)(\nu - \mu +1)}\,\mathrm{e}^{\mathrm{i}\phi}\, \varphi_{\mathrm{sct}, \nu, \mu-1}(\rho, \theta, \phi)\right)\\
&\frac{\partial \varphi_{\mathrm{sct},\nu, \mu}}{\partial \rho}(\rho, \theta, \phi) = \begin{cases}-h_1(\rho)Y_0^0(\theta, \phi),& \nu=0\\ h_{\nu-1}(\rho)Y_\nu^\mu(\theta, \phi)-\frac{\nu+1}{\rho}\varphi_{\mathrm{sct},\nu, \mu}(\rho, \theta, \phi),& \nu\geq 1\end{cases}
\end{align}
All of the above partial derivatives can be calculated in tandem with $\boldsymbol{\Phi}$ at very little additional computational cost, the angular components especially, since they reuse the same values featured in $\boldsymbol{\Phi}$.
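To make the bookkeeping concrete, here is a sketch that builds $\boldsymbol{\Phi}$, its three coordinate-derivative matrices via the recursions above, and $\mathbf{D}_\mathrm{sct}$, using scipy's spherical Bessel routines and `sph_harm` (beware scipy's argument order, azimuth before polar angle; the mode indexing is my own):
```python
import numpy as np
from scipy.special import spherical_jn, spherical_yn, sph_harm

def hankel1(nu, rho):
    """First-kind spherical Hankel function h_nu = j_nu + i*y_nu."""
    return spherical_jn(nu, rho) + 1j * spherical_yn(nu, rho)

def build_phi_and_dsct(rho, theta, phi, N):
    """Phi and D_sct for microphones at spherical coordinates
    (rho, theta, phi) relative to r_0 (theta polar, phi azimuthal)."""
    modes = [(nu, mu) for nu in range(N + 1) for mu in range(-nu, nu + 1)]
    col = {nm: j for j, nm in enumerate(modes)}
    h = [hankel1(nu, rho) for nu in range(N + 1)]    # reused for d/drho
    Y = np.stack([sph_harm(mu, nu, phi, theta) for nu, mu in modes], axis=1)
    Phi = np.stack([h[nu] for nu, _ in modes], axis=1) * Y
    d_phi, d_theta, d_rho = (np.zeros_like(Phi) for _ in range(3))
    for j, (nu, mu) in enumerate(modes):
        d_phi[:, j] = 1j * mu * Phi[:, j]            # azimuthal derivative
        up = Phi[:, col[(nu, mu + 1)]] if mu + 1 <= nu else 0.0
        dn = Phi[:, col[(nu, mu - 1)]] if mu - 1 >= -nu else 0.0
        d_theta[:, j] = 0.5 * (                      # polar recursion
            np.sqrt((nu - mu) * (nu + mu + 1)) * np.exp(-1j * phi) * up
            - np.sqrt((nu + mu) * (nu - mu + 1)) * np.exp(1j * phi) * dn)
        if nu == 0:                                  # radial recursion
            d_rho[:, j] = -hankel1(1, rho) * Y[:, j]
        else:
            d_rho[:, j] = h[nu - 1] * Y[:, j] - (nu + 1) / rho * Phi[:, j]
    D_sct = (d_rho.conj().T @ d_rho + d_theta.conj().T @ d_theta
             + d_phi.conj().T @ d_phi)
    return Phi, D_sct
```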
### Observations
There are, of course, several ways to vary the derivative-control regularization term. One example would be to give each partial-derivative component its own regularization constant; the model would change very little, but that would introduce more hyperparameters.
Another option is to ignore the radial portion of the derivatives, making use of the fact that the angular components are continuous on compact sets. This variation is fairly trivial.
Finally, it is possible to mix and match regularizations, such as also controlling the derivatives of $u$ or adding an ordinary normed regularizer to $u_\mathrm{sct}$.