# 4DVarNet for space-time interpolation: ongoing developments
## 1. The 4DVarNet algorithm
Let $\mathbf{y}(\Omega)=\lbrace \mathbf{y}_k(\Omega_k) \rbrace$ denote the partial and potentially noisy observational dataset on the subdomain $\Omega=\lbrace \Omega_k \rbrace \subset \mathcal{D}$, where $\overline{\Omega}$ denotes the gappy part of the field and index $k$ refers to time $t_k$. Using a data assimilation state-space formulation, we aim to estimate the hidden state $\mathbf{x}=\lbrace \mathbf{x}_k(\Omega_k) \rbrace$ from the observations $\mathbf{y}$.
### 1.1 The variational model
Considering a variational data assimilation scheme, the state analysis $\mathbf{x}^\star$ is obtained by solving the minimization problem:
\begin{align*}
\mathbf{x}^\star= \underset{\mathbf{x}}{\arg\min} \ \mathcal{J}(\mathbf{x})
\end{align*}
where the variational cost function $\mathcal{J}(\mathbf{x})=\mathcal{J}_{\Phi}(\mathbf{x},\mathbf{y},\Omega)$ is generally the sum of an observation term and a regularization term involving an operator $\Phi$ which is typically a dynamical prior:
\begin{align*}
\mathcal{J}_{\Phi}(\mathbf{x},\mathbf{y},\Omega) & = \mathcal{J}^o(\mathbf{x},\mathbf{y},\Omega) + \mathcal{J}_{\Phi}^b(\mathbf{x})\\
& = \lambda_1 || \mathbf{y} - \mathcal{H}(\mathbf{x}) ||^2_\Omega + \lambda_2 || \mathbf{x} - \Phi(\mathbf{x}) ||^2
\end{align*}
with $\mathcal{H}$ the observation operator and $\lambda_{1,2}$ predefined or learnable scalar weights. This formulation of the functional $\cal{J}_{\Phi}(\mathbf{x},\mathbf{y},\Omega)$ directly relates to strong-constraint 4D-Var.
For inverse problems with time-related processes, the minimization of the functional $\mathcal{J}_\Phi$ usually involves iterative gradient-based algorithms; in particular, classic model-based variational data assimilation schemes, in which operator $\Phi$ identifies with a deterministic model $\mathbf{x}_{k+1}=\mathcal{M}(\mathbf{x}_{k})$, require the adjoint method:
\begin{align*}
\mathbf{x}^{(i+1)} = \mathbf{x}^{(i)} - \alpha \nabla_{\mathbf{x}}\mathcal{J}_{\Phi}(\mathbf{x}^{(i)} ,\mathbf{y},\Omega)
\end{align*}
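As an illustration, this gradient descent can be sketched on a toy 1-D problem, assuming a linear prior $\Phi(\mathbf{x}) = A\mathbf{x}$ and an identity observation operator; all names and values below are illustrative choices, not the actual 4DVarNet implementation:

```python
import numpy as np

# Toy gradient-based minimization of J_Phi, assuming a linear prior
# Phi(x) = A x and H = identity; lam1, lam2, alpha are illustrative.
rng = np.random.default_rng(0)
n = 50
A = 0.95 * np.eye(n) + 0.01 * rng.standard_normal((n, n))   # toy linear prior
x_true = rng.standard_normal(n)
mask = rng.random(n) < 0.3                                  # observed subdomain Omega
y = np.where(mask, x_true + 0.05 * rng.standard_normal(n), 0.0)
lam1, lam2, alpha = 1.0, 0.1, 0.05

def cost(x):
    """J_Phi(x) = lam1 ||y - x||^2_Omega + lam2 ||x - Phi(x)||^2."""
    r_obs = (y - x) * mask
    r_reg = x - A @ x
    return lam1 * r_obs @ r_obs + lam2 * r_reg @ r_reg

def grad(x):
    """Hand-derived gradient; autodiff tools produce the same operator."""
    return (-2 * lam1 * (y - x) * mask
            + 2 * lam2 * (np.eye(n) - A).T @ (x - A @ x))

x = np.zeros(n)                      # initialization x^(0)
costs = [cost(x)]
for _ in range(200):                 # x^(i+1) = x^(i) - alpha * grad J_Phi
    x = x - alpha * grad(x)
    costs.append(cost(x))
```

For this convex quadratic cost, any step size below the inverse Lipschitz constant of the gradient yields a monotone decrease of $\mathcal{J}_\Phi$.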
In our case, we are interested in a purely data-driven operator $\Phi$: we consider NN-based Gibbs-Energy (GENN) representations, a way of embedding Markovian priors in CNNs which has proven efficient on SSH altimetric datasets. This enables the use of deep learning automatic differentiation tools: given the architecture of operator $\Phi$, the gradient operator $\nabla_{\mathbf{x}}\mathcal{J}_{\Phi}$ is computed as a composition of operators involving tensors, convolutions and activation functions.
### 1.2 Trainable solver architecture
The proposed end-to-end architecture consists in embedding an iterative gradient-based solver derived from the considered variational representation. As inputs, we consider the observations $\mathbf{y}$, the associated observation domain $\Omega$ and some initialization $\mathbf{x}^{(0)}$. Let us denote by $\Gamma$ this iterative update operator. Following meta-learning schemes, a residual LSTM-based representation of operator $\Gamma$ is considered here, where the $i^{th}$ iterative update of the solver is given by:
\begin{align*}
\left \{\begin{array}{ccl}
g^{(i+1)}& = & LSTM \left[ \alpha \cdot \nabla_{\mathbf{x}}\mathcal{J}_{\Phi}(\mathbf{x}^{(i)} ,\mathbf{y},\Omega), h^{(i)} , c^{(i)} \right ] \\~\\
\mathbf{x}^{(i+1)}& = & \mathbf{x}^{(i)} - {\cal{T}} \left( g^{(i+1)} \right )
\end{array} \right.
\end{align*}
where $g^{(i+1)}$ is the LSTM output given the input gradient $\nabla_{\mathbf{x}}\mathcal{J}_{\Phi}(\mathbf{x}^{(i)} ,\mathbf{y},\Omega)$, $h^{(i)}$ and $c^{(i)}$ denote the internal states of the LSTM, $\alpha$ is a normalization scalar and ${\cal{T}}$ is a linear or convolutional mapping.
Note that a CNN architecture could also be used instead of the LSTM representation of $\Gamma$, and that when replacing both the LSTM cell by the identity operator and the cost function $\mathcal{J}_{\Phi}(\mathbf{x},\mathbf{y},\Omega)$ by its single regularization term $\mathcal{J}_{\Phi}^b(\mathbf{x})$, the gradient-based solver reduces to a parameter-free fixed-point version of the algorithm.
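A minimal sketch of such a fixed-point variant, where the iteration alternates between applying the prior and reinserting the observations on $\Omega$; the 3-point moving average below is only an illustrative stand-in for the trained GENN prior:

```python
import numpy as np

# Parameter-free fixed-point sketch: x <- Phi(x) on the gaps, with the
# observed values reinserted on Omega at every iteration.
rng = np.random.default_rng(1)
n = 100
truth = np.sin(np.linspace(0, 3 * np.pi, n))
mask = rng.random(n) < 0.3            # observed subdomain Omega
mask[[0, -1]] = True                  # anchor the boundaries
y = np.where(mask, truth, 0.0)        # noise-free pseudo-observations

def phi(x):
    """Toy smoothness prior: local 3-point moving average."""
    xp = np.pad(x, 1, mode="edge")
    return (xp[:-2] + xp[1:-1] + xp[2:]) / 3.0

x = y.copy()                          # initialization x^(0)
for _ in range(5000):
    x_new = np.where(mask, y, phi(x)) # fixed-point update with reinsertion
    if np.max(np.abs(x_new - x)) < 1e-10:
        break
    x = x_new
```

At convergence the gaps are filled by the prior (here, a piecewise-linear interpolation between observed points), while the observed values are preserved exactly.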
### 1.3 End-to-end joint learning scheme
Overall, let us denote by $\Psi_{\Phi,\Gamma}(\mathbf{x}^{(0)},\mathbf{y},\Omega )$ the output of the end-to-end learning scheme (see Figure below), given architectures for both NN-based operators $\Phi$ and $\Gamma$, the initialization $\mathbf{x}^{(0)}$ of state $\mathbf{x}$ and the observations $\mathbf{y}$ on domain $\Omega$.

Then, the joint learning of operators $\lbrace\Phi,\Gamma\rbrace$ is stated as the minimization of a reconstruction cost:
\begin{align*}
\arg \min_{\Phi,\Gamma} \mathcal{L}(\mathbf{x},\mathbf{x}^\star) \mbox{ s.t. }
\mathbf{x}^\star = \Psi_{\Phi,\Gamma} (\mathbf{x}^{(0)},\mathbf{y},\Omega)
\end{align*}
In the case of supervised learning, where targets are gap-free:
$\mathcal{L}(\mathbf{x},\mathbf{x}^\star)=||\mathbf{x}-\mathbf{x}^\star||^2+||\nabla\mathbf{x}-\nabla\mathbf{x}^\star||^2$, i.e. the L2 norm of the difference between state $\mathbf{x}$ and reconstruction $\mathbf{x}^\star$, with an additional term on the difference between their spatial gradients.
In the case of unsupervised learning, given the observations $\mathbf{y}$ on domain $\Omega$ and the hidden state $\mathbf{x}$, the 4DVar cost function may be used: $\mathcal{L}(\mathbf{x},\mathbf{x}^\star) = \lambda_1 || \mathbf{y} - \mathcal{H}(\mathbf{x}) ||^2_\Omega + \lambda_2 || \mathbf{x} - \Phi(\mathbf{x}) ||^2$, with weights $\lambda_1$ and $\lambda_2$ to be adapted according to the reliability of the observations.
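The two losses can be sketched as follows (numpy-based, with `np.gradient` as a generic discrete gradient operator and an identity observation operator; the weights and the `phi` argument are illustrative, and the unsupervised cost is evaluated here on the reconstruction):

```python
import numpy as np

# Sketch of the two training losses; H is taken as the identity and
# np.gradient as the discrete spatial gradient operator.
def supervised_loss(x, x_star):
    """||x - x*||^2 plus the term on the spatial gradients."""
    gx, gy = np.gradient(x)
    gxs, gys = np.gradient(x_star)
    return (np.mean((x - x_star) ** 2)
            + np.mean((gx - gxs) ** 2 + (gy - gys) ** 2))

def unsupervised_loss(x_star, y, mask, phi, lam1=1.0, lam2=0.1):
    """4DVar cost: lam1 ||y - x*||^2_Omega + lam2 ||x* - Phi(x*)||^2."""
    return (lam1 * np.mean((mask * (y - x_star)) ** 2)
            + lam2 * np.mean((x_star - phi(x_star)) ** 2))
```

Both losses vanish for a perfect reconstruction (and, for the unsupervised one, a self-consistent prior), which is a quick sanity check when wiring them into a training loop.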
## 2. The Dataset
The 4DVarNet algorithm has been successfully tested on small datasets, typically sequences of spatio-temporal images of size 11x200x200 (Nt x Ny x Nx).
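One natural way to scale up is to tile the full grid into such patches. A minimal sketch of the tiling indices, assuming non-overlapping patches (stride equal to the patch size; in practice overlapping strides would likely be used to limit border effects):

```python
# Tiling the full NATL grid (365 x 761 x 1721, time x lat x lon) into
# 11 x 200 x 200 space-time patches; strides equal the patch sizes here.
nt, ny, nx = 11, 200, 200
full_t, full_y, full_x = 365, 761, 1721

origins = [(t, j, i)
           for t in range(0, full_t - nt + 1, nt)
           for j in range(0, full_y - ny + 1, ny)
           for i in range(0, full_x - nx + 1, nx)]

# For a field stored as a (365, 761, 1721) array, one patch is then:
# patch = field[t:t + nt, j:j + ny, i:i + nx]   for (t, j, i) in origins
```

The full array is left out on purpose: at float32 precision it weighs about 1.9 GB, which is precisely why on-the-fly batch generation from the raw files matters here.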
### 2.1 Location
In this Hackathon, the main objective is to make it work efficiently on a larger dataset, available at:
```
# obs
https://s3.eu-central-1.wasabisys.com/melody/NATL/data/gridded_data_swot_wocorr/dataset_nadir_0d_swot.nc
#oi
https://s3.eu-central-1.wasabisys.com/melody/NATL/oi/ssh_NATL60_swot_4nadir.nc
#ref
https://s3.eu-central-1.wasabisys.com/melody/NATL/ref/NATL60-CJM165_NATL_ssh_y2013.1y.nc
https://s3.eu-central-1.wasabisys.com/melody/NATL/ref/NATL60-CJM165_NATL_sst_y2013.1y.nc
```
They are stored in the [NetCDF format](https://www.unidata.ucar.edu/software/netcdf/), which can be handled in Python using, for instance, the [xarray](https://xarray.pydata.org/en/stable/index.html) package.
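For instance, once a file has been downloaded it can be explored with xarray. A tiny in-memory stand-in replaces the real file here; the variable and coordinate names follow the dataset description in the next subsection:

```python
import numpy as np
import xarray as xr

# Tiny in-memory stand-in for one of the NetCDF files; with a downloaded
# file one would simply write:
#   ds = xr.open_dataset("NATL60-CJM165_NATL_ssh_y2013.1y.nc")
time = np.arange("2012-10-01", "2012-10-03", dtype="datetime64[D]")
lat = np.linspace(27.0, 27.10, 3)      # real grid: 761 points at 1/20 deg
lon = np.linspace(-79.0, -78.85, 4)    # real grid: 1721 points at 1/20 deg
ds = xr.Dataset(
    {"ssh": (("time", "lat", "lon"),
             np.zeros((time.size, lat.size, lon.size)))},
    coords={"time": time, "lat": lat, "lon": lon},
)

# Label-based spatial subsetting, e.g. to extract a study area:
box = ds.ssh.sel(lat=slice(26.99, 27.06), lon=slice(-79.01, -78.89))
```

The same `sel(lat=slice(...), lon=slice(...))` pattern extracts the GF, OSMOSIS or NATL boxes from the real files.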
### 2.2 Description
All three datasets (reference, observations, optimal interpolation) have three dimensions:
- **time** (365)
- **lat** (761)
- **lon** (1721)
All the variables are stored as 2D 761x1721 regular grids along the 365 days with the following coordinates:
- **time** 365 days: _'2012-10-01' '2012-10-02' ... '2013-09-30'_
- **lat** 761 x 1/20° : _27.0 27.05 27.1 27.15 27.2 ... 64.85 64.9 64.95 65.0_
- **lon** 1721 x 1/20° : _-79.0 -78.95 -78.9 -78.85 ... 6.9 6.95 7.0_
#### 2.2.1 Ground Truth
- The SSH Ground Truth **NATL60-CJM165_NATL_ssh_y2013.1y.nc**:
- **ssh** (Sea Surface Height): one year of daily fields provided by the [NATL60](https://meom-group.github.io/swot-natl60/science.html) state-of-the-art oceanic simulation
- The SST complementary Ground Truth **NATL60-CJM165_NATL_sst_y2013.1y.nc**:
- **sst** (Sea Surface Temperature): the sst may be used in complementary tests to improve the ssh spatio-temporal interpolation
#### 2.2.2 Pseudo-observations
- The pseudo-observations dataset **dataset_nadir_0d_swot.nc**, generated by sampling the SSH Ground Truth along realistic satellite trajectories:
- **mask** : mask (1 -> ocean, 0 -> land)
- **lag** : time deviation (in hours) from the selected day
- **flag** : satellite type (0 -> NADIR, 1 -> SWOT)
- **ssh_obs**: data with additional realistic noise
- **ssh_mod**: data without noise (=model)
#### 2.2.3 Optimal interpolation (OI)
- The state-of-the-art optimal interpolation (OI) dataset **ssh_NATL60_swot_4nadir.nc**, based on the previous pseudo-observations. This is the baseline the 4DVarNet algorithm aims at improving:
- **ssh_obs**: OI using ssh_obs in the pseudo-observations dataset
- **ssh_mod**: OI using ssh_mod in the pseudo-observations dataset
Note that the OI is currently involved when applying the 4DVarNet algorithm to the anomaly $\mathbf{x}-\overline{\mathbf{x}}$ between the raw ssh and the OI (denoted here $\overline{\mathbf{x}}$ and seen as a large-scale component of the SSH). This solution is, for now, the one giving the best results. The OI is also used as an additional input channel of the 4DVarNet algorithm and can thus be seen as an extra covariate that helps to localize the areas with large anomalies.
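The decomposition described above can be sketched as follows (array shapes match one training patch; the arrays themselves are placeholders and `oi` stands for $\overline{\mathbf{x}}$):

```python
import numpy as np

# Sketch of the OI-based decomposition: the network works on the anomaly
# x - x_bar (ssh minus OI), and the OI is stacked as an extra input channel.
rng = np.random.default_rng(2)
ssh = rng.standard_normal((11, 200, 200))   # one space-time ssh patch
oi = np.zeros_like(ssh)                     # OI baseline x_bar (placeholder)

anomaly = ssh - oi                          # field actually reconstructed
inputs = np.stack([anomaly, oi], axis=1)    # (time, channel, lat, lon)
ssh_rebuilt = anomaly + oi                  # full ssh recovered afterwards
```

After inference, adding the OI back to the reconstructed anomaly recovers the full SSH field.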
## 3 Code available on GitHub
Here is the GitHub repo with a simplified version of our code: [4DVarNet](https://github.com/CIA-Oceanix/4dvarnet-core), together with some explanations regarding its architecture: [architecture of the code](https://github.com/CIA-Oceanix/4dvarnet-core/blob/main/doc/code_archi.md)
Key components of the code:
* 4DVarNet PyTorch Lightning module
* Multi-GPU/multi-node distribution using PyTorch Lightning
* Possible on-the-fly batch generation from raw files
## 5 Actions under development
Color code specifying the status of the following tasks:
<span style="color:blue">**Done**</span>
<span style="color:green">**In progress**</span>
<span style="color:red">**To do**</span>
As you can see, there are still plenty of things to do :)
### 5.1 R&T CNES 2021
#### 5.1.1 OSSE NATL60
* <span style="color:blue">Finalize the experiments (xp1, 2 and 3), produce the appropriate set of scores, check we won the cup ;) and prepare the products (NetCDF+Notebooks) related to the OSSE Boost-SWOT data challenge:
[DC link](https://github.com/CIA-Oceanix/2021_4DVARNET_CHALLENGE_NATL60) </span> **(Mohamed)**
* <span style="color:red"> Improve the stride handling for training on the "extended" GF area, in order to avoid border effects </span> **(Quentin)**
* <span style="color:green"> Debugging of the 4DVarNet-core branch:
-> (1) start again from Quentin's branch for a first GF->GF test (forcing the training losses on the SWOT calibration to 0).
-> (2) modify/simplify Quentin's code to get back to the interpolation-only problem (i.e., no joint calibration). Check that the results are similar to the first experiment.
-> (3) add the augmented-state aspect to this code. Check that this configuration performs as well as, or better than, the current 4dvarnet results on the boost-swot repo.
</span> **(Hugo)**
* <span style="color:blue">Mid-october: Based on the Boost-SWOT data challenge, create a new repo for the "OSSE NATL data challenge", with the three areas, GF, OSMOSIS and NATL
> GF domain: [-65.,-55.,33.,43.]
> OSMOSIS domain: [-19.5,-11.5,45.,55.]
> NATL domain: [-79.,7.,27.,65.] </span>
**(Hugo & Maxime)**
* <span style="color:blue">Based on the plan provided by Maxime in Section 5.5.1, start writing the R&T report (see the [Overleaf link](https://www.overleaf.com/4781119179qvqwxtpcsjrm)) on the successful scaling up of 4DVarNet</span> **(Maxime)**
#### 5.1.2 OSSE Glorys
* **(Ronan & Maxime)** <span style="color:red">Think about the experiments we want to run at global scale with Glorys (resolution at 1/12°)
* prepare a data challenge
* prepare meeting with CLS </span>
* <span style="color:red"> Run the OSSE</span> **(Benjamin)**
### 5.2 OSTST/TOSCA
#### 5.2.1 OSE
* <span style="color:green">Run the comparison based on the c2 along-track datasets, cf BOOST-SWOT data challenge</span> **(Maxime)**
#### 5.2.2 Multi-tracer/multi-sensor synergies
* <span style="color:red"> 4DVarNet-SLA-SST based on NATL60 datasets on the whole North Atlantic basin </span>**(Hugo?)**
### 5.3 SWOT ST DIEGO
#### 5.3.1 SWOT calibration
* <span style="color:green"> SWOT robustness to noise signals</span> **(Quentin)**
### 5.3 On-going developments:
* **Pull requests**
* <span style="color:green">Clean the PRs of the GitHub repo</span> **(Maxime and Quentin)**
* <span style="color:red">Get GAN-based code (Anis) and make a PR</span> **(Hugo & Ronan)**
* <span style="color:red">Get new iterative approach (Duong) and make a PR</span> **(Quentin)**
* <span style="color:red"> Implement test method on multi-GPU </span> **(Hugo)**
* <span style="color:red"> Improve animation </span> **(Hugo)**
* **Technical issues**
* <span style="color:blue">Improve the handling of complex domains and the extraction at the center of the patch</span> **(Maxime and Quentin)**
* <span style="color:green">Add along-track nadir gradients</span>
* <span style="color:red"> Test on the full domain</span> **(Hugo)**
* <span style="color:red"> Discuss hydra/lightning </span> **(Quentin & Benjamin)**
* <span style="color:green">**4DVarNet-stochastic** </span> **(Maxime)**
### 5.4 Publications
* <span style="color:blue">**Quentin**: calibration paper (submitted)</span>
* <span style="color:green">**Maxime**: SPDE (in prep)</span>
### 5.5 Deliverables
#### 5.5.1 R&T CNES 2021
* L1.1 Replication of the OSSE results obtained on the GULFSTREAM area from nadir+SWOT data, at the scale of the North Atlantic basin
* L1.2 Replication of the OSSE results obtained on the GULFSTREAM area from nadir-only data, at the scale of the North Atlantic basin
* L1.3 Synthesis of the OSSEs using a variational formulation for the reconstruction of altimetric fields at the scale of the North Atlantic basin
* L2.1 Specifications of the OSEs for the scaling up
* L2.2 Synthesis of the OSEs using a variational formulation for the reconstruction of altimetric fields at the scale of the North Atlantic basin
* L2.3 Action plan for the follow-up of the activities related to porting the SLANet applications to the DUACS framework, in particular in terms of complementary developments towards an operational deployment.
#### 5.5.2 OSTST DUACS HR
**WP3 New Neural Networks approaches for SL mapping**
R. Fablet, M. Beauchamp, Q. Febvre, M Ballarotta, A. Pascual
* WP 3.1 Development of a benchmarking framework for learning-based SL mapping using real satellite altimeter datasets
* WP 3.2 Development and evaluation of end-to-end learning frameworks for the SL mapping
* WP 3.3 Learning-based SL mapping using multi-tracer/multi-sensor synergies
#### 5.5.3 SWOT ST DIEGO