# Technical Design to Disentangle Features in Federated Learning Settings
In this technical design, we propose several potential solutions for disentangling features in federated learning settings.
## Objectives
We aim to disentangle features into domain-invariant features, domain-specific features, and random features in federated learning settings, as illustrated in Figure 1.

*Figure 1.* Feature disentanglement
## Methods
Domain shift significantly hinders training by degrading training efficiency<sup>[1]</sup>. Many researchers have therefore proposed to minimize the domain discrepancy with adversarial training<sup>[2]</sup><sup>[3]</sup>. Despite their achievements, these works require simultaneous access to both source and target data, which may be inapplicable in federated learning settings. We therefore propose the following methods to disentangle features in federated learning settings; the overall process is illustrated in Figure 2.

*Figure 2.* Process of feature disentanglement in federated learning settings
### Disentangle Domain-invariant Features
Let us assume that we have *N<sup>s</sup>* clients from the **source domain** in the federated learning network <img src="https://latex.codecogs.com/gif.latex?D^s&space;=&space;\{D^s_k\}_{k=1}^{N^s}" title="D^s = \{D^s_k\}_{k=1}^{N^s}" />, where *D<sup>s</sup><sub>k</sub>* denotes the local feature representations of the *k*th source-domain client. Similarly, we have *N<sup>t</sup>* clients from the **target domain** in the federated learning network <img src="https://latex.codecogs.com/gif.latex?D^t&space;=&space;\{D^t_k\}_{k=1}^{N^t}" title="D^t = \{D^t_k\}_{k=1}^{N^t}" />, where *D<sup>t</sup><sub>k</sub>* denotes the local feature representations of the *k*th target-domain client.
First, we employ local **Feature Extractors** *E<sup>s</sup><sub>k</sub>* and *E<sup>t</sup><sub>k</sub>* to extract the original features <img src="https://latex.codecogs.com/gif.latex?M^s_k&space;=&space;E^s_k(D^s_k)" title="M^s_k = E^s_k(D^s_k)" /> and <img src="https://latex.codecogs.com/gif.latex?M^t_k&space;=&space;E^t_k(D^t_k)" title="M^t_k = E^t_k(D^t_k)" />, respectively.
Then we minimize the domain discrepancy to generate domain-invariant features with adversarial training.
Specifically, for each source-target domain pair <img src="https://latex.codecogs.com/gif.latex?(M^s_k,&space;M^t_k)" title="(M^s_k, M^t_k)" />, we first train a global **Domain Classifier** *C<sup>1</sup>* to distinguish which domain the features come from.
Next, we employ **Domain-invariant Disentanglers** *P<sup>s</sup><sub>k</sub>* and *P<sup>t</sup><sub>k</sub>* to generate features <img src="https://latex.codecogs.com/gif.latex?R^s_k&space;=&space;P^s_k(M^s_k)" title="R^s_k = P^s_k(M^s_k)" /> and <img src="https://latex.codecogs.com/gif.latex?R^t_k&space;=&space;P^t_k(M^t_k)" title="R^t_k = P^t_k(M^t_k)" />. We then use the generated features <img src="https://latex.codecogs.com/gif.latex?(R^s_k,&space;R^t_k)" title="(R^s_k, R^t_k)" /> to confuse the **Domain Classifier** *C<sup>1</sup>*.
Through this adversarial process, the **Domain-invariant Disentanglers** *P<sup>s</sup><sub>k</sub>* and *P<sup>t</sup><sub>k</sub>* learn to generate domain-invariant features.
It is noteworthy that the global **Domain Classifier** *C<sup>1</sup>* only has access to the generated latent representations, without any communication of original data, which largely precludes privacy leakage in federated learning settings.
The process of domain-invariant feature disentanglement is illustrated in Figure 3.

*Figure 3.* Process of domain-invariant feature disentanglement
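The adversarial scheme above can be sketched in plain NumPy for a single source-target client pair. Here the disentanglers *P<sup>s</sup><sub>k</sub>*, *P<sup>t</sup><sub>k</sub>* are affine maps and the Domain Classifier *C<sup>1</sup>* is a logistic regression; these, together with the toy Gaussian features, learning rate, and step count, are hypothetical stand-ins for the actual networks, not the final design.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy extracted features M^s_k, M^t_k for one client pair: the two domains
# differ by a mean shift (a stand-in for the Feature Extractor outputs).
Ms = rng.normal(+2.0, 1.0, size=(200, 2))
Mt = rng.normal(-2.0, 1.0, size=(200, 2))

# Domain-invariant Disentanglers P^s_k, P^t_k as affine maps R = M @ W + b.
Ws, bs = np.eye(2), np.zeros(2)
Wt, bt = np.eye(2), np.zeros(2)
# Global Domain Classifier C1 as logistic regression (label 1 = source, 0 = target).
w, c = 0.01 * rng.normal(size=2), 0.0

lr, n = 0.05, 200
for step in range(600):
    Rs, Rt = Ms @ Ws + bs, Mt @ Wt + bt
    # 1) Train C1 to tell the domains apart (one gradient step on log-loss).
    R = np.vstack([Rs, Rt])
    y = np.concatenate([np.ones(n), np.zeros(n)])
    g = sigmoid(R @ w + c) - y
    w -= lr * R.T @ g / (2 * n)
    c -= lr * g.mean()
    # 2) Train P^s_k, P^t_k to fool C1: descend the log-loss with *flipped*
    #    domain labels, which pushes the two feature clouds together.
    ps = sigmoid((Ms @ Ws + bs) @ w + c)   # flipped label 0 -> gradient ps
    pt = sigmoid((Mt @ Wt + bt) @ w + c)   # flipped label 1 -> gradient pt - 1
    Ws -= lr * Ms.T @ (ps[:, None] * w[None, :]) / n
    bs -= lr * (ps[:, None] * w[None, :]).mean(axis=0)
    Wt -= lr * Mt.T @ ((pt - 1)[:, None] * w[None, :]) / n
    bt -= lr * ((pt - 1)[:, None] * w[None, :]).mean(axis=0)

# The domain gap between the disentangled features should shrink.
gap_before = np.linalg.norm(Ms.mean(axis=0) - Mt.mean(axis=0))
gap_after = np.linalg.norm((Ms @ Ws + bs).mean(axis=0) - (Mt @ Wt + bt).mean(axis=0))
print(gap_before, gap_after)
```

In the full design, step 1 would run on the server against latent features uploaded by the clients, and step 2 would run locally on each client; only the representations *R<sup>s</sup><sub>k</sub>*, *R<sup>t</sup><sub>k</sub>* cross the network.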
### Disentangle Domain-specific Features
Meanwhile, we need to disentangle domain-specific features.
Specifically, after feature extraction with the local **Feature Extractors** *E<sup>s</sup><sub>k</sub>* and *E<sup>t</sup><sub>k</sub>*, we feed the extracted features *M<sup>s</sup><sub>k</sub>* and *M<sup>t</sup><sub>k</sub>* into the **Domain-specific Disentanglers** *Q<sup>s</sup><sub>k</sub>* and *Q<sup>t</sup><sub>k</sub>*, which generate features <img src="https://latex.codecogs.com/gif.latex?U^s_k&space;=&space;Q^s_k(M^s_k)" title="U^s_k = Q^s_k(M^s_k)" /> and <img src="https://latex.codecogs.com/gif.latex?U^t_k&space;=&space;Q^t_k(M^t_k)" title="U^t_k = Q^t_k(M^t_k)" />.
Then, we freeze the global **Domain Classifier** *C<sup>1</sup>* and use the generated features *U<sup>s</sup><sub>k</sub>* and *U<sup>t</sup><sub>k</sub>* to predict the corresponding domain labels with it. The goal of the **Domain-specific Disentangler** is to minimize the domain classification loss, i.e., to generate features that stay within their own domains.
Nevertheless, we should also ensure that the **Domain-specific Disentangler** generates features significantly **different** from the original extracted features *M<sub>k</sub>* and the domain-invariant features *R<sub>k</sub>*. Therefore, we employ an auxiliary **Feature Classifier** *C<sup>2</sup>* in our design.
Since we have the original extracted features *M<sub>k</sub>* and the domain-invariant features *R<sub>k</sub>* generated in the previous steps, we could simply combine them to create a one-class dataset *Z<sub>k</sub>*.
Then we train the global **Feature Classifier** *C<sup>2</sup>*, which is a **one-class SVM** model, to identify the one-class dataset *Z<sub>k</sub>*.
Later, we freeze the global **Feature Classifier** *C<sup>2</sup>* and use it to predict whether the generated features *U<sup>s</sup><sub>k</sub>* and *U<sup>t</sup><sub>k</sub>* belong to an **unknown class**. The goal of the **Domain-specific Disentangler** is to have its outputs classified as the **unknown class**, i.e., to generate features significantly different from the original extracted features *M<sub>k</sub>* and the domain-invariant features *R<sub>k</sub>*.
The process of domain-specific feature disentanglement is illustrated in Figure 4.

*Figure 4.* Process of domain-specific feature disentanglement
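The role of the Feature Classifier *C<sup>2</sup>* can be sketched with scikit-learn's `OneClassSVM` (assuming scikit-learn is available). The toy Gaussian features standing in for *M<sub>k</sub>*, *R<sub>k</sub>*, and the candidate *U<sub>k</sub>* batches are purely illustrative; the sketch only shows the known/unknown decision that the Domain-specific Disentangler is trained against.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# Hypothetical stand-ins for one client's original features M_k and
# domain-invariant features R_k; combined they form the one-class dataset Z_k.
Mk = rng.normal(0.0, 1.0, size=(150, 2))
Rk = rng.normal(0.5, 1.0, size=(150, 2))
Zk = np.vstack([Mk, Rk])

# Global Feature Classifier C2: a one-class SVM trained to recognise Z_k.
c2 = OneClassSVM(kernel="rbf", gamma="scale", nu=0.1).fit(Zk)

# Candidate Domain-specific Disentangler outputs U_k: one batch that still
# resembles Z_k, and one that is clearly different from it.
U_similar = rng.normal(0.25, 1.0, size=(50, 2))
U_distinct = rng.normal(8.0, 0.5, size=(50, 2))

# C2 returns +1 for "known" (like Z_k) and -1 for "unknown".
frac_unknown_similar = np.mean(c2.predict(U_similar) == -1)
frac_unknown_distinct = np.mean(c2.predict(U_distinct) == -1)
print(frac_unknown_similar, frac_unknown_distinct)
```

Features that merely reproduce *M<sub>k</sub>* or *R<sub>k</sub>* are accepted as the known class, while genuinely domain-specific features fall outside the learned boundary and are flagged as unknown, which is exactly the signal the disentangler optimizes for.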
### Feature Reconstruction to Ensure Cycle-consistency
Last but not least, we reconstruct the original feature representations with a **Reconstructor**. We then minimize the discrepancy between the original and reconstructed features to ensure cycle-consistency and preserve representation integrity. The process of feature reconstruction is illustrated in Figure 5.

*Figure 5.* Process of feature reconstruction
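The cycle-consistency objective can be illustrated with a minimal NumPy sketch in which the Reconstructor is just a linear map fitted by least squares; in the actual design it would be a neural decoder trained jointly with the disentanglers. The toy features and the maps producing *R<sub>k</sub>* and *U<sub>k</sub>* are hypothetical stand-ins chosen so that the disentangled parts jointly retain the information in *M<sub>k</sub>*.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy original features M_k, and disentangled parts R_k (domain-invariant)
# and U_k (domain-specific), generated so that together they retain M_k's
# information (hypothetical stand-ins for the disentangler outputs).
Mk = rng.normal(size=(200, 4))
A, B = rng.normal(size=(4, 3)), rng.normal(size=(4, 3))
Rk, Uk = Mk @ A, Mk @ B

# Reconstructor G: here a linear map fitted by least squares to minimise the
# cycle-consistency loss ||M_k - G([R_k, U_k])||^2.
X = np.hstack([Rk, Uk])                  # concatenated disentangled parts
W, *_ = np.linalg.lstsq(X, Mk, rcond=None)
M_hat = X @ W                            # reconstructed features
recon_loss = np.mean((Mk - M_hat) ** 2)  # should be near zero here
print(recon_loss)
```

A near-zero reconstruction loss indicates that the disentangled representations jointly preserve the original feature content; a large loss would mean information was destroyed during disentanglement.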
## References
[1. Learning transferable features with deep adaptation networks](https://arxiv.org/pdf/1502.02791.pdf)
[2. CyCADA: Cycle-consistent adversarial domain adaptation](https://arxiv.org/abs/1711.03213)
[3. Simultaneous deep transfer across domains and tasks](https://arxiv.org/abs/1510.02192)