# Transfer Learning
###### tags: `Deep Learning for Computer Vision`
## What, When, and Why?
* The data is difficult to collect
* Labeling data is time-consuming and requires a lot of manpower
* A model trained one-to-one for a single task/domain is not robust when applied to a new domain
### A More Practical Example
![](https://i.imgur.com/AavWSFh.jpg)
It can be difficult to collect data for certain road conditions, e.g., there are few pedestrians on rainy days. Instead, we can use data from virtual games (simulated driving scenes) to train the model and then apply it in the real world.
## Domain Adaptation in Transfer Learning
* **What’s DA?**
    * Leveraging information from the source domain to the target domain, so that the **same learning task across domains** (and particularly in the target domain) can be addressed.
* Typically all the source-domain data are labeled.
* **Settings**
    * Semi-supervised DA : only a **few** target-domain samples are labeled.
    * Unsupervised DA : **no label** info is available in the target domain. (Should we also address fully supervised DA?)
    * Imbalanced DA : **fewer classes of interest** in the target domain.
* Homogeneous vs. Heterogeneous DA
#### Example for Unsupervised DA :
Source domain $D_S=\{\bar{x}_S, \bar{y}_S\}$ (labeled)
Target domain $D_T=\{\bar{x}_T\}$ (unlabeled)
\begin{equation}
\theta \rightarrow \theta(x_T) \rightarrow \tilde{y}_T
\end{equation}
A model with parameters $\theta$ is trained on the labeled source data and then applied to the target samples $x_T$ to produce predicted labels $\tilde{y}_T$.
## Deep Feature is Sufficiently Promising
![](https://i.imgur.com/U9v9d39.png)
Deep-learned features are more powerful than traditional hand-crafted features, which makes transferring them across domains promising.
## Deep Domain Confusion (DDC)
### Deep Domain Confusion: Maximizing for Domain Invariance
Tzeng et al., arXiv: 1412.3474, 2014
<center>
<img style="border-radius: 0.3125em;
box-shadow: 0 2px 4px 0 rgba(34,36,38,.12),0 2px 10px 0 rgba(34,36,38,.08);margin: 2%;" width="500"
src="https://i.imgur.com/r33VFiW.png">
<br>
<div style="color:orange; border-bottom: 1px solid #d9d9d9;
display: inline-block;
color: #999;
padding: 2px;"></div>
</center>
The right-hand side of the figure shows the desired result: after training, the mean of the source-domain features and the mean of the target-domain features should be close in the feature space.
This is a strong assumption, since a single mean is used to represent an entire domain.
<center>
<img style="border-radius: 0.3125em;
box-shadow: 0 2px 4px 0 rgba(34,36,38,.12),0 2px 10px 0 rgba(34,36,38,.08);margin: 2%;" width="500"
src="https://i.imgur.com/ezsQwhJ.png">
<br>
<div style="color:orange; border-bottom: 1px solid #d9d9d9;
display: inline-block;
color: #999;
padding: 2px;"></div>
</center>
The classification (category) loss cannot be computed for the unlabeled target-domain data, so a domain loss is added instead; the domain loss is computed from both target-domain and source-domain data.
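A minimal PyTorch-style sketch of this idea (names and the simple "distance between feature means" domain loss are illustrative assumptions; DDC computes MMD on a chosen adaptation layer and tunes the trade-off weight):

```python
import torch
import torch.nn.functional as F

def mmd_linear(f_src, f_tgt):
    """Squared distance between the mean source and mean target features.

    f_src: (Ns, d) features of labeled source-domain samples
    f_tgt: (Nt, d) features of unlabeled target-domain samples
    """
    return (f_src.mean(dim=0) - f_tgt.mean(dim=0)).pow(2).sum()

def ddc_loss(logits_src, y_src, f_src, f_tgt, lam=0.25):
    """DDC-style objective: source classification loss + lam * domain loss.

    lam is a hypothetical trade-off weight chosen here for illustration.
    """
    cls_loss = F.cross_entropy(logits_src, y_src)   # only source data has labels
    domain_loss = mmd_linear(f_src, f_tgt)          # computed from both domains
    return cls_loss + lam * domain_loss
```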
### Domain Confusion by Domain-Adversarial Training
* Y. Ganin et al., ICML 2015
* Maximize domain confusion = Maximize domain classification loss.
* Minimize source-domain data classification loss.
* The derived feature $f$ can be viewed as a disentangled & domain-invariant feature.
<center>
<img style="border-radius: 0.3125em;
box-shadow: 0 2px 4px 0 rgba(34,36,38,.12),0 2px 10px 0 rgba(34,36,38,.08);margin: 2%;"
src="https://i.imgur.com/R7IFa94.png">
<br>
<div style="color:orange; border-bottom: 1px solid #d9d9d9;
display: inline-block;
color: #999;
padding: 2px;">DANN</div>
</center>
* green : **Feature extractor**
* (1) Extract the features needed by the subsequent network to complete the task.
* (2) Map source-domain and target-domain samples into a common feature space and mix them so they cannot be distinguished.
* blue : **Label predictor**
* red : **Domain classifier**
* **gradient reversal layer**
* $\theta_f = \theta_f-\mu(\dfrac{\partial{L_y^i}}{\partial{\theta_f}}-\lambda\dfrac{\partial{L_d^i}}{\partial{\theta_f}})$
* $\lambda_p = \dfrac{2}{1+\exp(-\gamma\cdot p)}-1$, where $p$ is the ratio of the current iteration to the total number of iterations (training progress), and $\gamma = 10$
* $\mu_p = \dfrac{\mu_0}{(1+\alpha\cdot p)^{\beta}}$, $\mu_0=0.01, \alpha=10, \beta=0.75$
The goal is for the feature extractor to fool the domain classifier, so that the classifier cannot tell whether a sample comes from the source or the target domain (i.e., its accuracy drops to about 0.5).
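A minimal PyTorch sketch of the gradient reversal layer (function and variable names are illustrative): the forward pass is the identity, and the backward pass multiplies the gradient by $-\lambda$, so minimizing the domain loss for the classifier simultaneously maximizes it for the feature extractor.

```python
import math
import torch

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; scales gradients by -lambda in the backward pass."""

    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Reverse (and scale) the gradient flowing back into the feature extractor.
        return -ctx.lam * grad_output, None

def grad_reverse(x, lam=1.0):
    return GradReverse.apply(x, lam)

def lambda_schedule(p, gamma=10.0):
    """lambda_p from the formula above: ramps from 0 to 1 as progress p goes 0 -> 1."""
    return 2.0 / (1.0 + math.exp(-gamma * p)) - 1.0
```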
## Beyond Domain Confusion
### Domain Separation Networks (DSN)
* Bousmalis et al., NIPS 2016
* **Separate encoders for domain-invariant and domain-specific features.**
* Private/common features are disentangled from each other.
![](https://i.imgur.com/3PKxHAo.png)
* Shared Encoder $E_{c}(x)$ : learns to capture representation components for a given input sample that are shared among domains.
* intuitively, this is like extracting the foreground (content shared across domains)
* Private Encoder $E_{p}^{s}(x^{s}), E_{p}^{t}(x^{t})$ : learns to capture domain–specific components of the representation.
* intuitively, this is like extracting the background (domain-specific appearance)
* Shared Decoder : learns to reconstruct the input sample by using both the private and shared representations.
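A small sketch (assuming PyTorch; the exact normalization in the paper may differ) of the "difference" loss DSN uses to keep the private and shared features of the same samples disentangled, via a soft orthogonality constraint:

```python
import torch

def difference_loss(shared, private):
    """Soft orthogonality between shared and private features of the same samples.

    shared, private: (N, d) batches of encoder outputs. Penalizing the squared
    Frobenius norm of shared^T @ private pushes the two representations to be
    orthogonal, i.e. to capture different (disentangled) factors.
    """
    shared = shared - shared.mean(dim=0)      # centering is an assumption here
    private = private - private.mean(dim=0)
    corr = shared.t() @ private               # (d, d) cross-correlation matrix
    return (corr ** 2).sum()
```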
### Example results
#### During Training
![](https://i.imgur.com/b62B79U.jpg)
#### During Testing
![](https://i.imgur.com/v4SQ7a7.jpg)
# Transfer Learning for Image Synthesis
## Pix2pix
Translates an image into a different style/domain; it requires **paired** training data.
![](https://i.imgur.com/6DpTc6b.jpg)
### Objective Function
![](https://i.imgur.com/a4hAYzE.jpg)
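In equations, the pix2pix objective combines a conditional GAN loss with an L1 loss between the generated output and the paired ground truth:

$$
G^{*}=\arg\min_{G}\max_{D}\ \mathcal{L}_{cGAN}(G,D)+\lambda\,\mathcal{L}_{L1}(G),\qquad
\mathcal{L}_{L1}(G)=\mathbb{E}_{x,y,z}\big[\|y-G(x,z)\|_{1}\big]
$$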
### Experiment results
![](https://i.imgur.com/YlpUVIq.jpg)
## CycleGAN
Handles the case of **unpaired** training data.
![](https://i.imgur.com/TvAJfqx.png)
![](https://i.imgur.com/KLztObx.jpg)
![](https://i.imgur.com/RJIEP9a.png)
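Without paired data, CycleGAN trains two mappings $G: X \rightarrow Y$ and $F: Y \rightarrow X$ together with a cycle-consistency loss, so that translating to the other domain and back recovers the original image:

$$
\mathcal{L}(G,F,D_X,D_Y)=\mathcal{L}_{GAN}(G,D_Y,X,Y)+\mathcal{L}_{GAN}(F,D_X,Y,X)+\lambda\,\mathcal{L}_{cyc}(G,F)
$$
$$
\mathcal{L}_{cyc}(G,F)=\mathbb{E}_{x}\big[\|F(G(x))-x\|_{1}\big]+\mathbb{E}_{y}\big[\|G(F(y))-y\|_{1}\big]
$$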
### Experiment results
![](https://i.imgur.com/xn12gXm.jpg)
### Image Translation Using Unpaired Training Data
![](https://i.imgur.com/UTxZoTn.png)
## UNIT
Uses VAEs to map images from the two domains into a **shared** latent space.
![](https://i.imgur.com/UR2EG7X.png)
where $z$ is the shared, domain-invariant feature/representation
![](https://i.imgur.com/md0fAmR.jpg)
![](https://i.imgur.com/QFLcwG7.png)
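Under this shared-latent-space assumption, cross-domain translation simply encodes with one domain's encoder and decodes with the other domain's generator (notation here is a paraphrase of the UNIT formulation):

$$
z=E_{1}(x_{1}),\qquad x_{1\rightarrow 2}=G_{2}(z)=G_{2}(E_{1}(x_{1}))
$$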
### Experiment results
![](https://i.imgur.com/UQciuMj.jpg)
## AdaIN
![](https://i.imgur.com/OrROBQT.png)
![](https://i.imgur.com/h8q75ec.png)
![](https://i.imgur.com/my7kMRA.png)
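A minimal PyTorch sketch of adaptive instance normalization (function name and tensor shapes are illustrative): the content feature map is normalized with its own per-channel statistics, then re-scaled and shifted with the style feature's statistics, i.e. $\text{AdaIN}(x,y)=\sigma(y)\,\frac{x-\mu(x)}{\sigma(x)}+\mu(y)$.

```python
import torch

def adain(content, style, eps=1e-5):
    """AdaIN(x, y) = sigma(y) * (x - mu(x)) / sigma(x) + mu(y).

    content, style: (N, C, H, W) feature maps; statistics are computed per sample
    and per channel over the spatial dimensions.
    """
    c_mean = content.mean(dim=(2, 3), keepdim=True)
    c_std = content.std(dim=(2, 3), keepdim=True) + eps
    s_mean = style.mean(dim=(2, 3), keepdim=True)
    s_std = style.std(dim=(2, 3), keepdim=True) + eps
    return s_std * (content - c_mean) / c_std + s_mean
```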
### Objective Function
![](https://i.imgur.com/5GRZEKo.jpg)
### Experiment results
![](https://i.imgur.com/TLTwPHy.jpg)
## BicycleGAN
Multimodal image-to-image translation
![](https://i.imgur.com/ZUs4Ewq.jpg)
* E : encoder that maps an output image to a latent code $z$
* G : generator conditioned on the input image and $z$ (the style), so different $z$ give different outputs
### Experiment results
![](https://i.imgur.com/ZoBcQhl.jpg)
## DRIT
![](https://i.imgur.com/0HwuutG.jpg)
### Main Framework
![](https://i.imgur.com/i8TSpq8.png)
### Attribute Features
![](https://i.imgur.com/7k2cs0h.jpg)
### Inference phase
![](https://i.imgur.com/VXtfG03.jpg)
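At inference time (the notation below is an illustrative paraphrase of the DRIT formulation), the content feature of an image from one domain can be combined either with the attribute feature of a real image from the other domain, or with an attribute code sampled from a prior:

$$
x_{1\rightarrow 2}=G_{2}\big(E_{1}^{c}(x_{1}),\,E_{2}^{a}(x_{2})\big)
\quad\text{or}\quad
x_{1\rightarrow 2}=G_{2}\big(E_{1}^{c}(x_{1}),\,z_{a}\big),\ z_{a}\sim\mathcal{N}(0,I)
$$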
### Example Results
![](https://i.imgur.com/dcDquAn.jpg)
# Representation Disentanglement
* **Interpretable** deep feature representation
* Disentangle attribute of interest $c$ from the derived latent representation $z$
* Supervised: AC-GAN
* Unsupervised: InfoGAN
![](https://i.imgur.com/52mLmfA.jpg)
## Auxiliary Classifier GAN (AC-GAN)
![](https://i.imgur.com/DXLZgPP.jpg)
![](https://i.imgur.com/mlNsUjN.jpg)
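Concretely, the AC-GAN discriminator predicts both the source (real/fake) and the class label; $D$ is trained to maximize $\mathcal{L}_{S}+\mathcal{L}_{C}$ while $G$ is trained to maximize $\mathcal{L}_{C}-\mathcal{L}_{S}$:

$$
\mathcal{L}_{S}=\mathbb{E}\big[\log P(S=\text{real}\mid x_{\text{real}})\big]+\mathbb{E}\big[\log P(S=\text{fake}\mid x_{\text{fake}})\big]
$$
$$
\mathcal{L}_{C}=\mathbb{E}\big[\log P(C=c\mid x_{\text{real}})\big]+\mathbb{E}\big[\log P(C=c\mid x_{\text{fake}})\big]
$$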
## InfoGAN
![](https://i.imgur.com/HZ6swnM.png)
* Learns the latent code in an unsupervised, clustering-like way; there is no guarantee that it disentangles a particular semantic attribute.
![](https://i.imgur.com/XNvxUpN.png)
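InfoGAN adds a mutual-information term between the latent code $c$ and the generated sample, approximated by a variational lower bound using an auxiliary network $Q$:

$$
\min_{G,Q}\max_{D}\ V_{GAN}(D,G)-\lambda\,L_{I}(G,Q),\qquad
L_{I}(G,Q)=\mathbb{E}_{c\sim p(c),\,x\sim G(z,c)}\big[\log Q(c\mid x)\big]+H(c)
$$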
## StarGAN
**multi-domain** image-to-image translation
![](https://i.imgur.com/NLQTW1i.png)
* Idea
* Auxiliary domain classifier as discriminator
    * Concatenate the image and the target-domain label as the generator input (see the sketch below)
* Cycle consistency across domains
![](https://i.imgur.com/q3IdSC1.png)
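A minimal sketch (assuming PyTorch; tensor names and sizes are illustrative) of how a one-hot target-domain label can be tiled spatially and concatenated to the input image as extra channels before feeding the generator:

```python
import torch

def concat_domain_label(img, target_label):
    """Tile a one-hot target-domain label spatially and concatenate it to the image.

    img:          (N, 3, H, W) input images
    target_label: (N, num_domains) one-hot target-domain labels
    returns:      (N, 3 + num_domains, H, W) generator input
    """
    n, _, h, w = img.shape
    label_maps = target_label.view(n, -1, 1, 1).expand(-1, -1, h, w)
    return torch.cat([img, label_maps], dim=1)

# Example: translate a batch of 4 images to domain 2 out of 5 domains.
imgs = torch.randn(4, 3, 128, 128)
labels = torch.zeros(4, 5)
labels[:, 2] = 1.0
gen_input = concat_domain_label(imgs, labels)   # shape: (4, 8, 128, 128)
```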