{%hackmd SybccZ6XD %}
###### tags: `paper`
# Learning without Forgetting
Goal
> learning new visual capabilities while maintaining performance on existing ones.
## Related methods
Original model
> A CNN with shared parameters $\theta_s$ and task-specific parameters $\theta_o$ for each previously learned task; learning a new task adds randomly initialized task-specific parameters $\theta_n$.
> 
The meaning of color in the diagram
> In the paper's overview figure, the color/shading of each layer indicates how its parameters are treated by a given method: kept unchanged, fine-tuned, or randomly initialized and trained.
> 
Fine-tuning
> drawback: degrades performance on previously learned tasks because the shared parameters change without new guidance for the original task-specific prediction parameters.
> 
Feature Extraction
> drawback: underperforms on the new task because the shared parameters fail to represent some information that is discriminative for the new task.
> 
Joint Training
> drawback: not applicable if the training data for previously learned tasks is unavailable, and it becomes increasingly cumbersome as more tasks are learned.
> 
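To make the distinction concrete, here is a minimal PyTorch-style sketch of which parameter groups each baseline updates. The modules `backbone`, `old_head`, and `new_head` are illustrative names (not from the paper): the backbone stands in for the shared parameters $\theta_s$, the heads for $\theta_o$ and $\theta_n$.

```python
import torch.nn as nn

# Illustrative modules (hypothetical names, not from the paper).
backbone = nn.Sequential(nn.Conv2d(3, 64, 3), nn.ReLU(),
                         nn.AdaptiveAvgPool2d(1), nn.Flatten())
old_head = nn.Linear(64, 10)   # previously learned task (theta_o)
new_head = nn.Linear(64, 5)    # new task, randomly initialized (theta_n)

# Feature extraction: theta_s and theta_o frozen, only theta_n is trained.
feature_extraction_params = list(new_head.parameters())

# Fine-tuning: theta_s and theta_n are trained; theta_o is kept but receives no
# guidance while theta_s drifts, which is why old-task performance degrades.
fine_tuning_params = list(backbone.parameters()) + list(new_head.parameters())

# Joint training: everything is trained, but this requires the old task's
# images and labels, which may no longer be available.
joint_training_params = (list(backbone.parameters())
                         + list(old_head.parameters())
                         + list(new_head.parameters()))
```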
## Learning without Forgetting
Learning without Forgetting
> Similar to Joint Training, but does not need the old task’s images and labels.
> 
Algorithm
> 1. Run the original network on the new task's images and record its outputs $y_o$ for each old task.
> 2. Add randomly initialized task-specific parameters $\theta_n$ for the new task.
> 3. Warm-up step: train $\theta_n$ to convergence on the new task while keeping $\theta_s$ and $\theta_o$ frozen.
> 4. Joint-optimize step: train all of $\theta_s$, $\theta_o$, $\theta_n$ to minimize $\lambda_o \mathcal{L}_{old}(y_o, \hat y_o) + \mathcal{L}_{new}(y_n, \hat y_n) + \mathcal{R}(\hat\theta_s, \hat\theta_o, \hat\theta_n)$.
> 
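A rough sketch of this two-stage procedure in PyTorch. Module names, the assumption that `new_loader` yields batches in a fixed order, and all hyperparameters are illustrative choices, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def lwf_train(backbone, old_head, new_head, new_loader,
              lambda_o=1.0, T=2.0, warmup_epochs=5, joint_epochs=20, lr=1e-3):
    """Sketch of the LwF procedure (assumes new_loader is not shuffled)."""
    # Step 1: record the old network's responses y_o on the new-task images.
    backbone.eval(); old_head.eval()
    recorded = []
    with torch.no_grad():
        for x, _ in new_loader:
            recorded.append(torch.softmax(old_head(backbone(x)), dim=1))

    # Step 2 (warm-up): train only theta_n; theta_s and theta_o stay frozen.
    for p in list(backbone.parameters()) + list(old_head.parameters()):
        p.requires_grad_(False)
    opt = torch.optim.SGD(new_head.parameters(), lr=lr)
    for _ in range(warmup_epochs):
        for x, y in new_loader:
            loss = F.cross_entropy(new_head(backbone(x)), y)
            opt.zero_grad(); loss.backward(); opt.step()

    # Step 3 (joint optimize): train theta_s, theta_o, theta_n together,
    # using the recorded responses as targets for the old task.
    backbone.train(); old_head.train(); new_head.train()
    for p in list(backbone.parameters()) + list(old_head.parameters()):
        p.requires_grad_(True)
    params = (list(backbone.parameters()) + list(old_head.parameters())
              + list(new_head.parameters()))
    opt = torch.optim.SGD(params, lr=lr)
    for _ in range(joint_epochs):
        for (x, y), y_o in zip(new_loader, recorded):
            feats = backbone(x)
            l_new = F.cross_entropy(new_head(feats), y)        # L_new
            # L_old: distillation against recorded responses with temperature T;
            # softmax(logits / T) equals the paper's (p^{1/T} / sum p^{1/T}) scaling.
            log_p = F.log_softmax(old_head(feats) / T, dim=1)  # current, scaled
            q = torch.softmax(torch.log(y_o) / T, dim=1)       # recorded, scaled
            l_old = -(q * log_p).sum(dim=1).mean()
            loss = l_new + lambda_o * l_old
            opt.zero_grad(); loss.backward(); opt.step()
```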
Loss (for new task)
> Multinomial logistic loss (standard cross-entropy):
> $\mathcal{L}_{new}(y_n, \hat y_n) = -y_n \cdot \log \hat y_n$
> $\hat y_n$: softmax output of the network for the new task
> $y_n$: one-hot ground-truth label
> 
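A tiny numeric check of $\mathcal{L}_{new}$ with made-up logits and label, showing it is just the usual cross-entropy:

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, 0.5, -1.0]])   # made-up network output for one image
y = torch.tensor([0])                        # made-up ground-truth class index
y_hat = torch.softmax(logits, dim=1)         # \hat{y}_n: softmax output

# With one-hot y_n, -y_n . log(\hat{y}_n) reduces to the log-prob of the true class.
l_new_manual = -torch.log(y_hat[0, y[0]])
l_new_builtin = F.cross_entropy(logits, y)   # same value
print(l_new_manual.item(), l_new_builtin.item())
```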
Loss (for old task)
> Knowledge distillation loss: cross-entropy between temperature-scaled probabilities,
> $\mathcal{L}_{old}(y_o, \hat y_o) = -\sum_i y_o'^{(i)} \log \hat y_o'^{(i)}$
> with $y_o'^{(i)} = \dfrac{(y_o^{(i)})^{1/T}}{\sum_j (y_o^{(j)})^{1/T}}$ and $\hat y_o'^{(i)} = \dfrac{(\hat y_o^{(i)})^{1/T}}{\sum_j (\hat y_o^{(j)})^{1/T}}$
> $y_o'$: recorded probabilities of the original network (temperature-scaled)
> $\hat y_o'$: current probabilities of the network being trained (temperature-scaled)
> The paper sets $T = 2$, following the knowledge distillation setup of Hinton et al.
> 
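A minimal sketch of $\mathcal{L}_{old}$, assuming both the recorded and current outputs are already probability vectors; the function name and example numbers are hypothetical.

```python
import torch

def lwf_distillation_loss(y_o, y_o_hat, T=2.0):
    """L_old: cross-entropy between temperature-scaled probability vectors.
    y_o: recorded probabilities from the original network (batch x classes).
    y_o_hat: current probabilities of the same old-task head.
    """
    # Temperature scaling: y'^{(i)} = (y^{(i)})^{1/T} / sum_j (y^{(j)})^{1/T}
    q = y_o.pow(1.0 / T); q = q / q.sum(dim=1, keepdim=True)
    p = y_o_hat.pow(1.0 / T); p = p / p.sum(dim=1, keepdim=True)
    return -(q * p.log()).sum(dim=1).mean()

# Made-up example: two classes, recorded vs. current probabilities.
y_o = torch.tensor([[0.9, 0.1]])
y_o_hat = torch.tensor([[0.7, 0.3]])
print(lwf_distillation_loss(y_o, y_o_hat))
```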
Relationship to joint training
> LwF can be viewed as joint training in which the old task's images and labels are replaced by the new task's images together with the recorded old-task responses. However, the distribution of images from these tasks may be very different, and this substitution may potentially decrease performance. Therefore, joint training's performance may be seen as an upper bound for LwF.
> 
Limitations
- It cannot properly handle domains that change continually along a spectrum (e.g., the old task is classification from a top-down view and the new task is classification from views at unknown angles).
- LwF requires all new-task training data to be available up front, since the old-task responses must be recorded on it before training.
- Its ability to incrementally learn many new tasks is limited: old-task performance gradually drops as more tasks are added.
- The gap between LwF and joint training on both tasks is larger when the VGG architecture is used.