---
title: Learning without Forgetting
tags: LLL
---
# 7/4 Paper #6
### Learning without Forgetting
[toc]
### Prerequisites
None
### Background
IEEE 2017
### Author
Zhizhong Li (李之仲)

Tsinghua University (Beijing) -> CMU -> UC Santa Barbara
### Abstract:
Lifelong learning: the model first learns task A, then task B, and afterwards should perform well on both (the key points are sequential training and avoiding catastrophic forgetting).
Among prior approaches, the most trivial one is to keep task A's data and pour it back into the dataset when training task B, so that training on both keeps either task's performance from dropping too much. Here, the authors propose a method that does not use any old-task data.
---
### Overview
An overview of the three approaches most closely related to this method (a sketch contrasting the first two follows the list):

Fine-tuning:
add a new output layer for the new task and fine-tune all parameters with a small learning rate
Feature Extraction:
use the features extracted by the old model and add a layer on top for prediction (only the added layer is tuned)
Joint Training:
train on the old and new tasks together, which requires keeping the old tasks' training data
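
A minimal PyTorch sketch contrasting the first two baselines, assuming a hypothetical AlexNet backbone pretrained on the old task and a made-up class count `num_new_classes`; the only real difference is which parameters are allowed to update:

```python
import torch
import torchvision

num_new_classes = 20  # hypothetical new-task label count (e.g. VOC has 20 classes)

# Old-task network; replace the final FC layer with a new head for the new task.
model = torchvision.models.alexnet(weights="IMAGENET1K_V1")
model.classifier[6] = torch.nn.Linear(4096, num_new_classes)  # new head (theta_n)

# Baseline 1 -- Feature Extraction: freeze everything except the newly added head.
for name, p in model.named_parameters():
    p.requires_grad = name.startswith("classifier.6")
feature_extraction_opt = torch.optim.SGD(
    [p for p in model.parameters() if p.requires_grad], lr=1e-2)

# Baseline 2 -- Fine-tuning: all parameters trainable, but with a small learning rate.
for p in model.parameters():
    p.requires_grad = True
fine_tuning_opt = torch.optim.SGD(model.parameters(), lr=1e-4)
```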
----
### Methods:
Definition:
$\theta_s$: parameters of the original shared network
$\theta_o$: original output FC layer (old task)
$\theta_n$: new FC layer for the new task
---
Knowledge Distillation loss (keeps the old head's outputs close to the responses recorded from the original network; the paper uses temperature $T = 2$):

$$\mathcal{L}_{old}(\mathbf{y}_o, \hat{\mathbf{y}}_o) = -\sum_{i} y_o'^{(i)} \log \hat{y}_o'^{(i)}, \qquad y_o'^{(i)} = \frac{\big(y_o^{(i)}\big)^{1/T}}{\sum_j \big(y_o^{(j)}\big)^{1/T}}, \quad \hat{y}_o'^{(i)} = \frac{\big(\hat{y}_o^{(i)}\big)^{1/T}}{\sum_j \big(\hat{y}_o^{(j)}\big)^{1/T}}$$

new task loss (standard multinomial cross-entropy on the new head's softmax output):

$$\mathcal{L}_{new}(\mathbf{y}_n, \hat{\mathbf{y}}_n) = -\mathbf{y}_n \cdot \log \hat{\mathbf{y}}_n$$

The overall objective minimized over $\theta_s$, $\theta_o$, $\theta_n$ is $\lambda_o \mathcal{L}_{old} + \mathcal{L}_{new} + \mathcal{R}$, where $\mathcal{R}$ is weight decay and $\lambda_o$ weights the old-task term.
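
A minimal PyTorch-style sketch of one LwF training step under the definitions above. The module names `shared` ($\theta_s$), `old_head` ($\theta_o$) and `new_head` ($\theta_n$) are hypothetical, and `old_targets` are assumed to be the old head's responses to the new-task images, recorded as logits with the frozen original network before training starts:

```python
import torch
import torch.nn.functional as F

T = 2.0           # distillation temperature used in the paper
lambda_old = 1.0  # weight on the old-task (distillation) loss

def lwf_loss(new_logits, old_logits, new_labels, old_targets):
    """New-task cross-entropy plus the knowledge-distillation term."""
    # New task loss: standard multinomial cross-entropy through theta_n.
    loss_new = F.cross_entropy(new_logits, new_labels)

    # Knowledge Distillation loss: soften both the recorded responses and the
    # current old-head outputs with temperature T, then take the cross-entropy
    # between the two softened distributions.
    target_probs = F.softmax(old_targets / T, dim=1)
    log_probs = F.log_softmax(old_logits / T, dim=1)
    loss_old = -(target_probs * log_probs).sum(dim=1).mean()

    return loss_new + lambda_old * loss_old

def train_step(shared, old_head, new_head, optimizer, x, new_labels, old_targets):
    """One optimization step over theta_s, theta_o and theta_n jointly."""
    optimizer.zero_grad()
    features = shared(x)             # theta_s
    old_logits = old_head(features)  # theta_o
    new_logits = new_head(features)  # theta_n
    loss = lwf_loss(new_logits, old_logits, new_labels, old_targets)
    loss.backward()
    optimizer.step()
    return loss.item()
```

In the paper the new head $\theta_n$ is first warmed up alone (with $\theta_s$ and $\theta_o$ frozen) before this joint stage.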
### Benefit:
Relationship to joint training:
achieves a similar goal to joint training, but without using the old tasks' training data
Efficiency comparison:
compared with Feature Extraction:
slower, but higher performance across both tasks
compared with joint training:
faster, and does not need the old dataset during training
---
### Experiment, Result:
Dataset Detail:
original (old-task) datasets:
ILSVRC 2012 (a subset of ImageNet): 1,000 classes, more than 1,000,000 training images
Places365-standard: 1,600,000 training images
new tasks:
1. PASCAL VOC 2012 image classification ("VOC")
2. Caltech-UCSD Birds-200-2011 classification ("CUB")
3. MIT indoor scene classification ("Scenes")
4. MNIST
number of training images for the new tasks:
1. 5,717 for VOC <--> ILSVRC 2012
2. 5,994 for CUB
3. 5,360 for Scenes <--> Places365-standard dataset

----
Multiple new tasks:
VOC is split into 3 parts: transport, animals, and objects
Scenes is split into large rooms, medium rooms, and small rooms
(a minimal sketch of the multi-head setup follows)
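
A minimal sketch of the multi-head setup for multiple new tasks, assuming a hypothetical shared backbone `shared` with feature dimension `feat_dim`; each time a new task (e.g. one of the three VOC splits) arrives, a fresh head is appended while all existing heads are kept for distillation:

```python
import torch
import torch.nn as nn

class MultiHeadLwF(nn.Module):
    """Shared parameters theta_s plus one output head per task seen so far."""
    def __init__(self, shared: nn.Module, feat_dim: int):
        super().__init__()
        self.shared = shared          # theta_s
        self.feat_dim = feat_dim
        self.heads = nn.ModuleList()  # old heads (theta_o) + newest head (theta_n)

    def add_task(self, num_classes: int) -> None:
        """Append a new task-specific FC layer (theta_n for the incoming task)."""
        self.heads.append(nn.Linear(self.feat_dim, num_classes))

    def forward(self, x):
        f = self.shared(x)
        # One logit vector per task: old heads feed the distillation loss,
        # the newest head feeds the new-task cross-entropy loss.
        return [head(f) for head in self.heads]
```

Before training each new task, the current model's outputs on the new data are recorded for every existing head, so the distillation loss covers all previously learned tasks.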

---
Design choices and alternatives:



### Detail:
expansion:
