# 8/20 paper 11
## Overcoming Catastrophic Forgetting for Continual Learning via Model Adaptation
---
## OpenReview scores: 5, 6, 7 (ICLR 2019)
## Authors
Wenpeng Hu, Zhou Lin, Bing Liu, Chongyang Tao, Zhengwei Tao, Dongyan Zhao,
Jinwen Ma, Rui Yan (Peking University)
---
## Abstract
Goal: avoid catastrophic forgetting in continual learning.
New method: Parameter Generation and Model Adaptation (PGMA)
---
## Introduction
----
There are multiple existing methods for dealing with catastrophic forgetting (LwF, DEN, EWC, SI, IMM, GEM, GR).
----
* **LwF** (Learning without Forgetting)
* **DEN** (Dynamically Expandable Network)
Both expand the network architecture when a new task needs to be trained.
----
* **EWC** (Elastic Weight Consolidation)
* **SI** (Synaptic Intelligence)
* **IMM** (Incremental Moment Matching)
* **GEM** (Gradient Episodic Memory)
* **GR** (Generative Replay)
The paper refers to all of the methods above as **JP** (**j**oint **p**arameterization) methods.
----
**JP** (joint parameterization):
A single joint set of parameters $\theta$ is trained on all tasks, assuming it can serve every task well. This has a drawback:
**accuracy deterioration**.
----
**Accuracy deterioration**:
Assume we have two tasks A and B.
The optimal parameter set for A is $\theta_A$, and for B it is $\theta_B$.
A JP method finds a compromise parameter set $\theta_C$
between $\theta_A$ and $\theta_B$.
Since $\theta_C$ is neither $\theta_A$ nor $\theta_B$, both tasks suffer accuracy deterioration (see the worked example below).
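A minimal worked example (my own illustration, not from the paper): if each task has a quadratic loss around its own optimum, minimizing the joint loss lands exactly halfway between the two optima.
$$
L_A(\theta) = (\theta - \theta_A)^2, \qquad L_B(\theta) = (\theta - \theta_B)^2
$$
$$
\frac{d}{d\theta}\big[L_A(\theta) + L_B(\theta)\big] = 0 \;\Rightarrow\; \theta_C = \frac{\theta_A + \theta_B}{2}
$$
Unless $\theta_A = \theta_B$, the compromise $\theta_C$ is suboptimal for both tasks.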
---
## Framework
----
**PGMA**:
The architecture has three components: **S** (Solver), **DG** (Data Generator), and **DPG** (Dynamic Parameter Generator).
----
* **S (Solver)**:
has two parameter sets (see the sketch below):
1. **$\theta_0$**: shared parameters used across all tasks
2. **$p_i$**: generated parameters for the $i$-th task
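A minimal sketch of how a solver layer could combine $\theta_0$ with the generated $p_i$; the additive combination and sizes are my assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class SolverLayer(nn.Module):
    """One solver layer: shared weights theta_0 adapted by generated parameters p_i."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        # theta_0: shared parameters, reused across all tasks
        self.theta_0 = nn.Parameter(torch.randn(out_dim, in_dim) * 0.01)
        self.bias = nn.Parameter(torch.zeros(out_dim))

    def forward(self, x, p_i):
        # p_i: generated parameters for the current task/sample, same shape as theta_0
        # (additive adaptation is an assumption; the paper's combination rule may differ)
        weight = self.theta_0 + p_i
        return torch.relu(x @ weight.t() + self.bias)
```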
----
**DG (Data Generator)**:
consists of a **DG encoder** ($DG_E$) and a **DG decoder** ($DG_D$).
The decoder's main role is to regenerate data points of earlier tasks so they can be replayed, avoiding catastrophic forgetting; the encoder produces the embedding **z** of an input.
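A minimal encoder/decoder sketch; the layer sizes and the plain autoencoder form are assumptions (the paper's DG may use a different generative model).

```python
import torch.nn as nn

class DataGenerator(nn.Module):
    """DG: encoder DG_E maps an input x to an embedding z; decoder DG_D regenerates
    data points of earlier tasks so they can be replayed during later training."""
    def __init__(self, x_dim=784, z_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(x_dim, 256), nn.ReLU(),
                                     nn.Linear(256, z_dim))      # DG_E
        self.decoder = nn.Sequential(nn.Linear(z_dim, 256), nn.ReLU(),
                                     nn.Linear(256, x_dim))      # DG_D

    def forward(self, x):
        z = self.encoder(x)        # embedding z (also consumed by the DPG)
        x_hat = self.decoder(z)    # regenerated example for replay
        return z, x_hat
```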
----
**DPG (Dynamic Parameter Generator)**:
a function $f(\cdot)$ that takes the embedding **z** as input and generates the dynamic parameters used to adapt the solver to each sample.
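A minimal sketch of the DPG and of how the three components could fit together at inference time; the output size and the way $p_i$ is applied are assumptions.

```python
import torch.nn as nn

class DPG(nn.Module):
    """DPG: f(z) -> dynamic parameters p_i, generated per input sample."""
    def __init__(self, z_dim=64, p_dim=128):
        super().__init__()
        # hidden size 1000 follows the training details later in these slides
        self.f = nn.Sequential(nn.Linear(z_dim, 1000), nn.ReLU(),
                               nn.Linear(1000, p_dim))

    def forward(self, z):
        return self.f(z)   # p_i for the current sample

# Assumed end-to-end flow for a single input x:
#   z, _ = data_generator(x)      # embed the input with DG_E
#   p_i  = dpg(z)                 # generate sample-specific parameters
#                                 # (reshaped to match the solver's weight shape)
#   y    = solver_layer(x, p_i)   # solver adapts theta_0 with p_i
```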
---
## Overview


---
## Experiment
----
Two image datasets:
1. MNIST (60,000 / 3,000 / 7,000 for training / validation / testing)
2. CIFAR-10 (50,000 / 3,000 / 7,000 for training / validation / testing)
Shuffled setting: 3 tasks and 5 tasks
----
Two text datasets:
1. DBPedia ontology (crowd-sourced dataset), 14 classes (560,000 / 10,000 / 60,000 for training / validation / testing)
2. THUCNews (50,000 / 5,000 / 10,000 for training / validation / testing)
----
Three experiment settings: 2 tasks (7 classes per task), 3 tasks (5, 5, and 4 classes respectively), and 5 tasks (3, 3, 3, 3, and 2 classes respectively).
----
Disjoint setting: MNIST is split into two tasks, {0,1,2,3,4} and {5,6,7,8,9}; CIFAR-10 is split the same way.
Settings: 2 tasks (5 classes per task) and 5 tasks (2 classes per task); a helper for this split is sketched below.
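A small sketch (my own helper, not from the paper) of splitting a labeled dataset into disjoint class-based tasks, e.g. MNIST into {0–4} and {5–9}:

```python
from collections import defaultdict

def split_into_tasks(samples, class_groups):
    """samples: iterable of (x, label) pairs; class_groups: e.g. [{0,1,2,3,4}, {5,6,7,8,9}]."""
    tasks = defaultdict(list)
    for x, y in samples:
        for task_id, classes in enumerate(class_groups):
            if y in classes:
                tasks[task_id].append((x, y))
                break
    return [tasks[i] for i in range(len(class_groups))]

# Disjoint MNIST, two tasks:
# task_a, task_b = split_into_tasks(mnist_train, [{0, 1, 2, 3, 4}, {5, 6, 7, 8, 9}])
```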
----
Baselines:
1. EWC
2. IMM
3. GR
---
## Training Details
----
Solver: a 3-layer network (combining the generated parameters $p$ and the shared parameters $\theta_0$).
----
DPG (T-Net): the size of each hidden layer is set to 1,000.
---
## Results
----



----

----

----

----

---
<style>
.reveal {
font-size: 24px;
}
img {
width: 50%;
height: auto;
}
div {
resize: both;
}
</style>
---
{"metaMigratedAt":"2023-06-14T23:31:07.846Z","metaMigratedFrom":"YAML","title":"PGMA(Parameter Generator via Model Adaption)","breaks":true,"contributors":"[{\"id\":\"622370bd-3571-44f0-a0b7-c19b051347e1\",\"add\":4510,\"del\":806}]"}