# 11/3 Paper #15
### Continual learning of context-dependent processing in neural networks
[toc]
[Paper link](https://www.nature.com/articles/s42256-019-0080-x.epdf?author_access_token=889XsK8ycbi_0xZ2FpJw1dRgN0jAjWel9jnR3ZoTv0P8gyRnkxk0BV8NCNOCrgcNeUN5GL2D9aBLeyhVvWlMz-ly29hf71ljZ6ZieFK9RStbagxhi6eg7KHglOEgwY1c5v7Tz9xR8OLYDzqw23OWYQ%3D%3D)
---
### Prerequisites
<center>RLS (Recursive Least Squares)</center>
---
### Background
<center>Prefrontal cortex (PFC)</center>

Function:
<center>Learn the “rules of the game” and dynamically apply them <br> to map sensory inputs to different actions in a context-dependent way.</center>
---
### Authors
<center>Guanxiong Zeng, Yang Chen, Bo Cui, Shan Yu <br> University of Chinese Academy of Sciences</center>
<center>Chinese Academy of Sciences <br> 2018/10, Nature Machine Intelligence</center>
---
### Abstract:
Using **Orthogonal Weight Modification (OWM)** together with an additional module for **Context-Dependent Processing (CDP)** to solve catastrophic forgetting in sequential learning.
---
### What do we need?
* A sequential learning manner
* A way to change the representation of sensory information to learn new tasks
---
### Overview
Concept:
* A sequential learning manner
* Change the representation of sensory information to learn new tasks
----
Tools:
* OWM (Orthogonal Weight Modification): overcomes catastrophic forgetting
* CDP (Context-Dependent Processing): reuses the features of a large model and applies them to other tasks
----
### OWM
Advantage: overcomes catastrophic forgetting without storing any previously seen data.
Disadvantage: the updates cost extra computation time!
----
**Orthogonal Weight Modification**
Idea: catastrophic forgetting happens because the weights learned for previous tasks are modified while training on the current task.
----

---
Trivial idea (EWC, MAS, SI):
Why not keep the important weights from drifting too much, so that performance on old tasks is preserved?!
---
**Adv** : easy to implement and alleviates forgetting.
**DisAdv** : the model ends up with a compromised weight distribution, so the tasks cannot all achieve a "double-win"; overall performance also suffers.
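
A minimal sketch of the shared idea behind these regularization methods, assuming a generic quadratic penalty with an arbitrary per-weight importance array (the actual importance estimates differ between EWC, MAS and SI):

```python
import numpy as np

# Generic "keep important weights from drifting" penalty (illustration only;
# the per-weight importance estimate is method-specific and assumed here).
def penalized_gradient(task_grad, w, w_old, importance, lam=1.0):
    """Gradient of  L_task + (lam/2) * sum_i importance_i * (w_i - w_old_i)^2."""
    return task_grad + lam * importance * (w - w_old)

# An important weight (importance=5.0) is pulled back toward its old value
# much harder than an unimportant one (importance=0.1).
g = penalized_gradient(task_grad=np.array([0.2, 0.2]),
                       w=np.array([1.0, 1.0]),
                       w_old=np.array([0.0, 0.0]),
                       importance=np.array([5.0, 0.1]))
print(g)   # [5.2 0.3]
```

The compromise criticized above is visible here: the new task must train under a constraint that pulls weights back toward values chosen for old tasks.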
----

New idea:
What if the model weights can only be modified in directions orthogonal to the subspace spanned by all previously learned inputs?!
This is the OWM idea!!!
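
A small numerical illustration of that constraint (my own sketch, not the authors' code), using a projector of the form $P = I - A(A^TA + \alpha I)^{-1}A^T$ built from a matrix $A$ whose columns are previously learned inputs:

```python
import numpy as np

# Project BP updates onto the subspace orthogonal to past inputs,
# so responses to those inputs stay (almost) unchanged.
def orthogonal_projector(A, alpha=1e-3):
    """P = I - A (A^T A + alpha*I)^(-1) A^T, shape (d, d)."""
    d, m = A.shape
    return np.eye(d) - A @ np.linalg.inv(A.T @ A + alpha * np.eye(m)) @ A.T

rng = np.random.default_rng(0)
A = rng.standard_normal((10, 3))       # 10-dim inputs from "old" tasks, one per column
P = orthogonal_projector(A)
dW_bp = rng.standard_normal((10, 5))   # BP weight update for a (10 -> 5) layer
dW_owm = P @ dW_bp                     # projected update actually applied
print(np.abs(A.T @ dW_owm).max())      # ~0: responses to old inputs barely change
```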
---
### Status
* ~~Sequential Learning manner~~
* Change the representation of sensory information to learn new tasks
---
### CDP
PFC-like module (Context-Dependent Processing)

Given the task identity (a context signal), change the representation of the sensory information so that a new task can be learned.
----
#### Why is this needed?
The system cannot accomplish context-dependent learning by itself.

The module's function: rotate the feature vector and extract the information that matters for the current task.
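
A minimal sketch of what such a module might look like; the element-wise gating of the feature by an encoded context signal is an assumption made here for illustration:

```python
import numpy as np

# Illustrative PFC-like (CDP) module: the context signal is encoded and then
# gates ("rotates") the shared sensory feature element-wise, so one feature
# extractor can serve many context-dependent mappings.
def cdp_features(feature, context, W_ctx):
    """feature: (d,), context: (c,) one-hot task signal, W_ctx: (c, d) learned encoder."""
    gate = 1.0 / (1.0 + np.exp(-(context @ W_ctx)))   # sigmoid context encoder
    return feature * gate                             # context-modulated feature

rng = np.random.default_rng(1)
feature = rng.standard_normal(8)             # output of the shared feature extractor
W_ctx = rng.standard_normal((40, 8))         # trained together with the classifier
print(cdp_features(feature, np.eye(40)[0], W_ctx))   # task 0: one representation
print(cdp_features(feature, np.eye(40)[1], W_ctx))   # task 1: a different one
```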
##### Discussion
A trivial solution to this problem:
train a separate new classifier for every task.
Adv: easy to implement!
Disadv: poor scalability, and it still suffers from forgetting (weights are shared between the feature extractor and the classifiers).

---
### Experiments and Results
* Check that OWM is scalable
* Check that the PFC-like module is functional
* Chinese character classification
* Celebrity attribute classification
----
* Check that OWM is scalable
(sequential training vs. joint training)

Data Set | Classes | Feature Extractor | Concurrent Training by SGD (%) | Sequential Training by OWM (%) | Sequential Training by SGD (%)
--- | --- | --- | --- | --- | ---
ImageNet | 1000 | ResNet152 | 78.31 | 75.24 | 4.27
HWDB1.1 | 3755 | ResNet18 | 97.46 | 93.46 | 35.86
----
* Check that the PFC-like module is functional
Learn 40 different mappings sequentially from the same sensory inputs:
a single classifier learns all 40 context-specific mapping rules, one task at a time.
Compare this setting against
multi-task training of 40 separate classifiers.

---
Details:
Orthogonal Weight Modification:
a) Initialization of parameters: randomly initialize $W_l(0)$ and set $P_l(0) = \dfrac{I_l}{\alpha}$ for $l = 1, \dots, L$.
b) Forward propagate the inputs of the $i^{th}$ batch in the $j^{th}$ task, then back propagate the errors and calculate the weight modification $\Delta W_{l}^{BP}(i,j)$ for $W_l(i,j)$ by the standard BP method.
----
c) Update the weight matrix in each layer by
$W_l(i,j) = W_l(i-1,j) + \kappa(i,j)\, \Delta W_l^{BP}(i,j)$ if $j = 1$
$W_l(i,j) = W_l(i-1,j) + \kappa(i,j)\, P_l(j-1)\, \Delta W_l^{BP}(i,j)$ if $j \geq 2$
where $\kappa(i, j)$ is a predefined learning rate.
d) Repeat steps b) and c) for every batch of the current task.
----
e) When the $j^{th}$ task is finished, forward propagate the mean of the inputs of each batch $(i=1,\dots,n_j)$ in the $j^{th}$ task successively.<br> Then update $P_l$ for $W_l$ as $P_l(j) = P_l(n_j,j)$,<br> which can be calculated iteratively according to:
$P_l(i,j) = P_l(i-1,j) - k_l(i,j)\, x_{l-1}(i,j)^T P_l(i-1,j)$
$k_l(i,j) = P_l(i-1,j)\, x_{l-1}(i,j) \big/ \left[1+x_{l-1}(i,j)^T P_l(i-1,j)\, x_{l-1}(i,j)\right]$
in which $x_{l-1}(i,j)$ is the output of the $(l-1)^{th}$ layer in response to the mean of the inputs in the $i^{th}$ batch of the $j^{th}$ task, and $P_l(0,j) = P_l(j-1)$.
f) Repeat steps b) to e) for the next task.
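
A compact single-layer sketch of steps a) to f) in NumPy (a paraphrase for illustration, not the authors' released code; `grad_fn` is a hypothetical helper that returns the standard BP weight modification, i.e. the descent direction, for one batch):

```python
import numpy as np

def train_owm(tasks, grad_fn, d_in, d_out, alpha=1e-3, lr=0.1):
    """tasks: list of tasks, each a list of (X, Y) batches, X of shape (batch, d_in)."""
    W = 0.01 * np.random.randn(d_in, d_out)      # a) initialize weights
    P = np.eye(d_in) / alpha                     # a) P(0) = I / alpha
    for j, batches in enumerate(tasks):
        for X, Y in batches:                     # b) forward + backward per batch
            dW_bp = grad_fn(W, X, Y)             #    BP modification, (d_in, d_out)
            if j == 0:
                W += lr * dW_bp                  # c) first task: plain BP update
            else:
                W += lr * (P @ dW_bp)            # c) later tasks: project the update
        for X, _ in batches:                     # e) after the task, update P using the
            x = X.mean(axis=0, keepdims=True).T  #    mean input of each batch, (d_in, 1);
                                                 #    for a single layer, the "layer l-1
                                                 #    output" is just the input itself
            k = (P @ x) / (1.0 + x.T @ P @ x)    #    k = P x / (1 + x^T P x)
            P = P - k @ (x.T @ P)                #    P <- P - k x^T P
    return W, P                                  # f) then move on to the next task

# A hypothetical grad_fn for a linear layer with squared-error loss:
def sq_err_grad(W, X, Y):
    return X.T @ (Y - X @ W) / len(X)            # descent direction of 0.5*||Y - XW||^2 / n
```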
----
Extra experimental support:
---
leave:







---
{"metaMigratedAt":"2023-06-15T01:21:11.780Z","metaMigratedFrom":"YAML","title":"Continual learning of context-dependent processing in neural networks","breaks":true,"description":"A Method Related Orthogonal Gradient Updates","contributors":"[{\"id\":\"622370bd-3571-44f0-a0b7-c19b051347e1\",\"add\":6896,\"del\":715}]"}