---
title: 'Paper Note'
author: wengyc
---
**Index**
---
* Used tags:
`parameter expansion`, `distillation`, `data imbalance`, `feature distillation`, `weight regularization`, `cosine normalization`, `from50`, `feature bridge`
---
[TOC]
---
# Coding for Machines
---
## Learn A Compression for Object Detection - VAE with a Bridge
[[pdf]]()
###### tags: `feature bridge`
* **My word to the paper**
> The feature feels more like it is generated rather than extracted.
>
> **Problem** :
> * The detection network is fine-tuned
>
**Idea**
* It is not natural to directly adapt features learned for image compression to complex semantic tasks.
**Method**
* 
* 
**Note**
* Dataset: CLIC2019 / PASCAL VOC2007 & 2012
* CLIC is used to train the encoder with a reconstruction loss first; then encoder + bridge + detection is trained on VOC with a detection loss (sketched below).
* Test: VOC2007 test set
* Baselines: JPEG, JPEG2000, HEVC (BPG), and 2 end-to-end codecs -> reconstruct to RGB, then run detection
* The encoder is the same in both settings
* The detection network is fine-tuned without the bridge
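
A minimal PyTorch-style sketch of the two-stage training above, under my own assumptions: `Encoder`, `Decoder`, `Bridge`, and `DetectionHead` are toy stand-ins (not the paper's architecture), and the random tensors stand in for CLIC / VOC batches. The only point is the staging: reconstruction-only optimization first, then joint detection-driven optimization through the bridge.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):           # image -> latent
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, 3, stride=2, padding=1)
    def forward(self, x):
        return self.conv(x)

class Decoder(nn.Module):           # latent -> reconstructed image
    def __init__(self):
        super().__init__()
        self.deconv = nn.ConvTranspose2d(8, 3, 4, stride=2, padding=1)
    def forward(self, y):
        return self.deconv(y)

class Bridge(nn.Module):            # adapt codec latents to detection features
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(8, 16, 1)
    def forward(self, y):
        return self.conv(y)

class DetectionHead(nn.Module):     # toy "detector" that directly returns a loss
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(16, 5, 1)
    def forward(self, f, target):
        return F.mse_loss(self.conv(f), target)

enc, dec, bridge, det = Encoder(), Decoder(), Bridge(), DetectionHead()

# Stage 1 (CLIC): encoder + decoder, reconstruction loss only.
opt1 = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-4)
x = torch.rand(2, 3, 64, 64)                    # stand-in for a CLIC batch
loss_rec = F.mse_loss(dec(enc(x)), x)
opt1.zero_grad(); loss_rec.backward(); opt1.step()

# Stage 2 (VOC): encoder + bridge + detector, detection loss only.
params = list(enc.parameters()) + list(bridge.parameters()) + list(det.parameters())
opt2 = torch.optim.Adam(params, lr=1e-4)
x = torch.rand(2, 3, 64, 64)                    # stand-in for a VOC batch
target = torch.rand(2, 5, 32, 32)               # stand-in detection target
loss_det = det(bridge(enc(x)), target)
opt2.zero_grad(); loss_det.backward(); opt2.step()
```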
---
## Scalable Image Coding for Humans and Machines
[[pdf]](https://arxiv.org/abs/2107.08373)
###### tags: `feature bridge`
* **My word to the paper**
>
> **Problem** :
> * Y1 and Y2 are encoded independently
>
**Idea**
* Split in the latent space (see the sketch at the end of this entry)
**Method**
* 
**Note**
* Not sure why they mention that Y is enough to predict T (on page 3)
* which is somewhat true, if the information loss from compression is not considered
* Dataset: CLIC2019 / JPEG-AI / PASCAL VOC2007 & 2012
* CLIC and JPEG-AI are used to train the encoder with a reconstruction loss first
* Baselines: HEVC (HM), VVC (VTM), and 2 end-to-end codecs -> reconstruct to RGB, then run detection
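
A toy sketch of the latent-space split, under my own assumptions (arbitrary channel split; the encoder, task head, and reconstruction head are stand-ins, and the separate entropy coding of Y1 and Y2 is omitted): the machine task uses only the base part Y1, while human reconstruction uses Y1 + Y2.
```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, c=16):
        super().__init__()
        self.conv = nn.Conv2d(3, c, 3, stride=2, padding=1)
    def forward(self, x):
        return self.conv(x)

enc = Encoder(c=16)
task_head = nn.Conv2d(8, 21, 1)                                 # base branch (toy head, e.g. 21 VOC classes)
recon_head = nn.ConvTranspose2d(16, 3, 4, stride=2, padding=1)  # enhancement branch

x = torch.rand(1, 3, 64, 64)
y = enc(x)                                       # full latent Y
y1, y2 = y[:, :8], y[:, 8:]                      # split: Y1 = base, Y2 = enhancement

task_out = task_head(y1)                         # machine task sees only Y1
x_hat = recon_head(torch.cat([y1, y2], dim=1))   # human reconstruction sees Y1 + Y2
```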
---
##
[[pdf]]()
###### tags:
* **My word to the paper**
>
> **Problem** :
> *
**Idea**
* 
* This is basically the idea we want with I(XYZ)
*
**Method**
*
**Note**
*
---
## Learning based Multi-modality Image and Video Compression
[[pdf]](https://openaccess.thecvf.com/content/CVPR2022/papers/Lu_Learning_Based_Multi-Modality_Image_and_Video_Compression_CVPR_2022_paper.pdf)
###### tags:
* **My word to the paper**
>
> **Problem** :
> *
**Idea**
*
**Method**
* 
* 
*
**Note**
* Dataset: FLIR, KAIST
*
---
# Incremental Learning
---
[Catastrophic Forgetting (Zhihu)](https://zhuanlan.zhihu.com/p/40328623)
[Incremental Learning (Zhihu)](https://zhuanlan.zhihu.com/p/55005256)
[git](https://github.com/xialeiliu/Awesome-Incremental-Learning)
---
## Striking a Balance between Stability and Plasticity for Class-Incremental Learning
ICCV 2021 [[pdf]](https://openaccess.thecvf.com/content/ICCV2021/papers/Wu_Striking_a_Balance_Between_Stability_and_Plasticity_for_Class-Incremental_Learning_ICCV_2021_paper.pdf)
###### tags: `distillation` `cosine normalization` `feature distillation` `from50`
* **My word to the paper**
> contrastive learning with LUCIR
> **Problem** :
> * Not sure why they frame it as multi-perspective, class-independent knowledge?
> * How does this reach such a high average accuracy without exemplars?
**Idea**
* Add contrastive learning and self-supervised learning (rotation classification) for a more robust feature extractor. Distillation is done with an L2 distance on the normalized feature space (sketch below).
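
A minimal sketch of that distillation term as I read it (squared L2 / MSE between L2-normalized old and new features; the tensors are random stand-ins):
```python
import torch
import torch.nn.functional as F

f_old = torch.rand(8, 512)                       # features from the frozen old model
f_new = torch.rand(8, 512, requires_grad=True)   # features from the current model

loss_fd = F.mse_loss(F.normalize(f_new, dim=1),  # squared L2 between unit-normalized features
                     F.normalize(f_old, dim=1))
loss_fd.backward()
```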
**Method**
* SPB-I

* SPB-M

---
## Supervised Contrastive Replay: Revisiting the Nearest Class Mean Classifier in Online Class-Incremental Continual Learning
CVPR 2021 Workshop [[pdf]](https://arxiv.org/pdf/2103.13885.pdf)
###### tags: `ncm`
* **My word to the paper**
> NCM is easy and efficient for avoiding bias toward new classes; contrastive learning is added to learn a feature space in which the data is better clustered.
> **Problem** : why is a projection head needed after the feature extractor during training?
**Idea**
* Use supervised contrastive learning to learn a better feature space, then use NCM for classification to avoid the bias of a linear classifier (see the sketch under Method).
* 
**Method**
* 
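A minimal NCM sketch matching the idea above, with random stand-in features (in the paper the class means would presumably come from the memory buffer):
```python
import torch
import torch.nn.functional as F

def ncm_predict(feats, mem_feats, mem_labels, num_classes):
    """Assign each feature to the class whose (normalized) mean feature is closest."""
    feats = F.normalize(feats, dim=1)
    means = torch.stack([
        F.normalize(mem_feats[mem_labels == c].mean(dim=0), dim=0)
        for c in range(num_classes)
    ])                                            # (num_classes, d)
    return torch.cdist(feats, means).argmin(dim=1)

mem_f = F.normalize(torch.rand(100, 128), dim=1)  # stand-in buffer features
mem_y = torch.arange(100) % 10                    # stand-in buffer labels, all classes present
pred = ncm_predict(torch.rand(16, 128), mem_f, mem_y, num_classes=10)
```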
---
## DER: Dynamically Expandable Representation for Class Incremental Learning
CVPR 2021
[[pdf]](https://arxiv.org/pdf/2103.16788.pdf)
###### tags: `parameter expansion` `distillation` `feature distillation` `data imbalance`
* **My word to the paper**
>
> **Problem** :
**Idea**
* Expand the model for the new task, then prune it (toy sketch under Method).
* 
**Method**
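A toy sketch of the expansion step under my reading of it (freeze the old feature extractor, add a new one, classify on the concatenation); the pruning/masking part is omitted and the module sizes are arbitrary:
```python
import torch
import torch.nn as nn

old_extractor = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 64))  # frozen copy from step t-1
new_extractor = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 64))  # trainable, added for step t
for p in old_extractor.parameters():
    p.requires_grad_(False)

classifier = nn.Linear(64 + 64, 20)      # unified head over old + new classes (20 is arbitrary)

x = torch.rand(4, 3, 32, 32)
feat = torch.cat([old_extractor(x), new_extractor(x)], dim=1)  # expanded representation
logits = classifier(feat)
```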
---
---
## Learning a Unified Classifier Incrementally via Rebalancing
CVPR 2019
[[site]](http://mmlab.ie.cuhk.edu.hk/projects/rebalanced-learning/) [[code]](https://github.com/hshustc/CVPR19_Incremental_Learning)
###### tags: `distillation` `feature distillation`
* **My word to the paper**
>
> **Problem** :
**Idea**

**Method**
---
---
## Continuous Learning in Single-Incremental-Task Scenarios
[[pdf]](https://arxiv.org/pdf/1806.08568.pdf)
---
---
## Less-forgetful Learning for Domain Expansion in Deep Neural Networks
[[pdf]](https://arxiv.org/pdf/1711.05959.pdf)
---
---
## Learning without Memorizing
CVPR 2019
[[pdf]](http://openaccess.thecvf.com/content_CVPR_2019/papers/Dhar_Learning_Without_Memorizing_CVPR_2019_paper.pdf)
---
---
## FearNet: Brain-Inspired Model for Incremental Learning
ICLR 2018
[[pdf]](https://arxiv.org/pdf/1711.10563.pdf)
---
---
## Incremental Classifier Learning with Generative Adversarial Networks
[[pdf]](https://arxiv.org/pdf/1802.00853.pdf)
---
---
## Large Scale Incremental Learning
[[pdf]](https://arxiv.org/pdf/1905.13260.pdf)
###### tags: `distillation` `data imbalance`
* **My word to the paper**
> A linear model can correct the bias toward the most recently learned classes.
> **Problem** : why does the linear model work?
> **Problem** : bias was not seen in the iCaRL confusion matrix
**Method**

* Add a bias correction layer with only two parameters (sketch below)

* This fixes the bias toward new classes caused by the imbalanced training set for old classes
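
A minimal sketch of that bias correction layer: old-class logits pass through unchanged, new-class logits are rescaled by the two learned parameters (the class counts below are just an example):
```python
import torch
import torch.nn as nn

class BiasCorrection(nn.Module):
    def __init__(self):
        super().__init__()
        self.alpha = nn.Parameter(torch.ones(1))   # the only two learnable parameters
        self.beta = nn.Parameter(torch.zeros(1))

    def forward(self, logits, num_old):
        old, new = logits[:, :num_old], logits[:, num_old:]
        return torch.cat([old, self.alpha * new + self.beta], dim=1)

bic = BiasCorrection()
logits = torch.rand(8, 100)                        # e.g. 80 old + 20 new classes
corrected = bic(logits, num_old=80)
```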

---
---
## Lifelong GAN: Continual Learning for Conditional Image Generation
[[pdf]](https://arxiv.org/pdf/1907.10107.pdf)
###### tags: `distillation`
* **My word to the paper**
> The conflict between the previous model's output and the ground truth can be smoothed by swapping them when computing the distillation loss.
> **Problem** : how does this actually work?
**Method**
* Backbone : BicycleGAN

* Distillation loss


* Conflict Removal with Auxiliary Data
> The first term encourages the model to reconstruct the inputs of the current task, while the third term encourages the model to generate the same images as the outputs of the old model. In addition, the first term encourages the model to encode the input images to normal distributions, while the second term encourages the model to encode the input images to a distribution learned from the old model.
> (quote from paper)
1. Montage : a subset of the current image data is used (as montages) for distillation
2. Swap : swap the conditional image $A_t$ and the ground truth image $B_t$ for distillation (see the sketch below)
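
My shorthand for why the swap avoids the conflict (notation mine, not the paper's exact losses): the current task is trained on the pair $A_t \to B_t$, while distillation conditions on the swapped input $B_t$, so the old model's output never competes with the current ground truth.
$$
\mathcal{L}_{\text{task}} \sim \big\|G_t(A_t) - B_t\big\|_1,
\qquad
\mathcal{L}_{\text{distill}} \sim \big\|G_t(B_t) - G_{t-1}(B_t)\big\|_1 .
$$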
---
---
## FOOD IMAGE RECOGNITION BY PERSONALIZED CLASSIFIER
[[pdf]](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8451422&tag=1)
---
---
## ENHANCING CNN INCREMENTAL LEARNING CAPABILITY WITH AN EXPANDED NETWORK
[[pdf]](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8486457)
---
---
## **NETWORK ADAPTATION STRATEGIES FOR LEARNING NEW CLASSES WITHOUT FORGETTING THE ORIGINAL ONES**
2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
[[pdf]](https://ieeexplore.ieee.org/document/8682848)
###### tags: `lambda of distillation` `distillation`
* **My word to the paper**
> A comparison between the choice of epsilon and which layers to retrain for the new classes.
> **Problem** : how was this accepted by IEEE?
**Method**
* Choice of epsilon


* Comparison on CIFAR-100

* Comparison of epsilon

---
---
## **NETTAILOR: Tuning the architecture, not just the weights**
CVPR 2019
[[pdf]](https://arxiv.org/pdf/1907.00274.pdf)
###### tags: `model change`
* **My word to the paper** :
> This is a transfer-learning method that prunes the network at the block level rather than per parameter (toy sketch below).
> **Problem** : Shouldn't it work as well as per-parameter pruning, just faster?
**Method**
* Structure :

* Three situations in which a block is pruned :

* Result :
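
A toy illustration of block-level pruning, under my own assumptions (this is not NETTAILOR's actual architecture, which mixes pretrained "universal" blocks with small task-specific ones): each candidate block gets a learned scalar, a sparsity penalty pushes unused blocks toward zero, and whole blocks below a threshold are dropped.
```python
import torch
import torch.nn as nn

blocks = nn.ModuleList([nn.Linear(16, 16) for _ in range(4)])  # candidate blocks
alphas = nn.Parameter(torch.ones(4))                           # one importance scalar per block

def forward(x):
    out = x
    for a, blk in zip(alphas, blocks):
        out = out + a * torch.relu(blk(out))   # residual-style mixing, weighted per block
    return out

y = forward(torch.rand(2, 16))
sparsity_penalty = alphas.abs().sum()          # added to the task loss during training
keep = alphas.detach().abs() > 0.05            # after training: drop whole blocks, not single weights
```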

---
---
## Memory Aware Synapses: Learning what (not) to forget
ECCV 2018
[[pdf]](https://arxiv.org/pdf/1711.09601.pdf)
###### tags: `function distillation` `distillation`
* **My word to the paper** :
> Compute the importance of each parameter from the gradient of the learned function's output, and penalize changes to the important weights (formulas reconstructed below).
> **Problem** : None.
**Method** :
* Function gradient :

* Importance :

* Total loss :
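My reconstruction of the three items above (double-check against the paper's notation): the gradient of the squared L2 norm of the output, the accumulated importance, and the regularized total loss.
$$
g_{ij}(x_k) = \frac{\partial\,\big\|F(x_k;\theta)\big\|_2^2}{\partial \theta_{ij}},
\qquad
\Omega_{ij} = \frac{1}{N}\sum_{k=1}^{N}\big|g_{ij}(x_k)\big|,
\qquad
L(\theta) = L_n(\theta) + \lambda\sum_{i,j}\Omega_{ij}\,\big(\theta_{ij}-\theta_{ij}^{*}\big)^{2}
$$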

---
---
## Scalable Recollections for Continual Lifelong Learning
[[pdf]](https://pdfs.semanticscholar.org/8013/3ec669208388df8e6ec327f1273b8e7c86b7.pdf)
---
---
## Encoder Based Lifelong Learning
[[pdf]](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8237410&tag=1)
---
---
## Rotate your Networks: Better Weight Consolidation and Less Catastrophic Forgetting
[[pdf]](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8545895)
###### tags: `weight regularization` `EWC`
* **My word to the paper** :
> Approximate a rotation of the network (rotation matrices computed by SVD) so that the FIM (Fisher information matrix) becomes approximately diagonal, which gives the EWC method higher accuracy (the EWC penalty is recapped below).
> **Problem** : Read EWC and understand FIM.
**Method**
* Idea :


* Indirect rotation :


* Adding the rotation without changing the structure of the original network :

* Compare EWC / R-EWC :
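For reference, the standard EWC penalty with a diagonal Fisher approximation (my notation); R-EWC applies the same penalty in a rotated parameter space where the diagonal approximation holds better.
$$
L(\theta) = L_{\text{new}}(\theta) + \frac{\lambda}{2}\sum_{i} F_{i}\,\big(\theta_{i}-\theta_{i}^{*}\big)^{2},
\qquad
F_{i} = \mathbb{E}_{x}\!\left[\left(\frac{\partial \log p(y\mid x;\theta)}{\partial \theta_{i}}\right)^{\!2}\right]_{\theta=\theta^{*}}
$$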

---
---
## Dynamic Few-Shot Visual Learning without Forgetting
[[pdf]](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8578557)
---
---
## Efficient parametrization of multi-domain deep neural networks
[[pdf]](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8578945&tag=1)
---
---
## LEARNING TO LEARN WITHOUT FORGETTING BY MAXIMIZING TRANSFER AND MINIMIZING INTERFERENCE
[[pdf]](https://pdfs.semanticscholar.org/2b87/7889ac31b73d1ede70b00eb4c7118ef8eca2.pdf)
---
---
## Lifelong Learning via Progressive Distillation and Retrospection
[[pdf]](http://openaccess.thecvf.com/content_ECCV_2018/papers/Saihui_Hou_Progressive_Lifelong_Learning_ECCV_2018_paper.pdf)
---
---
## Model Transfer with Explicit Knowledge of the Relation between Class Definitions
[[pdf]](https://pdfs.semanticscholar.org/8eed/5f0d4a6b9713380a5830c169ebf4cad88d85.pdf)
---
---
## Overcoming Catastrophic Forgetting with Hard Attention to the Task
[[pdf]](https://pdfs.semanticscholar.org/3087/0ef75aa57e41f54310283c0057451c8c822b.pdf)
---
---
## Progress & Compress: A scalable framework for continual learning
[[pdf]](https://pdfs.semanticscholar.org/394c/990d9621dd4a8cbe966333ffb26078b9816d.pdf)
---
---
## Riemannian Walk for Incremental Learning: Understanding Forgetting and Intransigence
[[pdf]](https://arxiv.org/pdf/1801.10112.pdf)
> better to read this first
> [name=翁英傑]
---
---
## Revisiting Distillation and Incremental Classifier Learning
[[pdf]](https://arxiv.org/pdf/1807.02802.pdf)
> read this before
> [name=翁英傑]
---
---
## Recent Advances in Zero-Shot Recognition: Toward Data-Efficient Understanding of Visual Content
[[pdf]](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8253589&tag=1)
---
---
## Re-evaluating Continual Learning Scenarios: A Categorization and Case for Strong Baselines
[[pdf]](https://arxiv.org/pdf/1810.12488.pdf)
---
---
## MEASURING AND REGULARIZING NETWORKS IN FUNCTION SPACE
[[pdf]](https://pdfs.semanticscholar.org/e3fe/e9244fc47aa9e80006e39352af90f64631fe.pdf)
---
---
## DeeSIL: Deep-Shallow Incremental Learning
[[pdf]](http://openaccess.thecvf.com/content_ECCVW_2018/papers/11130/Belouadah_DeeSIL_Deep-Shallow_Incremental_Learning._ECCVW_2018_paper.pdf)
---
---
## Predicting Visual Exemplars of Unseen Classes for Zero-Shot Learning
[[pdf]](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8237638&tag=1)
---
---
## New Metrics and Experimental Paradigms for Continual Learning
[[pdf]](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8575441)
---
---
## Gradient Episodic Memory for Continual Learning
[[pdf]](https://papers.nips.cc/paper/7225-gradient-episodic-memory-for-continual-learning.pdf)
---
---
## Online Continual Learning with Maximally Interfered Retrieval
[[pdf]](https://arxiv.org/pdf/1908.04742.pdf)
---
---
## Gradient based sample selection for online continual learning
[[pdf]](https://arxiv.org/pdf/1903.08671.pdf)
---
---
## Piggyback: Adapting a Single Network to Multiple Tasks by Learning to Mask Weights
[[pdf]](https://arxiv.org/pdf/1801.06519.pdf)
---
---
## Learning a Unified Classifier Incrementally via Rebalancing
[[pdf]](http://openaccess.thecvf.com/content_CVPR_2019/papers/Hou_Learning_a_Unified_Classifier_Incrementally_via_Rebalancing_CVPR_2019_paper.pdf?fbclid=IwAR0o2tHyR45CIbChLVZk63jkjFA7slgi8Vfc8kTYu8JZy43kRcGqKyEkWiU)
---
---
# Image Retrieval
---
---
## Local Features and Visual Words Emerge in Activations
[[pdf]](https://arxiv.org/pdf/1905.06358.pdf)
---
---
# Noise Label
---
---
## Joint Optimization Framework for Learning with Noisy Labels
[[pdf]](https://arxiv.org/pdf/1803.11364.pdf)
* **My word to the paper**
> Take the model prediction as the ground truth after a few training steps, which is believed to handle noisy labels.
> **Problem** : a strong prior is required (e.g., a uniform class distribution for CIFAR-10), which is not commonly known.
**Method**
* Loss Function

* Regularization (both terms are reconstructed below)
> Constrain the predictions from all being assigned to the same class, i.e., force the mean prediction **not to drift too far from the prior**.

> In this paper they set the prior to a **uniform distribution**.
> This term makes sure the new distribution is **concentrated on a single class**,

> which is more **useful with soft labels**.
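
My reconstruction of the two regularizers (check against the paper): the prior term keeps the mean prediction $\bar{s}$ close to the prior $p$, and the entropy term concentrates each prediction on a single class, with total loss $\mathcal{L} = \mathcal{L}_c + \alpha\mathcal{L}_p + \beta\mathcal{L}_e$.
$$
\mathcal{L}_p = \sum_{j} p_j \log\frac{p_j}{\bar{s}_j},
\qquad
\bar{s}_j = \frac{1}{n}\sum_{i=1}^{n} s_j(\theta, x_i),
\qquad
\mathcal{L}_e = -\frac{1}{n}\sum_{i=1}^{n}\sum_{j} s_j(\theta, x_i)\log s_j(\theta, x_i)
$$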
---
---
# Multi task Learning
---
---
## Stochastic Filter Groups for Multi-Task CNNs: Learning Specialist and Generalist Convolution Kernels
ICCV 2019
[[site]](https://arxiv.org/pdf/1908.09597.pdf)
###### tags: `share weight`
* **My word to the paper**
> Let the network decide the probabilities of each filter being [ task1-specific , shared , task2-specific ] (toy sketch below).
> **Problem** : Is the approximation good enough?
**Idea**


**Method**
* loss function :

* Approximation of the KL divergence
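
A toy sketch of the grouping idea, under my own assumptions (the hard Bernoulli sampling and names are mine; the paper's variational inference with Gumbel-softmax and the KL term are omitted): each filter carries learned probabilities of being [task1-specific, shared, task2-specific], and a task's forward pass keeps its own filters plus the shared ones.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

num_filters = 32
conv = nn.Conv2d(3, num_filters, 3, padding=1)
group_logits = nn.Parameter(torch.zeros(num_filters, 3))    # [task1, shared, task2] per filter

def forward_for_task(x, task):                               # task: 0 or 1
    p = F.softmax(group_logits, dim=1)                       # (num_filters, 3)
    keep = p[:, 2 * task] + p[:, 1]                          # task-specific + shared mass
    mask = torch.bernoulli(keep.detach())                    # stochastic filter selection (no reparam.)
    return conv(x) * mask.view(1, -1, 1, 1)                  # drop the other task's filters

y1 = forward_for_task(torch.rand(2, 3, 32, 32), task=0)
y2 = forward_for_task(torch.rand(2, 3, 32, 32), task=1)
```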

---
---