# Paper Record
###### tags: `Papers list`

`Basic` : `Paper that should be read`
`median` : `Medium-level paper`
`TF` : `Transformer-based model`
`NF` : `Not pursued further`
`GA` : `A piece of garbage`
`ITI` : `Interesting idea`
`POINTER` : `Latest paper listed`

## Compression

### Image Compression

1. [Variational image compression with a scale hyperprior](https://arxiv.org/abs/1802.01436) **tag: Basic**
2. [Practical Full Resolution Learned Lossless Image Compression](https://arxiv.org/abs/1811.12817) **tag: Basic**
3. [ANFIC](https://arxiv.org/abs/2107.08470) **tag: median**
4. [Lossy Image Compression with Normalizing Flows](https://arxiv.org/abs/2008.10486) **tag: median, NF**
5. [Learned Image Compression with Discretized Gaussian Mixture Likelihoods and Attention Modules](https://openaccess.thecvf.com/content_CVPR_2020/papers/Cheng_Learned_Image_Compression_With_Discretized_Gaussian_Mixture_Likelihoods_and_Attention_CVPR_2020_paper.pdf) **tag: median**
6. [NLA](https://arxiv.org/abs/1910.06244) **tag: median**
7. [Neural Inter-Frame Compression for Video Coding](https://openaccess.thecvf.com/content_ICCV_2019/papers/Djelouah_Neural_Inter-Frame_Compression_for_Video_Coding_ICCV_2019_paper.pdf) **tag: Basic**
8. [End-to-end Optimized Image Compression](https://arxiv.org/abs/1611.01704) **tag: NF**
9. [Checkerboard](https://arxiv.org/abs/2103.15306) **tag: median**
10. [Variable Rate Deep Image Compression with Modulated Autoencoder](https://arxiv.org/abs/1912.05526) **tag: GA**
11. [Generative Adversarial Networks for Extreme Learned Image Compression](https://arxiv.org/abs/1804.02958) **tag: median, GAN**
12. [Integer Discrete Flows and Lossless Compression](https://arxiv.org/abs/1905.07376) **tag**
13. [Coarse to fine image compression](https://huzi96.github.io/coarse-to-fine-compression.html) **tag: Basic**
14. [Multi-scale and Context-adaptive Entropy Model for Image Compression](https://arxiv.org/abs/1910.07844) **tag: Basic**
15. [Channel-wise Autoregressive Entropy Models for Learned Image Compression](https://arxiv.org/abs/2007.08739) **tag**
16. [Transformer based image compression](https://arxiv.org/abs/2111.06707) **tag**
17. [Joint Autoregressive](https://proceedings.neurips.cc/paper/2018/file/53edebc543333dfbf7c5933af792c9c4-Paper.pdf) **tag: Basic**
18. [Causal context prediction](https://arxiv.org/pdf/2011.09704.pdf) **tag: ITI**
19. [Sandwich image compression](https://hhoppe.com/sandwich.pdf) **tag**
20. [Tiny-LIC .. V1 and V2](https://arxiv.org/abs/2204.11448) **tag: TF**
21. [ELIC](https://arxiv.org/abs/2203.10886) **tag: median, channel-wise**
22. [Universal Deep Image Compression via Content-Adaptive Optimization with Adapters](https://arxiv.org/abs/2211.00918) **tag**
23. [Density Modeling of Images using a Generalized Normalization Transformation](https://arxiv.org/abs/1511.06281) **tag**
24. [Qualcomm transformer-based image compression](https://paperswithcode.com/paper/transformer-based-transform-coding) **tag: TF**
25. [Channel-Level Variable Quantization Network for Deep Image Compression](https://arxiv.org/abs/2007.12619) **tag**
26. [Asymmetric Gained Deep Image Compression With Continuous Rate Adaptation](https://openaccess.thecvf.com/content/CVPR2021/papers/Cui_Asymmetric_Gained_Deep_Image_Compression_With_Continuous_Rate_Adaptation_CVPR_2021_paper.pdf) **tag**
27. [Joint Global and Local Hierarchical Priors for Learned Image Compression](https://arxiv.org/abs/2112.04487) **tag**
28. [Variable-Rate Deep Image Compression through Spatially-Adaptive Feature Transform](https://arxiv.org/abs/2108.09551) **tag**
29. [Content Adaptive Latents and Decoder for Neural Image Compression](https://www.ecva.net/papers/eccv_2022/papers_ECCV/papers/136780545.pdf) **tag**

POINTER

### Video Compression

1. [DVC](https://ieeexplore.ieee.org/document/9072487) **tag: Basic, GA**
2. [Learned Video Compression via Joint Spatial-Temporal Correlation Exploration](https://arxiv.org/abs/1912.06348) **tag: median**
3. [M-LVC](https://openaccess.thecvf.com/content_CVPR_2020/papers/Lin_M-LVC_Multiple_Frames_Prediction_for_Learned_Video_Compression_CVPR_2020_paper.pdf) **tag: median**
4. [LVC](https://arxiv.org/abs/1811.06981) **tag: median**
5. [CANF-VC](https://arxiv.org/abs/2207.05315) **tag: median**
6. [ELF-VC](https://arxiv.org/abs/2104.14335) **tag: median**
7. [TCM](https://arxiv.org/abs/2111.13850) **tag: median**
8. [Alpha-VC](https://arxiv.org/abs/2207.14678) **tag: median**
9. [FVC](https://arxiv.org/abs/2105.09600) **tag**
10. [NVC](https://arxiv.org/abs/2007.04574) **tag: median**
11. [DCVC](https://arxiv.org/pdf/2109.15047.pdf) **tag**
12. [Google VCT](https://arxiv.org/abs/2206.07307) **tag: median**
13. [Generalized Difference Coder: A Novel Conditional Autoencoder Structure for Video Compression](https://arxiv.org/abs/2112.08011) **tag: conditional coding formula**
14. [Boosting neural video codecs by exploiting hierarchical redundancy](https://arxiv.org/abs/2208.04303) **tag**
15. [coarse-to-fine video compression](https://openaccess.thecvf.com/content/CVPR2022/papers/Hu_Coarse-To-Fine_Deep_Video_Coding_With_Hyperprior-Guided_Mode_Prediction_CVPR_2022_paper.pdf) **tag: content-adaptive**
16. [Hybrid Spatial-Temporal Entropy Modelling for Neural Video Compression](https://arxiv.org/pdf/2207.05894.pdf) **tag: median, ACMMA**
17. [MIMT](https://openreview.net/forum?id=j9m-mVnndbm) **tag: TF, ITI**

### Latent Compression

1. [Back-and-Forth prediction for deep tensor compression](https://arxiv.org/abs/2002.07036) **tag: Basic**
2. [Content Adaptive Optimization for Neural Image Compression](https://arxiv.org/abs/1906.01223) **tag**
3. [Content adaptive and error propagation aware deep video compression](https://arxiv.org/abs/2003.11282) **tag**
4. [Reducing The Amortization Gap of Entropy Bottleneck In End-to-End Image Compression](https://arxiv.org/abs/2209.00964) **tag**

### Entropy modeling

1. [EntroFormer](https://arxiv.org/abs/2202.05492) **tag: Basic**
2. [ContextFormer](https://arxiv.org/abs/2203.02452) **tag: TF**
3. [Learning Accurate Entropy Model with Global Reference for Image Compression](https://openreview.net/pdf/06784d9497e2c81c4f81b487b90f789b97d82af0.pdf) **tag**

## Content adaptive

1. [Variable Rate Deep Image Compression With a Conditional Autoencoder](https://openaccess.thecvf.com/content_ICCV_2019/papers/Choi_Variable_Rate_Deep_Image_Compression_With_a_Conditional_Autoencoder_ICCV_2019_paper.pdf) **tag**
2. [Content oriented image compression](https://arxiv.org/abs/2207.14168) **tag**
3. [RaFC](https://arxiv.org/abs/2009.05982) **tag**
4. [Optical Flow and Mode Selection for Learning-based Video Coding](https://arxiv.org/abs/2008.02580) **tag: optical flow**

## Generative model

### Image Generation

1. [Augmented Normalizing Flows: Bridging the Gap Between Generative Flows and Latent Variable Models](https://arxiv.org/abs/2002.07101) **tag**
2. [Conditional Image Synthesis With Auxiliary Classifier GANs](https://arxiv.org/abs/1610.09585) **tag: median, NF**
3. [Taming Transformers for High-Resolution Image Synthesis](https://arxiv.org/abs/2012.09841) **tag: TF, VQ**
4. [Vector Quantized Image-to-Image Translation](https://arxiv.org/abs/2207.13286) **tag**
5. [Unleashing Transformers: Parallel Token Prediction with Discrete Absorbing Diffusion for Fast High-Resolution Image Generation from Vector-Quantized Codes](https://arxiv.org/abs/2111.12701) **tag**
6. [Draft-and-Revise: Effective Image Generation with Contextual RQ-Transformer](https://arxiv.org/abs/2206.04452) **tag: ITI**
7. [Autoregressive Image Generation using Residual Quantization](https://arxiv.org/abs/2203.01941) **tag: RQ-VAE, ITI**
8. [Neural Discrete Representation Learning](https://arxiv.org/abs/1711.00937) **tag: VQ-VAE**
9. [Generating Diverse High-Fidelity Images with VQ-VAE-2](https://arxiv.org/abs/1906.00446) **tag**
10. [Hierarchical Quantized Autoencoders](https://proceedings.neurips.cc/paper/2020/file/309fee4e541e51de2e41f21bebb342aa-Paper.pdf) **tag**

### Video Generation

1. [Stochastic Video Generation with a Learned Prior](https://arxiv.org/abs/1802.07687) **tag: DLP project**
2. [Unsupervised Learning of Video Representations using LSTMs](https://arxiv.org/abs/1502.04681) **tag: DLPHW**

## Optical flow

1. [Scale-space flow for end-to-end optimized video compression](https://openaccess.thecvf.com/content_CVPR_2020/papers/Agustsson_Scale-Space_Flow_for_End-to-End_Optimized_Video_Compression_CVPR_2020_paper.pdf) **tag: DLP final project**
2. [RAFT](https://arxiv.org/abs/2003.12039) **tag**
3. [CRAFT](https://arxiv.org/abs/2203.16896) **tag: TF**
4. [PWC-Net](https://arxiv.org/abs/1709.02371) **tag: Basic**
5. [FlowFormer](https://arxiv.org/abs/2203.16194) **tag**

## Special Technique

1. [Soft then Hard: Rethinking the Quantization in Neural Image Compression](https://arxiv.org/abs/2104.05168) **tag: median**
2. [Confident Adaptive Language Modeling](https://arxiv.org/abs/2207.07061) **tag: ITI**
3. [Swin Transformer V2](https://arxiv.org/abs/2111.09883) **tag: ITI**

## Complexity Issue

1. [MLP Transformer](https://arxiv.org/abs/2105.01601) **tag: TF**
2. [Pool Transformer](https://arxiv.org/abs/2111.11418) **tag: TF**
3. [Efficient Transformer](https://arxiv.org/abs/2009.06732) **tag: TF**
4. [Confident Adaptive Language Modeling](https://arxiv.org/abs/2207.07061) **tag: hard**
5. [EfficientFormer](https://arxiv.org/abs/2206.01191) **tag: TF, Basic**
6. [Axial Attention in Multidimensional Transformers](https://arxiv.org/abs/1912.12180) **tag: TF**
7. [Sparse Sinkhorn Attention](https://arxiv.org/abs/2002.11296) **tag: NLP-kind TF**
8. [Longformer](https://arxiv.org/abs/2004.05150) **tag: TF**
9. [Masked autoencoder](https://arxiv.org/abs/2111.06377) **tag: TF**
10. [BEiT](https://arxiv.org/abs/2106.08254) **tag: TF**
11. [Slimmable neural network](https://arxiv.org/abs/1812.08928) **tag: TF**
12. [AdaViT: Adaptive Tokens for Efficient Vision Transformer](https://arxiv.org/abs/2112.07658) **tag: TF**
13. [Pyraformer](https://openreview.net/forum?id=0EXmFzUn5I) **tag**
14. [Neighborhood Attention Transformer](https://arxiv.org/abs/2204.07143) **tag: TF**
15. [Adaptive Token Sampling For Efficient Vision Transformers](https://arxiv.org/abs/2111.15667) **tag: TF, ITI**
16. [MaxViT](https://arxiv.org/abs/2204.01697) **tag: TF**
17. [Star-Transformer](https://arxiv.org/abs/1902.09113) **tag: NLP-kind TF**
18. [DynamicViT](https://arxiv.org/abs/2106.02034) **tag**
19. [IA-RED2](https://arxiv.org/abs/2106.12620) **tag**

## Detection

1. [Disentangled Representation Learning GAN for Pose-Invariant Face Recognition](https://openaccess.thecvf.com/content_cvpr_2017/papers/Tran_Disentangled_Representation_Learning_CVPR_2017_paper.pdf) **tag: NF**
2. [Parallel Multi-Dimensional LSTM, With Application to Fast Biomedical Volumetric Image Segmentation](https://arxiv.org/abs/1506.07452) **tag: NF**
3. [Deep Learning Markov Random Field for Semantic Segmentation](https://arxiv.org/abs/1606.07230) **tag: median**
4. [Feature Space Optimization for Semantic Video Segmentation](http://vladlen.info/papers/FSO.pdf) **tag: NF, Basic**
5. [Semantic Image Segmentation via Deep Parsing Network](https://arxiv.org/abs/1509.02634) **tag: NF, GA**
6. [Class-independent sequential full image segmentation, using a convolutional net that finds a segment within an attention region, given a pointer pixel within this segment](https://arxiv.org/abs/1902.07810) **tag: ITI**
7. [SimMIM: A Simple Framework for Masked Image Modeling](https://arxiv.org/abs/2111.09886) **tag: TF**
8. [Video Swin Transformer](https://arxiv.org/abs/2106.13230) **tag: TF**
9. [Swin Transformer](https://arxiv.org/abs/2103.14030) **tag: Basic**
10. [3D human pose estimation](https://arxiv.org/abs/1811.11742) **tag: TF**
11. [MultiScale Vision Transformer](https://arxiv.org/abs/2104.11227) **tag: TF**
12. [Axial DeepLab Panoptic Segmentation](https://arxiv.org/abs/2003.07853) **tag: TF**
13. [TubeFormer](https://arxiv.org/abs/2205.15361) **tag: TF**
14. [Video Panoptic segmentation](https://arxiv.org/abs/2006.11339) **tag**
15. [UPSNet: A Unified Panoptic Segmentation Network](https://arxiv.org/abs/1901.03784) **tag**
16. [Video Instance Segmentation using Inter-Frame Communication Transformers](https://arxiv.org/abs/2106.03299) **tag: TF**
17. [DETR](https://arxiv.org/abs/2005.12872) **tag: TF**
18. [Vision Transformer with Deformable Attention](https://arxiv.org/abs/2201.00520) **tag: TF, Basic**
19. [Pyramid Vision Transformer](https://arxiv.org/abs/2102.12122) **tag: TF**
20. [Video Transformer Network](https://arxiv.org/abs/2102.00719) **tag: TF**
21. [Max-DeepLab](https://openaccess.thecvf.com/content/CVPR2021/papers/Wang_MaX-DeepLab_End-to-End_Panoptic_Segmentation_With_Mask_Transformers_CVPR_2021_paper.pdf) **tag: TF**
22. [YOLOX](https://arxiv.org/abs/2107.08430) **tag**
23. [TransTrack](https://arxiv.org/abs/2012.15460) **tag: TF**
24. [TrackFormer](https://arxiv.org/abs/2101.02702) **tag: TF**
25. [CrossViT](https://arxiv.org/abs/2103.14899) **tag**
26. [CabViT: Cross Attention among Blocks for Vision Transformer](https://arxiv.org/abs/2211.07198) **tag: TF**
27. [Mobile-Former: Bridging MobileNet and Transformer](https://arxiv.org/abs/2108.05895) **tag: mobile-net**

## Super resolution

1. [BSRT: Improving Burst Super-Resolution with Swin Transformer and Flow-Guided Deformable Alignment](https://arxiv.org/abs/2204.08332) **tag: TF**

## Coding for machine

1. [SSSIC: Semantics-to-Signal Scalable Image Coding With Learned Structural Representations](https://ieeexplore.ieee.org/document/9585549) **tag: Basic**
2. [Scalable Image Coding for Humans and Machines](https://arxiv.org/abs/2107.08373) **tag**

## Diffusion model

### Diffusion Concept

1. [Diffusion formula derivation](https://lilianweng.github.io/posts/2021-07-11-diffusion-models/#nice) **tag: Basic**
2. [The wake-sleep algorithm for unsupervised neural networks](https://www.cs.toronto.edu/~hinton/csc2535/readings/ws.pdf) **tag**
3. [Context-aware Synthesis for Video Frame Interpolation](https://arxiv.org/abs/1803.10967) **tag: NF**
4. [Variational Diffusion Models](https://arxiv.org/abs/2107.00630) **tag**
5. [Denoising Diffusion Probabilistic Models](https://arxiv.org/abs/2006.11239) **tag: Basic**
6. [ILVR: Conditioning Method for Denoising Diffusion Probabilistic Models](https://arxiv.org/abs/2108.02938) **tag: hard**
7. [Palette: Image-to-Image Diffusion Models](https://arxiv.org/abs/2111.05826) **tag: hard**
8. [Score-Based Generative Modeling through Stochastic Differential Equations](https://arxiv.org/abs/2011.13456) **tag: Very_Hard**
9. [Lossy Compression with Gaussian Diffusion](https://arxiv.org/abs/2206.08889) **tag: median**
10. [Denoising Diffusion Implicit Models](https://arxiv.org/abs/2010.02502) **tag: median**
11. [Deep Unsupervised Learning using Nonequilibrium Thermodynamics](https://arxiv.org/abs/1503.03585) **tag: Basic**

## Interpolation

1. [Softmax Splatting for Video Frame Interpolation](https://arxiv.org/abs/2003.05534) **tag: DLP final project**
2. [Splatting-based Synthesis for Video Frame Interpolation](https://arxiv.org/pdf/2201.10075v1.pdf) **tag: DLP final project**

## Deep Learning Basic model

1. [GAN](https://arxiv.org/abs/1406.2661) **tag**
2. [Normalizing Flows: An Introduction and Review of Current Methods](https://www.researchgate.net/publication/341222454_Normalizing_Flows_An_Introduction_and_Review_of_Current_Methods) **tag**
3. [ViT](https://arxiv.org/abs/2010.11929) **tag: TF**
4. [Swin Transformer](https://arxiv.org/abs/2103.14030) **tag: Basic, TF**
5. [Pixel Recurrent Neural Networks](https://arxiv.org/abs/1601.06759) **tag: Basic**
6. [PixelCNN](https://arxiv.org/abs/1606.05328) **tag: Basic**
7. [SFT](https://arxiv.org/abs/1804.02815) **tag: Basic**
8. [Deformable Convolutional Networks](https://arxiv.org/abs/1703.06211) **tag**
9. [Learning Spatio-Temporal Representation with Pseudo-3D Residual Networks](https://arxiv.org/abs/1711.10305) **tag**
10. [Attention is all you need](https://arxiv.org/abs/1706.03762) **tag**
11. [Non-local attention](https://arxiv.org/abs/1711.07971) **tag**
12. [R-CNN](https://arxiv.org/abs/1311.2524) **tag**
13. [Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset](https://arxiv.org/abs/1705.07750) **tag**
14. [Gumbel SoftMax](https://arxiv.org/abs/1611.01144) **tag**

## Papers that haven't been read

1. [FLOW-GAN](https://arxiv.org/abs/1705.08868)
2. [Variable Rate Image Compression with Recurrent Neural Networks](https://arxiv.org/abs/1511.06085)
3. [Deformable Video Transformer](https://arxiv.org/abs/2203.16795)
4. [LVQ-VAE](https://openreview.net/forum?id=1pGmKJvneD7)

## Shaky paper

1. [Density Modeling of Images using a Generalized Normalization Transformation](https://www.researchgate.net/publication/284218796_Density_Modeling_of_Images_using_a_Generalized_Normalization_Transformation) **tag**