# 2023/5/7 TDL

###### tags: `to do list`

### 1. Papers to read

* [Transformer-XL](https://arxiv.org/abs/1901.02860) **finished**
* [Unlimiformer: Long-Range Transformers with Unlimited Length Input](https://arxiv.org/abs/2305.01625) **finished**
* [MAGVLT: Masked Generative Vision-and-Language Transformer](https://arxiv.org/abs/2303.12208) **finished**
* [Video Frame Interpolation Transformer](https://arxiv.org/abs/2111.13817) **finished**
* [Efficient Long-Text Understanding with Short-Text Models](https://arxiv.org/abs/2208.00748) **finished**
  * `Locality of information assumption:`
    <img src="https://hackmd.io/_uploads/SypZK34E2.png" width=400>
  * `TK msg`: the encoder does not necessarily need to take the whole input sequence as its attention input.
    <img src="https://hackmd.io/_uploads/rJVN02VN2.png" width=400>
* [Understanding Masked Autoencoders via Hierarchical Latent Variable Models](https://openreview.net/pdf?id=fhZxgtNsrQ)
* [FlowFormer++: Masked Cost Volume Autoencoding for Pretraining Optical Flow Estimation](https://arxiv.org/pdf/2303.01237.pdf) **finished**
* [Towards Efficient Use of Multi-Scale Features in Transformer-Based Object Detectors](https://arxiv.org/pdf/2208.11356.pdf)
* [Joint Token Pruning and Squeezing Towards More Aggressive Compression of Vision Transformers](https://arxiv.org/pdf/2304.10716.pdf)
* [TransFlow: Transformer as Flow Learner](https://arxiv.org/pdf/2304.11523.pdf)
* [Rethinking Local Perception in Lightweight Vision Transformer](https://arxiv.org/pdf/2303.17803.pdf)
* [NoisyQuant: Noisy Bias-Enhanced Post-Training Activation Quantization for Vision Transformers](https://arxiv.org/abs/2211.16056)
* [BiFormer: Vision Transformer with Bi-Level Routing Attention](https://arxiv.org/abs/2303.08810) **finished**
* [Shunted Self-Attention via Multi-Scale Token Aggregation](https://arxiv.org/abs/2111.15193) **finished**
* [Beyond Attentive Tokens: Incorporating Token Importance and Diversity for Efficient Vision Transformers](https://arxiv.org/abs/2211.11315) **finished**
* [Improving Corruption Robustness with Adversarial Feature Alignment Transformers](https://openreview.net/pdf?id=YWZ90TiPBM)
* [BiFormer: Learning Bilateral Motion Estimation via Bilateral Transformer for 4K Video Frame Interpolation](https://arxiv.org/abs/2304.02225)
* [Q-DETR: An Efficient Low-Bit Quantized Detection Transformer](https://arxiv.org/pdf/2304.00253.pdf) **finished**
* [Vision Transformer with Super Token Sampling](https://arxiv.org/abs/2211.11167)
* [Multi-Realism Image Compression with a Conditional Generator](https://arxiv.org/abs/2212.13824)
* [Motion Information Propagation for Neural Video Compression](https://arxiv.org/pdf/2303.02959.pdf) **finished**
* [Less is More: CLIPBERT for Video-and-Language Learning via Sparse Sampling](https://arxiv.org/abs/2102.06183)
* [AdaMAE: Adaptive Masking for Efficient Spatiotemporal Learning with Masked Autoencoders](https://arxiv.org/abs/2211.09120)
* [Hard Patches Mining for Masked Image Modeling](https://arxiv.org/abs/2304.05919)
* [MAGVIT: Masked Generative Video Transformer](https://arxiv.org/abs/2212.05199) **finished**
* [Neural Video Compression using GANs for Detail Synthesis and Propagation](https://www.ecva.net/papers/eccv_2022/papers_ECCV/papers/136860549.pdf)
* [Learning Cross-Scale Weighted Prediction for Efficient Neural Video Compression](https://arxiv.org/abs/2112.13309) **finished**
* [Cascaded Local Implicit Transformer for Arbitrary-Scale Super-Resolution](https://arxiv.org/abs/2303.16513) **finished**
* [Neural Distributed Image Compression with Cross-Attention Feature Alignment](https://arxiv.org/abs/2207.08489) **finished**
* [Neural Distributed Image Compression Using Common Information](https://www.imperial.ac.uk/media/imperial-college/research-centres-and-groups/ipc-lab/Neural-distributed-image-compression-using-common-information.pdf) **finished**
* [Enhanced Invertible Encoding for Learned Image Compression](https://arxiv.org/abs/2108.03690) **finished**
* [SLIC: Self-Conditioned Adaptive Transform with Large-Scale Receptive Fields for Learned Image Compression](https://arxiv.org/abs/2304.09571) **finished**
* [Neural Compression-Based Feature Learning for Video Restoration](https://arxiv.org/abs/2203.09208) **finished**
* [Catch Missing Details: Image Reconstruction with Frequency Augmented Variational Autoencoder](https://arxiv.org/abs/2305.02541) **finished**
* [Learning Decorrelated Representations Efficiently Using Fast Fourier Transform](https://arxiv.org/abs/2301.01569)
* [Barlow Twins: Self-Supervised Learning via Redundancy Reduction](https://arxiv.org/abs/2103.03230)
* [Extracting Motion and Appearance via Inter-Frame Attention for Efficient Video Frame Interpolation](https://arxiv.org/abs/2303.00440)
* [Latency Matters: Real-Time Action Forecasting Transformer](https://openaccess.thecvf.com/content/CVPR2023/papers/Girase_Latency_Matters_Real-Time_Action_Forecasting_Transformer_CVPR_2023_paper.pdf)
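The locality-of-information note above (from the long-text paper) boils down to: the encoder does not have to attend over the whole input at once; it can encode short overlapping chunks independently and let the decoder attend over the concatenated chunk encodings. A minimal sketch of that chunking scheme (names and parameters are illustrative, and `encode_fn` stands in for any length-preserving local encoder):

```python
def chunk_indices(seq_len, chunk_size, overlap):
    """Return (start, end) spans covering the sequence, each overlapping
    the previous span by `overlap` tokens."""
    stride = chunk_size - overlap
    spans, start = [], 0
    while start < seq_len:
        spans.append((start, min(start + chunk_size, seq_len)))
        if start + chunk_size >= seq_len:
            break
        start += stride
    return spans

def encode_long_input(tokens, encode_fn, chunk_size=256, overlap=32):
    """Encode each chunk locally, drop the redundant left overlap of every
    chunk after the first, and concatenate the results. The decoder can
    then attend over the full concatenation, while the encoder only ever
    saw `chunk_size` tokens at a time."""
    encoded = []
    for i, (s, e) in enumerate(chunk_indices(len(tokens), chunk_size, overlap)):
        enc = encode_fn(tokens[s:e])       # local attention over one chunk
        keep_from = 0 if i == 0 else overlap
        encoded.extend(enc[keep_from:])
    return encoded
```

With a length-preserving `encode_fn`, the output lines up one-to-one with the input, so every token's encoding reflects up to `chunk_size` tokens of local context; the overlap gives boundary tokens context from both sides.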