# A Fast Transformer-based General-Purpose Lossless Compressor

https://github.com/mynotwo/A-Fast-Transformer-based-General-Purpose-LosslessCompressor

Proposes a new Transformer-based architecture aimed at fast inference and parallel execution.

![](https://hackmd.io/_uploads/ryxds0i33.png)
![](https://hackmd.io/_uploads/ByYDhAo22.png)

The estimated probability is sent to the arithmetic coder for encoding. At the same time, a copy of the estimated probability is sent to the back-prop controller, which makes backpropagation decisions. Backpropagation (BP) provides an opportunity to fit the model to the local data distribution, but it also slows down compression speed (see the sketch at the end of this section).

Three classes of NN-based approaches:
- Static-Pretrained
- Dynamic-Pretrained
- **Dynamic-Random** (needs a weight-initialization scheme)

![](https://hackmd.io/_uploads/Hk9Ely322.png)
![](https://hackmd.io/_uploads/B1hJWkhn2.png)
![](https://hackmd.io/_uploads/B1jgQJnnh.png)
*Higher is better*
![](https://hackmd.io/_uploads/B1mSNJnh2.png)
![](https://hackmd.io/_uploads/S12KNyn3n.png)

IDEAS:
- The method is GPU-intensive; is that a problem?
- Context mixing with simple models
- Better BP scheme
- Detect distribution shift to trigger BP, using statistical methods?
- Weight initialization with some Bayesian priors?
- Bayesian parameter updates?

#### Literature overview:
- DZip: Improved general-purpose lossless compression based on novel neural network modeling. In 2021 Data Compression Conference. https://arxiv.org/abs/1911.03572
- B. Knoll. 2014. CMIX. http://www.byronknoll.com/cmix.html
- B. Knoll. 2016. Tensorflow-compress. https://github.com/byronknoll/tensorflow-compress
- F. Bellard. 2020. NNCP: Lossless Data Compression with Neural Networks. https://bellard.org/nncp/
- A Deep Context Model for High Efficiency Arithmetic Coding. In 2019 International Conference on Artificial Intelligence in Information and Communication

# Accelerating General-Purpose Lossless Compression via Simple and Scalable Parameterization

This observation (that the most recent symbols matter most) guides the design of an interpretable structure for data compression, rather than learning it implicitly from data as Recurrent Neural Networks (RNNs) and attention do. Based on this observation, the compression model is disentangled into order learning and feature learning, which were fused into one large module in previous works. A parameterized ordered-mask unit is established to learn the ordered importance of history symbols.

- Dynamic coding: start from a random state
- L0-regularized logistic-regression-based compressor (how to optimize L0?)
- A simple MLP compressor can achieve state-of-the-art compression performance with a much faster compression speed by establishing ordered importance

![](https://hackmd.io/_uploads/SyQ3BCh3h.png)

Last symbols are more important:
![](https://hackmd.io/_uploads/SJBFwR23n.png)
![](https://hackmd.io/_uploads/HJ_jY0323.png)
![](https://hackmd.io/_uploads/SyLU50hh3.png)
![](https://hackmd.io/_uploads/BkAXElh2n.png)
![](https://hackmd.io/_uploads/r1tE4g23n.png)
![](https://hackmd.io/_uploads/Skz3Hbp32.png)

IDEAS:
- CNN + classic windows; the window choice depends on initial parameters.
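Both papers above build on the same dynamic ("Dynamic-Random") loop: a randomly initialized model predicts a distribution over the next symbol, the distribution drives an arithmetic coder, and a back-prop controller decides when to adapt the weights to the local data. The sketch below is a minimal illustration of that loop, not the authors' code: `TinyByteModel`, `CONTEXT`, `UPDATE_EVERY`, and the "adapt every N symbols" rule are all placeholder assumptions, and the arithmetic coder is replaced by an ideal code-length count.

```python
# Minimal sketch (assumed names/hyperparameters) of a dynamic neural compressor:
# predict -> (would) arithmetic-encode -> periodically back-propagate.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

CONTEXT = 16       # history bytes fed to the model (assumption)
VOCAB = 256        # byte alphabet
UPDATE_EVERY = 32  # back-prop controller: adapt once per this many symbols (assumption)

class TinyByteModel(nn.Module):
    """Stand-in for the fast Transformer or the ordered-mask MLP probability model."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, 16)
        self.mlp = nn.Sequential(
            nn.Linear(CONTEXT * 16, 256), nn.ReLU(),
            nn.Linear(256, VOCAB),
        )

    def forward(self, ctx):                  # ctx: (CONTEXT,) int64
        h = self.embed(ctx).flatten()        # (CONTEXT * 16,)
        return self.mlp(h)                   # logits over the next byte

def compress_cost(data: bytes) -> float:
    """Return the ideal code length in bits; a real pipeline would instead
    stream each predicted distribution into an arithmetic coder."""
    model = TinyByteModel()                  # random init: no pretraining
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    history = torch.zeros(CONTEXT, dtype=torch.long)
    total_bits, pending_loss = 0.0, []

    for symbol in data:
        log_probs = F.log_softmax(model(history), dim=-1)
        # Arithmetic coding with this distribution costs ~ -log2 p(symbol).
        total_bits += -log_probs[symbol].item() / math.log(2)
        pending_loss.append(F.nll_loss(log_probs.unsqueeze(0),
                                       torch.tensor([symbol])))
        # Back-prop controller: here simply "adapt every UPDATE_EVERY symbols";
        # the papers make smarter decisions to trade speed against ratio.
        if len(pending_loss) == UPDATE_EVERY:
            opt.zero_grad()
            torch.stack(pending_loss).mean().backward()
            opt.step()
            pending_loss = []
        history = torch.cat([history[1:], torch.tensor([symbol])])
    return total_bits

if __name__ == "__main__":
    print(compress_cost(b"abracadabra" * 50), "bits")
```

Swapping `TinyByteModel` for the fast Transformer or the ordered-mask MLP, and the fixed schedule for a cleverer back-prop controller (e.g. one driven by distribution-shift detection, as in the IDEAS above), recovers the designs these notes describe.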
Source: https://arxiv.org/pdf/2302.10866.pdf
![](https://hackmd.io/_uploads/B1UbwAnh3.png)

# An Introduction to Neural Data Compression

https://arxiv.org/pdf/2202.06533.pdf (Bits Back explained)

# Practical Lossless Compression with Latent Variables using Bits Back Coding

Use ANS for its stack structure (see the bits-back accounting sketch at the end of this note).

# Improving Lossless Compression Rates via Monte Carlo Bits-Back Coding

# Lossless Compression with State Space Models Using Bits Back Coding

# Bayesian Networks for Pattern Classification, Data Compression, and Channel Coding

https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=9499e1f69d9d6adcf46d0d4b885536ed538effb2
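Since several of the papers above hinge on the same bits-back accounting, here is the standard derivation as a reference sketch (not a result specific to any one of them), assuming a latent-variable model $p(x,z)=p(z)\,p(x \mid z)$ with approximate posterior $q(z \mid x)$:

```latex
% Standard bits-back accounting with an ANS stack.
% Encoding one datum x:
%   (1) DECODE z ~ q(z|x) from the stack  ->  pops about -log2 q(z|x) bits ("bits back")
%   (2) ENCODE x with p(x|z)              ->  pushes -log2 p(x|z) bits
%   (3) ENCODE z with p(z)                ->  pushes -log2 p(z) bits
\begin{align*}
L(x) &= -\log_2 p(x \mid z) - \log_2 p(z) + \log_2 q(z \mid x), \\
\mathbb{E}_{q(z \mid x)}\big[L(x)\big]
     &= -\operatorname{ELBO}(x) \;\ge\; -\log_2 p(x).
\end{align*}
```

Step (1) only works if decoding and encoding share a last-in-first-out buffer, which is exactly the "stack structure" argument for ANS over a queue-like arithmetic coder in these papers.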