# A Fast Transformer-based General-Purpose Lossless Compressor
https://github.com/mynotwo/A-Fast-Transformer-based-General-Purpose-LosslessCompressor
Proposes a new Transformer-based architecture for fast inference and parallel execution.


The estimated probability is sent to the arithmetic coder for encoding. At the same time, a copy of the estimated probability is sent to the back-prop controller, which makes backpropagation decisions.
Back-propagation (BP) gives the model a chance to fit the local data distribution, but it also slows down compression speed.
Three classes of NN-based approaches:
- Static-Pretrained
- Dynamic-Pretrained
- **Dynamic-Random** (needs a weight initialization scheme)
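A minimal sketch of the dynamic loop described above: the model's predicted probability goes to the arithmetic coder (here only its ideal code length is accounted), and a back-prop controller decides whether to update. A tiny softmax model stands in for the paper's transformer, and the surprise-threshold rule is my own illustrative choice, not the paper's controller.

```python
import numpy as np

def step(W, ctx, sym, lr=0.5, threshold=0.5):
    """One compression step: predict, 'encode', maybe back-propagate.

    W: (K, K) weights of a tiny softmax model p(sym | previous sym).
    Returns the ideal code length in bits an arithmetic coder would
    spend on `sym`, i.e. -log2 p(sym).
    """
    logits = W[:, ctx]
    p = np.exp(logits - logits.max())
    p /= p.sum()
    loss = -np.log(p[sym])            # nats; the AC would consume p here
    if loss > threshold:              # back-prop controller: update only
        grad = p.copy()               # when the model is "surprised"
        grad[sym] -= 1.0
        W[:, ctx] -= lr * grad        # one SGD step on cross-entropy
    return loss / np.log(2)           # bits

# Toy stream: alternating symbols, context = previous symbol
K = 2
W = np.zeros((K, K))
data = [i % 2 for i in range(200)]
bits = [step(W, data[i - 1], data[i]) for i in range(1, len(data))]
```

On this toy stream the code length per symbol drops once the model has adapted, which is exactly the trade BP buys at the cost of speed.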



*Higher is better


IDEAS:
- The method is GPU-intensive; is that a problem?
- Context mixing with simple models
- Better BP scheme
- Detect distribution shift to trigger BP; use statistical methods?
- Weight initialization with some Bayesian priors?
- Bayesian parameter updates?
#### Literature overview:
- DZip: Improved general-purpose lossless compression based on novel neural network modeling. In 2021 Data Compression Conference. https://arxiv.org/abs/1911.03572
- B. Knoll. 2014. CMIX. http://www.byronknoll.com/cmix.html
- B. Knoll. 2016. Tensorflow-compress. https://github.com/byronknoll/tensorflow-compress
- B. Knoll. 2020. NNCP: Lossless Data Compression with Neural Networks. https://bellard.org/nncp/
- A Deep Context Model for High Efficiency Arithmetic Coding. In 2019 International Conference on Artificial Intelligence in Information and Communication
# Accelerating General-Purpose Lossless Compression via Simple and Scalable Parameterization
This observation guides the design of an interpretable structure for data compression, rather than learning one implicitly from data as Recurrent Neural Networks (RNNs) and attention do. Based on this observation, the compression model is disentangled into order learning and feature learning, which previous works fused into one large module. A parameterized ordered-mask unit is established to learn the ordered importance of history symbols.
- Dynamic coding: starts from a random state
- L0-regularized logistic-regression-based compressor (how to optimize L0?)
- A simple MLP compressor can achieve state-of-the-art compression performance at much faster compression speed by establishing the ordered importance of history symbols

Last symbols are more important:
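One way to bake "recent symbols matter more" directly into the parameterization: constrain the mask to be monotonically non-decreasing toward the newest position. This is a minimal sketch of that idea (my own parameterization, not necessarily the paper's exact ordered-mask unit).

```python
import numpy as np

def ordered_mask(theta):
    """Map free parameters to a monotone importance mask over history.

    Softplus makes each increment non-negative, and the cumulative sum
    makes the mask non-decreasing toward the most recent position, so
    the newest symbol always gets the largest weight.
    """
    inc = np.log1p(np.exp(theta))      # softplus: increments >= 0
    mask = np.cumsum(inc)
    return mask / mask[-1]             # newest position gets weight 1.0

# Example: 8 history positions, arbitrary parameters
theta = np.array([0.3, -1.0, 0.5, 0.0, 2.0, -0.5, 1.0, 0.2])
mask = ordered_mask(theta)
```

The mask is learnable (theta is unconstrained), yet the recency ordering holds for any theta, which is what makes the structure interpretable.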






IDEAS:
CNN + classic windows; the window choice depends on initial parameters.
Source: https://arxiv.org/pdf/2302.10866.pdf

# An Introduction to Neural Data Compression
https://arxiv.org/pdf/2202.06533.pdf
(Bits Back explained)
# Practical Lossless Compression with Latent Variables using Bits Back Coding
USE ANS for its stack structure
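The stack (LIFO) property is what bits-back relies on: bits decoded to sample the latent are later re-encoded, so encoder and decoder must share one state they can push to and pop from. A minimal big-integer rANS sketch showing that push/pop symmetry (no renormalization; function names are my own):

```python
def build_tables(freq):
    """freq: dict symbol -> integer count. Returns total, cumulative
    counts, and a slot -> symbol lookup table."""
    M = sum(freq.values())
    cum, sym_of_slot, c = {}, [], 0
    for s, f in freq.items():
        cum[s] = c
        sym_of_slot += [s] * f
        c += f
    return M, cum, sym_of_slot

def rans_push(x, s, freq, cum, M):
    # push symbol s onto the integer state
    return (x // freq[s]) * M + (x % freq[s]) + cum[s]

def rans_pop(x, freq, cum, M, sym_of_slot):
    # pop the most recently pushed symbol off the state
    slot = x % M
    s = sym_of_slot[slot]
    return freq[s] * (x // M) + slot - cum[s], s

freq = {"a": 2, "b": 1, "c": 1}
M, cum, sym_of_slot = build_tables(freq)
msg = ["a", "b", "a", "c", "b"]

x = 1
for s in msg:
    x = rans_push(x, s, freq, cum, M)

out = []
for _ in range(len(msg)):
    x, s = rans_pop(x, freq, cum, M, sym_of_slot)
    out.append(s)
decoded = out[::-1]   # popped last-in-first-out, so reverse
```

Pops return symbols in reverse push order and restore the exact previous state, which is why ANS composes cleanly with bits-back where arithmetic coding's FIFO queue does not.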
# Improving Lossless Compression Rates via Monte Carlo Bits-Back Coding
# Lossless Compression with State Space Models Using Bits Back Coding
# Bayesian Networks for Pattern Classification, Data Compression, and Channel Coding
https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=9499e1f69d9d6adcf46d0d4b885536ed538effb2