# Re-fit (Re-build and fine-tune) method:

This method tries to combat codebook collapse, i.e. the dead-codes phenomenon where the encoder only ever uses a small subset of the codes from the whole codebook. It was first introduced in the paper: https://arxiv.org/pdf/2112.01799

Ideally, we want our codebook to be initialized in a way that already carries a prior on our dataset, so that the codes and the encoder outputs are better aligned and the distances between them are not that large.

The idea proposed in the article is simple: take a VQ-VAE pre-trained with a randomly initialised codebook, do a pass over the whole dataset and encode all the images with the encoder, so that we end up with a huge matrix of latent vectors; then apply K-means to all those vectors to obtain the centroids, and train a new model whose codebook is initialized with those centroids (a minimal sketch of this procedure is given at the end of this note).

We can note the following:
* The new model is still inspired by the previous codebook, so any enhancement made to the previous model can strongly improve the new one; inversely, if we start from a bad model, this method may not help much.
* We can strongly reduce the number of codes in this new model without losing much information; after all, experiments show that only a small fraction of the codes are actually used from a randomly initialized codebook.
* Note that this new model with reduced K must reuse the same encoder and decoder as the previous model; after all, the method is called fine-tuning. This is a crucial point: otherwise we observe a huge downgrade if the weights are initialized randomly.

:::success
# Results :
### Projection into 2-D space of all the latents of the train set, using a learned projection that preserves meaning (UMAP):
>> In <font color="#1936C9">Blue</font>, the latent vectors of the whole training dataset (before quantization).
>> In <font color="#008000">Green</font>, the projection of the centroids returned by the k-means++ algorithm, using the same projection learned by UMAP.
>> And finally in <font color="#f00 ">Red</font>, the projected codes of the pre-trained VQ-VAE model.
>> ![image](https://hackmd.io/_uploads/S14LGwJrkl.png)

We can cite a few observations here:
> The codes (red) and the centroids (green) end up fairly close to each other, even though their learning processes are completely independent and quite different.
> If we trust the UMAP projection to some extent, the codes and the centroids do not cover the whole distribution of the latents.
> We can observe some clusters; it would be intriguing to reverse the process and see which cluster corresponds to which samples in the dataset.

### Re-fit models :
I trained a new VQ model with the same encoder and decoder as the pre-trained model100, with the codebook initialized by the centroids and a reduced number of codes (new_K = K/4 = 128). I was expecting a huge increase in the percentage of codes used:
>> I went from "89 OF CODES WERE USED FROM 512, WHICH MAKE 17.3828125 % OF THE CODE-BOOK" to "13 OF CODES WERE USED FROM 128, WHICH MAKE 10.15625 % OF CODES FROM THE CODE-BOOK".
:::

:::warning
### WHY ?
If the goal of the "Re-fit" method is just to decrease the number of codes in the codebook, why not simply quantize the codebook in an intelligent way, or filter out the unused vectors and pack the remaining ones into a new, smaller codebook, since the code indices are not important? (A sketch of this pruning idea is also given below.)
:::
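
---
For reference, here is a minimal sketch of the re-fit initialization described above; it is not the paper's exact implementation. It assumes a PyTorch encoder whose output latents have shape (B, D, H, W), a DataLoader yielding (images, labels) pairs, and uses scikit-learn's `KMeans` (whose default seeding is k-means++). The function name `refit_codebook` and the `new_model.codebook` attribute in the usage comment are hypothetical.

```python
import torch
from sklearn.cluster import KMeans


@torch.no_grad()
def refit_codebook(encoder, dataloader, new_K=128, device="cuda"):
    """Encode the whole train set, then k-means the latents into new_K centroids."""
    encoder.eval()
    latents = []
    for images, _ in dataloader:
        z = encoder(images.to(device))        # pre-quantization latents, shape (B, D, H, W)
        D = z.shape[1]
        latents.append(z.permute(0, 2, 3, 1).reshape(-1, D).cpu())
    latents = torch.cat(latents).numpy()      # the "huge matrix": (N * H * W, D)

    # scikit-learn's KMeans uses k-means++ seeding by default
    kmeans = KMeans(n_clusters=new_K, n_init=10).fit(latents)
    return torch.tensor(kmeans.cluster_centers_, dtype=torch.float32)   # (new_K, D)


# Hypothetical usage: copy the centroids into the new, smaller codebook, then fine-tune
# centroids = refit_codebook(pretrained_encoder, train_loader, new_K=128)
# new_model.codebook.weight.data.copy_(centroids)
```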
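
And a sketch of the simpler alternative raised in the warning: count which codes are actually selected over the train set and keep only those. Again a hedged sketch under the same assumptions as above (encoder latents of shape (B, D, H, W), a plain (K, D) codebook tensor); `prune_codebook` is a hypothetical helper, and the explicit nearest-neighbour lookup stands in for whatever lookup the actual quantizer uses.

```python
import torch


@torch.no_grad()
def prune_codebook(encoder, codebook, dataloader, device="cuda"):
    """Count how often each code is selected, then keep only the codes that are ever used."""
    encoder.eval()
    K, D = codebook.shape
    codebook = codebook.to(device)
    counts = torch.zeros(K, dtype=torch.long, device=device)
    for images, _ in dataloader:
        z = encoder(images.to(device))                  # (B, D, H, W) latents
        z = z.permute(0, 2, 3, 1).reshape(-1, D)
        idx = torch.cdist(z, codebook).argmin(dim=1)    # nearest code per latent vector
        counts += torch.bincount(idx, minlength=K)
    return codebook[counts > 0].cpu()                   # smaller codebook of live codes only
```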