# Natural Language Processing

## Task

- [Transformers Trainer](https://huggingface.co/docs/transformers/en/main_classes/trainer)
- [ViSoBERT: Pre-Trained Language Model for Vietnamese Social Media Text Processing](https://huggingface.co/uitnlp/visobert)
- [Vietnamese embeddings](https://huggingface.co/dangvantuan/vietnamese-embedding)

#### LLaMA

- [LLaMA explained (Medium)](https://medium.com/@pranjalkhadka/llama-explained-a70e71e706e9)
- [GQA (Grouped Query Attention) in LLaMA 3](https://medium.com/@yashsingh.sep30/what-is-gqa-grouped-query-attention-in-llama-3-c4569ec19b63), see the GQA sketch at the end of this section
- [Key-value caching explained (Medium)](https://medium.com/@joaolages/kv-caching-explained-276520203249), see the KV-cache sketch at the end of this section
- [Relative positional encoding](https://medium.com/@ngiengkianyew/what-is-relative-positional-encoding-7e2fbaa3b510)

#### Rotary positional embedding

- [RoFormer: Enhanced Transformer with Rotary Position Embedding](https://arxiv.org/abs/2104.09864)
- [Papers with Code explanation](https://paperswithcode.com/method/rope)
- [Medium explanation](https://medium.com/@ngiengkianyew/understanding-rotary-positional-encoding-40635a4d078e)

#### Nomic Embed: Training a Reproducible Long Context Text Embedder

- Replacing absolute positional embeddings with rotary positional embeddings (RoPE sketch below)
- Using the SwiGLU activation instead of GeLU (sketch below)
- Using Flash Attention
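To make the GQA link above concrete, here is a minimal PyTorch sketch of grouped query attention, where each key/value head is shared by a whole group of query heads. The function name, tensor layout, and use of `repeat_interleave` are my illustrative assumptions, not code from the linked article:

```python
import torch

def grouped_query_attention(q, k, v, n_kv_heads):
    # q: (batch, n_q_heads, seq, head_dim)
    # k, v: (batch, n_kv_heads, seq, head_dim), where n_kv_heads divides n_q_heads
    b, n_q_heads, s, d = q.shape
    group = n_q_heads // n_kv_heads
    # Share each KV head across its group of query heads
    # (causal masking omitted for brevity).
    k = k.repeat_interleave(group, dim=1)
    v = v.repeat_interleave(group, dim=1)
    attn = torch.softmax(q @ k.transpose(-2, -1) / d ** 0.5, dim=-1)
    return attn @ v

# Standard multi-head attention is the special case n_kv_heads == n_q_heads,
# and multi-query attention is n_kv_heads == 1; GQA sits in between,
# shrinking the KV cache while keeping most of the quality.
q = torch.randn(1, 8, 16, 64)   # 8 query heads
k = torch.randn(1, 2, 16, 64)   # 2 shared KV heads -> groups of 4
v = torch.randn(1, 2, 16, 64)
out = grouped_query_attention(q, k, v, n_kv_heads=2)  # (1, 8, 16, 64)
```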
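Likewise, a sketch of what the KV cache from the caching article buys you during autoregressive decoding: at each step only the newest token's key and value are computed and appended, and the new query attends over the whole cache instead of recomputing attention for every past position. The `cache` dict layout is an assumption for illustration:

```python
import torch

def decode_step(q_t, k_t, v_t, cache):
    # q_t, k_t, v_t: (batch, heads, 1, head_dim) -- projections for the newest token only.
    # cache: dict holding all past keys/values; starts empty.
    if "k" in cache:
        cache["k"] = torch.cat([cache["k"], k_t], dim=2)
        cache["v"] = torch.cat([cache["v"], v_t], dim=2)
    else:
        cache["k"], cache["v"] = k_t, v_t
    d = q_t.shape[-1]
    # The new query attends over every cached position; nothing is recomputed.
    attn = torch.softmax(q_t @ cache["k"].transpose(-2, -1) / d ** 0.5, dim=-1)
    return attn @ cache["v"]

cache = {}
for _ in range(5):  # five decoding steps
    q_t, k_t, v_t = (torch.randn(1, 4, 1, 32) for _ in range(3))
    out = decode_step(q_t, k_t, v_t, cache)  # (1, 4, 1, 32)
```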
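A minimal NumPy sketch of rotary positional embedding as described in the RoFormer paper: each consecutive pair of feature dimensions is rotated by an angle proportional to the token position, with one frequency 10000^(-2i/d) per pair. The function name and the even/odd pairing convention are illustrative assumptions:

```python
import numpy as np

def rope(x, positions, base=10000.0):
    # x: (seq_len, d) queries or keys, d even; positions: (seq_len,) token indices.
    seq_len, d = x.shape
    inv_freq = base ** (-np.arange(0, d, 2) / d)   # (d/2,) one frequency per dim pair
    angles = np.outer(positions, inv_freq)         # (seq_len, d/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    # 2-D rotation of each (even, odd) pair by its position-dependent angle.
    out[:, 0::2] = x_even * cos - x_odd * sin
    out[:, 1::2] = x_even * sin + x_odd * cos
    return out

q = rope(np.random.randn(8, 64), np.arange(8))
k = rope(np.random.randn(8, 64), np.arange(8))
```

Because both queries and keys are rotated the same way, the attention score `q[m] @ k[n]` depends on positions only through the offset m - n, which is why RoPE behaves as a relative encoding even though it is applied per position.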
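Finally, a sketch of the SwiGLU feed-forward block mentioned in the Nomic Embed notes: the usual MLP is replaced by a gated variant, SiLU(xW_gate) * (xW_up) followed by a down-projection. The 2/3 hidden-width scaling (to keep the parameter count close to a GeLU MLP) and the module/weight names are assumptions for illustration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLU(nn.Module):
    """Gated feed-forward block: w_down(SiLU(x @ w_gate) * (x @ w_up))."""
    def __init__(self, dim, hidden_dim):
        super().__init__()
        hidden_dim = int(2 * hidden_dim / 3)  # keep params close to a GeLU MLP
        self.w_gate = nn.Linear(dim, hidden_dim, bias=False)
        self.w_up = nn.Linear(dim, hidden_dim, bias=False)
        self.w_down = nn.Linear(hidden_dim, dim, bias=False)

    def forward(self, x):
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))

ffn = SwiGLU(dim=768, hidden_dim=3072)
y = ffn(torch.randn(2, 16, 768))  # (2, 16, 768)
```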