# Natural Language Processing
## Task
- [Transformers Trainer](https://huggingface.co/docs/transformers/en/main_classes/trainer) (see the sketch after this list)
- [ViSoBERT: Pre-Trained Language Model for Vietnamese Social Media Text Processing](https://huggingface.co/uitnlp/visobert)
- [Vietnamese embeddings](https://huggingface.co/dangvantuan/vietnamese-embedding)
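
A minimal sketch of how these resources could fit together: fine-tuning ViSoBERT for a two-class classification task with the Hugging Face `Trainer`. The toy texts, labels, and hyperparameters below are placeholder assumptions, not taken from the linked pages.

```python
# Assumed setup: fine-tune uitnlp/visobert for binary classification.
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("uitnlp/visobert")
model = AutoModelForSequenceClassification.from_pretrained(
    "uitnlp/visobert", num_labels=2)  # adds a fresh classification head

# Tiny placeholder dataset; in practice load a labeled Vietnamese corpus.
ds = Dataset.from_dict({"text": ["ví dụ tích cực", "ví dụ tiêu cực"],
                        "label": [1, 0]})
ds = ds.map(lambda b: tokenizer(b["text"], truncation=True,
                                padding="max_length", max_length=64),
            batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="visobert-out", num_train_epochs=1,
                           per_device_train_batch_size=8),
    train_dataset=ds,
)
trainer.train()
```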
### LLaMA
- [LLaMA explained (Medium)](https://medium.com/@pranjalkhadka/llama-explained-a70e71e706e9)
- [GQA: grouped-query attention in LLaMA 3 (Medium)](https://medium.com/@yashsingh.sep30/what-is-gqa-grouped-query-attention-in-llama-3-c4569ec19b63) (see the sketch after this list)
- [Key-value cache explained (Medium)](https://medium.com/@joaolages/kv-caching-explained-276520203249) (also covered in that sketch)
- [Relative positional encoding explained (Medium)](https://medium.com/@ngiengkianyew/what-is-relative-positional-encoding-7e2fbaa3b510)
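
A minimal NumPy sketch tying the GQA and KV-cache links together: one decode step of grouped-query attention that appends the new token's key/value to a cache and lets several query heads share each KV head. The shapes, weight names (`Wq`, `Wk`, `Wv`), and per-head loop are illustrative assumptions, not LLaMA's actual implementation.

```python
import numpy as np

def gqa_decode_step(x, Wq, Wk, Wv, k_cache, v_cache, n_q_heads, n_kv_heads):
    """One decode step of grouped-query attention with a KV cache.

    x: (d_model,) hidden state of the current token.
    k_cache / v_cache: lists of past (n_kv_heads, d_head) keys / values.
    Each group of n_q_heads // n_kv_heads query heads shares one KV head,
    shrinking the cache by the same factor versus full multi-head attention.
    """
    d_model = x.shape[0]
    d_head = d_model // n_q_heads
    group = n_q_heads // n_kv_heads          # query heads per KV head

    q = (x @ Wq).reshape(n_q_heads, d_head)
    # Only the new token's K/V are computed; past tokens are reused.
    k_cache.append((x @ Wk).reshape(n_kv_heads, d_head))
    v_cache.append((x @ Wv).reshape(n_kv_heads, d_head))
    K = np.stack(k_cache)                    # (seq_len, n_kv_heads, d_head)
    V = np.stack(v_cache)

    out = np.empty((n_q_heads, d_head))
    for h in range(n_q_heads):
        kv = h // group                      # shared KV head for this query head
        scores = K[:, kv] @ q[h] / np.sqrt(d_head)   # (seq_len,)
        w = np.exp(scores - scores.max())
        w /= w.sum()                         # softmax over past positions
        out[h] = w @ V[:, kv]
    return out.reshape(d_model)

# Decode a few tokens; the cache grows by one entry per step.
rng = np.random.default_rng(0)
d, nq, nkv = 64, 8, 2
Wq = rng.normal(size=(d, d)) * 0.1
Wk = rng.normal(size=(d, nkv * d // nq)) * 0.1
Wv = rng.normal(size=(d, nkv * d // nq)) * 0.1
k_cache, v_cache = [], []
for _ in range(4):
    y = gqa_decode_step(rng.normal(size=d), Wq, Wk, Wv,
                        k_cache, v_cache, nq, nkv)
```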
### Rotary positional embedding
- [RoFormer: Enhanced Transformer with Rotary Position Embedding](https://arxiv.org/abs/2104.09864)
- [Papers with Code explanation](https://paperswithcode.com/method/rope)
- [Rotary positional encoding explained (Medium)](https://medium.com/@ngiengkianyew/understanding-rotary-positional-encoding-40635a4d078e) (see the sketch after this list)
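
A minimal NumPy sketch of RoPE as the RoFormer paper describes it: each pair of dimensions of a query or key is rotated by a position-dependent angle, so the query-key dot product depends only on the relative offset between positions. The function name and shapes are illustrative.

```python
import numpy as np

def rope(x, positions, base=10000.0):
    """Rotary position embedding (RoFormer, arXiv:2104.09864).

    x: (seq_len, d_head) queries or keys, d_head even.
    Dimension pair (2i, 2i+1) is rotated by angle pos * base**(-2i / d_head).
    """
    d = x.shape[1]
    inv_freq = base ** (-np.arange(0, d, 2) / d)       # (d/2,)
    angles = np.outer(positions, inv_freq)             # (seq_len, d/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

# The rotated dot product depends only on the relative offset:
rng = np.random.default_rng(0)
q, k = rng.normal(size=(1, 8)), rng.normal(size=(1, 8))
a = rope(q, [3]) @ rope(k, [5]).T     # positions 3 and 5 (offset 2)
b = rope(q, [10]) @ rope(k, [12]).T   # positions 10 and 12 (offset 2)
assert np.allclose(a, b)
```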
### Nomic Embed: Training a Reproducible Long Context Text Embedder
- Replacing absolute positional embeddings with rotary positional embeddings
- Using the SwiGLU activation instead of GELU (see the sketch after this list)
- Using Flash Attention
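
A minimal NumPy sketch of the SwiGLU feed-forward block mentioned above, computing (SiLU(xW) * xV) W2; the weight names `W`, `V`, `W2` are hypothetical placeholders.

```python
import numpy as np

def swiglu_ffn(x, W, V, W2):
    """SwiGLU feed-forward block: (SiLU(x @ W) * (x @ V)) @ W2.

    The SiLU-gated branch modulates the hidden activations
    multiplicatively, replacing the single GELU projection of a
    standard transformer MLP.
    """
    gate = x @ W
    silu = gate / (1.0 + np.exp(-gate))   # SiLU(z) = z * sigmoid(z)
    return (silu * (x @ V)) @ W2

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 16))             # (batch, d_model)
W = rng.normal(size=(16, 32)) * 0.1      # gate projection
V = rng.normal(size=(16, 32)) * 0.1      # value projection
W2 = rng.normal(size=(32, 16)) * 0.1     # back to d_model
y = swiglu_ffn(x, W, V, W2)              # (4, 16)
```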