
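# NVIDIA NeMo

![4457f36af7a2b27547d2a48351975f94](https://hackmd.io/_uploads/rybriOvz0.png)

| NeMo feature | Description |
| ------------ | ----------- |
| NeMo models | Modules for speech recognition, natural language understanding, and speech synthesis |
| NeMo-Curator | Processing tools for building training datasets |
| Training toolkits | NeMo Launcher; NeMo AutoConfigurator (a hyperparameter tool that automatically finds the best training configuration so that high-throughput LLMs can be trained faster); NeMo Megatron Core |

## Basic features

### Model support

NeMo is an open-source framework for building high-performance, flexible applications. It provides speech recognition, natural language understanding, and speech synthesis.

* Speech Process
  * ASR (automatic speech recognition)
  * TTS (text-to-speech)
    https://ai-free-startup.medium.com/win-11-%E6%9C%AC%E5%9C%B0%E7%AB%AF%E9%81%8B%E8%A1%8C%E8%AA%9E%E9%9F%B3%E7%94%9F%E6%88%90%E6%9C%8D%E5%8B%99-nemo-streamlit-624498e800f4
* Object Detection
* LLMs
  * Llama2
  * BERT
  * CodeLlama
  * Mistral
  * ...

[RIVA](https://developer.nvidia.com/riva)

### NeMo-Curator

A tool for processing training data.

* Data download and text extraction: default implementations for downloading and extracting Common Crawl, Wikipedia, and ArXiv data.
* Language identification and separation: uses fastText and pycld2 for language identification.
* Text reformatting and cleaning: fixes Unicode decoding errors with ftfy.
* Quality filtering: heuristic-based multilingual filtering and fastText-based classifier filtering.
* Document-level deduplication: exact and fuzzy deduplication accelerated with cuDF and Dask.
* Multilingual downstream-task decontamination: follows the approaches of OpenAI GPT3 and Microsoft Turing NLG 530B.
* Distributed data classification: multi-node, multi-GPU classifier inference, enabling sophisticated domain and quality classification.
* Personally identifiable information (PII) redaction: tools for removing addresses, credit card numbers, social security numbers, and similar data.

### Training frameworks and tools

* NeMo Launcher
* NeMo AutoConfigurator
  * A hyperparameter tool that automatically finds the best training configuration so that high-throughput LLMs can be trained faster.
* NeMo Megatron Core

### NeMo Aligner

The goal is for the LLM to output an answer that satisfies human preference.

* SteerLM
* DPO
* RLHF

### NeMo Guardrails

An open-source toolkit for adding programmable guardrails to LLM-based conversational applications. Guardrails are specific ways of controlling the output of a large language model, such as not talking about politics, responding in a particular way to specific user requests, following a predefined dialogue path, using a particular language style, or extracting structured data.

### NeMo Retriever

Available in NVIDIA NeMo as microservices, it helps enterprises enhance their generative AI applications with enterprise-grade retrieval-augmented generation (RAG) capabilities.

### Inference frameworks and tools

* Triton Inference Server

#### TensorRT / TensorRT-LLM

TRT-LLM supports single-node single-GPU, single-node multi-GPU (NCCL), and multi-node multi-GPU deployments, and it supports quantization (8-bit/4-bit). Unlike other inference techniques, TensorRT-LLM does not serve the model from its raw weights. Instead, it compiles the model and optimizes the kernels so that the model can be served efficiently on NVIDIA GPUs. Running a compiled model is much faster than running the raw model, which is one of the main reasons TensorRT-LLM is so fast.

The entire compilation process must take place on a GPU, and the resulting compiled model is optimized specifically for the GPU it was built on. For example, a model compiled on an A40 GPU may not run on an A100 GPU. Whichever GPU type is used for compilation must therefore also be used for inference.

https://developer.aliyun.com/article/1448733

* [vLLM vs. TensorRT-LLM with multi-GPUs](https://www.run.ai/blog/achieve-2x-inference-throughput-reduce-latency-leveraging-multi-gpu)
* [vllm-gpu-ray-multigpu](https://www.xiaoiluo.com/article/vllm-gpu-ray-multigpu)

### Megatron Core / Megatron-LM

A project for training large-scale transformer models. Built on the PyTorch framework, it implements efficient parallelism strategies, including model parallelism, data parallelism, and pipeline parallelism. Megatron also uses mixed-precision training to reduce memory consumption and improve compute performance.

Multi-node training uses the ==nccl== distributed backend.

* [PyTorch torch.distributed setting](https://pytorch.org/docs/stable/distributed.html#environment-variable-initialization)
* [Megatron server](https://github.com/NVIDIA/NeMo/blob/main/nemo/collections/nlp/modules/common/text_generation_server.py)
* [The perfect tool for building custom large language models](https://www.toolify.ai/tw/ai-news-tw/nvidia-nemo-megatron%E5%BB%BA%E7%AB%8B%E8%87%AA%E5%AE%9A%E7%BE%A9%E5%A4%A7%E5%9E%8B%E8%AA%9E%E8%A8%80%E6%A8%A1%E5%9E%8B%E7%9A%84%E5%AE%8C%E7%BE%8E%E5%B7%A5%E5%85%B7-1386239)
* [How to train a language model with Megatron-LM](https://huggingface.co/blog/zh/megatron-training)

## NeMo Framework Software Components

* Transformer Engine
* PyTorch
* NeMo
* NeMo Aligner
* NeMo Data Curator
* Megatron Core
* PyTorch Lightning
* Hydra
* Kubernetes
* Helm
* GPU Operator
* Network Operator
* KubeFlow Operator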
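
Returning to the parallelism strategies listed under Megatron Core / Megatron-LM: the three strategies share one pool of GPUs, so the sizes have to multiply out to the total GPU count. The sketch below is not Megatron code. It is a plain-Python illustration that assumes "model parallelism" means tensor parallelism and ignores other dimensions such as context or expert parallelism. The function name `data_parallel_size` and its parameters are hypothetical.

```python
def data_parallel_size(world_size: int, tensor_parallel: int, pipeline_parallel: int) -> int:
    """Return how many data-parallel replicas fit in `world_size` GPUs.

    Assumption for illustration: each model replica occupies
    tensor_parallel * pipeline_parallel GPUs, and the remaining
    factor of world_size is used for data parallelism.
    """
    if min(world_size, tensor_parallel, pipeline_parallel) < 1:
        raise ValueError("all sizes must be positive integers")

    gpus_per_replica = tensor_parallel * pipeline_parallel
    if world_size % gpus_per_replica != 0:
        raise ValueError(
            f"world_size={world_size} is not divisible by "
            f"tensor_parallel * pipeline_parallel = {gpus_per_replica}"
        )
    return world_size // gpus_per_replica


if __name__ == "__main__":
    # Hypothetical cluster: 2 nodes x 8 GPUs, tensor parallel 2, pipeline parallel 4.
    print(data_parallel_size(world_size=16, tensor_parallel=2, pipeline_parallel=4))
```

With 16 GPUs, a tensor-parallel size of 2, and a pipeline-parallel size of 4, each model replica spans 8 GPUs, which leaves room for 2 data-parallel replicas. A combination that does not divide the GPU count evenly raises an error instead of silently leaving GPUs idle.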