# [Week 5] Tools for Building LLM Applications

[Course Outline](https://areganti.notion.site/Applied-LLMs-Mastery-2024-562ddaa27791463e9a1286199325045c) | [Course Link](https://areganti.notion.site/Week-5-Tools-for-Building-LLM-Applications-eb1addcaff4a4fa49e2297909cf04bbf)

## ETMI5: Explain to Me in 5

In this section of our course, we explore the essential technologies and tools that facilitate the creation and enhancement of LLM applications. This includes Custom Model Adaptation for bespoke solutions, RAG-based Applications for contextually rich responses, and an extensive range of tools for input processing, development, application management, and output analysis. Through this comprehensive overview, we aim to equip you with the knowledge to leverage both proprietary and open-source models, alongside advanced development, hosting, and monitoring tools.

## Types of LLM Applications

LLM applications are gaining momentum, with an increasing number of startups and companies integrating them into their operations for various purposes. These applications can be categorized into two main types, based on how LLMs are utilized:

1. **Custom Model Adaptation**: This encompasses both the development of custom models from scratch and the fine-tuning of pre-existing models. While custom model development demands skilled ML scientists and substantial resources, fine-tuning involves updating pre-trained models with additional data. Though fine-tuning is increasingly accessible thanks to open-source innovations, it still requires a sophisticated team and may produce unintended consequences. Despite these challenges, both approaches are seeing rapid adoption across industries.
2. **RAG-based Applications**: The Retrieval Augmented Generation (RAG) method, likely the simplest and most widely adopted approach currently, utilizes a foundational model supplemented with contextual information. This involves retrieving embeddings (representations of words or phrases in a multidimensional vector space) from dedicated vector databases. By converting unstructured data into embeddings and storing them in these databases, RAG enables efficient retrieval of pertinent context during queries. This facilitates natural language comprehension and timely insight extraction without the need for extensive model customization or training. A notable advantage of RAG is its ability to bypass traditional model limitations like context window constraints. Moreover, it offers cost-effectiveness and scalability, catering to diverse developers and organizations, and by harnessing embeddings retrieval it addresses concerns about data currency while integrating seamlessly into various applications and systems. A short embedding-similarity sketch follows this list.
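To make the idea of embeddings concrete, here is a minimal sketch of embedding-based similarity, assuming the open-source `sentence-transformers` library; the model name and example texts are illustrative choices, not part of the course material.

```python
# A minimal sketch of embedding-based similarity using the open-source
# sentence-transformers library (the model choice here is illustrative).
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

docs = [
    "RAG retrieves relevant context from a vector database.",
    "Fine-tuning updates a pre-trained model with additional data.",
]
query = "How does retrieval augmented generation find context?"

# Encode text into dense vectors (embeddings).
doc_embeddings = model.encode(docs)
query_embedding = model.encode(query)

# Cosine similarity scores: higher means more semantically related.
scores = util.cos_sim(query_embedding, doc_embeddings)
print(scores)  # the first document should score higher for this query
```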
In the previous weeks' [content](https://www.notion.so/6ad3284a96a241f3bd2318f4f502a1da?pvs=21), we covered the distinctions between these methodologies and discussed the criteria for selecting the most appropriate one based on your specific needs. Please review those materials for further details.

In the upcoming sections, we'll explore the tool options available for both of these methodologies. There's certainly some overlap between them, which we'll address.

## Types of Tools

We can broadly categorize tools into four major groups:

1. **Input Processing Tools**: These are tools designed to ingest data and various inputs for the application.
2. **LLM Development Tools**: These tools facilitate interaction with the Large Language Model, including calling, fine-tuning, conducting experiments, and orchestration.
3. **Output Tools**: These tools are utilized for managing the output from the LLM application, essentially focusing on post-output processes.
4. **Application Tools**: These tools oversee the comprehensive management of the three components above, including application hosting, monitoring, and more.

![image](https://hackmd.io/_uploads/HJUDyJN20.png)

If you remember from the previous content how RAG operates, an application typically follows these steps (a minimal code sketch of this flow appears below):

1. Receives a query from the user (the user's input to the application).
2. Utilizes an embedding search to find pertinent data (this involves an embedding LLM, data sources, and a vector database for storing data embeddings).
3. Forwards the retrieved documents along with the query to the LLM for processing.
4. Delivers the LLM's output back to the user.

Hosting and monitoring LLM responses are integrated into the overall application architecture, as depicted in the image below. For fine-tuning applications, much of this workflow is preserved; however, a framework and computing resources dedicated to model fine-tuning are also needed. Additionally, the application may not utilize external data at all, in which case the vector database component might be unnecessary. The figure below depicts each of these components and its category association. Now that we know how each of the tools is utilized, let's dig deeper into each tool type.
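Here is a minimal, self-contained sketch of the four-step RAG flow described above. The `embed` and `call_llm` functions are hypothetical stand-ins: a real application would use an embedding model, a vector database, and an LLM API in their place, and real embeddings would be semantic rather than pseudo-random.

```python
# A minimal sketch of the four-step RAG flow; embed() and call_llm()
# are hypothetical stand-ins for an embedding model and an LLM API.
import numpy as np

def embed(text: str) -> np.ndarray:
    """Stand-in embedding: pseudo-random vectors keyed on the text.
    A real application would call an embedding model here."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(16)

def call_llm(prompt: str) -> str:
    """Stand-in for an actual LLM API call (e.g., a chat completion)."""
    return f"[LLM answer grounded in: {prompt[:60]}...]"

# Ingestion: embed documents and keep them alongside their vectors.
documents = [
    "Pinecone is a cloud-hosted vector database.",
    "LangChain helps orchestrate calls to LLMs.",
]
index = [(doc, embed(doc)) for doc in documents]

def answer(query: str) -> str:
    q = embed(query)  # step 1-2: receive the query and embed it

    # step 2: rank stored documents by cosine similarity to the query
    def cosine(v: np.ndarray) -> float:
        return float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v)))

    context, _ = max(index, key=lambda pair: cosine(pair[1]))
    # step 3: forward the retrieved context together with the query
    prompt = f"Context: {context}\n\nQuestion: {query}"
    # step 4: return the LLM's output to the user
    return call_llm(prompt)

print(answer("Which tool stores vectors in the cloud?"))
```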
💡 If you're still unsure why each of these tool categories is required, please review the previous weeks' content to understand how RAG and fine-tuning applications work.

![image](https://hackmd.io/_uploads/ryzu1kV2A.png)

Summary of tools available to build LLM apps

## Input Processing Tools

### 1. Data Pipelines/Sources

In LLM applications, the effective management and processing of data are key to boosting performance and functionality. The types of data these applications work with are diverse, encompassing text documents, PDFs, and structured formats like CSV files or SQL tables. To navigate this variety, a range of data pipelines and source tools are used for loading and transforming data.

**A. Data Loading and ETL (Extract, Transform, Load) Tools**

- **Traditional ETL Tools**: Established ETL solutions are widely used to manage data workflows. [**Databricks**](http://databricks.com) is chosen for its robust data processing capabilities, with an emphasis on machine learning and analytics, while [**Apache Airflow**](https://airflow.apache.org/) is preferred for its ability to programmatically author, schedule, and monitor workflows.
- **Document Loaders and Orchestration Frameworks**: Applications that predominantly deal with unstructured data often utilize document loaders integrated within orchestration frameworks. Notable examples include:
  - [**LangChain**](https://www.langchain.com/), powered by [Unstructured](https://python.langchain.com/v0.2/docs/integrations/providers/unstructured/), which aids in processing unstructured data for LLM applications.
  - [**LlamaIndex**](https://www.llamaindex.ai/), a component of the Llama Hub ecosystem, which offers indexing and retrieval functions for efficient data management.

Further details on LlamaIndex and LangChain will be provided in the orchestration section.

**B. Specialized Data-Replication Solutions**

Although the existing stack for data management in LLM applications is operational, there is potential for enhancement, especially in developing data-replication solutions specifically tailored for LLM apps. Such innovations could streamline the integration and operationalization of data, improving both efficiency and the scope of possible applications.

**Data Loaders for Structured and Unstructured Data**

The capability to integrate data from a variety of sources is enabled by data loaders that can handle both structured and unstructured inputs.
For instance:

- **Unstructured Data**: Solutions provided by **Unstructured.io** allow for the creation of complex ETL pipelines. These are vital for applications aimed at generating personalized content or conducting semantic searches over data stored in formats like PDFs, documents, and presentations.
- **Structured Data Sources**: Loaders that connect directly to databases and other structured data repositories facilitate seamless data integration and manipulation.

### 2. Vector Databases

Referring back to the content on RAG, we explored how the most relevant documents are identified through embedding similarity. This is where vector databases come into play.

The primary role of a vector database is to store, compare, and retrieve embeddings (i.e., vectors) efficiently, often scaling up to billions of them. Among the various options available, [**Pinecone**](https://www.pinecone.io/) stands out as a prevalent choice due to its cloud-hosted nature, making it readily accessible and equipped with features that cater to the demands of large enterprises, such as scalability, Single Sign-On, and Service Level Agreements on uptime.

The spectrum of vector databases is broad, encompassing:

- **Open Source Systems** like [Weaviate](https://weaviate.io/), [Vespa](https://vespa.ai/), and [Qdrant](https://qdrant.tech/): These platforms offer exceptional performance on a single-node basis and can be customized for particular applications, making them favored choices among AI teams with the expertise to develop tailored platforms.
- **Local Vector Management Libraries** such as [Chroma](https://www.trychroma.com/) and [Faiss](https://github.com/facebookresearch/faiss): Known for their excellent developer experience, these libraries are straightforward to adopt for small-scale applications and development experiments (a short Faiss sketch follows this list). However, they may not serve as complete substitutes for a full-fledged database at scale.
- **OLTP Extensions** like [pgvector](https://supabase.com/docs/guides/database/extensions/pgvector): This option suits those who use Postgres for most of their database requirements, or enterprises that procure most of their data infrastructure from a single cloud provider, offering a viable route to vector support. The long-term viability of closely integrating vector and scalar workloads remains to be seen.
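As a concrete example of the local-library option, here is a minimal sketch of storing and searching vectors with Faiss; the random vectors stand in for real document embeddings.

```python
# A minimal sketch of vector storage and retrieval with Faiss.
import faiss
import numpy as np

dim = 128                                  # embedding dimensionality
index = faiss.IndexFlatL2(dim)             # exact (brute-force) L2-distance index

# Placeholder embeddings; a real app would insert document embeddings.
vectors = np.random.random((1000, dim)).astype("float32")
index.add(vectors)

query = np.random.random((1, dim)).astype("float32")
distances, ids = index.search(query, 5)    # the 5 nearest stored vectors
print(ids[0])                              # row indices of the closest embeddings
```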
With the evolution of technology, many open-source vector database providers are venturing into cloud services. Achieving high performance in the cloud while catering to a wide array of use cases presents a significant challenge. While the immediate future may not witness drastic changes in the available offerings, the long-term landscape is expected to evolve.

## LLM Development Tools

### 1. Models

Developers have a variety of model options to choose from, each with its own set of advantages depending on the project's requirements. The starting point for many is the OpenAI API, with GPT-4 or GPT-4-32k being popular choices due to their wide-ranging compatibility and minimal need for fine-tuning.

As applications move from development to production, the focus often shifts towards balancing cost and performance.

Beyond proprietary models, there's a growing interest in open-source alternatives, most of which are available on [**Huggingface**](https://huggingface.co/). Open-source models provide a flexible and cost-effective solution, especially useful in high-volume, consumer-facing applications like search or chat functions. While traditionally seen as lagging behind their proprietary counterparts in accuracy and performance, the gap is closing. Initiatives like Meta's LLaMa models have showcased the potential for open-source models to reach high levels of accuracy, spurring the development of various alternatives aimed at matching or even surpassing proprietary model performance.

The choice between proprietary and open-source models doesn't hinge on cost alone. Considerations include the specific needs of the application, such as accuracy, inference speed, customization options, and the potential need for fine-tuning to meet particular requirements. Users may also weigh the benefits of hosting models themselves against using cloud-based solutions, which can simplify deployment but may involve different cost structures and scalability considerations.

💡 Note that many proprietary models cannot be fine-tuned by application developers.
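For reference, calling a proprietary model is typically only a few lines of code; this sketch assumes the `openai` Python SDK (v1 style) with an `OPENAI_API_KEY` set in the environment, and the prompt is illustrative.

```python
# A minimal sketch of calling a proprietary model via the OpenAI API.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4",  # swap in another model name to compare cost/performance
    messages=[{"role": "user", "content": "Summarize RAG in one sentence."}],
)
print(response.choices[0].message.content)
```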
### 2. Orchestration

Orchestration tools in the context of LLM applications are software frameworks designed to streamline and manage complex processes involving multiple components and interactions with LLMs. Here's a breakdown of what these tools do:

1. **Automate Prompt Engineering**: Orchestration tools automate the creation and management of prompts, which are queries or instructions sent to LLMs. These tools use advanced strategies to construct prompts that effectively communicate the task at hand to the model, improving the relevance and accuracy of the model's responses.
2. **Integrate External Data**: They facilitate the incorporation of external data into prompts, enhancing the model's responses with context it wasn't originally trained on. This could involve pulling information from databases, web services, or other data sources to provide the LLM with the most current or relevant data for generating its responses.
3. **Manage API Interactions**: Orchestration tools handle the complexities of interfacing with LLM APIs, including making calls to the model, managing API keys, and handling the data returned by the model. This allows developers to focus on higher-level application logic rather than the intricacies of API communication.
4. **Prompt Chaining and Memory Management**: They enable prompt chaining, where the output of one LLM interaction is used as input for another, allowing for more sophisticated dialogues or data processing sequences. Additionally, they can maintain a "memory" of previous interactions, helping the model build on past responses for more coherent and contextually relevant outputs (a hand-rolled sketch of these ideas appears after this section).
5. **Simplify Application Development**: By abstracting away the complexity of working directly with LLMs, orchestration tools make it easier for developers to build applications. They provide templates and frameworks for common use cases like chatbots, content generation, and information retrieval, speeding up the development process.
6. **Avoid Vendor Lock-in**: These tools are often designed to be model-agnostic, meaning they can work with different LLMs from various providers. This flexibility allows developers to switch between models as needed without rewriting large portions of their application code.

Frameworks like LangChain and LlamaIndex simplify complex processes such as prompt chaining, interfacing with external APIs, integrating contextual data from vector databases, and maintaining consistency across multiple LLM interactions. They offer templates for a wide range of applications, making them particularly popular among hobbyists and startups eager to launch their applications quickly, with LangChain leading in usage.

![image](https://hackmd.io/_uploads/SkAjJJ42R.png)

Image Source: https://stackoverflow.com/questions/76990736/differences-between-langchain-llamaindex

Retrieval-augmented generation techniques, which personalize model outputs by embedding specific data within prompts, demonstrate how personalization can be achieved without altering the model's weights through fine-tuning. Tools like LangChain and LlamaIndex offer structures for weaving data into the model's context, facilitating this process.
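To illustrate the plumbing that orchestration frameworks automate, here is a hand-rolled sketch of prompt templating, chaining, and memory. It is not LangChain's or LlamaIndex's actual API, and `call_llm` is a hypothetical stand-in for any completion endpoint.

```python
# A hand-rolled sketch of prompt templating, chaining, and memory --
# the kind of plumbing frameworks like LangChain provide out of the box.
def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for a chat-completion API call."""
    return f"[model output for: {prompt[:30]}...]"

TEMPLATE = "Conversation so far:\n{history}\n\nUser: {question}\nAssistant:"

class SimpleChain:
    def __init__(self):
        self.history: list[str] = []   # "memory" of prior turns

    def run(self, question: str) -> str:
        # 1. Construct the prompt from a template plus remembered context.
        prompt = TEMPLATE.format(history="\n".join(self.history),
                                 question=question)
        # 2. Call the model and record the turn for future prompts.
        answer = call_llm(prompt)
        self.history.append(f"User: {question}")
        self.history.append(f"Assistant: {answer}")
        return answer

chain = SimpleChain()
first = chain.run("What is a vector database?")
# Prompt chaining: feed the first output into a follow-up prompt.
second = chain.run(f"Rewrite this answer for a beginner: {first}")
print(second)
```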
The availability of language model APIs democratizes access to powerful models, extending their use beyond specialized machine learning teams to the broader developer community. This expansion is likely to spur the development of more developer-oriented tools. LangChain, for instance, assists developers in overcoming common challenges by abstracting away complexities such as model integration and data connection, and by avoiding vendor lock-in. Its utility ranges from prototyping to full-scale production use, indicating a significant shift towards more accessible and versatile tooling in the LLM application development ecosystem.

### 3. Compute/Training Frameworks

Compute and training frameworks play essential roles in the development and deployment of LLM applications, particularly when it comes to fine-tuning models to suit specific needs or developing entirely new models. These frameworks and services provide the infrastructure and tools required to handle the substantial computational demands of working with LLMs.

**Compute Frameworks**

Compute frameworks and cloud services offer the scalable resources needed to run LLM applications efficiently. Examples include:

- **Cloud Providers**: Services like [**AWS**](https://aws.amazon.com/) (Amazon Web Services) provide a wide range of computing resources, including GPU and CPU instances, which are critical for both the training and inference phases of LLM applications. These platforms offer flexibility and scalability, allowing developers to adjust resources according to their project's requirements.
- **LLM Infrastructure Companies**: Companies like [**Fireworks.ai**](https://fireworks.ai/) and [**Anyscale**](https://www.anyscale.com/) specialize in infrastructure solutions tailored for LLMs. These services are designed to optimize the performance of LLM applications, offering specialized hardware and software configurations that can significantly reduce training and inference times.

**Training Frameworks**

Deep learning frameworks are used for the development and fine-tuning of LLMs (a condensed fine-tuning sketch follows this list). These include:

- **PyTorch**: A popular choice among researchers and developers for training LLMs due to its flexibility, ease of use, and dynamic computational graph. PyTorch supports a wide range of LLM architectures and provides tools for efficient model training and fine-tuning.
- **TensorFlow**: Another widely used framework that offers robust support for LLM training and deployment. TensorFlow is known for its scalability and is suited to both research prototypes and production deployments.
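As a concrete starting point, here is a condensed sketch of supervised fine-tuning with Hugging Face Transformers on the PyTorch backend. The `gpt2` model and `corpus.txt` dataset are illustrative placeholders, and a real run needs a GPU and carefully prepared data.

```python
# A condensed sketch of fine-tuning a small causal LM with Hugging Face
# Transformers (PyTorch backend); model and data choices are illustrative.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "gpt2"  # stand-in for a larger open-source LLM
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # gpt2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_name)

# Load your own text data (hypothetical file) and tokenize it.
dataset = load_dataset("text", data_files={"train": "corpus.txt"})
tokenized = dataset["train"].map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="ft-out", num_train_epochs=1,
                           per_device_train_batch_size=4),
    train_dataset=tokenized,
    # mlm=False => standard next-token (causal) language modeling loss
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```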
💡 Note that LLM API applications, such as those leveraging RAG, typically do not require direct access to computational resources for training, since they use pre-trained models served via an API. In these cases, the focus is more on integrating the API into the application and possibly using orchestration tools to manage interactions with the model.

### 4. Experimentation Tools

Experimentation tools are pivotal for LLM applications, as they facilitate the exploration and optimization of hyperparameters, fine-tuning techniques, and the models themselves. These tools help track and manage the multitude of experiments involved in developing and refining LLM applications, enabling more systematic and data-driven approaches to model improvement.

💡 It's important to note that the tools mentioned here are primarily beneficial for scenarios involving the fine-tuning or training of models, where experimentation is key. If you're building applications on top of an LLM API, these tools might not hold the same level of utility, since the LLM operates as a black box: its inner workings and training processes are managed externally, and the focus shifts towards optimizing use of the model through APIs rather than directly manipulating its training or fine-tuning parameters.

Below are some experimentation tools (a minimal tracking sketch follows this list):

- **Experiment Tracking**: Tools like [**Weights & Biases**](https://wandb.ai/site) provide platforms for tracking experiments, including changes in hyperparameters, model architectures, and performance metrics over time. This facilitates a more organized approach to experimentation, helping developers identify the most effective configurations.
- **Model Development and Hosting**: Platforms like **Hugging Face** and [**MLFlow**](https://mlflow.org/) offer ecosystems for developing, sharing, and deploying ML models, including custom LLMs. These services simplify access to model repositories (model hubs), computing resources, and deployment capabilities, streamlining the development cycle.
- **Performance Evaluation**: Tools like [**Statsig**](https://www.statsig.com/) offer capabilities for evaluating model performance in a live production environment, allowing developers to conduct A/B tests and gather real-world feedback on model behavior.
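A minimal sketch of experiment tracking with Weights & Biases might look like the following; the project name, config values, and metrics are placeholders for a real training loop.

```python
# A minimal sketch of experiment tracking with Weights & Biases (wandb).
import wandb

run = wandb.init(
    project="llm-finetuning",                     # hypothetical project name
    config={"learning_rate": 2e-5, "epochs": 3},  # hyperparameters to track
)

for epoch in range(run.config.epochs):
    train_loss = 1.0 / (epoch + 1)   # placeholder metric from a training loop
    wandb.log({"epoch": epoch, "train_loss": train_loss})

run.finish()
```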
## Application Tools

### 1. Hosting

Developers leveraging open-source models have a range of hosting services at their disposal. Innovations from companies like [OctoML](https://octo.ai/) have expanded hosting capabilities beyond traditional server setups, enabling deployment on edge devices and directly within browsers. This shift not only enhances privacy and security but also reduces latency and costs. Hosting platforms like [Replicate](https://replicate.com/) are incorporating tools designed to simplify the integration and utilization of these models for software developers, reflecting a belief in the potential of smaller, finely tuned models to achieve top-tier accuracy within specific domains.

Beyond the LLM components, the static elements of LLM applications (essentially, everything excluding the model itself) also require hosting solutions. Common choices include platforms like [Vercel](https://vercel.com/) and services provided by major cloud providers. Yet the landscape is evolving with the emergence of startups like [Steamship](https://www.steamship.com/) and [Streamlit](https://streamlit.io/), which offer end-to-end hosting solutions tailored for LLM applications, indicating a broadening of hosting options to support the diverse needs of developers.

### 2. Monitoring

Monitoring and observability tools are essential for maintaining and improving applications, especially after deployment to production. These tools enable developers to track key metrics such as the model's performance, cost, latency, and overall behavior. Insights gained from these metrics are invaluable for guiding the iteration of prompts and further experimentation with models, ensuring that the application remains efficient, cost-effective, and aligned with user needs.

One notable development in this area is the launch of [**LangKit**](https://github.com/whylabs/langkit) by **WhyLabs**. LangKit is specifically designed to offer developers enhanced visibility into the quality of model outputs.

Some other examples:

[**Gantry**](https://www.gantry.io/) offers a holistic approach to understanding model performance by tracking inputs and outputs alongside relevant metadata and user feedback. It assists in uncovering how models function in real-world scenarios, identifying errors, and spotting underperforming cohorts or use cases.

[**Helicone**](https://www.helicone.ai/) is designed to offer actionable insights into application performance with minimal setup. It enables real-time monitoring of model interactions, helping developers understand how their models are performing across different metrics. By logging inputs and outputs and enriching them with metadata and user feedback, Helicone provides a comprehensive view of model behavior.
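The sketch below hand-rolls the kind of instrumentation these monitoring tools provide, logging each model call's inputs, outputs, latency, and metadata. `call_llm` is a hypothetical stand-in, and a real setup would ship the records to a monitoring backend rather than printing them.

```python
# A hand-rolled sketch of per-call instrumentation for an LLM app.
import json
import time

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real completion API."""
    return "placeholder response"

def monitored_call(prompt: str, user_id: str) -> str:
    start = time.perf_counter()
    response = call_llm(prompt)
    record = {
        "timestamp": time.time(),
        "user_id": user_id,                 # metadata for cohort analysis
        "prompt_chars": len(prompt),
        "response_chars": len(response),
        "latency_ms": (time.perf_counter() - start) * 1000,
    }
    # A real setup would send this record to a monitoring backend.
    print(json.dumps(record))
    return response

monitored_call("What is RAG?", user_id="u-123")
```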
## Output Tools

### 1. Evaluation

When developing applications with LLMs, developers often navigate a complex balance among model performance, inference cost, and latency. Strategies to enhance one aspect, such as iterating on prompts, fine-tuning the model, or switching model providers, can impact the others. Given the probabilistic nature of LLMs and the variability of the tasks they perform, assessing performance becomes a critical challenge. To aid in this process, a range of evaluation tools have been developed. These tools assist in refining prompts, tracking experimentation, and monitoring model performance, both offline and online. Here's an overview of the types of tools available:

For those looking to optimize their interaction with LLMs, No Code / Low Code prompt engineering tools are invaluable. They allow developers and prompt engineers to experiment with different prompts and compare outputs across various models without deep coding requirements. Examples of such tools include [Humanloop](https://humanloop.com/) and [PromptLayer](https://promptlayer.com/).

Once deployed, it's important to continually monitor an LLM application's performance in the real world. Performance monitoring tools offer insights into how well the model is performing against key metrics, identify potential degradation over time, and highlight areas for improvement. These tools can alert developers to issues that may affect user experience or operational costs, enabling timely adjustments to maintain or enhance the application's effectiveness. Some performance monitoring tools include [Honeyhive](https://www.honeyhive.ai/) and [Scale AI](https://scale.com/).

## Read/Watch These Resources (Optional)

1. https://www.secopsolution.com/blog/top-10-llm-tools-in-2024
2. https://www.sequoiacap.com/article/llm-stack-perspective/
3. https://www.codesmith.io/blog/introducing-the-emerging-llm-tech-stack
4. https://stackshare.io/index/llm-tools