AWS AI Practitioner Preparation === [TOC] ## prompttech :::spoiler **Prioritizing specificity and concision in your prompts** >優先強調提示的具體性與簡潔性 When working with a language model that gives overly verbose responses, focusing on specificity and concision in your prompts is essential. By crafting prompts that are clear, direct, and specific about what you want, you guide the model to produce more concise and relevant outputs. This involves explicitly stating any requirements for brevity and clarity within the prompt itself. For example, you might add instructions like "Provide a brief summary of..." or "In two sentences, explain...". >當你使用一個傾向於產生冗長回覆的語言模型時,聚焦於提示(prompt)的具體性與簡潔性是關鍵。 透過撰寫清楚、直接、並明確指出你所需內容的提示,你能引導模型生成更精煉且相關性更高的輸出。 這需要在提示中明確說明對「簡潔」與「清楚」的要求。例如,你可以在提示中加入指令:「請提供簡短摘要……」或「請用兩句話說明……」。 Being precise about the desired response length and content helps the model understand your expectations and reduces the likelihood of unnecessary or off-topic information. This practice improves the efficiency of communication with the model and ensures that the generated responses are aligned with your objectives. Prioritizing specificity and concision makes your interactions with the model more productive and the outputs more useful, especially in applications where brevity is valued. >對回答的長度與內容要求越精確,模型就越能理解你的期望,並減少產生多餘或離題資訊的可能性。 這種做法能提升與模型互動的效率,確保生成內容與你的目標一致。 在重視篇幅與重點的應用情境中,優先考慮具體與簡潔能使互動更具成效,輸出也更有實用價值。 Incorrect Options(錯誤選項): **Using guardrails to limit the scope of responses** While guardrails can help prevent inappropriate content, it is generally used to enforce ethical guidelines or compliance requirements, not to control verbosity or enhance clarity in responses. >**使用防護欄以限制回覆範圍** 防護欄(guardrails)確實能防止模型產生不當內容,但它主要用於維護倫理準則或符合法規要求,並非用來控制冗長程度或提升回答的清晰度。 **Employing experimentation with random prompts** Experimenting with random prompts might lead to inconsistent results and doesn't specifically address the issue of overly verbose responses. It lacks a targeted approach to improve clarity and relevance. >**嘗試以隨機提示進行實驗** 使用隨機提示可能導致結果不一致,也無法針對「回覆過於冗長」的問題提供有效解決方案。這種做法缺乏針對性,無助於提升回答的明確性與相關性。 **Including negative prompts to filter out unnecessary content** Negative prompting involves telling the model what to avoid, but it may not effectively reduce verbosity. It focuses on excluding specific content rather than encouraging concise and clear responses. >**使用負面提示以過濾不必要內容** 負面提示(negative prompting)是指告訴模型應避免哪些內容,但這通常無法有效減少冗長。 它著重於「排除特定內容」,而非主動促進回答的簡潔與清楚。 ::: ## GenAI problem :::spoiler **Hallucinations(幻覺)** Hallucinations in generative AI refer to instances where the model generates content that is factually incorrect or misleading. In the context of healthcare, this is a serious concern because inaccurate or invented advice can lead to harmful decisions. This occurs when the AI model creates responses based on patterns in the data but fails to distinguish between true and false information. Ensuring that the AI provides accurate and reliable information in sensitive domains like healthcare is critical to prevent such hallucinations. >在生成式人工智慧中,幻覺(hallucinations)指的是模型生成事實上錯誤或具有誤導性內容的情況。 在醫療領域中,這是一項嚴重的問題,因為錯誤或捏造的建議可能導致危險的決策。 此現象的成因在於 AI 模型依據資料中的模式生成回覆,卻無法正確區分真實與虛假資訊。 因此,在醫療等高敏感度領域中,確保 AI 所提供的資訊準確且可靠,對防止此類幻覺至關重要。 Incorrect Options(錯誤選項): **Interpretability** Interpretability refers to how easily humans can understand the AI's reasoning process. While important, the primary concern in this scenario is not understanding the model but addressing its incorrect advice. 
>**可解釋性(Interpretability)** 可解釋性指的是人類理解 AI 推理過程的難易程度。 雖然這點很重要,但在此情境下,主要問題並非理解模型運作,而是處理其錯誤建議。 **Inaccuracy** While hallucinations often lead to inaccuracy, hallucinations specifically involve generating completely fabricated or misleading information. Inaccuracy is a broader term, whereas hallucinations directly address the issue in this context. >**不準確性(Inaccuracy)** 雖然幻覺常導致不準確的結果,但幻覺特指生成完全虛構或誤導性資訊的情況。 「不準確」是更廣泛的概念,而「幻覺」則直接指出本題的核心問題。 **Nondeterminism** Nondeterminism refers to the AI generating different outputs for the same input, but the issue here is not variability—it is the production of factually incorrect information. >**非決定性(Nondeterminism)** 非決定性指的是 AI 對同一輸入產生不同輸出的現象。 然而,此處的問題並非輸出變化,而是生成事實錯誤的內容。 ::: ## Hyperparameter tuning :::spoiler **Hyperparameter tuning(超參數調整)** Hyperparameter tuning is the process of optimizing the hyperparameters of a machine learning model to improve its performance. > **超參數調整(Hyperparameter tuning)**是指為了提升機器學習模型的表現,而對模型的**超參數進行最佳化**的過程。 Hyperparameters are settings such as learning rate, batch size, and the number of layers in a neural network, which are not learned from the data but are set before the training process. > 所謂的**超參數(hyperparameters)**,是指像**學習率(learning rate)**、**批次大小(batch size)**、或**神經網路層數(number of layers)**等設定值。 > 這些參數**並非透過資料學習得到**,而是必須在訓練開始前手動設定的。 Adjusting these values can have a significant impact on the model's ability to generalize and make accurate predictions. > 調整這些參數會**顯著影響模型的泛化能力與預測準確度**。 In the scenario described, the team is modifying hyperparameters like learning rate and batch size to improve accuracy, which makes this a clear case of hyperparameter tuning. > 在此情境中,團隊透過調整**學習率與批次大小**來提升準確率,這正是**典型的超參數調整實例**。 Tools like grid search, random search, and Bayesian optimization are often used in this phase to find the optimal settings. > 在這個階段,常用的超參數搜尋方法包括:**網格搜尋(grid search)**、**隨機搜尋(random search)**、以及**貝葉斯最佳化(Bayesian optimization)**,用以尋找最合適的設定組合。 [🔗 參考資料來源:AWS - What is Hyperparameter Tuning?](https://aws.amazon.com/what-is/hyperparameter-tuning) ::: ## Fine Tuning vs Pretraining :::spoiler **Continuous pre-training(持續預訓練)** Continuous pre-training is the process of **continuously updating a foundation model by training it on new data over time**, while preserving the model's existing knowledge. > **持續預訓練(Continuous pre-training)**是指在**持續使用新資料訓練基礎模型**的同時,**保留模型原有知識**的過程。 This approach is suitable when the goal is to regularly incorporate new datasets and keep the model up-to-date with evolving information. > 當目標是**定期納入新的資料集**並保持模型與不斷變化的資訊同步時,這種方法非常適合。 Continuous pre-training involves reusing the pre-trained model and periodically introducing new data without starting the entire training process from scratch. > 持續預訓練會**重用已預訓練的模型**,並定期加入新資料,而無需從頭開始整個訓練過程。 This allows the model to adapt to new patterns while maintaining the general knowledge it has already learned. > 這讓模型能**學習新模式**的同時,保持已經掌握的一般知識。 It is a scalable solution for environments where the model needs to stay current with new trends or information as they emerge. > 對於模型需要**隨時跟上新趨勢或最新資訊**的環境,這是一個可擴展的解決方案。 --- Incorrect Options(錯誤選項): **Fine-tuning** > **微調(Fine-tuning)** Fine-tuning involves **adjusting a pre-trained model using a smaller, domain-specific dataset. While useful for adapting to specific tasks**, it is not as comprehensive as continuous pre-training for regularly incorporating large amounts of new data. 
> 微調是指使用**較小的特定領域資料集調整預訓練模型**。雖然對於適應特定任務有幫助,但它不像持續預訓練那樣,能**定期整合大量新資料**。 **One-shot prompting** > **一次性提示(One-shot prompting)** One-shot prompting refers to providing the model with a single example to help it generate a response. It does not involve updating the model's knowledge or incorporating new data into the model's training. > 一次性提示是指提供模型**單一範例**以協助生成回覆。它**不會更新模型知識**,也不會將新資料納入訓練中。 **Pre-training from scratch** > **從零開始預訓練(Pre-training from scratch)** Pre-training from scratch involves training the model entirely from the beginning, which is resource-intensive and inefficient if the goal is to incrementally update the model with new data while preserving previously learned knowledge. > 從零開始預訓練是指**完全從頭訓練模型**,若目標是**逐步更新模型以納入新資料並保留既有知識**,這種方式將非常耗費資源且效率低下。 ::: ## In-Context learning :::spoiler **In-context learning(上下文學習)** In-context learning involves providing examples of a task within the input to guide the model's responses. > **上下文學習(In-context learning)**是指在輸入中提供任務範例,以引導模型生成回覆。 Although it’s a flexible way to guide the model’s output, it doesn't permanently adapt the model for domain-specific needs, making it less suitable for long-term precision in specialized tasks. > 雖然這是一種靈活的方式來引導模型輸出,但它**不會永久調整模型以滿足特定領域需求**,因此在專業任務中不適合追求長期精準度。 ::: ## Data storage :::spoiler **Amazon Neptune(亞馬遜 Neptune)** Amazon Neptune is a fully managed graph database service optimized for storing and querying complex, multi-dimensional data, such as embeddings and vectors. > **Amazon Neptune** 是一個完全託管的圖形資料庫服務,專門優化用於儲存和查詢**複雜、多維資料**,例如嵌入向量(embeddings)和向量(vectors)。 It is ideal for use cases like knowledge graphs, recommendation engines, and fraud detection, where relationships between data points are as important as the data itself. > 它非常適合用於**知識圖譜(knowledge graphs)**、**推薦系統(recommendation engines)**以及**詐騙偵測(fraud detection)**等場景,在這些場景中,資料點之間的關聯性與資料本身同樣重要。 Neptune supports both property graphs and RDF (Resource Description Framework) standards, making it highly versatile. > Neptune 支援**屬性圖(property graphs)**與 **RDF(資源描述框架,Resource Description Framework)** 標準,因此具有高度的多功能性。 For AI applications, embeddings can be stored and queried efficiently, helping machine learning models to retrieve similar vectors quickly. > 對於 AI 應用,Neptune 可以高效儲存與查詢嵌入向量,幫助機器學習模型**快速檢索相似向量**。 This makes it a suitable choice for AI systems that need to manage graph-based data structures and embeddings. > 因此,它是需要管理**圖形資料結構與嵌入向量**的 AI 系統的理想選擇。 --- Incorrect Options(錯誤選項): **Amazon DocumentDB** > **Amazon DocumentDB** Amazon DocumentDB is a fully managed document database designed for JSON-based data. While it is optimized for handling semi-structured data, it is not specifically optimized for managing graph data or embeddings, which is better handled by Amazon Neptune. > Amazon DocumentDB 是一個完全託管的**文件型資料庫**,設計用於 JSON 資料。雖然它在處理半結構化資料時表現良好,但**並非專門針對圖形資料或嵌入向量的管理**,這類需求由 Amazon Neptune 更適合處理。 **Amazon RDS** > **Amazon RDS** Amazon RDS is a managed relational database service. It supports traditional SQL databases like MySQL, PostgreSQL, and Oracle. Relational databases are not designed for efficiently handling graph-based data or embeddings used in AI applications. > Amazon RDS 是一個託管的**關聯式資料庫服務**,支援 MySQL、PostgreSQL、Oracle 等傳統 SQL 資料庫。**關聯式資料庫並非為圖形資料或 AI 應用中使用的嵌入向量設計**,效率不高。 **AWS Lambda** > **AWS Lambda** AWS Lambda is a serverless compute service that allows you to run code in response to events without managing servers. It is not a database solution and, therefore, not suitable for storing embeddings or multi-dimensional data. 
> AWS Lambda 是一個**無伺服器運算服務**,允許你在事件觸發時執行程式碼,而不需管理伺服器。它**不是資料庫解決方案**,因此不適合用來儲存嵌入向量或多維資料。 ::: ## Model Selection :::spoiler **The model may have reduced accuracy or performance compared to a more complex black-box model(模型可能比複雜的黑箱模型準確度或表現較低)** A common tradeoff when choosing a highly transparent and interpretable model is that it may have reduced accuracy or performance compared to more complex black-box models, such as deep learning models. > 在選擇高度透明且可解釋的模型時,一個常見的權衡是,其**準確度或表現可能低於更複雜的黑箱模型**,例如深度學習模型。 While interpretable models like decision trees or linear models are easier for doctors to trust and understand, they may not capture complex patterns in data as effectively as black-box models. > 雖然可解釋模型(如決策樹或線性模型)更容易讓醫生**信任與理解**,但它們可能無法像黑箱模型那樣有效捕捉資料中的複雜模式。 This tradeoff means the hospital must balance transparency and explainability with the potential for slightly lower accuracy in high-stakes situations. > 這種權衡意味著醫院必須在**透明性與可解釋性**與**在高風險情境下可能略低的準確度**之間取得平衡。 --- Incorrect Options(錯誤選項): **The model will have high performance but might not be easily explainable.** > **模型將具有高表現,但可能不易解釋** This option describes the opposite tradeoff. Highly interpretable models are usually explainable but may not have the highest performance compared to more complex models. > 這個選項描述的是相反的權衡。高度可解釋的模型通常**易於解釋**,但與更複雜的模型相比,其表現可能不是最高。 **The model will have increased decision-making complexity but reduced safety** > **模型將增加決策複雜度,但降低安全性** Highly transparent models typically have lower complexity, not increased complexity. Additionally, choosing transparency does not inherently reduce safety, though it may affect performance. > 高透明度模型通常**決策複雜度較低**,並非增加。此外,選擇透明性**並不會直接降低安全性**,但可能影響表現。 **The model will make safe decisions without affecting performance or transparency** > **模型將做出安全決策,且不影響表現或透明性** There is usually a tradeoff between safety, performance, and transparency, especially in high-stakes environments like healthcare. A model may not achieve both high transparency and maximum performance without some compromise. > 在安全性、表現與透明性之間通常存在權衡,尤其是在像醫療這類高風險環境中。模型**無法在不做任何妥協的情況下,同時達到高透明度與最高表現**。 ::: ## MLOps :::spoiler [Amazon Web Services, Inc. What is MLOps? - Machine Learning Operations Explained - AWS](https://aws.amazon.com/what-is/mlops/) **MLOps(機器學習運營)** Machine learning operations (MLOps) are a set of practices that automate and simplify [machine learning (ML)](https://aws.amazon.com/what-is/machine-learning/) workflows and deployments. > **機器學習運營(MLOps)**是一套實務方法,用於**自動化與簡化機器學習(ML)的工作流程和部署**。 Machine learning and [artificial intelligence (AI)](https://aws.amazon.com/what-is/artificial-intelligence/) are core capabilities that you can implement to solve complex real-world problems and deliver value to your customers. > **機器學習與人工智慧(AI)**是核心能力,可用來解決複雜的實際問題,並為客戶創造價值。 MLOps is an ML culture and practice that unifies ML application development (Dev) with ML system deployment and operations (Ops). > MLOps 是一種 ML 文化與實務,**將機器學習應用開發(Dev)與系統部署及運營(Ops)整合統一**。 Your organization can use MLOps to automate and standardize processes across the ML lifecycle. These processes include model development, testing, integration, release, and infrastructure management. > 組織可以利用 MLOps **自動化並標準化 ML 生命週期中的各項流程**,包括模型開發、測試、整合、發佈以及基礎設施管理。 ![image](https://hackmd.io/_uploads/Hk98N7NTgx.png) ::: ## Vector Database :::spoiler **Vector Databases(向量資料庫)** Embeddings encode all types of data into vectors that capture the meaning and context of an asset. This allows us to find similar assets by searching for neighboring data points. 
> **嵌入向量(Embeddings)**將各類資料編碼為向量,以捕捉資產的**意義與上下文**。這使我們能透過搜尋鄰近資料點來找到相似的資產。 Vector search methods allow unique experiences like taking a photograph with your smartphone and searching for similar images. > **向量搜尋方法**提供了獨特的體驗,例如使用智慧型手機拍攝照片後,即可搜尋相似影像。 Vector databases provide the ability to store and retrieve vectors as high-dimensional points. They add additional capabilities for efficient and fast lookup of nearest-neighbors in the N-dimensional space. > **向量資料庫**能將向量以**高維點**形式儲存與檢索,並提供額外功能以**高效快速地查找 N 維空間中的最近鄰**。 They are typically powered by k-nearest neighbor (k-NN) indexes and built with algorithms like the Hierarchical Navigable Small World (HNSW) and Inverted File Index (IVF) algorithms. > 它們通常使用 **k 最近鄰(k-NN)索引**,並基於演算法如 **分層可導導航小世界(HNSW)** 或 **倒排索引(IVF)** 架構建置。 Vector databases provide additional capabilities like data management, fault tolerance, authentication and access control, and a query engine. > 向量資料庫還提供**資料管理、容錯、驗證與存取控制、以及查詢引擎**等功能。 [🔗 參考資料來源:What is a Vector Database? - Vector Databases Explained - AWS](https://aws.amazon.com/what-is/vector-databases/) --- **相關 AWS 服務應用範例:** - [Amazon OpenSearch Service](https://aws.amazon.com/opensearch-service/) > 提供互動式日誌分析、即時應用監控、網站搜尋等功能。針對向量資料庫,可參考 [OpenSearch k-NN 搜尋](https://docs.aws.amazon.com/opensearch-service/latest/developerguide/knn.html)。 - [Amazon Aurora PostgreSQL-Compatible Edition](https://aws.amazon.com/rds/aurora/) & [Amazon RDS for PostgreSQL](https://aws.amazon.com/rds/postgresql/) > 支援 **pgvector** 擴充套件,可將 ML 模型的嵌入向量儲存在資料庫中,並進行高效相似性搜尋。 - [Amazon Neptune ML](https://aws.amazon.com/neptune/machine-learning/) > Neptune 的新功能,利用專為圖設計的 **圖神經網路(GNN)**,可快速、準確地使用圖資料進行預測。 - [Vector search for Amazon MemoryDB](https://aws.amazon.com/memorydb/features/#Vector_search) > 支援儲存數百萬向量,**單位毫秒級查詢與更新延遲**,每秒可處理數萬筆查詢,召回率超過 99%。 - [Amazon DocumentDB](https://aws.amazon.com/documentdb/) > 支援向量搜尋,可儲存、索引及搜尋數百萬向量,並提供毫秒級回應。利用 [DocumentDB 向量搜尋](https://aws.amazon.com/documentdb/features/#Generative_AI_and_machine_learning) 可輕鬆建立、操作與擴展 ML 應用資料庫。 ::: ## Customize Model :::spoiler **Amazon Bedrock Customization(亞馬遜 Bedrock 模型客製化)** You can customize Amazon Bedrock foundation models in order to improve their performance and create a better customer experience. > 你可以對 **Amazon Bedrock** 的基礎模型進行客製化,以提升模型表現,並提供更佳的客戶體驗。 Amazon Bedrock currently provides the following customization methods: > 目前 Amazon Bedrock 提供以下客製化方法: --- - **Continued Pre-training (持續預訓練)** Provide unlabeled data to pre-train a foundation model by familiarizing it with certain types of inputs. You can provide data from specific topics in order to expose a model to those areas. The Continued Pre-training process will tweak the model parameters to accommodate the input data and improve its domain knowledge. > 提供 **未標註(unlabeled)** 資料來預訓練基礎模型,使其熟悉特定類型的輸入。你可以提供特定主題的資料,讓模型接觸這些領域。持續預訓練會調整模型參數以適應輸入資料,並提升其領域知識。 For example, you can train a model with private data, such as business documents, that are not publicly available for training large language models. Additionally, you can continue to improve the model by retraining the model with more unlabeled data as it becomes available. > 例如,你可以使用私人資料(如企業文件)訓練模型,這些資料並未公開用於大型語言模型訓練。此外,隨著更多未標註資料的出現,你可以持續重新訓練模型以進一步提升性能。 --- - **Fine-tuning (微調)** Provide labeled data in order to train a model to improve performance on specific tasks. By providing a training dataset of labeled examples, the model learns to associate what types of outputs should be generated for certain types of inputs. 
The model parameters are adjusted in the process and the model's performance is improved for the tasks represented by the training dataset. > 提供 **已標註(labeled)** 資料以訓練模型,提升其在特定任務上的表現。透過提供帶標註的訓練資料集,模型學會對不同類型的輸入生成相對應的輸出,並在此過程中調整模型參數,改善模型在該訓練資料代表的任務上的表現。 --- [🔗 參考資料來源:Customize your model to improve its performance for your use case - Amazon Bedrock](https://docs.aws.amazon.com/bedrock/latest/userguide/custom-models.html) ::: ## Overfitting and Underfitting :::spoiler **Overfitting(過擬合)** Overfitting is an undesirable [machine learning](https://aws.amazon.com/what-is/machine-learning/) behavior that occurs when the machine learning model gives accurate predictions for training data but not for new data. > **過擬合(Overfitting)**是指模型在訓練資料上表現準確,但對新資料無法給出正確預測的情況,是機器學習中不理想的現象。 When data scientists use machine learning models for making predictions, they first train the model on a known data set. Then, based on this information, the model tries to predict outcomes for new data sets. An overfit model can give inaccurate predictions and cannot perform well for all types of new data. > 資料科學家在使用機器學習模型進行預測時,會先用已知資料訓練模型,然後再嘗試對新資料做預測。**過擬合模型可能會導致預測不準確,對各類新資料的表現不佳。** --- ### **Why does overfitting occur?(為什麼會發生過擬合?)** You only get accurate predictions if the machine learning model generalizes to all types of data within its domain. Overfitting occurs when the model cannot generalize and fits too closely to the training dataset instead. Overfitting happens due to several reasons, such as: > 只有當機器學習模型能概化到其領域內的所有資料類型時,才可能得到準確的預測。過擬合發生在模型無法概化、過度擬合訓練資料時。其原因包括: - The training data size is too small and does not contain enough data samples to accurately represent all possible input data values. > 訓練資料量太少,無法準確代表所有可能的輸入資料值。 - The training data contains large amounts of irrelevant information, called noisy data. > 訓練資料包含大量不相關的資訊,稱為**雜訊資料(noisy data)**。 - The model trains for too long on a single sample set of data. > 模型在單一資料集上訓練時間過長。 - The model complexity is high, so it learns the noise within the training data. > 模型過於複雜,會學習到訓練資料中的雜訊。 --- ### **Overfitting examples(過擬合範例)** Consider a use case where a machine learning model has to analyze photos and identify the ones that contain dogs in them. If the machine learning model was trained on a data set that contained majority photos showing dogs outside in parks, it may learn to use grass as a feature for classification, and may not recognize a dog inside a room. > 例如,一個機器學習模型需要分析照片並辨識其中含有狗的圖片。如果模型訓練資料大多是狗在公園的照片,它可能會將草地視為分類特徵,導致無法辨識室內的狗。 Another overfitting example is a machine learning algorithm that predicts a university student's academic performance and graduation outcome by analyzing several factors like family income, past academic performance, and academic qualifications of parents. However, the test data only includes candidates from a specific gender or ethnic group. In this case, overfitting causes the algorithm's prediction accuracy to drop for candidates with gender or ethnicity outside of the test dataset. > 另一個例子是,機器學習算法分析學生家庭收入、過去成績和父母學歷來預測大學生的學業表現與畢業結果。但如果測試資料僅包含特定性別或族群,過擬合會導致對其他性別或族群的預測準確度下降。 --- ### **How can you detect overfitting?(如何偵測過擬合?)** The best method to detect overfit models is by testing the machine learning models on more data with comprehensive representation of possible input data values and types. Typically, part of the training data is used as test data to check for overfitting. A high error rate in the testing data indicates overfitting. One method of testing for overfitting is given below. 
> 偵測過擬合模型的最佳方法是使用更多資料進行測試,確保資料能充分代表可能的輸入資料類型與值。通常會將部分訓練資料當作測試資料。測試資料錯誤率高則表示模型過擬合。以下是一種檢測過擬合的方法: **K-fold cross-validation(K 折交叉驗證)** Cross-validation is one of the testing methods used in practice. In this method, data scientists divide the training set into K equally sized subsets or sample sets called folds. The training process consists of a series of iterations. During each iteration, the steps are: 1. Keep one subset as the validation data and train the machine learning model on the remaining K-1 subsets. 2. Observe how the model performs on the validation sample. 3. Score model performance based on output data quality. Iterations repeat until you test the model on every sample set. You then average the scores across all iterations to get the final assessment of the predictive model. > 交叉驗證是實務中常用的測試方法。將訓練集分為 K 個等大小子集(folds),訓練過程包含多次迭代。每次迭代步驟如下: > 1. 選取一個子集作為驗證資料,其餘 K-1 個子集用於訓練模型。 > 2. 觀察模型在驗證資料上的表現。 > 3. 根據輸出資料品質評分模型表現。 > 重複迭代直到每個子集都被用作驗證,最後對所有迭代分數取平均,得到最終預測模型評估。 --- ### **How can you prevent overfitting?(如何防止過擬合?)** You can prevent overfitting by diversifying and scaling your training data set or using some other data science strategies, like those given below. > 可透過多樣化與擴充訓練資料集,或採用其他資料科學策略來防止過擬合,如下所示: **Early stopping(提前停止)** > Early stopping pauses the training phase before the machine learning model learns the noise in the data. However, getting the timing right is important; else the model will still not give accurate results. > 提前停止會在模型學習到資料雜訊前暫停訓練。然而,掌握時機很重要,否則模型仍可能無法準確預測。 **Pruning(剪枝)** > You might identify several features or parameters that impact the final prediction when you build a model. Feature selection—or pruning—identifies the most important features within the training set and eliminates irrelevant ones. For example, to predict if an image is an animal or human, you can look at various input parameters like face shape, ear position, body structure, etc. You may prioritize face shape and ignore the shape of the eyes. > 在建立模型時,可能會識別多個影響最終預測的特徵或參數。剪枝會找出訓練資料中最重要的特徵並剔除不相關的特徵。例如,預測圖像是人或動物時,可以考慮面型、耳朵位置、身體結構等參數,可優先使用面型並忽略眼睛形狀。 **Regularization(正則化)** > Regularization is a collection of training/optimization techniques that seek to reduce overfitting. These methods try to eliminate those factors that do not impact the prediction outcomes by grading features based on importance. For example, mathematical calculations apply a penalty value to features with minimal impact. Consider a statistical model attempting to predict the housing prices of a city in 20 years. Regularization would give a lower penalty value to features like population growth and average annual income but a higher penalty value to the average annual temperature of the city. > 正則化是一系列訓練/優化技術,用來降低過擬合。透過依據特徵重要性給予懲罰值,降低對預測影響小的因素。例如,預測某城市 20 年後房價的統計模型,正則化會給人口增長與年平均收入較低的懲罰值,而城市年平均溫度給予較高懲罰值。 **Ensembling(集成學習)** > Ensembling combines predictions from several separate machine learning algorithms. Some models are called weak learners because their results are often inaccurate. Ensemble methods combine all the weak learners to get more accurate results. They use multiple models to analyze sample data and pick the most accurate outcomes. The two main ensemble methods are bagging and boosting. Boosting trains different machine learning models one after another to get the final result, while bagging trains them in parallel. 
> 集成學習結合多個機器學習算法的預測。有些模型稱為弱學習者,因為結果常不準確。集成方法整合所有弱學習者,提高準確性,使用多個模型分析資料並選出最準確結果。兩種主要方法:**Boosting** 依序訓練模型,**Bagging** 平行訓練模型。 **Data augmentation(資料增強)** > Data augmentation is a machine learning technique that changes the sample data slightly every time the model processes it. You can do this by changing the input data in small ways. When done in moderation, data augmentation makes the training sets appear unique to the model and prevents the model from learning their characteristics. For example, applying transformations such as translation, flipping, and rotation to input images. > 資料增強是機器學習技術,每次模型處理資料時,稍微改變樣本資料。適度操作可讓訓練集對模型呈現獨特,避免模型學習到資料特性。例如,對輸入圖像進行平移、翻轉、旋轉等變換。 --- ### **What is underfitting?(欠擬合)** Underfitting is another type of error that occurs when the model cannot determine a meaningful relationship between the input and output data. You get underfit models if they have not trained for the appropriate length of time on a large number of data points. > 欠擬合是另一種錯誤,發生在模型無法找出輸入與輸出資料間的有效關聯。如果模型在大量資料上訓練時間不足,就會產生欠擬合。 **Underfitting vs. overfitting(欠擬合 vs 過擬合)** **Underfit models experience high bias**—they give inaccurate results for both the training data and test set. > 欠擬合模型具有高偏差(high bias),對訓練與測試資料皆表現不佳。 On the other hand, **overfit models experience high variance**—they give accurate results for the training set but not for the test set. More model training results in less bias but variance can increase. Data scientists aim to find the sweet spot between underfitting and overfitting when fitting a model. A well-fitted model can quickly establish the dominant trend for seen and unseen data sets. > 過擬合模型具有高變異(high variance),訓練資料準確但測試資料表現不佳。增加訓練可以降低偏差,但可能增加變異。資料科學家會在欠擬合與過擬合之間尋找平衡點。良好的模型可以快速捕捉已見與未見資料的主要趨勢。 --- ### **Bias and Variance** - **Low bias, low variance** > Incorrect. Low bias indicates that the model is not making erroneous assumptions about the training data. Low variance indicates that the model is not paying attention to noise in the training data. This is an ideal outcome for model training and would not result in model overfitting. > **低偏差、低變異**:錯誤。低偏差表示模型對訓練資料沒有錯誤假設,低變異表示模型不受訓練資料雜訊影響。這是理想模型表現,不會過擬合。 - **Low bias, high variance** > Correct. Low bias indicates that the model is not making erroneous assumptions about the training data. High variance indicates that the model is paying attention to noise in the training data and is overfitting. > **低偏差、高變異**:正確。低偏差表示模型對訓練資料沒有錯誤假設,高變異表示模型受訓練資料雜訊影響而過擬合。 - **High bias, low variance** > Incorrect. High bias indicates that the model is making erroneous assumptions about the training data. Low variance indicates that the model is not paying attention to noise in the training data, which will lead to underfitting. > **高偏差、低變異**:錯誤。高偏差表示模型對訓練資料做出錯誤假設,低變異表示模型不受雜訊影響,會導致欠擬合。 - **High bias, high variance** > Incorrect. High bias indicates that the model is making erroneous assumptions about the training data. High variance indicates that the model is paying attention to noise in the training data. However, this pattern will rarely happen during model training and does not indicate that the model is overfitting or underfitting. > **高偏差、高變異**:錯誤。高偏差表示模型對訓練資料做出錯誤假設,高變異表示模型受訓練資料雜訊影響。但此情況在訓練中很少發生,並不代表模型過擬合或欠擬合。 [🔗 參考資料來源:Overfitting and Underfitting - AWS](https://aws.amazon.com/what-is/overfitting/) ::: ## Generative AI :::spoiler ### AI Models Overview(AI 模型概覽) Generally, most AI applications can be split into traditional ML models and generative AI models. The performance of both categories is dependent on large amounts of data. 
The base algorithms are rooted in deep learning frameworks. The most common framework is discriminative modeling, which involves predicting or classifying a target variable. You would use a traditional ML model to predict the customer turnover rate and to create a text sentiment analysis application. > 一般而言,大部分 AI 應用可以分為傳統機器學習模型與生成式 AI 模型。兩者的表現都依賴大量資料。基礎演算法以深度學習框架為核心。最常見的是判別式建模(discriminative modeling),用於預測或分類目標變數。例如,可使用傳統 ML 模型預測顧客流失率或建立文字情感分析應用。 The output of generative AI is usually new data. Transformer-based large language models (LLMs) are neural networks that process all observations at once. Transformer-based LLMs do not process observations in gradual processes. The creative power of an LLM makes something new based on sets of probabilities that are processed with the input feature of the underlying dataset. Models based on transformers are often referred to as foundation models (FMs). Creating a large patent repository of English-to-French translation is an example of a transformer-based LLM or generative AI model. Building unique, realistic images from prompts is a use case for a generative AI model, for example a diffusion model. > 生成式 AI 的輸出通常是新的資料。基於 Transformer 的大型語言模型(LLM)是一次性處理所有觀測值的神經網路,不會逐步處理。LLM 的創造力基於資料集特徵計算的機率生成新內容。基於 Transformer 的模型通常稱為基礎模型(Foundation Models, FMs)。例如建立英法專利翻譯資料庫就是 Transformer LLM 的案例;從提示生成獨特且逼真的圖像則是生成式 AI(如 diffusion 模型)的應用。 Learn more about [generative AI and traditional models](https://aws.amazon.com/what-is/generative-ai/). --- ### VAEs(變分自動編碼器) VAEs (variational autoencoders) introduced the capability to create novel variations of multiple data types. This led to the rapid emergence of other generative AI models like generative adversarial networks and diffusion models. These innovations were focused on generating data that increasingly resembled real data despite being artificially created. > VAE(Variational Autoencoders, 變分自動編碼器)可創建多種資料類型的新變體,促使生成對抗網路(GAN)與 diffusion 模型等生成式 AI 模型快速發展,目標是生成愈來愈逼近真實資料的人工資料。 Variational autoencoders (VAEs) learn a compact representation of data called latent space. The latent space is a mathematical representation of the data. You can think of it as a unique code representing the data based on all its attributes. For example, if studying faces, the latent space contains numbers representing eye shape, nose shape, cheekbones, and ears. > VAE 學習資料的緊湊表示,稱為潛在空間(latent space),它是資料的數學表示。可視作一組編碼,代表資料的各項屬性。例如研究人臉時,潛在空間包含眼睛、鼻子、顴骨、耳朵的數值表示。 VAEs use two neural networks—the encoder and the decoder. The encoder neural network maps the input data to a mean and variance for each dimension of the latent space. It generates a random sample from a Gaussian (normal) distribution. This sample is a point in the latent space and represents a compressed, simplified version of the input data. > VAE 使用兩個神經網路:編碼器(encoder)與解碼器(decoder)。編碼器將輸入資料映射到潛在空間每個維度的平均值與變異數,並從高斯分布生成隨機樣本,作為潛在空間的點,代表輸入資料的壓縮簡化版本。 The decoder neural network takes this sampled point from the latent space and reconstructs it back into data that resembles the original input. Mathematical functions are used to measure how well the reconstructed data matches the original data. > 解碼器將潛在空間的樣本點重建為與原始輸入相似的資料,並透過數學函數評估重建資料與原始資料的匹配程度。 ![image (1)](https://hackmd.io/_uploads/ByCodQNpxe.png) --- ### Transformers(Transformer 模型) In 2017, a further shift in AI research occurred with the introduction of transformers. Transformers seamlessly integrated the encoder-and-decoder architecture with an attention mechanism. 
They streamlined the training process of language models with exceptional efficiency and versatility. Notable models like GPT emerged as foundational models capable of pretraining on extensive corpora of raw text and fine-tuning for diverse tasks. > 2017 年,Transformer 模型引入 AI 研究,將編碼器-解碼器架構與注意力機制(attention mechanism)整合,提升語言模型訓練效率與靈活性。GPT 等知名模型因此誕生,能在大量原始文本語料上進行預訓練,並微調以應對多種任務。 Transformers changed what was possible for natural language processing. They empowered generative capabilities for tasks ranging from translation and summarization to answering questions. > Transformer 改變了自然語言處理的可能性,賦予生成式能力以應對翻譯、摘要與問答等任務。 The transformer-based generative AI model builds upon the encoder and decoder concepts of VAEs. Transformer-based models add more layers to the encoder to improve performance on text-based tasks like comprehension, translation, and creative writing. > 基於 Transformer 的生成式 AI 模型建立在 VAE 的編碼器與解碼器概念上,並增加更多編碼器層以提升文本任務的表現,如理解、翻譯與創作。 Transformer-based models use a self-attention mechanism. They weigh the importance of different parts of an input sequence when processing each element in the sequence. > Transformer 模型使用自注意力機制(self-attention),在處理序列每個元素時,評估序列中不同部分的重要性。 Another key feature is that these AI models implement contextual embeddings. The encoding of a sequence element depends not only on the element itself but also on its context within the sequence. > 另一個關鍵特性是上下文嵌入(contextual embeddings),序列元素的編碼不僅依賴元素本身,也依賴序列中的上下文。 ### How transformer-based models work(Transformer 運作原理) To understand how transformer-based models work, imagine a sentence as a sequence of words. > 為理解 Transformer 運作,可將句子視為單詞序列。 Self-attention helps the model focus on the relevant words as it processes each word. The transformer-based generative model employs multiple encoder layers called attention heads to capture different types of relationships between words. Each head learns to attend to different parts of the input sequence, allowing the model to simultaneously consider various aspects of the data. > 自注意力機制讓模型在處理單詞時關注相關詞彙。Transformer 使用多個編碼器層(attention heads)捕捉單詞間不同關聯,每個頭學會關注序列不同部分,使模型能同時考慮資料多個面向。 Each layer also refines the contextual embeddings, making them more informative and capturing everything from grammar syntax to complex semantic meanings. > 每層同時優化上下文嵌入,使其更具資訊量,捕捉從語法到語意的複雜內容。 --- ### Diffusion models(擴散模型) Diffusion models create new data by iteratively making controlled random changes to an initial data sample. They start with the original data and add subtle changes (noise), progressively making it less similar to the original. This noise is carefully controlled to ensure the generated data remains coherent and realistic. > 擴散模型透過對初始樣本進行受控隨機變化,逐步生成新資料。從原始資料開始,添加微小噪聲,使其逐步與原始資料不同,但控制噪聲以保持生成資料連貫與逼真。 After adding noise over several iterations, the diffusion model reverses the process. Reverse denoising gradually removes the noise to produce a new data sample that resembles the original. > 在多次迭代後,擴散模型進行逆向去噪,逐步移除噪聲,生成類似原始資料的新樣本。 ![image (2)](https://hackmd.io/_uploads/B1SJt7Vpll.png) --- ### Generative adversarial networks(生成對抗網路, GAN) The [generative adversarial network](https://aws.amazon.com/what-is/gan/) (GAN) is another generative AI model that builds upon the diffusion model’s concept. > 生成對抗網路(GAN)是另一種生成式 AI 模型,建立在擴散模型概念上。 GANs work by training two neural networks in a competitive manner. The first network, known as the generator, generates fake data samples by adding random noise. 
The second network called the discriminator, tries to distinguish between real data and the fake data produced by the generator. > GAN 透過兩個神經網路競爭式訓練:生成器(generator)添加隨機噪聲生成假資料;判別器(discriminator)嘗試區分真實與生成資料。 During training, the generator continually improves its ability to create realistic data while the discriminator becomes better at telling real from fake. This adversarial process continues until the generator produces data that is so convincing that the discriminator can't differentiate it from real data. > 訓練期間,生成器不斷提升生成逼真資料的能力,判別器則更擅長區分真假。此對抗過程持續,直到生成器產生的資料足夠真實,判別器無法區分。 GANs are widely used in generating realistic images, style transfer, and data augmentation tasks. > GAN 廣泛應用於生成逼真影像、風格轉換與資料增強等任務。 --- ### CNN vs RNN The main differences between CNNs and RNNs include the following: > CNN 與 RNN 的主要差異如下: - CNNs are commonly used to solve problems involving spatial data, such as images. RNNs are better suited to analyzing temporal and sequential data, such as text or videos. > CNN 通常處理空間資料,如影像;RNN 適合處理時間序列或序列資料,如文字或影片。 - CNNs and RNNs have different architectures. CNNs are feedforward neural networks that use filters and pooling layers, whereas RNNs feed results back into the network. > CNN 與 RNN 架構不同:CNN 是前饋神經網路,使用卷積與池化層;RNN 則將結果反饋至網路。 - In CNNs, the size of the input and the resulting output are fixed. A CNN receives images of fixed size and outputs a predicted class label for each image along with a confidence level. In RNNs, the size of the input and the resulting output can vary. > CNN 的輸入與輸出大小固定;輸入固定尺寸圖像,輸出對應分類與信心值。RNN 則輸入輸出大小可變。 - Common use cases for CNNs include [facial recognition](https://www.techtarget.com/searchenterpriseai/definition/facial-recognition), medical analysis and image classification. Common use cases for RNNs include machine translation, [natural language processing](https://www.techtarget.com/searchenterpriseai/definition/natural-language-processing-NLP), sentiment analysis and speech analysis. > CNN 常用於人臉辨識、醫療分析與影像分類;RNN 常用於機器翻譯、自然語言處理、情感分析與語音分析。 ![image (3)](https://hackmd.io/_uploads/ryb7F7N6lg.png) ![image (4)](https://hackmd.io/_uploads/SyNQKQNplg.png) ::: ## Foundation Model :::spoiler ### Foundation Models(基礎模型) [A foundation model - Amazon SageMaker JumpStart Foundation Models](https://docs.aws.amazon.com/sagemaker/latest/dg/jumpstart-foundation-models.html) A foundation model is a large pre-trained model that is adaptable to many downstream tasks and often serves as the starting point for developing more specialized models. Examples of foundation models include LLaMa-3-70b, BLOOM 176B, FLAN-T5 XL, or GPT-J 6B, which are pre-trained on massive amounts of text data and can be fine-tuned for specific language tasks. > 基礎模型(Foundation Model)是大型預訓練模型,可用於多種下游任務,通常作為開發更專門化模型的起點。常見的基礎模型包括 LLaMa-3-70b、BLOOM 176B、FLAN-T5 XL 或 GPT-J 6B,這些模型在海量文本資料上進行預訓練,並可針對特定語言任務進行微調。 ::: ## Context Windows :::spoiler ### Context Window(上下文視窗) The context window is a model property that describes the number of tokens that the model can accept in the context. For this scenario, the model must read and summarize a significant amount of text. Therefore, you should consider the limits to the context size (context window) first before you choose a model. > 上下文視窗(Context Window)是模型的一個屬性,用於描述模型在處理時能接受的最大 token 數量。在此情境下,模型需要閱讀並摘要大量文本,因此在選擇模型前,應先考慮上下文大小的限制。 ### What is a context window in AI?(AI 中的上下文視窗是什麼?) A context window in AI is the maximum number of tokens an LLM can process in one go. 
While that *sounds* simple, things get messy when you consider that the LLM has to use the context window to keep track of both the input and output.
> 在 AI 中,上下文視窗指的是大型語言模型(LLM)一次能處理的最大 token 數量。乍聽之下很簡單,但實際情況會變得複雜,因為 LLM 必須利用上下文視窗同時追蹤輸入與輸出。

Things get even messier when you start to deploy LLMs in the real world. Consider [ChatGPT](https://zapier.com/blog/how-to-use-chatgpt/). Its latest models—[GPT-4o](https://zapier.com/blog/gpt-4o/), GPT-4o mini, and the newest [o1](https://zapier.com/blog/openai-o1/) series—have a maximum context window of 128,000 tokens. But you can't just put the full text of a novel into ChatGPT and expect to have a conversation about it.
> 當你將 LLM 部署到現實場景中,情況會更複雜。以 [ChatGPT](https://zapier.com/blog/how-to-use-chatgpt/) 為例,其最新模型—[GPT-4o](https://zapier.com/blog/gpt-4o/)、GPT-4o mini 以及最新 [o1](https://zapier.com/blog/openai-o1/) 系列—的最大上下文視窗為 128,000 tokens。但你不能直接將一本小說的全文丟進 ChatGPT,就指望它能進行對話。

Here's why: In addition to your prompt, ChatGPT also has to process all sorts of other information—all within its maximum context window:
> 原因如下:除了你的提示,ChatGPT 還必須在最大上下文視窗內處理各種其他資訊,包括:

- Instructions from OpenAI (the company behind ChatGPT)
> 來自 OpenAI(ChatGPT 背後公司)的指令
- Any default directives you've set up
> 你設定的預設指令
- Your conversation history
> 你的對話歷史

![image (5)](https://hackmd.io/_uploads/BJ1ZjmN6lx.png)
:::

## Fine-tuning
:::spoiler
### Fine-tuning Foundation Models(微調基礎模型)
Foundation models are computationally expensive and trained on a large, unlabeled corpus. Fine-tuning a pre-trained foundation model is an affordable way to take advantage of their broad capabilities while customizing a model on your own small corpus. Fine-tuning is a customization method that involves further training and does change the weights of your model.
> 基礎模型的訓練需要大量運算資源,並在海量未標註資料上進行。微調預訓練基礎模型是一種經濟的方式,既能利用其廣泛能力,又可在小型資料集上進行模型客製化。微調是透過額外訓練來調整模型權重的自訂方法。

**Fine-tuning improves a model's performance for a given task by using labeled data. The company does not have labeled data. Therefore, the company would not be able to use fine-tuning for this task.**
> 微調利用標註資料提升模型在特定任務上的表現。但如果公司沒有標註資料,就無法對此任務使用微調。

Fine-tuning might be useful to you if you need:
- to customize your model to specific business needs
> 將模型客製化以符合特定業務需求
- your model to successfully work with domain-specific language, such as industry jargon, technical terms, or other specialized vocabulary
> 讓模型能處理領域特定語言,如行業術語、專業名詞或其他專門詞彙
- enhanced performance for specific tasks
> 提升特定任務的模型表現
- accurate, relevant, and context-aware responses in applications
> 提供精確、相關且具有上下文感知的回答
- responses that are more factual, less toxic, and better-aligned to specific requirements
> 生成更貼近事實、低毒性且符合特定要求的回答

There are two main approaches that you can take for fine-tuning depending on your use case and chosen foundation model:
1. If you're interested in fine-tuning your model on domain-specific data, see [Fine-tune a large language model (LLM) using domain adaptation](https://docs.aws.amazon.com/sagemaker/latest/dg/jumpstart-foundation-models-fine-tuning-domain-adaptation.html).
2. If you're interested in instruction-based fine-tuning using prompt and response examples, see [Fine-tune a large language model (LLM) using prompt instructions](https://docs.aws.amazon.com/sagemaker/latest/dg/jumpstart-foundation-models-fine-tuning-instruction-based.html).
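
Both approaches consume example data that you prepare in advance. As a rough illustration of the second approach, the following is a minimal Python sketch that writes a few labeled prompt-response pairs as JSON Lines, a layout commonly used for instruction-based fine-tuning datasets. The field names `prompt` and `response`, the example texts, and the file name are assumptions for illustration only; the exact schema and prompt template depend on the foundation model you choose.

```python
import json

# Hypothetical labeled examples; a real instruction-tuning dataset is much larger
# and must use the field names expected by the chosen foundation model.
examples = [
    {
        "prompt": "Summarize the following support ticket in one sentence: "
                  "The mobile app crashes every time I try to reset my password.",
        "response": "The customer reports that password resets crash the mobile app.",
    },
    {
        "prompt": "Classify the sentiment of this review as positive or negative: "
                  "The checkout process was slow and confusing.",
        "response": "negative",
    },
]

# Write the prompt-response pairs as JSON Lines, one labeled example per line.
with open("train.jsonl", "w", encoding="utf-8") as f:
    for example in examples:
        f.write(json.dumps(example, ensure_ascii=False) + "\n")
```
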
---

#### Fine-tune a large language model (LLM) using domain adaptation(透過領域適應微調 LLM)

[**PDF**](https://docs.aws.amazon.com/pdfs/sagemaker/latest/dg/sagemaker-dg.pdf#jumpstart-foundation-models-fine-tuning-domain-adaptation) | [**RSS**](https://docs.aws.amazon.com/sagemaker/latest/dg/amazon-sagemaker-release-notes.rss)

Domain adaptation fine-tuning allows you to leverage pre-trained foundation models and adapt them to specific tasks using limited domain-specific data. If prompt engineering efforts do not provide enough customization, you can use domain adaptation fine-tuning to get your model working with domain-specific language, such as industry jargon, technical terms, or other specialized data. This fine-tuning process modifies the weights of the model.
> 領域適應微調允許你利用預訓練基礎模型,並使用有限的領域特定資料將模型適配至特定任務。如果提示工程無法提供足夠客製化,可透過領域適應微調,使模型能處理領域特定語言,如行業術語、技術名詞或其他專門資料。此微調過程會修改模型權重。

---

#### Fine-tune a large language model (LLM) using prompt instructions(透過指令式微調 LLM)

[**PDF**](https://docs.aws.amazon.com/pdfs/sagemaker/latest/dg/sagemaker-dg.pdf#jumpstart-foundation-models-fine-tuning-instruction-based) | [**RSS**](https://docs.aws.amazon.com/sagemaker/latest/dg/amazon-sagemaker-release-notes.rss)

Instruction-based fine-tuning uses labeled examples to improve the performance of a pre-trained foundation model on a specific task. The labeled examples are formatted as prompt-response pairs and phrased as instructions. This fine-tuning process modifies the weights of the model. For more information on instruction-based fine-tuning, see the papers [Introducing FLAN: More generalizable Language Models with Instruction Fine-Tuning](https://ai.googleblog.com/2021/10/introducing-flan-more-generalizable.html) and [Scaling Instruction-Finetuned Language Models](https://arxiv.org/abs/2210.11416).
> 指令式微調使用標註範例提升預訓練基礎模型在特定任務的表現。標註範例以「提示-回覆」對的形式呈現,並以指令方式描述。此微調過程會修改模型權重。更多資訊可參考論文 [Introducing FLAN](https://ai.googleblog.com/2021/10/introducing-flan-more-generalizable.html) 與 [Scaling Instruction-Finetuned Language Models](https://arxiv.org/abs/2210.11416)。

Fine-tuned Language Net (FLAN) models use instruction tuning to make models more amenable to solving general downstream NLP tasks. Amazon SageMaker JumpStart provides a number of foundation models in the FLAN model family. For example, FLAN-T5 models are instruction fine-tuned on a wide range of tasks to increase zero-shot performance for a variety of common use cases. With additional data and fine-tuning, instruction-based models can be further adapted to more specific tasks that weren't considered during pre-training.
> 微調後的 FLAN 模型使用指令微調,使模型更適合解決一般下游 NLP 任務。Amazon SageMaker JumpStart 提供多個 FLAN 系列基礎模型。例如,FLAN-T5 模型針對廣泛任務進行指令微調,以提升零樣本(zero-shot)表現。透過額外資料與微調,指令式模型可進一步適配原先預訓練未涵蓋的特定任務。

---

#### Commonly supported fine-tuning hyperparameters(常用微調超參數)

Different foundation models support different hyperparameters when fine-tuning. The following are commonly-supported hyperparameters that can further customize your model during training:
> 不同基礎模型在微調時支援不同的超參數。以下為常用超參數,可在訓練過程中進一步客製化模型:

| Hyperparameter | Description |
| ------------ | ------------------------------------------|
| epoch| The number of passes that the model takes through the fine-tuning dataset during training. Must be an integer greater than 1.|
| learning_rate| The rate at which the model weights are updated after working through each batch of fine-tuning training examples. Must be a positive float greater than 0.|
| instruction_tuned| Whether to instruction-train the model or not.
Must be 'True' or 'False'.| | per_device_train_batch_size| The batch size per GPU core or CPU for training. Must be a positive integer.| | per_device_eval_batch_size| The batch size per GPU core or CPU for evaluation. Must be a positive integer.| | max_train_samples| For debugging purposes or quicker training, truncate the number of training examples to this value. Value -1 means that the model uses all of the training samples. Must be a positive integer or -1.| | max_val_samples| For debugging purposes or quicker training, truncate the number of validation examples to this value. Value -1 means that the model uses all of the validation samples. Must be a positive integer or -1.| | max_input_length| Maximum total input sequence length after tokenization. Sequences longer than this will be truncated. If -1, <font color="#fd4340"> max_input_length </font>  is set to the minimum of 1024 and the <font color="#fd4340"> model_max_length </font>  defined by the tokenizer. If set to a positive value, <font color="#fd4340"> max_input_length </font>  is set to the minimum of the provided value and the <font color="#fd4340"> model_max_length </font>  defined by the tokenizer. Must be a positive integer or -1.| | validation_split_ratio| If there is no validation channel, ratio of train-validation split from the training data. Must be between 0 and 1.| | train_data_split_seed| If validation data is not present, this fixes the random splitting of the input training data to training and validation data used by the model. Must be an integer.| | preprocessing_num_workers| The number of processes to use for the pre-processing. If None, main process is used for pre-processing.| | lora_r| Low-rank adaptation (LoRA) r value, which acts as the scaling factor for weight updates. Must be a positive integer.| | lora_alpha| Low-rank adaptation (LoRA) alpha value, which acts as the scaling factor for weight updates. Generally 2 to 4 times the size of lora_r. Must be a positive integer.| | lora_dropout| Dropout value for low-rank adaptation (LoRA) layers Must be a positive float between 0 and 1.| | int8_quantization| If True, model is loaded with 8 bit precision for training.| | enable_fsdp| If True, training uses Fully Sharded Data Parallelism.| [Fine-tune a foundation model](https://docs.aws.amazon.com/sagemaker/latest/dg/jumpstart-foundation-models-fine-tuning.html) ::: ## Embeddings :::spoiler [What is Embedding? - Embeddings in Machine Learning Explained - AWS (amazon.com)](https://aws.amazon.com/what-is/embeddings-in-machine-learning/?nc1=h_ls) ### Embeddings(嵌入向量) Embeddings are numerical representations of real-world objects that [machine learning (ML)](https://aws.amazon.com/what-is/machine-learning/) and [artificial intelligence (AI)](https://aws.amazon.com/what-is/artificial-intelligence/) systems use to understand complex knowledge domains like humans do. > 嵌入向量是對現實世界物件的數值化表示,機器學習(ML)與人工智慧(AI)系統使用它來理解複雜知識領域,類似人類理解方式。 As an example, computing algorithms understand that the difference between 2 and 3 is 1, indicating a close relationship between 2 and 3 as compared to 2 and 100. However, real-world data includes more complex relationships. For example, a bird-nest and a lion-den are analogous pairs, while day-night are opposite terms. > 例如,計算演算法理解 2 與 3 的差為 1,表示 2 與 3 之間的關聯比 2 與 100 更緊密。然而,現實世界資料包含更複雜的關係。例如,鳥巢與獅穴是一組類比關係,而白天與黑夜則是相反概念。 Embeddings convert real-world objects into complex mathematical representations that capture inherent properties and relationships between real-world data. 
The entire process is automated, with AI systems self-creating embeddings during training and using them as needed to complete new tasks. > 嵌入向量將現實世界物件轉換為複雜的數學表示,捕捉其內在屬性及物件間的關聯。整個過程是自動化的,AI 系統在訓練期間自動生成嵌入向量,並根據需要使用它們完成新的任務。 ![image (6)](https://hackmd.io/_uploads/SkkNXNE6xe.png) ::: ## Transformers :::spoiler [What are Transformers? Transformers in Artficial Intelligence Expalained - AWS (amazon.com)](https://aws.amazon.com/what-is/transformers-in-artificial-intelligence/#seo-faq-pairs#what-are-the-components-of-transformer-architecture) ### Transformers(變壓器模型) Transformers are a type of neural network architecture that transforms or changes an input sequence into an output sequence. They do this by learning context and tracking relationships between sequence components. > Transformer 是一種神經網路架構,用來將輸入序列轉換為輸出序列。它透過學習上下文與追蹤序列中各元素間的關係來完成任務。 For example, consider this input sequence: “What is the color of the sky?” The transformer model uses an internal mathematical representation that identifies the relevancy and relationship between the words **color**, **sky**, and **blue**. It uses that knowledge to generate the output: “The sky is blue.” > 例如,輸入句子「What is the color of the sky?」,Transformer 模型會透過內部的數學表示,識別出 **color**、**sky** 與 **blue** 之間的關聯,進而產生正確的輸出:「The sky is blue.」 --- ##### **Self-attention mechanism(自注意力機制)** Transformer models modify this process by incorporating something called a *self-attention mechanism*. Instead of processing data in order, the mechanism enables the model to look at different parts of the sequence all at once and determine which parts are most important. > Transformer 模型引入了「自注意力機制(self-attention mechanism)」,不再依序處理資料,而是能同時檢視序列的所有部分,並判斷哪些資訊最重要。 Imagine that you’re in a busy room and trying to listen to someone talk. Your brain automatically focuses on their voice while tuning out less important noises. > 想像你身處嘈雜的房間中,正在試圖聽某人說話。你的大腦會自動聚焦於對方的聲音,忽略不重要的雜音。 Self-attention enables the model do something similar: it pays more attention to the relevant bits of information and combines them to make better output predictions. This mechanism makes transformers more efficient, enabling them to be trained on larger datasets. > 自注意力機制讓模型能做到類似的事——聚焦於相關資訊並整合它們,以生成更準確的輸出。這使 Transformer 既高效又能處理更大的資料集。 It’s also more effective, especially when dealing with long pieces of text where context from far back might influence the meaning of what’s coming next. > 此機制在處理長文本時尤為出色,因為它能保留並利用先前的上下文來影響後續內容的理解。 --- #### **How are transformers different from other neural network architectures?(與其他神經網路的差異)** Recurrent neural networks (RNNs) and convolutional neural networks (CNNs) are other neural networks frequently used in machine learning and deep learning tasks. The following explores their relationships to transformers. > 除了 Transformer,機器學習與深度學習中常見的神經網路架構還有循環神經網路(RNN)與卷積神經網路(CNN)。以下將說明它們與 Transformer 的差異。 --- ##### **Transformers vs. RNNs** Transformer models and RNNs are both architectures used for processing sequential data. > Transformer 與 RNN 都是用於處理序列資料的模型架構。 **RNNs process data sequences one element at a time in cyclic iterations.** > **RNN 以循環方式一次處理一個序列元素。** The process starts with the input layer receiving the first element of the sequence. The information is then passed to a hidden layer, which processes the input and passes the output to the next time step. > 該過程從輸入層接收序列的第一個元素開始,接著資訊被傳遞至隱藏層處理,並將輸出傳給下一個時間步。 This output, combined with the next element of the sequence, is fed back into the hidden layer. 
This cycle repeats for each element in the sequence, with the RNN maintaining a hidden state vector that gets updated at each time step. > 然後輸出與下一個元素結合後再次送回隱藏層,如此循環處理整個序列。RNN 透過持續更新「隱藏狀態向量(hidden state vector)」來記憶先前的資訊。 This process effectively enables the RNN to remember information from past inputs. > 因此,RNN 能在一定程度上「記住」過去輸入的資訊。 In contrast, transformers process entire sequences simultaneously. > 相較之下,Transformer 能同時處理整個序列。 This parallelization enables much faster training times and the ability to handle much longer sequences than RNNs. > 此種平行化設計讓訓練速度大幅提升,也能處理比 RNN 更長的序列。 The self-attention mechanism in transformers also enables the model to consider the entire data sequence simultaneously. This eliminates the need for recurrence or hidden vectors. Instead, positional encoding maintains information about the position of each element in the sequence. > 自注意力機制讓 Transformer 無須循環結構或隱藏狀態,即可同時考慮整個序列的關聯。模型改以「位置編碼(positional encoding)」記錄各元素在序列中的位置資訊。 Transformers have largely superseded RNNs in many applications, especially in NLP tasks, because they can handle long-range dependencies more effectively. They also have greater scalability and efficiency than RNNs. > Transformer 已在多數應用中取代 RNN,尤其是在自然語言處理(NLP)領域,因為它更能處理長距依存關係,並具更高的延展性與效率。 RNNs are still useful in certain contexts, especially where model size and computational efficiency are more critical than capturing long-distance interactions. > 不過在部分情境下,若重視模型輕量與運算效率,RNN 仍具有其價值。 --- ##### **Transformers vs. CNNs** CNNs are designed for **grid-like data, such as images**, where spatial hierarchies and locality are key. > CNN 專為**網格狀資料(如影像)**設計,重點在於捕捉空間層級與區域特徵。 They use convolutional layers to apply filters across an input, capturing local patterns through these filtered views. > CNN 透過卷積層(convolutional layers)在輸入資料上套用濾波器,從局部區域中擷取特徵模式。 For example, in image processing, initial layers might detect edges or textures, and deeper layers recognize more complex structures like shapes or objects. > 例如,在影像處理中,淺層卷積層可能識別邊緣與紋理,而深層則辨識更高階的形狀與物體結構。 **Transformers were primarily designed to handle sequential data and couldn’t process images.** > **Transformer 原本是為處理序列資料設計,並不適用於影像資料。** Vision transformer models are now processing images by converting them into a sequential format. > 不過,現今的「視覺 Transformer(Vision Transformer, ViT)」透過將影像轉換為序列形式來進行處理。 However, CNNs continue to remain a highly effective and efficient choice for many practical computer vision applications. > 儘管如此,CNN 在許多電腦視覺任務中依然是高效且實用的主流方法。 --- #### **What are the different types of transformer models?(Transformer 模型的種類)** Transformers have evolved into a diverse family of architectures. The following are some types of transformer models. > Transformer 已發展出多樣的架構形式,以下是幾種主要的模型類型。 --- ##### **Bidirectional transformers(雙向 Transformer)** Bidirectional encoder representations from transformers (**BERT**) models modify the base architecture to process words in relation to all the other words in a sentence rather than in isolation. > **BERT(Bidirectional Encoder Representations from Transformers)** 改良了原始架構,使模型能同時考慮句中所有詞的關聯,而非逐字處理。 Technically, it employs a mechanism called the **bidirectional masked language model (MLM)**. > 技術上,BERT 採用「雙向遮罩語言模型(MLM)」機制。 During pretraining, BERT randomly masks some percentage of the input tokens and predicts these masked tokens based on their context. > 在預訓練階段,BERT 會隨機遮蔽部分輸入詞元,並根據上下文預測其內容。 The bidirectional aspect comes from the fact that BERT takes into account both the left-to-right and right-to-left token sequences in both layers for greater comprehension. 
> 「雙向」指模型同時考慮從左到右與從右到左的語境,從而達成更深層的語意理解。 --- ##### **Generative pretrained transformers(生成式預訓練 Transformer)** **GPT** models use stacked transformer decoders that are pretrained on a large corpus of text by using language modeling objectives. > **GPT(Generative Pretrained Transformer)** 採用多層堆疊的 Transformer 解碼器,並以大量文本進行語言模型預訓練。 They are **autoregressive**, which means that they predict the next value in a sequence based on all preceding values. > GPT 屬於**自回歸(autoregressive)**模型,意即根據先前所有詞元來預測下一個詞。 By using more than 175 billion parameters, GPT models can generate text sequences that are adjusted for style and tone. > 擁有超過 1750 億參數的 GPT 模型,能生成符合語氣與風格的文字。 GPT models have sparked the research in AI toward achieving **artificial general intelligence (AGI)**. > GPT 的成功推動了邁向「通用人工智慧(AGI)」的研究。 This means that organizations can reach new levels of productivity while reinventing their applications and customer experiences. > 這讓各組織能在應用創新與生產力上達到新的高度。 --- ##### **Bidirectional and autoregressive transformers(雙向自回歸 Transformer)** A **BART** model combines bidirectional and autoregressive properties. > **BART(Bidirectional and Auto-Regressive Transformer)** 結合了雙向與自回歸兩種特性。 It’s like a blend of BERT’s bidirectional encoder and GPT’s autoregressive decoder. > 它可視為融合了 BERT 的編碼器與 GPT 的解碼器。 BART reads the entire input sequence at once and is bidirectional like BERT, but generates the output sequence one token at a time like GPT. > BART 會像 BERT 一樣同時讀取整個輸入序列,又像 GPT 一樣逐詞生成輸出。 --- ##### **Transformers for multimodal tasks(多模態 Transformer)** Multimodal transformer models such as **ViLBERT** and **VisualBERT** handle multiple input types, typically text and images. > 多模態 Transformer(如 **ViLBERT**、**VisualBERT**)能同時處理多種類型輸入,通常為文字與影像。 They use dual-stream networks to process visual and textual inputs separately before fusing the information. > 它們採用雙流網路架構,分別處理文字與視覺資料,再在後期融合資訊。 For example, ViLBERT uses co-attentional transformer layers to enable the separate streams to interact. > 例如,ViLBERT 使用「協同注意力層(co-attentional layers)」讓兩個資料流進行互動。 This is crucial for tasks that require understanding relationships between text and images, such as **visual question-answering (VQA)**. > 此設計對於理解文字與影像關聯的任務(如**視覺問答 VQA**)至關重要。 --- ##### **Vision transformers(視覺 Transformer)** Vision transformers (**ViT**) repurpose the transformer architecture for image classification tasks. > **ViT(Vision Transformer)** 將 Transformer 架構應用於影像分類任務。 Instead of processing an image as a grid of pixels, they view image data as a sequence of fixed-size patches—like treating words in a sentence. > ViT 不以像素矩陣處理影像,而是將影像分割為固定大小的區塊,視作序列中的「詞」。 Each patch is flattened, linearly embedded, and then processed sequentially by the standard transformer encoder. > 每個影像區塊會被攤平、線性嵌入,然後輸入至 Transformer 編碼器進行處理。 Positional embeddings are added to maintain spatial information. > 同時加入「位置嵌入」以保留空間資訊。 This **global self-attention** allows the model to capture relationships between any pair of patches regardless of their position. > 透過**全域自注意力機制(global self-attention)**,模型能捕捉任意影像區塊間的關聯,無論它們的位置距離多遠。 ::: ## RAG :::spoiler ### **Retrieval Augmented Generation (RAG)|檢索增強生成** Use **Retrieval Augmented Generation (RAG)** to retrieve data from outside a foundation model and augment your prompts by adding the relevant retrieved data in context. 
>使用 **檢索增強生成(RAG)** 技術,可以從基礎模型以外的資料來源中檢索相關資訊,並將這些檢索到的資料加入提示(prompt)中,以補充模型的上下文內容。 For more information about RAG model architectures, see ➡️ [**Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks**](https://arxiv.org/abs/2005.11401) >如需了解更多關於 RAG 模型架構的資訊,請參閱上方連結的論文。 --- Foundation models are usually trained **offline**, making the model **agnostic to any data created after training**. Additionally, foundation models are trained on **very general domain corpora**, which makes them **less effective for domain-specific tasks**. >基礎模型通常是**離線訓練(offline training)**的,因此無法掌握**訓練後產生的新資料**。 此外,這些模型的訓練語料多屬於**一般性領域(general domain corpora)**,使得它們在**特定領域任務(domain-specific tasks)**上的表現較為不足。 ![image (4)](https://hackmd.io/_uploads/SJjKbr9ple.png) [Retrieval Augmented Generation - Amazon SageMaker](https://docs.aws.amazon.com/sagemaker/latest/dg/jumpstart-foundation-models-customize-rag.html) ::: ## Prompt Engineering :::spoiler ### **Prompt engineering 提示工程** Prompt engineering is the process of designing and refining the prompts or input stimuli for a language model to generate specific types of output. Prompt engineering involves selecting appropriate keywords, providing context, and shaping the input in a way that encourages the model to produce the desired response and is a vital technique to actively shape the behavior and output of foundation models. > 提示工程(Prompt engineering)是設計和優化語言模型輸入提示(prompt 或刺激輸入)的過程,目的是讓模型生成特定類型的輸出。提示工程包括選擇合適的關鍵詞、提供上下文,以及以特定方式塑造輸入,以鼓勵模型生成期望的回應,是主動塑造基礎模型行為與輸出的重要技術。 --- ### **Zero-shot learning 零樣本學習** Zero-shot learning involves training a model to generalize and make predictions on unseen classes or tasks. To perform prompt engineering in zero-shot learning environments, we recommend constructing prompts that explicitly provide information about the target task and the desired output format. For example, if you want to use a foundation model for zero-shot text classification on a set of classes that the model did not see during training, a well-engineered prompt could be: `"Classify the following text as either sports, politics, or entertainment: *[input text]*."` By explicitly specifying the target classes and the expected output format, you can guide the model to make accurate predictions even on unseen classes. > 零樣本學習(Zero-shot learning)指訓練模型以泛化並對未見過的類別或任務進行預測。在零樣本學習環境下進行提示工程時,建議構建明確提供目標任務和期望輸出格式的提示。例如,如果你想讓基礎模型對訓練中未見過的一組類別進行零樣本文本分類,一個設計良好的提示可以是:`"Classify the following text as either sports, politics, or entertainment: *[input text]*."` 明確指定目標類別和期望輸出格式可以引導模型即使對未見類別也能做出準確預測。 --- ### **Few-shot learning 少樣本學習** Few-shot learning involves training a model with a limited amount of data for new classes or tasks. Prompt engineering in few-shot learning environments focuses on designing prompts that effectively use the limited available training data. For example, if you use a foundation model for an image classification task and only have a few examples of a new image class, you can engineer a prompt that includes the available labeled examples with a placeholder for the target class. For example, the prompt could be: `"[image 1], [image 2], and [image 3] are examples of *[target class]*. Classify the following image as *[target class]*"`. By incorporating the limited labeled examples and explicitly specifying the target class, you can guide the model to generalize and make accurate predictions even with minimal training data. 
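A hedged, text-only sketch of assembling a few-shot prompt from a handful of labeled examples is shown below; the review data and helper name are made up for illustration and do not come from the original text.

```python
# A minimal few-shot prompt builder: labeled examples are placed directly in
# the prompt so the model can generalize without retraining or fine-tuning.
examples = [
    ("The battery lasts two days on one charge.", "positive"),
    ("The screen cracked after a week.", "negative"),
    ("Shipping was fast and the fit is perfect.", "positive"),
]

def build_few_shot_prompt(examples, new_text: str) -> str:
    lines = ["Classify each review as positive or negative."]
    for text, label in examples:
        lines.append(f'Review: "{text}"\nLabel: {label}')
    # The final item is left unlabeled for the model to complete.
    lines.append(f'Review: "{new_text}"\nLabel:')
    return "\n\n".join(lines)

print(build_few_shot_prompt(examples, "The keyboard stopped working after one day."))
```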
> 少樣本學習(Few-shot learning)指使用有限數據訓練模型以處理新類別或任務。在少樣本學習環境下進行提示工程時,重點是設計有效利用有限可用訓練數據的提示。例如,如果你使用基礎模型進行圖像分類任務,且對新圖像類別只有少量範例,你可以設計提示,將可用的標註範例包含在內,並為目標類別留一個占位符。例如提示可以是:`"[image 1], [image 2], and [image 3] are examples of *[target class]*. Classify the following image as *[target class]*"`。透過整合有限的標註範例並明確指定目標類別,你可以引導模型即使在最少訓練數據下也能進行泛化並做出準確預測。 Few-shot prompt engineering is a technique that you can use to guide a model to generalize based on a few examples. The model uses examples to generalize and make more accurate predictions without the need to re-train or fine-tune a model. Few-shot prompting cannot evaluate or assess the quality of generated content. > 少樣本提示工程是一種技術,可用來引導模型基於少量範例進行泛化。模型透過這些範例進行泛化,並做出更準確的預測,而無需重新訓練或微調模型。少樣本提示無法評估生成內容的品質。 --- ### **Supported inference parameters 支援的推理參數** Changing inference parameters might also affect the responses to your prompts. While you can try to add as much specificity and context as possible to your prompts, you can also experiment with supported inference parameters. The following are examples of some commonly supported inference parameters: > 改變推理參數也可能影響模型對提示的回應。雖然你可以在提示中添加盡可能多的細節和上下文,但你也可以嘗試支援的推理參數。以下是一些常見的支援推理參數範例: | **Inference Parameter** | **Description** | | --- | --- | | `max_new_tokens` | The maximum output length of a foundation model response. Valid values: integer, range: Positive integer. | | `temperature` | Controls the **randomness** in the output. Higher temperature results in an output sequence with low-probability words and lower temperature results in output sequence with high-probability words. If `temperature=0`, the response is made up of only the highest probability words (greedy decoding). Valid values: float, range: Positive float. | | `top_p` | In each step of text generation, the model samples from the smallest possible set of words with a cumulative probability of `top_p`. Valid values: float, range: 0.0, 1.0. | | `return_full_text` | If `True`, then the input text is part of the generated output text. Valid values: boolean, default: False. | For more information on foundation model inference, see [Deploy publicly available foundation models with the JumpStartModel class](https://docs.aws.amazon.com/sagemaker/latest/dg/jumpstart-foundation-models-use-python-sdk-model-class.html). > 有關基礎模型推理的更多資訊,請參考 [Deploy publicly available foundation models with the JumpStartModel class](https://docs.aws.amazon.com/sagemaker/latest/dg/jumpstart-foundation-models-use-python-sdk-model-class.html)。 If prompt engineering is not sufficient to adapt your foundation model to specific business needs, domain-specific language, target tasks, or other requirements, you can consider fine-tuning your model on additional data or using Retrieval Augmented Generation (RAG) to augment your model architecture with enhanced context from archived knowledge sources. 
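As a hedged sketch of how the inference parameters in the table above can be passed at request time, the example below uses the SageMaker Python SDK's `JumpStartModel` class. The `model_id` is illustrative, the exact request payload schema varies by model, and deploying an endpoint requires SageMaker permissions and incurs cost.

```python
# Hedged sketch: deploy a JumpStart text-generation model and pass inference
# parameters with the request. Check the specific model's documentation for
# its supported payload schema before relying on this shape.
from sagemaker.jumpstart.model import JumpStartModel

model = JumpStartModel(model_id="huggingface-llm-falcon-7b-instruct-bf16")  # illustrative model_id
predictor = model.deploy()  # provisions a real-time endpoint (billable)

response = predictor.predict({
    "inputs": "In two sentences, explain Retrieval Augmented Generation.",
    "parameters": {
        "max_new_tokens": 96,       # cap the response length
        "temperature": 0.2,         # low temperature -> more deterministic output
        "top_p": 0.9,               # nucleus sampling over the top 90% of probability mass
        "return_full_text": False,  # do not echo the input text back
    },
})
print(response)

predictor.delete_endpoint()  # clean up the endpoint when finished
```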
> 如果提示工程不足以將基礎模型適配至特定業務需求、領域專用語言、目標任務或其他要求,則可以考慮使用額外資料微調模型,或使用檢索增強生成(Retrieval Augmented Generation, RAG)從存檔知識來源增強上下文來擴展模型架構。 --- ### **Top-K (Top-K Sampling)** `Top-K` 就像一個 `Tokens` 排名榜,你可以設定一個固定數量 (`k`),模型只會從機率最高的 `k` 個 `Tokens` 中進行選擇。 舉例來說,如果 `Top-K` 設定為 `64`,模型只會考慮機率最高的 `64` 個詞彙,並從中隨機選擇下一個 `Tokens`。通過限制選擇範圍,`Top-K` 能夠確保生成的文字更集中、更符合主題,避免出現過於發散或不相關的內容。 我以 OpenAI 的 API 為例 ([API Reference](https://platform.openai.com/docs/api-reference/introduction)),[Chat Completion API](https://platform.openai.com/docs/api-reference/chat/create) 並沒有所謂的 `top_k` 參數,所以你沒辦法調整這個參數。但是,OpenAI 的 API 有提供 `logit_bias` 參數,用於修改模型生成輸出中指定 `Tokens` 出現的機率,這個參數接受一個 JSON 物件,將 `Tokens` 映射到相關偏差值,從 `-100` (禁止出現) 到 `100` (一定會出現) 等機率設定。這是一個相當有趣的參數,詳情請見 [Using logit bias to alter token probability with the OpenAI API](https://help.openai.com/en/articles/5247780-using-logit-bias-to-alter-token-probability-with-the-openai-api) 文章說明。 [Prompt engineering for foundation models - Amazon SageMaker](https://docs.aws.amazon.com/sagemaker/latest/dg/jumpstart-foundation-models-customize-prompt-engineering.html) ::: ## Prompt Engineering Techniques :::spoiler ### [What is Prompt Engineering? - AI Prompt Engineering Explained - AWS (amazon.com)](https://aws.amazon.com/what-is/prompt-engineering/) Prompt engineering is a dynamic and evolving field. It requires both linguistic skills and creative expression to fine-tune prompts and obtain the desired response from the generative AI tools. > **提示工程(Prompt Engineering)**是一個動態且不斷演進的領域。 > 它需要結合語言能力與創造力,透過微調提示詞(prompts)來獲得生成式 AI 工具的理想回應。 [Read the guide to prompt engineering by AWS PartyRock »](https://partyrock.aws/u/js2222/zEj353AmT/Prompt-Engineering-Guide-Introduction) Here are some more examples of techniques that prompt engineers use to improve their AI models' natural language processing (NLP) tasks. > 以下是提示工程師用來改進 AI 模型自然語言處理(NLP)能力的一些常見技術。 --- ### **Chain-of-thought prompting** Chain-of-thought prompting is a technique that breaks down a complex question into smaller, logical parts that mimic a train of thought. This helps the model solve problems in a series of intermediate steps rather than directly answering the question. This enhances its reasoning ability. > **思維鏈提示(Chain-of-thought prompting)**是一種將複雜問題拆解成多個邏輯步驟的技術,模仿人類思考過程。 > 它幫助模型以中間推理步驟逐步解題,而非直接給出答案,從而增強推理能力。 You can perform several chain-of-thought rollouts for complex tasks and choose the most commonly reached conclusion. If the rollouts disagree significantly, a person can be consulted to correct the chain of thought. > 對於複雜任務,可以執行多次思維鏈推理,選擇最常出現的結論。 > 若結果差異過大,可由人工協助確認正確推理路徑。 For example, if the question is *"What is the capital of France?"* the model might perform several rollouts leading to answers like *"Paris,"* *"The capital of France is Paris,"* and *"Paris is the capital of France."* Since all rollouts lead to the same conclusion, *"Paris"* would be selected as the final answer. > 例如問題為「What is the capital of France?(法國的首都是哪裡?)」 > 模型可能生成多個推理過程:「Paris」、「The capital of France is Paris」、「Paris is the capital of France」。 > 由於結果一致,最終答案即為 **“Paris”**。 --- ### **Tree-of-thought prompting** The tree-of-thought technique generalizes chain-of-thought prompting. It prompts the model to generate one or more possible next steps. Then it runs the model on each possible next step using a tree search method. 
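A toy sketch of that tree search is shown below. The `propose_steps` and `score_step` functions stand in for real model calls (propose candidate next thoughts, then evaluate them); they are hypothetical placeholders, and the beam-style pruning is just one way to run the search.

```python
# Toy tree-of-thought search: propose several candidate next steps, score
# them, keep the best few, and expand again. Placeholders stand in for LLM calls.
from typing import List, Tuple

def propose_steps(partial_reasoning: str) -> List[str]:
    """Placeholder for an LLM call that suggests possible next reasoning steps."""
    return [partial_reasoning + " -> environmental effects",
            partial_reasoning + " -> social effects",
            partial_reasoning + " -> economic effects"]

def score_step(candidate: str) -> float:
    """Placeholder for an LLM- or heuristic-based evaluation of a reasoning path."""
    return float(len(candidate))  # toy scoring only

def tree_of_thought(question: str, depth: int = 2, beam_width: int = 2) -> List[str]:
    frontier: List[Tuple[float, str]] = [(0.0, question)]
    for _ in range(depth):
        expanded = [(score_step(step), step)
                    for _, node in frontier
                    for step in propose_steps(node)]
        # Keep only the highest-scoring partial reasoning paths (the "beam").
        frontier = sorted(expanded, reverse=True)[:beam_width]
    return [path for _, path in frontier]

for path in tree_of_thought("What are the effects of climate change?"):
    print(path)
```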
> **思維樹提示(Tree-of-thought prompting)**是思維鏈提示的延伸。 > 模型會生成多個可能的下一步,並以樹狀搜尋的方式探索各種推理路徑。 For example, if the question is *"What are the effects of climate change?"* the model might first generate possible next steps like *"List the environmental effects"* and *"List the social effects."* It would then elaborate on each of these in subsequent steps. > 例如問題為「What are the effects of climate change?(氣候變遷的影響是什麼?)」 > 模型可能先生成「列出環境影響」與「列出社會影響」,再分別展開詳細說明。 --- ### **Maieutic prompting** Maieutic prompting is similar to tree-of-thought prompting. The model is prompted to answer a question with an explanation. The model is then prompted to explain parts of the explanation. Inconsistent explanation trees are pruned or discarded. This improves performance on complex commonsense reasoning. > **產婆式提示(Maieutic prompting)**類似於思維樹提示。 > 模型先給出帶有解釋的回答,再針對該解釋的部分細節進行進一步說明。 > 若不同解釋分支互相矛盾,則會被刪除,以提升常識推理的準確度。 For example, if the question is *"Why is the sky blue?"* the model might first answer, *"The sky appears blue to the human eye because the short waves of blue light are scattered in all directions by the gases and particles in the Earth's atmosphere."* It might then expand on parts of this explanation, such as why blue light is scattered more than other colors and what the Earth's atmosphere is composed of. > 例如問題為「Why is the sky blue?(為什麼天空是藍色的?)」 > 模型可能回答:「因為大氣中的氣體與微粒會散射短波長的藍光,使天空在肉眼中呈現藍色。」 > 接著再解釋「為什麼藍光散射較強」與「大氣的組成為何」等細節。 --- ### **Complexity-based prompting** This prompt-engineering technique involves performing several chain-of-thought rollouts. It chooses the rollouts with the longest chains of thought then chooses the most commonly reached conclusion. > **基於複雜度的提示(Complexity-based prompting)** > 此方法執行多次思維鏈推理,並選擇推理步驟最長、同時最常達成共識的結論作為最終答案。 --- ### **Generated knowledge prompting** This technique involves prompting the model to first generate relevant facts needed to complete the prompt. Then it proceeds to complete the prompt. This often results in higher completion quality as the model is conditioned on relevant facts. > **生成知識提示(Generated knowledge prompting)** > 模型會先生成與問題相關的背景知識,再利用這些知識完成任務, > 這樣可提升生成內容的品質與一致性。 --- ### **Least-to-most prompting** In this prompt engineering technique, the model is prompted first to list the subproblems of a problem, and then solve them in sequence. > **由簡入繁提示(Least-to-most prompting)** > 模型先列出問題的子項,再依序解決。 > 此方法能讓後續問題在先前解答的基礎上逐步構建。 --- ### **Self-refine prompting** In this technique, the model is prompted to solve the problem, critique its solution, and then resolve the problem considering the problem, solution, and critique. > **自我修正提示(Self-refine prompting)** > 模型先嘗試回答問題,再批評自己的答案,最後依據批評重新生成。 > 如此反覆,直到滿足停止條件(例如 token 限制或品質達標)。 --- ### **Directional-stimulus prompting** This prompt engineering technique includes a hint or cue, such as desired keywords, to guide the language model toward the desired output. > **方向性提示(Directional-stimulus prompting)** > 在提示中加入特定關鍵字或暗示,引導模型生成符合預期方向的內容。 --- ### **What are some prompt engineering best practices?** Good prompt engineering requires you to communicate instructions with context, scope, and expected response. > **提示工程的最佳實踐** > 優秀的提示設計需明確傳達上下文、範圍與期望的回應。 --- ### **Unambiguous prompts** Clearly define the desired response in your prompt to avoid misinterpretation by the AI. > **明確的提示** > 清楚定義期望的輸出,避免 AI 誤解。 > 例如:若要「摘要」請明示,不要讓模型誤以為要詳細分析。 --- ### **Adequate context within the prompt** Provide adequate context within the prompt and include output requirements in your prompt input. 
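For instance, a context-rich prompt that also states its output requirements might be assembled as below; the policy text, question, and JSON keys are made up purely for illustration.

```python
# A small sketch of a prompt that states context, scope, and the expected
# output format explicitly, instead of asking a bare question.
prompt = (
    "You are helping a new employee understand our expense policy.\n"   # context
    "Using only the policy text below, answer the question.\n"          # scope
    "Policy: Meals are reimbursed up to $50 per day when travelling.\n"
    "Question: What is the daily meal limit while travelling?\n"
    "Respond as JSON with the keys 'answer' and 'policy_quote'."        # output requirements
)
print(prompt)
```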
> **提供足夠上下文** > 在提示中加入背景資訊與輸出格式需求。 > 例如:若要求列出「1990 年代最受歡迎的電影」並以表格呈現,應明確指定電影數量與格式。 --- ### **Balance between targeted information and desired output** Balance simplicity and complexity in your prompt to avoid vague or confusing answers. > **平衡提示的簡潔與複雜度** > 提示過於簡略會缺乏上下文,過於複雜則可能混淆模型。 > 對於專業主題,應使用清晰、簡潔的語言降低誤差。 --- ### **Experiment and refine the prompt** Prompt engineering is an iterative process. > **不斷試驗與改進** > 提示工程是一個反覆試驗與調整的過程。 > 透過測試不同結構、措辭與語氣,可逐步找到最有效的提示設計。 --- ### **Using experimentation to optimize response quality** Experimentation allows the team to fine-tune prompts based on performance, ensuring consistent, high-quality responses. > **透過實驗優化回應品質** > 透過嘗試不同提示結構與風格,觀察生成結果差異, > 持續改進能讓模型穩定產生高品質回應,並提升使用者體驗。 --- ### **PROMPT TEMPLATE** Prompt templates are predefined formats that standardize inputs and outputs for AI models. > **提示模板(Prompt Template)** > 預先定義的提示格式,用於標準化模型的輸入與輸出, > 可提升一致性與效能。 [Prompt templates and examples for Amazon Bedrock text models - Amazon Bedrock](https://docs.aws.amazon.com/bedrock/latest/userguide/prompt-templates-and-examples.html) ::: ## Prompt Attack :::spoiler ### Common Prompt Injection Attacks [Common attacks on LLMs - AWS Prescriptive Guidance](https://docs.aws.amazon.com/prescriptive-guidance/latest/llm-prompt-engineering-best-practices/common-attacks.html) Prompt engineering has matured rapidly, resulting in the identification of a set of common attacks that cover a variety of prompts and expected malicious outcomes. The following list of attacks forms the security benchmark for the guardrails discussed in this guide. Although the list isn't comprehensive, it covers a majority of attacks that an LLM-powered retrieval-augmented generation (RAG) application might face. Each guardrail we developed was tested against this benchmark. > 隨著提示工程(Prompt Engineering)的快速發展,研究人員已經歸納出一系列常見的攻擊手法,這些攻擊涵蓋了多種提示類型與潛在惡意結果。以下列出的攻擊項目構成了本指南中防護措施(Guardrails)的安全基準。雖然此列表並不全面,但涵蓋了大部分可能發生在以 LLM 為核心的檢索增強生成(RAG)應用中之攻擊模式。我們開發的每一項防護機制都已針對這些攻擊進行測試。 --- - **Prompted persona switches.** It's often useful to have the LLM adopt a persona in the prompt template to tailor its responses for a specific domain or use case (for example, including “You are a financial analyst” before prompting an LLM to report on corporate earnings). This type of attack attempts to have the LLM adopt a new persona that might be malicious and provocative. > **角色誘導攻擊(Prompted persona switches)**:在提示模板中,通常會讓 LLM 扮演特定角色以便針對某領域或用途生成回應(例如:「你是一位財務分析師」)。這類攻擊會嘗試誘導模型切換到惡意或挑釁性的角色設定。 - **Extracting the prompt template.** In this type of attack, an LLM is asked to print out all of its instructions from the prompt template. This risks opening up the model to further attacks that specifically target any exposed vulnerabilities. For example, if the prompt template contains a specific XML tagging structure, a malicious user might attempt to spoof these tags and insert their own harmful instructions. > **提取提示模板(Extracting the prompt template)**:攻擊者要求模型列印出其完整提示模板中的指令內容。這可能導致模型暴露結構細節,進而被利用來插入惡意指令或偽造 XML 標籤等。 - **Ignoring the prompt template.** This general attack consists of a request to ignore the model's given instructions. For example, if a prompt template specifies that an LLM should answer questions only about the weather, a user might ask the model to ignore that instruction and to provide information on a harmful topic. 
> **忽略提示模板(Ignoring the prompt template)**:攻擊者要求模型「忽略」既有的指令。例如,若模板規定模型只能回答天氣問題,攻擊者可能要求它忽略該限制並回答危險主題。 - **Alternating languages and escape characters.** This type of attack uses multiple languages and escape characters to feed the LLM sets of conflicting instructions. > **混合語言與跳脫字元攻擊(Alternating languages and escape characters)**:攻擊者混用不同語言與跳脫字元,向模型注入衝突指令,藉此繞過語言限制或內容過濾。 - **Extracting conversation history.** This type of attack requests an LLM to print out its conversation history, which might contain sensitive information. > **提取對話紀錄(Extracting conversation history)**:攻擊者要求模型輸出對話歷史紀錄,其中可能包含敏感資訊。 - **Augmenting the prompt template.** This attack tries to cause the model to augment its own template — for example, modifying its persona or resetting before injecting malicious initialization instructions. > **增強提示模板攻擊(Augmenting the prompt template)**:攻擊者誘導模型修改自身的模板,例如改變角色或重置狀態,以便插入惡意指令。 - **Fake completion (guiding the LLM to disobedience).** This attack provides precompleted answers to influence the model’s next response. This prompting strategy is sometimes known as [prefilling](https://docs.anthropic.com/claude/docs/prefill-claudes-response). > **假完成攻擊(Fake completion)**:攻擊者在提示中加入預設答案,引導模型偏離模板指令,進而產生惡意輸出。 - **Rephrasing or obfuscating common attacks.** This attack rephrases or obfuscates malicious instructions to avoid detection. > **重述或混淆攻擊(Rephrasing or obfuscating attacks)**:攻擊者以同義詞、符號替換或數字混寫等方式混淆惡意內容,避開模型偵測。 - **Changing the output format of common attacks.** This attack changes the output format to bypass application output filters. > **改變輸出格式(Changing output format)**:攻擊者改變惡意內容的輸出格式,以繞過應用層的輸出過濾機制。 - **Changing the input attack format.** This attack hides malicious instructions using encodings like base64 to bypass input filters. > **改變輸入格式(Changing input format)**:攻擊者以非人類可讀格式(如 Base64 編碼)隱藏指令,避開輸入過濾器。 - **Exploiting friendliness and trust.** LLMs may respond differently to friendly versus adversarial tones. Attackers exploit this by using polite or persuasive phrasing to deliver malicious instructions. > **利用友善與信任(Exploiting friendliness and trust)**:攻擊者以友善或信任語氣包裝惡意請求,使模型較不警覺並順從執行。 --- Some of these attacks occur independently, whereas others can be combined in a chain of multiple offense strategies. The key to securing a model against hybrid attacks is a set of guardrails that can help defend against each individual attack. > 有些攻擊會單獨發生,也有可能以多階段、連鎖方式組合使用。防禦混合型攻擊的關鍵在於設計針對各種攻擊手法的多層防護機制。 ::: ## Model Inference :::spoiler ### Influence response generation with inference parameters [**PDF**](https://docs.aws.amazon.com/pdfs/bedrock/latest/userguide/bedrock-ug.pdf#inference-parameters) | [**RSS**](https://docs.aws.amazon.com/bedrock/latest/userguide/bedrock-ug.rss) When running model inference, you can adjust inference parameters to influence the model response. Inference parameters can change the pool of possible outputs that the model considers during generation, or they can limit the final response. To learn about inference parameters for different models, see [Inference request parameters and response fields for foundation models](https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters.html). 
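As a hedged sketch of setting these parameters in a request, the example below calls Amazon Bedrock's Converse API through boto3 and passes `maxTokens`, `temperature`, and `topP` in `inferenceConfig`. The model ID is illustrative; availability, access grants, and supported parameter ranges depend on the model and Region.

```python
# Hedged sketch: adjust inference parameters on Amazon Bedrock via the
# Converse API. Requires AWS credentials and model access in the Region used.
import boto3

bedrock = boto3.client("bedrock-runtime")

response = bedrock.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",  # illustrative model ID
    messages=[{"role": "user",
               "content": [{"text": "In one sentence, define top-p sampling."}]}],
    inferenceConfig={
        "maxTokens": 100,    # limit the length of the response
        "temperature": 0.2,  # lower value -> favour high-probability tokens
        "topP": 0.9,         # sample from the top 90% of cumulative probability
    },
)
print(response["output"]["message"]["content"][0]["text"])
```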
> 在執行模型推論(inference)時,你可以調整推論參數以影響模型的回應。這些參數可以改變模型在生成過程中考慮的可能輸出範圍,或限制最終回應的內容。若想了解不同基礎模型的推論參數,請參考[〈基礎模型的推論請求參數與回應欄位〉](https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters.html)。 The following categories of parameters are commonly found across different models: > 以下列出了多數模型中常見的推論參數類別: ### Topics - [Randomness and diversity](https://docs.aws.amazon.com/bedrock/latest/userguide/inference-parameters.html#inference-randomness) - [Length](https://docs.aws.amazon.com/bedrock/latest/userguide/inference-parameters.html#inference-length) --- ### **Randomness and diversity** For any given sequence, a model determines a probability distribution of options for the next token in the sequence. To generate each token in an output, the model samples from this distribution. Randomness and diversity refer to the amount of variation in a model's response. You can control these factors by limiting or adjusting the distribution. Foundation models typically support the following parameters to control randomness and diversity in the response. > 在生成過程中,模型會根據目前的序列(sequence)為下一個可能的 token 建立機率分布,並從中抽樣產生輸出。 > 「隨機性」(randomness)與「多樣性」(diversity)指的是模型回應中可變化的程度。 > 你可以透過調整分布形狀或取樣範圍來控制這些因素。 > 基礎模型通常支援以下幾種參數以控制回應的隨機性與多樣性。 --- - **Temperature** – Affects the shape of the probability distribution for the predicted output and influences the likelihood of the model selecting lower-probability outputs. - Choose a lower value to influence the model to select higher-probability outputs. - Choose a higher value to influence the model to select lower-probability outputs. In technical terms, the temperature modulates the probability mass function for the next token. A lower temperature steepens the function and leads to more deterministic responses, and a higher temperature flattens the function and leads to more random responses. > **溫度(Temperature)**:控制機率分布的形狀,影響模型選擇低機率輸出的傾向。 > - 較低的溫度值會使模型更傾向於選擇高機率的結果(輸出更穩定)。 > - 較高的溫度值則會增加低機率選項被選中的機會(輸出更隨機)。 > > 技術上來說,溫度參數會調整下一個 token 的機率質量函數(probability mass function)。低溫度會使分布更陡峭、結果更確定;高溫度會使分布更平坦、結果更隨機。 --- - **Top K** – The number of most-likely candidates that the model considers for the next token. - Choose a lower value to decrease the size of the pool and limit the options to more likely outputs. - Choose a higher value to increase the size of the pool and allow the model to consider less likely outputs. For example, if you choose a value of 50 for Top K, the model selects from 50 of the most probable tokens that could be next in the sequence. > **Top K**:指定模型在生成下一個 token 時可考慮的「最高機率候選數量」。 > - 較低的值(例如 K=10)會限制模型只從最可能的前 10 個 token 中選擇。 > - 較高的值(例如 K=100)則允許模型考慮更多低機率候選。 > > 例如,若設定 **Top K = 50**,則模型只會從最有可能的 50 個候選 token 中抽樣。 --- - **Top P** – The percentage of most-likely candidates that the model considers for the next token. - Choose a lower value to decrease the size of the pool and limit the options to more likely outputs. - Choose a higher value to increase the size of the pool and allow the model to consider less likely outputs. In technical terms, the model computes the cumulative probability distribution for the set of responses and considers only the top P% of the distribution. For example, if you choose a value of 0.8 for Top P, the model selects from the top 80% of the probability distribution of tokens that could be next in the sequence. 
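To see these three controls operate on actual numbers, the hedged sketch below applies temperature scaling, Top K filtering, and Top P filtering to the same toy horses/zebras/unicorns distribution used in the worked example further down. The helper functions are illustrative, not a library API.

```python
# Worked numeric sketch of temperature, Top K, and Top P on a toy
# next-token distribution.
import math

logits = {"horses": math.log(0.7), "zebras": math.log(0.2), "unicorns": math.log(0.1)}

def apply_temperature(logits, temperature):
    """Rescale logits by the temperature, then renormalize with a softmax."""
    scaled = {t: l / temperature for t, l in logits.items()}
    z = sum(math.exp(v) for v in scaled.values())
    return {t: math.exp(v) / z for t, v in scaled.items()}

def top_k_filter(probs, k):
    """Keep only the k most probable tokens, then renormalize."""
    kept = dict(sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k])
    z = sum(kept.values())
    return {t: p / z for t, p in kept.items()}

def top_p_filter(probs, p):
    """Keep the smallest set of tokens whose cumulative probability reaches p."""
    kept, cumulative = {}, 0.0
    for token, prob in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[token] = prob
        cumulative += prob
        if cumulative >= p:
            break
    z = sum(kept.values())
    return {t: q / z for t, q in kept.items()}

print(apply_temperature(logits, 2.0))                     # flatter: "unicorns" becomes more likely
print(top_k_filter(apply_temperature(logits, 1.0), 2))    # only "horses" and "zebras" remain
print(top_p_filter(apply_temperature(logits, 1.0), 0.7))  # only "horses" remains
```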
> **Top P(又稱 nucleus sampling)**:以「累積機率百分比」控制候選集合大小。 > - 較低的值(例如 0.6)會限制模型只從累積機率前 60% 的 token 中抽樣。 > - 較高的值(例如 0.9)允許模型考慮更多低機率選項。 > > 例如,若設定 **Top P = 0.8**,模型會從累積機率達 80% 的 token 候選中取樣。 --- ### Summary Table | **Parameter** | **Effect of lower value** | **Effect of higher value** | | --- | --- | --- | | Temperature | Increase likelihood of higher-probability tokens; Decrease likelihood of lower-probability tokens | Increase likelihood of lower-probability tokens; Decrease likelihood of higher-probability tokens | | Top K | Remove lower-probability tokens | Allow lower-probability tokens | | Top P | Remove lower-probability tokens | Allow lower-probability tokens | > **參數效果總覽表:** > > | 參數 | 較低數值的效果 | 較高數值的效果 | > |------|----------------|----------------| > | Temperature | 增加高機率 token 的選擇率;降低低機率 token 的選擇率 | 增加低機率 token 的選擇率;降低高機率 token 的選擇率 | > | Top K | 排除低機率 token | 允許包含低機率 token | > | Top P | 排除低機率 token | 允許包含低機率 token | --- As an example to understand these parameters, consider the prompt **`I hear the hoof beats of "`**. Let's say the model determines the following three words to be candidates for the next token, each with its probability: ```json { "horses": 0.7, "zebras": 0.2, "unicorns": 0.1 } ``` - If you set a high temperature, the probability distribution flattens, increasing the chance of choosing “unicorns” and decreasing “horses”. - If Top K = 2, only “horses” and “zebras” are considered. - If Top P = 0.7, only “horses” qualifies; if Top P = 0.9, both “horses” and “zebras” are considered. >示例說明: 提示詞為 I hear the hoof beats of "(我聽到蹄聲來自……) 假設模型對三個候選詞的機率分布如下: ``` { "horses": 0.7, "zebras": 0.2, "unicorns": 0.1 } ``` >- 若設定較高 Temperature,分布會變平,模型更可能選擇 “unicorns”。 >- 若設定 Top K = 2,只考慮 “horses” 與 “zebras”。 >- 若設定 Top P = 0.7,僅包含 “horses”;若 Top P = 0.9,則包含 “horses” 與 “zebras”。 ### Length Foundation models typically support parameters that limit the length of the response. Examples of these parameters are provided below. > 回應長度控制: 基礎模型通常支援以下參數,用以限制回應的字元數或 token 數。 - Response length – Specify the minimum or maximum number of tokens to return. >- 回應長度(Response length):設定輸出回應的最小或最大 token 數。 - Penalties – Specify the degree to which to penalize outputs in a response. Examples include: - Length of the response - Repeated tokens - Frequency of tokens - Types of tokens >- 懲罰參數(Penalties):調整對回應內容的懲罰程度,如: - 回應長度 - 重複 token - token 出現頻率 - token 類型 - Stop sequences – Specify character sequences that stop the model from generating further tokens. >- 停止序列(Stop sequences):設定特定字元序列,使模型在產生到該序列時立即停止生成。 [Influence response generation with inference parameters](https://docs.aws.amazon.com/bedrock/latest/userguide/inference-parameters.html) ::: ## Supervised Learning :::spoiler ### **Supervised Learning Algorithms** Supervised learning algorithms train on sample data that specifies both the algorithm's input and output. For example, the data could be images of handwritten numbers that are annotated to indicate which numbers they represent. Given sufficient labeled data, the supervised learning system would eventually recognize the clusters of pixels and shapes associated with each handwritten number. > **監督式學習演算法(Supervised Learning Algorithms)** > 監督式學習演算法會在同時包含輸入與輸出的樣本資料上進行訓練。 > 例如,資料可能是手寫數字的影像,且每張影像都附有對應的數字標籤。 > 當有足夠的標註資料時,監督式學習系統最終能夠辨識出與每個手寫數字相關的像素與形狀群集。 --- ##### ***Logistic regression*** Logistic regression predicts a categorical output based on one or more inputs. Binary classification is when the output fits into one of two categories, such as yes or no and pass or fail. 
Multiple class classification is when the output fits into more than two categories, such as cat, dog, or rabbit. An example of logistic regression is predicting whether a student will pass or fail a unit based on their number of logins to the courseware. **The prediction usually has a finite number of outcomes, like yes or no.** > **邏輯迴歸(Logistic Regression)** > 邏輯迴歸根據一個或多個輸入來預測「類別型輸出」。 > 若輸出僅能歸入兩種分類(例如「是/否」或「通過/未通過」),稱為**二元分類(binary classification)**; > 若輸出可歸入多於兩類(如「貓、狗、兔子」),則為**多類別分類(multi-class classification)**。 > 一個例子是根據學生登入課程系統的次數,預測該學生是否會通過該單元。 > **此類預測的結果通常是有限且離散的,例如「是」或「否」。** ![image (5)](https://hackmd.io/_uploads/S1ncxs2pxl.png) --- #### **What is linear regression?** Linear regression is a data analysis technique that predicts the value of unknown data by using another related and known data value. It mathematically models the unknown or dependent variable and the known or independent variable as a linear equation. For instance, suppose that you have data about your expenses and income for last year. Linear regression techniques analyze this data and determine that your expenses are half your income. They then calculate an unknown future expense by halving a future known income. > **什麼是線性迴歸(Linear Regression)?** > 線性迴歸是一種資料分析技術,用已知的相關資料來預測未知數值。 > 它將未知(應變數)與已知(自變數)之間的關係以數學線性方程式建模。 > 例如,假設你有去年收入與支出的資料。 > 線性迴歸可分析此資料,發現「支出約為收入的一半」,並進一步利用未來的預期收入來估算未來支出。 Linear regression refers to supervised learning models that, based on one or more inputs, predict a value from a continuous scale. An example of linear regression is predicting a house price. You could predict a house’s price based on its location, age, and number of rooms, after you train a model on a set of historical sales training data with those variables. > 線性迴歸是一種監督式學習模型,根據一個或多個輸入變數來預測「連續型數值」。 > 例如,可根據房屋的地點、屋齡與房間數等變數,利用歷史銷售資料訓練模型,以預測房價。 --- ##### ***Decision tree*** The decision tree supervised machine learning technique takes some given inputs and applies an if-else structure to predict an outcome. An example of a decision tree problem is predicting customer churn. For example, if a customer doesn’t visit an application after signing up, the model might predict churn. Or if the customer accesses the application on multiple devices and the average session time is above a given threshold, the model might predict retention. > **決策樹(Decision Tree)** > 決策樹是一種監督式機器學習技術,它根據輸入資料使用「if-else」邏輯結構進行決策與預測。 > 例如,用於預測顧客流失(churn)的情境: > - 若顧客註冊後未再登入應用程式,模型可能預測該顧客會流失; > - 若顧客在多個裝置上使用,且平均使用時間高於某閾值,模型則可能預測該顧客會留下。 --- ##### ***Neural network*** A neural network solution is a more complex supervised learning technique. To produce a given outcome, it takes some given inputs and performs one or more layers of mathematical transformation based on adjusting data weightings. An example of a neural network technique is predicting a digit from a handwritten image. > **神經網路(Neural Network)** > 神經網路是一種更複雜的監督式學習方法。 > 它接收輸入資料,經過一層或多層數學轉換,並根據權重(weight)調整來生成輸出結果。 > 例如,神經網路可用於從手寫影像中辨識出具體的數字。 --- ##### **Logistic regression vs. linear regression** Linear regression predicts a continuous dependent variable by using a given set of independent variables. A continuous variable can have a range of values, such as price or age. So linear regression can predict actual values of the dependent variable. It can answer questions like "What will the price of rice be after 10 years?" > **邏輯迴歸 vs. 線性迴歸** > 線性迴歸根據一組自變數來預測「連續型應變數」,例如價格或年齡。 > 因此,它可以用來預測實際的數值,如「十年後稻米的價格會是多少?」。 Unlike linear regression, logistic regression is a classification algorithm. 
It cannot predict actual values for continuous data. It can answer questions like "Will the price of rice increase by 50% in 10 years?" > 相較之下,邏輯迴歸是一種分類演算法,無法預測連續數值。 > 它回答的是分類問題,例如「十年後稻米價格是否會上漲 50%?」。 --- ##### **Deep learning** [Deep learning](https://aws.amazon.com/what-is/deep-learning/) uses neural networks or software components that simulate the human brain to analyze information. Deep learning calculations are based on the mathematical concept of vectors. > **深度學習(Deep Learning)** > 深度學習使用模擬人腦運作的神經網路來分析資訊。 > 其運算基於數學上的「向量(vector)」概念。 --- ##### **Logistic regression vs. deep learning** Logistic regression is less complex and less compute intensive than deep learning. More importantly, deep learning calculations cannot be investigated or modified by developers, due to their complex, machine-driven nature. On the other hand, logistic regression calculations are transparent and easier to troubleshoot. > **邏輯迴歸 vs. 深度學習** > 邏輯迴歸相較於深度學習更簡單、運算需求更低。 > 更重要的是,由於深度學習的運算過程極為複雜且由機器自動調整,其內部邏輯通常無法被開發者檢視或修改。 > 相對地,邏輯迴歸的數學結構透明,容易追蹤與除錯。 [Supervised vs Unsupervised Learning](https://aws.amazon.com/tw/compare/the-difference-between-machine-learning-supervised-and-unsupervised/) ::: ## Unsupervised Learning :::spoiler # **Unsupervised Machine Learning** Unsupervised machine learning is when you give the algorithm input data without any labeled output data. Then, on its own, the algorithm identifies patterns and relationships in and between the data. Next are some types of unsupervised learning techniques. > **非監督式學習(Unsupervised Machine Learning)** > 非監督式學習是指僅提供演算法輸入資料,而不附上任何標註的輸出資料。 > 此時,演算法會自行從資料中發現內部的模式與關聯性。 > 以下介紹幾種常見的非監督式學習技術。 --- ### ***Clustering*** The clustering unsupervised learning technique groups certain data inputs together, so they may be categorized as a whole. There are various types of clustering algorithms depending on the input data. An example of clustering is identifying different types of network traffic to predict potential security incidents. > **叢集分析(Clustering)** > 叢集分析是一種非監督式學習技術,用來將特定資料輸入分組,以便將其歸類為整體。 > 根據資料型態的不同,有多種不同的叢集演算法。 > 例如,可利用叢集分析識別不同類型的網路流量,以預測潛在的安全事件。 --- ### ***Association rule learning*** Association rule learning techniques uncover rule-based relationships between inputs in a dataset. For example, the Apriori algorithm conducts market basket analysis to identify rules like coffee and milk often being purchased together. > **關聯規則學習(Association Rule Learning)** > 關聯規則學習用於找出資料集中輸入變數之間的規則性關係。 > 例如,Apriori 演算法在「市場購物籃分析(market basket analysis)」中,能識別出「咖啡與牛奶常被同時購買」這類規則。 --- ### ***Probability density*** Probability density techniques in unsupervised learning predict the likelihood or possibility of an output’s value being within range of what is considered normal for an input. For example, a temperature gauge in a server room typically records between a certain degree range. However, if it suddenly measures a low number based on the probability distribution, it may indicate equipment malfunction. > **機率密度估計(Probability Density)** > 機率密度技術用於預測某輸入值的輸出結果是否落在「被視為正常」的範圍內。 > 例如,伺服室的溫度感測器通常記錄在一定範圍內。 > 若某次測量的溫度顯著低於機率分布中常見的範圍,則可能代表設備發生故障。 --- ### ***Dimensionality reduction*** Dimensionality reduction is an unsupervised learning technique that reduces the number of features in a dataset. It’s often used to preprocess data for other machine learning functions and reduce complexity and overheads. For example, it may blur out or crop background features in an image recognition application. 
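As a hedged illustration of two of the unsupervised techniques in this section, the scikit-learn sketch below reduces synthetic high-dimensional data with PCA and then clusters it with k-means. The dataset and parameter choices are made up for demonstration only.

```python
# Hedged scikit-learn sketch: dimensionality reduction (PCA) followed by
# clustering (k-means) on unlabeled, synthetic data.
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

# Unlabeled data: 300 points with 10 features, drawn around 3 hidden centers.
X, _ = make_blobs(n_samples=300, n_features=10, centers=3, random_state=42)

# Dimensionality reduction: keep the 2 directions with the most variance.
X_reduced = PCA(n_components=2).fit_transform(X)

# Clustering: group the points without any labels.
labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X_reduced)

print(X_reduced.shape)      # (300, 2)
print(sorted(set(labels)))  # cluster ids: [0, 1, 2]
```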
> **降維(Dimensionality Reduction)** > 降維是一種非監督式學習技術,用來減少資料集中的特徵數量。 > 它常用於為其他機器學習任務進行資料前處理,以降低複雜度與計算負擔。 > 例如,在影像辨識應用中,降維可能會模糊或裁剪背景特徵,以聚焦於主要目標。 --- 📖 [Source: *What is Logistic Regression? - Logistic Regression Model Explained - AWS (amazon.com)*](https://aws.amazon.com/what-is/logistic-regression/) ::: ## Reinforcement Learning :::spoiler # **Reinforcement Learning (RL) and RLHF** Reinforcement learning is an ML technique where an agent learns to make decisions by performing actions in an environment and receives rewards and punishments **for every action. This technique meets the requirement to train a model to navigate a complex, unpredictable environment by learning effective strategies through trial and error.** > **強化學習(Reinforcement Learning, RL)** > 強化學習是一種機器學習技術,透過讓智能體(agent)在環境中執行動作,並為每個動作獲得獎勵或懲罰,來學習如何做出最佳決策。 > 此技術特別適用於訓練模型在複雜且不可預測的環境中,透過「嘗試與錯誤(trial and error)」的過程學習有效策略。 --- Reinforcement learning (RL) combines fields such as computer science, neuroscience, and psychology to determine how to map situations to actions to maximize a numerical reward signal. > 強化學習(RL)結合了電腦科學、神經科學與心理學等領域, > 其核心目標是找出如何將「情境(situations)」對應至「行動(actions)」, > 以最大化「數值化獎勵信號(numerical reward signal)」的總量。 --- RL deals with interactive problems, making it infeasible to gather all possible examples of situations with correct labels that an agent might encounter. This type of learning is most promising when an agent is able to accurately learn from its own experience and adjust accordingly. > 強化學習主要處理「互動式問題(interactive problems)」, > 因此難以事先蒐集智能體可能遇到的所有情境與其正確標籤。 > 當智能體能夠準確地從自身經驗中學習並做出調整時, > 這類學習方式的效能與潛力最為顯著。 --- ## **What is RLHF?** Reinforcement learning from human feedback (RLHF) is a machine learning (ML) technique that uses human feedback to optimize ML models to self-learn more efficiently. Reinforcement learning (RL) techniques train software to make decisions that maximize rewards, making their outcomes more accurate. RLHF incorporates human feedback in the rewards function, so the ML model can perform tasks more aligned with human goals, wants, and needs. RLHF is used throughout generative artificial intelligence (generative AI) applications, including in large language models (LLM). > **什麼是人類回饋強化學習(Reinforcement Learning from Human Feedback, RLHF)?** > RLHF 是一種結合人類回饋的機器學習技術,用於優化模型的自我學習效率。 > 傳統強化學習(RL)訓練軟體透過最大化獎勵來改善決策準確性, > 而 RLHF 則在獎勵函數中融入「人類回饋」, > 使模型在學習過程中更貼近人類的目標、需求與價值。 > 這項技術廣泛應用於生成式人工智慧(Generative AI)系統中, > 包括大型語言模型(Large Language Models, LLM)。 --- 📖 [Source: AWS Documentation – *Reinforcement Learning and RLHF*](https://aws.amazon.com/) ::: ## Transfer Learning :::spoiler # **Transfer Learning** Transfer learning is an ML technique where a model that is pre-trained on one task is fine-tuned for a new, related task. This technique does not meet the requirement to train a model to navigate a complex, unpredictable environment by learning effective strategies through trial and error. > **遷移學習(Transfer Learning)** > 遷移學習是一種機器學習技術,透過在一項任務上預先訓練好的模型, > 再針對新的、相關的任務進行微調(fine-tuning)。 > 不同於強化學習,遷移學習並不依賴「嘗試與錯誤(trial and error)」的過程, > 因此不適用於需要模型在複雜或不可預測環境中自主探索策略的情境。 --- Transfer learning is a technique that involves taking a pre-trained model, such as a foundation model, and fine-tuning it on a new, domain-specific dataset. This method is highly suitable when you need precise, domain-specific responses because it allows the model to leverage the knowledge it already has from general-purpose training and adapt it to a particular use case. 
Fine-tuning a model with transfer learning ensures that the model retains its basic abilities while improving its performance in specialized areas. This approach is efficient because it requires less data and computation than training a new model from scratch, while delivering highly accurate, domain-specific results. > 遷移學習的核心概念,是將一個**已預訓練模型(pre-trained model)**, > 例如基礎模型(foundation model), > 在新的、特定領域的資料集上進行**微調(fine-tuning)**。 > > 這種方法特別適合需要**高精度、特定領域輸出**的應用, > 因為模型能利用其在一般性訓練中所獲得的知識, > 並將之轉化為針對特定任務的能力。 > > 經由遷移學習微調後,模型能夠**保留原有的基礎能力**, > 同時提升其在特定領域的表現。 > > 此方法相較於從零開始訓練模型, > 不僅更**高效(節省資料與運算資源)**, > 亦能提供**高度準確且具領域適應性**的結果。 --- 📖 [Source: *What is Transfer Learning? - AWS Documentation*](https://aws.amazon.com/what-is/transfer-learning/) ::: ## Data preprocessing :::spoiler # **Data Preprocessing** Data preprocessing puts data into the right shape and quality for training. There are many data preprocessing strategies including: data cleaning, balancing, replacing, imputing, partitioning, scaling, augmenting, and unbiasing. > **資料前處理(Data Preprocessing)** > 資料前處理的目的,是將原始資料整理成適合訓練的**正確形態與品質**。 > > 常見的資料前處理策略包括: > - **資料清理(data cleaning)**:移除錯誤或不完整的資料。 > - **資料平衡(balancing)**:處理分類資料中類別比例失衡的問題。 > - **資料替換(replacing)**:修正或替代異常值或遺漏值。 > - **資料填補(imputing)**:以統計或模型方法補齊缺失資料。 > - **資料分割(partitioning)**:將資料劃分為訓練集、驗證集與測試集。 > - **資料縮放(scaling)**:標準化或正規化數值資料。 > - **資料增強(augmenting)**:利用變形或合成技術擴增資料集。 > - **資料去偏(unbiasing)**:減少資料中的偏見,以提升模型公平性。 > > 透過這些步驟,資料能以一致、乾淨且適合訓練的形式輸入模型, > 為後續的學習與推論奠定基礎。 :::
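As a rough illustration of several of the preprocessing strategies listed above (imputing, scaling, partitioning), here is a hedged pandas/scikit-learn sketch on made-up data; the column names and split ratio are illustrative only.

```python
# Hedged sketch of a few common preprocessing steps before training:
# impute missing values, scale features, and partition the dataset.
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

df = pd.DataFrame({
    "age":    [25, 32, np.nan, 47, 51],
    "income": [48000, 61000, 52000, np.nan, 90000],
    "label":  [0, 0, 1, 1, 1],
})

features, labels = df[["age", "income"]], df["label"]

# Imputing: fill missing values with the column median.
features = pd.DataFrame(SimpleImputer(strategy="median").fit_transform(features),
                        columns=features.columns)

# Scaling: standardize each feature to zero mean and unit variance.
features = pd.DataFrame(StandardScaler().fit_transform(features),
                        columns=features.columns)

# Partitioning: hold out part of the data for evaluation.
X_train, X_test, y_train, y_test = train_test_split(features, labels,
                                                    test_size=0.4, random_state=0)
print(X_train.shape, X_test.shape)
```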