Course summary
- Course overview
  - A course created in partnership with AWS, covering the fundamentals of generative AI, hands-on skills, a functional understanding of how it works, and how to deploy it in real-world applications
  - Dive deep into the latest Gen AI research, how companies are using cutting-edge technology to create value, and the key steps in the lifecycle of LLM-based generative AI
  - Hands-on skills: a detailed look at the transformer architecture that powers LLMs, how it is trained, how fine-tuning adapts LLMs to a variety of specific use cases, and how empirical scaling laws are used to optimize a model's objective function
Week 1 - Generative AI use cases, project lifecycle, and model pre-training
- Learning objectives
  - Discuss model pre-training and the value of continued pre-training versus fine-tuning
  - Define the terms generative AI, large language model, and prompt, and describe the transformer architecture that powers LLMs
  - Describe the steps in a typical LLM-based generative AI model lifecycle and discuss the constraining factors that drive decisions at each step of the lifecycle
  - Discuss the computational challenges during model pre-training and determine how to efficiently reduce the memory footprint
  - Define the term scaling laws and describe the laws that have been discovered for LLMs relating to training dataset size, compute budget, inference requirements, and other factors
1-1. Introduction to LLMs and the generative AI project lifecycle
Generative AI use cases, project lifecycle, and model pre-training
- The course introduces large language models (LLMs) and their use cases, how they work, prompt engineering, methods for producing creative text output, and an overview of the generative AI project lifecycle
- Large language models are a subset of traditional machine learning; they learn their capabilities by finding statistical patterns in massive datasets of content originally generated by humans
- Foundation models have billions of parameters and exhibit emergent properties beyond language itself; researchers are uncovering their ability to break down complex tasks, reason, and solve problems
- Custom solutions can be built quickly by prototyping with these models as-is or by applying fine-tuning techniques to adapt them to specific use cases
- Although generative AI models are being created for multiple modalities, this course focuses on large language models and their use in natural language generation
- The prompt is passed to the model, the model then predicts the next words, and because the prompt contained a question, the model generates an answer
- The act of using the model to generate text is called inference; the completion consists of the text from the original prompt followed by the generated text
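To make the prompt-to-completion flow concrete, below is a minimal inference sketch. It assumes the Hugging Face transformers library and uses "gpt2" purely as a small illustrative checkpoint; the prompt text is also just an example, not part of the course material.

```python
# Minimal sketch of LLM inference: a prompt goes in, a completion comes out.
# Assumes the Hugging Face `transformers` library; "gpt2" is an illustrative small model.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompt = "Where is Ganymede located in the solar system?"

# The model repeatedly predicts the next token; the returned completion
# contains the original prompt followed by the newly generated text.
completion = generator(prompt, max_new_tokens=40)[0]["generated_text"]
print(completion)
```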
LLM use cases and tasks
- Range of LLM applications
  - Not limited to chatbots; widely used for many text-generation tasks such as writing, summarization, translation, generating machine code from natural language, and information retrieval
- Prompts and generation
  - Prompts are the basis for interacting with LLMs; the model generates the corresponding text or code in response to a prompt
  - Carefully designed prompts can guide the model to complete specific tasks more accurately (see the sketch after this list)
- Advanced interaction and fine-tuning of LLMs
  - Connecting to external data sources and calling APIs
    - Lets the model access information it did not see during pre-training and extends its ability to interact with the real world
  - Model scale and language understanding
    - As the number of parameters in foundation models grows from billions to tens or even hundreds of billions, their language understanding improves significantly
  - Fine-tuning
    - Even smaller models can be fine-tuned to specialize in particular tasks and improve their performance
  - The importance of architecture
    - The rapid growth in LLM capability is largely due to the advanced architecture, which lets these models learn from and process large amounts of data effectively
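To illustrate how different text-generation tasks can be driven purely by the prompt, here is a small sketch; the flan-t5-base checkpoint and the prompt wordings are assumptions for illustration, not the course's exact setup.

```python
# Sketch: one instruction-tuned model handling several tasks, driven only by the prompt.
# Assumes Hugging Face `transformers`; "google/flan-t5-base" is an illustrative choice.
from transformers import pipeline

llm = pipeline("text2text-generation", model="google/flan-t5-base")

prompts = [
    "Summarize: The meeting covered the Q3 roadmap, hiring plans, and budget risks.",
    "Translate English to French: The weather is nice today.",
    "Answer the question: What is the capital of Japan?",
]

for prompt in prompts:
    output = llm(prompt, max_new_tokens=40)[0]["generated_text"]
    print(f"{prompt}\n  -> {output}\n")
```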
Text generation before transformers
- RNN
  - Generative algorithms are not new; earlier generations of language models used an architecture called recurrent neural networks (RNNs)
  - RNNs were powerful for their time, but were limited in generative tasks by the amount of compute and memory they required
  - An RNN performing a simple next-word prediction task predicts poorly if it has only seen one preceding word
  - Scaling the RNN up to look at more preceding words in the text requires a significant increase in the resources the model uses, and the prediction can still fail
- The complexity of natural language
  - For a model to successfully predict the next word, it needs to see more than just the previous few words; it needs the whole sentence or even the whole document
  - Language is highly complex: the same word can have multiple meanings in different contexts (homonyms), and only the sentence context makes the intended meaning clear
  - Sentence structure can be ambiguous or carry syntactic ambiguity, for example "The teacher taught the student with the book": it is hard to tell whether the teacher used the book to teach or the student had the book
- Understanding language can be challenging
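For intuition about what a pre-transformer next-word predictor looks like, here is a minimal PyTorch RNN sketch with toy dimensions (an assumed illustration, not code from the course):

```python
# Toy sketch of RNN next-word prediction (assumed sizes, not course code).
import torch
import torch.nn as nn

vocab_size, embed_dim, hidden_dim = 1000, 64, 128

embedding = nn.Embedding(vocab_size, embed_dim)
rnn = nn.RNN(embed_dim, hidden_dim, batch_first=True)
to_vocab = nn.Linear(hidden_dim, vocab_size)

# A batch of one sequence of 5 token ids: the preceding words the model can "see".
context = torch.randint(0, vocab_size, (1, 5))

hidden_states, _ = rnn(embedding(context))      # (1, 5, hidden_dim)
logits = to_vocab(hidden_states[:, -1, :])      # predict from the final hidden state
next_word_id = torch.softmax(logits, dim=-1).argmax(dim=-1)
print(next_word_id.item())

# Looking at more preceding words means a longer sequence whose entire meaning
# must be squeezed into one hidden state, which is where RNNs hit their limits.
```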
- The arrival of the transformer architecture was a revolutionary change
  - Scale efficiently
  - Parallel process
  - Attention to input meaning
  - Everything changed in 2017, after Google and the University of Toronto published the paper "Attention Is All You Need", which introduced the transformer architecture. It can scale efficiently to use multi-core GPUs, process input data in parallel, and make use of much larger training datasets; crucially, it is able to learn to pay attention to the meaning of the words it is processing
- Self-attention
- Structure of the transformer architecture
  - Split into two parts: an encoder and a decoder
  - The two parts share many similarities
- The process of handling text
  - Converting text into numbers: tokenization
  - The tokenized input is passed to the embedding layer
  - The embedding vector space is used to encode the meaning and context of each individual token
- The role of the self-attention layer
  - The input tokens, together with positional encodings, are fed into the self-attention layer
  - The model analyzes the relationships between the tokens in the input sequence
- The concept of multi-headed self-attention
  - The transformer architecture has multi-headed self-attention, meaning multiple sets of self-attention weights, or heads, are learned independently and in parallel
  - Each self-attention head may learn a different aspect of language
- Output processing
  - The output is processed through a fully connected feed-forward network
  - Finally, a softmax layer turns it into a probability score for every word in the vocabulary (see the sketch below)
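To tie the pieces above together, here is a compact toy sketch of a transformer-style forward pass: tokenized ids are embedded, positional information is added, multi-headed self-attention and a feed-forward network are applied, and a softmax produces a probability for every word in the vocabulary. All sizes and the use of PyTorch modules are assumptions for illustration, not the course's implementation.

```python
# Toy sketch of one transformer-style forward pass (assumed sizes, not course code).
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab_size, d_model, n_heads, seq_len = 1000, 64, 4, 8

token_embedding = nn.Embedding(vocab_size, d_model)
position_embedding = nn.Embedding(seq_len, d_model)
self_attention = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
feed_forward = nn.Sequential(
    nn.Linear(d_model, 4 * d_model), nn.ReLU(), nn.Linear(4 * d_model, d_model)
)
to_vocab = nn.Linear(d_model, vocab_size)

# Stand-in for tokenization: a batch of one sequence of token ids.
token_ids = torch.randint(0, vocab_size, (1, seq_len))
positions = torch.arange(seq_len).unsqueeze(0)

# Embeddings encode token meaning; positional embeddings preserve word order.
x = token_embedding(token_ids) + position_embedding(positions)

# Multi-headed self-attention: several heads attend to the sequence in parallel,
# each potentially capturing a different aspect of the relationships between tokens.
attended, attention_weights = self_attention(x, x, x)

# Fully connected feed-forward network, then a softmax over the whole vocabulary.
logits = to_vocab(feed_forward(attended))
probabilities = F.softmax(logits, dim=-1)   # shape: (1, seq_len, vocab_size)
print(probabilities.shape)
```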
Generating text with transformers
- Translation task example
  - Using a transformer model to translate a French phrase into English
  - The process: tokenize the input words with the same tokenizer that was used to train the network, then pass them through the encoder's embedding layer, multi-headed attention layers, and feed-forward network to reach the encoder output
  - The encoder's output represents the deep structure and meaning of the input sequence; this representation is inserted into the middle of the decoder, where it influences the decoder's self-attention mechanisms
  - Based on the contextual understanding provided by the encoder, the decoder predicts the next token, repeating until the model predicts an end-of-sequence token
  - The final sequence of tokens is detokenized back into words to produce the output (see the sketch after this list)
- Multiple ways to predict the output
  - There are several ways to predict the next token from the softmax layer's output, and these choices influence how creative the generated text is
- Transformer architecture summary
  - The complete transformer architecture includes an encoder and a decoder
  - The encoder encodes the input sequence into a deep representation; the decoder uses the encoder's contextual understanding to generate new tokens
  - The translation example showed the encoder and decoder used together, but these components can also be split apart and used in variants of the architecture
- Model types
  - Encoder-only models:
    - Used as sequence-to-sequence models where the input and output sequences are the same length, e.g. BERT
  - Encoder-decoder models:
    - Suited to sequence-to-sequence tasks such as translation, where the input and output sequences can have different lengths, e.g. BART and T5
  - Decoder-only models:
    - e.g. the GPT family, BLOOM, Jurassic, LLaMA; the most commonly used today, able to generalize to most tasks
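As a small, concrete illustration of the encoder-decoder pattern on a translation task, here is a sketch under assumptions: the Helsinki-NLP/opus-mt-fr-en checkpoint and the example phrase are illustrative choices, not from the course.

```python
# Sketch: French-to-English translation with an encoder-decoder transformer.
# Assumes Hugging Face `transformers`; the checkpoint name is an illustrative choice.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "Helsinki-NLP/opus-mt-fr-en"
tokenizer = AutoTokenizer.from_pretrained(model_name)   # same tokenizer the network was trained with
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Tokenize the French input; the encoder turns it into a deep representation of its meaning.
inputs = tokenizer("J'aime l'apprentissage automatique.", return_tensors="pt")

# The decoder generates one token at a time until it predicts the end-of-sequence token.
output_ids = model.generate(**inputs, max_new_tokens=30)

# Detokenize the generated token ids back into words.
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```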

- Course goals
  - Provide enough background to understand the differences between the various models in use today and to be able to read model documentation
  - Introduce prompt engineering, i.e. crafting prompts in natural language rather than code, which is explored in the next part of the course
Prompting and prompt engineering

- Model performance and scale
  - Large models perform well at zero-shot inference and can complete many tasks they were not specifically trained for
  - Smaller models are typically only good at a small number of tasks similar to the ones they were trained on
- Fine-tuning
  - Performing additional training on the model with new data to make it better at a specific task
  - If including more examples in the prompt does not improve the model's performance, consider fine-tuning the model instead
- Model selection and configuration
  - Try different models for your use case to find the one that fits
  - Once you have found a suitable model, you can experiment with different settings that influence the structure and style of the completions the model generates (see the sketch below)
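The sketch below shows zero-shot versus one-shot prompting together with a few common generation settings (temperature, top-k, top-p, max_new_tokens). The checkpoint, prompt wording, and parameter values are assumptions for illustration rather than the course's exact configuration.

```python
# Sketch: in-context learning plus generation settings (illustrative values only).
from transformers import pipeline

llm = pipeline("text2text-generation", model="google/flan-t5-base")

# Zero-shot: the instruction alone, with no worked example.
zero_shot = "Classify the sentiment of this review: 'I loved this movie!'"

# One-shot: include a worked example in the prompt before the real input.
one_shot = (
    "Classify the sentiment of this review: 'What a waste of time.'\n"
    "Sentiment: negative\n\n"
    "Classify the sentiment of this review: 'I loved this movie!'\n"
    "Sentiment:"
)

for prompt in (zero_shot, one_shot):
    # Sampling settings shape the structure and style of the completion:
    # higher temperature and broader top-k/top-p give more varied output.
    result = llm(prompt, max_new_tokens=10, do_sample=True,
                 temperature=0.7, top_k=50, top_p=0.9)
    print(result[0]["generated_text"])
```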
Generative configuration
Generative AI project lifecycle
Introduction to AWS labs
Lab 1 - Generative AI Use Case: Summarize Dialogue
1-2. LLM pre-training and scaling laws
Pre-training large language models
Computational challenges of training LLMs
[Optional video: Efficient multi-GPU compute strategies]
Scaling laws and compute-optimal models
Pre-training for domain adaptation
Domain-specific training: BloombergGPT
Week 1 quiz