### [Entry page for AI / ML learning notes](https://hackmd.io/@YungHuiHsu/BySsb5dfp)

#### [Deeplearning.ai GenAI/LLM course note series](https://learn.deeplearning.ai/)
- [Large Language Models with Semantic Search](https://hackmd.io/@YungHuiHsu/rku-vjhZT)
- [Finetuning Large Language Models](https://hackmd.io/@YungHuiHsu/HJ6AT8XG6)
- [LangChain for LLM Application Development](https://hackmd.io/1r4pzdfFRwOIRrhtF9iFKQ)
- [Building and Evaluating Advanced RAG](https://hackmd.io/@YungHuiHsu/rkqGpCDca)

## [Generative AI with Large Language Models](https://www.deeplearning.ai/courses/generative-ai-with-llms/)

![image](https://hackmd.io/_uploads/BkXI2znTp.png =30x) ![image](https://hackmd.io/_uploads/S1uv3Ghpa.png =150x)

- [Week1-Generative AI use cases, project lifecycle, and model pre-training](https://hackmd.io/@YungHuiHsu/By7dsMnTp)
- [Week2-Fine-tuning and evaluating large language models](https://hackmd.io/@YungHuiHsu/HJQrU7npp)
- [Week3-Reinforcement learning and LLM-powered applications](https://hackmd.io/@YungHuiHsu/Hkbxu7h6T)

---

### Course summary
- **Course overview**
    - Created in partnership with AWS, the course covers the fundamentals of generative AI, hands-on skills, a functional understanding of the technology, and how to deploy it in real-world applications
    - Goes deep into the latest GenAI research, how companies use cutting-edge techniques to create value, and the key steps of the LLM-based generative AI lifecycle
    - Hands-on skills: a detailed look at the transformer architecture that powers LLMs, how they are trained, how fine-tuning adapts an LLM to specific use cases, and how empirical scaling laws are used to optimize a model's objective function

---

## Week 1 - Generative AI use cases, project lifecycle, and model pre-training

:::success
- Learning objectives
    * Discuss model pre-training and the value of continued pre-training vs. fine-tuning
    * Define the terms generative AI, large language model, and prompt, and describe the transformer architecture that powers LLMs
    * Describe the steps in a typical LLM-based generative AI model lifecycle, and discuss the constraints that drive decisions at each step
    * Discuss the computational challenges of model pre-training and identify **how to efficiently reduce the memory footprint** (a quick sketch follows below)
    * Define the term scaling laws and describe the laws discovered for LLMs relating training dataset size, compute budget, inference requirements, and other factors
:::

<iframe style="border:none" width="800" height="450" src="https://whimsical.com/embed/3HLYmuD2jkGRV9ghm5GQ4r@6HYTAunKLgTU8XzPC2JsoyBB2t5THHqYWLHEbK3rLFXj6MP"></iframe>
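The memory-footprint objective above can be grounded with simple arithmetic: at 32-bit full precision each parameter takes 4 bytes, and training needs several times more memory than storage alone because gradients and optimizer states must also be held. A minimal back-of-envelope sketch; the 4x training multiplier is a rough rule of thumb assumed here, not a figure from the course:

```python
# Back-of-envelope GPU memory estimate, assuming 32-bit floats
# (4 bytes per parameter).
BYTES_PER_PARAM_FP32 = 4

def memory_gb(n_params: float, training: bool = False) -> float:
    bytes_needed = n_params * BYTES_PER_PARAM_FP32
    if training:
        # Weights + gradients + Adam's two moment estimates: roughly 4x
        # the weight memory, before counting activations (assumption).
        bytes_needed *= 4
    return bytes_needed / 1e9

for n in (125e6, 1e9, 7e9):
    print(f"{n / 1e9:5.3f}B params: {memory_gb(n):6.1f} GB to store, "
          f"~{memory_gb(n, training=True):6.1f} GB to train")
```

This is why quantization to 16-bit or 8-bit weights, and the multi-GPU strategies covered later in the week, matter so much: halving the bytes per parameter halves every line of this estimate.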
### 1-1. Introduction to LLMs and the generative AI project lifecycle

#### Generative AI use cases, project lifecycle, and model pre-training
- This section introduces large language models (LLMs): their use cases, how they work, prompt engineering, methods for creative text output, and an overview of the generative AI project lifecycle
- Large language models are a subset of traditional machine learning; they learn their capabilities by finding statistical patterns in massive datasets of human-generated content
- Foundation models with billions of parameters exhibit emergent properties beyond language itself, and researchers are still uncovering their abilities to break down complex tasks, reason, and solve problems
- Custom solutions can be built quickly, either by prototyping with these models directly or by adapting them to specific use cases through fine-tuning
- Although generative AI models are being created for many modalities, this course focuses on large language models and their use in natural language generation
- When a prompt is passed to the model, the model predicts the next words; because the prompt contains a question, the model generates an answer
- The act of using the model to generate text is called inference; the completion consists of the text of the original prompt followed by the generated text

![image](https://hackmd.io/_uploads/B1jH1r2aT.png =400x)

```mermaid
graph LR;
    A[Prompt] --> B[Model];
    B --> C[Completion];
```

![image](https://hackmd.io/_uploads/ByuygrhTT.png =500x)

#### LLM use cases and tasks
- **Scope of LLM applications**
    - Not limited to chatbots: LLMs are widely used for writing, summarization, translation, generating code from natural language, information retrieval, and many other text-generation tasks.
- **Prompts and generation**
    - The prompt is the basis of every interaction with an LLM; the model generates text or code in response to it.
    - A carefully designed prompt can guide the model to complete a specific task more accurately.
- **Advanced interaction and fine-tuning**
    - **Connecting external data sources and calling APIs**
        - Lets the model access information that was unknown at pre-training time, extending its ability to interact with the real world.
    - **Model scale and language understanding**
        - As foundation models grow from hundreds of millions of parameters to billions and even hundreds of billions, their language understanding improves markedly.
    - **Fine-tuning**
        - Even smaller models can be optimized for a specific task through fine-tuning, improving performance.
    - **The importance of the architecture**
        - The rapid growth in LLM capability is largely due to the architecture that powers them, which lets them learn from and process vast amounts of data effectively.

![image](https://hackmd.io/_uploads/ByObIBna6.png =400x) ![image](https://hackmd.io/_uploads/rypw8H3Tp.png =400x)

#### Text generation before transformers
- RNNs

    ![image](https://hackmd.io/_uploads/SJrEPHhpp.png =300x) ![image](https://hackmd.io/_uploads/HJZ8Pr3Tp.png =300x) ![image](https://hackmd.io/_uploads/HkfODrhaa.png =300x)
- Generative algorithms are not new: earlier language models used an architecture called the recurrent neural network (RNN)
- RNNs were powerful for their time, but were limited on generative tasks by the compute and memory they required
- An RNN given a simple next-word prediction task predicts poorly when it has only seen a single preceding word
- Scaling an RNN to look at more preceding words requires a significant increase in resources, yet the prediction can still fail
- The complexity of natural language
    - To successfully predict the next word, a model needs to see more than just the previous few words: the whole sentence, or even the whole document
    - Language is highly complex: :pencil2: **the same word can carry multiple meanings in different contexts (homonyms), and only the surrounding sentence disambiguates it**
    - Sentence structure can be ambiguous, as in "The teacher taught the student with the book": it is unclear whether the teacher used the book to teach or the student had the book
- Understanding language can be challenging

    ![image](https://hackmd.io/_uploads/rJ5xjBnpp.png =300x) ![image](https://hackmd.io/_uploads/SJAYvrhap.png =200x)
- The arrival of the transformer architecture changed everything
    - **Scale efficiently**
    - **Parallel process**
    - **Attention to input meaning**

    In 2017, the paper "Attention Is All You Need" from Google and the University of Toronto changed everything. It introduced the transformer architecture, which scales efficiently across multi-core GPUs, processes input data in parallel to exploit much larger training datasets, and, crucially, learns to pay **attention** to the meaning of the words it processes

#### Transformer architecture
:::success
- **Advantages and applications of the transformer architecture**
    - Dramatically improved performance on natural language tasks
    - Led to an explosion of generative capability
    - Learns the relevance and context of every word in a sentence, not just of neighboring words

Detailed notes: [[Transformer] Self-Attention and Transformer](https://hackmd.io/fmJx3K4ySAO-zA0GEr0Clw)
:::

![image](https://hackmd.io/_uploads/B1s97U366.png =300x)
- Self-attention

    ![image](https://hackmd.io/_uploads/By9EhShaa.png =300x) ![image](https://hackmd.io/_uploads/rJx9nBnTa.png =150x)![image](https://hackmd.io/_uploads/SJM9AShTa.png =100x)![image](https://hackmd.io/_uploads/HJxLkL26T.png =300x)![image](https://hackmd.io/_uploads/Sk2j0S3a6.png =220x)![image](https://hackmd.io/_uploads/Hy_GyLhpT.png =170x)![image](https://hackmd.io/_uploads/rkF4yIn6T.png =200x) ![image](https://hackmd.io/_uploads/SyVvJLhaT.png =200x)![image](https://hackmd.io/_uploads/rJatyI2T6.png =180x)![image](https://hackmd.io/_uploads/ByopkU2Ta.png =200x)
- **Structure of the transformer architecture**
    - Split into two parts: the encoder and the decoder
    - The two parts share many similarities
- **How text is processed**
    - Text is converted to numbers by tokenization
    - The tokenized input passes into an embedding layer
    - The embedding vector space encodes the meaning and context of each individual token
- **The role of the self-attention layer** (a minimal numerical sketch follows this section)
    - Token embeddings, together with positional encodings, feed into the self-attention layer
    - There, the model analyzes the relationships between the tokens of the input sequence
- **Multi-headed self-attention**
    - The transformer uses multi-headed self-attention: multiple sets of self-attention weights, or heads, are learned independently and in parallel
    - Each head may learn a different aspect of language
- **Output processing**
    - The output is processed by a fully connected feed-forward network
    - A final softmax layer converts it into a probability score for every word in the vocabulary
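To make the self-attention step concrete, here is a minimal numpy sketch of single-head scaled dot-product attention in the style of "Attention Is All You Need"; the tiny dimensions and random weights are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 4, 8, 8          # 4 tokens, toy embedding size

X = rng.normal(size=(seq_len, d_model))  # token embeddings + positional encodings
W_q = rng.normal(size=(d_model, d_k))    # learned projection matrices
W_k = rng.normal(size=(d_model, d_k))
W_v = rng.normal(size=(d_model, d_k))

Q, K, V = X @ W_q, X @ W_k, X @ W_v      # queries, keys, values

scores = Q @ K.T / np.sqrt(d_k)          # relevance of every token to every other token
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)   # softmax: each row sums to 1

output = weights @ V                     # each token's output is a relevance-weighted mix of all tokens
print(weights.round(2))                  # the attention map between tokens
```

Multi-headed self-attention simply repeats this computation with several independently learned `W_q`, `W_k`, `W_v` triples and concatenates the results, which is why each head is free to specialize in a different aspect of language.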
#### Generating text with transformers

![image](https://hackmd.io/_uploads/BJG1zU2aT.png =400x)
- **Translation task example**
    - A transformer model is used to translate a French phrase into English
    - The process: tokenize the input words with the same tokenizer used to train the network, then pass them through the encoder's embedding layer, multi-headed attention layers, and feed-forward network to the encoder output
    - The encoder output is a deep representation of the structure and meaning of the input sequence; it is inserted into the middle of the decoder, where it influences the decoder's self-attention mechanisms
    - Using the contextual understanding provided by the encoder, the decoder predicts the next token, repeating until the model predicts an end-of-sequence token
    - The final token sequence is detokenized back into words, yielding the output
- **Multiple ways to predict the output**
    - There are several ways to pick the next token from the softmax output, and the choice affects how creative the generated text is
- **Transformer architecture summary**
    - The complete transformer architecture consists of an encoder and a decoder
    - The encoder encodes the input sequence into a deep representation; the decoder uses the encoder's contextual understanding to generate new tokens
    - The translation example used both the encoder and the decoder, but the two components can also be used separately, giving variants of the architecture
- **Model types**
    - Encoder-only models:
        - Work as sequence-to-sequence models in which the input and output sequences have the same length, e.g. BERT
    - Encoder-decoder models:
        - Suit sequence-to-sequence tasks such as translation, where the input and output sequences can differ in length, e.g. BART and T5
    - Decoder-only models:
        - e.g. the GPT family, BLOOM, Jurassic, LLaMA; the most commonly used today, and able to generalize to most tasks

![image](https://hackmd.io/_uploads/Hk_ozLh66.png =400x)

- **Course goals**
    - Provide enough background to understand the differences between the many models in use and to read model documentation
    - Introduce prompt engineering, i.e. creating prompts in natural language rather than in code, explored in the next part of the course

#### Prompting and prompt engineering
- **Basic terms**
    - **Prompt**: the text fed into the model.
    - **Inference**: the act of generating text.
    - **Completion**: the output text.
    - **Context window**: the total amount of text, or memory, available for the prompt.
- **Prompt engineering**
    - Revising the language or wording of the prompt to obtain the desired result
    - One powerful strategy is to include, inside the prompt, examples of the task you want the model to carry out.
- **In-context learning (ICL)**: helping an LLM learn a task by including examples or extra data in the prompt (a prompt-construction sketch appears at the end of these notes)
    - **Zero-shot inference**: asking the model to complete the task from the prompt alone, without examples

        ![image](https://hackmd.io/_uploads/S1yH48nTT.png =400x)
    - **One-shot inference**: including a single example in the prompt

        ![image](https://hackmd.io/_uploads/BkBFNLhT6.png =400x)
    - **Few-shot inference**: including multiple examples in the prompt

        ![image](https://hackmd.io/_uploads/HkvaHI3pa.png =500x)
- **Model performance and scale**
    - Large models excel at zero-shot inference and can complete many tasks they were never explicitly trained for
    - Smaller models are usually good only at a small number of tasks similar to the ones they were trained on
- **Fine-tuning**
    - Additional training on new data to make the model better at a specific task
    - If adding more examples to the prompt does not improve performance, consider fine-tuning the model
- **Model selection and configuration**
    - Try different models for your use case to find the right one
    - Once you find a suitable model, experiment with the settings that shape the structure and style of the completions it generates

#### Generative configuration
#### Generative AI project lifecycle
#### Introduction to AWS labs
#### Lab 1 - Generative AI Use Case: Summarize Dialogue

---

### 1-2. LLM pre-training and scaling laws
#### Pre-training large language models
#### Computational challenges of training LLMs
#### [Optional video: Efficient multi-GPU compute strategies]
#### Scaling laws and compute-optimal models
#### Pre-training for domain adaptation
#### Domain-specific training: BloombergGPT
#### Week 1 quiz
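To close, here is the prompt-construction sketch referenced in the in-context-learning notes above. It is a hypothetical helper, not course code; the sentiment-review template and the example reviews are made up for illustration. The same builder is zero-shot when no examples are passed, one-shot with one, and few-shot with several:

```python
def make_prompt(task_instruction, examples, query):
    """Build an in-context-learning prompt: zero-shot when `examples`
    is empty, one-shot with one example, few-shot with several."""
    parts = [task_instruction]
    for text, label in examples:                     # each example demonstrates the task
        parts.append(f"Review: {text}\nSentiment: {label}")
    parts.append(f"Review: {query}\nSentiment:")     # the model completes this last line
    return "\n\n".join(parts)

examples = [
    ("I loved this movie!", "Positive"),
    ("What a waste of two hours.", "Negative"),
]

print(make_prompt("Classify the sentiment of each review.", [], "Not bad at all."))        # zero-shot
print(make_prompt("Classify the sentiment of each review.", examples, "Not bad at all.")) # few-shot
```

Everything before the final `Sentiment:` line is context the model conditions on; the completion it generates for that line is the answer, which is why adding examples steers smaller models without any retraining.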
