AI / ML study notes: index page

Deeplearning.ai GenAI/LLM course series notes

2023.09. The Large Language Model Revolution

Dr. Ed H. Chi (紀懷新)
- One of the authors of the CoT paper (https://arxiv.org/abs/2201.11903)

Can we teach LLMs like we teach kids?

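With chain-of-thought prompting, the model produces an explanation or reasoning segment in addition to the input and output, which helps show how the language model arrived at its answer.

Compared with standard prompting, chain-of-thought prompting guides the model's reasoning more accurately.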

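When asking a question, provide a prompt template whose worked example includes the step-by-step reasoning behind the answer.
e.g.:
- Example question: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now?
- Example answer: Roger started with 5 tennis balls. 2 cans of 3 tennis balls each is 6 tennis balls. 5 + 6 = 11. The answer is 11.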
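
The template idea above can be sketched in a few lines of Python. This is only an illustration: `build_prompt` and the constant names are hypothetical, the "Q:/A:" layout is an assumed convention, and only the Roger exemplar comes from the notes. The sketch shows the one difference between a standard prompt and a chain-of-thought prompt, which is whether the exemplar answer includes its reasoning.

```python
# Hypothetical names; only the Roger exemplar is taken from the notes.
EXEMPLAR_QUESTION = (
    "Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
    "Each can has 3 tennis balls. How many tennis balls does he have now?"
)
# Chain-of-thought exemplar answer: reasoning first, then the final answer.
COT_EXEMPLAR_ANSWER = (
    "Roger started with 5 tennis balls. 2 cans of 3 tennis balls each is "
    "6 tennis balls. 5 + 6 = 11. The answer is 11."
)
# Standard ("answer-only") exemplar answer: no reasoning shown.
STANDARD_EXEMPLAR_ANSWER = "The answer is 11."


def build_prompt(question: str, chain_of_thought: bool = True) -> str:
    """Build a one-shot prompt; the Q:/A: layout is an assumed convention."""
    exemplar_answer = (
        COT_EXEMPLAR_ANSWER if chain_of_thought else STANDARD_EXEMPLAR_ANSWER
    )
    return (
        f"Q: {EXEMPLAR_QUESTION}\n"
        f"A: {exemplar_answer}\n\n"
        f"Q: {question}\n"
        "A:"
    )


if __name__ == "__main__":
    print(build_prompt("A shop has 4 boxes of 6 pens. How many pens are there?"))
```

The resulting string would be sent to whatever model is in use; the model call is deliberately left out because the notes do not name an API.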

Apply Chain-of-Thought to Any Task

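On the challenging BIG-Bench Hard tasks, Codex with chain-of-thought prompting performs better than Codex with standard prompting, and better relative to the average human rater.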

Standard "answer-only" prompting:
- Codex outperforms humans on only a few tasks.
Chain-of-thought prompting:
- Compared with standard prompting, Codex with chain-of-thought prompting beats the average human rater on more tasks.

Self-consistency Decoding

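The language model reasons along multiple paths and settles on the final answer by majority vote. This approach can improve the consistency and accuracy of the model's answers.

The left side shows the chain-of-thought exemplars used to answer the question:

Question: Shawn has five toys. For Christmas, he got two toys each from his mom and dad. How many toys does he have now?
Answer: Shawn started with 5 toys. He got 2 toys each from his mom and dad, which is 4 more toys. 5 + 4 = 9. The answer is 9.
Question: Janet's ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends with four. She sells the remainder for $2 per egg. How much does she make every day?
Answer: (No answer is given here; the model generates it, as shown on the right side.)

The right side shows the answers obtained from multiple reasoning paths:

She has 16 - 3 - 4 = 9 eggs left. So she makes 9 * $2 = $18 per day. The answer is $18.
This means she uses 3 + 4 = 7 eggs every day. So she sells 7 * $2 = $14 of eggs. The answer is $14.
She eats three for breakfast, so 16 - 3 = 13 are left. Then she uses 4 to bake muffins, so she has 13 - 4 = 9 eggs left. So she makes 9 * $2 = $18. The answer is $18.
Across the reasoning paths, most results are $18, so the final answer is $18.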
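
The voting step can be sketched as below. This is a minimal illustration, not the method's reference implementation: the function names are hypothetical, the answer is assumed to follow the phrase "The answer is", and the three sampled paths are hard-coded from the example above instead of being sampled from a model.

```python
import re
from collections import Counter
from typing import List, Optional


def extract_answer(reasoning_path: str) -> Optional[str]:
    """Return the number after the last 'The answer is' (assumed format)."""
    matches = re.findall(r"The answer is \$?(-?\d+(?:\.\d+)?)", reasoning_path)
    return matches[-1] if matches else None


def majority_vote(reasoning_paths: List[str]) -> Optional[str]:
    """Pick the most frequent final answer across the sampled paths."""
    answers = [a for a in map(extract_answer, reasoning_paths) if a is not None]
    if not answers:
        return None
    return Counter(answers).most_common(1)[0][0]


# In practice these would be several sampled generations for one prompt;
# here they are the three paths from the Janet example.
sampled_paths = [
    "She has 16 - 3 - 4 = 9 eggs left. So she makes 9 * $2 = $18 per day. "
    "The answer is $18.",
    "This means she uses 3 + 4 = 7 eggs every day. So she sells 7 * $2 = $14 "
    "of eggs. The answer is $14.",
    "She eats three for breakfast, so 16 - 3 = 13 are left. Then she uses 4, "
    "so she has 13 - 4 = 9 eggs left. So she makes 9 * $2 = $18. "
    "The answer is $18.",
]
final_answer = majority_vote(sampled_paths)

if __name__ == "__main__":
    print(final_answer)
```

Only the final answers are compared, so two paths that reason differently but agree on $18 reinforce each other, while the single $14 path is outvoted.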

Least-to-most Prompting:

Enables easy-to-hard generalization

Decompose a complex problem into a list of subproblems:
- Order subproblems with increasing complexities (from least to most complex).
Sequentially solve the subproblems.

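Use the language model to decompose the problem, then solve the subproblems in order to reach the final answer.
1. First identify the main question to answer, then identify the information needed to answer it.
2. Then solve each subproblem in turn and obtain its answer.
This approach helps keep the answers accurate and consistent.

Stage 1: Decompose the question into subquestions
- Question: It takes Amy 4 minutes to climb to the top of a slide. It takes her 1 minute to slide down. The water slide closes in 15 minutes. How many times can she slide before it closes?
- Answer: To solve "How many times can she slide before it closes?", we first need to solve "How long does each trip take?"
Stage 2: Solve the subquestions sequentially
- Subquestion 1
  - Question: How long does each trip take?
  - Answer: It takes Amy 4 minutes to climb to the top of the slide and 1 minute to slide down. So 4 + 1 = 5. Each trip takes 5 minutes.
- Subquestion 2
  - Question: How many times can she slide before the slide closes?
  - Answer: The water slide closes in 15 minutes. Each trip takes 5 minutes. So 15 ÷ 5 = 3. Amy can slide 3 times before it closes.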
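
Stage 2 can be sketched as a loop that appends each solved subquestion to the prompt before asking the next one. The sketch below makes several assumptions: `solve_least_to_most` and `canned_solver` are hypothetical names, the decomposition from stage 1 is supplied as a fixed list instead of being generated by a model, and the "solver" returns canned answers from the Amy example so the code runs without any model.

```python
from typing import Callable, List, Tuple


def solve_least_to_most(
    context: str, subquestions: List[str], solve: Callable[[str], str]
) -> List[Tuple[str, str]]:
    """Solve subquestions in order; earlier answers stay in the prompt."""
    prompt = context
    solved = []
    for subquestion in subquestions:
        prompt += f"\nQ: {subquestion}\nA:"
        answer = solve(prompt)
        prompt += f" {answer}"  # later subquestions can build on this answer
        solved.append((subquestion, answer))
    return solved


CONTEXT = (
    "It takes Amy 4 minutes to climb to the top of a slide. It takes her "
    "1 minute to slide down. The water slide closes in 15 minutes."
)
# Stage 1 output, ordered from least to most complex (taken from the notes).
SUBQUESTIONS = [
    "How long does each trip take?",
    "How many times can she slide before it closes?",
]
# Stand-in for a model: canned answers keyed by subquestion.
CANNED_ANSWERS = {
    SUBQUESTIONS[0]: "It takes Amy 4 minutes to climb and 1 minute to slide "
    "down. 4 + 1 = 5. So each trip takes 5 minutes.",
    SUBQUESTIONS[1]: "The water slide closes in 15 minutes. Each trip takes "
    "5 minutes. So Amy can slide 15 / 5 = 3 times before it closes.",
}
seen_prompts: List[str] = []


def canned_solver(prompt: str) -> str:
    """Record the prompt, then answer the last question it contains."""
    seen_prompts.append(prompt)
    last_question = prompt.rsplit("\nQ: ", 1)[1].split("\nA:")[0]
    return CANNED_ANSWERS[last_question]


solved = solve_least_to_most(CONTEXT, SUBQUESTIONS, canned_solver)

if __name__ == "__main__":
    for question, answer in solved:
        print(f"Q: {question}\nA: {answer}\n")
```

The point of the design is visible in `seen_prompts`: the second prompt already contains the answer "each trip takes 5 minutes", so the harder question is asked with the easier one resolved.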

Instruction Finetuning: Enables zero-shot prompting

FLAN Instruction Tuning

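Left figure:
- Shows how the model is finetuned on a diverse collection of instructions. The instructions use simple, intuitive task descriptions, such as "Classify this movie review as positive or negative" or "Translate this sentence to Danish".
- Example: a Premise is given: "Russian cosmonaut Valery Polyakov set the record for the longest time spent in space". Below it is a Target example, "Russians hold the record for the longest stay in space", with two options: yes or no.
Right figure:
- Instruction tuning improves performance on unseen tasks only for models above a certain size (a "magic" threshold that is still unexplained).
- The x-axis is model size (number of parameters).
- The y-axis is average test accuracy.
- There are two lines: one for the instruction-tuned model and one for the untuned model. As model size increases, the average test accuracy of the instruction-tuned model also rises.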
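
As a rough illustration of the left figure, the sketch below turns the premise example into a natural-language instruction with answer options. Everything here is assumed for illustration: `format_instruction` is a hypothetical helper, and the template wording and "OPTIONS" layout are not taken from the notes.

```python
from typing import List


def format_instruction(premise: str, statement: str, options: List[str]) -> str:
    """Phrase an entailment example as an instruction (assumed template)."""
    option_lines = "\n".join(f"- {option}" for option in options)
    return (
        f"{premise}\n\n"
        f"Based on the paragraph above, can we conclude that "
        f'"{statement}"?\n\n'
        f"OPTIONS:\n{option_lines}"
    )


prompt = format_instruction(
    premise=(
        "Russian cosmonaut Valery Polyakov set the record for the longest "
        "time spent in space."
    ),
    statement="Russians hold the record for the longest stay in space",
    options=["yes", "no"],
)

if __name__ == "__main__":
    print(prompt)
```

Writing the same example with several differently worded templates is one way to get the "diverse collection of instructions" the figure describes.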

FLAN2

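FLAN2 scales instruction finetuning to more than 1,800 tasks and larger models, and adds chain-of-thought data to the finetuning mix.

Instruction finetuning:
- Description: shows how the language model is given a specific instruction and produces an answer.
- Example: the question is "What is the boiling point of nitrogen?" and the model's answer is "-320°F".
Chain-of-thought finetuning:
- Description: shows how the model is given questions together with step-by-step answers, so that it learns to imitate a continuous reasoning process.
- Example: the cafeteria had 23 apples, used 20, and then bought 6 more. The question is how many apples they have now. The model's answer: 23 - 20 = 3 were left, so they now have 3 + 6 = 9 apples.
Multi-task instruction finetuning:
- Description: mixes many tasks together so that the model generalizes to tasks it has never seen.
- Example: the question is "Can Geoffrey Hinton have a conversation with George Washington?", with a rationale required before the answer. The answer notes that Geoffrey Hinton is a British-Canadian computer scientist born in 1947 and that George Washington died in 1799, so the answer is "no".
References:
- Scaling Instruction-Finetuned Language Models
- promptingguide.ai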

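Reasoning Summary

Chain-of-thought prompting
- <question ==> explanation ==> answer>
- This prompting method asks the language model to give not only the answer but also the reasoning or explanation behind it.
- This lets users see how the model reached its answer and have more confidence in the result.
Self-consistency
- Generate multiple answers to the same question, then choose the most common one.
- The idea is that sampling the answer several times yields a more stable and reliable result. An answer that appears many times is more likely to be correct.
Least-to-most prompting
- Decompose the problem and solve the subproblems.
- This strategy lets the model first focus on specific parts of the problem or on simpler subproblems, then use those answers to solve the whole problem. It helps keep the model from being overwhelmed by a complex problem and lets it reach the answer step by step.
Instruction finetuning
- Teach large language models (LLMs) to follow instructions.
- Training puts special emphasis on following the given instructions. When given a concrete set of instructions, the model then follows and executes them more precisely and produces answers that better match the request.

References and background

Recording: [2023.09. KDD 2023. The Large Language Model Revolution: Implications from Chatbots and Tool-use to Reasoning]
LLM Day website
Taiwan AI annual conference (台灣人工智慧年會)
ihower's screenshot notes
Highlights on Large Language Models at KDD 2023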
