
Speculating from Research on How Large Language Models Will Lead a New Revolution in Human-Computer Interaction - YC (陳宜昌)

Welcome to the MOPCON 2024 collaborative notes


Notes portal: https://hackmd.io/@mopcon/2024

Key Takeaways

Background

Generative AI (GAI) today revolves around LLMs; this talk looks at human-computer interaction from a researcher's perspective.

The form of a medium itself matters more to humans than the content it conveys ("the medium is the message")

Marshall McLuhan (a founding figure of modern communication theory)


  • 1965: PDP-8 - the first truly small computer (minicomputer)
  • 1984: Macintosh - a computer with a GUI
  • 1997: Google - interacting with the world through a search bar
  • 2007: the original iPhone - touch input

User ←→ Agent ←→ Information

  • Four research directions for AI agents
    • Natural Language Control
    • Multi-modal Awareness
    • Tool Use
    • Reasoning


Language Model Training Stages

  • Pre-training
    • Next-token prediction
    • Large-scale data
    • To compress all of that data, the LLM is forced to learn:
      • basic knowledge
  • Supervised Fine-tuning (SFT)
    • Learns a chat template
    • Drawback: chat responses still do not match human preferences; Preference Learning compensates for this
  • Preference Learning
    • RLHF (Reinforcement Learning from Human Feedback)
    • Afterwards, a reward model (scorer) can replace humans in interacting with the LLM.
    • Each training sample contains: a question, a good answer, and a bad answer.
  • DPO (Direct Preference Optimization)
    • DPO vs RLHF: offline vs online
    • Each training sample likewise contains: a question, a good answer, and a bad answer.
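The (question, good answer, bad answer) samples above feed directly into the DPO objective. A minimal sketch, assuming the summed log-probability of each answer under the policy and the frozen reference model is already available; the function name and inputs are illustrative:

```python
import math

def dpo_loss(beta, logp_policy_good, logp_policy_bad,
             logp_ref_good, logp_ref_bad):
    """DPO loss for one (question, good answer, bad answer) sample.

    Inputs are the summed log-probabilities of each answer under the
    policy model and the frozen reference model (illustrative names).
    """
    # How much more the policy favors the good answer than the reference does
    good_margin = logp_policy_good - logp_ref_good
    # The same margin for the bad answer
    bad_margin = logp_policy_bad - logp_ref_bad
    # Loss is the negative log-sigmoid of the scaled margin difference
    logits = beta * (good_margin - bad_margin)
    return -math.log(1.0 / (1.0 + math.exp(-logits)))

# No preference yet: loss is ln(2)
loss_balanced = dpo_loss(0.1, -10.0, -10.0, -10.0, -10.0)
# Policy favors the good answer: loss shrinks
loss_better = dpo_loss(0.1, -8.0, -12.0, -10.0, -10.0)
```

The loss only depends on how much the policy widens the good/bad gap relative to the reference, which is why DPO can train offline on a fixed preference dataset, unlike RLHF's online rollouts.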

Multimodality

  • A model that handles several modalities at once, e.g., recognizing images and text together
  • How do we put multimodality into an LLM?
    • Treat images as a language (LLaVA).
    • Cross-attention (Llama 3).
  • Any-to-any multimodal LLMs
    • NExT-GPT
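The "images as a language" idea (LLaVA-style) boils down to projecting vision-encoder patch features into the LLM's token-embedding space so they can sit in the same sequence as text tokens. A minimal sketch with made-up shapes and weights:

```python
def project_patches(patch_features, weight):
    """Map vision-encoder patch features into the LLM's token-embedding
    space with one linear layer (LLaVA-style; shapes are illustrative)."""
    return [
        [sum(f * w for f, w in zip(patch, col)) for col in zip(*weight)]
        for patch in patch_features
    ]

# 2 image patches with 3-dim vision features -> 2 "image tokens" of dim 4
patches = [[1.0, 0.0, 2.0], [0.5, 1.5, 0.0]]
weight = [  # 3x4 projection matrix; learned in practice, fixed here
    [0.1, 0.0, 0.0, 0.0],
    [0.0, 0.1, 0.0, 0.0],
    [0.0, 0.0, 0.1, 0.0],
]
image_tokens = project_patches(patches, weight)

# The projected patches are prepended to the text token embeddings,
# so the LLM consumes the image "as if it were language".
text_tokens = [[0.1, 0.2, 0.3, 0.4]]
sequence = image_tokens + text_tokens
```

The cross-attention route (Llama 3) instead keeps image features outside the token sequence and lets LLM layers attend to them, trading a longer context for extra attention modules.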

AI Tool Use

  • Function Calling → Chat Completion
  • Breeze-FC
    • The LLM prompt template used to train tool use
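Function calling in a chat-completion setting works by describing the available tools to the model and parsing the structured call it emits instead of a prose answer. A sketch using an OpenAI-style message shape; the `get_weather` tool is a made-up example:

```python
import json

# Tool schema the model is told about (OpenAI-style function-calling
# format; the weather function is hypothetical).
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

# A tool-using model replies with a structured call; the application
# executes it and feeds the result back as a new message.
model_output = {
    "role": "assistant",
    "tool_calls": [{
        "function": {
            "name": "get_weather",
            "arguments": json.dumps({"city": "Kaohsiung"}),
        }
    }],
}

call = model_output["tool_calls"][0]["function"]
args = json.loads(call["arguments"])
```

Models like Breeze-FC are fine-tuned on exactly this kind of template so the "decide to call, name the function, fill the JSON arguments" behavior becomes reliable.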

Reasoning

  • Thinking, Fast and Slow: fast thinking (System 1) vs. slow thinking (System 2).
  • CoT, Chain of Thought
    • The method currently used by GPT o1 for its reasoning chain
    • Breaking a problem into steps for the LLM to solve can greatly improve output quality
  • Multi-CoTs (CoT-SC)
  • Tree of Thoughts (ToT)
  • Graph of Thoughts (GoT)
  • Agent Framework: ReAct (Reason + Act)
  • Conditions for qualifying as an agent
    • Perception
    • Decision Making
    • Action
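A ReAct-style loop ties the three conditions together: the model reasons (Thought), acts (Action via a tool), and perceives the result (Observation) until it commits to an answer. A minimal sketch where `llm` and `tools` are scripted stand-ins, not a real model API:

```python
def react_agent(question, llm, tools, max_steps=5):
    """Minimal ReAct loop; `llm` and `tools` are illustrative stand-ins."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(transcript)  # Reason: emit a Thought, Action, or Final Answer
        transcript += step + "\n"
        if step.startswith("Final Answer:"):
            return step.removeprefix("Final Answer:").strip()  # Decision made
        if step.startswith("Action:"):
            name, arg = step.removeprefix("Action:").strip().split(" ", 1)
            result = tools[name](arg)  # Act: run the chosen tool
            transcript += f"Observation: {result}\n"  # Perceive the result
    return None

# A scripted fake LLM to show the control flow
script = iter([
    "Thought: I should look this up.",
    "Action: search MOPCON 2024",
    "Final Answer: MOPCON is a mobile developer conference in Kaohsiung.",
])
answer = react_agent(
    "What is MOPCON?",
    llm=lambda _: next(script),
    tools={"search": lambda q: f"results for {q}"},
)
```

Each pass through the loop covers Perception (reading the Observation back into the transcript), Decision Making (the model's next step), and Action (the tool call), which is why ReAct is a natural framework for agents.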