### [AI / ML study notes index page](https://hackmd.io/@YungHuiHsu/BySsb5dfp)

#### [Deeplearning.ai GenAI/LLM course series notes](https://learn.deeplearning.ai/)

- [Large Language Models with Semantic Search](https://hackmd.io/@YungHuiHsu/rku-vjhZT)
- [Finetuning Large Language Models](https://hackmd.io/@YungHuiHsu/HJ6AT8XG6)
- [LangChain for LLM Application Development](https://hackmd.io/1r4pzdfFRwOIRrhtF9iFKQ) course series notes
- [Building and Evaluating Advanced RAG](https://hackmd.io/@YungHuiHsu/rkqGpCDca)

## [Generative AI with Large Language Models](https://www.deeplearning.ai/courses/generative-ai-with-llms/)

- [Week1-Generative AI use cases, project lifecycle, and model pre-training](https://hackmd.io/@YungHuiHsu/By7dsMnTp)
- [Week2-Fine-tuning and evaluating large language models](https://hackmd.io/@YungHuiHsu/HJQrU7npp)
- [Week3-Reinforcement learning and LLM-powered applications](https://hackmd.io/@YungHuiHsu/Hkbxu7h6T)

---

## Week3 - Reinforcement learning and LLM-powered applications

### Reinforcement learning from human feedback

### 3-1. Introduction - Week 3

#### Introduction - Week 3

#### Aligning models with human values

#### Reinforcement learning from human feedback (RLHF)

#### RLHF: Obtaining feedback from humans

#### RLHF: Reward model

#### RLHF: Fine-tuning with reinforcement learning

#### Optional video: Proximal policy optimization

#### RLHF: Reward hacking

#### KL divergence

#### Scaling human feedback

#### Lab 3 - Fine-tune FLAN-T5 with reinforcement learning to generate

---

### 3-2. LLM-powered applications

#### Model optimizations for deployment

#### Generative AI Project Lifecycle Cheat Sheet

#### Using the LLM in applications

#### Interacting with external applications

### Helping LLMs reason and plan with chain-of-thought

#### Helping LLMs reason and plan with chain-of-thought

### Program-aided language models (PAL)

#### Program-aided language models (PAL)

#### ReAct: Combining reasoning and action

#### ReAct: Reasoning and action

#### LLM application architectures

#### Optional video: AWS Sagemaker JumpStart

---

### 3-3. Course summary and outlook

#### Responsible AI