YH Hsu
Generative AI with Large Language Models - Week 3 - Reinforcement learning and LLM-powered applications
Last edited by YH Hsu on May 17, 2024
Entry page for AI / ML learning notes
Deeplearning.ai GenAI/LLM course notes series
Large Language Models with Semantic Search
Finetuning Large Language Models
LangChain for LLM Application Development course notes
Building and Evaluating Advanced RAG
Generative AI with Large Language Models
Week 1 - Generative AI use cases, project lifecycle, and model pre-training
Week 2 - Fine-tuning and evaluating large language models
Week 3 - Reinforcement learning and LLM-powered applications
Week 3 - Reinforcement learning and LLM-powered applications
Reinforcement learning from human feedback
3-1. Introduction - Week 3
Introduction - Week 3
Aligning models with human values
Reinforcement learning from human feedback (RLHF)
RLHF: Obtaining feedback from humans
RLHF: Reward model
RLHF: Fine-tuning with reinforcement learning
Optional video: Proximal policy optimization
RLHF: Reward hacking
KL divergence
Scaling human feedback
Lab 3 - Fine-tune FLAN-T5 with reinforcement learning to generate
3-2. LLM-powered applications
Model optimizations for deployment
Generative AI Project Lifecycle Cheat Sheet
Using the LLM in applications
Interacting with external applications
Helping LLMs reason and plan with chain-of-thought
Program-aided language models (PAL)
ReAct: Combining reasoning and action
ReAct: Reasoning and action
LLM application architectures
Optional video: AWS Sagemaker JumpStart
3-3. Course summary and outlook
Responsible AI