YH Hsu
Generative AI with Large Language Models - Week 3 - Reinforcement learning and LLM-powered applications
Last edited by YH Hsu on May 17, 2024
Entry page for AI / ML learning notes
Deeplearning.ai GenAI/LLM course notes series
Large Language Models with Semantic Search
Finetuning Large Language Models
LangChain for LLM Application Development course notes
Building and Evaluating Advanced RAG
Generative AI with Large Language Models
Week 1 - Generative AI use cases, project lifecycle, and model pre-training
Week 2 - Fine-tuning and evaluating large language models
Week 3 - Reinforcement learning and LLM-powered applications
Week 3 - Reinforcement learning and LLM-powered applications
Reinforcement learning from human feedback
3-1. Introduction - Week 3
Introduction - Week 3
Aligning models with human values
Reinforcement learning from human feedback (RLHF)
RLHF: Obtaining feedback from humans
RLHF: Reward model
RLHF: Fine-tuning with reinforcement learning
Optional video: Proximal policy optimization
RLHF: Reward hacking
KL divergence
Scaling human feedback
Lab 3 - Fine-tune FLAN-T5 with reinforcement learning to generate
3-2. LLM-powered applications
Model optimizations for deployment
Generative AI Project Lifecycle Cheat Sheet
Using the LLM in applications
Interacting with external applications
Helping LLMs reason and plan with chain-of-thought
Program-aided language models (PAL)
ReAct: Combining reasoning and action
ReAct: Reasoning and action
LLM application architectures
Optional video: AWS Sagemaker JumpStart
3-3. Course summary and outlook
Responsible AI