### [AI / ML study notes index](https://hackmd.io/@YungHuiHsu/BySsb5dfp)
#### [Deeplearning.ai GenAI/LLM course notes](https://learn.deeplearning.ai/)
- [Large Language Models with Semantic Search](https://hackmd.io/@YungHuiHsu/rku-vjhZT)
- [Finetuning Large Language Models](https://hackmd.io/@YungHuiHsu/HJ6AT8XG6)
- [LangChain for LLM Application Development](https://hackmd.io/1r4pzdfFRwOIRrhtF9iFKQ) course notes series
- [Building and Evaluating Advanced RAG](https://hackmd.io/@YungHuiHsu/rkqGpCDca)
## [Generative AI with Large Language Models](https://www.deeplearning.ai/courses/generative-ai-with-llms/)
 
- [Week1-Generative AI use cases, project lifecycle, and model pre-training](https://hackmd.io/@YungHuiHsu/By7dsMnTp)
- [Week2-Fine-tuning and evaluating large language models](https://hackmd.io/@YungHuiHsu/HJQrU7npp)
- [Week3-Reinforcement learning and LLM-powered applications](https://hackmd.io/@YungHuiHsu/Hkbxu7h6T)
---
## Week3 - Reinforcement learning and LLM-powered applications
### Reinforcement learning from human feedback
### 3-1. Introduction - Week 3
#### Introduction - Week 3
#### Aligning models with human values
#### Reinforcement learning from human feedback (RLHF)
#### RLHF: Obtaining feedback from humans
#### RLHF: Reward model
#### RLHF: Fine-tuning with reinforcement learning
#### Optional video: Proximal policy optimization
#### RLHF: Reward hacking
#### KL divergence
#### Scaling human feedback
#### Lab 3 - Fine-tune FLAN-T5 with reinforcement learning to generate
---
### 3-2. LLM-powered applications
#### Model optimizations for deployment
#### Generative AI Project Lifecycle Cheat Sheet
#### Using the LLM in applications
#### Interacting with external applications
### Helping LLMs reason and plan with chain-of-thought
#### Helping LLMs reason and plan with chain-of-thought
### Program-aided language models (PAL)
#### Program-aided language models (PAL)
#### ReAct: Combining reasoning and action
#### ReAct: Reasoning and action
#### LLM application architectures
#### Optional video: AWS Sagemaker JumpStart
---
### 3-3. Course summary and outlook
#### Responsible AI