---
# System prepended metadata

title: 'VOYAGER: An Open-Ended Embodied Agent'
tags: [LLM-based Agent]

---

# VOYAGER: An Open-Ended Embodied Agent with Large Language Models

## 1 Introduction

![image](https://hackmd.io/_uploads/rk5GsZ-5p.png)

- RL methods lack generalizability and data efficiency
- Voyager is an embodied LLM-based agent for Minecraft with three components:
    - automatic curriculum maximizing exploration
    - skill library for storing and retrieving complex behaviors
    - iterative prompting mechanism to generate executable code for embodied control
- Voyager queries GPT-4 and relies on in-context learning, with no fine-tuning of model parameters
- the automatic curriculum aims to discover as many diverse things as possible
- Voyager builds the skill library by storing the action programs that successfully solved a task
- Voyager generates code through iterative prompting
    - generating working code zero-shot is difficult (the agent is controlled through the JavaScript Mineflayer API), so the generated code is executed and the error trace and environment signals are passed back to GPT-4 as text
    - GPT-4 then refines the code in another round
    - this repeats until the self-verification module confirms task completion, at which point the program is stored in the skill library
- Voyager demonstrates lifelong learning capabilities by constructing an ever-growing skill library
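
The iterative prompting loop from the bullets above can be sketched in Python. This is a minimal sketch, not Voyager's implementation: `generate_code`, `execute`, and `verify` are hypothetical callables standing in for the GPT-4 call, the Mineflayer runtime, and the self-verification module, and `max_rounds` is an assumed retry budget. The toy stand-ins at the bottom only exist to make the example runnable.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class ExecutionResult:
    error: Optional[str]  # execution error trace, None if the program ran cleanly
    env_feedback: str     # textual environment signal, e.g. inventory changes


def iterative_prompting(
    task: str,
    generate_code: Callable[[str, str], str],        # hypothetical LLM call
    execute: Callable[[str], ExecutionResult],       # hypothetical game runtime
    verify: Callable[[str, ExecutionResult], bool],  # hypothetical self-verification
    skill_library: Dict[str, str],
    max_rounds: int = 4,                             # assumed retry budget
) -> Optional[str]:
    """Refine a program until it is verified, then store it as a skill."""
    feedback = ""
    for _ in range(max_rounds):
        code = generate_code(task, feedback)
        result = execute(code)
        if result.error is None and verify(task, result):
            skill_library[task] = code  # only verified programs are stored
            return code
        # Feed the error trace and environment signal back as plain text.
        feedback = f"error: {result.error}\nenvironment: {result.env_feedback}"
    return None  # budget exhausted, nothing is stored


# Toy stand-ins so the loop can run without an LLM or a game server.
def fake_llm(task: str, feedback: str) -> str:
    # The first attempt has a typo; the retry fixes it once the trace is shown.
    return "mineLog()" if "not defined" in feedback else "mineLogg()"


def fake_execute(code: str) -> ExecutionResult:
    if code == "mineLog()":
        return ExecutionResult(None, "inventory: oak_log x1")
    return ExecutionResult("ReferenceError: mineLogg is not defined", "inventory: empty")


def fake_verify(task: str, result: ExecutionResult) -> bool:
    return "oak_log" in result.env_feedback


library: Dict[str, str] = {}
program = iterative_prompting("Mine 1 wood log", fake_llm, fake_execute, fake_verify, library)
print(program, library)
```

The first round fails with an error trace, the second round succeeds and passes verification, and only then is the program written to `library`.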

## 2 Method

![image](https://hackmd.io/_uploads/H177jZZ5p.png)

### 2.1 Automatic Curriculum

![image](https://hackmd.io/_uploads/HkaXobb9T.png)

![image](https://hackmd.io/_uploads/Hk6UiZZqa.png)

- the full prompts are in the paper's appendix and in the authors' repo
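
To make the idea of a state-conditioned curriculum concrete, here is a rough Python sketch of how a task-proposal prompt could be assembled from the agent's state. Everything in it is an assumption for illustration: `AgentState`, its fields, `build_curriculum_prompt`, and the prompt wording are hypothetical and are not the actual Voyager prompt.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AgentState:
    # Hypothetical snapshot of what the curriculum gets to see.
    biome: str
    inventory: Dict[str, int]
    completed_tasks: List[str] = field(default_factory=list)
    failed_tasks: List[str] = field(default_factory=list)


def build_curriculum_prompt(state: AgentState) -> str:
    """Assemble a text prompt asking the LLM to propose the next task."""
    inventory = ", ".join(
        f"{name} x{count}" for name, count in sorted(state.inventory.items())
    ) or "empty"
    lines = [
        "Goal: discover as many diverse things as possible.",
        f"Biome: {state.biome}",
        f"Inventory: {inventory}",
        f"Completed tasks: {', '.join(state.completed_tasks) or 'none'}",
        f"Failed tasks: {', '.join(state.failed_tasks) or 'none'}",
        "Propose the next task. It should be neither too easy nor too hard.",
    ]
    return "\n".join(lines)


state = AgentState("forest", {"oak_log": 3}, completed_tasks=["Mine 1 wood log"])
prompt = build_curriculum_prompt(state)
print(prompt)
```

Listing completed and failed tasks in the prompt is what lets the proposed tasks track the agent's current ability instead of following a fixed schedule.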

### 2.2 Skill Library

![image](https://hackmd.io/_uploads/BJZIhbZ9T.png)

![image](https://hackmd.io/_uploads/By803-WcT.png)


![image](https://hackmd.io/_uploads/HygF3-b96.png)
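
Storing and retrieving skills can be illustrated with a small Python sketch. It is a simplification under stated assumptions: `SkillLibrary`, `embed`, and `cosine` are hypothetical names, and the bag-of-words vector is a stand-in for a learned text embedding so that the example runs without any external service.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def embed(text: str) -> Counter:
    # Stand-in embedding: word counts instead of a learned vector.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class SkillLibrary:
    """Maps a skill description to its program, retrievable by similarity."""

    def __init__(self) -> None:
        self._skills: Dict[str, Tuple[Counter, str]] = {}

    def add(self, description: str, code: str) -> None:
        self._skills[description] = (embed(description), code)

    def retrieve(self, query: str, k: int = 5) -> List[str]:
        # Rank stored skills by similarity to the query and return the top k programs.
        q = embed(query)
        ranked = sorted(
            self._skills.values(), key=lambda entry: cosine(q, entry[0]), reverse=True
        )
        return [code for _, code in ranked[:k]]


skills = SkillLibrary()
skills.add("craft a wooden pickaxe", "async function craftWoodenPickaxe(bot) {}")
skills.add("mine iron ore with a stone pickaxe", "async function mineIronOre(bot) {}")
skills.add("cook raw beef in a furnace", "async function cookBeef(bot) {}")

print(skills.retrieve("craft a stone pickaxe", k=1))
```

Retrieved programs would be placed in the code-generation prompt so that a new task can reuse and compose earlier skills.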



### 2.3 Iterative Prompting Mechanism


![image](https://hackmd.io/_uploads/HyZf6-Wqa.png)


![image](https://hackmd.io/_uploads/B1I7abZ56.png)


![image](https://hackmd.io/_uploads/HyIU6ZZqT.png)


## 3 Experiments

![image](https://hackmd.io/_uploads/r1Z96Zb9T.png)


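- compared with the ReAct, Reflexion, and AutoGPT baselines, Voyager shows:
    - significantly better exploration (3.3× more unique items)
    - consistent tech tree mastery (key milestones unlocked up to 15.3× faster)
    - extensive map traversal (2.3× longer distances)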


![image](https://hackmd.io/_uploads/Sks0aWW5p.png)


- ablation findings:
    - the automatic curriculum is crucial for consistent progress
    - without the skill library, Voyager tends to plateau in the later stages
    - self-verification is the most important of all the feedback types
    - GPT-4 significantly outperforms GPT-3.5 in code generation



![image](https://hackmd.io/_uploads/rkdu0-bqp.png)

- a human can act as the critic (giving Voyager visual critique) or as the curriculum (breaking a goal down into sub-tasks for it)

## 4 Limitations and Future Work

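- cost: GPT-4 API calls are expensive
- inaccuracies: despite iterative prompting, the agent sometimes gets stuck and fails to generate a correct skill
- hallucinations: the curriculum occasionally proposes unachievable tasks, and generated code can call functions that do not exist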