# Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory
## 1 Introduction
“What if a cyber brain could possibly generate its own ghost, create a soul all by itself? And if it did, just what would be the importance of being human then?” (Ghost in the Shell, 1995)
- the paper aims to build a Generally Capable Agent (GCA) for Minecraft
- prior work targets specific Minecraft tasks like ObtainDiamond; the authors pursue the broader goal of exploring Minecraft in general
- RL agents for Minecraft are heavily limited: they need millions of training steps and still perform poorly
- poor generalization and scalability
- they struggle to map long-horizon tasks to specific key presses
- Ghost in the Minecraft (GITM), their GCA, is composed of an LLM Decomposer, an LLM Planner, and an LLM Interface (a rough sketch of the loop follows below)
- Decomposer: decomposes the task goal into well-defined sub-goals
- Planner: plans a sequence of structured actions for each sub-goal
- Interface: executes actions, interacts with the env, and receives observations
- specifically, VPT [2] needs 6,480 GPU-days of training and DreamerV3 [7] needs 17 GPU-days, whereas GITM requires no GPUs and can be trained in just 2 days on a single CPU node with 32 CPU cores
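A minimal sketch of how the three components might be wired together; the method names (`decompose`, `plan`, `execute`, `achieved`) are assumptions for illustration, not the paper's actual API:

```python
# Hypothetical glue for GITM's three LLM components (names assumed, not from
# the paper): Decomposer -> Planner -> Interface.

def run_gitm(task_goal, decomposer, planner, interface):
    """Pursue a high-level Minecraft task goal with the three-part pipeline."""
    # 1) LLM Decomposer: break the task goal into well-defined sub-goals.
    sub_goals = decomposer.decompose(task_goal)

    for sub_goal in sub_goals:
        feedback = None
        while not interface.achieved(sub_goal):
            # 2) LLM Planner: plan a sequence of structured actions for this
            #    sub-goal, conditioned on feedback from earlier attempts.
            actions = planner.plan(sub_goal, feedback)

            # 3) LLM Interface: execute each structured action in the
            #    environment and receive observations / feedback messages.
            for action in actions:
                feedback = interface.execute(action)
```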

## 2 Related Work
- prior work uses RL (imitation learning, hierarchical RL)
- VPT builds a foundation model for Minecraft by training on videos
- some works adopt knowledge distillation and curriculum learning
- some combine RL with LLMs, while this paper relies solely on LLMs

## 3 Method

### 3.1 LLM Decomposer
- decomposes a task goal into sub-goals
- a goal is the tuple (Object, Count, Material, Tool, Info)
- Object is the target object to obtain
- Count is the required quantity of the object
- Material and Tool are the prerequisite materials/tools needed to obtain the object
- Info is the text-based knowledge related to this goal
- given a specific goal, a sentence embedding is extracted from a pre-trained LLM and used to retrieve the most relevant text-based knowledge from an external knowledge base
- the LLM identifies the required materials/tools/related info from the gathered knowledge
- all prerequisite materials/tools can themselves be listed as sub-goals, allowing recursive decomposition (see the sketch after this list)
- the external knowledge base is built from the Minecraft Wiki
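A rough Python sketch of the decomposer as described above: the goal 5-tuple, embedding-based retrieval from a wiki-derived knowledge base, and recursive decomposition. All names (`Goal`, `retrieve_knowledge`, `encode`, `llm.identify_prerequisites`) and the cosine-similarity retrieval are assumptions for illustration, not the paper's code.

```python
from dataclasses import dataclass, field
from typing import Optional

import numpy as np


@dataclass
class Goal:
    """A (sub-)goal as the 5-tuple (Object, Count, Material, Tool, Info)."""
    object: str                                    # target object, e.g. "iron_pickaxe"
    count: int                                     # required quantity of the object
    material: dict = field(default_factory=dict)   # prerequisite materials -> counts
    tool: Optional[str] = None                     # prerequisite tool, if any
    info: str = ""                                 # retrieved text-based knowledge


def retrieve_knowledge(goal_text, wiki_entries, encode, top_k=3):
    """Return the top-k knowledge-base entries most similar to the goal text.

    `encode` is any pre-trained sentence-embedding model (text -> vector);
    `wiki_entries` is a list of passages scraped from the Minecraft Wiki.
    Cosine similarity is an assumed choice; the notes only say an embedding
    is used to retrieve the most relevant knowledge.
    """
    query = encode(goal_text)
    query = query / np.linalg.norm(query)
    scored = []
    for entry in wiki_entries:
        vec = encode(entry)
        scored.append((float(np.dot(query, vec / np.linalg.norm(vec))), entry))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return "\n".join(entry for _, entry in scored[:top_k])


def decompose(goal: Goal, llm, wiki_entries, encode) -> list:
    """Recursively decompose a goal: prerequisites become sub-goals themselves."""
    goal.info = retrieve_knowledge(goal.object, wiki_entries, encode)
    # The LLM reads the gathered knowledge and names the required
    # materials/tools (empty for raw resources such as logs or cobblestone).
    goal.material, goal.tool = llm.identify_prerequisites(goal.object, goal.info)

    sub_goals = []
    for mat, cnt in goal.material.items():
        sub_goals += decompose(Goal(object=mat, count=cnt), llm, wiki_entries, encode)
    if goal.tool is not None:
        sub_goals += decompose(Goal(object=goal.tool, count=1), llm, wiki_entries, encode)
    sub_goals.append(goal)   # a goal comes after all of its prerequisites
    return sub_goals
```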


### 3.2 LLM Planner
- a structured action is (Name, Arguments, Description)

- decompose 3,141 tasks from the MineDojo dataset into action sequences using a pre-trained LLM

- Action Interface: functional descriptions of the structured actions and their parameters
- Query Illustration: clarifies the structure and meaning of user queries
- Response Format: requires the LLM to return its response in a specific format
- Interaction Guideline: guides the LLM to correct failed actions based on feedback messages
- the User Query contains the goal plus external info (apparently the 5-tuple from the decomposer)
- it also carries feedback from the previous action
- and a reference plan to follow
- the agent maintains a working memory: it stores the entire set of action sequences once a task goal is achieved (see the sketch below)
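A hedged sketch of the pieces above: the structured-action triple, the working memory, and one way the planner prompt could be assembled from the four prompt components plus the user query. Every name here (`StructuredAction`, `WorkingMemory`, `build_planner_prompt`) is an illustrative assumption rather than the paper's interface.

```python
from dataclasses import dataclass


@dataclass
class StructuredAction:
    """A structured action as the triple (Name, Arguments, Description)."""
    name: str          # e.g. "craft"
    arguments: dict    # e.g. {"object": "wooden_pickaxe", "count": 1}
    description: str   # natural-language description of the action's effect


class WorkingMemory:
    """Stores the whole action sequence of every task goal that was achieved."""

    def __init__(self):
        self._plans = {}                         # task goal text -> action list

    def store(self, task_goal, actions):
        self._plans[task_goal] = list(actions)   # saved only after success

    def reference_plan(self, task_goal):
        return self._plans.get(task_goal, [])    # reused as a reference plan


def build_planner_prompt(goal_tuple, feedback, reference_plan,
                         action_interface, query_illustration,
                         response_format, interaction_guideline):
    """Assemble the planner prompt from the components listed in the notes."""
    return "\n\n".join([
        action_interface,       # functional descriptions of structured actions
        query_illustration,     # clarifies the structure/meaning of user queries
        response_format,        # the specific way the LLM must answer
        interaction_guideline,  # how to correct failed actions from feedback
        f"Goal (Object, Count, Material, Tool, Info): {goal_tuple}",
        f"Feedback from previous action: {feedback}",
        f"Reference plan: {reference_plan}",
    ])
```
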
### 3.3 LLM Interface
## 4 Experiments

