# Tree of Thoughts: Deliberate Problem Solving with Large Language Models

## 1 Introduction

![image](https://hackmd.io/_uploads/Bkg7HgHcT.png)

## 2 Background

- I/O prompting
- CoT prompting
- CoT-SC prompting

## 3 Tree of Thoughts: Deliberate Problem Solving with LM

A specific instantiation of ToT answers four questions:

1. How to decompose the intermediate process into thought steps
2. How to generate potential thoughts from each state
3. How to heuristically evaluate states
4. What search algorithm to use

**Thought decomposition** - the granularity varies by task: a thought should be small enough that the LM can generate promising, diverse candidates, yet big enough that the LM can evaluate its prospect toward solving the problem.

**Thought generator**

Given a tree state $s = [x, z_{1 \dots i}]$, we consider two strategies to generate $k$ candidates for the next thought step:

1. **Sample i.i.d. thoughts from a CoT prompt** (Creative Writing, Figure 4): $z^{(j)} \sim p^{\text{CoT}}_{\theta}(z_{i+1} \mid s) = p^{\text{CoT}}_{\theta}(z_{i+1} \mid x, z_{1 \dots i})$ for $j = 1, \dots, k$. This works better when the thought space is rich (e.g., each thought is a paragraph), and i.i.d. samples lead to diversity.
2. **Propose thoughts sequentially using a "propose prompt"** (Game of 24, Figure 2; Crosswords, Figure 6): $[z^{(1)}, \dots, z^{(k)}] \sim p^{\text{propose}}_{\theta}(z^{(1 \dots k)}_{i+1} \mid s)$. This works better when the thought space is more constrained (e.g., each thought is just a word or a line), so proposing different thoughts in the same context avoids duplication.

**State evaluator**

(a) **Value each state independently**: the value function $V(\beta, s)$ assigns a scalar value $v$ (e.g., 1-10) or a classification (e.g., sure/likely/impossible) to each state $s$ in the frontier $S$, obtained by reasoning with a value prompt $p^{\text{value}}_{\beta}(v \mid s)$. The basis of the reasoning varies by problem and step, using quick look-ahead simulations (e.g., checking whether the remaining operations can reach a target number like 24) or commonsense deduction (e.g., ruling out candidate words from context). Such evaluations need not be perfect; they only need to be approximate heuristics that guide the search toward "good" states.
(b) **Vote across states**: the vote function $V(\beta, s) = \mathbb{1}[s = s^*]$, where the "good" state $s^* \sim p^{\text{vote}}_{\beta}(s^* \mid S)$ is chosen by deliberately comparing the different states in $S$. This is particularly useful when direct valuation is challenging (e.g., assessing the coherence of a passage). The process is akin to a step-wise self-consistency strategy: the choice of state is cast as a multi-choice question, and LM samples serve as votes for the best state to explore.

![image](https://hackmd.io/_uploads/H19fnxScT.png)

Conceptual benefits of ToT:

- generality
- modularity
- adaptability
- convenience

## 4 Experiments

![image](https://hackmd.io/_uploads/B1Wd3gS96.png)

![image](https://hackmd.io/_uploads/BJPdhlrq6.png)

![image](https://hackmd.io/_uploads/rJgtnxH5T.png)

## 5 Related Work

## 6 Discussion
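To summarize how the four components of Section 3 fit together, here is a minimal sketch of breadth-first search over thoughts (ToT-BFS): at each of $T$ steps, every frontier state is expanded by the thought generator, all candidates are scored by the state evaluator, and only the $b$ best states are kept. The `propose` and `value` functions below are hypothetical toy stand-ins (a deterministic spelling task) for what would really be LM calls with propose/value prompts; the names `tot_bfs`, `propose`, and `value` are my own, not from the paper.

```python
from typing import Callable, List

# A state is the thought sequence z_{1..i}; the input x is passed separately.
State = List[str]

def tot_bfs(
    x: str,
    generate: Callable[[str, State], List[str]],  # thought generator: candidates per state
    evaluate: Callable[[str, State], float],      # state evaluator: heuristic value of s
    steps: int,                                   # T thought steps (thought decomposition)
    breadth: int,                                 # b: states kept per level
) -> State:
    """ToT-BFS: expand every frontier state, score all candidates,
    keep the `breadth` highest-valued states, and repeat for `steps` levels."""
    frontier: List[State] = [[]]  # start from the empty thought sequence
    for _ in range(steps):
        candidates = [s + [z] for s in frontier for z in generate(x, s)]
        candidates.sort(key=lambda s: evaluate(x, s), reverse=True)
        frontier = candidates[:breadth]
    return frontier[0]  # best complete thought sequence found

# --- Toy stand-ins for the LM calls (hypothetical, for illustration only) ---
TARGET = "cat"  # toy task: spell the target word one letter (thought) at a time

def propose(x: str, s: State) -> List[str]:
    # A real propose prompt would ask the LM for next-thought candidates given s.
    return ["a", "c", "t"]

def value(x: str, s: State) -> float:
    # A real value prompt would ask the LM to rate the partial solution;
    # here we count positions that already match the target.
    return sum(a == b for a, b in zip(s, TARGET))

print(tot_bfs("spell the target word", propose, value, steps=3, breadth=2))
# → ['c', 'a', 't']
```

Swapping `generate` for the sample/propose strategies and `evaluate` for the value/vote strategies above recovers the different task setups; the paper also instantiates the same components with DFS instead of BFS.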