# Searching for Search (old)

###### tags: `nicholas` `interpretability`

<mark>This has become a bit chaotic, and I'm currently using it as a place to dump ideas. Will start a fresh doc.</mark>

Transformers have been shown to be highly adept at a wide range of tasks, but there is still little understanding of what kinds of algorithms are implemented internally to make this behavior possible. Understanding these algorithms is especially important insofar as they are doing learned optimization, which might lead to inner misalignment concerns.

Optimization is deeply intertwined with the concept of search, where a system explores and evaluates a set of candidates and selects from among them. From the original [Risks from Learned Optimization](https://www.alignmentforum.org/posts/FkgsxrGf3QxhfLWHG/risks-from-learned-optimization-introduction) sequence:

> We will say that a system is an optimizer if it is internally searching through a search space (consisting of possible outputs, policies, plans, strategies, or similar) looking for those elements that score high according to some objective function that is explicitly represented within the system.

This post concerns itself with what I will call "learned search": what shapes it can take, how it might be implemented in transformers, and how we might go about finding it.

## Selection vs Control

First, let's disambiguate the term "search" from "optimization". [Optimization](https://www.alignmentforum.org/posts/znfkdCoHMANwqc2WE/the-ground-of-optimization-1) is often characterized by systems which robustly evolve toward small targets from a broad basin of attraction. This definition ends up including quite a wide set of systems, including a ball rolling down a hill. To be clearer about how to think about search, it can be helpful to divide optimization into two clusters, [selection and control](https://www.alignmentforum.org/posts/ZDZmopKquzHYPRNxq/selection-vs-control):

- **Selection**: A selection process explores a search space by instantiating members of the space and getting feedback on the quality of those members. This kind of optimization explores *multiple paths* by which the system might evolve. <mark>TODO: add unique example (not evolution)</mark>
- **Control**: A control process only explores a *single path* and just receives feedback on one outcome. <mark>TODO: add unique example (not a missile or thermostat)</mark>

A control process is also characterized by "locking in" a path, where choices made early in the path are irreversible. This is distinct from a selection process, where multiple paths can be explored independently of each other.

When I say that a transformer might be performing "search", I am making the claim that within the neural net there may exist a *selection process* which instantiates candidate paths, evaluates them, and selects between them.
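To make the selection/control distinction concrete, here is a minimal toy sketch (my own illustration, not drawn from the linked posts): the control process greedily follows a single path from one starting point, while the selection process instantiates many candidate points, evaluates each, and picks the best. The objective function, function names, and all parameters are made up.

```python
import math
import random

def score(x: float) -> float:
    """Toy objective: a bumpy landscape with several local optima (made up for illustration)."""
    return -(x - 3.0) ** 2 + 2.0 * math.sin(5.0 * x)

def control_process(x0: float, steps: int = 500, step_size: float = 0.05) -> float:
    """Control: follow a single path, using feedback only about the current state."""
    x = x0
    for _ in range(steps):
        candidate = x + random.choice([-step_size, step_size])
        if score(candidate) > score(x):   # local, greedy feedback on one outcome
            x = candidate                 # early moves "lock in" which basin we end up in
    return x

def selection_process(n_candidates: int = 500) -> float:
    """Selection: instantiate many members of the space, evaluate them, select the best."""
    candidates = [random.uniform(-10.0, 10.0) for _ in range(n_candidates)]
    return max(candidates, key=score)     # multiple independent candidates, compared directly

print("control:  ", control_process(x0=-8.0))   # often stuck in a local bump far from x = 3
print("selection:", selection_process())        # usually lands near the global optimum
```

On this landscape the control process's early moves lock it into whichever local bump it happens to climb, while the selection process compares many candidates and usually ends up near the best one. The point is only to illustrate the distinction, not to claim that either loop is how a network would implement it.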
### Some other discussions:

<mark>TODO: Build better description of the "search" landscape incorporating other ideas:</mark>

#### John Wentworth

> Roughly speaking, a general search process is something which takes in a specification of some problem or objective (from a broad class of possible problems/objectives), and returns a plan which solves the problem or scores well on the objective. For instance, a gradient descent algorithm takes in an objective, and returns a point which scores well on the objective, for a very broad class of possible objectives; gradient descent is therefore a search method.
>
> Enumerating possible actions and evaluating their consequences is one way to do general search, but it's wildly inefficient; I would typically refer to that as "brute force search". Gradient descent does better by leveraging backprop and gradients; approximately none of the algorithmic work done by gradient descent comes from direct evaluation of the consequences of actions. And there are many other tricks one can use too - like memoization on subsearches, or A*-style heuristic search, or (one meta-level up from A*) relaxation-based methods to discover heuristics. The key point is that these tricks are all very general purpose: they work on a very wide variety of search problems, and therefore produce general-purpose search algorithms which are more efficient than brute force (at least on realistic problems).
>
> More advanced general-purpose search methods seem to rely relatively little on enumerating possible actions and evaluating their consequences. By the time we get to human-level search capabilities, we see human problem-solvers spend most of their effort on nontrivial problems thinking about subproblems, abstractions and analogies rather than thinking directly about particular solutions.

Some features of search:

- **Retargetability**: A* can take any start and end and perform search to find a path (toy sketch at the end of this section)
- **Recursion**: Divide search into subproblems which are also search.
  - Chunking
- **Heuristics**: To search *efficiently* you need heuristics. How are those generated? An AGI would need a general-purpose method for generating heuristics.
  - Depend on the environment, not the objective
  - Heuristic generator:
    - Problem relaxation
- **Constraints**: Human search seems to start with noticing constraints, and then working backwards from them. (Related to recursion: suppose I want to travel to London; I know that I will need to fly, which requires me to plan a trip to an airport.)

**Problem relaxation**:

> More generally, the steps of problem relaxation are roughly:
>
> - Start with a problem with a bunch of constraints
> - Relax (i.e. ignore) a bunch of the constraints
> - Solve the relaxed problem
> - Use the relaxed problem-solution as an heuristic for the full problem

#### Search-in-Territory vs Search-in-Map (also JSW)

> Let’s split out two different types of optimization. The first type includes things like Google Maps’ pathfinder, a human making a plan, or Amazon’s logistical algorithms. The second type includes things like a thermostat, a heat-seeking missile, or water flowing downhill.
>
> The distinction I want to point to here is where the things-we’re-optimizing-over live. For the first type, optimization is over things-in-a-model; a search algorithm runs on a map. For the second type, optimization is over things-in-the-world; a search algorithm runs on the territory directly.

Search-in-map vs search-in-territory:

**SIM**
- Maps can be really general purpose. We can build one before we even know what we need to do with it.
  - This makes them more powerful!
- Similar to **selection**, but not exactly (e.g., evolution is not SIM, but it is selection)

**SIT**
- Don't need a map (maybe making a map is hard?)
- Similar to **control**, but not exactly
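Tying together a few of the threads above (retargetability, problem relaxation as a heuristic generator, and search-in-map), here is a toy sketch of my own, not taken from the posts quoted: A* pathfinding on a small grid model, where the heuristic is the exact solution to the relaxed problem of walking to the goal while ignoring walls (Manhattan distance). The grid, start, and goal are made-up toy inputs.

```python
import heapq

def astar(grid, start, goal):
    """Retargetable search-in-map: works for any (start, goal) pair on any grid model.
    grid[r][c] == 1 means a wall; 0 means free."""
    rows, cols = len(grid), len(grid[0])

    def heuristic(cell):
        # Problem relaxation: drop the "no walking through walls" constraint.
        # The relaxed problem's exact solution (Manhattan distance) becomes the heuristic.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    frontier = [(heuristic(start), 0, start, [start])]  # (f, g, cell, path)
    best_g = {start: 0}
    while frontier:
        _, g, cell, path = heapq.heappop(frontier)
        if cell == goal:
            return path
        for dr, dc in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
            nxt = (cell[0] + dr, cell[1] + dc)
            if 0 <= nxt[0] < rows and 0 <= nxt[1] < cols and grid[nxt[0]][nxt[1]] == 0:
                if g + 1 < best_g.get(nxt, float("inf")):
                    best_g[nxt] = g + 1
                    heapq.heappush(frontier, (g + 1 + heuristic(nxt), g + 1, nxt, path + [nxt]))
    return None  # no path exists

toy_grid = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
]
print(astar(toy_grid, start=(0, 0), goal=(2, 0)))
```

The same function accepts any start/goal pair (retargetable), never touches the territory (it only consults the grid model it was handed), and gets its efficiency from a heuristic produced by problem relaxation.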
## Searchy Domains

When should we expect a transformer to be performing internal search? While selection processes seem very useful, it's not entirely clear how far a system can get using control processes alone. Nor is it clear how difficult or natural it is for SGD to find the mechanisms for search at all.

To increase the chances of finding search, we can look for domains which are especially "searchy". These are domains in which control processes are significantly disadvantaged with respect to selection processes.

Take, for example, hill-climbing on a grid with a single local maximum, where a state is a square and the process only has knowledge of its direct neighbors.

![](https://i.imgur.com/dyuB1qz.png =300x250)

In this domain, there is little advantage to exploring multiple paths. A control process which performs gradient ascent will very reliably converge to the optimal state. This is only possible because the process is able to exploit structure in the search space, allowing it to ignore the vast majority of possible states.

Contrast that with the searchiest domain possible: inverting a cryptographic hash. Proof-of-work algorithms are literally designed to require search, by forcing miners to explore the entire state space to find a solution. Each member of the set gives no information about other members, and so there is no structure to exploit. (A toy sketch contrasting these two extremes appears at the end of this section.)

Many tasks lie somewhere in between: there is a lot of structure, but it's very hard or impossible to determine the optimal next step without exploring multiple paths. An example is a maze, which, despite being quite structured, makes it hard to determine early on whether a path will lead to the exit.

![](https://i.imgur.com/aWc71yD.png =800x300)

Other ways in which a domain can be searchy:

- **Multi-agent environment**: If predicting other systems which are themselves performing search is important, it might be necessary to do internal search.
- **Optimal vs satisfactory solutions**: If the quality of the solution doesn't matter, it might be a lot easier to design a control process which converges to it. For example, a random walk will eventually solve any maze, but it's very unlikely to find the shortest path; finding the shortest path might require searching through more than one path. This makes adversarial environments more searchy, because there is strong pressure on the quality of actions.
- **Time to think**: Search costs a lot of computation, and so the more constrained a system is in how much it can think at each step, the more incentive it has to use a less expensive control process.
- **Incremental advantage**: A domain gives a stronger incentive for search when a little bit of search is noticeably better than no search at all. <mark>Example with: Chess (any depth is better than no depth), Example without: TODO</mark>
- **Explicit search space**: <mark>TODO</mark>
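To make the hill-climbing vs. hash-inversion contrast concrete, here is a toy sketch (my own illustration; the landscape and the proof-of-work threshold are made up): on a single-peaked landscape a greedy control process reliably reaches the optimum without ever branching, while finding a hash preimage offers no structure to exploit, so nothing beats enumerating candidates.

```python
import hashlib

# Structured domain: single-peaked grid. Greedy hill-climbing (a control process)
# follows one path and still finds the global optimum.
def height(x, y):
    return -((x - 7) ** 2 + (y - 4) ** 2)  # toy landscape, single peak at (7, 4)

def hill_climb(x=0, y=0):
    while True:
        neighbors = [(x + dx, y + dy) for dx, dy in [(-1, 0), (1, 0), (0, -1), (0, 1)]]
        best = max(neighbors, key=lambda p: height(*p))
        if height(*best) <= height(x, y):
            return (x, y)  # no neighbor is higher: we are at the peak
        x, y = best

# Unstructured domain: find a nonce whose hash starts with some zero hex digits
# (a toy proof-of-work). Each candidate tells you nothing about the others,
# so there is no alternative to enumerating members of the space.
def find_nonce(prefix="00", data=b"block"):
    nonce = 0
    while not hashlib.sha256(data + str(nonce).encode()).hexdigest().startswith(prefix):
        nonce += 1
    return nonce

print(hill_climb())   # -> (7, 4)
print(find_nonce())   # some nonce; only found by trying candidates one by one
```

The hill-climber exploits the structure of the landscape to ignore almost the entire state space; the nonce search cannot, which is exactly what makes that domain maximally "searchy".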
## Characteristics of Search

A prototypical kind of search is Monte Carlo Tree Search (MCTS). Search can vary in a lot of ways, and decomposing it into parts can help us explore how search algorithms might differ from each other. Motivated by the structure of MCTS, let's divide search into four parts (a minimal MCTS sketch with these phases labeled is at the end of this doc): <mark>TODO: Add to these lists</mark>

1. **Selection**: How are new members of the search space selected to be explored?
    - What is the search space? Is it finite? Is it search through time, exploring possible futures, or search through a static space?
    - Does the selection policy consider the entire search memory? In MCTS, for example, the selection policy works by traversing a search tree node by node using only information contained in the current node.
    - Can the selection policy change in response to discoveries about the search space?
    - Does the selection policy depend on how much time/computation has already been spent searching (or is left to be spent)? Simulated annealing, for example, selects new states differently over time using a continuously decreasing temperature parameter.
2. **Expansion**: How is a new member explored, represented, and added to the search memory?
    - How is the search memory structured? In MCTS this is a tree, but any data structure is in principle possible. This could even be as simple as just a list of visited states, or no memory at all.
    - How are states represented? At what level of abstraction/detail? Can partial or incomplete states be expanded? Is information compressed?
    - Are all states represented the same? Can the level of detail and abstraction vary as needed from state to state?
3. **Evaluation**: What criteria are used to evaluate members? More broadly, what is searched for?
    - Is there a single metric, or are multiple kinds of measurement made simultaneously?
    - Can the evaluation function change as more of the search space is revealed?
4. **Backpropagation**: When a new member is added and evaluated, how then is the search memory updated?
    - <mark>TODO</mark>

## Search in a Transformer

- Breadth vs depth = parallel vs serial compute (search can be both)
- Can't reuse modules serially (no weight sharing between layers)

<mark>TODO: Need more ideas</mark>

- https://www.lesswrong.com/posts/TdesHi8kkyokQdDoQ/gradient-descent-doesn-t-select-for-inner-search
- https://www.lesswrong.com/posts/6mysMAqvo9giHC4iX/what-s-general-purpose-search-and-why-might-we-expect-to-see
- https://www.lesswrong.com/posts/w4aeAFzSAguvqA5qu/how-to-go-from-interpretability-to-alignment-just-retarget
- https://www.lesswrong.com/posts/s2KJWLAPyjtmQ9ze3/search-in-territory-vs-search-in-map

> (michael) I think that before talking about search/selection in a transformer, we need to think more carefully about what counts as a representation of a future state (lack of interpretability certainly doesn't make this easier). Some random thoughts:
> - if we are "considering" a future state during search, then at some point, when we decide to take an action based on this evaluation, there should be some causal chain between "activations that correspond to the internal state representation" and "activations that correspond to the action being taken".
> - in the context of a maze task, we might consider a transformer that has been trained on sequences that reach the blue dot.
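Referring back to the four-part decomposition in "Characteristics of Search", here is a minimal single-player MCTS sketch with the phases labeled. It is purely illustrative: the toy environment (integer states, actions that move -1/+1, reward for reaching a target within a horizon) is made up, and nothing here is a claim about how a transformer would implement these pieces.

```python
import math
import random

# Toy deterministic environment (made up for illustration): states are integers,
# actions move -1 or +1, reward 1.0 if the target is reached within the horizon.
TARGET, HORIZON, ACTIONS = 5, 8, (-1, +1)

class Node:
    def __init__(self, state, depth, parent=None):
        self.state, self.depth, self.parent = state, depth, parent
        self.children = {}           # action -> Node (the search memory is a tree)
        self.visits, self.value = 0, 0.0

def rollout(state, depth):
    # 3. Evaluation: estimate a leaf's quality with a random playout.
    while depth < HORIZON:
        if state == TARGET:
            return 1.0
        state += random.choice(ACTIONS)
        depth += 1
    return 1.0 if state == TARGET else 0.0

def mcts(root_state, iterations=2000, c=1.4):
    root = Node(root_state, depth=0)
    for _ in range(iterations):
        node = root
        # 1. Selection: walk down the tree, balancing exploration and exploitation (UCB1).
        while len(node.children) == len(ACTIONS) and node.depth < HORIZON:
            node = max(
                node.children.values(),
                key=lambda ch: ch.value / ch.visits
                + c * math.sqrt(math.log(node.visits) / ch.visits),
            )
        # 2. Expansion: add one unexplored child to the search memory.
        if node.depth < HORIZON and len(node.children) < len(ACTIONS):
            action = random.choice([a for a in ACTIONS if a not in node.children])
            node.children[action] = Node(node.state + action, node.depth + 1, parent=node)
            node = node.children[action]
        # 3. Evaluation of the newly added node.
        reward = rollout(node.state, node.depth)
        # 4. Backpropagation: update statistics along the path back to the root.
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    # Act by picking the most-visited action at the root.
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]

print(mcts(root_state=0))  # should usually prefer +1 (moving toward the target at 5)
```

Each question in that list corresponds to a visible design choice here: the UCB selection rule, the tree-structured search memory, the rollout-based evaluation, and the statistics propagated back up the visited path.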