Transformers have been shown to be highly adept at a wide range of tasks, but we still have little understanding of what kinds of algorithms are implemented internally to make this behavior possible. Understanding these algorithms is especially important insofar as they may constitute learned optimization, which raises inner misalignment concerns.
Optimization is deeply intertwined with the concept of search, where a system explores and evaluates a set of candidates and selects from among them. From the original Risks from Learned Optimization sequence:
> We will say that a system is an optimizer if it is internally searching through a search space (consisting of possible outputs, policies, plans, strategies, or similar) looking for those elements that score high according to some objective function that is explicitly represented within the system.
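To make this definition concrete, here is a minimal, purely illustrative sketch of a system that fits it: one that enumerates an explicit search space, scores each element with an explicitly represented objective, and selects the highest-scoring element. The function name, the integer "plans," and the toy objective are my own illustrative choices, not anything from the sequence.

```python
import random

def search_optimizer(candidates, objective):
    """Toy optimizer in the sense of the definition above: search through
    an explicit set of candidates for the element that scores highest
    under an explicitly represented objective function."""
    return max(candidates, key=objective)

# Hypothetical example: the "search space" is a set of integer plans,
# and the objective rewards plans close to a target value of 3.
plans = [random.randint(-10, 10) for _ in range(100)]
best = search_optimizer(plans, objective=lambda x: -(x - 3) ** 2)
print(best)  # the sampled plan closest to 3
```

The point of the sketch is only that all three ingredients of the definition are explicitly represented: the search space (`plans`), the objective (`objective`), and the selection step (`max`). The question this post takes up is what those ingredients look like when they are learned and distributed across a transformer's weights rather than written out like this.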
This post concerns what I will call "learned search": what shapes it can take, how it might be implemented inside transformers, and how we might go about finding it.