Conjecture

@conjecture

conjecture.dev

Private team

Joined on Jul 20, 2022

  • This has become a bit chaotic, and I'm currently using it as a place to dump ideas; I will start a fresh doc.

    Transformers have been shown to be highly adept at a wide range of tasks, but there is still little understanding of what kinds of algorithms are implemented internally to make this behavior possible. Understanding these algorithms is especially important insofar as they may be performing learned optimization, which raises inner-misalignment concerns. Optimization is deeply intertwined with the concept of search, in which a system explores and evaluates a set of candidates and selects from among them. From the original Risks from Learned Optimization sequence: "We will say that a system is an optimizer if it is internally searching through a search space (consisting of possible outputs, policies, plans, strategies, or similar) looking for those elements that score high according to some objective function that is explicitly represented within the system." This post concerns itself with what I will call "learned search": what shapes it can take, how it might be implemented in transformers, and how we might go about finding it. (A toy sketch of search in this sense follows this list.)
  • Unsupervised learning

    I want to use unsupervised learning to try to understand what is going on inside neural networks. In particular, I mean mapping activation patterns (which live in some weird, incomprehensible space) to a space with human-understandable structure. (A minimal sketch of this workflow follows this list.)

    ✔️ Avoids ELK-style Goodharting by a powerful reporter (this is why we use linear probes, which are intentionally weak)
    ✔️ Extracts features without relying on us to provide them (we may find things we weren't expecting to find)
    ✔️ Helps protect us from confirmation bias
    ❌ Methods are brittle to assumptions about the structure of the data
    ❌ A failure to find features doesn't prove they aren't there

    Examples:
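To make the quoted definition from the first item concrete, here is a minimal sketch of explicit search: enumerate a candidate space and select the element that scores highest under an explicitly represented objective. The candidate space and objective below are toy placeholders of my own, not anything recovered from a transformer.

```python
def explicit_search(candidates, objective):
    """Return the candidate scoring highest under the explicitly represented objective."""
    best, best_score = None, float("-inf")
    for c in candidates:
        score = objective(c)  # evaluate each element of the search space
        if score > best_score:
            best, best_score = c, score
    return best

# Toy usage: search over integer "plans", scoring closeness to a target value.
plans = range(-10, 11)
best_plan = explicit_search(plans, objective=lambda p: -abs(p - 7))
print(best_plan)  # 7
```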
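For the second item, a minimal sketch of the proposed workflow, with synthetic arrays standing in for real model activations, and with k-means plus a logistic-regression probe assumed as one possible (intentionally weak) instantiation:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
acts = rng.normal(size=(1000, 64))  # stand-in for (n_samples, d_model) activation vectors

# Unsupervised step: cluster activations to surface structure we did not specify in advance.
clusters = KMeans(n_clusters=8, n_init=10, random_state=0).fit_predict(acts)

# Intentionally weak readout: a linear probe predicting the discovered clusters from activations.
probe = LogisticRegression(max_iter=1000).fit(acts, clusters)
print("probe accuracy on discovered clusters:", probe.score(acts, clusters))
```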