# Solomonoff-like rule extrapolation in a simple programmin language
We would like to study rule extrapolation by programs/algorithms. Let's say we have an algorithm that can accurately complete, say, `a^nb^n` sequences given their prefix. If the algorithm's input is the prefix `aab`, then its output is the suffix `b` such that the prefix and suffix taken together, in this case `aabb`, is a valid `a^nb^n` expression. Let's say that we don't care which output is produced if multiple solutions exist, so long as it is valid.
More generally, we are interested in a method `f` that satisfies a specification/contract signature of the following form:
```
method f (x: string) returns y:string
requires ∃ z : Rule1(x+z) && Rule2(x+z)
ensures Rule1(x+y)
ensures Rule2(x+y)
```
Where
* `Rule1` and `Rule2` are predicates, functions that take a string as input and evaluate to a boolean.
* the keyword `requires` specifies a precondition: the method can assume that these conditions hold for its inputs. In this case, the precondition is that it is possible to complete the string `x` such that both `Rule1` and `Rule2` apply to the concatenated string `x+z`.
* the keyword `ensures` specifies a postcondition: a logical assertion that provably holds at the end of the method's execution. In this case, there are two postconditions: one ensures that `Rule1` holds, the other ensures that `Rule2` holds.
* implicitly, we the contract also ensures that the method terminates in finite time for any input satisfying the preconditions.
We can rewrite the above contract as the following logical statement about the method f:
$$
\forall x \left( \exists z: R_1(x+z) \land R_2(x+z)) \implies T(f, x) \land R_1(x + f(x)\right) \land R_2(x + f(x))
$$
where $T$ is a predicate denoting termination of the method $f$ for input $x$.
If `Rule1` is chosen as balancedness of `a`s and `b`s, and `Rule2` as the string being ordered, i.e. `a`s precede `b`s, we recover a contract for our `a^nb^n` completion algorithm.
It is clear that there can be many potential implementations or programs, $f$, that provably satisfy the above contract. However, the contract does not say anything about these programs' behaviour when the pre-condition is violated. When given an invalid input, these methods may not terminate, and when they do, they may output arbitrary character sequences. This is to say, the implementation is not identifiable from the contract.
*Idea:* Choose a very simple programming language in which it becomes possible to sample implementations that satisfy the contract. Someting like Brainf*ck for example. Sample verified implementations, noting down their length or other measures of description complexity. Define rule extrapolation ability as the fraction of OOD prompts (strings that don't satisfy the precondition) for which the implementation satisfies at least one of the postconditions.
Hypothesis 1: Rule extrapolation ability will correlate negatively with description length.
Hypothesis 2: If one forms a Solomonoff induction-like predictor, where the completion probabilities are established based on assigning $2^{-l}$ weights to implementation of length $l$, the resulting probabilistic method will be similar to the predictions of a transformer pretrained on $a^nb^n$ languages.