Computation in Superposition: Math Overview
"Toward a Mathematical Framework for Computation in Superposition" (TMF) [link] provides a proof of concept for how superposition can allow parallel on many features in parallel. As a mathematical description, it serves as a starting place for understanding more complex types of superposition computation.
Here we give a condensed version of the points made in the post.
Features: Say we have a set of $d$ sparse, boolean features (eg, a set of topics that are either present or not in a text).
Denote the features by $f_1, \dots, f_d \in \{0, 1\}$. Let $\ell$ be the number of features "on" at the same time.
Neurons: In the MLP layer, $m$ neurons each take a random subset of the features as inputs: $a_k = \mathrm{ReLU}\left(\sum_i W_{ki} f_i - 1\right)$. Neurons are only nonzero if at least two of their input features are "on".
where $W_{ki}$ is 1 with probability $p$ and 0 otherwise.
Each neuron's inputs define a set of features $S_k = \{i : W_{ki} = 1\}$.
$a_k = 1$ if only two features in $S_k$ are "on". More generally, $a_k = r - 1$ for $r \geq 2$ features "on" in $S_k$ and 0 otherwise.
Read off ANDs: To linearly read off the AND operation for two features $f_i$ and $f_j$, we take the average over neurons that have both features as inputs.
AND estimate: $\widehat{f_i \wedge f_j} = \frac{1}{|\{k : i, j \in S_k\}|} \sum_{k :\, i, j \in S_k} a_k$.
We expect the $a_k$ included in the sum to typically be 1 when $f_i$ and $f_j$ are both "on" and 0 otherwise, giving the correct AND behavior. Rare events when other "on" features are included in the sums add noise to these values.
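To make the construction concrete, here is a minimal numpy sketch of the random connectivity, the ReLU layer, and the averaging read-off for a single pair. The parameter values and helper names (e.g. `and_readoff`) are illustrative choices, not taken from the post.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (not from the post): d features, m neurons,
# connection probability p, ell features "on" at once.
d, m, p, ell = 1000, 10_000, 0.03, 5

# Random sparse connectivity: W[k, i] = 1 with probability p.
# Row k's support is the input set S_k of neuron k.
W = (rng.random((m, d)) < p).astype(float)

# A sparse boolean input with ell features "on".
f = np.zeros(d)
on = rng.choice(d, size=ell, replace=False)
f[on] = 1.0

# Neuron activations: ReLU(sum_i W_ki f_i - 1); nonzero only when
# at least two connected features are "on".
a = np.maximum(W @ f - 1.0, 0.0)

def and_readoff(i, j):
    """Estimate f_i AND f_j by averaging over neurons connected to both features."""
    mask = (W[:, i] == 1) & (W[:, j] == 1)
    if not mask.any():
        return float("nan")      # pair not covered by any neuron
    return a[mask].mean()

i, j = on[0], on[1]                           # both "on": estimate should be near 1
off = int(np.setdiff1d(np.arange(d), on)[0])  # an "off" feature: estimate should be near 0
print("AND(on, on)  ≈", and_readoff(i, j))
print("AND(on, off) ≈", and_readoff(i, off))
```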
Interference: Noisy contributions to the read-off from other "on" features feeding the same neurons are negligible if the features are sufficiently sparse and the random subsets of feature inputs are small enough.
Overall, the noise contributions are small if the expected number of nonzero terms in a neuron's sum over features is small. Since $\ell$ features are "on" and each contributes to the sum with probability $p$, this requires $p \ell \ll 1$.
Consider a neuron with both $f_i$ and $f_j$ as inputs.
If both features are "on", then the neuron is in the linear piece of the ReLU, so extra "on" inputs add directly to $a_k$. There are $\ell - 2$ other features that are "on" and a probability of $p$ of each feature contributing to the sum. So noise of order $p \ell$ contributes to $\widehat{f_i \wedge f_j}$. This is only small if $p \ell \ll 1$, ie the probability of extra features being "on" and connected to the neuron is small.
If only one feature is "on", then a similar calculation shows the noise contribution is also of order $p \ell$.
If both features are "off", then there is only noise when two other features in the sum are "on". This is more rare, so the noise contribution is smaller, of order $(p \ell)^2$, if $p \ell \ll 1$.
Universal AND: It's possible to read off all pairwise ANDs with a layer size that is only a power of $\log d$.
To ensure a high probability that any pair of features can be read off, we need $m p^2 \gtrsim \log d$.
A given pair of features will be inputs into $m p^2$ neurons on average. The probability of the pair not appearing in any neuron is $e^{-m p^2}$, from the Poisson distribution.
Across $\sim d^2 / 2$ pairs, the probability of all pairs being in at least one neuron's inputs is $(1 - e^{-m p^2})^{d^2 / 2} \approx 1 - \frac{d^2}{2} e^{-m p^2}$. For the second term to be small, we need $m p^2 \gtrsim 2 \log d$.
The post makes a particular choice of $p$ consistent with these constraints; each AND estimate is then an average over $\sim m p^2$ terms.
If $\ell$ is also a power of $\log d$, then $m$ can be a power of $\log d$ and still be large enough for small interference terms (combining $p \ell \ll 1$ with $m p^2 \gtrsim \log d$ requires $m \gtrsim \ell^2 \log d$).
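As a rough numerical illustration of the scaling argument (my own parameter choice; the post's exact choice of $p$ may differ), pick $p$ from the coverage condition and check both conditions for a polylog-sized layer:

```python
import numpy as np

def uand_params(d, ell):
    """Illustrative parameter choice (mine, not necessarily the post's):
    take m ~ ell^2 (log d)^3 neurons and p = sqrt(3 log d / m), so that
    m p^2 ~ 3 log d (coverage) while p * ell << 1 (interference)."""
    logd = np.log(d)
    m = int(np.ceil(2 * ell**2 * logd**3))
    p = np.sqrt(3 * logd / m)
    coverage = m * p**2                        # expected neurons per feature pair
    miss_prob = d**2 / 2 * np.exp(-coverage)   # chance that some pair is covered by no neuron
    return dict(m=m, p=round(p, 5), p_times_ell=round(p * ell, 3),
                pair_miss_prob=float(miss_prob))

# ell a power of log d  =>  m stays polylogarithmic in d.
print(uand_params(d=100_000, ell=int(np.log(100_000))))
```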
Realistic regimes:
Based on values from Anthropic's "Towards Monosemanticity", it wouldn't be possible to use the U-AND construction to compute ANDs for all features.
Number of features: $d$ in the range 500-130K (technically these are features found in the neuron activations).
Number of neurons in the MLP layer: $m = 512$.
Constraints on $p$:
If we take $p$ small enough that interference is under control ($p \ell \ll 1$), then with $m = 512$ the coverage condition $m p^2 \gtrsim \log d$ fails, and we would only be able to extract a small fraction of all ANDs.
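A back-of-the-envelope check with round numbers (the values of $d$, $m$, and $\ell$ below are assumptions in the spirit of the ranges quoted above, not the post's exact numbers):

```python
import numpy as np

# Assumed, illustrative values: d features, m MLP neurons, ell simultaneously active features.
d, m, ell = 100_000, 512, 10

p_max = 0.1 / ell                        # cap p so that interference p*ell <= 0.1
neurons_per_pair = m * p_max**2          # expected neurons available to read off a given AND
coverage_needed = 2 * np.log(d)          # what reading off *all* pairwise ANDs would require
frac_pairs_touched = 1 - np.exp(-neurons_per_pair)

print(f"neurons per pair ≈ {neurons_per_pair:.3f}   (need ≳ {coverage_needed:.0f})")
print(f"fraction of pairs touched by at least one neuron ≈ {frac_pairs_touched:.3f}")
```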
Extensions
When inputs are in superposition: The construction also works if the inputs are already in superposition. However, in this case the number of ANDs that can be computed is only of order the number of parameters in the layer, $m D$.
Let the sparse, boolean features correspond to an "overbasis" of roughly orthogonal directions $\vec{v}_1, \dots, \vec{v}_d$ in a $D$-dim space. So $d \gg D$. The input is a sum of feature vectors, $\vec{x} = \sum_i f_i \vec{v}_i$.
Decompose the neuron weight matrix into $W = \tilde{W} E$, where $E$ ($d \times D$ dims) puts the input back into the feature basis via a dot product (its rows are the feature directions $\vec{v}_i$), and $\tilde{W}$ is the same as in the U-AND construction above.
After applying $E$, the features are either "on" with value $\approx 1$ or "off" with a small random value of standard deviation $\sim \sqrt{\ell / D}$.
Unlike before, the interference terms now add up for "off" features. If neurons connect to $\gtrsim D / \ell$ features via $\tilde{W}$, then the interference terms are order 1 and cause "misfires". This requires $p d \ll D / \ell$.
For large $d$, we cannot compute all pairwise ANDs since $p$ would be too small for the neurons to contain all pairs of features.
Instead, we have to choose which ANDs to compute. It's basically the same construction but subsetting to the features involved in the chosen ANDs.
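A small numpy sketch of the projection step (my own illustration, with random unit-norm directions standing in for the overbasis): $E\vec{x}$ recovers the "on" features up to noise of size $\sqrt{\ell/D}$, and a neuron that sums $\sim D/\ell$ such noisy values sees order-1 interference.

```python
import numpy as np

rng = np.random.default_rng(2)

d, D, ell = 5000, 250, 5                      # d sparse features in superposition in a D-dim space

E = rng.normal(size=(d, D)) / np.sqrt(D)      # rows ~ unit-norm feature directions v_i
f = np.zeros(d)
on = rng.choice(d, size=ell, replace=False)
f[on] = 1.0
x = E.T @ f                                   # input: sum of the "on" feature vectors

# E x puts the input approximately back in the feature basis:
# ~1 for "on" features, noise of std ~ sqrt(ell / D) for "off" features.
recovered = E @ x
off = np.setdiff1d(np.arange(d), on)
print("on-feature values:", recovered[on].round(2))
print("off-feature noise std:", round(recovered[off].std(), 3),
      "  vs sqrt(ell/D) =", round(np.sqrt(ell / D), 3))

# A neuron connected to n_conn features sums n_conn of these noisy values, so its
# pre-activation noise grows like sqrt(n_conn * ell / D): order 1 once n_conn ~ D / ell.
for n_conn in [10, 25, D // ell]:
    sums = [recovered[rng.choice(off, size=n_conn, replace=False)].sum() for _ in range(500)]
    print(f"n_conn = {n_conn:3d}:  pre-activation interference ≈ {np.std(sums):.2f}")
```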
Random Gaussian weights for inputs not in superposition:
A layer with random Gaussian weights would have neuron activations $a_k = \mathrm{ReLU}\left(\sum_i W_{ki} f_i + b\right)$, with $W_{ki} \sim \mathcal{N}(0, 1)$, for boolean feature values $f_i \in \{0, 1\}$.
We need readoff weights $u^{ij}$ that are correlated with the activations when two features, $f_i$ and $f_j$, are active, but uncorrelated otherwise.
To readoff $f_i \wedge f_j$, we can use weights $u^{ij}_k \propto W_{ki} W_{kj}$, normalized by $\mathbb{E}\left[\sum_k W_{ki} W_{kj} a_k \mid f_i = f_j = 1\right]$, where the expectation is over inputs. Because of the expectation, the readoff weights would need to be learned.
If both features are active, then by design we'd expect $u^{ij} \cdot a \approx 1$.
If one or zero features are active, then $u^{ij} \cdot a \approx 0$, since either $W_{ki}$ or $W_{kj}$ is uncorrelated with $a_k$.
In general, the noise in the numerator sum ($\sum_k W_{ki} W_{kj} a_k$) should scale as $\sqrt{m}$ times the typical size of the nonzero neuron activations, $a_k \sim \sqrt{\ell}$, for $\ell$ active features. The denominator grows as $m$, so the readoff noise is suppressed as $\sim 1 / \sqrt{m}$.
If the bias is learned, then its value is likely determined by minimizing this noise term, written in terms of $z_1$ and $z_2$: random Gaussian variables with mean zero and variances $2$ and $\ell - 2$ that represent the sums over the two AND features and the other active features, respectively.
It's possible to also readoff using the weights $u^{ij}_k = \mathrm{ReLU}(W_{ki} + W_{kj} + b)$, ie the activation neuron $k$ would have if only $f_i$ and $f_j$ were active.
This approach has the advantage of a higher correlation between $u^{ij}$ and $a$, but the disadvantage that the readoff is nonzero when only one feature is active, going something like a fixed fraction of the both-active value.
This approach is perhaps favorable when $m$ is not large and the noise for the first approach is therefore large.
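A Monte Carlo sketch of the first readoff approach (my own illustration, using the notation as reconstructed above; the sample-estimated normalization stands in for the part of the readoff weights that would have to be learned):

```python
import numpy as np

rng = np.random.default_rng(3)

d, m, ell, b = 200, 20_000, 10, 0.0      # illustrative sizes; b is the (here untuned) bias
W = rng.normal(size=(m, d))              # random Gaussian weights

def activations(f):
    return np.maximum(W @ f + b, 0.0)

# Readoff direction for AND(f_0, f_1): u_k proportional to W_k0 * W_k1.
u = W[:, 0] * W[:, 1]

def numerator(f0, f1):
    """u . a for a random sparse input with the target pair forced to (f0, f1)."""
    f = np.zeros(d)
    f[rng.choice(np.arange(2, d), size=ell - 2, replace=False)] = 1.0
    f[0], f[1] = f0, f1
    return u @ activations(f)

# Normalization = E[u.a | both on], estimated from samples; in practice this
# part of the readoff would have to be learned.
norm = np.mean([numerator(1, 1) for _ in range(200)])

for f0, f1 in [(1, 1), (1, 0), (0, 0)]:
    vals = np.array([numerator(f0, f1) for _ in range(200)]) / norm
    print(f"f_0={f0}, f_1={f1}:  readoff ≈ {vals.mean():.2f} ± {vals.std():.2f}")
```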
Stacking U-AND layers: It might be possible to stack multiple AND layers in a row to start computing more complex boolean functions. If interference is not carefully managed, then errors are likely to propagate.
ANDs of three or more features: The same type of construction can be used to compute ANDs up to a given order (eg, triple ANDs like $f_i \wedge f_j \wedge f_l$) simultaneously.
If a neuron's inputs contain three features, then its value is 2 if all three are "on" and 1 if only two are "on".
An AND of three features (say $f_i$, $f_j$, $f_l$) can be computed by adding or subtracting averages over neurons with two or three of the features:
$\widehat{f_i \wedge f_j \wedge f_l} = A_{\{i,j\}} + A_{\{i,l\}} + A_{\{j,l\}} - A_{\{i,j,l\}}$, where $A_S$ denotes the average activation over neurons whose inputs include all the features in $S$.
Similar constructions can be made for higher-order ANDs.
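As a quick numerical check of the triple-AND combination above (again an illustrative sketch reusing the same random-connectivity layer; helper names are mine), the signed sum of averages behaves like the three-way AND up to $O(p\ell)$ noise:

```python
import numpy as np

rng = np.random.default_rng(4)

d, m, p, ell = 1000, 50_000, 0.05, 3     # larger m, p so that triples of features co-occur in neurons
W = (rng.random((m, d)) < p)

def avg_over(a, feats):
    """Average activation over neurons whose inputs include every feature in `feats`."""
    mask = np.all(W[:, feats], axis=1)
    return a[mask].mean() if mask.any() else 0.0

def triple_and(a, i, j, l):
    # AND(i, j, l) ~ A_{ij} + A_{il} + A_{jl} - A_{ijl}
    return (avg_over(a, [i, j]) + avg_over(a, [i, l]) + avg_over(a, [j, l])
            - avg_over(a, [i, j, l]))

# Turn on the first n_on of the three target features (plus background features, ell total "on").
# Expect readoffs of roughly 1, 0, 0, up to O(p * ell) noise.
for n_on in [3, 2, 1]:
    f = np.zeros(d)
    f[:n_on] = 1.0
    f[rng.choice(np.arange(3, d), size=max(ell - n_on, 0), replace=False)] = 1.0
    a = np.maximum(W @ f - 1.0, 0.0)
    print(f"{n_on} of 3 target features on  ->  triple-AND readoff ≈ {triple_and(a, 0, 1, 2):.2f}")
```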