# ZeroSumNormal
## Goals
1. 1D --> many D
2. Documentation:
3. Intuition
4. What situations this gets used in
5. Technical (docstrings)
6. Comparison with not dropping degrees of freedom
7. Comparison with dummy coding approach
8. Document high-dimensional extension
## A brief introduction to the ZeroSumNormal
(EM: FYI guys, this sounds like a JMLR paper to me...)
Why does the ZeroSumNormal exist?
tl;dr answer:
When doing regressions involving categorical variables, one common trick is to drop one category to set it as a baseline, with the effect of moving from that baseline category to other categories modelled by a difference term. In a Bayesian setting, when we have priors placed on the difference term, the _choice_ of which category to set as the baseline becomes extremely important. (NOTE: We have notebooks that document this.)
Because the choice of category to drop, or set as baseline, influences our model posteriors heavily, a different solution is needed. Instead of selecting a single category as a baseline, we propose modelling categorical effects as being shifts off the mean of all categories' true effects. Canonically, when we have such shift terms, Gaussian distributions are a natural choice. However, a few technical challenges arise in this problem setting:
1. The sum of all shifts must equal zero. This follows from defining each effect as a shift away from the mean.
2. Posterior correlations (as we will see later) may be pathological for inference, and we will need a way out of this issue.
To solve these two primary technical issues, we propose the ZeroSumNormal distribution. Its properties are such that:
1. The effects modelled by the ZeroSumNormal will sum to zero.
2. With certain implementation tricks that leverage old math, which we will describe here, posterior correlations can be eliminated.
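
To make the collinearity concern concrete, here is a minimal NumPy sketch. It is an illustration only, not the ZeroSumNormal implementation: the variable names and the simple "last effect equals minus the sum of the others" coding are assumptions made for this example. It compares the rank of three design matrices: intercept plus one effect per category (nothing dropped), baseline (dummy) coding, and a sum-to-zero coding.

```python
import numpy as np

n_categories = 4
n_obs_per_category = 3

# Category index for each observation: [0, 0, 0, 1, 1, 1, ...]
category = np.repeat(np.arange(n_categories), n_obs_per_category)

# One indicator column per category.
one_hot = np.eye(n_categories)[category]
intercept = np.ones((category.size, 1))

# Nothing dropped: intercept plus one effect per category (5 columns).
X_full = np.hstack([intercept, one_hot])

# Baseline (dummy) coding: drop the first category's column (4 columns).
X_dummy = np.hstack([intercept, one_hot[:, 1:]])

# Sum-to-zero coding (assumed for illustration): effects = C @ free,
# where the last effect is minus the sum of the free effects.
C = np.vstack([np.eye(n_categories - 1), -np.ones((1, n_categories - 1))])
X_zero_sum = np.hstack([intercept, one_hot @ C])

rank_full = np.linalg.matrix_rank(X_full)
rank_dummy = np.linalg.matrix_rank(X_dummy)
rank_zero_sum = np.linalg.matrix_rank(X_zero_sum)

print(X_full.shape[1], rank_full)          # more columns than rank
print(X_dummy.shape[1], rank_dummy)        # full column rank
print(X_zero_sum.shape[1], rank_zero_sum)  # full column rank
```

The undropped design has one more column than its rank, so the intercept and the category effects are not separately identified by the likelihood. Both dummy coding and the sum-to-zero constraint remove that extra degree of freedom; they differ in which quantity the prior is placed on.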
_Now, Adrian, Luciano, Ravin, Alex & co. can nerd out on the implementation below!_
We start by modelling the effects jointly as a multivariate Normal distribution. The motivation for this choice is not immediately apparent, but hang tight, it will become clear as we proceed. Probabilistically speaking, we want to be able to draw numbers from this multivariate Gaussian such that they sum to zero. In a 2-dimensional case, this would imply drawing a number from a multivariate Gaussian such that.......... (?stuck?)
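One way to make the sum-to-zero requirement concrete is the sketch below. It assumes a particular construction that is not spelled out in these notes: draw independent normals and subtract their mean. The centered draws sum to zero exactly and are jointly Gaussian with covariance `sigma**2 * (I - J/n)`, where `J` is the all-ones matrix. The names `raw`, `zero_sum_draws`, and `expected_cov` are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4            # number of categories
sigma = 1.0
n_draws = 200_000

# Independent normal draws, one row per draw.
raw = rng.normal(0.0, sigma, size=(n_draws, n))

# Subtracting each row's mean projects the draw onto the zero-sum subspace.
zero_sum_draws = raw - raw.mean(axis=1, keepdims=True)
row_sums = zero_sum_draws.sum(axis=1)

# Covariance implied by the projection: sigma^2 * (I - J / n).
expected_cov = sigma**2 * (np.eye(n) - np.ones((n, n)) / n)
empirical_cov = np.cov(zero_sum_draws, rowvar=False)

# In two dimensions the constraint forces the second component
# to be the negative of the first.
raw_2d = rng.normal(0.0, sigma, size=(5, 2))
draws_2d = raw_2d - raw_2d.mean(axis=1, keepdims=True)

print(np.abs(row_sums).max())
print(np.round(empirical_cov, 2))
print(draws_2d)
```

Note that the covariance `I - J/n` is singular (its rows sum to zero), so this distribution has only `n - 1` degrees of freedom even though it is written over `n` components.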
Deep inside the ZeroSumNormal implementation is a Householder transformation. How the Householder transformation gets used is explained in more detail later, but at a high level, the purpose of this transformation is to reduce computational complexity in the internal calculations. (?detail?)
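As a rough illustration of the idea, the sketch below uses a Householder reflection to map `n - 1` unconstrained values to `n` values that sum to zero. This is a guess at the general technique under stated assumptions, not a copy of the actual implementation: the helper names `householder_vector` and `to_zero_sum` are hypothetical. The reflection is chosen so that the last basis vector is sent to the normalized all-ones vector; applying it to a vector whose last coordinate is zero therefore lands in the zero-sum subspace, and it can be applied in O(n) without ever forming an n-by-n matrix.

```python
import numpy as np


def householder_vector(n):
    """Return v such that I - 2 v v^T / (v^T v) maps e_n to ones(n) / sqrt(n)."""
    if n < 2:
        raise ValueError("need at least two categories")
    v = -np.ones(n) / np.sqrt(n)
    v[-1] += 1.0  # v = e_n - ones(n) / sqrt(n)
    return v


def to_zero_sum(x):
    """Map an unconstrained vector of length n - 1 to a zero-sum vector of length n."""
    x = np.asarray(x, dtype=float)
    n = x.size + 1
    v = householder_vector(n)
    z = np.append(x, 0.0)  # embed in R^n with the last coordinate set to zero
    # Apply the reflection in O(n) without building the full matrix.
    return z - 2.0 * v * (v @ z) / (v @ v)


rng = np.random.default_rng(1)
x = rng.normal(size=4)    # 4 free parameters
effects = to_zero_sum(x)  # 5 effects that sum to zero

print(effects, effects.sum())
```

Because a reflection is orthogonal, the map preserves length, so independent standard normal inputs give effects with covariance `I - J/n` while the sampler only ever sees the `n - 1` uncorrelated free parameters.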
The ultimate effect of the ZeroSumNormal is that it gives us the analogue of a non-centered Gaussian distribution for categorical effect sizes. By using the ZeroSumNormal, we can eliminate correlations and collinearity, and provide an easier likelihood landscape for sampling to take place.
-----------
Model this as a multivariate Gaussian.
Householder transformation == "reflection about a plane".
We get correlated posteriors -- bad divergences, small ESS.
The ultimate effect is zero correlations.
No collinearity, easier to funnel.