###### tags: `one-offs` `puzzles`
# Unbiased Estimators of the Inverse of the Mean from a Poisson Stream
**Overview**: In this post, I describe an estimation puzzle related to Poisson laws, and present a simple solution.
## The Puzzle
Suppose that you have a machine which generates iid $\mathrm{Poisson}(\lambda)$ samples with $\lambda$ unknown, and that I want a non-negative, unbiased estimate of $\lambda^{-1}$. You can take as long as you need, but after a finite amount of time, you have to give me an answer. How do you do this?
## First Thoughts
Note that from a single Poisson sample, many quantities admit simple estimators, e.g.
- For $n \in \mathbf{N}$, $\mathbf{E} \left[ \mathbf{I} \left[ X = n \right] \right] = \frac{\lambda^n \cdot \exp(-\lambda)}{n!}$.
- For $k \in \mathbf{N}$, $\mathbf{E} \left[ X(X-1)(X-2)\cdots(X-k+1) \right] = \lambda^k$.
- For $t \in \mathbf{R}$, $\mathbf{E} \left[ \exp (t \cdot X) \right] = \exp(\lambda\cdot(\exp (t) - 1))$.
In practice, when taking a fixed number of Poisson samples, one can either take simple averages of these basic estimators, or use tricks related to the Rao-Blackwell theorem to obtain improved estimators.
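As a quick numerical sanity check, here is a minimal Python sketch (assuming `numpy` is available; the rate $\lambda = 1.7$ and the particular choices of $n$, $k$, $t$ are arbitrary) which verifies the three identities above by Monte Carlo:

```python
import math
import numpy as np

rng = np.random.default_rng(0)
lam = 1.7                           # arbitrary rate, fixed only for the check
X = rng.poisson(lam, size=10**6)    # iid Poisson(lam) samples

# E[ I[X = n] ] = lam^n * exp(-lam) / n!    (here, n = 2)
n = 2
print((X == n).mean(), lam**n * math.exp(-lam) / math.factorial(n))

# E[ X (X-1) ... (X-k+1) ] = lam^k         (here, k = 3)
print((X * (X - 1) * (X - 2)).mean(), lam**3)

# E[ exp(t X) ] = exp(lam * (exp(t) - 1))  (here, t = 0.5)
t = 0.5
print(np.exp(t * X).mean(), math.exp(lam * math.expm1(t)))
```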
Anyways, let's see whether there is a viable estimator which is computable from just one sample. Suppose that, for $X \sim \mathrm{Poisson}(\lambda)$, $f(X)$ is an unbiased estimator of $\lambda^{-1}$ for every $\lambda > 0$. Then, it holds that *as functions of $\lambda$*,
\begin{align}
\lambda^{-1} &= \sum_{n \geqslant 0} \frac{\lambda^n \cdot \exp(-\lambda)}{n!} \cdot f(n) \\
\lambda^{-1} \cdot \exp(\lambda) &= \sum_{n \geqslant 0} \frac{f(n)}{n!} \cdot \lambda^n.
\end{align}
This would relate $f$ to the Taylor expansion of $\lambda \mapsto \lambda^{-1} \cdot \exp(\lambda)$ around $\lambda = 0$. But there is a hitch: this function has a pole at $\lambda = 0$, and so does not admit a Taylor expansion there! As such, no such estimator can exist.
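To see the obstruction concretely, expand the left-hand side:
\begin{align}
\lambda^{-1} \cdot \exp(\lambda) = \lambda^{-1} + \sum_{n \geqslant 0} \frac{\lambda^n}{(n+1)!},
\end{align}
and no choice of $f$ can produce the $\lambda^{-1}$ term from the power series $\sum_{n \geqslant 0} \frac{f(n)}{n!} \cdot \lambda^n$.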
## Second Pass
When a single sample does not suffice, a fixed number of samples is sometimes enough, i.e. perhaps one could draw $K > 1$ independent Poisson realisations and combine them somehow. However, one can show that this strategy will also not work here. Roughly speaking, the sum $S = X_1 + \cdots + X_K \sim \mathrm{Poisson}(\lambda \cdot K)$ is a sufficient statistic, and so by conditioning on $S$ (à la Rao-Blackwell), any unbiased estimator of $\lambda^{-1}$ based on $(X_1, \ldots, X_K)$ could be converted into an unbiased estimator based on $S$ alone; dividing by $K$ would then yield a single-sample unbiased estimator of $(\lambda \cdot K)^{-1}$, and from our earlier discussion, we know that this is not possible.
The next thing to try is then to draw a random number of samples, perhaps adaptively. For example, one could try to design a procedure based around drawing independent Poisson variables until one of them returns a nonzero result. This is equivalent to simulating a Geometric random variable with success parameter $p = 1 - \exp(-\lambda) = \mathbf{P}(X > 0)$, and so the number $F$ of failed trials satisfies
\begin{align}
\mathbf{E} [ F ] &= p^{-1} - 1 \\
&= (1 - \exp(-\lambda))^{-1} - 1 \\
&= \frac{\exp(-\lambda)}{1 - \exp(-\lambda)} \\
&= ( \exp(\lambda) - 1)^{-1}.
\end{align}
As $\lambda \to 0^+$, this behaves like $\lambda^{-1}$, which suggests that we might be on the right path. I mean this in the following sense: if we are interested in estimating quantities of the form $g(\lambda)$, life is broadly easier if $g$ is smooth, continuous, etc. As such, we might hope to 'absorb the singularity' at $\lambda = 0$ somehow.
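As a sanity check on this formula, here is a small simulation sketch (the function name and the rate $\lambda = 0.8$ are mine, chosen only for illustration) comparing the empirical mean of $F$ against $(\exp(\lambda) - 1)^{-1}$:

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 0.8                            # arbitrary rate, fixed only for the check

def count_failures(rng, lam):
    """Number of Poisson(lam) draws equal to zero before the first positive one."""
    f = 0
    while rng.poisson(lam) == 0:
        f += 1
    return f

F = [count_failures(rng, lam) for _ in range(10**5)]
print(np.mean(F), 1.0 / np.expm1(lam))   # E[F] = (exp(lam) - 1)^{-1}
```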
To this end, write
\begin{align}
\lambda^{-1} &= ( \exp(\lambda) - 1)^{-1} \cdot \frac{\exp(\lambda) - 1}{\lambda}.
\end{align}
Bearing this in mind, it becomes clear that if we were also able to form an independent unbiased estimator of the smooth factor $\frac{\exp(\lambda) - 1}{\lambda}$, then multiplying it by $F$ would give an unbiased estimator of $\lambda^{-1}$, and we would be done. As it turns out, the construction below realises the same idea through an additive, rather than multiplicative, decomposition.
Now, suppose that we implement this procedure, generating $F$ realisations taking the value of $0$, and then generating $X_*$ with the law of $\mathrm{Poisson} (\lambda)$ conditioned to be positive, i.e. with mass function
\begin{align}
\mathbf{P}_\lambda (X_* = n) := (\exp(\lambda) - 1)^{-1} \cdot \frac{\lambda^n}{n!}, \qquad n \geqslant 1.
\end{align}
As with standard Poisson variables, under this law the expectations of many expressions involving rising and falling factorials remain tractable. In particular, one can compute explicitly that
\begin{align}
\mathbf{E}_{\lambda} \left[ \frac{1}{X_* + 1} \right]
&=\lambda^{-1} - (\exp(\lambda) - 1)^{-1}.
\end{align}
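For completeness, here is the computation behind this identity:
\begin{align}
\mathbf{E}_{\lambda} \left[ \frac{1}{X_* + 1} \right]
&= (\exp(\lambda) - 1)^{-1} \cdot \sum_{n \geqslant 1} \frac{1}{n + 1} \cdot \frac{\lambda^n}{n!} \\
&= (\exp(\lambda) - 1)^{-1} \cdot \lambda^{-1} \cdot \sum_{n \geqslant 1} \frac{\lambda^{n+1}}{(n+1)!} \\
&= (\exp(\lambda) - 1)^{-1} \cdot \lambda^{-1} \cdot \left( \exp(\lambda) - 1 - \lambda \right) \\
&= \lambda^{-1} - (\exp(\lambda) - 1)^{-1}.
\end{align}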
Since $\mathbf{E}[F] = (\exp(\lambda) - 1)^{-1}$, it thus follows that $F + \frac{1}{X_* + 1}$ has expectation $\lambda^{-1}$; moreover, this estimator is non-negative by construction, and so we are done. As an aside (which is not strictly needed for the unbiasedness), it can be checked that $X_*$ and $F$ are independent of one another.
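Putting everything together, here is a minimal end-to-end sketch of the procedure (the function name and the rate $\lambda = 0.5$ are mine, fixed only so that unbiasedness can be checked by Monte Carlo):

```python
import numpy as np

rng = np.random.default_rng(0)

def inv_mean_estimate(rng, lam):
    """One realisation of the non-negative, unbiased estimator of 1/lam:
    draw Poisson(lam) variables until the first positive value X_*,
    counting the F zero draws along the way, and return F + 1/(X_* + 1)."""
    f = 0
    x = rng.poisson(lam)
    while x == 0:
        f += 1
        x = rng.poisson(lam)
    return f + 1.0 / (x + 1)

lam = 0.5                            # arbitrary rate, fixed only for the check
est = [inv_mean_estimate(rng, lam) for _ in range(10**5)]
print(np.mean(est), 1.0 / lam)       # should agree up to Monte Carlo error
```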
## Closing Comments
Some further refinements of this puzzle which might be worth considering are:
- Analyse the variance of this estimator, and describe a strategy for controlling the variance (either in absolute or relative terms) uniformly in $\lambda$.
- Given $K$ independent replications $\{ (F_k, X_{*, k}) \}_{k \in [K]}$ of this procedure, one can improve the estimate of $\lambda^{-1}$ by simple averaging. Can one do anything smarter, in terms of variance reduction?
- Similarly, for $K \in \mathbf{N}$, devise an estimator of $\lambda^{-K}$ which makes optimal use of the samples generated along the way.