# Hash Collision Problem Let $n$ be the output length in bits of a hash function $h$, and $t$ mutually distinct message $(x_1, x_2, \cdots, x_t)$ be randomly selected to test collision $h(x_i) = h(x_j)$. Estimate the probability of "at least one collision" to be $\lambda$. Show that when $n$ is large, the required number of messages is approximately $$ t \approx 2^{(n+1)/2} \sqrt{\ln \left( \frac{1}{1 - \lambda} \right)}.$$ --- Assume $t$ is fixed. The probability of no collision, $1 - \lambda$, is $$1 \left( 1 - \frac{1}{2^n} \right) \left( 1 - \frac{2}{2^n} \right) \cdots \left( 1 - \frac{t - 1}{2^n} \right).$$ The Taylor series of $e^{-x}$ at $x = 0$ suggests that $e^{-x} \approx 1 - x$. The probability can be expressed approximately as $$1 \cdot e^{-1/2^n} \cdot e^{-2/2^n} \cdot \, \cdots \, \cdot e^{-(t-1)/2^n},$$ or $$\begin{align} 1 - \lambda &= \exp \left[ -t(t-1)/2^{n+1} \right]\\ &\approx \exp \left[ -t^2/2^{n+1} \right].\end{align}$$ Solving for $t$ yields the resulting approximation. --- For [Birthday problem](https://en.wikipedia.org/wiki/Birthday_problem), $n = \lg 365$ and $\lambda = 1/2$. Compute $2^{(n+1)/2} = \sqrt{730}$. Inserting them into above result yields $$t = \sqrt{730} \cdot \sqrt{\ln 2} = 22.49\cdots.$$ In general, when there are $d$ choices, we anticipate 50% chance of presence of collisions if the number of random tests is above $$2^{(\lg d + 1)/2} \sqrt{\ln 2} = 1.17 \sqrt{d} = O(\sqrt d).$$