# Hash Collision Problem
Let $n$ be the output length in bits of a hash function $h$, and $t$ mutually distinct message $(x_1, x_2, \cdots, x_t)$ be randomly selected to test collision $h(x_i) = h(x_j)$.
Estimate the probability of "at least one collision" to be $\lambda$. Show that when $n$ is large, the required number of messages is approximately
$$
t \approx 2^{(n+1)/2} \sqrt{\ln \left( \frac{1}{1 - \lambda} \right)}.$$
---
Assume $t$ is fixed. The probability of no collision, $1 - \lambda$, is $$1 \left( 1 - \frac{1}{2^n} \right) \left( 1 - \frac{2}{2^n} \right) \cdots \left( 1 - \frac{t - 1}{2^n} \right).$$
The Taylor series of $e^{-x}$ at $x = 0$ suggests that $e^{-x} \approx 1 - x$. The probability can be expressed approximately as $$1 \cdot e^{-1/2^n} \cdot e^{-2/2^n} \cdot \, \cdots \, \cdot e^{-(t-1)/2^n},$$
or $$\begin{align}
1 - \lambda &= \exp \left[ -t(t-1)/2^{n+1} \right]\\
&\approx \exp \left[ -t^2/2^{n+1} \right].\end{align}$$
Solving for $t$ yields the resulting approximation.
---
For [Birthday problem](https://en.wikipedia.org/wiki/Birthday_problem), $n = \lg 365$ and $\lambda = 1/2$. Compute $2^{(n+1)/2} = \sqrt{730}$. Inserting them into above result yields $$t = \sqrt{730} \cdot \sqrt{\ln 2} = 22.49\cdots.$$
In general, when there are $d$ choices, we anticipate 50% chance of presence of collisions if the number of random tests is above $$2^{(\lg d + 1)/2} \sqrt{\ln 2} = 1.17 \sqrt{d} = O(\sqrt d).$$