Try โ€‚โ€‰HackMD

Hash Collision Problem

Let

n be the output length in bits of a hash function
h
, and
t
mutually distinct message
(x1,x2,โ‹ฏ,xt)
be randomly selected to test collision
h(xi)=h(xj)
.

Estimate the probability of "at least one collision" to be

ฮป. Show that when
n
is large, the required number of messages is approximately
tโ‰ˆ2(n+1)/2lnโก(11โˆ’ฮป).


Assume

t is fixed. The probability of no collision,
1โˆ’ฮป
, is
1(1โˆ’12n)(1โˆ’22n)โ‹ฏ(1โˆ’tโˆ’12n).

The Taylor series of

eโˆ’x at
x=0
suggests that
eโˆ’xโ‰ˆ1โˆ’x
. The probability can be expressed approximately as
1โ‹…eโˆ’1/2nโ‹…eโˆ’2/2nโ‹…โ‹ฏโ‹…eโˆ’(tโˆ’1)/2n,

or

1โˆ’ฮป=expโก[โˆ’t(tโˆ’1)/2n+1]โ‰ˆexpโก[โˆ’t2/2n+1].

Solving for

t yields the resulting approximation.


For Birthday problem,

n=lgโก365 and
ฮป=1/2
. Compute
2(n+1)/2=730
. Inserting them into above result yields
t=730โ‹…lnโก2=22.49โ‹ฏ.

In general, when there are

d choices, we anticipate 50% chance of presence of collisions if the number of random tests is above
2(lgโกd+1)/2lnโก2=1.17d=O(d).