Try  HackMD Logo HackMD

Hash Collision Problem

Let

n be the output length in bits of a hash function
h
, and
t
mutually distinct message
(x1,x2,β‹―,xt)
be randomly selected to test collision
h(xi)=h(xj)
.

Estimate the probability of "at least one collision" to be

Ξ». Show that when
n
is large, the required number of messages is approximately
tβ‰ˆ2(n+1)/2ln⁑(11βˆ’Ξ»).


Assume

t is fixed. The probability of no collision,
1βˆ’Ξ»
, is
1(1βˆ’12n)(1βˆ’22n)β‹―(1βˆ’tβˆ’12n).

The Taylor series of

eβˆ’x at
x=0
suggests that
eβˆ’xβ‰ˆ1βˆ’x
. The probability can be expressed approximately as
1β‹…eβˆ’1/2nβ‹…eβˆ’2/2nβ‹…β‹―β‹…eβˆ’(tβˆ’1)/2n,

or

1βˆ’Ξ»=exp⁑[βˆ’t(tβˆ’1)/2n+1]β‰ˆexp⁑[βˆ’t2/2n+1].

Solving for

t yields the resulting approximation.


For Birthday problem,

n=lg⁑365 and
Ξ»=1/2
. Compute
2(n+1)/2=730
. Inserting them into above result yields
t=730β‹…ln⁑2=22.49β‹―.

In general, when there are

d choices, we anticipate 50% chance of presence of collisions if the number of random tests is above
2(lg⁑d+1)/2ln⁑2=1.17d=O(d).