Time Collisions
In his awesome book "Designing Data-Intensive Applications"
Image Not Showing
Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →
, Martin Kleppmann explains and demonstrates the dangerousness of relying on timestamps to synchronize events (section "Relying on Synchronized Clocks", p.291).
Image Not Showing
Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →
Whoever uses timestamps to generate (expected) unique identifiers
Image Not Showing
Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →
will be exposed to the same issues.Even if it seems obvious, some people still refuse to admit it .
Image Not Showing
Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →
In order to demonstrate that timestamp-based identifiers generation inevitably leads to collisions
Image Not Showing
Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →
, I wrote a simple program in R beloved
Image Not Showing
Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →
language.The full code can be found at the end of the post.
The exercise consisted in running 3 concurrent processes, each generating 1 000 identifiers using system time (with milliseconds precision).
A random sleep time going from 1 to 2 seconds occured between each generation in order to simulate a variable execution time.
Image Not Showing
Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →
Diagram made with Excalidraw. Use it, share it!
Image Not Showing
Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →
A similar real world application could be a serverless-one, with a message queue triggered function (Amazon Lambda, Azure Functions, or any of your choice).
Multiple instances of this function will be silmutaneously executed on the numerous compute resources hosted by cloud provider infrastructure .
And the verdict is…
Image Not Showing
Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →
Collisions do happen, as shown on these graphics:
Image Not Showing
Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →
3 times in 20 minutes.
Image Not Showing
Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →
3 times in 25 minutes.
I let you imagine the consequences depending on how such identifiers could be used…
Image Not Showing
Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →
As a conclusion, never ever use timestamps to generate identifiers.
The Code
job.R
dataviz.R
library(tidyverse)
load(file = "data/TimeCollisions-1-2-1000.RData")
sleep_min <- attr(values, "sleep_min")
sleep_max <- attr(values, "sleep_max")
values_count <- attr(values, "values_count")
cores_count <- attr(values, "cores_count")
duration <- attr(values, "duration")
values_agg <- values %>%
group_by(value) %>%
summarise(count = n()) %>%
arrange(value)
(collisions_max <- values_agg %>% summarise(max = max(count)) %>% pull())
(collisions_rate <- (values_agg %>% filter(count > 1) %>% summarise(sum(count)) %>% pull()) / (values %>% nrow()))
collisions_rates <- values_agg %>%
group_by(count) %>%
summarise(c = n()) %>%
transmute(
group = count,
count = c
) %>%
ungroup() %>%
mutate(rate = count / sum(count))
(collisions_count <- collisions_rates %>%
filter(group > 1) %>%
transmute(group_count = (group - 1) * count) %>%
summarise(sum(group_count)) %>%
pull())
values_colours <- c(rgb(46, 139, 87, alpha = 128, maxColorValue = 255), colorRampPalette(c("orange", "red"), alpha = T)(collisions_max - 1))
values_agg %>%
ggplot() +
geom_col(aes(1:(values_agg %>% nrow()),
count,
fill = factor(count)),
show.legend = F,
width = 1) +
geom_point(aes(row, count, size = 2),
shape = 8,
color = "red",
show.legend = F,
data = values_agg %>% mutate(row = row_number()) %>% filter(count > 1)) +
scale_fill_manual(values = values_colours) +
scale_y_continuous(
breaks = seq(1, collisions_max)
) +
xlab("Identifiers") +
ylab("Occurences") +
ggtitle(
"Time Collisions",
paste0(
"Collision rate of ",
sprintf("%.02f%%", collisions_rate * 100),
" (",
collisions_count,
" collisions) for ",
prettyNum(as.integer(values_count), big.mark = " "),
" identifiers (cores = ",
cores_count,
", sleep time = ",
sleep_min,
"-",
sleep_max,
"s, total duration = ",
sprintf("%.02f", duration),
"s)."
)
) +
theme(
axis.ticks.x = element_blank(),
axis.text.x = element_blank(),
panel.grid.major.x = element_blank(),
panel.grid.minor.x = element_blank(),
panel.grid.minor.y = element_blank()
)