---
tags: dev, r, statistics, distributed
title: Time Collisions
description: A demonstration that we cannot rely on uniqueness of time generated identifiers
---
# Time Collisions
In his awesome book "[Designing Data-Intensive Applications](https://dataintensive.net/)" :heart:, [Martin Kleppmann](https://martin.kleppmann.com/) explains and demonstrates the dangers of relying on timestamps to synchronize events (section "Relying on Synchronized Clocks", p. 291). :clock10:
Anyone who uses timestamps to generate (supposedly) unique identifiers :id: is exposed to the same issues.
Even if it seems obvious, some people still refuse to admit it [^1]. :confused:
To demonstrate that timestamp-based identifier generation inevitably leads to collisions :collision:, I wrote a simple program in the beloved :heart_eyes: [R](https://cran.r-project.org/) language.
*The full code can be found at the [end of the post](#The-Code).*
The exercise consisted of running **3 concurrent processes**, each generating **1 000 identifiers** from the system time (with millisecond precision).
A random sleep time ranging from 1 to 2 seconds occurred between each generation, to simulate a **variable execution time**.
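Each identifier is nothing more than the system clock read in milliseconds, exactly as in the `get_values` function of `job.R` below:
```r
# A timestamp-based identifier: the current system time in milliseconds.
# Two processes reading the clock within the same millisecond produce
# the very same value - that is the collision we are looking for.
as.numeric(Sys.time()) * 1000
```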

*Diagram [^3] made with [Excalidraw](https://excalidraw.com/). Use it, share it!* :smiley:
*A similar real-world application could be a serverless one, with a function triggered by a message queue ([AWS Lambda](https://aws.amazon.com/lambda/features/), [Azure Functions](https://azure.microsoft.com/en-us/services/functions/), or any of your choice).*
*Multiple instances of this function will be executed simultaneously on the numerous compute resources hosted by the cloud provider's infrastructure [^4].*
And the verdict is... :tada:
Collisions **do** happen, as shown in these graphs:

***3 times** in 20 minutes.*

***3 times** in 25 minutes.*
I'll let you imagine the consequences, depending on how such identifiers might be used... :flushed:
As a conclusion, **never ever use timestamps** to generate identifiers. [^2]
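In R, for example, the [`uuid`](https://cran.r-project.org/package=uuid) package (not used anywhere in the code below, just a suggestion) generates such identifiers in one call:
```r
# Version 4 (random) UUIDs: the collision probability is negligible,
# no matter how many concurrent processes are generating them.
# install.packages("uuid")
library(uuid)
UUIDgenerate(use.time = FALSE)                 # one identifier
replicate(3, UUIDgenerate(use.time = FALSE))   # several identifiers
```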
[^1]: Believe it or not, I witnessed this anti-pattern in some professional solutions. :tired_face:
[^2]: Prefer solutions such as [**U**niversally **U**nique **ID**entifiers](https://www.uuidgenerator.net/).
[^3]: <https://excalidraw.com/#json=5714824436645888,9uZWHFuCuM8B26PqtJPW9A>
[^4]: On Azure Functions, the [consumption plan](https://docs.microsoft.com/en-us/azure/azure-functions/functions-scale#consumption-plan) has an [upper-limit](https://docs.microsoft.com/en-us/azure/azure-functions/functions-scale#service-limits) of 200 instances.
> When you're using the Consumption plan, instances of the Azure Functions host are dynamically added and removed based on the number of incoming events. This serverless plan scales automatically.
## The Code
### job.R
```r
library(tidyverse)
library(parallel)

# Number of random sleep values
sleep_precision <- 1000
# Sleep duration in seconds
sleep_min <- 1
sleep_max <- 2
# Number of values to generate
values_count <- 1000

## Identifiers generation function ----
get_values <- function(sleep_precision, values_count, sleep_min, sleep_max) {
  sleeps <- sample(sleep_precision, values_count, replace = T) / sleep_precision * sleep_max
  sleeps[sleeps < sleep_min] <- sleep_min
  sapply(sleeps,
         function(s) {
           Sys.sleep(s)
           # Identifier is generated here
           as.numeric(Sys.time()) * 1000
         }
  )
}

## Run cluster ----
cores_count <- detectCores() - 1
cluster <- makeCluster(cores_count)
process_start <- Sys.time()
res <- clusterCall(cluster,
                   get_values,
                   sleep_precision,
                   values_count,
                   sleep_min,
                   sleep_max)
process_end <- Sys.time()
stopCluster(cluster)

## Collect and save data ----
values <- tibble(
  value = c(res, recursive = T)
)
attr(values, "sleep_min") <- sleep_min
attr(values, "sleep_max") <- sleep_max
attr(values, "values_count") <- values_count
attr(values, "cores_count") <- cores_count
attr(values, "duration") <- as.double(process_end) - as.double(process_start)
save(values,
     file = paste0("data/TimeCollisions-", sleep_min, "-", sleep_max, "-", values_count, ".RData")
)
```
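Before plotting anything, a quick sanity check on the saved `values` tibble already tells whether collisions occurred (a minimal sketch, assuming the `.RData` file produced by `job.R` above):
```r
# Count identifiers that were generated more than once.
load("data/TimeCollisions-1-2-1000.RData")
sum(duplicated(values$value))   # anything > 0 means collisions happened
```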
### dataviz.R
```r
library(tidyverse)

## Load data ----
load(file = "data/TimeCollisions-1-2-1000.RData")
sleep_min <- attr(values, "sleep_min")
sleep_max <- attr(values, "sleep_max")
values_count <- attr(values, "values_count")
cores_count <- attr(values, "cores_count")
duration <- attr(values, "duration")

## Build metrics ----
values_agg <- values %>%
  group_by(value) %>%
  summarise(count = n()) %>%
  arrange(value)
(collisions_max <- values_agg %>% summarise(max = max(count)) %>% pull())
# Global collision rate
(collisions_rate <- (values_agg %>% filter(count > 1) %>% summarise(sum(count)) %>% pull()) / (values %>% nrow()))
# Collisions by group
collisions_rates <- values_agg %>%
  group_by(count) %>%
  summarise(c = n()) %>%
  transmute(
    group = count,
    count = c
  ) %>%
  ungroup() %>%
  mutate(rate = count / sum(count))
# Total number of collisions
(collisions_count <- collisions_rates %>%
  filter(group > 1) %>%
  transmute(group_count = (group - 1) * count) %>%
  summarise(sum(group_count)) %>%
  pull())

## Dataviz ----
# Base colour = "seagreen"
values_colours <- c(rgb(46, 139, 87, alpha = 128, maxColorValue = 255),
                    colorRampPalette(c("orange", "red"), alpha = T)(collisions_max - 1))
values_agg %>%
  ggplot() +
  geom_col(aes(1:(values_agg %>% nrow()),
               count,
               fill = factor(count)),
           show.legend = F,
           width = 1) +
  geom_point(aes(row, count, size = 2),
             shape = 8,
             color = "red",
             show.legend = F,
             data = values_agg %>% mutate(row = row_number()) %>% filter(count > 1)) +
  scale_fill_manual(values = values_colours) +
  scale_y_continuous(
    breaks = seq(1, collisions_max)
  ) +
  xlab("Identifiers") +
  ylab("Occurrences") +
  ggtitle(
    "Time Collisions",
    paste0(
      "Collision rate of ",
      sprintf("%.02f%%", collisions_rate * 100),
      " (",
      collisions_count,
      " collisions) for ",
      prettyNum(as.integer(values_count), big.mark = " "),
      " identifiers (cores = ",
      cores_count,
      ", sleep time = ",
      sleep_min,
      "-",
      sleep_max,
      "s, total duration = ",
      sprintf("%.02f", duration),
      "s)."
    )
  ) +
  theme(
    axis.ticks.x = element_blank(),
    axis.text.x = element_blank(),
    panel.grid.major.x = element_blank(),
    panel.grid.minor.x = element_blank(),
    panel.grid.minor.y = element_blank()
  )
```
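To export the resulting graph as an image right after the plotting code, a plain `ggsave()` call will do (the file name and dimensions below are just examples):
```r
# Save the last plot rendered by ggplot2 to disk.
ggsave("time-collisions.png", width = 10, height = 6, dpi = 150)
```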