Hash Function Distribution
After examining distributed systems sharding strategies , I wanted to verify if hash computation follows a uniform distribution .
Image Not Showing
Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More โ
TL;DR: Yes it does.
Image Not Showing
Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More โ
When it comes to solve such problems, R is my best friend.
Image Not Showing
Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More โ
Pseudo code:
The full code can be found at the end of the post.
An animated GIF is generated for each hash function, to make it more visual.
Image Not Showing
Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More โ
Hash Functions
MD5 |
SHA1 |
SHA256 |
Image Not Showing
Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More โ
|
Image Not Showing
Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More โ
|
Image Not Showing
Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More โ
|
MD5
Image Not Showing
Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More โ
SHA1
Image Not Showing
Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More โ
SHA256
Image Not Showing
Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More โ
The Code
library(tidyverse)
library(digest)
library(animation)
item_max <- 500000
shard_count <- 10
lseq <- function(from, to, length.out) {
exp(seq(log(from), log(to), length.out = length.out))
}
get.shard <- Vectorize(function(value, hash_algo) {
digest2int(digest(value, hash_algo)) %% shard_count + 1
})
plot.hash <- function(shards, hash_algo, pass_count=100) {
for (item_count in as.integer(lseq(1, shards %>% nrow(), pass_count))) {
graph <- shards %>%
head(item_count) %>%
ggplot() +
geom_bar(aes(shard), na.rm = T) +
scale_x_continuous(
n.breaks = shard_count,
limits = c(0, shard_count + 1),
labels = c("", 1:shard_count, "")
) +
xlab("Shard") +
ylab("Count") +
ggtitle(
paste0("Hash Distribution (", hash_algo, ")"),
paste0(
hash_algo,
" hash function distribution for ",
prettyNum(as.integer(item_count), big.mark = " "),
" values."
)
) +
theme(
axis.ticks.y = element_blank(),
axis.text.y = element_blank(),
panel.grid.major.x = element_blank(),
panel.grid.minor.x = element_blank(),
panel.grid.major.y = element_blank(),
panel.grid.minor.y = element_blank()
)
print(graph)
}
}
shards <- tibble(
value = runif(item_max)
) %>%
transmute(
md5 = get.shard(value, "md5"),
sha1 = get.shard(value, "sha1"),
sha256 = get.shard(value, "sha256")
)
for (hash_algo in names(shards)) {
filename <- paste0("HashDistribution-", hash_algo, ".gif")
unlink(filename)
saveGIF(
plot.hash(shards %>% select(shard = all_of(hash_algo)), hash_algo),
movie.name = filename,
interval = .2,
ani.width = 600,
ani.height = 600,
autobrowse = F
)
}