And one very big, core note:
you can design an archival filesystem, or a filesystem for getting things done, but not both.
[{"name":"x","type":"dir"},{"name":"x/y","type":"file"},...]
Is it useful to put attribs (other than name) in their own block, so that whenever two entries have the same attribs, the attribs block's CID is simply the same?
Also makes attribs "fixed size" which might be useful (but names still aren't, so, eh).
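A minimal sketch of the "attribs in their own block" idea, under loud assumptions: the `Attribs` fields, the 20-byte big-endian layout, and the use of a bare SHA-256 hex digest in place of a real CID are all hypothetical stand-ins (a real IPLD block would go through a codec and a multihash). It only shows that entries with different names but identical attribs would point at one shared block.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"encoding/hex"
	"fmt"
)

// Attribs is a hypothetical attribute record stored apart from the entry name.
type Attribs struct {
	Mtime int64
	Posix uint32
	UID   uint32
	GID   uint32
}

// attribsID hashes a fixed-size encoding of the attributes.
// SHA-256 over a 20-byte layout stands in for a real CID (assumption).
func attribsID(a Attribs) string {
	var buf [20]byte
	binary.BigEndian.PutUint64(buf[0:8], uint64(a.Mtime))
	binary.BigEndian.PutUint32(buf[8:12], a.Posix)
	binary.BigEndian.PutUint32(buf[12:16], a.UID)
	binary.BigEndian.PutUint32(buf[16:20], a.GID)
	sum := sha256.Sum256(buf[:])
	return hex.EncodeToString(sum[:])
}

func main() {
	// Two differently named files with the same attribs share one attribs block.
	readme := attribsID(Attribs{Mtime: 0, Posix: 0o644, UID: 1000, GID: 1000})
	license := attribsID(Attribs{Mtime: 0, Posix: 0o644, UID: 1000, GID: 1000})
	script := attribsID(Attribs{Mtime: 0, Posix: 0o755, UID: 1000, GID: 1000})

	fmt.Println(readme == license) // true
	fmt.Println(readme == script)  // false
}
```

Because every field here is fixed-width, the encoded block is always the same size, which is the "fixed size" property mentioned above; the variable-length name stays in the directory entry.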
How many different ways to do attributes are there?
type Attribs struct {
    mtime Int
    posix Int # Ye standard 0777 mask packing here?
    sticky Bool (implicit: false)
    setuid Bool (implicit: false)
    setgid Bool (implicit: false)
    uid Int
    gid Int
}

type Attribs struct {
    mtime Int
    posix Int # Ye standard 0777 mask packing here?
    sticky Bool (implicit: false)
}

type Attribs struct {
    mtime optional Int
}

type Attribs struct {
}

type Attribs struct {
    mtime Int
    posix Int # Ye standard 0777 mask packing here?
    sticky Bool (implicit: false)
    uid Int
    gid Int
}
security.*
"). So this prompts "archival, or fit-for-purpose" again.
Basically: chunking is Good. Except: people like using plain whole-file hashes, because a digest at per-file granularity is easier for everyone to agree on. What do?
(It's not clear there's actually anything great possible here, period.)
Rolling checksums are the name of the game. There are several of these. A correctly implemented rolling checksum for chunk-finding should be very fast: its cost should be nearly invisible next to the cryptographic hashing of the same bytes.
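As a rough illustration of chunk-finding with a rolling checksum, here is a small buzhash-style sketch. Everything in it is an assumption made for the example: the window size, the min/max chunk sizes, the boundary mask, and the xorshift-generated byte table are arbitrary choices, and this is not the implementation or the parameters used by any particular chunker library.

```go
package main

import (
	"fmt"
	"math/bits"
)

const (
	window  = 48        // bytes covered by the rolling hash
	minSize = 256       // never cut a chunk shorter than this
	maxSize = 8192      // always cut once a chunk reaches this
	mask    = 1<<10 - 1 // cut when the low 10 bits of the hash are zero
)

// table maps each byte value to a pseudo-random word.
// It is filled by xorshift32 with a fixed, arbitrary seed (assumption).
var table = func() (t [256]uint32) {
	x := uint32(2463534242)
	for i := range t {
		x ^= x << 13
		x ^= x >> 17
		x ^= x << 5
		t[i] = x
	}
	return
}()

// split cuts data at content-defined boundaries.
func split(data []byte) [][]byte {
	var chunks [][]byte
	start := 0
	var h uint32
	for i, b := range data {
		// Roll the new byte in ...
		h = bits.RotateLeft32(h, 1) ^ table[b]
		// ... and roll out the byte that just left the window.
		if i-start >= window {
			h ^= bits.RotateLeft32(table[data[i-window]], window)
		}
		size := i - start + 1
		if (size >= minSize && h&mask == 0) || size >= maxSize {
			chunks = append(chunks, data[start:i+1])
			start = i + 1
			h = 0
		}
	}
	if start < len(data) {
		chunks = append(chunks, data[start:])
	}
	return chunks
}

func main() {
	// Deterministic pseudo-random input, then the same input with one byte prepended.
	data := make([]byte, 64<<10)
	x := uint32(1)
	for i := range data {
		x ^= x << 13
		x ^= x >> 17
		x ^= x << 5
		data[i] = byte(x >> 24)
	}
	edited := append([]byte{0xFF}, data...)

	seen := map[string]bool{}
	for _, c := range split(data) {
		seen[string(c)] = true
	}
	shared := 0
	for _, c := range split(edited) {
		if seen[string(c)] {
			shared++
		}
	}
	fmt.Printf("%d chunks of the edited input also appear in the original\n", shared)
}
```

The point of the design is that a boundary decision depends only on the last `window` bytes, so an insertion early in a file should disturb the chunks near the edit and leave later chunks (and their hashes) reusable. The per-byte work is a rotate, two table lookups, and two XORs, which is why the cost is expected to be small next to a cryptographic hash over the same data.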
Update: some buzhash-based things have been pushed into go-ipfs-chunker! We should probably use them.
https://www.cyphar.com/blog/post/20190121-ociv2-images-i-tar
casync has an (overly?) comprehensive list of things you could conceivably save metadata on: https://github.com/systemd/casync/blob/e4a3c5efc8f11e0e99f8cc97bd417665d92b40a9/src/caformat.h#L82-L125
Timeless Stack rio has a bunch of code around filesystem attribute normalizers, which came out of real-world user-story discovery around needing to hash things deterministically and convergently while handling full container filesystems: https://github.com/polydawn/go-timeless-api/blob/0ece408663edb9dbc6109ef8690f8425f0d8f5f4/filesetFilters.go
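For a feel of what an attribute normalizer does, here is a small sketch. It is not rio's actual code or API: the `Metadata` and `Filter` types and the `Apply` method are hypothetical, invented for this example. The idea is that fields which vary between machines (mtime, uid, gid) are overwritten with fixed values before hashing, so that the same content converges on the same hash.

```go
package main

import "fmt"

// Metadata is a hypothetical per-file record (not rio's actual type).
type Metadata struct {
	Name  string
	Mtime int64
	UID   uint32
	GID   uint32
	Mode  uint32
}

// Filter says which fields to overwrite before hashing.
// A nil pointer means "keep whatever was found on disk".
type Filter struct {
	Mtime *int64
	UID   *uint32
	GID   *uint32
}

// Apply returns a copy of m with the configured fields normalized.
func (f Filter) Apply(m Metadata) Metadata {
	if f.Mtime != nil {
		m.Mtime = *f.Mtime
	}
	if f.UID != nil {
		m.UID = *f.UID
	}
	if f.GID != nil {
		m.GID = *f.GID
	}
	return m
}

func main() {
	epoch, root := int64(0), uint32(0)
	f := Filter{Mtime: &epoch, UID: &root, GID: &root}

	// The same file as unpacked on two different machines.
	a := f.Apply(Metadata{Name: "bin/sh", Mtime: 1548028800, UID: 1000, GID: 1000, Mode: 0o755})
	b := f.Apply(Metadata{Name: "bin/sh", Mtime: 1600000000, UID: 501, GID: 20, Mode: 0o755})

	fmt.Println(a == b) // true
}
```

This ties back to the "archival, or fit-for-purpose" tension: a normalizer deliberately throws information away, which is what you want for convergent hashing and not what you want for a faithful archive.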