# Encrypted Files on MySky
###### tags: `Specifications`
This document describes the protocol for storing encrypted data on MySky. Users will be able to encrypt data and store it in files and folders, such that they can share the encryption key for just a single file, or for an entire nested folder.
Note that this is a separate specification from the hidden files specification. Encrypted files can still be associated with the user after being stored on Skynet. Hidden files provide better protection, preventing outside observers from knowing that a file is owned by the user. Encrypted files have been created as an interim step, as the advanced cryptography required to make hidden files work will take longer to implement.
## Pubkey and Tweak Derivation
The pubkey is the same as the user's discoverable pubkey, but the tweak is derived using a private "path seed" to ensure that encrypted files are in a different namespace than discoverable files.
The root path seed is defined by:
```go
rootPathSeed := sha512(append(sha512("encrypted filesystem path seed"), sha512(mySkySeed)...))
```
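As a minimal sketch, the formula above could be written in runnable Go as follows. The function name `deriveRootPathSeed` and the assumption that the MySky seed is available as raw bytes are illustrative and not part of this specification.

```go
package main

import (
	"crypto/sha512"
	"fmt"
)

// deriveRootPathSeed computes
// sha512(sha512("encrypted filesystem path seed") || sha512(mySkySeed)).
// ASSUMPTION: the seed is passed in as raw bytes; the spec does not fix an encoding.
func deriveRootPathSeed(mySkySeed []byte) [sha512.Size]byte {
	label := sha512.Sum512([]byte("encrypted filesystem path seed"))
	seed := sha512.Sum512(mySkySeed)

	// Concatenate the two digests into a fresh buffer before the outer hash.
	buf := make([]byte, 0, 2*sha512.Size)
	buf = append(buf, label[:]...)
	buf = append(buf, seed[:]...)
	return sha512.Sum512(buf)
}

func main() {
	// Placeholder seed for illustration only; never hard-code a real seed.
	root := deriveRootPathSeed([]byte("example seed"))
	fmt.Printf("root path seed: %x\n", root)
}
```

Hashing the label separately gives the root path seed its own domain, so the same MySky seed yields unrelated values when used for other purposes.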
The path seed and file tweak for a child file or folder are derived using the following methodology:
```go
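type DerivationPathObj struct {
	pathSeed  hash   // path seed of the parent folder
	directory bool   // true if the child is a folder, false if it is a file
	name      string // name of the child file or folder
}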
derivationPath := sha512(derivationPathObj)
childPathSeed := sha512(append(sha512("encrypted filesystem child"), sha512(derivationPath)...))
childTweak := sha512(append(sha512("encrypted filesystem tweak"), sha512(childPathSeed)...))
```
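The derivation above can be sketched in runnable Go. The specification does not say how `DerivationPathObj` is serialized before hashing, so the byte layout in `encodeDerivationPath` (path seed, one flag byte, an 8-byte little-endian name length, then the name) is purely a hypothetical choice, as are the helper names `domainHash` and `deriveChild`.

```go
package main

import (
	"crypto/sha512"
	"encoding/binary"
	"fmt"
)

type hash [sha512.Size]byte

// domainHash computes sha512(sha512(label) || sha512(data)).
func domainHash(label string, data []byte) hash {
	l := sha512.Sum512([]byte(label))
	d := sha512.Sum512(data)
	return sha512.Sum512(append(l[:], d[:]...))
}

// encodeDerivationPath serializes the derivation path object.
// ASSUMPTION: this layout is hypothetical; the spec does not define one.
func encodeDerivationPath(pathSeed hash, directory bool, name string) []byte {
	buf := make([]byte, 0, len(pathSeed)+1+8+len(name))
	buf = append(buf, pathSeed[:]...)
	if directory {
		buf = append(buf, 1)
	} else {
		buf = append(buf, 0)
	}
	var n [8]byte
	binary.LittleEndian.PutUint64(n[:], uint64(len(name)))
	buf = append(buf, n[:]...)
	return append(buf, name...)
}

// deriveChild returns the child's path seed and tweak from the parent's path seed.
func deriveChild(pathSeed hash, directory bool, name string) (childPathSeed, childTweak hash) {
	derivationPath := sha512.Sum512(encodeDerivationPath(pathSeed, directory, name))
	childPathSeed = domainHash("encrypted filesystem child", derivationPath[:])
	childTweak = domainHash("encrypted filesystem tweak", childPathSeed[:])
	return childPathSeed, childTweak
}

func main() {
	// Placeholder seed for illustration only.
	root := domainHash("encrypted filesystem path seed", []byte("example seed"))

	// Walk the path docs/notes.json one component at a time.
	docsSeed, _ := deriveChild(root, true, "docs")
	_, fileTweak := deriveChild(docsSeed, false, "notes.json")
	fmt.Printf("tweak for docs/notes.json: %x\n", fileTweak)
}
```

Because each child is derived only from its parent's path seed, anyone holding the path seed of `docs` can derive everything beneath it without learning the root path seed.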
## Encryption
MySky will support multiple types of files, each with its own encryption scheme. Note that Skynet data is already authenticated, so encryption schemes for encrypted files do not need authentication to be secure.
One weakness of supporting multiple filetypes is that different types of files are distinct in how they are accessed and updated. Encrypted files of different types may be distinguishable to a network adversary, but this may be acceptable as long as all formats are commonly used and files are sufficiently unlinked from their users and from each other.
Regardless of file type, the entropy for the encryption key of a file is generated with the following method:
```go
encryptionKeyEntropy := Hash("encryption" || pathSeed)
```
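A minimal Go sketch of this step follows. It assumes that `Hash` is SHA-512, matching the derivations earlier in this document, and that `||` means plain byte concatenation; neither is stated explicitly here, and the function name is illustrative.

```go
package main

import (
	"crypto/sha512"
	"fmt"
)

// encryptionKeyEntropy computes Hash("encryption" || pathSeed).
// ASSUMPTION: Hash is SHA-512 and || is byte concatenation.
func encryptionKeyEntropy(pathSeed []byte) [sha512.Size]byte {
	buf := make([]byte, 0, len("encryption")+len(pathSeed))
	buf = append(buf, "encryption"...)
	buf = append(buf, pathSeed...)
	return sha512.Sum512(buf)
}

func main() {
	// Placeholder path seed for illustration only.
	pathSeed := sha512.Sum512([]byte("example path seed"))
	entropy := encryptionKeyEntropy(pathSeed[:])
	fmt.Printf("encryption key entropy: %x\n", entropy)
}
```

How the entropy is turned into a cipher key is left to each file type and is not shown here.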
### JSON Files v1
A JSON file stores a single JSON object. The file is always uploaded and downloaded in its entirety, and always decrypts to a single JSON object.
The file is encrypted using `xsalsa20-poly1305`. The main reason for choosing this algorithm is that our SDK already supports this algorithm thanks to other cryptography we do elsewhere. `xsalsa20-poly1305` is unnecessarily authenticated, but in this case we view the savings in SDK size and complexity as being more important than the computational savings and filesize savings of using unauthenticated encryption.
One notable weakness of `xsalsa20-poly1305` is that encrypted data cannot be safely modified in place. This is acceptable for JSON files because every update rewrites the entire file using a new nonce.
### Raw Files v1
A raw file is an interface provided to developers that mimics the traditional file format. Developers can do partial reads and partial writes to the file, and are not limited in the type of data that can be stored.
The file is broken up into sectors of 4096 bytes, each sector containing 4056 data bytes from the file and 40 bytes of overhead. The overhead appears at the front of the sector. The first 24 bytes of the overhead are a nonce, and the next 16 bytes of overhead are an authentication tag. The nonce and authentication tag are part of the `xsalsa20-poly1305` encryption algorithm. Each sector of the file is encrypted separately using `xsalsa20-poly1305` with a randomly generated nonce.
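To make the sector arithmetic concrete, here is a small Go sketch that maps a plaintext offset to its sector and to its position in the stored file. The function names are hypothetical, and the sketch assumes sectors are stored back to back starting at offset 0.

```go
package main

import "fmt"

const (
	sectorSize    = 4096
	nonceSize     = 24
	tagSize       = 16
	overhead      = nonceSize + tagSize   // 40 bytes at the front of each sector
	dataPerSector = sectorSize - overhead // 4056 data bytes per sector
)

// locate maps a plaintext byte offset to its sector index and to the
// offset of that byte within the sector's data area.
func locate(offset int64) (sector, within int64) {
	return offset / dataPerSector, offset % dataPerSector
}

// storedOffset returns the position of that byte in the encrypted file.
// ASSUMPTION: sectors are stored contiguously, overhead first.
func storedOffset(offset int64) int64 {
	sector, within := locate(offset)
	return sector*sectorSize + overhead + within
}

// sectorCount returns how many sectors are needed for n plaintext bytes.
func sectorCount(n int64) int64 {
	return (n + dataPerSector - 1) / dataPerSector
}

func main() {
	sector, within := locate(10000)
	fmt.Println(sector, within)      // 2 1888
	fmt.Println(storedOffset(10000)) // 10120
	fmt.Println(sectorCount(10000))  // 3
}
```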
When the file is modified, entire sectors must be modified together. The nonce must be re-randomized, which means the entire sector will need to be re-encrypted and re-uploaded.
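The cost of a partial write can be estimated with a short Go sketch. The helper names `sectorsForWrite` and `uploadBytes` are illustrative only; the sketch simply counts the sectors that overlap the written range.

```go
package main

import "fmt"

const (
	sectorSize    = 4096 // stored bytes per sector
	dataPerSector = 4056 // plaintext bytes per sector
)

// sectorsForWrite returns the first and last sector index touched by a
// write of length bytes (length must be > 0) at the given plaintext offset.
func sectorsForWrite(offset, length int64) (first, last int64) {
	first = offset / dataPerSector
	last = (offset + length - 1) / dataPerSector
	return first, last
}

// uploadBytes returns how many stored bytes must be re-encrypted and
// re-uploaded for that write.
func uploadBytes(offset, length int64) int64 {
	first, last := sectorsForWrite(offset, length)
	return (last - first + 1) * sectorSize
}

func main() {
	// A 100-byte write at offset 4000 straddles sectors 0 and 1.
	first, last := sectorsForWrite(4000, 100)
	fmt.Println(first, last, uploadBytes(4000, 100)) // 0 1 8192
}
```

A write that covers only part of a sector presumably also requires reading and decrypting that sector first, since the unchanged bytes must be re-encrypted under the new nonce.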
We have chosen `xsalsa20-poly1305` as our cipher because there is good support for this cipher in JavaScript and other languages, and because the cipher is already imported by other dependencies of our SDK. We could save ourselves the 40 bytes of overhead in each sector by using a tweakable cipher such as AES-XTS or Threefish, but there is not good support for either in JavaScript at this time.
## Padding
To prevent analysis that can occur by looking at the sizes of files, all encrypted files will be padded, after encryption, up to the next multiple of a "pad block". The pad block is the smallest size that is at least 4 KiB, is a power of 2, and is at least 5% of the size of the file.
For example, a 1 KiB encrypted file would be padded to 4 KiB, a 5 KiB file would be padded to 8 KiB, and a 105 KiB file would be padded to 112 KiB. Below is a short table of valid file sizes:
```
4 KiB 8 KiB 12 KiB 16 KiB 20 KiB
24 KiB 28 KiB 32 KiB 36 KiB 40 KiB
44 KiB 48 KiB 52 KiB 56 KiB 60 KiB
64 KiB 68 KiB 72 KiB 76 KiB 80 KiB
88 KiB 96 KiB 104 KiB 112 KiB 120 KiB
128 KiB 136 KiB 144 KiB 152 KiB 160 KiB
176 KiB 192 KiB 208 KiB 224 KiB 240 KiB
256 KiB 272 KiB 288 KiB 304 KiB 320 KiB
352 KiB ... etc
```
Note that the first 20 valid sizes are all multiples of 4 KiB, the next 10 are multiples of 8 KiB, and for each group of 10 after that the multiple doubles. We use this method of padding files to prevent an adversary from guessing the contents or structure of the file based on its size.
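The padding rule can be sketched in Go as follows. The names `padBlock` and `paddedSize` are hypothetical. The sketch picks the smallest power-of-2 block of at least 4 KiB that is at least 5% of the file size, which reproduces the examples and table above.

```go
package main

import "fmt"

const minPadBlock = 4096 // 4 KiB

// padBlock returns the smallest power-of-2 block, at least 4 KiB, that is
// at least 5% of size (equivalently, block*20 >= size).
func padBlock(size uint64) uint64 {
	block := uint64(minPadBlock)
	for block*20 < size {
		block *= 2
	}
	return block
}

// paddedSize rounds size up to the next multiple of its pad block.
func paddedSize(size uint64) uint64 {
	block := padBlock(size)
	return (size + block - 1) / block * block
}

func main() {
	const kib = 1024
	for _, size := range []uint64{1 * kib, 5 * kib, 105 * kib} {
		fmt.Printf("%d KiB -> %d KiB\n", size/kib, paddedSize(size)/kib)
	}
	// Prints:
	// 1 KiB -> 4 KiB
	// 5 KiB -> 8 KiB
	// 105 KiB -> 112 KiB
}
```

The comparison `block*20 < size` keeps the 5% check in integer arithmetic, so boundary sizes such as exactly 80 KiB stay on the 4 KiB block.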