# Hidden Files on MySky
###### tags: `Specifications`
This document describes the cryptographic techniques MySky uses to give users secure, encrypted storage on Skynet, which is otherwise a very public system. When users store private data on MySky, we wish to provide the following guarantees:
+ The contents of a file cannot be viewed by an attacker
+ The metadata of a file is not uniquely identifying
+ The file itself cannot be linked to the user
+ Multiple files uploaded together cannot be linked to each other
+ Side channel attacks such as timing analysis are inhibited
We also wish to provide the following features to the user:
+ Files are structured like a filesystem with nested folders and files
+ A user can share subfolders of their filesystem with friends
### The Skynet Architecture
Skynet is best thought of as a global operating system, where all users can see all files. At the lowest level, files on Skynet exist in a key-value store, where the key is a pubkey+tweak, and the value is the file data plus some metadata. This key-value store is called the Sia registry.
Anyone with sufficient resources can view all files on Skynet, watch every update to those files, and be notified immediately when a new file is created. The cost of doing so scales linearly with the total number of files on Skynet, but at Skynet's current scale this sort of analysis is still within reach of hobbyists.
MySky is a filesystem abstraction on top of Skynet that translates filepaths to key-value pairs in the Sia registry. The discoverable part of the filesystem uses the same public key for every file, and has a derivation method to get the tweak, meaning anyone with a user's pubkey can easily find and explore that user's discoverable filesystem.
To provide the desired guarantees for a user's hidden filesystem, a different approach is needed.
## Pubkey and Tweak Derivation
We wish to unlink the user from the file. This means that we cannot re-use public keys between hidden files. Instead, we will use Hierarchical Deterministic (HD) public keys to derive the public key associated with each file. This allows every file to have a different public key, but also lets the user share a single piece of data with a friend that allows the friend to find all of the hidden files within the folder that was shared.
Adversaries can see the full list of public keys, and we do not want the adversary to be able to link a set of files that are in the same folder. This means that the derivation path needs to depend on a secret which is not visible to the adversary.
A user's hidden filesystem is therefore composed of two elements: a root secret key, which is used to derive all of the HD keys for the files in the hidden filesystem, and a root "path seed", which is used to blind the derivation paths. The path seed also mutates at each level, so that someone who has the path seed for one folder cannot learn the path seed of any folder outside of it.
Both the root secret key and the root path seed are derived from the user's MySky seed, which provides the fundamental entropy for all of the user's operations.
```go
rootSecretKeyEntropy := Hash("hidden filesystem secret key" || mySkySeed)
rootPathSeed := Hash("hidden filesystem path seed" || mySkySeed)
```
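The two root values above can be sketched as runnable Go. This is a minimal sketch under stated assumptions: the document does not name the hash function, so SHA-256 stands in for `Hash`, `||` is taken to mean plain byte concatenation, and the helper names `hashWithLabel` and `deriveRoots` are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// hashWithLabel returns SHA-256(label || data). SHA-256 is an assumption
// made for this sketch; the specification does not name the hash function.
func hashWithLabel(label string, data []byte) [32]byte {
	h := sha256.New()
	h.Write([]byte(label))
	h.Write(data)
	var out [32]byte
	copy(out[:], h.Sum(nil))
	return out
}

// deriveRoots computes the two root values of the hidden filesystem from
// the MySky seed, using the domain-separation labels from the spec.
func deriveRoots(mySkySeed []byte) (rootSecretKeyEntropy, rootPathSeed [32]byte) {
	rootSecretKeyEntropy = hashWithLabel("hidden filesystem secret key", mySkySeed)
	rootPathSeed = hashWithLabel("hidden filesystem path seed", mySkySeed)
	return rootSecretKeyEntropy, rootPathSeed
}

func main() {
	// Placeholder input; a real MySky seed would come from the user's wallet.
	seed := []byte("example seed, not a real MySky seed")
	keyEntropy, pathSeed := deriveRoots(seed)
	fmt.Printf("root secret key entropy: %x\n", keyEntropy)
	fmt.Printf("root path seed:          %x\n", pathSeed)
}
```

Because the two labels differ, the same seed yields two unrelated values, so revealing a path seed does not reveal the key entropy.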
Each file and subfolder is derived using the following methodology:
```go
type DerivationPathObj struct {
	pathSeed  hash   // The parent path seed
	directory bool   // True if this is a directory, false otherwise
	name      string // The name of the file or subfolder
}
derivationPath := Hash(aDerivationPathObj)
childHiddenKey := DeriveChildKey(parentHiddenKey, derivationPath)
childPathSeed := Hash("child" || derivationPath)
childTweak := Hash("tweak" || derivationPath)
```
The derivation path for each child file and folder is chosen by hashing an object that contains the name of the child file, the pathSeed, and then a bool indicating whether the new pubkey points to a file or a folder. The pathSeed is kept secret from adversaries, preventing them from connecting a parent folder to the child files within the folder.
The bool that indicates whether the underlying object is a file or folder is important to protect the user against situations where they share a file with a friend, and then later delete that file and create a directory with the same name. Without the bool, their friend would be able to see all files within the new folder, even if that is not the user's intention. With the bool, the user can be certain that nobody can see the contents of the folder unless the folder is explicitly shared.
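The derivation and the effect of the `directory` flag can be sketched as follows. Several details are assumptions made for illustration: SHA-256 stands in for `Hash`, the object is serialized as `pathSeed || directoryByte || name`, and the `child`, `encode`, `labeledHash`, and `deriveChild` names are hypothetical. `DeriveChildKey` is left out because it depends on the HD key scheme, which this document does not specify.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

type hash [32]byte

// DerivationPathObj mirrors the struct from the specification.
type DerivationPathObj struct {
	pathSeed  hash   // The parent path seed
	directory bool   // True if this is a directory, false otherwise
	name      string // The name of the file or subfolder
}

// encode serializes the object as pathSeed || directoryByte || name.
// This layout is an assumption; it is unambiguous because the first two
// fields have a fixed length.
func (d DerivationPathObj) encode() []byte {
	buf := make([]byte, 0, len(d.pathSeed)+1+len(d.name))
	buf = append(buf, d.pathSeed[:]...)
	if d.directory {
		buf = append(buf, 1)
	} else {
		buf = append(buf, 0)
	}
	return append(buf, d.name...)
}

// labeledHash returns SHA-256(label || data); SHA-256 is a stand-in.
func labeledHash(label string, data []byte) hash {
	return sha256.Sum256(append([]byte(label), data...))
}

// child holds the values derived for one file or subfolder.
type child struct {
	derivationPath hash
	pathSeed       hash
	tweak          hash
}

// deriveChild follows the steps from the specification, minus the key step.
func deriveChild(parentPathSeed hash, directory bool, name string) child {
	obj := DerivationPathObj{pathSeed: parentPathSeed, directory: directory, name: name}
	path := labeledHash("", obj.encode())
	return child{
		derivationPath: path,
		pathSeed:       labeledHash("child", path[:]),
		tweak:          labeledHash("tweak", path[:]),
	}
}

func main() {
	var rootPathSeed hash // all zeros, a placeholder for this example

	// The same name derives to different paths for a file and a directory.
	file := deriveChild(rootPathSeed, false, "photos")
	dir := deriveChild(rootPathSeed, true, "photos")
	fmt.Println("same derivation path:", file.derivationPath == dir.derivationPath) // false

	// Entries inside the folder are derived from the folder's own path seed.
	nested := deriveChild(dir.pathSeed, false, "beach.jpg")
	fmt.Printf("nested tweak: %x\n", nested.tweak)
}
```

A friend who was given only the values for the file `photos` holds `file.pathSeed`, which differs from `dir.pathSeed`, so nothing derived beneath a later directory of the same name is reachable from what they hold.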
## Timing Obfuscation
If a user is accessing and/or updating multiple files at once, an attacker may be able to correlate these accesses, and use the access patterns to learn metadata about the user.
Unlike the files themselves, the access patterns are not trivially available to adversaries. Files are primarily accessed on a subscription basis, where a user accesses a file a single time and then instructs the hosts to serve them updates as the file gets modified. When a user begins a session, however, they may subscribe to all related files at once, and that could expose information about their activity. At this time, we do not have a good strategy for subscribing to files in a way that obfuscates which files belong to the same group.
An adversary that is subscribed to all files on the network will be able to see the timings of any updates made to the files. At this time, we do not have a good strategy for updating files in a way that obfuscates which files belong to the same group.
The primary barrier is user experience. Any timing obfuscation added to the subscription process increases the time-to-first-byte of data viewed by the user. Obfuscation on the order of low double-digit milliseconds may be acceptable, but it is also unlikely to be very effective. A similar barrier exists for updating files. The latency tolerance for updates is closer to triple-digit milliseconds, but any obfuscation still results in an anonymity set that decays very quickly in the presence of a global adversary.
There are some techniques which are showing promise. For example, a user could periodically rotate or alter the path seed of their files, leaving a breadcrumb trail for their friends to follow. This breadcrumb trail increases latency, but it is invisible to the global adversary and resets any analytic progress the adversary may have made on linking multiple files together. This allows us to get away with obfuscation techniques that are statistically imperfect, yet don't result in the eventual total breakdown of anonymity.
## Network Layer Privacy
The biggest privacy exposure for hidden files is the IP address used to access and update them. Skynet currently does little to mitigate the exposure of this information, but tools like VPNs and Tor can be leveraged to minimize it.
We consider the network layer to be a separate concern from the filesystem layer, as the mitigations impact much broader pieces of Skynet, and also benefit much broader pieces of Skynet. We believe that it is possible to sufficiently obfuscate the network layer external to the design of the MySky hidden filesystem.