# Brainstorming SAFE Network FUSE filesystem

###### tags: `filesystem` `fuse`

Nothing is all figured out here. This is a brainstorming document, intended to lay out the current files API landscape and explore how things could evolve. Here's a [slack discussion](https://maidsafe.slack.com/archives/CMTL335N3/p1593454821207800) with some background.

See also: [Survey of RUST/FUSE FileSystem Libraries.](/BHGtRwNXSUGUdRoe5Mkjcw)

## Current Situation

Presently, the safe-api exposes the concept of a FilesContainer, which consists of a BTreeMap where the keys represent file paths and the values represent metadata. The metadata is another BTreeMap of key/val pairs. The FilesContainer gets serialized to JSON in its entirety and stored on the network as one version of a PublicSequence. Changes and reads of any path require [de-]serializing the entire directory structure and filtering or re-writing it. This becomes expensive for large tree structures.

The exposed API is high-level and largely reflects the underlying "must grab entire FilesContainer" reality. There is no equivalent to read_dir(), for example. Also, concepts such as symlinks and glob() must be re-invented to provide basic functionality that CLI users have been accustomed to for decades now. Some user goals, such as two-way syncing, cannot be done efficiently with the provided API. The Files API itself has a lot of internal complexity and inefficiency around filtering one file or directory from the entire set and manipulating paths.

It has been [suggested](https://maidsafe.slack.com/archives/CMTL335N3/p1593523885236800?thread_ts=1593454821.207800&cid=CMTL335N3) that when the crdt PublicMap type is complete, the FileContainer can be modified to use it instead, thereby allowing changes to individual paths without re-writing the entire directory structure. That should provide a performance improvement. However, it is not clear (to me) yet how an efficient implementation of read_dir() could be provided with this. (Actually, an idea just occurred: store the child list in directory metadata. Of course this duplicates path data in the map keys, must be kept in sync, etc. i.e., gross.)

In summary, I worry that:

1. The present files API is too inefficient for large tree structures.
2. The API is too high level and limiting for some (many?) applications.
3. The API is becoming too complex internally.
4. The API is quite different from what developers and users are used to.
5. We (I?) needlessly spend time re-inventing the wheel with things like symlinks, glob support, etc.

## Baby, Bathwater

Despite the above criticisms, the FileContainer mechanism has nice features we don't want to lose:

* integrates nicely with SafeURL. Every bit of content can be given a URL, and accessed via safe-browser
* integrates with the NRS system
* all data is versioned, and any version can be referenced/retrieved via SafeURL

Any alternative design should consider how to incorporate these features.

## Benchmarking current API performance

I haven't done this, though it seems a useful exercise to get an idea of performance as trees get larger (broader and deeper). Will post a link here if I (or anyone) does it.

## How a native/FUSE FileSystem can help

The general idea of a FUSE based filesystem is that we implement the FUSE API in such a way that operating system calls to open(), write(), read_dir(), etc., are translated to SAFE Network calls for storing and retrieving data, possibly with a local (mem or disk) cache as an optimization.
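To make the translation concrete, here's a rough Rust sketch. The trait and types are made up for illustration (a real FUSE binding such as fuse_rs has its own callback signatures); the point is just the shape: a FUSE callback answered from a local cache, falling back to network fetches.

```rust
use std::collections::HashMap;
use std::ffi::OsString;
use std::io;

// Hypothetical, simplified stand-ins for the callback-style trait a
// FUSE binding exposes; all names here are illustrative only.
#[derive(Clone)]
struct DirEntry {
    inode: u64,
    name: OsString,
    is_dir: bool,
}

trait SimpleFuseOps {
    // Invoked (via the kernel and FUSE) when a process lists a directory.
    fn read_dir(&mut self, inode: u64) -> io::Result<Vec<DirEntry>>;
}

// Illustrative network-backed filesystem: each callback is translated
// into SAFE Network requests, with a local in-memory cache consulted
// first as an optimization.
struct SafeFs {
    cache: HashMap<u64, Vec<DirEntry>>,
}

impl SafeFs {
    // Placeholder for fetching a directory's children over the network;
    // a real implementation would call into the safe-api here.
    fn fetch_children_from_network(&self, _inode: u64) -> io::Result<Vec<DirEntry>> {
        Ok(vec![]) // stubbed out for the sketch
    }
}

impl SimpleFuseOps for SafeFs {
    fn read_dir(&mut self, inode: u64) -> io::Result<Vec<DirEntry>> {
        // Serve from the local cache when possible; otherwise hit the
        // network and remember the result for subsequent calls.
        if let Some(entries) = self.cache.get(&inode) {
            return Ok(entries.clone());
        }
        let entries = self.fetch_children_from_network(inode)?;
        self.cache.insert(inode, entries.clone());
        Ok(entries)
    }
}
```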
In this scenario, a SAFE Network FileContainer (or some other level of granularity) would be locally mounted, just as any other filesystem. At this point, we get pretty much all of the subcommands in `safe files` for free, and much more besides, because native OS tools for exploring, listing, copying, symlinking, rsyncing, scp'ing, etc. all just work. So too do all the scripts and programs people have already written that perform file operations.

Further, in the process of implementing the FUSE interface, we will necessarily create a low-level filesystem API that can be exposed to Safe network aware apps. That is to say that our code should be usable by Safe-Apps directly even if not FUSE mounted.

## Prior Efforts

### Safe NFS

@dirvine maybe you can provide here some history/architecture (maybe docs?) of SAFE NFS, and why we aren't still using it? It would be good to understand what worked and/or didn't work, to learn from.

Here are some links I dug up:

* [NFS Drive](https://github.com/maidsafe-archive/MaidSafe-Drive/blob/next/docs/nfs_drive.md)
* [FileSystem Hierarchy Discussion](https://safenetforum.org/t/filesystem-hierarchy/4987/14)

From Josh, on slack:

> safe-nfs still exists: https://github.com/maidsafe/safe-client-libs/tree/master/safe_core/src/nfs , it's a series of (largely deprecated, but still in the codebase) APIs. This was superseded by FilesContainers as we fleshed out safe-api/cli, which was largely the same idea but with (pseudo) RDF representations of metadata and simplified API for interacting with it to support NRS in these new APIs.

### Safe.NetworkDrive

[Safe.NetworkDrive](https://github.com/oetyng/SAFE.NetworkDrive) was written by Edward before joining MaidSafe. It is designed for efficiency, supports versioning/snapshots/rollbacks, and generally seems a promising starting point. It is written in C# for the Windows platform, so it would need to be ported to rust and made more cross-platform... at least the backend components.

See [Architecture Overview](https://safenetforum.org/t/release-safe-networkdrive-on-windows-v-0-1-0-alpha-1/27879/18?u=oetyng)

From github:

> Event sourced virtual drive, writing encrypted WAL to SQLite, synchronizing to MockNetwork and local materialization to in-memory virtual filesystem.

From Slack:

> This doesn't use the SAFENetwork file api, but instead just stores the incremental changes - events - to the network, and then builds up a virtual filesystem in memory with these events. It does snapshot the filesystem structure (i.e. all metadata etc, and pointers to the actual data).
>
> That's right, no FileContainer at all. It just uses an ever-expandable database that I implemented over MDs, to insert events - an event store. The eventstore has Snapshot functionality. So my impl did a snapshot of the filesystem structure every 1000 events, and recorded it as an ImD. So regardless of how many events I produced during the drive's lifetime, it would select at most 999 from the DB + 1 snapshot.
>
> > So what does the metadata storage look like on the network?
>
> What happened is that all actual data was stored as chunks on the network, and the events contained all data about what had happened and a pointer to data in case new data was uploaded.

[Full discussion here](https://maidsafe.slack.com/archives/CMTL335N3/p1593454821207800)

This design using AppendOnlyDB seems about the best we could hope for atop a Sequence data type. However, merging concurrent updates remains an unsolved problem. Thus, a better design could be to marry the local cache and instant response aspect of this design with a CRDT Tree data type (introduced below).
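To illustrate the event-sourcing shape described above, here's a minimal Rust sketch. The event variants and type names are hypothetical (Safe.NetworkDrive's actual C# types differ); it just shows events being replayed over a snapshot, mirroring the snapshot-every-1000-events scheme quoted above.

```rust
// Hypothetical event types for an event-sourced drive; only file
// metadata and chunk pointers live here, content goes to the network.
enum FsEvent {
    DirCreated { id: u64, parent: u64, name: String },
    FileCreated { id: u64, parent: u64, name: String, chunk_ptr: [u8; 32] },
    Renamed { id: u64, new_name: String },
    Deleted { id: u64 },
}

// Materialized in-memory filesystem state, rebuilt by replaying events.
#[derive(Default)]
struct FsState {
    // id -> (parent, name)
    nodes: std::collections::HashMap<u64, (u64, String)>,
}

impl FsState {
    fn apply(&mut self, ev: &FsEvent) {
        match ev {
            FsEvent::DirCreated { id, parent, name }
            | FsEvent::FileCreated { id, parent, name, .. } => {
                self.nodes.insert(*id, (*parent, name.clone()));
            }
            FsEvent::Renamed { id, new_name } => {
                if let Some(node) = self.nodes.get_mut(id) {
                    node.1 = new_name.clone();
                }
            }
            FsEvent::Deleted { id } => {
                self.nodes.remove(id);
            }
        }
    }
}

// Rebuild state from the most recent snapshot plus at most 999 trailing
// events, as in the scheme described in the quotes above.
fn materialize(snapshot: FsState, tail: &[FsEvent]) -> FsState {
    let mut state = snapshot;
    for ev in tail {
        state.apply(ev);
    }
    state
}
```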
### FileTree-PHP

This is a little prototype written by @danda over a couple of days. The idea here is to implement FileItem as a tree structure of file metadata to make file operations more efficient. The end goal would have been to change FilesMap in safe-api from a BTreeMap to a tree of FileItem with a "real" traversable root directory.

This toy prototype is simple, but already supported:

* efficiently reading children of a node, i.e. read_dir()
* lookup by path
* dirs/files/symlinks
* efficiently adding/removing paths at any level
* serialization to/from flat FileContainer json
* serialization to/from nested json

This prototype only stored metadata, just as a FilesContainer did. In this sense, it does not qualify as an "in-memory filesystem". However, the process of implementing it got me thinking more about filesystem concepts and led me to start investigating FUSE in-mem filesystems, etc., and in general thinking about other approaches we could take.
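For flavor, here's roughly what the core of that idea looks like if rendered in Rust (the prototype itself is PHP, and these field names are guesses, not the prototype's actual schema): each directory owns a map of its children, so read_dir() and path lookup are cheap tree walks instead of scans over a flat map.

```rust
use std::collections::BTreeMap;

// Hypothetical metadata-only tree node; fields are illustrative.
enum FileItem {
    Dir {
        children: BTreeMap<String, FileItem>,
    },
    File {
        size: u64,
        // pointer to content stored separately (e.g. as ImmutableData)
        xorname: [u8; 32],
    },
    Symlink {
        target: String,
    },
}

impl FileItem {
    // read_dir(): just list the keys of this node's child map.
    fn read_dir(&self) -> Option<Vec<&String>> {
        match self {
            FileItem::Dir { children } => Some(children.keys().collect()),
            _ => None,
        }
    }

    // Path lookup descends one component at a time -- no need to
    // deserialize or filter the entire flat FilesMap.
    fn lookup(&self, path: &str) -> Option<&FileItem> {
        let mut node = self;
        for part in path.split('/').filter(|p| !p.is_empty()) {
            match node {
                FileItem::Dir { children } => node = children.get(part)?,
                _ => return None,
            }
        }
        Some(node)
    }
}
```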
## Thoughts on Design

As I see it, there are two high-level approaches to a FUSE based SAFE Network filesystem:

1. Read/write directly to/from the network.
2. Read/write to a memory file system that periodically syncs with the network.

These approaches could potentially exist as modes in a single implementation. (1) is going to have considerable latency which causes applications to block, so (2) can be thought of as a performance optimization for (1).

Safe.NetworkDrive implements (2) and goes to considerable lengths to support offline operation. Latency is reportedly quite low and the system feels snappy. Complexity arises around merging changes, e.g. if two or more devices (mounts) modify or move the same file/dir since the last sync. Strategies are needed to deal with such merge conflicts. Here `CRDT-Tree` (below) offers a path forward via eventual consistency.

In general I believe Safe.NetworkDrive solves a lot of the problems. It is worth reviewing the design in detail and possibly/probably porting much of it to rust.

[syncer](https://github.com/pedrocr/syncer) is another caching/remote fuse filesystem worth reviewing, and maybe could be built upon.

### Platform Differences

The main platforms to consider for now are unix and windows. Well, iOS and Android also, but afaict they can't actually mount FUSE, so apps would need to use a SAFE Network library for transfers.

A question naturally arises: should we try to have a single rust implementation for all platforms, or independent implementations? My initial thought is that we should strive for a single, mostly platform-independent API library/crate. Let's call it `safe-fs` for now. So `safe-fs` would be used by `safe-fs-fuse-daemon`, which is (slightly) specific to each platform.

### Related Research

Paper: [A highly-available move operation for replicated trees and distributed filesystems](https://martin.kleppmann.com/papers/move-op.pdf)
Code: [Github](https://github.com/trvedata/move-op/)

Kleppmann, et al. Awaiting publication.

> * We define a CRDT for trees that allows *move* operations without any coordination between replicas such as locking or consensus. This has previously been thought to be impossible to achieve.
> * We formalize the algorithm using Isabelle/HOL, a proof assistant based on higher-order logic, and obtain a computer-checked proof of correctness. In particular, we prove that arbitrary concurrent modifications to the tree can be merged such that all replicas converge to a consistent state, while preserving the tree structure.
> * To demonstrate the practical viability of our approach, we refine the algorithm to an executable implementation within Isabelle/HOL and prove the equivalence of the two. We extract a formally verified Scala implementation from Isabelle and evaluate its performance with replicas across three continents.
> * We perform experiments with Dropbox and Google Drive, and show that they exhibit problems that would be prevented by our algorithm.

**PhD Dissertation: A Highly Available Distributed Filesystem**

See writeup: [Notes on PhD Dissertation: A Highly Available Distributed Filesystem.](/1dBKrtkQTa656k070hxVvw)

### Network Storage

#### Data Representation

The `CRDT-Tree` presented by Kleppmann et al seems to check all the boxes for us. In particular, it is a tree structure, designed with a distributed filesystem in mind. As a CRDT, it is eventually consistent, thus solving the merge challenges for concurrent modifications noted by @edward in his design... problems that even affect Dropbox and Google Drive. Also, the paper shows that, even without optimizations, the design is performant enough to achieve 600 writes/sec with replicas on 3 continents. (A minimal sketch of the move operation appears at the end of this section.)

A practical/implementation problem for us at this time is that this design only exists as pure Isabelle/HOL logic and machine generated (and formally proven) Scala code. In an ideal world, the Isabelle/HOL tool could also generate formally proven Rust code. That not being the case (I checked), we will need to create our own Rust implementation.

#### Tree Data Type

A `CRDT-Tree` gives us a new tree datatype to work with. Let's call it simply the `Tree` data type, a sibling of `Map` and `Sequence`. Being a generic tree data structure, it could support storing any tree-ish data.

#### FileContainer --> FileTree

We presently have a `FileContainer`, which is a specialization of a Sequence. The `FileContainer` can be modified to utilize the `Tree` data type instead. I propose that we also change the name to `FileTree`, which reflects that it is a specialization of `Tree`. The term `FileTree` will be used henceforth in this document to differentiate from the `Sequence` based `FileContainer`.

Like a `FileContainer`, the `FileTree` type will store only tree metadata. Files themselves will continue to be stored as `ImmutableObject`.

Unlike the `FileContainer` API, the `FileTree` API will support standard filesystem calls such as opendir(), readdir(), fopen(), fread(), fwrite(), fclose(). It is proposed that there will be a low-level (non-caching) API and a high-level (locally cached) API. Or possibly a single API with some type of *no_cache* flag. These calls are necessary to build a FUSE filesystem and also will enable SAFE applications to perform tasks that could not be done with the current safe-api.

#### Code Layout - Thoughts

The `CRDT-Tree` CRDT implementation could exist in rust-crdt, assuming David Rusu agrees, or possibly in its own crate, tbd.

The `Tree` Safe Data Type wrapping `CRDT-Tree` could exist in the safe-nd crate, or possibly in its own crate, tbd.

The `FileTree` API should exist in its own crate/module, i.e. safe-filetree. It can be used by safe-api, safe-cli, safe-browser, safe-fuse, etc.

The `SafeFuse` mountable filesystem should exist in its own repo, safe-fuse, as a library + executable. It will be a translation layer between fuse and the safe-filetree API.
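As promised in the Data Representation section, here is a heavily simplified sketch of the paper's move algorithm; the Rust types and names are mine, not rust-crdt's or the paper's Scala extraction. Each move op carries a globally unique timestamp. To apply an op that arrives out of order, ops with higher timestamps are undone, the new op is applied, and the undone ops are replayed; an op that would create a cycle has no effect but is still logged.

```rust
use std::collections::HashMap;

type Timestamp = (u64, u64); // (lamport counter, replica id): totally ordered
type NodeId = u64;

struct MoveOp {
    time: Timestamp,
    parent: NodeId,
    meta: String, // e.g. filename
    child: NodeId,
}

struct LogEntry {
    op: MoveOp,
    // child's (parent, meta) before the op, or None if child was new;
    // kept so the op can be undone during reordering.
    old: Option<(NodeId, String)>,
}

#[derive(Default)]
struct Tree {
    parent: HashMap<NodeId, (NodeId, String)>, // child -> (parent, meta)
    log: Vec<LogEntry>,                        // ordered by op.time
}

impl Tree {
    fn is_ancestor(&self, anc: NodeId, mut node: NodeId) -> bool {
        while let Some(&(p, _)) = self.parent.get(&node) {
            if p == anc {
                return true;
            }
            node = p;
        }
        false
    }

    fn do_op(&mut self, op: MoveOp) -> LogEntry {
        let old = self.parent.get(&op.child).cloned();
        // Safety check: skip moves that would make a node its own
        // ancestor. The op is still logged, because later-arriving ops
        // with lower timestamps can change what is "safe" (see the §3.5
        // note under Open Issues below).
        if op.child != op.parent && !self.is_ancestor(op.child, op.parent) {
            self.parent.insert(op.child, (op.parent, op.meta.clone()));
        }
        LogEntry { op, old }
    }

    fn undo_op(&mut self, entry: &LogEntry) {
        match &entry.old {
            Some((p, m)) => { self.parent.insert(entry.op.child, (*p, m.clone())); }
            None => { self.parent.remove(&entry.op.child); }
        }
    }

    // Apply an op from any replica: undo ops with higher timestamps,
    // apply, then redo. Every replica ends with the same log order,
    // hence the same tree -- this is what makes it a CRDT.
    fn apply_op(&mut self, op: MoveOp) {
        let mut undone = Vec::new();
        while self.log.last().map_or(false, |e| e.op.time > op.time) {
            let e = self.log.pop().unwrap();
            self.undo_op(&e);
            undone.push(e.op);
        }
        let entry = self.do_op(op);
        self.log.push(entry);
        for redo in undone.into_iter().rev() {
            let entry = self.do_op(redo);
            self.log.push(entry);
        }
    }
}
```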
Here is a diagram, provided by @happybeing, that encapsulates the above.

![](https://forum.safedev.org/uploads/default/original/2X/8/829fb56b9b945efc5376cd9c6f0c041a220aa1aa.png)

#### Mounts

##### What can be mounted?

###### Mounting CRDT-Trees

Each `FileTree` object represents a root directory and becomes a mount point. A question immediately arises: should only the root of the `FileTree` be mountable, or any sub-directory? Logically, any directory could be chosen as "root" of the local mount. Seems nicer, if technically achievable. May be problematic for symlinks though, so for now let's keep it simple and specify that only the root of a `FileTree` is mountable, not any sub-tree thereof.

note: We also want safe-browser to be able to load an arbitrary public `FileTree` and view the contents. That should be possible, without mounting, provided it uses the same APIs for reading the `FileTree` as the FUSE daemon does.

###### Global Mount?

Does it make sense for us to have a single global (worldwide) mount for public data? That appears to be what ipfs does, e.g.:

```
/ipfs/<hash>/path/to/file
```

Here is a real IPFS link via the cloudflare gateway. Note the path after the hash:

https://cloudflare-ipfs.com/ipfs/QmXoypizjW3WknFiJnKLwHCnL72vedxjQkDDP1mXWo6uco/wiki/Bill_English.html

So the mount point is /ipfs, which provides access to a global ipfs namespace, where content (directories and files) is addressed via CID (hash). The above seems very analogous to SAFE Network XorNames in a single global namespace.

The primary advantage of mounting the entire global namespace is that one can easily reference more than one XorName with a single mount, and copy/move files between them. However, if a `FileTree` is our fundamental representation of a filesystem that we design our Filesystem API around, then we have some related issues:

1. There are many distinct `FileTree` objects. Any user can create as many as they like.
2. There are many Data Types in SAFE that are NOT `FileTree` or any type of `Tree`. These objects could be ignored, or provided as a file with metadata info, so not a big problem.

In theory, it should be possible for a filesystem layer to aggregate any number of `FileTree` beneath a global `/safe` root and thereby implement a global namespace FS, similar to /ipfs. But that requires special cases for the `/` root directory and the `FileTree` root directories, and there are theoretical problems trying to move items between different underlying `CRDT-Tree` data objects.

Thus for now, it seems simplest and best (KISS principle) to build the FileSystem API around the `FileTree` data type only. Possibly a later phase/iteration of the project could extend it to a global namespace.

### Local Storage

Data that is written to or read from the filesystem API can be cached locally as operations in an event log, either in memory or possibly on disk (via another mounted filesystem) for fast access.

The idea would be to:

1. Download all the filesystem data (tree structure + files) at time of mount. After the first mount, this can be only the diffs.
2. Make file operations blazing fast because they are all performed locally, with operations written to an event log, which is periodically bi-directionally synced with the network (see the sketch below).
3. Enable working with the mounted filesystem even offline.

Of course, this scheme is only workable if the local storage (mem, disk) is large enough to contain the remote `FileTree` and associated `ImmutableObjects`. See the design of Safe.NetworkDrive for details.
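Here's a rough sketch of that periodic sync cycle; all the names are hypothetical placeholders (`Op`, `Network`), and real conflict handling would belong to the `CRDT-Tree` merge discussed above. Local ops accumulate while offline and are exchanged with the network whenever a sync succeeds.

```rust
// Hypothetical sync cycle for the locally-cached mode. `Op` stands in
// for a CRDT-Tree operation and `Network` for the safe-api client;
// neither is an existing type.
struct Op; // placeholder

trait Network {
    // Push local ops; returns remote ops we haven't seen yet.
    fn exchange(&mut self, local: &[Op]) -> Result<Vec<Op>, ()>;
}

struct LocalFs {
    pending: Vec<Op>, // ops applied locally but not yet synced
}

impl LocalFs {
    fn apply_remote(&mut self, _op: Op) {
        // Merge into the local CRDT-Tree replica. The CRDT handles
        // ordering relative to concurrent local ops.
    }

    // Called periodically, and on reconnect. If the network is
    // unreachable the pending ops simply keep accumulating, which is
    // what makes offline operation possible.
    fn sync(&mut self, net: &mut dyn Network) {
        match net.exchange(&self.pending) {
            Ok(incoming) => {
                self.pending.clear(); // acknowledged by the network
                for op in incoming {
                    self.apply_remote(op);
                }
            }
            Err(_) => { /* offline: try again on the next cycle */ }
        }
    }
}
```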
Also, the paper [Local-first software: You own your data, in spite of the cloud](https://www.inkandswitch.com/local-first.html) by Kleppmann et al is worth a read.

### NRS, SafeUrl Linking

SafeUrl resolution should work with a `FileTree` similarly to how it works with a FileContainer. The main difference in the code is that when URL resolution reaches a `FileTree` node, all further path resolution would be handled by the `FileTree` api.

## Open Issues

### from @scnOr4R4TKa64D1BpaAzpQ

A couple of comments on that paper/algo ^^ as per my understanding:

- Unless I'm misunderstanding, the comment from @dirvine about "caching" ops that are not causally ready (in yesterday's meeting about policies) seems to be applicable to this algo too, as it keeps "unsafe" operations in the log as they could eventually become "safe" (page 7, section 3.5):

> Note that the safety of an operation (whether or not it would introduce a cycle) may change as subsequent operations with lower timestamps are applied. For example, an operation may initially be regarded as safe, and then be reclassified as unsafe after applying a conflicting operation with a lower timestamp. The opposite is also possible: an operation previously regarded as unsafe may become safe through the application of an operation that removes the risk of introducing a cycle. For this reason, the operation log must include all operations, even those that were ignored.

### File Content Storage

The `CRDT-Tree` data type is only meant for storing tree data, i.e. the tree structure itself. I suppose in theory it could store file data in the metadata associated with each node/triple, but that does not seem to be what the paper's authors contemplate, as they state:

> In a distributed filesystem, replication and conflict resolution is required not only for the directory structure, but also for the contents of individual files. This can be accomplished by using CRDTs for file contents as well. We discuss distributed filesystems further in §6.2.

However, §6.2 is a general survey of distributed filesystem projects and does not address the use of a CRDT for file contents. Further, in SAFE Network we already have ImmutableData for storing/chunking file content. So long as ImmutableData becomes implemented as a CRDT, then we should be ok here, I think...

### Causally Stable Threshold

The `CST` is defined as the lowest timestamp/clock of the set of known replicas, where all replicas are known. Operations that occurred before the CST can be discarded/pruned. This is useful for keeping log sizes small and is also a requirement for emptying the filesystem Trash (which nodes are moved to when deleted/unlink'ed).
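The computation itself is tiny. A sketch, assuming we track the latest timestamp seen (or acknowledged) from every known replica, and returning None when some replica has never been heard from, which is exactly the concern raised in the email below:

```rust
use std::collections::HashMap;

type ReplicaId = u64;
type Clock = u64;

// The CST is the minimum over the latest timestamp seen (or
// acknowledged) from each known replica. Every op below it is
// causally stable and its log entries can be pruned.
fn causally_stable_threshold(
    known_replicas: &[ReplicaId],
    latest_seen: &HashMap<ReplicaId, Clock>,
) -> Option<Clock> {
    known_replicas
        .iter()
        .map(|r| latest_seen.get(r).copied()) // None if never heard from
        .collect::<Option<Vec<Clock>>>()? // any silent replica => no CST
        .into_iter()
        .min()
}
```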
I had some concerns about the `CST` for our use case, so I asked Martin Kleppmann, author of the paper:

> On 16 Aug 2020, at 18:59, Dan wrote:
> > Hi Martin,
> >
> > From your paper:
> >
> > > In this case, we can keep track of the most recent timestamp we have seen from each replica (including our own) and the minimum of these timestamps is the causally stable threshold.
> >
> > Ok, so let's say we have replicas a, b, c, with timestamps: {a: 500} and {b: 200}. C has never initiated any operation -- maybe it is a read-only replica or only an infrequent writer. All replicas have converged to the same state.
> >
> > In this case, is the causally stable timestamp {b: 200} or None? I tend to think the answer is None, but worry about the implications of that in practice.
> >
> > I'm just trying to figure out the operational constraints of log truncation and emptying trash, because if I'm understanding correctly, it seems like this only really works well if all replicas are (a) known to all and (b) initiating operations regularly. But for a shared filesystem, I think one must deal with read-only or infrequent-write replicas, and ideally replicas can join and leave...

and Martin replied:

> Hi Dan,
>
> If a replica has never initiated an operation, but there is a possibility that it may initiate an operation in the future, then you have to include it in the causally stable threshold, because otherwise it may generate an operation with a timestamp that is lower than the threshold, and then that operation might not be able to be processed correctly.
>
> However, there's a way round this: even if a replica doesn't initiate operations, it can acknowledge the timestamp it has seen. If it also stores the timestamp it has seen on disk (so that it's not forgotten in the case of a crash), then you can use the last acknowledged timestamp from each replica to compute the causal stability threshold.
>
> Dealing with replicas joining and leaving is harder. You can probably use a rule something like: once all other replicas have acknowledged the removal or addition of a replica, then it takes effect. But the details will require some careful thought.
>
> Cheers,
> Martin

So basically, we need to be careful here, but there doesn't seem to be a theoretical showstopper.

### file (or dir) name conflicts

In a generic Tree CRDT, a filename is just metadata, and as such is not checked for uniqueness. However, a file/dir/symlink name in a directory must be unique. This uniqueness can be enforced for a single replica, but two (or more) replicas can still simultaneously create directory children with identical names but different content, even different types.

I made an example test case that demonstrates this occurring. Replicas 1 and 2 both create the file /tmp/file1.txt, and write different content to it. The result is two Tree entries for /tmp/file1.txt, which is perfectly fine as far as the Tree CRDT is concerned:

```
$ php filesystem-split-inode.php test_fs_name_collision

------- fs state after: created /tmp/file1.txt. (replica1) -------
- null => forest
- 281474976710656 => {"name":"root","size":0,"ctime":1598666245,"mtime":1598666245,"kind":"dir"}
- 281474976710659 => {"name":"\/tmp","size":0,"ctime":1598666245,"mtime":1598666245,"kind":"dir"}
- 281474976710661 => {"name":"file1.txt","inode_id":281474976710660}
- 281474976710657 => {"name":"fileinodes","size":0,"ctime":1598666245,"mtime":1598666245,"kind":"dir"}
- 281474976710660 => {"size":0,"ctime":1598666245,"mtime":1598666245,"kind":"file","links":1,"content":"hello from replica1\n"}
- 281474976710658 => {"name":"trash","size":0,"ctime":1598666245,"mtime":1598666245,"kind":"dir"}
------- end state -------

------- fs state after: created /tmp/file1.txt. (replica2) -------
- null => forest
- 281474976710656 => {"name":"root","size":0,"ctime":1598666245,"mtime":1598666245,"kind":"dir"}
- 281474976710659 => {"name":"\/tmp","size":0,"ctime":1598666245,"mtime":1598666245,"kind":"dir"}
- 562949953421313 => {"name":"file1.txt","inode_id":562949953421312}
- 281474976710657 => {"name":"fileinodes","size":0,"ctime":1598666245,"mtime":1598666245,"kind":"dir"}
- 562949953421312 => {"size":0,"ctime":1598666245,"mtime":1598666245,"kind":"file","links":1,"content":"hello from replica2\n"}
- 281474976710658 => {"name":"trash","size":0,"ctime":1598666245,"mtime":1598666245,"kind":"dir"}
------- end state -------

------- fs state after: merged ops from replica2. (replica1) -------
- null => forest
- 281474976710656 => {"name":"root","size":0,"ctime":1598666245,"mtime":1598666245,"kind":"dir"}
- 281474976710659 => {"name":"\/tmp","size":0,"ctime":1598666245,"mtime":1598666245,"kind":"dir"}
- 281474976710661 => {"name":"file1.txt","inode_id":281474976710660}
- 562949953421313 => {"name":"file1.txt","inode_id":562949953421312}
- 281474976710657 => {"name":"fileinodes","size":0,"ctime":1598666245,"mtime":1598666245,"kind":"dir"}
- 281474976710660 => {"size":0,"ctime":1598666245,"mtime":1598666245,"kind":"file","links":1,"content":"hello from replica1\n"}
- 562949953421312 => {"size":0,"ctime":1598666245,"mtime":1598666245,"kind":"file","links":1,"content":"hello from replica2\n"}
- 281474976710658 => {"name":"trash","size":0,"ctime":1598666245,"mtime":1598666245,"kind":"dir"}
------- end state -------

== Pass! replica1 and replica2 filesystems match. ==
```
So what *should* happen when these conflicting changes merge? The paper has this to say:

> One final type of conflict that we have not discussed so far is multiple child nodes with the same parent and the same metadata. For example, in a filesystem, two users could concurrently create files with the same name in the same directory. Our algorithm does not prevent such a conflict, but simply retains both child nodes. In practice, the collision would be resolved by making the filenames distinct, e.g. by appending a replica identifier to the filenames.

My thoughts:

1. Appending a replica identifier to both filenames is kind of heavy-handed and gross. That is sort of like neither party wins.
2. Keep in mind that this can occur for directories and symlinks, not only files.
3. How to detect the collision, which occurs during apply_op()? I'm thinking that apply_op() should accept a callback function that can inspect the metadata and decide if any collision is occurring (sketched below). This callback would need to check for another child with the same name, which is presently a slow operation (a for loop).
4. In the event of a collision, it seems that once again a callback is needed for the application to resolve it, as the generic tree type doesn't understand the metadata.
5. Perhaps last-writer-wins is acceptable here. All replicas can agree on that (deterministic). So the last writer would keep the original name. But what to do with the loser? It could be moved under trash (deleted) or could be renamed deterministically, perhaps using replica_id or the full lamport timestamp (actor + counter). Or we could get fancier and create a sub-folder "conflicts", either in the current location, or e.g. under root or another fixed/known location.

See followup: [CRDT Tree Filename collision experiments](/YgLTnGBcTzaO23EPB7AMjw)
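Here's a rough sketch of the callback idea from points 3-5. It is entirely hypothetical (not the paper's algorithm, and not an existing API): a resolver that apply_op() could invoke on collision, doing last-writer-wins on the lamport timestamp, with a deterministic rename for the loser so every replica converges without coordination.

```rust
type Timestamp = (u64, u64); // (counter, replica id)

// Outcome chosen by the application-supplied resolver when two
// children of one directory end up with the same name.
enum Resolution {
    KeepBoth,            // leave the duplicate names in place
    RenameLoser(String), // deterministic new name for the loser
    TrashLoser,          // move the loser under trash
}

// Hypothetical resolver: last-writer-wins on the (totally ordered)
// lamport timestamp; the loser gets a deterministic rename that every
// replica computes identically, so no coordination is needed.
fn lww_resolver(name: &str, a: Timestamp, b: Timestamp) -> Resolution {
    let loser = std::cmp::min(a, b); // lower timestamp loses the name
    Resolution::RenameLoser(format!(
        "{}.conflict-{}-{}",
        name, loser.1, loser.0 // embed replica id + counter
    ))
}

fn main() {
    // Two replicas concurrently created "file1.txt"; replica 2's op
    // carries the higher timestamp, so replica 1's entry is renamed
    // to the same deterministic name on every replica.
    match lww_resolver("file1.txt", (7, 1), (9, 2)) {
        Resolution::RenameLoser(n) => println!("loser renamed to {}", n),
        _ => {}
    }
    // prints: loser renamed to file1.txt.conflict-1-7
}
```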
### Lookup by name is slow

Finding the child matching a name presently requires a loop over all entries in the directory. This becomes especially noticeable for large directories. We should consider how to make this faster. Perhaps by caching in a local HashMap upon first read.

### Storing CRDT in RAM vs Disk

I saw this discussion about storing some or all of a CRDT on disk, database style. We will probably need to think more about this as we progress with CRDT integrations.

https://www.reddit.com/r/rust/comments/ihxacr/a_conflictfree_replicated_data_type_crdt_tree/g349dzq/

### Network strings vs OS strings

Rust has `String` and `OsString` types because string representations are different between OSes and also in the language. Yet for a shared filesystem that is supposed to be mountable on any OS, there must be some agreement/conversion.

In fuse_rs, string parameters are generally of type &OsStr and returned values are of type &[u8] (array of bytes).

On Linux, filenames are really just bytes as far as the kernel is concerned. In practice, they are usually utf-8, but I'm not certain we can rely on that. On Windows, strings are a variant of UTF-16.

It would be *nice* if we could just force everything to utf-8 for storage, and convert as necessary, e.g. for windows. I believe ntfs-3g did this back in 2009, despite some grumbling. This approach seems to break if the user is using anything besides LANG=xx_XX.UTF-8.

Some background:

* [one solution, in python](https://beets.io/blog/paths.html)
* [ntfs-3g discussion](https://tuxera.com/forum/viewtopic.php?f=2&t=1817&view=previous)
* [serverfault discussion](https://serverfault.com/questions/87055/change-filesystem-encoding-to-utf-8-in-ubuntu)
* [convmv](https://www.linux.com/news/cli-magic-convert-file-names-different-encoding-convmv/)

More research will be needed on this. Perhaps it is simplest to just use bytes, and convert for windows, as per the beets.io solution. However, for network purposes, I think we'd really like it converted into neutral utf-8 for storage, display by browsers, etc. I think, though, that to convert we have to know the FROM encoding, and I don't know that the fuse-filesystem process has access to the env of the program doing the writing. One solution could be to offer an encoding mount option, so that all byte strings are interpreted as originating in the source encoding. We can also check how other network file systems handle this issue.

### Other OS differences

#### path Case (in)sensitivity

note: NTFS actually is case-sensitive, and since 2018 Windows supports a per-folder case-sensitivity flag in the UI. So I think we can kind of just rely on it. See:

* https://devblogs.microsoft.com/commandline/per-directory-case-sensitivity-and-wsl/
* https://www.tiraniddo.dev/2019/02/ntfs-case-sensitivity-on-windows.html

#### path allowable characters

#### mode, flags, uid, gid

### Backup/Archive vs Sharing vs Deployment

People will use safe-fs for various purposes. We can broadly group these into categories:

* Backup/Archive: User wants to back up files with all local metadata and restore them to the local system later with metadata intact, including user/group info, permissions/mode, timestamps, etc.
* Sharing: User wants to share files with others, possibly on any operating system.
* Deployment: User wants to deploy a Safe website or other type of application/service.
