filesystem
fuse
See here for background.
Quoting again from the paper:
In practice, the collision would be resolved by making the filenames distinct, eg by appending a replica identifier to the filenames.
However, using the Tree data structure as specified in the paper (and my code), I did not see an obvious way to append a replica ID to both conflicting entries while applying a remote op to local state.
The entry of the op being applied is easy because I can grab the replica id from the op's timestamp. However, the pre-existing local node obtained from the current tree HashMap<child, (parent, meta)> does not have an associated timestamp.
To remedy this, I had to add a timestamp(actor, counter) field to tree, eg: HashMap<child, (parent, meta, timestamp)>.
This works, but I don't like adding an extra field to every node in the tree and deviating from the paper. So I banished the change to an experimental branch for now.
(note: this could be simply actor_id rather than timestamp, to save a little space. I was hedging bets in case I need the counter also.)
Obvious in retrospect: We are now storing the replica+counter in the node ID itself. So we already have it available and don't need any extra field in tree_node!
The only other solution I could think of without adding a field to the tree triples would be to index the operation log entries by child_id+timestamp to find latest entry, though that would seem to complicate log truncation as then we must keep latest log entry for each child_id.
The conflict is occurring in the metadata, which is application specific. In this case, it is the filesystem class which knows the structure of the metadata. As such, it must be the one to (a) detect and (b) resolve the conflict.
I came up with this function, which is called by each replica immediately after merging in an op from remote replica.
private function apply_merge_log_op(log_op_move $op) {
// fs_inode_file_meta do not have 'name' attribute,
// so no possibility of conflict. no-op.
if(!isset($op->metadata->name)) {
return;
}
$name = $op->metadata->name;
// find existing nodes with same parent and same name.
// 1 is ok, because we already applied op.
// 2+ indicates conflict(s)
$matches = $this->children_with_name($op->parent_id, $name);
if(count($matches) <= 1) {
return;
}
// For each conflicting entry,
// generate an op_move that renames it to
// <name>.conflict.<actor_id>
$ops = [];
foreach($matches as $conflict) {
list($child_id, $tn) = $conflict;
$tn->meta->name = sprintf( '%s.conflict.%s', $tn->meta->name, $tn->timestamp->actor_id());
$ops[] = new op_move($this->replica->tick(), $tn->parent_id, $tn->meta, $child_id);
}
// apply the rename ops.
$this->replica->apply_ops($ops);
}
It renames each conflicting directory entry to:
name>.conflict.<actor_id>
So if we had:
parent_dir:
file1.txt
file1.txt
we get:
parent_dir:
file1.txt.conflict.1
file1.txt.conflict.2
I updated the test case test_fs_filename_collision
to call apply_merge_log_op. Results:
------- fs state after: initialized. (replica1 -------
- null => forest
- 281474976710656 => {"name":"root","size":0,"ctime":1599001559,"mtime":1599001559,"kind":"dir"}
- 281474976710657 => {"name":"fileinodes","size":0,"ctime":1599001559,"mtime":1599001559,"kind":"dir"}
- 281474976710658 => {"name":"trash","size":0,"ctime":1599001559,"mtime":1599001559,"kind":"dir"}
------- end state -------
------- fs state after: created /tmp/file1.txt. (replica1) -------
- null => forest
- 281474976710656 => {"name":"root","size":0,"ctime":1599001559,"mtime":1599001559,"kind":"dir"}
- 281474976710659 => {"name":"tmp","size":0,"ctime":1599001559,"mtime":1599001559,"kind":"dir"}
- 281474976710661 => {"name":"file1.txt","inode_id":281474976710660}
- 281474976710657 => {"name":"fileinodes","size":0,"ctime":1599001559,"mtime":1599001559,"kind":"dir"}
- 281474976710660 => {"size":0,"ctime":1599001559,"mtime":1599001559,"kind":"file","links":1,"content":"hello from replica1\n"}
- 281474976710658 => {"name":"trash","size":0,"ctime":1599001559,"mtime":1599001559,"kind":"dir"}
------- end state -------
------- fs state after: created /tmp/file1.txt. (replica2) -------
- null => forest
- 281474976710656 => {"name":"root","size":0,"ctime":1599001559,"mtime":1599001559,"kind":"dir"}
- 281474976710659 => {"name":"tmp","size":0,"ctime":1599001559,"mtime":1599001559,"kind":"dir"}
- 562949953421313 => {"name":"file1.txt","inode_id":562949953421312}
- 281474976710657 => {"name":"fileinodes","size":0,"ctime":1599001559,"mtime":1599001559,"kind":"dir"}
- 562949953421312 => {"size":0,"ctime":1599001559,"mtime":1599001559,"kind":"file","links":1,"content":"hello from replica2\n"}
- 281474976710658 => {"name":"trash","size":0,"ctime":1599001559,"mtime":1599001559,"kind":"dir"}
------- end state -------
------- fs state after: merged ops from replica2. (replica1 -------
- null => forest
- 281474976710656 => {"name":"root","size":0,"ctime":1599001559,"mtime":1599001559,"kind":"dir"}
- 281474976710659 => {"name":"tmp","size":0,"ctime":1599001559,"mtime":1599001559,"kind":"dir"}
- 281474976710661 => {"name":"file1.txt.conflict.1","inode_id":281474976710660}
- 562949953421313 => {"name":"file1.txt.conflict.2","inode_id":562949953421312}
- 281474976710657 => {"name":"fileinodes","size":0,"ctime":1599001559,"mtime":1599001559,"kind":"dir"}
- 281474976710660 => {"size":0,"ctime":1599001559,"mtime":1599001559,"kind":"file","links":1,"content":"hello from replica1\n"}
- 562949953421312 => {"size":0,"ctime":1599001559,"mtime":1599001559,"kind":"file","links":1,"content":"hello from replica2\n"}
- 281474976710658 => {"name":"trash","size":0,"ctime":1599001559,"mtime":1599001559,"kind":"dir"}
------- end state -------
== Pass! replica1 and replica2 filesystems match. ==
Let's suppose that each replica creates file1.txt independently again. So upon merge they will try to create file1.txt.conflict.1, but it already exists. What happens?
Let's see:
------- fs state after: created /tmp/file1.txt. (replica2) -------
- null => forest
- 281474976710656 => {"name":"root","size":0,"ctime":1599002122,"mtime":1599002122,"kind":"dir"}
- 281474976710659 => {"name":"tmp","size":0,"ctime":1599002122,"mtime":1599002122,"kind":"dir"}
- 562949953421313 => {"name":"file1.txt","inode_id":562949953421312}
- 281474976710657 => {"name":"fileinodes","size":0,"ctime":1599002122,"mtime":1599002122,"kind":"dir"}
- 562949953421312 => {"size":0,"ctime":1599002122,"mtime":1599002122,"kind":"file","links":1,"content":"hello from replica2\n"}
- 281474976710658 => {"name":"trash","size":0,"ctime":1599002122,"mtime":1599002122,"kind":"dir"}
------- end state -------
------- fs state after: merged ops from replica2. (replica1 -------
- null => forest
- 281474976710656 => {"name":"root","size":0,"ctime":1599002122,"mtime":1599002122,"kind":"dir"}
- 281474976710659 => {"name":"tmp","size":0,"ctime":1599002122,"mtime":1599002122,"kind":"dir"}
- 281474976710661 => {"name":"file1.txt.conflict.1","inode_id":281474976710660}
- 562949953421313 => {"name":"file1.txt.conflict.2","inode_id":562949953421312}
- 281474976710657 => {"name":"fileinodes","size":0,"ctime":1599002122,"mtime":1599002122,"kind":"dir"}
- 281474976710660 => {"size":0,"ctime":1599002122,"mtime":1599002122,"kind":"file","links":1,"content":"hello from replica1\n"}
- 562949953421312 => {"size":0,"ctime":1599002122,"mtime":1599002122,"kind":"file","links":1,"content":"hello from replica2\n"}
- 281474976710658 => {"name":"trash","size":0,"ctime":1599002122,"mtime":1599002122,"kind":"dir"}
------- end state -------
------- fs state after: merged ops from replica2. (replica1 -------
- null => forest
- 281474976710656 => {"name":"root","size":0,"ctime":1599002122,"mtime":1599002122,"kind":"dir"}
- 281474976710659 => {"name":"tmp","size":0,"ctime":1599002122,"mtime":1599002122,"kind":"dir"}
- 562949953421313 => {"name":"file1.txt.conflict.2.conflict.2","inode_id":562949953421312}
- 562949953421315 => {"name":"file1.txt.conflict.2.conflict.1","inode_id":562949953421314}
- 281474976710661 => {"name":"file1.txt.conflict.1.conflict.2","inode_id":281474976710660}
- 281474976710663 => {"name":"file1.txt.conflict.1.conflict.1","inode_id":281474976710662}
- 281474976710657 => {"name":"fileinodes","size":0,"ctime":1599002122,"mtime":1599002122,"kind":"dir"}
- 281474976710660 => {"size":0,"ctime":1599002122,"mtime":1599002122,"kind":"file","links":1,"content":"hello from replica1\n"}
- 562949953421312 => {"size":0,"ctime":1599002122,"mtime":1599002122,"kind":"file","links":1,"content":"hello from replica2\n"}
- 281474976710662 => {"size":0,"ctime":1599002122,"mtime":1599002122,"kind":"file","links":1,"content":null}
- 562949953421314 => {"size":0,"ctime":1599002122,"mtime":1599002122,"kind":"file","links":1,"content":null}
- 281474976710658 => {"name":"trash","size":0,"ctime":1599002122,"mtime":1599002122,"kind":"dir"}
------- end state -------
== Pass! replica1 and replica2 filesystems match. ==
So now we have 4 conflicting files. And if it happens again, we get 6, and so on, two more each time and some very long filenames.
not great.
So let's go back to our other idea, of using last-writer-wins (lww) to pick a winner that gets to keep the filename in conflict, while the loser(s) get renamed.
To do this, we modify our apply_merge_log_op() func so that it reverse sorts by timestamp, and then ignores the first conflicting entry, ie:
$cb = function($a, $b) {
$at = $a[1]->timestamp;
$bt = $b[1]->timestamp;
// reverse order, greatest to least.
if( $at->eq($bt) ) {
return 0;
} else if ($at->lt($bt)) {
return 1;
} else {
return -1;
}
};
usort($matches, $cb);
$ops = [];
$cnt = 0;
foreach($matches as $conflict) {
if($cnt++ == 0) {
continue;
}
list($child_id, $tn) = $conflict;
$tn->meta->name = sprintf( '%s.conflict.%s', $tn->meta->name, $tn->timestamp->actor_id());
$ops[] = new op_move($this->replica->tick(), $tn->parent_id, $tn->meta, $child_id);
}
$this->replica->apply_ops($ops);
So now, with our initial collision, we get:
------- fs state after: created /tmp/file1.txt. (replica1) -------
- null => forest
- 281474976710656 => {"name":"root","size":0,"ctime":1599003072,"mtime":1599003072,"kind":"dir"}
- 281474976710659 => {"name":"tmp","size":0,"ctime":1599003072,"mtime":1599003072,"kind":"dir"}
- 281474976710661 => {"name":"file1.txt","inode_id":281474976710660}
- 281474976710657 => {"name":"fileinodes","size":0,"ctime":1599003072,"mtime":1599003072,"kind":"dir"}
- 281474976710660 => {"size":0,"ctime":1599003072,"mtime":1599003072,"kind":"file","links":1,"content":"hello from replica1\n"}
- 281474976710658 => {"name":"trash","size":0,"ctime":1599003072,"mtime":1599003072,"kind":"dir"}
------- end state -------
------- fs state after: created /tmp/file1.txt. (replica2) -------
- null => forest
- 281474976710656 => {"name":"root","size":0,"ctime":1599003072,"mtime":1599003072,"kind":"dir"}
- 281474976710659 => {"name":"tmp","size":0,"ctime":1599003072,"mtime":1599003072,"kind":"dir"}
- 562949953421313 => {"name":"file1.txt","inode_id":562949953421312}
- 281474976710657 => {"name":"fileinodes","size":0,"ctime":1599003072,"mtime":1599003072,"kind":"dir"}
- 562949953421312 => {"size":0,"ctime":1599003072,"mtime":1599003072,"kind":"file","links":1,"content":"hello from replica2\n"}
- 281474976710658 => {"name":"trash","size":0,"ctime":1599003072,"mtime":1599003072,"kind":"dir"}
------- end state -------
------- fs state after: merged ops from replica2. (replica1 -------
- null => forest
- 281474976710656 => {"name":"root","size":0,"ctime":1599003072,"mtime":1599003072,"kind":"dir"}
- 281474976710659 => {"name":"tmp","size":0,"ctime":1599003072,"mtime":1599003072,"kind":"dir"}
- 562949953421313 => {"name":"file1.txt","inode_id":562949953421312}
- 281474976710661 => {"name":"file1.txt.conflict.1","inode_id":281474976710660}
- 281474976710657 => {"name":"fileinodes","size":0,"ctime":1599003072,"mtime":1599003072,"kind":"dir"}
- 281474976710660 => {"size":0,"ctime":1599003072,"mtime":1599003072,"kind":"file","links":1,"content":"hello from replica1\n"}
- 562949953421312 => {"size":0,"ctime":1599003072,"mtime":1599003072,"kind":"file","links":1,"content":"hello from replica2\n"}
- 281474976710658 => {"name":"trash","size":0,"ctime":1599003072,"mtime":1599003072,"kind":"dir"}
------- end state -------
So the files are now:
/tmp/
file1.txt
file1.txt.conflict.1
and if either replica tries to create file1.txt again, eg with mknod(), it generates an error because the file already exists.
------- fs state after: created /tmp/file1.txt. (replica2) -------
- null => forest
- 281474976710656 => {"name":"root","size":0,"ctime":1599003222,"mtime":1599003222,"kind":"dir"}
- 281474976710659 => {"name":"tmp","size":0,"ctime":1599003222,"mtime":1599003222,"kind":"dir"}
- 562949953421313 => {"name":"file1.txt","inode_id":562949953421312}
- 281474976710657 => {"name":"fileinodes","size":0,"ctime":1599003222,"mtime":1599003222,"kind":"dir"}
- 562949953421312 => {"size":0,"ctime":1599003222,"mtime":1599003222,"kind":"file","links":1,"content":"hello from replica2\n"}
- 281474976710658 => {"name":"trash","size":0,"ctime":1599003222,"mtime":1599003222,"kind":"dir"}
------- end state -------
------- fs state after: merged ops from replica2. (replica1 -------
- null => forest
- 281474976710656 => {"name":"root","size":0,"ctime":1599003222,"mtime":1599003222,"kind":"dir"}
- 281474976710659 => {"name":"tmp","size":0,"ctime":1599003222,"mtime":1599003222,"kind":"dir"}
- 562949953421313 => {"name":"file1.txt","inode_id":562949953421312}
- 281474976710661 => {"name":"file1.txt.conflict.1","inode_id":281474976710660}
- 281474976710657 => {"name":"fileinodes","size":0,"ctime":1599003222,"mtime":1599003222,"kind":"dir"}
- 281474976710660 => {"size":0,"ctime":1599003222,"mtime":1599003222,"kind":"file","links":1,"content":"hello from replica1\n"}
- 562949953421312 => {"size":0,"ctime":1599003222,"mtime":1599003222,"kind":"file","links":1,"content":"hello from replica2\n"}
- 281474976710658 => {"name":"trash","size":0,"ctime":1599003222,"mtime":1599003222,"kind":"dir"}
------- end state -------
PHP Fatal error: Uncaught Exception: Already exists: file1.txt in /home/danda/dev/maidsafe/crdt-php/src/filesystem-split-inode.php:407
As such, we can't get into an endless conflict resolution loop if we have programs on each replica that keep re-creating the same file.
This is by no means an exhaustive theoretical analysis, and there could well be problems yet unseen. However based on these tests alone, the last-writer-wins approach, where the original filename is kept by a single winner seems preferable/superior.