# dat/cobox chat w paul + ranger minutes
Qs:
- What are the possible scalability / design issues we're going to butt up against with peerfs?
- multiwriter in dat? what are the current ideas?
- What are Dat's thoughts on privacy on the DHT, and is that a consideration going forward?
- p3lib-sphinx
- Why isn't hyperswarm in the main Dat stack at the moment?
- Any thoughts on conflict resolution schemes?
- dat:// compatibility - what are the prerequisites so we can ensure in our development we stay open ended?
our core considerations
- privacy
- data sovereignty
- revocable access
- permissions
Introductions...
k: did you guys get a chance to read over the stuff we sent over?
pf: solid research and documentation, appreciate multiple solutions + trade-offs, would like to help with decision-making
mv: groups + how to do group-based content encryption + access revocation - what can be done here?
Looks like we are going to have iterations that are usable, and that is an interesting way of going about it.
mv: needfinding - who's it with?
mu: with 10 co-ops in the UK, small workers' co-ops / co-ops in general, horizontal groups, desire not to rely on big corporate players - ascertain what their main needs are, e.g. p2p file sharing -
mv: industries?
mu: broad - half from CoTech, so digital/tech workers' co-ops, a couple of platform co-ops, a couple of tech hosting co-ops, a co-op that does VR, a media co-op; the other half is non-tech: printing, ceramics and housing co-ops.
Paul: that is the full spectrum.
Mu: it is also a way to understand what is out there - co-ops that focus on Nextcloud instances and so on. Any super high-level needs that exist across industries; and if that turns out not to be the case, then perhaps our priorities are clearer.
Paul: proposal for the call - maybe we focus on the MVP first, the features and then we can talk about some of the tech itself?
Kieran: Yes. We focus on the whole concept of sovereignty. We are particularly interested in privacy, because these organisations need to have a certainty of privacy.
Want to eliminate any risk of data leak. Couple of people working on kappa-core
very interested in file permissions
permissions access
threshold signature schemes
being able to enact that at software level on p2p architectures
Mauve: What kind of decisions did you have in mind - for voting, for example?
Peg: for example, deciding if someone is entering or leaving the group. a kind of decision that might be enforced through software. Or publishing something on behalf of the group, so like if you have a big group, and publishing something under the name of the group.
Kieran: we are also all interested in the question of collaborative budgeting and collective management of money. Making decisions on money at the software level would be incredibly powerful.
Paul: interesting thread of concerns...
1. which applications?
2. how much novel tech to use vs working with off the shelf
3. how much time to make all of this work.
What applications are you thinking as MVP at this point ?
I imagine a pretty robust cloud drive, which certain users can access with different permissions.
Kieran: what we forecast for the rest of the year - we would like to be able to build the shared filesystem. Browser interface, and a lot of the research will feed into permissions, scalability, back end, file sizes - depending on the organisation, media files or documents. Those will come out of the needfinding. Shared filesystem is what we all agree on. Blind replication - distributed backup/recovery - is the major selling point for us.
Paul: the idea is that coops form partnerships to back each other up?
Mauve: hypercore ?
Kieran: as inspiration, yes -
doing it with an asymmetric key, doing it with a password at the moment.
Mauve: is it open?
Kieran: it is on Ledger, but we will be publishing; I did send you links to the repos though.
Mauve: might be useful to integrate in the core, if you are interested in that. Would really like to see, further in the future, CoBox making all the cool things - would be great if it is just used across all of Dat, in Beaker, Cabal etc., and share the possibility of encrypted archives with any and all Dat projects.
Kieran: Doesn't feel like we have done much. Basically hypercore, libsodium, passed into xx signs and encrypts the message. Can easily publish that but there is not a huge amount of code at this point.
Paul: We are moving Matthias to more public stuff. Would love for teams like yours to do the more core stuff. Much of what you are doing is what we want to have in the ecosystem. Hopefully we just have a conversation like this... interoperable, figure out what fits where. Turn your work into standards and common modules.
Mauve: Matthias is very good at optimising and getting data structures efficient.
peerfs is good, but there will be scaling issues. Matthias has ideas for how to make it efficient. Not just about taking your work into the ecosystem, but making sure that it will be useful for you - and make you more
Multiwriter in particular is something that you can get done. But to get it done in an efficient way is going to be hard; will be good to collaborate there.
The encryption stuff you are doing is very cool. No need for us there. But there are other areas where we can make things really efficient.
Kieran: me and Peg discussed this earlier, and yes please - what would be your design advice, for example on scaling issues? What else might come up? Heard that multiwriter has been discussed in other ways than peerfs. Paul?
Paul: Multiwriter is tricky. We have debated what is the best way. Two models: 1. All devices coordinate with an anchor node. Distributed lock: you talk to the coordinator, and when it is safe not to split, then you add. Highly consistent model. Requires connectivity to the coordinator node - a trade-off. The architecture is very simple and will not have conflicts on files. Downside is you need to be online. 2. Eventually consistent model. Merged once everyone is online. Pro: usable offline. Con: more work to implement correctly when everyone gets back online, more complicated data structure.
Getting a lot of word that people really want to be able to work offline. So we would likely go for the eventually consistent model.
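The first (anchor-node / distributed-lock) model Paul describes can be sketched as a single lock-holder that serialises appends. This is an in-process stand-in for what would really be a network protocol, with illustrative names only:

```javascript
// Strict-consistency sketch: every writer must hold the coordinator's
// lock before appending, so the shared log can never fork.
class Coordinator {
  constructor() {
    this.holder = null; // writer currently holding the lock
    this.log = [];      // the single, conflict-free history
  }
  acquire(writerId) {
    if (this.holder && this.holder !== writerId) return false; // must wait
    this.holder = writerId;
    return true;
  }
  append(writerId, entry) {
    if (this.holder !== writerId) throw new Error('lock not held');
    this.log.push({ writerId, entry });
    this.holder = null; // release after the write lands
  }
}
```

The trade-off is visible in `acquire`: a writer that cannot reach the coordinator (offline) simply cannot append, which is why the eventually consistent model is attractive despite its more complicated data structures.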
Peerfs does not take advantage of sparse replication. Without sparse replication, that is a scaling problem.
I am keen to work with Matthias to put together beginnings of a spec. Still figuring out how it should work.
Mauve: what is the minimal thing we can do that is efficient without huge architecture? Biggest scalability thing is having to fully read the hypercore metadata structure.
Paul: vector clocks to hypertrie - that might help the scaling. The scaling problem was mostly to do with the vector clock. It is a clock that needs to be kept for each file.
Need a sparse version of the vector clock; might be able to move it to a hypertrie.
Mauve: were discussing file clocks, basically a timestamp. Main bottleneck is going to be downloading metadata. Win would be to use hypertrie and store data there. Instead of downloading the full metadata feed, go folder by folder and download those files. We do not have a definite solution yet. Just know what to avoid.
About vector clocks: we need bloom clocks, where we don't store everyone else's clock - just a bloom filter instead of every user having to track everything. Having metadata per user is hard. Hypertrie might be the solution. Haven't decided yet.
Storing causality information in the hypertrie is the idea.
Paul: how sure are you that you are using Dat?
Peg: clarifying question - this is really useful. The scalability issues seem to be of three different types: too many writers, file size too big, too much metadata because of too many changes.
Called with Noffle and Cabal - using kappa-core: lots of writers, low content; problems with multiplexing of replicating kappa-core.
Replication rather than indexing is slowing things down. So I guess the scalability issue we are talking about here is the metadata to do with too many changes.
Mauve: read performance is the big thing. If you have to sync all the data of all the writers then reads will be slow. Write doesn't matter.
Kieran: what about large files and performance?
Paul: if you take a dat and put loads of big files in it - no problem. If you make small changes to large files, like a couple of bytes - in the current version that would require a rewrite of the whole file, so an issue. Matthias is working on this. Deduplicating over the network - Matthias also wants to include that. Being worked on.
Kieran: timeline for this?
Paul: the random-writes dimension is being worked on now. Good bet to get it done within your timeline. Want to get Dat to be a native system inside the OS so that it feels like a native filesystem.
Mauve: Don't have sharding figured out. Very large files, there is nothing that will split up the files across different storage.
Paul: dats don't have to download everything from an archive automatically - a primary peer can host most of it
Kieran: a few more important topics - privacy on the DHT, what are considerations for dat ecosystem, hyperswarm?
protocol compatibility, how can we make sure we stay integrated with your development?
Paul: Hyperswarm, integration, and also plans for multiwriter moving forward.
Kieran: Let's wrap that point up then. Peerfs got multiwriter working that is mountable. Conflict resolution is poor - latest thing that happened. Looking at eventually consistent models. We are interested in just getting something working and seeing how it plays out.
Peg's and my work with peerfs was to see what the affordances are. Done local replication between us and we have that now.
Paul: peerfs will be a stopgap. Need to figure out how long it will or can be the stopgap. I plan to talk to Matthias and Andrew - they are busy, but I can begin to put together an idea of what the final multiwriter will look like. Let's keep in touch about that. If it looks like it is not working for you guys, let's keep in touch, and you can either do that or use peerfs.
Mauve: what kind of privacy do you mean for the DHT, you mean anonymity?
Peg: we do not yet know if absolute anonymity is what we prioritise, but we would ideally not want to rely on a central server for peer discovery. Peer discovery whilst maintaining privacy.
Talking to Gustavo about privacy and the DHT, working on an onion routing library that we are thinking of using as a way to avoid leaking metadata.
Paul: hyperswarm is going to be the solution for Dat in the future. We are moving towards Dat 2.0; hyperswarm will be the connections layer. Going to be a DHT. Status on this is that it will hide what information you are discussing unless people already know what it is. People have to know the address in order to access it on the DHT.
Also, you are effectively exposing your IP address. Worst-case scenario: metadata leak.
Interested to see if onion routing can solve it. Priority a year from now. Near term - can have clients talk to a shared proxy. CoBox could run a proxy that everyone runs their discovery through. Whoever runs the proxy would have to be trusted though. Then there is local discovery, over wifi, to find devices: LAN, and turn off the DHT.
Mauve: Looking to get I2P working. Anything that can do duplex streams will work with dat.
Watched Goncalo's talk; some approaches will make it harder to do holepunching with hypercore. If you don't want to leak your IP, choose a mixnet and use it, to be honest.
Peg: p3lib-sphinx is Goncalo's project.
Mauve: Hyperswarm proxy - am using it to get hyperswarm to work on the web through a proxy. Maybe we can write something to secretly talk to this lib, while doing all the p2p networking in Go. Or we just make a mixnet.
Kieran: considerations here with the way that co-ops will use this tech - I think pinning a node that would do a lot of the networking could be a stopgap for the co-op movement. We'll see with the research.
Mauve: Hyperswarm proxy just exists now, and you can use it in node. Thinking autodiscovery for hyperswarm proxies. Right now connecting between proxies is not solved, but probably something we can figure out.
Kieran: we are interested in seeing what works for co-ops, and moving from there. It is really useful to know of these options to make an informed decision on what comes out of the research.
Mauve: hyperxxx??? doesn't use DNS tracker servers for discovery, only uses the DHT.
Cannot un-announce yourself. If you have a lot of short-lived connections you will
Hyperswarm: when you close the process, you unannounce yourself from the DHT, which means you can find peers a lot faster because of less noise.
Hole punching - if you have both accessed a DHT node you can use that to holepunch to each other - more reliable for connecting people on home wifi for example.
Hyperdrive proxy is already using hyperswarm. Currently working on getting it into the SDK. That should not be too much work; early next month or the next we will have something in the SDK for using it with hyperdrive.
It is an easy-to-use API, just need to get around to integrating it everywhere. Paul has also been testing it with Beaker. It is all good to go.
Paul: we are looking to get it all into dat and sdk in the fall.
Kieran: questions on conflict resolution, but maybe too much right now. Talk about compatibility instead.
Mauve: Worst case, think most will work in the SDK in Beaker. Can I open a CoBox-style archive as a URL? That is the main concern where we would need interoperability.
If you are using hypercore and kappa core then you are probably good to go for applications.
Paul: Beaker architecture has a hyperdrive filesystem. Everyone gets a private and a public dat (personal website). Read/write into public dats xxx [can't hear!] unwalled garden and social media something...
peer sockets,
join shared lobby to connect
or authenticated connection
so can open a peer socket specifically to, for example, Ranger.
anything that is custom in dat is built off of hypercore.
Beaker - will include all tooling for hypercores to be created.
Beaker will not know anything other than that they are hypercores. Application land data structures
Peerfs in Beaker, xx SORRY MISSED THIS TOO
...hypercore apis to read peerfs to interact with the structures there.
Mauve: challenges with compatibility - interesting things in hyperswarm, for example manually managing peers. Would be good to think of high-level APIs, if we can get something to work in Beaker: doesn't expose networking, just high-level data structures. Thinking of how to have identities for peer sockets; we can probably think of it for other elements in the stack.
Like list peers with public key, which would be the only ones with access to the archive.
Mauve: this also interacts with noise protocol, and identity.
Paul: basic Noise integration, Dat 2.0
Peg: Paul, can you describe how you understand what Matthias is working on for eventual consistency? That is not peerfs.
Paul: strict or eventual consistency. So it comes down to whether or not you have a computer (coordinator) that manages things for everyone. [describes strict consistency, bla].
Mauve: peg was asking about the one based on vector clocks?
Peg: yes
Paul: add vector clocks into the existing metadata of hypercores, and a couple more data points in existing entries. Going to have some kind of vector clock to keep track of who wrote most recently to every file. Sometimes you can't tell. Then there is a conflict, and the application needs to decide which one to use. Could ask the user, or choose to throw away a conflict - depends on the application. Could choose one, but write and track the history of decisions.
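The per-file vector-clock comparison Paul describes can be sketched in plain JavaScript; this is an illustrative sketch, not the actual hypercore metadata format:

```javascript
// A per-file vector clock maps writerId -> sequence number.
// compare() returns 'descendant', 'ancestor', 'equal', or 'conflict'.
function compare(a, b) {
  let aAhead = false, bAhead = false;
  for (const id of new Set([...Object.keys(a), ...Object.keys(b)])) {
    const av = a[id] || 0, bv = b[id] || 0;
    if (av > bv) aAhead = true;
    if (bv > av) bAhead = true;
  }
  if (aAhead && bAhead) return 'conflict'; // concurrent writes: the app decides
  if (aAhead) return 'descendant';         // a saw everything b saw, and more
  if (bAhead) return 'ancestor';
  return 'equal';
}

// Once a conflict is resolved, the clocks merge by element-wise maximum.
function merge(a, b) {
  const out = { ...a };
  for (const [id, v] of Object.entries(b)) out[id] = Math.max(out[id] || 0, v);
  return out;
}
```

The "sometimes you can't tell" case is exactly the `'conflict'` branch: neither clock dominates the other, so causality alone cannot order the two writes.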
https://arxiv.org/abs/1905.13064 (nice one I don't have mumble on my laptop)
Mauve: Matthias might have mentioned bloom clocks. It is vector clocks where you don't store everything. Can be more space efficient. Will not require as much metadata for every write.
Kieran: content encryption. I will publish something open ended when done. Useful?
Mauve: want to standardise content encryption. To integrate with Dat store and hashbase. Any stuff using the stack should use this. Maybe we can see what you have, and see if others want to use this too.
Kieran: Telemon has done all this work on cyphercore and wants to integrate it into the SDK.
Mauve: would that be useful for CoBox?
Kieran: It would be great to have your shared and private drive hosted with your co-op. Can still pin it in the same arena; cyphercore would help with that.
Mauve: I will get back in touch with Telemon.
Paul: I am always free on call, IRC etc., so let me know. Matthias will also become more available in the future. Multiwriter - think peerfs is good in the near term, and I will get specs together in the meantime.
Kieran: will be in touch with questions, for example on scalability and so on. Useful to be able to ask. We will be publishing it all open source and will ping you.
Mauve: once I get kappa-core working with the SDK, I really want to speak with you again - get it working across Node.js, Beaker etc.