# SSB Protocol Design Call 30/06/2019
Attending:
- Aljoscha
- christianbundy
- cft
- arj
- dominic
- cryptix
- Staltz
- rabble
## Agenda
- note taking?
- how long do we have?
- Discuss [@christianbundy](@+oaWWDs8g73EZFUMfW37R/ULtFEjwKN/DczvdYihjbU=.ed25519)'s [off-chain content](%LrMcs9tqgOMLPGv6mN5Z7YYxRQ8qn0JRhVi++OyOvQo=.sha256) thread
- Slowly rolling out changes, or one big change?
- [@cft](@AiBJDta+4boyh2USNGwIagH/wKjeruTcDX2Aj1r/haM=.ed25519) will present birch
I have a feeling that some of the issues to be discussed are highlighted in the [summary diff](https://github.com/cn-uofbasel/ssb-birch#quick-summary-of-differences-jun-1-2019-to-bamboo-and-arj03s-variant).
- Hamburg preparations
## Notes
## offchain content
- Dom thoughts on offchain-content:
- one option: treat as new encoding type. in replication, send metadata and content as one thing.
- cft: let's take a step back, big picture
- there are things in the metadata we don't want to be deletable
- e.g. follow information
- what exactly are those things? What is the chain about?
- cb:
- 100% confident: shouldn't be able to delete deletion requests
- changing feed types?
- arj: do we even want delete? Immutable logs with deletions are an oxymoron
- cft: things that can be offchain are application stuff, things that can not are about chain management
- dom: (implicit agree), but: follow messages are application layer
- alj: to me, offchain content isn't about deleting from the logical log, just for deleting from a local replica - cb's proposal diverges from that
Cft: How do we deal with invalid feeds? ignore? bork the feed? (Alj thinks this is a *really* important question)
alternative blob reference approach, msgkey hash with object path to the blob ref or mention (?)
# transcription
[00:00] alj: awesome .. also +1 on... <audiotear>
[00:22] cft: ... i continue: so if we could quickly do one step back and understand what we are trying to achieve and maybe also a little bit in the long run with the offchain-content. and let me start maybe the discussion in the following way: there are things, i guess, which should be in these event logs (or metadata) which should not be deleted at all. for example, things that are directed to tell the network layer what to do if you follow somebody. you shouldn't be just deleting that. it really has to be in the log forever. because otherwise the network... as soon as you ask "please forget this entry", does it have to go back and see that entry, was it because you followed somebody? does that mean that you are unfollowing somebody? so i think we should be clear what has to be in the chain, what can be offloaded/offchained? all that is for me not clear at all and before we even start, let's call it hacking, and i definitely would like to expand on the naming of these attachments, how i call them.
can we quickly look at what the chain is about and what should go offchain content inside these parts?
[01:45] cb: yeah. One thing, so.. I tried to think about this and then started enumerating possibilities and there were too many..
the one thing that i'm 100% sure, well.. that i feel very confident about is that we shouldn't be able to delete deletion requests.
because uhm that ends up with a strange scenario where you can delete a message that's asking people to delete a message of you deleting a message..?
so i think there are definitely going to be things onchain. deletion requests should not be off-chain, i don't think.
and i'm not sure if there are other things on the network level.
so if you can change a feed type, that should probably stay on chain.
yeah, i think it would be useful to enumerate those.
[02:35] arj: there is also the question if we even want deletes.
I mean, the current design is very sort of pure in the way that it is an immutable log that just keeps on growing forever and ever.
I can see some of the use-cases for this deleting of content but we just have to realize that it is a very different model, in a way.
so I'm not saying that it's good or bad just that it's very very different.
<audiotear>
[03:00] cft: ..application stuff. but things that have to stay in the chain really is about the chain management. maybe let's call it that way.
[03:34] dom: on your previous example of follow messages: I think I would not consider it a problem to replicate...? that you have not posted a follow message for. and you can follow peers.. uhm. like, you can post a follow message that is encrypted, so no one else can see it. that currently works.
I'm a bit wary because.. I mean, keeping messages having always is simple. Having messages always in offchain content, but usually it's just sent: that's also pretty simple.
but having both sometimes messages in the offchain be ... in the like non-deletable content... then how does that get enforced?
and i think if you have offchain content, you basically can't enforce whether something should be in offchain content or not.
unless you just ignore that are types that aren't offchain..
so I worry that this solution is just quite complicated.
[05:32] alj: kind of expanding on that: In my mind offchain content isn't about being able to delete content from your feed.
it's just about being able to locally delete data from a feed replica, while still maintaining the ability to replicate that replica to other peers,
well except for the stuff you deleted obviously.
still, on the logical level, the log would still be an append-only log.
and that's where christianbundy's thread diverged from my previous suggestion for offchain content.
because in that thread it was more like we effectively don't have append-only logs anymore and i'm very skeptical of that.
[06:32] dom: so what I'm thinking about is just an offchain-content where you are receiving a message (..?) and just not sent the content.
for whatever reason they have for doing that (not sending the content) that's their decision.
but you could always when you're replicating just say "oh i don't have that content"
and if someone doesn't like that, they could ask someone else for that content.
[07:27] cft: can I ask about what you call metadata. This is.. let's call it the naked chain information.
Who wrote it, which sequence number and what is the previous hash. that's more or less...
[07:40] dom: correct. just the absolute minimum metadata for replication.
[07:50] alj: actually for verification.
dom: yea.
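
A minimal sketch of verifying a feed from this "naked chain information" alone. The `Meta` fields, the `|`-separated encoding and SHA-256 are assumptions made for illustration, not a format agreed on the call, and signatures are left out entirely:

```python
import hashlib
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Meta:
    """Hypothetical minimal metadata: who wrote it, which sequence number, previous hash."""
    author: str
    seq: int
    previous: Optional[bytes]  # id of the previous entry, None for the first entry
    content_hash: bytes        # hash of the content, which may live elsewhere or be deleted


def meta_id(meta: Meta) -> bytes:
    # Illustrative encoding only; a real format would need a canonical binary encoding.
    prev = meta.previous.hex() if meta.previous is not None else ""
    encoded = f"{meta.author}|{meta.seq}|{prev}|{meta.content_hash.hex()}".encode()
    return hashlib.sha256(encoded).digest()


def append(feed: List[Meta], author: str, content: bytes) -> None:
    # Only the hash of the content enters the chain, never the content itself.
    previous = meta_id(feed[-1]) if feed else None
    feed.append(Meta(author, len(feed) + 1, previous, hashlib.sha256(content).digest()))


def verify_chain(feed: List[Meta]) -> bool:
    # Checks author, sequence numbers and back-links without looking at any content.
    expected_previous = None
    for index, meta in enumerate(feed, start=1):
        if meta.author != feed[0].author:
            return False
        if meta.seq != index or meta.previous != expected_previous:
            return False
        expected_previous = meta_id(meta)
    return True
```

In this sketch a replica that has dropped every content payload can still run `verify_chain`, because each link depends only on the metadata of the entry before it.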
[07:52] cft: could there also maybe be a little bit more than just the hash to the content. namely the size?
that could maybe be a decision if you want to fetch that content at all or not.
[08:10] cb: i think that's useful. the implementation i have does have size... <audiotear>
[08:18] dom: yea i think i recall aljoscha's proposal includes a size.
[08:27] alj: it even enforces it. if what you just described happened, we consider the feed broken and stop replicating it.
[08:40] cft: yea, probably ignoring would be a safer decision than just ending the feed.
what about having multiple contents, what I suggested in birch as having attachments?
.. okay. the silence is a not so enthusiastic reaction, i see. so i try to make a little bit of advertisement for that:
the reason why I came up with the attachment thing is that, you want to be very selective and also be able to name the different pieces.
so think about one post and it has three pictures. so you would have four elements, the post text plus the three pictures.
and some clients would like to be, just picking the texts, or the others could be "okay I want the 2nd of the three attachments" or things like that
and that comes then to the naming of these things. it would help to introduce names to sub-content by that venue of attaching things,
like we have in MIME.
[10:09] alj: so what is the advantage of baking that in the protocol instead of just publishing a message that... <audiotear>
[10:16] cft: so let's call it a manifest. so the content would be a manifest, and inside there you would enumerate all the possibilities..
other problem.. if you look at christianbundy's suggestion that we have alternating metadata and content pieces..
you would have one big... <audiotear> added to the feed if they are not independent
[11:00] alj: i think that's more of a replication concern than an actual feed structure concern because,
if i understood you correctly, if we had it as part of the metadata, multiple contents then during replication you could specify
which attachments to inline and which not to send.
did i get that part right?
[11:40] cft: i have to think about that, maybe you are right.
somehow I wanted smaller granularity but you are right that this targets at the replication time not necessarily rendering time.
i have to think about that
[11:55] alj: yes and uhm.. one additional point: then we can get to the sort of..
uhm.. dominic posted this thread about transitively fetching cipherlinks basically.
where i drew this comparison to Cap'n Proto and what was it called? future chaining something..?
but basically we could do that as well for those sort of manifest contents.
so i don't think that there are actual performance and roundtrip gains that we could only get by adding them as part of the metadata.
i think a clever replication protocol can always offset that.
[12:40] arj: and I also think that is really important to keep the core protocol as simple as possible and then build the other things on top of that.
because later on you might be able to put in all kinds of clever replication tricks that you didn't think about in the beginning afterwards.
[13:04] dominic: idea here, we currently have EBT replication stream which just send messages, each message that comes through is a message with or without content. And then have another stream that is a blob push protocol, in this I'm saying whenever you send me a message just send me the blobs that message mentions without having to ask for it. This would be an independent feature that users can turn on/off. More push style that is selective for specific types.
[14:28] cft: my concern is when trying to stitch things together in my mind is the naming of the content. If we would have like bundy's suggestions, inline of the content with the metadata we have a natural naming schema. Say event.0 is the metadata, event.1 is the content, event.2 is an attachment etc. If we don't have that the only name we will end up with is the hash and that is the thing I'm really reluctant about. A simple namespace. This is such a stress to build distribution systems for flatly named content.
[15:27] alj: my main response to that is. Blobs are basically a flat namespace. we could just have rpcs that lets say please give me this blob by hash and btw I know that it was referenced in the following feeds, then that information can be used. I think that is more general because you can specify multiple feeds.
[15:56] dominic: I think Christian means things like say you have a message and content points to several blobs. How do those blobs relate to that message. You attach the extra links at the bottom and then you say the 1,2,3 etc. Another option I'd like to suggest is you have a path into the data. The content is a structured type that has keys and values. Currently you have mentions, that is an array and then you could take the fourth one or any mention that points to a message. You could say things like that. Any blog post has a blob link, so you could describe push these items that are on this path inside the content. The way you refer to that blob is that you do by hash or you could refer to it by the message id / path. Like IPFS. Does that make sense Christian?
[17:49] cft: absolutely, I see that would replace the numbering. It would mean content is really attached to the seq no. We would have 2 parallel feeds: metadata, each entry is numbered and content where each is also numbered. It would be more explicit than the hints.
[18:37] alj: you would still need to be able to verify the data. It wouldn't make sense to say go into this object and path xyz because the hash needs to be known.
[19:05] dominic: the path would start with hash of msg, id seqno, and then you have the path inside of that object. You could receive that message and then request parts.
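
As an illustration of the message id plus path idea, here is a small sketch. The `store` layout, the ids and the `mentions`/`link` field names are hypothetical stand-ins rather than an agreed format:

```python
from typing import Any, Dict, Sequence, Union

PathStep = Union[str, int]


def resolve(content: Any, path: Sequence[PathStep]) -> Any:
    """Walk a path of keys and array indexes into structured message content."""
    node = content
    for step in path:
        if isinstance(node, dict) and isinstance(step, str):
            node = node[step]
        elif isinstance(node, list) and isinstance(step, int):
            node = node[step]
        else:
            raise KeyError(f"cannot apply step {step!r} to {type(node).__name__}")
    return node


def resolve_ref(store: Dict[str, Any], msg_id: str, path: Sequence[PathStep]) -> Any:
    # The reference starts from a message id the peer has already verified,
    # so the path is only evaluated against content whose hash is known.
    return resolve(store[msg_id], path)


# Hypothetical local store: message id -> already validated content.
store = {
    "%msg1.sha256": {
        "type": "post",
        "text": "holiday pictures",
        "mentions": [{"link": "&aaa.sha256"}, {"link": "&bbb.sha256"}],
    }
}

second_blob = resolve_ref(store, "%msg1.sha256", ["mentions", 1, "link"])
```

Here `second_blob` is the blob hash found under `mentions[1].link`, which a peer could then fetch or push by hash as usual.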
[19:30] alj: you said whole message, aren't we talking about cherry picking?
[19:51] dominic: a way to reference blobs inside a message, not about receiving a part of the msg.
[20:25] alj: at which point does this need to be part of the core protocol?
[20:40] dominic: higher layer, doesn't need to be part of the msg replication protocol, interesting side protocol. A way of addressing Christians use case.
[21:00] cft: good question alj. Let's go back to delete individual things. If we always have to delete the whole content, makes a diff to only delete a part.
[21:25] alj: say we have offchain content, then you are free to delete not the whole message and only keep some of it
[22:00] arj: would still be nice to have a way to say that I only want to delete a certain part of my message by specifying the path
[22:10] alj: my understanding that delete is a local operation
[22:25] dominic: not advocating deleting the text part of a message because then it won't validate, blobs are simple
[23:00] alj: again, local operation
[23:05] dominic: it is, if you post a message that asks other people to delete that you have thought globally but acted locally.
[23:26] alj: application layer, right?
[23:30] dominic, arj: yes
[23:35] cft: confused. If you delete locally you can't help replicate, and if you have holes in the content then you can't validate?
[24:00] alj: you can still validate because you delete the content, but you keep the hash. Hash is enough for validation. No content.
[24:23] bundy: if the msg specifies a size and you don't have content to give them, would they validate?
[24:38] alj: good question
[24:50] cft: if you selectively delete from a message, you would have to specify how you validate. Is it concat of all the bytes? Tree of hashes?
[25:12] alj: not sure there is a need to validate the message again. You get a message once, check it, that's it. After that you can delete
[25:50] cft: if you get a message, everything was there except 1 image. How would you handle that?
[25:12] alj: Don't care. Your decision
[26:00] dominic: don't need the image to validate the msg, you just need to check the hash. The signature still signs the image, that means if you do get the image later, you can verify that this was the missing image. Don't need image to check signature.
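
One way to picture the "keep the hash, drop the content" idea, including content that only arrives later. The `ContentRef` type and the three status values are assumptions for illustration. How a missing payload should be treated when a size is specified was left open on the call, so this sketch simply reports it as its own state:

```python
import hashlib
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ContentStatus(Enum):
    MISSING = "missing"  # content not held locally; the chain itself is unaffected
    VALID = "valid"      # content matches the signed hash and size
    INVALID = "invalid"  # content contradicts the metadata


@dataclass(frozen=True)
class ContentRef:
    """Hypothetical part of the metadata that commits to the content."""
    content_hash: bytes
    size: int


def make_ref(content: bytes) -> ContentRef:
    return ContentRef(hashlib.sha256(content).digest(), len(content))


def check_content(ref: ContentRef, content: Optional[bytes]) -> ContentStatus:
    if content is None:
        return ContentStatus.MISSING
    if len(content) != ref.size:
        # Cheap check first: a wrong size can be rejected without hashing.
        return ContentStatus.INVALID
    if hashlib.sha256(content).digest() != ref.content_hash:
        return ContentStatus.INVALID
    return ContentStatus.VALID
```

A replica can call `check_content(ref, None)` after a local delete, and later call it again with bytes fetched from another peer to confirm that they are the missing content.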
[26:30] cft: coming back to the multipart msg. 3 things: metadata, text, image. Someone says forget the img. From now on there is no img anymore. How do you check that the rest is correct without the image?
[27:00] alj: you would get that for free, if you just do what we currently do with the manifest approach. Only sign the manifest.
[27:25] cft: what is really signed? Not the bytes of the content, the bytes of the manifest
[27:37] alj: no, bytes of the hash of the manifest. Because the manifest is just the content of a msg. Nothing special about it.
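
A rough sketch of the manifest approach: the signed metadata commits only to the hash of the manifest, and the manifest lists a hash and size per named part. The line-based manifest encoding and the part names are invented for this example, and signing itself is not shown:

```python
import hashlib
from typing import Dict, Tuple


def h(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def build_manifest(parts: Dict[str, bytes]) -> bytes:
    # One "name size hash" line per part, sorted by name (illustrative encoding;
    # part names must not contain spaces).
    lines = [f"{name} {len(data)} {h(data)}" for name, data in sorted(parts.items())]
    return "\n".join(lines).encode()


def parse_manifest(manifest: bytes) -> Dict[str, Tuple[int, str]]:
    entries = {}
    for line in manifest.decode().splitlines():
        name, size, digest = line.split(" ")
        entries[name] = (int(size), digest)
    return entries


def verify_message(signed_hash: str, manifest: bytes, available: Dict[str, bytes]) -> Dict[str, str]:
    """Check the manifest against the signed hash, then report each part's state."""
    if h(manifest) != signed_hash:
        raise ValueError("manifest does not match the hash in the signed metadata")
    status = {}
    for name, (size, digest) in parse_manifest(manifest).items():
        data = available.get(name)
        if data is None:
            status[name] = "missing"  # deleted locally or never fetched
        elif len(data) == size and h(data) == digest:
            status[name] = "ok"
        else:
            status[name] = "invalid"
    return status
```

With parts `text` and `image`, a peer that has forgotten the image still gets a passing manifest check and sees `image` reported as `missing`, while the text remains verifiable on its own.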
[27:47] cft: agree. In the prev case you would concat all the bytes and that would be the content. No substructure. Would the manifest thing have to be made official?
[28:09] alj: I don't think so. Why would it need to be? Just application data?
[28:25] dominic: correction. I propose calling it platform layer, the replication protocol doesn't care about that. Just bytes. Agree on a good format to encode this data. We can't really stop someone from doing something completely different. People tend to choose the same thing ;)
[29:00] alj: thanks, self-describing format implicit. Please don't use json
[29:17] dominic: yes
[29:20] cft: platform layer would then look at the manifest. Needs to be specced to delete specific parts. Right?
[29:40] alj: selective delete, where?
[29:43] cft: same case, text and image referenced in a manifest. Protocol layer can validate even if image is missing. Correct?
[30:17] dominic: yes, manifest would be official. But separate discussion. Encoding is independent of this.
[30:40] cft: ok. Still have flags, when feed is terminated. Not off-chain?
[31:10] all: yes
[30:40] cft: might there be more than one 1 bit?
[31:40] bundy: yes delete would be one type, ssb mutual credit would have problems if not stored on chain
[32:25] alj: to clarify, bamboo supports a whole byte. There will be a need for more types
[33:28] cft: delete is platform layer. So not part of core protocol?
[33:40] alj: agree
[33:45] arj: you have a short int, what the different numbers mean doesn't have to be part of the core protocol
[34:20] bundy: you should still be able to post on-chain content messages for backwards compat.
[34:38] dominic: current msg type works like it does. Offchain would be a new encoding type, not json. On-chain = current method. Not 1 that can do all. Simple as possible.
[35:30] bundy: great if it was a different msg encoding, with offchain content and a bunch of different features. My goal is to make it as simple as possible to move from now to the next iteration rather than giant leaps. Iterative.
[35:55] dominic: small iterative changes are great but in an immutable protocol, means you have to support all the quirks forever. We want to do enough work to iterate on db, that can change. No hacks. Future is a lot longer than the time it takes to get this out.
[36:56] bundy: scope? 3 month, 2 years?
[37:25] dominic: Good question. this year.
[37:40] alj: make sense with 2 efforts? 1) incremental (json still, add lipmaa) 2) full redesign
[38:30] arj: work together on a design that can be rolled out in the current implementation and not have to redesign everything
[39:00] alj: always disappointed if we have to support the old stuff
[39:25] dominic: that's life
[39:30] cft: ambition to have a future proof design? Can carry a long way beyond json.
[40:00] bundy: what is a new feed type?
[40:40] dominic: moving to a binary protocol is not that hard. Migrating the impl. will be a lot of work. New feeds is going to be easy if they don't require massive architectural changes on the inside. We could support lipmaa links and still do normal replication. Off-chain content: msg validation is the hash, not the content. Application layer ignores if empty. Take our time getting it supported by applications. That part could be incremental.
[42:42] alj: lipmaa links is a minor version bump.
[42:50] bundy: Would these need a new feed type? Can have lipmaa in json?
[43:10] dominic: def. a new feed type. Breaks validation.
[43:25] bundy: To use a new feed type, is that a message you post on your feed? Or a completely new feed?
[43:40] dominic: new feed with a new public key. Same-as good idea. Different discussion
[43:55] bundy: do we need a new feed type? You could push messages with new encoding
[44:10] dominic: more confident that a new feed won't cause problems. What if you have old, new and then back to old again? Then you have holes they can't validate. Easier with 2. Just go back.
[45:00] rabble: same identity?
[45:10] dominic: tag at the end is different. Current @ base64. ed25.. is the legacy js style. I would encourage people to use a diff key. Not sure if it will break. Safer with different key
[46:00] rabble: same as working then?
[45:10] dominic: can be done incrementally
[46:20] rabble: social aspect
[46:25] alj: simpler if we don't use old feeds. Ordering issues go away
[46:45] rabble: all follow that rule?
[47:00] cft: question for bundys proposal about ordering of content and then metadata. Strange?
[47:20] alj: replication issue, not signing
[47:30] bundy: replication part of protocol?
[47:40] alj: not at this level
[47:50] cft: summarize thinking, like dominics idea of two streams. How is off-chain content delivered?
[48:20] bundy: I didn't want a situation in the js implementation where a msg was added to the db before the content. No db rebuild. It is a bit arbitrary.
[49:10] dominic: Christian I think there are more ways to skin this cat. When you receive a buffer that is the message. You don't have to have take those bytes and hash them. You can have a format..
[49:50] cft: confused. Lacking the overall view. Which channels, how is that thing coming to me? If there is separation between off-chain content and metadata?
[50:10] dominic: that was what bundy was suggesting, I was going to suggest a way that could basically make. It would be exactly like the current behaviour where messages come one at a time, except they would come as buffers instead of json objects. And different rule for hashing them. You looked at the msg and just hash the metadata not the content and that was the id for that msg. Still 1 item that goes to the db, this would be easy to port the current protocol. Behave exactly the same.
[51:00] cft: not following, what do you mean by, is coming as buffer instead of json? Can't parse ;)
[51:10] dominic: in the muxrpc, each msg has a framing, these chunks come through. In EBT you have a stream, messages grouped into a stream, but each message has a little header that says it's this type, part of this stream, this long, and if it's a string or json or buffer.
[51:50] cft: I see. The scenario you explained is valid for something that only has the metadata, the off-chain content has been deleted? Works for both in combination?
[52:05] dominic: both in combination
[52:10] cft: which sequence? Content or metadata first? How do I know that content is missing?
[52:25] dominic: I would put the content after, because then you can parse. Not sure if this is quite true. Could be the metadata would be a fixed size, or more limited size. Content is bigger, so you could skip content.
[53:00] cft: natural for me with content after. Bundy suggested the opposite. That was why I was puzzled
[53:20] alj: third suggestion, start with a boolean indicating whether the content is missing or not, then the metadata, then content.
[53:29] dominic: That's good
[53:39] bundy: for muxrpc, to encode the boolean, metadata and content as a string and use a delimiter between them? Or binary?
[53:56] dominic: yeah, that is why I want to do this with binary format.
[54:00] bundy: understand
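
A possible binary layout for the "boolean, then metadata, then content" ordering, with the message id taken from the metadata only. The field widths (one flag byte, 4-byte big-endian lengths) and SHA-256 are assumptions for this sketch, not the muxrpc framing or an agreed wire format:

```python
import hashlib
import struct
from typing import Optional, Tuple


def encode_frame(metadata: bytes, content: Optional[bytes]) -> bytes:
    """Layout: [1 byte has-content flag][4 byte metadata length][metadata]
    and, only if the flag is 1, [4 byte content length][content]."""
    has_content = content is not None
    frame = struct.pack(">BI", 1 if has_content else 0, len(metadata)) + metadata
    if has_content:
        frame += struct.pack(">I", len(content)) + content
    return frame


def decode_frame(frame: bytes) -> Tuple[bytes, Optional[bytes]]:
    flag, meta_len = struct.unpack_from(">BI", frame, 0)
    offset = struct.calcsize(">BI")
    metadata = frame[offset:offset + meta_len]
    if len(metadata) != meta_len:
        raise ValueError("truncated metadata")
    offset += meta_len
    if flag == 0:
        return metadata, None  # the sender does not have, or chose not to send, the content
    (content_len,) = struct.unpack_from(">I", frame, offset)
    offset += struct.calcsize(">I")
    content = frame[offset:offset + content_len]
    if len(content) != content_len:
        raise ValueError("truncated content")
    return metadata, content


def message_id(metadata: bytes) -> str:
    # Hash the metadata only, so the id is the same with or without the content.
    return hashlib.sha256(metadata).hexdigest()
```

Because the flag comes first, a receiver knows whether content follows before parsing anything else, and the two frames for the same message decode to the same `message_id`.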
[54:56] alj: think we should nail down exactly what feed types are
[55:00] dominic: top level item
Call in two weeks, alj and arj can't attend. Hamburg is coming up in 3.
[57:00] alj: opening up?
[57:20] dominic: Against that, need to focus. No beta. The people interested know.
[58:00] bundy: somebody can join?
[58:20] dominic: sure