
# P2P Private Message Architectural Challenges

## TL;DR
- P2P private messaging that combines `call_remote` with private `Entry`s ***needs*** `post_commit`, at the very least, to reach a functional stage. The following must also hold for `post_commit`:
- `post_commit` can ensure consistency across source chains, even within a single zome call, provided that the following is true:
  1. The original commit can be rolled back if a side effect (e.g. `call_remote`) fails inside `post_commit`. The HDK mentions this (in `post_commit`), so we hope it is true.
  2. If the side effect inside `post_commit` produces an error, the error is returned to the frontend so that we can handle it gracefully there (e.g. by retrying the send).
- If `post_commit` has both of these characteristics, two things follow. Sending a message always results in the message being committed on both Alice's and Bob's source chains (point 1). The frontend can also implement a retry mechanism in case the wasm call errors (point 2).
- Concurrent writes to source chains would be a huge plus, because they would let multiple agents write messages to the receiver's source chain.

## Purpose
This document describes the different patterns of private P2P messaging that the Kizuna team has implemented or conceptualized. The aim is to expose the pros and challenges of each pattern and to get feedback from the community on choosing the best pattern for scalability and security.

## General Challenge
**Non-concurrency**
- Consider three agents: Alice, Bob, and Charlie. Alice sends a message to Bob at the same time that Charlie does. Alice's `call_remote` to Bob blocks other operations on Bob's source chain, so Charlie's `call_remote` to Bob fails.
  In practice, an agent can chat with only one agent at a time and cannot receive messages from other agents while doing so.

**Scalability**
- The more users there are, the more likely we are to face concurrent writes that fail.

## What we think will work
### Use of `post_commit`
- Not yet implemented, as `post_commit` does not exist yet.
- `post_commit` can ensure consistency across source chains, even within a single zome call, provided that the following is **true**:
  1. The original commit can be rolled back if a side effect (e.g. `call_remote`) fails inside `post_commit`. This has been [mentioned (in `post_commit`)](https://docs.rs/hdk/0.0.100/hdk/#hdk-is-based-on-callbacks-) in the HDK, so we hope it is true.
  2. If the side effect inside `post_commit` produces an error, the error is returned to the frontend so that we can handle it gracefully there (e.g. by retrying the send).
- If `post_commit` has both of these characteristics, two things follow. Sending a message always results in the message being committed on both Alice's and Bob's source chains (point 1). The frontend can also implement a retry mechanism in case the wasm call errors (point 2).

```mermaid
sequenceDiagram
    participant AliceUI as AliceUI
    participant Alice as Alice Cell
    participant Bobby as Bobby Cell
    AliceUI-->>Alice: call zome fn send_message
    Alice-->>Alice: commit message entry and receipt entry to chain
    rect rgba(0, 200, 0, 0.3)
    Note over Alice: this is the post_commit callback
    Alice-->>Bobby: call_remote to Bobby
    Bobby-->>Bobby: commit message and receipt to chain
    Bobby-->>Alice: return receipt
    alt call_remote Ok
    Note over Alice: will commits in a post_commit callback be allowed?
    Alice-->>Alice: commit receipt to chain
    else call_remote Error (network, unauthorized, Result<Error>)
    %% Note over Alice: are errors (e.g. source chain head moved) in the remote fn propagated to the local caller?
    Alice-->>Alice: roll back local commits
    Alice-->>AliceUI: return error
    end
    end
    rect rgba(255, 0, 0, 0.3)
    opt if error is returned
    AliceUI-->>Alice: retry zome call
    end
    end
```

### Use of `post_commit` + concurrent source chain writes
- One of the more fatal errors we have encountered is `WorkflowError::WorkspaceError::SourceChainError::HeadMoved`. It occurs when another agent commits to your source chain through `call_remote` (e.g. a message entry or a read receipt entry) before you commit your own changes to that chain, so the chain's head has moved underneath you.
- This becomes more apparent and problematic when two or more agents (including yourself) are actively writing to a single chain. The problem also grows with the number of active users.
- A retry mechanism is in place to handle such errors, but we expect it to be triggered frequently. Allowing concurrent writes would remedy this.
- Concurrent source chain writes would virtually eliminate the need for a retry mechanism, except in some special or rare cases.

## What we have tried so far (not really working)
### v1.0 Single Wasm Call Pattern
- A single wasm call to `send_message` commits a `Message` entry and a `Receipt` entry on multiple source chains (sender and receiver) via `call_remote`.
- Pretty much not working.

**Challenges**
- Quoting the HDK:
  > Remote calls will be atomic on the recipients device but could complete successfully while the local agent subsequently errors and rolls back their chain

  This happened to us rampantly. More specifically, the receiver was able to commit the message, but the sender errored (mainly due to `WorkflowError::WorkspaceError::SourceChainError::HeadMoved`). As a result, the `Message` entry ends up committed on one source chain but not the other.
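
The failure mode quoted above can be reproduced with a small, self-contained model. This is a hypothetical sketch, not HDK code: `SourceChain`, `flush`, `ChainError`, and `send_message_v1` are invented for illustration. The model makes two assumptions, based on our understanding of the conductor's behavior. First, a zome call's commits are staged and only written, with a head check, when the call returns. Second, Bob's side of the `call_remote` is written on his own device regardless of what later happens to Alice.

```rust
// Hypothetical model of the v1.0 failure mode. These are NOT HDK types; they
// only mimic a source chain whose flush fails with a HeadMoved-style error.

#[derive(Debug, PartialEq)]
enum ChainError {
    HeadMoved,
}

#[derive(Default)]
struct SourceChain {
    entries: Vec<String>,
}

impl SourceChain {
    fn head(&self) -> usize {
        self.entries.len()
    }

    /// Writes staged entries only if the chain head is still where the caller started.
    fn flush(&mut self, expected_head: usize, staged: &[String]) -> Result<(), ChainError> {
        if self.head() != expected_head {
            return Err(ChainError::HeadMoved);
        }
        self.entries.extend_from_slice(staged);
        Ok(())
    }
}

/// v1.0 send_message as one zome call. `concurrent_write` simulates another
/// agent's call_remote committing to Alice's chain while her call is in flight.
fn send_message_v1(
    alice: &mut SourceChain,
    bob: &mut SourceChain,
    msg: &str,
    concurrent_write: bool,
) -> Result<(), ChainError> {
    // Zome call starts: Alice's commits are staged against her current head.
    let alice_head = alice.head();
    let staged = vec![format!("message:{msg}"), format!("receipt:{msg}")];

    // call_remote: Bob's commit is flushed on his side, independently of Alice.
    let bob_head = bob.head();
    bob.flush(bob_head, &[format!("message:{msg}"), format!("receipt:{msg}")])?;

    if concurrent_write {
        // e.g. Charlie's message lands on Alice's chain via his own call_remote.
        let head = alice.head();
        alice.flush(head, &["message:from-charlie".to_string()])?;
    }

    // Zome call ends: Alice's flush fails if her head moved in the meantime.
    alice.flush(alice_head, &staged)
}

fn main() {
    let (mut alice, mut bob) = (SourceChain::default(), SourceChain::default());
    let result = send_message_v1(&mut alice, &mut bob, "hello", true);
    println!("{result:?}"); // Err(HeadMoved)
    println!("Bob has it: {}", bob.entries.iter().any(|e| e == "message:hello")); // true
    println!("Alice has it: {}", alice.entries.iter().any(|e| e == "message:hello")); // false
}
```

In this model Bob ends up with the message and Alice does not, which is the inconsistency described in the v1.0 challenges. Nothing on Bob's side depends on Alice's flush succeeding. Providing that dependency is what the rollback requirement for `post_commit` is meant to do.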

```mermaid
sequenceDiagram
    participant AliceUI as Alice UI
    participant Alice as Alice Cell
    participant Bobby as Bobby Cell
    AliceUI-->>Alice: call send_message
    Alice-->>Alice: commit message entry and receipt entry to chain
    Alice-->>Bobby: call_remote to Bobby
    Bobby-->>Bobby: commit message and receipt to chain
    Bobby-->>Alice: return receipt
    Alice-->>Alice: commit receipt
    Alice-->>AliceUI: return message and receipt
```

### v2.0 Granular pattern
- A single wasm call can only commit a `Message` and a `Receipt` to one source chain at a time. This avoids the inconsistency of the previous implementation, in which only one of the two agents would commit successfully to their own source chain.

**Challenges**
- Sending a single message requires multiple network calls to different zome functions.
- Each network call needs a corresponding retry mechanism in case that wasm call fails and returns an error.
- The frontend becomes an integral part of the entire process of sending a single message. If the frontend crashes or the page is refreshed in the middle of sending a message, there is no way to recover.
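
Each of the three calls in the granular pattern (`save_message`, `send_message`, `save_receipt`) needs its own retry. What follows is a minimal sketch of such a wrapper. It is written in Rust to match the rest of the document, even though in Kizuna the retry lives in the frontend. It assumes a fixed attempt budget and treats every error as retryable. `retry`, `ZomeError`, and the flaky call are hypothetical.

```rust
/// Calls `op` up to `max_attempts` times (passing the 1-based attempt number)
/// and returns the first `Ok`, or the last `Err` once the budget is spent.
fn retry<T, E>(max_attempts: u32, mut op: impl FnMut(u32) -> Result<T, E>) -> Result<T, E> {
    assert!(max_attempts > 0, "need at least one attempt");
    let mut attempt = 1;
    loop {
        match op(attempt) {
            Ok(value) => return Ok(value),
            Err(err) if attempt >= max_attempts => return Err(err),
            Err(_) => attempt += 1,
        }
    }
}

/// Hypothetical error type standing in for a failed zome call.
#[derive(Debug, PartialEq)]
enum ZomeError {
    HeadMoved,
}

fn main() {
    // Hypothetical save_message call that hits HeadMoved on its first two attempts.
    let flaky_save_message = |attempt: u32| {
        if attempt < 3 {
            Err(ZomeError::HeadMoved)
        } else {
            Ok("message + receipt")
        }
    };
    println!("{:?}", retry(5, flaky_save_message)); // Ok("message + receipt")
    println!("{:?}", retry(2, flaky_save_message)); // Err(HeadMoved)
}
```

The budget stops a persistent failure (for example, Bob being offline) from looping forever. The wrapper does nothing for the crash case listed in the challenges. Between calls, the receipt returned by `send_message` still lives only in the frontend.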

```mermaid
sequenceDiagram
    participant AliceUI as AliceUI
    participant Alice as Alice Cell
    participant Bobby as Bobby Cell
    AliceUI-->>Alice: call save_message zome fn
    Alice-->>Alice: commit message entry and receipt entry to chain
    rect rgba(255, 0, 0, 0.3)
    alt if error is returned
    AliceUI-->>Alice: retry save_message zome call
    else if zome call succeeds
    Alice-->>AliceUI: return message and receipt
    end
    end
    AliceUI-->>Alice: call send_message zome fn
    Alice-->>Bobby: call_remote to Bobby
    Bobby-->>Bobby: commit message and receipt to chain
    Bobby-->>Alice: return receipt
    rect rgba(255, 0, 0, 0.3)
    alt if error is returned
    AliceUI-->>Alice: retry send_message zome call
    else if zome call succeeds
    Alice-->>AliceUI: return receipt
    end
    end
    Note over AliceUI: if frontend crashes or page is refreshed here, <br> the returned receipt, being the input to the next fn, <br>is lost and there is no way to recover from it
    AliceUI-->>Alice: call save_receipt zome fn
    Alice-->>Alice: commit receipt
    rect rgba(255, 0, 0, 0.3)
    alt if error is returned
    AliceUI-->>Alice: retry save_receipt zome call
    else if zome call succeeds
    Alice-->>AliceUI: return receipt
    end
    end
```

### Art's suggestion
> I'm dubious that intermediate receipt statuses are something you want to be committing to chain. May be a good use for the ephemeral store. But let's look at what's going on in this workflow.
>
> 1 commit the message, good.
>
> 1.1 commit another element saying the message is saved... useless. The message being saved is proof of that, we don't need to trigger another bunch of dhtOps to tell us locally something we already know, since the info isn't intended for anyone else on the DHT.
>
> 2 You don't necessarily need post-commit here. You can emit a remote-signal to Bob at the end the #1 zome call to let him know about the message, which will execute async (Remote signals can basically act like async remote calls, depending on what you do with signal.)
>
> 3. When Bob processes the SIGNAL he can remote_signal to Alice to let her know the message was delivered.
>
> 4.1. If Alice receives Bob's signal they both know the message was delivered, and she can commit a status entry to that effect. If she doesn't receive Bob's response, she can try resending it later, which should end up being idempotent for Bob, so he can just re-signal that it was delivered.
>
> 4.2 Alice can retry sending any entries for which she has not received a "delivered" receipt (without ever having to commit a "saved" receipt) and without post-commit, chain rollbacks, or countersigning.

**1.1** No comment; we fully agree with this one.

**2**
- If Bob is offline when Alice sends the remote signal, we could retry sending later. Alternatively, we could use the ephemeral store so that things can work asynchronously between agents.
- I need a bit of clarity on the call zome workflow to fully understand the suggested pattern. `inline_validation` runs after the actual zome function executes. Doesn't this mean we can't be sure a commit succeeded by the time `remote_signal` is invoked in the same wasm call, even if `remote_signal` is placed directly after the `create_entry` call? `inline_validation` can return an error while `remote_signal` still executes, so Alice would send Bob a message that she failed to write to her own chain. (This is why we thought we needed `post_commit`.)

**3** Bob's commit of the message details received through `remote_signal` can fail, yet the `remote_signal` carrying the delivered receipt back to Alice will still execute. Alice would then think the message was delivered when Bob actually failed to commit it to his source chain. (Also, if we invoke `create_entry` inside the `recv_remote_signal` function, where is the error returned if `create_entry` fails?)

**4.1** Related to #3: I'm not sure Alice can be 100% sure, from the delivered receipt, that Bob was able to commit the message details to his source chain.

**4.2** Yep, agreed.

---

> 2 You don't necessarily need post-commit here. You can emit a remote-signal to Bob at the end the #1 zome call to let him know about the message, which will execute async (Remote signals can basically act like async remote calls, depending on what you do with signal.)

- Pro: remote signals can reduce how much zome functions depend on `call_remote`, which blocks most activity on the source chain.
- It is not yet clear how well this pattern handles multiple users sending and receiving messages concurrently.
- `post_commit` would still be very useful in this case.

---

### Suggested Pattern by Art
#### Use of `remote_signal`
- Possible sources of inconsistencies:
  1. A attempts to commit and eventually fails, but has already sent a `remote_signal` that triggers B to commit successfully.
  2. A commits successfully and sends a `remote_signal`, but B fails to commit.
- TODO: asynchronous messaging
- We do not have a way to know whether the remote signal was successfully received.

```mermaid
sequenceDiagram
    participant AliceUI as AliceUI
    participant Alice as Alice Cell
    participant Bobby as Bobby Cell
    participant BobbyUI as BobbyUI
    rect rgba(0,100,0,0.5)
    AliceUI-->>Alice: call send_message zome fn
    Alice-->>Alice: commit message entry to chain
    Alice-->>Bobby: remote_signal with the message as payload
    alt send_message fn success
    Alice-->>AliceUI: return message
    else send_message fn fail
    Alice-->>AliceUI: return error + message
    Note over AliceUI: must know if remote signal has been sent
    rect rgba(250,0,0,0.6)
    AliceUI-->>Alice: retry send_message
    end
    end
    end
    rect rgba(250,165,0,0.8)
    Bobby-->>Bobby: match signal in recv_remote_signal to call receive_message fn
    end
    rect rgba(0,100,0,0.5)
    Note over Bobby: receive_message fn
    Bobby-->>Bobby: commit message entry
    Bobby-->>Bobby: create and commit delivered receipt
    Bobby-->>Alice: remote_signal with the delivered receipt as payload
    Bobby-->>BobbyUI: emit_signal with message as the payload
    alt receive_message fn success
    Bobby-->>BobbyUI: return message
    else receive_message fn fail
    Bobby-->>BobbyUI: return error + message
    Note over BobbyUI: must know if remote_signal has been sent
    rect rgba(250,0,0,0.6)
    BobbyUI-->>Bobby: retry receive_message
    end
    end
    end
    rect rgba(250,165,0,0.8)
    Alice-->>Alice: match signal in recv_remote_signal to call receive_receipt
    end
    rect rgba(0,100,0,0.5)
    Note over Alice: receive_receipt fn
    Alice-->>Alice: commit delivered receipt entry
    Alice-->>AliceUI: emit_signal with the receipt as the payload
    Alice-->>AliceUI: return receipt
    end
    rect rgba(0,100,0,0.5)
    BobbyUI-->>BobbyUI: read message
    BobbyUI-->>Bobby: call read_message fn
    Bobby-->>Bobby: create and commit read receipt
    Bobby-->>Alice: remote_signal with the read receipt as payload
    Bobby-->>BobbyUI: return receipt
    Note over Alice: TODO: unify receive_receipt and receive_read_receipt fns
    end
    rect rgba(250,165,0,0.8)
    Alice-->>Alice: match signal in recv_remote_signal to call receive_read_receipt fn
    end
    rect rgba(0,100,0,0.5)
    Note over Alice: receive_read_receipt fn
    Alice-->>Alice: commit read receipt entry
    Alice-->>AliceUI: emit_signal with the receipt as the payload
    Alice-->>AliceUI: return receipt
    end
```

---

### Retry pattern
- Inside `recv_remote_signal`, wrap each function call in a `match`. If the function fails, emit a signal to the UI with the following shape:

```rust
// Signal emitted to the UI when a function called from recv_remote_signal fails.
// The payload type is left generic; it depends on the retry type.
pub struct SignalDetails<T> {
    pub name: String, // retry type (e.g. retry commit receipt)
    pub payload: T,   // data needed to recommit; most likely the payload received from remote_signal
}
```

**Committing duplicate messages to the receiver's chain**
- Happens during retries, which will hopefully be rare.
- Pros:
  1. Easy to implement.
  2. No overhead from checking the chain for duplicates every time the receiver receives a message.
- Cons:
  1. Additional storage is needed, especially for files.
  2. Moves the source chain head.

#### Success scenario
- **send_message**: ran and succeeded
- **send_message remote_signal**: dispatched
- **receive_message**: ran and succeeded
- **receive_message remote_signal**: dispatched
- **receive_receipt**: ran and succeeded

| | Message | Delivered Receipts |
| -------- | -------- | -------- |
| Sender | :heavy_check_mark: | :heavy_check_mark: |
| Receiver | :heavy_check_mark: | :heavy_check_mark: |

#### Error scenarios
**Case 1**
- **send_message**: ran but eventually failed
- **send_message remote_signal**: didn't dispatch
- **receive_message**: didn't run
- **receive_message remote_signal**: didn't dispatch
- **receive_receipt**: didn't run

| | Message | Delivered Receipts |
| -------- | -------- | -------- |
| Sender | :x: | :x: |
| Receiver | :x: | :x: |

**Case 2**
- **send_message**: ran and succeeded
- **send_message remote_signal**: dispatched
- **receive_message**: ran but eventually failed
- **receive_message remote_signal**: 1) didn't dispatch; 2) dispatched
- **receive_receipt**: 1) didn't run; 2) ran but eventually failed

| | Message | Delivered Receipts |
| -------- | -------- | -------- |
| Sender | :heavy_check_mark: | :x: |
| Receiver | :x: | :x: |

**Case 3**
- **send_message**: ran but eventually failed
- **send_message remote_signal**: dispatched
- **receive_message**: ran but eventually failed
- **receive_message remote_signal**: dispatched
- **receive_receipt**: ran and succeeded

| | Message | Delivered Receipts |
| -------- | -------- | -------- |
| Sender | :x: | :heavy_check_mark: |
| Receiver | :x: | :x: |

**Case 4**
- **send_message**: ran and succeeded
- **send_message remote_signal**: dispatched
- **receive_message**: ran but eventually failed
- **receive_message remote_signal**: dispatched
- **receive_receipt**: ran and succeeded

| | Message | Delivered Receipts |
| -------- | -------- | -------- |
| Sender | :heavy_check_mark: | :heavy_check_mark: |
| Receiver | :x: | :x: |

**Case 5**
- **send_message**: ran but eventually failed
- **send_message remote_signal**: dispatched
- **receive_message**: ran and succeeded
- **receive_message remote_signal**: dispatched
- **receive_receipt**: ran but eventually failed

| | Message | Delivered Receipts |
| -------- | -------- | -------- |
| Sender | :x: | :x: |
| Receiver | :heavy_check_mark: | :heavy_check_mark: |

**Case 6**
- **send_message**: ran and succeeded
- **send_message remote_signal**: dispatched
- **receive_message**: ran and succeeded
- **receive_message remote_signal**: dispatched
- **receive_receipt**: ran but eventually failed

| | Message | Delivered Receipts |
| -------- | -------- | -------- |
| Sender | :heavy_check_mark: | :x: |
| Receiver | :heavy_check_mark: | :heavy_check_mark: |

**Case 7**
- **send_message**: ran but eventually failed
- **send_message remote_signal**: dispatched
- **receive_message**: ran and succeeded
- **receive_message remote_signal**: dispatched
- **receive_receipt**: ran and succeeded

| | Message | Delivered Receipts |
| -------- | -------- | -------- |
| Sender | :x: | :heavy_check_mark: |
| Receiver | :heavy_check_mark: | :heavy_check_mark: |

Workaround (08/20/2021):
- Bob returns the `Message` payload together with the delivered receipt. Alice can then check her source chain to see whether she has already committed the message. If she has not, she commits the message and then the delivered receipt. If the message is already committed, she simply commits the delivered receipt.
- In the UI, if the message failed to be committed, the `emit_signal` in `receive_receipt` should also send back the message payload so that the UI can render the message.

---

```
Workarounds:

A. RETRY SEND_MESSAGE TRIGGERED BY A RETURNED ERROR FROM CALLING SEND_MESSAGE
If sender::message is **NOT OK** regardless of the status of other entries (cases 1, 3, 5, 7)
1. UI receives an error from send_message
2. UI retries send_message
3. dispatch remote_signal receive_message
   a. receive_message checks if receiver::message and receiver::receipt are already committed
      i.  If receiver::message and receiver::receipt are **OK**, do not commit
      ii. If receiver::message and receiver::receipt are not **OK**, commit
   b. dispatch remote_signal for delivered_receipt
4. receive_receipt checks if sender::message corresponding to receiver::receipt is already committed
   a. If sender::message is **OK**, do not commit
   b. If sender::message is **NOT OK**, commit
solves 1, 3, 5, 7

B. RETRY RECEIVE_RECEIPT TRIGGERED BY A SIGNAL FROM RECV_REMOTE_SIGNAL
If sender::receipt is **NOT OK** (cases 1, 2, 5, 6)
1. UI receives a signal containing the receipt that receive_receipt has failed.
2. UI calls receive_receipt
3. receive_receipt checks if sender::message corresponding to receiver::receipt is already committed
   a. If sender::message is **OK**, simply commit receipt
   b. If sender::message is **NOT OK**, commit message and receipt
solves 6

C. RETRY RECEIVE_MESSAGE TRIGGERED BY A SIGNAL FROM RECV_REMOTE_SIGNAL
If receiver::message and receiver::receipt are **NOT OK** (cases 1, 2, 3, 4)
1. UI receives a signal containing the message that receive_message has failed.
2. UI calls receive_message
3. receive_message checks if receiver::message and receiver::receipt are already committed
   a. If receiver::message and receiver::receipt are **OK**, do not commit
   b. If receiver::message and receiver::receipt are not **OK**, commit
4. dispatch remote_signal for delivered_receipt
5. receive_receipt checks if sender::message corresponding to receiver::receipt is already committed
   a. If sender::message is **OK**, do not commit
   b. If sender::message is **NOT OK**, commit
solves 2, 4
```

---

```mermaid
sequenceDiagram
    participant AliceUI as AliceUI
    participant Alice as Alice Cell
    participant Bobby as Bobby Cell
    participant BobbyUI as BobbyUI
    rect rgba(0,100,0,0.5)
    AliceUI-->>Alice: call send_message zome fn
    Alice-->>Alice: commit message entry to chain
    Alice-->>Bobby: remote_signal with the message as payload
    rect rgba(220,0,0,0.5)
    opt send_message fn fails
    Alice-->>AliceUI: return message + error
    Note over Alice: must know if receiver committed
    AliceUI-->>Alice: retry commit with remote_signal
    Note over Alice: receiver should check if message already is in chain
    Note over Alice: will receiving the same signal be performant for the receiver?
    end
    end
    Alice-->>AliceUI: return Ok + message
    end
    rect rgba(250,165,0,0.8)
    Bobby-->>Bobby: match signal in recv_remote_signal to call receive_message fn
    end
    rect rgba(0,100,0,0.5)
    Bobby-->>Bobby: check chain if message exists
    Bobby-->>Bobby: commit message entry
    Bobby-->>Bobby: create and commit delivered receipt
    Note over Bobby: since this is run async, we can't determine if the sender has successfully committed or not <br> assume that the sender successfully committed and deliver receipt anyway
    Note over Bobby: sender should check if the messages corresponding to the receipt is in the chain
    Bobby-->>Alice: remote_signal with the delivered receipt as payload
    rect rgba(220,0,0,0.5)
    opt receive_message fn fails
    Bobby-->>BobbyUI: return message + error
    BobbyUI-->>Bobby: retry commit without remote_signal
    end
    end
    Bobby-->>BobbyUI: return Ok + message + receipt
    end
    rect rgba(250,165,0,0.8)
    Alice-->>Alice: match signal in recv_remote_signal to call receive_receipt
    end
    rect rgba(0,100,0,0.5)
    Note over Alice: receive_receipt fn
    Alice-->>Alice: commit delivered receipt entry
    Alice-->>AliceUI: emit_signal with the receipt as the payload
    Alice-->>AliceUI: return receipt
    end
```

## 08/19/2021 Testing of P2P Message with remote_signal
### Findings
- Sending texts consecutively from one participant generally works fine. However, some messages are missing on either the sender's or the receiver's side.
- When both agents send text at the same time, a source chain head moved error occurs on one side or the other. The message is sometimes received but not saved on the sender's chain. The same happens with images.
- Sending images and text seems to produce fewer errors than before.
- Source chain head moved errors are reduced compared to the `call_remote` implementation.

## 08/22/2021 Testing of P2P Message with retry
- `emit_signal` for a message is called even when the message is not committed to the source chain.
- Move `emit_signal` from inside the `receive_message` fn to the match case in `recv_remote_signal`.
- A success `emit_signal` will only be dispatched when the `receive_message` fn returns an `Ok` value.
- The UI will add the message to the redux store only when the `receive_message` fn succeeds.
- The error from `receive_message` is not included in `emit_signal`.
- When `send_message` is retried, the returned value is not stored in redux, so the message does not appear in the sender's UI.
- When the sender fails to send a message and retries, the receiver may receive the same message twice and commit it to his chain. We need a mechanism to avoid rendering the same message twice in the UI (maybe filtering by entry hash).
- Between retries the timestamp changes, so we are essentially sending two different messages. We have to find a way to commit the message on retry with the same timestamp.
- Images are having a hard time sending (probably a problem with the proxy).

## 08/24/2021
- Retry for commit will be merged soon.
- The head moved check will be relaxed, allowing the source chain head to move underneath a commit. Given this, we can try the `call_remote` architecture again.
- Most fatal errors in the P2P architecture are caused by the `SourceChainHeadMoved` error.
- Errors occurring inside `recv_remote_signal` are not returned to the UI, so we cannot really retry when some logic inside `recv_remote_signal` fails.
- Validation happens after an `extern_call`, and the entire `recv_remote_signal` is an `extern_call`. A source chain head moved error is therefore only encountered after the `match` and the `emit_signal` calls inside the function have already run.
- Therefore, we can try the `call_remote` pattern again together with the relaxed source chain head check.

### Todos
- Remove the sent receipt from the `call_remote` pattern as well.
- Fix the lazy loading bug #84.
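
Several notes above point at the same fix:
- Duplicate commits on retry.
- The timestamp changing between retries (08/22/2021).
- The "check if already committed" steps in the workarounds.

What follows is a minimal sketch of that fix. It assumes the message's identity is fixed once, before the first attempt, and that the receiver makes its commit idempotent on that identity. `Message`, `id()`, and `ReceiverChain` are hypothetical. `DefaultHasher` (non-cryptographic) stands in for an entry hash, and an in-memory `HashSet` stands in for querying the source chain.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Hypothetical message payload. The timestamp is fixed once, before the first
/// send attempt, and reused on every retry.
#[derive(Clone, Hash)]
struct Message {
    sender: String,
    timestamp_micros: i64,
    text: String,
}

impl Message {
    /// Stand-in for an entry hash: identical payloads yield identical ids.
    fn id(&self) -> u64 {
        let mut hasher = DefaultHasher::new();
        self.hash(&mut hasher);
        hasher.finish()
    }
}

/// Stand-in for the receiver's source chain.
#[derive(Default)]
struct ReceiverChain {
    committed: HashSet<u64>,
    entries: Vec<Message>,
}

impl ReceiverChain {
    /// Idempotent receive_message: commits only if this id is not already present.
    /// Returns true if a new entry was committed.
    fn receive_message(&mut self, msg: &Message) -> bool {
        if !self.committed.insert(msg.id()) {
            // Duplicate from a retry: skip the commit (the caller can still re-send the delivered receipt).
            return false;
        }
        self.entries.push(msg.clone());
        true
    }
}

fn main() {
    let msg = Message { sender: "alice".into(), timestamp_micros: 1_629_600_000_000_000, text: "hi".into() };
    let mut bob = ReceiverChain::default();
    println!("{}", bob.receive_message(&msg)); // true: first delivery
    println!("{}", bob.receive_message(&msg.clone())); // false: retry with the same timestamp
    let resent = Message { timestamp_micros: msg.timestamp_micros + 1, ..msg.clone() };
    println!("{}", bob.receive_message(&resent)); // true: a new timestamp looks like a new message
    println!("{}", bob.entries.len()); // 2
}
```

The third call shows why the timestamp must be reused. A retry that builds a fresh timestamp produces a different id, so the receiver cannot tell it apart from a new message.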