# Liquid SDK Real-Time Sync Client

See also the original [server definition](https://hackmd.io/@_jgghrRCQeuy5DvvIGhfEQ/BkVUGZvwC#SDK-Liquid-challenges).

## Data Structures

### `SyncData`

The first data structure we need is an enum that acts as a wrapper around the information we will be encrypting/decrypting on the client.

#### Client-side

Ideally, since we need to be selective about the fields sent, I suggest also having separate structs for the enum variants' fields, like so:

```rust
#[derive(Serialize, Deserialize)]
#[serde(tag = "data_type", content = "data")]
enum SyncData {
    Chain(ChainSyncData),
    Send(SendSyncData),
    Receive(ReceiveSyncData),
    // Future types, may be mutable or immutable
}
```

#### Server-side

The serialized payload on the server database (pre-encryption) will then look something like:

```json
{
    // Matches the Rust enum
    "data_type": "Chain" / "Send" / "Receive" / ...,
    "data": { ...<inner object> }
}
```

### `Record`

Additionally, we also need to serialize the record as a whole, so we can create a duplicate of the `Record` proto definition to be used by [prost](https://docs.rs/prost/latest/prost/) or similar (we can also re-use the same struct to persist future/pending updates):

#### Client-side

```rust
#[derive(Serialize, Deserialize)]
struct Record {
    id: u64,
    version: f32,
    data: Vec<u8>,
}
```

#### Server-side

```protobuf
message Record {
  uint64 id = 1;     // The id of the record within the server database
  float version = 2; // The version representing the schema of the collection, in the form X.x (major.minor)
  bytes data = 3;
}
```

**Changes:**
- Change `id` from `string` to `uint64`
- Rename `version` to `recordVersion` (or possibly even `schemaVersion`), and set its type to `float`

**TODO:** Update [sync.proto](https://github.com/breez/data-sync/blob/main/proto/sync.proto) to match the new definition

## Methods

### `start()`

This method acts as a wrapper for [cleanup](#cleanup()) and [connect](#connect()).

### `connect()`

This method:
1. Checks the module's configuration details (e.g. uri, timeout/re-connection attempts, etc.) and attempts to connect to the RPC stream
2. If it cannot connect, it will throw an error and then either:
    a. Halt the SDK execution
    b. Emit an event, alerting the user that the SDK is not currently syncing their changes
3. Otherwise, a connection is established and we immediately call [listen](#listen()).

### `listen()`

This method:
1. Receives the raw, real-time changes from the RPC stream
2. Calls [apply_changes](#apply_changes([SyncRecord]))

### `decrypt(&[u8]) -> SyncData`

This method:
1. Decrypts and deserializes the incoming data (based on its `data_type`)

### `encrypt(SyncData) -> Vec<u8>`

This method:
1. Serializes and encrypts the outgoing data

### `apply_changes([SyncRecord])`

This method:
1. For each element, checks the `recordVersion`:
    a. If it is greater than the current schema version, we persist the record and (if possible) execute a partial update. See [Backwards Compatibility](#Backwards-Compatibility).
    b. Otherwise, proceed
2. [Decrypt](#decryptampu8--gt-SyncData)s the record's data (skipping records that fail to decrypt)
3. Based on the `SyncData` variant, writes to the required swap tables (using a batch call - this also optimizes writes in case multiple records write to the same item)
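To make the flow above concrete, here is a minimal sketch of the version gating and batching in `apply_changes`, reusing the `Record` and `SyncData` types defined earlier. The `SyncStorage` trait, `CURRENT_SCHEMA_VERSION` constant, and the method names are hypothetical placeholders, not the SDK's actual API:

```rust
use anyhow::Result;

/// Hypothetical storage hooks; in the SDK these would live on `Persister`.
trait SyncStorage {
    fn insert_pending_sync_record(&self, record: &Record) -> Result<()>;
    fn batch_write(&self, changes: Vec<SyncData>) -> Result<()>;
}

const CURRENT_SCHEMA_VERSION: f32 = 0.1; // assumed client schema version

fn apply_changes<S: SyncStorage>(
    storage: &S,
    records: &[Record],
    decrypt: impl Fn(&[u8]) -> Result<SyncData>,
) -> Result<()> {
    let mut decrypted = Vec::new();
    for record in records {
        if record.version > CURRENT_SCHEMA_VERSION {
            // 1a. Newer schema: persist for a later `cleanup()`, and only
            // attempt a partial update when the bump is minor (same major)
            storage.insert_pending_sync_record(record)?;
            if record.version.trunc() > CURRENT_SCHEMA_VERSION.trunc() {
                continue; // major update: no partial application possible
            }
        }
        // 2. Decrypt the record's data, skipping records that fail
        if let Ok(data) = decrypt(&record.data) {
            decrypted.push(data);
        }
    }
    // 3. Write all changes in a single batch, so multiple records writing
    // to the same item collapse into one database transaction
    storage.batch_write(decrypted)
}
```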
### `get_latest_changes() -> [SyncRecord]`

This method:
1. Retrieves the `latestRecordId` from local storage
2. Calls [get_changes_since](#get_changes_sincerecordId--gt-SyncRecord) with the latest record id

### `get_changes_since(recordId) -> [SyncRecord]`

This method:
1. Makes a request to the `GetChangesSince` endpoint

### `set_record(SyncData)`

This method:
1. [Encrypt](#encryptSyncData--gt-Vecltu8gt)s the data
2. Retrieves the `latestRecordId` from the database
3. Makes a request to the `SetRecord` endpoint
4. Returns `Ok` on success, an error otherwise (e.g. if the `recordId` is not up-to-date with the latest)

### `cleanup()`

This method:
1. Checks if there are any persisted changes in [pending-sync-records](#pending-sync-records) (from `apply_changes`)
2. If there are, filters the array and checks whether the current SDK schema version is >= each `recordVersion` (see [Forwards Compatibility](#Forwards-Compatibility) for why we can apply these changes)
3. Once each applicable change is found, calls [apply_changes](#apply_changes([SyncRecord])) once again, then deletes them from storage

## Integration

### Storage

The sync client should be able to read/write from a storage provider (e.g. `Persister`) to read the metadata necessary for syncing and to perform writes. This means we should extend the provider's trait methods to account for the sync client's requirements.

#### Client Tables

Here's an idea for the table required to store the pending `SyncRecord`s:

#### pending-sync-records

| recordId | recordVersion | data     |
| -------- | ------------- | -------- |
| `<u64>`  | `<f32>`       | `<blob>` |

***

Secondary metadata like the `latestRecordId` can instead be stored in the application's settings table.

### Signing

Given the recent changes to the signer, it's sensible to simply pass this same instance to the module. For signing, we can call something like `sign_ecdsa`.

### Encryption

We use the same system as the Greenlight SDK: [ECIES](https://docs.rs/ecies/latest/ecies/) with the wallet's seed.
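Assuming we serialize `SyncData` with serde before encrypting, the `encrypt`/`decrypt` pair could look roughly like the sketch below. The [ecies](https://docs.rs/ecies/latest/ecies/) crate's top-level `encrypt`/`decrypt` functions take raw key bytes; `derive_sync_keypair` is a hypothetical helper, since how the keypair is derived from the wallet seed is an SDK decision, not part of the crate:

```rust
use anyhow::{anyhow, Result};

/// Hypothetical: derive an ECIES (secret, public) keypair from the wallet seed.
fn derive_sync_keypair(seed: &[u8]) -> (Vec<u8>, Vec<u8>) {
    unimplemented!()
}

fn encrypt(data: SyncData, seed: &[u8]) -> Result<Vec<u8>> {
    let (_, public_key) = derive_sync_keypair(seed);
    // Serialize to the tagged JSON payload shown in the `SyncData` section,
    // then encrypt against our own public key
    let serialized = serde_json::to_vec(&data)?;
    ecies::encrypt(&public_key, &serialized).map_err(|e| anyhow!("{e:?}"))
}

fn decrypt(payload: &[u8], seed: &[u8]) -> Result<SyncData> {
    let (secret_key, _) = derive_sync_keypair(seed);
    // Decrypt with our secret key; serde then dispatches on the
    // `data_type` tag to deserialize the right variant
    let decrypted = ecies::decrypt(&secret_key, payload).map_err(|e| anyhow!("{e:?}"))?;
    Ok(serde_json::from_slice(&decrypted)?)
}
```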
## Compatibility

### Forwards Compatibility

There are two rules that make schemas always forward-compatible:
1. No renaming of fields
2. No changes to field types

This ensures that, across schema version changes, fields can only be added or ignored, which keeps schemas forward-compatible.

### Backwards Compatibility

Previous versions of the SDK can attempt to sync with changes from a more recent version. What determines whether or not this is possible is the record's `recordVersion`, which comes in the float format `X.x` (`<major version>.<minor version>`).

**Major Update:** A breaking schema change; does not allow for partial updating.
**Minor Update:** A minimal schema change; does allow for partial updating.

In either case, the changes are always stored in the [pending-sync-records](#pending-sync-records) table until the client updates to a version at least equal to the record's `recordVersion` (they are then deleted during [cleanup](#cleanup())).

## Addressing [SDK Challenges](https://hackmd.io/@_jgghrRCQeuy5DvvIGhfEQ/BkVUGZvwC#SDK-Liquid-challenges)

### Prevent double lockup on send

**Scenario:** An instance receives a realtime update for a send swap. Its Boltz status stream picks up on the new persisted swap and attempts locking up funds, although another instance already did so.

The initial proposal was to always fetch the script history before broadcasting, but this still leaves a tiny window for race conditions to happen. So, the following has been suggested:
1. Only the initiator can broadcast a lockup (can be set with a flag)
2. Other instances may be able to refund the swap once the locktime expires OR if a final status update is received from the Boltz status stream (since both instances will now receive updates).

**Note:** In this scenario, only one of the two refund transactions will go through, so we need to ensure we are tracking the right one locally (so `refund_tx_id` is set to the one that actually gets confirmed).

### Prevent double claim on receive

**Scenario:** An application receives a realtime update for a receive swap. Its Boltz status stream picks up on the new persisted swap and attempts claiming funds, although another device already did the same.

Very similar to the scenario above, though this time we can simply check the script history and, if there is more than one transaction (therefore lockup + claim), we take the latest transaction's id and save it (same issue as with refunds: we need to make sure we're tracking the correct tx).

### When to fetch swap onchain data (lockup/claim)

Anything that is not recoverable from chain will need to be synced. Given Carlos' PR, transaction ids (refund, lockup and claim) can be derived from chain given the swap scripts, so those will be excluded from the sync payload.

### Ensure that newly-generated addresses are being tracked by LWK across instances

LWK uses two internal pointers from its cache (called `last_unused_internal`/`last_unused_external`) to determine the next addresses to use. These are based on the latest detected transactions, so, since we're constantly syncing while the SDK is active, the cache always gets updated, and so do these internal pointers (see [apply_update_inner](https://github.com/Blockstream/lwk/blob/a38e6b85db5823df8d872b653c9b212af9ad5232/lwk_wollet/src/update.rs#L142) for more details).

# Addressing Data Merging

***Note:** C1 stands for Client 1, C2 for Client 2, and so on...*

## Possible Scenarios

Scenarios in which data merging could occur are:
1. Double claiming - C1 initiates a receive. Once the swap is created, it syncs the state and starts the claim broadcast. C2 pulls the state and tries claiming as well. Now either C1's tx or C2's tx can get confirmed.
2. Reserved address pool - C1 reserves a specific address, so C2 also has to lock that same address. It may happen that C2 then unlocks said address while it's still in use by C1, causing a lock mismatch between the two instances.

## Reserved Address Pool

This case still needs more investigation, but in theory I think the _unlocking_ part of the process should happen only on the client side, and not be shared across instances (that would complicate things even further). So clients get notified when an address is locked and then have some internal logic to unlock said address without an explicit state change from the sync service.

**Note:** I also think this case should get its own data payload, of type "ReservedAddress" or similar.

## Double Claiming (or similar)

**Scenario:** C1 and C2 both tried claiming, and both have different `claim_tx_id`s. After broadcasting, they both attempt syncing to the server. C2 gets there first, and updates the global record revision. C1 gets a conflict error. What should C1 do?

C1 is still within the `set_record` method, so the payload hasn't yet been consumed. On error, we should try "merging" the changes, then try again. The steps could be:
1. Pull the latest changes from the remote
2. Apply them one by one until the conflict is found
3. Based on the conflict, apply a merge
4. If the merge is successful, try pushing again; else persist the change and try again later (/ set a flag in [pending-sync-records](#pending-sync-records) like merge-failure = 1)
5. If the push is not successful due to a conflict, go back to step (1) and retry, possibly up until a `MAX_SET_RECORD_ATTEMPTS` is reached (then the change can be persisted for later).
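As a rough sketch of steps 1–5, assuming a hypothetical client surface (`push_record`, `latest_record_id`, `pull_changes_since`, `merge`, `persist_pending` - none of these names are final) and the `Record`/`SyncData` types from earlier:

```rust
use anyhow::Result;

const MAX_SET_RECORD_ATTEMPTS: u8 = 5; // assumed retry budget

enum PushOutcome {
    Ok,
    Conflict, // server rejected the push: our record id is stale
}

/// Hypothetical client surface; names are placeholders, not the final API.
trait SyncApi {
    fn push_record(&self, data: &SyncData) -> Result<PushOutcome>;
    fn latest_record_id(&self) -> Result<u64>;
    fn pull_changes_since(&self, record_id: u64) -> Result<Vec<Record>>;
    fn merge(&self, remote: &[Record], local: &SyncData) -> Result<()>;
    fn persist_pending(&self, data: &SyncData) -> Result<()>;
}

fn set_record_with_retry(client: &impl SyncApi, data: SyncData) -> Result<()> {
    for _ in 0..MAX_SET_RECORD_ATTEMPTS {
        match client.push_record(&data)? {
            PushOutcome::Ok => return Ok(()),
            PushOutcome::Conflict => {
                // 1. + 2. Pull the latest remote changes so they can be
                // applied up to the point of conflict
                let remote = client.pull_changes_since(client.latest_record_id()?)?;
                // 3. Attempt a merge (see Merging Criteria below)
                if client.merge(&remote, &data).is_err() {
                    break; // merge failed: fall through and persist
                }
                // 4. / 5. Merge succeeded: loop around and push again
            }
        }
    }
    // Out of attempts (or the merge failed): persist the change in
    // pending-sync-records, e.g. with a merge-failure flag, for later
    client.persist_pending(&data)
}
```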
## Merging Criteria

Different changes have different use-cases. I propose data-type-based criteria, which also vary depending on the fields changed.

e.g. `data-type` is `Receive` - The client first computes the delta, i.e. exactly which fields have changed. It notices the `claim_tx_id` field does not match, so it executes some inner logic (e.g. `resolve_claim_tx_id_conflict`), which would probably query the chain to establish which of the two `claim_tx_id`s has _actually_ been confirmed. Once a "truth" has been established, the client pushes the changes back to the remote, which should now succeed if no other conflicting updates have been sent in the meantime.
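For the `claim_tx_id` example, the resolution step might look like the following sketch. `ChainService` and `is_confirmed` are hypothetical stand-ins for whatever chain backend the SDK queries, and the "keep local until the chain settles" tie-break is just one possible policy:

```rust
use anyhow::Result;

/// Hypothetical chain backend used to check transaction confirmation.
trait ChainService {
    fn is_confirmed(&self, tx_id: &str) -> Result<bool>;
}

/// Resolve a conflicting `claim_tx_id` by asking the chain which claim
/// actually confirmed; both txs spend the same lockup, so at most one can.
fn resolve_claim_tx_id_conflict(
    chain: &impl ChainService,
    local_tx_id: &str,
    remote_tx_id: &str,
) -> Result<String> {
    if chain.is_confirmed(remote_tx_id)? {
        Ok(remote_tx_id.to_string())
    } else if chain.is_confirmed(local_tx_id)? {
        Ok(local_tx_id.to_string())
    } else {
        // Neither has confirmed yet: keep the local value and let a later
        // sync attempt re-run the merge once the chain has settled
        Ok(local_tx_id.to_string())
    }
}
```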