Synapse core team
# Faster remote room joins

**Outdated**: this document is now superseded by [MSC3902](https://github.com/matrix-org/matrix-spec-proposals/pull/3902).

## Design overview

* We no longer get the whole state in response to a `send_join`.
    * This dramatically reduces the response size, so the response comes back quicker and (particularly) is much faster to process.
* So, the join event is flagged as having *partial state*.
    * ... as are any events that use that join event as a `prev_event`, and so on.
* Synapse's DB layer is updated so that any queries for the state at such events *block* until the state is resolved. (This is where we need good cancellation support.)
    * One exception: if the `StateFilter` shows that we don't need the membership events, then there is no need to block. This lets lazy-loading clients keep using the room anyway.
* We have a background process which back-populates the state.
    * In theory we can do this with `/state_ids` and lots of `/event` requests, but that is glacial, so we have to optimise the code to use `/state` instead. This has shaken out a surprising number of bugs.
* Once the state at a particular event is populated, we can unblock any pending DB queries for state at that event. This requires a certain amount of marshalling (and is particularly involved in a multi-worker environment).

The initial draft doesn't need any client-side changes, though it's likely we will want to make some once we see how both lazy-loading and non-lazy-loading clients perform (i.e., let's do better than just presenting spinners).
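
The blocking behaviour described above (state reads at partial-state events wait for the resync, except for readers that don't need membership) can be sketched with `asyncio.Event`. This is a minimal illustration under stated assumptions, not Synapse's storage layer: `PartialStateTracker`, `mark_partial`, `mark_resynced` and `get_state` are hypothetical names, and the `include_members` flag is a crude stand-in for a real `StateFilter`.

```python
import asyncio
from typing import Dict, Set, Tuple

# (event type, state key) -> event ID; a toy stand-in for a state map
StateMap = Dict[Tuple[str, str], str]


class PartialStateTracker:
    """Hypothetical sketch: state reads at partial-state events wait for the resync."""

    def __init__(self) -> None:
        self._partial: Set[str] = set()
        self._waiters: Dict[str, asyncio.Event] = {}
        self._state: Dict[str, StateMap] = {}

    def mark_partial(self, event_id: str, partial_state: StateMap) -> None:
        # Store the cut-down state from send_join and flag the event as partial.
        self._state[event_id] = dict(partial_state)
        self._partial.add(event_id)
        self._waiters.setdefault(event_id, asyncio.Event())

    def mark_resynced(self, event_id: str, full_state: StateMap) -> None:
        # Called by the background resync; wakes every reader blocked on this event.
        self._state[event_id] = dict(full_state)
        self._partial.discard(event_id)
        waiter = self._waiters.pop(event_id, None)
        if waiter is not None:
            waiter.set()

    async def get_state(self, event_id: str, include_members: bool = True) -> StateMap:
        # Readers that need membership block until the full state arrives.
        if include_members and event_id in self._partial:
            await self._waiters[event_id].wait()
        state = self._state[event_id]
        if include_members:
            return dict(state)
        # Lazy-loading-style readers are served immediately, minus membership.
        return {k: v for k, v in state.items() if k[0] != "m.room.member"}


async def demo() -> None:
    tracker = PartialStateTracker()  # created inside the running event loop
    tracker.mark_partial("$join", {("m.room.create", ""): "$create"})
    print(await tracker.get_state("$join", include_members=False))
    pending = asyncio.create_task(tracker.get_state("$join"))
    await asyncio.sleep(0)
    print(pending.done())  # False: still waiting for the resync
    tracker.mark_resynced("$join", {("m.room.create", ""): "$create",
                                     ("m.room.member", "@alice:example.org"): "$alice_join"})
    print(await pending)


asyncio.run(demo())
```

Using one `asyncio.Event` per event ID means any number of concurrent readers can wait on the same resync, and a single `set()` releases them all. A multi-worker deployment would need some cross-process notification instead, which this sketch does not attempt.
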

## Metrics

* A graph showing the time taken to join a selection of rooms over time: [prometheus](https://synapse-performance-test.lab.element.dev/prometheus/graph?g0.expr=performance_join_time_seconds%20and%20performance_join_success%20%3E%200%20and%20(time()%20-%20performance_join_timestamp%20%3C%2030000)&g0.tab=0&g0.stacked=0&g0.show_exemplars=0&g0.range_input=16d)

## Detailed spec changes

* Extend the `send_join` API to return less state: covered in [MSC3706](https://github.com/matrix-org/matrix-doc/pull/3706).

### Receiving events over federation during the resync

We may not have enough state, so we replace the "state at event" check (cf. [Checks performed on receipt of a PDU](https://spec.matrix.org/v1.3/server-server-api/#checks-performed-on-receipt-of-a-pdu)) with a check against the state resolution of the auth events and the state at the event.

### Soft fail

We can't follow the current soft-fail algorithm, since we may not have the sender's membership event in the current state. For now, we will skip the soft-fail check if there is partial state. (We may wish to return to this and improve it.)

### Device lists

* https://github.com/matrix-org/synapse/pull/13913
* https://github.com/matrix-org/synapse/issues/13891

### Handling incoming federation requests

* https://github.com/matrix-org/matrix-spec-proposals/pull/3895
* TODO: we're changing how these are authenticated (see https://github.com/matrix-org/synapse/issues/13288).
* TODO: does this imply changes to how we send events?

## 2022-05-30 state of play

Done so far:

* Server-side support for the extended `send_join` API: [MSC3706](https://github.com/matrix-org/matrix-doc/pull/3706), [#11967](https://github.com/matrix-org/synapse/pull/11967).
* Initial client-side support for just hitting the API and populating the DB: [#11994](https://github.com/matrix-org/synapse/pull/11994), [#12005](https://github.com/matrix-org/synapse/pull/12005), [#12011](https://github.com/matrix-org/synapse/pull/12011), [#12012](https://github.com/matrix-org/synapse/pull/12012), [#12039](https://github.com/matrix-org/synapse/pull/12039).
* Making `/state` work correctly for outliers: [#12173](https://github.com/matrix-org/synapse/pull/12173), [#12155](https://github.com/matrix-org/synapse/pull/12155), [#12154](https://github.com/matrix-org/synapse/pull/12154), [#12087](https://github.com/matrix-org/synapse/pull/12087), test fixes, and more in flight ([sytest#1211](https://github.com/matrix-org/sytest/pull/1211), [sytest#1192](https://github.com/matrix-org/sytest/pull/1192), [#12191](https://github.com/matrix-org/synapse/pull/12191)).
* Using `/state` for resyncing large fractions of the room state: [#12013](https://github.com/matrix-org/synapse/pull/12013), [#12040](https://github.com/matrix-org/synapse/pull/12040).
* Walking the list of partial-state events and filling them in: [#12394](https://github.com/matrix-org/synapse/pull/12394).
* A manager for tracking which events have partial state: [#12399](https://github.com/matrix-org/synapse/pull/12399).

## Testing results, 2022/05/30

I attempted to join #element:matrix.org (a room of 13K users) from sw1v.org. Results:

* 20:08:48 (+0:00): Start
* 20:09:28 (+0:40): The join itself completes, comprising:
    * 2s warming up (`/query/directory`, `/make_join`, etc.)
    * 16s waiting for the `/send_join` response
    * 19s checking signatures on the `/send_join` response
    * 3s persisting events in the `/send_join` response
* 20:09:33 (+0:45): Room is included in `/sync`. At this point, Element Web no longer shows the room as "joining", but it still shows a spinner for history.
* 20:15:49 (+7:01): Join event is de-partial-stated
    * Any messages sent by local users before this point are now processed.
* 20:16:00 (+7:12): `/backfill` request made
* 20:16:12 (+7:24): State resync process completes
* 20:16:34 (+7:46): `/members` request completes
* 20:21:22 (+12:34): `/messages` request completes

For comparison, a regular (non-faster-joins) join:

* 21:55:30 (+0:00): Start
* 21:56:29 (+1:00): Client times out, reports an error
* 22:01:22 (+5:22): Join completes, client shows room with pagination spinner
* 22:05:11 (+9:51): `/messages` request completes

## Next steps

Work is now being tracked under milestones in the Synapse issue tracker:

* [Q2 2022 ─ Faster joins phase 2: correctness](https://github.com/matrix-org/synapse/milestone/6)
* [Q3 2022: Faster joins: fix major known bugs for monoliths](https://github.com/matrix-org/synapse/milestone/8)
* [Q4 2022: Faster joins: worker-mode and remaining work](https://github.com/matrix-org/synapse/milestone/10)

## Outstanding questions

* What do we do if the `/state` request never completes (e.g., the resident server becomes unreachable, or leaves the room, or the `/state` response causes us to OOM)?
    * We probably struggle on zombie-like, repeatedly retrying the `/state`. But we could end up with lots of rooms like that...
* What happens if we try to leave the room while the resync is still in progress? Once we do so, we will be unable to make `/state` requests.
    * Just leave the state incomplete?
    * Purge the room?
    * Not allow the last user to leave?
    * Allow them to leave but not tell other servers about it?
* It's possible that, once we get the full state and chase it down through the DAG, we'll discover that some state transition is impossible. (E.g., a state event was created by a user who turns out to have left the room at that point.) How do we handle this?
    * If we'd known about the problem upfront, we'd have just rejected the event.
    * We can mark the event as rejected retrospectively, but we might have told clients and even other servers about it in the meantime.
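
The first outstanding question suggests a room whose resident server has gone away would just keep retrying `/state`. A minimal sketch of such a retry loop follows, assuming a hypothetical `fetch_state` coroutine. The capped, jittered exponential backoff is our own choice for illustration (this document does not specify one); it bounds how much traffic many "zombie" rooms generate.

```python
import asyncio
import random
from typing import Any, Awaitable, Callable, Optional


async def resync_with_backoff(
    fetch_state: Callable[[], Awaitable[Any]],
    max_attempts: Optional[int] = None,
    base_delay: float = 1.0,
    max_delay: float = 3600.0,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> Optional[Any]:
    """Hypothetical: retry a /state fetch with capped, jittered exponential backoff.

    max_attempts=None retries forever (the "zombie" behaviour); otherwise
    None is returned once the attempts are used up.
    """
    attempt = 0
    while True:
        try:
            return await fetch_state()
        except Exception:  # e.g. resident server unreachable; on 3.8+ CancelledError still propagates
            attempt += 1
            if max_attempts is not None and attempt >= max_attempts:
                return None
            # 1s, 2s, 4s, ... capped at max_delay; the exponent is bounded to avoid overflow.
            delay = min(max_delay, base_delay * 2 ** min(attempt - 1, 32))
            # Jitter stops many stuck rooms from retrying in lockstep.
            await sleep(delay * random.uniform(0.5, 1.0))


async def demo() -> None:
    attempts = 0

    async def flaky_fetch() -> dict:
        nonlocal attempts
        attempts += 1
        if attempts < 3:
            raise ConnectionError("resident server unreachable")
        return {"state": "..."}

    async def no_wait(_: float) -> None:
        pass  # skip real sleeping in the demo

    print(await resync_with_backoff(flaky_fetch, sleep=no_wait), attempts)


asyncio.run(demo())
```

Injecting `sleep` keeps the loop testable without real delays; passing a finite `max_attempts` is one possible answer to "lots of rooms like that", at the cost of leaving the room's state permanently partial.
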

## Other TODO list (richvdh brain dump)

* Resync:
    * Fix the race in persistence (where the persistence thread reads a lazy-stated event just before we resync it and finish up the resync job).
    * There are a bunch of races in the resync code.
* [x] Add the tables to `purge_rooms` (https://github.com/matrix-org/synapse/pull/12889)
* [ ] Find out why `/send_join` is so slow to respond (Jaeger shows it doing lots of `bulk_get_push_rules`. Oddly, sending a message first doesn't help, so maybe it's just not being cached correctly on our test server.)
* Tests
    * [ ] Ex-outliers with lazy-loading. A unit test?
    * [ ] State which turns out to be wrong when we resync
* [x] Port the schema defs to Postgres
* [x] Switch the schema to use event_ids. It's too difficult to de-outlier things otherwise.

---

# Older design notes - no longer relevant

## Handling the half-joined state

Auth doesn't actually depend on resolved room state; it depends on *auth events* (though the magic "reconcile auth events" code is likely to make things behave oddly).

What *does* depend on room state is soft-fail. Maybe we can get away with not soft-failing anything while the state sync is in progress.

So once the `send_join` completes, we need to kick off a process which:

* does a `/state` request, and updates the state at the initial join event;
* updates the state at any subsequently received events;
* does the post-room-upgrade stuff.

For added fun, that process needs to withstand server restarts.

So how do we identify a half-joined room? I guess we should keep a DB table.

### What do we do for state_groups in half-joined rooms?

We need to be able to auth events sent by local users, which really does mean having the concept of "current state", even if it's partial.

So, I think we'll have to have cut-down state groups, and generate new state groups at resync.
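
The "keep a DB table" idea for surviving restarts can be sketched with the standard-library `sqlite3` module. The table name `half_joined_rooms` and all helper functions are hypothetical illustrations, not Synapse's schema. The point is that the resync job's to-do list lives in the database, so a restarted server can find every room whose resync never finished.

```python
import os
import sqlite3
import tempfile
from typing import List, Tuple

# Hypothetical table: one row per room whose state resync has not finished yet.
SCHEMA = """
CREATE TABLE IF NOT EXISTS half_joined_rooms (
    room_id TEXT PRIMARY KEY,
    join_event_id TEXT NOT NULL
)
"""


def open_db(path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    with conn:
        conn.execute(SCHEMA)
    return conn


def mark_half_joined(conn: sqlite3.Connection, room_id: str, join_event_id: str) -> None:
    # Ideally this would share a transaction with persisting the send_join
    # response; here it simply commits on its own.
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO half_joined_rooms (room_id, join_event_id) VALUES (?, ?)",
            (room_id, join_event_id),
        )


def mark_fully_joined(conn: sqlite3.Connection, room_id: str) -> None:
    # Called once the resync has filled in the state; the room is no longer half-joined.
    with conn:
        conn.execute("DELETE FROM half_joined_rooms WHERE room_id = ?", (room_id,))


def rooms_to_resume(conn: sqlite3.Connection) -> List[Tuple[str, str]]:
    # On startup, every remaining row is a resync job to restart.
    return conn.execute(
        "SELECT room_id, join_event_id FROM half_joined_rooms ORDER BY room_id"
    ).fetchall()


# Simulate a restart: write, close the connection, reopen, and look for pending work.
db_path = os.path.join(tempfile.mkdtemp(), "half_joined.db")
conn = open_db(db_path)
mark_half_joined(conn, "!big:example.org", "$join")
conn.close()
conn = open_db(db_path)
print(rooms_to_resume(conn))  # [('!big:example.org', '$join')]
conn.close()
```
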

### Processing incoming requests

We only want to do the reprocessing for events whose prev-events are all either fully-stated or on the list of events to fix up (otherwise we won't be able to figure out the correct state).

So, for any event that arrives in the meantime:

* if any prev_events are unknown, we should `get_missing_events` for them (which will populate them as regular events);
* if any prev_events are on the lazy-stated list, the new event joins the list.

### Managing the list of lazy-stated events

* We could make it implicit via links to the DAG, but that gets annoyingly inefficient.
* We could assume *all* events are lazy-stated (which implies that we must be able to get all prev_events for incoming events, or ignore them), but that's pretty bogus in a leave/rejoin scenario.
* It's just everything with a `stream_ordering` larger than the join event.

### Syncing

Ideally we want to withhold lazy-joined rooms from non-lazy-loading `/sync` requests until the `/state` completes. "State sync completion" therefore needs to trigger some marker for the sync handler to pick up.

### Endpoints which need changing

* Federation:
    * `send_join`
    * `state`
    * `state_ids`
* C-S:
    * `/members`
    * `/joined_members`
    * `/state`
    * `/initialSync`
    * `/sync`

## Old notes from outlier-based design

* It seems nice to avoid giving the events state_groups at all?
* Why can't we just mark the damn things as outliers? It'll mean updating sync and push code not to just ignore outliers, but that might be a good thing.

This doesn't work because we need partial state at these events.

### What should we do with forward and backward extremities?

Clearly, the lazy-stated events should not be excluded from being forward extremities. So, either we need to update the forward-extremity logic to consider lazy-stated events despite their being outliers, or we need to decide they aren't really outliers (and update everything else that expects non-outliers to have state).

https://github.com/matrix-org/synapse/issues/9595
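
The two rules under "Processing incoming requests", plus the `stream_ordering` alternative for managing the lazy-stated list, can be sketched as follows. `HalfJoinedRoom`, `Action`, `classify_incoming` and `is_lazy_stated_by_ordering` are hypothetical names for illustration only; a real implementation would consult the event store rather than in-memory sets.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Iterable, Set


class Action(Enum):
    FETCH_MISSING = "fetch_missing"        # get_missing_events first, then re-classify
    JOIN_LAZY_LIST = "join_lazy_list"      # event inherits partial state
    PROCESS_NORMALLY = "process_normally"  # all prev_events are fully stated


@dataclass
class HalfJoinedRoom:
    """Hypothetical per-room bookkeeping while a resync is in progress."""
    known_events: Set[str] = field(default_factory=set)
    lazy_stated: Set[str] = field(default_factory=set)


def classify_incoming(room: HalfJoinedRoom, event_id: str, prev_events: Iterable[str]) -> Action:
    prevs = list(prev_events)
    # Rule 1: unknown prev_events must be fetched before we can reason about state.
    if any(p not in room.known_events for p in prevs):
        return Action.FETCH_MISSING
    room.known_events.add(event_id)
    # Rule 2: any lazy-stated prev_event puts the new event on the lazy-stated list.
    if any(p in room.lazy_stated for p in prevs):
        room.lazy_stated.add(event_id)
        return Action.JOIN_LAZY_LIST
    return Action.PROCESS_NORMALLY


def is_lazy_stated_by_ordering(stream_ordering: int, join_stream_ordering: int) -> bool:
    # The alternative from "Managing the list of lazy-stated events":
    # everything persisted after the join event is lazy-stated.
    return stream_ordering > join_stream_ordering


room = HalfJoinedRoom(known_events={"$create", "$join"}, lazy_stated={"$join"})
print(classify_incoming(room, "$msg", ["$join"]))   # Action.JOIN_LAZY_LIST
print(classify_incoming(room, "$other", ["$gap"]))  # Action.FETCH_MISSING
```

Returning `FETCH_MISSING` without recording the event lets the caller run `get_missing_events` and then classify the same event again once its prev_events are known.
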
