# Faster remote room joins
**Outdated**: this document is now superseded by [MSC3902](https://github.com/matrix-org/matrix-spec-proposals/pull/3902).
## Design overview
* We no longer get the whole state in response to a `send_join`
* This leads to a dramatic reduction in response size, making the
response come back quicker, and (particularly) making it much
faster to process the response.
* So, the join event is flagged as having *partial state*.
* ... as are any events that use that join event as a `prev_event`,
and so on.
* Synapse's DB layer is updated so that any queries for the state
at such events *block* until the state is resolved. (This is
where we need good cancellation support.)
* But an exception: if the `StateFilter` shows that we don't need
the membership events, then there is no need to block. This
allows lazy-loading clients to keep using the room anyway.
* We have a background process which back-populates the state.
* In theory we can do this with `/state_ids` and lots of `/event`
requests, but that is glacial, so we have to optimise the code
to use `/state` instead. This has shaken out a surprising number
of bugs.
* Once the state at a particular event is populated, we can
unblock any pending DB queries for state at that event. This
requires a certain amount of marshalling (and is particularly
involved in a multi-worker environment).
The initial draft doesn't need any client-side changes, though it's likely
we will want to make some once we see how both lazy-loading and
non-lazy-loading clients perform (ie, let's do better than just presenting
spinners).
## Metrics
* a graph showing time taken to join a selection of rooms over time:
[prometheus](https://synapse-performance-test.lab.element.dev/prometheus/graph?g0.expr=performance_join_time_seconds%20and%20performance_join_success%20%3E%200%20and%20(time()%20-%20performance_join_timestamp%20%3C%2030000)&g0.tab=0&g0.stacked=0&g0.show_exemplars=0&g0.range_input=16d)
## Detailed spec changes
* Extend the `send_join` API to return less state: covered in
[MSC3706](https://github.com/matrix-org/matrix-doc/pull/3706).
### Receiving events over federation during the resync
We may not have enough state, so we replace the "state at event" check
(cf [Checks performed on receipt of a PDU](https://spec.matrix.org/v1.3/server-server-api/#checks-performed-on-receipt-of-a-pdu))
with a check against the state-res of the auth events and the state at the event.
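A toy sketch of that replacement check, with events as plain dicts. Real state resolution is far more involved; here "resolution" is naively approximated by overlaying the auth events on whatever partial state we hold, which is only meant to show the shape of the check.

```python
def resolve_check_state(auth_events, partial_state):
    """Crude stand-in for state resolution: combine the event's auth
    events with the (partial) state we have, preferring the auth events
    on conflict. Not the real algorithm."""
    state = dict(partial_state)
    for ev in auth_events:
        state[(ev["type"], ev["state_key"])] = ev
    return state


def check_event_allowed(event, auth_events, partial_state) -> bool:
    """Auth-check the event against the resolved check-state, rather than
    against the (unavailable) full state at the event. Only the sender's
    membership is checked here, for illustration."""
    state = resolve_check_state(auth_events, partial_state)
    member = state.get(("m.room.member", event["sender"]))
    return member is not None and member["content"]["membership"] == "join"
```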
### Soft fail
We can't follow the current soft-fail algorithm, since we may not
have the sender's membership event in the current state. For now,
we will skip the soft-fail check if there is partial state. (We
may wish to revisit and improve this later.)
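The interim rule is small enough to state as code. A sketch, assuming the sender's join membership is the only soft-fail criterion we model (the real check compares against the full current state):

```python
def should_soft_fail(event, current_state, has_partial_state: bool) -> bool:
    """While the room's state is partial, the sender's membership event may
    simply be missing from our copy of the current state, so we cannot run
    the usual soft-fail check. Skip it entirely in that case."""
    if has_partial_state:
        return False
    member = current_state.get(("m.room.member", event["sender"]))
    return member is None or member["content"]["membership"] != "join"
```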
### Device lists
* https://github.com/matrix-org/synapse/pull/13913
* https://github.com/matrix-org/synapse/issues/13891
### Handling incoming federation requests
* https://github.com/matrix-org/matrix-spec-proposals/pull/3895
* TODO: we're changing how these are authenticated (see https://github.com/matrix-org/synapse/issues/13288).
* TODO: does it imply changes to how we send events?
## 2022-05-30 state of play
Done so far:
* server-side support for extended `send_join` API:
[MSC3706](https://github.com/matrix-org/matrix-doc/pull/3706),
[#11967](https://github.com/matrix-org/synapse/pull/11967).
* Initial client-side support for just hitting the API and
populating the DB:
[#11994](https://github.com/matrix-org/synapse/pull/11994),
[#12005](https://github.com/matrix-org/synapse/pull/12005),
[#12011](https://github.com/matrix-org/synapse/pull/12011),
[#12012](https://github.com/matrix-org/synapse/pull/12012),
[#12039](https://github.com/matrix-org/synapse/pull/12039).
* Making `/state` work correctly for outliers:
[#12173](https://github.com/matrix-org/synapse/pull/12173),
[#12155](https://github.com/matrix-org/synapse/pull/12155),
[#12154](https://github.com/matrix-org/synapse/pull/12154),
[#12087](https://github.com/matrix-org/synapse/pull/12087),
test fixes, and more in flight
([sytest#1211](https://github.com/matrix-org/sytest/pull/1211),
[sytest#1192](https://github.com/matrix-org/sytest/pull/1192),
[#12191](https://github.com/matrix-org/synapse/pull/12191)).
* Use `/state` for resyncing large fractions of the room state:
[#12013](https://github.com/matrix-org/synapse/pull/12013),
[#12040](https://github.com/matrix-org/synapse/pull/12040).
* walk the list of partial-state events, and fill them in:
[#12394](https://github.com/matrix-org/synapse/pull/12394).
* a manager for tracking which events have partial state:
[#12399](https://github.com/matrix-org/synapse/pull/12399).
## Testing results, 2022-05-30
I attempted to join #element:matrix.org (a room of 13K users) from sw1v.org.
Results:
* 20:08:48 (+0:00): Start
* 20:09:28 (+0:40): The join itself completes, comprising:
* 2s warming up (`/query/directory`, `/make_join`, etc)
* 16s waiting for `/send_join` response
* 19s checking signatures on `/send_join` response
* 3s persisting events in the `/send_join` response
* 20:09:33 (+0:45): room is included in `/sync`. At this point, eleweb no
longer shows the room as "joining", but it still shows a spinner for
history.
* 20:15:49 (+7:01): join event is de-partial-stated
* Any messages sent by local users before this point are now
processed.
* 20:16:00 (+7:12): `/backfill` request made
* 20:16:12 (+7:24): state resync process completes
* 20:16:34 (+7:46): `/members` request completes
* 20:21:22 (+12:34): `/messages` request completes
For comparison, a regular (not-faster-joins) join:
* 21:55:30 (+0:00): start
* 21:56:29 (+1:00): client times out, reports an error
* 22:01:22 (+5:52): join completes, client shows room with pagination spinner
* 22:05:11 (+9:41): `/messages` request completes
## Next steps
Work is now being tracked under milestones in the Synapse issue tracker:
* [Q2 2022 ─ Faster joins phase 2: correctness](https://github.com/matrix-org/synapse/milestone/6)
* [Q3 2022: Faster joins: fix major known bugs for monoliths](https://github.com/matrix-org/synapse/milestone/8)
* [Q4 2022: Faster joins: worker-mode and remaining work](https://github.com/matrix-org/synapse/milestone/10)
## Outstanding questions
* What do we do if the `/state` request never completes (eg, the resident server
becomes unreachable, or leaves the room, or the `/state` response causes us to
OOM)?
* We would probably limp along, zombie-like, repeatedly retrying the `/state`
request. But we could end up with lots of rooms in that condition...
* What happens if we try to leave the room while the resync is still in progress?
Once we do so, we will be unable to make `/state` requests.
* Just leave the state incomplete?
* Purge the room?
* Not allow the last user to leave?
* Allow them to leave but not tell other servers about it?
* It's possible that, once we get the full state and chase it down through the DAG,
we'll discover some state transition is impossible. (Eg, a state event was
created by a user which turns out to have left the room at that point.)
How do we handle this?
* If we'd known about the problem upfront, we'd simply have rejected the event.
* We can mark the event as rejected retrospectively, but we might have told
clients and even other servers about it in the meantime.
## other TODO list (richvdh brain dump)
* resync:
* fix the race in persistence (where the persistence thread reads a
lazy-stated event just before we re-sync it and finish up the resync job)
* there are a bunch of races in the resync code.
* [x] add the tables to `purge_rooms` (https://github.com/matrix-org/synapse/pull/12889)
* [ ] find out why `/send_join` is so slow to respond (jaeger shows it doing lots
of `bulk_get_push_rules`. Oddly, sending a message first doesn't help -
so maybe it's just not being cached right on our test server)
* Tests
* [ ] ex-outliers with lazy-loading. A unit test?
* [ ] state which turns out to be wrong when we resync
* [x] port the schema defs to postgres
* [x] switch the schema to use event_ids. It's too difficult to de-outlier things
otherwise.
---
# Older design notes - no longer relevant
## Handling the half-joined state
Auth doesn't actually depend on resolved room state - it depends on *auth events*
(though the magic "reconcile auth events" code is likely to make things behave oddly).
What *does* depend on room state is soft-fail. Maybe we can get away with not
soft-failing anything while the state sync is in progress.
So once the `send_join` completes, we need to kick off a process which:
* does a `/state` request, and updates the state at the initial join event.
* updates the state at any subsequently-received events.
* does the post-room-upgrade stuff.
For added fun, that process needs to withstand server restarts.
So how do we identify a half-joined room? Guess we should keep a db table.
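A minimal sketch of that table, using sqlite for illustration. The table name `partial_state_rooms` and its columns are assumptions here, not necessarily Synapse's actual schema; the point is that a db-backed list is what lets the resync process survive restarts.

```python
import sqlite3

# Hypothetical schema: one row per room whose join is still half-complete.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE partial_state_rooms (
        room_id TEXT PRIMARY KEY,
        join_event_id TEXT NOT NULL
    )
    """
)


def mark_half_joined(room_id: str, join_event_id: str) -> None:
    # Recorded before we start the resync, so a restarted server can
    # find the rooms it still needs to finish syncing.
    conn.execute(
        "INSERT OR IGNORE INTO partial_state_rooms VALUES (?, ?)",
        (room_id, join_event_id),
    )


def is_half_joined(room_id: str) -> bool:
    row = conn.execute(
        "SELECT 1 FROM partial_state_rooms WHERE room_id = ?", (room_id,)
    ).fetchone()
    return row is not None


def mark_resync_complete(room_id: str) -> None:
    conn.execute("DELETE FROM partial_state_rooms WHERE room_id = ?", (room_id,))
```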
### What do we do for state_groups in half-joined rooms?
We need to be able to auth events sent by local users, which really does
mean having the concept of "current state", even if it's partial. So, I think
we'll have to have cut-down state groups, and generate new state groups at resync.
### Processing incoming requests
We only want to do the reprocessing for events whose prev-events are
all either fully-stated, or on the list of events to fix up (otherwise we
won't be able to figure out the correct state). So, for any event that arrives
in the meantime:
* if any prev_events are unknown, we should get_missing_events for them
(which will populate them as regular events)
* if any prev_events are on the lazy-stated list, the new event joins the list
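The two rules above can be sketched as one function. Names are illustrative (`fetch_missing` stands in for the `get_missing_events` round-trip), and real Synapse has to do this under the room's event-persistence lock:

```python
def handle_incoming_event(event, known_events, lazy_stated, fetch_missing):
    """For an event arriving mid-resync:
    - any unknown prev_event triggers a get_missing_events fetch
      (which populates them as regular events);
    - if any prev_event is on the lazy-stated list, the new event
      joins the list too."""
    missing = [p for p in event["prev_events"] if p not in known_events]
    if missing:
        fetch_missing(missing)
        known_events.update(missing)
    if any(p in lazy_stated for p in event["prev_events"]):
        lazy_stated.add(event["event_id"])
```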
### Managing the list of lazy-stated events
* we could make it implicit via links to the DAG, but that gets annoyingly
inefficient
* we could assume *all* events are lazy-stated (which implies that we must
be able to get all prev_events for incoming events, or ignore them) - but
that's pretty bogus in a leave/rejoin scenario
* it's just everything with a `stream_ordering` larger than the join event
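The last option reduces the list to a single comparison. A sketch, assuming each event dict carries its `stream_ordering`:

```python
def lazy_stated_events(events, join_stream_ordering: int):
    """An event is lazy-stated iff its stream_ordering is greater than
    that of the partial-state join event."""
    return [e for e in events if e["stream_ordering"] > join_stream_ordering]
```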
### syncing
Ideally we want to withhold lazy-joined rooms from non-LL `/sync` requests until
the `/state` completes. "State sync completion" therefore needs to trigger some
marker for the sync handler to pick up.
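The filtering itself is simple once that marker exists; the hard part is the wake-up plumbing. A sketch (function and parameter names are illustrative):

```python
def rooms_for_sync(joined_rooms, partial_state_rooms, lazy_load_members: bool):
    """Withhold half-joined rooms from full-state /sync responses until
    the state resync completes; lazy-loading clients get them all, since
    partial state is sufficient for them."""
    if lazy_load_members:
        return list(joined_rooms)
    return [r for r in joined_rooms if r not in partial_state_rooms]
```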
### Endpoints which need changing
* federation:
* `send_join`
* `state`
* `state_ids`
* c-s:
* `/members`
* `/joined_members`
* `/state`
* `/initialSync`
* `/sync`
## Old notes from outlier-based design
* It seems nice to avoid giving the events state_groups at all?
* Why can't we just mark the damn things as outliers? It'll mean updating sync and
push code not to just ignore outliers, but that might be a good thing.
  * Doesn't work, because we need partial state at these events.
### What should we do with forward and backward extremities?
Clearly, the lazy-stated events should not be excluded from being forward
extremities. So, either we need to update the forward-extremity logic to
consider lazy-stated events despite their being outliers, or we need to
decide they aren't really outliers (and update everything else that expects
non-outliers to have state).
https://github.com/matrix-org/synapse/issues/9595