
Light client updates design in teku

Background

The GetLightClientUpdatesByRange API requests light client updates (LCUs) for a range of sync committee periods: count consecutive periods beginning at start_period, i.e. the range [start_period, start_period + count).

LCUs are derived from beacon state. Each LCU is assigned to a sync committee period (8192 slots per period on mainnet) based on its attested_header.beacon.slot. There are a few other considerations for what makes a valid LCU.
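The slot-to-period mapping is plain integer division. Here is a minimal sketch, assuming mainnet presets (32 slots per epoch, 256 epochs per sync committee period); the class and method names are hypothetical and not part of teku:

```java
/** Slot/period arithmetic under mainnet presets (an assumption; other networks may differ). */
final class SyncCommitteePeriods {
  static final long SLOTS_PER_EPOCH = 32;
  static final long EPOCHS_PER_SYNC_COMMITTEE_PERIOD = 256;
  // 32 * 256 = 8192 slots per sync committee period
  static final long SLOTS_PER_PERIOD = SLOTS_PER_EPOCH * EPOCHS_PER_SYNC_COMMITTEE_PERIOD;

  private SyncCommitteePeriods() {}

  /** Period that a slot (e.g. attested_header.beacon.slot) falls into, 0-indexed. */
  static long periodAtSlot(long slot) {
    return slot / SLOTS_PER_PERIOD;
  }

  /** First slot belonging to a 0-indexed period. */
  static long firstSlotOfPeriod(long period) {
    return period * SLOTS_PER_PERIOD;
  }
}
```

With these constants, SyncCommitteePeriods.firstSlotOfPeriod(2) gives slot 16384, and any slot from 16384 through 24575 maps back to period 2.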

There are a few heuristics to determine the "best" LCU for a sync committee period. According to the spec:

Full nodes SHOULD provide the best derivable LightClientUpdate (according to is_better_update) for each sync committee period covering any epochs in range [max(ALTAIR_FORK_EPOCH, current_epoch - MIN_EPOCHS_FOR_BLOCK_REQUESTS), current_epoch] where current_epoch is defined by the current wall-clock time.

In the definition of is_better_update, two updates are compared based on sync committee participation, finality, slot, etc.
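To make the comparison order concrete, here is a simplified sketch of my reading of is_better_update. It assumes the relevant properties have already been extracted into a flat UpdateSummary record; that record, its field names, and the BetterUpdate class are hypothetical and not teku types. The spec's Python definition remains authoritative.

```java
/** Hypothetical flattened view of a LightClientUpdate; not a teku type. */
record UpdateSummary(
    int activeParticipants,           // set bits in sync_committee_bits
    int committeeSize,                // total length of sync_committee_bits
    boolean hasRelevantSyncCommittee, // next sync committee present, same period as signature
    boolean hasFinality,              // update carries a finalized header
    boolean hasSyncCommitteeFinality, // finalized and attested headers share a period
    long attestedSlot,
    long signatureSlot) {

  /** True when at least 2/3 of the committee participated. */
  boolean hasSupermajority() {
    return activeParticipants * 3 >= committeeSize * 2;
  }
}

final class BetterUpdate {
  private BetterUpdate() {}

  /** Returns true if n should replace o as the best update for a period. */
  static boolean isBetter(UpdateSummary n, UpdateSummary o) {
    // Supermajority participation wins outright.
    if (n.hasSupermajority() != o.hasSupermajority()) return n.hasSupermajority();
    // Below supermajority, more participants wins.
    if (!n.hasSupermajority() && n.activeParticipants() != o.activeParticipants()) {
      return n.activeParticipants() > o.activeParticipants();
    }
    if (n.hasRelevantSyncCommittee() != o.hasRelevantSyncCommittee()) {
      return n.hasRelevantSyncCommittee();
    }
    if (n.hasFinality() != o.hasFinality()) return n.hasFinality();
    if (n.hasFinality() && n.hasSyncCommitteeFinality() != o.hasSyncCommitteeFinality()) {
      return n.hasSyncCommitteeFinality();
    }
    // Tiebreakers: participation, then older attested slot, then earlier signature slot.
    if (n.activeParticipants() != o.activeParticipants()) {
      return n.activeParticipants() > o.activeParticipants();
    }
    if (n.attestedSlot() != o.attestedSlot()) return n.attestedSlot() < o.attestedSlot();
    return n.signatureSlot() < o.signatureSlot();
  }
}
```

Note that when everything else ties, the older update wins, so an update that is already the best for a period is not displaced by an equivalent later one.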

The beacon-APIs definition gives a further indication of which LCU should be returned:

Servers SHOULD provide results as defined in create_light_client_update. They MUST respond with at least the earliest known result within the requested range, and MUST send results in consecutive order (by period). The response MUST NOT contain more than min(MAX_REQUEST_LIGHT_CLIENT_UPDATES, count) results.
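The response cap in the last sentence amounts to clamping count before iterating over periods. A small sketch, assuming MAX_REQUEST_LIGHT_CLIENT_UPDATES = 128 (the value I understand the Altair light client networking spec to use); UpdateRangeRequest and periodsToServe are hypothetical names:

```java
import java.util.stream.LongStream;

final class UpdateRangeRequest {
  // Assumed value (2**7) from the Altair light client p2p spec; verify against the spec in use.
  static final long MAX_REQUEST_LIGHT_CLIENT_UPDATES = 128;

  private UpdateRangeRequest() {}

  /** Periods to look up, in ascending (consecutive) order, capped at the maximum. */
  static long[] periodsToServe(long startPeriod, long count) {
    // Clamp to [0, MAX] so a negative or oversized count cannot widen the range.
    long capped = Math.max(0, Math.min(MAX_REQUEST_LIGHT_CLIENT_UPDATES, count));
    return LongStream.range(startPeriod, startPeriod + capped).toArray();
  }
}
```

Periods for which no update is known would still need to be handled by the caller; the sketch only fixes the order and the upper bound on the number of results.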

Based on the specification and the API definition, we can either select the LCU for a sync committee period using is_better_update or, at minimum, provide the earliest known valid LCU in that period.

Implementation

In the teku storage interface, we can query for states by slot, but we do not have a notion of querying by "sync committee period".

There are two ways to go forward:

  1. For each LCU request, given a start_period, compute the earliest slot within that period, and provide an LCU according to that slot. E.g. with start_period = 2, we would query slot 16384 (assuming periods are 0-indexed). Once we have the slot, we query the beacon state and compute the LCU.
    • If the LCU at the earliest slot in a period is not valid, we can proceed in order through later slots in that period until we can create a valid LCU.
  2. As we follow the chain, compute the LCU for each slot. Compare the LCU at slot n, using is_better_update, against the best LCU kept so far for that period (as of slot n-1). Keep only the best LCU for the period and cache it, keyed by period number. Serve LCU requests with a cache lookup by sync committee period.
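Option 1 can be sketched as a forward scan from the first slot of the period. This is only an illustration, assuming mainnet's 8192 slots per period; the updateAtSlot callback stands in for "load the beacon state at this slot and try to build a valid LCU", and EarliestValidUpdateFinder is a hypothetical name rather than a teku class:

```java
import java.util.Optional;
import java.util.function.LongFunction;

final class EarliestValidUpdateFinder {
  static final long SLOTS_PER_PERIOD = 8192; // mainnet preset (assumption)

  private EarliestValidUpdateFinder() {}

  /**
   * Scans the period's slots in order and returns the first update that can be created.
   * updateAtSlot returns empty when no valid update exists for that slot.
   */
  static <U> Optional<U> findEarliest(
      long period, long headSlot, LongFunction<Optional<U>> updateAtSlot) {
    long firstSlot = period * SLOTS_PER_PERIOD;
    // Do not scan past the end of the period or past the current head.
    long lastSlot = Math.min(firstSlot + SLOTS_PER_PERIOD - 1, headSlot);
    for (long slot = firstSlot; slot <= lastSlot; slot++) {
      Optional<U> update = updateAtSlot.apply(slot);
      if (update.isPresent()) {
        return update;
      }
    }
    return Optional.empty();
  }
}
```

In the worst case this touches up to 8192 states per requested period, which is the cost that the cached approach in option 2 avoids.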

Based on the terms of the spec, option 2 is the design that would serve the best results (according to is_better_update) and be the most performant. LCU data is not big, and we only have to keep the best update per period. Looking up LCU per period is faster than querying multiple beacon states.
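A rough in-memory sketch of the option 2 cache follows. It assumes the comparison is supplied as a predicate with the same argument order as is_better_update (new, old); BestUpdateCache is a hypothetical class, and a real implementation would persist to the teku db rather than hold a map in memory:

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiPredicate;

/** Keeps at most one update per sync committee period: the best one seen so far. */
final class BestUpdateCache<U> {
  private final Map<Long, U> bestByPeriod = new ConcurrentHashMap<>();
  private final BiPredicate<U, U> isBetter; // isBetter.test(newUpdate, oldUpdate)

  BestUpdateCache(BiPredicate<U, U> isBetter) {
    this.isBetter = isBetter;
  }

  /** Called as the chain advances; replaces the stored update only if the candidate is better. */
  void onUpdate(long period, U candidate) {
    bestByPeriod.merge(
        period, candidate, (current, next) -> isBetter.test(next, current) ? next : current);
  }

  /** Lookup used to serve a request for a single period. */
  Optional<U> get(long period) {
    return Optional.ofNullable(bestByPeriod.get(period));
  }
}
```

Serving a range request then becomes one lookup per requested period, with no beacon state access on the request path.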

However, option 2 is also a larger effort in implementation, because of the updates to the teku db. It also represents a slightly larger storage requirement for clients, but the effect is almost negligible as LCUs are small.

My conclusion is to implement option 1 for the time being so that we can provide some data and test the light client functionality end-to-end. The cache can be implemented later as a performance optimization, once the feature is more stable.