Light clients are crucial elements in blockchain ecosystems. They help users access and interact with a blockchain in a secure, decentralized and trustless manner. https://www.parity.io/blog/what-is-a-light-client/

Owner(s)

@Daan van der Plas

Smoldot: Hello world!

For the last 1.5 months I’ve gone through the Smoldot code, which was written by Pierre Krieger. Smoldot is an alternative client for Substrate-based chains, written in Rust, which contains two clients: the full client and the wasm light node. This knowledge base focuses solely on the wasm light node implementation. I will start by giving a short introduction to Smoldot and Substrate Connect. Thereafter I want to guide you through the code and give you an overview of how everything works and why it is important. Because Pierre has done an amazing job documenting everything, I have often linked to the source code.

Alright, let's go!

Table of contents:

Smoldot and Substrate Connect
Smoldot abstract
Chainspec
Networking service
Synchronization service
Runtime service
Transaction service
JSON-RPC service

Before we begin: the code executes asynchronously and uses many asynchronous tasks. Tasks are mentioned throughout the code walkthrough and are marked with a symbol (T) in the diagrams provided for each service. Moreover, the arrows between a service and the other services represent mpsc (multi-producer, single-consumer) channels, which are used to share information between these asynchronous tasks. Simply put, a channel has a sending end and a receiving end. Both ends share ownership of a buffer allocated on the heap. The sending end places a message in it, and the receiving end is notified that something new is available.
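To make the channel idea concrete, here is a minimal sketch of several tasks sending to one receiving task. It is an analogy only: Smoldot is written in Rust and uses Rust channels, whereas this sketch uses Python's `asyncio.Queue`. The task names (`producer`, `consumer`, `run_demo`) and the message contents are made up for illustration.

```python
import asyncio

async def producer(name: str, queue: asyncio.Queue, count: int) -> None:
    """A sending end: pushes `count` messages into the shared queue."""
    for i in range(count):
        await queue.put((name, i))

async def consumer(queue: asyncio.Queue, expected: int) -> list:
    """The single receiving end: sleeps until a message is available."""
    received = []
    while len(received) < expected:
        received.append(await queue.get())
    return received

async def run_demo() -> list:
    queue: asyncio.Queue = asyncio.Queue()
    # Two senders and one receiver share the same channel.
    results = await asyncio.gather(
        producer("sync", queue, 2),
        producer("network", queue, 2),
        consumer(queue, 4),
    )
    return results[2]
```

Calling `asyncio.run(run_demo())` returns the four messages in the order the receiver saw them. Messages from the same sender keep their relative order.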

Smoldot and Substrate Connect

Most blockchain user interfaces in the ecosystem work by connecting via a server (e.g. PolkadotJS) to a few trusted blockchain nodes, which represent a central point of failure. Generally, if one wants to interact with a blockchain securely and in a trustless manner, syncing a full node is necessary, which requires a lot of knowledge, effort, and resources. This is where light clients come into play. Put simply, a light client joins the peer-to-peer network of a chain and is able to interact with multiple full nodes. Unlike full nodes, light clients don’t need to run 24/7 or store a lot of data. Instead, light clients rely on full nodes for obtaining the information they need, e.g. the balance of a specific user. However, and this is very important, after obtaining this information a light client verifies that it is correct. As a result, people who interact with blockchain user interfaces independently and trustlessly obtain information from the blockchain, thanks to a light client running on their device! In addition, Smoldot validates a transaction submitted by the user before it sends it to full nodes to be added to the transaction pool. Trusted blockchain nodes are now only needed for light clients to join the peer-to-peer network.

Substrate Connect provides the infrastructure necessary to run light clients directly in the browser. In addition, the browser extension enables resource sharing across browser tabs. Without the extension, Substrate Connect runs in the browser with each browser tab running a light client instance. This route will no doubt negatively impact page loading speed, providing a suboptimal user experience, especially compared to Web2 alternatives. Furthermore, Substrate Connect doesn’t require a TLS certificate to connect to nodes, as the connection is initiated from within the browser extension, which has more access rights than a typical website. Substrate Connect works in all major browsers, and when using the extension, it acts as a bridge, where Smoldot will run in the extension, making it possible for every tab or website to sync with the chain.

Now let's unravel how Smoldot syncs with the chain!

Smoldot abstract

First, this abstract is made to cover the essentials of Smoldot without having to read the whole document. Everything will be described again and more elaborately beginning from the chainspec.

Whether Smoldot is used to interact with the relay chain or a parachain, it always needs to sync with the relay chain. As for a parachain, due to Polkadot's shared security, the parachain's finalized state is guaranteed on the relay chain. In other words, information regarding a parachain block, whether it is included, reverted or finalized, is acquired from the relay chain.

To verify whether a relay chain block is finalized, a light client needs to know the validators elected to take part in the GRANDPA protocol. By knowing the elected validators for a given era, Smoldot can verify the justification, which contains the proof of the finality of a block. The justification consists of a set of GRANDPA commits signed by the elected validators. If more than 2/3 of the elected validators voted with their commit on the block and the signatures are correct, Smoldot can conclude the block is finalized.

The elected validators form the authority set for a given GRANDPA era and are elected by the NPoS algorithm. Changes to the authority set are crucial to track for a node that tries to sync with the relay chain, because it needs them to verify justifications. Tracking these changes also offers an alternative and less resource-intensive way of getting to the head of the chain, called the warp sync protocol. Instead of requesting all the blocks to get to the current state of the chain, a light client only needs to request fragments. These fragments provide the necessary proofs of the changes that have been made to the authority set.

When it is up to date with the latest GRANDPA era, Smoldot will start syncing with the chain similar to a full node. The difference is that, instead of holding the state and verifying the content of the new (non-finalized) blocks, it only receives and verifies new (non-finalized) block headers. Smoldot verifies a (non-finalized) block header by verifying the authenticity of the block, in other words, whether the author of the block was selected by the BABE protocol. In short, BABE breaks time into epochs, with each epoch being broken into slots. BABE will select an author (or several) to author a block in each slot.

On the whole, Smoldot verifies consensus and finality by keeping track of the authority set (the active validator set). The starting authority set is obtained through the runtime, and once Smoldot is up to date with the chain it acquires subsequent changes from the block headers.

Chainspec

In order to give Smoldot the necessary information to start syncing with a chain it requires the chainspec. A chain specification is the description of everything that is required for the client to successfully interact with a certain blockchain. From the chainspec the chain information is built. In the code, the chain information is all the crucial information Smoldot needs to know to verify the consensus and finality. It needs an initial state for the chain information and it is constructed from either:

The genesis state:
The state of the chain, i.e. the current content of the storage (data structure: the Patricia Merkle trie), has a special key named :code which contains the WebAssembly code of the runtime. Besides the code itself, the WASM blob contains the runtime version and the runtime APIs it provides. When adding a relay chain, specific runtime calls of the GRANDPA pallet are required to build the starting chain information:
- GrandpaApiAuthorities
- GrandpaApiCurrentSetId
  Prior to executing the dispatchables, it builds the runtime (see Runtime service).
The checkpoint:
A point in time of the finalized chain that can be used to obtain the chain information from.
From the database:
Loaded from the local environment, not from the chainspec.

(⇒ All resulting in the ChainInformation)

As a result, the chain information provides the starting authority set for the warp sync process. However, to sync with a chain it needs information from other peers and therefore needs to join the peer-to-peer network.
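The choice between the three sources can be sketched as follows. This is a simplified illustration, not Smoldot's code. The priority order (database, then checkpoint, then genesis) is an assumption, and so are the chainspec field names `lightSyncState` and `genesis.raw.top`, which follow the usual layout of Substrate JSON chain specifications. Both function names are hypothetical.

```python
# Storage keys in a raw chainspec are hex-encoded; ":code" becomes 0x3a636f6465.
CODE_KEY = "0x" + b":code".hex()

def pick_chain_information_source(chain_spec: dict, has_database: bool = False) -> str:
    """Return which source the initial chain information would be built from.

    Assumed priority: local database, then checkpoint, then genesis state.
    """
    if has_database:
        return "database"
    if "lightSyncState" in chain_spec:  # assumed name of the checkpoint field
        return "checkpoint"
    return "genesis"

def genesis_runtime_code(chain_spec: dict):
    """Look up the hex-encoded runtime WASM stored under the ":code" key, if any."""
    top = chain_spec.get("genesis", {}).get("raw", {}).get("top", {})
    return top.get(CODE_KEY)
```

With a chainspec that only contains a genesis state, `pick_chain_information_source` returns `"genesis"`, and `genesis_runtime_code` gives the blob from which the runtime would be built before the GRANDPA runtime calls are executed.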

Networking service

In order to connect to other nodes, Smoldot needs the networking service, which is responsible for joining the peer-to-peer network. For Substrate-based chains, the libp2p protocol is used. The networking service starts three background tasks:

Processing the networking service.
Peer discovery.
Processing existing connections.

For more information, see the documentation of Substrate's networking protocol.

Processing the networking service (T)

This task ensures that Smoldot is always connected to a certain number of peers. The numbers of in-slots and out-slots refer, respectively, to the maximum number of peers that can connect to Smoldot and the maximum number of peers Smoldot connects to. In addition, when a connection is established, it requests to open substreams with the given peer. Last but not least, it coordinates the requests and responses from the connections to Smoldot and vice versa. In the code, the coordinator is responsible for this.

In detail:

Assign out-slots:
Only if there are slots left, it tries to make a connection with the closest peers obtained from the k-buckets, referred to as desired peers.
Start new connections (and notification substreams) with desired peers:
When a peer is set as desired, Smoldot tries to make a connection. The peer will respond by making either a Single stream connection or a Multi stream WebRTC connection. An asynchronous task is spawned for each connection to progress through the handshake phase and to request or close new (inbound / outbound) substreams. In short, this task does everything needed to exchange information with a given peer (see Processing existing connections).
The coordinator is updated about the connectivity status with a given peer. Moreover, from the inbound open substreams Smoldot receives information. The coordinator is responsible for distributing it to the right service. The other way around, it sends information through outbound open substreams coming from other services within Smoldot.
Process requests and updates from other services:
The coordinator is also responsible for requesting new substreams, based on requests from other services, that it needs with a given peer. In addition, when a service has received incorrect information the coordinator needs to be updated in order to e.g. shut down the connection.

Peer discovery (T)

For a peer-to-peer network it is necessary to have the information to connect to any given peer, available at any given time. Given the decentralized nature of such a network, the storage of this information needs to be decentralized as well and is therefore divided over all the peers in the network through the Kademlia distributed hash table (DHT). In the DHT each node has so-called k-buckets, which form a partial view of the complete list of all the peers in the network. Peer discovery is done by sending a Kademlia "find node" request to a single peer. More specifically, based on the Kademlia request-response protocol, it builds a wire message to ask the target to return the nodes closest to the parameter. To decide which peer it sends this request to:

It creates a SHA-256 hash of a randomly generated PeerId, the target hash.
It checks the k-buckets for the closest peers.
It hashes the PeerIds of the closest peers and sorts them in ascending order of the log2 distance between the target hash and each k-bucket peer’s hash.
It picks the peer with the shortest distance with which it already has an established connection.

As a result, Smoldot receives a new list of peers with their multiaddresses. These peers are inserted into the k-buckets if there is space, and/or replace peers that have not been connected.
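The selection of the peer that receives the "find node" request can be sketched like this. It is a minimal illustration under stated assumptions: peer identifiers are plain byte strings, the log2 distance is taken as the bit length of the XOR of the two hashes, and the function names are hypothetical rather than taken from Smoldot.

```python
import hashlib

def key_hash(peer_id: bytes) -> int:
    """SHA-256 of a peer identifier, interpreted as a 256-bit integer."""
    return int.from_bytes(hashlib.sha256(peer_id).digest(), "big")

def log2_distance(a: int, b: int) -> int:
    """Bit length of the XOR of two hashes; 0 means the hashes are identical."""
    return (a ^ b).bit_length()

def pick_query_peer(target: bytes, known_peers: list, connected: set):
    """Return the connected peer whose hash is closest to the target hash."""
    target_hash = key_hash(target)
    ordered = sorted(
        known_peers,
        key=lambda peer: log2_distance(key_hash(peer), target_hash),
    )
    for peer in ordered:
        if peer in connected:
            return peer
    return None  # no established connection to any known peer
```

In practice the target would be a freshly generated random identifier (for example from `os.urandom`), so that repeated queries explore different regions of the DHT.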

After the initialization of the network, sync, runtime and transaction service, the bootnodes (obtained from the chainspec) are added to Smoldot’s k-buckets and Smoldot can start connecting to the peer-to-peer network.

Processing existing connections (T)

Smoldot has two types of connections for API-related reasons, more specifically because there are two ways of calling the browser's APIs.

In the Single stream connection the browser just sends or receives data and Smoldot implements substreams on top of that, so the substreams are handled internally. More specifically, when a connection is established, the handshake phase starts with the multistream-select protocol. As of now, Smoldot supports the Noise protocol for encryption and the Yamux protocol for multiplexing. After the handshake has finished, the connection is in the established phase and substreams can be opened by both the local and the remote endpoint.
In the Multi stream WebRTC connection the browser handles the substreams.

There are two types of substreams:

Notifications:
- Block announcements protocol
- GRANDPA announcements protocol (GRANDPA commits)
- Transaction announcements protocol (not implemented for Smoldot)
Request-response:
- IPFS id protocol
- Sync protocol
- Light protocol
- Kademlia protocol
- Sync warp protocol
- State protocol

When a connection is made, a block announcement substream is requested and, if it is accepted, followed by a request for an additional GRANDPA announcements substream. In addition, when an inbound substream is opened, Smoldot requests an outbound substream. This is used to update the peers with Smoldot’s view of the state of the chain.

Synchronization service

Now that the networking service has been set up it starts a new background task, the synchronization service. The role of the sync service is to do whatever necessary to obtain and stay up to date with the best and finalized blocks of the chain. As for the relay chain it starts with the warp sync protocol to get to the latest GRANDPA era as fast as possible, then it changes to the all-forks protocol. The parachain synchronization relies on the relay chain, more specifically the runtime service of the relay chain.

In general, Smoldot will track a list of sources, which represent peers that Smoldot receives new block headers and GRANDPA commits from. From the information it receives from a given peer, it knows which (non-finalized) blocks this peer is aware of.

Relay chain

The flow of the relay chain synchronization service consists of:

Sending new requests to peers to progress sync state machine:
- Fragment request, for the warp sync.
- Storage request, provided with a merkle proof.
- Runtime call request: In order to execute a runtime call, Smoldot needs specific merkle values from storage. Provided with a merkle proof.
- Block request: for the all-forks sync, if one or more blocks are missing between the (non-finalized) blocks that it receives and the latest finalized block Smoldot knows, or for a JSON-RPC request about a block Smoldot doesn't know about.
Process network events.
Process requests from:
- Runtime service.
- Transaction service.
- JSON-RPC service.

Warp sync

Thanks to GRANDPA, Smoldot is able to get up to date with the latest GRANDPA era through the warp sync protocol. Instead of requesting all the blocks to get to the current state of the chain, a light client only needs to request fragments. These fragments provide the necessary proofs of the changes that have been made to the authority set, also known as the elected validators. The elected validators are elected by the NPoS algorithm to participate in the GRANDPA protocol for that given era. By knowing the elected validators, Smoldot can verify a justification. In other words, it can verify the finality of a block within a given era. Changes to the authority set are signed by the previous authority set and stated in the block header at the end of the era. When the respective block is finalized and the justification is verified by Smoldot, it knows the new authority set in a trustless manner!

The warp syncing process can be split into 4 phases:

Requesting the fragments.
- Block headers where authority set changes occurred.
Verifying fragments.
- Verifying the justification of the header where the changes occurred:
  - If any block between the latest finalized one and the target block triggers any GRANDPA authorities change, then we need to finalize that triggering block before finalizing the one targeted by the justification.
  - Check if public keys are within the authority set.
  - Check that the justification contains at least (2/3 of the number of authorities) + 1 signatures.
  - Verify signatures.
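The first two phases can be sketched as a loop that walks from one authority set to the next. This is a simplified illustration, not Smoldot's implementation: the `Fragment` type and all function names are hypothetical, the signature check is a stub that accepts everything, and the threshold uses integer division, i.e. `n * 2 // 3 + 1`, as one reading of "2/3 + 1".

```python
from dataclasses import dataclass

def required_signatures(num_authorities: int) -> int:
    """Minimum number of signatures: two thirds (rounded down) plus one."""
    return num_authorities * 2 // 3 + 1

@dataclass
class Fragment:
    signers: list           # public keys that signed the justification
    next_authorities: list  # authority set announced in the fragment's header

def verify_fragment(authorities: list, fragment: Fragment,
                    signature_ok=lambda key: True) -> bool:
    """Check one fragment against the currently trusted authority set."""
    signers = set(fragment.signers)
    if not signers.issubset(authorities):  # unknown public key
        return False
    if len(signers) < required_signatures(len(authorities)):
        return False
    # Stub: a real client verifies each signature cryptographically here.
    return all(signature_ok(key) for key in signers)

def warp_sync(start_authorities: list, fragments: list) -> list:
    """Follow the chain of authority set changes, one fragment at a time."""
    authorities = list(start_authorities)
    for fragment in fragments:
        if not verify_fragment(authorities, fragment):
            raise ValueError("invalid fragment")
        authorities = list(fragment.next_authorities)
    return authorities
```

Each fragment is only trusted because the previous, already trusted authority set signed it, which is why the starting authority set from the chain information matters.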

When Smoldot is up to date with the latest GRANDPA authority set, i.e. latest era:

Download the latest finalized runtime:
- Send storage request, to sync service, to obtain runtime code.
  - Decode the header of the given block to obtain the state root.
  - Send a request for a list of peers that it assumes are aware of this block.
  - Send storage proof request.
  - Decode and verify proof.
  - Obtain :code (encoded WASM blob).
- Download runtime
  - Build the WASM virtual machine.
Build the latest chain information:

Send call proof request, to sync service, to execute runtime call.
- Send a request for a list of peers that it assumes are aware of this block.
- Sending call proof request.
- Decode and verify proof.
Executing BABE related runtime calls:

As might be noticed, this time the dispatchables are BABE related instead of GRANDPA related. That is because the BABE protocol decides which elected validator is allowed to author a block and when, which is important for the all-forks syncing protocol.

All-forks sync

The all-forks syncing strategy is very similar to how full nodes sync with the network; holding the state, verifying the content of the new (non-finalized) blocks and GRANDPA commits. Yet, Smoldot will only receive and verify new (non-finalized) block headers and GRANDPA commits. Smoldot verifies a (non-finalized) block header by verifying the authenticity. In other words, whether the author of the block was selected by the BABE protocol.

BABE breaks time into epochs, with each epoch broken into slots. BABE will select an author (or several) to author a block in each slot. Each slot can have a primary and secondary author (or “slot leader”). Primary slot leaders are assigned randomly, using VRF. VRF takes an epoch random seed (agreed upon in advance by all nodes), a slot number and the author’s private key. Each author evaluates its VRF for each slot in an epoch. For each slot whose output is below some agreed-upon threshold, the validator has the right to author a block in that slot. Because the function is random, however, sometimes there are slots without a leader. In order to ensure a consistent block time, BABE uses a round-robin system to assign secondary slot leaders.
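The relation between time, slots and epochs can be sketched with a bit of integer arithmetic. The function names are hypothetical. The numbers in the usage note (6 second slots, 2400 slots per epoch) are illustrative values, and the sketch assumes that no epoch is ever skipped.

```python
def slot_at(unix_time_ms: int, slot_duration_ms: int) -> int:
    """Slot number that contains the given timestamp."""
    return unix_time_ms // slot_duration_ms

def epoch_of(slot: int, first_slot: int, slots_per_epoch: int) -> tuple:
    """Return (epoch index, position of the slot inside that epoch)."""
    offset = slot - first_slot
    return offset // slots_per_epoch, offset % slots_per_epoch
```

For example, with 6000 ms slots a timestamp of 12000 ms falls in slot 2, and with 2400 slots per epoch starting at slot 0, slot 2405 is the sixth slot (position 5) of epoch 1.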

The header of the first block produced after a transition to a new epoch contains the public keys that are allowed to sign new blocks for the next epoch. When this block is finalized, Smoldot knows the block authors for the next epoch.

The all-forks syncing process:

Verify block headers
- Block number.
- Compare latest block header with new block header’s parent hash.
- Verify whether a block header provides a correct proof of the legitimacy of the authorship.
  - BABE:
    - Verify that the signature in the seal is a valid signature, by the block author, of the hash of the unsealed header (checked against the author's public key).
    - Primary slot: verify the VRF output and proof.
    - Secondary slot: create a Blake2b hash from the epoch’s randomness and the slot number. The expected author is the authority whose index equals hash % number of authorities.
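The secondary slot rule can be sketched as follows. The exact byte layout is an assumption: the randomness is concatenated with the slot number as a little-endian 64-bit integer, hashed with 256-bit Blake2b, and the digest is read as a big-endian integer. The function name is hypothetical.

```python
import hashlib

def secondary_slot_author(randomness: bytes, slot: int, num_authorities: int) -> int:
    """Index of the authority expected to author the given secondary slot."""
    # Assumed encoding: epoch randomness followed by the slot as u64 little-endian.
    data = randomness + slot.to_bytes(8, "little")
    digest = hashlib.blake2b(data, digest_size=32).digest()
    return int.from_bytes(digest, "big") % num_authorities
```

A header claiming a secondary slot is then only accepted if the authority index it declares equals the returned index and the seal verifies against that authority's public key.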

Next, it determines whether it is the best block, checks for consensus updates in the header digest, and adds the block to the state machine.

Verifying GRANDPA commits:
- Check if public keys are within authority set.
- Check if commit is about a block which is the target block or a descendant.
- Verify signatures.
- If any block between the latest finalized one and the target block triggers any GRANDPA authorities change, then we need to finalize that triggering block before finalizing the one targeted by the justification.

It is important to mention that, when the warp sync has finished, there is a high probability of a misalignment between the received information and Smoldot’s latest finalized block. In order to correct this misalignment, Smoldot requests the necessary blocks and justifications.

Verifying justifications:
- If any block between the latest finalized one and the target block triggers any GRANDPA authorities change, then we need to finalize that triggering block before finalizing the one targeted by the justification.
- Check if public keys are within authority set.
- Check that the justification contains at least (2/3 of the number of authorities) + 1 signatures.
- Verify signatures.

After successfully verifying a new block or GRANDPA commit, Smoldot sends an update to other light clients it is connected to. In addition, it updates the services that are subscribed to the sync service:

Runtime service
Transaction service
JSON-RPC service.

Last, it informs all its present and future peers of the state of the local node regarding the best and finalized block.

Parachain

It is recommended to first read the runtime service with the relay chain in mind.

The relay chain stores the head-data, also known as parahead, of every registered parachain.
The flow of the parachain and parathread syncing service consists of:

Processing relay chain notifications.
- Updates from the runtime service.
Fetching new paraheads.
Processing network events.
- GRANDPA commit not used.
Processing requests from:
- Runtime service.
- Transaction service.
- JSON-RPC service.

Parachain and parathread sync

First we subscribe to the runtime service of the relay chain to obtain the current state of the relay chain and get notified about new blocks. When the finalized runtime is downloaded by the runtime service, the parachain syncing service can fetch paraheads, i.e. best and finalized blocks, by calling ParachainHost_persisted_validation_data. However, in order to execute this runtime call, Smoldot needs the Merkle node values which are required during runtime execution. It obtains and uses them by:

Sending a call proof request, to the sync service, to execute runtime call.
- Send a request for a list of peers that it assumes are aware of this block.
- Sending call proof request.
- Decode and verify proof.
Starting a WASM virtual machine, with automatic storage overlay and logs management, to execute the ParachainHost_persisted_validation_data runtime call.

⇒ If successful, obtain parahead from PersistedValidationData.

The parahead, head data, contains information about a parachain block. When the relay chain block, where the parahead is obtained from, is finalized, the parahead is finalized as well. When a parahead is finalized, it checks whether it is up to date with the block height of other parachain nodes.

New blocks and finality updates are notified to the subscribers:

Runtime service.
Transaction service.
JSON-RPC service.

Runtime service

The next asynchronous task is the runtime service. Essentially, this service wants to have the latest finalized downloaded runtime to provide to other services. Therefore it needs to stay up to date with the chain. In order to stay up to date with the chain it subscribes to the synchronization service. When a service subscribes to the sync service it receives the current state of the chain (the finalized block and the non-finalized descendants) and gets notified regarding new non-finalized (best) blocks and finality updates. As a result, the runtime service holds a data structure of a tree of non-finalized blocks that all descend from the finalized block.

These new block headers contain the header digest which states whether the runtime has been upgraded. If so, the runtime service has to download the new runtime:

Send storage request, to sync service, to obtain runtime code.
- Decode the header of the given block to obtain the state root.
- Send a request for a list of peers that it assumes are aware of this block.
- Send storage proof request.
- Decode and verify proof.
- Receive :code (encoded WASM blob)
Download runtime
- Through wasmi it is able to interpret the WASM blob in native binary as a WASM virtual machine. The dispatchables from the runtime are exported and the host functions are imported. That all together makes the WASM virtual machine specific to a Substrate/Polkadot runtime.
  
  When executing a runtime call through a WASM virtual machine, the execution could be interrupted because it requests execution of a host function. The virtual machine is then interrupted, the host function is executed and the virtual machine will be resumed with the result of this host function.
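The interrupt-and-resume behaviour can be illustrated with a Python generator standing in for the virtual machine. This is purely an analogy: the host function names `storage_get` and `log`, the `runtime_call` entry point and the `execute` driver are all made up for the sketch and are not the names used by Substrate or Smoldot.

```python
def runtime_call(key: str):
    """Stand-in for a runtime entry point; every `yield` is a host function request."""
    value = yield ("storage_get", key)   # execution pauses here until resumed
    yield ("log", f"read {key}")
    return value * 2

def execute(vm, storage: dict, logs: list):
    """Drive the 'virtual machine', serving host function requests as they come."""
    try:
        request = next(vm)               # run until the first interruption
        while True:
            name, arg = request
            if name == "storage_get":
                result = storage[arg]
            elif name == "log":
                logs.append(arg)
                result = None
            else:
                raise ValueError(f"unknown host function: {name}")
            request = vm.send(result)    # resume with the host function's result
    except StopIteration as done:
        return done.value                # the runtime call has finished
```

Running `execute(runtime_call("balance"), {"balance": 21}, logs)` serves one storage read and one log request before the call completes with its return value.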

When the runtime service gets notified that the block which stated the runtime upgrade is finalized, it has the latest finalized runtime again. Accordingly, the data structure is pruned and the following services can utilize the latest (upgraded) finalized runtime:

Parachain and parathreads syncing service
Transactions service
JSON-RPC service
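The tree of non-finalized blocks held by the runtime service, and its pruning when a block is finalized, can be sketched as below. This is a minimal illustration with a hypothetical `BlockTree` class that identifies blocks by plain strings. Smoldot's actual data structure also tracks runtimes and best blocks, which the sketch leaves out.

```python
class BlockTree:
    """Non-finalized blocks that all descend from one finalized block."""

    def __init__(self, finalized: str):
        self.finalized = finalized
        self.parent = {}  # block hash -> parent hash, non-finalized blocks only

    def insert(self, block: str, parent: str) -> None:
        if parent != self.finalized and parent not in self.parent:
            raise KeyError("unknown parent")
        self.parent[block] = parent

    def descends_from(self, block: str, ancestor: str) -> bool:
        """True if `ancestor` is a strict ancestor of `block` within the tree."""
        while block in self.parent:
            block = self.parent[block]
            if block == ancestor:
                return True
        return False

    def finalize(self, block: str) -> None:
        """Make `block` the finalized block and prune everything else."""
        if block not in self.parent:
            raise KeyError("unknown block")
        # Keep only descendants of the new finalized block; its ancestors
        # and all competing forks are dropped.
        kept = {b: p for b, p in self.parent.items() if self.descends_from(b, block)}
        self.finalized = block
        self.parent = kept
```

After finalizing a block, any fork that does not build on it disappears from the tree, which keeps the memory use bounded by the number of non-finalized blocks.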

Transaction service

The transaction service handles everything related to transactions. It holds a data structure called the transaction pool, which holds a list of pending transactions (transactions that should later be included in blocks) as well as a list of transactions that have been included in non-finalized blocks. For a light client, the transaction service and the transaction pool are idle most of the time, because they only operate when the user submits a transaction.

The transaction service utilizes the runtime service. Like the parachain syncing service, it wants to use the latest finalized runtime. The transaction service needs the runtime to validate incoming transactions from the JSON-RPC service. It validates a submitted transaction against the latest (best) block. If valid, it gossips the transaction to peers through the networking service so that it can be added to a block. Moreover, it checks the new (best) blocks to see whether the transaction is included, and tracks when the block in which the transaction is included is finalized.

The validation of a transaction:

Sending a call proof request, to the sync service, to execute a runtime call.
- Send a request for a list of peers that it assumes are aware of this block.
- Sending call proof request.
- Decode and verify proof.
Starting a WASM virtual machine with automatic storage overlay and log management to execute the TaggedTransactionQueue_validate_transaction runtime call.
- Call transaction pool runtime API.
- Validate transaction.

After validating the transaction and sending it to peers, it will download the block bodies of the latest new blocks:

Send a block request, to the sync service, to find the submitted transaction.
- Send a request for a list of peers that it assumes are aware of this block.
- Send a block request, specifically requesting for the body.
- Verify the block body (happens on arrival in the network service).
  - Build the trie root of the list of extrinsics of the block and compare with extrinsics root from block header.
Check the extrinsics in the block body for the submitted transaction (by hash).
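The last step, looking for the submitted transaction in a downloaded block body, can be sketched as a hash comparison. The sketch assumes that a transaction is identified by the 256-bit Blake2b hash of its encoded bytes. The function names are hypothetical and the byte strings in the usage note are placeholders, not real extrinsics.

```python
import hashlib

def blake2_256(data: bytes) -> bytes:
    """256-bit Blake2b hash of the given bytes."""
    return hashlib.blake2b(data, digest_size=32).digest()

def find_transaction(body: list, tx_hash: bytes):
    """Return the index of the extrinsic with the given hash, or None if absent."""
    for index, extrinsic in enumerate(body):
        if blake2_256(extrinsic) == tx_hash:
            return index
    return None
```

For a body such as `[b"\x01", b"\x02tx", b"\x03"]`, searching for `blake2_256(b"\x02tx")` finds the extrinsic at index 1, at which point the submitter could be told that the transaction is included in that block.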

It updates the submitter of the transaction about the status of the transaction.

JSON-RPC service

The last asynchronous task that will be spawned handles the JSON-RPC service. The JSON-RPC service holds a state machine which consists of a list of clients (Smoldot API users), pending outgoing messages, pending request(s) and active subscription(s). It all starts with a submitted JSON-RPC request by the API user:

Obtain JSON-RPC request from state machine and match it to its method.
Methods can consist of:
- Sync service request(s), i.e. block header(s) and peer(s).
- Network service request(s), i.e. block body(s), call proof(s) and storage proof(s).
- Runtime service request(s), i.e. (un)subscribe to (finalized) block header(s).
- Execution of Runtime call(s), i.e. storage query(s).
Depending on the request it responds with:
- a single response (e.g. chain_getHeader)
- a subscription (e.g. chain_subscribeNewHeads)
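Matching a request to its method and deciding between a single response and a subscription can be sketched as follows. The method names use the camelCase spelling of the Substrate JSON-RPC interface. The two lookup sets, the `classify_request` function and the `"kind"` labels are hypothetical and only cover two methods.

```python
import json

# Illustrative subsets; the real service supports many more methods.
SINGLE_RESPONSE = {"chain_getHeader"}
SUBSCRIPTIONS = {"chain_subscribeNewHeads"}

def classify_request(raw: str) -> dict:
    """Parse a JSON-RPC request and decide how it will be answered."""
    request = json.loads(raw)
    method = request["method"]
    if method in SUBSCRIPTIONS:
        kind = "subscription"     # answered now, then followed by notifications
    elif method in SINGLE_RESPONSE:
        kind = "single"           # answered exactly once
    else:
        kind = "unsupported"
    return {"id": request.get("id"), "method": method, "kind": kind}
```

A request like `{"jsonrpc": "2.0", "id": 1, "method": "chain_getHeader", "params": []}` is classified as `"single"`, while `chain_subscribeNewHeads` would lead to a long-lived subscription.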

Because this service depends on a lot of other services, it creates and handles multiple lightweight tasks. For each subscription it spawns a new lightweight task that waits for updates and notifies the API user (these updates need to be manually polled by the API user). In addition, the service starts another lightweight task dedicated to filling the cache with new blocks from the runtime service. The API user is more likely to ask for information about recent blocks and perform calls on them, hence a cache of recent blocks.