How indexed queries to a node's routing table are performed remotely, and how the results are efficiently streamed.
Wetware nodes form a cluster in which peers maintain a mutual awareness of each other. This mutual awareness is provided by the cluster.RoutingTable interface. The routing table maintains an indexed set of routing.Record values, each of which is periodically broadcast by a peer as a form of heartbeat and contains routing information and metadata about its originator.
For convenience, here is routing.Record:
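Strictly speaking, what follows is a hedged approximation limited to the properties discussed in this post; the actual definition in the source carries additional routing information and metadata.

```go
// A hedged approximation of routing.Record, limited to the properties
// discussed in this post.  The real definition carries additional routing
// information and metadata about the originating peer.
package routing

import (
	"time"

	"github.com/libp2p/go-libp2p/core/peer"
)

// Record describes a single cluster peer, as advertised in its heartbeat.
type Record interface {
	// Peer is the ID of the peer that originated the record.
	Peer() peer.ID

	// Seq is a monotonically increasing sequence number.  A record with a
	// higher Seq supersedes older records from the same peer.
	Seq() uint64

	// TTL is the duration after which the record is considered stale.
	TTL() time.Duration
}
```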
The routing table is responsible for expiring stale entries, i.e. entries for which either (1) the TTL has expired or (2) a new record with a higher sequence number was received. This provides an inconsistent but available system for propagating and querying cluster-membership state. As a design principle, Wetware is a PA/EL system that supports other consistency tradeoffs at the application layer.
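Building on the sketch above (and staying in the same hypothetical package), the expiry rule amounts to roughly the following. Again, this is a sketch, not Wetware's actual implementation.

```go
// entry pairs a received record with the local time at which it arrived.
type entry struct {
	rec        Record
	receivedAt time.Time
}

// expired reports whether the entry's TTL has lapsed as of time t; stale
// entries are evicted from the routing table.
func (e entry) expired(t time.Time) bool {
	return t.After(e.receivedAt.Add(e.rec.TTL()))
}

// supersedes reports whether record b should replace record a, i.e. whether
// b is a newer heartbeat from the same peer.
func supersedes(b, a Record) bool {
	return b.Peer() == a.Peer() && b.Seq() > a.Seq()
}
```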
Users and applications can perform indexed queries against the routing table to obtain the set of peers matching some constraints. For instance, at the command line:
or
In both cases, queries are prefix-matched against entries in the routing table, and all matching entries are streamed back to the client. Streaming is performed via Cap'n Proto RPC, which supports streaming RPC calls with BBR flow control. Additionally, indexed lookups are performed on the cluster node, not the client, so only matching entries are sent over the wire. This architecture makes it possible to efficiently query nodes in very large clusters.
To understand how Wetware's indexed queries are implemented, let us trace the path of a request originating at the command line as it makes its way to a node in the cluster and observe how it is processed on the remote end. Along the way, we will encounter several distinct features of the Wetware architecture.
Note: most links point to code in the v0.1.0 branch, which is the most up-to-date. However, I have not yet migrated support for indexes in the ls CLI command to v0.1.0, so links to CLI code will point to the master branch. Please don't be alarmed. There are no meaningful differences between these versions.
Let's begin by having a look at the main function for the ls
command, here.
The ls
function returns a callback that will be invoked by the CLI library when the ls
command is executed. The node
symbol is a client object pointing to an individual node in the cluster. It was obtained during application startup through a hook in the CLI library that uses Wetware's pluggable peer discovery system to find and connect to a peer in the cluster.
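In outline, the callback performs three steps, sketched below. This is a hedged sketch, not the linked source: the urfave/cli-style signature is an assumption, and the exact signatures of node.View, view.Iter, query, and render differ in the real code.

```go
// A hedged sketch of the ls callback's overall shape, assuming a
// urfave/cli-style Action.  Signatures are approximations; see the linked
// source for the real thing.
func ls() func(c *cli.Context) error {
	return func(c *cli.Context) error {
		// Step 1: obtain the View capability, i.e. the node's view of
		// the cluster.  `node` is the client connected at startup via
		// peer discovery.
		view := node.View(c.Context)

		// Step 2: query the routing table, obtaining an iterator over
		// matching records.
		it := view.Iter(c.Context, query(c))

		// Step 3: stream results to the terminal in the requested
		// output format.
		return render(c, it)
	}
}
```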
In the first step, we call the View
function to obtain a view.View
type. The View
is an object capability that allows the owner to perform queries against the node's routing table, i.e., it provides its holder with the node's view of the cluster.
Wetware and Object Capabilities. Wetware uses object capabilities for access control. In Wetware, object capabilities are Cap'n Proto
interface
types, which act as unforgeable references to remote objects, providing a robust security model. Practically speaking, the Cap'n Proto schema compiler produces Go code for the types found in api/*.capnp. Interface types are compiled to special Go structs whose method calls map onto latency-optimized, asynchronous RPC.
The Cap'n Proto schema for the View
capability is found in api/cluster.capnp. The View
interface contains three methods:
Returning to the ls
function, we see that the second step is to call the Iter
method on the view, passing in some query parameters, and obtaining an iterator.
The query(c)
call is responsible for parsing command-line arguments into arguments for View.Iter
. The parsing step is relatively straightforward, and is therefore left as an exercise to the reader. The iterator is more interesting, and we will return to it shortly. But first, let's look at the arguments to the iter
method in the schema file in more detail. These are:
This is fairly self-explanatory. In the CLI example, the WHERE
directive indicates a match
selector. Replacing it with FROM
would indicate a range selector (from
), and omitting directives (e.g. ww ls
) corresponds to an all
selector. In the WHERE
queries shown earlier, the remaining parameters are used to construct the match
field's Index. The Index struct is shown below. Its purpose is to designate the field of routing.Record that should be tested against the selector.
Additionally, the index communicates whether to perform an exact match or a prefix match.
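The actual Index struct is defined in the schema linked above. Purely as an illustration of how a parsed query combines a selector with an index, here is a hypothetical, flattened representation; the field names and example values below are illustrative, not the Wetware API.

```go
// params is a hypothetical, flattened view of a parsed query, shown only to
// illustrate how a selector combines with an Index.  The real types live in
// the Cap'n Proto schema.
type params struct {
	selector string // "all", "match", or "from"
	field    string // the routing.Record field to test (e.g. a host name)
	value    string // the value (or prefix) to test against
	prefix   bool   // true for prefix matching, false for exact matching
}

// "ww ls" with no directives corresponds to the all selector.
var everything = params{selector: "all"}

// A WHERE-style query corresponds to a match selector whose Index designates
// the field to test, here with prefix matching enabled.
var byHostPrefix = params{
	selector: "match",
	field:    "host",
	value:    "worker-",
	prefix:   true,
}
```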
At this point, it is useful to return to the ls function and observe how the iterator (variable it) is consumed, as this will clarify how the iter RPC call is constructed on the client side and handled on the remote node. So, returning to ls, the third and final step boils down to a call to:
The formatter(c)
call is not particularly interesting; it simply reads command-line flags to determine the output format of the result (either plain-text or JSON). The render
function, on the other hand, demonstrates Wetware's generalized mechanism for dealing with flow-controlled streams: the Iterator
type.
The render
function is concise and elegant:
The consumerFunc
type is returned by formatter(c)
, and is uninteresting. Instead, focus on cluster.Iterator
, and take note of the for
loop. The loop will block on each call to it.Next()
until the next item has been received from the remote end. Recall that BBR flow control is used, so a slow consumer will cause the remote node to throttle appropriately. When the stream either (a) is exhausted or (b) encounters an error, it.Next()
will return a nil routing.Record
, causing the loop to exit. Callers are expected to check it.Err()
, and interpret a nil error as the end of the stream. Callers can abort early by canceling the context.Context
initially passed to the Iter
method.
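In sketch form, the consumption pattern looks roughly like this. The cluster.Iterator and consumerFunc names come from the prose above; the signatures are approximations, not the exact Wetware types.

```go
// A sketch of the consumption loop described above.  Only the shape of the
// pattern matters; the concrete types come from the Wetware/CASM packages.
func consumeAll(it cluster.Iterator, consume consumerFunc) error {
	// Next blocks until the next record arrives from the remote node, so a
	// slow consumer causes the sender to throttle (BBR flow control).
	for r := it.Next(); r != nil; r = it.Next() {
		if err := consume(r); err != nil {
			return err
		}
	}

	// A nil record means the stream is exhausted OR an error occurred;
	// Err disambiguates.  A nil error marks a clean end of stream.
	return it.Err()
}
```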
So, how does this work? Let's dig into the client-side Iter
implementation (source).
There's quite a bit going on here, so let's unpack it. The first thing to notice is that view.Iterator
is a subtype of the generic type casm.Iterator. It is a concrete iterator that deals in routing.Record values.
The view.handler type that was constructed with newHandler corresponds to the Seq interface in the generic Iterator's type definition. The details of how these pieces fit together are best obtained by carefully reading through the source code. The key point is that handler implements the Cap'n Proto Handler interface found in cluster.capnp. Specifically, it implements the Recv method here, which constitutes the send end of the iterator shown above. This method is called for each record returned by the routing-table lookup in the node. This takes place in the iter function that is constructed on the server-side endpoint for Iter, in the remote node:
The magic happens in s.bind
, and is straightforward:
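A hedged sketch of that server-side flow is shown below; the names and signatures are illustrative, and the actual bind code in the linked source differs in form.

```go
// A hedged sketch of the server-side flow: perform the indexed lookup against
// the local routing table, then push each matching record to the client via
// the Handler capability's Recv method.  Names are illustrative.
func (s server) stream(ctx context.Context, h Handler, sel Selector) error {
	// The lookup runs on the node, so only matching records cross the wire.
	it, err := s.routing.Iter(sel)
	if err != nil {
		return err
	}

	// Recv is a flow-controlled Cap'n Proto RPC call; a slow client
	// throttles this loop via BBR.
	for r := it.Next(); r != nil; r = it.Next() {
		if err := h.Recv(ctx, r); err != nil {
			return err
		}
	}

	return it.Err()
}
```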
The source for routing.Iterator
can be found here.
As with view.Iterator, it is a concrete subtype of the generic casm.Iterator.
In all cases, iterators are built from a Seq
interface that is able to deliver the next value, and a Future
interface that is able to detect asynchronous termination of the stream, as well as report any errors encountered. The latter is trivially satisfied by the code-generated Future
types produced by the Cap'n Proto compiler for a given schema, and the former is responsible for receiving a single value and synchronizing access.
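Roughly speaking, the shape is as follows; the real casm.Iterator differs in its details, so treat this as a hedged sketch of the idea rather than its definition.

```go
// A hedged sketch of the two building blocks described above, expressed with
// Go generics.  The real casm.Iterator differs in detail but follows this
// general shape.

// Seq delivers values one at a time, synchronizing access to the stream.
type Seq[T any] interface {
	Next() (T, bool) // next value; false when the stream has terminated
}

// Future detects asynchronous termination of the stream and reports any
// error encountered along the way.  The code-generated Cap'n Proto Future
// types satisfy this role trivially.
type Future interface {
	Done() <-chan struct{}
	Err() error
}

// Iterator combines the two into the loop-and-check pattern used throughout
// this post.
type Iterator[T any] struct {
	Seq[T]
	Future
}
```

The appeal of this split is that the transport-specific details live entirely behind Seq and Future, while callers see the same loop with a final error check everywhere.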
Code reuse FTW!
Wetware's family of Iterators is a powerful abstraction that provides a consistent API across several key features. In this write-up, we have seen how the generic casm.Iterator is used to model both local lookups against the routing table and the efficient streaming of routing.Record values over the network. In both cases, the same developer-facing pattern is used: a simple for loop with a final error check.
This pattern also emerges in other Wetware features. For example, Wetware provides a peer-to-peer pubsub mesh that distributed applications can use to broadcast messages. Subscribing to a pubsub topic looks very similar to querying the routing table. In fact, it is even simpler, since it requires no selectors or constraints:
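A hedged sketch of what that looks like is below; the Subscription interface is an illustrative stand-in, not the exact Wetware pubsub API.

```go
// Subscription is an illustrative stand-in for Wetware's pubsub subscription
// type; the real API differs in name and detail.
type Subscription interface {
	Next(context.Context) []byte // next message payload; nil at end of stream
	Err() error                  // non-nil if the stream terminated abnormally
}

// printMessages consumes a subscription with the same loop-and-check idiom
// used for routing-table queries.
func printMessages(ctx context.Context, sub Subscription) error {
	for msg := sub.Next(ctx); msg != nil; msg = sub.Next(ctx) {
		fmt.Println(string(msg))
	}
	return sub.Err()
}
```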
Other client-facing features, from inter-process channels to service discovery, make use of this same pattern.