Try   HackMD

Synchronous Channel Implementation

NOTE: as per #79, we are first pursuing a single-producer/single-consumer (SPSC) channel implementation, which we will extend to a MPMC channel in a later effort. Therefore, this note assumes exactly one sender and one concurrent receiver.

What is sync.Cond?

Reference #1 says it best:

If I had to summarize sync.Cond in one sentence: A channel says “here’s a specific event that occurred;” a sync.Cond says “something happened, idk, you go figure it out.” Thus, sync.Cond is more vague, which means it’s also more flexible.

It also hints at how you should use it:

It’s kind of like a chan struct{} paired with a sync.Mutex: after you receive from the channel, you lock the mutex, then go investigate what happened.

There are two parts to this:

  1. You use sync.Cond in conjunction with a mutex.
  2. sync.Cond locks the mutex when it receives a signal, and then hands control to you.

(Note: point #2 is a bit unclear from the phrasing, but that is how it works)

At any rate, ref #1 is worth reading to understand sync.Cond in general.

How does it help us with channel implementation?

If we trace our way through the implementation flowchart for #79 (below), it becomes clear that there are multiple points at which a goroutine must block until something has happened.

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

Exactly what must happen is, however, dependent on the current state of the flowchart. For example, "is it my turn?" requires checking different state variables than "is the receiver ready?". Moreover, because both the sender and receiver goroutine can modify these state variables, some synchronization is necessary (e.g. a mutex). Thus, at a glance, we have a problem well suited for sync.Cond.

Back-of-napkin implementation

The following is a proof-of-concept implementation of the SPSC channel using sync.Cond. To avoid obscuring the high-level mechanics, it includes some simplifying assumptions:

  1. There is only one sender and one concurrent receiver.
  2. Both goroutines are in the same process (no network calls are involved).
  3. The channel is not a Cap'n Proto capability.

Applying this logic to a capnp capability server should be a fairly straightforward matter of:

  1. Employing call.Go() to ensure that senders & receivers do not block each other.
  2. Employing a pair of list structures to implement sender & receiver queues.
  3. Employing contexts to cancel out of blocking calls

Implementation: https://go.dev/play/p/20Nht77ZgQ_9

Follow-up thoughts

The concurrency here is already non-trivial, and that complexity is bound to increase with the addition of MPMC semantics. The more I stare at this, the more I think this effort could benefit from a formalized approach. This could take several forms.

At a minimum, we should take the time to enumerate the channel invariants, and convince ourselves even if informally that they hold. I see roughly two (non-exclusive) approaches we could take:

  1. I have a hunch that this channel design is a special case of two-phase commit (2PC). Convincing ourselves of this would help bound the problem.
  2. We could formally specify the channel semantics with TLA+ and verify it with the model checker.

There are some marginal behaviors of sycn.Cond that I don't fully understand. While I would expect the Go devs to avoid (or at least document) any footguns, I would like to actually understand sync.Cond's invariants in some detail. For instance: what happens when a thread calls Signal() immediately before Wait(), as in the above Send() implementation? Is Send() re-entrant in these conditions? If so, this would produce a livelock.

Addendum

A cursory glance at the sync.Cond implementation suggests that Signal() < Wait() does not produce re-entrant behavior. The call to Singal() notifies threads that have previously registered in a runtime.notifyList. So that's nice

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

References

  1. The Pros of Conds
  2. How to properly use the conditional variable sync.Cond in Golang