# Lighthouse Deadlock: Rayon & RwLock
There is a possible deadlock between these two code paths in Lighthouse:
1. `BeaconChain::on_block`, specifically the call to [`fork_choice.on_block`](https://github.com/sigp/lighthouse/blob/2bc9115a94bfcf1269f83f0a9e7a1321b2a7f70c/beacon_node/beacon_chain/src/beacon_chain.rs#L1500-L1502).
1. `rest_api::validator::publish_attestations`, specifically the [parallel attestation verification routine](https://github.com/sigp/lighthouse/blob/2bc9115a94bfcf1269f83f0a9e7a1321b2a7f70c/beacon_node/rest_api/src/validator.rs#L476-L489).
Whilst the cause of the deadlock is not immediately obvious, it stems from our use of `rayon` and `RwLock`. First I'll describe this type of deadlock in general, then I'll go into the Lighthouse-specific case in detail.
## The deadlock in general
*Note: there's an existing description of this type of deadlock in a related issue on the rayon repo: https://github.com/rayon-rs/rayon/issues/592*
The `rayon` library allows us to write code like this:
```rust
use rayon::prelude::*;

let a = vec![0, 1, 3];
// Without a type annotation, `collect` can't infer the target collection.
let a_doubled: Vec<i32> = a.par_iter().map(|x| *x * 2).collect();
```
What happens here is rayon magically maintains a long-lived (i.e., `static`) pool of `n` threads, where `n` defaults to the number of CPUs. When you run `collect` on the iter from `par_iter`, each element of the iter is (potentially) processed on a different one of those threads in the rayon pool.
Part of the magic of `rayon` is that it completely hides the thread pool and its workers. Hidden in this magic is the ability to accidentally hold a lock whilst relying on a finite set of workers to complete tasks which might also need the lock you are holding. In detail:
1. There is a `t1` thread holding a write-lock on `shared`, waiting for some rayon tasks to complete.
1. There is a `t2` thread spawning multiple rayon tasks, each of which tries to obtain a read-lock on `shared`.
If the number of tasks spawned by `t2` equals (or exceeds) the number of worker threads, then every worker will be blocked waiting for `t1` to drop the write-lock. Since `t1` is unable to obtain any workers to complete its `par_iter`, it will never drop the write-lock. The program is deadlocked.
You can see a working (non-deterministic) example of this condition here:
https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=af0f6ad314b983ce132b24272be1431d
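For reference, here's a minimal inline sketch of the same condition, assuming `std::sync::RwLock` (the names `t1`, `t2` and `shared` match the description above; the playground example may differ in its details):
```rust
use rayon::prelude::*;
use std::sync::{Arc, RwLock};
use std::thread;

fn main() {
    let shared = Arc::new(RwLock::new(0u64));

    // t1: hold the write-lock whilst waiting for rayon tasks to complete.
    let shared_t1 = Arc::clone(&shared);
    let t1 = thread::spawn(move || {
        let mut guard = shared_t1.write().unwrap();
        // This par_iter needs free rayon workers; if they are all blocked on
        // `shared.read()`, it never finishes and the write-lock is never dropped.
        *guard = (0..1_000u64).into_par_iter().map(|x| x * 2).sum();
    });

    // t2: spawn enough rayon tasks (each taking a read-lock) to occupy every
    // worker in the pool.
    let shared_t2 = Arc::clone(&shared);
    let t2 = thread::spawn(move || {
        let n = rayon::current_num_threads() as u64;
        let _sum: u64 = (0..n * 4)
            .into_par_iter()
            .map(|_| *shared_t2.read().unwrap())
            .sum();
    });

    // With unlucky scheduling, neither join ever returns.
    t1.join().unwrap();
    t2.join().unwrap();
}
```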
## The deadlock in Lighthouse
Looking at some `gdb` backtraces from a deadlocked node, we can identify two functions which together are sufficient to trigger the deadlock:
### 1. Waiting for tasks to complete, whilst holding a `beacon_chain.fork_choice` write lock
```
Thread 35 (Thread 0x7ff7e91e4700 (LWP 17935)):
#0 0x00007ff887066277 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x0000558e5c7fde37 in rayon_core::latch::LockLatch::wait_and_reset ()
#2 0x0000558e5c6b56a8 in std::thread::local::LocalKey<T>::with ()
#3 0x0000558e5c6a24a4 in rayon_core::registry::in_worker ()
#4 0x0000558e5c69d7ac in rayon::iter::plumbing::bridge_producer_consumer::helper ()
#5 0x0000558e5c69cf56 in <rayon::iter::plumbing::bridge::Callback<C> as rayon::iter::plumbing::ProducerCallback<I>>::callback ()
#6 0x0000558e5c6af128 in <rayon::iter::while_some::WhileSome<I> as rayon::iter::ParallelIterator>::drive_unindexed ()
#7 0x0000558e5c6a56ef in rayon::iter::collect::<impl rayon::iter::ParallelExtend<T> for alloc::vec::Vec<T>>::par_extend ()
#8 0x0000558e5c6ae3ec in rayon::iter::ParallelIterator::collect ()
#9 0x0000558e5c6bd0ce in types::beacon_state::tree_hash_cache::ValidatorsListTreeHashCache::recalculate_tree_hash_root ()
#10 0x0000558e5b720dac in types::beacon_state::tree_hash_cache::BeaconTreeHashCache<T>::recalculate_tree_hash_root ()
#11 0x0000558e5b2b437b in types::beacon_state::BeaconState<T>::update_tree_hash_cache ()
#12 0x0000558e5b616c56 in state_processing::per_slot_processing::per_slot_processing ()
#13 0x0000558e5b29bf4f in store::hot_cold_store::HotColdDB<E,Hot,Cold>::replay_blocks ()
#14 0x0000558e5b2a0335 in store::hot_cold_store::HotColdDB<E,Hot,Cold>::load_hot_state ()
#15 0x0000558e5b2a9bf2 in store::hot_cold_store::HotColdDB<E,Hot,Cold>::get_state ()
#16 0x0000558e5b88f7ec in <beacon_chain::beacon_fork_choice_store::BeaconForkChoiceStore<E,Hot,Cold> as fork_choice::fork_choice_store::ForkChoiceStore<E>>::set_justified_checkpoint ()
#17 0x0000558e5b2fb6a5 in fork_choice::fork_choice::ForkChoice<T,E>::on_block ()
#18 0x0000558e5b91b0db in beacon_chain::beacon_chain::BeaconChain<T>::import_block ()
#19 0x0000558e5b91e27a in beacon_chain::beacon_chain::BeaconChain<T>::process_block ()
#20 0x0000558e5b72b411 in network::beacon_processor::worker::Worker<T>::process_gossip_block ()
#21 0x0000558e5b4174e9 in <tokio::runtime::blocking::task::BlockingTask<T> as core::future::future::Future>::poll ()
#22 0x0000558e5b0d0f74 in <std::panic::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once ()
#23 0x0000558e5b05d249 in tokio::runtime::task::harness::Harness<T,S>::poll ()
#24 0x0000558e5c532b27 in tokio::runtime::blocking::pool::Inner::run ()
#25 0x0000558e5c538db8 in tokio::runtime::context::enter ()
#26 0x0000558e5c537ccf in std::sys_common::backtrace::__rust_begin_short_backtrace ()
#27 0x0000558e5c540c42 in core::ops::function::FnOnce::call_once{{vtable-shim}} ()
#28 0x0000558e5c8cfeca in <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once () at /rustc/d3fb005a39e62501b8b0b356166e515ae24e2e54/src/liballoc/boxed.rs:1076
#29 <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once () at /rustc/d3fb005a39e62501b8b0b356166e515ae24e2e54/src/liballoc/boxed.rs:1076
#30 std::sys::unix::thread::Thread::new::thread_start () at src/libstd/sys/unix/thread.rs:87
#31 0x00007ff88706040b in start_thread () from /lib64/libpthread.so.0
#32 0x00007ff886b84e7f in clone () from /lib64/libc.so.6
```
It's not visible in this trace, but a `RwLock::write` lock on `beacon_chain.fork_choice` is obtained between frames `#18` and `#17`, and it is held until all of the hashing in `tree_hash_cache::ValidatorsListTreeHashCache::recalculate_tree_hash_root()` completes.
### 2. Spawning tasks which read `beacon_chain.fork_choice`
```
Thread 527 (Thread 0x7f4cf7283700 (LWP 13082)):
#0 0x00007f4dc650abf9 in syscall () from /lib64/libc.so.6
#1 0x00005570edcf6de9 in parking_lot::raw_rwlock::RawRwLock::lock_shared_slow ()
#2 0x00005570ecd48e91 in beacon_chain::attestation_verification::VerifiedUnaggregatedAttestation<T>::verify ()
#3 0x00005570ece34eb7 in beacon_chain::beacon_chain::BeaconChain<T>::verify_unaggregated_attestation_for_gossip ()
#4 0x00005570ec81cf69 in core::ops::function::impls::<impl core::ops::function::FnMut<A> for &F>::call_mut ()
#5 0x00005570ec77f941 in rayon::iter::plumbing::Folder::consume_iter ()
#6 0x00005570ec834916 in rayon::iter::plumbing::bridge_producer_consumer::helper ()
#7 0x00005570ecf33152 in <rayon::vec::IntoIter<T> as rayon::iter::IndexedParallelIterator>::with_producer ()
#8 0x00005570ec9b67db in rayon::iter::collect::special_extend ()
#9 0x00005570ecb5e97a in rayon::iter::collect::<impl rayon::iter::ParallelExtend<T> for alloc::vec::Vec<T>>::par_extend ()
#10 0x00005570ec86ee90 in rest_api::validator::publish_attestations ()
#11 0x00005570ec90856a in <tokio::runtime::blocking::task::BlockingTask<T> as core::future::future::Future>::poll ()
#12 0x00005570ec5cc9db in <std::panic::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once ()
#13 0x00005570ec56c7de in tokio::runtime::task::harness::Harness<T,S>::poll ()
#14 0x00005570eda22b27 in tokio::runtime::blocking::pool::Inner::run ()
#15 0x00005570eda28db8 in tokio::runtime::context::enter ()
#16 0x00005570eda27ccf in std::sys_common::backtrace::__rust_begin_short_backtrace ()
#17 0x00005570eda30c42 in core::ops::function::FnOnce::call_once{{vtable-shim}} ()
#18 0x00005570eddbfeca in <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once ()
at /rustc/d3fb005a39e62501b8b0b356166e515ae24e2e54/src/liballoc/boxed.rs:1076
#19 <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once () at /rustc/d3fb005a39e62501b8b0b356166e515ae24e2e54/src/liballoc/boxed.rs:1076
#20 std::sys::unix::thread::Thread::new::thread_start () at src/libstd/sys/unix/thread.rs:87
#21 0x00007f4dc69eb40b in start_thread () from /lib64/libpthread.so.0
#22 0x00007f4dc650fe7f in clone () from /lib64/libc.so.6
```
It's also not visible in this trace, but there is a call to `beacon_chain.fork_choice.read()` inside `VerifiedUnaggregatedAttestation<T>::verify()`.
### A concrete example
To recap, here are the two entry points where this can happen in Lighthouse:
1. `BeaconChain::on_block`, specifically the call to [`fork_choice.on_block`](https://github.com/sigp/lighthouse/blob/2bc9115a94bfcf1269f83f0a9e7a1321b2a7f70c/beacon_node/beacon_chain/src/beacon_chain.rs#L1500-L1502).
1. `rest_api::validator::publish_attestations`, specifically the [parallel attestation verification routine](https://github.com/sigp/lighthouse/blob/2bc9115a94bfcf1269f83f0a9e7a1321b2a7f70c/beacon_node/rest_api/src/validator.rs#L476-L489).
In (1) we're holding `beacon_chain.fork_choice.write()` whilst we do a bunch of tree hashing in rayon. In (2), we're spawning lots of attestation processing tasks in rayon, each of which requires a `beacon_chain.fork_choice.read()`. If we happen to spawn `n` (the number of CPUs) or more tasks from (2), they'll all block waiting for a read-lock and there will be no workers left to complete the hashing that would cause the write-lock to be dropped. A deadlock!
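To make the shape of this concrete, here's a minimal, self-contained sketch that models both entry points. All names here (`ToyBeaconChain`, `ToyForkChoice`, etc.) are hypothetical stand-ins rather than the real Lighthouse types; only the lock and the parallel iterators matter. With unlucky scheduling (more likely on a machine with few cores), it never terminates:
```rust
use parking_lot::RwLock; // Lighthouse uses `parking_lot`, per the backtrace above.
use rayon::prelude::*;
use std::sync::Arc;
use std::thread;

// Hypothetical stand-ins for the real Lighthouse types.
struct ToyForkChoice {
    head: u64,
}

struct ToyBeaconChain {
    fork_choice: RwLock<ToyForkChoice>,
}

impl ToyBeaconChain {
    // Models (1): take the fork choice write-lock, then do rayon work (the
    // stand-in for tree hashing) whilst still holding it.
    fn on_block(&self) {
        let mut fork_choice = self.fork_choice.write();
        fork_choice.head = (0..100_000u64)
            .into_par_iter()
            .map(|x| x.wrapping_mul(31))
            .sum();
    }

    // Models (2): verify "attestations" in parallel, each taking a fork
    // choice read-lock.
    fn publish_attestations(&self, attestations: Vec<u64>) -> Vec<u64> {
        attestations
            .into_par_iter()
            .map(|a| a.wrapping_add(self.fork_choice.read().head))
            .collect()
    }
}

fn main() {
    let chain = Arc::new(ToyBeaconChain {
        fork_choice: RwLock::new(ToyForkChoice { head: 0 }),
    });

    // Thread 1: block import, holding the write-lock whilst "hashing".
    let import_chain = chain.clone();
    let importer = thread::spawn(move || import_chain.on_block());

    // Thread 2: the REST API, spawning enough read-locking rayon tasks to
    // occupy every worker in the pool.
    let api_chain = chain.clone();
    let api = thread::spawn(move || {
        let attestations: Vec<u64> = (0..rayon::current_num_threads() as u64 * 4).collect();
        api_chain.publish_attestations(attestations)
    });

    importer.join().unwrap();
    let _ = api.join().unwrap();
}
```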