There is a possible deadlock between these two processes in Lighthouse:

1. `BeaconChain::on_block`, specifically the call to `fork_choice.on_block`.
2. `rest_api::validator::publish_attestations`, specifically the parallel attestation verification routine.

Whilst the cause of the deadlock is not immediately clear, it's due to our use of `rayon` and `RwLock`. First I'll describe this type of deadlock in general, then I'll go into our case in detail.
Note: there's an existing description of this kind of deadlock here: https://github.com/rayon-rs/rayon/issues/592
The `rayon` library allows us to write code like this:

```rust
use rayon::prelude::*;

let a = vec![0, 1, 3];
let a_doubled: Vec<i32> = a.par_iter().map(|x| *x * 2).collect();
```
What happens here is that rayon magically maintains a long-lived (i.e., `static`) pool of `n` threads, where `n` defaults to the number of CPUs. When you run `collect` on the iterator from `par_iter`, each element is (potentially) processed on a different one of those threads in the rayon pool.
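For reference, the size of that global pool can be inspected at runtime and configured before it is first used. A minimal sketch using rayon's `ThreadPoolBuilder` and `current_num_threads` APIs (the `4` is an arbitrary illustrative value, not something Lighthouse configures):

```rust
use rayon::ThreadPoolBuilder;

fn main() {
    // The global pool is normally built lazily on first use, with one worker
    // per logical CPU. It can instead be sized explicitly, once, up front.
    ThreadPoolBuilder::new()
        .num_threads(4) // illustrative value only
        .build_global()
        .expect("the global pool can only be configured once");

    // Reports how many workers the global pool has.
    println!("rayon workers: {}", rayon::current_num_threads());
}
```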
Part of the magic of `rayon` is that it completely hides the threadpool and the workers. Hidden in this magic is the ability to accidentally hold a mutex whilst relying on a finite set of workers to complete tasks which might also access the mutex you are holding. In detail:

- `t1`: a thread holding a write-lock on `shared`, waiting for some rayon tasks to complete.
- `t2`: a thread spawning multiple rayon tasks, each of which tries to obtain a read-lock on `shared`.

If the tasks spawned by `t2` equal (or exceed) the number of worker threads, then all workers will be blocked waiting on `t1` to drop the write-lock. Since `t1` is unable to obtain any workers to complete its `par_iter`, it will never drop the write-lock. The program is deadlocked.
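For instance, here is a minimal, self-contained sketch of this condition (using `parking_lot::RwLock`, as Lighthouse does, and the rayon global pool; `shared` and both thread bodies are invented purely for illustration):

```rust
use std::sync::Arc;
use std::thread;

use parking_lot::RwLock;
use rayon::prelude::*;

fn main() {
    let shared = Arc::new(RwLock::new(0_u64));

    // t1: takes the write-lock, then waits on rayon workers to finish a
    // parallel job before releasing it.
    let s1 = shared.clone();
    let t1 = thread::spawn(move || {
        let mut guard = s1.write();
        let sum: u64 = (0..1_000u64).into_par_iter().map(|x| x * 2).sum();
        *guard += sum;
    });

    // t2: spawns enough rayon tasks to occupy every worker, each of which
    // blocks trying to take a read-lock on `shared`.
    let s2 = shared.clone();
    let t2 = thread::spawn(move || {
        let n = rayon::current_num_threads();
        let _reads: Vec<u64> = (0..n).into_par_iter().map(|_| *s2.read()).collect();
    });

    // If every worker ends up blocked on the read-lock, t1's parallel job can
    // never run, the write-lock is never dropped, and these joins never return.
    t1.join().unwrap();
    t2.join().unwrap();
}
```

Whether this actually locks up depends on scheduling: `t1` has to take the write-lock and `t2`'s tasks have to occupy every worker before `t1`'s parallel job is picked up, which is exactly why the hang is non-deterministic.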
You can see a working (non-deterministic) example of this condition here:
Looking at some `gdb` backtraces from a locked node, we can identify two functions which together are sufficient to trigger the deadlock:
`beacon_chain.fork_choice` write lock:

```
Thread 35 (Thread 0x7ff7e91e4700 (LWP 17935)):
#0 0x00007ff887066277 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1 0x0000558e5c7fde37 in rayon_core::latch::LockLatch::wait_and_reset ()
#2 0x0000558e5c6b56a8 in std::thread::local::LocalKey<T>::with ()
#3 0x0000558e5c6a24a4 in rayon_core::registry::in_worker ()
#4 0x0000558e5c69d7ac in rayon::iter::plumbing::bridge_producer_consumer::helper ()
#5 0x0000558e5c69cf56 in <rayon::iter::plumbing::bridge::Callback<C> as rayon::iter::plumbing::ProducerCallback<I>>::callback ()
#6 0x0000558e5c6af128 in <rayon::iter::while_some::WhileSome<I> as rayon::iter::ParallelIterator>::drive_unindexed ()
#7 0x0000558e5c6a56ef in rayon::iter::collect::<impl rayon::iter::ParallelExtend<T> for alloc::vec::Vec<T>>::par_extend ()
#8 0x0000558e5c6ae3ec in rayon::iter::ParallelIterator::collect ()
#9 0x0000558e5c6bd0ce in types::beacon_state::tree_hash_cache::ValidatorsListTreeHashCache::recalculate_tree_hash_root ()
#10 0x0000558e5b720dac in types::beacon_state::tree_hash_cache::BeaconTreeHashCache<T>::recalculate_tree_hash_root ()
#11 0x0000558e5b2b437b in types::beacon_state::BeaconState<T>::update_tree_hash_cache ()
#12 0x0000558e5b616c56 in state_processing::per_slot_processing::per_slot_processing ()
#13 0x0000558e5b29bf4f in store::hot_cold_store::HotColdDB<E,Hot,Cold>::replay_blocks ()
#14 0x0000558e5b2a0335 in store::hot_cold_store::HotColdDB<E,Hot,Cold>::load_hot_state ()
#15 0x0000558e5b2a9bf2 in store::hot_cold_store::HotColdDB<E,Hot,Cold>::get_state ()
#16 0x0000558e5b88f7ec in <beacon_chain::beacon_fork_choice_store::BeaconForkChoiceStore<E,Hot,Cold> as fork_choice::fork_choice_store::ForkChoiceStore<E>>::set_justified_checkpoint ()
#17 0x0000558e5b2fb6a5 in fork_choice::fork_choice::ForkChoice<T,E>::on_block ()
#18 0x0000558e5b91b0db in beacon_chain::beacon_chain::BeaconChain<T>::import_block ()
#19 0x0000558e5b91e27a in beacon_chain::beacon_chain::BeaconChain<T>::process_block ()
#20 0x0000558e5b72b411 in network::beacon_processor::worker::Worker<T>::process_gossip_block ()
#21 0x0000558e5b4174e9 in <tokio::runtime::blocking::task::BlockingTask<T> as core::future::future::Future>::poll ()
#22 0x0000558e5b0d0f74 in <std::panic::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once ()
#23 0x0000558e5b05d249 in tokio::runtime::task::harness::Harness<T,S>::poll ()
#24 0x0000558e5c532b27 in tokio::runtime::blocking::pool::Inner::run ()
#25 0x0000558e5c538db8 in tokio::runtime::context::enter ()
#26 0x0000558e5c537ccf in std::sys_common::backtrace::__rust_begin_short_backtrace ()
#27 0x0000558e5c540c42 in core::ops::function::FnOnce::call_once{{vtable-shim}} ()
#28 0x0000558e5c8cfeca in <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once () at /rustc/d3fb005a39e62501b8b0b356166e515ae24e2e54/src/liballoc/boxed.rs:1076
#29 <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once () at /rustc/d3fb005a39e62501b8b0b356166e515ae24e2e54/src/liballoc/boxed.rs:1076
#30 std::sys::unix::thread::Thread::new::thread_start () at src/libstd/sys/unix/thread.rs:87
#31 0x00007ff88706040b in start_thread () from /lib64/libpthread.so.0
#32 0x00007ff886b84e7f in clone () from /lib64/libc.so.6
```
It's not visible in this trace, but a `RwLock::write` lock is obtained on `beacon_chain.fork_choice` between `#18` and `#17`, and we hold that lock until we can complete all of the hashing in `tree_hash_cache::ValidatorsListTreeHashCache::recalculate_tree_hash_root()`.
`beacon_chain.fork_choice` read lock:

```
Thread 527 (Thread 0x7f4cf7283700 (LWP 13082)):
#0 0x00007f4dc650abf9 in syscall () from /lib64/libc.so.6
#1 0x00005570edcf6de9 in parking_lot::raw_rwlock::RawRwLock::lock_shared_slow ()
#2 0x00005570ecd48e91 in beacon_chain::attestation_verification::VerifiedUnaggregatedAttestation<T>::verify ()
#3 0x00005570ece34eb7 in beacon_chain::beacon_chain::BeaconChain<T>::verify_unaggregated_attestation_for_gossip ()
#4 0x00005570ec81cf69 in core::ops::function::impls::<impl core::ops::function::FnMut<A> for &F>::call_mut ()
#5 0x00005570ec77f941 in rayon::iter::plumbing::Folder::consume_iter ()
#6 0x00005570ec834916 in rayon::iter::plumbing::bridge_producer_consumer::helper ()
#7 0x00005570ecf33152 in <rayon::vec::IntoIter<T> as rayon::iter::IndexedParallelIterator>::with_producer ()
#8 0x00005570ec9b67db in rayon::iter::collect::special_extend ()
#9 0x00005570ecb5e97a in rayon::iter::collect::<impl rayon::iter::ParallelExtend<T> for alloc::vec::Vec<T>>::par_extend ()
#10 0x00005570ec86ee90 in rest_api::validator::publish_attestations ()
#11 0x00005570ec90856a in <tokio::runtime::blocking::task::BlockingTask<T> as core::future::future::Future>::poll ()
#12 0x00005570ec5cc9db in <std::panic::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once ()
#13 0x00005570ec56c7de in tokio::runtime::task::harness::Harness<T,S>::poll ()
#14 0x00005570eda22b27 in tokio::runtime::blocking::pool::Inner::run ()
#15 0x00005570eda28db8 in tokio::runtime::context::enter ()
#16 0x00005570eda27ccf in std::sys_common::backtrace::__rust_begin_short_backtrace ()
#17 0x00005570eda30c42 in core::ops::function::FnOnce::call_once{{vtable-shim}} ()
#18 0x00005570eddbfeca in <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once ()
at /rustc/d3fb005a39e62501b8b0b356166e515ae24e2e54/src/liballoc/boxed.rs:1076
#19 <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once () at /rustc/d3fb005a39e62501b8b0b356166e515ae24e2e54/src/liballoc/boxed.rs:1076
#20 std::sys::unix::thread::Thread::new::thread_start () at src/libstd/sys/unix/thread.rs:87
#21 0x00007f4dc69eb40b in start_thread () from /lib64/libpthread.so.0
#22 0x00007f4dc650fe7f in clone () from /lib64/libc.so.6
```
It's also not visible in this trace, but there is a call to `beacon_chain.fork_choice.read()` inside `VerifiedUnaggregatedAttestation<T>::verify()`.
Here are the two entry points where this can happen in Lighthouse:

1. `BeaconChain::on_block`, specifically the call to `fork_choice.on_block`.
2. `rest_api::validator::publish_attestations`, specifically the parallel attestation verification routine.

In (1) we're holding `beacon_chain.fork_choice.write()` whilst we do a bunch of tree hashing in rayon. In (2) we're spawning lots of attestation-processing tasks on rayon threads, each of which requires a `beacon_chain.fork_choice.read()`. If we happen to spawn `n` (num CPUs) tasks from (2), they'll all block waiting for the read-lock, leaving no workers free to complete the hashing that would allow the write-lock to be dropped. A deadlock!
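To make the interaction concrete, here is a minimal, self-contained sketch of the same shape. `BeaconChain`, `ForkChoice`, `import_block` and `publish_attestations` below are toy stand-ins with invented fields and signatures, not the real Lighthouse types; only the locking pattern mirrors the traces above:

```rust
use std::sync::Arc;
use std::thread;

use parking_lot::RwLock;
use rayon::prelude::*;

// Toy stand-in: the real ForkChoice does far more than hash numbers.
struct ForkChoice;

impl ForkChoice {
    fn on_block(&mut self) {
        // Stand-in for recalculate_tree_hash_root: the work is farmed out to
        // the rayon global pool while the caller still holds the write guard.
        let _hashes: Vec<u64> = (0..4_096u64)
            .into_par_iter()
            .map(|x| x.wrapping_mul(31))
            .collect();
    }
}

struct BeaconChain {
    fork_choice: RwLock<ForkChoice>,
}

impl BeaconChain {
    // Entry point (1): the write-lock is held across the rayon hashing.
    fn import_block(&self) {
        let mut fork_choice = self.fork_choice.write();
        fork_choice.on_block();
        // The write guard is only dropped here, after the rayon tasks finish.
    }

    // Entry point (2): many rayon tasks, each needing a read-lock.
    fn publish_attestations(&self, attestations: Vec<u64>) {
        let _verified: Vec<u64> = attestations
            .into_par_iter()
            .map(|a| {
                // Stand-in for VerifiedUnaggregatedAttestation::verify, which
                // takes beacon_chain.fork_choice.read() internally.
                let _fork_choice = self.fork_choice.read();
                a
            })
            .collect();
    }
}

fn main() {
    let chain = Arc::new(BeaconChain {
        fork_choice: RwLock::new(ForkChoice),
    });

    let c1 = chain.clone();
    let t1 = thread::spawn(move || c1.import_block());

    let c2 = chain.clone();
    let attestations: Vec<u64> = (0..rayon::current_num_threads() as u64).collect();
    let t2 = thread::spawn(move || c2.publish_attestations(attestations));

    // If (2) fills every rayon worker with tasks blocked on read() while (1)
    // holds the write-lock and is waiting for workers, neither side can make
    // progress and these joins never return.
    t1.join().unwrap();
    t2.join().unwrap();
}
```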