
prd-elc-besu-prysm-mainnet-nightly-forest-snap Performance Analysis

Date of analysis: 23/04/2024

Current state - last 30 days:

Screenshot 2024-04-23 at 3.43.22 PM

The node had some "blocks behind" spikes but finally couldn't recover, starting around 08/04/2024 01:00:00 - let's call this the "incident".
Based on the drop in peers, this time appears to coincide with a (manual?) restart of one or both of the clients, but we are missing the older logs so we can't be sure.
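
If the clients run as systemd units (the unit names prysm-beacon and besu below are assumptions), the restart time could still be confirmed from the journal, which usually outlives rotated application logs:

$ journalctl -u prysm-beacon --since "2024-04-07" --until "2024-04-09" | grep -iE "started|stopped"
$ journalctl -u besu --since "2024-04-07" --until "2024-04-09" | grep -iE "started|stopped"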

Screenshot 2024-04-23 at 3.48.27 PM

There have been a number of OOM killer events, but not at incident time.

Only the following are recorded in dmesg

$ sudo dmesg -T | grep "Killed process"
[Fri Apr 12 01:49:06 2024] Out of memory: Killed process 28925 (prysm-beacon) total-vm:33279532kB, anon-rss:12255500kB, file-rss:5888kB, shmem-rss:0kB, UID:1025 pgtables:34808kB oom_score_adj:0
[Thu Apr 18 05:44:28 2024] Out of memory: Killed process 193594 (prysm-beacon) total-vm:32956988kB, anon-rss:10152940kB, file-rss:2560kB, shmem-rss:0kB, UID:1025 pgtables:35140kB oom_score_adj:0
[Mon Apr 22 13:30:37 2024] Out of memory: Killed process 310385 (prysm-beacon) total-vm:34276952kB, anon-rss:10531612kB, file-rss:8960kB, shmem-rss:0kB, UID:1025 pgtables:30568kB oom_score_adj:0
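
The OOM killer also dumps memory and task state just before each kill, which shows what else was resident at the time; that context can be pulled with:

$ sudo dmesg -T | grep -B 20 "Out of memory"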

Current state - last 3 hours

Screenshot 2024-04-23 at 3.52.00 PM

Nothing in the red, but RAM Used is quite high

$ uptime
 06:47:00 up 13 days,  5:07,  2 users,  load average: 0.71, 0.92, 0.92

JPDM Analysis

https://www.jfokus.se/jfokus17/preso/Using-jPDMa-performance-diagnostic-model.pdf

Screenshot 2024-04-23 at 3.56.45 PM

1. System dominant?

No

$ vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 3  0      0 269336  44968 3047964    0    0  1409   401    4    7 11  1 79  9  0
 1  0      0 269336  45072 3047964    0    0     0   824 8549 10326 44  2 52  2  0
 5  0      0 269336  45120 3047964    0    0     0   640 2429 2835 62  1 36  1  0
...
...
 0  0      0 203248  45756 3119512    0    0   272   880 7286 10620  4  1 92  3  0
 1  0      0 203248  45844 3119504    0    0     0   996 5658 8311  2  2 92  4  0
 0  0      0 201740  45880 3119496    0    0     0   408 1791 2825  2  1 96  2  0
...
...
 3  0      0 230860  36164 2443916    0    0     0   820 6453 11340 65  1 32  2  0
 3  0      0 230860  36212 2443916    0    0     0   416 1965 1736 76  0 24  1  0
 1  0      0 220528  36256 2444360    0    0     0  5784 3519 5217 60  1 38  1  0

Two modes:

  • user-dominant CPU
  • idle

2. User CPU ~= 100%?

No

Although prysm-beacon periodically spikes:

$ pidstat 1
Linux 6.5.0-1018-azure (prd-elc-besu-prysm-mainnet-nightly-forest-snap) 	04/23/24 	_x86_64_	(4 CPU)

04:51:56      UID       PID    %usr %system  %guest   %wait    %CPU   CPU  Command
04:51:57      102       578    0.99    0.00    0.00    0.00    0.99     3  dbus-daemon
04:51:57      998       588    2.97    2.97    0.00    0.00    5.94     1  node_exporter
04:51:57        0       909    0.99    0.00    0.00    0.00    0.99     0  dockerd
04:51:57      104       940    0.99    0.00    0.00    0.00    0.99     2  mdsd
04:51:57     1025    478418  148.51    3.96    0.00    0.99  152.48     0  prysm-beacon
04:51:57     1024    502127    0.99    0.00    0.00    0.00    0.99     0  java

prysm-beacon has periodic high usage. It's got 4 CPUs, so I guess 148% out of 400% is actually ~37%?
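
To keep watching just that process (pidstat reports %CPU per core, so divide by the core count):

$ nproc
4
$ pidstat -u -p $(pgrep -x prysm-beacon) 1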

USE Method

Memory

Clearly from the OOMs the memory has been saturated at points.

$ free -m
               total        used        free      shared  buff/cache   available
Mem:           15979       12745         182           4        3051        2883
Swap:              0           0           0
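
To see whether the pressure is coming from process RSS rather than page cache, a rough sketch is to sample the two clients' resident sets over time (the java process here is besu):

$ while true; do date '+%T'; ps -o pid,rss,comm -C prysm-beacon,java; sleep 60; done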

htop (filtered by prysm)

Screenshot 2024-04-23 at 4.29.36 PM

htop (filtered by besu)

Screenshot 2024-04-23 at 4.31.32 PM
Many more threads than prysm-beacon.

CPU

$ mpstat -P ALL 1
...
06:35:38     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
06:35:39     all   98.25    0.00    1.50    0.00    0.00    0.25    0.00    0.00    0.00    0.00
06:35:39       0   99.00    0.00    1.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
06:35:39       1   95.05    0.00    3.96    0.00    0.00    0.99    0.00    0.00    0.00    0.00
06:35:39       2   99.00    0.00    1.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
06:35:39       3  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00

06:35:39     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
06:35:40     all   16.96    0.00    1.01   22.28    0.00    0.00    0.00    0.00    0.00   59.75
06:35:40       0   15.00    0.00    0.00   12.00    0.00    0.00    0.00    0.00    0.00   73.00
06:35:40       1   14.29    0.00    2.04   20.41    0.00    0.00    0.00    0.00    0.00   63.27
06:35:40       2   14.14    0.00    2.02   54.55    0.00    0.00    0.00    0.00    0.00   29.29
06:35:40       3   24.49    0.00    0.00    2.04    0.00    0.00    0.00    0.00    0.00   73.47

06:35:40     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
06:35:41     all    3.53    0.00    0.76   24.18    0.00    0.25    0.00    0.00    0.00   71.28
06:35:41       0    2.97    0.00    0.99    1.98    0.00    0.99    0.00    0.00    0.00   93.07
06:35:41       1    5.10    0.00    1.02   73.47    0.00    0.00    0.00    0.00    0.00   20.41
06:35:41       2    4.04    0.00    1.01   14.14    0.00    0.00    0.00    0.00    0.00   80.81
06:35:41       3    2.02    0.00    0.00    8.08    0.00    0.00    0.00    0.00    0.00   89.90

The same two modes, with a transition in between: user-dominant -> transition -> idle.

vmstat 1

procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0      0 788956  57016 3166064    0    0  1409   401    2    7 11  1 79  9  0
 0  0      0 806280  56804 3165040    0    0     0   568 1855 2904  1  0 97  3  0
 2  0      0 805776  56836 3165040    0    0     0   392 2433 3274  1  1 97  2  0
 0  0      0 805040  56880 3165040    0    0     0   680 10403 11584  3  2 93  3  0
 1  0      0 803320  56892 3165424    0    0    96  5716 4870 8121 27  1 69  3  0
 5  0      0 803320  56944 3165424    0    0     0   576 6559 11370 40  1 58  2  0

Mostly r <= 4, but a couple of times r = 5 despite being a 4 CPU machine.
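
Run queue length can be sampled directly with sar to check whether this CPU saturation (r > 4) is sustained or only momentary:

$ sar -q 1 10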


Application Analysis

Using pidstat alongside tail -f /var/log/prysm/beacon.log /var/log/besu/besu.log, with some besu debug logging enabled to show the engine API calls:
Screenshot 2024-04-23 at 4.13.53 PM

We can see that the periodic CPU spikes from prysm-beacon occur during this log:
{"batchSize":64,"blocksPerSecond":"3.2","estimatedTimeRemaining":27122000000000,"latestProcessedSlot/currentSlot":"8832096/8918887","level":"info","msg":"Processing blocks","peers":80,"prefix":"initial-sync","startingFrom":"0x9013a27c...","time":"2024-04-23T05:37:47Z"}

Then besu receives a batch of newPayloads, all but one of which it already has, but it also doesn't receive an FCU (forkchoiceUpdated):

{"@timestamp":"2024-04-23T05:37:56,713","level":"DEBUG","thread":"vert.x-worker-thread-0","class":"AbstractEngineNewPayload","message":"New payload: number: 19630000, hash: 0x496ee8b8e1cddd7d43d9ab63ad68e4b78ad11e6a673aa0529fa938972f124344, parentHash: 0x555bb91b816369f0dda0f3df8919165d9e6948836967cf358a9097fc434bcb87, latestValidHash: 0x496ee8b8e1cddd7d43d9ab63ad68e4b78ad11e6a673aa0529fa938972f124344, status: VALID","throwable":""}

==> /var/log/prysm/beacon.log <==
{"error":"could not process block in batch: timeout from http.Client: received an undefined execution engine error","level":"warning","msg":"Skip processing batched blocks","prefix":"initial-sync","time":"2024-04-23T05:38:04Z"}
{"error":"could not process block in batch: timeout from http.Client: received an undefined execution engine error","level":"warning","msg":"Skip processing batched blocks","prefix":"initial-sync","time":"2024-04-23T05:38:04Z"}
{"error":"beacon node doesn't have a parent in db with root: 0xafadb55cbf7f88cda7a3eeff22921a4c4383c2b527ee0baabc97c2ebbd62ad96 (in processBatchedBlocks, slot=8832160)","level":"warning","msg":"Skip processing batched blocks","prefix":"initial-sync","time":"2024-04-23T05:38:04Z"}
{"error":"beacon node doesn't have a parent in db with root: 0xafadb55cbf7f88cda7a3eeff22921a4c4383c2b527ee0baabc97c2ebbd62ad96 (in processBatchedBlocks, slot=8832160)","level":"warning","msg":"Skip processing batched blocks","prefix":"initial-sync","time":"2024-04-23T05:38:04Z"}

==> /var/log/besu/besu.log <==
{"@timestamp":"2024-04-23T05:38:06,874","level":"INFO","thread":"vert.x-worker-thread-0","class":"AbstractEngineNewPayload","message":"Imported #19,630,001 / 134 tx / 16 ws / 1 blobs / base fee 12.78 gwei / 11,558,776 (38.5%) gas / (0x126cc21c45fa3ea60aef24d4593816729c728d75865c9848b58413acac1cc5fe) in 10.136s. Peers: 25","throwable":""}
{"@timestamp":"2024-04-23T05:38:06,874","level":"DEBUG","thread":"vert.x-worker-thread-0","class":"AbstractEngineNewPayload","message":"New payload: number: 19630001, hash: 0x126cc21c45fa3ea60aef24d4593816729c728d75865c9848b58413acac1cc5fe, parentHash: 0x496ee8b8e1cddd7d43d9ab63ad68e4b78ad11e6a673aa0529fa938972f124344, latestValidHash: 0x126cc21c45fa3ea60aef24d4593816729c728d75865c9848b58413acac1cc5fe, status: VALID","throwable":""}

This log is also troubling:
{"error":"could not process block in batch: timeout from http.Client: received an undefined execution engine error","level":"warning","msg":"Skip processing batched blocks","prefix":"initial-sync","time":"2024-04-23T07:32:47Z"}

Conclusion

Prysm is not in sync and doesn't seem to be successfully getting besu in sync, due to the lack of FCUs.

This is clearly a memory-constrained system, but it is not clear whether the memory issues have caused what now appears to be a misbehaving application.

Not all the logs and metrics were available from the time of the "incident".

Recommend another round of analysis as soon as the issue begins occurring.

UPDATE: Following the prysm resync, the root cause could actually be disk saturation https://hackmd.io/kJ4nxfr4Qj6wV3BBA0hW6A#2-DONE-Resync-Prysm

This Azure node doesn't have host disk caching enabled like the other canaries do.
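
Host caching is a per-data-disk setting on the VM, so it can be confirmed from the Azure CLI (resource group placeholder here):

$ az vm show -g <resource-group> -n prd-elc-besu-prysm-mainnet-nightly-forest-snap \
    --query "storageProfile.dataDisks[].{lun:lun, caching:caching}" -o table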

Actions

1. [DONE] Upgrade Prysm to latest version (v5.0.2 -> v5.0.3)

# stopping
{"level":"info","msg":"Stopping beacon node","prefix":"node",
"time":"2024-04-23T07:16:36Z"}
{"currentSlot":8919381,"level":"info","msg":"Synced to finalized epoch - now syncing blocks up to current head","prefix":"initial-sync","syncedSlot":8832223,
"time":"2024-04-23T07:16:36Z"}

# starting
{"level":"info","logFileName":"/var/log/prysm/beacon.log","msg":"Logs will be made persistent",
"time":"2024-04-23T07:18:07Z"}

It was 8919381 - 8832223 = 87158 slots behind. We may as well resync prysm.
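
At 12s per slot that gap represents about 12 days of chain:

$ awk 'BEGIN { printf "%.1f days\n", 87158 * 12 / 86400 }'
12.1 days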

Screenshot 2024-04-23 at 5.29.36 PM

Upon update, memory drops but "RAM Cache + Buffer" returns to the 16GB limit quickly. RAM Used is also climbing.

2. [DONE] Resync Prysm

This has triggered a backward sync (BWS) in besu:
{"@timestamp":"2024-04-23T07:42:58,857","level":"INFO","thread":"vert.x-worker-thread-0","class":"BackwardSyncContext","message":"Starting a new backward sync session","throwable":""}

BWS is painfully slow: ~11s per block, not much faster than the chain itself progresses (one block every 12s)!

{"@timestamp":"2024-04-23T07:45:37,063","level":"INFO","thread":"ForkJoinPool.commonPool-worker-3","class":"BackwardSyncContext","message":"Backward sync phase 2 of 2, 0.01% completed, imported 6 blocks of at least 86605 (current head 19630090, target head 19716689). Peers: 25","throwable":""}
{"@timestamp":"2024-04-23T07:45:49,016","level":"INFO","thread":"ForkJoinPool.commonPool-worker-3","class":"BackwardSyncContext","message":"Backward sync phase 2 of 2, 0.01% completed, imported 7 blocks of at least 86607 (current head 19630091, target head 19716691). Peers: 25","throwable":""}
{"@timestamp":"2024-04-23T07:46:04,662","level":"INFO","thread":"ForkJoinPool.commonPool-worker-1","class":"BackwardSyncContext","message":"Backward sync phase 2 of 2, 0.01% completed, imported 9 blocks of at least 86608 (current head 19630093, target head 19716692). Peers: 25","throwable":""}

This could be due to I/O saturation:

$ iostat -xz 1

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
sda              0.00      0.00     0.00   0.00    0.00     0.00   44.00    604.00    93.00  67.88    1.68    13.73    8.00  36156.00     2.00  20.00    2.50  4519.50    0.00    0.00    0.09   9.20
sdb            480.00  21708.00     0.00   0.00    4.00    45.23    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    1.92 100.00
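
pidstat can attribute the disk I/O to individual processes, to confirm which client is driving sdb to 100% utilization:

$ pidstat -d 1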

4. [TODO?] Resync Prysm and Besu