GS1 ID Union Node out of (write) consensus debugging

Useful ressources:

Issue

The ID Union Test Net Monitoring Dashboard: http://18.158.85.64:3000/d/F4JHZlinz/idunion?orgId=1&refresh=10s
shows "GS1Germany" Consensus Status 3.0000.

Analysis

The above figure "3" is essentially obtained from the validator info (running ledger get-validator-info through a suitably connected and authorized indy-cli) as the count of
["GS1Germany"]["data"]["Node_info"]["Freshness_status"][]["Has_write_consensus"] === false

The ledger info field is set by comparing the last update time to a fixed timeout.
The fields ["GS1Germany"]["data"]["Node_info"]["Freshness_status"][]["Last_updated_time"] also show that no updates are happening (timestamps are close to the node startup time).

  • 2021-08-23 update: Confirmed. Last_updated_time equals the node start time

Apparently timestamps are not updated as expected, which means order_3pc_keys is not happening.

This should happen at process_commit or _do_commit. Indeed, none of the nearby log msgs shows up in the logs. The call stack of _do_commit branches again in _add_to_prepares.

Call Stack Snipped:

Instance change request:

2021-08-16 14:36:13,532|INFO|view_change_trigger_service.py|VIEW CHANGE: GS1Germany sending an instance change with view_no 8055 since State signatures are not updated for too long
2021-08-16 14:36:13,533|TRACE|replica.py|GS1Germany:0 sending INSTANCE_CHANGE{'viewNo': 8055, 'reason': 43}

Declined transaction:

2021-08-16 14:37:01,559|TRACE|node.py|GS1GermanyC received CLIENT message: SafeRequest: {'reqId': 1629124621461476699, 'operation': {'type': '119'}, 'identifier': 'HHHzGUN4E9xJXrjgDJfsn9', 'signature': '4NFepuxxLg1w3P7EpQWKXW1aDJBdAvSLF3zFBNetiNzAsZM1GzfTgwfnzXdAG6UGeEfuKLF2eSUzwrfNHyKsaL4i', 'protocolVersion': 2}
2021-08-16 14:37:01,559|DEBUG|node.py|GS1GermanyC processing b'IndS08NWoXA+wXfLSmJGzGMyatALUpZTwdl7qQwk5Hw=' request SafeRequest: {'reqId': 1629124621461476699, 'operation': {'type': '119'}, 'identifier': 'HHHzGUN4E9xJXrjgDJfsn9', 'signature': '4NFepuxxLg1w3P7EpQWKXW1aDJBdAvSLF3zFBNetiNzAsZM1GzfTgwfnzXdAG6UGeEfuKLF2eSUzwrfNHyKsaL4i', 'protocolVersion': 2}
2021-08-16 14:37:01,559|DEBUG|node.py|GS1Germany received client request: SafeRequest: {'reqId': 1629124621461476699, 'operation': {'type': '119'}, 'identifier': 'HHHzGUN4E9xJXrjgDJfsn9', 'signature': '4NFepuxxLg1w3P7EpQWKXW1aDJBdAvSLF3zFBNetiNzAsZM1GzfTgwfnzXdAG6UGeEfuKLF2eSUzwrfNHyKsaL4i', 'protocolVersion': 2} from b'IndS08NWoXA+wXfLSmJGzGMyatALUpZTwdl7qQwk5Hw='
2021-08-16 14:37:01,560|TRACE|client_message_provider.py|GS1GermanyC transmitting b'{"op":"REQACK","identifier":"HHHzGUN4E9xJXrjgDJfsn9","reqId":1629124621461476699}' to b'IndS08NWoXA+wXfLSmJGzGMyatALUpZTwdl7qQwk5Hw=' through listener socket
2021-08-16 14:37:01,561|DEBUG|auth_cons_strategies.py|Using auth constraint from state
2021-08-16 14:37:01,561|TRACE|authorizer.py|Validation error with reason: Not enough TRUSTEE signatures
2021-08-16 14:37:01,561|TRACE|authorizer.py|Validation error with reason: Not enough STEWARD signatures
2021-08-16 14:37:01,561|DEBUG|validator_info_handler.py|Transaction 1629124621461476699 with type 119 started
2021-08-16 14:37:01,714|DEBUG|validator_info_handler.py|Transaction 1629124621461476699 with type 119 finished

Declined read transactions:

2021-08-16 14:37:12,575|TRACE|node.py|GS1GermanyC received CLIENT message: SafeRequest: {'reqId': 1629124632129881655, 'operation': {'type': '3', 'data': 22, 'ledgerId': 0}, 'identifier': 'LibindyDid111111111111', 'protocolVersion': 2}
2021-08-16 14:37:12,575|DEBUG|node.py|GS1GermanyC processing b'awtlxY3kmlyxLbj7CHGmc75X9ZMFvIzJqARWaNJzrVE=' request SafeRequest: {'reqId': 1629124632129881655, 'operation': {'type': '3', 'data': 22, 'ledgerId': 0}, 'identifier': 'LibindyDid111111111111', 'protocolVersion': 2}
2021-08-16 14:37:12,575|DEBUG|node.py|GS1Germany received client request: SafeRequest: {'reqId': 1629124632129881655, 'operation': {'type': '3', 'data': 22, 'ledgerId': 0}, 'identifier': 'LibindyDid111111111111', 'protocolVersion': 2} from b'awtlxY3kmlyxLbj7CHGmc75X9ZMFvIzJqARWaNJzrVE='
2021-08-16 14:37:12,576|TRACE|client_message_provider.py|GS1GermanyC transmitting b'{"op":"REQACK","identifier":"LibindyDid111111111111","reqId":1629124632129881655}' to b'awtlxY3kmlyxLbj7CHGmc75X9ZMFvIzJqARWaNJzrVE=' through listener socket
2021-08-16 14:37:12,576|DEBUG|get_txn_handler.py|<plenum.server.request_handlers.get_txn_handler.GetTxnHandler object at 0x7fc153bc5da0> can not handle GET_TXN request: ledger doesn't have txn with seqNo=22
2021-08-16 14:37:12,576|TRACE|client_message_provider.py|GS1GermanyC transmitting b'{"op":"REPLY","result":{"identifier":"LibindyDid111111111111","reqId":1629124632129881655,"type":"3","data":null}}' to b'awtlxY3kmlyxLbj7CHGmc75X9ZMFvIzJqARWaNJzrVE=' through listener socket
2021-08-16 14:37:12,624|TRACE|zstack.py|GS1GermanyC got 1 messages through listener
2021-08-16 14:37:12,625|DEBUG|node.py|GS1Germany authenticated set() signature on  request 1629124632518629063
2021-08-16 14:37:12,625|TRACE|node.py|GS1GermanyC received CLIENT message: SafeRequest: {'reqId': 1629124632518629063, 'operation': {'type': '3', 'data': 6268, 'ledgerId': 1}, 'identifier': 'LibindyDid111111111111', 'protocolVersion': 2}
2021-08-16 14:37:12,625|DEBUG|node.py|GS1GermanyC processing b'awtlxY3kmlyxLbj7CHGmc75X9ZMFvIzJqARWaNJzrVE=' request SafeRequest: {'reqId': 1629124632518629063, 'operation': {'type': '3', 'data': 6268, 'ledgerId': 1}, 'identifier': 'LibindyDid111111111111', 'protocolVersion': 2}
2021-08-16 14:37:12,625|DEBUG|node.py|GS1Germany received client request: SafeRequest: {'reqId': 1629124632518629063, 'operation': {'type': '3', 'data': 6268, 'ledgerId': 1}, 'identifier': 'LibindyDid111111111111', 'protocolVersion': 2} from b'awtlxY3kmlyxLbj7CHGmc75X9ZMFvIzJqARWaNJzrVE='
2021-08-16 14:37:12,626|TRACE|client_message_provider.py|GS1GermanyC transmitting b'{"op":"REQACK","identifier":"LibindyDid111111111111","reqId":1629124632518629063}' to b'awtlxY3kmlyxLbj7CHGmc75X9ZMFvIzJqARWaNJzrVE=' through listener socket
2021-08-16 14:37:12,626|DEBUG|get_txn_handler.py|<plenum.server.request_handlers.get_txn_handler.GetTxnHandler object at 0x7fc153bc5da0> can not handle GET_TXN request: ledger doesn't have txn with seqNo=6268
2021-08-16 14:37:12,626|TRACE|client_message_provider.py|GS1GermanyC transmitting b'{"op":"REPLY","result":{"identifier":"LibindyDid111111111111","reqId":1629124632518629063,"type":"3","data":null}}' to b'awtlxY3kmlyxLbj7CHGmc75X9ZMFvIzJqARWaNJzrVE=' through listener socket

Open Questions / Next Steps

Figure out

  • when the ledger update should happen
  • if ledger(s) are actually behind the global state

Just some notes:

Looking into the logfiles I noticed the following:

and the number of missing pre-prepares then keeps growing in the course of the logs:

Recovery Attempts

Re-Sync running as indy user

  • Changed Container to run as indy user
  • Deleted all ledger data
  • Started Catchup at 2021-08-23 10:01:33
    • CPU usage: 27.91% of 4 CPU(s), i.e. using 1 CPU completely
    • Memory usage: starting at 2.5 GB (node not running) rising to about 6.5 GB of 8 GB
  • Last log message from CatchupRepService at 10:13:14
        "Freshness_status": {
          "0": {
            "Has_write_consensus": false,
            "Last_updated_time": "2021-08-23 08:01:54+00:00"
          },
          "1": {
            "Has_write_consensus": false,
            "Last_updated_time": "2021-08-23 08:01:38+00:00"
          },
          "2": {
            "Has_write_consensus": false,
            "Last_updated_time": "2021-08-23 08:01:54+00:00"
          }
  • Last_updated_time equals node restart time.

Running Old Container Image

        "Freshness_status": {
          "0": {
            "Has_write_consensus": true,
            "Last_updated_time": "2021-08-23 09:47:15+00:00"
          },
          "1": {
            "Has_write_consensus": true,
            "Last_updated_time": "2021-08-23 09:47:26+00:00"
          },
          "2": {
            "Has_write_consensus": true,
            "Last_updated_time": "2021-08-23 09:48:58+00:00"
          }

Node start was around 9:00 UTC:

        "Freshness_status": {
          "0": {
            "Has_write_consensus": true,
            "Last_updated_time": "2021-08-23 09:02:06+00:00"
          },
          "1": {
            "Has_write_consensus": true,
            "Last_updated_time": "2021-08-23 09:02:17+00:00"
          },
          "2": {
            "Has_write_consensus": true,
            "Last_updated_time": "2021-08-23 09:03:49+00:00"
          }
Select a repo