owned this note
owned this note
Published
Linked with GitHub
---
tags: ID-Union, Indy, Debugging, SSI
---
# GS1 ID Union Node out of (write) consensus debugging
Useful ressources:
- https://github.com/hyperledger/indy-node/blob/master/docs/source/troubleshooting.md
## Issue
The ID Union Test Net Monitoring Dashboard: http://18.158.85.64:3000/d/F4JHZlinz/idunion?orgId=1&refresh=10s
shows `"GS1Germany" Consensus Status 3.0000`.
## Analysis
The above figure "3" is essentially obtained from the validator info (running `ledger get-validator-info` through a suitably connected and authorized indy-cli) as the count of
`["GS1Germany"]["data"]["Node_info"]["Freshness_status"][]["Has_write_consensus"] === false`
The ledger info [field is set](https://github.com/hyperledger/indy-plenum/blob/master/plenum/server/validator_info_tool.py#L558) by comparing the [last update time](https://github.com/hyperledger/indy-plenum/blob/5960118788c1a580cd271c50b920ac15b1497495/plenum/server/replica_freshness_checker.py#L64) to a fixed timeout.
The fields `["GS1Germany"]["data"]["Node_info"]["Freshness_status"][]["Last_updated_time"]` also show that no updates are happening (timestamps are close to the node startup time).
- 2021-08-23 update: Confirmed. `Last_updated_time` equals the node start time
[Apparently timestamps are not updated](https://github.com/hyperledger/indy-plenum/blob/5960118788c1a580cd271c50b920ac15b1497495/plenum/server/replica_freshness_checker.py#L52) as expected, [which means](https://github.com/hyperledger/indy-plenum/blob/5960118788c1a580cd271c50b920ac15b1497495/plenum/server/consensus/ordering_service.py#L1489) [order_3pc_keys](https://github.com/hyperledger/indy-plenum/blob/5960118788c1a580cd271c50b920ac15b1497495/plenum/server/consensus/ordering_service.py#L1482) is not happening.
This should happen at [process_commit](https://github.com/hyperledger/indy-plenum/blob/5960118788c1a580cd271c50b920ac15b1497495/plenum/server/consensus/ordering_service.py#L452) or [_do_commit](https://github.com/hyperledger/indy-plenum/blob/5960118788c1a580cd271c50b920ac15b1497495/plenum/server/consensus/ordering_service.py#L1438). Indeed, none of the nearby log msgs shows up in the logs. The call stack of _do_commit branches again in _add_to_prepares.
### Call Stack Snipped:
![](https://i.imgur.com/dxHVsoX.png)
### Possibly Related Log Messages
Instance change request:
```
2021-08-16 14:36:13,532|INFO|view_change_trigger_service.py|VIEW CHANGE: GS1Germany sending an instance change with view_no 8055 since State signatures are not updated for too long
2021-08-16 14:36:13,533|TRACE|replica.py|GS1Germany:0 sending INSTANCE_CHANGE{'viewNo': 8055, 'reason': 43}
```
Declined transaction:
- `'type': '119'` = [VALIDATOR_INFO](https://hyperledger-indy.readthedocs.io/projects/node/en/latest/requests.html#validator-info)
```
2021-08-16 14:37:01,559|TRACE|node.py|GS1GermanyC received CLIENT message: SafeRequest: {'reqId': 1629124621461476699, 'operation': {'type': '119'}, 'identifier': 'HHHzGUN4E9xJXrjgDJfsn9', 'signature': '4NFepuxxLg1w3P7EpQWKXW1aDJBdAvSLF3zFBNetiNzAsZM1GzfTgwfnzXdAG6UGeEfuKLF2eSUzwrfNHyKsaL4i', 'protocolVersion': 2}
2021-08-16 14:37:01,559|DEBUG|node.py|GS1GermanyC processing b'IndS08NWoXA+wXfLSmJGzGMyatALUpZTwdl7qQwk5Hw=' request SafeRequest: {'reqId': 1629124621461476699, 'operation': {'type': '119'}, 'identifier': 'HHHzGUN4E9xJXrjgDJfsn9', 'signature': '4NFepuxxLg1w3P7EpQWKXW1aDJBdAvSLF3zFBNetiNzAsZM1GzfTgwfnzXdAG6UGeEfuKLF2eSUzwrfNHyKsaL4i', 'protocolVersion': 2}
2021-08-16 14:37:01,559|DEBUG|node.py|GS1Germany received client request: SafeRequest: {'reqId': 1629124621461476699, 'operation': {'type': '119'}, 'identifier': 'HHHzGUN4E9xJXrjgDJfsn9', 'signature': '4NFepuxxLg1w3P7EpQWKXW1aDJBdAvSLF3zFBNetiNzAsZM1GzfTgwfnzXdAG6UGeEfuKLF2eSUzwrfNHyKsaL4i', 'protocolVersion': 2} from b'IndS08NWoXA+wXfLSmJGzGMyatALUpZTwdl7qQwk5Hw='
2021-08-16 14:37:01,560|TRACE|client_message_provider.py|GS1GermanyC transmitting b'{"op":"REQACK","identifier":"HHHzGUN4E9xJXrjgDJfsn9","reqId":1629124621461476699}' to b'IndS08NWoXA+wXfLSmJGzGMyatALUpZTwdl7qQwk5Hw=' through listener socket
2021-08-16 14:37:01,561|DEBUG|auth_cons_strategies.py|Using auth constraint from state
2021-08-16 14:37:01,561|TRACE|authorizer.py|Validation error with reason: Not enough TRUSTEE signatures
2021-08-16 14:37:01,561|TRACE|authorizer.py|Validation error with reason: Not enough STEWARD signatures
2021-08-16 14:37:01,561|DEBUG|validator_info_handler.py|Transaction 1629124621461476699 with type 119 started
2021-08-16 14:37:01,714|DEBUG|validator_info_handler.py|Transaction 1629124621461476699 with type 119 finished
```
Declined read transactions:
- `'type' : '3'` = [GET_TXN](https://hyperledger-indy.readthedocs.io/projects/node/en/latest/requests.html#get-txn)
```
2021-08-16 14:37:12,575|TRACE|node.py|GS1GermanyC received CLIENT message: SafeRequest: {'reqId': 1629124632129881655, 'operation': {'type': '3', 'data': 22, 'ledgerId': 0}, 'identifier': 'LibindyDid111111111111', 'protocolVersion': 2}
2021-08-16 14:37:12,575|DEBUG|node.py|GS1GermanyC processing b'awtlxY3kmlyxLbj7CHGmc75X9ZMFvIzJqARWaNJzrVE=' request SafeRequest: {'reqId': 1629124632129881655, 'operation': {'type': '3', 'data': 22, 'ledgerId': 0}, 'identifier': 'LibindyDid111111111111', 'protocolVersion': 2}
2021-08-16 14:37:12,575|DEBUG|node.py|GS1Germany received client request: SafeRequest: {'reqId': 1629124632129881655, 'operation': {'type': '3', 'data': 22, 'ledgerId': 0}, 'identifier': 'LibindyDid111111111111', 'protocolVersion': 2} from b'awtlxY3kmlyxLbj7CHGmc75X9ZMFvIzJqARWaNJzrVE='
2021-08-16 14:37:12,576|TRACE|client_message_provider.py|GS1GermanyC transmitting b'{"op":"REQACK","identifier":"LibindyDid111111111111","reqId":1629124632129881655}' to b'awtlxY3kmlyxLbj7CHGmc75X9ZMFvIzJqARWaNJzrVE=' through listener socket
2021-08-16 14:37:12,576|DEBUG|get_txn_handler.py|<plenum.server.request_handlers.get_txn_handler.GetTxnHandler object at 0x7fc153bc5da0> can not handle GET_TXN request: ledger doesn't have txn with seqNo=22
2021-08-16 14:37:12,576|TRACE|client_message_provider.py|GS1GermanyC transmitting b'{"op":"REPLY","result":{"identifier":"LibindyDid111111111111","reqId":1629124632129881655,"type":"3","data":null}}' to b'awtlxY3kmlyxLbj7CHGmc75X9ZMFvIzJqARWaNJzrVE=' through listener socket
2021-08-16 14:37:12,624|TRACE|zstack.py|GS1GermanyC got 1 messages through listener
2021-08-16 14:37:12,625|DEBUG|node.py|GS1Germany authenticated set() signature on request 1629124632518629063
2021-08-16 14:37:12,625|TRACE|node.py|GS1GermanyC received CLIENT message: SafeRequest: {'reqId': 1629124632518629063, 'operation': {'type': '3', 'data': 6268, 'ledgerId': 1}, 'identifier': 'LibindyDid111111111111', 'protocolVersion': 2}
2021-08-16 14:37:12,625|DEBUG|node.py|GS1GermanyC processing b'awtlxY3kmlyxLbj7CHGmc75X9ZMFvIzJqARWaNJzrVE=' request SafeRequest: {'reqId': 1629124632518629063, 'operation': {'type': '3', 'data': 6268, 'ledgerId': 1}, 'identifier': 'LibindyDid111111111111', 'protocolVersion': 2}
2021-08-16 14:37:12,625|DEBUG|node.py|GS1Germany received client request: SafeRequest: {'reqId': 1629124632518629063, 'operation': {'type': '3', 'data': 6268, 'ledgerId': 1}, 'identifier': 'LibindyDid111111111111', 'protocolVersion': 2} from b'awtlxY3kmlyxLbj7CHGmc75X9ZMFvIzJqARWaNJzrVE='
2021-08-16 14:37:12,626|TRACE|client_message_provider.py|GS1GermanyC transmitting b'{"op":"REQACK","identifier":"LibindyDid111111111111","reqId":1629124632518629063}' to b'awtlxY3kmlyxLbj7CHGmc75X9ZMFvIzJqARWaNJzrVE=' through listener socket
2021-08-16 14:37:12,626|DEBUG|get_txn_handler.py|<plenum.server.request_handlers.get_txn_handler.GetTxnHandler object at 0x7fc153bc5da0> can not handle GET_TXN request: ledger doesn't have txn with seqNo=6268
2021-08-16 14:37:12,626|TRACE|client_message_provider.py|GS1GermanyC transmitting b'{"op":"REPLY","result":{"identifier":"LibindyDid111111111111","reqId":1629124632518629063,"type":"3","data":null}}' to b'awtlxY3kmlyxLbj7CHGmc75X9ZMFvIzJqARWaNJzrVE=' through listener socket
```
## Open Questions / Next Steps
Figure out
- when the ledger update should happen
- if ledger(s) are actually behind the global state
## Just some notes:
Looking into the logfiles I noticed the following:
![](https://i.imgur.com/vbUARAO.png)
and the number of missing pre-prepares then keeps growing in the course of the logs:
![](https://i.imgur.com/FnCp8fj.png)
# Recovery Attempts
## Re-Sync running as indy user
- [Changed Container to run as indy user](https://github.com/IDunion/indy-node-container/pull/34)
- Deleted all ledger `data`
- Started Catchup at `2021-08-23 10:01:33`
- CPU usage: 27.91% of 4 CPU(s), i.e. using 1 CPU completely
- Memory usage: starting at 2.5 GB (node not running) rising to about 6.5 GB of 8 GB
- Last log message from `CatchupRepService` at `10:13:14`
```
"Freshness_status": {
"0": {
"Has_write_consensus": false,
"Last_updated_time": "2021-08-23 08:01:54+00:00"
},
"1": {
"Has_write_consensus": false,
"Last_updated_time": "2021-08-23 08:01:38+00:00"
},
"2": {
"Has_write_consensus": false,
"Last_updated_time": "2021-08-23 08:01:54+00:00"
}
```
- `Last_updated_time` equals node restart time.
## Running Old Container Image
- https://github.com/Echsecutor/docker-indy-node
- in particular using deb packages for node instead of pypi
- no data re-sync -> issue persists!
- data re-sync
- Freshness *does* get updated!
```
"Freshness_status": {
"0": {
"Has_write_consensus": true,
"Last_updated_time": "2021-08-23 09:47:15+00:00"
},
"1": {
"Has_write_consensus": true,
"Last_updated_time": "2021-08-23 09:47:26+00:00"
},
"2": {
"Has_write_consensus": true,
"Last_updated_time": "2021-08-23 09:48:58+00:00"
}
```
Node start was around 9:00 UTC:
```
"Freshness_status": {
"0": {
"Has_write_consensus": true,
"Last_updated_time": "2021-08-23 09:02:06+00:00"
},
"1": {
"Has_write_consensus": true,
"Last_updated_time": "2021-08-23 09:02:17+00:00"
},
"2": {
"Has_write_consensus": true,
"Last_updated_time": "2021-08-23 09:03:49+00:00"
}
```