## Background

Traditionally, Pantavisor reports update progress with a message in the following format:

```
{
    "status" : "WONTGO",
    "status-msg" : "State signatures cannot be verified",
    "progress" : 0
}
```

This JSON is used by both the **local** and the **remote** experiences:

* First, it is stored in `/storage/trails/<revision>/.pv/progress`.
* This stored JSON can be accessed using [pvcontrol](https://docs.pantahub.com/pantavisor-commands/#steps):

```
$ pvcontrol steps show-progress current
{
    "status" : "DONE",
    "status-msg" : "Update finished, revision set as rollback point",
    "progress" : 100
}
```

* It is also [sent to Pantacor Hub](https://docs.pantahub.com/pantahub-base/trails/#trails) and displayed on the [device dashboard](https://docs.pantahub.com/ph-device-dashboard/).
* Additionally, the local experience performs further validations that can prevent the revision from being stored if an error is detected early. This only applies to WONTGO errors:

```
pvr post
Error: State verification has failed
```

The current messages are not very informative, so this document proposes several improvements to make error reporting more solid.

The statuses this proposal aims to improve are:

* WONTGO: JSON format or signature errors. In the local experience, these are not always stored in `/storage`.
* ERROR: corrupted state, platform startup error, Hub not responding, etc. Always stored in `/storage`.

## Proposal

### Improving the progress JSON status message

The status messages are too generic, so one simple way to greatly improve update feedback is to make them more informative. For example, in the case of a signature failure, we currently get:

```
{
    "status" : "WONTGO",
    "status-msg" : "State signatures cannot be verified",
    "progress" : 0
}
```

However, this does not tell us what actually went wrong under the hood.
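Because the progress JSON is plain JSON, a local tool (for example, a script consuming the output of `pvcontrol steps show-progress current`) can only branch on `status` and an opaque message. As a minimal sketch of what more informative messages enable, the hypothetical Python helper below (`parse_progress` and `Progress` are illustrative names, not part of Pantavisor or pvcontrol) parses a progress document and splits `status-msg` into a component and a detail. It assumes the `<Component>: <detail>` convention proposed in this section; messages without that prefix, like today's generic ones, yield no component.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Progress:
    status: str               # e.g. "DONE", "WONTGO", "ERROR"
    component: Optional[str]  # e.g. "Secureboot"; None for generic messages
    detail: str               # status-msg without the component prefix
    progress: int             # 0 to 100


def parse_progress(raw: str) -> Progress:
    """Parse a progress JSON document (hypothetical helper).

    Assumes the proposed "<Component>: <detail>" convention for status-msg.
    Messages without the ": " separator are treated as having no component.
    """
    doc = json.loads(raw)
    msg = doc.get("status-msg", "")
    component, sep, detail = msg.partition(": ")
    if not sep:
        # No prefix: today's generic messages end up here.
        return Progress(doc["status"], None, msg, int(doc.get("progress", 0)))
    return Progress(doc["status"], component, detail, int(doc.get("progress", 0)))
```

With this helper, `Secureboot: Signature validation failed` yields the component `Secureboot`, while `State signatures cannot be verified` yields none. That missing component is exactly the information gap described above.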
More information could be supplied in the `status-msg` field:

```
{
    "status": "WONTGO",
    "status-msg": "Secureboot: State not fully covered by signatures",
    "progress": 0
}
```

Or, in the case of a SHA validation error:

```
{
    "status": "WONTGO",
    "status-msg": "Secureboot: Signature validation failed",
    "progress": 0
}
```

#### WONTGO status-msg improvements

| Before | After |
| ------ | ----- |
| State signatures cannot be verified | Secureboot: State not fully covered by signatures |
| State signatures cannot be verified | Secureboot: Signature validation failed |
| State signatures cannot be verified | Secureboot: Internal error |
| State cannot be parsed | Parser: State JSON has bad format |
| Update aborted | PH Client: Update aborted |
| Unable to download and/or install update | PH Client: Max download retries reached |
| Space required X B, available Y B | PH Client: Space required X B, available Y B |

#### ERROR status-msg improvements

| Before | After |
| ------ | ----- |
| Error during update | Secureboot: State not fully covered by signatures |
| Error during update | Secureboot: Signature validation failed |
| Error during update | Secureboot: Internal error |
| Error during update | Checksum: Error during update |
| Error during update | Container: A container could not be started |
| Error during update | Container: Status goal not reached |
| Error during update | PH Client: Hub not reachable |
| Error during update | PH Client: Hub communication not stable |
| Error during update | Pantavisor: Internal error |

### Adding new information into progress JSON

Furthermore, ERROR logs could be filtered and added to the progress JSON as a `log` array, using the same entry format as the log server single-file output:

```
{
    "status": "WONTGO",
    "status-msg": "Secureboot: Uncovered files in strict mode",
    "log": [
        {"plat": "pantavisor", "tsec": 8959, "lvl": "ERROR", "src": "signature", "msg": "_config/pvr-sdk/etc/pvr-sdk/config.json is not covered by any signature"},
        {"plat": "pantavisor", "tsec": 8959, "lvl": "ERROR", "src": "signature", "msg": "pvr-sdk/lxc.container.conf is not covered by any signature"},
        {"plat": "pantavisor", "tsec": 8959, "lvl": "ERROR", "src": "signature", "msg": "pvr-sdk/root.squashfs is not covered by any signature"},
        {"plat": "pantavisor", "tsec": 8959, "lvl": "ERROR", "src": "signature", "msg": "pvr-sdk/root.squashfs.docker-digest is not covered by any signature"},
        {"plat": "pantavisor", "tsec": 8959, "lvl": "ERROR", "src": "signature", "msg": "pvr-sdk/run.json is not covered by any signature"},
        {"plat": "pantavisor", "tsec": 8959, "lvl": "ERROR", "src": "signature", "msg": "pvr-sdk/src.json is not covered by any signature"},
        {"plat": "pantavisor", "tsec": 8959, "lvl": "ERROR", "src": "signature", "msg": "not all state elements were covered"},
        {"plat": "pantavisor", "tsec": 8959, "lvl": "ERROR", "src": "storage", "msg": "Could not verify state json signatures"},
        {"plat": "pantavisor", "tsec": 8959, "lvl": "ERROR", "src": "ctrl", "msg": "state verification went wrong"}
    ],
    "progress": 0
}
```

Traditionally, a confusing aspect of reading logs has been that an error belonging to one revision could end up in the log file of a different revision: the update runs while the current revision is active, not the new revision that actually fails.

To address this, we would implement a new log server sink that exists only for the duration of an update. It would store all ERROR messages belonging to the update in `/storage/trails/<update-revision>/.pv/progress` (or in a different file in the same path), while the rest of the configured sinks keep sending their logs to `/storage/logs/current`. The new sink would collect all ERROR messages, including those from containers, so that container startup failures are captured too.

To limit the disk space this new field can take, we could add a new [configuration](https://docs.pantahub.com/pantavisor-configuration/#at-compile-time) key specifying its maximum size.
If the key is not specified, the default value will be **4KB**:

```
storage.trails.
```

Note that this new data can be removed along with its revision by the [garbage collector](https://docs.pantahub.com/storage/#garbage-collector).

### Improving the pvr feedback

Because `pvr post` can receive some errors in local mode before the step is stored, it can reuse the same contents as the `status-msg` and `log` fields, going from this:

```
pvr post
Error: State verification has failed
```

To this:

```
pvr post
Error: Uncovered files in strict mode
[pantavisor] 18667 ERROR -- [signature]: _config/pvr-sdk/etc/pvr-sdk/config.json is not covered by any signature
[pantavisor] 18667 ERROR -- [signature]: pvr-sdk/lxc.container.conf is not covered by any signature
[pantavisor] 18667 ERROR -- [signature]: pvr-sdk/root.squashfs is not covered by any signature
[pantavisor] 18667 ERROR -- [signature]: pvr-sdk/root.squashfs.docker-digest is not covered by any signature
[pantavisor] 18667 ERROR -- [signature]: pvr-sdk/run.json is not covered by any signature
[pantavisor] 18667 ERROR -- [signature]: pvr-sdk/src.json is not covered by any signature
[pantavisor] 18667 ERROR -- [signature]: not all state elements were covered
[pantavisor] 18667 ERROR -- [storage]: Could not verify state json signatures
[pantavisor] 18667 ERROR -- [ctrl]: state verification went wrong
```

## Implementation Plan

1. Improve WONTGO status-msg:
    * Secureboot.
    * State JSON format.
    * Progress JSON pvr feedback.
2. Improve ERROR status-msg:
    * Secureboot.
    * Container startup.
    * Pantacor Hub.
3. Update log sink for the log server:
    * Config: size limit.
    * Filter ERROR messages in pre-update.
    * Filter ERROR messages in post-update.
    * Progress JSON pvr feedback.
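The proposed pvr output in "Improving the pvr feedback" can be derived from the progress JSON alone. A minimal sketch in Python with a hypothetical function name (`format_pvr_error`); this is not pvr code. It renders a progress document that carries the proposed `log` field into that layout. The layout is copied from the example, including dropping the `<Component>: ` prefix from the first line; whether pvr should keep that prefix is an open assumption.

```python
from typing import Any, Dict, List


def format_pvr_error(progress: Dict[str, Any]) -> str:
    """Render a progress document in the proposed `pvr post` error layout.

    Hypothetical helper: the first line mirrors status-msg without its
    "<Component>: " prefix, then one line per entry of the "log" array.
    """
    msg = progress.get("status-msg", "")
    _, sep, detail = msg.partition(": ")
    lines: List[str] = [f"Error: {detail if sep else msg}"]
    for entry in progress.get("log", []):
        # Layout: [plat] tsec lvl -- [src]: msg
        lines.append(
            f"[{entry['plat']}] {entry['tsec']} {entry['lvl']} -- "
            f"[{entry['src']}]: {entry['msg']}"
        )
    return "\n".join(lines)
```

Documents without a `log` field, such as the ones Pantavisor produces today, still render as a single `Error:` line, so the same formatting path covers both the old and the proposed progress JSON.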