# Audit & Metrics Specs
## Audit feature
:::info
* DataNodes or Nodes (Tech): Abstract representation of a MetaModel.
* Flux (Job), Snapshot (Tech): Set of DataNodes produced by a pipeline run, usually with a target stock.
* Stock (Job/Tech): aggregation of the data nodes of the committed snapshots, keeping the most up-to-date data node when several share the same name.
* An Audit: Task that reports the contents of a new flux (set of data nodes produced by a pipeline run) that may be added to a stock.
* The audit will identify the intersection between the data nodes in a Stock and the data nodes in the audited snapshot (particularly useful for the PTAB flux, for instance); a sketch of this split follows this block.
* The audit will provide a report in table format (CSV or XLSX) that will be downloadable once the audit has finished.
* Here is the format of the [table](https://docs.google.com/spreadsheets/d/1ea8BKcNlqKQjdtZR-38IRkAiVqKd5kxhEMZ-fKvhiMQ/edit#gid=1206604242)
* The table will contain new and modified cases, separated
* either by sheets, if in XLSX,
* or by two separate tables in a single file.
* The audit could take some time; you should be able to view the progress of an audit when viewing a snapshot.
* A notification should be provided on the front end showing that an audit has been completed.
* The terms currently related to `delta computations` should be removed and replaced with terms related to the audit.
* The delta computation feature is to be removed and replaced by the audit feature.
* A snapshot has to be audited before it can be committed.
:::
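To make the new/modified split concrete, here is a minimal TypeScript sketch. It assumes that data nodes are identified by name and that any snapshot node already present in the stock is reported as modified; all type and function names are illustrative, not the actual MetaModel.

```typescript
// Hypothetical shape of a data node; only the name is used to detect the intersection.
interface DataNode {
  name: string;
  // ...other MetaModel fields
}

interface AuditSplit {
  newNodes: DataNode[];      // not present in the stock yet -> "new" cases
  modifiedNodes: DataNode[]; // intersection with the stock -> "modified" cases
}

// Partition the snapshot's data nodes against the names already aggregated in the stock.
function splitSnapshotAgainstStock(
  stockNodeNames: Set<string>,
  snapshotNodes: DataNode[],
): AuditSplit {
  const result: AuditSplit = { newNodes: [], modifiedNodes: [] };
  for (const node of snapshotNodes) {
    if (stockNodeNames.has(node.name)) {
      result.modifiedNodes.push(node);
    } else {
      result.newNodes.push(node);
    }
  }
  return result;
}
```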
:::warning
Comparison in iteration 2 (keep in mind):
* An SQL table would permit fast comparison.
* Might be interesting for PTAB.
* INPI has updates, so it would be beneficial.
* Do we really need it?
* NO
The audit has to be fast.
:::
:::success
* `POST /api/v1/stocks/{stockId}/diff` will be deprecated.
* New Endpoint `POST /api/v1/stocks/{stockId}/snapshots/{snapshotId}/audit`
* Also used to rerun an audit (a new audit is created instead).
* Generates the difference between the stock and the snapshot:
* Gets the lists of new nodes and modified nodes.
* Creates a new Audit Task entity -> similar behavior to the Zip Task entity
```json
{
"id" : GUID,
"snapshotId": GUID,
"triggerBlobPath": string,
"outputBlobPath": string,
"status": ENUM[Created, Running, Finished, Error_Trigger, Error_Runtime]
}
```
* Creates a new blob with the data nodes to audit, sorted into modified and new.
* This will launch the Audit Service
* New Service: Audit Function
* Two ways to implement it:
* A single function that proceeds iteratively -> fast development, but longer audit times and less fun to develop.
* Open each blob, retrieve its info, create a row, add the row to the table, repeat. Once done, save the table to a blob (a sketch of this approach follows this block).
* A durable orchestrated function, a "serverless MapReducer" -> more fun to develop and faster execution, but longer development time.
* Orchestrate multiple "Mappers" to retrieve each audit for a single data node
* Reduce mapper outputs into single table which will be saved as a blob.
* Only if we can
:::
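As a rough illustration of the first option (the single iterative function), here is a TypeScript sketch assuming Azure Blob Storage and the `@azure/storage-blob` SDK; the container names, the trigger payload, the row columns and the assumption that data nodes are stored as JSON are all illustrative, not the final design.

```typescript
import { BlobServiceClient } from "@azure/storage-blob";

// Assumed content of the trigger blob written by the audit endpoint.
interface AuditTrigger {
  snapshotId: string;
  newNodePaths: string[];      // blobs of data nodes not present in the stock
  modifiedNodePaths: string[]; // blobs of data nodes intersecting the stock
  outputBlobPath: string;      // where the resulting table is written
}

async function runAudit(trigger: AuditTrigger, connectionString: string): Promise<void> {
  const service = BlobServiceClient.fromConnectionString(connectionString);
  const nodes = service.getContainerClient("datanodes"); // assumed container name

  // Open each blob, retrieve its info, build one row, repeat.
  const buildRows = async (paths: string[], caseType: "new" | "modified") => {
    const rows: string[] = [];
    for (const path of paths) {
      const buffer = await nodes.getBlobClient(path).downloadToBuffer();
      const node = JSON.parse(buffer.toString("utf-8")); // data nodes assumed to be JSON
      rows.push([caseType, node.name, path].join(";"));  // illustrative columns only
    }
    return rows;
  };

  const table = [
    "case;name;blobPath",
    ...(await buildRows(trigger.newNodePaths, "new")),
    ...(await buildRows(trigger.modifiedNodePaths, "modified")),
  ].join("\n");

  // Once done, save the table where the Audit Task expects it.
  const output = service
    .getContainerClient("audits") // assumed container name
    .getBlockBlobClient(trigger.outputBlobPath);
  await output.upload(table, Buffer.byteLength(table));
}
```

The durable "serverless MapReducer" variant would replace the inner loop with one activity per data node, orchestrated by a durable function, plus a final activity reducing the rows into the same table.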
:::success
* New Endpoint `GET /api/v1/stocks/{stockId}/snapshots/{snapshotId}/audit`
* Downloads the table of the audited snapshot.
:::
:::success
* Add the status of the audit to the snapshot response. If there is no audit, the status will be null (see the sketch below).
:::
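For illustration, the snapshot response could expose the audit status as a nullable field, roughly as below; only `status` and its values come from the Audit Task entity above, the rest of the shape is an assumption.

```typescript
type AuditStatus = "Created" | "Running" | "Finished" | "Error_Trigger" | "Error_Runtime";

// Hypothetical snapshot response once the audit status is added.
interface SnapshotResponse {
  id: string;
  stockId: string;
  audit: { status: AuditStatus } | null; // null when the snapshot has not been audited yet
}
```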
## Metric feature
* The Metrics feature corresponds only to the evolution of the stock metrics (no competitor comparison / README).
:::info
* Delivery: Process representing the contents and the state of the data to be delivered to a client.
* Delivery States (a sketch of the state encoding follows this block):
* Created: new Delivery with the stocks of the previous delivery already bound.
* Validated awaiting Metrics (Tech: Started): the Delivery has been checked and can now proceed to the next states (packaging and sending).
* System check: all fluxes have been committed or aborted.
* Validated with generated Metrics (Tech: Started and metrics entities):
* The Metrics report has been generated.
* A README has been generated / uploaded*
* Zip in progress: Delivery packaging is in progress (delivery can be restarted).
* Zip finished: Delivery packaging is finished (delivery can be restarted).
* Transfer in progress: Delivery is being transferred to the client by FTP* (delivery can be restarted).
* Transfer finished: Delivery is in the hands of the client (delivery can be restarted)
* Finished: Delivery has been closed by the User (No changes possible to the delivery). A new Delivery is created.
* Once the user is ready to validate a Delivery, they will be able to press the `Validate and generate Metrics` button.
* A Metrics table will be generated.
* This is how [it would look](https://docs.google.com/spreadsheets/d/1ea8BKcNlqKQjdtZR-38IRkAiVqKd5kxhEMZ-fKvhiMQ/edit#gid=1332823129)
* The Metrics table will be downloadable from the front end once it has been generated (the Delivery has to be validated).
* You should be able to follow the generation of the metrics on the front end.
* The Metrics table will be packaged alongside the data at the root of the zip.
:::
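One possible encoding of this lifecycle in TypeScript, kept as a sketch: the state names follow the list above, but the string values and the restartable set are assumptions.

```typescript
type DeliveryState =
  | "Created"
  | "Validated_Awaiting_Metrics" // Tech: Started
  | "Validated_With_Metrics"     // Tech: Started + metrics entities
  | "Zip_In_Progress"
  | "Zip_Finished"
  | "Transfer_In_Progress"
  | "Transfer_Finished"
  | "Finished";                  // closed by the user, a new Delivery is created

// States in which the delivery can be restarted, per the notes above.
const RESTARTABLE_STATES: ReadonlySet<DeliveryState> = new Set<DeliveryState>([
  "Zip_In_Progress",
  "Zip_Finished",
  "Transfer_In_Progress",
  "Transfer_Finished",
]);
```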
:::warning
The user should be able to cancel:
* Metrics generation
* Zip generation
* Transfer
* in other words, any long-running processing.
:::
:::success
* Metrics Entity
```json
{
  "id": GUID,
  "stockId": GUID,
  "deliveryId": GUID,
  "triggerBlobPath": string,
  "outputBlobPath": string,
  "status": ENUM[Created, Running, Finished, Trigger_Error, Runtime_Error]
}
```
* Start Delivery endpoint to be changed.
* Gets the snapshots committed in this delivery and the audit task related to each snapshot.
* Snapshots are to be grouped by their stocks.
* Gets the previous metrics (the Metrics from the most recent finished delivery containing the stock).
* Creates a Metrics task (Zip-task-like) per stock to be delivered; the metrics tasks are triggered by a blob trigger.
* The blob will contain (see the sketch after this block):
* The Previous Metrics Blob path
* The Paths to the audits of the commited snapshots
* A Metrics task will:
* Compute metric values from the given audits.
* Copy the content of the previous Metrics and complete it with the metrics computed from the audits.
* Endpoint to rerun metrics generation for delivery X and stock Y.
* If the metrics generation failed, the user will be able to restart the metrics generation of a stock.
:::
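As a sketch of the metrics generation step, assuming the trigger blob is JSON and that the tracked metrics are per-delivery counts of new and modified cases; all names and fields below are illustrative.

```typescript
// Assumed content of the blob that triggers a Metrics task for one stock.
interface MetricsTrigger {
  stockId: string;
  deliveryId: string;
  previousMetricsBlobPath: string | null; // null for the first delivery of a stock
  auditBlobPaths: string[];               // audits of the snapshots committed in this delivery
  outputBlobPath: string;
}

interface MetricsRow {
  deliveryId: string;
  newCases: number;
  modifiedCases: number;
}

// Copy the previous metrics and append the row computed from the current audits.
function buildMetrics(
  previous: MetricsRow[],
  audits: { newCases: number; modifiedCases: number }[],
  deliveryId: string,
): MetricsRow[] {
  const current: MetricsRow = {
    deliveryId,
    newCases: audits.reduce((sum, a) => sum + a.newCases, 0),
    modifiedCases: audits.reduce((sum, a) => sum + a.modifiedCases, 0),
  };
  return [...previous, current];
}
```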
## Readme feature
:::info
* The README has a fixed format defined in the backend.
* The user will add a changelog to each delivery before validating it.
:::