
# Sanity check

Computation Trigger:

- On population data change
- On analysis parameters change

```json
SanityCheck {
    "name": "Short description",
    "population_filter": "EXIT_REASON",
    "number_of_failed_individual": 2, // Depends on the filter used; for example, on exit reason, you only count the individuals who failed sanity on this filter
    "number_of_checked_individual": 9, // Depends on the filter used, a sub-population (as above)
    "ratio_failed": 0.22, // Float to display as percentage
    "fail_in_value": 200.00, // Float to display the total pension amount
    "ratio_failed_in_value": 0.27, // Float to display as percentage weighted by pension amount
    "approver_id": 1, // Foreign key to user that can be NULL
    "status": "Approved|Failed|Success",
    "description": "Total Pension of dependant is zero",
    "suggestion": "Remove Record", // What to do on fail, how can we fix this
    "analysis_id": analysis_1, // Foreign key to Analysis
    "comment": "Text explaining why something was approved"
}
```

:warning: The Mortality table does not hold the gender, so the sanity check on the individual mortality table gender needs an update.

Intermediate table

```json
SanityCheckIndividual {
    "sanity_check_id": sanity_check_1, // Foreign key to sanity check
    "individual_id": individual_1 // Foreign key to individual who failed the sanity check
}
```

# Processing report

```json
ProcessingReportItem {
    "row_number": 123, // Row in the original file on which the data is
    "ref": "def345", // Value of ref field on the row
    "unique_id": "345", // Value of unique_id field on the row
    "level": "WARNING", // WARNING = data was dropped, INFO = data was kept but is weird
    "field": "Gender", // Field that has wrong data format
    "message": "Expected 'M' or 'F', received 'U'.", // What is wrong with the data
    "analysis_id": analysis_1 // Foreign key to Analysis
}
```

# Architecture design

## Procedural

Once the data processing is done, we launch the sanitization.

**Pros:**

- We are certain to address all sanitization
- The overall data processing is ready once the sanitization is done
- Aggregation on premise

**Cons:**

- We have shared data between the processing subdomain and the sanitization subprocess
- We have to process all the population individuals and iterate over them (increases the execution time of the whole process)
- We have to make sure that any error in the sanitization process won't stop the overall process

## Event Driven way

**:warning: This design is interesting if we want to have sanitization at the data processing level. :warning:**

Each time an individual is processed we emit an event `IndividualProcessed` on a bus of our own.

**Pros:**

- Separation of concerns
- Isolation
- Transparency
- No shared data (population is not needed)
- No need to iterate on population individuals
- Easy mental model

**Cons:**

- The event consumer cannot be executed on the main thread. Doc [here](https://doc.rust-lang.org/book/ch16-01-threads.html)
- The sanitization results will only be ready eventually (not really a con, but something we have to keep in mind)
- We have to implement the event bus
- Aggregation on demand?

## Decision

We choose the procedural design as it is better suited to our current needs.
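
As a rough illustration of the procedural design, the sketch below makes a single pass over the already-processed population and fills in the numeric fields of `SanityCheck`, plus the ids that would go into `SanityCheckIndividual`. Rust is assumed because the note links to the Rust book. The `Individual` struct, the `SanityCheckResult` struct, and the `run_sanity_check` function are hypothetical names, not existing code. The sketch also assumes that `ratio_failed_in_value` is the failed pension amount divided by the total pension amount of the checked sub-population.

```rust
/// Hypothetical, simplified individual: only the fields this sketch needs.
#[derive(Debug, Clone)]
struct Individual {
    id: u32,
    exit_reason: Option<String>,
    pension_amount: f64,
}

/// Hypothetical result type mirroring the numeric fields of `SanityCheck`.
#[derive(Debug)]
struct SanityCheckResult {
    number_of_failed_individual: usize,
    number_of_checked_individual: usize,
    ratio_failed: f64,
    fail_in_value: f64,
    ratio_failed_in_value: f64,
    /// Ids that would be stored as `SanityCheckIndividual` rows.
    failed_individual_ids: Vec<u32>,
}

/// Runs one sanity check over the population in a single pass.
/// `in_scope` plays the role of `population_filter`; `fails` is the rule itself.
fn run_sanity_check<F, R>(population: &[Individual], in_scope: F, fails: R) -> SanityCheckResult
where
    F: Fn(&Individual) -> bool,
    R: Fn(&Individual) -> bool,
{
    let mut checked = 0usize;
    let mut checked_value = 0.0f64;
    let mut fail_in_value = 0.0f64;
    let mut failed_individual_ids = Vec::new();

    // Only the sub-population selected by the filter is counted as "checked".
    for individual in population.iter().filter(|i| in_scope(*i)) {
        checked += 1;
        checked_value += individual.pension_amount;
        if fails(individual) {
            fail_in_value += individual.pension_amount;
            failed_individual_ids.push(individual.id);
        }
    }

    // Guard against empty sub-populations and zero totals to avoid NaN ratios.
    let ratio_failed = if checked == 0 {
        0.0
    } else {
        failed_individual_ids.len() as f64 / checked as f64
    };
    // Assumption: the weighted ratio uses the checked sub-population total as denominator.
    let ratio_failed_in_value = if checked_value == 0.0 {
        0.0
    } else {
        fail_in_value / checked_value
    };

    SanityCheckResult {
        number_of_failed_individual: failed_individual_ids.len(),
        number_of_checked_individual: checked,
        ratio_failed,
        fail_in_value,
        ratio_failed_in_value,
        failed_individual_ids,
    }
}
```

A call such as `run_sanity_check(&population, |i| i.exit_reason.is_some(), |i| i.pension_amount == 0.0)` would restrict the check to individuals with an exit reason and flag those with a zero pension. Because the function only reads the population, a failure inside one rule can be handled by the caller without touching the processed data, which is one way to address the last con listed for the procedural design.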