# Accuracy Argument Pattern

:::info
ℹ️ **About this Document**

This is a work-in-progress argument for the second natural environment workshop for the TEA-DT project. Please read the instructions below before reviewing the content of the argument pattern.
:::

## Instructions

1. Download the [assurance case template](https://github.com/alan-turing-institute/AssurancePlatform/blob/main/examples/Accuracy%20(TEA-DT%20Natural%20Environment%20Workshop)-2024-5-20T11-1-15.json) to your computer.
2. Go to the Trustworthy and Ethical Assurance platform: https://staging-assuranceplatform.azurewebsites.net/
   - Create an account if you do not already have one. NB: the Google and GitHub sign-in features are not live yet.
   - Import the template file to get started (from the platform's dashboard).
3. Review the goal and context for this pattern (see below).
4. Discuss the questions. If you answer "no" to either of the questions, please make adjustments as required.
5. Review the strategies and property claims (see below) for this pattern.
6. Discuss the questions and make revisions as required.

:::success
📝 **Cheat Sheet**

A cheat sheet with descriptions of each of the core elements, as well as some general tips for developing clear and concise assurance cases, is available.

👉 [Access Cheat Sheet](https://hackmd.io/@tea-platform/BJyX12aSA)
:::

## Goal and Context

The *goal claim* adopted for this pattern is as follows:

> The digital twin provides sufficiently accurate representations across all operational use cases.

The *context* for this pattern is as follows:

> Use cases related to the natural environment (e.g. digital twins of weather systems, ocean currents, ice drift, and so on).

Some more specific context claims to consider:

- **C1:** Description of the digital twin system and its purpose.
- **C2:** Operational environment and intended use cases.
- **C3:** Applicable regulations and standards for safety and security.
- **C4:** Assumptions about external systems and user interactions.

:::warning
❓ **Questions**

1. Is the goal claim clearly and appropriately specified?
2. Does the context statement capture all intended use cases?
:::

## Strategies and Property Claims

### Strategy 1: Argument Over Data Sources and Integration

- **Property Claims**:
  1. The data used by the digital twin have sufficient quality.
     - The data used by the digital twin have sufficient accuracy.
       - The digital twin uses data from reliable sources.
         - {Information source 1} produces reliable data.
         - {Information source N} produces reliable data.
     - The data used by the digital twin are sufficiently complete.
       - The imputation method for dealing with missingness is appropriate.
       - Regular audits are conducted to ensure data completeness.
     - The data used by the digital twin are unique.
       - The digital twin has mechanisms to detect and prevent data redundancy.
     - The data used by the digital twin have sufficient timeliness.
       - The digital twin gathers data in {real-time / near real-time / just-in-time}.
     - The data used by the digital twin are consistent.
       - Data are standardised across all systems and sources used by the digital twin.
     - The data used by the digital twin are valid.
       - Data conform to predefined rules, formats, and standards.
       - The digital twin only accepts data that meet validity criteria.
  2. Data integration methods are appropriate for the context of use.
     - Data normalisation methods remove undesirable redundancy.
     - Data harmonisation methods use suitable conversion standards or ontologies.
  3. Information about the data pipeline is suitably communicated.
     - Technical documentation about the data pipeline is complete and accurate.
     - Technical documentation about the data pipeline is accessible to all relevant parties.
       - Assumption: all relevant parties have been identified and have been granted access.
       - Assumption: all relevant parts of the data pipeline have been documented accurately.
- **Rationale**: Ensuring the accuracy of the digital twin involves verifying the quality and integration of diverse data sources.

### Strategy 2: Argument Over Utility

- **Property Claims**:
  1. The digital twin has sufficient utility in {use case 1}.
     - The features used by the system are suitable for the physical system being modelled.
     - The functional outputs of the digital twin enable actionable insights.
     - The digital twin has sufficient utility in all intended operational conditions.
       - The digital twin provides accurate representations under normal conditions.
       - The digital twin provides accurate representations under extreme conditions.
       - The digital twin provides accurate representations under transitional conditions (e.g. changing seasons).
     - The purpose of the system has been clearly formulated and documented.
     - All intended operational conditions have been specified.
     - Limitations of the system have been clearly communicated.
  2. The digital twin has sufficient utility in {use case N}.
     - The features used by the system are suitable for the physical system being modelled.
     - The functional outputs of the digital twin enable actionable insights.
     - The digital twin has sufficient utility in all intended operational conditions.
       - The digital twin provides accurate representations under normal conditions.
       - The digital twin provides accurate representations under extreme conditions.
       - The digital twin provides accurate representations under transitional conditions (e.g. changing seasons).
     - The purpose of the system has been clearly formulated and documented.
     - All intended operational conditions have been specified.
     - Limitations of the system have been clearly communicated.
- **Rationale**: Individual use cases are likely to have distinct characteristics and requirements for utility (e.g. relevant features, operational conditions), necessitating separate arguments.

### Strategy 3: Argument Over Verification and Validation

:::success
✅ **Verification and Validation**

As a reminder, verification and validation are related but distinct concepts. An easy way of remembering the primary differences is with the following questions:

1. Verification: are you building the system *right*?
2. Validation: are you building the *right* system?

The following table also provides some key differences and examples:

| Aspect | Verification | Validation |
| --- | --- | --- |
| **Purpose** | Ensure the system is built correctly according to specifications (e.g. internal consistency, correctness of implementation) | Ensure the system meets the needs and requirements of the end-users |
| **Focus** | Process-oriented | Product-oriented |
| **Timing** | Ongoing throughout the project's lifecycle | After the system or component is developed |
| **Techniques** | Reviews, inspections, formal assessments, unit tests | Performance tests, usability tests |
| **Output** | Verification reports, defect logs, test results | Validation reports, user feedback, test results |
| **Stakeholders** | Developers, QA engineers | End-users, compliance officers |
:::

- **Property Claims**:
  1. The digital twin's performance has been sufficiently verified.
     - All performance requirements or thresholds have been met or exceeded.
       - Assumption: the accuracy metrics (e.g. precision, recall, F1 score, mean absolute error (MAE), root mean squared error (RMSE)) are suitable for the architecture of the digital twin.
     - The digital twin meets the agreed-upon specifications.
       - Assumption: the project's specifications are complete.
  2. The performance of the digital twin is robust.
     - The digital twin has been externally validated in an environment that is sufficiently varied from the environment represented within the training data.
  3. The digital twin has sufficient performance against external benchmarks.
     - Assumption: the benchmarks used are suitable and accepted by the community or stakeholders.
  4. The digital twin's usability has been sufficiently validated.
     - The digital twin's usability has been independently validated through expert reviews and feedback.
     - The digital twin's usability meets design specifications.
- **Rationale**: Different aspects of accuracy (spatial, temporal, predictive performance) are crucial for a comprehensive assessment of the digital twin's performance across all use cases. Using multiple validation methods ensures a robust and comprehensive assessment of the digital twin's accuracy.

### Strategy 4: Argument Over Sustainability

- **Property Claims**:
  1. All biases have been proportionately mitigated.
     - All relevant biases have been identified.
       - Assumption: the bias assessment is sufficient for identifying all relevant biases.
     - Bias mitigation is sufficient for the identified risks.
       - {Bias mitigation technique N} is sufficient for {risk N}.
     - Justification: any identified biases that have not been mitigated pose little to no risk.
  2. Unintended consequences caused by the digital twin are unlikely.
     - Ongoing monitoring has been established to detect undesirable model drift.
     - Feedback mechanisms have been established to enable users to log errors or issues.

:::warning
❓ **Questions**

1. Do the strategies collectively support (and help operationalise) the goal claim?
2. Do the strategies help identify all necessary requirements for the project or system, which can be developed into property claims?
   - If you answered "no" to this question, which property claims are missing?
   - NB: when first approaching this pattern, you should assume that many relevant property claims are missing.
3. Are new strategies required?
   Do existing strategies need to be revised (e.g. merged or split)?
4. Do any of the current strategies or claims fall outside the scope of your responsibilities (e.g. assurance for property X would be delivered by a third party)?
5. (Optional) Do any of the property claims suggest a specific type of evidence (e.g. results of model testing, user evaluation report)?
:::
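As a pointer for question 5, Strategy 3 names MAE and RMSE among the accuracy metrics that might back a verification claim. The sketch below is a minimal, hypothetical illustration of that kind of evidence: the sea-surface temperature readings, predictions, and threshold are invented for this example, and a real project would use its own validation data and agreed metrics.

```python
# Minimal sketch: computing two accuracy metrics named in Strategy 3 (MAE and
# RMSE) for a digital twin's predictions against observed values.
# All data and thresholds below are illustrative, not project values.

def mean_absolute_error(observed, predicted):
    """MAE: average magnitude of prediction errors."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)

def root_mean_squared_error(observed, predicted):
    """RMSE: like MAE, but penalises large errors more heavily."""
    return (sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)) ** 0.5

# Hypothetical sea-surface temperature readings (degrees C) vs. twin predictions
observed = [14.2, 14.8, 15.1, 15.6, 16.0]
predicted = [14.0, 15.0, 15.3, 15.4, 16.3]

mae = mean_absolute_error(observed, predicted)
rmse = root_mean_squared_error(observed, predicted)

# A verification claim might then assert that these values fall below an
# agreed threshold, e.g. "MAE below 0.5 degrees C across the validation period".
print(f"MAE:  {mae:.3f}")
print(f"RMSE: {rmse:.3f}")
```

Evidence of this form (metric results against pre-agreed thresholds) would sit under the claim "All performance requirements or thresholds have been met or exceeded", alongside the stated assumption that the chosen metrics suit the twin's architecture.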