---
title: 'Project Lifecycle: Ethical Self-Assessment'
author: Christopher Burr
date: 26th July 2021
---

###### tags: `Online Training`

# Project Lifecycle: Ethical Self-Assessment

:::info
This document introduces a model of a data science or ML project lifecycle, which can be used to support reflective, ethical self-assessment, for example by identifying and evaluating the impact of social, statistical, or cognitive biases on the project and its outcomes.

**Table of Contents**

[toc]
:::

The self-assessment of ethical issues, such as the impact of biases, is anchored in the following model of a project's lifecycle:

![](https://i.imgur.com/tcqGSOl.png)

This model is intended as a heuristic device, which can support reflection, action, and justification of the various choices made throughout a project. However, it is not intended to capture the exact processes of all projects. The following sections introduce each of these stages and, where relevant, explain how various biases may affect the activities associated with each stage.

## (Project) Design Tasks and Processes

### Project Planning

Project planning can involve a very wide range of activities, including, but not limited to:

- Anticipatory identification of key decisions throughout the project, to support governance and business tasks
- An assessment of resources and capabilities within a team, which is necessary for identifying any skills gaps
- A contextual assessment of the target domain and of the expectations, norms, and requirements that derive therefrom (e.g. patients' or clinicians' expectations for automated diagnosis)
- Stakeholder engagement activities, in order to identify and evaluate possible harms and benefits associated with the project (e.g. socioeconomic inequalities that may be exacerbated as a result of carrying out the project), to gain social license and public trust, and also to feed into the process of problem formulation in the next stage
- Wider impact assessments (e.g. equality impact assessments, data protection impact assessments, human rights impact assessments, bias assessments)

The scope and quality of planning can affect the actions taken in subsequent stages. For example, if key stakeholders are not engaged, specific harms or benefits associated with the project may be missed.

### Problem Formulation

Here, 'problem' refers to a well-defined computational process (or a higher-level abstraction of the process) that is carried out by an algorithm. For instance, the series of successive transformations carried out by a convolutional neural network that takes (as input) an image, encoded as an array, in order to produce (as output) a decision about whether some object is present in the image.

Typically, there will be a need to define the computational "problem" being solved and to explain how it contributes to (or affects) the wider socio-technical problem being considered or research question being explored. This is important for determining and evaluating the choice of target variables used by the algorithm, which may ultimately be implemented within a larger automated decision-making system (e.g. in a verification system). Formulating the problem allows the project team to be clear about what input data will be needed, for what purpose, and whether any representational issues exist in, for example, how the target variables are defined.
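To make this input-output framing more concrete, the sketch below sets out the computational contract of the image-classification example above. It is only an illustration: a simple scikit-learn classifier stands in for the convolutional neural network, and the images and labels are randomly generated placeholders rather than real data.

```python
# A minimal sketch of the input-output contract described above: images,
# encoded as arrays, are mapped to decisions about whether an object is
# present. A basic scikit-learn classifier stands in for the convolutional
# neural network; the data are random placeholders, not a real dataset.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(seed=0)
images = rng.random((100, 32, 32))       # input: 100 greyscale images (32x32 arrays)
labels = rng.integers(0, 2, size=100)    # target variable: object present (1) or absent (0);
                                         # how this is defined is itself a design choice

X = images.reshape(len(images), -1)      # flatten each image into a feature vector
classifier = LogisticRegression(max_iter=1000).fit(X, labels)

decision = classifier.predict(images[:1].reshape(1, -1))  # output: a decision for one image
```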
These representational concerns are also why stakeholder engagement, which helps bring a diversity of perspectives to project design, is so vital, and why this stage is so closely connected with the project planning stage (e.g. discussions about legal and ethical concerns regarding permissible uses of personal or sensitive information).

### Data Extraction or Procurement

Ideally, the project team should have a clear idea (from the planning and problem formulation stages) of what data are necessary before extracting or procuring data. This can help mitigate risks associated with the over-collection of data (e.g. increased privacy or security concerns) and help align the project with values such as _data minimisation_. This stage may need to be revisited after carrying out subsequent tasks (e.g. preprocessing, model testing) if it becomes clear that insufficient data were collected to achieve the project's goals.

Where data are procured, questions about provenance arise (e.g. legal issues, concerns about the informed consent of human data subjects). These questions should be asked so that the project team understands the limitations of the data, including how they were collected and cleaned. Generally, responsible data extraction and procurement require the incorporation of domain expertise into decision-making, so that the desiderata of data minimisation, and of securing relevant and sufficient data, can be integrated into design choices.

### Data Analysis

Exploratory data analysis is an important stage for generating hypotheses or uncovering possible limitations of the dataset that arise from missing data, in turn identifying the need for any subsequent augmentation of the dataset to deal with possible class imbalances. However, there are also risks that stem from cognitive biases (e.g. confirmation bias), which can create cascading effects that affect downstream tasks (e.g. model reporting). See [here](https://alan-turing-institute.github.io/rrp-selfassessment/bias/bias-intro.html) for a list of biases that may affect data analysis.

## (Model) Development Tasks and Processes

### Preprocessing and Feature Engineering

Preprocessing and feature engineering constitute a vital but often lengthy stage, which overlaps with the design tasks in the previous section. Tasks at this stage include data cleaning, data wrangling or normalisation, and data reduction or augmentation. It is well understood that the methods employed for each of these tasks can have a significant impact on the model's performance (e.g. deletion of rows versus imputation methods for handling missing data), as well as on subsequent stages of the project.

### Model Selection

This stage determines the model type and structure that will be produced in the next stages. In some projects, model selection will result in multiple models for the purpose of comparison based on some performance metric (e.g. accuracy). In other projects, there may first be a need to implement a pre-existing set of formal models in code. The class of relevant models is likely to have been highly constrained by many of the previous stages (e.g. available resources and skills, problem formulation): for instance, where the problem demands a supervised learning algorithm instead of an unsupervised learning algorithm, or where explainability considerations require a more interpretable model (e.g. a decision tree).

### Model Training

Prior to training the model, the dataset will need to be split into training and testing sets to avoid model overfitting.
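As a minimal illustration of such a split, the sketch below uses scikit-learn's `train_test_split`, with a bundled example dataset standing in for a project's own data:

```python
# A minimal sketch of splitting a dataset into training and testing sets with
# scikit-learn; the bundled breast-cancer dataset is only a placeholder for a
# project's own data.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)

X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.2,     # hold out 20% of the data as the testing set
    random_state=42,   # fix the seed so the split is reproducible
    stratify=y,        # preserve class proportions in both sets
)
```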
The _training set_ is used to fit the ML model, whereas the _testing set_ is a hold-out sample used to evaluate the fit of the machine learning model to the data distribution. There are various methods for splitting a dataset into these components, which are widely available in popular package libraries (e.g. the scikit-learn library for the Python programming language).

### Model Validation and Testing

The testing set is typically kept separate from the training set, in order to provide an unbiased evaluation of the final model's fit on the training dataset. However, the training set can be further split to create a validation set, which can then be used to evaluate the model while also _tuning model hyperparameters_. This evaluation can be repeated in a process known as (k-fold) cross-validation, where the training data are resampled (_k_ times) to compare models and estimate their general performance when used to make predictions on unseen data. This type of validation is also known as 'internal validation', to distinguish it from 'external validation': a subsequent process in which the model is validated in wholly new environments (e.g. the validation of a prediction model used for diagnosis or prognosis on healthcare data gathered from a different site than the original study).

### Model Reporting

Although the previous stages are likely to generate a series of artefacts as their tasks are undertaken, model reporting should also be handled as a separate stage, to ensure that the project team reflects on the future needs of various stakeholders and end users. While this stage is likely to include information about the performance measures used for evaluating the model (e.g. decision thresholds for classifiers, accuracy metrics), it can (and should) include wider considerations, such as the intended use of the model, details of the features used, the training-testing distributions, and any ethical considerations that arise from these decisions (e.g. fairness constraints, use of politically sensitive demographic features).

## (System) Deployment Tasks and Processes

### Model Productionalization

Unless the end result of the project is the model itself, which is perhaps more common in scientific research, it is likely that the model will need to be implemented within a larger system. This process, sometimes known as 'model operationalisation', requires understanding a) how the model is intended to function in the proximate system (e.g. within an agricultural decision support system used to predict crop yield and quality) and b) how the model will impact the functioning of the wider sociotechnical system in which the tool is embedded (e.g. a decision support tool used in healthcare for patient triaging that may exacerbate existing health inequalities within the wider community). Ensuring the model is able to work within the intended system can be a complex programming and software engineering task, especially if it is expected that the model will be updated continuously in its runtime environment.

### User Training

Although the performance of the model is evaluated in earlier stages, the model's impact cannot be fully evaluated without considering the human factors that affect its performance in real-world settings. The impact of human cognitive biases, such as algorithmic aversion[^aversion], must also be considered, as such biases can lead to over- and under-reliance on the model (or system), in turn negating any potential benefits that may arise from its use.
Understanding the social and environmental context is also vital, as sociocultural norms may contribute to how training is received and to how the system itself is evaluated [see @burton2020].

### System Use and Monitoring

Depending on the context of deployment, it is likely that the performance of the model will degrade over time. This process of _model drift_ is typically caused by a growing gap between how representative the training dataset was at the time of development and how representative it is at later stages, perhaps due to changing social norms (e.g. changing patterns of consumer spending, or evolving linguistic norms that affect word embeddings). As such, mechanisms for monitoring the model's performance should be instantiated within the system to track model drift, and key thresholds should be determined at early stages of a project (e.g. project planning) and revised as necessary based on monitoring of the system's use in its ecologically valid environment.

### Model Updating or De-provisioning

As noted previously, model updating can occur continuously if the architecture of the system and the context of its use allow for it. Otherwise, updating the model may require either revisiting previous stages to make planned adjustments (e.g. model selection and training) or, if more significant alterations are required, de-provisioning the extant model entirely, necessitating a return to a new round of project planning.
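As a rough illustration of how monitoring can feed into these updating or de-provisioning decisions, the sketch below checks a model's accuracy on a recent batch of labelled data against a threshold of the kind agreed during project planning. The threshold value, function names, and review hook are hypothetical.

```python
# A rough sketch of a periodic drift check: compare the deployed model's
# accuracy on recently labelled data against a threshold agreed during
# project planning. The threshold value and review hook are hypothetical.
from sklearn.metrics import accuracy_score

ACCURACY_THRESHOLD = 0.85  # agreed at the planning stage; revised as monitoring dictates

def performance_has_drifted(model, X_recent, y_recent, threshold=ACCURACY_THRESHOLD):
    """Return True if accuracy on recent data has fallen below the agreed threshold."""
    recent_accuracy = accuracy_score(y_recent, model.predict(X_recent))
    return recent_accuracy < threshold

# Example use within a monitoring job (model, X_recent, and y_recent would come
# from the deployed system's logging pipeline):
#
# if performance_has_drifted(model, X_recent, y_recent):
#     flag_for_review()  # hypothetical hook: revisit model selection/training,
#                        # or de-provision the model and return to project planning
```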