# RRI Script 204-2: Project Transparency
###### tags: `RRI Skills track`, `explainability`, `section 2`
**Slides and Notes**
[TOC]
## Notes
:::info
Slides 2 + 3 are potentially unnecessary/overkill
Slides 11-15 quite technical for me to unpack :(
Could benefit from one summary slide (slide 16) to unpack?
:::
## Slides
### 1
In this second section of our Explainability module, we will focus on the importance of project transparency.
### 2
We will look at what we are trying to achieve with project transparency, some practical choices and mechanisms by which project transparency and accessibility can be achieved, and some of the limits of transparency.
### 3
Let's start with a short introduction.
### 4
We are going to jump straight in with an example.
A team of lawyers are building a court case, and to do so they are carrying out discovery.
This involves gathering information about a case, and typically requires the team to request information from another company or organisation.
In this case, they need information from the legal team representing the other party in the dispute.
One of the areas of concern in the case is access to information about the use of an algorithmic decision-making system, because this system was used to make the very decision that is under dispute.
They make a request to see information about the algorithm, including how it was designed, who was involved, and how it works.
### 5
But when the team of lawyers receive the requested information, they experience two major issues.
Firstly, the other team have sent across mountains of documents and files, trying to bury any incriminating evidence among transcripts of thousands of hours of meetings, emails, and other irrelevant documents.
### 6
The second issue is with jargon.
The information the other team sent over about the structure of the algorithm is written in technical jargon and is very hard for the lawyers to understand.
### 7
This example highlights two issues that are significant both for project transparency and for explainability in general.
Let's break down these two points of note.
### 8
Transparency is not the same as accessibility.
In the example of the request for information, the second team of lawyers have technically been "transparent" with the first team by handing over documentation and more.
However, the way they have shared the information makes it extremely difficult for the legal team to understand; the information is not accessible in any meaningful way.
### 9
Transparency is also necessary for explainability.
The first team of lawyers will need to be able to explain the algorithm's structure in court in a way that is understandable to the judge and jury.
Developing this explanation requires a certain degree of transparency in the documentation about how the algorithm was designed, developed, and deployed.
> [bnea - not sure what is being said here / if necessary for the slide] As with many of the SAFE-D principles, explainability is closely related to, and overlaps with, neighbouring concepts such as accessibility and transparency.
### 10
Now, let us move on to the second subsection.
In this subsection we will discuss a scenario that raises a series of questions about what exactly we are trying to explain, and where we may need transparency.
### 11
Let us now consider the following scenario.
A team of data analysts working for a travel booking website are asked to explain why a model has changed its predictions about what trip customers are most likely to buy.
Customers are now booking more ski trips instead of beach holidays.
> There is no fault with the model here. Rather, the locus of our explanation for the model's behaviour is a change in the data distribution, although the two are intertwined: the model's predictions have changed because the data it receives have changed.
----
> *Perhaps the system is now recommending significantly more ski trips, whereas previously it was recommending beach resorts. If the features used by the model were investigated, it would be easy enough to identify that season is a feature with high importance for the model. It is well known that customers alter their purchasing behaviour between seasons (e.g. Winter, Summer). From this we can explain the change in the model's predictions as the result of a significant change in the data distribution, which is itself a representation of a change in the underlying phenomenon (i.e. the changing seasons). Simple enough.*
---
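To make this concrete, the sketch below shows how a team might check that season dominates the model's feature importances. It is a minimal illustration using scikit-learn on entirely made-up booking data; the feature names, labels, and model are assumptions for the sake of the example, not the real system.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Made-up booking data: season (0 = summer, 1 = winter), budget, party size.
n = 1_000
season = rng.integers(0, 2, n)
budget = rng.normal(1_500, 400, n)
party_size = rng.integers(1, 6, n)
X = np.column_stack([season, budget, party_size])

# Label: 1 = books a ski trip, 0 = books a beach holiday.
# In this toy world, winter strongly favours ski trips.
y = np.where(season == 1, rng.random(n) < 0.9, rng.random(n) < 0.1)

model = RandomForestClassifier(random_state=0).fit(X, y)

for name, importance in zip(["season", "budget", "party_size"],
                            model.feature_importances_):
    print(f"{name}: {importance:.2f}")  # season should dominate
```

If the seasonal feature carries most of the importance, the shift in predictions is explained by a shift in the data, not by any fault in the model.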
### 12
But now let's assume there is another change, this time resulting in a significant drop in the conversion rate (i.e. the ratio of the number of people who view a holiday deal to the number who actually purchase the holiday).
In other words, customers are not just booking different holidays; they are booking far fewer holidays altogether.
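As a quick illustration of what "conversion rate" means here, consider this toy calculation on hypothetical daily logs (the counts and labels are invented):

```python
# Hypothetical daily counts from the booking site's logs.
views = {"day 1": 12_000, "day 2": 11_800, "day 3": 12_100}
purchases = {"day 1": 540, "day 2": 35, "day 3": 29}

for day, n_views in views.items():
    rate = purchases[day] / n_views  # conversion rate: purchases per view
    flag = "  <-- sudden drop, investigate" if rate < 0.01 else ""
    print(f"{day}: {rate:.2%}{flag}")
```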
### 13
However, this time, let's pretend that the problem is actually a fault in a third-party piece of software.
This software is a dependency in the team's data pipeline, and it is now causing data about users' locations to be recorded incorrectly.
### 14
As it turns out, there is no fault with the model itself.
Rather, the company's model has learned, from the purchasing data it collects, that those who live in affluent neighbourhoods are more likely to purchase more expensive packages.
The company's recommendation system uses this to show customers holidays in their predicted price range, or to change the price of holiday packages based on their estimated "willingness-to-pay".
Because of the software fault, all customers' postcodes are now being recorded as located in affluent neighbourhoods, so every customer is being shown the same expensive holiday packages regardless of where they actually live.
As a result, fewer customers are purchasing packages, because they cannot afford them, and the conversion rate has dropped.
So the fault lies not in the model, but in the data and the generative mechanisms that produce the data.
The model is still making the same predictions, but those predictions are now wrong and are affecting the purchasing habits of the customers.
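A lightweight sanity check on the pipeline's output could surface this kind of fault early. The sketch below assumes a hypothetical feed of recorded neighbourhood categories and an illustrative threshold; both are assumptions, not part of the scenario's actual codebase.

```python
from collections import Counter

def check_location_diversity(recorded_neighbourhoods, max_share=0.5):
    """Raise an alert if one neighbourhood category dominates the data,
    e.g. every postcode suddenly being recorded as 'affluent'."""
    counts = Counter(recorded_neighbourhoods)
    label, n = counts.most_common(1)[0]
    share = n / len(recorded_neighbourhoods)
    if share > max_share:
        raise ValueError(
            f"{share:.0%} of records map to '{label}' -- "
            "possible fault upstream in the data pipeline"
        )

# After the faulty dependency update, every record looks 'affluent':
check_location_diversity(["affluent"] * 100)  # raises ValueError
```

Documenting checks like this is itself a form of project transparency: it records what the team expected of the data and when those expectations were violated.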
### 15
This example of the travel booking website highlights a very important point about transparency and explainability.
The focus of our explanation will not always be the model. Instead, the focus could be the data pipeline that drives the model's predictions, or other parts of the wider system.
### 16
As such, the sort of transparency that we are interested in is not just the transparency of the model itself, but the transparency of the project and system as a whole.
### 17
What about the transparency of the learning algorithm here?
On the left, we have an image of a husky that has been classified as a wolf, and on the right, an explanation that has been blurred out.
Here, we can see that the algorithm used to train the model (the learning algorithm) can also be the focus of our explanations.
The algorithm has clearly learned the wrong features and produced an incorrect result.
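Explanations like the one shown (blurred) on the slide are often produced by perturbation-based methods, which hide parts of an input and watch how the prediction changes. The sketch below is a minimal, self-contained version of that idea (occlusion sensitivity) with a toy stand-in for a real classifier; it is illustrative only, not the method used to generate the slide's image.

```python
import numpy as np

def occlusion_map(image, predict_fn, patch=8):
    """Grey out one patch at a time and record how much the model's
    score drops: patches with large drops are what the model relies on."""
    h, w = image.shape
    heatmap = np.zeros((h // patch, w // patch))
    baseline = predict_fn(image)
    for i in range(0, h - patch + 1, patch):
        for j in range(0, w - patch + 1, patch):
            occluded = image.copy()
            occluded[i:i + patch, j:j + patch] = 0.0
            heatmap[i // patch, j // patch] = baseline - predict_fn(occluded)
    return heatmap

# Toy stand-in for P(wolf | image): scores by overall brightness, loosely
# analogous to a model that has latched onto snowy (bright) backgrounds.
toy_predict = lambda img: float(img.mean())

img = np.random.rand(32, 32)            # stand-in for a husky photo
print(occlusion_map(img, toy_predict))  # larger values = more influential
```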
### 18
Let us now jump into our understanding of what responsible project transparency looks like in practice.
### 19
In a previous module we introduced the project lifecycle model.
This model is a useful scaffold for identifying project tasks (e.g. actions or decisions) that serve as a source of information and require transparent forms of documentation or communication.
### 20
Here, we have three examples of project tasks for which practical mechanisms and processes will likely be needed to achieve transparency in a project.
1. Tasks that involve choices about how a project should be governed. For example, defining the nature of the problem that a data-driven technology is designed to address and the algorithmic procedure by which it is implemented.
1. Tasks that involve what we can term 'data stewardship'. For example, the management of data and the data pipeline.
1. Tasks that involve the engagement of stakeholders, such as members of the public, or other professionals in the relevant domain.
### 21
Now we will consider examples of tasks that involve choices about how a project should be governed.
Specifically, decisions or actions that can have an impact on explainability and require certain levels of transparency.
First, we have Problem Formulation: determining the problem the system is designed to address. This task includes information about why the problem is important, and how and why it was translated into the technical description the system implements.
Second, we have the Identification and Mitigation of Biases: the decisions about which biases may be relevant for the project, and why the chosen mitigation strategies are able to address them.
### 22
We must also consider transparency for tasks that involve what we can call 'data stewardship'.
Exploratory and Confirmatory Data Analysis. Although data analysis is highly iterative, a clear record of the analysis techniques employed and the rationale for their use can help ensure a high level of transparency.
Data Provenance, which includes data extraction and procurement. The source of the data used throughout the project lifecycle has many implications for explainability. These include explanations about how the quality of the data was evaluated, or how the legal basis for its use was established. So, clear and accessible documentation of data sources will be crucial for project transparency.
System Implementation, such as data pipeline engineering. Many stakeholders will be interested in evaluating the safety and security of data, especially where it contains sensitive or personal information.
Overall, being able to provide explanations about how the data pipeline was constructed will be important for explainability.
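One practical way to make data provenance transparent is to keep a small, machine-readable record alongside each dataset. The fields below are an illustrative sketch of what such a record might capture, not a prescribed standard or an existing schema.

```python
from dataclasses import dataclass, field

@dataclass
class DataProvenanceRecord:
    """A minimal, machine-readable note on where a dataset came from
    and how its use was justified."""
    source: str                     # who supplied the data
    extraction_date: str            # when it entered the pipeline
    legal_basis: str                # e.g. consent, contract, public task
    quality_checks: list[str] = field(default_factory=list)
    known_limitations: list[str] = field(default_factory=list)

record = DataProvenanceRecord(
    source="Internal booking platform logs",
    extraction_date="2024-03-01",
    legal_basis="Legitimate interests (assessment on file)",
    quality_checks=["deduplication", "postcode validation"],
    known_limitations=["under-represents offline bookings"],
)
print(record)
```

Records like this give stakeholders, auditors, and future team members a single place to look when they need to explain how and why the data was used.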
### 23
Finally, let us consider tasks in the project that will involve the engagement of stakeholders.
> [bnea - difference in slides and GitHub content]

- Project Planning: who was involved in understanding the scope of harms and opportunities? Are all views well represented and meaningfully included?
- User Training: does the system address challenges of use in time-sensitive situations (e.g. healthcare)?
- Stakeholder Identification:
- External Auditing:

These tasks emphasise the importance of accessibility for those who are directly or indirectly involved with a project.
### 24
Can you think of any other tasks which may occur during one of the project lifecycle stages that would require transparency?
### 25
How would you then achieve transparency and how would it contribute to explaining any decisions or actions taken in the project?
### 26
We have covered quite a bit of content, and will now move on to the final part of this section on project transparency: the limits of transparency.
### 27
Like explainability, transparency is a nuanced concept and it is important to recognise that there are limits to what can be achieved and what may be desirable.
### 28
For example, smaller teams can face a number of barriers to meeting the ideal of project transparency, on account of a lack of resources.
### 29
Other barriers can include intellectual property restrictions, legal restrictions on the disclosure of sensitive information, the need to protect the privacy of individuals, or security concerns about the information being shared.
In these cases, providing evidenced reasons for why data have not been made accessible, or why some decisions about a project's governance have not been disclosed publicly, is acceptable and may even be the appropriate form of transparency.
### 30
Let us summarise all that we have learned in this section.
First, we learned that transparency is necessary for explainability but is not the same as ‘accessibility’.
Second, we saw examples of why the locus of an explanation may go beyond the model that powers a system.
Then, we reviewed how the project lifecycle model provides a scaffold for identifying tasks that may be a source of information.
Lastly, we recognised that there are limits to how much transparency can be achieved.
We are now going to move on to the third section in this module: Model Interpretability, where we will look more closely at explaining the models themselves.