# Explainable AI

AI + VISUALIZATION + HUMAN = XAI

## Human-data interaction
1. Understand patterns
2. Gain insights
3. Make decisions
4. Communicate

**Why visualization?** Humans can see patterns that algorithms cannot.

Example: **Anscombe's Quartet**: a set of **four datasets** that have nearly **identical statistical properties** yet **look very different** when graphed. Each dataset consists of **11 (x, y) pairs**. The quartet is often used to demonstrate that **summary statistics can be misleading and that data visualization is important** for understanding the underlying patterns and structure in data. It also emphasizes the importance of exploring data rather than relying solely on a single analysis or summary statistic to draw conclusions. (See the sketch at the end of this section.)

**Why a human in the loop?** A human is not needed if a fully automatic solution exists and is trusted. A human is needed for ill-defined / ill-structured problems:
1. No single optimal solution
2. No clear objective measures (e.g., a cure for cancer)

**Data-ink ratio** = data ink / total ink used in the graphic (the higher the ratio the better, i.e., maximize the data-ink ratio).

**Hairball problem**: more data items than pixels on the screen, so scaling to bigger datasets makes the visualization look like a tangled hairball.

## Purpose of XAI
1. Interpretability and explainability
2. Debugging and improving models
3. Comparing and selecting models
4. Teaching and understanding concepts

**"Inmates running the asylum" problem**: XAI used to be done by ML researchers; it was very algorithm-centered and was criticized for that. Solution: design XAI solutions for the needs of the intended audience, consider how various users interpret and react to explanations, and build multi-disciplinary teams to develop XAI solutions.

## Who is the target audience in XAI?
- ML experts - model developers
- Data scientists - model users
- Decision makers using AI systems (doctors, politicians, employment offices)
- People who might be affected by AI (patients, loan applicants, drivers)
- Regulatory agencies (FDA)

## Performance vs. explainability trade-off for ML techniques
- Deep learning: high accuracy, low explainability
- Decision trees: lower accuracy, high explainability
- The more complex the ML technique, the higher the accuracy but the lower the explainability (black box).

**Vis4ML** -> XAI (visualization in support of ML).
**ML4VIS** -> suggest better visualizations based on user interactions, e.g., train a model to look for data combinations that can be differentiated well and show those to the user.

## Storytelling
1. **Author-driven**: linear ordering, heavy messaging, no interactivity
2. **Reader-driven**: no ordering, less messaging, free interactivity

- **Martini glass structure**: start author-driven, then open up for exploration.
- **Interactive slideshow**: split into multiple scenes, allow interaction midway.
- **Drill-down story**: let the reader decide which path to follow; all paths are annotated.
- **Scrollytelling** -> the story evolves as you scroll down.

## Grouping principles
1. **Proximity**: items close to each other, e.g., clusters in scatterplots
2. **Containment**: draw an area around some items and they are perceived as a group even if they are not close together
3. **Connection**: connected items are perceived as belonging together, e.g., graph/network visualizations, node-link diagrams
4. **Similarity**: similar color, size, shape, and so on
5. **Continuation**: the eye follows lines/trends; the data points along a line/trend form a group
6. **Common fate**: e.g., a flock of birds flying in the same direction

Connection is the strongest grouping principle; it can overrule the others.
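The Anscombe's Quartet point above can be checked in a few lines of NumPy/Matplotlib. A minimal sketch, assuming the commonly published values; only datasets I and II of the four are shown to keep it short:

```python
import numpy as np
import matplotlib.pyplot as plt

# First two of Anscombe's four datasets (x is shared by datasets I-III).
x  = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5], dtype=float)
y1 = np.array([8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68])
y2 = np.array([9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74])

for name, y in [("I", y1), ("II", y2)]:
    slope, intercept = np.polyfit(x, y, 1)
    print(f"dataset {name}: mean_y={y.mean():.2f}, var_y={y.var(ddof=1):.2f}, "
          f"corr={np.corrcoef(x, y)[0, 1]:.2f}, fit: y={slope:.2f}x+{intercept:.2f}")

# Nearly identical statistics, but the scatterplots tell two different stories.
fig, axes = plt.subplots(1, 2, figsize=(8, 3), sharey=True)
for ax, (name, y) in zip(axes, [("I", y1), ("II", y2)]):
    ax.scatter(x, y)
    ax.set_title(f"Anscombe {name}")
plt.show()
```

Both datasets print practically the same summary numbers, while one plot is roughly linear and the other clearly curved.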
Examples below:

![](https://i.imgur.com/NlcpY15.png)

## Sorting algorithm visualization
1. **Bump charts** -> time progresses from left to right; the grouping principles used are connection and continuation. You can also color-code the values for larger arrays; in the end you get a heatmap.
2. **Animation**: downside is that it takes time to watch and understand the animation (relies on memory). Solution: eyes beat memory, so stack frames on top of each other, unsorted at the top, sorted at the bottom.
3. You can also use the time curve technique with various projection methods.

## Clustering algorithm visualization
**K-Means**: grouping principles are containment and similarity, or connection and similarity.

![](https://i.imgur.com/AH7HZ7K.png)

- **Partitioning clustering**: color-code the data items (heatmaps), then arrange them into clusters and you start seeing differences.
- **Comparing clustering results** -> Matchmaker technique.
- **Particle-based approaches**: show every data point as a single particle.

## Dimensionality reduction
A powerful technique for finding hidden structure in high-dimensional data: high-dimensional data -> low-dimensional embeddings/projections.

**Curse of dimensionality** -> with 100s or 1000s of dimensions, we transform the high-dimensional data into a space with fewer dimensions (2D/3D).

Disadvantages:
1. Hard to preserve the semantics of single dimensions.
2. Hard to understand and interpret.
3. Projection error is not visible / false confidence.

Goal: reveal patterns and clusters of similar or dissimilar data.

2D is preferred over 3D (tilt, occlusion, height is hard to judge, perspective distortion). Exception: protein structures.

Techniques (see the PCA/t-SNE sketch at the end of this section):
1. **Linear**: PCA, multidimensional scaling (semantically meaningful axes)
2. **Non-linear**: axis semantics are completely lost (t-SNE, UMAP, SOM)

**PCA** advantages:
1. Relatively computationally cheap.
2. The embedding model can be saved and used to project new data points into the reduced space.
Disadvantage:
1. Linear reduction limits the information that can be captured.

**t-SNE** advantages:
1. Produces highly clustered, visually striking embeddings.
2. Non-linear, so it captures local structure well.
Disadvantages:
1. Global structure is lost.
2. More computationally expensive.
3. Requires setting hyperparameters.
4. Non-deterministic: run it a second time and you get a different result.
5. Cluster sizes in t-SNE may mean nothing.
6. Distances between clusters may mean nothing.
7. Random noise does not always look random.
8. You can sometimes see shapes that are not really there.
9. For topology, you may need more than one plot.

**UMAP** advantages:
1. Non-linear, computationally faster than t-SNE.
2. Can preserve local or global structure.
Disadvantages:
1. Requires setting hyperparameters.
2. Non-deterministic.

## Embedding-based trajectories
**Time curves technique**: folding time to visualize patterns of temporal evolution in data. (See the time-curve sketch at the end of this section.)

a. Timeline with time differences.
![](https://i.imgur.com/9vJPxxf.png)
b. Folding (bend the line according to similarity).
![](https://i.imgur.com/yGFwXic.png)
c. Time curve (the distance between points shows how similar the data points are).
![](https://i.imgur.com/1VhotXO.png)

Patterns seen in time curves:
1. Cluster
2. Transition
3. Cycle
4. U-turn
5. Outlier
6. Oscillation
7. Alternation

Time curves of different Wikipedia articles:
1. Loops are reverts in the article.
2. Oscillations are edit wars (big edits/changes).
3. Alternation is back-and-forth editing between two versions of the same article.
4. Very similar versions of the same article appear as a cluster pattern.
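A minimal sketch of the linear vs. non-linear projection contrast above, using scikit-learn; the digits dataset and the t-SNE hyperparameters are illustrative choices, not from the notes:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)          # 1797 samples, 64 dimensions

# Linear: axes are linear combinations of the original features, semantics partly preserved.
pca_2d = PCA(n_components=2).fit_transform(X)

# Non-linear: local neighborhoods preserved, but axes and distances lose their meaning.
tsne_2d = TSNE(n_components=2, perplexity=30, init="pca",
               random_state=0).fit_transform(X)

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for ax, emb, title in [(axes[0], pca_2d, "PCA"), (axes[1], tsne_2d, "t-SNE")]:
    ax.scatter(emb[:, 0], emb[:, 1], c=y, cmap="tab10", s=5)
    ax.set_title(title)
plt.show()
```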
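The time-curve idea (project temporal snapshots by pairwise similarity, then connect them in time order) can be sketched roughly as follows; the random-walk "snapshots" and the use of MDS as the projection are illustrative assumptions:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import MDS
from sklearn.metrics import pairwise_distances

rng = np.random.default_rng(0)

# Toy "document revisions": a slow random walk in 20-D feature space.
snapshots = np.cumsum(rng.normal(size=(40, 20)), axis=0)

# 1. Pairwise dissimilarities between snapshots.
D = pairwise_distances(snapshots)

# 2. Project to 2-D so that point distances approximate the dissimilarities.
curve = MDS(n_components=2, dissimilarity="precomputed",
            random_state=0).fit_transform(D)

# 3. Connect the points in temporal order -> the "time curve".
plt.plot(curve[:, 0], curve[:, 1], "-o", markersize=3)
plt.scatter(curve[0, 0], curve[0, 1], c="green", label="start")
plt.scatter(curve[-1, 0], curve[-1, 1], c="red", label="end")
plt.legend(); plt.title("Time curve of toy snapshots"); plt.show()
```

Clusters, U-turns, and cycles in real data would show up as the patterns listed above.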
**WHY**: same as the purpose of XAI: interpretability and explainability, debugging and improving models, comparing and selecting models, teaching and understanding concepts.

**WHO**
- **Model developers and builders** -> researchers and engineers who develop and deploy deep neural networks; they have a strong understanding of them.
- **Model users** -> data scientists; they may have some technical background but are not experts in deep neural networks. They develop domain-specific applications using pre-trained models, etc.
- **Non-experts** -> no prior knowledge of deep learning; users who are directly impacted, or whose decisions are impacted, by AI, e.g., patients, doctors, etc.

**WHAT**
1. **Computational graph & network architectures**, e.g., computational graph (how a NN trains, tests, saves data and checkpoints after each epoch), dataflow graph (how data flows from operation to operation while training and using a model).
2. **Learned model parameters**.
3. **Individual computational units** (instance-level observation of activations, gradients for error measurement).
4. **Neurons in high-dimensional space** (each neuron is a feature vector; dimensionality reduction to find some meaning).
5. **Aggregated information** (groups of instances instead of single-instance observation; model metrics such as summary statistics, loss, accuracy, etc.). Helps to compare the performance of multiple models.

**HOW**
- **Node-link diagrams for network architectures**: visualize where data flows and the magnitude of edge weights; complex models can give the hairball problem, aggregation helps. Example: TensorBoard, for model developers and builders to explain and understand the model architecture.
- **Line charts for temporal metrics** such as loss, accuracy, and other error measures after each epoch.
- **Dimensionality reduction**.
- **Instance-based analysis**:
  1. Global level (accuracy)
  2. Class level (class-conditional aggregate scores: accuracy, precision, recall)
  3. Instance level (ground truth and prediction for each instance)
  Examples: confusion matrix with heatmaps (see the heatmap sketch at the end of this section), coloring of textual data, InstanceFlow.
- **Interactive experimentation**, e.g., GAN Dissection, the Distill handwriting article, Google Quick, Draw!, TensorFlow Playground.
- **Algorithms for attribution and feature visualization**:
  1. Generate a new image that highlights important regions of the input image.
  2. Generate a new image that is supposedly representative of the same class.

**WHEN**
- Before training (improving data and feature quality)
- During training (model understanding, diagnosis)
- After training, post hoc (understanding data analysis results)

**WHERE**
- Application domains: drug discovery, protein folding, cancer classification, autonomous driving, etc.
- Research communities

## Overview of explanation techniques (WHAT and HOW)
- Intrinsic interpretability vs. post-hoc explanation
- Global vs. local
- Model-agnostic vs. model-specific

## Explaining in the input space

### Permutation Feature Importance (global, model-agnostic)
We are given a black-box model and we want to know how important certain features are for this predictive model. We take one of the input features (x1) from the data and randomly permute it (basically destroying the information in this feature). Then we feed this permuted dataset into our black-box model, measure the new error, and compute the relative error (new error / old error). A high relative error means the feature was important; a low relative error means it was unimportant. (See the sketch below.)

Advantages:
1. Intuitive
2. Highly compressed, global insight
3. The error ratio is comparable across problems
4. Feature interactions are accounted for
5. No retraining

Disadvantages:
1. Correlated features can bias the result
2. Uses the error of the model, not the variance
3. Needs access to true labels
4. Randomness
5. Adding correlated features decreases the associated feature importance
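A minimal sketch of the permutation-importance recipe just described (permute one column, re-measure the error, report the new/old error ratio); the model and dataset are illustrative choices:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)

def error(model, X, y):
    return 1.0 - model.score(X, y)                 # misclassification rate

base_err = max(error(model, X_te, y_te), 1e-12)    # guard against a perfect model
rng = np.random.default_rng(0)

ratios = []
for j in range(X_te.shape[1]):
    X_perm = X_te.copy()
    X_perm[:, j] = rng.permutation(X_perm[:, j])   # destroy the information in feature j
    ratios.append(error(model, X_perm, y_te) / base_err)

# High ratio -> the model relied on that feature.
for j in np.argsort(ratios)[::-1][:5]:
    print(f"feature {j}: error ratio {ratios[j]:.2f}")
```

scikit-learn's `sklearn.inspection.permutation_importance` implements a close variant of this (score drop instead of error ratio).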
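For the class-level, instance-based analysis mentioned above ("confusion matrix with heatmaps"), a minimal sketch; the classifier and dataset are illustrative choices:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=5000).fit(X_tr, y_tr)
cm = confusion_matrix(y_te, clf.predict(X_te))

plt.imshow(cm, cmap="Blues")      # heatmap: rows = true class, columns = predicted class
plt.xlabel("predicted"); plt.ylabel("true"); plt.colorbar()
plt.title("Class-level confusion matrix")
plt.show()
```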
### Partial Dependence Plot (PDP) (global, model-agnostic)
![](https://i.imgur.com/MhueMzJ.png)
Advantages: intuitive and easy to interpret; causal interpretation (with respect to the model).
Disadvantages: assumes feature independence (problems with correlated features); heterogeneous effects may be hidden.

### Individual Conditional Expectation (ICE) (local, model-agnostic)
![](https://i.imgur.com/wH2E5dA.png)
Plot one line for each individual sample. (See the PDP/ICE sketch at the end of this section.)
Advantages:
1. Even more intuitive than PDP.
2. Heterogeneous relationships become visible.
Disadvantages:
1. Overcrowding.
2. Can only display one feature.
3. Correlated-features problem -> unlikely or invalid data points appear on the plot.

### Marginal Plots (M-Plots)
Do not generate artificial new samples and do not use unrealistic data: take the data within a range and average over it.
Downside: the interpretation is no longer causal.

### Accumulated Local Effects (ALE) Plot (global, model-agnostic)
Instead of averaging, we fix all samples within an interval to the interval's minimum and maximum and take the difference of the predictions. Accumulate and center. This solves the correlation problem.
Advantages:
1. Unbiased; works when features are correlated.
2. Faster computation.
Disadvantages: need to balance the interval size; does not work with ICE curves; more complex interpretation.

### Neural Additive Models (NAMs, based on GAMs) (global, intrinsically interpretable, model-specific)
Train an individual neural network for each input feature. Useful for multi-task prediction models (e.g., predicting "cat" and "female" at the same time).
Advantages:
1. Complete description of the model
2. Allows the application of any NN architecture
3. Allows for multi-task prediction
4. Differentiability
Disadvantages:
1. No feature interactions
2. Difficult interpretation in some cases

### Global Surrogate Models (global, model-agnostic)
Replace the black-box model with an interpretable model whose predictions are close to those of the black box. (See the surrogate sketch at the end of this section.)
Advantages:
1. Flexible
2. Intuitive
Disadvantages:
1. Do you need your black box in the first place?
2. Does the interpretable model really behave the same way as the black box?

### Local Surrogate Models (local, model-agnostic)
**LIME: Local Interpretable Model-agnostic Explanations.** Say we have a binary classification model with a complex decision boundary that we do not know. We want to understand what the model does for one specific data point, so we fit a local surrogate model that is interpretable. (See the LIME sketch at the end of this section.)
Advantages:
1. Works for tabular data, images, and text
2. Fast and easy to use
Disadvantages:
1. Difficult to choose the neighborhood correctly
2. Prone to adversarial attacks

### Shapley Values (local, model-agnostic)
What is each feature's contribution to the model's final prediction? Think of the model decision as a coalition game played by the features. (See the SHAP sketch at the end of this section.)
![](https://i.imgur.com/rawXUFb.png)
Advantages:
1. Attributions are fairly distributed
2. Theoretically grounded (game theory)
Disadvantages:
1. Computationally expensive
2. Must use all features

### Saliency Maps (local, model-specific) -> pixel attribution
One forward pass and one backward pass are done. In Grad-CAM, the backward pass is only done until one specific convolutional layer of the network. There is some evidence that the later convolutional layers of a CNN learn higher-level features that are more related to human-understandable concepts. We then get a coarse activation map instead of a pixel-level map, which we simply upscale and plot on top of the image. (See the saliency sketch at the end of this section.)
- Guided Grad-CAM -> vanilla gradient + Grad-CAM
- SmoothGrad -> adds noise to the input and averages the resulting saliency maps
Advantages: visual and intuitive; fast to compute.
Disadvantages: difficult to judge correctness; prone to adversarial attacks.
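For the PDP and ICE plots described above, scikit-learn ships a ready-made display (assuming a reasonably recent scikit-learn, roughly >= 1.0); the diabetes dataset, the gradient-boosting model, and the chosen features are illustrative:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import PartialDependenceDisplay

X, y = load_diabetes(return_X_y=True, as_frame=True)
model = GradientBoostingRegressor(random_state=0).fit(X, y)

# kind="both" overlays the averaged PDP curve on top of the individual ICE curves.
PartialDependenceDisplay.from_estimator(
    model, X, features=["bmi", "bp"], kind="both", subsample=100, random_state=0
)
plt.show()
```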
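A minimal sketch of the global-surrogate idea: fit an interpretable tree to the black box's predictions and check fidelity. The black-box model, the surrogate depth, and the fidelity metric are illustrative choices:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_breast_cancer(return_X_y=True)

black_box = RandomForestClassifier(random_state=0).fit(X, y)
y_bb = black_box.predict(X)                      # surrogate learns these, not the true labels

surrogate = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y_bb)

# Fidelity: how often does the surrogate agree with the black box?
print("fidelity:", accuracy_score(y_bb, surrogate.predict(X)))
print(export_text(surrogate))                    # human-readable rules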
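A hedged sketch of LIME on tabular data, assuming the third-party `lime` package is installed; the exact arguments follow its common usage and should be treated as assumptions:

```python
from lime.lime_tabular import LimeTabularExplainer   # third-party: pip install lime
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer()
model = RandomForestClassifier(random_state=0).fit(data.data, data.target)

explainer = LimeTabularExplainer(
    data.data, feature_names=data.feature_names,
    class_names=data.target_names, mode="classification",
)

# Explain one prediction: LIME perturbs the instance, queries the black box,
# and fits a sparse local linear model to the perturbed neighborhood.
exp = explainer.explain_instance(data.data[0], model.predict_proba, num_features=5)
print(exp.as_list())
```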
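For Shapley values in practice, the third-party `shap` package is the usual tool; a hedged sketch assuming it is installed. `TreeExplainer` is the fast model-specific path for tree ensembles (the model-agnostic variant is `KernelExplainer`); the regression dataset and model are illustrative:

```python
import shap                                          # third-party: pip install shap
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

X, y = load_diabetes(return_X_y=True, as_frame=True)
model = RandomForestRegressor(random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)                # exploits the tree structure
shap_values = explainer.shap_values(X)               # one attribution per feature per instance

# Global overview: which features contribute most, and in which direction.
shap.summary_plot(shap_values, X)
```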
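A hedged sketch of a vanilla-gradient saliency map in PyTorch; the untrained ResNet-18 and the random input are stand-ins for a real trained model and a preprocessed image (assumes a recent torchvision). Grad-CAM would additionally hook a late convolutional layer instead of going all the way back to the pixels:

```python
import torch
from torchvision.models import resnet18

model = resnet18(weights=None).eval()                 # stand-in; use a trained model in practice
x = torch.rand(1, 3, 224, 224, requires_grad=True)    # stand-in for a preprocessed image

# One forward pass, one backward pass from the top predicted class score.
scores = model(x)
scores[0, scores.argmax()].backward()

# Pixel attribution: gradient magnitude, max over the colour channels.
saliency = x.grad.abs().max(dim=1).values[0]          # shape (224, 224)
print(saliency.shape, saliency.max())
```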
## Explaining by examples

### Nearest neighbor examples (local, model-agnostic, intrinsically interpretable)
-> Show similar instances from the training data.
-> Measure similarity in the input domain or in a latent space. (See the nearest-neighbor sketch at the end of this section.)

### Influential instances
-> Measure how much a particular instance influences the resulting model.
- Deletion diagnostics -> retrain the model once per training instance.
- Influence functions -> approximate the change in model parameters using gradients, without retraining.
Used to debug the data and the model and to get an impression of model robustness.

### Prototypes and criticisms (global, model-agnostic)
- Prototype: a representative instance of the data.
- Criticism: a data point that is not well represented by the prototypes.

### Counterfactual explanations
Create new data points that are not yet contained in the dataset. A certain input causes the output; changing the input changes the output.

### Adversarial examples
Aim to deceive, not to interpret. Examples: fast gradient sign method, one-pixel attack, adversarial patch, robust adversarial examples, black-box attacks. (See the FGSM sketch at the end of this section.)

## Feature Visualization by Activation Maximization (global, model-specific)
Generate an input that maximally activates a given unit or class.

## Explaining by concepts

### Concept bottleneck models (local, intrinsically interpretable)
1. Map inputs to concepts.
2. Map concepts to predictions.

### Testing with Concept Activation Vectors (TCAV) (global, post-hoc, model-specific)
How does a concept influence the prediction of a certain class?

### Network dissection
Analyse the output of one unit by upsampling it and overlaying it with the input image. This overlay gives you an idea of what the network focuses on. Then use a semantic segmentation network to get the concepts from the input and compare these concepts with the overlay to come to an agreement.

## Temporal evolution of learning
A time-series problem; post-hoc and monitoring as well.
- Global level -> line charts
- Class level -> ConfusionFlow (heatmap + line charts for metrics) => confusion on the class level => compare different models
- Instance level -> InstanceFlow => confusion on the instance level => two visualizations (a Sankey diagram on top; below, one row per instance showing its behavior over time)

**CNN Explainer** => for learning about CNNs with interactive visualization.
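A minimal sketch of explaining by nearest-neighbor examples: retrieve the most similar training instances to a query, here measured in the raw input space (dataset and distance metric are illustrative; in practice a latent space often works better):

```python
from sklearn.datasets import load_digits
from sklearn.neighbors import NearestNeighbors

X, y = load_digits(return_X_y=True)
query = X[0]

nn = NearestNeighbors(n_neighbors=4).fit(X[1:])        # exclude the query itself
distances, indices = nn.kneighbors(query.reshape(1, -1))

# "The model sees this input as similar to these training examples."
for d, i in zip(distances[0], indices[0]):
    print(f"neighbor label={y[1:][i]}, distance={d:.1f}")
```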
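A hedged sketch of the fast gradient sign method (FGSM) listed under adversarial examples, in PyTorch; the untrained network, the random "image", the target class, and epsilon are placeholders:

```python
import torch
import torch.nn.functional as F
from torchvision.models import resnet18

model = resnet18(weights=None).eval()                  # stand-in for a trained classifier
x = torch.rand(1, 3, 224, 224, requires_grad=True)     # stand-in for a real image
target = torch.tensor([0])                             # class the model currently predicts

# FGSM: one gradient step in the direction that *increases* the loss.
loss = F.cross_entropy(model(x), target)
loss.backward()

epsilon = 0.03                                         # perturbation budget (illustrative)
x_adv = (x + epsilon * x.grad.sign()).clamp(0, 1).detach()

print("max pixel change:", (x_adv - x).abs().max().item())
```

The perturbation is bounded by epsilon per pixel, which is why such examples can look unchanged to a human while flipping the prediction.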