# [The Mythos of Model Interpretability](https://arxiv.org/pdf/1606.03490.pdf)

## Introduction

1. What is interpretability, and why is it important?
2. Interpretability is not universally defined; it can refer to several distinct concepts.
3. Machine learning models are fit by optimizing predictive performance, so the associations they learn are not guaranteed to be causal.
4. Post-hoc interpretation: explaining a trained model after the fact, rather than making the model itself transparent.

## Desiderata of Interpretability Research

The real-world objectives behind interpretability:

1. Trust: confidence that the model is right, not only how often it is right but also on which inputs it can be relied upon.
2. Causality: generating hypotheses that scientists can test experimentally.
3. Transferability
    * Transferring learned skills to unfamiliar situations.
    * The model should work not only on test data that is i.i.d. with the training data; e.g., it should recognize both photos below as swimming.

    ![](https://i.imgur.com/UrDEiyl.png)
4. Informativeness: providing useful information to human decision-makers.
5. Fair and Ethical Decision-Making: whether decisions produced automatically by algorithms conform to ethical standards.

## Properties of Interpretable Models

1. Transparency: understanding how the model works.
    * Simulatability: whether a human can step through the model's computation.
        * Models that are too complex fail this criterion.
        * For example, do you know what an input feature means after it has been mapped through an RBF kernel?
    * Decomposability: each input, parameter, and calculation admits an intuitive explanation.
        * E.g., each split condition in a decision tree and each weight in a linear model can be understood on its own.
        * After heavy feature engineering, the meaning of an input may no longer be intuitive.
    * Algorithmic transparency: understanding why the training algorithm works.
        * E.g., why minimizing this loss function achieves the intended goal.
2. Post-hoc Interpretability: extracting information from learned models.
    * Text explanation
        * Neural image captioning
        * Recommendation systems: rating prediction & product reviews
    * Visualization: visualizing what the model has learned.
        * t-SNE (a minimal sketch appears at the end of these notes)
        * [Deep dream](https://ai.googleblog.com/2015/06/inceptionism-going-deeper-into-neural.html)
        * [Understanding Deep Image Representations by Inverting Them](https://arxiv.org/pdf/1412.0035.pdf): reconstruct the input image from its representation. Given a representation function $\Phi$ and a representation $\Phi_0=\Phi(x_0)$ to be inverted, reconstruction finds the image $x$ such that (a LaTeX form of the objective is given at the end of these notes)

        ![](https://i.imgur.com/e2RVmAh.png)
        ![](https://i.imgur.com/OWoquJx.png)
    * Local explanation: explaining what a neural network depends on locally.
        * Saliency map (see the gradient sketch at the end of these notes)

        ![](https://i.imgur.com/ThXgnua.png)
        * May be misleading: removing even a single pixel can yield a totally different map.
    * Explanation by examples: does the model treat similar data similarly? (see the nearest-neighbour sketch at the end of these notes)
        * [Distributed Representations of Words and Phrases and their Compositionality](https://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf)

        ![](https://i.imgur.com/II7vjpG.png)

## Discussion

1. Linear models are not strictly more interpretable than deep neural networks.
    * Linear models often rely on heavy feature engineering, and we may not understand what the engineered features actually mean.
    * DNNs typically take raw, lightly processed inputs, so they can beat linear models on decomposability.
2. Claims about interpretability must be qualified.
    * Future work on interpretability should first state which notion of interpretability it addresses.
3. In some cases, transparency may be at odds with the broader objectives of AI.
    * Do not abandon a better-performing model just because another model is easier to explain.
4. Post-hoc interpretations can potentially mislead.
    * The explanations a model arrives at may contain unreasonable elements (e.g., racial or gender discrimination).

## Reference

* [The author's talk](https://www.youtube.com/watch?v=mvzBQci04qA)
* [A GitHub repository collecting papers on deep learning interpretability](https://github.com/oneTaken/awesome_deep_learning_interpretability)
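
A minimal sketch of the t-SNE visualization mentioned under post-hoc interpretability, assuming scikit-learn and matplotlib are available. `features` stands in for representations extracted from a trained model (e.g., penultimate-layer activations); here random data is used as a placeholder.

```python
import numpy as np
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
features = rng.normal(size=(500, 64))    # stand-in for learned representations
labels = rng.integers(0, 10, size=500)   # stand-in for class labels

# Project the learned representations down to 2-D for inspection.
embedded = TSNE(n_components=2, perplexity=30, init="pca",
                random_state=0).fit_transform(features)

plt.scatter(embedded[:, 0], embedded[:, 1], c=labels, s=8, cmap="tab10")
plt.title("t-SNE of learned representations")
plt.show()
```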
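
For the representation-inversion paper, the objective shown in the linked images can be written roughly as below (my transcription: $\mathcal{R}$ is an image regularizer, $\lambda$ its weight, and the loss is a normalized Euclidean distance):

$$
x^{*} = \arg\min_{x \in \mathbb{R}^{H \times W \times C}} \; \ell\big(\Phi(x), \Phi_0\big) + \lambda\,\mathcal{R}(x),
\qquad
\ell\big(\Phi(x), \Phi_0\big) = \frac{\lVert \Phi(x) - \Phi_0 \rVert^2}{\lVert \Phi_0 \rVert^2}
$$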
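
A minimal sketch of a vanilla gradient saliency map, assuming PyTorch and torchvision (≥ 0.13 for the `weights` argument; older versions use `pretrained=True`). The image path `example.jpg` is a placeholder, and this is one common way to produce such a map, not necessarily the method behind the figure above.

```python
import torch
from torchvision import models, transforms
from PIL import Image

# Pretrained ImageNet classifier.
model = models.resnet18(weights="IMAGENET1K_V1")
model.eval()

# Standard ImageNet preprocessing.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# "example.jpg" is a placeholder path.
img = preprocess(Image.open("example.jpg").convert("RGB")).unsqueeze(0)
img.requires_grad_(True)

# Forward pass, then back-propagate the top class score to the input pixels.
scores = model(img)
score, _ = scores.max(dim=1)
score.backward()

# Saliency: largest absolute gradient across the colour channels.
saliency = img.grad.abs().max(dim=1)[0].squeeze(0)   # shape (224, 224)
```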
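
A minimal sketch of "explanation by examples" via nearest neighbours in a learned embedding space. The vocabulary and random vectors are toy placeholders standing in for word2vec-style embeddings of the kind shown in the Mikolov et al. paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy vocabulary and random vectors standing in for trained word embeddings.
vocab = ["sweden", "norway", "denmark", "france", "apple", "banana"]
embeddings = rng.normal(size=(len(vocab), 50))

def nearest_neighbours(word, k=3):
    """Return the k words whose embeddings are most cosine-similar to `word`."""
    idx = vocab.index(word)
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = normed @ normed[idx]
    order = np.argsort(-sims)
    return [(vocab[j], float(sims[j])) for j in order if j != idx][:k]

# With real embeddings, the neighbours of "sweden" would be other countries.
print(nearest_neighbours("sweden"))
```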