
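# 5.10 SHAP (SHapley Additive exPlanations)

## 5.10.1 Definition
- SHAP is an additive explanation model inspired by Shapley values. The model produces a prediction for each sample, and the SHAP values are the shares of that prediction attributed to each feature of the sample.
- SHAP builds on coalitional game theory: it computes the game-theoretically optimal Shapley values.

![](https://i.imgur.com/v6al6qG.png)

- In the SHAP paper the additive explanation model is written as $g(z')=\phi_0+\sum_{i=1}^{M}\phi_i z'_i$.
- Each $\phi_i$ is the effect of the i-th feature. $z'_i$ is 0 or 1 and indicates whether that feature is present in the model.
- Because SHAP computes Shapley values, it inherits their properties: the solution is unique and satisfies Efficiency, Symmetry, and Dummy.
- Explanation of the formula:
  - $f(x)$ is the prediction of the original model; $g(x)$ is the prediction of the post-hoc explanation model.
  - $\phi_0$ is the average of the predictions of $f$ over the dataset.
  - $\phi_i$ is the Shapley value of the i-th feature. It is the core quantity SHAP computes, and it must be unique.
  - $M$ is the number of features. $z'$ indicates whether the corresponding feature is present (1) or absent (0). "Present" is most natural for image and text data (for example, after one-hot encoding the vocabulary, a given sentence does not contain every word).
- Local Accuracy
  - The two models give the same prediction: for a single sample $x$, the output of the explanation model $g$ must equal $f(x)$.
- Missingness
  - If the i-th feature is not used (absent), its effect on the total is 0.
  - Its Shapley value is 0.
- Consistency
  - If the complex model $f$ changes (for example, from a random forest to XGBoost) and a feature's contribution to the prediction increases, its Shapley value increases as well (it never decreases).

* References
  - https://www.infoq.cn/article/20dOi64Cfj8ONPUXL5zP
  - https://zhuanlan.zhihu.com/p/85791430

## 5.10.2 KernelSHAP

### Overview
1. Sample coalitions
    * K: the total number of sampled coalitions
    * Decide the content of each coalition (1: feature present, 0: feature absent)
    * Sampling trick: favor small and large coalitions (they receive the largest weights)
2. Get prediction
    * Map the 1s and 0s of each coalition to feature values: 1 takes the value from the instance being explained, 0 takes a randomly sampled value
    * Feed the result into the black-box model to get a prediction
3. Compute the weight
    * Difference from LIME: LIME weights a sample by how close it is to the original instance, while KernelSHAP weights it by the size of its coalition
    * How the weight is computed (SHAP kernel): $\pi_x(z')=\frac{M-1}{\binom{M}{|z'|}\,|z'|\,(M-|z'|)}$
4. Fit weighted linear model
    * Loss function: $L(f,g,\pi_x)=\sum_{z'\in Z}\left[f(h_x(z'))-g(z')\right]^2\pi_x(z')$
5. Return Shapley values (the coefficients of the linear model)

* References
  - https://medium.com/ai-academy-taiwan/explain-your-machine-learning-model-by-shap-part-1-228fb2a57119
  - https://papers.nips.cc/paper/7062-a-unified-approach-to-interpreting-model-predictions.pdf?fbclid=IwAR14tfxHeorvi26tpwTdXNUTX8RhB4I0gK6YOgoviUjXZtMbPy4qv2-LHrQ
  - https://docs.seldon.io/projects/alibi/en/stable/methods/KernelSHAP.html

## 5.10.3 TreeSHAP

Properties:
1. Faster than KernelSHAP: the complexity drops from $O(TL2^M)$ to $O(TLD^2)$, where $T$ is the number of trees, $L$ the maximum number of leaves, $D$ the maximum depth, and $M$ the number of features.
2. It explains the local contribution of each feature, but the contributions can also be summed once all features have been computed.

- Computation walkthrough: https://medium.com/analytics-vidhya/shap-part-3-tree-shap-3af9bcd7cd9b

References
* [https://arxiv.org/pdf/1802.03888.pdf](https://arxiv.org/pdf/1802.03888.pdf)
* https://zhuanlan.zhihu.com/p/106320452

## 5.10.5 SHAP Feature Importance
- Traditional feature importance only tells us which features matter, not how a feature affects the result. The biggest advantage of SHAP values is that they show the influence of each feature in every individual sample, including whether that influence is positive or negative.
- The importance of a feature is the mean of the absolute SHAP values of that feature, which gives a standard bar chart (a stacked bar chart for multi-class problems).
- Versus permutation feature importance
  - Permutation feature importance shuffles a feature in the dataset and measures the resulting change in model performance. SHAP feature importance is based on the magnitude of the feature attributions.

## 5.10.6 SHAP Summary Plot
- Plot the SHAP value of every feature for every sample. This gives a better picture of the overall pattern and makes it possible to spot prediction outliers. Each row is a feature and the x-axis is the SHAP value. Each point is one sample, colored by feature value (red = high, blue = low).

## 5.10.7 SHAP Dependence Plot (SHAP DP)
- To understand how a single feature affects the model output, compare the SHAP values of that feature with its feature values across all samples in the dataset.
- Since a SHAP value is the feature's contribution to the change in model output, the plot shows how the SHAP value for the prediction target changes with the value of the selected feature.
- The x-axis is the feature value; the y-axis is the effect of this feature on the prediction for each instance.
- Figure 5.53: compared with 0 years, a small number of years of use lowers the predicted cancer probability, while a large number of years raises it.
- Differences from PDP
  - A PDP shows the average effect; a SHAP DP shows the feature's effect on each individual prediction.
  - The y-axis of a PDP is the predicted value; the y-axis of a SHAP DP is the SHAP value.
  - When interactions are present, instances with the same feature value are more spread out along the y-axis of the SHAP DP.

## 5.10.8 SHAP Interaction Values
* From Section 5.9.3.1: the Shapley value of a feature value is its contribution to the payout, weighted and summed over all possible feature value combinations:
* $\phi_j=\sum_{S\subseteq\{x_{1},\ldots,x_{p}\}\setminus\{x_j\}}\frac{|S|!\left(p-|S|-1\right)!}{p!}\left(\mathrm{val}\left(S\cup\{x_j\}\right)-\mathrm{val}(S)\right)$
* The weight can be rewritten as $\frac{|S|!\left(p-|S|-1\right)!}{p!}=\frac{1}{\binom{p-1}{|S|}\,p}$
* The SHAP interaction value is $\phi_{i,j}=\sum_{S\subseteq\{1,\ldots,M\}\setminus\{i,j\}}\frac{|S|!(M-|S|-2)!}{2(M-1)!}\delta_{ij}(S)$ when $i\neq j$, and: $\delta_{ij}(S)=f_x(S\cup\{i,j\})-f_x(S\cup\{i\})-f_x(S\cup\{j\})+f_x(S)$
* Reference book (page 27) https://books.google.com.tw/books?id=YcJv1XzIWSMC&lpg=PA23&ots=PltN-3otmB&dq=fuzzy%20integral%20as%20a%20new%20tool%20in%20pattern%20recognition&hl=zh-TW&pg=PA27#v=onepage&q&f=true
* References
  - https://cloud.tencent.com/developer/article/1629351
  - https://blog.csdn.net/l
  - https://zhuanlan.zhihu.com/p/64799119
  - Explaining an XGBoost model with SHAP (利用SHAP解释Xgboost模型):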
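    http://sofasofa.io/tutorials/shap_xgboost/

## 5.10.9 Clustering SHAP values
- Clustering is usually done on the features, but features have different scales, which makes distances harder to compute.
- Clustering on Shapley values instead works with any clustering method.

## 5.10.10 Advantages
- It describes in a sound and effective way how the features make up a prediction (the prediction is fairly distributed among the feature values), and it shows how a single local prediction differs from the average prediction.
- Connects LIME and Shapley values
  - LIME replaces the complex model with a linear model to explain the contribution of each feature.
  - Shapley values use game theory to compute the contribution of each feature.
  - LIME's linear surrogate + Shapley values as the quantities being computed = SHAP
- Tree-based SHAP is fast to compute (which is why it is widely used).
- Works for both local and global interpretation
  - Shapley values are the "atomic unit" of the global interpretations.
  - Because the computation is fast, many Shapley values can be computed.
  - Local and global explanations rest on the same foundation, unlike LIME, which looks only at local data and lacks a basis for the overall view.

## 5.10.11 Disadvantages
- KernelSHAP
  - is slow (too many coalitions to consider)
  - ignores feature dependence (features are randomly sampled or shuffled, which produces unrealistic data points)
  - may put too much weight on unlikely, extreme data points
- TreeSHAP (addresses the KernelSHAP problems above)
  - TreeSHAP can produce unintuitive feature attributions
  - A feature with no influence does not necessarily get a TreeSHAP value of zero. (TreeSHAP changes the value function by relying on the conditional expected prediction. With the change in the value function, features that have no influence on the prediction can get a TreeSHAP value different from zero.)
- General problems (inherited from Shapley values)
  - Because the values are differences from the average prediction, they are easy to misinterpret.
  - Computing them for new data requires access to the data, except for TreeSHAP. (The disadvantages of Shapley values also apply to SHAP: Shapley values can be misinterpreted and access to data is needed to compute them for new data (except for TreeSHAP).)

## 5.10.12 Software

References
* https://zhuanlan.zhihu.com/p/64799119

References (Hsin)
* https://zhuanlan.zhihu.com/p/85791430
* https://towardsdatascience.com/identifying-high-risk-groups-using-shap-values-on-healthcare-data-e3e7198f30f6
* http://www.weainfo.net/news/detail/420388

GitHub repository of the shap author
* https://github.com/slundberg/shap

###### tags: `重點摘要`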
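
### Worked example: brute-force Shapley and interaction values

The Shapley value and SHAP interaction value formulas from Section 5.10.8 can be checked by brute force on a tiny model. The following is a minimal sketch, not the `shap` package's implementation. It assumes a hypothetical three-feature model `toy_model`, and it assumes that an absent feature is replaced by a fixed `baseline` value (real SHAP estimators average over background data instead). All function names are illustrative.

```python
from itertools import combinations
from math import factorial


def toy_model(x):
    # Hypothetical model: two additive terms plus an x0*x2 interaction.
    return 2 * x[0] + 3 * x[1] + 4 * x[0] * x[2]


def make_value_function(model, x, baseline):
    """Return val(S): the prediction when only features in S come from x.

    Assumption: absent features are set to a fixed baseline value.
    """
    def value(S):
        z = [x[k] if k in S else baseline[k] for k in range(len(x))]
        return model(z)
    return value


def shapley_values(value, p):
    """Exact Shapley values: weighted sum of marginal contributions."""
    phi = [0.0] * p
    for j in range(p):
        others = [k for k in range(p) if k != j]
        for size in range(p):  # |S| = 0 .. p-1
            weight = factorial(size) * factorial(p - size - 1) / factorial(p)
            for subset in combinations(others, size):
                S = frozenset(subset)
                phi[j] += weight * (value(S | {j}) - value(S))
    return phi


def interaction_value(value, M, i, j):
    """SHAP interaction value phi_{i,j} for i != j."""
    others = [k for k in range(M) if k not in (i, j)]
    total = 0.0
    for size in range(M - 1):  # |S| = 0 .. M-2
        weight = factorial(size) * factorial(M - size - 2) / (2 * factorial(M - 1))
        for subset in combinations(others, size):
            S = frozenset(subset)
            delta = value(S | {i, j}) - value(S | {i}) - value(S | {j}) + value(S)
            total += weight * delta
    return total


x = [1.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0]
value = make_value_function(toy_model, x, baseline)
phi = shapley_values(value, len(x))

print("phi:", phi)
print("phi_0 + sum(phi):", value(frozenset()) + sum(phi), "f(x):", toy_model(x))
print("phi_{0,2}:", interaction_value(value, 3, 0, 2))
print("phi_{0,1}:", interaction_value(value, 3, 0, 1))
```

For this toy model the attributions come out at approximately 4, 3, and 2. Together with the baseline prediction they add up to $f(x)=9$, which is the Local Accuracy property. The `x0*x2` term contributes 4 in total, split evenly between $\phi_{0,2}$ and $\phi_{2,0}$, while $\phi_{0,1}$ is zero because features 0 and 1 do not interact. The enumeration visits every coalition, so it is only practical for a handful of features; this is the cost that KernelSHAP sampling and TreeSHAP are designed to avoid.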