Shapley values are a useful tool for distributing the gains from a cooperative game among a coalition of players. They were first introduced in 1951 by Lloyd Shapley. The setup is as follows: a coalition of players cooperates and obtains a certain overall gain from that cooperation. Since some players may contribute more to the coalition than others, what is the "fair" distribution of the generated surplus among the players in any particular game? Phrased differently: how important is each player to the overall cooperation, and what payoff can he or she reasonably expect? The Shapley value provides one possible answer to this question.
Formally, there is a set $N$ of $n$ players and a characteristic function $v \colon 2^N \to \mathbb{R}$ that maps subsets of players to real numbers, with $v(\emptyset) = 0$.
The function $v$ has the following interpretation: if $S$ is a coalition of players, then $v(S)$, called the worth of coalition $S$, is the total payoff the members of $S$ can obtain by cooperating.
The Shapley value is one way to distribute the total gains to the players, assuming that they all collaborate. It is a "fair" distribution in the sense that it is the only distribution which satisfies certain desirable properties (more details on this below). According to the Shapley value, the amount that player $i$ receives in a coalitional game $(v, N)$ is

$$\varphi_i(v) = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(n - |S| - 1)!}{n!} \left( v(S \cup \{i\}) - v(S) \right).$$

Intuitively, this sums over all subsets $S$ not containing player $i$ the marginal contribution $v(S \cup \{i\}) - v(S)$ of adding $i$ to $S$, weighted by the fraction of orderings of the players in which exactly the members of $S$ come before $i$. Equivalently, it is player $i$'s average marginal contribution over all possible orders in which the coalition can be assembled.
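To make the formula concrete, here is a minimal sketch (the player names and the characteristic function are invented for illustration) that computes Shapley values for a three-player game by brute force, directly following the sum over subsets above.

```python
from itertools import combinations
from math import factorial

def shapley_values(players, v):
    """Brute-force Shapley values: for each player i, sum over all subsets S
    not containing i the weighted marginal contribution v(S + {i}) - v(S)."""
    n = len(players)
    values = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for size in range(len(others) + 1):
            for S in combinations(others, size):
                weight = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                total += weight * (v(frozenset(S) | {i}) - v(frozenset(S)))
        values[i] = total
    return values

# Toy 3-player game (illustrative numbers only):
# any single player earns 10, any pair earns 30, all three earn 60.
def v(coalition):
    payoffs = {0: 0, 1: 10, 2: 30, 3: 60}
    return payoffs[len(coalition)]

print(shapley_values(["A", "B", "C"], v))  # symmetric game -> {'A': 20.0, 'B': 20.0, 'C': 20.0}
```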
The Shapley value satisfies several desirable properties. First, the sum of the Shapley values of all agents equals the value of the grand coalition, $\sum_{i \in N} \varphi_i(v) = v(N)$, so that all the gain is distributed among the agents. This is the efficiency property.
If $i$ and $j$ are two players who contribute equally, in the sense that

$$v(S \cup \{i\}) = v(S \cup \{j\})$$

for every subset $S$ of $N$ containing neither $i$ nor $j$, then $\varphi_i(v) = \varphi_j(v)$. This is the symmetry property.
This property is also called equal treatment of equals.
If two coalition games described by gain functions $v$ and $w$ are combined, then the distributed gains should correspond to the gains derived from $v$ plus the gains derived from $w$:

$$\varphi_i(v + w) = \varphi_i(v) + \varphi_i(w)$$

for every $i$ in $N$. Also, for any real number $a$,

$$\varphi_i(a v) = a\,\varphi_i(v)$$

for every $i$ in $N$. This is the linearity property.
The Shapley value $\varphi_i(v)$ of a null player $i$ is zero, where player $i$ is null in $v$ if $v(S \cup \{i\}) = v(S)$ for every coalition $S$ not containing $i$. This is the null player property.
Given a player set $N$, the Shapley value is the only allocation rule that satisfies all four properties: efficiency, symmetry, linearity, and the null player property. This uniqueness is what justifies calling it the "fair" distribution.
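As a quick sanity check on these properties (again with an invented game), here is a small sketch that computes Shapley values by averaging marginal contributions over all player orderings, then verifies efficiency, symmetry, and the null player property numerically.

```python
from itertools import permutations

def shapley_by_orderings(players, v):
    """Average each player's marginal contribution over all n! orderings."""
    values = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            values[p] += v(coalition | {p}) - v(coalition)
            coalition = coalition | {p}
    return {p: total / len(orderings) for p, total in values.items()}

# Toy game: A and B are interchangeable, C contributes nothing (a null player).
def v(coalition):
    return 10 * len(coalition & {"A", "B"})

phi = shapley_by_orderings(["A", "B", "C"], v)
print(phi)                                                  # {'A': 10.0, 'B': 10.0, 'C': 0.0}
assert abs(sum(phi.values()) - v(frozenset("ABC"))) < 1e-9  # efficiency
assert abs(phi["A"] - phi["B"]) < 1e-9                      # symmetry
assert abs(phi["C"]) < 1e-9                                 # null player
```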
Paper introducing Shapley values as ML model interpretations: A Unified Approach to Interpreting Model Predictions, by Lundberg and Lee, Part of Advances in Neural Information Processing Systems 30 (NIPS 2017), https://proceedings.neurips.cc/paper/2017/hash/8a20a8621978632d76c43dfd28b67767-Abstract.html
One can think of ML inputs as players cooperating to obtain some value (the ML model's output). Each input contributes to the final prediction, but in unequal ways. Of course, it's difficult to see a direct connection between an input and the "value", since there are many steps between the raw inputs and a trained model's prediction. A key insight in this paper is to treat model interpretation, i.e. assigning values to input features, as a model in its own right. By doing this, the authors are able to show that various feature attribution methods are simply different approaches to computing an underlying Shapley value.
More formally, let $f$ be the original prediction model to be explained and $g$ the explanation model, where the goal is to explain a single prediction $f(x)$ for one input $x$. Explanation models typically work with simplified inputs $x'$ that map to the original inputs through a mapping function $x = h_x(x')$, and local methods try to ensure $g(z') \approx f(h_x(z'))$ whenever $z' \approx x'$.
Definition: Additive feature attribution methods have an explanation model that is a linear function of binary variables,

$$g(z') = \varphi_0 + \sum_{i=1}^{M} \varphi_i z'_i,$$

where $z' \in \{0, 1\}^M$, $M$ is the number of simplified input features, and $\varphi_i \in \mathbb{R}$.
These methods attribute an effect $\varphi_i$ to each feature, and summing the effects of all feature attributions approximates the output $f(x)$ of the original model.
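A minimal sketch of this definition (the attribution numbers below are made up): an explanation model is just a base value plus the attributions of whichever features are "present" in the simplified binary input.

```python
import numpy as np

def additive_explanation(phi0, phi, z_prime):
    """Evaluate g(z') = phi_0 + sum_i phi_i * z'_i for a binary vector z'."""
    return phi0 + np.dot(phi, z_prime)

# Hypothetical attributions for a 3-feature model.
phi0 = 0.2                        # base value (expected model output)
phi = np.array([0.5, -0.1, 0.3])  # per-feature effects
print(additive_explanation(phi0, phi, np.array([1, 1, 1])))  # all features present -> 0.9
print(additive_explanation(phi0, phi, np.array([1, 0, 0])))  # only feature 1 present -> 0.7
```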
A surprising attribute of the class of additive feature attribution methods is the presence of a single unique solution in this class with three desirable properties.
Property 1: Local accuracy
The explanation model matches the original model at the input being explained: when $x = h_x(x')$,

$$f(x) = g(x') = \varphi_0 + \sum_{i=1}^{M} \varphi_i x'_i.$$

Property 2: Missingness
Features that are missing in the original input have no attributed impact: $x'_i = 0 \implies \varphi_i = 0$.

Property 3: Consistency
If the marginal contribution of a feature increases or stays the same when the model changes (regardless of the other features), then that feature's attribution should not decrease. Formally, writing $f_x(z') = f(h_x(z'))$ and letting $z' \setminus i$ denote setting $z'_i = 0$: for any two models $f$ and $f'$, if

$$f'_x(z') - f'_x(z' \setminus i) \ge f_x(z') - f_x(z' \setminus i)$$

for all inputs $z' \in \{0, 1\}^M$, then $\varphi_i(f', x) \ge \varphi_i(f, x)$.
These properties are clearly analogous to the properties of Shapley values. It should come as no surprise, then, that the only values satisfying all three properties are the Shapley values of a game in which the features are the players:

$$\varphi_i(f, x) = \sum_{z' \subseteq x'} \frac{|z'|!\,(M - |z'| - 1)!}{M!} \left[ f_x(z') - f_x(z' \setminus i) \right],$$

where $|z'|$ is the number of non-zero entries in $z'$ and $z' \subseteq x'$ ranges over all $z'$ vectors whose non-zero entries are a subset of the non-zero entries in $x'$. Lundberg and Lee call these SHAP (SHapley Additive exPlanation) values.
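To connect the two halves of this post, here is a hedged sketch (the model, baseline, and feature values are all invented) that treats each feature of a tiny linear model as a player: the characteristic function evaluates the model with "absent" features replaced by a baseline value, and the Shapley formula from earlier then yields exact SHAP values satisfying local accuracy.

```python
from itertools import combinations
from math import factorial
import numpy as np

# A toy linear model and a single input to explain (numbers are illustrative).
weights = np.array([2.0, -1.0, 0.5])
def model(x):
    return float(weights @ x)

x = np.array([1.0, 3.0, 2.0])          # the instance being explained
baseline = np.array([0.0, 0.0, 0.0])   # reference values for "missing" features

def f_masked(present):
    """Characteristic function: model output with absent features set to the baseline."""
    z = baseline.copy()
    for i in present:
        z[i] = x[i]
    return model(z)

def shap_values(n):
    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            for S in combinations(others, size):
                w = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                total += w * (f_masked(set(S) | {i}) - f_masked(set(S)))
        phi.append(total)
    return np.array(phi)

phi = shap_values(3)
print(phi)                          # [ 2. -3.  1.] for this linear model
print(phi.sum() + f_masked(set()))  # local accuracy: equals model(x) = 0.0
```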
The exact computation of SHAP values is challenging. However, by combining insights from current additive feature attribution methods, we can approximate them. Model-agnostic approximations include the Shapley sampling values method and Kernel SHAP. There are also model-specific methods such as Linear SHAP, Low-Order SHAP, Max SHAP, and Deep SHAP.
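As an illustration of the model-agnostic route, here is a sketch of Kernel SHAP using the `shap` Python package (this assumes `shap` and scikit-learn are installed; the data and model are synthetic, and API details may vary across package versions).

```python
import numpy as np
import shap
from sklearn.ensemble import RandomForestRegressor

# Synthetic regression data: y depends mostly on the first two features.
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 4))
y = X[:, 0] * 2.0 + X[:, 1] ** 2 + rng.normal(scale=0.1, size=200)

model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

# Kernel SHAP approximates SHAP values by sampling coalitions of features;
# the background dataset defines what a "missing" feature is replaced with.
background = shap.sample(X, 50)
explainer = shap.KernelExplainer(model.predict, background)
shap_values = explainer.shap_values(X[:5])   # attributions for 5 instances
print(shap_values.shape)                     # (5, 4): one value per instance per feature
```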
Unfortunately, it's not clear how accurate these methods are in various scenarios. While the theoretical Shapley values are unique, these approximation methods can give results that differ substantially (see this study by Merrick and Taly: https://arxiv.org/abs/1909.08128). Merrick and Taly stress the importance of carefully constructing contrastive explanation questions. Approximating Shapley values relative to chosen references can give more insight than values in a vacuum.
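As a small illustration of that last point (the toy model and reference points are invented), the single-reference attributions below change when the reference changes, even though the model and the instance being explained stay fixed.

```python
from itertools import combinations
from math import factorial

# Toy non-linear model of two features.
def model(x1, x2):
    return x1 * x2

x = (3.0, 2.0)   # instance to explain

def shapley_vs_reference(ref):
    """Exact Shapley values where 'missing' features take the reference value."""
    def f(present):
        z = [x[i] if i in present else ref[i] for i in range(2)]
        return model(*z)
    phi = []
    for i in range(2):
        others = [j for j in range(2) if j != i]
        total = 0.0
        for size in range(2):
            for S in combinations(others, size):
                w = factorial(len(S)) * factorial(2 - len(S) - 1) / factorial(2)
                total += w * (f(set(S) | {i}) - f(set(S)))
        phi.append(total)
    return phi

print(shapley_vs_reference((0.0, 0.0)))  # [3.0, 3.0] relative to an all-zero reference
print(shapley_vs_reference((1.0, 0.0)))  # [2.0, 4.0] relative to a different reference
```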