Shapley values are a useful tool for distributing the gains from a cooperative game among a coalition of players. They were first introduced in 1951 by Lloyd Shapley. The setup is as follows: a coalition of players cooperates and obtains a certain overall gain from that cooperation. Since some players may contribute more to the coalition than others, what is the "fair" distribution of the generated surplus among the players in any particular game? Phrased differently: how important is each player to the overall cooperation, and what payoff can he or she reasonably expect? The Shapley value provides one possible answer to this question.
Formally, there is a set $N$ of $n$ players and a characteristic function $v \colon 2^N \to \mathbb{R}$ that maps subsets of players to real numbers, with $v(\emptyset) = 0$.
The function $v$ has the following interpretation: if $S$ is a coalition of players, then $v(S)$, called the worth of coalition $S$, is the total payoff the members of $S$ can obtain by cooperating.
The Shapley value is one way to distribute the total gains to the players, assuming that they all collaborate. It is a "fair" distribution in the sense that it is the only distribution which satisfies certain desirable properties (more details on this below). According to the Shapley value, the amount that player $i$ receives in a coalitional game $(v, N)$ is

$$\varphi_i(v) = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(n - |S| - 1)!}{n!} \left( v(S \cup \{i\}) - v(S) \right).$$

Intuitively, this sums over all subsets $S$ not containing player $i$ the marginal contribution $v(S \cup \{i\}) - v(S)$ of adding $i$ to $S$, weighted by the fraction of orderings of the players in which exactly the members of $S$ come before $i$. Equivalently, it is player $i$'s average marginal contribution over all possible orders in which the coalition can be assembled.
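To make the formula concrete, here is a minimal sketch (the player names and the characteristic function are invented for illustration) that computes Shapley values for a three-player game by brute force, directly following the sum over subsets above.

```python
from itertools import combinations
from math import factorial

def shapley_values(players, v):
    """Brute-force Shapley values: for each player i, sum over all subsets S
    not containing i the weighted marginal contribution v(S + {i}) - v(S)."""
    n = len(players)
    values = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for size in range(len(others) + 1):
            for S in combinations(others, size):
                weight = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                total += weight * (v(frozenset(S) | {i}) - v(frozenset(S)))
        values[i] = total
    return values

# Toy 3-player game (illustrative numbers only):
# any single player earns 10, any pair earns 30, all three earn 60.
def v(coalition):
    payoffs = {0: 0, 1: 10, 2: 30, 3: 60}
    return payoffs[len(coalition)]

print(shapley_values(["A", "B", "C"], v))  # symmetric game -> {'A': 20.0, 'B': 20.0, 'C': 20.0}
```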
The Shapley value satisfies several desirable properties. First, the sum of the Shapley values of all agents equals the value of the grand coalition, $\sum_{i \in N} \varphi_i(v) = v(N)$, so that all the gain is distributed among the agents. This is the efficiency property.
If $i$ and $j$ are two players who contribute equally, in the sense that

$$v(S \cup \{i\}) = v(S \cup \{j\})$$

for every subset $S$ of $N$ containing neither $i$ nor $j$, then $\varphi_i(v) = \varphi_j(v)$. This is the symmetry property.
This property is also called equal treatment of equals.
If two coalition games described by gain functions $v$ and $w$ are combined, then the distributed gains should correspond to the gains derived from $v$ plus the gains derived from $w$:

$$\varphi_i(v + w) = \varphi_i(v) + \varphi_i(w)$$

for every $i$ in $N$. Also, for any real number $a$,

$$\varphi_i(a v) = a\,\varphi_i(v)$$

for every $i$ in $N$. This is the linearity property.
The Shapley value $\varphi_i(v)$ of a null player $i$ is zero, where player $i$ is null in $v$ if $v(S \cup \{i\}) = v(S)$ for every coalition $S$ not containing $i$. This is the null player property.
Given a player set $N$, the Shapley value is the only allocation rule that satisfies all four properties: efficiency, symmetry, linearity, and the null player property. This uniqueness is what justifies calling it the "fair" distribution.
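As a quick sanity check on these properties (again with an invented game), here is a small sketch that computes Shapley values by averaging marginal contributions over all player orderings, then verifies efficiency, symmetry, and the null player property numerically.

```python
from itertools import permutations

def shapley_by_orderings(players, v):
    """Average each player's marginal contribution over all n! orderings."""
    values = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            values[p] += v(coalition | {p}) - v(coalition)
            coalition = coalition | {p}
    return {p: total / len(orderings) for p, total in values.items()}

# Toy game: A and B are interchangeable, C contributes nothing (a null player).
def v(coalition):
    return 10 * len(coalition & {"A", "B"})

phi = shapley_by_orderings(["A", "B", "C"], v)
print(phi)                                                  # {'A': 10.0, 'B': 10.0, 'C': 0.0}
assert abs(sum(phi.values()) - v(frozenset("ABC"))) < 1e-9  # efficiency
assert abs(phi["A"] - phi["B"]) < 1e-9                      # symmetry
assert abs(phi["C"]) < 1e-9                                 # null player
```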
Paper introducing Shapley values as ML model interpretations: A Unified Approach to Interpreting Model Predictions, by Lundberg and Lee, Part of Advances in Neural Information Processing Systems 30 (NIPS 2017), https://proceedings.neurips.cc/paper/2017/hash/8a20a8621978632d76c43dfd28b67767-Abstract.html
One can think of ML inputs as players cooperating to obtain some value (the ML model's output). Each input contributes to the final prediction, but in unequal ways. Of course, it's difficult to see a direct connection between an input and the "value", since there are many steps between the raw inputs and a trained model's prediction. A key insight in this paper is to treat model interpretation, i.e. assigning values to input features, as a model in its own right. By doing this, the authors are able to show that various feature attribution methods are simply different approaches to computing an underlying Shapley value.
More formally, let $f$ be the original prediction model to be explained and $g$ the explanation model, where the goal is to explain a single prediction $f(x)$ for one input $x$. Explanation models typically work with simplified inputs $x'$ that map to the original inputs through a mapping function $x = h_x(x')$, and local methods try to ensure $g(z') \approx f(h_x(z'))$ whenever $z' \approx x'$.
Definition: Additive feature attribution methods have an explanation model that is a linear function of binary variables,

$$g(z') = \varphi_0 + \sum_{i=1}^{M} \varphi_i z'_i,$$

where $z' \in \{0, 1\}^M$, $M$ is the number of simplified input features, and $\varphi_i \in \mathbb{R}$.
These methods attribute an effect $\varphi_i$ to each feature, and summing the effects of all feature attributions approximates the output $f(x)$ of the original model.
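A minimal sketch of this definition (the attribution numbers below are made up): an explanation model is just a base value plus the attributions of whichever features are "present" in the simplified binary input.

```python
import numpy as np

def additive_explanation(phi0, phi, z_prime):
    """Evaluate g(z') = phi_0 + sum_i phi_i * z'_i for a binary vector z'."""
    return phi0 + np.dot(phi, z_prime)

# Hypothetical attributions for a 3-feature model.
phi0 = 0.2                        # base value (expected model output)
phi = np.array([0.5, -0.1, 0.3])  # per-feature effects
print(additive_explanation(phi0, phi, np.array([1, 1, 1])))  # all features present -> 0.9
print(additive_explanation(phi0, phi, np.array([1, 0, 0])))  # only feature 1 present -> 0.7
```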
A surprising attribute of the class of additive feature attribution methods is the presence of a single unique solution in this class with three desirable properties.
Property 1: Local accuracy
The explanation model matches the original model at the input being explained: when $x = h_x(x')$,

$$f(x) = g(x') = \varphi_0 + \sum_{i=1}^{M} \varphi_i x'_i.$$

Property 2: Missingness
Features that are missing in the original input have no attributed impact: $x'_i = 0 \implies \varphi_i = 0$.

Property 3: Consistency
If the marginal contribution of a feature increases or stays the same when the model changes (regardless of the other features), then that feature's attribution should not decrease. Formally, writing $f_x(z') = f(h_x(z'))$ and letting $z' \setminus i$ denote setting $z'_i = 0$: for any two models $f$ and $f'$, if

$$f'_x(z') - f'_x(z' \setminus i) \ge f_x(z') - f_x(z' \setminus i)$$

for all inputs $z' \in \{0, 1\}^M$, then $\varphi_i(f', x) \ge \varphi_i(f, x)$.
These properties are clearly analogous to the properties of Shapley values. It should come as no surprise, then, that the only values satisfying all three properties are the Shapley values of a game in which the features are the players:

$$\varphi_i(f, x) = \sum_{z' \subseteq x'} \frac{|z'|!\,(M - |z'| - 1)!}{M!} \left[ f_x(z') - f_x(z' \setminus i) \right],$$

where $|z'|$ is the number of non-zero entries in $z'$ and $z' \subseteq x'$ ranges over all $z'$ vectors whose non-zero entries are a subset of the non-zero entries in $x'$. Lundberg and Lee call these SHAP (SHapley Additive exPlanation) values.
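To connect the two halves of this post, here is a hedged sketch (the model, baseline, and feature values are all invented) that treats each feature of a tiny linear model as a player: the characteristic function evaluates the model with "absent" features replaced by a baseline value, and the Shapley formula from earlier then yields exact SHAP values satisfying local accuracy.

```python
from itertools import combinations
from math import factorial
import numpy as np

# A toy linear model and a single input to explain (numbers are illustrative).
weights = np.array([2.0, -1.0, 0.5])
def model(x):
    return float(weights @ x)

x = np.array([1.0, 3.0, 2.0])          # the instance being explained
baseline = np.array([0.0, 0.0, 0.0])   # reference values for "missing" features

def f_masked(present):
    """Characteristic function: model output with absent features set to the baseline."""
    z = baseline.copy()
    for i in present:
        z[i] = x[i]
    return model(z)

def shap_values(n):
    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            for S in combinations(others, size):
                w = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                total += w * (f_masked(set(S) | {i}) - f_masked(set(S)))
        phi.append(total)
    return np.array(phi)

phi = shap_values(3)
print(phi)                          # [ 2. -3.  1.] for this linear model
print(phi.sum() + f_masked(set()))  # local accuracy: equals model(x) = 0.0
```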
The exact computation of SHAP values is challenging. However, by combining insights from current additive feature attribution methods, we can approximate them. Model-agnostic approximations include the Shapley sampling values method and Kernel SHAP. There are also model-specific methods such as Linear SHAP, Low-Order SHAP, Max SHAP, and Deep SHAP.
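As an illustration of the model-agnostic route, here is a sketch of Kernel SHAP using the `shap` Python package (this assumes `shap` and scikit-learn are installed; the data and model are synthetic, and API details may vary across package versions).

```python
import numpy as np
import shap
from sklearn.ensemble import RandomForestRegressor

# Synthetic regression data: y depends mostly on the first two features.
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 4))
y = X[:, 0] * 2.0 + X[:, 1] ** 2 + rng.normal(scale=0.1, size=200)

model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

# Kernel SHAP approximates SHAP values by sampling coalitions of features;
# the background dataset defines what a "missing" feature is replaced with.
background = shap.sample(X, 50)
explainer = shap.KernelExplainer(model.predict, background)
shap_values = explainer.shap_values(X[:5])   # attributions for 5 instances
print(shap_values.shape)                     # (5, 4): one value per instance per feature
```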
Unfortunately, it's not clear how accurate these methods are in various scenarios. While the theoretical Shapley values are unique, these approximation methods can give results that differ substantially (see this study by Merrick and Taly: https://arxiv.org/abs/1909.08128). Merrick and Taly stress the importance of carefully constructing contrastive explanation questions. Approximating Shapley values relative to chosen references can give more insight than values in a vacuum.
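As a small illustration of that last point (the toy model and reference points are invented), the single-reference attributions below change when the reference changes, even though the model and the instance being explained stay fixed.

```python
from itertools import combinations
from math import factorial

# Toy non-linear model of two features.
def model(x1, x2):
    return x1 * x2

x = (3.0, 2.0)   # instance to explain

def shapley_vs_reference(ref):
    """Exact Shapley values where 'missing' features take the reference value."""
    def f(present):
        z = [x[i] if i in present else ref[i] for i in range(2)]
        return model(*z)
    phi = []
    for i in range(2):
        others = [j for j in range(2) if j != i]
        total = 0.0
        for size in range(2):
            for S in combinations(others, size):
                w = factorial(len(S)) * factorial(2 - len(S) - 1) / factorial(2)
                total += w * (f(set(S) | {i}) - f(set(S)))
        phi.append(total)
    return phi

print(shapley_vs_reference((0.0, 0.0)))  # [3.0, 3.0] relative to an all-zero reference
print(shapley_vs_reference((1.0, 0.0)))  # [2.0, 4.0] relative to a different reference
```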