ML Explainability

Shapley Residuals: Measuring the Limitations of Shapley Values for Explainability


This post was originally published in Towards Data Science.

We will use a cube representation of games to walk through the interpretation and limitations of Shapley values.

Introduction

To use machine learning responsibly, you should try to explain what drives your ML model’s predictions. Many data scientists and machine learning companies are recognizing how important it is to be able to explain, feature-by-feature, how a model is reacting to the inputs it is given. This article will show how Shapley values, one of the most common explainability techniques, can miss important information when explaining a model. Then, we will introduce Shapley residuals, a new technique to measure how well Shapley values are capturing model behavior, along with some code to get started calculating them!

Consider the following example from Christoph Molnar’s Interpretable Machine Learning book: a bike-sharing company trains a model to predict the number of bikes taken out on a given day, using features like seasonal info, the day of the week, weather info, etc. Then, if their model is predicting a lower-than-average rider count on some day in the future, they can find out why that lower-than-average score is occurring: by looking at how the model is reacting to each feature. Was it because of a holiday? Was it because of the weather?

A common way of computing the importance of each of your model’s features is to use Shapley values, since it is a method that is 1) widely applicable to many problems, 2) based on solid theoretical grounding, and 3) easily implementable with the SHAP Python library.

The problem: In some scenarios, Shapley values fail to express information about model behavior, because they only return a score for one feature at a time. For instance, in the bike-sharing scenario, we are treating the weather and the day of the week as independent features, but sometimes it is the combination of those features that matters. When a combination of features is more important than the individual features themselves, Shapley values can fail to properly explain a model.

Bar Trivia Example

Let’s use a simpler setting with fewer features to walk through the problem with Shapley values in more detail.

I like to attend trivia nights at some local bars in the neighborhood with different coworkers of mine each week. It’s become quite clear that some members of our team bring more to the table than others.

Can we quantify the impact each team member has on the trivia performance? We can use Shapley values for each player with the following interpretation: they should correspond to the expected change in score when adding that player to the trivia team. Other possible interpretations exist*, but we will use this one.

(*Note: This way of computing Shapley values, known as the “interventional” method, measures the expected change in score when a feature is added. A different approach computes “conditional” Shapley values. The key difference between the two lies in how they treat a feature whose expected change in score is zero: should its Shapley value be zero? If you think the answer is “yes,” use the interventional method. If instead you think the feature might still have importance due to correlations, and that this importance should be included in its Shapley value, then consider using the conditional method.)

Geometrically, a useful way to plot all these 3-player game scores with different teams is as points on a cube, arranged so that neighboring points differ by just one player. Then, the paths between points (a.k.a. the cube’s edges) will represent the change in score when adding a player to a team.

(Note: With two players, we would plot this as a square. With four or more players, we would have to plot this as a hypercube.)

Let’s call this shape a GameCube; this will be a useful shape for us because both Shapley values and GameCube edges will correspond to the change in score when adding a player.

Figure 1: Plotting each trivia score on a different vertex of a cube corresponding to the players present on the team that night.

In our story, Reid is only knowledgeable about sports trivia, and GW knows about movies, music, history, geography, literature—pretty much everything except sports trivia. So when Reid plays, he improves the score by a little; when GW plays, she increases the score by a lot. And me, well, I’m mostly there for the beer and the company.

A Shapley value is a perfect measure of explainability only when a player always contributes the same amount to a team’s score. And since each player’s effect on the score is constant in our story so far, we can assign a Shapley value of 1 to Reid, a Shapley value of 9 to GW, and a Shapley value of 0 to Max. These Shapley values represent the expected change in score when each player joins the team!

Figure 2: Viewing the change in team scores when adding each player.

In more technical terms, a game where each player’s impact is consistent (like our story so far) is called an “inessential game.” We will use the symbol ▽v to represent the “gradient” of a GameCube v, which assigns to each edge the difference between the scores on its two endpoints, and we will use ▽_player v to represent the partial gradient that keeps the edge values in a specific player’s direction and sets all other edges to zero.

For example, the partial gradient ▽_Reid v represents all possible changes in score when adding Reid.
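
To make the notation concrete, here is a small sketch (the data structure is an illustrative choice, not code from the original post) that stores the first story’s team scores on the cube’s vertices and reads off the edges in Reid’s direction:

```python
# Team scores from the first story, keyed by the set of players present.
scores = {
    frozenset(): 0,
    frozenset({"Max"}): 0,
    frozenset({"Reid"}): 1,
    frozenset({"GW"}): 9,
    frozenset({"Max", "Reid"}): 1,
    frozenset({"Max", "GW"}): 9,
    frozenset({"Reid", "GW"}): 10,
    frozenset({"Max", "Reid", "GW"}): 10,
}

# ▽_Reid v: for every team without Reid, the change in score when Reid joins.
grad_reid = {
    team: scores[team | {"Reid"}] - scores[team]
    for team in scores
    if "Reid" not in team
}
print(grad_reid)  # every edge in Reid's direction equals 1 in this inessential game
```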

Figure 3: Expressing the change in scores when adding a player as the partial gradient of the GameCube with respect to each player.

Feature contributions can’t always be expressed as a single number, so Shapley values aren’t enough.

You should expect that most of the time, the features you are working with won’t have constant impacts on model outputs—instead, the impact of a feature typically depends on what the other features are.

Let’s change up our story.

Suppose that Max’s behavior changes based on who he is playing with. When playing with GW, he is pretty chill, drinks his beer, minds his own business and lets GW do most of the work, so he doesn’t bring the score down. But when Max plays with Reid, he gets jealous of how much Reid knows about sports, so Max starts to speak up more, suggesting some wrong answers and bringing the score down by 1!

Figure 4: The new GameCube with inconsistent player contributions.

On this new GameCube, GW’s edges are constant, so her Shapley value of 9 still corresponds exactly to the change in score when she plays. But Max’s and Reid’s edges are not constant, because their impact on score depends on who they are playing with. Therefore, our way of using GameCube edges to quantify what Max and Reid bring to the table now has a problem.

When real data scientists use Shapley values, they solve this problem by taking the average contribution of a player across teams: on the GameCube, this means quantifying a player’s contribution as the average of the edge values in their direction. So on our GameCube above, GW’s Shapley value would still be 9 as before, but Reid’s Shapley value would now be 0.5 and Max’s Shapley value would now be -0.5. For some use cases, the story ends there: a player’s average contribution can sometimes be a good enough quantification of their impact!
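
As a sanity check, here is a short snippet (an illustrative sketch, with the team scores read off of Figure 4) that reproduces those numbers using the standard permutation definition of Shapley values, which agrees with the edge-averaging picture on this cube:

```python
from itertools import permutations

# Team scores from the second story (Figure 4), keyed by the players present.
score = {
    frozenset(): 0,
    frozenset({"Max"}): 0,
    frozenset({"Reid"}): 1,
    frozenset({"GW"}): 9,
    frozenset({"Max", "Reid"}): 0,   # the score drops by 1 when Max plays with Reid
    frozenset({"Max", "GW"}): 9,
    frozenset({"Reid", "GW"}): 10,
    frozenset({"Max", "Reid", "GW"}): 9,
}

players = ["Max", "Reid", "GW"]
shapley = {p: 0.0 for p in players}
orders = list(permutations(players))
for order in orders:
    team = frozenset()
    for player in order:
        # Marginal contribution: change in score when this player joins.
        shapley[player] += score[team | {player}] - score[team]
        team = team | {player}
shapley = {p: total / len(orders) for p, total in shapley.items()}
print(shapley)  # {'Max': -0.5, 'Reid': 0.5, 'GW': 9.0}
```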

However, this averaging creates a problem when it comes to trusting Shapley values: we can trust GW’s Shapley value more than Max’s or Reid’s, because her contribution to the team is consistent while theirs are not.

Shapley Residuals

The Shapley residual is a measurement of how much a player’s edges deviate from being constant—lower Shapley residuals mean Shapley values are close to perfectly representative of feature contribution, whereas higher Shapley residuals mean Shapley values are missing out on important model information: namely, that a feature’s contribution depends on the other features as well.

The authors of the original Shapley residuals paper formulate this missing information as an error term in a least-squares regression. For example, for the player Reid:

▽_Reid v = ▽v_Reid + r_Reid

The left side of this equation is the same partial gradient as earlier. The right side is the sum of the gradient of a new GameCube, ▽v_Reid, and a residual cube, r_Reid, which measures how far our game deviates from being inessential with respect to Reid.

Figure 5: The residual cube is the amount a game deviates from inessentiality with respect to a given player.

The key idea is that, if Reid has a consistent impact on the team, the residual cube r_Reid will be all zeros. On the other hand, if the values on the residual cube r_Reid deviate from zero, then that is a signal that Reid’s Shapley value is missing information about how Reid’s impact depends on who else is playing with Reid. The higher the values on the residual cube, the more Reid’s contribution depends on which other players are present.
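
Continuing with the score dictionary and player list from the snippet above, here is one way (a numeric sketch, not the authors’ code) to carry out that least-squares step for Reid on the trivia cube:

```python
import numpy as np
from itertools import combinations

# Enumerate vertices (teams) and edges (adding one player to a team).
teams = [frozenset(c) for size in range(4) for c in combinations(players, size)]
index = {team: i for i, team in enumerate(teams)}
edges = [(team, player) for team in teams for player in players if player not in team]

# Incidence matrix D: (D @ values) gives the score change along every edge.
D = np.zeros((len(edges), len(teams)))
for e, (team, player) in enumerate(edges):
    D[e, index[team]] = -1.0
    D[e, index[team | {player}]] = 1.0

values = np.array([score[team] for team in teams])
full_gradient = D @ values

# ▽_Reid v: keep only the edges in Reid's direction, zero out the rest.
reid_mask = np.array([player == "Reid" for _, player in edges], dtype=float)
partial_grad_reid = full_gradient * reid_mask

# Least-squares fit of a full cube gradient to the partial gradient;
# whatever cannot be fit is the residual cube r_Reid.
best_game, *_ = np.linalg.lstsq(D, partial_grad_reid, rcond=None)
r_reid = partial_grad_reid - D @ best_game
print(np.linalg.norm(r_reid))  # nonzero, because Reid's impact is not constant
```

Running the same computation for GW gives a residual that is (numerically) zero, matching the claim that her Shapley value captures her contribution exactly.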

Code for Calculating Shapley Residuals

Imports
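
The original code gists are not reproduced in this version of the post, so the snippets below are sketches of what the code could look like. One possible set of imports:

```python
import itertools

import numpy as np
import shap
from sklearn.ensemble import RandomForestRegressor
```
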
Generate Synthetic Dataset
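
A possible synthetic dataset, with three features and an interaction term so that at least one Shapley residual should be clearly nonzero (the original data-generating process is not specified here, so treat this as an assumption):

```python
rng = np.random.default_rng(0)
n_samples = 1000

X = rng.normal(size=(n_samples, 3))
# The target mixes an individual effect with an interaction between features 1 and 2.
y = X[:, 0] + X[:, 1] * X[:, 2] + 0.1 * rng.normal(size=n_samples)
```
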
Train Model & KernelSHAP Explainer
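
A sketch of the model and explainer setup, using a random forest as a stand-in model and a small background sample to keep KernelSHAP affordable:

```python
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# KernelSHAP needs a background dataset to represent "feature not present."
background = shap.sample(X, 100)
explainer = shap.KernelExplainer(model.predict, background)

# The instance whose prediction we want to explain.
x_explain = X[0]
shap_values = explainer.shap_values(x_explain)
```
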
Compute Expected Values of Feature Coalitions

This uses explainer.synth_data, the set of synthetic data samples generated by the shap library when the explainer is trained.

The dictionary coalition_estimated_values maps feature coalitions to the expected value of the model when those features are used, relative to a baseline (which is the expected value when no features are used: the average model output).

(Note that we convert the lists to strings since lists are not hashable types in Python.)
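
Here is a sketch of that computation. For simplicity it intervenes on the background sample directly rather than on explainer.synth_data; both play the role of “data the model sees when a feature is left out”:

```python
num_features = 3
baseline = model.predict(background).mean()  # average model output: value of the empty coalition

coalition_estimated_values = {}
for size in range(num_features + 1):
    for coalition in itertools.combinations(range(num_features), size):
        # Intervene: fix the coalition's features to the explained instance's
        # values in every background sample, and leave the other features alone.
        intervened = background.copy()
        intervened[:, list(coalition)] = x_explain[list(coalition)]
        expected_value = model.predict(intervened).mean()
        # Keys are stringified arrays, e.g. '[0 1]', since lists aren't hashable.
        coalition_estimated_values[str(np.array(coalition, dtype=int))] = (
            expected_value - baseline
        )
```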

Progress Check

coalition_estimated_values should look something like this:

{'[]': 0,
 '[0]': -0.3576234198270127,
 '[1]': 0.010174318030605423,
 '[2]': -0.08009846972721224,
 '[0 1]': -0.34261386138613864,
 '[0 2]': -0.37104950495049505,
 '[1 2]': 0.14435643564356437,
 '[0 1 2]': -0.396}

Create Hypercube Object

We are using 3-dimensional data, so this will just be a cube. The method extends to hypercubes, though it gets slower as the number of dimensions increases.

Feel free to use the code for the Hypercube Python class in the appendix of this article, or to write your own. It needs to place the coalition_estimated_values on the vertices of the cube, and it needs to compute the edge values as the difference between neighboring vertex values.
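
Assuming the Hypercube class sketched in the appendix (its constructor signature is my own choice, not the post’s), building the cube could look like this:

```python
cube = Hypercube(coalition_estimated_values, num_features=3)
```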

Compute the Shapley Residuals

For each feature, minimize ||▽_feature cube − ▽cube_feature|| to compute the residual; its norm is the Shapley residual. This uses a helper function called residual_norm, defined in the appendix at the end of this article.
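
A sketch of that loop, using the residual_norm helper sketched in the appendix:

```python
shapley_residuals = {
    feature: residual_norm(cube, feature)
    for feature in range(3)
}
print(shapley_residuals)
```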

Conclusion

Shapley values have become an incredibly popular and generalizable method for explaining which features are important to a machine learning model. By quantifying their effectiveness using Shapley residuals, you will be able to further identify where exactly your machine learning model’s behavior is coming from, and which insights stemming from Shapley values are worth trusting.

Special thanks to the authors of the original Shapley residuals paper for their work!

Appendix

All images in the piece are created by the author.

Below is the code for the Hypercube object and other helper functions, which you can use with the starter code above to compute Shapley residuals.
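
(The original gist is not reproduced here, so the following is an illustrative sketch. Only the names Hypercube and residual_norm come from the post; the internals are assumptions.)

```python
import itertools
import numpy as np


class Hypercube:
    """Stores a coalition-value game on the vertices of a d-dimensional cube."""

    def __init__(self, coalition_estimated_values, num_features):
        self.num_features = num_features
        # Vertices are all coalitions (subsets) of the feature indices.
        self.vertices = [
            c
            for size in range(num_features + 1)
            for c in itertools.combinations(range(num_features), size)
        ]
        self.vertex_index = {v: i for i, v in enumerate(self.vertices)}
        # Vertex values come straight from the coalition dictionary; keys are
        # assumed to be stringified arrays such as '[0 1]', as in the post.
        self.values = np.array([
            coalition_estimated_values[str(np.array(v, dtype=int))]
            for v in self.vertices
        ])
        # Edges connect coalitions that differ by exactly one feature.
        self.edges = []  # list of (tail_coalition, head_coalition, feature)
        for v in self.vertices:
            for f in range(num_features):
                if f not in v:
                    self.edges.append((v, tuple(sorted(v + (f,))), f))
        # Incidence matrix D: (D @ values) is the change in value along every edge.
        self.D = np.zeros((len(self.edges), len(self.vertices)))
        for e, (tail, head, _) in enumerate(self.edges):
            self.D[e, self.vertex_index[tail]] = -1.0
            self.D[e, self.vertex_index[head]] = 1.0

    def partial_gradient(self, feature):
        """Edge differences along `feature`'s direction, zero on all other edges."""
        full_gradient = self.D @ self.values
        mask = np.array([f == feature for (_, _, f) in self.edges], dtype=float)
        return full_gradient * mask


def residual_norm(cube, feature):
    """Norm of the part of a feature's partial gradient no cube gradient can explain."""
    partial = cube.partial_gradient(feature)
    # Find the game whose full gradient best approximates the partial gradient ...
    best_game, *_ = np.linalg.lstsq(cube.D, partial, rcond=None)
    # ... and measure what is left over: the Shapley residual.
    residual = partial - cube.D @ best_game
    return np.linalg.norm(residual)
```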