This post was originally published in *Towards Data Science*.

**Introduction**

To use machine learning responsibly, you should try to explain what drives your ML model’s predictions. Many data scientists and machine learning companies are recognizing how important it is to be able to explain, feature-by-feature, how a model is reacting to the inputs it is given. This article will show how Shapley values, one of the most common explainability techniques, can miss important information when explaining a model. Then, we will introduce Shapley residuals, a new technique to measure how well Shapley values are capturing model behavior, along with some code to get started calculating them!

Consider the following example from Christoph Molnar’s *Interpretable Machine Learning* book: a bike-sharing company trains a model to predict the number of bikes taken out on a given day, using features like seasonal info, the day of the week, weather info, etc. Then, if their model is predicting a lower-than-average rider count on some day in the future, they can find out *why* that lower-than-average score is occurring: by looking at how the model is reacting to each feature. Was it because of a holiday? Was it because of the weather?

A common way of computing the importance of each of your model’s features is to use **Shapley values**, since it is a method that is 1) widely applicable to many problems, 2) based on solid theoretical grounding, and 3) easily implementable with the SHAP Python library.

**The problem:** In some scenarios, Shapley values fail to express important information about model behavior, because they only return a score for *one feature at a time*. For instance, in the bike-sharing scenario, we treat the weather and the day of the week as independent features, but sometimes it is the *combination* of those features that matters. When feature *combinations* matter more than the individual features themselves, Shapley values can fail to properly explain a model.

**Bar Trivia Example**

Let’s use a simpler setting with fewer features to walk through the problem with Shapley values in more detail.

I like to attend trivia nights at some local bars in the neighborhood with different coworkers of mine each week. It’s become quite clear that some members of our team bring more to the table than others.

Can we quantify the impact each team member has on the trivia performance? We can use Shapley values for each player with the following interpretation: they should correspond to the expected change in score when adding that player to the trivia team. Other possible interpretations exist*, but we will use this one.

*(Note: This method of computing Shapley values, which measures the "expected change in score when adding this feature," is called the "interventional" method. A different type is known as "conditional" Shapley values. The key difference lies in how they treat a feature whose expected change in score is zero: what should its Shapley value be? Zero? If you think the answer is "yes," use the interventional method. If instead you think the feature might still have importance due to correlations with other features, and that this importance should be included in its Shapley value, then consider using the conditional method.)*

Geometrically, a useful way to plot all these 3-player game scores with different teams is as points on a cube, arranged so that neighboring points differ by just one player. Then, the paths between points (a.k.a. the cube’s edges) will represent the change in score when adding a player to a team.

*(Note: With two players, we would plot this as a square. With four or more players, we would have to plot this as a hypercube.)*

Let’s call this shape a GameCube; this will be a useful shape for us because *both Shapley values and GameCube edges will correspond to the change in score when adding a player.*

In our story, Reid is only knowledgeable about sports trivia, and GW knows about movies, music, history, geography, literature—pretty much everything *except* sports trivia. So when Reid plays, he improves the score by a little; when GW plays, she increases the score by a *lot*. And me, well, I’m mostly there for the beer and the company.

A Shapley value is a *perfect* measure of explainability *only* when a player *always* contributes the *same* amount to a team’s score. And since each player’s effect on the score is constant in our story so far, we can assign a Shapley value of 1 to Reid, a Shapley value of 9 to GW, and a Shapley value of 0 to Max. These Shapley values represent the expected change in score when each player joins the team!

In more technical terms, a game where each player’s impact is consistent (like our story so far) is called an “inessential game.” We will also use the symbol *∇v* to represent the “gradient” of a GameCube *v*, which assigns to each edge the difference between the scores at its two endpoint teams, and *∇_player v* to represent the partial gradient that keeps the edge values in a specific *player*’s direction and is zero along all other edges.

For example, the GameCube gradient *∇_Reid v* represents all possible changes in score when adding Reid.

**Feature contributions can’t always be expressed as a single number—so Shapley Values aren’t enough.**

You should expect that most of the time, the features you are working with won’t have constant impacts on model outputs—instead, the impact of a feature typically depends on what the other features are.

**Let’s change up our story.**

Suppose that Max’s behavior changes based on who he is playing with. When playing with GW, he is pretty chill, drinks his beer, minds his own business and lets GW do most of the work, so he doesn’t bring the score down. But when Max plays with Reid, he gets jealous of how much Reid knows about sports, so Max starts to speak up more, suggesting some wrong answers and bringing the score down by 1!

On this new GameCube, GW’s edges are constant, so her Shapley value of 9 still corresponds exactly to the change in score when she plays. *But Max’s and Reid’s edges are not constant, because their impact on score depends on who they are playing with*. Therefore, our way of using GameCube edges to quantify what Max and Reid bring to the table now has a problem.

When real data scientists use Shapley values, they solve this problem by taking the *average* contribution of a player to their teams; on the GameCube, this means quantifying a player’s contribution as the average of the edge values in their direction. So on our GameCube above, GW’s Shapley value would still be 9 as before, but Reid’s Shapley value would now be 0.5 and Max’s would be -0.5. For some use cases, the story ends there: a player’s average contribution can sometimes be a good enough quantification of their impact!
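To make this concrete, here is a self-contained sketch (with hypothetical coalition scores consistent with the story: the empty team scores 0, GW always adds 9, and Max cancels Reid’s point whenever they play together) that recovers these Shapley values by averaging each player’s marginal contribution over every order in which the team could assemble:

```python
from itertools import permutations

# Hypothetical coalition scores matching the story: the empty team scores 0,
# GW always adds 9, and Max cancels out Reid's +1 whenever they play together.
score = {
    frozenset(): 0,
    frozenset({"Reid"}): 1,
    frozenset({"GW"}): 9,
    frozenset({"Max"}): 0,
    frozenset({"Reid", "GW"}): 10,
    frozenset({"Reid", "Max"}): 0,
    frozenset({"GW", "Max"}): 9,
    frozenset({"Reid", "GW", "Max"}): 9,
}

def shapley_values(score, players):
    """Average each player's marginal contribution over all join orders."""
    totals = dict.fromkeys(players, 0.0)
    orders = list(permutations(players))
    for order in orders:
        team = frozenset()
        for player in order:
            totals[player] += score[team | {player}] - score[team]
            team = team | {player}
    return {p: t / len(orders) for p, t in totals.items()}

print(shapley_values(score, ["Reid", "GW", "Max"]))
# → {'Reid': 0.5, 'GW': 9.0, 'Max': -0.5}
```

Note that GW’s value is exact (her edges are constant 9), while Reid’s 0.5 and Max’s -0.5 are averages over edges that disagree with each other.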

However, averaging creates a problem when it comes to *trusting* Shapley values: we can trust GW’s Shapley value more than Max’s or Reid’s, since there is more consistency in her contribution to the team than in theirs.

**Shapley Residuals**

The Shapley residual is a measurement of how much a player’s edges deviate from being constant—lower Shapley residuals mean Shapley values are close to perfectly representative of feature contribution, whereas higher Shapley residuals mean Shapley values are missing out on important model information: namely, that a feature’s contribution depends on the other features as well.

The authors of the original Shapley residuals paper formulate this missing information as an error term in a least-squares regression. For example, for the player *Reid*:

*∇_Reid v = ∇v_Reid + r_Reid*

The left side of this equation is the same partial gradient as earlier. The right side is the sum of a new GameCube’s gradient, *∇v_Reid*, plus a residual cube, *r_Reid*, which measures the amount that our game deviates from being inessential with respect to Reid.

The key idea is that, if Reid has a consistent impact on the team, the residual cube *r_Reid *will be all zeros. On the other hand, if the values on the residual cube *r_Reid *deviate from zero, then that is a signal that Reid’s Shapley value is missing information about how Reid’s impact *depends on who else is playing with Reid. *The higher the values on the residual cube, the more Reid’s contribution depends on which other players are present.

**Code for Calculating Shapley Residuals**

**Imports**

**Generate Synthetic Dataset**
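The original dataset isn’t shown; as a hypothetical stand-in, we can draw three independent standard-normal features where `x0` and `x1` only matter in combination (like Max and Reid) while `x2` acts alone (like GW):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_samples = 1000

# Three features: x0 and x1 only matter together (an interaction term),
# while x2 contributes on its own.
X = pd.DataFrame(rng.normal(size=(n_samples, 3)), columns=["x0", "x1", "x2"])
y = X["x0"] * X["x1"] + X["x2"] + rng.normal(scale=0.1, size=n_samples)
```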

**Train Model & KernelSHAP Explainer**

**Compute Expected Values of Feature Coalitions**

*(Note that we convert the coalition lists to strings, since lists are not hashable types in Python.)*
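The original cell isn’t reproduced; here is a sketch of the interventional value function under the synthetic setup above (the names `x_explain` and `coalition_values` are mine): for each coalition S, fix those features at the explained point’s values, let the others vary over the dataset, and average the model’s predictions.

```python
import itertools
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(500, 3)), columns=["x0", "x1", "x2"])
y = X["x0"] * X["x1"] + X["x2"]
model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

x_explain = X.iloc[0]       # the single point being explained
features = list(X.columns)

# Interventional value function v(S): fix the features in S at x_explain's
# values, let the rest vary over the data, and average the predictions.
coalition_values = {}
for size in range(len(features) + 1):
    for S in itertools.combinations(features, size):
        X_mixed = X.copy()
        for feature in S:
            X_mixed[feature] = x_explain[feature]
        # Lists aren't hashable, so store the coalition as a string key.
        coalition_values[str(list(S))] = model.predict(X_mixed).mean()
```

With three features this produces all 2³ = 8 coalition values, from `"[]"` up to `"['x0', 'x1', 'x2']"`.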

**Progress Check**
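A reasonable check at this point (rebuilt here as a self-contained sketch, with a linear model for speed): the empty coalition’s value should equal the average prediction over the dataset, and the full coalition’s value should equal the prediction at the explained point.

```python
import itertools
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(500, 3)), columns=["x0", "x1", "x2"])
y = X["x0"] * X["x1"] + X["x2"]
model = LinearRegression().fit(X, y)
x_explain = X.iloc[0]
features = list(X.columns)

coalition_values = {}
for size in range(len(features) + 1):
    for S in itertools.combinations(features, size):
        X_mixed = X.copy()
        for feature in S:
            X_mixed[feature] = x_explain[feature]
        coalition_values[str(list(S))] = model.predict(X_mixed).mean()

# Two quick sanity checks before building the cube:
# v(empty set) is the average prediction over the whole dataset, and
# v(all features) is the prediction at the explained point.
assert np.isclose(coalition_values["[]"], model.predict(X).mean())
assert np.isclose(coalition_values[str(features)], model.predict(X.iloc[[0]])[0])
print("coalition value function looks consistent")
```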

**Create Hypercube Object**

We are using 3-dimensional data, so this will just be a cube. The method extends to hypercubes for higher-dimensional data, though it becomes slower as the number of dimensions increases.
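The full `Hypercube` object lives in the appendix; a minimal stand-in that captures the two ingredients we need, vertices (coalition values) and edges (changes in value when one feature is added), might look like this, demonstrated here on the trivia scores:

```python
import itertools

class Hypercube:
    """Minimal stand-in: vertices are feature coalitions, and each edge value
    is the change in the value function when one feature is added."""

    def __init__(self, coalition_values, features):
        self.features = features
        self.values = coalition_values      # str(list(S)) -> v(S)
        self.edges = {}                     # (S, feature) -> v(S + [f]) - v(S)
        for size in range(len(features)):
            for S in itertools.combinations(features, size):
                for f in features:
                    if f not in S:
                        bigger = sorted(S + (f,))
                        self.edges[(S, f)] = (
                            coalition_values[str(list(bigger))]
                            - coalition_values[str(list(S))]
                        )

    def partial_gradient(self, feature):
        # Edge values in one feature's direction (zero on all other edges).
        return {S: d for (S, f), d in self.edges.items() if f == feature}

# Demo on the trivia GameCube (hypothetical scores from the story):
trivia = {
    "[]": 0, "['GW']": 9, "['Max']": 0, "['Reid']": 1,
    "['GW', 'Max']": 9, "['GW', 'Reid']": 10, "['Max', 'Reid']": 0,
    "['GW', 'Max', 'Reid']": 9,
}
cube = Hypercube(trivia, ["GW", "Max", "Reid"])
print(cube.partial_gradient("GW"))   # every GW edge is 9
```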

**Compute the Shapley Residuals**
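The original computation isn’t reproduced here; the key step can be sketched as a least-squares problem following the decomposition *∇_player v = ∇v_player + r_player*: project each player’s partial gradient onto the space of gradients of *some* game, and whatever cannot be explained that way is the residual cube. A self-contained sketch on the trivia GameCube (hypothetical scores consistent with the story; helper names are mine):

```python
import itertools
import numpy as np

players = ["GW", "Max", "Reid"]
# Hypothetical coalition scores from the trivia story.
v = {(): 0, ("GW",): 9, ("Max",): 0, ("Reid",): 1,
     ("GW", "Max"): 9, ("GW", "Reid"): 10, ("Max", "Reid"): 0,
     ("GW", "Max", "Reid"): 9}

# Enumerate the cube: vertices are coalitions, edges connect S to S + {p}.
vertices = [tuple(sorted(c)) for r in range(4)
            for c in itertools.combinations(players, r)]
v_index = {s: k for k, s in enumerate(vertices)}
edges = []  # (tail, head, player)
for s in vertices:
    for p in players:
        if p not in s:
            edges.append((s, tuple(sorted(s + (p,))), p))

# Incidence matrix B: one row per edge, +1 at head, -1 at tail,
# so B @ w is the gradient of a game w along every edge.
B = np.zeros((len(edges), len(vertices)))
for k, (tail, head, _) in enumerate(edges):
    B[k, v_index[head]] = 1.0
    B[k, v_index[tail]] = -1.0

def shapley_residual_norm(player):
    # Partial gradient of v in this player's direction: the change in score
    # on the player's edges, zero elsewhere.
    flow = np.array([v[head] - v[tail] if p == player else 0.0
                     for (tail, head, p) in edges])
    # Least-squares fit of ∇_player v = ∇v_player + r_player: find the game
    # whose gradient best matches the flow; the leftover is the residual.
    w, *_ = np.linalg.lstsq(B, flow, rcond=None)
    return np.linalg.norm(flow - B @ w)

for p in players:
    print(p, round(shapley_residual_norm(p), 3))
# GW 0.0
# Max 0.707
# Reid 0.707
```

GW’s residual is zero, so her Shapley value of 9 can be trusted outright, while Reid’s and Max’s nonzero residuals flag that their Shapley values are hiding an interaction between the two of them.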

**Conclusion**

Shapley values have become an incredibly popular and generalizable method for explaining which features are important to a machine learning model. By quantifying their effectiveness using Shapley residuals, you will be able to further identify where exactly your machine learning model’s behavior is coming from, and which insights stemming from Shapley values are worth trusting.

Special thanks to the authors of the original Shapley residuals paper for their work!

**Appendix**

All images in the piece are created by the author.

Below is the code for the Hypercube object and other helper functions, which you can use with the starter code above to compute Shapley residuals.