Reinforcement Learning for Counterfactual Explanations

Authors: Sahil Verma, Keegan Hines, John Dickerson

In the field of Explainable AI, counterfactual explanations have recently seen exciting and rapid development. In this approach, we aim to understand the decisions of a black-box machine learning model by quantifying what would have needed to be different to get a different decision. A common example is credit lending: if I am denied a loan (by an algorithm), then a counterfactual explanation of that decision could inform me that if my income were $10K higher and my credit score were 30 points higher, I would have been approved. This basic premise is intriguing, but it also comes with several subtle constraints and desirable add-ons, including causal constraints, realism constraints, actionability, sparsity, and computational efficiency. If you're interested in learning more about this area of research, feel free to read our recent review paper from the NeurIPS workshop on ML Retrospectives, which won a Best Paper Award. A toy sketch of the basic idea is below.
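To make the core idea concrete, here is a minimal sketch of a counterfactual search on the credit-lending example. The loan_model, the feature grid, and the cost function are all hypothetical stand-ins (a real black-box classifier would replace loan_model), and the brute-force search is only an illustration of the concept, not the method from our paper:

```python
# Toy sketch: find the cheapest feature change that flips a denial.
# loan_model and the cost weights are hypothetical, for illustration only.
import itertools

def loan_model(income, credit_score):
    """Hypothetical black-box decision: approve iff a linear score clears a threshold."""
    return income + 500 * credit_score >= 400_000  # True = approved

def counterfactual(income, credit_score, max_income_bump=50_000, max_score_bump=100):
    """Search a grid of feature bumps for the lowest-cost change that flips the decision."""
    if loan_model(income, credit_score):
        return None  # already approved; nothing to explain
    best, best_cost = None, float("inf")
    for d_inc, d_score in itertools.product(range(0, max_income_bump + 1, 1_000),
                                            range(0, max_score_bump + 1, 5)):
        if loan_model(income + d_inc, credit_score + d_score):
            cost = d_inc / 1_000 + d_score  # crude cost: $1K of income ~ 1 credit point
            if cost < best_cost:
                best, best_cost = (d_inc, d_score), cost
    return best

print(counterfactual(income=80_000, credit_score=600))
# -> (20000, 0): for this toy model, a $20K income bump is the cheapest flip
```

Note that even this toy version already surfaces the design questions mentioned above: the cost function encodes what counts as an "easy" change, and the grid bounds encode what is actionable at all.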

We’ve recently posted a new approach that addresses many of these constraints. By framing the generation of counterfactual explanations as a Markov decision process (MDP), we can map many of the desiderata onto components of the agent and environment (such as the transition function or the reward function). Then, using standard techniques from reinforcement learning, we can train an agent to compute counterfactuals for a given model. Once this agent is trained, the computation of new counterfactuals is amortized: any new counterfactual is obtained by simply evaluating the agent’s policy from the starting point in question. This makes the method extremely efficient at generating new counterfactuals. As you can see in the table below, our approach, which we termed FastCFE, accommodates all of the major desiderata/constraints that have recently been put forward in the counterfactual explainability community. This is an exciting step toward operationalizing counterfactual explainability for real-world, high-volume use cases. We hope you enjoy the paper.
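To illustrate the MDP framing, here is a minimal, self-contained sketch. Everything in it is hypothetical (the loan_model black box, the step transition function, tabular q_learn, and the explain rollout are names invented for this example), and tabular Q-learning is just one simple RL algorithm, not the paper's implementation. States are (income, credit score) pairs, actions nudge one feature, and an episode ends with reward +1 once the black-box decision flips:

```python
# Toy sketch of counterfactuals as an MDP, solved with tabular Q-learning.
# All names and the reward shaping here are illustrative assumptions.
import random
from collections import defaultdict

def loan_model(income, score):          # hypothetical black box (same toy model as above)
    return income + 500 * score >= 400_000

ACTIONS = [(1_000, 0), (0, 5)]          # bump income by $1K, or credit score by 5 points

def step(state, action):
    """Transition: apply one feature nudge; reward +1 on a decision flip, small step cost otherwise."""
    income, score = state
    nxt = (income + action[0], score + action[1])
    done = loan_model(*nxt)
    return nxt, (1.0 if done else -0.01), done  # step cost encourages short, sparse paths

def q_learn(episodes=2_000, alpha=0.5, gamma=0.99, eps=0.2):
    """Train on randomly sampled denied applicants via epsilon-greedy Q-learning."""
    Q = defaultdict(float)
    for _ in range(episodes):
        state = (random.randrange(50, 90) * 1_000, random.randrange(500, 700, 5))
        if loan_model(*state):
            continue                              # skip already-approved starting points
        for _ in range(200):                      # cap episode length
            a = (random.randrange(len(ACTIONS)) if random.random() < eps
                 else max(range(len(ACTIONS)), key=lambda i: Q[state, i]))
            nxt, r, done = step(state, ACTIONS[a])
            target = r + (0.0 if done else gamma * max(Q[nxt, i] for i in range(len(ACTIONS))))
            Q[state, a] += alpha * (target - Q[state, a])
            state = nxt
            if done:
                break
    return Q

def explain(Q, state):
    """Amortized inference: a counterfactual is just a greedy rollout of the learned policy."""
    path = [state]
    while not loan_model(*state):
        a = max(range(len(ACTIONS)), key=lambda i: Q[state, i])
        state, _, _ = step(state, ACTIONS[a])
        path.append(state)
    return path

Q = q_learn()
print(explain(Q, (80_000, 600))[-1])   # final feature values that would get approval
```

The key point is in explain: once training is done, producing a counterfactual for a new applicant requires no per-instance optimization, only a cheap policy rollout, which is what makes the amortized approach attractive for high-volume settings. In the same spirit, the other desiderata can be encoded in the MDP itself, for example by restricting the action set to actionable features or shaping the reward toward realistic states.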