AI Bias & Fairness

Making models more fair: everything you need to know about algorithmic bias mitigation

As ML models affect our lives more and more, we as machine learning practitioners need to ensure that our models are not creating harm for end users. It is especially important to make sure that our models are not unfairly harming any subgroups of the population. The first step is identifying and quantifying any potential bias in a model, and many different definitions of group fairness have been proposed. The Arthur platform provides proactive bias monitoring and alerting, so you know exactly when, where, and how algorithmic bias is occurring in your models.

At Arthur, we’re also interested in the question of what we can do to help make your models more fair. In this post, we describe the main families of techniques for bias mitigation: ways to improve an unfair model. While this is an active area of research, current mitigation techniques target specific parts of the model development lifecycle: pre-processing, which adjusts the training data; in-processing, which uses training algorithms explicitly designed to be fair; and post-processing, which adjusts the outputs of a trained model. The right solution will depend on the use case, the industry, and how the model is deployed.

Why do we observe unfair classifiers?

Intuitively, training classifiers on historical datasets “bakes in” bias into the model: if your hiring algorithm uses data from the 1960s, it’s likely to think that women are great at being secretaries, while men should be CEOs. However, actually characterizing how “biased data” might be related to a “biased model” is a more complicated task.

One commonly discussed cause of bias in a learned model is an imbalance between subgroups in the data. By definition, a minority subgroup contributes fewer observations to the training data than the majority groups do. When the classifier trains, it optimizes a loss function over the whole dataset. If the majority group is truly dominant, the easiest way to achieve high overall accuracy on the training data may be to be as accurate as possible on the majority group while incurring errors on the minority group.

Therefore, if the majority and minority groups differ in their properties or in their relationship to the target variable, the model will likely adhere primarily to the majority group's patterns and largely ignore the minority's. The result can be a model that is fairly accurate for the majority group but much less accurate for the smaller subgroups.
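To make this concrete, here is a minimal sketch of the effect, assuming synthetic data and scikit-learn (none of this is Arthur platform code): a classifier trained to maximize overall accuracy ends up much less accurate on an underrepresented group whose relationship to the label differs from the majority's.

```python
# Sketch: how group imbalance can translate into a per-group accuracy gap.
# The synthetic data and group construction below are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

def make_group(n, shift):
    """Two-feature data whose relationship to the label depends on `shift`."""
    X = rng.normal(size=(n, 2))
    y = (X[:, 0] + shift * X[:, 1] + rng.normal(scale=0.5, size=n) > 0).astype(int)
    return X, y

# Majority group (90% of rows) and minority group (10%) with different patterns.
X_maj, y_maj = make_group(9000, shift=1.0)
X_min, y_min = make_group(1000, shift=-1.0)

X = np.vstack([X_maj, X_min])
y = np.concatenate([y_maj, y_min])
group = np.array([0] * len(y_maj) + [1] * len(y_min))

clf = LogisticRegression().fit(X, y)
pred = clf.predict(X)

print("overall accuracy: ", accuracy_score(y, pred))
print("majority accuracy:", accuracy_score(y[group == 0], pred[group == 0]))
print("minority accuracy:", accuracy_score(y[group == 1], pred[group == 1]))
```

The overall accuracy looks healthy because it is dominated by the majority group; the per-group breakdown is what reveals the problem.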

Of course, not all data biases result from undersampling. In many cases where the data represents human behavior, as with loan performance, hiring, or crime data, historical biases in human decisions can also show up in the records. Hiring data from the 1960s, used predictively, might again suggest that women are best suited to be secretaries rather than executives. Whichever mitigation technique you choose among the options below, pay close attention to the historical context in which the data was collected and the societal influences that shaped it.

Pre-Processing Bias Mitigation

Pre-processing techniques for bias mitigation are all about the data. As described in the previous section, particular characteristics of the training data may directly cause the problematic performance of a learned model. For this reason, many pre-processing techniques focus on modifying the training set to overcome forms of dataset imbalance: resampling rows, reweighting rows, flipping class labels across groups, or omitting sensitive variables and their proxies. Other techniques learn transformation functions of the data that satisfy desired fairness constraints. In every case, the strategy is to change the underlying training data and then train with any classification algorithm you like. Because the data has been modified in these specific ways, the learned classifier's outputs should be less biased.
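As one illustration of the reweighting idea, here is a minimal sketch, assuming a binary protected group and a scikit-learn-style classifier that accepts sample weights: each row is weighted by P(group) · P(label) / P(group, label), so that group membership and the label look statistically independent in the weighted data. The function and variable names are our own, not a particular library's API.

```python
# Sketch of one pre-processing idea: reweight rows so the protected group and
# the label look independent, then train any weight-aware classifier as usual.
import numpy as np
from sklearn.linear_model import LogisticRegression

def reweighing_weights(group, y):
    """Weight each row by P(group) * P(label) / P(group, label)."""
    group, y = np.asarray(group), np.asarray(y)
    weights = np.empty(len(y), dtype=float)
    for g in np.unique(group):
        for label in np.unique(y):
            mask = (group == g) & (y == label)
            p_joint = mask.mean()
            if p_joint > 0:
                weights[mask] = (group == g).mean() * (y == label).mean() / p_joint
    return weights

# Usage (X, y, group are placeholders for your training data):
# w = reweighing_weights(group, y)
# clf = LogisticRegression().fit(X, y, sample_weight=w)
```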

In-Processing Bias Mitigation

With in-processing techniques, we create a classifier that is explicitly aware of our fairness goals. That is, when training the classifier, it is not enough to simply optimize for accuracy on the training data. Instead, we modify the loss function to account for two goals simultaneously: the model should be both accurate and fair. This can be achieved in many ways, such as adversarial training, learning fair underlying representations, or adding fairness constraints and regularization terms. In each case, the goal is for the underlying classifier to take fairness into account directly, so that its outcomes are less biased than those of a classifier trained with no knowledge of fairness.
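As a minimal sketch of this idea, the snippet below trains a logistic regression by gradient descent with an extra penalty on the squared gap in mean predicted score between two groups (a demographic-parity-style regularizer). The penalty weight `lam`, the two-group encoding, and the function name are illustrative assumptions, not a specific published algorithm.

```python
# Sketch of an in-processing approach: logistic regression whose loss adds a
# demographic-parity penalty, lam * (mean score in group A - mean score in B)^2.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_fair_logreg(X, y, group, lam=1.0, lr=0.1, epochs=2000):
    n, d = X.shape
    w = np.zeros(d)
    a, b = (group == 0), (group == 1)
    for _ in range(epochs):
        s = sigmoid(X @ w)
        # Gradient of the average cross-entropy loss.
        grad = X.T @ (s - y) / n
        # Gradient of the fairness penalty.
        gap = s[a].mean() - s[b].mean()
        ds = s * (1 - s)  # derivative of the sigmoid
        grad_gap = X[a].T @ ds[a] / a.sum() - X[b].T @ ds[b] / b.sum()
        grad += lam * 2 * gap * grad_gap
        w -= lr * grad
    return w
```

Increasing `lam` shrinks the score gap between the two groups, typically at some cost in accuracy; setting `lam = 0` recovers ordinary logistic regression.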

Post-Processing Bias Mitigation

Finally, there is a family of techniques that adjust only the outputs of a model and leave the underlying classifier and data untouched. The appeal here is simplicity: post-processing methods let the model development team use whatever modeling algorithm they wish, with no need to modify the training procedure or retrain the model to make it more fair. Instead, the outputs of an unfair model are adjusted so that the final decisions become fair. For example, early work in this area focused on modifying outcomes and thresholds in a group-specific manner.

Suppose we build a classification model to assist in credit risk decisions. After much hyperparameter tuning, we arrive at a model that is accurate and generalizes well, but we notice that it tends to favor older loan applicants over younger applicants. With post-processing techniques, we would keep the classifier as is, but adjust the outcomes so that the overall acceptance rates are more equitable. We would pick a definition of fairness (say, Demographic Parity), and adjust the treatments across the groups such that the final outcomes are as desired. This means we might have group-specific thresholds instead of a single threshold for the classifier.
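Here is a minimal sketch of that adjustment, assuming we already have held-out scores from the credit model and a group label for each applicant; the 20% target approval rate and the hypothetical `model` and `age_group` objects are illustrative assumptions.

```python
# Sketch of a post-processing step: keep the trained model, but choose a
# separate decision threshold per group so each group is approved at the same
# target rate (demographic parity).
import numpy as np

def group_thresholds(scores, group, target_rate=0.2):
    """For each group, find the score cutoff that approves `target_rate` of it."""
    thresholds = {}
    for g in np.unique(group):
        s = np.sort(scores[group == g])
        k = int(np.ceil(target_rate * len(s)))  # number approved in this group
        thresholds[g] = s[-k] if k > 0 else np.inf
    return thresholds

def fair_decisions(scores, group, thresholds):
    return np.array([scores[i] >= thresholds[g] for i, g in enumerate(group)])

# Usage with a hypothetical credit model:
# scores = model.predict_proba(X)[:, 1]
# t = group_thresholds(scores, age_group, target_rate=0.2)
# approved = fair_decisions(scores, age_group, t)
```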

Detecting unintended bias is the first step to mitigating it. Arthur gives you the power to detect and analyze bias in your ML models.

It’s important to note that there remains a lot of legal ambiguity around bias mitigation approaches like this. With so much unknown about how courts will handle algorithmic discrimination, many organizations are leaning heavily on their legal teams for guidance on how to navigate this complexity.

Many post-processing techniques have this basic structure in common: they leave the classifier and the data alone, and only adjust the outcomes in a group-dependent way. And while binary classification has been a focus in the past, recent work has sought to extend these ideas to regression models as well. The overall framework can be effective for achieving fairness in an ML system, though in some use cases, treating groups differently could be an uncomfortable proposition, or even an illegal one.

Accuracy/Fairness Tradeoff

When we set out to deploy ML models that are more fair, we have to acknowledge that this fairness does not come for free; in many cases, it conflicts with model accuracy. Consider one extreme: a model that is as accurate as possible (relative to available ground truth) may be quite unfair, discriminating against at least one subpopulation. At the other extreme, a model that is perfectly fair and equitable across all populations will typically be less accurate than one trained with no fairness constraint. (Some recent work suggests the tradeoff does not always occur, but the behavior of "fair algorithms" deployed in the real world will not always match results demonstrated in theory; understanding the relationship between fairness and accuracy is therefore critical to being confident in the models we choose to use.)

Between these two extremes lies a broad family of possible models that balance the concerns of accuracy and fairness. This set of models forms a Pareto frontier (an efficient frontier) in the space of accuracy versus fairness.
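One way to see this frontier concretely is to sweep a single knob that trades accuracy for parity and record both metrics at each setting. In the sketch below, we lower the decision threshold for one group by increasing amounts and report accuracy alongside disparate impact, measured here as the ratio of acceptance rates; the inputs and the 0.5 baseline threshold are illustrative assumptions.

```python
# Sketch: tracing an accuracy-vs-fairness curve by sweeping how far we lower
# one group's decision threshold relative to a 0.5 baseline.
import numpy as np

def frontier(scores, y, group, deltas=np.linspace(0.0, 0.4, 9)):
    points = []
    for delta in deltas:
        # Group 0 keeps the default cutoff; group 1's cutoff drops by delta.
        cutoff = np.where(group == 1, 0.5 - delta, 0.5)
        pred = (scores >= cutoff).astype(int)
        accuracy = (pred == y).mean()
        rate_1 = pred[group == 1].mean()
        rate_0 = pred[group == 0].mean()
        di = rate_1 / rate_0 if rate_0 > 0 else np.nan
        points.append((delta, accuracy, di))
    return points

# Plotting accuracy against disparate impact for these points sketches the
# Pareto frontier: each point is a candidate operating point for the model.
```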

The figure below, from a 2019 survey paper characterizing the performance of many popular fair ML algorithms, illustrates this tradeoff empirically: on the x-axis is Disparate Impact, a measure of fairness, while the y-axis is accuracy. (The entire paper is worth reading; it is an excellent introduction to many common considerations for the performance of fair ML models.) As practitioners and stakeholders, we must confront questions about this tradeoff: for each use case, we must weigh the costs of potential harm through unfairness against costs of potential harm through reduced accuracy.

Figure: the trade-off between fairness and accuracy among some of the more popular fair ML algorithms, shown on the Adult dataset for the sex attribute and for the race attribute.

These are challenging questions with no single right answer. Instead, ML practitioners must work together with stakeholders such as business leaders, humanities experts, and compliance and legal teams to formulate a program for how best to treat the affected population.

The Arthur platform brings together performance monitoring and algorithmic bias monitoring into a unified view for all of your stakeholders, so you can make informed decisions about how to make your models more fair and effective. If you’d like to learn more about how Arthur can help you combat bias in your AI systems, please reach out to schedule a demo.