Uncovering Bias in Machine Learning: We Can Do Better

Jacob Zweig
Co-Founder, Principal Consultant
Brock Ferguson
Co-Founder, Principal Consultant

We talk a lot at Strong about “applying machine learning in the real world” and how that differs from developing and testing algorithms in the research lab. We often focus on the technical challenges of sampling and annotating data, assessing the robustness or safety of models, and productionizing machine learning algorithms.

But there is another, much more important challenge in applying algorithms in the “real world”: systemic biases that, in their worst form, jeopardize equality and justice.

The urgency of addressing these biases has never been clearer than in recent weeks, as the world witnessed the murders of George Floyd, Ahmaud Arbery, Breonna Taylor, and Rayshard Brooks. It is likewise clear that the same biases that led to the decisions that cost these people their lives have, over a much longer timescale, shaped public policies in unfair and unjust ways.

In machine learning and artificial intelligence, we build systems that make decisions. In doing so, practitioners must recognize their role in amplifying or minimizing the biases of the real world. When human biases infect algorithmic decision-making, those same biases become harder to observe (“I can’t explain it — our system is a black box”), emboldened through their links to science (“you think you know more than my algorithm?”), and efficiently applied at unparalleled scale.

Yet recent examples suggest that some in our field have taken this role lightly: algorithms for massive-scale facial recognition, for determining parole eligibility, for setting credit card limits, and for detecting sexual orientation from a picture of a face.

We can do better.

A couple of months ago, in response to concerns about the spread of facial recognition in policing, we held a team-wide “lab meeting” in which we discussed, among other ethical issues, how to break the feedback loop between biases in the real world and algorithms. Although they are far from addressing the entirety of the problem, here are four recommendations that came out of that discussion.

Seek out biased assumptions in data

Despite the mistaken notion that data are somehow inherently objective, datasets can often "bake in" historical biases and cause downstream predictive models to reinforce unfair assumptions.

A recent Science study revealed such bias in healthcare. In this case, an algorithm that purported to triage cases automatically, identifying the patients most in need of additional care, was found to recommend much more treatment for white patients than for black patients who, by medical standards, were in need of the same care. This bias arose from problematic data: the model was trained to predict health costs rather than actual needs. Unfortunately, because less money is spent on black patients than on equally sick white patients with the same needs, the algorithm ended up systematizing the racial inequalities in healthcare that were present in its dataset.
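One way to look for this kind of problem is to compare the training target against an independent measure of need, group by group. The sketch below is only an illustration of that idea, not the method used in the study: the function name, the record layout, the group labels, and every number are hypothetical and synthetic.

```python
from collections import defaultdict
from statistics import mean


def proxy_mean_by_need(records):
    """Average a proxy target (e.g. cost) per group within each need level.

    `records` is an iterable of (group, need_level, proxy_value) tuples.
    This layout is an assumption made for the example.
    """
    buckets = defaultdict(lambda: defaultdict(list))
    for group, need_level, proxy_value in records:
        buckets[need_level][group].append(proxy_value)
    return {
        need_level: {group: mean(values) for group, values in groups.items()}
        for need_level, groups in buckets.items()
    }


# Synthetic rows invented for illustration only; they are not real data.
records = [
    ("A", "high", 9000.0),
    ("A", "high", 11000.0),
    ("B", "high", 6000.0),
    ("B", "high", 8000.0),
    ("A", "low", 2000.0),
    ("B", "low", 1500.0),
]

summary = proxy_mean_by_need(records)
for need_level, group_means in summary.items():
    print(need_level, group_means)
```

If groups with the same level of need show systematically different values of the proxy, a model trained on that proxy is likely to inherit the gap.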

Define metrics to measure bias 

A growing contingent has recognized that we must be more than just not racist; we must be anti-racist. This shift from passive to active thinking applies in our work as well.

One reason why bias is such a problem in machine learning is that it is often not directly observable and, as more complex modeling approaches have emerged, has become harder and harder to identify even with careful model interrogation.

To take an anti-bias mindset that can unearth bias where it exists, one must actively define and measure bias in algorithmic outputs even when overt bias is not detected in the data. Several metrics based on measuring the degree of an algorithm’s inequality have been proposed to help uncover biased results, including Statistical Parity Difference, Equal Opportunity Difference, Theil Index, and more.
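To make two of these metrics concrete, here is a minimal sketch in plain Python. It assumes binary predictions and labels, a single group attribute, and the common convention of subtracting the privileged group's rate from the unprivileged group's rate. The function names and the toy data are ours, made up for illustration.

```python
def selection_rate(y_pred, mask):
    """Share of positive predictions among the rows selected by `mask`."""
    chosen = [p for p, m in zip(y_pred, mask) if m]
    return sum(chosen) / len(chosen)


def true_positive_rate(y_true, y_pred, mask):
    """Share of truly positive rows in `mask` that were predicted positive."""
    hits = [p for t, p, m in zip(y_true, y_pred, mask) if m and t == 1]
    return sum(hits) / len(hits)


def statistical_parity_difference(y_pred, group, unprivileged):
    """Selection rate of the unprivileged group minus that of everyone else."""
    unpriv = [g == unprivileged for g in group]
    priv = [not u for u in unpriv]
    return selection_rate(y_pred, unpriv) - selection_rate(y_pred, priv)


def equal_opportunity_difference(y_true, y_pred, group, unprivileged):
    """True positive rate of the unprivileged group minus that of everyone else."""
    unpriv = [g == unprivileged for g in group]
    priv = [not u for u in unpriv]
    return (true_positive_rate(y_true, y_pred, unpriv)
            - true_positive_rate(y_true, y_pred, priv))


# Toy data invented for illustration: group "b" is treated as unprivileged.
group = ["a", "a", "a", "a", "b", "b", "b", "b"]
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]

spd = statistical_parity_difference(y_pred, group, unprivileged="b")
eod = equal_opportunity_difference(y_true, y_pred, group, unprivileged="b")
print(spd)  # -0.5
print(eod)  # -0.5
```

A value of zero means the two groups are treated alike on that measure. Negative values here mean the unprivileged group receives fewer positive predictions (statistical parity) or fewer correct positive predictions (equal opportunity).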

Leverage algorithmic solutions aimed at reducing bias

Beyond identifying and measuring algorithmic bias, we must seek out ways to encourage fairness at the algorithmic level even when unbiased data are not available. Recent approaches to directly mitigate algorithmic bias at the level of training, including adversarial de-biasing, upsampling and reweighting training examples, and distributionally robust optimization, can help improve model fairness. 
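As one illustration of reweighting, the sketch below assigns each training example the weight P(group) * P(label) / P(group, label). This is one well-known reweighing scheme among several, and we are not claiming it is the right choice for every problem. The function name and the toy data are hypothetical.

```python
from collections import Counter


def reweighing_weights(group, label):
    """Per-example weights that make group and label independent when applied.

    weight(g, y) = (count(g) * count(y)) / (n * count(g, y))
    """
    n = len(label)
    group_counts = Counter(group)
    label_counts = Counter(label)
    joint_counts = Counter(zip(group, label))
    return [
        group_counts[g] * label_counts[y] / (n * joint_counts[(g, y)])
        for g, y in zip(group, label)
    ]


def weighted_positive_rate(group, label, weights, target_group):
    """Weighted share of positive labels within one group."""
    rows = [(y, w) for g, y, w in zip(group, label, weights) if g == target_group]
    return sum(w for y, w in rows if y == 1) / sum(w for _, w in rows)


# Toy data invented for illustration: positives are over-represented in group "a".
group = ["a", "a", "a", "a", "b", "b", "b", "b"]
label = [1, 1, 1, 0, 1, 0, 0, 0]

weights = reweighing_weights(group, label)
rate_a = weighted_positive_rate(group, label, weights, "a")
rate_b = weighted_positive_rate(group, label, weights, "b")
print(rate_a, rate_b)
```

After reweighting, both groups have the same weighted share of positive labels. In practice, weights like these are typically passed to a training routine that accepts per-example weights.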

Encourage diversity

There is no single technique or approach that can entirely combat biased algorithms, and even unbiased algorithms can be used unfairly. Instead, true progress towards reducing bias and responsible use of machine learning relies on collaboration between people motivated to break the feedback loop. To that end, building a diverse field that can draw on unique experiences and perspectives to solve these difficult problems is essential.

The evidence is clear that the field of machine learning and artificial intelligence suffers from a “disastrous” lack of diversity.

We want to help change this, which is why we are donating $10,000 to Black Girls Code, a non-profit with the vision of increasing the number of women of color in STEM fields such as computer science.

If you are a data scientist and want to encode these practices in your work, we recommend you look at deon, an ethics checklist for data science projects. Please also get in touch with us and share other recommendations or tools so that we can include them in this post.


Strong Analytics builds enterprise-grade data science, machine learning, and AI to power the next generation of products and solutions. Our team of full-stack data scientists and engineers accelerates innovation through its development expertise, scientific rigor, and deep knowledge of state-of-the-art techniques. We work with innovative organizations of all sizes, from startups to Fortune 500 companies. Come introduce yourself on Twitter or LinkedIn, or tell us about your data science needs.