Researchers Want Guardrails to Help Prevent Bias in AI

Machine-learning experts often design their algorithms to guard against certain unintended consequences. But that’s not as easy for the nonexperts who increasingly put those algorithms to use.
A blue ball and a pink ball balancing on a minimalist scale. Photograph: Daniel Grizelj/Getty Images

Artificial intelligence has given us algorithms capable of recognizing faces, diagnosing disease, and of course, crushing computer games. But even the smartest algorithms can sometimes behave in unexpected and unwanted ways—for example, picking up gender bias from the text or images they are fed.

A new framework for building AI programs suggests a way to prevent aberrant behavior in machine learning by specifying guardrails in the code from the outset. It aims to be particularly useful for nonexperts deploying AI, an increasingly common scenario as the technology moves out of research labs and into the real world.

The approach is one of several proposed in recent years for curbing the worst tendencies of AI programs. Such safeguards could prove vital as AI is used in more critical situations, and as people become suspicious of AI systems that perpetuate bias or cause accidents.

Last week Apple was rocked by claims that the algorithm behind its credit card offers much lower credit limits to women than to men of the same financial means. The company was unable to prove that the algorithm had not inadvertently picked up some form of bias from its training data. Just the idea that the Apple Card might be biased was enough to turn customers against it.

Similar backlashes could derail adoption of AI in areas like health care, education, and government. “People are looking at how AI systems are being deployed and they're seeing they are not always being fair or safe,” says Emma Brunskill, an assistant professor at Stanford and one of the researchers behind the new approach. “We're worried right now that people may lose faith in some forms of AI, and therefore the potential benefits of AI might not be realized.”

Examples abound of AI systems behaving badly. Last year, Amazon was forced to ditch a hiring algorithm that was found to be biased against women; Google was left red-faced after the autocomplete algorithm for its search bar was found to produce racial and sexual slurs. In September, a canonical image database was shown to apply all sorts of inappropriate labels to images of people.

Machine-learning experts often design their algorithms to guard against certain unintended consequences. But that’s not as easy for nonexperts who might use a machine-learning algorithm off the shelf. It’s further complicated by the fact that there are many ways to define “fairness” mathematically or algorithmically.

The new approach proposes building an algorithm so that, when it is deployed, there are boundaries on the results it can produce. “We need to make sure that it's easy to use a machine-learning algorithm responsibly, to avoid unsafe or unfair behavior,” says Philip Thomas, an assistant professor at the University of Massachusetts Amherst who also worked on the project.

The researchers demonstrate the method on several machine-learning techniques and a couple of hypothetical problems in a paper published Thursday in the journal Science.

First, they show how it could be used in a simple algorithm that predicts college students’ GPAs from entrance exam results, a common practice that can result in gender bias because women tend to do better in school than their entrance exam scores would suggest. With the new algorithm, a user can limit how much it may, on average, overestimate or underestimate GPAs for male and female students.
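To make that concrete, here is a minimal sketch of the kind of “safety test” such a framework relies on: a candidate GPA model is accepted only if a high-confidence interval around its average over- or underestimation for each gender group stays inside a user-chosen limit. The synthetic data, thresholds, and t-based bound are illustrative assumptions, not the paper’s actual code.

```python
# Sketch of a Seldonian-style safety test for the GPA example.
# Assumptions for illustration only: synthetic data, a linear model,
# and a t-based confidence bound on the per-group mean error.
import numpy as np
from scipy import stats
from sklearn.linear_model import LinearRegression

def passes_safety_test(model, X, y, groups, epsilon=0.05, delta=0.05):
    """Accept the model only if, for every group, a high-confidence interval
    around the mean signed prediction error lies within [-epsilon, epsilon]."""
    errors = model.predict(X) - y            # positive = overestimate
    for g in np.unique(groups):
        e = errors[groups == g]
        half_width = stats.t.ppf(1 - delta, df=len(e) - 1) * stats.sem(e)
        if e.mean() + half_width > epsilon or e.mean() - half_width < -epsilon:
            return False                     # refuse to return this model
    return True

# Toy usage with synthetic entrance-exam scores and GPAs.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 1))
groups = rng.integers(0, 2, size=2000)       # two gender groups, for illustration
y = 3.0 + 0.4 * X[:, 0] + 0.2 * groups + rng.normal(scale=0.3, size=2000)

candidate = LinearRegression().fit(X[:1000], y[:1000])                   # candidate selection
print(passes_safety_test(candidate, X[1000:], y[1000:], groups[1000:]))  # safety test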

In another example, the team developed an algorithm for balancing the performance and safety of an automated insulin pump. Such pumps decide how much insulin to deliver at mealtimes, and machine learning can help determine the right dose for a patient. The algorithm the team designed lets a doctor restrict it to dosages within a particular range and require a low probability of doses that would cause dangerously low or high blood sugar.
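A heavily simplified sketch of that idea follows, assuming a toy carbohydrate-ratio dosing rule and a Hoeffding-style confidence bound, neither of which comes from the paper: doses are clamped to the doctor’s range, and the dosing rule is accepted only if an upper confidence bound on the observed rate of hypoglycemic episodes stays below a tolerance.

```python
# Hypothetical sketch of the insulin-pump constraints. The dosing rule,
# numbers, and Hoeffding bound are assumptions for illustration,
# not the algorithm from the paper.
import numpy as np

DOSE_MIN, DOSE_MAX = 1.0, 10.0     # allowed mealtime dose range (units), set by the doctor
RISK_TOLERANCE = 0.01              # max acceptable probability of a hypoglycemic episode

def propose_dose(carbs_grams, carb_ratio):
    """Candidate mealtime dosing rule, clamped to the doctor's range."""
    raw = carbs_grams / carb_ratio           # simple carbohydrate-ratio heuristic
    return float(np.clip(raw, DOSE_MIN, DOSE_MAX))

def risk_is_acceptable(hypo_events, n_meals, delta=0.05):
    """Hoeffding-style upper confidence bound on the hypoglycemia rate."""
    upper_bound = hypo_events / n_meals + np.sqrt(np.log(1 / delta) / (2 * n_meals))
    return upper_bound <= RISK_TOLERANCE

print(propose_dose(carbs_grams=60, carb_ratio=12))          # 5.0 units
print(risk_is_acceptable(hypo_events=3, n_meals=20000))     # True for this logged history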

The researchers call their algorithms “Seldonian” in reference to Hari Seldon, a character created by the science-fiction author Isaac Asimov, who also devised the famous “three laws of robotics,” which begin with the rule: “A robot may not injure a human being or, through inaction, allow a human being to come to harm.”

The new approach is unlikely to solve the problem of misbehaving algorithms on its own. Partly that’s because there’s no guarantee that organizations deploying AI will adopt such safeguards when they can come at the cost of optimal performance.

The work also highlights the fact that defining “fairness” in a machine-learning algorithm is not a simple task. In the GPA example, for instance, the researchers provide five different ways to define gender fairness.
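The five definitions in the paper are specific to the GPA setting, but two standard criteria from the broader fairness literature, demographic parity and equal opportunity, show how reasonable-sounding definitions can disagree. In the toy example below, the predictions satisfy the first criterion and badly violate the second.

```python
# Two standard fairness criteria from the wider literature (not necessarily
# among the paper's five): demographic parity compares positive-prediction
# rates across groups; equal opportunity compares true-positive rates.
import numpy as np

def demographic_parity_gap(pred, group):
    """Absolute difference in positive-prediction rates between groups 0 and 1."""
    return abs(pred[group == 0].mean() - pred[group == 1].mean())

def equal_opportunity_gap(pred, label, group):
    """Absolute difference in true-positive rates between groups 0 and 1."""
    tpr = lambda g: pred[(group == g) & (label == 1)].mean()
    return abs(tpr(0) - tpr(1))

pred  = np.array([1, 1, 0, 0, 1, 0, 1, 0])   # model decisions
label = np.array([1, 1, 0, 0, 1, 1, 0, 0])   # ground truth
group = np.array([0, 0, 0, 0, 1, 1, 1, 1])   # protected attribute

print(demographic_parity_gap(pred, group))          # 0.0 -- equal positive rates
print(equal_opportunity_gap(pred, label, group))    # 0.5 -- unequal true-positive rates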

“One of the major challenges in making algorithms fair lies in deciding what fairness actually means,” says Chris Russell, a fellow at the Alan Turing Institute in the UK. “Trying to understand what fairness means, and when a particular approach is the right one to use, is a major area of ongoing research.”

If even experts cannot agree on what is fair, Russell says it might be a mistake to put the burden on less proficient users. “At the moment, there are more than 30 different definitions of fairness in the literature,” he notes. “This makes it almost impossible for a nonexpert to know if they are doing the right thing.”

