Hitting the Books: How biased AI can hurt users or boost a business's bottom line

“Why would you build an expensive algorithm if you can’t bias it in your favor?” - former American Airlines president, Robert Crandall

George Rinhart via Getty Images

I'm not sure why people are worried about AI surpassing humanity's collective intellect any time soon, we can't even get the systems we have today to quit emulating some of our more ignoble tendencies. Or rather, perhaps we humans must first detangle ourselves from these very same biases before expecting them eliminated from our algorithms.

In A Citizen's Guide to Artificial Intelligence, John Zerilli leads a host of prominent researchers and authors in the field of AI and machine learning to present readers with an approachable, holistic examination of both the history and current state of the art, the potential benefits of and challenges facing ever-improving AI technology, and how this rapidly advancing field could influence society for decades to come.

A Citizen's Guide to AI by John Zerilli
MIT Press

Excerpted from “A Citizen's Guide to AI” Copyright © 2021 By John Zerilli with John Danaher, James Maclaurin, Colin Gavaghan, Alistair Knott, Joy Liddicoat and Merel Noorman. Used with permission of the publisher, MIT Press.

Human bias is a mix of hardwired and learned biases, some of which are sensible (such as “you should wash your hands before eating”), and others of which are plainly false (such as “atheists have no morals”). Artificial intelligence likewise suffers from both built-in and learned biases, but the mechanisms that produce AI’s built-in biases are different from the evolutionary ones that produce the psychological heuristics and biases of human reasoners.

One group of mechanisms stems from decisions about how practical problems are to be solved in AI. These decisions often incorporate programmers’ sometimes-biased expectations about how the world works. Imagine you’ve been tasked with designing a machine learning system for landlords who want to find good tenants. It’s a perfectly sensible question to ask, but where should you go looking for the data that will answer it? There are many variables you might choose to use in training your system — age, income, sex, current postcode, high school attended, solvency, character, alcohol consumption? Leaving aside variables that are often misreported (like alcohol consumption) or legally prohibited as discriminatory grounds of reasoning (like sex or age), the choices you make are likely to depend at least to some degree on your own beliefs about which things influence the behavior of tenants. Such beliefs will produce bias in the algorithm’s output, particularly if developers omit variables which are actually predictive of being a good tenant, and so harm individuals who would otherwise make good tenants but won’t be identified as such.

The same problem will appear again when decisions have to be made about the way data is to be collected and labeled. These decisions often won’t be visible to the people using the algorithms. Some of the information will be deemed commercially sensitive. Some will just be forgotten. The failure to document potential sources of bias can be particularly problematic when an AI designed for one purpose gets co-opted in the service of another — as when a credit score is used to assess someone’s suitability as an employee. The danger inherent in adapting AI from one context to another has recently been dubbed the “portability trap.” It’s a trap because it has the potential to degrade both the accuracy and fairness of the repurposed algorithms.

Consider also a system like TurnItIn. It’s one of many anti-plagiarism systems used by universities. Its makers say that it trawls 9.5 billion web pages (including common research sources such as online course notes and reference works like Wikipedia). It also maintains a database of essays previously submitted through TurnItIn that, according to its marketing material, grows by more than fifty thousand essays per day. Student-submitted essays are then compared with this information to detect plagiarism. Of course, there will always be some similarities if a student’s work is compared to the essays of large numbers of other students writing on common academic topics. To get around this problem, its makers chose to compare relatively long strings of characters. Lucas Introna, a professor of organization, technology and ethics at Lancaster University, claims that TurnItIn is biased.

TurnItIn is designed to detect copying but all essays contain something like copying. Paraphrasing is the process of putting other people’s ideas into your own words, demonstrating to the marker that you understand the ideas in question. It turns out that there’s a difference in the paraphrasing of native and nonnative speakers of a language. People learning a new language write using familiar and sometimes lengthy fragments of text to ensure they’re getting the vocabulary and structure of expressions correct. This means that the paraphrasing of nonnative speakers of a language will often contain longer fragments of the original. Both groups are paraphrasing, not cheating, but the nonnative speakers get persistently higher plagiarism scores. So a system designed in part to minimize biases from professors unconsciously influenced by gender and ethnicity seems to inadvertently produce a new form of bias because of the way it handles data.

There’s also a long history of built-in biases deliberately designed for commercial gain. One of the greatest successes in the history of AI is the development of recommender systems that can quickly and efficiently find consumers the cheapest hotel, the most direct flight, or the books and music that best suit their tastes. The design of these algorithms has become extremely important to merchants — and not just online merchants. If the design of such a system meant your restaurant never came up in a search, your business would definitely take a hit. The problem gets worse the more recommender systems become entrenched and effectively compulsory in certain industries. It can set up a dangerous conflict of interest if the same company that owns the recommender system also owns some of the products or services it’s recommending.

This problem was first documented in the 1960s after the launch of the SABRE airline reservation and scheduling system jointly developed by IBM and American Airlines. It was a huge advance over call center operators armed with seating charts and drawing pins, but it soon became apparent that users wanted a system that could compare the services offered by a range of airlines. A descendent of the resulting recommender engine is still in use, driving services such as Expedia and Travelocity. It wasn’t lost on American Airlines that their new system was, in effect, advertising the wares of their competitors. So they set about investigating ways in which search results could be presented so that users would more often select American Airlines. So although the system would be driven by information from many airlines, it would systematically bias the purchasing habits of users toward American Airlines. Staff called this strategy screen science.

American Airlines’ screen science didn’t go unnoticed. Travel agents soon spotted that SABRE’s top recommendation was often worse than those further down the page. Eventually the president of American Airlines, Robert L. Crandall, was called to testify before Congress. Astonishingly, Crandall was completely unrepentant, testifying that “the preferential display of our flights, and the corresponding increase in our market share, is the competitive raison d’être for having created the [SABRE] system in the first place.” Crandall’s justification has been christened “Crandall’s complaint,” namely, “Why would you build and operate an expensive algorithm if you can’t bias it in your favor?”

Looking back, Crandall’s complaint seems rather quaint. There are many ways recommender engines can be monetized. They don’t need to produce biased results in order to be financially viable. That said, screen science hasn’t gone away. There continue to be allegations that recommender engines are biased toward the products of their makers. Ben Edelman collated all the studies in which Google was found to promote its own products via prominent placements in such results. These include Google Blog Search, Google Book Search, Google Flight Search, Google Health, Google Hotel Finder, Google Images, Google Maps, Google News, Google Places, Google+, Google Scholar, Google Shopping, and Google Video.

Deliberate bias doesn’t only influence what you are offered by recommender engines. It can also influence what you’re charged for the services recommended to you. Search personalization has made it easier for companies to engage in dynamic pricing. In 2012, an investigation by the Wall Street Journal found that the recommender system employed by a travel company called Orbiz appeared to be recommending more expensive accommodation to Mac users than to Windows users.