Hitting the Books: The bias behind AI assistants' failure to understand accents

Can a new technology really be revolutionary if it isn't universally accessible?

The age of being able to speak to our computers just as we do with other humans is finally upon us, but voice-activated assistants like Siri, Alexa, and Google Home haven't proven quite as revolutionary, or as inclusive, as we'd hoped they'd be. While these systems make a commendable effort to accurately interpret commands regardless of whether you picked up your accent in Houston or Hamburg, users with heavier or less common accents, such as Caribbean or Cockney, find their requests to their digital assistants roundly ignored. In her essay "Siri Disciplines" for Your Computer Is on Fire from the MIT Press, Towson University professor Dr. Halcyon M. Lawrence examines some of the more glaring shortcomings of this nascent technology, how those preventable failures have effectively excluded a sizeable number of potential users, and the Western biases underpinning the issue.

Excerpted from “Your Computer Is on Fire,” copyright © 2021, edited by Thomas S. Mullaney, Benjamin Peters, Mar Hicks, and Kavita Philip. Used with permission of the publisher, MIT Press.

Voice technologies are routinely described as revolutionary. Aside from the technology’s ability to recognize and replicate human speech and to provide a hands-free environment for users, these revolutionary claims, by tech writers especially, emerge from a number of trends: the growing numbers of people who use these technologies, the increasing sales volume of personal assistants like Amazon’s Alexa or Google Home, and the expanding number of domestic applications that use voice. If you’re a regular user (or designer) of voice technology, then the aforementioned claim may resonate with you, since it is quite possible that your life has been made easier because of it. However, for speakers with a nonstandard accent (for example, African-American vernacular or Cockney), virtual assistants like Siri and Alexa are unresponsive and frustrating — there are numerous YouTube videos that demonstrate and even parody these cases. For me, a speaker of Caribbean English, there is “silence” when I speak to Siri; this means that there are many services, products, and even information that I am not able to access using voice commands. And while I have other ways of accessing these services, products, and information, what is the experience of accented speakers for whom speech is the primary or singular mode of communication? This so-called “revolution” has left them behind. In fact, Mar Hicks pushes us to consider that any technology that reinforces or reinscribes bias is not, in fact, revolutionary but oppressive. The fact that voice technologies do nothing to change existing “social biases and hierarchies,” but instead reinforce them, means that these technologies, while useful to some, are in no way revolutionary.

One might argue that these technologies are nascent, and that more accents will be supported over time. While this might be true, the current trends aren’t compelling. Here are some questions to consider: first, why have accents been primarily developed for Standard English in Western cultures (such as American, Canadian, and British English)? Second, for non-Western cultures for which nonstandard accent support has been developed (such as Singaporean and Hinglish), what is driving these initiatives? Third, why hasn’t there been any nonstandard accent support for minority speakers of English? Finally, what adjustments — and at what cost — must standard and foreign-accented speakers of English make to engage with existing voice technologies?

In his slave biography, Olaudah Equiano said, “I have often taken up a book, and have talked to it, and then put my ears to it, when alone, in hopes it would answer me; and I have been very much concerned when I found it remained silent.” Equiano’s experience with the traditional interface of a book mirrors the silence that nonstandard and foreign speakers of English often encounter when they try to interact with speech technologies like Apple’s Siri, Amazon’s Alexa, or Google Home. Premised on the promise of natural language use for speakers, these technologies encourage their users not to alter their language patterns in any way for successful interactions. If you possess a foreign accent or speak in a dialect, speech technologies practice a form of “othering” that is biased and disciplinary, demanding a form of postcolonial assimilation to standard accents that “silences” the speaker’s sociohistorical reality.

Because these technologies have not been fundamentally designed to process nonstandard and foreign-accented speech, speakers often have to make adjustments to their speech (that is, change their accents) to reduce recognition errors. The result is the sustained marginalization and delegitimization of nonstandard and foreign-accented speakers of the English language. This forced assimilation is particularly egregious given that the number of second-language speakers of English has already exceeded the number of native English-language speakers worldwide. The number of English as a Second Language (ESL) speakers will continue to increase as English is used globally as a lingua franca to facilitate commercial, academic, recreational, and technological activities. One implication of this trend is that, over time, native English speakers may exert less influence over the lexical, syntactic, and semantic structures that govern the English language. We are beginning to witness the emergence of hybridized languages like Spanglish, Konglish, and Hinglish, to name a few. Yet despite this trend and the obvious implications, foreign-accented and nonstandard-accented speech is marginally recognized by speech-mediated devices.

Gluszek and Dovidio define an accent as a “manner of pronunciation with other linguistic levels of analysis (grammatical, syntactical, morphological, and lexical), more or less comparable with the standard language.” Accents are particular to an individual, location, or nation, identifying where we live (through geographical or regional accents, like Southern American, Black American, or British Cockney, for example), our socioeconomic status, our ethnicity, our caste, our social class, or our first language. The preference for one’s accent is well-documented. Individuals view people having similar accents to their own more favorably than people having different accents to their own. Research has demonstrated that even babies and children show a preference for their native accent. This is consistent with the theory that similarity in attitudes and features affects both the communication processes and the perceptions that people form about each other.

However, with accents, similarity attraction is not always the case. Researchers have been challenging the similarity-attraction principle, suggesting that it is rather context-specific and that cultural and psychological biases can often lead to positive perceptions of non-similar accents. Dissimilar accents sometimes carry positive stereotypes which lead to positive perceptions of the speech or speaker. Studies also show that even as listeners are exposed to dissimilar accents, they show a preference for standard accents, like standard British English as opposed to nonstandard varieties like Cockney or Scottish accents.

On the other hand, non-similar accents are not always perceived positively, and foreign-accented speakers face many challenges. For example, Flege notes that speaking with a foreign accent entails a variety of possible consequences for second-language (L2) learners, including accent detection, diminished acceptability, diminished intelligibility, and negative evaluation. Perhaps one of the biggest consequences of having a foreign accent is that L2 users oftentimes have difficulty making themselves understood because of pronunciation errors. Even accented native speakers (speakers of variants of British English, like myself, for example) experience similar difficulty because of the differences of pronunciation.

Lambert et al. produced one of the earliest studies on language attitudes that demonstrated language bias. Since then, research has consistently demonstrated negative perceptions about speech produced by nonnative speakers. As speech moves closer to unaccented, listener perceptions become more favorable, and as speech becomes less similar, listener perceptions become less favorable; said another way, the stronger the foreign accent, the less favorable the speech.

Nonnative speech evokes negative stereotypes such that speakers are perceived as less intelligent, less loyal, less competent, poor speakers of the language, and as having weak political skill. But the bias doesn’t stop at perception, as discriminatory practices associated with accents have been documented in housing, employment, court rulings, lower-status job positions, and, for students, the denial of equal opportunities in education.

Despite the documented ways in which persons who speak with an accent routinely experience discriminatory treatment, there is still very little mainstream conversation about accent bias and discrimination. In fall 2017, I received the following student evaluation from one of my students, who was a nonnative speaker of English and a future computer programmer:

I’m gonna be very harsh here but please don’t be offended — your accent is horrible. As a non-native speaker of English I had a very hard time understanding what you are saying. An example that sticks the most is you say goal but I hear ghoul. While it was funny at first it got annoying as the semester progressed. I was left with the impression that you are very proud of your accent, but I think that just like movie starts [sic] acting in movies and changing their accent, when you profess you should try you speak clearly in US accent so that non-native students can understand you better.

While I was taken aback, I shouldn’t have been. David Crystal, a respected and renowned British linguist who is a regular guest on a British radio program, said that people would write in to the show to complain about pronunciations they didn’t like. He states, “It was the extreme nature of the language that always struck me. Listeners didn’t just say they ‘disliked’ something. They used the most emotive words they could think of. They were ‘horrified,’ ‘appalled,’ ‘dumbfounded,’ ‘aghast,’ ‘outraged,’ when they heard something they didn’t like.” Crystal goes on to suggest that reactions are so strong because one’s pronunciation (or accent) is fundamentally about identity. It is about race. It is about class. It is about one’s ethnicity, education, and occupation. When a listener attends to another’s pronunciation, they are ultimately attending to the speaker’s identity.

As I reflected on my student’s “evaluation” of my accent, it struck me that this comment would have incited outrage had it been made about the immutable characteristics of one’s race, ethnicity, or gender; yet when it comes to accents, there is an acceptability about the practice of accent bias, in part because accents are seen as a mutable characteristic of a speaker, changeable at will. As my student noted, after all, movie stars in Hollywood do it all the time, so why couldn’t I? Although individuals have demonstrated the ability to adopt and switch between accents (called code switching), to do so should be a matter of personal choice, as accent is inextricable from one’s identity. To put upon another an expectation of accent change is oppressive; to create conditions where accent choice is not negotiable by the speaker is hostile; to impose an accent upon another is violent.

One domain where accent bias is prevalent is in seemingly benign devices such as public address systems and banking and airline menu systems, to name a few; but the lack of diversity in accents is particularly striking in personal assistants like Apple’s Siri, Amazon’s Alexa, and Google Home. For example, while devices like PA systems only require listeners to comprehend standard accents, personal assistants, on the other hand, require not only comprehension but the performance of standard accents by users. Therefore, these devices demand that the user assimilate to standard Englishes — a practice that, in turn, alienates nonnative and nonstandard English speakers.