As a speech-language pathologist, I am fascinated with how humans and voice interfaces (such as Amazon’s Echo, “Alexa”) communicate with each other. I was fortunate to be a member of a research team based at the University of Washington, in which we recruited 10 diverse families to incorporate an Amazon Echo Dot into their homes for the first time. As researchers, we wanted to learn and understand how families incorporate the fast-growing technology of dedicated, home-based, voice interfaces in their homes. None of the families we recruited had ever had a dedicated voice interface like the Echo or Echo Dot in their home before, and all of the families had children living in their homes.
During our study, we found that all 10 families experienced communication breakdowns with the Echo Dot (“Alexa”). A communication breakdown in these instances is when the human communication partner says something to Alexa, and Alexa misunderstands what was said, resulting in an inappropriate response or no response at all from Alexa. I used my background as a speech-language pathologist to analyze how communication between families and Alexa would break down, and why those breakdowns occurred.
A major concept in analyzing communication breakdowns is pragmatics. In the context of human communication interactions, pragmatic skills involve the social use of communication, such as choosing the right vocabulary for your communication partner. The ability to adjust your communication to be appropriate for the person you are communicating with is a form of code switching. For example, when a teenager talks to her friends, she will use vocabulary, sentence structures, and a tone of voice that is appropriate for her peers. In contrast, a teenager is likely to use a different vocabulary, sentence structure, and tone of voice when speaking to a teacher or a parent.
Our research shows that the voice interface our families used (whom I’ll refer to as “Alexa”) currently has problems with pragmatics, including code switching between children and adults. This may not be surprising since Alexa is not actually a person with a brain, but nonetheless, communication breakdowns with Alexa could be addressed, at least in part, if designers and developers could build Alexa’s communication skills in the area of pragmatics. In addition, Alexa has difficulty assisting her human communication partners in repairing communication breakdowns. Alexa often provides a neutral response to communication breakdowns, such as “hmmm…I’m not sure,” rather than acting on a misunderstanding or providing a specific response. Neutral responses do not provide any contextual cues about where the breakdown occurred, whereas when Alexa acts on misunderstood information, her communication partners can at least get some clues as to where the communication breakdown occurred. In our research paper, “Communication Breakdowns Between Families and Alexa,” I describe how parents and children collaborate to repair communication breakdowns, and how Alexa’s responses impact families’ abilities to repair communication breakdowns successfully.
In this post, I’m going to delve into the concepts of figurative language, words with multiple meanings, and code switching, and we’ll explore how important these concepts are for voice interfaces that interact with children.
To start, let’s take a look at an excerpt that we recorded from one of our families, in which we see how Alexa has difficulty with words that have multiple meanings. In this instance, the mother and child have tried out different requests with Alexa, and the mother urges the child to see whether Alexa can speak other languages.
Mother: Go on ask her. Ask her if she speaks Spanish.
Child: Alexa, count in Spanish.
Alexa: Count in Spanish is …contar. (word spoken in Spanish)
In this exchange between parent, child, and Alexa, we see how Alexa takes the child’s request literally, by translating the word “count” instead of performing the action “to count.” This demonstrates a concept we observed during our research study: Alexa has difficulty following directions that involve words with multiple meanings. Without additional context, Alexa has difficulty knowing which meaning is intended in the command. However, this doesn’t stop Alexa from using words with multiple meanings herself, which also results in communication breakdowns.
Let’s consider the concept of code switching. When thinking of the different types of human communication partners Alexa was talking with in our research study, we need to think about the communication differences between children and adults. Children are still developing their language skills, and therefore figurative language and understanding how words may have multiple meanings can be challenging concepts for children, especially young children. Yet, in our study, we found that Alexa used figurative language, particularly when it came to jokes. Often, children in our study asked Alexa to tell jokes, and one child in particular asked Alexa to tell jokes throughout the entire four-week research period. Here’s an example of one of the jokes that Alexa told this child:
A man walks into a bar. Crank. It was a heavy metal bar.
Did you get the joke? How about this next one?
Why shouldn’t you tell a secret on a farm?
Because the potatoes have eyes and the corn has ears! And the beans stalk.
Both of these jokes were told to the same child during the course of our study. Do you think the child got these jokes? How about if I told you the child was four years old? When we interviewed this child at the end of our study, we asked if they understood the jokes that Alexa told, and why the jokes were funny. The child’s response was “No, I don’t know why.” This is a classic illustration of how a lack of code switching can significantly impact communication interactions. In this case, Alexa’s jokes were not age-appropriate for young children: they often relied on figurative language as a key element of the humor, and the ability to understand that kind of humor is not yet developed in young children (see the American Speech-Language-Hearing Association’s Communication Reference Sheets for more details). Alexa failed to properly identify her communication partner (a child) and, as a result, failed to adjust her communication so that she could be understood.
Just for fun, let’s think of how Alexa, as an entity, could improve her communication skills. I’ve created a communication plan for Alexa to help:
Goal: Alexa will correctly identify her communication partner at least 80% of the time.
- Objective 1: Alexa will identify if her communication partner is an adult or a child (approximately age 5 or under) based on fundamental frequency [1].
- Objective 2: Alexa will identify if her communication partner is an adult or a child based on sentence length.
- Objective 3: Alexa will identify if her communication partner is an adult or a child based on key words used.
Goal: Alexa will provide appropriate contextual cues to her communication partner when a communication breakdown occurs at least 90% of the time.
- Objective 1: Alexa will state when the request was too long for her to understand.
- Objective 2: Alexa will request clarification if the topic of the request is not known.
- Objective 3: When Alexa is less than 60% certain of the human’s request, Alexa will repeat the information she understood, and ask her communication partner if this is what they said.
- Objective 4: Alexa will provide a specific response, or act on a misunderstanding, whenever possible to assist her human communication partner in identifying the communication breakdown.
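For readers who think in code, the plan above can be sketched roughly as follows. This is only an illustration of the idea, not how Alexa works. The one number taken from the plan is the 60% certainty threshold; the pitch cutoff, the word limits, the keyword list, and every function and field name are hypothetical placeholders I made up for the sketch.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical values for illustration only; none of these are measured.
CHILD_F0_HZ = 250.0          # assumed pitch cutoff for a young child's voice
CHILD_MAX_WORDS = 4          # assumed "short sentence" limit
CHILD_KEYWORDS = {"mommy", "daddy", "doggy"}  # assumed child-typical words
MAX_REQUEST_WORDS = 20       # assumed limit before a request is "too long"
CONFIRM_BELOW = 0.60         # from the plan: confirm when under 60% certain


def guess_partner(f0_hz: float, utterance: str) -> str:
    """Goal 1: combine the three cues (pitch, length, key words) by simple voting."""
    words = [w.strip(".,!?") for w in utterance.lower().split()]
    votes = 0
    votes += f0_hz >= CHILD_F0_HZ                      # Objective 1: fundamental frequency
    votes += len(words) <= CHILD_MAX_WORDS             # Objective 2: sentence length
    votes += any(w in CHILD_KEYWORDS for w in words)   # Objective 3: key words
    return "child" if votes >= 2 else "adult"


@dataclass
class Interpretation:
    heard: str              # what the system believes was said
    confidence: float       # certainty from 0.0 to 1.0
    topic: Optional[str]    # None when the topic is not known


def repair_response(interp: Interpretation) -> str:
    """Goal 2: give a contextual cue instead of a neutral "I'm not sure"."""
    if len(interp.heard.split()) > MAX_REQUEST_WORDS:  # Objective 1
        return "That request was too long for me to follow. Can you say it in a shorter way?"
    if interp.topic is None:                           # Objective 2
        return f'I heard "{interp.heard}", but I do not know that topic. What is it about?'
    if interp.confidence < CONFIRM_BELOW:              # Objective 3
        return f'I think you said "{interp.heard}". Is that right?'
    return f'Acting on: "{interp.heard}"'              # Objective 4


if __name__ == "__main__":
    print(guess_partner(300.0, "Alexa tell joke"))
    print(repair_response(Interpretation("count in Spanish", 0.45, "translation")))
```

In this sketch, a low-certainty request such as "count in Spanish" is repeated back to the speaker, which gives the family a clue about what was heard before anything is acted on.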
This communication plan is really an exercise in helping technology designers think of different ways to create avenues for voice interfaces, such as Alexa, to improve communication interactions with humans. It also highlights just how complex human communication is, and how many of us take verbal communication, and the many processes involved in it, for granted. Human communication is complex and challenging, even for other humans… No wonder technology is having a hard time with it as well.
For more details on our research study, and on the communication breakdowns that occurred between Alexa and families, you can find our research paper, “Communication Breakdowns Between Families and Alexa,” here.
[1] In this case, fundamental frequency refers to a measure of vocal fold vibration during speech production. Although not exactly the same, you can think of it as the overall pitch of a voice. I suggest this as one empirical measure that an entity such as Alexa can use to help identify the sound characteristics of her communication partner, differentiating between a young child and an adult, who generally have different fundamental frequencies.
Erin Beneteau is a speech-language pathologist (SLP) and a PhD student at the Information School at the University of Washington. Her primary areas of clinical practice include working with preschool-age children, as well as working with people of all ages who use assistive technologies for communication. Erin has practiced in the United States, New Zealand, and Ireland as an SLP, and she has also worked in the technology industry as an instructional designer. Her recent research on family interactions with the Amazon Echo Dot has been an exciting way to blend her interests in technology and human communication.