Using Emotion to Gain Rapport in a Spoken Dialog System

Jaime C. Acosta

Ph.D. Dissertation, 2009
Department of Computer Science, University of Texas at El Paso

Abstract: Although spoken dialog systems are becoming more widespread, their application is today limited largely to domains involving simple information exchange. To enable future applications, such as persuasion, new capabilities are needed. One barrier to the creation of such applications has been the lack of methods for building rapport between spoken dialog systems and human users, and more generally the inability to model the emotional and interpersonal aspects of dialog. This dissertation addresses these limitations.

A corpus of persuasive dialogs, in which a graduate coordinator informed undergraduate students about the graduate school option, was analyzed. Although much of each dialog was devoted to conveying factual information, there was also heavy use of what appear to be rapport-building strategies. This seemed to occur through emotional coloring of the utterances of both coordinator and students, as heard in prosodic variation, including variation in pitch, timing, and volume.

Some of these rapport-building strategies were modeled and implemented in a spoken dialog system named Gracie (Graduate Coordinator with Immediate-Response Emotions). Gracie is the first dialog system that uses emotion in voice to build rapport with users. This is accomplished by first detecting emotions from the user's voice: not classic emotions such as sadness, anger, and joy, but the subtler emotions that are more common in spontaneous conversations. These subtle emotions are described with a dimensional approach, using the three dimensions of activation (active/passive), evaluation (positive/negative), and power (dominant/submissive). Once the user's emotional state is recognized, Gracie chooses an appropriate emotional coloring for the response.

To test the value of such emotional responsiveness, an experiment with 36 subjects examined whether a spoken dialog system that recognizes human emotion and reacts with appropriate emotion can help gain rapport with humans. Users felt significantly more rapport with Gracie than with the controls, and in addition, users significantly preferred Gracie to the other two systems. This suggests that dialog systems that attempt to connect to users should vary their emotional coloring, as expressed through prosody, in response to the user's inferred emotional state.

Full text (123 pages, 980K)