Research Projects

Responsiveness in Spoken Tutorial Dialogs

An interesting aspect of human-human conversation is the ability of one conversant to pick up on the nuances of the other's speech and infer their state at the moment, such as whether they are happy, frustrated, or confused. Using the prosody of the utterances and the timing between them, an insightful conversant can adapt their own speaking style and feedback to encourage the other speaker. By analyzing a corpus of dialogs collected from a skilled tutor, we have found rules for when and how to use specific acknowledgments and feedback. Using these rules, we have built a more human-like tutoring system, which we expect users to prefer over a system without them. This extends a previous study done in Japanese. We are now extending the system to dynamically select appropriate pitch contours, to make the interaction more compelling and motivating. (Rafael Escalante, continuing work by Tasha Hollingsed and by Wataru Tsukahara)
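
To give a flavor of the form such rules can take, here is a minimal sketch; the features, thresholds, and feedback choices are invented for illustration and are not the rules actually derived from the corpus:

    # Illustrative sketch only: the rule structure, feature names, and
    # thresholds here are hypothetical, not the rules derived from the
    # tutoring corpus.

    def choose_feedback(pitch_slope, pause_ms, energy):
        """Pick a feedback token from simple prosodic and timing features."""
        if pause_ms > 1500 and energy < 0.2:
            return "Take your time."   # long quiet pause: learner hesitant
        if pitch_slope < -0.5 and pause_ms > 300:
            return "Uh-huh."           # falling pitch plus a gap: acknowledge
        if pitch_slope > 0.5:
            return "Right!"            # rising, energetic finish: confirm
        return None                    # otherwise stay silent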

Towards Persuasive Dialog Systems

The range of applications for spoken dialog systems today is very limited: only information access and very simple transactions are commonly supported, and even these are generally disliked and used only when people find it impractical to use the web instead. Ultimately, however, spoken dialog systems should be more usable and better liked for certain kinds of interaction. We are examining persuasive dialogs, in which the speakers use their language and vocal repertoire in a highly adaptive, highly effective, and "charming" way. So far we have collected a corpus of dialogs between freshmen and a staff member encouraging them to consider graduate school as an option, identified the content nuggets, and built two baseline systems, one with text input and output and one in VoiceXML. Our next steps will be to discover the high-level persuasive strategies and low-level engaging behaviors, and to embody these in a spoken dialog system. (Jaime C. Acosta)

Time-Based Language Modeling

Most language models treat speech as simply a sequence of words, ignoring the fact that words are also events in time. We have shown that modeling how word probabilities vary with time-into-utterance can improve a language model. Our next steps will be to extend this into a broader and deeper model, useful not just for improving the performance of speech recognizers but also for insight into the time-course of human cognitive processing in dialog. (with Alejandro Vega, Nisha Kiran, Ben Walker, and Shubhra Datta)
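
As a rough illustration of the idea, the sketch below conditions a unigram estimate on time-into-utterance, quantized into fixed bins, and interpolates it with a time-independent estimate; the 0.5-second bins and the interpolation weight are assumptions for exposition, not the model reported in our work:

    # Minimal sketch of a time-conditioned unigram, interpolated with a
    # time-independent one. Bin size and weight are illustrative only.
    from collections import Counter, defaultdict

    BUCKET_MS = 500    # quantize time-into-utterance into 0.5 s bins
    LAMBDA = 0.7       # weight on the time-specific estimate

    class TimeUnigram:
        def __init__(self):
            self.by_bucket = defaultdict(Counter)  # time bin -> word counts
            self.overall = Counter()               # baseline unigram counts

        def train(self, utterances):
            """utterances: lists of (word, onset_ms_within_utterance) pairs."""
            for utt in utterances:
                for word, t_ms in utt:
                    self.by_bucket[t_ms // BUCKET_MS][word] += 1
                    self.overall[word] += 1

        def prob(self, word, t_ms):
            bin_counts = self.by_bucket[t_ms // BUCKET_MS]
            p_time = (bin_counts[word] / sum(bin_counts.values())
                      if bin_counts else 0.0)
            p_base = self.overall[word] / (sum(self.overall.values()) or 1)
            return LAMBDA * p_time + (1 - LAMBDA) * p_base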

Cross-Cultural Misperceptions of Dialog Behaviors

The rules governing real-time interpersonal interaction are today not well understood. With only a few exceptions, there are no quantitative, predictive rules explaining how to respond in real-time, in the sub-second range, in order to be an effective communicator in a given culture. This can be a problem in intercultural interactions; an American who knows only the words of a foreign language, not its rules of interaction, can easily appear uninterested, ill-informed, thoughtless, discourteous, passive, indecisive, untrusting, dull, pushy, or worse. Short of long-term cultural exposure, there are today no reliable ways to train speakers to understand and follow such rules and attain mastery of interaction at the sub-second level. The purpose of this research is to increase our knowledge and know-how in this area.

The aim of this project is to develop methods for training learners of Arabic to master these behaviors and thereby appear more polite. So far we have focused on back-channel feedback. We have developed a training sequence that enables the acquisition of a basic Arabic back-channel skill, namely producing feedback immediately after the speaker produces a sharp pitch downslope. This training sequence includes software that gives learners feedback both on their attempts to produce the cue themselves and on their performance as they play the role of an attentive listener responding to one side of a pre-recorded dialog. Preliminary experiments indicate that this training is effective. (with Yaffa Al Bayyari, David Novick, and Marisa Flecha-Garcia)
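
A minimal sketch of spotting this cue in a pitch track appears below; the window length and drop threshold are illustrative guesses, not the calibrated values used in the training software:

    # Sketch of detecting the cue: a sharp fall in pitch. The 200 ms
    # window and 40 Hz drop are hypothetical placeholder values.

    def downslope_cue_times(pitch_hz, frame_ms=10,
                            window_ms=200, min_drop_hz=40):
        """Yield times (ms) at which a sharp pitch downslope just ended.

        pitch_hz: one F0 value per frame, with 0.0 marking unvoiced frames.
        """
        n = window_ms // frame_ms
        for i in range(n, len(pitch_hz)):
            start, end = pitch_hz[i - n], pitch_hz[i]
            if start > 0 and end > 0 and start - end >= min_drop_hz:
                yield i * frame_ms   # moment to prompt immediate feedback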

Automating the Discovery of Dialog Cue Patterns

Building better dialog systems requires a better understanding of the low-level details of human communication. However, the dynamics of interaction at the extreme time-scales characteristic of swift dialog are not accessible to casual observation. Progress here depends on tools for systematically analyzing these patterns of behavior. In recent years some excellent freeware tools for audio data transcription, phonetic analysis, and manipulation have appeared, but none directly supports the sorts of search, comparison, hypothesis formulation, and hypothesis evaluation essential to advancing scientific understanding and to engineering highly responsive systems. We have begun prototyping toolkits for this kind of analysis, to determine what functionality linguists and others need and how best to provide it (Downloads), and are developing a method for automatically identifying important dialog cues from conversation data in any language. (with Joshua McCartney)
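
One simple way such a method might score a candidate cue is to compare how often a listener response follows it against the chance rate; the sketch below illustrates this idea only, and its window and lift measure are assumptions for exposition, not the project's actual algorithm:

    # Toy scoring of a candidate cue: how much more often a listener
    # response follows it than chance predicts. Window size and the
    # lift measure are hypothetical, not the project's method.

    def cue_lift(cue_times, response_times, window_ms=700, corpus_ms=600_000):
        """Return how strongly a cue predicts responses (1.0 = chance)."""
        hits = sum(any(0 <= r - c <= window_ms for r in response_times)
                   for c in cue_times)
        precision = hits / len(cue_times) if cue_times else 0.0
        chance = len(response_times) * window_ms / corpus_ms
        return precision / chance if chance else 0.0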

Speaking Rate in Dialog

The pacing of today's dialog systems tends to be rigid. In addition to the problems of turn-taking, the speaking rate itself is generally fixed. For example, the automatic number read-out at the end of directory assistance calls comes at a fixed rate: too slow for some people and too fast for others. We have found that the speaking rate can be adapted automatically for these dialogs, based just on the user's speaking rate and response latency. However, such simple correlations break down for more complex types of dialog; there the speaking rate appears to depend more on dialog act and dialog state. (with S. Kumar Mamidipally, continuing work by Satoshi Nakagawa)
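
As a sketch of what such adaptation could look like, the function below maps the user's syllable rate and response latency to a target output rate; the linear form, coefficients, and clamping range are illustrative assumptions, not values fitted to our data:

    # Sketch of rate adaptation from two observables. Coefficients and
    # the clamping range are hypothetical, not fitted values.

    def target_rate(user_syll_per_sec, latency_ms,
                    base=4.0, w_rate=0.5, w_latency=-0.001):
        """Return a system speaking rate in syllables per second."""
        rate = (base
                + w_rate * (user_syll_per_sec - base)   # match a fast talker
                + w_latency * (latency_ms - 500))       # slow down if user lags
        return max(3.0, min(6.5, rate))                 # keep a comfortable range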


See also: Publications
Research Themes in Dialog System Usability
Earlier Projects
Interactive Systems Group Projects Page
