An interesting aspect of human-human conversation is the ability of a
conversant to pick up on the nuances of the other's speech and infer
their current state, such as whether they are happy, frustrated, or
confused. Using the prosody of the utterances and the timing between
them, an insightful conversant can adjust their own speaking style and
feedback to encourage the other speaker. From a corpus collected from
a skilled tutor, we have derived rules for when and how to use
specific acknowledgments and feedback. Using these rules, we have
built a more human-like tutoring system that users will find
preferable to a system without these rules. This extends a previous
study done in Japanese. We are now extending the system to dynamically
select appropriate pitch contours, to make the interaction more
compelling and motivating.
(Rafael Escalante,
continuing work by Tasha Hollingsed and by Wataru Tsukahara)
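To illustrate the flavor of the rules involved, here is a minimal
sketch, in Python, of a timing-and-prosody acknowledgment rule; the
thresholds, feature names, and token choices are illustrative
assumptions, not the rules actually derived from the corpus.

    def choose_acknowledgment(final_pitch_slope, pause_s):
        # final_pitch_slope: F0 slope (Hz/s) over the student's last ~200 ms
        # pause_s: seconds of silence since the student stopped speaking
        # (all thresholds are illustrative, not the corpus-derived values)
        if pause_s < 0.3:
            return None            # too soon; do not interrupt
        if final_pitch_slope < -50 and pause_s < 1.0:
            return "okay"          # falling pitch: utterance complete, move on
        if pause_s > 1.2:
            return "mm-hm"         # long hesitation: encourage the student
        return None                # otherwise stay silent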
Towards Persuasive Dialog Systems
The range of applications for spoken dialog systems today is very
limited: only information access and very simple transactions are
commonly supported, and even these are generally disliked and used
only when people find it impractical to use the web instead. Ultimately,
however, spoken dialog systems should be more usable and better liked
for certain kinds of interaction. We are examining persuasive
dialogs, where the speakers use their language and vocal repertoire in
a highly adaptive, highly effective, and "charming" way. So far we
have collected a corpus of dialogs between freshmen and a staff member
wanting them to consider graduate school as an option, identified the
content nuggets, and built two baseline systems, one with text input and output
and one in VoiceXML. Our next steps will be to discover the
high-level persuasive strategies and low-level engaging behaviors, and
to embody these in a spoken dialog system. (Jaime C. Acosta)
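For concreteness, here is a minimal sketch of how a text-in/text-out
baseline of this sort might operate, assuming it simply works through
the content nuggets and picks the next one by keyword match against
the user's reply; the nuggets and keywords shown are invented
placeholders, not those identified from the corpus.

    NUGGETS = {
        "funding": "Most graduate students are funded as research or teaching assistants.",
        "career": "An advanced degree opens up research-oriented career paths.",
        "deadline": "Applications are generally due in the winter of the senior year.",
    }

    def next_nugget(reply, delivered):
        # prefer a nugget whose keyword appears in the user's reply ...
        for key in NUGGETS:
            if key not in delivered and key in reply.lower():
                return key
        # ... otherwise fall back to any undelivered nugget
        for key in NUGGETS:
            if key not in delivered:
                return key
        return None

    delivered = set()
    reply = input("Have you thought about graduate school? ")
    while (key := next_nugget(reply, delivered)) is not None:
        delivered.add(key)
        reply = input(NUGGETS[key] + " ")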
Time-Based Language Modeling
Most language models treat speech as simply sequences of words,
ignoring the fact that words are also events in time. We have shown
that modeling how word probabilities vary with time-into-utterance can
improve a language model. Our next steps will be to extend this into
a broader and deeper model, not just for improving the performance of
speech recognizers, but also for insight into the time-course of human
cognitive processing in dialog. (with Alejandro Vega, Nisha
Kiran, Ben Walker and Shubhra Datta)
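To make the idea concrete, here is a minimal sketch, in Python, of one
way to condition word probabilities on time-into-utterance:
occurrences are binned by time, and the per-bin estimate is
interpolated with the plain unigram estimate. The bin width and
interpolation weight are illustrative assumptions, not the parameters
of our actual model.

    from collections import Counter, defaultdict

    BIN_S = 0.5      # width of each time-into-utterance bin, in seconds
    LAMBDA = 0.7     # weight on the time-conditioned estimate (illustrative)

    unigrams = Counter()
    bins = defaultdict(Counter)

    def train(tokens):
        # tokens: iterable of (word, seconds_into_utterance) pairs
        for word, t in tokens:
            unigrams[word] += 1
            bins[int(t / BIN_S)][word] += 1

    def prob(word, t):
        # interpolate P(word | time bin) with the plain unigram P(word)
        p_uni = unigrams[word] / max(1, sum(unigrams.values()))
        bin_counts = bins[int(t / BIN_S)]
        p_bin = bin_counts[word] / max(1, sum(bin_counts.values()))
        return LAMBDA * p_bin + (1 - LAMBDA) * p_uni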
Cross-Cultural Misperceptions of Dialog Behaviors
The rules governing real-time interpersonal interaction are today not
well understood. With only a few exceptions, there are no
quantitative, predictive rules explaining how to respond in real time,
in the sub-second range, in order to be an effective communicator in a
given culture. This can be a problem in intercultural interactions;
if an American knows only the words of a foreign language, not the
rules of interaction, he can easily appear uninterested, ill-informed,
thoughtless, discourteous, passive, indecisive, untrusting, dull,
pushy, or worse. Short of long-term cultural exposure, there are
today no reliable ways to train speakers to understand and follow such
rules and attain mastery of interaction at the sub-second level.
The purpose of this research is to increase our knowledge and know-how
in this area. Specifically, the aim of this project is to develop
methods for training learners of Arabic to master these behaviors and
thereby appear more polite. So far we have focused on back-channel
feedback.
We have developed a training sequence which enables the
acquisition of a basic Arabic back-channel skill, namely, that of
producing feedback immediately after the speaker produces a sharp
pitch downslope. This training sequence includes software that
provides feedback on learners' attempts to produce the cue themselves
and feedback on learners' performance as they play the role of an
attentive listener in response to one side of a pre-recorded
dialog. Preliminary experiments indicate that this training is effective.
(with Yaffa Al Bayyari, David Novick, and Marisa Flecha-Garcia)
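As an illustration of the cue involved, here is a minimal sketch, in
Python, of detecting a sharp pitch downslope in an F0 track; the frame
rate, window size, and slope threshold are illustrative assumptions,
not the parameters used in our training software.

    def sharp_downslope(f0_frames, frame_s=0.01, window_s=0.2, threshold=-80.0):
        # f0_frames: recent F0 values in Hz, one per frame; 0 marks unvoiced frames
        n = int(window_s / frame_s)
        voiced = [f for f in f0_frames[-n:] if f > 0]
        if len(voiced) < 2:
            return False
        # approximate slope in Hz/s over the voiced portion of the window
        slope = (voiced[-1] - voiced[0]) / (len(voiced) * frame_s)
        return slope < threshold   # cue detected: back-channel feedback is due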
Automating the Discovery of Dialog Cue Patterns
Building better dialog systems requires a better understanding of the
low-level details of human communication. However, the dynamics of
interaction at the extreme time-scales characteristic of swift dialog
are not accessible to casual observation. Progress here depends on
tools for systematically analyzing these patterns of behavior. In
recent years some excellent freeware tools for audio data
transcription, phonetic analysis, and manipulation have appeared;
however, none directly supports the sorts of search, comparison,
hypothesis formulation, and hypothesis evaluation essential to
advancing scientific understanding and to engineering highly
responsive systems.
We have begun prototyping toolkits for this kind of analysis, to
determine what functionality linguists and others need, and how best
to provide it (Downloads), and
are developing a method for automatically identifying important dialog
cues from conversation data in any language. (with Joshua McCartney)
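One simple statistic such a method might rely on, sketched in Python:
how often a candidate cue by one speaker is followed, within a short
window, by a response from the other. The input format and window size
are assumptions for illustration; comparing the rate against a chance
baseline would flag promising cues.

    def response_rate(cue_times, response_times, window_s=0.5):
        # fraction of cue occurrences followed by a response within window_s
        if not cue_times:
            return 0.0
        hits = sum(
            any(0.0 <= r - c <= window_s for r in response_times)
            for c in cue_times
        )
        return hits / len(cue_times)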
Speaking Rate in Dialog
The pacing of today's dialog systems tends to be rigid. In addition
to the problems of turn-taking, the speaking rate itself is generally
fixed. For example, the automatic number-giving that comes at the end
of directory assistance calls is at a fixed rate: too slow for some
people and too fast for others. We have found that the speaking rate
can be adapted automatically for these dialogs, based just on the
user's speaking rate and response latency. However, simple
correlations break down for more complex types of dialog; there the
speaking rate appears to depend more on dialog act and dialog state.
(with S. Kumar Mamidipally, continuing work by Satoshi Nakagawa)
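A minimal sketch, in Python, of adaptation from just these two
measurements; the coefficients and bounds are illustrative
assumptions, not the values found in our data.

    def adapted_rate(user_rate, latency_s, base=4.0, lo=2.5, hi=6.0):
        # user_rate: the user's speaking rate, in syllables per second
        # latency_s: the user's typical response latency, in seconds
        rate = base + 0.5 * (user_rate - base)     # shift toward the user's rate
        rate -= 0.4 * max(0.0, latency_s - 0.5)    # slow down for slow responders
        return max(lo, min(hi, rate))              # clamp to a safe range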
See also: Publications | Research Themes in Dialog System Usability | Earlier Projects | Interactive Systems Group Projects Page