Time-Based Language Modeling
Responsiveness in Spoken Tutorial Dialogs
Towards Persuasive Dialog Systems
Cross-Cultural Misperceptions of Dialog Behaviors
Automating the Discovery of Dialog Cue Patterns
Speaking Rate in Dialog
Speech recognizers all include a component for predicting, based on the past context, what words are likely to appear next. Today these components, known as language models, operate at the symbol level, abstracted away from the details of how and when the words are spoken. Spoken language, however, is not just a symbolic or mathematical object, but is produced and understood by human brains, with specific processing constraints, and these can directly affect what happens when in dialog.
This project is developing language models that explicitly use the
information in the local prosody, including pitch, volume, and
speaking rate. Inspired by psychological research suggesting that
dialog and language behaviors are the result of multiple
simultaneously active cognitive processes, the working assumption is
that the words likely to be spoken at a given time depend,
probabilistically, on the speaker's current state, as indicated, for
example, by the animation in their voice, or the time since the other
speaker stopped talking. Statistical analyses of large corpora of
human-human spoken dialogs are revealing interesting patterns.
Language models that include such features can improve speech
recognition accuracy. As they implicitly represent some aspects of
dialog dynamics, they may also lead to a more integrated understanding
of the nature of dialog as a human ability.
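As a rough illustration of the working assumption above, the sketch below conditions word probabilities on one timing feature, the time since the other speaker stopped talking, and interpolates that estimate with a plain unigram estimate. It is a minimal sketch, not the project's actual model: the class name, the bucket thresholds, the interpolation weight, and the smoothing constant are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def bucket(seconds_since_other_stopped):
    """Map a timing feature to a coarse state (thresholds are illustrative)."""
    if seconds_since_other_stopped < 0.5:
        return "early"
    if seconds_since_other_stopped < 2.0:
        return "mid"
    return "late"


class TimeConditionedUnigram:
    """Hypothetical model: P(word | state) interpolated with P(word)."""

    def __init__(self, weight=0.5, alpha=1.0):
        self.weight = weight                  # weight on the conditioned estimate
        self.alpha = alpha                    # add-alpha smoothing constant
        self.total = Counter()                # word counts over all states
        self.by_state = defaultdict(Counter)  # word counts per state

    def train(self, samples):
        """samples: iterable of (word, seconds_since_other_stopped) pairs."""
        for word, secs in samples:
            self.total[word] += 1
            self.by_state[bucket(secs)][word] += 1

    def _smoothed(self, counts, word):
        # Add-alpha estimate over the training vocabulary.
        vocab_size = len(self.total)
        denom = sum(counts.values()) + self.alpha * vocab_size
        return (counts[word] + self.alpha) / denom

    def prob(self, word, secs):
        """Interpolated probability of word given the timing feature.

        Call train() first; an untrained model has an empty vocabulary.
        """
        base = self._smoothed(self.total, word)
        cond = self._smoothed(self.by_state[bucket(secs)], word)
        return self.weight * cond + (1.0 - self.weight) * base
```

A real system would condition on richer context (preceding words plus several prosodic features such as pitch, volume, and speaking rate); the single bucketed feature here only shows how timing information can shift the predicted word distribution.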
More generally, prediction in dialog seems to be
critical to many important applications, as discussed at our recent workshop: Predictive Models of Human
Communication Dynamics.
(with Alejandro Vega, Shreyas Karkhedkar, Ben Walker, and Nisha Kiran)
Funded in part by NSF award IIS-0914868.
An important ability seen in human-human conversations
is the ability for a conversant to pick up on the nuances of the
other's speech, and from that to be able to infer whether the other is
confident, frustrated, confused, etc. Good conversants can then use
this information to alter their own speaking style to show
supportiveness to the other speaker. We are examining these abilities
as they are deployed in tutorial dialog. By analyzing a corpus of
dialogs with a skilled tutor, we have found rules for which
acknowledgment to use when, and in what prosodic form, based on the
user's current cognitive and communicative state, as revealed by his
or her prosody and recent behavior. Experiments show that users
prefer systems that produce acknowledgments chosen appropriately in
this way. (Rafael Escalante, continuing work by Tasha Hollingsed and
by Wataru Tsukahara)
Funded in part by NSF award IIS-0415150.
The range of applications for spoken dialog systems today is very
limited: only information access and very simple transactions are
commonly supported. Even these systems are generally disliked and
avoided whenever practical. Ultimately, however, spoken dialog
systems have the potential to be better than other media (web
interfaces etc.) for interactions where empathy or trust plays a role.
In particular we are examining persuasive dialogs, where the speakers
use their language and vocal repertoire in a highly adaptive, highly
effective, and "charming" way. We have collected a corpus of dialogs
between freshmen and a staff member wanting them to consider graduate
school as an option, identified the content nuggets, and built two
baseline systems, one with text input and output and one in VoiceXML.
We have discovered some of the low-level engaging behaviors in this
corpus, and built them into a spoken dialog system that engages users. (Michael
H. Durcholz, Jaime C. Acosta)
The rules governing real-time interpersonal interaction are today not
well understood. With only a few exceptions, there are no
quantitative, predictive rules explaining how to respond in real-time,
in the sub-second range, in order to be an effective communicator in a
given culture. This can be a problem in intercultural interactions;
if an American knows only the words of a foreign language, not the
rules of interaction, he can easily appear uninterested, ill-informed,
thoughtless, discourteous, passive, indecisive, untrusting, dull,
pushy, or worse. Short of long-term cultural exposure, there are
today no reliable ways to train speakers to understand and follow such
rules and attain mastery of interaction at the sub-second level.
The purpose of this research is to increase our knowledge and know-how
in this area.
The aim of this project is to develop methods for
training learners of Arabic to master these behaviors and thereby
appear more polite. Focusing on back-channel feedback,
we have developed a training sequence which enables the acquisition of
a basic Arabic back-channel skill, namely, that
of producing feedback immediately after the speaker produces a sharp
pitch downslope. This training sequence includes software that
provides feedback on learners' attempts to produce the cue themselves
and feedback on learners' performance as they play the role of an
attentive listener in response to one side of a pre-recorded dialog.
Experiments with human subjects have shown this to be effective.
(with Yaffa Al Bayyari, David Novick, and Marisa Flecha-Garcia)
Funded in part by awards from DARPA-DSO and DoD-CIFA.
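The back-channel cue described above, a sharp pitch downslope, can be sketched as a slope test over a sliding window of a pitch track. This is only an illustrative sketch: the function names, the 10 ms frame length, the 10-frame window, and the -200 Hz/s threshold are assumptions, not values taken from the training software.

```python
def slope_per_second(values, frame_seconds):
    """Least-squares slope of values sampled every frame_seconds."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two frames")
    times = [i * frame_seconds for i in range(n)]
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    num = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    den = sum((t - mean_t) ** 2 for t in times)
    return num / den


def downslope_end_frames(pitch_hz, frame_seconds=0.01, window_frames=10,
                         threshold_hz_per_s=-200.0):
    """Return indices of frames that end a sharply falling pitch window.

    pitch_hz holds one pitch value in Hz per frame, or None for unvoiced
    frames. A window containing any unvoiced frame is skipped. All default
    values are illustrative assumptions.
    """
    hits = []
    for end in range(window_frames, len(pitch_hz) + 1):
        window = pitch_hz[end - window_frames:end]
        if any(p is None for p in window):
            continue
        if slope_per_second(window, frame_seconds) <= threshold_hz_per_s:
            hits.append(end - 1)
    return hits
```

A listener-training tool could use each returned frame as the moment after which a back-channel response counts as well timed, though how the actual software scores learners is not specified here.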
Building better dialog systems requires a better understanding of the
low-level details of human communication. However, the dynamics of
interaction at the extreme time-scales characteristic of swift dialog
are not accessible to casual observation. Progress here depends on tools
for systematically analyzing these patterns of behavior. In recent
years some excellent freeware tools for audio data transcription,
phonetic analysis, and manipulation have appeared; however, none
directly support the sorts of search, comparison, hypothesis
formulation, and hypothesis evaluation essential to advancing
scientific understanding and to engineering highly responsive systems.
We have begun prototyping toolkits for this kind of analysis, to
determine what functionality linguists and others need, and how best
to provide it (Downloads), and
are developing a method for automatically identifying important dialog
cues from conversation data in any language. (with Joshua McCartney)
Up to Nigel Ward.