Nigel Ward: Current and Recent Projects
Modeling Prosody for Speech-to-Speech Translation
Quantifying the Contributions of Prosodic Behaviors to Trust
Methods for Identifying Individual Differences in Dialog Prosody
Tools to Support the Discovery of Dialog Patterns
Modeling Prosody for Speech-to-Speech Translation
Speech-to-speech translation is rapidly advancing, but current systems and current research focus on the translation of lexical content. To better serve users, these systems also need to accurately convey pragmatic and interactional intents, which requires the proper transfer of prosodic meanings. Prosodic features combine in patterns to convey numerous meanings, including important functions relating to the coordination of action, the marking of topic shifts and connections, the managing of turn taking, and the expression of engagement, stance, and attitude. Prosody modeling is an active area of research, especially for speech synthesis, yet there are no generally useful computational models of prosody as it serves pragmatic functions. Accurately describing the prosody of various languages is also an active area of research, yet there are no quantitative models of how prosodic forms and functions map across languages.
In this NSF-funded project we aim to advance our knowledge of how prosody relates across languages and our ability to model this. The approach is corpus-based: leveraging the Spanish-English bilingual setting of The University of Texas at El Paso, we have collected a modest-sized corpus of dialog utterances reenacted across languages. We are now building models that take as input an utterance of one language and predict appropriate prosody for the translation in the other language.
with Olac Fuentes and Divette Marco
Quantifying the Contributions of Prosodic Behaviors to Trust
Effective human-agent teaming requires trust, and trust can be fostered by continuous, rapid, non-intrusive communication of system state and intentions. In human-human interaction, this is done in large part using prosody, that is, the aspects of the speech signal that convey more than just the sequence of words. Unfortunately, agents and robots today do very little with prosody.
This AFOSR-supported project will address four research questions: 1) How much can appropriate agent prosody contribute to trust in teaming? 2) Which properties of agent prosody are most important for trust in teaming? 3) For what communicative functions is appropriate agent prosody most critical for trust in teaming? 4) How can we build agents with effective real-time prosody control?
We will investigate these questions with human-subjects experiments. We will also work on speech synthesizer improvements, and construct a human-agent interaction testbed. The potential impact will be that users of robots and advanced agents will benefit when systems become able to utilize prosody to provide constant unobtrusive signaling of the system's situation awareness, intents, status and trustability, as they vary over time in changing situations.
Modeling Individual and Group Differences in Dialog Prosody
Mastery of prosody is important for effective conversation, and thus for success in social interactions. However, prosodic mastery is not evenly distributed in the population. Language learners often have difficulty with prosody, especially for the prosodic forms used in dialog activities, as do many people with autism and other language disorders. Today there are no good diagnostic tools for impairments in dialog prosody, nor systematic and focused treatment methods. We are developing methods to work directly on unannotated dialog data to automatically identify likely points of weakness, and ways to map out the space of individual variation to enable meaningful classification with modest data. Our current focus is childhood communication disorders, working with partners in the AI Institute for Exceptional Education.
with Arin Rahman and Georgina Bugarini
Tools to Support the Discovery of Dialog Patterns
Building better dialog systems requires a better understanding of the low-level details of human communication. However, the dynamics of interaction at the short time-scales characteristic of swift dialog are not accessible to casual observation. Progress here depends on analysis tools. In recent years excellent freeware tools for audio data transcription, phonetic analysis, and speech manipulation have appeared, but none work well for dialog. We need tools that directly support search, comparison, hypothesis formulation, and hypothesis evaluation for dialog phenomena, as this is essential to advancing scientific understanding and to engineering highly responsive systems.
We have built a toolkit for aspects of such analysis, including methods for semi-automatically identifying important dialog cues and patterns from conversation data in any language. We are currently extending this toolkit to cover more prosodic features and support more workflows.
with Javier Vazquez
See the Midlevel Prosodic Features Toolkit.
updated October 2024
See also Publications