Nigel Ward: Current and Recent Projects
Using Continuous Assessments of Dialog Quality for Training Purposes
People use dialog systems today in very limited ways. This is largely because the interactions are tuned to get the job done, but not necessarily in any pleasing way. One possible way to improve them is to train systems to satisfy quality goals, moment by moment. We are exploring ways to annotate for quality in real time, and to use these annotations to improve dialog system performance and habitability.
Using Prosodic Information to Improve Search in Audio
If only search in audio archives were as simple as search in text. While human speech is intrinsically challenging to process, due to the variety of styles and pronunciations, it does bring an additional point of leverage: prosodic information. Thus we wish to go beyond words, to also use information in the way people say things, in two ways.
We have applied Principal Component Analysis to time-spread prosodic features as a way to reduce the dimensionality. This enables us to map each moment in a dialog to a point in a vector space. We have found that point pairs that are close in this vector space are frequently similar, in terms of the dialog activities (planning, complaining, explaining, and so on), in terms of the stance (new, urgent, factual, etc.) and in terms of topic. Using proximity in this space as an indicator of similarity, we have developed models for query-by-example search, for detecting urgency, and for inferring aspects of stance, such as good, typical, local, urgent, new information, and relevant to a large group. We are now exploring utility for information extraction, starting with the task of spotting location mentions.
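The idea of mapping each moment to a point and treating proximity as similarity can be sketched as follows. This is only an illustrative sketch, not the project's implementation: the feature matrix is synthetic, the function names (`fit_pca`, `project`, `nearest_moments`) are hypothetical, and the choices of Euclidean distance and three retained components are assumptions made for the example.

```python
import numpy as np

def fit_pca(features, n_components):
    """Fit PCA to a (n_moments, n_features) matrix; return (mean, components)."""
    mean = features.mean(axis=0)
    centered = features - mean
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components]

def project(features, mean, components):
    """Map each moment's feature vector to a point in the reduced space."""
    return (features - mean) @ components.T

def nearest_moments(points, query_index, k):
    """Return indices of the k moments closest to the query moment."""
    distances = np.linalg.norm(points - points[query_index], axis=1)
    distances[query_index] = np.inf  # exclude the query itself
    return np.argsort(distances)[:k]

# Synthetic stand-in for time-spread prosodic features (assumption):
# two groups of 20 moments each, 12 features per moment.
rng = np.random.default_rng(0)
group_a = rng.normal(loc=0.0, scale=0.1, size=(20, 12))
group_b = rng.normal(loc=3.0, scale=0.1, size=(20, 12))
features = np.vstack([group_a, group_b])

mean, components = fit_pca(features, n_components=3)
points = project(features, mean, components)

# Query-by-example: find the moments most similar to moment 0.
neighbors = nearest_moments(points, query_index=0, k=5)
```

In this toy setting the neighbors of a moment from the first group all come from that same group, which is the behavior that query-by-example search relies on.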
Responsive Prosodic Behaviors for Interactive Systems
Spoken language is an attractive way for people to interact with autonomous intelligent systems. When systems talk to people, speech can convey not only lexical information but also meta information, such as whether the information requires immediate attention or is just background, how well the system understands the user's goals and situation, and whether the system needs to provide more information or is done for the moment.
In human-human dialog such meta-information is mostly conveyed by prosody: subtle variations in the pitch, energy, rate and timing within utterances. Unfortunately, giving agents this sort of expressiveness today requires using pre-recorded prompts or hand-crafted synthesized utterances, and neither of these is flexible enough for systems operating in contexts where the possible configurations of communication needs are not known ahead of time.
This project is building models of prosody to support the creation of prosodically appropriate utterances in dynamic domains. Automatic methods will be developed to infer prosodic behaviors from dialog datasets, enabling rapid development of models for new tasks, domains, and user populations. The methods and models will be evaluated on their ability to accurately model observed human behavior and on their ability to make a system a more effective collaborator for humans. This work will also inform the design of better speech synthesizers.
In collaboration with Anindita Nath, Diego Aguirre, James Jodoin and Olac Fuentes. Preliminary work was supported by the National Science Foundation as IIS-1449093: Eager: Preliminaries to the Development of Responsive Prosodic Behaviors for Interactive Systems, 2014-2016.
Methods for Identifying Individual Differences in Dialog Prosody
Language learners often have difficulty with prosody, especially for the prosodic forms used in dialog activities, but there are no diagnostic tools for dialog prosody. We are developing methods to work directly on unannotated non-native dialog data to automatically produce a listing of the prosodic constructions on which the non-natives are weak. We first create models of both native and non-native prosodic behavior in terms of pragmatic constructions, derived using Principal Components Analysis. The constructions involving weakness are then automatically identified as those native constructions for which there is no close non-native counterpart, as measured by the cosine distance over the loadings of the component features. So far this method has been applied to 90 minutes of dialog behavior by six advanced native-Spanish learners of English, successfully discovering both minor differences and major deficits.
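The matching step described above can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the method as implemented: the loading matrices are made-up toy values, the names `cosine_distance_matrix` and `unmatched_constructions` are hypothetical, the threshold of 0.5 is arbitrary, and taking the absolute value of the cosine (because the sign of a principal component is arbitrary) is an assumption of this sketch.

```python
import numpy as np

def cosine_distance_matrix(a, b):
    """Cosine distance between every row of a and every row of b."""
    a_unit = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)
    # Assumption: a component and its negation describe the same pattern,
    # so the sign of the cosine is ignored.
    return 1.0 - np.abs(a_unit @ b_unit.T)

def unmatched_constructions(native, nonnative, threshold):
    """Indices of native components with no close non-native counterpart."""
    distances = cosine_distance_matrix(native, nonnative)
    closest = distances.min(axis=1)  # best match for each native component
    return [i for i, d in enumerate(closest) if d > threshold]

# Toy loadings (assumption): one row per component, one column per feature.
native = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])
nonnative = np.array([
    [-0.9, 0.1, 0.0],   # close to native component 0, up to sign
    [0.1, 0.95, 0.0],   # close to native component 1
])

# Hypothetical threshold; a real value would need to be chosen empirically.
weak = unmatched_constructions(native, nonnative, threshold=0.5)
```

Here the third native component has no counterpart among the non-native components, so it is the one flagged as a candidate weakness.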
We next intend to extend these methods for the characterization of individual differences, which will involve mapping the space of individual variation to enable meaningful classification with modest data.
Tools to Support the Discovery of Dialog Patterns
Building better dialog systems requires a better understanding of the low-level details of human communication. However, the dynamics of interaction at the extreme time-scales characteristic of swift dialog are not accessible to casual observation. Progress here depends on tools for systematically analyzing these patterns of behavior. In recent years excellent freeware tools for audio data transcription, phonetic analysis, and speech manipulation have appeared; however, none work well for dialog. We need tools that directly support search, comparison, hypothesis formulation, and hypothesis evaluation for dialog phenomena; this is essential to advancing scientific understanding and to engineering highly responsive systems.
We have built a toolkit for this kind of analysis, including methods for semi-automatically identifying important dialog cues and patterns from conversation data in any language. We are currently extending this toolkit and applying it to new languages.
See the Prosodic Features Toolkit.
See also Publications