The rules governing real-time interpersonal interaction are not well understood today. With only a few exceptions, there are no quantitative, predictive rules explaining how to respond in real time, in the sub-second range, in order to be an effective communicator in a given culture. This can be a problem in intercultural interactions: if an American knows only the words of a foreign language, not the rules of interaction, he or she can easily appear uninterested, discourteous, passive, indecisive, pushy, or worse. Short of long-term cultural exposure, there are no reliable ways to train speakers to understand and follow such rules and attain mastery of interaction at the sub-second level. The purpose of this research is to increase our knowledge and know-how in this area.
The aim of this project is to develop methods for training learners of Arabic to master these behaviors and thereby appear more polite. Focusing on back-channel feedback, we have developed a training sequence which enables the acquisition of a basic Arabic back-channel skill, namely, that of producing feedback immediately after the speaker produces a sharp pitch downslope. This training sequence includes software that provides feedback on learners' attempts to produce the cue themselves and feedback on learners' performance as they play the role of an attentive listener in response to one side of a pre-recorded dialog. Experiments with human subjects have shown this to be effective. (with Yaffa Al Bayyari, David Novick, and Marisa Flecha-Garcia)
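The cue-detection step at the heart of this training can be sketched as follows. This is a minimal illustration only, assuming a frame-by-frame pitch track as input; the frame rate, window length, and drop threshold are invented for illustration and are not the parameters of the actual training software.

```python
def downslope_cues(pitch_hz, frame_s=0.01, window_s=0.11, min_drop_hz=40.0):
    """Return frame indices where pitch has fallen by at least min_drop_hz
    over the preceding window, as candidate moments for back-channel
    feedback. None in pitch_hz marks unvoiced frames."""
    n = max(1, round(window_s / frame_s))  # window length in frames
    cues = []
    for i in range(n, len(pitch_hz)):
        earlier, now = pitch_hz[i - n], pitch_hz[i]
        if earlier is not None and now is not None and earlier - now >= min_drop_hz:
            cues.append(i)
    return cues
```

In a trainer, such cue times would be compared against the learner's actual feedback times to score each attempt.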
Funded in part by awards from DARPA-DSO and DoD-CIFA.
An important ability seen in human-human conversations is that of picking up on the nuances of the other's speech, and from that inferring whether the other is confident, frustrated, confused, etc. Good conversants can then use this information to alter their own speaking style to show supportiveness to the other speaker. We are examining these abilities as they are deployed in tutorial dialog. By analyzing a corpus of dialogs with a skilled tutor, we have found rules for which acknowledgment to use when, and in what prosodic form, based on the user's current cognitive and communicative state, as revealed by his or her prosody and recent behavior. Experiments show that users prefer systems which produce acknowledgments chosen appropriately in this way. (Rafael Escalante, continuing work by Tasha Hollingsed and by Wataru Tsukahara)
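The general shape of such rules can be illustrated with a toy rule table; the states, acknowledgment tokens, and prosodic forms below are invented for illustration and are not the rules extracted from the corpus.

```python
# Illustrative only: map an inferred user state (derived elsewhere from
# prosody and recent behavior) to an acknowledgment and its prosodic form.
ACK_RULES = {
    "confident": ("okay",  "quick, flat pitch"),
    "hesitant":  ("mm-hm", "slow, rising pitch"),
    "confused":  ("right", "slow, low pitch, then rephrase"),
}

def choose_ack(user_state):
    """Return (token, prosodic form) for the tutor's next acknowledgment."""
    return ACK_RULES.get(user_state, ("uh-huh", "neutral"))
```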
Funded in part by NSF award IIS-0415150.
Today most applicants to graduate programs cannot predict where they will be accepted. Even those departments which do give quantitative information regarding GPAs and GREs report statistics so obliquely that it is very difficult for a potential applicant to infer what they mean for him or her.
The Acceptance Estimator
Towards a Model of Computer Science Graduate Admissions Decisions. Nigel Ward. Journal of Advanced Computational Intelligence and Intelligent Informatics, 10, pp 372-383. pdf draft, final version
The pacing of today's dialog systems tends to be rigid. In addition to the problems of turn-taking, the speaking rate itself is generally fixed. For example, the automatic number-giving that comes at the end of directory assistance calls is at a fixed rate: too slow for some people and too fast for others. We have found that the speaking rate can be adapted automatically for these dialogs, based just on the user's speaking rate and response latency. However, simple correlations break down for more complex types of dialog; there the speaking rate appears to depend more on dialog act and dialog state. (with S. Kumar Mamidipally, continuing work by Satoshi Nakagawa)
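As a rough illustration of this kind of adaptation, a simple linear model can mirror the user's rate while slowing down when long response latencies suggest the user needs more time. The coefficients and clamping range here are illustrative assumptions, not the values fitted in the study.

```python
def adapted_rate(user_syl_per_s, latency_s,
                 base=4.0, mirror=0.5, latency_penalty=0.8):
    """Estimate a system speaking rate in syllables/second (illustrative
    linear model: partially mirror the user's rate, slow for slow responders)."""
    rate = base + mirror * (user_syl_per_s - base) - latency_penalty * latency_s
    return max(2.0, min(7.0, rate))  # clamp to an intelligible range
```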
Speech recognizers all include a component for predicting, based on the past context, what words are likely to appear next. Today these components, known as language models, operate at the symbol level, abstracted away from the details of how and when the words are spoken. Spoken language, however, is not just a symbolic or mathematical object, but is produced and understood by human brains, with specific processing constraints, and these can directly affect what happens when in dialog.
This project is developing language models that explicitly use the information in the local prosody, including pitch, volume, and speaking rate. Inspired by psychological research suggesting that dialog and language behaviors are the result of multiple simultaneously active cognitive processes, the working assumption is that the words likely to be spoken at a given time depend, probabilistically, on the speaker's current state, as indicated, for example, by the animation in their voice, or the time since the other speaker stopped talking.
Language models that include such features can improve speech recognition accuracy. As they implicitly represent some aspects of dialog dynamics, they may also lead to a more integrated understanding of the nature of dialog as a human ability.
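One simple way such prosodic conditioning could enter a language model is sketched below: an ordinary word-history distribution is rescaled by a prosody-dependent factor and renormalized. The specific feature (time since the other party stopped talking), the boost value, and the word set are illustrative assumptions, not the features or weights of the actual models.

```python
def prosody_rescored(dist, prosody):
    """dist: {word: P(word | history)} from a standard language model.
    prosody: dict of prosodic/timing features for the current moment.
    Returns a renormalized distribution after a prosody-conditioned boost."""
    FILLERS = {"um", "uh", "well"}  # illustrative word class
    scaled = {}
    for word, p in dist.items():
        # Illustrative rule: fillers become more likely after a long silence.
        boost = 3.0 if (prosody.get("post_pause_s", 0.0) > 0.5
                        and word in FILLERS) else 1.0
        scaled[word] = p * boost
    z = sum(scaled.values())
    return {word: p / z for word, p in scaled.items()}
```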
More generally, prediction in dialog is critical to many important applications, as discussed at our workshop: Predictive Models of Human Communication Dynamics.
(with Alejandro Vega, Shreyas Karkhedkar, Ben Walker, and Nisha Kiran)
Funded in part by NSF award IIS-0914868.
The range of applications for spoken dialog systems today is very limited: only information access and very simple transactions are commonly supported. Even these systems are generally disliked and avoided whenever practical. Ultimately, however, spoken dialog systems have the potential to be better than other media for interactions where empathy or trust plays a role. In particular we are examining persuasive dialogs, where the speakers use their language and vocal repertoire in a highly adaptive, highly effective, and "charming" way. We have collected a corpus of dialogs between freshmen and a staff member wanting them to consider graduate school as an option, identified the content nuggets, and built two baseline systems, one with text input and output and one in VoiceXML. We have discovered some of the low-level engaging behaviors in this corpus, and demonstrated their value in a spoken dialog system that engages users. (with Michael H. Durcholz, Jaime C. Acosta)
Not everything people say to each other is equally important. However current mobile telecommunications infrastructure doesn't care: every uh and um is transmitted just as faithfully as critical names or numbers.
We believe that telecommunications systems should instead transmit voice with more accuracy to the extent that it is more important. This means that a system should economize on the bits used to code less important utterance components --- fillers, back-channels, disfluencies and so on --- and use some of the bits saved to instead transmit meta-communicative signals and to more accurately transmit the most important parts of utterances. The value for users will be a near-wideband experience for no additional datarate-related costs.
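The allocation idea can be sketched as follows, assuming utterance segments have already been classified; the categories and importance weights are illustrative, and a real codec would of course work at a much finer grain.

```python
# Illustrative importance weights per segment category.
IMPORTANCE = {"name_or_number": 3.0, "content": 2.0,
              "filler": 0.5, "backchannel": 0.5, "disfluency": 0.5}

def allocate_bits(segments, total_bits):
    """segments: list of (category, duration_s) pairs. Split a fixed bit
    budget in proportion to importance-weighted duration, so critical
    material is coded more accurately than fillers and disfluencies."""
    weights = [IMPORTANCE.get(cat, 1.0) * dur for cat, dur in segments]
    z = sum(weights) or 1.0
    return [total_bits * w / z for w in weights]
```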
We are working towards this goal by
with Karen Richart-Ruiz
When dictating using speech recognition, most of the time is spent correcting errors. However it is possible to predict which words are probably in error, using various confidence measures.
This suggests an editor with two new functions: display probable errors in red, and enable jumping to the next probable error with the tab key. Simple experiments suggest that these functions can be valuable.
Using a GOMS-type model of editing costs, it is possible to predict in which circumstances these functions are useful, depending on three parameters: speech recognition rate, confidence measure quality, and user tolerance for uncorrected errors.
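A toy version of such a cost model might look like the following; the time constants and the structure of the cost terms are invented for illustration, not those of the published model.

```python
def expected_cost(n_words, wer, recall, precision, tolerate_misses,
                  t_scan=0.5, t_jump=0.2, t_fix=2.0):
    """Rough seconds to correct a dictated passage.
    Baseline: scan every word, fix each real error.
    With flags: tab to each flagged word; if misses aren't tolerated,
    also scan the whole text for unflagged errors."""
    errors = n_words * wer
    baseline = n_words * t_scan + errors * t_fix
    flagged = errors * recall / max(precision, 1e-9)  # words visited via tab
    jump_cost = flagged * t_jump + errors * recall * t_fix
    if tolerate_misses:
        with_flags = jump_cost
    else:
        with_flags = jump_cost + n_words * t_scan + errors * (1 - recall) * t_fix
    return baseline, with_flags
```

Even this crude version shows the qualitative prediction: the jump key helps most when misses are tolerated or the confidence measure has high recall.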
Can Confidence Scores Help Users Post-Editing Speech Recognizer Output? Taku Endo, Nigel Ward, and Minoru Terada. International Conference on Spoken Language Processing, 2002. pdf
A similar proposal was later made by Feng and Sears, in ToCHI 2004.
When you want to find out the basic meaning of some new word or abbreviation, the net is a good place to look, but search engines do rather poorly on single-word, terminology-type queries.
It is possible to do better by adapting various classic information-retrieval techniques, together with the new technique of favoring pages which contain a region with a high local density of relevant concepts.
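The local-density idea can be sketched as follows, assuming the query has already been expanded into a set of related terms: score a page by its best fixed-width window, counting how many distinct expansion terms co-occur there. The window width is an illustrative parameter.

```python
def best_window_density(tokens, terms, width=50):
    """Highest number of distinct query-expansion terms found in any
    window of `width` consecutive tokens. Pages whose best window is
    dense in relevant concepts are favored as likely explanations."""
    terms = set(terms)
    best = 0
    for start in range(max(1, len(tokens) - width + 1)):
        window = set(tokens[start:start + width])
        best = max(best, len(terms & window))
    return best
```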
Searching for Explanatory Web Pages using Query Expansion. Manabu Tauchi and Nigel Ward. Pacling 2001.
This work won a Best Paper award at Pacling 2001, leading to an invited paper 5 years later:
Searching for Explanatory Web Pages using Query Expansion. (updated version) Manabu Tauchi and Nigel Ward. Computational Intelligence, 2006, to appear. pdf
DigitalNote is a freeware Java application which makes it easier to take class notes using your Tablet PC or other notebook computer. Notes taken with DigitalNote are superior to pencil-and-paper notes in being searchable, editable, easily sharable, and more legible.
DigitalNote may be useful for you if you type relatively quickly, you carry around a notebook PC all the time anyway, and equations and diagrams are rare in your classes ... or, if they are common, your PC is a tablet or has a touchscreen (i.e. is stylus-equipped).
DigitalNote is stable. While there are many improvements and new features that are needed, it is usable as-is. (Hajime Tatsukawa and Jabel Morales)
DigitalNote Downloads and Publications
Many speech-to-speech translation projects assume that perfect translation of arbitrary text is possible and necessary, but for many situations simpler technology is more appropriate. Our inspiration comes from phrasebooks, which contain useful, context-tailored information. They are, however, not convenient enough to use during actual interactions.
We developed a wearable device, Yak, as an aid to cross-language communication. Yak produces utterances in the native's language at the user's command. The use of a heads-up display allows the user to operate the device while maintaining eye contact with the native, and while sending and receiving non-verbal signals. We believe this interaction paradigm will be valuable for many face-to-face encounters, for example as travelers deal with people in service roles, such as ticket agents, waiters, and clerks. In this work we developed a hardware configuration, a user interface, and a scripting language. Preliminary experiments showed that the system was usable. (Jani Patokallio)
Yak Project Page
A central issue for dialog systems is the proper handling of turn-taking, to ensure that the system responds swiftly, but without talking over the user or denying him or her time to think. For one class of dialogs, direction-giving dialogs, this is relatively easy; it can be done surprisingly well using only the information in the duration and pitch of the user's utterances, as we found in a confirmation of Schmandt's early work.
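A minimal sketch of such a decision rule follows, assuming just two cues per user utterance: whether the pause is long enough, and whether the utterance-final pitch is low (turn-yielding) rather than high (e.g. a question or a continuation). The thresholds are illustrative, not those found in the study.

```python
def system_should_speak(pause_s, final_pitch_hz,
                        min_pause_s=0.5, pitch_floor_hz=120.0):
    """True if it is the system's turn to give the next direction."""
    if pause_s < min_pause_s:
        return False  # user may still be talking or thinking
    return final_pitch_hz <= pitch_floor_hz  # low final pitch: turn yielded
```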
Pacing Spoken Directions to Suit the Listener. Tatsuya Iwase and Nigel Ward. 5th International Conference on Spoken Language Processing (ICSLP-98). abstract, pdf
Modern networked games generally support audio communication among players, but the way this is done is unrealistic in that the team members can hear each other perfectly, but opposing teams cannot overhear their talk at all. We have augmented Quake with an audio environment where audio is transmitted to everyone whose avatar is close to the avatar of any talker. We expect that this new functionality will make games more compelling. (Jaime C. Acosta)
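The proximity rule itself is simple; a sketch, assuming avatar positions are available each frame, with an illustrative hearing radius:

```python
import math

def hearers(talker_pos, avatar_positions, radius=30.0):
    """Return the ids of players whose avatars are close enough to the
    talker's avatar to receive the audio.
    talker_pos: (x, y, z); avatar_positions: {player_id: (x, y, z)}."""
    return [pid for pid, pos in avatar_positions.items()
            if math.dist(talker_pos, pos) <= radius]
```

In the modified game, the server would route each talker's audio stream only to the ids this returns, so nearby opponents can overhear while distant teammates cannot.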
Music plays an important part in the lives of many people; however, the deaf are largely excluded from this aspect of society. We are developing software to present music visually, in the form of a spectrogram conveying the beat and some of the tonal information, together with karaoke-style presentation of the lyrics. (Rosario Chavez)
When a new visitor comes to your e-commerce site, you usually don't have a clue what sort of person it is, which makes customization hard.
In daily life, we easily classify people as to age, sex, confidence, and so on, based, among other things, on how they move. For example, from a person's walk, or driving style, or handwriting, it is possible to make a reasonable guess about some important properties.
It would be nice if the same were possible from users' movements as they move the mouse across a web page.
Based on mouse trajectories from hundreds of users, it seems that there are no easy correlations. More detailed examination might be profitable.
Chikara Shimizu and Nigel Ward, 2002
Tidying up files and directories is a nuisance, and there's no software designed to help. One possible solution is to use an enhanced representation, with file properties represented with colors, shapes, and textures, and file organization represented as a 3-D tree.
This system won first prize at the "Night for Java Technology", 2001.
Three-dimensional Display of Directory Structure to Support Users Tidying up Files. (in Japanese) Keisuke Nishimoto and Nigel Ward. Interaction 2001 (87th Human Interface Workshop), Information Processing Society of Japan. 2001.
A static web page or brochure is not effective for conveying to potential new members the feeling of being and working in the lab. On the other hand, not everyone can physically come and visit.
An effective solution is to provide a chat system for/about the lab, inhabited by a gaggle of chatterbots each having the personality and speech patterns of a lab member, interacting with the user and each other.
Evaluating Information-Giving Chatterbots. (in Japanese). Tatsuya Iwase and Nigel Ward. Interaction 2000 (87th Human Interface Workshop), Information Processing Society of Japan. 2000.
Chat using standard chat systems becomes confusing when more than a few people are participating.
Many classic cyberspace ideas have shown up in chat systems, including two- or three-dimensional layouts of participants, and avatars with controllable facial expressions.
Responsive Chat was a low-overhead implementation of some of these ideas, plus gaze direction transmission, which seems to be the most valuable feature of all.
Daisuke Nakamura and Nigel Ward, 2000
A Lobby Agent should be able to greet people and provide information and entertainment appropriate to the level of interest, engagement, and so on of users and passers-by.
Our system used image processing to detect position, posture, and gesture, and speech processing to detect prosodic features, and used this information to determine the next appropriate action.
Requirements for a Socially Aware Free-standing Agent. Nigel Ward and Takeshi Kuroda. Proceedings of the Second International Symposium on Humanoid Robotics, Waseda University, Tokyo, Japan, 1999 (pdf)
In the late 1980s many machine translation systems were under development, but fluency of the output was generally poor. I attacked this problem by building a new kind of natural language generator, one in which the words and structures of the target language were flexibly employed to generate outputs more natural than those seen before. This generator was also incremental, parallel, and worked without assembling syntactic structures, features which were advantageous computationally and as a psycholinguistic model.
A Parallel Approach to Syntax for Generation. Nigel Ward. Artificial Intelligence 57, pages 183-225, 1992. abstract
A Connectionist Language Generator. Nigel Ward. Ablex 1994.
Even earlier projects, and the students who created them, are listed at my Sanpo advisees page.