Current Projects
Responsiveness in Spoken Tutorial Dialogs
An interesting aspect of human-human conversations is the ability
for a conversant to pick up on the nuances of the other conversant's
speech and be able to infer the other's state at the time, such as if
they are happy, frustrated, or confused. Using the prosody of the
utterances and the timing between utterances, an insightful conversant
is able to alter their own speaking style and feedback to encourage
the other speaker. By analyzing a corpus collected from a skilled
tutor, we will be able to develop rules for when and how to use
specific acknowledgments and feedback.
Using these rules, we hope to build a more human-like tutoring system
that users will find preferable to a system without these rules.
This is an extension of a previous study done in Japanese.
Tasha Hollingsed, Ernesto Medina (Advisor: Nigel Ward)
Prosodic Cues that Lead to Back-Channel Feedback in Northern-Mexican Spanish
Although human-human spoken dialog is generally fluent and natural, interaction with today's spoken dialog systems is often rigid and unpleasant. Thus a research priority is to identify aspects of human-human interaction that may be exploited for better human-computer dialog systems. One such aspect is back-channel feedback, that is, the short utterances which a listener commonly produces while a speaker is talking, such as uh-huh in English and si, aja, and mjm in Spanish. A dialog system that appropriately produces such utterances may seem more encouraging and competent to users. To do this, systems need to know the places in which back-channels are welcome. This study analyzed prosodic features of five conversations to identify the cues that lead to back-channel feedback. In Northern Mexican Spanish, these places are mostly characterized by a pitch downslope followed by a pitch rise, accompanied by a rate reduction on the last syllable, and then a drop in energy followed by a slight pause. A quantitative model based on this feature gave 29% coverage and 14% accuracy.
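The coverage and accuracy figures can be illustrated with a minimal sketch. The page does not define the two measures, so the definitions here are assumptions: coverage is taken as the share of actual back-channels that the rule predicted, and accuracy as the share of predictions that were followed by a back-channel. The function name `evaluate_rule`, the one-second matching window, and the example times are all hypothetical.

```python
from bisect import bisect_left


def evaluate_rule(predictions, backchannels, window=1.0):
    """Return (coverage, accuracy) for a back-channel prediction rule.

    predictions  -- times (s) at which the rule says a back-channel is welcome
    backchannels -- times (s) at which the listener actually produced one
    window       -- assumed tolerance: a prediction counts as correct if a
                    back-channel starts within `window` seconds after it
    """
    preds = sorted(predictions)
    bcs = sorted(backchannels)

    def any_in(sorted_times, lo, hi):
        # True if some time t in sorted_times satisfies lo <= t <= hi.
        i = bisect_left(sorted_times, lo)
        return i < len(sorted_times) and sorted_times[i] <= hi

    correct_preds = sum(1 for p in preds if any_in(bcs, p, p + window))
    covered_bcs = sum(1 for b in bcs if any_in(preds, b - window, b))

    accuracy = correct_preds / len(preds) if preds else 0.0
    coverage = covered_bcs / len(bcs) if bcs else 0.0
    return coverage, accuracy


# Invented example times, in seconds.
coverage, accuracy = evaluate_rule(
    predictions=[1.0, 5.0, 9.0, 12.0],
    backchannels=[1.4, 7.0, 9.5],
)
print(f"coverage={coverage:.2f} accuracy={accuracy:.2f}")
```

Under these assumed definitions, a rule that fires often will tend to raise coverage at the cost of accuracy, which is why the two figures are reported together.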
Luis H. Acosta Reyes and Anais Rivera (Advisor: Nigel Ward)
Prosodic Features that Invite Back-Channel Responses in Arabic
To be a good listener requires active listening, which can be achieved
partly by producing small utterances called back-channel feedback.
Second language learners, even if masters in grammar and vocabulary,
can easily appear uninterested when they do not show responsiveness in
real face-to-face conversations. In Arabic, there is a lack of resources
that teach L2 learners when to produce back-channels in dialog. The most
frequent prosodic feature used as a cue for back-channel feedback in Arabic
was found to be a steep pitch downslope. The performance of this feature
as a predictive rule gave 43% coverage and 13% accuracy on a 168 minute
corpus of Egyptian Arabic (Ward and Al Bayyari, 2006). The next most
frequent cue (accounting for 14% of
the total back-channels) is an upturn in pitch, whose predictive rule gave
19% coverage and 15% accuracy on 112 minutes of Iraqi Arabic dialogs.
We then examined whether visual cues also exist in Arabic dialog, in
particular whether speakers' gestures play a role in cueing back-channel
feedback from listeners. We used a video-recorded Iraqi Arabic corpus of face-to-face, free-content dialogs that we collected. We found no significant tendency for visual cues to co-occur with subsequent back-channels. We also found that a prosodic cue accompanied by a visual cue is no stronger than a prosodic cue alone in eliciting a back-channel response.
Yaffa Al Bayyari (Advisor: Nigel Ward)
A Training Tool for Teaching Arabic Back-channeling Behavior
We aim to develop a tool to teach back-channeling to learners of a foreign language. The current development version of the trainer has been used in pilot studies combined with Flash-based tutorials, and the results show that a 15-minute session with the trainer and the tutorials is effective in teaching basic back-channeling behavior. The tool, called “Back-channel Trainer”, works as follows. The core trainer plays back to the user a conversation taken from real phone conversations in Arabic. These conversations present the user with back-channel opportunities, so that learners receive numerous examples of the cues. The listener’s side of the conversation, that is, the side containing the back-channel productions to be emulated by the learner, is removed, and the remaining side is used as the stimulus. The trainer features a visual indicator that calls attention to each cue in the stimulus track. The trainer scores the learner based on the timing and frequency of his or her productions.
Rafael Escalante, Nigel Ward, Yaffa Al Bayyari, Thamar Solorio.
Consistent Generation of Text and Graphics
Safety-critical applications such as aircraft maintenance manuals
contain both textual and graphical descriptions of the same systems.
Incidents have occurred as a result of inconsistencies between the
textual and graphical descriptions. To address this problem, we are
developing ways of producing both the text and graphics for a system
from a common logical description of the system. This approach will
also enable generation of the text in multiple languages.
PI: Dr. David Novick
Speech Recognition System Development
This project involves the study of speech recognition software and
the development of scripts written in Perl that can be used to
automatically build speech recognition systems. The purpose
for the development of these scripts is to build a speech
recognition system that can recognize a user's single word answer to
quiz items presented by an interactive tutoring system (ITS) under
development by Tasha Hollingsed. Eventually, the two systems
will be integrated to form an ITS that can automatically recognize
answers. In addition, these scripts can also be used to customize
and evaluate the accuracy of the speech recognition system.
In particular, we are producing scripts that work with The Hidden
Markov Model Toolkit (HTK), a speech recognition system developed
by the Cambridge University Engineering Department.
Ernesto Medina
(Advisor: Nigel Ward)
Automated Processing of Interlanguages
The main goal is to extend Natural Language Processing tools by allowing them to process language alternations. We are currently working with Spanglish. In this context, we use the term Spanglish to refer to the language alternation patterns of English and Spanish observed in large Hispanic communities across the U.S. As a first step, we gathered a small corpus of Spanglish use in a natural conversation between several Hispanic speakers. We then explored the use of existing language modeling toolkits and evaluated how well these tools can be trained with this corpus. Currently we are working on developing a Part-of-Speech tagger for Spanglish.
Juan Carlos Franco (Advisor: Thamar Solorio)
Open Source Game System
This project provides public-domain programming resources for authors
and developers of video games.
Joaquin A. Aguilar (Advisor: David Novick)
The Effects of Transmission Delay on Conversation Dynamics
In telephony, the effects of transmission delay on user satisfaction
have been long studied, and the relationship is quantified as part
of a standard predictive model, the E-model. However, this research
has focused almost exclusively on telephony in traditional contexts,
with both conversants seated in quiet offices and devoting full
attention to the conversation. We plan to reexamine the effects
of transmission delay on conversational dynamics, and explore
second-order effects that may arise in mobile contexts.
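As a rough illustration of how the E-model relates delay to predicted satisfaction, the sketch below converts one-way delay into a transmission rating R and then into an estimated mean opinion score (MOS). It rests on several assumptions that do not come from this project: the delay impairment uses a widely cited piecewise-linear simplification rather than the full ITU-T G.107 computation, the base rating of 93.2 is the G.107 default with all other impairments set to zero, and the function names are hypothetical.

```python
def delay_impairment(delay_ms):
    """Approximate delay impairment Id for a one-way delay in milliseconds.

    Assumption: a piecewise-linear simplification, not the full G.107 formula.
    The penalty grows faster once delay exceeds about 177.3 ms.
    """
    extra = max(0.0, delay_ms - 177.3)
    return 0.024 * delay_ms + 0.11 * extra


def r_to_mos(r):
    """Map a transmission rating R to an estimated MOS (G.107 mapping)."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    mos = 1 + 0.035 * r + r * (r - 60) * (100 - r) * 7e-6
    # The cubic dips slightly below 1 for very small R, so clamp the result.
    return min(4.5, max(1.0, mos))


def predicted_mos(delay_ms, base_r=93.2, other_impairments=0.0):
    """Estimated MOS when delay is the only varying impairment (assumed)."""
    r = base_r - delay_impairment(delay_ms) - other_impairments
    return r_to_mos(r)


for d in (50, 100, 200, 400):
    print(f"{d:4d} ms -> estimated MOS {predicted_mos(d):.2f}")
```

A model of this kind predicts satisfaction from delay alone. It has no term for the conversational context, which is the gap this project examines.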
Anais Rivera (Advisor: Nigel Ward)
Adding the Ability to Eavesdrop on Opponent Communications to an Online Game
This project aims to determine whether allowing opponents to overhear each other's conversations during a team online capture-the-flag match improves the fun factor of a game. The implementation uses the Quake 3 source code as the core platform and incorporates elements of other open-source packages such as PortAudio, Speex, SoX, and RakNet. Several user studies are being conducted to test whether the new feature is valuable. Subjects will engage in capture-the-flag matches with and without the eavesdropping ability; capture the flag intrinsically creates scenarios that require users to plan strategies, work in teams, and communicate. The positive and negative impacts of this new feature on subjective satisfaction will be measured, including vividness, realism, enjoyment, difficulty, novelty, and sensory gratification.
Jaime Acosta (Advisor: Nigel Ward)
Byrics Software, a Tool to Assist the Deaf and Hard of Hearing Experience Music
A person born deaf or hard of hearing enters a world of limited sound or complete silence, with little auditory conversation, music, or ambient sound. Some deaf or hard-of-hearing people do not understand music because they cannot hear it. Others have lost the ability to hear some or all of the frequencies in a song; they may hear the music, but the lyrics are hard to place because they have a higher pitch. We are developing a tool that conveys music through sight, sound, and touch (via vibrations), so that users can enhance the experience and gain a better appreciation and understanding of music. The tool can play music, display lyrics and spectrograms, and interact with Snugums. Snugums, from Somatron, is a vibro-acoustic speaker inside a furry creature. This device provides sound vibrations without having to raise the volume.
Rosario Chavez (Advisor: Nigel Ward)
Previous Projects
A Tool for Analysis of Sound Files
This project involved the extension of Wavesurfer, an existing open-source
analysis tool. The original software allows a user to analyze
various aspects of a sound file, including energy and pitch. It
also provides a user with the ability to transcribe the files. The
main feature added was support for stereo sound files, because the
original version supported only mono files.
Ernesto Medina (Advisor: Thamar Solorio)
Acknowledgments in Human-Computer Interaction
This study focuses on the use of acknowledgments as a form of
understanding and agreement during Human-Computer dialogues.
This study asks if people are unwilling to use acknowledgments
when speaking to a machine or if their scarcity is due to the
way that spoken dialogue systems are designed. This study was
run in English and Spanish.
PI: Karen Ward
Other study members: Tasha Hollingsed, Javier Aldaz Salmon
Extended Direct Manipulation
This study's objective is to extend direct-manipulation interfaces
by incorporating, via the direct-manipulation modality itself,
interaction techniques that add kinds of language features associated
with spoken conversation.
PI: David Novick
Other study members: Armando Sandoval, Gabriel Sotelo
An Improved Tool For Taking Notes in the Classroom
Although computers have become ubiquitous, largely replacing
traditional tools such as paper and pen, there remains a common
environment where computers are seldom used: the classroom.
The reason for this is that class notes have unique properties
not seen in other documents. In early 2002 Tatsukawa and Ward's
NoteTaker demonstrated how to design a system suitable for taking
notes in class, and studies found that it could be useful for students.
However weaknesses of the hardware available at that time limited the
utility of this system. The system developed here provides similar
functionality on the Tablet PC, a PC design with a high resolution
digitizing tablet released in late 2002. This version includes a
GUI redesigned for SWING and various event-handling optimizations
to provide good performance on the Tablet PC. This version was tested
by five subjects using it to take notes in their classes, and most
rated it positively.
Thesis Project by: Jabel Morales
Advisor: Nigel Ward
Situational Awareness in Medical Displays
In this study we apply Endsley's Situation Awareness Model
(Endsley, 1995) and PICTIVE's low-tech approach to participatory
design (Muller, 1992) to the display of medical records. We believe
that this process will yield a better way of displaying medical
data that helps medical professionals better understand the condition
of their patients.
Thesis Project by: Francisco Romero
Advisor: Karen Ward
Towards Automatic Transcription of Sports Play-by-Play
Play-by-play broadcasts of sports events are a challenge for speech
recognition because they involve noise, multiple speakers, emotions,
and other phenomena of spontaneous speech. This study is a first
attempt to build a system to recognize football play-by-play,
initially focusing on the combination of two language models,
one for the event descriptions and one for the commentary.
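One common way to combine two language models is linear interpolation, sketched below with toy unigram models. The page does not say how the study combines its models, so this is an illustration rather than the study's method. The training sentences, the add-one smoothing, the weight `lam`, and all function names are invented for the example.

```python
from collections import Counter


def train_unigram(sentences, vocab, alpha=1.0):
    """Add-alpha smoothed unigram model over a fixed vocabulary."""
    counts = Counter(w for s in sentences for w in s.split())
    total = sum(counts[w] for w in vocab) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def interpolate(p_a, p_b, lam):
    """Mix two models over the same vocabulary: lam*p_a + (1 - lam)*p_b."""
    return {w: lam * p_a[w] + (1 - lam) * p_b[w] for w in p_a}


# Invented toy data standing in for the two kinds of speech.
event_text = ["pass complete to the forty", "handoff up the middle"]
commentary_text = ["what a great play", "the defense looks tired"]

vocab = sorted({w for s in event_text + commentary_text for w in s.split()})
p_events = train_unigram(event_text, vocab)
p_commentary = train_unigram(commentary_text, vocab)

# lam is a hypothetical weight; in practice it would be tuned on held-out data.
mixed = interpolate(p_events, p_commentary, lam=0.7)
print(f"P(handoff) = {mixed['handoff']:.4f}")
```

Because both component models are proper distributions over the same vocabulary, the mixture is one as well, so it can be used wherever a single model is expected.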
Thesis Project by: Ryota Miura
Advisor: Nigel Ward
Border Agent-based Simulation Environment (BASE)
This project is developing a single-cell automata simulation tool for
modeling the complex phenomena of political borders. The objective of
the modeling is not so much prediction as understanding the nature of
border phenomena. The questions being modeled and explored in our
Border Agent-based Simulation Environment (BASE) begin with migration
stimulated by material push/pull factors, namely, availability of food,
and extend to the effects of the border, such as rates of filtering,
mortality, congestion, asymmetries (resources, fertility, disease)
on each side of the border, acculturation, the effects of communication,
resulting patterns of interaction, including inter-breeding, among agents
of different backgrounds, ownership, importation and theft of resources,
and secondary borders, such as highway checkpoints. The results of the
research will have value both for policy-makers facing border issues
and for computer scientists striving to make simulations more
realistic.
Guillermo Enriquez (Advisors: David Novick, Jon Amastae)
Some Practical Issues and Research Priorities in Dialog Management
Today's spoken dialog systems fall short of the human-to-human ideal.
Is this due to the fact that developers do not always use the state
of the art in the development of their systems or is it because the
technologies necessary to improve the spoken interactions do not yet
exist? To answer this question we developed a credit card system using
state-of-the-art technology. We conducted experiments in which the
subjects were to complete three tasks both using the system and
speaking to a human operator. We were able to observe and classify
the differences between the two dialogs. We expect that these
observations will provide us with a deeper insight into what would
be necessary in order to achieve the ideal interaction.
Anais Rivera (Advisor: Nigel Ward)
Adaptive Speaking Rate in a Tutorial Spoken Dialog System
Today there are many tutoring systems that can train users in memory
games, and there are also systems that can adapt their speed to the
user's needs. Our aim is to build a system that can do both:
a tutor that adapts its speaking rate to suit the needs of the user, so
that no time is lost to prompts that are too slow and no statements
need to be repeated because prompts are too fast.
We further plan to link such a system to external processes
for prosody-based, highly responsive behavior to improve turn
taking, emulating human-human interaction.
Kumar Mamidipally (Advisor: Nigel Ward)
Non-Lexical Utterances
There are many sounds in conversation that are not words, such as
u-uh, uh-uhmmm, um-um-hu-mh, ahh, aum, and haah. These sounds play
important roles in human-human communication, especially in the
management of information status, interpersonal affect, and channel
control, but are not used in spoken dialog systems today. We are
exploring the phonetics and pragmatics of such sounds in English,
Japanese, and Spanish. So far we have concluded, based on both corpus
and experimental evidence, that these items are in part compositional,
in that each component sound brings a corresponding component of meaning.
We are now extending the empirical work and exploring potential
applications.
PI: Dr. Nigel Ward
An Acceptance Estimator for Computer Science Graduate Admissions
Potential applicants to graduate school find it difficult to predict,
even approximately, which schools will accept them. We have created
a predictive model of admissions decision-making, based on analysis
of a database of past decisions. Interesting aspects of the model
include the way that weights are assigned dynamically to various
factors based on the informativeness of each factor and based on the
applicant's relative strengths on each factor. This model has been
packaged in the form of a Web page that enables a student to enter
his or her information and see a list of schools where he or she is
likely to be accepted. We are currently working to improve the
accuracy of the model and make it more generally useful.
PI: Dr. Nigel Ward