Technical Report UTEP-CS-TR-19-28.
Abstract:
While many aspects of speech processing, including speech recognition
and speech synthesis, have seen enormous advances over the past few
years, advances in dialog have been more modest. This difference is
largely attributable to the lack of resources that can support machine
learning of dialog models and dialog phenomena.
The research community accordingly needs a corpus of spoken dialogs
annotated for interaction quality every 100 milliseconds or so. We envisage a
large and diverse collection: on the order of fifty hours of data,
representing hundreds of speakers and many genres, with every instant
labeled for interaction quality by one or more human judges. To make
it maximally useful, its design will be a community effort.
This technical report is an edited version of a National Science
Foundation proposal, submitted to the CISE Community
Research Infrastructure Program in February 2019.