Department of Computer Science, University of Texas at El Paso
Abstract: Two salient characteristics of spoken dialogs, in contrast to written texts, are that they are processes in time and that they are co-constructed by the interlocutors. Most current corpus-based methods for analyzing dialog phenomena, however, abstract away from these characteristics. This paper introduces a new corpus-based analysis method, temporal distributional analysis, which can reveal these aspects of dialog. Given a word of interest, the method identifies which words tend to co-occur with it at specific temporal offsets. This can be done not only for words produced by the same speaker but also for the interlocutor's words. The paper explains the method, presents several ways to visualize the results, illustrates what can be found about the words I, uh, and uh-huh, compares the method with non-temporal distributional analysis, and discusses potential applications to speech recognition, generation, and synthesis.
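As a rough illustration of the idea in the abstract, the sketch below counts which words occur at given time offsets from a word of interest, separately for the same speaker and for the interlocutor. It is a minimal sketch, not the paper's implementation: the token format `(speaker, word, time_in_seconds)`, the function names, the one-second bins, the three-second window, and the ratio statistic are all assumptions made for this example, and the tiny dialog is invented toy data.

```python
import math
from collections import Counter, defaultdict


def temporal_cooccurrence(tokens, target, bin_width=1.0, max_offset=3.0):
    """Count words by time offset from each occurrence of `target`.

    tokens: list of (speaker, word, time_in_seconds) tuples (assumed format).
    Returns a dict mapping (who, bin_index) to a Counter of words, where
    who is "same" (same speaker as the target word) or "other" (the
    interlocutor) and bin_index = floor(offset / bin_width).
    """
    counts = defaultdict(Counter)
    for i, (spk_t, word_t, time_t) in enumerate(tokens):
        if word_t != target:
            continue
        for j, (spk, word, time) in enumerate(tokens):
            if j == i:
                continue  # skip the target occurrence itself
            offset = time - time_t
            if offset < -max_offset or offset >= max_offset:
                continue  # outside the analysis window
            who = "same" if spk == spk_t else "other"
            counts[(who, math.floor(offset / bin_width))][word] += 1
    return counts


def bucket_ratio(counts, tokens, who, bin_index, word):
    """Share of `word` in one offset bucket divided by its overall share.

    Values above 1 suggest the word is more common at that offset than
    in the data as a whole (an assumed statistic for this sketch).
    """
    bucket = counts.get((who, bin_index), Counter())
    bucket_total = sum(bucket.values())
    overall = Counter(w for _, w, _ in tokens)
    if bucket_total == 0 or overall[word] == 0:
        return 0.0
    return (bucket[word] / bucket_total) / (overall[word] / len(tokens))


# Invented toy dialog between speakers A and B.
tokens = [
    ("A", "i", 0.0), ("A", "think", 0.4), ("B", "uh-huh", 1.2),
    ("A", "i", 2.0), ("A", "mean", 2.3), ("B", "uh-huh", 3.1),
]
counts = temporal_cooccurrence(tokens, "i")
ratio = bucket_ratio(counts, tokens, "other", 1, "uh-huh")
```

In this toy data, the bucket `("other", 1)` collects the interlocutor's words that start between one and two seconds after an "i", so `ratio` indicates how much more frequent "uh-huh" is at that offset than overall. A real analysis would need a time-aligned corpus and enough data per bucket for the ratios to be meaningful.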
Full Paper (pdf)
Nigel Ward's Publications