Track 4 – 09:30-13:00
Duration: 3 hours, plus a 30-minute coffee break
Location: Conference 6
Presenters: Nigel G. Ward (1), Gabriel Skantze (2)
1) University of Texas at El Paso, USA
2) KTH, Sweden
Target Audience: Interspeech participants wanting an overview of current models, techniques, trends, and issues in dialog modeling; and especially those wanting to use dialog knowledge to improve the performance of their systems.
Description
Models of dialog have traditionally been designed mostly to support dialog systems, but dialog knowledge is now being used more broadly. Newer applications are leading to new models that are more accurate and more robust than traditional finite-state models.
This tutorial will cover “what every speech researcher should know about dialog”. Participants will learn
In relation to the conference theme of “Speech beyond Speech”, we will consider not only the lexical and prosodic aspects of dialog but also gesture, gaze, action, and motion.
Participants are encouraged to bring a laptop if convenient, as one of the exercises will involve small-group analysis of provided audio clips.
Outline
A) Basic Notions in their Historical Contexts [45 minutes]
A1 Dialog Modeling for Telecommunications
A2 Engendering Rapport through Dialog
A3 Dialog Modeling for Information Retrieval
A4 Information Delivery
B) Philosophical Interlude [10 minutes]
C) Traditional Models of Dialog [10 minutes]
D) Empirical Interlude [15 minutes]
Given fifteen seconds each from three casual dialogs:
E) Dialog Systems: Basics [10 minutes]
Illustrations with IrisTK and VoiceXML 2.1
F) Dialog and Speech Recognition, Understanding and Synthesis [15 minutes]
F1 Speech Recognition
F2 Language Understanding in Dialog
F3 Speech Synthesis for Dialog Applications
G) Exercise: Dialog Design [15 minutes]
H) Dialog Systems: The Research Forefront [40 minutes]
I) Opportunities and Challenges [20 minutes]
References
[1] Raveesh Meena, Gabriel Skantze, and Joakim Gustafson. Data-driven models for timing feedback responses in a Map Task dialogue system. Computer Speech & Language, 28:903–922, 2014.
[2] David Schlangen and Gabriel Skantze. A general, abstract model of incremental dialogue processing. In Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics, pages 710–718, 2009.
[3] Gabriel Skantze. Exploring human error recovery strategies: Implications for spoken dialogue systems. Speech Communication, 45:325–341, 2005.
[4] Gabriel Skantze and S. Al Moubayed. IrisTK: a statechart-based toolkit for multi-party face-to-face interaction. In ICMI, 2012.
[5] Gabriel Skantze and A. Hjalmarsson. Towards incremental speech generation in conversational systems. Computer Speech & Language, 27:243–262, 2013.
[6] Gabriel Skantze, Catharine Oertel, and Anna Hjalmarsson. User feedback in human-robot dialogue: Task progression and uncertainty. In HRI Workshop on Timing in Human-Robot Interaction, 2014.
[7] Nigel G. Ward and David DeVault. Ten challenges in highly-interactive dialog systems. In AAAI Symposium on Turn-taking and Coordination in Human-Machine Interaction, 2015.
[8] Nigel G. Ward and Karen A. Richart-Ruiz. Patterns of importance variation in spoken dialog. In 14th SigDial, 2013.
[9] Nigel G. Ward, Anais G. Rivera, Karen Ward, and David G. Novick. Root causes of lost time and user stress in a simple dialog system. In Interspeech, 2005.
[10] Nigel G. Ward and Alejandro Vega. A bottom-up exploration of the dimensions of dialog state in spoken interaction. In 13th Annual SIGdial Meeting on Discourse and Dialogue, 2012.
[11] Nigel G. Ward, Alejandro Vega, and Timo Baumann. Prosodic and temporal features for language modeling for dialog. Speech Communication, 54:161–174, 2011.
[12] Nigel G. Ward, Steven D. Werner, Fernando Garcia, and Emilio Sanchis. A prosody-based vector-space model of dialog activity for information retrieval. Speech Communication, 68:86–96, 2015.