Track 4 – 09:30-13:00

Duration: 3 hours, plus a 30-minute coffee break

Location: Conference 6

Presenters: Nigel G. Ward¹, Gabriel Skantze²

1) University of Texas at El Paso, USA

2) KTH, Sweden

Target Audience: Interspeech participants wanting an overview of current models, techniques, trends, and issues in dialog modeling, and especially those wanting to use dialog knowledge to improve the performance of their systems.

Description

Models of dialog have traditionally been designed mostly to support dialog systems, but dialog knowledge is now being used more broadly. Newer applications are leading to new models that are more accurate and more robust than traditional finite-state models.

This tutorial will cover “what every speech researcher should know about dialog”. Participants will learn:

models for representing and applying dialog knowledge
ways dialog knowledge is used in a wide variety of applications
tools, resources, opportunities and open issues

In relation to the conference theme of “Speech beyond Speech”, we will consider not only the lexical and prosodic aspects of dialog but also gesture, gaze, action, and motion.

Participants are encouraged to bring a laptop if convenient, as one of the exercises will involve small-group analysis of provided audio clips.

Outline

A) Basic Notions in their Historical Contexts [45 minutes]

A1 Dialog Modeling for Telecommunications

concepts: turns, talkspurts, envelope information
issues: voice activity detection, effects of delay, turn-taking signals, talk and near-talk, joint modeling versus coupled models

A2 Engendering Rapport through Dialog

concepts: contingency, adaptation
issues: incremental processing, multifunctional utterances, action selection and coordination, integrating authored and learned behaviors

A3 Dialog Modeling for Information Retrieval

concepts: dialog activities, dialog genres, continuous vector-space modeling

A4 Information Delivery

concepts: human cognitive capacity, attention variation over time, pacing, feedback

B) Philosophical Interlude [10 minutes]

why is dialog necessary? the limited bandwidth of speech, the ephemerality of sound, uncertainty and vagueness, grounding issues, human working-memory size limitations, meta-communication, interpersonal relations
when is dialog not necessary? voice-command systems, mobile personal assistants
roles of dialog models: mediating signal-to-interpretation mappings, supporting predictions, enabling action selection, coordinating behavior streams

C) Traditional Models of Dialog [10 minutes]

concepts: states, turns, dialog acts; lexical and prosodic features
issues: dilemmas in discretizing time, state and acts
alternative conceptions: plan-based, rhetorical-structure, and information-state models

D) Empirical Interlude [15 minutes]

Given fifteen-second excerpts from each of three casual dialogs:

identify the state at the end of each turn
identify what the state predicts about the upcoming behavior of each speaker
identify similar states across the dialogs, name them, discuss similarities and differences
now do the same for a sampling of ten within-turn timepoints
discuss with respect to previously-examined applications, models and issues

E) Dialog Systems: Basics [10 minutes]

Illustrations with IrisTK and VoiceXML 2.1

basic functions: call flow, choices, input fields, forms
other functionality: error handling, turn-taking, timeouts, universals, confidence
issues: separating dialog management from domain knowledge and task knowledge, separating interface and backend, initiative
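The form-filling style of dialog management behind the basic functions listed above (forms, input fields, choices, error handling) can be sketched as follows. This is a minimal Python sketch, not IrisTK or VoiceXML code: the names Form, Field, run_form and choice are hypothetical, a scripted list of strings stands in for recognizer output, and the field-selection rule only loosely mirrors VoiceXML's form interpretation, which visits the first field that is still unfilled.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Field:
    # Hypothetical input field: a slot name, a system prompt, and a parser.
    name: str
    prompt: str
    # Returns a normalized value, or None if the input does not match (a "nomatch").
    parse: Callable[[str], Optional[str]]


@dataclass
class Form:
    fields: List[Field]
    values: Dict[str, str] = field(default_factory=dict)

    def next_field(self) -> Optional[Field]:
        # Select the first field, in order, that has not been filled yet.
        for f in self.fields:
            if f.name not in self.values:
                return f
        return None


def choice(options):
    # Build a parser that accepts only one of a fixed set of choices.
    def parse(text: str) -> Optional[str]:
        text = text.strip().lower()
        return text if text in options else None
    return parse


def run_form(form: Form, user_inputs: List[str], max_tries: int = 2) -> Dict[str, str]:
    """Fill the form from scripted user inputs (a stand-in for ASR output)."""
    inputs = iter(user_inputs)
    tries = 0  # consecutive nomatch count for the current field
    while (f := form.next_field()) is not None:
        print("SYSTEM:", f.prompt)
        utterance = next(inputs, None)
        if utterance is None:  # treat running out of scripted input like a timeout
            break
        value = f.parse(utterance)
        if value is None:
            tries += 1
            if tries >= max_tries:
                print("SYSTEM: Sorry, let's stop here.")
                break
            print("SYSTEM: Sorry, I didn't understand.")
            continue  # re-prompt for the same field
        form.values[f.name] = value
        tries = 0
    return form.values


# Usage: the first answer is rejected, then both fields are filled.
pizza = Form([
    Field("size", "What size pizza?", choice({"small", "medium", "large"})),
    Field("crust", "Thin or thick crust?", choice({"thin", "thick"})),
])
filled = run_form(pizza, ["huge", "Large", "thin"])
print(filled)
```

Keeping the field definitions (domain knowledge) apart from run_form (dialog control) is one simple way to separate dialog management from domain and task knowledge; real systems add confidence thresholds, confirmation, and mixed initiative on top of this basic loop.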

F) Dialog and Speech Recognition, Understanding and Synthesis [15 minutes]

F1 Speech Recognition

language model conditioning: on prompt, previous dialog act, expected dialog act, slot type, dialog state, topic
speech recognition for dialog systems: authored and learned grammars

F2 Language Understanding in Dialog

issues: ambiguity and ellipsis; noise, fillers and disfluencies

F3 Speech Synthesis for Dialog Applications

conditioning on: dialog acts, emotional state, local context, timing
output-timing monitoring and uptake monitoring

G) Exercise: Dialog Design [15 minutes]

Author a small dialog model
Test it with an untrained “user”
Discussion

issues: representational convenience versus power, realistic and unrealistic expectations of user behavior, persona design

H) Dialog Systems: The Research Forefront [40 minutes]

genres: autonomous robots, collaborative agents, tutorial systems
issues: joint action, multimodal perception, multimodal synthesis, situated dialog, incremental processing, tighter semantic integration, multiparty interaction, integrating reactive and task-oriented behaviors

I) Opportunities and Challenges [20 minutes]

other applications: predicting dialog outcomes, role inference, personality detection, speaker state detection, analytics, clinical diagnosis, training
unsupervised learning of: policies, dialog acts, dialog moves, strategies, structure, turn-taking
other challenges: modeling variation in interaction style, composable behaviors
resources for research: organizations, software, shared tasks


Interspeech 2015 Tutorial on Dialog Models and Dialog Phenomena