Interspeech 2022 Special Session on

Speaking Styles and Interaction Styles

Motivation

Many core speech technologies are now highly tuned and rival or exceed human performance. However, the user experience is often lacking, in part because speech synthesizers, speech recognizers, dialog managers, and other components are still inflexible in their expectations of human behavior. Moreover, considerations of style are becoming more important as we seek to deploy variations of one basic system across domains and genres. Thus more attention to variation in interaction style and speaking style may help us improve the value of these systems for users.

Recently, work in these areas has been advancing rapidly. We now know more about many types of style, including expressive styles, styles relating to social roles, and styles relating to stance and interpersonal dynamics. There are now many techniques for style in speech synthesis and voice conversion, and for style transfer in text and speech. The connections of style to genre, priming, accommodation/entrainment, lexical choice, social identity, dialog activity modeling, and next dialog move selection have been increasingly examined.

However, work on style has been fragmented. There has been little communication between those working on style as it relates to various core technologies, various phenomena, various application needs, and various linguistic questions.

Accordingly, the aim of this interdisciplinary special session is to bring together researchers who work on different aspects of style for spoken dialog, who work on different tasks, or who approach style from different perspectives. We expect this will foster cross-fertilization, broader understanding of the goals and challenges, and identification of shared priorities for further research.

Format

We envisage two oral sessions, with the last hour of the second session devoted to a panel discussion, plus one poster session.

Call for Papers

We welcome submissions on all aspects of style in speech and interaction. All papers must have an empirical component and must make a significant contribution to speech science or speech technology. We further encourage authors to elaborate on how their work contributes or relates to "the big picture" of style in spoken dialog. We also welcome papers whose motivations, contributions, or implications relate to more than one traditional Interspeech area, or which highlight issues not commonly addressed at Interspeech.

Questions of interest include:

How can we relate defined/explainable/controllable styles and learned/latent styles?

What aspects of style matter most for spoken language systems and for other applications?

What general principles and modeling techniques underlie style transfer across speech synthesis, language generation, and interactive behaviors?

How do the dimensions of style differ across domains, genres, and languages?

Can we approach a universal style dictionary or a universal space of styles?

What commonalities are there among expressive styles, social styles, and interpersonal styles?

How do styles relate to personality, gender, social identity, and dialect?

How can style modeling support influence detection and other classification tasks?

Which of the observations and aspirations of classic work on style (by Tannen, Biber, Pennebaker, and others) are not yet treated by computational models, and how can we bridge the gaps?

To what extent are style inventories and methods developed for monologue data sufficient to handle styles as they occur in dialog?

To what extent are findings based on controlled data relevant to style in the wild?

What range of interaction styles do users consider acceptable for robots and other interactive systems?

How does style variation operate across different timescales, from the utterance level, through transient stances and interpersonal attitudes, to the dialog and installation levels?

How do style choices serve both the overall goals of the interaction and the local goals and dialog activities?

How do less-studied aspects, such as voice quality, phonetic variation, turn timing, and gestures, contribute to the realization of styles?

Which aspects of style variation are most basic, perceptually or developmentally?

How can models of styles and their realizations in one domain transfer to other domains?

How can we model human perceptions and misperceptions of style?

How can style matching contribute to user acceptance, decrease cognitive load, or improve rapport?

How can knowledge of style differences improve the teaching, learning, and assessment of language and communication skills?

Submission

Please use the standard Interspeech style and format, and submit papers through the standard Interspeech process.

Organizers

Nigel Ward, University of Texas at El Paso

Kallirroi Georgila, University of Southern California

Scientific Committee

Yang Gao, Amazon AWS AI

Mark Hasegawa-Johnson, University of Illinois

Koji Inoue, Kyoto University

Simon King, University of Edinburgh

Rivka Levitan, City University of New York

Katherine Metcalf, Apple

Eva Szekely, KTH Royal Institute of Technology

Pol van Rijn, Max Planck Institute for Empirical Aesthetics

Rafael Valle, NVIDIA