Slides, as of July 27 (45MB zip)
Prosody is essential in human interaction, enabling people to show interest, establish rapport, efficiently convey nuances of attitude or intent, and so on. Prosody is relevant to every area of speech science, but our current understanding of prosody is fragmentary. In contrast to some areas of speech technology, where superhuman performance has been demonstrated on core tasks, models and techniques for handling prosody have lagged. This survey will give non-specialists the knowledge needed to decide whether and how to integrate prosodic information into their models and systems. It will give an overview of the different ways in which prosody serves paralinguistic, phonological and pragmatic functions; discuss the roles of prosody in applications including speech recognition, speech synthesis, dialog systems, information retrieval and the inference of speaker states and traits; and present current trends, including modeling prosody beyond just intonation, representing prosodic knowledge with constructions of multiple prosodic features in specific temporal configurations, modeling observed prosody as the result of the superposition of patterns representing independent intents, modeling multispeaker phenomena, and the use of sequence-to-sequence methods and unsupervised methods. Finally, we will consider remaining challenges in research and applications.
Short Bibliography (now replaced by a new version)
Prosody, broadly defined as the aspects of spoken utterances that are not governed by segmental contrasts, is challenging to analyze because it operates close to the limits of conscious introspection, and because most spoken utterances involve multiple prosodic dimensions simultaneously conveying multiple meanings or serving multiple communicative functions. This course will help participants learn to identify, discover, and describe meaningful prosodic properties and patterns in spoken utterances.
The approach will be theory-neutral and descriptively eclectic. The focus will be on primary observation, preliminary analysis, and ideation rather than hypothesis testing based on pre-existing theories. The course will include lectures, ear and production training exercises, discussions of readings, qualitative and quantitative analysis with Praat and other tools, hands-on analysis of provided and contributed data, and the development and presentation of student research proposals. The course is designed to be broadly accessible, with knowledge of phonetics not required. Case studies will, depending on student interests, include sociolinguistic differences in the production and perception of prosodic forms, the mapping between prosody and other layers of linguistic and communicative organization (e.g. syntax, discourse, conversational turn-taking), cross-language comparisons, cross-cultural issues, and the prosody of non-native speakers.
Prosody is of wide cross-cutting interest, and this course will highlight its relevance to topics beyond phonetics: grammar, discourse analysis, pragmatics, nonverbal communication, and other areas. The course relates to the theme of Language in the Digital Era by introducing participants to software tools and computational methods for analyzing prosody. There will also be digressions on ways to model prosody for applications including emotion detection, personality inference, detection of medical conditions, speech synthesis, information retrieval, engendering rapport in virtual agents, and so on.
Upon successful completion of this course, students will be able to: