USC-UTEP Workshop on Predictive Models of
Human Communication Dynamics

Breakout Session Topics

The following topics were proposed. Those actually discussed are marked with *.

* Topic 1: Can we develop a general-purpose evaluation metric for predictive models of communication dynamics?

Background: There are many ways to evaluate predictors. For yes/no decisions (e.g. whether to back-channel), accuracy and precision are often used. For endpointing, specific penalties for dead-time and overlaps are common. For predicting the upcoming word, perplexity is often used. For other tasks, such as predicting the upcoming pitch height, loudness, or speaking rate, there are no common metrics at all. Issues include such basic questions as predicting events vs. predicting states (e.g. mutual gaze as an action or a situation); predicting all the time vs. only at "important" decision points (e.g. potential TRPs); and absolute vs. graded predictions (e.g. weakly predicting a back-channel in the next 500 ms).

Activity: Discussion.

Desired Outcomes: A listing of evaluation methods and when each is most appropriate. Decision as to whether and how to find or develop a standard evaluation metric or method.
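As a starting point for such a listing, the metrics named in the Background for Topic 1 can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the linear form of the endpointing penalty, and its default weights are assumptions made for the example, not a proposed standard.

```python
import math


def accuracy(predicted, actual):
    """Fraction of yes/no decisions that match the reference labels."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(bool(p) == bool(a) for p, a in zip(predicted, actual)) / len(actual)


def precision(predicted, actual):
    """Of the positive predictions (e.g. 'back-channel now'), the fraction that were correct."""
    true_positives = sum(1 for p, a in zip(predicted, actual) if p and a)
    predicted_positives = sum(1 for p in predicted if p)
    # Convention assumed here: precision is 0.0 when nothing was predicted.
    return true_positives / predicted_positives if predicted_positives else 0.0


def perplexity(word_probabilities):
    """Perplexity from the probability the model gave to each word that actually occurred."""
    if not word_probabilities:
        raise ValueError("need at least one probability")
    mean_log2 = sum(math.log2(p) for p in word_probabilities) / len(word_probabilities)
    return 2 ** (-mean_log2)


def endpointing_cost(predicted_end_s, actual_end_s,
                     dead_time_weight=1.0, overlap_weight=2.0):
    """Hypothetical linear penalty: late predictions cost dead time, early ones cost overlap."""
    delay = predicted_end_s - actual_end_s
    if delay >= 0:
        return dead_time_weight * delay   # system waited too long
    return overlap_weight * (-delay)      # system cut in before the speaker finished


if __name__ == "__main__":
    predicted = [1, 0, 1, 1]
    actual = [1, 0, 0, 1]
    print("accuracy:", accuracy(predicted, actual))
    print("precision:", precision(predicted, actual))
    print("perplexity:", perplexity([0.25, 0.25, 0.25, 0.25]))
    print("endpointing cost:", endpointing_cost(1.5, 1.0))
```

Note that each function takes a different kind of input (binary decisions, word probabilities, time stamps), which illustrates why a single general-purpose metric is not obvious.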

Topic 2: How can learners and teachers of language and communication be led to appreciate the importance of communication dynamics?

Background: We assume that good communication dynamics is important in everyday life, and that the difficulty of adapting to the dynamics of new languages and cultures is a problem for many people. What are the non-verbal response patterns that are most important to know? How can we make people more aware of the need to learn these? How can we make it easier for teachers to teach them?

Activity: Discussion, possibly with reference to dialog data.

Desired Outcomes: Ideas for how to raise awareness of the importance of culture-appropriate interactional dynamics, including both scholarly and mass-media approaches. Ideas for how to facilitate the learning and teaching of these patterns.

* Topic 3: Where do predictive models fit in the overall cognitive architecture of the speaker/hearer?

Background: Participating in dialog requires understanding what the other person is saying, what they mean, and what their aims are; thinking of how to respond, recalling and marshalling relevant information; deciding when to talk, what to say, and how to say it; and constantly monitoring the allocation of cognitive effort, the production processes, and the reception by the interlocutor. How do all these processes interact, and where does prediction fit in? Do the same answers apply when designing systems to interact with people?

Activity: Discussion, possibly supplemented by perusal of a recorded dialog.

Desired Outcomes: A diagram (maybe just an old-fashioned boxes-and-arrows diagram) of the mind of a dialog participant, showing how information must flow and how the various processes are invoked and controlled. Hypotheses, and ideas of how to test them.

* Topic 4: Can we formulate the human-communication dynamics prediction problem as a useful and compelling Grand Challenge?

Background: In this area, which problem or sub-problem is of the right centrality and scope to be a Grand Challenge? How could we formulate it so as to make its importance obvious to everyone? How would we define success? How long would it take, and what level of effort would be required? Who would cheer for us when we're done, and who would support us while we do it? Should it be tackled competitively by teams of friendly rivals, or do we need a strong coordinating committee? Or is the prediction problem best presented as part of some other grand challenge?

Activity: Discussion.

Desired Outcomes: A clear goal statement and a high-level plan of attack.

Topic 5: How can we develop appropriate sets of labels for annotating human communicative behaviors, especially when human observers perceive things differently?

Background: Quantitative studies of interactional dynamics typically start by annotating a corpus with labels. This is often frustrating, given theoretical differences (e.g. what is a "turn" anyway?), differences in labelers' perceptions (e.g. is that movement at 45.2 seconds into the dialog a nod or not?), and differences in interpretation (e.g. does a certain tone of voice indicate boredom, lack of attention, dominance, or passivity?).

Activity: Discussion, with reference to some dialog data.

Desired Outcomes: A short list of solutions or work-arounds, noting when each is appropriate. Perhaps a consensus inventory of states or descriptive scales.
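One way to make the labeler-disagreement problem in Topic 5 concrete is to measure agreement between two annotators before deciding on a work-around. The sketch below computes Cohen's kappa, a widely used chance-corrected agreement statistic. It is offered only as an example of a possible diagnostic; the label names and the function name `cohens_kappa` are illustrative assumptions, not part of any corpus described here.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators over the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(labels_a)

    # Observed agreement: fraction of items given the same label by both annotators.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Expected agreement if each annotator labeled at random
    # according to their own label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        # Both annotators used one identical label throughout; treat as full agreement.
        return 1.0
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Hypothetical labels for four candidate head movements.
    annotator_1 = ["nod", "nod", "none", "none"]
    annotator_2 = ["nod", "none", "none", "none"]
    print("kappa:", cohens_kappa(annotator_1, annotator_2))
```

A low kappa on a pilot sample would suggest revising the label set or the labeling instructions before annotating the whole corpus.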

* Topic 6: How can we decompose the enormous job of modeling human communication dynamics into subtasks or subfields?

Background: A researcher wanting to build or use a predictive model typically has to collect data, build feature-detectors, define a prediction quality metric, select or adapt a learning algorithm, and interpret the results. Could the work be organized more efficiently, reducing the need to have each lab be so vertically integrated? How can we parcel out the needed work? If a linguist friend came to us saying "I'm excited about the idea of predictive models of human communication dynamics, and would like to contribute," what would we ask him to work on? An anthropologist friend? A neuroscientist? Others?

Activity: Discussion.

Desired Outcomes: A list of relevant fields. A list of activities, organized by fields, that are individually interesting and together provide all the needed tools and components for building predictive models.

Topic 7: How should the study of human communication dynamics relate to existing research communities, or should we create a new society or organization?

Background: Work in human communication dynamics is published in all sorts of journals and presented in all sorts of venues. The study of human communication dynamics is marginal from the point of view of every academic field, hurting visibility and making it hard for students to see it as important. Perhaps synergy would be improved if we could identify a good home, or establish one.

Activity: Discussion.

Desired Outcomes: A list of the relevant societies and communities, and how we can relate to them, either as individual researchers or as a new research group.

Topic 8: How good are people at predicting ahead in human interactions?

Background: A premise of this workshop is that people interacting are able to predict what will happen next. How true is this?

Activity: Play a recorded dialog up to some random point and stop it. Have all participants write down what they think will happen next: which word, what timing, what tone of voice, (if video) what gestures, etc. Play ahead and see whether the guesses were correct. Discuss. Repeat. Now try it when the people predicting are not observers but participants: set a timer to interrupt the discussion semi-randomly, and at the point when it beeps, have everyone write down what they were about to say or do and what they expected the other participants to say or do, and then compare. Introspection on the mental processes involved may also be useful.

Desired Outcomes: Thoughts about the types of knowledge used by humans, thoughts about the nature of this ability to predict, and thoughts about how to study this.
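The stopping points for the Topic 8 activity could be drawn ahead of time so that nobody in the room chooses them. The following is a minimal sketch under stated assumptions: the function name `sample_stop_points`, the 30-second minimum spacing, and the 10-minute dialog length are all hypothetical values chosen for illustration.

```python
import random


def sample_stop_points(dialog_length_s, n_points, min_gap_s=30.0, seed=None):
    """Draw sorted random stop times, at least min_gap_s apart and after the start."""
    free_time = dialog_length_s - n_points * min_gap_s
    if n_points < 1 or free_time < 0:
        raise ValueError("dialog too short for that many stop points")
    rng = random.Random(seed)

    # Sample offsets in the time left over once the mandatory gaps are reserved,
    # then re-insert one gap before each point. This keeps consecutive points
    # at least min_gap_s apart and leaves min_gap_s of context before the first stop.
    offsets = sorted(rng.uniform(0.0, free_time) for _ in range(n_points))
    return [offset + (i + 1) * min_gap_s for i, offset in enumerate(offsets)]


if __name__ == "__main__":
    # Five stop points in a hypothetical 10-minute recording.
    for t in sample_stop_points(600.0, 5, seed=0):
        print(f"stop at {t:6.1f} s")
```

The same list of times could drive the timer in the participant version of the exercise.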

Topic A: What are we trying to predict?

Topic B: What's the role of consciousness in the various models being discussed?

Topic C: What shared (annotated?) data sets should we develop? Should there be a repository?

* Topic D: What new tools do we need?

Topic E: (How) can we develop models where the value function is separate from the algorithm?

Topic F: What are good computational strategies for predictive model dynamics? (How) can we collaborate on this aspect?

Topic G: What's the path to analyzing human communication dynamics over larger time scales and/or larger group sizes?

* Topic H: How can we combine linguistic and non-linguistic information?

Topic I: What's the common ground between the study of intentional communicative acts and the study of behaviors in communication à la Pentland?

Available Data

Gently Persuasive Dialogs: This corpus consists of 10 dialogs between students and the program coordinator, a department staff member whom we had noticed to be unusually personable and pleasant to talk to, an exemplar of effective dialog behaviors. Her job functions included giving career advice to undergraduates and helping to grow the graduate program, so it was natural to ask her to talk to undergraduates about graduate school. We brought in 10 students to talk with her, compensating them with credit for one of the assignments in their Introduction to Computer Science class. The students had little knowledge of the nature or value of graduate school or of the application process. The conversations lasted 9-20 minutes.

Mock Billing Dialogs: The subjects were 20 lower-division Computer Science students, of whom 11 were native speakers of English, all with little or no experience using spoken dialog systems. For each interaction, subjects were given a mock credit-card statement, a mock bank statement, and a brief checklist of three tasks to complete. They were also instructed verbally regarding the tasks, which were to obtain balance information, to review the most recent transactions, and to make a payment. Instructions were kept simple so that subjects would know what they needed to accomplish but not how. In the system-based interactions the subjects were informed that they would be using a spoken dialog system and that they should speak to it as they would to a person. The interactions with a human operator were constrained to be roughly comparable by showing the operator the system's prompts and dialog flow and asking her to use mostly

Tutoring Dialogs: These are "memory quiz tasks", representative of the type of review a student might do with a tutor if the aim is to memorize the times table, or the abbreviations of the chemical elements, or the codes to use for kinds of merchandise. We chose memory recall quizzes because they exhibit rich variation in acknowledgment use and a swift interplay between student and tutor, but are otherwise semantically and pragmatically quite simple. The quiz tasks included the countries of South America, the El Paso exits of Interstate 10, and the colleges of UTEP.

Free Dialog: Each conversation had two participants, generally of the conversations, the participants were seated in such a way as to prevent them from seeing each other. Thus these conversations were similar to telephone talk.

Direction-Giving: Students came to the lab and were instructed to ask the other person for directions to someplace they knew.