Disappointment Project Annotation Guide: Watergirl Version

Version 1.2, June 28, 2021, Nigel G. Ward, Jonathan E. Avila


=== Purpose ===

We would like a method to automatically detect disappointment in dialog, at the
frame level.  If we can do this, we can write a paper. Our first practical
application will be to combine this with simple hand-crafted aggregation
methods, such as counting the fraction of frames that seem disappointed, to
solve our sponsor's problem.
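The aggregation step mentioned above can be sketched minimally in Python. The code counts the fraction of frames whose disappointment score reaches a threshold. The per-frame scores, the function name `disappointed_fraction`, and the 0.5 threshold are all hypothetical. This guide does not specify the detector or the decision rule.

```python
from typing import Sequence


def disappointed_fraction(frame_scores: Sequence[float], threshold: float = 0.5) -> float:
    """Return the fraction of frames whose disappointment score meets the threshold.

    frame_scores: per-frame scores in [0, 1] from a hypothetical frame-level detector.
    threshold: hypothetical decision threshold for calling a frame disappointed.
    """
    if len(frame_scores) == 0:
        return 0.0  # an empty dialog has no disappointed frames
    hits = sum(1 for score in frame_scores if score >= threshold)
    return hits / len(frame_scores)


# Usage with made-up scores: 3 of 5 frames reach the threshold.
print(disappointed_fraction([0.1, 0.7, 0.9, 0.2, 0.6]))  # 0.6
```

A dialog-level decision could then compare this fraction against a cutoff tuned on the dialog-level labels. That cutoff is also an assumption.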

We want training data to overcome the problem that dialog-level labels alone
are insufficient for supervised learning. In particular, we want to identify
regions that are likely to be informative and worth training on. Exhaustively
labeling everything, or striving to categorize borderline cases accurately, is
of no interest to us.

We also want data to use for testing purposes.  Ideally this should be labeled
consistently with the training data, but for evaluation purposes slight
differences should not matter much.

Our current modeling strategies all assume that there are basically two kinds of
utterances: disappointed ones and everything else, with everything else
dominating.

Some dialogs also appear to include strong markers: super-disappointed
utterances and super-pleased utterances.

=== Definition ===

For current purposes, the word "disappointment" may include feelings, stances,
and activities like annoyance, indignation, anger, pleading, remonstrance,
frustration, dismay, sadness, being upset, disengagement, exasperation, general
negative sentiment, and so on.

We are more interested in the more informative disappointments. Minor
disappointments (for example, a false start and self-correction) should
generally also be marked as disappointment, but this is less critical.

Disappointment comes in degrees, but for now we are not interested in the
subtleties.  If the speaker sounds clearly disappointed, even if not strongly,
mark it.  If there is only a slight nuance of disappointment in their voice,
there is no need to mark it.

You can base your decisions on the tone of voice, the words they use, the timing
of what they say, and on the broader context, including what the other player
just said, and the way the interlocutor reacts to their utterance.

It is normal for different annotators to have different thresholds and criteria.
High agreement is neither needed nor expected.  We may attempt to predict the
number of annotators who label a segment d or dd, so variety can be welcome.
Agreement here may be measured by a variant of kappa, (achieved agreement -
random agreement) / (perfect agreement - random agreement), computed over all
frames that at least one annotator considers to be p, d, or dd.
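The kappa variant above can be sketched in Python with the following assumptions: two annotators, one label per frame (None for unlabeled frames), and Cohen-style random agreement estimated from each annotator's label distribution over the selected frames. This guide does not say how random agreement is computed. That choice is an assumption, and so is the function name `frame_kappa`.

```python
from collections import Counter
from typing import Optional, Sequence

Label = Optional[str]  # None stands for "no label" (neutral/normal)


def frame_kappa(a: Sequence[Label], b: Sequence[Label],
                marked: frozenset = frozenset({"p", "d", "dd"})) -> float:
    """Kappa-style agreement between two annotators over frames that either one marked."""
    if len(a) != len(b):
        raise ValueError("both annotators must label the same frames")
    # Keep only frames that at least one annotator labeled with a marked category.
    pairs = [(x, y) for x, y in zip(a, b) if x in marked or y in marked]
    n = len(pairs)
    if n == 0:
        raise ValueError("no frames were marked by either annotator")
    achieved = sum(x == y for x, y in pairs) / n
    # Assumed Cohen-style random agreement: the chance that independent draws
    # from each annotator's label distribution coincide.
    counts_a = Counter(x for x, _ in pairs)
    counts_b = Counter(y for _, y in pairs)
    chance = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    perfect = 1.0
    if chance == perfect:
        return float("nan")  # undefined: both annotators used one label throughout
    return (achieved - chance) / (perfect - chance)


a = ["d", "d", None, "p", None, "dd"]
b = ["d", None, None, "p", "d", "dd"]
print(round(frame_kappa(a, b), 3))  # 0.444
```

In this example, five of the six frames are marked by at least one annotator. Of those five, the annotators agree on three, so achieved agreement is 0.6, random agreement is 7/25, and kappa is 4/9. For more than two annotators, one option is to average this over all annotator pairs.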


=== Categories ===

ds: dissatisfied with self

do: dissatisfied with other person

dg: dissatisfied with game

dr: repair/correction. Use this either when one person indicates that they don't
understand (e.g. "sorry? what?") or when the other person points out a mistake
(e.g. "no, the other way.").

d: dissatisfied. Anything that should be labeled as dissatisfaction but doesn't
fit the categories above. (Rare.)

p: pleased. Use this for very pleased utterances, for example when the players
show joy at completing a level. Typical phrases include "Great" and "Yay". This
label covers both pleasure and praise.

o: outside speaker or out of character. Use this for audio that is outside the
normal course of the game, for example when testing the recording setup or
getting instructions from a third party.

no label: neutral/normal. (The default.)
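Our current models distinguish only disappointed utterances from everything else. The sketch below shows one way to turn these categories into two-class training targets. The mapping and the function name `binary_target` are illustrative assumptions and are not part of the guide. The same goes for the choice to drop `o` regions from training.

```python
from typing import Optional

# Category labels from this guide that count as disappointment of some kind.
DISAPPOINTED = frozenset({"ds", "do", "dg", "dr", "d"})


def binary_target(label: Optional[str]) -> Optional[int]:
    """Map an annotation label to a hypothetical two-class training target.

    Returns 1 for disappointment labels, 0 for "p" and unlabeled (neutral)
    regions, and None for "o", which this sketch assumes should be left out
    of training because it is outside the normal course of the game.
    """
    if label == "o":
        return None
    return 1 if label in DISAPPOINTED else 0


print([binary_target(x) for x in ["ds", "dr", "p", None, "o"]])  # [1, 1, 0, 0, None]
```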


=== Units ===

Labeling should generally be done by turns.  A turn may be as short as a single
word or as long as several sentences.  Units may be selected roughly and may
include some silence; our processing will be able to handle or ignore it. In
particular, turn-internal pauses are not something to worry about. Modest
errors in identifying the start and end of the speech regions are also
tolerable.

If the speaker's tone shifts during a turn, or even during an utterance, split
it into separate regions and label each one separately.

Regions of silence, music, small uninformative sounds, and so on should not be
labeled.
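Turn-level regions can be linked to the frame-level detection goal by expanding them into one label per frame. This sketch assumes that regions are given as (start, end, label) in seconds and that frames use a hypothetical 10 ms step. The function name `regions_to_frames` is also hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Region = Tuple[float, float, str]  # (start seconds, end seconds, label)


def regions_to_frames(regions: Sequence[Region], duration: float,
                      frame_step: float = 0.01) -> List[Optional[str]]:
    """Expand labeled regions into one label per frame; None marks unlabeled frames.

    frame_step is an assumed 10 ms hop. A frame takes a region's label when the
    frame's center falls inside [start, end); regions are assumed not to overlap.
    """
    n_frames = int(round(duration / frame_step))
    frames: List[Optional[str]] = [None] * n_frames
    for start, end, label in regions:
        for i in range(n_frames):
            center = (i + 0.5) * frame_step
            if start <= center < end:
                frames[i] = label
    return frames


print(regions_to_frames([(0.0, 0.03, "ds"), (0.05, 0.07, "p")], duration=0.1))
# ['ds', 'ds', 'ds', None, None, 'p', 'p', None, None, None]
```

With this kind of expansion, a pause inside a turn simply inherits the turn's label. A modest boundary error changes only the frames near the edge of a region. This is consistent with the tolerance for rough units described above.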


=== File Naming and File Format ===

Do the annotation in ELAN: open the .EAF file, annotate it, and save it when
done, overwriting the original.
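For downstream processing, the saved labels can be read back from the .EAF file, which is XML. The sketch below uses Python's standard `xml.etree.ElementTree`. It assumes time-aligned annotations, meaning ALIGNABLE_ANNOTATION elements whose time slots are given in milliseconds in the TIME_ORDER block. This guide does not specify tier names, so the tier name `labels` and the function name `read_eaf_regions` are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple


def read_eaf_regions(eaf_xml: str, tier_id: str) -> List[Tuple[float, float, str]]:
    """Return (start s, end s, label) for each time-aligned annotation on one tier."""
    root = ET.fromstring(eaf_xml)
    # TIME_ORDER maps time-slot ids to times in milliseconds.
    slots: Dict[str, float] = {}
    for ts in root.iter("TIME_SLOT"):
        value = ts.get("TIME_VALUE")
        if value is not None:  # unaligned slots carry no TIME_VALUE
            slots[ts.get("TIME_SLOT_ID")] = int(value) / 1000.0
    regions = []
    for tier in root.iter("TIER"):
        if tier.get("TIER_ID") != tier_id:
            continue
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            start = slots.get(ann.get("TIME_SLOT_REF1"))
            end = slots.get(ann.get("TIME_SLOT_REF2"))
            text = (ann.findtext("ANNOTATION_VALUE") or "").strip()
            if start is not None and end is not None and text:
                regions.append((start, end, text))
    return sorted(regions)


sample = """<ANNOTATION_DOCUMENT>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="1200"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="2750"/>
  </TIME_ORDER>
  <TIER TIER_ID="labels">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1" TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>ds</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>"""
print(read_eaf_regions(sample, "labels"))  # [(1.2, 2.75, 'ds')]
```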