Disappointment Project Annotation Guide: Watergirl Version
Version 1.2, June 28, 2021
Nigel G. Ward, Jonathan E. Avila

=== Purpose ===

We would like a method to automatically detect disappointment in dialog, at the frame level. If we can do this, we can write a paper. Our first practical application will be to combine this with simple hand-crafted aggregation methods, such as counting the fraction of frames that seem disappointed, to solve our sponsor's problem.

We want training data, to overcome the problem of having only dialog-level labels, which are insufficient for supervised learning. In particular, we want to identify informative regions that are worth training on. Exhaustively labeling everything, or striving to accurately categorize borderline cases, is of no interest to us.

We also want data to use for testing purposes. Ideally this should be labeled consistently with the training data, but for evaluation purposes slight differences shouldn't matter much.

Our current modeling strategies all assume that there are basically two kinds of utterances: disappointed ones, and everything else, which will dominate. Some dialogs also appear to include strong markers: super-disappointed utterances and super-pleased utterances.

=== Definition ===

For current purposes, the word "disappointment" may include feelings, stances, and activities like annoyance, indignation, anger, pleading, remonstrance, frustration, dismay, sadness, being upset, disengagement, exasperation, general negative sentiment, and so on.

We are more interested in the more informative disappointments. Minor disappointments (for example, a false start and self-correction) should generally also be marked as disappointment, but this is less critical.

Disappointment comes in degrees, but for now we are not interested in the subtleties. If the speaker sounds clearly disappointed, even if not strongly, the utterance should be marked.
If there is only a slight nuance of disappointment in the speaker's voice, there is no need to mark it.

You can base your decisions on the tone of voice, the words they use, the timing of what they say, and on the broader context, including what the other player just said, and the way the interlocutor reacts to their utterance.

It is normal for different annotators to have different thresholds and criteria. High agreement is not needed or expected. We may attempt to predict the number of annotators who label a segment d or dd, so variety can be welcome.

Incidentally, agreement may here be measured by a variant of Kappa, (achieved agreement - random agreement) / (perfect agreement - random agreement), computed over all frames that at least one annotator considers to be p, d, or dd.

=== Categories ===

ds: dissatisfied with self
do: dissatisfied with other person
dg: dissatisfied with game
dr: repair/correction. Either when one person indicates that they don't understand, e.g. "sorry? what?", or the other person points it out, e.g. "no, the other way."
d: dissatisfied. Anything that should be labeled as dissatisfaction but doesn't fit the categories above. (Rare.)
p: pleased. Use this for very pleased utterances, for example when the players show joy at completing a level. Phrases where this might be likely include "Great", "Yay". This can include pleasure and praise.
o: outside speaker or out of character. Use this for audio that is outside the normal course of the game. For example, when testing the recording setup or getting instructions from a third party.
no label: neutral/normal. (The default.)

=== Units ===

Labels should generally be applied by turn. Turns may be as short as a single word, or as long as several sentences. It is fine if the units are roughly selected, including some silence, as our processes will be able to take care of, or ignore, that silence. In particular, turn-internal pauses are not something to worry about.
Modest errors in identifying the start and end of the speech regions are also tolerable. If the speaker's tone shifts during a turn, or even during an utterance, split that into two regions and label each separately. Regions of silence, music, small uninformative sounds, and so on should not be labeled.

=== File Naming and File Format ===

Do the annotation in ELAN: open the .EAF file, add the labels, and save it once done (overwriting the original).
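
For reference, here is a minimal sketch of how saved .EAF annotations might be turned into frame-level labels and into the simple "fraction of disappointed frames" aggregate mentioned under Purpose. It is an illustration only, not our actual pipeline. Assumptions not stated elsewhere in this guide: a 10 ms frame length, a hypothetical tier name "Player1", treating ds, do, dg, dr, and d all as "disappointed", and letting a later region overwrite an earlier one where they overlap. The element and attribute names (TIME_SLOT, TIER, ALIGNABLE_ANNOTATION, ANNOTATION_VALUE) are those ELAN uses in its XML files.

```python
import xml.etree.ElementTree as ET

FRAME_MS = 10  # assumed frame length; the guide does not specify one
DISAPPOINTED = {"ds", "do", "dg", "dr", "d"}  # assumed grouping for this sketch


def read_regions(eaf_xml):
    """Return (tier_id, start_ms, end_ms, label) tuples from EAF XML text."""
    root = ET.fromstring(eaf_xml)
    # Map each time slot ID to its time in milliseconds.
    slots = {
        s.get("TIME_SLOT_ID"): int(s.get("TIME_VALUE"))
        for s in root.iter("TIME_SLOT")
        if s.get("TIME_VALUE") is not None
    }
    regions = []
    for tier in root.iter("TIER"):
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            start = slots.get(ann.get("TIME_SLOT_REF1"))
            end = slots.get(ann.get("TIME_SLOT_REF2"))
            label = (ann.findtext("ANNOTATION_VALUE") or "").strip()
            if start is None or end is None or not label:
                continue  # skip unaligned or unlabeled regions
            regions.append((tier.get("TIER_ID"), start, end, label))
    return regions


def frame_labels(regions, tier_id, total_ms):
    """Expand one tier's regions into a per-frame label list ("" = no label)."""
    n_frames = total_ms // FRAME_MS
    frames = [""] * n_frames
    for tid, start, end, label in regions:
        if tid != tier_id:
            continue
        first = start // FRAME_MS
        last = min(n_frames, -(-end // FRAME_MS))  # ceiling division
        for i in range(first, last):
            frames[i] = label
    return frames


def disappointed_fraction(frames):
    """Fraction of frames carrying any label in DISAPPOINTED."""
    if not frames:
        return 0.0
    return sum(f in DISAPPOINTED for f in frames) / len(frames)


# A tiny hand-made EAF fragment; the tier name "Player1" is hypothetical.
EXAMPLE_EAF = """<ANNOTATION_DOCUMENT>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="0"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="500"/>
    <TIME_SLOT TIME_SLOT_ID="ts3" TIME_VALUE="1200"/>
    <TIME_SLOT TIME_SLOT_ID="ts4" TIME_VALUE="1500"/>
  </TIME_ORDER>
  <TIER TIER_ID="Player1">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1" TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>ds</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a2" TIME_SLOT_REF1="ts3" TIME_SLOT_REF2="ts4">
        <ANNOTATION_VALUE>p</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>"""

if __name__ == "__main__":
    regions = read_regions(EXAMPLE_EAF)
    frames = frame_labels(regions, "Player1", total_ms=2000)
    print(disappointed_fraction(frames))  # 50 of 200 frames are "ds": 0.25
```

Because unlabeled frames stay as the empty string, silence and roughly placed region boundaries simply fall into the neutral default, which matches the tolerance for imprecise units described above.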