Labelers Guide, Draft 1, May 22, 2012, Nigel Ward and Karen Richart

Welcome to ISG. Our research project aims to reveal the details of how people in dialog interact, and in particular the moment-by-moment variation in the significance of what they are saying. Sometimes what people say is critical, and other times they are producing ums and ahs that have little meaning ... but not none.

You will be labeling conversations for importance. Each conversation is a telephone conversation between two people. You will hear these in stereo, with each speaker in a separate track. Some dialogs have noise, echo, or bleeding, but just try to ignore it.

You will need to do two things: split each track of the dialog into segments, and assign an importance label to each segment. We have created a tool, "dede," that will let you do these things.

[Demonstration of how to use dede.]

Importance labeling is subjective, meaning that your opinions may differ from ours. That's okay. Although we will spot-check some of your labels to see how they differ from our judgments, we will generally be using your judgments as-is to build our system, so please be thoughtful.

For us, importance means how important something would be to the listener in the dialog. Thus, for example, when you're labeling what the Left speaker said, think about how important it would be for the listener in the Right track to hear those words clearly.

Importance includes at least four aspects.

A. conveying content. For example, if the speaker says he's from Dallas, the word "Dallas" is important information, as it's likely to come up later in the dialog. Sometimes you can infer the importance from the word itself: for example, the word "is" is usually predictable from context and carries little information, whereas the word "shortstop" is rarer and generally more information-rich.

B. helping the listener predict what will come next.
For example, if the speaker says "um", that can indicate that he's thinking of a word, so the next word may be a long, important one, and the listener should be prepared. Similarly, if a listener says "uh-huh", that's important because it tells the other person that it's okay to go on.

C. suggesting to the listener how to respond. For example, if the speaker says "Arizona's beautiful" in an enthusiastic tone of voice, then it's important for the listener to pick up the implication that he should probably express agreement or otherwise say something on the topic of Arizona.

D. other information. For example, "Hello" has little meaning, but is important for revealing the speaker's gender, age, etc. Similarly, the sound of a child crying in the background doesn't mean anything, but helps the listener understand the speaker's situation and likely mental state.

When you make your judgments, listen not only to the words but also to the way the words are said. For example, stressed words and words spoken at higher volume are often the more important ones.

It may help to think about the clarity needed for the listener to correctly get the meaning of the word. For example, "three" should be transmitted clearly so as not to be confused with "free", but "and uh" probably doesn't need to be that clear.

Also label things that are not words, for example loud inbreaths, which are often important (by criterion B) as indications that the speaker is about to start a turn. Laughter and even coughs etc. may also have some importance.

Please label on a scale from 0 to 5.

5 is for unusually important words, for example stressed words or words that are somehow important to the dialog.

4 is for most words in fluent speech, for example the word "live" in "I live in Dallas", which brings some meaning but is not so critical.

3 is for somewhat less important things, for example word repetitions, as in "I went to, drove to Houston", where "went to" is less important.
Backchannels (uh-huh) and laughter will usually be at this level. Connecting words, such as a stretched-out "and" said while the person decides what to say next, may also be at this level.

2 is for even less important things. For example, many inbreaths will be at this level.

1 is for things with almost no value, for example background noise.

0 is usually just pure silence. If you omit a label for a region, it will automatically be counted as 0.

Finally, if you're unsure at any point, just put a question mark after the label, for example "4?". Later on we'll ask you about these, and maybe refine our descriptions of the levels to make them clearer in the future.

Before you assign the importance values, you will need to break up the speech into regions. Each region will probably be one or two words, although if a speaker continues in the same tone of voice and with the same content density, then it's okay to have several seconds (up to about a dozen words) all in one region. Please align the region boundaries roughly with word boundaries, to within 30 milliseconds or so. Occasionally you may wish to split one word into two regions, for example if the first syllable is loud and clear and the remainder of the word is mumbled as if unimportant.

Steps:

First, listen to the entire dialog, to get a sense of what the speakers are saying.

Second, go through the left track, second by second, splitting the speech into regions and labeling each one.

Third, do the same thing for the right track.

Then go on to the next file.

Dede command summary: f b s m ...
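
For anyone checking labels by script, the conventions in this guide (a level from 0 to 5, an optional trailing question mark for unsure labels, and unlabeled stretches counted as 0) might be applied roughly as in the Python sketch below. This guide does not describe how dede stores labels, so the region format of (start seconds, end seconds, label text) and the names parse_label and fill_gaps are assumptions made only for illustration. The sketch also assumes that regions within a track do not overlap.

```python
import re
from typing import List, Tuple

# Hypothetical region format (not dede's actual file format):
# (start_seconds, end_seconds, label_text)
Region = Tuple[float, float, str]
Labeled = Tuple[float, float, int, bool]

# A label is one digit from 0 to 5, optionally followed by "?".
LABEL_PATTERN = re.compile(r"([0-5])(\?)?")


def parse_label(text: str) -> Tuple[int, bool]:
    """Return (importance, unsure) for a label such as "4" or "4?"."""
    match = LABEL_PATTERN.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a valid importance label: {text!r}")
    return int(match.group(1)), match.group(2) is not None


def fill_gaps(regions: List[Region], track_end: float) -> List[Labeled]:
    """Parse each label and count every unlabeled stretch as importance 0."""
    filled: List[Labeled] = []
    cursor = 0.0  # end time of the last stretch accounted for so far
    for start, end, text in sorted(regions):
        if start > cursor:
            # Unlabeled stretch before this region: counted as 0.
            filled.append((cursor, start, 0, False))
        level, unsure = parse_label(text)
        filled.append((start, end, level, unsure))
        cursor = max(cursor, end)
    if cursor < track_end:
        # Unlabeled stretch at the end of the track.
        filled.append((cursor, track_end, 0, False))
    return filled
```

For example, fill_gaps([(0.5, 1.0, "5"), (1.0, 1.4, "3?")], 2.0) adds a 0-valued stretch before 0.5 seconds and another after 1.4 seconds, and marks the second region as unsure so that it can be reviewed later.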