Dialog Issue Labelers Guide, Version 3

December 13, 2004, Nigel Ward

Before you Begin

Make sure you understand the aims of the study, as presented in the draft Eurospeech paper.

Make sure you understand what is meant by each of the items on the Dialog Event Record.

Make sure you understand how the system works. You need a basic understanding of the components of a spoken dialog system. You should have a copy of the dialog flow diagram for the ISG Credit Card System (and the grammars for each state?) in front of you as you record.

Watch both the human-human dialog and the human-computer dialog for each subject before doing any Event recording. Also read the subject's questionnaire.

Preliminary Remarks

We are interested in identifying the differences between system performance and human performance. These cannot, of course, be seen directly, but can be inferred from the actual dialogs.

Thus we are assuming that human performance will be better than system performance, and your job is to help identify how and why. (Of course there are also cases where the system performs superbly, or the human performs poorly; these events should also be recorded, noting that they are contrary to expectation.)


For all non-trivial events, use a separate Dialog Event Record sheet; you will probably use 5-20 sheets per subject.

Some events are trivial, meaning they can be described very briefly. For example, in many conversations the user repeatedly speaks very slowly. To handle such cases, take a Dialog Event Record and label it "Overall". Then just keep a running total on that sheet of how many times you saw trivial events of each type.

For non-trivial events, first give a brief description of what happened and why it was good or bad. Unless it's irrelevant, transcribe the user's and provider's utterances.

Next fill in whatever checkboxes apply. These are provided mostly to save you from having to write out things. If the Event cannot be accurately described with the checkboxes, explain it with free text.

You will inevitably have to speculate about the effects on the user. We care less about what the actual user felt than about what any user in that situation might have felt, so your subjective opinion is fine here. If you're not sure about a case, make a note to that effect.

Predicting how an observed problem could be fixed is also speculative. If you're not sure about something, mention it, or just put question marks next to the relevant checkboxes.

Likely Uses of your Labels

We will compute various statistics over the observations, so accurately marking the checkboxes is important.

We are also interested in the details of what happened.

We are also interested in improving this labelers guide, so please note any problems or weaknesses you find in it.