Simple Actions, Complex Acts

David G. Novick

European Institute of Cognitive Sciences and Engineering


Actors in real-world conversations typically have a set of heterogeneous, implicit goals that engender correspondingly complex acts. These acts, though, are often expressed through relatively simple actions that somehow embody the complexity. This paper suggests that their innocuous appearance can lead to misrepresentation of effects or to misunderstanding of intent.


Most dialogue researchers--certainly most of those interested in this symposium--would agree that the foundation of conversational communication is goal-directed action. In act-based models, these actions constitute communication-based realizations of abstract acts, typically presented as utterance-level speech-acts. Acts are seen as accomplishing things such as obtaining information about the departure of a train or ordering a pizza. In these cases, the actions that effectuate acts are motivated by nominal "task-oriented" goals; but actions may embody attempts to achieve other kinds of less easily discernible goals such as group-formation. Observation of real conversations indicates that utterances, exchanges and whole conversations typically include actions that realize some mix of these goals. In the case of conversations in business meetings, the "task" entails a complex of goals and activities:

There is usually a nominal task--such as setting sales policies or designing an aspect of a computer interface--that forms a recurrent theme for meeting activities. Episodes of activities related to the theme typically are interspersed with other kinds of supporting activities. There also may be implicit tasks--such as group formation or maintenance--that cause other kinds of activities to occur (Ward, Marshall & Novick, 1995, p.15).

The simultaneous presence of these goals may cause changes in the carrying-out and in the results of the nominal task. For example, one can explain observed modality-based differences in performance of a cooperative task with a negotiation component on the basis of the relative salience of the non-nominal goals (Marshall & Novick, 1995).

The intertwined character of heterogeneous goals often makes it difficult to explicate precisely what effects an agent intends to achieve with their action(s). This has a variety of consequences for act-based models of conversation. Among these are (1) rejection of speech acts as not corresponding to unitary segmentations of discourse (Levinson, 1981), (2) support for decomposing utterances into groups of associated component actions, and (3) enormous difficulty in producing verifiable sets of goals that motivate these actions. Conversational agent simulations tend to address these problems through the use of radically simple domains and tasks, such as a blocks world (Power, 1979), sequences of letters (Novick, 1988), or--somewhat richer--electrical circuit repairs (Guinn, 1995). The goals are well-defined and the language or acts produced are straightforward. The model of action occasionally embodies satisfaction of multiple goals, but only in the most rudimentary ways.

In more pragmatic terms, the mix of goals is apt to be misrepresented by computational agents or misunderstood by human agents.

Computational and Human Agents

The apparent simplicity of dialogues generated in agent-based simulations has more to do with a failure to model non-nominal goals than with any inherent defect in act-based models. The complexity of conversational phenomena is so high that we are generally happy simply to have our agents converse coherently, if stolidly. Power (1979), whose conversational simulation aimed at getting across the "point" of an utterance, argued at length that problems apparent in his simulation could be addressed in future work through the use of speech acts.

My argument here is that our representational problems arise from the fact that our models of action-as-act are too simple and not from inherent defects in the notion of act. Some of this problem is addressed by meta-conversational models (e.g., Carbonell, 1982; Novick, 1988; Traum & Hinkelman, 1992). However, these only go part of the way toward explicating the richness of conversation, as they do not address interactional goals as such; rather, these models basically present meta-acts for purposes of achieving nominal goals.

Schegloff (1996) presents an example of a simple action as a complex act in his account of a telephone conversation. An excerpt with the start of the conversation is presented in Figure 1.

  01   1+ rings 
  02   Marcia:   Hello?
  03   Donny:    'lo Marcia,=
  04   Marcia:   Yea[:h    ]
  05   Donny:      =[('t's) D]onny.
  06   Marcia:   Hi Donny.
  07   Donny:    Guess what.hh
  08   Marcia:   What.
  09   Donny:    hh My ca:r is sta::lled.

Figure 1: Donny and Marcia

In utterance 07, Donny's "Guess what" is what Schegloff calls a "pre-announcement" that foreshadows an implicit request for help. In speech-act terms, this ends up being represented as something like:

  intended effects:

  bel(M, expects(D, inform(D, M, x:{urgent, charged})))
  bel(M, polite(D))
  bel(M, will-request(D, M, consequences-of-x))
  request(M, D, inform(D, M, x))
  bel(M, bel(D, not-know(M, x)))

Of course, pre-announcements such as "Guess what" could also foreshadow the telling of news that might not result in an implicit request. Donny could have followed up with "I'm engaged!" or "I won the lottery!" This means that Marcia actually expects that Donny's next utterance will be charged, but in either a positive or a negative way. In either case, this sort of pre-announcement also implies a belief that the speaker knows something that the hearer does not know.
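As a rough illustration, the intended effects listed for "Guess what" can be written down as nested terms. The following Python sketch is not part of any model discussed here: the `Term` class, the `t` helper, and the `guess_what_effects` function are hypothetical names introduced only for this example, and the annotated variable x:{urgent, charged} is simply carried as a string. It shows one simple action mapping to several intended effects at once.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

Arg = Union["Term", str]


@dataclass(frozen=True)
class Term:
    """A predicate applied to arguments, e.g. bel(M, polite(D))."""
    functor: str
    args: Tuple[Arg, ...]

    def __str__(self) -> str:
        return f"{self.functor}({', '.join(str(a) for a in self.args)})"


def t(functor: str, *args: Arg) -> Term:
    """Shorthand constructor (hypothetical helper)."""
    return Term(functor, args)


def guess_what_effects(speaker: str, hearer: str) -> List[Term]:
    """Intended effects of the pre-announcement, following the list in the text."""
    x = "x:{urgent, charged}"  # annotation kept as an opaque string (assumption)
    return [
        t("bel", hearer, t("expects", speaker, t("inform", speaker, hearer, x))),
        t("bel", hearer, t("polite", speaker)),
        t("bel", hearer, t("will-request", speaker, hearer, "consequences-of-x")),
        t("request", hearer, speaker, t("inform", speaker, hearer, "x")),
        t("bel", hearer, t("bel", speaker, t("not-know", hearer, "x"))),
    ]


effects = guess_what_effects("D", "M")
for effect in effects:
    print(effect)
```

A flat string per effect would also do for display; nested terms are used here only so that the embedded beliefs and expectations remain inspectable.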

Consequences of Complexity

Innocuous actions such as "Guess what" can lead to under-representation or outright misunderstanding. Where one agent says "Guess what," one could always have the other agent rotely respond "What." This would probably be a felicitous response 95 percent of the time. However, the rote response does not account for the complex of beliefs that we can see as coloring the balance of the conversation.
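The contrast between a rote response and one that accounts for beliefs can be sketched as follows. This is a minimal illustration under stated assumptions: the `CANNED` table, `normalize`, `rote_reply`, and `tracking_reply` are hypothetical names, and the two recorded beliefs are a simplified subset of the effects discussed above. Both functions produce the same surface reply; only the second leaves the hearer with anything that could color the rest of the conversation.

```python
from typing import List, Tuple

# Hypothetical lookup table for the rote agent.
CANNED = {"guess what": "What."}


def normalize(utterance: str) -> str:
    """Lower-case the utterance and drop trailing punctuation."""
    return utterance.strip().rstrip(".?!").strip().lower()


def rote_reply(utterance: str) -> str:
    """Return a canned reply; nothing about the speaker is recorded."""
    return CANNED.get(normalize(utterance), "")


def tracking_reply(speaker: str, hearer: str, utterance: str) -> Tuple[str, List[str]]:
    """Return the same reply plus beliefs the hearer adopts (simplified)."""
    reply = rote_reply(utterance)
    beliefs: List[str] = []
    if normalize(utterance) == "guess what":
        beliefs = [
            f"expects({speaker}, inform({speaker}, {hearer}, x))",
            f"bel({speaker}, not-know({hearer}, x))",
        ]
    return reply, beliefs


print(rote_reply("Guess what."))
print(tracking_reply("D", "M", "Guess what."))
```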

Among human agents, failure to perceive the correct act can sometimes result from assuming the wrong source of complexity. For example, a conversant might misinterpret an action as embodying the mix of social goals discussed above when, in fact, the speaker actually meant to produce a different set of effects.

Such misinterpretations can have serious consequences, as in the case of the dialogue reported by Cushing (1994), presented in Figure 2. Flight EAL401 is having trouble lowering its landing gear, and is flying over the Everglades while trying to solve the problem in order to land in Miami. The aircraft is slowly descending, but the crew is unaware of this because they are focused on getting the landing-gear indicator light to confirm that the gear is locked in place.

  2338:46   EAL401   Eastern four oh one we'll go ah, out west just a
                     little further if we can here and, ah, see if we
                     can get this light to come on here.
  2341:40   AppCon   Eastern, ah, four oh one how are things comin'
                     along out there?
  2341:44   EAL401   Okay, we'd like to turn around and come, come back in.
  2341:47   AppCon   Eastern four oh one turn left heading one eight zero.
  2342:12   (Aircraft crashes into Everglades.)

Figure 2: EAL401 and Miami Approach Control

In this case, the controller noticed that the aircraft was descending and attempted in utterance 2341:40 to confirm that the crew was aware of the situation. The expected consequences of the controller's utterance were probably something along the lines of:

  intended effects:

  bel(E, expects(A, x:{urgent, charged} => inform(E, A, x)))
  bel(E, concerned(A))
  bel(E, will-assist-with(A, E, consequences-of-x))
  x:{urgent, charged} => request(E, A, action-for(x))

However, in their context of being focused on the landing gear, the crew misunderstood the complex of acts associated with the controller's question. The crew may have seen work-load or even social overtones, where the controller probably intended to signal concern for another abnormal condition. Thus the actual consequences of the controller's simple action were probably something like:

  actual effects:

  bel(E, expects(A, inform(E, A, x:{mutually known problem})))
  bel(E, helpful(A))
  request(E, A, command(A, E, y:{non-consequential(x)}))
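The gap between the two effect lists can be made explicit by treating each list as a set and taking differences. The sketch below is only an illustration: the effects are carried as opaque strings, the connective in the first intended effect is read as an implication ("=>"), which is an assumption, and `compare_effects` is a hypothetical helper name. On this encoding, none of the controller's intended effects appears among the effects actually produced.

```python
from typing import Dict, Set

# Effects intended by approach control (A) toward the crew (E).
intended: Set[str] = {
    "bel(E, expects(A, x:{urgent, charged} => inform(E, A, x)))",
    "bel(E, concerned(A))",
    "bel(E, will-assist-with(A, E, consequences-of-x))",
    "x:{urgent, charged} => request(E, A, action-for(x))",
}

# Effects the utterance probably had on the crew.
actual: Set[str] = {
    "bel(E, expects(A, inform(E, A, x:{mutually known problem})))",
    "bel(E, helpful(A))",
    "request(E, A, command(A, E, y:{non-consequential(x)}))",
}


def compare_effects(intended: Set[str], actual: Set[str]) -> Dict[str, Set[str]]:
    """Split effects into shared, unrealized, and unintended groups."""
    return {
        "shared": intended & actual,
        "unrealized": intended - actual,
        "unintended": actual - intended,
    }


result = compare_effects(intended, actual)
for label in ("shared", "unrealized", "unintended"):
    print(label, len(result[label]))
```

Exact string matching is deliberately crude; a structural comparison of terms would be needed to say that "concerned" was replaced by "helpful" rather than merely dropped.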


References

Carbonell, J. G. (1982). Meta-language utterances in purposive discourse, Technical report CMU-CS-82-185, Department of Computer Science, Carnegie-Mellon University.

Cushing, S. (1994). Fatal words. Chicago: University of Chicago Press.

Guinn, C. (1995). Meta-dialogue behaviors: Improving the efficiency of human-machine dialogue - A computational model of variable initiative and negotiation in collaborative problem-solving, Ph.D. dissertation, Duke University.

Marshall, C. R., & Novick, D. G. (1995). Conversational effectiveness in multimedia communications, Information Technology & People, 8(1), 54-79.

Levinson, S. C. (1981). The essential inadequacies of speech act models of dialogue. In H. Parret, M. Sbisa, & J. Verschueren (eds.), Possibilities and limitations of pragmatics: Proceedings of the Conference on Pragmatics at Urbino, July, 1979. Amsterdam: Benjamins. 473-492.

Novick, D. G. (1988). Control of mixed-initiative discourse through meta-locutionary acts: a computational model. Doctoral dissertation, available as Technical Report CIS-TR-88-18, Department of Computer and Information Science, University of Oregon.

Power, R. (1979). The organization of purposeful dialogues, Linguistics, 17, 107-152.

Schegloff, E. (1996). Issues of relevance for discourse analysis: Contingency in action, interaction and co-participant context. In E. Hovy and D. Scott (eds.), Computational and conversational discourse: burning issues, an interdisciplinary account. Berlin: Springer-Verlag. 3-35.

Traum, D., & Hinkelman, E. (1992). Conversation acts in task-oriented spoken dialogue, Computational Intelligence, 8(3), 575-599.

Ward, K., Marshall, C., & Novick, D. (1995). Applying task classification to natural meetings, Technical Report CS/E 95-11, Department of Computer Science and Engineering, Oregon Graduate Institute.

Published in the working notes of the AAAI Fall Symposium on Communicative Action in Humans and Machines, Massachusetts Institute of Technology, November 8-10, 1997.

DGN, December 26, 1997