The models of co-presence and of acceptance developed by Clark and his colleagues provide both fundamental support for conversational models generally and design guidance in developing cooperative systems specifically. Co-presence, which underlies conversants' understanding of what knowledge is mutual, provides a clean representation and a better understanding of conversation as context, but it has not yet provided direct design guidance for human-computer dialogues. Systems designed to actively facilitate human-human cooperation might benefit from incorporating representations of co-presence or co-presence heuristics. Acceptance, which serves to confirm that an utterance is contributing to a dialogue, is highly replicable with respect to evidence levels, can be extended to multiple conversants, and provides key design guidance for interactive systems, but is less useful for tracking conversational structure or explaining why conversations are more or less effective.
Keywords: Clark, co-presence, acceptance, dialogue, mutual knowledge
Know then thyself, presume not God to scan;
The proper study of mankind is man.
--Alexander Pope, An Essay on Man
Sexism aside, Pope had it right. Relatively early attempts in artificial intelligence to impose an idealized taxonomic structure on knowledge and discourse failed to capture something human. This is reflected, for example, in the fact that crying, spitting, and even giving birth would presumably all have to be represented by a single act such as "egest." An approach that tries to produce a perfect, perspectiveless account of the world is, in effect, seeking to scan God. So over the years, both independently and together, we have approached the research and development of interactive systems from the point of view that machines should adapt to humans. The tricky part, though, is understanding humans and their interactivity deeply enough to design systems that can compensate for computers' shortcomings and thus cooperate with people sufficiently well. What's worse, our experience suggests that when people adapt technology to their needs, the means by which they achieve cooperation can be entirely unintuitive, even though many of us use these technologies daily.
The appeal of the conversational models developed by Clark and his colleagues is that they provide a human-centered account of interaction that holds the promise of design guidance that is both specific and theoretically motivated. We have particular experience in studying and using two of these models: co-presence and acceptance. On the basis of this experience, we claim that these models not only provide fundamental support for conversational models generally but also provide design guidance in developing cooperative systems. At the same time, some aspects of the models are of only limited use in explicating dialogue structure or quality. In this paper, then, we briefly review the concepts of co-presence and acceptance and discuss our explorations and uses of these models.
A first and most basic concept is that of co-presence, which Clark and Marshall (1981) developed in expressing their model of mutual knowledge in conversation. Clark and Marshall resolved the hard problem of representing mutuality by collapsing the unwieldy infinite-recursion model into a single-level model founded in co-presence. In a nutshell, co-presence of a thing can be physical, linguistic, or cultural. Physical co-presence occurs when the thing is mutually known to the conversants because it is there: you see it, I see it, and we can see each other see it. An utterance made to an attending conversational partner is linguistically co-present, and either conversant should be able to refer to that utterance and its contents. Cultural co-presence occurs when circumstances enable a conversant to conclude that the thing is likely to be mutually known by all relevant parties to a conversation. In each case, the inference establishing mutual knowledge relies on co-presence heuristics. One such heuristic (used to establish cultural co-presence) is that both conversants in a dialogue are members of a community in which the thing is known and known to be known. For example, virtually all Americans over the age of nine can be presumed to know the number of stripes on the American flag and the name of the President. We further suggest that a conversant comes to know about mutuality by accumulating evidence, through subsequent actions, that a thing X is mutual, up to the point where the conversant might be justified in being angry if X were not really mutual. Based on our experience in devising cooperative systems, both as research-oriented architectures and as development prototypes, we see three specific plusses and one large minus for co-presence as a basis for designing cooperative systems.
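The co-presence heuristics and the evidence-accumulation idea above can be sketched in Python. This is a minimal illustration under stated assumptions, not Clark and Marshall's formalism: the `Situation` record, the order in which the heuristics are tried, and the numeric weights and threshold in `EvidenceTracker` are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Basis(Enum):
    PHYSICAL = "physical"      # it is there, both see it, each sees the other see it
    LINGUISTIC = "linguistic"  # it was mentioned to an attending partner
    CULTURAL = "cultural"      # both belong to a community in which it is known


@dataclass
class Situation:
    """Hypothetical record of what one conversant can observe or infer."""
    jointly_attended: set = field(default_factory=set)
    mentioned_to_attending_partner: set = field(default_factory=set)
    shared_communities: set = field(default_factory=set)
    community_knowledge: dict = field(default_factory=dict)  # community -> items


def mutual_basis(item: str, s: Situation) -> Optional[Basis]:
    """Return the co-presence basis licensing the inference that `item`
    is mutually known, or None if no heuristic applies."""
    if item in s.jointly_attended:
        return Basis.PHYSICAL
    if item in s.mentioned_to_attending_partner:
        return Basis.LINGUISTIC
    for community in s.shared_communities:
        if item in s.community_knowledge.get(community, set()):
            return Basis.CULTURAL
    return None


@dataclass
class EvidenceTracker:
    """Hypothetical accumulator: each later action that presupposes an item
    adds weight; at or past `threshold` the item is treated as firmly mutual
    (the point where its turning out not to be mutual would justify anger)."""
    threshold: float = 1.0
    weights: dict = field(default_factory=dict)

    def observe(self, item: str, weight: float) -> None:
        self.weights[item] = self.weights.get(item, 0.0) + weight

    def firmly_mutual(self, item: str) -> bool:
        return self.weights.get(item, 0.0) >= self.threshold


# Usage with illustrative items.
situation = Situation(
    jointly_attended={"the coffee cup"},
    mentioned_to_attending_partner={"the 3 p.m. meeting"},
    shared_communities={"Americans"},
    community_knowledge={"Americans": {"number of stripes on the flag"}},
)
print(mutual_basis("number of stripes on the flag", situation))  # Basis.CULTURAL
```

Trying physical co-presence first simply reflects that it is the most direct evidence; any fixed order would serve for this illustration.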
The first two plusses are a much cleaner representation and a better understanding of conversation as context. In achieving its main aim of resolving representational difficulties, the concept of co-presence is a great success. This makes it all the more distressing to hear, as one does on occasion, graduate students in computer science still using the infinite-recursion model of mutual knowledge, when this issue was rather neatly resolved nearly 20 years ago. The cleaner representation makes the use of mutual knowledge tractable in multi-agent simulations. Novick and Ward (1993) used the Clark and Marshall approach explicitly to model mutual knowledge among conversational agents in the domain of air traffic control. The co-presence of utterances was used to build up increasingly complex representations of mutual knowledge as the number of agents grew to include both additional intended participants and overhearers. The agent architecture used a representation involving belief groups to track, from each agent's perspective, which elements of the prior discourse were believed mutually by which agents. Moreover, understanding conversation as mutually known context not only enables rational use of prior utterances but also explains other kinds of conversational behavior. In particular, phenomena such as the negotiation of referring expressions (Clark & Wilkes-Gibbs, 1986) and the advantage addressees have over overhearers, who cannot shape the speaker's output until it is sufficient for their understanding (Schober & Clark, 1989), depend critically on co-presence. The third plus is that co-presence explains parts of conversations that might otherwise be considered off-topic: the conversants are establishing the community or communities to which they belong.
For example, Wynn and Novick (1996) report a meeting in which a pair of mid-level technical managers engage in a sort of duet about part numbers that does not fit well into the meeting's nominal task; this discourse can be understood as establishing the managers as part of a community, creating, among other things, foundations for mutual knowledge in later sections of the conversation.
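The belief-group idea from Novick and Ward (1993) can be approximated as follows. This is a simplified, hypothetical sketch rather than their architecture. It keeps a single shared record instead of one per agent's perspective. It assumes that an utterance is linguistically co-present among the speaker, the addressees, and any overhearers whose attention is itself mutually known (for example, aircraft known to be on the same frequency). Overhearers not known to be listening are excluded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Utterance:
    uid: str
    speaker: str
    addressees: frozenset
    known_overhearers: frozenset = frozenset()    # their attending is mutually known
    unknown_overhearers: frozenset = frozenset()  # heard it, but not known to have heard


def belief_group(u: Utterance) -> frozenset:
    """Agents among whom the utterance counts as linguistically co-present.
    Unknown overhearers know the content but are deliberately left out."""
    return frozenset({u.speaker}) | u.addressees | u.known_overhearers


class DiscourseRecord:
    """Maps each prior utterance to the group that believes it mutually."""

    def __init__(self) -> None:
        self.groups = {}

    def add(self, u: Utterance) -> None:
        self.groups[u.uid] = belief_group(u)

    def mutual_among(self, agents: set) -> list:
        """Ids of utterances that every agent in `agents` can treat as mutual."""
        return [uid for uid, group in self.groups.items() if agents <= group]


# Illustrative air-traffic exchange; call signs are made up, and DL7 is
# assumed not to be known to be listening when u2 is spoken.
record = DiscourseRecord()
record.add(Utterance("u1", "ATC", frozenset({"UA12"}), frozenset({"DL7"})))
record.add(Utterance("u2", "UA12", frozenset({"ATC"}),
                     unknown_overhearers=frozenset({"DL7"})))
print(record.mutual_among({"ATC", "UA12"}))  # ['u1', 'u2']
print(record.mutual_among({"ATC", "DL7"}))   # ['u1']
```

As more agents join, the groups diverge, which is the growth in representational complexity described above.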
In order to consider co-presence as a basis for design, we must distinguish interactive systems in which human-computer cooperation is central from interactive systems designed to mediate or facilitate cooperation among human agents. In the former case, it can be argued that, even when users treat computers as "social actors," they make minimal assumptions regarding mutual knowledge. For this reason, useful systems can be designed without explicit reference to co-presence. Indeed, when actually building cooperative systems, it is not entirely clear what it means to design a dialogue system with co-presence as a key element. This may simply be a consequence of mutual knowledge being an omnipresent phenomenon that receives implicit support from interactive systems. In other words, co-presence explains why such systems work but need not be represented explicitly in a functioning system. Our view, though, is that as cooperative systems become more sophisticated, explicit representation of co-presence (and its justifications) will become increasingly important. It is relatively easy to demonstrate that the state of mutual knowledge (and its basis) is a factor for conversants in making choices about conversational behaviors such as initiative. For example, if a flight management system asks a pilot for a destination airport, the pilot could reasonably want to add to her answer the approach course she would prefer, since she cannot assume that this preference is already mutually known. These points turned out to be of practical use in developing a spoken-language prototype appointment-scheduling system (Fanty et al., 1995).
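The flight-management example suggests a simple design consequence for mixed initiative. A system should accept information the user volunteers beyond the question asked, and it should acknowledge that information so that it becomes linguistically co-present. The sketch below assumes hypothetical slot names and an upstream parser that has already turned the pilot's utterance into slot-value pairs. It is not the design of the Fanty et al. (1995) scheduling system.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DialogueState:
    slots: dict = field(default_factory=dict)   # values received so far
    grounded: set = field(default_factory=set)  # values the system has acknowledged


def handle_answer(state: DialogueState, asked: str, parsed: dict) -> str:
    """Store the requested slot and anything volunteered, acknowledge every
    value so it becomes linguistically co-present, and re-ask if needed."""
    for name, value in parsed.items():
        state.slots[name] = value
        state.grounded.add(name)
    parts = []
    if parsed:
        parts.append("Got it: " + ", ".join(f"{n} {v}" for n, v in parsed.items()) + ".")
    if asked not in parsed:
        parts.append(f"What {asked} would you like?")
    return " ".join(parts)


def next_question(state: DialogueState, agenda: list) -> Optional[str]:
    """First slot on the agenda not yet filled; volunteered slots are skipped."""
    for slot in agenda:
        if slot not in state.slots:
            return slot
    return None


state = DialogueState()
print(handle_answer(state, "destination", {"destination": "KSFO", "approach": "ILS 28R"}))
# Got it: destination KSFO, approach ILS 28R.
print(next_question(state, ["destination", "approach", "cruise altitude"]))
# cruise altitude
```

Because the volunteered approach has been acknowledged, the system does not ask for it again, which avoids re-asking for what is already mutual.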
In human-human dialogue, mutual knowledge plays a much more important role. Thus, systems designed to mediate or facilitate such dialogue must support their human users' application of co-presence heuristics. In previous research (Marshall & Novick, 1995), we showed that situational factors, such as the type of task being performed, determine how important the various forms of co-presence are. In a task involving assembly of parts, for example, physical co-presence is predictably important for effective communication and successful task completion. Thus we were able to show that mediated communication can be as effective as face-to-face communication for such a task if the mediating system supports physical co-presence by providing (1) both an audio and a video channel, (2) remote camera control, so that each party can decide what to see in the other's environment, and (3) a split video display that allows each party to see what the other is seeing. In contrast, for a decision task in which linguistic and cultural co-presence provided the basis for relevant mutual knowledge, mediated communication with audio+video had no advantage over audio-only. Because the type of co-presence needed varies from task to task, mediating systems should be designed to support all types of co-presence if they are to be broadly useful.
In designing prototype systems we have found it useful to extend task analysis and requirements formulation to include specification of co-presence needs. Thus far, it has not seemed useful to incorporate representations of co-presence or co-presence heuristics into the mediating systems themselves. However, we speculate that such a step might be necessary to move from systems that provide passive mediation to systems that provide active facilitation.
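One way to make a task's co-presence needs an explicit part of requirements formulation is to use two mappings. The first maps each task to the kinds of co-presence it relies on. The second maps each kind of co-presence to the mediating features that support it. In the hedged sketch below, the physical-co-presence features are the ones listed earlier for the assembly task, and the audio-only entry for linguistic co-presence reflects the decision-task result. Everything else is a hypothetical illustration, including the task names, the empty entry for cultural co-presence, and the function names.

```python
from enum import Enum


class CoPresence(Enum):
    PHYSICAL = "physical"
    LINGUISTIC = "linguistic"
    CULTURAL = "cultural"


# Mediating-system features that support each kind of co-presence.
FEATURES = {
    CoPresence.PHYSICAL: {"audio", "video", "remote camera control", "split video display"},
    CoPresence.LINGUISTIC: {"audio"},
    CoPresence.CULTURAL: set(),  # assumption: established through talk, no extra channel
}

# Hypothetical task analysis: which kinds of co-presence each task relies on.
TASK_NEEDS = {
    "assembly": {CoPresence.PHYSICAL, CoPresence.LINGUISTIC},
    "decision": {CoPresence.LINGUISTIC, CoPresence.CULTURAL},
}


def required_features(task: str) -> set:
    """Union of the features needed for every kind of co-presence the task uses."""
    needed = set()
    for kind in TASK_NEEDS[task]:
        needed |= FEATURES[kind]
    return needed


def missing_features(task: str, offered: set) -> set:
    """Features the task needs that a candidate mediating system lacks."""
    return required_features(task) - offered


print(sorted(missing_features("assembly", {"audio", "video"})))
# ['remote camera control', 'split video display']
print(missing_features("decision", {"audio"}))
# set()
```

Under the argument above, a mediating system meant to be broadly useful would offer the union of `required_features` over all the task types it expects to serve.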
A second concept on which we would like to comment is that of acceptance, and the notion of levels of acceptance, developed by Clark and Schaefer (1989). At some risk of reductionism, we focus here on the key point that contributions to discourse are composed of utterances that are presentations and acceptances, often embedded, forming a contribution tree. To be effective as part of the discourse, a presentation must be accepted, and an utterance that accepts is also a presentation. Conversational behaviors produce different levels of evidence of acceptance (in order of increasing strength): (1) continued attention, (2) next relevant contribution, (3) acknowledgment, (4) demonstration (i.e., actually doing something that demonstrates comprehension), and (5) display (i.e., repeating the presentation back). Contributions as linguistic units were revolutionary because "they are not formulated autonomously by the speaker according to some prior plan, but emerge as the contributor and the partner act collectively. Success depends on the coordinated actions by the two of them." (Clark & Schaefer, 1989, p. 292). Based on our experience, we can identify three plusses and a couple of minuses with acceptance and acceptance levels as a basis for the design of cooperative systems.
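The presentation/acceptance structure and the five evidence levels can be represented directly as a small tree. This sketch is a simplification under stated assumptions, not Clark and Schaefer's own formalization. Each acceptance is itself a `Presentation`, so acceptances can embed. The `grounded` test asks whether any acceptance reaches a designer-chosen minimum level, and that criterion is hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import IntEnum
from typing import List, Optional, Tuple


class Evidence(IntEnum):
    """Evidence of acceptance, in order of increasing strength."""
    CONTINUED_ATTENTION = 1
    NEXT_RELEVANT_CONTRIBUTION = 2
    ACKNOWLEDGMENT = 3
    DEMONSTRATION = 4
    DISPLAY = 5


@dataclass
class Presentation:
    speaker: str
    text: str
    # Acceptance phase: each accepting utterance is itself a Presentation,
    # so it can in turn be accepted, producing an embedded contribution tree.
    acceptances: List[Tuple[Evidence, Presentation]] = field(default_factory=list)

    def accept(self, evidence: Evidence, reply: Presentation) -> Presentation:
        self.acceptances.append((evidence, reply))
        return reply

    def strongest_evidence(self) -> Optional[Evidence]:
        return max((ev for ev, _ in self.acceptances), default=None)

    def grounded(self, minimum: Evidence) -> bool:
        best = self.strongest_evidence()
        return best is not None and best >= minimum

    def depth(self) -> int:
        """Levels of embedding below this presentation (0 if unaccepted)."""
        return max((1 + reply.depth() for _, reply in self.acceptances), default=0)


# A gives an instruction; B displays it back; A acknowledges the display.
instr = Presentation("A", "Put the red block on the blue one.")
echo = instr.accept(Evidence.DISPLAY, Presentation("B", "Red on blue?"))
echo.accept(Evidence.ACKNOWLEDGMENT, Presentation("A", "Right."))
print(instr.grounded(Evidence.ACKNOWLEDGMENT), instr.depth())  # True 2
```

Using `IntEnum` keeps the ordering of evidence levels explicit, so "at least an acknowledgment" is a plain comparison.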
For the first plus, a key point on which we believe most researchers in the field will agree is that the coding of acceptance levels is generally replicable. In our own research, we have had multiple observers code many hundreds of utterances with respect to acceptance levels, and across these cases inter-rater reliability has been high. Second, the contribution model has proven extensible in both depth and breadth: the functions of the acknowledgment level were explored by Novick and Sutton (1994), and the contribution model was extended to multiparty conversation (i.e., with more than two conversants) (Novick, Walton & Ward, 1996). Third, and perhaps most usefully, the notions of acceptance and levels of acceptance have proven practical in the design of interactive systems; considering how to produce contributions by achieving acceptance constitutes a design method. Acceptance level and the nature of the presentation being accepted were explicit factors in generating a design space for system prompts in spoken-language systems (Hansen, Novick & Sutton, 1996). In the development of a telephone-based spoken-language system for collecting the U.S. census (Cole et al., 1997), the need for and costs of acceptance of presentations of information elements proved to be a key factor in designing the system's prompts. One great advantage of acceptance as a design method is that it explicitly accounts for the conversational effects it is meant to achieve. This contrasts with, say, the use of design guidelines, which are popular among developers but methodologically incapable of providing a verifiable result. Considering why and how an utterance must be accepted provides an explicit basis for checking that the design achieves that acceptance. Use of a guideline like "Speak the user's language" really amounts to a kind of self-exhortation that cannot produce a detailed explanation of why a particular design does or does not comply with the guideline. In short, contribution, presentation, and acceptance constitute a more articulate model.
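Acceptance as a design method can be made concrete by deciding how strong an acceptance the system should seek for each information element it collects. That decision weighs the need for acceptance against its cost. The rule below is a deliberately crude, hypothetical sketch. The element names, error estimates, and cost constants are invented for illustration and are not taken from the census system of Cole et al. (1997).

```python
from dataclasses import dataclass
from enum import Enum


class Confirm(Enum):
    # How the system seeks acceptance of the user's answer.
    NONE = "move on; the next prompt itself is the only acceptance"
    IMPLICIT = "display the value inside the next prompt"
    EXPLICIT = "display the value and ask for an acknowledgment"


@dataclass
class Element:
    name: str
    p_error: float     # estimated chance the value was misrecognized
    error_cost: float  # cost of keeping a wrong value, in prompt-cost units


# Hypothetical costs of seeking acceptance, in the same units as error_cost.
IMPLICIT_COST = 0.2   # a slightly longer next prompt
EXPLICIT_COST = 1.0   # a whole extra exchange


def choose_confirmation(e: Element) -> Confirm:
    """Seek a stronger acceptance only when the expected cost of an error
    (p_error * error_cost) reaches what that acceptance would cost."""
    risk = e.p_error * e.error_cost
    if risk < IMPLICIT_COST:
        return Confirm.NONE
    if risk < EXPLICIT_COST:
        return Confirm.IMPLICIT
    return Confirm.EXPLICIT


for element in (Element("age", 0.05, 2.0),
                Element("surname", 0.3, 2.0),
                Element("phone number", 0.2, 10.0)):
    print(element.name, "->", choose_confirmation(element).name)
# age -> NONE
# surname -> IMPLICIT
# phone number -> EXPLICIT
```

Whatever the actual numbers, each prompt decision then rests on a stated reason why acceptance is or is not sought, and that reason can be checked, which is the contrast with guidelines drawn above.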
At the same time, structures built from contributions seem less replicable. Our experience is that a reasonably well-informed researcher can generate a plausible contribution tree that will not necessarily match the trees generated by similarly qualified researchers. While detailed sections are often fairly reliable, differences among trees can involve fairly major structures. An informal survey of spoken-language systems suggests that the focus-stack model (Grosz & Sidner, 1986) is often used as the driver for dialogue structure; the contribution model tends to be cited as an indication of general approach rather than as a mechanism specifically implemented in the system.
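Replicability claims like those above are commonly quantified with a chance-corrected agreement statistic. The paper does not say which statistic was used. Cohen's kappa, κ = (p_o − p_e) / (1 − p_e), is one standard choice for two coders assigning acceptance levels 1-5 to the same utterances, and it is sketched below. Kappa over per-utterance labels does not capture disagreement about tree structure, which would need a structural comparison of contribution trees.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(a: Sequence[int], b: Sequence[int]) -> float:
    """Cohen's kappa for two coders labelling the same utterances."""
    if len(a) != len(b) or not a:
        raise ValueError("need two equal-length, non-empty codings")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    # Agreement expected by chance from each coder's marginal label frequencies.
    expected = sum(ca[k] * cb[k] for k in ca.keys() | cb.keys()) / (n * n)
    if expected == 1.0:
        return 1.0  # convention: both coders used one identical label throughout
    return (observed - expected) / (1 - expected)


# Toy acceptance-level codes for eight utterances (not data from any study).
coder_1 = [3, 3, 2, 4, 5, 3, 1, 2]
coder_2 = [3, 3, 2, 4, 4, 3, 1, 3]
print(round(cohens_kappa(coder_1, coder_2), 3))  # 0.66
```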
We found that the acceptance model provided limited help in understanding the modality differences reported by Marshall and Novick (1995). In that empirical study, we showed that previously inconsistent results for different modalities with respect to conversational effectiveness could be explained by task differences. We then explored, but did not publish, whether differences in conversational behavior across face-to-face, audio-only, and audio+video modalities could be explained, in part, by differences in acceptance levels across modality and task conditions. We did find some significant differences in acceptance levels as a function of task. These show that the type of co-presence used to establish mutual knowledge affects the type of acceptance acts conversants employ. For example, we saw hundreds of level-4 acceptances (demonstration) during an assembly task but only a few during a decision task.
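A difference in acceptance levels "as a function of task" is the kind of result usually tested on a task-by-level contingency table. The paper does not say which test was used. The sketch below assumes Pearson's chi-square test of independence, computes only the statistic and degrees of freedom, and uses toy codes rather than data from the study. A p-value could then be obtained from the chi-square distribution, for example with `scipy.stats.chi2.sf(stat, dof)`.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def contingency(codes: Iterable[Tuple[str, int]]) -> Dict[str, Counter]:
    """Tally acceptance-level codes (1-5) per task condition."""
    table: Dict[str, Counter] = {}
    for task, level in codes:
        table.setdefault(task, Counter())[level] += 1
    return table


def chi_square(table: Dict[str, Counter]) -> Tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom for a
    task x acceptance-level table (levels never observed are dropped)."""
    levels = sorted({lvl for row in table.values() for lvl in row})
    rows = [[row[lvl] for lvl in levels] for row in table.values()]
    n = sum(sum(r) for r in rows)
    col_tot = [sum(r[j] for r in rows) for j in range(len(levels))]
    stat = 0.0
    for r in rows:
        row_tot = sum(r)
        for j, observed in enumerate(r):
            expected = row_tot * col_tot[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(rows) - 1) * (len(levels) - 1)
    return stat, dof


# Toy input only: two tasks, each with two codes at a single level.
codes = [("assembly", 4), ("assembly", 4), ("decision", 2), ("decision", 2)]
stat, dof = chi_square(contingency(codes))
print(stat, dof)  # 4.0 1
```

With expected counts this small the chi-square approximation is unreliable; the toy input only exercises the code.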
Initially we expected that certain modalities would support more "effective" conversation, characterized by a regular pattern of presentations and acceptances, while the less effective conversations produced in other modalities would be characterized by more frequent non-acceptance acts and consequent repairs. Instead, we found that the distribution of degrees of effectiveness did not vary with modality in any interesting way, even though some modalities are clearly superior to others for performance on certain kinds of cooperative tasks. Examination of specific cases made it clear to us that the pattern of presentations and acceptances we observe in a dialogue does not reveal much about how well communicative intentions are realized. This ultimately led us to question our original assumptions about what makes a conversation "effective."
Clark, H., and Marshall, C. (1981). Definite reference and mutual knowledge. In A. K. Joshi, B. L. Webber, and I. A. Sag (Eds.), Elements of discourse understanding (pp. 10-63). New York: Cambridge University Press.
Clark, H., and Schaefer, E. (1989). Contributing to discourse. Cognitive Science, 13, 259-294.
Clark, H., and Wilkes-Gibbs, D. (1986). Referring as a collaborative process. Cognition, 22, 1-39.
Cole, R., Novick, D., Vermeulen, P. J. E., Sutton, S., Fanty, M., Wessels, L., de Villiers, J., Schalkwyk, J., Hansen, B., and Burnett, D. (1997). Experiments with a spoken dialogue system for taking the U.S. census. Free Speech Journal, 1, February 25, 1997.
Fanty, M., Sutton, S., Novick, D., and Cole, R. (1995). Automated appointment scheduling. ESCA Workshop on Spoken Dialogue Systems, Vigsø, Denmark, May 1995, 144-147.
Grosz, B. J., and Sidner, C. L. (1986). Attention, intentions, and the structure of discourse. Computational Linguistics, 12, 175-204.
Hansen, B., Novick, D., and Sutton, S. (1996). Systematic design of spoken prompts. Conference on Human Factors in Computing Systems (CHI'96), Vancouver, BC, April 1996, 157-164.
Marshall, C., and Novick, D. G. (1995). Conversational effectiveness in multimedia communications. Information Technology & People, 8(1), 54-79.
Novick, D., and Sutton, S. (1994). An empirical model of acknowledgment for spoken-language systems. Proceedings of ACL-94, Las Cruces, NM, June 1994, 96-101.
Novick, D., Walton, L., and Ward, K. (1996). Contribution graphs in multiparty conversations. Proceedings of the International Symposium on Spoken Dialogue (ISSD-96), Philadelphia, PA, October 1996, 53-56.
Novick, D., and Ward, K. (1993). Mutual beliefs of multiple conversants: A computational model of collaboration in air traffic control. Proceedings of AAAI'93, Washington, DC, July 1993, 196-201.
Schober, M., and Clark, H. (1989). Understanding by addressees and overhearers. Cognitive Psychology, 21, 211-232.
Wynn, E., and Novick, D. (1996). Relevance conventions and problem boundaries in work redesign teams. Information Technology & People, 9(2), 61-80.
This paper was published as: Novick, D., Marshall, C., Hansen, B. and Ward, K. (1998). Implications of co-presence and acceptance for cooperative systems, Working Notes of the COOP 98 Workshop on The Use of Herbert H. Clark's Models of Language Use for the Design of Cooperative Systems, Cannes, FR, May, 1998.