A Theory of Meta-Locutionary Acts
Approaches to the Integration Problem
If the descriptive requirements for speech acts are too restrictive, in what
useful way might some requirements be relaxed? How could the resulting entities
still be characterized as acts? One way to approach this problem is to enlarge
the domain of the acts. Typically, speech acts concern the general domain of
the conversation--the subject of the conversants’ shared model. That is,
traditionally defined speech acts have been applied to effectuating changes in
the extra‑conversational context, which is what I earlier called the
“top” level of conversation. This means that speech acts have not been applied
to actions concerning the conversation itself, or to human cognitive mechanisms
such as recollection or focus of attention. As Cohen’s (1984) work made clear,
Austinian speech acts cannot adequately explain attention‑focusing
utterances independent of some act of predication. This failure does not
invalidate the theory of the process of generation and comprehension of
language which is implicit in speech-act theory. Under this theory, speakers
are goal-directed independent agents whose instrumentalities for achieving
their goals include language acts. Recall that
The conversants share responsibility for the maintenance of the model. Note that here I am referring specifically to the conversants’ model of their own interaction rather than a model of their mutual beliefs about the subject of the conversation. Jernudd and Thuan (1983) characterized the behavior of conversants in this way, noting that the social aspects of language provide a set of shields or protocols which allow conversants (and here the emphasis is of course on hearers) to employ a wide variety of correction (i.e., maintenance) resources. These protocols, they stated, allow for shared responsibility for speaking. Clark and Wilkes-Gibbs independently proposed a principle of mutual responsibility:
The participants in a conversation try to establish, roughly by initiation of each new contribution, the mutual belief that the listeners have understood what the speaker meant in the last utterance to a criterion sufficient for current purposes. (Clark & Wilkes-Gibbs, 1986, p. 33)
Clark and Wilkes-Gibbs described the operation of the principle of mutual responsibility in the case of definite reference but, as Jernudd and Thuan suggested, such a principle is appropriate for most aspects of interactive discourse. For definite reference, this process involves initiation and iterative expansion, replacement, or repair until the conversants arrive at a version of the reference they mutually accept. The conversants try to minimize collaborative effort. Thus if the speaker utters an elementary noun phrase the hearers can presuppose their acceptance without taking an extra turn. Clark and Wilkes‑Gibbs noted Schegloff’s finding that speakers are usually allowed to present utterances without interruption; repair communications come in the interstices between utterances. Schegloff et al. (1977) in fact stated that other-initiation is withheld in order to allow speakers an opportunity to initiate repair themselves. Clark and Wilkes-Gibbs (1986) then observed that this explains why allowing a new contribution to proceed is tantamount to a mutual acceptance of the old one.
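The acceptance process just described can be sketched as a simple loop. This is only an illustrative sketch under stated assumptions, not a model proposed by Clark and Wilkes-Gibbs: the function name `establish_reference`, the representation of versions as strings, and the `hearer_accepts` predicate are all hypothetical. The sketch shows the speaker presenting successive versions of a reference until one is mutually accepted, and counts the hearer turns spent on repair as a crude stand-in for collaborative effort.

```python
from typing import Callable, List, Tuple


def establish_reference(
    versions: List[str],
    hearer_accepts: Callable[[str], bool],
) -> Tuple[str, int]:
    """Present successive versions of a reference until one is accepted.

    Returns the accepted version and the number of hearer turns spent
    signaling trouble. A count of 0 means the first presentation was
    accepted simply by letting the conversation proceed.
    """
    repair_turns = 0
    for version in versions:
        if hearer_accepts(version):
            return version, repair_turns
        repair_turns += 1  # the hearer takes an extra turn to signal trouble
    raise ValueError("no mutually accepted version was reached")


# Hypothetical example: the hearer needs the expanded noun phrase.
accepted, cost = establish_reference(
    ["the red piece", "the red piece with four little feet"],
    lambda version: "feet" in version,
)
```

In this hypothetical run the first noun phrase is rejected and the expansion is accepted, so the cost is one repair turn; an elementary noun phrase accepted at once would cost none.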
Cohen’s (1984) approach to this problem was to separate the goals of reference and predication and to satisfy them separately. He suggested that in the case of apparent predication without reference the solution is to understand the utterance as an indirect speech act of reference. In either case, there has to be an action of referent identification. Thus Cohen argued that referent identification should be treated as an action that speakers request, and that the speech act of referring should be considered a kind of request. In all three instances (Jernudd & Thuan, 1983; Clark & Wilkes-Gibbs, 1986; Cohen, 1984), the solution to the integration problem comes through applying speech-act-like analysis to various intra‑conversational aspects of discourse. In effect, each breaks up speech acts to cover the particulars of the problem studied. The theory of meta‑locutionary acts presented in this dissertation extrapolates this method to a generalized approach to interactive discourse.
Although the extension of speech acts to meta‑communication is novel, the idea of
meta‑communication as a factor accounting for coherence has been the
subject of at least preliminary research. It appears that the choice of a
direct speech act for embodying an indirect act is, at least in part, a
function of social relationship. The form of act then conveys as meta‑communication
the nominal social statuses of the conversants (Sanford & Roach, 1987).
Conversants can then use this meta‑information to interpret indirect
speech acts. Note, though, that this approach does not involve relaxing
constraints of Austinian speech acts. Rather,
There is some empirical evidence in support of techniques which break down domain‑level speech acts. Schegloff et al. (1977) found that where one conversant initiates repair of another’s utterance, the operations of locating the repairable discourse units and supplying a candidate repair are separated. The techniques for initiating a repair of another’s utterance are techniques for locating the “trouble source.” In this case the actions of maintaining the shared model of the conversation can be broken down at least into actions which shift the (sub‑) focus to a problem with the model and actions which repair the problem.
A Computational Model of Meta-Locutionary Acts
In the previous discussion I have reviewed the fundamental reasons for computational models of interactive discourse which follow the successful route of speech-act theory but which relax the theory by applying it to the shared model of the conversation itself. This is the consequence of the principle of shared responsibility. However, the discussion so far has been synthetic, in that it has attempted to extrapolate from various approaches to the integration problem yet has largely remained in the realm of motivation rather than extension. Here, then, the discussion becomes developmental. In this section, I outline the areas in which computational models of interactive discourse can be used to analyze this problem.
Requirements for a Model
The search for a meta-locutionary model does not begin in a vacuum. Schegloff and others have shown how repair has structure. This structure is not coextensive with chunks of linguistic expressions such as lexical items, intonations, or syntactic constructions. Rather, as I understand the import of this work, it is the relationship among these chunks which creates a repair structure through the local dynamics of the communication itself (Jernudd & Thuan, 1983; Suchman, 1987).
One way in which the mechanism of the feedback process has been examined is through timing and coördination of correction utterances. (This is a sort of meta-coördination issue.) The socio‑linguistic research examining how people direct the course of conversation and repair its inherent troubles has shown that conversants manage who is to talk at which times through an intricate system of turn-taking. It turns out that there is a pattern and structure to initiation of repairs which can be described in terms of domain-level turn-taking:
The ‘repair-initiation opportunity space’ is continuous and discretely bounded, composed of initiation-opportunity positions at least some of which are discretely bounded. The positions are adjacent, each being directly succeeded by a next, some being themselves composed internally of a set of ‘sub-positions’. [Footnote omitted.] (Schegloff et al., 1977, p. 375)
I expect that instances of positive feedback will fall neatly into these positions as well, indicating prophylactically to the speaker that the shared model is consistent thus far.
Feedback in interactive discourse displays other procedural regularities. Schegloff et al. (1977) noted that repairs are subject to two strong preferences: speakers prefer to repair their own utterances rather than let hearers do it; and speakers prefer to initiate their own repairs rather than let hearers prompt them to do it. As a result, conversants repair their own utterances as soon as they detect problems. This is consistent with the results of Clark and Wilkes-Gibbs (1986), who found that speakers prefer to make their own expansions of noun phrases without prompting. There are places in the conversational flow which the conversants mutually recognize in which repair initiation ought to occur if repair were needed. A place-marking act which does not initiate repair can then be understood as indicating comprehension (Suchman, 1987).
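One way to make the opportunity-space idea concrete is the following minimal sketch. It assumes, for illustration only, that positions can be represented as a tree whose leaves are visited in order, and that a hearer's behavior at each leaf falls into one of three hypothetical categories. The names `Position`, `HearerAct`, `leaf_positions`, and `interpret`, and the labels `p1`, `p2a`, and so on, are mine and do not come from Schegloff et al. or Suchman.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Iterator, List


class HearerAct(Enum):
    NONE = "none"                        # the hearer lets the position pass
    PLACE_MARK = "place-mark"            # marks the place without initiating repair
    INITIATE_REPAIR = "initiate-repair"  # locates a trouble source


@dataclass
class Position:
    """A repair-initiation opportunity, possibly made of sub-positions."""
    label: str
    sub_positions: List["Position"] = field(default_factory=list)


def leaf_positions(position: Position) -> Iterator[Position]:
    """Yield the adjacent, innermost positions in their order of succession."""
    if not position.sub_positions:
        yield position
        return
    for sub in position.sub_positions:
        yield from leaf_positions(sub)


def interpret(act: HearerAct) -> str:
    """Read a hearer's act at a position as feedback on the shared model."""
    if act is HearerAct.INITIATE_REPAIR:
        return "repair requested"
    if act is HearerAct.PLACE_MARK:
        return "comprehension indicated"
    return "acceptance presumed"


# Hypothetical opportunity space with one internally structured position.
space = Position("space", [
    Position("p1"),
    Position("p2", [Position("p2a"), Position("p2b")]),
    Position("p3"),
])
observed = {"p2a": HearerAct.PLACE_MARK, "p2b": HearerAct.INITIATE_REPAIR}
readings = [
    (p.label, interpret(observed.get(p.label, HearerAct.NONE)))
    for p in leaf_positions(space)
]
```

The default reading of a silent position as presumed acceptance reflects the observation that allowing a new contribution to proceed amounts to accepting the old one; whether silence should always be read this way is an assumption of the sketch.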
At this point I want to interject, though, some methodological perspective on these results. From the published research, it appears that Schegloff et al. studied transcripts of verbal interaction. These transcripts at best coded pause, intonation, and audible breath; although “non-linguistic” behavior such as gesture may be parenthetically noted, they may have missed other extra‑linguistic aspects of discourse such as gaze. As a result, the only attempts at correction which are reflected in the research analysis are verbal ones. It may be, and preliminary protocol analysis of my own so suggests, that nonverbal (but nonetheless linguistic) communication contains a significant level of correction, repair, and other feedback information. As a consequence, other‑initiated repairs may be underreported and the preference for self-repair correspondingly over‑reported. To the extent that they are true, though, do the reported aspects of the process have any advantage for interactive discourse? Clark and Wilkes-Gibbs (1986) observed that conversants also try to avert potential exchanges as the hearer tries to correct any misunderstanding; that is, the conversants strive to achieve mutual acceptance of a reference with minimal conversational cost. These preferences have the effect, then, of minimizing collaborative effort. Accordingly, we can develop a number of standards by which computational models of feedback in discourse can be evaluated. Is the model a performance model with respect to repair opportunities? Does the model inherently serve to minimize collaborative effort?
Levels of Feedback for Modeling
There are differences among kinds of feedback, in the sense of supportive versus corrective, or self‑repair versus other-initiated repairs. Goodman (1986) found four kinds of communicative failures in the water-pump assembly conversations he studied: erroneous specificity, improper focus, wrong context, and bad analogy. There are also differences in the level of the feedback with respect to the multi‑stratification of conversation. Some corrections and assurances are obvious because they are embodied in sentence-level utterances. For example, Grosz reported the following fragment:
S1: The lid is attached to the container with four ¼-inch bolts.
S2: Where are the bolts? (Grosz, 1977, p. 353)
Very typical of the dialogues reported by Cohen is this exchange:
S: Take that red piece. It’s got four little feet on it.
S: And put the small end into that hole on the air tube--on the big tube.
J: On the very bottom.
S: On the bottom, yes.
J: Okay. (Cohen, 1984, p. 140)
In the first example, the request for clarification is at the extra-conversational level of Austinian speech acts. In the second exchange, we begin to see fragments: utterances which cannot be coded directly as speech acts. This exchange, even though purely textual, shows the beginning of the journey down through the layers of the conversational structure; it can still be coded reasonably through a minimally relaxed set of speech acts. There are some kinds of perlocutionary effects, though, which can be achieved linguistically, but not through communication of intent (Cohen, 1984). These are farther down in the structure. It is not clear where the bottom is. Do conversants recognize intent for fundamentally intra-conversational acts? Actually, this question is not settled even for domain-level acts (cf. Allen & Perrault, 1980). Recognition, as with other conversational processes, need not be conscious. At some point, though, we will reach another set of not‑usually‑negotiated communicative units through which the collaborative structure is created.
Forms of Computational Models
The goal of this research is to identify and to model computationally a set of acts relative to the conversational levels. The set of acts which we choose as a representation for the real acts in the world of actual conversants depends on the computational structures within which the acts are used in the processes of comprehension and generation. Is the process simply pattern-matching? Parsing? Inferential?
Clark and Wilkes-Gibbs (1986) stressed that the principle of mutual responsibility operates in a context of widespread, natural ambiguity and uncertainty. Conversants need not assure perfect understanding of each utterance but only understanding to a criterion sufficient under the circumstances. The dynamics of the collaborative process encourage minimizing the collaborative effort, thus implicitly encouraging some measure of uncertainty in understanding. Certainly, hearers will accept uncertainty when they anticipate that the gaps in their understanding will be filled in later. But, as Clark and Wilkes-Gibbs pointed out, the hearer has to tolerate uncertainty always: Although conversants might like to accept each element mutually, second by second as they proceed, this represents an impractical ideal. First, there is the probability that some definite references cannot be understood fully until later in the utterance. Second, I note that each part of the utterance itself is a case of the speaker being ahead of the hearer. This is exactly what Grosz (1981) described as the speaker always being one step ahead of the hearer. This suggests two further points about the nature of feedback itself: First, no matter how the observed discourse units are subdivided or aggregated (and they always can be), the speaker is always this unit ahead of the hearer and both speaker and hearer may be depending on the hearer’s feedback to maintain their model of the conversation. Second, the simultaneous presence of discourse segments, units, fragments, and atoms of various sizes, almost always overlapping one another, suggests that feedback with respect to the mutual model of the conversation must itself be multi-layered, with related temporally and referentially dependent communicative functions. Adequate computational models of interactive discourse, then, must reflect this multi-layered, simultaneous character.
To determine a set of illocutionary meta-acts, I start with the taxonomy of acts proposed by Searle (1969) that forms the foundation of speech-act theory. Searle suggested that the types of illocutionary acts are promise, request, assert (or affirm), question, thank, advise, warn, greet, and congratulate. For each type, Searle defines the act’s propositional content, the conditions preparatory to its use, the intent (which Searle calls the sincerity) of the speaker, and the act’s essential meaning. Can these acts be applied directly in a meta-locutionary way?
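The four-part description of an act type lends itself to a simple record. The sketch below is illustrative only: the class name `IllocutionaryActType` and its field names are hypothetical, the values for `REQUEST` are an abbreviated paraphrase of Searle's conditions for requests (S is the speaker, H the hearer, A the act), and the turn-level variant `REQUEST_TURN` is an assumption introduced here to show what a direct meta-locutionary application might look like, not a definition from the source.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class IllocutionaryActType:
    """An act type described by Searle's four kinds of conditions."""
    name: str
    propositional_content: str
    preparatory: str
    sincerity: str
    essential: str


# Abbreviated paraphrase of the conditions for a request.
REQUEST = IllocutionaryActType(
    name="request",
    propositional_content="future act A of H",
    preparatory="H is able to do A, and S believes H is able to do A",
    sincerity="S wants H to do A",
    essential="counts as an attempt to get H to do A",
)

# Hypothetical turn-level instantiation: same conditions, with the
# propositional content drawn from the conversation itself.
REQUEST_TURN = replace(
    REQUEST,
    name="request-turn",
    propositional_content="future act A of H, where A is yielding the turn",
)
```

Only the propositional content changes between the two records here, which is one way, among others, that an act might be carried over to the control of the conversation.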
The Role of Acts in Language
Some of Searle’s acts do not have obvious analogs for the control processes of conversation. For example, greeting and congratulating seem inherently tied to the domain of social relations. The mere existence of other acts might be questioned. What, after all, are acts in language? The easiest act to understand is a request. Requests have an obvious motivation for the speaker; namely, a goal that the hearer perform the requested act. But in what sense is an assertion an act? Searle says that a “propositional act” can be a constituent part of an illocutionary act of asserting. The apparent, surface motivation for assertions is a goal that the hearer know something. Yet is it ever the case that the speaker really wants as an end in itself that the hearer should know something? Typically, one might surmise, the speaker really wants the hearer to know something because the speaker believes that this will bring about some further, elaborated state which is the speaker’s greater goal. If a child wants someone to bring it a drink, the child might say “Hey, I’m thirsty.” This is an act of assertion which has as its immediate goal the state that the hearer should know of the speaker’s thirst. It can further be seen as an act of requesting, where the speaker’s goal is to get the hearer to bring the speaker a drink. This case is a clear example of an indirect speech act. Thus, one can ask, is an assertion ever an act in itself? Actually, examples of assertion are not hard to find. A tutor’s statement, under the usual circumstances, that “The capital of Delaware is Dover.” is an act of assertion that has as its intended effect the acceptance of the content of the statement by the tutor’s student.6 The student might in return say “OK,” thus making an assertion that she understands the tutor’s statement.
In the context of the control of conversations, an act analogous to the student’s confirmation would be a conversant’s acknowledgment that it was the other conversant’s turn to speak.
If meta‑locutionary acts are the extension of illocutionary acts to subsentential, meta‑locutionary phenomena in conversation, then are these acts the same acts which are used at the sentential, domain level? One possibility is that the set of acts is the same, and that the instantiation of the acts with meta‑locutionary states differentiates them from acts instantiated with domain states. The other possibility is that the acts are different in themselves. That is, there is one set of acts which is peculiar to meta‑locutionary phenomena and another set for domain knowledge. In fact, this position might even be extended by the idea that every different domain has a different set of acts and that meta‑locutionary phenomena simply comprise a particular, recurrent domain. The first position I will call universalist; the latter situated, in the spirit of situated‑action analysis (Suchman, 1987).
The existence of different sets of acts for different domains postulated in the situated position is not as bleak a representational prospect as might be imagined. Specialization of acts does not negate the idea of ontological structure for classes of acts. Just as we can apply old words in new domains, or understand new words in old domains, so too can we interpret unfamiliar acts. It may also be the case that acts are reified by specialized use. In other words, the constant use of particular (perhaps partial) instantiations of certain acts may lead to general acceptance of these instantiations as acts in and of themselves. Jernudd and Thuan (1983) pointed out that speech acts and their meanings may be negotiated between conversants. Of course, a huge number of pre-negotiated acts is part of the language as we learn it. As a consequence, one would expect to find the greatest levels of specialization in the domains which we most frequently use. If the process of conversation is itself considered as a domain, then the use of specialized meta‑locutionary operators should be highly developed.
Note that ontological relationships should still hold among acts which are specialized for the meta-domain of conversational control. Suchman’s (1987) situated action hypothesis, that conversants have highly developed operators which respond to local context, also leads to the conclusion that the acts used in the experimental domain may fairly be represented as domain‑specialized acts. Indeed, meta‑locutionary acts ought to be derivable as instantiations of more basic acts. Accordingly, at the turn‑taking level, I suggest that there are specialized acts which take into account the particulars of these behaviors. These acts include acknowledge‑turn, give‑turn, request‑turn, and hold‑turn. Each act should, in theory, be derivable from some set of more general acts. For example, request‑turn could be expressed as
act ( Me, request ( Other, act ( Other, assert ( belief ( turn ( Me )))))).
The concept of turn itself might then be explicated further. The point here, though, is that there are domain‑specific acts which have an ontological relationship to a more abstract set of acts upon which conversants might rely in an unfamiliar context. Givón (1984) observed that as languages develop, they grow gradually less ambiguous and more richly coded. That is, in their early stages languages tend to be highly polysemous. As linguistic functions become routinized, specialized functionality develops. In the context of speech acts, then, the general acts identified by Searle represent the ontological primitives from which the specialized meta‑acts developed as the functions they facilitated became routine.
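The derivation of request‑turn from more general acts can be illustrated by treating the expression as a nested term. The following is a minimal sketch, assuming that the expression is meant to be fully nested (each operator taking the remainder as its last argument); the class `Act` and the helpers `act` and `functors` are hypothetical names introduced for the illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

Term = Union[str, "Act"]


@dataclass(frozen=True)
class Act:
    """A term made of an operator name and its arguments."""
    functor: str
    args: Tuple[Term, ...]

    def __str__(self) -> str:
        return f"{self.functor}({', '.join(str(a) for a in self.args)})"


def act(functor: str, *args: Term) -> Act:
    """Build a term; arguments are agent names or nested terms."""
    return Act(functor, tuple(args))


def functors(term: Term) -> List[str]:
    """List the operators in a term, outermost first."""
    if isinstance(term, str):
        return []
    names = [term.functor]
    for arg in term.args:
        names.extend(functors(arg))
    return names


# request-turn: I request that the other assert the belief that the turn is mine.
request_turn = act(
    "act", "Me",
    act("request", "Other",
        act("act", "Other",
            act("assert", act("belief", act("turn", "Me"))))),
)
```

Listing the operators of `request_turn` shows the specialized act bottoming out in the general acts request and assert plus the domain notions of belief and turn, which is the ontological relationship at issue.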
A form of taxonomy of meta‑locutionary acts can be developed using the general classifications of speech acts as an outline. More precisely, there is a forest of taxonomies where each tree corresponds to a conversational level. I will follow the taxonomic outline proposed by Bach and Harnish (1979) for communicative illocutionary acts.7 Constatives express the conversant’s belief and her intention that the other conversant have or form a like belief. Directives express the conversant's attitude toward some prospective action by the other conversant and the first conversant's intention that her utterance, or the attitude it expresses, be taken as a reason for the second conversant's action. Commissives express the conversant's intention and belief that her utterance obligates her to do something (perhaps under certain conditions). Acknowledgments express feelings regarding the other conversant or convey formal or perfunctory expressions that social obligations be met. The use of Bach and Harnish’s particular taxonomic system is intended as an explanatory aid; it is the identification of (and the relationships among) the various meta‑locutionary acts which matter most. Using these distinctions then, Figures 3, 4, 5, and 6 present taxonomic trees for certain meta‑locutionary levels of the conversational process; respectively, these are turn‑taking, repair of mutual models, reference/information, and attention.
FIGURE 3. Taxonomy of meta‑locutionary acts for turn‑taking. [Tree diagram: communicative illocutionary acts for turn‑taking, branching into constatives, directives, commissives, and acknowledgments; lower nodes not recovered.]
FIGURE 4. Taxonomy of meta‑locutionary acts for repair of mutual models. [Tree diagram: communicative illocutionary acts for repair of mutual models, branching into constatives, directives, commissives, and acknowledgments; lower nodes not recovered.]
FIGURE 5. Taxonomy of meta‑locutionary acts for information. [Tree diagram: communicative illocutionary acts for reference/information, branching into constatives, directives, commissives, and acknowledgments; lower nodes not recovered.]
FIGURE 6. Taxonomy of meta‑locutionary acts for attention. [Tree diagram: communicative illocutionary acts for attention, branching into constatives, directives, commissives, and acknowledgments; lower nodes not recovered.]
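The forest of taxonomies can be sketched as a nested mapping from conversational level to category to acts. This is an illustrative data structure only: the function names `empty_forest`, `add_act`, and `locate` are hypothetical, and the single placement shown (request‑turn under directives) is my assumption, based on Bach and Harnish's treatment of requests as directives, rather than a reading of the figures.

```python
from typing import Dict, List, Optional, Tuple

CATEGORIES = ("constatives", "directives", "commissives", "acknowledgments")
LEVELS = (
    "turn-taking",
    "repair of mutual models",
    "reference/information",
    "attention",
)

Forest = Dict[str, Dict[str, List[str]]]


def empty_forest() -> Forest:
    """One tree per conversational level, one branch per category."""
    return {level: {cat: [] for cat in CATEGORIES} for level in LEVELS}


def add_act(forest: Forest, level: str, category: str, name: str) -> None:
    """Attach an act to a branch; unknown levels or categories raise KeyError."""
    forest[level][category].append(name)


def locate(forest: Forest, name: str) -> Optional[Tuple[str, str]]:
    """Return the (level, category) under which an act is filed, if any."""
    for level, branches in forest.items():
        for category, acts in branches.items():
            if name in acts:
                return level, category
    return None


forest = empty_forest()
# Assumed placement, for illustration only.
add_act(forest, "turn-taking", "directives", "request-turn")
```

Keeping the levels as separate trees, rather than one tree with a level attribute, mirrors the description of a forest in which each tree corresponds to a conversational level.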
In this chapter, I presented the foundations and elaboration of a theory of meta‑locutionary acts. The theory arises from the principle of mutual responsibility in conversation, which holds that conversants share the responsibility for maintaining the coherence of their interaction. This implies, among other things, that conversants must have means for confirming copresence and the effects of conversational acts on the other conversants. These control functions can be modeled as speech acts through relaxation of constraints on Austinian speech acts. In particular, going beyond the mapping of speech acts to domain (i.e., extra-conversational) matters permits the acts to reference and affect the conversation itself; that is, for these acts, the conversation is the domain. Such meta‑locutionary acts, then, are reflected in a computational model which proposes to encompass the processes of repair and feedback which maintain the conversational models. These acts cross conversational levels and are simultaneous. Finally, I presented a taxonomic account of the meta‑locutionary acts, with particular application to turn‑taking, repair of mutual models, information, and attention.
Given the theory, though, two questions arise. First, does the theory correspond to and account for the interaction observed in actual conversations? Second, is the theory adequate for computer‑based interaction? The question of the explanatory power is addressed in Chapters IV and V. The question of the applicability of the theory is addressed by the computational simulation in Chapter VI.
6. The fact that the tutor’s act may be a pure assertion does not imply that the tutor lacks more complex intentions which are served by the act. The tutor may have intentions, for example, that the student will pass an examination. However, the tutor cannot achieve this directly through an act. Rather, the tutor--through assertion--takes an act which has a change in the mental state of the student as its immediately intended effect.
7. In addition to communicative acts, Bach and Harnish (1979) also described “conventional” illocutionary acts, which are not relevant to the theory developed in this dissertation. Briefly, conventional illocutionary acts include effectives and verdictives, which mainly concern socially conventional acts such as vetoing a bill or finding a defendant guilty.