Chapter III
A Theory of Meta-Locutionary Acts
Approaches to the Integration Problem
If the descriptive requirements for speech acts are too restrictive, in what
useful way might some requirements be relaxed? How could the resulting entities
still be characterized as acts? One way to approach this problem is to enlarge
the domain of the acts. Typically, speech acts concern the general domain of
the conversation--the subject of the conversants’ shared model. That is,
traditionally defined speech acts have been applied to effectuating changes in
the extra‑conversational context, which is what I earlier called the
“top” level of conversation. This means that speech acts have not been applied
to actions concerning the conversation itself, or to human cognitive mechanisms
such as recollection or focus of attention. As Cohen’s (1984) work made clear,
Austinian speech acts cannot adequately explain attention‑focusing
utterances independent of some act of predication. This failure does not
invalidate the theory of the process of generation and comprehension of
language which is implicit in speech-act theory. Under this theory, speakers
are goal-directed independent agents whose instrumentalities for achieving
their goals include language acts. Recall that

The conversants share responsibility for the maintenance of the model. Note that
here I am referring specifically to the conversants’ model of their own
interaction rather than a model of their mutual beliefs about the subject of
the conversation. Jernudd and Thuan (1983) characterized the behavior of
conversants in this way, noting that the social aspects of language provide a
set of shields or protocols which allow conversants (and here the emphasis is
of course on hearers) to employ a wide variety of correction (i.e.,
maintenance) resources. These protocols, they stated, allow for shared
responsibility for speaking. Clark and Wilkes-Gibbs independently proposed a
principle of mutual responsibility:
The participants in a conversation try to
establish, roughly by initiation of each new contribution, the mutual belief
that the listeners have understood what the speaker meant in the last utterance
to a criterion sufficient for current purposes. (Clark & Wilkes-Gibbs,
1986, p.33)

Clark and Wilkes-Gibbs described the operation of the principle of mutual
responsibility in the case of definite reference but, as Jernudd and Thuan
suggested, such a principle is appropriate for most aspects of interactive
discourse. For definite reference, this process involves initiation and
iterative expansion, replacement, or repair until the conversants arrive at a
version of the reference they mutually accept. The conversants try to minimize
collaborative effort. Thus if the speaker utters an elementary noun phrase the
hearers can presuppose their acceptance without taking an extra turn. Clark and
Wilkes‑Gibbs noted Schegloff’s finding that speakers are usually allowed
to present utterances without interruption; repair communications come in the
interstices between utterances. Schegloff et al. (1977) in fact stated that
other-initiation is withheld in order to allow speakers an opportunity to
initiate repair themselves. Clark and Wilkes-Gibbs (1986) then observed that
this explains why allowing a new contribution to proceed is
tantamount to a mutual acceptance of the old one.
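The acceptance process for definite reference can be illustrated with a minimal Python sketch. Everything named here (Move, refer, picky_hearer) is a hypothetical illustration rather than part of Clark and Wilkes-Gibbs's account. The sketch assumes a hearer who either lets the current noun phrase stand or offers a revised version, and it treats the absence of repair as acceptance.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Move:
    speaker: str  # who produced the move
    kind: str     # "initiate", "repair", or "accept"
    phrase: str   # the version of the reference under discussion


def refer(initial: str,
          hearer: Callable[[str], Optional[str]],
          max_rounds: int = 5) -> List[Move]:
    """Iterate presentation and repair until the reference is accepted.

    `hearer` is a hypothetical stand-in for the hearer's judgment: it
    returns None to let the phrase stand, or a revised phrase to repair it.
    """
    moves = [Move("speaker", "initiate", initial)]
    current = initial
    for _ in range(max_rounds):
        revision = hearer(current)
        if revision is None:
            # No repair initiated: allowing the conversation to proceed
            # is treated as (possibly implicit) acceptance.
            moves.append(Move("hearer", "accept", current))
            return moves
        moves.append(Move("hearer", "repair", revision))
        current = revision
    return moves  # no acceptance reached within the round limit


def picky_hearer(phrase: str) -> Optional[str]:
    # Hypothetical hearer who needs the color mentioned before accepting.
    if "red" in phrase:
        return None
    return phrase.replace("the ", "the red ", 1)


moves = refer("the piece with four feet", picky_hearer)
```

In this sketch an elementary noun phrase that the hearer already accepts costs only the initiating move and an implicit acceptance, which is one way of reading the preference for minimal collaborative effort.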

Cohen’s (1984) approach to this problem was to separate the goals of reference and
predication and to satisfy them separately. He suggested that in the case of
apparent predication without reference the solution is to understand the
utterance as an indirect speech act of reference. In either case, there has to
be an action of referent identification. Thus Cohen argued that referent
identification should be treated as an action that speakers request, and that
the speech act of referring should be considered a kind of request. In all
three instances (Jernudd & Thuan, 1983; Clark & Wilkes-Gibbs, 1986;
Cohen, 1984), the solution to the integration problem comes through applying
speech-act-like analysis to various intra‑conversational aspects of
discourse. In effect, each breaks up speech acts to cover the particulars of
the problem studied. The theory of meta‑locutionary acts
presented in this dissertation extrapolates this method to a generalized
approach to interactive discourse.

Although the extension of speech acts to meta‑communication is novel, the idea of
meta‑communication as a factor accounting for coherence has been the
subject of at least preliminary research. It appears that the choice of a
direct speech act for embodying an indirect act is, at least in part, a
function of social relationship. The form of act then conveys as meta‑communication
the nominal social statuses of the conversants (Sanford & Roach, 1987).
Conversants can then use this meta‑information to interpret indirect
speech acts. Note, though, that this approach does not involve relaxing
constraints of Austinian speech acts. Rather,

There is some empirical evidence in support of techniques which break down domain‑level
speech acts. Schegloff et al. (1977) found that where one conversant initiates
repair of another’s utterance, the operations of locating the repairable
discourse units and supplying a candidate repair are separated. The techniques
for initiating a repair of another’s utterance are techniques for locating the
“trouble source.” In this case the actions of maintaining the shared model of
the conversation can be broken down at least into actions which shift the (sub‑)
focus to a problem with the model and actions which repair the problem.
A Computational Model of Meta-Locutionary Acts
In the previous discussion I have reviewed the fundamental reasons for
computational models of interactive discourse which follow the successful route
of speech-act theory but which relax the theory by applying it to the shared
model of the conversation itself. This is the consequence of the principle of
shared responsibility. However, the discussion so far has been synthetic, in
that it has attempted to extrapolate from various approaches to the integration
problem yet has largely remained in the realm of motivation rather than
extension. Here, then, the discussion becomes developmental. In this section, I
outline the areas in which computational models of interactive discourse can be
used to analyze this problem.
Requirements for a Model
The search for a meta-locutionary model does not begin in a vacuum. Schegloff and
others have shown how repair has structure. This structure is not coextensive
with chunks of linguistic expressions such as lexical items, intonations, or
syntactic constructions. Rather, as I understand the import of this work, it is
the relationship among these chunks which creates a repair structure through
the local dynamics of the communication itself (Jernudd & Thuan, 1983;
Suchman, 1987).

One way in which the mechanism of the feedback process has been examined is through
timing and coördination of correction utterances. (This is a sort of
meta-coördination issue.) The socio‑linguistic research examining how
people direct the course of conversation and repair its inherent troubles has
shown that conversants manage who is to talk at which times through an
intricate system of turn-taking. It turns out that there is a pattern and
structure to initiation of repairs which can be described in terms of
domain-level turn-taking:
The `repair-initiation opportunity space’ is
continuous and discretely bounded, composed of initiation-opportunity positions
at least some of which are discretely bounded. The positions are adjacent, each
being directly succeeded by a next, some being themselves
composed internally of a set of `sub-positions’. [Footnote omitted.] (Schegloff
et al., 1977, p. 375)

I expect that instances of positive feedback will fall neatly into these
positions as well, indicating prophylactically to the speaker that the shared
model is consistent thus far.

Feedback in interactive discourse displays other procedural regularities. Schegloff et
al. (1977) noted that repairs are subject to two strong preferences: speakers
prefer to repair their own utterances rather than let hearers do it; and
speakers prefer to initiate their own repairs rather than let hearers prompt
them to do it. As a result, conversants repair their own utterances as soon as
they detect problems. This is consistent with the results of Clark and Wilkes-Gibbs
(1986), who found that speakers prefer to make their own expansions of noun
phrases without prompting. There are places in the conversational flow which
the conversants mutually recognize in which repair initiation ought to occur if
repair were needed. A place-marking act which does not initiate repair can then
be understood as indicating comprehension (Suchman, 1987).

At this point I want to interject, though, some methodological perspective on
these results. From the published research, it appears that Schegloff et al.
studied transcripts of verbal interaction. These transcripts at best coded
pause, intonation, and audible breath; although “non-linguistic” behavior such
as gesture may be parenthetically noted, they may have missed other extra‑linguistic
aspects of discourse such as gaze. As a result, the only attempts at correction
which are reflected in the research analysis are verbal ones. It may be, and
preliminary protocol analysis of my own so suggests, that nonverbal (but
nonetheless linguistic) communication may contain a significant level of
correction, repair, and other feedback information. As a consequence, other‑initiated
repairs may be underreported and the preference for self-repair correspondingly
over‑reported. To the extent to which they are true, though, do the reported
aspects of the process have any advantage for interactive discourse? Clark and
Wilkes-Gibbs (1986) observed that conversants also try to avert potential
exchanges as the hearer tries to correct any misunderstanding; that is, the
conversants strive to achieve mutual acceptance of a reference with minimal
conversational cost. These preferences have the effect, then, of minimizing
collaborative effort. Accordingly, we can develop a number of standards by
which computational models of feedback in discourse can be evaluated. Is the
model a performance model with respect to repair opportunities? Does the model
inherently serve to minimize collaborative effort?
Levels of Feedback for Modeling
There are differences among kinds of feedback, in the sense of supportive versus
corrective, or self‑repair versus other-initiated repairs. Goodman (1986)
found four kinds of communicative failures in the water-pump assembly
conversations he studied: erroneous specificity, improper focus, wrong context,
and bad analogy. There are also differences in the level of the feedback with
respect to the multi‑stratification of conversation. Some corrections and
assurances are obvious because they are embodied in sentence-level utterances.
For example, Grosz reported the following fragment:
S1: The lid is attached to the container with
four ¼-inch bolts.
S2: Where are the bolts? (Grosz, 1977, p. 353)

Very typical of the dialogues reported by Cohen is this exchange:
S: Take that red piece. It’s got four little
feet on it.
J: Yeah.
S: And put the small end into that hole on the
air tube--on the big tube.
J: On the very bottom.
S: On the bottom, yes.
J: Okay. (Cohen, 1984, p. 140)

In the first example, the request for clarification is at the extra-conversational
level of Austinian speech acts. In the second exchange, we begin to see
fragments: utterances which cannot be coded directly as speech acts. This
exchange, even though purely textual, shows the beginning of the journey down
through the layers of the conversational structure; it can still be coded
reasonably through a minimally relaxed set of speech acts. There are some kinds
of perlocutionary effects, though, which can be achieved linguistically, but
not through communication of intent (Cohen, 1984). These are farther down in
the structure. It is not clear where the bottom is. Do conversants recognize
intent for fundamentally intra-conversational acts? Actually, this question is
not settled even for domain-level acts (cf. Allen & Perrault, 1980). Recognition,
as with other conversational processes, need not be conscious. At some point,
though, we will reach another set of not‑usually‑negotiated
communicative units through which the collaborative structure is created.
Forms of Computational Models
The goal of this research is to identify and to model computationally a set of acts
relative to the conversational levels. The set of acts which we choose as a
representation for the real acts in the world of actual conversants depends on
the computational structures within which the acts are used in the processes of
comprehension and generation. Is the process simply pattern-matching? Parsing? Inferential?

Clark and Wilkes-Gibbs (1986) stressed that the principle of mutual responsibility
operates in a context of widespread, natural ambiguity and uncertainty.
Conversants need not assure perfect understanding of each utterance but only
understanding to a criterion sufficient under the circumstances. The dynamics
of the collaborative process encourage minimizing the collaborative effort,
thus implicitly encouraging some measure of uncertainty in understanding.
Certainly, hearers will accept uncertainty when they anticipate that the gaps
in their understanding will be filled in later. But, as Clark and Wilkes-Gibbs
pointed out, the hearer has to tolerate uncertainty always: Although
conversants might like to accept each element mutually, second by second as
they proceed, this represents an impractical ideal. First, there is the
probability that some definite references cannot be understood fully until
later in the utterance. Second, I note that each part of the utterance itself
is a case of the speaker being ahead of the hearer. This is exactly what Grosz
(1981) described as the speaker always being one step ahead of the hearer. This
suggests two further points about the nature of feedback itself: First, no
matter how the observed discourse units are subdivided or aggregated (and they
always can be), the speaker is always one unit ahead of the hearer, and both speaker
and hearer may be depending on the hearer’s feedback to maintain their model of
the conversation. Second, the simultaneous presence of discourse segments,
units, fragments, and atoms of various sizes, almost always overlapping one
another, suggests that feedback with respect to the mutual model of the
conversation must itself be multi-layered, with related temporally and
referentially dependent communicative functions. Adequate computational models
of interactive discourse, then, must reflect this multi-layered, simultaneous
character.
Meta-Locutionary Acts
To determine a set of illocutionary meta-acts, I start with the taxonomy of acts
proposed by Searle (1969) that forms the foundation of speech-act theory. Searle
suggested that the types of illocutionary acts are promise, request, assert (or
affirm), question, thank, advise, warn, greet, and congratulate. For each type,
Searle defines the act’s propositional content, the conditions preparatory to
its use, the intent (which Searle calls the sincerity) of the speaker, and the
act’s essential meaning. Can these acts be applied directly in a
meta-locutionary way?
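One way to make the question concrete is to write an act type down as a record with Searle's four components and then try binding it to a conversational rather than a domain-level action. The following Python sketch does this for the request. The class ActType, its field names, and the placeholder scheme are hypothetical, and the wording of the conditions paraphrases the standard analysis of requesting in Searle (1969); the chapter itself names the four components without spelling out their content.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class ActType:
    """An illocutionary act type described by Searle's four components."""
    name: str
    propositional_content: str
    preparatory: Tuple[str, ...]
    sincerity: str
    essential: str

    def instantiate(self, **bindings: str) -> Dict[str, object]:
        """Fill the {S} (speaker), {H} (hearer), and {A} (act) placeholders."""
        return {
            "name": self.name,
            "propositional_content": self.propositional_content.format(**bindings),
            "preparatory": tuple(p.format(**bindings) for p in self.preparatory),
            "sincerity": self.sincerity.format(**bindings),
            "essential": self.essential.format(**bindings),
        }


# Paraphrase of the conditions on requesting (an assumption of this sketch).
REQUEST = ActType(
    name="request",
    propositional_content="future act {A} of {H}",
    preparatory=(
        "{H} is able to perform {A}",
        "it is not obvious that {H} will perform {A} unprompted",
    ),
    sincerity="{S} wants {H} to perform {A}",
    essential="counts as an attempt by {S} to get {H} to perform {A}",
)

# Hypothetical meta-locutionary binding: the requested act concerns the
# conversation itself (yielding the turn) rather than the domain.
meta_request = REQUEST.instantiate(S="speaker", H="hearer", A="give-turn")
```

Under this reading the act type is unchanged and only the bound act differs, which is one of the two possibilities considered later in this chapter.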
The Role of Acts in Language
Some of Searle’s acts do not have obvious analogs for the control processes of
conversation. For example, greeting and congratulating seem inherently tied to
the domain of social relations. The mere existence of other acts might be
questioned. What, after all, are acts in language? The easiest act to
understand is a request. Requests have an obvious motivation for the speaker;
namely, a goal that the hearer perform the requested act. But in what sense is
an assertion an act? Searle says that a “propositional act” can be a
constituent part of an illocutionary act of asserting. The apparent, surface
motivation for assertions is a goal that the hearer know
something. Yet is it ever the case that the speaker really wants as an end in
itself that the hearer should know something? Typically, one might surmise, the
speaker really wants the hearer to know something because the speaker believes
that this will bring about some further, elaborated state which is the
speaker’s greater goal. If a child wants someone to bring it a drink, the child
might say “Hey, I’m thirsty.” This is an act of assertion which has as its
immediate goal the state that the hearer should know of the speaker’s thirst.
It can further be seen as an act of requesting, where the speaker’s goal is to get
the hearer to bring the speaker a drink. This case is a clear example of an
indirect speech act. Thus, one can ask, is an assertion ever an act in itself?
Actually, examples of assertion are not hard to find. A tutor’s statement,
under the usual circumstances, that “The capital of Delaware is Dover.” is an
act of assertion that has as its intended effect the acceptance of the content
of the statement by the tutor’s student.6 The student might in
return say “OK,” thus making an assertion that she understands the tutor’s
statement. In the context of the control of conversations, an act analogous to
the student’s confirmation would be a conversant’s acknowledgment that it was
the other conversant’s turn to speak.
Domain‑Specific Acts
If meta‑locutionary acts are the extension of illocutionary acts to
subsentential, meta‑locutionary phenomena in conversation, then are these
acts the same acts which are used at the sentential, domain level? One
possibility is that the set of acts is the same, and that the instantiation of
the acts with meta‑locutionary states differentiates them from acts
instantiated with domain states. The other possibility is that the acts are
different in themselves. That is, there is one set of acts which is peculiar to
meta‑locutionary phenomena and another set for domain knowledge. In fact,
this position might even be extended by the idea that every different domain
has a different set of acts and that meta‑locutionary phenomena simply
comprise a particular, recurrent domain. The first position I will call universalist; the latter situated, in the spirit of situated‑action
analysis (Suchman, 1987).

The existence of different sets of acts for different domains postulated in the
situated position is not as bleak a representational prospect as might be
imagined. Specialization of acts does not negate the idea of ontological
structure for classes of acts. Just as we can apply old words in new domains,
or understand new words in old domains, so too can we interpret unfamiliar
acts. It may also be the case that acts are reified by specialized use. In
other words, the constant use of particular (perhaps partial) instantiations of
certain acts may lead to general acceptance of these instantiations as acts in
and of themselves. Jernudd and Thuan (1983) pointed out that speech acts and
their meanings may be negotiated between conversants. Of course, a huge number
of pre-negotiated acts is part of the language as we
learn it. As a consequence, one would expect to find the greatest levels of
specialization in the domains which we most frequently use. If the process of
conversation is itself considered as a domain, then the use of specialized meta‑locutionary
operators should be highly developed.

Note that ontological relationships should still hold among acts which are
specialized for the meta-domain of conversational control. Suchman’s (1987)
situated action hypothesis, that conversants have highly developed operators
which respond to local context, also leads to the conclusion that the acts used
in the experimental domain may fairly be represented as domain‑specialized
acts. Indeed, meta‑locutionary acts ought to be derivable as
instantiations of more basic acts. Accordingly, at the turn‑taking level,
I suggest that there are specialized acts which take into account the
particulars of these behaviors. These acts include acknowledge‑turn, give‑turn,
request‑turn, and hold‑turn. Each act should, in theory, be derivable
from some set of more general acts. For example, request‑turn could be
expressed as
act ( Me, request ( Other, act ( Other, assert (
belief ( turn ( Me )))))).
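The derivation of a specialized act from more general ones can be sketched as the construction of a nested term. The Python fragment below reads the expression as saying that Me requests that Other assert a belief that the turn is Me's. The class Expr and the helpers t and request_turn are hypothetical names introduced only for illustration; they are not the representation used in the simulation.

```python
from dataclasses import dataclass
from typing import Tuple, Union

Term = Union[str, "Expr"]


@dataclass(frozen=True)
class Expr:
    """A nested term such as act(Me, request(...))."""
    functor: str
    args: Tuple[Term, ...]

    def __str__(self) -> str:
        return f"{self.functor}({', '.join(str(a) for a in self.args)})"


def t(functor: str, *args: Term) -> Expr:
    """Shorthand constructor for a term."""
    return Expr(functor, args)


def request_turn(me: str, other: str) -> Expr:
    """Expand the specialized act request-turn into general acts."""
    return t("act", me,
             t("request", other,
               t("act", other,
                 t("assert", t("belief", t("turn", me))))))


print(request_turn("Me", "Other"))
# prints: act(Me, request(Other, act(Other, assert(belief(turn(Me))))))
```

Treating request-turn as a function that returns general acts keeps the specialized name available for routine use while preserving its relationship to the more abstract acts.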

The concept of turn itself might then be explicated further. The point here,
though, is that there are domain‑specific acts which have an ontological
relationship to a more abstract set of acts upon which conversants might rely
in an unfamiliar context. Givón (1984) observed that as languages develop, they
grow gradually less ambiguous and more richly coded. That is, in their early
stages languages tend to be highly polysemous. As linguistic functions become
routinized, specialized functionality develops. In the context of speech acts,
then, the general acts identified by Searle represent the ontological
primitives from which the specialized meta‑acts developed as the
functions they facilitated became routine.

A form of taxonomy of meta‑locutionary acts can be developed using the
general classifications of speech acts as an outline. More precisely, there is
a forest of taxonomies where each tree corresponds to a conversational level. I
will follow the taxonomic outline proposed by Bach and Harnish (1979) for
communicative illocutionary acts.7 Constatives express the conversant’s belief and her intention that
the other conversant have or form a like belief. Directives express the conversant's attitude toward some
prospective action by the other conversant and the first conversant's intention
that her utterance, or the attitude it expresses, be taken as a reason for the
second conversant's action. Commissives
express the conversant's intention and belief that her utterance obligates her
to do something (perhaps under certain conditions). Acknowledgments express feelings regarding the other conversant or
convey formal or perfunctory expressions that social obligations be met. The
use of Bach and Harnish’s particular taxonomic system is intended as an
explanatory aid; it is the identification of (and the relationships among) the
various meta‑locutionary acts which matter most. Using these distinctions
then, Figures 3, 4, 5, and 6 present taxonomic trees for certain meta‑locutionary
levels of the conversational process; respectively, these are turn‑taking,
repair of mutual models, reference/information, and attention.
communicative illocutionary acts: turn‑taking

constatives directives commissives acknowledgments
│ │ │ │
hold‑turn(<person1>)
give‑turn(<person1>,<person2>)
take‑turn(<person1>)
accede(<person1>,turn(<person2>))
FIGURE 3. Taxonomy of meta‑locutionary acts
for turn‑taking.
communicative illocutionary acts: repair of mutual
models

constatives directives commissives acknowledgments
│ │ │ │
repeat(<person1>,Act)
clarify(<person1>,State)
confirm‑mutual(<person1>,<person2>,State)
comprehend(<person1>,State)
FIGURE 4. Taxonomy of meta‑locutionary acts
for repair of mutual models.
communicative illocutionary acts:
reference/information

constatives directives commissives acknowledgments
│ │ │ │
assert(<person1>,<person2>,State) acknowledge(<person1>,Act)
deny(<person1>,State)
request(<person1>,<person2>,Act)
FIGURE 5. Taxonomy of meta‑locutionary acts
for information.
communicative illocutionary acts: attention

constatives directives commissives acknowledgments
│ │ │ │
attend(<person1>,<person2>)
unattend(<person1>,<person2>)
FIGURE 6. Taxonomy of meta‑locutionary acts
for attention.
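The forest of taxonomies in Figures 3 through 6 can also be written down as a simple data structure, one tree per conversational level. The Python sketch below is a hypothetical encoding: it records only the act names and argument slots listed in the figures, and it does not encode which of Bach and Harnish's four classes each act falls under. The names FOREST, levels_of, and signature are assumptions introduced for illustration.

```python
from typing import Dict, List, Tuple

Signature = Tuple[str, ...]

# One entry per conversational level; each act maps to its argument slots.
FOREST: Dict[str, Dict[str, Signature]] = {
    "turn-taking": {
        "hold-turn": ("person1",),
        "give-turn": ("person1", "person2"),
        "take-turn": ("person1",),
        "accede": ("person1", "turn(person2)"),
    },
    "repair of mutual models": {
        "repeat": ("person1", "Act"),
        "clarify": ("person1", "State"),
        "confirm-mutual": ("person1", "person2", "State"),
        "comprehend": ("person1", "State"),
    },
    "reference/information": {
        "assert": ("person1", "person2", "State"),
        "acknowledge": ("person1", "Act"),
        "deny": ("person1", "State"),
        "request": ("person1", "person2", "Act"),
    },
    "attention": {
        "attend": ("person1", "person2"),
        "unattend": ("person1", "person2"),
    },
}


def levels_of(act_name: str) -> List[str]:
    """Return the conversational levels at which an act is defined."""
    return [level for level, acts in FOREST.items() if act_name in acts]


def signature(level: str, act_name: str) -> Signature:
    """Return the argument slots of an act at a given level."""
    return FOREST[level][act_name]
```

Keeping the levels as separate trees reflects the point that the acts at different levels can be in play at the same time.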
Summary
In this chapter, I presented the foundations and elaboration of a theory of meta‑locutionary
acts. The theory arises from the principle of mutual responsibility in conversation,
which holds that conversants share the responsibility for maintaining the
coherence of their interaction. This implies, among other things, that
conversants must have means for confirming copresence and the effects of
conversational acts on the other conversants. These control functions can be
modeled as speech acts through relaxation of constraints on Austinian speech
acts. In particular, going beyond the mapping of speech acts to domain (i.e.,
extra-conversational) matters permits the acts to reference and affect the
conversation itself; that is, for these acts, the conversation is the domain. Such meta‑locutionary
acts, then, are reflected in a computational model which proposes to encompass
the processes of repair and feedback which maintain the conversational models.
These acts cross conversational levels and are simultaneous. Finally, I
presented a taxonomic account of the meta‑locutionary acts, with
particular application to turn‑taking, repair of mutual models,
information, and attention.

Given the theory, though, two questions arise. First, does the theory correspond to
and account for the interaction observed in actual conversations? Second, is
the theory adequate for computer‑based interaction? The question of the
explanatory power is addressed in Chapters IV and V. The question of the
applicability of the theory is addressed by the computational simulation in
Chapter VI.

6. The fact that the tutor’s act may be a pure assertion does not imply that the
tutor lacks more complex intentions which are served by the act. The tutor may
have intentions, for example, that the student will pass an examination.
However, the tutor cannot achieve this directly through an act. Rather, the
tutor--through assertion--takes an act which has a change in the mental state
of the student as its immediately intended effect.

7. In addition to communicative acts, Bach and Harnish (1979) also described
“conventional” illocutionary acts, which are not relevant to the theory
developed in this dissertation. Briefly, conventional illocutionary acts
include effectives and verdictives, which mainly concern socially conventional
acts such as vetoing a bill or finding a defendant guilty.