CHAPTER I
INTRODUCTION
This
dissertation presents the motivation, methodology, and results of a qualitative
study of mixed-initiative conversational interaction. The study examines speech
meta-acts which handle the various functions of coördinating and producing
conversational discourse. It addresses the problem of multi-agent computer and
computer-human interaction by examining human‑human interaction. In the
day-to-day experience of ordinary human interaction, people encounter and deal
with multi-agent communication situations all of the time. If the processes
humans use to produce conversation can be understood, these insights may
facilitate human-computer interaction that conveniently resembles the
interactive discourse of independent human agents.
Human-computer
interaction tends to be fragile and frustrating while human-human interaction
displays robustness over a wide variety of circumstances (Hayes & Reddy,
1983). Some of the differences between the two forms of interaction are
summarized in Figure 1. In human-computer interaction, typically either the
human or the computer controls the flow and structure of the interaction. This is
single-initiative interaction in the sense that only one of the parties is
empowered to initiate exchanges. This form of interaction is largely
pre-planned, either by the human or by the designers of the program. Even so,
it is largely because they are forced by the interface (or have learned from
earlier frustration) to use well-formed language that the human and the
computer are able to understand each other at all. In contrast, human-human
conversations are characterized by unpredictability, ungrammatical utterances,
non-verbal expression, and mixed‑initiative control in which the
conversants take independent actions. In the face of an unexpected shift of
conversational context, somehow humans are able to understand that the shift
has occurred. Our every-day language is notoriously ill-formed, full of improper
usages, mismatched agreements, halts and starts, in-line correction of self and
other, and utterances that may not even be words. Moreover, we interpret and
express a wide range of non‑verbal behaviors as a concomitant of verbal
interaction. In human-human conversation, both conversants have the power to
seize the initiative to meet their own needs and expectations. If the language
of human-computer interaction seems stilted and tame, the language of ordinary
human-human conversation is wild and untamed.
            HHI                 HCI
control     mixed-initiative    single‑initiative
actions     situated            pre-planned
language    informal, relaxed   formal, stilted
Figure 1. Differences between interaction modes. Human‑human interaction (HHI)
is generally more flexible than human‑computer interaction (HCI).
Traditional
natural-language systems--even those based on speech acts--are largely unable
to handle these aspects of “feral” language because they mainly rely on parsing
sentence-level interaction and fail to account properly for the context created
by the conversation itself. Conversational processes are apparently not
characterized by top-down planning; they are made up of actions which are
responsive to the dynamic conversational situation. A striking aspect of
conversational discourse is its coherence--the character of making sense to the
conversants. What accounts for the coherent nature of conversation? Despite the
incoherence of human-human interaction to an observer to whom only the words
are given, the interaction is coherent for the participants; the conversants
take turns, make interruptions, detect and cure misunderstandings, and resolve
ambiguous references.
How
can these processes of control be modeled formally in a manner sufficient to
bring this sort of coherence to human‑computer interaction? Although
speech act theory adds clarity to the role of intention in interactive
discourse, and implies a central function for intention in making discourse
coherent, intention is not sufficient to explain the coherent nature of
interactive discourse. I argue in this dissertation that shared models play a
key role in the mechanics of conversation and creation of coherence. If
interactive discourse is achieved through a process of maintenance of a shared
model, the conversants will have to use feedback in their discourse as a
process control. This feedback is contained in non‑sentential aspects of
conversation such as nods, fragmentary utterances, and correction. Such actions
by the conversants, based on the context of their interaction, determine the
form of the conversation. These actions are thus “meta” to the conversation
because they are about the conversation itself rather than constituting its
primary substance. In this view ungrammaticality, for example, is not a problem
but evidence of these “meta” acts.
This
dissertation develops a theory of meta-locutionary acts that explains these
control processes. The theoretical approach developed and used here is based on
a synthesis of two schools of thought with respect to natural language.[1]
The first, speech-act theory, is a branch of the
philosophy of language. The second, conversational analysis,
is a branch of the sociology of language. The differences between these
approaches result in what I term the integration problem. I propose a solution
to the integration problem and then use this result to build an operator-based
model of conversation. The theory, which encompasses a
taxonomy of meta-locutionary acts, thus extends speech-act theory to
real-world conversational control.
The
theory of meta-locutionary acts was refined and validated by a protocol study
of human‑human interaction and a computational simulation of this
interaction. In the protocol study, subjects were given a coöperative problem‑solving
task in a simple domain which necessitated linguistic interaction. The
conversants’ behaviors, both verbal and non‑verbal, were transcribed, and
their interaction was mapped into illocutionary and meta-locutionary acts. The
simulation was developed as a rule-based system written in Prolog. Because the
system is based on the theory of meta-locutionary acts, it shows
that the theory has practical explanatory power. It extends the work of Power
(1979) to meta-locutionary phenomena. The system contains independent knowledge
bases for both conversants and can represent their separate conversational
knowledge simultaneously. The conversants’ simultaneous actions can then be
simulated in the rule‑based system on a cycle-by-cycle basis. Simulations
of illocutionary and meta-locutionary acts in the conversations obtained in the
protocol study showed that meta-locutionary acts are capable of providing
contextually appropriate control of mixed-initiative discourse.
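The cycle-by-cycle scheme described above can be illustrated with a minimal Python sketch. It is not the Prolog system developed in this dissertation; the names (`Conversant`, `run_cycle`, `simulate`, and the `inform` and `ack` act labels) and the two toy rules are assumptions made only for illustration. Each conversant keeps its own knowledge base, and on every cycle both conversants choose an act from the previous cycle's state before either act takes effect, so that their actions are treated as simultaneous.

```python
from dataclasses import dataclass, field


@dataclass
class Conversant:
    name: str
    beliefs: set = field(default_factory=set)   # private knowledge base
    to_say: list = field(default_factory=list)  # domain facts not yet uttered

    def choose_act(self, heard):
        """Pick one act based on what was heard on the previous cycle."""
        if heard is not None and heard[0] == "inform":
            # Hypothetical meta-locutionary rule: acknowledge new information.
            return ("ack", heard[1])
        if self.to_say:
            # Hypothetical illocutionary rule: assert the next domain fact.
            return ("inform", self.to_say.pop(0))
        return None


def run_cycle(a, b, heard_by_a, heard_by_b):
    """One cycle: both choose from the old state, then both acts take effect."""
    act_a = a.choose_act(heard_by_a)
    act_b = b.choose_act(heard_by_b)
    for speaker, hearer, act in ((a, b, act_a), (b, a, act_b)):
        if act is not None and act[0] == "inform":
            speaker.beliefs.add(act[1])
            hearer.beliefs.add(act[1])
    # Each conversant hears the other's act on the next cycle.
    return act_b, act_a


def simulate(a, b, cycles):
    """Run the given number of cycles; return a list of (act of a, act of b)."""
    trace = []
    heard_by_a = heard_by_b = None
    for _ in range(cycles):
        heard_by_a, heard_by_b = run_cycle(a, b, heard_by_a, heard_by_b)
        trace.append((heard_by_b, heard_by_a))
    return trace
```

Choosing both acts before applying either is one simple way to keep the two knowledge bases independent within a cycle; a richer rule set could replace the two toy rules in `choose_act` without changing the cycle structure.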
The
dissertation concludes with an assessment of the computational model, including
proposals for extension, and a discussion of the implications of meta-locution
for understanding human-computer interaction.
Speech-Act Theory
When
language is viewed as a series of assertions about the world, the reasons for
speaking and the way language is comprehended are difficult to understand. A
breakthrough in understanding the functions of language was made by the
development of a theory of speech acts.
Problems of Speech-Act Theory
Speech-act
theory has not been without critics. Levinson (1981) argued that speech acts
are inadequate for modeling dialogue. First, he argued, the immense variety of
surface utterances cannot be reduced to a limited set of acts. Second, speech
acts do not appear to correspond neatly to sentences. Third, assignment of
meaning to acts cannot be accomplished by a set of rules expressing
conventions; rather, this process must involve inferential techniques that take
many aspects of context into account. Fourth, sequence‑oriented speech‑act
models are too simple to account for the complexity of real‑world
exchanges, and coherence cannot, as a matter of principle, be explained by rule‑based
systems. The first three points are addressed directly by the theory advanced
in this dissertation. It is plausible that speech acts are specialized by
domain. This is analogous to observations of development of sublanguages in
communities of speakers with specialized interests or needs. In this case,
viewing the process of conversation as a domain itself leads to the observation
that speech acts need not correspond to sentence‑level discourse units.
The circumstance that early speech-act models were simple does not mean that more
sophisticated rule‑based speech-act models cannot embody situated action
in response to complex contexts. As I argue in this dissertation, the fault is
not in the idea of speech act theory but rather in confining its application to
sentence‑level meaning. Finally, Levinson’s argument about the
computational properties of rule-based systems is mistaken. One could dispute
matters such as convenience of representation, but Levinson’s observation that
simple rules cannot explain some exchanges does not reasonably lead to the
conclusion that this problem is an inherent one. Rule‑based
systems are not inherently limited to modeling adjacency-pair exchanges. Bowers
and Churcher (1988), for example, have shown that a rule-based system modeling
speech acts can account for complex situated communicative actions. They do so
by extending the theory of illocutionary acts from individual performance to
social accomplishment; illocution is understood in “dialogical” terms.
Another
problem with speech‑act theory has been the difficulty with what are
called “indirect” speech acts (Perrault & Cohen, 1980). The problem which
indirect speech acts pose is that of explaining why an utterance which on
its face appears to be one sort of act is nevertheless easily
understood as an altogether different sort of act. Indirect acts are often the
result of social conventions for politeness and seemly behavior. For example,
“Gee, that cake sure looks good.” is apparently an act of assertion when the
conversants both know the act is really a request: “Please give me a piece of
that cake.” Understanding and explaining indirect acts is a major open problem
for speech-act theory. The protocol analysis in Chapter V makes a modest gain
in this area; the evidence suggests that indirect speech acts have utility in
terms of efficiency of conversational communication.
There
is also a continuing argument over whether conversants must have the ability to
recognize illocutionary acts as such (Bach & Harnish, 1979; Cohen & Levesque,
1986), but the outcome of this debate is not critical to the points made in
this dissertation. What is important about speech acts here, though, is a view
of language as a series of acts with effects on listeners, or even as
coöperative acts which have effects on a jointly constructed conversational
structure. Construed broadly, speech acts can be seen as the principal units
of inter‑agent interaction; they are the acts which build conversations.
Winograd and Flores (1986) describe language as a form of human social action.
Language as acts, they argue, creates a consensual domain: interlinked patterns
of activity. While Austinian speech-act theory was developed to explain overt
effects of sentence-level utterances, these qualities can be relaxed to include
mental effects in the hearer and sub‑sentence level utterances. As I will
show in this dissertation, such extensions of speech act theory lead to
plausible computational models for interactive discourse.
Extension of Speech-Act Theory
Tying
speech acts to sentence-level utterances appears to be a limitation which does
not take adequate account of the relationship between acts and language. Some
acts are physical acts masquerading as speech acts. Others are speech acts
which look like physical acts. For example, a light switch which reacts to loud
noises might turn itself on when someone yells “Lights on!” but would also work
if they yelled “I now pronounce you husband and wife.” Conversely, a cough
might embody a request for someone to be quiet. In particular, not all effects
which speakers intend to cause through their illocutionary acts are physical in
nature like passing the salt. As we have seen, some illocutions can make
changes in legal relations. Other illocutions have as their effect another
speech act. Thus speaking “Do you have the time?” may lead to the hearer saying
“Yes, it’s ten of two.” Indeed, as is centrally important to this dissertation,
the perlocutionary effect of an utterance may involve the conversation itself.
Thus, for example, “Shut up!” may be intended by the speaker to cause the hearer
to stop talking. In a more pleasant and coöperative context, speaking “Go on.”
may be intended by the speaker to cause the hearer to continue speaking (and
maybe even effect some purely mental change in the hearer’s beliefs about the
speaker’s level of comprehension). Such utterances can be viewed as
meta-locutionary acts because they concern and affect the process of their own
conversation. This dissertation, then, develops a framework for research into
speech meta‑acts which handle the various functions of coördinating and
producing discourse as well as research developing semantics and pragmatics for
these meta-acts which are sufficiently complete to permit computational
modeling of simple interaction in conversational discourse.
Mutuality of Knowledge
The
model proposed in this dissertation is based on the idea that the
meta-locutionary acts of participants in a conversation are directed toward
maintenance of some set of mutual beliefs. Before tackling the problem of what
knowledge is shared, though, the question of what constitutes sharing must
first be addressed. This issue was studied by Clark and Marshall (1981) in
their research on definite reference. They noted what they termed the mutual
knowledge paradox: People clearly understand definite reference in a finite
time, but this ought to take infinite time because to be certain of a reference
the conversants are caught in an infinitely recursive evaluation of the form “B
knows that A knows that B knows that A knows that p.”
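The regress can be made concrete with a short Python sketch. The function name `nested_knowledge` and the string representation are assumptions for illustration and are not part of Clark and Marshall's account. Each additional level of certainty wraps the proposition in one more "knows that" clause, so no finite depth exhausts the conditions for mutual knowledge.

```python
def nested_knowledge(p, depth, agents=("A", "B")):
    """Build the statement at the given depth of the regress.

    Depth 0 is the bare proposition; each further level adds one
    'X knows that' clause, alternating agents from the inside out.
    """
    statement = p
    for level in range(depth):
        statement = f"{agents[level % 2]} knows that {statement}"
    return statement
```

Calling `nested_knowledge("p", 4)` yields the four-level form quoted above; any larger depth is just as easy to write down, which is the point of the paradox.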
Clark
and Marshall resolved the mutual knowledge paradox through use of copresence
heuristics. In the simplest case, direct copresence creates mutual knowledge
when both parties simultaneously observe a domain feature and are aware that
the other party is also observing the feature. For example, two people sitting
across from each other at a dinner table observe simultaneously a candle in the
center of the table, thus also observing each other observe. Copresence heuristics
eliminate the infinite recursion of the mutual knowledge paradox by allowing
the parties to have direct knowledge of the conclusion. These direct copresence
heuristics are supplemented (and here the supplements may be far larger than
the original corpus) by indirect copresence heuristics that provide an
equivalent basis for concluding mutual knowledge. That is, direct copresence is
not necessary where the circumstances or the parties’ general knowledge about
each other provide a substitute for physical copresence. Clark and Marshall
described an extensive set of indirect copresences, the details of which are
far beyond the scope or needs of this dissertation. As an example, though, I
note the case where two conversants can infer that they are both members of a
community (
In
the case of interactive discourse, the notion of mutual knowledge through
copresence heuristics applies directly to conversants’ sharing of knowledge
about the conversation itself. In conversing, the direct copresence heuristic
applies to the utterances of both parties because they hear both their own and
the other’s utterances while in each other’s presence. Thus the conversants
have mutual knowledge of their conversation (and they know it, because the
mutuality is “on the table”). The conversants also share a directly copresent
knowledge not only of the individual utterances but also of the contextual
structure which the utterances create for the conversation. This, then, is what
I mean generally when I refer to a shared model of interactive discourse. Note
that this exposition covers only what is meant by sharing and does not reach
the issues of what specifically is shared or what is significant in what is
shared.
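As a hedged illustration of this sense of sharing, the following Python sketch treats an utterance as mutually known when both conversants were present to hear it. The names `Utterance` and `shared_model`, and the representation of presence as a set of names, are assumptions introduced here; the sketch says nothing about which shared items are significant.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Utterance:
    speaker: str
    text: str
    present: frozenset  # names of the parties who heard the utterance


def shared_model(utterances, a, b):
    """Return the utterances that a and b both heard in each other's presence.

    Direct copresence stands in for the infinite regress: no nested
    'a knows that b knows ...' check is performed.
    """
    return [u for u in utterances if {a, b} <= u.present]
```

For example, an utterance recorded with `present=frozenset({"A", "B"})` is returned by `shared_model(log, "A", "B")`, while one heard only by A is not.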
Even
though Clark and Marshall’s work was based on resolving a problem of
understanding definite reference, their concept of mutual knowledge is straightforwardly
extensible to other aspects of the domain and the conversants’ view of the
domain. Moreover, even the nominally limited field of definite reference is
broad enough to support reasonable inference about the role of shared
structures in interactive discourse. When a conversant makes a reference to a
(conversational) domain referent, the effect in the hearer might be a simple
one of affording recall of or attention to some specific and insular thing.
More likely, though, is that the referent is one part of a complex memorial
structure. If there is a specific and definite referent with copresence, the
speaker can reasonably infer the hearer’s shared knowledge. But even in such a
case the referent is likely to have remindings and shadings of meaning for the
hearer which differ from those of the speaker. To the extent that copresence
heuristics are available for these remindings (community, shared experiences,
“our song”) the remindings are part of the shared knowledge of the
conversation; beyond this extent, the conversants’ knowledge is a matter for
inference at best and more likely ignorance. That is, often there must be
remindings about which the other party has little or no knowledge, and the
parties presumably know this.
My
point here is that the boundary between shared and unshared knowledge is not
sharp. Rather, a field of imprecision and conjecture exists where shared
dissolves into unshared as the inferences of indirect copresence become
increasingly implausible. Thus a shared model of a conversation will be
clearest for the facts of the utterances themselves (because of direct
copresence) and increasingly indistinct for the memorial extensions of
reference and the semantics of the conversants’ actions. For another
conversant, as for any observer, the imprecision of ascertaining the meanings
of a conversant’s actions in response to illocutionary acts is exacerbated to
the extent that the action is a mental one. That is, some responsive actions
are physical—like opening a door or passing the salt; other actions are
mental—like remembering
Given
these fundamental concepts of speech acts and mutuality, I can begin to address
the research question around which this proposal is centered. A vocabulary,
though, does not in itself constitute a theory. In the next chapter, then, I
examine the state of related research and the frontier at which my own problem
is formed. Specifically, Chapter II looks at the problem of applying speech‑act
theory to real‑world language, examines evidence for shared models of
conversations, and discusses the role of repair in maintaining coherence.
Chapter III presents the theory of meta-locutionary acts as a way of explaining
coherence in terms of conversational acts analogous to domain-level
illocutionary acts. Chapter IV discusses the methodology of a protocol study of
human‑human conversation. In the study, the results of which are reported
in Chapter V, a conversational protocol is analyzed in terms of
meta-locutionary acts. Chapter VI presents a simulation of the protocol
conversation using rule-based operators. The operators represent an application
of the theory of meta-locutionary acts; they are able to use the theory to
account for significant parts of the interaction observed in the protocol.
Finally, Chapter VII summarizes the results of the dissertation research,
discusses issues of intentionality which arose out of the research, and presents
thoughts about related future work.
1. Speech-act theory is the source of the terms locution, illocution and
perlocution. The spoken words are a locution; illocution is the rhetorical
force of the utterance; and perlocution is the change wrought by the
illocution. The term meta-locution is an invention of the author's which refers
to speech about speech. A meta-locutionary act is an illocutionary act which
has as its intended perlocutionary effect a change in the conversation in which
the act itself occurs.