Chapter VII
Conclusion
The
results of this study, stated broadly, suggest that it is feasible to extend
speech-act analysis to meta-locutions, that humans use feedback to maintain
coherence, and that irregularities in speech are clues to understanding rather
than noise to be cleaned away. A
computational model of meta-locutionary acts was developed and validated
through both a protocol study and a rule-based simulation.
Summary
This
dissertation began by posing a question which I called the integration problem:
how can the speech-act theory of conversational action be reconciled with the
sociological view of the irregularities of actual interactive discourse? In response, I suggested that conversants
maintain a shared model of their conversation; that is, they are jointly
creating a single conversation and each conversant has a model of the
conversation they believe themselves to be creating. In light, then, of the coöperative,
feedback-infused, locally-controlled nature of conversation, I suggested that
the conversational repairs which maintain the shared model can be seen as acts
akin to speech acts, except that they specifically affect the process of the
conversation itself rather than more general aspects of the world related, for example,
to conversants’ underlying needs.
The
view of conversational coherence as being produced by meta-acts of the
conversants establishes the goal of the research reported in this
dissertation: to identify and to
model computationally a set of acts relative to these sub-sentential levels of
conversation. I then described a set of
conversational effects, as reported in the sociological literature, which
formed a basis for ascribing meta-acts to the conversants. I presented an experimental technique to
observe these acts.
I
applied the theory of meta-locution to the protocols which I had obtained. That
is, given the theory, did it account for the interaction observed in actual
conversations? The theory suggested that
meta-locutionary acts (acts by a conversant which control the process of the
conversation) could be modeled at several different conversational levels,
ranging from domain actions to turn-taking.
In the case of turn-taking specifically, for example, the set of
directive acts included hold-turn, give-turn, take-turn, and accede(turn). The interaction observed between the
conversants could be explained computationally in terms of this set of
meta-locutionary acts. Examples of
operators were presented which, given the state of a conversant's model of the
conversation, would account for the observed acts. In three specific cases, I used the
meta-locutionary analysis to explain otherwise anomalous utterances. The protocol analysis also confirmed the
observation that humans use feedback to maintain coherence. Indeed, most of the acts accounted for in the
protocol are not domain-level acts but rather concern and affect the control
and comprehension of the conversation itself.
It is also true that the conversation obtained in the protocol contains
all kinds of interruptions and sentence fragments. The meta-locutionary analysis of the protocol
evidence is consistent with a view that each utterance, however fragmentary,
has meaning in the conversational context.
In understanding human communication, the individual fragments of the
conversation are valuable as clues to coherent interpretation. I note that in the observed interaction, the
production of utterances is inherently tied to understanding of the other
conversant’s utterances. Moreover, a
number of the utterances are contemporaneous, suggesting that timing of
utterance production is a mutually determined process, as might be expected in
a coherent mixed-initiative dialogue.
This suggests that in analyzing interactive discourse, language
production and language understanding cannot be separated.
The
computational model developed through the protocol analysis was tested through
a rule-based simulation. Note that the
simulation did not constitute the theory; rather, the theory was partially
implemented as a rule-based system to evaluate its adequacy for computer-based
interaction. The simulation presented in
Chapter VI showed that meta-locutionary operators can control mixed-initiative
discourse in a manner that resembles the kind of interaction observed in
natural conversation. Meta-locutionary
acts specifically modeled in the simulation included the requesting, taking,
giving, and holding of conversational turns.
The simulation also showed situated action producing a form of global
coherency through the use of local, context-driven operators.
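The idea of local, context-driven operators selecting turn-taking acts can be illustrated with a minimal sketch. It is not the rule-based system described in Chapter VI; the class ConversationModel, its two fields, and the mapping from states to acts are assumptions made for illustration. Only the act names (hold-turn, give-turn, take-turn, accede(turn)) come from the text.

```python
from dataclasses import dataclass


@dataclass
class ConversationModel:
    """Hypothetical fragment of one conversant's model of the conversation."""
    has_turn: bool    # the conversant believes it currently holds the turn
    wants_turn: bool  # the conversant has something it wants to say


def select_turn_act(model: ConversationModel) -> str:
    """Choose a turn-taking act from the local state only (no global plan).

    The state-to-act mapping below is an illustrative assumption.
    """
    if model.has_turn:
        # Keep the floor while there is more to say; otherwise release it.
        return "hold-turn" if model.wants_turn else "give-turn"
    # Without the floor: claim it if needed, otherwise accept the other's turn.
    return "take-turn" if model.wants_turn else "accede(turn)"


if __name__ == "__main__":
    for has_turn in (True, False):
        for wants_turn in (True, False):
            state = ConversationModel(has_turn, wants_turn)
            print(state, "->", select_turn_act(state))
```

Each call consults only the conversant's current model, which is the sense in which any coherence over a longer exchange would arise from local operators rather than from a plan.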
Open Issues
The
research presented in this dissertation represents a set of first steps in
understanding the control of mixed-initiative discourse. The central thesis of accounting for coherence
through meta-locutionary acts appears well-founded. That the particular acts proposed here
represent the ultimate expression of the theory seems subject to doubt. The simulation study made clear the need for
specialized operators (and presumably acts) dealing with correction and
repair. Other classes of acts will
probably emerge. Aside, though, from the
obvious future work in this area involving the expansion and refinement of the
acts, and aside from the design issues discussed with respect to the
implementation of the simulation, there are other issues to address. Some of these issues are representational,
and have to do with understanding what we are observing in conversations
recorded in protocol studies. Other
issues arise with respect to the other kinds of human-human interaction to which
this methodology might be applied.
Representation Issues
With
respect to the process of transcribing and encoding the protocols, there remain
aspects of the conversations which are as yet unrepresented. First, pauses, while implicitly represented
in the time-line transcript, are not specifically noted as either things or
acts. It is possible that pauses may, in
and of themselves, be acts. The
interpretation of pauses as possible acts seems subjective in the extreme,
however. Chafe (1986) suggested, in the
context of generation, that pauses correspond to the process of memorial
retrieval; that is, concepts have activation levels which determine the
retrieval costs for related concepts.
According to Chafe, patterns in discourse are shaped by the speaker's
costs of retrieval corresponding to active, semi-active, and inactive
concepts. In any event, the problem of
accounting for pauses remains unsatisfactorily addressed by the
meta-locutionary model.
Another
problem arises out of the subjective nature of the coding of acts from the
utterances, which is necessarily a process of interpretation. In a qualitative study in which rules for
coding acts from lexical representations have not been expressed, it falls upon
the coder to make a subjective interpretation of the act in the context of the
conversation. In the case of the
protocol analyzed in this dissertation, this process has been an iterative one,
with successive refinements and alterations of the encodings of the acts and
the states in order to produce a rational account of the observed
conversation. Does not this process
result in an encoding for the conversation which must work, under the
circumstances? The answer to this
question is in three parts. First, although
the acts are motivated by accounts in the socio-linguistic literature, the
development of the specific acts and their computational representations is the
express goal of this research. That is,
this process is the one which Cohen (1984) proposes. Second, the set of acts, their
representations, and the operators, are validated by application in the
simulation. Third, there is a need for
future work in applying the acts developed on the basis of the protocol
reported here to other protocols. If effectiveness
outside of the context of their development can be observed, then the acts may
reasonably be considered valid.
Other Voices, Other Rooms
The
theory may also have application for other sources of human interaction and
other contexts. Clearly, the theory of
meta-locutionary acts could be applied in other domains. If the theory is true, then meta-locutionary
acts should be consistent among different speakers of a given language;
otherwise we would find it hard to converse at all. Moreover, the acts should be fairly
domain-independent. The dissertation
research looked at two different pairs of speakers, both in the
sequence-recollection domain. The main
obstacle to application of the theory to other domains is the difficulty of
representing the domain knowledge with sufficient exactness to permit the
simulation to run. The complexity of
even the sequence-recollection domain, especially as compared to the
meta-locutionary knowledge, turned out to be daunting. The problem is not that it is hard to find
some representation which would permit a computer program to perform the domain
task in the abstract. Rather, the
problem is in linking the representation of the domain to mental states which
can form the basis for linguistic action (Stucky, 1988).
Speech
act theory has also been applied to communication within organizations
(Winograd & Flores, 1986). There may be
meta-locutionary analogs for communicative control and coherence in this
interaction. This may actually be a
fruitful domain because of the relatively formalized and statically represented
nature of intra-organizational communication.
Cognitive Constraints
The
protocol analysis and simulation suggest two areas of inquiry as to the role of
cognitive constraints in conversation. Understanding
these constraints would aid explication of the observed discourse and would
immediately serve to improve communication through computer-human interfaces.
Language Units
One
of the issues to which the simulation forced attention was that of the size of the
units of communication. How much can
people understand at once? For computers
this is generally not a problem; if the agents were simply Prolog programs they
could have just transferred their sequences to each other and merged them. Obviously, most people cannot do this; the
protocol subjects certainly didn’t. The
human cognitive constraints which determine how much we can absorb at a time
thus have consequences for the design of mixed-initiative interfaces. The computer program must accommodate the
human’s limited capacity. Conversely,
there may be constraints on the human capacity for linguistic production. Systems which are receiving information from
humans may be better equipped to understand the interaction if they can relate
the size of the units of language produced to factors associated with the
production process.
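The contrast between a wholesale transfer and a capacity-limited one can be sketched as follows. This is an illustrative sketch only: the function split_into_units and the parameter max_unit are hypothetical, and the limit is an assumed stand-in for a cognitive constraint, not a measured quantity from the protocols.

```python
from typing import List, Sequence


def split_into_units(sequence: Sequence[str], max_unit: int) -> List[List[str]]:
    """Split a sequence into consecutive units of at most max_unit items.

    max_unit is an assumed limit on how much a listener can absorb at once.
    """
    if max_unit < 1:
        raise ValueError("max_unit must be at least 1")
    return [
        list(sequence[start:start + max_unit])
        for start in range(0, len(sequence), max_unit)
    ]


if __name__ == "__main__":
    items = ["a", "b", "c", "d", "e"]
    # A program-to-program transfer could send everything as one unit.
    print(split_into_units(items, len(items)))
    # A capacity-limited exchange sends smaller units, one at a time.
    print(split_into_units(items, 2))
```

In an interface built along these lines, each unit would presumably be followed by an opportunity for feedback before the next unit is produced.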
The
variation in human skills for interaction is large (see, e.g., Argyle &
Cook, 1976). Presumably conversants take
account of and adapt for these differences, which inject a correspondingly
large measure of uncertainty into the interaction process: Is the other conversant really
attending? Was that a nod? Variation in the size of the units of
language which conversants can produce or understand creates uncertainty as to
(1) the degree of understanding and (2) the degree to which conversants believe
that they have been understood. These
factors, then, suggest reasons why feedback is such an important part of
conversation. Conversants are
continually faced with issues of understanding which are not ordinarily
resolvable through one-sided inference.
Thus greater understanding of the limits of human cognitive capacity
would better allow modelers of interaction to base operators on the underlying
reasons for the use of meta-locutionary acts.
Tolerance of Uncertainty
Having
observed that conversation is replete with sources of uncertainty, I now turn
to some of the issues of how conversants manage the uncertainty they
encounter. The central question I want
to ask is: How tolerant of uncertainty
are conversants? The answer to this
question has important implications for problems such as deciding when to
initiate conversational repairs. In the
model and simulation, uncertainty was reduced to qualitative simplifications
such as true and mutually_known_true.
Yet conversants apparently proceed with imperfect knowledge of the state
of the conversation or, when faced with obvious differences in their
conversational models, will continue without repair if the differences are not
too great. How much imperfection is too
much? How great a difference is too
great?
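One way to make the question concrete is to treat tolerance as an explicit parameter. The sketch below is an assumption-laden illustration, not part of the model or simulation, which used only qualitative values: the dictionary representation of a conversational model, the disagreement measure, and the default tolerance of 0.25 are all hypothetical.

```python
from typing import Dict


def model_difference(mine: Dict[str, bool], attributed: Dict[str, bool]) -> float:
    """Fraction of propositions on which two conversational models disagree.

    A proposition present in only one model counts as a disagreement.
    """
    keys = set(mine) | set(attributed)
    if not keys:
        return 0.0
    differing = sum(1 for key in keys if mine.get(key) != attributed.get(key))
    return differing / len(keys)


def should_repair(
    mine: Dict[str, bool],
    attributed: Dict[str, bool],
    tolerance: float = 0.25,  # hypothetical threshold, not an empirical value
) -> bool:
    """Initiate repair only when the difference exceeds the tolerance."""
    return model_difference(mine, attributed) > tolerance


if __name__ == "__main__":
    my_model = {"p": True, "q": True, "r": False, "s": True}
    others_model = {"p": True, "q": False, "r": False, "s": True}
    print(model_difference(my_model, others_model))
    print(should_repair(my_model, others_model))
    print(should_repair(my_model, others_model, tolerance=0.2))
```

What value the tolerance should take, and whether a single number is adequate at all, is exactly the open empirical question raised above.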
In
addition to uncertainty with respect to mutuality of knowledge, there is
uncertainty as to whether purely mental perlocutionary effects have been
achieved. If a tutor asserts some fact
to a student, is the tutor justified in assuming that the student now knows the
fact? How well is well enough? These are issues which might be addressed by
psycho-linguistic experimentation.
Conversations could be induced in which responses indicating various
degrees of comprehension are returned to the subjects.
Finally,
if conversants are not certain of the other’s knowledge, they are also
uncertain as to their own knowledge--or at least their understanding of their
knowledge. As they listen, conversants
seem to assume that if they are momentarily off track they will eventually
recover using later information. Thus
they may continue to give affirming signals to the speaker despite imperfect
understanding of what is being said.
Again, this phenomenon seems to be a matter of gradation. The level of tolerable uncertainty probably
varies with context. To what extent,
then, are conversants apt to defer repair?
In interacting with computers, the perceived (or real) bother of repair
may lead people to defer it to the point of irreparability; the program’s limited
ability to track the conversant’s knowledge may have been quickly exhausted.
Modality
The
bandwidth limitations of human-computer interaction appear to limit the richness
and ease of meta-locutionary action.
Differences in modality cause changes in interaction patterns (see, e.g.,
Argyle & Cook, 1976; Chapanis, et al., 1972, 1977; Grosz, 1982; Cohen,
1984). The changes are not always
straightforward:
Under telephone or no-vision conditions,
utterances are often shorter, some studies find more pauses, and less mutual
influence, but there are fewer interruptions.
There is some evidence of poorer synchronizing (more pauses), and for transfer
of function to other signals (more attention signals). There is a reversal of a clear-cut
expectation about interruptions, suggesting either that people learn to use
different cues over the telephone, or that the shift of gaze cue is not very
helpful. Evidently people either do not
or cannot interrupt each other if they cannot see each other. (Argyle & Cook, 1976, pp. 163-164)
This
evidence suggests that with present technology the human-computer interface is
at an inherent disadvantage for mixed-initiative interaction. There are a number of approaches for possible
solutions to this problem. The first
approach would involve alternate lexicalization of meta-locutionary acts which
are expressed in the physical channel.
The idea here is to provide, through design, a set of verbal lexemes
that correspond to physical kinemes.
Unfortunately, the history of artificial languages in human use is
dim. The languages which thrive are
those created through conventional acceptance of negotiated units and
meanings. This suggests a second but more
difficult approach: provide the computer
program with (1) the skills to negotiate the use of meta-locutionary
interaction and (2) connections to the community of programs in which the
language is being negotiated. Human
beings would have, I surmise, great difficulty in developing language skills
(and the language itself) if isolated from the general community of language
users. If anything, computer programs
are more susceptible to failure on this account because their language-learning
skills are at best rudimentary.
Beyond
alternate lexicalization, it may be that different modalities lead to the use
of different meta-locutionary acts altogether.
That is, conversants do not simply express the same acts in different
ways; rather, they actually interact differently. This is, I think, a fair inference from the
results described by Argyle and Cook (1976) and Grosz (1982). Some of the observed differences may not
arise directly out of the limitations on the kinds of communication
permitted. Instead, there may be
second-order effects from, for example, declines in the speed of feedback. That is, feedback would still be possible
through the available channels, but the process of conversion would be too
slow. Such effects certainly might lead
to significant changes in the naturalistic feel of the interaction. Furthermore, it may be the case that the
feedback-based production processes are, through nature or through entrenched
habit, dependent on prompt interaction for their effectiveness. In this case, substitute acts for feedback
would have to account for speed of interaction.
In computer-human interaction through graphically and aurally based
interfaces, the ease and speed of feedback is asymmetric. The computer program can provide feedback
much more rapidly than it can interpret it.
This suggests that if meta-locutionary acts are to be useful for
computer interfaces, greater efforts should be made in the technology of
physical input devices.
Humans and Computers
Over
the course of the preceding chapters, I have tried to account for the
differences between human-human and human-computer interaction. Many of these differences, I argued, can be
explained by understanding mixed-initiative discourse as a multi-level process
which uses meta-acts to produce coherence.
In reaching these conclusions (and in applying them), though, we again
face the limits of our knowledge of fundamental aspects of mind. In the case of intention, for example, the
model presented in this dissertation achieves a reasonable level of success by
ascribing intentionality to meta-behaviors.
The upper levels of the conversational process rely on fairly diffuse
intentions which are then translated into smaller and more specific
intentions. In this process, planning is
not used (although it might be useful in generating the high-level intentions);
rather, the intentional acts set up expectations which are taken into account
as part of the conversational context for later action. But as I discussed in Chapter III, the limits
of this approach are near. It may be
that intention itself is not a useful concept for inducing action. Suchman (1987) suggested that plans are a post-hoc
rationalization of the organization of behavior rather than the mechanism which
produces it. How can we exclude, then,
the possibility that intention itself is simply a post-hoc rationalization
of the causes of behavior rather than the mechanism which leads to it? The problem, though, is that at this point we
simply do not have an alternative theory of action. If not for intention, how else--or why
else--would people do things?
Another
fundamental limit arises out of what we perceive to be mixed-initiative
interaction. In this dissertation, I
have attempted to explicate mixed-initiative discourse through meta-acts. But if these acts are all co-temporally
related behaviors at different conversational levels, the problem of mixing
initiative also may arise at the meta-locutionary levels. I have suggested that the “lower” levels of
conversational interaction represent a set of base cases for this recursion,
but this cannot be concluded with confidence absent further research.
Yet
even if we are reaching limits of knowledge, I hope that these limits are
better defined and better illuminated as the result of this work. In the issues which I discussed in this work,
the fundamentals come down to the fact that research into human-computer
interaction must tell us more about what it means to be a human being, what it
means to be a computer, and what it means to interact.