Chapter VI
The Simulation Study
To test whether meta-locutionary operators would plausibly explain or reproduce
the behaviors observed in the protocols, a forward‑chaining rule‑based
simulation was built following the computational model developed through
protocol analysis. The simulation was a partial implementation of the theory as
a rule‑based system. This work was done in two parts: first, the creation
of a meta‑interpreter in Prolog in which to write and execute rules; and
second, the development and running of rules which represent the meta‑locutionary
operators. The goals of the simulation were to determine if the theory and
model actually produced plausible control in interaction and to serve as an
experimental environment for meta‑locutionary operators. The system has
some similarities with that written by Power (1979), who modeled a plan‑based
conversation between two robots named John and Mary. Here, for example, is the beginning of an
exchange:14
1. JOHN: MARY. (call by name)
2. MARY: YES. (acknowledgment)
3. JOHN: I WANT TO SUGGEST A GOAL. (ANNOUNCE 1)
4. MARY: GO AHEAD. (ANNOUNCE 2)
5. JOHN: WILL YOU HELP ME GET IN? (AGREEGOAL 1)
6. MARY: BY ALL MEANS. (AGREEGOAL 2) ....
7. JOHN: SHALL WE MAKE A PLAN? (ANNOUNCE 1)
8. MARY: JOHN. (call by name)
9. JOHN: YES. (acknowledgment)
10. MARY: MAY I ASK YOU SOMETHING? (ANNOUNCE 1)
11. JOHN: GO AHEAD. (ANNOUNCE 2)
12. MARY: ARE YOU IN? (ASK 1)
13. JOHN: NO. (ASK 2) (Power, 1979, p. 116)
The simulation described in this chapter is similar to that by Power,
except that the meta‑locutionary simulation (1) goes beyond
representation of illocution to address control issues, (2) uses situated
action in place of top-down planning, and (3) models observed instead of
idealized interaction.
Representation
Both
development of the rule‑based system and creation of rules embodying the
meta‑locutionary operators depended on a representational scheme for
conversational models of conversants. The representation for both the mental
states and the operators was necessarily more complex than that for the more
abstract states and operators used for the protocol analysis. Natural language
was not used directly as the representation of either states or operators. With
respect to the internal representation, there is no reason to believe that
language is a plausible representation for mental states. Stucky (1988) pointed
out that utterance structures, even translated into predicate logic, are
unlikely to be the key to adequate representations of successfully understood utterances.
Such successful states must be formed in such a way that they can tie
indexicals in an utterance to features of the context. That is, the situated
character of language action must be accounted for in the representation. With
respect to the operators and their output, Power observed:
A difficulty in writing programs of this kind is
that to understand the point of a remark one must normally begin by
understanding its literal meaning. Assuming that the programmer has nothing to
say about the latter, he must either: (a) run the conversation in the program’s
internal semantic notation, (b) translate between this notation and English by
unprincipled short cuts. (Power, 1979, p. 109)
Thus
the meta-locutionary simulation expresses its actions in its internal
representation; to do otherwise would be misleading, because the model does not
really address issues of lexicalization at all.15 There would be an
appearance of linguistic skill not actually found in the model. Interestingly,
it would have been possible to express the output acts of the simulation in
natural‑language form, perhaps making the exchanges more immediately
understandable.
In
the rule-based simulation, the conversants’ mental states were represented as
aggregations of currently‑active individual states. These states were
either beliefs or acts. Of course, the mental representation of an act in some
sense is also a belief. However, the propositional content of acts seems
distinct from other statements about the world. Acts have an inherent temporal
aspect; they occur rather than exist. An act of clarification exists while
it is happening, but then has no existence independent from its recollection.
It may have been true that an act took place, but it is no longer a part of the
world except for its effects, including its own recollection. In contrast,
beliefs about the world may have a temporal element, but they generally also
have a continued existence that is dependent on but not derived from memory.
For example, the belief that someone is confused has validity beyond the moment
of recognition of the state; such a belief has existence over some period of
time.
Accordingly,
the computational model represents acts and beliefs as follows:
state(act(Actor, <act>(<act args>), Belief_state), Cycle_added, Cycle_deleted)
state(believe(Actor, <belief>(<belief args>), Belief_state), Cycle_added, Cycle_deleted)
This
notation is similar to the Prolog‑style syntax used in representing the
protocol. Lower‑case names represent predicate labels, capitalized names
are variables, and <bracketed> items represent either predicate labels
for acts or beliefs (give_turn, next_letter), or values for arguments to the
predicates.
The
Belief_state variable contains a value expressing the truth value of the belief
or act. This approach has a number of advantages. It permits the direct and
explicit representation of belief in falsity rather than relying on non‑existence
to stand for falsity. It also encodes intentionality. The full set of values
for belief state is true, false, mutually_known_true, mutually_known_false,
goal_true, goal_false, goal_mutually_known_true, and goal_mutually_known_false.16
The
Actor variable is filled by the name of one of the conversants. The cycle
values indicate when a state entered and left memory. They do not indicate that
a fact became true or false; truth values are a function of the Belief_state
variable. A negative value for Cycle_deleted indicates that the state is still
in active memory.
Using
this notation, then, the state
state(act(angela, give_turn(phil), goal_true), 5, 8)
means
that the conversant in whose memory this state is placed had as an active fact
from cycles 5 to 8 a belief that Angela wants to give the turn to Phil. This
state alone does not indicate why the fact was removed from active memory. It
may have been that the fact was simply no longer relevant through the action of
an operator. It may have been that the fact, which expressed an intention, was
not needed after cycle 8 because the goal was no longer being sought. In the
current implementation of the system, the cycles are used by the matcher and
conflict resolver, but are not available to the operators. The implications of
this limitation are discussed below.
A
conversant's memory, then, consists of a bag of such states, where the cycles
indicate what is currently believed and what was believed before. Nothing in
the system automatically prevents the insertion of conflicting states. It is
left to the operators to assure that the states achieved are rational ones.
The
sequence-recall-task domain also presented issues for knowledge representation.
It was apparent from the protocols that although the conversants could recite
the entire list of letters, they dealt locally with confirmation and
communication. Typically, conversants would go through a short part of the
sequence of letters, and then explain that these letters were followed by N
blanks. In effect, they created subsequence substructures from the sequence as
a whole using the apparent features of the otherwise random sequences. This
behavior is consistent with that observed in psychological studies of the
ability to recall or to predict sequences. People tend to induce pattern
descriptions from the sequences and rely on notions of same and next alphabetic
characters, iteration of subpatterns, and hierarchic phrase structure (Simon,
1972). The maximal length of the subsequences in the model was set to either
(1) the extent to which the symbols were of the same sort (i.e., letter or blank),
or (2) a chunking constant representing the capacity of short‑term
memory. In this study, the chunking constant was set to five, which appears to
be a psychologically plausible value for sequence recall (Simon, 1974). The
domain-related functions which created the subsequences are listed in Appendix
J. As implemented, subsequences and letters have indexes back into their
super-sequence.
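The chunking scheme just described can be illustrated with a short Prolog sketch. It is not the code of Appendix J: the predicate names chunks/2, take_run/5, sort_of/2, and chunk_limit/1 are hypothetical, the encoding of a blank as the atom blank is an assumption, and the indexes back into the super-sequence are omitted for brevity.

```prolog
% Illustrative sketch only; all predicate names here are hypothetical.

% The chunking constant reported in the text.
chunk_limit(5).

% sort_of(+Symbol, ?Sort): the atom blank is a blank; any other symbol
% is treated as a letter (an assumption about how symbols are encoded).
sort_of(Symbol, Sort) :-
    (   Symbol == blank
    ->  Sort = blank
    ;   Sort = letter
    ).

% chunks(+Sequence, -Chunks): split Sequence into maximal runs of symbols
% of one sort, with no run longer than the chunking constant.
chunks([], []).
chunks([First|Rest], [[First|Run]|Chunks]) :-
    sort_of(First, Sort),
    chunk_limit(Limit),
    Room is Limit - 1,
    take_run(Rest, Sort, Room, Run, Remainder),
    chunks(Remainder, Chunks).

% take_run(+Symbols, +Sort, +Room, -Run, -Remainder): take up to Room
% further symbols of the given Sort from the front of Symbols.
take_run([Symbol|Rest], Sort, Room, [Symbol|Run], Remainder) :-
    Room > 0,
    sort_of(Symbol, Sort),
    !,
    Room1 is Room - 1,
    take_run(Rest, Sort, Room1, Run, Remainder).
take_run(Symbols, _Sort, _Room, [], Symbols).
```

Under these assumptions, the query chunks([i,b,blank,blank,a,b,c,d,e,f], Chunks) yields Chunks = [[i,b],[blank,blank],[a,b,c,d,e],[f]]: a change of sort ends the first two subsequences, and the chunking constant ends the third.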
The
operators are represented in standard rule form, as shown in Figure 14. The
“if”, “not”, and “test” parts of the rule represent a differentiation of
functions grouped together in the simpler “IF” part of the conversational
operators discussed in Chapter V. If the “if” clauses are true, the “not”
clauses cannot be proved, and the “test” predicates are true, then the operator
is instantiated because its felicity conditions obtain. If the operator is
executed, the “action” acts are posted to the memories of both conversants and
the “effects” states are added or deleted as indicated from the memory of the
conversant executing the operator. The attributes contain name and level
information useful to readers and to the conflict resolver; the levels are
intended to correspond to the levels of illocutionary acts presented in Figure
2.
if([ <state1>, <state2>, ... ]),
not([ <state3>, <state4>, ... ]),
test([ <predicate1>, <predicate2>, ... ]),
action([ <act1>, <act2>, ... ]),
effects([ add(<state5>), del(<state6>), ... ]),
attributes([ name(<operator_name>), level(<illocutionary_level>) ]).
Figure 14. Form of rule representing meta‑locutionary
operator. The “if” clauses are matched against the conversant's active memory.
The “not” clauses succeed if they cannot be matched against memory. The “test”
clauses are predicates, written in Prolog, which must succeed if the rule is to
be fired. If all clauses are true and the rule is fired, the acts contained as
“action” clauses are added to the memories of both conversants, with their
variables instantiated as matched in the “if” clauses. The conversant's model
is modified by the accordingly instantiated “effects” clauses.
The Rule-Based System
I
now turn to a description of the forward‑chaining rule‑based system
with which the computational model was tested. I first discuss the
specifications for the system generally, and then report on how the
specifications were met with the system as implemented.
Specifications
In
the abstract, the rule‑based system had to meet certain criteria. First,
it had to be capable of reasoning from the predicate representations of its
facts. Second, it had to have independent knowledge bases for the different
conversants while allowing for communicative action. Third, it had to be
capable of simulating simultaneous action by the conversants. And fourth, it
had to allow customizable functions for conflict‑resolution. To be able
to reason from the predicate representation of the facts, the system was built
on top of Prolog. It turns out that as implemented, virtually all of the domain
and meta knowledge was contained directly in the operators rather than in a
separate planning or reasoning subsystem.
Having
independent knowledge bases was critical to the simulation. The concept of
conversation as a jointly constructed entity meant that each conversant
maintains their own model of the conversation while believing the model to be
a shared one. To the extent that divergences in the models are detected by the
conversants, repair interaction should occur. This means that the knowledge
bases had to be set up so that operators could be written generally enough to
apply to both conversants yet avoid matching on memorial structures of the
other conversant. At the same time, a path between the agents has to exist in
order for the acts to be communicated. In the operator representation, the
actions are posted as acts in both conversants' knowledge bases.
Simultaneous
action has three aspects: inter‑agent, intra‑agent, and mental‑process.
Inter‑agent simultaneity means having both conversants take their actions
at the same mythical time as the other. A normal cycle‑by‑cycle
alternate execution of the conversants’ operators would not have permitted
simulation of simultaneous action. That is, the model agents must be able to
take action on every cycle irrespective of the implemented order of their
actual execution. This can be achieved in a specialized rule‑based system
by designing the cycling functions to take account of the two agents and their
independent knowledge bases. In contrast, Power’s (1979) agents, named John and
Mary, were not capable of simultaneous action. They were separate programs
which were explicitly given sequential turns.
Intra‑agent
simultaneity means allowing the agents to execute more than one instantiated
rule per cycle, unlike, for example, YAPS (Allen, 1983) or OPS5 (Forgy, 1981).
The hierarchical nature of illocutionary acts suggested by the feedback‑based
control of natural conversation implies that acts on the various levels are
occurring at the same time. Cohen and Levesque (1982) observed that a single
utterance may contain multiple acts. This requires special handling of the
conflict set. The agents written by Power (1979) did not have this problem
because they were plan‑based rather than rule‑based.
By
mental‑process simultaneity I mean that the system has to recognize that
understanding, reasoning, and action all have temporal extent, and all have to
be able to occur both sequentially and simultaneously. That is, while an agent
is taking one action, it may also be recognizing an action by the other agent and
reasoning about states and goals. Power (1979) did not address this issue.
Finally,
the control knowledge needed for conflict resolution has to be domain specific.
Aside from the need for executing multiple instantiations, the conflict
resolver has to be able to reason about consistent and conflicting acts and
effects, and the relationships among different classes of acts. Moreover,
conflict resolution may need to be tuned as more experience is gained with the
simulation or different hypotheses are tested with respect to reasoning about
actions. For example, if there are conflicting effects among acts at different
conversational levels, should the system execute the lowest‑level
operator or the highest‑level? Or perhaps the system should use some
other, situation‑based meta‑rules (see Davis, 1980). Although the
system is not able to handle meta‑rules, it does rely on customizable
conflict resolution in the spirit of McDermott and Forgy (1978) and ORBS
(Fickas & Novick, 1985). The basic conflict resolution strategy used in the
simulation was (1) reduce the conflict set to avoid multiple instantiations of
operators at the same illocutionary level; (2) find the lowest‑level
instantiation, and prefer it; and (3) eliminate any instantiations whose acts
and effects conflict with those of the preferred instantiation. The specific
functions which handle conflict resolution can be found under the “select_ops”
clauses in Appendix G.
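The three-step strategy can be sketched in Prolog as follows. This is an illustration under stated assumptions, not the select_ops code of Appendix G. The ordering of levels in level_rank/2 (turn lowest, then information, then domain) is assumed, instantiations are assumed to be ground terms in the rule form of Figure 14, acts are treated as states to be added, and the helper predicates and the operator name hold_turn are hypothetical.

```prolog
:- use_module(library(lists)).

% Assumed ordering of conversational levels, lowest first.
level_rank(turn, 1).
level_rank(information, 2).
level_rank(domain, 3).

level_of(op(_, _, _, _, _, atts(Atts)), Level) :-
    memberchk(level(Level), Atts).

rank_of(Op, Rank) :-
    level_of(Op, Level),
    level_rank(Level, Rank).

% States an instantiation would add (its acts count as additions) or delete.
adds_of(op(_, _, _, action(Acts), effects(Effects), _), Adds) :-
    findall(S, member(add(S), Effects), Added),
    append(Acts, Added, Adds).

dels_of(op(_, _, _, _, effects(Effects), _), Dels) :-
    findall(S, member(del(S), Effects), Dels).

% Two instantiations conflict if one adds a state the other deletes.
conflicts(OpA, OpB) :-
    adds_of(OpA, AddsA), dels_of(OpA, DelsA),
    adds_of(OpB, AddsB), dels_of(OpB, DelsB),
    (   member(S, AddsA), memberchk(S, DelsB)
    ;   member(S, DelsA), memberchk(S, AddsB)
    ),
    !.

% select_ops(+ConflictSet, -Selected)
select_ops([], []).
select_ops([Op|Ops], [Preferred|Compatible]) :-
    one_per_level([Op|Ops], [], Reduced),             % step 1
    lowest_level(Reduced, Preferred),                 % step 2
    keep_consistent(Reduced, Preferred, Compatible).  % step 3

% Step 1: keep only the first instantiation at each level (rule order).
one_per_level([], _, []).
one_per_level([Op|Ops], Seen, Kept) :-
    level_of(Op, Level),
    (   memberchk(Level, Seen)
    ->  Kept = Rest,
        Seen1 = Seen
    ;   Kept = [Op|Rest],
        Seen1 = [Level|Seen]
    ),
    one_per_level(Ops, Seen1, Rest).

% Step 2: find the instantiation with the lowest level.
lowest_level([Op|Ops], Lowest) :-
    lowest_level(Ops, Op, Lowest).

lowest_level([], Best, Best).
lowest_level([Op|Ops], Best0, Best) :-
    rank_of(Op, Rank),
    rank_of(Best0, Rank0),
    (   Rank < Rank0 -> Best1 = Op ; Best1 = Best0 ),
    lowest_level(Ops, Best1, Best).

% Step 3: drop instantiations that conflict with the preferred one.
keep_consistent([], _, []).
keep_consistent([Op|Ops], Preferred, Kept) :-
    (   ( Op == Preferred ; conflicts(Op, Preferred) )
    ->  Kept = Rest
    ;   Kept = [Op|Rest]
    ),
    keep_consistent(Ops, Preferred, Rest).

% A hypothetical conflict set of four ground instantiations.
example_conflict_set([
    op(if([]), not([]), test([]),
       action([act(adam, request(x, barney), true)]),
       effects([del(act(adam, request(x, barney), goal_true))]),
       atts([name(do_request), level(information)])),
    op(if([]), not([]), test([]),
       action([act(adam, assert(y, barney), true)]),
       effects([del(act(adam, assert(y, barney), goal_true))]),
       atts([name(do_assert), level(information)])),
    op(if([]), not([]), test([]),
       action([act(adam, give_turn(barney), true)]),
       effects([del(believe(adam, turn(adam), mutually_known_true))]),
       atts([name(give_turn_3), level(turn)])),
    op(if([]), not([]), test([]),
       action([]),
       effects([add(believe(adam, turn(adam), mutually_known_true))]),
       atts([name(hold_turn), level(domain)]))
]).
```

With the example set, the second information-level instantiation is dropped in step 1, the turn-level instantiation is preferred in step 2, and the hypothetical domain-level instantiation is dropped in step 3 because it adds a state that the preferred instantiation deletes. The selection is therefore give_turn_3 followed by do_request.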
Implementation
The rule‑based system in which the simulations were conducted was written in
SICStus Prolog (Swedish Institute of Computer Science, 1988), running on a VAX
11‑750 computer. SICStus Prolog is similar in syntax and essential
semantics to DECsystem‑10 PROLOG and Quintus Prolog. A listing of the
system is presented in Appendix G (although utility clauses commonly used in
Prolog programs are omitted).
Basically,
the system consists of a set of conventions about representation of states, a
rule‑matcher, a conflict resolution function, a driver for cyclic action,
some I/O functions, and some development tools. The conventions for
representation are exactly as discussed above. They had been prototyped as
objects in KEE 3.0 and then unfolded into predicate representations for
purposes of clarity. The separation of the knowledge bases for the two
conversants (named Adam and Barney, representing subjects A and B) was
implemented through the use of the names of the conversants as a key into
Prolog's internal database. Facts in memory are considered active if they were
added before the current cycle and either (1) never deleted or (2) deleted
after the current cycle.
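The definition of active memory can be sketched as a single Prolog predicate. This is a minimal illustration rather than the code of Appendix G. Storing states in a dynamic predicate memory/2 keyed by conversant name is an assumption (the text says only that the names serve as keys into Prolog's internal database), as are the predicate name active/3 and the strict reading of "before" and "after".

```prolog
:- dynamic memory/2.

% memory(Conversant, state(Fact, Cycle_added, Cycle_deleted)).
% Sample contents, hypothetical except for the form of the states.
memory(adam,   state(conversants(adam, barney), 0, -1)).
memory(adam,   state(act(adam, give_turn(barney), goal_true), 5, 8)).
memory(barney, state(conversants(barney, adam), 0, -1)).

% active(+Conversant, +Cycle, ?Fact): Fact is in the active memory of
% Conversant on Cycle. A negative deletion cycle means "never deleted".
active(Conversant, Cycle, Fact) :-
    memory(Conversant, state(Fact, Added, Deleted)),
    Added < Cycle,
    (   Deleted < 0
    ->  true
    ;   Deleted > Cycle
    ).
```

Under this strict reading, the give_turn goal in the sample memory is active for Adam on cycles 6 and 7 only, and it is never visible to Barney, whose states are stored under a different key.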
The
rule matcher takes a straightforward brute‑force approach rather than
using a rete‑style approach. Given the current number of rules (fewer
than 40) and the number of facts in the knowledge‑base (fewer than 1000
after 20 cycles), the matcher's performance is acceptable. It relies on
Prolog's internal unification functions. Note that the operators’ “not” clauses
refer to the inability to match facts in the memory rather than explicit
falsity. This means that a “not” clause succeeds not only when the state is explicitly
false in active memory but also when no matching state exists at all. This approach is
much more efficient than the alternative in representing knowledge in memory.
The tests are simply evaluated in Prolog directly, with the variables in the
test predicates instantiated from the “if” and “not” clauses. The test
predicates can also be used to generate values used in the right‑hand-side
clauses of the operator. A set of predicates for use as test predicates in the
experimental domain is found in Appendix J.
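A brute-force matcher of the kind described might look like the following sketch, which leans on Prolog unification and negation as failure. It is illustrative only: the rule/1 wrapper, the memory/2 store, and the predicate names instantiation/3, match_all/3, none_match/3, and call_all/1 are assumptions, and the sample memory contents are hypothetical. The rule shown is do_assert from Figure 15.

```prolog
:- dynamic memory/2.

% Sample memories (hypothetical contents).
memory(adam,   state(conversants(adam, barney), 0, -1)).
memory(adam,   state(act(adam, assert(letter(1, i), barney), goal_true), 0, -1)).
memory(adam,   state(believe(adam, turn(adam), mutually_known_true), 0, -1)).
memory(barney, state(conversants(barney, adam), 0, -1)).

% The do_assert rule of Figure 15, wrapped in rule/1 for retrieval.
rule(op(if([ conversants(Me, Other),
             act(Me, assert(Info, Other), goal_true),
             believe(Me, turn(Me), mutually_known_true) ]),
        not([]),
        test([]),
        action([ act(Me, assert(Info, Other), true) ]),
        effects([ del(act(Me, assert(Info, Other), goal_true)) ]),
        atts([ name(do_assert), level(information) ]))).

% A state is active if added before Cycle and not yet deleted.
active(Who, Cycle, Fact) :-
    memory(Who, state(Fact, Added, Deleted)),
    Added < Cycle,
    (   Deleted < 0 -> true ; Deleted > Cycle ).

% Every "if" clause must unify with some active state.
match_all([], _, _).
match_all([Fact|Facts], Who, Cycle) :-
    active(Who, Cycle, Fact),
    match_all(Facts, Who, Cycle).

% No "not" clause may unify with any active state.
none_match([], _, _).
none_match([Fact|Facts], Who, Cycle) :-
    \+ active(Who, Cycle, Fact),
    none_match(Facts, Who, Cycle).

% "test" predicates are called directly, with variables already bound.
call_all([]).
call_all([Test|Tests]) :-
    call(Test),
    call_all(Tests).

% instantiation(+Who, +Cycle, -Rule): enumerate, on backtracking, each
% instantiated rule whose conditions hold in Who's active memory.
instantiation(Who, Cycle, Rule) :-
    rule(Rule),
    Rule = op(if(Ifs), not(Nots), test(Tests), _, _, _),
    match_all(Ifs, Who, Cycle),
    none_match(Nots, Who, Cycle),
    call_all(Tests).
```

Because the “if” clauses are matched before the “not” and “test” clauses, variables such as Me and Info are already bound when the later clauses are evaluated. On cycle 1 this sketch finds one instantiation of do_assert for Adam and none for Barney, since each conversant is matched only against states stored under its own name.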
The
conflict resolution function is called by the predicate select_ops/2. The
arguments are the conflict set and a subset selected for execution. In the
present implementation, the work of resolving conflicts among the matched
instantiations is performed by making sure that only one act on any
conversational level is chosen. This choice was made (1) in the interests of simplicity,
and (2) as a way of expressing, as observed in the protocol discourse, that the
conversants’ utterances generally focus on one theme. A problem with this
function as implemented is that it simply uses rule order to determine which of
the acts in any given level should be executed. This is an area where meta‑rules
for conflict resolution, representing processes for determining intentional
choices, would be of value. It would also be interesting to relax this
constraint and see what happens if, for example, multiple domain‑level
acts were permitted simultaneously. This might reproduce the indirect speech‑act
behavior observed in the protocol at Ai2. In the present function, having
reduced the conflict set in this manner, each act is assigned a priority
corresponding to its level. Currently, the system chooses the lowest‑level
act as the most important. The rationale for this design decision was to
preserve the coherence of the flow of the discourse, keeping up the feedback,
even if it meant that domain‑level actions were sometimes deferred. This
reflects the behaviors observed in the protocol, where the confirmations of
mutuality of knowledge continue even as a conversant is re‑evaluating his
or her domain goals. Finally, the conflict resolution function makes sure that
the selected instantiations of operators have actions and effects which are
consistent with each other. That is, one operator should not be adding a state
which is being deleted by another. Only those operators whose effects are
consistent with those of the priority operator are selected. The implementation
of this function is reasonably straightforward; it is the design choice
underlying it which still needs thought. It turns out that some operators will
take an action--thus adding it to the conversant's own knowledge base--and then
immediately delete it because a record of the action turns out not to be needed
in active memory. The operator give_turn_1 (listed in Appendix H) is an example
of this. In other cases, variations in belief states may cause apparent
inconsistency to be found where the operators are really in substantial
agreement. For example, an act added as a goal might also be added as a known
act. Both of these problems would be minimized or eliminated by expansion of
the system's capability to reason about opportunistic action and the
conversant's own intentionality.
The
driver for the rule-based system runs the match-resolve‑execute cycle for
one conversant and then the other, using the same cycle number. Actions taken
by the first agent are not used by the second agent in processing on the same
cycle. This was implemented via the definition of active memory as those states
which had been added before the current cycle. Drivers for single‑cycle,
multiple‑cycle, and indefinite execution are available. These are
run_single_cycle, run_to_cycle, and run, respectively. The indefinite execution
halts when the system completes a cycle in which neither conversant has had any
rule fire.
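The behavior of the driver can be illustrated with a deliberately small sketch. It is not the driver of Appendix G. The two toy rules use a simplified five-argument form without tests or attributes, conflict resolution is omitted, and the arities of run_single_cycle/2 and run/2, like every helper predicate here, are assumptions. The point of the sketch is that acts stamped with the current cycle number become visible to the other agent only on the following cycle.

```prolog
:- use_module(library(lists)).
:- dynamic memory/2.

% Initial state (hypothetical): Adam intends to give Barney the turn.
memory(adam,   state(conversants(adam, barney), 0, -1)).
memory(adam,   state(act(adam, give_turn(barney), goal_true), 0, -1)).
memory(barney, state(conversants(barney, adam), 0, -1)).

% rule(Name, If, Not, Action, Effects): simplified toy rules.
rule(give_turn,
     [conversants(Me, Other), act(Me, give_turn(Other), goal_true)],
     [],
     [act(Me, give_turn(Other), true)],
     [del(act(Me, give_turn(Other), goal_true))]).
rule(turn_received,
     [conversants(Me, Other), act(Other, give_turn(Me), true)],
     [believe(Me, turn(Me), true)],
     [],
     [add(believe(Me, turn(Me), true))]).

active(Who, Cycle, Fact) :-
    memory(Who, state(Fact, Added, Deleted)),
    Added < Cycle,
    (   Deleted < 0 -> true ; Deleted > Cycle ).

holds_all([], _, _).
holds_all([F|Fs], Who, Cycle) :-
    active(Who, Cycle, F), holds_all(Fs, Who, Cycle).

holds_none([], _, _).
holds_none([F|Fs], Who, Cycle) :-
    \+ active(Who, Cycle, F), holds_none(Fs, Who, Cycle).

% Collect every instantiation for Who on this cycle.
matched(Who, Cycle, Insts) :-
    findall(inst(Name, Action, Effects),
            ( rule(Name, If, Not, Action, Effects),
              holds_all(If, Who, Cycle),
              holds_none(Not, Who, Cycle) ),
            Insts).

post(Who, Cycle, Fact) :-
    assertz(memory(Who, state(Fact, Cycle, -1))).

% Deleting a state records the cycle of deletion; the state is kept.
unpost(Who, Cycle, Fact) :-
    retract(memory(Who, state(Fact, Added, -1))), !,
    assertz(memory(Who, state(Fact, Added, Cycle))).
unpost(_, _, _).

execute([], _, _).
execute([inst(_, Action, Effects)|Insts], Who, Cycle) :-
    post_actions(Action, Cycle),
    apply_effects(Effects, Who, Cycle),
    execute(Insts, Who, Cycle).

% Acts are posted to the memories of both conversants.
post_actions([], _).
post_actions([Act|Acts], Cycle) :-
    post(adam, Cycle, Act),
    post(barney, Cycle, Act),
    post_actions(Acts, Cycle).

apply_effects([], _, _).
apply_effects([add(F)|Es], Who, Cycle) :-
    post(Who, Cycle, F), apply_effects(Es, Who, Cycle).
apply_effects([del(F)|Es], Who, Cycle) :-
    unpost(Who, Cycle, F), apply_effects(Es, Who, Cycle).

% One cycle: Adam, then Barney, both using the same cycle number.
run_single_cycle(Cycle, Fired) :-
    matched(adam, Cycle, A),   execute(A, adam, Cycle),
    matched(barney, Cycle, B), execute(B, barney, Cycle),
    append(A, B, Fired).

% Run until a cycle in which neither conversant fires a rule.
run(Cycle, Last) :-
    run_single_cycle(Cycle, Fired),
    (   Fired == []
    ->  Last = Cycle
    ;   Next is Cycle + 1,
        run(Next, Last)
    ).
```

Starting from run(1, Last), Adam gives the turn on cycle 1, Barney recognizes it on cycle 2 (not on cycle 1, because the act was stamped with cycle 1 and so was not yet active), and the run halts on cycle 3, when no rule fires for either agent.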
The
I/O clauses create two output forms, a short trace and a full trace. The short
trace is displayed on the screen as the system runs and is also incrementally
appended into a file. The short trace shows which operators fired and what
actions were taken. A short trace of a simulation of the protocol described in
Chapter V is set out in Appendix E. The full trace is incrementally appended
into a different file. This trace contains a cycle-by-cycle listing of active
memory and full representations of all of the operators executed on each cycle.
A full trace of the simulation protocol is set out in Appendix F.
The
development tools for the system include clauses to check matching of a given
operator for a conversant, to display active memory, to display other states in
memory for other cycles, to display an operator, to run a cycle with full
output to the terminal, to reinitialize memory and the operators, to add and
delete states in the knowledge base, and to back the system up to a previous
cycle state. These tools proved useful in both refining the operators and
simulating miscomprehension by changing states in memory.
Implementation of the
Model as Rules
In
building a computational system which models the meta‑locutionary acts,
one might begin by recognizing that there is a correspondence between the action‑producing
rules and the acts themselves. That is, the firing of a rule which has as its
consequence an act represents an actualization or embodiment of the situated
act. There might be, though, situations which produce the same sort of act
which differ by more than mere instantiation of their premises; wholly distinct
sets of premises might cause the taking of the same act. As a result, acts
should typically be represented by a corresponding set of rules (or operators,
as they are called in the theory) rather than a single rule. In the particular
system I have built, the rules create an instantiation of the act as a record
of its being taken.
Through
the process of developing the rules, intermixture of actions of different
levels in the same rule was reduced and eventually eliminated. That is, rules
represent operators at specific levels of the discourse hierarchy. The effect
of one utterance embodying acts of different levels was produced through
refinement of the left-hand-sides of the rules for the different‑level
operators. For example, an instantiation of the rule do_request, an information‑level
rule, is typically accompanied by an instantiation of one of the give_turn
rules, which are turn‑level rules.
Meta-Locutionary Rules
The
simulation modeled meta-locutionary operators for turn‑taking and
utterance expression. The operators were written as rules in the turn‑taking
and information classes, respectively. For both levels, there were three basic
types of rules: reasoning about intention, doing an act, and recognizing an act
by the other agent. Figure 15 shows a rule for the act of assertion and Figure
16 shows a rule for recognizing an assertion by the other agent. The complete
set of meta‑locutionary rules used in the simulation is presented in Appendix
H.
The
rule shown in Figure 15 has one action, namely the asserting, and changes the
belief state of the triggering clause from goal_true to true (via automatic
posting of the action into the conversant's own active memory). The rule in
Figure 16 has no action. It captures part of the process of reasoning about the
other agent's knowledge. The effect of executing an instantiation of
assertion_received_1 is to add to the conversant's active memory a state which
represents the conversant's belief as to the belief of the other agent. This
rule is naïve, because it is possible that the other agent is not telling the
truth, but as no one in the protocols was found to have intentionally not
spoken the truth, a more sophisticated rule was not needed for the simulation.
op( % do_assert
    if([ conversants(Me,Other),
         act(Me,assert(Info,Other),goal_true),
         believe(Me,turn(Me),mutually_known_true) ]),
    not([]),
    test([]),
    action([ act(Me,assert(Info,Other),true) ]),
    effects([ del(act(Me,assert(Info,Other),goal_true)) ]),
    atts([ name(do_assert),
           level(information) ]) ).
Figure 15. Meta‑locutionary rule
do_assert. The rule is represented as a Prolog clause. Comments are marked by
the `%’ symbol. There are three `if’ clauses for the operator; these are
matched against a conversant's active memory. There is only one `action’
clause; it is added to the memory of both conversants.
op( % assertion_received_1
    if([ conversants(Me,Other),
         act(Other,assert(believe(Other,Info,Belief_value),Me),true),
         believe(Me,turn(Other),mutually_known_true) ]),
    not([ believe(Other,Info,Belief_value) ]),
    test([]),
    action([]),
    effects([ add(believe(Other,Info,Belief_value)) ]),
    atts([ name(assertion_received_1),
           level(information) ]) ).
Figure 16. Meta‑locutionary rule
assertion_received_1. The rule is represented as a Prolog clause. Comments are
marked by the `%’ symbol.
The
simulation showed that the number and complexity of rules needed to handle these
basic functions is quite small. In all, 15 meta-locutionary operators were used
in the simulation. Of course, these rules represented only two meta‑locutionary
conversational levels in addition to domain‑knowledge levels. The other
conversational levels implied in Figure 2 would have corresponding sets of
additional operators in a more complete conversational simulation. Thus to the
extent that additional skills for interaction are needed for coherence of more
complex conversation, many more new operators might have to be added. The
operators developed in this dissertation constitute perhaps 35 percent of the
operators at their own conversational level. Accordingly, they certainly
constitute no more than 10 percent of the total number of conversational
operators for all meta‑locutionary conversational levels from turn‑taking
up.
Domain Rules
Representing
the illocutionary knowledge of the conversants turned out to be a more
difficult task than representing the meta‑locutionary knowledge, even in
such an apparently simple domain. Fifteen meta‑locutionary rules were
used in the simulation. Eighteen domain rules were written, of which 16 were
used in the simulation whose trace is presented here. Again, there were rules
for reasoning, taking, and recognizing acts. Figure 17 shows an example of a
domain rule which reasons about acts to be taken. The rule,
confirmed_next_letter, is instantiated if the agent (1) is trying to confirm
mutual knowledge of a subsequence, (2) is trying to confirm mutual knowledge of
a letter (presumably in that subsequence), and (3) now believes that the letter
is, in fact, mutually known true. As a result, this operator will then remove
the goals about the current letter and establish goals for the next letter to
be confirmed. If there had been no next letter (i.e., if this had been the last
letter in the subsequence), the test clause would have failed; I expect that
the rule confirm_last_letter would then have been instantiated instead.
op( % confirmed_next_letter
    if([ conversants(Me,Other),
         act(Me,confirm_mutual(subsequence(S_index,List)),goal_true),
         act(Me,confirm_mutual(next_letter(letter(Index,Letter))),goal_true),
         believe(Me,next_letter(letter(Index,Letter)),mutually_known_true) ]),
    not([]),
    test([ get_next_letter(Index,List,New_letter) ]),
    actions([]),
    effects([ del(act(Me,confirm_mutual(next_letter(letter(Index,Letter))),goal_true)),
              add(act(Me,confirm_mutual(next_letter(letter(Index,Letter))),true)),
              del(believe(Me,next_letter(letter(Index,Letter)),mutually_known_true)),
              add(believe(Me,next_letter(New_letter),true)) ]),
    atts([ name(confirmed_next_letter),
           level(domain) ]) ).
Figure 17. Domain rule confirmed_next_letter.
The rule is represented as a Prolog clause. Comments are marked by the `%’ symbol.
The
biggest problem with the domain rules was prevention of repetitious and
redundant instantiations. This was caused by the system’s design using the
situated action approach rather than a stack-based planner. That is, if a rule
could be executed on one cycle, it would be likely to fire on the next cycle as
well. This problem was tempered through the use of not clauses, which took
account of the operator's own history in posting to the knowledge base. Such
extensive use of nots, however, seems (1) unduly cumbersome and (2) unrealistic
as a representation of human memory. Much of this process could be moved to the
conflict resolution process if that process could use meta-logic to take
account of the agent's intentional state. In other words, although an agent
could repeat an action indefinitely, the fact that the agent has already
performed an action usually means that the agent does not consider doing it
again; changes in circumstances (which could include lack of response) might
cause an action to be repeated. Thus as a strategy for conflict resolution, it
might be possible to choose actions on the basis of what the agent believes it is
currently doing in the conversation. This implies that the conflict resolution
strategy would have to be tied to the agent's history.
Another
helpful approach to the control issue would be the addition of meta‑rules
which specifically had memory maintenance responsibilities. Thus, for example,
a representation of focus structure could allow the operation of rules which
removed facts from active memory when they pertained to a matter not in focus;
other rules would restore facts to active memory when their subject again
became the focus of the conversation.
Results of the
Simulation
After
development of the rule set, the system was used to simulate the conversation
reported in the protocol analyzed in Chapter V. The initial state of the system
is listed in Appendix K. In brief, for each agent the initial state identified
the conversants, set up goals of confirming mutually their respective
sequences, identified the first subsequence, and set an initial turn state. In
addition, Adam’s initial state included an act of Barney’s giving the turn to
Adam; this brought the simulated conversation up to the point of A’s act Ai1,
and the simulation proceeded from there.
Simulation of
Experimental Protocol
The
first simulation looked at what the agents’ conversation would be like in the
absence of the misconstruals caused by homophony observed in the experimental
protocol. Short and full traces of this simulation are presented in Appendices
E and F. Figure 18 shows the first two full cycles of the short trace.
For
cycle 1, the trace shows Adam reasoning that he should try to request the next
subsequence. (Recall that A’s first letter was a blank.) At the same time, Adam
takes an act: acknowledgment of Barney’s initial‑state act of giving Adam
the turn. Simultaneously, Barney is reasoning that he should try to confirm the
next subsequence of his sequence. (B’s sequence began with a letter.) For cycle
2, Adam, using his turn, takes an information‑level act: requesting that
Barney assert the first subsequence because he, Adam, has a blank. At the same
time, an operator at the turn level fires, taking the act of giving Barney the
turn. This is, clearly enough, a salutary effect, as Adam would want Barney to
be able to respond to his request. Meanwhile, Barney is still reasoning about
what to do next. He realizes that he will need to confirm the next (which in
this case is the first) letter of the subsequence. At the same time, he
recognizes that Adam has the turn on this cycle. At this point, the simulation
has progressed up through act Ai2. In later cycles, Barney then asserts that
the first letter is "I." Without the miscommunication engendered by
homophony, he then requests that Adam tell him the second letter, giving Adam
the turn. Adam does so. The agents then go on to begin to confirm the next
subsequences. At this point, as can be seen in the traces in Appendices E and F,
the conversation begins to break down. At cycle 19, the conversation stops.
This failure is attributable to (1) the lack of domain rules for proper
handling of the confirmation of individual letters, and (2) the lack of rules
for agreement on which subsequence is being looked at. With the refinement and
addition of these rules, it appears that the conversation would have continued
until the sequences were mutually confirmed. As it was, the simulation was able
to get through the first three or four exchanges in a meaningful (to us),
plausible way.
adam's acts for cycle 1:
    goal_request_next_subsequence --
    acknowledge_my_turn --
        act(adam,acknowledge_turn(turn(adam),mutually_known_true),true)
barney's acts for cycle 1:
    goal_confirm_next_subsequence --
adam's acts for cycle 2:
    do_request --
        act(adam,request(act(barney,assert(subsequence(1,[blank]),adam),goal_true),barney),true)
    give_turn_3 --
        act(adam,give_turn(barney),mutually_known_true)
barney's acts for cycle 2:
    goal_confirm_next_letter_1 --
    others_turn_acknowledged --
Figure 18. Trace of simulation of protocol. This
figure shows the first two complete cycles of a multi‑cycle interaction.
The reasoning and acts of each conversant are considered to be simultaneous
with those of the other conversant for a given cycle. The first level of
indentation presents the names of the operators which were executed on that
cycle. If any acts were taken, the acts follow the dash after the operator at
the second level of indentation.
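The cycle structure shown in the trace, in which both agents reason simultaneously and an act taken on one cycle becomes visible to the other agent only afterwards, can be sketched roughly as follows. This is a Python illustration and not the Prolog meta-interpreter itself. The operator names acknowledge_my_turn and do_request are borrowed from Figure 18, but their conditions, the fact tuples, and the function run_cycle are assumptions made for the example.

```python
def acknowledge_my_turn(me, memory):
    # Hypothetical condition: I was given the turn and have not yet noted it.
    if ("given_turn", me) in memory and ("has_turn", me) not in memory:
        return {("has_turn", me)}, "acknowledge_turn"
    return None


def do_request(me, memory):
    # Hypothetical condition: I hold the turn and have not yet made my request.
    if ("has_turn", me) in memory and ("requested", me) not in memory:
        return {("requested", me)}, "request"
    return None


def run_cycle(agents, operators):
    """Fire every matching operator for every agent against start-of-cycle memory."""
    snapshot = {name: frozenset(memory) for name, memory in agents.items()}
    trace, pending = {}, []
    for name in agents:
        trace[name] = []
        for operator in operators:
            result = operator(name, snapshot[name])
            if result is not None:
                new_facts, act = result
                trace[name].append((operator.__name__, act))
                pending.append((name, new_facts, act))
    # Effects are applied only after all agents have reasoned, so that
    # neither agent sees the other's acts before the cycle ends.
    for name, new_facts, act in pending:
        agents[name] |= new_facts
        for other in agents:
            if other != name and act is not None:
                agents[other].add(("observed", name, act))
    return trace


agents = {"adam": {("given_turn", "adam")}, "barney": set()}
operators = [acknowledge_my_turn, do_request]
first = run_cycle(agents, operators)    # adam acknowledges the turn
second = run_cycle(agents, operators)   # adam, now holding the turn, requests
```

Matching against a frozen snapshot is what makes the two agents' reasoning simultaneous in the sketch: the order in which the agents are visited does not affect what fires on a given cycle.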
Control Issues
The results of this simulation indicate that meta‑locutionary operators can
plausibly provide the control processes which lend coherence to mixed‑initiative
discourse. The traces of the simulation show the agents requesting, taking,
giving, and holding turns as needed to accomplish their domain goals. This
control was accomplished without the use of semaphores.17 Rather,
the agents had only their own understanding of what they believed the state of
the conversation to be. Additionally, the simulation showed situated action
being used successfully to drive the agents’ responses. Until the exchanges
began to break down due to domain‑rule failures in the later cycles, a
form of global coherency in the conversation was achieved entirely through the
use of local, context‑driven operators.
It is possible that the turn‑taking operators could create a state in which
the agents repeatedly try to take or give turns, thus finding themselves in a
loop. However, this sort of thrashing did not occur in any of the simulation
runs in this experiment. To be certain of avoiding thrashing, more
sophisticated use of the conversational history would be needed. If the agents
had memorial operators which checked for fruitless repetition in turn‑taking,
they could cause the agents to step back and try something else.
Such control would, in effect, be meta-meta because it causes changes in the
conversation’s processes of control. The major point here is that to avoid
ineffectual actions, the agent should have operators which reflect knowledge
about the relationship between action patterns and intention.
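A memorial check of this kind might look something like the following sketch. It is a Python illustration under stated assumptions, not an operator from the simulation: the history format, the window size, and every act name other than give_turn are hypothetical.

```python
# Turn-level act names; only give_turn appears in the simulation traces,
# the others are assumed here for illustration.
TURN_ACTS = {"request_turn", "take_turn", "give_turn", "hold_turn"}


def is_thrashing(history, window=4):
    """True if the last `window` acts in the history were all turn-level acts.

    `history` is a list of (agent, act_name) pairs, oldest first.
    """
    recent = history[-window:]
    return len(recent) == window and all(act in TURN_ACTS for _, act in recent)


history = [
    ("adam", "request"),
    ("adam", "give_turn"),
    ("barney", "give_turn"),
    ("adam", "give_turn"),
    ("barney", "give_turn"),
]
if is_thrashing(history):
    # A meta-level operator could fire here, for example by suppressing
    # further turn-giving until an information-level act has been taken.
    pass
```

The window of four acts is arbitrary; the point is only that the test inspects the pattern of past acts rather than the current state alone.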
The converse of thrashing is also a concern. In the simulation, each agent
expects that if they take an action, the other agent will eventually react.
For example, if an agent makes a request, they will wait for the other agent
to respond. Thus, if the other agent does not respond, the
conversation will come to an unintended end. This situation is made possible
mostly by lack of temporal awareness in the simulation. That is, none of the
operators has an explicit representation of or responsibility for the passage
of time. To overcome this problem of stalling, then, the system ought to have
operators which note failure of timely response. Such effects were in fact
observed in the protocols. For example, a reasonable interpretation of A’s
illocutionary act at Ai4 (in Figure 13) is that A has waited for a response to
her request‑act Ai2; when B fails to make a timely response, A then takes
Ai4 to renew the request by trying to repair a problem which she infers B must
have encountered in interpreting Ai2.
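An operator that notes failure of timely response could be approximated as in the sketch below. This is a Python illustration only; the function overdue_requests, the bookkeeping of issue cycles, and the patience threshold are assumptions, and the simulation as implemented has no such representation of time.

```python
def overdue_requests(pending, current_cycle, patience=3):
    """Return the requests that have waited more than `patience` cycles.

    `pending` maps a request label to the cycle on which it was issued.
    """
    return [label for label, issued in pending.items()
            if current_cycle - issued > patience]


# The label echoes act Ai2 from the protocol; the cycle numbers are invented.
pending = {"Ai2": 2}
on_time = overdue_requests(pending, current_cycle=4)   # still within patience
stalled = overdue_requests(pending, current_cycle=6)   # a repair act could fire
```

A repair operator keyed to the overdue list would let an agent renew or rephrase a request, much as A appears to do at Ai4.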
Simulation of Miscomprehension
The rule‑based system was also used as an experimental environment for
simulating the miscommunication observed in the protocol. The system was given
the same initial state, and ran through the end of cycle 4. At this point,
Adam’s knowledge base was modified by changing the substance of B’s act (as
indicated in the discussion in Chapter V with respect to act Bi6). The simulation
was then resumed. As it turned out, the simulation proceeded in a manner
virtually identical to that reported above. The principal reason for this
appears to be that Adam had no rule available analogous to the R1 operator of
Appendix D, which would have permitted the agent to repeat its request as an
explicit act. In the absence of operators which explicitly concern actions of
correction and repair, it appears that the simulation will not produce remedial
feedback. Accordingly, one might quite reasonably conclude that the repair
phenomena reported in the socio-linguistic literature are the result of meta‑acts
specialized for this purpose. In other words, conversational repairs can be
seen as constituting a domain in which persons often have tasks; as a consequence,
conversants develop domain‑specialized skills for particular kinds of
repairs. The lack of operators representing these skills in the simulation left
the agents unable to effect direct repairs.
Contrasts with Power’s System
As Power (1979) observed, the real achievement of his conversational simulation
between the robots John and Mary was that it was able to represent the point of
an utterance. His system represented a major advance in this area, in that he
was able to produce a form of coherent interaction based on recognition of
illocutionary meaning. The current research also represents the illocutionary
meaning of actions but does not follow Power in producing English text.
However, the simulation reported here is a significant advance in three
principal respects. First, the simulation models control: the agents control
and modify the flow of their own interaction. Power’s robots relied on an
executive to allocate control between them. This meant that their turns were
explicit and did not have to be recognized. Additionally, they were unable to
modify their interaction because the flow of the robots’ conversation was
basically pre-planned.
Second, the agents in the meta-locutionary simulation use locally situated acts to
produce, without top-down planning, globally organized interaction. Power’s
robots had control stacks that recorded and executed conversational planning
procedures. Indeed, the robots explicitly discussed their actions in terms of
plans and intended plans. In the meta‑locutionary simulation, the agents
are guided by their intentions but do not have to be so explicit about
developing plans; they are able to react more flexibly to context.
Third, the simulation models more realistic conversation instead of idealized speech.
Power’s work was directly in the tradition of speech-act theory. The
conversations he modeled were fictional, idealized interactions between robots.
The robots’ acts were accordingly stylized. In contrast, the current work
attempts to model the kinds of exchanges actually observed in human‑human
interaction. The acts used by the agents correspond to acts obtained from
analysis of the experimental protocols.
Limitations and Extensions
The simulation as implemented had a number of limitations. Here I describe those
limitations and suggest ways in which the system could be improved and
extended. The limitations include over-specificity in matching operators
against the agents’ memory, separation of different cognitive processes into
different cycles of the rules system, and a simplified representation for time.
Going beyond the specifics of the implementation of the simulation, it would be
possible to add other conversants and to push the notion of meta-locutionary
action to include processes of lexical choice.
Implementation Issues
I begin this review of the simulation's limitations by examining the
implementation of the simulation as a rule-based system. One problem, then,
with the computational implementation so far is over-specificity in the
operator‑matching process. For example, operators may match and execute
because a clause just happens to be hanging around from some long-previous part of
the conversation. The rules were written in such a way that they tried to
prevent this problem by consistent deletion of states from active memory. This
technique, though, suffers from drawbacks similar to those encountered with
coding “not” clauses to prevent repetitious execution. This suggests that the
conversants' models should be structured to provide appropriate contexts for
matching. Such structure may arise from the multi-layered, hierarchical model
of interaction presented in Figure 2. At the same time, there may be
discourse-segment structure more closely allied with domain structure, along
the lines of the focus spaces developed by Grosz (1981). This could be handled
either by increasing the complexity of the key into the database or by adding
cross‑state references. It may be that explicit structures for focus are
required. In the simulation, a lack of focus was most acutely noticed in the
divergence of the agents’ models of the concept of next subsequence: the agents
had different patterns of blanks and letters, and so they divided their
sequences into correspondingly different subsequences.
Another problem involved the design choice to spread the reasoning process over
multiple cycles. This was intended to correspond to processing times for mental
processes of conversants. In some cases this proved reasonable. In others,
however, the multiple cycles needed for recognition, reasoning, and then action
caused the agents to continue behaviors that in the protocols are immediately
suspended. This suggests that the rules should be typed not only for level but
for function as well. That is, rules would be of recognition, reasoning or
action type. Then for each major cycle, the rule-based system would first fire
each set of function-type rules in order, with later sets able to use the
results of the earlier ones. This would necessitate reworking of the system for
inter-agent simultaneity, but this should not be a difficult problem. In turn,
if the temporal cost of the operators could be declared, then the system could
keep track of inter-cycle dependencies in a more sophisticated manner than that
currently used.
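The proposed ordering of function-typed rules within a major cycle can be sketched as follows. This is a Python illustration of the proposal and not a description of the implemented system; the rule tuples, the fact names, and the function run_major_cycle are hypothetical.

```python
PHASES = ("recognition", "reasoning", "action")


def run_major_cycle(memory, rules):
    """Fire rules phase by phase; later phases see facts added by earlier ones.

    Each rule is (phase, name, condition, conclusion), where `condition` is a
    set of facts that must all be in memory and `conclusion` is the fact added.
    """
    fired = []
    for phase in PHASES:
        additions = set()
        for rule_phase, name, condition, conclusion in rules:
            if rule_phase == phase and condition <= memory:
                additions.add(conclusion)
                fired.append(name)
        memory |= additions   # results become visible to the next phase
    return fired


rules = [
    ("action", "do_stop", {"should_stop"}, "stopped"),
    ("recognition", "see_interrupt", {"heard_interrupt"}, "interrupted"),
    ("reasoning", "decide_stop", {"interrupted"}, "should_stop"),
]
memory = {"heard_interrupt"}
fired = run_major_cycle(memory, rules)
```

In this sketch the agent recognizes, reasons, and acts within a single major cycle, whereas a design that spreads these steps over three cycles would let the interrupted behavior continue for two cycles longer.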
A third problem with the simulation also has to do with time. In the system as
implemented, all acts take unit time. For complex phenomena, this assumption is
probably unwarranted. For example, uttering "O" may possibly be a
unit act; reciting a poem surely is not. If the system is to represent
processes like interruptions, then, there will have to be an explicit account
of the temporal constraints on the physical processes of speech and
understanding. Complex utterances will have to be produced over a sequence of
cycles. At the limit, this could involve phoneme-by-phoneme (or at least
word-by-word) production, and would constitute a logical extension of the meta‑locutionary
theory to speech production. This would also enable representation of self‑correction.
Absent an adequate representation for temporal extent, then, it will not be
possible to build an adequate model of interruption or self-correction.
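A minimal sketch of word-by-word production over a sequence of cycles, with the possibility of being cut off, is given below. It is a Python illustration under the assumption of one word per cycle; the function produce and its interrupt_at parameter are hypothetical and do not correspond to anything in the implemented system.

```python
def produce(words, interrupt_at=None):
    """Emit one word per cycle, stopping early if an interruption arrives.

    `interrupt_at` is the cycle (counting from 1) on which the speaker
    notices an interruption; the word planned for that cycle is not spoken.
    """
    spoken = []
    for cycle, word in enumerate(words, start=1):
        if interrupt_at is not None and cycle >= interrupt_at:
            break
        spoken.append(word)
    return spoken


utterance = ["the", "first", "letter", "is", "I"]
complete = produce(utterance)                   # all five words, five cycles
cut_off = produce(utterance, interrupt_at=3)    # only the first two words
```

Once an utterance has temporal extent in this way, the unspoken remainder is available to operators that might resume, abandon, or correct it.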
Possible Extensions
I turn now to issues beyond those peculiar to the simulation’s implementation. The
system is not inherently limited to two conversants. The predicates
representing the acts were developed so as to permit multiple conversants. For
example, acts like give_turn and do_request have slots for the “receiving”
agent. Changing the rule‑based system's driver clauses for multiple
agents would be trivial. However, multi-agent protocols, even in the sequence‑recollection
domain, have proven harder to transcribe and map into acts.
The system could also be extended to encompass
issues of lexical choice. In its current form, the model does not address the
problem of transforming illocutionary acts into locutions; the acts are
communicated directly, without being expressed as specific verbal and nonverbal
language. If the use of the system is broadened to include human agents
interacting with the simulation agents, such issues will have to be resolved.
What sort of solutions would be reasonable? Straight act-to-utterance mappings
would not be a good choice. While some physical actions have societally
conventional “emblematic” meanings (Ekman, 1980), no action has meaning without
context (Birdwhistell, 1970). That is, while under the right circumstances
raising the shoulders together is understood as a physical lexeme generally
called a shrug, there cannot be any universally appropriate action for
conveying a particular meaning usually embodied by a shrug. Thus for meta‑locutionary
acts that are normally expressed through non-verbal channels, the system cannot
simply look up the turn-holding behavior in some table; rather, an expression
must be chosen for which the context will then provide the needed meaning. Indeed,
the problem of lexical choice can be viewed as a conversational level in itself:
the idea of meta-locutionary processes then might be extended to encompass
operators which produced the expression of specific lexemes--verbal and
nonverbal--as their acts.
14. In this transcript Power added parenthetical information about which
conversational procedure was being used. This information was not used by the
agents in his system and is provided only for the convenience of the reader.
15. Actually, Power (1979) took the second approach so that observers could more
easily follow the conversational flow. He built a simple expectation-based
parser which looked for key words.
16. This list of belief values does not contain explicit provisions for dealing
with uncertainty. In practice, a conversant’s noting in memory that some fact
about the conversation is true can serve as a hypothesis which is later confirmed
as mutually_known_true. Adding additional levels of (real) uncertainty would
have been difficult to handle gracefully in the rule-based system without
adding near-duplicate rules to handle alternative truth values. However, a
reasonable use for qualitative truth values might involve meta-control in
conflict resolution, where choices of what action to take could depend--either
way--on the certainty of the conversant’s knowledge.
17. A semaphore is a protected variable whose value can be accessed only by two
indivisible operations. Semaphores are used in computing systems with multiple
processes (1) to ensure mutual exclusion and (2) to implement process
synchronization (Deitel, 1984).