The Simulation Study
To test whether meta‑locutionary operators would plausibly explain or reproduce the behaviors observed in the protocols, a forward‑chaining rule‑based simulation was built following the computational model developed through protocol analysis. The simulation was a partial implementation of the theory as a rule‑based system. This work was done in two parts: first, the creation of a meta‑interpreter in Prolog in which to write and execute rules; and second, the development and running of rules which represent the meta‑locutionary operators. The goals of the simulation were to determine whether the theory and model actually produced plausible control in interaction and to serve as an experimental environment for meta‑locutionary operators. The system has some similarities with that written by Power (1979), who modeled a plan‑based conversation between two robots named John and Mary. Here, for example, is the beginning of an exchange:14
1. JOHN: MARY. (call by name)
2. MARY: YES. (acknowledgment)
3. JOHN: I WANT TO SUGGEST A GOAL. (ANNOUNCE 1)
4. MARY: GO AHEAD. (ANNOUNCE 2)
5. JOHN: WILL YOU HELP ME GET IN? (AGREEGOAL 1)
6. MARY: BY ALL MEANS. (AGREEGOAL 2) ....
7. JOHN: SHALL WE MAKE A PLAN? (ANNOUNCE 1)
8. MARY: JOHN. (call by name)
9. JOHN: YES. (acknowledgment)
10. MARY: MAY I ASK YOU SOMETHING? (ANNOUNCE 1)
11. JOHN: GO AHEAD. (ANNOUNCE 2)
12. MARY: ARE YOU IN? (ASK 1)
13. JOHN: NO. (ASK 2) (Power, 1979, p. 116)
The simulation described in this chapter is similar to Power's, except that the meta‑locutionary simulation (1) goes beyond representation of illocution to address control issues, (2) uses situated action in place of top-down planning, and (3) models observed instead of idealized interaction.
Both development of the rule‑based system and creation of rules embodying the meta‑locutionary operators depended on a representational scheme for conversational models of conversants. The representation for both the mental states and the operators was necessarily more complex than that for the more abstract states and operators used for the protocol analysis. Natural language was not used directly as the representation of either states or operators. With respect to the internal representation, there is no reason to believe that language is a plausible representation for mental states. Stucky (1988) pointed out that utterance structures, even translated into predicate logic, are unlikely to be the key to adequate representations of successfully understood utterances. Such successful states must be formed in such a way that they can tie indexicals in an utterance to features of the context. That is, the situated character of language action must be accounted for in the representation. With respect to the operators and their output,
A difficulty in writing programs of this kind is that to understand the point of a remark one must normally begin by understanding its literal meaning. Assuming that the programmer has nothing to say about the latter, he must either: (a) run the conversation in the program’s internal semantic notation, (b) translate between this notation and English by unprincipled short cuts. (Power, 1979, p.109)
Thus the meta-locutionary simulation expresses its actions in its internal representation; to do otherwise would be misleading, because the model does not really address issues of lexicalization at all.15 There would be an appearance of linguistic skill not actually found in the model. Interestingly, it would have been possible to express the output acts of the simulation in natural‑language form, perhaps making the exchanges more immediately understandable.
In the rule-based simulation, the conversants’ mental states were represented as aggregations of currently‑active individual states. These states were either beliefs or acts. Of course, the mental representation of an act is, in some sense, also a belief. However, the propositional content of acts seems distinct from other statements about the world. Acts have an inherent temporal aspect; they occur rather than exist. An act of clarification exists while it is happening, but then has no existence independent of its recollection. It may have been true that an act took place, but it is no longer a part of the world except for its effects, including its own recollection. In contrast, beliefs about the world may have a temporal element, but they generally also have a continued existence that is dependent on but not derived from memory. For example, the belief that someone is confused has validity beyond the moment of recognition of the state; such a belief has existence over some period of time.
Accordingly, the computational model represents acts and beliefs as follows:
state (act (Actor, <act>(<act args>), Belief_state), Cycle_added, Cycle_deleted)
state (believe (Actor,<belief>(<belief args>), Belief_state), Cycle_added, Cycle_deleted)
This notation is similar to the Prolog‑style syntax used in representing the protocol. Lower‑case names represent predicate labels, capitalized names are variables, and <bracketed> items represent either predicate labels for acts or beliefs (give_turn, next_letter), or values for arguments to the predicates.
The Belief_state variable contains a value expressing the truth value of the belief or act. This approach has a number of advantages. It permits the direct and explicit representation of belief in falsity rather than relying on non‑existence to stand for falsity. It also encodes intentionality. The full set of values for the belief state is true, false, mutually_known_true, mutually_known_false, goal_true, goal_false, goal_mutually_known_true, and goal_mutually_known_false.16
The Actor variable is filled by the name of one of the conversants. The cycle values indicate when a state entered and left memory. They do not indicate that a fact became true or false; truth values are a function of the Belief_state variable. A negative value for Cycle_deleted indicates that the state is still in active memory.
Using this notation, then, the state
state ( act ( angela , give_turn ( phil ), goal_true ), 5, 8)
means that the conversant in whose memory this state is placed had as an active fact from cycles 5 to 8 a belief that Angela wants to give the turn to Phil. This state alone does not indicate why the fact was removed from active memory. It may have been that the fact was simply no longer relevant through the action of an operator. It may have been that the fact, which expressed an intention, was not needed after cycle 8 because the goal was no longer being sought. In the current implementation of the system, the cycles are used by the matcher and conflict resolver, but are not available to the operators. The implications of this limitation are discussed below.
A conversant's memory, then, consists of a bag of such states, where the cycles indicate what is currently believed and what was believed before. Nothing in the system automatically prevents the insertion of conflicting states. It is left to the operators to assure that the states achieved are rational ones.
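The state notation and the bag-of-states memory can be illustrated with a minimal sketch. It is written in Python rather than the Prolog of the actual system, and the names State, still_in_memory, and the "confused" belief are hypothetical choices made for illustration only. The sketch assumes nothing beyond what is stated above: a state carries an actor, a predicate with arguments, a belief state, and two cycle values, and a negative Cycle_deleted means the state is still in memory.

```python
from typing import List, NamedTuple, Tuple


class State(NamedTuple):
    """Hypothetical mirror of state(act|believe(Actor, Pred, Belief_state), Added, Deleted)."""
    kind: str                 # "act" or "believe"
    actor: str                # name of a conversant
    content: Tuple[str, ...]  # predicate label followed by its arguments
    belief_state: str         # e.g. "true", "goal_true", "mutually_known_true"
    cycle_added: int
    cycle_deleted: int        # negative value: not yet deleted


def still_in_memory(state: State) -> bool:
    """A negative Cycle_deleted marks a state that has not been removed."""
    return state.cycle_deleted < 0


# The example from the text: Angela wants to give the turn to Phil, cycles 5 to 8.
example = State("act", "angela", ("give_turn", "phil"), "goal_true", 5, 8)

# Memory is a bag: nothing rejects the two conflicting beliefs below
# (the "confused" predicate is an invented example).
memory: List[State] = [
    example,
    State("believe", "phil", ("confused",), "true", 6, -1),
    State("believe", "phil", ("confused",), "false", 7, -1),
]

current = [s for s in memory if still_in_memory(s)]
```

Here current holds both conflicting beliefs about Phil, which is the situation the operators, not the memory, are responsible for avoiding.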
The sequence-recall-task domain also presented issues for knowledge representation. It was apparent from the protocols that although the conversants could recite the entire list of letters, they dealt locally with confirmation and communication. Typically, conversants would go through a short part of the sequence of letters, and then explain that these letters were followed by N blanks. In effect, they created subsequence substructures from the sequence as a whole using the apparent features of the otherwise random sequences. This behavior is consistent with that observed in psychological studies of the ability to recall or to predict sequences. People tend to induce pattern descriptions from the sequences and rely on notions of same and next alphabetic characters, iteration of subpatterns, and hierarchic phrase structure (Simon, 1972). The maximal length of the subsequences in the model was set to either (1) the extent to which the symbols were of the same sort (i.e., letter or blank), or (2) a chunking constant representing the capacity of short‑term memory. In this study, the chunking constant was set to five, which appears to be a psychologically plausible value for sequence recall (Simon, 1974). The domain-related functions which created the subsequences are listed in Appendix J. As implemented, subsequences and letters have indexes back into their super-sequence.
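The chunking scheme can be sketched as follows. This is a Python illustration, not the Appendix J functions. It assumes that a subsequence ends at whichever comes first, a change of symbol sort or the chunking constant, which is one reading of the two limits described above. The underscore used for a blank and the function names are hypothetical.

```python
from itertools import groupby
from typing import List, Tuple

CHUNK_LIMIT = 5   # chunking constant reported in the text
BLANK = "_"       # hypothetical marker for a blank position


def sort_of(symbol: str) -> str:
    """Classify a symbol as a blank or a letter."""
    return "blank" if symbol == BLANK else "letter"


def subsequences(sequence: str, limit: int = CHUNK_LIMIT) -> List[Tuple[int, str]]:
    """Split a sequence into (start_index, chunk) pairs.

    A chunk never mixes letters and blanks and never exceeds `limit` symbols.
    The start index points back into the super-sequence.
    """
    chunks: List[Tuple[int, str]] = []
    index = 0
    for _, group in groupby(sequence, key=sort_of):
        run = "".join(group)
        for offset in range(0, len(run), limit):
            chunks.append((index + offset, run[offset:offset + limit]))
        index += len(run)
    return chunks


chunks = subsequences("IBM___QRSTUVW")
```

For the invented sequence above, the run of seven letters is split at the chunking constant, giving chunks that start at indexes 0, 3, 6, and 11.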
The operators are represented in standard rule form, as shown in Figure 14. The “if”, “not”, and “test” parts of the rule represent a differentiation of functions grouped together in the simpler “IF” part of the conversational operators discussed in Chapter V. If the “if” clauses are true, the “not” clauses cannot be proved, and the “test” predicates are true, then the operator is instantiated because its felicity conditions obtain. If the operator is executed, the “action” acts are posted to the memories of both conversants and the “effects” states are added or deleted as indicated from the memory of the conversant executing the operator. The attributes contain name and level information useful to readers and to the conflict resolver; the levels are intended to correspond to the levels of illocutionary acts presented in Figure 2.
if ([ <state1>, <state2>, ... ]),
not ([ <state3>, <state4>, ... ]),
test ([ <predicate1>, <predicate2>, ... ]),
action ([ <act1>, <act2>, ... ]),
effects ([ add (<state5>), del (<state6>), ... ]),
attributes ([ name (<operator_name>), level ( <illocutionary_level>)] ).
Figure 14. Form of rule representing meta‑locutionary operator. The “if” clauses are matched against the conversant's active memory. The “not” clauses succeed if they cannot be matched against memory. The “test” clauses are predicates, written in Prolog, which must succeed if the rule is to be fired. If all clauses are true and the rule is fired, the acts contained as “action” clauses are added to the memories of both conversants, with their variables instantiated as matched in the “if” clauses. The conversant's model is modified by the accordingly instantiated “effects” clauses.
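The asymmetry between “action” and “effects” clauses can be made concrete with a small sketch. It is written in Python rather than Prolog and is not taken from the system listing. The dictionary layout, the helper names post and execute, and the value first_letter are assumptions for illustration. The operator shown is modeled on the action and effects clauses of do_assert, with its variables already instantiated.

```python
from typing import Any, Dict, List

Memory = List[Dict[str, Any]]


def post(memory: Memory, state: tuple, cycle: int) -> None:
    """Add a state to a memory; a negative 'deleted' means still in memory."""
    memory.append({"state": state, "added": cycle, "deleted": -1})


def execute(op: Dict[str, list], actor: str,
            memories: Dict[str, Memory], cycle: int) -> None:
    """Apply one instantiated operator for `actor` on `cycle`."""
    # Action acts are posted to the memories of both conversants.
    for act in op["action"]:
        for memory in memories.values():
            post(memory, act, cycle)
    # Effects change only the memory of the conversant executing the operator.
    own = memories[actor]
    for kind, state in op["effects"]:
        if kind == "add":
            post(own, state, cycle)
        elif kind == "del":
            for entry in own:
                if entry["state"] == state and entry["deleted"] < 0:
                    entry["deleted"] = cycle  # record when the state left memory


goal = ("act", "adam", ("assert", "first_letter", "barney"), "goal_true")
done = ("act", "adam", ("assert", "first_letter", "barney"), "true")

memories: Dict[str, Memory] = {"adam": [], "barney": []}
post(memories["adam"], goal, 1)

do_assert = {"action": [done], "effects": [("del", goal)]}
execute(do_assert, "adam", memories, 2)
```

After execution, both memories contain the completed act, while only Adam's memory records the deletion of the goal. Deletion is represented by stamping the cycle rather than removing the entry, in keeping with the Cycle_deleted convention.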
The Rule-Based System
I now turn to a description of the forward‑chaining rule‑based system with which the computational model was tested. I first discuss the specifications for the system generally, and then report on how the specifications were met with the system as implemented.
In the abstract, the rule‑based system had to meet certain criteria. First, it had to be capable of reasoning from the predicate representations of its facts. Second, it had to have independent knowledge bases for the different conversants while allowing for communicative action. Third, it had to be capable of simulating simultaneous action by the conversants. And fourth, it had to allow customizable functions for conflict‑resolution. To be able to reason from the predicate representation of the facts, the system was built on top of Prolog. It turns out that as implemented, virtually all of the domain and meta knowledge was contained directly in the operators rather than in a separate planning or reasoning subsystem.
Having independent knowledge bases was critical to the simulation. The concept of conversation as a jointly constructed entity meant that each conversant maintains their own model of the conversation while believing the model to be a shared one. To the extent that divergences in the models are detected by the conversants, repair interaction should occur. This means that the knowledge bases had to be set up so that operators could be written generally enough to apply to both conversants yet avoid matching on memorial structures of the other conversant. At the same time, a path between the agents had to exist in order for the acts to be communicated. In the operator representation, the actions are posted as acts in both conversants' knowledge bases.
Simultaneous action has three aspects: inter‑agent, intra‑agent, and mental‑process. Inter‑agent simultaneity means having both conversants take their actions at the same mythical time as the other. A normal cycle‑by‑cycle alternate execution of the conversants’ operators would not have permitted simulation of simultaneous action. That is, the model agents must be able to take action on every cycle irrespective of the implemented order of their actual execution. This can be achieved in a specialized rule‑based system by designing the cycling functions to take account of the two agents and their independent knowledge bases. In contrast, Power’s (1979) agents, named John and Mary, were not capable of simultaneous action. They were separate programs which were explicitly given sequential turns.
Intra‑agent simultaneity means allowing the agents to execute more than one instantiated rule per cycle, unlike, for example, YAPS (Allen, 1983) or OPS5 (Forgy, 1981). The hierarchical nature of illocutionary acts suggested by the feedback‑based control of natural conversation implies that acts on the various levels are occurring at the same time. Cohen and Levesque (1982) observed that a single utterance may contain multiple acts. This requires special handling of the conflict set. The agents written by Power (1979) did not have this problem because they were plan‑based rather than rule‑based.
By mental‑process simultaneity I mean that the system has to recognize that understanding, reasoning, and action all have temporal extent, and all have to be able to occur both sequentially and simultaneously. That is, while an agent is taking one action, it may also be recognizing an action by the other agent and reasoning about states and goals. Power (1979) did not address this issue.
Finally, the control knowledge needed for conflict resolution has to be domain specific. Aside from the need for executing multiple instantiations, the conflict resolver has to be able to reason about consistent and conflicting acts and effects, and the relationships among different classes of acts. Moreover, conflict resolution may need to be tuned as more experience is gained with the simulation or different hypotheses are tested with respect to reasoning about actions. For example, if there are conflicting effects among acts at different conversational levels, should the system execute the lowest‑level operator or the highest‑level? Or perhaps the system should use some other, situation‑based meta‑rules (see Davis, 1980). Although the system is not able to handle meta‑rules, it does rely on customizable conflict resolution in the spirit of McDermott and Forgy (1978) and ORBS (Fickas & Novick, 1985). The basic conflict resolution strategy used in the simulation was (1) reduce the conflict set to avoid multiple instantiations of operators at the same illocutionary level; (2) find the lowest‑level instantiation, and prefer it; and (3) eliminate any instantiations whose acts and effects conflict with those of the preferred instantiation. The specific functions which handle conflict resolution can be found under the “select_ops” clauses in Appendix G.
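The three-step strategy can be sketched in a few lines. This is a Python illustration rather than the select_ops clauses of Appendix G. It assumes an ordering of levels in which turn is lowest, followed by information and then domain. It also assumes that rule order decides which instantiation survives within a level, and that two instantiations conflict when one adds a state the other deletes. The operator named hypothetical_domain_op and the states used are invented.

```python
from typing import Dict, List

# Assumed ordering of conversational levels, lowest first.
LEVEL_RANK = {"turn": 0, "information": 1, "domain": 2}


def added(inst: Dict) -> set:
    """States an instantiation would add: its acts plus its 'add' effects."""
    return set(inst["action"]) | {s for kind, s in inst["effects"] if kind == "add"}


def deleted(inst: Dict) -> set:
    """States an instantiation would delete."""
    return {s for kind, s in inst["effects"] if kind == "del"}


def conflicts(a: Dict, b: Dict) -> bool:
    """True when one instantiation adds a state that the other deletes."""
    return bool(added(a) & deleted(b)) or bool(deleted(a) & added(b))


def select_ops_sketch(conflict_set: List[Dict]) -> List[Dict]:
    # (1) Keep at most one instantiation per level (first in rule order).
    by_level: Dict[str, Dict] = {}
    for inst in conflict_set:
        by_level.setdefault(inst["level"], inst)
    reduced = list(by_level.values())
    if not reduced:
        return []
    # (2) Prefer the lowest-level instantiation.
    preferred = min(reduced, key=lambda inst: LEVEL_RANK[inst["level"]])
    # (3) Drop anything whose acts or effects conflict with the preferred one.
    return [i for i in reduced if i is preferred or not conflicts(preferred, i)]


turn_act = ("act", "adam", ("give_turn", "barney"), "true")
conflict_set = [
    {"name": "give_turn_1", "level": "turn", "action": [turn_act], "effects": []},
    {"name": "give_turn_2", "level": "turn", "action": [turn_act], "effects": []},
    {"name": "do_request", "level": "information",
     "action": [("act", "adam", ("request", "first_subsequence"), "true")],
     "effects": []},
    {"name": "hypothetical_domain_op", "level": "domain",
     "action": [], "effects": [("del", turn_act)]},
]
selected = [inst["name"] for inst in select_ops_sketch(conflict_set)]
```

In this invented conflict set the second turn-level instantiation is dropped in step 1, and the domain-level instantiation is dropped in step 3 because it would delete the act that the preferred turn-level operator adds.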
The rule‑based system in which the simulation was conducted was written in SICStus Prolog (Swedish Institute of Computer Science, 1988), running on a VAX 11‑750 computer. SICStus Prolog is similar in syntax and essential semantics to DECsystem‑10 PROLOG and Quintus Prolog. A listing of the system is presented in Appendix G (although utility clauses commonly used in Prolog programs are omitted).
Basically, the system consists of a set of conventions about representation of states, a rule‑matcher, a conflict resolution function, a driver for cyclic action, some I/O functions, and some development tools. The conventions for representation are exactly as discussed above. They had been prototyped as objects in KEE 3.0 and then unfolded into predicate representations for purposes of clarity. The separation of the knowledge bases for the two conversants (named Adam and Barney, representing subjects A and B) was implemented through the use of the names of the conversants as a key into Prolog's internal database. Facts in memory are considered active if they were added before the current cycle and either (1) never deleted or (2) deleted after the current cycle.
The rule matcher takes a straightforward brute‑force approach rather than using a rete‑style approach. Given the current number of rules (fewer than 40) and the number of facts in the knowledge‑base (fewer than 1000 after 20 cycles), the matcher's performance is acceptable. It relies on Prolog's internal unification functions. Note that the operators’ “not” clauses refer to the inability to match facts in the memory rather than explicit falsity. This means that the nots-matcher will find explicitly false states in active memory but will also find any non‑existent state. This approach is much more efficient than the alternative in representing knowledge in memory. The tests are simply evaluated in Prolog directly, with the variables in the test predicates instantiated from the “if” and “not” clauses. The test predicates can also be used to generate values used in the right‑hand-side clauses of the operator. A set of predicates for use as test predicates in the experimental domain are found in Appendix J.
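A brute-force matcher of this kind can be sketched as follows. The sketch is in Python and substitutes a small hand-written pattern matcher for Prolog's built-in unification, so it is an approximation rather than the matcher in Appendix G. Following the Prolog convention, capitalized strings are treated as variables. The rule shown is only loosely modeled on assertion_received_1: its “if” and “test” clauses are invented for illustration, and only the “not” clause follows Figure 16.

```python
from typing import Callable, Dict, Iterator, List, Optional

Env = Dict[str, object]


def is_var(term: object) -> bool:
    """Capitalized strings play the role of Prolog variables."""
    return isinstance(term, str) and term[:1].isupper()


def unify(pattern: object, fact: object, env: Env) -> Optional[Env]:
    """Match a pattern against a ground fact, extending the bindings."""
    if is_var(pattern):
        if pattern in env:
            return env if env[pattern] == fact else None
        extended = dict(env)
        extended[pattern] = fact
        return extended
    if isinstance(pattern, tuple) and isinstance(fact, tuple):
        if len(pattern) != len(fact):
            return None
        current: Optional[Env] = env
        for p, f in zip(pattern, fact):
            current = unify(p, f, current)
            if current is None:
                return None
        return current
    return env if pattern == fact else None


def match_all(patterns: List[tuple], facts: List[tuple], env: Env) -> Iterator[Env]:
    """Try every fact for every 'if' pattern (brute force, with backtracking)."""
    if not patterns:
        yield env
        return
    for fact in facts:
        bound = unify(patterns[0], fact, env)
        if bound is not None:
            yield from match_all(patterns[1:], facts, bound)


def instantiations(rule: Dict, facts: List[tuple]) -> Iterator[Env]:
    for env in match_all(rule["if"], facts, {}):
        # A 'not' clause fails as soon as any fact in memory can be matched.
        blocked = any(unify(p, f, env) is not None
                      for p in rule["not"] for f in facts)
        tests: List[Callable[[Env], bool]] = rule["test"]
        if not blocked and all(test(env) for test in tests):
            yield env


rule = {
    "if": [("act", "Other", ("assert", "Info", "Me"), "true")],   # invented
    "not": [("believe", "Other", "Info", "Belief_value")],
    "test": [lambda env: env["Other"] != env["Me"]],              # invented
}
facts = [("act", "barney", ("assert", "first_letter", "adam"), "true")]
matches = list(instantiations(rule, facts))
```

With the single fact shown, the rule is instantiated once. Adding any belief about barney and first_letter to the facts, whatever its belief value, blocks the rule, since the unbound Belief_value matches anything. This is how the “not” clauses operate on inability to match rather than on explicit falsity.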
The conflict resolution function is called by the predicate select_ops/2. The arguments are the conflict set and a subset selected for execution. In the present implementation, the work of resolving conflicts among the matched instantiations is performed by making sure that only one act on any conversational level is chosen. This choice was made (1) in the interests of simplicity, and (2) as a way of expressing, as observed in the protocol discourse, that the conversants’ utterances generally focus on one theme. A problem with this function as implemented is that it simply uses rule order to determine which of the acts in any given level should be executed. This is an area where meta‑rules for conflict resolution, representing processes for determining intentional choices, would be of value. It would also be interesting to relax this constraint and see what happens if, for example, multiple domain‑level acts were permitted simultaneously. This might reproduce the indirect speech‑act behavior observed in the protocol at Ai2. In the present function, having reduced the conflict set in this manner, each act is assigned a priority corresponding to its level. Currently, the system chooses the lowest‑level act as the most important. The rationale for this design decision was to preserve the coherence of the flow of the discourse, keeping up the feedback, even if it meant that domain‑level actions were sometimes deferred. This reflects the behaviors observed in the protocol, where the confirmations of mutuality of knowledge continue even as a conversant is re‑evaluating his or her domain goals. Finally, the conflict resolution function makes sure that the selected instantiations of operators have actions and effects which are consistent with each other. That is, one operator should not be adding a state which is being deleted by another. Only those operators whose effects are consistent with those of the priority operator are selected.
The implementation of this function is reasonably straightforward; it is the design choice underlying it which still needs thought. It turns out that some operators will take an action--thus adding it to the conversant's own knowledge base--and then immediately delete it because a record of the action turns out not to be needed in active memory. The operator give_turn_1 (listed in Appendix H) is an example of this. In other cases, variations in belief states may cause apparent inconsistency to be found where the operators are really in substantial agreement. For example, an act added as a goal might also be added as a known act. Both of these problems would be minimized or eliminated by expansion of the system's capability to reason about opportunistic action and the conversant's own intentionality.
The driver for the rule-based system runs the match-resolve‑execute cycle for one conversant and then the other, using the same cycle number. Actions taken by the first agent are not used by the second agent in processing on the same cycle. This was implemented via the definition of active memory as those states which had been added before the current cycle. Drivers for single‑cycle, multiple‑cycle, and indefinite execution are available. These are run_single_cycle, run_to_cycle, and run, respectively. The indefinite execution halts when the system completes a cycle in which neither conversant has had any rule fire.
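The way the active-memory definition yields inter-agent simultaneity can be shown with a toy driver. The sketch is in Python, not Prolog, and it is not the run, run_to_cycle, or run_single_cycle code of the system. The step function stands in for the whole match-and-resolve stage, and the greet and reply acts are invented. Only the definition of active memory and the halting condition are taken from the description above.

```python
from typing import Any, Callable, Dict, List

Memory = List[Dict[str, Any]]


def is_active(entry: Dict[str, Any], cycle: int) -> bool:
    """Active: added before this cycle, and never deleted or deleted after it."""
    return entry["added"] < cycle and (entry["deleted"] < 0 or entry["deleted"] > cycle)


def run(agents: List[str], memories: Dict[str, Memory],
        step: Callable[[str, List[tuple]], List[tuple]],
        max_cycles: int = 50) -> int:
    """Run cycles until one passes in which neither agent acts; return that cycle."""
    cycle = 1
    while cycle <= max_cycles:
        fired_any = False
        for agent in agents:  # executed in order, but on the same cycle number
            active = [e["state"] for e in memories[agent] if is_active(e, cycle)]
            acts = step(agent, active)
            for act in acts:  # acts are posted to both memories, stamped with this cycle
                for memory in memories.values():
                    memory.append({"state": act, "added": cycle, "deleted": -1})
            fired_any = fired_any or bool(acts)
        if not fired_any:
            return cycle
        cycle += 1
    return cycle


def toy_step(agent: str, active: List[tuple]) -> List[tuple]:
    """Invented behavior: adam greets once; barney replies once he has seen it."""
    if agent == "adam" and ("greet", "adam") not in active:
        return [("greet", "adam")]
    if (agent == "barney" and ("greet", "adam") in active
            and ("reply", "barney") not in active):
        return [("reply", "barney")]
    return []


memories: Dict[str, Memory] = {"adam": [], "barney": []}
halted_at = run(["adam", "barney"], memories, toy_step)
reply_cycles = [e["added"] for e in memories["adam"] if e["state"] == ("reply", "barney")]
```

Although adam is processed first on cycle 1, barney does not see the greeting until cycle 2, because states stamped with the current cycle are not yet active. The run halts on cycle 3, the first cycle in which neither agent acts.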
The I/O clauses create two output forms, a short trace and a full trace. The short trace is displayed on the screen as the system runs and is also incrementally appended into a file. The short trace shows which operators fired and what actions were taken. A short trace of a simulation of the protocol described in Chapter V is set out in Appendix E. The full trace is incrementally appended into a different file. This trace contains a cycle-by-cycle listing of active memory and full representations of all of the operators executed on each cycle. A full trace of the simulation protocol is set out in Appendix F.
The development tools for the system include clauses to check matching of a given operator for a conversant, to display active memory, to display other states in memory for other cycles, to display an operator, to run a cycle with full output to the terminal, to reinitialize memory and the operators, to add and delete states in the knowledge base, and to back the system up to a previous cycle state. These tools proved useful in both refining the operators and simulating miscomprehension by changing states in memory.
Implementation of the Model as Rules
In building a computational system which models the meta‑locutionary acts, one might begin by recognizing that there is a correspondence between the action‑producing rules and the acts themselves. That is, the firing of a rule which has as its consequence an act represents an actualization or embodiment of the situated act. There might be, though, situations which produce the same sort of act which differ by more than mere instantiation of their premises; wholly distinct sets of premises might cause the taking of the same act. As a result, acts should typically be represented by a corresponding set of rules (or operators, as they are called in the theory) rather than a single rule. In the particular system I have built, the rules create an instantiation of the act as a record of its being taken.
Through the process of developing the rules, intermixture of actions of different levels in the same rule was reduced and eventually eliminated. That is, rules represent operators at specific levels of the discourse hierarchy. The effect of one utterance embodying acts of different levels was produced through refinement of the left-hand-sides of the rules for the different‑level operators. For example, an instantiation of the rule do_request, an information‑level rule, is typically accompanied by an instantiation of one of the give_turn rules, which are turn‑level rules.
The simulation modeled meta-locutionary operators for turn‑taking and utterance expression. The operators were written as rules in the turn‑taking and information classes, respectively. For both levels, there were three basic types of rules: reasoning about intention, doing an act, and recognizing an act by the other agent. Figure 15 shows a rule for the act of assertion and Figure 16 shows a rule for recognizing an assertion by the other agent. The complete set of meta‑locutionary rules used in the simulation is presented in Appendix H.
The rule shown in Figure 15 has one action, namely the asserting, and changes the belief state of the triggering clause from goal_true to true (via automatic posting of the action into the conversant's own active memory). The rule in Figure 16 has no action. It captures part of the process of reasoning about the other agent's knowledge. The effect of executing an instantiation of assertion_received_1 is to add to the conversant's active memory a state which represents the conversant's belief as to the belief of the other agent. This rule is naïve, because it is possible that the other agent is not telling the truth, but as no one in the protocols was found to have intentionally not spoken the truth, a more sophisticated rule was not needed for the simulation.
op( % do_assert
action([ act(Me,assert(Info,Other),true) ]),
effects([ del(act(Me,assert(Info,Other),goal_true)) ]),
level(information) ]) ).
Figure 15. Meta‑locutionary rule do_assert. The rule is represented as a Prolog clause. Comments are marked by the ‘%’ symbol. There are three ‘if’ clauses for the operator; these are matched against a conversant's active memory. There is only one ‘action’ clause; it is added to the memory of both conversants.
op( % assertion_received_1
not([ believe(Other,Info,Belief_value) ]),
effects([ add(believe(Other,Info,Belief_value)) ]),
level(information) ]) ).
Figure 16. Meta‑locutionary rule assertion_received_1. The rule is represented as a Prolog clause. Comments are marked by the `%’ symbol.
The simulation showed that the number and complexity of rules needed to handle these basic functions is quite small. In all, 15 meta-locutionary operators were used in the simulation. Of course, these rules represented only two meta‑locutionary conversational levels in addition to domain‑knowledge levels. The other conversational levels implied in Figure 2 would have corresponding sets of additional operators in a more complete conversational simulation. Thus to the extent that additional skills for interaction are needed for coherence of more complex conversation, many more new operators might have to be added. The operators developed in this dissertation constitute perhaps 35 percent of the operators at their own conversational level. Accordingly, they certainly constitute no more than 10 percent of the total number of conversational operators for all meta‑locutionary conversational levels from turn‑taking up.
Representing the illocutionary knowledge of the conversants turned out to be a more difficult task than representing the meta‑locutionary knowledge, even in such an apparently simple domain. Fifteen meta‑locutionary rules were used in the simulation. Eighteen domain rules were written, of which 16 were used in the simulation whose trace is presented here. Again, there were rules for reasoning, taking, and recognizing acts. Figure 17 shows an example of a domain rule which reasons about acts to be taken. The rule, confirmed_next_letter, is instantiated if the agent (1) is trying to confirm mutual knowledge of a subsequence, (2) is trying to confirm mutual knowledge of a letter (presumably in that subsequence), and (3) now believes that the letter is, in fact, mutually known true. As a result, this operator will then remove the goals about the current letter and establish goals for the next letter to be confirmed. If there had been no next letter (i.e., if this had been the last letter in the subsequence), the test clause would have failed; I expect that the rule confirm_last_letter would then have been instantiated instead.
op( % confirmed_next_letter
test([ get_next_letter(Index,List,New_letter) ]),
level(domain) ]) )
Figure 17. Domain rule confirmed_next_letter. The rule is represented as a Prolog clause. Comments are marked by the `%’ symbol.
The biggest problem with the domain rules was prevention of repetitious and redundant instantiations. This was caused by the system’s design using the situated action approach rather than a stack-based planner. That is, if a rule could be executed on one cycle, it would be likely to fire on the next cycle as well. This problem was tempered through the use of not clauses, which took account of the operator's own history in posting to the knowledge base. Such extensive use of nots, however, seems (1) unduly cumbersome and (2) unrealistic as a representation of human memory. Much of this process could be moved to the conflict resolution process if that process could use meta-logic to take account of the agent's intentional state. In other words, although an agent could repeat an action indefinitely, the fact that the agent has already performed an action usually means that the agent does not consider doing it again; changes in circumstances (which could include lack of response) might cause an action to be repeated. Thus as a strategy for conflict resolution, it might be possible to choose actions on the basis of what the agent believes it is currently doing in the conversation. This implies that the conflict resolution strategy would have to be tied to the agent's history.
Another helpful approach to the control issue would be the addition of meta‑rules which specifically had memory maintenance responsibilities. Thus, for example, a representation of focus structure could allow the operation of rules which removed facts from active memory when they pertained to a matter not in focus; other rules would restore facts to active memory when their subject again became the focus of the conversation.
Results of the Simulation
After development of the rule set, the system was used to simulate the conversation reported in the protocol analyzed in Chapter V. The initial state of the system is listed in Appendix K. In brief, for each agent the initial state identified the conversants, set up goals of mutually confirming their respective sequences, identified the first subsequence, and set an initial turn state. In addition, Adam’s initial state included an act of Barney’s giving the turn to Adam; this brought the simulated conversation up to the point of A’s act Ai1, and the simulation proceeded from there.
Simulation of Experimental Protocol
The first simulation looked at what the agents’ conversation would be like in the absence of the misconstruals caused by homophony observed in the experimental protocol. Short and full traces of this simulation are presented in Appendices E and F. Figure 18 shows the first two full cycles of the short trace.
For cycle 1, the trace shows Adam reasoning that he should try to request the next subsequence. (Recall that A’s first letter was a blank.) At the same time, Adam takes an act: acknowledgment of Barney’s initial‑state act of giving Adam the turn. Simultaneously, Barney is reasoning that he should try to confirm the next subsequence of his sequence. (B’s sequence began with a letter.) For cycle 2, Adam, using his turn, takes an information‑level act: requesting that Barney assert the first subsequence because he, Adam, has a blank. At the same time, an operator at the turn level fires, taking the act of giving Barney the turn. This is, clearly enough, a salutary effect, as Adam would want Barney to be able to respond to his request. Meanwhile, Barney is still reasoning about what to do next. He realizes that he will need to confirm the next (which in this case is the first) letter of the subsequence. At the same time, he recognizes that Adam has the turn on this cycle. At this point, the simulation has progressed up through act Ai2. In later cycles, Barney then asserts that the first letter is "I." Without the miscommunication engendered by homophony, he then requests that Adam tell him the second letter, giving Adam the turn. Adam does so. The agents then go on to begin to confirm the next subsequences. At this point, as can be seen in the traces in Appendices E and F, the conversation begins to break down. At cycle 19, the conversation stops. This failure is attributable to (1) the lack of domain rules for proper handling of the confirmation of individual letters, and (2) the lack of rules for agreement on which subsequence is being looked at. With the refinement and addition of these rules, it appears that the conversation would have continued until the sequences were mutually confirmed. As it was, the simulation was able to get through the first three or four exchanges in a meaningful (to us), plausible way.
[Figure 18 trace: operators and acts for adam and barney, cycles 1 and 2.]
Figure 18. Trace of simulation of protocol. This figure shows the first two complete cycles of a multi‑cycle interaction. The reasoning and acts of each conversant are considered to be simultaneous with those of the other conversant for a given cycle. The first level of indentation presents the names of the operators which were executed on that cycle. If any acts were taken, the acts follow the dash after the operator at the second level of indentation.
The results of this simulation indicate that meta‑locutionary operators can plausibly provide the control processes which lend coherence to mixed‑initiative discourse. The traces of the simulation show the agents requesting, taking, giving, and holding turns as needed to accomplish their domain goals. This control was accomplished without the use of semaphores.17 Rather, the agents had only their own understanding of what they believed the state of the conversation to be. Additionally, the simulation showed situated action being used successfully to drive the agents’ responses. Until the exchanges began to break down due to domain‑rule failures in the later cycles, a form of global coherency in the conversation was achieved entirely through the use of local, context‑driven operators.
It is possible that the turn‑taking operators could create a state in which the agents repeatedly try to take or give turns, thus finding themselves in a loop. However, this sort of thrashing did not occur in any of the simulation runs in this experiment. To be certain of avoiding thrashing, more sophisticated use of the conversational history would be needed. If the agents had memorial operators which checked for fruitless repetition in turn‑taking, these operators could cause the agents to step back and try something else. Such control would, in effect, be meta-meta because it causes changes in the conversation’s processes of control. The major point here is that to avoid ineffectual actions, the agent should have operators which reflect knowledge about the relationship between action patterns and intention.
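The kind of memorial operator suggested above can be sketched as follows. This is a minimal illustration in Python rather than the Prolog of the simulation, and it was not part of the implemented system. The act names in TURN_ACTS, the function is_thrashing, and the window size are assumptions made for the example.

```python
from collections import deque

# Hypothetical turn-level act names; the simulation's own predicates may differ.
TURN_ACTS = {"take_turn", "give_turn", "request_turn"}

def is_thrashing(history, window=6):
    """Return True if the last `window` acts in the history are all
    turn-level acts, i.e. no information-level act has intervened."""
    recent = list(history)[-window:]
    if len(recent) < window:
        return False
    return all(act in TURN_ACTS for (_agent, act) in recent)

# Conversational history as (agent, act) pairs, most recent last.
history = deque(maxlen=50)
for _cycle in range(3):
    history.append(("adam", "give_turn"))
    history.append(("barney", "give_turn"))

if is_thrashing(history):
    print("fruitless turn exchange detected: step back and try something else")
```

An operator of this kind looks only at the pattern of past acts, not at their content, which is what would make the resulting control meta-meta in the sense described above.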
The converse of thrashing is also a concern. In the simulation, each agent expects that if it takes an action, the other agent will eventually react. For example, if an agent makes a request, it will wait for the other agent to respond. Thus, if the other agent does not respond, the conversation will come to an unintended end. This situation is made possible mostly by the lack of temporal awareness in the simulation. That is, none of the operators has an explicit representation of or responsibility for the passage of time. To overcome this problem of stalling, then, the system ought to have operators which note failure of timely response. Such effects were in fact observed in the protocols. For example, a reasonable interpretation of A’s illocutionary act at Ai4 (in Figure 13) is that A has waited for a response to her request‑act Ai2; when B fails to make a timely response, A then takes Ai4 to renew the request by trying to repair a problem which she infers B must have encountered in interpreting Ai2.
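An operator that notes failure of timely response might look something like the following sketch. It is illustrative only and was not implemented in the simulation. The record layout, the function stalled_requests, and the patience threshold are all assumptions.

```python
def stalled_requests(pending, current_cycle, patience=3):
    """Return the pending requests that have gone unanswered for more
    than `patience` cycles. Each request is a dict with hypothetical
    keys 'label', 'cycle' (when it was made), and 'answered'."""
    return [
        r for r in pending
        if not r["answered"] and current_cycle - r["cycle"] > patience
    ]

# A request labelled after act Ai2, assumed here to be made on cycle 2.
pending = [{"label": "Ai2", "cycle": 2, "answered": False}]

for cycle in (4, 6):
    for request in stalled_requests(pending, cycle):
        # In the protocol, A's act Ai4 plays this role: renewing Ai2.
        print(f"cycle {cycle}: renew request {request['label']}")
```

With a patience of three cycles, the request is still considered live at cycle 4 and is flagged for renewal at cycle 6.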
Simulation of Miscomprehension
The rule‑based system was also used as an experimental environment for simulating the miscommunication observed in the protocol. The system was given the same initial state, and ran through the end of cycle 4. At this point, Adam’s knowledge base was modified by changing the substance of B’s act (as indicated in the discussion in Chapter 5 with respect to act Bi6). The simulation was then resumed. As it turned out, the simulation proceeded in a manner virtually identical to that reported above. The principal reason for this appears to be that Adam had no rule available analogous to the R1 operator of Appendix D, which would have permitted the agent to repeat its request as an explicit act. In the absence of operators which explicitly concern actions of correction and repair, it appears that the simulation will not produce remedial feedback. Accordingly, one might quite reasonably conclude that the repair phenomena reported in the socio-linguistic literature are the result of meta‑acts specialized for this purpose. In other words, conversational repairs can be seen as constituting a domain in which persons often have tasks; as a consequence, conversants develop domain‑specialized skills for particular kinds of repairs. The lack of operators representing these skills in the simulation left the agents unable to effect direct repairs.
Contrasts with Power’s System
As Power (1979) observed, the real achievement of his conversational simulation between the robots John and Mary was that it was able to represent the point of an utterance. His system represented a major advance in this area, in that he was able to produce a form of coherent interaction based on recognition of illocutionary meaning. The current research also represents the illocutionary meaning of actions but does not follow Power in producing English text. However, the simulation reported here is a significant advance in three principal respects. First, the simulation models control: the agents control and modify the flow of their own interaction. Power’s robots relied on an executive to allocate control between them. This meant that their turns were explicit and did not have to be recognized. Additionally, they were unable to modify their interaction because the flow of the robots’ conversation was basically pre-planned.
Second, the agents in the meta-locutionary simulation use locally situated acts to produce, without top-down planning, globally organized interaction. Power’s robots had control stacks that recorded and executed conversational planning procedures. Indeed, the robots explicitly discussed their actions in terms of plans and intended plans. In the meta‑locutionary simulation, the agents are guided by their intentions but do not have to be so explicit about developing plans; they are able to react more flexibly to context.
Third, the simulation models more realistic conversation instead of idealized speech. Power’s work was directly in the tradition of speech-act theory. The conversations he modeled were fictional, idealized interactions between robots. The robots’ acts were accordingly stylized. In contrast, the current work attempts to model the kinds of exchanges actually observed in human‑human interaction. The acts used by the agents correspond to acts obtained from analysis of the experimental protocols.
Limitations and Extensions
The simulation as implemented had a number of limitations. Here I describe those limitations and suggest ways in which the system could be improved and extended. The limitations include over-specificity in matching operators against the agents’ memory, separation of different cognitive processes into different cycles of the rule-based system, and a simplified representation for time. Going beyond the specifics of the implementation of the simulation, it would be possible to add other conversants and to push the notion of meta-locutionary action to include processes of lexical choice.
I begin this review of the simulation's limitations by examining the implementation of the simulation as a rule-based system. One problem with the computational implementation so far is over-specificity in the operator‑matching process. For example, operators may match and execute because a clause just happens to be hanging around from some long-previous part of the conversation. The rules were written in such a way that they tried to prevent this problem by consistent deletion of states from active memory. This technique, though, suffers from drawbacks similar to those encountered with coding “not” clauses to prevent repetitious execution. This suggests that the conversants' models should be structured to provide appropriate contexts for matching. Such structure may arise from the multi-layered, hierarchical model of interaction presented in Figure 2. At the same time, there may be discourse-segment structure more closely allied with domain structure, along the lines of the focus spaces developed by Grosz (1981). This could be handled either by increasing the complexity of the key into the database or by adding cross‑state references. It may be that explicit structures for focus are required. In the simulation, a lack of focus was most acutely noticed in the divergence of the agents’ models of the concept of next subsequence: the agents had different patterns of blanks and letters, and so they divided their sequences into correspondingly different subsequences.
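One way to read the suggestion of a more complex key into the database is sketched below, under the assumption that each fact is stored under a focus label. The class FocusedMemory, the focus labels, and the fact strings are hypothetical and do not correspond to structures in the implemented system.

```python
from collections import defaultdict

class FocusedMemory:
    """Hypothetical agent memory in which every fact is recorded under
    a focus key (here, a subsequence label)."""

    def __init__(self):
        self.facts = defaultdict(set)   # focus key -> set of facts

    def note(self, focus, fact):
        self.facts[focus].add(fact)

    def holds(self, focus, fact):
        """True only if the fact was recorded under this focus."""
        return fact in self.facts.get(focus, set())

memory = FocusedMemory()
memory.note("subsequence_1", "requested(barney, assert)")

# An operator matching in the current focus does not see the stale clause
# left over from the earlier subsequence.
print(memory.holds("subsequence_1", "requested(barney, assert)"))
print(memory.holds("subsequence_2", "requested(barney, assert)"))
```

Under this arrangement, stale clauses need not be deleted to keep operators from firing on them; they simply fall outside the current focus. The sketch does not address the harder problem noted above, namely that the two agents must agree on what the current focus is.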
Another problem involved the design choice to spread the reasoning process over multiple cycles. This was intended to correspond to processing times for mental processes of conversants. In some cases this proved reasonable. In others, however, the multiple cycles needed for recognition, reasoning, and then action caused the agents to continue behaviors that in the protocols are immediately suspended. This suggests that the rules should be typed not only for level but for function as well. That is, rules would be of recognition, reasoning or action type. Then for each major cycle, the rule-based system would first fire each set of function-type rules in order, with later sets able to use the results of the earlier ones. This would necessitate reworking of the system for inter-agent simultaneity, but this should not be a difficult problem. In turn, if the temporal cost of the operators could be declared, then the system could keep track of inter-cycle dependencies in a more sophisticated manner than that currently used.
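The proposed typing of rules by function can be illustrated with a small sketch. It is not the Prolog meta-interpreter used in the simulation. The rule representation, the fact strings, and the function run_major_cycle are assumptions chosen to show how later sets of rules could use the results of earlier ones within a single major cycle.

```python
PHASES = ("recognition", "reasoning", "action")

def run_major_cycle(rules, memory):
    """Fire rules phase by phase. Within a phase, every rule is matched
    against memory as it stood when the phase began; facts concluded in
    one phase are visible to the next. Returns the names of fired rules."""
    fired = []
    for phase in PHASES:
        new_facts = set()
        for rule in rules:
            if rule["phase"] == phase and rule["if"] <= memory:
                new_facts |= rule["then"]
                fired.append(rule["name"])
        memory |= new_facts
    return fired

# Hypothetical rules, deliberately listed out of phase order.
rules = [
    {"name": "do_assert", "phase": "action",
     "if": {"intends(assert)"}, "then": {"did(assert)"}},
    {"name": "recognize_request", "phase": "recognition",
     "if": {"heard(request)"}, "then": {"believes(other_requested)"}},
    {"name": "decide_to_answer", "phase": "reasoning",
     "if": {"believes(other_requested)"}, "then": {"intends(assert)"}},
]

memory = {"heard(request)"}
print(run_major_cycle(rules, memory))
```

In this sketch, recognition, reasoning, and action all complete within one major cycle, so an agent would not continue a behavior for two further cycles after the evidence for suspending it had arrived.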
A third problem with the simulation also has to do with time. In the system as implemented, all acts take unit time. For complex phenomena, this assumption is probably unwarranted. For example, uttering "O" may possibly be a unit act; reciting a poem surely is not. If the system is to represent processes like interruptions, then, there will have to be an explicit account of the temporal constraints on the physical processes of speech and understanding. Complex utterances will have to be produced over a sequence of cycles. At the limit, this could involve phoneme-by-phoneme (or at least word-by-word) production, and would constitute a logical extension of the meta‑locutionary theory to speech production. This would also enable representation of self‑correction. Absent an adequate representation for temporal extent, then, it will not be possible to build an adequate model of interruption or self-correction.
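Production of an utterance over a sequence of cycles might be sketched as follows. The function produce, its one-unit-per-cycle assumption, and the interrupt_at parameter are hypothetical and were not part of the implemented system.

```python
def produce(units, interrupt_at=None):
    """Emit one unit (for example, one word) per cycle. If an
    interruption arrives on cycle `interrupt_at`, production stops
    before that cycle's unit is emitted.
    Returns (units produced so far, whether the utterance completed)."""
    produced = []
    for cycle, unit in enumerate(units, start=1):
        if interrupt_at is not None and cycle >= interrupt_at:
            return produced, False
        produced.append(unit)
    return produced, True

# A single letter is a unit act and completes in one cycle.
print(produce(["O"]))

# A longer utterance, interrupted on its third cycle, is left incomplete.
print(produce(["the", "next", "letter", "is", "I"], interrupt_at=3))
```

Because the partial utterance is available at every cycle, the same representation could in principle support self-correction as well as interruption.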
I turn now to issues beyond those peculiar to the simulation’s implementation. The system is not inherently limited to two conversants. The predicates representing the acts were developed so as to permit multiple conversants. For example, acts like give_turn and do_request have slots for the “receiving” agent. Changing the rule‑based system's driver clauses for multiple agents would be trivial. However, multi-agent protocols, even in the sequence‑recollection domain, have proven harder to transcribe and map into acts.
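The role of the receiver slot can be shown with a brief sketch. The record type Act, its field names, and the third agent carol are assumptions made for illustration. Only the act names give_turn and do_request come from the simulation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Act:
    name: str          # e.g. "give_turn" or "do_request"
    actor: str         # the agent taking the act
    receiver: str      # the slot for the "receiving" agent
    content: str = ""

def addressed_to(agent, acts):
    """Select the acts whose receiver slot names the given agent."""
    return [a for a in acts if a.receiver == agent]

# One cycle's acts among three conversants; carol is hypothetical.
acts = [
    Act("do_request", "adam", "barney", "assert first subsequence"),
    Act("give_turn", "adam", "barney"),
    Act("give_turn", "barney", "carol"),
]

for act in addressed_to("barney", acts):
    print(act.name, "from", act.actor)
```

Because each act names its receiver explicitly, adding a conversant changes which acts an agent attends to but not the form of the acts themselves.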
14. In this transcript Power added parenthetical information about which conversational procedure was being used. This information was not used by the agents in his system and is provided only for the convenience of the reader.
15. Actually, Power (1979) took the second approach so that observers could more easily follow the conversational flow. He built a simple expectation-based parser which looked for key words.
16. This list of belief values does not contain explicit provisions for dealing with uncertainty. In practice, a conversant’s noting in memory that some fact about the conversation is true can serve as a hypothesis which is later confirmed as mutually_known_true. Adding additional levels of (real) uncertainty would have been difficult to handle gracefully in the rule-based system without adding near-duplicate rules to handle alternative truth values. However, a reasonable use for qualitative truth values might involve meta-control in conflict resolution, where choices of what action to take could depend--either way--on the certainty of the conversant’s knowledge.
17. A semaphore is a protected variable whose value can be accessed only by two indivisible operations. Semaphores are used in computing systems with multiple processes (1) to ensure mutual exclusion and (2) to implement process synchronization (Deitel, 1984).