CHI 2000 Workshop on Natural Language Interfaces
The Hague, The Netherlands, April 3, 2000
This paper outlines the case for use of discourse modelling for design of human computer dialogues and argues that with the growth of intelligent agents, natural language approaches to planning reactive dialogues will be necessary. Current techniques for specifying human-computer dialogues, e.g. state transition sequences, are inadequate for managing interaction with complex, intelligent agents. Designers need theoretically sound advice about how collaborative roles should be designed in agent based systems, and more importantly, how to predict users’ reactions to system initiative and perception of agents’ capabilities. In contrast to highly structured human-computer dialogues, natural human conversations change with the context and expectancies of the participants. With the spread of intelligent agents, user-system dialogues will have to adapt rapidly to changes in task contexts and participants’ expectations.
Early attempts to use discourse models for structuring collaborative work
demonstrated that highly structured dialogues were unusable (Winograd and
Flores 1987); however, more recent planning models for structuring dialogues in
intelligent user interfaces have enabled mixed conversational initiative
(Maybury 1993, Smith and Gordon 1997). Our understanding of communication as rational
activity is well founded theoretically (e.g. Cohen and Levesque 1990), but less
easy to translate into computer models. Because utterance types are highly ambiguous
as speech acts, recognising intention by rational processes requires abduction
over a large body of commonsense knowledge as well as contextual
factors (see the contributions to Bunt and Black in press or Ballim and Wilks 1992). Although some have argued (cf
Chapman 1992 quoting Button) that it is impossible in principle to encapsulate
the rules of conversation by computers, there are some convincing proofs of
concept. Practical implementors have generally chosen to adopt speech or
communicative act classifications, despite the well-known arguments against
reifying speech acts (cf Geis, 1995). Current approaches stress the inductive
acquisition of the ability to classify utterances (Carletta et al 1997; Lager
and Zinovjeva 1999) as speech act tokens. However, there is little research on
similar approaches to communicative acts enacted in conversation with computers.
People's perception of computerised agents radically affects the way they communicate with them, as well as the trust and authority they accord them. This has been demonstrated in empirical studies of telephone conversations with computer agents (Falzon 1991). Linguistic theories such as situation semantics (Barwise and Perry 1983) emphasize the effect of interlocutor role and place on semantic understanding, while Clark (1996) describes how a common ground of understanding is established by discourse, shared actions and representations of information. Discourse theories of politeness (Brown and Levinson 1987) could be applied to designing computer based conversations to change users' attitudes, thereby augmenting proven approaches (Moore and Paris 1993) to planning advisory dialogues based on Rhetorical Structure Theory (Mann and Thompson 1988). Although discourse theory is a promising research direction, current discourse theories have three disadvantages: first, they do not consider the role of external memory and information supplied by designed artifacts; second, they have not been integrated with models of tasks and interaction; and third, the role of user attitude towards conversational partners, although acknowledged, is not explicitly employed in predicting people’s reaction to advice or in planning responses.
The information and perceived affordances supplied by the user interface are seen
as a vital contribution to usability in theories of display based cognition
(Norman 1988, Kieras and Meyer 1997, Kitajima and Polson 1997), while research
in distributed cognition points to the importance of role expectations,
beliefs, and interaction patterns in group working (Hutchins 1996). However, the
effect of people's beliefs and attitudes on user-system communication has
received little attention apart from recent interest in persuasive computing
(Fogg 1998, Tseng and Fogg 1999), which attempts to describe how users' beliefs
and motivations may affect the usability and credibility of user interface
designs. Cognitive theories of
judgement and decision making (Beach 1990, Payne et al 1993) could be applied
to predict how people make decisions to act or communicate based on
preconceived attitudes or by more rational reasoning about agents’ capabilities
and intentions. In our previous research we investigated the integration of
task-action models and communication to model the dependencies between agents’
actions and different types of discourse acts (Sutcliffe and Minocha 1999);
discourse models for multimedia explanations (Sutcliffe & Darzentas 1994);
how different combinations of media can be used to draw users’ attention to
important information (Sutcliffe 1999, Faraday and Sutcliffe 1999); and generic
requirements for ecommerce broking tasks (Sutcliffe et al 1998). This provides a
baseline for developing a discourse/action theory that can be used to predict
human responses to system initiatives and plan human computer
conversations. The discourse/action
theory needs to be synthesised with user models of attitude formation that
describe how decisions are governed by prior knowledge and experience of
interaction, and thereby predict how people's attitudes may affect their
reaction to system advice when interacting with intelligent agents.
The challenge is to develop a linguistic discourse theory that predicts the requirements for human computer communication and can be operationalised to create effective and usable design advice. The theory will need to account for human computer dialogues with intelligent agents with different levels of mutual awareness. The theory will need to predict human interaction with, and conversational response to, machine based agents. To do so we need to account for human attitudes towards, and reactions to, intelligent agents and how users’ perceptions develop as shared common ground through explicit communication, experience of interaction, and preconceived roles. The theory needs to deliver practical usability. Ultimately, designers need advice about when specific design features/services should be delivered in a conversational/task context. A promising approach is to use claims analysis (Carroll and Rosson 1992, Sutcliffe and Carroll 1999) to derive psychologically based design rationale from the discourse theory to guide designers.
The design problems that need to be addressed are:
· How the abilities and limitations of intelligent agents can be communicated to users
· How dialogues with intelligent agents can be designed so users find their advice and decision support acceptable and helpful
The discourse/action theory will require definitions of discourse acts, the predicted effects of computer generated acts on users’ understanding and actions, and exchange structures for selected tasks. Discourse act formalisms have traditionally specified the intended communicative effect and some assumptions about the receiver. However, we need to go further to investigate users’ preconceived attitudes to computer agents from functional descriptions and stereotypical system images, and how attitudes might be changed by explanation of agents' intentions and capabilities or as a result of experience during interaction. The role of system initiative and discourse acts with different degrees of politeness and perlocutionary force has to be understood, together with strategies for communicating agents' capabilities and constraints to users. Although some of these effects are understood in human discourse and we know that human conversational expectations with computer agents are very different, there is little guidance about how advanced human-computer conversations might be structured.
Any theoretical model needs to be operationalised to provide design advice about the form of utterances and the organisation of conversational exchanges in a particular context. Claims provide a succinct means of expressing design principles as psychologically motivated design rationale linked to scenarios of use and prototypical conversations for a set of task contexts. To give one example:
Claim ID: Low level persuasion
Claims: persuading a user to undertake a course of action should be delivered with a Propose discourse act (Sutcliffe and Maiden 1991). Propose discourse acts should motivate the proposal, convey the suggestion, and justify why the user should follow the system’s advice.
Upside: the structure of the propose act gives the user a reason for following the system advice.
Downsides: the user may not believe the system, or might find the advice too verbose.
Example: “System backup is important to ensure that you don’t lose files. To back up your files, select the backup option from the facilities menu and set the time and frequency for automatic backup. If you don’t have a regular backup, system crashes may mean you lose all the files you have been recently working on.”
Scenario of use: Susan has just installed Netscape when the system proposes that she sets a regular backup for saved messages. She does so and is thus relieved when she can restore her files after a subsequent system crash.
Assumptions: User will believe the system’s advice, is capable of acting on the system advice, system’s proposal is comprehensible to the user
Effect: user follows system proposal.
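The claim above has a regular structure, so it could be held as a record that a design tool can store and search. The following is a minimal sketch under that assumption; the class names `ProposeAct` and `Claim`, and their field names, are illustrative choices that mirror the headings of the example rather than a format defined in this paper.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ProposeAct:
    """A Propose discourse act: motivate, convey the suggestion, justify."""
    motivation: str
    suggestion: str
    justification: str

    def render(self) -> str:
        # Deliver the three components in the order the claim prescribes.
        return " ".join((self.motivation, self.suggestion, self.justification))


@dataclass(frozen=True)
class Claim:
    """One design claim; the fields follow the headings of the example claim."""
    claim_id: str
    claim: str
    upsides: Tuple[str, ...]
    downsides: Tuple[str, ...]
    example: ProposeAct
    scenario: str
    assumptions: Tuple[str, ...]
    effect: str


backup_proposal = ProposeAct(
    motivation="System backup is important to ensure that you don't lose files.",
    suggestion=("To back up your files, select the backup option from the "
                "facilities menu and set the time and frequency for automatic backup."),
    justification=("If you don't have a regular backup, system crashes may mean "
                   "you lose all the files you have been recently working on."),
)

low_level_persuasion = Claim(
    claim_id="Low level persuasion",
    claim="Persuasion should be delivered with a Propose discourse act.",
    upsides=("Gives the user a reason for following the system advice.",),
    downsides=("The user may not believe the system.",
               "The user might find the advice too verbose."),
    example=backup_proposal,
    scenario="Susan sets a regular backup and later restores her files after a crash.",
    assumptions=("User will believe the system's advice.",
                 "User is capable of acting on the advice.",
                 "The proposal is comprehensible to the user."),
    effect="User follows system proposal.",
)

print(low_level_persuasion.example.render())
```

Keeping the three parts of the Propose act as separate fields means that a generator could shorten or drop the justification when verbosity, one of the stated downsides, becomes a concern.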
Claims can be cross referenced to the theory in a simple hypermedia network so that designers can access a claim relating to a particular issue, then follow links to its theoretical motivation, its scope illustrated in a scenario of use, and its embodiment in a multimedia exemplar dialogue. Different claims can advise on explanation of agents' capabilities and limitations; the role of system initiative, discourse exchanges for specific tasks and how discourse acts might be represented in speech, text and other media. Heuristics for interpreting users’ responses to agents’ suggestions with differing degrees of politeness and perlocutionary force (e.g. hints, suggestions, proposals, commands) will enable dialogues to be designed by combinations of claims that specify the form of a generated dialogue act and the probable reaction by the receiver. In this manner dialogue models for particular tasks and domains, e.g. broking and sales in ecommerce, could be developed using the theory to specify the interaction between tasks assigned to the system, the degree of system initiative, the structure of conversation, and the users’ reaction to system generated discourse.
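The cross-referencing of claims to theory, scenarios and exemplar dialogues could be sketched as a small graph of typed links. This is only an illustration under stated assumptions: the class `ClaimNetwork`, the node identifiers and the link type names (`motivated_by`, `illustrated_by`, `embodied_in`) are hypothetical and are not taken from an existing tool.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class ClaimNetwork:
    """A minimal hypermedia network: nodes joined by typed, directed links."""

    def __init__(self) -> None:
        # Maps (source node, link type) to the list of target nodes.
        self._links: Dict[Tuple[str, str], List[str]] = defaultdict(list)

    def link(self, source: str, link_type: str, target: str) -> None:
        self._links[(source, link_type)].append(target)

    def follow(self, source: str, link_type: str) -> List[str]:
        # Return a copy so that callers cannot alter the stored links.
        return list(self._links.get((source, link_type), []))


network = ClaimNetwork()
claim = "claim:low-level-persuasion"  # hypothetical node identifier
network.link(claim, "motivated_by", "theory:propose-discourse-act")
network.link(claim, "illustrated_by", "scenario:susan-sets-up-backup")
network.link(claim, "embodied_in", "exemplar:backup-proposal-dialogue")

# A designer starts at a claim and follows a link to its theoretical motivation.
print(network.follow(claim, "motivated_by"))
```

Typed links let the same claim node lead to its theoretical motivation, its scenario of use and its exemplar dialogue without mixing the three kinds of target.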
For example, for broking in ecommerce a set of discourse acts would need to be assembled for questioning the users, interpreting the reply, clarifying user requirements, making proposals (products/services that match requirements), interpreting user choice, as well as clarifying follow up requests. Task analysis and requirements analysis should enable search criteria to be drawn up so that a claims library could be interrogated to assemble a set of claims to form a domain sub-language, and hence dialogue structure for the application. Further claims may then be recruited to advise on natural language and multimedia representations for different types of discourse act. The representation layer will be informed by previous multimedia research (Sutcliffe et al 1998) to advise on the range of multimedia representations for agents’ capabilities and discourse acts (e.g. natural language, language plus still image, animation).
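Interrogating a claims library with search criteria could look like the sketch below. It assumes that each claim is tagged with one discourse act and a set of domains; the tag names, the function names and every library entry other than "Low level persuasion" are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Sequence

# Discourse acts for ecommerce broking, in the order listed in the text.
BROKING_ACTS = (
    "question_user",
    "interpret_reply",
    "clarify_requirements",
    "make_proposal",
    "interpret_choice",
    "clarify_follow_up",
)


@dataclass(frozen=True)
class LibraryClaim:
    claim_id: str
    discourse_act: str
    domains: FrozenSet[str]


def assemble_sublanguage(library: Sequence[LibraryClaim], domain: str,
                         acts: Sequence[str]) -> List[LibraryClaim]:
    """Select the claims for one domain, ordered by the dialogue's discourse acts."""
    selected: List[LibraryClaim] = []
    for act in acts:
        selected.extend(c for c in library
                        if c.discourse_act == act and domain in c.domains)
    return selected


def uncovered_acts(selected: Sequence[LibraryClaim],
                   acts: Sequence[str]) -> List[str]:
    """List the discourse acts for which the library supplied no claim."""
    covered = {c.discourse_act for c in selected}
    return [act for act in acts if act not in covered]


library = [
    LibraryClaim("Low level persuasion", "make_proposal",
                 frozenset({"broking", "advice"})),
    LibraryClaim("Requirement elicitation question", "question_user",
                 frozenset({"broking"})),   # hypothetical entry
    LibraryClaim("Requirement clarification", "clarify_requirements",
                 frozenset({"broking"})),   # hypothetical entry
    LibraryClaim("Tutorial hint", "make_proposal",
                 frozenset({"tutoring"})),  # hypothetical entry
]

sublanguage = assemble_sublanguage(library, "broking", BROKING_ACTS)
print([c.claim_id for c in sublanguage])
print(uncovered_acts(sublanguage, BROKING_ACTS))
```

Reporting the uncovered acts separately shows where new claims would have to be written before a dialogue structure for the application is complete.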
However, empirical studies will be necessary to develop and refine the discourse/action model by investigating the usability errors and communication problems that can be attributed to specific design features. Such studies are rarely undertaken by the natural language community, who concentrate on small segments of naturally occurring human-human dialogue. HCI researchers, on the other hand, analyse user behaviour in usability evaluation studies but tend to ignore conversation and verbal reports. Protocol and conversation analysis techniques using prototype designs and Wizard of Oz simulations could provide valuable data for analyzing users' attitudes, perceptions and interaction with intelligent agents and how these may change with experience. One set of empirical studies could assess the formation of users’ attitude to intelligent agents according to different levels of self-disclosure of intent, capabilities and limitations. Critical incidents in dialogues and interaction help to illuminate why misunderstandings occur, and subsequent actions need to be investigated to see whether qualities of preceding dialogue containing the system’s advice had the predicted effect.
A discourse-action theory that predicts users'
reaction to machine agents given a profile of their prior attitudes, knowledge
of the system and dialogue structure offers a radically new paradigm for
designing interaction that progresses beyond the current limited models (e.g.
Norman’s gulfs model) that have been used to date. Other contributions are to
link theory to practical design advice as a set of claims for designing
intelligent agents, communication of agents' capabilities, system initiative,
and ways of implementing discourse acts in natural language and other
modalities for specific communication goals (e.g. persuasion, suggestion,
advising, etc.). The paper has outlined a proposal based on the author’s
previous experience with analysis of natural human explanatory dialogues, and
ideas on where the future convergence of HCI and natural language might lie.
The main message is that this convergence needs to happen irrespective of the
modality of delivery.
References
Ballim, A. and Wilks, Y. (1992) Artificial Believers. Erlbaum.
Barwise J. and Perry J. (1983), Situations and Attitudes, MIT Press.
Beach, L.R. (1990), Image theory: Decision making in personal and organisational contexts, J. Wiley, Chichester.
Brown, P. and Levinson, S.C. (1987) Politeness: Some universals in language usage. Cambridge: Cambridge University Press.
Bunt, H.C. and Black, W.J. to appear. Abduction, Beliefs, Context and Dialogue: Studies in Computational Pragmatics. Benjamins.
Carletta, J., Isard, A., Isard, S., Kowtko, J.C., Doherty-Sneddon, G. and Anderson, A.H. (1997) The reliability of a Dialogue Structure Coding Scheme. Computational Linguistics 23(1), pp. 13-31.
Carroll, J.M. & Rosson, M.B. (1992), Getting around the task-artifact framework: How to make claims and design by scenario. ACM Transactions on Information Systems, 10(2), pp. 181-212.
Chapman, D. (1992) Computer Rules, Conversational Rules. Computational Linguistics 18(4), 531-536.
Clark H.H., (1996), Using Language, Cambridge University Press.
Cohen, P.R. and Levesque, H. (1990) Rational Interaction as the Basis for Communication. In Cohen, P., Morgan, J. and Pollack, M. (Eds.) Intentions in Communication. Cambridge, MA: MIT Press.
Fogg, B.J., (1998) Persuasive Computers: Perspectives and Research Directions. In Human Factors in Computing Systems, CHI-98, pp 225-232, ACM Press.
Falzon, P. (1991), Cooperative dialogues, In Distributed decision making: Cognitive models for cooperative work. Rasmussen J., Brehmer B. and Leplat J., (eds), J. Wiley, Chichester.
Geis, M.L. (1995) Speech acts and conversational interaction. Cambridge: Cambridge University Press.
Hutchins E. (1996), Cognition in the Wild. MIT Press.
Jokinen, K. (1993) Reasoning about coherent and co-operative system responses. Fourth European Workshop on Natural Language Generation preprints, 115-126, Pisa, April 1993.
Jokinen, K. (1994) Response Planning in Information-Seeking Dialogues. PhD thesis, University of Manchester.
Kieras, D.E. & Meyer, D.E., (1997) An overview of the EPIC architecture for Cognition and Performance with application to Human Computer Interaction. Human Computer Interaction, Vol 12(4), pp. 391-438.
Kitajima M. & Polson P.G., (1997) A comprehension based model of exploration. Human Computer Interaction, Vol 12(4), pp 345-390.
Lager, Torbjörn and Natalia Zinovjeva (1999) Training a Dialogue Act Tagger with the µ-TBL System. Paper presented at the Third Swedish Symposium on Multimodal Communication, Linköping University Natural Language Processing Laboratory (NLPLAB), October 16-17, 1999.
Maybury, M.T (1993) Planning multimedia explanations using communicative acts. In Intelligent Multimedia Interfaces, Maybury M.T. (ed), pp 60-74, MIT/AAAI press.
Norman D.A. (1988), The psychology of everyday things, Doubleday.
Payne, J.W., Bettman, J.R., & Johnson, E.J., (1993) The adaptive decision maker. Cambridge: Cambridge University Press.
Smith, R.W. and Gordon, S.A. (1997) Effects of Variable Initiative on Linguistic Behaviour in Human-Computer Spoken Natural Language Dialogue. Computational Linguistics 23(1), 141-168.
Sutcliffe A.G. and Carroll J.M. (1999) Designing claims for reuse in interactive systems design. International Journal of Human Computer Studies 50(3) 213-242
Sutcliffe A.G. and Maiden N.A.M. (1991) Use of discourse acts to plan explanatory discourse. EU KTR project IDEAL, Deliverable D7, Centre for HCI Design, UMIST.
Sutcliffe A.G., Ryan M., J. Hu. and Griffyth J. (1998) Designing a multimedia application for the WWW: the Multimedia Broker experience. In Proceedings of IFIP WG 8.1 Working Conference Information Systems in the WWW Environment, Beijing China, Eds Rolland C., Chen Y. and Fang M.Q., pp 171-196, Chapman and Hall, London.
Sutcliffe A.G. and Minocha S., (1999), Linking business modelling to socio technical system design. In Proceedings of CAiSE’99, Advanced Information System Engineering, Jarke M. and Oberweis A., (eds), pp 73-87, Lecture Notes in Computer Science 1626, Springer, Berlin.
Tseng S. and Fogg B.J. (1999), Credibility and Computer Technology. Communications of the ACM, Vol 42(5), pp 39-49.
Winograd T. and Flores F. (1987), Understanding computers and cognition: A new foundation for design. Addison Wesley, Reading MA.