CHI 2000 Workshop on Natural Language Interfaces
The Hague, The Netherlands, April 3, 2000
Bridging the gap between NLP and HCI:
A new synergy in the name of the user
Antonella De Angeli* and Daniela Petrelli**
* Cognitive Technology Laboratory
Department of Psychology - University of Trieste
Via S. Anastasio, 12; I-34100, Trieste, Italy
deangeli@univ.trieste.it
** IRST - Istituto per la Ricerca Scientifica e Tecnologica
I-38050 Povo (Trento), Italy
petrelli@irst.itc.it
When usability became a major concern in system design, a dichotomy between visual and verbal language arose. Although both interaction modalities were considered well suited to human capabilities [9], the past decades saw the dominance of the graphical approach. In this paradigm, the primary interaction style is Direct Manipulation (DM); verbal language is only a subordinate modality. Words or plain utterances (in the form of menus, error messages or icon labels) are used only when pictures and actions need to be disambiguated. This primitive form of Natural Language (NL) is an asymmetric medium of communication, since the verbal elements are meaningful only to the user.
Until very recently, the Human-Computer Interaction (HCI) community disregarded NL communication. Direct evidence for this claim comes from a meta-analysis of previous CHI proceedings. In the period from 1995 to 1999, papers addressing written NL, speech or multimodal communication represent a very small minority, ranging from 3% in 1995 and 1998 to 6% in 1996. At the same time, researchers working in the field of NL Processing (NLP) have been mainly interested in the computational treatment of linguistic input. With a few exceptions [for instance: 10, 14], designers neglected user studies, assuming that the interaction would naturally be user-friendly.
Despite this, a new tendency is emerging: HCI and NLP have recently begun to join their efforts to build innovative intelligent interfaces. The change raises a number of questions that can be proposed as topics for discussion: (a) Are HCI and NLP complementary fields? (b) Where can they meet? (c) How will NL modify HCI?
To answer the first question (are HCI and NLP complementary fields?), we need to clarify our understanding of the goals and methods of both disciplines. It seems to us that the gap can be only partially explained by epistemic distinctions; it is also related to the strong disciplinary boundaries separating HCI from AI. As a matter of fact, HCI and NLP pursue a common goal: simplifying user interaction with information systems. Despite this, historically they have followed two antithetical design approaches. HCI is, by definition, user-centred; NLP has long been based on a prevailing system-centred view.
The problem of interaction arises because users and computers are different kinds of systems: humans are analog, biological animals; computers are digital, technological machines [9, 13]. To bridge them, HCI concentrates on interfaces, artificial modules able to translate digital signals into analog representations. The focus of attention has always been on users: interfaces adapt computers to the limits, capabilities and needs of humans. NL has been almost ignored not because of any theoretical prejudice about its usability, but because of the many technical difficulties hampering the development of effective systems. The power of language as a tool for solving specific tasks has never been denied; automatic language understanding was simply considered a dream.
On the other hand, for many years NLP has focused on systems, attempting to reproduce verbal communication at the human-computer interface through architectures that process conversational input. In a perfect NL system the traditional concept of the user interface tends to disappear: the language itself constitutes the interface. But a perfect NL system is probably a utopia. For now, and for many years to come, computers will not be able to hold real conversations with users. This limit derives from serious computational difficulties in formalising the marvellously complex structure of human communication. Therefore, to be effective, current technology needs to substantially reduce the richness, freedom, and ambiguity of natural communication. Historically, researchers designing language-oriented systems have assumed that users could easily adapt to system limits. This assumption has produced systems with poor usability, because it stems from a basic misunderstanding of user behaviour. Although adaptation is a fundamental human capability, some of the most basic communication skills escape conscious control and involve hard-wired or automatic processes that cannot easily be modified.
This line of reasoning leads us to a paradox. HCI has neglected NLP because of its technical limitations, yet HCI has the theoretical and methodological apparatus that could help overcome them. On the other hand, NLP has ignored a discipline that could strongly support its development. The loop can be closed by accepting a desirable synergy with distinct roles. Provided that the system is faithful to the real context of use, even a limited linguistic capability can substantially enhance the usability of current computer systems. For the purpose of effective interaction, a reliable language is one that makes explicit the knowledge and processes on which users and computers share a common understanding. Therefore, to optimise NL technology, HCI must provide a deep understanding of users and tasks and propose innovative interface techniques that naturally constrain user production. NLP must implement this knowledge following a user-centred design approach. This convergence leads to our second question: where can they meet?
The controversy between verbal and graphical language has long hampered a desirable synergy between the dialogue mode and the action mode [1]. Nevertheless, effective cross-modal integration can provide advantages over the pure use of either technology [2, 3, 5].
Structured interfaces, whether visual or spoken, can supply a partial solution to many difficulties of natural communication. The influence of the visual context on communication is well established [5, 6]: users plan their utterances according to the perceptual organisation of the interface [6]. When interaction is mediated by structured graphical interfaces, language variability, utterance complexity, and speech disfluencies are substantially reduced [14]. Graphical representations mitigate the intrinsic opacity of language, enabling users to see which objects and actions are available. All in all, interfaces provide conversational frameworks that reduce the need for user initiative in planning and self-structuring information. Similarly, when a narrow-band device such as a telephone is used, a well-designed confirmation dialogue can compensate for the deficiencies of the speech recogniser by strongly simplifying users' input.
On the other hand, NL can relax many constraints of the action mode. Only through language can users delegate actions and act on invisible objects. The availability of complex linguistic structures, such as connectives, conditionals, and quantifiers, allows users to group sets of basic actions into single high-level commands [2]. Moreover, when discussing NL-based interaction, we have to bear in mind that the next generation of users will not be unfamiliar with technology. They will be people who grew up with computers, the "post-Nintendo" generation [7]. The experience they gained playing text-based adventure games could be put to good use when NL mediates interaction. Conversation may well be based not on fully natural communication, but on an innovative pidgin: a mixture of human and computer languages. Therefore, next-generation users may be better able to cope with the constraints of NL systems, especially when multiple communication modalities are provided.
Multimodal communication should represent a natural and profitable evolution of verbal and visual interfaces. Multimodal systems extract and convey meaning through several I/O interfaces (such as microphone, keyboard, mouse, electronic pen, etc.). By combining the dialogue mode and the action mode, they represent an optimal synergy between HCI and NLP. From the user's point of view, they can bring about a major shift in the usability of future computers. Humans can express their intentions spontaneously, selecting the most appropriate modalities according to the circumstances. In particular, multimodal systems combining speech and gesture (the advanced form of DM) were found to be extremely useful whenever the task was to locate and act on objects displayed on the interface [15]. Users were faster, made fewer errors and were less disfluent when interacting via pen and voice than via voice or pen alone [16]. The advantage was primarily due to the limitations of NL in specifying spatial location [3], and to the limitations of gesture in conveying non-standard commands.
As NL gains importance in HCI, interaction will be less and less a matter of pushing buttons and dragging sliders, and more and more a matter of specifying operations and assessing their effects through language. Computers will no longer be media in which performing a task requires users to define and execute every single action; they will work at a higher level, splitting tasks into actions and executing them autonomously. The change can deeply affect the interaction paradigm: from doing to having it done [18]. Consequently, the mental representation elicited by computers may evolve drastically.
Traditionally, computers based on the DM paradigm were perceived as cognitive artifacts: artificial tools for storing, manipulating, and retrieving information [12]. They are artificial instruments created to support representational functions and improve users' performance. Nevertheless, since the beginning of the computer age, a similarity with human beings has been noticed and commonly applied to explain machine behaviour (e.g., computers have a memory, speak a language, can be friendly to users or be infected by viruses).
The tendency to ascribe human-like features to non-human entities is a natural human disposition, known as anthropomorphism [4]. When users interact with NL systems, the human metaphor can be objectively strengthened. Therefore, computers can be perceived as social artifacts: artificial agents capable of setting up cooperative and direct relationships with humans. Elsewhere, we have demonstrated that even basic linguistic capabilities can trigger stronger anthropomorphic attributions than DM interfaces. Other authors support this finding. As early as 1987, [19] suggested that NL-based HCI is fundamentally social because its high interactivity, combined with non-random reactions, leads users to perceive computers as purposeful agents. In 1994, Nass [17] showed that even computer-literate people apply social rules when interacting with computers. What remains to be clarified is the role of linguistic capabilities in eliciting anthropomorphic attributions. Two different experiments [4] showed that linguistic capabilities are not directly taken into consideration when evaluating the degree of anthropomorphism attributed to computers. The mental representation induced by NL systems (whether spoken or written) was hardly affected by the linguistic capability of the system. From these findings, we are developing the idea that it is not the actual language ability, but rather the communicative intent, that triggers the perception of anthropomorphism. This intuition is confirmed by evaluations of systems that do not support any linguistic input from the user. For instance, in the Agneta & Frieda system [8], two characters placed on the user's screen comment on the user's Web surfing in an ironic and sarcastic way. User interviews highlighted strong anthropomorphic and affective reactions (both positive and negative). Even pre-recorded human voices can elicit social reactions [17].
Our idea that, because of NL technology, HCI will shift from a predominantly instrumental activity towards social cognition partially contrasts with the media equation paradigm (media = real life) [17], which states that the same social rules guiding human-human communication are always applied equally to HCI. In our opinion, the idea that social cognition encompasses any exchange of meaning between users and computers raises the problem of differentiating computers from users. The media equation contrasts with the intrinsic flexibility of natural social behaviour, in which people are highly sensitive to differences between partners and modify their behaviour according to the communication context. Therefore, it seems implausible that humans will be blind to the enormous differences between computers and people. On the contrary, we assume that social artifacts will be special stimuli, giving rise to innovative social dimensions. Interface designers are therefore now required to clarify the differences between computers and humans, and to understand the innovative social codes and rules driving interaction.
In conclusion, from a methodological point of view, both HCI and NLP need to upgrade their scientific apparatus to cope with the design of social artifacts. This opens a number of challenging problems regarding the controversial effect of anthropomorphism on HCI. Some claim that the more computers resemble humans, the better, but some preliminary studies [8, 11] revealed several negative reactions towards systems that exaggerate human characteristics in computers. Understanding this issue is urgent, since anthropomorphism opens innovative dimensions for agent technology, pushing towards the realisation of characters with personality. A first example is already on the Web: at the Extempo company (http://www.extempo.com), the dog Max acts as a clerk in e-commerce.
1. Bos, E., Huls, C., and Claassen, W. EDWARD: full integration of language and action in a multimodal user interface. International Journal of Human-Computer Studies, 1994, 40, 473-495.