CHI 2000 Workshop on Natural Language Interfaces
The Hague, The Netherlands, April 3, 2000
Pierre Nugues
Institut des Sciences de la Matière et du Rayonnement
6, boulevard du Maréchal Juin
F-14050 Caen, France
pnugues@greyc.ismra.fr
http://www.greyc.ismra.fr/~pierre
In this text, I first discuss the respective advantages of language
interaction in virtual worlds and of 3D images in language interaction
and dialogue.
I then describe an example of a verbal and written interaction system in
virtual reality, Ulysse, a conversational agent that can help a user navigate
in virtual worlds. Ulysse has been designed to be embedded in the
representation of a participant of a virtual conference. Ulysse responds
positively to motion orders and navigates the user’s viewpoint on his/her behalf
in the virtual world. In tests we carried out, we discovered that users,
novices as well as experienced ones, have difficulty moving in a 3D
environment. Agents such as Ulysse enable a user to carry out navigation
motions that would have been impossible with classical devices.
Finally, the text describes a virtual workbench to study motion verbs. From the
whole Ulysse system, we have stripped off a skeleton architecture that we have
ported to VRML, Java, and Prolog. This skeleton allows the design of language
applications in virtual worlds.
Keywords: Virtual reality, Conversational agents, Spoken navigation.
Computer interfaces have now largely stabilized into a set of
paradigms where variations are merely cosmetic. Following the Macintosh Finder,
the Windows desktop and the various X Window incarnations have all converged on
visualization and interaction means that include windows, icons, menus, and
pointers – the so-called WIMP model.
This model may have reached a plateau and be unable to scale up to new computing
entities such as the Internet or to adapt to new tasks such as cooperative
work or simulation.
Starting from the idea of recreating desktop tools on a computer screen in the
form of symbols – icons –, some researchers thought that virtual reality was a
better paradigm to design metaphors. It pushes desktop symbolization closer
to reality and offers a way to escape some habits of the GUI routine. In
applications such as computer tools for collaborative work, other researchers
saw virtual reality as a means to situate users and make them aware of their
context, notably their coworkers and other working teams, and to enable easier
communication [1].
In many respects, virtual reality is appealing because it brings more realistic
images and more interaction to the computer desktop. Although the cognitive
value of visualization and visual thinking has been extensively described and
researched [2, 3, 4], the power of images and interaction is probably best
captured by the Chinese proverb:
I hear and I forget, I see and I remember, I do and I understand.
Virtual reality addresses the last two points better than any other
interface. From this viewpoint, virtual reality appears as the extreme
development of the interactive desktop metaphor and as an ultimate interface.
In virtual reality environments, however, navigation is often
difficult and interaction can be oppressive. In that way, virtual reality
extends the drawbacks of existing interfaces as well as their advantages. It
requires extensive and sometimes tricky gestures. Opening a folder in the
recreated office of a virtual world would probably require more movements than
with the Macintosh Finder.
Information access and desktop control, which are often tied to ease of
navigation, are key points of good interface design. It is not certain that
widely available virtual reality interfaces address this problem. In experiments
we conducted earlier, we found that computer novices as well as experienced
computer users who were unfamiliar with virtual reality had considerable
navigational difficulties in virtual worlds [5]. In addition, virtual reality
embodies the principle of direct interaction, which requires objects to be
visible, close, and in a relatively upright position. When not visible, objects
are sometimes difficult to find and then to approach, which adds a further
navigation chore.
A language interface would enable easier designation and navigation and hence
help users complete their tasks faster. Although this last statement has no
definitive proof, a hint is given by the analogous example of Web development:
the most popular portals are natural or constrained language interfaces
(e.g. Altavista, Voilà, Excite, Lycos, Yahoo). Thanks to their indexing robots,
they spare a user from clicking through zillions of pages before finding
relevant information. Although the extent of language processing behind these
sites might be discussed, their popularity shows a clear user preference for
designating things using Dutch (or French) rather than navigating links.
Gentner and Nielsen [6], in a prospective article, underlined the limits of WIMP
interfaces. They described the possible role of language in future interfaces.
They noted that language lets us refer to objects that are not immediately
visible, encapsulate complex groups of objects, and support reasoning. They
predicted a slow spread of natural language techniques in interfaces that
would allow a negotiation between the user and the interface thanks to limited
linguistic capabilities.
In our recent projects, we have implemented natural language agents in an
attempt to bring to virtual reality systems some advantages of linguistic
capabilities. We describe them now.
Ulysse was our first implementation of a linguistic device
in a virtual world. From user studies that we undertook, we discovered that
many users were not able to move properly in a virtual world. We designed and
implemented a conversational agent to help their navigation [5, 7, 8, 9, 10].
Ulysse consists of a chart parser and a semantic analyzer that build a
dependency representation of the word stream [11] and a case form. It also
features a reference resolver to associate noun phrases with entities of the
virtual world and a geometric reasoner to cope with prepositions, groups, and
spatial descriptions, and to enable a limited understanding of the structure of
the virtual world (Figure 1).
Figure 1 Ulysse's architecture.
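As an illustration of the kind of computation a geometric reasoner has to carry
out for prepositions such as devant (in front of) or derrière (behind), here is
a minimal sketch in Python. It is not Ulysse's code: the Entity class, the
target_position function, the 2D simplification, the heading convention, and the
fixed standoff distance are all assumptions made for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class Entity:
    """Hypothetical world object, reduced to a 2D position and a heading."""
    name: str
    x: float
    y: float
    heading: float  # radians; direction the object's front faces (assumed)


def target_position(entity, preposition, standoff=2.0):
    """Return (x, y, facing) for a viewpoint placed relative to the entity."""
    # Angular offset from the entity's front for each supported preposition.
    offsets = {"in_front_of": 0.0, "behind": math.pi}
    if preposition not in offsets:
        raise ValueError(f"unsupported preposition: {preposition}")
    angle = entity.heading + offsets[preposition]
    x = entity.x + standoff * math.cos(angle)
    y = entity.y + standoff * math.sin(angle)
    # Orient the viewpoint so that it looks at the entity.
    facing = math.atan2(entity.y - y, entity.x - x)
    return x, y, facing


car = Entity("car", 0.0, 0.0, 0.0)
print(target_position(car, "in_front_of"))
```

A real world would need 3D positions, object extents, and obstacle handling;
the sketch only shows how a preposition can be mapped to a goal position and
orientation for the user's viewpoint.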
Ulysse is embedded in the representation of the user in the world. Upon
navigation commands from the user, Ulysse analyzes the word stream and
navigates the user’s viewpoint on his/her behalf in the virtual world. Figure 2
gives an example of a dialogue with Ulysse. The user moves in the world, goes
around objects, and visits them. Ulysse uses the DIVE environment [13]. In the
experiments we conducted, novice users were not able to do these actions using
a standard finder and a mouse.
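User: Va devant cette voiture (Go in front of this car)
Ulysse: Voilà (There you are)

User: Tourne à droite (Turn right)
Ulysse: Voilà (There you are)

User: Va vers le cube (Go toward the cube)
Ulysse: Il y en a plusieurs (There are several)

User: Va vers les petits cubes (Go toward the small cubes)
Ulysse: Voilà (There you are)

User: Retourne devant la maison (Go back in front of the house)
Ulysse: Voilà (There you are)

User: Va derrière (Go behind it)
Ulysse: Voilà (There you are)

User: Entre dedans (Go inside)
Ulysse: Voilà (There you are)

User: Va devant Jo (Go in front of Jo)
Ulysse: Voilà (There you are)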
Figure 2 A dialogue with Ulysse.
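The exchange about the cube, where a singular noun phrase matches several
objects and the agent answers that there are several, can be sketched as
follows. This is a hypothetical illustration in Python, not the actual
reference resolver: WorldObject, resolve, and the size attribute are names
invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorldObject:
    """Hypothetical entity of the virtual world."""
    ident: str
    kind: str          # e.g. "cube", "car", "house"
    size: str = "medium"


def resolve(world, kind, plural=False, size=None):
    """Match a noun phrase against the world; return (status, matches)."""
    matches = [
        obj for obj in world
        if obj.kind == kind and (size is None or obj.size == size)
    ]
    if not matches:
        return "unknown", []
    if len(matches) > 1 and not plural:
        # A singular phrase with several candidates: ask the user to be
        # more specific instead of picking one arbitrarily.
        return "ambiguous", matches
    return "ok", matches


world = [
    WorldObject("c1", "cube", "small"),
    WorldObject("c2", "cube", "small"),
    WorldObject("c3", "cube", "large"),
    WorldObject("v1", "car"),
]
print(resolve(world, "cube"))                             # ambiguous
print(resolve(world, "cube", plural=True, size="small"))  # two small cubes
```

A fuller resolver would also handle demonstratives (cette voiture), proper
names (Jo), and elliptical references such as Va derrière, which depend on the
dialogue history.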
Ulysse’s action engine is a planner that uses an algorithm derived from STRIPS [12] (Figure 3). Ulysse can be used with a keyboard interface or a speech recognition system such as IBM’s VoiceType or ViaVoice.
Figure 3 Ulysse’s walking motion.
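To make the idea of a STRIPS-style action engine concrete, here is a minimal
sketch in Python. It keeps the STRIPS representation of operators
(preconditions, add list, delete list) but, for brevity, searches forward
breadth-first rather than using the means-ends analysis of the original STRIPS.
The operators and facts (walk_to_door, door_closed, and so on) are hypothetical
and do not come from Ulysse.

```python
from collections import deque
from typing import FrozenSet, NamedTuple


class Operator(NamedTuple):
    """STRIPS-style operator: preconditions, add list, delete list."""
    name: str
    pre: FrozenSet[str]
    add: FrozenSet[str]
    delete: FrozenSet[str]


def plan(initial, goal, operators):
    """Breadth-first forward search; returns operator names or None."""
    start = frozenset(initial)
    goal = frozenset(goal)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, steps = queue.popleft()
        if goal <= state:          # every goal fact holds in this state
            return steps
        for op in operators:
            if op.pre <= state:    # operator is applicable
                nxt = (state - op.delete) | op.add
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, steps + [op.name]))
    return None


# Hypothetical operators for an order such as "Entre dedans" (Go inside).
OPERATORS = [
    Operator("walk_to_door", frozenset({"at_start"}),
             frozenset({"at_door"}), frozenset({"at_start"})),
    Operator("open_door", frozenset({"at_door", "door_closed"}),
             frozenset({"door_open"}), frozenset({"door_closed"})),
    Operator("enter", frozenset({"at_door", "door_open"}),
             frozenset({"inside"}), frozenset({"at_door"})),
]

print(plan({"at_start", "door_closed"}, {"inside"}, OPERATORS))
```

Breadth-first search returns a shortest plan but does not scale to large
operator sets; it is used here only to keep the example short.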
Ulysse has recently undergone some changes. We rewrote it to
adapt its architecture to Internet programming tools and languages, namely Java
and the VRML programming interface. We also modified the parser, which was quite
slow, eliminating the chart parser and replacing it with a more efficient
algorithm. Although VRML Ulysse is not a complete port, it provides the
programmer with a skeleton that is relatively easy to adapt to other tasks
[14].
VRML Ulysse has three main components: a language engine in Prolog, the VRML
world, and a Java applet that forms the input interface and ensures
communication with the VRML world and the language engine. The applet and the
VRML world are linked through the External Authoring Interface. The Java applet
is derived from the Script class that provides facilities to send and receive
events from a VRML 2 world (Figure 4).
Figure 4 VRML Ulysse architecture.
Master's students from the University of Caen have started using VRML Ulysse to
implement logical and kinematic definitions of French motion verbs. They worked
on sauter (jump) and courir dans (run into), for which they implemented a model
in Prolog that they could visualize with VRML Ulysse.
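As a rough idea of what a kinematic definition of a verb such as sauter might
look like, the sketch below samples a parabolic arc between two ground
positions. It is written in Python rather than Prolog and is not the students'
model: the function jump_keyframes, the parabolic shape, and the parameters are
assumptions chosen for the example. The y axis is taken as vertical, as in VRML.

```python
def jump_keyframes(start, end, height, steps=4):
    """Sample a parabolic jump from start to end, both given as (x, z).

    Returns a list of (t, (x, y, z)) pairs with t running from 0 to 1.
    """
    (x0, z0), (x1, z1) = start, end
    frames = []
    for i in range(steps + 1):
        t = i / steps
        x = x0 + t * (x1 - x0)            # linear motion on the ground plane
        z = z0 + t * (z1 - z0)
        y = 4.0 * height * t * (1.0 - t)  # 0 at both ends, `height` at t = 0.5
        frames.append((t, (x, y, z)))
    return frames


for t, position in jump_keyframes((0.0, 0.0), (2.0, 0.0), height=1.0):
    print(t, position)
```

The resulting list of times and positions has the shape of the key and keyValue
fields of a VRML PositionInterpolator node, which is one plausible way to
visualize such a definition in a VRML world.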
We have described a prototype of a conversational agent
embedded within a virtual world. This prototype accepts verbal commands or
descriptions from written texts. The Ulysse conversational agent enables users
to navigate in relatively complex virtual worlds. It also accepts orders to
manipulate objects such as a virtual brain. We believe that a spoken or verbal
interface can improve the quality of interaction between a user and virtual
worlds.
We have also sketched the architecture of a new version of the Ulysse system in
VRML, Prolog, and Java, which allows the design of language applications in
virtual worlds. It has been used to implement kinematic definitions of
French motion verbs. While there are theories in this domain, few are proven,
owing to the lack of experimental devices. Experimental tools are central to the
improvement or design of theories. We hope such a system will make it possible
to experiment with theories of motion verbs, making them implementable and
provable. The prototype is available from the Internet.
In conclusion, we believe that the virtual reality and computational
linguistics communities could cooperate fruitfully. In spite of their
different technical cultures and histories, they could create paradigms to
explore future interfaces and new ways of computing.