Yakkey Translation Interface
by Jani PATOKALLIO
a part of the Yak Project
Introduction
Yakkey is the user's interface to the translation facilities
provided by the Yak-1 and Yak-2 hardware.
The name is yet another horrid pun: not only is it a translation
device (yakki) but also the key to the yak.
The software also likes to yak with other people and
may even be a little yucky.
Domains and mediation
In an ideal world, a translation device like the Yak-1
would understand everything said to it by both parties
and produce perfect translations; in other words,
the domain of the machine's capabilities is
infinite and the machine requires no guidance.
Unfortunately, we live in reality, where speech recognition
and translation have very real limitations. The
supported domain must therefore be restricted to
what the machine can actually handle.
Referring back to the original
specifications, we have two scenarios to deal
with: native to foreign language, and foreign to
native.
Native to foreign
The key problem here is getting the machine to
understand what the user wants. Once a
given thought is coded into the machine, it is trivial
to code a representation of that thought in a
foreign language. For example, if the user
gets across the idea that he wants a beer,
the mapping to "Quiero una cerveza",
"Biiru ippon kudasai" or "Antakaa mulle lisaa
kaljaa" requires just a prerecorded sample for
each supported language, or at least a
prerecorded string that can be run through
speech synthesis.
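To make the "trivial" part concrete, here is a minimal Java sketch, assuming a thought is nothing more than an identifier mapped to one prerecorded string (or sample file name) per language code. The class `Thought`, the id `WANT_BEER`, the language codes and `renderFor` are hypothetical illustrations, not Yakkey's actual classes.

```java
import java.util.Map;
import java.util.Optional;

class ThoughtDemo {
    public static void main(String[] args) {
        Thought beer = new Thought("WANT_BEER", Map.of(
                "es", "Quiero una cerveza",
                "ja", "Biiru ippon kudasai",
                "fi", "Antakaa mulle lisaa kaljaa"));
        // Prints "Biiru ippon kudasai"; an unsupported language falls back.
        System.out.println(beer.renderFor("ja").orElse("(no translation)"));
        System.out.println(beer.renderFor("de").orElse("(no translation)"));
    }
}

// Hypothetical model of a "thought": an identifier plus one prerecorded
// string per supported language code.
final class Thought {
    private final String id;
    private final Map<String, String> translations;

    Thought(String id, Map<String, String> translations) {
        this.id = id;
        this.translations = Map.copyOf(translations); // immutable copy
    }

    String id() {
        return id;
    }

    // Once the thought is known, translation is just a table lookup;
    // empty if this thought has no entry for the requested language.
    Optional<String> renderFor(String language) {
        return Optional.ofNullable(translations.get(language));
    }
}
```

The hard part, working out which thought the user means, is deliberately absent here: by the time a `Thought` exists, that problem has already been solved.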
So how does the user convey what he wants?
Machines are, at least currently, incapable of
"really" understanding anything, but they can
present the user with a list of possible options
(i.e. thoughts that the machine "understands"
and can provide translations of) and allow the
user to select the one that best matches his
desires. Designing the most streamlined
method of accomplishing this is Yakkey's
primary function.
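The select-from-a-list interaction can be sketched as a cursor over the available options, moved up and down by the pointing device and committed with the selection button. `OptionMenu` and its methods are hypothetical names chosen for this illustration; it is a sketch under those assumptions, not Yakkey's implementation.

```java
import java.util.List;

class MenuDemo {
    public static void main(String[] args) {
        OptionMenu menu = new OptionMenu(
                List.of("Daily set menu", "Recommendation", "Cheapest meal"));
        menu.down();
        menu.down();
        menu.down(); // already on the last entry, so the cursor stays put
        menu.up();
        System.out.println(menu.select()); // prints "Recommendation"
    }
}

// Hypothetical menu of options (thoughts the machine can translate),
// navigated with a mouse-type device: scroll up/down, click to select.
final class OptionMenu {
    private final List<String> options;
    private int cursor = 0;

    OptionMenu(List<String> options) {
        if (options.isEmpty()) {
            throw new IllegalArgumentException("menu needs at least one option");
        }
        this.options = List.copyOf(options);
    }

    // Scrolling is clamped at both ends rather than wrapping around.
    void up() {
        cursor = Math.max(0, cursor - 1);
    }

    void down() {
        cursor = Math.min(options.size() - 1, cursor + 1);
    }

    String highlighted() {
        return options.get(cursor);
    }

    // Clicking the selection button returns the highlighted option.
    String select() {
        return highlighted();
    }
}
```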
Foreign to native
The reversed situation is fundamentally
similar. Once the machine understands what
the other person has said, translating it
back to the user's native language is easy.
Complications arise from the fact that it is
considerably more difficult for the other
person to communicate his desires to the machine:
the only natural interface available is speech,
and speech recognition of non-pretrained voices
is in its infancy.
Some other projects, notably CMU SpeechWear, have
circumvented this
by providing an extra display (an LCD on
the wrist, for example) that lets the
other person communicate. While undoubtedly
practical for some situations, this is only a
stopgap measure and thus Yakkey endeavours to
manage with speech recognition only.
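Managing with speech recognition only implies that the other party's replies must come from a tiny, closed vocabulary. A minimal sketch, assuming the recognizer hands back its best hypothesis as an English keyword string (a hypothetical interface; a real recognizer would return something richer): anything outside the vocabulary maps to "not understood", which is the cue for re-asking the question. The `Reply` enum and its `classify` method are hypothetical.

```java
import java.util.Locale;

class ReplyDemo {
    public static void main(String[] args) {
        for (String heard : new String[] {"Yes", "no", "OK", "  YES "}) {
            System.out.println(heard.strip() + " -> " + Reply.classify(heard));
        }
    }
}

// Hypothetical closed vocabulary for the other party's replies.
enum Reply {
    YES, NO, NOT_UNDERSTOOD;

    // Maps a recognizer hypothesis onto the vocabulary; anything else,
    // including a missing hypothesis, counts as not understood.
    static Reply classify(String hypothesis) {
        if (hypothesis == null) {
            return NOT_UNDERSTOOD;
        }
        switch (hypothesis.strip().toLowerCase(Locale.ROOT)) {
            case "yes":
                return YES;
            case "no":
                return NO;
            default:
                return NOT_UNDERSTOOD;
        }
    }
}
```

Keeping the mapping this strict is what lets the dialogue recover gracefully: an unexpected "OK" is not guessed at, it simply triggers a polite repetition of the instructions.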
Requirements and assumptions
The user has a wearable display and a mouse-type input device
with selection button for navigation.
Speech recognition and output are available through
external hardware and software. The speech recognition system
is capable of recognizing a largish set of the
user's input and a very limited set of external input ("yes",
"no", perhaps the numbers).
Design

Yakkey's central components and actors
Domains are encapsulated as scripts that contain all
possible "thoughts" that can be understood, their
translations and their relationships. The user's
wearable displays a menu used to navigate within
the script. Navigation is performed with the input
device, scrolling up and down and clicking on entries,
or (after training) by speaking certain keywords or
keyphrases into the device. Feedback from the other
party is limited to certain key words ("Yes", "No",
"I do not understand"), and scripts are designed in
such a fashion that these suffice. The user is
asked to confirm all input before the machine acts on it.
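The script structure described above can be sketched as a graph of frames: each frame is one menu display, each option may carry a translation to speak, and each option names the frame that follows. The classes `Option`, `Frame`, `Script` and `Navigator` and the frame ids are hypothetical, and the `confirmed` flag merely stands in for the user's confirmation step; YakScript itself is not modelled here.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ScriptDemo {
    public static void main(String[] args) {
        Script script = new Script();
        script.add(new Frame("start", "Select:", List.of(
                new Option("Daily set menu", "Do you have a daily set menu?", "price"),
                new Option("Cheapest meal", "What is your cheapest meal?", "price"))));
        script.add(new Frame("price", "Ask for price?", List.of(
                new Option("No", null, "start"))));

        Navigator nav = new Navigator(script, "start");
        System.out.println(nav.choose(0, false)); // not confirmed: null, stays put
        System.out.println(nav.choose(0, true));  // confirmed: speaks and moves on
        System.out.println(nav.current().prompt);
    }
}

// One option in a frame: utterance is the translation to speak
// (null if none), next is the id of the frame that follows.
final class Option {
    final String label, utterance, next;

    Option(String label, String utterance, String next) {
        this.label = label;
        this.utterance = utterance;
        this.next = next;
    }
}

// A frame is one menu display: a prompt and the options offered under it.
final class Frame {
    final String id, prompt;
    final List<Option> options;

    Frame(String id, String prompt, List<Option> options) {
        this.id = id;
        this.prompt = prompt;
        this.options = List.copyOf(options);
    }
}

// A script holds one whole domain: its frames, indexed by id.
final class Script {
    private final Map<String, Frame> frames = new HashMap<>();

    void add(Frame f) {
        frames.put(f.id, f);
    }

    Frame get(String id) {
        Frame f = frames.get(id);
        if (f == null) {
            throw new IllegalArgumentException("no such frame: " + id);
        }
        return f;
    }
}

// Walks the script; nothing is said and no frame changes until confirmed.
final class Navigator {
    private final Script script;
    private Frame current;

    Navigator(Script script, String startId) {
        this.script = script;
        this.current = script.get(startId);
    }

    Frame current() {
        return current;
    }

    // Returns the utterance to speak, or null if the choice was not
    // confirmed or the option has nothing to say.
    String choose(int index, boolean confirmed) {
        Option o = current.options.get(index);
        if (!confirmed) {
            return null;
        }
        current = script.get(o.next);
        return o.utterance;
    }
}
```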
An interesting and currently undecided question: speech recognition
in both directions would undoubtedly be useful, but
is it necessary? The user can navigate the menus with
the pointing device alone; yes/no-type answers can be
understood through body language (especially if the machine
explicitly asks the other person to answer that way); and numbers can be read off a calculator
or a piece of paper. For now, we will assume that speech recognition will be
used in both directions, but we will be careful to always allow the user to override
the speech recognition system.
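One way to keep the user in control is to let speech recognition only suggest a selection (move the highlight), while the pointing device can always move the highlight elsewhere and only a click commits. This is a sketch of that policy under our own assumptions; `Selection`, `suggest`, `point` and `click` are hypothetical names.

```java
class OverrideDemo {
    public static void main(String[] args) {
        Selection sel = new Selection(3);
        sel.suggest(2); // the recognizer thinks the user said option 2
        sel.point(0);   // the user disagrees and moves the highlight with the mouse
        System.out.println(sel.click()); // prints 0: the pointer wins
    }
}

// Hypothetical arbitration between the two input channels: speech may only
// move the highlight (the "suggested selection"), the pointing device can
// always move it again, and only a click on the selection button commits.
final class Selection {
    private final int size;
    private int highlighted = 0;

    Selection(int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("empty menu");
        }
        this.size = size;
    }

    // Out-of-range (misrecognized) suggestions are ignored, not trusted.
    void suggest(int index) {
        if (index >= 0 && index < size) {
            highlighted = index;
        }
    }

    // Pointer input is authoritative, so a bad index is a programming error.
    void point(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException(index);
        }
        highlighted = index;
    }

    int highlighted() {
        return highlighted;
    }

    // Nothing is acted upon until the user presses the button.
    int click() {
        return highlighted;
    }
}
```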
Example: In a restaurant
Actors:
- user, actions in italics
- translator, display in boxes, speech in red
- waiter, speech in blue
For brevity, the options "Back" and "Leave" have been omitted from
all displays. The computer's suggested selection is in italics;
the user's actual selection is in bold.
The user activates Yakkey and is presented with the following
menu. The item selected by the user is shown in bold here,
but with a highlighted box in reality.
Hello. I am a translation device; my owner does not speak
Japanese. Please answer all questions using only "yes" and
"no". Is this OK?
OK.
I am sorry, I did not understand. Please answer all questions using
only "yes" and "no". Is this OK?
Yes.
Thank you for your understanding. Please wait a moment.
Select:
- Daily set menu
- Recommendation
- Cheapest meal
Ask for price?
- Yes, exact
- Yes, limit only
- No
Do you have a daily set menu?
Yes.
How much does the set menu cost? Please spell out numbers digit by digit.
For example, for 850 yen, please say eight-five-zero yen.
Eh? (pause)
I am sorry, I did not understand. Please spell out the price
digit by digit. For example, for 850 yen, please say eight-five-zero
yen. How much does the set menu cost?
Umm... One two zero zero yen.
One thousand two hundred yen. Is this correct?
Yes.
Thank you. Please wait a moment.
Yes, my owner will take the set menu.
The meal appears and the user eats it. Time to pay the bill.
Excuse me... check, please.
Can I use a credit card?
I am sorry, it is a little difficult...
I understand. I will pay by cash.
The user pays the bill.
The meal was excellent. My apologies for being such a nuisance
and thank you for your kindness.
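The price exchange relies on the waiter reading numbers digit by digit, which keeps the vocabulary the recognizer must handle down to ten digit words plus the currency. A minimal sketch of the conversion, assuming the recognizer maps what it hears onto a list of English word tokens (a hypothetical interface): unknown words such as "eh" yield an empty result, which would trigger the "I did not understand" re-prompt. `DigitParser` and `parse` are hypothetical names.

```java
import java.util.List;
import java.util.Locale;
import java.util.OptionalInt;

class PriceDemo {
    public static void main(String[] args) {
        OptionalInt price = DigitParser.parse(List.of("one", "two", "zero", "zero", "yen"));
        if (price.isPresent()) {
            System.out.println(price.getAsInt() + " yen. Is this correct?");
        } else {
            System.out.println("I am sorry, I did not understand.");
        }
    }
}

// Hypothetical parser for prices spelled out digit by digit.
final class DigitParser {
    private static final List<String> DIGITS = List.of(
            "zero", "one", "two", "three", "four",
            "five", "six", "seven", "eight", "nine");
    private static final int MAX_DIGITS = 9; // keeps the result within int range

    private DigitParser() {}

    // Reads digit words until "yen" (or the end); any other word makes
    // the whole answer invalid so the question can be asked again.
    static OptionalInt parse(List<String> words) {
        int value = 0;
        int count = 0;
        for (String word : words) {
            String w = word.toLowerCase(Locale.ROOT);
            if (w.equals("yen")) {
                break;
            }
            int digit = DIGITS.indexOf(w);
            if (digit < 0 || count == MAX_DIGITS) {
                return OptionalInt.empty();
            }
            value = value * 10 + digit;
            count++;
        }
        return count == 0 ? OptionalInt.empty() : OptionalInt.of(value);
    }
}
```

Reading the parsed value back ("One thousand two hundred yen. Is this correct?") and waiting for a "yes" then gives the confirmation step seen in the dialogue.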
Scripting
Scripts are written with the YakScript
menu system / scripting language. YakScript is interpreted
(although it has an optional verifier that will
eliminate most syntax errors) and supports a host of goodies
including conditionals and subroutines. See
the YakScript Language Specification
for details.
Yes, I thought about using HTML instead, and I may actually
implement an alternative interface where frames are HTML files
containing displays and options are embedded as
custom YakScript anchors.
frame.html:
<A href="yakscript:Sub YESNO boo.wav;Yes:Goto frame2">Like this.</A>
But then I would have to write my own HTML parser, which would
entail a lot of work and kind of defeat the point...
although with Swing 1.2's javax.swing.text.html classes and
the HyperlinkEvent framework the gain might be worth the pain.
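For the HTML route, Swing's `JEditorPane` reports link clicks to a `HyperlinkListener`. A sketch of routing the custom anchors to an interpreter, under these assumptions: `YakScriptLinkListener` and the `Consumer<String>` standing in for the YakScript interpreter are hypothetical, and the listener reads the raw href through `getDescription()` because a `yakscript:` link has no registered URL handler, so `getURL()` may well be `null` for it.

```java
import java.util.function.Consumer;
import javax.swing.event.HyperlinkEvent;
import javax.swing.event.HyperlinkListener;

class YakLinkDemo {
    public static void main(String[] args) {
        YakScriptLinkListener listener =
                new YakScriptLinkListener(script -> System.out.println("RUN: " + script));
        // Simulate the event for a click on the frame.html anchor:
        // no usable URL, raw href carried in the description.
        listener.hyperlinkUpdate(new HyperlinkEvent(
                listener, HyperlinkEvent.EventType.ACTIVATED, null,
                "yakscript:Sub YESNO boo.wav;Yes:Goto frame2"));
    }
}

// Hypothetical bridge from HTML anchors to a YakScript interpreter.
final class YakScriptLinkListener implements HyperlinkListener {
    private static final String SCHEME = "yakscript:";
    private final Consumer<String> interpreter;

    YakScriptLinkListener(Consumer<String> interpreter) {
        this.interpreter = interpreter;
    }

    @Override
    public void hyperlinkUpdate(HyperlinkEvent e) {
        // Ignore mouse-over (ENTERED/EXITED); only act on actual clicks.
        if (e.getEventType() != HyperlinkEvent.EventType.ACTIVATED) {
            return;
        }
        String href = e.getDescription();
        if (href != null && href.startsWith(SCHEME)) {
            // Hand everything after the scheme to the interpreter.
            interpreter.accept(href.substring(SCHEME.length()));
        }
        // Ordinary links (http:, file:, ...) are left alone in this sketch.
    }
}
```

To wire it up, the listener would be registered with `addHyperlinkListener` on a `JEditorPane` whose content type is `text/html`; note that `JEditorPane` only fires hyperlink events when it is not editable.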
No matter what I do, dropping YakScript entirely doesn't appear
to be an option: the scripts are too specialized to be written
in a general-purpose programming language and not small enough to be
built with JavaScript/HTML and run in a standard browser...
Source code
The current version of the Yakkey system, classes, source, scripts,
samples and all, is available right here.
Automatically generated Javadocs (not necessarily up to date)
are here.
No packaged versions yet, as everything is still in deep beta;
wget the whole thing if you want a snapshot...