Yakkey Translation Interface

by Jani PATOKALLIO
a part of the Yak Project

Introduction

Yakkey is the user's interface to the translation facilities provided by the Yak-1 and Yak-2 hardware. The name is yet another horrid pun: not only is it a translation device (yakki) but also the key to the yak. The software also likes to yak with other people and may even be a little yucky.

Domains and mediation

In an ideal world, a translation device like the Yak-1 would understand everything said to it by both parties and produce perfect translations; in other words, the domain of the machine's capabilities would be infinite and the machine would require no guidance.

Unfortunately, we live in reality, where speech recognition and translation have some very real limitations. The domain must thus be limited to what the machine can actually handle.

Referring back to the original specifications, there are two scenarios to deal with: native to foreign language, and foreign to native.

Native to foreign

The key problem here is getting the machine to understand what the user wants. Once a given thought is coded into the machine, it is trivial to code a representation of that thought in a foreign language. For example, if the user gets across the idea that he wants a beer, the mapping to "Quiero una cerveza", "Biiru ippon kudasai" or "Antakaa mulle lisaa kaljaa" requires just a prerecorded sample for each supported language, or at least a prerecorded string that can be run through speech synthesis.
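The mapping described above can be sketched as a simple lookup table. This is only an illustration in present-day Java, not Yakkey's actual data structure: the class name ThoughtTable, the thought identifier WANT_BEER and the language codes are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical lookup table: one "thought" maps to one phrase per language. */
final class ThoughtTable {
    // thought id -> (language code -> phrase, or the name of a prerecorded sample)
    private final Map<String, Map<String, String>> phrases = new HashMap<>();

    /** Registers the phrase that expresses a thought in one language. */
    void add(String thought, String language, String phrase) {
        phrases.computeIfAbsent(thought, k -> new HashMap<>()).put(language, phrase);
    }

    /** Returns the phrase for a thought, or null if that thought or language is not covered. */
    String phrase(String thought, String language) {
        Map<String, String> byLanguage = phrases.get(thought);
        return byLanguage == null ? null : byLanguage.get(language);
    }

    public static void main(String[] args) {
        ThoughtTable table = new ThoughtTable();
        table.add("WANT_BEER", "es", "Quiero una cerveza");
        table.add("WANT_BEER", "ja", "Biiru ippon kudasai");
        table.add("WANT_BEER", "fi", "Antakaa mulle lisaa kaljaa");
        System.out.println(table.phrase("WANT_BEER", "ja"));
    }
}
```

Once the thought is identified, producing the output is a single lookup; all of the difficulty lies in selecting the thought in the first place.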

So how does the user convey what he wants? Machines are, at least currently, incapable of "really" understanding anything, but they can present the user with a list of possible options (i.e. thoughts that the machine "understands" and can provide translations of) and allow the user to select the one that best matches his desires. Designing the most streamlined method of accomplishing this is Yakkey's primary function.

Foreign to native

The reversed situation is fundamentally similar. Once the machine understands what the other person has said, translating it back to the user's native language is easy. Complications arise from the fact that it is considerably more difficult for the other person to communicate his desires to the machine: the only natural interface available is speech, and speech recognition of non-pretrained voices is in its infancy.

Some other projects, notably CMU SpeechWear, have circumvented this by providing an extra display (an LCD on the wrist, for example) through which the other person can communicate. While undoubtedly practical for some situations, this is only a stopgap measure and thus Yakkey endeavours to manage with speech recognition only.

Requirements and assumptions

The user has a wearable display and a mouse-type input device with a selection button for navigation. Speech recognition and output are available through external hardware and software. The speech recognition system is capable of recognizing a largish set of the user's input and a very limited set of external input ("yes", "no", perhaps the numbers).

Design

Yakkey's central components and actors

Domains are encapsulated as scripts that contain all possible "thoughts" that can be understood, their translations and their relationships. The user's wearable displays a menu used to navigate within the script. Navigation is performed with the input device, scrolling up and down and clicking on entries, or (after training) by speaking certain keywords or keyphrases into the device. Feedback from the other party is limited to certain key words ("Yes", "No", "I do not understand"), and scripts are designed in such a fashion that these suffice. The user is asked to confirm all input before the machine acts on it.
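One way to picture a script in memory is as a set of named frames, each holding the menu options shown on the wearable display. The sketch below is a guess at such a structure and not the real Yakkey classes: MenuScript, Option and the frame names are hypothetical, and the phrases are borrowed from the restaurant example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory form of a script: named frames, each holding menu options. */
final class MenuScript {
    /** One menu entry: what the user sees, what is spoken, and which frame comes next. */
    static final class Option {
        final String label;
        final String spoken;
        final String nextFrame;

        Option(String label, String spoken, String nextFrame) {
            this.label = label;
            this.spoken = spoken;
            this.nextFrame = nextFrame;
        }
    }

    private final Map<String, List<Option>> frames = new LinkedHashMap<>();
    private String current;

    MenuScript(String startFrame) {
        this.current = startFrame;
    }

    void addOption(String frame, String label, String spoken, String nextFrame) {
        frames.computeIfAbsent(frame, k -> new ArrayList<>())
              .add(new Option(label, spoken, nextFrame));
    }

    String currentFrame() {
        return current;
    }

    /** Labels to display for the current frame (empty if the frame has no options). */
    List<String> labels() {
        List<String> result = new ArrayList<>();
        for (Option option : frames.getOrDefault(current, new ArrayList<>())) {
            result.add(option.label);
        }
        return result;
    }

    /** Follows the chosen option: moves to its next frame and returns the phrase to speak. */
    String select(int index) {
        Option chosen = frames.getOrDefault(current, new ArrayList<>()).get(index);
        current = chosen.nextFrame;
        return chosen.spoken;
    }

    public static void main(String[] args) {
        MenuScript script = new MenuScript("Payment");
        script.addOption("Payment", "Cash", "I will pay by cash.", "Goodbye");
        script.addOption("Payment", "Credit card", "Can I use a credit card?", "ConfirmAnswer");
        System.out.println(script.labels());
        System.out.println(script.select(1));
        System.out.println(script.currentFrame());
    }
}
```

Whether the selection comes from the pointing device or from a recognized keyword, it ends up as the same index into the current frame, so the two input methods can share one navigation path.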

An interesting and currently undecided question: speech recognition in both directions would undoubtedly be useful, but is it necessary? The user can navigate the menus with the pointing device alone, yes/no-type answers can be understood through body language (especially if the machine specifically asks the other party to respond that way), and numbers can be read off a calculator or a piece of paper. For now, we will assume that both will be used, but we will be careful to always allow the user to override the speech recognition system.

Example: In a restaurant

Actors: For brevity, the options "Back" and "Leave" have been omitted from all displays. The computer's suggested selection is in italics, the user's actual selection is in bold.


User activates Yakkey and is presented with the following menu. The item selected by the user is shown in bold here, but with a highlighted box in reality.

Introduction
  • Play
  • Skip

Hello. I am a translation device, my owner does not speak Japanese. Please answer all questions using only "yes" and "no". Is this OK?

OK.

Confirm answer:
  • Yes
  • No
  • Repeat

I am sorry, I did not understand. Please answer all questions using only "yes" and "no". Is this OK?

Yes.

Confirm answer:
  • Yes
  • No
  • Repeat

Thank you for your understanding. Please wait a moment.

Select:
  • Daily set menu
  • Recommendation
  • Cheapest meal

Ask for price?
  • Yes, exact
  • Yes, limit only
  • No

Do you have a daily set menu?

Yes.

Confirm answer:
  • Yes
  • No
  • Repeat

How much does the set menu cost? Please spell out numbers digit by digit. For example, for 850 yen, please say eight-five-zero yen.

Eh? (pause)

Choose action:
  • Proceed
  • Repeat question

I am sorry, I did not understand. Please spell out the price digit by digit. For example, for 850 yen, please say eight-five-zero yen. How much does the set menu cost?

Umm... One two zero zero yen.

One thousand two hundred yen. Is this correct?

Yes.

Confirm answer:
  • 1200 yen
  • Repeat question

Thank you. Please wait a moment.

Set menu for 1200 yen:
  • Accept
  • Decline

Yes, my owner will take the set menu.

The meal appears and the user eats it. Time to pay the bill.

Signal waiter:
  • Now

Excuse me... check, please.

Payment:
  • Cash
  • Credit card

Can I use a credit card?

I am sorry, it is a little difficult...

Confirm answer:
  • Yes
  • No
  • Repeat

I understand. I will pay by cash.

The user pays the bill.

Say goodbye:
  • Now

The meal was excellent. My apologies for being such a nuisance and thank you for your kindness.
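The digit-by-digit price convention used in the example lends itself to a very small parser. The following is a sketch under assumptions, not part of Yakkey: it supposes the recognizer delivers English digit words as text, and the class name DigitPrice is made up for the illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.OptionalInt;

/** Hypothetical parser for prices spoken digit by digit, e.g. "one two zero zero yen". */
final class DigitPrice {
    private static final List<String> DIGITS = Arrays.asList(
            "zero", "one", "two", "three", "four",
            "five", "six", "seven", "eight", "nine");

    /**
     * Collects the digit words in an utterance, in order, into a number.
     * Other words ("umm", "yen") are skipped. Returns empty if no digit was
     * heard, or if more than nine digits were heard (to avoid int overflow).
     */
    static OptionalInt parse(String utterance) {
        String[] words = utterance.toLowerCase(Locale.ROOT).split("[^a-z]+");
        int value = 0;
        int count = 0;
        for (String word : words) {
            int digit = DIGITS.indexOf(word);
            if (digit < 0) {
                continue; // filler or unit word, not a digit
            }
            if (count == 9) {
                return OptionalInt.empty();
            }
            value = value * 10 + digit;
            count++;
        }
        return count == 0 ? OptionalInt.empty() : OptionalInt.of(value);
    }

    public static void main(String[] args) {
        System.out.println(parse("Umm... One two zero zero yen."));
        System.out.println(parse("Eh?"));
    }
}
```

An empty result corresponds to the "Repeat question" branch in the example. Skipping unrecognized words is a lenient choice that only makes sense because the parsed price is read back for confirmation before the machine acts on it.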

Scripting

Scripts are written with the YakScript menu system / scripting language. YakScript is interpreted (although it has an optional verifier that will eliminate most syntax errors) and supports a host of goodies including conditionals and subroutines. See the YakScript Language Specification for details.

Yes, I thought about using HTML instead, and I may actually implement an alternative interface where frames are HTML files containing displays and options are embedded as custom YakScript anchors.

frame.html:
<A href="yakscript:Sub YESNO boo.wav;Yes:Goto frame2">Like this.</A>

But then I'll have to write my own HTML parser, which would entail a lot of work and kind of defeat the point... although with Swing 1.2's javax.swing.text.html classes and the HyperlinkEvent framework the gain might be worth the pain.
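If the Swing route were taken, the HTML parsing itself would be handled by the library, and the remaining work would be picking apart the custom href. In a HyperlinkListener the raw href string is available from HyperlinkEvent.getDescription(). The sketch below covers only the string handling, and it assumes (the anchor syntax is not specified here) that the part after the "yakscript:" prefix is a list of statements separated by semicolons; the class name YakHref is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that splits a yakscript: href into its statements. */
final class YakHref {
    static final String SCHEME = "yakscript:";

    /**
     * Returns the semicolon-separated statements of a yakscript: href,
     * or an empty list if the href does not use that scheme.
     * The statements themselves are not interpreted here.
     */
    static List<String> statements(String href) {
        List<String> result = new ArrayList<>();
        if (href == null || !href.startsWith(SCHEME)) {
            return result;
        }
        for (String part : href.substring(SCHEME.length()).split(";")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                result.add(trimmed);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints [Sub YESNO boo.wav, Yes:Goto frame2]
        System.out.println(statements("yakscript:Sub YESNO boo.wav;Yes:Goto frame2"));
    }
}
```

Each statement would then be handed to the existing YakScript interpreter, so the HTML layer would stay a thin presentation shell.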

No matter what I do, dropping YakScript entirely doesn't appear to be an option: the scripts are too specialized to be written in a general-purpose programming language and not small enough to be built with JavaScript/HTML and run in a standard browser...

Source code

The current version of the Yakkey system, classes, source, scripts, samples and all, is available right here. Automatically generated Javadocs (not necessarily up to date) are here. No packaged versions yet, as everything is still deep beta; wget the whole thing if you want a snapshot...