Pstone Overview

Summary

Pstone is a software package that lets students experience writing and debugging a grammar to cover a set of sentences.

In particular, it supports an assignment, typically taking 2 to 4 hours, which gives students:

Pstone includes

The documentation consists of

Pstone runs on SunOS 5.6 (a Unix variant). It is written in C and Tcl, and uses Tcl 7.0 and Tk 4.0, which are included in the distribution. The distribution also includes the source files, so it can probably be ported to other environments.

Pstone is freeware, produced by Nigel Ward with contributions from T. Maeda.

This release is the first version (Version 1). It has been used successfully in my natural language processing graduate course, but has not been exhaustively tested.

The name `Pstone' is an amalgam of P for parser, PST for phrase structure tree, and Stone from the Stone Soup story (although I no longer have the time to stir even if people contribute juicy pieces).

Installation

  1. Obtain Pstone from http://www.cs.utep.edu/nigel/pstone/pstone.tar.gz (5 Megabytes).
  2. Uncompress and untar the files (gunzip pstone.tar.gz, then tar xvf pstone.tar).
  3. Move the pstone directory to where you want it.
  4. cd into the pstone directory.
  5. Edit the first line of tcl-setup.csh to point to the location of the pstone/tcltk directory.
  6. Edit 3 lines at the top of browser (replacing xxxxxxx appropriately).
  7. Put the line `source xxxxxxx/pstone/tcl-setup.csh' in your .cshrc file (and your students' .cshrc files), or always remember to source this file before using browser.
  8. Test browser (see instructions below), to make sure that the lines edited above are pointing to the correct places.
  9. That's all.
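
Steps 2 to 4 above can be scripted. The following is only a sketch for a Bourne-style shell, not part of the Pstone distribution: the function name install_pstone and its two arguments (the path to the downloaded archive and the destination directory) are hypothetical names chosen for this example. It assumes the archive unpacks into a top-level directory named pstone, as step 3 implies. Steps 5 to 7 still have to be done by hand, since they involve editing files for your site.

```shell
#!/bin/sh
# Sketch of installation steps 2-4. install_pstone and its arguments are
# hypothetical names for this example; they are not shipped with Pstone.
install_pstone() {
    archive=$1   # path to the downloaded pstone.tar.gz
    dest=$2      # directory that should end up containing pstone/

    # Step 2: uncompress, then untar into the current directory.
    gunzip "$archive" || return 1
    tar xvf "${archive%.gz}" || return 1

    # Step 3: move the unpacked pstone directory to where you want it.
    mkdir -p "$dest" || return 1
    mv pstone "$dest/" || return 1

    # Step 4: cd into the pstone directory, ready for the manual edits
    # of tcl-setup.csh and browser described in steps 5 and 6.
    cd "$dest/pstone" || return 1
}
```

For example, after downloading the archive into the current directory, calling install_pstone ./pstone.tar.gz $HOME/tools would leave you in $HOME/tools/pstone. Note that gunzip replaces pstone.tar.gz with pstone.tar, exactly as in step 2.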

Use

The roles of the components of pstone are as seen in the figure below; for usage details refer to pstone-a.ps in the assignment subdirectory.

Other Needs, Other Tools

Pstone is not intended to illustrate the basic properties of context-free grammars nor the operation of parsers. Tools that do this exist, including SUNY Stony Brook's `Syntactica'.

Pstone's parser code is not something that you could read to learn how parsers work, nor how to program elegantly. There exist parsers which are much cleaner; the Natural Language Registry is a good place to look.

Pstone's parser is no-frills. More advanced parsers exist; again see the Natural Language Registry. (The reason for using simple context-free grammars is that they are simple and well understood. Students who use Pstone become aware of their weaknesses, which motivates them to learn about more advanced topics, such as probabilistic parsing, lexical disambiguation, simpler models of grammar, more powerful models of grammar, and automatic learning of grammars from corpora.)

Pstone's corpus is tiny. For real corpora, including especially the Penn Treebank, see the Linguistic Data Consortium. (Pstone's corpus size was chosen to be small enough to allow students to approach a feeling of accomplishment, but large enough to make the assignment challenging.)

Pstone uses fairly ad hoc criteria for evaluating the parse trees produced by student grammars; better software for this includes the EVALB bracket scorer.

Bugs and Missing Features

If the locations of Tcl/Tk etc. were not set properly during installation, invoking browser results in the unenlightening error message `browser not found'.

Invoking the parser with a one-word sentence on the command line results in a confusing usage message.

The browser displays no trees when first invoked, until the user clicks the `prev input' or `next input' buttons.

The current set of reference trees does not include a separate test set. Thus it is currently possible for students to examine the reference parse trees and from them reverse-engineer the grammar which I used to create them, and so get a perfect match score without much effort.

It would be nice if the browser included a button for invoking the parser.

The browser could support searching for sentences which contain certain words, parse with certain non-terminals, are overly ambiguous, parse incorrectly, parse correctly with one grammar but not with another, and so on.

The browser could display a lot more information.

The appearance of the browser could be better.

Corpora for other languages would be nice.

It would be nice to have a more compact distribution for use by sites with Tcl/Tk already installed. And of course Pstone should use the latest version of Tcl/Tk.

other bugs

... let me know which of these are important to you, to help me prioritize.


January 2000