Part 1 (JM 6.3): write a program to compute unigram probabilities.
Part 2 (JM 6.4): try your program on two corpora (for example /share/classes/nigelward/corpus and /www/), and note the top 10 unigrams in each corpus.
optional Part 3 (JM 6.3): extend the program to compute bigram probabilities and run it on a corpus.
optional Part 4 (JM 6.5): use the bigrams to generate random sentences, as explained on page 202.
Due 3:01 Thursday. Hand in a printout of the code and the interesting parts of its output.
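For Part 1, one possible approach is to tally token counts (much as the starter code below does) and then divide each count by the total number of tokens, the standard maximum-likelihood estimate P(w) = count(w)/N. This is only a sketch, not the required solution; the helper names (count_unigrams, unigram_probability) and the top-ten report format are my own choices, not part of the assignment.

```perl
#! /usr/bin/perl -w
use strict;

# count_unigrams: tally whitespace-separated tokens from a list of
# lines; returns a reference to the count hash and the token total.
sub count_unigrams {
    my %count;
    my $total = 0;
    foreach my $line (@_) {
        foreach my $token (split(' ', $line)) {
            $count{$token}++;
            $total++;
        }
    }
    return (\%count, $total);
}

# unigram_probability: maximum-likelihood estimate P(w) = count(w) / N
sub unigram_probability {
    my ($count, $total, $word) = @_;
    return 0 unless $total;
    return ($count->{$word} || 0) / $total;
}

# With a corpus file on the command line, print the ten most
# frequent unigrams with their counts and probabilities.
if (@ARGV) {
    my ($count, $total) = count_unigrams(<>);
    my $shown = 0;
    foreach my $word (sort { $count->{$b} <=> $count->{$a} } keys %$count) {
        last if $shown++ >= 10;
        printf "%-15s %6d  %.6f\n",
            $word, $count->{$word}, unigram_probability($count, $total, $word);
    }
}
```

Run it as, say, `perl unigram.pl corpusfile` (the script name is hypothetical); for Part 2 you would run it once per corpus and compare the two top-ten lists.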
#! /usr/bin/perl -w
use English;

print "Starting\n";

# process each line of input
while (defined($line = <>)) {
    print "line is $line";
    # process each word (whitespace-separated token)
    @words = split(' ', $line);
    foreach $token (@words) {
        $count{$token}++;
        print "so far \"$token\" seen $count{$token} times\n";
    }
    print "\n";
}

# scan the entire list of words, in alphabetical order
foreach $key (sort keys(%count)) {
    print "altogether, the word $key appeared $count{$key} times\n";
}

print "sorted by frequency\n";
# This is a really crude way to sort by count:
# first create a list with count and token concatenated...
foreach $key (sort keys(%count)) {
    push(@list, sprintf("%5d", $count{$key}) . " " . $key);
}
# ...then string-sort that list (the fixed-width count field
# makes the string order agree with the numeric order)
foreach $thing (sort(@list)) {
    print "$thing\n";
}

print "\nHere are some random coinflips: ";
for (my $i = 0; $i < 6; $i++) {
    if (rand() > .5) { print "H"; } else { print "T"; }
}
print "\n";
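For the optional Parts 3 and 4, the same counting idea extends to adjacent word pairs, and the bigram counts can then drive the Shannon-style random generation JM describe: start from a boundary marker, repeatedly sample the next word in proportion to its bigram count, and stop when the current word has no recorded followers. The sketch below is one possible shape, not the required solution; the "<s>" line-start marker, the subroutine names, and the per-line (rather than per-sentence) boundaries are my own simplifications.

```perl
#! /usr/bin/perl -w
use strict;

# count_bigrams: tally adjacent word pairs in each line, with "<s>"
# marking the start of a line (a crude stand-in for sentence starts).
sub count_bigrams {
    my %bigram;
    foreach my $line (@_) {
        my $prev = "<s>";
        foreach my $token (split(' ', $line)) {
            $bigram{$prev}{$token}++;
            $prev = $token;
        }
    }
    return \%bigram;
}

# generate: build a random word sequence by sampling each next word
# with probability proportional to its bigram count after the
# previous word; stop at $maxlen words or at a word with no followers.
sub generate {
    my ($bigram, $maxlen) = @_;
    my @sentence;
    my $prev = "<s>";
    while (@sentence < $maxlen and exists $bigram->{$prev}) {
        my $followers = $bigram->{$prev};
        my $total = 0;
        $total += $_ foreach values %$followers;
        my $r = rand($total);    # a point in [0, total)
        foreach my $word (keys %$followers) {
            $r -= $followers->{$word};
            if ($r < 0) {        # landed inside this word's share
                push @sentence, $word;
                $prev = $word;
                last;
            }
        }
    }
    return join(' ', @sentence);
}

# With a corpus file on the command line, print three random sentences.
if (@ARGV) {
    my $bigram = count_bigrams(<>);
    print generate($bigram, 20), "\n" for 1 .. 3;
}
```

Dividing each follower count by $total would give the bigram probabilities P(w | prev) for Part 3's output; the sampler above uses the raw counts directly, which comes to the same thing.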