# SANLP Bigrams Assignment

### Perl Lab Assignment

Part 1 (JM 6.3): write a program to compute unigram probabilities.
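A minimal sketch of Part 1, assuming tokens are whitespace-separated (as in the sample code below) and using the maximum-likelihood estimate P(w) = C(w) / N, where N is the total number of tokens. The sub name `unigram_probs` and the sample sentences are hypothetical. No case folding or punctuation stripping is done.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count each whitespace-separated token in a list of lines and return
# a hash mapping token => maximum-likelihood probability C(w) / N.
sub unigram_probs {
    my @lines = @_;
    my %count;
    my $total = 0;
    foreach my $line (@lines) {
        foreach my $token (split(' ', $line)) {
            $count{$token}++;
            $total++;
        }
    }
    return () if $total == 0;   # empty corpus: nothing to report
    my %prob;
    foreach my $token (keys %count) {
        $prob{$token} = $count{$token} / $total;
    }
    return %prob;
}

# Hypothetical sample input; a real run would read the corpus, e.g. my @lines = <>;
my @sample = ("the cat sat on the mat\n", "the dog sat\n");
my %prob = unigram_probs(@sample);
foreach my $token (sort keys %prob) {
    printf("%-5s %.4f\n", $token, $prob{$token});
}
```

Because every estimate shares the same denominator N, the probabilities sum to 1 over the vocabulary.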

Part 2 (JM 6.4): try your program on two corpora (for example /share/classes/nigelward/corpus and /www/), and note the top 10 unigrams in each corpus.
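One way to pull out the top 10, sketched here as an alternative to the concatenate-then-string-sort trick in the sample code below: sort the keys numerically by count in descending order and take the first ten. The sub name `top_n` and the sample text are hypothetical; ties are broken alphabetically only so the output is repeatable.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the $n most frequent tokens, most frequent first; ties are
# broken alphabetically so the output is deterministic.
sub top_n {
    my ($n, $count) = @_;   # $count is a reference to a token => count hash
    my @sorted = sort { $count->{$b} <=> $count->{$a} or $a cmp $b } keys %$count;
    $n = @sorted if $n > @sorted;
    return @sorted[0 .. $n - 1];
}

my %count;
# Hypothetical sample text standing in for a corpus file.
foreach my $token (split(' ', "a b a c b a d e f g h i j k a b")) {
    $count{$token}++;
}
my @top = top_n(10, \%count);
foreach my $token (@top) {
    printf("%5d %s\n", $count{$token}, $token);
}
```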

Optional Part 3 (JM 6.3): extend the program to compute bigram probabilities and run it on a corpus.
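A possible shape for Part 3, using the maximum-likelihood bigram estimate P(w2 | w1) = C(w1 w2) / C(w1). This sketch assumes each input line is one sentence and pads it with hypothetical boundary markers `<s>` and `</s>`; the sub name `bigram_probs` and the sample lines are also illustrative only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Estimate P(w2 | w1) = C(w1 w2) / C(w1) from a list of lines.
# Assumption: each line is one sentence, padded with <s> and </s>.
sub bigram_probs {
    my @lines = @_;
    my (%history, %bigram);
    foreach my $line (@lines) {
        my @tokens = ('<s>', split(' ', $line), '</s>');
        for my $i (1 .. $#tokens) {
            # C(w1) counted as how often w1 starts a bigram; with </s> ending
            # every sentence this equals w1's unigram count.
            $history{$tokens[$i - 1]}++;
            $bigram{$tokens[$i - 1]}{$tokens[$i]}++;
        }
    }
    my %prob;
    foreach my $w1 (keys %bigram) {
        foreach my $w2 (keys %{ $bigram{$w1} }) {
            $prob{$w1}{$w2} = $bigram{$w1}{$w2} / $history{$w1};
        }
    }
    return %prob;
}

my @sample = ("the cat sat\n", "the dog sat\n", "a cat ran\n");
my %prob = bigram_probs(@sample);
foreach my $w1 (sort keys %prob) {
    foreach my $w2 (sort keys %{ $prob{$w1} }) {
        printf("P(%s | %s) = %.3f\n", $w2, $w1, $prob{$w1}{$w2});
    }
}
```

Storing the counts as a hash of hashes keeps every successor of a word in one place, which is convenient for the sentence generator in Part 4.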

Optional Part 4 (JM 6.5): use the bigrams to generate random sentences, as explained on page 202.
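One common way to generate random sentences from bigrams, which may differ in detail from the procedure on page 202: start at a sentence-start marker, repeatedly pick the next word at random in proportion to C(w1 w2), and stop at the end marker. The bigram counts below are hypothetical toy data, and the markers `<s>`/`</s>` and the sub names `next_word` and `random_sentence` are assumptions for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical bigram counts: $bigram{$w1}{$w2} = C(w1 w2),
# with <s> and </s> marking sentence edges.
my %bigram = (
    '<s>' => { 'the' => 2, 'a' => 1 },
    'the' => { 'cat' => 1, 'dog' => 1 },
    'a'   => { 'cat' => 1 },
    'cat' => { 'sat' => 1, 'ran' => 1 },
    'dog' => { 'sat' => 1 },
    'sat' => { '</s>' => 2 },
    'ran' => { '</s>' => 1 },
);

# Pick a successor of $w1 at random, weighting each w2 by C(w1 w2),
# i.e. sampling from P(w2 | w1).
sub next_word {
    my ($w1) = @_;
    my $followers = $bigram{$w1};
    my $total = 0;
    $total += $_ for values %$followers;
    my $r = rand($total);              # uniform in [0, $total)
    foreach my $w2 (sort keys %$followers) {
        $r -= $followers->{$w2};
        return $w2 if $r < 0;
    }
    die "no successor found for '$w1'\n";  # not reached for a non-empty hash
}

# Start from <s> and keep sampling until </s> (or a length cap) is reached.
sub random_sentence {
    my ($max_len) = @_;
    my @words;
    my $w = '<s>';
    while (@words < $max_len) {
        $w = next_word($w);
        last if $w eq '</s>';
        push(@words, $w);
    }
    return join(' ', @words);
}

srand(42);   # fixed seed only so that runs are repeatable
print random_sentence(20), "\n" for 1 .. 3;
```

The length cap guards against very long outputs on a real corpus, where a sentence can wander for a while before reaching `</s>`.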

Due 3:01 Thursday. Hand in a printout of the code and the interesting parts of its output.

Perl in 20 pages

### Some Sample Perl Code

```perl
#!/usr/bin/perl -w
use strict;
use English;

print "Starting\n";

my %count;   # token => number of times seen so far

# process each line of the input files (or standard input)
while (defined(my $line = <>)) {
    print "line is $line";
    print " ";
    my @words = split(' ', $line);
    # process each word
    foreach my $token (@words) {
        $count{$token}++;
        print "so far \"$token\" seen $count{$token} times\n";
    }
    print "\n";
}

# scan the entire list of words, in alphabetical order
foreach my $key (sort keys(%count)) {
    print "altogether, the word $key appeared $count{$key} times\n";
}

print "sorted by frequency\n";
# this is a really crude way to sort:
# first create a list with count and token concatenated
my @list;
foreach my $key (sort keys(%count)) {
    push(@list, sprintf("%5d", $count{$key}) . " " . $key);
}
# now sort that list; because each count is space-padded to 5 characters,
# a plain string sort gives ascending frequency (for counts below 100000)
foreach my $thing (sort(@list)) {
    print "$thing \n";
}

print "\nHere are some random coinflips: ";
for (my $i = 5; $i >= 0; $i--) {
    if (rand() > .5) { print "H"; } else { print "T"; }
}
print "\n";
```

