Similar Segments of Social Speech Task

				BASELINES ON THE TESTSET

Nigel Ward, July 17, 2013



BASELINE 1: RANDOM GUESSING 

['../python/score5.py', 'testset/testset-queries.txt', '../tagsets/testset/all-tagsets-pruned-SECRET.txt', 'testset/random-guesses.txt']


SUMMARY:
  processed 21 queries
   of which 0 lacked answers entirely
   the answers examined (within 120 sec. per query) included:
     195 false alarms and 15  total hits
     4 successful and exact-or-early jump-in points
       averaging 2.4 seconds early
     11 successful but late jump-in points
       averaging 32.5 seconds late
   naivePrecision =  7% = 15 / 210
 
  raw recall = 11% = 223 / 2020 seconds
  raw searcher utility ratio: 0.125  = 10.6 / 85.4
    (average benefit / average cost (both per query, both in seconds))
  Normalized Searcher Utility Ratio: 0.430 
  Normalized Recall: 0.402 
  F-measure 0.43



BASELINE 2: THE "CLEVER ALGORITHM" (as described in the Guide to
Interpreting the Metrics)

['../python/score5.py', 'testset/testset-queries.txt', '../tagsets/testset/all-tagsets-pruned-SECRET.txt', 'testset/inferred-answers.txt']


SUMMARY:
  processed 21 queries
   of which 7 lacked answers entirely
   the answers examined (within 120 sec. per query) included:
     112 false alarms and 11  total hits
     5 successful and exact-or-early jump-in points
       averaging 2.3 seconds early
     6 successful but late jump-in points
       averaging 42.5 seconds late
   naivePrecision =  9% = 11 / 123
 
  raw recall = 18% = 371 / 2020 seconds
  raw searcher utility ratio: 0.290  = 17.7 / 61.0
    (average benefit / average cost (both per query, both in seconds))
  Normalized Searcher Utility Ratio: 1.001 
  Normalized Recall: 0.669 
  F-measure 0.95
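
The derived figures in the two summaries can be cross-checked from the raw counts. The sketch below is only a consistency check, not the logic of score5.py: every input is copied from the summaries above, the names (BASELINES, derive, and so on) are hypothetical, and it assumes that the per-query benefit is simply the seconds found divided by the number of queries, which matches the printed values (223 / 21 and 371 / 21). The normalized figures and the F-measure are not recomputed, since their definitions are not given here.

```python
# Consistency check on the two SUMMARY blocks. Every input is copied from
# the summaries; the names below are hypothetical, not taken from score5.py.

QUERIES = 21            # "processed 21 queries"
TOTAL_SECONDS = 2020    # denominator of "raw recall"

BASELINES = {
    "random guessing": {"hits": 15, "false_alarms": 195,
                        "seconds_found": 223, "cost_per_query": 85.4},
    "clever algorithm": {"hits": 11, "false_alarms": 112,
                         "seconds_found": 371, "cost_per_query": 61.0},
}


def derive(counts):
    """Recompute the derived figures from the raw counts of one baseline."""
    examined = counts["hits"] + counts["false_alarms"]
    # Assumption: per-query benefit is seconds found / number of queries.
    benefit = counts["seconds_found"] / QUERIES
    return {
        "naive_precision": counts["hits"] / examined,
        "raw_recall": counts["seconds_found"] / TOTAL_SECONDS,
        "benefit_per_query": benefit,
        "raw_utility_ratio": benefit / counts["cost_per_query"],
    }


if __name__ == "__main__":
    for name, counts in BASELINES.items():
        d = derive(counts)
        print(f"{name}: precision {d['naive_precision']:.0%}, "
              f"recall {d['raw_recall']:.0%}, "
              f"benefit {d['benefit_per_query']:.1f} s, "
              f"ratio {d['raw_utility_ratio']:.3f}")
```

The recomputed ratios land within about 0.001 of the reported ones; the small gap for random guessing presumably reflects rounding of the displayed cost and benefit.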



