Similar Segments of Social Speech Task
BASELINES ON THE TESTSET
Nigel Ward, July 17, 2013


BASELINE 1: RANDOM GUESSING

['../python/score5.py', 'testset/testset-queries.txt', '../tagsets/testset/all-tagsets-pruned-SECRET.txt', 'testset/random-guesses.txt']

SUMMARY:
processed 21 queries of which 0 lacked answers entirely
the answers examined (within 120 sec. per query) included:
   195 false alarms and 15 total hits
      4 successful and exact-or-early jump-in points averaging 2.4 seconds early
      11 successful but late jump-in points averaging 32.5 seconds late
naivePrecision = 7% = 15 / 210
raw recall = 11% = 223 / 2020 seconds
raw searcher utility ratio: 0.125 = 10.6 / 85.4
   (average cost / average benefit (both per query, both in seconds))
Normalized Searcher Utility Ratio: 0.430
Normalized Recall: 0.402
F-measure 0.43


BASELINE 2: THE "CLEVER ALGORITHM"
(as described in the Guide to Interpreting the Metrics)

['../python/score5.py', 'testset/testset-queries.txt', '../tagsets/testset/all-tagsets-pruned-SECRET.txt', 'testset/inferred-answers.txt']

SUMMARY:
processed 21 queries of which 7 lacked answers entirely
the answers examined (within 120 sec. per query) included:
   112 false alarms and 11 total hits
      5 successful and exact-or-early jump-in points averaging 2.3 seconds early
      6 successful but late jump-in points averaging 42.5 seconds late
naivePrecision = 9% = 11 / 123
raw recall = 18% = 371 / 2020 seconds
raw searcher utility ratio: 0.290 = 17.7 / 61.0
   (average cost / average benefit (both per query, both in seconds))
Normalized Searcher Utility Ratio: 1.001
Normalized Recall: 0.669
F-measure 0.95


/home/research/isg/speech/social/baselines/baselines.txt
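
The raw percentages in the two SUMMARY blocks can be re-derived from the counts printed alongside them. The sketch below is not part of score5.py. It is a minimal check, assuming that naivePrecision is hits divided by hits plus false alarms (which matches the printed denominators 210 and 123) and that raw recall is the first number of seconds divided by the second. The names BASELINES, naive_precision, and raw_recall are hypothetical. The normalized figures and the F-measure are not recomputed, since their formulas are given in the Guide to Interpreting the Metrics rather than here.

```python
# Counts copied from the two SUMMARY blocks above.
# The dictionary layout and key names are illustrative, not score5.py's own.
BASELINES = {
    "random guessing": {
        "false_alarms": 195, "hits": 15, "early": 4, "late": 11,
        "seconds_found": 223, "seconds_total": 2020,
    },
    "clever algorithm": {
        "false_alarms": 112, "hits": 11, "early": 5, "late": 6,
        "seconds_found": 371, "seconds_total": 2020,
    },
}


def naive_precision(hits, false_alarms):
    """Assumed definition: fraction of examined answers that were hits."""
    return hits / (hits + false_alarms)


def raw_recall(seconds_found, seconds_total):
    """Assumed definition: first reported seconds figure over the second."""
    return seconds_found / seconds_total


for name, c in BASELINES.items():
    # Early and late jump-in points should add up to the total hits.
    assert c["early"] + c["late"] == c["hits"]
    precision = naive_precision(c["hits"], c["false_alarms"])
    recall = raw_recall(c["seconds_found"], c["seconds_total"])
    print(f"{name}: naivePrecision = {precision:.0%}, raw recall = {recall:.0%}")
```

Under these assumptions the script reproduces the reported 7% and 11% for random guessing and 9% and 18% for the "clever algorithm".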