Why Not Not Evaluate AI?

Nigel Ward

SIGART Bulletin, 3(3), pp 4-5, 1992.

I have recently had the pleasure of an extended stay in Lotrs (fictitious name), an island-nation in the Pacific, and, on the days not spent on the beach, I observed how AI is practiced there. Most interesting to me were the social and organizational aspects; one aspect I found so admirable that I think it worthwhile to propose its adoption by other AI communities.

At AI conferences (and more generally), nearly 100% of the papers get accepted. The journals take about two-thirds. That is, evaluation is not done as a filtering step before publication.

Why Lack of Filtering Evaluation Is Good

Bad evaluation is probably worse than no evaluation at all. Properly evaluating AI (see Cohen and Howe, AI Magazine, 9(4)) is very hard work, and seems quite rare. Therefore doing without evaluation is, obviously enough, potentially a way to improve the quality of AI research. What is less obvious is that less evaluation can make AI more relevant and more on-track.

Deprived of the peer approval that positive evaluations represent, Lotrsian researchers turn to seek the approval of the public, for example, by giving public lectures, holding open houses, and writing articles for popular magazines, at a rate unimaginable in the States. This means better dissemination of AI results to the larger technical community, and to the general intelligentsia.

If papers are not evaluated, there is no fear of having a paper rejected for repeating previous research. As a result, every year a new crop of young Lotrsians attacks the same virtually impossible problems, blissfully ignorant of the fact that previous researchers have tried and failed. This is a good thing. It means that the attention of the field as a whole does not stray too far from the hard problems (like language understanding) to those where it is possible to make more rapid progress (like the study of logics with funny symbols). Of course, hard problems are still hard, and so many Lotrsian researchers are going around in circles, banging their heads against brick walls. This also is good --- going in circles is at least a way to keep your engine warm, and whenever a real new idea comes along, you'll be ready to pop it in and zoom away.

Why Lack of Filtering Evaluation is Possible

Of course, it would not be possible to simply stop evaluating conference submissions --- some other adjustments to the organization of the AI community would also be required.

To avoid the danger of conference proceedings ballooning, it would help to adopt two Lotrsian techniques. First, reverse the economics of publication by imposing not only page charges but also `presentation charges' for the national conferences. Second, reduce the number of pages allowed per paper (the archipelago-wide AI conference allows 4 B5 pages, which is still generous compared to the 2 pages allowed at the computer science and cognitive science conferences; even the flagship AI journal has an 8-page limit).

To please those researchers who still hanker after some formal recognition, we could adopt Lotrs-style post-publication evaluation. The key here is best-paper awards, voted on by those attending the conference and awarded at the subsequent conference, and, similarly, journals' paper-of-the-year awards, selected by the readership. This scheme has the twin advantages of being less work for the organizers and of being more democratic.

To forestall complaints from the hamburger-flipping, tax-paying public, who might become unhappy about lavishly funding a `science' that is not subject to the same rigorous standards of quality control as fast food, it would be necessary to import two more aspects of Lotrsian society. First, educating the public to have more respect for scholarship, engineering, and teaching is a measure that is of course important not only for the sake of AI. This will take time, though; to avoid public resentment of freeloading scientists in the interim, it will be necessary to adopt another Lotrsian measure: meagre funding for science and, especially, low pay for researchers. This should decrease taxpayer concern without adversely affecting scientific morale, since the increased public respect will compensate for the lack of cash.

Why Lack of Filtering Evaluation is Not Bad

To answer some objections which could be made to this proposal:

`If papers aren't screened, the conferences will be filled with garbage, and it will waste everyone's time.' But of course, no one goes to conferences to listen to papers.

`Lotrs can only get by without evaluation because there is still a touchstone for good research, namely the international conferences and journals.' Well, it is true that the high-fliers in Lotrs tend to seek the international limelight. But why should we have to be among the hard-nosed ones aiming the spotlights? We could adopt the no-evaluation policy for national purposes, and maybe even opt out of the thankless chore of participating in evaluation for international purposes, leaving it to more obsessed nations.

`Cross-fertilization among paradigms is important, and evaluation brings them forcibly into contact.' It would be nice if this were true, but probably the more common effects of contact and conflict are that AI researchers develop thick skins, so that criticisms do not stick and cross-fertilization does not penetrate, and that researchers who neglect to develop good rhetorical skills fall too easily by the wayside.

`Anger and frustration at having a paper rejected is a powerful force that drives researchers to work harder to prove that they were right (and to attain a position of power so that they can fight back by adversely `evaluating' their detractors) and this drives science forward.' Well, attacks and struggle are all very nice, but the extent to which this really makes for progress is an open question. What is certain is that conflict is a precarious foundation for a society. To the extent that AI people are part of the intellectual elite, they should set an example for society at large, by governing their professional lives with the principles of mutual respect, harmony, and compromise.

`But if papers aren't evaluated, how will people know I'm good? Doing AI without public evaluation is like playing rogue without having the high scores posted.' Certainly some researchers are sustained by the belief in a day of judgement when they will be impartially scored for all to see, if not by the Nobel Prize committee, at least by the editorial board of the AI Journal. But discouraging such fringe elements, and thereby making the progress of AI more sedate and predictable, must be counted as a further potential benefit of discontinuing filtering evaluation.