Automated Program Testing

My most important contribution in this area is the connection that I invented between JML specifications and JUnit testing. JUnit is a widely-used unit testing framework for Java. I developed a tool that uses the run-time assertion checking code generated by the JML compiler (jmlc) as a test oracle. The main idea is that one can use the precondition of a methods specification to filter invalid test data; this is done by distinguishing precondition violations that happen on entry to the method under test from those that happen within the code of the method under test. This work appeared in ECOOP 2002, and has made a tremendous impact on the formal methods community, since it helps make formal specifications a useful tool in unit testing (Cheon and Leavens, 2002). There have been several papers written by others that build on this approach, starting in 2002 and continuing to the present day.

I have continued to develop the work on unit testing, focusing on test data generation. In object-oriented programming, the state of an object is hidden and accessible only through a set of exported or public methods. One key issue therefore is to construct a test object of an interesting or desired state. This is a difficult problem, as one cannot manipulate the hidden state variables directly. An object must be constructed indirectly as a sequence of method invocations, preceded by a constructor invocation, and each of such invocations has to satisfy the assumption that the invoked method makes about its arguments, often formally written as a method precondition. I have been investigating different strategies for constructing test data automatically, including random testing, constraint solving, and genetic algorithms. In random testing, the method to be invoked is chosen randomly along with its arguments. The strengths of this approach are that it can be fully automated, eliminates subjectiveness in constructing test data, and increases the diversity of test data. However, the weakness is that randomly generated tests may not satisfy a program's assumptions such as method preconditions. In our experiment, for example, depending on the complexity of method preconditions, 99% of randomly-generated method invocation sequences failed to build test objects (Cheon, 2007; Cheon and Rubio, 2007).

With a graduate student, Antonio Cortes, I recently blended random testing with constraint solving, improving the efficiency of generating valid test data while preserving diversity (Cheon, et al., 2008). For domains such as objects, we generate input values randomly; however, for values of finite domains such as integers, we represent test data generation as a constraint satisfaction problem by solving constraints extracted from the precondition of the method under test. In our experimental evaluation we observed an average improvement of 80 times without decreasing test data diversity, measured in terms of the time needed to generate a given number of valid test cases.

A PhD student, Myoung Yee Kim, is investigating on improving random testing by applying evolutionary approaches such as genetic algorithms (refer to the project website). In evolutionary testing, metaheuristic search techniques such as genetic algorithms are used to select or generate test data. The search space is the input domain of the program under test. The search starts with an initial set of test cases that are typically generated randomly, and then evolves them into new generations by applying operations inspired by genetics and natural selection, such as selection, crossing, and mutation. The process is repeated until a solution.a set of test cases that satisfy the testing found or a certain stopping condition, e.g., the maximum number of iterations, is met. The search is guided by a fitness function that calculates the fitness values of the individuals in the generation in that the fitter ones have a higher chance to survive and thus evolve into the next generation. Thus, the effectiveness of an evolutionary testing is determined in part by the quality of the fitness function. In a recent paper we proposed a new approach to defining fitness functions for evolutionary testing that allows one to use such an intuition as "choosing the one with the shortest length." (Kim and Cheon, 2008). The core idea of our approach is to use the behavioral specification of a method to determine the goodness (or fitness) of test data. A preliminary experiment shows that our approach improves the search from 300% up to 800% in terms of the number of iterations needed to find a solution.

To summarize, my research has contributed to making testing more automatic and complete, which is of substantial importance to the software industry.

My recent research in this area was supported by the following research grants.

Last modified: $Id: autotest.html,v 1.5 2009/03/06 10:20:21 cheon Exp $