Run-time Software Fault Monitoring through DynaMICs
Ann Q. Gates
Steve Roach
Introduction. Computers are present in all aspects of our lives, automating and controlling most systems with which we interact. Because the reliability of software systems cannot always be guaranteed by traditional verification and validation methods, use of a run-time software-fault monitoring system that checks for correct program behavior is essential. Dynamic Monitoring with Integrity Constraints (DynaMICs) is such a system, providing support throughout the software's life cycle. Integrity constraints specify properties of the world being modeled. Three main features differentiate DynaMICs from other approaches: maintenance of constraints in a repository (separate from the program), automated instrumentation of source code with constraint checks, and use of a tracing mechanism to facilitate resolution of constraint violations.
Lifecycle Support. During the requirements phase, constraints are elicited from domain experts, customers, and users, providing an additional layer of communication among development team members and facilitating identification of potential requirements conflicts. In developing large systems, the complexity of requirements management and the probability of introducing conflicts increase with the amount of information integrated from multiple domain experts. Run-time monitoring helps manage the information and reduces the complexity associated with it. Because constraints are maintained in a repository, the constraint-analysis process is simplified. For example, separating constraints from the program permits grouping constraints that are defined on the same objects, grouping constraints that originate from the same source, and automating the detection of potential inconsistencies. Because each constraint is manually linked to an artifact that justifies the constraint, it is possible to identify the source of a constraint and, as a result, resolve conflicts among constraints and program code.
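The repository operations described above (grouping by object, grouping by source, and flagging potential inconsistencies) can be illustrated with a minimal Python sketch. It is not the DynaMICs repository: the class names, the artifact identifiers, and the restriction of every constraint to a numeric range on a single object are assumptions made only to keep the example small.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Constraint:
    """Hypothetical repository entry; all field names are illustrative."""
    name: str
    obj: str        # object the constraint is defined on
    source: str     # expert, customer, or user who supplied it
    artifact: str   # document that justifies the constraint
    low: float      # assumed form: low <= value of obj <= high
    high: float


class ConstraintRepository:
    """Keeps constraints separate from program code."""

    def __init__(self):
        self._constraints = []

    def add(self, constraint):
        self._constraints.append(constraint)

    def group_by(self, attribute):
        """Map each value of `attribute` to the names of its constraints."""
        groups = defaultdict(list)
        for c in self._constraints:
            groups[getattr(c, attribute)].append(c.name)
        return dict(groups)

    def potential_conflicts(self):
        """Pairs on the same object whose allowed ranges do not overlap."""
        return [
            (a.name, b.name)
            for a, b in combinations(self._constraints, 2)
            if a.obj == b.obj and (a.high < b.low or b.high < a.low)
        ]


# Hypothetical constraints and artifact identifiers.
repo = ConstraintRepository()
repo.add(Constraint("C1", "tank_level", "plant engineer", "REQ-12", 0.0, 80.0))
repo.add(Constraint("C2", "tank_level", "safety officer", "MEMO-3", 90.0, 100.0))
repo.add(Constraint("C3", "valve_angle", "plant engineer", "REQ-15", 0.0, 45.0))

by_object = repo.group_by("obj")
by_source = repo.group_by("source")
conflicts = repo.potential_conflicts()
```

Here C1 and C2 are flagged because their ranges on the same object cannot both hold, and the stored artifact for each one indicates which documents to consult when resolving the conflict.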
In the design phase, constraints can capture limitations imposed by designers. Similarly, constraints can capture assumptions made by the development team; this keeps the assumptions active rather than lost in documentation and program comments. Although software may be deemed valid and correct with respect to its specifications, the problem space may drift from the original requirements over time, and unanticipated use of the software may lead to failure.
During the maintenance phase, as the system evolves due to changes in requirements and existing code, constraints can be modified accordingly. As a result, it is possible to ensure that changes in the code do not inadvertently alter constraints, and that added or modified code does not violate existing constraints. Because constraints are not integrated in source code, maintenance of their specifications and associated monitoring code is simplified. In production, a subset of constraints can be selected for monitoring software. When errors are detected by a DynaMICs system, the user is provided with information that permits analysis, resolution of potential conflicts among experts or requirements, and possible refinement of requirements, specifications, and/or code. This gives the user an opportunity to enhance the quality of the software, providing an increased level of confidence.
Status of Project. The constraint elicitation process, instrumentation, and the relationship between constraints and program behavior have been studied on approximately 80 programs collected from a special-topics class in which ten problems were assigned in applications such as text processing, graph manipulation, pattern matching, and real-number calculations. The study documented and evaluated the methodology used to elicit constraints. Preliminary results from the study show that DynaMICs is effective at detecting when an unpredictable sequence of events results in an inconsistent state, the environment of the program changes, and assumptions no longer hold. As part of the study, the process of manual instrumentation was documented and refined, providing insight for defining an automated instrumentation process. Specifically, the work resulted in a definition of the event specification that can direct automated instrumentation.
We have formally proved the correctness of an algorithm that converts control-flow graphs into optimized path expressions. Path expressions are used in the algorithm that determines the instrumentation points in a program based on the event specification associated with a constraint.
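To make the notion of a path expression concrete, the following Python sketch derives a regular expression over edge labels from a small control-flow graph. It uses textbook state elimination, not the optimized algorithm referred to above, and it assumes single-character edge labels, at most one edge between any pair of nodes, an entry node with no incoming edges, and an exit node with no outgoing edges. The function name and the example graph are hypothetical.

```python
import re


def path_expression(edges, entry, exit_node):
    """Return a regular expression describing all entry-to-exit paths.

    `edges` maps (source, target) pairs to single-character labels.
    Interior nodes are eliminated one at a time (state elimination).
    """
    expr = dict(edges)
    nodes = {n for pair in edges for n in pair}
    for k in sorted(nodes - {entry, exit_node}):
        # A self-loop on k becomes a starred subexpression.
        loop = expr.pop((k, k), None)
        star = f"({loop})*" if loop else ""
        incoming = [(u, r) for (u, v), r in expr.items() if v == k]
        outgoing = [(v, r) for (u, v), r in expr.items() if u == k]
        for u, _ in incoming:
            del expr[(u, k)]
        for v, _ in outgoing:
            del expr[(k, v)]
        # Reconnect every predecessor of k to every successor of k.
        for u, r_in in incoming:
            for v, r_out in outgoing:
                bypass = f"{r_in}{star}{r_out}"
                if (u, v) in expr:
                    expr[(u, v)] = f"({expr[(u, v)]}|{bypass})"
                else:
                    expr[(u, v)] = bypass
    return expr.get((entry, exit_node))


# Hypothetical CFG of a while loop: a enters the test, b enters the body,
# c returns to the test, and d leaves the loop.
cfg = {
    ("entry", "test"): "a",
    ("test", "body"): "b",
    ("body", "test"): "c",
    ("test", "exit"): "d",
}
loop_expr = path_expression(cfg, "entry", "exit")

# The expression accepts exactly the label sequences of real paths.
zero_iterations = re.fullmatch(loop_expr, "ad") is not None
two_iterations = re.fullmatch(loop_expr, "abcbcd") is not None
broken_path = re.fullmatch(loop_expr, "abd") is not None
```

Because the expression summarizes every execution path, a later analysis could in principle inspect it to decide where a monitored event can occur; that step is not shown here.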
A language for specifying constraints and associated events has been defined. Current work shows that automated translation of specifications to executable monitoring code is feasible. This effort, however, assumes a one-to-one correspondence of specification-variable data structures and program-variable data structures. The data-structure translation issue must be addressed to automate code generation since one goal of DynaMICs is to develop a system in which constraints can be reused in applications from the same domain. In other words, constraints should not be implementation dependent.
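The one-to-one correspondence assumption can be illustrated with a short Python sketch in which a single constraint specification is reused by two programs through an explicit mapping from specification variables to program variables. The DynaMICs specification language is not reproduced here; `ConstraintSpec`, `make_check`, and all variable names are hypothetical, and program state is modeled as a plain dictionary.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class ConstraintSpec:
    """Hypothetical implementation-independent constraint."""
    name: str
    variables: Tuple[str, ...]      # specification variables
    event: str                      # spec variable whose update triggers a check
    predicate: Callable[..., bool]  # must hold over the spec variables


def make_check(spec, binding):
    """Build a check for one program.

    `binding` maps each specification variable to exactly one program
    variable, which is the one-to-one correspondence assumed above.
    """
    def check(program_state):
        values = {v: program_state[binding[v]] for v in spec.variables}
        return spec.predicate(**values)
    return check


spec = ConstraintSpec(
    name="balance_covers_hold",
    variables=("balance", "hold"),
    event="balance",  # metadata an instrumenter would use; unused in this sketch
    predicate=lambda balance, hold: balance >= hold,
)

# Two programs from the same domain that name their data differently.
check_a = make_check(spec, {"balance": "acct_bal", "hold": "acct_hold"})
check_b = make_check(spec, {"balance": "funds", "hold": "reserved"})

ok_a = check_a({"acct_bal": 100, "acct_hold": 25})   # constraint holds
ok_b = check_b({"funds": 10, "reserved": 40})        # constraint is violated
```

A plain renaming such as this one is the easy case. When a specification variable corresponds to a differently shaped program data structure, the binding would need a translation step, which is the open issue noted above.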
We have completed the design of a system that uses a coprocessor to execute constraint-checking code from information gathered by bus-monitoring hardware. This permits concurrent execution of application and monitoring code, addressing performance degradation. In addition, we have completed the design of the tracing mechanism. A prototype DynaMICs system that integrates the results of the various efforts is under development.
Next Steps. In addition to completing the prototype system, several other efforts are ongoing. One is to study the relationship of the constraint specification in our approach to temporal logic. We hope to show that constraints may be expressed in a temporal logic and, thus, that it may be possible to take advantage of reasoning systems that utilize temporal logic. Another effort is to examine the soundness of the monitor, i.e., provide assurance that the monitor does not give rise to false negatives or false positives.
A case study of an actual system is needed to evaluate and refine the elicitation methodology as well as to assess the effectiveness of the approach in specifying appropriate constraints, detecting violations, and facilitating resolution of errors (using the prototype system). Note that errors may be in a constraint specification, an artifact, or program code. As the research produces results, the results will be used in a feedback loop to refine algorithms and understand advantages and possible limitations of the approach.
This research, along with the research funded by other agencies, will result in an end-to-end system that can be used throughout the lifecycle to provide quality software in which the user has a high level of confidence.