What is Effectiveness?

Position paper for the CHI '97 Workshop
on HCI Research and Practice Agenda
Based on Human Needs and Social Responsibility

David G. Novick

European Institute of Cognitive Sciences and Engineering (EURISCO)
4 Avenue Edouard Belin, 31400 Toulouse, France

Introduction

This paper addresses the problem of defining the effectiveness of interaction in terms that are useful for meeting human needs more directly, both in terms of task outcome and of interaction quality. The problem of effectiveness is generally considered from an "objective" standpoint; that is, the assessment considers only the nominal outcome of the interaction rather than the subjective quality of the interaction from the standpoint of the participants. This leads to cases such as computer-supported meeting rooms that produce highly "effective" results--better, faster decisions--but with less commitment from the participants. The problem of understanding effectiveness is particularly acute in safety-critical domains, such as aviation, where there are high social costs for failures.

The Problem of Defining Effectiveness

The concept of effectiveness has long been a difficult problem for the human-computer interaction community, even if not always addressed directly. The "Handbook of Human-Computer Interaction" (Helander, 1988) does not even index "effectiveness" or "efficiency." Some low-level aspects of interaction have been relatively easy to model. For example, there is an extensive literature on the efficiency of pointing, captured by Fitts' Law (see, e.g., Fitts, 1954; MacKenzie, 1992). But for higher-level, non-motor activities, reasonable approaches to effectiveness are difficult to come by. For task-based interaction, typical measures of effectiveness are time-to-completion and task outcome (Marshall & Novick, 1995). But this approach has fundamental limitations, particularly where the quality of interaction is at issue. Time-to-completion is a poor measure for a task in which the use of the interface has little effect on how quickly the task is completed. Likewise, interaction with interfaces to flight systems produces (or should produce) extremely low effective error rates, so task outcome is a poor measure of effectiveness. For example, research on improving the flight interfaces used by crews of new-generation aircraft is paradoxically hindered by the happy fact that most flights result in no incidents or accidents. Thus, under current approaches, direct measures of effectiveness are unhelpful because (a) better than 99.99 percent of crews land their planes safely and (b) time of completion is generally a consequence of factors other than the interface, such as weather and air-traffic control.
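
To make the contrast with higher-level activities concrete, the sketch below shows how a low-level efficiency model of this kind is typically operationalized, using the Shannon formulation of Fitts' Law discussed by MacKenzie (1992); the intercept and slope coefficients are illustrative placeholders, not values from the cited studies.

    import math

    def index_of_difficulty(distance, width):
        # Shannon formulation of Fitts' index of difficulty, in bits:
        # ID = log2(D/W + 1).
        return math.log2(distance / width + 1)

    def predicted_movement_time(distance, width, a=0.1, b=0.15):
        # Predicted movement time in seconds: MT = a + b * ID.
        # The intercept a and slope b here are placeholder values; real
        # coefficients must be fit empirically for a device and user group.
        return a + b * index_of_difficulty(distance, width)

    # Example: a 512-pixel reach to a 32-pixel-wide target.
    print(round(predicted_movement_time(512, 32), 2))  # about 0.71 s with these placeholders

No comparably compact model exists for the higher-level, non-motor activities at issue in the rest of this paper.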

Current approaches to the measurement of effectiveness thus tend to be indirect: they measure elements associated with effectiveness rather than effectiveness itself. In the aircraft interface domain, for example, the cockpit management attitudes questionnaire (CMAQ) has been valuable because it assesses attitudes that have been linked to output factors such as performance (Helmreich & Foushee, 1993). Similarly, the aircrew coordination observation/evaluation scale (ACO/E) measured crew behaviors that were later related causally to mishaps, although different skills were found to be important for different crews and tasks (Prince & Salas, 1993). There are two major flaws in these approaches. First, they usually depend on expert ratings of performance rather than direct empirical measurement. Second, they measure process inputs instead of process outputs; this is (a) indirect and (b) unhelpful when trying to determine what the inputs should be. In particular, this presents a problem because the effectiveness measures will presumably be used to evaluate new procedures and interfaces that will be characterized by these indirect "input" factors, and the validity of the relationship between the inputs and the outputs may simply be a function of the procedures and interfaces that formed the basis for the study. We may end up maximizing possibly ineffective inputs instead of output effectiveness. In other words, our methodological limitations may mislead designers of interfaces, producing (a) potentially sub-optimal domain-task results in fields in which there is a strong public interest (such as the operation of commercial aircraft) and (b) potentially unpleasant or aversive interaction for the users.
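
The validity concern can be stated in measurement terms: an indirect instrument is certified only by its observed correlation with an output under a particular set of procedures and interfaces. The following sketch, using hypothetical data and invented variable names, illustrates how such a proxy relationship is typically established and why it may not survive a change of interface.

    from math import sqrt

    def pearson_r(xs, ys):
        # Pearson correlation between two equal-length samples.
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sx = sqrt(sum((x - mx) ** 2 for x in xs))
        sy = sqrt(sum((y - my) ** 2 for y in ys))
        return cov / (sx * sy)

    # Hypothetical data: questionnaire attitude scores (the "input" proxy)
    # and expert ratings of crew performance (the output they stand in for).
    attitude_scores     = [3.2, 4.1, 2.8, 4.5, 3.9, 3.0, 4.2, 3.6]
    performance_ratings = [2.9, 4.0, 3.1, 4.4, 3.7, 2.8, 4.1, 3.3]

    print("proxy validity r =", round(pearson_r(attitude_scores, performance_ratings), 2))
    # A high r certifies the proxy only for the procedures and interfaces under
    # which it was measured; a redesigned interface may weaken or break the link.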

Subjective measures of effectiveness, such as questionnaires, are also suspect. First, it is not at all clear that users are necessarily good judges of the effectiveness of interaction; they too may rely on indirect indices. Second, these measures should be considered domain-specific. For example, an interface for a video game might be considered effective if it were found to be "involving," in the sense of compelling the user's attention and interest. This would be a poor measure for a flight interface, however, because the crew have plenty of other things to which they need to attend.

In summary, the problems faced in current approaches to measurement of human effectiveness are that (1) direct measures are unhelpful because almost all task outcomes are successful, (2) indirect measures usually rely on judgment and are vulnerable to weak relationships between the inputs and outputs, and (3) subjective measures are unreliable and domain-specific.

A Research Agenda

Given the need to develop and validate methodologies for the measurement of human effectiveness, the research challenge is to find effectiveness measures that are not only practical but theoretically well-founded enough to be reliable and extensible. These problems are particularly acute in the cockpit environment. I suggest that a useful research agenda would involve the development of methodological models. In particular, I begin with the proposition that effectiveness measurement techniques ought to measure interaction outcomes directly and empirically, rather than relying on indirect input factors or on expert judgment.

The path to achieving these goals is not obvious, in part because of a trade-off between risk and reward. Research on direct measures can be expected to present both higher risks and higher potential pay-offs relative to research on indirect measures, which are more familiar and clearly easier to execute. Consequently, funding agencies may wish to pursue a two-track approach, looking at both direct and indirect measures, with a view to producing high-probability results while preserving the chance of high-value results.

Looking specifically at possible methodologies for developing direct measures of effectiveness for the cockpit environment in particular, I suggest that it might be possible to decompose the flight task into smaller pieces (i.e., not the whole flight but intermediate results). This would involve conducting a task analysis that identifies sub-tasks, sub-goals, and reachable and non-reachable states. I expect that this would produce measurable error rates and outcomes. In particular, error detection and correction rates should be relatively high, particularly in tasks with positive outcomes. I also expect that, in practice, the incidence of minor unreported errors--almost always corrected--is high, so these could be used more successfully than overall task-outcome measures. In other words, perhaps the notion of task can be redefined on a smaller scale at which the everyday phenomena of interaction can be detected. If this is the case, we will have taken a step toward a more useful definition of the quality of interaction.
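
As a sketch of what such a decomposition might yield in measurable terms, the fragment below assumes a simple per-sub-task coding scheme; the record fields and sub-task names are invented for illustration, not drawn from an established instrument.

    from dataclasses import dataclass

    @dataclass
    class SubTaskRecord:
        # One coded sub-task episode from a flight or simulator session.
        # The fields and sub-task names used below are illustrative only.
        name: str
        goal_reached: bool
        errors: int      # slips or mistakes observed within the sub-task
        detected: int    # how many of those errors the crew noticed
        corrected: int   # how many of the detected errors were repaired

    def effectiveness_summary(records):
        # Sub-task-level measures that remain informative even when the
        # overall flight outcome is uniformly successful.
        total_errors = sum(r.errors for r in records)
        detected = sum(r.detected for r in records)
        corrected = sum(r.corrected for r in records)
        return {
            "sub_task_success_rate": sum(r.goal_reached for r in records) / len(records),
            "error_detection_rate": detected / total_errors if total_errors else 1.0,
            "error_correction_rate": corrected / detected if detected else 1.0,
        }

    session = [
        SubTaskRecord("altitude capture", True, errors=1, detected=1, corrected=1),
        SubTaskRecord("frequency change", True, errors=2, detected=1, corrected=1),
        SubTaskRecord("approach briefing", True, errors=0, detected=0, corrected=0),
    ]
    print(effectiveness_summary(session))

Even when every sub-task goal is reached, the detection and correction rates remain informative about the quality of the interaction.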

Acknowledgements

Ynze van Houten provided useful comments on an early version of this paper.

References

Fitts, P. (1954). The information capacity of the human motor system in controlling the amplitude of movement. Journal of Experimental Psychology, 47, 381-391.

Helander, M. (ed.) (1988). Handbook of Human-Computer Interaction. Amsterdam: North-Holland.

Helmreich, R., and Foushee, H. (1993). Why crew resource management? Empirical and theoretical bases of human factors training in aviation. In Wiener, E., Kanki, B., and Helmreich, R. (eds.), Cockpit resource management. London: Academic Press, 3-45.

MacKenzie, I. (1992). Fitts' law as a research and design tool in human-computer interaction. Human-Computer Interaction, 7, 91-139.

Marshall, C., and Novick, D. (1995). Conversational effectiveness in multimedia communications. Information Technology & People, 8(1), 54-79.

Prince, C., and Salas, E. (1993). Training and research for teamwork for the military aircrew. In Wiener, E., Kanki, B., and Helmreich, R. (eds.), Cockpit resource management. London: Academic Press, 337-398.


This paper appeared as Novick, D. (1997). What is effectiveness? Working notes, CHI '97 Workshop on HCI Research and Practice Agenda Based on Human Needs and Social Responsibility, CHI '97, Atlanta, GA, March, 1997.
