Domain-Linked Rationale for Changes in Use of Complex Interfaces

David G. Novick

EURISCO
4, avenue Edouard Belin
31400 Toulouse, France

The purpose of this paper is to examine a study of user needs that was inherently tied to domain elements, and to distinguish areas of possible cross-domain knowledge and methods from areas that are likely to remain domain-specific. The paper will present a study of changes to user manuals by operators of a complex interface, in this case for a commercial aircraft. From an examination of this study, I will review the usefulness and limitations of cross-domain application with respect to methods, taxonomic classifications, and use of manuals as a source for user-oriented design knowledge.

1. The Change-Rationale Study

The use of complex interfaces in safety-critical domains can be characterized, in large part, through accommodations to domain-specific (and even user-specific) factors. One way of assessing the effects of domain factors on use is to examine changes in use made by classes of users. In the case of aircraft cockpit systems, for example, uses of the aircraft's interfaces are codified in an operating manual (OM), which is published by the manufacturer. Of great value to the study of interface use, some airlines publish, in whole or in part, a revised OM that details uses specific to the airline. Because of the interface-procedure duality (Boy, 1998; Novick, in press), user-initiated changes to user-interface procedures reveal user attitudes and preferences about the underlying interface. So in the aviation domain, researchers have available a written record of the changes in use that result from user or situational characteristics. These changes are reflected in differences in procedures for a particular aircraft.

To understand the domain-, user- and situation-specific reasons for changes in aircraft operating procedures by airlines, I undertook, as part of a project funded by Aerospatiale Airbus, a review and analysis of the rationale of airlines for representative changes to their OMs. The study was motivated by the promise of helping developers of procedures for new interfaces take advantage of industry experience and meet customers' needs more directly. This paper will report on the methods used, present a taxonomic analysis of OM change rationale, and evaluate the validity of the taxonomy.

1.1 Method

The study gathered data on OM change rationale from three airlines, which I will call AX, AY and AZ. Airline AX is a North American carrier, airline AY is an Asian carrier, and airline AZ is a European carrier. All three airlines have world-wide routes.

The study used two approaches in identifying OM change rationales. The first approach ("closed-ended") involved selection of known OM changes. Selections were made on the basis of obtaining a variety of kinds of change and possible variety of change rationales. In particular, the criteria revolved around apparent changes to the substance of a procedure or some other significant difference in content presented to the crew. In this approach, I reviewed prior research reports in which OM changes had been identified for airline AZ. Queries were sent to AZ about the reasons for these changes, and the airline responded with an exhaustive account of the change rationales for the items identified.

The second approach ("open-ended") involved asking key airlines to identify and explain changes to their OM. Queries were sent to airlines AX and AY. Airline AY responded with written materials. Airline AX provided written materials and generously sent two airline employees to EURISCO for interviews.

Based on the materials generated in the airlines' written responses and through the AX interview, the rationales for specific changes in the OMs of airlines AX, AY and AZ were identified and listed. The identified rationales were then classified in terms of their component factors. The results of this analysis are reported below.

1.2 Analysis

The data provided 68 identified instances of OM change rationale. From these, categories and super-categories of rationale were developed. In the terms of Lampert and Ervin-Tripp (1993), the taxonomic analysis was empirically derived rather than based on an a priori theory. The approach used was an iterative alternation of bottom-up and top-down analysis, which was intended to generate the analysis's implicit theory from the data, with refinement of the theory based on knowledge of the field. The multiple bottom-up and top-down passes led to the creation of new categories, recategorization of instances, grouping of sub-categories, formation of super-categories, combining of categories, and renaming of categories.

Overall, the change rationale can be classified in terms of five top-level taxonomic categories: safety, explicitness, efficiency, regulatory requirements, and economy. The safety category includes seven sub-categories: minimize the effects of error, improve crew coordination, use spatially based routines, leave safety systems armed, keep the most critical pages of system information displayed, operations, and consistency. The explicitness category includes four sub-categories: add specific procedure items for items otherwise left unclear or implicit; add extra material providing rationale, other background, or local specifics; clarify unclear terms; and inform crew early about operational limits. The efficiency category includes three sub-categories: avoid delays, omit unnecessary items, and omit items covered in other ways. The economy category includes two sub-categories: save fuel, and use packs rather than the auxiliary power unit.

Here is the final taxonomy, with the rationale instances identified by numbers associated with the airline identifier. That is, AZ 3 refers to the third instance of change rationale from airline AZ. In some cases, rationale instances have been classified in more than one category. The details of the change rationale instances were used in the classification but are beyond the scope of this paper.

1. Safety
   1.1 Minimize effects of error (AZ 1; AX 49)
   1.2 Improve crew coordination (AY 1; AX 15)
       1.2.1 Do not use checklists in the air (AX 2)
       1.2.2 Use calls for abnormal rather than normal conditions (AX 36)
       1.2.3 Eliminate or move items out of high-load times to keep crew in the loop (AX 38, 39, 44)
   1.3 Use spatially based routines (AX 14)
   1.4 Leave safety systems armed (AX 32)
   1.5 Keep most critical pages of system information displayed (AX 44)
   1.6 Operations (e.g., avoid flameouts from ingesting snow) (AX 45)
   1.7 Consistency
       1.7.1 Consistency between OM and cockpit labels (AX 13, 23)
       1.7.2 Consistency among OM procedures (AX 28, 51)
       1.7.3 Fleet commonality/company standard (AZ 8; AY 2; AX 6, 9, 10, 19, 25, 32, 34, 47, 54)
2. Explicitness
   2.1 Add specific procedure items for items otherwise left unclear or implicit, including calls and their wording (AZ 7; AY 4; AX 12, 17, 24, 25, 42)
   2.2 Add extra material providing rationale, other background, or local specifics (AY 3, 5; AX 47)
   2.3 Clarify unclear terms (AX 29, 34, 51)
   2.4 Inform crew early about operational limits (AZ 2)
3. Efficiency
   3.1 Avoid delays (AZ 4; AX 4, 16, 20)
   3.2 Omit unnecessary items (AZ 6; AX 11, 22, 33, 53)
   3.3 Omit items covered in other ways
       3.3.1 Items covered by other personnel (e.g., dispatch office, cabin crew) (AZ 5; AX 3, 8, 21, 30)
       3.3.2 Items covered elsewhere in OM or operations manual (AZ 3; AX 26, 41, 48)
       3.3.3 Crew's prior knowledge
             3.3.3.1 Knowledge of system (AZ 3; AX 1)
             3.3.3.2 Knowledge of airmanship (AX 27, 28, 31)
4. Regulatory requirement  (AY 6; AX 5, 37, 43, 46)
5. Economy
   5.1 Save fuel (AX 7, 50)
   5.2 Use packs rather than auxiliary power unit (AX 40)

1.3 Evaluation

Assessing the taxonomic categories against four factors indicates that the categories have reasonable prima facie validity.

Taxonomic analysis, such as categorization of items like activity or rationale, is often simply a one- or two-step process (see, e.g., Kirwan & Ainsworth, 1992). In taxonomic analyses involving repeated application of the categorization to large sets of data, it is possible to validate the taxonomy through measures of interrater reliability, such as Cohen's Kappa (Lampert & Ervin-Tripp, 1993; Carletta, 1996). Where the data are limited or the research is exploratory, however, single-analyst categorization is accepted (see, e.g., Levinson, 1992; Wood, 1996). In such cases, the validity of the taxonomy depends on its inherent plausibility rather than external assessment; the reader as much as the analyst is entitled to assess independently the correctness and usefulness of the taxonomy, although some prima facie indices of validity can be applied. These indices, adapted from Lampert and Ervin-Tripp (1993), include (1) whether the categories relate directly to the primary concerns of the users involved, (2) whether the categories are clearly defined, and (3) whether the categories are exhaustive for the data set.

To these indices, I can add a fourth: (4) whether each category is supported by multiple instances in the data.

Applying the four indices of validity to the rationale taxonomy suggests a reasonably high level of prima facie validity. First, the categories relate directly to primary concerns of aircraft operators and OM authors. Safety, efficiency, and regulatory requirements clearly cut across both kinds of users. The category of explicitness is less obvious in its interest, but the category stems directly from the language of the airlines themselves, and has a clear theoretical basis in the safe operation of aircraft (see, e.g., Novick & Juillet, 1998). Airline AX's emphasis on explicitness may be attributable to the airline's multicultural environment.

Second, the categories are, for the most part, clearly defined. The cases in which I am least confident involve categories 2.2 ("Add extra material providing rationale, other background, or local specifics") and 5.2 ("Use packs rather than APU"). In the former, this is partly due to the relatively general nature of some of airline AX's explanations. In the latter, this is due to my limited understanding of the economics of the relationship between the use of packs and the APU. Otherwise, the categories are reasonably self-explanatory.

Third, the categories are exhaustive for this data set. There are no miscellaneous categories.

Fourth, all but eight of the 32 categories are supported by multiple instances of change rationale. Of these eight, five have a high degree of face validity:

  1.2.1    Do not use checklists in the air
  1.2.2    Use calls for abnormal rather than normal conditions
  1.3      Use spatially based routines

The first three of these cases consist of single instances of rationale that have, according to the airline, been applied across many sections of the OM. They represent explicit company policy.

  1.6      Operations (e.g., avoid flameouts from ingesting snow)

This case appears to be representative of a class of specific changes resulting from experience with operating conditions.

  2.4      Inform crew early about operational limits

This case arises from the explicit comment of one of the airlines. It is possible that the study did not find additional instances because the study looked at change rationale rather than authoring rationale in general. That is, this justification may exist for many unchanged elements in the original OM, where the text would not have had to be changed by the airline.

I have less confidence in the remaining three categories:

  1.4      Leave safety systems armed

As a category, this case is plausible. However, it may represent an incorrect generalization of the underlying rationale. After all, not all safety systems are always armed.

  1.5      Keep most critical pages of system information displayed
  5.2      Use packs rather than auxiliary power unit

These are cases where there is likely to be a more general category (such as "Display salient information") but where the data did not contain other like instances. As it is, the categories seem over-specific. The instance supporting category 5.2 might instead fall in the "save fuel" category if, for instance, it turned out that use of the packs was less expensive than the fuel consumed by the auxiliary power unit.

Overall, then, application of the four validity factors suggests that the taxonomic categories, particularly for the top level, are reasonably good.
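The counts behind the fourth index can be re-derived mechanically from the taxonomy listing in Section 1.2. The following is a minimal sketch in Python, not part of the original study. It assumes one particular counting rule: a category is supported by the instances assigned to it directly plus those assigned to any of its sub-categories. The names TAXONOMY, parse_instances and support are hypothetical, introduced only for this illustration.

```python
# Taxonomy transcribed from the listing in Section 1.2: category number ->
# rationale instances assigned directly to that category ("" if none).
TAXONOMY = {
    "1": "",
    "1.1": "AZ 1; AX 49",
    "1.2": "AY 1; AX 15",
    "1.2.1": "AX 2",
    "1.2.2": "AX 36",
    "1.2.3": "AX 38, 39, 44",
    "1.3": "AX 14",
    "1.4": "AX 32",
    "1.5": "AX 44",
    "1.6": "AX 45",
    "1.7": "",
    "1.7.1": "AX 13, 23",
    "1.7.2": "AX 28, 51",
    "1.7.3": "AZ 8; AY 2; AX 6, 9, 10, 19, 25, 32, 34, 47, 54",
    "2": "",
    "2.1": "AZ 7; AY 4; AX 12, 17, 24, 25, 42",
    "2.2": "AY 3, 5; AX 47",
    "2.3": "AX 29, 34, 51",
    "2.4": "AZ 2",
    "3": "",
    "3.1": "AZ 4; AX 4, 16, 20",
    "3.2": "AZ 6; AX 11, 22, 33, 53",
    "3.3": "",
    "3.3.1": "AZ 5; AX 3, 8, 21, 30",
    "3.3.2": "AZ 3; AX 26, 41, 48",
    "3.3.3": "",
    "3.3.3.1": "AZ 3; AX 1",
    "3.3.3.2": "AX 27, 28, 31",
    "4": "AY 6; AX 5, 37, 43, 46",
    "5": "",
    "5.1": "AX 7, 50",
    "5.2": "AX 40",
}


def parse_instances(spec):
    """Expand 'AZ 8; AX 6, 9' into {'AZ 8', 'AX 6', 'AX 9'}."""
    instances = set()
    for group in spec.split(";"):
        group = group.strip()
        if not group:
            continue
        airline, numbers = group.split(" ", 1)
        instances.update(f"{airline} {n.strip()}" for n in numbers.split(","))
    return instances


def support(category):
    """Instances assigned to a category or to any of its sub-categories.

    Counting sub-category instances toward the parent is an assumption
    of this sketch, not a rule stated in the study.
    """
    found = set()
    for number, spec in TAXONOMY.items():
        if number == category or number.startswith(category + "."):
            found |= parse_instances(spec)
    return found


# Categories that rest on exactly one instance of change rationale.
single_instance = [c for c in TAXONOMY if len(support(c)) == 1]
print(len(TAXONOMY), "categories;", len(single_instance), "with a single instance")
print(single_instance)
```

Under that counting rule the sketch finds 32 categories, eight of which rest on a single instance (1.2.1, 1.2.2, 1.3, 1.4, 1.5, 1.6, 2.4 and 5.2), which agrees with the figures discussed under the fourth index.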

2. Common Ground across Domains

Having, at some length, presented a domain-based example of taxonomy development for understanding requirements for human-system interfaces, I now turn to an analysis of the common ground and key differences across domains arising from the change-rationale study. I will discuss the study's implications for cross-domain application with respect to methods, taxonomic classifications, and, briefly, use of manuals as a source for user-oriented design knowledge.

2.1 Methods

The methods for taxonomy development and evaluation are not inherently tied to the aviation domain. In fact, the antecedents for these methods came from other, unrelated domains.

The taxonomy-development method is domain-independent with respect to its bottom-up/top-down approach, but requires extended domain knowledge for the groupings and classifications. In other words, the method is domain-neutral but relies on domain knowledge.

The taxonomy-evaluation method is also nominally domain neutral, but some criteria are more linked to domain knowledge than others. Thus the criteria of exhaustiveness and of support by multiple instances are truly domain neutral, because they measure the absence of "miscellaneous" categories and the number of items classified into each category. I note, though, that whether an item should be classified in a particular category depends on domain knowledge. Consider the study's analysis of the categories supported by only a single example of change rationale. The analysis uses concepts such as face validity and experience, and discusses the effects of limitations of the analyst's domain knowledge on the outcome of the classification.

The other two criteria, relevance to users' primary concerns and clarity of definition, can be applied to virtually any domain but, again, depend on domain knowledge. The extent of this dependence can be judged, in part, by reviewing the study's analysis, which uses concepts such as (a) judgments about whether categories relate directly to primary concerns of aircraft operators, (b) independently obtained knowledge of an airline's operating environment or culture, and (c) the limitations of the analyst's domain knowledge.

I suggest that these methods are thus usable across domains but that confidence in their results depends in large part on the strength of the analyst's knowledge of the domain.
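The interrater measure cited in Section 1.3, Cohen's Kappa, has the same character: the computation is domain neutral, while the category assignments it compares rest on each analyst's domain knowledge. As a minimal sketch, assuming a second analyst were available (this study used a single analyst), the statistic could be computed as follows. The function name cohens_kappa and the two label lists are hypothetical illustrations, not data from the study.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's Kappa for two analysts' category labels over the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equally long, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: share of items given the same category by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, estimated from each analyst's own category frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both analysts use one category")
    return (observed - expected) / (1.0 - expected)


# Hypothetical top-level classifications of ten rationale instances
# by two analysts (invented for illustration only).
analyst_a = ["safety", "safety", "safety", "efficiency", "efficiency",
             "explicitness", "explicitness", "economy", "regulatory", "safety"]
analyst_b = ["safety", "safety", "explicitness", "efficiency", "efficiency",
             "explicitness", "safety", "economy", "regulatory", "safety"]

kappa = cohens_kappa(analyst_a, analyst_b)
print(round(kappa, 2))
```

For these invented labels the analysts agree on eight of ten items, chance agreement is 0.26, and Kappa is about 0.73.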

2.2 Taxonomic Abstractions

To what extent did the change-rationale study produce domain-independent results? By this, I mean to ask whether the top-level (or even some of the other-level) categories are sufficiently broad or domain neutral that they could serve a useful analytical role in the assessment of interface requirements in domains other than commercial aircraft cockpits.

Some top-level categories do seem relatively domain independent. These categories include efficiency and, perhaps, economy. Other categories are still general but could easily not apply to many domains. These categories include safety and regulatory requirements. Finally, the category of explicitness seems especially linked to the domain of manuals for safety-critical systems.

Looking at the sub-categories, again some appear more domain-independent than others. For example, categories like "Clarify unclear terms" and "Omit unnecessary items" would appear to have broad application across many domains. I can also identify some elements that have a clear, domain-specific nature. Even an apparently application-independent factor such as consistency across fleets is a characteristic of the aviation domain. Such characteristics are thus likely to be shared with other domains in which heterogeneity of interfaces and systems plays a role.

Other elements are so domain-specific that they appear to be of limited use for purposes of generalization across domains. Operational conditions such as avoiding ingestion of snow appear to be in this category. However, this sort of example may be useful for interface and procedure designers in that it points out that variations in conditions of use may be so extreme as to lead to systematic changes in manners of use.

This discussion suggests that there are further levels of abstraction possible from the change-rationale taxonomy's top-level categories. Moreover, the differences in domain dependence among categories, both at the top level and in sub-categories, suggest that the taxonomy contains categories that are related at possibly inconsistent levels. That is, a better taxonomy might ensure that categories at a given level of abstraction have similar levels of domain dependence and domain independence.

2.3 Manuals as Sources for Design Knowledge

One other kind of domain dependence involves the sources for the user requirements that were the focus of the study: changes to user manuals. At first blush, one might think that the case of having customers customize their user manuals is so rare that, as a technique, analysis of change rationale would be limited to a very small set of domains. While it is true that publishing customized manuals may be infrequent, there are analogous sources of information in many domains. In particular, study of individuals' copies of manuals is likely to disclose notes by the user that indicate things like key items, needs for clarification, and awkward procedures. Even for a walk-up-and-use kiosk, consider looking at a kiosk in use to see whether anyone has written graffiti that reflects use of, or reaction to, the user interface!

Acknowledgements

This research was funded by Aérospatiale Aéronautique. I also thank Guy Boy for his helpful comments on this paper and the airlines contributing to the study for their valuable assistance.

References

Boy, G. (1998). Cognitive function analysis for human-centered automation of safety-critical systems. Proceedings of CHI 98 (Los Angeles, CA, April, 1998), 265-272.

Carletta, J. (1996). Assessing agreement on classification tasks: The Kappa statistic. Computational Linguistics 22, 2, 249-254.

Kirwan, B., and Ainsworth, L. (1992). A guide to task analysis. London: Taylor & Francis.

Lampert, M., and Ervin-Tripp, S. (1993). Structured coding for the study of language. In Edwards, J., and Lampert, M. (eds.), Talking data: Transcription and coding in discourse research. Hillsdale, NJ: Lawrence Erlbaum Associates.

Levinson, S. (1992). Activity types and language. In Drew, P., and Heritage, J. (eds.), Talk at work. Cambridge: Cambridge University Press.

Novick, D., and Juillet, J. (1998). Documentation integrity for safety-critical applications: The COHERE project. Proceedings of SIGDOC 98 (Quebec, September, 1998), 51-57.

Novick, D. (in press). Using the cognitive walkthrough for operating procedures. Interactions.

Wood, L. (1996). The ethnographic interview in user-centered work/task analysis. In Wixon, D., and Ramey, J. (eds.), Field methods casebook for software design. New York: John Wiley & Sons.


 
This paper was published as Novick, D. (1999). Domain-linked rationale for changes in use of complex interfaces, Working Notes, CHI 99 Workshop on HCI in Domains: Common Ground and Key Differences, Conference on Human Factors in Computing Systems (CHI'99), Los Angeles, CA, May, 1999.
 
____________
 
DGN, November 23, 2000