Standard error classification to support software reliability assessment

by JOHN B. BOWEN
Hughes-Fullerton
Fullerton, California

SUMMARY

A standard software error classification is viable, based on experimental use of different schemes on Hughes-Fullerton projects. Error classification schemes have proliferated independently due to varied emphasis on the depth of causal traceability and on when error data was collected. A standard classification is proposed that can be applied to all phases of software development. It includes a major causal category for design errors. Software error classification is a prerequisite both for feedback for error prevention and detection, and for prediction of residual errors in operational software.

INTRODUCTION

The ability of managers and technical developers to influence the reliability of software is very high at the outset of a project but declines rapidly as commitments are made, schedule time and budgets are used, and code and documents are produced. The acceptance test phase is the very time when little chance remains to influence the reliability of the system except by rebuilding the deficient parts. A significant goal is to alert management as early as possible in the development phases to critical problems and adverse trends that could degrade software reliability. Since up to 60 percent of the errors detected in the life cycle of software have been committed during the design phase,1 a major challenge is to devise error categories that are sensitive to that phase, and thereby provide feedback.

Management feedback has been difficult to obtain, because programmers have traditionally enjoyed a pride of codemanship that rarely admits to the existence of errors. However, with the advent of Modern Programming Practices (MPPs), such as code reviews, software errors are available for analysis and feedback even before a program module is executed. A special conference on the problems of data collection2 concluded that "The most success in data collection has been realized in those places where there has been feedback." Over three years ago Marcia Finfer3 noted that "Many papers addressing the problem of error collection and quantization state that greater understanding of software errors will lead to the improvement in the design and application of software development tools and techniques, but the reality of the situation does not support this conclusion. Project managers who have the ability to both initiate error reporting procedures, and analyze the incoming data, do not consistently take action resulting from the analysis of the error reports." This observation is still true today. Typically, the reason for not acting on the analysis of error trends is the overriding pressure of getting the immediate job done on schedule. Such a reason is understandable, particularly during the latter stages of development. However, software management appears to be remiss in not applying the results of error analysis to subsequent projects.

In the case of predictive software reliability models, most program managers have doubts about their usefulness. Consequently, some view error data collection as a nonproductive extra burden. This view is unfortunate, because only with the support of conscientious error data collection can proposed quantitative reliability models be validated.

The Rome Air Development Center (RADC) has sponsored numerous studies on software error collection and analysis, starting with a software reliability study by TRW1 which included an error category scheme that was generated as the raw error data was analyzed. This error scheme was used in later RADC studies,3,4,5 but no approved standard error classification has emerged within the Air Force to date. The Navy has included a software trouble classification in its recent MIL-STD-1679;6 however, the four categories do not have enough detail to assist in feeding back constructive information to management.

This paper proposes the standardization of a set of software error classifications that have causal, severity, and source phase properties. Such a set will assist the project manager in taking remedial action to improve reliability, support company and software community efforts in evaluating the impact of reliability-producing techniques, and aid in validating software reliability models and metrics. The term software reliability, as used in this paper, therefore represents both the assessment of the use of reliability-producing factors and the prediction of residual errors. Although reliability models are primarily used to predict residual errors existing after acceptance testing, they can also be applied to earlier development phases if primed with sufficient error data. Reliability prediction is not concerned with the causal properties of errors, but should be concerned with severity and source phase properties.

NEED FOR A STANDARD ERROR CLASSIFICATION

Like most human activities, the software engineering environment is a complex of a great variety of interrelated factors. Some researchers, such as Willmorth et al.,7 conclude that "No one set of data parameters collected for research purposes will significantly support a wide range of reliability analyses." Weiss8 contends that error classifications need to be tailored for each study or application so that the questions of interest can be answered. I contend that there is a need for a standard scheme to classify error data which represents the basic characteristics of the software environment. In fact, a number of organizations and agencies, such as the Joint Logistics Commanders, U.S. Navy, IEEE Computer Society, and a number of industrial companies, have developed, or are in the process of standardizing, software error classifications. Unfortunately, few of these schemes are compatible with each other. Only the severity classifications are similar, and even in this case the number of severity categories ranges from three to five. RADC has inaugurated a software data collection and analysis program9 which has as one of its major objectives to "Promote standards of software data collection, and support the development and definition of common software data collection terminology."

The necessity of a standard error classification scheme becomes evident when the needs of a large project and of research activities are examined. A few examples are: to provide feedback to develop software design standards; provide guidance to test engineers; evaluate modern programming practices; evaluate verification and validation tools; and validate and support quantitative reliability models. The minimal ingredients of such a scheme are listed in Table I.
TABLE I.-Questions that can be answered by a feedback-oriented classification scheme

When - In what phase in the software development cycle did the error originate?
How - What did the designer/analyst/programmer do wrong?
What - What is the effect of exercising the resultant fault?

Since some studies report that as much as 60 percent of all software errors originate in the design phase, it is important that error collection and classification be sensitive to the point in time in the life cycle of a program when the error occurs. Only then can improved software design standards be developed. In addition, the distribution of types of errors from related projects can assist test engineers and quality analysts in concentrating their activities. For instance, if one particular application is expected to have a preponderance of computational errors, then the test planners would profit by applying dynamic tools, rather than static tools, to uncover such errors.10 Thus, while it has been established that the use of error classifications can aid in evaluating all phases of software development, the most rewarding efforts occur during the early phases, such as design. As suggested by Finfer,3 error analysis can indicate the necessity to apply additional personnel to a particularly error-prone program or subsystem, and a cluster of errors in a related group of programs may indicate that particular software is poorly designed. In a study for NASA-Langley, Hecht11 recommended that "Classification by cause of failure is desirable in order to organize remedial measures. This information is of value for the management of the immediate project on which it is obtained, for overall software management (e.g., in guiding the allocation of resources), and for the development of improved software engineering tools and procedures (language processors, test tools)."

While these examples illustrate the underlying necessity for developing a standard software error classification scheme, the problem is not exactly new. A 1975 conference on software data collection2 concluded that "Standardization of data items, collection procedures, and project characteristics is needed to provide comparability of measures in evaluating tools, techniques, and methods." This is still true today, especially in the validation of predictive software reliability models and software reliability metrics, as well as in the selection of the best V&V tools and techniques. One of the major hurdles in comparability is the difficulty of controlling all of the factors that influence software development during an experiment that compares two software development activities using different modern programming practices. It is difficult to compare the programming activity of different projects using error analysis because of uncontrollable factors such as programmer background, hardware and software environment, and application.

Error density is frequently used to evaluate MPPs. For example, IBM12 compared two large projects: one project, with top-down design, structured code, chief programmer teams, and a librarian, had an error rate of 1.0 per 100 lines of code; another project, using conventional techniques, had twice that rate. This report is an example of the typical use of unqualified errors to evaluate the effectiveness of MPPs. In the final analysis, such a use can be misleading unless the researcher reveals when the errors were detected, and how severely these errors impact mission performance.
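To make the notion of qualified errors concrete, the following sketch shows how an error-density comparison changes once each error record carries a severity and a detection-phase attribute instead of contributing to a bare count. The record fields, project sizes, and counts are hypothetical illustrations, not data from the projects cited above.

    from collections import Counter

    # Hypothetical error records; field names and values are illustrative only.
    errors = [
        {"project": "A", "severity": "Minor",    "detected_in": "Code review"},
        {"project": "A", "severity": "Major",    "detected_in": "Design review"},
        {"project": "A", "severity": "Minor",    "detected_in": "Code review"},
        {"project": "B", "severity": "Critical", "detected_in": "Acceptance test"},
        {"project": "B", "severity": "Critical", "detected_in": "Acceptance test"},
    ]
    lines_of_code = {"A": 10000, "B": 20000}  # assumed project sizes

    # Unqualified error density: errors per 100 lines of code.
    totals = Counter(e["project"] for e in errors)
    for proj in sorted(totals):
        print(proj, "unqualified rate:", round(100.0 * totals[proj] / lines_of_code[proj], 3))

    # Qualified density: broken out by severity and by when the error was detected,
    # so that a low overall rate cannot hide late, mission-critical errors.
    qualified = Counter((e["project"], e["severity"], e["detected_in"]) for e in errors)
    for (proj, sev, phase), n in sorted(qualified.items()):
        print(proj, sev, phase, round(100.0 * n / lines_of_code[proj], 3))

Judged by the unqualified rates alone, project B in this contrived example looks three times better; the qualified breakdown shows that both of its errors were critical and surfaced only at acceptance test.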
Even if the errors are qualified, there must be a common understanding of the classification scheme. Susan Gerhart13 reports that "The study of observed errors on the fallibility of modern programming methodologies suffered from an inconsistent error domain which caused several types of classification schemes to be difficult to construct and to interpret." Castle, in a thesis on validation of software reliability math models,14 states that if he had to make one recommendation, it would be the importance of continued software error collection. He pointed out, "A disease cannot be cured without knowledge of the cause. So is the case with unreliable software." In a list of 22 software error characteristics for collection, he includes the phase in which the error occurred, the criticality of the error, and the error categories (causal) with unambiguous definitions. As a result of a study of candidate software reliability models, Kruszewski15 recommended improved data collection with formal error reporting and with the use of causal and severity categories. Schafer, in a recent RADC study to validate candidate software reliability models,16 used 16 sets of project error data which represented a total of 31,181 errors. The results of the study indicated that in general the software models fit poorly due to vagaries of the data, rather than shortcomings of the models. The study report concluded that more work remains in the area of software error data collection. Echoing these findings, Sukert, at a recent conference,17 recommended the development of software error data collection standards, and the study of software reliability predictions based on error criticality categories.

SURVEY OF CANDIDATE ERROR CLASSIFICATION SCHEMES

An excellent survey of the state-of-the-art in software error data collection and analysis was published by Robert Thibodeau.18 His report describes recent efforts of government agencies, educational institutions, and private companies, and includes synopses of several studies on software error collection and analysis. On the topic of error classification he states: "The study of software errors requires them to be separated according to their attributes. This is the first step in understanding what causes them and, subsequently, how they may be prevented. The need for a practical error classification is important and, since it applies to nearly all areas of software research, it deserves to be treated as a separate topic."

TRW software reliability study

During a study for RADC,1 TRW-Redondo Beach devised a software error classification scheme with twelve major causal categories. The study also developed a source phase classification. These classifications, which were iteratively developed during a 2.5-year study, are listed in Table II.

Study of errors found in validation

Raymond Rubey, in a technical paper published in 1975,20 presents several error categories. He stated that "The most basic data required about the errors found during validation are the frequency of occurrence of those errors in defined error categories and their relative effect or severity." Three of the proposed error classifications are included in Table III.
TABLE II.-Software error classifications developed during TRW reliability study

Causal
  COMPUTATIONAL
  LOGIC
  DATA INPUT
  DATA HANDLING
  DATA OUTPUT
  INTERFACE
  DATA DEFINITION
  DATA BASE
  OPERATION
  OTHER
  DOCUMENTATION
  PROBLEM REPORT REJECTION

Source Phase
  REQUIREMENTS
  DESIGN
  CODING
  MAINTENANCE
  NOT KNOWN

TABLE III.-Error classifications proposed by Rubey study

Causal
  INCOMPLETE OR ERRONEOUS SPECIFICATION
  INTENTIONAL DEVIATION FROM SPECIFICATION
  VIOLATION OF PROGRAMMING STANDARDS
  ERRONEOUS DATA ACCESSING
  ERRONEOUS DECISION LOGIC OR SEQUENCING
  ERRONEOUS ARITHMETIC COMPUTATIONS
  INVALID TIMING
  IMPROPER HANDLING OF INTERRUPTS
  WRONG CONSTANTS AND DATA VALUES
  INACCURATE DOCUMENTATION

Severity
  SERIOUS
  MODERATE
  MINOR

Source Phase
  DEFINING THE PROGRAM SPECIFICATION
  DEFINING THE PROGRAM
  CODING
  PERFORMING MAINTENANCE FUNCTIONS

AN/SLQ-32(V) verification and validation

In May 1977 the Navy distributed a statement of work for V&V services21 which characterized the software errors encountered during software development as follows:

  Requirements
  Processing Design
  Data Base Design
  Interface Design
  Processing Construction
  Data Base Construction
  Interface Construction
  Verification
  Specification (all documentation)

Mitre error classification study

In early 1973 MITRE Corporation, under contract to RADC, developed a general software error classification methodology.19 The methodology was designed to serve as a guideline for experiment-specific application. The proposed classification scheme is hierarchical and consists of five major categories:

1. Where did the error take place
2. What did the error look like
3. How was the error made
4. When did the error occur
5. Why did the error occur

The associated subcategories are not unique to the major categories and include attributes such as People, Hardware, Software, Mechanical, Intellectual, and Communicational. The scheme accounts for the fact that a single error can have a number of characteristics occurring simultaneously. The report addresses the problem of multiple classification of the same error, and suggests the use of fuzzy set theory, in which multiple classifications are qualified by degree to fully describe a single software error.

JLC preliminary error classification

In April 1979 the Joint Logistics Commanders Joint Policy Coordinating Group on Computer Resource Management held a software workshop22 where preliminary general categories for classifying software errors were defined. As shown in Table IV, three major causal categories and four severity categories were included.

Discussion

Most of the software error classification schemes surveyed have a separate classification for severity, or impact on mission performance. However, there was no general agreement on using distinct classifications for cause and source phase. Some error causes are peculiar to a phase; therefore a combined, single classification would result in fewer subcategories than all possible combinations of source phase and causal subcategories. This advantage appears to be outweighed, however, by the ease of implementing automated statistical analysis of the phase and causal attributes when the categories are separated. It should be noted that only the Navy AN/SLQ-32(V) causal classification scheme included unique categories for design errors. Rubey's classification contains only one special design category, intentional deviation from specification.
(This category could be interpreted as representing either a design or coding activity.) The JLC classification has design categories; however, they are combined with requirements (e.g., incomplete requirements or design).

TABLE IV.-Software error categories proposed by the Joint Logistics Commanders

Software Specifications
1. Unnecessary functions
2. Incomplete requirements or design
3. Inconsistent requirements or design
4. Untestable requirements or design
5. Requirements not traceable to higher specifications
6. Incorrect algorithm
7. Incomplete or inaccurate interface specifications

Code
1. Syntax errors
2. Non-compliance with specification(s)
3. Interface errors
4. Exception handling errors
5. Shared variable accessing error
6. Software support environment errors
7. Violation of programming standards
8. Operational support environment errors

(Third causal category)
1. Accuracy
2. Precision
3. Consistency

Severity
1. Prevents accomplishment of its primary function, jeopardizes safety, or inhibits maintainability of the software
2. Degrades performance or maintainability, with no workaround
3. Degrades performance or maintainability, but a workaround exists
4. Does not adversely affect performance or maintainability (such as documentation errors transparent to users)

RESULTS OF USING EXPERIMENTAL CLASSIFICATIONS ON HUGHES-FULLERTON PROJECTS

For over two years Hughes experimented with a software error classification scheme on an Army project during the development phases. The classification scheme used on this project was based on the scheme proposed by Rubey.20 Three classifications were used: Severity, Cause, and Miscellaneous, as shown in Table V.

TABLE V.-Hughes-Fullerton experimental error classification

Severity
  CR - System Crash or Serious Effect on Mission Performance
  MA - Incorrect Values that Reduce Mission Performance
  MI - Incorrect Values that have Tolerable Effect on Mission

Cause
  REQMT - Expanded, Reduced, or Erroneous Requirements
  PROGM - Nonresponsive Program Design
  SPECS - Incomplete or Erroneous Program Design Specifications
  LOGIC - Erroneous Decision Logic or Sequencing
  IMPVE - Improved Program Storage or Response Time
  INTRT - Improper Handling of Interrupts
  LINKE - Incorrect Module or Routine Linkage
  ARITH - Erroneous Arithmetic Computations
  ALGOR - Insufficient Accuracy in Implementation of Algorithm
  DOCUM - Inaccurate or Incomplete Comments or Prologue
  EDIT - Erroneous Editing for New Version Update
  DATA1 - Incomplete or Inconsistent Data Structure Definition
  DATA2 - Wrong Value for Constant or Preset Data
  DATA3 - Improper Scaling of Constant or Preset Data
  DATA4 - Uncoordinated Use of Data by More than One User
  DATA5 - Erroneous Access or Transfer of Data
  DATA6 - Erroneous Reformatting or Conversion of Data
  DATA7 - Improper Masking and Shifting During Data Extraction and Storage
  DATA8 - Failure to Initialize Counters, Flags, or Data Areas

Miscellaneous
  INTRO - New Error Introduced During Correction
  STAND - Noncompliance with Programming Standards and Conventions

The causal classification was open-ended; that is to say, categories were added as required during the project. The Data category was assigned most frequently (23 percent of total errors); consequently, it was divided into the eight subcategories shown.
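Coded categories of this kind lend themselves to simple automated tallying. The sketch below reproduces a few of the Table V cause codes in a lookup table and computes a frequency distribution over a list of hypothetical problem reports; the report list and the resulting percentages are illustrative only, not the project's actual data.

    from collections import Counter

    # A few of the Table V cause codes (descriptions abbreviated). The scheme was
    # open-ended, so a project could append new codes as they proved necessary.
    CAUSE_CODES = {
        "SPECS": "Incomplete or erroneous program design specifications",
        "LOGIC": "Erroneous decision logic or sequencing",
        "DATA5": "Erroneous access or transfer of data",
        "INTRO": "New error introduced during correction",
    }

    # Hypothetical problem reports, each tagged with one cause code.
    reports = ["LOGIC", "SPECS", "DATA5", "LOGIC", "SPECS", "INTRO", "LOGIC", "DATA5"]

    for code, n in Counter(reports).most_common():
        share = 100.0 * n / len(reports)
        print(f"{code:5s} {share:5.1f}%  {CAUSE_CODES[code]}")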
Incomplete or erroneous program design specifications accounted for 15 percent of the total number of errors; logic for 14 percent; and requirements, program design, and access or transfer of data for 10 percent each.

On a similar Army project,23 Hughes has over a year's experience in using an error classification scheme based on the TRW/RADC scheme. The causal classification was assigned separately from the source phase, and was tailored to the following ten major categories (the percent of total errors is shown in parentheses): Computational (4), Logic (38.5), Data Definition (20.5), Data Handling (14), Data Base (3), Interface (4.5), Operation (1), Documentation (0.5), Problem Report Rejection (NA), and Other (13.5). The major categories Data Input and Data Output were dropped because they were not appropriate to the application.

An analysis of error trends on this project revealed that eight problems were caused by the improper selection of instructions. Accordingly, it was felt that this class of errors warranted a separate subcategory. Since such a selection could result from either misunderstanding or carelessness, the following two subcategories were added to the Other category:

  Selection of wrong instruction or statement
  Careless selection/omission of instruction or statement

It is believed that these two categories will help determine the need for improved training of new programmers, on subsequent projects, in the understanding of the instruction repertoire. Such categories may also be useful in validating complexity metrics such as the one proposed by Ruston.24 That metric is based on information theory, and assumes that the less frequently an operator or operand is used, the more difficult it is for the programmer to use correctly.

On a Navy project, Hughes employed a code review technique which included the recording of errors according to categories. Of the modules reviewed, 500 had a total of 765 errors; the remaining 742 had no problems.25 Table VI presents the distribution of the most frequent errors, and compares the distribution with comparable categories from IBM's code inspection technique.26 The high percentage of errors due to missing or insufficient listing prologues and comments for the Hughes project was probably due to the novelty of such a requirement early in the coding phase.

TABLE VI.-Distribution of errors detected during code inspection (% of total)

Category                 Hughes    IBM
Prologue/Comments         44.0     17.0
Design Conflict           19.5     25.5
Logic                     11.5     30.5
Programming Standards     11.0      4.5
Language Usage             5.0     12.5
Other                      3.0      3.5
Module Interface           3.0      6.5
Data Base                  3.0
Total                    100.0    100.0

RECOMMENDED ERROR CATEGORIES

With respect to proposed error classification schemes, applicability to more than one project and the excessive granularity and ambiguity of subcategories have been called out as problems. Hughes has found that the use of a minimal set of three software error classifications (Cause, Severity, and Source) solves these problems and is sufficient to support the assessment of software reliability. As summarized in Table VII, Source tells in which software development phase the error originated, Cause tells what the analyst or programmer did wrong, and Severity tells whether the manifestation of the error degrades mission performance. The recommended causal classification for software reliability assessment, containing seven major categories, is shown in Table VIII.
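One way the three classifications summarized in Table VII might be carried on a machine-readable problem report is sketched below. The record type, field names, and values are hypothetical illustrations rather than part of the proposed standard; the point is that keeping Source and Cause as separate fields is what makes automated phase-by-cause analysis straightforward.

    from dataclasses import dataclass

    @dataclass
    class ErrorReport:
        """One problem report carrying the three recommended classifications."""
        source: str    # phase of origin, e.g. "Design"
        cause: str     # causal category from Table VIII, e.g. "Interface"
        severity: str  # effect on mission performance: "Critical", "Major", or "Minor"
        module: str    # principal module responsible for the error

    report = ErrorReport(source="Design", cause="Interface",
                         severity="Major", module="TRACK_PROC")

    # A combined phase/cause code would have to be split apart again before a
    # cross-tabulation such as (source, cause) could be formed.
    print((report.source, report.cause), report.severity, report.module)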
The scheme can be tailored by adding subcategories of interest or exception, such as problem report rejection, to the Other category. A definition of each category/subcategory is presented in Appendix A.

A severity classification of at least three categories (for example, Critical, Major, and Minor) is recommended. In addition to guiding project managers in assigning priorities to the troubleshooting and resolution of problems, severity categories are necessary for practical application of predictive software reliability models. In order for the prediction of residual software faults to be meaningful, the impact of the execution or manifestation of the fault on system mission performance must also be included. Some proposed reliability models, such as the execution time theory model, can accommodate severity by running separate predictions for each severity category of interest. The justification for the recommended error causal and source phase categories/subcategories is discussed in the following subparagraphs.

TABLE VII.-A software error classification scheme that provides feedback

Source - The phase in which the error of omission/commission was made (e.g., Requirements, Design, Coding, Test, Maintenance, and Corrective Maintenance)
Cause - The causal, rather than symptomatic, description of the error
Severity - The resulting effect of the error on mission performance (e.g., Critical, Major, and Minor)

TABLE VIII.-Causal categories to support software reliability analysis

Design
  Nonresponsive to requirements
  Inconsistent or incomplete data base
  Incorrect or incomplete interface
  Incorrect or incomplete program structure
  Extreme conditions neglected

Interface
  Wrong or nonexistent subroutine called
  Subroutine call arguments not consistent
  Improper use or setting of data base by a routine
  Improper handling of interrupts

Data Definition
  Data not initialized properly
  Incorrect data units or scaling
  Incorrect variable type

Logic
  Incorrect relational operator
  Logic activities out of sequence
  Wrong variable being checked
  Missing logic or condition tests
  Loop iterated incorrect number of times (including endless loop)
  Duplicate logic

Data Handling
  Data accessed or stored improperly
  Variable used as a flag or index not set properly
  Bit manipulation done incorrectly
  Incorrect variable type
  Data packing/unpacking error
  Units or data conversion error
  Subscripting error

Computational
  Incorrect operator/operand in equation
  Sign convention error
  Incorrect/inaccurate equation used
  Precision loss due to mixed mode
  Missing computation
  Rounding or truncation error

Other
  Not applicable to software reliability analysis
  Not compatible with project standards
  Unacceptable listing prologue/comments
  Code or design inefficient/not necessary
  Operator
  Clerical

Subcategory for clerical errors

Two experimental studies, one performed at the Naval Postgraduate School (NPGS)27 and the other performed at the Naval Research Laboratory (NRL),8 found it necessary to include clerical as a major error category. In fact, both studies found that the clerical category was the most frequent error cause (see Tables IX and X). The NPGS error distributions represent a composite of four projects; on one project the Clerical, Manual subcategory contributed 36 percent of the total errors. Due to the high occurrence of clerical errors reported in these two unrelated studies, it is recommended that Clerical be added as a subcategory to the Other category.

TABLE IX.-Misunderstandings as sources of errors during NRL experiment

Category             % of Total Errors
Clerical             36
Design               19
Coding Specs         13
Careless Omission
Language
Interface
Requirements
Coding Standards
Total

TABLE X.-Most frequent error types found during NPGS experiment

Subcategory                               % of Total Errors
Clerical, Manual                          18.5
Coding, Representation                    10.0
Coding, Syntax                             7.0
Design, Extreme Condition Neglected        6.5
Coding, Inconsistency in Naming            5.0
Coding, Forgotten Statements               5.0
Design, Forgotten Cases or Steps           4.5
Design, Loop Control                       4.0
Coding, Missing Declarations or Block      4.0
Coding, Level Problems Limits              3.0
Coding, Sequencing                         3.0
Design, Indexing                           2.5
Coding, Missing Data Declarations          2.5
Clerical, Mental                           2.5
Other (combined)                          22.0
Total                                    100.0

Category for design-related errors

Although a causal category for design-related errors is redundant with the source phase category Design, sufficient error volume has been associated with software design activities to warrant a separate causal category. In analyzing design-related category assignments on three software projects at Hughes-Fullerton, it was found that these categories accounted for 25, 17, and 8 percent of the total errors. Furthermore, the results of the error category frequency distributions collected during code reviews/inspections (refer to Table VI) reveal that design conflicts constitute a significant portion of the error causes (25.5 and 19.5 percent). Another study8 performed at the Naval Research Laboratory (NRL) reported that design misunderstandings contributed to 19 percent of the total errors (see Table IX).

Maintenance category

Maintenance errors are defined by Thayer1 as those errors resulting from the correction of previously documented errors. He reported that in one project this category of errors reached 9 percent of the total number of errors; however, he estimated that a practical norm for this type of error ranges from 2 to 5 percent. Fries5 reported that "... a surprisingly high 6.5 percent of the errors were a result of attempts to fix previous errors or update the software. Thus, the number of errors introduced by the correction process itself is nontrivial. This is an important consideration when developing reliability model assumptions." Note that Fries' 6.5 percent includes updates or enhancement changes as well as corrections of previously documented errors; therefore the actual percentage value for maintenance errors would probably lie in the 2 to 5 percent range.

At Hughes-Fullerton, three projects have been monitored during development phases for maintenance errors. The portions of total errors for these three projects are 14, 12, and 8 percent. One possible reason these percentages are higher than the previously reported range of two to five percent is that none of the three Hughes projects controlled the number of allowable patches.
Consequently, there was always the extra risk of a wrong correction in patch form due to hasty implementation, or of the subsequent incorrect symbolic implementation of a successful patch. It is estimated that maintenance errors contribute as much as 20 percent of the total errors after a system is fielded. Because of the frequency of this type of error, and the interest in reducing the causes of maintenance errors, a separate category is required. Either a Maintenance subcategory could be added to the Other causal category, or a Corrective Maintenance category could be added to the source phase classification. It is recommended that a new category be added to the source phase classification, because including maintenance error as a causal subcategory would preclude the assignment of the more descriptive cause (e.g., Subscripting error).

Optional category/subcategory assignment

The original TRW/RADC classification for Project 5 (Reference 1) was designed for universal application by allowing the option to assign categories at only the major category level (e.g., Computational, Logic, Data Handling, etc.). The TRW study report commented as follows about the applicability of the subcategories: "The detailed categories, however, are less universal and suffer in applicability due to differences in language, development philosophy, software type, etc. When data are collected may also have a bearing on applicability [of the detailed categories] to some software test environments. For Project 5 the list used was apparently adequate for the real time applications and simulator software, as well as the Product Assurance tools. However, there was criticism concerning applicability of detailed categories to the real time operating system software problems." Hughes-Fullerton has employed the two-level (category/subcategory) option, and has found it to be satisfactory for all projects.

ERROR COLLECTION GUIDELINES

It is human nature not to admit to errors; therefore it is essential that software engineers be informed of the significance of reporting accurate error data to support software reliability analysis. It should be emphasized that the purpose of error reporting is to measure the technology and not the people. I agree with Gerhart's13 statement: "It is necessary to view errors as a phenomenon of programming which requires study and, while it is necessary to be sensitive to peoples' reactions when threatened by exposure of errors, it may be healthier to get the errors and the errants out in the open rather than to cover up the human origin of errors." Automatic data collection may be the only means to ensure objective data, but short-term projects cannot afford it. In most instances, useful software reliability information can be obtained with only slight modifications to existing problem report/correction systems. The use of coded error category descriptors on program trouble and correction reports tends to alleviate thoughts of incrimination.

Guideline procedures for assigning and approving error categories should be included in project standard practices to promote consistent interpretation of the error categories. In addition to the error categories, the procedure should contain detailed definitions of the error subcategories. Those definitions guide individual programmers in assigning the most appropriate category to represent the error at hand. Even with the use of such an error category dictionary, programmers may assign different categories for the same errors.
Therefore, it is further suggested that a senior programmer or reliability analyst be responsible for reviewing all error category assignments for consistency and accuracy. Certain less offensive subcategories, such as Clerical, require special monitoring, because a programmer will lean toward them when given a choice. Programmers must be reminded to fill out a separate problem correction report for each distinguishable correction at the module level. It is recommended that the following data be collected in addition to the error classifications:

• Date/time that the error/incident was detected
• Date/time that the error was resolved by the programmer
• Date/time that the resolution was verified
• Principal module responsible for the error

CONCLUSIONS

It appears from the survey of proposed software error classification schemes that they differ primarily because of varying emphasis on different areas of software development. I agree with some researchers that error classifications must reflect areas of interest; however, this does not preclude the development of a standard minimal set of software error classifications that has universal application, including reliability assessment. Therefore, I suggest that the proposed error classification scheme be considered as a standard for use in software reliability assessment. The proposed scheme can be used during design reviews, code reviews, and testing.

In order to satisfy all activities, additional error characteristics will have to be collected. For example, in the validation and use of predictive software reliability models, the date and time of detection of a fault and the date and time of correction of the error are additional data that are required to be collected. However, if the cause of the "error" is ignored, a reliability model could be fed time/date data for a problem report, such as the integration of new software, that is not analogous to the residual class of errors that quantitative models predict.

The development of a set of standard software error classifications is a prerequisite for the development of a meaningful software reliability discipline. Such a set of classifications can serve two promising approaches to the discipline: 1) those that emphasize the use and assessment of reliability-producing techniques during the early development phases, and 2) those that focus on the prediction and measurement of the number of residual errors after acceptance, by statistical math models. Both approaches require error classifications to effectively assess and measure software reliability. Concurrent with the development and acceptance by the software community of a standard set of causal, severity, and source classifications, there is a need for research and development in the automation of error collection through compilers and test runs. Also, the capabilities of emerging independent V&V tools, when augmented by standard error classifications, can be extended to improve test plan and procedure generation.

REFERENCES

1. Thayer, T. A., et al., "Software Reliability Study," TRW-Redondo Beach, RADC-TR-76-238 (Aug 1976).
2. Willmorth, N. E., "Proceedings of Data Collection Problem Conference," RADC-TR-76-329, Vol VI (Dec 1976).
3. Finfer, M. C., "Software Data Collection Study," System Development Corp., RADC-TR-76-329, Vol III (Dec 1976).
4. Baker, W. F., "Software Data Collection and Analysis: A Real-Time System Project History," IBM Corp., RADC-TR-77-192 (Jun 1977).
5. Fries, M. J., "Software Error Data Acquisition," Boeing-Seattle, RADC-TR-77-130 (April 1977).
6. Chief of Naval Materiel, Military Standard for Weapon System Software Development, MIL-STD-1679 (Navy), AMSC No. 23033 (Dec 1978).
7. Willmorth, N. E., et al., "Software Data Collection Study, Summary and Conclusions," RADC-TR-76-329, Vol I (Dec 1976).
8. Weiss, D. M., "Evaluating Software Development by Error Analysis: The Data from the Architecture Research Facility," Naval Research Laboratory, NRL Report 8268 (Dec 1978).
9. Nelson, R., "Software Data Collection and Analysis" (draft, partial report), RADC (Sep 1978).
10. Gannon, C., "Error Detection Using Path Testing and Static Analysis," Computer, pp 26-31 (Aug 1979).
11. Hecht, H., "Measurement, Estimation, and Prediction of Software Reliability," Aerospace Corp., NASA CR-145135 (Jan 1977).
12. Motley, R. W. and Brooks, W. D., "Statistical Prediction of Programming Errors," IBM Corp., RADC-TR-77-175 (May 1977).
13. Gerhart, S. L., "Development of a Methodology for Classifying Software Errors," Duke University (July 1976).
14. Castle, S. G., "Software Reliability: Modelling Time-to-Error and Time-to-Fix," master's thesis, Air Force Institute of Technology (Mar 1978).
15. Kruszewski, G., "Modeling Software Reliability Growth," Proceedings of Surface Warfare Systems RMQ Seminar, Norfolk, VA (Sept 1978).
16. Schafer, R. E., et al., "Validation of Software Reliability Models," Hughes-Fullerton, RADC-TR-79-147 (Aug 1979).
17. Sukert, A., "State of the Art in Software Reliability," presentation, NSIA Software Conference, Buena Park, CA (Feb 1979).
18. Thibodeau, R., "The State-of-the-Art in Software Error Data Collection and Analysis," AIRMICS (Jan 1979).
19. Amory, W. and Clapp, J. A., "Engineering of Quality Software Systems (A Software Error Classification Methodology)," MITRE Corp., MTR-2648, Vol VII (Jan 1975); also RADC-TR-74-324, Vol VII.
20. Rubey, R. J., "Quantitative Aspects of Software Validation," Proceedings of the 1975 International Conference on Reliable Software, Los Angeles, pp 246-251 (April 1975).
21. NAVSEA, Statement of Work for AN/SLQ-32(V) Verification and Validation, Appendix A (May 1977).
22. Hartwick, R. Dean, "Software Acceptance Criteria Panel Report," Joint Logistics Commanders Joint Policy Coordinating Group on Computer Resource Management, Software Workshop, Monterey, CA (April 1979).
23. Bowen, J. B., "AN/TPQ-36 Software Reliability Status Report," Hughes-Fullerton, CDRL 8-18-015 (Dec 1979).
24. Shooman, M. L. and Ruston, H., "Summary of Technical Progress, Investigation of Software Models," Polytechnic Institute of New York, RADC-TR-79-188 (July 1979).
25. Thielen, B. J., "SURTASS Code Review Statistics," Hughes-Fullerton, IDC 78/1720.1004 (Jan 1978).
26. Fagan, M. E., "Inspecting Software Design and Code," Datamation, pp 133-144 (Oct 1977).
27. Hoffman, H., "An Experiment in Software Error Occurrence and Detection," master's thesis, Naval Postgraduate School (Jun 1977).

APPENDIX A

DEFINITION OF RECOMMENDED ERROR CATEGORIES/SUBCATEGORIES

Design

The Design category reflects software errors caused by improper translation of requirements into design. The design at all levels of program and data structure is included (subsystem through module, and data base through table). Such errors normally occur in the design phase, but are not limited to that phase. Errors due to inconsistent, incomplete, or incorrect requirements do not qualify for this category; such errors should be assigned to the subcategory "Not Applicable to Software Reliability Analysis."

Interface

The Interface category includes those errors concerned with communication between 1) routines and subroutines, 2) routines and functions, 3) routines and the data base, 4) the executive routine and other routines, and 5) external interrupts and the executive routine.

Data definition

This category pertains to errors involved with permanent data, such as retained, global, and COMPOOL data. It includes common variable and constant data, as well as preset, initialized, and dynamically set variables.

Logic

The Logic category includes all logic-related errors at the intramodule level. Examples of this category are incorrect relational operators and incorrect looping control. Improper or incomplete logic occurrences at the intermodule level do not qualify for this category, and should be assigned to the Interface category.

Data handling

The Data Handling category is concerned with errors in the initialization, accessing, and storage of local data, as well as the conversion and modification of all data.

Computational

The Computational category pertains to inaccuracies and mistakes in the implementation of addition, subtraction, multiplication, and division operations.

Other

The Other category is designed to provide flexibility for each application. However, once selected for a project, the subcategories should not change. The following suggested subcategories deserve further explanation.

Operator

This subcategory includes errors caused by inaccurate users manuals for both operational and diagnostic applications.

Clerical

This subcategory includes errors that can be traced to careless keypunch, configuration control, or system generation operations.
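As a closing illustration of how the data items recommended under ERROR COLLECTION GUIDELINES could feed the models discussed in the CONCLUSIONS, the sketch below screens out reports whose cause is not applicable to reliability analysis before computing detection intervals and time-to-fix. The timestamps, cause strings, and report list are hypothetical.

    from datetime import datetime

    # Hypothetical problem reports carrying the recommended additional data items.
    reports = [
        {"cause": "Logic", "detected": "1979-10-01 09:00", "resolved": "1979-10-02 16:00"},
        {"cause": "Other - Not applicable to software reliability analysis",
         "detected": "1979-10-03 11:00", "resolved": "1979-10-03 12:00"},
        {"cause": "Data Handling", "detected": "1979-10-05 14:00", "resolved": "1979-10-08 10:00"},
    ]

    def ts(s):
        return datetime.strptime(s, "%Y-%m-%d %H:%M")

    # Only residual-class errors should be fed to a predictive reliability model.
    residual = [r for r in reports if "Not applicable" not in r["cause"]]

    detections = sorted(ts(r["detected"]) for r in residual)
    hours_between_detections = [(b - a).total_seconds() / 3600.0
                                for a, b in zip(detections, detections[1:])]
    hours_to_fix = [(ts(r["resolved"]) - ts(r["detected"])).total_seconds() / 3600.0
                    for r in residual]

    print("hours between detections:", hours_between_detections)
    print("hours to fix:", hours_to_fix)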