MODEL-BASED TESTING
FOCUS AREA: QUALITY
Definition and Summary: Model-Based Testing is the automatic generation of efficient test procedures/vectors using models of system requirements and specified functionality.
Specific activities of the practice are (1) Build the model, (2) Generate expected inputs, (3) Generate expected outputs, (4) Run tests, (5) Compare actual outputs with expected outputs, and (6) Decide on further actions (whether to modify the model, generate more tests, or stop testing; estimate the reliability (quality) of the software).
Model-Based Testing (MBT) can result in the following benefits:
§ Shorter schedules, lower cost, and better quality
§ A model of user behavior
§ Enhanced communication between developers and testers
§ Early exposure of ambiguities in specification and design
§ Capability to automatically generate many non-repetitive and useful tests
§ Test harness to automatically run generated tests
§ Eases the updating of test suites for changed requirements
§ Capability to evaluate regression test suites
§ Capability to assess software quality
These benefits all require an initial investment in tools and training.
SUMMARY DESCRIPTION
Specific activities of the practice, as shown in the Figure below, are:
§ Build the model
§ Generate expected inputs
§ Generate expected outputs
§ Run tests
§ Compare actual outputs with expected outputs
§ Decide on further actions (whether to modify the model, generate more tests, or stop testing; estimate the reliability (quality) of the software)
DETAILED DESCRIPTION
The MBT process begins with requirements. A model for user behavior is built from requirements for the system. Those building the model need to develop an understanding of the system under test and of the characteristics of the users, the inputs and output of each user, the conditions under which an input can be applied, etc. The model is used to generate test cases, typically in an automated fashion. The specification of test cases should include expected outputs. The model can generate some information on outputs, such as the expected state of the system. Other information on expected outputs may come from somewhere else, such as a test oracle. The system is run against the generated tests and the outputs are compared with the expected outputs. Here, too, automation is extremely useful. The failures are used to identify bugs in the system. The test data is also used to make decisions, for example, on whether testing should be terminated and the system released.
Build the model: Forming a mental representation of the system’s functionality is a prerequisite to building a model for testing purposes. Testers need to understand not only the software, but also the environment in which it operates. The model should be a depiction of the software’s behavior, which can be described in terms of the input sequences accepted by the system; the actions, conditions, and output logic; or the flow of data through the applications, modules, and routines. In order to be useful for groups of testers and for multiple testing tasks, the description needs to be written down in an easily understandable form and be as formal as is practical. Useful models typically possess properties that make test generation effortless and, frequently, automatable. There are many formal modeling techniques (ways to depict behavior) from which to choose. For large complex systems it is often necessary for a team of testing/modeling experts to work together to derive the model. They use formal modeling techniques to communicate/coordinate their efforts.
A variety of techniques/methods exist for expressing models of user/system behavior. These include, but are not limited to:
§ Decision Tables – Tables used to show sets of conditions and the actions resulting from them
§ Finite State Machines – A computational model consisting of a finite number of states and transitions between those states, possibly with accompanying actions
§ Grammars – Descriptions of the syntax of programming and other input languages
§ Markov Chains (Markov process) – A discrete, stochastic process in which the probability that the process is in a given state at a certain time depends only on the value of the immediately preceding state
§ Statecharts – Behavior diagrams specified as part of the Unified Modeling Language (UML). A statechart depicts the states that a system or component can assume, and shows the events or circumstances that cause or result from a change from one state to another.
Table 1 describes characteristics of an application that indicate which technique is most appropriate.
Table 1: Modeling Method Guidance [Based on El-Far (2001a)]
Application Characteristics | Suggested Modeling Method
Processes formal language (e.g., web browser processing HTML, compiler) | Grammar
Protocol-based | Grammar
State-rich systems (e.g., telephony systems) | Finite State Machines
Few states, transitions caused by external conditions, as well as user inputs | Prefer statecharts over Finite State Machines
Capable of being modeled by a Finite State Machine; statistical analysis of failure data or reliability assessment is desired | Prefer Markov chains over Finite State Machines
Use of operational profiles to guide test generation is desired | Markov chains
Must ensure correctness for all combinations of input values | A tabular model; prefer Finite State Machines over Markov chains
Need to represent conditions under which inputs cause a particular response; Finite State Machines too awkward | Decision tables
Parallel system, individual components capable of being modeled by state machines | Statecharts
Parallel system, some components not capable of being modeled by state machines | Models for different components; “one gaping hole” [El-Far 2001a] in Model-Based Testing
Finite State Machines and Markov chains are the two most popular techniques in MBT for modeling user behavior. Finite State Machines can ensure that generated test cases cover the model. When a Markov chain model is used, a random process generates test cases, making coverage criteria more difficult to ensure in some specified number of test cases. The mathematics of Markov chains, however, provides analytical formulas to determine expected values useful in test planning.
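As an illustration of the kind of expected value a Markov chain model can supply for test planning, the sketch below estimates the expected length of a test case. The states are those of the phone example later in this document; the transition probabilities are hypothetical and are not taken from any measured operational profile. The function name `expected_steps_to` is likewise an assumption made for this sketch.

```python
# Hypothetical usage probabilities: state -> {next state: probability}.
P = {
    "On-Hook":   {"Dial Tone": 1.0},
    "Dial Tone": {"Busy": 0.2, "Ringing": 0.7, "On-Hook": 0.1},
    "Busy":      {"On-Hook": 1.0},
    "Ringing":   {"Talking": 0.8, "On-Hook": 0.2},
    "Talking":   {"Dial Tone": 0.5, "On-Hook": 0.5},
}

def expected_steps_to(target, chain, sweeps=500):
    """Expected number of inputs needed to reach `target` from each other state.

    Solves E[s] = 1 + sum over s' != target of P(s, s') * E[s'] by repeated
    substitution, which converges here because `target` is reachable from
    every state.
    """
    e = {s: 0.0 for s in chain if s != target}
    for _ in range(sweeps):
        e = {s: 1.0 + sum(p * e[n] for n, p in chain[s].items() if n != target)
             for s in e}
    return e

steps = expected_steps_to("On-Hook", P)
# A test case starts with "Pick Up" (one input) and then runs from Dial Tone.
expected_test_length = 1.0 + steps["Dial Tone"]
print(round(expected_test_length, 3))  # 4.417
```

With these assumed probabilities, a randomly generated test case contains a little over four inputs on average, which gives a rough basis for estimating how many inputs a suite of a given size will exercise.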
Addressing these specific techniques in further detail is beyond the scope of this document. Additional information, however, is presented in [Vienneau, 2003] and in a number of references cited in the Reference section of this document.
Table 2 presents sequential heuristics useful in building a state-based model, such as a Finite State Machine or a Markov chain. These heuristics cannot replace a good understanding of the system under test; they provide guidance on how to use that understanding.
Table 2: Build the Model (Based on [El-Far 2001c] and [Whittaker, 1997])
1. List all inputs.
2. For each input, list the situations in which the input can be applied and the situations in which the input cannot be applied.
3. For each input, list the situations in which the input causes different behaviors or outputs, depending on the context in which the input is applied.
Generate expected inputs: Use the model to generate test cases, which consist primarily of specifying the inputs and expected outputs. The difficulty of generating tests depends on the nature of the model. In the case of finite state machines, it is a matter of implementing an algorithm that randomly traverses the state transition diagram (a directed graph). Tests are, by definition, the sequence of inputs along the generated paths. Thus, if the model is well defined, the tests can be generated automatically. In contrast, without automation and modeling tools, this task can be immense and near impossible to do manually for a complex system.
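The random traversal described above can be sketched as follows. This is a minimal illustration, not a production test generator. The model is a small phone state machine of the kind used in the example later in this document, and the names `MODEL` and `generate_test` are assumptions introduced here.

```python
import random

# Hypothetical model: state -> list of (input, next state).
MODEL = {
    "On-Hook":   [("Pick Up", "Dial Tone")],
    "Dial Tone": [("Dial/Party Busy", "Busy"),
                  ("Dial/Party Ready", "Ringing"),
                  ("Hang Up", "On-Hook")],
    "Busy":      [("Hang Up", "On-Hook")],
    "Ringing":   [("Party Picks Up", "Talking"), ("Hang Up", "On-Hook")],
    "Talking":   [("Party Hangs Up", "Dial Tone"), ("Hang Up", "On-Hook")],
}

def generate_test(model, start, rng, max_steps=50):
    """Randomly walk the graph from `start` until it returns to `start`.

    Returns a list of (input, expected state) pairs. The walk is cut off
    after `max_steps` inputs, so a truncated test may not end in `start`.
    """
    state, steps = start, []
    while len(steps) < max_steps:
        action, state = rng.choice(model[state])
        steps.append((action, state))
        if state == start:
            break
    return steps

rng = random.Random(0)  # fixed seed so the generated suite is reproducible
suite = [generate_test(MODEL, "On-Hook", rng) for _ in range(5)]
for test in suite:
    print(" -> ".join(action for action, _ in test))
```

Because each step records the state the model expects after the input, the same walk yields part of the expected output along with the input sequence.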
Generate expected outputs: Software testing involves execution of a program under test using some fault-revealing input data and examination of the output to determine success or failure. A fundamental assumption of this testing is that there is some mechanism, a test “oracle”, that will determine whether or not the results of a test execution are correct; that is, something that defines or identifies the expected outputs. As illustrated in the summary Figure above, expected outputs must be generated prior to running tests. A test oracle is the criterion used to check the correctness of the output. For example, the behavior of a competing product might be the basis for assessing the correctness of the product under test, i.e., “It should do what product B does”. Another example would be using a previous version of the software in which the component/feature under test did not experience significant change, i.e., “We should get the same results now that we got with this in version X.”
The form, fit, and function of the test oracle are closely tied to (1) the size and complexity of the software under test and (2) the degree of automation in the testing process. The greater the size and/or complexity of the software, the greater is the need for automation. Yet, automation itself makes writing/using a test oracle more difficult.
The test oracle needs to be developed in such a way that it can “marry” expected outputs with corresponding tests so that success or failure can be determined automatically for the millions of test cases typically generated for a complex system. Also, the oracle needs to be flexible enough to easily adjust to the dynamics of test generation.
Table 3 presents some comparisons of manual and automatic testing relative to test oracles.
Table 3: Comparison of Manual and Automated Testing
Manual Testing | Automated Testing
Manual testing is slow | Automated testing is blazing fast (making manual/visual verification self-defeating)
Fewer tests can be performed – tester must identify the “most important few” | Millions of tests can be performed, resulting in a much larger percentage of the functionality being covered during testing
Oracles do not have to be as comprehensive (they only need to address a subset of significant behaviors the tester has time to perform) | The test oracle must address all functionality addressed through automated testing – a much larger portion of possible behaviors
Oracles can be manual (there is time to visually review screen output during manual testing) | The high volume of tests prevents manual/visual assessment of results of individual test cases, necessitating automation of the test oracle implementation
Finding/creating a test oracle can be an issue in MBT. Tests are generated automatically and in volume. Furthermore, test suites do not remain static. Thus, calculating expected outputs by hand is usually infeasible. Some work has explored the automatic generation of test oracles [Feather 1999]. In the absence of a good test oracle, one may need to settle for plausibility checks. Tests may be considered to have passed if their outputs are in certain ranges or they pass certain consistency checks. If the system is instrumented to identify its state, expected test outputs can include the system state. In many instances, this expected output would be generated automatically from the model in conjunction with test inputs.
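A plausibility check of the kind mentioned above might look like the following sketch. The output record, with its `state` and `call_seconds` fields, is entirely hypothetical; it stands in for whatever the instrumented system actually reports.

```python
KNOWN_STATES = {"Busy", "Dial Tone", "On-Hook", "Ringing", "Talking"}

def plausibility_failures(record):
    """Return a list of reasons why an output record is implausible (empty if none).

    `record` is a hypothetical dict with keys "state" and "call_seconds".
    """
    problems = []
    # Range checks: the state must be a known one, the duration non-negative.
    if record["state"] not in KNOWN_STATES:
        problems.append("unknown state %r" % record["state"])
    if record["call_seconds"] < 0:
        problems.append("negative call duration")
    # Consistency check: the call timer should read zero once the phone is On-Hook.
    if record["state"] == "On-Hook" and record["call_seconds"] != 0:
        problems.append("call timer running while On-Hook")
    return problems

print(plausibility_failures({"state": "Talking", "call_seconds": 12}))  # []
print(plausibility_failures({"state": "On-Hook", "call_seconds": 3}))
```

Checks like these cannot confirm that an output is correct, only that it is not obviously wrong, which is why they are a fallback when no better oracle is available.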
In practice, correctness is often judged by comparing the output, either automatically or manually, to some pre-calculated, presumably correct, output. However, if the program is formally documented, it is possible to use the specification to determine the success or failure of a test execution. There is some current research relating to the development of a prototype tool that automatically generates a test oracle from formal program documentation.
Run tests: Most MBT environments are supported with test generation tools that generate tests (test cases) that can easily be translated into executable test scripts, or produce the test script directly from the test data contained within the tool. It is worthwhile investing some time in writing good efficient scripts, since they can be used for as long as the software needs testing (potentially through the maintenance cycle, too). Although tests can be run as soon as they are created, in most testing groups it is policy to run the tests only after a complete suite that meets certain adequacy criteria is generated. Typically there is a coverage plan that is being addressed. In some instances, only a small number of tests relating to a particular feature or component would be run, even though the complete suite has been generated. Having good test generation tools in place enhances the flexibility in scheduling and executing tests.
Compare actual with expected outputs: It is useless to have the capability to generate and run millions of tests unless you have a way to assess the results and take action based upon the results. The automation process in place should make the comparison of actual to expected outputs, and alert testers to the failures. Of course, this is dependent on the quality and completeness of the test oracle. MBT cannot make good information out of bad data. It is not a silver bullet. The automation should also provide an efficient means to drill down into the particular test cases that failed. MBT is good at verifying the state of the software and cataloging state changes. It can provide assistance to testers (but not replace them) in verifying all aspects of the software.
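The comparison step can be sketched as below, assuming the system under test reports its state after each input. `BuggyPhone` is a hypothetical stand-in for the real system, with a defect seeded deliberately so that the comparison has something to report; the model is the phone state machine used in the example later in this document.

```python
# Model: (state, input) -> expected next state (hypothetical phone model).
MODEL = {
    ("On-Hook", "Pick Up"): "Dial Tone",
    ("Dial Tone", "Dial/Party Busy"): "Busy",
    ("Dial Tone", "Dial/Party Ready"): "Ringing",
    ("Dial Tone", "Hang Up"): "On-Hook",
    ("Busy", "Hang Up"): "On-Hook",
    ("Ringing", "Party Picks Up"): "Talking",
    ("Ringing", "Hang Up"): "On-Hook",
    ("Talking", "Party Hangs Up"): "Dial Tone",
    ("Talking", "Hang Up"): "On-Hook",
}

class BuggyPhone:
    """Stand-in for the system under test, with a seeded defect."""
    def __init__(self):
        self.state = "On-Hook"

    def apply(self, action):
        if (self.state, action) == ("Busy", "Hang Up"):
            return self.state  # defect: hanging up on a busy signal is ignored
        self.state = MODEL[(self.state, action)]
        return self.state

def run_and_compare(inputs, system, start="On-Hook"):
    """Run `inputs` against `system`; return (step, input, expected, actual) mismatches.

    Stops at the first mismatch, since model and system have diverged by then.
    """
    expected = start
    for step, action in enumerate(inputs, 1):
        expected = MODEL[(expected, action)]
        actual = system.apply(action)
        if actual != expected:
            return [(step, action, expected, actual)]
    return []

failures = run_and_compare(["Pick Up", "Dial/Party Busy", "Hang Up"], BuggyPhone())
print(failures)  # [(3, 'Hang Up', 'On-Hook', 'Busy')]
```

Reporting the step number, the input, and both states gives the tester a starting point for drilling down into the failed test case.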
Decide on further actions: Outputs from MBT can include:
§ A model of user behavior from which additional tests can be constructed
§ Test cases, including expected outputs for the test cases
§ Measures of test coverage attained by the generated tests
§ Test results from which the reliability of the system under test can be estimated
MBT supports management decision making relative to:
§ When to terminate system testing and release a software system
§ Revising the model
§ Generating more tests
There are typically four kinds of criteria on which release decisions can be made:
§ White box test coverage metrics. Some test automation tools track what percentage of the statements, branches, and so on, of the code of the system under test have been executed. One might decide to stop testing when a certain percentage has been attained for one of these white box coverage criteria.
§ Black box coverage criteria are based on characteristics of the user model developed under MBT. Many tools for MBT generate test suites to satisfy some black box coverage criterion.
§ Software Reliability. Many software reliability models can be fit to test data generated to resemble usage patterns in the field. Markov chain models of user behavior, for example, can be used to generate such test data. One might decide to release a software system when its reliability exceeds some goal at some level of confidence, as calculated by some specified reliability model.
§ Cost/Economic Metrics. Suppose one supplements a reliability model with data on the cost of finding bugs in the field, of finding bugs during testing, and the cost of testing for another unit of time. Some organizations use such data to make release decisions by comparing the cost of additional testing with the expected savings of finding a bug during testing, as opposed to in the field.
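The cost comparison in the last criterion can be illustrated with simple arithmetic. The decision rule and all figures below are assumptions made for this sketch; an organization would substitute its own cost data and the bug-discovery rate predicted by its chosen reliability model.

```python
def worth_testing_longer(bugs_expected, cost_fix_in_test, cost_fix_in_field, cost_of_testing):
    """True if one more unit of testing is expected to save more than it costs.

    bugs_expected: expected number of bugs found in the next unit of testing
                   (for example, from a fitted reliability model).
    """
    savings = bugs_expected * (cost_fix_in_field - cost_fix_in_test)
    return savings > cost_of_testing

# Hypothetical figures, in arbitrary currency units.
print(worth_testing_longer(0.5, 1_000, 20_000, 5_000))  # True  (saves 9,500 vs. 5,000)
print(worth_testing_longer(0.1, 1_000, 20_000, 5_000))  # False (saves 1,900 vs. 5,000)
```

As testing proceeds and the expected number of bugs found per unit of testing falls, the rule eventually favors release.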
Analysis of test results may lead to identifying flaws in the model itself and warrant its revision.
Models not only give a good picture of what tests have been run, but also give insight into what tests haven’t been run. This information provides some guidance on when to stop testing and when to continue. A manager might choose to continue testing until there are no more non-repetitive tests presented by the model.
This section presents an illustrative example of Model-Based Testing. The figure below is a Finite State Machine (FSM) model of a simple phone system. This model is of a phone that can call out. Nodes are states of the phone. Edges indicate actions that the user can take (these are inputs to the system). Test cases specify a sequence of inputs, the states the system is expected to be in after each action, and the value of all outputs of the system.
Finite State Machine for Simple Phone System
A Finite State Machine is an ordered quintuple consisting of:
1. A set of inputs, e.g., {Dial/Party Busy, Dial/Party Ready, Hang Up, Party Hangs Up, Party Picks Up, Pick Up}
2. A set of states, e.g., {Busy, Dial Tone, On-Hook, Ringing, Talking}
3. The set of initial states. Let the initial state be {On-Hook}
4. The set of final (terminal) states. Let this set be {On-Hook}
5. A partial function mapping from an ordered pair consisting of a state and an input to a subset of states. Figure 2 defines this function for the example. For example, f(Dial Tone, Dial/Party Busy) is {Busy}.
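The five components above can be written down directly as data. In the sketch below, the transition function is reconstructed from the transitions that appear in Tables 5 and 6, on the assumption that the figure contains no others. The names `TRANSITIONS`, `f`, and `is_well_formed` are introduced here for illustration only.

```python
INPUTS = {"Dial/Party Busy", "Dial/Party Ready", "Hang Up",
          "Party Hangs Up", "Party Picks Up", "Pick Up"}
STATES = {"Busy", "Dial Tone", "On-Hook", "Ringing", "Talking"}
INITIAL = {"On-Hook"}
FINAL = {"On-Hook"}

# Partial function: (state, input) -> set of next states.
# Assumption: reconstructed from the sequences in Tables 5 and 6.
TRANSITIONS = {
    ("On-Hook", "Pick Up"): {"Dial Tone"},
    ("Dial Tone", "Dial/Party Busy"): {"Busy"},
    ("Dial Tone", "Dial/Party Ready"): {"Ringing"},
    ("Dial Tone", "Hang Up"): {"On-Hook"},
    ("Busy", "Hang Up"): {"On-Hook"},
    ("Ringing", "Party Picks Up"): {"Talking"},
    ("Ringing", "Hang Up"): {"On-Hook"},
    ("Talking", "Party Hangs Up"): {"Dial Tone"},
    ("Talking", "Hang Up"): {"On-Hook"},
}

def f(state, action):
    """Return the set of next states, or the empty set where f is undefined."""
    return TRANSITIONS.get((state, action), set())

def is_well_formed():
    """Check that every transition mentions only declared states and inputs."""
    return all(s in STATES and a in INPUTS and nxt <= STATES
               for (s, a), nxt in TRANSITIONS.items())

print(f("Dial Tone", "Dial/Party Busy"))  # {'Busy'}
```

Pairs that are absent from the table, such as ("On-Hook", "Hang Up"), are inputs that cannot be applied in that state, which is what makes the function partial.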
This FSM can be used to generate test cases. Table 5 shows a sequence of 15 inputs. This sequence begins and ends in the “On-Hook” state. It has the desirable property of executing every action possible at each state at least once. This property is called action coverage (the Figure can be thought of as showing a directed graph). The sequence is not unique, however. Using the model to generate test sequences, then, can result in a range of efficient test sequences, each stressing the software somewhat differently, and each achieving the same coverage criterion.
Table 5: Test Sequence Achieving Action Coverage
Action | State | Action | State
(initial state) | On-Hook | Dial/Party Ready | Ringing
Pick Up | Dial Tone | Party Picks Up | Talking
Dial/Party Busy | Busy | Party Hangs Up | Dial Tone
Hang Up | On-Hook | Hang Up | On-Hook
Pick Up | Dial Tone | Pick Up | Dial Tone
Dial/Party Ready | Ringing | Dial/Party Ready | Ringing
Hang Up | On-Hook | Party Picks Up | Talking
Pick Up | Dial Tone | Hang Up | On-Hook
One can think of a new test case starting each time the system enters the “On-Hook” state. Four test cases are shown in Table 5. Other test sequences covering all actions would have a different number of test cases. A test case should specify expected outputs, as well as the sequence of inputs. Suppose this phone system is instrumented to output its state. In this case, the expected outputs are generated from the FSM, as well. If the system has no other interesting outputs, a test oracle is not needed here.
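The claims made about Table 5 can be checked mechanically. The sketch below replays the 15 inputs against the model, confirms action coverage, and splits the sequence into test cases at each return to “On-Hook”. The transition table is reconstructed from Tables 5 and 6 (an assumption, since the figure itself is not reproduced here), and the helper names are hypothetical.

```python
# Transitions reconstructed from Tables 5 and 6: (state, input) -> next state.
FSM = {
    ("On-Hook", "Pick Up"): "Dial Tone",
    ("Dial Tone", "Dial/Party Busy"): "Busy",
    ("Dial Tone", "Dial/Party Ready"): "Ringing",
    ("Dial Tone", "Hang Up"): "On-Hook",
    ("Busy", "Hang Up"): "On-Hook",
    ("Ringing", "Party Picks Up"): "Talking",
    ("Ringing", "Hang Up"): "On-Hook",
    ("Talking", "Party Hangs Up"): "Dial Tone",
    ("Talking", "Hang Up"): "On-Hook",
}

# The 15 inputs of Table 5, in order.
TABLE_5 = [
    "Pick Up", "Dial/Party Busy", "Hang Up",
    "Pick Up", "Dial/Party Ready", "Hang Up",
    "Pick Up", "Dial/Party Ready", "Party Picks Up", "Party Hangs Up", "Hang Up",
    "Pick Up", "Dial/Party Ready", "Party Picks Up", "Hang Up",
]

def replay(inputs, start="On-Hook"):
    """Return the (state, input, next state) triples visited by `inputs`."""
    state, visited = start, []
    for action in inputs:
        nxt = FSM[(state, action)]  # KeyError means the input is not allowed here
        visited.append((state, action, nxt))
        state = nxt
    return visited

def split_test_cases(visited, home="On-Hook"):
    """Start a new test case each time the system returns to `home`."""
    cases, current = [], []
    for _, action, nxt in visited:
        current.append((action, nxt))  # input and the state expected after it
        if nxt == home:
            cases.append(current)
            current = []
    if current:
        cases.append(current)  # trailing partial test case, if any
    return cases

visited = replay(TABLE_5)
covered = {(s, a) for s, a, _ in visited}
print(covered == set(FSM))             # True: action coverage is achieved
print(len(split_test_cases(visited)))  # 4
```

Each test case produced by `split_test_cases` pairs every input with the state the model expects afterward, so the expected outputs come from the same model as the inputs.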
Action coverage is the easiest test coverage criterion for an FSM model. Switch coverage provides the next level of rigor in measuring tests synthesized from an FSM. Switch coverage is met when, for each state, every pair of actions leading into and out of that state is covered in the test sequence. For example, consider the “Dial Tone” state. One possible path leading in and out of the “Dial Tone” state is the sequence (“Party Hangs Up”, “Dial/Party Ready”). Table 5 does not contain this sequence. Thus, this test sequence does not achieve switch coverage.
Table 6 shows a test sequence for achieving switch coverage. Twenty-six inputs are needed to achieve switch coverage, which is a greater number than the length of the sequence of inputs needed to achieve action coverage. This sequence is not unique. Other sequences of the same length can be generated which will achieve switch coverage. If one thinks of the state “On-Hook” as initiating a new test case, six test cases are presented in Table 6. The expected sequence of states is shown for each test case.
Table 6: Test Sequence Achieving Switch Coverage
Action | State | Action | State
(initial state) | On-Hook | Dial/Party Ready | Ringing
Pick Up | Dial Tone | Party Picks Up | Talking
Hang Up | On-Hook | Party Hangs Up | Dial Tone
Pick Up | Dial Tone | Hang Up | On-Hook
Dial/Party Busy | Busy | Pick Up | Dial Tone
Hang Up | On-Hook | Dial/Party Ready | Ringing
Pick Up | Dial Tone | Party Picks Up | Talking
Dial/Party Ready | Ringing | Party Hangs Up | Dial Tone
Hang Up | On-Hook | Dial/Party Ready | Ringing
Pick Up | Dial Tone | Party Picks Up | Talking
Dial/Party Ready | Ringing | Party Hangs Up | Dial Tone
Party Picks Up | Talking | Dial/Party Busy | Busy
Hang Up | On-Hook | Hang Up | On-Hook
Pick Up | Dial Tone | |
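Switch coverage can also be checked mechanically. The sketch below enumerates every pair of transitions that enter and then leave a state, and compares that set with the pairs exercised by the sequences of Tables 5 and 6. As before, the transition table is reconstructed from the two tables, on the assumption that the figure contains no other transitions, and the function names are hypothetical.

```python
# Transitions reconstructed from Tables 5 and 6: (state, input) -> next state.
FSM = {
    ("On-Hook", "Pick Up"): "Dial Tone",
    ("Dial Tone", "Dial/Party Busy"): "Busy",
    ("Dial Tone", "Dial/Party Ready"): "Ringing",
    ("Dial Tone", "Hang Up"): "On-Hook",
    ("Busy", "Hang Up"): "On-Hook",
    ("Ringing", "Party Picks Up"): "Talking",
    ("Ringing", "Hang Up"): "On-Hook",
    ("Talking", "Party Hangs Up"): "Dial Tone",
    ("Talking", "Hang Up"): "On-Hook",
}

TABLE_5 = [
    "Pick Up", "Dial/Party Busy", "Hang Up",
    "Pick Up", "Dial/Party Ready", "Hang Up",
    "Pick Up", "Dial/Party Ready", "Party Picks Up", "Party Hangs Up", "Hang Up",
    "Pick Up", "Dial/Party Ready", "Party Picks Up", "Hang Up",
]

TABLE_6 = [
    "Pick Up", "Hang Up",
    "Pick Up", "Dial/Party Busy", "Hang Up",
    "Pick Up", "Dial/Party Ready", "Hang Up",
    "Pick Up", "Dial/Party Ready", "Party Picks Up", "Hang Up",
    "Pick Up", "Dial/Party Ready", "Party Picks Up", "Party Hangs Up", "Hang Up",
    "Pick Up", "Dial/Party Ready", "Party Picks Up", "Party Hangs Up",
    "Dial/Party Ready", "Party Picks Up", "Party Hangs Up", "Dial/Party Busy", "Hang Up",
]

def required_switches(fsm):
    """All pairs of transitions where the first enters a state and the second leaves it."""
    return {((s1, a1), (s2, a2))
            for (s1, a1), mid in fsm.items()
            for (s2, a2) in fsm
            if s2 == mid}

def covered_switches(inputs, fsm, start="On-Hook"):
    """Pairs of consecutive transitions actually exercised by `inputs`."""
    state, path = start, []
    for action in inputs:
        path.append((state, action))
        state = fsm[(state, action)]
    return set(zip(path, path[1:]))

needed = required_switches(FSM)
missing_5 = needed - covered_switches(TABLE_5, FSM)
missing_6 = needed - covered_switches(TABLE_6, FSM)
print(len(needed))     # 15
print(len(missing_6))  # 0
print((("Talking", "Party Hangs Up"), ("Dial Tone", "Dial/Party Ready")) in missing_5)  # True
```

Under the reconstructed model there are 15 such pairs. The Table 6 sequence exercises all of them, while the Table 5 sequence misses several, including the (“Party Hangs Up”, “Dial/Party Ready”) pair discussed above.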
SUMMARY CHARACTERISTICS
Enabling Practices: Link to Model-Based Testing Interrelationships Diagram
Enabled Practices: Link to Model-Based Testing Interrelationships Diagram
Impact Areas: Primary: Schedule Secondary: Cost; Quality
Life Cycle Phase: Production, deployment and maintenance
Scope/Authority: No consensus
Applicability: No indication
Use Indicators: Long test schedules; test development delays
Use Inhibitors: No indication
Appropriate Programs: High requirement volatility; highly complex software
Inappropriate Programs: Legacy and prototype programs
Barriers: Ignorance of model-based testing capabilities; expense in tool investment
Facilitators: Tools; training; incentives based on program quality
Model-Based Testing (MBT) can result in the following benefits:
§ Shorter schedules, lower cost, and better quality
§ A model of user behavior is one major artifact of Model-Based Testing
§ Enhanced communication between developers and testers in conjunction with developing the model
§ Early exposure of ambiguities in specification and design while developing the model [Robinson 1999]
§ Capability to automatically generate many non-repetitive and useful tests
§ Test harness to automatically run generated tests
§ Combination of MBT artifacts eases the updating of test suites for changed requirements (typically, only the model need be updated)
§ Capability to evaluate regression test suites (one can know what level of test coverage they obtain)
§ Capability to assess software quality (if tests are generated from a Markov model, the results of the tests provide inputs to typical software reliability models satisfying appropriate assumptions [El-Far 2001a])
These benefits all require an initial investment in tools and training.
DETAILED CHARACTERISTICS
Key Characteristics of the “Model-Based Testing” Gold Practice
Characteristic |
Comments |
Assumes Availability of Automation and Modeling Tools |
§ Modeling makes automatic generation of many test cases possible
§ Automation improves test coverage. Not possible to achieve the same degree of coverage with manual testing that is attainable with an automated testing system |
Formal Requirements Specifications |
§ Specification drives the model. The more complete the spec, the more likely the model will be a good reflection of the true behavior of the system
§ In some cases, the model (parts of the model) can be generated directly from the specification
§ Essential for large complex systems because many people need to develop a common understanding of the system
§ How test cases are generated depends on the notation used to record the behavior model, which is closely related to how requirements are recorded
§ Some requirements formalisms are (1) Software Cost Reduction Project by Naval Research Laboratory, (2) Specification and Description Language, used in the telecommunication industry, and (3) Unified Modeling Language Statecharts |
State Space Explosion |
§ A condition evident in finite state models in which the number of states of a system increases “beyond manageability”
§ Testers/modelers use “abstraction” and “exclusion” as two approaches to lowering the number of states in the model |
Skilled Testers |
§ Testers need to be knowledgeable about the modeling techniques and their underlying and supporting mathematics and theories
§ Working knowledge of finite state machines
§ Basic familiarity with formal languages, automata theory, graph theory, and elementary statistics
§ Temporarily assigning testing roles to failed programmers (or staff with nothing to do) will not work |
Large Complex Systems |
§ MBT is often the only way to address the volume of tests needed to ensure adequate coverage
§ MBT is adaptable to changing requirements |
Up-Front Investment in Time and Tools |
§ Sophisticated modeling tools are necessary but expensive
§ Most developers will need training in use of automation and modeling tools acquired
§ MBT not appropriate for short-lived systems – too much time and money up front is needed to reap any benefit – unless criticality of software demands it |
Requires Testing Infrastructure |
§ Requires a sustainable test bed to simulate the environment of the Software Under Test
§ Due to expense and sophistication of tools and equipment, it makes sense to share the infrastructure across programs |
The objectives of black box tests, such as acceptance testing during System Test, are:
§ Increase confidence in the system under test
§ Find bugs in the system
§ Assess reliability or other certification measures
Given the size of the input domains for large systems, all input sequences cannot be tested. Thus, test cases must be constructed to sample from the input domain. How can test cases be generated to fulfill these test objectives? Research has shown that partition testing does not generate test cases that increase confidence. In any case, the tests generated from partition testing are not representative of typical usage and are, therefore, unsuited for reliability measurement. Suppose a test suite is handcrafted. Such a test suite will generally be difficult to update as program requirements evolve over the lifecycle. How can testing processes be devised so that tests can be more easily updated with changing requirements?
Issues relating to inefficient test cases and adapting test plans to updated requirements are manifested in long test schedules and development delays relating to testing. Acquisition managers should consider implementing MBT on programs in which these difficulties arise or are expected to arise. Tests can be adapted to changing requirements through updates to the model of user behavior. MBT generates tests to fulfill coverage criteria and provides a mechanism for adjusting testing rigor by choosing more rigorous coverage criteria. Finally, MBT can provide data for reliability modeling, thereby supporting product measurement and the imposition of appropriate stopping criteria on testing. In other words, MBT provides quantitative data for metrics-based scheduling and management, and for progress measurement.
Implementing MBT requires an investment and some tailoring of the development process. Costs include training and tools. Tools directly related to MBT accomplish the following:
§ Construction of behavior model from requirements
§ Generation of abstract test cases
§ Conversion of abstract test cases to concrete test cases in a format appropriate for testing infrastructure
Tools for requirements and automated testing facilitate MBT:
§ Tools for capturing and maintaining requirements in a formal model (e.g., UML or SDL-based tools)
§ Configuration Management tools
§ Test harnesses
§ Tools to measure white-box test coverage metrics
§ Regression testing
§ Defect tracking system
§ Reliability modeling tools (e.g., Computer Aided Software Reliability Estimation (CASRE) or Statistical Modeling and Estimation of Reliability Functions for Systems (SMERFS))
Activities for MBT occur during every development and maintenance phase. Tests with test data generated from a model are conducted during Integration Test, System Test, and Operations and Maintenance. These are all examples of phases in which black box testing can be appropriate. Model-Based Testing provides guidance on whether such tests should be continued, as is also shown in the figure. The testers begin constructing the model in the requirements phase and update it during design and coding phases, as needed.
The Figure below represents a high-level process architecture for the subject practice, depicting relationships among this practice and the nature of the influences on the practice (describing how other practices might relate to this practice). These relationship statements are based on definitions of specific “best practices” found in the literature and the notion that the successful implementation of a practice may influence (or be influenced by) the ability to successfully implement other practices. A brief description of these influences is included in the table below.