A Strategy For Testing C++
Executive Summary

Compared with a procedural language such as C, the testing of C++ presents some novel problems. This paper discusses what those problems are and outlines an approach to the verification of C++ code. Some reference is made to implementing these strategies using the IPL Cantata++ tool.
This document outlines a strategy for testing software written in C++. This means providing guidance on the thought processes you should go through when planning verification of C++ software, preferably before any code is written. The first section is devoted to that task. The second looks at what you can do once code starts to become available for testing, without regard to any particular tools you might be using. Section Three describes the facilities offered by Cantata++ and indicates how these might help in selected situations.
[Figure: V-model diagram pairing Requirements Analysis with Acceptance Test, Architectural Design with System Integration, and Detailed Design with Software Integration.]
Fig 1. The V-model of the Software Lifecycle.

This suggests that verification of code should begin at the software unit/module level and proceed in an orderly fashion through integration testing (various sub-levels can exist here), up to the System Acceptance Testing step, which should be the final stage before release of the software to users. This is the recommended approach in cases where the end system needs to be reasonably robust. The reliability is achieved through the different levels of testing, each providing a greater degree of confidence in the quality of the software.

However, in many situations this orderly approach is neither possible nor practical. Detailed specifications of integration items and modules may not be possible because the end user is unclear about what they really want. Another scenario where it may be impossible to specify modules tightly is where performance constraints mean that different possible implementations have to be trialled before choosing one which provides what is wanted. In situations like these it is quite likely that the only testing that takes place will be at the Acceptance Test stage.

In a properly run software development project there will need to be a degree of formality at some point in the testing side. Under these rules software is only allowed
to pass from one person to another for further testing when it can be demonstrated to do what it is supposed to do. Formality means that the test itself should be reviewable (to decide whether it means anything), the test results should give a clear statement of pass or failure (as well as meaningful diagnostics of any failures), and the test should be repeatable. (The contrast is with informal testing, which is what programmers tend to like to do when no one is watching.) The big question is: at what stage do you introduce formal testing?

Start at the Unit Test level. This essentially means testing each unit in the system, possibly building larger tests up by using previously tested units. A unit can be a module (typically a C++ file) or the class itself (if there is more than one class per file). The minimum requirement is that a specification should exist, preferably in written form, describing what each unit should do. The general approach will be that, having tested the base classes to get a reasonable degree of confidence that they work, each successive derived class is tested in an approach known generally as Hierarchical Integration Testing. More will be said about this later. This approach suits high-integrity developers because of the reliability built in by using only fully-tested components.

Start at the Integration Test level. In the rush to produce a working system there may not be time to test at the unit level. In this case, a reasonable compromise is to defer formal testing to the cluster level, where a cluster is a group of classes. In a multi-threaded application there may be advantages to testing at the thread level, which is usually a stage higher than clusters. Either way, or indeed whatever entry level of testing is chosen as most appropriate, there must be a specification for whatever it is you are testing.

Start at the System Test level. A lot of developers choose to defer any formal test to the application level. The obvious advantage is that this gives users an early indication of what they're getting and how robust it seems. The drawback is that since the underlying components will not have been individually tested, it will be unclear how robust it really is! To go some way to counter this weakness, the technique of coverage analysis is strongly recommended as a means of finding out which parts of the overall system have not been exercised by the tests. These may typically be obscure branches of functionality or error handling. Once revealed, a decision can then be made on whether to add more tests to cover these paths. In some cases it may be decided that the unexercised paths are simply not wanted and should be removed. Much more will be said about coverage analysis later (Sections 1.4 and 2.7).
practical to stub an external class due to the near impossibility of trying to stub constructors of the external classes (of which there may be several layers!). Add that to the fact that each class may contain many member functions, and you will quickly come to appreciate that simple stubs are not up to the job! The two solutions are either to employ wrapping techniques (see Section 2.4) or to make the external classes more stubbable through a design which specifies that base classes be coded as abstract classes.

Abstract Base Classes (ABCs) and their partners, Concrete Implementation Classes (CICs), form a technique for completely separating a class interface definition from its implementation. Normally a C++ class declaration defines both the interface (public) and some features of its implementation (the private part) in a single place. In the ABC/CIC technique the class is split into two:

1. The ABC, which defines the (public) interface to the class as pure virtual member functions;
2. The CIC, which inherits (publicly) from the ABC and provides the implementation of the class.

Clients of the class depend only on the ABC, not on the CIC. To stub the class we retain the ABC but provide an alternative (stub) implementation class; a code sketch is given below. Consider the following small subsystem (arrows indicate a dependency):
[Figure: Software Under Test depending on External Class A, which in turn depends on External Classes B, C and D.]
Fig 2.i Small subsystem showing unit under test and some immediate and indirect dependencies. Attempting to isolate the software under test would involve stubbing all the external classes, which is difficult or impossible.
Fig 2.ii Same subsystem re-implemented using ABCs and CICs. This looks like a more complex design, but leads to easier testing.
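The following minimal sketch shows what the ABC/CIC split might look like in code. All class and member names here are invented for illustration; they are not part of the original design.

// Abstract Base Class: the public interface only, as pure virtual
// member functions. Clients of the class depend only on this.
class ExternalA {
public:
    virtual ~ExternalA() {}
    virtual int read() const = 0;
    virtual void write(int value) = 0;
};

// Concrete Implementation Class: inherits publicly from the ABC
// and supplies the real implementation.
class RealA : public ExternalA {
public:
    RealA() : stored(0) {}
    virtual int read() const { return stored; }
    virtual void write(int value) { stored = value; }
private:
    int stored;
};

// Stub implementation for testing: because the software under test
// sees only ExternalA, this can stand in for RealA without change.
class StubA : public ExternalA {
public:
    StubA() : writeCalls(0) {}
    virtual int read() const { return 42; }    // canned test value
    virtual void write(int) { ++writeCalls; }  // just record the call
    int writeCalls;
};

Because the software under test holds only ExternalA pointers or references, RealA and StubA can be exchanged at link time or at object-creation time without touching the code being tested.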
[Figure: Software Under Test depending on the External Class A interface, with stub implementations standing in for classes A, B and C.]
Fig 2.iii Software can now be tested with a manageable array of stub classes.

In the new design it is now possible to test the software using stubs for classes A, B and C, and omitting the stub for class D completely. Unfortunately there is, generally speaking, a price to be paid for use of this technique, namely that each virtual member call involves an additional indirect memory access. This can lead to unacceptable levels of inefficiency, so this technique is not a panacea. A later section (2.4) describes the use of wrapping as an alternative method of simulating external classes, but use of ABCs nevertheless represents a useful approach for enabling the isolation of significant sub-systems. For more detail of these ideas see [Dorman#1] and [Lakos]. For more on how the ABC/CIC approach can improve maintainability see [Martin].

Design Validation with Metrics

There is a stage in software design where the code structure has been mapped out as C++ header files, and it would be worth gaining some idea of how good this is in terms of OO design. This subject is covered in more detail later (Section 2.1), but for now just note that some OO metrics can be useful.
[Figure: class diagram. Abstract classes Shape and Shape Factory sit above concrete classes Circle and Square with their Circle Factory (used to create Circles) and Square Factory; Shape Test, Circle Test and Square Test mirror this hierarchy.]
Fig 3. Shows how Factory classes and corresponding Test classes are related.

Shape is an (abstract) base class, and tests can be created to check its properties. However, when moving to test derivations from Shape, e.g. Circle, it might be impossible to re-run the Shape test in a reusable way (to check the Shape properties of Circle). The solution is to use a Factory class for the creation of objects, whether for testing or for normal use, because this acts as an interface for object creation, allowing the details of the type to be supplied only when known. Thus, a ShapeFactory class will be defined, and this can be passed (as a reference parameter) to a ShapeTest class for testing the Shape properties. Then, a CircleFactory class can be created by derivation from ShapeFactory, and CircleTest can then be created from ShapeTest. Now, when CircleTest runs the ShapeTest test cases, it passes them the CircleFactory object. This is
used by the ShapeTest test cases, which test that the Circles so created are indeed valid Shapes. Use of ShapeFactory ensures that when a further derived class, Square, needs to be tested, the Shape tests can be re-run in the context of a Square object without too much difficulty. The full C++ details of Factory class implementation are not given here, but a minimal sketch follows. Further information on these ideas is presented in Section 2.5.
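The sketch below illustrates one possible arrangement of the factory and test classes just described. It is illustrative only: the class bodies, the area() member and the use of assert are assumptions made for this sketch, not details from the original design.

#include <cassert>

class Shape {
public:
    virtual ~Shape() {}
    virtual double area() const = 0;
};

class Circle : public Shape {
public:
    Circle() : radius(1.0) {}
    virtual double area() const { return 3.14159265 * radius * radius; }
private:
    double radius;
};

// Abstract factory: hides which concrete Shape gets created.
class ShapeFactory {
public:
    virtual ~ShapeFactory() {}
    virtual Shape* create() const = 0;
};

class CircleFactory : public ShapeFactory {
public:
    virtual Shape* create() const { return new Circle(); }
};

// ShapeTest exercises only Shape properties; it receives the
// factory by reference so derived-class tests can reuse it.
class ShapeTest {
public:
    void run(const ShapeFactory& factory) {
        Shape* s = factory.create();
        assert(s->area() > 0.0);   // a property every Shape must satisfy
        delete s;
    }
};

// CircleTest reuses the base-class test cases via the factory.
class CircleTest : public ShapeTest {
public:
    void runAll() {
        CircleFactory factory;
        run(factory);              // re-run the Shape tests on a Circle
        // ... Circle-specific test cases would follow here
    }
};

CircleTest::runAll() shows the point of the pattern: the Shape test cases are written once against ShapeFactory, then re-run unchanged for each derived class by passing in the corresponding factory.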
look at the values held by key variables (e.g. state). You might, for example, want to check that such a variable has held a full range of possible values before being able to say that the module has been 100% tested.

The above coverage types are applicable both to procedural languages (e.g. C) and to OO languages such as C++. However, for C++ it is arguable that structural coverage on its own is not enough. Described below are some examples where it is advisable to get a high degree of (structural) coverage in a range of contexts: derived classes, states, and threads.

Context Coverage - Derived Classes

When testing derived classes it is possible to gain a misleading impression of how well an underlying base class has been tested, because structural coverage achieved on the base class can accumulate. A simple example (Fig 4) illustrates this point:
Fig 4. Shows how coverage achieved on two derived classes can give a misleading impression of coverage on the common base class.

This problem applies to all the traditional structural coverage metrics: none of them takes into account the need for re-testing to exercise the interactions between inherited base class methods and the overridden methods in each derived class. The solution is to specify that the required level of coverage must be achieved in a specific context. We consider this to be OO-Context Coverage [Dorman#2].

Context Coverage - State Machines

C++ classes can frequently act as state machines, i.e. the behaviour of the class depends not just on which member functions are called but also on what current state it is in. State information is held in variables, but the actual definition of state may be a matter of interpretation. For a state machine, we may require that a structural coverage level is achieved for each member function in a range of possible states.
A simple example will hopefully illustrate the point. A stack class implementation will have at the very least a push and a pop member function. A stack can also have states: empty, full and partially full (although defining these may require interpretation of private data variables). A complete test of this class should involve, at the very least, calling both push and pop in all three possible states. See [Binder] for more on State Machine testing.

Context Coverage - Threads

Finally, the behaviour of a class may be dependent on which execution thread it is being used in. In order to avoid misleading structural coverage information it may be beneficial to generate coverage data which shows, for each class in each possible thread, what structural coverage levels were reached and what elements were missed. The raw coverage data can be used to analyse how the software under test actually behaves in the presence of complex thread-interaction issues.

Context Definition

From the above discussions it can be seen that it may be useful to include, among the private member functions of the class, a method which returns the value of the current state or the current thread ID.
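As an illustration, here is a minimal stack class with such a private state query. The class body, the capacity and all member names are invented for this sketch.

// A minimal stack with an explicit state query, so that a test with
// white-box access can record which state each call ran in.
class Stack {
public:
    enum State { EMPTY, PARTIAL, FULL };

    Stack() : count(0) {}

    void push(int value) {
        if (count < CAPACITY)
            data[count++] = value;
    }

    int pop() {
        return (count > 0) ? data[--count] : 0;   // 0 when empty (sketch behaviour)
    }

private:
    // Used as the coverage context: a complete test exercises push()
    // and pop() in each of the three states this function reports.
    State state() const {
        if (count == 0)        return EMPTY;
        if (count == CAPACITY) return FULL;
        return PARTIAL;
    }

    static const int CAPACITY = 16;
    int data[CAPACITY];
    int count;
};

A test with white-box access can call state() before each push or pop, allowing coverage to be recorded separately for the empty, partially full and full contexts.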
In a later section we suggest some target values for these metrics in the context of the Cantata++ tool.
For example, referring again to the stack class, it would be sensible to follow up the first test case on the initial state of the stack object with a test case that looks at its behaviour when having items inserted and removed. The simplest example would be to push an item and then see whether it can be popped off, returning the same value. It would be sensible also to see if the stack is restored to its previous state. A little imagination is all that is needed to construct a sequence of test cases which, taken as a whole, give you, the tester, a reasonable degree of confidence that the object is indeed working properly. Of course, every time a test case fails you will need to stop, debug, fix the code (or the test!), and then re-run the test which previously failed.

In devising a set of test cases it is reasonable to question whether each test case should create a new object (thus making each test case an independent item), or whether each test case should use the retained state of the object from the previous test. It is our experience that the former is generally the better approach. There may be some duplication, but the benefits of having each test case independent of the others far outweigh the disadvantages.

Exception Monitoring

C++ supports user-definable exceptions. When testing any C++ module some consideration should be given to verifying exception behaviour. There are four possibilities, defined in the table below:

                             Exception is Thrown    Exception is Not Thrown
Exception is Expected        Test pass              Test fail
Exception is Not Expected    Test fail              Test pass
Table 2. Matrix to use when planning exception monitoring test cases.

As mentioned in the previous section, it should be the tester's job to search out anomalous behaviour, and verifying exceptions is definitely part of this. Since exception throwing is frequently programmed to report error returns from external calls, some cleverness may need to be exercised in finding ways to force these. See the later Section (2.4) on Stubbing and Wrapping.

Repeatability and Portability

Finally, a couple of tips on making life easy for yourself from the start. As mentioned before, the purpose of testing is to find bugs. Once a bug is found you need to track down and fix the fault, probably using a debugger. It may be that your fix does not actually work, so the test must be re-run to check for this. It may also be the case that your fix does indeed solve the initial problem but introduces a new problem. For both of these reasons it is advisable to make sure that your tests are easily repeatable. This should, ideally, be without human intervention (i.e. automated), and where possible they should present the test results in a format that is both readable
and unambiguous. If a test fails this needs to be clear, both as a fact (i.e. "Test Failed") and with symptoms provided (i.e. "Test Failed because..."). A programmer working on his/her own will need repeatable tests for the reasons just mentioned. With several programmers working in a team the need for repeatability of tests is even more acute, because of the possibility of changes introduced by one having a negative effect on the code written by other members of the team. If a programmer changes a base class, then in addition to retesting that class every derived class will need to be retested. The way to make this practical is to ensure that each individual test is repeatable, and that the whole suite of tests can be run in an automated fashion as a regression test. During a period of intensive code development it is sensible to run the regression tests at least once a day.

Portability is another issue which may affect people. Code for a product may be developed on one platform (the development host) for eventual execution by a user on a different platform (the target). If a piece of code works perfectly correctly on the host, what guarantee is there that it will work the same on the target? Only a complete optimist will say that it must! The fact is that the C++ language definition contains many gaps and ambiguities, and furthermore different compilers have different implementations of the same features, sometimes deliberately, sometimes by mistake. The only way to guard against compiler differences of this kind is to write your tests from the outset to be portable from the host to the target, and to develop the discipline of ensuring that the target tests are run!
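To make the repeatability point concrete, the sketch below shows a small, self-contained test driver for the stack class discussed earlier. It is plain C++, not Cantata++ syntax, and it assumes the Stack class sketched above (including its illustrative behaviour of pop() returning 0 when empty).

#include <iostream>

// Each test case creates a fresh Stack, so the cases stay independent
// and the whole driver can be re-run unattended as a regression test.
bool test_push_then_pop() {
    Stack s;
    s.push(7);
    return s.pop() == 7;          // the value comes back unchanged
}

bool test_pop_restores_state() {
    Stack s;
    s.push(7);
    s.pop();
    return s.pop() == 0;          // back in the empty state (sketch behaviour)
}

int main() {
    int failures = 0;
    // Unambiguous pass/fail reporting, naming the failing case.
    if (!test_push_then_pop())      { std::cout << "Test Failed: push_then_pop\n"; ++failures; }
    if (!test_pop_restores_state()) { std::cout << "Test Failed: pop_restores_state\n"; ++failures; }
    std::cout << (failures ? "Test Failed\n" : "Test Passed\n");
    return failures;
}

Because main() returns the number of failures, the driver can be run unattended by a script or makefile as part of a daily regression run.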
probably the dependency of your test on the private data of the class under test. Public interfaces rarely change once the class has been written. Private data is more likely to change, possibly forcing changes to be made to your test scripts.

Being Able to Call Private Methods Directly

Private member functions exist to support the public member functions. If the private member functions do not work correctly then neither will the public ones. It can be argued that the private methods should be tested before any attempt is made to test the class at a public level, but unfortunately this is not usually easy. Having white-box access to a class implementation means that these tests can, if necessary, be run directly on the methods themselves. This will in turn affect how test case planning can be approached, because you can now plan a set of test cases designed specifically to verify the private methods before turning attention to the behaviour of the class as a whole. Again, bear in mind the extra dependency that the test script will have on the class under test. It is a matter of weighing up the benefits, in terms of making the tests easier to write, against possible maintenance problems.
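In plain C++ (leaving aside any tool-specific mechanism), one common way to obtain this kind of white-box access is a friend declaration naming the test. The class and member names below are invented for illustration.

class Account {
    friend class AccountTest;     // grants the test access to private members
public:
    void deposit(double amount);
private:
    bool invariantHolds() const;  // private helper, now directly testable
    double balance;
};

// AccountTest can call Account::invariantHolds() and inspect
// Account::balance directly, at the cost of the extra dependency
// on the private implementation noted above.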
of the time, and just occasionally behave (or seem to behave) differently. It will also not work for operators. For these and other related situations you will need to consider the use of wrappers. Fig 5 shows how a wrapper works.
[Figure: a call from the SUT passes through a "before" wrapper (verify parameters, change parameters) to the external software, and back through an "after" wrapper (return, or throw exception).]
Fig 5. Wrappers Before and After

A wrapper is effectively a small piece of software which sits between the software under test (SUT) and the external software (class or function). It has two parts to it, namely a "before" wrapper and an "after" wrapper. The before wrapper can be programmed so that, when activated for a particular instance (defined by the tester), it can optionally check the parameter values being passed to the external software and even change these values if wanted. The after wrapper can, using an analogous mechanism, change a return value coming from the external software or throw an exception. The key point is that, whatever wrapper actions are programmed, the real external software is still called, so that all the behaviours wanted from it will actually occur. Further useful checks can be built into wrappers, such as the ability to verify that external calls are in fact made, and made in a specific sequence, as a way of verifying what is going on. A comparison between stubs and wrappers is given below (Table 3).
                                                    Isolation Testing (stubs)   Wrappers
Check call order                                    yes                         yes
Check parameters                                    yes                         yes
Call original function                              no                          yes
Set return value                                    yes                         yes (optional)
Throw exception                                     yes                         yes (optional)
Change output parameters                            yes                         yes (optional)
Call original function with modified parameters     no                          yes
Use with system calls                               no                          yes
Use selectively (based on call-site, as well as
function called)                                    no                          yes
Original function is linked with test               no                          yes
Table 3. Comparison between Isolation Testing with stubs and use of Wrappers.
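The following hand-coded sketch illustrates the before/after idea in plain C++. A tool such as Cantata++ provides its own wrapping mechanism; the function names and the failure flag here are invented for illustration.

#include <stdexcept>

// The real external function the SUT depends on, assumed to be
// defined elsewhere and linked in with the test.
int external_read(int channel);

bool g_force_failure = false;   // set by the test script when wanted

// Wrapper with the same signature as the external function.
int wrapped_external_read(int channel) {
    // "before" part: verify and/or change the parameters.
    if (channel < 0) channel = 0;

    int result = external_read(channel);   // the real call still happens

    // "after" part: change the return value or throw an exception.
    if (g_force_failure)
        throw std::runtime_error("simulated read failure");
    return result;
}

Because the real external_read() is still called, any side effects it has still occur; the wrapper merely adds checking and the option to distort the call for test purposes.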
2.6 Templates
Templates are another case where code reuse promises endless productivity gains. True, but only if the specialisations are testable, and we must once again recognise the possibility that functionality may be disrupted if template specialisations are made on inappropriate types. This means, of course, that each specialisation needs to be individually tested. There are several ways of doing this, but the most efficient is to write a template test which is itself a template. The test script can then, with very little effort, be re-instantiated for every specialisation type.
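A minimal sketch of the idea follows. The clamp function template and its test cases are invented for illustration.

#include <cassert>

// Template under test (illustrative).
template <typename T>
T clamp(T value, T low, T high) {
    if (value < low)  return low;
    if (value > high) return high;
    return value;
}

// The test is itself a template: write the test cases once,
// then re-instantiate them for every specialisation type used.
template <typename T>
void test_clamp() {
    assert(clamp<T>(5, 1, 10) == static_cast<T>(5));    // value in range
    assert(clamp<T>(0, 1, 10) == static_cast<T>(1));    // clamped to low
    assert(clamp<T>(99, 1, 10) == static_cast<T>(10));  // clamped to high
}

int main() {
    test_clamp<int>();      // one instantiation per type actually used
    test_clamp<long>();
    test_clamp<double>();
    return 0;
}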
You can, if you wish, take a very lax view of context coverage, meaning you don't care in which context a unit achieves structural coverage, or you can take a strict view, which means that a high level of structural coverage must be achieved for every possible context. Do not forget that State and Thread contexts are user-defined.
templates, states or threads. For the latter two contexts, the user is required to define the states or threads which will be of interest.

Two actions are necessary to make use of Cantata++ coverage. First, you need to gather the coverage data. This is achieved by using Cantata++ to instrument (a copy of) the source files containing the software under test. Secondly, you need to report on the coverage values obtained, which requires adding coverage checks and report directives to the test script. Note: coverage reporting does not have to be done at the same time as gathering coverage data. If there are no coverage directives, then the coverage data will be automatically exported to a file, to allow it to be checked and reported on at a later stage. This is particularly useful for application coverage as a standalone activity. When used normally, the tool is able to generate a test report which indicates (as pass, fail or warning) whether the desired coverage level was reached. It can also produce reports which show which parts of the code (in which context) were not executed. It is then up to the user to analyse these results to decide whether more testing is needed, or possibly that some redundant code has been discovered and should be removed!

Cantata++ coverage can be applied in two ways:

As Part of Unit/Integration Testing. In this, a Cantata++ test script as described above forms the main test driving mechanism, and the coverage results contribute to and form part of the overall result picture. Thus, you could have a test which passes on logical behaviour but fails coverage, or vice versa, or indeed any combination.

As a Standalone Activity. In this, the test driving is done by some means independent of Cantata++, but the tool is nevertheless involved to provide the coverage. This is a typical situation when using a GUI tester to drive an application-level test, but doing coverage to check how thorough that test is.
5. References
[Binder] R. Binder, "The FREE Approach to Testing Object-Oriented Software", http://www.rbsc.com/pages/FREE.html
[Dorman#1] M. Dorman, "C++: It's Testing, Jim, But Not As We Know It", Proceedings of the Fifth European Congress of Software Testing and Review, 1997; paper available at IPL, http://www.iplbath.com
[Dorman#2] M. Dorman, "Advanced Coverage Metrics for Object-Oriented Software", paper available at IPL, http://www.iplbath.com
[Harrold] M.J. Harrold, J.D. McGregor and K.J. Fitzpatrick, "Incremental Testing of Object-Oriented Class Structures", Proceedings of the Fourteenth International Conference on Software Engineering, 1992, pp. 68-80.
[IPL] Cantata++ product information at http://www.iplbath.com
[Lakos] J. Lakos, "Large Scale C++ Software Design", three-part series starting in C++ Report, June 1996
[Liskov] B. Liskov and J. Wing, "A Behavioural Notion of Subtyping", ACM Transactions on Programming Languages and Systems, Vol. 16, No. 6, November 1994, pp. 1811-1841.
[Martin] R.C. Martin, "The Dependency Inversion Principle", C++ Report, May 1996 (also available at http://www.oma.com)
[McGregor] J.D. McGregor and A. Kare, "PACT: An Architecture for Object-Oriented Component Testing", Proceedings of the Ninth International Software Quality Week, May 1996.