The Software Testing Automation Framework
by C. Rankin

Software testing is an integral, costly, and time-consuming activity in the software development life cycle. As is true for software development in general, reuse of common artifacts can provide a significant gain in productivity. In addition, because testing involves running the system being tested under a variety of configurations and circumstances, automation of execution-related activities offers another potential source of savings in the testing process. This paper explores the opportunities for reuse and automation in one test organization, describes the shortcomings of potential solutions that are available “off the shelf,” and introduces a new solution for addressing the questions of reuse and automation: the Software Testing Automation Framework (STAF), a multiplatform, multilanguage approach to reuse. It is based on the concept of reusable services that can be used to automate major activities in the testing process. The design of STAF is described. Also discussed is how it was employed to automate a resource-intensive test suite used by an actual testing organization within IBM.

In late 1997, the system verification test (SVT) and function verification test (FVT) organizations with which I worked recognized a need to reduce per-project resources in order to accommodate new projects in the future. To this end, a task force was created to examine ways to reduce the expense of testing. This task force focused on improvement in two primary areas, reuse and automation. For us, reuse refers to the ability to share libraries of common functions among multiple tests. For purposes of this paper, a test is a program executed to validate the behavior of another program. Automation refers to the removal of human interaction with a process and placing it under machine or program control. In our case, the process in question was software testing.

Through reuse and automation, we planned to reduce or remove the resources (i.e., hardware, people, or time) necessary to perform our testing.

To help illustrate the problems we were seeing and the solution we produced, I use a running example of one particular product for which I was the SVT lead. This product, the IBM OS/2 WARP* Server for e-Business, encompassed not only the base operating system (OS/2*, Operating System/2*) but also included the file and print server for a local area network (LAN) (known as LAN Server), Web server, Java** virtual machine (JVM), and much more. Testing such a product is a daunting, time-consuming task. Any improvements we could make to reduce the complexity of the task would make it more feasible.

For our purposes, a test suite is a collection of tests that are all designed to validate the same area of a product. I discuss one test suite in particular, known affectionately as “Ogre.” This test suite was designed to perform load and stress testing of LAN Server and the base OS/2. Ogre is a notoriously resource-intensive test suite, and we were looking at automation to help reduce the hardware, number of individuals, and time necessary to execute it.

©Copyright 2002 by International Business Machines Corporation. Copying in printed form for private use is permitted without payment of royalty provided that (1) each reproduction is done without alteration and (2) the Journal reference and IBM copyright notice are included on the first page. The title and abstract, but no other portions, of this paper may be copied or distributed royalty free without further permission by computer-based and other information-service systems. Permission to republish any other portion of this paper must be obtained from the Editor.

126 RANKIN 0018-8670/02/$5.00 © 2002 IBM IBM SYSTEMS JOURNAL, VOL 41, NO 1, 2002

Authorized licensed use limited to: University of West London. Downloaded on March 18,2024 at 13:15:06 UTC from IEEE Xplore. Restrictions apply.
With a focus on reducing the complexity of creating and automating our testing, we looked at existing solutions within IBM and the test industry. None of these solutions met our needs, so we developed a new one, the Software Testing Automation Framework (STAF). This paper explores the design of STAF, explains how STAF addresses reuse, and details how STAF was used to automate and demonstrably improve the Ogre test suite. The solution provided by STAF is quite flexible. The techniques presented here could be used by most test groups to enhance the efficiency of their testing process.

The problem

Figure 1 depicts the software testing cycle. Planning consists of analyzing the features of the product to be tested and detailing the scope of the test effort. Design includes documenting and detailing the tests that will be necessary to validate the product. Development involves creating or modifying the actual tests that will be used to validate the product. Execution is concerned with actually exercising the tests against the product. Analysis or review consists of evaluating the results and effectiveness of the test effort; the evaluation is then used during the planning stage of the next testing cycle.

Figure 1 Software testing cycle [diagram: a cycle through planning, design, development, execution, and analysis or review]

Reuse is focused on improving the development, and to a lesser extent the design, portions of the testing cycle. Automation is focused on improving the execution portion of the testing cycle. Although every product testing cycle is different, generally, most person-hours are spent in execution, followed by development, then design, planning, and analysis or review. By improving our reuse and automation, we could positively influence the areas where the most effort is expended in the testing cycle.

The following subsections look individually at the areas of reuse and automation and delineate the problems we faced in each of these areas.

Reuse. This subsection provides some examples from the OS/2 WARP Server for e-Business SVT team that motivated the desire for reuse. Within the team, there were numerous smaller groups that were focused on developing and executing tests for different areas of the entire project. We wanted to ensure that each of these groups could leverage common sets of testing routines. To better understand this desire for reuse, consider some of the potential problems surrounding the seemingly simple task of logging textual messages to a file from within a test.

Several issues arise when this activity is left to be reinvented by each tester or group of testers, instead of using a common reusable routine. The problems are:

● Log files are stored in different places: Some groups create log routines that store the log files in the directory in which the test is run. Others create log routines that store them in a central directory. This discrepancy makes it difficult to determine where all the log files for tests run on a given system are stored. Ultimately, you have to scour the whole system looking for log files.
● Log file formats are different: Different groups order the data fields in a log record differently. This difference makes it difficult to write scripts that parse the log files looking for information.
● Message types are different: One group might use “FATAL” messages where another would use “ERROR,” or one group might use “TRACE” where another would use “DEBUG.” This variation makes it difficult to parse the log files. It also increases the difficulty in understanding the semantic meaning of a given log record.

None of these problems is insurmountable, and many could be handled sufficiently well through a “standards” document indicating where log files should be stored, the format of the log records, and the meaning, and intended use, of message types. Nonetheless, this list provides justification for our desire for common and consistent reusable routines. Also, additional problems exist that cannot be addressed by adhering to standards.
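To make the three logging problems concrete, the following is a minimal sketch of what a single shared logging routine might look like. It is not the STAF log service and does not reflect its format; the directory name, the field separator, the field order, and the set of message types are all hypothetical conventions chosen only for illustration.

```python
import os
import tempfile
from datetime import datetime, timezone

# Hypothetical conventions, chosen only for illustration.
LOG_DIR = os.path.join(tempfile.gettempdir(), "testlogs")  # one central location
LEVELS = ("FATAL", "ERROR", "WARNING", "INFO", "TRACE", "DEBUG")  # one shared set of message types
FIELD_SEP = "|"  # one separator and one field order for every record


def log_message(test_name, level, message, log_dir=LOG_DIR):
    """Append one record to <log_dir>/<test_name>.log and return the record."""
    if level not in LEVELS:
        raise ValueError(f"unknown message type: {level!r}")
    os.makedirs(log_dir, exist_ok=True)
    timestamp = datetime.now(timezone.utc).strftime("%Y%m%d-%H:%M:%S")
    # Keep each record on a single line so that parsers can split on newlines.
    text = message.replace("\n", " ")
    record = FIELD_SEP.join((timestamp, level, test_name, text))
    path = os.path.join(log_dir, test_name + ".log")
    # A fixed file encoding avoids depending on the codepage of each system.
    with open(path, "a", encoding="utf-8") as log_file:
        log_file.write(record + "\n")
    return record


def parse_record(line):
    """Split a record back into its four fields (the message is always last)."""
    timestamp, level, test_name, text = line.rstrip("\n").split(FIELD_SEP, 3)
    return {"timestamp": timestamp, "level": level, "test": test_name, "message": text}
```

Because every test calls the same routine, the log location, the field order, and the message types are fixed in one place, and a single parser such as `parse_record` works for every log file. The sketch assumes test names never contain the separator character.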

Multiple programming languages. Our testers write a wide variety of tests in a variety of programming languages. When testing the C language APIs (application programming interfaces) of the operating system, they write tests in C. When testing the command line utilities of the operating system or applications with command line interfaces, they write tests in scripting languages such as REXX (which is the native scripting language of OS/2). When testing the Java virtual machine of the operating system, they write tests in the Java language. In order for our testers to use common reusable routines to perform such tasks as logging, described above, the routines needed to be accessible from all the languages they use.

Multiple codepages. OS/2 WARP Server for e-Business was translated into 14 different languages, among them English, Japanese, and German. It is not uncommon for problems to exist in one translated version but not in another. Therefore, we were responsible for testing all of these versions. Testing multiple versions introduces additional complexities in our tests, and in particular to any set of reusable components we wanted our testers to use. One specific aspect of this situation is the use of different codepages by different translated versions. A codepage is the encoding of a set of characters (such as those used in English or Japanese) into a binary form that the computer can interpret. Using different codepages means that one codepage can encode the letter “A” in one binary form and another can encode it in a different binary form. Hence, care must be taken when manipulating the input and output of programs that use different codepages, a situation our testers would frequently encounter when testing across multiple translated versions of our product. If our testers were going to use a common set of routines for reading and writing log files, those routines had to be able to handle messages not only in an English codepage, but also in the codepages used by the other 13 languages into which our product was translated.

Multiple operating systems. While we were directly testing OS/2 WARP Server for e-Business, it was essential for us to run tests on other operating systems, such as Windows** and AIX* (Advanced Interactive Executive*) to perform interoperability and compatibility testing with our product. If we wanted our testers to use common reusable routines to perform such tasks as logging, described above, the routines needed to be accessible from all the operating systems we used.

Existing automation components. As we examined the types of components that were continually being re-created by our teams, as well as those that would need to exist to support the types of automation we wanted to put in place (as described in the following subsection), we realized that we would need a substantial base of automation components. Some of these components included process execution, file transfer, synchronization, logging, remote monitoring, resource management, event management, data management, and queuing. Additionally, these components had to be available both locally and in a remote fashion across the network. If the solution did not provide these components, we would have to create them. Therefore, we wanted a solution that provided a significant base of automation components.

Automation. This subsection provides some examples, using the Ogre test suite, to motivate the need for automation. As was mentioned, this test suite was designed to test the LAN Server and base OS/2 products under conditions of considerable load and stress, where load means a sustained level of work and stress means pushing the product beyond defined limits. The test suite consists of a set of individual tests focused on a specific aspect of the product (such as transferring files back and forth between the client and server). These tests are executed in a looping pseudorandom fashion on a set of client systems. The set of client systems is typically large, ranging upwards of 128 systems. The set of servers that are being tested is usually very small, typically no more than three. The test suite executes on the client systems for an extended period of time, typically 24 to 72 hours. The combination of the number and configuration of clients and servers and the amount of run time represents a scenario. If all the clients and servers are still operational after the prescribed amount of time, the scenario is considered to be successful. Multiple scenarios are executed during a given SVT cycle.

Figure 2 shows the basic procedure flow used to execute a given Ogre scenario. Note the areas in red. These areas indicate which steps in the procedure are currently done manually. The following subsections describe these areas in more detail.

Test suite execution. Our existing mechanism for starting or stopping a scenario was to have one or more individuals walk up to each client and start or stop the test suite. Given the situation of 128 clients spread throughout a large laboratory, this exercise is expensive, both in time and human resources. This

method also introduces the potential of skipping one or more clients, which can have a significant impact on the scenario (such as not uncovering a defect due to insufficient load or stress). Therefore, we wanted a solution that would allow us to start and stop the scenario from a central “management console.”

Figure 2 Ogre scenario flow before automation [flowchart labels: start, install new build, changed test cases, distribute test cases, start scenario, monitor scenario, new build?, configuration change?, failure?, scenario complete?, stop scenario, update server configuration, end; legend: procedure, decision, manual task, entry or exit point]

Test suite distribution. As new tests were created or existing tests were modified, they needed to be distributed to all the client systems. Our existing mechanism consisted of one or more individuals walking around to each client copying the tests from diskettes. This method was complicated by the fact that the tests did not always exist in the exact same location on each client. Like the previous problem of test suite execution, this mechanism was very wasteful of time and human resources. It also introduced another potential point of failure whereby one or more clients do not receive updated tests, resulting in false errors. Therefore, we wanted a solution that provided a mechanism for distributing our tests to our clients correctly and consistently.

Test suite monitoring. While a scenario was running, we were responsible for continually monitoring it to ensure that no failures had occurred. Our existing mechanism consisted of one or more individuals walking around to each client system to look for errors on the system screen. Such monitoring was partially alleviated by the fact that the tests would emit audible beeps when an error occurred. The beeps generally made it possible to simply walk into the laboratory and “listen” for errors. Unfortunately, we still had to monitor the scenario after standard work hours and on the weekend, which meant having individuals periodically drive into work and walk around the laboratory looking and listening for errors. Again, this method was very wasteful of time and human resources. It was also a negative morale factor, since it was considered “grunt” work. Therefore, we wanted a solution that provided a remote monitoring mechanism so that the status of the scenario could be evaluated from an individual’s office or by telneting in from home.

Test suite execution dynamics. The Ogre test suite was already very configurable. An extensive list of properties was defined in a configuration file that was read during test suite initialization (and cached in environment variables for faster access). These properties manipulated many aspects of the scenario, such as which resources were available on which servers, which servers were currently off line, and the ratios defining the frequency with which the servers were accessed relative to one another. This configurability allowed us, for example, to make a one-line change that would prevent the clients from accessing a given server (in case a problem was currently being investigated on it) or increase or decrease the stress one server received in relation to another. However, the only viable way to modify these parameters was to stop and start the entire scenario. As an example, assume that 36 hours into a 72-hour scenario, we found a problem with one of the servers. We could stop the scenario, change the configuration file to make the server unavailable, and then restart the scenario, which allowed us to exercise the remaining servers while the problem was being analyzed. Then, 12 hours later, when a fix for the problem had been created, we needed to bring the newly fixed server back into the mix. In order to do this,

we had to stop and start the entire scenario, which effectively negated all of the run time we had accumulated on the other servers at that point. Similar situations arose when we needed to change server stress ratios or other configuration parameters. Therefore, we wanted a solution that would allow us to change configuration information dynamically during the execution of a scenario.

Another long-standing issue with Ogre was that we were only able to execute one instance of the test suite at a time on any given client. It was felt that the ability to execute multiple instances of the test suite on the same client at the same time would allow us to produce equivalent stress with fewer clients. Figure 3 shows the basic procedure flow of a single instance of the Ogre test suite executing on a given system. Note the areas in red. These areas indicate places where running multiple instances of Ogre on the same system creates conflicts. The following two subsections describe these areas in more detail.

Figure 3 Single Ogre instance before multi-instance support [flowchart labels: start, select random subtest, select random server(s), establish connection(s) to server(s), execute subtest, release connection(s) to server(s); legend: procedure, entry or exit, no multi-instance conflict, multi-instance conflict]

Test suite resource management. In order to make a connection to a server, the client must specify a drive letter (in the case of a file resource) or a printer port (in the case of a printer resource) through which the resource will be accessed. When running multiple instances of the test suite, race conditions arise surrounding which drive letter or printer port to specify at any given time. Therefore, we wanted a solution that allowed us to manage the drive letter and printer port assignments among multiple instances of the test suite.

Test suite synchronization. Some of our tests have strict, nonchangeable dependencies on being the only process on the system running that particular test. When running multiple instances of the test suite, we needed a way to avoid having multiple instances executing the same test simultaneously. Therefore, we wanted a solution that allowed us to synchronize access to individual tests.

Existing solutions

Because we had two separate problems (reuse and automation), we realized we might need to find two separate solutions. However, we were hoping to find a single solution that would address both problems. Our preferences, in order, were:

1. A single solution designed to solve both problems
2. Two separate solutions designed to work together
3. A solution to reuse, which provided components designed to support automation, from which we could build an automation solution
4. Two separate, disjoint solutions

In the following subsections, I describe existing solutions that we explored, how they addressed the problems of reuse and automation, and how they related to our solution preferences.

Scripting languages. Scripting languages such as Perl, Python, Tcl, and Java (although Java would not technically be considered a scripting language, since it does require programs to be compiled) are very popular in the programming industry as a whole, as well as within test organizations, since they facilitate a rapid development cycle.1 As programming languages, scripting languages are not intended to directly solve either reuse or automation. Additionally, they are not directly targeted at the test environment, although their generality does not preclude their use in a test environment. Despite these limitations, we felt that given the wide popularity of scripting languages and the almost fanatical devotion of their proponents, we should examine their potential for solving our problems.

Although scripting languages are not a direct solution to reuse or automation, scripting languages do have some general applicability to the problem of reuse. To begin with, they are available on a wide variety of operating systems. They also have large well-established sets of extensions. Although not complete from a test perspective, these extensions would provide a solid base from which to build. Additionally, some languages (notably Tcl and Java) provide support for dealing with multiple codepages.

The benefits of scripting languages would clearly place them in category 3 of our preferences. Unfortunately, these benefits are only available if one is willing to standardize on one language exclusively. As was mentioned earlier, our testers create tests in many different programming languages, and it would have been tremendously difficult to force them to switch to one common programming language. Even if we could have convinced all of the testers on our team, we could never have convinced all the testers in our entire organization (much less those in other divisions, or at other sites), with whom we hoped to share our solution. Therefore, we were unable to rely on scripting languages for our solution.

Test harnesses. A test harness is an application that is used to execute one or more tests on one or more systems. In effect, test harnesses are designed to automate the execution of individually automated tests.

A variety of different test harnesses are available. Each is geared toward a particular type of testing. For example, many typical UNIX** tests are written in shell script or the C language. These tests are generally stand-alone executables that return zero on success and nonzero on error. Harnesses such as the Open Group’s Test Environment Toolkit (TET, also known as TETware) are designed to execute these types of tests on one or more systems.2 In contrast, a harness such as Sun’s Java Test leverages the underlying Java programming language to create a harness that is geared specifically to tests written in the Java language. It would not be uncommon for a test team to use both of these harnesses. Additionally, it is not uncommon for test teams to create custom harnesses geared toward specialized areas they test, such as I/O subsystems and protocol stacks.

It is clear that test harnesses have direct applicability to the problem of automation. However, as a general rule, test harnesses only solve the execution part of the automation problem. This solution still leaves areas such as test suite distribution, test suite monitoring, and test suite execution dynamics unsolved. Additionally, test harnesses have no direct or general applicability to the problem of reuse. Thus, test harnesses are, at best, only part of the solution to category 4 of our preferences. That having been said, the proximity of test harnesses to the test environment made it likely that one or more test harnesses would play a role in our ultimate solution. However, we still needed to find a solution for reuse and determine which, if any, of the existing test harnesses we would use and extend to fill in the rest of the automation gaps.

CORBA. At a very basic level, CORBA** (Common Object Request Broker Architecture) is a set of industry-wide specifications that define mechanisms that allow applications running on different operating systems, and written in different programming languages, to communicate.3 CORBA also defines a set of higher-level services, sitting on top of this communication layer, that provide functionality deemed beneficial by the programming community at large (such as naming, event, and transaction services). It is important to understand that CORBA itself is not a product; it is a set of specifications. For any given set of operating systems, languages, and services, it is necessary to either find a vendor who has implemented CORBA for that environment, or, much less desirably, implement it oneself.

CORBA is not intended to directly solve the problems of reuse and automation. However, CORBA does have some general applicability to the problem of reuse. First, CORBA is supported on a wide variety of operating systems. Second, there is CORBA support for a wide variety of programming languages. Thus, CORBA solves two of our key reuse problems. In contrast, CORBA has no direct support for multiple codepages. Additionally, the set of available CORBA services is not geared toward a test environment, which is understandable given the general applicability of CORBA to the computer programming industry as a whole.

Given the above, CORBA would clearly fit in category 3 of our preferences, although significant work would be necessary to provide the missing support in terms of multiple codepages and existing automation components. Additionally, as we mentioned above, there is no one company that produces a product called “CORBA.” What this means is that for a complete solution one must frequently obtain products from multiple vendors and attempt to configure them to work together. This attempt has been

notoriously difficult in the past,4 and, although the situation is improving, we would rather have avoided this layer of complication. All told, we felt that a CORBA solution was not worth the expense necessary to implement and maintain it.

The design of STAF

Having exhausted other avenues, we decided to create our own solution. We had a two-phased approach to the development of STAF. The first phase addressed the issue of reuse. This phase by itself would give us a solution that fell into category 3 of our solution preferences. The second phase tackled the problem of automation. In this phase we would build on top of the reuse solution and extend it to solve our automation problem. This two-step approach provided a solution that fell into category 1 of our solution preferences. The result of that work was the Software Testing Automation Framework, or STAF.

In the subsections that follow, I present the underlying design ideas surrounding STAF and how they helped provide a reuse solution. A subsequent section will then address how we built and extended this solution to solve the problem of automation.

Services. STAF was designed around the idea of reusable components. In STAF, we call these components services. Each service in STAF exposes a specialized set of functionality, such as logging, to users of STAF and other services. STAF, itself, is fundamentally a daemon process that provides a thin dispatching mechanism that routes incoming requests (from local and remote processes) to these services. STAF has two “flavors” of services, internal and external. Internal services are coded directly into the daemon process and provide the core services, such as data management and synchronization, upon which other services build. External services are accessed via shared libraries that are dynamically loaded by STAF. These external libraries represent either the service itself, in the case of languages like C or C++, which ultimately generate native executable object code, or a proxy interface to other languages, such as the Java or REXX languages, which do not generate native executable object code. The differentiation of service “flavors” and proxy handling can be seen in Figure 4.

Figure 4 STAF service types [diagram labels: incoming request, STAF daemon, dispatch layer, internal services, Java service proxy, REXX service proxy, external C/C++ service, external Java service, external REXX service]

This ability to provide services externally from the STAF daemon process allowed us to keep the core of STAF very small, while allowing users to pick and choose which additional pieces they wanted. It minimizes the infrastructure necessary to run STAF. Additionally, the small STAF core makes it easy to provide support on multiple platforms, and also to port STAF to new platforms.

Request-result format. Fundamentally, every STAF request consists of three parameters, all of which are strings. The first parameter is the name of the system to which the request should be sent. This parameter is analyzed by the local STAF daemon to determine whether the request should be handled locally or should be directed to another STAF system. Once the request has made it to the system that will handle it, the second parameter is analyzed to

determine which service is being invoked. Finally, the third parameter, which contains data for the request itself, is passed into the request handler of the service to be processed.

After processing the request, the service returns two pieces of data. The first is a numeric return code, which denotes the general result of the request. The second is a string that contains request-specific information. If the request was successful, this information contains the data, if any, which were asked for in the request. If the request was unsuccessful, this information typically contains additional diagnostic information.

By dealing primarily with strings, we have been able to simplify many facets of STAF. First, there is only one primary function used to interface with STAF from any given programming language. This function is known as STAFSubmit( ), and its parameters are the three strings described above. Because of the simplicity of this interface, requests look essentially identical across all supported programming languages, which makes using STAF from multiple programming languages much easier. Adding support for a new programming language is also trivial, because only a very small API set must be exposed in the target language. Had we chosen to use custom APIs for each service, the work to support a new programming language would be significant, since we would be faced with providing interfaces to a much,

Figure 5 details the concepts just described. A STAF request is initiated by the REXX program running on machine gamma (running Windows 2000). It is submitting the request “generate type Build subtype WebSphere V4” to the event service on machine delta. In step 1 the REXX interpreter passes the request to the REXX API layer of STAF. In step 2, the REXX API layer passes the request to the C API layer. In step 3 the C API layer makes the interprocess communication (IPC) request to the STAF daemon process. At this point the STAF daemon determines that the request is destined for another system, which initiates step 4, a network IPC request to the STAF daemon on machine delta (running AIX Version 4.3.3). The STAF daemon on machine delta determines that the request is bound for the event service. This leads to step 5 where the request is passed to the Java service proxy layer, the layer responsible for communicating directly with the JVM, which is step 6. In step 7, the JVM invokes the corresponding method on the event service object. Upon receiving the request, step 8 shows the event service passing the request string to the common request parser of STAF for validation. At this point the event service would perform the indicated request and steps 1 through 7 would be reversed as the result was passed back to the REXX program on machine gamma.

There are a number of things to note about this request flow. First, it was quite easy to specify a network-oriented request from the point of view of the
much larger set of APIs. REXX program. Second, the machines in question
are running different operating systems on different
Strings also make it easier to create and interface hardware architectures, and neither the REXX pro-
with external services. The primary interface for com- gram nor the event service need be aware of this dif-
municating with an external service consists of a ference. Third, neither the REXX program nor the
method to pass the requisite strings in and out of Java-based event service need be concerned with the
the service. Additionally, by restricting ourselves to language the other was using.
strings we are able to provide to services a common
set of routines to parse the incoming request strings.
Common routines allow service providers to simply The decision to have STAF deal only with strings was
define the format of their request strings and pass the most crucial and beneficial decision we made
them to this common parser for validation and data while designing STAF. It has allowed us to keep STAF
retrieval, which helps ease the creation of reusable simple and flexible at the same time.
components. This leads to benefits in the user space
as well, since all service request strings follow a com- Unicode. Because we focus predominantly on strings
mon lexical format, which provides a level of com- and were concerned with codepage issues, STAF was
monality to all services. It also makes it easier to use designed to use Unicode** internally. When a call
services when switching from one programming lan- to STAFSubmit( ) is made, the input strings are con-
guage or operating system to another, because the verted to Unicode. All further processing is carried
request strings are identical regardless of the envi- out in Unicode. Data are only converted out of Uni-
ronment. Commonality has the added benefit of hid- code when a result is passed back from STAFSub-
ing the programming language choice of the caller mit( ), or if STAF is forced to interact with the op-
and the service provider from one another. erating system or some other entity that does not

IBM SYSTEMS JOURNAL, VOL 41, NO 1, 2002 RANKIN 133


Authorized licensed use limited to: University of West London. Downloaded on March 18,2024 at 13:15:06 UTC from IEEE Xplore. Restrictions apply.
Figure 5 STAF service request flow

STAF DAEMON 4 STAF DAEMON

3 5

C API LAYER JAVA SERVICE PROXY

2 6

REXX API LAYER JAVA VIRTUAL MACHINE

1 7

, EVENT SERVICE IMPLEMENTATION


“generate type Build subtype WebSphere_V4”
8

REQUEST PARSER

STAF AND STAF- USER AND SERVICE-


HELPER LAYERS IMPLEMENTATION LAYERS

accept Unicode strings. By processing data in Unicode, we keep the integrity of the data intact. For example, if a system using a Japanese codepage sends a request to log some data containing Japanese codepage characters to a system using an English codepage, the data are initially converted to Unicode (which maintains the integrity of the data) when the STAFSubmit() call is issued. The data are maintained in Unicode until another STAFSubmit() call is issued to retrieve the data. If the same system running the Japanese codepage requests the data, the data will be converted from Unicode back to the Japanese codepage, which preserves the integrity of the data, since the data were originally in the same codepage. The data retrieved will be the same data initially logged even though, for some indeterminate length of time, the data were being stored or maintained on a system using an English codepage. Thus, by using Unicode throughout STAF, we solved our problem of handling multiple codepages.

Available services. In order to solve our automation problems, we needed a set of components on which to build. As we built STAF, we kept this foremost in our minds and ensured that the services we developed included these essential automation components. Here we describe some of the services that STAF provides. We will see these services again later when we examine how they were used to create the solution to our automation problems.

Three core services in STAF are the handle, variable, and queue services. These services provide fundamental capabilities that are common across all services and provide a foundation from which to build. Unsurprisingly, these services expose the capabilities of handles, variables, and queuing in STAF.

Handles are used to identify and encapsulate application data in the STAF environment. When an application wishes to use STAF, it obtains a handle by calling a registration API. The handle returned is tied specifically to the registering application. In general, this is a 1-to-N mapping between applications and handles. An application may have more than one handle, but any given handle belongs to a single application. However, STAF does support special “static” handles that can be shared among applications. Each STAF handle has an associated message queue. This queue allows an application to receive data from other applications and services. It also forms the basis for local and network-oriented interprocess communication in STAF. Many services deliver data to an application via its queue. These queues allow applications to work in an event-driven manner similar to the approach used by many windowing systems.

STAF provides data management facilities through STAF variables. These STAF variables are used by STAF applications in much the same way that variables are used in a programming language. When a STAF request is submitted, any STAF variables in the request are replaced with their values. One of the powerful capabilities of STAF variables is that they can be changed outside of the scope of the running application. This capability provides the ability to dynamically alter the behavior of an application. For example, an application designed to apply a specific percentage of load on a system might allow the percentage to be specified through an environment variable or as a command line argument. In this case, once the application is running, the only way to change the load percentage is to stop the application and restart it with the altered environment variable or command line argument. Using STAF variables allows the value to be changed without stopping the application. The only change to the application would be to periodically reevaluate the value of the STAF variable. These STAF variables are stored in variable pools. Each STAF handle has a unique variable pool that is specific to that application. There is also a global variable pool that is common across all handles on a given STAF system. Commonality allows default values to be specified in the global variable pool, which can then be overridden on a handle-by-handle basis.

STAF provides several other services in addition to handle, variable, and queue. STAF provides synchronization facilities through the semaphore and resource pool services. The semaphore service provides named mutual exclusion (mutex) and event semaphores. Compared with native semaphores commonly provided by an operating system, STAF semaphores have two advantages. One, they are available remotely across the network. Two, they are more visible, meaning it is much easier, for example, to determine who owns a mutex semaphore and who is waiting on an event semaphore. The resource pool service provides a means to manage named pools of resources, such as machines, user identifiers, and licenses. In particular, it provides features for managing the content of the pools as well as synchronizing access to the elements in the pools.

STAF provides process execution facilities through the process service. This service allows processes on STAF systems to be started, stopped, and queried. It provides detailed control over the execution of processes including specification of environment variables, the working directory, input/output redirection, and effective user identification. The process service can also, at user request, deliver notifications when processes end. These notifications are delivered via the queuing facilities described earlier.

STAF provides file system facilities through the file system service. Currently, this service provides mechanisms for transferring files and accessing file content. Future versions of STAF will expand the capabilities of this service into file and directory management, such as directory creation and enumeration and file or directory deletion.

STAF provides logging facilities through the log service. At its most basic layer, this service provides time-stamped message logging based on levels, such as “FATAL,” “ERROR,” “WARNING,” and “DEBUG.” A variety of higher-level facilities are built on top of this foundation, including local and centralized logging, log sharing between applications, dynamic level-masking, and maintenance on active logs. The dynamic level-masking is of particular interest. Level-masking refers to the ability of the user to determine which logging levels will be stored in a log file. Messages with logging levels not included in the level-mask will be discarded. The fact that this feature is dynamic means that the level-mask can be changed while an application is running. For example, this ability would allow a user to “switch on” debug messages when a problem is encountered, without needing to stop and restart the application.

STAF provides remote monitoring facilities through the monitor service. This service provides a lightweight publish-query mechanism. Applications publish their state, which then allows other applications to remotely query it. The published state is a simple time-stamped string, yet this has proven sufficiently robust for monitoring the progress of typical tests and applications.

STAF provides event-handling facilities through the event service. This service provides standard publish-subscribe semantics. Applications register for specific types and, possibly subtypes, of events. Other applications generate events based on a type, subtype, and sets of properties (which are attribute/value pairs). The events are delivered via the queuing facilities described earlier.
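As a rough illustration of the request-result format and the queue-based event delivery described above, the following Python sketch models a single STAF system inside one process. It is a toy under stated assumptions and not STAF's API: the class name ToyDaemon, the return-code constants and their values, the system name "local", and the "register type ... subtype ..." request syntax are all hypothetical. Only the three-string request, the pair of return code and result string, and the "generate type Build subtype WebSphere_V4" request are taken from the text and Figure 5.

```python
from collections import defaultdict, deque

# Hypothetical return codes; these are not STAF's real values.
RC_OK = 0
RC_NO_ROUTE = 1
RC_UNKNOWN_SERVICE = 2
RC_INVALID_REQUEST = 3


class ToyDaemon:
    """In-process stand-in for one STAF system (no IPC, no networking)."""

    def __init__(self):
        self.queues = defaultdict(deque)     # handle -> message queue
        self.subscribers = defaultdict(set)  # (type, subtype) -> handles
        self.next_handle = 1
        self.services = {"event": self._event}

    def register(self):
        """Hand out a new handle; its queue is created on first use."""
        handle = self.next_handle
        self.next_handle += 1
        return handle

    def submit(self, handle, system, service, request):
        """Three strings in, (return code, result string) out."""
        if system != "local":
            # A real daemon would forward the request; the toy cannot.
            return RC_NO_ROUTE, system
        handler = self.services.get(service.lower())
        if handler is None:
            return RC_UNKNOWN_SERVICE, service
        return handler(handle, request.split())

    def _event(self, handle, words):
        """Parse '<action> type <T> subtype <S>' and act on it."""
        if (len(words) != 5 or words[1].lower() != "type"
                or words[3].lower() != "subtype"):
            return RC_INVALID_REQUEST, " ".join(words)
        action, key = words[0].lower(), (words[2], words[4])
        if action == "register":
            self.subscribers[key].add(handle)
            return RC_OK, ""
        if action == "generate":
            # Deliver the event through each subscriber's queue.
            for subscriber in sorted(self.subscribers[key]):
                self.queues[subscriber].append(
                    f"event {key[0]}/{key[1]} from handle {handle}")
            return RC_OK, str(len(self.subscribers[key]))
        return RC_INVALID_REQUEST, action


daemon = ToyDaemon()
listener = daemon.register()
builder = daemon.register()
daemon.submit(listener, "local", "event",
              "register type Build subtype WebSphere_V4")
rc, result = daemon.submit(builder, "local", "event",
                           "generate type Build subtype WebSphere_V4")
message = daemon.queues[listener].popleft()
print(rc, result)  # 0 1
print(message)     # event Build/WebSphere_V4 from handle 2
```

The point of the sketch is the shape of the interface: because every service is reached through the same three strings and answers with the same two values, adding a service means adding one entry to the dispatch table rather than a new API.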

In addition to the services described above, STAF makes it quite easy for groups to develop their own services to meet specific needs. These services can then become part of the set of service components available for use with STAF. The modular service-based nature of the platform provides the infrastructure for evolution and growth.

From reuse to automation

Having addressed reuse, we next focused on automation. Our plan was to build a solution on top of STAF by leveraging the automation components that it provides.

The first area we tackled was the execution of the Ogre test suite. Instead of trying to retrofit an existing test harness onto STAF, we chose to create a new one that was STAF-aware from the ground up. What we came up with was a program called the Generic WorkLoad processor or, in abbreviated form, GenWL (pronounced JEN-wall). This harness allows us to create a text file defining the configuration data for the scenario, the processes to be executed, and the systems on which they should be executed. This text file is called the workload file. Using GenWL, we are able to start or stop the entire workload with a single command from a central management console, which was our desired goal. GenWL also played an important role in solving other aspects of the automation problem, which are discussed below.

Next, we looked to solve the problems associated with executing more than one instance of Ogre on a given system. The two most pressing issues were test suite synchronization and resource management. To handle synchronized access to tests, we relied on the STAF semaphore service, in particular, its mutex semaphore support. This service allowed one instance of the test suite to gain exclusive access to a test and then release control once execution of that test was complete. To manage the drive letters and printer ports, we relied on the resource pool service of STAF. This service allowed us to set up separate pools for the drive letters and printer ports. The service then manages the access to entries within the pool. Thus, when one instance of the test suite requests a drive letter, we can be sure that no other instance of the test suite will obtain that drive letter until the first instance releases control of it back to the resource pool service. With these problems solved, we were able to run multiple instances of Ogre on our systems. These changes to the test suite are illustrated in Figure 6. In particular, the light purple areas of Figure 6 represent where STAF was used to solve the test suite synchronization and resource management problems.

Figure 6 Single Ogre instance after multi-instance support
[Flowchart. Boxes shown: start; select random subtest; synchronized subtest? (if yes, acquire lock); select random server(s); request resource(s); establish connection(s) to server(s); execute subtest; release connection(s) to server(s); release resource(s); synchronized subtest? (if yes, release lock). Legend: procedure entry or exit; decision point; no multi-instance conflict; multi-instance or STAF specific.]

While making the synchronization and resource management changes described above, we found ourselves redistributing the test suite more often than usual, so in conjunction with the above changes, we also set out to solve the test suite distribution problem. Here we were able to leverage the file system and variable services of STAF. Using these two services, we wrote a small script that iterated through a list of clients in a file and used the file system service to copy each file. The variable service was used to deal with mapping the abstract destination defined in the copy command to the actual destination on each client. With the list of clients maintained in a file, we were assured the updated test suite was consistently distributed to all the clients.

With the problems of test suite distribution and execution solved, we next addressed the test suite monitoring problem. Here we leveraged the monitor service of STAF. Our test suite published its state to the monitor service every time it entered a subtest or when an error or warning occurred. Given the published information, we next developed a way to view this information using the GenWL execution harness. The workload file read by GenWL defines all the test suite instances; thus it is trivial for GenWL to interact with the monitor service to retrieve the published state for all the test suite instances. GenWL then displays this information on a system-by-system basis. With a single command from our management console, we were able to ascertain the current state of the entire Ogre scenario.

Although GenWL and the monitor service allowed us to determine the state of the scenario at any given point in time, this capability was not sufficient for us to determine what had transpired over extended periods of time (e.g., from one evening until the following morning). With GenWL and the monitor service, we could see the state as we left and when we came in, but we were still unaware as to any problems that had occurred in between.

To solve this problem we simply exchanged our current logging mechanism with calls to the log service of STAF. This exchange allowed us to use an approach similar to the one used to solve the test suite distribution problem. We created a simple script that iterated over a list of clients in a file and used the facilities of the log service to retrieve all the error and warning messages that had been logged over a given period of time. We were then able to ascertain which, if any, of those errors and warnings were true problems or simply artifacts of temporarily pushing a server beyond its capacity. Remember, Ogre is a load and stress test, so we expect to occasionally push the servers beyond their limits.

Finally, we were left with the problem of execution dynamics. To solve this problem, we leveraged GenWL again. As mentioned above, the workload file contains the configuration information for the scenario. As the workload file is processed, this configuration information is stored on each of the client systems using the STAF variable service. As the test suite executes, it retrieves the configuration information from the variable service. By using the variable service, we were able to update the configuration information dynamically. Thus, if we needed to change the configuration information, such as to reintroduce a server or change server stress ratios, we simply updated the appropriate values in the workload file and directed GenWL to push that value out to all the clients.

Figure 7 illustrates the flow of an Ogre scenario after our automation changes. In comparison to Figure 2, note that the majority of the steps are now automated. The only two steps that are not automated are updating the configuration and installing a new build. Updating the configuration consists of manually updating the workload file with the configuration changes. This updating effectively requires human intervention. Installing a new build is something that we have automated in other areas but was not deemed useful for the Ogre scenarios.

Issues

We have received surprisingly few complaints about STAF from our users. The vast majority of user issues concern clarifying the documentation or requesting new features (such as new services or extensions to existing services). We have also found and fixed isolated performance issues. For example, the log service was originally written in REXX, which proved to be unacceptably slow. We have since ported the log service to C++, which significantly improved its performance.

With respect to general overall performance, STAF requests do incur a minimal amount of overhead since they require an IPC request to go from the requesting process to the STAF daemon, plus the user’s request string must be parsed (as opposed to dealing

directly with raw data). This means STAF would not be appropriate for extremely low-latency requests. To date, we have not encountered this problem.

Figure 7 Ogre scenario flow after automation
[Flowchart. Boxes shown: start; install new build; test cases changed? (if yes, distribute test cases); start scenario; monitor scenario; configuration change? (if yes, apply changes dynamically); server failure? (if yes, update configuration); important new build? (if yes, stop scenario); scenario complete? (if yes, end). Legend: procedure entry or exit; decision point; manual task; automated task.]

Benefits

By providing a reusable framework and reusable services, STAF has allowed teams to focus on directly solving their problems instead of inventing infrastructure. This advantage is illustrated with the tools developed for automating Ogre. The test distribution script and the log-querying script were both less than 50 lines of code. The scripts were so small because they were able to depend on the underlying STAF infrastructure and the services it provides. The GenWL program relies on a number of STAF services to perform its tasks. By reusing these services, GenWL is free to concern itself with the fundamental activities of parsing the command line parameters and the workload file. The remainder of the work is handled by STAF and includes setting the configuration information, starting and stopping the processes, and monitoring the test progress. This work is done with only nine commands in the GenWL program. We have found this type of usage to be fairly typical.

If we look at the application of STAF to our automation problem, we see significant savings arise. By overcoming our test suite synchronization and resource management problems, we were able to reduce the required number of client systems by approximately 33 percent, which in the largest case meant a reduction of 48 client systems. This reduction represents a very large savings in the hardware required to run the test suite.

By overcoming our test suite execution and test suite distribution problems, we were able to reduce the time it takes to restart a scenario based on a new build by roughly 50 percent. Our old manual procedure took us approximately eight hours. Our new automated procedure takes us approximately four hours. This difference is a significant reduction in time and is amplified even more when builds are received late in the day, e.g., 4:00 P.M. Because it previously took eight hours to start the scenario, we would typically begin working with the new build at approximately 8:00 A.M. the following morning. Thus the scenario was not actually running until 5:00 P.M. of that following day. However, with a reduction to four hours, someone can stay and have the scenario running by 8:00 P.M. the same night, which is an even more significant cycle-time reduction of 21 hours. In addition, it used to take several people to perform this work. Now one person can perform the work because we can manage everything from a central console. Thus, there are personnel savings as well.

A major benefit of overcoming our test suite monitoring problems was finding a number of defects in the product that would have gone undetected otherwise. Detecting problems before they reach the customer is a very significant source of savings, because problems found by customers are much more costly to fix than those found during testing [5]. In addition, our new monitoring capabilities improved morale by removing the “grunt” work of performing periodic monitoring check-ins at night and on the weekend. If a problem was uncovered while monitoring remotely, we were sometimes able to perform remote diagnostics and solve the problem without coming to the site.

Finally, by overcoming our test suite execution dynamics problems, we were able to save time and personnel by reducing the frequency of scenario restarts. This reduction in restarts was yet another morale boosting item, since we no longer felt like we were “twiddling our thumbs” when running the scenario in a configuration that we knew would have to be restarted in mid-run.

Many times our group had contemplated fixing some of the problems in the Ogre test suite. We had elaborated a list of items that we would need to create in order to solve the problem. Evaluating this list in hindsight, we realized that what we actually needed was STAF. Had we addressed our list of items earlier, we would have ended up with a solution that was centered around our particular test suite, as opposed to the general solution, which is STAF. Instead, the reuse philosophy of STAF allowed us to pick up the reusable components it provides and solve our test suite problems.

Conclusion

To improve the efficiency and effectiveness of the testing process, groups need to find ways to improve their reuse and automation. As a solution to help address these issues, we created STAF. It was designed to solve our reuse problems and was then leveraged to solve our automation problems. Using STAF, we have generated considerable savings with respect to the people, time, and hardware necessary to perform testing.

Since its inception, STAF has been adopted by numerous test groups throughout IBM, and it is being used to create a variety of innovative testing solutions. In my organization alone, we have developed a pluggable solution that drives automated testing from build through results collection. When a new build becomes available, the test systems are automatically set up and installed. Then the test suites are executed automatically, and the results are collected for analysis. These types of solutions would be tremendously more difficult, if not impossible, to create without a solution such as STAF from which to build.

After a long incubation period, STAF is available in an open-source form on the SourceForge Web site (http://staf.sourceforge.net). It is my hope that the availability of this flexible framework will lead to sustained advances in the testing efficiency and effectiveness of many software organizations.

Acknowledgments

I would like to thank Clay Williams, Karen Rosengren, Sharon Lucas, David Bender, and Don Randall for reviewing draft versions of this paper. I would like to express my sincere gratitude to Peri Tarr for helping me organize my thoughts and for keeping this paper flowing in a consistent manner. Her assistance has been invaluable in making this paper available to readers.

*Trademark or registered trademark of International Business Machines Corporation.
**Trademark or registered trademark of Sun Microsystems, Inc., Microsoft Corporation, The Open Group, Object Management Group, or Unicode Consortium, Inc.

Cited references

1. J. Ousterhout, “Scripting: Higher Level Programming for the 21st Century,” Computer 31, No. 3, 23–30 (March 1998).
2. The Open Group, http://tetworks.opengroup.org/.
3. The Object Management Group, http://www.corba.org.
4. R. Bastide, P. Palanque, O. Sy, and D. Navarre, “Formal Specification of CORBA Services: Experience and Lessons Learned,” OOPSLA Conference Proceedings (2000), pp. 105–117.
5. R. S. Pressman, Software Engineering: A Practitioner’s Approach, 3rd Edition, McGraw Hill, New York (1992), p. 559.

Accepted for publication September 18, 2001.

Charles Rankin IBM Server Group, 11401 Burnet Road, Austin, Texas 78758 (electronic mail: [email protected]). Mr. Rankin is an advisory software engineer in the IBM Austin Development Laboratory. He graduated with a B.S. degree in electrical engineering from the University of Florida in 1993, after which he joined IBM in Austin. He has worked extensively with IBM’s PC-oriented operating systems and networking products. He was the system test lead for IBM’s Directory and Security Server for OS/2 and IBM’s OS/2 WARP Server for e-Business. He is currently the lead developer for STAF.

IBM SYSTEMS JOURNAL, VOL 41, NO 1, 2002 RANKIN 139

