
Information and Software Technology 55 (2013) 1450–1461


Investigating measurement scales and aggregation methods in SPICE assessment method

Ho-Won Jung *
Korea University Business School, Anam-dong 5Ka, Sungbuk-gu, Seoul 136-701, Republic of Korea

a r t i c l e   i n f o

Article history:
Received 7 September 2011
Received in revised form 8 December 2012
Accepted 5 February 2013
Available online 18 February 2013

Keywords:
SPICE
ISO/IEC 15504
Aggregation method
MADM
Formative measurement model
Reflective measurement model

a b s t r a c t

Context: This study identified three issues in the SPICE (Software Process Improvement and Capability dEtermination) assessment method based on ISO/IEC 15504-2 (Performing an assessment). The issues are a lack of a measurement scale for characterizing the extent to which an outcome (practice) is achieved (implemented) and two shortcomings of the aggregation methods used to generate a process attribute (PA) rating. These issues may undermine the goal of assessment results that are consistent and comparable within and across assessed organizations.

Objective: The purpose of this study is to identify issues, such as the measurement scale and aggregation methods, in SPICE assessment methods and to provide candidate solutions based on measurement theories, while the rating scales of the current PA and capability are retained.

Method: For those purposes, the present study reviews scale types based on measurement theory and uses the reflective and formative measurement models to establish the relationships between PAs and practices. Composite measure development methods that depend on those relationships are then proposed on the basis of appropriate aggregation methods drawn from multiple attribute decision making (MADM).

Results: Six candidate solutions are presented along with their strengths and weaknesses from practical and theoretical perspectives. Two examples are given to illustrate and interpret the six candidate solutions for the issues identified. Applying the six candidate solutions to the examples shows that the measurement scale and the aggregation methods influence the PA rating.

Conclusion: The process community, including the SPICE standardization group, can initiate discussions to determine the measurement scale and the aggregation methods, starting from our six candidate solutions. The rationale and methods addressed in this study can also be applied to other domains to derive a composite (aggregate) value or rating.

© 2013 Elsevier B.V. All rights reserved.

1. Introduction

The SPICE (Software Process Improvement and Capability dEtermination) project was initiated in 1993 to support the development and validation of the ISO/IEC 15504 International Standards for software process assessment [52,58]. SPICE standards comprise well-known international documents, namely ISO/IEC 15504 (Process assessment) and the emerging ISO/IEC 33000 series. SPICE assessment is performed using a process assessment method described in ISO/IEC 15504-2 [27] against a process assessment model (PAM1). It is usually performed as a part of a process improvement initiative and/or a capability determination approach (i.e., to generate a capability level (CL)) [28]. A process assessment method should define a measurement scale for characterizing the extent to which the outcomes (practices) in a process are achieved (implemented). The set of characterized values is then aggregated to generate the CL as a composite value.2

The process capability defined for a process is a latent construct of a theoretical concept that can be measured indirectly via multiple measures3 [6,11,17]. In a process, the relationship between a construct and its measures is referred to as a measurement model.
* Tel.: +82 2 3290 1938; fax: +82 2 3290 1839. E-mail address: [email protected]
1 PAMs include ISO/IEC 15504-5 (An exemplar software life cycle process assessment model) [29], ISO/IEC 15504-6 (An exemplar system life cycle process assessment model) [30], ISO/IEC 15504-7 (Assessment of organizational maturity) [31], and ISO/IEC 15504-8 (An exemplar assessment model for IT service management) [32], and a model under development, such as ISO/IEC 33063 (Software testing process assessment model) [35], as well as their conformance models.
2 A composite measure is defined as a measure derived from a combination of various measures of a multidimensional theoretical model that cannot be captured by a single measure [47]. A composite value is a value from a composite measure [34].
3 Many manuscripts use the term indicator to refer to variables that are used to empirically detect constructs. Since indicator is a reserved term in SPICE, this study intentionally uses the term measure instead. Other terms are proxy variables, items, scores, scales, indices, or observed variables [8]. This study also uses the term composite measure instead of the widely used composite indicator, particularly in the OECD [47].

doi: 10.1016/j.infsof.2013.02.004

The construct can be estimated using its measures in the composite measure development. The construct estimator, also known as the composite measure, is derived (aggregated) by an appropriate combination of measures. In PAMs, a set of process attributes (PAs) is defined as measures for indirectly measuring the process capability. The CL, assigned from 0 to 5, is a composite value aggregated from a set of PA ratings. A PA rating, as a composite value, is in turn determined by aggregating a set of characterized outcome values. This hierarchy, as shown in Fig. 3, is a multidimensional structure, in which the term multidimensional implies "a construct [that] consists of a number of interrelated attributes or dimensions and exists in multidimensional domains" [41, p. 741].

The relationship between a multidimensional construct and its dimensions, as well as the relationship in measurement models, can be explained by construct specification, such as the reflective measurement (or measure) model or the formative measurement (or measure) model [2,4,6,9,12,14,17,20,42]. The former suggests that the latent construct is a cause of the observable measures, whereas the latter implies that the observable measures are causes of the latent concept.

In order to develop composite measures, the aggregation methods depend on the construct specification, i.e., the reflective or formative measurement model [1,19,22,25,47,51,54,55,61,63]. They are also influenced by the evaluation policy, the purpose of the composite measure, and the measurement scale [34]. According to Diamantopoulos et al. [14], construct misspecification (the incorrect adoption of a reflective or formative measurement model) is not uncommon, with an occurrence of around 30% across top-tier journals.

The purpose of this study is to identify issues related to the measurement scale and the aggregation methods used to generate PA ratings in a process, and then to provide a set of candidate solutions to cope with them from theoretical and practical perspectives. Three issues are identified: the lack of a measurement scale for characterizing the extent of outcome achievement (or practice implementation), and two shortcomings related to the lack of theory-based aggregation methods. Since the measurement scale and aggregation methods can influence PA and CL ratings, these issues should be resolved for consistent assessment results comparable within and across assessed organizations.4 It is required that assessment methods be designed to "provide benchmark quality ratings..." [52, p. 9]. For such purposes, this study utilizes measurement theory [1,19,22,25,47,51,54,55,61,63] and multiple attribute decision making (MADM) methods [44–46,64] in the development of the current PA ratings. Six candidate solutions are presented along with their strengths and weaknesses. Two examples are given to illustrate and explain the candidate solutions.

This study is organized as follows. Section 2 introduces a brief overview of SPICE, construct specification, and construct examination. Section 3 addresses the construct specifications in the CL determination hierarchy. Section 4 identifies the three associated issues. In Section 5, this study describes the first issue, the lack of a measurement scale. In Sections 6 and 7, this study analyzes the aggregation methods for practice implementation-level and process-level outcome values, respectively. Section 8 provides candidate solutions for the three issues and then provides two examples. Finally, Section 9 presents concluding remarks as well as suggestions for further studies.

4 Assessed organizations that are part or all of an organization are called organizational units [26]. The ISO glossary can also be accessed in the Software and Systems Engineering Vocabulary (SEVOCAB) (www.computer.org/sevocab) and Systems and Software Engineering – Vocabulary (ISO/IEC/IEEE 24765:2010).

2. Background

2.1. Two-dimensional architecture

This study uses ISO/IEC 15504-5 [29] in order to explain the PAM because it is the most popular and well-known standard. The structure of ISO/IEC 15504-5 can be expressed as process and capability dimensions, as shown in Fig. 1. The process dimension includes two process categories: System Lifecycle Processes, consisting of four Process Groups of 42 processes, and Software Lifecycle Processes, containing three Process Groups of 17 processes [29].

As shown in Fig. 1, the capability dimension consists of six CLs in a range from zero to five. CL 1 includes one PA, and CLs 2 to 5 each comprise two PAs that work collectively in order to significantly enhance the capability of performing a given process. Each process in a PAM can be assessed and rated up to CL 5. This is known as the continuous model.5

[Fig. 1. The two-dimensional architecture of ISO/IEC 15504-5. Capability dimension: Level 0 Incomplete process; Level 1 Performed process (PA 1.1 Process performance); Level 2 Managed process (PA 2.1 Performance management, PA 2.2 Work product management); Level 3 Established process (PA 3.1 Process definition, PA 3.2 Process deployment); Level 4 Predictable process (PA 4.1 Process measurement, PA 4.2 Process control); Level 5 Optimizing process (PA 5.1 Process innovation, PA 5.2 Continuous optimization). Process dimension: System Life Cycle Processes and Software Life Cycle Processes.]

5 The model with an ordered capability for individual processes is called a continuous representation, while the model with an ordered set of related processes that defines (organizational unit) maturity is referred to as a staged model, such as ISO/IEC 15504-7 and CMMI maturity. Note that this study is limited to ISO/IEC 15504-5 with capability as a continuous model.

2.2. Process attribute and capability level ratings

While the best practices are included in PAMs, the ratings of PA and CL belong to the assessment method described in ISO/IEC 15504-2. Each PA in ISO/IEC 15504-5 is comprised of outcomes (results of achievements), base practices (generic practices), work products (generic work products), and others, where the terms in parentheses are reserved for PA 2.1–PA 5.2. As an example, Table 1 shows the outcomes and generic practices of PA 2.2 (Work Product Management). For the sake of simplicity, this study employs the terms outcomes, practices, and work products.

PA achievement, as an aggregate (composite) value, is the aggregation of outcome values. According to Table 2, the aggregate value is then transformed into a PA rating on an ordinal scale: F (Fully achieved), L (Largely achieved), P (Partially achieved), or N (Not achieved). This study refers to them as NPLF.

Table 1
PA 2.2 Work product management attributes (cited from ISO/IEC 15504-5).

Outcomes* (achievement) of PA 2.2:
(OC 2.2a) requirements for the work products of the process are defined;
(OC 2.2b) requirements for documentation and control of the work products are defined;
(OC 2.2c) work products are appropriately identified, documented, and controlled;
(OC 2.2d) work products are reviewed in accordance with planned arrangements and adjusted as necessary to meet requirements.

Generic practices of PA 2.2:
GP 2.2.1 Define the requirements for the work products.
GP 2.2.2 Define the requirements for documentation and control of the work products.
GP 2.2.3 Identify, document and control the work products.
GP 2.2.4 Review and adjust work products to meet the defined requirements.

* In this study, the outcome numbering of the 15504-5 standard was changed to denote the outcomes explicitly: (PA 2.2, a) → OC 2.2a.

Table 2
The rating scale of process attributes (cited from ISO/IEC 15504-2).

N (Not achieved), 0–15% achievement: There is little or no evidence of achievement of the defined attribute in the assessed process.
P (Partially achieved), >15–50% achievement: There is some evidence of an approach to, and some achievement of, the defined attribute in the assessed process. Some aspects of the achievement of the attribute may be unpredictable.
L (Largely achieved), >50–85% achievement: There is evidence of a systematic approach to, and significant achievement of, the defined attribute in the assessed process. Some weakness related to this attribute may exist in the assessed process.
F (Fully achieved), >85–100% achievement: There is evidence of a complete and systematic approach to, and full achievement of, the defined attribute in the assessed process. No significant weaknesses related to this attribute exist in the assessed process.
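The Table 2 transformation lends itself to a direct implementation. The following is a minimal Python sketch of the percentage-to-NPLF mapping; the function name and the treatment of exact boundary values are our own assumptions, while the bands come from Table 2:

```python
def pa_rating(achievement: float) -> str:
    """Transform a PA achievement percentage (0-100) into the NPLF
    ordinal rating, using the bands of Table 2 (ISO/IEC 15504-2)."""
    if not 0.0 <= achievement <= 100.0:
        raise ValueError("achievement must lie in [0, 100]")
    if achievement <= 15.0:
        return "N"   # not achieved: 0-15%
    if achievement <= 50.0:
        return "P"   # partially achieved: >15-50%
    if achievement <= 85.0:
        return "L"   # largely achieved: >50-85%
    return "F"       # fully achieved: >85-100%
```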

SPICE assessment is implicitly based on the characterization of outcome achievement because a practice may result in one or more outcomes; thus, measuring each practice's outcomes may double-count each outcome.6 The practice implementation-level outcomes become the base measures in an assessment of multiple process instances. If a single process instance is assessed, the process-level outcomes become the base measures. The practice implementation-level and process-level are explained in detail in Section 3.

ISO/IEC 15504-2 provides a rule to aggregate PA ratings in order to derive the CL. A process in the assessment of a single instance achieves CL k if all PAs below level k satisfy rating F and the level-k PAs are rated either F or L. Since all aggregation methods analyzed and suggested by this study concern how the current PA ratings are reached, the current PA rating scale and the ISO/IEC 15504-2 rule for aggregating PA ratings to a CL are not affected at all.

6 On the other hand, the assessment method built on practice implementation is the Standard CMMI Appraisal Method for Process Improvement (SCAMPI) [56] for CMMI appraisals [57], where SCAMPI is an instantiation of the Appraisal Requirements for CMMI (ARC), which defines the requirements for appraisal methods intended for use with CMMI and with the People CMM (P-CMM) [56].
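The ISO/IEC 15504-2 rule just described can be sketched in a few lines; this is our own minimal reading of the rule in Python, with the input layout (level mapped to the NPLF ratings of that level's PAs) chosen purely for illustration:

```python
def capability_level(pa_ratings):
    """Sketch of the ISO/IEC 15504-2 rule: a process achieves CL k
    when every PA below level k is rated F and every level-k PA is
    rated F or L.

    pa_ratings maps level -> list of NPLF ratings of that level's PAs,
    e.g., {1: ["F"], 2: ["F", "L"], 3: ["P", "N"]} yields CL 2.
    """
    achieved = 0
    levels = sorted(pa_ratings)
    for level in levels:
        lower_all_f = all(r == "F" for lv in levels if lv < level
                          for r in pa_ratings[lv])
        level_f_or_l = all(r in ("F", "L") for r in pa_ratings[level])
        if lower_all_f and level_f_or_l:
            achieved = level
        else:
            break
    return achieved
```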
2.3. Construct specification

Fig. 2 shows two path diagrams of measurement models in conventional structural equation modeling (SEM) [5], where constructs are represented as ovals, observed measures as rectangles, causal paths as single-headed arrows, and correlations as double-headed arrows. A measurement model can be interpreted by the construct–measure relationship, known as a reflective or a formative measurement model.

In the reflective measurement model, or reflective specification, shown in Fig. 2a, a construct is regarded as the cause of the measures. Thus, variation in the construct results in variation in its measures, and the measures reflect the same underlying construct [6,17]. Hence, measures are expected to co-vary with one another. Since measures, which are supposed to be sampled from the same conceptual domain, also have the same or similar contents and share a common theme, dropping any one measure cannot change the conceptual domain of the construct; therefore, the measures are interchangeable.

In Fig. 2a, the reflective construct ξ (ksi) denotes an underlying factor of the four reflective measures x_i. The construct–measure relationship is represented by the following set of equations:

\[ x_i = \lambda_i \xi + \delta_i, \tag{1} \]

where the coefficient λ_i (lambda) is the expected impact of a one-unit difference in ξ on x_i, and the random error term δ_i (delta) is the measurement error.

In the formative measurement model, or formative specification, shown in Fig. 2b, the formative construct can be regarded as an index generated by the observed measures, i.e., each measure is a determinant of the latent construct [6,17]. Formative measures are not interchangeable, and each measure captures a specific aspect of the conceptual domain of the construct. Thus, omitting any measure may alter the conceptual domain; moreover, the measures need not co-vary with one another.

The formative construct η (eta) can be represented as follows:

\[ \eta = \gamma_1 x_1 + \cdots + \gamma_q x_q + \zeta, \tag{2} \]

where η is the construct being estimated by its formative measures x_i, the coefficient γ_i (gamma) denotes the effect of measure x_i on the latent variable η, and the disturbance term ζ denotes the effect on η of measures omitted from the model. Note that in Eq. (2), η is "the composite that best predicts the dependent variable in the analysis" [17, p. 158].

The formative construct in Eq. (2) can be represented with no error, i.e., the disturbance term ζ is assumed to be zero, as follows:

\[ C = \gamma_1 x_1 + \cdots + \gamma_q x_q. \tag{3} \]

This equation works as a multi-attribute decision-making (MADM) method, i.e., a composite measure, C, is determined by a set of formative measures weighted by the importance or priority of those measures.7

7 Bollen [9] used the term causal indicators for the measures in (2) and formative indicators for the measures in (3). For consistency with other studies, this study uses the term formative measure for both cases. However, we clearly state the difference, if appropriate.
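As a concrete reading of Eq. (3), the following minimal Python sketch computes a composite value from formative measures and their importance coefficients; the function name, the example values, and the equal-weight choice are ours, for illustration only:

```python
def formative_composite(measures, gammas):
    """Composite value C = gamma_1*x_1 + ... + gamma_q*x_q of Eq. (3);
    the gammas act as importance (priority) weights of the measures."""
    if len(measures) != len(gammas):
        raise ValueError("one coefficient per measure is required")
    return sum(g * x for g, x in zip(gammas, measures))

# Four formative measures with equal importance coefficients:
c = formative_composite([0.9, 0.8, 0.5, 0.2], [0.25] * 4)  # -> 0.6
```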

[Fig. 2. Relationship between a construct and its measures (adapted from ISO/IEC CD 33003 [34]). Panel (a), reflective measurement model: the construct ξ points to the measures x1–x4 through loadings λ1–λ4, with measurement errors δ1–δ4. Panel (b), formative measurement model: the measures x1–x4 point to the construct η through weights γ1–γ4, with disturbance ζ.]

[Fig. 3. Construct specifications of n process instances in a process. (a) Reflective (first order): the implementation-level outcomes OC(1)2.2a, ..., OC(n)2.2a (with measurement errors δ) reflect each process-level outcome. (b) Formative (second order): the process-level outcomes OC 2.2a, ..., OC 2.2d form PA 2.2 (disturbances ζ2.1, ..., ζ5.2). (c) Formative (third order): the PAs form the CL (disturbances ζ1, ..., ζ5).]

MADM with one alternative is the same as an aggregation method used to derive the value of a composite measure. Note that one alternative implies a single case. There are many MADM methods for aggregating a set of measures in order to derive a composite measure [64]. Detailed discussions on Eq. (3) can be found in [13,15].

2.4. Construct examination

Two methods exist to investigate the construct specification. First, when data on the measures of the construct of interest are not available (e.g., at the beginning of the development of best practices for a construct of interest), the construct specification is examined through mental experiments [5] and/or decision rules [36]. Table 3, developed on the basis of several studies [5,14,17,18,36,63], presents the decision rules for construct specification.

Second, when data are available (e.g., during or after trials), it is possible to statistically test whether a construct is reflective or formative by using one of the two confirmatory tetrad analysis (CTA) SAS macros: SAS-CTA2 [7,60] or CTANEST1 [23]. The former can be directly utilized for a construct with four or more measures, whereas a simulation is required if fewer than four measures are used. The latter macro, CTANEST1, can test more complex models and non-continuous measures, but requires model identification. Jung and Ting [40] introduced examples for testing the construct specification between capability and its process attributes by utilizing those macros and simulation. The decision rules and CTA should be used to validate construct specifications in multidimensional constructs as well [37].

2.5. Composite measure development

Composite measure development depends on the construct specifications of the multidimensional construct [22,47,51,54,55,61]. However, its purpose is not to find an SEM solution. The composite measure construction merely shares the same steps with an SEM development, from theoretical model development to the operationalization of the model constructs [5,47]. A composite measure in Eq. (3) is only related to the aggregation method. Many MADM methods [64] can be applied to formative constructs. The creation of composite measures is provided in Section 7.

3. Construct specifications in process capability

This section analyzes the hierarchy of the SPICE multidimensional construct in order to determine whether the constructs in the hierarchy are reflective or formative. The analysis results are utilized in selecting the aggregation methods.

Table 3
Decision rules for determining construct specification (cited from ISO/IEC CD 33003).

Characteristics of measures:
- Reflective measurement model: measures are manifestations of the construct; measures share a common theme; measures should be interchangeable; measures should have the same or similar content; excluding a measure should not alter the conceptual domain of the construct; measures are expected to co-vary with one another.
- Formative measurement model: measures are defining characteristics (aspects) of the construct; measures need not share a common theme; measures need not be interchangeable; measures need not have the same or similar content; excluding a measure may alter the conceptual domain of the construct; measures need not co-vary with one another.

Direction of causality between construct and measures:
- Reflective measurement model: the direction of causality is from the construct to its multi-item measures; changes in a measure should not cause changes in the construct.
- Formative measurement model: the construct is a combination of its measures; changes in the construct should not cause changes in the measures.

Deriving a composite measure in a multidimensional construct consists of aggregating the lower dimensions to a higher dimension [47].

The SPICE capability hierarchy of a process can be depicted as a multidimensional construct, as in Fig. 3. The construct can be represented as a third-order path diagram, such as reflective first-order, formative second-order, and formative third-order [14], where the PAs and outcomes (OCs) are also depicted as multidimensional constructs. The third-order conceptualization in Fig. 3 includes two errors: measurement error (δ) and disturbance error (ζ). For the sake of simplicity, PA 2.2, which is described in Table 1, is used as an example.

The rectangles at the bottom of Fig. 3 represent the practice implementation-level outcomes of n process instances assessed against a PA outcome (OC 2.2x). A sample with n process instances is taken from a population of process instances in an assessed process. For example, in Fig. 3, the first outcome of PA 2.2 (i.e., OC 2.2a) is assessed across n process instances. Their assessment values, i.e., OC(1)2.2a, ..., OC(n)2.2a, are called practice implementation-level outcome values. Since these values are replicated assessments in an assessed organization that uses the same process standard, the relationship between a shaded circle (OC 2.2a) and the rectangles (the n outcomes OC2.2a) can be assumed to be a reflective measurement model [denoted as (a) in Fig. 3]. The n implementation-level outcomes satisfy the reflective decision rules in Table 3.

Next is the relationship between PA 2.2 and its four process-level outcomes (OC 2.2a, ..., OC 2.2d). Since the outcomes are defined to represent the contribution to the achievement of the PA [29], each is developed with a unique aspect of the PA [denoted as (b) in Fig. 3]. This implies that the four process-level outcomes are formative measures of PA 2.2. They satisfy the formative decision rules in Table 3.

PA 2.2 is not only a formative dimension of the multidimensional construct CL [denoted as (c) in Fig. 3], but also the formative construct of the four dimensions OC 2.2a, ..., OC 2.2d. The PAs in Fig. 3(c) were statistically validated as a formative measurement model using the CTA [40]. The current aggregation method from a set of PAs to a CL rating is correctly defined to meet the concept of the formative construct. All of the above explanations are also directly applicable to PA 1.1.

4. Issues in the assessment method

Based on the analyses of the multidimensional constructs in the previous section, this section reveals and summarizes the issues related to the capability determination hierarchy of the SPICE assessment method. Table 4 shows the characterizations in the capability determination hierarchy. As seen in Table 4:

Table 4
Capability determination hierarchy and characterization values.

Capability rating with an ordinal scale: 0, 1, 2, 3, 4, 5
PA rating with an ordinal scale: N, P, L, F
Process-level characterization of outcomes: Not defined
Practice implementation-level characterization of outcomes: Not defined

Note: F (fully achieved), L (largely achieved), P (partially achieved), N (not achieved).

- SPICE does not provide measurement scales to characterize the extent to which outcomes, as base measures, are achieved, i.e., there is a lack of a measurement scale for characterizing outcome achievements at the practice implementation-level.
- Thus, there is certainly no method to aggregate a set of practice implementation-level outcome values to a process-level outcome, i.e., (a) in Fig. 3 is missing.
- Furthermore, SPICE does not provide any guide for aggregating a set of process-level outcome values to a PA percentage, i.e., (b) in Fig. 3 is missing.

Thus, the issues related to the capability determination hierarchy are summarized in Table 5. The following sections analyze those issues and provide candidate solutions to tackle them.

Issues 2 and 3 have been partially drafted in ISO/IEC 33002, under development, as follows (note that the draft assumes practice measurement) (cited from ISO/IEC CD 33002 [33]):

(a) The extent to which a practice is implemented shall be characterized for each process instance, based on validated data.
(b) The process attribute rating for every process attribute within the scope of the assessment shall be characterized for each process instance, based on validated data. For the highest process attribute ratings, there shall be at least two sources of objective evidence available from each selected process instance, e.g., Fully Achieved or Largely Achieved.

The aggregation rules in (a) and (b) can be depicted as a multidimensional construct of formative first-order and reflective second-order, as depicted in Fig. 4. The figure shows that all of the process instances are separately rated for PA 2.2 and then aggregated to a PA 2.2 rating. The first-order factors are not only reflective dimensions of the second-order construct PA, but also formative constructs of the four measures GP(i)s. As seen in (a) and (b) above, since ISO/IEC CD 33002 assumes the assessment of practice implementation, GP (generic practice) is intentionally used.

However, no studies based on Fig. 4 have been reported in the literature [14].

Table 5
Issues from the CL determination hierarchy.

Issue 1: What measurement scale is used in characterizing the extent to which an outcome is achieved at the practice implementation-level?
Issue 2: How can a set of practice implementation-level outcome values be aggregated to a process-level one in the assessment of multiple process instances? That is, (a) in Fig. 3.
Issue 3: How can a set of process-level outcome values be aggregated to a PA-level percentage or rating? That is, (b) in Fig. 3.

[Fig. 4. Formative first-order and reflective second-order (a simplified example). Each per-instance construct PA(i)2.2 (i = 1, ..., n; disturbances ζ1, ..., ζn) is formed by its generic-practice measures GP(i)2.2.1, ..., GP(i)2.2.4, and the n instance constructs are reflective measures of the second-order construct PA 2.2 (disturbance ζ2.2).]

This absence is attributed to the difficulty of interpreting the terms ζ1, ..., ζn at the first-order dimensions, because they represent both disturbance errors, due to the formative specification of the first-order dimensions, and measurement errors, due to the reflective measures of the second-order construct PA [14]. Second, each of the first-order factors is a formative construct created by the measures GP(i)s generated by each process instance. Thus, they cannot be interchanged. However, since the first-order factors are manifestations of a second-order construct, they are interchangeable. This, however, contradicts the interpretation from a theoretical perspective. A composite measure derived from a contradictory multidimensional construct should be re-examined.

5. Measurement scale

This section addresses issue 1 in Table 5 from a theoretical perspective. It is as follows:

Issue 1: What measurement scale is used in characterizing the extent to which an outcome is achieved at the practice implementation-level?

5.1. Scale type assumption

According to the classical measurement theory introduced by Stevens [59], the following four levels of measurement scales are defined from a lower level to a higher level: nominal scales (categorical: only attributes are named), ordinal scales (rankings: attributes can be ordered), interval scales (equal distances corresponding to equal quantities of the attribute), and ratio scales (equal distances corresponding to equal quantities of the attribute, where the value of zero corresponds to none of the attribute) [61]. Lower-level transformation, from a higher measurement level to a lower level, is possible such that (i) a ratio scale can be transformed to an interval, ordinal, or nominal scale, (ii) an interval to an ordinal or nominal, and (iii) an ordinal to a nominal. However, the inverse direction is not allowed.

The ordinal scales (ratings) can be represented by numerical or verbal rankings, such as 1, 2, 3, and 4; A, B, and C; or N, P, L, and F in the SPICE PA rating. However, there is no meaning in the difference between the ratings, i.e., 4 − 3 ≠ 3 − 2 and F − L ≠ L − P [65]. The meaning lies in the relationship between two values, such that F is superior to L, L to P, and P to N.

5.2. Outcome measurement scale

No SPICE standard provides a measurement scale for characterizing the achievement of the practice implementation-level outcomes as base measures. The ordinal scale rating NPLF of PAs in Table 2 is a composite measure transformed from the percentage of PA achievement. The percent scale, including zero, is a ratio scale. The percentage in the PA rating is an aggregation of a set of outcome achievements at the process-level in the assessment of multiple process instances, or at the implementation-level in a single-instance assessment.

As noted in the previous section, according to the classical measurement theory, outcome values characterized at the implementation- and process-levels should be defined with a ratio scale, because they are used to compute the percentage (i.e., a ratio scale) of PA achievement. Thus, a ratio scale with a range from 0 to 1 should be the measurement scale of outcome characterization at the implementation- and process-levels. This is also compatible with the recommendations from other studies, as summarized by [19].

If the percentage in Table 2 is used only as a supplement to improve the understanding of the ordinal scale rating NPLF, then the outcome characterization at the implementation and process levels can be defined with an ordinal or ratio scale.

6. Aggregation of practice implementation-level outcome values

Aggregating a set of measures to create a composite measure depends on the construct specification, such as reflective or formative constructs. Accordingly, we separately examine the aggregation methods of reflective and formative constructs. A composite measure of a reflective construct is sometimes called a composite scale, while that of a formative construct is also identified as a composite index [8,61].

This section addresses issue 2 in Table 5, focusing on the theoretical analysis of a composite measure of a reflective construct. It is as follows:

Issue 2 (Aggregation in the reflective measurement model): How can a set of practice implementation-level outcome values be aggregated to a process-level outcome in the assessment of multiple process instances?

Issue 2, related to (a) in Fig. 3, shows that each of the process-level outcomes (i.e., OC 2.2a) behaves as a reflective construct of its implementation-level outcome values (i.e., OC(1)2.2a, ..., OC(n)2.2a).

In the reflective measurement model, measures are classified into three models according to factor loadings and error variances: congeneric, tau-equivalent, and parallel measures [10,11]. Congeneric measures are presumed to share the same construct (i.e., all measures load on only one factor), and the sizes of their factor loadings and measurement errors are free to vary (i.e., they need not be the same). A tau-equivalent model entails a congeneric solution in which the measures of a given factor have equal loadings but differing error variances.

The most restrictive model treats measures as parallel, in which the observed measures are posited to have equal factor loadings and equal error variances.

These three models can be explained with simple mathematical notation. From Eq. (1), two measures can be presented as follows:

\[ x_i = \lambda_i \xi_i + \delta_i; \qquad x_j = \lambda_j \xi_j + \delta_j, \quad j \neq i. \tag{4} \]

Assume that the error terms δi and δj are not correlated and that the true score is the same (i.e., ξi = ξj). Then, the following statements hold for (4):

- If λi ≠ λj and var(δi) ≠ var(δj), then xi and xj are congeneric measures.
- If λi = λj and var(δi) ≠ var(δj), then xi and xj are tau-equivalent measures.
- If λi = λj and var(δi) = var(δj), then xi and xj are parallel measures.

Parallel measures are assumed to measure the latent construct with the same level of precision. Thus, the measures are interchangeable. Also, the latent construct can be estimated by averaging the observed values of the measures without considering weights [43]. If the measures are congeneric, then a weighted average may be reasonable because each measure has different loadings. For tau-equivalent measures, an average without weights may be reasonable.

Reflective measures are expected to have similar values in measurement. Brown [10], Carmines and Zeller [11], and Gerbing and Anderson [21] recommended an equal-weight average as the composite measure of a reflective construct. Therefore, this study recommends the average as the aggregation method for the process-level characterization, in the case of practice implementation-level characterization with a ratio scale. However, since outliers may produce an unrealistic average, an examination may be required to detect outliers. True outliers may be removed from the aggregation.

If the outcomes at the practice implementation-level are characterized with an ordinal scale, such as NPLF, as discussed in Section 5.2, then the median (middle number) can be used instead of the average. However, the median may not be a good aggregation for generating a process-level characterization because of its limitations. As an extreme example, assume that the ordinal ratings of the four outcomes of PA 2.2 are F, F, N, and F. In this case, the median is an impractical composite value.

7. Aggregation of process-level outcome values

This section presents the theoretical analyses for issue 3 in Table 5. It is as follows:

Issue 3 (Aggregation in the formative measurement model): How can a set of process-level outcome values be aggregated to a PA-level percentage or rating?

As identified in Fig. 3(b), a PA is a formative multidimensional construct with the process-level outcomes as its dimensions. Composite measures of the formative construct can be aggregated by using MADM methods with one alternative.

7.1. MADM as an aggregation method

MADM [64,65] is a well-established and popular methodology for constructing preference decisions (e.g., evaluation, prioritization, and selection) among available alternatives characterized by multiple criteria or attributes. Yoon and Hwang [64, p. 6] provide a taxonomy of 13 MADM methods.8 Those methods can be utilized to aggregate a set of values in order to derive a composite one and to rank alternatives when there are multiple alternatives. Zhou and Ang [67] examined previous studies for developing composite measures that rank alternatives by using MADM methods and data envelopment analysis (DEA).

8 They are: Dominance, MaxiMin, MaxiMax, Conjugate method, Disjunctive method, Lexicographic method, Elimination by aspects, Simple additive weighting (SAW) method, Weighted product (WP) method, Technique for Order Preference by Similarity to Ideal Solution (TOPSIS), Elimination et choix traduisant la realite (ELECTRE), Median ranking method, and Analytic hierarchy process (AHP).

Munda and Nardo [46] addressed the advantages of the non-compensatory MADM approach in constructing a composite measure for ranking alternatives: if trade-offs among attribute values are permitted, the aggregation is referred to as compensatory; otherwise, it is non-compensatory. These studies implicitly assumed a formative measurement model. Although the formative specification is not explicitly stated, most of the composite measures listed in [3] have also been developed on the basis of MADM methods for the formative measurement model.

The MADM method can also be utilized to aggregate a set of attribute values of one object into the value of a composite measure with one alternative (a single case), which is the context of this study. Thus, only three of the thirteen MADM methods in Yoon and Hwang [64] are theoretically and practically applicable to the aggregation of the process-level outcome values: the simple additive weighting (SAW) method, the weighted product (WP) method, and the conjugate method.

This author [38] addressed the MADM SAW method for aggregating a set of process-level outcome values to derive a PA outcome in the assessment of a single instance. This author also performed a sensitivity analysis of the SAW method to evaluate how much a change in the weight or performance value of an implemented practice would affect and alter the current PA rating [39]. In the SAW method, weight assignment was determined on the basis of the analytic hierarchy process (AHP) [53]. In AHP, a problem is broken down into a hierarchy of interrelated decision criteria (attributes); then, the aggregate value of the criteria is formed on the basis of their weights and achievements. AHP in the taxonomy can be utilized as a weight-assignment method for the SAW or WP method.

However, the SAW method is only one of the alternative aggregation methods suggested in this study. Other methods should be equally investigated. Of course, there are more MADM methods than the three described here. This study does not consider the optimization approach, which can only be applied to a problem with more than one alternative. In the next section, the theoretical background of the three methods is introduced with minimal mathematics.

7.2. Simple additive weighting method

To utilize Eq. (3) conveniently, the weights are normalized so that their sum becomes one; the SAW method is then represented with popular notations in the MADM community as follows:

\[ V = \sum_{j=1}^{q} w_j a_j, \quad \text{where} \quad \sum_{j=1}^{q} w_j = 1, \tag{5} \]

where wj and aj are the weight and the characterization value of the jth process-level outcome, and q denotes the number of outcomes in the PA.

In the SAW method with a two-outcome case, i.e., V = w1a1 + w2a2, a small movement (Δa1, Δa2) at a constant value of V gives w1Δa1 + w2Δa2 = 0, which can be rearranged as the ratio of the weights.

In economics, Δa2/Δa1 = w1/w2 is called the marginal rate of technical substitution (MRTS) [62]. It is interpreted as the rate at which a decision maker is willing to substitute a small amount of a2 for a1. As an example, suppose w1 = 0.66 and w2 = 0.33. Then, an indifference exists for the trade between 1 unit of a1 and 2 units of a2.

The rationale of the SAW method is that, since the concept of substitutability is compatible with reflective measures, the SAW method can be a candidate method for aggregating reflective measures with equal weights; that is, it becomes an average. Its potential applicability to aggregating formative measures is briefly discussed in Section 8.2.2.
sures is briefly discussed in Section 8.2.2.
priate. For example, ‘‘If the analyst decides that an increase in eco-
nomic performance cannot compensate for a loss in social cohesion
7.3. Weighted product method
or a worsening in environmental sustainability, then neither the
SAW nor the WP method are suitable’’ [47, p. 33].
The rationale of the WP method (also referred to as geometric
Let’s review a conjugate method example and then its applica-
aggregation9 [47]) is that a low value of any measure may lead to
bility to SPICE. The goal of CMMI is a multidimensional construct.
a low aggregate value because formative measures need not share
The relationship between the goal and its generic practice is as-
a common theme. As an example, loss of job, divorce, and loss of par-
sumed as formative. SCAMPI A has a rule to generate goal ratings
ents are considered as formative measures of life stress. A higher de-
as follows (cited from [57, p. 158]):
gree of life stress does not imply that all of these three events have
happened at the same time. Merely, one of the three may be enough
 All associated practices are characterized at the process-level as
to increase life stress. In this context, Blalock [4, p. 105] suggested an
either LI (Largely Implemented) or FI (Fully Implemented).
aggregation method10 referred to as the Blalock formula. In the con-
 The aggregation of weaknesses associated with the goal does
text of this process study, his method can be represented with vari-
not have a significant negative impact on goal achievement.
ables aj taking values between 0 and 1, where high values indicate
the presence of a cause. The aggregate value V becomes:
The above goal ratings can be expressed with the following sim-
X
q
ple notation:
logðVÞ ¼ wj logðaj Þ;
j¼1 aj P a0j ; j ¼ 1; . . . ; n;
ð6Þ
Y
q
w
)V ¼ aj j ; where a0j is the minimum acceptable level of the jth process-level
j¼1 characterization and is also the cutoff value of LI (Largely Imple-
mented) in SCAMPI A. This aggregation is called the conjugate
where if any one value of aj is small, then, V has a small value
method in non-compensatory MADM [64, p. 20].
regardless of the others.
Since the outcomes of PA 2.2 are equally legitimate (i.e., one
For the WP method in a two-attribute case, MRTS is
outcome cannot be replaced or compensated with other out-
w1 a2 =w2 a1 [62]. To derive the meaning of the weight wj in Eq.
comes), the conjugate method may be considered as an alternative
(6), for the first practice 1, let D imply the small changes of the
aggregation method. This method is appropriate when an overall
variables. Then, the following is derived:
value is determined by a minimum value of all measures or ele-
logðVÞ ¼ w1 logða1 Þ; ments such as security. It is also stronger than WP for enforcing
ð7Þ weakness improvements.
logðV þ DVÞ ¼ w1 logða1 þ Da1 Þ
Conjugate method can be applied as follows: each of the process-
Subtracting the first equation from the second one in Eq. (7) and level outcome values has a value between 0 and 1. A conjugate value
then rearranging the difference gives is computed by min (process-level characterization values across
DV=V practices or outcomes). The value can be then transformed to NPLF
W1 ffi ; ð8Þ ratings by using a conjugate method, as shown in Table 6.
Da1 =a1
which implies that a 1% change in a1 is associated with a w1 percent 8. Candidate solutions and examples
change in V. Thus, w1 has the interpretation of elasticity in econom-
ics [62]. 8.1. Candidate solutions
The rational of the WP method penalizes low outcomes. Thus,
the WP method resembles a tool of policy-making in order to im- The previous three sections theoretically analyzed three issues
prove weaknesses. This is the reason as to why some OECD indices in Table 5, which should be resolved for consistent process assess-
use the geometric mean (equal weights in the WP method) [48,49]. ments. This study provides six candidate solutions as summarized
Data for the SAW method should be measured at the least interval in Table 7, under the compliance with the current PA rating of
scale, whereas the WP method requires a ratio scale [16]. NPLF. The six candidate solutions can be explained according to
the resolution of the three issues as follows:
7.4. Conjugate method
 [Resolution of issue 1: Second column in Table 7]. For the first
When different dimensions or measures are equally legitimate issue, the two possible measurement scales of implementa-
and important, then a non-compensatory method may be appro- tion-level outcomes are the ratio or ordinal scale under the
compliance of the current PA rating scale of NPLF. However, if
9
The geometric aggregation method in some studies has a different equation from the ordinal measurement scale is adopted, the percentage inter-
the WP method in Eq. (5). For example, the geometric aggregation method has a form pretation in Table 2 is at best a perception.
P
of V ¼ qj¼1 ðwj aj Þ1=q in Zhou et al. [66].
P  [Resolution of issue 2: Third column in Table 7]. The resolution
10
logð1  yÞ ¼ qj¼1 wj logð1  xj Þ, where the variables xj take a value between 0 and
1. High values mean that the cause (formative) is present. The aggregate value is of the second issue depends on the properties of Eq. (4), such as
denoted by y. congeneric, tau-equivalent, or parallel measures. Since multiple

8. Candidate solutions and examples

8.1. Candidate solutions

The previous three sections theoretically analyzed the three issues in Table 5, which should be resolved for consistent process assessments. This study provides six candidate solutions, summarized in Table 7, in compliance with the current PA rating scale of NPLF. The six candidate solutions can be explained according to the resolution of the three issues as follows:

- [Resolution of issue 1: second column in Table 7]. For the first issue, the two possible measurement scales for implementation-level outcomes are the ratio scale or the ordinal scale, in compliance with the current PA rating scale of NPLF. However, if the ordinal measurement scale is adopted, the percentage interpretation in Table 2 is at best a perception.
- [Resolution of issue 2: third column in Table 7]. The resolution of the second issue depends on the properties of Eq. (4), such as congeneric, tau-equivalent, or parallel measures. Since the multiple instances assessed come from diverse projects, depending on a sampling policy, tau-equivalent or parallel measures may not be expected. For the ratio scale, assuming congeneric measures results in the use of the arithmetic average without weights as the aggregation method. The average of a ratio scale is also a ratio scale. Note that the average automatically holds for tau-equivalent or parallel measures. The counterpart of the average on the ordinal scale is the median. Problems related to the median method are discussed later.
- [Resolution of issue 3: fifth column in Table 7]. Since the relationship between a PA and its process-level outcome values is assumed to be a formative measurement model, three aggregation methods (SAW, WP, or conjugate) are applicable. These methods aggregate the process-level outcome values to a percentage of PA achievement. According to Table 2, the percentage is then transformed to a PA rating. On the other hand, the conjugate method is applicable to the ordinal scale as well. The median method, which shares the concept of the SAW method, may also be considered.

Table 7
The combination of the measurement scale and the four aggregation methods.

Solution | Implementation-level characterization [Issue 1] | Aggregation method [Implementation → Process] [Issue 2] | Process-level characterization | Aggregation method [Process → PA*] [Issue 3]
1 | Ratio scale [0, 1] | Average | Ratio scale [0, 1] | SAW method
2 | Ratio scale [0, 1] | Average | Ratio scale [0, 1] | WP method
3 | Ratio scale [0, 1] | Average | Ratio scale [0, 1] | Conjugate method in Table 6
4 | Ratio scale [0, 1] | Average | Ratio scale [0, 1] | Median method
5 | Ordinal scale: NPLF | Median | Ordinal scale: NPLF | Conjugate method in Table 6
6 | Ordinal scale: NPLF | Median | Ordinal scale: NPLF | Median method

* Note that the PA ratings in the last column are the same as the current ones defined in ISO/IEC 15504-2.
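Read as a procedure, each row of Table 7 fixes one aggregator per issue. The following Python sketch (our own illustration; the names, the percentage-banding helper, and the choice of the upper median for even-length ordinal lists are assumptions) strings the two aggregation steps together for all six solutions:

```python
import math
from statistics import mean, median

ORDER = {"N": 0, "P": 1, "L": 2, "F": 3}

def to_nplf(pct):
    """Percentage bands shared by Table 2 and Table 6."""
    return "N" if pct <= 15 else "P" if pct <= 50 else "L" if pct <= 85 else "F"

def rate_pa(solution, rows):
    """Rate one PA under a Table 7 candidate solution.

    rows holds, per outcome, the implementation-level values across
    process instances: floats in [0, 1] for Solutions 1-4, or NPLF
    letters for Solutions 5 and 6.
    """
    if solution in (1, 2, 3, 4):                        # ratio scale
        pl = [mean(r) for r in rows]                    # Issue 2: average
        q = len(pl)
        v = {1: sum(pl) / q,                            # SAW, equal weights
             2: math.prod(a ** (1.0 / q) for a in pl),  # WP
             3: min(pl),                                # conjugate
             4: median(pl)}[solution]
        return to_nplf(100.0 * v)
    # Ordinal scale: Issue 2 uses the median (upper median if tied).
    pl = [sorted(r, key=ORDER.get)[len(r) // 2] for r in rows]
    if solution == 5:
        return min(pl, key=ORDER.get)                   # conjugate on NPLF
    return sorted(pl, key=ORDER.get)[len(pl) // 2]      # Solution 6: median
```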
8.2. Examples and interpretations

8.2.1. Examples

Two examples are given to illustrate the six candidate solutions in Table 7. Each of the solutions may yield different results when applied to the same problem.

The first example, determining the PA 2.2 rating in Table 8, is developed to illustrate candidate Solutions 1–4, which are related to a ratio scale [0, 1] (resolution of issue 1). It includes the PA 2.2 implementation-level outcome values for the four outcomes in a process. From the construct specification perspective, the second row of the first column in the example, i.e., "(OC 2.2a) requirements for the work products of the process are defined", is a reflective construct, and its measures (four outcomes from the process instances assessed) have the outcome values (86, 86, 86, 88) as implementation-level characterizations. The remaining rows (OC 2.2b), (OC 2.2c), and (OC 2.2d) can be interpreted in the same way.

In the last column of Table 8, the values (86.5, 81.5, 51.75, 14.8)ᵀ represent the process-level outcome values obtained by arithmetic-mean aggregation due to the reflective specification. This corresponds to (a) in Fig. 3, and it is a ratio scale.

The last four rows in Table 8 show the aggregate values of the four process-level characterizations under the four aggregation methods. For the sake of simplicity, equal weights are assumed in the SAW and WP methods. In the SAW method, the aggregate value of 58.6% results in a rating of "Largely" for PA 2.2. The WP method provides an aggregate value of 48.2%, i.e., (86.5 × 81.5 × 51.8 × 14.8)^0.25, which results in a rating of "Partially". The conjugate method gives a rating of "Not Achieved" at 14.8%, i.e., min(86.5, 81.5, 51.8, 14.8). The median method assigns a rating of "Largely", i.e., median(86.5, 81.5, 51.8, 14.8) = (81.5 + 51.8)/2 = 66.6%. The rating of PA 2.2 is thus very different, such as "Largely", "Partially", or "Not Achieved", depending on the aggregation method.

The second example, in Table 9, is a case for candidate Solutions 5 and 6 with ordinal NPLF ratings. The implementation-level characterizations of the outcomes have ordinal ratings of NPLF (resolution of issue 1). The last column represents the aggregate values by the median for the reflective measurement models, based on the resolution of issue 2. A conjugate method, as a non-compensatory method, is applied to the column of median aggregation (resolution of issue 3). Its result is a rating of "Partially" for PA 2.2, i.e., min(F, F, L, P). Finally, the median as the aggregation method at the process level results in a rating of "Fully" or "Largely". As can be seen in the example, the conjugate or the median method has very limited usability.

Table 8
Example 1: results by candidate Solutions 1–4 (unit: %).

Implementation-level outcome values per process instance (instances 1–4), with the arithmetic mean across instances as the process-level outcome value:
(OC 2.2a) requirements for the work products of the process are defined: 86, 86, 86, 88 → 86.5
(OC 2.2b) requirements for documentation and control of the work products are defined: 67, 86, 86, 87 → 81.5
(OC 2.2c) work products are appropriately identified, documented, and controlled: 45, 52, 55, 55 → 51.75
(OC 2.2d) work products are reviewed in accordance with planned arrangements and adjusted as necessary to meet requirements: 9, 16, 17, 17 → 14.8

Aggregation method to PA 2.2, with the aggregate value:
Simple additive weighting (SAW) (Solution 1): 58.6 (L)
Weighted product (WP) (Solution 2): 48.2 (P)
Conjugate (Solution 3): 14.8 (N)
Median (Solution 4): 66.6 (L)

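As a cross-check on the figures in Table 8, the following is a minimal Python sketch (illustrative only, not part of any standard tooling) that reproduces both aggregation stages from the raw implementation-level values, assuming the ISO/IEC 15504-2 achievement bands (N: 0–15%, P: >15–50%, L: >50–85%, F: >85–100%):

```python
# Reproduces the PA 2.2 example of Table 8: implementation-level outcome
# values are averaged across process instances (reflective specification),
# then aggregated to a PA percent value by the four candidate methods.
from statistics import mean, median

# Rows: outcomes 2.2a-2.2d; columns: process instances 1-4 (unit: %).
outcomes = [
    [86, 86, 86, 88],   # OC 2.2a
    [67, 86, 86, 87],   # OC 2.2b
    [45, 52, 55, 55],   # OC 2.2c
    [9, 16, 17, 17],    # OC 2.2d
]

# Process-level outcome values: arithmetic mean across process instances.
x = [mean(row) for row in outcomes]      # [86.5, 81.5, 51.75, 14.75]
w = [1 / len(x)] * len(x)                # equal weights, as in the example

def rating(p):
    """Map a percent value to N/P/L/F per the ISO/IEC 15504-2 bands."""
    return "N" if p <= 15 else "P" if p <= 50 else "L" if p <= 85 else "F"

saw = sum(wi * xi for wi, xi in zip(w, x))       # 58.6 -> L
wp = 1.0
for wi, xi in zip(w, x):
    wp *= xi ** wi                               # 48.2 -> P
conjugate = min(x)                               # 14.8 -> N
med = median(x)                                  # 66.6 -> L

for name, value in [("SAW", saw), ("WP", wp),
                    ("Conjugate", conjugate), ("Median", med)]:
    print(f"{name}: {value:.1f} ({rating(value)})")
```

Running the sketch prints the four aggregate values and ratings of Table 8, making the divergence among the methods directly visible.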
Table 9
Results by candidate Solutions 5 and 6. PI n = process instance n; the last column is the median across process instances (process-level outcome value).

Attribute achievement                                         PI 1   PI 2   PI 3   PI 4   Median
(OC 2.2a) requirements for the work products of the
    process are defined;                                        F      F      F      F    F
(OC 2.2b) requirements for documentation and control
    of the work products are defined;                           L      F      F      F    F
(OC 2.2c) work products are appropriately identified,
    documented, and controlled;                                 P      L      L      L    L
(OC 2.2d) work products are reviewed in accordance with
    planned arrangements and adjusted as necessary to
    meet requirements.                                          N      P      P      P    P

Aggregation method to PA 2.2                                  Aggregate value
Conjugate (Solution 5)                                        P
Median (Solution 6)                                           F or L
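For concreteness, the following minimal sketch shows how Solutions 5 and 6 (discussed below) operate on the ordinal ratings of Table 9, assuming the ordering N < P < L < F; the helper names are illustrative only:

```python
# Solutions 5 and 6 on the ordinal scale of Table 9. "Conjugate" takes
# the weakest rating; the median may not be unique for an even count.
ORDER = "NPLF"  # worst to best

def conjugate(ratings):
    return min(ratings, key=ORDER.index)

def ordinal_median(ratings):
    s = sorted(ratings, key=ORDER.index)
    n = len(s)
    lo, hi = s[(n - 1) // 2], s[n // 2]
    return lo if lo == hi else f"{lo} or {hi}"

process_level = ["F", "F", "L", "P"]    # OC 2.2a-2.2d from Table 9
print(conjugate(process_level))         # P        (Solution 5)
print(ordinal_median(process_level))    # L or F   (Solution 6, non-unique)
print(ordinal_median(list("FFNNN")))    # N        (the unrealistic case)
print(ordinal_median(list("FFFNN")))    # F
```

The non-unique median for PA 2.2 and the FFNNN case reproduce the two weaknesses of Solution 6 noted below.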
The conjugate aggregation, i.e., Solution 3, fully accommodates non-compensatory aggregation. This method assumes that all measures are essential activities that must be implemented for full achievement of the PA. It can be conceptualized in terms of Justus von Liebig's Law of the Minimum: "yield is proportional to the amount of the most limiting nutrient, whichever nutrient it may be" [50]. Thus, the conjugate method based on a ratio scale may also be an appropriate aggregation method for a PA rating. Adopting it amounts to a policy that pushes up low outcome values. However, the method may be too strict for use in private-sector assessments.

The median method, Solution 4, is similar in concept to the SAW method. However, it appears to offer no specific advantage in the process assessment context, because taking the median of ratio-scale values discards information.

As previously noted, the conjugate method on the ordinal scale (Solution 5) is likewise based on a policy that pushes up low outcome values. It is useful where Justus von Liebig's Law of the Minimum holds. SCAMPI A employs this method to aggregate process-level achievements to goal ratings.

Finally, Solution 6 is the median method on the ordinal scale, the counterpart of the SAW method on the ratio scale. However, it cannot accommodate weights and may not reach a unique rating, as shown in Table 9. It can also generate unrealistic results, such as FFNNN → N and FFFNN → F. Nevertheless, it may be possible to develop a variant aggregation method that is compatible with the formative measurement model, that is, one in which the dimensions are algebraically combined to form an overall representation of the construct [41]. As noted earlier, such an example is the current aggregation rule from PA ratings to CL ratings in ISO/IEC 15504-2. If this approach is adopted, its validity as an aggregation method should also be statistically tested.

9. Concluding remarks

This study addressed three issues arising in the CL determination hierarchy (from practice implementation-level characterizations to PA rating) in the SPICE assessment method. The issues were analyzed on the basis of measurement theory and two kinds of construct specification: reflective and formative.

As Hobbs [24, p. 384] noted, the appropriateness of an MADM method hinges on the question: "Is the method appropriate to the problem it is to be applied to, the people who will use it, and the institutional setting in which it will be implemented?" This question implies that selecting the most appropriate method is the responsibility of the process communities, including ISO. The six candidate solutions and their possible variants can serve as a starting point for discussions on determining the measurement scale and aggregation methods. The rationale and methods addressed in this study can also be applied in other domains to determine a composite (aggregate) value. Nevertheless, considering the utilization of the collected information and the ease of use, Solution 1 (ratio scale, arithmetic mean for implementation-level aggregation, and the SAW method for process-level aggregation) may be recommended for SPICE assessments. It requires weight assignments as well as outcome achievements. There are many methods for assigning weights, including the AHP [38,53,64].
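As an illustration of the eigenvector-based weighting step, the following is a minimal sketch in the spirit of Saaty's AHP [53]; the pairwise-comparison values below are hypothetical and would in practice come from assessor judgments:

```python
# A sketch of weight assignment for the SAW method via the AHP:
# the principal eigenvector of a pairwise-comparison matrix yields
# the weights. The matrix below is hypothetical, purely illustrative.
import numpy as np

# A[i][j] = judged importance of outcome i relative to outcome j
# (Saaty's 1-9 scale; reciprocal by construction).
A = np.array([
    [1.0,  2.0, 4.0, 4.0],
    [0.5,  1.0, 2.0, 2.0],
    [0.25, 0.5, 1.0, 1.0],
    [0.25, 0.5, 1.0, 1.0],
])

eigvals, eigvecs = np.linalg.eig(A)
k = np.argmax(eigvals.real)            # principal eigenvalue
w = np.abs(eigvecs[:, k].real)
w /= w.sum()                           # normalized weights
print(np.round(w, 3))                  # [0.5, 0.25, 0.125, 0.125]
```

Because this illustrative matrix is perfectly consistent, the eigenvector reproduces the ratio pattern of its first column; with real judgments, Saaty's consistency ratio would be checked before the weights enter the SAW aggregation.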
Future studies can be summarized as follows. SPICE is implicitly based on the characterization of outcome achievement; however, the differences in content between outcomes and practices appear insignificant, and in reality some assessments have been performed by evaluating practices rather than outcomes. Future studies should therefore investigate the differences in ratings between outcome-based and practice-based assessments. This study also considered only a limited number of MADM methods for aggregation. Since many MADM methods exist, further efforts may be required to examine other MADM methods with respect to ease of understanding and usability. Finally, the SPICE standardization group should statistically validate the assertions presented herein on the construct specification, i.e., (a) and (b) in Fig. 3.

References

[1] E.R. Babbie, The Practice of Social Research, Wadsworth/Thomson Learning, Inc., Belmont, CA, 2007.
[2] R.P. Bagozzi, Measurement and meaning in information systems and organizational research: methodological and philosophical foundations, MIS Quarterly 35 (2) (2011) 261–292.
[3] R. Bandura, A Survey of Composite Indices Measuring Country Performance: 2006 Update, A UNDP/ODS Working Paper, 2008, <http://goo.gl/x2upo>.
[4] H. Blalock Jr., Conceptualization and Measurement in the Social Sciences, Sage, Beverley Hills, CA, 1982.
[5] K.A. Bollen, Structural Equations with Latent Variables, Wiley, New York, 1989.
[6] K.A. Bollen, R. Lennox, Conventional wisdom on measurement: a structural equation perspective, Psychological Bulletin 110 (2) (1991) 305–314.
[7] K.A. Bollen, K.-F. Ting, A tetrad test for causal indicators, Psychological Methods 5 (1) (2000) 3–22.
[8] K.A. Bollen, Indicator: methodology, International Encyclopedia of the Social and Behavioral Sciences (2001) 7282–7287.
[9] K.A. Bollen, Evaluating effect, composite, and causal indicators in structural equation models, MIS Quarterly 35 (2) (2011) 359–372.
[10] T.A. Brown, Confirmatory Factor Analysis for Applied Research, The Guilford Press, New York, 2006.
[11] E. Carmines, R. Zeller, Reliability and Validity Assessment, Sage University Paper Series on Quantitative Applications in Social Sciences, Thousand Oaks, CA, 1979.
[12] A. Diamantopoulos, H.M. Winklhofer, Index construction with formative indicators: an alternative to scale development, Journal of Marketing Research 38 (2) (2001) 269–277.
[13] A. Diamantopoulos, The error term in formative measurement models: interpretation and modeling implications, Journal of Modelling in Management 1 (1) (2006) 7–17.
[14] A. Diamantopoulos, P. Riefler, K.P. Roth, Advancing formative measurement models, Journal of Business Research 61 (12) (2008) 1203–1218.
[15] A. Diamantopoulos, Incorporating formative measures into covariance-based structural equation models, MIS Quarterly 35 (2) (2011) 335–358.
[16] U. Ebert, H. Welsch, Meaningful environmental indices: a social choice approach, Journal of Environmental Economics and Management 47 (2) (2004) 270–283.
[17] J. Edwards, R. Bagozzi, On the nature and direction of relationships between constructs and measures, Psychological Methods 5 (2) (2000) 155–174.
[18] J.R. Edwards, The fallacy of formative measurement, Organizational Research Methods 14 (2) (2011) 370–388.
[19] P.M. Fayers, D.J. Hand, Causal variables, indicator variables and measurement scales: an example from quality of life, Journal of the Royal Statistical Society: Series A (Statistics in Society) 165 (2) (2002) 233–253.
[20] C. Fornell, F. Bookstein, Two structural equation models: LISREL and PLS applied to consumer exit-voice theory, Journal of Marketing Research 19 (4) (1982) 440–452.
[21] D.W. Gerbing, J.C. Anderson, An updated paradigm for scale development incorporating unidimensionality and its assessment, Journal of Marketing Research 25 (2) (1988) 186–192.
[22] J.B. Grace, K. Bollen, Representing general theoretical concepts in structural equation models: the role of composite variables, Environmental and Ecological Statistics 15 (2) (2008) 191–213.
[23] J.R. Hipp, D.J. Bauer, K.A. Bollen, Conducting tetrad tests of model fit and contrasts of tetrad-nested models: a new SAS macro, Structural Equation Modeling: A Multidisciplinary Journal 12 (1) (2005) 76–93.
[24] B.F. Hobbs, What can we learn from experiments in multiobjective decision analysis?, IEEE Transactions on Systems, Man, and Cybernetics 16 (3) (1986) 384–394.
[25] R.D. Howell, E. Breivik, J.B. Wilcox, Is formative measurement really measurement? Reply to Bollen (2007) and Bagozzi (2007), Psychological Methods 12 (2) (2007) 238–245.
[26] ISO/IEC 15504-1, Information Technology—Process Assessment—Part 1: Concepts and Vocabulary, ISO, 2004.
[27] ISO/IEC 15504-2, Information Technology—Process Assessment—Part 2: Performing an Assessment, ISO, 2003.
[28] ISO/IEC 15504-4, Information Technology—Process Assessment—Part 4: Guidance on Use for Process Improvement and Process Capability Determination, ISO, 2004.
[29] ISO/IEC 15504-5, Information Technology—Process Assessment—Part 5: An Exemplar Process Assessment Model, ISO, 2012.
[30] ISO/IEC 15504-6, Information Technology—Process Assessment—Part 6: An Exemplar System Life Cycle Process Assessment Model, ISO, 2008.
[31] ISO/IEC 15504-7, Information Technology—Process Assessment—Part 7: Assessment of Organizational Maturity, ISO, 2008.
[32] ISO/IEC 15504-8, Information Technology—Process Assessment—Part 8: An Exemplar Process Assessment Model for IT Service Management, ISO, 2012.
[33] ISO/IEC CD 33002, Information Technology—Process Assessment—Requirements for Performing Process Assessment, ISO/IEC JTC1/SC 7 WG10, 2012.
[34] ISO/IEC CD 33003, Information Technology—Process Assessment—Requirements for Process Measurement Frameworks, ISO/IEC JTC1/SC 7 WG10, 2012.
[35] ISO/IEC CD 33063, Information Technology—Process Assessment—Process Assessment Model for Software Testing, ISO/IEC JTC1/SC 7/WG 10, 2012.
[36] C.B. Jarvis, S.B. MacKenzie, P.M. Podsakoff, A critical review of construct indicators and measurement model misspecification in marketing and consumer research, Journal of Consumer Research 30 (2) (2003) 199–218.
[37] R. Johnson, C. Rosen, C.-H. Chang, To aggregate or not to aggregate: steps for developing and validating higher-order multidimensional constructs, Journal of Business and Psychology 26 (3) (2011) 1–8.
[38] H.-W. Jung, Rating the process attributes utilizing AHP in SPICE-based process assessments, Software Process Improvement and Practice 6 (2) (2001) 111–122.
[39] H.-W. Jung, Process attribute rating and sensitivity analysis in process assessment, Journal of Software: Evolution and Process 24 (8) (2012) 401–419.
[40] H.-W. Jung, K. Ting, Investigating the relationship between process capability and its measures: reflective or formative, submitted for publication.
[41] K.S. Law, C.-S. Wong, W.H. Mobley, Toward a taxonomy of multidimensional constructs, The Academy of Management Review 23 (4) (1998) 741–755.
[42] R.C. MacCallum, M.W. Browne, The use of causal indicators in covariance structure models: some practical issues, Psychological Bulletin 114 (3) (1993) 533–541.
[43] R.P. McDonald, Test Theory: A Unified Treatment, Lawrence Erlbaum, 1999.
[44] G. Munda, M. Nardo, Constructing Consistent Composite Indicators: The Issue of Weights, EUR 21834 EN, 2005.
[45] G. Munda, M. Nardo, Weighting and aggregation for composite indicators, in: Proceedings of the European Conference on Quality in Survey Statistics (Q2006), Cardiff, UK, 2006, <http://goo.gl/oXSYi>.
[46] G. Munda, M. Nardo, Noncompensatory/nonlinear composite indicators for ranking countries: a defensible setting, Applied Economics 41 (12) (2009) 1513–1523.
[47] OECD, Handbook on Constructing Composite Indicators: Methodology and User Guide, 2008, <http://goo.gl/cS7PY>.
[48] OECD, Human Development Report 2010, OECD, 2010, <http://goo.gl/dInix>.
[49] OECD, OECD e-Government Studies: Indicators Project, 2011, <http://goo.gl/sn8Yv>.
[50] R.R. Ploeg, W. Bohm, M. Kirkham, On the origin of the theory of mineral nutrition of plants and the law of the minimum, Soil Science Society of America Journal 63 (5) (1999) 1055–1062.
[51] S.A. Rijsdijk, E.J. Hultink, A. Diamantopoulos, Product intelligence: its conceptualization, measurement and impact on consumer satisfaction, Journal of the Academy of Marketing Science 35 (3) (2007) 340–356.
[52] T.P. Rout, K. El Emam, M. Fusani, D. Goldenson, H.-W. Jung, SPICE in retrospect: developing a standard for process assessment, Journal of Systems and Software 80 (9) (2007) 1483–1493.
[53] T.L. Saaty, How to make a decision: the analytic hierarchy process, European Journal of Operational Research 48 (1) (1990) 9–26.
[54] A. Saltelli, Composite indicators between analysis and advocacy, Social Indicators Research 81 (1) (2007) 65–77.
[55] A. Saltelli, M. Ratto, T. Andres, F. Campolongo, J. Cariboni, D. Gatelli, M. Saisana, S. Tarantola, Global Sensitivity Analysis: The Primer, John Wiley & Sons, West Sussex, UK, 2008.
[56] SEI, Appraisal Requirements for CMMI® Version 1.3 (ARC, V1.3), Software Engineering Institute, Carnegie Mellon University, 2011, <http://goo.gl/neIaw>.
[57] SEI, Standard CMMI® Appraisal Method for Process Improvement (SCAMPI(SM)) A, Version 1.3: Method Definition Document (CMU/SEI-2011-HB-001), Software Engineering Institute, Carnegie Mellon University, 2011, <http://goo.gl/FoOZA>.
[58] SPICE Trials, SPICE Phase 2 Trials Final Report, ISO/IEC JTC1/SC7/WG10, 2003, <http://goo.gl/V3jQJ>.
[59] S.S. Stevens, Mathematics, measurement, and psychophysics, in: S.S. Stevens (Ed.), Handbook of Experimental Psychology, Wiley, New York, 1951, pp. 1–49.
[60] K.-F. Ting, Confirmatory tetrad analysis in SAS, Structural Equation Modeling 2 (2) (1995) 163–171.
[61] W. Trochim, J.P. Donnelly, Research Methods Knowledge Base, Atomic Dog Publishing, 2001.
[62] H.R. Varian, Intermediate Microeconomics: A Modern Approach, seventh ed., WW Norton, New York, 2006.
[63] J. Wilcox, R. Howell, E. Breivik, Questions about formative measurement, Journal of Business Research 61 (12) (2008) 1219–1228.
[64] K.P. Yoon, C.-L. Hwang, Multiple Attribute Decision Making: An Introduction, Sage University Paper Series on Quantitative Applications in Social Sciences, Thousand Oaks, CA, 1995.
[65] M. Zeleny, Multiple Criteria Decision Making, McGraw-Hill, New York, 1982.
[66] P. Zhou, B. Ang, K. Poh, Comparing aggregating methods for constructing the composite environmental index: an objective measure, Ecological Economics 59 (3) (2006) 305–311.
[67] P. Zhou, B. Ang, Comparing MCDA aggregation methods in constructing composite indicators using the Shannon–Spearman measure, Social Indicators Research 94 (1) (2009) 83–96.