Investigating Measurement Scales and Aggregation Methods in SPICE

H.-W. Jung
Article history: Received 7 September 2011; Received in revised form 8 December 2012; Accepted 5 February 2013; Available online 18 February 2013.

Keywords: SPICE; ISO/IEC 15504; Aggregation method; MADM; Formative measurement model; Reflective measurement model

Abstract

Context: This study identified three issues in the SPICE (Software Process Improvement and Capability dEtermination) assessment method based on ISO/IEC 15504-2 (Performing an assessment). The issues comprise the lack of a measurement scale for characterizing the extent to which an outcome (practice) is achieved (implemented) and two shortcomings of the aggregation methods used to generate a process attribute (PA) rating. Such issues may undermine the need for consistent assessment results that are comparable within and across assessed organizations.

Objective: The purpose of this study is to identify issues, such as the measurement scale and aggregation methods, in SPICE assessment methods and to provide candidate solutions based on measurement theories, while retaining the current rating scales for PAs and capability.

Method: For those purposes, the present study reviews scale types based on measurement theory and uses the reflective and formative measurement models to establish the relationships between PAs and practices. Composite measure development methods that depend on those relationships are then proposed on the basis of appropriate aggregation methods drawn from multiple attribute decision making (MADM).

Results: Six candidate solutions are presented along with their strengths and weaknesses from practical and theoretical perspectives. Two examples are given to illustrate and interpret the six candidate solutions for the issues identified. Applying the six candidate solutions to the examples shows that the measurement scale and the aggregation methods influence the PA rating.

Conclusion: The process community, including the SPICE standardization group, can initiate discussions to determine the measurement scale and the aggregation methods, starting from our six candidate solutions. The rationale and methods addressed in this study can also be applied to other domains in order to derive a composite (aggregate) value or rating.

© 2013 Elsevier B.V. All rights reserved.
The construct can be estimated using its measures in the composite measure development. The construct estimator, also known as the composite measure, is derived (aggregated) by an appropriate combination of measures. In PAMs, a set of process attributes (PAs) is defined as measures for indirectly measuring the process capability. The CL, assigned a value from 0 to 5, is a composite value aggregated from a set of PA ratings. A PA rating, as a composite value, is in turn determined by aggregating a set of characterized outcome values. This hierarchy, as shown in Fig. 3, is a multidimensional structure, in which the term multidimensional implies "a construct [that] consists of a number of interrelated attributes or dimensions and exists in multidimensional domains" [41, p. 741].

The relationship between a multidimensional construct and its dimensions, as well as the relationship in measurement models, can be explained by construct specification, such as the reflective measurement (or measure) model or the formative measurement (or measure) model [2,4,6,9,12,14,17,20,42]. The former suggests that the latent construct is a cause of the observable measures, whereas the latter implies that the observable measures are causes of the latent concept.

In order to develop composite measures, the aggregation methods depend on the construct specification, i.e., the reflective or formative measurement model [1,19,22,25,47,51,54,55,61,63]. They are also influenced by the evaluation policy, the purpose of the composite measure, and the measurement scale [34]. According to Diamantopoulos et al. [14], construct misspecification (the incorrect adoption of the reflective or formative measurement model) is not uncommon, with an occurrence of around 30% across top-tier journals.

The purpose of this study is to identify issues related to the measurement scale and aggregation methods used to generate PA ratings in a process, and then to provide a set of candidate solutions to cope with them from theoretical and practical perspectives. Three issues are identified: the lack of a measurement scale for characterizing the extent of outcome achievement (or practice implementation) and two shortcomings related to the lack of theory-based aggregation methods. Since the measurement scale and aggregation methods can influence PA and CL ratings, these issues should be resolved to obtain consistent assessment results comparable within and across assessed organizations.4 It is required that assessment methods be designed to "provide benchmark quality ratings…" [52, p. 9]. For such purposes, this study utilizes measurement theory [1,19,22,25,47,51,54,55,61,63] and multiple attribute decision making (MADM) methods [44–46,64] in the development of the current PA ratings. Six candidate solutions are presented along with their strengths and weaknesses. Two examples are given to illustrate and explain the candidate solutions.

4 Assessed organizations that are part or all of an organization are called organizational units [26]. The ISO glossary can also be accessed in the Software and Systems Engineering Vocabulary (SEVOCAB) (www.computer.org/sevocab) and Systems and Software Engineering – Vocabulary (ISO/IEC/IEEE 24765:2010).

This study is organized as follows. Section 2 introduces a brief overview of SPICE, construct specification, and construct examination. Section 3 addresses the construct specifications in the CL determination hierarchy. Section 4 identifies the three associated issues. Section 5 describes the first issue, the lack of a measurement scale. Sections 6 and 7 analyze the aggregation methods for practice implementation-level and process-level outcome values, respectively. Section 8 provides candidate solutions for the three issues and then provides two examples. Finally, Section 9 presents concluding remarks as well as suggestions for further studies.

[Fig. 1. The two-dimensional architecture of ISO/IEC 15504-5. The capability dimension comprises Level 0: Incomplete process; Level 1: Performed process (PA 1.1 Process performance); Level 2: Managed process (PA 2.1 Performance management, PA 2.2 Work product management); Level 3: Established process (PA 3.1 Process definition, PA 3.2 Process deployment); Level 4: Predictable process (PA 4.1 Process measurement, PA 4.2 Process control); and Level 5: Optimizing process (PA 5.1 Process innovation, PA 5.2 Continuous optimization). The process dimension comprises the System Life Cycle Processes and the Software Life Cycle Processes.]

2. Background

2.1. Two-dimensional architecture

This study uses ISO/IEC 15504-5 [29] in order to explain PAMs because it is the most popular and well-known standard. The structure of ISO/IEC 15504-5 can be expressed as process and capability dimensions, as shown in Fig. 1. The process dimension includes two process categories: System Lifecycle Processes, consisting of four Process Groups of 42 processes, and Software Lifecycle Processes, containing three Process Groups of 17 processes [29].

As shown in Fig. 1, the capability dimension consists of six CLs in a range from zero to five. CL 1 includes one PA, and CLs 2 to 5 each comprise two PAs that work collectively in order to significantly enhance the capability of performing a given process. Each process in a PAM can be assessed and rated up to CL 5. This is known as the continuous model.5

5 The model with the ordered capability for individual processes is called a continuous representation, while the model with an ordered set of related processes that defines (organizational unit) maturity is referred to as a staged model, such as ISO/IEC 15504-7 and CMMI maturity. Note that this study is limited to ISO/IEC 15504-5 with capability as a continuous model.

2.2. Process attribute and capability level ratings

While the best practices are included in PAMs, the ratings of PA and CL belong to the assessment method described in ISO/IEC 15504-2. Each PA in ISO/IEC 15504-5 is comprised of outcomes (results of achievements), base practices (generic practices), work products (generic work products), and others, where the terms in parentheses are reserved for PA 2.1–PA 5.2. As an example, Table 1 shows the outcomes and generic practices of PA 2.2 (Work Product Management). For the sake of simplicity, this study employs the terms outcomes, practices, and work products.

PA achievement, as an aggregate (composite) value, is the aggregation of outcome values. According to Table 2, the aggregate value is then transformed into a PA rating on an ordinal scale: F (Fully achieved), L (Largely achieved), P (Partially achieved), or N (Not achieved). This study refers to them as NPLF.
Table 1
PA 2.2 Work product management attributes (cited from ISO/IEC 15504-5).

Outcomes* (achievement) of PA 2.2:
(OC 2.2a) Requirements for the work products of the process are defined.
(OC 2.2b) Requirements for documentation and control of the work products are defined.
(OC 2.2c) Work products are appropriately identified, documented, and controlled.
(OC 2.2d) Work products are reviewed in accordance with planned arrangements and adjusted as necessary to meet requirements.

Generic practices of PA 2.2:
GP 2.2.1 Define the requirements for the work products.
GP 2.2.2 Define the requirements for documentation and control of the work products.
GP 2.2.3 Identify, document, and control the work products.
GP 2.2.4 Review and adjust work products to meet the defined requirements.

* In this study, the outcome numbering of the 15504-5 standard was changed to denote the outcomes explicitly, i.e., (PA 2.2, a) → OC 2.2a.
Table 2
The rating scale of process attributes (cited from ISO/IEC 15504-2).

N (Not achieved): 0–15% achievement
P (Partially achieved): >15–50% achievement
L (Largely achieved): >50–85% achievement
F (Fully achieved): >85–100% achievement
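As a minimal sketch, the NPLF transformation of Table 2 can be expressed as a threshold function; the function name and the boundary handling here are illustrative assumptions, not part of the standard.

    def pa_rating(achievement: float) -> str:
        """Map a PA achievement percentage (0-100) to the NPLF
        ordinal rating of ISO/IEC 15504-2 (thresholds from Table 2)."""
        if achievement > 85:
            return "F"  # Fully achieved: > 85% to 100%
        if achievement > 50:
            return "L"  # Largely achieved: > 50% to 85%
        if achievement > 15:
            return "P"  # Partially achieved: > 15% to 50%
        return "N"      # Not achieved: 0% to 15%

    print(pa_rating(58.6))  # -> "L"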
SPICE assessment is implicitly based on the characterization of outcome achievement: because a practice may result in one or more outcomes, measuring each practice's outcomes may double-count each outcome.6 The practice implementation-level outcomes become the base measures in an assessment of multiple process instances. If a single process instance is assessed, the process-level outcomes become the base measures. The practice implementation-level and process-level are explained in detail in Section 3.

6 On the other hand, the assessment method built on practice implementation is the Standard CMMI Appraisal Method for Process Improvement (SCAMPI) [56] for CMMI appraisals [57], where SCAMPI is an instantiation of the Appraisal Requirements for CMMI (ARC) that defines the requirements for appraisal methods intended for use with CMMI and with the People CMM (P-CMM) [56].

ISO/IEC 15504-2 provides a rule to aggregate PA ratings in order to derive the CL. A process in the assessment of a single instance achieves CL k if all PAs below CL k are rated F and the level-k PAs are rated either F or L. Since all aggregation methods analyzed and suggested by this study are concerned with reaching the current PA ratings, the current PA rating scale and the rule for aggregating PA ratings to a CL in ISO/IEC 15504-2 are not affected at all.

2.3. Construct specification

Fig. 2 shows two path diagrams of measurement models in conventional structural equation modeling (SEM) [5], where constructs are represented as ovals, observed measures as rectangles, causal paths as single-headed arrows, and correlations as double-headed arrows. A measurement model can be interpreted by the construct-measures relationship, known as a reflective or a formative measurement model.

In the reflective measurement model or reflective specification, shown in Fig. 2a, a construct is regarded as the cause of the measures. Thus, variation in a construct results in variation in its measures, and the measures reflect the same underlying construct [6,17]. Hence, measures are expected to co-vary with one another. Since measures, which are supposed to be sampled from the same conceptual domain, also have the same or similar contents and share a common theme, dropping any one measure cannot change the conceptual domain of the construct; therefore, the measures are interchangeable.

In Fig. 2a, the reflective construct ξ (ksi) denotes an underlying factor of four reflective measures x_i. The construct-measure relationship is represented by the following set of equations:

    x_i = λ_i ξ + δ_i,    (1)

where coefficient λ_i (lambda) is the expected impact of a one-unit difference in ξ on x_i, and the random error term δ_i (delta) is the measurement error.

In the formative measurement model or formative specification, shown in Fig. 2b, the formative construct can be regarded as an index generated by the observed measures, i.e., each measure is a determinant of the latent construct [6,17]. Formative measures are not interchangeable, and each measure captures a specific aspect of the conceptual domain of the construct. Thus, omitting any measure may alter the conceptual domain; moreover, formative measures need not co-vary with one another.

The formative construct η (eta) can be represented as follows:

    η = γ_1 x_1 + ⋯ + γ_q x_q + ζ,    (2)

where η is the construct being estimated by its formative measures x_i, coefficient γ_i (gamma) denotes the effect of measure x_i on the latent variable η, and the disturbance term ζ (zeta) denotes the effect on η of measures omitted from the model. Note that in Eq. (2), η is "the composite that best predicts the dependent variable in the analysis" [17, p. 158].

The formative construct in Eq. (2) can be represented with no error, i.e., the disturbance term ζ is assumed to be zero, as follows:

    C = γ_1 x_1 + ⋯ + γ_q x_q.    (3)

This equation works as a multiple attribute decision making (MADM) method, i.e., a composite measure C is determined by a set of formative measures weighted by the importance or priority of those measures.7

7 Bollen [9] used the term causal indicators for the measures in (2) and formative indicators for the measures in (3). For consistency with other studies, this study uses the term formative measure for both cases. However, we clearly state the difference where appropriate.
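To make Eq. (3) concrete, the following minimal sketch computes a formative composite as a weighted sum of measures; the weights and measure values are illustrative assumptions, not taken from any PAM.

    def formative_composite(weights, measures):
        """Eq. (3): C = gamma_1*x_1 + ... + gamma_q*x_q, a formative
        composite with the disturbance term assumed to be zero."""
        if len(weights) != len(measures):
            raise ValueError("one weight per measure is required")
        return sum(g * x for g, x in zip(weights, measures))

    # Four formative measures with hypothetical importance weights.
    gammas = [0.4, 0.3, 0.2, 0.1]
    xs = [80, 60, 90, 50]
    print(formative_composite(gammas, xs))  # -> 73.0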
[Fig. 2. Relationship between a construct and its measures (adapted from ISO/IEC CD 33003 [34]). Panel (a) shows a reflective construct ξ with loadings λ1–λ4 pointing to measures x1–x4, each with an error term δ1–δ4; panel (b) shows a formative construct η receiving weights γ1–γ4 from measures x1–x4.]

[Fig. 3. The hierarchy from implementation-level characterizations to a PA (reconstructed from figure residue): implementation-level outcomes OC 2.2a(1)…OC 2.2a(n) through OC 2.2d(1)…OC 2.2d(n), each with a measurement error δ, relate to the process-level outcomes as reflective measures (1st order); the process-level outcomes, with disturbance terms ζ1–ζ5, combine formatively (2nd order).]
MADM with one alternative is the same as an aggregation method for deriving the value of a composite measure; note that one alternative implies a single case. There are many MADM methods for aggregating a set of measures in order to derive a composite measure [64]. Detailed discussions of Eq. (3) can be found in [13,15].

2.4. Construct examination

Two methods exist for investigating the construct specification. First, when data on the measures of the construct of interest are not available (e.g., at the beginning of the development of best practices for a construct of interest), the construct specification is examined through mental experiments [5] and/or decision rules [36]. Table 3, developed on the basis of several studies [5,14,17,18,36,63], presents the decision rules for construct specification.

Second, when data are available (e.g., during or after trials), it is possible to statistically test whether a construct is reflective or formative by using one of two confirmatory tetrad analysis (CTA) SAS macros: SAS-CTA2 [7,60] or CTANEST1 [23]. The former can be directly utilized for a construct with four or more measures, whereas a simulation is required if fewer than four measures are used. The latter macro, CTANEST1, can test more complex models and non-continuous measures, but requires model identification. Jung and Ting [40] introduced examples for testing the construct specification between capability and its process attributes by utilizing those macros and simulation. The decision rules and CTA should be used to validate construct specifications in multidimensional constructs as well [37].

2.5. Composite measure development

Composite measure development depends on the construct specifications of the multidimensional construct [22,47,51,54,55,61]. However, its purpose is not to find an SEM solution. The composite measure construction merely shares the same steps as an SEM development, from the theoretical model development to the operationalization of the model constructs [5,47]. A composite measure in Eq. (3) is only related to the aggregation method. Many MADM methods [64] can be applied to formative constructs. The creation of composite measures is described in Section 7.

3. Construct specifications in process capability

This section analyzes the hierarchy of the SPICE multidimensional construct in order to determine whether the constructs in the hierarchy are reflective or formative. The analysis results are utilized …
Table 3
Decision rules for determining construct specification (cited from ISO/IEC CD 33003).
Table 5
Issues from the CL determination hierarchy.

Issue 1: What measurement scale is used in characterizing the extent to which an outcome is achieved at the practice implementation-level?
Issue 2: How can a set of practice implementation-level outcome values be aggregated to a process-level one in the assessment of multiple process instances? That is, (a) in Fig. 3.
Issue 3: How can a set of process-level outcome values be aggregated to a PA-level percentage or rating? That is, (b) in Fig. 3.
…congeneric solution, in which the measures of a given factor have equal loadings but differing error variances. The most restrictive model treats measures as parallel, in which the observed measures are posited to have equal factor loadings and equal error variances.

These three models can be explained with simple mathematical notation. From Eq. (1), two measures can be presented as follows:

    x_i = λ_i ξ_i + δ_i,
    x_j = λ_j ξ_j + δ_j,  j ≠ i.    (4)

Assume that the error terms δ_i and δ_j are not correlated and that the true score is the same (i.e., ξ_i = ξ_j). Then, the following statements hold for (4):

- If λ_i ≠ λ_j and var(δ_i) ≠ var(δ_j), then x_i and x_j are congeneric measures.
- If λ_i = λ_j and var(δ_i) ≠ var(δ_j), then x_i and x_j are tau-equivalent measures.
- If λ_i = λ_j and var(δ_i) = var(δ_j), then x_i and x_j are parallel measures.

Parallel measures are assumed to measure the latent construct with the same level of precision. Thus, the measures are interchangeable. Also, a latent construct can be estimated by averaging the observed values of the measures without considering weights [43]. If the measures are congeneric, then a weighted average may be reasonable because each measure has a different loading. For tau-equivalent measures, an average without weights may be reasonable.

Reflective measures are expected to have similar values in measurement. Brown [10], Carmines and Zeller [11], and Gerbing and Anderson [21] recommended an equal-weight average as a composite measure of a reflective construct. Therefore, this study recommends the average as the aggregation method for the process-level characterization in the case of practice implementation-level characterization on a ratio scale. However, since outliers may produce an unrealistic average, an examination may be required to detect outliers; true outliers may be removed from the aggregation.

If the outcomes at the practice implementation-level are characterized on an ordinal scale, such as NPLF, as discussed in Section 5.2, then the median (middle number) can be used instead of the average. However, the median may not be a good aggregation for generating a process-level characterization because of its limitations. As an extreme example, assume that the ordinal ratings of the four outcomes of PA 2.2 are F, F, N, and F. In this case, the median is F and entirely masks the N rating, making it an impractical composite value.
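A minimal sketch of these implementation-level aggregations follows; the sample values are hypothetical, and the loading-based weights for the congeneric case are an illustrative assumption.

    from statistics import mean, median

    # Hypothetical implementation-level outcome values (ratio scale, 0-100)
    values = [86, 86, 86, 88]

    # Parallel or tau-equivalent measures: an unweighted average suffices
    print(mean(values))  # -> 86.5

    # Congeneric measures: weight each value by its (assumed) loading
    loadings = [0.9, 0.8, 0.8, 0.7]
    weights = [l / sum(loadings) for l in loadings]
    print(sum(w * v for w, v in zip(weights, values)))  # weighted average

    # Ordinal NPLF characterizations: take the median on rank numbers
    rank = {"N": 0, "P": 1, "L": 2, "F": 3}
    label = {v: k for k, v in rank.items()}
    ratings = ["F", "F", "N", "F"]
    m = median(rank[r] for r in ratings)  # -> 3.0
    print(label[int(m)])  # -> "F": the single N rating is masked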
7. Aggregation of process-level outcome values

This section presents the theoretical analyses for issue 3 in Table 5, which is as follows:

Issue 3 (Aggregation in the formative measurement model): How can a set of process-level outcome values be aggregated to a PA-level percentage or rating?

Yoon and Hwang [64] presented a taxonomy of 13 MADM methods.8 Those methods can be utilized to aggregate a set of values in order to derive a composite value and to rank alternatives when there are multiple alternatives. Zhou and Ang [67] examined previous studies in order to develop composite measures for ranking alternatives by using MADM methods and data envelopment analysis (DEA).

Munda and Nardo [46] addressed the advantages of the non-compensatory MADM approach in constructing a composite measure for ranking alternatives: if trade-offs among attribute values are permitted, the approach is referred to as compensatory; otherwise, it is non-compensatory. These studies implicitly assumed a formative measurement model. Although the formative specification is not explicitly stated, most of the composite measures listed in [3] have also been developed on the basis of MADM methods for the formative measurement model.

The MADM method can also be utilized to aggregate a set of attribute values of one object to the value of a composite measure with one alternative (a single case), which is the context of this study. Thus, only three of the thirteen MADM methods in Yoon and Hwang [64] are theoretically and practically applicable to the aggregation of process-level outcome values: the simple additive weighting (SAW) method, the weighted product (WP) method, and conjugate methods.

This author [38] addressed the MADM SAW method for aggregating a set of process-level outcome values to derive a PA outcome in the assessment of a single instance. This author also performed a sensitivity analysis of the SAW method in order to evaluate how much a change in the weight or performance value of an implemented practice would affect and alter the current PA rating [39]. In the SAW method, weight assignment was determined on the basis of the analytic hierarchy process (AHP) [53]. In AHP, a problem is broken down into a hierarchy of interrelated decision criteria (attributes); then, the aggregate value of the criteria is formed on the basis of their weights and achievements. AHP in the taxonomy can be utilized as a weight assignment method for the SAW or WP method.

However, the SAW method is only one of the alternative aggregation methods suggested in this study; other methods should be equally investigated. Of course, there are more than the three MADM methods described. This study does not consider the optimization approach, which can only be applied to a problem with more than one alternative. In the next section, the theoretical background of the three methods is introduced with minimal mathematics.

7.2. Simple additive weighting method

To utilize Eq. (3) conveniently, the weights are normalized so that their sum becomes one; the SAW method is represented with popular notation in the MADM community as follows:

    V = Σ_{j=1}^{q} w_j a_j,  where  Σ_{j=1}^{q} w_j = 1,    (5)
As an example, suppose w_1 = 0.66 and w_2 = 0.33. Then, an indifference exists for the trade between 1 unit of a_1 and 2 units of a_2. The rationale of the SAW method is that, since the concept of substitutability is compatible with reflective measurement measures, the SAW method can be a candidate method for aggregation.

The range of the conjugate value and the corresponding aggregated PA rating:
>0–15%: N (Not achieved)
>15–50%: P (Partially achieved)
>50–85%: L (Largely achieved)
>85–100%: F (Fully achieved)
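A sketch of Eq. (5) and the weight trade-off follows; the example weights w = (0.66, 0.33) mirror the text, while the function name and the attribute values are illustrative assumptions.

    def saw(weights, values):
        """Eq. (5): V = sum_j w_j * a_j, with weights normalized to sum to 1."""
        total = sum(weights)
        return sum((w / total) * a for w, a in zip(weights, values))

    # Trade-off example from the text: with w1 = 0.66 and w2 = 0.33,
    # giving up 1 unit of a1 for 2 units of a2 leaves V unchanged.
    w = [0.66, 0.33]
    print(saw(w, [10, 10]))  # baseline -> 10.0
    print(saw(w, [9, 12]))   # 1 unit of a1 traded for 2 units of a2 -> 10.0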
Table 7
The combination of the measurement scale and the four aggregation methods.
If the process instances assessed are from diverse projects, depending on a sampling policy, tau-equivalent or parallel measures may not be expected. For the ratio scale, assuming congeneric measures results in the use of the arithmetic average without weights as an aggregation method. The average of ratio-scale values is also on a ratio scale. Note that the average automatically holds for tau-equivalent or parallel measures. The counterpart of the average on the ordinal scale is the median; problems related to the median method are discussed later.

[Resolution of issue 3: fifth column in Table 7]. Since the relationship between a PA and its process-level outcome values is assumed to follow a formative measurement model, three aggregation methods (SAW, WP, or conjugate) are applicable. These methods aggregate the process-level outcome values to a percentage of PA achievement. According to Table 2, the percentage is then transformed to a PA rating. The conjugate method is applicable to the ordinal scale as well. The median method, which shares the concept of the SAW method, may also be considered.

8.2. Examples and interpretations

8.2.1. Examples

Two examples are given to illustrate the six candidate solutions in Table 7. Each of the solutions may yield different results when applied to the same problem.

The first example, determining the PA 2.2 rating in Table 8, is developed to illustrate candidate Solutions 1-4, which are related to a ratio scale [0,1] (resolution of issue 1). It includes PA 2.2 implementation-level outcome values (italic) for four outcomes in a process. From the construct specification perspective, the second row of the first column in the example, i.e., "2.2a) requirements for the work products of the process are defined", is a reflective construct, and its measures (four outcomes from the process instances assessed) have outcome values of (86, 86, 86, 88) as implementation-level characterizations. The remaining rows, 2.2b), 2.2c), and 2.2d), can be interpreted in the same way.

In the last column of Table 8, the boldface values (86.5, 81.5, 51.75, 14.75)^T represent process-level outcome values obtained by arithmetic-mean aggregation due to the reflective specification. This corresponds to (a) in Fig. 3, and it is a ratio scale.

The last four rows in Table 8 show the aggregate values of the four process-level characterizations under the four aggregation methods. For the sake of simplicity, equal weights are assumed in the SAW and WP methods. In the SAW method, the aggregate value of 58.6% results in a rating of "Largely" for PA 2.2. The WP method provides an aggregate value of 48.2%, i.e., (86.5 × 81.5 × 51.8 × 14.8)^0.25, which results in a rating of "Partially". The conjugate method gives a rating of "Not achieved" at 14.8%, i.e., min(86.5, 81.5, 51.8, 14.8). The median method assigns a rating of "Largely", i.e., median(86.5, 81.5, 51.8, 14.8) = (81.5 + 51.8)/2 = 66.6%. The rating of PA 2.2 thus varies among "Largely", "Partially", and "Not achieved", depending on the aggregation method.

The second example, in Table 9, is a case for candidate Solutions 5 and 6 with ordinal NPLF ratings. The implementation-level characterizations (italic) of the outcomes have ordinal NPLF ratings (resolution of issue 1). The last column represents aggregate values obtained by the median for the reflective measurement model, based on the resolution of issue 2. A conjugate method, as a non-compensatory method, is applied to the column of median aggregations (resolution of issue 3); its result is a rating of "Partially" for PA 2.2, i.e., min(F, F, L, P). Alternatively, the median as the aggregation method at the process level results in a rating of "Fully" or "Largely". As the example shows, the conjugate and the median methods have very limited usability.

8.2.2. Interpretations

The weaknesses and strengths of the solutions can be revealed as follows. The SAW method, Solution 1 in Table 7, implies full compensability and rewards measures proportionally to the weights; it always assumes complete substitutability among the various outcomes. The WP method, Solution 2, entails a non-constant compensability and gives greater rewards to higher values. That is, process assessments with low values in some characterizations would achieve a higher aggregate value under the SAW method than under the WP method.

However, such compensability in the SAW and WP methods may not be practical in the formative aggregation context, because practices, outcomes, and achievements are not developed with the concept of substitutability. Furthermore, weights are used as importance coefficients in the process context but are equivalent to trade-off ratios in the SAW and WP methods. Therefore, a theoretical inconsistency exists, as noted by the OECD [47].

The SAW method assumes that the contribution of an individual attribute to the total value is independent of the other attribute values. However, the outcomes assessed, based on practices, work products, and resources, may be partially related to each other because a practice may be an ascendant or descendant practice of others. Furthermore, within a process, the total failure of other outcomes while one outcome is fully achieved may not be common, owing to communication and coordination among the project staff. This is not consistent with the assumptions of the SAW method.

However, even if the underlying assumption of the SAW method does not fully hold in the process context, the SAW method is known to yield extremely close approximations to "true" value functions even when independence among attributes does not exactly hold [64]. In addition, considering their transparency, ease of understanding, and usability, the SAW and WP methods can be good alternatives for aggregating process-level outcome values to a PA percentage value. However, considering the insufficient experience acquired thus far with the WP method in the SPICE community, the SAW method may be more appropriate, i.e., Solution 1 looks more favorable than Solution 2. Weight assignment in the SAW method can be solved by using AHP [53], as seen in previous studies [38,39].
Table 8
Example 1: results by candidate Solutions 1-4 (unit: %).

Table 9
Results by candidate Solutions 5 and 6.
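Ahead of the discussion of Solutions 3-6 below, a sketch of the two ordinal process-level aggregations of Example 2 (Table 9) follows; the ratings min(F, F, L, P) are those quoted in the text, and the rank encoding is an assumption of this sketch.

    from statistics import median

    RANK = {"N": 0, "P": 1, "L": 2, "F": 3}
    LABEL = {v: k for k, v in RANK.items()}

    def conjugate_ordinal(ratings):
        """Solution 5: non-compensatory minimum of ordinal NPLF ratings."""
        return min(ratings, key=RANK.get)

    def median_ordinal(ratings):
        """Solution 6: median on rank numbers; with an even count the
        median can fall between two ratings, so no unique label exists."""
        m = median(RANK[r] for r in ratings)
        if m in LABEL:
            return LABEL[m]
        return f"between {LABEL[int(m)]} and {LABEL[int(m) + 1]}"

    ratings = ["F", "F", "L", "P"]
    print(conjugate_ordinal(ratings))  # -> "P": PA 2.2 rated Partially
    print(median_ordinal(ratings))     # -> "between L and F": no unique rating

    # The unrealistic cases noted in the text:
    print(median_ordinal(["F", "F", "N", "N", "N"]))  # -> "N"
    print(median_ordinal(["F", "F", "F", "N", "N"]))  # -> "F"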
The conjugate aggregation, i.e., Solution 3, fully accommodates non-compensatory aggregation. This method assumes that all measures are essential activities that must be implemented for full achievement of the PA. This can be conceptualized in terms of Justus von Liebig's Law of the Minimum: "yield is proportional to the amount of the most limiting nutrient, whichever nutrient it may be" [50]. Thus, the conjugate method based on a ratio scale may also be an appropriate aggregation method for a PA rating. Adopting this method reflects a policy that pushes up low outcome values. However, this method may be too strict for use in private-sector assessments.

The median method, Solution 4, has a concept similar to that of the SAW method. However, it appears to have no specific advantage in the process assessment context, owing to the information lost by taking the median of ratio-scale values.

As previously noted, the conjugate method for ordinal measurement (Solution 5) is based on a policy that pushes up a low outcome value. This is a useful method where Justus von Liebig's Law of the Minimum holds. SCAMPI A employs this method to aggregate process-level achievements into goal ratings.

Finally, Solution 6 is the median method for the ordinal scale, which has the same concept as the SAW method on the ratio scale. However, it cannot consider weights and may not reach a unique rating, as shown in Table 9. It sometimes generates unrealistic results, such as FFNNN → N and FFFNN → F. However, it may be possible to develop a variant aggregation method compatible with the formative measurement model, in which the dimensions are algebraically combined to form an overall representation of the construct [41]. As noted earlier, such an example is the current aggregation rule from PA ratings to CL ratings in ISO/IEC 15504-2. If this approach is adopted, its validity as an aggregation method should also be statistically tested.

9. Concluding remarks

This study addressed three issues arising in the CL determination hierarchy (from practice implementation-level characterizations to the PA rating) in the SPICE assessment method. The issues were analyzed on the basis of measurement theory and two kinds of construct specification: reflective and formative.

As Hobbs [24, p. 384] noted, the appropriateness of MADM is related to the question: "Is the method appropriate to the problem it is to be applied to, the people who will use it, and the institutional setting in which it will be implemented?" This question implies that the selection of the most appropriate method is the responsibility of the process communities, including ISO. The six candidate solutions and their possible variants can be a starting point for discussions to determine the measurement scale and aggregation methods. The rationale and methods addressed in this study can also be applied to other domains in order to determine a composite (aggregate) value. However, considering the utilization of the collected information and the ease of use, Solution 1 (ratio scale, average for implementation-level aggregation, and the SAW method for process-level aggregation) may be recommended for SPICE assessments. It requires weight assignments as well as outcome achievements.
There are many methods for assigning weights, including the AHP [38,53,64].

Future studies can be summarized as follows. SPICE is implicitly based on the characterization of outcome achievement; however, the differences in content between outcomes and practices appear insignificant, and in reality some assessments were performed by evaluating practices rather than outcomes. Thus, future studies should investigate the differences in ratings between outcome-based and practice-based assessments. This study considered a limited number of MADM methods for aggregation; since there are many MADM methods, further efforts may be required to consider others, taking into account ease of understanding and usability. Finally, the SPICE standardization group should statistically validate the assertions presented herein on the construct specification, i.e., (a) and (b) in Fig. 3.

References

[1] E.R. Babbie, The Practice of Social Research, Wadsworth/Thomson Learning, Inc., Belmont, CA, 2007.
[2] R.P. Bagozzi, Measurement and meaning in information systems and organizational research: methodological and philosophical foundations, MIS Quarterly 35 (2) (2011) 261–292.
[3] R. Bandura, A Survey of Composite Indices Measuring Country Performance: 2006 Update, A UNDP/ODS Working Paper, 2008, <http://goo.gl/x2upo>.
[4] H. Blalock Jr., Conceptualization and Measurement in the Social Sciences, Sage, Beverly Hills, CA, 1982.
[5] K.A. Bollen, Structural Equations with Latent Variables, Wiley, New York, 1989.
[6] K.A. Bollen, R. Lennox, Conventional wisdom on measurement: a structural equation perspective, Psychological Bulletin 110 (2) (1991) 305–314.
[7] K.A. Bollen, K.-F. Ting, A tetrad test for causal indicators, Psychological Methods 5 (1) (2000) 3–22.
[8] K.A. Bollen, Indicator: methodology, International Encyclopedia of the Social and Behavioral Sciences (2001) 7282–7287.
[9] K.A. Bollen, Evaluating effect, composite, and causal indicators in structural equation models, MIS Quarterly 35 (2) (2011) 359–372.
[10] T.A. Brown, Confirmatory Factor Analysis for Applied Research, The Guilford Press, New York, 2006.
[11] E. Carmines, R. Zeller, Reliability and Validity Assessment, Sage University Paper Series on Quantitative Applications in the Social Sciences, Thousand Oaks, CA, 1979.
[12] A. Diamantopoulos, H.M. Winklhofer, Index construction with formative indicators: an alternative to scale development, Journal of Marketing Research 38 (2) (2001) 269–277.
[13] A. Diamantopoulos, The error term in formative measurement models: interpretation and modeling implications, Journal of Modelling in Management 1 (1) (2006) 7–17.
[14] A. Diamantopoulos, P. Riefler, K.P. Roth, Advancing formative measurement models, Journal of Business Research 61 (12) (2008) 1203–1218.
[15] A. Diamantopoulos, Incorporating formative measures into covariance-based structural equation models, MIS Quarterly 35 (2) (2011) 335–358.
[16] U. Ebert, H. Welsch, Meaningful environmental indices: a social choice approach, Journal of Environmental Economics and Management 47 (2) (2004) 270–283.
[17] J. Edwards, R. Bagozzi, On the nature and direction of relationships between constructs and measures, Psychological Methods 5 (2) (2000) 155–174.
[18] J.R. Edwards, The fallacy of formative measurement, Organizational Research Methods 14 (2) (2011) 370–388.
[19] P.M. Fayers, D.J. Hand, Causal variables, indicator variables and measurement scales: an example from quality of life, Journal of the Royal Statistical Society: Series A (Statistics in Society) 165 (2) (2002) 233–253.
[20] C. Fornell, F. Bookstein, Two structural equation models: LISREL and PLS applied to consumer exit-voice theory, Journal of Marketing Research 19 (4) (1982) 440–452.
[21] D.W. Gerbing, J.C. Anderson, An updated paradigm for scale development incorporating unidimensionality and its assessment, Journal of Marketing Research 25 (2) (1988) 186–192.
[22] J.B. Grace, K. Bollen, Representing general theoretical concepts in structural equation models: the role of composite variables, Environmental and Ecological Statistics 15 (2) (2008) 191–213.
[23] J.R. Hipp, D.J. Bauer, K.A. Bollen, Conducting tetrad tests of model fit and contrasts of tetrad-nested models: a new SAS macro, Structural Equation Modeling: A Multidisciplinary Journal 12 (1) (2005) 76–93.
[24] B.F. Hobbs, What can we learn from experiments in multiobjective decision analysis?, IEEE Transactions on Systems, Man, and Cybernetics 16 (3) (1986) 384–394.
[25] R.D. Howell, E. Breivik, J.B. Wilcox, Is formative measurement really measurement? Reply to Bollen (2007) and Bagozzi (2007), Psychological Methods 12 (2) (2007) 238–245.
[26] ISO/IEC 15504-1, Information Technology—Process Assessment—Part 1: Concepts and Vocabulary, ISO, 2004.
[27] ISO/IEC 15504-2, Information Technology—Process Assessment—Part 2: Performing an Assessment, ISO, 2003.
[28] ISO/IEC 15504-4, Information Technology—Process Assessment—Part 4: Guidance on Use for Process Improvement and Process Capability Determination, ISO, 2004.
[29] ISO/IEC 15504-5, Information Technology—Process Assessment—Part 5: An Exemplar Process Assessment Model, ISO, 2012.
[30] ISO/IEC 15504-6, Information Technology—Process Assessment—Part 6: An Exemplar System Life Cycle Process Assessment Model, ISO, 2008.
[31] ISO/IEC 15504-7, Information Technology—Process Assessment—Part 7: Assessment of Organizational Maturity, ISO, 2008.
[32] ISO/IEC 15504-8, Information Technology—Process Assessment—Part 8: An Exemplar Process Assessment Model for IT Service Management, ISO, 2012.
[33] ISO/IEC CD 33002, Information Technology—Process Assessment—Requirements for Performing Process Assessment, ISO/IEC JTC1/SC 7 WG10, 2012.
[34] ISO/IEC CD 33003, Information Technology—Process Assessment—Requirements for Process Measurement Frameworks, ISO/IEC JTC1/SC 7 WG10, 2012.
[35] ISO/IEC CD 33063, Information Technology—Process Assessment—Process Assessment Model for Software Testing, ISO/IEC JTC1/SC 7/WG 10, 2012.
[36] C.B. Jarvis, S.B. MacKenzie, P.M. Podsakoff, A critical review of construct indicators and measurement model misspecification in marketing and consumer research, Journal of Consumer Research 30 (2) (2003) 199–218.
[37] R. Johnson, C. Rosen, C.-H. Chang, To aggregate or not to aggregate: steps for developing and validating higher-order multidimensional constructs, Journal of Business and Psychology 26 (3) (2011) 1–8.
[38] H.-W. Jung, Rating the process attributes utilizing AHP in SPICE-based process assessments, Software Process Improvement and Practice 6 (2) (2001) 111–122.
[39] H.-W. Jung, Process attribute rating and sensitivity analysis in process assessment, Journal of Software: Evolution and Process 24 (8) (2012) 401–419.
[40] H.-W. Jung, K. Ting, Investigating the relationship between process capability and its measures: reflective or formative, submitted for publication.
[41] K.S. Law, C.-S. Wong, W.H. Mobley, Toward a taxonomy of multidimensional constructs, The Academy of Management Review 23 (4) (1998) 741–755.
[42] R.C. MacCallum, M.W. Browne, The use of causal indicators in covariance structure models: some practical issues, Psychological Bulletin 114 (3) (1993) 533–541.
[43] R.P. McDonald, Test Theory: A Unified Treatment, Lawrence Erlbaum, 1999.
[44] G. Munda, M. Nardo, Constructing Consistent Composite Indicators: The Issue of Weights, EUR 21834 EN, 2005.
[45] G. Munda, M. Nardo, Weighting and aggregation for composite indicators, in: Proceedings of the European Conference on Quality in Survey Statistics (Q2006), Cardiff, UK, 2006, <http://goo.gl/oXSYi>.
[46] G. Munda, M. Nardo, Noncompensatory/nonlinear composite indicators for ranking countries: a defensible setting, Applied Economics 41 (12) (2009) 1513–1523.
[47] OECD, Handbook on Constructing Composite Indicators: Methodology and User Guide, 2008, <http://goo.gl/cS7PY>.
[48] OECD, Human Development Report 2010, OECD, 2010, <http://goo.gl/dInix>.
[49] OECD, OECD e-Government Studies: Indicators Project, 2011, <http://goo.gl/sn8Yv>.
[50] R.R. Ploeg, W. Bohm, M. Kirkham, On the origin of the theory of mineral nutrition of plants and the law of the minimum, Soil Science Society of America Journal 63 (5) (1999) 1055–1062.
[51] S.A. Rijsdijk, E.J. Hultink, A. Diamantopoulos, Product intelligence: its conceptualization, measurement and impact on consumer satisfaction, Journal of the Academy of Marketing Science 35 (3) (2007) 340–356.
[52] T.P. Rout, K. El Emam, M. Fusani, D. Goldenson, H.-W. Jung, SPICE in retrospect: developing a standard for process assessment, Journal of Systems and Software 80 (9) (2007) 1483–1493.
[53] T.L. Saaty, How to make a decision: the analytic hierarchy process, European Journal of Operational Research 48 (1) (1990) 9–26.
[54] A. Saltelli, Composite indicators between analysis and advocacy, Social Indicators Research 81 (1) (2007) 65–77.
[55] A. Saltelli, M. Ratto, T. Andres, F. Campolongo, J. Cariboni, D. Gatelli, M. Saisana, S. Tarantola, Global Sensitivity Analysis: The Primer, John Wiley & Sons, West Sussex, UK, 2008.
[56] SEI, Appraisal Requirements for CMMI® Version 1.3 (ARC, V1.3), Software Engineering Institute, Carnegie Mellon University, 2011, <http://goo.gl/neIaw>.
[57] SEI, Standard CMMI® Appraisal Method for Process Improvement (SCAMPI℠) A, Version 1.3: Method Definition Document (CMU/SEI-2011-HB-001), Software Engineering Institute, Carnegie Mellon University, 2011, <http://goo.gl/FoOZA>.
[58] SPICE Trials, SPICE Phase 2 Trials Final Report, ISO/IEC JTC1/SC7/WG10, 2003, <http://goo.gl/V3jQJ>.
[59] S.S. Stevens, Mathematics, measurement, and psychophysics, in: S.S. Stevens (Ed.), Handbook of Experimental Psychology, Wiley, New York, 1951, pp. 1–49.
[60] K.-F. Ting, Confirmatory tetrad analysis in SAS, Structural Equation Modeling 2 (2) (1995) 163–171.
[61] W. Trochim, J.P. Donnelly, Research Methods Knowledge Base, Atomic Dog Pub Online, 2001.
[62] H.R. Varian, Intermediate Microeconomics: A Modern Approach, seventh ed., WW Norton, New York, 2006.
[63] J. Wilcox, R. Howell, E. Breivik, Questions about formative measurement, Journal of Business Research 61 (12) (2008) 1219–1228.
[64] K.P. Yoon, C.-L. Hwang, Multiple Attribute Decision Making: An Introduction, Sage University Paper Series on Quantitative Applications in the Social Sciences, Thousand Oaks, CA, 1995.
[65] M. Zeleny, Multiple Criteria Decision Making, McGraw-Hill, New York, 1982.
[66] P. Zhou, B. Ang, K. Poh, Comparing aggregating methods for constructing the composite environmental index: an objective measure, Ecological Economics 59 (3) (2006) 305–311.
[67] P. Zhou, B. Ang, Comparing MCDA aggregation methods in constructing composite indicators using the Shannon–Spearman measure, Social Indicators Research 94 (1) (2009) 83–96.