ParetoPrinciple FinalVersion
net/publication/362397405
All content following this page was uploaded by Chau Le on 01 September 2022.
Abstract
[...] Specifically, project owners use cost estimates during the scoping phase to set project budgets
for funding approval and cost management. Due to the lack of detailed design, State
Transportation Agencies' (STAs') estimators often rely on the Pareto principle, also known as the
80/20 rule, in early cost estimating by estimating only high-impact work items and roughly
calculating the costs of the remaining items using a fixed percentage. However, it is a heuristic rule
of thumb that does not apply to every scenario. Moreover, STAs' guidance is minimal, and
few studies have investigated the validity of this common practice. This study proposes two
[...] determine optimal major work items and the related information necessary to apply the items to
STAs' scoping-phase estimating of new projects. A case study was conducted using an STA's
actual historical bid tabulation data for two project work types. The first model's output shows
that 10% and 20% of the work items can contribute up to 92% and 97% of total
project cost, respectively, and that the cost contribution ratios vary not only across project work types but also
among projects of the same type, with coefficients of variation minimized simultaneously and
calculated by the model. Because of this variation, the mean or the median can represent the center
of cost percentages for estimating new projects' total cost from major items' costs. The second
model's results reveal that using the median is preferred due to lower expected errors. Also, using
an optimal 10% itemset and the median of its cost contribution ratios in past projects to estimate
future projects results in an expected average error of 8.5% for the approach, which is acceptable in
[...]

1 Assistant Professor, Dept. of Civil, Construction and Environmental Engineering, North Dakota State University, Fargo, ND 58102 (corresponding author). Email: [email protected]
2 The James C. Smith CIAC Endowed Professor, Dept. of Construction Science, Texas A&M University, College Station, TX 77843. Email: [email protected]
3 Professor, Zachry Dept. of Civil and Environmental Engineering, Texas A&M University, College Station, TX 77843. Email: [email protected]
4 Professor, Dept. of Industrial and Systems Engineering, Texas A&M University, College Station, TX 77843. Email: [email protected]
Introduction
Cost estimating is one of the most important parts of any construction project's
development and management. A typical highway project may involve four main development
phases before construction: 1) Planning, 2) Scoping, 3) Preliminary Design, and 4) Final Design
(AASHTO 2013; ITD 2020). In the planning phase, needs for new projects are identified and
prioritized. Planning-phase cost estimates are necessary to understand potential funding amounts
required and compare different alternatives under consideration (PennDOT 2018). In the scoping
phase, projects' definitions become clearer with input from various functional groups and
[...] developed, and the cost estimate information is primarily used to prioritize the projects for
programming, in other words, approval or authorization of agency-level funding for the next
phases (MnDOT 2008). Thus, the scoping-phase estimate is critical since it becomes the project's
baseline cost (WSDOT 2015). Subsequently, estimates are developed in the preliminary design
phase to manage project costs against the budgets approved in the scoping phase (AASHTO
2013). Plans, Specifications, and Estimates (PS&E) estimating is required in the final design
[...]
Therefore, the reliability of cost estimates affects State Transportation Agencies (STAs)
at both agency and project levels (AASHTO 2013; Elmousalami 2020). First, timely project
completion within budget is critical as it directly affects STAs' key performance indicators,
accountability for public funds, and public satisfaction. Second, unreliable cost estimating
negatively affects budget-related communications, budgeting decisions, and the use of agencies'
resources (WSDOT 2015). For example, underestimating causes project delays while additional
funding is arranged to meet the project costs. On the other hand, overestimating causes
inefficient use of funds (FHWA 2004; Gardner et al. 2017). Specifically, the scoping phase's cost
estimate is critical since the project owner uses it as the baseline to set the project budget for cost
management; cost estimates in later stages are compared against it (WSDOT 2015).
Due to the difference in project maturity levels across project development phases,
the methods used for scoping-phase cost estimating are not the same as those used for
planning-phase or PS&E estimating. For planning-phase cost estimating, STAs typically use simple
parametric methods, such as applying cost per parameter (e.g., dollars per centerline mile or
square foot of bridge deck) from past similar projects to new projects, due to the
limited project information and scope definition in the planning stage (AASHTO 2013; MnDOT
2008). Numerous research studies have also developed statistical or advanced artificial
intelligence-based parametric models for predicting total project cost to improve estimation
accuracy (Elmousalami 2020; Gardner et al. 2017; Karaca et al. 2020; Zhang et al. 2017). Work
item-level information (e.g., approximate quantities of major items) becomes available or can be
reasonably determined in the scoping phase, and STAs often use the historical bid-based method
to estimate work items' costs (AASHTO 2013; ITD 2020). Unlike in PS&E estimating, estimators
in the scoping phase (about 10% to 30% of project definition completed) do not have detailed
design plans to determine all work items entailed in a project and their quantities (WSDOT
2015). The estimators focus only on high-cost impact work items, based on the Pareto principle,
which implies that approximately 20% of the work items account for 80% of a project's total cost
(Olumide et al. 2010; PennDOT 2018; TxDOT n.d.). A percentage or minor item allowance is
applied to account for the remaining work items (CTDOT 2019; PennDOT 2018).
STAs' application of the Pareto principle (also known as the 80/20 rule) to scoping-phase
cost estimating faces challenges. First, the numbers 80 and 20 do not necessarily hold in cost
estimating, but guidelines are missing or vary across STAs. While some STAs state that 20% of
the work items can account for 80% of the cost, others assume a different
contribution of the 20% items, such as 70% (Iowa DOT 2012; TxDOT n.d.). Second, the cost
contributions of the same set of major items in even similar projects are expected to fluctuate,
but little is currently known about the variation. Third, STAs do not have detailed guidance on
which items should be used for estimating and rely on estimators' judgment and experience to
select the items. Fourth, high-cost impact work items vary with project work types and
work-item breakdown structures, but STAs' guidance and research studies on these dynamics are very
limited. Last, the error of applying the Pareto principle (i.e., estimating only major items' unit prices
and accounting for minor items by a predetermined percentage) compared with estimating all
work items in a project has not been investigated. All of the issues mentioned above can
[...]
A few studies have applied the Pareto principle to cost estimating but only identified the
major work items or influential factors affecting total project costs (Le et al. 2019; Sayed et al.
2020; Shehab and Meisami-Fard 2013), which is not sufficient to address the issues. For
example, Le et al. (2019) identified the top five work items for unit price visualization, and
Sayed et al. (2020) determined the nine most critical influential factors from 29 factors
influencing construction cost estimates (e.g., site conditions and estimators' experience).
In this research, historical bid data (i.e., cost estimates submitted by the bidders of
previous projects in the letting phase) are utilized to address the issues described above.
Historical bid data consist of all work items, including both major and minor items, identified by
designers/estimators at the final design phase, along with item quantities calculated from detailed
design plans. Also, unit prices are available and so allow for the investigation of the cost
[...]
Since numerous sets of major items can be used for scoping-phase cost estimating,
selecting an optimal set is desirable. Apart from accuracy, another critical aspect of cost
estimating is the amount of effort spent on developing cost estimates (Cao et al. 2018) because of
the limited time allowed for estimating (Alroomi et al. 2012; ITD 2020). This research proposes
a novel application of multi-objective optimization methods to enhance STAs' current practice of
applying the Pareto principle to scoping-phase cost estimation. Two optimization models are
proposed to automatically find optimal major work items for cost estimation of different project
work types and work-item breakdown structures. The models' output provides new knowledge
about optimal major work items, their contribution to total project cost and relative variation, and
the Pareto principle approach's errors. Optimization objectives used in the models include 1)
maximizing the mean or the median of the cost percentages of major items over total project cost
(e.g., maximizing the average cost contribution of the top 20% items: is it 80% or much higher?),
2) minimizing the coefficient of variation of the percentages for uncertainty reduction, and 3)
minimizing the error of applying the Pareto principle in scoping-phase cost estimating.
Comparisons between different numbers of major work items are also conducted.
Cost estimates are necessary for each development phase (AASHTO 2013). Due to the
differences in the amount of input information available for cost estimating and the purpose and
required accuracy of the estimates, the methodologies adopted for cost estimation are different
between the development phases (PennDOT 2018; WSDOT 2015). This research focuses on cost
estimating at the scoping phase due to its importance to budget approval and project cost
[...]
STAs often apply the Pareto principle to scoping-phase cost estimating (ITD 2020;
Olumide et al. 2010). Specifically, they use the historical bid-based estimating method for major
quantifiable work items and then apply a fixed percentage to account for the remaining minor
items. Historical bid-based estimating is an approach that relies on the bid tabulation data (see
Fig. 2) of similar projects in recent years to estimate unit prices for a new project with possible
modifications by estimators to account for unique project characteristics (Le et al. 2019).
However, STAs' guidance on applying the Pareto principle is minimal (see Table 1). Various
issues mentioned in the Introduction section can significantly affect estimating accuracy.
This paper aims to develop an innovative multi-objective approach for selecting optimal
major work items for cost estimating by STAs in the scoping phase by leveraging their currently
available historical bid tabulation data, with consideration of different project work types,
work-item breakdown structures, and multiple objectives. For each optimal set of major work
items, definitive and relevant information to apply the itemset to future projects is determined
and recommended. The expected error of using the Pareto principle with the itemset is also
[...]
Methodology
This section presents the development of two multi-objective optimization models to
support the application of the Pareto principle to cost estimating in the scoping phase. Fig. 3
gives an overview of the proposed models, including three main components: 1) Input data, 2)
[...]
An STA's historical bid data from recent years are used as input to the proposed models.
Each project's data attributes used in the models include the project work type and the winning
bidder's extended amounts of all work items in the project (see Fig. 2). Additionally, two
user-defined input variables are necessary: 1) Project work type of interest and 2) Number
of major items (n) selected from all items relevant to the work type.
• The project work type variable is needed because different work types have
significantly different lists of work items and cost distributions. For example, the top
five items of Hot Mix Asphalt (HMA) resurfacing projects are entirely different from
those of Portland Cement Concrete (PCC) pavement projects (see Table 2). Major work
items used for cost estimating, therefore, vary with project work types.
• The number of major items (n) is related to the amount of time and effort spent on
[...]
The model development process consists of three phases: 1) Decision variables, 2)
[...]
Assume the past projects of the work type of interest involve m work items (from Item 1
to Item m); m can be in the hundreds. However, not all work items can be used for cost estimating in
the scoping phase due to the lack of detailed design information and design plans. With the
user-defined input n, the models need to identify optimal n-item sets from the m work items for future
cost estimating in the scoping phase. The selection of n items from the m work items is modeled
[...]
$$X = (x_1, x_2, x_3, \ldots, x_m) \tag{1}$$
$$x_i = \begin{cases} 1, & \text{if Item } i \text{ is selected} \\ 0, & \text{if Item } i \text{ is not selected} \end{cases} \tag{2}$$
$$\sum_{i=1}^{m} x_i = n \tag{3}$$
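As a minimal sketch of the decision variables in Eq. (1) to (3), the selection of n items out of m can be encoded as a binary list whose entries sum to n. The function names `selection_vector` and `is_feasible` and the zero-based item indices are illustrative assumptions, not part of the original models.

```python
def selection_vector(m, chosen):
    """Build X = (x_1, ..., x_m): 1 if the item is selected, 0 otherwise.

    `chosen` holds zero-based item indices (an assumption of this sketch).
    """
    return [1 if i in chosen else 0 for i in range(m)]


def is_feasible(x, n):
    """Check Eq. (2) (binary entries) and Eq. (3) (exactly n items selected)."""
    return all(v in (0, 1) for v in x) and sum(x) == n


# Hypothetical example: m = 6 candidate items, n = 3 selected.
x = selection_vector(6, {0, 2, 5})
```

Here `is_feasible(x, 3)` holds because exactly three entries of `x` equal 1.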
Assume the input bid tabulation data contain k projects of the work type of interest (Prj 1
to Prj k). Given a selected set of n work items, the cost percentage of the items in the itemset
over total project cost in each project is calculated, forming a sample of k cost percentages for
[...]
$$\text{Total cost of selected items in Prj } j = MIC_j = \sum_{i=1}^{m} x_i \, EA_{ij} \tag{5}$$
$$\text{Cost percentage of selected items over total project cost} = P_j = \frac{MIC_j}{TC_j} \tag{6}$$
where EA_ij = the extended amount of Item i in Prj j and TC_j = the total project cost of Prj j. Statistical measures of the cost percentages
[...]
• Mean of the percentages (PMean): the average of the percentages, a measure of central
[...]
• Median of the percentages (PMedian): the middle value when the percentages are ordered
from the lowest to the highest, another measure of central tendency. Compared to the
mean, the median is less sensitive to skewness and outliers (Ott and Longnecker 2015).
• Coefficient of variation (CV): the standard deviation divided by the mean of the
[...] spread, CV is more appropriate in comparing the variability between populations (i.e.,
between different sets of major work items) because it reflects variation over the
[...]
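The three measures above can be illustrated with a short sketch that computes the cost percentage of a selected itemset in each project and then summarizes the sample. The bid amounts are hypothetical, the names `cost_percentages` and `summarize` are illustrative, and the use of the sample standard deviation in CV is an assumption, since the source does not state which form is used.

```python
from statistics import mean, median, stdev


def cost_percentages(x, ea):
    """Return P_j for each project j.

    x  : binary selection list (1 = item selected)
    ea : ea[j][i] is the extended amount of item i in project j
    """
    percentages = []
    for row in ea:
        mic_j = sum(x_i * ea_ij for x_i, ea_ij in zip(x, row))  # cost of major items
        total_j = sum(row)                                       # total project cost
        percentages.append(mic_j / total_j)
    return percentages


def summarize(percentages):
    """Return PMean, PMedian, and CV (sample standard deviation / mean)."""
    p_mean = mean(percentages)
    return {
        "p_mean": p_mean,
        "p_median": median(percentages),
        "cv": stdev(percentages) / p_mean,
    }


# Hypothetical data: three projects, three work items, first two items selected.
ea = [[60, 30, 10], [50, 25, 25], [80, 10, 10]]
x = [1, 1, 0]
p = cost_percentages(x, ea)   # per-project cost percentages
stats = summarize(p)
```

For this toy data the itemset covers 90%, 75%, and 90% of the three project totals, so the mean (85%) falls below the median (90%).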
A set of n major work items is associated with two measures of the center of cost
percentages: PMean and PMedian. Therefore, there are two strategies for applying the itemset to
[...]
• Strategy 1 involves calculating the total cost of the major items included in the project
(MIC) and then dividing it by PMean to obtain a total project cost estimate. The Mean
Absolute Percentage Error (MAPE), a common measure of the accuracy of a forecast or
estimate expressed as a percentage, between MIC/PMean values and total project costs reflects the
error of applying Strategy 1, compared with estimating all work items. The resulting
[...]
• Strategy 2, similarly, involves calculating the total cost of the major items included in
the project and then dividing it by PMedian to obtain a total project cost estimate. The
MAPE between MIC/PMedian values and total project costs reflects the error of applying
Strategy 2, compared with estimating all work items. The resulting MAPE is called
[...]
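The two strategies can be compared with a few lines of code. The sketch below is only illustrative: the major-item costs and project totals are hypothetical, the function name `mape` is an assumption, and the error is returned as a fraction rather than a percentage.

```python
def mape(major_item_costs, total_costs, p_center):
    """Mean absolute percentage error of estimating each total as MIC / p_center.

    p_center is PMean (Strategy 1) or PMedian (Strategy 2).
    """
    errors = [
        abs(mic / p_center - total) / total
        for mic, total in zip(major_item_costs, total_costs)
    ]
    return sum(errors) / len(errors)


# Hypothetical projects: major items cost 90, 75, and 90 out of totals of 100 each,
# so the cost percentages are 0.90, 0.75, 0.90 (mean 0.85, median 0.90).
mic = [90.0, 75.0, 90.0]
totals = [100.0, 100.0, 100.0]

mape_mean = mape(mic, totals, 0.85)    # Strategy 1
mape_median = mape(mic, totals, 0.90)  # Strategy 2
```

In this toy case Strategy 2 gives the smaller error (about 5.6% versus about 7.8%), because the median is not pulled down by the single low-percentage project.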
Model 1 includes two objectives: 1) Maximizing PMean and 2) Minimizing CV. This
model is designed to examine whether the numbers 80 and 20 in the 80/20 rule hold in cost
[...] Minimizing MAPE using mean, and 4) Minimizing MAPE using median. The model is designed
to examine the errors of applying the Pareto principle to scoping-phase cost estimating and
compare two application strategies: using mean (Strategy 1) or median (Strategy 2) to represent
the cost contributions of major work items over total project cost.
The models are implemented using the Non-Dominated Sorting Genetic Algorithm II
(NSGA-II) due to its high capability to solve a variety of multi-objective optimization problems
and its ability to consider all objectives simultaneously without the need to pre-define weights
for the objectives (Deb et al. 2002). Examples of applying NSGA-II to construction
decision-making problems are multi-objective scheduling and planning (El-Abbasy et al. 2017; Halabya
and El-Rayes 2020; Jeong and Abraham 2006; Peralta et al. 2018), design optimization (Dino
and Üçoluk 2017; Hyari et al. 2016), and optimal construction layout or work zone design and
development (Abdelmohsen and El-Rayes 2016; Abdelmohsen and El-Rayes 2018; Schuldt and
El-Rayes 2018). The models are developed with the support of the Distributed Evolutionary
[...]
The NSGA-II computations in the models include four primary tasks:
1) An initialization task that randomly creates an initial population of sets of n work items
from all work items relevant to the project work type of interest [see Eq. (1) to (3)],
2) A fitness evaluation task that calculates model evaluation metrics for each generated n-[...]
3) A ranking task that sorts the itemsets using nondomination ranks and crowding [...]
4) An evolution task of generating new populations with selection, crossover, and [...]
Tasks 2 to 4 iterate until a stop criterion (e.g., a maximum number of iterations) is met.
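The nondomination idea behind the ranking task can be sketched for the two objectives of Model 1 (maximize PMean, minimize CV). This is not a full NSGA-II implementation: it only extracts the first nondominated front, omits crowding distance and the evolutionary loop, and uses hypothetical objective values. The names `dominates` and `first_front` are illustrative.

```python
def dominates(a, b):
    """Return True if solution a dominates solution b.

    Each solution is a (p_mean, cv) pair; p_mean is maximized, cv is minimized.
    """
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better


def first_front(solutions):
    """Keep the solutions that no other solution dominates (rank-1 front)."""
    return [
        s for s in solutions
        if not any(dominates(other, s) for other in solutions)
    ]


# Hypothetical (p_mean, cv) values for five candidate itemsets.
candidates = [(0.92, 0.10), (0.86, 0.05), (0.90, 0.07), (0.85, 0.08), (0.90, 0.09)]
front = first_front(candidates)
```

The three surviving candidates are mutual trade-offs: moving along the front raises PMean only at the cost of a higher CV.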
Given the work type of interest and the user-defined number of major items (n) for cost
estimating, the output of Model 1 is a collection of optimal n-item sets, i.e., trade-off solutions between maximizing
PMean and minimizing CV. The item sets and their corresponding measures are provided for
[...]
Similarly, the output of Model 2 is optimal n-item sets with four defined objectives: 1)
Maximizing PMean, 2) Maximizing PMedian, 3) Minimizing MAPE using mean, and 4) Minimizing
MAPE using median. The MAPE values illustrate the errors of applying the Pareto principle to
[...]
However, the MAPE values from the output of Model 2 are calculated from the same
projects used for optimization, which likely causes the errors to be underestimated. Therefore, the
optimal sets of n major work items are applied to the cost estimation of each of the projects in a
hold-out dataset with the two defined strategies (i.e., Strategy 1: using mean and Strategy 2:
using median). Assume there are l projects in the hold-out dataset [from Prj (k+1) to Prj (k+l)].
$$\text{MAPE using mean on the hold-out dataset} = \frac{1}{l} \sum_{j=k+1}^{k+l} \left| \frac{MIC_j / P_{Mean} - TC_j}{TC_j} \right| \tag{9}$$
$$\text{MAPE using median on the hold-out dataset} = \frac{1}{l} \sum_{j=k+1}^{k+l} \left| \frac{MIC_j / P_{Median} - TC_j}{TC_j} \right| \tag{10}$$
Here, TC_j denotes the total project cost of Prj j.
The MAPE values on the hold-out dataset are expected to provide more realistic and reliable
[...]
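A hedged sketch of the hold-out evaluation in Eq. (9) and (10) follows: the center measure is computed from the optimization projects only and then applied to projects that were not used for optimization. All bid amounts are hypothetical, and the function name `holdout_mape` is an assumption of this sketch.

```python
from statistics import mean, median


def holdout_mape(x, ea_opt, ea_hold, center):
    """MAPE on hold-out projects, using a center measure fitted on other projects.

    x       : binary selection list for the itemset
    ea_opt  : extended amounts of the optimization projects (rows = projects)
    ea_hold : extended amounts of the hold-out projects
    center  : function such as statistics.mean or statistics.median
    """
    def major_cost(row):
        return sum(x_i * ea_ij for x_i, ea_ij in zip(x, row))

    # Center of cost percentages, from the optimization dataset only.
    p_center = center([major_cost(row) / sum(row) for row in ea_opt])

    # Relative error of MIC / p_center against each hold-out project's total cost.
    errors = [
        abs(major_cost(row) / p_center - sum(row)) / sum(row)
        for row in ea_hold
    ]
    return sum(errors) / len(errors)


# Hypothetical data: three optimization projects and two hold-out projects.
x = [1, 1, 0]
ea_opt = [[60, 30, 10], [50, 25, 25], [80, 10, 10]]
ea_hold = [[45, 45, 10], [40, 40, 20]]

err_mean = holdout_mape(x, ea_opt, ea_hold, mean)      # Strategy 1
err_median = holdout_mape(x, ea_opt, ea_hold, median)  # Strategy 2
```

Keeping the hold-out projects out of the `p_center` calculation is the point of the design: the error is then measured on projects the itemset and its center measure have not seen.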
Bid tabulation data of 181 HMA resurfacing projects (Work type code: 1523) and 95
projects of the work type PCC pavement – Grade/New (Code: 1014) were obtained from an STA
[...]
Model 1
The input data include 181 HMA resurfacing projects (Work type code: 1523). A total of
421 work items were used in the letting stage of these projects. Following the Pareto principle,
STAs suggest estimating only high-cost impact work items in the scoping phase, which are only a small
portion of all relevant work items. Since the proposed models allow users to define the number
of major items they want to use for scoping-phase cost estimating, optimal sets of various
numbers of work items can be obtained. Fig. 5 provides a wide range of optimal sets of 42 work
[...]
On one end of the spectrum, Solution A represents an optimal 42-item set that results in
the highest PMean. On average, the 42 items of Solution A account for 91.7% of the total project
cost. However, that percentage is also associated with the highest variation among the generated
solutions, which may not be a desirable feature from cost estimators' perspectives. Solution B
corresponds to an optimal 42-item set at the other end of the spectrum, resulting in the lowest CV
but a PMean value significantly lower than that of Solution A (i.e., 85.6% compared with 91.7%).
Between the two ends of the spectrum, the model provides other trade-offs between the two
defined objectives: 1) Maximizing PMean and 2) Minimizing CV. Of those, Solution C, identified using
the elbow method, appears to be a balanced compromise between Solution A and Solution B. In fact,
[...]
The solutions correspond to the following settings: population size (npop) = 100, two-point
crossover with the probability that an offspring is produced by crossover (pcx) = 0.7, two-point
swapping mutation with the probability that an offspring is produced by mutation (pmut) = 0.2,
and the maximum number of iterations = 2,000. The solutions are compared with the results of
two other NSGA-II settings and the solutions obtained by another popular multi-objective
optimization method, i.e., the Strength Pareto Evolutionary Algorithm II (SPEA-II). Fig. 6 shows
that the solutions from the alternatives lie very close to the baseline, illustrating the quality and
[...]
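One way a swapping mutation can keep the constraint of Eq. (3) satisfied is to exchange one selected item with one unselected item, so the number of selected items stays at n. The sketch below illustrates that idea only; it is an assumption about how such an operator might look, not the operator actually used in the study, and the name `swap_mutation` is hypothetical.

```python
import random


def swap_mutation(x, rng):
    """Swap one selected item (1) with one unselected item (0).

    The count of selected items is unchanged, so sum(x) == n still holds.
    """
    selected = [i for i, v in enumerate(x) if v == 1]
    unselected = [i for i, v in enumerate(x) if v == 0]
    child = list(x)
    if not selected or not unselected:
        return child  # nothing to swap
    child[rng.choice(selected)] = 0
    child[rng.choice(unselected)] = 1
    return child


# Hypothetical parent with n = 2 selected items out of m = 5.
rng = random.Random(0)  # fixed seed for reproducibility
parent = [1, 1, 0, 0, 0]
child = swap_mutation(parent, rng)
```

Because exactly one 1 and one 0 trade places, `child` differs from `parent` in two positions and remains a feasible n-item set.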
When the number of major items used for scoping-phase cost estimating increases, the
effort required for the estimation increases, which naturally results in improvements in both
objectives (see Fig. 7). However, the improvements diminish as the number of major items
increases due to the uneven cost distribution among work items. As shown in Fig. 7, for HMA
resurfacing projects (Work type code: 1523), the improvements in the objectives from 67 items
to 76 items are significantly smaller than those from 25 items to 34 items. A similar trend applies
[...]
Fig. 7 also demonstrates the necessity of applying the Pareto principle to different project
work types separately. While 421 items were used in the bid tabulation data of the past 181
HMA resurfacing projects, 771 items were used in those of 95 PCC pavement projects. The
evaluation metrics (i.e., PMean and CV) of the two work types are also substantially different for
the same ratio of major items. Take the ratio of 20% as an example. While 20% of the work
items of HMA resurfacing projects can account for up to 96.5% of the total project cost on
average, 20% of the work items of PCC pavement projects contribute only up to 91.6% of the
total cost on average; both cost
percentages are much larger than the 80% implied by the 80/20 rule. Conversely, the variations of the cost
percentages in PCC pavement projects are significantly smaller than those in HMA resurfacing
projects for the same major item ratios (i.e., from 6% to 20% with an increment of 2%).
Model 2
For a specific set of major work items, the major items' contribution to total project cost
varies among projects, even among projects of the same work type. A measure of the center of
cost contributions (i.e., mean or median) is necessary to apply an itemset to future projects.
Model 2 enables a comparison between using the mean and the median of cost percentages in
[...] two datasets: optimization (75% of the projects) and hold-out (25% of the projects). With the
optimization dataset as input, Model 2 can generate optimal solutions for different user-defined
numbers of work items and the solutions' evaluation metrics (i.e., PMean, PMedian, MAPE using
mean, and MAPE using median). The generated optimal sets of work items were subsequently
applied to the hold-out dataset [see Eq. (9) to (10)]. For each optimal set, the total cost of its
items in each hold-out project was calculated, then divided by the corresponding PMean or PMedian
and compared with the total project cost. Collectively, the relative differences were used to
obtain MAPE using mean or MAPE using median on the hold-out dataset. The obtained MAPE
values reflected the expected errors of applying the Pareto principle and the optimal itemset to
future projects alone, not yet considering other factors influencing the accuracy of a cost estimate
[...]
Fig. 8 shows the optimal 42-item solutions generated from Model 2 with the optimization
dataset as input and their corresponding 1) mean and 2) MAPE using mean on the optimization
dataset and 3) MAPE using mean on the hold-out dataset. MAPE values on the optimization
dataset are generally smaller than MAPE values on the hold-out dataset, justifying the need for
splitting the original data into two datasets as performed. The average error of using an optimal
set of 42 major work items with mean as the center measure (i.e., Strategy 1) for scoping-phase
[...]
Similarly, Fig. 9 shows the optimal 42-item solutions generated from Model 2 with the
optimization dataset as input and their corresponding 1) median and 2) MAPE using median on
the optimization dataset and 3) MAPE using median on the hold-out dataset. The average error
of using an optimal set of 42 major work items with median as the center measure (i.e., Strategy
2) for scoping-phase cost estimating of HMA resurfacing projects alone is 9.2%.
For each scenario of user-defined input (i.e., the project work type of interest and the
number of major items used for cost estimating), a comparison between MAPE using mean and
MAPE using median is necessary to choose between the mean and the median to represent the cost
contribution of an optimal major-item set over total project cost. Fig. 10 provides a comparison
between the mean and the median of the optimal 42-item sets of HMA resurfacing projects
generated by Model 2. The right part of the figure shows some trade-offs between maximizing
PMean and maximizing PMedian, demonstrating that the two objectives should be separated as
originally defined in Model 2. The left part of the figure illustrates that the mean is smaller than
the median in all generated optimal solutions, indicating left-skewed distributions of cost
percentages.
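The link between a mean below the median and a left-skewed sample can be seen in a small example. The cost percentages below are hypothetical and chosen only to show one low value pulling the mean down; a mean below the median is consistent with left skew but is not a formal test of it.

```python
from statistics import mean, median

# Hypothetical cost percentages of one itemset across five projects:
# most projects cluster near 0.90, one project sits far lower.
percentages = [0.62, 0.88, 0.90, 0.91, 0.93]

p_mean = mean(percentages)      # pulled toward the low outlier
p_median = median(percentages)  # middle value, unaffected by the outlier

mean_below_median = p_mean < p_median
```

In this sample the single low project lowers the mean to about 0.848 while the median stays at 0.90.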
Each generated set of 42 major work items is associated with four MAPE values: MAPE
using mean and MAPE using median, from the output of Model 2, and MAPE using mean and
MAPE using median when applying the itemset to the hold-out dataset. The left part of Fig. 11
shows that MAPE using median (Strategy 2) is smaller than MAPE using mean (Strategy 1) on
the optimization dataset. On the hold-out dataset, MAPE using median is also smaller than
MAPE using mean for most solutions (see the right part of Fig. 11). Collectively, the figure
suggests that the median (Strategy 2) is a better center measure than the mean (Strategy 1) due to
smaller resulting errors. The result agrees with the common suggestion that the median is
[...]
As previously shown in Fig. 9, for optimal sets of 42 major work items, the MAPE using
median on the hold-out dataset has an average value of 9.2%. To examine the changes when
different subsets of the data were used for optimization, each of the other 25% groups of projects was
in turn set aside as the hold-out dataset while the corresponding remaining 75% of the projects were
used for optimization, similar to four-fold cross-validation. The average errors in the four cases
are 9.2%, 7.7%, 7.3%, and 9.7%, and the four-fold average error is 8.5% (see Fig. 12).
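The four-fold average can be checked directly from the four reported fold errors:

```latex
\frac{9.2\% + 7.7\% + 7.3\% + 9.7\%}{4} = \frac{33.9\%}{4} = 8.475\% \approx 8.5\%
```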
To obtain a more accurate cost estimate or a lower error of applying the Pareto
principle to cost estimating, an obvious solution is to increase the number of major work items
used for scoping-phase cost estimating. However, the reduction in error is not
linear in the number of major items (see Fig. 12).
The proposed models' outputs and further analyses have addressed the five issues stated
in the Introduction section about the applications of the Pareto principle to scoping-phase cost
[...]
First, the numbers 80 and 20 in the 80/20 rule and STAs' guidance summarized in Table 1
are not likely to hold in cost estimating. For example, Solution C in Fig. 5 corresponds to an
optimal set of 42 major work items (i.e., 10% of all relevant items) that accounts for, on average,
90% of the total cost of an HMA resurfacing project. Additionally, 20% of the work items can
contribute up to 96.5% of the total project cost on average (see Fig. 7). In these cases, 90/10 or
97/20 (not 80/20) applies. Second, Model 1 provides a measure of the variability of the cost
percentages of a major item set over total project cost, which is not available in previous studies
or STAs' guidance. Various trade-off solutions between maximizing the average cost percentage
and minimizing the CV of cost percentages are also provided. Third, with an STA's historical bid
tabulation dataset as input, the proposed models can automatically determine various optimal
sets of major items, which helps avoid STAs' reliance on estimators' judgments and experiences
[...]
Fourth, the applications of the Pareto principle to different project work types can be
substantially different (see Fig. 7). Yet, STAs' guidance is the same for all projects, which can
cause significant errors in cost estimation. The proposed models can be applied flexibly to different STAs
(i.e., work breakdown structures) and project work types (e.g., HMA resurfacing or PCC
pavement) to obtain corresponding optimal sets of major work items for cost estimating.
Last, the errors of applying the Pareto principle were previously unknown but are now quantified by
Model 2. Two strategies of applying the generated optimal solutions to future estimating (i.e., using the mean
or the median of the cost percentages of the major items in past projects) can also be compared using
the output of Model 2. For example, in the case of HMA resurfacing projects and 42-item sets,
[...]
Since most STAs have minimal guidance on applying the Pareto principle to cost
estimating, the agencies can significantly enhance their current practices by periodically applying
the proposed models to their historical bid tabulation data and then providing the generated
results to cost estimators for estimating new projects. For example, at the end of every five
years, an STA can apply the models to its historical bid data from the five most recent years to
obtain optimal solutions and corresponding measures for each common project work type. Cost
estimators do not need to conduct any optimization tasks; they directly apply the model outputs
in the subsequent five years. For a new project, cost estimators are provided with different optimal
sets of major work items. Depending on the new project's available information and the certainty
in the occurrence of particular major work items, they can select the most appropriate optimal
major item set for estimation in the scoping phase. They are also provided with the
corresponding measures (e.g., PMean, PMedian, CV, MAPE using mean, and MAPE using median)
to calculate total project cost and its variation from the major items' cost and to determine the
expected error. As the number of major work items used for estimating increases, the required
effort increases and the error of the approach decreases. Cost estimators can rely on the expected
error of the approach to select the number of work items for which they need to estimate unit
prices. Since the required accuracy of scoping-phase cost estimating is not high, with a needed
range from -30% to +50% (AASHTO 2013), an error of 8.5% from applying the Pareto principle to
cost estimating, as in the cases of Fig. 12, seems acceptable, allowing for errors caused by other
factors (e.g.,
Summary and conclusions
project. It is used to set the budget for project cost management. Due to the lack of detailed
design plans in the scoping phase and the limited time allowed for estimating, STA cost estimators
often apply the Pareto principle in their estimation. They focus time and effort on estimating
major, high-cost-impact work items and account for the remaining items with a fixed percentage or
a minor item allowance. However, STAs have minimal guidance on this approach. Moreover, few
previous studies have investigated the issues associated with applying the Pareto principle, such
as major item determination, variances among projects and project work types, or the error of the
This study's primary contribution to the body of knowledge is the novel application of
multi-objective optimization methods to address those issues and discover new knowledge that
provides STAs with practical guidance on using the Pareto principle in early cost estimating. The
proposed models can automatically determine various optimal sets of major work items for
different project work types and numbers of work items from an STA's historical bid tabulation
dataset. The output measures of each solution also provide definitive information for applying
the optimal work item set to future projects, such as the distribution of cost percentages over
total project cost (mean, median, and CV) and the expected error associated with using the
mean or the median for a new project. For example, the case study results show that 10% and
20% of the work items can respectively contribute up to 92% and 97% of total project cost,
both of which are much larger than 80%. Also, the cost contribution ratios of an optimal major
item set vary even among projects of the same work type, with a CV of at least 9.4% in
the case of HMA resurfacing projects and a major item ratio of 10%. The study's findings also
illustrate the differences in applying the Pareto principle to different project types, since the
Pareto fronts for the same major item ratio of HMA resurfacing and PCC pavement projects are
significantly different. The effectiveness of increasing the number of major work items used for
cost estimation was also assessed, yielding an error reduction of 41% (from 8.5% to 5.0%)
Due to data availability issues, this study is limited to considering only one primary
factor (i.e., project work type) influencing the list of work items and cost distribution in a project.
Although projects of the same work type are similar to each other, variations still exist.
Considering additional factors may help create more uniform groups of projects. However, doing
so also requires extra effort from STAs in collecting additional data. Furthermore, the proposed
models provide measures of variations and errors to help STA cost estimators make informed,
data-backed decisions. Although this research focused on transportation projects, the proposed
approach also applies to other construction sectors, provided that a systematic and consistent work
breakdown structure is in use and historical bid tabulation data are available.
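
The measures referred to in this section (PMean, PMedian, CV, MAPE using mean, and MAPE using median) can be illustrated with a minimal sketch. It is not the authors' implementation. The MAPE definition (estimate total cost as the major items' cost divided by PMean or PMedian, then average the absolute percentage error), the use of the population standard deviation for the CV, the function names, and all project figures are assumptions made for illustration only.

```python
from statistics import mean, median, pstdev


def major_item_measures(major_costs, total_costs):
    """Summarize how much of total project cost a fixed major item set covered."""
    # Share of each past project's total cost contributed by the major items.
    shares = [m / t for m, t in zip(major_costs, total_costs)]
    p_mean = mean(shares)
    p_median = median(shares)

    def mape(p):
        # Assumed definition: estimate total cost as (major items' cost / p),
        # then average the absolute percentage error against the actual total.
        return mean(abs(m / p - t) / t for m, t in zip(major_costs, total_costs))

    return {
        "p_mean": p_mean,
        "p_median": p_median,
        "cv": pstdev(shares) / p_mean,  # population std. dev. is an assumption
        "mape_mean": mape(p_mean),
        "mape_median": mape(p_median),
    }


def estimate_total(major_cost_estimate, p):
    """Scale the estimated cost of the major items up to a total project cost."""
    return major_cost_estimate / p


# Hypothetical past projects: cost of the major items and total bid amount.
past_major = [912_000.0, 845_000.0, 1_310_000.0, 598_000.0]
past_total = [1_000_000.0, 950_000.0, 1_400_000.0, 700_000.0]
measures = major_item_measures(past_major, past_total)
print(measures)

# Scoping-phase estimate for a new project whose major items are priced at 1.15M.
print(estimate_total(1_150_000.0, measures["p_median"]))
```

Under these assumptions, an estimator would price only the major items of the new project, divide by the chosen central measure, and read the corresponding MAPE as the expected error of that choice.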
Acknowledgment
The authors would like to acknowledge that the Iowa Department of Transportation has

The data used during the study were provided by a third party. Direct requests for these
materials may be made to the provider as indicated in the Acknowledgment. Some models or
codes that support the findings of this study are available from the corresponding author upon

References
AASHTO (2013). "Practical guide to cost estimating." AASHTO, Washington, DC.
Abdelmohsen, A. Z., and El-Rayes, K. (2016). "Optimal Trade-Offs between Construction Cost
and Traffic Delay for Highway Work Zones." Journal of Construction Engineering and
Abdelmohsen, A. Z., and El-Rayes, K. (2018). "Optimizing the Planning of Highway Work
Zones to Maximize Safety and Mobility." Journal of Management in Engineering, 34(1).
Alroomi, A., Jeong, D. H. S., and Oberlender, G. D. (2012). "Analysis of Cost-Estimating
Competencies Using Criticality Matrix and Factor Analysis." Journal of Construction
Cao, Y., Ashuri, B., and Baek, M. (2018). "Prediction of Unit Price Bids of Resurfacing
Highway Projects through Ensemble Machine Learning." Journal of Computing in Civil
Deb, K., Pratap, A., Agarwal, S., and Meyarivan, T. (2002). "A fast and elitist multiobjective
182-197.
Dino, I. G., and Üçoluk, G. (2017). "Multiobjective Design Optimization of Building Space
El-Abbasy, M. S., Elazouni, A., and Zayed, T. (2017). "Generic Scheduling Optimization Model
for Multiple Construction Projects." Journal of Computing in Civil Engineering, 31(4).
Elmousalami, H. H. (2020). "Artificial Intelligence and Parametric Construction Cost Estimate
FHWA (2004). "Guidelines on Preparing Engineer's Estimate, Bid Reviews and Evaluation."
Fortin, F.-A., De Rainville, F.-M., Gardner, M.-A. G., Parizeau, M., and Gagné, C. (2012).
"DEAP: Evolutionary algorithms made easy." The Journal of Machine Learning
Gardner, B. J., Gransberg, D. D., and Rueda, J. A. (2017). "Stochastic Conceptual Cost
Sampling." ASCE-ASME Journal of Risk and Uncertainty in Engineering Systems, Part
Halabya, A., and El-Rayes, K. (2020). "Optimizing the Planning of Pedestrian Facilities Upgrade
Projects to Maximize Accessibility for People with Disabilities." Journal of Construction
Hyari, K. H., Khelifi, A., and Katkhuda, H. (2016). "Multiobjective Optimization of Roadway
Jeong, H. S., and Abraham, D. M. (2006). "Operational Response Model for Physically Attacked
Water Networks Using NSGA-II." Journal of Computing in Civil Engineering, 20(5),
328-338.
Karaca, I., Gransberg, D. D., and Jeong, H. D. (2020). "Improving the Accuracy of Early Cost
Le, C., Le, T., Jeong, H. D., and Lee, E.-B. (2019). "Geographic Information System–Based
Framework for Estimating and Visualizing Unit Prices of Highway Work Items." Journal
MDT (2016). "Cost Estimation Procedure for Highway Design Projects." Montana Department
of Transportation.
MnDOT (2008). "Cost Estimation and Cost Management - Technical Reference Manual."
Olumide, A. O., Anderson, S. D., and Molenaar, K. R. (2010). "Sliding-Scale Contingency for
Ott, R. L., and Longnecker, M. (2015). An Introduction to Statistical Methods and Data
Peralta, D., Bergmeir, C., Krone, M., Galende, M., Menéndez, M., Sainz-Palmero, G. I.,
Bertrand, C. M., Klawonn, F., and Benitez, J. M. (2018). "Multiobjective Optimization
for Railway Maintenance Plans." Journal of Computing in Civil Engineering, 32(3),
04018014.
Sayed, M., Abdel-Hamid, M., and El-Dash, K. (2020). "Improving cost estimation in
Schuldt, S., and El-Rayes, K. (2018). "Optimizing the Planning of Remote Construction Sites to
Shehab, T., and Meisami-Fard, I. (2013). "Cost-Estimating Model for Rubberized Asphalt
TxDOT (n.d.). "Risk-Based Construction Cost Estimating - Reference Guide." Texas
WSDOT (2015). "Cost Estimating Manual for Projects." Washington State Department of
Transportation.
Zhang, Y., Minchin, R. E., and Agdas, D. (2017). "Forecasting Completed Cost of Highway
List of Tables

Table 1. STAs' guidance on applying the Pareto principle to cost estimating
Table 2. Top five work items of two different work types

Table 1. STAs' guidance on applying the Pareto principle to cost estimating

Table 2. Top five work items of two different work types
Note: Top five work items in terms of the total extended amount of the item across all projects
of the work type in a historical bid dataset

List of Figures

Fig. 1. Timing of scoping-phase cost estimating in the project development phases
Fig. 5. Optimal trade-offs between mean and CV of cost percentages of a 42-major-item set over
total project cost in HMA resurfacing projects (Work type 1523)
Fig. 7. Optimal trade-offs between mean and CV of cost percentages of a major item set over
total project cost in different projects: A comparison between two work types (1523 - HMA
Fig. 8. Optimal trade-offs between the mean of cost percentages of a 42-major-item set over total
project cost in HMA resurfacing projects and MAPE using mean
Fig. 9. Optimal trade-offs between the median of cost percentages of a 42-major-item set over
total project cost in HMA resurfacing projects and MAPE using median
Fig. 10. Comparison between the mean and median of cost percentages of a 42-major-item set
Fig. 11. Comparison between MAPE using mean and MAPE using median on the optimization
Fig. 12. Changes in average MAPE using median with four-fold cross-validation and increases
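
[Fig. 1. Timing of scoping-phase cost estimating in the project development phases. Diagram labels: phases Planning, Scoping, Preliminary Design, Final Design, and Letting; axis "Project maturity (Project definition level)"; callouts "Cost Estimating at the Scoping Phase" and "Historical Bid Data".]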

[Figure: bid tabulation data of a project]
Project information:
• Project number, Letting date, Winning bidder, etc.
• Work type (e.g., Hot Mix Asphalt resurfacing)
Work item information:
Code          Description               Unit  Quantity  Unit price  Extended amount
...
2303-0245828  Asphalt Binder, PG 58-28  Ton   960       525         504,000
...
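
The example row above is consistent with the extended amount being the quantity multiplied by the unit price (960 × 525 = 504,000). The short sketch below checks that relationship for the one row shown. The dictionary keys and the helper name `extended_amount` are illustrative assumptions, not field names from the STA's dataset.

```python
from decimal import Decimal


def extended_amount(quantity, unit_price):
    """Extended amount of a work item, assumed to be quantity times unit price."""
    # Decimal avoids binary floating-point rounding on currency values.
    return Decimal(str(quantity)) * Decimal(str(unit_price))


# The single work item row recoverable from the bid tabulation example.
item = {
    "code": "2303-0245828",
    "description": "Asphalt Binder, PG 58-28",
    "unit": "Ton",
    "quantity": 960,
    "unit_price": 525,
}

amount = extended_amount(item["quantity"], item["unit_price"])
print(f"{item['code']}: {amount:,}")  # prints "2303-0245828: 504,000"
```

Summing such extended amounts over all work items of a project would give the total project cost against which the major items' share is measured.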

[Figure: model inputs and outputs]
INPUT DATA
Bid tabulation data of an SHA in recent years:
• Project number and work type
• Work item data: item code, description, unit, and quantity; winning bidder's unit price
User-defined input:
• Work type of interest
• Number of major items (n) to be chosen from all relevant items of the work type for cost estimating
MODEL DEVELOPMENT
OUTPUT
Model 1:
• Optimal sets of major work items
• Corresponding mean and coefficient of variation
Model 2:
• Optimal sets of major work items
• Corresponding mean, median, MAPE using mean, and MAPE using median
• MAPE using mean on a hold-out dataset
• MAPE using median on a hold-out dataset
For a work type of interest,
• k projects of the work type in the bid tabulation data: Prj 1, Prj 2, ..., Prj k
• m work items included in the k projects: Item 1, Item 2, ..., Item m

[Figure (plot): optimal 42-item sets (10%) for Worktype 1523 (421 relevant items). x-axis: Mean of Percentages over Total Project Cost; y-axis: Coefficient of Variation of Percentages over Total Project Cost. Labeled solutions: Solution A (Mean = 0.917, CV = 0.114), Solution B (Mean = 0.856, CV = 0.094), Solution C (Mean = 0.900, CV = 0.100).]
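
The trade-off among the labeled Solutions A, B, and C can be illustrated with a small non-dominated filter. This is a sketch under stated assumptions, not the authors' optimization model: it assumes the two objectives are a higher mean and a lower CV, and it adds a hypothetical Solution D (not from the study) to show what a dominated candidate looks like.

```python
def dominates(a, b):
    """True if solution a is at least as good as b on both objectives and better on one.

    Each solution is a (mean, cv) pair; higher mean and lower CV are assumed better.
    """
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better


def pareto_front(solutions):
    """Keep only the solutions that no other solution dominates."""
    return {
        name: point
        for name, point in solutions.items()
        if not any(
            dominates(other, point)
            for other_name, other in solutions.items()
            if other_name != name
        )
    }


solutions = {
    "A": (0.917, 0.114),  # from the plot labels
    "B": (0.856, 0.094),  # from the plot labels
    "C": (0.900, 0.100),  # from the plot labels
    "D": (0.880, 0.105),  # hypothetical candidate, added for illustration
}

print(sorted(pareto_front(solutions)))  # prints ['A', 'B', 'C']
```

Solutions A, B, and C survive because each gives up some mean coverage for a lower CV or the reverse, whereas the hypothetical D is worse than C on both measures.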

[Figure (plot): optimal 42-item sets (10%) for Worktype 1523 (421 relevant items). x-axis: Mean of Percentages over Total Project Cost; y-axis: Coefficient of Variation of Percentages over Total Project Cost.]

[Fig. 7 (plot): comparison between Worktype 1014 (771 relevant items) and Worktype 1523 (421 relevant items). x-axis: Mean of Percentages over Total Project Cost; y-axis: Coefficient of Variation of Percentages over Total Project Cost. Surviving series labels: 25 items (6%), 34 items (8%), 42 items (10%), 59 items (14%), 67 items (16%), and 46 items (6%).]

[Fig. 8 (plot): optimal 42-item sets (10%) for Worktype 1523 (421 relevant items). x-axis: Mean of Percentages over Total Project Cost; y-axis: MAPE Using Mean. Annotation: Average MAPE on the optimization dataset = 0.083.]

[Fig. 9 (plot): optimal 42-item sets (10%) for Worktype 1523 (421 relevant items). x-axis: Median of Percentages over Total Project Cost; y-axis: MAPE Using Median. Legend: MAPE on the hold-out dataset; Optimal solutions on the optimization dataset.]

[Fig. 10 (plot): optimal 42-item sets (10%) for Worktype 1523 (421 relevant items). x-axis: Mean of Percentages over Total Project Cost; reference line labeled "Median = Mean". Two sets of axis ticks survive (0.85 to 1.00 and 0.915 to 0.930).]

[Fig. 11 (plots): optimal 42-item sets (10%) for Worktype 1523 (421 relevant items). x-axes: MAPE Using Mean and MAPE Using Mean on the Hold-out Dataset; reference line in each panel labeled "MAPE (Mean) = MAPE (Median)".]

[Fig. 12 (bar chart): values for Fold 1 to Fold 4 at 10%, 15%, and 20% of the work items. Bar labels in the extracted text range from 0.044 to 0.097 and cannot be reliably assigned to individual folds.]