
ParetoPrinciple FinalVersion

This study proposes two multi-objective optimization models to improve cost estimation during the scoping phase of transportation projects by utilizing the Pareto principle. The models aim to identify optimal major work items that significantly contribute to project costs, addressing the limitations of current practices that rely on heuristics and minimal guidance. Results indicate that using a selected set of major items can lead to more accurate cost estimates, with an expected average error of 8.5% when applying historical data.


See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/362397405

Pareto Principle in Scoping-Phase Cost Estimating: A Multiobjective Optimization Approach for Selecting and Applying Optimal Major Work Items

Article in Journal of Construction Engineering and Management · August 2022


DOI: 10.1061/(ASCE)CO.1943-7862.0002349


4 authors, including:

Chau Le, University of North Carolina at Charlotte
Hyungseok David Jeong, Iowa State University
Satish T.S. Bukkapatnam, Texas A&M University

All content following this page was uploaded by Chau Le on 01 September 2022.



Pareto Principle in Scoping-Phase Cost Estimating: A Multi-Objective Optimization Approach for Selecting and Applying Optimal Major Work Items

Chau Le, A.M.ASCE¹; H. David Jeong, A.M.ASCE²; Ivan Damnjanovic, M.ASCE³; and Satish Bukkapatnam⁴

Abstract

Cost estimation is critical to a typical transportation project's development process. Specifically, project owners use cost estimates during the scoping phase to set project budgets for funding approval and cost management. Due to the lack of detailed design, estimators at State Transportation Agencies (STAs) often rely on the Pareto principle, also known as the 80/20 rule, in early cost estimating by estimating only high-impact work items and roughly calculating the costs of remaining items using a fixed percentage. However, it is a heuristic rule of thumb, which does not apply to every scenario. Moreover, STAs' guidance is minimal, and few studies have investigated the validity of this common practice. This study proposes two novel models that utilize well-established multi-objective optimization methods to automatically determine optimal major work items and necessary related information to apply the items to STAs' scoping-phase estimating of new projects. A case study was conducted using an STA's actual historical bid tabulation data for two project work types. The first model's output shows that 10% and 20% of the work items can respectively contribute up to 92% and 97% of total project cost, and the cost contribution ratios vary with not only project work types but also projects under the same type, with coefficients of variance minimized simultaneously and calculated by the model. Due to the variance, mean or median measures can represent the center of cost percentages for estimating new projects' total cost from major items' costs. The second model's results reveal that using the median is preferred due to lower expected errors. Also, using an optimal 10% itemset and the median of its cost contribution ratios in past projects to estimate future projects results in the approach's expected average error of 8.5%, which is acceptable in the scoping phase.

¹ Assistant Professor, Dept. of Civil, Construction and Environmental Engineering, North Dakota State University, Fargo, ND 58102 (corresponding author). Email: [email protected]
² The James C. Smith CIAC Endowed Professor, Dept. of Construction Science, Texas A&M University, College Station, TX 77843. Email: [email protected]
³ Professor, Zachry Dept. of Civil and Environmental Engineering, Texas A&M University, College Station, TX 77843. Email: [email protected]
⁴ Professor, Dept. of Industrial and Systems Engineering, Texas A&M University, College Station, TX 77843. Email: [email protected]

Introduction

Cost estimating is one of the most important parts of any construction project's development and management. A typical highway project may involve four main development phases before construction: 1) Planning, 2) Scoping, 3) Preliminary Design, and 4) Final Design (AASHTO 2013; ITD 2020). In the planning phase, needs for new projects are identified and prioritized. Planning-phase cost estimates are necessary to understand potential funding amounts required and compare different alternatives under consideration (PennDOT 2018). In the scoping phase, projects' definitions become clearer with input from various functional groups and stakeholders (MnDOT 2008). Based on a project's definition, a scoping-phase cost estimate is developed, and the cost estimate information is primarily used to prioritize the projects for programming, in other words, approval or authorization of agency-level funding for the next phases (MnDOT 2008). Thus, the scoping-phase estimate is critical since it becomes the project's baseline cost (WSDOT 2015). Subsequently, estimates are developed in the preliminary design phase to manage project costs against the budgets approved in the scoping phase (AASHTO 2013). Plans, Specifications, and Estimates (PS&E) estimating is required in the final design phase for evaluating bids (AASHTO 2013).

Therefore, the reliability of cost estimates affects State Transportation Agencies (STAs) at both agency and project levels (AASHTO 2013; Elmousalami 2020). First, timely project completion within budget is critical as it directly affects STAs' key performance indicators, accountability for public funds, and public satisfaction. Second, unreliable cost estimating negatively affects budget-related communications, budgeting decisions, and the use of agencies' resources (WSDOT 2015). For example, underestimating causes project delays while additional funding is arranged to meet the project costs. On the other hand, overestimating causes inefficient use of funds (FHWA 2004; Gardner et al. 2017). Specifically, the scoping phase's cost estimate is critical since the project owner uses it as the baseline to set the project budget for cost management; cost estimates in later stages are compared against it (WSDOT 2015).

Due to the difference in project maturity levels in different project development phases, the methods used for scoping-phase cost estimating are not the same as those used for planning-phase or PS&E estimating. For planning-phase cost estimating, STAs typically use simple parametric methods, such as applying cost per parameter (e.g., dollars per centerline mile or square foot of bridge deck) from past similar projects to estimate new projects, due to the limited project information and scope definition in the planning stage (AASHTO 2013; MnDOT 2008). Numerous research studies have also developed statistical modeling or advanced artificial intelligence-based parametric models for predicting total project cost to improve estimation accuracy (Elmousalami 2020; Gardner et al. 2017; Karaca et al. 2020; Zhang et al. 2017). Work item-level information (e.g., approximate quantities of major items) becomes available or can be reasonably determined in the scoping phase, and STAs often use the historical bid-based method to estimate work items' costs (AASHTO 2013; ITD 2020). Unlike PS&E estimating, estimators in the scoping phase (about 10% to 30% of project definition completed) do not have detailed design plans to determine all work items entailed in a project and their quantities (WSDOT 2015). The estimators only focus on high-cost impact work items, based on the Pareto principle, which implies that approximately 20% of the work items account for 80% of a project's total cost (Olumide et al. 2010; PennDOT 2018; TxDOT n.d.). A percentage or minor item allowance is applied to consider the remaining work items (CTDOT 2019; PennDOT 2018).

STAs' application of the Pareto principle (also known as the 80/20 rule) to scoping-phase cost estimating faces challenges. First, the numbers 80 and 20 do not necessarily hold in cost estimating, but guidelines are missing or vary across STAs. While some STAs state that 20% of the work items can account for 80% of the cost, others assume a different contribution for the 20% of items, such as 70% (Iowa DOT 2012; TxDOT n.d.). Second, the cost contributions of the same set of major items in even similar projects are expected to fluctuate, but little is currently known about the variation. Third, STAs do not have detailed guidance on which items should be used for estimating and rely on estimators' judgments and experiences to select the items. Fourth, high-cost impact work items vary with project work types and work-item breakdown structures, but STAs' guidance or research studies on these dynamics are very limited. Last, the error of applying the Pareto principle (i.e., estimating only major items' unit prices and accounting for minor items by a predetermined percentage), compared with estimating all work items in a project, has not been investigated. All of the issues mentioned above can significantly affect the accuracy and reliability of scoping-phase cost estimates.
A few studies have applied the Pareto principle to cost estimating but only identified the major work items or influential factors affecting total project costs (Le et al. 2019; Sayed et al. 2020; Shehab and Meisami-Fard 2013), which is not sufficient to address the issues. For example, Le et al. (2019) identified the top five work items for unit price visualization, and Sayed et al. (2020) determined the nine most critical influential factors from 29 factors influencing construction cost estimates (e.g., site conditions and estimators' experience).

In this research, historical bid data (i.e., cost estimates submitted by the bidders of previous projects in the letting phase) are utilized to address the issues described above. Historical bid data consist of all work items, including both major and minor items, identified by designers/estimators at the final design phase, along with item quantities calculated from detailed design plans. Also, unit prices are available and so allow for the investigation of the cost distribution of the work items.

Since numerous sets of major items can be used for scoping-phase cost estimating, selecting an optimal set is desirable. Apart from accuracy, another critical aspect of cost estimating is the amount of effort spent on developing cost estimates (Cao et al. 2018) because of the limited time allowed for estimating (Alroomi et al. 2012; ITD 2020). This research proposes a novel application of multi-objective optimization methods to enhance STAs' current practice of applying the Pareto principle to scoping-phase cost estimation. Two optimization models are proposed to automatically find optimal major work items for cost estimation of different project work types and work-item breakdown structures. The models' output provides new knowledge about optimal major work items, their contribution to total project cost and relative variation, and the Pareto principle approach's errors. Optimization objectives used in the models include 1) maximizing the mean or the median of the cost percentages of major items over total project cost (e.g., maximizing the average cost contribution of the top 20% items: is it 80% or much higher?), 2) minimizing the coefficient of variance of the percentages for uncertainty reduction, and 3) minimizing the error of applying the Pareto principle in scoping-phase cost estimating. Comparisons between different numbers of major work items are also conducted.

Research scope and objective

Cost estimates are necessary for each development phase (AASHTO 2013). Due to the differences in the amount of input information available for cost estimating and the purpose and required accuracy of the estimates, the methodologies adopted for cost estimation are different between the development phases (PennDOT 2018; WSDOT 2015). This research focuses on cost estimating at the scoping phase due to its importance to budget approval and project cost management (see Fig. 1).

<Insert Fig. 1 here>

STAs often apply the Pareto principle to scoping-phase cost estimating (ITD 2020; Olumide et al. 2010). Specifically, they use the historical bid-based estimating method for major quantifiable work items and then apply a fixed percentage to account for the remaining minor items. Historical bid-based estimating is an approach that relies on the bid tabulation data (see Fig. 2) of similar projects in recent years to estimate unit prices for a new project with possible modifications by estimators to account for unique project characteristics (Le et al. 2019). However, STAs' guidance on applying the Pareto principle is minimal (see Table 1). Various issues mentioned in the Introduction section can significantly affect estimating accuracy.

<Insert Fig. 2 here>

<Insert Table 1 here>

This paper aims to develop an innovative multi-objective approach for selecting optimal major work items for cost estimating by STAs in the scoping phase by leveraging their currently available historical bid tabulation data with the considerations of different project work types, work-item breakdown structures, and multiple objectives. For each optimal set of major work items, definitive and relevant information to apply the itemset for future projects is determined and recommended. The expected error of using the Pareto principle with the itemset is also discovered and discussed.

Methodology

This section presents the development of two multi-objective optimization models to support the application of the Pareto principle to cost estimating in the scoping phase. Fig. 3 gives an overview of the proposed models, including three main components: 1) Input data, 2) Model development, and 3) Model output.

<Insert Fig. 3 here>

Input data

Historical bid data in recent years of an STA are used as input of the proposed models. Each project's data attributes used in the models include the project work type and the winning bidder's extended amounts of all work items in the project (see Fig. 2). Additionally, two user-defined input variables are necessary, including 1) Project work type of interest and 2) Number of major items (n) selected from all items relevant to the work type.

• The project work type variable is needed because different work types have significantly different lists of work items and cost distributions. For example, the top five items of Hot Mix Asphalt (HMA) resurfacing projects are entirely different from those of Portland Cement Concrete (PCC) pavement projects (see Table 2). Major work items used for cost estimating, therefore, vary with project work types.
• The number of major items (n) is related to the amount of time and effort spent on scoping-phase cost estimating.

<Insert Table 2 here>

Model development

The model development process consists of three phases: 1) Decision variables, 2) Objectives, and 3) Implementation.

Phase 1: Decision variables

Assume the past projects of the work type of interest involve m work items (from Item 1 to Item m); m can be hundreds. However, not all work items can be used for cost estimating in the scoping phase due to the lack of detailed design information and design plans. With the user-defined input n, the models need to identify optimal n-item sets from the m work items for future cost estimating in the scoping phase. The selection of n items from the m work items is modeled by vector I [see Eq. (1) to (3)].

$$I = (I_1, I_2, I_3, \ldots, I_m) \tag{1}$$

$$I_i = \begin{cases} 1, & \text{if Item } i \text{ is selected} \\ 0, & \text{if Item } i \text{ is not selected} \end{cases} \tag{2}$$

$$\sum_{i=1}^{m} I_i = n \tag{3}$$
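The decision vector in Eq. (1) to (3) can be illustrated with a short Python sketch. This is only an illustration under stated assumptions: the function names `random_itemset` and `is_feasible` are hypothetical, the seed is arbitrary, and the values m = 421 and n = 42 are borrowed from the HMA resurfacing case study reported later in the paper.

```python
import random


def random_itemset(m, n, rng):
    """Return a 0/1 selection vector I of length m with exactly n ones."""
    chosen = set(rng.sample(range(m), n))  # n distinct item indices
    return [1 if i in chosen else 0 for i in range(m)]


def is_feasible(selection, n):
    """Check Eq. (2) (binary entries) and Eq. (3) (exactly n items selected)."""
    return all(s in (0, 1) for s in selection) and sum(selection) == n


rng = random.Random(0)  # fixed seed, used only for reproducibility
I = random_itemset(m=421, n=42, rng=rng)
print(len(I), sum(I), is_feasible(I, 42))
```

Encoding a candidate itemset as a fixed-length binary vector keeps the cardinality constraint of Eq. (3) easy to verify for every candidate.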

Phase 2: Objectives

Assume the input bid tabulation data contain k projects of the work type of interest (Prj 1 to Prj k). Given a selected set of n work items, the cost percentage of the items in the itemset over total project cost in each project is calculated, forming a sample of k cost percentages for the k projects: Pj (j = 1, ..., k) [see Eq. (4) to (6)].

$$\text{Total project cost of Prj } j = PC_j = \sum_{i=1}^{m} EA_{ij} \tag{4}$$

$$\text{Total cost of selected items in Prj } j = MIC_j = \sum_{i:\, I_i = 1} EA_{ij} \tag{5}$$

$$\text{Cost percentage of selected items over total project cost} = P_j = \frac{MIC_j}{PC_j} \tag{6}$$

Where EAij = the extended amount of Item i in Prj j. Statistical measures of the cost percentages are then calculated (see Fig. 4).

• Mean of the percentages (PMean): the average of the percentages, a measure of central tendency (Ott and Longnecker 2015).
• Median of the percentages (PMedian): the middle value when the percentages are ordered from the lowest to the highest, another measure of central tendency. Compared to the mean, the median is less sensitive to skewness and outliers (Ott and Longnecker 2015).
• Coefficient of variance (CV): the standard deviation divided by the mean of the percentages. While standard deviation is commonly used to measure population spread, CV is more appropriate in comparing the variability between populations (i.e., between different sets of major work items) because it reflects variation over the baseline mean value (Ott and Longnecker 2015).
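As a rough illustration of Eq. (4) to (6) and the three statistical measures, the sketch below computes the cost percentages and their mean, median, and CV for a toy dataset. The extended amounts are invented for illustration only, the function names are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not say whether the sample or population form is used.

```python
from statistics import mean, median, stdev


def cost_percentages(extended_amounts, selection):
    """extended_amounts[j][i] is EA_ij; returns the list of P_j values."""
    percentages = []
    for project in extended_amounts:
        pc = sum(project)  # Eq. (4): total project cost PC_j
        mic = sum(ea for ea, s in zip(project, selection) if s)  # Eq. (5): MIC_j
        percentages.append(mic / pc)  # Eq. (6): P_j
    return percentages


def summarize(percentages):
    """Return PMean, PMedian, and CV (sample standard deviation / mean)."""
    p_mean = mean(percentages)
    return {
        "PMean": p_mean,
        "PMedian": median(percentages),
        "CV": stdev(percentages) / p_mean,  # assumption: sample std. dev.
    }


# Toy data (hypothetical): 3 projects x 4 work items, items 1 and 2 selected.
EA = [[60, 30, 5, 5], [50, 20, 20, 10], [80, 10, 5, 5]]
selection = [1, 1, 0, 0]
P = cost_percentages(EA, selection)
stats = summarize(P)
print(P, stats)
```

In this toy sample one project has a noticeably lower percentage than the others, so the mean falls below the median, the same pattern the paper later reports for its optimal itemsets.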

<Insert Fig. 4 here>

A set of n major work items is associated with two measures of the center of cost percentages: PMean and PMedian. Therefore, there are two strategies for applying the itemset to estimating the total cost of a new project.

• Strategy 1 involves calculating the total cost of the major items included in the project (MIC) and then dividing it by PMean to obtain a total project cost estimate. The Mean Absolute Percentage Error (MAPE), a common measure of how accurate a forecast or estimate is in percentage, between MIC/PMean values and total project costs reflects the error of applying Strategy 1, compared with estimating all work items. The resulting MAPE is called "MAPE using mean" in shorthand.

$$\text{MAPE using mean} = \frac{1}{k} \sum_{j=1}^{k} \left| \frac{\frac{MIC_j}{P_{Mean}} - PC_j}{PC_j} \right| \tag{7}$$

• Strategy 2, similarly, involves calculating the total cost of the major items included in the project and then dividing it by PMedian to obtain a total project cost estimate. The MAPE between MIC/PMedian values and total project costs reflects the error of applying Strategy 2, compared with estimating all work items. The resulting MAPE is called "MAPE using median" in shorthand.

$$\text{MAPE using median} = \frac{1}{k} \sum_{j=1}^{k} \left| \frac{\frac{MIC_j}{P_{Median}} - PC_j}{PC_j} \right| \tag{8}$$
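The two strategies in Eq. (7) and (8) differ only in the center measure used as the divisor, which the following sketch makes explicit. The major-item costs and project costs are invented toy values, and the function name `mape` is a hypothetical helper, not part of the authors' implementation.

```python
from statistics import mean, median


def mape(major_item_costs, project_costs, center):
    """Eq. (7)/(8): mean of |MIC_j / center - PC_j| / PC_j over all projects."""
    errors = [
        abs((mic / center - pc) / pc)
        for mic, pc in zip(major_item_costs, project_costs)
    ]
    return sum(errors) / len(errors)


# Toy data (hypothetical): three past projects of the same work type.
MIC = [90.0, 70.0, 90.0]    # total cost of the selected major items
PC = [100.0, 100.0, 100.0]  # total project cost
P = [mic / pc for mic, pc in zip(MIC, PC)]

mape_mean = mape(MIC, PC, mean(P))      # Strategy 1
mape_median = mape(MIC, PC, median(P))  # Strategy 2
print(round(mape_mean, 4), round(mape_median, 4))
```

In this toy sample the median yields the lower error, but that outcome depends entirely on the invented numbers and should not be read as evidence for the paper's finding.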

Model 1 includes two objectives: 1) Maximizing PMean and 2) Minimizing CV. This model is designed to examine whether the numbers 80 and 20 in the 80/20 rule hold in cost estimating and assess the variation of the ratio.

Model 2 includes four objectives: 1) Maximizing PMean, 2) Maximizing PMedian, 3) Minimizing MAPE using mean, and 4) Minimizing MAPE using median. The model is designed to examine the errors of applying the Pareto principle to scoping-phase cost estimating and compare two application strategies: using mean (Strategy 1) or median (Strategy 2) to represent the cost contributions of major work items over total project cost.

Phase 3: Implementation

The models are implemented using the Non-Dominated Sorting Genetic Algorithm II (NSGA-II) due to its high capability to solve a variety of multi-objective optimization problems and its ability to consider all objectives simultaneously without the need to pre-define weights for the objectives (Deb et al. 2002). Examples of applying NSGA-II to construction decision-making problems are multi-objective scheduling and planning (El-Abbasy et al. 2017; Halabya and El-Rayes 2020; Jeong and Abraham 2006; Peralta et al. 2018), design optimization (Dino and Üçoluk 2017; Hyari et al. 2016), and optimal construction layout or work zone design and development (Abdelmohsen and El-Rayes 2016; Abdelmohsen and El-Rayes 2018; Schuldt and El-Rayes 2018). The models are developed with the support of the Distributed Evolutionary Algorithms in Python (DEAP) toolbox (Fortin et al. 2012).

The NSGA-II computations in the models include four primary tasks:

1) An initialization task that randomly creates an initial population of sets of n work items from all work items relevant to the project work type of interest [see Eq. (1) to (3)],
2) A fitness evaluation task that calculates model evaluation metrics for each generated n-item set (see Fig. 4),
3) A ranking task that sorts the itemsets using nondomination ranks and crowding distances (Deb et al. 2002), and
4) An evolution task of generating new populations with selection, crossover, and mutation operations.

Tasks 2 to 4 iterate until a stop criterion (e.g., a maximum number of iterations) is met.
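The nondomination idea behind the ranking task can be sketched for the two objectives of Model 1 (maximize PMean, minimize CV). The code below is a plain-Python illustration of Pareto dominance and of extracting the first nondominated front only; it does not reproduce full NSGA-II ranking (which also assigns later fronts and crowding distances) and is not the DEAP-based implementation used by the authors. The candidate values are hypothetical.

```python
def dominates(a, b):
    """a and b are (p_mean, cv) tuples; higher p_mean and lower cv are better."""
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better


def first_front(solutions):
    """Return the solutions that no other solution dominates."""
    return [
        s for s in solutions
        if not any(dominates(other, s) for other in solutions)
    ]


# Hypothetical (PMean, CV) pairs for five candidate itemsets.
candidates = [(0.92, 0.10), (0.86, 0.05), (0.90, 0.07), (0.85, 0.08), (0.90, 0.09)]
front = first_front(candidates)
print(front)
```

The surviving candidates are trade-offs: each one is better than the others in at least one objective, which is why the models return a set of solutions instead of a single optimum.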

Model output

Given the work type of interest and the user-defined number of major items (n) for cost estimating, the output of Model 1 is optimal n-item sets, which are trade-off solutions between maximizing PMean and minimizing CV. The item sets and their corresponding measures are provided for further comparisons and analyses.

Similarly, the output of Model 2 is optimal n-item sets with four defined objectives: 1) Maximizing PMean, 2) Maximizing PMedian, 3) Minimizing MAPE using mean, and 4) Minimizing MAPE using median. The MAPE values illustrate the errors of applying the Pareto principle to cost estimating.

However, the MAPE values from the output of Model 2 are calculated from the same projects used for optimization, which likely leads to underestimated errors. Therefore, the optimal sets of n major work items are applied to the cost estimation of each of the projects in a hold-out dataset with the two defined strategies (i.e., Strategy 1: using mean and Strategy 2: using median). Assume there are l projects in the hold-out dataset [from Prj (k+1) to Prj (k+l)].

$$\text{MAPE using mean on the hold-out dataset} = \frac{1}{l} \sum_{j=k+1}^{k+l} \left| \frac{\frac{MIC_j}{P_{Mean}} - PC_j}{PC_j} \right| \tag{9}$$

$$\text{MAPE using median on the hold-out dataset} = \frac{1}{l} \sum_{j=k+1}^{k+l} \left| \frac{\frac{MIC_j}{P_{Median}} - PC_j}{PC_j} \right| \tag{10}$$

The MAPE values on the hold-out dataset are expected to provide more realistic and reliable error estimates.
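The hold-out evaluation in Eq. (9) and (10) can be sketched as follows: the center measure is computed from the optimization projects only and then applied to projects the optimization never saw. All amounts are invented toy values, and `selected_share` and `holdout_mape` are hypothetical helper names.

```python
from statistics import mean, median


def selected_share(project, selection):
    """Return (MIC_j, PC_j) for one project's list of extended amounts."""
    mic = sum(ea for ea, s in zip(project, selection) if s)
    return mic, sum(project)


def holdout_mape(optimization, holdout, selection, center_fn):
    """Eq. (9)/(10): center from the optimization set, error on the hold-out set."""
    shares = [selected_share(p, selection) for p in optimization]
    center = center_fn([mic / pc for mic, pc in shares])
    errors = []
    for project in holdout:
        mic, pc = selected_share(project, selection)
        errors.append(abs((mic / center - pc) / pc))
    return sum(errors) / len(errors)


# Toy data (hypothetical): 3 optimization projects, 2 hold-out projects.
optimization = [[60, 30, 5, 5], [50, 20, 20, 10], [80, 10, 5, 5]]
holdout = [[70, 15, 10, 5], [40, 40, 10, 10]]
selection = [1, 1, 0, 0]

err_mean = holdout_mape(optimization, holdout, selection, mean)
err_median = holdout_mape(optimization, holdout, selection, median)
print(round(err_mean, 4), round(err_median, 4))
```

With so few toy projects, either center measure can come out ahead; the comparison only becomes meaningful on a realistically sized hold-out set.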

Data analysis and results

Bid tabulation data of 181 HMA resurfacing projects (Work type code: 1523) and 95 projects of the work type PCC pavement – Grade/New (Code: 1014) were obtained from an STA and used as input for the proposed models.

Model 1

The input data include 181 HMA resurfacing projects (Work type code: 1523). A total of 421 work items were used in the letting stage of these projects. Following the Pareto principle, STAs suggest estimating only high-cost impact work items in the scoping phase, which are only a small portion of all relevant work items. Since the proposed models allow users to define the number of major items they want to use for scoping-phase cost estimating, optimal sets of various numbers of work items can be obtained. Fig. 5 provides a wide range of optimal sets of 42 work items (10% of the total number of work items).

<Insert Fig. 5 here>
On one end of the spectrum, Solution A represents an optimal 42-item set that results in the highest PMean. On average, the 42 items of Solution A account for 91.7% of the total project cost. However, that percentage is also associated with the highest variation among the generated solutions, which may not be a desirable feature from cost estimators' perspectives. Solution B corresponds to an optimal 42-item set at the other end of the spectrum, resulting in the lowest CV but a PMean value significantly lower than that of Solution A (i.e., 85.6% compared with 91.7%). Between the two ends of the spectrum, the model provides other trade-offs between the two defined objectives: 1) Maximizing PMean and 2) Minimizing CV. Of those, Solution C, identified using the elbow method, appears to be a balanced compromise between Solution A and Solution B. In fact, the ratio 90/10, not 80/20, applies to Solution C.

The solutions correspond to the following setting: population size (npop) = 100, two-point crossover with the probability that an offspring is produced by crossover (pcx) = 0.7, two-point swapping mutation with the probability that an offspring is produced by mutation (pmut) = 0.2, and the maximum number of iterations = 2,000. The solutions are compared with the results of two other NSGA-II settings and the solutions obtained by another popular multi-objective optimization method, i.e., the Strength Pareto Evolutionary Algorithm II (SPEA-II). Fig. 6 shows that the solutions from the alternatives lie very close to the baseline, illustrating the quality and convergence of the baseline solutions.
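The paper does not detail how its variation operators keep exactly n items selected. One plausible reading of a swapping mutation, shown here purely as an assumption, exchanges one selected item with one unselected item so that the constraint in Eq. (3) is preserved automatically. The function name `swap_mutation` is hypothetical and this is not claimed to be the authors' operator.

```python
import random


def swap_mutation(selection, rng):
    """Swap one selected item with one unselected item, keeping sum(I) = n."""
    ones = [i for i, s in enumerate(selection) if s == 1]
    zeros = [i for i, s in enumerate(selection) if s == 0]
    child = list(selection)
    if not ones or not zeros:
        return child  # nothing to swap; return an unchanged copy
    child[rng.choice(ones)] = 0   # deselect one currently selected item
    child[rng.choice(zeros)] = 1  # select one currently unselected item
    return child


rng = random.Random(1)  # arbitrary seed, used only for reproducibility
parent = [1, 1, 0, 0, 0]
child = swap_mutation(parent, rng)
print(parent, child)
```

A standard two-point crossover on binary vectors can change the number of selected items, so an implementation would also need a repair step or a cardinality-preserving crossover; which approach the authors took is not stated.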

<Insert Fig. 6 here>

When the number of major items used for scoping-phase cost estimating increases, the effort required for the estimation increases, which naturally results in improvements in both objectives (see Fig. 7). However, the improvements decrease as the number of major items increases due to the uneven cost distribution among work items. As shown in Fig. 7, for HMA resurfacing projects (Work type code: 1523), the improvements in the objectives from 67 items to 76 items are significantly smaller than those from 25 items to 34 items. A similar trend applies to PCC pavement projects.

<Insert Fig. 7 here>

Fig. 7 also demonstrates the necessity of applying the Pareto principle to different project work types separately. While 421 items were used in the bid tabulation data of the past 181 HMA resurfacing projects, 771 items were used in those of 95 PCC pavement projects. The evaluation metrics (i.e., PMean and CV) of the two work types are also substantially different for the same ratio of major items. Take the ratio of 20% as an example. While 20% of the work items of HMA resurfacing projects can account for up to 96.5% of the total project cost on average, 20% of the work items of PCC pavement projects only contribute up to an average of 91.6% of the total cost; both cost percentages are much larger than the 80% suggested by the 80/20 rule. Conversely, the variations of the cost percentages in PCC pavement projects are significantly smaller than those in HMA resurfacing projects for the same major item ratios (i.e., from 6% to 20% with an increment of 2%).

Model 2

For a specific set of major work items, the major items' contribution to total project cost varies among projects, even with the projects of the same work type. A measure of the center of cost contributions (i.e., mean or median) is necessary to apply an itemset for future projects. Model 2 can enable comparison between using the mean or the median of cost percentages in past projects for future estimating.
The bid tabulation data of the 181 HMA resurfacing projects were randomly divided into two datasets: optimization (75% of the projects) and hold-out (25% of the projects). With the optimization dataset as input, Model 2 can generate optimal solutions for different user-defined numbers of work items and the solutions' evaluation metrics (i.e., PMean, PMedian, MAPE using mean, and MAPE using median). The generated optimal sets of work items were subsequently applied to the hold-out dataset [see Eq. (9) to (10)]. For each optimal set, the total cost of its items in each hold-out project was calculated, then divided by the corresponding PMean or PMedian and compared with the total project cost. Collectively, the relative differences were used to obtain MAPE using mean or MAPE using median on the hold-out dataset. The obtained MAPE values reflected the expected errors of applying the Pareto principle and the optimal itemset for future projects alone, not yet considering other factors influencing the accuracy of a cost estimate (e.g., inaccuracies in quantity takeoffs and unit price estimates).

Fig. 8 shows the optimal 42-item solutions generated from Model 2 with the optimization dataset as input and their corresponding 1) mean and 2) MAPE using mean on the optimization dataset and 3) MAPE using mean on the hold-out dataset. MAPE values on the optimization dataset are generally smaller than MAPE values on the hold-out dataset, justifying the need for splitting the original data into two datasets as performed. The average error of using an optimal set of 42 major work items with mean as the center measure (i.e., Strategy 1) for scoping-phase cost estimating of HMA resurfacing projects alone is 9.3%.

<Insert Fig. 8 here>

Similarly, Fig. 9 shows the optimal 42-item solutions generated from Model 2 with the optimization dataset as input and their corresponding 1) median and 2) MAPE using median on the optimization dataset and 3) MAPE using median on the hold-out dataset. The average error of using an optimal set of 42 major work items with median as the center measure (i.e., Strategy 2) for scoping-phase cost estimating of HMA resurfacing projects alone is 9.2%.

<Insert Fig. 9 here>

For each scenario of user-defined input (i.e., the project work type of interest and the number of major items used for cost estimating), MAPE using mean and MAPE using median must be compared to choose between the mean and the median as the measure of the cost contribution of an optimal major-item set to total project cost. Fig. 10 compares the mean and the median of the optimal 42-item sets of HMA resurfacing projects generated by Model 2. The right part of the figure shows some trade-offs between maximizing PMean and maximizing PMedian, demonstrating that the two objectives should be kept separate, as originally defined in Model 2. The left part of the figure illustrates that the mean is smaller than the median in all generated optimal solutions, indicating left-skewed distributions of cost percentages.

<Insert Fig. 10 here>
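The link between "mean smaller than median" and a left-skewed distribution can be illustrated with a small Python sketch. The cost percentages below are hypothetical, the function name `summarize_percentages` is an assumption introduced here, and the use of the population standard deviation for the CV is also an assumption, since the text does not state which form was used.

```python
from statistics import mean, median, pstdev


def summarize_percentages(percentages):
    """Return (PMean, PMedian, CV) for a list of cost percentages P_j."""
    p_mean = mean(percentages)
    p_median = median(percentages)
    # Assumption: CV = population standard deviation / mean.
    cv = pstdev(percentages) / p_mean
    return p_mean, p_median, cv


# Hypothetical P_j values: most projects cluster near the top of the range,
# while a few projects fall well below it (a long left tail).
percentages = [0.97, 0.96, 0.95, 0.94, 0.93, 0.80, 0.70]

p_mean, p_median, cv = summarize_percentages(percentages)
left_skew_hint = p_mean < p_median  # the low tail pulls the mean down

print(f"PMean={p_mean:.3f}, PMedian={p_median:.3f}, CV={cv:.3f}")
print("Mean below median (left-skew hint):", left_skew_hint)
```

The few low-percentage projects lower the mean but leave the median nearly unchanged, which is the pattern the left part of Fig. 10 is described as showing.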

Each generated set of 42 major work items is associated with four MAPE values: MAPE using mean and MAPE using median from the output of Model 2 (i.e., on the optimization dataset), and MAPE using mean and MAPE using median when the itemset is applied to the hold-out dataset. The left part of Fig. 11 shows that MAPE using median (Strategy 2) is smaller than MAPE using mean (Strategy 1) on the optimization dataset. On the hold-out dataset, MAPE using median is also smaller than MAPE using mean for most solutions (see the right part of Fig. 11). Collectively, the figure suggests that the median (Strategy 2) is a better center measure than the mean (Strategy 1) because it results in smaller errors. This result agrees with the common recommendation that the median is preferred to the mean for skewed distributions.

<Insert Fig. 11 here>

As previously shown in Fig. 9, for optimal sets of 42 major work items, the MAPE using median on the hold-out dataset has an average value of 9.2%. To examine how the results change when different subsets of the data are used for optimization, the analysis was repeated three more times, each time setting aside a different 25% of the project groups as the hold-out dataset and using the remaining 75% of the projects for optimization, similar to four-fold cross-validation. The average errors in the four cases are 9.2%, 7.7%, 7.3%, and 9.7%, and the four-fold average error is 8.5% (see Fig. 12).

<Insert Fig. 12 here>
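The rotation of the hold-out subset can be sketched as follows. This is a minimal Python illustration under stated assumptions: the project identifiers are hypothetical, the helper `four_fold_splits` is introduced here, and projects are assigned to folds round-robin, whereas the study split the data by project groups in a way the text does not detail. The fold errors are the four values reported above.

```python
from statistics import mean


def four_fold_splits(project_ids, k=4):
    """Yield (optimization_ids, hold_out_ids), rotating the hold-out fold."""
    # Assumption: simple round-robin assignment of projects to k folds.
    folds = [project_ids[i::k] for i in range(k)]
    for i, hold_out in enumerate(folds):
        optimization = [pid for j, fold in enumerate(folds) if j != i
                        for pid in fold]
        yield optimization, hold_out


# Hypothetical project identifiers.
project_ids = [f"P{n:02d}" for n in range(1, 13)]
splits = list(four_fold_splits(project_ids))

for fold_no, (optimization, hold_out) in enumerate(splits, start=1):
    print(f"Fold {fold_no}: optimize on {len(optimization)} projects, "
          f"hold out {len(hold_out)}")

# Average hold-out errors (%) reported in the text for the four cases.
fold_errors = [9.2, 7.7, 7.3, 9.7]
average_error = mean(fold_errors)  # about 8.475, reported as 8.5%
print(f"Four-fold average error: {average_error:.3f}%")
```

Each project is held out exactly once, so the averaged error does not depend on a single, possibly unrepresentative, split.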

To obtain a more accurate cost estimate, that is, a lower error of applying the Pareto principle to cost estimating, an obvious solution is to increase the number of major work items used for scoping-phase cost estimating. However, the reduction in error is not linear in the number of major items (see Fig. 12).

Discussion and practical implications

The proposed models' outputs and further analyses have addressed the five issues stated in the Introduction section about the applications of the Pareto principle to scoping-phase cost estimating by STAs.

First, the numbers 80 and 20 in the 80/20 rule and in the STAs' guidance summarized in Table 1 are not likely to hold in cost estimating. For example, Solution C in Fig. 5 corresponds to an optimal set of 42 major work items (i.e., 10% of all relevant items) that accounts for, on average, 90% of the total cost of an HMA resurfacing project. Additionally, 20% of the work items can contribute up to 96.5% of the total project cost on average (see Fig. 7). In these cases, 90/10 or 97/20 (not 80/20) applies. Second, Model 1 provides a measure of the variability of the cost percentages of a major item set over total project cost, which is not available in previous studies or STAs' guidance. Various trade-off solutions between maximizing the average cost percentage and minimizing the CV of cost percentages are also provided. Third, with an STA's historical bid tabulation dataset as input, the proposed models can automatically determine various optimal sets of major items, which reduces STAs' reliance on estimators' judgment and experience in selecting major work items for scoping-phase cost estimating.
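The notion of trade-off solutions between a high mean and a low CV can be made concrete with a small non-dominated filter. The Python sketch below is illustrative only and is not the NSGA-II procedure used in the study: the functions `dominates` and `pareto_front` are assumptions introduced here, Solutions A, B, and C take the mean and CV values labeled in Fig. 5, and Solution D is a hypothetical point added to show a dominated case.

```python
def dominates(a, b):
    """True if solution a is at least as good as b on both objectives
    (higher mean, lower CV) and strictly better on at least one."""
    mean_a, cv_a = a
    mean_b, cv_b = b
    no_worse = mean_a >= mean_b and cv_a <= cv_b
    strictly_better = mean_a > mean_b or cv_a < cv_b
    return no_worse and strictly_better


def pareto_front(solutions):
    """Keep only the solutions that no other solution dominates."""
    return {
        name: point
        for name, point in solutions.items()
        if not any(dominates(other, point)
                   for other_name, other in solutions.items()
                   if other_name != name)
    }


solutions = {
    "A": (0.917, 0.114),  # (mean, CV) as labeled in Fig. 5
    "B": (0.856, 0.094),
    "C": (0.900, 0.100),
    "D": (0.880, 0.110),  # hypothetical: lower mean and higher CV than C
}

front = pareto_front(solutions)
print(sorted(front))
```

Solutions A, B, and C survive because improving one objective always costs something on the other, while D is discarded because C is better on both.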

Fourth, the applications of the Pareto principle to different project work types can be substantially different (see Fig. 7). Yet, STAs' guidance is the same for all projects, which can cause significant errors in cost estimation. The proposed models can be applied flexibly to different STAs (i.e., work breakdown structures) and project work types (e.g., HMA resurfacing or PCC pavement) to obtain the corresponding optimal sets of major work items for cost estimating. Last, the errors of applying the Pareto principle were previously unknown but are now quantified by Model 2. The two strategies for applying the generated optimal solutions to future estimates (i.e., using the mean or the median of the cost percentages of the major items in past projects) can also be compared using the output of Model 2. For example, in the case of HMA resurfacing projects and 42-item sets, Strategy 2 is preferred due to its smaller MAPE values.

Since most STAs have minimal guidance on applying the Pareto principle to cost estimating, the agencies can significantly enhance their current practices by periodically applying the proposed models to their historical bid tabulation data and then providing the generated results to cost estimators for estimating new projects. For example, every five years, an STA can apply the models to its historical bid data from the five most recent years to obtain optimal solutions and corresponding measures for each common project work type. Cost estimators do not need to conduct any optimization tasks; they directly apply the model outputs in the subsequent five years. For a new project, cost estimators are provided with different optimal sets of major work items. Depending on the new project's available information and the certainty in the occurrence of particular major work items, they can select the most appropriate optimal major item set for estimation in the scoping phase. They are also provided with the corresponding measures (e.g., PMean, PMedian, CV, MAPE using mean, and MAPE using median) to calculate total project cost and its variation from the major items' cost and to determine the expected error. As the number of major work items used for estimating increases, the required effort increases, and the error of the approach decreases. Cost estimators can use the expected error of the approach to select the number of work items for which they need to estimate unit prices. Since the required accuracy of scoping-phase cost estimating is not high, with a needed range from -30% to +50% (AASHTO 2013), an error of 8.5% from applying the Pareto principle to cost estimating, as in the cases of Fig. 12, seems acceptable and leaves room for errors caused by other factors (e.g., inaccuracies in quantity takeoffs and unit price estimates).
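How an estimator might apply the provided measures to a new project can be sketched as follows. This Python example is a hypothetical illustration: the function `estimate_total_cost`, the major-item cost of $1.8 million, and the center value of 0.90 are assumptions introduced here, and the expected error of 8.5% is the four-fold average reported above. Treating the MAPE as a symmetric band around the estimate is also an assumption made for illustration, since a MAPE is an average error rather than a bound.

```python
def estimate_total_cost(major_item_cost, p_center, expected_error):
    """Scale the major-item cost up to a total project cost estimate.

    p_center is PMean or PMedian of the chosen optimal item set, and
    expected_error is the corresponding MAPE, both as fractions.
    """
    estimate = major_item_cost / p_center
    # Assumption: show the expected error as a symmetric indicative band.
    low = estimate * (1 - expected_error)
    high = estimate * (1 + expected_error)
    return estimate, low, high


# Hypothetical new project: estimated cost of the major items only.
major_item_cost = 1_800_000
estimate, low, high = estimate_total_cost(major_item_cost,
                                          p_center=0.90,
                                          expected_error=0.085)
print(f"Estimated total project cost: ${estimate:,.0f}")
print(f"Indicative band: ${low:,.0f} to ${high:,.0f}")
```

The estimator only prices the major items; the division by the center measure accounts for all remaining minor items in a single step.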

Summary and conclusions

A scoping-phase cost estimate is critical to the development of a typical transportation project. It is used to set the budget for project cost management. Due to the lack of detailed design plans in the scoping phase and the limited time allowed for estimating, STA cost estimators often apply the Pareto principle in their estimation. They focus time and effort on estimating major work items with a high cost impact and account for the remaining items by a fixed percentage or a minor item allowance. However, STAs have minimal guidance on this approach. Besides, few previous studies have investigated the issues associated with applying the Pareto principle, such as major item determination, variances among projects and project work types, or the error of the approach itself.

This study's primary contribution to the body of knowledge is the novel application of multi-objective optimization methods to address those issues and discover new knowledge that provides STAs with practical guidance on using the Pareto principle in early cost estimating. The proposed models can automatically determine various optimal sets of major work items for different project work types and numbers of work items from an STA's historical bid tabulation dataset. The output measures of each solution also provide definitive information for applying the optimal work item set to future projects, such as the distribution of cost percentages over total project cost with mean, median, and CV and the expected error associated with using the mean or the median for a new project. For example, the case study results show that 10% and 20% of the work items can respectively contribute up to 92% and 97% of total project cost, both of which are much larger than 80%. Also, the cost contribution ratios of an optimal major item set vary even among projects of the same work type, specifically with a CV of at least 9.4% in the case of HMA resurfacing projects and a major item ratio of 10%. The study's findings also illustrate the differences in applying the Pareto principle to different project types, since the Pareto fronts for the same major item ratio of HMA resurfacing and PCC pavement projects are significantly different. The effectiveness of increasing the number of major work items used for cost estimation was also assessed: doubling the ratio from 10% to 20% reduced the error by 41% (from 8.5% to 5.0%).

Due to data availability issues, this study is limited to considering only one primary factor (i.e., project work type) that influences the list of work items and the cost distribution in a project. Although projects of the same work type are similar to each other, variations still exist. Considering additional factors may help create more uniform groups of projects. However, it also requires extra effort from STAs in collecting additional data. Nevertheless, the proposed models provide measures of variations and errors to help STA cost estimators make informed, data-backed decisions. Although this research focused on transportation projects, the proposed approach also applies to other construction sectors, provided that a systematic and consistent work breakdown structure is in use and historical bid tabulation data are available.

Acknowledgment

The authors would like to acknowledge that the Iowa Department of Transportation has provided the bid tabulation data for this study.

Data availability statement

The data used during the study were provided by a third party. Direct requests for these materials may be made to the provider as indicated in the Acknowledgment. Some models or codes that support the findings of this study are available from the corresponding author upon reasonable request.

References

AASHTO (2013). "Practical guide to cost estimating." AASHTO, Washington, DC.
Abdelmohsen, A. Z., and El-Rayes, K. (2016). "Optimal Trade-Offs between Construction Cost and Traffic Delay for Highway Work Zones." Journal of Construction Engineering and Management, 142(7), 05016004.
Abdelmohsen, A. Z., and El-Rayes, K. (2018). "Optimizing the Planning of Highway Work Zones to Maximize Safety and Mobility." Journal of Management in Engineering, 34(1).
Alroomi, A., Jeong, D. H. S., and Oberlender, G. D. (2012). "Analysis of Cost-Estimating Competencies Using Criticality Matrix and Factor Analysis." Journal of Construction Engineering and Management, 138(11), 1270-1280.
Cao, Y., Ashuri, B., and Baek, M. (2018). "Prediction of Unit Price Bids of Resurfacing Highway Projects through Ensemble Machine Learning." Journal of Computing in Civil Engineering, 32(5).
CTDOT (2019). "Connecticut Department of Transportation 2019 Estimating Guidelines." Connecticut Department of Transportation.
Deb, K., Pratap, A., Agarwal, S., and Meyarivan, T. (2002). "A fast and elitist multiobjective genetic algorithm: NSGA-II." IEEE Transactions on Evolutionary Computation, 6(2), 182-197.

Dino, I. G., and Üçoluk, G. (2017). "Multiobjective Design Optimization of Building Space Layout, Energy, and Daylighting Performance." Journal of Computing in Civil Engineering, 31(5), 04017025.
El-Abbasy, M. S., Elazouni, A., and Zayed, T. (2017). "Generic Scheduling Optimization Model for Multiple Construction Projects." Journal of Computing in Civil Engineering, 31(4).
Elmousalami, H. H. (2020). "Artificial Intelligence and Parametric Construction Cost Estimate Modeling: State-of-the-Art Review." Journal of Construction Engineering and Management, 146(1).
FHWA (2004). "Guidelines on Preparing Engineer's Estimate, Bid Reviews and Evaluation." Federal Highway Administration, Washington, D.C.
Fortin, F.-A., De Rainville, F.-M., Gardner, M.-A. G., Parizeau, M., and Gagné, C. (2012). "DEAP: Evolutionary algorithms made easy." The Journal of Machine Learning Research, 13(1), 2171-2175.
Gardner, B. J., Gransberg, D. D., and Rueda, J. A. (2017). "Stochastic Conceptual Cost Estimating of Highway Projects to Communicate Uncertainty Using Bootstrap Sampling." ASCE-ASME Journal of Risk and Uncertainty in Engineering Systems, Part A: Civil Engineering, 3(3), 05016002.
Halabya, A., and El-Rayes, K. (2020). "Optimizing the Planning of Pedestrian Facilities Upgrade Projects to Maximize Accessibility for People with Disabilities." Journal of Construction Engineering and Management, 146(1).
Hyari, K. H., Khelifi, A., and Katkhuda, H. (2016). "Multiobjective Optimization of Roadway Lighting Projects." Journal of Transportation Engineering, 142(7).
Iowa DOT (2012). "Design Manual." Iowa Department of Transportation.


ITD (2020). "Construction Cost Estimating Guide." Idaho Transportation Department.
Jeong, H. S., and Abraham, D. M. (2006). "Operational Response Model for Physically Attacked Water Networks Using NSGA-II." Journal of Computing in Civil Engineering, 20(5), 328-338.
Karaca, I., Gransberg, D. D., and Jeong, H. D. (2020). "Improving the Accuracy of Early Cost Estimates on Transportation Infrastructure Projects." Journal of Management in Engineering, 36(5).
Le, C., Le, T., Jeong, H. D., and Lee, E.-B. (2019). "Geographic Information System–Based Framework for Estimating and Visualizing Unit Prices of Highway Work Items." Journal of Construction Engineering and Management, 145(8), 04019044.
MDT (2016). "Cost Estimation Procedure for Highway Design Projects." Montana Department of Transportation.
MnDOT (2008). "Cost Estimation and Cost Management - Technical Reference Manual." Minnesota Department of Transportation.
Olumide, A. O., Anderson, S. D., and Molenaar, K. R. (2010). "Sliding-Scale Contingency for Project Development Process." Transportation Research Record, 2151(1), 21-27.
Ott, R. L., and Longnecker, M. (2015). An Introduction to Statistical Methods and Data Analysis, Cengage Learning.
PennDOT (2018). "Estimating Manual." Pennsylvania Department of Transportation.
Peralta, D., Bergmeir, C., Krone, M., Galende, M., Menéndez, M., Sainz-Palmero, G. I., Bertrand, C. M., Klawonn, F., and Benitez, J. M. (2018). "Multiobjective Optimization for Railway Maintenance Plans." Journal of Computing in Civil Engineering, 32(3), 04018014.
Sayed, M., Abdel-Hamid, M., and El-Dash, K. (2020). "Improving cost estimation in construction projects." International Journal of Construction Management, 1-20.
Schuldt, S., and El-Rayes, K. (2018). "Optimizing the Planning of Remote Construction Sites to Minimize Facility Destruction from Explosive Attacks." Journal of Construction Engineering and Management, 144(5).
Shehab, T., and Meisami-Fard, I. (2013). "Cost-Estimating Model for Rubberized Asphalt Pavement Rehabilitation Projects." Journal of Infrastructure Systems, 19(4), 496-502.
TxDOT (n.d.). "Risk-Based Construction Cost Estimating - Reference Guide." Texas Department of Transportation.
WSDOT (2015). "Cost Estimating Manual for Projects." Washington State Department of Transportation.
Zhang, Y., Minchin, R. E., and Agdas, D. (2017). "Forecasting Completed Cost of Highway Construction Projects Using LASSO Regularized Regression." Journal of Construction Engineering and Management, 143(10), 04017071.
List of Tables

Table 1. STAs' guidance on applying the Pareto principle to cost estimating
Table 2. Top five work items of two different work types
Table 1. STAs' guidance on applying the Pareto principle to cost estimating

| No. | Guidance | Project phase | Reference |
|---|---|---|---|
| 1 | Minor items: 20 – 30% of the cost | Scoping | WSDOT (2015) |
| 2 | Minor items: 15 – 30% of the major item cost | Scoping | CTDOT (2019) |
| 3 | 20% of work items: 70% of the cost | Early phases | Iowa DOT (2012) |
| 4 | 20% of work items: 80% of the cost | Scoping | MnDOT (2008) |
| 5 | 20% of work items: 80% of the cost | Not specified | PennDOT (2018) |
| 6 | 20% of work items: 80% of the cost | Scoping | TxDOT (n.d.) |
| 7 | Major items: 65 – 85% of the cost | Not specified | MDT (2016) |
| 8 | Major items: most of the cost | Not specified | ITD (2020) |
Table 2. Top five work items of two different work types

| No. | Work type code | Work type description | Top five work items |
|---|---|---|---|
| 1 | 1523 | HMA resurfacing | 1. Asphalt binder, PG 58-28; 2. Asphalt binder, PG 64-22; 3. Asphalt binder, PG 64-28; 4. HMA mixture (300,000 ESAL), Intermediate; 5. Granular shoulders, Type B |
| 2 | 1014 | PCC pavement – Grade/New | 1. Standard or slip-form PCC pavement; 2. Mobilization; 3. Special backfill; 4. Excavation, Class 10, Roadway and borrow; 5. Removal of pavement |

Note: Top five work items ranked by the total extended amount of the item across all projects of the work type in a historical bid dataset.
List of Figures

Fig. 1. Timing of scoping-phase cost estimating in the project development phases
Fig. 2. Data attributes of historical bid data
Fig. 3. Proposed multi-objective optimization models
Fig. 4. Calculations of evaluation metrics
Fig. 5. Optimal trade-offs between mean and CV of cost percentages of a 42-major-item set over total project cost in HMA resurfacing projects (Work type 1523)
Fig. 6. Convergence of optimal trade-offs solutions
Fig. 7. Optimal trade-offs between mean and CV of cost percentages of a major item set over total project cost in different projects: A comparison between two work types (1523: HMA resurfacing and 1014: PCC pavement)
Fig. 8. Optimal trade-offs between the mean of cost percentages of a 42-major-item set over total project cost in HMA resurfacing projects and MAPE using mean
Fig. 9. Optimal trade-offs between the median of cost percentages of a 42-major-item set over total project cost in HMA resurfacing projects and MAPE using median
Fig. 10. Comparison between the mean and median of cost percentages of a 42-major-item set over total project cost in HMA resurfacing projects
Fig. 11. Comparison between MAPE using mean and MAPE using median on the optimization dataset and the hold-out dataset
Fig. 12. Changes in average MAPE using median with four-fold cross-validation and increases in the number of major items
[Fig. 1 (diagram): project development phases in order of increasing project maturity (project definition level): Planning, Scoping, Preliminary Design, Final Design, Letting. Cost estimating at the scoping phase draws on historical bid data.]
[Fig. 2 (diagram): bid tabulation data of a project]
• Project information: project number, letting date, winning bidder, etc.; work type (e.g., Hot Mix Asphalt resurfacing)
• Work item information: code, description, unit, quantity, unit price, extended amount
• Example work item: code 2303-0245828; description Asphalt Binder, PG 58-28; unit Ton; quantity 960; unit price 525; extended amount 504,000
[Fig. 3 (flowchart), reconstructed as text]

INPUT DATA
• Bid tabulation data of an SHA in recent years: project number and work type; work item data (item code, description, unit, and quantity; winning bidder's unit price)
• User-defined input: work type of interest; number of major items (n) to be chosen from all relevant items of the work type for cost estimating

MODEL DEVELOPMENT
• Phase 1, Decision variables: an n-item set of major work items to be used for cost estimating
• Phase 2, Objectives
  Model 1: maximize the mean of percentages of the total cost of major items over total project cost; minimize the coefficient of variation of the percentages
  Model 2: maximize the mean of percentages of the total cost of major items over total project cost; maximize the median of percentages of the total cost of major items over total project cost; minimize MAPE using mean; minimize MAPE using median
• Phase 3, Implementation: optimization computations using NSGA-II

OUTPUT
• Model 1: optimal sets of major work items; corresponding mean and coefficient of variation
• Model 2: optimal sets of major work items; corresponding mean, median, MAPE using mean, and MAPE using median; MAPE using mean on a hold-out dataset; MAPE using median on a hold-out dataset
[Fig. 4 (diagram), reconstructed as text]

For a work type of interest:
• k projects of the work type in the bid tabulation data: Prj 1, Prj 2, ..., Prj k
• m work items included in the k projects (all relevant work items): Item 1, Item 2, ..., Item m
• S: a set of n major items selected from the m relevant items (e.g., Item 2, Item 4, ..., Item (m-1))

Percentages over total project cost, for each of the k projects (j = 1, ..., k):
• Calculate total project cost: PCj
• Calculate the total cost of the major items included in the project: MICj
• Calculate the percentage over the total project cost: Pj = MICj / PCj

Evaluation metrics:
• Mean of the percentages: PMean
• Median of the percentages: PMedian
• Coefficient of variation of the percentages: CV
• MAPE using mean: MAPE between (MICj / PMean) and PCj
• MAPE using median: MAPE between (MICj / PMedian) and PCj
[Fig. 5 (plot). x-axis: Mean of Percentages over Total Project Cost; y-axis: Coefficient of Variation of Percentages over Total Project Cost. Note: Work type 1523 (421 relevant items), optimal 42-item sets (10%). Labeled solutions: Solution A (Mean = 0.917, CV = 0.114); Solution B (Mean = 0.856, CV = 0.094); Solution C (Mean = 0.900, CV = 0.100).]
[Fig. 6 (plot). x-axis: Mean of Percentages over Total Project Cost; y-axis: Coefficient of Variation of Percentages over Total Project Cost. Note: Work type 1523 (421 relevant items), optimal 42-item sets (10%). Series: Baseline, NSGA-II, npop = 100, pcx = 0.7, pmut = 0.2; NSGA-II, npop = 50, pcx = 0.7, pmut = 0.2; NSGA-II, npop = 100, pcx = 0.9, pmut = 0.1; SPEA-II.]
[Fig. 7 (plot). x-axis: Mean of Percentages over Total Project Cost; y-axis: Coefficient of Variation of Percentages over Total Project Cost. One Pareto front per major-item ratio for each work type.
Work type 1523 (421 relevant items): 25 items (6%), 34 items (8%), 42 items (10%), 51 items (12%), 59 items (14%), 67 items (16%), 76 items (18%), 84 items (20%).
Work type 1014 (771 relevant items): 46 items (6%), 62 items (8%), 77 items (10%), 93 items (12%), 108 items (14%), 123 items (16%), 139 items (18%), 154 items (20%).]
[Fig. 8 (plot). x-axis: Mean of Percentages over Total Project Cost; y-axis: MAPE Using Mean. Note: Work type 1523 (421 relevant items), optimal 42-item sets (10%). Series: optimal solutions on the optimization dataset (average MAPE = 0.083); MAPE of the optimal solutions on the hold-out dataset (average MAPE = 0.093).]
[Fig. 9 (plot). x-axis: Median of Percentages over Total Project Cost; y-axis: MAPE Using Median. Note: Work type 1523 (421 relevant items), optimal 42-item sets (10%). Series: optimal solutions on the optimization dataset (average MAPE = 0.072); MAPE of the optimal solutions on the hold-out dataset (average MAPE = 0.092).]
[Fig. 10 (two-panel plot). x-axis: Mean of Percentages over Total Project Cost; y-axis: Median of Percentages over Total Project Cost. One panel shows the full range with a "Mean = Median" reference line; the other is a zoomed view of the optimal solutions. Note: Work type 1523 (421 relevant items), optimal 42-item sets (10%).]
[Fig. 11 (two-panel plot). Left panel: MAPE Using Median versus MAPE Using Mean on the optimization dataset. Right panel: MAPE Using Median versus MAPE Using Mean on the hold-out dataset. Each panel includes a "MAPE (Mean) = MAPE (Median)" reference line. Note: Work type 1523 (421 relevant items), optimal 42-item sets (10%).]
[Fig. 12 (bar chart). Average MAPE using median for Fold 1 to Fold 4 in three groups: 10%, 15%, and 20% of the work items. Bar labels as extracted (assignment to individual bars not recoverable): 0.097, 0.092, 0.085, 0.073, 0.077, 0.070, 0.073, 0.064, 0.055, 0.055, 0.055, 0.056, 0.050, 0.046, 0.044.]
