ABSTRACT:
The monitoring of construction progress is an essential task on construction sites, which nowadays is conducted mostly by hand. Recent image processing techniques provide a promising approach for reducing manual labor on site. While modern machine learning algorithms such as convolutional neural networks have proven to be of great value in other application fields, they have been widely neglected by the CAE industry so far. In this paper, we propose a strategy for setting up a machine learning routine to detect construction elements on UAV photographs of construction sites. In an accompanying case study using 750 photographs containing nearly 10,000 formwork elements, we reached accuracies of 90% when classifying single-object images and 40% when locating formwork on multi-object images.
Figure: Area of overlap and area of union for predicted and labeled bounding boxes
Figure 3: Reprojected bounding box of a column on a picture gathered during acquisition
Figure 2: Sample data from a) labeling, b) image snippets for classification, as well as c) snippets for DetectNet
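The overlap ratio illustrated above is the intersection over union (IoU) commonly used to judge a predicted bounding box against its labeled counterpart. As an illustration (not code from the paper), a minimal Python sketch, assuming boxes are given as (x1, y1, x2, y2) pixel coordinates:

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    # Area of overlap; zero if the boxes do not intersect
    overlap = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    # Area of union = sum of both box areas minus the overlap
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - overlap
    return overlap / union if union > 0 else 0.0
```

A detection is then typically counted as correct when its IoU with a ground-truth box exceeds a fixed threshold, e.g. 0.5.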
The process of labeling can benefit from this work because, by this method, labels for all building elements can be marked in all pictures that were taken and aligned accordingly. Future research will focus on this method to extract labels for all construction elements and to train a CNN accordingly, without the time-consuming manual labeling work.
4 CASE STUDY

In the following sections, we present an image analysis routine, including data preparation as well as the training of convolutional neural networks, to recognize formwork elements. We focus on two different image analysis tasks: image classification and object detection.
4.1 Data preparation

As an initial dataset, 9,956 formwork elements were labeled manually on pictures of three construction sites that were collected during different case studies in recent years. The images contain formwork elements from two different German manufacturers and vary in size (30 cm up to 2.70 m in length) as well as in color (red, yellow, black, grey). They were taken under varying weather conditions, on partly cloudy as well as sunny days. The images were acquired by aerial photography with different UAVs, but also from the ground with regular digital cameras, resulting in image sizes from 4000 x 3000 px up to 6000 x 4000 px. The manual labeling process for this dataset took around 130 h to complete.

The gathered data is stored as plain text files for each picture and processed for the various neural networks according to their respective requirements.
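The exact layout of these label files is not reproduced in the paper. As an illustration of such a conversion step, the sketch below assumes a simple comma-separated input format (class, x1, y1, x2, y2 per line) and rewrites it into the KITTI-style label lines used by the DIGITS object detection workflow; the function name and the input format are assumptions.

```python
import csv

def to_kitti(label_path, out_path):
    """Convert an assumed 'class,x1,y1,x2,y2' label file into KITTI-style
    lines as consumed by the DIGITS object detection workflow."""
    with open(label_path) as src, open(out_path, "w") as dst:
        for cls, x1, y1, x2, y2 in csv.reader(src):
            # KITTI fields: type truncated occluded alpha bbox(4) dim(3) loc(3) ry;
            # the 3D fields are irrelevant for 2D detection and set to zero.
            dst.write(f"{cls} 0.0 0 0.0 {x1} {y1} {x2} {y2} "
                      "0.0 0.0 0.0 0.0 0.0 0.0 0.0\n")
```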
4.2 Image analysis

For image analysis, we used the Nvidia Deep Learning GPU Training System DIGITS (Yeager 2015), which provides a graphical web interface to the widespread machine learning frameworks TensorFlow, Caffe, and Torch (NVIDIA 2018). It enables data management, network design, and visualization of the training process.

The image snippets for classification are generated by a self-written tool that takes all labeled data and images as input and crops them automatically; a simplified sketch of this cropping step is given at the end of this section. The tool is made available on GitHub as an open-source solution¹. To assure relatively even image sizes with sufficient detailing, we removed all images with resulting dimensions under 200 x 200 pixels.

To train the algorithm not only on formwork elements but on several classes, we added seven classes (see Table 1) that are related to construction sites from the Caltech 256 dataset (Griffin, Holub, and Perona 2007). Caltech 256 provides single-object images of 256 classes that need no further preprocessing for image classification.

Table 1: Classes and number of images per class used for training of an image classification CNN

Class         Origin        Number of images
Barrel        Caltech 256     47
Bulldozer     Caltech 256    110
Car           Caltech 256    123
Chair         Caltech 256     62
Formwork      Own dataset   1410
Screwdriver   Caltech 256    102
Wheelbarrow   Caltech 256     91
Wrench        Caltech 256     39

As GoogLeNet requires input images of 256 x 256 pixels, all images are resized to these dimensions by DIGITS. For image classification, DIGITS automatically splits the data into training and validation sets. The CNN quickly converged towards high top-1 accuracies of around 85% (Figure 4) and stagnated at 90% after 100 epochs, which is a satisfying result. To achieve even higher accuracies throughout all classes, the number of images per class could be evened out in future work by adding additional images to the underrepresented classes of the training data.

¹ https://github.com/tumcms/Labelbox2DetectNet
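The published tool itself is linked above; the following is a simplified sketch of the cropping and filtering step it performs, assuming bounding boxes as (x1, y1, x2, y2) pixel tuples and using Pillow. Resizing to the 256 x 256 px GoogLeNet input is left to DIGITS:

```python
from PIL import Image

MIN_SIZE = 200  # snippets below 200 x 200 px lack sufficient detail

def crop_snippets(image_path, boxes, out_dir):
    """Cut every labeled bounding box out of one photograph and keep
    only snippets with sufficiently large dimensions."""
    image = Image.open(image_path)
    kept = 0
    for i, (x1, y1, x2, y2) in enumerate(boxes):
        if x2 - x1 < MIN_SIZE or y2 - y1 < MIN_SIZE:
            continue  # resulting dimensions under 200 x 200 pixels
        image.crop((x1, y1, x2, y2)).save(f"{out_dir}/snippet_{i:04d}.jpg")
        kept += 1
    return kept
```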
4.3 Object detection

As a next step, an object detection algorithm is introduced to detect certain elements in images and to precisely locate their positions. For this purpose, the dataset depicted in Figure 2 c) is used. To detect several formwork elements within an image of a construction site, we used a CNN with the DetectNet architecture, implemented in Caffe. To reduce training time, we used the weights of the “BVLC GoogLeNet model”², which has been pretrained on ImageNet data. The training is again performed using the Adam solver.
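For reference, the update rule of the Adam solver (Kingma and Ba 2014), written as a plain Python sketch for a single parameter vector. The hyperparameter defaults follow the original paper; this illustrates the method and is not the Caffe implementation:

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for parameters theta given the current gradient."""
    m = beta1 * m + (1 - beta1) * grad        # first moment (mean of gradients)
    v = beta2 * v + (1 - beta2) * grad ** 2   # second moment (uncentered variance)
    m_hat = m / (1 - beta1 ** t)              # bias correction for step t (1-based)
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```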
We split the labeled images into 85% training data and 15% validation data. The images were recorded at a high resolution between 4000 x 3000 and 6000 x 4000 pixels. To minimize the necessary computational effort, we split the images into smaller patches with a size of 1248 x 384 pixels, as sketched below.
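A sketch of this tiling step follows. Whether the original patches overlap is not stated, so this version assumes non-overlapping patches and drops incomplete patches at the image border:

```python
from PIL import Image

PATCH_W, PATCH_H = 1248, 384  # patch size used for DetectNet training

def split_into_patches(image_path):
    """Yield non-overlapping 1248 x 384 px patches of one photograph."""
    image = Image.open(image_path)
    width, height = image.size
    for top in range(0, height - PATCH_H + 1, PATCH_H):
        for left in range(0, width - PATCH_W + 1, PATCH_W):
            yield image.crop((left, top, left + PATCH_W, top + PATCH_H))
```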
We trained the CNN twice with 300 epochs each. Both precision and recall reached values around 63%, while the mAP stagnated around 44% (Figure 5). The network manages to detect most formwork elements correctly, with low rates of false detections. In Figure 6, the resulting bounding box for one example image is depicted; for this image, a very good result was retrieved.

Figure 5: Precision, recall and mAP of the DetectNet after one round of 300 epochs of training for detecting formwork on images of construction sites.
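For illustration, precision and recall for such a detector can be computed by greedily matching predicted boxes against labeled boxes at a fixed IoU threshold, reusing the iou() helper sketched earlier; the threshold of 0.5 is an assumption, and DetectNet's internal scoring may differ:

```python
def precision_recall(predictions, ground_truths, threshold=0.5):
    """Greedy one-to-one matching: a prediction is a true positive if it
    overlaps a not-yet-matched ground-truth box with IoU >= threshold."""
    unmatched = list(ground_truths)
    true_positives = 0
    for pred in predictions:
        best = max(unmatched, key=lambda gt: iou(pred, gt), default=None)
        if best is not None and iou(pred, best) >= threshold:
            unmatched.remove(best)
            true_positives += 1
    precision = true_positives / len(predictions) if predictions else 0.0
    recall = true_positives / len(ground_truths) if ground_truths else 0.0
    return precision, recall
```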
Further steps to improve the object detection algorithm entail more extensive preprocessing of the data, longer training periods, and adjustments of both the network architecture and the solving algorithms.

Table 2: Number of images and number of formwork elements contained in these images for training and validation of the object detection

Purpose      Nr. of images   Nr. of formwork elements
Training     646             8429
Validation   99              1487

Figure 6: Detected bounding box for formwork elements on a photograph of a construction site.

The extracted image snippets serve as input to various classification and detection algorithms, resulting in very high success rates for the classification of single-object images and mediocre success rates for object detection on multi-object images. However, as object detection is a highly demanding task engaging a large community of researchers, the results give a promising starting point for future improvements.

6 ACKNOWLEDGMENTS

² Released for unrestricted use at https://github.com/NVIDIA/DIGITS/tree/master/examples/object-detection
REFERENCES

Braun, Alexander, Sebastian Tuttas, André Borrmann, and Uwe Stilla. 2015. “A Concept for Automated Construction Progress Monitoring Using BIM-Based Geometric Constraints and Photogrammetric Point Clouds.” ITcon 20: 68–79.
Buduma, Nikhil. 2017. Fundamentals of Deep Learning: Designing Next-Generation Machine Intelligence Algorithms. O'Reilly Media.
Girshick, Ross. 2015. “Fast R-CNN.” In 2015 IEEE International Conference on Computer Vision (ICCV), 1440–48. IEEE. doi:10.1109/ICCV.2015.169.
Girshick, Ross, Jeff Donahue, Trevor Darrell, and Jitendra Malik. 2014. “Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation.” In 2014 IEEE Conference on Computer Vision and Pattern Recognition, 580–87. IEEE. doi:10.1109/CVPR.2014.81.
Golparvar-Fard, Mani, F. Peña-Mora, and S. Savarese. 2009. “D4AR - a 4-Dimensional Augmented Reality Model for Automating Construction Progress Monitoring Data Collection, Processing and Communication.” Journal of Information Technology in Construction 14 (June): 129–53.
Griffin, G., A. Holub, and P. Perona. 2007. “Caltech-256 Object Category Dataset.” http://www.vision.caltech.edu/Image_Datasets/Caltech256/.
Han, Kevin K., and Mani Golparvar-Fard. 2017. “Potential of Big Visual Data and Building Information Modeling for Construction Performance Analytics: An Exploratory Study.” Automation in Construction 73 (January): 184–98. doi:10.1016/j.autcon.2016.11.004.
He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. “Deep Residual Learning for Image Recognition.” In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 770–78. IEEE. doi:10.1109/CVPR.2016.90.
Kingma, Diederik P., and Jimmy Ba. 2014. “Adam: A Method for Stochastic Optimization.” arXiv:1412.6980.
Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. 2017. “ImageNet Classification with Deep Convolutional Neural Networks.” Communications of the ACM 60 (6): 84–90. doi:10.1145/3065386.
Kropp, Christopher, Christian Koch, and Markus König. 2018. “Interior Construction State Recognition with 4D BIM Registered Image Sequences.” Automation in Construction 86 (February): 11–32. doi:10.1016/j.autcon.2017.10.027.
LeCun, Yann, Yoshua Bengio, and Geoffrey Hinton. 2015. “Deep Learning.” Nature 521 (7553): 436–44. doi:10.1038/nature14539.
NVIDIA. 2018. “Nvidia DIGITS - Deep Learning DIGITS Documentation.”
Redmon, Joseph, Santosh Divvala, Ross Girshick, and Ali Farhadi. 2016. “You Only Look Once: Unified, Real-Time Object Detection.” In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 779–88. IEEE. doi:10.1109/CVPR.2016.91.
Redmon, Joseph, and Ali Farhadi. 2017. “YOLO9000: Better, Faster, Stronger.” In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 6517–25. IEEE. doi:10.1109/CVPR.2017.690.
Redmon, Joseph, and Ali Farhadi. 2018. “YOLOv3: An Incremental Improvement.” arXiv:1804.02767.
Ren, Shaoqing, Kaiming He, Ross Girshick, and Jian Sun. 2017. “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks.” IEEE Transactions on Pattern Analysis and Machine Intelligence 39 (6): 1137–49. doi:10.1109/TPAMI.2016.2577031.
Russakovsky, Olga, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, et al. 2015. “ImageNet Large Scale Visual Recognition Challenge.” International Journal of Computer Vision 115 (3): 211–52. doi:10.1007/s11263-015-0816-y.
Szegedy, Christian, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. 2015. “Going Deeper with Convolutions.” In 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 1–9. IEEE. doi:10.1109/CVPR.2015.7298594.
Tao, Andrew, Jon Barker, and Sriya Sarathy. 2016. “DetectNet: Deep Neural Network for Object Detection in DIGITS.” https://devblogs.nvidia.com/detectnet-deep-neural-network-object-detection-digits/.
Yeager, Luke. 2015. “DIGITS: The Deep Learning GPU Training System.” ICML AutoML Workshop.