Enabling deep learning using synthetic data: A case study for the automotive wiring harness manufacturing

Procedia CIRP 107 (2022) 1263–1268
inspection (AOI) and recognition of wiring harnesses and their respective components have been proposed by [4–6]. Nevertheless, there is still a need for flexible and smart quality assurance systems for wiring harness manufacturing.

Fig. 1. (a) Door wiring harness on an assembly board; (b) 3D model of the door wiring harness with simulated deformable linear objects.

The following paper introduces a deep learning-based approach for the automated optical inspection of wiring harnesses. The general intent is to process point cloud data in order to evaluate the quality of wiring harness assembly states, ranging from partially to fully assembled wiring harnesses. The contributions of this paper are threefold. First, a synthetic data generation approach for wiring harnesses based on CAD engineering data is introduced. Second, a deep learning model trained with synthetic and real point cloud data is developed for the automated optical inspection tasks. Lastly, experiments were conducted to derive success factors for the synthetic data generation.

To present these contributions, the paper is structured as follows. It begins with related work on deep learning for point cloud data as well as synthetic point cloud generation approaches. Then, the developed data processing pipeline is presented. The pipeline consists of the process steps data collection, data pre-processing, data splitting, and model training and evaluation. Experiments were conducted with varying parameters and the results are analyzed. The paper concludes with the theoretical and practical implications as well as the future research agenda.

2. Deep learning for point clouds

Deep learning has become a powerful approach for computer vision tasks. The strengths of deep neural networks (DNNs) are the capability to automatically extract features, to process structured and unstructured data, and to generalize to unseen data [7]. Due to these characteristics, there has been a significant increase in research on deep learning in the manufacturing environment [8–10]. To develop the proposed AOI solution for wiring harnesses, the following related work section focuses on DNNs for processing point clouds.

2.1. Point clouds and deep neural networks for point clouds

Point clouds are sets of data points in Euclidean space. For inspection purposes, they are sampled from the surfaces of objects in a scene and represent these objects as 3D information in semantic data structures. The simplest description of a point cloud is a list of the XYZ coordinates of each point. More valuable information can be added with descriptors such as surface normal vectors, RGB values, and intensity values. Point clouds are characterized by irregularity, unstructuredness, and lack of order [11]: the density of data points in a scene varies, the distance between data points is irregular, and point clouds are inherently unorganized.

These characteristics complicate the development of deep learning algorithms for point cloud processing. Therefore, early research focused on converting point clouds into alternative representations such as voxels or octrees [12,13]. Recent research shows that DNNs succeed in extracting features directly from point clouds as unstructured geometric data, so there is no need to pre-process the point clouds into a structured grid format. The most common computer vision tasks on point clouds are point cloud classification, object detection and tracking, and point cloud segmentation [11]. PointNet++ is one of the first well-known DNNs that implements local feature learners in a hierarchical structure for the recognition of fine patterns and the aggregation of local features into higher-level features [14]. Other DNNs have since outperformed PointNet++. Examples are convolutional neural networks such as RS-CNN [14], which have shown that convolutional layers contribute to learning features of point clouds. Graph CNNs can also extract geometric features from point clouds by connecting neighboring points to generate a graph [15].
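To make this data structure concrete, the following minimal NumPy sketch (our own illustration, not taken from the paper) stores a point cloud as an N x 6 array of XYZ coordinates and surface normals; shuffling the rows describes the same scene, which is why networks that consume raw point clouds must be insensitive to point order:

```python
import numpy as np

# Hypothetical point cloud: N points with XYZ coordinates and XYZ surface normals.
N = 2048
points = np.random.rand(N, 3)                                 # XYZ coordinates
normals = np.random.randn(N, 3)
normals /= np.linalg.norm(normals, axis=1, keepdims=True)     # unit surface normals

cloud = np.hstack([points, normals])                          # shape (N, 6): one row per point

# Point clouds are unordered: a random permutation of the rows yields an equivalent
# description of the same scene, so a DNN for point clouds should be permutation-invariant.
permuted = cloud[np.random.permutation(N)]
assert permuted.shape == cloud.shape
```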
2.2. Synthetic point cloud data generation

Synthetic data are defined as data that are not acquired through the measurement of physically existing objects but are still highly similar to real data [16]. Synthesizing artificial data is becoming increasingly interesting in the research field of deep learning because the bottleneck for high-performing DNNs is a large database for training. The collection and preparation of real data is a tedious and time-consuming process. In comparison to real data, synthetic data are easier and faster to collect, allow higher labeling precision, and are less cost-intensive.

The generation and collection of synthetic data range from simple to advanced approaches. On the one hand, point clouds can be synthesized from 3D CAD models; examples of this approach are the datasets ModelNet40 [17] and ShapeNet [18]. On the other hand, elaborate approaches generate synthetic data through simulation and rendering. Such techniques not only demand product models but also require high effort for the proper modeling of the optical sensor, illumination, and other relevant environmental conditions. The goal is to create models and simulations that are as close as possible to the real environment encountered during the inference phase. In an ideal scenario, synthetic data substitute real data during the training stage without compromising the DNN's performance when processing unseen real data during the inference stage. However, there is the inherent problem of a domain gap between real and synthetic data. To overcome the domain gap, transfer learning, generative models, and a combination of real and synthetic data are applied. These approaches can significantly increase the accuracy and generalization performance of DNNs in comparison to training with real point cloud data only [19,20].
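One of the strategies mentioned above for narrowing the domain gap is transfer learning: pre-training on synthetic data and fine-tuning on a smaller amount of real data. The following PyTorch-style sketch illustrates the idea with a tiny placeholder network and dummy tensors (the architecture, data, and hyperparameters are stand-ins, not the authors' implementation):

```python
import torch
import torch.nn as nn

# Tiny per-point MLP standing in for a point cloud segmentation network
# such as PointNet++ (placeholder only, not the original architecture).
class TinySegNet(nn.Module):
    def __init__(self, in_dim=6, num_classes=8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 64), nn.ReLU(),
            nn.Linear(64, num_classes))

    def forward(self, x):               # x: (batch, points, in_dim)
        return self.net(x)              # logits: (batch, points, classes)

def train(model, data, labels, lr, epochs):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        logits = model(data)
        loss = loss_fn(logits.reshape(-1, logits.shape[-1]), labels.reshape(-1))
        loss.backward()
        opt.step()

model = TinySegNet()
# Stage 1: pre-training on (dummy) synthetic point clouds.
syn_x, syn_y = torch.rand(32, 2048, 6), torch.randint(0, 8, (32, 2048))
train(model, syn_x, syn_y, lr=1e-3, epochs=5)
# Stage 2: fine-tuning on a much smaller (dummy) real dataset with a lower learning rate.
real_x, real_y = torch.rand(8, 2048, 6), torch.randint(0, 8, (8, 2048))
train(model, real_x, real_y, lr=1e-4, epochs=5)
```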
3. Deep learning-based data processing pipeline for the automated optical inspection of wiring harnesses

The goal of this paper is the development of a data processing pipeline for the automated optical inspection of wiring harness assembly states that occur in the final assembly line. Assembly states are either uncompleted wiring harnesses, which are work in progress in the assembly line, or finished wiring harnesses, which occur at the end of the assembly line. For the implementation in this paper, the data processing pipeline is realized and tested for a door wiring harness, see Fig. 1. The automated optical inspection aims to assess whether the correct components have been assembled at a specific workstation and in the correct position. These quality features are analyzed to check whether the correct product configuration has been built. This is especially important due to the high product variance and component variety in wiring harness assembly. To deduce these quality features, a DNN for point cloud segmentation, specifically PointNet++, is implemented. The proposed data processing pipeline uses real and synthetic depth data and is composed of the process steps data collection, data pre-processing, data splitting, and model training and evaluation, see Fig. 2.

Fig. 2. Framework for point cloud segmentation using synthetic and real data: wiring harness development (3D model library for components, 3D CAD model, 3D simulation model), synthetic point cloud generation, real point cloud generation (optical sensor, physical wiring harness), pre-processing of both data sources, and PointNet++ training, validation, and testing.
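The pipeline in Fig. 2 can be read as a chain of processing steps. The following sketch outlines that chain with placeholder functions and dummy data (function names, array shapes, and the split ratio are our own assumptions, not the paper's code):

```python
from typing import List, Tuple
import numpy as np

def collect_data() -> Tuple[List[np.ndarray], List[np.ndarray]]:
    """Return lists of raw real and synthetic point clouds (dummy data here)."""
    real = [np.random.rand(5000, 7) for _ in range(3)]
    synthetic = [np.random.rand(8000, 7) for _ in range(6)]
    return real, synthetic

def preprocess(clouds: List[np.ndarray], n_points: int = 2048) -> List[np.ndarray]:
    """Randomly downsample each cloud to a fixed number of points."""
    return [c[np.random.choice(len(c), n_points, replace=False)] for c in clouds]

def split(clouds: List[np.ndarray], train_ratio: float = 0.8):
    """Split point clouds into training and evaluation subsets."""
    cut = int(train_ratio * len(clouds))
    return clouds[:cut], clouds[cut:]

real, synthetic = collect_data()
train_set, test_set = split(preprocess(real) + preprocess(synthetic))
# train_set / test_set would then be fed to PointNet++ training and evaluation.
```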
3.1. Data collection

The data collection step can be differentiated into the generation of real depth data and synthetic depth data. Real data were collected in a laboratory environment. Data capturing was executed for an assembly line consisting of multiple workstations, similar to the real manufacturing environment. A structured-light 3D scanner, specifically the Zivid2 3D camera, was employed to capture point clouds of different door wiring harness configurations, each in multiple assembly states. Due to the size of the wiring harness and the specifications of the camera, more precisely the working distance and aperture angle, each point cloud represents either the left or the right half of the wiring harness. Point clouds are captured and saved as text files containing the XYZ coordinates. Afterward, each point of a point cloud was manually labeled and the XYZ surface normals were added using the CloudCompare software [21]. The labels applied for the segmentation task are 'untaped wire bundles' (UWB), 'fully taped wire bundles' (TWB), 'grommet' (GROM), 'clip type 1' (C1), 'clip type 2' (C2), 'clip type 3' (C3), 'clip type 4' (C4), and 'connector' (CONN).
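For the segmentation task, these class names are typically encoded as integer labels attached to every point. A minimal mapping is sketched below (the concrete numbering is our own assumption and is not given in the text):

```python
# Hypothetical mapping from the segmentation classes listed above to integer labels;
# the numbering actually used by the authors is not reported in this excerpt.
CLASS_TO_ID = {
    "UWB": 0,   # untaped wire bundles
    "TWB": 1,   # fully taped wire bundles
    "GROM": 2,  # grommet
    "C1": 3,    # clip type 1
    "C2": 4,    # clip type 2
    "C3": 5,    # clip type 3
    "C4": 6,    # clip type 4
    "CONN": 7,  # connector
}
ID_TO_CLASS = {v: k for k, v in CLASS_TO_ID.items()}
```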
The starting point of synthetic data generation is the wiring harness development, see Fig. 2. The wiring harness with its multiple configurations was designed in CAD. The 3D component models were either created or retrieved from suppliers' libraries. The next step is composing a scene similar to the physically existing assembly board with wiring harness assembly states by arranging the 3D models. The result is a development artifact that contains the geometric information, name, and layout of the components. Then, for each product configuration, a 3D CAD model was exported into a simulation tool. Mechanical simulation is important for the detection of deformable linear objects because these objects, especially wires, are characterized by many degrees of freedom, no compression strength, and elastic deformation. The simulation should ensure that the objects look as realistic as possible and that various states of the wiring harness are captured. The results are 3D simulation models for the assembly states of the wiring harness, which each reflect the wiring harness as a work in progress after each workstation or as the finished assembly at the end of the assembly line. For our implementation, the CAD model was created in the design tool NX [22] and the simulations were conducted in IPS Cable Simulation [23]. An example of the simulation is given in Fig. 1(b). The next step is generating labeled point clouds for each 3D simulation model, which is done in Blender [24]. Labeling information was added by defining custom object properties on the component level. To convert the 3D model into a point cloud data format, each polygon surface of the model was used to uniformly sample data points. Points on these faces were randomly sampled based on the face-vertex information by a Python script in Blender. Analogous to the real data, the synthetic data were generated for defined wiring harness sections. The output of this process is raw point cloud text files containing the XYZ coordinates, the XYZ normal coordinates, and a label for each point in the point cloud. The labeling process for synthetic data is less time-consuming in comparison to manual labeling: meta-information is provided by the wiring harness development and can be directly reused to automatically derive the labeling information.
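The face-based sampling described above can be sketched in plain NumPy as a simplified stand-in for the Blender script (mesh layout, label source, and the output file name are assumptions): faces are drawn with probability proportional to their area, points are placed via barycentric coordinates, and each point inherits the face normal and component label.

```python
import numpy as np

def sample_labeled_points(vertices, faces, face_labels, n_points=2048, rng=None):
    """Uniformly sample points on a triangulated component mesh.

    vertices:    (V, 3) float array of XYZ coordinates
    faces:       (F, 3) int array of vertex indices per triangle
    face_labels: (F,)   int array with the component class of each face
    Returns an (n_points, 7) array: XYZ, face normal, label.
    """
    rng = rng or np.random.default_rng()
    tri = vertices[faces]                                   # (F, 3, 3)
    edge1, edge2 = tri[:, 1] - tri[:, 0], tri[:, 2] - tri[:, 0]
    cross = np.cross(edge1, edge2)
    areas = 0.5 * np.linalg.norm(cross, axis=1)
    normals = cross / np.linalg.norm(cross, axis=1, keepdims=True)

    # Pick faces proportional to their area, then draw barycentric coordinates.
    idx = rng.choice(len(faces), size=n_points, p=areas / areas.sum())
    u, v = rng.random(n_points), rng.random(n_points)
    flip = u + v > 1.0
    u[flip], v[flip] = 1.0 - u[flip], 1.0 - v[flip]
    pts = tri[idx, 0] + u[:, None] * edge1[idx] + v[:, None] * edge2[idx]

    return np.hstack([pts, normals[idx], face_labels[idx, None].astype(float)])

# Dummy usage with a single labeled triangle; the output file name is hypothetical.
verts = np.array([[0.0, 0, 0], [1, 0, 0], [0, 1, 0]])
cloud = sample_labeled_points(verts, np.array([[0, 1, 2]]), np.array([3]))
np.savetxt("synthetic_cloud.txt", cloud)
```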
3.2. Data pre-processing

The raw point clouds need to be transformed into a format the DNN can process. The real data, which were captured with a physically existing setup, are incomplete and noisy. The conversion of the raw real point clouds into a deep learning-ready format therefore requires data cleaning and data reduction. The pre-processing of the real point cloud data was conducted in CloudCompare. Outlier points were manually deleted to denoise the point clouds. Furthermore, unnecessary objects, such as the assembly board and the fixing elements attached to the formboard, were removed. Normal information was retrieved and added to the text file. The last step is randomly downsampling each point cloud to 2048 data points. The pre-processing of the synthetic data only entails downsampling, because the 3D model of the wiring harness already provides clean point clouds. As with the real data, the number of points in each synthetic point cloud was randomly reduced to 2048. Synthetic data pre-processing is conducted with a Python script in Blender. The result of the data pre-processing step is text files that contain 2048 rows and seven columns with the XYZ coordinates, the XYZ surface normal, and the labeling information.
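The random downsampling and the resulting seven-column file layout can be illustrated as follows (a minimal sketch; the file name and numeric formatting are assumptions):

```python
import numpy as np

def downsample(cloud: np.ndarray, n_points: int = 2048, seed: int = 0) -> np.ndarray:
    """Randomly select a fixed number of rows from an (N, 7) point cloud array."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(cloud.shape[0], size=n_points, replace=False)
    return cloud[idx]

# Dummy raw cloud with columns: x, y, z, nx, ny, nz, label.
raw = np.hstack([np.random.rand(100000, 6), np.random.randint(0, 8, (100000, 1))])
processed = downsample(raw)                                    # (2048, 7)
np.savetxt("cloud_0001.txt", processed, fmt="%.6f %.6f %.6f %.6f %.6f %.6f %d")
```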
The DNN models were created through hyperparameter optimization. Grid search was implemented to find the best hyperparameter setting for each experiment. The hyperparameters comprised the number of epochs, learning rate, momentum, batch size, decay step, decay rate, and dropout. All training runs were evaluated with regard to the evaluation metrics accuracy, loss, and intersection over union (IoU).
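A grid search over such a hyperparameter space can be sketched as follows (illustrative only; the value ranges and the training and evaluation routine are placeholders, not the configuration used in the experiments):

```python
import itertools

# Hypothetical search space; the ranges actually used are not reported in this excerpt.
grid = {
    "learning_rate": [1e-3, 1e-4],
    "momentum": [0.9, 0.95],
    "batch_size": [8, 16],
    "dropout": [0.3, 0.5],
}

def train_and_evaluate(config):
    """Placeholder: train PointNet++ with 'config' and return the validation mean IoU."""
    return sum(config.values())  # dummy score so the sketch runs end to end

best_config, best_iou = None, float("-inf")
for values in itertools.product(*grid.values()):
    config = dict(zip(grid.keys(), values))
    iou = train_and_evaluate(config)
    if iou > best_iou:
        best_config, best_iou = config, iou

print("best configuration:", best_config)
```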
Table 1. Overview of experiments, training approaches, and data splitting.

No.  Training approach                                                            Train/val/test
E1   Baseline model: Real data train and test                                     120/15/15
E2   Ideal model: Synthetic data train, real data test                            137/13/15
E3   Synthetic data: Synthetic data train and test                                262/30/33
E4   Real data train and synthetic data test                                      120/15/20
E5   Synthetic data training, real data fine-tuning and test                      262/30/33, then 120/15/15 or 60/6/15
E6   Mixed dataset: Real and synthetic data train (70/30 ratio), real data test   414/46/15
E7   Mixed dataset: Real and synthetic data train (50/50 ratio), real data test   243/27/15
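The mixed-dataset experiments E6 and E7 combine real and synthetic point clouds at a fixed ratio for training while testing on real data only. A minimal sketch of such a composition is given below (whether the stated ratio refers to real-to-synthetic or synthetic-to-real is not specified in this excerpt, so the parameter is purely illustrative):

```python
import random

def mixed_training_set(real_clouds, synthetic_clouds, real_ratio=0.7, size=None, seed=0):
    """Compose a training set with 'real_ratio' real and (1 - real_ratio) synthetic clouds."""
    rng = random.Random(seed)
    size = size or len(real_clouds) + len(synthetic_clouds)
    n_real = min(len(real_clouds), round(real_ratio * size))
    n_syn = min(len(synthetic_clouds), size - n_real)
    mixed = rng.sample(real_clouds, n_real) + rng.sample(synthetic_clouds, n_syn)
    rng.shuffle(mixed)
    return mixed

# Dummy usage with file names standing in for point clouds.
real = [f"real_{i:03d}.txt" for i in range(120)]
synthetic = [f"syn_{i:03d}.txt" for i in range(262)]
train_mix = mixed_training_set(real, synthetic, real_ratio=0.7)
```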
4. Results and discussion

E2 and E4 provide valuable insight into the domain gap between real and synthetic datasets. The data distribution of the synthetic data appears to be easier for the DNN to learn. A reduction of the XYZ coordinates using t-distributed Stochastic Neighbor Embedding (t-SNE) [26] for a synthesized as well as a real point cloud shows that a clear distinction between the classes is more evident for the synthetic data, see Fig. 3. The t-SNE graph for the real point cloud shows a scattering of the classes. The boundaries between the classes are blurry and hardly observable. This can especially be observed for the classes C1 (4) and TWB (2).
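Such a class-colored embedding of the XYZ coordinates can be reproduced with scikit-learn as sketched below (a generic example, not the authors' code; the input data and the perplexity value are assumptions):

```python
import numpy as np
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

# Dummy stand-in for a pre-processed point cloud: 2048 rows of x, y, z, nx, ny, nz, label.
cloud = np.hstack([np.random.rand(2048, 6), np.random.randint(0, 8, (2048, 1))])
xyz, labels = cloud[:, :3], cloud[:, 6].astype(int)

# Reduce the XYZ coordinates to 2D with t-SNE and color the embedding by class label.
embedding = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(xyz)
plt.scatter(embedding[:, 0], embedding[:, 1], c=labels, cmap="tab10", s=3)
plt.colorbar(label="class id")
plt.title("t-SNE of point coordinates")
plt.show()
```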
Adding synthetic data to the real training data results in a slight increase in accuracy in comparison to the baseline model. Halving the amount of real data caused a slight decrease in accuracy. Fine-tuning with 60 point clouds for training and 6 point clouds for validation resulted in a mean IoU of 92.70 % and a testing accuracy of 96.88 %. A drop of 0.48 % mean IoU is not significant considering that 50 % of the real data was removed from the database for fine-tuning.
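For reference, the per-class IoU and mean IoU used above can be computed from predicted and ground-truth point labels as in the following generic sketch (not the authors' evaluation script):

```python
import numpy as np

def mean_iou(pred: np.ndarray, target: np.ndarray, num_classes: int = 8) -> float:
    """Mean intersection over union across classes for per-point label arrays."""
    ious = []
    for c in range(num_classes):
        intersection = np.sum((pred == c) & (target == c))
        union = np.sum((pred == c) | (target == c))
        if union > 0:                       # skip classes absent from both arrays
            ious.append(intersection / union)
    return float(np.mean(ious))

# Dummy usage with random per-point predictions for one 2048-point cloud.
rng = np.random.default_rng(0)
pred, target = rng.integers(0, 8, 2048), rng.integers(0, 8, 2048)
print(f"mean IoU: {mean_iou(pred, target):.4f}")
```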