
UNIVERSITY OF L'AQUILA

MASTER THESIS

Exploration and Pre-selection of Images for Downstream Tasks: Leveraging Embeddings and Anomaly Detection in the Manufacturing Process

Author: Ashish DAHAL
Supervisor: Dr. Phuong T. NGUYEN

Corso di Laurea Magistrale in Informatica

Department of Information Engineering, Computer Science and Mathematics

Academic Year 2022/2023



Acknowledgements
I would like to express my deep gratitude to my academic supervisor, Dr. Phuong
T. Nguyen, for his invaluable guidance, support, and expertise throughout the thesis
process. His constant encouragement and feedback significantly contributed to the
success of this project.
I extend my immense appreciation to my supervisor Johan Westö from Novia
University of Applied Sciences, Finland, and Mika Adler, my supervisor from Mirka Oy,
for their industry insights, practical suggestions, and mentorship. Their contribu-
tions were crucial in shaping this research.
I am also grateful to my friends and family for their continuous support and
encouragement throughout my academic journey. Their unwavering belief in me
motivated me to overcome challenges and pursue my goals with enthusiasm.
Furthermore, I would like to acknowledge the many researchers and authors
whose work has provided the foundational knowledge for this study. Their dedica-
tion to advancing knowledge in the field of computer vision and machine learning
has been a constant source of inspiration and motivation.

Contents

Acknowledgements

1 Introduction
1.1 Background and Motivation
1.2 Problem Statement
1.3 Scope and Limitations
1.4 Thesis Structure and Overview

2 Literature Review
2.1 Defect Detection in Manufacturing
2.2 Image Pre-selection for Machine Learning
2.3 Image Embeddings and Pre-trained Models
2.4 Dimensionality Reduction Techniques: UMAP and Alternatives
2.5 Anomaly Detection: LOF and Alternatives

3 Methodology
3.1 The Proposed Framework
3.2 Dataset Acquisition and Pre-processing
3.2.1 Image Collection
3.2.2 Image Pre-processing
3.3 Feature Extraction using ResNet Model
3.3.1 Pre-trained ResNet Model Selection
3.3.2 Image Embeddings Calculation
3.4 Unsupervised Anomaly Detection using LOF
3.4.1 LOF Algorithm
3.4.2 Hyperparameter Selection and Tuning
3.5 Dimensionality Reduction using UMAP
3.5.1 UMAP Algorithm
3.5.2 Parameter Selection and Tuning
3.6 Image Pre-selection for Annotation
3.6.1 Threshold Determination for Anomaly Detection
3.6.2 Image Exploration and Pre-selection
3.7 Evaluation Metrics

4 Experiments and Results
4.1 Performance Evaluation
4.1.1 Feature Extraction Performance
4.1.2 UMAP Performance
4.1.3 LOF Performance
4.2 Embeddings Visualization and Observations
4.2.1 Colour-coding with LOF Scores
4.2.2 Colour-coding with Camera Number
4.2.3 Colour-coding with Timestamp
4.3 Pre-selection Results
4.3.1 Pre-selection Accuracy
4.3.2 Missed Defect Fraction

5 Discussion
5.1 Interpretation of the Results
5.2 Potential Improvements and Extensions
5.2.1 Automatic Threshold Selection
5.2.2 Image Selection Based on Vector Quantization
5.2.3 Contrastive Learning for Feature Extraction
5.2.4 Active Learning Approach
5.3 Applicability to Other Industries

6 Conclusions
6.1 Summary of Findings
6.2 Future Work

Bibliography

List of Figures

2.1 Building Block of Residual Learning [28]
2.2 Architecture of ResNet and VGG-19 [28]
2.3 UMAP Applied to COIL20, MNIST, and Fashion MNIST Datasets [33]
3.1 Proposed Framework for Image Pre-selection
3.2 Sandpaper Images with Defects
3.3 Sandpaper Images without Defects
3.4 FiftyOne Dashboard for Image Tagging
4.1 Per-image Computation Time for ResNet-18 Feature Extraction
4.2 Total Computation Time for ResNet-18 Feature Extraction
4.3 Per-image Computation Time for 2D UMAP Embedding Generation
4.4 Total Computation Time for 2D UMAP Embedding Generation
4.5 Per-image Computation Time for LOF Score
4.6 Total Computation Time for LOF Score
4.7 Dashboard for Image Exploration
4.8 Scatter Plot of 2D UMAP Embeddings for Different Batches
4.9 Patterns and Global Anomalies
4.10 Colour-coding with LOF Scores
4.11 Colour-coding Based on Camera Number
4.12 Colour-coding Based on Production Timestamp
4.13 2D UMAP Scatter Plot for Training Dataset
4.14 YOLOv8 Top1 Validation Accuracy

List of Tables

3.1 Dataset Image Count
3.2 ResNet-18 Architecture for Embeddings Extraction
4.1 Configuration for the YOLOv8 Model
4.2 Confusion Matrix of the Pre-selection Result
4.3 Calculation of Missed Defect Fraction (MDF)

Chapter 1

Introduction

1.1 Background and Motivation


Sandpaper manufacturing is a crucial process in the production of abrasive materials
used across various industries, including automotive, woodworking, metalworking,
and construction [1]. Ensuring the quality of a product is vital to maintain its per-
formance, longevity, and safety [2]. Therefore, defect detection in sandpapers is an
essential aspect of the manufacturing process.
In recent years, advances in computer vision and machine learning have rev-
olutionized the field of automated defect detection systems [3]. These systems,
which significantly improve the efficiency, accuracy, and consistency of quality con-
trol compared to manual inspection methods, have found applications in various
industries, including sandpaper manufacturing [4]. However, the effectiveness of
such systems is highly dependent on the quality and relevance of the training data
used to build the models [4].
Training these models often necessitates the procurement of massive amounts
of labelled data, a process that can be both time-consuming and costly [3]. The
complexity is compounded in a manufacturing environment, where hundreds of
thousands of images from the production line arrive daily, making comprehensive
labelling an impractical task. Furthermore, the rarity of defects in manufacturing
processes leads to an inherently imbalanced dataset [5]. This imbalance significantly
challenges learning algorithms, often resulting in models with suboptimal perfor-
mance on the minority class, the class that is of prime interest in defect detection
tasks [6].
Addressing this issue requires an innovative approach to facilitate the process
of data labelling while ensuring the quality of the training dataset. Unsupervised
learning techniques, such as anomaly detection algorithms, have shown promise in
identifying potential defects without the need for labelled data [7, 8]. These tech-
niques can effectively learn the normal behaviour and identify deviations from it as
potential anomalies or defects [9, 10].
By combining unsupervised learning with image embeddings from pre-trained
models, it may be possible to develop a method for pre-selecting images that are
more likely to contain defects [11, 12]. Image embeddings provide a compact and
rich representation of the image content, capturing both the local and global im-
age features. These embeddings, when learned from large-scale pre-trained models,
have been shown to capture a wide range of visual semantics [13, 14].
This approach can reduce the overall effort required for data labelling, ultimately
enhancing the performance of defect detection systems in sandpaper manufacturing.
By focusing the labelling effort on a pre-selected set of images that are more likely
to contain defects, we can ensure a more balanced and informative training dataset.
This, in turn, can lead to more robust and accurate defect detection models [5, 6].

1.2 Problem Statement


This thesis primarily aims to assess the efficacy of unsupervised learning approaches
in streamlining the annotation process in sandpaper manufacturing by identifying
images of potential defects prior to labelling. The research strives to address several
pertinent aspects:

1. The development and application of an unsupervised technique to identify and pre-select images showing possible defects during sandpaper production, thereby minimizing the resources invested in data labelling.

2. The establishment of a comprehensive workflow that integrates image embeddings, anomaly detection mechanisms, and dimensionality reduction techniques. This pipeline is tailored to detect potential defects in sandpaper images and provide a visual representation of the same.

3. The rigorous evaluation of the devised method in terms of pre-selection accuracy and the proportion of defects overlooked during this preliminary stage of selection.

To address these research questions, this thesis introduces a methodology that har-
nesses image embeddings extracted from a pre-trained ResNet model, the Local Out-
lier Factor for anomaly identification, and UMAP for reducing dimensionality and
facilitating visualization. This approach aims to expedite the process of image pre-
selection and annotation, thereby optimizing the labelling process and improving
the overall quality of the training dataset.

1.3 Scope and Limitations


The scope of this thesis is limited to the development and evaluation of an unsuper-
vised method for pre-selecting sandpaper images with potential defects for annota-
tion. The study focuses on the use of image embeddings from a pre-trained ResNet
model, Local Outlier Factor for outlier detection, and UMAP for dimensionality re-
duction and visualization. The research is conducted in collaboration with Mirka,
and the dataset used for experimentation is provided by the company.
The limitations of the study include the availability and representativeness of the
dataset, the unavailability of the ground truth for the entire dataset, the generaliz-
ability of the proposed method to other industries and applications, and the reliance
on unsupervised learning techniques, which may be sensitive to hyperparameter
settings and noise in the data. Additionally, the study assumes that the pre-trained
ResNet model is suitable for feature extraction in the context of sandpaper manufac-
turing, although other pre-trained models could also be considered.

1.4 Thesis Structure and Overview


The remainder of this thesis is organized as follows:

• Chapter 2 presents a literature review on defect detection in manufacturing, unsupervised learning for image pre-selection, image embeddings, anomaly detection techniques, and dimensionality reduction methods.

• Chapter 3 describes the methodology used in the study, including dataset ac-
quisition and pre-processing, feature extraction using a pre-trained ResNet
model, unsupervised anomaly detection using Local Outlier Factor, image pre-
selection for annotation, dimensionality reduction using UMAP, and evalua-
tion metrics.

• Chapter 4 presents the experimental results, including feature extraction performance, UMAP performance, Local Outlier Factor performance evaluation, and pre-selection results.

• Chapter 5 discusses the interpretation of the results, potential improvements and extensions, and the applicability of the proposed method to other industries and applications.

• Chapter 6 concludes the thesis by summarizing the findings and suggesting directions for future work.

Chapter 2

Literature Review

2.1 Defect Detection in Manufacturing


The manufacturing process often involves the collection of large image datasets for
quality control and defect detection. However, these datasets often suffer from im-
balance, with the majority of images representing non-defective samples and only a
small proportion showing defects [15, 16, 17, 18]. This imbalance poses a significant
challenge in training machine learning models for defect detection, as the models
may become biased towards the majority class, i.e., the defect-free images [19].
Moreover, the annotation of these datasets, which involves labelling the images
as either normal or defective, is a time-consuming and costly process [20]. This is
particularly true in cases where defects are rare events [15], as is often the case in
sandpaper manufacturing. The high cost of annotation, coupled with the dataset
imbalance, necessitates the development of innovative approaches to defect detec-
tion that can overcome these challenges.
Additionally, defects can manifest as either local anomalies, in which only a portion
of an image displays abnormal characteristics, or global anomalies, in which the
entire image appears abnormal [21]. Both types of anomalies must be taken into
account for a thorough quality evaluation.
One such approach is the use of unsupervised learning techniques, which do not
require labelled data for training. For instance, anomaly detection algorithms can be
employed to identify potential defects in the images, thereby facilitating the process
of data labelling [22]. These algorithms can effectively detect unknown anomalous
patterns in the images, thereby enabling the detection of defects that may not have
been previously encountered.
Furthermore, the use of pre-trained models has been proposed as a means to
enhance the performance of defect detection systems. These models, which have
been trained on large, diverse datasets, can extract useful features from the images,
which can then be used for defect detection [23]. For instance, a pre-trained ResNet
model can be used to generate image embeddings, which can then be used as input
to an anomaly detection algorithm [22].
In conclusion, the challenges associated with defect detection in manufacturing,
such as dataset imbalance and high annotation cost, can be addressed through the
use of unsupervised learning techniques and pre-trained models. These approaches
can facilitate the process of data labelling, enhance the performance of defect detec-
tion systems, and ultimately contribute to the improvement of product quality in the
manufacturing industry.

2.2 Image Pre-selection for Machine Learning


The concept of image pre-selection in machine learning is gaining traction, especially
in the context of neural scaling laws. These laws describe how error decreases as a
power of the training set size or model size, but they often come with significant
computational and energy costs [24].
Likewise, the authors of [25] propose a novel approach to address these challenges. They argue that not all training samples are equally important, and many of them become less relevant after a few epochs of training. They therefore introduce an importance sampling scheme that accelerates the training of any neural network architecture by focusing computation on the samples that will introduce the biggest change in the parameters, reducing the variance of gradient estimates.
In line with this idea, the authors of [24] suggest that many training examples are redundant, implying that training datasets can be pruned to smaller sizes without sacrificing performance. The authors develop a new analytic theory of data pruning and demonstrate that exponential scaling of error with respect to the pruned dataset size is possible.
Furthermore, the authors of [24] introduce a new, inexpensive unsupervised data
pruning metric that does not require labels. Surprisingly, this unsupervised metric
performs comparably to the best supervised pruning metrics that do require labels
and significantly more computational resources.
The findings of these studies collectively suggest that the discovery of effective
data pruning methods and importance sampling techniques may offer a viable path
towards substantially improving neural scaling laws. By reducing the resource costs
associated with modern deep learning, such advancements can have a significant
impact on the field.

2.3 Image Embeddings and Pre-trained Models


Pre-trained models have become a cornerstone in the field of machine learning and
computer vision, providing a means to leverage large amounts of pre-existing data
to improve performance and efficiency in various tasks. These models are trained
on large datasets and can extract useful features from input data, which can then be
used in other tasks, often with significantly less data available [26, 27].
One of the most popular pre-trained models used in image processing tasks is
the Residual Network (ResNet) [28]. ResNet, introduced by He et al. [28], is a type
of convolutional neural network (CNN) that uses skip connections or shortcuts to
jump over some layers. This unique architecture allows the model to be trained
effectively with a large number of layers.

FIGURE 2.1: Building Block of Residual Learning [28]



The architecture of ResNet is composed of several stacked "residual blocks". Each
residual block contains a series of convolutional layers and a shortcut connection
that bypasses these layers [28]. The input to each block is added to the output of the
block, which helps to mitigate the problem of vanishing gradients during training,
allowing the network to learn more complex features. This architecture has been
shown to achieve state-of-the-art performance on the ImageNet dataset [29] and is
widely used in various image-processing tasks.

FIGURE 2.2: Architecture of ResNet and VGG-19 [28]

Image embeddings are a form of feature extraction where an image is represented as a vector of numbers. These embeddings can be extracted from various
layers of a pre-trained model like ResNet and VGG. The choice of the layer from
which to extract the embeddings depends on the complexity of the features needed
for the task at hand. Earlier layers capture basic features like edges and textures,
while deeper layers capture more complex features like shapes and object parts [30].
These embeddings can be used for various tasks, including image classification,
object detection, and image retrieval [31]. They can also be used for visualization
purposes, where techniques like t-SNE [32] or UMAP [33] can be used to project the
high-dimensional embeddings onto a 2D or 3D space. This can provide insights into
the distribution and structure of the data and can be particularly useful in tasks like
anomaly detection, where anomalies often manifest as outliers in the embedding
space [34].
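
To make the choice of layer concrete, the snippet below sketches how features can be read from both an early and a late node of a pre-trained ResNet-18 using torchvision's feature-extraction utility; the chosen node names and the random input tensor are illustrative assumptions, not part of the cited works.

import torch
from torchvision.models import resnet18
from torchvision.models.feature_extraction import create_feature_extractor

# Pre-trained ResNet-18 as the backbone
model = resnet18(weights="IMAGENET1K_V1").eval()

# Read out one early and one late node of the network
extractor = create_feature_extractor(
    model, return_nodes={"layer1": "early", "avgpool": "late"}
)

with torch.no_grad():
    feats = extractor(torch.randn(1, 3, 224, 224))

print(feats["early"].shape)  # (1, 64, 56, 56): edge/texture-level features
print(feats["late"].shape)   # (1, 512, 1, 1): high-level semantic features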

2.4 Dimensionality Reduction Techniques: UMAP and Alternatives
Uniform Manifold Approximation and Projection (UMAP) is a non-linear dimen-
sionality reduction technique that has gained popularity due to its ability to pre-
serve both local and global structures in data [33]. UMAP is based on three assump-
tions about the data: the data are uniformly distributed on a Riemannian manifold,
the Riemannian metric is locally constant, and the manifold is locally connected
[35]. The algorithm first constructs a weighted k-nearest neighbour graph using
the nearest-neighbour descent algorithm, then computes a low-dimensional repre-
sentation that preserves the characteristics of this graph [35].

FIGURE 2.3: UMAP Applied to COIL20, MNIST, and Fashion MNIST Datasets [33]

UMAP has been used in various applications, including the analysis of genomic
data. For instance, UMAP was applied to biobank-derived genomic data of a Japanese
population, revealing fine-scale population structure and differentiating adjacent in-
sular subpopulations [36]. This study demonstrated that UMAP, in combination
with PCA (PCA-UMAP), was able to clearly distinguish neighbouring clusters while
retaining the global structure, making it a powerful tool for visualizing and under-
standing complex genomic data [36].
In the context of image data, UMAP has been used to visualize high-dimensional
image embeddings. For instance, in the work of Zhu et al. [37], UMAP was used
to visualize the embeddings of an image-to-image translation model. The authors
found that UMAP was able to capture meaningful semantic relationships in the im-
age data. Similarly, in the work of Wang et al. [38], UMAP was used to visualize the
embeddings of a deep learning model trained for image-text matching. The authors
found that UMAP was able to effectively capture the semantic relationships between
images and their associated text descriptions.
Likewise, Principal Component Analysis (PCA) is a classical linear dimensional-
ity reduction method that has been widely used to uncover large population struc-
tures [36]. PCA identifies the directions (principal components) in which the data
varies the most and projects the data onto these directions to reduce its dimensional-
ity [35]. However, PCA’s linear nature may not capture the fine and subtle structure,
and it may not maintain the global structure of the data as effectively as UMAP [36].
t-Distributed Stochastic Neighbor Embedding (t-SNE) is another non-linear di-
mensionality reduction method that has been used to interpret complex population
structures and disease biology [36]. t-SNE converts similarities between data points
to joint probabilities and minimizes the Kullback-Leibler divergence between the
joint probabilities of the low-dimensional embedding and the high-dimensional data
[35]. However, t-SNE is more focused on preserving local structures, and it may not
maintain the global structure of the data as effectively as UMAP [35].
In comparison, UMAP exhibits high stability and moderate accuracy, with the
second highest computing cost after t-SNE [35]. UMAP is also computationally
fast and scalable for application to large datasets [36]. Moreover, UMAP is capable
of clearly distinguishing neighbouring clusters while retaining the global structure,
making it a powerful tool for visualizing and understanding complex data [36].
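
To illustrate how the three techniques are typically invoked in practice, the following sketch projects the same feature matrix with PCA, t-SNE, and UMAP; the digits dataset merely stands in for image embeddings, and the parameter values are illustrative defaults.

from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from umap import UMAP

# 64-dimensional digit images as a stand-in for image embeddings
X, _ = load_digits(return_X_y=True)

# Linear projection onto the directions of maximal variance
X_pca = PCA(n_components=2).fit_transform(X)

# Non-linear; emphasizes local neighbourhood structure
X_tsne = TSNE(n_components=2, perplexity=30).fit_transform(X)

# Non-linear; balances local and global structure
X_umap = UMAP(n_components=2, n_neighbors=15, min_dist=0.1).fit_transform(X)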

2.5 Anomaly Detection: LOF and Alternatives


Anomaly detection is a critical task in data mining, with the Local Outlier Factor
(LOF) being a powerful algorithm for detecting local outliers. The LOF measures the
local deviation of a given data point with respect to its neighbours. It is more effec-
tive in discovering outliers that are not revealed by global outlier detection methods
[39].
The LOF algorithm works by comparing the local density of a point with the
local densities of its neighbours. The local density is estimated by the reachabil-
ity distance, which is defined as the maximum of the actual distance and the local
reachability distance of the data point and its kth nearest neighbour. If a point has
a significantly lower density than its neighbours, it is considered an outlier. The de-
gree of outlierness is measured by the LOF score, which is the ratio of the average
local density of the neighbours to the local density of the point itself [39].
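
To make this construction explicit, the following minimal numpy sketch implements the LOF score naively (O(n²) pairwise distances, exactly k neighbours, no tie handling); the five-point dataset is purely illustrative, and production code should use an optimized library implementation instead.

import numpy as np

def lof_scores(X, k=2):
    # Pairwise Euclidean distances; a point is never its own neighbour
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)

    # Indices of the k nearest neighbours of each point
    knn = np.argsort(D, axis=1)[:, :k]

    # k-distance: distance from each point to its k-th nearest neighbour
    k_dist = np.take_along_axis(D, knn[:, -1:], axis=1).ravel()

    # Reachability distance: reach(p, o) = max(k_dist(o), d(p, o))
    reach = np.maximum(k_dist[knn], np.take_along_axis(D, knn, axis=1))

    # Local reachability density: inverse of the mean reachability distance
    lrd = 1.0 / reach.mean(axis=1)

    # LOF: average lrd of the neighbours divided by the point's own lrd
    return lrd[knn].mean(axis=1) / lrd

X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
print(lof_scores(X, k=2))  # the isolated point (5, 5) gets the largest score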
However, the LOF and its related local outlier detection algorithms are near-
est neighbour-based algorithms, which have a computational complexity of O(n²),
making them less efficient for large datasets [40]. Moreover, the LOF requires storage
in computer memory of the whole dataset and the distance values, and it requires
recalculation from the beginning if any modification occurs in the dataset, such as
the insertion of a new data point [41].
Several alternative anomaly detection techniques have been proposed to address these issues. The Connectivity-based Outlier Factor (COF) algorithm offers detection performance equivalent to that of the LOF but is more efficient in data streams [42]. The Heterogeneous Data Streams Outlier Detection (HDSOD) approach uses a partition-cluster approach for data stream segments and computes the outlier result based on the number of cluster references and the degree of representation [43]. The Incremental LOF (ILOF) algorithm uses a sliding window to update data profiles and reduce the false-positive rate [44]. Finally, the Local Outlier Probabilities (LoOP) algorithm can detect outliers almost immediately and reduces computational time [45].
Among these techniques, LOF has several advantages. It does not require any
distribution assumptions and can be applied to different data types. It is also more
accurate in detecting local outliers compared to global outlier detection methods.
However, it requires an appropriate distance calculation for the data and has a high
computational complexity for large datasets [40].
In conclusion, while LOF is a powerful tool for anomaly detection, its applica-
tion to large datasets and data streams can be challenging due to its computational
complexity and memory requirements. Alternative techniques such as COF, HD-
SOD, ILOF, and LoOP offer potential solutions to these challenges, but each has its
own strengths and weaknesses that need to be considered in different application
contexts [40].

Chapter 3

Methodology

In this chapter, we present in detail the proposed framework for the assessment of
the efficacy of unsupervised learning approaches in streamlining the annotation pro-
cess in sandpaper manufacturing by identifying images of potential defects prior to
labelling. We conceived a methodology that harnesses image embeddings extracted
from a pre-trained ResNet model, the Local Outlier Factor for anomaly identifica-
tion, and UMAP for reducing dimensionality and facilitating visualization. We aim
to expedite the process of image pre-selection and annotation, thus optimizing the
labelling process and improving the overall quality of the training dataset.

3.1 The Proposed Framework


Mirka’s production line produces hundreds of thousands of sandpaper images daily.
Labelling these images is a costly and time-consuming task, especially when rare
defects are the primary focus. Thus, this points to a need for a mechanism to group
similar images together and isolate ones that deviate from the normal. Moreover,
it is essential to have an easier way to visualize and navigate the entire dataset to
facilitate image selection. Figure 3.1 illustrates the workflow we have developed to
address this task. The general workflow is denoted by colored boxes on the left and
the specific tools used in our research are denoted by blue boxes.
The visualization problem is addressed by Uniform Manifold Approximation
and Projection (UMAP). UMAP can map high-dimensional images to lower dimen-
sions (2D or 3D) for visualization, preserving both global and local structures. As a
result, similar images will have similar UMAP embeddings and will appear closer
together in the UMAP embedding space [33]. However, running UMAP on raw im-
age pixels requires substantial memory as all images need to be loaded into memory
to calculate nearest neighbours.
To overcome this problem, we first extract features from the images and then
run UMAP to obtain a lower-dimensional representation. We can either build our
model or use pre-trained models like ResNet [28] for feature extraction. In our case,
we chose ResNet-18. The fundamental idea is that the features (embeddings) of
similar images are also closer together in the latent space. We then visualize these
UMAP embeddings in a scatter plot to get an overview of the dataset. Our experi-
ments reveal that the dataset usually comprises patterns or clusters. Notably, some
of these clusters predominantly consist of anomalies, referred to as global anomalies.
Following this, we apply an anomaly detection algorithm, specifically Local Outlier
Factor (LOF), to the original feature space. This assigns an anomaly score to each
image. By setting an appropriate threshold, we are able to distinguish local anoma-
lies from the images. Consequently, UMAP and LOF work in concert, allowing for
the detection of both global and local anomalies.

FIGURE 3.1: Proposed Framework for Image Pre-selection

The next challenge is to find a way to quickly view the images in the scatter plot
in areas of interest and tag the images as defective or non-defective. Several tools
can perform these tasks; however, we opted for Plotly Dash (https://dash.plotly.com/)
and FiftyOne (https://docs.voxel51.com/) for our
research. Using these two Python packages, we can select data points on the graph
to preview images quickly, filter images based on anomaly scores, compare image
intensities between two or more selected regions on the graph, tag images, etc.
The output of pre-selection is a curated dataset of pre-selected images that can
then be sent to human labellers to create a training dataset for downstream tasks like
defect detection. This process means that only a small fraction of images are sent for
labelling instead of the hundreds of thousands of images that arrive daily.

3.2 Dataset Acquisition and Pre-processing


3.2.1 Image Collection
The images used for this research were obtained from Mirka’s sandpaper manufac-
turing plant, where a Cognex In-Sight 9902L line scan camera was deployed on the
sandpaper production line. This camera offers a practical solution for capturing im-
ages in a continuous production environment, where high-frequency scanning of
moving objects is essential. This approach is ideal for the task at hand, consider-
ing that the defects on sandpaper are visually identifiable, thus allowing computer
vision to be a viable solution for determining sandpaper quality and classification.
For the scope of this study, we selected five datasets representing five batches of
the same product family, collectively comprising 122,487 images, as shown in Table
3.1. These images, captured in the Bitmap (.bmp) format, originally had dimensions
of 1948 × 550 pixels, with file sizes ranging from 890 KB to 920 KB each, and were
recorded in RGB colour.
TABLE 3.1: Dataset Image Count

Batch Number of Images Alias


Combi2-Iridium_Grip_220/2022-12-27 12,685 Batch A
Combi2-Iridium_Grip_220/2022-09-15 17,975 Batch B
Combi2-Iridium_Grip_220/2023-01-25 18,279 Batch C
Combi2-Iridium_Grip_220/2022-11-04 25,199 Batch D
Combi2-Iridium_Grip_220/2022-10-04 48,349 Batch E
Total 122,487

According to the manufacturer's reports, the incidence of defects is relatively
low, with less than 0.05% of the sandpapers expected to contain some kind of de-
fect. Figure 3.2 shows the defective images, and Figure 3.3 shows the non-defective
images in the dataset for the product family "Iridium Grip 220".

FIGURE 3.2: Sandpaper Images with Defects



FIGURE 3.3: Sandpaper Images without Defects



3.2.2 Image Pre-processing


In preparation for the analysis, the images underwent several pre-processing steps.
First, each image was opened and converted to RGB format using the Python Imag-
ing Library (PIL). This conversion was necessary since the images may have been
saved in a different colour space during the collection process.
Then, we applied a series of transformations using the torchvision.transforms
module in PyTorch. The transformation pipeline consisted of resizing, tensor con-
version, and normalization.
The images were resized to 224x224 pixels to conform to the input size require-
ment of the pre-trained ResNet model used for generating image embeddings.
Next, we converted the images to tensors to allow them to be processed by Py-
Torch models. The conversion from a PIL Image to a PyTorch tensor changes the
image array from (height, width, channels) to (channels, height, width), as PyTorch
models expect the channel dimension to be first.
Finally, the images were normalized using the mean and standard deviation of
the RGB channels of the ImageNet dataset, which the ResNet model was originally
trained on. This normalization step is crucial to ensure that our model receives in-
puts that follow the same distribution as the original ResNet training data. The
specific means were [0.485, 0.456, 0.406] and standard deviations were [0.229, 0.224,
0.225] for the RGB channels respectively.
Here is the Python code snippet for the image pre-processing:

from torchvision import transforms
from PIL import Image

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]
    )
])

# 'path' holds the file path of a single sandpaper image
img = Image.open(path).convert('RGB')
img_t = transform(img)

This pre-processing step prepared the dataset for subsequent unsupervised learn-
ing and defect detection tasks. By standardizing the size, format, and colour distri-
bution of the images, we ensured that the downstream models can focus on the
critical task of identifying potential defects, rather than handling image variety.

3.3 Feature Extraction using ResNet Model


3.3.1 Pre-trained ResNet Model Selection
In this study, we chose the ResNet-18 model for feature extraction. The ResNet
(Residual Network) family is a prominent choice for image classification tasks, hav-
ing demonstrated exceptional performance across various domains [46]. The ResNet-
18 model, in particular, was chosen for its high computational efficiency and ade-
quate depth of learning. This makes it a suitable candidate for an industrial setting

such as sandpaper manufacturing, where quick and reliable results are crucial.
ResNet-18 is a deep convolutional neural network consisting of 18 layers, which
include several types of layers: convolutional, batch normalization, ReLU activation,
pooling, and fully connected layers. Initially trained on millions of images from
the ImageNet database, it has learned to extract complex and detailed patterns and
features from images [47].
Opting for a pre-trained model such as ResNet-18 brings several benefits. Firstly,
it capitalizes on the power of transfer learning [47]. This is a process in which knowl-
edge gained while solving one problem is applied to another related problem. In
our context, the general object recognition capabilities learned by the model from
the ImageNet dataset can be reused for sandpaper defect detection. Secondly, using
ResNet-18 saves a substantial amount of time and computational resources, which
would otherwise be needed to train such a deep model from scratch. The reduced
number of layers compared to larger models like ResNet-50 allows for faster infer-
ence time, making ResNet-18 an excellent choice for applications where rapid re-
sponse times are essential.

3.3.2 Image Embeddings Calculation


To generate embeddings for our sandpaper images, we employed the ResNet-18
model as a feature extractor. Specifically, we used the model up to its final fully
connected layer, discarding this last layer, as shown in Table 3.2. The reasoning
behind this decision is that the final fully connected layer is generally more task-
specific (in ResNet-18’s case, 1000-class object recognition), whereas the preceding
layers learn more generic, reusable features.

TABLE 3.2: ResNet-18 Architecture for Embeddings Extraction

Layer (type) Output Shape Param #
Conv2d-1 [-1, 64, 112, 112] 9,408
BatchNorm2d-2 [-1, 64, 112, 112] 128
ReLU-3 [-1, 64, 112, 112] 0
MaxPool2d-4 [-1, 64, 56, 56] 0
Conv2d-5 [-1, 64, 56, 56] 36,864
BatchNorm2d-6 [-1, 64, 56, 56] 128
ReLU-7 [-1, 64, 56, 56] 0
Conv2d-8 [-1, 64, 56, 56] 36,864
BatchNorm2d-9 [-1, 64, 56, 56] 128
ReLU-10 [-1, 64, 56, 56] 0
BasicBlock-11 [-1, 64, 56, 56] 0
Conv2d-12 [-1, 64, 56, 56] 36,864
BatchNorm2d-13 [-1, 64, 56, 56] 128
ReLU-14 [-1, 64, 56, 56] 0
Conv2d-15 [-1, 64, 56, 56] 36,864
BatchNorm2d-16 [-1, 64, 56, 56] 128
ReLU-17 [-1, 64, 56, 56] 0
BasicBlock-18 [-1, 64, 56, 56] 0
Conv2d-19 [-1, 128, 28, 28] 73,728
BatchNorm2d-20 [-1, 128, 28, 28] 256
ReLU-21 [-1, 128, 28, 28] 0
Conv2d-22 [-1, 128, 28, 28] 147,456
BatchNorm2d-23 [-1, 128, 28, 28] 256
Conv2d-24 [-1, 128, 28, 28] 8,192
BatchNorm2d-25 [-1, 128, 28, 28] 256
ReLU-26 [-1, 128, 28, 28] 0
BasicBlock-27 [-1, 128, 28, 28] 0
Conv2d-28 [-1, 128, 28, 28] 147,456
BatchNorm2d-29 [-1, 128, 28, 28] 256
ReLU-30 [-1, 128, 28, 28] 0
Conv2d-31 [-1, 128, 28, 28] 147,456
BatchNorm2d-32 [-1, 128, 28, 28] 256
ReLU-33 [-1, 128, 28, 28] 0
BasicBlock-34 [-1, 128, 28, 28] 0
Conv2d-35 [-1, 256, 14, 14] 294,912
BatchNorm2d-36 [-1, 256, 14, 14] 512
ReLU-37 [-1, 256, 14, 14] 0
Conv2d-38 [-1, 256, 14, 14] 589,824
BatchNorm2d-39 [-1, 256, 14, 14] 512
Conv2d-40 [-1, 256, 14, 14] 32,768
BatchNorm2d-41 [-1, 256, 14, 14] 512
ReLU-42 [-1, 256, 14, 14] 0
BasicBlock-43 [-1, 256, 14, 14] 0
Conv2d-44 [-1, 256, 14, 14] 589,824
BatchNorm2d-45 [-1, 256, 14, 14] 512
ReLU-46 [-1, 256, 14, 14] 0
Conv2d-47 [-1, 256, 14, 14] 589,824
BatchNorm2d-48 [-1, 256, 14, 14] 512
ReLU-49 [-1, 256, 14, 14] 0
BasicBlock-50 [-1, 256, 14, 14] 0
Conv2d-51 [-1, 512, 7, 7] 1,179,648
BatchNorm2d-52 [-1, 512, 7, 7] 1,024
ReLU-53 [-1, 512, 7, 7] 0
Conv2d-54 [-1, 512, 7, 7] 2,359,296
BatchNorm2d-55 [-1, 512, 7, 7] 1,024
Conv2d-56 [-1, 512, 7, 7] 131,072
BatchNorm2d-57 [-1, 512, 7, 7] 1,024
ReLU-58 [-1, 512, 7, 7] 0
BasicBlock-59 [-1, 512, 7, 7] 0
Conv2d-60 [-1, 512, 7, 7] 2,359,296
BatchNorm2d-61 [-1, 512, 7, 7] 1,024
ReLU-62 [-1, 512, 7, 7] 0
Conv2d-63 [-1, 512, 7, 7] 2,359,296
BatchNorm2d-64 [-1, 512, 7, 7] 1,024
ReLU-65 [-1, 512, 7, 7] 0
BasicBlock-66 [-1, 512, 7, 7] 0
AdaptiveAvgPool2d-67 [-1, 512, 1, 1] 0

When an image is passed through the ResNet-18 model, the output of the penul-
timate layer is a 512-dimensional feature vector or an "embedding". This embedding
encapsulates the essential visual features of the input image as understood by the
model.
In Python, using PyTorch, the procedure for calculating image embeddings is
illustrated as follows:

import torch
from torchvision import models

# Load pre-trained ResNet-18 model
model = models.resnet18(pretrained=True)

# Remove the final fully connected layer
model = torch.nn.Sequential(*(list(model.children())[:-1]))
model.eval()

# Calculate the image embedding
with torch.no_grad():
    # Add a batch dimension: (3, 224, 224) -> (1, 3, 224, 224)
    img_t = img_t.unsqueeze(0)

    # Output shape: (1, 512, 1, 1) after the adaptive average pooling
    embedding = model(img_t)

# Convert to a 1D array of 512 values
embedding = torch.flatten(embedding).numpy()

These 512-dimensional embeddings were then utilized as inputs to the LOF al-
gorithm for anomaly detection and were also reduced to a lower dimensionality for
visualization using UMAP.
This approach using a pre-trained ResNet-18 model as a feature extractor pro-
vides an efficient way to convert raw images into a form suitable for machine learn-
ing tasks. It allows us to benefit from the potent feature learning capability of deep
neural networks while avoiding the need for extensive re-training of the model.
Calculating image embeddings with the ResNet-18 model proceeded without
major challenges. It is worth noting that the model’s ability to handle diverse and
complex image content makes it a reliable tool for extracting relevant features from
our sandpaper images, potentially leading to more accurate anomaly detection.
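
For a dataset of this size, computing embeddings one image at a time is inefficient; a batched variant along the following lines is a natural extension. It reuses the transform and the truncated model from the snippets above, while the ImagePathDataset helper and the paths list are assumptions introduced for illustration.

import torch
from torch.utils.data import DataLoader, Dataset
from PIL import Image

class ImagePathDataset(Dataset):
    # Minimal dataset over a list of image file paths
    def __init__(self, paths, transform):
        self.paths = paths
        self.transform = transform

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        img = Image.open(self.paths[idx]).convert('RGB')
        return self.transform(img)

loader = DataLoader(ImagePathDataset(paths, transform),
                    batch_size=64, num_workers=4)

model.eval()
chunks = []
with torch.no_grad():
    for batch in loader:
        out = model(batch)                      # (B, 512, 1, 1)
        chunks.append(torch.flatten(out, start_dim=1))

embeddings = torch.cat(chunks).numpy()          # (N, 512)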

3.4 Unsupervised Anomaly Detection using LOF


In order to identify potential defects in sandpaper manufacturing, this research lever-
ages the power of unsupervised learning, specifically anomaly detection, using the
Local Outlier Factor (LOF) algorithm.

3.4.1 LOF Algorithm


The LOF algorithm is an unsupervised learning method specifically developed for anomaly
or outlier detection. It works on the principle of measuring the local deviation of a
given data point with respect to its neighbours. It considers the density of local
neighbourhoods to identify anomalies, which are often expected to have a lower
density compared to regularly occurring data points [39].
In the context of our research, each high-dimensional image embedding from the
ResNet-18 model is considered as a data point. The LOF algorithm is then used to
assign each point a score representing how much it deviates from the structure of its
surrounding neighbourhood. Higher LOF scores correspond to more anomalous or
’outlying’ data points, which in this case could potentially represent defects in the
sandpaper images.
It is worth noting that LOF scores do not directly indicate which data points (or
images) are anomalous. A threshold is needed to classify data points as normal or
anomalous based on their LOF scores.
Here is how we calculated the LOF scores:

from sklearn.neighbors import LocalOutlierFactor
from sklearn.preprocessing import MinMaxScaler

# Initialize LOF model
lof = LocalOutlierFactor(n_neighbors=20, contamination=0.05)

# Fit the model; fit_predict returns only -1/1 labels, whereas the
# actual LOF scores are exposed (negated) via negative_outlier_factor_
lof.fit(embeddings)
raw_lof_scores = lof.negative_outlier_factor_

# Initialize MinMaxScaler
scaler = MinMaxScaler()

# Normalize the scores to [0, 1]; after scaling, anomalous images
# receive scores near 0 and normal images scores near 1
lof_scores = scaler.fit_transform(raw_lof_scores.reshape(-1, 1))

Raw LOF scores typically range from 1.0 (indicating a data point closely resembling its neighbours) to an upper limit determined by the level of deviation from the neighbouring points. We normalized these scores using the MinMaxScaler from sklearn [48] to bring them onto a comparable 0-1 scale, which helps interpret the scores more intuitively; on this scale, the most anomalous images receive values near 0 and typical images values near 1.

3.4.2 Hyperparameter Selection and Tuning


There are several hyperparameters to consider when using the LOF algorithm, two
of which are crucial to its performance: the number of neighbours and the propor-
tion of anomalies in the data (contamination).
The number of neighbours determines the size of the local neighbourhood from
which the density is estimated. A smaller number can make the LOF scores more
sensitive to local anomalies, but may also make them more prone to noise. On the
other hand, a larger number may make the scores more stable, but also more global,
potentially missing smaller local anomalies. After several rounds of experimenta-
tion, we found that n_neighbors=20 provided a good balance in our case.
The contamination parameter, on the other hand, is an estimate of the proportion
of anomalies in the data. As mentioned earlier, defects are quite rare in sandpaper

manufacturing, occurring in less than 0.05% of the products. Therefore, we set the
contamination parameter to 0.05 to reflect this imbalance. It is noteworthy that, although not all anomalies are defects, as a general guideline we based the value of the contamination parameter on the estimated defect rate reported by the manufacturer.
Tuning these hyperparameters effectively allowed us to calibrate the sensitivity
of the LOF algorithm, ensuring it is well-suited to the task of detecting rare defects
in the context of sandpaper manufacturing. These adjustments, in conjunction with
the ResNet-18 derived image embeddings, provided a comprehensive approach to
unsupervised anomaly detection within the dataset.
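
One simple way to probe this sensitivity, sketched below under the same conventions as the snippet in Section 3.4.1, is to recompute the scores for several neighbourhood sizes and compare how many points exceed an illustrative raw-score cut-off; both the grid of k values and the cut-off of 1.5 are assumptions for demonstration.

import numpy as np
from sklearn.neighbors import LocalOutlierFactor

for k in (10, 20, 35, 50):
    lof = LocalOutlierFactor(n_neighbors=k, contamination=0.05)
    lof.fit(embeddings)

    # Raw LOF score: close to 1.0 for typical points, larger for outliers
    raw = -lof.negative_outlier_factor_

    # Fraction of images above the illustrative cut-off
    print(k, np.mean(raw > 1.5))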

3.5 Dimensionality Reduction using UMAP


While the 512-dimensional image embeddings provide a comprehensive represen-
tation of sandpaper images, such high-dimensional data can be challenging to visu-
alize and interpret. Therefore, we used the Uniform Manifold Approximation and
Projection (UMAP) algorithm for dimensionality reduction.

3.5.1 UMAP Algorithm


UMAP works by constructing a high-dimensional graph representation of the data,
then optimizes a low-dimensional version of this graph to reflect as closely as pos-
sible the same topological structure. The algorithm effectively captures both local
and global structure in the data [33]. UMAP’s key idea can be summarized at a high
level by a function f : X → Y, where X is the original high-dimensional data space
and Y is the low-dimensional space. UMAP aims to find the function f that best
preserves the topological structure from X to Y.
In our case, the high-dimensional space X is the 512-dimensional image embed-
dings from the ResNet-18 model. We used the UMAP algorithm to reduce these to a
2-dimensional space Y for easy visualization:

from umap import UMAP

# Initialize UMAP model (renamed to avoid shadowing the umap package)
umap_model = UMAP(n_neighbors=15, min_dist=0.1, n_components=2)

# Fit the model and transform the data
embeddings_2d = umap_model.fit_transform(embeddings)

The 2-dimensional embeddings from UMAP enabled us to visualize the distribution of sandpaper images and the detected anomalies in a comprehensible manner.

3.5.2 Parameter Selection and Tuning


The key hyperparameters of the UMAP algorithm include the number of neigh-
bours (n_neighbors), the minimum distance (min_dist) between points in the low-
dimensional representation, and the number of components (n_components).
The number of neighbours controls the balance between preserving the global
versus local structure of the data. After several experiments in this study, n_neighbors
was set to 15 to strike a balance between maintaining local and global structures.
The min_dist parameter controls how tightly UMAP is allowed to pack points
together. A larger value ensures that embedded points are more evenly spread out,

while a smaller value allows them to cluster more tightly. Based on the distribution
and the density of our data, a min_dist of 0.1 was found to be appropriate.
The number of components corresponds to the number of dimensions in the low-
dimensional space we are mapping to. For visualization purposes, this was set to 2.
These UMAP hyperparameters were carefully tuned to optimize the quality of
the low-dimensional representations of our high-dimensional image embeddings.
The two-dimensional embeddings generated from the sandpaper images offered sig-
nificant insights into their distribution, which included both regular and anomalous
instances. These insights greatly facilitated the overarching aim of potential defect
detection.
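
A practical way to arrive at such values, sketched below with illustrative grid points, is to render a small grid of n_neighbors and min_dist combinations over the 512-dimensional embeddings and visually compare cluster tightness against the global layout.

import matplotlib.pyplot as plt
from umap import UMAP

fig, axes = plt.subplots(3, 3, figsize=(12, 12))
for i, n in enumerate((5, 15, 50)):
    for j, d in enumerate((0.0, 0.1, 0.5)):
        emb = UMAP(n_neighbors=n, min_dist=d,
                   n_components=2).fit_transform(embeddings)
        axes[i, j].scatter(emb[:, 0], emb[:, 1], s=1)
        axes[i, j].set_title(f"n_neighbors={n}, min_dist={d}")

fig.savefig("umap_parameter_grid.png")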

3.6 Image Pre-selection for Annotation


The process of image pre-selection involves visualizing and exploring the UMAP embeddings scatter plot, inspecting intriguing regions (formed clusters, structure of the data points), filtering data points based on the LOF threshold, and finally tagging the images as defective or non-defective.

3.6.1 Threshold Determination for Anomaly Detection


A critical aspect of separating anomalies is the filtration of locally anomalous images
by setting a threshold value for LOF. As previously discussed, it is crucial to exper-
iment with the threshold to discern a borderline between anomalous and normal
images. This process involves a delicate balance. The strategy we adopted to find
defective images involved initiating with a low range (for instance, 0-0.5) and grad-
ually progressing towards the upper end until we start to see normal images. This
approach allowed us to tune the threshold for the particular dataset. We applied a
similar process for normal images, starting with a high range (0.98-1) and gradually
decreasing it until defective images begin to appear.

3.6.2 Image Exploration and Pre-selection


We utilized the Plotly Dash Python package to develop an interactive dashboard for
dataset exploration. The tool enables users to hover over data points to preview
images and metadata, and filter data points based on the LOF threshold. Users can
also make selections on different regions of the graph and compare the maximum,
minimum, and average image pixel intensities of these selections, which could po-
tentially indicate differences in intensity between selected areas.
Although FiftyOne offers similar features, it is somewhat more limited in terms of the capabilities mentioned above. However, it provides other beneficial functionalities such as image tagging and filtering. Compared to Plotly Dash, where these features must be manually coded, FiftyOne offers them out-of-the-box. Figure 3.4 presents the FiftyOne dashboard.

FIGURE 3.4: FiftyOne Dashboard for Image Tagging

The workflow for exploring and pre-selecting the images comprised the following steps:

1. Finding local anomalies: Using the thresholding method outlined above, local anomalies were segregated and tagged as defective.

2. Finding global anomalies: By inspecting clusters formed with UMAP, we could determine whether the clusters predominantly consist of anomalous or normal images. The clusters formed predominantly of anomalous images are global anomalies. Typically, clusters of anomalies stood out in some way from the rest of the clusters of normal images. However, only those anomalous clusters that revealed defects upon inspection were tagged as defective.

3. Selecting Normal Images: To create a balanced training set, we also required a set of normal images, so we selected an equal number of normal images to match the number of defective ones. We filtered out normal images from the dataset using the thresholding method in conjunction with avoiding clusters of anomalies. As the number of normal images was significant, we opted for random selection using FiftyOne's filtering tool (https://docs.voxel51.com/cheat_sheets/filtering_cheat_sheet.html). These images were then exported in order to be labelled by human annotators; a minimal sketch of this filter-tag-export flow is shown below.
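
The sketch below illustrates this flow with FiftyOne; the dataset name, the lof_score field, the 0.5 threshold, and the export directory are all assumptions introduced for illustration.

import fiftyone as fo
from fiftyone import ViewField as F

# Build a dataset from image paths and attach the normalized LOF scores
dataset = fo.Dataset("sandpaper_preselection")
dataset.add_samples(
    fo.Sample(filepath=p, lof_score=float(s))
    for p, s in zip(paths, lof_scores.ravel())
)

# Low normalized scores correspond to anomalous images (Section 3.4.1)
for sample in dataset.match(F("lof_score") <= 0.5):
    sample.tags.append("defective")
    sample.save()

# Export the tagged images for human annotation
dataset.match_tags("defective").export(
    export_dir="preselected_for_labelling",
    dataset_type=fo.types.ImageDirectory,
)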

3.7 Evaluation Metrics


In order to evaluate the effectiveness of the pre-selection process, we propose a
comprehensive two-fold approach. The first metric gauges the accuracy of the pre-
selected labels by comparing them to ground truth labels obtained from expert hu-
man annotators. The second metric involves training a downstream defect detection
model on the pre-selected and ground-truth labelled dataset and then employing
this model to infer potential defects in the residual images from the original dataset.
This measure aids in estimating the number of defects missed during pre-selection.
The proposed evaluation metrics are elucidated as follows:

1. Pre-selection Accuracy (PAcc): This metric measures the degree of agreement between the pre-selection labels and the ground truth labels. It can be mathematically expressed as follows:

PAcc = \frac{\text{Number of correctly pre-selected images}}{\text{Total number of pre-selected images}} \qquad (3.1)

A high Pre-selection Accuracy indicates that the pre-selection process is able to correctly identify and categorize defective and non-defective images in line with expert human annotators. It signifies the capability of the pre-selection workflow, compared to human annotators, in segregating defective from non-defective images.

2. Missed Defect Fraction (MDF): The Missed Defect Fraction quantifies the proportion of defective images missed during pre-selection but later recognized by the downstream defect detection model. This can be computed as:

MDF = \frac{\text{Number of defects detected during inference}}{\text{Total number of residual images}} \qquad (3.2)

Here, residual images are the images that were not pre-selected. A lower
MDF denotes that the pre-selection process was efficient in capturing a ma-
jority of the defective images. However, a higher MDF would point towards
potential gaps in the pre-selection process, thereby necessitating a more robust
method.

These evaluation metrics, thus, offer significant insight into the pre-selection pro-
cess, its strengths, and areas for potential improvement.
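
As a trivial illustration, both metrics reduce to ratios over counts; the numbers below are hypothetical.

def pre_selection_accuracy(correct_preselected, total_preselected):
    # PAcc, Equation 3.1
    return correct_preselected / total_preselected

def missed_defect_fraction(defects_in_residual, total_residual):
    # MDF, Equation 3.2
    return defects_in_residual / total_residual

print(pre_selection_accuracy(92, 100))     # 0.92
print(missed_defect_fraction(4, 10000))    # 0.0004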

Chapter 4

Experiments and Results

This chapter presents the experiments conducted, as well as the results obtained, to
evaluate the performance of our proposed framework. All the experiments for the
image pre-selection workflow were conducted on the Databricks platform (https://www.databricks.com/), with the
following specifications:

• Cluster Type: Standard_NC12s_v3.

• Runtime: 12.2.x-gpu-ml-scala2.12.

• RAM and Processor: 224GB Memory, 12 Cores.

4.1 Performance Evaluation


4.1.1 Feature Extraction Performance

FIGURE 4.1: Per-image Computation Time for ResNet-18 Feature Extraction (y-axis: seconds per image; x-axis: Batches A-E)

ResNet-18 transforms raw image data into a lower-dimensional feature vector.


Overall, as seen in Figure 4.1, the per-image time for feature extraction through
ResNet-18 appears relatively stable across the different batches, with times ranging
from approximately 0.026 to 0.033 seconds. The most efficient batch in this regard is Batch E, with a per-image time of about 0.026 seconds, and the least efficient is Batch C, with a per-image time of about 0.033 seconds. The total computation time increases with batch size, as seen in Figure 4.2. Given this architecture, more complex images with intricate patterns might take longer to process, as they necessitate more computational resources to accurately capture their intricacies.

FIGURE 4.2: Total Computation Time for ResNet-18 Feature Extraction (y-axis: time taken in seconds; x-axis: Batches A-E)

4.1.2 UMAP Performance

FIGURE 4.3: Per-image Computation Time for 2D UMAP Embedding Generation (y-axis: seconds per image; x-axis: Batches A-E)

Figure 4.3 shows that the UMAP performance exhibited a clear downward trend
in per-image computation time as the batch size increased. The per-image time for
UMAP varies more noticeably across batches, ranging from approximately 0.0009
to 0.0037 seconds. This indicates that UMAP may have a significant constant-time
component in its computation. This fixed cost gets amortized over a larger number
of images, leading to a lower per-image computation time for larger batches. This
means UMAP benefits from a larger batch size. However, similar to ResNet, the total
computation time increases with the size of the batch as shown in Figure 4.4.

FIGURE 4.4: Total Computation Time for 2D UMAP Embedding Generation (Time Taken in sec; Batches A–D)
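For reference, a minimal sketch of generating 2D embeddings with the umap-learn library is shown below; the parameter values are the library defaults and are illustrative rather than the exact settings tuned in this work, and the input file name is hypothetical:

```python
import numpy as np
import umap  # from the umap-learn package

# One 512-dimensional ResNet-18 embedding per image (hypothetical file name).
features = np.load("resnet18_embeddings.npy")

# n_neighbors and min_dist are shown at their library defaults.
reducer = umap.UMAP(n_components=2, n_neighbors=15, min_dist=0.1, random_state=42)
embedding_2d = reducer.fit_transform(features)  # shape: (n_images, 2)
```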

4.1.3 LOF Performance

FIGURE 4.5: Per-image Computation Time for LOF Score (Seconds per Image; Batches A–E)

LOF showed an upward trend in per-image computation time as the batch size
increased, as seen in Figure 4.5, ranging from approximately 0.0003 seconds to
0.0011 seconds, with Batch A being the most efficient (about 0.0003 seconds per
image) and Batch D the least efficient (about 0.0011 seconds per image). This trend
might be because LOF, being a density-based outlier detection method, has to
compute the local density for each data point, a task that becomes increasingly
complex as the number of data points (images) increases. Despite this increase, the
LOF times are still significantly lower than those of ResNet and UMAP, which
suggests that LOF is quite efficient on a per-image basis.

FIGURE 4.6: Total Computation Time for LOF Score (Time Taken in sec; Batches A–E)
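For completeness, a minimal sketch of computing LOF scores on the 2D embeddings with scikit-learn is given below. scikit-learn reports the negated LOF through negative_outlier_factor_; the reciprocal transformation shown here, which makes lower values indicate local anomalies in line with the score convention used in our plots, is an assumption for illustration:

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

# embedding_2d: the 2D UMAP embeddings from the previous step.
lof = LocalOutlierFactor(n_neighbors=20)  # n_neighbors value is illustrative
lof.fit(embedding_2d)

# scikit-learn's negated LOF: roughly -1 for inliers, below -1 for outliers.
raw_lof = -lof.negative_outlier_factor_

# Assumed convention: reciprocal LOF, so scores near 1 are normal and scores
# below a threshold such as 0.97 flag local anomalies.
scores = 1.0 / raw_lof
local_anomalies = np.where(scores < 0.97)[0]
```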

From the observations made in the analyses of the performances of ResNet-18,
UMAP, and LOF, it is evident that the computations, particularly for larger datasets,
can be time-consuming. This is especially evident in the total computation time for
ResNet-18 and UMAP. To manage this computational demand more efficiently, we
have adopted a strategy of pre-computation. The computed values are stored in a
Parquet file, ready to be served for subsequent exploratory analysis and
visualization tasks. This approach allows us to capitalize on the time spent
computing these values by preserving the results for reuse, hence maximizing our
computational efficiency.
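A minimal sketch of this pre-computation step, assuming pandas with a Parquet engine such as pyarrow is installed and using hypothetical column names, is:

```python
import pandas as pd

# Persist the pre-computed values so that dashboards and later analyses can
# reload them instead of re-running ResNet-18, UMAP, and LOF.
df = pd.DataFrame({
    "image_path": image_paths,      # hypothetical list of image file paths
    "umap_x": embedding_2d[:, 0],
    "umap_y": embedding_2d[:, 1],
    "lof_score": scores,
})
df.to_parquet("precomputed_embeddings.parquet", index=False)

# Later, serve the stored results without recomputation:
df = pd.read_parquet("precomputed_embeddings.parquet")
```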

4.2 Embeddings Visualization and Observations


The dashboard, as demonstrated in Figure 4.7, has been developed to facilitate the
exploration of various visualizations corresponding to distinct datasets. We can filter
the data points based on LOF scores, hover over the data points to get an image pre-
view, and also compare the image intensities between multiple selections in the plot.
It is worth noting that while the dashboard also supports multiple feature extraction
models and enables the viewing of 2D, 3D, and DensMAP2 UMAP embeddings, the
focus of this thesis does not centre on these aspects.
When we visualized the 2D UMAP embeddings on a scatter plot, we could already
obtain a fairly comprehensive view of the dataset. By reducing thousands of high-
dimensional images to a two-dimensional scatter plot, we could begin to understand
the structure and relationships within our dataset.
In this low-dimensional representation, similar images cluster together, while
dissimilar images are situated further apart. Not only did this allow for an intuitive
view of image similarity, but it also revealed emergent structures in the data. Clus-
ters (Figure 4.8a), for example, often represent groups of similar images that differ
only subtly. Patterns (Figure 4.8b), on the other hand, are formed from regularities
in the images, such as repetitive prints in a product on a production line (see Figure
4.9a). This approach allowed us to selectively examine specific regions of interest
within the scatter plot, eliminating the need to analyze the entirety of the thousands
of images in the dataset individually.
With regards to anomaly detection, global anomalies typically formed their own
clusters, making them relatively easy to identify. Local anomalies, however, pre-
sented more of a challenge. These anomalies deviated slightly from specific
clusters or patterns, yet also occurred within clusters and patterns, making them
more difficult to detect.
2 DensMAP: https://umap-learn.readthedocs.io/en/latest/densmap_demo.html

FIGURE 4.7: Dashboard for Image Exploration



FIGURE 4.8: (a) Scatter Plot of 2D UMAP Embeddings for Batch A. (b) Scatter Plot of 2D UMAP Embeddings for Batch C.

FIGURE 4.9: (a) Images associated with patterns in the dataset from Figure 4.8b. (b) Images showing global anomalies in the dataset from Figure 4.8b. These figures provide visualization of patterns and anomalies identified within the specified UMAP embeddings.

To aid in the identification of local anomalies, we introduced an additional layer
to our visualization. By colour-coding the scatter plot according to LOF scores, we
could more readily distinguish local anomalies, even when they occur within clus-
ters or patterns. In addition, by incorporating colour-coding into the scatter plots
with other meta-data like timestamp and camera number, we could uncover further
fascinating insights about the dataset.

4.2.1 Colour-coding with LOF scores


One key visualization strategy we employed involved colour-coding the scatter plot
based on LOF scores. This allowed for a clearer depiction of local anomalies, which
could now be visually separated based on colour hues. As depicted in Figures 4.10a
and 4.10c, local anomalies (associated with lower scores) are represented in vary-
ing shades of green to purple, while non-anomalous images, as identified by LOF,
appear in yellow. However, it is critical to note that LOF treats clusters of global
anomalies as standard clusters, leading these to also be represented in yellow. There-
fore, this colour-coding strategy primarily aids in detecting images that are locally
anomalous with respect to specific clusters.
In Figure 4.10, we present the scatter plots filtered at LOF thresholds of 1.00 and
0.97. The remaining data points represent local anomalies after filtration at the
desired threshold, which can subsequently be pre-selected and tagged as defective.

FIGURE 4.10: (a) Batch A with LOF threshold = 1.00. (b) Batch A with LOF threshold = 0.97. (c) Batch C with LOF threshold = 1.00. (d) Batch C with LOF threshold = 0.97. The figure presents the LOF score visualizations for the specified batches and thresholds.

4.2.2 Colour-coding with Camera Number


In addition to colour-coding by LOF scores, we also explored colour-coding the scat-
ter plot based on the camera number metadata. Mirka’s manufacturing line includes
two cameras capturing images of the sandpapers. The UMAP was able to accu-
rately discern the difference between images taken by different cameras and accord-
ingly formed separate clusters, as depicted in Figure 4.11. Interestingly, the clusters
from the two cameras are quite similar in structure and size. With this information,
one can strategically select images from either one or both cameras during the pre-
selection process.

FIGURE 4.11: Color-Coding Based on Camera Number. (a) Batch A. (b) Batch C.
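A scatter plot of this kind can be produced directly from the pre-computed Parquet data; the sketch below assumes a hypothetical camera metadata column and uses matplotlib:

```python
import matplotlib.pyplot as plt

# "camera" is a hypothetical metadata column with values such as 1 and 2.
fig, ax = plt.subplots(figsize=(8, 6))
for camera_id, group in df.groupby("camera"):
    ax.scatter(group["umap_x"], group["umap_y"], s=4, label=f"Camera {camera_id}")
ax.set_title("2D UMAP embeddings colour-coded by camera number")
ax.legend()
plt.show()
```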

4.2.3 Colour-coding with Timestamp


An interesting approach we explored involved visualizing the temporal evolution
of image batches through colour-coding based on timestamps. This approach, as
presented in Figure 4.12, revealed how UMAP managed to differentiate images re-
sulting from systematic changes in the production line over time.
The formation of clusters correlating with similar timestamps indicates that cer-
tain changes in the production line happened during these specific intervals, war-
ranting further investigation. For instance, as shown in Figure 4.12d, the cluster
colour-coded in purple was found to comprise images of a different product that
had inadvertently been mixed in. These images were associated with the initial
timestamps, thus clearly distinguishing this cluster from the others.
Moreover, it is noteworthy that certain defects had closely spaced timestamps,
indicating that they occurred in bursts within specific time intervals.

FIGURE 4.12: Color-coded Based on Production Timestamp. (a) Batch A. (b) Batch B. (c) Batch D. (d) Batch E.

4.3 Pre-selection Results

Utilizing the methodology described earlier and the insights derived from the
exploratory analyses, we pre-selected a total of 2,580 images. The scatter plot of the
training dataset, colour-coded by batch names, is displayed in Figure 4.13. It
illustrates the variations in the embeddings of different batches, indicating their
dissimilarities. This set included all potential defects, along with an equal number
of normal images, which were then annotated by human annotators.

FIGURE 4.13: 2D UMAP Scatter Plot for Training Dataset

Furthermore, a downstream defect detection task was executed by training a
YOLOv8 model with the human-annotated ground-truth labels. The configuration
for the downstream model is described in Table 4.1.
3 YOLOv8: https://github.com/ultralytics/ultralytics
TABLE 4.1: Configuration for the YOLOv8 Model

Parameter Value
epochs 100
batch 20
imgsz 1,952
cache ram
rect true
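For illustration, the configuration in Table 4.1 could be passed to the Ultralytics training API roughly as follows; the model variant and dataset path are assumptions, as the thesis does not tie the configuration to a specific checkpoint or folder layout:

```python
from ultralytics import YOLO

# The classification checkpoint is an assumption, chosen because Top-1
# validation accuracy is the reported metric.
model = YOLO("yolov8n-cls.pt")
model.train(
    data="sandpaper_dataset",  # hypothetical dataset folder with train/val splits
    epochs=100,
    batch=20,
    imgsz=1952,
    cache="ram",
    rect=True,
)
```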

The YOLOv8 model achieved an accuracy of 95.5%, as seen in Figure 4.14. This
indicates that the model correctly classified 95.5% of the validation data,
identifying defective images as defective and normal images as normal. This high
accuracy demonstrates the effectiveness of our pre-selection process and the ability
to train a successful downstream model for defect detection. However, our focus is
on the effectiveness of the pre-selection workflow, which is evaluated further below.

FIGURE 4.14: YOLOv8 Top1 Validation Accuracy

4.3.1 Pre-selection Accuracy


Comparative analysis with the ground truth led to the following results:

• Number of pre-selected defects: 1,290, with 72 instances misclassified

• Number of pre-selected non-defects: 1,290, with 2 instances misclassified

The misclassification rate was notably higher for defects, likely due to the
complexities and variations in defect manifestations, as well as the limitations of
the thresholding strategy. The results can be presented in the form of a confusion
matrix, as shown in Table 4.2.
TABLE 4.2: Confusion Matrix of the Pre-selection Result

                            Actual Defect    Actual Non-Defect
Pre-selected Defect             1,218                72
Pre-selected Non-Defect             2             1,288

The overall accuracy of the pre-selection can be calculated as follows:

PAcc = Number of correctly pre-selected images / Total number of pre-selected images    (4.1)
By plugging in the numbers:

PAcc = (1,218 + 1,288) / (1,290 + 1,290) = 0.9713    (4.2)
This demonstrates a high degree of accuracy (approximately 97.1%) in our pre-
selection process, indicating its efficacy in correctly identifying defective and non-
defective samples.

4.3.2 Missed Defect Fraction


An ideal pre-selection process should capture all defective images, thereby
ensuring that none remain within the pool of non-selected images. The evaluation
of this process requires ground-truth labels for the remaining non-selected images.

Given the absence of these labels, the downstream defect detection model provides
tentative labels which can be used for evaluation purposes.
To gauge the effectiveness of the pre-selection process, it is necessary to examine
the proportion of defects missed. This involves executing the downstream defect de-
tection model on the pool of non-selected images and analyzing the resulting output.
This method, however, hinges on the assumption that the defect detection model is
accurate and reliable in identifying defects in images. The fraction of defects missed
by the pre-selection process is referred to as the ’Missed Defect Fraction’ (MDF) and
serves as a crucial metric for assessing the performance of the pre-selection process.
The calculation of MDF can be represented as:

MDF = Number of defects detected during inference / Total number of residual images    (4.3)
In our experiments, the number of defects detected during inference was
1,513, while the total number of non-selected images was 119,907. Substituting
these values into the MDF equation gives:

MDF = 1,513 / 119,907 ≈ 0.0126    (4.4)
This result implies that around 1.26% of defective images were overlooked dur-
ing the pre-selection process but were later identified by the downstream inference
model.
The MDF result is summarized in Table 4.3:
TABLE 4.3: Calculation of Missed Defect Fraction (MDF)

Parameter Value
Number of defects detected during inference 1,513
Total number of residual images 119,907
Missed Defect Fraction (MDF) 0.0126

Though the MDF is relatively small, indicating a high level of effectiveness in the
pre-selection process, it also reveals an area for further improvement.

Chapter 5

Discussion

5.1 Interpretation of the Results


The results of our experiments indicate that our proposed framework for image pre-
selection and defect detection in sandpaper production lines is highly effective. The
process of visualizing high-dimensional images in a lower-dimensional space using
UMAP provides a comprehensive view of the image dataset. Colour-coding the scat-
ter plots based on the LOF scores and other metadata, such as camera number and
timestamp, allows for detailed examination and exploration of the data. This has
proven to be instrumental in discerning patterns, regularities, and anomalies. This
research has demonstrated that it might be possible to uncover additional interest-
ing structural relationships within the image data by incorporating colour coding
with other associated meta-data.
Our pre-selection process managed to accurately identify and tag defective and
non-defective images. With a high accuracy rate of approximately 97.1%, our ap-
proach significantly reduces the burden of human labelling, especially given the
volume of images produced on the production line. However, it is noteworthy that,
while our system demonstrated high efficiency, there remains a Missed Defect Frac-
tion (MDF) of approximately 0.0126, indicating that around 1.26% of defective im-
ages were overlooked during the pre-selection process but were later identified by
the downstream inference model. Despite the small MDF percentage pointing to
a largely effective pre-selection process, it nonetheless suggests potential room for
further enhancement in our defect detection system.
Our system managed to select only 2,580 images from a considerably larger set
of 122,487 images for human labelling. This represents substantial cost and time sav-
ings in the labelling process. However, the level of strictness applied to LOF thresh-
olding can be modulated according to specific requirements regarding the number
of pre-selected images. Lowering the threshold stringency can result in a larger pre-
selection pool, whereas increasing the strictness will result in a more narrowed, se-
lective pool. This provides additional flexibility to fine-tune the balance between
the size of the pre-selected dataset and the degree of accuracy sought in the defect
detection process.
Using the pre-selected and human-annotated images, we trained a YOLOv8 model
for defect detection. The model achieved an accuracy of 95.5%, indicating its strong
performance in identifying defects in the sandpaper production line based on the
curated training dataset.
Furthermore, the patterns observed in the UMAP scatter plots provide insights
about systematic changes in the production line over time. This could be utilized for
additional investigations and improvements in the production process. For instance,
clusters correlating with similar timestamps may indicate changes in the production
line at that time interval.

In summary, our proposed framework proves to be a robust and effective method
for image pre-selection, human-labelling reduction, and defect detection in
sandpaper production lines. These results open up possibilities for further
optimization and automation of quality control processes in manufacturing
environments.

5.2 Potential Improvements and Extensions


There are numerous directions in which the current study can be extended and im-
proved, in order to potentially obtain higher performance and versatility in the de-
fect detection process. This section outlines some of the promising areas that can be
explored in future work.

5.2.1 Automatic Threshold Selection


In this study, we manually fine-tuned the LOF threshold to separate the normal and
anomalous images. An automated, data-driven approach for threshold selection
can be a promising improvement. Techniques such as clustering algorithms, opti-
mization procedures, or learning-based approaches could be used to automatically
determine an optimal threshold. This could potentially improve the precision of the
pre-selection process and reduce the manual intervention required, thereby making
the process more automated.
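As one example of such a data-driven approach, the threshold could be derived from the score distribution itself. The sketch below, which is an illustration rather than the method used in this thesis, splits the LOF scores into two groups with k-means and places the threshold midway between the group centres:

```python
import numpy as np
from sklearn.cluster import KMeans

def auto_threshold(scores: np.ndarray) -> float:
    """Split 1D LOF scores into two clusters and threshold at the midpoint."""
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(scores.reshape(-1, 1))
    c_low, c_high = sorted(km.cluster_centers_.ravel())
    return (c_low + c_high) / 2.0

# Scores below the learned threshold would then be pre-selected as anomalous.
threshold = auto_threshold(scores)
```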

5.2.2 Image Selection Based on Vector Quantization


In the current process, images were selected based on the LOF scores. However,
another method to consider in conjunction with LOF could be vector quantization.
This could enable the system to avoid selecting images that are too similar to each
other, and therefore provide a more diverse set of images for pre-selection. This can
be particularly useful when dealing with highly imbalanced datasets where certain
defects are very rare, as the inclusion of a more diverse set of images in the training
set could potentially lead to a more robust and generalized defect detection model.
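A simple instantiation of this idea, sketched below under the assumption that embeddings of the LOF-flagged candidates are available, is to quantize the candidate embeddings with k-means and keep only the image closest to each centroid, which suppresses near-duplicates:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances_argmin_min

def diverse_subset(embeddings: np.ndarray, n_select: int) -> np.ndarray:
    """Vector-quantize the candidates and return one representative per cell."""
    km = KMeans(n_clusters=n_select, n_init=10, random_state=0).fit(embeddings)
    # For each centroid, find the index of the nearest actual embedding.
    closest, _ = pairwise_distances_argmin_min(km.cluster_centers_, embeddings)
    return np.unique(closest)

# candidate_embeddings: embeddings of images flagged by LOF (hypothetical input).
selected_indices = diverse_subset(candidate_embeddings, n_select=200)
```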

5.2.3 Contrastive Learning for Feature Extraction


The feature extraction in this study was performed using a pre-trained ResNet-18
model. One potential area for future work could involve the use of contrastive learn-
ing methods for fine-tuning the model or even training a contrastive learning model
from scratch. Contrastive learning, a self-supervised learning technique, has shown
significant success in learning useful representations by contrasting positive and
negative samples [49]. Implementing this could help in extracting more discrimina-
tive features, leading to better clustering and anomaly detection in the pre-selection
process.
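To make the idea concrete, a minimal sketch of the NT-Xent loss used in SimCLR-style contrastive learning [49] is given below; this is a generic illustration of the technique, not a component of the current pipeline:

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.5):
    """NT-Xent loss for two batches of paired augmented views, each of shape (N, D)."""
    n = z1.size(0)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)  # (2N, D), unit-norm rows
    sim = z @ z.t() / temperature                       # scaled cosine similarities
    sim.fill_diagonal_(float("-inf"))                   # exclude self-similarity
    # The positive for sample i is its other augmented view at index (i + n) mod 2n.
    targets = torch.arange(2 * n, device=z.device).roll(n)
    return F.cross_entropy(sim, targets)
```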

5.2.4 Active Learning Approach


Another avenue complementary to this research could be the incorporation of active
learning techniques as a post-selection process. Active learning involves iterative
learning where the model identifies and queries uncertain instances to be labelled
by the human expert, helping in the construction of a more informative training
dataset [50]. This approach can reduce the reliance on human annotators and make
the process more efficient by focusing the annotation effort on the most informative
samples. As a result, it may increase the effectiveness of the downstream tasks in
conjunction with the proposed pre-selection workflow.
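As an illustration of the simplest variant, uncertainty sampling, the sketch below ranks unlabelled images by the entropy of the predicted class probabilities and queries the most uncertain ones for annotation; it is generic and not tied to a specific model API:

```python
import numpy as np

def query_most_uncertain(probs: np.ndarray, n_query: int) -> np.ndarray:
    """Return indices of the n_query samples with the highest predictive entropy.

    probs: array of shape (n_samples, n_classes) with model class probabilities.
    """
    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)
    return np.argsort(entropy)[-n_query:]

# probs could come from the downstream defect classifier on unlabelled images.
to_annotate = query_most_uncertain(probs, n_query=100)
```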

5.3 Applicability to Other Industries


Although we used specific tools for the workflow as outlined in Figure 3.1, one could
easily replace the tools with suitable alternatives for specific domains or data types
or improve upon existing ones. Likewise, the methodologies and techniques de-
veloped in this thesis are not confined to the realm of sandpaper production but
rather possess the versatility to be transferred to a variety of industries. Despite our
focus on image-based quality control within a manufacturing context, the princi-
ples employed are adaptable and have potential applicability across sectors generat-
ing high-volume image data and where automated image identification, inspection,
and categorization are paramount. This is particularly relevant for industries where
anomaly detection plays a crucial role.
Moreover, the versatility extends beyond image data, encompassing other forms
of high-dimensional data, including audio files, time-series data, complex sensor
data, and even high-dimensional biological data. Thus, sectors such as healthcare,
finance, agriculture, and environmental monitoring, among others, can potentially
benefit from the application of these methods.
However, the specific nature of the data can introduce certain challenges. For
instance, previewing non-image high-dimensional data might not be as
straightforward, and suitable exploration and visualization tools may not be readily
available. Despite these potential obstacles, the fundamental methodologies,
namely dimensionality reduction, anomaly detection, and pre-selection for
labelling, maintain wide applicability, offering a cost-effective and efficient means
of managing high-dimensional data across a diverse array of industries.

Chapter 6

Conclusions

6.1 Summary of Findings


This thesis presented a comprehensive workflow for effective image pre-selection
in the context of quality control within the sandpaper production line. Our
research was centered around the use of unsupervised learning techniques,
particularly ResNet for feature extraction, UMAP for dimensionality reduction, and
LOF for anomaly detection, to visualize and explore high-dimensional image data.
The main findings of our work can be summarized as follows:
• Our approach utilized a ResNet-18 model for the extraction of features from
the sandpaper images. This model was proven effective in translating the raw
images into a form that could be processed by unsupervised learning tech-
niques. It facilitated the transformation of high-dimensional image data into a
compact, yet informative, feature representation.
• A UMAP-based visualization provided a comprehensive and interpretable scat-
ter plot of the high-dimensional image dataset. The plot successfully reduced
thousands of high-dimensional images into a 2D space where similar images
appeared closer together, and dissimilar ones were farther apart. This visual-
ization helped reveal clusters and patterns, serving as an initial hint for detect-
ing anomalies. Global anomalies manifested as distinct clusters in the UMAP
visualizations.
• Embedding visualization through different colour-coding schemes (LOF scores,
camera numbers, timestamps) enriched our understanding of the data and
helped guide the pre-selection process.
• Anomaly detection using LOF was effective in identifying locally anomalous
images. By tuning the LOF threshold, we were able to distinguish between
anomalous and normal images with reasonable confidence. This technique
was particularly beneficial in discovering potential defects. The thresholding
approach for LOF scores, although challenging, allows for a flexible trade-off
between selecting too many non-defective images and missing out on defective
ones.
• The pre-selection process resulted in a balanced dataset of defective and non-
defective images. The subsequent human annotation of these pre-selected im-
ages resulted in a robust training set for a downstream task involving defect
detection.
• The trained YOLOv8 model achieved an accuracy of 95.5%, indicating high
performance in detecting defects. This result underlines the efficacy of our pre-
selection pipeline in facilitating the training of effective downstream models.

• The pre-selection accuracy, as compared to the human-annotated ground truth,
was satisfactory at 97.1%. Likewise, 1.26% of defects were missed during the
pre-selection. The minor fractions of misclassification and missed defects point
to potential areas for improvement.

These findings underscore the potential of unsupervised learning techniques in
reducing the time, effort, and cost of data labelling through the image pre-selection
process, ultimately enhancing the efficiency and effectiveness of quality control
within a manufacturing setting.

6.2 Future Work


While the methodologies proposed in this thesis have shown promising results,
there are several areas where further investigations could lead to substantial im-
provements and innovative advancements in this domain. We identify the following
potential directions for future research:

• Comparative Analysis of Backbone Models: Different backbone models could
be compared in terms of their ability to produce useful embeddings for the pre-
selection task. In this work, ResNet-18 was used as the backbone model.
However, there is a multitude of alternative models, each with different
characteristics and strengths, such as VGG16, MobileNet, or newer architectures
like EfficientNet and Vision Transformers. An evaluation based on the proposed
metrics might give insight into which model performs best in this specific task
and might lead to improved performance.

• Hyperparameter Tuning of UMAP and LOF: The current study employed
UMAP for dimensionality reduction and LOF for anomaly detection, with fixed
hyperparameters. However, there might exist an optimal combination of
hyperparameters for these two methods that could further improve the pre-
selection process. For instance, altering the number of nearest neighbours
considered by LOF, or changing the number of components or the minimum
distance in UMAP, could yield different results. A systematic quantitative
comparison based on evaluation metrics could potentially help in identifying the
optimal hyperparameters for these techniques in the context of this specific task.

• Embedding Extraction Strategy: In this study, embeddings were extracted from
the second last layer of the ResNet-18 model after global average pooling.
However, it might be worth exploring the extraction of embeddings from earlier
layers that preserve more detailed spatial information, which could be beneficial
for the detection of small or intricate defects. Also, considering larger image
sizes for embedding computation could potentially enhance the performance,
especially for detecting smaller defects. Switching from a global average pooling
layer in the ResNet model to a global max pooling layer is another potential
modification that could make smaller defects more visible in the embeddings; a
brief sketch of this modification follows after this list.
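As a sketch of the last modification, assuming torchvision's ResNet-18, the global average pooling layer can be swapped for global max pooling so that strong local activations from small defects carry through to the embedding:

```python
import torch
from torchvision import models

# Replace global average pooling with global max pooling; small defects produce
# strong local activations that max pooling preserves in the 512-d embedding.
resnet = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
resnet.avgpool = torch.nn.AdaptiveMaxPool2d((1, 1))
feature_extractor = torch.nn.Sequential(*list(resnet.children())[:-1]).eval()

with torch.no_grad():
    dummy = torch.randn(1, 3, 224, 224)              # stand-in for a preprocessed image
    embedding = feature_extractor(dummy).flatten(1)  # shape: (1, 512)
```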

We believe these directions could open up new ways to refine the image selection
process for downstream tasks, facilitating more efficient and accurate quality control
in manufacturing industries.

Bibliography

[1] Fukuo Hashimoto et al. "Abrasive fine-finishing technology". In: CIRP Annals 65.2 (2016), pp. 597–620. ISSN: 0007-8506. URL: https://www.sciencedirect.com/science/article/pii/S0007850616301950.

[2] Tian Wang et al. "A fast and robust convolutional neural network-based defect detection model in product quality control". In: The International Journal of Advanced Manufacturing Technology 94 (Feb. 2018). DOI: 10.1007/s00170-017-0882-0.

[3] Michela Prunella et al. "Deep Learning for Automatic Vision-Based Recognition of Industrial Surface Defects: A Survey". In: IEEE Access 11 (2023), pp. 43370–43423. DOI: 10.1109/ACCESS.2023.3271748.

[4] JK Park, BK Kwon, JH Park, et al. "Machine learning-based imaging system for surface defect inspection". In: International Journal of Precision Engineering and Manufacturing-Green Technology 3.3 (July 2016), pp. 303–310. DOI: 10.1007/s40684-016-0039-x.

[5] Haibo He and Edwardo A. Garcia. "Learning from Imbalanced Data". In: IEEE Transactions on Knowledge and Data Engineering 21.9 (2009), pp. 1263–1284. DOI: 10.1109/TKDE.2008.239.

[6] N. V. Chawla et al. "SMOTE: Synthetic Minority Over-sampling Technique". In: Journal of Artificial Intelligence Research (2002). URL: https://doi.org/10.1613/jair.953.

[7] Jinlei Hou et al. "Divide-and-Assemble: Learning Block-wise Memory for Unsupervised Anomaly Detection". In: 2021 IEEE/CVF International Conference on Computer Vision (ICCV). 2021, pp. 8771–8780. DOI: 10.1109/ICCV48922.2021.00867.

[8] Xian Tao et al. "Unsupervised Anomaly Detection for Surface Defects With Dual-Siamese Network". In: IEEE Transactions on Industrial Informatics 18.11 (2022), pp. 7707–7717. DOI: 10.1109/TII.2022.3142326.

[9] M. Zaheer et al. "Generative Cooperative Learning for Unsupervised Video Anomaly Detection". In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE Computer Society, 2022, pp. 14724–14734. URL: https://doi.ieeecomputersociety.org/10.1109/CVPR52688.2022.01433.

[10] Samet Akcay et al. "Anomalib: A Deep Learning Library for Anomaly Detection". In: 2022 IEEE International Conference on Image Processing (ICIP). 2022, pp. 1706–1710. DOI: 10.1109/ICIP46576.2022.9897283.

[11] R. Strudel et al. "Segmenter: Transformer for Semantic Segmentation". In: 2021 IEEE/CVF International Conference on Computer Vision (ICCV). IEEE Computer Society, 2021, pp. 7242–7252. URL: https://doi.ieeecomputersociety.org/10.1109/ICCV48922.2021.00717.

[12] Bohan Li, Hao Zhou, Junxian He, Mingxuan Wang, Yiming Yang, and Lei Li. "On the Sentence Embeddings from Pre-trained Language Models". In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, 2020, pp. 9119–9130. URL: https://aclanthology.org/2020.emnlp-main.733.

[13] Jize Cao et al. "Behind the Scene: Revealing the Secrets of Pre-trained Vision-and-Language Models". In: Computer Vision – ECCV 2020. Cham: Springer International Publishing, 2020, pp. 565–580. URL: https://doi.org/10.1007/978-3-030-58539-6_34.

[14] Nasir Mohammad Khalid et al. "CLIP-Mesh: Generating Textured Meshes from Text Using Pretrained Image-Text Models". In: SIGGRAPH Asia 2022 Conference Papers. Association for Computing Machinery, 2022. ISBN: 9781450394703. DOI: 10.1145/3550469.3555392. URL: https://doi.org/10.1145/3550469.3555392.

[15] Domen Tabernik et al. "Segmentation-based deep-learning approach for surface-defect detection". In: Journal of Intelligent Manufacturing (2019). URL: https://dx.doi.org/10.1007/s10845-019-01476-x.

[16] Tang Tang et al. "Anomaly Detection Neural Network with Dual Auto-Encoders GAN and Its Industrial Inspection Applications". In: Sensors 20.12 (2020), p. 3336. URL: https://www.mdpi.com/1424-8220/20/12/3336.

[17] Jungsuk Kim et al. "Printed Circuit Board Defect Detection Using Deep Learning via A Skip-Connected Convolutional Autoencoder". In: Sensors 21.15 (2021), p. 4968. URL: https://www.mdpi.com/1424-8220/21/15/4968.

[18] Liang Xu et al. "A Weakly Supervised Surface Defect Detection Based on Convolutional Neural Network". In: IEEE Access 8 (2020), pp. 44200–44212. DOI: 10.1109/ACCESS.2020.2977821.

[19] Paul Bergmann et al. "The MVTec Anomaly Detection Dataset: A Comprehensive Real-World Dataset for Unsupervised Anomaly Detection". In: International Journal of Computer Vision (2021). URL: https://dx.doi.org/10.1007/s11263-020-01400-4.

[20] Yuan-Hong Liao, Amlan Kar, and Sanja Fidler. "Towards Good Practices for Efficiently Annotating Large-Scale Image Classification Datasets". In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). June 2021, pp. 4348–4357. DOI: 10.1109/CVPR46437.2021.00433.

[21] Lihai Nie, Laiping Zhao, and Keqiu Li. "Glad: Global And Local Anomaly Detection". In: 2020 IEEE International Conference on Multimedia and Expo (ICME). 2020, pp. 1–6. DOI: 10.1109/ICME46284.2020.9102818.

[22] Chun-Liang Li et al. "CutPaste: Self-Supervised Learning for Anomaly Detection and Localization". In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021. URL: https://dx.doi.org/10.1109/CVPR46437.2021.00954.

[23] Eric Wu et al. "Conditional Infilling GANs for Data Augmentation in Mammogram Classification". In: Medical Image Computing and Computer Assisted Intervention – MICCAI 2018. 2018. URL: https://dx.doi.org/10.1007/978-3-030-00946-5_11.

[24] Ben Sorscher et al. "Beyond neural scaling laws: beating power law scaling via data pruning". In: Advances in Neural Information Processing Systems. Ed. by S. Koyejo et al. Vol. 35. Curran Associates, Inc., 2022, pp. 19523–19536. URL: https://proceedings.neurips.cc/paper_files/paper/2022/file/7b75da9b61eda40fa35453ee5d077df6-Paper-Conference.pdf.

[25] Angelos Katharopoulos and François Fleuret. "Not All Samples Are Created Equal: Deep Learning with Importance Sampling". In: (Mar. 2018).

[26] J. Yosinski et al. "How transferable are features in deep neural networks?" In: Advances in neural information processing systems (2014), pp. 3320–3328.

[27] C. Tan et al. "A survey on deep transfer learning". In: International conference on artificial neural networks. Springer. 2018, pp. 270–279.

[28] K. He et al. "Deep residual learning for image recognition". In: Proceedings of the IEEE conference on computer vision and pattern recognition. 2016, pp. 770–778.

[29] O. Russakovsky et al. "Imagenet large scale visual recognition challenge". In: International journal of computer vision 115.3 (2015), pp. 211–252.

[30] M.D. Zeiler and R. Fergus. "Visualizing and understanding convolutional networks". In: European conference on computer vision. Springer. 2014, pp. 818–833.

[31] A. Gordo et al. "Deep image retrieval: Learning global representations for image search". In: Proceedings of the European conference on computer vision (ECCV). 2016, pp. 241–257.

[32] L.v.d. Maaten and G. Hinton. "Visualizing data using t-SNE". In: Journal of machine learning research 9.Nov (2008), pp. 2579–2605.

[33] Leland McInnes, John Healy, and James Melville. "UMAP: Uniform manifold approximation and projection for dimension reduction". In: arXiv preprint arXiv:1802.03426 (2018). URL: https://arxiv.org/abs/1802.03426.

[34] R. Chalapathy, A.K. Menon, and S. Chawla. "Deep learning for anomaly detection: A survey". In: arXiv preprint arXiv:1901.03407 (2019).

[35] Yufei Liu, Jing Zhou, and Kevin P White. "A Comparison for Dimensionality Reduction Methods of Single-Cell RNA-seq Data". In: Frontiers in Genetics (2021). URL: https://www.frontiersin.org/articles/10.3389/fgene.2021.646936/full.

[36] Rui Yamaguchi et al. "Dimensionality reduction and visualization of genomic data using UMAP". In: Nature Communications (2020). URL: https://www.nature.com/articles/s41467-020-15194-z.

[37] Jun-Yan Zhu et al. "Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks". In: Proceedings of the IEEE International Conference on Computer Vision. 2017, pp. 2223–2232. URL: https://openaccess.thecvf.com/content_ICCV_2017/html/Jun-Yan_Zhu_Unpaired_Image-To-Image_Translation_ICCV_2017_paper.html.

[38] Xun Wang et al. "Ranked List Loss for Deep Metric Learning". In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019, pp. 5207–5216. URL: https://openaccess.thecvf.com/content_CVPR_2019/html/Wang_Ranked_List_Loss_for_Deep_Metric_Learning_CVPR_2019_paper.html.

[39] M. M. Breunig et al. "LOF: Identifying Density-Based Local Outliers". In: Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data (2000), pp. 93–104.

[40] A. Zimek, E. Schubert, and H. P. Kriegel. "A survey on unsupervised outlier detection in high-dimensional numerical data". In: Statistical Analysis and Data Mining: The ASA Data Science Journal 5.5 (2012), pp. 363–387.

[41] C. C. Aggarwal. "Outlier Analysis". In: (2015).

[42] J. Tang et al. "Enhancing effectiveness of outlier detections for low density patterns". In: Advances in Knowledge Discovery and Data Mining (2002), pp. 535–548.

[43] J. Ren et al. "HDSOD: A heuristic-based density and structure outliers detection method for mixed-attribute data". In: Knowledge-Based Systems 121 (2017), pp. 163–177.

[44] W. Jin et al. "Incremental local outlier detection for data streams". In: Proceedings of the 2006 IEEE International Conference on Data Engineering (2006), pp. 111–112.

[45] H. P. Kriegel et al. "LoOP: Local Outlier Probabilities". In: Proceedings of the 18th ACM Conference on Information and Knowledge Management (2009), pp. 1649–1652.

[46] Zewen Li et al. "A Survey of Convolutional Neural Networks: Analysis, Applications, and Prospects". In: IEEE Transactions on Neural Networks and Learning Systems 33.12 (2022), pp. 6999–7019. DOI: 10.1109/TNNLS.2021.3084827.

[47] Amir Ebrahimi, Suhuai Luo, and Raymond Chiong. "Introducing Transfer Learning to 3D ResNet-18 for Alzheimer's Disease Detection on MRI Images". In: 2020 35th International Conference on Image and Vision Computing New Zealand (IVCNZ). 2020, pp. 1–6. DOI: 10.1109/IVCNZ51579.2020.9290616.

[48] F. Pedregosa et al. "Scikit-learn: Machine Learning in Python". In: Journal of Machine Learning Research 12 (2011), pp. 2825–2830.

[49] Phuc H. Le-Khac, Graham Healy, and Alan F. Smeaton. "Contrastive Representation Learning: A Framework and Review". In: IEEE Access 8 (2020), pp. 193907–193934. DOI: 10.1109/ACCESS.2020.3031549.

[50] Hideitsu Hino. "Active Learning: Problem Settings and Recent Developments". In: (Dec. 2020).
