Computer Aided Civil Eng - 2022 - Yong - Prompt Engineering For Zero Shot and Few Shot Defect Detection and Classification
INDUSTRIAL APPLICATION
as an alternative to the traditional data-augmentation approaches (Y. Li et al., 2021) for their superior performance (J. J. Zhao et al., 2017). Y. Gao et al. (2021) augmented the crack and spalling image data using GAN to overcome the low- and imbalanced-class data environment. Y. Li et al. (2021) expanded their defect data using GAN for pavement distress detection. Maeda et al. (2021) generated road images, including potholes, by deploying GAN, and the results were used to train their detection model. However, GAN-based data augmentation possesses limitations. The images generated through GAN inevitably inherit biases from their original dataset (Hu & Li, 2019), and thus may produce nonsensical images that contradict the laws of physics (Y. Gao et al., 2019).

In contrast, transfer learning leverages the initial architecture and parameters of an existing model, pretrained with a significantly large dataset, with a new dataset from a target domain, enabling the pretrained model to learn new features of the target domain easily (Pan & Yang, 2010). Owing to this, the effort to collect and label a large dataset for training has been greatly reduced. Nevertheless, this method still requires hundreds to thousands of images. For vision-based defect detection tasks, Y. Gao and Mosalam (2018) used 1600 images for retraining the last several blocks of a pretrained visual geometry group-16 (VGG-16) model. Bang et al. (2019) selected 427 images from 2.1 TB of video data, consisting of 289 h of playback time, to retrain residual networks (ResNet). J. Zhu et al. (2020) deployed 1180 images for retraining Inception-v3.

To eliminate the requirement for the customization of a model using an additional dataset, few-shot learning and zero-shot learning using colossal pretrained language models have been introduced. Few-shot learning is a deep learning technique aiming to deploy transferable knowledge to identify the features of classes, where an extremely small number of samples are given per class (Cui et al., 2022). Similarly, zero-shot learning is a deep learning technique aiming to identify unseen objects without any given samples (Lampert et al., 2009). Zero-shot learning mimics the human ability to categorize unseen classes based on previous experiences and knowledge without further training (Chang et al., 2008; Lampert et al., 2009). One of the key strategies of zero-shot learning is leveraging semantic information, such as word embeddings, from seen/unseen classes (Pourpanah et al., 2022). In the natural language processing (NLP) field, enlarging pretraining data superabundantly enables a model to consider semantic information, resulting in the recent success of zero-shot and few-shot downstream tasks (X. Liu et al., 2021).

Similarly, vision-language pretrained (VLP) models have been developed to perform a variety of tasks without any customization. Qi et al. (2020) introduced ImageBERT, which was pretrained for four tasks: masked language modeling (MLM), masked object classification, masked region feature regression, and image–text matching (ITM), with approximately 5 million images and corresponding descriptions. Chen et al. (2020) used 5.6 million image–text pairs to pretrain their model, called universal image–text representation learning (UNITER), for four tasks, including MLM, ITM, masked region modeling, and word-region alignment. Among several VLP models, contrastive language-image pretraining (CLIP) has received attention owing to its task-agnostic design and robust zero-shot performance (Agarwal et al., 2021). CLIP, trained with 400 million (image, text) pairs, outperformed existing supervised learning models (e.g., ResNet50) in diverse tasks (Radford et al., 2021). However, CLIP still shows considerably low performance on specialized or complex tasks, including the German traffic sign recognition benchmark (GTSRB) task, counting objects in synthetic scenes (the CLEVRCounts task), and lymph node tumor detection (the PatchCamelyon task) (Radford et al., 2021).

To meet the demands of harnessing VLP models, including CLIP, Dall-E (Ramesh et al., 2021), and Midjourney (Midjourney, 2022), even a market (Promptbase, 2022) for buying and selling prompts (inputs) has been formed (Nine, 2022). However, not all prompts yield desirable performance. Several studies have identified that the performance of VLP models, including CLIP, fluctuates depending on how prompts are constructed (Brown et al., 2020; T. Gao et al., 2020; X. Liu et al., 2021; T. Z. Zhao et al., 2021; Zhou et al., 2021).

The CLIP developer team identified that prompts constructed in the form of "a {category} photo of {label}" yielded results with higher accuracy than those in the form of "{label}" (Radford et al., 2021). Thus, the optimization of prompts, which is referred to as prompt engineering, is important for eliciting the best zero-shot performance from pretrained models. Nevertheless, studies on prompt engineering, particularly for VLP models, have rarely been conducted.

This study aims to identify the features of a prompt that can yield the best performance in classifying and detecting building defects from a video clip or a pile of images using zero-shot learning based on CLIP. Specific research questions are discussed in Section 2.4. The scope of this study is limited to the defects in residential buildings, particularly the five most frequently occurring defects described in previous literature (Y. Gao & Mosalam, 2020; Guo et al., 2021; Paton-Cole & Aibinu, 2021; Wali & Ali, 2019; Zalejska & Hungria, 2019): crack, mildew, nail popping, peeling, and wrinkling.

The remainder of this paper is organized as follows: Section 2 introduces related work and describes the research gaps and questions. Section 3 describes the proposed method and hypotheses. Section 4 presents the datasets for this study, the overall experimental processes, and the performance indices. Section 5 presents the results of the study. Finally, Section 6 concludes the paper with contributions and limitations.
2 RELATED WORKS

2.1 Defect detection with insufficient dataset

In this section, this study reviews previous deep learning-based defect detection studies, particularly those with an insufficient dataset. This study uses the term "insufficiency" to denote deficiency in the quantity, as well as the balance, between datasets. This section enumerates the methods and the number of images deployed in each study and discusses the need for zero-shot capability in detail.

Since the advent of advanced computer vision technology, deep learning-based defect detection has been studied extensively. As a rule of thumb, deep learning requires 1000 images per class (Goodfellow et al., 2016). However, not all studies satisfied this rule of thumb, as it is often difficult to collect high-quality and reliably labeled (gold-labeled) data in the construction industry (Maeda et al., 2021). Previous studies bypassed the process of mining such big data through transfer learning. Transfer learning is the most commonly deployed method for deep learning-based technology, as it decreases the requirement for building a bespoke architecture. Y. Gao and Mosalam (2018) used 1600 images for training VGG-16 and performed four classification tasks: component type, spalling condition, damage level, and damage type. Liang (2019) trained three pretrained models, including AlexNet, GoogLeNet, and VGG-16, with 1154 images and deployed them to identify system-level major failures, bridge column defects, and pinpoint damages. S. Jiang and Zhang (2020) proposed a real-time crack assessment method using the combination of single-shot detector lite and MobileNetV2 (SSDLite-MobileNetV2). Out of 1330 images, 1030 images were employed for training.

Data augmentation through conventional image preprocessing or GAN has been deployed as another strategy to work with small quantities of data. J. Zhu et al. (2020) expanded 243 raw images to a total of 1458 images through brightness, saturation, and flip adjustments. Subsequently, they classified five defects (crack, intact, pockmark, exposed rebar, and spalling) through a convolutional neural network (CNN). Y. Li et al. (2021) comparatively studied their detection model, "you only look once v4," trained with 2500, 5000, 7500, and 10,000 augmented images through GAN. They applied these models for pavement distress detection and discovered that the greater the image augmentation for training, the greater the accuracy of the model. Maeda et al. (2021) measured the performance of their damage detection model, based on SSD MobileNet, when using 1200 original images and 1800 and 2400 augmented images through GAN. They showed a performance improvement for increasing training data. At least hundreds of images per class were thus required for defect detection studies in the recent 3 years.

In addition to the challenges of acquiring gold-labeled data, the imbalanced number of images per class is a crucial issue with respect to the construction industry (Y. Gao et al., 2021). The imbalanced data environment is natural during the monitoring of a structure, not only because defects are relatively rare compared to undamaged components (Y. Gao et al., 2021), but also because the occurrence frequency of defects varies depending on the type and location (D. Li et al., 2019). Several approaches have been used to detect or classify defects using insufficient datasets. One approach is oversampling. Oversampling refers to the expansion of a small portion of the class by tuning the original feature or generating a similar feature. Meijer et al. (2019) applied oversampling with a class-weighted loss function to enlarge their training data from 17,663 images to five times as much. Subsequently, they conducted the defect classification task using a CNN. Another approach is hierarchical task performance. D. Li et al. (2019) first developed a CNN model to identify defects and then classified these defects in detail at the end. Meta-learning is another method to overcome imbalanced data environments. Meta-learning is generally referred to as learning to learn, which connotes an outer (or meta) algorithm that refurbishes the inner learning algorithm to yield a desired outcome for an outer task (Hospedales et al., 2020). This enables a deep learning model to be trained with small datasets (Nichol et al., 2018). Guo et al. (2020) classified façade defects into blistering, cracking, peeling, delamination, spalling, and biological growth through a meta-learning-based CNN trained with 21,259 images comprising 63% non-defect data. In addition to meta-learning, semi-supervised learning is also an emerging method for imbalanced datasets. Compared to meta-learning, semi-supervised learning aims to update the learning process by adjusting the interaction between labeled and unlabeled data (Engelen & Hoos, 2020; X. Zhu & Goldberg, 2009). This has alleviated the labeling requirement for end-users, particularly with respect to the construction industry, as there are not many labeled datasets. Guo et al. (2021) leveraged semi-supervised learning based on a CNN to classify façade defects into the same classes as their previous study (Guo et al., 2020), however, with a smaller size; a total of 5621 images were employed for training. Y. Gao et al. (2021) harnessed oversampling through GAN and semi-supervised learning to classify defects in low and imbalanced datasets. A total of 10,500 images was used to train the CNN model. However, only 900 images were used for defects.

Thus far, zero-shot and few-shot concepts have barely been used for defect classification/detection.
Recently, a study leveraged 1-shot, 2-shot, 5-shot, and 10-shot methods to classify façade defects (Cui et al., 2022). During the training stage, five defect classes (that is, blistering, crack, delamination, peeling, and no defects) with approximately thousands of images per class were used. However, only two novel classes (efflorescence and spalling with exposed rebars) were tested (Cui et al., 2022).

Notwithstanding, the existing approaches for insufficient defect datasets, such as data augmentation, meta-learning, and semi-supervised learning, still require hundreds of images per class. In addition, these bespoke models cannot identify unlearned/untrained defects.

2.2 CLIP

Scaling up a dataset for pretraining has advanced in deep learning communities while suggesting a new horizon for zero-shot and few-shot transfer. The recent success of pretrained language models, such as bidirectional encoder representations from transformers (BERT; Devlin et al., 2018) and the generative pretrained transformer (GPT) series (Brown et al., 2020; Radford et al., 2018, 2019), inspired researchers to extend these models to perform vision-related tasks (Chen et al., 2020; Kim et al., 2021; L. H. Li et al., 2019; X. Li et al., 2020; Qi et al., 2020; Radford et al., 2021). By training visual and textual features simultaneously and bridging the semantic gap between them, these models can identify a wide range of information in an image (Chen et al., 2020). Beyond the detection of an object, they can recognize attributes, spatial relationships, actions, and intentions (L. H. Li et al., 2019).

CLIP has been widely used in previous studies (W. Wang et al., 2021), as it allows end-users to harness a VLP model to accomplish diverse tasks without any customization (Radford et al., 2021). Inspired by the results that pretraining with web-scale collections of text surpasses that with gold-labeled NLP datasets, CLIP was trained with 400 million (image, text) pairs under natural language supervision—learning visual features from corresponding natural language without human intervention (Radford et al., 2021).

Such a scaling-up approach enabled CLIP to be task-agnostic, operating under various scenarios. The zero-shot capability of CLIP outperformed typical supervised learning models (e.g., ResNet50; K. He et al., 2015) and VLP models (e.g., Visual N-grams, A. Li et al., 2017), PixelBERT (Huang et al., 2020), learning cross-modality encoder representations from transformers (LXMERT) (Tan & Bansal, 2019), UNITER (Chen et al., 2020), and object-semantics aligned pre-training (OSCAR) (X. Li et al., 2020), not only in image classification tasks (Radford et al., 2021), including ImageNet (Deng et al., 2009), SUN397 (Xiao et al., 2010), and Food101 (Kaur et al., 2017), but also in vision-language tasks (Shen et al., 2021), including visual question answering (Goyal et al., 2017).

To be concrete, CLIP includes two encoders as a VLP model. Either a ResNet50 (K. He et al., 2016; T. He et al., 2019; R. Zhang, 2019) or a vision transformer architecture (Dosovitskiy et al., 2021) was adopted as the image encoder, and a masked self-attention transformer (Vaswani et al., 2017) was used as the text encoder. During the training phase, both the image and text encoders extract visual and textual features, respectively, from a set of (image, text) pairs. Subsequently, these features are mapped into a multimodal embedding space. CLIP computes the cosine similarities of all possible (image embedding, text embedding) pairs. CLIP is trained to maximize the cosine similarities of true (image, text) pairs while minimizing those of false pairs (Radford et al., 2021). CLIP adopts the learning process of contrastive visual representation learning from text (ConVIRT) (Yuhao Zhang et al., 2020) and then simplifies it. The process of vectorizing inputs and the loss functions of CLIP are explained as follows.

Given an input image $x_I$, a transformed view $\tilde{x}_I$ is generated via random resizing and cropping. Subsequently, the augmented image is converted into a fixed-dimensional vector, and a linear projection is applied to obtain the final vector $\mathcal{J}$. Similarly, a text input $x_T$ is transformed into a vector $\mathcal{T}$. When training CLIP, a batch of $(x_I, x_T)$ pairs is vectorized as $(\mathcal{J}, \mathcal{T})$ pairs. Here, $N$ denotes the batch size and $(\mathcal{J}_i, \mathcal{T}_i)$ is the $i$th of the $N$ $(\mathcal{J}, \mathcal{T})$ pairs. Then, the image-to-text loss function is described as

$$ l_i^{(\mathcal{J}\to\mathcal{T})} = -\log \frac{e^{\langle \mathcal{J}_i,\, \mathcal{T}_i\rangle/\tau}}{\sum_{k=1}^{N} e^{\langle \mathcal{J}_i,\, \mathcal{T}_k\rangle/\tau}} \quad (1) $$

where $\langle \mathcal{J}, \mathcal{T}\rangle$ is the cosine similarity of the vector pair, and $\tau$ is a temperature parameter (Yuhao Zhang et al., 2020). This parameter adjusts penalties on negative pairs. Notably, as the temperature decreases, the distribution of embeddings becomes more uniform. Similarly, the text-to-image loss function is described as

$$ l_i^{(\mathcal{T}\to\mathcal{J})} = -\log \frac{e^{\langle \mathcal{T}_i,\, \mathcal{J}_i\rangle/\tau}}{\sum_{k=1}^{N} e^{\langle \mathcal{T}_i,\, \mathcal{J}_k\rangle/\tau}} \quad (2) $$

The final training loss is defined using the weighted combination of the two losses (Equations 1 and 2) as follows:

$$ \mathcal{L} = \frac{1}{N}\sum_{x=1}^{N}\left(\lambda\, l_x^{(\mathcal{J}\to\mathcal{T})} + (1-\lambda)\, l_x^{(\mathcal{T}\to\mathcal{J})}\right) \quad (3) $$

where $\lambda \in [0, 1]$ is a scalar weight (Yuhao Zhang et al., 2020).
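Equations (1)–(3) translate directly into a few lines of code. The following is a minimal PyTorch sketch of this symmetric contrastive objective, written for illustration only; the function and variable names are ours and are not taken from the CLIP or ConVIRT implementations.

```python
import torch
import torch.nn.functional as F

def clip_style_loss(image_vecs, text_vecs, tau=0.07, lam=0.5):
    """Symmetric contrastive loss of Equations (1)-(3) for one batch.

    image_vecs, text_vecs: (N, d) projected vectors J and T.
    tau: temperature; lam: the weight lambda of Equation (3).
    """
    image_vecs = F.normalize(image_vecs, dim=-1)   # so dot products become cosine similarities
    text_vecs = F.normalize(text_vecs, dim=-1)
    logits = image_vecs @ text_vecs.t() / tau      # (N, N) matrix of <J_i, T_k> / tau

    targets = torch.arange(len(image_vecs), device=image_vecs.device)  # true pairs on the diagonal
    loss_i2t = F.cross_entropy(logits, targets)    # Equation (1), averaged over the batch
    loss_t2i = F.cross_entropy(logits.t(), targets)  # Equation (2), averaged over the batch
    return lam * loss_i2t + (1 - lam) * loss_t2i   # Equation (3)
```

With lam = 0.5 the two directions contribute equally, which corresponds to the symmetric loss used by CLIP.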
Owing to its dominant zero-shot performance, previous studies deployed CLIP in artwork classification (Conde & Turgutlu, 2021), video summarization by detecting key video frames (Narasimhan et al., 2021), and vehicle retrieval tasks (Khorramshahi et al., 2021). However, it does not perform effectively in domain-specific tasks or complex tasks, such as traffic sign recognition and lymph node tumor detection (Radford et al., 2021). This indicates that the use of CLIP for zero-shot tasks in a domain-specific area requires performance improvement.

2.3 Prompt engineering

Several studies have revealed that paraphrasing or rephrasing an input query, known as a prompt, can bolster the performance of CLIP-like pretrained models (Z. Jiang et al., 2020; V. Liu & Chilton, 2022; X. Liu et al., 2021; Radford et al., 2019; Schick & Schütze, 2021). Prompt engineering is the process of modifying a query to facilitate a pretrained model to identify target information with high accuracy (Brown et al., 2020; T. Gao et al., 2020; P. Liu et al., 2021; T. Z. Zhao et al., 2021). Some studies manually converted an initial query into a set format of the query. Radford et al. (2019) illustrated that GPT-2 could acquire advanced zero-shot competency by inserting a task description, such as "translate English to French," into a prompt. Schick and Schütze (2021) changed the labels to task-relevant descriptions applicable to a masked-language model and showed an improvement in robustly optimized BERT performance. Radford et al. (2021) described that the elaboration of a query in the form of "a {category} photo of {label}" performs better than a single "{label}" query in fine-grained image classification using CLIP. In contrast, other studies have focused on automatically tuning a prompt. Z. Jiang et al. (2020) identified that knowledge-contained prompts, generated by mining-based and paraphrasing-based approaches, can enhance the performance of pretrained language models, that is, BERT-base and BERT-large. Compared to the vocabularic approach, Liu et al. (2021) presented p-tuning, which is a method to automatically search for better prompts with numerical embedding tensors beyond the scope of the original vocabulary, thereby boosting GPT-2's performance. Zhou et al. (2021) introduced context optimization—generating a prompt consisting of context vectors—to not only improve the zero-shot performance of CLIP but also to avoid tuning a prompt manually.

However, a prerequisite for any prompt tuning method, whether automated or not, is that the result must be known first to optimize a prompt. That is, prompt tuning methods do not explain how to configure the prompt in the first place to improve the zero-shot performance. The main difference between previous studies and this study is that this study focuses on initial "prompt construction" rather than "prompt tuning."
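To make concrete how such a wording change enters the zero-shot pipeline, the sketch below scores one image against a bare label prompt and the "a defect photo of {label}" format used as the baseline in this study, assuming the open-source clip package; the image path is a placeholder. Only the text side changes between the two runs; the image embedding and the cosine-similarity scoring stay the same.

```python
import clip
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

labels = ["crack", "mildew", "nail popping", "peeling", "wrinkling"]
prompt_sets = {
    "bare label": labels,
    "engineered": [f"a defect photo of {label}" for label in labels],
}

image = preprocess(Image.open("defect.jpg")).unsqueeze(0).to(device)  # placeholder path

with torch.no_grad():
    for name, prompts in prompt_sets.items():
        text = clip.tokenize(prompts).to(device)
        logits_per_image, _ = model(image, text)        # scaled cosine similarities
        probs = logits_per_image.softmax(dim=-1)[0]
        print(name, {l: round(p.item(), 3) for l, p in zip(labels, probs)})
```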
2.4 Research gap and questions

Previous studies proposed deep learning-based defect detection, which requires large datasets of defect images that are difficult to acquire. However, owing to recent advancements in task-agnostic VLP models (e.g., CLIP), zero-shot transfer, which does not require training data, has become possible. The task-agnostic VLP models do not work effectively with domain-specific tasks, whereas they work effectively with generic tasks. Inspired by the success of prompt engineering for pretrained models in the NLP field, several VLP models have embraced prompt engineering. However, prompt engineering of a VLP model has not reached the stage where it can help us detect defects with high accuracy. Therefore, this study aims to identify the characteristics of a prompt enabling CLIP to detect building defects with high accuracy by addressing the following research questions:

1. Does a generic dictionary definition of a defect or a domain-specific definition of a defect perform better as a prompt?
2. Does CLIP perform better with a complete sentence including stopwords as a prompt, or does it perform better with a prompt consisting only of core terms without stopwords?
3. Which has stronger descriptive power, a textual prompt or a visual one?
4. Does CLIP detect defects more accurately with a multimodal prompt?

The significance of these four research questions is discussed in detail in the next section.

3 PROPOSED PROMPT ENGINEERING METHOD

3.1 Construction of DK reflected in prompts

Hypothesis 1 (H1): A DK-based definition of a defect performs better as a prompt than a GK-based definition.

Expanding the information of input data quantitatively and qualitatively by referring to a dictionary can boost the performance of a deep learning model (Kupi et al., 2021; Peng et al., 2020; Qiu et al., 2020). Motivated by this technique, this study tailored prompts by looking up dictionaries. In previous prompt engineering approaches, Radford et al. (2021) added category information to instruct CLIP to classify objects within a provided category. This approach cannot augment the information for individual target labels. Zhou et al. (2021) discovered an optimal prompt in vector space. However, these vectors could not be converted into real-world vocabulary (Zhou et al., 2021), which results in an inharmonious human-AI interaction (Shin et al., 2020; Wu et al., 2022). Compared to earlier approaches, our dictionary-enhanced prompt is a "human-readable" prompt, as well as "knowledge-contained."

In addition, this study distinguished between the enhanced prompts obtained through domain-specific dictionaries and those obtained from commonly used dictionaries (e.g., the Oxford English Dictionary), to determine the difference between prompts based on domain-specific information and those based on general information. This study considered definitions in construction jargon dictionaries as domain knowledge reflected in prompts (DK prompts). This was based on the rationale that they include clear and concise explanations of the most commonly encountered terms, phrases, and abbreviations used throughout the construction industry (Tolson, 2012). Conversely, this study considered definitions from the commonly used and renowned dictionaries as general knowledge reflected in prompts (GK prompts). In addition, this study created ensemble features for these two prompt groups in the embedding space by averaging their embedding vectors, as Radford et al. (2021) suggested that this increases the accuracy of CLIP. This study set the prompt format "a {category} photo of {label}," suggested by the CLIP development team in its latest publication (Radford et al., 2021), as a baseline. Table 1 displays the prompts used in this study, and Table 2 lists the prompt sources.
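As an illustration of the ensembling step just described, the sketch below averages the CLIP text embeddings of several definitions per class and classifies an image by cosine similarity. It assumes the open-source clip package, and the definition strings are abbreviated stand-ins for the prompts in Table 1.

```python
import clip
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Abbreviated stand-ins for the DK prompts of Table 1 (one list per defect class)
dk_prompts = {
    "crack": ["Crack is a fissure or fracture in a material", "Crack is a fissure"],
    "peeling": ["Peeling is a paint defect where the paint debonds from the surface and peels off"],
}

class_vectors = []
with torch.no_grad():
    for label, definitions in dk_prompts.items():
        tokens = clip.tokenize(definitions).to(device)
        text_vecs = model.encode_text(tokens)
        text_vecs = text_vecs / text_vecs.norm(dim=-1, keepdim=True)
        class_vectors.append(text_vecs.mean(dim=0))      # ensemble = average embedding per class
ensemble = torch.stack(class_vectors)                     # (num_classes, 512)

def classify(image_tensor):
    """Return the index of the class whose ensemble vector is most similar to the image."""
    with torch.no_grad():
        image_vec = model.encode_image(image_tensor.to(device))
        image_vec = image_vec / image_vec.norm(dim=-1, keepdim=True)
    return (image_vec @ ensemble.t()).argmax(dim=-1)
```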
Hypothesis 2 (H2): A list of core terms in a definition performs better as a prompt than a complete sentence definition.

Stopwords (e.g., a, the, it, he, she) are functional and frequently appearing words that show low discrimination power (Brants, 2003; Lo et al., 2005; Rijsbergen, 1979; Saif et al., 2014). Considering that the removal of stopwords facilitates the improvement of data quality and decreases the dimension of data (Silva & Ribeiro, 2003; Wilbur & Sirotkin, 1992), it has become the mainstream preprocessing method in NLP (Fan & Mostafavi, 2019; Roy et al., 2020). Along with this trend, this study discarded stopwords from prompts through the natural language toolkit (Bird et al., 2009), anticipating performance strengthening. Zhou et al. (2021) discovered that there is no golden rule for determining the optimal context length. However, their approach was based on tokens that could not be converted into the existing vocabulary. Thus, it is still an uncharted area to reveal the relationship between the number of words in a prompt and the performance of the VLP model.
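The stopword-removal step can be reproduced roughly as follows with the natural language toolkit; the tokenizer choice and the example prompt are ours.

```python
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

nltk.download("stopwords")
nltk.download("punkt")

def to_core_terms(prompt: str) -> str:
    """Strip stopwords from a full-sentence prompt, keeping only core terms."""
    stops = set(stopwords.words("english"))
    tokens = word_tokenize(prompt.lower())
    return " ".join(t for t in tokens if t.isalpha() and t not in stops)

print(to_core_terms("Crack is a fissure or fracture in a material"))
# -> "crack fissure fracture material"
```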
3.3 Descriptive power of a visual prompt

Hypothesis 3 (H3): A defect image is better than the defect definition as a prompt.

The majority of defect reports include descriptions of defects and corresponding images (Dong et al., 2009; Park et al., 2013), as humans are more receptive to visual information than other types of information (Colavita, 1974; Colavita et al., 1976; Posner et al., 1976; Sinnett et al., 2007). Based on this claim, this study comparatively measures the descriptive powers of visual and textual information.

As described in Section 2.2, there are two encoders in CLIP. The text encoder places a textual query, primarily used as a prompt, in the 512-dimensional space. In contrast, visual data (e.g., images) are encoded as 512-, 640-, 768-, or 1024-dimensional vectors by the image encoder. Considering that CLIP can convert both data types into vectors in the same-dimensional embedding space (the 512-dimension), an image can be deployed as a prompt. Moreover, multiple vectors from multiple images can be merged into a single vector by calculating the average, which is similar to a few-shot transfer (Radford et al., 2021). M. Wang et al. (2021) employed eight images as a prompt to carry out few-shot action recognition. In a similar way, this study used a small number of images as a prompt and compared it with the textual prompt. Through this work, it is possible to optimize the prompt toward classifying or detecting defects and, by extension, to identify which format is better for delivering information to a VLP model, either linguistic or visual.
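A sketch of how a visual prompt, and by extension a multimodal prompt, can be assembled in the shared embedding space is given below (clip package assumed; the image paths are placeholders, and the simple averaging fusion shown at the end is one possible choice rather than necessarily the exact scheme used in the experiments).

```python
import clip
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

def embed_images(paths):
    """Average a handful of defect images into a single visual prompt vector."""
    batch = torch.stack([preprocess(Image.open(p)) for p in paths]).to(device)
    with torch.no_grad():
        vecs = model.encode_image(batch)
    vecs = vecs / vecs.norm(dim=-1, keepdim=True)
    return vecs.mean(dim=0)

def embed_texts(definitions):
    """Average a group of textual definitions into a single text prompt vector."""
    tokens = clip.tokenize(definitions).to(device)
    with torch.no_grad():
        vecs = model.encode_text(tokens)
    vecs = vecs / vecs.norm(dim=-1, keepdim=True)
    return vecs.mean(dim=0)

# k-shot visual prompt and a multimodal prompt for one defect class (placeholder inputs)
visual_prompt = embed_images(["crack_01.jpg", "crack_02.jpg", "crack_03.jpg", "crack_04.jpg"])
text_prompt = embed_texts(["Crack is a fissure or fracture in a material"])
multimodal_prompt = (visual_prompt + text_prompt) / 2   # one way to fuse the two modalities
```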
Prior to testing H4, this study performed the principal component analysis (PCA) to verify the existence of a gap between defect images and descriptions when a VLP model processes information. PCA is a widely used technique for analyzing, compressing, and visualizing high-dimensional data (Amiri et al., 2012; Bishop, 2006; Nabian & Meidani, 2018; Yulong Zhang et al., 2021). PCA is the orthogonal projection of data onto a lower-dimensional linear space, that is, the principal subspace, such that the variance of the projected data is maximized (Hotelling, 1933). This study extracted principal components from features in 512 dimensions and visualized them to identify the position of image embedding vectors from defect images and text embedding vectors from descriptions.
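A minimal sketch of that PCA check with scikit-learn is shown below, assuming the 512-dimensional image and text embeddings have already been collected; the arrays here are random stand-ins for real CLIP features.

```python
import numpy as np
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

image_embs = np.random.randn(50, 512)   # stand-ins for defect-image embeddings
text_embs = np.random.randn(7, 512)     # stand-ins for definition embeddings

pca = PCA(n_components=2)
projected = pca.fit_transform(np.vstack([image_embs, text_embs]))

plt.scatter(projected[:50, 0], projected[:50, 1], label="image embeddings")
plt.scatter(projected[50:, 0], projected[50:, 1], label="text embeddings")
plt.legend()
plt.show()
```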
Generally, there is a feature difference between an image and a text when a deep learning model interprets the input data. Owing to this, earlier studies revealed that merg-
TABLE 1 Prompts and their sources for each prompt group and defect type
Defect type (synonyms)
[Prompt group] Prompt (Source)
Crack (cracks)
[BL] A defect photo of crack (R1)
[DK1] Crack is a fissure or fracture in a material (R2)
[DK2] Crack is a building defect consisting of complete or incomplete separation within a single element or between
contiguous elements of construction (R3)
[DK3] Crack is a fissure (R4)
[GK1] Crack is a fissure or opening formed by the cracking, breaking, or bursting of a hard substance (R5)
[GK2] Crack is a narrow break or opening (R6)
[GK3] Crack is a thin line on the surface of something when it is broken but has not actually come apart (R7)
Mildew (mold; mould)
[BL] A defect photo of mildew (R1)
[DK1] Mildew is a fungus growth that is enhanced by dampness (R2)
[DK2] Mildew is a fungus that grows and feeds on paint, cotton, and linen fabric, and so forth, which are exposed to moisture; causes discoloration and decomposition of the surface (R3)
[DK3] Mildew is a fungus that stains materials but does not rot wood (R8)
[GK1] Mildew is a woolly, furry, or staining growth now recognized as consisting of fungus, such as that which forms on
food, textile, and so forth (R5)
[GK2] Mildew is a fungus producing mildew (R6)
[GK3] Mildew is a white or gray substance that grows on walls or other surfaces in wet, slightly warm conditions (R7)
Nail popping (nail pops; nail + popping)
[BL] A defect photo of nail popping (R1)
[DK1] Nail popping is a problem that appears both in decking and in gypsum wallboard finishes where the heads of nails
pull or work themselves out of the framing members and pop through the surface (R2)
[DK2] Nail popping is a protrusion of nailhead and compound directly over the nailhead, caused by outward movement of
the nail relative to the gypsum board (R9)
[DK3] Nail popping is a nail head that protrudes above the surrounding surface (R10)
[GK1] Nail popping is the displacement, dislodgement, and dislocation of a small metal spike (R5)
[GK2] Nail popping is a slender, usually pointed and headed fastener designed to be pounded in, escaping or breaking away from something usually suddenly or unexpectedly (R6)
[GK3] Nail popping is unexpectedly out of or away from a thin pointed piece of metal with a flat top, which you hit into a
surface with a hammer (R7)
Peeling
[BL] A defect photo of peeling (R1)
[DK1] Peeling is a paint defect where the paint debonds from the surface and peels off (R2)
[DK2] Peeling is finishing such as paint that has not properly adhered to the surface and has started to come away from the
substrate (R11)
[DK3] Peeling is the defect of dislodgement of paint or plaster from a surface due to lack of adhesion or a weak backing (R8)
[GK1] Peeling is the removal of the external layer or outer covering of something (R5)
[GK2] Peeling is a peeled-off piece or strip (R6)
[GK3] Peeling is paint peels, it comes off, usually in small pieces (R7)
Wrinkling
[BL] A defect photo of wrinkling (R1)
[DK1] Wrinkling is a paint defect in which the surface becomes wrinkled (R2)
[DK2] Wrinkling is the distortion in a paint film appearing as ripples, may be produced intentionally as a decorative effect,
or may be a defect caused by drying conditions or an excessively thick film (R3)
[DK3] Wrinkling is the development of ridges and furrows in a paint film during drying (R12)
[GK1] Wrinkling is the action of creasing, puckering, or contracting into wrinkles (R5)
[GK2] Wrinkling is a small ridge or furrow especially when formed on a surface by the shrinking or contraction of a smooth
substance (R6)
[GK3] Wrinkling is a small untidy fold in a piece of clothing or paper (R7)
Abbreviations: BL, baseline; DK, domain knowledge; GK, general knowledge.
TABLE 4 Comparison of crack detection performances between the proposed method and the results from Y. Gao et al. (2021)

Source | Experimental group | Training data | Test data | Accuracy | F2 score
The proposed method | Baseline | 0 | 5100 | 0.985 | 0.877
The proposed method | DK1 | 0 | 5100 | 0.987 | 0.890
The proposed method | DK2 | 0 | 5100 | 0.986 | 0.880
The proposed method | DK3 | 0 | 5100 | 0.989 | 0.907
The proposed method | GK1 | 0 | 5100 | 0.988 | 0.897
The proposed method | GK2 | 0 | 5100 | 0.983 | 0.857
The proposed method | GK3 | 0 | 5100 | 0.988 | 0.900
Benchmark (Y. Gao et al., 2021) | BSL | 10,200 | 5100 | 0.954 | 0.352
Benchmark (Y. Gao et al., 2021) | BUS | 10,200 | 5100 | 0.940 | 0.466
Benchmark (Y. Gao et al., 2021) | BOS-DA | 10,200 | 5100 | 0.962 | 0.501
Benchmark (Y. Gao et al., 2021) | BOS-GAN | 10,200 | 5100 | 0.954 | 0.336
Benchmark (Y. Gao et al., 2021) | BSL-SDF | 10,200 | 5100 | 0.902 | 0.434
Benchmark (Y. Gao et al., 2021) | BSS-GAN | 10,200 | 5100 | 0.921 | 0.728
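For reference, the F2 score reported for the detection task weights recall twice as heavily as precision. A minimal sketch of computing these indices with scikit-learn is shown below; the label vectors are illustrative only.

```python
from sklearn.metrics import accuracy_score, fbeta_score

y_true = [1, 0, 1, 1, 0, 1]   # illustrative ground truth (1 = defect present)
y_pred = [1, 0, 1, 0, 0, 1]   # illustrative predictions

accuracy = accuracy_score(y_true, y_pred)
f1 = fbeta_score(y_true, y_pred, beta=1)   # balances precision and recall
f2 = fbeta_score(y_true, y_pred, beta=2)   # emphasizes recall, i.e., missed defects
print(accuracy, f1, f2)
```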
FIGURE 8 Relationship between the defect classification performance (F1-score) and the number of words in a prompt
FIGURE 10 Comparison of the defect detection performances between textual and visual prompts
TABLE 5 Defect classification performances in ensembling visual and textual prompts

Prompt | Accuracy | F1-score
DK_ensemble (text) | 0.834 | 0.835
GK_ensemble (text) | 0.736 | 0.727
1-shot (image) | 0.770 | 0.762
DK_ensemble + 1-shot (text + image) | 0.825 | 0.821
GK_ensemble + 1-shot (text + image) | 0.837 | 0.834
2-shot (image) | 0.817 | 0.807
DK_ensemble + 2-shot (text + image) | 0.864 | 0.861
GK_ensemble + 2-shot (text + image) | 0.870 | 0.868
4-shot (image) | 0.864 | 0.860
DK_ensemble + 4-shot (text + image) | 0.896 | 0.894
GK_ensemble + 4-shot (text + image) | 0.904 | 0.904
8-shot (image) | 0.891 | 0.890
DK_ensemble + 8-shot (text + image) | 0.917 | 0.917
GK_ensemble + 8-shot (text + image) | 0.927 | 0.928
16-shot (image) | 0.902 | 0.901
DK_ensemble + 16-shot (text + image) | 0.925 | 0.925
GK_ensemble + 16-shot (text + image) | 0.933 | 0.934

TABLE 6 Defect detection performances in ensembling visual and textual prompts

Prompt | Accuracy | F2-score
DK_ensemble (text) | 0.975 | 0.591
GK_ensemble (text) | 0.976 | 0.605
1-shot (image) | 0.966 | 0.433
DK_ensemble + 1-shot (text + image) | 0.974 | 0.570
GK_ensemble + 1-shot (text + image) | 0.974 | 0.567
2-shot (image) | 0.970 | 0.512
DK_ensemble + 2-shot (text + image) | 0.976 | 0.610
GK_ensemble + 2-shot (text + image) | 0.976 | 0.611
4-shot (image) | 0.975 | 0.595
DK_ensemble + 4-shot (text + image) | 0.979 | 0.656
GK_ensemble + 4-shot (text + image) | 0.979 | 0.659
8-shot (image) | 0.978 | 0.634
DK_ensemble + 8-shot (text + image) | 0.980 | 0.674
GK_ensemble + 8-shot (text + image) | 0.980 | 0.671
16-shot (image) | 0.979 | 0.648
DK_ensemble + 16-shot (text + image) | 0.981 | 0.679
GK_ensemble + 16-shot (text + image) | 0.981 | 0.683

of images and text performed better than single-modal prompts, deploying only images or text, for both classification and detection. Table 5 compares the accuracies and F1-scores from classification, whereas Table 6 compares the accuracies and F2-scores from detection. The highest accuracy and F1-score of the classification task were obtained from the multimodal prompt, leveraging general knowledge-based definitions with 16 images (accuracy = 0.933; F1-score = 0.934). Furthermore, this multimodal prompt showed the highest performance on the defect detection task (accuracy = 0.981; F2-score = 0.683).

Unlike the previous tasks, combinations of images and a GK prompt appeared to perform marginally better than the combinations of images and a DK prompt, not only in the classification task but also in the detection task. However, the t-test showed that the differences in the detection task were not statistically significant (all p-values > 0.05). This study cannot explain the better performance of the combination of GK prompts and images than the others. However, this study assumes that the GK + image prompts capture the core semantic features of a defect from images and general features from descriptions. This is because the PCA results of image embeddings are completely different from those of text embeddings, as shown in Figure 11.

6 CONCLUSION

Previous studies on deep learning-based defect inspection have required a large number of defect images for applicability because they deployed supervised learning. Zero-shot or few-shot transfer using VLP models such as CLIP is expected to be the alternative to traditional supervised learning, particularly for data-insufficient tasks. Defect inspection is one such task, where data acquisition is difficult owing to its sensitivity. The zero-shot performance of a VLP model varies according to the input data (prompts). However, the identification of an optimal prompt for the improvement of zero-shot performance has not been widely studied. This study experimented with four different groups of prompts to construct an optimal prompt to bolster the zero-shot performance in classifying and detecting five types of building defects (crack, mildew, nail popping, peeling, and wrinkling). The major results acquired from testing the hypotheses are as follows:

H1: A domain knowledge-based definition of a defect performs better as a prompt than a general knowledge-based definition.

The results indicated domain knowledge to perform better (average accuracy = 0.786; average F1-score = 0.783) than general knowledge as a prompt for classifying defects. Domain knowledge also returned the most reliable classification performance across different defect types, exhibiting the smallest standard deviations (< 0.2), whereas the baseline and general knowledge exhibited standard deviations of 0.6. In the zero-shot defect detection, both domain knowledge and general knowledge improved performance; however, the difference was not statistically significant (p > 0.05). Furthermore, zero-shot crack detection using the proposed prompt methods outperformed the supervised learning, transfer learning, and GAN-based approaches experimented with in a previous study (Y. Gao et al., 2021).
H2: A list of core terms in a definition performs better as a prompt than a complete sentence definition.

A complete sentence definition with stopwords performed better as a prompt than a set of core terms that were extracted from the complete sentence definition after removing the stopwords. To examine whether this result is because a complete sentence definition includes more words than a set of core terms, the correlation between the number of words in a prompt and its performance was measured. The Spearman's rank correlation coefficient test indicated that there was no statistically significant correlation between the number of words in a prompt and performance (rS = −0.04, p = 0.81 > 0.05).

H3: A defect image is better than the defect definition as a prompt.

For defect classification, when the number of images was <4, domain knowledge-based definitions exhibited stronger descriptive power as a prompt than images. However, for defect detection, when the number of images was >4, images were better prompts than knowledge-based definitions.

H4: A multimodal prompt with the combination of defect images and definitions performs better than a single-modal prompt.

When the combination of images and texts was used as a prompt, instead of using either only images or text, zero-shot transfer through CLIP performed best in both classifying (highest accuracy = 0.933; highest F1-score = 0.934) and detecting defects (highest accuracy = 0.981; F2-score = 0.683). Moreover, in all cases, the proposed prompt methods performed better than the baseline case (the accuracy and F1-score for the classification task: 0.736 and 0.713; the accuracy and F2-score for the detection task: 0.956 and 0.275), which is a benchmark prompt in the computer vision field proposed by Radford et al. (2021). The results of this study confirm that domain knowledge-based prompts enhance the performance of zero-shot detection and classification.

The primary contributions of this study are as follows: (1) There have not been many studies on prompt engineering yet, although the importance of prompt engineering is rapidly recognized as the performance of zero-shot learning improves exponentially. Among them, this study made a unique contribution in that it focused on the initial "prompt construction" method, while previous studies focused on "prompt tuning (word tuning)." (2) It demonstrated the possibility of replacing traditional supervised learning with zero-shot transfer, which does not require a training process with a very large amount of data. The required data are "construction domain knowledge" and a small number of images. (3) It revealed that a careful selection of a prompt is required to elicit the full advantage of a VLP model. (4) It showed the feasibility of multimodal information in vision-related tasks in the construction industry. The results of this study may be used as a baseline for future zero-shot or few-shot transfer studies and to classify or detect construction-specific tasks or objects, including construction activity recognition, construction equipment detection, or construction method classification.

To extend the contributions of this study, two frameworks are suggested for future research. The first involves building a construction-VLP model with a plurality of construction (image, text) pairs. Considering the large scale of parameters, initializing the parameters of a model and training it with a domain-specific dataset can be a better option than fine-tuning the model with a relatively smaller dataset. After realizing a construction-specific VLP model, more accurate and useful zero-shot capabilities will be available in many construction areas. Another potential area is one that predicts maintenance cost with multimodal information, for example, a defect report with an image and its description as a solution. If the maintenance cost information regarding a safety report is acquired, a cost estimation model, which assumes a defect report as an input and a maintenance cost as an output, can be built. This model may be more practical as it considers a visual and a textual feature. Further, this model can connect image–text features with a financial feature.

However, a few issues remain for further studies. First, the zero-shot approach used in this study possesses the advantage of not requiring the collection and labeling of large datasets. However, the proposed methods have not been tested in a large-scale real-world project. Second, using the proposed zero-shot transfer method with appropriate prompts, a defect type can be identified with considerably high accuracy. However, it is still challenging to identify detailed defect information, such as the severity of defects, which can be measured by the width or length of a crack or the angle between building components. These remain as future work. Overall, despite the scope for improvement, the proposed method is expected to enable practitioners with only introductory knowledge of AI to apply AI technology in everyday construction work.
ORCID
Gunwoo Yong https://orcid.org/0000-0003-0912-4520

REFERENCES
Agarwal, S., Krueger, G., Clark, J., Radford, A., Kim, J. W., & Brundage, M. (2021). Evaluating CLIP: Towards characterization of broader capabilities and downstream implications. arXiv:2108.02818 [cs].
Amezquita-Sanchez, J. P., & Adeli, H. (2015). Synchrosqueezed wavelet transform-fractality model for locating, detecting, and quantifying damage in smart highrise building structures. Smart Materials and Structures, 24, 065034. https://doi.org/10.1088/0964-1726/24/6/065034
Amiri, G. G., Abdolahi Rad, A., Aghajari, S., & Khanmohamadi Hazaveh, N. (2012). Generation of near-field artificial ground motions compatible with median-predicted spectra using PSO-based neural network and wavelet analysis. Computer-Aided Civil and Infrastructure Engineering, 27, 711–730. https://doi.org/10.1111/j.1467-8667.2012.00783.x
Audebert, N., Herold, C., Slimani, K., & Vidal, C. (2019). Multimodal deep networks for text and image-based document classification. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases (pp. 427–443). Springer, Cham.
Azimi, M., & Pekcan, G. (2020). Structural health monitoring using extremely compressed data through deep learning. Computer-Aided Civil and Infrastructure Engineering, 35, 597–614. https://doi.org/10.1111/mice.12517
Bang, S., Park, S., Kim, H., & Kim, H. (2019). Encoder–decoder network for pixel-level road crack detection in black-box images. Computer-Aided Civil and Infrastructure Engineering, 34, 713–727. https://doi.org/10.1111/mice.12440
Bird, S., Klein, E., & Loper, E. (2009). Natural language processing with Python (1st ed.). O'Reilly.
Bishop, C. M. (2006). Pattern recognition and machine learning, information science and statistics. Springer.
Brants, T. (2003). Natural language processing in information retrieval. CLIN.
Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., Neelakantan, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D. M., Wu, J., Winter, C., Hesse, C., Chen, M., Sigler, E., Litwin, M., Gray, S., . . . Amodei, D. (2020). Language models are few-shot learners. Advances in Neural Information Processing Systems, 33, 1877–1901.
Chang, M.-W., Ratinov, L., Roth, D., & Srikumar, V. (2008). Importance of semantic representation: Dataless classification. Proceedings of the Twenty-Third AAAI Conference on Artificial Intelligence, Chicago, IL (pp. 830–835).
Chen, Y. C., Li, L., Yu, L., El Kholy, A., Ahmed, F., Gan, Z., Cheng, Y., & Liu, J. (2020). Uniter: Universal image-text representation learning. In European Conference on Computer Vision (pp. 104–120). Springer, Cham.
Chun, P., Yamane, T., & Maemura, Y. (2022). A deep learning-based image captioning method to automatically generate comprehensive explanations of bridge damage. Computer-Aided Civil and Infrastructure Engineering, 37, 1387–1401. https://doi.org/10.1111/mice.12793
Colavita, F. B. (1974). Human sensory dominance. Perception & Psychophysics, 16, 409–412. https://doi.org/10.3758/BF03203962
Colavita, F. B., Tomko, R., & Weisberg, D. (1976). Visual prepotency and eye orientation. Bulletin of the Psychonomic Society, 8, 25–26. https://doi.org/10.3758/BF03337062
Conde, M. V., & Turgutlu, K. (2021). CLIP-art: Contrastive pre-training for fine-grained art classification. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN (pp. 3951-395).
Crestwoodpainting. (n.d.). Nail pops: What you should know. https://crestwoodpainting.com/nail-pops/
Cui, Z., Wang, Q., Guo, J., & Lu, N. (2022). Few-shot classification of façade defects based on extensible classifier and contrastive learning. Automation in Construction, 141, 104381. https://doi.org/10.1016/j.autcon.2022.104381
D'Addario, J. (2020). New survey finds British businesses are reluctant to proactively share data. https://theodi.org/article/new-survey-finds-just-27-of-british-businesses-are-sharing-data/
Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., & Fei-Fei, L. (2009). ImageNet: A large-scale hierarchical image database. 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL (pp. 248–255). https://doi.org/10.1109/CVPR.2009.5206848
Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2018). Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
Dong, A., Maher, M. L., Kim, M. J., Gu, N., & Wang, X. (2009). Construction defect management using a telematic digital workbench. Automation in Construction, 18, 814–824. https://doi.org/10.1016/j.autcon.2009.03.005
Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., & Houlsby, N. (2021). An image is worth 16×16 words: Transformers for image recognition at scale. arXiv:2010.11929 [cs]. https://doi.org/10.48550/arXiv.2010.11929
Fan, C., & Mostafavi, A. (2019). A graph-based method for social sensing of infrastructure disruptions in disasters. Computer-Aided Civil and Infrastructure Engineering, 34, 1055–1070. https://doi.org/10.1111/mice.12457
Fu, J., Huang, C., Xing, J., & Zheng, J. (2012). Pattern classification using an olfactory model with PCA feature selection in electronic noses: Study and application. Sensors, 12, 2818–2830. https://doi.org/10.3390/s120302818
Gallo, I., Calefati, A., Nawaz, S., & Janjua, M. K. (2018). Image and encoded text fusion for multi-modal classification. 2018 Digital Image Computing: Techniques and Applications (DICTA), Canberra, Australia (pp. 1–7). https://doi.org/10.1109/DICTA.2018.8615789
Gao, T., Fisch, A., & Chen, D. (2021). Making pre-trained language models better few-shot learners. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 1, (pp. 3816–3830).
Gao, Y., Kong, B., & Mosalam, K. M. (2019). Deep leaf-bootstrapping generative adversarial network for structural image data augmentation. Computer-Aided Civil and Infrastructure Engineering, 34, 755–773. https://doi.org/10.1111/mice.12458
Gao, Y., & Mosalam, K. M. (2018). Deep transfer learning for image-based structural damage recognition. Computer-Aided Civil and Infrastructure Engineering, 33, 748–768. https://doi.org/10.1111/mice.12363
Gao, Y., & Mosalam, K. M. (2020). PEER hub ImageNet: A large-scale multiattribute benchmark data set of structural images. Journal of Structural Engineering, 146, 04020198. https://doi.org/10.1061/(ASCE)ST.1943-541X.0002745
Gao, Y., Zhai, P., & Mosalam, K. M. (2021). Balanced semisupervised generative adversarial network for damage assessment from low-data imbalanced-class regime. Computer-Aided Civil and Infrastructure Engineering, 36, 1094–1113. https://doi.org/10.1111/mice.12741
Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. MIT Press.
Gorse, C. A., Johnston, D., & Pritchard, M. (2012). A dictionary of construction, surveying, and civil engineering (1st ed.). Oxford University Press.
Goyal, Y., Khot, T., Summers-Stay, D., Batra, D., & Parikh, D. (2017). Making the v in vqa matter: Elevating the role of image understanding in visual question answering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 6904–6913).
Guo, J., Wang, Q., & Li, Y. (2021). Semi-supervised learning based on convolutional neural network and uncertainty filter for façade defects classification. Computer-Aided Civil and Infrastructure Engineering, 36, 302–317. https://doi.org/10.1111/mice.12632
Guo, J., Wang, Q., Li, Y., & Liu, P. (2020). Façade defects classification from imbalanced dataset using meta learning-based convolutional neural network. Computer-Aided Civil and Infrastructure Engineering, 35, 1403–1418. https://doi.org/10.1111/mice.12578
Harris, C. M. (2006). Dictionary of architecture and construction. McGraw-Hill Education.
He, K., Zhang, X., Ren, S., & Sun, J. (2015). Deep residual learning for image recognition. arXiv:1512.03385 [cs].
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 770–778).
He, T., Zhang, Z., Zhang, H., Zhang, Z., Xie, J., & Li, M. (2019). Bag of tricks for image classification with convolutional neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 558–567).
Hospedales, T., Antoniou, A., Micaelli, P., & Storkey, A. (2021). Meta-learning in neural networks: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(9), 5149–5169.
Hotelling, H. (1933). Analysis of a complex of statistical variables into principal components. Journal of Educational Psychology, 24, 417–441. https://doi.org/10.1037/h0071325
Hu, M., & Li, J. (2019). Exploring bias in GAN-based data augmentation for small samples. arXiv:1905.08495 [cs, stat].
Huang, Z., Zeng, Z., Liu, B., Fu, D., & Fu, J. (2020). Pixel-BERT: Aligning image pixels with text by deep multi-modal transformers. arXiv:2004.00849 [cs].
InspectApedia. (n.d.). Construction Dictionary Section 9 Finishes Terminology. https://inspectapedia.com/Design/Construction-Terms-9-Finishes.txt
Jiang, S., & Zhang, J. (2020). Real-time crack assessment using deep neural networks with wall-climbing unmanned aerial system. Computer-Aided Civil and Infrastructure Engineering, 35, 549–564. https://doi.org/10.1111/mice.12519
Jiang, Z., Xu, F. F., Araki, J., & Neubig, G. (2020). How can we know what language models know? Transactions of the Association for Computational Linguistics, 8, 423–438. https://doi.org/10.1162/tacl_a_00324
Kaur, P., Sikka, K., & Divakaran, A. (2017). Combining weakly and webly supervised learning for classifying food images. arXiv:1712.08730 [cs].
Khorramshahi, P., Rambhatla, S. S., & Chellappa, R. (2021). Towards accurate visual and natural language-based vehicle retrieval systems. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Nashville, TN (pp. 4178–4187). https://doi.org/10.1109/CVPRW53098.2021.00472
Kim, W., Son, B., & Kim, I. (2021). ViLT: Vision-and-language transformer without convolution or region supervision. Proceedings of the 38th International Conference on Machine Learning (pp. 5583–5594).
Kupi, M., Bodnar, M., Schmidt, N., & Posada, C. E. (2021). dictNN: A dictionary-enhanced CNN approach for classifying hate speech on Twitter. arXiv:2103.08780 [cs.CL] 1–8.
Lampert, C. H., Nickisch, H., & Harmeling, S. (2009). Learning to detect unseen object classes by between-class attribute transfer. 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL (pp. 951–958). https://doi.org/10.1109/CVPR.2009.5206594
Lan, M., Zhang, Y., Zhang, L., & Du, B. (2018). Defect detection from UAV images based on region-based CNNs. 2018 IEEE International Conference on Data Mining Workshops (ICDMW), Singapore, Singapore (pp. 385–390). https://doi.org/10.1109/ICDMW.2018.00063
Li, A., Jabri, A., Joulin, A., & Van Der Maaten, L. (2017). Learning visual n-grams from web data. In Proceedings of the IEEE International Conference on Computer Vision (pp. 4183–4192).
Li, D., Cong, A., & Guo, S. (2019). Sewer damage detection from imbalanced CCTV inspection data using deep convolutional neural networks with hierarchical classification. Automation in Construction, 101, 199–208. https://doi.org/10.1016/j.autcon.2019.01.017
Li, L. H., Yatskar, M., Yin, D., Hsieh, C.-J., & Chang, K.-W. (2019). VisualBERT: A simple and performant baseline for vision and language. arXiv preprint arXiv:1908.03557. 1–14.
Li, X., Yin, X., Li, C., Zhang, P., Hu, X., Zhang, L., Wang, L., Hu, H., Dong, L., Wei, F., Choi, Y., & Gao, J. (2020). OSCAR: Object-semantics aligned pre-training for vision-language tasks. In A. Vedaldi, H. Bischof, T. Brox, & J.-M. Frahm (Eds.), European conference on computer vision 2020, lecture notes in computer science (pp. 121–137). Springer International Publishing. https://doi.org/10.1007/978-3-030-58577-8_8
Li, Y., Che, P., Liu, C., Wu, D., & Du, Y. (2021). Cross-scene pavement distress detection by a novel transfer learning framework. Computer-Aided Civil and Infrastructure Engineering, 36, 1398–1415. https://doi.org/10.1111/mice.12674
Liang, X. (2019). Image-based post-disaster inspection of reinforced concrete bridge systems using deep learning with Bayesian optimization. Computer-Aided Civil and Infrastructure Engineering, 34, 415–430. https://doi.org/10.1111/mice.12425
Liu, P., Yuan, W., Fu, J., Jiang, Z., Hayashi, H., & Neubig, G. (2021). Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. arXiv:2107.13586 [cs].
Liu, V., & Chilton, L. B. (2022). Design guidelines for prompt engineering text-to-image generative models. In CHI Conference on Human Factors in Computing Systems (pp. 1–23).
Liu, X., Zheng, Y., Du, Z., Ding, M., Qian, Y., Yang, Z., & Tang, J. (2021). GPT understands, too. arXiv:2103.10385 [cs].
Lo, R. T. W., He, B., & Ounis, I. (2005). Automatically building a stopword list for an information retrieval system. In Journal on Digital Information Management: Special Issue on the 5th Dutch-Belgian Information Retrieval Workshop (DIR), 5 (pp. 17–24).
Maeda, H., Kashiyama, T., Sekimoto, Y., Seto, T., & Omata, H. (2021). Generative adversarial network for road damage detection. Computer-Aided Civil and Infrastructure Engineering, 36, 47–60. https://doi.org/10.1111/mice.12561
Maeda, H., Sekimoto, Y., Seto, T., Kashiyama, T., & Omata, H. (2018). Road damage detection and classification using deep neural networks with smartphone images. Computer-Aided Civil and Infrastructure Engineering, 33, 1127–1141. https://doi.org/10.1111/mice.12387
Meijer, D., Scholten, L., Clemens, F., & Knobbe, A. (2019). A defect classification methodology for sewer image sets with convolutional neural networks. Automation in Construction, 104, 281–298. https://doi.org/10.1016/j.autcon.2019.04.013
Merriam-Webster (2019). The Merriam-Webster dictionary (Newest ed.). Merriam-Webster Inc.
Midjourney (2022). Midjourney. https://github.com/midjourney/docs
Montague, D. (2017). Dictionary of building and civil engineering: English/French French/English (2nd ed.). Routledge. https://doi.org/10.4324/9780203851227
Nabian, M. A., & Meidani, H. (2018). Deep learning for accelerated seismic reliability analysis of transportation networks. Computer-Aided Civil and Infrastructure Engineering, 33, 443–458. https://doi.org/10.1111/mice.12359
Narasimhan, M., Rohrbach, A., & Darrell, T. (2021). CLIP-It! Language-guided video summarization. Advances in Neural Information Processing Systems, 34, 13988–14000.
Nichol, A., Achiam, J., & Schulman, J. (2018). On first-order meta-learning algorithms. arXiv:1803.02999 [cs].
Nine, A. (2022). People have begun to sell their prompts for AI-generated artwork. https://www.extremetech.com/internet/339304-people-have-begun-to-sell-their-prompts-for-ai-generated-artwork
Özgenel, Ç. F. (2019). Concrete crack images for classification. Mendeley Data, V2. https://doi.org/10.17632/5y9wdsg2zt.2
Pan, S. J., & Yang, Q. (2010). A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 22, 1345–1359. https://doi.org/10.1109/TKDE.2009.191
Park, C.-S., Lee, D.-Y., Kwon, O.-S., & Wang, X. (2013). A framework for proactive construction defect management using BIM, augmented reality and ontology-based data collection template. Automation in Construction: Augmented Reality in Architecture, Engineering, and Construction, 33, 61–71. https://doi.org/10.1016/j.autcon.2012.09.010
Paton-Cole, V. P., & Aibinu, A. A. (2021). Construction defects and disputes in low-rise residential buildings. Journal of Legal Affairs and Dispute Resolution in Engineering Construction, 13, 05020016. https://doi.org/10.1061/(ASCE)LA.1943-4170.0000433
Pearson Education (2014). Longman dictionary of contemporary English. Pearson Education.
Peng, W., Huang, C., Li, T., Chen, Y., & Liu, Q. (2020). Dictionary-based data augmentation for cross-domain neural machine translation. arXiv:2004.02577 [cs].
Perez, H., & Tah, J. H. M. (2021). Deep learning smartphone application for real-time detection of defects in buildings. Structural Control and Health Monitoring, 28, e2751. https://doi.org/10.1002/stc.2751
Perez, H., Tah, J. H. M., & Mosavi, A. (2019). Deep learning for detecting building defects using convolutional neural networks. Sensors, 19, 3556. https://doi.org/10.3390/s19163556
Posner, M., Nissen, M., & Klein, R. (1976). Visual dominance: An information-processing account of its origins and significance. Psychological Review, 83, 157–171. https://doi.org/10.1037/0033-295X.83.2.157
Pourpanah, F., Abdar, M., Luo, Y., Zhou, X., Wang, R., Lim, C. P., Wang, X.-Z., & Wu, Q. M. J. (2022). A review of generalized zero-shot learning methods. IEEE Transactions on Pattern Analysis and Machine Intelligence. Advance online publication. https://doi.org/10.1109/TPAMI.2022.3191696
Promptbase (2022). Promptbase. https://promptbase.com/
Qi, D., Su, L., Song, J., Cui, E., Bharti, T., & Sacheti, A. (2020). ImageBERT: Cross-modal pre-training with large-scale weak-supervised image-text data. arXiv:2001.07966 [cs].
Qiu, Q., Xie, Z., Wu, L., & Tao, L. (2020). Dictionary-based automated information extraction from geological documents using a deep learning algorithm. Earth and Space Science, 7, 1–18. https://doi.org/10.1029/2019EA000993
Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., Krueger, G., & Sutskever, I. (2021). Learning transferable visual models from natural language supervision. In International Conference on Machine Learning (pp. 8748–8763). PMLR.
Radford, A., Narasimhan, K., Salimans, T., & Sutskever, I. (2018). Improving language understanding by generative pre-training.
Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever, I. (2019). Language models are unsupervised multitask learners. OpenAI Blog, 1(8), 9.
Rafiei, M. H., & Adeli, H. (2017). A novel machine learning-based algorithm to detect damage in high-rise building structures. Structural Design of Tall and Special Buildings, 26(18), e1400. https://doi.org/10.1002/tal.1400
Ramesh, A., Pavlov, M., Goh, G., Gray, S., Voss, C., Radford, A., Chen, M., & Sutskever, I. (2021). Zero-shot text-to-image generation. International Conference on Machine Learning (pp. 8821–8831).
Rijsbergen, C. J. V. (1979). Information retrieval (2nd ed.). Butterworth-Heinemann.
Rotimi, F. E., Tookey, J., & Rotimi, J. O. (2015). Evaluating defect reporting in new residential buildings in New Zealand. Buildings, 5, 39–55. https://doi.org/10.3390/buildings5010039
Roy, K. C., Hasan, S., & Mozumder, P. (2020). A multilabel classification approach to identify hurricane-induced infrastructure disruptions using social media data. Computer-Aided Civil and Infrastructure Engineering, 35, 1387–1402. https://doi.org/10.1111/mice.12573
Saif, H., Fernandez, M., He, Y., & Alani, H. (2014). On stopwords, filtering and data sparsity for sentiment analysis of Twitter. LREC 2014, Ninth International Conference on Language Resources and Evaluation. Proceedings, Reykjavik, Iceland (pp. 810–817).
Schick, T., & Schütze, H. (2021). Exploiting cloze questions for few shot text classification and natural language inference. Proceedings of the 16th Conference of the European Chapter of the
Association for Computational Linguistics (pp. 255–269). https://doi.org/10.18653/v1/2021.eacl-main.20
Scott, J. S., & Maclean, J. H. (2000). Dictionary of building (4th UK ed.). Penguin UK.
Sedgwick, P. (2014). Spearman's rank correlation coefficient. BMJ: British Medical Journal, 349, g7327.
Shen, S., Li, L. H., Tan, H., Bansal, M., Rohrbach, A., Chang, K.-W., Yao, Z., & Keutzer, K. (2021). How much can CLIP benefit vision-and-language tasks? arXiv:2107.06383 [cs].
Shibata, T., Kato, N., & Kurohashi, S. (2007). Automatic object model acquisition and object recognition by integrating linguistic and visual information. Proceedings of the 15th International Conference on Multimedia—MULTIMEDIA '07, Augsburg, Germany. https://doi.org/10.1145/1291233.1291327
Shin, T., Razeghi, Y., Logan, R. L. IV, Wallace, E., & Singh, S. (2020). AutoPrompt: Eliciting knowledge from language models with automatically generated prompts. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Online (pp. 4222–4235). https://doi.org/10.18653/v1/2020.emnlp-main.346
Silva, C., & Ribeiro, B. (2003). The importance of stop word removal on recall values in text categorization. Proceedings of the International Joint Conference on Neural Networks, 2003(3), 1661–1666. https://doi.org/10.1109/IJCNN.2003.1223656
Simpson, J., & Weiner, E. (Eds.). (1989). The Oxford English dictionary (2nd ed.). Oxford University Press.
Sinnett, S., Spence, C., & Soto-Faraco, S. (2007). Visual dominance and attention: The Colavita effect revisited. Perception & Psychophysics, 69, 673–686. https://doi.org/10.3758/BF03193770
Sokolova, M., Japkowicz, N., & Szpakowicz, S. (2006). Beyond accuracy, F-score and ROC: A family of discriminant measures for performance evaluation. Australasian Joint Conference on Artificial Intelligence, Canberra, Australia (pp. 1015–1021). https://doi.org/10.1007/11941439_114
Srihari, R. K. (1994). Computational models for integrating linguistic and visual information: A survey. Artificial Intelligence Review, 8, 349–369.
Standards Australia. (n.d.). National dictionary of building & plumbing terms. https://www.constructiondictionary.com.au/
Tan, H., & Bansal, M. (2019). LXMERT: Learning cross-modality encoder representations from transformers. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) (pp. 5100–5111). https://doi.org/10.18653/v1/D19-1514
Tolson, S. (2012). Dictionary of construction terms. Informa Law from Routledge. https://doi.org/10.4324/9781315850320
van Engelen, J. E., & Hoos, H. H. (2020). A survey on semi-supervised learning. Machine Learning, 109, 373–440. https://doi.org/10.1007/s10994-019-05855-6
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., & Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems, Long Beach, CA (pp. 5998–6008).
Wali, K. I., & Ali, N. S. (2019). Diagnosis and evaluation of defects encountered in newly constructed houses in Erbil City, Kurdistan, Iraq. Engineering and Technology Journal, 37, 70–77. https://doi.org/10.30684/etj.37.2A.5
Wang, M., Xing, J., & Liu, Y. (2021). ActionCLIP: A new paradigm for video action recognition. arXiv:2109.08472 [cs].
Wang, W., Bao, H., Dong, L., & Wei, F. (2021). VLMo: Unified vision-language pre-training with mixture-of-modality-experts. arXiv:2111.02358 [cs].
Wilbur, W. J., & Sirotkin, K. (1992). The automatic identification of stop words. Journal of Information Science, 18(1), 45–55.
Wu, T., Terry, M., & Cai, C. J. (2022). AI chains: Transparent and controllable human-AI interaction by chaining large language model prompts. In CHI Conference on Human Factors in Computing Systems (pp. 1–22).
Xiao, J., Hays, J., Ehinger, K. A., Oliva, A., & Torralba, A. (2010). SUN database: Large-scale scene recognition from abbey to zoo. 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), San Francisco, CA (pp. 3485–3492). https://doi.org/10.1109/CVPR.2010.5539970
Zalejska, J. A., & Hungria, G. R. (2019). Defects in newly constructed residential buildings: Owners' perspective. International Journal of Building Pathology and Adaptation, 37, 163–185. https://doi.org/10.1108/IJBPA-09-2018-0077
Zhang, R. (2019). Making convolutional networks shift-invariant again. In International Conference on Machine Learning (pp. 7324–7334). PMLR.
Zhang, Y., Jiang, H., Miura, Y., Manning, C. D., & Langlotz, C. P. (2020). Contrastive learning of medical visual representations from paired images and text. arXiv:2010.00747 [cs].
Zhang, Y., Macdonald, J. H. G., Liu, S., & Harper, P. W. (2021). Damage detection of nonlinear structures using probability density ratio estimation. Computer-Aided Civil and Infrastructure Engineering, 37(7), 878–893. https://doi.org/10.1111/mice.12772
Zhao, J. J., Mathieu, M., & LeCun, Y. (2017). Energy-based generative adversarial networks. 5th International Conference on Learning Representations, ICLR 2017, Toulon, France.
Zhao, T. Z., Wallace, E., Feng, S., Klein, D., & Singh, S. (2021). Calibrate before use: Improving few-shot performance of language models. In International Conference on Machine Learning (pp. 12697–12706). PMLR.
Zhou, K., Yang, J., Loy, C. C., & Liu, Z. (2022). Learning to prompt for vision-language models. International Journal of Computer Vision, 130(9), 2337–2348.
Zhu, J., Zhang, C., Qi, H., & Lu, Z. (2020). Vision-based defects detection for bridges using transfer learning and convolutional neural networks. Structure and Infrastructure Engineering, 16, 1037–1049. https://doi.org/10.1080/15732479.2019.1680709
Zhu, X., & Goldberg, A. B. (2009). Introduction to semi-supervised learning. Synthesis Lectures on Artificial Intelligence and Machine Learning, 3, 1–130. https://doi.org/10.2200/S00196ED1V01Y200906AIM006

How to cite this article: Yong, G., Jeon, K., Gil, D., & Lee, G. (2023). Prompt engineering for zero-shot and few-shot defect detection and classification using a visual-language pretrained model. Computer-Aided Civil and Infrastructure Engineering, 38, 1536–1554. https://doi.org/10.1111/mice.12954