IEEE TRANSACTIONS ON SEMICONDUCTOR MANUFACTURING, VOL. 31, NO. 2, MAY 2018 309

Wafer Map Defect Pattern Classification and Image Retrieval Using Convolutional Neural Network
Takeshi Nakazawa and Deepak V. Kulkarni

Abstract—Wafer maps provide important information for engineers in identifying root causes of die failures during semiconductor manufacturing processes. We present a method for wafer map defect pattern classification and image retrieval using convolutional neural networks (CNNs). Twenty-eight thousand six hundred synthetic wafer maps for 22 defect classes are generated theoretically and used for CNN training, validation, and testing. The overall classification accuracy for the 6600-map test dataset is 98.2%. One thousand one hundred ninety-one real wafer maps are used to evaluate the performance of the same model trained on synthetic wafer maps. We demonstrate that real wafer maps can be classified with high accuracy even when only synthetic data are used for network training. For image retrieval, a binary code for each wafer map is generated from the output of a fully connected layer with sigmoid activation. The retrieval error rate is 0.36% for the test dataset and 3.7% for the real wafers. Image retrieval takes 0.13 s per wafer map from an 18 000 wafer map library.

Index Terms—Deep learning, convolutional neural network, information retrieval, semiconductor defects.

Manuscript received November 2, 2017; revised December 31, 2017; accepted January 15, 2018. Date of publication January 18, 2018; date of current version May 8, 2018. (Corresponding author: Takeshi Nakazawa.)
The authors are with Intel Corporation, Chandler, AZ 85226 USA (e-mail: [email protected]).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TSM.2018.2795466
0894-6507 © 2018 IEEE. Translations and content mining are permitted for academic research only. Personal use is also permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

I. INTRODUCTION

In semiconductor manufacturing, wafer maps are used to visualize defect patterns and identify potential process issues. Inline metrology tools perform inspection after a certain process step and monitor abnormalities on dies. A wafer map is then created from the detected abnormal locations. One of the main purposes of wafer map visualization is to monitor abnormal defect signatures and respond to process problems quickly. Once wafer map libraries are built with their corresponding root causes, defect pattern similarity between wafers can be a good indication of a common root cause, and this knowledge base can be used to solve problems. An effective knowledge base requires two components: 1) wafer map defect pattern classification and 2) wafer map image retrieval from historical wafer map libraries. Defect pattern classification provides the defect occurrence rate for each defect class, so engineers can use these data to focus on the most important issues. Wafer map image retrieval helps identify a root cause by querying historical wafer maps whose root causes are already known.

There are a number of studies on wafer map pattern recognition [1]–[4]. Their classification approaches can be divided into two main groups: 1) model-based pattern recognition and 2) feature-extraction-based pattern recognition. Model-based pattern recognition uses a predefined probability distribution function for each defect pattern and selects the best-matching model using an information criterion such as the Akaike information criterion (AIC) or the Bayesian information criterion (BIC). Feature-extraction-based pattern recognition extracts pattern features using techniques such as the correlogram and the Radon transform. Once the pattern features are extracted, common pattern classification algorithms such as support vector machines, neural networks, and nearest neighbors are applied to the classification task.

Deep convolutional neural networks (CNNs) [5] have recently advanced state-of-the-art image classification performance and have become the standard approach for image classification tasks. A CNN is an end-to-end model and does not require task-specific feature engineering. This end-to-end approach is beneficial because we do not need to develop task-specific feature extractors, and domain-specific expert knowledge is not required. A related problem is image retrieval [6], [7]. Image retrieval is the task of finding images that contain similar objects or scenes, given a query image, and it has been used in security and surveillance, medical imaging, and many other areas. Traditionally, image retrieval requires feature extraction based on object color and shape. Since a deep CNN can learn rich features at each layer, these intermediate features serve as good descriptors for image retrieval [8], [9].

In this paper, we employ CNNs for the defect pattern classification and wafer map retrieval tasks. As datasets, we use wafer maps from simulation and from real wafers. For CNN training and validation, we use only simulated wafer maps, because the real data available for each class from the manufacturing process are highly imbalanced. In this case, it is beneficial to train the CNN on theoretically generated data, so that rare defect patterns can also be included in the model while still achieving reasonable classification accuracy. To verify the performance of the proposed method, we generated a dataset of 28,600 wafer maps by simulation. Data from 1,191 real wafers are also used to evaluate the performance of the trained CNN.

Our paper is organized as follows. Section II describes the methods for wafer map pattern generation, the CNN configuration, and CNN-based image retrieval. In Section III, we present the results of defective wafer map pattern generation and the CNN training/validation/test results using theoretically
generated wafer maps. The performance of the trained CNN is also validated using data from real wafers. The image retrieval results are then shown by comparing a query image with the top three retrieved wafer maps from the 18,000 wafer map library. The conclusion is given in Section IV.

II. METHOD

A. Wafer Map Pattern Generation

Defect patterns can be categorized into three types: 1) random patterns, 2) non-random patterns, and 3) superpositions of random and non-random patterns. A typical wafer map shows either a purely random defect pattern or a mixed random and non-random pattern. Fig. 1 illustrates these two examples.

Fig. 1. Defect density wafer map with random defects (left) and with random and non-random defects (right).

From the yield analysis and process improvement perspective, wafers showing non-random patterns are more important because they clearly indicate process-related issues. In general, random defects in controlled numbers are acceptable, and they provide less information from the process improvement perspective.

There are different ways to visualize a wafer map, depending on the purpose of the analysis. For example, if engineers are interested in good versus bad die locations, a binary (pass/fail) wafer map is used. When engineers need to know the frequency of defects at each die, a defect frequency density map is the right choice. The benefit of the density map is that it provides additional information about spatial defect occurrence rates, which can help engineers understand a problem more deeply, whereas the binary wafer map only provides pass/reject information per unit. In our study, we use the density wafer map.

The wafer map defect pattern is modeled using a Poisson point process. The Poisson distribution is given by

    P(k, \lambda) = \frac{\lambda^{k} e^{-\lambda}}{k!},    (1)

where λ, often called the rate parameter, defines the average number of events in an interval, and k is the number of events. Following the algorithm described in [10], we generate random points in polar coordinates. In addition, non-random points are superimposed by controlling the interval of the uniform distribution used in the algorithm. Once all points are generated for a single wafer map, a density map is created by summing the number of points within each die boundary and normalizing by the maximum count among all dies.

B. Convolutional Neural Network Configuration

Table I shows our CNN configuration. The input wafer map image size is 286 × 400. There are three convolutional layers with a 3 × 3 receptive field and stride 1. The first and second convolutional layers have 32 channels, and the third convolutional layer has 64 channels. Rectified linear activation is used for each convolutional layer. The max pooling size is 2 × 2. A fully connected (FC) layer of size 256 with sigmoid activation is added after the convolutional layers. After dropout, another fully connected layer, whose size equals the number of defect classes, is added. The last layer is a softmax layer for the class probability calculation.

TABLE I
CNN CONFIGURATION

C. Wafer Map Image Retrieval Using Convolutional Neural Network

Since images are high-dimensional data, dimensionality reduction is essential for rapid search in a large database. To achieve this goal, we follow an approach similar to the one described in [9]. In our case, no latent layer is required because of the node size of our FC layer: in the CNN configuration used in [9], layer 7 has 4096 nodes, which is still too large for efficient database search, whereas our FC layer has only 256 nodes. We therefore simply use the features extracted at this layer after the sigmoid activation. To obtain a binary code for each wafer map, we apply a threshold of 0.5 to the output of the sigmoid activation, i.e., a value greater than or equal to 0.5 becomes 1, and 0 otherwise. Once the binary code library is built for all wafer maps, the Hamming distance is used to retrieve similar wafer maps for a given query wafer map.

III. RESULT

A. Wafer Map Pattern Generation

We defined 22 defect classes for our classification task. Table II lists the defect patterns. The simulated wafer maps contain 1) purely random defects, 2) random defects and a typical non-random defect, and 3) random defects and multiple different non-random defect types. To evaluate the classification performance for wafer maps showing multiple defect classes, we added the class with a line scratch defect and a non-random cluster defect at each quadrant. Several simulated defect classes share a similar defect pattern but differ in its location on the wafer map. For example, the non-random cluster defects at each quadrant are treated as different classes. The reason is that the defect location sometimes provides locational commonality information for a specific process tool and helps identify the specific issue.

Fig. 2 illustrates example wafer maps for each defect pattern. We use the defect density map for this study.

Fig. 2. Example of the generated wafer map for each class.

TABLE II
LIST OF WAFER MAP DEFECT CLASSES

Fig. 3. Accuracy confusion matrix in percentage for the simulated test wafer maps.

TABLE III
BATCH SIZE AND MEMORY USAGE
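Section II-A describes the generator only in outline. Points are drawn from a Poisson process in polar coordinates following [10], non-random patterns are added by restricting the intervals of the uniform draws, and per-die counts are normalized by the maximum. The following is a minimal NumPy sketch under our own assumptions, not the authors' code. The function names, the per-unit-area rate values, the 26 × 26 rectangular die grid and the edge-ring parameters are all hypothetical.

```python
import numpy as np


def poisson_points_polar(rate, radius, rng, r_frac=(0.0, 1.0), theta=(0.0, 2 * np.pi)):
    """Homogeneous Poisson points inside an annular sector of a circular wafer.

    rate   -- expected points per unit area (hypothetical parameterization)
    r_frac -- radial interval, as fractions of the wafer radius
    theta  -- angular interval in radians
    Narrowing r_frac or theta confines the points to one region; this is
    our assumed way of superimposing a non-random pattern.
    """
    r_lo, r_hi = r_frac
    t_lo, t_hi = theta
    area = 0.5 * (t_hi - t_lo) * radius**2 * (r_hi**2 - r_lo**2)
    n = rng.poisson(rate * area)  # k ~ Poisson(lambda), lambda = rate * area, Eq. (1)
    # Drawing r^2 uniformly keeps the point density uniform per unit area.
    r = radius * np.sqrt(rng.uniform(r_lo**2, r_hi**2, size=n))
    phi = rng.uniform(t_lo, t_hi, size=n)
    return r * np.cos(phi), r * np.sin(phi)


def density_map(x, y, radius, n_rows, n_cols):
    """Count points per die on a rectangular grid, then normalize by the max."""
    counts, _, _ = np.histogram2d(
        y, x, bins=[n_rows, n_cols], range=[[-radius, radius], [-radius, radius]]
    )
    peak = counts.max()
    return counts / peak if peak > 0 else counts


rng = np.random.default_rng(0)
# Random background defects over the whole wafer.
xb, yb = poisson_points_polar(rate=0.05, radius=100.0, rng=rng)
# Hypothetical non-random edge ring: same process limited to the outer 10% of the radius.
xr, yr = poisson_points_polar(rate=2.0, radius=100.0, rng=rng, r_frac=(0.9, 1.0))
wafer = density_map(np.concatenate([xb, xr]), np.concatenate([yb, yr]), 100.0, 26, 26)
```

Drawing r² rather than r uniformly is what keeps the background spatially homogeneous. Drawing r uniformly would crowd the points toward the wafer center. Grid cells that fall outside the circular wafer simply stay at zero.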
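The retrieval step of Section II-C has four parts. The 256 sigmoid outputs of the FC layer are thresholded at 0.5, codes are compared by Hamming distance, and the top-ranked library maps are returned. The top-1 retrieved class is then compared with the true class, as in the error rate of Table IV. A minimal NumPy sketch follows. The random "activations" stand in for real FC-layer outputs, and the function names and the 1,000-map library are hypothetical.

```python
import numpy as np


def binary_codes(sigmoid_out, threshold=0.5):
    """Threshold FC-layer sigmoid outputs (n_maps x 256) into 0/1 codes."""
    return (np.asarray(sigmoid_out) >= threshold).astype(np.uint8)


def hamming_retrieve(query_code, library_codes, top_k=3):
    """Indices and Hamming distances of the top_k closest library codes."""
    dists = np.count_nonzero(library_codes != query_code, axis=1)
    order = np.argsort(dists, kind="stable")[:top_k]
    return order, dists[order]


def top1_error_rate(query_codes, query_labels, library_codes, library_labels):
    """Fraction of queries whose nearest library map has a different class."""
    errors = 0
    for code, label in zip(query_codes, query_labels):
        idx, _ = hamming_retrieve(code, library_codes, top_k=1)
        errors += int(library_labels[idx[0]] != label)
    return errors / len(query_labels)


# Stand-in activations; in practice they come from the trained CNN's FC layer.
rng = np.random.default_rng(1)
library_codes = binary_codes(rng.uniform(size=(1000, 256)))
library_labels = rng.integers(0, 22, size=1000)
top3, d = hamming_retrieve(library_codes[42], library_codes)
```

This brute-force scan is the simplest form of the search. Packing the codes with `np.packbits` and XOR-ing bytes is a common way to speed it up, but the paper does not say which approach its implementation used.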

B. Wafer Map Classification Accuracy

We train our CNN as follows. First, 1,300 wafer maps are generated for each class using the method described in the previous section. These images are then split randomly into 1) a training set of 700, 2) a validation set of 300, and 3) a test set of 300 wafer maps per class. The 700 wafer maps per class are used to train the CNN, and the 300 validation wafer maps are used for validation. Once the desired training/validation accuracy is achieved using the 15,400 training and 6,600 validation images, the 6,600 test images are used to evaluate CNN performance. After 10 epochs, the training accuracy is 99.8% and the validation accuracy is 97.8% for the simulated wafer maps. Fig. 3 is the confusion matrix for the test dataset. Most per-class accuracies exceed 95%. The exceptions are the line scratch defects (C5) at 89.0% and the curved scratch defects (C6) at 87.7%, because line scratches are misclassified as curved scratches and vice versa. The overall accuracy is 98.2%.

Table III shows the relationship between the batch size and the average memory usage (in percent) during the network training phase.

Fig. 4. The training and validation accuracy.

Fig. 5. The misclassified wafer map (left) and the top 5 class probabilities (right).
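The per-class 700/300/300 split and the percentage confusion matrices (Figs. 3 and 6) can be sketched as follows. The paper does not describe how its split or confusion matrices were computed, so the helper names and the seed are hypothetical. Rows are true classes. A class with no samples keeps an all-zero row, which corresponds to the cells marked with crosses for the real wafers.

```python
import numpy as np


def split_per_class(labels, n_train=700, n_val=300, n_test=300, seed=0):
    """Randomly split sample indices class by class."""
    rng = np.random.default_rng(seed)
    train, val, test = [], [], []
    for c in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == c))
        train.extend(idx[:n_train])
        val.extend(idx[n_train:n_train + n_val])
        test.extend(idx[n_train + n_val:n_train + n_val + n_test])
    return np.array(train), np.array(val), np.array(test)


def confusion_percent(y_true, y_pred, n_classes):
    """Row-normalized confusion matrix in percent, plus overall accuracy."""
    cm = np.zeros((n_classes, n_classes))
    np.add.at(cm, (y_true, y_pred), 1)  # count (true, predicted) pairs
    totals = cm.sum(axis=1, keepdims=True)
    # Classes absent from y_true keep an all-zero row instead of NaN.
    pct = np.divide(100.0 * cm, totals, out=np.zeros_like(cm), where=totals > 0)
    overall = 100.0 * np.trace(cm) / cm.sum()
    return pct, overall


labels = np.repeat(np.arange(22), 1300)  # 22 classes x 1,300 simulated maps
train, val, test = split_per_class(labels)
print(len(train), len(val), len(test))  # 15400 6600 6600
```

Overall accuracy is computed from the raw counts, not by averaging the per-class rows. With an imbalanced set such as the real wafers, the two can differ substantially.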
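The body of Table I did not survive extraction. The text still fixes a 286 × 400 input, three 3 × 3 stride-1 convolutions with 32, 32 and 64 channels, 2 × 2 max pooling, an FC layer of 256, and an output layer of 22 classes. The sketch below traces feature-map shapes and parameter counts under three explicit assumptions that the text does not confirm: no padding, one 2 × 2 pool after every convolution, and a single-channel input. These feature maps are the activations whose memory grows with the batch size in Table III.

```python
def conv_out(size, kernel=3, stride=1, padding=0):
    """Output length of a convolution along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, pool=2):
    """Output length of non-overlapping max pooling (remainder dropped)."""
    return size // pool


def trace_shapes(height=286, width=400, in_ch=1, channels=(32, 32, 64),
                 fc=256, n_classes=22):
    """Shapes and parameter counts, ASSUMING unpadded 3x3 convs, a 2x2 pool
    after each conv, and a 1-channel input (Table I details unavailable)."""
    shapes, params = [], []
    h, w, c_prev = height, width, in_ch
    for c in channels:
        params.append(3 * 3 * c_prev * c + c)  # conv weights + biases
        h, w = pool_out(conv_out(h)), pool_out(conv_out(w))
        shapes.append((c, h, w))
        c_prev = c
    flat = c_prev * h * w
    params.append(flat * fc + fc)              # FC-256 (sigmoid)
    params.append(fc * n_classes + n_classes)  # FC-22 (softmax)
    return shapes, flat, params


shapes, flat, params = trace_shapes()
print(shapes)  # [(32, 142, 199), (32, 70, 98), (64, 34, 48)]
print(flat)    # 104448
print(params)  # [320, 9248, 18496, 26738944, 5654]
```

Under these assumptions, the first FC layer holds over 99% of the parameters. With padding or a different pooling layout the numbers change, so Table I remains the authority.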
The training and validation accuracy for each epoch is shown in Fig. 4. The average processing time per epoch is 110.6 seconds.

Fig. 5 illustrates a misclassified wafer map example with the top 5 class probabilities. The true class is the curved scratch defect (C6), but the map was misclassified as the line scratch defect (C5). The class probability is 55.9% for C5 and 42.8% for C6.

In addition to the simulated wafer maps, we test CNN inference using the 1,191 real wafers. Fig. 6 is the confusion matrix and shows the per-class classification accuracy in percent. This real dataset contains only 9 of the 22 classes, and the crosses along the diagonal of the table indicate that no real wafer map data exist for that particular class. The dataset is imbalanced, and the dominant class is the random defects (C1). For confidentiality reasons, we can only provide per-class accuracy, not the absolute number of wafers. The curved scratch defect shows 66.7% classification accuracy based on a data size of 3 wafers, and it is a rare defect class within this dataset.

Fig. 7 illustrates misclassified real wafer map examples with the top 5 class probabilities. The top figure is a misclassification example of the curved scratch defect (C6), for which the prediction is the random defect (C1). The class probabilities for C1 and C6 are 98.9% and 0.006%, respectively. In the bottom example, the true class is the non-random cluster defect at left (C15), but it was misclassified as the gross defect at the left half of the wafer (C17). The class probability is 44.6% for C17 and 33.9% for C15.

Fig. 6. Accuracy confusion matrix in percentage for the real wafer maps.

Fig. 7. The misclassified wafer map from the real wafer (left) and the top 5 class probabilities (right).

C. Wafer Map Image Retrieval

Fig. 8 shows the wafer map image retrieval results for selected defect classes. The first image in each row is the query image, and the remaining images are the top 3 retrieved wafer map images. As these examples show, the algorithm successfully retrieves similar wafer maps from the library.

For image retrieval performance evaluation, we compute an error rate by comparing the class of the top-1 retrieved image with the true class. Table IV summarizes the result. Image retrieval takes 0.13 seconds per image from the 18,000 wafer map library with a basic Python implementation.

Fig. 8. Query wafer map (1st column) and the corresponding top 3 retrieved wafer map images for the selected defect patterns. (a) Query wafer map is from simulation. (b) Query wafer map is from the real wafer.

TABLE IV
IMAGE RETRIEVAL ERROR RATE

IV. CONCLUSION

In this paper, we present a method for wafer map pattern classification and wafer map image retrieval using CNNs. In semiconductor manufacturing, rare event detection is critical to maintaining high yield. We demonstrate the benefit of using theoretically generated wafer maps for CNN training to enable classification tasks for the imbalanced dataset from
the real wafers. Without a sufficiently large dataset, a CNN cannot be trained well, and when defect patterns occur rarely it can be difficult to collect enough data. Our model enables rare event detection without real data, which is particularly beneficial during the technology development phase.

We also demonstrate the efficiency and performance of CNN-based image retrieval using the binary code generated by the FC layer of our CNN model. Once the root causes and solutions of a particular defect mode are associated with its wafer map pattern(s), wafer map image retrieval can be used to trigger actions for problematic processes.

REFERENCES
[1] J. Y. Hwang and W. Kuo, "Model-based clustering for integrated circuit yield enhancement," Eur. J. Oper. Res., vol. 178, no. 1, pp. 143–153, Apr. 2007.
[2] Y.-S. Jeong, S.-J. Kim, and M. K. Jeong, "Automatic identification of defect patterns in semiconductor wafer maps using spatial correlogram and dynamic time warping," IEEE Trans. Semicond. Manuf., vol. 21, no. 4, pp. 625–637, Nov. 2008.
[3] T. Yuan, W. Kuo, and S. J. Bae, "Detection of spatial defect patterns generated in semiconductor fabrication processes," IEEE Trans. Semicond. Manuf., vol. 24, no. 3, pp. 392–403, Aug. 2011.
[4] M.-J. Wu, J.-S. R. Jang, and J.-L. Chen, "Wafer map failure pattern recognition and similarity ranking for large-scale data sets," IEEE Trans. Semicond. Manuf., vol. 28, no. 1, pp. 1–12, Feb. 2015.
[5] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Proc. Adv. Neural Inf. Process. Syst., 2012, pp. 1097–1105.
[6] Y. Rui, T. S. Huang, and S.-F. Chang, "Image retrieval: Current techniques, promising directions, and open issues," J. Vis. Commun. Image Represent., vol. 10, no. 1, pp. 39–62, Mar. 1999.
[7] A. W. M. Smeulders, M. Worring, S. Santini, A. Gupta, and R. Jain, "Content-based image retrieval at the end of the early years," IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, no. 12, pp. 1349–1380, Dec. 2000.
[8] A. Babenko, A. Slesarev, A. Chigorin, and V. Lempitsky, "Neural codes for image retrieval," in Proc. Eur. Conf. Comput. Vis. (ECCV), Zürich, Switzerland, 2014, pp. 584–599.
[9] K. Lin, H.-F. Yang, J.-H. Hsiao, and C.-S. Chen, "Deep learning of binary hash codes for fast image retrieval," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. Workshops (CVPRW), Boston, MA, USA, 2015, pp. 27–35.
[10] R. Pasupathy, "Generating homogeneous Poisson processes," in Wiley Encyclopedia of Operations Research and Management Science. Hoboken, NJ, USA: Wiley, Jan. 2011.

Takeshi Nakazawa received the Ph.D. degree in optical sciences from the College of Optical Sciences, University of Arizona, in 2011. He is currently working with Intel Corporation, Chandler, AZ, USA, as a Yield Engineer/Data Scientist, developing image and data analysis systems and yield prediction models using machine learning. He was the recipient of several Intel divisional and department awards, the Best Paper Award for Intel Technology Journal, and several distinguished invention awards.

Deepak V. Kulkarni received the Ph.D. degree in mechanical engineering from the University of Illinois at Urbana-Champaign in 2005. He currently serves as an Engineering Technology Development Manager with the Assembly and Test Technology Development Group, Intel Corporation, Chandler, AZ, USA. His interests are in applying big data analysis techniques to improve manufacturing yield.