Combining Neural Networks For Skin Detection
Chelsia Amy Doukim1, Jamal Ahmad Dargham1, Ali Chekima1 and Sigeru Omatu2
1 School of Engineering and Information Technology, Universiti Malaysia Sabah, Malaysia
[email protected],[email protected],[email protected]
2 Computer and Systems Sciences, Graduate School of Engineering, Osaka Prefecture University, Sakai, Osaka 599-8531, Japan
[email protected]
ABSTRACT
Two types of combining strategies were evaluated, namely combining skin features and combining skin
classifiers. Several combining rules were applied in which the outputs of the skin classifiers are combined
using binary operators such as the AND and the OR operators, “Voting”, “Sum of Weights” and a new
neural network. Three chrominance components from the YCbCr colour space that gave the highest correct
detection on their single feature MLP were selected as the combining parameters. A major issue in
designing an MLP neural network is determining the optimal number of hidden units given a set of training
patterns. Therefore, a “coarse to fine search” method to find the number of neurons in the hidden layer is
proposed. The strategy of combining Cb/Cr and Cr features improved the correct detection by 3.01%
compared to the best single feature MLP given by Cb-Cr. The strategy of combining the outputs of three
skin classifiers using the “Sum of Weights” rule further improved the correct detection by 4.38% compared
to the best single feature MLP.
KEYWORDS
Skin Detection, Multi-Layer Perceptron, Feature Extraction
1. INTRODUCTION
Skin detection is an important preliminary process for subsequent feature extraction in a wide
range of image processing techniques such as face detection, face tracking, gesture analysis,
content-based image retrieval systems and various other computer vision applications. Studies have
shown that by combining more than one feature or classifier, the performance of the skin
detection system is improved. Zhu et al. [1] combined two Gaussian feature spaces where the first
one is related to the colour distribution and the second one is related to the skin spatial and shape
distribution. The combined feature method performed better than a single feature and than other
generic skin models, namely the histogram model, the single Gaussian model and the Gaussian Mixture
model. Brand and Mason [2] evaluated the performance of the combined colour components or
features from the RGB colour space and concluded that the combination of (R/G + R/B + G/B)
gave better performance than the single colour component. Jiang et al. [3] proposed a Skin
Probability Map (SPM) based skin detection system that integrated the colour, texture and space
information and claimed that their proposed method performed better than the generic SPM
method. Gasparini et al. [4] combined different skin classifiers based on different colour features
using several combination rules such as the sum rule, the product rule, the majority rule and the
authors’ proposed skin corrected by non-skin (SCNS) rule. The performance was evaluated in
terms of recall and precision. All combining rules increased precision compared to a single
classifier, and the product rule was the most precision-oriented. Sajedi et al. [5] combined a
block-based skin detection classifier with a boosted pixel-based classifier. The boosted
pixel-based classifier is modified by combining several explicit boundary skin classifiers based on
different colour features. The authors claimed that their method is more robust to variations of
skin colour than the Self-organizing Map (SOM), Fuzzy Integral, conventional pixel-based and
Bayesian network approaches.
DOI : 10.5121/sipij.2010.1201
Signal & Image Processing : An International Journal(SIPIJ) Vol.1, No.2, December 2010
Neural networks have been used successfully as skin classifiers. However, no research has been
done on combining neural network-based skin classifiers. In this paper, the multi-layer
perceptron (MLP) neural network is used for skin detection. Several chrominance components
from the YCbCr colour space are used as the skin colour features. Several strategies for combining
MLP neural networks for skin detection are proposed and their performance on skin detection is
evaluated. The paper is organised as follows. Section 2 briefly describes the data preparation. In
Section 3, the neural network properties used in this work are explained. Section 4 explains the
method for finding the number of neurons in the hidden layer. Section 5 describes the
performance metrics used to evaluate the skin detection performance and the performance for
single feature MLP is given. Section 6 explains the combining skin features strategy. In Section 7,
the strategy of combining skin classifiers using several combining rules is described. Finally,
Section 8 concludes the paper.
2. DATA PREPARATION
The database used in this work is the Compaq database [6]. This database consists of 13,640
images with their corresponding masked images. These images contain skin pixels belonging to
persons of different origins, with unconstrained illumination and background conditions, which
make the skin detection task more challenging. Figure 1 shows an example of
images and their corresponding masked images. Two sets of data are prepared, namely the training
data and test data. The training data comprises training and validation samples that will be used to
train the MLP neural networks. The training sample consists of 420,000 image pixels and will be
validated by a similar number of image pixels randomly selected from the Compaq database.
Note that each image pixel can be selected only once. The training data is divided into 30 data
files where each data file consists of 14,000 pixels from the training sample and an equal number
of pixels from the validation sample. Each data file will be used to train a network during a
training run. The test data consists of 100 images selected at random from the Compaq database.
The test images are used to evaluate the performance of the skin detection system.
Figure 1. Example of images from the Compaq database with their corresponding masked
images.
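The partitioning described in this section can be sketched as follows. This is only an illustration under stated assumptions: the function name `make_data_files`, the use of integer pixel identifiers and the seed are hypothetical, and the demo uses a scaled-down pool rather than the Compaq database itself.

```python
import random

def make_data_files(pixel_ids, n_files, per_file, seed=0):
    """Split unique pixel ids into n_files data files, each holding per_file
    training pixels and per_file validation pixels, with no pixel reused."""
    needed = 2 * n_files * per_file
    if len(pixel_ids) < needed:
        raise ValueError("not enough pixels to sample without replacement")
    rng = random.Random(seed)
    chosen = rng.sample(pixel_ids, needed)  # each pixel is selected only once
    train, val = chosen[: needed // 2], chosen[needed // 2:]
    files = []
    for k in range(n_files):
        lo, hi = k * per_file, (k + 1) * per_file
        files.append({"train": train[lo:hi], "val": val[lo:hi]})
    return files

# Scaled-down demo. The sizes stated in the text would correspond to
# n_files=30 and per_file=14_000 (30 x 14,000 = 420,000 training pixels).
files = make_data_files(list(range(10_000)), n_files=30, per_file=100)
```

Each entry of `files` would then feed one training run, matching the one data file per run arrangement described above.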
5. SKIN DETECTION
The fixed MLP neural networks are used to segment all the images in the test dataset into skin
and non-skin regions. To evaluate the performance of each neural network, three performance
metrics are used. The first metric is the correct detection rate (CDR), given in Equation 1.
The false acceptance rate (FAR) is the percentage of identification instances in which false
acceptance occurs, for example when an unauthorized person is identified as an authorized one. The
false rejection rate (FRR) is the percentage of identification instances in which false rejection
occurs, as when the system fails to recognize an authorized person and rejects that
person as an impostor. In skin detection, a false acceptance corresponds to a non-skin pixel
classified as skin, and a false rejection to a skin pixel classified as non-skin. The FAR and FRR
are expressed in Equations (2) and (3), respectively.
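Equations 1 to 3 are not reproduced in this copy of the paper. One formulation that appears consistent with the reported results, where CDR, FAR and FRR sum to approximately 100% in nearly every row of the results table, normalises all three by the total number of test pixels. The symbols below are hypothetical notation, and this is an assumed form rather than the authors' exact equations.

```latex
\begin{align}
\mathrm{CDR} &= \frac{N_{\text{skin}\to\text{skin}} + N_{\text{non-skin}\to\text{non-skin}}}{N_{\text{total}}} \times 100\% \\
\mathrm{FAR} &= \frac{N_{\text{non-skin}\to\text{skin}}}{N_{\text{total}}} \times 100\% \\
\mathrm{FRR} &= \frac{N_{\text{skin}\to\text{non-skin}}}{N_{\text{total}}} \times 100\%
\end{align}
```

Here $N_{a\to b}$ denotes the number of pixels of true class $a$ that the classifier labels as class $b$, and $N_{\text{total}}$ is the total number of pixels evaluated.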
Since the transfer function used is a sigmoid function, the MLP neural network produces
an output between 0 and 1. Thus, the output of the neural network needs to be modified so that it
is either 0 or 1. In this work, a single threshold value is used to separate the skin and non-skin
classes. If the MLP network output is higher than the threshold value, then the output is 1.
Otherwise, the output is 0. The threshold value used in this work is 0.5. As can be seen from
Table 2, Cb-Cr gives the highest correct detection rate.
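A minimal sketch of the thresholding step and the three metrics is given below. The function names and the toy pixel values are hypothetical, and the rates are assumed to be percentages of the total pixel count (an assumption chosen because the reported CDR, FAR and FRR add up to roughly 100%); the paper's own equations may differ in detail.

```python
def threshold(outputs, t=0.5):
    """Map sigmoid outputs in (0, 1) to 1 (skin) or 0 (non-skin)."""
    return [1 if y > t else 0 for y in outputs]

def rates(pred, truth):
    """Return (CDR, FAR, FRR) in percent, each over the total pixel count.
    Assumed definitions: FAR counts non-skin labelled as skin,
    FRR counts skin labelled as non-skin."""
    n = len(truth)
    correct = sum(p == g for p, g in zip(pred, truth))
    false_accept = sum(p == 1 and g == 0 for p, g in zip(pred, truth))
    false_reject = sum(p == 0 and g == 1 for p, g in zip(pred, truth))
    return tuple(100.0 * c / n for c in (correct, false_accept, false_reject))

# Toy example: five pixels with network outputs and ground-truth mask labels.
outputs = [0.91, 0.40, 0.73, 0.08, 0.50]
truth = [1, 1, 0, 0, 0]
pred = threshold(outputs)          # an output of exactly 0.5 is not "higher"
cdr, far, frr = rates(pred, truth)
```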
7.1. Combination of Outputs of Skin Classifiers Using the AND and OR Operators
The outputs of two and three skin classifiers are combined using the AND and OR operators. For
combining two skin classifiers, the possible combinations are Cb-Cr and Cb/Cr, Cb-Cr and Cr, and
Cb/Cr and Cr. Figure 4 illustrates the procedure of skin detection for combining the outputs of two
skin classifiers using the binary operators. The skin classifiers, for example the Cb-Cr and Cb/Cr,
are used to segment 100 test images into skin and non-skin regions. The outputs from the
respective skin classifiers are thresholded and then combined using the AND or the OR operator. A single
threshold value of 0.5 is used for the skin and non-skin classification.
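The binary-operator combination can be illustrated with the short sketch below. The function names and the sample network outputs are hypothetical; only the 0.5 threshold and the AND/OR operators come from the text.

```python
def threshold(outputs, t=0.5):
    """Binarise network outputs: 1 (skin) if above t, else 0 (non-skin)."""
    return [1 if y > t else 0 for y in outputs]

def combine_binary(mask_a, mask_b, op="AND"):
    """Combine two binary skin masks pixel by pixel."""
    if op == "AND":
        return [a & b for a, b in zip(mask_a, mask_b)]
    if op == "OR":
        return [a | b for a, b in zip(mask_a, mask_b)]
    raise ValueError("op must be 'AND' or 'OR'")

# Hypothetical outputs of two classifiers (e.g. Cb-Cr and Cb/Cr) for 4 pixels.
mask_a = threshold([0.90, 0.70, 0.20, 0.60])
mask_b = threshold([0.80, 0.30, 0.10, 0.55])
and_mask = combine_binary(mask_a, mask_b, "AND")
or_mask = combine_binary(mask_a, mask_b, "OR")
```

AND marks a pixel as skin only when both classifiers agree, so it tends to reduce false acceptances, while OR marks a pixel as skin when either classifier fires.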
[Figure 4 diagram: the 100 images from the test dataset are input to the Cb-Cr NN and the Cb/Cr NN; each output is thresholded at 0.5 and the two results are combined by the AND/OR operator to give the segmented images.]
Figure 4. Procedure for combining the outputs of two skin classifiers using the binary operators.
[Figure 5 diagram: the 100 images from the test dataset are input to the Cb-Cr NN, the Cb/Cr NN and the Cr NN; each output is thresholded at 0.5 and the three results are combined by Voting to give the segmented images.]
Figure 5. Procedure for combining the outputs of three skin classifiers using Voting rule.
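As a rough illustration of Figure 5, the sketch below assumes that "Voting" means a simple majority vote over the three thresholded outputs, which this copy of the paper does not spell out. The function name and sample masks are hypothetical.

```python
def majority_vote(*masks):
    """Label a pixel as skin (1) when more than half of the binary masks
    label it as skin. Assumes a simple majority rule."""
    k = len(masks)
    return [1 if 2 * sum(votes) > k else 0 for votes in zip(*masks)]

# Hypothetical thresholded outputs of three classifiers for five pixels.
mask_cbcr_diff = [1, 1, 0, 0, 1]
mask_cbcr_ratio = [1, 0, 0, 1, 1]
mask_cr = [0, 0, 0, 1, 1]
voted = majority_vote(mask_cbcr_diff, mask_cbcr_ratio, mask_cr)
```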
7.3. Combination of Outputs of Skin Classifiers Using the Sum of Weights Rule
The weights for the three skin classifiers, Cb-Cr, Cb/Cr and Cr, are fixed based on their correct
detection rates. From Table 2, the correct detection rates for Cb-Cr, Cb/Cr and Cr are 79.60%,
78.63% and 76.18%, respectively. Thus, the weights for each neural network are fixed as
expressed in Equation 4 to Equation 6.
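Equations 4 to 6 are not reproduced in this copy. The two weights that survive in Figure 6 (0.3396 and 0.3354) match what is obtained by dividing each classifier's CDR by the sum of the three CDRs, so a plausible reconstruction is the following. The symbols $w_1$, $w_2$, $w_3$ are hypothetical notation, and the third weight is inferred from the same normalisation rather than read from the source.

```latex
\begin{align}
w_1 &= \frac{79.60}{79.60 + 78.63 + 76.18} = \frac{79.60}{234.41} \approx 0.3396 \\
w_2 &= \frac{78.63}{234.41} \approx 0.3354 \\
w_3 &= \frac{76.18}{234.41} \approx 0.3250
\end{align}
```

Under this normalisation the weights sum to 1, so the weighted sum of three outputs in the range 0 to 1 also lies between 0 and 1.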
Figure 6 illustrates the procedure of skin detection using this strategy. The output of each
neural network is multiplied by its corresponding weight to produce Y1(i,j), Y2(i,j) and Y3(i,j),
respectively. These outputs are then summed and thresholded in order to classify the skin and
non-skin regions. Note that the threshold value used in this strategy differs from the 0.5 used for
the individual classifiers because each network's output is scaled by its fixed weight before the
summation. The threshold value is therefore determined empirically.
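The Sum of Weights rule can be sketched as below. Two points are assumptions rather than statements from the text: the weights are taken as each CDR divided by the sum of the CDRs (which matches the 0.3396 and 0.3354 shown in Figure 6), and the 0.9 threshold is the value that appears in Figure 6. The function name and the sample outputs are hypothetical.

```python
def sum_of_weights(outputs, cdrs, t=0.9):
    """Fuse per-classifier outputs with CDR-proportional weights.

    outputs: one list of per-pixel network outputs per classifier.
    cdrs:    correct detection rate of each classifier, in percent.
    t:       threshold on the weighted sum (0.9 is read from Figure 6).
    """
    total = sum(cdrs)
    weights = [c / total for c in cdrs]       # assumed normalisation
    fused = [sum(w * y for w, y in zip(weights, ys)) for ys in zip(*outputs)]
    return weights, [1 if s > t else 0 for s in fused]

cdrs = [79.60, 78.63, 76.18]                  # Cb-Cr, Cb/Cr, Cr (from Table 2)
outputs = [
    [0.95, 0.95, 0.10],                       # hypothetical Cb-Cr NN outputs
    [0.97, 0.60, 0.20],                       # hypothetical Cb/Cr NN outputs
    [0.92, 0.90, 0.10],                       # hypothetical Cr NN outputs
]
weights, mask = sum_of_weights(outputs, cdrs)
```

Unlike Voting, this rule keeps the continuous network outputs until the final threshold, so a pixel is accepted only when the classifiers are jointly confident.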
[Figure 6 diagram: the 100 images from the test dataset are input to the three neural networks; the weighted outputs 0.3396*Cb-Cr NN and 0.3354*Cb/Cr NN give Y1(i,j) and Y2(i,j), and the weighted Cr NN output gives Y3(i,j); Sum of Weights = Y1(i,j)+Y2(i,j)+Y3(i,j), followed by a threshold of 0.9.]
Figure 6. Procedure of combining the outputs of three skin classifiers using the Sum of Weights
rule.
Figure 7. Procedure for creating the training and validation samples for training the new neural
network.
Figure 8. Procedure for combining the outputs of three skin classifiers using a new neural
network.
Number of   Combination Rule/             Chrominance          CDR (%)  FAR (%)  FRR (%)
Outputs     Operator                      Component
Two         AND                           Cb-Cr & Cb/Cr        79.77    19.35    0.89
Two         AND                           Cb-Cr & Cr           82.21    16.90    0.89
Two         AND                           Cb/Cr & Cr           82.29    16.80    0.91
Two         OR                            Cb-Cr & Cb/Cr        78.46    20.17    0.83
Two         OR                            Cb-Cr & Cr           73.56    26.13    0.31
Two         OR                            Cb/Cr & Cr           72.53    27.18    0.30
Three       AND                           Cb-Cr, Cb/Cr & Cr    82.38    16.69    0.92
Three       OR                            Cb-Cr, Cb/Cr & Cr    72.53    27.18    0.30
Three       Voting                        Cb-Cr, Cb/Cr & Cr    79.50    19.66    0.84
Three       Sum of Weights                Cb-Cr, Cb/Cr & Cr    83.98    14.90    1.12
Three       New Neural Network (3-126-1)  Cb-Cr, Cb/Cr & Cr    82.21    16.90    0.89
8. CONCLUSION
In this work, several combination strategies for combining MLP neural networks for skin
detection were evaluated. A modified network growing technique for finding the number of
neurons in the hidden layer of an MLP neural network was applied. Three chrominance
components, Cb-Cr, Cb/Cr and Cr, that gave the highest CDR on their respective MLP were used for
the combination. The combination of Cb/Cr and Cr features improved the CDR by 3.01%
compared to the best single feature MLP given by Cb-Cr. Combining classifiers using the Sum of
Weights strategy improved the CDR by 4.38% (from 79.60% to 83.98%) compared to the best single feature MLP.
Furthermore, combining classifiers using the Sum of Weights strategy improved the correct
detection rate by 3.98% compared to the Bayes’ rule classifier reported by Jones and Rehg [6]
using the Compaq database.
Figure 8. The best test images segmented using the best single feature MLP, the best combining
feature MLP and the best combining classifier MLP.
Figure 9. The worst test images segmented using the best single feature MLP, the best combining
feature MLP and the best combining classifier MLP.
REFERENCES
[1] Q. Zhu, K. T. Cheng & C. T. Wu, “A Unified Adaptive Approach to Accurate Skin Detection,” in
Proceedings of the International Conference on Image Processing, Singapore, October 2004.
[2] J. Brand & A. Mason, “A comparative assessment of three approaches to pixel level human skin
detection,” in Proceedings of the International Conference on Pattern Recognition, vol. 1,
Barcelona, Spain, pp. 1056-1059, September 2000.
[3] Z. W. Jiang, M. Yao & W. Jiang, “Skin Detection Using Color, Texture and Space Information,”
in Proceedings of the Fourth International Conference on Fuzzy Systems and Knowledge
Discovery, Hainan, China, August 2007.
[4] F. Gasparini, S. Corchs & R. Schettini, “A recall or precision oriented skin
classifier using combining strategies,” Pattern Recognition, vol. 38, no. 11, pp. 2204-2207,
November 2005.
[5] H. Sajedi, M. Najfi & S. Kasaei, “A Boosted Skin Detection Method Based on Pixel and Block
Information,” in Proceedings of the Fifth International Symposium on Image and Signal
Processing and Analysis, Istanbul, Turkey, September 2007.
[6] M. J. Jones & J. M. Rehg, “Statistical color models with applications to skin detection,”
International Journal of Computer Vision, vol. 46, no. 1, pp. 81-96, January 2002.
[7] L. M. Fu, “Neural networks in computer intelligence,” McGraw-Hill, 1994.
[8] J. A. Dargham, “Face detection: A comparison between histogram thresholding and neural
networks,” Ph.D. thesis, Universiti Malaysia Sabah, Malaysia, 2008.
[9] S. E. Fahlman & C. Lebiere, “The Cascade-Correlation Learning Architecture,” in D. S. Touretzky
(ed.), Advances in Neural Information Processing Systems 2, pp. 524-532, Morgan Kaufmann
Publishers, 1990.
[10] P. N. Suganathan, E. K. Teoh & D. P. Mital, “Multilayer Backpropagation Network for Flexible
Circuit Recognition,” in Proceedings of the International Conference on Industrial Electronics,
Control and Instrumentation, Hawaii, USA, November 1993.