Abstract
We present a novel framework for efficient and robust facial feature representation based upon the Local Binary Pattern (LBP), called Weighted Statistical Binary Pattern, wherein the descriptors utilize a straight-line topology along different directions. The input image is initially decomposed into mean and variance moments. A new variance moment, which contains distinctive facial features, is prepared by extracting the k-th root. Then, once the Sign and Magnitude components along four different directions are constructed from the mean moment, a weighting approach according to the new variance is applied to each component. Finally, the weighted histograms of the Sign and Magnitude components are concatenated to build a novel histogram of Complementary LBP along different directions. A comprehensive evaluation using six public face datasets suggests that the present framework outperforms state-of-the-art methods, achieving accuracies of 98.51% on ORL, 98.72% on YALE, 98.83% on Caltech, 99.52% on AR, 94.78% on FERET, and 99.07% on KDEF. The influence of color spaces and the issue of degraded images are also analyzed with our descriptors. Such results, with theoretical underpinning, confirm that our descriptors are robust against noise, illumination variation, diverse facial expressions, and head poses.
1 Introduction
Artificial intelligence has been developing rapidly, with many real-world applications such as time series prediction [24], image classification [40, 46], and smart cities [28]. Among them, personal identification using biometric traits is a hot trend nowadays and has received increasing attention in the computer vision community. Among biometric characteristics, the face image can be easily obtained from a camera through a non-invasive acquisition process. Therefore, face recognition can be widely applied in public environments such as video surveillance, criminal detection, access control systems, mobile device security, etc. [9]. Although diverse methods for face recognition have been introduced [18, 26, 36], they still have shortcomings. For this reason, face recognition remains a challenging topic. Figure 1 shows several face recognition challenges, such as facial expression, head pose, illumination, and background complexity. There are other difficulties as well, including occlusion, aging, makeup, image quality, etc. These challenges are formidable to deal with.
A face recognition application typically consists of face detection, feature extraction, and classification. In general, the feature extraction stage plays a vital role because the system will fail to achieve decent results when the employed feature descriptor is not adequate. Indeed, most well-known methods rely on feature descriptors that are highly discriminative and robust to extrinsic changes. In recent years, most face recognition algorithms, which have been studied extensively in pursuit of robust and discriminative descriptors, focus on three primary techniques: holistic, local, and hybrid models [23]. The holistic approach exploits the entire face and projects it into a small subspace, such as Eigenfaces in manifold space [45] and Fisherfaces [16, 33]. The local approach considers certain facial features, such as Speeded-Up Robust Features (SURF) [17] and Local Binary Patterns (LBP) [22]. In the hybrid approach, local information is combined with holistic information to enrich feature descriptors for performance improvements: the fusion of 54 Gabor functions and fuzzy logic for facial expression recognition [15], two color local descriptors called Color ZigZag Binary Pattern (CZZBP) [19], or a fusion of deep features [12].
Thanks to their low computational cost and efficient feature extraction capability, LBP-based methods have been studied and widely applied to many tasks such as face recognition, facial expression classification, and texture classification. A large number of LBP variants and hybrid models based on LBPs have been introduced [1, 36] for face recognition. However, they still have some drawbacks regarding noise sensitivity, contrast information, or illumination variation. This paper proposes a weighted statistical binary pattern framework that can improve the local descriptor in terms of discriminative power and robustness against noise and illumination variation.
This work extends our prior efforts, where we considered neighborhoods in a straight-line topology [44], to utilize more useful information for local feature descriptors by statistical binary patterns [14, 37]. In this way, the proposed framework first computes two statistical moments (mean and variance) for noise elimination and complementary information. Then, the proposed LBP variant is applied to the first-moment image for LBP representations. The second-moment image serves as a complementary component for building the weighted histogram that incorporates each pattern's contribution. The proposed framework can thus enrich local descriptors by utilizing both moments without increasing the fused histogram dimension. The present study addresses prior shortcomings and proposes an upgraded descriptor for face recognition. Its contributions are as follows:
-
We present a straight-line topology approach with LBP by direction (known as LBPα), which is robust against several visual challenges, such as noise, illumination, and facial expressions, as the foundation of our framework.
-
Then, we propose a novel complementary LBP variant (known as CLBPα), which is inspired by the local difference magnitude-sign transform to complement information for the local descriptor.
-
To extract more robust descriptors from salient information in statistical moments, we propose the fused histogram of CLBPα, which is constructed by using WSBPα to obtain enriched features.
-
A comprehensive evaluation of six public datasets suggests that our proposed framework outperforms the state-of-the-art methods.
The paper is organized as follows. Section 2 provides some background on LBP. Section 3 details the proposed framework. In Section 4, we analyze the implementations through several parameter settings for evaluation. Experimental results are interpreted in Section 5. A discussion of our proposed framework is given in Section 6, and the last section presents our conclusions and future work.
2 Related works
Many methods based on the basic LBP descriptor, which encodes the local appearance by the relation between neighborhoods, have been introduced. However, several shortcomings exist, such as local information loss or sensitivity to noise. Diverse LBP variants have been proposed to address these shortcomings, introducing new neighborhood topologies or encoding operators, such as Dominant Rotated Local Binary Patterns (DRLBP) [32] and Enhanced Line Local Binary Pattern (EL-LBP) [44].
Recently, several hybrid models based on LBP-like descriptors for face analysis have been examined and proved to have highly discriminative power [22]. Lin et al. [27] proposed a fast algorithm, called the LBP edge-mapped descriptor, which fused LBP and SIFT using the maxima of gradient magnitude points in the image to illustrate facial contours for face recognition. Ding et al. [11] introduced the Dual-Cross Patterns (DCPs) as a core algorithm to extract facial features at both the holistic and component levels of a human face, then applied the first derivative of Gaussian to eliminate illumination differences. The Multi-scale Block Local Multiple Patterns (MB-LMP) [49] exploited multiple feature maps based on the modified Weber's ratio, then fused the histograms of non-overlapping patches for more robust features. Kas et al. [21] addressed shortcomings of previous LBPs and proposed Mixed Neighborhood Topology Cross Decoded Patterns (MNTCDP) by considering multi-radial and multi-orientation information simultaneously to exploit the relationship between the referenced point and its neighbors in each 5 × 5 pixel block. Inspired by LBP-like descriptors in face recognition, Shu et al. [43] proposed Equilibrium Difference LBP (ED-LBP) in multiple color channels (RGB, HSV, YCbCr) accompanied by an SVM classifier for face spoofing detection. Unlike the traditional LBP circle, the Local Diagonal Extrema Number Pattern (LDENP) [42] descriptor only encoded information within the local diagonal neighbors using the first-order local diagonal derivatives to obtain a compact description for face recognition. Deng et al. [10] proposed accurate face recognition by exploiting compressive binary patterns (CBP) on a set of the first six random-field eigenfilters, which reduced the bit error rate of LBP-like descriptors and were more robust against additive Gaussian noise. Following the LBP principle, another approach encoded information by examining neighboring pixels at different distances across different derivative directions, called Local Gradient Hexa Pattern (LGHP) [6], which produced discriminative codes for inter-class facial images. Lu et al. [29] proposed an unsupervised feature learning method to represent face images from raw pixels while jointly encoding a codebook for small regions to obtain high discrimination in descriptors, called Simultaneous Local Binary Feature Learning and Encoding (SLBFLE).
The other direction was to utilize more information in the descriptors to overcome local information loss within images. For instance, the Completed LBP technique (CLBP) [14] used a local difference Sign-Magnitude transform to obtain higher performance. A further improvement of CLBP, the statistical binary patterns model [37], was built on several statistical moments for robust descriptors and improved performance.
2.1 LBP
LBP was first introduced by Ojala et al. [38]. The LBP feature describes the spatial relationship in an image by encoding the neighbor points of a given central point. Let f be a 2D discrete image in \(\mathbb {Z}^{2}\) space. Then, the LBP encoding of f can be considered as a mapping from \(\mathbb {Z}^{2}\) to \(\{0,1\}^{P}\):

$$\text{LBP}_{P,R}(\mathbf{c}) = \sum_{p=0}^{P-1} s\big(f(\mathbf{g}_{p}) - f(\mathbf{c})\big)\,2^{p}, \qquad s(x) = \begin{cases} 1, & x \geq 0 \\ 0, & x < 0 \end{cases} \tag{1}$$

where f(c) is the intensity of the central point c, and f(gp) are the intensities of the P neighbors gp, measured on the circle of center c and radius R.
The dimension of the LBP descriptor can be reduced by considering its uniform patterns, whose transition measure satisfies U(LBPP,R) ≤ 2, where U is defined by the following equation:

$$U(\text{LBP}_{P,R}) = \sum_{p=1}^{P} \left|\text{LBP}_{P,R}^{p} - \text{LBP}_{P,R}^{p-1}\right| \tag{2}$$
where \(\text {LBP}_{P,R}^{p}\) is the p-th bit of LBPP,R, and \(\text {LBP}_{P,R}^{P} = \text {LBP}_{P,R}^{0}\). \(\text {LBP}_{P,R}^{u2}\) [38] was a very robust and reliable descriptor for face representation and texture classification. The mapping from LBPP,R to \(\text {LBP}^{u2}_{P,R}\) produces L = P(P − 1) + 3 distinct output values by building a lookup table over the \(2^{P}\) patterns. Therefore, the local descriptor is described as follows:

$$H = (H_{t})_{0 \leq t \leq L-1} \tag{3}$$

where

$$H_{t} = \sum_{\mathbf{c}} \delta\big(\text{LBP}^{u2}_{P,R}(\mathbf{c}) = t\big) \tag{4}$$

in which Ht is the occurrence of the t-th LBPu2 code, t ∈ [0..L − 1], and δ(·) equals 1 when its argument holds and 0 otherwise. Therefore, the length of the histogram in the uniform LBP representation is L = P(P − 1) + 3.
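To make the encoding concrete, the following sketch (a minimal Python/NumPy illustration, not the authors' implementation) computes the LBPP,R code of (1), the transition count U of (2), and the u2 lookup table that yields the L = P(P − 1) + 3 labels; neighbor coordinates are rounded to the grid rather than bilinearly interpolated, which is a simplifying assumption:

```python
import numpy as np

def lbp_code(img, y, x, P=8, R=1):
    """LBP_{P,R} code of the pixel at (y, x) as in Eq. (1); neighbors are
    sampled at rounded circle coordinates (caller must avoid the border)."""
    center = img[y, x]
    code = 0
    for p in range(P):
        dy = int(round(R * np.sin(2 * np.pi * p / P)))
        dx = int(round(R * np.cos(2 * np.pi * p / P)))
        code |= int(img[y + dy, x + dx] >= center) << p
    return code

def transitions(code, P=8):
    """U(LBP_{P,R}) of Eq. (2): number of circular 0/1 transitions."""
    bits = [(code >> p) & 1 for p in range(P)]
    return sum(b != nb for b, nb in zip(bits, bits[1:] + bits[:1]))

# u2 mapping: each uniform code keeps its own label, all others share one,
# giving L = P(P - 1) + 3 = 59 labels for P = 8.
P = 8
lut = np.full(2 ** P, -1, dtype=np.int64)
label = 0
for c in range(2 ** P):
    if transitions(c, P) <= 2:
        lut[c] = label
        label += 1
lut[lut == -1] = label            # single shared non-uniform label
assert label + 1 == P * (P - 1) + 3
```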
2.2 Completed LBP
Guo et al. [14] considered a local difference Sign-Magnitude transform and proposed a completed model as a state-of-the-art variant of LBP. The transform decomposes each local difference dp = f(gp) − f(c) into two components, i.e. signs: sp = sign(dp) and magnitudes: mp = |dp| = |f(gp) − f(c)|, so that dp = sp ∗ mp. Based on these components and the center gray level, three operators, called CLBP-Sign (CLBP_S), CLBP-Magnitude (CLBP_M), and CLBP-Center (CLBP_C), were designed to code the three features S, M, and C. The first operator, CLBP_S, is the same as the original LBP operator and produces the S component. The M component, which expresses the local variation of magnitude, should be consistent with S and is defined as follows:

$$\text{CLBP\_M}_{P,R} = \sum_{p=0}^{P-1} t(m_{p}, \bar{m})\,2^{p}, \qquad t(x, c) = \begin{cases} 1, & x \geq c \\ 0, & x < c \end{cases} \tag{5}$$
where m̄ is the mean value of mp over the whole image. Moreover, the last component C also carries discriminant information. Therefore, the CLBP_C operator is formulated as:

$$\text{CLBP\_C} = t\big(f(\mathbf{c}), \bar{f}\big) \tag{6}$$
where f̄ is set as the mean gray level of the whole image. Because of the complementary relationship between these operators, the Completed LBP descriptor turns out to be useful for the texture classification task.
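A minimal sketch of the three CLBP operators may help; passing in the neighbor list and the precomputed global means m̄ and f̄ is an assumption of this illustration:

```python
import numpy as np

def clbp_codes(img, y, x, neighbors, m_bar, f_bar):
    """CLBP_S, CLBP_M, and CLBP_C codes at (y, x) for a given neighbor list.
    m_bar and f_bar are the image-wide means of the magnitudes m_p and of
    the gray levels, respectively (Eqs. (5) and (6))."""
    d = np.array([float(img[ny, nx]) for ny, nx in neighbors]) - float(img[y, x])
    s_code = sum(int(dp >= 0) << p for p, dp in enumerate(d))           # signs (S)
    m_code = sum(int(abs(dp) >= m_bar) << p for p, dp in enumerate(d))  # magnitudes (M)
    c_bit = int(img[y, x] >= f_bar)                                     # center (C)
    return s_code, m_code, c_bit
```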
2.3 Face representation based on LBPs
Face representation based on LBP descriptors was first introduced by Ahonen et al. [1], who analyzed small local regions of the face instead of striving for a holistic facial texture representation. In such a local approach, a face image is partitioned into m non-overlapping patches R(j) (j = 1..m), and an LBP operator is independently applied to each patch to produce local histograms. The aim is to fuse all LBP histograms into a single vector (also known as the local LBP descriptor) for facial texture representation. Concatenation is a simple and efficient approach for this LBP description. Each LBP histogram H(j) of image patch R(j) is computed by (3). Finally, the global LBP descriptor over all patches R(j) is formulated as follows (T is the transpose operator):

$$H_{\text{global}} = \big[(H^{(1)})^{T}, (H^{(2)})^{T}, \ldots, (H^{(m)})^{T}\big]^{T} \tag{7}$$
The resulting feature vector has a size of m × n, where n is the length of the LBP histogram for its topology. This approach makes face representation more robust under variations such as pose or illumination. Notably, the small patches within an image can be of different sizes or can overlap. Many face recognition works have followed the local approach and produced significant LBP variants [5, 42, 47, 49].
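The patch-wise concatenation of (7) can be sketched as follows; code_img is assumed to be a label image already mapped to u2 labels in [0, n_labels):

```python
import numpy as np

def regional_descriptor(code_img, m_rows, m_cols, n_labels):
    """Split a label image into m_rows x m_cols non-overlapping patches and
    concatenate the per-patch histograms, as in Ahonen et al. [1]."""
    H, W = code_img.shape
    hists = []
    for i in range(m_rows):
        for j in range(m_cols):
            patch = code_img[i * H // m_rows:(i + 1) * H // m_rows,
                             j * W // m_cols:(j + 1) * W // m_cols]
            hists.append(np.bincount(patch.ravel(), minlength=n_labels))
    return np.concatenate(hists)      # length = m_rows * m_cols * n_labels
```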
2.4 Statistical moment images
Since f is defined as a 2D discrete image in \(\mathbb {Z}^{2}\) space, we can obtain a real-valued image in \(\mathbb {R}\) by a mapping technique. The spatial support, which is employed to compute the local statistics, is modeled as \({\mathscr{B}} \subset \mathbb {Z}^{2}\), such that \(\mathcal {O} \in {\mathscr{B}}\), where \(\mathcal {O}\) is the origin of \(\mathbb {Z}^{2}\) [37]. Figure 2 illustrates how to construct a spatial support \({\mathscr{B}}\).
The r-order moment image associated with f and \({\mathscr{B}}\) is also a mapping from \(\mathbb {Z}^{2}\) to \(\mathbb {R}\), defined as

$$m_{(f,{\mathscr{B}})}^{r}(\mathbf{c}) = \frac{1}{|{\mathscr{B}}|}\sum_{\mathbf{b} \in {\mathscr{B}}} \big(f(\mathbf{c}+\mathbf{b})\big)^{r} \tag{8}$$
where c is a pixel from \(\mathbb {Z}^{2}\), and \(|{\mathscr{B}}|\) is the cardinality of the structuring element \({\mathscr{B}}\). Accordingly, the r-order centered moment image (r > 1) is defined as

$$\mu_{(f,{\mathscr{B}})}^{r}(\mathbf{c}) = \frac{1}{|{\mathscr{B}}|}\sum_{\mathbf{b} \in {\mathscr{B}}} \big(f(\mathbf{c}+\mathbf{b}) - m_{(f,{\mathscr{B}})}^{1}(\mathbf{c})\big)^{r} \tag{9}$$
where \(m_{(f, {\mathscr{B}})}^{1}(\mathbf {c})\) is the average value (1-order moment) calculated around c. Finally, the r-order normalized centered moment image (r > 2) is defined as

$$\beta_{(f,{\mathscr{B}})}^{r}(\mathbf{c}) = \frac{\mu_{(f,{\mathscr{B}})}^{r}(\mathbf{c})}{\big(\mu_{(f,{\mathscr{B}})}^{2}(\mathbf{c})\big)^{r/2}} \tag{10}$$
where \(\mu _{(f, {\mathscr{B}})}^{2}(\mathbf {c})\) is the variance (2-order centered moment) calculated around c.
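For illustration, the two moments used later (m1 and μ2) can be computed with a box filter; approximating the paper's circular supports \({\mathscr{B}}\) by a square window is a simplifying assumption:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def moment_images(img, radius=1):
    """Mean (1st-order) and variance (2nd-order centered) moment images over
    a square spatial support of side 2*radius + 1, following Eqs. (8)-(9)."""
    f = img.astype(np.float64)
    size = 2 * radius + 1
    m1 = uniform_filter(f, size=size)                    # m^1_{(f,B)}
    mu2 = uniform_filter(f ** 2, size=size) - m1 ** 2    # mu^2_{(f,B)}
    return m1, np.maximum(mu2, 0.0)                      # clip tiny negatives
```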
3 Weighted Statistical Binary Patterns by direction α (WSBPα)
We propose the Weighted Statistical Binary Patterns by direction α (WSBPα) descriptor to enhance the discriminative capability of LBPs for face recognition while reducing their sensitivity to representative challenges such as facial emotions, noise, or illumination. The descriptor encodes spatial information in a set of local statistical moment images and maps this coding to the uniform LBP "u2" form to produce a more compact descriptor. Because of their complementary and consistent characteristics, the two crucial components CLBP_Sα and CLBP_Mα (here, α is a given direction) are computed on a mean image and weighted by a variance image to improve the performance. The details are given as follows.
3.1 Local Binary Patterns by direction (LBPα)
In the original LBP and several variants, the neighbors gp have the coordinates \((R\cos (2\pi p/P), R\sin (2\pi p/P))\), lying on a circle of radius R. In the proposed LBPα, we consider the relationship between pixels by a straight-line topology along a direction α, given that the coordinate of c is (0,0). The neighbors of the straight-line topology are defined as follows:

$$\mathbf{g}_{\alpha}^{p} = \big(k_{p} R\cos\alpha,\; k_{p} R\sin\alpha\big), \qquad k_{p} \in \left\{-\tfrac{P}{2}, \ldots, -1, 1, \ldots, \tfrac{P}{2}\right\} \tag{11}$$
When considering a line topology, the number of neighbors should be even, and the neighbors are bilaterally symmetric about the central point c. Figure 3 illustrates four LBP\(_{\alpha _{i}}\) operators considering 6 neighbors along a line topology.
Similar to the traditional LBP, we encode an image with the LBPα operator as defined in (1). It can be expressed as follows:

$$\text{LBP}_{\alpha,(P,R)}(\mathbf{c}) = \sum_{p=0}^{P-1} s\big(f(\mathbf{g}_{\alpha}^{p}) - f(\mathbf{c})\big)\,2^{p} \tag{12}$$
where \(\mathbf {g}_{\alpha }^{p}\) is defined in (11), and the remaining variables f, c, P, and R are defined in (1). The LBPα operator produces \(2^{P}\) distinct patterns, which leads to a huge descriptor. Inspired by the LBP uniform principle in Section 2.1, we reduce the number of patterns by applying the uniform-pattern concept to LBPα. After this process, the LBPα "uniform patterns" have P(P − 1) + 3 distinct output values from a lookup table of \(2^{P}\) values.
The main difference between the circular LBP and LBPα is that LBP considers the spatial relationship along a circle, whereas LBPα exploits spatial information along a straight line of neighbors in the given directions. Although primary factors such as exposure direction, illumination, and facial expressions pose challenges in face recognition, the LBPα-based representation turns out to be robust against changes of illumination and scale since it examines micro-patterns in a line topology. Moreover, by inheriting the advantages of traditional LBP, the proposed LBPα can characterize the distribution of local pixels along a direction, and the frequency of occurrences of LBPα values can be used to represent various facial structures.
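A small sketch of the line-topology sampling and the LBPα code of (12) follows; snapping the direction to the nearest grid axis or diagonal is an implementation assumption that is exact for the four directions {0°, 45°, 90°, 135°} used later:

```python
import numpy as np

def line_neighbors(alpha_deg, P=6, R=1):
    """Grid offsets of the P line-topology neighbors (Eq. (11)): P/2 points
    on each side of the center along direction alpha (P must be even)."""
    a = np.deg2rad(alpha_deg)
    uy, ux = int(round(np.sin(a))), int(round(np.cos(a)))  # snapped direction
    return [(k * R * uy, k * R * ux)
            for k in range(-P // 2, P // 2 + 1) if k != 0]

def lbp_alpha_code(img, y, x, alpha_deg, P=6, R=1):
    """LBP_alpha code of Eq. (12): threshold line neighbors against the center."""
    return sum(int(img[y + dy, x + dx] >= img[y, x]) << p
               for p, (dy, dx) in enumerate(line_neighbors(alpha_deg, P, R)))
```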
3.2 Complementary Local Binary Patterns by direction α (CLBPα)
CLBP [14] has been used for texture classification by combining the three operators CLBP_S, CLBP_M, and CLBP_C in a joint or hybrid way. Similar to CLBP, we propose Complementary Local Binary Patterns by direction α (CLBPα), which considers the neighbors \(\mathbf {g}_{\alpha }^{p}\) along direction α for the face recognition task. The proposed CLBPα consists of two operators: CLBPα-Sign (CLBP_Sα) and CLBPα-Magnitude (CLBP_Mα). The CLBP_Sα is identical to the proposed LBPα described in (12). The CLBP_Sα operator describes the structure of image f with respect to the local relationship, whereas CLBP_Mα complements the local difference Magnitude in a format consistent with that of CLBP_Sα. This operator is defined as follows:

$$\text{CLBP\_M}_{\alpha,(P,R)}(\mathbf{c}) = \sum_{p=0}^{P-1} t\big(m_{\alpha}^{p}, \bar{m}_{\alpha}\big)\,2^{p} \tag{13}$$
where m̄α is the mean value of \(m_{\alpha }^{p}\) over the whole discrete image f. Each component S and M has P(P − 1) + 3 distinct values corresponding to the "uniform" LBPα coding of the image f. Following the construction of CLBP descriptors [14], we have two ways to combine the components into enhanced descriptors. The first descriptor, CLBP_S/Mα, which forms a joint 2D histogram from the CLBP_Sα and CLBP_Mα codes, has [P(P − 1) + 3]² values. The second descriptor, CLBP_S_Mα, which concatenates the two histograms, has 2[P(P − 1) + 3] values. The distribution of the first can become too sparse when the dimension (i.e., the number of neighbors P) increases, whereas the marginal histogram of the second keeps a reasonable size of 2[P(P − 1) + 3]. As a trade-off between performance and computational cost, the marginal histogram approach is utilized in our experiments. Note that the component C, which expresses the local gray level in the image, is ignored in our proposed model. The proposed CLBP_S_Mα yields a more reliable and expressive facial feature representation.
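The marginal histogram CLBP_S_Mα can be sketched as below, reusing the u2 lookup table from Section 2.1; s_img and m_img are assumed to hold the raw S and M codes of one direction:

```python
import numpy as np

def clbp_s_m_alpha_hist(s_img, m_img, lut):
    """Marginal histogram CLBP_S_M_alpha: the u2-mapped S and M code images
    of one direction are histogrammed separately and concatenated, giving
    2[P(P-1)+3] bins (lut is the u2 lookup table of Section 2.1)."""
    n_labels = int(lut.max() + 1)
    hs = np.bincount(lut[s_img].ravel(), minlength=n_labels)
    hm = np.bincount(lut[m_img].ravel(), minlength=n_labels)
    return np.concatenate([hs, hm])
```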
3.3 Weighted Statistical CLBP by directions α i (WSBP\(_{\alpha _{i}}\))
Introducing the two first statistical moments (mean and variance) into an LBP-based operator was proposed as Statistical Binary Patterns (SBP) [37]. The first-order moment, the mean m1, gives the contribution of individual pixel intensities over the entire image. The second-order moment, the variance μ2, captures how each pixel varies from its neighboring pixels and represents salient regions in an image. Our proposed WSBP builds a novel histogram from the CLBP\(_{\alpha _{i}}\) descriptors by computing the CLBP\(_{\alpha _{i}}\) image on the first-order moment m1 and counting the occurrences of every pattern on that CLBP\(_{\alpha _{i}}\) image with a significance index corresponding to the salient regions in the new second-order moment \(\mu ^{\prime }_{2}\). The proposed descriptor can thus discard noise, illumination effects, and near-uniform regions. Figure 4 illustrates the flow diagram of our WSBP\(_{\alpha _{i}}\) descriptors. On the mean image m1, the spatial relationship between local structures is represented using the CLBP\(_{\alpha _{i}}\) operator to obtain the two essential components S\(_{\alpha _{i}}\) and M\(_{\alpha _{i}}\). Then, each component obtained by the CLBP\(_{\alpha _{i}}\) operator is weighted by the contribution of every local pattern according to the new variance image \(\mu ^{\prime }_{2}\), yielding the weighted histogram.
The flow diagram of our WSBP\(_{\alpha _{i}}\) descriptors. The input image is initially decomposed into mean (m1) and variance (μ2) moments. A new variance moment (\(\mu ^{\prime }_{2}\)), which contains distinctive facial features, is prepared by extracting the k-th root. Then, once the Sign and Magnitude components along four different directions are constructed from the mean moment, a weighting approach according to the new variance is applied to each component. Finally, the weighted histograms of the Sign and Magnitude components are concatenated to build a weighted CLBP histogram
Let H be the histogram vector of each component, and (x,y) be the location of a pixel in each component of the CLBP\(_{\alpha _{i} (P, R)}\) image. The histogram of each component then accumulates the contribution of every location (pixel) according to the new variance moment \(\mu ^{\prime }_{2}\). Analogous to (4), the weighted occurrence of every CLBP\(_{\alpha _{i} (P, R)}\) code t is defined as follows:

$$H_{t} = \sum_{(x,y)} \mu^{\prime}_{2}(x,y)\,\delta\big(\text{CLBP}_{\alpha_{i}(P,R)}(x,y) = t\big) \tag{14}$$
The SBP descriptor [37] produces enhanced descriptors but assigns the same weight to all patterns, ignoring their significance. In this paper, the WSBP\(_{\alpha _{i}}\) descriptors capture the local relationships within images through the mean moment, and exploit contrast and gradient magnitude information through the variance moment to enhance the local relationship description. Equation (14) describes how every pixel occurrence is weighted by its contribution according to the corresponding pixel in the new variance moment \(\mu ^{\prime }_{2}\). The histogram of each component S\(_{\alpha _{i}}\) and M\(_{\alpha _{i}}\) has P(P − 1) + 3 values, so the dimensionality of the WSBP\(_{\alpha _{i}}\) descriptor is 2[P(P − 1) + 3] after concatenating the histograms. As a result, the WSBP\(_{\alpha _{i}}\) descriptor is not only compact but also robust to noise, illumination, and other variations.
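Equation (14) amounts to a weighted bin count, which NumPy expresses directly; the sketch assumes code_img holds u2-mapped CLBPα labels and mu2_prime the new variance moment:

```python
import numpy as np

def weighted_histogram(code_img, mu2_prime, n_labels):
    """Eq. (14): each occurrence of a pattern t is accumulated with the value
    of the new variance moment mu'_2 at that pixel instead of a unit count."""
    return np.bincount(code_img.ravel(),
                       weights=mu2_prime.ravel(),
                       minlength=n_labels)
```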
3.4 The computational complexity
In this section, we address the computational complexity of the WSBP descriptor for an input image of size N × N. Suppose that the pre-defined spatial support \({\mathscr{B}}\) is defined as {(R1,P1),(R2,P2)}, and that WSBPα is calculated by considering P neighbors. The computational complexity of the WSBP descriptor depends on the following factors.
-
Construction of moment images: At each pixel, the mean value can be obtained in O(P1 + P2) operations, while the variance value requires O((P1 + P2)²) operations. Therefore, the construction of moment images can be done in O((P1 + P2)²N²) = O(N²).
-
Construction of CLBPα: CLBPα consists of the two components CLBP_Sα and CLBP_Mα. The first is calculated in O(PN²); the second has the same complexity of O(PN²). As a result, the complexity of CLBPα is O(2PN²) = O(N²).
-
Construction of WSBPα: WSBPα applies CLBPα to the mean image and uses the variance image to construct the weighted histogram. As mentioned above, each component can be computed in O(N²).
Therefore, the computational complexity of WSBP is O(N²). It is evident that WSBP requires more calculation than LBP, but both are of the same computational complexity order. This guarantees that our operator is as efficient as the non-LBP methods in terms of computation time.
4 Implementation
In this section, we detail the configuration of the WSBP descriptor.
4.1 The fusion of different descriptors WSBP\(_{\alpha _{i}}\)
If WSBPα considered only one direction (α a given direction), it could lead to an inadequate description, simply because such a descriptor would exploit the local relationship along that direction only. What we aim for here is a descriptor that utilizes all useful surrounding features. Inspired by LBP operators in a circle topology (with a scale of (P, R) = (8, 1)), we propose to consider at least four directions for the fused histogram, αi ∈ {0°, 45°, 90°, 135°} (see Section 5). Figure 5 shows the S and M components of CLBP at the four directions {αi} as four views of a given image. The fusion of four views makes an adequate descriptor for recognizing faces under illumination or head pose variations. Such a WSBP can be expressed as follows (a code sketch is given after the figure caption below):
-
WSBP = WSBP\(_{\alpha _{1}}\)_WSBP\(_{\alpha _{2}}\)_WSBP\(_{\alpha _{3}}\)_WSBP\(_{\alpha _{4}}\)
Illustration of CLBP\(_{\alpha _{i}}\) for component S (a, b, c, d) and M (e, f, g, h) with the four different directions {0°, 45°, 90°, 135°}, respectively. Each CLBP\(_{\alpha _{i}}\) operator, consisting of two components, is computed on the mean moment with a structuring element \({\mathscr{B}}=\{(1,8)\}\)
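In code, this fusion reduces to concatenating the four per-direction histograms. The sketch below assumes a hypothetical helper wsbp_alpha_hist(m1, mu2_prime, alpha) that returns the weighted S/M histogram of one direction, as built in Section 3.3:

```python
import numpy as np

def wsbp_descriptor(wsbp_alpha_hist, m1, mu2_prime):
    """Fused WSBP descriptor: concatenation of the weighted histograms
    computed independently for the four directions."""
    return np.concatenate([wsbp_alpha_hist(m1, mu2_prime, alpha)
                           for alpha in (0, 45, 90, 135)])
```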
4.2 Moment parameters
For a successful implementation of our descriptor, proper parameter settings have to be made. As a pre-processing step, the mean (m1) and variance (μ2) moments obtained by computing over the spatial support \({\mathscr{B}}\) are used to reduce noise sensitivity. Thus, the moment parameters should be set optimally for this purpose.
We define the structuring element as a circular spatial support \({\mathscr{B}} = \{(R_{i},P_{i})\}\), where Pi is the number of neighbors and Ri the corresponding radius. Figure 6 shows an example of the two moment images using \({\mathscr{B}} = \{(1,8)\}\). Given that the second-order moment (variance moment) tends to emphasize only dominant edges, some potentially important information could be discarded. To handle this problem, we propose to extract the k-th root of the variance moment as \(\mu ^{\prime }_{2} = \sqrt [k]{\mu _{2}}\) (k ∈ [2,16]). For example, Fig. 6e shows the new variance moment (\(\mu ^{\prime }_{2}\)), built by extracting the 9-th root of the original one. With this method, useful facial features, such as the eyes, nose, and mouth, are enhanced as salient regions. Thus, the weighted histogram can emphasize the essential areas by exploiting the contribution of every statistical pattern in the variance image. In the next section, we show how \(\mu ^{\prime }_{2} = \sqrt [9]{\mu _{2}}\) with the structuring element \({\mathscr{B}} = \{(1,6)\}\) makes a huge difference through a series of experiments with six public face datasets.
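The root extraction itself is a one-liner; clipping negative values (which can arise from numerical error in the variance estimate) is an implementation assumption:

```python
import numpy as np

def new_variance_moment(mu2, k=9):
    """mu'_2 = mu_2^(1/k): flattening dominant edges so that softer facial
    structures (eyes, nose, mouth) survive as salient weights; k = 9 is the
    setting reported effective in the paper."""
    return np.power(np.maximum(mu2, 0.0), 1.0 / k)
```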
5 Experiments
This section describes experiments with six face datasets, namely ORL, YALE, AR, Caltech, FERET, and KDEF. Our statistical feature descriptors were processed with the algorithm described above. The following features, each a concatenation over the 4 directions of the CLBP\(_{\alpha _{i}}\) operators, were used in our experiments:
-
CLBP_S(m1) = CLBP_S0_CLBP_S45_CLBP_S90 _CLBP_S135
-
CLBP_M(m1) = CLBP_M0_CLBP_M45_CLBP_M90 _CLBP_M135
-
CLBP_S(\(m_{1},\mu ^{\prime }_{2}\)) = CLBP_S(m1)_CLBP_S(\(\mu ^{\prime }_{2}\))
-
CLBP_M(\(m_{1},\mu ^{\prime }_{2}\)) = CLBP_M(m1)_CLBP_M(\(\mu ^{\prime }_{2}\))
-
CLBP_S_M(m1) = CLBP_S(m1)_CLBP_M(m1)
-
CLBP_S_M(\(m_{1},\mu ^{\prime }_{2})=\) CLBP_S_M(m1) _CLBP_S_M(\(\mu ^{\prime }_{2}\))
-
WSBP_S, WSBP_M, and WSBP were the weighted statistical CLBPs applied to the S, M, and fused S and M components as described in Section 3, respectively. Note that each descriptor had histograms concatenated over the 4 directions of CLBP\(_{\alpha _{i}}\), {αi} = {0°, 45°, 90°, 135°}.
The fusion of different directions and components (S, M, m1, \(\mu ^{\prime }_{2}\)) would lead to a very long descriptor after concatenating the histograms. To handle this problem, Principal Component Analysis (PCA), keeping 95% of the cumulative sum of eigenvalues, was adopted for dimension reduction. For the classification task, linear SVMs were utilized.
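In scikit-learn terms, this stage could look as follows; everything except the 95% threshold is a library default, which is an assumption:

```python
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# PCA keeping 95% of the cumulative eigenvalue energy, then a linear SVM.
clf = make_pipeline(PCA(n_components=0.95), LinearSVC())
# clf.fit(train_descriptors, train_labels)
# accuracy = clf.score(test_descriptors, test_labels)
```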
5.1 Databases and experimental protocols
The ORL dataset
The dataset had 40 subjects, with 10 different gray-scale images of size 92 × 112 collected from each subject. All ORL images were collected under various conditions, such as facial expression, illumination changes, and occlusion (sunglasses); see Fig. 7.
The YALE Face dataset
The dataset included 165 images of 15 individuals, with 11 different images of size 243 × 320 collected from each subject. It covered various expressions and lighting conditions; see Fig. 8.
The Caltech 1999 dataset
Produced by the California Institute of Technology, the dataset had 447 images of 26 persons; the number of images per person varied, and images were collected against unconstrained backgrounds. The dataset covered various conditions, such as different expressions, illuminations, and occlusions; see Fig. 9.
The KDEF dataset
The dataset [3] is a set of 4900 photographs of facial expressions, covering 70 persons (35 males and 35 females) displaying seven facial expressions under five different viewing angles. For the present evaluation, the frontal view was considered for each facial expression. This subset contained 490 color images of the 70 individuals, as shown in Fig. 10, where each subject expressed seven different emotions.
The AR dataset
The dataset [31] had 3016 face images of 116 persons (63 men and 53 women), each having 26 color images (768 × 576) under severe illumination conditions (left-light, right-light, or all side lights), 7 basic emotions (happy, sad, neutral, sleepy, angry, surprised, and wink), head poses, and occlusion (sunglasses and scarves). Figure 11 shows several examples in which original color images were converted to gray-scale images and decomposed into mean and variance moment images. For this dataset, we conducted two experiments for a comprehensive evaluation. Since the dataset contains color images, the first experiment used gray-scale images following the same protocol as the other datasets, while the second was carried out on a color channel to offer additional perspectives.
The FERET dataset
The dataset [41], collected in 15 sessions over four years, was a large benchmark used extensively for comparison. It comprised a total of 14,126 images of 1199 individuals. The subset adopted for evaluation had 1400 images of 200 subjects (7 images per person), including variations in pose, expression, and illumination. Figure 12 shows the 7 states of each person.
5.2 Results with the ORL and YALE datasets
For the ORL dataset, Ntrain training images per subject were randomly selected (Ntrain = 2, 4, 5, 8), while the remaining (10 − Ntrain) images were used for testing. For the YALE dataset, Ntrain training images per subject were randomly selected (Ntrain = 2, 4, 6, 8), and the remaining (11 − Ntrain) images were used for testing. The measurements were repeated 100 times with data shuffling. The average classification rates were shown in Tables 1 and 2.
Tables 1 and 2 summarized our experimental results under the various configurations. First, CLBP_S(m1) and CLBP_M(m1), as well as WSBP_S and WSBP_M, produced similar recognition results. CLBP_S(m1) and WSBP_S played a major role in achieving good results, suggesting that the S component encoded more valuable information from each face image. Second, when utilizing both the mean (m1) and variance (\(\mu ^{\prime }_{2}\)) moments as input data for CLBP, the CLBP_S(\(m_{1}, \mu ^{\prime }_{2}\)) and CLBP_M(\(m_{1}, \mu ^{\prime }_{2}\)) dramatically increased the classification rate compared to considering only the first-order moment (m1), and the fusion of the S and M components improved the performance further. Indeed, CLBP_S_M(m1) produced better results. For instance, when (P, R) = (4, 2), the best results obtained by CLBP_S_M(m1) with the number of training images N = 2, 4, 5, and 8 on the ORL dataset were 86.22%, 96.2%, 98.25%, and 99.45%, respectively. Similarly, the WSBP descriptors reached 87.91%, 96.77%, 98.51%, and 99.41%, respectively. On the other hand, WSBP and CLBP_S_M(\(m_{1},\mu ^{\prime }_{2}\)) performed similarly. For the YALE dataset (Table 2), our WSBP outperformed the other methods: 91.95%, 97.14%, 98.72%, and 99.44% at the scale of (P, R) = (4, 2).
Our proposed method was also compared with other state-of-the-art methods, as shown in Table 3, which summarized the technique and recognition rate of each method. The ten methods at the top, including ours, were based on hand-crafted features, and the three remaining ones were based on deep features. Our method achieved recognition rates of 98.51% and 98.72% on the ORL and YALE datasets, higher than the other methods, suggesting that our descriptor was robust against visual challenges such as illumination variation, facial expressions, head poses (multi-orientation), and occlusion.
5.3 Results with Caltech 1999 and KDEF datasets
Since the number of images per class in the Caltech 1999 dataset varied, we did not vary the number of training images per class as in the previous experiments. Instead, we randomly chose half of the images in each class as the training set, and the remaining ones were used as the testing set. Table 4 showed our results for three configurations of (P, R), suggesting that our WSBP and CLBP_S_M(\(m_{1},\mu ^{\prime }_{2}\)) achieved the highest recognition rates. Table 5 compared our results with a deep learning approach based on Deep Stacked Denoising Sparse Autoencoders (DSDSA) [13]. As can be seen from these tables, even using a single small scale (P, R) = (4, 2), our descriptors WSBP (Ours 1) and CLBP_S_M(\(m_{1},\mu ^{\prime }_{2}\)) reached recognition rates of 98.83% and 98.96%, respectively, which were greater than the performance of DSDSA. When we considered CLBP_S_M(\(m_{1},\mu ^{\prime }_{2}\)) at the scale of (P, R) = (6, 3), the performance was 99.03% (Ours 2).
For the KDEF dataset, we conducted the face recognition task while changing the number of training images per person to verify the accuracy of each train/test portion. Ntrain training images (Ntrain = 2, 3, 4, 5) were randomly chosen, while the remaining (7 − Ntrain) images were used for testing. The evaluation was repeated 100 times with data shuffling to obtain the average accuracy. Table 6 showed our results for three configurations of (P, R). Specifically, at the scale of (P, R) = (4, 2), our descriptor WSBP substantially increased the accuracy, up to 94.11%, 97.87%, 99.07%, and 99.33% for Ntrain = 2, 3, 4, and 5, respectively. Such high performance suggests that our descriptor can effectively deal with visual challenges such as diverse facial expressions, illumination, or occlusions.
5.4 Results with the AR dataset
5.4.1 Evaluation with gray-scale images
We carried out cross-validation for training and testing. Different numbers of training images were used (Ntrain = 10, 13, 15, 20), with the remaining (26 − Ntrain) images forming the testing set, guaranteeing that the testing set contained unseen images. Results over 100 shuffle splits were summarized in Table 7 for the recognition rate and in Table 8 for the comparison with other methods.
Table 7 showed the results of several LBPs obtained with various parameters. As can be seen, CLBP_S(m1) and CLBP_M(m1) provided the baseline results. With the specific parameter (P, R) = (6, 2), the best results obtained by CLBP_S(m1) with Ntrain = 10, 13, 15, 20 were 82.87%, 88.83%, 90.79%, and 95.90%, respectively; the results of CLBP_M(m1) were 80.09%, 86.22%, 89.83%, and 95.67%, respectively. However, the recognition rates improved significantly when bilaterally complementing the S and M components with CLBP_S_M(m1): 98.46%, 98.68%, 99.04%, and 99.93%, respectively. In this case, CLBP_S_M(m1) showed a significant improvement of up to 12.46% at Ntrain = 13 (compared to CLBP_M(m1) and CLBP_S(m1)). Moreover, the CLBP_S_M(\(m_{1},\mu ^{\prime }_{2}\)) and WSBP descriptors also significantly increased the performance, reaching 98.79% and 99.37%. With this parameter, our proposed WSBP framework outperformed CLBP_S_M(m1) (by 0.69%) and CLBP_S_M(\(m_{1}, \mu ^{\prime }_{2}\)) (by 0.58%). Notice that the improvement of WSBP could reach 13% compared to the original LBPs.
Table 8 compared our method with others. In terms of recognition rate, ours outperformed the state-of-the-art methods, including both hand-crafted and deep feature techniques. Our WSBP was better than Multi-resolution dictionary [30] (82.19%), MNTCDP [21] (96.18%), Local Multiple Patterns [49] (98.00%), and even the deep facial features of CS [2] (93.99%) by a substantial margin. The remaining algorithms, including EL-LBP [44] (98.27%) and the deep feature FDDL + CNN [39] (98%), were comparable with our descriptors, and yet ours prevailed.
5.4.2 Evaluation with color channel
The motivation of this experiment was to check the behavior of the facial descriptors on a color channel. The experiment was conducted with HSV color images while keeping the other experimental settings similar to those of the gray-scale case. First, an RGB color image was converted into an HSV color image. Second, the Hue channel was extracted from the HSV space, called the H image, and fed as input to our experiment. Our descriptors were able to extract the eyes, eyebrows, and mouth from the H image, probably because these areas have distinctive colors. And yet the Sign (S) and Magnitude (M) components, computed by CLBPα on m1, could not discriminate the subtle color changes occurring within the facial skin area, as shown in Fig. 13.
Illustration of resulting images for H (Hue) and gray-scale images. The upper part contains the mean (m1) and new variance (\(\mu ^{\prime }_{2} = \sqrt [9]{\mu _{2}}\)) images of H image by using the structuring element \({\mathscr{B}} = \{(1, 5); (2, 8)\}\) and its Sign-Magnitude components calculated by CLBP\(_{\alpha _{i}}\) operators. The lower part is similar to the upper one but using the gray-scale image with the structuring element \({\mathscr{B}} = \{(1, 8)\}\). All resulting images are put together for a comprehensive viewing
The evaluation on the color channel was conducted with the same protocol settings as the gray-scale case. Results over 100 shuffle splits were summarized in Table 9, with (P, R) = (4, 2) and Ntrain = 13 as the specific parameters. The results suggest that S worked better than M in all three cases. For instance, the accuracy of CLBP_S(m1), CLBP_S(\(m_{1},\mu ^{\prime }_{2}\)), and WSBP_S was 49.41%, 93.56%, and 90.28%, respectively, while that of CLBP_M(m1), CLBP_M(\(m_{1},\mu ^{\prime }_{2}\)), and WSBP_M was as low as 24.36%, 55.32%, and 57.06%, respectively, indicating that the combination of S and M somewhat impaired the overall accuracy compared to using S alone: the accuracy of CLBP_S_M(m1), CLBP_S_M(\(m_{1},\mu ^{\prime }_{2}\)), and WSBP reached 33.73%, 82.52%, and 83.05%, respectively.
CLBPα, inspired by CLBP [14], was designed for the gray-scale case to complement the crucial M component. It was not very effective in discriminating color changes within facial skin (see the Magnitude components of the H and gray-scale images in Fig. 13). On the other hand, the S component worked very well on the H image when utilizing the statistical moments (\(m_{1}, \mu ^{\prime }_{2}\)), with accuracy comparable to the state-of-the-art methods. For instance, the accuracy of CLBP_S(\(m_{1},\mu ^{\prime }_{2}\)) and WSBP_S reached 93.56% and 90.28%, respectively, while CLBP_S(m1) reached 49.41%; the margins over CLBP_S(m1) were thus 44.15% and 40.87%, respectively. These results suggest that our descriptor is designed to extract the spatial relationships of neighboring pixels, not simply to discriminate the magnitude between pixels.
5.5 Results with the FERET dataset
Previous works [10, 29] in the literature performed their experiments with a protocol using only the frontal sets (Fa, Fb, Fc, Duplicate I, and Duplicate II), where Fa with 1196 images served as the gallery and the others as probes. Unlike this protocol, we worked with a subset created from 1400 images (ba, bd, be, bf, bg, bj, bk), in which each person had two facial expression images, two left-pose images, two right-pose images, and one illumination image. This subset was more challenging than the previous one since it comprised not only frontal faces but also multiple orientations and expressions. Moreover, experimental results on this subset could reflect the changing accuracy rates of each train/test portion.
The experiment was carried out with Ntrain random training images per class (Ntrain = 1, 2, 3, 4, 5, 6) and Ntest testing images (Ntest = 7 − Ntrain), over 100 splits for the average accuracy. Table 10 showed the results achieved with CLBP\(_{\alpha _{i}}\) operators at various scales of (P, R). For most cases, WSBP obtained the best results and reached over 90% accuracy with only 2 training images. Table 11 compared a few recent methods and showed that our descriptors achieved the best performance. In detail, WSBP with 3 training images exceeded MNTCDP [21] by 2.57% on the challenging FERET dataset, which contains images under multiple orientations. As mentioned above, FERET has two different protocols for evaluation, and it would not be fair to compare methods evaluated under different protocols. Nevertheless, it is interesting to consider the previous reports. For this purpose, we included the average accuracy results of recent reports: CLBP [10], SLBFLE [29], and WPCBP+FLD (HI) [47] (see Table 11); these methods performed efficiently on the subset of frontal face cases.
5.6 Robustness against degraded images
In practical surveillance scenarios, image degradation often happens during the acquisition process and can significantly affect system performance. The motivation of this experiment was therefore to examine how our facial descriptors deal with such problems. In the first scenario, Gaussian noise was added to the original image at five different levels, levels = {10%, 20%, 30%, 40%, 50%}, using the Matlab function "imnoise". In the second scenario, occlusion was simulated by adding a white rectangle at a random position within the face region; each rectangle had a size varying from [20, 20] to [30, 60], generated with the Matlab function "insertShape". Figure 14 showed both scenarios.
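A rough Python stand-in for these two Matlab steps might look as follows; mapping the paper's percentage levels to a Gaussian variance is an assumption:

```python
import numpy as np

rng = np.random.default_rng(0)

def add_gaussian_noise(img, level):
    """Approximate imnoise(I, 'gaussian'): additive zero-mean noise on a
    uint8 image, with the noise variance set to the given level."""
    noisy = img / 255.0 + rng.normal(0.0, np.sqrt(level), img.shape)
    return (np.clip(noisy, 0.0, 1.0) * 255).astype(np.uint8)

def add_occlusion(img, min_hw=(20, 20), max_hw=(30, 60)):
    """White rectangle at a random position, mimicking the insertShape step."""
    h = int(rng.integers(min_hw[0], max_hw[0] + 1))
    w = int(rng.integers(min_hw[1], max_hw[1] + 1))
    y = int(rng.integers(0, img.shape[0] - h))
    x = int(rng.integers(0, img.shape[1] - w))
    out = img.copy()
    out[y:y + h, x:x + w] = 255
    return out
```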
In each scenario, five images per class were chosen as training samples and the rest as testing samples, over 100 data splits. The average recognition rates of the different methods were shown in Table 12. Here, we fine-tuned the structuring element \({\mathscr{B}}_{2} = \{(1, 5); (2, 6)\}\) to obtain the best results for WSBP and CLBP_S_M(\(m_{1},\mu ^{\prime }_{2}\)). This structuring element made our descriptors more robust against noise and occlusion compared to the other methods.
5.7 The processing time
This section describes the computational cost of several LBP-based descriptors. Experiments on the ORL dataset, with 400 images of 92 × 112 pixels, were carried out on a machine with a 3.5 GHz CPU, 32 GB RAM, and the Windows 10 64-bit operating system. Table 13 showed the computational cost from two aspects: first, the processing time of the feature descriptor extraction phase; and second, the processing time of the matching phase (in seconds), for various descriptors with three different configurations of (P, R). The processing time measured here was based on the structuring element \({\mathscr{B}} = \{(1, 6)\}\), where the training and testing sets each had 200 images.
Table 13 showed that WSBP required a longer processing time for the feature extraction and matching phases than CLBP_S(m1) or WSBP_S. Indeed, the processing time grew with the size of (P, R) due to the larger descriptor dimension. And yet, notice that our WSBP descriptor was efficient when compared with CLBP_S_M(\(m_{1},\mu ^{\prime }_{2}\)), since both recognition rates were approximately the same; see Tables 13 and 1.
6 Summary and discussion
Based on our experiments, we summarize and discuss several advantages of our proposed descriptors:
-
The WSBP descriptor is designed to extend the LBPs with local difference Sign-Magnitude distributions on statistical moments. As a pre-processing step, statistical moment images obtained by spatial support \({\mathscr{B}}\) of local filters can eliminate noise coming from contrast change or illumination variation (mean moment) and yet derive useful information from the salient regions in a face image (variance moment) (see Fig. 6).
-
The classical LBPs consider neighborhoods bilaterally on a circle, whereas our WSBP descriptors exploit CLBP\(_{\alpha _{i}}\) operators along multiple directions, i.e., four directions, independently and combine them in the final descriptor. We find that they are robust against different lighting conditions, head poses, and facial expressions, achieving high performance (see CLBP_S_M(m1), CLBP_S_M(\(m_{1}, \mu ^{\prime }_{2}\)), and WSBP in Tables 1, 2, 4, 7, and 10).
-
Since the WSBP is built by fusing CLBPs along four different directions {αi} = {0°, 45°, 90°, 135°}, it works well with a single scale (P, R) of the CLBP operators. It is then unnecessary to exploit a multi-scale approach with many parameters (P, R), which could lead to a high-dimensional descriptor.
-
Evaluation on six face datasets suggests that our descriptors outperform state-of-the-art methods, such as EL-LBP [44], AECLBP-S (B16) [22], Multi-resolution dictionary [30], DR-LBP + LDA [35], and LDENP [42]. Moreover, our WSBP descriptors achieve better results than some deep facial features, such as the Deep Belief Net (GDBN) [8], Deep Autoencoders (DSDSA) [13], Compressive Sensing (CS) [2], and FDDL + CNN [39] (see Tables 3, 5, and 8).
-
According to an additional experiment on the color channel, the Magnitude transform captures the relationship of pixel magnitudes on gray-scale images very well but is not effective on the Hue image (see Fig. 13), since the combination of the Sign-Magnitude components of CLBPα in Hue space performs worse than in the gray-scale case. Also, fusing the statistical moments (m1 and μ2) in CLBP_S and WSBP_S achieves higher accuracy in Hue space by ignoring texture pixel intensity. This evaluation suggests a new direction in face recognition problems, such as integrating many color channels to enhance face spoofing detection performance [43].
-
Although the YALE dataset contains some facial expression cases, it is interesting to test how our descriptor handles systematic variation of facial expressions. We therefore also use the KDEF dataset, which has seven facial expressions per subject, to study the effect of facial expressions. The results suggest that our descriptor deals with such cases very well.
-
Our facial descriptors using both mean (m1) and variance (μ2) moments have shown their robustness against degraded images in the evaluation on the ORL dataset with artificial noise. Given that a Gaussian noise level of 50% makes the degraded face challenging to recognize even for human eyes, our WSBP(\({\mathscr{B}}_{2}\)) descriptor still reaches acceptable accuracies of 93.05% for noise and 85.09% for occlusion, which are much higher than those of the other LBPs.
7 Conclusions and future work
We present a set of descriptors wherein the local difference distributions in local binary patterns are exploited by direction, and a weighting approach for binary patterns is applied to statistical moment images for an efficient and robust facial feature representation. A comprehensive evaluation on several standard face datasets was carried out to validate our proposal. We have analyzed the behavior of several descriptors on gray-scale images and found that our method mostly outperforms the state of the art. An analysis on color images has also been performed using the Hue channel of the AR dataset. We have further simulated a few practical scenarios that can occur during the data acquisition stage by adding Gaussian noise at various levels and random occlusion to the ORL dataset. The spatial support strategy can be understood as a special preprocessing technique to eliminate noise, and the choice of the structuring element \({\mathscr{B}}\) depends on the levels and types of noise. For the scenarios examined in this study, the structuring element of two circles eliminates noise very efficiently. Although degradation could lower the recognition performance, our experimental results remain higher than others, showing that our proposed descriptor is robust against image degradation. Overall, our experimental results suggest that the proposed descriptor is robust against noise, contrast change, illumination variation, and facial expressions, by exploiting different directions of the binary pattern operators on the mean moment and weighting each binary pattern by its contribution in the variance moment.
We expect these descriptors to find more applications in face recognition and in other areas such as facial paralysis analysis and face spoofing detection. Although our proposed framework is novel and high-performing, it has a few issues to be addressed: (1) the computational cost of matching increases when the descriptor dimension becomes larger; (2) it is necessary to fine-tune the optimal parameter k for the root extraction of the variance moment. We plan to focus on how to deal with these. It would also be interesting to combine the WSBP descriptors with deep neural networks to build powerful descriptors.
References
Ahonen T, Hadid A, Pietikainen M (2006) Face description with local binary patterns: application to face recognition. IEEE Trans Pattern Anal Mach Intell 28(12):2037–2041
Biswas S, Sil J, Maity S (2018) On prediction error compressive sensing image reconstruction for face recognition. Comput Electr Eng 70:722–735
Calvo MG, Lundqvist D (2008) Facial expressions of emotion (KDEF): Identification under different display-duration conditions. Behav Res Methods 40(1):109–115
Chakraborty S, Singh SK, Chakraborty P (2017) Local quadruple pattern: A novel descriptor for facial image recognition and retrieval. Comput Electric Engineer 62:92–104
Chakraborty S, Singh SK, Chakraborty P (2018) Centre symmetric quadruple pattern: A novel descriptor for facial image recognition and retrieval. Pattern Recogn Lett 115:50–58
Chakraborty S, Singh SK, Chakraborty P (2018) Local gradient hexa pattern: a descriptor for face recognition and retrieval. IEEE Trans Circ Syst Vid Technol 28(1):171–180
Chan CH, Yan F, Kittler J, Mikolajczyk K (2015) Full ranking as local descriptor for visual recognition: A comparison of distance metrics on sn. Pattern Recogn 48(4):1328–1336
Chen Y, Huang T, Liu H, Zhan D (2016) Multi-pose face ensemble classification aided by Gabor features and deep belief nets. Optik 127(2):946–954
Chihaoui M, Elkefi A, Bellil W, Ben Amar C (2016) A survey of 2D face recognition techniques. Computers 5
Deng W, Hu J, Guo J (2019) Compressive binary patterns: designing a robust binary face descriptor with random-field Eigenfilters. IEEE Trans Pattern Anal Mach Intell 41(3):758–767
Ding C, Choi J, Tao D, Davis LS (2016) Multi-directional multi-level dual-cross patterns for robust face recognition. IEEE Trans Pattern Anal Mach Intell 38(3):518–531
Ding C, Tao D (2015) Robust face recognition via multimodal deep face representation. IEEE Trans Multimed 17(11):2049–2058
Görgel P, Simsek A (2019) Face recognition via Deep Stacked Denoising Sparse Autoencoders (DSDSA). Appl Math Comput 355:325–342
Guo Z, Zhang L, Zhang D (2010) A completed modeling of local binary pattern operator for texture classification. IEEE Trans Image Process 19(6):1657–1663
Hernandez-Matamoros A, Bonarini A, Escamilla-Hernandez E, Nakano-Miyatake M, Perez-Meana H (2016) Facial expression recognition with automatic segmentation of face regions using a fuzzy based classification approach. Knowl-Based Syst 110:1–14
Huang P, Gao G, Qian C, Yang G, Yang Z (2017) Fuzzy linear regression discriminant projection for face recognition. IEEE Access 5:4340–4349
Işık Ṡ, Özkan K (2015) A comparative evaluation of well-known feature detectors and descriptors. Int J Appl Math Electron Comput 3:1–6
Jridi M, Napoléon T, Alfalou A (2018) One lens optical correlation: application to face recognition. Appl Opt 57(9):2087–2095
Karanwal S, Diwakar M (2020) Two novel color local descriptors for face recognition. Optik:166007
Karczmarek P, Kiersztyn A, Pedrycz W, Dolecki M (2017) An application of chain code-based local descriptor and its extension to face recognition. Pattern Recogn 65:26–34
Kas M, El merabet Y, Ruichek Y, Messoussi R (2018) Mixed neighborhood topology cross decoded patterns for image-based face recognition. Expert Syst Appl 114:119–142
Kas M, El-merabet Y, Ruichek Y, Messoussi R (2020) A comprehensive comparative study of handcrafted methods for face recognition LBP-like and non LBP operators. Multimed Tools Appl 79(1):375–413
Kortli Y, Jridi M, Al Falou A, Atri M (2020) Face recognition systems: a survey. Sensors 20(2)
Le T, Vo MT, Kieu T, Hwang E, Rho S, Baik SW (2020) Multiple electric energy consumption forecasting using a cluster-based strategy for transfer learning in smart building. Sensors 20(9)
Liang H, Gao J, Qiang N (2020) A novel framework based on wavelet transform and principal component for face recognition under varying illumination. Appl Intell. https://doi.org/10.1007/s10489-020-01924-9
Liao S, Jain AK, Li SZ (2016) A fast and accurate unconstrained face detector. IEEE Trans Pattern Anal Mach Intell 38(2):211–223
Lin J, Chiu CT (2017) Low-complexity face recognition using contour-based binary descriptor. IET Image Process 11(12):1179–1187
Liouane Z, Lemlouma T, Roose P, Weis F, Messaoud H (2018) An improved extreme learning machine model for the prediction of human scenarios in smart homes. Appl Intell 48:2017–2030. https://doi.org/10.1007/s10489-017-1062-5
Lu J, Liong VE, Zhou J (2018) Simultaneous local binary feature learning and encoding for homogeneous and heterogeneous face recognition. IEEE Trans Pattern Anal Mach Intell 40(8):1979–1993
Luo X, Xu Y, Yang J (2019) Multi-resolution dictionary learning for face recognition. Pattern Recogn 93:283–292
Martinez A, Benavente R (1998) The ar face database. CVC Technical Report 24
Mehta R, Egiazarian K (2016) Dominant Rotated Local Binary Patterns (DRLBP) for texture classification. Pattern Recogn Lett 71:16–22
Mi J, Liu T (2016) Multi-step linear representation-based classification for face recognition. IET Comput Vis 10(8):836–841
Moussa M, Hmila M, Douik A (2018) A novel face recognition approach based on genetic algorithm optimization. Stud Inf Control 27(1):127–134
Najafi Khanbebin S, Mehrdad V (2020) Local improvement approach and linear discriminant analysis-based local binary pattern for face recognition. Neural Comput Appl. https://doi.org/10.1007/s00521-020-05512-3
Napoléon T, Alfalou A (2017) Pose invariant face recognition: 3D model from single photo. Opt Lasers Eng 89:150–161
Nguyen TP, Vu NS, Manzanera A (2016) Statistical binary patterns for rotational invariant texture classification. Neurocomputing 173:1565–1577
Ojala T, Pietikainen M, Maenpaa T (2002) Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Trans Pattern Anal Mach Intell 24(7):971–987
Ouanan H, Ouanan M, Aksasse B (2018) Non-linear dictionary representation of deep features for face recognition from a single sample per person. Procedia Comput Sci 127:114–122
Pham NT, Lee JW, Park CS (2020) Structural correlation based method for image forgery classification and localization. Appl Sci 10(13)
Phillips PJ, Moon H, Rizvi SA, Rauss PJ (2000) The FERET evaluation methodology for face-recognition algorithms. IEEE Trans Pattern Anal Mach Intell 22(10):1090–1104
Pillai A, Soundrapandiyan R, Satapathy S, Satapathy SC, Jung KH, Krishnan R (2018) Local diagonal extrema number pattern: A new feature descriptor for face recognition. Futur Gener Comput Syst 81:297–306
Shu X, Tang H, Huang S (2020) Face spoofing detection based on chromatic ED-LBP texture feature. Multimed Syst
Truong HP, Kim YG (2018) Enhanced line local binary patterns (EL-LBP): An efficient image representation for face recognition. In: Proc of 2018 ACIVS, pp 285–296
Truong HP, Vo TMD, Le T (2016) Face recognition based on LDA in manifold subspace. EAI Endorsed Trans Context-aware Syst Appl 3(9)
Vo AH, Hoang Son L, Vo MT, Le T (2019) A novel framework for trash classification using deep transfer learning. IEEE Access 7:178631–178639
Xu Z, Jiang Y, Wang Y, Zhou Y, Li W, Liao Q (2019) Local polynomial contrast binary patterns for face recognition. Neurocomputing 355:1–12
Yang W, Wang Z, Zhang B (2016) Face recognition using adaptive local ternary patterns method. Neurocomputing 213:183–190
Yang W, Zhang X, Li J (2020) A local multiple patterns feature descriptor for face recognition. Neurocomputing 373:109–122
Acknowledgements
This work was supported by the Institute of Information & communications Technology Planning & Evaluation (IITP), grant funded by the Korea government (MSIT) (No.2019-0-00231) as well as by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (2020R1A6A1A03038540).
We would like to express our deep gratitude to the reviewers and editors, who pointed out the valuable and insightful remarks allowing us to clarify the presentation of this work.
Ethics declarations
Conflict of Interests
The authors declare that they have no conflict of interest.