Deep Learning Face Attributes in the Wild
Predicting face attributes in the wild is challenging due to complex face variations. We propose a novel deep learning framework for attribute prediction in the wild. It cascades two CNNs, LNet and ANet, which are fine-tuned jointly with attribute tags, but pre-trained differently. LNet is pre-trained by massive general object categories for face localization, while ANet is pre-trained by massive face identities for attribute prediction.
Figure 2. The proposed pipeline of attribute prediction: (a) LNeto, (b) LNets, (c) ANet, (d) extracting features to predict attributes. (Best viewed in color)
... accuracy of face localization. Both LNeto and LNets have network structures similar to AlexNet [13], whose hyper-parameters are specified in Fig.2 (a) and (b) respectively. The fifth convolutional layer (C5) of LNeto indicates head-shoulders while C5 of LNets indicates faces, as shown by the highly responding regions in their averaged response maps. Moreover, the input x_o of LNeto is an m × n image, while the input x_s of LNets is the head-shoulder region, which is localized by LNeto and resized to 227 × 227.

As illustrated in Fig.2 (c), ANet is learned to predict attributes y given the input face region x_f, which is detected by LNets and properly resized. Specifically, multi-view versions [13] of x_f are utilized to train ANet. Furthermore, ANet contains four convolutional layers, where the filters of C1 and C2 are globally shared and the filters of C3 and C4 are locally shared. The effectiveness of local filters has been demonstrated in many face related tasks [26, 28]. To handle complex face variations, ANet is pre-trained by distinguishing massive face identities, which facilitates the learning of discriminative features.

Fig.2 (d) outlines the procedure of attribute recognition. ANet extracts a set of feature vectors (FCs) by cropping overlapping patches on x_f. An efficient feed-forward algorithm is developed to reduce redundant computation in the feature extraction stage. SVMs [8] are trained to predict attribute values given each FC. The final prediction is obtained by averaging all these values, to cope with small misalignments of face localization.

2.1. Face Localization

The cascade of LNeto and LNets accurately localizes face regions by being trained on image-level attribute tags.

Pre-training LNet Both LNeto and LNets are pre-trained with 1,000 general object categories from the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2012 [6], containing 1.2 million training images and 50 thousand validation images. All the data is employed for pre-training except one third of the validation data, which is used for choosing hyper-parameters [13]. We augment the data by cropping ten patches from each image, including one patch at the center and four at the corners, together with their horizontal flips. We adopt softmax for object classification, which is optimized by stochastic gradient descent (SGD) with back-propagation (BP) [16]. As shown in Fig.3 (a.2), the averaged response map in C5 of LNeto already indicates the locations of objects, including human faces, after pre-training.

Fine-tuning LNet Both LNeto and LNets are fine-tuned with attribute tags. Additional output layers are added to the LNets individually for fine-tuning and then removed for evaluation. LNeto adopts the full image x_o as input, while LNets uses the highly responding region x_s in the averaged response map in C5 of LNeto as input, which roughly corresponds to the head-shoulder region. The cross-entropy loss is used for attribute classification, i.e. $L = \sum_{i} \big[ y_i \log p(y_i|x) + (1 - y_i) \log\big(1 - p(y_i|x)\big) \big]$, where $p(y_i = 1|x) = \frac{1}{1+\exp(-f(x))}$ is the probability of the i-th attribute given image x. As shown in Fig.3 (a.3), the response maps after fine-tuning become much cleaner and smoother, indicating that the filters learned with attribute tags can detect face patterns with complex variations. To appreciate the effectiveness of pre-training, we also include in Fig.3 (a.4) the averaged response map in C5 of LNet when it is trained from scratch with attribute tags but without pre-training; it cannot separate face regions from the background and other body parts well.

Thresholding and Proposing Windows We show that the responses of C5 in LNet are discriminative enough to separate faces and background by simply searching for a threshold, such that a window with a response larger than this threshold corresponds to a face and otherwise to background.
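For concreteness, the cross-entropy loss used for fine-tuning above can be sketched as follows. This is our illustration, not the authors' implementation; it assumes the network produces one raw score per attribute and negates the log-likelihood so that the quantity is minimized.

```python
import numpy as np

def attribute_cross_entropy(scores, tags, eps=1e-12):
    """Sigmoid cross-entropy summed over attributes.

    scores: raw network outputs f(x), one entry per attribute
    tags:   binary attribute labels y_i in {0, 1}, same shape
    """
    p = 1.0 / (1.0 + np.exp(-scores))      # p(y_i = 1 | x)
    p = np.clip(p, eps, 1.0 - eps)         # numerical safety
    return -np.sum(tags * np.log(p) + (1.0 - tags) * np.log(1.0 - p))

# toy usage: 5 attributes, e.g. [Smiling, Male, Eyeglasses, Young, Bangs]
scores = np.array([2.1, -0.7, 0.3, 1.5, -2.0])
tags = np.array([1, 0, 1, 1, 0], dtype=float)
print(attribute_cross_entropy(scores, tags))
```

In practice this loss is averaged over a mini-batch and optimized with SGD and back-propagation, as described above.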
[Figure 3: averaged response maps of LNet, panels (a.1)-(a.4); (b) distributions of C5 responses on face images versus background images (percentage of images as a function of the threshold).]
[Figure: (a) single detector, (b) multi-view detector, (c) face localization by attributes (views and attribute configurations).]
Figure 5. Detailed pipeline of efficient feature extraction in ANet: (a) global convolution, (b) local convolution, (c) feature extraction with interweaved operation, (d) interweaved operation.
... we have $L = \sum_{i=1,\,y_i=y_j}^{|D|} \lVert \mathrm{FC}_i - \mathrm{FC}_j \rVert_2^2$, where FC_i and FC_j denote the feature vectors of the i-th and j-th face images respectively, and y_i = y_j indicates that the identities of these samples are the same. In summary, ANet is pre-trained by combining the softmax loss and the similarity loss.

Efficient Feature Extractions At test time, ANet is evaluated on multiple patches of the face region as shown in Fig.2 (d), leading to redundant convolutional computations because of the large overlaps between these patches. When all the filters are globally shared, the computational cost can be reduced by applying [11], which convolves the filters over the input image and then obtains a feature vector for each patch by pooling over the last convolutional layer. Given a simple example with one convolutional layer as shown in Fig.5 (a), the feature vector FC for each patch (e.g. the rectangle in red) can be extracted by pooling in the corresponding region of the response map h^(1), without evaluating convolutions in the input image patch by patch. Therefore, the convolutions are shared by every patch.

However, this scheme is not applicable when we have more than two convolutional layers whose filters are locally shared. An example is illustrated in Fig.5 (b), where each patch is equally divided into 3 × 3 = 9 cells and we learn different filters for different cells. To reduce computations in the first convolutional layer, each local filter can be applied on the entire image, resulting in a response map with nine channels, i.e. h_i^(1) with i = 1...9. The final response map h^(1) is obtained by cropping and padding the regions (i.e. the rectangles in black) in these nine channels. As a result, each feature vector FC can be pooled from h^(1) without convolving the input image patch by patch. Nevertheless, since h^(1) corresponds to a patch of the input image, the succeeding local convolutions have to be handled patch by patch, leading to redundant computations.

To this end, we propose an interweaved operation, which is a fast feed-forward method for CNNs with locally shared filters. Suppose we have four local filters in the next locally convolutional layer and each filter is applied on 2 × 2 cells of h^(1) as shown in (b). These cells are the receptive fields of the filters, namely {1, 2, 4, 5}, {2, 3, 5, 6}, {4, 5, 7, 8}, and {5, 6, 8, 9}. Instead of directly applying the local filters on h^(1), the interweaved operation generates an interweaved map I_i^(1) for each filter, where i = 1...4. Each local filter is then applied on its corresponding interweaved map. Since the interweaved map covers the entire image, each local filter is turned into a global filter such that its computation can be shared across different patches.

Specifically, each interweaved map, e.g. I_1^(1), is obtained by padding the cells of the corresponding channels in an interweaved manner, e.g. h_i^(1) for i in {1, 2, 4, 5}, as shown in Fig.5 (d). All of the interweaved maps are illustrated in Fig.5 (c). After that, each of the four local filters is applied on its corresponding interweaved map, leading to four response maps h_i^(2), where i = 1...4. As a result, the feature vector FC is pooled and concatenated from the receptive fields of the filters, which are the rectangles in black as shown in (c).

Intuitively, instead of padding cells according to the receptive fields of all the local filters (e.g. h^(1) in (b)), which has to be done in a patch-by-patch way, the interweaved operation pads the cells with respect to the receptive field of each local filter over the entire image. This enables extracting multiple feature vectors with only one pass of feed-forward evaluation, and the operation can be repeated when more locally convolutional layers are added. The proposed feature extraction scheme achieves a 6× speedup empirically when compared with patch-by-patch scanning. It is applicable to CNNs with local filters and compatible with all existing CNN operations.
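To make the computation sharing concrete, the following NumPy sketch (ours, not the authors' code) demonstrates the first step discussed above: each locally-shared filter is convolved once over the entire image, and the per-patch, per-cell responses are then recovered by cropping, matching patch-by-patch evaluation exactly. The interweaved operation extends this idea to subsequent locally convolutional layers.

```python
import numpy as np

def corr2d_valid(image, kernel):
    """Plain 'valid' cross-correlation of a 2-D image with a 2-D kernel."""
    H, W = image.shape
    k = kernel.shape[0]
    out = np.empty((H - k + 1, W - k + 1))
    for r in range(out.shape[0]):
        for s in range(out.shape[1]):
            out[r, s] = np.sum(image[r:r + k, s:s + k] * kernel)
    return out

rng = np.random.default_rng(0)
k, cell, grid = 3, 8, 3                          # filter size, cell size, 3x3 cells per patch
patch = cell * grid                              # 24x24 patches
image = rng.normal(size=(40, 40))
filters = rng.normal(size=(grid * grid, k, k))   # one locally-shared filter per cell
offsets = [(0, 0), (0, 8), (8, 4), (16, 16)]     # top-left corners of overlapping patches

# (a) naive: convolve every cell of every patch separately
def naive(patch_img):
    maps = []
    for idx, (ci, cj) in enumerate((i, j) for i in range(grid) for j in range(grid)):
        cell_img = patch_img[ci * cell:(ci + 1) * cell, cj * cell:(cj + 1) * cell]
        maps.append(corr2d_valid(cell_img, filters[idx]))
    return maps

# (b) shared: run each local filter over the WHOLE image once, then crop per patch/cell
full_maps = [corr2d_valid(image, f) for f in filters]   # computed once, reused by all patches

for (pr, pc) in offsets:
    ref = naive(image[pr:pr + patch, pc:pc + patch])
    for idx, (ci, cj) in enumerate((i, j) for i in range(grid) for j in range(grid)):
        top, left = pr + ci * cell, pc + cj * cell
        crop = full_maps[idx][top:top + cell - k + 1, left:left + cell - k + 1]
        assert np.allclose(crop, ref[idx])
print("shared convolution matches patch-by-patch evaluation")
```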
Figure 6. Averaged response maps of LNet, including (a) CelebA, (b) MobileFaces, (c) some failure cases. (Best viewed in color)
Figure 7. ROC curves on (a) CelebA and (b) MobileFaces. (c) Recall rates w.r.t. overlap ratio (FPPI = 0.1). (d) Recall rates w.r.t. number of attributes (FPPI = 0.1).

Figure 9. Attribute-specific regions discovery.

3. Experiments

Large-scale Data Collection We construct two face attribute datasets, namely CelebA and LFWA, by labeling images selected from two challenging face datasets, CelebFaces [27] and LFW [12]. CelebA contains ten thousand identities, each of which has twenty images; there are two hundred thousand images in total. LFWA has 13,233 images of 5,749 identities. Each image in CelebA and LFWA is annotated with forty face attributes and five key points by a professional labeling company. CelebA and LFWA have over eight million and five hundred thousand attribute labels, respectively.

CelebA is partitioned into three parts. Images of the first eight thousand identities (with 160 thousand images) are used to pre-train and fine-tune ANet and LNet, the images of another one thousand identities (with twenty thousand images) are employed to train the SVMs, and the images of the remaining one thousand identities (with twenty thousand images) are used for testing. LFWA is partitioned into half for training and half for testing. Specifically, 6,263 images are adopted to train the SVMs and the remaining images are used for testing. When being evaluated on LFWA, LNet and ANet are trained on CelebA.

Methods for Comparisons The proposed method is compared with three competitive approaches, i.e. FaceTracer [14], PANDA-w [32], and PANDA-l [32]. FaceTracer extracts HOG and color histograms in several important functional face regions and then trains SVM for attribute classification; we extract these functional regions referring to the ground truth landmark points. PANDA-w and PANDA-l are based on PANDA [32], which was proposed recently for human attribute recognition by ensembling multiple CNNs, each of which extracts features from a well-aligned human part. These features are concatenated to train SVM for attribute recognition. It is straightforward to adapt this method to face attributes, since face parts can be well aligned by landmark points. Here, we consider two settings: PANDA-w obtains the face parts by applying state-of-the-art face detection [17] and alignment [26] on wild images, while PANDA-l obtains the face parts by using ground truth landmark points. For fair comparison, all the above methods are trained with the same data as ours.

3.1. Effectiveness of the Framework

This section demonstrates the effectiveness of the framework. All experiments in this section are done on CelebA.

• LNet

Performance Comparison We compare LNet with four state-of-the-art face detectors, including DPM [21], ACF Multi-view [30], SURF Cascade [17], and Face++ [1]. We evaluate them using ROC curves at IoU ≥ 0.5, where IoU denotes Intersection over Union. As plotted in Fig.7(a), when FPPI = 0.01, the true positive rates of Face++ and LNet are 85% and 93%; when FPPI = 0.1, our method outperforms the other three methods by 11, 9 and 22 percent respectively. We also investigate how these methods perform with respect to the overlap ratio (IoU), following [34, 21].
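For reference, the overlap criterion used in this evaluation (a detection counts as correct when its IoU with the ground truth box is at least 0.5) can be computed as in the small sketch below; boxes are assumed to be given as (x1, y1, x2, y2) corners, an illustrative convention rather than the paper's evaluation code.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    return inter / (area_a + area_b - inter)

# a predicted face box versus a ground-truth box
print(iou((10, 10, 110, 110), (30, 20, 120, 130)))  # ~0.57, counted as correct at the 0.5 threshold
```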
[Figure 8 panels: (a.1) Gender, (a.2) Hair Color, (a.3) Age, (a.4) Race, (a.5) Face Shape, (a.6) Eye Shape, each shown as mean faces from high to low response; (b.1)-(b.3) test images with activated neurons such as Bangs, Brown Hair, Pale Skin, Narrow Eyes and High Cheekbones.]
Figure 8. Visualization of neurons in ANet (a) after pre-training (b) after fine-tuning (Best viewed in color)
Figure 10. (a) Layer-wise comparison of ANet after pre-training. (b) Analysis of the best performing neurons of ANet after fine-tuning; the x-axis is the percentage of best performing neurons used. Best performing neurons are different for different attributes, and the reported accuracies are averaged over attributes, each of which selects its own subset of best performing neurons.

Fig.7(c) shows that LNet generally provides more accurate face localization, leading to good performance in the subsequent attribute prediction.

Further Analysis LNet significantly outperforms LNet (without pre-training) by 74 percent when the overlap ratio equals 0.5, which validates the effectiveness of pre-training, as shown in Fig.7(c). We then explore the influence of the number of attributes on localization: Fig.7(d) illustrates that rich attribute information facilitates face localization. To examine the generalization ability of LNet, we collect another 3,876 face images for testing, namely MobileFaces, which comes from a different source and has a different distribution from CelebA (MobileFaces was collected by normal users with mobile phones, while CelebA and LFWA collected face images of celebrities taken by professional photographers). Several examples of MobileFaces are shown in Fig.6(b) and the corresponding ROC curves are plotted in Fig.7(b). We observe that LNet consistently performs better and still gains a 7 percent improvement (FPPI = 0.1) compared with the other face detectors. Despite some failure cases due to extreme poses and large occlusions, LNet accurately localizes faces in the wild, as demonstrated in Fig.6. More results of LNet under different circumstances (lighting, pose, occlusion, image resolution, background clutter, etc.) are shown in Fig.14.

Attribute-specific Regions Discovery Different attributes capture information from different regions of the face. We show that LNet automatically learns to discover these regions. Given an attribute, by converting the fully connected layers of LNet into fully convolutional layers following [18], we can locate the important region for this attribute. Fig.9 shows some examples. The important regions of some attributes are locally distributed, such as 'Bags Under Eyes', 'Straight Hair' and 'Wearing Necklace', while others are globally distributed, such as 'Young', 'Male' and 'Attractive'.

• ANet

Pre-training Discovers Semantic Concepts We show that pre-training of ANet can implicitly discover semantic concepts related to face identity. Given a hidden neuron at the FC layer of ANet as shown in Fig.2(c), we partition the face images into three groups, containing the face images with high, medium, and low responses at this neuron. The face images of each group are then averaged to obtain the mean face. We visualize these mean faces for several neurons in Fig.8(a). Interestingly, these mean faces change smoothly from high response to low response, following a high-level concept, and humans can easily assign each neuron a semantic concept it measures (i.e. the text in yellow). For example, the neurons in (a.1) and (a.4) correspond to 'gender' and 'race', respectively.
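The mean-face visualization described above is simple to reproduce; a rough sketch follows (array names such as `faces` and `act` are illustrative assumptions, not from the authors' code).

```python
import numpy as np

def mean_faces_by_response(faces, act):
    """Split images into high / medium / low response groups for one neuron
    and return the average image of each group.

    faces: float array of shape (N, H, W); act: shape (N,) neuron activations.
    """
    order = np.argsort(act)[::-1]              # most activated images first
    n = len(order)
    groups = {
        "high": order[: n // 3],
        "medium": order[n // 3: 2 * n // 3],
        "low": order[2 * n // 3:],
    }
    return {name: faces[idx].mean(axis=0) for name, idx in groups.items()}

# toy usage with random data standing in for aligned face crops
rng = np.random.default_rng(0)
faces = rng.random((300, 64, 64))
act = rng.normal(size=300)
means = mean_faces_by_response(faces, act)
print({k: v.shape for k, v in means.items()})
```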
Table 1. Performance comparison of attribute prediction (accuracy, %). Note that FaceTracer and PANDA-l obtain the face parts by using ground truth landmark points. Each block lists its attribute columns in order, followed by one row of accuracies per method on CelebA and on LFWA; the final column of the second block is the average over all forty attributes.

Attributes (columns 1-21): 5 Shadow, Arch. Eyebrows, Attractive, Bags Un. Eyes, Bald, Bangs, Big Lips, Big Nose, Black Hair, Blond Hair, Blurry, Brown Hair, Bushy Eyebrows, Chubby, Double Chin, Eyeglasses, Goatee, Gray Hair, Heavy Makeup, H. Cheekbones, Male

CelebA
FaceTracer [14]    85 76 78 76 89 88 64 74 70 80 81 60 80 86 88 98 93 90 85 84 91
PANDA-w [32]       82 73 77 71 92 89 61 70 74 81 77 69 76 82 85 94 86 88 84 80 93
PANDA-l [32]       88 78 81 79 96 92 67 75 85 93 86 77 86 86 88 98 93 94 90 86 97
[17]+ANet          86 75 79 77 92 94 63 74 77 86 83 74 80 86 90 96 92 93 87 85 95
LNets+ANet(w/o)    88 74 77 73 95 92 66 75 84 91 80 78 85 86 88 96 92 93 85 84 94
LNets+ANet         91 79 81 79 98 95 68 78 88 95 84 80 90 91 92 99 95 97 90 87 98

LFWA
FaceTracer [14]    70 67 71 65 77 72 68 73 76 88 73 62 67 67 70 90 69 78 88 77 84
PANDA-w [32]       64 63 70 63 82 79 64 71 78 87 70 65 63 65 64 84 65 77 86 75 86
PANDA-l [32]       84 79 81 80 84 84 73 79 87 94 74 74 79 69 75 89 75 81 93 86 92
[17]+ANet          78 66 75 72 86 84 70 73 82 90 75 71 69 68 70 88 68 82 89 79 91
LNets+ANet(w/o)    81 78 80 79 83 84 72 76 86 94 70 73 79 70 74 92 75 81 91 83 91
LNets+ANet         84 82 83 83 88 88 75 81 90 97 74 77 82 73 78 95 78 84 95 88 94

Attributes (columns 22-41): Mouth S. O., Mustache, Narrow Eyes, No Beard, Oval Face, Pale Skin, Pointy Nose, Reced. Hairline, Rosy Cheeks, Sideburns, Smiling, Straight Hair, Wavy Hair, Wear. Earrings, Wear. Hat, Wear. Lipstick, Wear. Necklace, Wear. Necktie, Young, Average

CelebA
FaceTracer [14]    87 91 82 90 64 83 68 76 84 94 89 63 73 73 89 89 68 86 80 81
PANDA-w [32]       82 83 79 87 62 84 65 82 81 90 89 67 76 72 91 88 67 88 77 79
PANDA-l [32]       93 93 84 93 65 91 71 85 87 93 92 69 77 78 96 93 67 91 84 85
[17]+ANet          85 87 83 91 65 89 67 84 85 94 92 70 79 77 93 91 70 90 81 83
LNets+ANet(w/o)    86 91 77 92 63 87 70 85 87 91 88 69 75 78 96 90 68 86 83 83
LNets+ANet         92 95 81 95 66 91 72 89 90 96 92 73 80 82 99 93 71 93 87 87

LFWA
FaceTracer [14]    77 83 73 69 66 70 74 63 70 71 78 67 62 88 75 87 81 71 80 74
PANDA-w [32]       74 77 68 63 64 64 68 61 64 68 77 68 63 85 78 83 79 70 76 71
PANDA-l [32]       78 87 73 75 72 84 76 84 73 76 89 73 75 92 82 93 86 79 82 81
[17]+ANet          76 79 74 69 66 68 72 70 71 72 82 72 65 87 82 86 81 72 79 76
LNets+ANet(w/o)    78 87 77 75 71 81 76 81 72 72 88 71 73 90 84 92 83 76 82 79
LNets+ANet         82 92 81 79 74 84 80 85 78 77 91 76 76 94 88 95 88 79 86 84
It reveals that the high-level hidden neurons of ANet can implicitly learn to discover semantic concepts, even though they are only optimized for face recognition using identity information and attribute labels are not used in pre-training. We also observe that most of these concepts are intrinsic to face identity, such as the shape of facial components, gender, and race.

To better explain this phenomenon, we compare the accuracy of attribute prediction using features at different layers of ANet right after pre-training, namely FC, C4, and C3. The forty attributes are roughly separated into two groups: identity-related attributes, such as gender and race, and identity-non-related attributes, e.g. attributes of expressions, wearing hat and sunglasses. We select some representative attributes for each group and plot the results in Fig.10(a), which shows that FC outperforms C4 and C3 on identity-related attributes, but is relatively weaker on identity-non-related attributes. This is because the top layer FC learns identity features, which are insensitive to intra-personal face variations.

Fine-tuning Expands Semantic Concepts Fig.8 shows that after fine-tuning, ANet can expand these concepts to more attribute types. Fig.8(b) visualizes the neurons in the FC layer, ranked by their responses in descending order with respect to several test images. Humans can assign semantic meaning to each of these neurons, and a large number of new concepts can be observed. Remarkably, these neurons express diverse high-level meanings and cooperate to explain the test images. The activations of all the neurons are visualized in Fig.8(b), and they are sparse. In some sense, the attributes present in each test image are explained by a sparse linear combination of these concepts. For instance, the first image is described as "a lady with bangs, brown hair, pale skin, narrow eyes and high cheekbones", which well matches human perception.

To validate this, we explore how the number of neurons influences attribute prediction accuracies. Best performing neurons for each attribute are identified by sorting the corresponding SVM weights. Fig.10(b) illustrates that only 10% of ANet's best performing neurons are needed to achieve 90% of the original performance of a particular attribute (the best performing neurons are different for different attributes). In contrast, HOG+PCA does not have this sparse nature and needs more than 95% of the features. Besides, the best single performing neuron of ANet outperforms that of HOG+PCA by 25 percent in average prediction accuracy.
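A possible reading of the neuron-selection analysis above, sketched with synthetic data (our illustration: we rank neurons by the magnitude of the corresponding linear-SVM weights, which is one way to interpret "sorting SVM weights"):

```python
import numpy as np

def accuracy_with_top_neurons(features, labels, svm_w, svm_b, keep_frac):
    """Zero out all but the highest-|weight| neurons for one attribute's linear SVM
    and report the resulting classification accuracy.

    features: (N, D) FC activations; labels: (N,) in {0, 1};
    svm_w: (D,) linear SVM weights; svm_b: bias; keep_frac: fraction of neurons kept.
    """
    ranked = np.argsort(np.abs(svm_w))[::-1]             # best-performing neurons first
    keep = ranked[: max(1, int(keep_frac * len(ranked)))]
    mask = np.zeros_like(svm_w)
    mask[keep] = 1.0
    scores = features @ (svm_w * mask) + svm_b
    return np.mean((scores > 0).astype(int) == labels)

# toy usage with synthetic data standing in for FC features and a trained SVM
rng = np.random.default_rng(0)
features = rng.normal(size=(1000, 256))
svm_w = rng.normal(size=256) * (rng.random(256) < 0.1)   # only a few informative neurons
svm_b = 0.0
labels = (features @ svm_w + svm_b > 0).astype(int)
for frac in (0.1, 0.5, 1.0):
    print(frac, accuracy_with_top_neurons(features, labels, svm_w, svm_b, frac))
```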
Table 2. Performance comparison on 19 extended attributes (measured by the average of true positive rates and true negative rates, %). The extended attributes are A. Eye., Asian, B. Eye., B. Nose, Bald, Black, Black H., Blond H., Eye., Gender, M. Aged, Mustache, No Beard, No Eye., R. Hair., R. Jaw, Senior, White, and Youth; the last value of each row is the average.

FaceTracer [14]    91 87 86 75 66 54 70 66 68 72 84 86 83 76 72 66 65 81 51 73
POOF [2]           92 90 81 90 71 60 80 67 75 67 87 90 86 72 74 71 68 77 55 76
LNets+ANet         94 85 83 87 80 77 81 86 89 84 85 84 86 83 82 75 79 78 81 83
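The metric reported in Table 2, the average of the true positive rate and the true negative rate, can be computed per attribute as in this small sketch (our illustration, not the original evaluation script):

```python
import numpy as np

def balanced_rate(pred, label):
    """Average of true positive rate and true negative rate for one attribute."""
    pred, label = np.asarray(pred), np.asarray(label)
    tpr = np.mean(pred[label == 1] == 1)   # recall on positive samples
    tnr = np.mean(pred[label == 0] == 0)   # recall on negative samples
    return 0.5 * (tpr + tnr)

print(balanced_rate([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0]))  # TPR 2/3, TNR 2/3 -> 0.667
```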
Figure 12. Performance comparison of FaceTracer [14], PANDA-w [32], PANDA-l [32] and LNets+ANet on LFWA+.

Figure 13. Performance with different training dataset sizes.

Automatic Attributes Grouping Here we show that the weight matrix at the FC layer of ANet can implicitly capture relations between attributes. Each column vector of the weight matrix can be viewed as a decision hyperplane that partitions the negative and positive samples of an attribute. By simply applying k-means to these vectors, the resulting clusters show clear grouping patterns that can be interpreted semantically. As shown in Fig.11, Group #1, Group #2 and Group #4 demonstrate co-occurrence relationships between attributes, e.g. 'Attractive' and 'Heavy Makeup' have high correlation. Attributes in Group #3 share similar color descriptors, while attributes in Group #6 correspond to certain texture and appearance traits.

3.2. Attribute Prediction

Performance Comparison The attribute prediction performance is reported in Table 1. On CelebA, the prediction accuracies of FaceTracer [14], PANDA-w [32], PANDA-l [32], and our LNets+ANet are 81, 79, 85, and 87 percent respectively, while the corresponding accuracies on LFWA are 74, 71, 81, and 84 percent. Our method outperforms PANDA-w by nearly 10 percent. Remarkably, even when PANDA-l is equipped with ground truth bounding boxes and landmark positions, our method still achieves a 3 percent gain. The strength of our method is illustrated not only on global attributes, e.g. "Chubby" and "Young", but also on fine-grained facial traits, e.g. "Mustache" and "Pointy Nose". We also report performance on 19 extended attributes and compare our results with [14] and [2], using the same evaluation protocol as [2]. In Table 2, LNets+ANet outperforms them by 10 and 7 percent respectively. We also experiment with providing ANet with the face region localized by LNets but without pre-training, denoted as LNets+ANet(w/o); the average accuracies drop by 4 and 5 percent on CelebA and LFWA, which indicates that pre-training with massive facial identities helps discover semantic concepts.

Performance on LFWA+ To further examine whether the proposed approach can be generalized to unseen attributes, we manually label 30 more attributes for the testing images of LFWA and denote this extended dataset as LFWA+. To test on these 30 attributes, we directly transfer the weights learned by the deep models to extract features, and only re-train the SVMs using one third of the images. LNets+ANet leads to 8, 10, and 3 percent average gains over the other three approaches (FaceTracer, PANDA-w, and PANDA-l). This demonstrates that our method learns discriminative face representations and has good generalization ability.

Size of Training Dataset We compare the attribute prediction accuracy of the proposed method with that of PANDA-l for different sizes of training datasets. Only the training data of ANet is changed in our method for fair comparison. Fig.13 demonstrates that LNets+ANet still performs well when the dataset size is small, while the performance of PANDA-l drops significantly.

Time Complexity For a 300 × 300 image, LNets takes 35 ms to localize the face region and ANet takes 14 ms to extract features on a GPU. In contrast, naïve patch-by-patch scanning needs nearly 80 ms to extract features. Our framework therefore has large potential in real-world applications.

4. Conclusion

This paper has proposed a novel deep learning framework for face attribute prediction in the wild. With carefully designed pre-training strategies, our method is robust to background clutter and face variations. We devise a new fast feed-forward algorithm for locally shared filters that avoids redundant computation, enabling images of arbitrary size to be evaluated in real time without normalization. We have also revealed multiple important facts about learning face representations, which shed light on new directions for face localization and representation learning.
References

[1] Face++. http://www.faceplusplus.com/.
[2] T. Berg and P. N. Belhumeur. POOF: Part-based one-vs.-one features for fine-grained categorization, face verification, and attribute estimation. In CVPR, pages 955–962, 2013.
[3] A. Bergamo, L. Bazzani, D. Anguelov, and L. Torresani. Self-taught object localization with deep networks. arXiv preprint arXiv:1409.3964, 2014.
[4] L. Bourdev, S. Maji, and J. Malik. Describing people: A poselet-based approach to attribute classification. In ICCV, pages 1543–1550, 2011.
[5] J. Chung, D. Lee, Y. Seo, and C. D. Yoo. Deep attribute networks. In NIPS Workshop on Deep Learning and Unsupervised Feature Learning, volume 3, 2012.
[6] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A large-scale hierarchical image database. In CVPR, pages 248–255, 2009.
[7] J. Donahue, Y. Jia, O. Vinyals, J. Hoffman, N. Zhang, E. Tzeng, and T. Darrell. DeCAF: A deep convolutional activation feature for generic visual recognition. arXiv preprint arXiv:1310.1531, 2013.
[8] R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin. LIBLINEAR: A library for large linear classification. JMLR, 9:1871–1874, 2008.
[9] A. Farhadi, I. Endres, D. Hoiem, and D. Forsyth. Describing objects by their attributes. In CVPR, pages 1778–1785, 2009.
[10] R. Hadsell, S. Chopra, and Y. LeCun. Dimensionality reduction by learning an invariant mapping. In CVPR, volume 2, pages 1735–1742, 2006.
[11] K. He, X. Zhang, S. Ren, and J. Sun. Spatial pyramid pooling in deep convolutional networks for visual recognition. In ECCV, pages 346–361, 2014.
[12] G. B. Huang, M. Ramesh, T. Berg, and E. Learned-Miller. Labeled faces in the wild: A database for studying face recognition in unconstrained environments. Technical Report 07-49, University of Massachusetts, Amherst, October 2007.
[13] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In NIPS, pages 1097–1105, 2012.
[14] N. Kumar, P. Belhumeur, and S. Nayar. FaceTracer: A search engine for large collections of images with faces. In ECCV, pages 340–353, 2008.
[15] N. Kumar, A. C. Berg, P. N. Belhumeur, and S. K. Nayar. Attribute and simile classifiers for face verification. In ICCV, pages 365–372, 2009.
[16] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel. Handwritten digit recognition with a back-propagation network. In NIPS, 1990.
[17] J. Li and Y. Zhang. Learning SURF cascade for fast and accurate object detection. In CVPR, pages 3468–3475, 2013.
[18] J. Long, E. Shelhamer, and T. Darrell. Fully convolutional networks for semantic segmentation. In CVPR, 2015.
[19] P. Luo, X. Wang, and X. Tang. A deep sum-product architecture for robust facial attributes analysis. In ICCV, pages 2864–2871, 2013.
[20] O. K. Manyam, N. Kumar, P. Belhumeur, and D. Kriegman. Two faces are better than one: Face recognition in group photographs. In IJCB, pages 1–8, 2011.
[21] M. Mathias, R. Benenson, M. Pedersoli, and L. Van Gool. Face detection without bells and whistles. In ECCV, pages 720–735, 2014.
[22] M. Oquab, L. Bottou, I. Laptev, and J. Sivic. Is object localization for free? Weakly-supervised learning with convolutional neural networks. In CVPR, pages 685–694, 2015.
[23] A. S. Razavian, H. Azizpour, J. Sullivan, and S. Carlsson. CNN features off-the-shelf: An astounding baseline for recognition. arXiv preprint arXiv:1403.6382, 2014.
[24] A. Rodriguez and A. Laio. Clustering by fast search and find of density peaks. Science, 344(6191):1492–1496, 2014.
[25] F. Song, X. Tan, and S. Chen. Exploiting relationship between attributes for improved face verification. CVIU, 122:143–154, 2014.
[26] Y. Sun, X. Wang, and X. Tang. Deep convolutional network cascade for facial point detection. In CVPR, pages 3476–3483, 2013.
[27] Y. Sun, X. Wang, and X. Tang. Deep learning face representation by joint identification-verification. In NIPS, 2014.
[28] Y. Taigman, M. Yang, M. Ranzato, and L. Wolf. DeepFace: Closing the gap to human-level performance in face verification. In CVPR, pages 1701–1708, 2014.
[29] J. Xiao, J. Hays, K. A. Ehinger, A. Oliva, and A. Torralba. SUN database: Large-scale scene recognition from abbey to zoo. In CVPR, pages 3485–3492, 2010.
[30] B. Yang, J. Yan, Z. Lei, and S. Z. Li. Aggregate channel features for multi-view face detection. In IJCB, pages 1–8, 2014.
[31] N. Zhang, J. Donahue, R. Girshick, and T. Darrell. Part-based R-CNNs for fine-grained category detection. In ECCV, pages 834–849, 2014.
[32] N. Zhang, M. Paluri, M. Ranzato, T. Darrell, and L. Bourdev. PANDA: Pose aligned networks for deep attribute modeling. In CVPR, 2014.
[33] B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and A. Torralba. Object detectors emerge in deep scene CNNs. In ICLR, 2015.
[34] C. L. Zitnick and P. Dollár. Edge boxes: Locating object proposals from edges. In ECCV, pages 391–405, 2014.
Figure 14. More results of LNet averaged response maps. (Best viewed in color)