3D Face Recognition For Biometric Applications
L. Akarun, B. Gökberk, A.A. Salah
ABSTRACT

Face recognition (FR) is the preferred mode of identity recognition by humans: it is natural, robust and unintrusive. However, automatic FR techniques have failed to match up to expectations: variations in pose, illumination and expression limit the performance of 2D FR techniques. In recent years, 3D FR has shown promise to overcome these challenges. With the availability of cheaper acquisition methods, 3D face recognition can be a way out of these problems, both as a stand-alone method and as a supplement to 2D face recognition. We review the relevant work on 3D face recognition here, and discuss the merits of different representations and recognition algorithms.

1. INTRODUCTION

Recent developments in computer technology and the call for better security applications have brought biometrics into focus. A biometric is a physical property; it cannot be forgotten or mislaid like a password, and it has the potential to identify a person in very different settings: a criminal entering an airport, an unconscious patient without documents for identification, an authorized person accessing a highly-secured system. Be it for purposes of security or human-computer interaction, there is a wide range of applications for robust biometrics. Two different scenarios are of primary importance. In the verification (authentication) scenario, the person claims to be someone, and this claim is verified by ensuring that the provided biometric is sufficiently close to the data stored for that person. In the more difficult recognition scenario, the person is searched for in a database. The database can be small (e.g. criminals on a wanted list) or large (e.g. photos on registered ID cards). The unobtrusive search for a number of people is called screening. The signature and handwriting are the oldest biometrics, used in the verification of the authenticity of documents. The face image and the fingerprint also have a long history, and are still kept by police departments all over the world.
More recently, voice, gait, retina and iris scans, hand prints, and 3D face information have been considered for biometrics. Each of these has different merits and applicability. When deploying a biometrics-based system, we consider its accuracy, cost, ease of use, ease of development, whether it allows integration with other systems, and the ethical consequences of its use. Two other criteria are susceptibility to spoofing (faking an identity) in a verification setting, and susceptibility to evasion (hiding an identity) in a recognition setting. The purpose of the present study is to discuss the merits and drawbacks of 3D face information as a biometric, and to review the state of the art in 3D face recognition. Two things make face recognition especially attractive for our consideration. First, the acquisition of face information is easy and non-intrusive, as opposed to iris and retina scans. This is important if the system is going to be used frequently, and by a large number of users. The second point is the relatively low privacy of the information; we expose our faces constantly, and if the stored information is compromised, it does not lend itself to improper use the way signatures and fingerprints would. The drawbacks of 3D face recognition include high cost and decreased ease-of-use for laser sensors, low accuracy for other acquisition types, and the lack of sufficiently powerful algorithms. Figure 1 presents a summary of different biometrics and their relative strengths.
Figure 1: Biometrics and their relative strengths.

Although 2D and 3D face recognition are not as accurate as iris scans, their ease of use and lower cost make them a preferable choice for some scenarios. 3D face recognition represents an improvement over 2D face recognition in some respects. Recognition of faces from still images is a difficult problem, because illumination, pose and expression changes in the images create great statistical differences, and the identity of the face itself becomes shadowed by these factors. Humans are very capable in this modality, precisely because they learn to deal with these variations. 3D face recognition has the potential to overcome feature localization, pose and illumination problems, and it can be used in conjunction with 2D systems. In the next section we review the current research on 3D face recognition. We focus on different representations of 3D information, and on the fusion of different sources of information. We conclude with a discussion of the future of 3D face recognition.

2. STATE OF THE ART IN 3D FACE RECOGNITION

2.1 3D Acquisition and Preprocessing

We distinguish between a number of range data acquisition techniques. In the stereo acquisition technique, two or more calibrated cameras acquire simultaneous snapshots of the subject. The depth information for each point can be computed from geometrical models and by solving a correspondence problem. This method has the lowest cost and the highest ease of use. The structured light technique involves a light pattern projected on the face, where the distortion of the pattern reveals depth information. This setup is relatively fast and cheap, and allows a single standard camera to produce 3D and texture information. The last technique employs a laser sensor, which is typically more accurate, but also more expensive and slower to use.
The acquisition of a single 3D head scan can take more than 30 seconds, a restricting factor for the deployment of laser-based systems.
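As a rough illustration of the stereo technique, the depth of a matched point follows from its disparity as Z = f·B/d under a rectified pinhole-camera model. The numbers below (focal length in pixels, baseline in metres) are illustrative only, not taken from any system in this survey:

```python
# Sketch: depth from stereo disparity under a rectified pinhole-camera model.
# Focal length and baseline values are made up for illustration.

def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Triangulate depth Z = f * B / d for a rectified stereo pair."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a visible point")
    return focal_px * baseline_m / disparity_px

# A facial point seen 40 px apart by two cameras 10 cm apart (f = 800 px):
z = depth_from_disparity(40.0, 800.0, 0.10)
print(z)  # 2.0 (metres from the camera pair)
```

In a real system the hard part is the correspondence problem itself; the triangulation step above is the easy geometric tail end.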
3D information needs to be preprocessed after acquisition. Depending on the type of sensor, there may be holes and spikes (artifacts) in the range data. Eyes and hair do not reflect the light appropriately, and structured light approaches have trouble correctly registering those portions. Illumination still affects 3D acquisition, unless accurate laser scanners are employed [7]. For patching the holes, missing points can be filled by interpolation or by looking at the other side of the face [22, 33]. Gaussian smoothing and linear interpolation are used for both texture and range information [1, 8, 10, 13, 15, 22, 24, 30]. Clutter is usually removed manually [6, 8, 15, 18, 21, 24, 29, 30], and sometimes parts of the data are completely omitted where the acquisition leads to noise levels that cannot be coped with algorithmically [10, 21, 35]. To help distance calculation, mesh representations can be regularized [16, 36], or a voxel discretization can be used [2]. Most of the algorithms start by aligning the faces, either by their centres of mass [8, 29], the nose tip [15, 18, 19, 22, 26, 30], the eyes [13, 17], or by fitting a plane to the face and aligning it with that of the camera [2]. Registration of the images is important for all local similarity metrics. The key idea in registration is to define the similarity metric and the set of possible transformations. The similarity is measured by point-to-point or point-to-surface distances, or by cross-correlation between more complex features. The rigid transformation of a 3D object involves a 3D rotation and a translation, but the nonlinearity of the problem calls for iterative methods [11]. The most frequently used registration technique [16, 19, 21, 22, 27, 29] is the Iterative Closest Point (ICP) algorithm [3]. Warping and deforming the models (nonrigid registration) for better alignment helps co-locate the landmarks.
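The ICP loop alternates between finding closest-point correspondences and re-estimating the transformation. A deliberately simplified, translation-only sketch follows (a full ICP also solves for the rotation, e.g. via an SVD of the cross-covariance matrix of the matched pairs); the point lists are made up:

```python
# Minimal ICP-style loop, translation-only for brevity. Full ICP also
# estimates a rotation at each iteration; this sketch only shows the
# correspond-then-update structure of the algorithm.

def closest_point(p, cloud):
    # nearest neighbour in the target cloud (brute force, squared distance)
    return min(cloud, key=lambda q: sum((a - b) ** 2 for a, b in zip(p, q)))

def icp_translation(source, target, iters=10):
    src = [list(p) for p in source]
    for _ in range(iters):
        # 1. correspondence: match each source point to its nearest target
        pairs = [(p, closest_point(p, target)) for p in src]
        # 2. update: shift the source by the mean residual of the matches
        shift = [sum(q[d] - p[d] for p, q in pairs) / len(pairs) for d in range(3)]
        src = [[p[d] + shift[d] for d in range(3)] for p in src]
    # the residual RMS distance doubles as the ICP similarity score
    err = (sum(sum((a - b) ** 2 for a, b in zip(p, closest_point(p, target)))
               for p in src) / len(src)) ** 0.5
    return src, err

target = [[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
source = [[p[0] + 0.5, p[1] - 0.2, p[2] + 0.1] for p in target]
aligned, err = icp_translation(source, target)
print(err < 1e-6)  # True: a pure translation is recovered exactly
```

The residual returned at convergence is exactly the kind of ICP-derived similarity score that the point cloud-based recognizers below use for matching.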
An important method is the Thin Plate Spline (TPS) algorithm, which establishes perfect correspondence [16, 20]. One should, however, keep in mind that the deformation may be detrimental to recognition performance, as discriminatory information is lost in proportion to the number of anchor points. Lu and Jain also distinguish between inter-subject and intra-subject deformations, which they found useful for classification [20]. Landmark locations used in registration are either found manually [6, 10, 17, 19, 21, 25, 33] or automatically [12, 16, 37]. The correct localization of the landmarks is crucial to many algorithms, and it is usually not possible to judge the sensitivity of an algorithm to localization errors from its description. Moreover, automatic landmark localization remains an unsolved problem.

2.2 3D Recognition Algorithms

We summarize relevant work in 3D face recognition. We have classified each work according to the primary representation used in the recognition algorithm, much in the spirit of [7]. Table 3 summarizes the recent work on 3D and 2D+3D face recognition.

2.2.1 Curvatures and Surface Features

In one of the early 3D face papers, Gordon proposed a curvature-based method for face recognition from 3D data, kept in a cylindrical coordinate system [13]. Since the curvatures involve second derivatives, they are very sensitive to noise. An adaptive Gaussian smoothing is applied so as not to destroy curvature information. In [31], principal directions of curvatures are used. Their advantage over surface normals is that they are applicable to free-form surfaces. Moreno et al. extracted a number of features from 3D data, and found that curvature and line features perform better than area features [24]. In [14], the authors compared different representations on the 3D RMA dataset: point clouds, surface normals, shape-index values, depth images, and facial profile sets.
Surface normals are reported to be more discriminative than the others, and LDA is found very useful in extracting discriminative features.

2.2.2 Point Clouds and Meshes

The point cloud is the most primitive 3D representation for faces, and it is difficult to work with. Achermann and Bunke employ the Hausdorff distance for matching point clouds [2]. They use a voxel discretization to speed up matching, but it causes some information loss. Lao et al. discard matched points with large distances as noise [17]. When the data are in point cloud representation, ICP is the most widely used registration technique. The similarity of two point sets calculated at each iteration of the ICP algorithm is frequently used in point cloud-based face recognizers. Medioni and Waupotitsch present an authentication system that acquires the 3D image of the subject with two calibrated cameras [23], and the ICP algorithm is used to define the similarity between two face meshes. Lu et al. use a hybrid ICP-based registration, applying Besl's method and Chen's method successively [19]. A base mesh is also used for alignment in [36], where features are extracted from around landmark points, and nearest neighbour after PCA is used for recognition. Lu and Jain also use ICP for rigid deformations, but they additionally propose to use TPS for intra-subject and inter-subject nonrigid deformations, with the purpose of handling expression variations [20]. Deformation analysis and combination with appearance-based classifiers both increase the recognition accuracy. In a similar study, İrfanoğlu et al. used ICP to automatically locate facial landmarks in a coarse alignment step, and then warped the faces using the TPS algorithm to establish dense point-to-point correspondences [16]. The use of an average face model significantly reduces the complexity of similarity calculation, and the point-cloud representation of registered faces is more suitable for recognition than depth image-based methods, point signatures, and implicit polynomial-based representation techniques. In a follow-up study, Gökberk et al. analyzed the effect of registration methods on classification accuracy [14]. To inspect the side effects of warping on discrimination, an ICP-based approximate dense registration algorithm was designed that allows only rotation and translation transformations.
Experimental results confirmed that ICP without warping leads to better recognition accuracy.¹ Table 1 summarizes the classification accuracies of different feature extractors for both TPS-based and ICP-based registration algorithms on the 3D RMA dataset. Improvement is visible for all feature extraction methods, except the shape index.

Table 1: Average classification accuracies (and standard deviations) of different face recognizers for (1) TPS warping-based and (2) ICP-based face representation techniques.

                    TPS            ICP
Point Cloud         92.95 (1.01)   96.48 (2.02)
Surface Normals     97.72 (0.46)   99.17 (0.87)
Shape Index         90.26 (2.21)   88.91 (1.07)
Depth PCA           45.39 (2.15)   50.78 (1.10)
Depth LDA           75.03 (2.87)   96.27 (0.93)
Central Profile     60.48 (3.78)   82.49 (1.34)
Profile Set         81.14 (2.09)   94.30 (1.55)
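Several of the point-cloud matchers discussed here (e.g. [2]) score similarity with the Hausdorff distance. A minimal sketch with made-up 3D points:

```python
# Sketch of the directed Hausdorff distance used for point-cloud matching:
# the worst-case distance from a point in one cloud to the nearest point
# in the other. The probe and gallery points are invented.

def directed_hausdorff(A, B):
    def d(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5
    return max(min(d(p, q) for q in B) for p in A)

def hausdorff(A, B):
    # symmetric version: worst-case mismatch in either direction
    return max(directed_hausdorff(A, B), directed_hausdorff(B, A))

probe   = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
gallery = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.1]]
print(hausdorff(probe, gallery))  # 0.1
```

The brute-force double loop is O(|A|·|B|); the voxel discretization of [2] exists precisely to cut this cost down.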
¹ In [32], texture was found to be more informative than depth; our findings point to warping as a possible reason.

2.2.3 Depth Map

Depth maps are usually used in conjunction with subspace methods, although most of the existing 2D techniques are suitable for processing depth maps. The depth map construction consists of selecting a viewpoint and smoothing the sampled depth values. In [15], PCA and ICA were compared on depth maps. ICA was found to perform better, but PCA degraded more gracefully with declining numbers of training samples. In Srivastava et al., the set of all k-dimensional subspaces of the data space is searched with an MCMC simulated annealing algorithm for the optimal linear subspace [30]. The optimal subspace method performs better than PCA, LDA or ICA. Achermann et al. compare an eigenface method with a 5-state left-right HMM on a database of depth maps [1]. They show that the eigenface method outperforms the HMM, and that smoothing affects the eigenface method positively, while its effect on the HMM is detrimental. The 3D data are usually more suitable for alignment, and should be preferred if available. In Lee et al., the 3D image is thresholded after alignment to obtain the depth map, and a number of small windows are sampled from around the nose [18]. The statistical features extracted from these windows are used in recognition.

2.2.4 Profile

The most important problem for profile-based schemes is the extraction of the profile. In an early paper, Cartoux et al. use an iterative scheme to find the symmetry plane that cuts the face into two similar parts [9]. The nose tip and a second point are used to extract the profiles. Nagamine et al. use various heuristics to find feature points and align the faces by looking at the symmetry [25]. The faces are then intersected with different kinds of planes (vertical, horizontal or cylindrical around the nose tip), and the intersection curve is used in recognition. Vertical planes within about 20 mm of the central region, and a cylinder with a 20-30 mm radius around the nose (crossing the inner corners of the eyes), produced the best results. In [4], Beumier and Acheroy detail the acquisition of the popular 3D RMA dataset with structured light, and report profile-based recognition results. In addition to the central profile, they use the average of two lateral profiles in recognition. Once the profiles are obtained, there are several ways of matching them. In [9], corresponding points of two profiles are selected to maximize a matching coefficient that uses the curvature on the profile curve. A correlation coefficient and the mean quadratic distance between the coordinates of the aligned profile curves are then calculated as two alternative measures. In [4], the area between the profile curves is used. In [14], distances calculated with the L1 norm, the L2 norm, and the generalized Hausdorff distance were compared for aligned profiles, and the L1 norm was found to perform better.
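Once profiles are extracted and aligned, matching reduces to comparing fixed-length curves under a norm. A small sketch with invented profile samples, using a nearest-neighbour decision under the L1 norm:

```python
# Comparing aligned facial profile curves with different norms. A profile
# is sampled as a fixed-length vector of depth values; the values here
# are made up for illustration.

def l1(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

probe     = [0.0, 0.4, 1.0, 0.4, 0.0]   # e.g. a central vertical profile
gallery_a = [0.0, 0.4, 1.0, 0.4, 0.0]   # same person
gallery_b = [0.0, 0.2, 0.6, 0.2, 0.0]   # different nose shape

# Nearest-neighbour decision under the L1 norm:
best = min([("a", gallery_a), ("b", gallery_b)], key=lambda g: l1(probe, g[1]))
print(best[0])  # a
```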
2.2.5 Analysis by Synthesis

In [6], the analysis-by-synthesis approach that uses morphable models is detailed. A morphable model is defined as a convex combination of the shape and texture vectors of a number of samples that are placed in dense correspondence. A single 3D model face is used to render an image similar to the test image, which leads to the estimation of viewpoint parameters (pose angles, 3D translation, focal length of the camera), illumination parameters (ambient and directed light intensities, direction angles of the light, colour contrast, gains and offsets of the colour channels), and deformation parameters (shape and texture). In [22], a system is proposed to work with 2D colour images and corresponding 3D depth maps. The idea is to synthesize a pose- and illumination-corrected image pair for recognition. The depth images performed significantly better (by 4-7 per cent) than the colour images, and the combination increased the accuracy as well (by 1-2 per cent). Pose correction is found to be more important than illumination correction.

2.2.6 Combinations of Representations

Most of the systems that use 3D face data use a combination of representations. The enriched variety of features, when combined with classifiers with different statistical properties, produces more accurate and more robust results. In Tsutsumi et al., surface normals and intensities are concatenated to form a single feature vector, and the dimensionality is reduced with PCA [34]. In [35], the 3D data are described by point signatures, and the 2D data by Gabor wavelet responses. 3D surface data and texture were combined to form the 4D representation in [29]. Bronstein et al. point out the non-rigid nature of the face, and the necessity of using a suitable similarity metric that takes this deformability into account [8]. For this purpose, they use a multi-dimensional scaling projection algorithm for both shape and texture information.
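Feature-level fusion of the kind used by Tsutsumi et al. can be sketched as normalizing each modality's feature vector and concatenating the results into one vector for a single classifier. The feature values below are placeholders, and z-normalization stands in for whatever scaling a real system would use:

```python
# Feature-level fusion sketch: z-normalize each modality's feature vector
# to remove scale differences, then concatenate. The shape and texture
# feature values are illustrative placeholders, not real descriptors.

def z_norm(v):
    m = sum(v) / len(v)
    s = (sum((x - m) ** 2 for x in v) / len(v)) ** 0.5 or 1.0  # guard flat vectors
    return [(x - m) / s for x in v]

shape_features   = [0.12, 0.80, 0.33]    # e.g. depth-derived, small scale
texture_features = [120.0, 90.0, 200.0]  # e.g. intensity-derived, large scale

fused = z_norm(shape_features) + z_norm(texture_features)
print(len(fused))  # 6
```

Without the per-modality normalization, the large-scale texture values would dominate any distance computed on the concatenated vector, which is why such fusion schemes normalize (or apply PCA, as in [34]) before combining.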
Apart from techniques that fuse the representations at the feature level, there are a number of systems that employ combination at the decision level. Chang et al. propose in [10] to use Mahalanobis distance-based nearest-neighbour classifiers on the 2D intensity and 3D range images separately, and to fuse the decisions with a rank-based approach at the decision level. In [32], the depth map and colour maps (one for each YUV channel) are projected via PCA, and the distances in the four subspaces are combined by multiplication. In [33], the depth map and the intensity image are processed with embedded HMMs separately, and weighted score summation is proposed for the combination. In [21], Lu and Jain combine texture (LDA) and surface (point-to-plane distance) with weighted voting, but only the difficult samples are classified via the combined system. Profiles are also used in conjunction with other features. In [5], 3D central and lateral profiles and grey-level central and lateral profiles were evaluated separately, and then fused with Fisher's method. In [26], a surface-based recognizer and a profile-based recognizer are combined at the decision level. The surface matcher's similarity is based on a point cloud distance approach, and the profile similarity is calculated using the Hausdorff distance. In [27], a number of methods are tested on the depth map (Eigenface, Fisherface, and kernel Fisherface), and the depth map expert is fused with three profile experts using the Max, Min, Sum, Product, Median and Majority Vote rules, out of which the Sum rule was selected. Gökberk et al. have proposed two combination schemes that use 3D facial shape information [14]. In the first scheme, called parallel fusion, different pattern classifiers are trained using different features such as point clouds, surface normals, facial profiles, and PCA/LDA of depth images. The outputs of these pattern classifiers are merged using a rank-based decision-level fusion algorithm. As combination rules, consensus voting, a non-linear variation of the rank-sum method, and a highest rank majority method are used. Table 2 shows the recognition accuracies of individual pattern recognizers together with the accuracies of the parallel ensemble methods for the 3D RMA dataset. While the best individual pattern classifier (Depth-LDA) can accurately classify 96.27 per cent of the test examples, a non-linear rank-sum fusion of the Depth-LDA, surface normal, and point cloud classifiers improves the accuracy to 99.07 per cent. Paired t-test results indicate that all of the accuracies of the parallel fusion schemes are statistically better than the individual classifiers' performances. The second scheme is called serial fusion, where the class outputs of a filtering first classifier are passed to a second, more complex classifier. The ranked output lists of these classifiers are fused. The first classifier in the pipeline should be fast and accurate; therefore a point cloud-based pattern classifier was selected. As the second classifier, Depth-LDA was chosen because of its discriminatory power. This system has 98.14 per cent recognition accuracy, significantly better than the single best classifier.
Table 2: Classification accuracies of single face classifiers (top part) and combined classifiers (bottom part).

Pattern Classifier       Dimensionality   Acc.
Point Cloud              3,389 x 3        95.96
Surface Normals          3,389 x 3        95.54
Depth PCA                300              50.78
Depth LDA                30               96.27
Profile Set              1,557            94.30

Combination              Pattern Classifiers      Acc.
Consensus Voting         LDA, PC, SN              98.76
Nonlinear Rank-Sum       Profile, LDA, SN         99.07
Highest Rank Majority    Profile, LDA, SN, PC     98.13
Serial Fusion            PC, LDA                  98.14
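Rank-based decision-level fusion can be sketched with a plain rank-sum (Borda-count) rule, a simple linear relative of the non-linear rank-sum rule used in [14]; the identity labels and ranked lists below are invented:

```python
# Sketch of rank-based decision-level fusion: each classifier returns a
# ranked list of candidate identities; the ensemble sums the ranks and
# picks the identity with the lowest total (a Borda-count / rank-sum rule).

def rank_sum_fusion(rankings):
    scores = {}
    for ranking in rankings:               # one ranked list per classifier
        for rank, identity in enumerate(ranking):
            scores[identity] = scores.get(identity, 0) + rank
    return min(scores, key=scores.get)     # lowest total rank wins

# Hypothetical ranked outputs of three classifiers for one probe face:
depth_lda      = ["id_A", "id_B", "id_C"]
surface_normal = ["id_B", "id_A", "id_C"]
point_cloud    = ["id_A", "id_C", "id_B"]

print(rank_sum_fusion([depth_lda, surface_normal, point_cloud]))  # id_A
```

Fusing ranks rather than raw scores sidesteps the problem of classifiers whose similarity scores live on incomparable scales, which is one reason rank-based rules are popular at the decision level.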
3. CONCLUSIONS

There are a number of questions that 3D face recognition research needs to address. In acquisition, the accuracy of cheaper and less intrusive systems needs to be improved, and temporal sequences should be considered. For registration, automatic landmark localization, artifact removal, scaling, and the elimination of errors due to occlusions, glasses, beards, etc. need to be worked out. Ways of deforming the face without losing discriminative information might be beneficial. It is obvious that information fusion is the future of 3D face recognition. There are many ways of representing and combining texture and shape information. We also distinguish between local and configural processing, where the ideal face recognizer makes use of both. For realistic systems, single training instance cases should be considered, which is a great hurdle for some of the more successful discriminative algorithms. Publicly available 3D datasets are necessary to encourage further research on these topics.

REFERENCES

[1] Achermann, B., X. Jiang, H. Bunke, Face recognition using range images, in Proc. Int. Conf. on Virtual Systems and MultiMedia, pp.129-136, 1997.
[2] Achermann, B., H. Bunke, Classifying range images of human faces with Hausdorff distance, in Proc. ICPR, pp.809-813, 2000.
[3] Besl, P., N. McKay, A Method for Registration of 3-D Shapes, IEEE Trans. PAMI, vol.14, no.2, pp.239-256, 1992.
[4] Beumier, C., M. Acheroy, Automatic 3D face authentication, Image and Vision Computing, vol.18, no.4, pp.315-321, 2000.
[5] Beumier, C., M. Acheroy, Face verification from 3D and grey level cues, Pattern Recognition Letters, vol.22, pp.1321-1329, 2001.
[6] Blanz, V., T. Vetter, Face Recognition Based on Fitting a 3D Morphable Model, IEEE Trans. PAMI, vol.25, no.9, pp.1063-1074, 2003.
[7] Bowyer, K., K. Chang, P. Flynn, A survey of multi-modal 2D+3D face recognition, Technical Report, Notre Dame Department of Computer Science and Engineering, 2004.
[8] Bronstein, A.M., M.M. Bronstein, R. Kimmel, Expression-invariant 3D face recognition, in J. Kittler, M.S. Nixon (eds.) Audio- and Video-Based Person Authentication, pp.62-70, 2003.
[9] Cartoux, J.Y., J.T. LaPreste, M. Richetin, Face authentication or recognition by profile extraction from range images, in Proc. of the Workshop on Interpretation of 3D Scenes, pp.194-199, 1989.
[10] Chang, K., K. Bowyer, P. Flynn, Multi-modal 2D and 3D biometrics for face recognition, in Proc. IEEE Int. Workshop on Analysis and Modeling of Faces and Gestures, 2003.
[11] Chen, Y., G. Medioni, Object Modeling by Registration of Multiple Range Images, Image and Vision Computing, vol.10, no.3, pp.145-155, 1992.
[12] Colbry, D., X. Lu, A. Jain, G. Stockman, 3D face feature extraction for recognition, Technical Report MSU-CSE-04-39, Computer Science and Engineering, Michigan State University, 2004.
[13] Gordon, G., Face recognition based on depth and curvature features, in SPIE Proc.: Geometric Methods in Computer Vision, vol.1570, pp.234-247, 1991.
[14] Gökberk, B., A.A. Salah, L. Akarun, Rank-based Decision Fusion for 3D Shape-based Face Recognition, submitted for publication.
[15] Hesher, C., A. Srivastava, G. Erlebacher, A novel technique for face recognition using range imaging, in Proc. 7th Int. Symposium on Signal Processing and Its Applications, pp.201-204, 2003.
[16] İrfanoğlu, M.O., B. Gökberk, L. Akarun, 3D Shape-Based Face Recognition Using Automatically Registered Facial Surfaces, in Proc. ICPR, vol.4, pp.183-186, 2004.
[17] Lao, S., Y. Sumi, M. Kawade, F. Tomita, 3D template matching for pose invariant face recognition using 3D facial model built with iso-luminance line based stereo vision, in Proc. ICPR, vol.2, pp.911-916, 2000.
[18] Lee, Y., K. Park, J. Shim, T. Yi, 3D face recognition using statistical multiple features for the local depth information, in Proc. ICVI, 2003.
[19] Lu, X., D. Colbry, A.K. Jain, Three-Dimensional Model Based Face Recognition, in Proc. ICPR, 2004.
[20] Lu, X., A.K. Jain, Deformation Analysis for 3D Face Matching, to appear in Proc. IEEE WACV, 2005.
[21] Lu, X., A.K. Jain, Integrating Range and Texture Information for 3D Face Recognition, to appear in Proc. IEEE WACV, 2005.
[22] Malassiotis, S., M.G. Strintzis, Pose and Illumination Compensation for 3D Face Recognition, in Proc. ICIP, 2004.
[23] Medioni, G., R. Waupotitsch, Face recognition and modeling in 3D, in Proc. IEEE Int. Workshop on Analysis and Modeling of Faces and Gestures, pp.232-233, 2003.
[24] Moreno, A.B., A. Sánchez, J.F. Vélez, F.J. Díaz, Face recognition using 3D surface-extracted descriptors, in Proc. IMVIPC, 2003.
[25] Nagamine, T., T. Uemura, I. Masuda, 3D facial image analysis for human identification, in Proc. ICPR, pp.324-327, 1992.
[26] Pan, G., Y. Wu, Z. Wu, W. Liu, 3D Face recognition by profile and surface matching, in Proc. IJCNN, vol.3, pp.2169-2174, 2003.
[27] Pan, G., Z. Wu, 3D Face Recognition From Range Data, submitted to Int. Journal of Image and Graphics, 2004.
[28] Pankanti, S., R.M. Bolle, A. Jain, Biometrics: The Future of Identification, IEEE Computer, pp.46-49, 2000.
[29] Papatheodorou, T., D. Rueckert, Evaluation of automatic 4D face recognition using surface and texture registration, in Proc. AFGR, pp.321-326, 2004.
[30] Srivastava, A., X. Liu, C. Hesher, Face Recognition Using Optimal Linear Components of Range Images, submitted for publication, 2003.
[31] Tanaka, H., M. Ikeda, H. Chiaki, Curvature-based face surface recognition using spherical correlation, in Proc. ICFG, pp.372-377, 1998.
[32] Tsalakanidou, F., D. Tzovaras, M. Strintzis, Use of depth and colour eigenfaces for face recognition, Pattern Recognition Letters, vol.24, pp.1427-1435, 2003.
[33] Tsalakanidou, F., S. Malassiotis, M. Strintzis, Integration of 2D and 3D images for enhanced face authentication, in Proc. AFGR, pp.266-271, 2004.
[34] Tsutsumi, S., S. Kikuchi, M. Nakajima, Face identification using a 3D gray-scale image: a method for lessening restrictions on facial directions, in Proc. AFGR, pp.306-311, 1998.
[35] Wang, Y., C. Chua, Y. Ho, Facial feature detection and face recognition from 2D and 3D images, Pattern Recognition Letters, vol.23, pp.1191-1202, 2002.
[36] Xu, C., Y. Wang, T. Tan, L. Quan, Automatic 3D face recognition combining global geometric features with local shape variation information, in Proc. AFGR, pp.308-313, 2004.
[37] Yacoob, Y., L.S. Davis, Labeling of human face components from range data, CVGIP: Image Understanding, vol.60, no.2, pp.168-178, 1994.
Table 3: Overview of 3D face recognition systems

Group | Representation | Database | Algorithm | Notes
Gordon [13] | curvatures | 26 training, 24 test | Euclidean nearest neighbour | Curvatures can be used for feature detection, but they are sensitive to smoothing.
Tanaka et al. [31] | curvature-based EGI | NRCC | Fisher's spherical correlation | Use principal curvatures instead of surface normals for non-polyhedral objects.
Moreno et al. [24] | curvature, line, region features | 7 img., 60 persons | Euclidean nearest neighbour | Angle, distance and curvature features work better than area-based features.
Achermann and Bunke [2] | point cloud | 120 training, 120 test | Hausdorff nearest neighbour | Hausdorff distance can be speeded up by voxel discretization.
Lao et al. [17] | curve segments | 36 img., 10 persons | Euclidean nearest neighbour | Points with bad correspondence are not used in distance calculation.
Medioni and Waupotitsch [23] | mesh | 7 img., 100 persons | normalized cross-correlation | After alignment, a distance map is found; statistics of the map are used in similarity.
İrfanoğlu et al. [16] | point cloud | 3D RMA | point set difference (PSD) | ICP used to align point clouds with a base mesh; PSD outperforms PCA on the depth map.
Lu et al. [19] | mesh | 90 training, 113 test | hybrid ICP and cross-correlation | ICP distances and shape index-based correlation can be usefully combined.
Xu et al. [36] | regular mesh | 3D RMA | feature extraction, PCA and NN | Feature derivation + PCA around landmarks worked better than aligned mesh distances.
Lu and Jain [20] | deformation points | 500 training, 196 test | ICP + TPS, nearest neighbour | Distinguishing between inter-subject and intra-subject deformations helps recognition.
Achermann et al. [1] | depth map | 120 training, 120 test | eigenface vs. HMM | Eigenface outperforms HMM; smoothing is good for eigenface, bad for HMM.
Hesher et al. [15] | mesh | FSU | ICA or PCA + nearest neighbour | ICA outperforms PCA; PCA degrades more gracefully as training samples are decreased.
Lee et al. [18] | depth map | 2 img., 35 persons | feature extraction + nearest neighbour | Mean and variance of depth from windows around the nose are used as features.
Srivastava et al. [30] | depth map | 6 img., 67 persons | subspace projection + SVM | Optimal subspace found with MCMC simulated annealing outperforms PCA, ICA and LDA.
Cartoux et al. [9] | profile | 3/4 img., 5 persons | curvature-based nearest neighbour | High-quality images needed for principal curvatures.
Nagamine et al. [25] | vertical, horizontal, circular profiles | 10 img., 16 persons | Euclidean nearest neighbour | Central vertical profile and circular sections touching eye corners are most informative.
Beumier and Acheroy [4] | vertical profiles | 3D RMA | area-based nearest neighbour | Central profile and mean lateral profiles are fused by averaging.
Blanz and Vetter [6] | 2D + viewpoint parameters | CMU-PIE, FERET | analysis by synthesis | Using a generic 3D model, 2D viewpoint parameters are found.
Malassiotis and Strintzis [22] | texture + depth map | 110 img., 20 persons | embedded HMM + fusion | Depth is better than colour, fusion is best; pose correction is better than illumination correction.
Tsutsumi et al. [34] | texture + depth map | 35 img., 24 persons | concatenated features + PCA | Adding perturbed versions of training images reduces sensitivity of PCA.
Beumier and Acheroy [5] | 2D and 3D vertical profiles | 3D RMA | nearest neighbour + fusion | Combination of 2D and 3D helps; temporal fusion (snapshots taken in time) helps too.
Wang et al. [35] | point signature + Gabor features | 6 img., 50 persons | concatenation after PCA + SVM | Omit 3D info from the eyes, eyebrows (missing elements) and mouth (expression sensitivity).
Bronstein et al. [8] | texture + depth map | 157 persons | concatenation after PCA + nearest neighbour | Bending-invariant canonical representation is robust to facial expressions.
Chang et al. [10] | texture + depth map | 278 training, 166 test | Mahalanobis-based nearest neighbour + fusion | Pose correction through 3D is not better than rotation-corrected 2D.
Pan et al. [26] | profile + point cloud | 3D RMA | ICP + Hausdorff + fusion | Surface and profile combined usefully; discard worst points (10 per cent) during registration.
Tsalakanidou et al. [32] | texture + depth map | XM2VTS | nearest neighbour + fusion | Fusion of frontal colour and depth images with colour faces from profile.
Tsalakanidou et al. [33] | texture + depth map | 60 img., 50 persons | embedded HMM + fusion | Appropriately processed texture is more informative than warped depth maps.
Pan and Wu [27] | depth map + profile | 6 img., 120 persons | (kernel) Fisherface + Eigenface + fusion | Sum rule is preferred to max, min, product, median and majority vote for fusion.
Papatheodorou and Rueckert [29] | dense mesh + texture | 12 img., 62 persons | nearest neighbour + fusion | 3D helps 2D especially for profile views; texture has small relative weight.
Lu and Jain [21] | mesh + texture | 598 test scans | ICP (3D), LDA (2D) + fusion | Difficult samples are evaluated by the combined scheme.
Gökberk et al. [14] | surface normals, profiles, depth map, point cloud | 3D RMA | PCA, LDA, nearest neighbour, rank-based fusion | Best single classifier is depth-LDA; combining it with surface normals and profiles with nonlinear rank-sum increases accuracy.