
Journal of Image and Graphics, Volume 2, No. 2, December 2014

Age Estimation Using Support Vector Machine–Sequential Minimal Optimization
Julianson Berueco, Kim Lopena, Arby Moay, Mehdi Salemiseresht, and Chuchi Montenegro
Department of Computer Science, College of Computer Studies, Silliman University, Dumaguete City, Philippines
Email: [email protected]

Abstract—This paper investigates the use of the SVM-SMO algorithm in estimating the age of a person through the evaluation of facial features on both front- and side-view face orientations. The Harris-Stephens algorithm, SURF, and the Minimum Eigenvalue feature detection algorithm were also used for feature extraction. During experiments, training sets composed of 44 front-view images and 44 side-view images were used to train the network. Testing was performed on 140 front-view images and 44 side-view images. Results of the experiment show age recognition of 53.85% for front-view images and 14.3% for side-view images.

Index Terms—age determination, image processing, neural networks

I. INTRODUCTION

The availability of robust face recognition algorithms has brought many studies in the area of image processing that perform expression recognition, smile detection, identity recognition, and much more. One of the more unexplored areas in face study is age determination. Not much work has been done on determining the age of a person through image processing, and the existing studies test age determination only on front-view images. Normally, to determine someone's age, people look at the wrinkling of a person's face, similar to the study of [1], but this poses challenges for people who have wrinkles due to frequent smoking, drinking, overexposure to the sun, and sleep deprivation, among others. The other method to determine age is the analysis of a selected set of facial feature points, for instance the structure of the facial bones, similar to the study of [2] on how the mandible continues to enlarge over the course of life.

Support Vector Machine-Sequential Minimal Optimization (SVM-SMO) has been used in age classification and has proven to have good accuracy, but it has been applied to a wide range of age groups. In the study of [3], the authors used what they determined to be the optimized facial feature points for the facial measurements in classifying age.

The purpose of this study is to determine a person's age by analyzing a set of facial feature points from an image. The method is divided into two parts: (1) face detection and facial feature identification, and (2) age determination. For the first part, the Viola-Jones Object Detection Framework was used to detect whether an image contains a face and to identify the facial feature points required. The second part measures the distances and angles between the selected set of facial feature points, whose results were fed to the SVM-SMO classifier for age determination.

II. SCOPE AND LIMITATION

The study covers multiple human faces in an image regardless of size, distance, and orientation (front or side view). The image can be either colored or black and white. The system should also be able to estimate age regardless of expression, or of whether the subject of the image has wrinkles because of smoking, drinking, sleep deprivation, overexposure to the sun, and other external factors that contribute to them "looking old." For ease, images of Filipino faces were used for the study.

The study did not cover faces that are obscured with sunglasses/eyeglasses, masks, tattoos, or any foreign objects that cover the feature points used by the application. The images were not blurry or fuzzy, and the luminance levels were normal, i.e., the face is recognizable. It also did not cover images whose subject's face has been altered by cosmetic surgery, injury, illness, or scars.

III. METHODS

The overall process used in this study is reflected in Fig. 1. Training images were initially fed to the model before testing of other images was performed. In both scenarios, all images passed through pre-processing, feature extraction, and classification components.

Figure 1. System block diagram

Manuscript received June 25, 2014; revised November 19, 2014.
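The second part described above measures distances and angles between feature points before feeding them to the classifier. A minimal pure-Python sketch of such measurements; the specific point names and coordinates below are hypothetical illustrations, not the paper's actual feature set:

```python
import math

def distance(p, q):
    """Euclidean distance between two (x, y) feature points, in pixels."""
    return math.hypot(q[0] - p[0], q[1] - p[1])

def angle(vertex, a, b):
    """Angle at `vertex` formed by points a and b, in degrees."""
    v1 = (a[0] - vertex[0], a[1] - vertex[1])
    v2 = (b[0] - vertex[0], b[1] - vertex[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    return math.degrees(math.acos(dot / (math.hypot(*v1) * math.hypot(*v2))))

# Hypothetical feature-point coordinates, for illustration only.
left_eye, right_eye, mouth = (320, 400), (480, 400), (400, 560)

features = [
    distance(left_eye, right_eye),      # inter-eye distance
    distance(left_eye, mouth),
    distance(right_eye, mouth),
    angle(mouth, left_eye, right_eye),  # angle subtended at the mouth
]
```

A vector of such measurements, one per image, is what a classifier like SVM-SMO would consume.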

©2014 Engineering and Technology Publishing 145


doi: 10.12720/joig.2.2.145-150

A. Image Acquisition

All images used in the training and testing phases of this research were captured using a typical digital camera with a resolution of 1280×760 px. Image acquisition was also done in a controlled environment, that is, proper illumination and lighting were observed, with a light-colored background. Several images were converted to grayscale using the grayscale option in Photoshop as well as in Photoscape.
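The grayscale conversion done in Photoshop and Photoscape is typically a luminance-weighted average of the RGB channels. A minimal sketch, assuming the common ITU-R BT.601 weights (the exact weights these tools use internally may differ):

```python
def to_grayscale(pixel):
    """Convert one (R, G, B) pixel to a single luminance value
    using the common ITU-R BT.601 weights."""
    r, g, b = pixel
    return round(0.299 * r + 0.587 * g + 0.114 * b)

def grayscale_image(image):
    """Convert a row-major list of rows of RGB pixels to grayscale."""
    return [[to_grayscale(px) for px in row] for row in image]
```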
B. Face Detection for Front and Side View

Detection of the face region for both front- and side-view faces is described below.

1) Front view: Viola-Jones

In this study, the Vision toolbox in MATLAB was used to detect whether an image contains faces in front view. The toolbox uses the Viola-Jones Object Detection Framework as its method for detection.

2) Side view: training using the Cascade Training GUI

The Cascade Training GUI is an interactive GUI for managing the selection and positioning of rectangular ROIs in a list of images, and for specifying ground truth for training algorithms. In this study, it was used to train the detectors needed for side-view face detection. Each side and each feature (ears, chin, and eyes) was trained separately for a higher detection accuracy. Each training set was trained differently in order to meet the program's requirements. The GUI consisted of stages that have to be manually adjusted to minimize false detections, and each set was specified to have the feature type set to LBP (local binary patterns).

a) Side view

Training of the side views had to cover all the features that were used for feature extraction. The ROIs start just above the eyebrows, below the chin, and behind the ears. The false alarm rate was set to 0.1, the number of stages was set to eight, and the negative samples factor was set to 5. Manually setting the object training size to 32×34 increased the detection rate. Fig. 2 shows the selection for side-view face detection training.

Figure 2. Region of interest selection

b) Ear

The training of the ears followed the same concept as the other detectors, but balancing the stages and negative samples was difficult due to the images used for training. The images had to be manually selected from their directories; the pattern of the ear can be easily obscured by hair, which would affect the training data. Earrings, specifically studs, were not much of a factor. Fig. 3 shows the selection for side-view ear detection training.

Figure 3. Ears region of interest

c) Chin

The training of the chin was quite an ordeal, as with the eye and the ear. A procedure similar to that for the ears was used for the chin, in which we selected the facial feature using the Cascade Training GUI by manually inputting the ROI and forming a rectangle over the chin, starting from its base and going near the lips, while keeping the smallest possible rectangle on both sides, as shown in Fig. 4.

Figure 4. Chin region of interest

During training, up to 200+ images for each side were used in order to increase detection. Numerous changes to the settings were made to get close to 100% accuracy; most of the changes were in the per-stage false alarm rate and the per-stage true positive rate. The cascade was lowered to 7 stages to increase accuracy, the negative samples factor was set to 4 so that several hundred negative samples would be compared, and the object training size was set to 23×26, since specifying a small size increases precision.

d) Eye

The Cascade Training GUI was used to train side-view eye detectors for both sides. In the GUI, 227 positive pictures were used for training left-side eyes and 276 for right-side eyes, selecting the best ROI for each side's eye as shown in Fig. 5.

Figure 5. Eye region of interest

For the cascade detector training, 2049 negative pictures were added. The final training settings were a false alarm rate of 0.00150, a true positive rate of 0.995, 7 cascade stages, and a negative samples factor of 8.
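The per-stage rates quoted above compound across stages: a window must pass every stage of the cascade, so the overall false alarm and true positive rates are the per-stage rates raised to the number of stages. A small sketch using the settings given above (the multiplication rule is standard for cascade classifiers; the numbers are the paper's):

```python
def cascade_rates(per_stage_fa, per_stage_tp, stages):
    """Overall false-alarm and true-positive rates of a cascade:
    a window must pass every stage, so per-stage rates multiply."""
    return per_stage_fa ** stages, per_stage_tp ** stages

# Side-view face detector: false alarm rate 0.1 over 8 stages.
fa_face, _ = cascade_rates(0.1, 1.0, 8)

# Side-view eye detector: FA 0.0015, TP 0.995 over 7 stages.
fa_eye, tp_eye = cascade_rates(0.0015, 0.995, 7)
```

This is why lowering the stage count (as done for the chin detector) trades a higher overall false alarm rate for a higher overall true positive rate.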

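The LBP (local binary patterns) feature type selected for the cascade detectors encodes each pixel by comparing it with its 8 neighbors. A sketch of the basic 3×3 LBP code, as a simplified illustration only (MATLAB's cascade trainer uses a multi-scale block variant internally):

```python
def lbp_code(patch):
    """Basic LBP code of the center pixel of a 3x3 grayscale patch:
    each of the 8 neighbors contributes one bit, set when the
    neighbor is >= the center value, read clockwise from top-left."""
    center = patch[1][1]
    # Neighbor offsets, clockwise starting at the top-left corner.
    neighbors = [(0, 0), (0, 1), (0, 2), (1, 2),
                 (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for bit, (r, c) in enumerate(neighbors):
        if patch[r][c] >= center:
            code |= 1 << bit
    return code
```

The resulting 8-bit codes are pooled into histograms, which is what makes LBP features robust to monotonic lighting changes.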



The LBP facial feature extractor was used with an image size of 1280 by 720 pixels. The accuracy is 95% for both sides, with a false detection rate of less than 3%.

C. Feature Detection

1) Front view: Viola-Jones

In a similar fashion to the studies of [4] and [5], the researchers needed to be able to detect the left and right eyeballs and the mouth of a front-view face. The researchers also incorporated elements from the study of [4] into the research.

For this part of the research, the Vision toolbox was used for detecting the different facial feature points on the front-view face. As with face detection, it also uses the Viola-Jones Object Detection Framework as its method of detection.

First, the 'EyePairBig' model was used to check whether the face region contains both the left and right eyes. If this detection fails, image processing for the region is aborted.

Second, the 'LeftEye' model was used to search the face region for the left eye. To eliminate the various false detections that this model may cause, each detected 'LeftEye' region's point coordinates are compared with the coordinate of the 'EyePairBig' region. The nearest region is selected as the 'LeftEye' region; the rest are deleted.

Third, the 'RightEye' model was used to search the face region for the right eye. It followed a process similar to the 'LeftEye', except that it compared each detected 'RightEye' region's point coordinates plus the length of the region with the 'EyePairBig' region's point coordinate plus the length of that region.

Then, the 'Mouth' model was used to search the face region for the mouth.

2) Side view: SURF features and the Harris-Stephens algorithm

a) Ear

The corner detector used to extract the feature from the ear is SURF (Speeded-Up Robust Features). After the ear was detected, the region where the ear is located was cropped and converted into grayscale, as needed for all corner detectors. The corners were then detected using the detectSURFFeatures() method, and the features were extracted using the extractFeatures() method of the toolbox. The strongest points were located by computing the mean, which is used to plot the location of the ear as shown in Fig. 6.

Figure 6. Strongest points using SURF

b) Eye

The corner detector used to extract the feature from the eye was the Harris-Stephens algorithm. After the eye was detected, the region where the eye is located was cropped and converted into grayscale. The corners were then detected using the detectHarrisFeatures() method, and the strongest features were extracted using the corners.selectStrongest() method of the toolbox. The strongest points were located by computing the mean, which is used to plot the location of the eye as shown in Fig. 7.

Figure 7. Strongest points of eye

c) Chin

The corner detector used to extract the feature from the chin was the Harris-Stephens algorithm. The procedure is similar to the eye detection.

D. Training of SVM-SMO

To implement the SVM-SMO, the WEKA (Waikato Environment for Knowledge Analysis) data mining software was used. WEKA requires the creation of an ARFF file in order for it to perform its calculations. The ARFF file contains the data acquired from the previous steps, specifically, the measurements and angles of the facial feature points. Before the output model was created, a kernel function was first selected. The researchers used the default kernel function in WEKA, which is the polynomial kernel with an exponent of 1.

IV. EXPERIMENTS

A. Front-View Age Classification

Table I shows the number of subjects gathered for testing. During testing, 98.57% (138) were successful during the face detection stage. In the feature extraction stage, only 78 out of the 138 (56.52%) passed the detection, though most of them incurred some slight errors. In the age classification stage, only 42 out of the 78 (53.85%) were placed in the correct age category. Overall, the system only managed to correctly classify 42 out of the 140 total subjects, bringing its accuracy to only 30%.
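The 30% overall figure follows directly from chaining the per-stage pass rates, since a subject must survive every stage of the pipeline. A sketch reproducing it from the stage counts quoted above:

```python
# Per-stage counts for the front-view pipeline (from the text above).
subjects = 140
face_detected = 138        # 98.57% of 140
features_extracted = 78    # 56.52% of 138
correctly_classified = 42  # 53.85% of 78

# Overall accuracy is the product of the stage pass rates, which
# telescopes to the final count over the initial count.
stage_product = ((face_detected / subjects)
                 * (features_extracted / face_detected)
                 * (correctly_classified / features_extracted))
overall = correctly_classified / subjects
```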




TABLE I. AGE TEST DATA FRONT-VIEW

Age Category   Subjects   Face Detection   Feature Extraction   Age Classification
-10                 5         5/5               5/5                  1/5
11-15              13        13/13              7/13                 1/7
16-20              59        59/59             44/59                39/44
21-25              24        24/24             11/25                 0/11
26-30               6         6/6               0/6                  0/0
31-35               2         1/2               0/1                  0/0
36-40               7         7/7               2/7                  0/2
41-45              10        10/10              3/10                 0/3
46-50               6         5/6               2/5                  0/2
51+                 8         8/8               2/8                  0/2
Total             140       138/140            78/138               42/78
Percentage                   98.57%            56.52%               53.85%

Fig. 8 shows the different types of detections that the system makes for the colored images. Fig. 8(1) shows a perfect detection and feature extraction, with the points exactly where they should be.

Figure 8. Front-view selection

Fig. 8(2) shows detection and feature extraction with slight errors. We still considered this during testing as it still gives out a prediction. Fig. 8(3) shows detection and feature extraction with a major error; while the system still gives a prediction for this, we regarded the image as not detected since the mouth was not boxed. Fig. 8(4) shows detection and a failure during the primary eye detection. It will not show a prediction since, if a detector fails, it stops processing that particular face image.

B. Side-View Age Classification

Eighty-eight (88) test subjects were used for side-view age classification (Table II), out of which only 85 (96.6%) subjects had their face detected. Out of the 85, 14 (16.5%) had their features extracted, and out of the 14 subjects, 2 (14.3%) were correctly classified. Although the age classifier had an accuracy of 84.1% on its training set, the testing set only managed 2.27% overall.

TABLE II. AGE DATA SET SIDE-VIEW TESTING CONTROL SUMMARY

Age Category   No. of Subjects   Face Detection   Feature Extraction   Correctly Classified
-10                 10               9/10              1/9                   0/1
11-15               10              10/10              0/10                  0/0
16-20               10              10/10              8/10                  2/8
21-25               10              10/10              5/10                  0/5
26-30                6               6/6               0/6                   0/0
31-35                5               3/5               0/3                   0/0
36-40                9               9/9               0/9                   0/0
41-45               10              10/10              0/10                  0/0
46-50                8               8/8               0/8                   0/0
51+                 10              10/10              0/10                  0/0
Total               88              85/88             14/85                  2/14
Percentage                          96.6%             16.5%                 14.3%

Fig. 9 shows an image with the face detected, the features detected, and the feature points extracted. The images used in the training set were manually chosen by testing all images in the database; the images with the most accurate feature points were then recorded and classified by SVM-SMO.

Figure 9. Side view training set

Fig. 10 shows images with the face detected but the required features not detected. From our observations, the factors that hinder feature detection are: (1) the position of the head during side view, chin up or down; (2) the position of the face in the image itself (the bottom of the chin and the top of the head need to be spaced, and the minimum distance of the subject to the camera should be 2 ft.); (3) the blurriness of the image; and (4) after detecting the features, the values given by the edge detectors are not consistently in the same location.

Figure 10. Side-view false detection
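The inconsistent point locations reported above come from the corner detectors. For reference, the corner response that the Harris-Stephens detector (used in Section III-C) maximizes can be sketched as follows; this is a simplified single-window illustration of the standard formula, not MATLAB's implementation:

```python
def harris_response(ix, iy, k=0.04):
    """Harris-Stephens corner response for one window, given lists of
    horizontal (ix) and vertical (iy) image gradients sampled in it:
    R = det(M) - k * trace(M)^2, where M is the structure tensor."""
    sxx = sum(gx * gx for gx in ix)
    syy = sum(gy * gy for gy in iy)
    sxy = sum(gx * gy for gx, gy in zip(ix, iy))
    det = sxx * syy - sxy * sxy
    trace = sxx + syy
    return det - k * trace * trace

# Gradients in both directions -> corner-like (positive response);
# gradients in one direction only -> edge-like (negative response).
corner = harris_response([1, 0, 1, 0], [0, 1, 0, 1])
edge = harris_response([1, 1, 1, 1], [0, 0, 0, 0])
```

Because the response depends on local gradients, blur or small pose changes shift which window scores highest, which is one plausible reading of why the extracted points moved between images.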




1) Group image

Samples of group images were also tested (Fig. 11), taken from original images with dimensions of 1280×720 px. Observations from the testing show: (1) at a distance, the faces' features cannot be detected well; (2) the feature detectors were not trained on images of such faces; and (3) the faces are far enough away that the current resolution setting cannot make out their features.

Figure 11. Group image pic

Face detection was not a problem at all for the system, as it achieved 53/57 (92.98%) face detection (Table III). The drawback for the distant images is that the facial feature detectors were not trained for them; hence the low detection of 17/53 (32.07%), which carries over to the classification of ages.

TABLE III. GROUP IMAGE SUMMARY

Front View Subjects   Side View Subjects   Face Detection   Feature Extraction   Correctly Classified
        2                    0                 2/2               0/2                   0/0
        1                    1                 2/2               1/2                   0/1
        3                    0                 3/3               2/3                   2/2
        2                    1                 3/3               1/3                   0/2
        3                    0                 3/3               2/3                   0/2
        2                    1                 3/3               2/3                   0/2
        2                    0                 2/2               2/2                   2/2
        0                    2                 2/2               0/2                   0/0
       14                    1                13/15              1/13                  1/1
        4                    0                 4/4               1/4                   1/1
        7                    0                 7/8               2/7                   1/2
        0                    2                 2/2               0/2                   0/0
        0                    2                 2/2               1/2                   0/1
        6                    0                 5/6               2/5                   2/2
Total  54                                     53/57             17/53                  9/17
Percentage                                    92.98%            32.07%                52.9%

V. CONCLUSION AND RECOMMENDATION

For this study, the researchers acquired images by taking pictures of random Filipino people of different ages using an 8.0-megapixel cellphone camera with a resolution setting of 1280×728 px, at a minimum distance of 2 ft. The images then went through the Viola-Jones Object Detection Framework, the algorithm used by the researchers for both face and feature detection. After that, the detected images went through the SURF, Harris-Stephens, and Minimum Eigenvalue feature detection algorithms for feature extraction. Following the extraction, the feature points were measured and calculated, and the data went through the SVM-SMO algorithm for age classification using WEKA.

The experiments yielded only 53.85% for the front view and 14.3% for the side view, with zero percent accuracy on age categories beyond 16-20. This is likely because the images used for training were lacking, and/or zooming in to a standard-sized image lost some pixels. The low accuracy stems from the system's inability to accurately extract the feature points needed and to properly measure the distances and angles between these feature points. The measurement of these feature points was heavily dependent on the pixel size, so any manipulation of the images, such as cropping and resizing, may cause them to lose some pixels and therefore lose the reliability of the measurements, in turn confusing the SVM-SMO and causing it to predict inaccurate age estimates.

REFERENCES

[1] Y. H. Kwon and N. D. V. Lobo, "Age classification from facial images," Computer Vision and Image Understanding, vol. 74, no. 1, pp. 1-21, 1999.
[2] R. Shaw, E. Katzel, P. Koltz, D. Kahn, J. Girotto, and H. Langstein, "Aging of the mandible and its aesthetic implications: A three-dimensional CT study," AAPS Annual Meeting, 2009.
[3] Z. Alom, S. Islam, N. Kim, J. H. Park, and M. L. Piao, "Optimized facial features-based age classification," World Academy of Science, Engineering and Technology, no. 63, pp. 448-452, 2012.
[4] K. C. Fan and C. Lin, "Triangle-based approach to the detection of human face," Pattern Recognition, vol. 34, pp. 1271-1284, 2001.
[5] A. R. Chowdhury, R. Jana, and H. Pal, "Age group estimation using face angle," Journal of Computer Engineering, vol. 7, no. 5, pp. 35-39, 2012.

Julianson Berueco was born in Cavite, Philippines, in 1991. He attained his BS Computer Science degree at Silliman University, Dumaguete. He is currently working in the R&D Department as an Application Developer at E-Hors, Dumaguete City, Philippines. His current ambition is to attain a Master's Degree in Marine Biology.




Kim Lopena is a Filipino born in Dumaguete City in 1989. He recently attained his B.S. in Computer Science degree from Silliman University, Dumaguete City in 2014. He is currently interning for Rentah Inc., Brooklyn, New York. His current endeavor is to become an IT/Computer Systems (Back-End) Developer.

Arby Moay is a Filipino born in Dapitan City in 1993. He was a scholar at Philippine Science High School - CMC, and attained a BS Computer Science degree at Silliman University, Dumaguete City in 2014. He is currently working as an R&D Engineer I for NetworkLabs Nokia, TechnoHub, Quezon City. He currently dreams of becoming a Software Architect for NWL.

Mehdi Salemiseresht was born in Tehran, Iran, in 1982. He recently attained his B.S. in Computer Science degree from Silliman University, Dumaguete City in 2014. He is currently working for NetworkLabs Nokia, Manila, Philippines.

Chuchi Montenegro was born in Dapitan City, Philippines, in 1971. She received her B.S. in Computer Engineering degree from Cebu Institute of Technology - University, Cebu City in 1992, and a Master in Computer Science from the same university in 2009. She is an assistant professor at the College of Computer Studies, Silliman University, Dumaguete City. Her research interests are in the fields of neural networks, signal processing, and speech recognition.

