Feature Based Head Pose Estimation For C
ABSTRACT: In this paper we present an efficient technique for a real-time face-orientation information system that controls motor movement and can be used for a humanoid robot through computer vision. The project aims at applications in Human Computer Interaction (HCI). Our framework requires no learning or temporal modeling and can be used instantly. The system identifies the orientation of the face with respect to the detected eye region and tracks changes in face orientation. In addition, the algorithm performs Haar-classifier based face localization and eye pupil detection in parallel. It was found experimentally that recognition improved by incorporating the eye location, which is obtained by applying the circular Hough transform to the eye region. Finally, we demonstrate the system in an augmented reality application in which raw data from the computer is passed to a microcontroller to move a geared DC motor according to the movement of the face in real time.
KEYWORDS: Haar Features, Classifiers, OpenCV, Gaussian Filter, Canny Edge Detector, Hough
Transform, Microprocessor
I. INTRODUCTION
motor as per the movement of the face. This feature can be extended to rotate the head of a humanoid robot according to the movement of a person's face in real time.
Viola and Jones [1] were the first to introduce Haar cascade classifiers and apply them to the task of face detection. The idea of using a cascade of simple classifiers led to an accurate and computationally efficient detection system. Lienhart et al. [2] improved the Haar cascade classifiers by enlarging the feature pool with rotated Haar-like features; they additionally tested the influence of various weak classifiers and boosting algorithms on the performance of cascades. Ensembles of weak classifiers were also used by Meynet et al. [3], who combined simple Haar cascade classifiers with another parallel ensemble of weak classifiers; the Haar cascade was used to discard easy-to-classify non-faces. Wilson and Fernandez [4] used cascades trained on other features to extract the eyes, mouth and nose from the face region. As processing the whole face led to many false positives (FP), they proposed a regionalized search approach, which exploits knowledge about the structure of a face: the left eye is sought in the upper-left part of the face, the right eye in the upper-right, the nose in the centre and the mouth in the lower part. Many new algorithms for face and eye detection and tracking using Haar cascades have been developed in the recent past. Subramanya et al. [5] proposed a technique that uses a binary classifier with a dynamic training strategy and an unsupervised clustering stage in order to efficiently track the pupil (eyeball) in real time. The dynamic training strategy makes the algorithm invariant to lighting conditions; their experimental results from a real-time implementation show that the algorithm is robust and able to detect the pupils under various illumination conditions. Schedin and White [6] used OpenCV to load boosted Haar classifier cascades, which allows detection of the face and facial features by hue-histogram thresholding via OpenCV's Camshift feature and template matching; their method achieves great speed and generality at a slight cost in accuracy. Raheja et al. [7] used a pattern-matching method based on the PCA algorithm to recognise hand gestures for specified robotic actions; they implemented part of their algorithm on an FPGA while the rest runs on a PC. Their experimental results show that the proposed system can detect hand gestures with 95% accuracy.
Research done in the lab should reach common people; therefore, to efficiently use face and eye tracking, various useful applications have been developed and successfully implemented by different researchers over time. Recently, a number of research papers have been published on real-time applications in computer vision. Hadid et al. [8] used Haar-like features with AdaBoost for face and eye detection, and a Local Binary Pattern (LBP) approach for face authentication on mobile phones in order to increase their safety level. Their experimental results show good face detection performance and average authentication rates of up to 96% for faces of 80×80 pixels. Miluzzo et al. [9] introduced a novel hands-free interfacing system capable of driving mobile applications using only the user's eye movement and actions (e.g., a wink). They presented a prototype implementation of EyePhone on a Nokia N810 which is capable of tracking the position of the eye on the display and mapping this position to an application that is activated by a wink. Kateja and Panchal [10] used Haar features for eye detection and a simulator for a driver-drowsiness detection system intended to alert drivers. Raajan et al. [11] suggest a hybrid gesture recognition system for computer interfacing and wireless robot control; the robot in their project is controlled by an RF module.
The OpenCV library is a cross-platform library developed by Intel which mainly aims at real-time computer vision. It implements a considerable number of image processing functions and can be modified to implement custom processing methods. OpenCV functions cover a wide range of domains, among them Human-Computer Interaction (HCI); object identification, segmentation and recognition; and face and gesture recognition.
As we use the Haar-based Viola-Jones method for face and eye detection, we briefly describe how the detection system works. To begin face detection, or any object detection, the computer is first trained on a database containing hundreds to thousands of face and non-face pictures; this is called training a classifier, which can then distinguish a face from a non-face. Given an input image, the computer determines whether it contains a face by applying a certain set of filters and rules. As in the applications described in section II, most of the tracking algorithm developed in this first approach is based on the Haar-like object detector. Haar features can be thought of as convolutional kernels of different shapes and sizes [1] which are superimposed on a person's face for the purpose of facial feature detection; they bear some resemblance to facial features such as the nose, eyes and mouth, and are used to detect the face or any other feature in a given image. This object detector was proposed by Paul Viola [1] and later improved by Rainer Lienhart [2]. Since our project detects the face and eyes, we refer only to the face detection Haar features.
In the Viola-Jones method, each given image (say 360×420 pixels) is scanned with a window of 24×24 pixels, and the Haar features are applied to that window from the top-left corner to the bottom-right corner, so a single feature is evaluated roughly 100 to 1000 times per window. Once a feature has been applied over a window, its size is increased, say from 1×2 pixels to 1×4 pixels, and it is applied all over the window again. In total this yields more than 160,000 features for a single 24×24 window, and evaluating this huge number of features directly is practically infeasible. The solution is the AdaBoost algorithm: boosting eliminates redundant features and narrows the roughly 160,000 features down to a couple of thousand. The selected features are arranged in a cascade through which each 24×24 input window must pass; if the window fails at some stage it is classified as a non-face, while windows that pass through all the selected Haar classifiers are faces. The complete summary of training the classifier is shown in Fig. 1.
[Fig. 1: Training the classifier — 100-1000 different face and non-face images are fed to the system, AdaBoost selects features, and input windows that pass through the cascade are reported as detected faces.]
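The window scanning and cascade evaluation described above are made fast by the integral image, which lets any rectangular pixel sum be computed with four lookups. The following is a minimal, self-contained C++ sketch of an integral image and one two-rectangle Haar feature; it illustrates the idea from [1] and is not the paper's code:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Integral image: s stores, at (x+1, y+1), the sum of all pixels above
// and to the left of (x, y), so any rectangle sum costs four lookups.
struct Integral {
    int w, h;
    std::vector<long long> s;  // (w + 1) x (h + 1), padded with a zero row/column

    Integral(const std::vector<uint8_t>& img, int w_, int h_)
        : w(w_), h(h_), s((w_ + 1) * (h_ + 1), 0) {
        for (int y = 0; y < h; ++y)
            for (int x = 0; x < w; ++x)
                s[(y + 1) * (w + 1) + (x + 1)] =
                    img[y * w + x]
                    + s[y * (w + 1) + (x + 1)]
                    + s[(y + 1) * (w + 1) + x]
                    - s[y * (w + 1) + x];
    }

    // Sum over the rectangle with top-left (x, y), width rw, height rh.
    long long rect(int x, int y, int rw, int rh) const {
        return s[(y + rh) * (w + 1) + (x + rw)]
             - s[y * (w + 1) + (x + rw)]
             - s[(y + rh) * (w + 1) + x]
             + s[y * (w + 1) + x];
    }
};

// A two-rectangle Haar feature: left (white) half minus right (black) half.
long long haarTwoRectHorizontal(const Integral& ii, int x, int y, int rw, int rh) {
    return ii.rect(x, y, rw, rh) - ii.rect(x + rw, y, rw, rh);
}
```

A bright-left/dark-right patch yields a large positive response, which is how such a feature responds to edge-like facial structures.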
OpenCV can detect the face and eyes in real time with 80% precision using the Viola-Jones object detection algorithm, which uses a Haar cascade classifier [12]. We were able to detect the face and eyes in real time with good precision using OpenCV. We tried to train our own Haar cascade classifier, but it did not give good results; since training consumes a lot of time and is not necessary for our experiment, we used public-domain cascades instead. To detect the face, the eye region, and then each eye separately within the eye region, a group of four cascades of classifiers, each trained specifically for its feature, was chosen.
A paper by Santana et al. [13] lists all the available public-domain classifiers and compares them on the same targets. For each classifier, its ROC (Receiver Operating Characteristic) curve was computed using the original release and some variations obtained by reducing its number of stages. Observing the Area Under the Curve (AUC) of the resulting ROC curves, it is evident that haarcascade_frontalface_alt2 performs better than the others, which is why we used it in our experiment.
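The AUC criterion used in this comparison can be sketched as a simple trapezoidal integration over ROC points; this is a generic illustration of the metric, not the code of [13]:

```cpp
#include <cassert>
#include <cmath>
#include <utility>
#include <vector>

// Trapezoidal area under an ROC curve given as (FPR, TPR) points
// sorted by increasing false-positive rate. An AUC of 1.0 is a
// perfect detector; 0.5 matches random guessing.
double rocAUC(const std::vector<std::pair<double, double>>& roc) {
    double auc = 0.0;
    for (size_t i = 1; i < roc.size(); ++i)
        auc += (roc[i].first - roc[i - 1].first)
             * (roc[i].second + roc[i - 1].second) / 2.0;
    return auc;
}
```

Comparing cascades by AUC, as [13] does, rewards classifiers that keep the true-positive rate high across the whole range of false-positive rates rather than at a single operating point.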
IV. ALGORITHM
[Fig. 2: Face and Eye Detection System using Haar Cascade — input image from camera → RGB to gray → Gaussian filter → Canny edge detector → face and eye region detection → eye region coordinates → applying Hough transform.]
RGB to gray conversion turns the RGB image into grayscale by eliminating the hue and saturation information while retaining the luminance. The standard luminance weighting used by OpenCV for the conversion is:
Gray = 0.299 R + 0.587 G + 0.114 B
The next step is to apply a Gaussian filter to smooth the image, reducing noise and avoiding false circle detection; after the filter is applied the image is blurred, which suppresses the noise present in the picture. This also reduces the number of features in the image so that the eye can be detected more easily.
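These first two pipeline stages (luminance conversion and 3×3 Gaussian smoothing) can be sketched in plain C++; the weights and kernel are the standard ones, but this toy code only illustrates what OpenCV's built-in functions do in the actual system:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Per-pixel luminance using the ITU-R BT.601 weights (the weighting
// OpenCV applies for RGB-to-gray conversion).
double toGray(double r, double g, double b) {
    return 0.299 * r + 0.587 * g + 0.114 * b;
}

// 3x3 Gaussian kernel (1 2 1; 2 4 2; 1 2 1) / 16 applied to interior
// pixels of a w x h grayscale image; border pixels are left unchanged.
std::vector<double> gaussian3x3(const std::vector<double>& g, int w, int h) {
    static const int k[3][3] = {{1, 2, 1}, {2, 4, 2}, {1, 2, 1}};
    std::vector<double> out(g);
    for (int y = 1; y + 1 < h; ++y)
        for (int x = 1; x + 1 < w; ++x) {
            double acc = 0.0;
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx)
                    acc += k[dy + 1][dx + 1] * g[(y + dy) * w + (x + dx)];
            out[y * w + x] = acc / 16.0;
        }
    return out;
}
```

Because the kernel weights sum to 16, a uniform region passes through unchanged while isolated noisy pixels are averaged away, which is exactly why the blur precedes circle detection.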
CvSeq* faces = cvHaarDetectObjects(img, cascade, storage, 1.1, 2, CV_HAAR_DO_CANNY_PRUNING, cvSize(40, 40));
The .xml files for the faces are loaded into a cvHaarClassifierCascade variable. A linked list of face regions can be created with cvHaarDetectObjects, which returns a CvSeq of face regions. The points of each ROI are extracted with cvGetSeqElem, which returns a character pointer into the linked list that is then type-cast to the rectangle type. This ROI of the image can be highlighted with a square boundary around the face.
A feature is added in the program which counts the number of faces detected on the screen in
real time.
The circular Hough transform relies on the equation of a circle:
R² = (x − a)² + (y − b)²
Here a and b are the coordinates of the centre, and R is the radius of the circle. The parametric representation of this circle is:
x = a + R cos(θ)
y = b + R sin(θ)
where θ sweeps the circle from 0 to 2π. For simplicity, most CHT programs set the radius to a constant (hard-coded) value or let the user set a range (minimum and maximum) before running the application.
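The voting step of a fixed-radius circular Hough transform can be sketched as follows. This toy C++ version sweeps a full 360° per edge pixel and takes the accumulator peak as the centre; it is only an illustration and not OpenCV's gradient-based cvHoughCircles:

```cpp
#include <cassert>
#include <cmath>
#include <cstdlib>
#include <utility>
#include <vector>

// Each edge pixel (x, y) votes for every candidate centre
// (a, b) = (x - R cos t, y - R sin t); the accumulator cell with the
// most votes is the detected circle centre.
std::pair<int, int> houghCircleCentre(
        const std::vector<std::pair<int, int>>& edges,
        int w, int h, double R) {
    const double kPi = std::acos(-1.0);
    std::vector<int> acc(w * h, 0);
    for (const auto& e : edges)
        for (int t = 0; t < 360; ++t) {
            double th = t * kPi / 180.0;
            int a = static_cast<int>(std::lround(e.first - R * std::cos(th)));
            int b = static_cast<int>(std::lround(e.second - R * std::sin(th)));
            if (a >= 0 && a < w && b >= 0 && b < h) ++acc[b * w + a];
        }
    int best = 0;
    for (int i = 1; i < static_cast<int>(acc.size()); ++i)
        if (acc[i] > acc[best]) best = i;
    return {best % w, best / w};
}
```

For a pupil, the edge pixels come from the Canny output, and the accumulator peak lands on the pupil centre because only the true centre receives a vote from every edge pixel.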
The CV_HOUGH_GRADIENT method of cvHoughCircles traces the circle in the image, which in our case is the eye pupil. The eye detection step of the proposed method first detects possible eye centres with the circular Hough transform, then extracts a histogram of gradients from a rectangular window centred at each candidate eye centre.
CvSeq* circles = cvHoughCircles(gray, storage, CV_HOUGH_GRADIENT, 1, gray->height, 35, 25);
The cvCircle function draws a circle around the eye pupil; the radius is taken as 15 and the line thickness as 2 (a negative thickness would draw a filled circle), and the remaining arguments are the default values for the desired functioning.
cvCircle(img, cvPoint(cvRound(p[0] + pt1.x), cvRound(p[1] + pt1.y)), 15, CV_RGB(0, 255, 0), 2, 8, 0);
At first we used the eye coordinates to drive the DC motor, but detecting the eyes is a difficult task because the detected eye circle keeps shifting slightly during the search; it was therefore better to use the eye region coordinates, which are stable. We used the upper-left eye region coordinates to control the motor. When the face moves to the left, the x coordinate goes below 100 and the signal X1 is sent to the computer's USB port; when the x coordinate is greater than 300, the signal X2 is sent; and when the coordinate lies between 101 and 299, i.e. the face is in the centre position, the signal X3 is sent.
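The mapping from the eye-region x coordinate to the serial signal follows directly from the thresholds above; faceSignal is a hypothetical helper name used only for illustration:

```cpp
#include <cassert>
#include <string>

// Maps the upper-left eye-region x coordinate to the signal sent to
// the microcontroller: below 100 the face has moved left (X1), above
// 300 it has moved right (X2), otherwise it is roughly centred (X3).
std::string faceSignal(int x) {
    if (x < 100) return "X1";  // face moved left
    if (x > 300) return "X2";  // face moved right
    return "X3";               // face centred
}
```

The wide 101-299 centre band acts as a dead zone, so small jitter in the detected eye region does not toggle the motor.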
V. ELECTRONICS
The output signals X1, X2 and X3 are sent as 8-bit values over the RS-232 serial port. These signals are fed to a MAX232 IC (see Fig. 7), which converts the RS-232 voltage levels into TTL-compatible logic levels for the 8051 microcontroller. The microcontroller generates appropriate signals of defined duration, which are then given to an L293D, a dual H-bridge motor driver integrated circuit (IC). The motor driver IC acts as a current amplifier: it takes a low-current control signal and provides a higher-current signal, which is used to drive the motors. In its common mode of operation two DC motors can be driven simultaneously, both in the forward and the reverse direction. The microcontroller sends binary values to the L293D IC, 01 for rotate left, 10 for rotate right and 11 for the centre position; on receiving a value, the L293D generates a signal that rotates the motor accordingly.
We used an interrupt-driven serial routine, written with the Keil compiler, which works according to the algorithm below:
A. Algorithm
1. If 01 is received (the coordinate is less than 100), move the motor anticlockwise for 2 seconds and set the Left Flag (LF) = 1.
2. If 10 is received (the coordinate is more than 300), move the motor clockwise for 2 seconds and set the Right Flag (RF) = 1.
3. If 11 is received, check both flags: if LF is set, move the motor clockwise; otherwise move it anticlockwise.
The flow chart (see Fig. 8) gives the complete pictorial view of the algorithm.
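Steps 1-3 can be simulated on the host side in plain C++ (the real firmware runs on the 8051 and also times the rotation). decide is a hypothetical helper; 'L' means anticlockwise, 'R' clockwise and '-' no movement, and we read step 3's "otherwise" as the RF case, clearing the flag so the motor returns to centre:

```cpp
#include <cassert>

// Decide the motor action for one received 2-bit code, updating the
// Left/Right flags as in steps 1-3 of the algorithm above.
char decide(int code, bool& lf, bool& rf) {
    if (code == 0b01) { lf = true; return 'L'; }  // face left: anticlockwise
    if (code == 0b10) { rf = true; return 'R'; }  // face right: clockwise
    if (code == 0b11) {                           // face centred: undo last turn
        if (lf) { lf = false; return 'R'; }
        if (rf) { rf = false; return 'L'; }
    }
    return '-';                                   // already centred: no movement
}
```

The flags are what let the controller know which way to rotate back when the face returns to the centre band.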
[Fig. 8: Flow chart of the motor-control algorithm — initialise LF and RF to 0, compare the coordinates against the 100/300 thresholds, rotate the motor (10 = clockwise, 01 = anticlockwise), call a delay, and repeat until the count reaches 0.]
VI. RESULT
We have successfully implemented an algorithm which can detect eye pupil using Hough
transform and with the help of eye region coordinates we are able to move the geared dc motor
as per the movement of eye region which eventually signifies the face movement. This can be
used to rotate the face of the humanoid robot in real time according to the movement of a
person standing in front of a camera. The code for face and eye detection, is written in C++ on
Code Block IDE (Integrated Development Environment) using OpenCV image processing
librates. The program to control the motor by 8085 microprocessor is written in C language
using Kiel compiler.
[Result frames: left rotate, centre, right rotate.]
The main limitation of the proposed method is that it cannot model head pose in the yaw and pitch angles, which are vital head movements; the algorithm works only when the face moves horizontally in front of the camera. Future work can address this shortcoming. Although the proposed algorithm can be used for real-time control of a computer screen in various applications, many real-time applications can also be built for blind and physically handicapped persons and for human gesture recognition.
The face detection and counting algorithm is an additional component which can be used to monitor and count the number of persons present in a specified region. The output can further be transmitted over Bluetooth or radio frequency so that the data can be analysed from a distant place.
This is a very simple face/eye tracking system built with OpenCV and only the Viola-Jones face detection framework; the idea is to use face/eye tracking to create an immersive 2D experience for controlling the motor movement. This is of course a very early prototype and can be improved further.
VIII. ACKNOWLEDGMENTS
This research was carried out at the Aditya Birla Training & Research Centre, BKBIET Campus, Pilani, INDIA. The authors would like to thank Dr. P S Bhatnagar, Director, BKBIET, Pilani, for providing research facilities and for his active encouragement and support.
REFERENCES
[1] Viola P, Jones M: Rapid object detection using a boosted cascade of simple features. In: Proceedings of CVPR 1:511-518, 2001.
[2] Lienhart R, Kuranov A, Pisarevsky V Empirical Analysis of Detection Cascades of Boosted
Classifiers for Rapid Object Detection. Technical report, Microprocessor Research Lab, Intel
Labs, 2002.
[3] Meynet J, Popovici V, Thiran J Face Detection with Mixtures of Boosted Discriminant
Features. Technical report, EPFL,2005.
[4] Wilson P, Fernandez J: Facial feature detection using Haar classifiers. J. Comput. Small Coll. 21:127-133, 2006.
[5] Subramanya Amarnag, Raghunandan S. Kumaran and John N. Gowdy; Real Time Eye
Tracking For Human Computer Interfaces, 2003.
[6] George Arceneaux IV, Allison Katherine Schedin, Andrew John Willson White Real Time
Face and Facial Feature Detection and Tracking, 2010.
[7] Jagdish Lal Raheja, Radhey Shyam, G. Arun Rajsekhar and P. Bhanu Prasad (2012). Real-
Time Robotic Hand Control Using Hand Gestures, Robotic Systems - Applications, Control and
Programming, Dr. Ashish Dutta (Ed.), ISBN: 978-953-307-941-7, InTech, DOI: 10.5772/25512.
[8] A. Hadid, J. Y. Heikkil¨a, O. Silven & M. Pietik¨ainen Face and Eye Detection for Person
Authentication in Mobile Phones, 2007.
[9] Emiliano Miluzzo, Tianyu Wang, Andrew T. Campbell, EyePhone: Activating Mobile Phones
With Your Eyes , 2011.
[10] Manoj Kateja , Krunal Panchal Drowsy Driver Detection System: A Novel Approach Using
Haar Like Features , 2012.
[11] N. R. Raajan, R. Krishna Kumar, S. Raghuraman, N. Ganapathy Sundaram, T. Vignesh;Eye-
hand Hybrid Gesture Recognition System for Human Machine Interface;International Journal of
Engineering and Technology (IJET), 2013.
[12] Onindita Afrin, Mahabub Hassan, Mohona Gazi Meem: An eye-controlled system using OpenCV, Brac University, 2012.
[13] M. Castrillón-Santana, O. Déniz-Suárez, L. Antón-Canalís and J. Lorenzo-Navarro; Face and Facial Feature Detection Evaluation, International Conference on Computer Vision Theory and Applications (VISAPP), Funchal, Portugal, 2008.
BIOGRAPHY
Lovendra Solanki received his M.E. degree from M B M Engineering College, Jodhpur. He has 20 years of teaching experience in the field of Electronics Engineering. He is the author of 3 books on Digital Electronics and 6 research papers. Currently he is Associate Professor at B K Birla Institute of Engineering & Technology, Pilani, and is pursuing a Ph.D. from Singhania University, Jhunjhunu. He may be reached at [email protected].