Smart Cameras Enabling Automated Face Recognition in The Crowd For Intelligent Surveillance System
ABSTRACT: Smart Cameras are rapidly finding their way into Intelligent Surveillance Systems.
Recognizing faces in the crowd in real-time is one of the key features that will significantly
enhance Intelligent Surveillance Systems. The main challenge is the fact that the enormous
volumes of data generated by high-resolution sensors can make it computationally impossible to
process on mainstream processors. In this paper we report on the prototype development of a
smart camera for automated face recognition using very high-resolution sensors. In the proposed
technique, the smart camera extracts all the faces from the full-resolution frame and only sends
the image information from these face areas to the main processing unit — vastly reducing data
rates. Face recognition software that runs on the main processing unit will then perform the
required pattern recognition.
Introduction
Video surveillance is becoming increasingly essential as society relies on it to improve
security and safety. For security, such systems are usually installed in
areas where crime can occur such as banks and car parks. For safety, the systems are installed in
areas where there is the possibility of accidents such as on roads or motorways and at
construction sites.
Currently, surveillance video data is used predominantly as a forensic tool, thus losing its primary
benefit as a proactive real-time alerting system. For example, the surveillance systems in London
managed to track the movements of the four suicide bombers in the days prior to their attack on
the London Underground in July 2005, but the footage was only reviewed after the attack had
occurred. What is needed is continuous monitoring of all surveillance video to alert security
personnel or to sound alarms while there is still time to prevent or mitigate the injuries or damage
to property. The fundamental problem is that while mounting more video cameras is relatively
cheap, finding and funding human resources to observe the video feeds is very expensive.
Moreover, human operators for surveillance monitoring rapidly become tired and inattentive due
to the monotonous nature of the task. There is a strong case for automated surveillance
systems where powerful computers monitor the video feeds — even if they only help to keep
human operators vigilant by sending relevant alarms.
Smart cameras can improve video surveillance systems by making autonomous video
surveillance possible. Instead of using surveillance cameras to solve a crime after the event, a
smart camera could recognize suspicious activity or individual faces and give out an alert so that
an unwanted event could be prevented or the damage lessened. From another perspective, smart
cameras reduce the need for human operators to continually monitor all the video feeds just to
detect the activities of interest, thus reducing operating costs and increasing effectiveness.
Smart Cameras
Smart cameras are becoming increasingly popular with advances in both machine vision and
semiconductor technology. In the past, a typical camera was only able to capture images. Now,
with the smart camera concept, a camera will have the ability to generate specific information
from the images that it has captured. So far there does not seem to be a well-established definition
of what exactly a smart camera is. In this paper, we define a smart camera as a vision system
which can extract information from images and generate specific information for other devices
such as a PC or a surveillance system without the need for an external processing unit.
Figure 1 shows a basic structure of a smart camera. Just like a typical digital camera, a smart
camera captures an image using an image sensor, stores the captured image in the memory, and
transfers it to another device or user using a communication interface. However, unlike the
simple processor in a typical digital camera, the processor in a smart camera will not only control
the camera functionalities, but it is also able to analyse the captured images to obtain extra
information.
Note that even in the dense crowd of Figure 3, the faces suitable for recognition only represent a
very small proportion of the image area. In many scenes, faces would represent less than 1% of
the image. Thus the smart camera would not overload the client processor by transmitting huge
amounts of high-resolution image data which is destined to be discarded immediately after face
detection. Such massive data reduction at source by up to two orders of magnitude is an
immediate and significant benefit of this approach.
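The scale of this reduction can be sketched numerically. The frame resolution, face-patch size and face count below are illustrative assumptions, not measurements from the prototype:

```python
# Sketch of the data reduction from transmitting face patches instead of
# full frames. All dimensions here are illustrative assumptions.
def reduction_factor(frame_w, frame_h, face_w, face_h, n_faces):
    """Ratio of full-frame pixels to pixels in the transmitted face patches."""
    full_frame = frame_w * frame_h
    face_pixels = face_w * face_h * n_faces
    return full_frame / face_pixels

# e.g. a 7 MP frame (3072x2304) with ten 64x64-pixel face patches:
factor = reduction_factor(3072, 2304, 64, 64, 10)
print(f"data reduced by a factor of {factor:.0f}")  # ~173x
```

Even with ten detected faces per frame, the reduction is well over two orders of magnitude, consistent with the figure quoted above.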
Figure 3: Overall Scene (a), ROI extracted from scene with resolution of
7MP (b), 5 MP(c), 3MP (d), 1MP (e) and VGA (f).
CMOS image sensors offer high-resolution, low-noise output. Given the low power consumption
and high speed of CMOS technology, CMOS-based image sensors are expected to outperform
CCD-based image sensors in the future (Litwiller, 2001). There are many CMOS image sensors on the
market. Table 1 shows the highest-resolution sensor product from three leading CMOS image
sensor manufacturers: OmniVision, Micron and Kodak. It is noticeable that the frame rate is
inversely proportional to the resolution of the camera. For wide-angle surveillance, since no rapid
movements of objects of interest are expected, 5 high-resolution frames per second can be
considered an acceptable baseline performance for the prototype.
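The raw data rates implied by this baseline can be sketched as follows. The resolutions and the assumption of 8 bits per pixel (raw Bayer/monochrome output) are illustrative, not figures taken from the sensors in Table 1:

```python
# Rough raw data-rate estimate at the 5 fps baseline, assuming 8 bits per
# pixel of raw sensor output (an illustrative assumption, not a datasheet value).
def data_rate_mbps(megapixels, fps, bits_per_pixel=8):
    """Raw sensor output in megabits per second."""
    return megapixels * 1e6 * bits_per_pixel * fps / 1e6

for mp in (1, 3, 5, 7):
    print(f"{mp} MP at 5 fps: {data_rate_mbps(mp, 5):.0f} Mbit/s")
# a 7 MP sensor at 5 fps already produces 280 Mbit/s of raw data
```

Even at this modest frame rate, the raw output of a 7 MP sensor approaches the practical limits of the slower camera interfaces discussed below.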
Currently, there are five commonly used high-bandwidth video interface standards
available: FireWire 400 or IEEE 1394a, FireWire 800 or IEEE 1394b, USB2, Gigabit Ethernet or
GigE, and Camera Link. Table 2 shows the general specifications of the five interfaces. USB 2
and FireWire 400 can be considered as unsuitable in terms of data transfer speed if we compare
them with the current resolution of CMOS image sensor technology. While Camera Link is
suitable for very fast data transfer, it only supports one-to-one device connection. This means a
network of cameras could not be supported by this interface. GigE and FireWire 800 interfaces
can be considered the most suitable interfaces for the proposed high-resolution surveillance as
they both offer considerable data transfer speeds and allow for the networking of cameras.
For our first prototype, we decided to use the FireWire 800 interface, because it is currently the
more established of the two. We will try to incorporate the GigE interface in a future prototype
once it becomes more mature. With the introduction of the new GigE Vision camera interface, it is
expected GigE will become the dominant machine vision interface in the near future.
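The interface comparison can be made concrete with a small sketch. The link rates below are approximate published figures, and the 50% effective-throughput factor is an illustrative allowance for protocol overhead, not a measured value:

```python
# Nominal signalling rates (Mbit/s) of the five interfaces discussed.
# The 0.5 efficiency factor is an assumed allowance for protocol overhead.
interfaces_mbps = {
    "USB 2.0": 480,
    "FireWire 400 (IEEE 1394a)": 400,
    "FireWire 800 (IEEE 1394b)": 800,
    "Gigabit Ethernet (GigE)": 1000,
    "Camera Link (base)": 2040,
}

required_mbps = 280  # e.g. a 7 MP sensor at 8 bits/pixel and 5 fps (illustrative)
suitable = {
    name: rate * 0.5 >= required_mbps  # assume ~50% effective throughput
    for name, rate in interfaces_mbps.items()
}
for name, ok in suitable.items():
    print(f"{name}: {'sufficient' if ok else 'too slow'}")
```

Under these assumptions, USB 2.0 and FireWire 400 fall short while FireWire 800, GigE and Camera Link have comfortable headroom, consistent with the selection above.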
One of the key features of FPGAs is their large arrays of parallel logic and registers,
which enable designers to produce effective parallel architectures. Parallel processing is an
important feature especially for embedded systems that require high-level computation in real-
time — for example, face detection on a smart camera processor. Parallel processing allows
information to be transferred effectively and obtains end results faster since processing tasks are
segregated to be carried out concurrently. At the same time, parallel processing reduces power
consumption considerably, especially in processes which involve back-to-back memory access.
Additionally, FPGAs allow the incorporation of a microprocessor on the same chip. For our smart
camera prototype, the Spartan-3 FPGA was chosen as the main processing unit. We believe that the
Spartan-3 platform imposes an interesting challenge where optimum hardware resources will be
utilized in every aspect of the design.
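The decomposition that the FPGA exploits in hardware can be illustrated in software. In the sketch below the "workers" stand in for replicated logic blocks; the frame is partitioned into independent stripes, and each stripe computes row-wise running sums, a step toward the integral image used by Viola-Jones features. This is an illustrative analogy, not the prototype's actual hardware design:

```python
# Software analogy of FPGA parallelism: split the frame into independent
# stripes and process them concurrently. On an FPGA the concurrent units are
# replicated logic blocks; threads merely illustrate the decomposition.
from concurrent.futures import ThreadPoolExecutor

def integral_rows(stripe):
    """Per-stripe work: running sums along each row (first half of an
    integral-image computation)."""
    out = []
    for row in stripe:
        acc, sums = 0, []
        for v in row:
            acc += v
            sums.append(acc)
        out.append(sums)
    return out

frame = [[1] * 8 for _ in range(8)]                  # toy 8x8 frame of ones
stripes = [frame[i:i + 2] for i in range(0, 8, 2)]   # four independent stripes
with ThreadPoolExecutor(max_workers=4) as pool:
    result = [row for part in pool.map(integral_rows, stripes) for row in part]
# each output row is the running sum 1, 2, ..., 8
```

Because the stripes share no state, they can be processed in any order or fully in parallel, which is precisely the property the FPGA fabric exploits.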
The face recognition to be implemented on the system is that proposed by Shan et al. (2006). Their
system consists of three major components: 1) a Viola-Jones face detection module (Viola
and Jones, 2001) based on cascaded simple binary features to rapidly detect and locate multiple
faces, 2) a normalization module based on the eye locations, and finally 3) Adaptive Principal
Component Analysis to recognize the faces. As stated earlier, the face detection part (1) will be
implemented on the FPGA platform of the camera while the remaining modules will be
implemented on the client PC.
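The key idea behind the Viola-Jones cascade (1) is early rejection: each stage is a cheap classifier, and a candidate window is discarded as soon as any stage fails, so the vast majority of non-face windows cost only a stage or two. The sketch below uses toy stage functions and thresholds rather than trained classifiers:

```python
# Minimal sketch of the cascade structure used in Viola-Jones detection.
# Stage functions and thresholds are toy values, not trained classifiers.
def cascade_classify(stage_fns, window, thresholds):
    """Return True only if the window passes every stage in order."""
    for stage, threshold in zip(stage_fns, thresholds):
        if stage(window) < threshold:
            return False  # early rejection: no further stages are evaluated
    return True

# Toy stages: progressively stricter checks on a scalar "feature response".
stages = [lambda w: w, lambda w: w * 0.9, lambda w: w * 0.8]
thresholds = [0.2, 0.3, 0.4]
print(cascade_classify(stages, 0.6, thresholds))  # passes all stages -> True
print(cascade_classify(stages, 0.1, thresholds))  # fails stage 1 -> False
```

It is this early-rejection structure, combined with simple binary features computed from an integral image, that makes the detector fast enough for a hardware implementation on the camera's FPGA.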
The face detection and face recognition processes usually require considerable computing power
and could require significant time when running on a standard PC. In a hardware implementation
however, processes can be decomposed and run in parallel so that less time will be taken to
execute the processes. FPGA’s provide a flexible and suitable reconfigurable platform for
applying suitable architecture of the processor. If sufficient parallelism is applied, it is possible
for the overall process to run in real-time.
Acknowledgements
NICTA is funded by the Australian Government's department of Communications, Information
Technology, and the Arts and the Australian Research Council through Backing Australia's
Ability and the ICT Research Centre of Excellence programs, and the Queensland State
Government.
References
D. Litwiller. 2001. CCD vs. CMOS: Facts and Fiction. Photonics Spectra.
M. Bramberger, A. Doblander, A. Maier, B. Rinner, and H. Schwabach. 2006. Distributed embedded smart cameras for
surveillance applications. Computer. 39: 68-75.
P. Garcia, K. Compton, M. Schulte, E. Blem, and W. Fu. 2006. An Overview of Reconfigurable Hardware in
Embedded Systems. EURASIP Journal on Embedded Systems. 1-19.
P. Viola and M. Jones. 2001. Rapid object detection using a boosted cascade of simple features. IEEE Conference on Computer
Vision and Pattern Recognition. 511-518.
T. Shan, B. C. Lovell, S. Chen, and A. Bigdeli. 2006. Reliable Face Recognition for Intelligent CCTV. 2006 RNSA
Security Technology Conference. Canberra. 356-364.
W. Wolf, B. Ozer, and T. Lv. 2002. Smart cameras as embedded systems. Computer. 35: 48-53.