Kim2020 - Low-Cost Hardware Architecture For Integral Image Generation Using Word Length Reduction
Abstract—An integral image is widely used in face detection to calculate feature values at high speed. However, implementing integral images in hardware requires considerable logic and memory resources. This paper proposes a hardware architecture for integral image generation with reduced resource usage by applying the word length reduction method. When implemented in an FPGA, the proposed architecture uses about 83% fewer Slice LUTs than the conventional integral image method. Therefore, the proposed architecture is suitable for low-cost real-time face detection systems.

Keywords—FPGA; hardware architecture; integral image; face detection

I. INTRODUCTION

In embedded systems, the Haar classifier is generally used for face detection [1]. Accordingly, many studies have been conducted to implement the Haar classifier in hardware. For real-time operation of the Haar classifier, the integral image should be used to minimize the time taken to access the data in memory. However, generating an integral image of the entire image demands substantial hardware resources, especially as the resolution of the image increases. To solve this hardware resource problem, many studies use architectures that generate an integral image for a sub-window and update the integral image each time the sub-window shifts [2, 3]. Using the sub-window approach, the hardware resources used for storing integral images decrease significantly. Nevertheless, the hardware resources required to generate the integral image are still high because, when updating the integral image, the values of the leftmost pixels of the integral image before shifting must be subtracted from all other elements of the integral image. Therefore, we propose a real-time integral image generation method and its hardware architecture to reduce the hardware resources used in the computation and storage of the integral image.

II. PROPOSED METHOD

The proposed method produces integral images by accumulating pixel values to the right end of the horizontal direction of the input image while the vertical size of integral images is fixed at the height of the sub-window. The proposed method stores the integral image of only the size of the sub-window, and it updates the integral image by simply discarding the oldest values and including newly accumulated right pixel values when the sub-window shifts. The proposed method requires less computation than conventional methods because it does not subtract the oldest values to update the integral image when the sub-window shifts to the right. However, generating an integral image in this method leads to growth in the bit width of the elements in the integral image. To solve this problem, we apply the word length reduction method [4] to each element in the integral image. This is the first study to apply both the sub-window approach and the word length reduction method to the integral image.

The word length reduction method allows the bit width of each element in the integral image to be limited by the maximum sum of the rectangular area used in the calculation of the feature value. This means that even if the integral image reflects the entire horizontal width of the image, the element size of the integral image remains small. With the word length reduction method, the cutoff value of the integral image is determined by the maximum sum of the rectangular area used in the Haar classifier. Then, when an element of the integral image is calculated, the cutoff value is subtracted if the accumulated value exceeds the cutoff value. The feature value calculation of the word length reduction method is carried out in a similar way to the traditional feature value calculation process. However, when the sum of the rectangular area is negative or greater than the cutoff value, the cutoff value is added or subtracted as a correction. This requires additional operations in the feature value calculation, but the cost is outweighed by the benefits because the correction can be carried out simply by comparison and compensation.

III. PROPOSED ARCHITECTURE

Fig. 1 illustrates the proposed integral image generator. The proposed integral image generator consists of a pixel buffer, a vertical adder tree, adders for the horizontal sum, and an integral image buffer. The pixel buffer consists of 19 line buffers for storing input pixel values. Twenty vertical pixel values, which consist of the current input pixel value and 19 outputs of the pixel buffer, are sent to the vertical adder tree. The vertical adder tree then performs a cumulative sum operation on the 20 vertical pixels to produce 1 × 20 column sum data. The column sum data are summed horizontally with the previous 1 × 20 vertical integral image data in each row to
TABLE I. COMPARISON OF ARITHMETIC UNITS FOR INTEGRAL IMAGE
GENERATION