C Clahe Fpga
Fig. 4. The boundary and corner regions and their neighborhood regions
Given that there are 256 bins and the tile size is 64 × 64, setting the threshold (β) to 16 would lead to an even distribution of all pixels over the bins (4096 pixels / 256 bins = 16 pixels per bin). This turns the behavior of CLAHE into that of AHE (Fig. 1c) and brings back the noise amplification problem. Therefore, we set β to 32 [19], which makes the size of each bin larger than the maximum size needed for the even-distribution case and allows more pixels to be absorbed in each iteration. In our experimental studies, we observed that C-CLAHE completes the redistribution process in two iterations in most cases and in three iterations in rare cases. We conservatively set the iteration count to three, unroll the loop, and design the hardware accordingly. We base our hardware performance analysis on this pessimistic design and evaluate the image quality on this hardware implementation.

The block diagram of the HRU is shown in Fig. 7. The histogram redistribution process starts after the outputs of the HGU are transferred to the Mapping RAM. The Address Generator and Controller module shown in Fig. 7 has three outputs. We use the “h[n] < α − β” output to decide whether a histogram bin has enough capacity to accept an excess pixel, and the “e < 0” and “c” outputs to decide whether loop unrolling is applied. The Address Generator and Controller unit also generates the address for accessing the histogram data stored in the Mapping BRAM, which corresponds to the CDF. The Address Generator is based on a simple counter, since we move through the bins iteratively. If a bin is filled to its capacity, that bin is skipped using the MSB (9th bit, Section IV.B) of the address.

C. Calculation of the Mapping Function

The remapping function requires several multiplier and divider components, as seen in equation 3. In addition to these costly operations, the mapping function requires a dynamic calculation of the distance between the tiles and the pixel. Therefore, we restructured this equation to reduce the number of dividers and multipliers. In addition, we eliminated the dynamic distance calculation by replacing the distance expression with the tile size (K). In this approach, the coefficients of the cumulative distributions m(n) depend only on the current address of the pixel, as shown in equation 7. The derivation is shown step by step in equations 5-7.

$$ m(n) = \frac{x}{K}\cdot\frac{y}{K}\,m_{BR}(n) + \frac{x}{K}\cdot\frac{K-y}{K}\,m_{UR}(n) + \frac{K-x}{K}\cdot\frac{y}{K}\,m_{BL}(n) + \frac{K-x}{K}\cdot\frac{K-y}{K}\,m_{UL}(n) \tag{5} $$

$$ m(n) = \frac{1}{K \times K}\Big(xy \times m_{BR}(n) + x(K-y) \times m_{UR}(n) + (K-x)y \times m_{BL}(n) + (K-x)(K-y) \times m_{UL}(n)\Big) \tag{6} $$

$$ m(n) = \frac{1}{K \times K}\Big(xy \times m_{BR}(n) + xy' \times m_{UR}(n) + x'y \times m_{BL}(n) + x'y' \times m_{UL}(n)\Big) \tag{7} $$

where x and y are the horizontal and vertical distances from the current pixel to the center of the upper-left tile, x' = K − x, y' = K − y, and the subscripts UL, UR, BL, and BR denote the upper-left, upper-right, below-left, and below-right tiles (Fig. 8).

The pipelined implementation shown in Fig. 8 uses four multiplier units and the Weight Generator unit, which provides the coefficients (xy', xy, x'y, x'y') indicating the distance between the current pixel and the centers of the four neighboring tiles. These values are stored in the Weight Generator unit, since reading them from memory is more efficient than calculating them each time. For each pixel, four CDF values corresponding to the contextual region are read from BRAM. We multiply these CDF values by their corresponding coefficients, which yields the weighted CDFs of the four contextual regions. Their sum is divided by K × K, which can be carried out by bit shifting: given that K is 64, K × K = 4096 = 2^12, so the new intensity value is obtained by using a 12-bit shifter for the division. In the original formula, there are six division operations, each of which depends on a variable distance at every step. In our method, we also eliminate these four dividers, which consume a significant amount of logic resources.
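The integer datapath of equation 7 can be sketched in Python to show why no divider is needed: the four weights always sum to K × K = 4096, so normalization reduces to a 12-bit right shift. This is a minimal sketch, not the authors' RTL. The function names, the argument order, and the convention that x and y are measured from the upper-left tile center are assumptions; the truncating shift is also an assumption, since the text does not say whether the hardware rounds.

```python
K = 64       # tile size from the text
SHIFT = 12   # K * K = 4096 = 2**12, so dividing by K*K is a 12-bit right shift


def weights(x, y):
    """Hypothetical stand-in for the Weight Generator: the coefficients of equation 7.

    x, y: horizontal/vertical distance (0..K) from the pixel to the
    upper-left tile centre (assumed convention). Returned order: (BR, UR, BL, UL).
    """
    xp, yp = K - x, K - y          # x' and y'
    return x * y, x * yp, xp * y, xp * yp


def remap_pixel(x, y, m_ul, m_ur, m_bl, m_br):
    """Integer datapath: four multiplies, an adder tree, and a shift."""
    w_br, w_ur, w_bl, w_ul = weights(x, y)
    acc = w_br * m_br + w_ur * m_ur + w_bl * m_bl + w_ul * m_ul
    return acc >> SHIFT            # truncating division by K*K


def remap_pixel_float(x, y, m_ul, m_ur, m_bl, m_br):
    """Floating-point bilinear form of equation 5, used only for comparison."""
    fx, fy = x / K, y / K
    return (fx * fy * m_br + fx * (1 - fy) * m_ur
            + (1 - fx) * fy * m_bl + (1 - fx) * (1 - fy) * m_ul)


if __name__ == "__main__":
    # Pixel equidistant from all four centres: each weight is 1024.
    print(remap_pixel(32, 32, 0, 100, 100, 200))   # 100
```

In hardware the `weights()` call would correspond to a table read rather than arithmetic, as the text describes for the Weight Generator; the float function is included only to check that the shift-based result stays within one intensity level of equation 5.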
Fig. 8. Remapping unit by interpolation over Upper Left, Upper Right, Below Left, Below Right tiles
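The bounded histogram redistribution performed by the HRU (clip at β = 32, then at most three unrolled passes) can be illustrated with a minimal Python sketch. It is not the authors' hardware algorithm: the even-share-per-pass policy, the handling of any leftover excess, and the simplified capacity test h[n] < β (used in place of the text's “h[n] < α − β” condition) are all assumptions made for illustration, and the function name is hypothetical.

```python
BINS = 256
BETA = 32        # clip limit chosen in the text (twice the even share of 16)
MAX_ITERS = 3    # fixed, unrolled iteration count


def clip_and_redistribute(hist, beta=BETA, max_iters=MAX_ITERS):
    """Clip a tile histogram at beta, then spread the excess over bins with room.

    Returns (clipped_hist, leftover, passes). 'leftover' is excess that could not
    be placed within max_iters passes; how hardware treats it is an assumption.
    """
    h = [min(c, beta) for c in hist]
    excess = sum(hist) - sum(h)
    passes = 0
    for _ in range(max_iters):
        # Bins that can still absorb pixels; full bins are skipped, loosely
        # mirroring the MSB-based skip done by the Address Generator.
        open_bins = [n for n in range(len(h)) if h[n] < beta]
        if excess == 0 or not open_bins:
            break
        passes += 1
        share = max(1, excess // len(open_bins))   # even share per open bin
        for n in open_bins:
            add = min(share, beta - h[n], excess)
            h[n] += add
            excess -= add
            if excess == 0:
                break
    return h, excess, passes


if __name__ == "__main__":
    spike = [0] * BINS
    spike[128] = 64 * 64          # worst case: a flat 64x64 tile
    h, left, passes = clip_and_redistribute(spike)
    print(max(h), left, passes)   # 32 0 2
```

On the worst-case tile in which all 4096 pixels fall into one bin, this policy finishes in two passes, which is consistent with the iteration counts reported for β = 32; the per-pass policy of the actual HRU may differ.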
TABLE I. Resource usage and performance comparison of the proposed C-CLAHE with the state-of-the-art implementation [5]. Image size: 640 × 480.

Metric | C-CLAHE | CLAHE [5]
Number of Slice Registers | 440 | 246
Number of Slice LUTs | 4766 | 32123
Number of fully used LUT-FF pairs | 284 | 222
Number of Block RAM/FIFO | 16 | 192
Operational Frequency (MHz) | 108.9 | 209.6
Performance (fps) | 354.4 | 537.9

V. RESULTS

We compare the resource usage and performance of the proposed C-CLAHE with the state-of-the-art implementation [5] in Table I. Our design delivers real-time performance at 354 fps and reduces hardware resource usage by a factor of 12× for block RAMs (16 vs. 192) and 6.7× for logic blocks (4766 vs. 32123 slice LUTs). Fig. 9 shows the original image and the output of our FPGA-based C-CLAHE implementation. The image shows that C-CLAHE brings out details, improving the visual quality and contrast of the image. In the same figure, we also include the output of the original C-CLAHE of Pizer et al. [4]. We conclude that image quality is not sacrificed by our modifications to the original C-CLAHE algorithm.

The modified C-CLAHE has also been implemented in Matlab for functional verification. The Matlab implementation takes 60.83 milliseconds to execute C-CLAHE on an Intel 2.30 GHz processor with 6 GB RAM. We plot the CDFs of the Matlab-based and FPGA-based outputs and confirm that they match, with a negligible difference of 0.39%, as shown in Fig. 10.

We evaluate the performance of our design for various image sizes. Table II shows the change in critical path delay, maximum operating clock rate, and throughput with respect to image resolution. For all image sizes we are able to achieve real-time performance (at least 33.3 fps, at 1920 × 1080). As the image resolution increases, the critical path delay increases. This is primarily due to the increase in the number of tiles, since our design is parameterized to scale the number of HGUs and HRUs to match the tile count. The sizes of the Histogram BRAM in the HGU and of the Mapping BRAM in the HRU also increase, which in turn adds latency to complete all reads and writes for the larger image size.

TABLE II. Performance and critical path delay comparison of the proposed C-CLAHE for various image sizes.

Image size | 512 × 512 | 640 × 480 | 1280 × 720 | 1920 × 1080
Critical Path Delay (ns) | 8.48 | 9.19 | 12.06 | 14.48
Maximum Frequency (MHz) | 117.9 | 108.9 | 82.9 | 69.1
Throughput (fps) | 450.1 | 354.4 | 89.9 | 33.3

VI. CONCLUSION AND FUTURE WORK

Real-time processing of CLAHE is typically realized by using high memory bandwidth. We propose a method for a real-time, interpolation-based Contextual CLAHE implementation. To the best of our knowledge, this is the first interpolation-based CLAHE implementation on an FPGA. Key modifications from a hardware implementation perspective are introduced to Contextual CLAHE, making it feasible to implement on a resource-limited FPGA. The histogram generation method is restructured to reduce block RAM usage. A new histogram redistribution technique is proposed to implement the iterative redistribution algorithm in hardware. Finally, an alternative interpolation calculation method, functionally equivalent to the original approach, is proposed to reduce the computational complexity. These modifications allowed us to achieve real-time processing speed for high-definition images (1280 × 720) with much lower hardware resource requirements. As future work, we plan to investigate different transfer functions for calculating the CDF so that computation and memory demand are reduced. More specifically, one option is to maximize the contrast gain for a specific gray level by using a contrast-sensitive transfer function.
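As a numeric cross-check of Tables I and II, the reported C-CLAHE throughputs are consistent with a pipeline that accepts one pixel per clock cycle, i.e. fps ≈ f_max / (width × height), and the maximum frequencies follow from 1 / (critical path delay). The one-pixel-per-clock model is our assumption for this sketch, not a statement from the design description; the table values are copied from Tables I and II.

```python
# C-CLAHE values copied from Table II:
# (width, height): (critical path delay in ns, f_max in MHz, throughput in fps)
TABLE_II = {
    (512, 512): (8.48, 117.9, 450.1),
    (640, 480): (9.19, 108.9, 354.4),
    (1280, 720): (12.06, 82.9, 89.9),
    (1920, 1080): (14.48, 69.1, 33.3),
}


def fmax_mhz(critical_path_ns):
    """Maximum clock frequency (MHz) implied by a critical path delay (ns)."""
    return 1e3 / critical_path_ns


def fps_one_pixel_per_clock(width, height, f_mhz):
    """Frames per second if one pixel is consumed per clock cycle (assumption)."""
    return f_mhz * 1e6 / (width * height)


if __name__ == "__main__":
    for (w, h), (delay, f, fps) in TABLE_II.items():
        print(f"{w}x{h}: f_max {fmax_mhz(delay):.1f} MHz, "
              f"model {fps_one_pixel_per_clock(w, h, f):.1f} fps, reported {fps} fps")
    # Resource ratios behind the 12x (BRAM) and 6.7x (LUT) claims in Table I.
    print(f"BRAM: {192 / 16:.1f}x, LUTs: {32123 / 4766:.1f}x")
```

With these inputs, every modeled throughput agrees with Table II to within 0.1%, and 1 / delay reproduces the reported maximum frequencies to within 0.1 MHz, so the small mismatches are attributable to rounding of the published figures.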
Fig. 9. Original input image (left), C-CLAHE on FPGA (middle), and original C-CLAHE in Matlab (right)
Fig. 10. Comparison of FPGA-based and Matlab-based outputs, showing a 0.39% difference

REFERENCES

[1] R. C. Gonzalez and R. E. Woods, Digital Image Processing. Addison-Wesley Publishing Company, Reading, MA, 1993.
[2] R. Hummel, “Image enhancement by histogram transformation,” Computer Graphics and Image Processing, vol. 6, pp. 184–195, 1977.
[3] V. T. Tom and G. J. Wolfe, “Adaptive histogram equalization and its applications,” SPIE Applications of Digital Image Processing IV, vol. 359, pp. 204–209, 1982.
[4] S. M. Pizer, E. P. Amburn, J. D. Austin, R. Cromartie, A. Geselowitz, T. Greer, B. H. Romeny, J. B. Zimmerman, and K. Zuiderveld, “Adaptive histogram equalization and its variations,” Computer Vision, Graphics, and Image Processing, vol. 39, pp. 355–368, 1987.
[5] K. Kokufuta and T. Maruyama, “Real-time processing of contrast limited adaptive histogram equalization on FPGA,” International Conference on Field Programmable Logic and Applications (FPL), pp. 155–158, 2010.
[6] S. Sajeev, M. Bajger, and G. N. Lee, “Segmentation of breast masses in local dense background using adaptive clip limit-CLAHE,” International Conference on Digital Image Computing: Techniques and Applications (DICTA), pp. 1–8, 2015.
[7] Ching-Hsi Lu, Yu-Sheng Wang, Lei Wang, and Hong-Yang Hsu, “A platform implementation for real time image processing,” IEEE Pacific Rim Conference on Multimedia, pp. 238–247, 2008.
[8] Wang Bing-Jian, Liu Shang-qian, Li Qing, and Zhou Hui-xin, “A real-time contrast enhancement algorithm for infrared images based on plateau histogram,” Infrared Physics & Technology, vol. 48, no. 1, pp. 77–82, 2006.
[9] Kuen-Jan He, Chien-Chih Chen, Ching-Hsi Lu, and Lei Wang, “Implementation of a new contrast enhancement method for video images,” IEEE Conference on Industrial Electronics and Applications (ICIEA), pp. 1982–1987, 2010.
[10] C. Iakovidou, V. Vonikakis, and I. Andreadis, “FPGA implementation of a real-time biologically inspired image enhancement algorithm,” Journal of Real-Time Image Processing, vol. 3, no. 4, Dec. 2008.
[11] S. M. Pizer, “An automatic intensity mapping for the display of CT scans and other images,” in Proc. VIIth International Meeting on Information Processing in Medical Imaging, pp. 276–309, 1983.
[12] R. Hummel, “Image enhancement by histogram transformation,” Computer Vision, Graphics and Image Processing, vol. 6, pp. 184–195, 1977.
[13] J. Y. Kim, L. S. Kim, and S. H. Hwang, “An advanced contrast enhancement using partially overlapped sub-block histogram equalization,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 11, no. 4, pp. 475–484, Apr. 2001.
[14] Z. Q. Wu, J. A. Ware, I. D. Wilson, and J. Zhang, “Mechanism analysis of highly overlapped interpolation contrast enhancement,” IEE Proceedings - Vision, Image and Signal Processing, vol. 153, no. 4, pp. 512–520, Aug. 2006.
[15] E. Jamro, M. Wielgosz, and K. Wiatr, “FPGA implementation of the strongly parallel histogram equalization,” in Proc. 10th IEEE Workshop on Design and Diagnostics of Electronic Circuits and Systems, Apr. 2007.
[16] K. Kokufuta and T. Maruyama, “Real-time processing of local contrast enhancement on FPGA,” International Conference on Field Programmable Logic and Applications (FPL), pp. 288–293, 2009.
[17] P. D. Ferguson, T. Arslan, A. T. Erdogan, and A. Parmley, “Evaluation of contrast limited adaptive histogram equalization (CLAHE) enhancement on a FPGA,” 2008 IEEE International SOC Conference, pp. 119–122, 17–20 Sept. 2008.
[18] M. J. Swain and D. H. Ballard, “Color indexing,” International Journal of Computer Vision, vol. 7, no. 1, pp. 11–32, 1991.
[19] S. M. Pizer, R. E. Johnston, J. P. Ericksen, B. C. Yankaskas, and K. E. Muller, “Contrast-limited adaptive histogram equalization: speed and effectiveness,” Visualization in Biomedical Computing, pp. 337–345, 1990.
[20] Yinpeng Jin, Laura Fayad, and Andrew Laine, “Contrast enhancement by multi-scale adaptive histogram equalization,” Proc. SPIE Wavelets: Applications in Signal and Image Processing IX, vol. 4478, pp. 206–213, 2011.
[21] K. Zuiderveld, “Contrast limited adaptive histogram equalization,” in P. Heckbert (ed.), Graphics Gems IV, Academic Press, 1994, ISBN 0-12-336155-9.