
Resource Efficient Real-Time Processing of Contrast Limited Adaptive Histogram Equalization


Burak Unal and Ali Akoglu
Department of Electrical and Computer Engineering
The University of Arizona
Tucson, USA

Abstract— Contextual Contrast Limited Adaptive Histogram Equalization (C-CLAHE) is an effective method for solving the noise amplification effect of adaptive histogram equalization (AHE) and enhancing the visibility of local details of an image. Even though C-CLAHE has a smaller memory footprint than CLAHE, the complexity of the interpolation process increases the computation demand dramatically. Therefore, FPGA based implementations have been limited to CLAHE only. In this study we introduce three key modifications to C-CLAHE and, for the first time, make it feasible to implement on a resource limited FPGA. We restructure the method so that the histogram redistribution stage is realized with fewer iterations. We implement contrast limitation calculations earlier, during the histogram generation stage instead of during the histogram redistribution stage, which reduces the block RAM demand. Finally, we mathematically derive an alternative interpolation calculation for the remapping stage, which reduces the computation complexity in terms of required multipliers by a factor of 2x without sacrificing image quality. These algorithmic modifications allowed us to reduce the block RAM demand by a factor of 12x and the logic block demand by a factor of 6.7x compared to the state-of-the-art FPGA based CLAHE implementation, and to achieve real-time processing of 640 × 480 images at a rate of 354 frames per second.

Keywords—histogram equalization; fpga; memory efficient; image enhancement; medical imaging

I. INTRODUCTION

An image histogram depicts the distribution of the pixel intensity for a given image. The number of pixels in an image with a particular intensity value is a critical piece of information that is widely used for image contrast enhancement through a process called histogram equalization. This process involves calculating the pixel intensity distribution, as illustrated in Fig. 1(b), and remapping individual pixel values in such a way that the resultant histogram spans the whole range of the gray levels, as shown in Fig. 1(c). Histogram Equalization (HE) helps increase the dynamic range (gray-level) of a given image and allows details of objects of interest in the image to become more visible. HE has been investigated through hardware and software based implementation approaches for enhancing images of various forms (x-ray, infrared, ultrasound, computer tomography, etc.) in medical imaging, military, automotive, and aerospace systems.

Fig. 1. Image histogram based on various histogram equalization techniques: (a) X×Y pixel size input image. K×K Tile indicates a K×K pixel local region in this image. (b) Histogram of the input image. The histogram of the small tile in (a) is represented as a small histogram in a different color. (c) Histogram after HE. (d) Histogram after AHE. (e) Histogram after CLAHE.

From an algorithm development perspective, HE has traditionally been implemented based on two methods, namely Global HE (GHE) and Adaptive HE (AHE). GHE spreads out the dynamic range of intensity values over the entire image [1], whereas AHE first partitions the image into tiles and then spreads out the dynamic range of intensity values within the tiles individually [2], [3]. GHE offers ease of hardware implementation, as it requires a one-time histogram calculation and remapping, with a tradeoff in image quality. AHE is more effective than GHE at enhancing the visibility of local details in a given image. However, AHE suffers from a blocking effect on boundary pixels between the tiles and, furthermore, has an inclination to amplify noise in relatively homogeneous regions of an image where the dynamic range of the image is narrow. The blocking effect is observed when pixels on the boundary of two tiles are remapped using two different histograms. Overlapping the two tiles is one method to resolve the blocking effect. However, this significantly increases the computation complexity and memory footprint, making it harder to implement on an FPGA that has limited block RAM resources. Furthermore, these methods fail to resolve the noise amplification of AHE.

Pizer et al. [4] introduced several techniques for overcoming the noise amplification problem of AHE, such as Contrast Limited Adaptive Histogram Equalization (CLAHE) and Contextual CLAHE, applied to the histogram calculation, histogram redistribution and remapping stages of AHE. Since both techniques rely on tile overlapping, the memory requirement is still a concern from a hardware implementation perspective. CLAHE uses a fully overlapping strategy, whereas Contextual CLAHE (C-CLAHE) solves the blocking effect and noise amplification using interpolation. Both techniques are equally effective in dealing with the blocking problem. Even though C-CLAHE has a smaller memory footprint,
CLAHE has been preferred over C-CLAHE for implementing on the FPGA [5] due to the computation complexity of the interpolation process involved in C-CLAHE.

The state-of-the-art hardware architecture [5] for CLAHE achieves real-time performance by utilizing a large amount of memory to be able to process histogram calculations for each pixel in parallel. In this study we introduce three key modifications to C-CLAHE, making it feasible to implement on a resource limited FPGA. We first introduce a look-ahead excess pixel detection mechanism to the histogram redistribution process so that pixels of an image are distributed among the histogram bins in fewer iterations. This in turn allows effective loop unrolling during hardware implementation. We then reorder the contrast limitation calculations and implement them earlier, during the histogram generation stage instead of during the histogram redistribution stage, which reduces the block RAM demand. Finally, we mathematically derive an alternative interpolation calculation used in the remapping stage that is functionally equivalent to the original approach but reduces the computation complexity in terms of required multipliers by a factor of 2x. These algorithmic modifications allow us to reduce the block RAM demand by a factor of 12x and the logic block demand by a factor of 6.7x compared to the state-of-the-art FPGA based CLAHE implementation, and to achieve real-time processing of 640 × 480 images at a rate of 354 frames per second on the Zynq-7000 development board.

The rest of the paper is organized as follows. Section 2 presents the literature review of the FPGA based histogram equalization implementations. In Section 3 we describe the algorithmic flow of C-CLAHE, followed by the details of our real-time hardware architecture in Section 4. We present performance analysis with respect to the state-of-the-art implementation in Section 5. Finally, Section 6 presents our conclusion and future work directions.

II. RELATED WORK

Real-time implementation of histogram equalization is a necessity in many applications such as space based mission critical operations, medical imaging such as MRI, and surveillance systems [6]. In this section we present an overview of the FPGA based implementations for HE and its variations, including GHE, AHE, CLAHE, and C-CLAHE.

Lu et al. [7] implement GHE on a Xilinx Spartan3 FPGA for 720 × 480 images and achieve real-time performance with 122 frames per second (fps). Bing-Jian et al. [8] improve Lu's architecture by modifying the GHE method. This implementation is targeted for infrared imaging; the authors claim that the standard GHE is not suitable since pixel intensity values are populated over a much narrower histogram range than in regular images. GHE distributes the pixel values to the complete gray level range uniformly, which does not help improve the image contrast for the infrared image. Therefore the authors identify a threshold gray level value in the histogram and remap only the pixels that have a larger intensity value than this threshold. The algorithm is implemented on an ACEX series Altera FPGA, achieving 25 fps for 128 × 128 images. Bing-Jian's implementation ignores the pixels that have lower intensity than the threshold value. Kuen-Jan et al. [9] propose an enhancement over Bing-Jian's method by first dividing the histogram into two parts with respect to a threshold value and then applying different remapping functions for each region. This method is implemented on the Xilinx Spartan-3 FPGA, achieving 44.045 fps for 720 × 480 images. Another version of GHE is implemented by Iakovidou in [10] by carefully taking advantage of the FPGA resources to support very large images. The proposed architecture employs a reconfigurable pipeline and structured memory management to render in real time image data in the order of 2.5 Mpixels. The implementation of this algorithm on the Altera Stratix II FPGA achieves 25 fps.

Adaptive histogram equalization (AHE) allows increasing the local contrast to overcome the poor image quality performance of GHE when the image is non-uniform. The AHE [11] [12] method starts with partitioning the entire image into several non-overlapping blocks called tiles. Then, GHE is applied on each tile to redistribute the intensity values of the pixels. The main problem with this method occurs when two neighboring pixels with the same intensity value fall into two different tiles. Even though those pixels have the same intensity value, they will eventually be remapped to different values based on the histograms of the local tiles. This creates a blocking effect on the image. Overlapping the tiles would solve the blocking effect. However, this poses a tradeoff between computation complexity and image quality. As the size of the overlapping region increases from partial [13] to fully overlapping [14], the blocking effect decreases, but the number of histogram and remapping calculations increases linearly.

Jamro et al. [15] argue that the computation challenge of the fully overlapping AHE could be resolved with a 2D parallel architecture and effective block RAM utilization on the FPGA. Kokufuta et al. [16] build on these arguments and present a real-time, highly parallel, fully overlapped AHE implementation using a Xilinx Virtex-4 FPGA. They achieve 263.8 fps for 640 × 480 images, which exceeds the performance and quality of GHE implementations. However, this method needs a large memory to store one column of pixels for each tile while the histogram calculation is iterated through the columns of the image. 128 BRAMs are needed by this design to reach 263 fps.

The remapping process involves multiplication of each pixel in a tile with a coefficient determined by the histogram of that tile. For two neighboring pixels that are in the same tile and that have close intensity values, the remapping process will result in an increase in the difference between the intensities of the two pixels. This phenomenon leads to an increase in noise, which cannot be solved with the overlapping strategy. Contrast limited AHE (CLAHE) solves the noise problem by reducing the coefficient, which in turn limits the contrast of the histogram. A basic implementation of CLAHE is proposed in [17] for 128 × 128 images. However, implementation details for individual stages of the process, along with post-routing resource and performance analysis, are not reported in that study.

With the rapid increase in logic blocks and block RAMs on the new generation FPGAs, the study by Kokufuta [5] showed a real-time implementation of CLAHE. This state-of-the-art fully overlapped CLAHE architecture is implemented using
Xilinx Virtex-4 FPGA, which achieves real-time processing of 640 × 480 images at 537.9 fps. The authors reduce the computation complexity by storing the histogram of each tile in on-chip memory banks and reusing the calculated histogram data for the overlapping regions. However, the implementation requires an excessive amount of on-chip storage space, in the order of 192 18Kb block RAMs. It is interesting to note that Kokufuta's implementation choice is the memory intensive CLAHE as opposed to C-CLAHE. We believe that the main reason behind this choice is rooted in the fact that CLAHE uses a fully overlapping strategy and does not involve the computationally expensive interpolation process, which makes CLAHE more practical from a real-time implementation perspective.

III. ALGORITHMIC FLOW

A. Overview

Contextual region based contrast limited adaptive histogram equalization (C-CLAHE) is formulated based on dividing the image into several non-overlapping tiles of size 64×64 [21]. First, a histogram is computed for each tile. The histogram accumulates the number of pixels with the same intensity value into a bin, one bin per gray level. Typically, a contrast limit is obtained by multiplying the average value of the histogram (mean value) by a coefficient. The contrast limit indicates the size (capacity) of each bin. A typical contrast limit is between 1 and 4 times the histogram mean value [4]. Next, the pixel counts that exceed the contrast limit are clipped off from each bin and redistributed to other bins in the histogram to prevent over-amplification of noise. After the redistribution, the cumulative distribution function is determined for each tile on the contrast limited histogram. Finally, during the intensity remapping step, pixel values are remapped by using the four nearest tiles and combining them through linear interpolation. The flow of C-CLAHE is presented in the following subsections.

B. Histogram Generation

Histogram generation for each tile involves identifying the number of pixels for each gray-level (bin) from 0 to 255 using equation 1. In other HE methods, there is no limit to the bin size. By restricting the bin size to 32, from an implementation point of view, C-CLAHE requires less register space for keeping track of the bins. If a bin is filled to its capacity, then the pixel that is supposed to be accumulated in that specific bin is redistributed to other available bins. For the redistribution process, explained in the following subsection, the histogram generation stage accumulates the number of pixels exceeding the bin limit (excess pixels) for each gray-level and records the number of bins (capacity bins) that are filled to their capacity.

$$h(k) = \sum_{i} \begin{cases} 1, & I(i) = k \\ 0, & \text{otherwise} \end{cases} \tag{1}$$

where I(i) is the intensity value of pixel i and h(k) is the bin for pixel intensity value k (0 ≤ k ≤ 255).

C. Histogram Redistribution

Excess pixels, the clipped histogram and the capacity bins for each tile are the inputs to the histogram redistribution stage. This stage iteratively distributes the excess pixels equally to other bins of the clipped histogram that have not been filled to their capacity. In Algorithm 1, line numbers 6-15 correspond to this initial distribution of excess pixels to the available bins. A new set of excess pixels may appear after the initial distribution if a bin is already near its capacity before the distribution, as illustrated in Fig. 2. These excess pixels are redistributed iteratively until all excess pixels are distributed and no bin is filled beyond its capacity. In Algorithm 1, line numbers 16-22 correspond to this iterative redistribution process. The clipped and redistributed histogram for each tile is then used to determine a cumulative distribution function (CDF) for each tile, as shown in equation 2.

$$m_{x,y}(n) = \sum_{k=0}^{n} h_{x,y}(k) \times N \times \frac{1}{M} \tag{2}$$

where 1<n<N, M is the number of pixels in a tile, N is the number of bins (256 in our case), h(k) is the histogram, and x and y indicate the location of a tile.

Fig. 2. Iterative histogram redistribution process.

Algorithm 1: Histogram generation and redistribution flow.
    # histogram clipping
    1.  excess = 0;
    2.  for (n = 0; n < N; n++) {
    3.    if (h[n] > β) {
    4.      excess += h[n] − β; h[n] = β; }
    5.  }
    # Initial distribution
    6.  m = excess / N;
    7.  for (n = 0; n < N; n++) {
    8.    if (excess > 0) {
    9.      if (h[n] < β − m) {
    10.       h[n] = h[n] + m; excess += −m; }
    11.     else if (h[n] < β) {
    12.       excess += −β + h[n];
    13.       h[n] = β; }
    14.   }
    15. } # end of for
    # Iterative redistribution of excess pixels
    16. while (excess > 0) {
    17.   for (n = 0; n < N; n++) {
    18.     if (excess > 0) {
    19.       if (h[n] < β) {
    20.         excess = excess − 1;
    21.         h[n] = h[n] + 1; } }
    22. } }

D. Intensity Remapping Unit

Given the CDF for each tile and the original pixel intensity values for the entire image, the new intensity value for each pixel in a tile is calculated by taking the surrounding contextual regions into account. This process, depending on the location of the pixel, may use up to four weighted sum of contextual
CDF of the neighbor tiles. Fig. 3 shows an 8x8 grid of tiles. Zooming into a 2x2 region, we show the impact of neighboring tiles on the intensity value calculation of the pixel (indicated as a square box) that is located in the bottom right (BR) tile. Four regions (upper left – UL, upper right – UR, bottom left – BL and bottom right – BR) and the interpolation among them are illustrated in the figure. The intensity value for this pixel is calculated using the remapping function shown in equation 3, which depends on the distance of the pixel under consideration from these 4 contextual regions, indicated with z, t, w and s.

Fig. 3. Image divided to 64 tiles and interpolation distance.

$$I_{new} = \frac{s}{s+w}\left(\frac{t}{z+t}\, m_{UL}(I) + \frac{z}{z+t}\, m_{UR}(I)\right) + \frac{w}{s+w}\left(\frac{t}{z+t}\, m_{BL}(I) + \frac{z}{z+t}\, m_{BR}(I)\right) \tag{3}$$

Pixels that are in the corner or border tiles, as illustrated in Fig. 4, will have fewer than 4 contextual regions. For instance, the pixel in the aforementioned region has only two contextual regions, such as BR and UR. In this situation, the new intensity value is expressed using equation 4.

$$I_{new} = \frac{s}{s+w}\, m_{UR}(I) + \frac{w}{s+w}\, m_{BR}(I) \tag{4}$$

Fig. 4. The boundary and corner regions and their neighborhood regions.

IV. HARDWARE IMPLEMENTATION

The overall block diagram of the C-CLAHE implementation, shown in Fig. 5, consists of three units: histogram generation, histogram redistribution and CDF calculation, and intensity remapping. Histogram equalization can be applied to a gray level image without any modification. An RGB to YUV converter is used to be able to process color images; Y is then used as the intensity value. We present the flow of the architecture development in the context of a 512x512 image for the sake of simplicity. Given that each tile is composed of 64x64 pixels, for a 512x512 image there are 64 tiles forming an 8x8 tile structure. We implement eight histogram generation units to operate on eight tiles in parallel. Here we note that our final design is parameterized to scale the number of histogram generation units based on the input image size. We will evaluate the performance of the proposed architecture with various image sizes in the results and discussions section. The Buffer is implemented using BRAMs to store intensity values of the original image, which are later used during the remapping stage along with the CDF.

Fig. 5. Block diagram of the C-CLAHE.

A. Histogram Generation Unit

The block diagram of the histogram generation unit (HGU) is shown in Fig. 6. Given a single tile of 64x64 pixel intensity values, this unit distributes each pixel based on its intensity value over 256 bins and stores the result in the Histogram BRAM of size 256 words. We limit the buffer size to 64 pixels, which allows streaming an entire row of a tile. Given that there are eight HGUs, we feed one row from each of the 8 tiles into each unit sequentially. We overlap the computation time of one HGU with the data transfer to the subsequent unit. Each word of the Histogram BRAM represents an individual bin of the histogram. According to Pizer [11], the number of pixels for each bin is between 0 and the total number of pixels in a tile. For a 64x64 tile, the maximum bit-width for each gray level is 12 bits, assuming all 4096 pixels have the same intensity value.

Fig. 6. Histogram Generation Unit (HGU), a three cycle path with data read, comparison, and accumulation stages.

The redistribution process introduces a threshold value for the size of each bin. If there is an excess amount of pixels for a bin, those pixels are redistributed to the remaining bins. We modify the original histogram generation flow to prepare for the redistribution process, which reduces the storage space required by the histogram generation unit. We set the bin limit to 32 based on Pizer's [19] work, which shows that even for medical images, which require high contrast, to be able to
clearly identify the objects of interest visually, the bin limit could be set to 4 times the mean value of the histogram. In this extreme case, setting the bin size to 64 is good enough for an effective redistribution process. For regular images, on the other hand, setting the bin limit to 2 times the mean value of the histogram is appropriate [19].

As the HGU iterates through the pixels and accumulates the pixels for each bin, our implementation does not allow the bin size to exceed the threshold of 32, which requires only 5 bits of storage space per bin. However, we set the register size to 8 bits to make our architecture parameterized for the bin threshold. In the baseline CLAHE hardware implementation [5], these registers have to be set to 12 bits each to accommodate the potential 4096 pixels in a 64x64 tile, as they do not apply clipping during the histogram generation stage. In our design, we additionally use a separate 12-bit counter to accumulate the excess pixels that spill over from the bins during the distribution. Overall, we reduce the register footprint by a factor of 1.5 through the early clipping strategy. Furthermore, on the FPGA, block RAMs are organized in 2Kx9 or 1Kx18 forms. Due to the 12-bit requirement of the baseline CLAHE [5], that design is restricted to the 1Kx18 configuration, whereas our design with 8-bit registers is able to utilize the 2Kx9 configuration. In fact, we completely utilize the bit-width of the BRAM, as we use the 9th bit of the BRAM to indicate whether the bin for a specific intensity value has reached its capacity (maximum bin size of 32) or not. This allows us to conveniently keep track of the state of each bin. We will discuss the benefit of this design decision on reducing the iteration count for the redistribution process in the next section.

As shown in Fig. 6, we use the pixel intensity value itself as an address to the Histogram BRAM for keeping track of the number of pixels with that specific intensity value. The Comparator unit is used to compare the gray level value with the maximum bin size (β=32). When the bin size reaches β, it is no longer incremented, which is achieved by feeding 0 to the Adder instead of 1. In this case, the Capacity Bin counter, which keeps track of the number of bins that have reached their capacity, is updated. The Excess Counter, which keeps track of the pixels spilling over the "capacity bins", is incremented by 1. In this implementation, we use two 2Kx9 BRAMs to overlap computation over one tile with the write back of the calculated histogram. The outputs of this unit (clipped histogram, excess pixels, and capacity bins) are sent to the Histogram Redistribution unit to form the redistributed histogram.

B. Histogram Redistribution Unit

The number of iterations required to complete the redistribution process for a tile depends on the profile of the pixel intensity values and the number of histogram bins that are near their capacity. We revise the baseline implementation (Algorithm 1) as shown in Algorithm 2 with the goal of reducing the iteration count in order to make this stage more hardware friendly for loop unrolling. For this purpose we introduce a look-ahead excess pixel detector to the baseline algorithm.

Algorithm 2: Modified histogram redistribution process
    # histogram clipping
    1.  excess = 0;
    2.  fullbin = 0;
    3.  for (n = 0; n < N; n++) {
    4.    if (h[n] > β) {
    5.      excess += h[n] − β;
    6.      fullbin = fullbin + 1;
    7.      h[n] = β; }
    8.  } # end of for
    # histogram distribution
    9.  while (excess > 0) {
    10.   α = excess / (N − fullbin);
    11.   e = 0;
    12.   for (n = 0; n < N; n++) {
    13.     if (excess > 0) {
    14.       if (h[n] < β − α) {
    15.         if (e > 0) {
    16.           h[n] = h[n] + α + 1;
    17.           e = e − 1;
    18.           excess += −α − 1; }
    19.         else {
    20.           h[n] = h[n] + α;
    21.           excess += −α; }
    22.       }
    23.       else if (h[n] < β)
    24.         h[n] = β;
    25.     }
    # spill over pixels for next iteration
    26.     e += α − β + h[n];
    27.     excess += h[n] − β;
    28.   } # end of for
    29. } # end of while

In the baseline redistribution flow, the number of excess pixels to be placed into available bins ("m", line number 6 in Algorithm 1) during an iteration is determined by dividing the excess pixel count by the total number of bins, which is fixed to 256. Even though some of those bins may have already been filled to their capacity, this method divides excess pixels conservatively over 256 bins, instead of dividing by the actual number of available bins. We exploit this in our implementation to reduce the iteration count and replace "m" with "α" (line number 10 in Algorithm 2), which is guaranteed to be greater than or equal to "m" as we divide the number of excess pixels by the number of available bins.

Assume that "bin size" refers to the current pixel count for a given bin with the threshold β. For the baseline implementation (Algorithm 1), during any iteration, when the gap between the bin size and β is less than "m", there will be spill over pixels, which need to be redistributed in the next iteration. We introduce a look-ahead mechanism so that we determine whether there will be any excess pixels for the next iteration while redistributing the pixels in the current iteration. Given α, β, and "bin size", if the bin size is less than β but greater than or equal to β − α, then that bin will have spill over pixels. After identifying the bins that will have excess pixels, we accumulate the total number of pixels ("e" in Algorithm 2, line number 26) that will be spilled over. Each of the spill over pixels is distributed over the bins whose size is less than β − α before moving on to the next iteration (Algorithm 2, line numbers 15-18).
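The early clipping performed by the HGU and the difference between the two step sizes "m" and "α" can be illustrated with a small software model. The following is a minimal Python sketch under stated assumptions, not the authors' hardware design: the function names, the rule that a bin counts as a capacity bin once its count equals β, and the synthetic tile contents are all hypothetical, and the redistribution routine follows the baseline flow of Algorithm 1 rather than the look-ahead flow of Algorithm 2.

```python
def generate_clipped_histogram(pixels, beta, n_bins=256):
    """Build a tile histogram whose bins saturate at beta (early clipping)."""
    hist = [0] * n_bins
    excess = 0          # pixels that arrived at an already full bin
    capacity_bins = 0   # bins whose count has reached beta (assumed rule)
    for value in pixels:
        if hist[value] < beta:
            hist[value] += 1
            if hist[value] == beta:
                capacity_bins += 1
        else:
            excess += 1  # the adder is fed 0; the excess counter takes the pixel
    return hist, excess, capacity_bins


def redistribute_baseline(hist, excess, beta):
    """Baseline (Algorithm 1 style) redistribution; returns (hist, unit passes)."""
    h = list(hist)
    n_bins = len(h)
    if sum(h) + excess > beta * n_bins:
        raise ValueError("excess pixels cannot fit under the bin limit")
    m = excess // n_bins                      # fixed step of the baseline flow
    for n in range(n_bins):                   # initial distribution
        if excess > 0:
            if h[n] < beta - m:
                h[n] += m
                excess -= m
            elif h[n] < beta:
                excess -= beta - h[n]
                h[n] = beta
    passes = 0
    while excess > 0:                         # one pixel per open bin per pass
        passes += 1
        for n in range(n_bins):
            if excess > 0 and h[n] < beta:
                h[n] += 1
                excess -= 1
    return h, passes


# Hypothetical 64x64 tile: two dominant gray levels plus a flat background.
tile = [10] * 2000 + [20] * 1000 + [v for v in range(30, 167) for _ in range(8)]
BETA = 32
clipped, excess, capacity_bins = generate_clipped_histogram(tile, BETA)
m = excess // 256                             # baseline step over all bins
alpha = excess // (256 - capacity_bins)       # step over available bins only
final, passes = redistribute_baseline(clipped, excess, BETA)
print(len(tile), excess, capacity_bins, m, alpha, sum(final), max(final))
```

In this model every bin fits in the 8-bit register discussed above, the pixel count of the tile is preserved by the redistribution, and α can never be smaller than m because its divisor is at most N.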
Given that there are 256 bins and the tile size is 64x64, setting the threshold (β) to 16 would lead to an even distribution of all pixels over the bins, which turns the behavior of CLAHE into AHE (Fig. 1c) and brings back the noise amplification problem. Therefore we set β to 32 [19], which makes the size of each bin larger than the maximum needed size for the even distribution case and allows more pixels to be absorbed in each iteration. Based on our experimental studies, we observed that our C-CLAHE completes the redistribution process in two iterations in most cases and three iterations in rare cases. We conservatively set the iteration count to three, unroll the loop and design the hardware accordingly. We present our hardware performance analysis based on this pessimistic design and evaluate the image quality based on this hardware implementation.

The block diagram of the HRU is shown in Fig. 7. The histogram redistribution process starts after transferring the outputs of the HGU to the Mapping BRAM. The Address Generator and Controller module shown in Fig. 7 has three outputs. We use the "h[n]<β−α" output to decide whether a histogram bin has enough capacity to add excess pixels or not, and the "e < 0" and "c" outputs to decide if loop unrolling is applied or not. The Address Generator and Controller unit also generates the address for accessing the histogram data stored in the Mapping BRAM, which corresponds to the CDF. The Address Generator is based on a simple counter since we move through the bins in an iterative manner. If a bin is filled to its capacity, that bin is skipped using the MSB (9th bit, Section IV.A) of the address.

Fig. 7. Histogram Redistribution Unit (HRU)

C. Calculation of the Mapping Function

The remapping function requires several multiplier and divider components, as seen in equation 3. In addition to these costly operations, the Mapping Function requires dynamic calculation of the distance between tiles and the pixel. Therefore, we restructured this equation to reduce the number of dividers and multipliers. In addition, we eliminated the dynamic distance calculation dependency by replacing the distance expression with the tile size (K). In this approach, the coefficients of the cumulative distribution m(n) only depend on the current address of the pixel, as shown in equation 7. The derivation is shown step by step with equations 5-7.

The pipelined implementation shown in Fig. 8 utilizes four multiplier units and the Weight Generator unit, which provides the coefficients indicating the distance between the current pixel and the center of the 4 neighboring tiles (xy', xy, x'y, x'y'). These values are stored in the Weight Generator unit since it is more efficient to read them from memory than to calculate them each time. For each pixel, four CDF values are read from the BRAM corresponding to the contextual region. We multiply these CDF values with their corresponding coefficients. In this way, we calculate the weighted CDFs of the four contextual regions. The result is divided by K × K, which can be carried out by bit shifting. Given that K is 64, the new intensity value is obtained by using a 12-bit shifter for division. In the original formula, there are six different division operations, where each of them depends on a variable distance for every step. In our method, we also eliminate these dividers, which consume a significant amount of logic resources.

$$I_{new} = \frac{x}{K}\left(\frac{y}{K}\, m_{UL}(I) + \frac{K-y}{K}\, m_{UR}(I)\right) + \frac{K-x}{K}\left(\frac{y}{K}\, m_{BL}(I) + \frac{K-y}{K}\, m_{BR}(I)\right) \tag{5}$$

$$I_{new} = \frac{1}{K \times K}\Big(xy \times m_{UL}(I) + x(K-y) \times m_{UR}(I) + (K-x)y \times m_{BL}(I) + (K-x)(K-y) \times m_{BR}(I)\Big) \tag{6}$$

$$I_{new} = \frac{1}{K \times K}\Big(xy \times m_{UL}(I) + xy' \times m_{UR}(I) + x'y \times m_{BL}(I) + x'y' \times m_{BR}(I)\Big) \tag{7}$$

where x'=K-x and y'=K-y.
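The effect of replacing the per-term divisions with a single shift can be checked numerically. The sketch below is a minimal Python model under stated assumptions, not the authors' hardware: the function names, the sample CDF values and, in particular, the assignment of the weights xy, xy', x'y and x'y' to the UL, UR, BL and BR tiles are hypothetical choices made only so that the nested form and the shift form can be compared term by term.

```python
from fractions import Fraction

K = 64                              # tile edge length in pixels
SHIFT = (K * K).bit_length() - 1    # log2(K * K), which is 12 for K = 64


def remap_nested(x, y, m_ul, m_ur, m_bl, m_br):
    """Nested form with a division by K in every term (exact rationals)."""
    top = Fraction(y, K) * m_ul + Fraction(K - y, K) * m_ur
    bottom = Fraction(y, K) * m_bl + Fraction(K - y, K) * m_br
    return Fraction(x, K) * top + Fraction(K - x, K) * bottom


def remap_shift(x, y, m_ul, m_ur, m_bl, m_br):
    """Four weights, four CDF multiplications, and one right shift."""
    xp, yp = K - x, K - y                       # x' and y'
    acc = x * y * m_ul + x * yp * m_ur + xp * y * m_bl + xp * yp * m_br
    return acc >> SHIFT                         # division by K * K


cdf_values = (12, 200, 97, 255)     # made-up CDF outputs for one gray level
mismatches = [
    (x, y)
    for x in range(K + 1)
    for y in range(K + 1)
    if remap_shift(x, y, *cdf_values) != int(remap_nested(x, y, *cdf_values))
]
print(mismatches)
```

Because the four weights always sum to K × K, the shifted result never exceeds the largest of the four CDF values, so the output stays within the original gray-level range. In a hardware setting the products xy, xy', x'y and x'y' would come from a lookup table indexed by the pixel position, which leaves only the four multiplications by the CDF values on the data path.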
Fig. 8. Remapping unit by interpolation over Upper Left, Upper Right, Below Left, Below Right tiles

TABLE I. Resource usage and performance comparison of the proposed C-CLAHE with the state-of-the-art implementation. Image size: 640 × 480.

                                     C-CLAHE   CLAHE
Number of Slice Registers            440       246
Number of Slice LUTs                 4766      32123
Number of fully used LUT-FF pairs    284       222
Number of Block RAM/FIFO             16        192
Operational Frequency (MHz)          108.9     209.6
Performance (fps)                    354.4     537.9

V. RESULTS

We compare the resource usage and performance of the proposed C-CLAHE with the state-of-the-art implementation [5] in Table I. Our design delivers real-time performance at 354 fps and reduces hardware resource usage by a factor of 12X for block RAMs and 6.7X for logic blocks. Fig. 9 shows the original image and the output of our FPGA based C-CLAHE implementation. It is clear from the image that C-CLAHE brings out the details by increasing the visual quality and contrast of the image. In the same figure, we also include the output of the original C-CLAHE of Pizer et al. [4]. We conclude that image quality is not sacrificed by our modifications to the original C-CLAHE algorithm.

The modified C-CLAHE has also been implemented in Matlab for functional verification. The Matlab implementation takes 60.83 milliseconds to execute C-CLAHE on an Intel 2.30 GHz processor with 6 GB RAM. We plot the CDF of the Matlab and FPGA based outputs, and confirm that they match with a negligible difference of 0.39%, as shown in Fig. 10.

We evaluate the performance of our design for various image sizes. Table II shows the change in critical path delay, maximum operational clock rate, and throughput with respect to the change in image resolution. For all image sizes we are able to achieve real-time performance. As the image resolution increases, the critical path delay increases. This is primarily due to the increase in the number of tiles, since our design is parameterized to scale the number of HGUs and HRUs to match the tile count. The size of the Histogram BRAM in the HGU and the size of the Mapping BRAM in the HRU also increase. This in turn adds latency to complete all reads and writes for the increased image size.

TABLE II. Performance and critical path delay comparison of the proposed C-CLAHE for various image sizes.

                            512x512   640x480   1280x720   1920x1080
Critical Path Delay (ns)    8.48      9.19      12.06      14.48
Maximum Frequency (MHz)     117.9     108.9     82.9       69.1
Throughput (fps)            450.1     354.4     89.9       33.3

VI. CONCLUSION AND FUTURE WORK

Real-time processing of CLAHE is realized by using high memory bandwidth. We propose a method for real-time interpolation based Contextual CLAHE implementation. To the best of our knowledge, this is the first interpolation based CLAHE implementation on the FPGA. Key modifications from a hardware implementation perspective are introduced to the Contextual CLAHE, making it feasible to implement on a resource limited FPGA. The histogram generation method is restructured to reduce block RAM usage. A new histogram redistribution technique is proposed to implement the iterative redistribution algorithm in hardware. Finally, an alternative interpolation calculation method, which is functionally equivalent to the original approach, is proposed to reduce the computation complexity. These modifications allowed us to achieve real-time processing speed for high definition images (1280 × 720) with much lower hardware resource requirements. As future work, we plan to investigate different transfer functions for calculating the CDF so that computation and memory demand is reduced. To be more specific, one option is to maximize the contrast gain for a specific gray level by using a contrast-sensitive transfer function.
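
The throughput figures reported in Table II appear consistent with the design accepting roughly one pixel per clock cycle, so that frames per second is close to the maximum frequency divided by the number of pixels per frame. This relation is an inference from the reported numbers rather than a statement made in the paper, and the helper name `estimated_fps` is hypothetical. The short check below also recomputes the resource reduction factors quoted for Table I.

```python
def estimated_fps(freq_mhz, width, height):
    """Frames per second assuming one pixel is processed per clock cycle.

    The one-pixel-per-cycle model is an assumption inferred from Table II.
    """
    return freq_mhz * 1e6 / (width * height)


# (width, height) -> (maximum frequency in MHz, reported throughput in fps)
table_ii = {
    (512, 512): (117.9, 450.1),
    (640, 480): (108.9, 354.4),
    (1280, 720): (82.9, 89.9),
    (1920, 1080): (69.1, 33.3),
}

if __name__ == "__main__":
    for (w, h), (freq, reported) in table_ii.items():
        est = estimated_fps(freq, w, h)
        print(f"{w}x{h}: estimated {est:.1f} fps, reported {reported} fps")

    # Reduction factors from Table I (CLAHE [5] value / C-CLAHE value).
    print(f"Block RAM reduction: {192 / 16:.1f}X")
    print(f"Slice LUT reduction: {32123 / 4766:.1f}X")
```

Under this model the estimates stay within about half a frame per second of every reported value, which suggests that throughput at a given resolution is set by the achievable clock rate.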
Fig. 9. Original input image (left) and C-CLAHE on FPGA (middle), original C-CLAHE in Matlab (right)
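
One possible way to compare two outputs through their CDFs, as done for the FPGA and Matlab results, is sketched below. The metric used here (largest absolute gap between the two normalized CDFs, expressed as a percentage) and the function names `cdf` and `max_cdf_difference_percent` are assumptions for illustration; the paper does not state how the 0.39% figure was computed.

```python
def cdf(pixels, levels=256):
    """Normalized cumulative distribution of gray levels in a flat pixel list."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    running = 0
    out = []
    for count in hist:
        running += count
        out.append(running / total)
    return out


def max_cdf_difference_percent(a, b, levels=256):
    """Largest absolute gap between the CDFs of two images, in percent.

    This metric is an assumed stand-in, not the paper's stated measure.
    """
    cdf_a = cdf(a, levels)
    cdf_b = cdf(b, levels)
    return 100.0 * max(abs(u - v) for u, v in zip(cdf_a, cdf_b))


if __name__ == "__main__":
    reference = [0, 0, 1, 1]   # toy 2-level "image" standing in for one output
    candidate = [0, 1, 1, 1]   # toy "image" standing in for the other output
    print(max_cdf_difference_percent(reference, candidate, levels=2))
```

Identical images give a difference of zero, so a small percentage indicates that the two implementations distribute gray levels almost identically.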

Fig. 10. FPGA and Matlab based output comparison with 0.39% difference

REFERENCES

[1] R. C. Gonzalez and R. E. Woods, Digital Image Processing. Addison-Wesley Publishing Company, Reading, MA, 1993.
[2] R. Hummel, “Image enhancement by histogram transformation,” Computer Graph. Image Processing, vol. 6, pp. 184–195, 1977.
[3] V. T. Tom and G. J. Wolfe, “Adaptive histogram equalization and its applications,” SPIE Applicat. Digital Image Process. IV, vol. 359, pp. 204–209, 1982.
[4] S. M. Pizer, E. P. Amburn, J. D. Austin, R. Cromartie, A. Geselowitz, T. Greer, B. H. Romeny, J. B. Zimmerman, and K. Zuiderveld, “Adaptive Histogram Equalization and Its Variations,” Computer Vision Graphics and Image Processing, Vol. 39, pp. 355-368, 1987.
[5] K. Kokufuta and T. Maruyama, “Real-time Processing of Contrast Limited Adaptive Histogram Equalization on FPGA,” International Conference on Field Programmable Logic and Applications (FPL), pp. 155-158, 2010.
[6] S. Sajeev, M. Bajger, G. N. Lee, “Segmentation of Breast Masses in Local Dense Background Using Adaptive Clip Limit-CLAHE,” International Conference on Digital Image Computing: Techniques and Applications (DICTA), pp. 1-8, 2015.
[7] Ching-Hsi Lu, Yu-Sheng Wang, Lei Wang and Hong-Yang Hsu, “A Platform Implementation for Real Time Image Processing,” IEEE Pacific Rim Conference on Multimedia, pp. 238-247, 2008.
[8] Wang Bing-Jian, Liu Shang-qian, Li Qing, Zhou Hui-xin, “A real-time contrast enhancement algorithm for infrared images based on plateau histogram,” Infrared Physics & Technology, Volume 48, Issue 1, pp. 77-82, 2006.
[9] Kuen-Jan He, Chien-Chih Chen, Ching-Hsi Lu, Lei Wang, “Implementation of a new contrast enhancement method for video images,” IEEE Conference on Industrial Electronics and Applications (ICIEA), pp. 1982-1987, 2010.
[10] C. Iakovidou, V. Vonikakis, I. Andreadis, “FPGA implementation of a real-time biologically inspired image enhancement algorithm,” Journal of Real Time Image Processing, Volume 3(4), Dec 1, 2008.
[11] S. M. Pizer, “An Automatic Intensity Mapping for the Display of CT Scans and Other Images,” in Proc. of the VIIth International Meeting on Information Processing in Medical Imaging, pp. 276–309, 1983.
[12] R. Hummel, “Image Enhancement by Histogram Transformation,” Computer Vision, Graphics and Image Processing, vol. 6, pp. 184–195, 1977.
[13] J. Y. Kim, L. S. Kim, S. H. Hwang, “An Advanced Contrast Enhancement Using Partially Overlapped Sub-Block Histogram Equalization,” IEEE Transactions on Circuits and Systems for Video Technology, Vol. 11, No. 4, pp. 475-484, April 2001.
[14] Z. Q. Wu, J. A. Ware, I. D. Wilson, J. Zhang, “Mechanism analysis of highly overlapped interpolation contrast enhancement,” IEEE Proceedings Vision, Image & Signal Processing, Vol. 153, No. 4, pp. 512-520, August 2006.
[15] E. Jamro, M. Wielgosz, and K. Wiatr, “FPGA Implementation of the Strongly Parallel Histogram Equalization,” In Proc. 10th IEEE Workshop on Design and Diagnostics of Electronic Circuits and Systems, April 2007.
[16] Kentaro Kokufuta and Tsutomu Maruyama, “Real-time processing of local contrast enhancement on FPGA,” International Conference on Field Programmable Logic and Applications (FPL), pp. 288-293, 2009.
[17] P. D. Ferguson, T. Arslan, A. T. Erdogan and A. Parmley, “Evaluation of contrast limited adaptive histogram equalization (CLAHE) enhancement on a FPGA,” 2008 IEEE International SOC Conference, pp. 119-122, 17-20 Sept. 2008.
[18] M. J. Swain and D. H. Ballard, “Color Indexing,” International Journal of Computer Vision, vol. 7, no. 1, pp. 11–32, 1991.
[19] S. M. Pizer, R. E. Johnston, J. P. Ericksen, B. C. Yankaskas, and K. E. Muller, “Contrast-limited adaptive histogram equalization: speed and effectiveness,” Visualization in Biomedical Computing, pp. 337-345, 1990.
[20] Yinpeng Jin, Laura Fayad, Andrew Laine, “Contrast Enhancement by Multi-scale Adaptive Histogram Equalization,” Proc. SPIE Wavelets: Applications in Signal and Image Processing IX, SPIE Vol. 4478, pp. 206-213, 2011.
[21] K. Zuiderveld, “Contrast Limited Adaptive Histogram Equalization,” In: P. Heckbert: Graphics Gems IV, Academic Press, ISBN 0-12-336155-9, 1994.
