IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 16, NO. 8, AUGUST 2008

Hybrid-Type CAM Design for Both Power and Performance Efficiency
Yen-Jen Chang, Member, IEEE, and Yuan-Hong Liao

Abstract: Content-addressable memory (CAM) is a hardware table that compares the search data with all the stored data in parallel. Because a large number of transistors are active on each lookup, however, the power consumption of CAM is usually considerable. This paper presents a hybrid-type CAM design that aims to combine the performance advantage of the NOR-type CAM with the power efficiency of the NAND-type CAM. In our design, a CAM word is divided into two segments, and all the CAM cells are decoupled from the match line. By minimizing both the match line capacitance and the switching activity, our design can greatly reduce the power consumption of CAM. The experimental results show that the hybrid-type CAM can reduce the search energy consumption by roughly 89% compared to the traditional NOR-type CAM. Because the hybrid-type CAM provides a fast pull-down path to speed up the discharge of the lightweight match line, the search performance of our design is even better than that of the traditional NOR-type CAM.

Index Terms: Content-addressable memory (CAM), hybrid-type CAM, NAND-type CAM, NOR-type CAM.

I. INTRODUCTION
CONTENT-addressable memory (CAM) is a storage that is addressed by its content (or data) rather than by a memory address. It is widely used in many applications that require fast table lookup [1]. Due to the parallel comparison feature and the high frequency of lookups, however, the power consumption of CAM is usually significant. For example, in StrongARM [2] embedded processors, the fully associative TLBs with CAM tags consume about 17% of the total chip power. Because such large power consumption is a critical concern for advanced applications with large CAMs, the purpose of this paper is to develop a low-power CAM design with high performance.
There are two conventional CAM designs, i.e., NOR-type and NAND-type CAMs. The NOR-type CAM provides the best search performance, but at the cost of large power consumption. In contrast, the NAND-type CAM trades search performance for low power. As revealed in previous research, the match lines are the major power consumer in CAM. The power consumption of match lines can be reduced by lowering the voltage swing on the match lines [3], [4], or by dividing the match line [5]-[8]. Because the match lines are precharged conditionally in the segmentation techniques [5],
[6], the performance degradation is a major drawback. In this paper, we propose a hybrid-type CAM design. Similar to the segmentation method, our design also divides a CAM word into two segments. Unlike the techniques in [5] and [6], both of which trade performance for power saving, the hybrid-type CAM design can greatly reduce the power consumption of CAM without any performance penalty.

Manuscript received October 13, 2006; revised September 12, 2007.
The authors are with the Department of Computer Science and Engineering, National Chung Hsing University, 402 Taichung, Taiwan, R.O.C. (e-mail: [email protected]).
Digital Object Identifier 10.1109/TVLSI.2008.2000595

The most pronounced features of the hybrid-type CAM design are summarized as follows.
1) In our design, it is unnecessary to discharge all the bit lines to prevent unexpected short-circuit power consumption. The power dissipated in bit line switching can thus be effectively reduced.
2) Because all the CAM cells are decoupled from the match line, the match line is lightweight. Moreover, only the matched words discharge the match line from high to low. The match lines are therefore no longer the major power consumer in our design.
3) The hybrid-type CAM provides an additional fast pull-down path to speed up the match line discharge in case of a word match. Independent of the segment size, this fast path ensures that high search performance can be realized.
4) Because a level-restore path is added to the match line, our design is immune to the false match caused by a possible race condition.
The hybrid-type CAM design was implemented in TSMC 0.18-μm technology. For a CAM of size 128 × 32, i.e., 128 words by 32 bits, the experimental results obtained from HSPICE postlayout simulation show that if a CAM word is divided into segments of 4 and 28 bits, our design delivers an energy reduction of 89% compared to the traditional NOR-type CAM while improving the search performance by 5%. The total area overhead is less than 8%. In addition to the analysis of the power dissipated in the normal case (called normal power), we also examine the peak power dissipated in the worst case. Compared to the NOR-type CAM, although our design greatly widens the gap between peak power and normal power, it still achieves a 6% reduction in peak power consumption.
The rest of this paper is organized as follows. Section II reviews the conventional CAM organizations and indicates their advantages and disadvantages. In Section III, the circuitry developed for the hybrid-type CAM is described in detail, and comparisons between our design and related work on CAM power reduction are provided. Section IV discusses the most important issues in the implementation of our design. The experimental results are given in Section V, and Section VI offers some brief conclusions.
II. CONTENT-ADDRESSABLE MEMORY
The content-addressable memory consists mainly of CAM cells. As shown in Fig. 1, there are two parts in a typical CAM cell: one for storing data, called the store unit, and the other for comparing data, referred to as the compare unit. The store unit is usually implemented as the traditional 6T SRAM cell, which contains a cross-coupled inverter pair. The compare unit is pass-transistor logic (PTL) that compares the stored data with the search data. Depending on the application, the compare unit can be implemented as an XOR or XNOR function, as shown in Fig. 1(a) and (b), respectively. Besides the store and compare units, a pull-down transistor X, which is gate-controlled by the comparison result, is necessary to connect/disconnect the match line (ML) to/from the ground.

1063-8210/$25.00 © 2008 IEEE

Fig. 1. Typical CAM cell. (a) XOR type. (b) XNOR type.

Fig. 2. Traditional CAM designs.

A. NOR-Type CAM
Traditionally, there are two types of CAM designs. As shown in Fig. 2, one is NOR-type and the other is NAND-type. In the NOR-type CAM design, the CAM cell is usually XOR-type, and the pull-down transistors of the CAM cells are arranged in the NOR type. During the precharge phase, the match line is initially precharged to high. For a CAM word, if one or more cells are mismatched, the match line is discharged to 0. Only when all cells are matched, i.e., the search data are identical to the stored data, does the match line retain logic high as in the precharge phase. Because the pull-down path is very short, the match line is discharged to 0 quickly in case of a mismatch. Thus, the NOR-type CAM provides the best search performance.
Note that arranging the pull-down transistors in the NOR type is beneficial for search performance, but they contribute a large amount of drain capacitance to the match line. This results in more power dissipated in match line switching. Because in many applications most of the CAM words are mismatched, the large number of match line transitions consumes significant dynamic power. For example, in the CAM tag used in a TLB or cache memory, at most one word is matched on each lookup, which implies that almost all the match lines are discharged to 0 and then charged to high again before the next search. Consequently, the NOR-type CAM is power inefficient, although it provides the best performance.
B. NAND-Type CAM
In contrast to the NOR-type CAM, the NAND-type CAM aims to reduce the power dissipated in the search operation. Its CAM cell is implemented as XNOR-type instead of XOR-type, and the pull-down transistors of the CAM cells in the same word are arranged in the NAND type, as shown in Fig. 2(b). The match line is initially precharged to high and discharged to 0 only when all CAM cells are matched. Because the load capacitance of the match line is small and only one match line is discharged to 0 during a search, the power consumption is minimal. However, the pull-down path is long, such that the match line discharge is very slow in case of a match. Thus, the NAND-type CAM trades performance for a large power saving.
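The contrast between the two conventional organizations can be sketched behaviorally. The following Python model is a minimal sketch, not the authors' circuit: it treats each word as a bit string, ignores all timing and capacitance effects, and the names `nor_match_line`, `nand_match_line`, and `count_discharged` are hypothetical. It only illustrates which match lines are discharged on a lookup.

```python
from typing import Callable, List


def nor_match_line(stored: str, search: str) -> bool:
    """Level of a NOR-type match line after evaluation (True = high)."""
    # Each mismatching XOR cell turns on its pull-down transistor;
    # a single conducting pull-down discharges the precharged line.
    any_mismatch = any(s != q for s, q in zip(stored, search))
    return not any_mismatch


def nand_match_line(stored: str, search: str) -> bool:
    """Level of a NAND-type match line after evaluation (True = high)."""
    # The series pull-down chain conducts only if every XNOR cell matches,
    # so the line is discharged only on a full-word match.
    all_match = all(s == q for s, q in zip(stored, search))
    return not all_match


def count_discharged(table: List[str], search: str,
                     match_line: Callable[[str, str], bool]) -> int:
    """Number of match lines pulled low by one lookup."""
    return sum(1 for word in table if not match_line(word, search))


# Hypothetical 4-word table with exactly one word equal to the search key.
table = ["1010", "0110", "1111", "0001"]
print(count_discharged(table, "0110", nor_match_line))   # mismatching words
print(count_discharged(table, "0110", nand_match_line))  # matching words
```

With at most one matching word per lookup, the NOR-type model discharges every other match line, while the NAND-type model discharges at most one, which mirrors the power argument made in the text.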
III. HYBRID-TYPE CAM DESIGN
A. Overview
The key idea behind our design is to combine the performance advantage of the NOR-type CAM with the power efficiency of the NAND-type CAM. As shown in Fig. 3, we divide a CAM word into two segments, SEG_1 and SEG_2, and add the necessary control circuitry. In SEG_1, the CAM cells are implemented as XNOR-type and their pull-down transistors are arranged in the NAND type, denoted as the NAND-type block in Fig. 3. The NAND-type block is connected to the ground only when all the CAM cells of SEG_1 are matched. In contrast to SEG_1, we use XOR-type CAM cells to implement SEG_2, and their pull-down transistors are arranged in the NOR type, denoted as the NOR-type block in Fig. 3. The NOR-type block is disconnected from the ground only when all the CAM cells of SEG_2 are matched.

CHANG AND LIAO: HYBRID-TYPE CAM DESIGN


Fig. 3. Word structure of the hybrid-type CAM design.

TABLE I. KEY NODE VOLTAGE (H/L) AND PATH CONNECTION/DISCONNECTION (O/X) FOR EACH CASE IN THE HYBRID-TYPE CAM DESIGN

B. Search Operation
Similar to the traditional CAM, a search in our design consists of two phases: the precharge phase and the match evaluation phase. In the precharge phase, all the match lines are first precharged to high; then, in the match evaluation phase, only the matched words change the logic level of the match line from high to low.
1) Precharge Phase: In this phase, the control signal PRE is low. Thus, the match line (ML) is initially precharged to high. Because the pull-down paths T1, T2, and T3 are disconnected by the N1, N2, and N3 transistors, respectively, both the M1 and M2 nodes are precharged to high via P1 and P2. Because there is no path to ground, it is unnecessary to discharge all the bit lines to 0 to prevent an unexpected short circuit during the precharge phase. Compared to the traditional CAM implementation, therefore, our design is more efficient in bit line power saving. In addition, the match lines in our design are precharged unconditionally. This differs from other segmentation techniques [5], [6], in which the match lines are precharged conditionally, resulting in a performance penalty.
2) Match Evaluation Phase: After the precharge phase, the control signal PRE is asserted high and the search data are loaded on the bit lines to start the matching process. This phase is called the match evaluation phase. Because we divide a CAM word into two segments, i.e., SEG_1 and SEG_2 as shown in Fig. 3, there are four possible cases in the match evaluation phase, depending on the match result of each segment. A word is a real match only when both SEG_1 and SEG_2 are matched. The key node voltages and path connections/disconnections for these cases are summarized in Table I, and the detailed operations are described as follows.
Case 1: SEG_1 Is Mismatched and SEG_2 Is Mismatched/Matched: Because SEG_1 is a mismatch, at least one NMOS transistor in the NAND-type block is turned off, which disconnects the pull-down path T1 from the ground. Therefore, node M1 remains high, which turns off the tail transistors N2 and N3 and disconnects the pull-down paths T2 and T3. This implies that no matter whether SEG_2 is a match or a mismatch, node M2 stays high and keeps N4 turned on. Because the paths T1 and T2 are disconnected from the ground, the match line ML maintains logic high as in the precharge phase.
Fig. 4 shows the HSPICE waveform for Case 1, in which the lengths of SEG_1 and SEG_2 are assumed to be 4 and 28, respectively, and Cell is the output signal of one XOR CAM cell in SEG_2. In this simulation, the output signals of all the other XOR CAM cells in SEG_2 are 0. If Cell is high, then SEG_2 is a mismatch. In contrast, Cell at logic low implies that SEG_2 is matched. From Fig. 4, it is clear that no matter whether Cell is high or low, i.e., SEG_2 is mismatched or matched, once SEG_1 is mismatched, both M1 and M2 stay high. Thus, the match line ML maintains logic high to indicate that this word is mismatched.
Case 2: SEG_1 Is Matched and SEG_2 Is Mismatched: Because SEG_1 is a match, all NMOS transistors in the NAND-type block are turned on, which connects the path T1 to ground. Therefore, node M1 is discharged to 0, which turns on the tail transistors N2 and N3. As shown in the waveform of Fig. 5, M1 is precharged to high during the precharge phase and then discharged to 0 through the connected T1 path during the evaluation phase.
In this case, because SEG_2 is a mismatch, at least one NMOS transistor in the NOR-type block is turned on, which connects the pull-down path T3 to the ground. Thus, the M2 node is discharged to 0, as illustrated in Fig. 5, in which Cell at logic high indicates that SEG_2 is mismatched. Because M2 is low, the N4 transistor is turned off to prevent the match line from discharging to 0 through the paths T1 and T2. Therefore, ML stays high to indicate that this word is mismatched. Note that there is a small pulse marked with a circle in Fig. 5. It is caused by the race condition problem, which is likely to cause a false match and will be discussed in Section IV-A.

Fig. 4. HSPICE waveform for Case 1, in which SEG_1 is mismatched.

Fig. 5. HSPICE waveform for both Case 2 and Case 3.

Case 3: SEG_1 Is Matched and SEG_2 Is Matched: Similar to Case 2, the M1 node is discharged to 0 in this case, turning on the tail transistors N2 and N3. Because SEG_2 is a match, all NMOS transistors in the NOR-type block are turned off, which disconnects the pull-down path T3 from the ground. Thus, the M2 node retains logic high as in the precharge phase, as shown in Fig. 5, in which Cell at logic low implies that SEG_2 is matched. Consequently, the level-restore PMOS transistor gated by M2 is turned off and N4 is turned on to discharge the match line to 0 through the pull-down paths T1 and T2. This indicates a real match.
Note that our design provides two pull-down paths, T1 and T2, to discharge the match line. Because the length of the T1 path depends on the SEG_1 length, the discharge delay of T1 increases with the SEG_1 length. In contrast, the length of the T2 path is fixed at one NMOS transistor. It is independent of the SEG_1 length and is always shorter than T1. Therefore, T2 is a fast discharge path that ensures our design has search performance comparable to the traditional NOR-type CAM.
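The three cases above can be condensed into a small behavioral model. This is a sketch under stated assumptions, not the authors' netlist: it captures only the steady-state logic levels of M1, M2, and ML after evaluation, ignores timing (and therefore the race condition and charge sharing discussed in Section IV), and the names `WordState` and `evaluate_hybrid_word` are hypothetical.

```python
from typing import NamedTuple


class WordState(NamedTuple):
    m1_high: bool  # node M1 after evaluation
    m2_high: bool  # node M2 after evaluation
    ml_high: bool  # match line ML after evaluation (low = word match)


def evaluate_hybrid_word(stored: str, search: str, seg1_len: int) -> WordState:
    """Steady-state node levels of one hybrid-type word after a search."""
    seg1_match = stored[:seg1_len] == search[:seg1_len]
    seg2_match = stored[seg1_len:] == search[seg1_len:]

    # M1 is precharged high and discharged through T1 only when every
    # SEG_1 cell matches (the NAND-type block conducts).
    m1_high = not seg1_match

    # T3 is enabled by tail transistor N3 only after M1 goes low; it then
    # discharges M2 if at least one SEG_2 cell mismatches (NOR-type block).
    m2_high = not (seg1_match and not seg2_match)

    # ML is discharged through T1/T2 only when M1 is low (N2 on) and
    # M2 is still high (N4 on), i.e., when both segments match.
    ml_high = not (not m1_high and m2_high)

    return WordState(m1_high, m2_high, ml_high)


# Hypothetical 8-bit word with a 4-bit SEG_1, one search per case.
word = "10110010"
print(evaluate_hybrid_word(word, "00110010", 4))  # Case 1: SEG_1 mismatched
print(evaluate_hybrid_word(word, "10110011", 4))  # Case 2: only SEG_2 mismatched
print(evaluate_hybrid_word(word, "10110010", 4))  # Case 3: real match
```

Only the last call leaves ML low, which is the behavior the text describes: the match line is always precharged and then discharged only by a fully matching word.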
C. Related Work
There has been much previous work on CAM power reduction [3]-[9]. Because our design divides a CAM word into two segments, we focus only on the work related to word segmentation techniques [5]-[8]. In [5], Zukowski et al. introduced a selective precharge technique to reduce the match line power consumption by breaking a CAM word into two stages. A small subset of CAM cells is used to do a precalculation, and the result is used to decide whether the match line needs to be precharged at all, i.e., conditional (or selective) precharge. A similar CAM word structure, called the static divided word match line, was proposed in [6]. Besides segmenting the match line, that work uses static circuit design to improve the reliability. In addition, a new CAM cell with a single bit line was introduced. The single bit line design requires only one heavily loaded bit line and prevents frequent switching. Therefore, the static design proposed in [6] can further reduce the CAM power dissipated in bit line switching.
Comparing our design to the techniques in [5] and [6], the major differences are summarized as follows.
1) Unlike [5] and [6], in which the match line is precharged conditionally, the match line is always precharged in our design and is then discharged conditionally.
2) Because we decouple all CAM cells from the match line, the match line of the hybrid-type CAM is lightweight. In addition, we provide a fast pull-down path to discharge the lightweight match line quickly. Therefore, the search performance of our design is better than that of both [5] and [6], in which the match line is still heavyweight and the selective precharge results in a modest delay penalty.
3) In the selective precharge technique [5], additional clock phases are critical to perform the correct search operation, which increases both the complexity and the power consumption of the clock. In contrast, our design needs no additional timing control signals.
4) In the static divided word match line [6], the single bit line design is indeed effective in reducing the bit line power consumption, but it results in the write problem [10], that is, it is considerably difficult to write the cell state from low to high in the single bit line configuration. The possible solutions are to provide a specialized write port or to modify the cell circuit [10]. Both methods increase the transistor count, and thus the power consumption of the cell. In addition, there is a short-circuit path in the static divided method [6] if the first segment is matched and the second segment is mismatched. In contrast, our design is free from short-circuit paths in all possible cases.
An adaptive serial-parallel CAM [7], called SPCAM, is another low-power CAM structure. Besides dividing a CAM word into two segments, SPCAM can operate in either parallel or serial mode. In serial mode, the energy consumption is almost a quarter of that of the conventional parallel CAM, but the performance degradation is about 25%. In parallel mode, the energy consumption is still 33% lower than that of the conventional parallel CAM, without any performance penalty. By comparison, the search performance of our design is much better than that of the SPCAM operating in serial mode, and the power reduction of our design is larger than that of the SPCAM operating in parallel mode, where both segments are active.
A pipelined search scheme was proposed in [8], where a CAM word is further divided into several segments. The segments are evaluated sequentially in a pipeline fashion. Only the words that match a segment proceed with the search in their subsequent segments. Because the words that fail to match a segment do not search their subsequent segments, the power consumption of the match lines can be reduced. There are two major concerns with this scheme. First, the power overhead incurred by the additional flip-flops and sense amplifier used in each segment is significant, and so is the area overhead. Second, the control circuitry used in pipelining, especially the clock, diminishes the power benefit of the pipelined match line.
PB-CAM [9] stores some extra information along with each word, which is used in the search operation to save power. These extra bits are derived from the stored word and are used in an initial search before the main word is searched. If this initial search fails, the CAM aborts the subsequent search, thus saving power. The concept of PB-CAM is similar to the selective precharge technique. However, both the power and area overhead incurred by the precomputation circuitry are considerable, and so is the precomputation time.
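The precomputation idea can be illustrated in software. The sketch below is only an illustration under assumptions: the text does not say which extra bits PB-CAM derives, so a ones count is used here as a hypothetical parameter, and `ones_count` and `pb_search` are names invented for this example. It counts how many full-word comparisons survive the initial parameter search.

```python
from typing import List, Tuple


def ones_count(word: str) -> int:
    """Hypothetical precomputed parameter: number of 1 bits in the word."""
    return word.count("1")


def pb_search(table: List[str], search: str) -> Tuple[List[int], int]:
    """Return (matching indices, number of full-word comparisons performed)."""
    # In hardware the parameters would be stored next to each word; they are
    # recomputed here only to keep the sketch short.
    params = [ones_count(word) for word in table]
    key = ones_count(search)

    hits: List[int] = []
    full_compares = 0
    for index, (word, param) in enumerate(zip(table, params)):
        if param != key:
            continue  # initial search failed: the main-word search is skipped
        full_compares += 1
        if word == search:
            hits.append(index)
    return hits, full_compares


# Hypothetical 4-word table; only words with the same ones count are compared.
hits, compares = pb_search(["1010", "0110", "1111", "0001"], "0110")
print(hits, compares)
```

The saving comes from the skipped comparisons, while the cost is the parameter storage and the precomputation on the search key, which corresponds to the overhead noted in the text.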
IV. IMPLEMENTATION ISSUES
Depending on the application, the user can adjust the length of SEG_1. If the length of a CAM word is $n$ bits and the length of SEG_1 is $k$ bits, then the length of SEG_2 is $n - k$ bits. Because all the pull-down transistors in SEG_1 are arranged in series (i.e., the NAND-type block) and are on the critical path to discharge the match line, the length of SEG_1 strongly affects the functionality, performance, and power efficiency of our design.
A. SEG_1 Length Versus Race Condition
From Fig. 3, we note that the speed of the M1 discharge depends on the length of SEG_1. This implies that there is a possible race condition problem in Case 2, i.e., SEG_1 is matched and SEG_2 is mismatched: a) if the M1 discharge is fast enough, the tail transistor N3 is turned on quickly to discharge M2, such that the N4 transistor is turned off quickly to prevent the match line from discharging; therefore, the logic high level of the match line is retained correctly; and b) otherwise, if the M1 discharge is slow enough to prolong the on time of the N4 transistor, the match line is discharged unexpectedly; if the voltage level of the match line becomes too low, the result is a false match.
To prevent the incorrect match caused by the race condition, we add a PMOS transistor, gated by the M2 node, to provide the level-restore capability. Once the M2 node is discharged to 0, regardless of the discharge speed, this PMOS transistor is turned on to supply the lost charge. Consequently, our design is immune to the potential race condition problem. This effect can be observed in Fig. 5, in which there is a small pulse marked with a circle; the lost charge is supplied quickly.
B. SEG_1 Length Versus Charge Sharing
If SEG_1 is too long, the charge sharing problem may occur when SEG_1 is mismatched and SEG_2 is matched. As shown in Fig. 6, the worst case is that all the pull-down transistors are turned on except the leftmost one. In this case, the charge of the M1 node is shared among the intermediate nodes, such that the voltage level of the M1 node is decreased. Because SEG_2 is matched, N4 is turned on to discharge the match line. If the voltage level of the match line is too low, it results in a false match.
Fig. 7 shows the voltage level of the match line under the worst-case charge sharing for various SEG_1 lengths, in which the load capacitance of the match line is assumed to be 4 fF. In this simulation, because the threshold voltage of the PMOS transistor is -0.438 V in the TSMC 0.18-μm model, the charge sharing problem results in a false match when the voltage level of the match line falls below the level marked by the dashed line in Fig. 7. From this figure we conclude that if the length of SEG_1 is larger than 4, our design has a possible false match. Therefore, the length of SEG_1 is constrained within four bits throughout this paper.

Fig. 6. Example of the charge sharing problem incurred by large SEG_1.

Fig. 7. Voltage level of match line under the consideration to the worst charge sharing problem for various SEG_1 lengths.
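A first-order view of why a longer SEG_1 worsens charge sharing follows from charge conservation. The relation below is a textbook idealization rather than the authors' model: $C_{M1}$, $C_{int}$, and $m$ (the number of intermediate nodes connected to M1 in the worst case) are assumed symbols, the intermediate nodes are assumed to start at 0 V, and the threshold drop across the NMOS pass devices is ignored.

```latex
V_{M1}^{\text{after}} = V_{DD}\,\frac{C_{M1}}{C_{M1} + m\,C_{int}}
```

Under these assumptions the M1 voltage falls monotonically as $m$ grows, which is consistent with the conclusion that only short SEG_1 lengths are safe; the actual limit depends on the simulated capacitances, which are not reproduced here.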
C. SEG_1 Length Versus Power Saving
As described above, a short SEG_1 can prevent the charge sharing problem, but it increases the probability of M1 discharge. Suppose, for example, that the length of SEG_1 is one bit. For a random pattern, the probability of M1 discharge would be 50% on average, i.e., the probability that the tail transistor N3 is turned on is also 50%. Because there are $n - k$ pull-down transistors in the NOR-type block, where $n$ and $k$ are the lengths of the entire word and SEG_1, respectively, the probability that the T3 path is connected to the ground increases greatly. This results in significant power dissipated in the discharge of the M2 node, which has large drain capacitances. Ideally, the probability of M2 discharge is the product of the probability of the T1 path conducting, i.e., $P(\text{T1 conducting})$, and the probability of the T3 path conducting, i.e., $P(\text{T3 conducting})$, as shown in the following equation:

$$P(\text{M2 discharge}) = P(\text{T1 conducting}) \times P(\text{T3 conducting}) = \left(\frac{1}{2}\right)^{k} \times \left[1 - \left(\frac{1}{2}\right)^{n-k}\right]$$

In this equation, we assume that the match probability is 1/2 for each CAM cell. In SEG_2, because all the pull-down transistors are arranged in the NOR type, the T3 path is disconnected only when they are all turned off. Thus, $P(\text{T3 conducting})$ is equal to $1 - (1/2)^{n-k}$. Fig. 8 shows the probability of M2 discharge for various SEG_1 lengths, in which $n = 32$ is assumed. Clearly, the probability of M2 discharge decreases sharply as the length of SEG_1 is increased. This implies that the search operation consumes more power when we decrease the length of SEG_1.

Fig. 8. Probability of M2 discharge for various SEG_1 lengths.
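The trend can be checked numerically. The following is a minimal sketch assuming, as the text does, an independent match probability of 1/2 per cell; the symbols `n` (word length) and `k` (SEG_1 length), the function name `p_m2_discharge`, and the choice of a 32-bit word are assumptions made for this example.

```python
def p_m2_discharge(n: int, k: int) -> float:
    """Probability that M2 is discharged for an n-bit word with a k-bit SEG_1."""
    p_t1 = 0.5 ** k                # all k SEG_1 cells match
    p_t3 = 1.0 - 0.5 ** (n - k)    # at least one of the n - k SEG_2 cells mismatches
    return p_t1 * p_t3


# Sweep the SEG_1 lengths considered in the paper for an assumed 32-bit word.
for k in range(1, 7):
    print(k, f"{p_m2_discharge(32, k):.4f}")
```

For a 32-bit word the second factor is practically 1, so the probability roughly halves with every bit added to SEG_1.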
V. EXPERIMENTAL RESULTS
In this paper, we use TSMC 0.18-μm 1P6M technology to implement the proposed design. Fig. 9 shows the layout block diagram and the microphotograph of the fabricated hybrid-type CAM chip, where the shift registers are used for functional verification. Note that the core was broken into four blocks for both performance and power efficiency. For a thorough comparison, besides the conventional NOR-type and NAND-type CAMs, we also implement the related designs, including the selective precharge scheme [5], the static divided word structure [6], and the SPCAM [7] operating in serial and parallel modes. They are denoted as SP, SDW, SPCAM_S, and SPCAM_P, respectively. All CAM designs have a size of 128 × 32, i.e., 128 words by 32 bits, and the data presented in the following discussion are obtained from HSPICE postlayout simulation.

Fig. 9. (a) Layout block diagram. (b) Microphotograph of the fabricated chip.

Fig. 10. Normalized match delay.

A. Performance
In this paper, the metric used to evaluate CAM performance is the match delay, defined as the time elapsed from the moment signal PRE is asserted high until the match line is discharged to 0 in case of a match. Table II lists the match delay of all CAM designs, where the SEG_1 length is varied from 1 to 6 bits. Because they are not segmented, the match delays of the NOR-type and NAND-type CAMs are fixed at 0.641 ns and 2.774 ns, respectively. In other words, the match delay of the NAND-type CAM is about 4.3 times that of the NOR-type CAM. As indicated in Section II and in [5], the NAND-type CAM is not a feasible solution because of its long match delay. Fig. 10 shows the normalized match delay, where the match delays of all CAM designs are normalized to that of the NOR-type CAM. From this figure, we summarize the most important aspects as follows.
(1) It is clear that the search performance of our design is better than that of the other word segmentation techniques. In particular, only the hybrid-type CAM (with SEG_1 length $\le 4$) has better search performance than the conventional NOR-type CAM. Because the match line is precharged conditionally, the search performance of SP [5], SDW [6], and SPCAM_S [7] is worse than that of the NOR-type CAM. Note that the match delay of SPCAM_P [7] is almost the same as that of the NOR-type CAM. This is because in SPCAM_P [7] both SEG_1 and SEG_2 are always active concurrently for high search performance.
(2) The SEG_1 length has a significant impact on the search performance for all word segmentation techniques except for SPCAM_S [7] and SPCAM_P [7], in which SEG_1 is broken into several sets of two bits to limit the number of transistors in series to three. In our design, the match delay increases with the length of SEG_1. As shown in Fig. 3, the match line discharge relies on the M1 discharge that connects the T1 and T2 paths to the ground; furthermore, the M1 discharge delay increases with the number of transistors in the NAND-type block.
(3) One interesting observation from this result is that when the SEG_1 length is less than or equal to four bits, the match delay of our design is even shorter than that of the conventional NOR-type CAM. This is because our design decouples all CAM cells from the match line, such that the match line is lightweight. Once the fast path T2 is connected, it can discharge the lightweight match line quickly. Although the NAND-type block degrades the match performance slightly, the fast path T2 compensates for the performance loss. Due to the charge sharing problem, the length of SEG_1 is constrained within four bits. From the detailed data shown in Table II, if the length of SEG_1 is 4, the match delay is 0.609 ns. Compared to the conventional NOR-type CAM, our design improves the search performance by 5%.

TABLE II. MATCH DELAY OF ALL CAM DESIGNS

TABLE III. SEARCH POWER CONSUMPTION OF ALL CAM DESIGNS

B. Power and Energy
Table III shows the power consumption during a search for all CAM designs, where the SEG_1 length is varied from 1 to 6 bits. Clearly, the search power consumption is reduced sharply as the SEG_1 length is increased, except for SPCAM_P [7], in which SEG_1 and SEG_2 are checked concurrently for high search performance. No matter whether SEG_1 is matched or not, SEG_2 is always active on every search, such that the search power of SPCAM_P [7] is only slightly less than that of the NOR-type CAM. In the proposed hybrid-type CAM, the search power consumption is roughly 0.29 mW when the SEG_1 length is 6 bits. Compared to the NOR-type CAM, whose search power consumption is fixed at 3.04 mW, our design reduces the search power consumption by roughly 90%.
Besides reducing the search power consumption, increasing the SEG_1 length has a large impact on the search performance, as revealed in Section V-A. Consequently, the choice of SEG_1 length is a tradeoff between power and performance. For a fair comparison, energy is a suitable metric, which is the product of the match delay (performance) and the search power (power). Combining Tables II and III, the detailed energy results for all CAM designs are summarized in Table IV, and Fig. 11 shows the normalized search energy, in which all energy results are normalized to that of the NOR-type CAM design.
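The energy metric and the percentages quoted in the text can be reproduced with simple arithmetic. The sketch below uses only figures stated in this section (0.641 ns and 3.04 mW for the NOR-type CAM, 0.609 ns for the hybrid design with a 4-bit SEG_1, and 0.29 mW with a 6-bit SEG_1); the helper names `search_energy_pj` and `relative_reduction` are hypothetical, and the values of Table IV are not reproduced.

```python
def search_energy_pj(delay_ns: float, power_mw: float) -> float:
    """Energy per search in pJ: ns * mW = 1e-9 s * 1e-3 W = 1e-12 J."""
    return delay_ns * power_mw


def relative_reduction(baseline: float, value: float) -> float:
    """Fractional reduction of value with respect to baseline."""
    return 1.0 - value / baseline


nor_energy = search_energy_pj(0.641, 3.04)     # NOR-type CAM
delay_gain = relative_reduction(0.641, 0.609)  # hybrid, SEG_1 = 4 bits
power_gain = relative_reduction(3.04, 0.29)    # hybrid, SEG_1 = 6 bits
print(f"{nor_energy:.3f} pJ, {delay_gain:.1%}, {power_gain:.1%}")
```

The two ratios agree with the roughly 5% delay improvement and roughly 90% power reduction stated in the text; the hybrid energy itself needs the delay and power at the same SEG_1 length, which are given only in Tables II and III.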
From Fig. 11, it is clear that the word segmentation techniques are indeed effective in reducing the energy consumption of CAM, except for SPCAM_P [7], where both SEG_1 and SEG_2 are always active for high search performance. In fact, the search energy of SPCAM_P [7] is almost the same as that of the NOR-type CAM. The major advantage of our design is that it not only greatly reduces the search power but also improves the match delay. In contrast, SP [5], SDW [6], and SPCAM_S [7] incur a delay penalty while reducing the search power. Consequently, our design achieves the largest energy improvement among the related techniques. In addition, the energy efficiency of our design is even better than that of the NAND-type CAM, which has the lowest search power.
From Table IV, our design can reduce the energy consumption of the NOR-type CAM by 90% when the SEG_1 length is 6 bits. The difference in improvement between SEG_1 lengths of 4 and 6 bits is marginal. However, if the SEG_1 length is larger than 4 bits, a false match due to charge sharing is possible in our design. For a reliable system, when the SEG_1 length is 4 bits, our design reduces the energy consumption of the NOR-type and NAND-type CAMs by roughly 88% and 40%, respectively.
C. Peak Power Consumption
In addition to the power consumption under the normal
case (called normal power consumption) as discussed in Section V-B, the peak power consumption is another important
design issue due to the capacity constraint of the power supply,
which is essential to maintain supply voltage levels and increase
reliability [11]. The peak power, by definition, is the maximum
power consumption of the designed circuit during its execution.
It is always incurred by the worst-case data pattern.
Table V shows the normal and peak power consumption for
all CAM designs where SEG_1 length is 4.


Fig. 11. Normalized search energy.

TABLE IV
SEARCH ENERGY OF ALL CAM DESIGNS.

TABLE V
NORMAL AND PEAK POWER CONSUMPTION FOR ALL CAM DESIGNS WHERE SEG_1 LENGTH IS 4

1) In the conventional NOR-type CAM, the worst-case data
pattern is that all the CAM words are mismatched. The
difference between normal power consumption and peak
power consumption is only the power dissipated in the
switching of one match line. As shown in Table V, the difference is negligible.
2) The worst case of the NAND-type CAM is that one CAM
word is matched and the others are mismatched in the MSB
CAM cell. Because the mismatched words would incur the
worst case of charge sharing, the peak power consumption
is roughly 1.5 times as large as the normal power consumption in the NAND-type CAM.
3) For all word segmentation techniques, including our design, the worst case is that all the CAM words are matched
in SEG_1 and mismatched in SEG_2. Unlike the NOR-type CAM, all word segmentation techniques would enlarge the difference between the normal and peak power
consumption except for SPCAM_P [7].
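
The worst-case pattern described for the word segmentation techniques (every stored word matches the search key in SEG_1 and mismatches in SEG_2) can be built and checked with a small behavioral sketch in Python. This is not a circuit model. It assumes, for illustration only, that SEG_1 is the leading `seg1_len` bits of a word written as a bit string; the function names and the 8-bit key are hypothetical.

```python
def segment_match(stored: str, key: str, seg1_len: int) -> tuple[bool, bool]:
    """Return (SEG_1 matches, SEG_2 matches) for two equal-length bit strings."""
    if len(stored) != len(key):
        raise ValueError("stored word and search key must have equal length")
    return stored[:seg1_len] == key[:seg1_len], stored[seg1_len:] == key[seg1_len:]


def worst_case_word(key: str, seg1_len: int) -> str:
    """Build a word that matches `key` in SEG_1 but mismatches in SEG_2.

    Flipping only the last bit is one arbitrary way to force the mismatch.
    """
    if not 0 < seg1_len < len(key):
        raise ValueError("SEG_2 must contain at least one bit")
    flipped = "1" if key[-1] == "0" else "0"
    return key[:-1] + flipped


def is_worst_case(words: list[str], key: str, seg1_len: int) -> bool:
    """True if every word matches in SEG_1 and mismatches in SEG_2."""
    return all(segment_match(w, key, seg1_len) == (True, False) for w in words)


key = "10110010"  # hypothetical 8-bit search key
words = [worst_case_word(key, seg1_len=4) for _ in range(4)]
print(is_worst_case(words, key, seg1_len=4))
```

A generator like this is one way to produce a stimulus for the peak power case; the normal case would instead use words that mostly mismatch in SEG_1.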
As shown in Table V, the peak power consumption of the
hybrid-type CAM is 2.874 mW, which is roughly eight times
as large as the normal power consumption. This is because all
the M2 nodes with large drain capacitances are discharged to 0
and then precharged to V_DD in the worst case. Compared to the
NOR-type CAM, although the hybrid-type CAM largely amplifies the difference between peak power and normal power, it still
achieves a 6% reduction in peak power consumption.
D. Area Cost
Compared with the conventional NOR-type CAM, our design requires nine additional transistors, all of which come from the
control circuitry. For a CAM word with 32 bits, the layout sizes
of the conventional NOR-type CAM and our design are 7.54 μm ×
131.21 μm and 7.54 μm × 141.57 μm, respectively. Note that the height of
the proposed CAM is purposely kept the same as the height
of the conventional NOR-type CAM, so that both designs
dissipate the same power in bit line switching. The
area overhead is roughly 7.8%. Because the CAM words are
only part of the entire CAM system, the total CAM area overhead is
less than 7.8%.
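
The per-word area overhead can be checked from the two layout sizes given above. The Python sketch below is a plain arithmetic check with variable names of our own choosing. Because the height is identical in both layouts, the overhead reduces to the relative increase in word width.

```python
def layout_area(height_um: float, width_um: float) -> float:
    """Rectangular layout area in square micrometres."""
    return height_um * width_um


# Layout sizes of one 32-bit CAM word, as quoted in the text.
nor_area = layout_area(7.54, 131.21)
hybrid_area = layout_area(7.54, 141.57)

overhead_percent = (hybrid_area - nor_area) / nor_area * 100.0
print(f"area overhead per CAM word: {overhead_percent:.1f}%")
```

With these dimensions the result comes out just under 7.9%, close to the roughly 7.8% reported.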
VI. CONCLUSIONS
In this paper, we have developed a hybrid-type CAM design,
in which we decouple all the CAM cells from the match line,
and provide a fast path to accelerate the search operation. With
a marginal area overhead, our design not only largely reduces
the search power consumption but also improves the search performance.
REFERENCES
[1] K. Pagiamtzis and A. Sheikholeslami, "Content-addressable memory (CAM) circuits and architectures: A tutorial and survey," IEEE J. Solid-State Circuits, vol. 41, no. 3, pp. 712–727, Mar. 2006.
[2] T. Juan, T. Lang, and J. Navarro, "Reducing TLB power requirements," in Proc. Int. Symp. Low Power Electron. and Design, 1997, pp. 196–201.
[3] H. Miyatake, M. Tanaka, and Y. Mori, "A design for high-speed low-power CMOS fully parallel content-addressable memory macros," IEEE J. Solid-State Circuits, vol. 36, no. 6, pp. 956–968, Jun. 2001.
[4] I. Arsovski and A. Sheikholeslami, "A mismatch-dependent power allocation technique for match-line sensing in content-addressable memories," IEEE J. Solid-State Circuits, vol. 38, no. 11, pp. 1958–1966, Nov. 2003.
[5] C. A. Zukowski and S. Y. Wang, "Use of selective precharge for low-power content-addressable memories," in Proc. Int. Symp. Circuits and Syst., 1997, pp. 1788–1791.
[6] K. H. Cheng, C. H. Wei, and S. Y. Jiang, "Static divided word matching line for low-power content addressable memory design," in Proc. Int. Symp. Circuits and Syst., 2004, pp. 629–632.
[7] A. Efthymiou and J. D. Garside, "An adaptive serial-parallel CAM architecture for low-power cache block," in Proc. Int. Symp. Low Power Electron. and Design, 2002, pp. 136–141.
[8] K. Pagiamtzis and A. Sheikholeslami, "A low power content-addressable memory (CAM) using pipelined hierarchical search scheme," IEEE J. Solid-State Circuits, vol. 39, no. 9, pp. 1512–1519, Sep. 2004.
[9] C. S. Lin, J. C. Chang, and B. D. Liu, "A low-power precomputation-based fully parallel content addressable memory," IEEE J. Solid-State Circuits, vol. 38, no. 4, pp. 654–662, Apr. 2003.
[10] Y. J. Chang, F. Lai, and C. L. Yang, "Zero-aware asymmetric SRAM cell for reducing cache power in writing zero," IEEE Trans. Very Large Scale Integr. Syst., vol. 12, no. 8, pp. 827–836, Aug. 2004.
[11] S. P. Mohanty, N. Ranganathan, and S. K. Chappidi, "Peak power minimization through datapath scheduling," in Proc. IEEE Computer Soc. Annu. Symp. VLSI (ISVLSI), Feb. 2003, pp. 121–126.

Yen-Jen Chang (M'02) received the M.S. degree
in computer science and information engineering
from Chung-Yuan Christian University, Jhongli City,
Taiwan, R.O.C., in 1997, and the Ph.D. degree in
computer science and information engineering from
National Taiwan University, Taipei, Taiwan, in 2003.
He joined the faculty of the Department of Computer Science at National Chung-Hsing University,
Taichung, Taiwan, in February 2004 and is currently
an Associate Professor. His research interests are
computer architectures, low-power VLSI design,
and embedded system and SoC design.

Yuan-Hong Liao, photograph and biography unavailable at the time of
publication.
