Hybrid-Type CAM Design For Both Power and Performance Efficiency
I. INTRODUCTION
CONTENT-addressable memory (CAM) is a storage device
that is addressed by its content (or data) rather than
by a memory address. It is widely used in many applications that
require fast table lookup [1]. Due to the parallel comparison
feature and high frequency of lookups, however, the power
consumption of CAM is usually significant. For example, in
StrongARM [2] embedded processors, the fully associative
TLBs with CAM tags consume about 17% of the total chip
power. Because such large power consumption is a critical concern
for advanced applications with large CAMs, the purpose of
this paper is to develop a low-power CAM design with high
performance.
There are two conventional CAM designs, i.e., NOR-type
and NAND-type CAMs. The NOR-type CAM provides the best
search performance, but at the cost of large power consumption. In contrast, the NAND-type CAM trades search
performance for low power. As revealed in previous
research, the match lines are the major power consumer in
CAM. The power consumption of match lines can be reduced
by reducing the voltage swing on the match lines [3], [4], or
by dividing the match line [5]–[8]. Because the match lines are
precharged conditionally in the segmentation techniques [5],
[6], the performance degradation is a major drawback. In this
IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 16, NO. 8, AUGUST 2008
Fig. 1. Typical CAM cell. (a) XOR type. (b) XNOR type.
CAM cell; one for storing data, called store unit, and the other
for comparing data, referred to as compare unit. The store unit
is usually implemented as the traditional 6T SRAM cell which
contains a cross-coupled inverter pair. The compare unit is a
pass-transistor logic (PTL) for comparing the stored with search
data. Depending on the different applications, the compare unit
can be implemented as XOR or XNOR functions, shown in
Fig. 1(a) and (b), respectively. Besides the store and compare
units, a pull-down transistor X, which is gate-controlled by
the comparison result, is necessary to connect/disconnect the
match line (ML) to/from the ground.
A. NOR-Type CAM
Traditionally, there are two types of CAM designs. As shown
in Fig. 2, one is NOR-type and the other is NAND-type. In
NOR-type CAM design, the CAM cell is usually XOR-type,
and the pull-down transistors of each CAM cell are arranged
in NOR type. During the precharge phase, the match line is initially precharged to high. For a CAM word, if one or more
cells are mismatched, the match line is discharged to
0. Only when all cells are matched, i.e., the search data are
identical to the stored data, does the match line retain logic high
as in the precharge phase. Because the pull-down path is very
short, in case of a mismatch the match line is discharged to 0
quickly. Thus, the NOR-type CAM provides the best search performance.
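The NOR-type behavior described above can be illustrated with a small logic-level sketch. It is only a behavioral model, not a circuit simulation: the function name `nor_match_line` and the bit-string representation of stored and search data are assumptions made for illustration.

```python
def nor_match_line(stored: str, search: str) -> int:
    """Logic-level model of one NOR-type CAM word (hypothetical helper).

    Returns the match-line level after evaluation: 1 (high) or 0 (low).
    """
    if len(stored) != len(search):
        raise ValueError("stored and search words must have the same length")

    # Precharge phase: the match line starts at logic high.
    match_line = 1

    # XOR-type cells: a cell turns on its pull-down transistor on a mismatch.
    pull_down_on = [s != q for s, q in zip(stored, search)]

    # The pull-down transistors are in parallel (NOR arrangement),
    # so a single mismatched cell is enough to discharge the line.
    if any(pull_down_on):
        match_line = 0
    return match_line


if __name__ == "__main__":
    print(nor_match_line("1011", "1011"))  # all cells match
    print(nor_match_line("1011", "1001"))  # one cell mismatches
```

In this model the match line stays high only for an exact match, which mirrors the description of the NOR-type word above.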
TABLE I
KEY NODE VOLTAGE (H/L) AND PATH CONNECTION/DISCONNECTION (O/X)
FOR EACH CASE IN THE HYBRID-TYPE CAM DESIGN
B. Search Operation
Similar to the traditional CAM, our design has two
phases during a search: the precharge phase and the match evaluation phase. In the precharge phase, all the match
lines are first precharged to high, and then in the match evaluation phase only the matched words change the logic level
of the match line from high to low.
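As a rough behavioral illustration of the two phases just described, the following sketch models a small array of words. The function name `hybrid_search` and the bit-string word representation are hypothetical, and the model captures only the final logic levels, not the internal nodes or timing.

```python
def hybrid_search(words: list[str], key: str) -> list[int]:
    """Return the match-line level (1 = high, 0 = low) of every word.

    Hypothetical logic-level model: lines are precharged high and only
    matched words pull their match line low.
    """
    # Precharge phase: every match line is unconditionally set high.
    match_lines = [1] * len(words)

    # Match evaluation phase: a word whose stored data equal the search
    # key discharges its match line; all other lines stay high.
    for index, stored in enumerate(words):
        if stored == key:
            match_lines[index] = 0
    return match_lines


if __name__ == "__main__":
    table = ["1010", "0110", "1010", "1111"]
    print(hybrid_search(table, "1010"))
```

Note that the polarity is the opposite of the NOR-type word, where a matched word keeps its match line high.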
1) Precharge Phase: In this phase, the control signal PRE is
low. Thus, the match line (ML) is initially precharged to high.
Because the pull-down paths T1, T2, and T3 are disconnected
by the N1, N2, and N3 transistors, respectively, both the M1 and M2
nodes are precharged to high via P1 and P2. Because there is no path to
the ground, it is unnecessary to discharge all the bit lines to 0 to
prevent an unexpected short circuit during the precharge phase.
Compared to the traditional CAM implementation, therefore,
our design is more efficient in bit line power saving. In addition,
in our design the match lines are precharged unconditionally.
This differs from other segmentation techniques [5], [6], in
which the match lines are precharged conditionally, which
results in a performance penalty.
2) Match Evaluation Phase: After the precharge phase, the
control signal PRE is asserted high and the search data have to be
loaded on the bit lines to start the matching process. This phase
is called the match evaluation phase. Because we divide a CAM
word into two segments, i.e., SEG_1 and SEG_2 as shown in
Fig. 3, depending on the match results of each segment there
are four possible cases in the match evaluation phase. It is a real
match only when both SEG_1 and SEG_2 are matched. The
segmentation techniques [5]–[8]. In [5], Zukowski et al. introduced a selective precharge technique to reduce the match line
power consumption by breaking a CAM word into two stages.
A small subset of CAM cells can be used to do a precalculation, and the result is used to decide if the match line needs to
be precharged at all, i.e., conditional (or selective) precharge. A
similar CAM word structure, called static divided word match
line, was proposed in [6]. Besides segmenting the match line,
their work uses static circuit design to improve the reliability. In
addition, a new CAM cell with single bit line was introduced.
The single bit line design requires only one heavy loading bit
line, and prevents the frequent switching. Therefore, the proposed static design [6] can further reduce the CAM power dissipated in the bit line switching activities.
The major differences between our design and the techniques in [5] and [6]
are summarized as follows.
1) Unlike [5] and [6], in which the match line is precharged
conditionally, the match line is always precharged in our
design, and then it is discharged conditionally.
2) Because we decouple all CAM cells from the match line,
the match line of the hybrid-type CAM is lightweight. In
addition, we further provide a fast pull-down path to discharge the lightweight match line quickly. Therefore, the
search performance of our design is better than both [5] and
[6], in which the match line is still heavyweight and the selective precharge will result in a modest delay penalty.
3) In the selective precharge technique [5], the additional
clock phases are critical to perform the correct search
operation, which increases both the complexity and power
consumption of clock. In contrast, our design needs no
additional timing control signals.
4) In the static divided word match line [6], the single bit line
design is indeed effective in reducing the bit line power
consumption, but it will result in the write problem [10],
that is, it is considerably difficult to write the cell state from
low to high in the single bit line configuration. The possible
solution is to provide a specialized write port or modify the
cell circuit [10].
Both methods would increase the transistor count, and thus the
power consumption of the cell. In addition, there is a short-circuit path in the static divided method [6] if the first segment is
matched and the second segment is mismatched. In contrast, our
design is free from the short-circuit path in all possible cases.
An adaptive serial-parallel CAM [7], called SPCAM, is another low-power CAM structure. Besides dividing a CAM word
into two segments, SPCAM can operate in either parallel or serial mode. In serial mode, the energy consumption is almost a
quarter of the conventional parallel CAM, but the performance
degradation is about 25%. In parallel mode, without any performance penalty the energy consumption is still 33% better than
the conventional parallel CAM. By comparison, the search performance of our design is much better than the SPCAM operating in serial mode, and the power reduction of our design is
larger than the SPCAM operating in parallel mode where both
the segments are active.
A pipelined search scheme was proposed in [8], where a
CAM word is further divided into several segments. Each
segment is evaluated sequentially in a pipeline fashion. Only
the words that match a segment can proceed with the search
Fig. 7. Voltage level of the match line under the worst-case charge
sharing for various SEG_1 lengths.
discharge is the product of the probability of the T1 path conducting, i.e., $P(\text{T1 conducting})$, and the probability of the T3 path
conducting, i.e., $P(\text{T3 conducting})$, as shown in the following
equation:
$$P(\text{M2 discharge}) = P(\text{T1 conducting}) \times P(\text{T3 conducting}) = \left(\frac{1}{2}\right)^{k} \left[1 - \left(\frac{1}{2}\right)^{n-k}\right]$$
where $n$ and $k$ are the lengths of the entire word and SEG_1,
respectively. In this equation, we assume that the match probability is 1/2 for each CAM cell. In the SEG_2, because all the
pull-down transistors are arranged in the NOR type, the T3 path
is disconnected only when they are all turned off. Thus, $P(\text{T3 conducting})$
is equal to $1 - (1/2)^{n-k}$. Fig. 8 shows the probability
of M2 discharge for various SEG_1 lengths, in which
a fixed word length $n$ is assumed. Clearly, the probability of M2 discharge
decreases sharply as the length of SEG_1 is increased. This implies that the search operation would consume more power when
we decrease the length of SEG_1.
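The trend can be checked numerically with a short sketch. It assumes a per-cell match probability of 1/2, that the T1 path conducts only when every SEG_1 cell matches, and that the T3 path conducts when at least one SEG_2 cell mismatches. The function name `p_m2_discharge`, the symbols `n` and `k`, and the 32-bit word length used in the loop are illustrative assumptions.

```python
from fractions import Fraction


def p_m2_discharge(n: int, k: int) -> Fraction:
    """Probability that node M2 is discharged (hypothetical helper).

    n: length of the entire word, k: length of SEG_1 (0 < k < n).
    Assumes each CAM cell matches with probability 1/2.
    """
    if not 0 < k < n:
        raise ValueError("SEG_1 length must satisfy 0 < k < n")
    half = Fraction(1, 2)
    p_t1 = half ** k            # all k cells of SEG_1 match
    p_t3 = 1 - half ** (n - k)  # at least one of the n - k SEG_2 cells mismatches
    return p_t1 * p_t3


if __name__ == "__main__":
    # Assumed word length of 32 bits, SEG_1 length swept from 1 to 6 bits.
    for k in range(1, 7):
        print(k, float(p_m2_discharge(32, k)))
```

Under these assumptions the probability roughly halves with each extra SEG_1 bit, which is the sharp decrease described above.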
V. EXPERIMENTAL RESULTS
In this paper, we use TSMC 0.18-μm 1P6M technology to
implement the proposed design. Fig. 9 shows the layout block
diagram and the microphotograph of the fabricated hybrid-type
Fig. 9. (a) Layout block diagram. (b) Microphotograph of the fabricated chip.
CAM chip, where the shift registers are used for the function
verification. Note that the core was broken into four blocks for
both the performance and power efficiency. For a thorough
comparison, besides the conventional NOR-type and NAND-type CAMs, we also implement the related designs, including the
selective precharge scheme [5], the static divided word structure
[6], and the SPCAM [7] operating in serial and parallel modes.
They are denoted as SP, SDW, SPCAM_S, and SPCAM_P, respectively. All CAM designs have a size of 128 × 32, i.e., 128
words by 32 bits, and the data presented in the following discussion are obtained from HSPICE postlayout simulation.
A. Performance
In this paper, the metric used to evaluate the CAM performance is the match delay, which is defined as the elapsed time
from when signal PRE is asserted high to when the match line is discharged
to 0 in case of a match. Table II lists the match delay of all
CAM designs where the SEG_1 length is varied from 1 to 6 bits.
Because they are not segmented, the match delays of the NOR-type and
NAND-type CAMs are fixed at 0.641 ns and 2.774 ns, respectively. In other words, the match delay of the NAND-type CAM is
about 4.3 times that of the NOR-type CAM. As indicated in
the background and [5], the NAND-type CAM is not a feasible
solution because of its long match delay. Fig. 10 shows the normalized match delay, where the match delays of all CAM designs
are normalized to that of the NOR-type CAM. From this figure,
we summarize the most important aspects as follows.
(1) It is clear that the search performance of our design
is better than the other word segmentation techniques.
Particularly, only the hybrid-type CAM (for certain SEG_1
lengths) has better search performance than the
conventional NOR-type CAM. Because the match line
is precharged conditionally, the search performance
of SP [5], SDW [6], and SPCAM_S [7] is worse than
that of NOR-type CAM. Note that the match delay of
SPCAM_P [7] is almost the same as that of NOR-type
CAM. This is because in SPCAM_P [7] both SEG_1
and SEG_2 are always active concurrently for high
search performance.
(2) The SEG_1 length has a significant impact on the search
performance for all word segmentation techniques except for SPCAM_S [7] and SPCAM_P [7], in which the
SEG_1 is broken into several sets of two bits to limit
the number of transistors in series to three. In our design the match delay increases with the length of SEG_1.
As shown in Fig. 3, the match line discharge relies on
TABLE II
MATCH DELAY OF ALL CAM DESIGNS
TABLE III
SEARCH POWER CONSUMPTION OF ALL CAM DESIGNS
TABLE IV
SEARCH ENERGY OF ALL CAM DESIGNS.
TABLE V
NORMAL AND PEAK POWER CONSUMPTION FOR ALL CAM DESIGNS WHERE SEG_1 LENGTH IS 4
NOR-type CAM, although the hybrid-type CAM largely amplifies the difference between peak power and normal power, it still
achieves a 6% reduction in peak power consumption.
D. Area Cost
In contrast with the conventional NOR-type CAM, our design costs nine additional transistors, all of which come from the
control circuitry. For a CAM word with 32 bits, the layout sizes
of the conventional NOR-type CAM and our design are 7.54 μm ×
131.21 μm and 7.54 μm × 141.57 μm, respectively. Note that the height of
the proposed CAM is purposely kept the same as the height
of the conventional NOR-type CAM, such that both designs
have the same power dissipated in the bit line switching. The
area overhead is roughly 7.8%. Because the CAM words are
only part of the entire CAM system, the total CAM area overhead is
less than 7.8%.
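The per-word overhead can be re-derived from the reported layout dimensions with a few lines of arithmetic. The helper name `area_overhead` is hypothetical; the dimensions are the ones quoted above.

```python
def area_overhead(base_height_um: float, base_width_um: float,
                  new_height_um: float, new_width_um: float) -> float:
    """Relative area increase of the new layout over the baseline."""
    base_area = base_height_um * base_width_um
    new_area = new_height_um * new_width_um
    return new_area / base_area - 1.0


# Layout sizes of one 32-bit word: conventional NOR-type vs. hybrid-type.
overhead = area_overhead(7.54, 131.21, 7.54, 141.57)

if __name__ == "__main__":
    print(f"per-word area overhead: {overhead:.2%}")
```

Because the two heights are equal, the result depends only on the ratio of the widths and comes out just under 8%, in line with the rough figure quoted in the text.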
VI. CONCLUSIONS
In this paper, we have developed a hybrid-type CAM design,
in which we decouple all the CAM cells from the match line,
and provide a fast path to accelerate the search operation. With
a marginal area overhead, our design not only largely reduces
the search power consumption but also improves the search performance.
REFERENCES
[1] K. Pagiamtzis and A. Sheikholeslami, "Content-addressable memory (CAM) circuits and architectures: A tutorial and survey," IEEE J. Solid-State Circuits, vol. 41, no. 3, pp. 712–727, Mar. 2006.
[2] T. Juan, T. Lang, and J. Navarro, "Reducing TLB power requirements," in Proc. Int. Symp. Low Power Electronics and Design, 1997, pp. 196–201.
[3] H. Miyatake, M. Tanaka, and Y. Mori, "A design for high-speed low-power CMOS fully parallel content-addressable memory macros," IEEE J. Solid-State Circuits, vol. 36, no. 6, pp. 956–968, Jun. 2001.
[4] I. Arsovski and A. Sheikholeslami, "A mismatch-dependent power allocation technique for match-line sensing in content-addressable memories," IEEE J. Solid-State Circuits, vol. 38, no. 11, pp. 1958–1966, Nov. 2003.
[5] C. A. Zukowski and S. Y. Wang, "Use of selective precharge for low-power content-addressable memories," in Proc. Int. Symp. Circuits and Syst., 1997, pp. 1788–1791.
[6] K. H. Cheng, C. H. Wei, and S. Y. Jiang, "Static divided word matching line for low-power content addressable memory design," in Proc. Int. Symp. Circuits and Syst., 2004, pp. 629–632.
[7] A. Efthymiou and J. D. Garside, "An adaptive serial-parallel CAM architecture for low-power cache block," in Proc. Int. Symp. Low Power Electron. and Design, 2002, pp. 136–141.
[8] K. Pagiamtzis and A. Sheikholeslami, "A low power content-addressable memory (CAM) using pipelined hierarchical search scheme," IEEE J. Solid-State Circuits, vol. 39, no. 9, pp. 1512–1519, Sep. 2004.
[9] C. S. Lin, J. C. Chang, and B. D. Liu, "A low-power precomputation-based fully parallel content addressable memory," IEEE J. Solid-State Circuits, vol. 38, no. 4, pp. 654–662, Apr. 2003.
[10] Y. J. Chang, F. Lai, and C. L. Yang, "Zero-aware asymmetric SRAM cell for reducing cache power in writing zero," IEEE Trans. Very Large Scale Integr. Syst., vol. 12, no. 8, pp. 827–836, Aug. 2004.
[11] S. P. Mohanty, N. Ranganathan, and S. K. Chappidi, "Peak power minimization through datapath scheduling," in Proc. IEEE Computer Soc. Annu. Symp. VLSI (ISVLSI), Feb. 2003, pp. 121–126.