

Energy-Efficient Precharge-Free Ternary Content Addressable Memory (TCAM) for High Search Rate Applications

Telajala Venkata Mahendra, Member, IEEE, Sheikh Wasmir Hussain, Sandeep Mishra, Member, IEEE, and Anup Dandapat, Senior Member, IEEE

Abstract—Hardware search engines (HSEs) have been drawing significant attention in replacing software search algorithms in order to speed up location access and data association in modern systems. Content addressable memory (CAM) is one of the promising HSEs due to its parallel search accessibility. However, it is subjected to considerable dissipation, which becomes severe while accessing many components, including the cells and the associated matchlines (MLs), during every search. Ternary CAM (TCAM) based routing tables, especially employed in network systems for packet classification, have put a challenge to design energy-efficient architectures with high-performance and reliable look-up operation. Precharge-free CAM schemes are preferred over precharge types to accomplish the high-speed as well as low-power goals of associative memory design. In order to overcome the drawbacks of precharge based designs and also to improve performance during the search, we introduce a precharge-free ternary content addressable memory (PF-TCAM). The proposed searching approach enhances the search rate by removing half of the ML evaluation time: it eliminates the precharge phase prior to every search and performs the search in half a clock cycle. A 32 × 16-bit proposed macro is designed using 45-nm CMOS technology, and post-layout simulations at a 1 V supply show 56% and 63% energy efficiency improvements compared to the conventional TCAM and the compact TCAM, respectively, over 25 different search keys, while increasing the evaluation speed by 50% with an area overhead of 1 transistor/cell over the compact TCAM.

Index Terms—Content addressable memory (CAM), high search speed, low-power, precharge-free ternary CAM (PF-TCAM).

Manuscript received July 6, 2019; revised November 30, 2019; accepted February 21, 2020. This work was supported in part by the Science and Engineering Research Board under Project YSS/2015/001198, in part by the Ministry of Electronics and Information Technology (MeitY) under Project SMDP-C2SD 9(1)/2014-MDD, in part by the Young Faculty Research Fellowship (YFRF), and in part by the Ministry of Human Resource Development (MHRD), Government of India. This article was recommended by Associate Editor F. J. Kurdahi. (Corresponding author: Anup Dandapat.)
Telajala Venkata Mahendra, Sheikh Wasmir Hussain, and Anup Dandapat are with the Department of Electronics and Communication Engineering, National Institute of Technology at Meghalaya, Shillong 793003, India (e-mail: [email protected]; [email protected]; [email protected]).
Sandeep Mishra is with the Department of Electronics and Communication Engineering, Indian Institute of Information Technology at Pune, Pune 412109, India (e-mail: [email protected]).
Digital Object Identifier 10.1109/TCSI.2020.2978295

I. INTRODUCTION

HIGH-SPEED searching and accessing of data are primary tasks of any device to meet faster look-up operation, but the operation becomes critical when larger information has to be searched at different instants with high throughput for various network algorithms, especially in internet protocol (IP) look-up, worm detection and data compression [1]. The number of computations and of running applications in processors among computing based appliances connected to servers is increasing due to the Internet of things (IoT) [2]. Associative memories are mostly used in these applications to reduce the number of redundant computations, as a significant amount of redundant information exists while processing applications like multimedia [2]–[5]. In residential healthcare environments, data look-up is managed by a dictionary memory based software approach [6]. The presence of a network router in these environments offers a faster hardware architecture for reliable communication between gateways in and around the home network. Threads in the home network are employed continuously for managing and passing data to/from the tag entries. Repeated utilization of the threads involves access of media access control (MAC) addresses [available in a ternary content addressable memory (TCAM) table]. TCAM based MAC tables are able to manage or pass threads to users with higher speed. The MAC address in the table acts as the key for the dictionary memory, as all the thread processing (create/update/delete) is performed in this memory [6].

Battery operated portable devices such as smart phones, laptops, personal computers and similar electronic computing devices are commonly used appliances in everyday life. Regular usage of these devices is based on a MAC table that provides connectivity from one device to another. On a broader level, between users at different locations, internet connectivity is becoming the main interface (the backbone). Internally, MAC tables comprise TCAMs as the memory module to improve the performance of table access. All these primary electronic appliances contain cache memory between the processor and the main memory to improve the access speed of task execution. There is a high probability of a cache miss in a conventional direct-mapped cache due to continuous refresh of the cache memory. In the conventional searching approach, the cache controller passes addresses of frequently searched data to the cache memory rather than to the main memory so that data access is faster.

A fully associative cache is preferred to associate locations in the main memory with the cache memory. Such an association is responsible for solving the issue of contention for memory locations. However, the serial nature of an entire cache tag search creates performance bottlenecks. So, CAM often replaces the software cache tag to avoid this issue, as a CAM-based cache performs the search in a single clock cycle [7].

A connection model of two different networks (for example, between a wired network and a wireless network) using a network bridge is illustrated in Fig. 1 to show the importance of CAM tables. Not only do the appliances have to meet daily utilization, but they also demand reliable operation within internet traffic with low-power as well as high-performance table access. This is where TCAM plays a significant role in achieving performance goals. We introduce the first precharge-free TCAM cell for meeting the performance requirements of various applications where data association is employed. The elimination of the precharge phase reduces the total number of cycles and doubles the achievable search rate of the cache while also improving energy efficiency.

II. TCAM IN NETWORK INTERCONNECT

When a user requires to browse any information from a network, the associated content of the information is searched first from the routing table and the location of the corresponding content is found later. During this type of content search and return of address, TCAMs speed up the data processing as they are addressed to the server. Besides routing and address filtering in communication networks, the searching feature of TCAMs is also used in radar signal tracking [8], pattern matching and data coding devices. Content addressable memory is a special hardware amongst embedded memories. CAM is also employed in search-intensive general public utilities for acceleration, apart from floating point units and retrieving data at run-time [9]. CAM's storage and searching functionality is also helpful in data coders such as the variable-length coding decoder and Huffman coding [10] for image processing, to transfer compressed data from input to output after a result of match [11]. The quick response of CAM operation is often utilized in digital and high definition TV broadcasting systems.

Fig. 1. Utilization of TCAM in various network applications.

Information exchange based on the content storage differentiates CAM from traditional memory such as random access memory (RAM). In a CAM, search data is compared simultaneously with all the stored words to return the address of the matched word in a single clock cycle [12], [13]. Nowadays, hardware transactional memories are prominent in enterprise and consumer electronic devices with the use of CAM as the fundamental storage [14]. Content addressable memory is classified into two types: (i) binary CAM (BCAM), which stores only bit strings of '0s' and '1s' and provides an exact match of the search; (ii) ternary CAM (TCAM), which stores an additional don't care bit denoted by 'X' and provides a partial match. The partial matching allows determining the longest prefix match out of multiple matches. These features of TCAM are very useful in the look-up table implementation of routers [15] and various network applications. A more detailed study on one of the applications is discussed in Section IV-A.
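The ternary behaviour just described (exact match in a BCAM, 'X' wildcard bits and longest-prefix selection in a TCAM) can be summarized with a short behavioral sketch. This is only an illustrative software model under assumed data structures (entries stored as strings over '0'/'1'/'X'), not the circuit itself.

```python
# Behavioral sketch of binary vs. ternary matching (illustrative only).
# A TCAM entry is modeled as a string over {'0', '1', 'X'}; 'X' matches either bit.

def tcam_entry_matches(entry, key):
    """True when every non-'X' bit of the entry equals the corresponding key bit."""
    return all(e == 'X' or e == k for e, k in zip(entry, key))

def tcam_search(table, key):
    """Hardware compares all entries in parallel; here we emulate that and
    resolve multiple matches by longest prefix (fewest trailing 'X' bits)."""
    matches = [(i, e) for i, e in enumerate(table) if tcam_entry_matches(e, key)]
    if not matches:
        return None
    return max(matches, key=lambda m: len(m[1].rstrip('X')))[0]

table = [
    "10101010XXXXXXXX",   # one entry covers a whole range of destinations
    "1010101011110000",   # fully specified entry (exact match)
]
print(tcam_search(table, "1010101011110000"))  # -> 1 (longest prefix wins)
print(tcam_search(table, "1010101000000001"))  # -> 0 (wildcard entry)
```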
Internet of things (IoT) demands hyper management of network traffic and switching to securely connect the information between various nodes by passing data in terms of packets through routers. Most of the space segments cannot afford latency between packets under processing, as the data rate in communication is on the rise. A CAM based router chooses an appropriate path, due to its faster search capability, for transferring packets without much delay between packets. Since parts of packets are identical for a range of destination addresses, the wildcard match of a ternary CAM is required to store a range of data instead of a single word in its entry. Fig. 2 represents the interconnect structure for a packet communication. The processing speed must be high at three levels: data-link, network processing and switching. The switching can be performed using a BCAM to move the packets from the input port to an output. In this context, the BCAM functions as a shared memory processor. However, the determination of the forward traffic location in the media access control comes through the continuous pattern of the datagram, so using a binary CAM for this purpose is bulky and power consuming. The masking feature of a large capacity ternary CAM is useful to accomplish parsing and classification of packets in the network processor, since both the data width (word-size) and the depth of the table are large.

The increasing number of users demands faster searching and security services in internet related prototypes. Owing to the connection based protocol in asynchronous transfer mode (ATM) switches, address translation of every ATM cell address at every point across the routing path is necessary [16]. Fig. 3 shows a single ATM cell address accommodated in two fields of a 5-Byte header. The virtual path identifier (VPI) of the header is 8 to 12 bits wide, generally mentioned as a 12-bit word, while the virtual circuit identifier (VCI) is 16 bits. Not only in ATMs but also in any application where the functionality is based on data, there are requirements of CAM for a faster response, which can replace slow software algorithms. However, the process of address retrieval from the data storage using a CAM requires two stages: (i) initially, precharging the internal address-lines to a predefined level; (ii) in the second stage, comparing a user-defined search data with the database.
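As a concrete illustration of the 5-byte header fields mentioned above, the sketch below extracts the VPI and VCI that a CAM/TCAM would use as its search key. The NNI field layout assumed here (12-bit VPI followed by a 16-bit VCI) is standard ATM framing and is an assumption for illustration, not something defined in this paper.

```python
# Illustrative parsing of a 5-byte ATM cell header (NNI format assumed):
# bits [39:28] VPI, [27:12] VCI, then PT, CLP and HEC.

def parse_atm_header(header):
    if len(header) != 5:
        raise ValueError("ATM cell header is exactly 5 bytes")
    bits = int.from_bytes(header, "big")   # 40-bit integer
    vpi = (bits >> 28) & 0xFFF             # 12-bit virtual path identifier
    vci = (bits >> 12) & 0xFFFF            # 16-bit virtual circuit identifier
    return vpi, vci

# The (VPI, VCI) pair is the key looked up at each switching point.
print(parse_atm_header(bytes([0x12, 0x34, 0x56, 0x78, 0x9A])))
```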


Fig. 2. Interconnect structure for a packet communication in network devices.

Fig. 3. Address of an ATM cell.

In particular, the initial process limits the speed of access, as a certain cycle time is accommodated for charging up the address-lines, and it also causes unnecessary dynamic power dissipation during the second stage. Most applications require faster response and less dissipation, which can be achieved with the proposed high search rate, energy-efficient precharge-free TCAM searching.

III. PRECHARGE FREE TERNARY CAM

Content-operated memory (COM) is classified into variants of CAM and associative memory (AM) on the basis of the taxonomy of address/data, applicability in suitable systems and the technology used for implementation. An overview of the COM space is listed in Fig. 4 [14]. In particular, an architecture of the TCAM based associative processor is shown in Fig. 5, where the memory (TCAM) cell acts as the core element of content storage, embedding several other necessary peripherals to allow storage as well as parallel processing. Data is written into the associative memory using the data write driver through the activation of the row decoder. The don't care bits in a TCAM provide two masking strategies, namely local and global masking [17]. The local masking method is used to mask stored data at the cell level through the data write driver. On the other hand, global masking is performed on specific search content via the search driver. During search, the logic states of the MLs change and these are either pulled up or down by the sense amplifiers to produce a full logical swing to decide a match (or) miss. Multiple matches from the sense amplifiers are further resolved by the priority encoder to get the match location based on the specific application. Utilization of associative memories (TCAMs) to implement computations is a promising method to improve energy efficiency [18]. Although CAM and AM have similar functions (return the address of matched data), they operate in different ways. An AM searches a data word by keeping restrictions on the probable location of the data based on its content, and the limitations on location vary for different AM taxonomies. For instance, direct-mapped AM follows a stringent rule of limitation such that only one location is possible for a data word. Neuromorphic AM employs the architecture of a neural network, and the associative scheme is capable of recalling data even under noisy environments. CAM bears resemblance to direct-mapped AM, especially in data management. In both memories, data update policies are essential to overwrite unwanted data, and the facility of immediate retrieval of an address for a given data is provided. But the technique for rapid search varies between them. There is only one address at which a given data word can be stored in a direct-mapped AM, because the storage location is based on the content, whereas in a CAM no restrictions are imposed on where the data has to be stored. Each entry in the CAM consists of search hardware that is activated in parallel during evaluation. The data stored in every entry is compared simultaneously with the search pattern and the address of the matched entry is returned.
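The contrast drawn above between a direct-mapped AM (one content-derived location per data word) and a CAM (any location, with every entry compared during a search) can be sketched in a few lines. The hash-based indexing used for the direct-mapped case is only an assumption for illustration.

```python
# Illustrative contrast between direct-mapped AM and CAM style look-up.

class DirectMappedAM:
    """One possible location per data word, derived from the content itself."""
    def __init__(self, size):
        self.size = size
        self.slots = [None] * size

    def _index(self, data):
        return hash(data) % self.size          # content decides the location

    def store(self, data):
        self.slots[self._index(data)] = data   # may overwrite an older entry

    def lookup(self, data):
        i = self._index(data)
        return i if self.slots[i] == data else None

class CAM:
    """No placement restriction; every entry is compared during a search."""
    def __init__(self):
        self.entries = []

    def store(self, data):
        self.entries.append(data)              # any free location will do

    def lookup(self, data):
        # In hardware all comparisons happen in parallel in one cycle;
        # this loop only emulates that behaviour.
        for addr, stored in enumerate(self.entries):
            if stored == data:
                return addr
        return None
```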
A. Background and Motivation

Several popular TCAM cells and various architectural strategies to improve their performance are available in the literature [12], [17], [19]–[22], but the requirement of matchline (ML) precharging during every evaluation limits the memory performance. NAND-ML and NOR-ML are the two basic ML structures [12] used to form a CAM word. The NOR-ML trades power for high speed, while the NAND-ML operates with low power at lower speed. In addition, numerous circuit and architecture level techniques have been developed. A hybrid ML is possible with the combination of NAND and NOR [23]. In this hybrid ML structure, stored data is compared with low swing search data to reduce both ML power and SL power, while the segmented NOR-ML offers selective ML evaluation [24]. With the help of a voltage down converter, precharging the MLs to less than the core supply also offers a low swing ML transition to decrease ML power [25]. Partitioning the NAND-ML into two segments of different sizes allows application of the reordered overlapped search (ROS) mechanism to filter out mismatch entries and improve the energy efficiency of search [26].


Fig. 4. Overview of the content-operated memory (COM) space.

The continuous pattern feature of mask data is utilized to achieve storage efficiency and low power in an IPv6 based TCAM through a data relocation scheme, by relocating the data present in the prefix region [27]. Mask data is also used to optimize NAND and NOR cells in a hybrid-pipelined TCAM with the control of a self power gating technique to reduce ML power [28]. In a low cost TCAM [29], an ML gating and boosting technique eliminates redundant ML discharge and SL switching to improve power efficiency with the use of an adaptive ML discharging scheme. The half-and-half compare CAM (HHC-CAM) splits an ML to filter out entries in a precharge-based first side comparison with a pre-search and operates the other side comparison with a charge sharing mechanism [30]. Don't care reduction reduces the number of storage cells to minimize area and decrease the search power of a TCAM [31] by using bypass transistors and extra decoders. A configurable TCAM based on push-rule 6T SRAM bit cells is used to achieve area-energy efficiency in numerous search applications [32]. This configurable memory is used to perform some logic operations among two or more words and also to off-load computations to the memory. A compact cell with a split-control single load lowers energy and also reduces TCAM leakage at the cost of a sense amplifier based on a triple voltage margin [33]. A two-sided self gating scheme reduces the power of mask cells in a TCAM [34]. The mask cells are partitioned into various segments, and the mask bits of each segment except the boundary segment are kept the same. In this TCAM architecture, the boundary segment is activated while the rest of the segments are deactivated, consequently reducing leakage power dissipation. The butterfly ML combines architecture and circuit co-design to increase energy efficiency, and a hierarchical SL reduces search time [35]. An early termination of precharge mechanism for low power TCAM is employed to save power by minimizing unnecessary ML precharging [36].

Fig. 5. Architecture of an associative processor.

The precharging of MLs prior to every search is mandatory in all the above discussed schemes. The functionality of the multiple access single-charge (MASC) TCAM [37] is based on two primary techniques to save power: 1) the cycle duration of the precharge phase is controlled for selected TCAM blocks; 2) only those matchlines corresponding to a HIT (match) are discharged. The remaining, yet dominant, number of matchlines resulting in a MISS (mismatch) are not discharged over multiple searches, which makes this architecture energy efficient. With the help of a small control circuit, the TCAM blocks are selectively precharged after four search cycles instead of precharging prior to every search. However, this design requires a precharge cycle while precharging that particular matched ML, which takes HALF a clock cycle. In the MASC-TCAM, switching activity is minimized but the requirement of matchline precharge still exists. In this paper, we introduce a precharge free TCAM by eliminating the unnecessary matchline precharge prior to search which is present in existing works. The proposed TCAM performs search in HALF a clock cycle, as it does not require the matchlines to be precharged at any point of time, whereas existing works on TCAM perform search in one or more clock cycles.

B. Precharge Free Searching

The possibility of completely excluding ML precharge motivated us to propose a higher search rate TCAM cell with improvements in the energy metric, as it is a significant parameter to be examined in low-power VLSI design [38].


Recently, precharge free CAM works have been presented in [39]–[41], but these only provide binary association. The work presented in [39] is a self controlled precharge free CAM based on static storage and it does not have the ternary approach either. That design also has more short circuit current paths due to the diode structure formed during matchline evaluation [41]. The work in [40] presents a precharge free dynamic CAM based on a 4T dynamic storage and it does not provide bit level masking. Local masking is required in many applications, including forwarding tables in network routing. The dynamic CAM also requires extra refresh circuits (leading to search and area overhead). With the ternary approach, a CAM can perform association with a range of data words, which provides the benefit of associating more possible data, as many applications require this type of range-of-words association too (it is not possible with binary CAM designs). The key contributions are:

• The proposed ternary CAM design is the first design which performs cell level masking along with global masking without precharging the ML prior to every search. The proposed design performs search in HALF a clock cycle, whereas existing designs perform search in one or more clock cycles.
• A complete ternary approach is introduced that can provide masking for a variety of applications. For masking, local and global mask drivers are not required in the proposed design because it is done through the data write/content search drivers: this minimizes the design area and decreases power in the overall system.
• The proposed work reduces the evaluation time by 50% compared to existing works on TCAM, resulting in a high search rate engine.
• Multiple search analysis and the Monte-Carlo (MC) sampling method have been carried out on the proposed design to test its robustness besides providing the performance improvements.

There is a rapid increase in the amount as well as the rate of information exchange to accommodate both correlated and non-correlated contents. The specific evaluation task of a CAM involves the precharge phase. The unwanted search overhead raised by this phase can be eliminated using the precharge free structure illustrated through the state timing in Fig. 6. More searches can be performed in the same evaluation time by accommodating another search in place of the unwanted precharge cycle of the CAM evaluation operation, as the proposed design performs search in HALF a clock cycle whereas existing works on TCAM perform search in a single clock cycle. The proposed cell uses two independent latches for storing three different possible data values (ternary). Weak latches are used in this regard to minimize the cell leakage in a structure that encounters no functionality degradation.

Fig. 6. Timing diagram. (a) Existing works on TCAM. (b) Proposed scheme.

C. TCAM Cell

The nodes (Q1 and Q2) are used for charge storage; they store complementary data when the written information is 0 (or) 1. The soft storage nodes (S1 and S2) help in maintaining the charge at Q1 and Q2. Since the soft storage nodes are charged through the pull-up transistors (M3 and M10), in either state of the written value they do not change their own state over time. The transistor pair M11 and M12 performs the comparison (XOR) operation among (Q1, Q2) and the searchlines (SL, SLB). Whenever the search key matches the stored contents, the searchlines pass a LOW logic to the decision node (D) through transistor M11 (or) M12, which turns ON transistor M13 to pass the previous cell ML state (MLI). Otherwise, M14 passes a LOW logic to the matchline. Logic '1' on the ML represents a match state whereas '0' represents a mismatch state. The introduced ternary structure of the PF-TCAM adds value to suit bit-level local masking and search enabled global masking, which are as follows:

• Local masking: When a bit is required to always produce a match (bit-level masking), nodes Q1 and Q2 are charged to the LOW state, which turns OFF transistors M11 and M12 to avoid the XOR comparison operation. Simultaneously, the local masking pair (M6 and M7) turns ON to pass a LOW logic. This forces the matchline state to match.
• Global masking: A global masking is achieved by pulling the searchline pair (SL and SLB) down in the entire memory of that column. It allows the decision node to discharge through either searchline pass transistor (M11 or M12), since one of the storage nodes is HIGH. The one and only exception occurs when both storage nodes are discharged (Q1 = Q2 = 0); this is a likely event but only when the cell is masked locally. In either scenario, the input MLI is passed to the subsequent cell for decision.
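The cell behaviour just described (a matching or masked bit passes the incoming MLI; a mismatching unmasked bit pulls the ML low) can be captured in a small logic-level model of the NAND-type ML chain. The encodings below are assumptions taken from the text, with Q1 = Q2 = 0 meaning a locally masked ('X') cell and SL = SLB = 0 meaning a globally masked column; whether the matching condition is straight or crossed depends on the actual cell wiring, so a straight encoding is assumed here.

```python
# Logic-level sketch of one PF-TCAM word built from NAND-type cells.
# Stored bit -> (Q1, Q2), with (0, 0) = local mask 'X';
# search bit -> (SL, SLB), with (0, 0) = global mask of that column.

def cell_ml_out(q1, q2, sl, slb, mli):
    locally_masked = (q1 == 0 and q2 == 0)      # M6/M7 force a match
    globally_masked = (sl == 0 and slb == 0)    # SL pair pulled down for the column
    bit_match = (q1, q2) == (sl, slb)           # XOR compare via M11/M12 (assumed encoding)
    if locally_masked or globally_masked or bit_match:
        return mli                              # M13 passes the previous ML state
    return 0                                    # M14 pulls the matchline low

def word_search(stored_bits, search_bits, mli=1):
    """Propagate the ML through the cell chain; 1 = match, 0 = mismatch."""
    ml = mli
    for (q1, q2), (sl, slb) in zip(stored_bits, search_bits):
        ml = cell_ml_out(q1, q2, sl, slb, ml)
    return ml

stored = [(1, 0), (0, 1), (0, 0)]                     # '1', '0', 'X' (locally masked)
print(word_search(stored, [(1, 0), (0, 1), (0, 1)]))  # 1: third bit is masked
print(word_search(stored, [(0, 1), (0, 1), (1, 0)]))  # 0: first bit mismatches
```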


Fig. 7. Proposed precharge free ternary content addressable memory.

TABLE I. Functional table of the proposed PF-TCAM (M: Match; MM: Mismatch; LM: Local Masking; GM: Global Masking).

The proposed cell is designed using a NAND-type ML structure (Fig. 7) and operates well under all possible conditions. The functional table of the precharge-free TCAM (PF-TCAM) operation is summarized in Table I. States 1-4 represent regular match and mismatch, while states 5 and 6 present the local and global masking, respectively.

D. Matchline State Variation

A change in the voltage level of the matchline in precharge based designs is inevitable whenever a match occurs, irrespective of its previous state, since the MLs are precharged by default prior to every search. However, the proposed mechanism does not require the matchlines to be precharged before every search. Thus, an ML state variation (HIGH to LOW or LOW to HIGH) only depends upon the ML state of the previous evaluation result. The proposed mechanism is explained with the consideration of a single entry: "A mismatch in the first search discharges the corresponding ML to GND, and a match in the second search leads the same ML to charge up again." If consecutive search keys are matched or mismatched with the same entry, then it remains at its ML state (no switching activity in the matchline).

In many applications, a mismatch of a search key with most of the stored entries in the associative memory is most likely. Therefore, in search intensive applications, the switching speed to decide a match state (i.e. from mismatch to match) is very important. This is demonstrated in Fig. 8 with examples of matched and mismatched searches against stored entries.

Fig. 8. Functionality of TCAMs with an illustration of an 8 × 8 array.

The ML state of entry '8' has remained HIGH for consecutive searches in the proposed scheme. The same ML would change its state every time an evaluation is performed in precharge based designs. For N searches, the minimum evaluation time required in existing precharge based TCAMs is:

T_total = N × (T_PRE + T_search)   (1)

But in the proposed design, the evaluation time reduces to:

T_total = N × T_search   (2)

As an example, consider N = 100 searches with a cycle time of 20 ns (T_PRE = T_search = 20 ns). The total evaluation time required by the existing precharge based TCAM designs is T_total = 100 × (20 ns + 20 ns) = 4000 ns, whereas the evaluation time required by the proposed TCAM design is T_total = 100 × 20 ns = 2000 ns.
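The evaluation-time comparison of Eqs. (1) and (2), together with the switching rule described in this subsection (the ML toggles only when consecutive results for an entry differ), can be checked with a short script. The numbers below simply re-evaluate the worked example under Eqs. (1) and (2); they are not additional measured results.

```python
# Evaluation time per Eqs. (1) and (2), and ML switching count for one entry.

def eval_time_precharge(n_searches, t_pre, t_search):
    return n_searches * (t_pre + t_search)          # Eq. (1)

def eval_time_precharge_free(n_searches, t_search):
    return n_searches * t_search                    # Eq. (2)

def ml_transitions(results):
    """A precharge-free ML switches only when consecutive match results differ."""
    return sum(1 for prev, cur in zip(results, results[1:]) if prev != cur)

N, T_PRE, T_SEARCH = 100, 20e-9, 20e-9
print(eval_time_precharge(N, T_PRE, T_SEARCH))      # 4.0e-06 s (4000 ns)
print(eval_time_precharge_free(N, T_SEARCH))        # 2.0e-06 s (2000 ns)

# Entry '8' in Fig. 8 stays matched over consecutive searches -> no ML activity.
print(ml_transitions([1, 1, 1, 1, 1, 1]))           # 0 transitions
print(ml_transitions([0, 1, 0, 1, 0, 1]))           # 5 transitions (worst case)
```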
IV. RESULTS AND ANALYSIS

The performance parameters of the proposed PF-TCAM are analyzed through random 16-bit search vectors on a 32 × 16-bit macro designed using the Generic Process Design Kit (GPDK) 45-nm CMOS technology. In order to verify the performance efficiency, post-layout simulations of the proposed TCAM have been compared with the same simulations of the conventional 16T TCAM [17] and a compact 13T TCAM [20] of the same macro size at a 1 V supply under room temperature. The proposed TCAM can perform search with the search cycle requirement reduced by half while achieving better energy efficiency, with an area overhead of 1 transistor per TCAM cell compared to the compact TCAM and with no necessity of any extra control circuits to remove ML precharge.


The results, discussions and comparisons presented in the paper are for the same macro size; all the designs are implemented with the same size and the same matchline type (NAND). The proposed design produces the ML state in 4.38 ns, while the conventional and compact designs take (TPRE + 13.93 ns) and (TPRE + 5.79 ns), respectively, in a single search. The analysis is carried out with multiple search vectors, as memory design becomes crucial in a system on chip to achieve higher yield and low power [42]. The performance comparison has also been made under process-voltage-temperature (PVT) variations and operating frequency versus supply nodes. In addition, Monte-Carlo (MC) sampling is carried out to verify the working of the design under critical conditions.

A. Network Routing: A Case Study

One of the important applications of network routing (such as IPv4) is packet forwarding based on the destination address. IPv4 uses 32-bit addresses without class-less inter-domain routing. A case study of the same applied to a home network is presented. A network with a strength of 23 clients needs at least 24 nodes (with one root). An additional 8 nodes are provided for future extension. The IP range is from 10.10.1.55 to 10.10.1.86, and the root node is assigned 10.10.1.68. When the host wants to add another client to its root list for full administration privileges, it can set all the protocols to null (which were originally assigned to the client user). The complete distribution is implemented in the TCAM structure, which has a local masking feature at the bit level of storage. The client to be added to the root list is 10.10.1.64 (binary representation 00001010.00001010.00000001.01000000). Simply masking the 3rd bit from the right can provide all root privileges to the selected client. This functionality is tested by first providing the search key 10.10.1.64, then twice the key 10.10.1.68; during the third search, the third bit of the stored 10.10.1.64 client entry is masked. The responses are plotted in Fig. 9. During the third search, the evaluated result for the root is found to be exactly the same as that of the client. One basic advantage of using TCAM can clearly be identified in this application. Further benefits in association can also be achieved with simple control at very low overhead. A precharge free scheme adds one more search in the same duration.

Fig. 9. Evaluation results of the selected nodes discussed in Subsection IV-A.
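The effect of masking the 3rd bit from the right in this case study can be reproduced with the ternary-match model used earlier. The helper below only illustrates the look-up behaviour with the addresses taken from the case study; it does not model the router itself.

```python
# Case-study check: masking bit 2 (3rd from the right) of the stored entry
# 10.10.1.64 makes the key 10.10.1.68 hit that entry as well.

def ip_to_bits(ip):
    return ''.join(f"{int(octet):08b}" for octet in ip.split('.'))

def matches(entry, key):
    return all(e == 'X' or e == k for e, k in zip(entry, key))

client = ip_to_bits("10.10.1.64")
root   = ip_to_bits("10.10.1.68")

print(matches(client, root))            # False: 64 != 68 without masking

# Local masking of the 3rd bit from the right (bit value 4) in the stored entry.
masked_client = client[:-3] + 'X' + client[-2:]
print(matches(masked_client, root))     # True: the root key now hits the entry
print(matches(masked_client, client))   # True: the original key still hits
```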
B. Multi-Search Analysis

In general, increasing the number of searches increases the evaluation time required for completing the look-up. Interestingly, there is a saving of 50% evaluation time in the proposed TCAM due to the absence of precharge [Fig. 10(a)]. Observation from Fig. 10(b) shows that the average power decreases over the span of multiple searches in the case of the PF-TCAM, while it increases in the precharge based TCAMs. Compared to the conventional and compact TCAMs, power reductions of 67.77% and 73.83%, respectively, at 100 searches prove that the PF-TCAM is a power saving architecture for search intensive applications. It is to be noted that the energy dissipation contributed by the precharge phase is significantly high in both the existing TCAMs [Fig. 10(c)]. In the proposed PF-TCAM, the reduction in energy reaches larger improvements of 83.80% to 86.85% over the conventional and compact TCAMs.

C. Process Corner Variation

Functionality issues of the compared designs have been verified at empirical process corners and the performance metrics are provided in Table II. The conventional [17] and compact [20] TCAMs function satisfactorily at the fast corners (FF and FS) with delay variations of 47% to 68% compared to the typical corner (TT). The compact TCAM is over 50% faster than the conventional TCAM at the typical and fast corners, but both these designs are vulnerable at the slow corners (SF and SS). In both the typical and the fastest (FF) corners, the proposed TCAM competently improves the delay (match time) by getting rid of the precharge time (TPRE) and by the fast discharge of the matchline from write to search states. A negligible delay variation of 0.23% in the SF corner with respect to TT is of interest to note in the proposed design. Although the write power is comparable among all designs (of the order of microwatts), improvements can be inferred from the power consumed during evaluation. Exclusion of precharge from the evaluation phase in the proposed scheme makes it more preferable against the other designs in terms of dissipation at the operable process corners. On the contrary, the conventional and compact storage schemes operate with the necessity of an ML precharge that has to be altered during search, which leads to an increased amount of energy dissipation during evaluation. Also, the compact TCAM dissipates at least 13.81% more power than the conventional TCAM. At the worst-case power corner (FF), the evaluation power of the proposed TCAM is almost two times and three times more efficient than that of the conventional and compact TCAMs, respectively. This leads to 44.91% and 64.16% improvements in total energy dissipation. The write power dissipation of the proposed design is higher than that of the compared designs because of transistors M6 and M7; one of these transistors turns ON during write. However, in a TCAM design, the evaluation power dominates the write power, as write operations are few while search operations are carried out in millions [once data is stored in the TCAM, the probability of rewriting the data is very low, but searching the stored data is a repeated process]. So, the advantage of reduced evaluation power outweighs the disadvantage of the small increment in power during write.


Fig. 10. Multi search analysis at cycle time of 100ns. (a) Evaluation time requirement. (b) Average power. (c) Energy dissipation.

TABLE II. Power performance comparison at various process corners over 25 searches (TPRE: precharge cycle time).

In Table II, a TCAM operation is considered non-functional (–) according to the following ML discharge time (DT) relation:

[ML_DT]_any corner ≥ 2 × [ML_DT]_TT   (3)

D. Temperature Variation

In order to demonstrate the performance improvements of the proposed scheme over the existing precharge based schemes [17] and [20], a temperature analysis is carried out on the compared designs and plotted in Fig. 11. During ML evaluation, all MLs except one change their states between precharge and search in conventional TCAMs, leading to a high and almost equal amount of peak power in these TCAMs. The absence of the precharge phase in the proposed TCAM simplifies the evaluation and hence the peak power reduces, as shown in Fig. 11(a). At low temperatures (-35 °C, -15 °C), the static power of the proposed design is less than that of the other two designs. The effect of the leaky (soft) storage in the proposed and compact TCAMs is significantly observed at the higher range of Fig. 11(b), as the static power increases compared to the conventional TCAM. The leakage current also increases in exponential proportion with temperature at a fixed threshold voltage (Vt) [43], [44]. Static power is defined as the product of the sub-threshold leakage current (IDsub) and the supply voltage. The exponential increment in IDsub, in turn, increases the static power rapidly despite supply scaling. A gradual reduction in Vt due to scaling leads to a significant amount of static power. As Vt decreases, IDsub increases exponentially according to the current equation:

IDsub = k · e^(−q·Vt / (a·Kb·T))   (4)

where 'q' and 'Kb' are physical constants, 'a' and 'k' are device parameters, and 'T' is the absolute temperature.

According to the above discussion, the static power increases with temperature, which can be observed in the compared designs as well as the proposed design. The static power is HIGH in the compact TCAM [20] and the proposed TCAM at high temperatures because these designs use leaky storage (soft storage nodes), while the conventional TCAM uses static storage [strong storage nodes due to the utilization of back-to-back inverters]. By increasing the temperature: (1) the threshold voltage of the transistors decreases, which increases the drain current; (2) the gate leakage current increases; (3) the sub-threshold current also increases. Multiplying all these increased currents by the relevant supply voltage (or the input voltage from which the current flows) gives the increased static power dissipation, i.e., P = Σ(I) × VDD. However, the total energy dissipation varies by only 4-5% and reduces enormously in the proposed scheme because of the minimization of ML switching activity. Taking all the temperature nodes of Fig. 11(c) into consideration, the proposed design is validated to be an energy-efficient TCAM, showing total energy improvements of 48.67% to 56.58% over the conventional and compact TCAMs.
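Eq. (4) can be evaluated numerically to see the trends discussed above (leakage rising with temperature and with reduced threshold voltage). The parameter values used below are arbitrary placeholders chosen only to show the shape of the dependence; they are not extracted from the designs in this paper.

```python
import math

# Sub-threshold leakage per Eq. (4): IDsub = k * exp(-q*Vt / (a*Kb*T)).
Q  = 1.602e-19      # electron charge (C)
KB = 1.381e-23      # Boltzmann constant (J/K)

def subthreshold_leakage(vt, temp_k, k=1e-6, a=1.5):
    """k and a are device-dependent fitting parameters (placeholder values)."""
    return k * math.exp(-Q * vt / (a * KB * temp_k))

for temp_c in (-35, -15, 25, 75, 125):
    i_leak = subthreshold_leakage(vt=0.45, temp_k=temp_c + 273.15)
    print(f"{temp_c:>4} degC : IDsub ~ {i_leak:.3e} A")   # rises with temperature

# Lowering Vt at a fixed temperature also raises the leakage exponentially.
print(subthreshold_leakage(vt=0.35, temp_k=298.15) /
      subthreshold_leakage(vt=0.45, temp_k=298.15))
```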
E. Supply Voltage Scaling

A performance comparison of the compared TCAMs is summarized in Table III over 25 searches at a 1 V supply. Besides determining the performance behavior of the TCAM under temperature and process variations, estimating the power savings against supply voltage scaling is another important concern.


Fig. 11. Temperature variation in 25 number of repetitive searches. (a) Peak power. (b) Static power. (c) Energy.

Fig. 12. Supply voltage scaling. (a) Static power. (b) Peak power. (c) Evaluation power.

TABLE III. Performance comparison with other works over 25 searches.

As shown in Fig. 12, the conventional TCAM indicates lower adaptability to supplies below 1 V, and the static power is almost the same in both the conventional and compact TCAMs. Peak ML power, defined as the maximum dissipation during CAM operation, is also an important consideration because of the supply constraint [45]. Compared to the compact TCAM, the proposed design reduces the peak power by up to 51% on average over all voltage nodes. From Fig. 12(b), it can be observed that the peak powers of the compared TCAMs are approximately the same (with a small difference), whereas it is lower in the proposed design due to the continuous search mechanism without precharging the matchlines. Evaluation power, the largest among the ML dissipations, is also inspected at each supply to emphasize the ML power differences. The conventional as well as the compact TCAM dissipate a large amount of power during evaluation; this results from the frequent switching imposed by the precharge phases that divide the search phases between multiple evaluations. Besides the improvement of search rate by the elimination of the precharge phase in the proposed TCAM, the precharge free ML scheme also has a positive influence in reducing the evaluation power. Between 1.2 V and 1 V, the optimization of evaluation power reaches a reduction of 39% to 56% against the conventional design. Even at a low supply of 0.9 V, the proposed TCAM is three times more power efficient than the compact TCAM. The power dissipation during evaluation is less in the proposed design across the supply variation because the switching activity from one search to another is very low, as the proposed design operates without precharge during every search. As the supply increases, the charging/discharging current also increases, leading to more power dissipation.


TABLE IV. Maximum clock frequency vs supply voltage (VDD).

TABLE V. Array size variation (M: Match; MM: Mismatch).

Fig. 13. SPICE waveform of the 8 × 8 memory according to the data entries in Fig. 8 over 6 searches.

Fig. 14. Matchline swing of the mismatch state by the Monte-Carlo sampling method for a search over 1000 MC runs.

From Fig. 12(c), we can see that the increase in the evaluation power of the proposed design from 0.9 V to 1.0 V, from 1.0 V to 1.1 V, and from 1.1 V to 1.2 V is approximately the same, and observing from 1.0 V to 1.2 V, the power increment is doubled according to the above mentioned increment scenario. This type of increment is also observed in the compared designs. However, the overall power dissipation of the proposed design is better than that of the compared designs at the various supply nodes.

F. Clock Frequency Versus Supply Voltage

While a CAM is performing its operation, it should be synchronous with the clock frequency to ensure a high search rate. The maximum permissible frequencies of the compared designs are tabulated in Table IV against a supply voltage sweep starting from 0.8 V in steps of 0.05 V. At a 1 V supply, the conventional TCAM ensures a precise search result under 71 MHz. The compact TCAM is more reliable than the conventional design, as its maximum clock frequency is increased twice at >1.0 V supply and thrice at <1.0 V supply. The proposed TCAM not only accommodates additional searches in place of precharges but is also highly frequency sustainable. At the corresponding supply nodes, the increase in the clock frequency of the proposed TCAM is more than 100% over the maximum allowable frequencies of the conventional TCAM. In comparison with the compact TCAM, the proposed design is well suited at higher frequencies, providing an increment of 30% to 52%. The proposed TCAM also sustains a high frequency of 861 MHz without failure at a high supply of 1.2 V. As the designs are implemented using a NAND-type ML, the memory performs search well at low to moderate frequencies while operating at low supply (below 1.0 V) and larger word lengths. However, it can be operated at high frequencies when the circuit is operating at a higher supply and medium word lengths. The supply voltage versus frequency range is tabulated in Table IV.

G. Impact of Matchline State Variation

Fig. 13 shows the advantages of the proposed scheme: not only a larger number of possible searches but also fewer state variations in the ML. The worst-case delay and the related power metrics are given in Table V for different word lengths. The representation of delay in terms of a time interval shows that the delay lies in a near range for any search information. However, the data length can be varied (increased or decreased) according to the requirement. The SPICE waveforms presented in Fig. 13 are obtained from an 8 × 8 memory over 6 searches according to the entries presented in Fig. 8. From Fig. 13(a), it can be stated that existing works require twice the time for obtaining the ML state compared to the proposed design [Fig. 13(b)]. In addition, the PF-TCAM is tested with random variation using the Monte-Carlo (MC) method over 1000 samples to verify the robustness of the design.


TABLE VI. Feature comparison summary with recent works (MLPR: matchline precharge requirement).

Fig. 14 shows the sharp ML discharge rate variation. The lower spread in the scattergram (average power versus ML delay) in Fig. 15 proves stable performance with few scattered points. The average power over 1000 MC runs for two words (a match word and a mismatch word) is shown in Fig. 16, with recorded minimum and maximum of 155.6 nW and 734.9 nW, respectively.

Fig. 15. Scattergram of the average power versus matchline delay of a search over 1000 runs by the Monte-Carlo sampling method.

Fig. 16. Histogram of average power dissipation over 1000 MC runs.

The performance comparison of the proposed PF-TCAM and recently reported works is summarized in Table VI. The operating frequency of the proposed design is lower than that of the recently presented TCAM [36], as the proposed design is implemented using a NAND-ML whereas the design in [36] is implemented using a NOR-ML. Among these designs, the proposed design is the only TCAM which operates without ML precharge. The normalized energy (EfSN) calculation defined in [35] is used for a legitimate comparison:

EfSN = EfS × (45 nm / Technology) × (1 / VDD)²   (5)

The energy metric is normalized to 45 nm/1.0 V according to equation (5). The proposed TCAM shows better energy efficiency, and more improvements in energy can be achieved while performing a larger number of searches. The proposed TCAM can be operated up to 861 MHz at a 1.2 V supply.
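Eq. (5) is straightforward to apply when comparing designs reported at different technology nodes and supply voltages. The snippet below only demonstrates the normalization itself; the sample numbers are placeholders, not values from Table VI.

```python
# Normalized search energy per Eq. (5): EfSN = EfS * (45 nm / tech) * (1 / VDD)^2.

def normalized_energy(efs_fj_per_bit_search, tech_nm, vdd):
    """Normalize a reported energy figure to the 45 nm / 1.0 V reference."""
    return efs_fj_per_bit_search * (45.0 / tech_nm) * (1.0 / vdd) ** 2

# Placeholder examples: a 65 nm / 1.2 V design vs. a 28 nm / 0.9 V design.
print(normalized_energy(0.50, tech_nm=65, vdd=1.2))   # ~0.24 fJ/bit/search
print(normalized_energy(0.30, tech_nm=28, vdd=0.9))   # ~0.60 fJ/bit/search
```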

V. CONCLUSION

This paper introduces a precharge-free searching approach in ternary CAM as an alternative solution to precharge type TCAMs. The absence of a precharge cycle in the proposed precharge-free TCAM (PF-TCAM) reduces the evaluation time by 50%, which can be enormously useful in various applications involving search, association and computation. The introduced TCAM performs search in HALF a clock cycle, while existing TCAM designs perform search in a single clock cycle. The dependency of the matchline state variation only on the previous evaluation makes the search performance of the PF-TCAM faster, while the reduction of ML switching activity lowers the energy dissipation. Unlike the conventional schemes, the search frequency is also increased with this precharge free scheme. The proposed work has been compared with the conventional and compact TCAM designs to prove the advantage of eliminating the precharge phase that is otherwise present in all existing TCAMs. The switching activities in the matchlines are low irrespective of the search vectors. The PF-TCAM design is also found to be stable over variation even without fully cross-coupled storage.


The performance parameters showed that it is a good alternative to make faster and energy efficient associative memory for various modern applications involving high speed search.

REFERENCES

[1] B. Agrawal and T. Sherwood, "Ternary CAM power and delay model: Extensions and uses," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 16, no. 5, pp. 554–564, May 2008.
[2] M. Imani, P. Mercati, and T. Rosing, "ReMAM: Low energy resistive multi-stage associative memory for energy efficient computing," in Proc. 17th Int. Symp. Qual. Electron. Design (ISQED), Mar. 2016, pp. 101–106.
[3] C. Ji, Y. Li, W. Qiu, U. Awada, and K. Li, "Big data processing in cloud computing environments," in Proc. IEEE Int. Symp. Pervasive Syst., Algorithms Netw. (ISPAN), Dec. 2012, pp. 17–23.
[4] A. Katal, M. Wazid, and R. H. Goudar, "Big data: Issues, challenges, tools and good practices," in Proc. 6th Int. Conf. Contemp. Comput. (IC3), Aug. 2013, pp. 404–409.
[5] M. Imani, A. Rahimi, and T. S. Rosing, "Resistive configurable associative memory for approximate computing," in Proc. Design, Autom. Test Eur. Conf. Exhib. (DATE), 2016, pp. 1327–1332.
[6] R. S. Sherratt, B. Janko, T. Hui, W. Harwin, and D. Diaz-Sanchez, "Dictionary memory based software architecture for distributed Bluetooth low energy host controllers enabling high coverage in consumer residential healthcare environments," in Proc. IEEE Int. Conf. Consum. Electron. (ICCE), 2017, pp. 406–407.
[7] S. Mishra, T. V. Mahendra, and A. Dandapat, "A 9-T 833-MHz 1.72-fJ/bit/search quasi-static ternary fully associative cache tag with selective matchline evaluation for wire speed applications," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 63, no. 11, pp. 1910–1920, Nov. 2016.
[8] W. Xingdong, Y. Songyu, and L. Longfei, "Implementation of MPEG-2 transport stream remultiplexer for DTV broadcasting," IEEE Trans. Consum. Electron., vol. 48, no. 2, pp. 329–334, May 2002.
[9] M. Imani, D. Peroni, and T. Rosing, "NVALT: Nonvolatile approximate lookup table for GPU acceleration," IEEE Embedded Syst. Lett., vol. 10, no. 1, pp. 14–17, Mar. 2018.
[10] L.-Y. Liu, J.-F. Wang, R.-J. Wang, and J.-Y. Lee, "CAM-based VLSI architectures for dynamic Huffman coding," IEEE Trans. Consum. Electron., vol. 40, no. 3, pp. 282–289, Aug. 1994.
[11] C.-T. Hsieh and S. P. Kim, "A concurrent memory-efficient VLC decoder for MPEG applications," IEEE Trans. Consum. Electron., vol. 42, no. 3, pp. 439–446, Aug. 1996.
[12] K. Pagiamtzis and A. Sheikholeslami, "Content-addressable memory (CAM) circuits and architectures: A tutorial and survey," IEEE J. Solid-State Circuits, vol. 41, no. 3, pp. 712–727, Mar. 2006.
[13] K.-L. Tsai, Y.-J. Chang, and Y.-C. Cheng, "Automatic charge balancing content addressable memory with self-control mechanism," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 61, no. 10, pp. 2834–2841, Oct. 2014.
[14] R. Karam, R. Puri, S. Ghosh, and S. Bhunia, "Emerging trends in design and applications of memory-based computing and content-addressable memories," Proc. IEEE, vol. 103, no. 8, pp. 1311–1330, Aug. 2015.
[15] Z. Cai, Z. Wang, K. Zheng, and J. Cao, "A distributed TCAM coprocessor architecture for integrated longest prefix matching, policy filtering, and content filtering," IEEE Trans. Comput., vol. 62, no. 3, pp. 417–427, Mar. 2013.
[16] M. Defossez, "Content addressable memory (CAM) in ATM applications," Xilinx Appl. Note, Jan. 2001.
[17] N. Mohan, "Low-power high-performance ternary content addressable memory circuits," Ph.D. dissertation, Dept. ECE, Univ. Waterloo, Waterloo, ON, Canada, 2006.
[18] A. Ghofrani, A. Rahimi, M. A. Lastras-Montano, L. Benini, R. K. Gupta, and K.-T. Cheng, "Associative memristive memory for approximate computing in GPUs," IEEE J. Emerg. Sel. Topics Circuits Syst., vol. 6, no. 2, pp. 222–234, Jun. 2016.
[19] A. Igor, C. Trevis, and A. Sheikholeslami, "A ternary content-addressable memory (TCAM) based on 4T static storage and including a current-race sensing scheme," IEEE J. Solid-State Circuits, vol. 38, no. 1, pp. 155–158, Jan. 2003.
[20] H.-Y. Li, C.-C. Chen, J.-S. Wang, and C. Yeh, "An AND-type match-line scheme for high-performance energy-efficient content addressable memories," IEEE J. Solid-State Circuits, vol. 41, no. 5, pp. 1108–1119, May 2006.
[21] C.-C. Wang, J.-S. Wang, and C. Yeh, "High-speed and low-power design techniques for TCAM macros," IEEE J. Solid-State Circuits, vol. 43, no. 2, pp. 530–540, Feb. 2008.
[22] S. Mishra, T. V. Mahendra, J. Saikia, and A. Dandapat, "A low-overhead dynamic TCAM with pipelined read-restore refresh scheme," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 65, no. 5, pp. 1591–1601, May 2018.
[23] B.-D. Yang, Y.-K. Lee, S.-W. Sung, J.-J. Min, J.-M. Oh, and H.-J. Kang, "A low power content addressable memory using low swing search lines," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 58, no. 12, pp. 2849–2858, Dec. 2011.
[24] S. W. Hussain, T. V. Mahendra, S. Mishra, and A. Dandapat, "Match-line division and control to reduce power dissipation in content addressable memory," IEEE Trans. Consum. Electron., vol. 64, no. 3, pp. 301–309, Aug. 2018.
[25] I. Hayashi et al., "A 250-MHz 18-Mb full ternary CAM with low voltage matchline sensing scheme in 65-nm CMOS," IEEE J. Solid-State Circuits, vol. 48, no. 11, pp. 2671–2680, Nov. 2013.
[26] N. Onizawa, S. Matsunaga, V. C. Gaudet, W. J. Gross, and T. Hanyu, "High-throughput low-energy self-timed CAM based on reordered overlapped search mechanism," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 61, no. 3, pp. 865–876, Mar. 2014.
[27] B.-D. Yang, "Low-power effective memory-size expanded TCAM using data-relocation scheme," IEEE J. Solid-State Circuits, vol. 50, no. 10, pp. 2441–2450, Oct. 2015.
[28] T.-S. Chen, D.-Y. Lee, T.-T. Liu, and A.-Y. Wu, "Dynamic reconfigurable ternary content addressable memory for OpenFlow-compliant low-power packet processing," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 63, no. 10, pp. 1661–1672, Oct. 2016.
[29] W. Choi, K. Lee, and J. Park, "Low cost ternary content addressable memory using adaptive matchline discharging scheme," in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), May 2018, pp. 1–4.
[30] W. Choi, J. Park, H. Kim, C. Park, and T. Song, "Half-and-half compare content addressable memory with charge-sharing based selective match-line precharge scheme," in Proc. IEEE Symp. VLSI Circuits, Jun. 2018, pp. 17–18.
[31] K. Woo and B. Yang, "Low-area TCAM using a don't care reduction scheme," IEEE J. Solid-State Circuits, vol. 53, no. 8, pp. 2427–2433, Aug. 2018.
[32] S. Jeloka, N. Bharathwaj Akesh, D. Sylvester, and D. Blaauw, "A 28 nm configurable memory (TCAM/BCAM/SRAM) using push-rule 6T bit cell enabling logic-in-memory," IEEE J. Solid-State Circuits, vol. 51, no. 4, pp. 1009–1021, Apr. 2016.
[33] C.-X. Xue, W.-C. Zhao, T.-H. Yang, Y.-J. Chen, H. Yamauchi, and M.-F. Chang, "A 28-nm 320-kb TCAM macro using split-controlled single-load 14T cell and triple-margin voltage sense amplifier," IEEE J. Solid-State Circuits, vol. 54, no. 10, pp. 2743–2753, Oct. 2019.
[34] Y.-J. Chang, K.-L. Tsai, and H.-J. Tsai, "Low leakage TCAM for IP lookup using two-side self-gating," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 60, no. 6, pp. 1478–1486, Jun. 2013.
[35] P. T. Huang and W. Hwang, "A 65 nm 0.165 fJ/bit/search 256 × 144 TCAM macro design for IPv6 lookup tables," IEEE J. Solid-State Circuits, vol. 46, no. 2, pp. 507–519, Feb. 2011.
[36] K. Lee, G. Ko, and J. Park, "Low cost ternary content addressable memory based on early termination precharge scheme," in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), May 2019, pp. 1–4.
[37] M. Imani, S. Patil, and T. S. Rosing, "MASC: Ultra-low energy multiple-access single-charge TCAM for approximate computing," in Proc. Design, Autom. Test Eur. Conf. Exhib. (DATE), 2016, pp. 373–378.
[38] A. Wiltgen, K. A. Escobar, A. I. Reis, and R. P. Ribas, "Power consumption analysis in static CMOS gates," in Proc. 26th Symp. Integr. Circuits Syst. Des. (SBCCI), Sep. 2013, pp. 1–6.
[39] T. Venkata Mahendra, S. Mishra, and A. Dandapat, "Self-controlled high-performance precharge-free content-addressable memory," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 25, no. 8, pp. 2388–2392, Aug. 2017.
[40] T. V. Mahendra, S. W. Hussain, S. Mishra, and A. Dandapat, "Precharge free dynamic content addressable memory," Electron. Lett., vol. 54, no. 9, pp. 556–558, May 2018.
[41] T. V. Mahendra, S. W. Hussain, S. Mishra, and A. Dandapat, "Low discharge precharge free matchline structure for energy-efficient search using CAM," Integration, vol. 69, pp. 31–39, Nov. 2019.
[42] S. Ataei and J. E. Stine, "A 64 kB approximate SRAM architecture for low-power video applications," IEEE Embedded Syst. Lett., vol. 10, no. 1, pp. 10–13, Mar. 2018.
[43] J. A. Butts and G. S. Sohi, "A static power model for architects," in Proc. Int. Symp. Microarchitecture, Dec. 2000, pp. 191–201.


[44] S. Krishnan, S. V. Garimella, G. M. Chrysler, and R. V. Mahajan, "Towards a thermal Moore's law," IEEE Trans. Adv. Packag., vol. 30, no. 3, pp. 462–474, Aug. 2007.
[45] Y.-J. Chang and Y.-H. Liao, "Hybrid-type CAM design for both power and performance efficiency," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 16, no. 8, pp. 965–974, Aug. 2008.

Telajala Venkata Mahendra (Member, IEEE) received the B.Tech. degree in electronics and communication engineering from JNTU, Kakinada, India, in 2013, and the M.Tech. degree in VLSI design from the National Institute of Technology at Meghalaya, Shillong, India, in 2016, where he is currently pursuing the Ph.D. degree with the Department of Electronics and Communication Engineering. His research interests include low-power VLSI designs, content addressable memories, volatile memories, and CMOS integrated circuits.

Sheikh Wasmir Hussain received the bachelor's degree in electronics and communication engineering from Visvesvaraya Technological University, Belgaum, India, in 2013, and the master's degree in VLSI design from the National Institute of Technology at Meghalaya, Shillong, India, in 2017, where he is currently pursuing the Ph.D. degree with the Department of Electronics and Communication Engineering. His research interests include high-performance memories and low-power VLSI designs.

Sandeep Mishra (Member, IEEE) received the B.Tech. and M.Tech. degrees in electronics and communication engineering from the Biju Patnaik University of Technology, Rourkela, India, in 2011 and 2013, respectively, and the Ph.D. degree in VLSI design from the National Institute of Technology at Meghalaya, Shillong, in 2018. He is currently an Assistant Professor with the Department of Electronics and Communication Engineering, Indian Institute of Information Technology at Pune, India. His research area of interest covers low-power VLSI design, memory design, mixed-signal circuits, analog-to-digital converters, and intelligent transportation systems.

Anup Dandapat (Senior Member, IEEE) received the Ph.D. degree in digital VLSI design from Jadavpur University, Kolkata, India, in 2008. He is currently an Associate Professor with the Department of Electronics and Communication Engineering, National Institute of Technology at Meghalaya, Shillong, India. He has authored over 50 national and international journal articles. His current research interests include low-power VLSI design, low-power memory design, and low-power digital design.

