Tech 3
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—II: EXPRESS BRIEFS, VOL. 70, NO. 10, OCTOBER 2023

Abstract—The need for approximate rather than exact search arises in numerous compare-intensive applications, from networking to computational genomics. This brief presents a novel sensing approach for approximate matching content-addressable memory (CAM) designed to handle large Hamming distances (HDs) between the query pattern and stored data. The proposed matchline sensing scheme (MLSS) employs a replica mechanism and a 12-transistor positive feedback sense amplifier to effectively resolve the approximate match operation. The MLSS was integrated into a 4 kB approximate CAM array and fabricated in a 65 nm CMOS technology. With an overall area footprint of 0.0048 mm², which includes 512 sense amplifiers and the replica mechanism, the MLSS allows a flexible and dynamic adjustment of the HD tolerance threshold via several design variables. Experimental measurements demonstrate the efficiency of our sensing scheme in tolerating very large HDs with the highest sensitivity.

Index Terms—HD-CAM, Hamming distance, content-addressable memory, approximate CAM, approximate match, matchline sense amplifier.

Manuscript received 4 April 2023; revised 22 May 2023; accepted 12 June 2023. Date of publication 14 June 2023; date of current version 25 September 2023. This work was supported in part by the European Union’s Horizon Europe Programme for Research and Innovation under Grant 101047160; in part by the Israeli Ministry of Science and Technology under Lise Meitner Grant for Israeli-Swedish Research Collaboration; and in part by the Italian Ministry of University and Research (MUR) through the Project PRIN under Grant 2020LWPKH7. The work of Esteban Garzón was supported by the Italian MUR under the call “Horizon Europe 2021–2027 Programme under Grant H25F21001420001.” This brief was recommended by Associate Editor Z. Di. (Corresponding author: Esteban Garzón.)
Esteban Garzón and Marco Lanuzza are with the Department of Computer Engineering, Modeling, Electronics and Systems, University of Calabria, 87036 Rende, Italy (e-mail: [email protected]; m.lanuzza@dimes.unical.it).
Roman Golman, Adam Teman, and Leonid Yavits are with the EnICS Labs, Faculty of Engineering, Bar-Ilan University, Ramat Gan 5290002, Israel (e-mail: [email protected]; [email protected]; leonid.yavits@biu.ac.il).
Color versions of one or more figures in this article are available at https://doi.org/10.1109/TCSII.2023.3286257.
Digital Object Identifier 10.1109/TCSII.2023.3286257
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/

I. INTRODUCTION

Content-addressable memories (CAMs) are widely used in many applications requiring high-speed parallel search operations between an input query pattern and the complete dataset stored within the memory [1], [2]. Due to the wide demand for similarity search in numerous emerging applications such as compare-intensive big data workloads, machine learning, and pattern recognition in various domains, including images, DNA sequencing, and biomedical data, there is a growing need for CAMs that can perform approximate search operations [3], [4], [5], [6]. Conventional CMOS-based or emerging resistive memory-based CAMs typically support only exact search operations [7], [8]. Other memory sensing schemes have been proposed [1], [2], [9], [10].

Several techniques have been proposed in the literature to enable approximate matching in CAMs by leveraging customized sensing schemes and other solutions (i.e., involving redundancy). For instance, error-correction codes have been suggested for ternary CAMs (TCAMs) and NAND-type CAMs, which employ parity bits and a dedicated matchline scheme [11], [12]. However, these methods can only handle a small Hamming distance (HD) of 1 to 4 bits between the input query pattern and stored data, and their implementations require large area overhead and increased design complexity. Tunable sampling time techniques have also been explored [13]; however, their implementation is challenging due to the strong dependency on precise device and circuit sizing, susceptibility to jitter, and higher probability of generating false results (matches instead of mismatches and vice versa). A recent solution, proposed in [14], presents a large HD-tolerant approximate CAM (HD-CAM) based on matchline charge redistribution. Unfortunately, the sensing scheme presented in [14] also suffers from a high degree of design complexity and large area overhead.

This brief proposes a low-complexity, scalable, and area-efficient sensing scheme for approximate CAMs with a tunable matchline discharge rate [14], [15]. Our sensing scheme consists of a 12-transistor positive feedback sense amplifier along with a replica mechanism that provides control of the sampling time during approximate match operations. Specifically, the replica line enables the sense amplifier, which then resolves the compare result. Additional design variables allow adjusting the HD tolerance threshold and the sensitivity of the proposed sensing scheme. A 4 kB HD-CAM design [14], integrating the proposed matchline sensing scheme, was fabricated in a commercial 65 nm CMOS technology. The sensing scheme of the HD-CAM design has a silicon footprint of 0.0048 mm². The effectiveness of the suggested approximate match sensing scheme (i.e., its sensitivity as a function of HD and its susceptibility to variations) is evaluated through experimental measurements.

This brief provides the following main contributions:
• To our knowledge, this is the first sensing scheme for approximate search CAM, enabling highly sensitive approximate match sensing capabilities, that has been fabricated and evaluated in silicon.
• Our sensing scheme supports a very wide range of HD tolerance through user-configurable design variables.
• The proposed sensing scheme presents low susceptibility to sampling time, temperature, and process variations.
• Unlike state-of-the-art matchline sensing schemes, the proposed design utilizes the charge redistribution of a replica line to control the sense amplifier sampling time.

II. BACKGROUND: HAMMING DISTANCE TOLERANT CAM

The HD tolerant CAM (HD-CAM), proposed in [14], is capable of both exact and approximate matching, the latter tolerating HDs of up to 60% of the length of the query pattern. The HD-CAM design is based on the observation that the matchline voltage drop is proportional to the HD between the query pattern and a data word. To evaluate the efficiency of HD-CAM approximate matching, it was tested as a real-time DNA classifier programmed to detect Severe Acute Respiratory Syndrome Coronavirus-2 (SARS-CoV-2) DNA in a metagenomic sample (i.e., containing the DNA of multiple organisms). Noteworthy attributes of HD-CAM include its ability to tolerate large HDs with high sensitivity and precision, its resilience to DNA sequencing errors and sampling time variation, and its reduced area overhead and design complexity.

Fig. 1(a) shows the top-level schematic view of an m × n HD-CAM [14]. Each row in the CAM has its own matchline (ML), which is connected to an ML sense amplifier (MLSA). A pair of searchlines (SLs) is connected to all the bitcells in a column, and the bitcells in a row form an n-bit HD-CAM word, as shown in Fig. 1(b). The precharge (PC) transistor (MPC) is used to precharge the ML. The MLSA senses the state of the matchline against a reference voltage (Vref). Fig. 1(c) shows the NOR-type HD-CAM bitcell, which is built upon the conventional NOR-type CAM bitcell [1]. Similar to a standard six-transistor static random access memory (6T-SRAM) cell, it is based on a pair of cross-coupled inverters for storing data and is accessed for write and read by enabling row access through the wordline (WL) and driving SL and its complement to opposite logic values for write or precharging them for read. The associative search operation involves two steps: precharge and evaluation. During the precharge step, the ML is precharged to VDD by enabling the MPC transistor (PC = ‘0’). This is followed by the evaluation step, where the MPC transistor is cut off (PC = ‘1’), and the query data is loaded onto the SLs. The comparison between the query pattern and the data word is performed by the M1-M3 transistors. An evaluation transistor (M4) is integrated into the HD-CAM cell to regulate the discharge rate of the ML according to the evaluation voltage level (Veval). By controlling the voltage level on M4, HD-CAM can perform approximate matching when Veval < VDD, while a conventional exact match CAM operation is executed when M4 is driven by a full voltage level, Veval = VDD.

Fig. 1. Overview of the Hamming Distance (HD) tolerant CAM (HD-CAM) based on tunable matchline discharge rate. (a) HD-CAM array. (b) n-bit HD-CAM word. (c) HD-CAM cell highlighting the storage, comparison and approximate match evaluation blocks. For the sake of simplicity, wordline (WL) is not shown in the HD-CAM cell block of (b) and (c).

III. PROPOSED MATCHLINE SENSING SCHEME (MLSS)

A. Design and Operating Principle

The proposed matchline sensing scheme (MLSS) is based on a positive feedback sense amplifier (SA) that is controlled by a replica line, as illustrated in Fig. 2(a). The ML replica line (MLRL) is composed of n replica transistors (Mn−1 to M0) that are connected in parallel. The gate terminals of these devices are grounded, their drain terminals are connected to VDD, and their source terminals are connected to the MLRL. The MLSS includes three additional transistors (MN1, MN2, and MP), along with an inverter (I1). MN2 and MP are controlled by the PC signal. The gate of MN1 is connected to the replica voltage (Vrep), which in the final design is the same as Veval (or VDD), i.e., does not require a separate voltage source (or MN1). However, for the sake of evaluating the MLSS susceptibility to sampling time variations, we enable Vrep to be biased separately to adjust the MLRL discharge, as presented hereafter.

The MLRL emulates the capacitance of an ML. Its output is the Sen signal, which enables the sensing of the positive feedback SA at the appropriate time. Fig. 2(b) details the schematic of the positive feedback SA. It comprises a pair of cross-coupled inverters with four enable transistors (MEN1-MEN4), whose gates are driven by Sen. MEN1 (MEN2) acts as header (footer) to connect the latch to (down to) VDD (ground). The last two enable transistors, MEN3 and MEN4, are connected to the output
GARZÓN et al.: LOW-COMPLEXITY SENSING SCHEME FOR APPROXIMATE MATCHING CAM 3869
down the MLRL discharge, which in turn delays the Sen signal assertion. Therefore, the design variable (Vrep) may provide an additional level of flexibility, enabling the fine-tuning of the sensitivity response of the approximate match.
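
The interplay between the design variables (Veval, Vref, and Vrep) can be illustrated with a small behavioral sketch in Python. This is not the authors' circuit model: it assumes a linear ML voltage droop proportional to the HD, a simple gate-overdrive dependence on Veval and Vrep, and that a lower Vrep is what slows the replica discharge. The threshold voltage `vth` and the constants `k_ml` and `k_rep` are hypothetical values chosen only so that the trends described in the text are visible; only the 1.2 V supply is taken from the measurements.

```python
from dataclasses import dataclass


def hamming_distance(query: str, word: str) -> int:
    """Number of positions where two equal-length bit strings differ."""
    if len(query) != len(word):
        raise ValueError("query and stored word must have the same length")
    return sum(q != w for q, w in zip(query, word))


@dataclass
class SensingModel:
    # All constants except vdd are hypothetical illustration values.
    vdd: float = 1.2    # supply voltage (V), as used in the measurements
    vth: float = 0.4    # assumed transistor threshold voltage (V)
    k_ml: float = 0.05  # assumed ML droop per mismatch, per unit overdrive and time
    k_rep: float = 1.0  # assumed replica-line timing constant

    def sampling_time(self, vrep: float) -> float:
        """Assumption: a lower Vrep slows the replica line, so Sen fires later."""
        overdrive = max(vrep - self.vth, 1e-3)
        return self.k_rep / overdrive

    def ml_voltage(self, hd: int, veval: float, t: float) -> float:
        """Assumed linear droop: proportional to HD, Veval overdrive, and time."""
        overdrive = max(veval - self.vth, 0.0)
        return max(self.vdd - self.k_ml * hd * overdrive * t, 0.0)

    def is_match(self, query: str, word: str,
                 veval: float, vref: float, vrep: float) -> bool:
        """Sample the ML when Sen is asserted and compare it against Vref."""
        hd = hamming_distance(query, word)
        t_s = self.sampling_time(vrep)
        return self.ml_voltage(hd, veval, t_s) > vref

    def hd_tolerance(self, n_bits: int,
                     veval: float, vref: float, vrep: float) -> int:
        """Largest HD still reported as a match (-1 if even HD = 0 fails)."""
        t_s = self.sampling_time(vrep)
        tolerated = [hd for hd in range(n_bits + 1)
                     if self.ml_voltage(hd, veval, t_s) > vref]
        return max(tolerated, default=-1)


model = SensingModel()
# A 64-bit word length is an assumption made for this example.
print(model.hd_tolerance(64, veval=0.6, vref=0.6, vrep=1.2))
print(model.hd_tolerance(64, veval=1.0, vref=0.6, vrep=1.2))
```

Under these assumptions, the second call reports a smaller tolerated HD than the first, since a higher Veval discharges the ML faster. Raising Vref or lowering Vrep (which delays the sampling instant in this model) shrinks the tolerated HD in the same way.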
Fig. 4. (a) Top-level architecture of the HD-CAM memory array along with the matchline sensing scheme and peripherals. (b) Layout of the HD-CAM
array highlighting the replica row and sense amplifiers. In the inset: the positive feedback sense amplifier layout.
Fig. 5. (a) LEO-II SoC board along with a top-level view of the SoC
layout highlighting HD-CAM, the approximate search CAM equipped with
the proposed sensing scheme. (b) Main features of the test chip. (c) Photo
of the experimental setup. For the purposes of testing and control, an Intel
Cyclone-V FPGA is connected to the prototyping board.
TABLE I
COMPARISON BETWEEN THE PROPOSED DESIGN AND OTHER POSSIBLE SENSING SCHEMES COMPATIBLE WITH HAMMING DISTANCE CAM
higher (lower) the Veval or Vref, the lower (higher) the HD tolerance threshold.

We also analyze the MLSS susceptibility to sampling time variation, which we model by varying Vrep. Fig. 6(c) shows the MLSS sensitivity as a function of Vref for VDD of 1.2 V and different Vrep values. Two sets of measurement results are shown: for Veval of 1 V and 0.6 V. For Veval of 1 V, the sampling time variation has very little effect on the MLSS sensitivity. For Veval of 0.6 V, this variation is higher, mainly at Vrep of 0.6 V. Overall, the susceptibility to the sampling time variation is limited over a wide range of Veval and Vref.

Finally, we also analyze the MLSS temperature and process variability (shown in Fig. 6(d, e)), where about 100% sensitivity is maintained for a wide range of temperatures, and across five different chips, respectively. Note that dynamic adjustment of the MLSS design variables effectively resolves the issue when significant PVT variations adversely affect the target HD tolerance.

C. Related Work and Comparison With State-of-the-Art

Table I qualitatively compares the proposed matchline sensing scheme with other sensing approaches compatible with approximate search CAMs. These approximate matching MLSSs have a large area footprint, can only tolerate a limited HD, and present a high degree of design complexity. Garzón et al. [14] require a complex sizing process and extra peripherals. Krishnan et al. [11] use an analog comparator and additional circuitry. Efthymiou [12] requires an encoder for parity bits, a dedicated ML scheme, and an embedded comparator in each cell. Imani et al. [13] use delay lines at the clock inputs of four SAs per matchline, as well as precise device and circuit sizing. The StrongARM comparator, used in [14], as well as the proposed MLSS, assure better PVT stability, mainly due to the flexibility to adjust the tolerance threshold to a wide range of HDs [14]. To our knowledge, none of these state-of-the-art designs have been silicon-proven. In contrast, the proposed MLSS has been demonstrated, by means of silicon prototyping and measurements, to provide an efficient low-complexity and low-cost solution for the tunable matchline discharge rate-based approximate search CAM.

V. CONCLUSION

This brief introduced a low-complexity, scalable, and area-efficient matchline sensing scheme for approximate search CAM based on tunable matchline discharge. Our circuit was fabricated as part of a 65 nm test chip and evaluated through post-silicon testing and measurements. The proposed sensing scheme exhibits high sensitivity over a wide range of HDs between the queries and stored data. Testing results show that the proposed design can flexibly adjust the tolerance threshold, while exhibiting very limited susceptibility to sampling time, temperature, and process variations. The proposed design offers an efficient, low-complexity and robust alternative to state-of-the-art approximate search CAM sensing approaches.

REFERENCES

[1] K. Pagiamtzis and A. Sheikholeslami, “Content-addressable memory (CAM) circuits and architectures: A tutorial and survey,” IEEE J. Solid-State Circuits, vol. 41, no. 3, pp. 712–727, Mar. 2006.
[2] K. Pagiamtzis, N. Azizi, and F. N. Najm, “A soft-error tolerant content-addressable memory (CAM) using an error-correcting-match scheme,” in Proc. IEEE Custom Integr. Circuits Conf., 2006, pp. 301–304.
[3] M. Imani, Y. Kim, A. Rahimi, and T. Rosing, “ACAM: Approximate computing based on adaptive associative memory with online learning,” in Proc. Int. Symp. Low Power Electron. Design, 2016, pp. 162–167.
[4] M. Ali, A. Agrawal, and K. Roy, “RAMANN: In-SRAM differentiable memory computations for memory-augmented neural networks,” in Proc. ACM/IEEE Int. Symp. Low Power Electr. Design, 2020, pp. 61–66.
[5] M. M. Taha and C. Teuscher, “Approximate memristive in-memory hamming distance circuit,” ACM J. Emerg. Technol. Comput. Syst., vol. 16, no. 2, pp. 1–14, 2020.
[6] R. Kaplan, L. Yavits, and R. Ginosar, “BioSEAL: In-memory biological sequence alignment accelerator for large-scale genomic data,” in Proc. 13th ACM Int. Syst. Stor. Conf., 2020, pp. 36–48.
[7] B. Song, T. Na, J. P. Kim, S. H. Kang, and S.-O. Jung, “A 10T-4MTJ nonvolatile ternary CAM cell for reliable search operation and a compact area,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 64, no. 6, pp. 700–704, Jun. 2017.
[8] E. Garzón, M. Lanuzza, A. Teman, and L. Yavits, “AM⁴: MRAM crossbar based CAM/TCAM/ACAM/AP for in-memory computing,” IEEE J. Emerg. Sel. Topics Circuits Syst., vol. 13, no. 1, pp. 408–421, Mar. 2023.
[9] Z. Yang et al., “A novel computing-in-memory platform based on hybrid Spintronic/CMOS memory,” IEEE Trans. Electron Devices, vol. 69, no. 4, pp. 1698–1705, Apr. 2022.
[10] N. Mohan, W. Fung, D. Wright, and M. Sachdev, “A low-power ternary CAM with positive-feedback match-line sense amplifiers,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 56, no. 3, pp. 566–573, Mar. 2009.
[11] S. C. Krishnan, R. Panigrahy, and S. Parthasarathy, “Error-correcting codes for ternary content addressable memories,” IEEE Trans. Comput., vol. 58, no. 2, pp. 275–279, Feb. 2009.
[12] A. Efthymiou, “An error tolerant CAM with nand match-line organization,” in Proc. 23rd ACM Int. Conf. Great Lakes Symp. VLSI, 2013, pp. 257–262.
[13] M. Imani, A. Rahimi, D. Kong, T. Rosing, and J. M. Rabaey, “Exploring hyperdimensional associative memory,” in Proc. IEEE Int. Symp. High Perform. Comput. Architect. (HPCA), 2017, pp. 445–456.
[14] E. Garzón et al., “Hamming distance tolerant content-addressable memory (HD-CAM) for DNA classification,” IEEE Access, vol. 10, pp. 28080–28093, 2022.
[15] R. Hanhan, E. Garzón, Z. Jahshan, A. Teman, M. Lanuzza, and L. Yavits, “EDAM: Edit distance tolerant approximate matching content addressable memory,” in Proc. 49th Annu. Int. Symp. Comput. Architect., 2022, pp. 495–507.
Open Access funding provided by ‘Università della Calabria’ within the CRUI CARE Agreement