This paper proposes a novel time-domain computing-in-memory core that implements XNOR-and-accumulate operations using a ring oscillator built within an 8T SRAM cell. The ring oscillator period represents the accumulation result of input XNOR values. This approach addresses signal margin issues in existing mixed-signal computing approaches by representing the output as time, which is converted to digital via a time-digital converter. The circuit is simulated using a 16nm CMOS process and achieves 463 TOPS/W efficiency while maintaining correct functionality and a large signal margin.

A Time-Domain Computing-In-Memory Micro using Ring Oscillator
Yixuan He1, Minsu Choi2, Kyung-Ki Kim3, Yong-Bin Kim1
1Dept. of ECE, Northeastern University, Boston, MA, USA
2Dept. of ECE, Missouri University of Science & Technology, Rolla, MO, USA
3Dept. of Electronic Eng., Daegu University, Gyeongsan, Korea

2021 18th International SoC Design Conference (ISOCC) | 978-1-6654-0174-6/21/$31.00 ©2021 IEEE | DOI: 10.1109/ISOCC53507.2021.9613954

Abstract— This paper proposes a novel time-domain computing-in-memory core that implements the XNOR-and-accumulate (XAC) operation of an XNOR network in an 8T SRAM cell. The new technique uses an inverter-based ring oscillator to generate a periodic waveform whose period represents the accumulation result of the input XNOR values. The circuit is built and simulated using the PTM16_HP 16nm CMOS model with a 0.7V power supply. The results show correct functionality, a large signal margin, and 463 TOPS/W efficiency. With further exploration, time-domain computation could be a new candidate for in-memory computing, since it has its own advantages over mixed-signal and digital methods.

Keywords— Artificial intelligence (AI), computing-in-memory (CIM), static random access memory (SRAM), ring oscillator.

I. INTRODUCTION
With the rapid development of artificial intelligence (AI) and the Internet of Things (IoT), numerous challenges and constraints have been imposed on existing computing architectures. In edge applications such as smartphones or self-driving vehicles, instant inference or even real-time on-chip training is often considered the goal for future AI. Therefore, high-speed, low-power computing and data movement has become a dire need and can directly determine the performance of machine learning algorithms implemented in computing architectures.

In the traditional von Neumann topology, data is continuously moved between memory and computing units in series. When a large-scale algorithm is applied to the system, massive data transfer can be expected, which results in high latency and power consumption. This conundrum is often referred to as the “memory-wall bottleneck” associated with the conventional computing architecture for machine learning applications. To keep up with the rapid growth of AI algorithms and increasingly intricate neural networks, beyond-von-Neumann machines, such as computing-in-memory (CIM) technology, are regarded as convincing candidates for breaking through the memory wall, because they eliminate the distance and barrier between memory and computing units.

Considerable effort has been made to implement XNOR neural networks in SRAM using mixed-signal CMOS circuits, with promising results [1]. This kind of network uses a 1-bit input and a 1-bit weight to perform XNOR logic and accumulation. Ref. [2] presented a binarized neural network (BNN) implementation using an 8T1C (8 transistors and 1 capacitor) SRAM CIM cell. A small capacitor is attached to each SRAM cell to store the XNOR output, and the charge is then shared among all the capacitors in a row to obtain the average voltage as the accumulation output. It consumes much less energy than other approaches because it does not waste DC current during the whole process. Moreover, Ref. [3] presented an 8T1C SRAM array using capacitive-coupling computing. The output is determined by the ratio of the capacitors, so it is less sensitive to temperature and transistor process variation. However, those approaches have limitations such as poor signal margin, especially when a large number of SRAM cells are integrated in the same row for higher efficiency. When many inputs are connected, the voltage difference between adjacent output values becomes small compared to the offset voltage of the ADC and thereby causes errors.

In this paper, a new type of implementation for XNOR computing is proposed to address the signal margin problem in existing mixed-signal approaches. An inverter-based ring oscillator is used to perform the XNOR and accumulation operations, and the output is represented by time and converted to digital data through a time-digital converter (TDC).

II. PROPOSED TIME-DOMAIN XNOR COMPUTING
The proposed time-domain structure is shown in Fig. 1. As it shows, the SRAM is modified into an 8T cell so that it can perform the XNOR logic operation on the input and the weight stored in the cell itself. The XNOR value is represented by $S_n$ with logic “0” or “1”. In addition, a total of (3+2n) inverters are connected in series to form a ring oscillator (RO) in each SRAM row, and they are controlled by the $S_n$ signals (the subscript “n” is the index of the SRAM cell in the array, and the figures use “0” as an example). When $S_0$ is logic high, it enables the two inverters associated with this memory cell. When it is logic low, the inverters are disabled to save power and an additional path is created that shorts the inverters. Therefore, the ring oscillator has $(3+2n_1)$ inverters in series, which results in the oscillating frequency and period

$$f_{RO} = \frac{1}{2\tau(3+2n_1)} \tag{1}$$

$$T_{RO} = 2\tau(3+2n_1) \tag{2}$$

where $\tau$ represents the time delay of a single inverter and $n_1$ is the number of $S_n$ that are logic high.
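As a rough numerical illustration of Eqs. (1) and (2), the following Python sketch models the row behaviourally: it computes the XNOR of each input/weight pair, counts the high outputs as n_1, and evaluates the resulting ring-oscillator period. The function names and the example bit vectors are hypothetical, and the inverter delay τ = 136.5 ps is an assumption derived from the ∆T = 2τ = 273 ps spacing reported in the simulation section. It is not a circuit-level model.

```python
# Behavioural sketch of Eqs. (1)-(2); names and sample data are illustrative only.
TAU = 136.5e-12  # assumed single-inverter delay in seconds (from 2*tau = 273 ps)


def xnor(a: int, b: int) -> int:
    """1-bit XNOR: returns 1 when input and weight match, else 0."""
    return 1 - (a ^ b)


def ro_timing(inputs, weights, tau=TAU):
    """Return (n1, stages, period, frequency) for one SRAM row.

    n1      -- number of XNOR outputs S_n that are logic high
    stages  -- inverters left in the ring: 3 fixed + 2 per enabled cell
    period  -- T_RO = 2 * tau * (3 + 2 * n1), Eq. (2)
    """
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    n1 = sum(xnor(x, w) for x, w in zip(inputs, weights))
    stages = 3 + 2 * n1
    period = 2 * tau * stages
    return n1, stages, period, 1.0 / period


# Hypothetical 7-cell row (n = 7, the row length used in the simulation section).
n1, stages, period, freq = ro_timing([1, 0, 1, 1, 0, 0, 1], [1, 1, 1, 0, 0, 1, 1])
print(f"n1={n1}, stages={stages}, T_RO={period * 1e9:.3f} ns, f_RO={freq / 1e6:.1f} MHz")
```

The stage count stays odd for every n_1, which is what keeps the ring oscillating regardless of how many cells are enabled.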

Furthermore, a reference ring oscillator (RRO) with three inverters is implemented to provide a reference time $T_{RRO} = 6\tau$. The time-digital converter takes the falling edge of the RRO as the starting point and the falling edge of the RO in each row as the end point. Thus, the time interval entered into the TDC is

$$T_{measure} = \frac{1}{2}\left(T_{RO} - T_{RRO}\right) = 2\tau n_1 \tag{3}$$

Thereby, the accumulation result can be written as a time and quantized by the TDC to get the digital output.

Fig. 3. The schematic of inverter components assigned to each memory cell.
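Eq. (3) can be checked numerically with a short Python sketch. It evaluates T_measure from the RO and RRO periods and then recovers n_1 with an idealized quantizer. The quantizer (rounding to the nearest multiple of ∆T = 2τ) is an assumption made for illustration and is not the TDC circuit of the paper. The value τ = 136.5 ps is taken from the ∆T = 273 ps reported in the simulation section, and all function names are hypothetical.

```python
# Behavioural check of Eq. (3) with an idealized TDC; not the paper's TDC circuit.
TAU = 136.5e-12          # assumed single-inverter delay in seconds
DELTA_T = 2 * TAU        # spacing between adjacent accumulation results


def t_ro(n1: int, tau: float = TAU) -> float:
    """Row ring-oscillator period, Eq. (2)."""
    return 2 * tau * (3 + 2 * n1)


def t_rro(tau: float = TAU) -> float:
    """Reference ring oscillator (three inverters): T_RRO = 6 * tau."""
    return 6 * tau


def t_measure(n1: int, tau: float = TAU) -> float:
    """Interval fed to the TDC, Eq. (3): half the period difference."""
    return 0.5 * (t_ro(n1, tau) - t_rro(tau))


def tdc_decode(interval: float, delta_t: float = DELTA_T) -> int:
    """Idealized quantizer: nearest integer multiple of delta_t."""
    return round(interval / delta_t)


for n1 in range(8):  # n = 7 cells per row gives n1 in 0..7
    interval = t_measure(n1)
    print(f"n1={n1}: T_measure={interval * 1e12:7.1f} ps -> decoded {tdc_decode(interval)}")
```

Under this idealized model, a timing error smaller than ∆T/2 (about 136 ps here) still decodes to the correct count, which gives a feel for the signal margin the time-domain representation offers.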

Fig. 1. The block diagram of proposed time-domain XNOR computation.

III. CIRCUIT IMPLEMENTATION
This section shows the detailed circuit implementation of the proposed system. As shown in Fig. 2, two additional transistors, M7 and M8, are added to the general 6T SRAM cell to generate the XNOR output $S_0$. The input signals “In” and its complement “$\overline{\text{In}}$” are connected to their drains, and the weights control their gates. Since the stored values are connected to the gates, leakage current loss is greatly reduced, and the read-disturb problem of the common SRAM read operation is also eliminated.

Fig. 2. The circuit schematic of proposed 8T SRAM cell.

As for the ring oscillator shown in Fig. 3, each SRAM cell controls two inverters through five switches. The $S_0$ label pointing to a switch in the figure means that the switch is on when $S_0$ is “1” and off when it is “0”. Therefore, when $S_0$ is high, the top four switches are closed and the inverters are connected to the others in the RO. Conversely, when $S_0$ is low, the circuit is equivalent to a short circuit.

IV. SIMULATION RESULTS
The circuit is implemented and simulated using the 16nm PTM CMOS model with a 0.7V supply. For verification, n=7 is chosen for the number of cells in a row. Figure 4 shows the half cycle of the ring oscillator for different numbers of high XNOR values. The case with no high XNOR values ($n_1 = 0$), which is also the output of the RRO, is shown in Fig. 4 as the dark red line. From the results, the interval between adjacent output values is $\Delta T = 2\tau = 273$ ps, and $T_{measure} = n_1 \Delta T$. Thus, the simulation confirms the functionality of the system discussed in the previous sections. In addition, the total power corresponds to an efficiency of 463 TOPS/W at 5 ns per cycle.

Fig. 4. The simplified illustration of proposed compensation technique.

V. CONCLUSION
This paper proposes a new approach to time-domain calculation using ring oscillators, which achieves proper functionality, a large signal margin, and acceptable speed and power efficiency. The method does not contain capacitors or other analog components. Thus, it can be easily integrated with a fully digital system with low power consumption.

REFERENCES
[1] C.-J. Jhang, C.-X. Xue, J.-M. Hung, F.-C. Chang and M.-F. Chang, "Challenges and Trends of SRAM-Based Computing-In-Memory for AI Edge Devices," in IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 68, no. 5, pp. 1773-1786, May 2021.
[2] H. Valavi, P. J. Ramadge, E. Nestler and N. Verma, "A Mixed-Signal Binarized Convolutional-Neural-Network Accelerator Integrating Dense Weight Storage and Multiplication for Reduced Data Movement," 2018 IEEE Symposium on VLSI Circuits, 2018, pp. 141-142.
[3] Z. Jiang, S. Yin, J. Seo and M. Seok, "C3SRAM: An In-Memory-Computing SRAM Macro Based on Robust Capacitive Coupling Computing Mechanism," in IEEE Journal of Solid-State Circuits, vol. 55, no. 7, pp. 1888-1897, July 2020.
