BR-CIM: An Efficient Binary Representation Computation-In-Memory Design
Authorized licensed use limited to: Chang Gung Univ.. Downloaded on October 20,2024 at 12:15:44 UTC from IEEE Xplore. Restrictions apply.
YUE et al.: BR-CIM: AN EFFICIENT BINARY REPRESENTATION COMPUTATION-IN-MEMORY DESIGN 3941
SRAM-CIM is the column-wise multiplication and accumulation (MAC); therefore, an extra computing unit is incurred to realize the XNOR operation. A mixed-signal accelerator is implemented based on SRAM, with 2 gate transistors and 1 MOM capacitor added to each bit cell [6]. XNOR-SRAM realized an in-memory-computing SRAM macro for binary/ternary DNNs, but each 6T cell is expanded to 12T [7].

Second, the sense margin represents the voltage difference between two adjacent results, as Fig. 1 shows. A limited sense margin makes it difficult for the ADC to distinguish and quantize two neighboring values. Each bit-wise computing result is inaccurate due to the non-linear behavior of the transistors, and the sense margin shrinks further at larger accumulated values, because the sum operation accumulates the deviated results, which scatters the probability distribution of the output voltage. The distribution of a high accumulated value becomes flat and wide, which further degrades the margin. This challenge is discussed thoroughly in Section II-B.

Third, the largest SRAM-CIM storage size is around 5 Mb [8], which falls far short of the growth in neural network size. Though binary representation reduces the parameter size, the CIM is not large enough to store all parameters, so the weights within the CIM must be updated frequently, and intermediate results must be written back to SRAM for pipelined computation. The latency of loading fresh weights degrades the overall throughput. A ping-pong CIM was proposed to support simultaneous computing and weight-update operations [9]; however, it requires a replicated cell array and extra input wires, which increases area overhead and complicates layout routing.

To overcome the limitations of previous computation-in-memory architectures, this work presents Binary Representation Computation-In-Memory (BR-CIM), which performs binary computation within compact 6T SRAM. The contributions of this work can be summarized as follows.
• A computation-in-memory design performs binarized computing within an SRAM array, eliminating frequent data movement. The symmetric computing implementation supports variable bit precision and enlarges the readout margin;
• Based on the intrinsic column scheme, a column-wise MUX supports simultaneous computing and weight loading, which shortens the data-loading latency;
• A reconfigurable digital peripheral and mapping method support different computing precisions for changing algorithm requirements.

The remainder of this article is organized as follows. Section II introduces the background of binary neural network algorithms and computation-in-memory design. Section III describes the architecture of the proposed computation-in-memory. Section IV presents the configurable mapping method. Section V presents the experimental setup and evaluation results. Section VI concludes this work.

II. BACKGROUND

A. Binary Neural Network

Binary Neural Networks (BNNs) have demonstrated their efficiency in reducing computation and memory storage requirements, which makes it possible to deploy neural networks on low-power AI edge devices. BinaryConnect first constrained the weights to either +1 or −1 during propagation [10]. XNOR-Net took bit reduction a step further by compressing both weight and activation precision; its classification accuracy on MNIST reached 98.74%. Multiplication and accumulation (MAC) is replaced by an XNOR operation followed by a bit count of the XNOR results [11]. Binarized Neural Network introduced a training kernel on GPU that realized 7 times the throughput of an unoptimized GPU kernel without suffering any loss in classification accuracy [12]. XOR-Net proposed a network computing scheme that is up to 17% faster and 19% more energy-efficient than XNOR-Net [13]. More works have contributed to improving the accuracy of binary neural networks in different applications. The performance of both classification and semantic segmentation tasks is improved by Group-Net [14]. Research on infrared image detection shows comparable performance of binarized networks versus full-precision networks [15]. There is also an attempt to apply binary neural networks to visual place recognition on resource-constrained platforms, such as small robots or drones [16].

B. SRAM-CIM Circuit Design Challenges

Several device technologies already provide viable computation-in-memory designs. Among these memory technologies, SRAM stands out for its low read/write energy and high data endurance, and it keeps pace with the newest fabrication processes. Several SRAM-CIMs have been proposed for energy-efficient AI applications. A 10T SRAM-based CIM, Conv-RAM, is utilized to accelerate CNNs [17]. A 6T SRAM CIM performs multiplication and accumulation (MAC) of 8-b weights/activations for DNNs [18]. To accelerate lower-bit-precision computing, a compact 6T CIM was implemented for BNNs [19]. However, several design challenges remain for SRAM-based CIM architectures.

1) Read Disturb: When a large number of WLs are activated at the same time, the discharge current in the BL increases. If the bit line voltage drops below the write threshold, a pseudo-write operation happens and a bit-flip occurs in SRAM cells storing 1, especially for 6T SRAM [19]. To mitigate the read disturb issue, the BL voltage must be clamped above the write trigger voltage, or the number of activated WLs must be limited. Another method is to use read/write-decoupled SRAM structures, like 8T and 10T [17], which avoid disturbing the stored data by isolating the read BL from the write BL. Although this method eliminates read disturb directly, it greatly increases the cell array area.

2) Sense Margin: The sense margin determines the difficulty of ADC quantization and the computing accuracy. The voltage swing of the accumulated result is bounded by the supply voltage. With higher bit precision of the accumulated result, the number of discrete values the result can represent grows exponentially, and more voltage levels are needed to represent all of them. Eventually, the voltage difference between two neighboring values shrinks, or the values even overlap with each other. This issue can be traced back to each activated cell.
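The margin squeeze can be illustrated with a back-of-the-envelope sketch (our own, not from the paper; the 1 V swing and the function name are assumed values for illustration): accumulating N bit-wise results inside a fixed voltage swing yields N + 1 distinguishable levels and hence N equal gaps, so the ideal margin falls as 1/N.

```python
# Ideal sense margin when N binary partial products share one bounded swing:
# the accumulated result takes N + 1 discrete levels (0..N), leaving N gaps.
V_SWING = 1.0  # assumed supply-bounded readout swing in volts (illustrative)

def ideal_sense_margin(n_accumulate: int, v_swing: float = V_SWING) -> float:
    """Voltage gap between two adjacent accumulation levels."""
    return v_swing / n_accumulate

for n in (16, 64, 256):
    margin_mv = ideal_sense_margin(n) * 1e3
    print(f"N = {n:3d} rows -> ideal margin {margin_mv} mV")
```

Doubling the number of activated rows halves this ideal margin before any transistor non-linearity or deviation is even considered, which is why readout precision is bounded and the margin must be enlarged by circuit techniques.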
3942 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS, VOL. 69, NO. 10, OCTOBER 2022
TABLE I
TRUTH TABLE OF BINARY REPRESENTATION XNOR OPERATION

IN_sign  W_sign  IN_sign × W_sign | IN_unsign  W_unsign  IN_unsign ⊙ W_unsign
  +1       +1          +1         |     1          1              1
  +1       −1          −1         |     1          0              0
  −1       +1          −1         |     0          1              0
  −1       −1          +1         |     0          0              1
Though the theoretical values are equal, the current contributed by each bit cell differs due to the 'asymmetric' circuit design. For example, a partial result that is viewed as 0 logically contains two possible cases in a 6T SRAM-CIM [20]. When the input equals 0, the WL is not activated and the current path from the cell array to the BL is cut off. But a bit cell still contributes discharge current to the BL when the weight equals 0, as a leakage current path still exists between the bit cell and the BL. Therefore, the significance of weight and activation is not equivalent, i.e., non-symmetric. The deviation then accumulates in the SUM operation and leads to quantization error.

As illustrated in the previous XOR-CIM [21], multiple input combinations may produce the same logical partial result but acquire different analog values. As Fig. 2 shows, two cases contain identical logic computing results, but their actual analog accumulated currents differ, because the bottom two bit cells of the right column still contribute current, whereas the current path is totally cut off when the WL is not activated, as for the bit cell in the left column. This logic-analog mismatch leads to sense margin degradation.

3) ADC Overhead: The accumulated bit-wise current is converted to a digital signal by an ADC or sense amplifier. In general, the area and power overhead of an ADC are proportional to its quantization precision and computing latency. A high-resolution ADC is undesirable for CIM design because of its large area overhead. In addition, the maximum bit precision is restricted by the sense margin of the computing circuit, which degrades severely at higher bit-widths.

III. MACRO ARCHITECTURE

A. Binary Representation

The weight and activation are quantized to one bit in binary representation, each equal to +1 or −1 logically. But it is difficult to find a negative physical quantity, which necessitates a logic-to-physical conversion. Previous works utilized the relative strength between the pull-up and pull-down networks [7], [19] to represent positive and negative numbers, but can hardly ensure that the current contributions of the two paths are identical, which leads to the logic-analog mismatch challenge discussed in the previous section. Therefore, it is natural to map the signed representation (+1/−1) to an unsigned representation (1/0). Observing the XNOR results in Table I, a linear relationship between the two representations can be found: $IN_{sign} \times W_{sign} = 2(IN_{unsign} \odot W_{unsign}) - 1$, where $\odot$ denotes XNOR. Eq. (1) then shows that signed multiplication and accumulation (MAC) can be linearly mapped to unsigned XOR accumulation (XAC). After the linear transform, each operand can be represented by a physical quantity such as voltage (VDD/0), and the XNOR operation is replaced by an XOR operation, which will be explained in the last paragraph of Section III-C.

$$
\begin{aligned}
Result_{MAC} &= \sum^{N} IN_{sign} \times W_{sign} \\
&= \sum^{N} \big( 2 (IN_{unsign} \odot W_{unsign}) - 1 \big) \\
&= 2 \sum^{N} (IN_{unsign} \odot W_{unsign}) - N \\
&= 2 \Big( N - \sum^{N} (IN_{unsign} \oplus W_{unsign}) \Big) - N \\
&= N - 2 \sum^{N} (IN_{unsign} \oplus W_{unsign}) \qquad (1)
\end{aligned}
$$

To accommodate the algorithm's accuracy requirements, the binary representation is extended to a multi-bit binary representation (MBR) in this work. Specifically, each bit takes a binary value (+1/−1) and keeps its bit significance. For example, 1001 equals $(+1) \times 2^3 + (-1) \times 2^2 + (-1) \times 2^1 + (+1) \times 2^0$, or a decimal value of 3. The relationship between the 4-bit binary representation and the corresponding decimal logic value is shown on the right of Fig. 3. The benefit of MBR is that the representable range is extended, up to two times larger than that of the 2's complement representation, because of the wider interval. Moreover, the basic operation for multiplication is still XNOR, so the computing unit for binary representation can be reused.

Fig. 3 presents one example of binary-mode MAC computing. In binary mode, 4 groups of multiplications are accumulated vertically. The logic value (+1/−1) is mapped
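The signed-to-unsigned mapping of Eq. (1) and the MBR decoding can be checked numerically. The sketch below is illustrative only (the function names are ours, not from the paper): it decodes the 1001 example and confirms that the signed MAC equals N minus twice the XOR accumulation.

```python
import random

def mbr_value(bits: str) -> int:
    """Decode a multi-bit binary representation: each bit is +1 (for '1')
    or -1 (for '0'), weighted by its power of two."""
    n = len(bits)
    return sum((1 if b == "1" else -1) * 2 ** (n - 1 - i)
               for i, b in enumerate(bits))

def signed_mac(ins, ws) -> int:
    """Reference signed MAC over +1/-1 operands."""
    return sum(i * w for i, w in zip(ins, ws))

def xac_mac(ins, ws) -> int:
    """Eq. (1): map +1/-1 to 1/0, accumulate XORs, recover the MAC
    as N - 2 * sum(XOR)."""
    u = lambda x: (x + 1) // 2          # +1 -> 1, -1 -> 0
    n = len(ins)
    xor_sum = sum(u(i) ^ u(w) for i, w in zip(ins, ws))
    return n - 2 * xor_sum

# 4-bit MBR example from the text: 1001 -> +8 - 4 - 2 + 1 = 3
assert mbr_value("1001") == 3

# The two MAC formulations agree on random +1/-1 vectors.
for _ in range(100):
    ins = [random.choice([-1, 1]) for _ in range(16)]
    ws = [random.choice([-1, 1]) for _ in range(16)]
    assert signed_mac(ins, ws) == xac_mac(ins, ws)
```

The equality holds because each XNOR agreement contributes +1 and each disagreement −1, so the signed sum is (number of agreements) − (number of disagreements) = N − 2·ΣXOR.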
TABLE II
COMPUTING-IN-MEMORY COMPARISON
and weight precision, macro area and cell area, operation function, energy efficiency, and evaluation algorithm. For a fair comparison, most of the CIMs are selected because those works are optimized for binary neural networks or perform energy-efficient XNOR operations.

The results indicate that the performance of this work is superior to that of other SRAM-based CIMs designed for binary computation in both power and area efficiency. In addition, this work enlarges the signal margin and improves the computing linearity. Previous works illustrate the effectiveness of single-bit binary representation on lightweight networks and datasets, like MNIST; however, a reduced-bit-precision algorithm can hardly achieve acceptable accuracy on a larger dataset. Based on this observation, the weight/activation precision is extendable in this work to achieve acceptable accuracy when handling a deeper network. The results show that the multi-bit binary representation can realize prediction accuracy equivalent to the full-precision representation and better than an 8b MAC CIM array. In addition, the multi-bit binary representation is based on the same basic operation (XOR) and can be realized by the intrinsic computing unit without incurring complex overhead. The energy efficiency of BR-CIM also outperforms other binary or multi-bit multiplication-and-accumulation CIMs. The best energy efficiency is estimated at 0.8 V and a 3 ns cycle time.

VI. CONCLUSION

This work implements binary representation computation-in-memory (BR-CIM) for the binary number system. The cell array realizes in-situ XOR and accumulation (XAC) computing and eliminates frequent data movement. The computing unit employs a symmetric circuit design to enlarge the sense margin and ensure computing linearity. Based on the intrinsic column peripheral, a load-compute MUX is utilized to support parallel weight loading and computing, which hides the write access latency. The binary number is then extended to a multi-bit representation to meet the higher accuracy requirements of larger datasets. Lastly, a flexible data mapping is proposed to exploit data parallelism under different precision combinations, and a reconfigurable digital peripheral completes partial-result recovery and post-processing. The design is implemented in a TSMC 28 nm process. The area and energy efficiency outperform previous mixed-signal CIM works. We deploy binary-quantized LeNet/ResNet-18 on the CIM macro and achieve 97.82%/76.4% accuracy on the MNIST/CIFAR-100 datasets.

REFERENCES

[1] S. Yin et al., "An energy-efficient reconfigurable processor for binary- and ternary-weight neural networks with flexible data bit width," IEEE J. Solid-State Circuits, vol. 54, no. 4, pp. 1120–1136, Apr. 2019.
[2] V. Seshadri et al., "Ambit: In-memory accelerator for bulk bitwise operations using commodity DRAM technology," in Proc. 50th Annu. IEEE/ACM Int. Symp. Microarchitecture (MICRO), Oct. 2017, pp. 273–287.
[3] S. Yu and P.-Y. Chen, "Emerging memory technologies: Recent trends and prospects," IEEE Solid-State Circuits Mag., vol. 8, no. 2, pp. 43–56, Spring 2016.
[4] C.-X. Xue et al., "16.1 A 22 nm 4 Mb 8b-precision ReRAM computing-in-memory macro with 11.91 to 195.7 TOPS/W for tiny AI edge devices," in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, vol. 64, Feb. 2021, pp. 245–247.
[5] J. Zhang, Z. Wang, and N. Verma, "A machine-learning classifier implemented in a standard 6T SRAM array," in Proc. IEEE Symp. VLSI Circuits (VLSI-Circuits), Jun. 2016, pp. 1–2.
[6] H. Valavi, P. J. Ramadge, E. Nestler, and N. Verma, "A mixed-signal binarized convolutional-neural-network accelerator integrating dense weight storage and multiplication for reduced data movement," in Proc. IEEE Symp. VLSI Circuits, Jun. 2018, pp. 141–142.
[7] Z. Jiang, S. Yin, M. Seok, and J.-S. Seo, "XNOR-SRAM: In-memory computing SRAM macro for binary/ternary deep neural networks," in Proc. IEEE Symp. VLSI Technol., Jun. 2018, pp. 173–174.
[8] H. Jia et al., "15.1 A programmable neural-network inference accelerator based on scalable in-memory computing," in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, vol. 64, Feb. 2021, pp. 236–238.
[9] J. Yue, X. Feng, Y. He, Y. Huang, and Y. Liu, "15.2 A 2.75-to-75.9 TOPS/W computing-in-memory NN processor supporting set-associate block-wise zero skipping and ping-pong CIM with simultaneous computation and weight updating," in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2021, pp. 238–240.
[10] M. Courbariaux, Y. Bengio, and J.-P. David, "BinaryConnect: Training deep neural networks with binary weights during propagations," in Proc. Adv. Neural Inf. Process. Syst., 2015.
[11] M. Rastegari, V. Ordonez, J. Redmon, and A. Farhadi, "XNOR-Net: ImageNet classification using binary convolutional neural networks." Cham, Switzerland: Springer, 2016.
[12] I. Hubara, D. Soudry, and R. El-Yaniv, "Binarized neural networks," in Proc. Adv. Neural Inf. Process. Syst., vol. 29, 2016.
[13] S. Zhu, L. H. K. Duong, and W. Liu, "XOR-Net: An efficient computation pipeline for binary neural network inference on edge devices," in Proc. IEEE 26th Int. Conf. Parallel Distrib. Syst. (ICPADS), Dec. 2020, pp. 124–131.
[14] B. Zhuang, C. Shen, M. Tan, L. Liu, and I. Reid, "Structured binary neural networks for accurate image classification and semantic segmentation," in Proc. Conf. Comput. Vis. Pattern Recognit., 2018, pp. 413–422.
[15] J. Kung, D. Zhang, G. van der Wal, S. Chai, and S. Mukhopadhyay, "Efficient object detection using embedded binarized neural networks," J. Signal Process. Syst., vol. 90, no. 6, pp. 877–890, Jun. 2017.
[16] B. Ferrarini, M. J. Milford, K. D. McDonald-Maier, and S. Ehsan, "Binary neural networks for memory-efficient and effective visual place recognition in changing environments," IEEE Trans. Robot., early access, Mar. 2, 2022, doi: 10.1109/TRO.2022.3148908.
[17] A. Biswas and A. P. Chandrakasan, "Conv-RAM: An energy-efficient SRAM with embedded convolution computation for low-power CNN-based machine learning applications," in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2018, pp. 488–490.
[18] X. Si et al., "15.5 A 28 nm 64 Kb 6T SRAM computing-in-memory macro with 8b MAC operation for AI edge chips," in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2020, pp. 246–248.
[19] J. Kim et al., "Area-efficient and variation-tolerant in-memory BNN computing using 6T SRAM array," in Proc. Symp. VLSI Circuits, Jun. 2019, pp. C118–C119.
[20] Z. Zhang et al., "A 55 nm 1-to-8 bit configurable 6T SRAM based computing-in-memory unit-macro for CNN-based AI edge processors," in Proc. IEEE Asian Solid-State Circuits Conf. (A-SSCC), Nov. 2019, pp. 217–218.
[21] S. Huang, H. Jiang, X. Peng, W. Li, and S. Yu, "XOR-CIM: Compute-in-memory SRAM architecture with embedded XOR encryption," in Proc. IEEE/ACM Int. Conf. Comput.-Aided Design (ICCAD), Nov. 2020, pp. 1–6.
[22] V. H.-C. Chen and L. Pileggi, "An 8.5 mW 5 GS/s 6b flash ADC with dynamic offset calibration in 32 nm CMOS SOI," in Proc. Symp. VLSI Circuits, Jun. 2013, pp. C264–C265.
[23] S. Park, Y. Palaskas, and M. P. Flynn, "A 4-GS/s 4-bit flash ADC in 0.18-µm CMOS," IEEE J. Solid-State Circuits, vol. 42, no. 9, pp. 1865–1872, Sep. 2007.
[24] J. Yue et al., "15.2 A 2.75-to-75.9 TOPS/W computing-in-memory NN processor supporting set-associate block-wise zero skipping and ping-pong CIM with simultaneous computation and weight updating," in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, vol. 64, Feb. 2021, pp. 238–240.
[25] W.-S. Khwa et al., "A 65 nm 4 Kb algorithm-dependent computing-in-memory SRAM unit-macro with 2.3 ns and 55.8 TOPS/W fully parallel product-sum operation for binary DNN edge processors," in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2018, pp. 496–498.
[26] R. Liu et al., "Parallelizing SRAM arrays with customized bit-cell for binary neural networks," in Proc. 55th ACM/ESDA/IEEE Design Autom. Conf. (DAC), Jun. 2018, pp. 1–6.
[27] Z. Jiang, S. Yin, J.-S. Seo, and M. Seok, "C3SRAM: An in-memory-computing SRAM macro based on robust capacitive coupling computing mechanism," IEEE J. Solid-State Circuits, vol. 55, no. 7, pp. 1888–1897, Jul. 2020.
[28] S. Yin, B. Zhang, M. Kim, J. Saikia, and J.-S. Seo, "PIMCA: A 3.4-Mb programmable in-memory computing accelerator in 28 nm for on-chip DNN inference," in Proc. Symp. VLSI Circuits, Jun. 2021, pp. 1–2.
[29] H. Kim, T. Yoo, T. T.-H. Kim, and B. Kim, "Colonnade: A reconfigurable SRAM-based digital bit-serial compute-in-memory macro for processing neural networks," IEEE J. Solid-State Circuits, vol. 56, no. 7, pp. 2221–2233, Jul. 2021.

Zhiheng Yue received the B.S. degree in electronic science and technology from the Beijing University of Posts and Telecommunications, Beijing, China, in 2017, and the M.S. degree in electrical and computer engineering from the University of Michigan, Ann Arbor, MI, USA, in 2019. He is currently pursuing the Ph.D. degree in electronic science and technology with Tsinghua University, Beijing. His current research interests include deep learning, computation-in-memory, AI accelerators, and very-large-scale integration (VLSI) design.

Yabing Wang received the B.S. degree in electronic science and technology from Xidian University, Xi'an, China, in 2018. He is currently pursuing the M.S. degree with the School of Integrated Circuits, Tsinghua University, Beijing, China. His current research interests include deep learning, computation-in-memory, and very-large-scale integration (VLSI) design.

Yubin Qin received the B.S. degree from the School of Electronic Science and Engineering, Southeast University, Nanjing, China, in 2020. He is currently pursuing the Ph.D. degree with the School of Integrated Circuits, Tsinghua University, Beijing, China. His current research interests include deep learning, very-large-scale integration (VLSI) design, and hardware-software co-design.

Leibo Liu (Senior Member, IEEE) received the B.S. degree in electronic engineering from Tsinghua University, Beijing, China, in 1999, and the Ph.D. degree from the Institute of Microelectronics, Tsinghua University, in 2004. He is currently a Professor with the School of Integrated Circuits, Tsinghua University. His research interests include reconfigurable computing, mobile computing, and very-large-scale integration digital signal processing (VLSI DSP).
Shaojun Wei (Fellow, IEEE) was born in Beijing, China, in 1958. He received the Ph.D. degree from the Faculté Polytechnique de Mons, Mons, Belgium, in 1991. He became a Professor at the Institute of Microelectronics, Tsinghua University, Beijing, China, in 1995. His main research interests include VLSI SoC design, electronic design automation (EDA) methodology, and communication application-specific integrated circuit (ASIC) design. Dr. Wei is a Senior Member of the Chinese Institute of Electronics (CIE).

Shouyi Yin (Member, IEEE) received the B.S., M.S., and Ph.D. degrees in electronic engineering from Tsinghua University, Beijing, China, in 2000, 2002, and 2005, respectively. He was a Research Associate with Imperial College London, London, U.K. He is currently a Full Professor and the Vice-Director of the School of Integrated Circuits, Tsinghua University. He has published more than 100 journal articles and more than 50 conference papers. His research interests include reconfigurable computing, AI processors, and high-level synthesis. Dr. Yin has served as a Technical Program Committee Member of the top very-large-scale integration (VLSI) and electronic design automation (EDA) conferences, such as the Asian Solid-State Circuits Conference (A-SSCC), the IEEE/ACM International Symposium on Microarchitecture (MICRO), the Design Automation Conference (DAC), the International Conference on Computer-Aided Design (ICCAD), and the Asia and South Pacific Design Automation Conference (ASP-DAC). He is also an Associate Editor of IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS, ACM Transactions on Reconfigurable Technology and Systems (TRETS), and Integration, the VLSI Journal.