High-Bandwidth Chiplet Interconnects for Advanced Packaging Technologies in AI/ML Applications: Challenges and Solutions
Design and Technology Platform, Taiwan Semiconductor Manufacturing Company Ltd., San Jose, CA 95134, USA
CORRESPONDING AUTHOR: S. LI (e-mail: [email protected])
ABSTRACT The demand for chiplet integration using 2.5D and 3D advanced packaging technologies has
surged, driven by the exponential growth in computing performance required by artificial intelligence and
machine learning (AI/ML). This article reviews these advanced packaging technologies and emphasizes
critical design considerations for high-bandwidth chiplet interconnects, which are vital for efficient
integration. We address challenges related to bandwidth density, energy efficiency, electromigration, power
integrity, and signal integrity. To avoid power overhead, the chiplet interconnect architecture is designed to
be as simple as possible, employing a parallel data bus with forwarded clocks. However, achieving high-
yield manufacturing and robust performance still necessitates significant efforts in design and technology
co-optimization. Despite these challenges, the semiconductor industry is poised for continued growth and
innovation, driven by the possibilities unlocked by a robust chiplet ecosystem and novel 3D-IC design
methodologies.
INDEX TERMS 3Dblox, 3D-IC, advanced packaging, artificial intelligence (AI) and compute, chiplet
integration, energy efficiency, interconnects, Universal Chiplet Interconnect Express (UCIe).
© 2024 The Authors. This work is licensed under a Creative Commons Attribution 4.0 License.
For more information, see https://creativecommons.org/licenses/by/4.0/
VOLUME 4, 2024 351
LI et al.: HIGH-BANDWIDTH CHIPLET INTERCONNECTS FOR ADVANCED PACKAGING TECHNOLOGIES
FIGURE 1. Trend in the amount of compute used to train the ML models [2].
FIGURE 2. TSMC 3D Fabric Technology portfolio.
has RDL with 2.3-µm thick metal. Both have four metal layers for signal routing and one additional layer for the power mesh. The former has a finer metal width/spacing granularity. With an 8-µm signal pitch in both cases, the former can afford a much wider metal shield and slightly larger signal-to-signal spacing. As a result, the former can operate at up to 32 Gb/s in the ×64 UCIe form factor, whereas the latter supports only 16 Gb/s with ×32 data lanes because of more severe crosstalk.
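As a rough check on this comparison, the raw per-module bandwidth of each configuration is the lane count times the per-lane rate. The minimal Python sketch below assumes that the ×64 and ×32 figures count data lanes in one direction of a single module, and it ignores forwarded-clock, valid, and redundancy lanes as well as protocol overhead. The `LinkConfig` class and `compare` helper are hypothetical names used only for illustration.

```python
"""Back-of-envelope raw bandwidth for the two RDL cases described above.

Assumptions (not from the source): the lane count is data lanes per
direction of one UCIe module; clock, valid, and redundant lanes and all
protocol overhead are ignored.
"""

from dataclasses import dataclass


@dataclass(frozen=True)
class LinkConfig:
    label: str
    data_lanes: int   # data lanes per direction (x64 or x32), assumed
    rate_gbps: float  # per-lane data rate in Gb/s

    def raw_gbps(self) -> float:
        # Parallel bus: every lane carries rate_gbps, so bandwidth scales linearly.
        return self.data_lanes * self.rate_gbps

    def raw_gbytes_per_s(self) -> float:
        # Convert bits to bytes (8 bits per byte).
        return self.raw_gbps() / 8.0


def compare(a: LinkConfig, b: LinkConfig) -> float:
    """Return how many times more raw bandwidth `a` delivers than `b`."""
    return a.raw_gbps() / b.raw_gbps()


if __name__ == "__main__":
    former = LinkConfig("former (x64 @ 32 Gb/s)", 64, 32.0)
    latter = LinkConfig("latter (x32 @ 16 Gb/s)", 32, 16.0)
    for cfg in (former, latter):
        print(f"{cfg.label}: {cfg.raw_gbps():.0f} Gb/s = {cfg.raw_gbytes_per_s():.0f} GB/s")
    print(f"ratio: {compare(former, latter):.1f}x")
```

Under these assumptions, the former delivers 2048 Gb/s (256 GB/s) per direction and the latter 512 Gb/s (64 GB/s), a 4× gap that comes from doubling both the lane count and the per-lane rate.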
FIGURE 10. Universal 3D bump map form factor.
FIGURE 11. SoC-level scalability to support arbitrary 3D chiplet stacking (F2F/F2B or rotation).
FIGURE 18. (a) UCIe 2.0 bump map example. (b) Modular design for chiplet I/F.
(FOPLP) [63], [64] is also on the horizon, promising higher packaging throughput, reduced costs, and potentially larger integrated systems at panel level, where warpage control remains a significant challenge throughout the entire packaging process [65], [66].
In the meantime, the hunger for higher interconnect data bandwidth density continues. For instance, the UCIe Consortium is working on a 48/64-Gb/s proposal for inter-die interconnect. For scaling systems up and out, on-package optical waveguides [67] and co-packaged optical engines [68] remain appealing to the industry.
Bigger systems necessitate vertical power delivery with integrated magnetic components for efficient voltage regulation [69], [70]. The larger-scale integration of CPU, GPU, HBM, SerDes, optical engines, and voltage regulators is a significant undertaking, surpassing some existing engineering feats [13], [14], [15]. Achieving this requires a collaborative effort across industry partners to manage different layers of the technology stack, delivering high performance while ensuring exceptional power efficiency, SI, thermal management, and structural robustness.
As the chiplet ecosystem becomes more robust and 3D-IC design methodologies advance, new possibilities and greater innovations will emerge.
ACKNOWLEDGMENT
The authors would like to express their gratitude for the insightful and regular discussions on 3D integration with King-Ho Tam, Homer Liu, S. J. Yang, Jim Chang, T. C. Huang, Sandeep Goel, Cheng-Hsiang Hsieh, Frank Lee, Carlos Diaz, Stefan Rusu, and L. C. Lu.
REFERENCES
[1] D. Amodei and D. Hernandez, “AI and compute,” OpenAI, 2018. [Online]. Available: https://openai.com/index/ai-and-compute/
[2] J. Sevilla, L. Heim, A. Ho, T. Besiroglu, M. Hobbhahn, and P. Villalobos, “Compute trends across three eras of machine learning,” in Proc. Int. Joint Conf. Neural Netw. (IJCNN), 2022, pp. 1–8.
[3] A. J. Lohn and M. Musser, AI and Compute: How Much Longer Can Computing Power Drive Artificial Intelligence Progress? CSET Issue Brief, Center Secur. Emerg. Technol., Washington, DC, USA, 2022.
[4] N. C. Thompson, K. Greenewald, K. Lee, and G. F. Manso, “The computational limits of deep learning,” 2022, arXiv:2007.05558v2.
[5] Y.-J. Mii, “Semiconductor innovations, from device to system,” in Proc. Symp. VLSI Technol. Circuits, 2022, pp. 276–281.
[6] S. Li, “F1: Transceivers for exascale: Towards Tbps/mm and sub-pJ/bit: Advanced packaging and 3DIC interconnections,” in Proc. ISSCC, 2023, pp. 519–521.
[7] B. Santo (EE Times, Portland, OR, USA). Chiplets: A Short History. Mar. 2021. [Online]. Available: https://www.eetimes.com/chiplets-a-short-history/
[8] A. Tirumala and R. Wong, “NVIDIA Blackwell platform: Advancing generative AI and accelerated computing,” in Proc. 36th IEEE Hot Chips Symp., 2024, pp. 1–33.
[9] R. Kaplan, “Intel Gaudi 3 AI accelerator: Architected for Gen AI training and inference,” in Proc. 36th IEEE Hot Chips Symp., 2024, pp. 1–16.
[10] D. D. Sharma, G. Pasdast, Z. Qian, and K. Aygun, “Universal chiplet interconnect express (UCIe): An open industry standard for innovations with chiplets at package level,” IEEE Trans. Compon., Pack. Manuf. Technol., vol. 12, no. 9, pp. 1423–1431, Sep. 2022.
[11] “Universal chiplet interconnect express (UCIe) specification revision 2.0.” Accessed: Jun. 7, 2024. [Online]. Available: https://www.uciexpress.org/
[12] D. C. H. Yu, C.-T. Wang, and H. Hsia, “Foundry perspectives on 2.5D/3-D integration and roadmap,” in Proc. IEEE Int. Electron Devices Meeting (IEDM), 2021, pp. 3.7.1–3.7.4.
[13] S. Lie, “Wafer-scale AI: GPU impossible performance,” in Proc. 36th IEEE Hot Chips Symp., 2024, pp. 1–71.
[14] E. Talpes, D. Williams, and D. D. Sarma, “DOJO: The microarchitecture of Tesla’s Exa-scale computer,” in Proc. 34th Hot Chips Symp., 2023, pp. 1–28.
[15] S.-R. Chun et al., “InFO_SoW (system-on-wafer) for high performance computing,” in Proc. IEEE ECTC, 2020, pp. 1–6.
[16] S. S. Iyer, “Heterogeneous integration for performance and scaling,” IEEE Trans. Compon., Pack. Manuf. Technol., vol. 6, no. 7, pp. 973–982, Jul. 2016.
[17] A. B. Ahmed and A. B. Abdallah, “La-xyz: Low latency, high throughput look-ahead routing algorithm for 3-D network-on-chip (3D-NoC) architecture,” in Proc. IEEE 6th Int. Symp. Embed. Multicore SoCs, 2012, pp. 167–174.
[18] Y. Feng, D. Xiang, and K. Ma, “Heterogeneous die-to-die interfaces: Enabling more flexible chiplet interconnection systems,” in Proc. 56th Annu. IEEE/ACM Int. Symp. Microarchit., 2023, pp. 930–943.
[19] Y. Feng, D. Xiang, and K. Ma, “A scalable methodology for designing efficient interconnection network of Chiplets,” in Proc. IEEE Int. Symp. High-Perform. Comput. Archit. (HPCA), 2023, pp. 1059–1071.
[20] J. Yin et al., “Modular routing design for chiplet-based systems,” in Proc. ACM/IEEE 45th Annu. Int. Symp. Comput. Archit. (ISCA), 2018, pp. 726–738.
[21] I. Lee, M. Cheong, and S. Kang, “Highly reliable redundant TSV architecture for clustered faults,” IEEE Trans. Rel., vol. 68, no. 1, pp. 237–247, Mar. 2019.
[22] T.-H. Wang, P.-Y. Chuang, F. Lorenzelli, and E. J. Marinissen, “Test and repair improvements for UCIe,” in Proc. IEEE Eur. Test Symp. (ETS), 2024, pp. 1–6.
[23] J. Lau, “Recent advances and trends in advanced packaging,” IEEE Trans. Compon., Pack. Manuf. Technol., vol. 12, no. 2, pp. 228–252, Feb. 2022.
[24] R. Chaware, K. Nagarajan, and S. Ramalingam, “Assembly and reliability challenges in 3-D integration of 28-nm FPGA die on a large high density 65-nm passive interposer,” IEEE Trans. Electron Devices, 2012, submitted for publication.
[25] S. Hou et al., “Wafer-level integration of an advanced logic-memory system through the second-generation CoWoS technology,” IEEE Trans. Electron Devices, vol. 64, no. 10, pp. 4071–4077, Oct. 2017.
[26] Y.-H. Lin et al., “Multilayer RDL interposer for heterogeneous device and module integration,” in Proc. IEEE ECTC, 2019, pp. 931–936.
[27] M. Lin et al., “Organic interposer CoWoS-R+ (plus) technology,” in Proc. IEEE ECTC, 2022, pp. 1–6.
[28] Y.-C. Hu et al., “CoWoS architecture evolution for next generation HPC on 2.5D system in package,” in Proc. IEEE ECTC, 2023, pp. 1022–1026.
[29] S. Hou et al., “Integrated deep trench capacitor in Si interposer for CoWoS heterogeneous integration,” in Proc. IEEE IEDM, 2019, pp. 19.5.1–19.5.4.
[30] C.-F. Tseng, C. S. Liu, C.-H. Wu, and D. Yu, “InFO (wafer level integrated fan-out) technology,” in Proc. IEEE ECTC, 2016, pp. 1–6.
[31] K. Kim and M.-J. Park, “Present and future, challenges of high bandwidth memory (HBM),” in Proc. IEEE Int. Memory Workshop (IMW), 2024, pp. 1–4.
[32] D. B. L. Yolanda, “Wafer to wafer bonding to increase memory density,” in Proc. China Semicond. Technol. Int. Conf. (CSTIC), 2022, pp. 1–4.
[33] W. Gomes et al., “Ponte Vecchio: A multi-tile 3-D stacked processor for exascale computing,” in Proc. IEEE ISSCC, 2022, pp. 42–44.
[34] J. Wuu et al., “3-D V-Cache: The implementation of a hybrid-bonded 64MB stacked cache for a 7-nm x86-64 CPU,” in Proc. IEEE ISSCC, 2022, pp. 428–429.
[35] M.-F. Chen, F.-C. Chen, W.-C. Chiou, and D. C. Yu, “System on integrated chips (SoIC™) for 3-D heterogeneous integration,” in Proc. IEEE ECTC, 2019, pp. 594–599.
[36] G. Kuo et al., “A thermally friendly bonding scheme for 3-D system integration,” in Proc. IEEE ECTC, 2023, pp. 1973–1976.
[37] H.-J. Chia et al., “Ultra high density low temperature SoIC with sub-0.5-µm bond pitch,” in Proc. IEEE ECTC, 2023, pp. 1–4.
MU-SHAN LIN was born in Taiwan in 1979. He received the master’s degree in electronics engineering from National Chiao Tung University, Hsinchu, Taiwan, in 2004. He has since dedicated his career to circuit design with Taiwan Semiconductor Manufacturing Company Ltd. (TSMC), Hsinchu, Taiwan. He specializes in high-speed SerDes (56/112 Gb/s) and equalization development, DDR-PHY design, and parallel-bus forwarded-clock and low-swing interconnects for 2.5D/3D-IC package applications. He currently serves as the Technical Manager for the Advanced Connectivity Department, TSMC. His expertise is recognized through numerous IEEE publications, including contributions to ASSC, VLSI, JSSC, and Hot Chips, and he holds several patents in collaboration with TSMC.
CHIEN-CHUN TSAI received the master’s degree in electrical engineering from National Taiwan University, Taipei, Taiwan, in 1996. He has been working as a Design Engineer with Taiwan Semiconductor Manufacturing Company Ltd. (TSMC), Hsinchu, Taiwan, since November 1998. Over the years, his responsibilities have included standard cell design, standard I/O, ESD, specialty I/O, high-speed SerDes, and chiplet interface design. He is currently the Department Manager of Advanced Connectivity Taiwan, TSMC, focusing on SerDes and 2.5D/3D IC chiplet interface PHY development.