TSV Redundancy: Architecture and Design Issues in 3D IC: 978-3-9810801-6-2/DATE10 © 2010 EDAA
Ang-Chih Hsieh†, TingTing Hwang†, Ming-Tung Chang‡, Min-Hsiu Tsai‡, Chih-Mou Tseng‡ and Hung-Chun Li‡
† Department of Computer Science, National Tsing Hua University, HsinChu, Taiwan 300
‡ Global Unichip Corporation, Hsinchu, Taiwan 300
Abstract—3D technology provides many benefits including high density, high bandwidth, low power, and small form factor. Through-Silicon Via (TSV), which provides communication links for dies in the vertical direction, is a critical design issue in 3D integration. Just like other components, the fabrication and bonding of TSVs can fail. A failed TSV may cause a number of known-good dies that are stacked together to be discarded. This can severely increase the cost and decrease the yield as the number of dies to be stacked increases. A redundant TSV architecture with reasonable cost for ASICs is proposed in this paper. Design issues including recovery rate and timing are addressed. Based on probabilistic models, some interesting findings are reported. First, the probability that three or more TSVs fail in a tier is less than 0.002%. The assumption that there are at most two failed TSVs in a tier is sufficient to cover 99.998% of all possible fault-free and faulty cases. Next, with one redundant TSV allocated to each TSV block, limiting the number of TSVs in each TSV block to no more than 50 and 25 leads to 90% and 95% recovery rates, respectively, when 2 failed TSVs are assumed. Finally, analysis of the overall yield shows that the proposed design can successfully recover most of the failed chips and increase the yield of TSV bonding to 99.99%. This can effectively reduce the cost of manufacturing 3D ICs.

I. INTRODUCTION

3D integration techniques are proposed as solutions to overcome the scaling limit [1]. 3D technology provides many benefits including high density, high bandwidth, low power, and small form factor [2]. Through-Silicon Via (TSV) [3], which provides communication links for dies in the vertical direction, is a critical design issue in 3D integration. In the current manufacturing process for 3D designs, each die to be integrated is manufactured individually. When TSV technology is applied, TSVs and bond pads are fabricated inside each die [4][5]. Then, bonding technology is used for die stacking. Just like other components, the fabrication and bonding of TSVs can fail. A failed TSV may cause a number of known-good dies that are stacked together to be discarded. This can severely increase the cost and decrease the yield as the number of dies to be stacked increases.

To improve the yield, some recovery mechanism is needed. A simple but effective solution is to add redundant TSVs which can be used to replace failed TSVs. This idea has been realized in 3D DRAM designs [6]. In the scheme proposed there, for every 4 signals, 6 TSVs are allocated as a group. A switching box is required for each group to select which 4 TSVs are actually used to transfer signals. The advantage of this structure is that the delays of all signals are almost identical. This is an attractive property for DRAM designs. Although this structure is suitable for the dedicated layout style of memory designs, the cost is too high for ASICs. Another fault tolerance scheme that utilizes redundant TSVs targets 3D network-on-chip (3DNoC) links [7]. Though significant yield improvement is achieved, the analysis and design flow are based on the dedicated network structure of 3DNoC. For ASICs, the analysis and design flow may not be suitable. In this paper, a redundant TSV architecture and related design issues are discussed. The proposed redundant TSV design can successfully recover most of the failed chips and increase the yield to 99.99% based on probabilistic models.

The rest of this paper is organized as follows. First, in Section II, the yield of TSV bonding is discussed. In Section III, the proposed architecture for TSV redundancy is introduced. Next, in Section IV, the recovery rate and the number of redundant TSVs required for the proposed architecture are analyzed. A probabilistic model is used for evaluation. The design issues for timing and the required design flow are explained in Section V. Finally, the conclusion of this work is given in Section VI.

This work was supported in part by the National Science Council of Taiwan, Republic of China, under grant NSC 98-2220-E-007-024, NSC 98A052

II. FAILURE RATE ANALYSIS FOR TSV

The fabrication of TSV-based 3D ICs can be partitioned into the following stages. First, dies of each tier are fabricated individually. The fabrication of TSVs in each tier takes place in this stage. Depending on the technology (TSV first/last), either reactive-ion etching (RIE) or laser drilling is performed before the TSV metallization process. According to the diameter and aspect ratio of the TSV, a proper material (Cu or W [5]) is selected for metallization. In general, the size of a TSV is much larger than other on-chip devices. This leads to certain unique defect features for TSV forming [8]. For example, a void may form in a TSV and cause it to fail [10]. After the fabrication of TSVs, wafer thinning is performed. Presently, most 3D IC processes require each tier to be less than 100 microns [5]. The surface roughness is an important factor for the yield of the later bonding stage. When the dies of consecutive tiers are stacked, the TSVs of the die in the upper tier need to be bonded to the bond pads of the die on the lower tier, as shown in Figure 1. Due to the alignment problem, a bond pad is required
Fig. 1. Bonding between TSVs and Bond Pads (diagram labels: Upper Tier, Lower Tier, TSV)
[Plot, caption not recovered. Y-axis: % of Chip, ticks 70.00% to 90.00%; x-axis ticks 300 to 500; curves: f=0.0001, #tier=2; f=0.0001, #tier=5; f=0.00002, #tier=2; f=0.00002, #tier=5.]
[Diagram, caption not recovered. Labels: R_TSV, TSV_0 to TSV_3, FAILED, and paired inputs marked 0 and 1.]
Fig. 3. Architecture for Redundancy TSV (diagram labels: sender; in_0, in_1, in_2, in_3)

Fig. 5. TSV Blocks

B. TSV Block and TSV-Chain

Due to manufacturing and physical design issues, TSVs are not recommended to be placed arbitrarily on a plane. From the aspect of manufacturing, a regular placement of TSVs improves the exposure quality of the lithographic process and therefore improves the yield. In real designs, TSVs are suggested to be placed regularly in TSV blocks which are determined in the floorplan stage. Inside each TSV block, TSVs are arranged in a grid-based structure to satisfy the pitch constraint of bond pads. Examples of TSV blocks are shown in Figure 5. Obviously, it is undesirable for a TSV-chain to contain TSVs of different TSV blocks due to the long wires needed for signal shifting. Therefore, a TSV-chain in our design is suggested to contain only TSVs in the same TSV block. Moreover, we let each TSV block contain only one redundant TSV. This means that, for each TSV block, only one TSV-chain is defined. Nevertheless, in terms of recovery rate, the number of TSVs in a TSV-chain needs to be limited. In case the number of TSVs in a TSV block is too large for one TSV-chain, the TSV block needs to be partitioned into a number of smaller TSV blocks.

The design issues for the proposed TSV-chain are listed [...] a chain

The first problem is related to the recovery of a 3D design. In Section IV, an analysis based on a probabilistic model is performed to answer this question. The second problem is related to the timing behavior of shifted signals. Discussions on timing issues and guidelines for TSV-chain design are presented in Section V.

IV. RECOVERY RATE ANALYSIS

In this section, the relation between the number of TSVs in each TSV block and the recovery rate is analyzed based on probabilistic models. First, based on the failure rate of a single TSV, the expected number of TSVs that may fail in a tier is discussed in Section IV-A. The result of Section IV-A determines the maximum number of failed TSVs that are expected to be recovered by our proposed TSV-chains. Next, for an expected number of failed TSVs in each tier, the required number of TSV-chains as well as the size limit of each TSV block are discussed in Section IV-B.

A. Analysis on the Expected Number of TSVs to be Recovered

Let $F$ stand for the failure rate of a single TSV and $N$ stand for the number of TSVs in a tier. The probability that exactly $n$ TSVs fail in a tier can be expressed as

$$P_{ftsv=n} = C^{N}_{n} \times \left(F^{n} \cdot (1 - F)^{N-n}\right)$$

where $C^{N}_{n}$ represents the number of combinations of $N$ TSVs with $n$ of them failed and $F^{n} \cdot (1 - F)^{N-n}$ represents the probability that the $n$ chosen TSVs fail while the other $N - n$ TSVs do not. Next, the term $C\_Ratio_{n}$ is defined as the probability that the number of failed TSVs is no greater than $n$, including the fault-free condition (that is, $n = 0$). This can be computed by accumulating $P_{ftsv=i}$ for $0 \le i \le n$ and can be expressed as

$$C\_Ratio_{n} = \sum_{i=0}^{n} P_{ftsv=i}.$$

The values of $P_{ftsv=n}$ and $C\_Ratio_{n}$ for $F = 0.0001$ and $N = \{300, 400, 500\}$ are listed in Table I.
TABLE I
$P_{ftsv=n}$ AND $C\_Ratio_{n}$ WHEN $F = 0.0001$

N      n      P_{ftsv=n}      C_Ratio_n
300    0      97.0444%        97.0444%
300    1      2.9116%         99.9560%
300    2      0.0435%         99.9996%

[Plot, caption not recovered. Y-axis: Recovery Rate, ticks 50.00% to 100.00%.]
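The binomial expressions of Section IV-A can be checked numerically. The following is a minimal sketch, not the authors' code; the function names `p_failed` and `c_ratio` are hypothetical and simply mirror $P_{ftsv=n}$ and $C\_Ratio_{n}$ as defined in the text.

```python
from math import comb


def p_failed(n: int, N: int, F: float) -> float:
    """Probability that exactly n of N TSVs fail, each independently with rate F."""
    return comb(N, n) * F**n * (1 - F) ** (N - n)


def c_ratio(n: int, N: int, F: float) -> float:
    """Probability that at most n of N TSVs fail (includes the fault-free case n = 0)."""
    return sum(p_failed(i, N, F) for i in range(n + 1))


if __name__ == "__main__":
    F = 0.0001  # single-TSV failure rate used in Table I
    for N in (300, 400, 500):
        for n in range(3):
            print(f"N={N} n={n} "
                  f"P={p_failed(n, N, F):.4%} C_Ratio={c_ratio(n, N, F):.4%}")
```

For $N = 300$ the printed values agree with the surviving rows of Table I (97.0444%, 2.9116%, 0.0435% and the cumulative 97.0444%, 99.9560%, 99.9996%).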
Fig. 7. All Possible Shifting Situations for a TSV-chain of Size 6 when 1 TSV is Failed (diagram labels: tsv_0 to tsv_4; shifting probabilities 1/5, 2/5, 3/5, 4/5, 5/5)
[Plot axes for Fig. 8. X-axis: # of TSVs in a TSV-Chain, ticks 10 to 100; y-axis: probability for the signal to be shifted, ticks 0% to 70%; legend entry: Timing Aware.]
Fig. 8. Evaluation on the Possibility for the Timing Sensitive Signal to Be Shifted

not be acceptable. In this section, timing issues for TSV-chain design are discussed in Section V-A. The discussion leads to the guidelines for linking TSVs in a TSV block as a chain. Candidate TSV-chain structures are proposed in Section V-B. In Section V-C, design issues in each stage of the 3D design flow are discussed.

A. Design Issues for Timing

As explained in Section III-A, when a TSV fails, one or more signals need to be shifted, depending on the position of the failed TSV in the TSV-chain. Due to the chaining structure, even under the assumption that each TSV has an identical failure rate, the probability for each TSV in a TSV-chain to be shifted varies. Figure 7 shows this situation. Assuming that 1 TSV fails in a TSV-chain of size 6, all possible shifting situations are enumerated in Figure 7. When no TSV fails and no shifting is required, the TSV-chain is shown in the right column of the first row, where the redundant TSV is denoted as tsv_r. For each row below, the left column indicates the failed TSV and the right column shows the shifting situation. The last row lists the shifting probabilities of the TSVs in the TSV-chain when 1 TSV fails. For tsv_0, no matter which TSV in the TSV-chain fails, it is always shifted because it is in the position next to the redundant TSV. On the contrary, tsv_4, which is at the head position of the TSV-chain, need not be shifted unless it fails itself. In terms of extra delays introduced by signal shifting, this property of the TSV-chain indicates that the probability that the delay of a signal linked by a TSV is increased depends on the position of the TSV in the TSV-chain. This means that, for signals that are timing critical, it is preferable to assign these signals to the head parts of TSV-chains.

An evaluation for an extreme case where only one signal is timing critical is shown in Figure 8. The x-axis stands for the number of TSVs in a TSV-chain and the y-axis stands for the probability that the timing critical signal is shifted. The line denoted as "Unaware" represents the case where the timing critical signal has equal probability of being located at any position of a TSV-chain. The line denoted as "Timing Aware" represents the case where the timing critical signal is always located at the beginning of a TSV-chain. Assume that the failure rate of each TSV is identical and there is only one failed TSV. The result in Figure 8 shows that, in "Unaware" cases, the probabilities for the timing critical signal to be shifted are greater than 50% in all cases. On the contrary, by assigning the timing critical signal to the head of a TSV-chain, the probability is reduced to 2.93% on average. Based on the evaluation, timing sensitive signals should always be routed through the TSVs located at the head of TSV-chains. This is one of the guidelines that should be followed when designing TSV-chains.

The next issue is to minimize the delay caused by signal shifting. This can be done by minimizing the distance between the connected TSVs in a TSV-chain. As mentioned in Section III-B, TSVs in each block are placed in a grid-based structure. Therefore, by requiring the connected TSVs in a TSV-chain to be neighbors in the grid-based structure, a minimal and fixed shifting delay can be guaranteed. This also makes the shifting delay predictable in early design stages. Thus, the second guideline for TSV-chain design is that any two connected TSVs in a TSV-chain must be next to each other in the grid-based structure.

B. TSV-chain Design Problem

For each TSV block in a plane, the structure of the TSV-chain needs to be considered. The analysis in Section V-A indicates that timing critical signals should always be routed through the TSVs located at the head parts of TSV-chains. In the current design flow, the signals that are assigned to each TSV block are roughly determined in the floorplan stage. However, the exact assignment of signals to TSVs does not necessarily have to be done in this stage. From the perspective of physical design, leaving the assignment of signals to TSVs to the routing stage helps minimize wire length. Therefore, in addition to the guidelines obtained in Section V-A, the design of a TSV-chain should also consider routing issues.

Based on the concept of a bounding box, a discussion on wire length is given first. For two pins on two different tiers to be connected, the relation between the bounding box of these two pins and a TSV block can be listed as follows. First, the bounding box and the TSV block can be non-overlapping. In this situation, only going through a TSV on the boundary of the TSV block can result in minimum wire length. Next, the TSV block can be either partially or completely overlapped by the bounding box. In this situation, any TSV that is overlapped by the bounding box can result in minimum wire length. Unless the bounding box is completely contained in the TSV block, a TSV on the boundary of the TSV block can always be found for minimum wire length. The discussion shows that TSVs on the boundary of a TSV block have higher
[Design flow figure, caption not recovered. 3D Partitioning: TSVs required for signals on each tier are determined. 3D Floorplanning: TSV blocks are determined; (1) partitioning is required for large TSV blocks; (2) the size of each TSV block is limited.]
[Figure labels, caption not recovered: Boundary of a Tier; Corner of a Tier.]
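The shifting probabilities discussed in Section V-A (Figures 7 and 8) can be reproduced with a small model. This is a sketch under stated assumptions rather than the authors' evaluation code: exactly one signal TSV fails, each with equal probability; position 1 is the head of the chain and the last position sits next to the redundant TSV; `chain_size` counts signal TSVs only; and the "Timing Aware" average is taken over the chain sizes 10, 20, ..., 100 read from the x-axis of Figure 8. All function names are hypothetical.

```python
from fractions import Fraction


def shift_probability(position: int, chain_size: int) -> Fraction:
    """P(signal at `position` is shifted), given that exactly one of the
    `chain_size` signal TSVs fails uniformly at random.

    A failure at position j shifts every signal at positions j..chain_size
    toward the redundant TSV, so position k is shifted when j <= k.
    """
    if not 1 <= position <= chain_size:
        raise ValueError("position must be within 1..chain_size")
    return Fraction(position, chain_size)


def unaware_probability(chain_size: int) -> Fraction:
    """Timing-critical signal placed uniformly at random in the chain."""
    total = sum(shift_probability(k, chain_size) for k in range(1, chain_size + 1))
    return total / chain_size


def timing_aware_probability(chain_size: int) -> Fraction:
    """Timing-critical signal always placed at the head of the chain."""
    return shift_probability(1, chain_size)


if __name__ == "__main__":
    # Five signal TSVs plus one redundant TSV, as in Figure 7.
    print([str(shift_probability(k, 5)) for k in range(1, 6)])

    sizes = range(10, 101, 10)  # assumed x-axis sample points of Figure 8
    avg_aware = sum(timing_aware_probability(n) for n in sizes) / len(sizes)
    print(f"timing-aware average: {float(avg_aware):.2%}")
    print(f"unaware, chain of 10: {float(unaware_probability(10)):.2%}")
```

Under these assumptions the unaware probability is $(n + 1) / (2n)$, which is above 50% for every chain size, and the timing-aware average over the ten sampled sizes comes to 2.93%, matching the figure quoted in the text.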