High-Bandwidth Memory Interface Design
Chulwoo Kim
[email protected]
Dept. of Electrical Engineering
Korea University, Seoul, Korea
February 17, 2013
Chulwoo Kim 1 of 86
Outline
Introduction
Clock Generation and Distribution
Transceiver Design
TSV Interface for DRAM
Summary
References
Outline
Introduction
DRAM 101
Simplified DRAM Architecture and Operation
Differences of DRAM (DDRx, GDDRx, LPDDRx)
Trend
Memory Interface: Differences and Issues
Clock Generation and Distribution
Transceiver Design
TSV Interface for DRAM
Summary
References
DRAM 101
SDRAM = Synchronous Dynamic Random Access Memory
SDR: Single Data Rate
DDR: Double Data Rate
Main Memory (DDRx): PC, Notebook, Server
Graphics Memory (GDDRx): Graphic Card, Console
Mobile Memory (LPDDRx): Phone, Tablet PC
[Figure: the MCU sends CLK & Command to the SDRAM and exchanges Data with it; CLK/DQ timing for SDR and DDR; CLK/Command/DQ timing showing CAS* latency and burst length]
*CAS : Column Address Strobe
DRAM DDR4 Die Photo
[1] K. B. Koo et al., ISSCC 2012, pp. 40-41
[Figure: die photo showing Bank 0 to Bank 15]
Supply Voltage: VDD=1.2V, VPP=2.5V
Process: 38nm CMOS / 3-metal
Banks: 4 bank groups, 16 banks
Data Rate: 2400 Mbps
Number of IOs: X4 / X8
Simplified DRAM Architecture
[Figure: four banks around a peripheral circuit. Bank: Cell Array, Row Decoder, Word Line Driver, Row Repair Fuse, Column Decoder, Column Repair Fuse, Write Drv. / Read Amp., and BLSA* on bit lines BLT/BLB with word line WL. Peripheral circuit: CLK/ADD/CMD Buffer, CMD Controller, DLL, Generator, DQ RX with serial to parallel, DQ TX with parallel to serial, clocks ICLK and DCLK]
* BLSA : Bit line sense amplifier
Concept of DRAM operation
WRITE : Serial to parallel (DQ → GIO)
READ : Parallel to serial (GIO → DQ)
[Figure: banks with BLSAs connect over the GIO (Np × Ndq bits) to the peripheral circuit, where DQ RX (serial to parallel) and DQ TX (parallel to serial) each handle Ndq bits]
*BLSA : Bit line sense amplifier
*GIO : Global I/O
*Np : Number of pre-fetch
*Ndq : Number of DQ
Pre-fetch Timing (DDR1, BL*=2)
[2] JEDEC, JESD79F, pp. 24-29
[Figure: timing of CLK, two RD commands with tCCD*=1, GIO, DQS, and DQ bursts (0 1, 0 1) appearing after CL*]
Number of GIO channels = Np × Ndq = 2 × 8 = 16 (DDR1 x8)
* tCCD : CAS to CAS delay
* CL : CAS latency
* BL : Burst length
Pre-fetch Diagram(DDR1)
Num. of GIO channels = 2 × Ndq
Pre-fetch operation
2-bit pre-fetch
[2 × Ndq] data access
(If the output data rate is 400 Mbps, the internal data rate is 200 Mbps)
[Figure: bank array sharing the GIO]
Pre-fetch Timing (DDR2, BL=4)
[3] JEDEC, JESD79-2F, pp. 35
[Figure: timing of CLK, two RD commands with tCCD=2, GIO, DQS, and DQ bursts (0 1 2 3, 0 1 2 3) appearing after RL*]
Number of GIO channels = Np × Ndq = 4 × 8 = 32 (DDR2 x8)
* RL : READ latency
Pre-fetch Diagram(DDR2)
Num. of GIO channels = 4 × Ndq
Pre-fetch operation
4-bit pre-fetch
[4 × Ndq] data access
(If the output data rate is 800 Mbps, the internal data rate is 200 Mbps, same as DDR1)
[Figure: bank array sharing the GIO]
Pre-fetch Timing (DDR3, BL=8)
[4] JEDEC, JESD79-3F, pp. 62
[Figure: timing of CLK, two RD commands with tCCD=4, GIO, DQS, and DQ bursts (0 to 7, 0 to 7) appearing after RL]
Number of GIO channels = Np × Ndq = 8 × 8 = 64 (DDR3 x8)
Pre-fetch Diagram(DDR3)
Num. of GIO channels = 8 × Ndq
Pre-fetch operation
8-bit pre-fetch
[8 × Ndq] data access
(If the output data rate is 1.6 Gbps, the internal data rate is 200 Mbps, same as DDR1)
[Figure: bank array sharing the GIO]
Bank Grouping Timing (DDR4, BL=8)
[5] JEDEC, JESD79-4, pp. 77-78
[6] T. Y. Oh et al., ISSCC 2010, pp. 434-435
[Figure: timing of CLK, RD commands to bank groups G0, G1, G1 with tCCD_S=4 and tCCD_L=5, per-group GIO buses (GIO_BG0 to GIO_BG3), DQS, and three 8-bit DQ bursts appearing after RL]
Number of GIO channels = Np × Ndq × Ngroup = 8 × 8 × 4 = 256 (DDR4 x8)
Pre-fetch & Bank Grouping (DDR4)
[1] K. B. Koo et al., ISSCC 2012, pp. 40-41
Num. of GIO channels = 8 × Ndq
Pre-fetch operation
8-bit pre-fetch
Bank grouping
[Figure: banks arranged in Group0 to Group3, connected through a GIO MUX]
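The GIO channel counts quoted for each generation follow from one product, Np × Ndq × Ngroup, and the internal data rate is the pin rate divided by the pre-fetch depth. The sketch below reproduces that arithmetic for a x8 device. The `Generation` class and function names are illustrative, and the DDR4 pin rate is simply the 2400 Mbps figure from the die-photo slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Generation:
    name: str
    prefetch: int          # Np, bits fetched per DQ per column access
    bank_groups: int       # Ngroup, 1 when there is no bank grouping
    pin_rate_mbps: float   # example output data rate per pin


def gio_channels(gen: Generation, ndq: int = 8) -> int:
    """Number of GIO channels = Np x Ndq x Ngroup."""
    return gen.prefetch * ndq * gen.bank_groups


def internal_rate_mbps(gen: Generation) -> float:
    """Internal (core-side) data rate = pin rate / pre-fetch depth."""
    return gen.pin_rate_mbps / gen.prefetch


GENERATIONS = [
    Generation("DDR1", prefetch=2, bank_groups=1, pin_rate_mbps=400),
    Generation("DDR2", prefetch=4, bank_groups=1, pin_rate_mbps=800),
    Generation("DDR3", prefetch=8, bank_groups=1, pin_rate_mbps=1600),
    Generation("DDR4", prefetch=8, bank_groups=4, pin_rate_mbps=2400),
]

if __name__ == "__main__":
    for gen in GENERATIONS:
        print(gen.name, gio_channels(gen), internal_rate_mbps(gen))
```

DDR1 to DDR3 all land on a 200 Mbps internal rate, which is the point of deepening the pre-fetch: the core speed stays put while the pin rate doubles each generation.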
Differences of DDRx, GDDRx, LPDDRx
                 DDRx         GDDRx          LPDDRx
Architecture     [Figure: bank and PAD floor plan of each type]
Application      PC/Server    Graphic card   Mobile/Consumer
Socket           DIMM         On board       MCP*/PoP*/SiP*
IO               4/8          16/32          16/32
Unique Function
  GDDRx : Single uni-directional WDQS, RDQS; VDDQ termination; CRC, DBI; ABI
  LPDDRx : No DLL; DPD*; PASR*; TCSR*
* MCP : Multi chip package
* PoP : Package on package
* SiP : System in package
* DPD : Deep power down
* PASR : Partial array self refresh
* TCSR : Temperature compensated self refresh
DDR Comparison
                      DDR1         DDR2         DDR3         DDR4
VDD [V]               2.5          1.8          1.5          1.2
Data Rate [bps/pin]   200M~400M    400M~800M    800M~2.1G    1.6G~3.2G
Pre-Fetch             2 bit        4 bit        8 bit        8 bit
STROBE                Single DQS   Differential DQS, DQSB (DDR2 to DDR4)
Interface             SSTL_2       SSTL_18      SSTL_15      POD_12
New Feature
  DDR2 : OCD calibration, ODT
  DDR3 : Dynamic ODT, ZQ calibration, Write leveling
  DDR4 : CA parity, DBI*, CRC*, Gear down, CAL*, PDA*, FGREF*, TCAR*, Bank grouping
* DBI : Data bus inversion
* CRC : Cyclic redundancy check
* CAL : Command address latency
* PDA : Per DRAM addressability
* FGREF : Fine granularity refresh
* TCAR : Temperature controlled array refresh
GDDR Comparison
                      GDDR1       gDDR2       GDDR3        GDDR4        GDDR5
VDD [V]               2.5         1.8         1.5          1.5          1.5/1.35
Data Rate [bps/pin]   300~900M    800M~1G     700M~2.6G    2.0G~3.0G    3.6G~7.0G
Pre-Fetch             2 bit       4 bit       4 bit        8 bit        8 bit
Interface             SSTL_2      SSTL_2      POD-18       POD-15       POD-15
STROBE
  GDDR1 : Single DQS
  gDDR2 : Differential bi-directional DQS*, DQSB
  GDDR3 onward : Single uni-directional WDQS, RDQS
New Feature
  gDDR2 : OCD* calibration, ODT*
  GDDR3 : ZQ
  GDDR4 : DBI, Parity (opt)
  GDDR5 : No DLL, PLL (option), WCK/WCKB, CRC, ABI*, RDQS (option), Bank grouping
* DQS : DQ strobe signal; DQ is the data I/O pin
* OCD : Off chip driver
* ODT : On die termination
* ABI : Address bus inversion
LPDDR Comparison
                      LPDDR1       LPDDR2          LPDDR3
VDD [V]               1.8          1.2             1.2
Data Rate [bps/pin]   200M~400M    200M~1066M      333M~1600M
Pre-Fetch             2 bit        4 bit           8 bit
STROBE                DQS          DQS_T, DQS_C    DQS_T, DQS_C
Interface             SSTL_18*     HSUL_12*        HSUL_12*
DLL                   X            X               X
New Feature
  LPDDR2 : CA pin
  LPDDR3 : ODT (High tapped termination)
* SSTL : Stub series terminated logic
* HSUL : High speed un-terminated logic
Trend
Although all types of DRAM are reaching their limits in supply voltage, the demand for high-bandwidth memory keeps increasing
[Figure: VDD [V] versus data rate [Gbps] for DDR1 to DDR4, GDDR1 to GDDR5 (including gDDR2), and LPDDR1 to LPDDR3]
Memory Interface
System Feature
Single-ended/high speed
Many channels (susceptible to coupling effects)
DDR: multi-drop (multi rank, multi DIMM)
GDDR: point to point
Impedance discontinuities (stubs, connector, via, etc.)
Issue
Reflection
Inter-symbol interference
Simultaneous switching output noise
Pin to pin skew
Poor transistor performance
[Figure: a CPU connected to many DRAMs (multi-drop) and a GPU connected to a few DRAMs (point to point)]
Outline
Introduction
Clock Generation and Distribution
Delay-locked loop (DLL)
Duty cycle corrector (DCC)
Clock distribution
Transceiver Design
TSV Interface for DRAM
Summary
References
Basic DLL Architecture
[Figure: DLL block diagram. The external Clock enters the DRAM through a delay tD1 as I_CLK; the Variable Delay Line (tD_VDL) produces O_CLK; the Replica Delay (tD_REP) produces FB_CLK; the PD and Controller compare I_CLK with FB_CLK; an output path of delay tD2 clocks out DATA from the memory core. Waveforms of Clock, I_CLK, FB_CLK, O_CLK, and Data]
Locking condition:
tCK × N = tD_VDL + tD_REP
tD_REP ≈ tD1 + tD2
tCK × N = tD_VDL + tD1 + tD2 + Δ
Δ = tD_REP − (tD1 + tD2)
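The locking condition tCK × N = tD_VDL + tD_REP can be turned into a small calculation: pick the smallest N for which the variable delay line is long enough, then see how far the data edge lands from the external clock when the replica does not exactly equal tD1 + tD2. This is only a sketch; the function names, the minimum delay-line value, and the picosecond figures are illustrative assumptions.

```python
import math


def lock_point(t_ck, t_rep, t_vdl_min=0):
    """Return (N, tD_VDL) such that N * tCK = tD_VDL + tD_REP.

    N is the smallest integer for which tD_VDL is at least t_vdl_min,
    the shortest delay the variable delay line can produce (assumed).
    All times are in picoseconds.
    """
    n = max(1, math.ceil((t_vdl_min + t_rep) / t_ck))
    return n, n * t_ck - t_rep


def clock_to_data_skew(t_ck, t_d1, t_d2, t_rep, t_vdl_min=0):
    """Offset of the data edge from the external clock edge after lock.

    The real path is tD1 + tD_VDL + tD2; the DLL only sees the replica,
    so the residual equals (tD1 + tD2) - tD_REP.
    """
    n, t_vdl = lock_point(t_ck, t_rep, t_vdl_min)
    return (t_d1 + t_vdl + t_d2) - n * t_ck


if __name__ == "__main__":
    # Perfect replica: data lines up with the external clock.
    print(clock_to_data_skew(2500, 800, 1000, 1800, 200))
    # Replica 50 ps short: the mismatch appears directly at the output.
    print(clock_to_data_skew(2500, 800, 1000, 1750, 200))
```

Any replica mismatch passes straight through to the output timing, which is why the replica has to track the input buffer and output driver across PVT.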
Replica Delay Mismatch
[Figure: tDQSCK* (or tAC) variation [ps] versus supply voltage [V] (VDDL, VDD, VDDH) and versus tCK (short to long), showing the valid data window for replica mismatch = 0, > 0, and < 0]
*tDQSCK (or tAC) : DQS output access time from CK/CKb
Locking Range Considerations
[7] H.-W. Lee et al., submitted to TVLSI
[Figure: tDQSCK (or tAC) versus tCK (short to long) showing the "bird's beak"; I_CLK and FB_CLK waveforms comparing tD_INIT + tD_REP with tD_REQUIRED]
tD_INIT = tD_VDL(0) + tD_REP
N × tCK > tD_VDL(0) + tD_REP
Synchronous Mirror Delay (SMD)
[8] T. Saeki et al., ISSCC 1996, pp. 374-375
Basic Operation
Measure and replicate the delay
No feedback
Match delay in two cycles
[Figure: Clock enters through delay tD1 as I_CLK, passes the Replica Delay (tD1 + tD2), the Delay Measure Delay Line (tD3), and the Replicate Delay Line (tD3), and leaves through delay tD2 as OUT; waveforms show the Measure and Replicate phases]
Disadvantages of SMD
Disadvantages
Mismatch between replica delay and input buffer & clock
distribution
Coarse resolution
Input jitter multiplication
[Figure: SMD block diagram (Delay Measure Delay Line, Replicate Delay Line; delays tD1, tD2, tD1 + tD2, tD3) with Clock, I_CLK, and OUT waveforms without and with jitter. Without jitter the measured delay is tCK − (tD1 + tD2); with input jitter the measured delay gains an extra jitter term, and the output pk-pk jitter is twice the input pk-pk jitter]
Register Controlled DLL
Locking information is stored digitally in register
Vernier type delay line increases resolution
[9] A. Hatakeyama et al., ISSCC 1997, pp. 72-73
[Figure: vernier delay line between IN and OUT with a Main Delay Line and a Sub Delay Line. One line uses unit cells of delay tD (fan-out=1), the other uses cells of slightly larger delay (fan-out=2); switches SW0 to SW4 (SW(n-1), SW(n)) select the tap]
Single Register Controlled Delay Line
[Figure: I_CLK feeds a Coarse Delay line of unit delays (tUD) selected by CSL1 to CSL3; its outputs OUT1 and OUT2 drive IN1 and IN2 of a Phase Mixer with weights 1−K and K, producing OUT12. A Fine Delay Controller receives UP/DN* from the PD]
*DN = Down
Boundary Switching Problem
OUT12 = IN1 × (1−K) + IN2 × K
Coarse shift & fine reset do not occur simultaneously
[Figure: I_CLK paths passing through 3 and 4 UDCs* feed IN1 (K=0) and IN2 (K=1) of the Phase Mixer, tUD apart; a "shift left" is requested at K=0.9]
*UDC = Unit delay cell
Seamless Boundary Switching
[10] J.-T. Kwak et al., VLSI 2003, pp. 283-284
Dual Coarse Delay Line
OUT12 = IN1 × (1−K) + IN2 × K, with 0 ≤ K ≤ 1
Fine set first and then coarse shift
[Figure: a dual coarse delay line of Unit Delay Cells (tUD) drives IN1 (K=0) and IN2 (K=1) of the Phase Mixer; on a "shift left" at K=0.9 the fine code first moves to K=1.0 and the coarse line shifts afterwards]
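The boundary switching problem and its dual-delay-line fix can be checked numerically with the mixer relation OUT12 = IN1 × (1−K) + IN2 × K. The sketch below is a behavioral model only, not the circuit of [10]: the 100 ps unit delay, the ten-step fine code, and all function names are assumptions chosen so the arithmetic stays in integers.

```python
T_UD = 100   # unit delay of one coarse cell, in ps (illustrative value)
K_MAX = 10   # fine code range: K = k / K_MAX, with k = 0..K_MAX


def mix(d_in1, d_in2, k):
    """Phase mixer delay: IN1 * (1 - K) + IN2 * K, all delays in ps."""
    return (d_in1 * (K_MAX - k) + d_in2 * k) // K_MAX


def single_line_jump(n, k=9):
    """Delay jump when the coarse line shifts but the fine code is not yet reset.

    Before: IN1 = n cells, IN2 = n + 1 cells. After the coarse shift alone:
    IN1 = n + 1 cells, IN2 = n + 2 cells, still with the old K.
    """
    before = mix(n * T_UD, (n + 1) * T_UD, k)
    after = mix((n + 1) * T_UD, (n + 2) * T_UD, k)
    return after - before


def seamless_sweep(n_steps, a=0, b=1, k=0):
    """Increase the delay step by step with a dual coarse delay line.

    IN1 taps cell a and IN2 taps cell b. A coarse input is only moved
    (by two cells) while its mixer weight is zero, so the output never jumps.
    """
    direction = 1  # +1: K moves toward IN2, -1: K moves toward IN1
    delays = [mix(a * T_UD, b * T_UD, k)]
    for _ in range(n_steps):
        if k == K_MAX and direction == 1:
            a += 2          # IN1 weight is 0: shifting it is invisible
            direction = -1
        elif k == 0 and direction == -1:
            b += 2          # IN2 weight is 0: shifting it is invisible
            direction = 1
        else:
            k += direction  # ordinary fine step
        delays.append(mix(a * T_UD, b * T_UD, k))
    return delays


if __name__ == "__main__":
    print("single-line glitch:", single_line_jump(3), "ps")
    print("seamless sweep:", seamless_sweep(25))
```

In the single-line case the output jumps by a full tUD, while the seamless sweep never moves by more than one fine step.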
Adaptive Bandwidth DLL w/ SDVS*
[11] H.-W. Lee et al., ISSCC 2011, pp. 502-504
Update period: m × tCK − tD_REP + tD_REP = m × tCK
m = 2: BW_DLL = 1/(2 × tCK)
[Figure: DLL (Variable Delay Line, Replica Delay, PD, Controller) with an Update Period Pulse Gen. that derives the Update Pulse from I_CLK and FB_CLK and sends NCODE<0:N> to the upper block]
[Figure: Fine Unit Delay [ps] vs. Mode (DN, BASE, UP; Low-Speed Mode, Base, High-Speed Mode), with values 15.9 ps, 10.2 ps, and 7.8 ps]
*SDVS : Self-dynamic voltage scaling
Duty Cycle Corrector (DCC)
DCC
Reduces duty cycle error
Enlarges valid data window for DDR
Needs to correct 15% duty error at max speed
Can be implemented either in analog or digital type
DCC Design Issues
Location of DCC (before/after DLL)
Embedded in DLL or not
Power consumption
Area
Operating frequency range
Locking time in case of digital DCC
Offset of duty cycle detector
Digital DCC
[Figure: three digital DCC schemes, each producing a 50%/50% OUT from IN. (1) Invert and delay: Invert-Delay Clock Generator followed by a Phase Mixer. (2) Pulse Width Controller with a Duty Cycle Detector in feedback. (3) Half-Cycle Delayed Clock Generator (HD_IN) followed by an Edge Combiner]
DCC in GDDR5
[12] D. Shin et al., VLSI 2009, pp. 138-139
DCA is not in clock path
No jitter addition
[Figure: GDDR5 clock path from WCK/WCKb through the RX (rxclk/rxclkb), Divider, selectable PLL, Global Driver, Repeater, and clock distribution network to the CML2CMOS converters at the DQs (4-phase clocks). A Duty Cycle Detector, adder-based Counter, and Control Pulse Generator produce up/dn and codes s<1:4>, c<1:5>; a Decoder drives the Duty Cycle Adjuster (DCA), built from X1/X2/X4/X8 legs, which sits beside the RX rather than in the clock path]
DLL-related Parameters & Reference
                   DDR1         DDR2         DDR3/DDR3L       GDDR3             GDDR4
VDD                2.5V         1.8V         1.5V/1.35V       1.8V              1.5V
Lock time          200 cycles   200 cycles   512 cycles       2~5K cycles       2~20K cycles
Max. tDQSCK        600ps        300ps        225ps            180ps             140ps
Nominal speed      166MHz       333MHz       333MHz~800MHz    600MHz~1.37GHz    1.6GHz
tXPDLL*(tXARD)     1tCK         2tCK         10tCK            7tCK+tIS          9tCK+tIS
Max. tCK           12ns         8ns          3.3ns            3.3ns             2.5ns
REFERENCE Type: digital, * mixed, ** analog
RELATED AREA: DCC block, Variable Delay Line, Delay Control Logic, Replica, Low Jitter
[Table residue: reference numbers per related area and DRAM type]
13 14 15** 16 17** 19** 20 21** 18
23** 14 18 19** 20 22 24 25* 26
23* 26 13 15** 16 18 20 21**
31 32* 33**
27[28]** [29] [30]
29 30**
34* 35*
32 27 [28** 30**
14 [36* 15** 16 32* 24 26 27 17** 19**
14 25* 28**
*tXPDLL (tXARD) : Timing for exit precharge power-down to any non-READ command
Clock Distribution
[37] S.-J. Bae, et al., ISSCC, 2011, pp. 498-500
Clock Distribution Issues
Clock skew among DQs
Low power
Robust under PVT variations
CML to CMOS converter jitter
[Figure: CK/CKB enter a Global Clock Buffer and are distributed to the rows of DQ pads; chip dimension labels]
CML to CMOS Converter
Global Clock Buffer
Current mode logic (CML) : high-speed clock
CML to CMOS Converter Issue
Susceptible to noise
Jitter
[Figure: Global Clock Buffer (CLKP/CLKN in, OUTP/OUTN out) driving a 1700 μm clock line to the CML to CMOS Converter (CLKP/CLKN in, CLKOUT out) at the DQ]
Outline
Introduction
Clock Generation and Distribution
Transceiver Design
Channel
Pre-emphasis
Equalizer
Crosstalk and skew
Training
Input buffer
Output driver
DBI/CRC
TSV Interface for DRAM
Summary
References
[Figure: TX side (output driver, pre-emphasis, training, DBI/CRC) connected through the channel (CH) to the RX side (input buffer, equalizer, training, DBI/CRC)]
Channel Characteristics
GDDRx
Point to point connection
Performance target
High data rate
Few reflection components
PCB VIAS
DDRx
Multidrop
Performance and power
Many reflection components
PCB VIAS, DIMM connector.
[Figure: a GPU connected point to point to GDDRx devices; a CPU socket connected to DIMM slots]
Emphasis for Channel Compensation
[Figure: time domain (amplitude vs. time): the original signal passes through the channel and arrives distorted; with an FFE in front of the channel, D(in) is shaped so that D(out) is restored. Frequency domain (amplitude vs. frequency up to fdata/2): channel response, FFE response, and combined FFE + channel response]
Pre-emphasis vs. De-emphasis
Pre-emphasis : Transition Bit Boosting
De-emphasis : Non-transition Bit Suppression
1-tap pre-emphasis
No emphasis
1-tap de-emphasis
[Figure: waveforms versus time, each referenced to amplitude Va, for 1-tap pre-emphasis, no emphasis, and 1-tap de-emphasis]
Basic De-emphasis Circuit
The Number of Taps
Depends on the channel quality and bit rate
Usually from one to three taps
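<1-tap de-emphasis model> Y(n) = K0 × X(n) − K1 × X(n−1), where X(n−1) comes from a unit delay
<Basic 1-tap de-emphasis circuit> [Figure: Din drives the main tap and a D flip-flop (Q/QB); the delayed, inverted bit drives the second tap, and both taps sum at Dout]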
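The 1-tap de-emphasis model, Y(n) = K0 × X(n) − K1 × X(n−1), is a two-tap FIR filter. The sketch below applies it to a short NRZ pattern; the coefficient values (K0 = 0.75, K1 = 0.25) and the assumption that the line idles at the first symbol's level are illustrative choices, not values from the slides.

```python
def de_emphasis(bits, k0=0.75, k1=0.25):
    """Apply Y(n) = K0 * X(n) - K1 * X(n-1) to an NRZ bit sequence.

    Bits are mapped to +1 / -1. The symbol before the first bit is
    assumed equal to the first bit (line idling at that level).
    """
    x = [1 if b else -1 for b in bits]
    prev = x[0]
    out = []
    for cur in x:
        out.append(k0 * cur - k1 * prev)
        prev = cur
    return out


if __name__ == "__main__":
    pattern = [0, 1, 1, 1, 0, 0, 1]
    print(de_emphasis(pattern))
```

Transition bits keep the full swing (±1.0) while repeated bits are suppressed to ±0.5, which matches the description of de-emphasis as non-transition bit suppression.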
Pre-emphasis Circuit[1/2]
Cascaded Pre-emphasis
Internal node ISI due to limited TR performance at high speed
Internal node pre-emphasis ratio would not be affected by the
channel
Less sensitive to the system environment or channel variations
[38] K.-H. Kim et al., JSSC, Jan 2006, pp. 127-134
[Figure: cascaded 4:2 and 2:1 multiplexers generate Din(n), Din(n-1), and Din(n-2) for the Driver and Pre-emph. stages driving DQ/DQB. Waveforms (voltage [V] vs. time [psec], 0 to 400) compare no pre-emphasis, conventional pre-emphasis, and proposed pre-emphasis, with voltage labels between 1.00 and 1.20]
Pre-emphasis Circuit[2/2]
[39] H. Partovi et al., ISSCC, 2009, pp.136-137
Voltage Mode Driver Pre-emphasis
Additional zero by Cc
Time continuous pre-emphasis
[Figure: TX with pre-drivers, a Main Driver with termination RT, and a Pre-Emph. Driver coupled to the output through the boosting capacitor CC; the channel (CH, bandwidth BW) connects Dout to the GPU with load CL and termination RT. Equivalent linear model with Din, RC, CC, CP, RT, CL, and Dout]
Decision Feedback Equalization (DFE)
[40] Y. Hidaka, CMOS Emerging Technologies Workshop, May 2010
DFE cancels ISI without noise amplification
Clock must be provided by DLL or PLL
Critical path (feedback path) is important
[Figure: amplitude vs. time panels: (A) 1 UI pulse, (B) received pulse with ISI, (C) emulated ISI, (D) result with no ISI]
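A 1-tap DFE subtracts an emulated copy of the ISI left by the previous decision before slicing the current symbol. The sketch below uses a toy channel with one post-cursor tap; the tap weight of 0.5, the noiseless channel, and the function names are assumptions made for illustration.

```python
def channel(symbols, a1=0.5):
    """Toy channel: each +1/-1 symbol leaks a1 of its value into the next UI."""
    out, prev = [], 0
    for s in symbols:
        out.append(s + a1 * prev)
        prev = s
    return out


def dfe_receive(rx, a1=0.5):
    """1-tap DFE: subtract a1 * (previous decision), then slice at zero.

    Returns the decisions and the slicer input magnitudes (eye margin).
    """
    decisions, margins, prev = [], [], 0
    for r in rx:
        corrected = r - a1 * prev      # emulated ISI removed
        d = 1 if corrected > 0 else -1
        decisions.append(d)
        margins.append(abs(corrected))
        prev = d                       # feedback must settle within 1 UI
    return decisions, margins


if __name__ == "__main__":
    tx = [1, 1, -1, 1, -1, -1, 1]
    rx = channel(tx)
    decisions, margins = dfe_receive(rx)
    print("raw margin:", min(abs(r) for r in rx))
    print("DFE margin:", min(margins), "decisions ok:", decisions == tx)
```

The feedback uses decisions rather than the noisy analog input, so the ISI is removed without amplifying noise; the cost is that `prev` must be available before the next sample, which is the critical feedback path.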
Fast Feedback 1-tap DFE
[41] S.-J. Bae et al., ISSCC, 2008, pp. 278-279
The previously captured data must be fed back to the receiver within 1UI
T_FB = T_SA < 1UI
[Figure: four DFE sense amplifiers (DFE SA) sample DQ against Vref on the clocks WCK/2_0, WCK/2_90, WCK/2_180, and WCK/2_270. Their outputs P0, P90, P180, P270 feed the following SA and the SR latches that produce D0, D90, D180, D270. Timing shows the precharge and evaluation phases and the feedback removing the ISI]
Crosstalk
Crosstalk is coupling of energy from one line to another
Timing Effect
Timing Jitter
Signal Integrity
[Figure: near-end and far-end crosstalk between two lines coupled by mutual capacitance Cm and mutual inductance Lm, for an input signal travelling from the near end to the far end]
I_near = I_Cm + I_Lm
I_far = I_Cm − I_Lm
Staggered Memory Bus
No discrepancy of propagation delay due to the crosstalk
Difference of transition point is /2
Distance between channels with the same transition is
increased
Jitter due to coupling from the adjacent channel is reduced
[42] K.-I. Oh et al., JSSC, Aug. 2009, pp. 2222-2232
[Figure: MCU and DRAM connected by a staggered memory bus, with adjacent channels switching at offset transition points]
Glitch Canceller
[42] K.-I. Oh et al., JSSC, Aug. 2009, pp. 2222-2232
Compensation for glitch by adding or subtracting current
Rise : I_COMP is added to the main driver
Fall : I_COMP is subtracted from the main driver
[Figure: transmitters TX1 to TX3 with data D_TX1 to D_TX3; a Transition Detector senses Rise/Fall on the aggressor and adjusts the victim driver current around I_BIAS by I_COMP]
Crosstalk Equalizer (TX)
[37] S.-J. Bae et al., ISSCC, Feb. 2011, pp. 498-500
Crosstalk equalization at transmitter
Cancel the crosstalk by the impedance calibration
[Figure: Crosstalk Equalizing Driver for DQ[0], taking DO[0] and the neighboring DO[1:3] with enables EN[0:5]; waveforms of DO[0], DO[1], and DQ[0]]
Skew
Differences of flight time between signals
Skew can cause timing errors
Key design criterion in high-speed systems
[Figure: MCU/GPU connected to the DRAM by CLK, Command, Address, DQS, and DQ lines, each with its own flight time TD]
Pre/De-skew with Preamble Signal
Skew cancellation circuit is put in each DRAM
With estimated skew information
De-skew the data during write mode
Pre-skew the data during read mode
[43] S. H. Wang et al., JSSC, Apr. 2001, pp. 648-657
[Figure: skewed Data[n] pass through Data Delay Lines set by a Skew Estimator and Register Files; a PLL and Mux derive the sampling clock from Ext.Clk, and the output is the de-skewed data]
Fly-by Topology for DDR3
[4] JEDEC, JESD79-3F, pp. 56-59
T-branch Topology
CLK/CMD/Address are applied to each DRAM in parallel
Small skew between CLK and DQS
Fly-by Topology
Better signal integrity by reducing the number of stubs and the stub length
Easy to apply a single termination at the end of the signal line
DQ and DQS are applied to each DRAM at the same time
Large skew between CLK and DQS
Need to calibrate skew
[Figure: T-branch and fly-by routing of CLK, CMD, Address to DRAM #1 to #8 (fly-by terminated to VTT), with DQ & DQS routed to each DRAM and skew [s] plotted per DRAM]
Write Leveling for DDR3
Write Leveling
Timing mismatch compensation between CLK and DQS
Write leveling is applied to each DRAM individually
[4] JEDEC, JESD79-3F, pp. 56-59
[Figure: write-leveling timing (T0 to Tn). At the source, CK/CK# and diff_DQS are aligned; at the destination, diff_DQS arrives skewed against CK/CK# and DQ returns 0 or 1. DQS is pushed until DQ captures the 0-to-1 transition]
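Write leveling can be pictured as a search loop: the DRAM samples CK on each DQS rising edge and reports the value on DQ, and the controller keeps pushing DQS later until the reported value changes from 0 to 1. The sketch below models that loop with an ideal 50% duty clock; the clock period, step size, and function names are illustrative assumptions.

```python
def dram_sample_ck(dqs_arrival_ps, t_ck_ps):
    """Value of CK seen by the DRAM at a DQS rising edge.

    CK is modelled as high during the first half of each period,
    with a rising edge at every multiple of t_ck_ps.
    """
    return 1 if (dqs_arrival_ps % t_ck_ps) < t_ck_ps // 2 else 0


def write_level(initial_skew_ps, t_ck_ps=1250, step_ps=10):
    """Return the DQS delay at which the DRAM first reports a 0 -> 1 change."""
    prev = dram_sample_ck(initial_skew_ps, t_ck_ps)
    delay = 0
    for _ in range(2 * t_ck_ps // step_ps):
        delay += step_ps
        cur = dram_sample_ck(initial_skew_ps + delay, t_ck_ps)
        if prev == 0 and cur == 1:
            return delay  # DQS rising edge now sits on a CK rising edge
        prev = cur
    raise RuntimeError("no 0-to-1 transition found")


if __name__ == "__main__":
    for skew in (900, 100):
        d = write_level(skew)
        print(skew, d, (skew + d) % 1250)
```

After leveling, the residual offset between DQS and CK at the DRAM is below one delay step. Because fly-by routing gives every DRAM a different skew, the search runs once per device.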
Training for GDDR5
Adaptive Interface Training
Ensure the Widest Timing Margins for All Signals
Controlled by MCU
[44] W. Hubert et al., ATS, 2008, pp. 24-27
[Figure: GDDR5 Timing after Training, showing CK, CMD, ADDR, WCK, and DQ]
Training Sequence for GDDR5
[45] JEDEC, JESD212, pp. 23-39
Power Up : Detect the configuration and mirror function; ODT setting
Address Training (optional) : Optimize address input data eye
WCK2CK Alignment Training : Clock alignment
READ Training : Search for best read data eye; detect burst boundaries of read stream
WRITE Training : Search for best write data eye; detect burst boundaries of write stream
Exit : Ready for read/write
Training Example : Write Training
[44] W. Hubert et al., ATS, 2008, pp. 24-27
[Figure: write data eyes at the Memory Controller and at the GDDR5 Device. <Before Write Training> the controller launches every data eye at t0 and the eyes reach the device offset by t1 and t2. <After Write Training> the controller launch times are shifted per pin (labels t0 + t1 and t0 − t2) so that the eyes line up at t0 in the device]
Input Buffer
Convert attenuated external signal to rail-to-rail signal
Trade-off between high speed operation and power consumption
[Figure: MCU/GPU connected to the DRAM input buffers for CLK, Command, Address (m* lines), DQS, and DQ]
* m : the number of address channels, which depends on the memory type and density
Input Buffer Comparison
CMOS Type
Simple circuit
Low-speed input (CKE)
Susceptible to noise
Unstable threshold
Differential Type
Complex circuit
High-speed input
Robust to noise
Stable threshold
Commonly used
[Figure: CMOS-type buffer (In, En, OUT) and differential-type buffer (In, Vref, En, OUT)]
DDR4 Input Buffer
[46] K. Sohn et al., ISSCC, 2012, pp. 38-40
Gain Enhanced Buffer
Signal transition detector is added
The bias level (I) is controlled
Sensitivity can be enhanced
at higher frequencies
Wide Common-Mode Range DQ Buffer
Delivers stable inputs to
the second stage Amp.
Feedback network reduces the
output common-mode variation
[Figure: wide common-mode range DQ buffer with CMFB* followed by the second-stage Amp. (inputs In, Vref); gain-enhanced InBuffer whose bias current I is controlled by a Transition Detector]
* CMFB : Common-mode feedback
Pseudo Open Drain (POD)
Impedance Calibration
Manual vs. Automatic
External Resistor : 240 Ω
[Figure: I/O Buffer with Pull-UP and Pull-DOWN legs driven by Din, connected to the channel]
Impedance Calibration
Thermometer Code Control
[47] C. Park et al., JSSC, Apr. 2006, pp. 831-838
[Figure: the voltage at the ZQ PAD, set by the external resistor, is compared with Vref; n-bit PU and PD registers (PU REG, PD REG) generate PUcon and PDcon, which together with Din enable equal-sized legs (WP, WN, R) of the output driver at Dout]
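Thermometer-code impedance calibration amounts to switching in identical driver legs one at a time until a comparator sees the pad voltage cross Vref. The sketch below models the pull-up half against a 240 Ω external resistor. The per-leg resistances, the 1.2 V supply, and Vref = VDD/2 are illustrative assumptions, not values taken from [47].

```python
def calibrate_pull_up(r_leg_ohm, r_ext_ohm=240.0, vdd=1.2, max_legs=31):
    """Enable identical pull-up legs until the ZQ pad voltage reaches Vref.

    The pull-up (n legs in parallel) and the external resistor to ground
    form a divider; Vref = VDD / 2 is crossed when R_pullup <= R_ext.
    Returns (number of enabled legs, resulting pull-up resistance).
    """
    vref = vdd / 2
    for n in range(1, max_legs + 1):
        r_pu = r_leg_ohm / n
        v_pad = vdd * r_ext_ohm / (r_ext_ohm + r_pu)
        if v_pad >= vref:       # comparator flips: stop adding legs
            return n, r_pu
    raise ValueError("pull-up too weak even with all legs enabled")


def thermometer_code(n, width):
    """Control word with the first n legs enabled."""
    return [1] * n + [0] * (width - n)


if __name__ == "__main__":
    for r_leg in (2000.0, 3000.0):   # e.g. fast and slow process corners
        n, r_pu = calibrate_pull_up(r_leg)
        print(r_leg, n, round(r_pu, 1), thermometer_code(n, 16))
```

A slower corner (larger per-leg resistance) simply ends up with more legs enabled. Thermometer coding changes one leg per step, so the impedance moves monotonically during the search.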
Multi Slew-rate Output Driver
Binary-weighted Code Control
[48] D. U. Lee et al., ISSCC, 2008, pp. 280-613
DF = Digital LPF + UP/DOWN Counter
[Figure: calibration loop at the ZQ PAD with Vref comparators and PU/PD digital filters (DF) generating n-bit PUcon and PDcon. Binary-weighted driver legs run from WP/4 (128R), WP/2 (64R), and WP (32R) up to 32WP (R), and likewise WN/4 to 32WN. Selectable output impedances of 60 Ω, 120 Ω, and 240 Ω at Dout]
Global ZQ Calibration
Global Impedance Mismatch Error < 1%
PVT variation sensor
[49] J. Koo et al., CICC, 2009, pp. 717-720
[Figure: the ODT calibration block at the ZQ pin derives a global reference signal from Zcal; DQ0 and DQn (n=1~31) each contain LS, PA, CP, and LO blocks that compare the local impedance Z against the reference]
CP : Comparator
PA : Pre-amplifier
LS : Local PVT sensor
LO : Local controller
Data Bus Inversion (DBI)
Power reduction technique independent of data pattern
Dominant power (I/O Buffer)
P = α × C_PCB × V_DD², α < 0.5
For high-BW memory, inversion time + CRC can be a bottleneck
[50] S.-S. Yoon et al., ASSCC 2008, pp.249-252
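One way data bus inversion bounds I/O switching power is to invert a byte whenever it would toggle more than half of the data lines and to signal the inversion on an extra DBI line. The sketch below implements that transition-minimizing (AC) flavor as an assumption; the slides do not say which DBI variant is meant, and the idle bus state of 0x00 and the function names are also illustrative.

```python
def dbi_ac_encode(data_bytes, idle=0x00):
    """Encode bytes so that at most 4 of the 8 data lines toggle per transfer.

    Returns a list of (sent_byte, dbi_flag) pairs. The byte is inverted
    (flag = 1) when sending it as-is would toggle more than 4 lines
    relative to the previously sent byte.
    """
    prev = idle
    encoded = []
    for byte in data_bytes:
        toggles = bin((byte ^ prev) & 0xFF).count("1")
        if toggles > 4:
            sent, flag = byte ^ 0xFF, 1
        else:
            sent, flag = byte, 0
        encoded.append((sent, flag))
        prev = sent
    return encoded


def dbi_decode(encoded):
    """Undo the inversion using the DBI flag."""
    return [sent ^ 0xFF if flag else sent for sent, flag in encoded]


def data_line_toggles(encoded, idle=0x00):
    """Number of data lines that toggle on each transfer."""
    prev, counts = idle, []
    for sent, _ in encoded:
        counts.append(bin(sent ^ prev).count("1"))
        prev = sent
    return counts


if __name__ == "__main__":
    data = [0xFF, 0x00, 0x0F, 0xF0, 0xAA]
    enc = dbi_ac_encode(data)
    print(enc, data_line_toggles(enc), dbi_decode(enc) == data)
```

Whatever the data pattern, no transfer toggles more than four of the eight data lines, which is the sense in which the switching activity stays bounded. The compare-and-invert step sits in the data path, so it adds to the latency budget.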
Cyclic Redundancy Check (CRC)
Data error check for every unit interval (64 bits data only)
Redundancy bit : 1 bit/byte
Speed bottleneck for high-BW
Time (READ DBI + READ CRC + CRC calculator) < 9 periods
[50] S.-S. Yoon et al., ASSCC 2008, pp.249-252
Error type          Detection rate
random single bit   100%
random double bit   100%
random odd count    100%
burst 8             100%
CRC (cont'd)
Polynomial: X^8 + X^2 + X^1 + 1 with an initial value of 0
Algorithm for GDDR5 ATM-0M83
Logic for algorithm takes a long time
To increase CRC speed: XOR logic optimization
CRC calculation time < T_CRC
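The polynomial X^8 + X^2 + X^1 + 1 with an initial value of 0 corresponds to the 8-bit generator 0x07. The sketch below is a plain bit-serial reference model of that CRC. It does not model the GDDR5 lane and burst ordering, and hardware would flatten the loop into parallel XOR trees, which is where the logic optimization comes in.

```python
def crc8(data: bytes, init: int = 0x00) -> int:
    """Bit-serial CRC-8 with generator x^8 + x^2 + x + 1 (0x07), MSB first."""
    crc = init
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ 0x07) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc


def detects_all_single_bit_errors(block: bytes) -> bool:
    """Flip each bit of the block in turn and check that the CRC changes."""
    good = crc8(block)
    for i in range(len(block) * 8):
        corrupted = bytearray(block)
        corrupted[i // 8] ^= 0x80 >> (i % 8)
        if crc8(bytes(corrupted)) == good:
            return False
    return True


if __name__ == "__main__":
    block = bytes(range(8))  # 64 data bits
    print(hex(crc8(block)), detects_all_single_bit_errors(block))
```

Each output bit of the serial loop is an XOR of a fixed subset of input bits, so the whole CRC can be computed in one combinational step.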
Outline
Introduction
Clock Generation and Distribution
Transceiver Design
TSV Interface for DRAM
Bandwidth requirement
DRAM with TSV
TSV DRAM type
DRAM stacking type
Data confliction issue & solution
Failed TSV issue & solution
Summary
References
Bandwidth Requirements
Requirement
Next GDDR will require over 10Gb/s/pin data rate
Restrictions
Very difficult over 10Gb/s/pin
Cost for performance improvements
Power consumption
[Figure: DDRx / GDDRx Data Rate/Pin Trend. Data rate/pin [Gbps] (0 to 12) versus year (2000 to 2015) for DDR, DDR2, DDR3, DDR4, GDDR3, GDDR4, and GDDR5, with a "?" for the next generation]
         Gb/s/chip   Gb/s/pin
GDDR1    32          1
GDDR3    51.2        1.6
GDDR4    102.4       3.2
GDDR5    224         7
GDDR?    448 (?)     14 (?)
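The per-chip figures are the per-pin rate multiplied by the I/O width, which is x32 for the GDDR rows listed. The sketch below redoes that arithmetic and also asks the reverse question: what pin rate would a wider bus need to reach the same per-chip bandwidth? The function names are illustrative.

```python
def chip_bandwidth_gbps(pin_rate_gbps: float, io_count: int) -> float:
    """Per-chip bandwidth = per-pin data rate x number of I/Os."""
    return pin_rate_gbps * io_count


def required_pin_rate_gbps(target_gbps: float, io_count: int) -> float:
    """Per-pin rate needed to reach a target per-chip bandwidth."""
    return target_gbps / io_count


if __name__ == "__main__":
    for name, rate in [("GDDR1", 1), ("GDDR3", 1.6), ("GDDR4", 3.2), ("GDDR5", 7)]:
        print(name, chip_bandwidth_gbps(rate, 32))
    # A 512-bit wide interface reaches 448 Gb/s/chip at well under 1 Gb/s/pin.
    print(required_pin_rate_gbps(448, 512))
```

A 32-pin interface needs 14 Gb/s/pin for 448 Gb/s/chip, whereas a 512-bit bus needs 0.875 Gb/s/pin.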
DRAM with TSV
Advantages of DRAM with TSV
Higher density per area
Shorter interconnection : lower power, faster flight time
Higher bandwidth with wide I/O
Wide I/O easily achieves 448 Gb/s/chip at next GDDR
(Example : 512 I/O × 875 Mb/s/pin = 448 Gb/s/chip; at 800 Mb/s/pin the same bus gives 409.6 Gb/s/chip)
[Figure: Wide I/O memory stacked on the MCU/GPU through TSVs; stacked memories placed beside the MCU/GPU on an interposer]
Type Main Memory Mobile Graphics
Architecture
No. of TSV 500~1000 EA 1000~1500 EA 2000~3000 EA
Feature
Low power
High speed
Low power
Multi channel
Wide I/O
Max bandwidth
Multi channel
Package
GPU
Controller
Interposer
TSV Interface for DRAM
Chulwoo Kim 72 of 86
Stacking Type
Type           Homogeneous             Heterogeneous
Architecture   [Figure: identical stacked chips; Slave, Slave, Slave stacked on a Master]
Feature
  Homogeneous : Same chips, Low cost
  Heterogeneous : Slave : only cells; Master : with peripheral
Data Confliction Issue
PVT variations cause data skew between the stacked chips
Data confliction increases the short-circuit current
[51] H.-W. Lee et al., ISSCC, 2012, pp. 48-50
[Figure: DQ drivers of CHIP 0 (MP0/MN0, EN0, /EN0) through CHIP 3 (MP3/MN3, EN3, /EN3) share one TSV to the DQ pin. Under PVT variations the fastest chip and the slowest chip can drive HIGH and LOW at the same time, causing data confliction]
Separate Data Bus per Group
[52] U. Kang et al., ISSCC, 2009, pp. 130-131
Separate Data Bus per Bank Group
Less dependent on the PVT variation
[Figure: Rank 0 to Rank 3, each with banks in Group A and Group B and a separate TSV array per group]
DLL-Based Self-Aligner
Data alignment to external clock or clock of the slowest
chip
[51] H.-W. Lee et al., ISSCC, 2012, pp. 48-50
TSV Interface for DRAM
[Figure: CHIP 0 to CHIP 3 each contain a Skew Detector (PD1, PD2), Skew Compensator, Fine Aligner, and Replica with a TSV model. Clocks: CK, TRCLK, C_CLK, CLKOUT, with feedback clocks RFBCLK and TFBCLK selected by MODE (SAM). READ/READb select the real path; pipe latches turn the data into aligned data at the PIN (DQS or dummy pin)]
Failed TSV Issue
Decreasing the assembly yield
Increasing the total cost
[Figure: failed TSVs: (a) TSV plating defect, (b) pinch-off]
[53] D. Malta et al., ECTC, 2010, pp. 1779-1775
TSV Check
A TSV connectivity check by using the internal circuit
Test Signal Generating Circuits
Scan Chain Based Testing Circuits
[Figure: TSV_0 to TSV_4 between the Sender End (In_0 to In_4) and the Receiver End (Out_0 to Out_4)]
[54] A.-C. Hsieh et al., TVLSI, Apr. 2012, pp. 711-722
Redundant TSVs for Failed TSV
TSV Repair
Conventional : redundant TSVs are dedicated and fixed
Proposed : failed TSV is repaired with a neighboring TSV
[52] U. Kang et al., ISSCC, 2009, pp. 130-131
[Figure: signals A to D between Chip1 and Chip2. Conventional: TSVs a to d plus dedicated redundant TSVs r1 and r2. Proposed: TSVs a to f, with signals rerouted to neighboring TSVs around a failed one]
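The two repair styles can be contrasted with a small mapping exercise: the conventional scheme reroutes a failed signal to a dedicated spare, while the neighbor-based scheme shifts signals onto the next good TSV. This is only a behavioral sketch under that shifting assumption; the actual switch structure in [52] may differ, and the function names are illustrative.

```python
def repair_conventional(signals, tsvs, spares, failed):
    """Each signal keeps its own TSV unless it failed; then it takes a spare."""
    free_spares = [s for s in spares if s not in failed]
    mapping = {}
    for sig, tsv in zip(signals, tsvs):
        if tsv in failed:
            if not free_spares:
                raise ValueError("not enough spare TSVs")
            mapping[sig] = free_spares.pop(0)
        else:
            mapping[sig] = tsv
    return mapping


def repair_neighbor_shift(signals, tsvs, failed):
    """Assign signals in order to the good TSVs, skipping failed ones."""
    good = [t for t in tsvs if t not in failed]
    if len(good) < len(signals):
        raise ValueError("not enough good TSVs")
    return dict(zip(signals, good))


if __name__ == "__main__":
    sigs = ["A", "B", "C", "D"]
    print(repair_conventional(sigs, ["a", "b", "c", "d"], ["r1", "r2"], {"b"}))
    print(repair_neighbor_shift(sigs, ["a", "b", "c", "d", "e", "f"], {"b"}))
```

With shifting, every signal moves at most as many positions as there are failed TSVs before it, so the detour stays local instead of running to a fixed spare location.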
Outline
Introduction
Clock Generation and Distribution
Transceiver Design
TSV Interface for DRAM
Summary
References
Summary
Although all types of DRAM are reaching their limits in
supply voltage, the demand for high-bandwidth memory
keeps increasing
For synchronization of the external clock and the DRAM
output, low power, small area, and low skew are
important design parameters
To achieve high-BW memory, many design techniques
have been and will be adopted from other high-speed
wireline transceivers
TSV interface for DRAM might be a good solution to
achieve high bandwidth and low power
Suggested Papers to See
17.1 A 6.4Gb/s near-ground single-ended transceiver
for dual-rank DIMM memory interface systems
17.2 A 27% reduction in transceiver power for single-
ended point-to-point DRAM interface with the
termination resistance of 4×Z0 at both TX and RX
17.3 A 5.7mW/Gb/s 24-to-240Ω 1.6Gb/s thin-oxide
DDR transmitter with 1.9-to-7.6V/ns clock-feathering
slew-rate control in 22nm CMOS
17.4 An adaptive-bandwidth PLL for avoiding noise
interference and DFE-less fast precharge sampling for
over 10Gb/s/pin graphics DRAM interface
References
[1] K. Koo et al., A 1.2V 38nm 2.4Gb/s/pin 2Gb DDR4 SDRAM with bank group and 4 half-page architecture,
in IEEE ISSCC Dig. Tech. Papers, pp. 40-41, 2012.
[2] JEDEC, JESD79F.
[3] JEDEC, JESD79-2F.
[4] JEDEC, JESD79-3F.
[5] JEDEC, JESD79-4.
[6] T.-Y. Oh et al., A 7Gb/s/pin GDDR5 SDRAM with 2.5ns bank-to-bank active time and no bank-group
restriction, in IEEE ISSCC Dig. Tech. Papers, pp. 434-435, 2010.
[7] H.-W. Lee et al., Survey and analysis of delay-locked loops used in DRAM interfaces, submitted to IEEE
Trans. VLSI Syst.
[8] T. Saeki et al., A 2.5 ns clock access 250 MHz 256 Mb SDRAM with a synchronous mirror delay, in IEEE
ISSCC Dig. Tech. Papers, pp. 374-375, 1996.
[9] A. Hatakeyama et al., A 256 Mb SDRAM using a register-controlled digital DLL, in IEEE ISSCC Dig. Tech.
Papers, pp. 72-73, 1997.
[10] J.-T. Kwak et al., A low cost high performance register-controlled digital DLL for 1Gbps x32 DDR SDRAM,
in IEEE Symp. VLSI Circuits Dig. Tech. Papers, pp. 283-284, 2003.
[11] H.-W. Lee et al., A 1.6V 1.4Gb/s/pin consumer DRAM with self-dynamic voltage-scaling technique in 44nm
CMOS technology, in IEEE ISSCC Dig. Tech. Papers, pp. 502-504, 2011.
[12] D. Shin et al., Wide-range fast-lock duty-cycle corrector with offset-tolerant duty-cycle detection scheme
for 54nm 7Gb/s GDDR5 DRAM interface, in IEEE Symp. VLSI Circuits Dig. Tech. Papers, pp. 138-139, 2009.
[13] W.-J. Yun et al., A 3.57 Gb/s/pin low jitter all-digital DLL with dual DCC circuit for GDDR3 DRAM in 54-nm
CMOS technology, IEEE Trans. VLSI Sys., vol. 19, no. 9, pp. 1718-1722, Nov. 2011.
[14] H.W. Lee et al., A 7.7mW/1.0ns/1.35V delay locked loop with racing mode and OA-DCC for DRAM
interface, in Proc. of Int. Symp. Circuits and Syst., pp. 3861-3864, 2010.
[15] B.-G. Kim et al., A DLL with jitter reduction techniques and quadrature phase generation for DRAM
interfaces, IEEE J. Solid-State Circuits, vol. 44, no. 5, pp. 1522-1530, May 2009.
[16] W.J. Yun et al., A 0.1-to-1.5GHz 4.2mW all-digital DLL with dual duty-cycle correction circuit and update
gear circuit for DRAM in 66nm CMOS Technology, in IEEE ISSCC Dig. Tech. Papers, pp. 282-283, 2008.
[17] S. Kim et al., A low jitter, fast recoverable, fully analog DLL using tracking ADC for high speed and low
stand-by power DDR I/O interface, in IEEE Symp. VLSI Circuits Dig. Tech. Papers, pp. 285-286, 2003.
[18] T. Matano et al., A 1-Gb/s/pin 512-Mb DDRII SDRAM using a digital DLL and a slew-rate-controlled output
buffer, IEEE J. Solid-State Circuits, vol. 38, no. 5, pp. 762-768, May 2003.
[19] K.-H. Kim et al., Built-in duty cycle corrector using coded phase blending scheme for DDR/DDR2
synchronous DRAM application, in IEEE Symp. VLSI Circuits Dig. Tech. Papers, pp. 287-288, 2003.
[20] J.-T. Kwak et al., A low cost high performance register-controlled digital DLL for 1Gbps x32 DDR SDRAM,
in IEEE Symp. VLSI Circuits Dig. Tech. Papers, pp. 283-284, 2003.
[21] O. Okuda et al., A 66-400 MHz, adaptive-lock-mode DLL circuit with duty-cycle error correction [for
SDRAMs], in IEEE Symp. VLSI Circuits Dig. Tech. Papers, pp. 37-38, 2001.
[22] F. Lin et al., A wide-range mixed-mode DLL for a combination 512 Mb 2.0 Gb/s/pin GDDR3 and 2.5
Gb/s/pin GDDR4 SDRAM, IEEE J. Solid-State Circuits, vol. 43, no. 3, pp. 631-641, Mar. 2008.
[23] K.-W. Kim et al., A 1.5-V 3.2 Gb/s/pin Graphic DDR4 SDRAM with dual-clock system, four-phase input
strobing, and low-jitter fully analog DLL, IEEE J. Solid-State Circuits, vol. 42, no. 11, pp. 2369-2377, Nov. 2007.
[24] D.U. Lee et al., A 2.5Gb/s/pin 256Mb GDDR3 SDRAM with series pipelined CAS latency control and
dual-loop digital DLL, in IEEE ISSCC Dig. Tech. Papers, pp. 547-548, 2006.
[25] S.J. Bae et al., A 3Gb/s 8b single-ended transceiver for 4-drop DRAM interface with digital calibration of
equalization skew and offset coefficients, in IEEE ISSCC Dig. Tech. Papers, pp. 520-521, 2005.
[26] Y.-J. Jeon et al., A 66-333-MHz 12-mW register-controlled DLL with a single delay line and adaptive-duty-
cycle clock dividers for production DDR SDRAMs, IEEE J. Solid-State Circuits, vol. 39, no. 11, pp. 2087-2092,
Nov. 2004.
[27] T. Hamamoto et al., A 667-Mb/s operating digital DLL architecture for 512-Mb DDR, IEEE J. Solid-State
Circuits, vol. 39, no. 1, pp. 194-206, Jan. 2004.
[28] S. Kim et al., A low-jitter wide-range skew-calibrated dual-loop DLL using antifuse circuitry for high-speed
DRAM, IEEE J. Solid-State Circuits, vol. 37, no. 6, pp. 726-734, Jun. 2002.
[29] J.B. Lee et al., Digitally-controlled DLL and I/O circuits for 500 Mb/s/pin x16 DDR SDRAM, in IEEE ISSCC
Dig. Tech. Papers, pp. 68-69, 2001.
[30] S. Kuge et al., A 0.18-µm 256-Mb DDR-SDRAM with low-cost post-mold tuning method for DLL replica,
IEEE J. Solid-State Circuits, vol. 35, no. 11, pp. 726-734, Nov. 2000.
[31] H.W. Lee et al., A 1.6V 1.4Gb/s/pin consumer DRAM with self-dynamic voltage-scaling technique in 44nm
CMOS technology, IEEE J. Solid-State Circuits, vol. 47, no. 1, pp. 131-140, Jan. 2012.
[32] Y. K. Kim et al., A 1.5V, 1.6Gb/s/pin, 1Gb DDR3 SDRAM with an address queuing scheme and bang-bang
jitter reduced DLL scheme, in IEEE Symp. VLSI Circuits Dig. Tech. Papers, pp. 182-183, 2007.
[33] K.H. Kim et al., A 1.4 Gb/s DLL using 2nd order charge-pump scheme with low phase/duty error for high-
speed DRAM application, in IEEE ISSCC Dig. Tech. Papers, pp. 213-214, 2004.
[34] J.H. Lee et al., A 330 MHz low-jitter and fast-locking direct skew compensation DLL, in IEEE ISSCC Dig.
Tech. Papers, pp. 352-353, 2000.
[35] J. Kim et al., A low-jitter mixed-mode DLL for high-speed DRAM applications, IEEE J. Solid-State Circuits,
vol. 35, no. 10, pp. 1430-1436, Oct. 2000.
[36] H.W. Lee et al., A 1.6V 3.3Gb/s GDDR3 DRAM with dual-mode phase- and delay-locked loop using
power-noise management with unregulated power supply in 54nm CMOS, in IEEE ISSCC Dig. Tech. Papers,
pp. 140-141, 2009.
[37] S.-J. Bae et al., A 40nm 2Gb 7Gb/s/pin GDDR5 SDRAM with a programmable DQ ordering crosstalk
equalizer and adjustable clock-tracking BW, in IEEE ISSCC Dig. Tech. Papers, pp. 498-500, 2011.
[38] K.-H. Kim et al., A 20-Gb/s 256-Mb DRAM with an inductorless quadrature PLL and a cascaded
pre-emphasis transmitter, IEEE J. Solid-State Circuits, vol. 41, no. 1, pp. 127-134, Jan. 2006.
[39] H. Partovi et al., Single-ended transceiver design techniques for 5.33Gb/s graphics applications, in IEEE
ISSCC Dig. Tech. Papers, pp. 136-137, 2009.
[40] Y. Hidaka, Sign-based-Zero-Forcing Adaptive Equalizer Control, in CMOS Emerging Technologies
Workshop, May 2010.
[41] S.-J. Bae et al., A 60nm 6Gb/s/pin GDDR5 graphics DRAM with multifaceted clocking and ISI/SSN-
reduction techniques, in IEEE ISSCC Dig. Tech. Papers, pp. 278-279, 2008.
[42] K.-I. Oh et al., A 5-Gb/s/pin transceiver for DDR memory interface with a crosstalk suppression scheme,
IEEE J. Solid-State Circuits, vol. 44, no. 8, pp. 2222-2232, Aug. 2009.
[43] S. H. Wang et al., A 500-Mb/s quadruple data rate SDRAM interface using a skew cancellation technique,
IEEE J. Solid-State Circuits, vol. 36, no. 4, pp. 648-657, Apr. 2001.
[44] W. Hubert et al., GDDR5 training-challenges and solution for ATE-based test, in Asian Test Symposium,
pp. 24-27, Nov. 2008.
[45] JEDEC, JESD212.
[46] K. Sohn et al., A 1.2V 30nm 3.2Gb/s/pin 4Gb DDR4 SDRAM with dual-error detection and PVT-tolerant
data-fetch scheme, in IEEE ISSCC Dig. Tech. Papers, pp. 38-40, 2012.
[47] C. Park et al., A 512-Mb DDR3 SDRAM prototype with CIO minimization and self-calibration techniques,
IEEE J. Solid-State Circuits, vol. 41, no. 4, pp. 831-838, Apr. 2006.
[48] D. Lee et al., Multi-slew-rate output driver and optimized impedance-calibration circuit for 66nm
3.0Gb/s/pin DRAM interface, in IEEE ISSCC Dig. Tech. Papers, pp. 280-613, 2008.
[49] J. Koo et al., Small-area high-accuracy ODT/OCD by calibration of global on-chip for 512M GDDR5
application, in Proc. IEEE CICC, pp. 717-720, Sep. 2009.
[50] S.-S. Yoon et al., A fast GDDR5 read CRC calculation circuit with read DBI operation, in IEEE Asian
Solid-State Circuits Conference, pp. 249-252, 2008.
[51] H.-W. Lee et al., A 283.2µW 800Mb/s/pin DLL-based data self-aligner for through silicon via (TSV)
interface, in IEEE ISSCC Dig. Tech. Papers, pp. 48-50, 2012.
[52] U. Kang et al., 8Gb 3D DDR3 DRAM using through-silicon-via technology, in IEEE ISSCC Dig. Tech. Papers,
pp. 130-131, 2009.
[53] D. Malta et al., Integrated process for defect-free copper plating and chemical-mechanical polishing of
through-silicon vias for 3D interconnects, in ECTC, pp. 1769-1775, 2010.
[54] A.-C. Hsieh et al., TSV redundancy: architecture and design issues in 3-D IC, IEEE Trans. VLSI Systems,
pp. 711-722, Apr. 2012.