Post-Si Validation Tutorial

Uploaded by

Janakiraman R
Copyright
© © All Rights Reserved

See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/299562791

TUTORIAL 15: Validation and Verification of High-Performance Microprocessors: Challenges and Solutions

Presentation · September 2003


DOI: 10.13140/RG.2.1.4659.6241


3 authors, including: T.M. Mak (self) and L. C. Wang (Dalian University of Technology)


All content following this page was uploaded by T.M. Mak on 02 April 2016.



Post Silicon Validation

09/29/03 TM Mak Post Silicon Validation 1


Pre-Si Validation (recap)
• AV (Architectural or Functional)
– Verify that the feature set matches the specification
– “Focused” assembly language tests to verify each
architectural feature
• LV (Logic)
– Usually based on the RTL model
• FV (Formal)
• PV (Performance, design phase)
– Circuit has to meet the other physical spec
items: speed/timing, voltage, temperature


Why Post-Si Validation?
• Partially non-functional (buggy) parts may be
shipped to customers
– Customer replacement (recall)!!
• Customer may have flaky systems or frequent
crashes
• Silent data corruptions, e.g. bank balances,
design simulations
• Customer losing confidence (Value of Brand)
• Lost sales
• Depressed stock!!



Pre vs. Post Si Validation
• SRTL validation is MUCH slower than real silicon
– Typical full-chip SRTL simulation with checkers ran at 3-5 Hz on a
1 GHz machine
– We used a compute farm containing thousands of machines
running 24/7 to get ~6 billion (10^9) cycles/week
– ALL the SRTL simulation cycles we recorded amounted to less
than 2 minutes on a single 1 GHz system!
• But pre-silicon validation has some advantages
– Fine-grained (cycle-by-cycle) checking
– Visibility of internal state (e.g. caches, registers)
– APIs to allow event injection
• No amount of dynamic validation is enough to exhaustively test
a complex microprocessor
– A single dyadic extended-precision FP instruction has 10^50
combinations
– A 3 GHz processor will do 10^17 cycles per year
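
The cycle counts above can be sanity-checked with a few lines of arithmetic. This is a minimal sketch, not part of the original tutorial; the helper names are made up for illustration, and the 4 Hz simulation speed is simply the midpoint of the quoted 3-5 Hz range.

```python
# Back-of-the-envelope arithmetic for the figures quoted on this slide.
SECONDS_PER_YEAR = 365 * 24 * 3600


def silicon_seconds(cycles: float, clock_hz: float) -> float:
    """Wall-clock time real silicon needs to execute `cycles` clock cycles."""
    return cycles / clock_hz


def cycles_per_year(clock_hz: float) -> float:
    """Clock cycles executed in one year of continuous operation."""
    return clock_hz * SECONDS_PER_YEAR


def slowdown(silicon_hz: float, simulation_hz: float) -> float:
    """How many times slower the simulation is than the silicon."""
    return silicon_hz / simulation_hz


weekly = silicon_seconds(6e9, 1e9)   # one farm-week of simulation on 1 GHz silicon
yearly = cycles_per_year(3e9)        # order of 10^17, as stated on the slide
ratio = slowdown(1e9, 4.0)           # assumed 4 Hz simulation vs 1 GHz silicon

print(f"farm-week of simulation = {weekly:.1f} s of silicon time")
print(f"3 GHz for a year        = {yearly:.2e} cycles")
print(f"simulation slowdown     = {ratio:.1e}x")
```

In other words, a full week of the compute farm corresponds to about six seconds of execution on a 1 GHz part.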


Pre-silicon Validation cycles
not that we don’t try
[Figure: Pentium 4 full-chip pre-silicon validation cycles (millions, axis 0 to 6000) by work week, from week 40'98 to week 51'99. Annotation: ~1/4 sec of real time execution.]

Cost of a processor bug
[Figure: cost of a bug (log scale, 1M$ to 10B$) versus the phase in which it is found: Pre-Si (2-4 yrs), Post-Si / Pre-production (0.5-1 yrs), Post-production (5 yrs). Cost labels along the curve: Time-to-market, Recall, Lost sales. Annotation: the later a bug is found, the more expensive it is.]


Post-Si Validation
• DV (Design)
– Verify design sensitivity to various environmental conditions (e.g.
voltage, temperature, frequency, and process variation [to some degree])
with a given test suite -- full-chip level test (diagnostic)
• SV (System)
– Verify that the product is functional in a given system (designed to facilitate
debugging) with real peripherals, BIOS, OS and applications
• CV (Commercial/Compatibility)
– Verify that the product is functional across OEM systems, OSes, applications
• CMV (Circuit Marginality)
– Verify that the product is free from sensitivities to voltage/temp/frequency in
system level operation
• MV (Manufacturing)
– Verify that a given product can be manufactured in HVM (high volume
manufacturing): yield is not impacted when the circuit is manufactured in high
volume, over time, with process variation (+ all of the above)
• RV (Reliability)
– Verify that the product has a low infant mortality rate and achieves a low FIT
(failures in time) rate




Design Validation
• Performance characterization
– Start when product is healthy (no major functional bugs,
initial units reach performance target)
– Usually carried out on ATE with defined set of tests
• Bench may still be needed for parametrics (esp. special
interfaces)
– Parametric shmoos or data collection at specific test
conditions
– Shmoos to find performance range and potential holes (i.e.
failed region in the middle of pass region)
• Also can be viewed as part of the effort to improve
product performance
– One such parameter is “frequency”!
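
A shmoo sweep and a search for holes can be sketched as below. This is an illustrative sketch only: `toy_device` is a hypothetical pass/fail model standing in for a real ATE test run, the voltage and frequency grids are invented, and a "hole" is simplified to a failing point with passing points on both sides in the same frequency row.

```python
from typing import Callable, List, Tuple


def run_shmoo(passes: Callable[[int, int], bool],
              voltages_mv: List[int], freqs_mhz: List[int]) -> List[str]:
    """Return one text row per frequency: '*' marks a pass, '.' a fail."""
    return ["".join("*" if passes(v, f) else "." for v in voltages_mv)
            for f in freqs_mhz]


def find_holes(rows: List[str]) -> List[Tuple[int, int]]:
    """Failing points that lie between passing points in the same row."""
    holes = []
    for r, row in enumerate(rows):
        first, last = row.find("*"), row.rfind("*")
        if first == -1:
            continue  # no pass region in this row at all
        holes.extend((r, c) for c in range(first, last + 1) if row[c] == ".")
    return holes


def toy_device(vdd_mv: int, mhz: int) -> bool:
    """Hypothetical device: needs more voltage at higher frequency, plus one
    artificial hole at 1300 mV / 800 MHz."""
    if (vdd_mv, mhz) == (1300, 800):
        return False
    return vdd_mv >= 700 + mhz // 2


voltages = [1000, 1100, 1200, 1300, 1400]
freqs = [1000, 800, 600]
rows = run_shmoo(toy_device, voltages, freqs)
for f, row in zip(freqs, rows):
    print(f"{f:5d} MHz |{row}|")
print("holes (row, column):", find_holes(rows))
```

A real flow would also look for holes along the frequency axis and across temperature; the row-only check is kept short on purpose.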



Typical Shmoo
[Figure: typical shmoo plot, with the spec point and the margin to spec marked.]

Fail Shmoos
VDD/Freq Fail Shmoo Pattern List: ALU
+----+----+----+----+----+----+----+----+
40.0 MHz |***************AAAAAABBCCCCCCCCCCCCCCCCCC| 25 ns
38.5 MHz |*****************AAAAABBCCCCCCCCCCCCCCCCC| 26 ns
37.0 MHz |********************AAAAABBCCCCCCCCCCCCCC| 27 ns
35.7 MHz |**********************AAAABBCCCCCCCCCCCCC| 28 ns
34.5 MHz |************************AAAABBBCCCCCCCCCC| 29 ns
33.3 MHz |*************************AAAABBBBCCCCCCCC| 30 ns
32.3 MHz |***************************AAABBBBCCCCCCC| 31 ns
31.2 MHz |****************************AAABBBBCCCCCC| 32 ns
30.3 MHz |*****************************AABBBBBCCCCC| 33 ns
29.4 MHz |******************************AABBBBBCCCC| 34 ns
27.8 MHz |*******************************ABBBBBCCCC| 36 ns
27.0 MHz |*******************************ABBBBBCCCC| 37 ns
26.3 MHz |********************************ABBBBBCCC| 38 ns
25.0 MHz |********************************ABBBBBCCC| 40 ns
23.8 MHz |*********************************ABBBBCCC| 42 ns
23.3 MHz |*********************************DDEEECCC| 43 ns
22.2 MHz |*********************************DDDEEECC| 45 ns
20.8 MHz |*********************************DDDDEEEC| 48 ns
20.0 MHz |**********************************DDDEEEC| 50 ns
18.9 MHz |**********************************DDDDEEE| 53 ns
17.9 MHz |**********************************DDDDDEE| 56 ns
16.9 MHz |**********************************DDDDDDE| 59 ns
+----+----+----+----+----+----+----+----+
VDD 7.0 6.5 6.0 5.5 5.0 4.5 4.0 3.5 3.0

Character Cycle Vector Pattern
----------------------------------------------------
A 116392 2347 gecc1000
B 89228 5467 geba1113
C 122 55 ext003
D 83535 1178 geba1113
E 83288 855 geba1113

• Indicates performance curve of various failure modes
– Letter order is NOT pattern order
• EG:
– ‘A’ is the first speed-limiting failure mode at high frequency & low VDD
– ‘D’ and ‘E’ are hard low-voltage failure modes, even at slow speed


DV: Timing Distribution
[Figure: normal quantile plot (cumulative distribution plot) and histogram of calculated_meas_rslt; x axis 1.1e-9 to 1.7e-9, count axis 250 to 750; legend force_temperature with values 0 and 110 (temp range); sample sizes shown.]

Quantiles
100.0% maximum 1.68e-9
99.5% 1.6e-9
97.5% 1.54e-9
90.0% 1.5e-9
75.0% quartile 1.44e-9
50.0% median 1.36e-9
25.0% quartile 1.28e-9
10.0% 1.22e-9
2.5% 1.15e-9
0.5% 1.09e-9
0.0% minimum 1.06e-9

Moments
Mean 1.36e-9
Std Dev 1.058e-10
Std Err Mean 1.47e-12
upper 95% Mean 1.3629e-9
lower 95% Mean 1.3571e-9
N 5180
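
The moments reported for this timing distribution are internally consistent, which is a quick check worth running on any characterization data set. The sketch below recomputes the standard error and the 95% interval of the mean from the reported mean, standard deviation and N. It assumes a normal approximation (z = 1.96), which is reasonable for N = 5180; the mean is only reported to three digits, so the agreement is approximate.

```python
import math

# Values as reported in the Moments table.
mean, std_dev, n = 1.36e-9, 1.058e-10, 5180

std_err = std_dev / math.sqrt(n)   # standard error of the mean
half_width = 1.96 * std_err        # 95% interval, normal approximation
lower, upper = mean - half_width, mean + half_width

print(f"Std Err Mean   = {std_err:.3g}")
print(f"lower 95% Mean = {lower:.5g}")
print(f"upper 95% Mean = {upper:.5g}")
```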


Bench measurements
• ATE may not have
signal acquisition (or
generation) capability
– Zo
– Serial (clock
embedded) data
• Data needed for
simulation model
refinement
– For future performance
enhancement



Design Validation
• Functional test
– Pattern coverage (often from AV, or uAV)
• May not even fit into ATE memory
• How to generate?
• How to measure?
– Correlation between System/Bench/ATE
– Wide enough skews/sample size
• Process, Vcc, Temperature
• Surprises need to be contained
– Errata, Spec change, etc.
• All anomalies are investigated by designers as part of the
silicon debug effort
– Root cause issues
– Provide inputs for new stepping design changes


Problems seen in a Shmoo
VOL/IOL Shmoo Pattern: CMOS-OUTS
V
+---------+---------+---------+---------+
10.00 mA | *** ***|
9.50 mA | **** ***|
9.00 mA | ****** ***|
8.50 mA | *********************************|
8.00 mA >| **********************************|
7.50 mA | ****** **************************|
7.00 mA | ***********************************|
6.50 mA | **************************** *****|
6.00 mA | ********************** *************|
5.50 mA | ************************************|
5.00 mA | *************************************|
+---------+---------+---------+---------+
VOL 0.35V 0.40V 0.45V 0.50V 0.55V

• Test Margin, Test Stability, Holes, Performance Cliffs…
• Debug is needed!




System Validation
• Target major CPU attributes:
– architecture, such as ISA, floating point unit, data space
– micro-architecture, boundary conditions between micro-
architectural units
– Multi-processor, such as memory coherency, consistency
and synchronization
• With these methods
– Directed or Focused test
– Random Instruction Test
– Dataspace Test
– Opcode coverage Tests
• Validation Platform to ease debug and validation



Validation Platform
• Scope and Logic Analyzer probe connections all built in
• Specific hardware to generate bus traffic patterns
• Software controllable clock board that can step
system frequency in 1 MHz steps
• Software programmable voltage regulators for both
CPU and chipsets
• In-Target Probe (ITP) debug port
• PCI link for large # of synthetic IO agents
• UP, MP configurations


In Target Probe
• Aka In Circuit Emulator
– Software/hardware co-debug
– Breakpoints, internal status dump, etc.
– Intel may make hardware interface; third party to
provide tools/GUI etc.
• Utilize the JTAG port to gain access to
internal DFD control and data
– Chip information is under NDA
– Most CPU manufacturers provide similar capability
to help OEM customers use their products


Focus vs Random test
Focus tests:
• Focus tests can target specific processor features
– Local APIC
– Cache geometry (1M/2M vs 256K)
– Paging Modes (4K/2M/4M, Mode A/B/C/PSE36)
– Data prefetcher logic
• Algorithmic
– TLB shootdown, MP algorithms, Exhaustive dataspace testing
• Tight control of test execution flow
– Multiprocessor, Cache-coherency tests
– Self-modifying/Cross modifying code tests
• Self-Checking
– Test writer decides what results to check.
– Checking is explicit and very specific
• No simulator required
• Tests are contrived and programmer written

Random tests:
• Generate test conditions human test writers don’t think of
• Generate wild boundary conditions
– Where it is difficult to determine EXACT test conditions
• Don’t know what you’re looking for. You just know that there is (or may be) a problem.
– New instruction set added to architecture (MMX™, SSE™)
• Difficult to foresee all test conditions
– Fails in CV, where it’s difficult to debug
– Speedpath failure in CMV (look for a slower speed limiter)
• You know most of the preconditions for a particular failure, but don’t quite have everything
– Add in known preconditions and randomize other elements
– Use automated test generation to find failure condition
• You have several sets of test conditions to test and you don’t want to rewrite tests to do it


Random Instruction Testing
• Randomize sets of processor features to execute with randomly
generated instructions
• Involves more than just random instructions.
– Architectural attributes (GDT, LDTs, Paging, MTRR, memory
spanning and data patterns) are also randomized. Examples
include
• DPLs in segment descriptors randomized between 0 and 3
• PTE present bit randomized (present/not present)
• Fixed/variable MTRRs randomized
– We’re really doing Random Architectural Testing
• Results are checked by comparing the memory space map of the
actual system with the map from the architectural simulator
• Random test code “doesn’t make any sense”
– No apparent relationship between test instructions
• RIT tools are a key enabler
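
The idea of randomizing architectural attributes and then comparing memory maps can be sketched as follows. This is a toy illustration, not the RIT tool described in the tutorial: `ArchConfig`, `random_arch_config` and `compare_memory_maps` are hypothetical names, and the memory maps are plain address-to-value dictionaries standing in for the silicon dump and the architectural simulator output.

```python
import random
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ArchConfig:
    segment_dpls: List[int]     # descriptor privilege level per segment, 0..3
    pte_present: List[bool]     # present bit per page-table entry
    use_variable_mtrrs: bool    # variable rather than fixed MTRRs


def random_arch_config(rng: random.Random,
                       n_segments: int = 4, n_ptes: int = 8) -> ArchConfig:
    """Draw one random architectural setup for a test case."""
    return ArchConfig(
        segment_dpls=[rng.randint(0, 3) for _ in range(n_segments)],
        pte_present=[rng.random() < 0.5 for _ in range(n_ptes)],
        use_variable_mtrrs=rng.random() < 0.5,
    )


def compare_memory_maps(silicon: Dict[int, int],
                        simulated: Dict[int, int]) -> List[int]:
    """Addresses whose final contents differ between silicon and simulator."""
    addresses = set(silicon) | set(simulated)
    return sorted(a for a in addresses if silicon.get(a) != simulated.get(a))


cfg = random_arch_config(random.Random(1))   # a fixed seed makes a failure reproducible
mismatches = compare_memory_maps({0x0: 1, 0x4: 2}, {0x0: 1, 0x4: 3, 0x8: 0})
print(cfg)
print("mismatching addresses:", [hex(a) for a in mismatches])
```

Seeding the generator is the important design choice here: a failing random test is only useful if the same test can be regenerated for debug.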


More on RIT tools
• For long pipeline processors, need to warm up the
long pipeline
– Truly random instructions tend to cause frequent exceptions
or other control flow discontinuities
– Typical RIT tools would encounter a pipeline hazard within 3
to 20 instructions on average
• Need to avoid false failures
– Getting undefined processor states or other differences
between architectural simulators and real silicon
• Need to fully propagate architectural states to
memory image file without affecting randomness of
instruction stream
• Need to have high throughput to find those rare or
highly intermittent bugs!



System validation -- Dataspace
• Dataspace validation is ensuring that arithmetic
operations generate the correct results:
– To check the result, an inverse operation may be used:
• If (A*B equals C), then (C/B = (A ± rounding error))
• must be able to guarantee that the inverse operation could not
have an error that would exactly offset an error in the operation
under test (aliasing).
– Use an algorithm that does not use the instruction/hardware
under test:
• A * 4 = A SHL 2
– Use a “Golden System” to check your answers:
• A * B on our SUT = A * B on our “Golden System”
• Defining what constitutes a “Golden System” is another tutorial
in itself
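
The three checking strategies on this slide can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function names are invented, the tolerance in the inverse check is an arbitrary choice, and exact rational arithmetic from Python's `fractions` module stands in for a "Golden System".

```python
import math
import operator
from fractions import Fraction


def check_mul_by_inverse(a: float, b: float, rel_tol: float = 1e-12) -> bool:
    """Inverse-operation check: if a*b = c, then c/b should equal a within
    rounding error. Intended for nonzero, finite operands only."""
    c = a * b
    return math.isclose(c / b, a, rel_tol=rel_tol)


def check_mul_by_shift(a: int) -> bool:
    """Alternative-algorithm check: a * 4 must equal a SHL 2, computed
    without using the multiplier under test."""
    return a * 4 == a << 2


def golden_mul(a: float, b: float) -> float:
    """Stand-in golden reference: exact rational product, rounded once."""
    return float(Fraction(a) * Fraction(b))


def check_against_golden(op, golden_op, a: float, b: float) -> bool:
    """Golden-system check: both systems must produce the same result."""
    return op(a, b) == golden_op(a, b)


print(check_mul_by_inverse(3.0, 7.0))
print(check_mul_by_shift(12345), check_mul_by_shift(-7))
print(check_against_golden(operator.mul, golden_mul, 1.1, 3.3))
```

Note the aliasing caveat from the slide still applies to the first check: the divide used to verify the multiply must not be able to hide the multiply's error.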



Dataspace – lots of spaces
[Table: dataspace map for a single precision operand. Columns are ranges of the 23-bit fraction field (hex): 000000, 000001, 000002-07FFFF, 080000-0FFFFF, 100000-17FFFF, and so on in steps of 080000 up to 780000-7FFFFD, then 7FFFFE, 7FFFFF. Rows are values of the 8-bit exponent field (hex): FF, FE, F0-FD, Ex, Dx, Cx, Bx, Ax, 9x, 8x, 7x, 6x, 5x, 4x, 3x, 2x, 1x, 02-0F, 01, 00.]
Exponent FF: fraction 000000 = INF; fractions 000001-3FFFFF = SNaN; fractions 400000-7FFFFF = QNaN
Exponent 00: fraction 000000 = 0; fractions 000001-7FFFFF = DEN
All other cells are unmarked

• This table represents the dataspace for a single precision operand (32 bits). The
yellow areas represent the special portions of the dataspace: Zero, Infinity,
Denormals, and NaNs. These rows account for less than 1% of the dataspace.
• The green areas represent “interesting” areas because they contain boundary
conditions. These areas also account for less than 1% of the dataspace.
• The blue areas represent the remaining 98%+ of the dataspace
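
The special regions of the single precision dataspace can be identified directly from the exponent and fraction fields. The sketch below follows the IEEE 754 single precision layout (1 sign bit, 8 exponent bits, 23 fraction bits); the function names are illustrative, and the sign bit is ignored because the table does not distinguish it.

```python
import struct


def classify_f32_bits(bits: int) -> str:
    """Classify a 32-bit pattern by its exponent and fraction fields."""
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exponent == 0x00:
        return "zero" if fraction == 0 else "denormal"
    if exponent == 0xFF:
        if fraction == 0:
            return "infinity"
        # The top fraction bit separates quiet NaNs from signaling NaNs.
        return "QNaN" if fraction & 0x400000 else "SNaN"
    return "normal"


def f32_bits(x: float) -> int:
    """Bit pattern of x after rounding to single precision."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


# Exponent rows 00 and FF are 2 of 256 rows, hence "less than 1%".
special_share = 2 / 256

for pattern in (0x00000000, 0x00000001, 0x7F800000, 0x7F800001, 0x7FC00000):
    print(f"{pattern:08X} -> {classify_f32_bits(pattern)}")
print(f"1.0 -> {f32_bits(1.0):08X} -> {classify_f32_bits(f32_bits(1.0))}")
print(f"special rows = {special_share:.2%} of the dataspace")
```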


Why dataspace validation?
• Two recent arithmetic disasters:
– On February 25, 1991, a Patriot missile failed to
intercept a Scud missile. 28 U.S. soldiers were
killed.
• Cause: Accumulated rounding errors occurred in
the routine that added up tenths of seconds the
system was up since reboot. After 100 hours of
operation, the clock was approximately 1/3 of a
second slow.
– On June 4, 1996, an Ariane Rocket (valued at
about $500 million) exploded 40 seconds after
lift off.
• Cause: The routine that converted a 64-bit
floating-point value into a 16-bit integer
overflowed causing the guidance program to
crash.
• Intel has had an arithmetic “disaster” as well:
– The disaster was a bug known as “FDIV”
– More people probably know about FDIV than
the two previously mentioned disasters
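
The Patriot clock drift can be reproduced with exact arithmetic. This sketch assumes, following widely cited analyses of the incident rather than anything stated on the slide, that the value 0.1 was chopped to 23 bits after the binary point; the variable names are illustrative.

```python
from fractions import Fraction

FRACTION_BITS = 23                        # assumption: bits kept after the binary point

tick = Fraction(1, 10)                    # one tenth of a second, exactly
scale = 2 ** FRACTION_BITS
stored = Fraction(int(tick * scale), scale)   # chopped (truncated), not rounded
error_per_tick = tick - stored            # time lost on every 0.1 s tick

ticks_in_100_hours = 100 * 3600 * 10
drift_seconds = float(error_per_tick * ticks_in_100_hours)

print(f"error per tick    = {float(error_per_tick):.3e} s")
print(f"drift after 100 h = {drift_seconds:.3f} s")
```

The result is about 0.34 s, consistent with the "approximately 1/3 of a second" quoted above.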



The FDIV bug
• In late 1994, the Intel Pentium® Processor had what Intel
referred to as a “flaw”.
• 5 PLA entries were missing that were used in divide ops:
– The entries were used to predict intermediate quotient values
– Most significant digits of the missing entries were 1.0001, 1.0100, 1.0111,
1.1010, 1.1101
– Actually affected FDIV, FDIVP, FDIVR, FDIVRP, FIDIV, FIDIVR,
FPREM, FPREM1, FPTAN, FPATAN
• Intel’s reaction aggravated the situation:
– Erratum Title: “Slight precision loss for floating-point divides on
specific operand pairs.”
– “The statistical fraction of the total input number space prone to
failure is 1.14 × 10^-10.”
– “Statistical characterization yields a probability that about one in
nine billion randomly fed operand pairs on the divide instruction will
produce results with reduced precision.”
– “The occurrence of the anomaly depends upon….the way in which
the final result of the application is interpreted.”
• The bottom line:
– This “flaw” cost Intel $490 million!

Why are dataspace bugs so bad?
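
Two small checks relate to the erratum text. The first restates the quoted failure fraction as "one in N" operand pairs. The second is a residual check of the form x - (x / y) * y, which should be at rounding-error level on a correct divider. The operand pair used is the one widely reported as triggering the flaw; it is not taken from the slides, and the function name is illustrative.

```python
# Failure fraction quoted in the erratum, restated as "one in N".
failure_fraction = 1.14e-10
one_in = 1 / failure_fraction      # about 8.8e9: "about one in nine billion"


def division_residual(x: float, y: float) -> float:
    """x - (x / y) * y; close to zero when the divide is correct."""
    return x - (x / y) * y


# Widely reported failing operand pair (assumption, not from the slides).
residual = division_residual(4195835.0, 3145727.0)

print(f"one in {one_in:.2e} operand pairs")
print(f"residual on this machine = {residual}")
```

A self-checking dataspace test can use the same residual pattern over many operand pairs, subject to the aliasing caveat that the multiply must not mask the divide's error.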
• Dataspace bugs are data integrity bugs!

• They can be really, really, really hard to find

• If you accidentally hit one, you probably won’t notice it:


– You’re probably not going to get a blue screen or any other sign
that an incorrect result has been generated.

• What can happen with inaccurate data?


– A loss of accuracy in the 13th binary digit would probably not show
up in your check book
– Consider the Patriot Missile and Ariane Rocket disasters

• A spec update won’t solve a dataspace bug:


– “Never add those two values” is not an acceptable workaround
– Replacement would be financially catastrophic!



Analog Validation – Bus Signaling
• Signal integrity checking becomes significant as system bus
speeds move toward the Gbps level
– Esp. with MP
• Component level testing is insufficient to guarantee that the
system will operate with margin
• DFT can be programmed through BIOS to vary strobe timing and
Vt levels in varying system environments
[Figure: multiprocessor bus topology with four iA Processors and an iA Chipset, terminated to VCTERM at both ends. Segment lengths shown: 1.89 in (48 mm), 2.12 in (54 mm), 3.35 in (85 mm), 0.8 in, 0.1 in. Maximum trace length = ~6.7 in. All dimensions are preliminary.]




Silicon problem can show up in unique system/apps
[Figure: PC software/hardware stack (Application, O/S, System BIOS, H/W, with a SCSI BIOS and SCSI H/W alongside), annotated as follows.]
• Application: written for a specific O/S and computer
• O/S: written to support ONE type of computer!
• System BIOS: specific to each computer (knows the H/W)
• H/W: can be different for each computer
• SCSI BIOS: some cards come with their OWN BIOS => replaces System BIOS calls


Commercial/Compatibility Validation
• Open platform challenges
– Selling components implies compatibility with a wide diversity
of hardware and software (contrary to other closed system
architectures)
– E.g. can’t ask Microsoft to change OS when there is a
CPU/OS issue
• Validate microprocessors using robust combinations
of commercial hardware and software across various
platforms
– Find CPU bugs and related issues
– Identify and alert other groups when non-CPU issues are
found that affect program health (i.e., apps, drivers, chipset,
platform)


CV – issues and challenges
• Testing covers all common user applications and
configurations, from legacy through bleeding edge
technology and custom apps
– Gbit Ethernet, SATA, Fibre channel, RAID, PCI, USB, 1394,
ACPI, UDMA/66/100, AGP4x/8x, MMX/SSE apps, Xeon
debug, Geyserville, soft DVD, video conference, video
capture/editing, web server, video server, cv RIT, LVDS, all NW
topologies, gaming applications, industry benchmarks for
stress…
– Unique OSes: Linux, WinXP, Win2K, WinNT, SCO,
WinMe/98, Netware, etc.
• Need highly automated, non-intrusive, video capture
and test management tool
• Debugging is often difficult due to unavailability of
source code




Circuit Marginality Validation
• Overclocking!!
• Ask:
– What is preventing us from running at the next bin
(frequency target)?
– Do we hit the right combinations of worst case instructions
or data combinations/permutations?
• Find silicon critical paths
– Don’t be surprised if real paths are VERY different from
simulated paths
• System failures are the tip of the iceberg: you happen to
stumble on something….but it may not be the worst
case!!
– Crosstalk and power noise types of problems are tricky


CMV Test Suites
• RIT
• Focus tests
– Move lots of data, lots of cache/memory
interaction
– Create high power and large power transients
– Directed random
• Commercial OSes and applications
• Need close correlation with ATE platforms
– Critical path tests are needed for speed binning


Sources of mis-correlation
• ATE power system has far better control than an
ordinary VRM
• Thermal control may also have different capability
• Test codes are different
– Need to capture system code into manufacturing tests
to ensure worst case tests are screened
[Figure: S9K voltage error envelope compared with CMV voltage error envelope.]




Manufacturing Validation
• Yield and manufacturability cannot be verified with
samples
– SIU/TIU (Sort/Test interface unit, probe card loadboards,
etc.)
– Equipment (prober, handlers)
– Process (documentation, training)
– A wide variety of production material
• Risk reduction with MV data and learning from similar
processes, products, packages
• All yield issues, manufacturability issues have to be
addressed before ramping volume
– Risk of tying up millions of dollars of WIP




Reliability Validation
• ESD
• Infant Mortality
• Life Test
– Device degradation
– Electro-migration, self-heat
• Package reliability
– Temp Cycle
– 85°C/85% RH
– Steam


Electro-Static-Discharge
[Figure: three ESD models, each showing the discharge current IESD: Charged Device Model, Machine Model, Human Body Model. Annotation: di/dt is large.]

Infant Mortality and Life-test
• Infant Mortality (declining failure rate)
– Due to latent reliability defects
– Goals: 500 DPM within 0-30 days & 200 FIT within 0-1 year
• Wearout (increasing failure rate)
– Due to oxide wearout, EM, hot-e, etc.
– Goal: <0.1% failing for intrinsic reliability mechanisms
[Figure: failure rate vs. time, showing the infant mortality region (up to ~1 year, within the scope of control of burn-in) and the wearout target at 7 years.]
[Figure: cumulative fallout vs. time (follows a lognormal distribution), showing the impact of burn-in for .10um, .13um and .18um; burn-in readouts at 0hr, 6hr, 12hr, 24hr, 48hr, 168hr, 500hr and 1000hr are used to assess failure rate.]

Debug and debug tools



Why talk about debug here?
• Debug is a successor to Validation
• Bug (A persistent error in software or
hardware)
– All software and hardware have bugs!
• The more complex it is, the more bugs can be created
– All hardware begins its life as software (e.g. RTL
code)
– If the bug is in software, it can be corrected by
changing the program. If the bug is frozen in
silicon, new silicon has to be made (with a different
design)
The art is: how to dress up a bug as a feature


Diagnosis? (what causes the bug?)
• Diagnosis (the act of finding the root cause
of a misbehavior) techniques are used for
both
– (initial) silicon debug/yield improvement
– failure analysis of units (during stress tests or
returned by customers)
• Assumptions are very different for each case
– Important distinction!
– May lead to different techniques


Models
Check against each level of transformation:
Specification
↓
HDL -- Behavior      =? Property Verification (usually piecemeal, e.g. FV, protocol checking)
↓
HDL -- RTL
↓
Netlist/schematics -- Gate level, switch level      =? Logic Verification
↓
GDS II -- Layout
↓
Geometries on die      =? Testing

HDL example:
  architecture behavior of nand4 is
  begin
    process(a,b,c,d)
    begin
      z <= NOT(a AND b AND c AND d);
    end process;
  end architecture behavior;

Manufacturing Test vs. Debug
• Test screens for manufacturing defects
– Not all die are faulty (some good die available)
– Different die fail in different ways due to different defects
• Debug searches for design validation escapes which
affect all die (or many die)
– Functional design errors which escaped validation
– Unexpectedly slow critical path missed by timing analysis
– Other behaviors not correctly modeled by simulators
• Goals of debug: diagnosis & work-around
– Root-cause the failure to fix in next stepping
– “Peel the onion” to debug multiple problems in one stepping



Bugs and defects
[Figure: along a TIME axis, design errors and manufacturing process defects pass through Pre-Si validation, Post-Si validation and HVM test before reaching the customer (patch, return policy). Strategies shown: prevent errors, detect errors, survive bugs. Arrow: move upstream.]

Diagnosis tools
• Controllability & observability of 1s and 0s
– States and data
• Use DFT (design-for-test) and DFD (design-for-debug)
hooks
– Scan, and everything else
• Fault diagnosis
– Identify set of all modeled faults
– Use fault simulation to find which tests detect which faults
– Given pass / fail status of a specimen for each test, intersect
fault lists to diagnose the fault
– Success is a function of how well the modeled faults match
reality
• Diagnosis tools are often of little help for silicon debug
[Figure: fault list, with the fault sets detected by tests T1 and T2.]

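
The fault-list intersection described above can be sketched with ordinary sets. This is a minimal illustration assuming a single modeled fault: `fault_dictionary` is a hypothetical result of fault simulation (test name to the set of faults it detects), and the fault names are invented.

```python
from typing import Dict, Optional, Set


def diagnose(fault_dictionary: Dict[str, Set[str]],
             passed: Dict[str, bool]) -> Set[str]:
    """Return modeled faults consistent with the observed pass/fail results.

    Single-fault assumption: the fault must be detected by every failing
    test and by no passing test.
    """
    candidates: Optional[Set[str]] = None
    for test, ok in passed.items():
        if not ok:
            detected = fault_dictionary[test]
            candidates = set(detected) if candidates is None else candidates & detected
    if candidates is None:
        return set()                      # nothing failed, nothing to diagnose
    for test, ok in passed.items():
        if ok:
            candidates -= fault_dictionary[test]
    return candidates


fault_dictionary = {
    "T1": {"f1", "f2", "f3"},
    "T2": {"f2", "f3", "f4"},
    "T3": {"f3", "f5"},
}
observed = {"T1": False, "T2": False, "T3": True}
suspects = diagnose(fault_dictionary, observed)
print(suspects)
```

An empty result is itself informative: it means no single modeled fault explains the behavior, which is the common situation in silicon debug noted on the slide.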
Silicon Debug
• Run all sorts of tests on a sample of new units
• Avoid chasing after defects
– If failure signatures (on new samples) are random, skip until
you can find systematic failures
• Try different frequencies, voltages, temperatures and
get the test to pass
– If no pass region, it is a logic error (functional bug)
– If a passing region can be found, it is a circuit problem
– If it works at lower frequency, it is a performance (delay)
problem
– Shmoo holes can exist (fails only under specific conditions
and not above or below)
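
The triage rules above can be written as a first-pass classifier over shmoo results. This is a simplified sketch: the function name and the result format (a dictionary keyed by frequency, voltage and temperature) are assumptions, and the "delay problem" rule is interpreted as every failing point lying above every passing frequency.

```python
from typing import Dict, Tuple

Condition = Tuple[int, int, int]   # (frequency in MHz, VDD in mV, temperature in C)


def classify_failure(results: Dict[Condition, bool]) -> str:
    """First-pass triage of one test from its pass/fail map."""
    passing = [c for c, ok in results.items() if ok]
    failing = [c for c, ok in results.items() if not ok]
    if not failing:
        return "no failure"
    if not passing:
        return "logic error (functional bug)"
    max_pass_freq = max(freq for freq, _, _ in passing)
    if all(freq > max_pass_freq for freq, _, _ in failing):
        return "performance (delay) problem"
    return "circuit problem"


always_fails = {(1000, 1200, 25): False, (800, 1200, 25): False}
speed_limited = {(1000, 1200, 25): False, (800, 1200, 25): True}
voltage_sensitive = {(800, 1200, 25): True, (800, 1000, 25): False}

for name, data in [("always_fails", always_fails),
                   ("speed_limited", speed_limited),
                   ("voltage_sensitive", voltage_sensitive)]:
    print(name, "->", classify_failure(data))
```

Shmoo holes land in the "circuit problem" bucket here; a real flow would report them separately.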



Failure Analysis
• Uses all tools from silicon debug
• Look for defects that cause chips to fail (and be
returned) in the field (at OEM customers or end use
customers)
– FA is also needed for stress test failures from internal reliability
stresses
• Usually the result of infant mortality failures
– Defects that don’t reveal themselves during HVM test
• Wearouts are rare (but possible)
– Trapped charge increases Vt (and hence lowers performance)


Yield Improvement
• Defect localization
– RAM array
– Scan based
• What is it?
– Material analysis
• Process step that creates or aggravates it
– May involve process simulation


DFD and DFT on processors
• JTAG private instructions
– DFT control registers
• Scanout
• Debug mode
• Array freeze/dump
• Internal break point
• Performance counters/last branch records and
branch trace messages
• Physical DFT
• Microcode patch & BIST
• Cache power-on self test, PBIST, SMURTL

Ref Pentium Pro, 97, 98

DFD and DFT for processors
• Scan
– Various flavors
• Feature (functional) disable
– Cache, execution pipelines, pre-decode etc.
• Clock controls
– Clock shrinking/relaxing (can be applied for any
phase, any cycle)
– Override deskew values for different clock regions
• Checkpointing, SMI and PSMI



Other debug tools
• RTL simulator, circuit simulator
• In-Target Probe
– Breakpoint, register and memory examination, start/stop
execution
• Logic Analyzer trace capture of processor system
bus
– Disassembly back into code streams; invalidation of cache to
force instruction fetch from memory
• Validation Platform
– Shmoo of voltage, temperature and frequency
• Debug ATE
– Very similar to Production ATE
– Allows simulation traces to be downloaded and used
– Flexible and easy for designers to use (aka simulator
interface)


Software based (Simulator)
diagnosis
• RTL model is the
primary golden model
• Faults are seeded
into model to confirm
signatures observed
• RTL itself does not
do diagnosis:
– Diagnosis tool is the
expert knowledge of
the chip



System level debug
• Problem may lie in how the components interact with
each other
– Logic analyzer may help if it is more hardware oriented
– System level simulation may be the only tool if it is
protocol related
• Must reduce problem down to something that can be
tightly executed on a test/debug platform
– Probing requires small loops to increase the signal
repetition rate
[Figure: system block diagram: Microprocessor; AGP Graphics; CPU Bridge and Memory Controller; Memory; Extra I/O Device; I/O PCI Bridge; PCI Bus; Functions and I/O; P1; LPC I/F; FWH; Super I/O.]


Why a tester?
• Certain probing tools require frequent sampling of
data
– Aka digital sampling scope
– Long loops, such as those in the system, would not give a
good Signal/Noise ratio
• Tight program loops also match well with the RTL
simulator
• Tester also allows easier docking of probing
instrumentation
[Figure: tester pattern flow: Start of Pattern, Reset, Start of loop, Test, Event of Interest, End of Pattern. Labels: RTLsim 100 or Seq. 5000.]


Tester / System Debug

Tester                    | System
Expensive                 | Cheap
Per pin control           | Hard to control pins
Deterministic             | Non-deterministic
Hard vector generation    | Easy vector generation
Enables easy debug        | Complicated debug

Use of both is complementary and necessary!


PC based diagnosis platform

A minimalist Si-debug platform

Trick is to interface DUT to ebeam or LVP prober


Unusual on-chip debug hooks



Scanout/Sample-on-the-fly
• Observe-only cell (but can take a snapshot at any
clock)
– Take snapshot and serially shift out (at speed)
– Adds loading to the signal only; normal data does not pass through the cell
– Area impact usually hidden if cells are placed underneath routing
– No need to stop core clock (very IMPORTANT!)
– No controllability; selective node observability
– >24,000 nodes on McKinley (vs. 130K scan nodes)
• Signature mode
– XOR with previous states of upstream scan node
continuously
– Single scanout bit (usually through TDO)
– Subject to contamination by X-states


ScanOut - Taking a Snapshot
[Figure: core and FSB timelines (markers 5500 and 7805). Circuit fails in RS Unit (csim 2500); failure propagates to scan node in RS Unit (csim 2504); failure shows up on the FSB some 2000 bus clocks later as a mismatch on ABUS[31:3].]
• So you see an error on the bus?
– Where did it originate?
– When did it originate?
1. Critical Circuit Fails
2. Failure propagates to a Scan-node
3. Failure propagates through to Front-Side Bus

Microcode patching
• Specific microcode region implemented with
RAM
– available for patching
• Works around bugs
– Delivered with each BIOS boot

09/29/03 TM Mak Post Silicon Validation 64


Clock stretching is diagnosis in the temporal domain
• Originated when the clock inside the chip was the same as the clock outside the chip
• Outputs are just data pipelined from execution
– Start with the cycle that shows the IO error; stretch each clock cycle until the error goes away
– The number of clocks stretched is the pipeline stage where the error was first initiated
– Complications can come with data re-circulation and/or sequential circuits
– Also difficult to find the clock that did the trick if the error is from a register dump (usual for AV tests)
• Clock manipulation is all done on chip since PLL clock multipliers became the norm
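The search described above can be sketched as a simple loop. This is a minimal illustration, not the actual tester flow: `run_test` stands in for a hypothetical tester call that reruns the pattern with one chosen clock cycle stretched, and the failing cycle number is made up.

```python
from typing import Callable, Optional


def find_failing_cycle(run_test: Callable[[Optional[int]], bool],
                       io_error_cycle: int,
                       max_depth: int) -> Optional[int]:
    """Walk back from the cycle showing the I/O error, stretching one cycle per run.

    run_test(cycle) returns True when the test passes with that single clock
    cycle stretched (None means no stretch). The return value is how many
    cycles before the I/O error the stretch first made the failure go away,
    or None if no stretch within max_depth helped.
    """
    for back in range(max_depth + 1):
        if run_test(io_error_cycle - back):
            return back
    return None


# Hypothetical device model: a speedpath fails in cycle 96 and the corrupted
# data reaches the pins 4 pipeline stages later, in cycle 100.
FAILING_CYCLE = 96


def run_test(stretched_cycle: Optional[int]) -> bool:
    """Pass only when the cycle containing the slow path is stretched."""
    return stretched_cycle == FAILING_CYCLE


stages_back = find_failing_cycle(run_test, io_error_cycle=100, max_depth=20)
print(stages_back)
```

The model assumes a purely feed-forward pipeline. With data re-circulation or register-dump failures, as the slide notes, stretching a single cycle may not isolate the origin.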

09/29/03 TM Mak Post Silicon Validation 65


Speed debug
• On-Die Clock Shrink/Stretch (ODCS)
– triggered by events or counters on-chip
• Deliberate clock skews (at domain drivers) to flush out min/max problems
• Hold time testing using local driver pulse stretch
[Figure: clock generation and distribution block diagram; labels: I/O PLL, Core PLL, Clock Dist, I/O CLKs, Duty-Cycle Adjust, Skew Detect, Skew Adjust, Skew Out, Pulse stretch, On-die Clock Shrink]

09/29/03 TM Mak Post Silicon Validation 66


Checkpointing, SMI, PSMI
• Essential for system level debug
– Long test runs (hours to days before an error occurs)
– Source code may not be available (e.g. WinXP)
– Source of problem may be thousands to millions
of cycles before signature hits the IO
• Have to trim the running code down to the RTL
environment for confirmation and debug
– RTL runs 7 orders of magnitude slower than silicon;
cannot start from boot
– How to go from failure point back to a sync point?
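To see why an RTL run cannot start from boot, the slowdown can be turned into wall-clock time. The sketch below uses only the 7-orders-of-magnitude figure from the slide; the example durations are arbitrary.

```python
SLOWDOWN = 10 ** 7  # from the slide: RTL runs 7 orders of magnitude slower


def rtl_seconds(silicon_seconds: float, slowdown: int = SLOWDOWN) -> float:
    """Wall-clock RTL simulation time needed to cover a stretch of silicon time."""
    return silicon_seconds * slowdown


def as_days(seconds: float) -> float:
    """Convert seconds to days."""
    return seconds / 86_400


for label, silicon_s in [("1 ms", 1e-3), ("1 s", 1.0), ("1 hour", 3600.0)]:
    days = as_days(rtl_seconds(silicon_s))
    print(f"{label} of silicon time -> {days:.2f} days of RTL simulation")
```

One second of silicon execution already corresponds to roughly 116 days of simulation, so only a short window before the failure can realistically be replayed.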

09/29/03 TM Mak Post Silicon Validation 67


SMI and PSMI
• Feature developed from Mobile’s need
– Suspend/resume
– System Management Interrupt is the mechanism
• Full machine states are stored in memory or disk; allow
restart from that point
– Running SMI periodically is a checkpointing
method: PSMI
– From the failure point to last SMI point is short
enough to be ported to RTL environment
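The checkpoint-and-replay idea can be sketched with a toy deterministic machine. Everything here is illustrative: the `step` function is a stand-in for real execution, and the checkpoint period and failure point are invented numbers.

```python
from typing import Dict, Tuple


def step(state: int) -> int:
    """Stand-in for one unit of deterministic machine execution (toy LCG)."""
    return (state * 1103515245 + 12345) % (2 ** 31)


def run_with_checkpoints(initial: int, fail_step: int,
                         period: int) -> Tuple[Dict[int, int], int]:
    """Run up to the failure point, saving full state every `period` steps."""
    checkpoints = {0: initial}
    state = initial
    for n in range(1, fail_step + 1):
        state = step(state)
        if n % period == 0:
            checkpoints[n] = state  # plays the role of the periodic SMI
    return checkpoints, state


def replay_from_last_checkpoint(checkpoints: Dict[int, int],
                                fail_step: int) -> Tuple[int, int]:
    """Re-run only the window between the last checkpoint and the failure."""
    start = max(n for n in checkpoints if n <= fail_step)
    state = checkpoints[start]
    for _ in range(fail_step - start):
        state = step(state)
    return fail_step - start, state


ckpts, state_at_fail = run_with_checkpoints(initial=42, fail_step=100_003,
                                            period=10_000)
window, replayed = replay_from_last_checkpoint(ckpts, fail_step=100_003)
print(window, replayed == state_at_fail)
```

The replay window is bounded by the checkpoint period rather than by the total run length, which is what makes the trace short enough to port to a slow RTL environment.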

09/29/03 TM Mak Post Silicon Validation 68


PSMI DEBUG TOOL

What is PSMI? (Pentium Pro BUS as an example)


[Figure: CPU BUS FLOW. P-Pro bus trace with periodic SMI’s; SMI pulses 1 to 6 along the bus transactions; sync point for CSIM (processor & RTL @ known state); fail point; trace to run on CSIM. Legend: SMM handler code; P6 bus transactions; P6 bus transactions with fail point]
Steps listed in the figure:
• Dump MTRR’s & FP
• Invalidate caches/TLB’s
• Flush BTB, defeature BAC & unwind RSB
• Dump SMRAM save state map
• Dump APIC registers
LA trigger point
• BPx pin
• bus transaction
• IERR#
• FRCERR#
• HW/SW breakpoint (ITP)

09/29/03 TM Mak Post Silicon Validation 69


Physical Debug
• Why physical debug?
– Need to control or observe signal with no test point or scan
cell
– Need to observe analog behavior (delays, slopes,
waveforms)
– Need to modify layout to fix a timing or functional bug
• Use DFD (design-for-debug) hooks
– Probe points
– FIB cut and connection points
• Working with transistors & wires, not modeled faults

09/29/03 TM Mak Post Silicon Validation 70


Physical Debug
• Probing
– Micro-Mechanical probe
– Scanning Electron Microscopy (SEM)
– Ebeam probe points (front side)
– Picosecond Imaging Circuit Analysis (PICA)
– Laser Voltage Probing (LVP): Backside IR probing
• Focused Ion Beam (blue-wire editing
options)
– Spare hardware: bonus cells, bonus wires
– Pre-placed circuit options that can be activated if
needed

09/29/03 TM Mak Post Silicon Validation 71


Why Probe?
Why still probe (when scan is such a good debug tool)?
• For debug of circuit sensitivities:
– Crosstalk, inductance, PLL, sense amps
• For debug of speedpaths:
– When speedpath is not covered by scan, e.g. OTB
– A wide diversity of probing tools; not necessarily mechanical
[Figure: probe station; labels: Long working distance microscope objective (10-15mm working distance), Probe manipulator (two), IC, Sample holder and fixturing, Probestation base]

09/29/03 TM Mak Post Silicon Validation 72


Micromechanical Probing
• Touch 1-um probe to wire or contact point
– Used to probe top metal directly when geometries were
bigger
– Now, probe FIB-deposited metal pads
• Typically used for power & clock, not signals

09/29/03 TM Mak Post Silicon Validation 73


The Scanning Electron Microscope
(SEM)
• All E-Beam probers are basically SEMs that incorporate
additional hardware to enable voltage measurement
– The E-Beam prober is also a high resolution microscope and
this enables us to see what we are probing
– The imaging beam and the probing beam are one and the same
• An SEM comprises:
– An electron gun to generate an E-Beam
– Magnetic lenses to focus the E-Beam to a small spot size
– A scanning system to scan the E-Beam in a TV-like raster
– A detector to detect the electrons that come off the surface
• These electrons carry information on the surface topography and
surface voltage
– Hardware to extract image or voltage information from the
detector output
– A visual display to view the image on
09/29/03 TM Mak Post Silicon Validation 74
SEM for Defect Diagnosis

09/29/03 TM Mak Post Silicon Validation 75


Qualitative Voltage Contrast
[Figure: Primary E-beam striking two lines, one at 0 Volts and one at Vcc Volts; Secondary Electrons collected by detector]
0 Volts: A line at 0 Volts does not influence the low energy Secondary Electrons. Most electrons escape to the detector and the line appears bright.
Vcc Volts: A positively biased line attracts low energy Secondary Electrons back to itself. Few electrons escape to the detector and the line appears dark.

09/29/03 TM Mak Post Silicon Validation 76


Quantitative Voltage measurement
[Figure: measurement loop. A 1 KeV Primary E-Beam strikes an IC line at potential Vs; secondary electrons are emitted from the surface; low energy Secondary Electrons return back through the filter grid to the detector. Labels: Secondary Electron Detector (Id), I to V Converter (Vd), Servo Amp (-, + Vref), Filter grid, Vs as measured by E-Beam]

09/29/03 TM Mak Post Silicon Validation 77


SEM Voltage Contrast
• Place a chip connected to a tester under the e-beam
• Run test pattern (synchronize sample to test cycles)
• Metal with negative bias reflects more electrons and appears
brighter
• Metal with positive bias reflects fewer electrons and appears
darker

09/29/03 TM Mak Post Silicon Validation 78


Fault Diagnosis with SEM Diff
• Diff known good image and bad image to find
errors
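A minimal sketch of the image diff, assuming the two voltage-contrast images are already aligned and stored as rows of grey levels. The pixel values and the threshold are made up for illustration.

```python
from typing import List, Tuple

Image = List[List[int]]  # grey levels, one inner list per scan line


def diff_images(good: Image, bad: Image, threshold: int) -> List[Tuple[int, int]]:
    """Return (row, col) of pixels whose brightness differs by more than threshold."""
    return [
        (r, c)
        for r, (g_row, b_row) in enumerate(zip(good, bad))
        for c, (g, b) in enumerate(zip(g_row, b_row))
        if abs(g - b) > threshold
    ]


# Bright pixels stand for low-voltage metal, dark pixels for high-voltage metal.
good = [
    [200, 200, 40, 40],
    [200, 200, 40, 40],
    [40, 40, 200, 200],
]
bad = [
    [198, 203, 40, 41],      # small differences: imaging noise
    [200, 200, 200, 200],    # right half of this line is at the wrong state
    [40, 40, 200, 200],
]

print(diff_images(good, bad, threshold=50))
```

The threshold separates real logic-state differences from imaging noise; choosing it is a judgment call that depends on the contrast of the actual images.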

09/29/03 TM Mak Post Silicon Validation 79


Spatial and temporal analysis possible
[Figure: e-beam images of Metal2 and Metal3]
Logic state map: 2D @ any clock
Waveform: voltage @ any location
09/29/03 TM Mak Post Silicon Validation 80
E-Beam Probing
• Focus electron beam on a single conductor instead of
capturing a whole image
• Run a test in a loop and take multiple samples to
average out noise and construct a voltage waveform
• Limitations (or characteristics)
– Requires direct line of sight
• Probe top metal layers directly
• Aim beam between wires or through DFD probe holes (slots) to
access lower metals
• Possible to probe through insulators, but always
target conductors
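The looped sampling can be sketched numerically: one noisy sample per time point per loop, averaged over many loops. The waveform shape, noise level, and loop counts below are assumptions chosen only to show the trend, not measured values.

```python
import random
from typing import List


def true_waveform(n_points: int) -> List[float]:
    """Hypothetical node voltage: low for the first half of the loop, then high."""
    return [0.0 if t < n_points // 2 else 1.0 for t in range(n_points)]


def acquire(n_loops: int, n_points: int, noise_sigma: float,
            rng: random.Random) -> List[float]:
    """Average one noisy sample per time point over n_loops test loops."""
    ideal = true_waveform(n_points)
    sums = [0.0] * n_points
    for _ in range(n_loops):
        for t, v in enumerate(ideal):
            sums[t] += v + rng.gauss(0.0, noise_sigma)
    return [s / n_loops for s in sums]


def rms_error(measured: List[float]) -> float:
    """RMS difference between the averaged waveform and the ideal one."""
    ideal = true_waveform(len(measured))
    squared = [(m - i) ** 2 for m, i in zip(measured, ideal)]
    return (sum(squared) / len(squared)) ** 0.5


rng = random.Random(0)
for loops in (1, 100, 10_000):
    error = rms_error(acquire(loops, 32, noise_sigma=2.0, rng=rng))
    print(f"{loops:>6} loops: rms error {error:.3f}")
```

For independent noise the error shrinks roughly as one over the square root of the number of loops, which is why a short, tightly repeating test loop matters for probing.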

09/29/03 TM Mak Post Silicon Validation 81


E-beam as Invasive Probe
• Probing is normally non-invasive due to low
electron count in beam relative to carrier
count in circuit
• Increase beam power to charge up a MOS
gate for primitive controllability

09/29/03 TM Mak Post Silicon Validation 82


The Packaging Revolution:
From Wire-Bond to Flip-Chip
[Figure: Wire-bond Die (top view) vs. C4 Package and Die (top view); labels: LICAs, Die Backside, Substrate]
LICA: Low Inductance Capacitor Arrays


09/29/03 TM Mak Post Silicon Validation 83
C4: blessing and curses
• C4 reduces inductance by over 100X
• Allows for pad placement anywhere on the chip
• Global power distribution now in package, not on the chip.

• Frontside mechanical and E-beam probing no longer possible
• Frontside FIB patching also not possible
• Forces the adoption of scan for debug as well as new physical debug tools

09/29/03 TM Mak Post Silicon Validation 84


Backside Ebeam Probing

In C4, Frontside is no longer accessible:


• For C4, all chip diagnostics must be done
through the silicon backside.
• Could drill precision holes in the substrate to
access diffusions for E-Beam probing.
• Perform blue wiring (FIB edits) directly between
diffusions/metal thru the backside.

09/29/03 TM Mak Post Silicon Validation 85


Backside (IR) vs. Frontside
[Figure: A high resolution Infra Red (IR) image taken through the silicon backside in the same location shown on the previous slide. Labels: N+ Diffusions, Shallow Trench Isolations (STR), Poly Resistor]
[Figure: A high resolution image of a P54C-QS (P854) chip as seen from the frontside.]

09/29/03 TM Mak Post Silicon Validation 86


Backside Probe points
[Figure, four panels; labels: Chip A, Chip B, Thinned Chip B, C4/MCM Substrate Ceramic/Organic, N+ Diffusion (NAC) Diode, LCE Milled trench, P-Well, Field Oxide, ILD0, Contact, Metal 1, P+ Substrate, 100µm, 10µm]
a) Flip Chip MCM with 2 Chips A & B and LICAs. Chip B needs to be probed.
b) First globally thin Chip B only to a thickness of ~100µm using a fast wet chemical etch.
c) Magnified view of silicon chip B in MCM in the region of the circle. Use Laser Chemical Etch to mill a local trench to within 10 µm of the P-Well and active circuits. The trench walls are sloped to minimize the amount of silicon removed.
d) Magnified view of region in circle in c). Final probe hole drilled at the base of the LCE generated trench to expose an N+ diffusion (NAC) diode. The E-Beam probes the N+ diffusion directly. The tapered holes improve electron collection through the hole for the E-Beam probing and FIB imaging.

Holes are required for debugging circuit problems only!

09/29/03 TM Mak Post Silicon Validation 87


FIB Probe Hole
Typical tapered probe hole to expose N+ NAC in .35um
[Figure: SEM cross section; labels: Stepped (tapered) probe hole, P+, ILD0, NAC, Metal 1]
Probe hole back filled with Tungsten to enhance contrast of SEM cross section (not present for probing)

09/29/03 TM Mak Post Silicon Validation 88


FIB Probe Hole
Generating a Probe Hole

1. Sample is FIB marked


2. High speed silicon
etch begins
3. Multi step milling
begins
4. Oxide interface is
exposed
5. Signal is exposed for
E-Beam probing.

09/29/03 TM Mak Post Silicon Validation 93


LCE (Laser Chemical Etch) Trench

[Figure labels: Coarse LCE Trench Etch Step, Fine LCE Trench Etch Step, Silicon Substrate]

09/29/03 TM Mak Post Silicon Validation 94


Infrared Navigation with Fiducials

• How do I know what I’m looking at?


– Frontside access: Use top metal geography to get
oriented
– Backside access: No way to tell what’s under flat
silicon
• Place DFD fiducial “beacons” across the chip
• Under IR camera, look like bright lights
surrounded by darkness
• Align a CAD overlay to locate hidden layout
features
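Aligning a CAD overlay to the IR image can be sketched as fitting a transform from fiducial positions. The sketch below assumes a pure scale, rotation, and shift (no distortion) and uses just two fiducials; the coordinates are invented, and real tools would likely fit more points.

```python
from typing import Tuple

Point = Tuple[float, float]


def fit_overlay(cad_a: Point, cad_b: Point,
                img_a: Point, img_b: Point) -> Tuple[complex, complex]:
    """Fit image = scale_rot * cad + offset from two matched fiducials.

    Points are treated as complex numbers so one complex multiply encodes
    both rotation and uniform scale.
    """
    ca, cb = complex(*cad_a), complex(*cad_b)
    ia, ib = complex(*img_a), complex(*img_b)
    scale_rot = (ib - ia) / (cb - ca)
    offset = ia - scale_rot * ca
    return scale_rot, offset


def cad_to_image(p: Point, scale_rot: complex, offset: complex) -> Point:
    """Map a CAD layout coordinate to its expected position in the IR image."""
    z = scale_rot * complex(*p) + offset
    return (z.real, z.imag)


# Two fiducials: CAD coordinates (um) and where they appear in the IR image (pixels).
scale_rot, offset = fit_overlay(
    cad_a=(0.0, 0.0), cad_b=(100.0, 0.0),
    img_a=(50.0, 20.0), img_b=(50.0, 220.0),
)

# Where should a hidden layout feature at CAD (10, 5) appear in the image?
print(cad_to_image((10.0, 5.0), scale_rot, offset))
print(abs(scale_rot))  # magnification in pixels per um
```

Once the transform is known, any layout feature can be located under the flat backside silicon by mapping its CAD coordinates into the image.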

09/29/03 TM Mak Post Silicon Validation 95


Laser Voltage Probing (LVP)
• Probing version of IR microscopy
– Analogous to e-beam
• Reflected optical power is proportional to voltage across reverse-biased P-N junction
• Place DFD probe-able diodes at gate inputs
[Figure: Incident IR Beam and Reflected IR Beam at an N+ diffusion in a p-well; labels: Input electrical signal, High electric field region E(t), ILD0, Metal. Plots of reflected optical power vs. time: with NO input electrical signal and with applied electrical signal]

09/29/03 TM Mak Post Silicon Validation 96


LVP Sample Preparation
and Process Flow
[Figure, four panels; labels: Chip A, Chip B, Thinned Chip B, C4/MCM Organic LGA Substrate, IR Laser Probe, ARC (~0.25µm thick), N+ PNAC Diode, 180 µm P+ substrate, ILD0, Contact, Metal 1]
• Chip B needs to be probed
• Globally thin Chip B to ~180µm
• Deposit SiO Anti-Reflection Coating (ARC) layer
• Probe N+ directly with laser
09/29/03 TM Mak Post Silicon Validation 97
LVP Simple Schematic
[Figure: optical path; labels: Mode Locked Laser (100MHz, 1.064µm, ~30ps pulse width), Collimating Lens, Faraday Rotator, Polarizing Beam Splitter, IR Objective Lens, Silicon IC, Photodiodes, To Detection Electronics, Laser sampling pulses, Output Timing Waveform]


09/29/03 TM Mak Post Silicon Validation 98
Electrons are Out
Infrared Photons Are In
• Carriers emit photons as they flow across P-N
junctions
– Materials and dopant concentrations are chosen in LEDs to
make the photon wavelength in the visible spectrum
– Normal MOSFETs emit infrared photons when they switch
(no steady-state current flow)
• Silicon is transparent to infrared light
– IR photons pass through substrate, out the backside of the
die
• Focus an IR camera on the die to collect photons
– Probability of emission is ~1 in a million, so test needs to be
run through millions of loops to collect enough photons
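The loop count follows directly from the emission probability. The sketch below reads the slide's figure as roughly one photon per million switching events, which is an interpretation; the target photon count and switches per loop are illustrative, and detector efficiency is ignored.

```python
def loops_needed(target_photons: int,
                 one_in: int = 1_000_000,
                 switches_per_loop: int = 1) -> int:
    """Expected number of test loops to collect target_photons from one node.

    one_in: emission odds, one photon per `one_in` switching events.
    switches_per_loop: how often the node of interest switches in one loop.
    Integer ceiling division avoids floating-point rounding.
    """
    total_switches = target_photons * one_in
    return -(-total_switches // switches_per_loop)


print(loops_needed(100))
print(loops_needed(100, switches_per_loop=4))
```

Even a modest photon budget needs on the order of a hundred million loops under these assumptions, so the test loop has to be short and exactly repeatable.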

09/29/03 TM Mak Post Silicon Validation 99


Dynamic emission gives
picosecond resolution
– Emission is proportional to current
– From NMOS on falling edge
– Faint signal: 10^-3 photons/switch/µm
[Figure: CMOS inverter (PMOS, NMOS) with waveforms of V, Ip, In and Light vs. time]
09/29/03 TM Mak Post Silicon Validation 100
Picosecond Imaging Circuit
Analysis

[Two embedded movie files (MPEG)]

09/29/03 TM Mak Post Silicon Validation 101


Picosecond Imaging Circuit
Analysis (PICA)

09/29/03 TM Mak Post Silicon Validation 102


Focused Ion Beam (FIB) Editing
• Verify circuit fix without the tapeout/fabrication of new wafers
– Used for debug workarounds & engineering samples
– Not reliable enough for production
• Physically modify a circuit
– An X-Acto knife with a precision of 10nm for cutting metal lines
– A wiring tool for reconnecting nodes with metal with a 10nm precision
– Destroy & repair to access buried wires
– Cannibalization
• Success rate & throughput time metrics
• DFD features to make FIB access easier
– Deterministic & opportunistic
[Figure: images of an edit; labels: Before, After, Cut Here, Reconnect with this]

09/29/03 TM Mak Post Silicon Validation 103


FIB Overview
[Figure, panel 1: Global Thinning from 700um to 150um; labels: Die, C4 bumps, C4 Substrate, Lands]
[Figure, panel 2: Local Thinning to 10um over area of interest; labels: Argon Ion Laser, Scanning Mirrors, Cl2 filled Cell, SiCl, Active Area, Silicon Substrate]
[Figure, panels 3 and 4: FIB Blue Wire; labels: Focused Ion Beam, Gas Delivery Needle, W(CO)6, Br2, SiBr4, LCE Trench Floor, Shallow Trench Oxide, Silicon Substrate, Diffusion, ILD0, Metal Line (signal), 1um]

09/29/03 TM Mak Post Silicon Validation 104


Transistor Trimming

09/29/03 TM Mak Post Silicon Validation 105


Spare Hardware
• Spare combinational and/or sequential cells
– Need to choose types & density requirements
• Spare wires
– Need to choose lengths, metal layers, and accessibility
• Placement policy decisions
– Add concurrently with functional design to guarantee
availability?
– Add in leftover space after design to minimize overhead?
– How much spare hardware should be added?
– Leave floating or tie to power rail?
• DFD features for FIB accessibility, or ECO only?

09/29/03 TM Mak Post Silicon Validation 106


Example Complex FIB Edit

09/29/03 TM Mak Post Silicon Validation 107


SEM Photos of Edit

09/29/03 TM Mak Post Silicon Validation 108


Challenges for silicon debug
• Circuit level issues are difficult to model
– Charge sharing, xtalk, leakage for large fan out
– Schematics may span across hierarchy (extraction is more
difficult)
– What is the equivalent of fault injection?
– Problems are usually obscure, which is why they are not
discovered before tapeout
• Existing diagnosis tools all assume some kind of
defect model – kind of useless
• Confirmation stage is usually needed
– Circuit level issues never behave as simple fault injection
(not a 1/0 problem, at least not at the circuit where it
originates)

09/29/03 TM Mak Post Silicon Validation 109


Yield analysis
• Usually a result of fabrication induced defects
– Defect mode may or may not be known ahead of time (esp. for
new process)
– More than likely, yield is not just a stuck fault (or even
bridge/opens), it can be a performance limiter
– Even trickier if new process + new design + new package are
combined
– Process cannot be improved if defect mechanism is not fully
understood
• SRAM array is usually the yield improvement vehicle
– Bad cell or bad rows/columns are easy to identify
– But what is the defect that causes the actual performance problem?
– Enter LYA test mode, pre-simulate cell fault dictionary to deduce
possible faults – electrical diagnosis
– SRAM layout is more monotonic so specific litho issues may or
may not be revealed
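A fault dictionary lookup can be sketched as matching the observed failing tests against pre-simulated predictions. The defect names, test names, and dictionary contents below are entirely hypothetical, and the Jaccard ranking is just one reasonable scoring choice, not the method used in the LYA flow.

```python
from typing import Dict, FrozenSet, Iterable, List, Tuple

# Hypothetical pre-simulated dictionary: the set of tests each modeled
# cell defect is predicted to fail.
FAULT_DICTIONARY: Dict[str, FrozenSet[str]] = {
    "open_contact": frozenset({"write0", "write1", "read0", "read1"}),
    "weak_pulldown": frozenset({"read0_at_speed"}),
    "bitline_bridge": frozenset({"write1", "read1"}),
}


def diagnose(observed_fails: Iterable[str],
             dictionary: Dict[str, FrozenSet[str]]) -> List[Tuple[str, float]]:
    """Rank candidate defects by Jaccard similarity to the observed failures."""
    observed = frozenset(observed_fails)
    scored = []
    for fault, predicted in dictionary.items():
        union = observed | predicted
        score = len(observed & predicted) / len(union) if union else 1.0
        scored.append((fault, score))
    # Best match first; ties are broken alphabetically for repeatability.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored


for fault, score in diagnose({"write1", "read1"}, FAULT_DICTIONARY):
    print(f"{fault}: {score:.2f}")
```

The ranking only narrows the candidate list; physical confirmation of the top candidates is still needed.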

09/29/03 TM Mak Post Silicon Validation 110


From failed bit-maps to faults within
the cell

09/29/03 TM Mak Post Silicon Validation 111


Some defects are easily found … (if
you know where to look…)
using top-down microscope and SEM visuals during reverse processing.

09/29/03 TM Mak Post Silicon Validation 112


Decreasing circuit dimensions….
• very small defects (20nm) can be killer defects

[Figure: TEM image; labels: Drain, Contact, Node 1 shorted to Node 2 (10K Res)]

09/29/03 TM Mak Post Silicon Validation 113


Invisible defects
• Process variation
– Used to be detectable via scribe-line E-test structures
– On-die process variation changes all that; need to go back to stepper characterization data
– Affects not only minimum dimensions like Le, but also dielectric and metal thickness
– Gate oxide pin holes/local oxide thickness can cause increased leakage
• Dopant variation
– Dopant count under the gate decreases to thousands or even hundreds of impurity atoms
– Diffusion/poly resistance

re: Asenov et al
09/29/03 TM Mak Post Silicon Validation 114
Other debug/analysis tools
• Material analysis tools
– Fab/assembly process debug
[Figure: Incident energy strikes the sample; Sample emissions: X-rays, visible light, auger and secondary electrons, and ions; Transmitted energy]
Except for AFM, which is different
This is what we do -- how complicated can it be?

09/29/03 TM Mak Post Silicon Validation 116


Technique | Acronym | Incident | Detected
Atomic Force Microscopy | AFM | force | deflection
Auger | Auger | Electron, 30 kV | Sample electron
Energy Dispersive X-ray | EDX | Electron | Sample X-ray
Inductively coupled plasma Mass spectroscopy (ICP-MS) | ICP-MS | RF-generated plasma | Sample ions
Infrared spectroscopy | IR | IR | Incident IR
Scanning Electron Microscopy (SEM) | SEM | Electrons, 1-30kV | Sample electrons (SE); Incident electrons (BS)
Secondary Ion Mass Spectroscopy | SIMS or dynamic SIMS | Cs, O ions | Sample ions
Time-of-flight SIMS | TOF-SIMS | Ga ions | Sample ions
Transmission Electron Microscopy | TEM | Electrons, 200kV | Incident electrons
Total Reflection X-ray Fluorescence | TXRF | Mo/W X-ray | Sample X-ray
X-ray photoelectron spectroscopy | XPS | Al X-ray | Sample electron
X-ray diffraction | XRD | Cu X-ray | Incident X-ray
X-ray Reflectivity | XRR | Cu X-ray | Incident X-ray
09/29/03 TM Mak Post Silicon Validation 117
Failures in Packages
• Schematic representation of a cross-section of an OLGA package
– The 2nd schematic highlights the types of failures observed in a typical multilayer organic substrate
[Figure 1 labels: Underfill, IHS, Thermal Interface Material, Substrate]
[Figure 2 labels: Cracks through the layers, Delamination at interfaces, Cracks in plated through hole, Via adhesion issues]

09/29/03 TM Mak Post Silicon Validation 118


X-Ray Radiography
[Figure: X-ray images; labels: BGA Solder Ball, C4 Bump, C4 Bump Void, Plated Through Hole]
Multilayer interconnects mask the defects - challenge is for the difficult materials like solders.

09/29/03 TM Mak Post Silicon Validation 119


3D XRay Computed Tomography
Images of Mammoth Stacked micro-vias
• Need to detect failures, damage to the internal structures of the package
without damaging or destroying the parts
• NON-DESTRUCTIVE techniques to probe the package, such as XRAYs…

[Figure labels: Stacked Vias, Solder joints]
Here, the package is intact and has not been tampered with in the region of interest

09/29/03 TM Mak Post Silicon Validation 120


Thermal Imaging
[Figure labels: Flash Lamps, Heat conduction, defect, IR radiation, High speed IR camera, Image/data processing]
Steps shown: Surface excitation; Monitor surface ∆T as a function of time (in msec); Data process

09/29/03 TM Mak Post Silicon Validation 121


Correlation between thermal imaging and acoustic imaging
[Figure: IR images shown alongside Reflection SAM and Transmission SAM images; label: Good region]

09/29/03 TM Mak Post Silicon Validation 122


A multi-disciplinary task
• Si-Debug or FA is a complicated problem,
often involving multi-disciplinary skills ranging
from Sys S/W to circuit design
• FA is especially hard as few units returned
bear the same signature
– Identify primary root cause so that corrective
actions can be taken (solve the BIG problems first)
• Both Si-Debug and FA are under extreme
time pressure due to various business
implications
– Competitive pressure; customer line down
09/29/03 TM Mak Post Silicon Validation 123
