Microelectronic Design
Chapter 1 - Integrated Circuit Design Process (Overview)

The chapter contains the following sections:


1.1 Introduction
1.2 The Design Process
1.3 Design Considerations - Technology
1.3.1 Silicon
1.3.2 Gallium Arsenide
1.4 Design Considerations - Architecture
1.4.1 PLDs
1.4.2 Gate Arrays
1.4.3 FPGA
1.4.4 Standard Cell (CBIC)
1.4.5 Full Custom
1.5 Other Design Considerations
Self Assessment Questions

1.1 Introduction

An integrated circuit (IC) is a piece of semiconductor material, most commonly silicon and often referred to as a chip.

Circuit components are built into and onto one face (monolithic) and interconnected by metal tracks to form a complex electrical circuit.

Key features are small size, high complexity, low cost and very high reliability.

Penalties are the limited range of component values available and unwanted interactions caused by the close proximity of circuit components on the same chip.

The IC designer's role is to achieve the required circuit functionality despite the IC limitations due to unwanted interactions.

[Chip photograph - typically 1 cm × 1 cm, containing up to 1 million components]
● First ICs developed in 1958 by Jack Kilby of Texas Instruments.
● Development of the IC was a result of the development of the first transistor by Shockley, Bardeen and Brattain at Bell Laboratories in 1947.
● Initial ICs consisted of only a few transistors and resistors but offered advantages over the thermionic-valve-based systems used in computers at that time, in terms of reduced size, improved reliability and reduced power.
● Technological improvements now enable chips to be designed containing up to 10 million transistors, referred to as Very Large Scale Integration (VLSI).

The scale of increase in complexity (number of transistors on a chip) is often expressed as complexity doubling every 2.2 years (Moore's Law) - shown diagrammatically over the last 35 years in Graph 1. There is some recent evidence that the increase in complexity is slowing down, the plot becoming more of a curve than a straight line (see Graph 1).

Graph 1 - Integrated circuit complexity versus time
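As a rough numerical illustration of the doubling rule above, here is a minimal Python sketch; the 1960 baseline year and the initial transistor count are assumed values chosen for illustration, not figures from the text.

```python
# Hypothetical Moore's Law illustration: transistor count doubling
# every 2.2 years. The base year and base count are assumed values.
def transistors(year, base_year=1960, base_count=10, doubling_period=2.2):
    """Estimate transistor count, assuming doubling every 2.2 years."""
    return base_count * 2 ** ((year - base_year) / doubling_period)

for y in (1970, 1980, 1990, 2000):
    print(f"{y}: ~{transistors(y):,.0f} transistors")
```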

1.2 The Design Process

Design of a complex VLSI chip is a major task - can take several man-years. Teams of designers working on different parts of a
chip use Computer Aided Design (CAD) tools to reduce design time and with the objective of ensuring that design is correct first
time.

Diagram 1. shows the IC design/manufacturing process overall.


Note: This is a key diagram; the different contributions will be covered in the modules taken on this course and will enable delegates to carry out actual IC designs as they progress through the modules and carry out the final project.

Diagram 1 - The IC design process

Chip specification drawn up by the system user/designer covering all aspects including functionality, speed, power dissipation,
package type, number of package pins, volume, reliability, voltage and current ratings, etc.
Initial specification leads the designer to conclusions on the technology and architecture to be used.

System design and system partitioning into sub-systems follows.

System test considerations taken into account to ensure adequate and economic testing possible. (Design for Test or DFT).

Sub-system design using pre-defined circuits (cells) or specially designed sub-systems if required and allowed within the design
style adopted.

Designs are now usually carried out using a Hardware Description Language (HDL) such as VHDL (VHSIC Hardware Description Language) for digital systems, or an analogue HDL for analogue or mixed analogue/digital systems. The advantages are ease of use and a correct-by-design capability, using a synthesis tool to generate the low-level designs.

Traditional design procedure (Diagram 1) is:

● Draw the circuit using a schematic capture package.

● Simulate the circuit using a simulator tool (logic for digital, circuit for analogue).

● Lay out the chip using a layout package.

● Back-annotate the layout and re-simulate using actual layout parameters to check that the design still meets its specification.

● Mask production from the layout.

● Silicon fabrication using the masks and thin slices of silicon (wafers), test the wafers and package the chips.

● Production test.
1.3 Design Considerations - Technology

At least two materials are available for fabricating ICs:

Silicon (Si).

Gallium arsenide (GaAs).

1.3.1 Silicon
• Two main types of technologies available in silicon based on Bipolar Junction Transistor (BJT) and Metal-Oxide-
Semiconductor Field Effect Transistor (MOSFET).
• Bipolar technology offers high speed, high current drive but at the expense of high power dissipation, low
complexity. It uses n-p-n and p-n-p BJTs, diodes and resistors and is good for analogue and digital circuits.
• MOSFET technology offers moderate speed and power and high complexity, and is based on either Enhancement (normally off) type n-channel MOSFETs (EnMOS) or, for improved performance, both Enhancement and Depletion type (normally on) n-channel MOSFETs (EDnMOS). Only suitable for digital circuits.
• p-channel MOSFET technology (pMOS) is no longer offered on its own due to the devices' slow performance compared to nMOS technology, a consequence of the lower mobility of the charge carriers (holes).
• Technology based on both p-channel and n-channel enhancement MOSFET known as Complementary Metal-
Oxide Semiconductor (CMOS) popular as it offers very low power dissipation, is suitable for both analogue and digital
applications but at the expense of lower complexity compared to EDnMOS. CMOS is becoming the standard process
for all but the highest speed devices.
• Technology based on both bipolar and CMOS (BiCMOS) can give the best of both technologies but at the
expense of increased fabrication complexity and cost.

1.3.2 Gallium Arsenide

● Inherently a material capable of producing much faster transistors than silicon.

● Based on components which are a variation of field effect transistors known as Metal Semiconductor Field Effect Transistors (MESFET).

● Originally used in very high frequency ICs but now available for very high speed digital circuits of relatively low complexity.

● Expensive compared to silicon ICs.

● Choice of technology will depend on a number of factors given in the specification, including cost, CAD tools, etc.

1.4 Design Considerations - Architecture


Several architecture choices available; the main types are:

● Programmable Logic Device (PLD) (semi-custom)

● Gate array (semi-custom)

● Field Programmable Gate Array (FPGA) (semi-custom)

● Standard cell (semi-custom). Often called CBIC (Cell-based I.C.)

● Full custom

Note: In semi-custom the designer accepts restrictions in order to simplify the design whereas in full custom the designer is free
to optimise each component to improve performance and reduce chip size. This increases cost and design time enormously.

1.4.1 PLDs
• PLDs are re-usable, PROM-like devices in which the user programmes an array of transistors/gates to form a given function using an electrical programming device.
• No involvement of IC manufacture.
• Cheap and quick but with little flexibility.
• Range of CAD tools available running on PCs.
• Originally digital only but pure analogue arrays now also available.

1.4.2 Gate Arrays


• Gate arrays are arrays of transistors manufactured on-chip but not connected.
• Individual interconnection mask made for each design which is used to ‘customise’ the arrays.
• Cheap for medium or even low volumes.
• Semiconductor manufacturer has to be involved in the process.
• Typical turn-round times 2-4 weeks.
• High complexity now available up to a million logic gates using 4 million transistors.
• Available in most technologies.
• High flexibility.
• Originally purely digital but now available with analogue cells giving mixed analogue/digital capability.

1.4.3 FPGA
• Variation of gate array which allows the user to field programme a gate array as in PLD and thus reduce design
time.
• FPGAs less dense than gate arrays and therefore higher unit cost.
• Compatible FPGA/gate array device ranges available to enable initial designs to be made using FPGAs and
transfer easily to gate arrays later if required.

1.4.4 Standard Cell (CBIC)


• Designer uses a library of standard pre-designed cells stored in a computer and builds up the chip design in that way.
• Highly flexible in allowing cells to be placed, modified and connected, giving chip areas typically 10-15% smaller than gate arrays and hence lower unit costs.
• Since each design is fully customised it requires the production of a unique full mask set, and therefore a longer turn-round time (typically 10 weeks) and higher initial non-recurring (NRE) charges; usually only economic at high volumes.

1.4.5 Full Custom


• The chip is specially designed from the beginning giving a design that is fully customised to a given specification.
• Designer effectively works at transistor level but may use pre-defined cells if suitable.
• Design may take a number of man-years and therefore very expensive.
• Finished chip is likely to match the specification precisely but at high cost.
• Only economic at very high volumes (in excess of 100K units/year).

1.5 Other Design Considerations


General design approach either ‘top-down’ - essentially starting from the system requirements or ‘bottom-up’ - starting from the
IC components available to the designer.

Chip area and floorplan and packaging aspects need to be considered, because chip area usually determines cost and yield
while packaging is of concern to the customer and designer alike.

Power supply details, including operating voltage and current requirements, are required in order to design power track widths, guard against electromigration, etc.

CAD tools availability, effectiveness and ease of use important as they will form a key part of the design process, including
manufacture.

Test aspects are vital, including the incorporation of Design for Test (DfT) circuitry to enable efficient and effective production
testing to take place in a few seconds.

Test costs are becoming a significant part of total chip costs in many cases. See Graph 2:
Graph 2 - Relative cost versus cost breakdown.

Finally, last but not least, cost considerations have to be taken into account as each of the previous technological, architectural
and other considerations have different cost implications.

We can estimate the cost of a particular chip having made some reasonable assumptions.
For a typical chip cost illustration see section 1.6.1 in the recommended textbook.

For this example this results in a typical cashflow forecast - see Graph 3:

Graph 3 - Balance versus months

Hence the break-even time (27 months in this case) can be estimated.
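As a hedged illustration of how such a break-even estimate can be made, here is a minimal Python cashflow sketch; the NRE cost, unit cost, selling price and monthly volume are invented assumptions, and the 27-month figure quoted above comes from the textbook example, not from this calculation.

```python
# Illustrative break-even sketch for a chip project. All numbers below
# are assumed values for illustration only.
nre_cost = 500_000        # assumed non-recurring design/mask cost
unit_cost = 5.0           # assumed manufacturing cost per chip
unit_price = 8.0          # assumed selling price per chip
monthly_volume = 6_000    # assumed sales volume per month

balance = -nre_cost       # cashflow starts negative by the NRE cost
month = 0
while balance < 0:
    month += 1
    balance += monthly_volume * (unit_price - unit_cost)
print(f"Break-even after {month} months")  # ~28 months with these numbers
```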
Microelectronic Design
Chapter 2 - Microelectronics Fabrication Process

The chapter contains the following sections:


2.1 Introduction
2.2 Epitaxial Growth or Epitaxy
2.3 Photolithography and Etching
2.4 Diffusion and Ion Implantation
2.4.1 Diffusion
2.4.2 Ion Implantation
2.5 Silicon Dioxide & Other Dielectric Layers
2.6 Aluminium & Other Metallisation Layers
2.7 Packaging & Assembly
2.8 Examples of IC Fabrication Process

2.1 Introduction

● Fabrication process described in this module applies only to silicon as it is by far the most commonly used semiconductor
material, and covers all the stages involved in the fabrication of any microelectronics device.

● GaAs fabrication process has many parallels with silicon process but is different in some significant ways - for further
information see texts on GaAs device fabrication.

● Semiconductors essential for the fabrication of microelectronic devices because their atomic band structures are such that the
addition of small amounts of certain elements (doping) changes the electrical properties of the semiconductor dramatically.
● Silicon occurs naturally as silicon dioxide in sand; it is chemically reduced and purified until it is very pure, containing typically <1 part per billion (ppb) impurity.

[Photographs: polysilicon ingots; ingot pulling]

● Silicon is then treated using processes such as float zone to produce single-crystal ingots, typically 8-10" in diameter, of highly pure or accurately doped silicon.

[Photographs: silicon ingots; single silicon ingot (Mitsubishi)]

● Ingots are sawn into thin wafers or substrates which are the starting points of the IC fabrication process - for further information see section A1.2 in the textbook.

[Photograph: wafers]
● Many variants of the basic IC process to produce different types of IC eg. bipolar, CMOS, nMOS, etc. but all essentially
follow the stages shown in Diagram 2.

Diagram 2 - Process stages


• The complete IC fabrication process has many individual processing steps (>100) and can take several weeks to carry out.
• Each process step is accurately controlled in order to give an acceptable overall result (high process yield).
• A typical IC chip of area 1cm × 1cm contains 1 million or more components, each of the order of µm in size (a human hair is approximately 50µm in diameter). Patterning defines the component sizes - currently 0.2-0.5µm (line width).

2.2 Epitaxial Growth or Epitaxy


● Epitaxy is the process whereby very thin layers (1-10µm) of accurately controlled doped silicon are 'grown' onto the wafer in such a way that the crystal structure is continuous between the substrate and the epitaxial layer. See the photographs for the equipment used for epitaxy.

[Photographs: interior of epitaxial reactor (Mitsubishi); epitaxial reactors (Mitsubishi)]

2.3 Photolithography and Etching


[Photographs: photoresist application (Ontrak); stepper (AMS Lithography); mask (SGS Thomson)]

• Photolithography effectively transfers the chip layout on a mask onto the silicon surface - it has similarities to photographic printing.
• The object is to enable many (millions of) shapes to be printed on the wafer in one operation (enormous cost benefits).
• It is the most important process for ensuring that the various components line up with each other and are interconnected correctly (determines line width).
2.4 Diffusion and Ion Implantation

• Ability to change the doping level or type essential in IC fabrication process since parts of the
IC have to be doped differently in order that the IC functions correctly. For example, for a high gain
BJT, emitter doping>base doping>collector doping.
• Epitaxy (section 2.2) changes the doping over the whole of the wafer (globally) whereas more
often it is required to change the doping over part of the slice (selectively).
• Photolithography used to form patterns on the wafer surface but cannot, in itself, be used as a
mask to prevent dopants reaching the silicon wafer underneath it. This is because both diffusion and
ion implantation are high temperature/high energy processes and the chemical elements involved
would simply pass through the photoresist.
• Silicon dioxide which can be easily formed on the surface of the wafer and is very dense and
strong, is capable of forming an excellent dopant barrier.
• Process to selectively dope an area in two steps; firstly photolithography used to define the
required pattern in the silicon dioxide layer which is then used in the second step to limit the dopant
to the required areas only.
• Process relies on the ability to accurately remove material such as silicon dioxide defined by
photolithography - process known as etching.

● Etching originally used acids or solvents of various types (wet etching), but this suffered seriously because material was removed in all directions (isotropic), undercutting the masking material and changing the pattern dimensions from the original layout/mask. Undercutting affected device line widths and in some cases produced faulty devices.

[Photographs: automated acid etch (SEZ); acid etch (Cybor)]
● Dry etching anisotropic processes were developed which remove material in one direction only (normally vertically), overcoming undercutting and giving a faithful representation of the mask pattern on the silicon wafer.

[Photograph: plasma asher (Fusion Systems)]

2.4.1 Diffusion

• Diffusion (or solid-state diffusion) is the process whereby a solid will physically diffuse into another solid in close contact with it, due to the random thermal movement of atoms.
• Diffusion is essentially zero at room temperature, and remains so up to 300-400ºC over long periods at normal operating temperatures - important for finished devices maintaining their functionality.
• At high temperatures (>1,000ºC) diffusion increases considerably.

[Photograph: furnace (Thermco Systems)]

• To form a p-type region the element boron is diffused into silicon, while the elements arsenic or phosphorus are used to form n-type regions.
• For an n-type wafer placed in a furnace at high temperature in the presence of a high concentration of boron, the boron will progressively diffuse into the wafer to a depth dependent on the furnace temperature and duration (typical depths used are 0.25-2.0µm; see the sketch after this list). A p-n junction is then formed in the wafer, whose electrical properties are those of a diode, and it is electrically stable.
• Further diffusions can be used to form n-p-n BJTs, MOSFETs, resistors, etc.
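To make the temperature/time dependence concrete, here is a rough Python sketch of the classic diffusion-depth estimate (depth of order 2√(Dt), with an Arrhenius diffusivity); the D0 and Ea values are representative literature figures for boron in silicon, assumed here for illustration rather than taken from the text.

```python
import math

# Rough junction-depth estimate: depth ~ 2*sqrt(D*t), where the
# diffusivity follows an Arrhenius law D = D0 * exp(-Ea / kT).
# D0 and Ea are representative values for boron in silicon (assumed).
def diffusion_depth_um(temp_c, time_s, d0_cm2_s=0.76, ea_ev=3.46):
    k_ev = 8.617e-5                                      # Boltzmann, eV/K
    temp_k = temp_c + 273.15
    d = d0_cm2_s * math.exp(-ea_ev / (k_ev * temp_k))    # cm^2/s
    return 2 * math.sqrt(d * time_s) * 1e4               # cm -> um

print(f"{diffusion_depth_um(1100, 3600):.2f} um")  # ~0.5 um for 1 h at 1100 C
```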

2.4.2 Ion Implantation


• Alternative to diffusion which may be used in certain cases.
• Some diffusion processes are difficult to control accurately in terms of junction depths and
doping concentration and in particular, due to the high temperatures involved, the profiles of the
dopant fronts are not square but tapered in depth resulting in non-ideal device performance.
• Ion implantation works in two stages. The first stage fires high-energy ions of the relevant element, say boron, at the silicon wafer. The ions travel a small distance (typically <1µm) into the wafer before losing their energy and being absorbed. Accurate control of the energy of the ions ensures that the absorption depth has a tight tolerance.
• Second stage is the annealing process at temperatures of about 900ºC for a short time to repair
the mechanical damage caused by the high energy ions and also to cause ions to fit into the silicon
crystal lattice substitutionally and hence become electrically active as dopant atoms.
• Essentially a low temperature process ensuring squarer diffusion profiles and less unwanted
diffusions and hence improved device electrical characteristics but slower than diffusion.

2.5 Silicon Dioxide & Other Dielectric Layers

• Silicon dioxide and silicon nitride are dielectric materials and are used in IC processing for
electrical insulation purposes and for passivation (final covering of all exposed areas of silicon).

• Silicon dioxide and silicon nitride are also used as dopant masks, and as dielectrics in capacitors.
• Both types of layer are easily formed by deposition, or by heating the wafer at temperatures up to around 1,000ºC in the presence of the relevant gases.
• Polycrystalline silicon (polysilicon) is made up of many small grains of silicon and is not a regular crystal structure like the wafer itself; it is formed by depositing silicon onto the wafer.
• Polysilicon is used as an interconnect between parts of the chip (highly doped in order to reduce resistance) or for passivation (undoped to increase its resistance).

[Photograph: oxidation furnace (Thermco Systems)]
2.6 Aluminium & Other Metallisation Layers
• ICs contain a very large number of transistors and other components formed in the silicon
wafer surface.
• The components have to be interconnected to form a working circuit using a low resistance
material (metal) - usually aluminium - that is compatible with the silicon fabrication process
(metallisation).
• Polysilicon also used as it offers the ability to form extra layers relatively easily but its
resistance is higher than a metal.

[Photographs: PVD sputtering tool (Sputtered Films Corporation); thin film deposition (Acatel Corporation)]

• Metallisation takes place towards the end of the fabrication process and involves the deposition
of a thin layer of aluminium (typically 1µm) over the whole of the wafer, (by processes known as
aluminium evaporation or sputtering) and then the use of photolithography to define the interconnect
pattern.
• A layer of dielectric over the first-level interconnect allows a further layer of interconnect to be formed, and so on (multi-level interconnect). The interconnects on the higher layers are connected to the silicon wafer, or to each other, through contact holes cut in the insulator.
• A typical IC process will see one or two layers of polysilicon and two or three layers of
aluminium.

2.7 Packaging & Assembly


● Before assembly into an IC package starts, each die on the finished wafer is electrically tested using a computer-driven wafer probe system.
● Failed die are ink-marked and will not be packaged.

● The wafer is diced up, faulty chips are removed, and good chips are bonded into appropriate IC packages; interconnect between chip and package is formed using thin gold wire, and finally the package is hermetically sealed. Alternatively the package is sealed by moulding plastic around the silicon chip.

[Photographs: wire bonding (Kaijo Corporation); die lead frame attachment (Ablestik); wire bonding (Kulicke & Soffa Industries Incorporated)]

● Final test of completed devices carried out against full specification prior to shipment.

2.8 Examples of IC Fabrication Process


All processes for making different ICs such as bipolar, CMOS, nMOS, etc. use the stages described in sections 2.1-2.7
many times in order to fabricate the particular device in question as shown in Diagram 2.
Microelectronic Design
Chapter 3 - Integrated Circuit Packaging

The chapter contains the following sections:


3.1 Introduction
3.2 Design Considerations
3.2.1 Electrical Considerations
3.2.2 Thermal Considerations
3.3 IC Packaging Technologies
3.3.1 Chip Package Connection
3.3.2 Chip Packages
3.4 Package Cooling Considerations
3.5 Package Sealing and Encapsulation
3.1 Introduction
● Packaging of ICs is the ability to establish interconnections between the chip and package, and between package and PCB, etc., and to maintain a suitable operating environment for the IC to function effectively and efficiently.
● Key aspects are topological, electrical, thermal and reliability.
● The IC package has four major functions:
❍ Power distribution.
❍ Signal distribution.
❍ Heat dissipation.
❍ Circuit protection.
● The packaging process must be cheap and reliable.

[Diagram: the four major functions of the package. Photograph: commercial IC]

3.2 Design Considerations

● Successful package will satisfy all application requirements at an acceptable design, manufacturing and operating cost.
● Improving technology increases the demands on the number and density of package pins and interconnections, demanding reduced physical dimensions and the use of improved techniques.
● Need to improve quality and reliability of the packaging process important.
● Number of connections (pin-out) a major cost factor and strongly dependent on IC function, eg. memories require few connection pins but random logic
requires many more.
● The number of terminals required and the number of circuits are related by Rent's Rule, which states that N = KM^p, where N is the number of terminals required, K is the average number of terminals used for an individual logic circuit, M is the total number of circuits and p is a constant (0 ≤ p ≤ 1). Typically p is 0.12 for static memory, 0.45 for microprocessors and 0.63 for high-performance computer chips (a numerical sketch follows Graph 1 below).
● Graph 1 shows pin-out for memory and random logic ICs showing memories requiring relatively few pins whereas random logic requires many and
approximately follows Rent’s Rule.
Graph 1 - Graph of number of pins (terminals) versus circuit complexity for various microelectronic functions
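A minimal numerical sketch of Rent's Rule as stated above; K = 4 terminals per circuit is an assumed illustrative value, while the p exponents are the typical figures quoted in the text.

```python
# Rent's Rule: N = K * M**p (terminals vs. number of circuits).
def rent_terminals(k, m, p):
    """Estimated package terminals for M circuits with Rent exponent p."""
    return k * m ** p

for name, p in (("static memory", 0.12),
                ("microprocessor", 0.45),
                ("high-performance chip", 0.63)):
    # K = 4 is an assumed average; M = 100,000 circuits for illustration.
    print(f"{name}: ~{rent_terminals(k=4, m=100_000, p=p):.0f} terminals")
```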

● Increasing pin-out requires the package to squeeze more pins into the same or less space whilst still meeting mechanical, electrical and thermal requirements.

3.2.1 Electrical Considerations

● Basic package electrical parasitics of resistance, inductance and capacitance present in all IC packages and can cause signal delays, signal distortion and noise.
● Self-capacitances in particular cause signal delays.
● Non-zero source resistances also increase signal delays.
● New more complex packages have reduced self-capacitances (by using improved geometry and lower dielectric constant materials) and reduced source
resistances.
● Package resistance causes voltage drops and increases signal delays.
● Signal reflections are particularly troublesome causing faulty circuit operation in some cases.
● Noise generated by switching current from one chip driver can affect other drivers through inductances.
● Reduce noise by reducing inductances, restricting the total switched current, or using decoupling capacitors (see the sketch after this list).
● Power distribution across a chip must be accurately controlled (<10% variation maximum) so package power lines must be designed to ensure this happens
even in the event of circuit switching activity.
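As a hedged illustration of why package inductance matters, the sketch below estimates simultaneous-switching noise from V = L·dI/dt; every numerical value is an assumed, typical-order figure, not data from the text.

```python
# Simultaneous-switching (ground bounce) noise sketch: the spike across
# a package lead inductance is V = L * dI/dt. All values are assumed.
lead_inductance = 5e-9      # 5 nH shared supply lead (assumed)
drivers_switching = 16      # simultaneously switching outputs (assumed)
di_per_driver = 10e-3       # 10 mA current swing per driver (assumed)
switch_time = 1e-9          # 1 ns edge time (assumed)

v_noise = lead_inductance * drivers_switching * di_per_driver / switch_time
print(f"Ground-bounce estimate: {v_noise * 1e3:.0f} mV")  # ~800 mV here
```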

3.2.2 Thermal Considerations

● More complex chips make more demands on efficient heat removal from the chips.
● Silicon chips are limited to approximately 100ºC for normal operation, which limits on-chip power densities to a maximum of 10W/cm² in current IC packages.
● This imposes an average limit on power dissipation per individual circuit on chip of approximately 1µW/circuit for a 10 million transistor chip of area 1cm × 1cm (see the arithmetic sketch after this list).
● Important to reduce power dissipation/circuit and improve package thermal design in order to produce larger, more complex ICs in the future.
● Differential thermal expansion of package parts gives rise to mechanical stresses and reliability risks.
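The per-circuit power budget quoted above follows from simple arithmetic, sketched below using the figures given in the text.

```python
# Worked arithmetic for the per-circuit power budget quoted above.
max_density = 10.0          # W/cm^2, package limit quoted in the text
chip_area = 1.0             # cm^2 (1 cm x 1 cm chip)
n_circuits = 10_000_000     # 10 million circuits on the chip

per_circuit = max_density * chip_area / n_circuits
print(f"{per_circuit * 1e6:.1f} uW per circuit")  # 1.0 uW, as stated
```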

3.3 IC Packaging Technologies

● Total package technologies in electronic products are very diverse and include ICs, PCBs, flexible circuit carriers and Multi-Chip Modules (MCMs) as shown
in Table 1.
Chip connection | 1st level package | 1st-to-2nd level connection | 2nd level package | 2nd-to-3rd level connection | 3rd level package | Chip cooling | Max chips/system
--- Consumer electronics ---
WB | PSCM | SMT/PTH | Card | - | - | - | <10
WB/TAB | PSCM | SMT/PTH | Card | - | - | - |
WB | PSCM | SMT/PTH | Card/flex | - | - | - |
--- Low-end systems ---
WB | PSCM | SMT/PTH | Card | Connection | Board | - | 10s
WB | PSCM | SMT/PTH | Card | Connection | Board | - |
WB | PSCM | SMT/PTH | Card | - | - | - |
WB | PSCM | SMT/PTH | Card | - | - | - |
WB | PSCM | SMT/PTH | Card/flex | - | - | - |
--- Intermediate systems ---
WB | C-SCM | PTH | Card | Connection | Board | Air | 100s
WB | C-PGA | SMT | Card | Connection | Board | Air (w/fin) |
WB | C-PGA | PTH | Card | Connection | Board | Air |
C4 | C-TCM | PTH | Board | Connection | Cable | Air |
--- Large systems ---
WB | C-LCC | SMT | P-G board | Connection | Cable | Water | 1,000s
WB | C-FP, C-MCM | PTH | P-G board | Cable | P-G board | Air |
C4 | C-TCM | PTH | FR-4 board | Connection | Cable | Water |
TAB | FTC | SMT | LCM | PTH | P-G board | Water |
--- Supercomputers ---
WB | C-FP | SMT | Card | Connection | Cable | FC-78 | >10,000
TAB | C-LCC | SMT | Board | Connection | Cable | LN2 |
TAB | FTC | SMT | LCM | PTH | P-G board | Water |
Table 1 - Typical packaging technologies

C-FP: Ceramic flat pack
C-LCC: Ceramic leaded chip carrier
C-MCM: Ceramic multichip module
C-PGA: Ceramic pin grid array
C-SCM: Ceramic single chip module
C-TCM: Ceramic thermal conduction module
Conn: Connector
FC-78: Fluorocarbon liquid
FR-4 board: Epoxy-glass board
FTC: Flip TAB carrier
LCC: Leaded chip carrier
LCM: Liquid cooled module
PC: Personal Computer
PGA: Pin grid array
P-G board: Polyimide-glass board
PSCM: Plastic single chip module
PTH: Pin-through-hole
SMT: Surface-mount technology
TAB: Tape automated bonding
TCM: Thermal conduction module
WB: Wirebond

● A wide range of different materials used in packages as shown in Table 2.

Technology function | Technology options | Typical materials | Typical process | Typical process temperature (ºC)
Connection to chip | Wirebond | Gold, aluminium | Wirebond | 225
 | Solder bond (C4) | Pb-Sn | Reflow | 360
 | TAB | Copper, gold, aluminium, polyimide | Thermocompression | 550
1st level package | Ceramic | Al2O3, SiC, BeO | Sintering | 1,500-2,000
 | Plastic | Epoxy | Moulding | 200
 | TAB | Cu on Kapton® | Adhesive bond | 200
1st to 2nd level connection | Surface mount solder | Pb-Sn | Reflow | 220
 | Pin-in-hole solder | Kovar, Pb/Sn | Reflow | 220
 | Pin braze | Kovar, Au/Sn | Braze | 400
2nd level package | Card | Epoxy glass | Cure | 200
 | Metal carrier | Glass on steel, invar | Fuse | 1,000
 | Flex | Cu on Kapton® | Adhesive bond | 200
 | Injection moulding card | Resin | Moulding | 200
3rd level package | Board | Epoxy glass | Cure | 175
 | Board | Polyimide, glass | Cure | 200
2nd to 3rd level connection | Connector | Polymer, BeCu | Cure | 200
 | Cable | Polymer, copper | Cure | 200
Note: Kapton is a trademark of Dupont Company
Table 2 - Packaging technologies and processes

● Packaging hierarchy imposes thermal hierarchy considerations on the assembly of the total system.

3.3.1 Chip Package Connection

● Connections between the chip and package commonly performed by one of the following three technologies:
❍ Wirebond using thin gold or aluminium wire.
❍ Solder bond or Controlled Collapse Chip Connection (C4), also called flip-chip.
❍ Tape Automated Bonding (TAB).
● Wirebond most common, cheapest, lowest temperature but limited in the number of connections that can be made (<500) and also high lead inductance
limiting electrical performance.
● C4 capable of many more connections up to 20k and low lead inductance but more complex technology and higher temperature.
● TAB gives higher number of connections than wirebond though not as high as C4, better electrical performance, high yield and lower assembly costs but
highest temperature and more complex technology.

3.3.2 Chip Packages

● Chip packages are made of metal or ceramics and are either hermetically sealed or encapsulated in plastic.
● The most common types are:

● Dual-In-Line (DIP) and variants - intended for the older 'through the board' style
● Pin Grid Array (PGA)
● Leadless Chip Carrier (LLCC)
● Small Outline Package (SOP)
● Leaded Chip Carrier (PLCC) - intended for the cheaper surface mount technology
● Quad Flat Pack (QFP)
● Tape Automated Bonding (TAB)
● Chip Scale Packages

● Their main characteristics are given in Table 3:

Package | Package materials | Number of interconnections | Future (2000) | I/O spacing (mm)
Dual-in-line | Alumina ceramic | 64 | | 2.54
 | Plastic | 64 | | 2.54
Shrink DIP | Plastic | 64 | | 1.77
Skinny DIP | Plastic | 64 | | 2.54
Single-in-line (SIP) | Plastic | 21 | | 2.54
Leadless chip carrier (LLCC) | Ceramic | 132 | | 1.27
 | Ceramic | 300 | 400 | 0.63
 | Plastic | 180 | | 1.00
Small outline package (SOP) | Plastic | 40 | | 1.27
Leaded chip carrier (LCC) | Plastic | 84 | | 1.27
 | Plastic | 144 | | 0.63
Quad flat pack (QFP) | Plastic | 130 | | 1.00
 | Plastic | 160 | 500 | 0.65
 | Ceramic | 180 | | 0.40
 | Ceramic | 200 | | 0.63
Very small peripheric array | | 300 | 600 | 0.40
 | | 500 | | 0.40
Pin-grid array (PGA) | Alumina (single chip) | 312 | | 2.54
 | | | >1000 | 1.27
 | Ceramic (multichip) | 2177 | >5000 | 2.54
 | Plastic | 240 | >500 | 2.54
Tape automated bonding (TAB) | Plastic | 300 | | 0.50
 | | | >1000 | 0.25
Ball grid array | Plastic | 300 | >500 | 0.50
 | Ceramic | 604 | >1000 | 0.40
Chip scale packages | Thin film ceramic | 300-1000 | >1000 | 0.5
 | Thin film ceramic | 300-1000 | >1000 | 0.5
SLIM | Thin film | - | >5000 | 0.25
Table 3 - First level single chip packages and their characteristics

● IC packages have to:


❍ Provide the required number of pins for power and signals.
❍ Ensure thermal expansion compatibility with the chip.
❍ Have a thermal path for heat removal from the chip.
❍ Keep signal delays and noise to a minimum.
● Low dielectric constant materials used where possible to reduce signal delays.
● High thermal conductivity materials used to give good thermal properties.
● Area occupied by packages on PCB often an important concern.
● Other aspects sometimes important are burn-in ability, test, solder bond or socket to PCB and field upgradeability.
● Costs always important - Graph 3 gives a plot of cost per lead ratio versus number of pins for the common packages.
Graph 3 - Cost per lead ratio versus number of I/O pins

3.4 Package Cooling Considerations

● Despite reductions in power dissipation/individual circuit due to technology improvements, current chips contain more individual circuits/chip and so total
power dissipation/chip is increasing.
● Graph 4 shows a plot of typical power dissipation per chip between 1970-1990 showing at least a tenfold increase over that period.
● Most systems use forced-air cooling of modules for package cooling.

Graph 4 - Power per chip versus year of first use

● High performance systems additionally use heat sinks mounted onto packages using various technologies.
● In extreme cases packages are further cooled by immersion in inert liquids, such as fluorocarbons.
3.5 Package Sealing and Encapsulation

● Intended to protect the chip and package metallisation from corroding environments and from mechanical damage due to handling.
● Moisture is one of the major sources of corrosion.
● Plastic materials, such as silicones and epoxies, developed with low water diffusion properties are used extensively for IC encapsulation.
● For high reliability devices hermetic sealing used based on welding or brazing of ceramic/metal packages. More expensive and time-consuming than plastic
encapsulation.
Microelectronic Technologies and Applications
Chapter 1 - CMOS Digital Logic

Chapter Overview
This chapter reviews aspects covered in two earlier modules: Microelectronic Design and Business Issues and Benefits of Microelectronic Devices, and also covered in
the textbook.

The chapter contains the following sections:


1.1.1 The Design Process
1.1.2 ASIC Cost Comparisons
1.1.3 Cell Libraries
1.1.4 Websites of Interest
1.1.5 CMOS Inverter
1.1.6 CMOS Process
1.1.7 CMOS Design Rules
1.1.8 Logic Cells
1.1.8.1 Combinational
1.1.8.2 Sequential
1.1.8.3 Datapath Logic Cells
1.1.8.4 Datapath Cells
1.1.8.5 I/O Cells
1.1.8.6 Compiled Cells
Self Assessment

1.1.1 Chapter Overview - The Design Process

● Work through the MIB - DTI presentation associated with this chapter.
● Review sections 1.1 and 1.2 in module Microelectronic Design.
● Read section 1.2 Design Flow in the textbook.
● Module concentrates on ICs made using CMOS technologies (low power, cheapest and popular).
● Basic two input NAND gate (equivalent gate) requires four transistors in CMOS; the number of equivalent gates equals one quarter the number of transistors.
● Feature size, λ, is quoted for processes (currently 0.25µm typically); it equals half the dimension (length or width) of the smallest transistor.
● Review sections 1.4.1 - 1.4.5 in module Microelectronic Design on the different types of ASICs.
● Read sections 1.1 - 1.1.8 in the textbook on the same topic.
● Read section 1.3 in the textbook - case study on the development of the Sun Microsystems SPARCstation 1.

1.1.2 ASIC Cost Comparisons

● Review the costing within section 1.5 of module Microelectronic Design and also in module Business Issues and Benefits of Microelectronic Devices.
● Read sections 1.4, 1.4.1, 1.4.2 in the textbook relating to product costs for different ASIC solutions.
● Figure 1.11 shows a break-even analysis for different ASIC types.
FIGURE 1.11 - A break-even analysis for an FPGA, a masked
gate array (MGA) and a custom cell-based ASIC (CBIC). The
break-even volume between two technologies is the point at
which the total cost of parts are equal. These numbers are very
approximate.

● Familiarise yourself with the constituent parts of fixed costs (spreadsheet figure 1.12) - section 1.4.3 and variable costs (spreadsheet figure 1.14) - section 1.4.4 in the textbook.
● Understand how the costs change with technology advances and product maturity, see figure 1.15

FIGURE 1.15 - Example price per gate figures.

● Carry out the EXCEL Spreadsheet Exercise

1.1.3 Cell Libraries

● Read section 1.5 in the textbook.


● Understand the importance of the cell libraries in ASIC design.
● Be familiar with the different types of cell libraries eg. physical, behavioural etc. and their uses in the design process.
1.1.4 Websites of Interest

EDN - http://www.ednmag.com/
University Video Communications - http://www.uvc.com/
EDAC - http://www.edac.org/
EDA Companies (list) - http://www.yahoo.com/Business%20&%20Economy/Companies/Computers/Software/Graphics/CAD/IC%20Design/
MOSIS Users' Group - http://www-ece.engr.utk/
NASA - http://nppp.jpl.nasa.gov/dmg/jpl/loc/

1.1.5 CMOS Inverter


See section 2.1 in the textbook
· Based on CMOS transistors.
· A CMOS transistor is essentially a switch with four terminals, gate(G), source(S), drain(D) and substrate or bulk (B).
· Two types of CMOS transistors - n-channel and p-channel.
· Positive logic assumed - VDD is logic 1 (say +5V) and VSS is logic 0 (say 0V).
· n-channel transistor is ON with logic 1 at the gate and OFF with logic 0 at the gate.
· p-channel transistor is ON with logic 0 at the gate and OFF with logic 1 at the gate. - see figure 2.1

Figure 1.1 - CMOS transistors as switches. (a) An n-channel transistor. (b) a p-channel transistor. (c) A CMOS inverter and its symbol (an equilateral triangle and a circle).

• CMOS inverter formed by connecting n-channel and p-channel transistors in series between VDD and VSS
• Operation is that inverter output at logic 0 for logic 1 input and output at logic 1 for logic 0 input - see CMOS interactive
exercise.
• In both logic states one transistor OFF and hence no current flow and very low power dissipation (virtually zero) in both
states. Major advantage of CMOS!
• Other gates eg. NAND/NOR and more complex structures can be easily designed - see figure 2.2 in the textbook.
• Theory of CMOS transistor operation complex - see sections 2.1, 2.1.1, 2.1.2, 2.1.4 in the textbook.
• Theory gives the V-I equations in saturation (VDS > VGS - VtN) as equation 2.12 and in the linear region (VDS < VGS - VtN) as equation 2.9 for the n-channel transistor.
• Equations 2.15 apply for the p-channel transistor.
• All V-I equations contain the β term (gain factor), which equals k (the process transconductance) times the width/length ratio of the transistor.
• β is important, allowing the current in a CMOS transistor to be varied by varying the device geometry (W/L) as well as the terminal voltages (see the sketch after this list).
• Theory and practical measurements agree well (see figure 1.4). Short-channel transistors are normal in most ASIC devices.

Figure 2.4 - MOS n-channel transistor characteristics for a generic 0.5µm process (G5): a short-channel transistor (W = 6µm, L = 0.6µm drawn) and a long-channel transistor (W = 60µm, L = 6µm).

• CAD programme SPICE often used to characterise transistors or gates. Parameters for a generic 0.5µm CMOS process given
in Table 1.1 for example.
• Due to transistor operation logic levels can be either strong or weak.
• n-channel transistor gives a strong '0' logic level but a weak '1' - see section 2.1.4 and Figure 2.5 in the textbook.
• p-channel transistor gives a strong '1' logic level but a weak '0'.
• Use both transistors together in CMOS gates to give strong '0' and '1' levels.
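A minimal Python sketch of the square-law V-I equations referred to above (textbook equations 2.9 and 2.12); the k′, Vt and bias values are assumed generic figures for illustration, not the textbook's Table 1.1 parameters.

```python
# Simple square-law model for an n-channel MOSFET.
# beta = k' * (W/L); k', Vt and the geometry below are assumed values.
def nmos_ids(vgs, vds, vt=0.65, k_prime=80e-6, w=6e-6, l=0.6e-6):
    """Drain current (A): cut-off, linear (eq. 2.9) or saturation (eq. 2.12)."""
    beta = k_prime * w / l
    vov = vgs - vt                        # overdrive voltage
    if vov <= 0:
        return 0.0                        # cut-off: transistor OFF
    if vds < vov:                         # linear (triode) region, eq. 2.9
        return beta * (vov * vds - vds ** 2 / 2)
    return beta / 2 * vov ** 2            # saturation region, eq. 2.12

print(f"{nmos_ids(vgs=3.0, vds=3.0) * 1e3:.2f} mA")  # saturation example
```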

1.1.6 CMOS Process


See section 2.2 in the textbook

· Review chapter 2 in module mdesign - Microelectronics Fabrication Process and in particular the single n-well CMOS process (section 2.8).
· The various mask/layer names together with MOSIS (US design house) mask labels are given in Table 2.2
· Using these names Figure 2.7 in the textbook shows the layers required to achieve a typical standard cell layout given in figure 1.3 (p8),
together with the complete cell layout and the phantom layout often used in ASIC designs.
· 'Wells' of the opposite type of semiconductor to the substrate are used in CMOS to allow the fabrication of p-type and n-type transistors on the
same substrate.
· Single, twin and triple-well processes available.
· In general the more wells in a process the more control over transistor properties.
· In all cases n-wells must be connected to the most positive part of the circuit (VDD) to ensure that substrate/source drain junctions are not
forward biased and p-wells must be connected to the most negative part of the circuit (VSS) for the same reason.
· Often substrate connections not shown on circuit schematics but vital for correct circuit operation.
· CMOS process for circuit depicted in figure 2.7 described in pages 52-55 of the textbook.
· Sheet resistance is a measure of the doping concentration of a semiconductor layer: it is inversely proportional to the concentration of the doped layer.
· Sheet resistance is measured in ohms/square, since the layers are very shallow compared to their widths/lengths (see the sketch after this list).
· Typical values range from 1.1kΩ/sq for an n-well to 30 × 10⁻³ Ω/sq for metal - see Tables 2.3 and 2.4 in the textbook for an example set.
· Contact resistance (CR) - metal/silicon - often significant and process steps taken to reduce CR and also improve contact reliability (see
Tables 2.5 and 2.6 in the textbook).
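A short sketch of the ohms/square idea: a track's resistance is its sheet resistance times the number of squares (length/width). The track dimensions are assumed illustrative values; the sheet resistances are the representative figures quoted above.

```python
# Track resistance from sheet resistance: R = Rs * (L / W),
# i.e. Rs times the number of 'squares' along the track.
def track_resistance(rs_ohm_sq, length_um, width_um):
    return rs_ohm_sq * (length_um / width_um)

# A 1000 um x 1 um track is 1000 squares (assumed example geometry):
print(f"metal:  {track_resistance(0.03, 1000, 1):.0f} ohm")     # ~30 ohm
print(f"n-well: {track_resistance(1100, 1000, 1):.2e} ohm")     # ~1.1 Mohm
# The huge n-well figure shows why wells are never used as interconnect.
```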

1.1.7 CMOS Design Rules


· Design rules are a comprehensive set of rules, derived by a foundry which states the minimum geometry of layers and their relations to other
layers to ensure correct circuit functionality after fabrication by the foundry.
· Often given in terms of λ (the minimum feature size, currently 0.25µm) so that the rules are scaleable as technology develops and feature size reduces.
· MOSIS is a US-based foundry serving their academic institutions; its design rules (version 7) are shown in figure 2.11, Table 2.7, 2.8 and 2.9
in the textbook.
· Design rules are only of concern to full-custom ASIC designs as in other ASICs the design rules are in-built into the layout and checking
software and therefore it is impossible for designers to contravene them.

1.1.8 Logic Cells


1.1.8.1 Combinational

· Basic combinational gates (NAND, NOR, etc) can be made in CMOS (figure 2.2) as already discussed.
· More complex combinational cells comprising several gate combinations such as AND-OR-INVERT(AOI) and OR-AND-INVERT (OAI) are
much more efficient in CMOS and often used in combinational design - see Table 2.10 in the textbook.
· Numbering notation based on number of inputs at first level and second level often used - see Figure 2.12 in the textbook.
· A design procedure ('pushing bubbles'!) using networks of transistors (stacks) is used.
· Illustrated for the AOI221 in figure 2.13 of the textbook.
· Different hole and electron mobilities give rise to different transistor gain factors βn and βp (equations 2.11 and 2.15 in the textbook).
· Equalise by adjusting the (W/L) ratios of the n- and p-type transistors to make βn and βp equal (same drive strength) - see section 2.4.2 in the textbook.
· Cells in a library available in a range of drive strengths.
· For transistors in series or parallel, design procedures are more complicated but essentially:
• for transistors in parallel, make all the lengths unity and add the widths
• for transistors in series, make all the widths unity and add the lengths.
· For example applied to AOI221 in figure 2.13c.
· Alternative combinational design approach based on CMOS transmission gate (TX) exists for simple gates. (section 2.4.3 and figure 2.14).
· More efficient in terms of number of transistors but other considerations may be important ie. charge sharing which may require extra
buffering and therefore extra transistors.
· Can design a 2/1 multiplexer using TX gates (figure 2.15).
· Comparison with a design using the OAI22 cell (figure 2.16) shows little difference, but for longer MUXs the differences can become significant.
· Can then design an EXCLUSIVE-OR cell from a 2/1 MUX and an OR gate (section 2.4.4).

1.1.8.2 Sequential
See section 2.5 in the textbook
· Synchronous design using a single system clock is nearly always used since it is safer, compatible with CAD tools and usually guarantees that
the ASIC will work as simulated
· Sequential cells have a memory or storage feature
· Simple Latches are transparent (ie. changes at inputs appear at the Q output when the clock is high)
· Flip-flops are more complex (require at least two latches)
· Design a latch from TX gates – operation illustrated in Figure 2.17 (Ref. Smith)
· Design a flip-flop from two D latches (see fig 2.18)
· Clocked inverter easily designed from an inverter and one TX gate (see fig 2.19) which can then replace inverters in latches and FFs

1.1.8.3 Datapath Logic Cells


See section 2.6 in the textbook
· Datapath structures exploit the regularity of functions such as adders, multipliers, subtractors, barrel shifters etc. to give efficient VLSI
implementations. For example, a 1-bit full adder output is usually expressed as:

Sum = A ⊕ B ⊕ CIN
and Cout = A·B + A·CIN + B·CIN

· These can be expressed in terms of the PARITY function (where the output is 1 if an odd number of the inputs are 1) and the MAJORITY function (where the output is 1 if the majority of the three inputs are 1) as:

Sum = PARITY (A, B, CIN)
and Cout = MAJORITY (A, B, CIN)
(a behavioural sketch in code follows this list)
· Hence these functions form efficient single logic cells, shown in figure 2.20a.
· Layout in figure 2.20c with data running horizontally and control signals vertically; the array structure is shown in figure 2.20d.
· Extend easily to a 4-bit full adder (ADD4) by connecting four full-adder cells as shown in figure 2.20b.
· Datapath refers to the layout of buswide logic operating as in the ADD element described earlier (datapath cell or element).
· Datapath cells are usually stored in a library and are the same size so more complex datapath cells can be created (expandable).
· Usually orientated so that increasing size in bits grows the datapath in height; adding different elements to increase the function grows the
datapath in width.
· Datapath implementations are regular and interconnect included in the cells.
· Disadvantages are control overheads and the requirement to pre-design the datapath cells themselves.
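As promised above, a behavioural Python sketch of the one-bit full adder expressed with PARITY and MAJORITY, rippled into the ADD4 element (an algorithmic illustration, not a gate-level or datapath layout):

```python
# One-bit full adder using the PARITY and MAJORITY functions above.
def full_adder(a, b, cin):
    parity = a ^ b ^ cin                           # 1 if an odd number of 1s
    majority = (a & b) | (a & cin) | (b & cin)     # 1 if at least two 1s
    return parity, majority                        # (sum, carry-out)

# Ripple four full adders together to form the ADD4 element.
def add4(a_bits, b_bits, cin=0):
    carry, out = cin, []
    for a, b in zip(a_bits, b_bits):               # LSB first
        s, carry = full_adder(a, b, carry)
        out.append(s)
    return out, carry

print(add4([1, 0, 1, 0], [1, 1, 0, 0]))  # 5 + 3 = 8 -> ([0, 0, 0, 1], 0)
```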

1.1.8.4 Datapath Cells


1.1.8.4.1 Adder
See section 2.6.1 and 2.6.2 in the textbook

o Symbols given in figure 1.21


Figure 1.21 - Symbols for a datapath adder. (a) a data bus is shown by a heavy line and a bus symbol. If the bus is N-bits then MSB = n-1. (b) An
alternative symbol for an adder. (c) Control signals are shown as lightweight lines.

o Table 2.11 reviews the common binary arithmetic operations (add, subtract etc) for the four common binary number representations
(unsigned, signed magnitude, ones' complement, two's complement).
o Often addition is in terms of generate G(i) and propagate P(i) signals - see section 2.6.2 for an explanation and equivalences.
o Form a ripple carry adder (RCA) conventionally (figure 1.22a) or using the generate/propagate approach (figure 1.22b).

Figure1.22 - The Ripple Carry Adder (RCA). (a) A conventional RCA. The delay may be reduced slightly by adding pairs of bubbles as shown to use two-input NAND gates. (b) An
alternative RCA circuit topology using different cells for odd and even stages and an extra connection between cells. The carry chain is a fast string of NAND gates (shown in bold).

■ Other, faster adders are based on carry-save (CSA) (figure 2.23), carry-bypass (CBA), carry-skip and the best-known, the carry-lookahead (CLA).
■ The Brent-Kung adder reduces the delay and increases the regularity of CLAs (figure 2.24).
■ The fastest adders are based on carry-select, leading to the conditional-sum adder (CSA) (see figure 2.25).
■ Graphs of normalized delay versus number of bits (figure 2.26a) show ripple-carry to be the slowest, carry-select faster and carry-save the fastest.

Figure 1.26 - Datapath adders. This data is from a series of submicron datapath libraries.
(a) Delay normalized to a two-input NAND logic cell delay (approximately equal to 250ps in a 0.5µm process). For example, a 64-bit ripple-carry adder (RCA) has a delay of approximately 30ns in a 0.5µm process. The spread in delay is due to variation in delays between different inputs and outputs. An n-bit RCA has a delay proportional to n. The delay of an n-bit carry-select adder is approximately proportional to log2 n. The carry-save adder delay is constant (but requires a carry-propagate adder to complete an addition).
(b) In a datapath library the area of all adders is proportional to the bit size.

■ Graphs of areas versus bits (figure 2.26b) show that ripple-carry and carry-save take up about the same area whereas carry-select takes up about twice as much
area.

1.1.8.4.2 Multipliers
See section 2.6.4 in the textbook
· Multiplication is a series of additions and shifts (see the sketch after this list).
· Use a number of adders to form an array multiplier - see figure 2.27 for a 6-bit multiplication illustration.
· Performance is determined by the number of partial products and the addition of the partial products.
· Use canonical signed-digit vectors (CSDs) to reduce the number of add/subtract operations and replace some additions by shifts.
· Further improvement by using Booth encoding (partial products reduced by a factor of 2, improving speed and area utilisation).
· Improve speed still further by using Wallace trees and Dadda multipliers (Figures 2.28, 2.29 and 2.30).
· Several considerations apply in the choice of parallel multiplier architecture, eg. overall speed, power dissipation, implementation (cell or full custom), pipelining etc.
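As referenced above, a behavioural Python sketch of shift-and-add multiplication, the operation the array multiplier implements in hardware (algorithm only; none of the CSD, Booth or Wallace-tree optimisations are shown):

```python
# Shift-and-add multiplication: one shifted partial product of `a`
# is accumulated for each set bit of `b`.
def shift_add_multiply(a, b, bits=6):
    product = 0
    for i in range(bits):
        if (b >> i) & 1:          # partial product for bit i of b
            product += a << i     # shifted copy of a
    return product

print(shift_add_multiply(27, 9))  # 243
```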

1.1.8.4.3 Other datapath operations


See section 2.6.6 in the textbook
· Combinational and sequential datapath cells eg. NAND/NOR and FFS/latches available and operate identically to
standard forms.
· Subtractors and adder/subtractors essentially adders with modified control lines.
· Barrel shifters rotate or shift input data stream by a specified amount - used in floating point arithmetic.
· Other floating-point arithmetic operators include leading-one detectors, priority encoders, accumulators and registers of various types.

1.1.8.5 I/O Cells

See section 2.7 in the textbook.

· Popular tri-state bi-directional output buffer available (Figure 1.33)

Figure 1.33

· Many other buffers available and easily designed


· I/O cells have to be designed to withstand electrostatic discharge (ESD) brought about by handling. This requires input pads to be tied to structures that clamp the input voltage to below the gate breakdown voltage (~10V for a 100Å gate oxide).

1.1.8.6 Compiled Cells

· Essentially automatic layout tools which generate regular structures of variable sizes such as RAM, ROM and multipliers.
· Often linked to a model compiler (for behavioural verification) and a netlist compiler (for layout-level verification).
· The complete system produces blocks that are correct by design.
Microelectronic Technologies and Applications

Chapter 2 - Time delay and power dissipation considerations

Chapter Information

The chapter contains the following sections:


2.1 Chapter Overview - Introduction
2.1.1 Transistor parasitic capacitances
2.2 'Logical Effort' Delay Model
2.2.1 Logical area
2.2.2 Logical paths
2.2.3 Optimum delay
2.2.4 Optimum number of stages
2.3 Library Cell Design
2.4 Power Dissipation

2.1 Chapter Overview - Introduction


This chapter covers two important aspects of CMOS ICs, namely gate delay and gate power dissipation. Physical parameters within the gates that determine the
values for the two parameters in practice are discussed as are methods of estimating values from calculation and CAD tools.
Introduction

See section 3.1 in the textbook

❍ In many ASIC design styles it is necessary to use cells from a library.


❍ Some knowledge of the characteristics of the library cells is required when using them in designs.
❍ In the previous chapter transistors were modelled as perfect switches - with no time delay.
❍ In practice a CMOS inverter has a propagation delay between input changing and the resulting output change which is a function of gate capacitances and resistances.
❍ Often modelled in terms of the non-linear resistance of the transistors and total capacitance (see Figure 2.1).
Figure 2.1 A model for CMOS logic delay

❍ Although no current is taken by a CMOS inverter under static conditions (logic 1 or 0), current is drawn during switching due to both transistors being on (see Figure
2.2 in textbook).
❍ Switching delay often given driving standard loads (gates) where n = 1, 2, 4, 8 etc.
❍ Simulation shows that switching delays are an approximately linear function of load capacitance - see Figure 2.3 in the textbook (and hence of the number of loads).
❍ Typically tpdf = Rpd (COUT + Cp) - equation 3.2
❍ Due to complexity it is only possible to evaluate time delays accurately from CAD simulation (eg. SPICE)
❍ Hand calculations only give estimates.

2.1.1 Transistor parasitic capacitances

❍ Cell delay results from transistor resistances, transistor (intrinsic) parasitic capacitances and load (extrinsic) capacitances.
❍ The input capacitance of the driven cell is the load capacitance of the driving cell.
❍ CAD tool SPICE lists 8 capacitances for a CMOS transistor (shown diagrammatically in Figure 2.4 in text book).
❍ Junction capacitances CBD and CBS are p-n diode capacitances.
❍ Overlap capacitances CGSOV and CGDOV are oxide capacitances, and account for lateral diffusion of drain and source under the gate.
❍ Gate-source, gate-drain and gate bulk capacitances, CGS, CGD and CGB are combinations of junction and oxide capacitances.
❍ Detailed formulae for calculating each capacitance are given in Table 3.1, together with a typical calculation at one operating condition.
❍ All transistor parasitic capacitances are functions of operating conditions.
❍ For instance, Figure 2.5 in textbook shows the variation of all the parasitic capacitances with VIN from 0 to +3V.

2.2 'Logical Effort' Delay Model

❍ 'Logical effort' is a concept which gives us an insight into why logic has the delay it has, and allows us to examine relative delays.
❍ Modifies equation 3.2 by an additional term tq to give tpd = R (COUT + Cp) + tq - equation 3.10.
❍ tq is non-linear and includes delay due to parasitic capacitances and other effects.
❍ For scaled cells (scaling factor s), since capacitances increase and resistances decrease, equation 3.12 results: tpd = (R/s)(COUT + sCp) + stq.
❍ Equation 3.12 is then rewritten using the capacitance of the scaled cell (equation 3.14), and the delay d is normalised using the pull resistance RINV and input capacitance CINV of a minimum-size inverter, giving equation 3.15: d = f + p + q, with delays in units of T = RINV CINV.

❍ So delay (d) is the sum of the effort delay (f), parasitic delay (p) and non-ideal delay (q).
❍ Effort delay (f) is further broken up into the product of logical effort (g) and electrical effort (h).
❍ Logical effort is defined in figure 2.8 and is a function of the type of logic cell.

Figure 2.8 Logical effort

❍ Table 3.2 gives logical efforts for inverter, NAND and NOR cells.
❍ Section 3.3.1 in the textbook shows how to use this technique to calculate the delay of a 3-input NOR gate driving a net with capacitance 0.3pF, giving an answer of 0.846ns.
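A minimal Python sketch of the logical-effort delay model d = g·h + p + q described above; the NOR logical effort g = 7/3 follows from standard logical-effort theory, while the electrical effort and parasitic delay values are assumed for illustration (this does not reproduce the textbook's 0.846ns worked example, which needs the library's own parameters).

```python
# Logical-effort delay model: d = g*h + p + q, in units of
# T = R_INV * C_INV (the minimum-size inverter time constant).
def stage_delay(g, h, p, q=0.0):
    """Normalized stage delay: effort delay + parasitic + non-ideal."""
    return g * h + p + q

# Example: a 3-input NOR has g = (2n + 1)/3 = 7/3 (standard theory);
# h = 4 (driving 4x its input capacitance) and p = 3 are assumed values.
d = stage_delay(g=7 / 3, h=4, p=3.0)
print(f"normalized delay d = {d:.2f}")
```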

2.2.1 Logical area

❍ Enables a calculation of the area of transistors in a logic cell to be made - logical area (see section 2.3.2).

2.2.2 Logical paths

❍ The delay calculation in section 2.3.1 did not depend on the logical effort g because the NOR cell was not being driven by another logic cell, which is the ideal case.
❍ In a logical path situation it is possible to calculate the delays of logic cells driven by a minimum size inverter.
❍ Path delay, d, is the sum of the logical effort, parasitic delay and non-ideal delay at each stage.
❍ Extend this concept to work out delays in multistage cells. (See section 2.3.4)
2.2.3 Optimum delay

❍ Path logical effort g is the product of logical efforts on a path.


❍ Path electrical effort h is the product of electrical efforts on the path.
❍ Path effort F is the product of g and h.
❍ Optimum effort delay can then be found.
❍ Logical effort is useful in the design of logic cells and in the design of logic using logic cells.

2.2.4 Optimum number of stages

❍ For a chain of N inverters each with equal stage effort f, then neglecting parasitic and non-ideal delay, it can be shown that minimum delay occurs when the stage electrical effort h is equal to e ≈ 2.7 (see the sketch after Figure 2.12).
❍ Shown diagrammatically and in a tabular form in Figure 2.12

Figure 2.12 Stage Effort (f)
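A small sketch of the inverter-chain result quoted above: for a fixed overall electrical effort H split across N stages (h = H^(1/N)), the total delay N·h is minimised where h approaches e ≈ 2.7; the value of H is an assumed example.

```python
# Inverter-chain sketch: splitting a fixed overall electrical effort H
# across N stages gives per-stage effort h = H**(1/N). Neglecting
# parasitic and non-ideal delay, total delay N*h is minimised near
# h = e (~2.718). H is an assumed example value.
H = 1024  # overall electrical effort, C_load / C_in (assumed)

for n in range(3, 11):
    h = H ** (1 / n)
    print(f"N={n:2d}  stage effort h={h:5.2f}  total delay={n * h:6.2f}")
# Minimum total delay occurs around N = 7 here, where h is closest to e.
```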

2.3 Library cell design

❍ Use hand-crafted or, more commonly, symbolic layout (such as STICKS) to draw the cell layout.
❍ Particular design rules built-in.
❍ Cells for gate array, standard cell and datapath are quite different - see section 3.6, 3.7, 3.8 and elsewhere in this course.
2.4 Power Dissipation
See section 15.5 in the textbook

❍ Power dissipation in CMOS logic arises from three sources:


i) dynamic power dissipation due to switching current charging/discharging parasitic capacitance.
ii) dynamic power dissipation due to short-circuit current when both transistors are momentarily on.
iii) static power dissipation due to leakage current and sub-threshold current.

❍ Switching-current power dissipation (P1) is given by P1 = f·C·VDD² and is the major source of power dissipation in CMOS.
❍ It is reduced by reducing the supply voltage VDD and the parasitic capacitance C.
❍ Short-circuit current (crowbar current) power dissipation (P2) can be important for output drivers and large clock buffers and is proportional to (VDD − 2Vtn)³; the remaining factors in the full expression involve the transistor gain and the input rise/fall time.


❍ Typically P2 is 10% of P1
❍ Sub-threshold current in CMOS is less than 5 pA/μm of gate width, but for a large ASIC containing >100K transistors it gives a total current of around 0.1 mA.
❍ Similarly, the reverse-biased diodes conduct a very small leakage current - typically 1-5 mA for a 100,000-transistor ASIC.
Both currents constitute the very low static power dissipation of CMOS - typically 10 mW in total for a 100,000-transistor CMOS ASIC (0.5 μm technology).
❍ Minimising power dissipation is becoming very important because large, complex ASICs containing potentially millions of gates require very low power dissipation per gate so that the total package dissipation limit is not exceeded. It is also important for battery-powered equipment, e.g. mobile phones.
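
To put rough numbers on these contributions, the sketch below evaluates P1 = f·C·VDD² for an assumed clock frequency, switched capacitance and supply voltage, and applies the 10% rule of thumb for P2; all input figures are illustrative only.

```python
# Rough CMOS power budget. P1 = f * C * VDD^2 dominates; P2 is taken
# as ~10% of P1 per the rule of thumb above. All inputs are assumed
# values, not data for any particular ASIC.

f_hz = 50e6         # clock frequency (assumed)
c_sw = 2e-9         # capacitance switched per cycle, 2 nF (assumed)
vdd = 3.3           # supply voltage (assumed)
p_static = 10e-3    # static dissipation (the 100K-transistor figure above)

p1 = f_hz * c_sw * vdd ** 2     # switching (dynamic) power
p2 = 0.10 * p1                  # short-circuit (crowbar) estimate

print(f"P1 = {p1:.2f} W, P2 = {p2:.2f} W, static = {p_static * 1e3:.0f} mW")
# Note: halving VDD cuts P1 by a factor of four - hence the drive
# towards lower supply voltages.
```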
Microelectronic Technologies and Applications

Chapter 3 - Simple Programmable Logic Devices (SPLDS)

Chapter Overview
This chapter introduces a group of custom silicon components known collectively as Programmable Logic Devices. Typical generic architectures are considered
with particular reference to the construction, principle of operation and application of those devices categorised as Simple Programmable Logic Devices.

The chapter contains the following sections:


3.1 Introduction to Programmable Logic Devices
3.2 Basic Principles
3.3 PROM Architecture (Fixed AND Array, Programmable OR
Array)
3.4 PLA Architecture (Programmable AND Array,
Programmable OR Array)
3.5 PAL Architecture (Programmable AND Array, Fixed OR
Array)
3.6 Simple Programmable Logic Devices (SPLDs)
Self Assessment Questions

3.1 Introduction to Programmable Logic Devices


Programmable Logic Devices (PLDs) offer a low cost, low risk route into customised silicon for digital circuit implementations. They are particularly suitable for
small volume production (typically < 1000 units /year) or where a higher volume but fast time to market is required.

Devices are supplied from the manufacturer with an array of prefabricated logic components and interconnect. The designer utilises a CAD system to define the
required configuration of components and interconnect for a given design.

Unlike mask programmable devices that require fabrication at a silicon foundry, PLDs are electrically programmable by the user. Typically this is implemented by
downloading the configuration data from the serial line of the CAD system either directly into the device or via a device programmer.

Device architectures vary considerably from the simple AND/OR gate structure of small PLDs to the complex logic cell arrays of Field Programmable Gate Arrays
(FPGAs). Device types are further distinguished by being either one time programmable (OTP) or re-programmable.

Most manufacturers will also offer hardwired versions of their devices in which the interconnect circuitry is removed in favour of single tracks. This reduces silicon
area and therefore device cost and would be an appropriate consideration for higher volume production (typically > 10,000 units/year).

Programmable Logic Devices are available in a wide range of architectures and sizes. They are generally classified into three groups of increasing capacity, features
and cost as follows :-

• Simple Programmable Logic Devices (SPLDs)


• Complex Programmable Logic Devices (CPLDs)
• Field Programmable Gate Arrays (FPGAs)
The Department of Trade and Industry's Microelectronics in Business seminar presentation provides an overview of the technical and business issues involved in
utilising programmable logic technology.

3.2 Basic Principles


The simplest form of PLD comprises an AND array driving an OR array as shown in Figure 1.
Each AND gate in the example has 6 inputs whilst each OR gate has 8. Inputs
to gates are often represented as single lines for clarity. A matrix of input
signals (true and inverse via input buffers) is therefore formed at the input to
the AND array with a second matrix being formed at the input to the OR
array.

This mode of construction is ideally suited to implementing logic expressions


of the form:

F = A′·B′·C′ + A′·B′·C + A′·B·C′ + ……… + A·B′·C + A·B·C

(where ′ denotes the complement), often referred to as Sum of Products expressions.

Connections to gates in each array matrix can be fixed or programmable. A fixed connection is hardwired whilst a programmable connection is implemented by a fusible link. Devices are supplied by the manufacturer with all fusible links intact. The device is customised to implement the required logic functions by blowing those fuses where connections are not to be defined.

Three possible device architectures can be constructed depending on whether each matrix is fixed or programmable.

Figure 1. PLD Device Array Structure

3.3 PROM Architecture (Fixed AND Array, Programmable OR Array)


Figure 2 shows a construction that defines a Programmable Read Only Memory (PROM) with 8 memory locations and four separate output
functions.

Since the AND array is fixed, each of the 8 combinations of the three input variables I2, I1 and I0 effectively defines a memory address.

For an array with n inputs, 2^n AND gates will be required.

Only one AND gate will be active for any given input combination so each of the outputs O3, O2, O1 and O0 will register a logic 1 or logic 0
depending on whether the corresponding fuse in the OR array is left intact or blown.

Logic expressions are therefore defined by storing a complete truth table in the PROM for each of the 4 functions required. The limitation of this type of architecture is the inefficiency incurred when large numbers of input variables are required. A sixteen-variable function, for example, would require a 64K-location PROM and would occupy far more silicon area than a discrete logic gate implementation.
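
The "stored truth table" view of a PROM is easy to demonstrate. The sketch below models a hypothetical 3-input, 4-output device in the style of Figure 2: the address decoding plays the part of the fixed AND array and the programmed word at each location plays the part of the OR-array fuses; the function programmed is an arbitrary example.

```python
# A PROM as programmable logic: the fixed AND array decodes inputs
# I2, I1, I0 into one of 2**3 = 8 addresses; the programmable OR
# array is simply the 4-bit word (O3..O0) stored at each address.

prom = [0b0000] * 8     # all locations initially clear

# Programme output O0 as the parity (XOR) of the three inputs - a
# function that needs all 2**n product terms, so it suits a PROM.
for addr in range(8):
    i0, i1, i2 = addr & 1, (addr >> 1) & 1, (addr >> 2) & 1
    prom[addr] |= (i0 ^ i1 ^ i2)     # set bit O0 where the output is 1

def read(i2, i1, i0):
    """Apply an input combination and return the four output bits."""
    return prom[(i2 << 2) | (i1 << 1) | i0]

print(read(1, 0, 1) & 1)   # two 1s -> even parity -> 0
print(read(1, 1, 1) & 1)   # three 1s -> odd parity -> 1
```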
3.4 PLA Architecture (Programmable AND Array, Programmable OR Array)

Figure 3 defines a Programmable Logic Array (PLA) structure with 3 input variables and 4 output functions.

Having both arrays programmable allows greater flexibility in generating logic functions, particularly when more than one output function requires the
same product term. In this case the AND gate generating the product term is simply connected to the appropriate OR gates.

However, should a particular logic function require a large number of product terms, this can quickly use up the available supply of AND gates, leaving few to implement the other functions.

The major disadvantage of the PLA structure is related to having an extra set of fuses in the OR array which adds extra propagation delay to the signals
and reduces the component packing density. For this reason examples of commercially available devices are limited.
3.5 PAL Architecture (Programmable AND Array, Fixed OR Array)
Figure 4 defines a Programmable Array Logic (PAL) structure with 6 input variables and 4 output functions.

The basic PAL structure is the exact opposite of that required for a PROM.

The number of AND gates required is considerably reduced from the 2^n needed for the PROM and is related only to the number of product terms to be defined.

The fixed OR array, however, imposes restrictions on the number of product terms that can be logically ORed together and therefore limits the complexity of the logic expressions that can be realised.

PAL devices provide the optimum compromise for speed, flexibility and packing density and offer many features such as programmable input/output pins, internal
feedback from outputs, flip flops and active LOW/active HIGH outputs that make them ideal components for implementing logic functions.
3.6 Simple Programmable Logic Devices (SPLDs)

There exists a bewildering array of terminology amongst manufacturers for this group of devices. Often they are simply referred to as PLDs. The PAL devices
previously discussed fall into this category. More complex devices are sometimes referred to as Erasable Programmable Logic Devices (EPLDs) although some
manufacturers use this as a generic term to cover all programmable devices.

PAL devices form the entry level into the group, typically replacing 5 or 6 TTL chips in a 20 or 22 pin package. They provide a good introduction to the features,
architecture and applications of SPLDs.

The PAL was invented at Monolithic Memories Incorporated (MMI) in 1978 to provide an alternative to Small Scale Integration (SSI) chips in applications where
customised combinational or sequential logic is required. In its original form it employed bipolar technology and fusible links for programming.

Amongst the first devices to be offered were the PAL18H8 (a combinational device constructed from an AND/OR array structure) and the PAL16R8 (which added
flip flops to enable sequential circuits to be implemented).

The modern equivalent is the PAL16V8, manufactured in CMOS technology and combining features from both these devices. The programming fuses are replaced by electrically programmable cells, allowing the device to be repeatedly reconfigured.

Programming is accomplished by placing the device in a device programmer and downloading the configuration data file from the serial line of a CAD system. The
file conforms to an internationally agreed format known as JEDEC.

The programmer configures the device by re-designating the input and output pins to programming functions. This is achieved by applying a higher than normal
voltage to a specified input pin to select programming mode.

The address of the cell to be configured is placed on a subset of the input pins and its required state (logic 0 or logic 1) is set as an appropriate voltage on the output
pin connected to the AND/OR block it resides in. The cell is configured by applying a programming pulse to a second specified input pin and the cell state read back
on its corresponding output pin to verify correct programming has occurred.
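
That program-and-verify sequence can be summarised in pseudocode form. Everything below - the function name, the pins driver object and its methods - is a hypothetical stand-in for the programmer hardware; a real programmer follows the voltage and timing specifications in the manufacturer's data sheet.

```python
# Illustrative program-and-verify loop for a fuse-map style device.
# The pin-level helpers (enter_programming_mode, set_address, ...)
# are hypothetical stand-ins for a device-programmer driver.

def program_device(fuse_map, pins):
    pins.enter_programming_mode()       # raised voltage on a control pin
    failures = []
    for address, state in fuse_map.items():
        pins.set_address(address)       # address on a subset of input pins
        pins.set_data(state)            # required state on the output pin
        pins.pulse_program()            # programming pulse on a second pin
        if pins.read_back(address) != state:
            failures.append(address)    # verify each cell as it is written
    pins.exit_programming_mode()
    return failures                     # empty list => device verified
```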

A security bit is provided in the device, which when set during programming, inhibits any reading of the device contents.

The simple AND/OR array construction of PAL devices provides consistent and easily predictable propagation delays through the logic elements, an important
requirement in speed critical applications. Architectures that exhibit this property are often referred to as Deterministic.

Visit the Lattice Semiconductor web site - http://www.latticesemi.com/

Select the “Literature” section.

(Note you may have to register an account to access this information).

Locate and select the “Data Sheets: PAL and GAL Products" entry and observe the devices currently available.

Select the “PALCE16V8” entry and download the file.

Study this information carefully and in particular familiarise yourself with the following:

1) The distinctive characteristics of the device family


2) The device naming convention (ordering information)
3) The function, structure, operation and configuration options for the macrocell
4) The device architecture
Microsystems and Multichip Modules

Chapter 1 - Introduction to Microsystems

Chapter Information

The chapter contains the following sections:


1.1 What is a microsystem? - Self assessment questions
1.2 Types of microsystem - Self assessment questions
1.3 Why have a Microsystem? - Self assessment questions
1.4 Scaling - Self assessment questions
Further reading

1.1 What is a Microsystem?

You will probably come across several different definitions of a microsystem but, as pointed out in the
Business Issues module, no single definition is generally accepted. A very basic definition of a microsystem
might be a very finely toleranced structure. This is much too broad for our purposes and would include any
precision engineered component. A more specific definition is the combination of microelectronics together
with a micromachined element on the same substrate. This sounds fine but is actually quite restrictive and
would exclude some of the more interesting techniques and devices emerging from this field. It will
probably be useful to have a more flexible view of what constitutes a microsystem. It may also be helpful to
consider the definition of the broader term: microengineering.

This question is also posed in chapter 5 of the module Business Issues and Benefits of Microelectronic Devices, and it raises several interesting issues about how we view microelectronics in the wider sense.

Microengineering has developed over the last ten years or so and has largely arisen out of a recognition of
the possibilities of using the normal microelectronics production techniques to produce things other than
integrated circuits. Using batch processes which operate on hundreds or thousands of components at one
time, microelectronics technologies allow the production (the mass production) of highly complex structures
with feature sizes in the range from 1-100 microns. Microengineering recognises that the same, or similar,
techniques can be used to fabricate very small sensors or actuators. If we combine these artifacts with some
electronic signal processing at the sub-mm level, then we have a microsystem. Note that this gets round the
difficulty posed by one of the above definitions in that the integration does not have to take place on the
same substrate.

One of the most useful concepts is to think of the difference between a microelectronic device and a
microsystem. A microelectronic component processes electrical information whereas a microsystem
interacts with its environment.

It is important to realise that microelectronics technologies on their own do not provide a wide enough
portfolio of processes and materials; many additional techniques are required and combinations of
technologies are used together to give highly miniaturised components and systems. In particular, materials
other than silicon are used. Microelectronics is essentially a two-dimensional technology, consisting of thin
patterned layers. MST often requires a third dimension.

The issue of definition is important given the potential market for microsystems and the rate at which it is
expected to grow over the next decade or so. Some sources predict a market with growth rates and overall
size similar to that for microelectronics over the next couple of decades. This sounds excessive, but there is
no doubt that the eventual market will be huge; probably growing to some billions of dollars within the next
five years (see the module on Business Issues). Indeed, in a recent survey, the European Commission has
identified nearly 20,000 companies in Europe which are interested in applying microsystems technology to
their products in the immediate future. The world-wide market is predicted to reach around $US 7-8 billion
by 2002.

It is interesting to note how this technology is defined in different parts of the world. MST is largely a
European term. In the US, the generally accepted term is MEMS (Micro Electro Mechanical Systems)
while, in Japan, the subject is known as Micromachines.

These differences stem largely from the different evolutionary paths the technology has followed. In Europe
and the U.S. the main drive has been from device engineers looking for new applications for semiconductor
technologies. In Japan, the origins lie in the field of robotics and a search for increasing miniaturisation.

Microsystems technology (MST) then, is concerned with the production of new products, systems or
components through the use of microengineering techniques. Such a component will incorporate sensors
and/or actuators and signal processing on a microscopic scale.

This last definition will serve us well but has its limitations. Some authorities would exclude the necessity to
incorporate any electronics in the system before it becomes a microsystem. And, although a photodiode is
undoubtedly a sensor and an LED or semiconductor laser can be considered to be an actuator, MST
sometimes excludes optoelectronic devices from its remit. We also have to take care to include new
categories of component such as microfluidic devices in our definition. These may or may not have an active
electronic component. One can see the need to adopt a flexible definition.
In this module we shall look at the various MST techniques, examine their capabilities and discuss some
applications.

Self Assessment Questions - Definitions


Q 1. Look at the various definitions of MST (both in the text and on the www
references).
Which is the most restrictive and which is the least? Write a definition of your own.

Q 2. What are the key differences between microelectronics and micro systems?

Q 3. What is the predicted market growth for microsystems?

Q 4. Can you think of a reason for the estimates to vary so much?

Q 5. Define the term MEMS

1.2 Types of microsystem


Microsystems are commonly classified by function. They can be split into four main areas:

• Sensors
• Actuators
• Microstructures and
• Integrated Microsystems

This gives rise to several possibilities:

• Sensors can be connected directly to processing electronics


• Electronics can drive microactuators
• Complete systems involving Sensors-Processing electronics-Actuators can be created
• Free-standing microstructures, which do not involve electronic functions, can be made

Self Assessment Questions


Q 6. What are the main types of microsystem? Give some
examples of each.

1.3 Why have a Microsystem?

When the laser was being developed in the 1960s, it was described as “a solution in search of a problem.”
Luckily, those involved in the development (and the funding) ignored this view and kept working. The
problems for this solution came thick and fast in the 1970s and ‘80s and now it is difficult to imagine a world
without the laser. Applications range from industrial to medical. In fact, the laser is now a household item,
being a key component of a compact disk player. Think about this. The chances are that you have more than
one. I have at least four in my home; two CD-ROM drives (three if you count last year's slow model sitting
on the shelf) and two audio CD players. One of the latter is battery powered, can be carried round or used in
the car. The key points are:
• a technology which was once seen as being of little practical use is now the basis of a
huge, world-wide industry,
• the application areas are many and diverse with a requirement for a large number of
types, models and variations,
• this demand can only be met by means of mass production methods - particularly
when it comes to the smaller, cheaper devices.

When the original technology was being developed, it would have been very difficult to look at these early,
large, power-hungry lasers, with their limited capabilities and requirement for cabinets full of drive
electronics and predict that, one day, the man in the street would carry one around with him, have one in his
car and a few more in his home. Not only has this happened, it has happened very quickly (within a few
decades).

A virtuous circle is at work here; the emerging applications lead to the development of new devices and
manufacturing techniques and the possibilities thus revealed lead to the identification of new applications
which leads to further development etc. All this is driven by the, potentially enormous, financial rewards to
be had. The creation of a huge consumer market for “must have” goods which did not exist previously (CD-
ROMs and audio players) is a golden scenario for industry.

If anything, MST is an even more "golden" opportunity. Firstly, the origins of the technology, to a large
extent, arise out of manufacturing techniques which already exist. Secondly, many of the potential markets
and applications have already been identified. Indeed, some of these are already established and can be said
to be mature.

The parallel with the laser industry is therefore close but not exact. In fact, some laser devices could now be
considered microsystems in themselves. Lasers were a classic case of a phenomenon known as "technology
push." This is where a new technology is developed and the applications and markets follow. The opposite
case is where the market or need is identified first and a technology is developed (or an existing one adapted)
to meet the requirement. This is called "market pull."

MST has a foot in both camps. Many of the well known "demonstrators" (for example miniature cogs,
sprockets, and even motors) are of little obvious use - for the moment anyway. However, MST has
responded very rapidly to some market pulls, such as the requirement for airbag triggers in the automotive
industry.

There are basically two ways in which the potential of MST can be exploited:

• better ways of doing existing things


• new things that can’t be done any other way

The first of these can only be true if there is some positive advantage in moving to a microsystem.

In general, there are three key advantages of microsystems over their (macro) counterparts:

• reduced size
• reduced cost
• improved performance
These may be applicable to greater or lesser degrees. They are not necessarily independent and a
combination of advantages can result. This is especially so for the size and performance arguments. Let us
look at this a bit more deeply.

Reduced size

This is perhaps the most obvious advantage of a microsystem. Many applications are driven, solely or
largely, by considerations of space. Invasive and implantable medical devices such as catheters are an
obvious example. A reduction in size also opens up the possibility of incorporating many components into
one device. An example of this is the array of magnetic coils produced by CSEM.

An array of magnetic coils produced by CSEM

The microsystem need not be in the form of an array of similar components. Different components can be
incorporated on the one substrate giving rise to possibilities such as the laboratory on a chip. A reduction in
size brings other benefits. Smaller devices often consume less power and have a faster response (ie.
improved performance).

Reduced cost

This is often a result of the cost of production derived from the processing techniques developed for
microelectronics. The batch processing techniques used in microelectronics manufacture have been the key to
its staggering success. The ability to make large numbers of components at once has driven costs down
while functionality and performance have increased. For microsystems using similar processes, the same
advantages will apply. A related benefit is that of reproducibility. Material properties and dimensions can be
kept within tight limits and made uniform both within a batch and across batches. This results in predictable
component characteristics which is a considerable advantage to both the system and the component
designer. The microelectronics industry spends a considerable amount of time and energy improving the
quality and predictability (and, in turn, reliability) of its processes. This benefits not only costs but also
performance.

Improved performance

There are several reasons why a microsystem might display improved performance, mostly related to size.
One is the potential to integrate the sensing element with the electronics. This means the signal does not
have to travel any significant distance before being processed. Much weaker effects can thus be measured.
There is also the possibility to incorporate calibration functions in the device. The size of the device also
makes it less likely to interfere with its environment. A smaller sensor will be less affected by outside
influences and forces (this is most obviously so for mechanical microsensors and will be discussed in
Chapter 3). In addition, the improved quality and reproducibility of the fabrication process will lead to
improved predictability of performance. A fourth advantage exists: the ability to do things that could not be
done by any other method.

Self Assessment Questions


Q 7. What are the three key advantages of a microsystem?

1.4 Scaling

When we design a microsystem, we cannot simply scale down the dimensions of an existing macrodesign.
We rely upon certain relationships to predict performance. As we reduce the dimensions of an object, the
significance of the various parameters in these relationships changes. For example, consider an airplane. If
we take the linear dimension as L, the fuel load it can carry, and hence the distance it can travel on one load,
will depend on volume (L3). The drag, however, will be proportional to the surface area (L2). So, all other
things being equal, if we increase the size, we will increase the distance travelled on a single load of fuel.
Following the same logic, we can see that the strength of adhesion of a bond will be proportional to the area
(L2) while the mass will scale by L3. A similar argument can be applied to a supporting structure and its
cross sectional area so a smaller object will be more capable of supporting its own weight than a large one.
Examples of this are micromechanical cantilevers which can be very long relative to their thickness and in
the macro world where small animals can carry their weight with greater ease than large ones (compare an
elephant to an insect).
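
These L² versus L³ arguments are easy to tabulate. The short sketch below (the reference values at L = 1 are arbitrary; only the ratios matter) shows how area-dependent and volume-dependent quantities diverge as the linear dimension shrinks:

```python
# Scaling sketch: quantities proportional to L**2 (area: drag,
# adhesion, cross-sectional strength) shrink more slowly than those
# proportional to L**3 (volume: mass, weight, fuel load), so the
# strength-to-weight ratio improves as 1/L at small scales.

for L in [1.0, 0.1, 0.01, 0.001]:
    area, volume = L ** 2, L ** 3
    print(f"L={L:6.3f}  area={area:9.6f}  volume={volume:12.9f}  "
          f"strength/weight={area / volume:8.1f}")
```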

So it is essential to have some idea of how the forces we take for granted and make use of in the macro world
scale down to microsystem dimensions.

The main forces that are likely to be employed are:

• Gravitational forces
• Elastic forces
• Surface tension forces
• Electrostatic forces
• Electromagnetic forces
• Piezoelectric forces
• Thermal forces

Some of these forces will act destructively and some will be useful eg. as a source of motive power for a
microactuator. Scaling is considered in detail in the study section:-

In particular note the following:

The table of properties in table 7.5. Study these in detail and try to relate them to what you know about them
and the effect they have at the everyday, macro level. Note the Load and Response parameters. These can be
read as "static" and "dynamic" parameters respectively and this gives a clue to a major classification of
sensors into those that detect a static load directly and those that use an indirect method such as a change in
resonant frequency.

Pay particular attention to table 7.6 and 7.7. Try to envisage how the physical parameters change with the
linear dimension as this reduces.

Self Assessment Questions


Q 8. As you reduce the dimensions of a microsystem
cantilever, what will, in theory, happen to the
1. Natural frequency
2. Spring constant
3. Deflection
4. The effect of gravity?

Q 9. What is the Knudsen effect and how could it impact the design of a micro-component?
Further reading

The physicist Richard Feynman gave a classic talk in 1959 entitled "There's Plenty of Room at the Bottom".
His main theme was the possibilities and problems of making very small machines. Whilst talking about this he touched on many points which remain illuminating from the perspective of the state of both microsystem and microelectronics technology today. You can find this on:

http://www.zyvex.com/nanotech/feynman.html

Read this and note the following, bearing in mind that this talk was given in 1959 before the creation of the
semiconductor industry as we know it:

• In the section "Miniaturization by evaporation" compare what he says in his first paragraph
with what you know about current semiconductor processing.

• Read what he says about the problems of scaling in this and the following section. Note his
comment on the effect of Van der Waals attractions and compare with the Knudsen effect
mentioned on page 153 of the textbook. He also hints at the possibility of creating a highly
focused beam of light.

You may wish to print this paper and refer to it at the end of this module to see how Feynman fares in his
predictions.

Another useful paper is "Grand In Purpose, Insignificant In Size" by William Trimmer which you can
find on:

http://home.earthlink.net/~trimmerw/mems/mems_97_talk.html

This paper is very useful as a reference to others in the field and as a summary of the history of the
technological process. Read this carefully. Note particularly the following:

• In the paragraph beginning "The field we are contemplating....." he indicates that we are
looking at a technology, or a series of technologies, which have not evolved from a single point
from development into manufacture (like the spawning of the microelectronics industry from the
transistor). However, this can be considered a strength (see the paragraph which begins "One
thing giving me confidence.....").

• The sentence "Complex calculations and decisions have now become inexpensive" sums
up, in a few short words, almost the entire benefit and reason behind the success of digital
computing over the last few decades (and, in fact, of electronics as a whole). See if you can think
of an equivalent, concise sentence to sum up the benefits of microsystems - if not now, then when
you have completed the module.
• Pay attention to his description of Surface and Bulk Micromachining and LIGA. We will
look at these in more detail soon. Keep in mind his analogy of the flour & steel automobile when
you come to look at Surface Micromachining

Additional information

The booklets published by the DTI (references 1, 2 & 3) are introductory guides aimed at managers and chief
engineers of companies who may find microsystems useful in their product development. They make useful
background reading and Ref 3 (the Handbook of the Microsystems Supply Industry) gives some www
addresses which are of interest.

The European Commission, through the Europractice initiative, has carried out several activities to promote
awareness of MST throughout Europe. The UK centre was known as MST 2001. Visit their web site:

www.jasa.or.jp/mst/index_e.html

Browse around the three sites linked from this page to get a feel for the services on offer. Have a look at:

http://www.MST-Design.co.uk/markets.html

Compare the tables for current and emerging products. Look at the case studies on:

http://www.MST-Design.co.uk/casestudies.html

These are not detailed but the last two (Thin Film Bulk Acoustic Resonator and Microwave Switch) give a
good account of how the devices were manufactured. You may like to muse on the techniques which were
employed to fabricate these structures.

The AML site is of interest as an example of a company offering services in MST. A microsystem will often
require a number of processes and techniques for its manufacture. Not all of these will be available from one
source so someone (a lead contractor perhaps) will have to undertake to manage the device through these
various facilities (not an easy task). There will be a role in this industry for services such as this one from
AML, perhaps from companies who carry out no manufacture themselves. Again from the MST 2000 home
page you will find a link to NMRC. Have a look at the microsystems articles in their newsletter.

The page:

http://www.nexus-emsto.com/

is very useful. Look at the market reports on:

http://www.nexus-emsto.com/jap-tai_mission.html

and:

http://www.nexus-emsto.com/mission.html
(You will have to register to get access to the latter).
These give the results from some recent visits to Japan and the US and serve to give a flavour of what is
currently happening in these areas.

The market report:

http://www.nexus-emsto.com/market-analysis/index.html

is worth looking at and gives another take on the MST market roll-out. (You will only get access to the
executive summary).
Microsystems and Multichip Modules

Chapter 2 - Microsystems - Technologies & Methodologies

Chapter Information

The chapter contains the following sections:


2.1 A short overview
2.2 Microsystem development flow
 - Reference to Engineering MST
 - Fuse newsletter (Development Technologies)
 - Self Assessments 1 - 6
2.3 Microsystem design
 - Self Assessments 7 - 9
 - Self Assessments 10 - 12

Instructions to register with MST News

Throughout this module reference will be made to articles in MST News. Access to back issues is free but you need to register with the site first.

Go to the link: http://www.vdivde-it.de/mst/

On the top, right hand corner, in the index box pull down menu, choose "mst newsletter".

This takes you to a page headed "MST News - International newsletter on Microsystems and MEMs"

From the greyed text on the right select "REGISTRATION" and fill out the details.

You should now be able to get to the issues of the journal by following the above steps but, instead of "REGISTRATION" click on "DOWNLOAD"

This will take you to a list of issues and you can select from these as before. It may be as well to download the PDF version of the appropriate issues for easy reference.

2.1 Microsystem technologies - a short overview

The techniques for manufacturing MST components will be covered in the following chapters but a short overview of the main methods will be given here.

Mechanical Machining techniques

Some very precise, conventional machine tools can manufacture objects at the micron level. However, difficulties occur due to the material properties of the tool,
such as elasticity, thermal effects etc. CO2 lasers can be used to cut and form the material but, since they do this by melting, the precision is again limited. Finer
dimensions and higher precision can be achieved through the use of Excimer lasers which operate at high frequency. With this technique, the laser fires pulses which
blast away layers of atomic thickness. The pulses are so short that no heating or melting takes place. The intensity of the beam can be altered to control the amount of
material removed with each pulse and the depth of cut can be determined by counting the number of pulses.
The main problem with direct machining techniques is that they commonly work on one structure at a time. This makes them suitable for prototypes but not for mass production. However, there is still a place for such methods (particularly Excimer laser ablation) in the MST portfolio of techniques.

Micromachining

As we have already established, the most useful MST manufacturing techniques have been developed from existing, microelectronics technologies. The basic
techniques were described in Chapter 2 of Microelectronic Technologies and Applications. You should look at this again now in order to refresh your memory on the
steps involved.

These methods are based on the fundamental technique of photolithography where patterns are reproduced on the surface of the material (usually silicon) and a circuit
is built up through a combination of patterning, etching, deposition, doping etc. This results in an essentially two-dimensional structure whereas MST requires three-
dimensional features. However, the basic technique has such powerful advantages in terms of unit cost and reproducibility that it makes an excellent basis upon which
to develop MST methods. Indeed, much of MST research is focused on overcoming the restrictions of photolithographic manufacturing whilst retaining the benefits.

The term micromachining is typically used to refer to these processing techniques. There are two main subdivisions: surface micromachining and bulk
micromachining.

Surface micromachining uses common microelectronic and thin film processing to form micromechanical structures on the surface (that is, to a depth of a few
microns) of the silicon (we refer to silicon but the techniques can be applied to other materials). Thin film techniques (essentially material deposition) can be employed
to enhance the process capabilities.

A lot of very useful structures (such as cantilevers) can be made using surface micromachining, and a big advantage is that these techniques can often be combined with a specific microelectronic process in order to produce a sensing device as part of an electronic component. There is, however, often a requirement for deeper, taller structures.

Bulk micromachining exploits the fact that silicon etches much faster in some directions than others. This is a property of the crystal structure and can be controlled by
doping the silicon with various impurities. Two methods of etching are employed: wet etching and dry etching.

These micromachining methods can be broken down into several key processes:

• Pattern replication
• Material deposition
• Etching
• Sacrificial layer processing

LIGA

The term LIGA comes from the German for Lithography, electroplating and molding (Lithographie, Galvanformung und Abformung). It is essentially a technique for
creating a mold on the micron scale and using this to mass produce very small structures. The LIGA technique is not based on, or compatible with, silicon processing. If
it is to be combined with electronics, this must be done using a packaging technology.
Packaging Techniques

In most cases the ideal solution is to integrate the sensor or actuator on the same piece of silicon as the electronics. This will give a true microsystem. However, as
noted above, some of the techniques for producing microsystems are often incompatible with normal silicon processing. One solution is to combine a separate
microsensor/actuator and a microchip on the same substrate using one of the packaging techniques developed for microelectronics. This is often referred to as Hybrid or
Multichip Module (MCM) technology and involves bonding of a device (or many devices) onto the same substrate which can be silicon or some other material. We will
look at this in more detail in chapter 10.

It can be seen that the manufacture of microsystems can employ a variety of techniques and often a combination of methods is used. The above list is not exhaustive.
Many other techniques are used and others are being developed. For example, chemical and biological sensors usually employ a coating of some kind to make a FET
sensitive to a particular substance.

2.2 Microsystems development flow

Look again at the ASIC development process described in Microelectronic Technologies and Applications Section 1.2. The flow chart of diagram 1 is a typical
representation of the IC design route. It involves a number of steps including system design, partitioning, layout and simulation with feedback loops at various stages.
Although the details of the representation may vary and the process itself will evolve, the design flow is fairly well established.

There are several reasons for this:

• The design process is tried and tested


• The manufacturing process is well understood leading to accurate modeling of the process and devices. This impacts on the design route in the form
of established design rules and models for the chosen manufacturing process.
• Extensive and comprehensive CAD tools are available which allow design and verification. (In many cases these tools are available as an integrated suite with comprehensive support. The situation further improves with well-established processing lines, where a design team has access to the latest process data.)

The opportunity for design verification is a big advantage, dramatically increasing the chances of “right first time” devices.

The main difficulties are:

• The complications arising from the possible (or probable) need to combine different technologies (and hence manufacturers)
• The lack of adequate design tools.
• (In particular, note the comments on page 22 of Ref 1.)

All this makes microsystem development sound a lot more lengthy, costly and risky than that for an IC. Especially when we consider a microsystem which effectively
has an ASIC design as a subset of its overall development. This is indeed so and is likely to remain so given the nature of the technology. However, the issue is being
addressed by the MST community, particularly by companies offering a brokerage service.

Reproduced below is an article from FUSE Newsletter No.4


Read this and compare it with Ref 1 page 21 & 22 and with the ASIC design flow in Module 2. In particular, look at the differences between the design routes for the
two technologies and the ways in which they manage risk. Now try SAQs 1-6.

Written by H. v.d. Vlekkert of CSEM, a Swiss company specialising in MST development and supply. Reproduced with permission.

Development technologies Introduction

The goal of the EUROPRACTICE program is to promote access to Microsystems. A Microsystem is defined as a small
(<1 cm3 typically) package containing at least a microstructure which interfaces with the non-electrical world (sensor or
actuator) and an IC which provides an intelligent signal processing interface between the microstructure and the user. By
promoting access, it is intended that the number of Microsystem products manufactured in Europe will increase.

However, manufacturing represents only the final stage of a product life cycle (as depicted in Figure 1). The life cycle
passes three main phases: technology set-up, development and production. The technology set-up phase is the beginning
of a Microsystem and, starting from an idea, results in a function demonstrator for which manufacturing is described in a
cook book. The development phase consists of two stages: product definition and product development. The product
definition stage is carried out by the customer, in co-operation with the Microsystems service provider. The product
development stage starts after the product definition phase and assumes that the Microsystem specifications have been
defined and agreed upon with the customer. The production phase starts with industrialisation followed by the production
of the Microsystem. In parallel, the customer implements market introduction, distribution and sales.

Figure 1: The life cycle of a microsystem

Before Microsystem manufacturing at medium to large scales can begin, a product development stage must be executed.
The goal of product development is to make industrial prototypes of the Microsystem which can be manufactured in
medium to large quantities without the necessity of a redesign.

In the past, the development stage for a Microsystem used to represent a long, costly and risky stage. The length of a
Microsystem development often spans a period of several years. Its cost can run up to several million ECUs, often with
major setbacks along the project or even total failure at the end.

To reduce these problems, CSEM has implemented a methodology for the development of Microsystems. The
methodology describes the flow of a Microsystem development to guarantee its systematic execution. It focuses on the
reuse of existing components and knowledge through the optimal use of design libraries. The software tools are optimised
to execute each part of the development stage as effectively as possible. Check points described extensively in the
methodology limit the risks of the development.
Microsystem Development Flow
The flow of the Microsystem development stage that has been defined in the methodology is depicted in the flow chart.

The development begins with the system design phase in which the product specifications are partitioned over the different
system components. Simulations are performed to verify that the system will meet all specifications. In the next stage, all
components are designed in detail according to their specifications. The results of the detailed simulations are cross-
checked against the system level simulations. When the components meet their specifications, they are fabricated and
tested. They can then be assembled to form the first prototype of the system. This prototype is then tested extensively to
gain insight into the tolerances of the system to different parameters. When the initial prototype meets all critical specifications, the project continues with the design of the final prototype. Minor modifications to the design will be made to ensure that this prototype meets all specifications.

The experience gained with the fabrication will now also be used to optimise the final prototype so that it can be produced
industrially without any further modifications. The product specific equipment necessary for this future production will
also be defined at this stage. The final prototypes are then fabricated, tested and sent to the customer. They can also
undergo the environmental and quality tests specified in the development contract.

The methodology for a Microsystem development is similar to the one for an ASIC with two notable differences. The first
difference is that the Microsystem methodology develops an IC, a microstructure and a package in parallel with much
emphasis on their interactions during the entire development stage. The second difference is that the ASIC development
methodology does not distinguish between first and industrial prototypes. The need for this distinction in Microsystems
stems from the fact that there are no standard test or assembly procedures for Microsystems. Therefore, the first
prototype is used to optimise the test and assembly procedures for industrial production. The resulting industrial prototype
is conceived in such a way that the prototype can be produced in large quantities without the need for redesign in the
industrialisation stage.

The system design phase is very important in the Microsystem methodology. In this phase, three different issues are
addressed. The first issue is system partitioning, which distributes the customer specifications over the different
components of the Microsystem. The second issue is the choice of technologies for the different components and the
verification as to whether the component specifications can be met with the technologies chosen. The third issue is
concerned with the assembly and test of the components and the system. Given the small dimensions of a Microsystem, a
test and assembly concept must be worked out during the system design.

Throughout the entire methodology, there are checkpoints defined with the precise definition of the information which
must be available. The checkpoints are very effective in limiting the risks of the Microsystem development, since they
require the evaluation of all critical aspects of the Microsystem and split the development stage into shorter well-defined
parts.

The methodology also helps to define the software tools needed for the Microsystem development. The requirements of
the software tools for each step of the development stage are based on the kind of information that must be available at the
end of the step. This has helped us to choose the software and implement the libraries necessary for each step. The
libraries in turn will help shorten the development time and reduce its cost, since they maximise the reuse of available
components and knowledge.

Conclusion
The methodology described above has been implemented at CSEM and is used for all our development projects. We are
using this methodology to define software tools and libraries necessary for Microsystem design and simulation. The
libraries of existing components are currently being implemented and will be extended as more components become
available.

The result of the methodology appears to be that development projects tend to get shorter, although it is too early to reach
a definitive conclusion. The methodology has certainly helped us in discussions with potential customers, because it
explains how a Microsystem development takes place and how it limits the risks of such a project.

Self Assessment Questions


Question 1

What features of standard, microelectronic processes make them suitable for development as MST techniques?
Question 2

Why are mechanical machining techniques not suitable for MST?

Question 3

What features allow the ASIC design route to be well established?

Question 4

What are the difficulties in doing this for MST?

Question 5

What additional stage(s) is usually required? (hint: see refs 1 & 4)

Question 6

With reference to the article on “development technologies” from ref 4 answer the following:

1. During the design, how does CSEM reduce the risk?


2. Why is the initial prototyping stage necessary?
3. What are the two notable differences between the ASIC and the Microsystem development flows?
4. What are the three issues addressed in the system design phase?
5. How is the software defined?

2.3 Microsystem Design

Let us explore the design task more thoroughly. Go back to the bullet points of section 2.2 where we list the reasons for the well established IC design flow and the
difficulties in the MST case. Bearing these in mind, read the following article from the journal "MST News". (Click on the link below to reach the relevant copy of the
journal and look at the paper entitled "Moving MEMS CAD Tools into the next Century"- the file is in PDF format). Read this critically now noting the following points
and answer the SAQs in order to assess your understanding of the material. As you study this, compare it to what you know of the IC design task and the CAD tools
available.
http://www.ami.bolton.ac.uk/courseware/msysmcm/ch2/mstnews0499.pdf

Note the "initial observations" made here.


Also note the suggestion that the tools will have to provide some guidance and help to the user in the area of modelling. Are there any other areas in which you think
they would require help? They also suggest the use of proven IP elements to reduce the Time To Market (they really mean the design time here). Can you see a
difficulty with this? My own opinion is that the diversity of applications will make this difficult for MST as IP elements are likely to serve a smaller range of
applications than with ICs. It is more difficult to design say a pressure sensor element for general use than say a JPEG module and the performance is more likely to be
compromised. Do you agree with this?

Note the point made in the final paragraph on the inaccuracy of process parameters. Again, this is a major problem. SAQs 7, 8 & 9 refer to this paper.

Self Assessment Questions

Question 7

In the preceding paper, what do the authors suggest to speed up Time To Market?

Question 8

What are suggested as being the main areas of difficulty in creating a CAD tool suite?

Question 9

What are the suggested modelling strategies?

Now have a look at the paper in MST News of 5/00 entitled "Towards Dedicated Design Tools for Microsystems".

http://www.ami.bolton.ac.uk/courseware/msysmcm/ch2/mstnews5-500.pdf

The first thing to note is that the first paper was from April 1999 whereas this is dated May 2000 ie. just over a year later.

Read this paper carefully paying particular attention to the diagrams. Note the following:

• The influence attributed to "market pull".


• The four steps in standard IC design tool evolution. This is interesting as it suggests we are at a similar position to IC CAD tools in the 1980s. Do you
think this is correct?
• The extra level of "system" design (c.f. IC CAD) which is suggested by figure 2.
• See the statement that "MEMS design tools do not need to handle the complexity of designs as the IC industry does…but must cover different domains
and complex design flows". In other words the complexity lies in the flow and combination of flows rather than the sheer complexity of the design detail. Do
you agree with this?
• Note the description of how design used to be carried out "they iterated their design between process technology runs…" This means they used a
prototyping, processing step as part of the design cycle. This may seem pretty drastic but IC design used to be like this in the 1980s. It used to be the rule,
rather than the exception to allow for one prototyping stage. This was due to the lack of adequate simulation software. Imagine what this did to the design
cycle time; design times of one year were quite common.
• Look at figure 3. Note the ad-hoc mix of tools used and note the steps of the problem-oriented approach suggested in the following paragraph.
• Look at figures 5, 6, 7 & 8. Do you think they cover all the angles discussed in the text? If not, where are the gaps?
• Note that Figure 5 shows a combination of fabrication simulations. These give the process models and parameters that will affect the performance of
the microsystem. This is rarely included in an IC design. Have you any idea why? (The processing house will still characterise its processes to a very high
degree but, since the processes are much more predictable, the results are usually taken into account in the design rules and there is no need for process
modelling in the design stage).
• The author states that a behavioural model has to be deduced from the FEM results - Note the interdependence implied by this.
• The penultimate section suggests that quite a lot of co-ordinating effort is required to enable the development of comprehensive toolsets.
• Several important points are mentioned in the conclusions. In particular the need for an interdisciplinary design team. Also the need for three things:
two different design flows and the link between them.
• Now note that this paper originates from the same company as the previous one. Do you think they have made much progress in the year?

The paper paints a fairly detailed picture of the requirements for an integrated set of design tools. Do you think this will ever come about? Can you think of any factors
that would slow the development of such a toolset? The IC CAD business is largely driven by the huge amounts of money at stake. The CAD vendors have therefore a
strong incentive to develop new tools and a very competitive market has resulted. For MST, the wide range of applications areas and technologies needed would mean a
wide range of toolsets and a more fragmented market. This may limit the applicability of each and there may not be such an incentive for CAD vendors to get involved
in this field. Until the predicted large market develops, the development of the tools may be slow.

Self Assessment Questions

Question 10

What two design tools are used most in the ad-hoc approach to MST design?

Question 11

Which tool is used to model micro-structures?

Question 12

What does the author suggest is the cause of the loss of information and a source of errors?

If you have the time, read the other articles in this issue of MST news. This journal is very comprehensive and is a good way of keeping up to date with
developments in MST. Indeed, we shall refer to more material from this source in the following chapters. If you develop an interest in the subject you
should get a subscription (it's free).

Demonstration

Finally for this chapter, you can download a demonstration version of an MST toolset. This is an executable file and you should be able to run it on your
PC. You will not be able to design any Microsystems with it but play around with it and, from your knowledge of IC design tools, see how it differs from
these. There are no SAQs on it.

Note: The MST toolset software will take typically fifteen minutes to download, depending on the speed of your modem
and internet connection.
Microsystems and Multichip Modules

Chapter 3 - Microsystems Processing Techniques - 1

Chapter Information

The chapter contains the following sections:


3.1 Introduction
3.2 Pattern Replication
3.3 Deposition
3.4 Etching
3.5 Sacrificial Layer Processing
3.6 Silicon Planar Technology
3.7 Other Materials

Self Assessment
3.1 Introduction

There are a number of ways in which we could classify and group the various processes
used in MST. As we have already ascertained, there is much to be gained by using
processes compatible with, or based upon, well understood semiconductor processes.
These use silicon (almost exclusively) as a base material. However, MST frequently
requires the use of other materials, so we will need additional processes. It may be
useful therefore to discuss MST in terms of silicon processing and specialised
processing. (This has the added advantage that the assigned textbook classifies it in this
way).

In this chapter, we shall look at the common silicon processes and talk about their
suitability for microsystem manufacture.

First of all, let us consider the key processes for microfabrication (note: the terms
micromachining and microfabrication are frequently used interchangeably). These were
mentioned at the end of the previous chapter and you should try to recall what they are.

We will now discuss them in more detail.

3.2 Pattern Replication

Automated methods of reproducing patterns are fundamental to almost every mass production technique. (For example, think of the stamping of car body parts in sheet steel.) Traditional methods include printing, moulding, casting and embossing. Difficulties arise when we try to apply such methods directly to the dimensions required for microsystems. These lie in:

• the adaptation of the process itself, and
• the fabrication of the master or tool.
The fabrication of ICs solves this through the use of a well-established process called photolithography. This consists of the transfer of a pattern on a mask plate to a layer of photosensitive material, usually by means of ultraviolet illumination. This allows very precise reproduction (to sub-micron resolutions). The use of a series of masks, each aligned to the surface of the material, allows quite complex structures to be fabricated. Typically, the mask will consist of a repeated pattern (in order to fabricate many identical devices at once) on a glass or quartz plate, and the working material will be a thin wafer of silicon.

There are two important restrictions:

• The surface of the structure must be planar, and
• the exposing light will be diffracted by the resist, which limits the thickness (depth) that can be exposed without loss of resolution.
These are a result of the requirement for the sensitive layer to be in contact with the
mask or in the image plane of the optical system. The result is that fabrication based on
photolithography is essentially two-dimensional. As we have discussed, MST often
requires three-dimensional features.

A lot of MST research is aimed at overcoming these restrictions of photolithographic manufacturing whilst retaining its considerable advantages of reproducibility and advanced state of development. Hence, a lot of MST fabrication is based on Silicon Planar technology: it starts with a silicon wafer and subjects it to a series of patterning, deposition and etching steps to achieve the desired result.

3.3 Deposition

Another key technique is that of deposition. As the term suggests, this is essentially a process which deposits material onto the surface of the wafer. The main techniques are:

1. Epitaxy. This is a process whereby a very thin (1-10 micron) layer of doped silicon is
effectively grown on the wafer in such a way that the crystal structure is continuous
between the substrate and the deposited layer.

2. Vacuum deposition. By means of thermal evaporation or ion bombardment (known as sputtering), atoms are liberated from a source material in a vacuum chamber and condense on the cool surface of the wafer (or other object). This technique is often used to deposit thin (< 1 micron) layers of pure metals but can be used for non-metals and mixed materials. The technique allows precise control of thickness. However, the atoms travel from the source to the sample in a straight line. This means that, if the surface is not planar, shadowing will result and the coating will not be of uniform thickness (shown in figure 3.1 below).
figure 3.1

Other disadvantages are:

a) the deposition rates are low, limiting the applicability for thicker layers, and
b) a large proportion of the source material is wasted.

3. CVD (Chemical Vapour Deposition). A gas is passed across a heated sample in a reaction chamber. The surface of the material reacts with the gas to produce the desired layer. This technique is commonly used to create layers of Silicon Dioxide (SiO2), Silicon Nitride (Si3N4) and polysilicon with thicknesses of 1 micron and above. The result is a conformal coating (where the added layer covers the entire surface - see figure 3.2 below). However, the process requires elaborate equipment and is difficult to set up. There are also safety & environmental concerns owing to the materials used. (Note, though, that the use of CVD is becoming more widespread for IC processing and may, therefore, be subject to rapid development.)

figure 3.2

4. Spin coating. In this process, a vacuum holds the wafer to a spinning chuck. A liquid is applied which spins out to form a coating. This dries or polymerises to form a layer of about 100 microns or greater. There is no precise control, and the process tends to planarise a non-planar surface (see figure 3.3 below). The technique is commonly used for spinning photoresist onto silicon wafers, where such considerations are less important.
figure 3.3
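As a rough illustration of how thickness is controlled, a commonly quoted empirical rule of thumb (an assumption here - real resists are characterised by vendor spin curves) is that, for a given liquid viscosity, the final thickness scales as the inverse square root of the spin speed:

def spin_thickness(t_ref_um, rpm_ref, rpm):
    """Scale a known film thickness to a new spin speed using the
    commonly quoted empirical rule: thickness ~ 1/sqrt(spin speed)."""
    return t_ref_um * (rpm_ref / rpm) ** 0.5

# A resist that gives 2.0 um at 3000 rpm would be expected to give
# roughly 2.0/sqrt(2), i.e. about 1.4 um, at 6000 rpm.
print(f"{spin_thickness(2.0, 3000, 6000):.2f} um")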

5. Electroplating. This can be achieved for conductive surfaces by passing an electric current through an electrolytic bath. Layers greater than 1 micron thickness can be achieved. The apparatus required is relatively simple, but control is difficult and imprecise. The quality of the surface is usually poor. Electroplating can be combined with the photolithographic process: if a patterned photoresist is applied, deposition will not occur where the resist blocks the current. If a thick layer (or deep feature) is required, the resist layer must be equally thick (see figure 3.4 below).

figure 3.4

6. Lift-off technique. This is a combination of photolithography and vacuum deposition. Basically, the deposition takes place onto a surface patterned with photoresist. The coating is deposited over the entire surface but, when the resist is dissolved, the coating above it will detach. For this to occur, the edges of the coating must be well defined. Hence the technique cannot be used with either CVD or spin coating. This can readily be seen from figure 3.5 below.

figure 3.5 - before and after lift-off

3.4 Etching

Etching is basically a means of removing material. Common mechanical methods such as milling and grinding are difficult to apply to microsystems (this was discussed briefly in chapter 1). In microelectronics and microengineering, chemical methods are much more common. The main reasons for this are:

• They can be highly selective with regard to the material they etch.
• This means they can be controlled using the photolithographic method
with a patterned etch resist layer.
• The use of different etchants and other methods of control (such as
doping or etch stops - discussed later) provides a variety of techniques for
different effects.

There are two main etching techniques - wet & dry.

Wet etching is the simplest process: the sample is placed in a liquid that dissolves some materials but not others (typically, the mask material or a doped region). Broadly speaking, etchants can be isotropic or anisotropic. Isotropic etchants attack the material equally in all directions; anisotropic etchants attack the material at different rates in different directions. Anisotropic etching is particularly useful for cutting deep V-grooves and trenches. We shall look at these in more detail later.
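To make the anisotropic case concrete, here is a minimal sketch assuming a KOH-type etch of (100) silicon, where the slow-etching (111) planes form sidewalls at about 54.74 degrees to the surface; the depth of a self-limiting V-groove then follows directly from the width of the mask opening:

import math

SIDEWALL_ANGLE_DEG = 54.74  # angle between (111) sidewalls and a (100) surface

def v_groove_depth(mask_opening_um):
    """Depth (in microns) of a self-limiting V-groove etched anisotropically
    into (100) silicon through a mask opening of the given width."""
    # The two (111) sidewalls meet at the groove bottom, so
    # depth = (W/2) * tan(54.74 deg), i.e. roughly 0.71 * W.
    return (mask_opening_um / 2.0) * math.tan(math.radians(SIDEWALL_ANGLE_DEG))

print(f"{v_groove_depth(100.0):.1f} um")  # a 100 um opening etches about 70.7 um deep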

Dry etching uses a gaseous etchant. The gas is ionised and the ions are propelled to the
sample in an RF field. This is known as Reactive Ion Etching (RIE). It has a high
directionality and allows deep, steep-sided features to be made.

(A) wet isotropic (B) anisotropic in (100) silicon (C) reactive-ion (RIE)
figure 3.6

Typical effects of the two wet and the dry etching techniques are shown in figure 3.6
above.
Now look at the textbook's description of the Silicon Planar Process used in IC manufacture. In particular, note the following:

• Figure 3.2 gives an interesting breakdown of the processes and where they are used. Note the classification of the bipolar processes into different lateral and vertical isolation types. (Also note the date on this diagram.)
• Take a good look at the fabrication process as shown in figure 3.3. You
should be familiar with this from some of the earlier modules. It is important
to realise that this is only one example of a process and there are many
variations on this theme. An IC process is commonly classified by how many
mask layers it requires. For example, we can refer to a "5 layer" or a "12
layer" process. This gives a rough indication of how complicated the process
is (and of how much - relatively - it is likely to cost you). In general, the
simpler processes will have fewer layers. Another way of classifying the
process is by how many layers of metal interconnect are involved. So you can
have a "5 layer, single metal" process (likely to be a simple NMOS process)
or a "14 layer, double metal process" (likely to be a CMOS process).
• Whilst reading the text, try to relate it to this diagram. Note the
indicated layer thicknesses shown in figure 3.3.
• Try to follow the general sense of the equations in section 3.2.1.
• Section 3.2.6 discusses passivation. Although this is a fairly simple,
protective layer, it requires a mask and more processing. Holes have to be left
in this layer to allow wires to be connected to the bond pads (passivation
windows). This may seem obvious, but I have seen IC designs come back
from processing with a layer of passivation over the entire chip, including the
bond pads.

3.5 Sacrificial Layer Processing

This is an extension of the normal silicon IC process to obtain micromechanical structures. It is often referred to as surface micromachining. It consists of a series of steps: depositions and patterned etches. Two or more materials are used, and one is finally selectively etched away to leave a free-standing structure. SiO2 is normally used as the sacrificial layer, and polysilicon or silicon nitride for the structures. An isotropic wet etch or RIE can be used for etching.
Note that this technique is closely compatible with conventional IC technology, giving the potential for integration of the electronics in the same device.
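The sequence below sketches one common surface-micromachining flow (a free-standing polysilicon beam released from a sacrificial oxide); the particular materials and step ordering are illustrative assumptions rather than a definitive recipe:

# One illustrative surface-micromachining sequence: a free-standing
# polysilicon beam released from a sacrificial SiO2 layer.
process_flow = [
    ("deposit", "sacrificial SiO2 layer (e.g. by CVD)"),
    ("pattern", "anchor holes in the oxide (photolithography + etch)"),
    ("deposit", "structural polysilicon over the patterned oxide"),
    ("pattern", "the structural layer to define the beam outline"),
    ("release", "selectively etch away the SiO2, leaving the beam free-standing"),
]

for number, (operation, description) in enumerate(process_flow, start=1):
    print(f"step {number}: {operation} - {description}")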

These then are the main techniques for micromachining.

3.6 Silicon Planar Technology

In order to appreciate the processes required for MST, it is important to have a good
grasp of the Silicon Planar Process.

In order to remind or familiarise yourself with this, look at the Microelectronic Design module, chapter 2. In particular, note the process model of diagram 2 and compare it with the way we have categorised the process steps in this chapter.

Now read section 3.2 of the textbook. Note the process steps as set out on page 37. A
couple of important additions are bonding and encapsulation. Passivation is discussed
in section 3.2.6. of the book. It is important to note here that it is possible to leave a gap
in the passivation layer. With an IC it is necessary to leave such gaps (known as
windows) over the metal bond pads to allow bonding to take place. One important
technique we have not explicitly discussed above is doping. A good account of this is
given in section 3.2.1 and you should go over this carefully. Figure 3.2 is interesting
mainly because it comes from a source dated 1976. Although it doesn’t imply any
relative importance of the processes listed, CMOS is undoubtedly the most prominent
process today and this is where most research and investment goes. NMOS & PMOS
are by no means extinct, but they are very rare. One process not mentioned is SOI
(Silicon on Insulator). You may come across this as SOS (Silicon on Sapphire).

We have from time to time mentioned that much of the research into microsystems is focused on solving the problems of compatibility with common IC production. The April 1998 issue of the magazine "Semiconductor International" contains an article from Sandia Labs showing one innovative approach (and mentions some interesting points along the way). You should look at this. Note the following in particular:

• The discussion of the problems involved. These are not confined to process techniques but extend to areas like equipment reliability.
• The reference to packaging as a barrier.
• The discussion on reliability.
3.7 Other Materials

The processes above are generally applied to silicon. Of course, silicon is just one of the materials available to us to construct a microsystem. A number of other materials, both passive and active, are used in MST.

Self Assessment Questions


Question 1

What is the main disadvantage of the photolithographic process when used for the manufacture of microsystems?

Question 2

What is the difference (in effect) between a negative and a positive photoresist?

Question 3

State three disadvantages of the use of vacuum deposition when applied to microsystems.

Question 4

What are the five common techniques for depositing material?

Question 5

Why are the spin coating and CVD processes not suitable for use in the lift-off method?

Question 6

Why are chemical etchants commonly used in microelectronics & microsystems processing?

Question 7

Sketch out the principal steps of the photolithographic process for a negative and a positive resist.

Now read the paper presented by Sniegowski et al., which you will find on the link below.

This gives a short overview of how one team sees the problem of adapting silicon processes and integrating MEMS. The sections of note are those entitled "Integrating MEMS and CMOS" and "Adapting to microsystem manufacture". Figure 3 is interesting. This paper hints at some of the things we shall look at in the following chapter.

http://www.semiconductor.net/semiconductor/issues/Issues/1998/apr98/docs/feature8.asp
Microelectronic Project Management

Chapter 1 - Introduction

Chapter Information

The chapter contains the following sections:

1.1 The importance of project management
1.2 Content of the module
1.3 Project definition
1.4 Project management
1.5 Project planning
1.6 Project control
1.7 Project teams
1.8 Reading list
1.1 The importance of project management
In many companies or departments the work is arranged as a series of projects.
This is a common feature for microelectronic development, as it is for product
development in general. To those who work in such an environment, the
importance of good Project Management is clear.

It may be felt in other environments, where innovation is not a key feature, that
the techniques of Project Management are unlikely to be appropriate. This would
be a dangerous misconception. Good management achieves most of its effects
from the successful introduction of change rather than by the supervision of the
status quo. Project Management in such an environment is particularly important
and demanding.

Having accepted that microelectronic development activities generally fall into the category of 'project', the question may be asked why such sophisticated and involved techniques and methodologies for project management are necessary. Although mankind has been involved in projects since the beginning of recorded history, obviously the nature of projects and the environment have changed. Modern projects are subject to greater technical complexity and require greater diversity of skills. Managers are faced with the problem of putting together and directing large temporary organisations while being subjected to constrained resources, limited time schedules, and environmental uncertainty. To cope with new, more complex kinds of activities and greater uncertainty, new forms of project organisation and new practices of management have evolved.

Two examples of modern, complex activities that required new management practices and organisation are the Manhattan Project to develop the first atomic bomb and the Apollo Space Program to put a man on the moon. Compared to earlier undertakings, projects such as these were not only unparalleled in terms of technical difficulty and organisational complexity, but also in terms of the strict requirements circumscribing them. In ancient times project "requirements" were more flexible. If the Pharaohs needed more workers, then more slaves or more of the general population was conscripted. If funding ran out during construction of a Renaissance cathedral, the work was stopped until more money could be raised from the congregation (indeed, this is one reason many cathedrals took decades to complete). If a king ran out of money while building a palace, he simply raised taxes. In other cases where additional money could not be raised, more workers could not be found, or the project could not be delayed, then the scale of effort or the quality of workmanship was simply reduced to accommodate the constraints.

In projects like Manhattan and Apollo, requirements are not so flexible. First,
both projects were subject to severe time constraints. Manhattan, undertaken
during World War II, required that the atomic bomb be developed in the shortest
time possible, preferably ahead of the Nazis; Apollo, undertaken in the early
1960s, had to be finished by 1970 to fulfil President Kennedy's goal of landing a
man on the moon and returning him safely to earth “before the decade is out".

Both projects involved advanced research and development and explored new areas of science and engineering. In neither case could technical performance requirements be compromised to compensate for limitations in time, funding, or other resources; to do so would increase the risk to undertakings that were already very risky.

Complex efforts such as these defy traditional management approaches for planning, organisation, and control. The examples above are representative of activities that require modern methods of project management and organisation to fulfil difficult technological or market-related performance goals in spite of severe limitations on time and resources.

Today, project management is being applied in a wide variety of industries and organisations, although its application still lags far behind its potential. Originally developed in large-scale, complex technological projects such as Apollo, project management techniques are applicable to any project-type activities, regardless of size or technology. Modern project management would have been as useful to early Egyptian and Renaissance builders as it is now to present day contractors, engineers and systems specialists.

1.2 Content of the module

Most courses on project management address either the techniques (e.g. network planning, work breakdown structure, budgetary control etc.) or behavioural topics (e.g. project leadership, group psychology). The course content of this module will place emphasis on the techniques of project planning and control. However, because it is felt that the behavioural aspects of project management are also very important, there will be a session on project team selection and group dynamics. In addition, the theoretical content of the module is applied to microelectronic projects.

Most sections contain self-study exercises, intended to reinforce the learning of the module. Some chapters also encourage hands-on use of a popular project management package, Microsoft Project for Windows. The first exercise using the package appears in chapter four, although there's nothing to prevent you familiarising yourself with the package now.

The next sections present an overview of the main stages of planning and
managing a typical microelectronics project.

1.3 Project definition

A project is an endeavour to create something new, defined by a set of objectives, using resources, in a set time scale.

All projects share one common characteristic - the projection of ideas and activities into new endeavours. The ever-present element of risk and uncertainty means that the steps and tasks leading to completion can never be described with absolute accuracy in advance. For some complex projects the achievement of a successful outcome may even be in question. The function of project management is to foresee or predict as many of the dangers and problems as possible and to plan, organise and control activities so that the project is completed successfully.

The principal identifying characteristic of a project is its novelty. It is a step into the unknown, fraught with risk and uncertainty. No two projects are ever exactly alike, and even a repeated project will differ in one or more commercial, administrative, or physical aspect from its predecessor.

When the uncertainty of a project drops to nearly zero, and when it is repeated a
large number of times, then the effort is usually no longer considered a project.
For example, building a skyscraper is definitely a project, but mass construction
of prefabricated homes more closely resembles an assembly line than a project.
Admiral Byrd's exploratory flight to the South Pole was a project, but modern
daily supply flights to Antarctic bases are not. When (far in the future) tourists
begin taking chartered excursions to Mars, trips there will no longer be considered
projects either. They will just be ordinary scheduled operations.

In all cases, projects involve organisations which, after target goals have been accomplished, go on to do something else (construction companies or microelectronics project teams) or are disbanded (Admiral Byrd's crew, the Mars exploration team). In contrast, repetitive, high-certainty activities (prefabricated housing, supply flights and tourist trips to Antarctica or Mars) are performed by permanent organisations which do the same thing over and over, with little change in operations other than rescheduling. Because projects differ greatly from repetitive efforts, they must be managed differently.

1.4 Project management

Before 1970, the term Project Management was seldom used in industry, although, in many cases, it was probably a role fulfilled under a separate job description, e.g. site manager, design office manager, contracts manager, site agent, manager, and supervisor, to name but a few. Indeed, with the increasing use of the term Project Manager through the 1970s and 1980s, some of these earlier roles did little more than change their name, with perhaps the odd Project Coordinator thrown in for good measure.
However, on the positive side, Project Management has also developed into an effective 'science' and, where used effectively, has very often proved invaluable to both the client and members of the design/construction team.

As a consequence of this apparently indiscriminate use of the term Project Management, it is clear that it is a term often abused in its application throughout industry, not only in terms of those individuals who carry its title, but also in what it provides. Possibly the most common misconception is that of "Project Planning and Control", ignoring the equally important element of "Management".

A more accurate definition of Project Management, however, recognises the need for professional judgement (or management) in addition to the planning of time, cost and resources. Therefore:

MANAGEMENT + PLANNING = PROJECT MANAGEMENT
The objectives of Project Management, as distinct from those of Project Planning,
are:
· to manage time and progress;
· to manage cost and cash-flow;
· to manage quality and performance.
The means by which these objectives are achieved include Project Planning,
together with co-ordinating, monitoring and controlling available resources.

The projects mentioned earlier in this chapter, the Great Pyramids of Egypt, the
Manhattan project, the Apollo space programme, and the development of Product
X all have something in common with each other and with every other
undertaking of human organisations: they all require, in a word, management.
Certainly the resources, work tasks and goals of these projects vary greatly, yet
without management none of them could happen.

1.5 Project planning

Project planning begins with defining the project. The start point for some projects is very clear - the arrival of a product spec. from a customer, the first design engineer beginning work on a new product, etc. However, other projects have a much less distinct beginning. For example (from a different sector), the design of a new car often involves many discussions and much conceptual design before even the go-ahead for full design is given. Some of the pre-discussion involves planning the work that would be carried out, in order to cost the new project and assess it for feasibility. Should this work be classed as part of the project, or not? Many companies choose to think of the start of the design project as being once the firm go-ahead for detailed design has been given. In any case, once the start point of the project has been decided, the project can be defined.

The project specification must be developed, including objectives, functional targets, cost and time objectives.

The project must then be broken down into manageable work packages. From here activities can be defined and, against the activities, resources can be planned. The order in which activities are carried out, and the extent to which parts of the project can be carried out concurrently, is calculated. This is normally achieved using techniques of network planning. The normal outcome here is a list of start and finish dates for activities, as sketched below.
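As a minimal sketch of what such a calculation produces (the activity names, durations and dependencies below are invented purely for illustration), a forward pass through the activity network yields the earliest start and finish of each activity; the longest path through the network is the critical path:

# Minimal forward pass through an activity network. Each activity has a
# duration (in weeks) and a list of predecessors; all values are invented.
activities = {
    "spec":     (2, []),
    "design":   (6, ["spec"]),
    "layout":   (4, ["design"]),
    "test rig": (3, ["spec"]),             # can run concurrently with design
    "verify":   (2, ["layout", "test rig"]),
}

earliest = {}  # activity -> (earliest start, earliest finish)

def forward_pass(name):
    if name not in earliest:
        duration, predecessors = activities[name]
        start = max((forward_pass(p)[1] for p in predecessors), default=0)
        earliest[name] = (start, start + duration)
    return earliest[name]

for name in activities:
    start, finish = forward_pass(name)
    print(f"{name}: start week {start}, finish week {finish}")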

In addition, a cost plan must be formulated such that the accumulation of costs
throughout the project can be investigated, planned and monitored.

Because the project plan is necessarily formulated in advance, unforeseen circumstances can occur which force activities to deviate from the theoretical plan, in either time, cost or quality terms. If outcomes are critical, it is necessary to plan in advance for likely failures. This is known as risk planning, and often results in contingency plans which can be put into action if necessary. A good example here is the Apollo missions, where, in addition to planned activities and procedures for a 'normal' mission, the project team also designed literally thousands of 'contingency procedures' to cover every conceivable 'failure' eventuality.

1.6 Project control

Project control, as the title suggests, is concerned with ensuring that the work is carried out in accordance with the plan. Costs, timing, quality, and any other objectives are monitored against the plan, and corrective action taken should any deviation take place.
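As a trivial illustration (the work packages and figures below are invented), control amounts to comparing actual figures against the plan and flagging any deviation that exceeds an agreed tolerance:

# Planned versus actual cost (in pounds) per work package; values invented.
plan   = {"design": 20_000, "layout": 12_000, "test": 8_000}
actual = {"design": 22_500, "layout": 11_000, "test": 9_500}

TOLERANCE = 0.10  # flag deviations beyond +/- 10% of plan

for package, planned_cost in plan.items():
    deviation = (actual[package] - planned_cost) / planned_cost
    if abs(deviation) > TOLERANCE:
        print(f"{package}: {deviation:+.0%} against plan - corrective action needed")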

1.7 Project teams

Complex projects often involve many different people from different disciplines and often different companies. For example, the design of the Rolls Royce Trent engine involved an estimated 6000 Rolls Royce engineers, and up to 100000 employees from other organisations, all interacting to bring the project to fruition. Often the project is split into work packages, with separate smaller teams associated with each part of the project. However, smaller and less complex projects may be handled by a single team. For example, the design of a new manufacturing system for assembling ventilation fans involved the product designer, a manufacturing systems engineer, a production controller and a shop floor representative.
Microelectronics projects may be carried out by a single engineer interfacing with others in the company; on the other hand, more involved projects will require larger teams.

Often the profile of a project team can change through the duration of the project. For example, at the start of a manufacturing systems project, more input from product designers may be sought. Later, facilities engineers and equipment suppliers will be more involved.

Key players in the project team are the project manager, the ‘customer’ (external
or internal to the organisation), team members and senior management of the
company.

In addition to requiring the right mix of technical skills to carry out the work, there is also a requirement for a good mix of 'human' skills. Too many natural leaders can lead to chaos. Likewise, a group consisting of all 'doers', or all 'thinkers', would not be effective. Formulating teams from a 'human' perspective is just as important as ensuring the right mix of technical knowledge and expertise.

You may have come across the work of Belbin, who defined eight 'personality' types which are required for a team to function well. He said that some people naturally prefer to operate in the role of 'leader', whereas others have a natural tendency to prefer to carry out detailed work. Other characteristics he determined as being important to have in the team include someone who is naturally good at encouraging and being enthusiastic, someone who can smooth disputes, and someone who can be constructively critical - or evaluative - about the work of the team. Belbin's theory will be covered in more depth later in the module.

1.8 Reading List

1. Lock, Dennis, Project Management, Gower, 1992, ISBN 0566 0734 04 (library 658.404)

2. Nicholas, John, Managing Business and Engineering Projects, Prentice Hall, 1990, ISBN 0135 5663 04 (library 658.404 NIC)

3. Andersen, Erling, Goal Directed Project Management, Kogan Page, 1995, ISBN 0749 4138 91 (library 658.404 AND)

4. Spinner, Pete, Elements of Project Management, Prentice Hall, 1992, ISBN 0132 4955 89

5. Kliem, Ralph, The People Side of Project Management, Gower, 1994, ISBN 0556 0736 33 (library 658.404 KLI)

6. Chicken, John C, Managing Risks and Decisions in Major Projects, Chapman and Hall, 1994, ISBN 0412 5873 00 (library 658.403)

7. CICA, Using Computers for Project Management, Construction Industry Computing Association, ISBN 0906 2257 75 (library 624.068)

Self Assessment
Question 1
Describe a project in which you have been involved. State the objectives,
timescale and any penalties for not achieving the objectives.
Describe briefly how you broke the project into activities.
List any activities which you could carry out simultaneously and those that were
sequential.

Question 2
Describe the characteristics which classify a piece of work as a project rather than
a routine or regular task.

Question 3
Describe the main steps in a typical microelectronics project from definition
through to completion of the product design.
Microelectronic Project Management

Chapter 2 - Project Specification Formulation

Chapter Information

The chapter contains the following sections:

2.1 Introduction
2.2 What is a project?
2.3 Quality function deployment
2.4 Financial specification
2.5 The customer's project specification
2.6 The contractor's project specification
2.7 Product development project specification
2.8 Developing and documenting the project specification

2.1 Introduction
This section deals with project definition, which should take place before a project is given the ‘go-ahead’ (authorisation).
Project specification provides information with which to appraise the proposed works against required outcomes (cost, time,
quality, fit for purpose etc.). It also should give a sound basis of information with which to carry out the detailed planning of
the project.

The best source of information for creating the project specification is the person or persons requiring the work. This may be an external customer, or an internal manager or department. In either case, the 'initiator' of the project will be termed 'the customer' for the purposes of these notes. (Similarly, the person(s) carrying out the project work will normally be termed 'the contractor'.) Ensuring that the customer's specifications for the project are fully understood and documented is a vital first step towards a successfully completed project.

During this session we will first examine the nature of projects, in order to understand the features which must be specified.
2.2 What is a project?
Why are some works considered "projects" while other human activities, such as planting and harvesting a crop, stocking a
warehouse, issuing payroll cheques, or manufacturing a product, are not?

What is a project? This is a question we will cover in more detail as we progress through the course. Just for an introduction
though, some characteristics will be listed that warrant classifying an activity as a project. They centre on the purpose,
complexity, uniqueness, unfamiliarity, stake, impermanence and life cycle of the activity.

1. A project involves a single, definable purpose, end product or result, usually specified in terms of cost, schedule, and performance requirements.

2. Projects cut across organisational lines since they need to utilise skills and talents from multiple
professions and organisations. Project complexity often arises from the complexity of advanced
technology, which relies on task interdependencies, and may introduce new and unique problems. This
may be especially true for microelectronic applications.

3. Every project is unique in that it requires doing something different than was done previously. Even in "routine" projects such as home construction, variables such as terrain, access, zoning laws, labour market, public services and local utilities make each project different. A project is a one-time activity, never to be exactly repeated again.

4. Given that a project differs from what was previously done, it also involves unfamiliarity. It may
encompass new technology and, for the organisation undertaking the project, possess significant
elements of uncertainty and risk. So the organisation usually has something at stake when doing a
project. The activity may call for special effort because failure would jeopardise the organisation or its
goals.

5. Projects are temporary activities. An ad hoc organisation of personnel, material, and facilities is
assembled to accomplish a goal, usually within a scheduled time frame; once the goal is achieved, the
organisation is disbanded or reconfigured to begin work on a new goal.

6. Finally, a project is the process of working to achieve a goal; during the process, projects pass through several distinct phases, called the project life cycle. The tasks, people, organisations and other resources change as the project moves from one phase to the next. The organisation structure and resource expenditure slowly build with each succeeding phase, peak, and then decline as the project nears completion.

Imagine a company which primarily carries out project work for customers. These notes are oriented towards a ‘building
contractor’ type of organisation. However, the same logic can be applied to a mechanical engineering jobbing shop, or indeed
a department receiving requests for work from other areas of a company. The project specification process described here is
in essence generic, although particular examples are mostly oriented towards construction.

Project definition is a process which starts when the customer or investor first conceives the idea of a project. It does not end
until the last piece of information has been filed to describe the project in its finished 'as built condition'. Figure 2.1 shows
some of the elements in the overall process. This section deals with the part of project definition that should take place before
a project is authorised; the part most relevant to setting the project on its proper course and which plays a vital role in helping
to establish any initial contractual commitments.

The sales specification is only the first stage in defining a project. The process is not complete until as-built records have been made.

Figure 2.1 - The process of project definition (with acknowledgements to Dennis Lock, "Project Management", 1992)

Enquiries and subsequent orders for commercial projects generally enter contracting companies through their sales
engineering or marketing organisation, and it is usually from this source that other departments learn of each new enquiry or
firm order. Even when enquiries bypass the sales organisation, sensible company rules should operate to ensure referral to the
marketing director or sales manager, so that every enquiry is ‘entered into the system'. This will ensure that every enquiry
received can be subjected to a formal screening process for assessing its potential project scope, risk and value. The work
involved in preparing a tender can easily constitute a small project in itself, needing significant preliminary engineering
design work plus sales and office effort that must be properly authorised and budgeted. The potential customer will almost
certainly set a date by which all tenders for the project must be submitted, so that time available for preparation is usually
limited. Everything must be planned, co-ordinated and controlled if a tender of adequate quality is to be delivered on time.
Some companies record their screening decision and appropriate follow-up action on a form.

Before any company (or internal department within a company) can even start the enquiry screening process, and certainly
before tender preparation can be authorised, the customer's requirements must be clearly established and understood. The
project must be defined as well as possible right at the start. The contracting company must know for what it is bidding and
what its commitments would be in the event of winning the contract. Similarly the product development department of a
manufacturing company must understand the product requirements, normally identified in conjunction with the marketing
department.

Adequate project definition is equally important for the customer, who must be clear on what he expects to get for his money.
Project definition is also just as important for a company considering an in-house project, where that company (as the
investor in the project) can be regarded as the customer.

2.3 Quality function deployment

Recently, systematic methods for ensuring customer requirements are incorporated into the specification have gained some popularity. The most common is a technique known as Quality Function Deployment (QFD). QFD is a step-by-step planning process that is driven by customer requirements. It is used to produce new products and services, or to enhance existing products and services. Customer requirements are continually refined through the QFD steps until product specifications are identified. QFD is a means to ensure that production focuses on the features that are wanted by the customer.
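A minimal sketch of the central QFD calculation follows (all requirements, weights and scores are invented): weighted customer requirements are scored against candidate technical features in a relationship matrix, and the weighted column totals rank the features by how much they matter to the customer:

# Toy QFD relationship matrix. Rows: customer requirements with importance
# weights (1-5). Columns: technical features. Entries use the conventional
# 9/3/1 scores for strong/medium/weak relationships (0 = none).
requirements = {"low power": 5, "small package": 3, "low cost": 4}
features = ["supply voltage", "process choice", "die area"]

relationship = {
    "low power":     [9, 3, 1],
    "small package": [0, 1, 9],
    "low cost":      [1, 9, 3],
}

# Weighted column totals indicate which technical features to focus on.
priorities = [
    sum(weight * relationship[req][col] for req, weight in requirements.items())
    for col in range(len(features))
]
for feature, score in sorted(zip(features, priorities), key=lambda pair: -pair[1]):
    print(f"{feature}: priority {score}")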

2.4 Financial specification

There may, of course, be overriding legal, ethical or operational reasons for going ahead with a project where the financial considerations are secondary (for example, a programme of structural works which have been declared necessary in order that a company shall comply with statutory health and safety regulations). In most cases, however, when a commercial project requiring significant expenditure is contemplated, the proper approach is for the customer or investor to use one or more of the accepted processes for project financial appraisal. These help to forecast the likely net savings or profits that the investment will yield, put these into perspective against the company's corporate objectives, and assess any risk that the expected benefits might not accrue.

All of this demands proper and extended project definition, the full scope of which would include the evaluation and
assessment of some or all of the following parameters:

· An outline description of the project, with its required performance characteristics quantified in
unambiguous terms.
· The total amount of expenditure expected to carry out the project and bring its resulting product into
use.
· The expected date when the product can be put effectively to its intended use.
· Forecast of any subsequent operating and maintenance costs for the new product.
· A forecast of the costs of financing likely over the appraisal period (bank interest rates, inflationary
trends, international exchange rate trends, and so on, as appropriate).
· Fiscal considerations (taxes or financial incentives expected under national or local government
legislation).
· A schedule which sets out all known items of expenditure (cash outflows) against the calendar.
· A schedule which sets out all expected savings or other contributions to profits (cash inflows) against
the same calendar.
· A net cash flow schedule (which is the difference between the inflow and outflow schedules, again
tabulated against the same calendar).
For short-term commercial projects the financial appraisal may take the form of a simple payback calculation. This sets out expenditure against time (it could be tabulated or drawn as a graph) and also plots all the financial benefits (savings or profits) on the same chart. Supposing that graphs were drawn, the point where the two graphs intersect is the break-even point, where the project can be said to have 'paid for itself'. The time taken to reach this point is called the payback period. Any financial sum listed as a saving or cost item in future years will have its significance distorted through the passage of time (for instance, £100 spent today is more expensive than spending £100 in a year's time, owing to lost interest that the money might have earned for the investor on deposit in the meantime). Such distortions can have a considerable effect on the forecast cash flows of a project lasting more than two or three years if factors are not introduced to correct them, and it is best to use a discounting technique for the financial appraisal of long-term projects.
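To make both calculations concrete (the cash flow figures below are invented), the sketch computes a simple undiscounted payback period and then a discounted appraisal, in which a cash flow C received in year t is worth only C/(1+r)^t today:

# Invented net cash flow schedule (outflows negative), year 0 first.
net_cash_flows = [-100_000, 30_000, 40_000, 40_000, 30_000]

def payback_period(flows):
    """First year in which the cumulative net cash flow turns non-negative."""
    total = 0
    for year, flow in enumerate(flows):
        total += flow
        if total >= 0:
            return year
    return None  # the project never pays for itself

def npv(flows, rate):
    """Net present value: each year's flow discounted by (1 + rate)**year."""
    return sum(flow / (1 + rate) ** year for year, flow in enumerate(flows))

print(payback_period(net_cash_flows))       # break-even is reached in year 3
print(f"{npv(net_cash_flows, 0.08):,.0f}")  # net present value at an 8% discount rate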

Project managers do not, of course, have to be expert in all or any of the techniques of project financial appraisal. It may,
however, help to increase their determination to meet the defined objectives if they realise that these were key factors in an
earlier appraisal decision; factors (time, money, performance) on which the investor and the contractor are both dependent if
the completed project is to be a mutual success.

2.5 The customer's project specification

Initial enquiries from customers can take many different forms. Perhaps a set of plans or drawings will be provided, or a written description of the project objectives. A combination of these two, rough sketches, or even a verbal request are other possibilities. Ensuing communications between the customer and contractor, both written and verbal, may result in subsequent qualifications, changes or additions to the original request. All of these factors, taken together and documented, constitute the 'customer specification', to which all aspects of any tender must relate.

Project scope
Should the quotation be successful and a firm order result, the contractor will have to ensure that the customer's specification is satisfied in every respect. His commitments will not be confined to the technical details but will encompass the fulfilment of all specified commercial conditions. The terms of the order may lay down specific rules governing methods for invoicing and certification of work done for payment. Inspection and standards may be spelled out in the contract, and one would certainly expect to find a well-defined statement of delivery requirements. There may even be a warning that a condition of the resulting contract will provide for penalties to be paid by the contractor should he default on the agreed delivery dates.

Any failure by the contractor to meet his contractual obligations could obviously be very damaging for his reputation. Bad
news travels fast throughout an industry, and the contractor's competitors will, to put it mildly, not attempt to slow the
process. The contractor may suffer financial loss if the programme cannot be met or if he has otherwise miscalculated the
size of the task which he undertook. It is therefore extremely important for the contractor to determine in advance exactly
what the customer expects for the money.

The customer's specification should therefore set out all the requirements in unambiguous terms, so that they are understood
and similarly interpreted by customer and contractor alike. Much of this section deals with the technical requirements of a
specification but, equally important, is the way in which responsibility for the work is to be shared between the contractor,
the customer, and others. In more precise terms, the scope of work required from the contractor, the size of his contribution
to the project, must be made clear.

At its simplest, the scope of work required might be limited to making and delivering a piece of hardware in accordance with
drawings supplied by the customer. At the other extreme, the scope could be defined so that the contractor handles the project
entirely, and is responsible for conceptual design through until the purchaser is able to accept delivery of a fully completed
and proven project (known as a turnkey operation).

Whether the scope of work lies at one of these extremes or the other, there is always a range of ancillary items that have to be
considered. Will the contractor be responsible for any training of the customer's staff and, if so, how much (if any) training is
to be included in the project contract and price? What about commissioning, or support during the first few weeks or months
of the project's working life? What sort of warranty or guarantee is going to be expected? Are any training, operating or
maintenance instructions to be provided? If so, in what language?

Answers to all of these questions must be provided, as part of project definition, before cost estimates, tenders and binding contracts can be considered. Checklists are a useful way of ensuring that nothing important is forgotten.

Use of checklists
Contractors who have amassed a great deal of experience in their particular field of project operation will learn the type of
questions that must be asked of the customer in order to fill in most of the information gaps and arrive at a specification that
is sufficiently complete.

The simplest level of checklist use is seen when a sales engineer takes a customer's order for equipment that is standard, but
which can be ordered with a range of options. The sales engineer will use a pad of pre-printed forms, ticking off the options
that the customer requires. People selling replacement windows with double-glazing use such pads. So do some automobile
salesmen. The forms are convenient and help to ensure that no important details are omitted when the order is taken and
passed back to the factory for action.

2.6 The contractor's specification

When serious consideration of the customer's specification encourages any contractor to prepare a tender, he must obviously put forward technical and commercial proposals for carrying out the work. These proposals will also form a basis for the contractor's own provisional design specification. It is usually necessary to translate the requirements defined by the customer's specification into a form compatible with the contractor's own normal practice, quality standards, technical methods and capabilities. The design specification will provide this link.

Concept options
It is well known that the desired end results of a project can often be achieved by a variety of technical or logistical concepts.
There could be considerable differences between proposals submitted by companies competing for the same order. Once an
order has been won, however, the successful contractor knows the general solution which has been chosen. The defeated
alternative options will usually be relegated to history. But there will still remain a considerable range of possibilities for the
detailed design and make-up of the project within the defined boundaries of the accepted proposal and its resulting contract.

Taking just a tiny element of a technical project as an example, suppose that a plant is being designed in which there is a
requirement to position a lever from time to time by automatic remote control. Any one or combination of a number of drive
mechanisms might be chosen. Possibilities include hydraulic, mechanical, pneumatic, or electromagnetic devices. Each of
these could be subdivided into a further series of techniques. If, for example, an electromagnetic system were chosen this
might be a solenoid, a stepping motor or a servo motor. There are still further possible variations within each of these
methods. The device chosen might have to be flameproof or magnetically shielded, or special in some other respect. Every
time the lever has been moved to a new position, several ways can be imagined for measuring and checking the result.
Electro-optical, electrical, electronic or mechanical methods could be considered. Very probably the data obtained from this
positional measurement would be used in some sort of control or feedback system to correct errors. There would, in fact,
exist a very large number of permutations between all the possible ways of providing drive, measurement and positional
control. The arrangement eventually chosen might depend not so much on the optimum solution (if such exists) as on the
contractor's usual design practice or simply on the personal preference of the engineer responsible.

With the possibility of all these different methods for such a simple operation, the variety of choice could approach infinite
proportions when the permutations are considered for all the design decisions for a major project. It is clear that coupled with
all these different possibilities will be a correspondingly wide range of different costs, since some methods by their very
nature must cost more than others. When a price or budget is quoted for a project, this will obviously depend not only on
economic factors (such as the location of the firm and its cost/ profit structure) but also on the system and detailed design
intentions.

It can be seen that owing to their cost implications, the main technical proposals must be established before serious attempts
at estimating can start. Once these design guidelines have been decided, they must be recorded in a provisional design
specification. If this were not done, there would be a danger that a project could be costed, priced and sold against one set of
design solutions but actually executed using a different, more costly, approach. This danger is very real. It occurs in practice
when the period between submitting a quotation and actually receiving the order exceeds a few months, allowing the original
intentions to be forgotten. It also happens when the engineers carrying out the project work decide not to agree with the
original proposals (sometimes called the 'not invented here' syndrome). Projects in this author's experience have strayed so
far from their original design concept for such reasons that their total costs reached more than double their budgets.

Similar arguments apply concerning the need to associate the production methods actually used in manufacturing projects
with those assumed in the cost estimates and subsequent budgets. It can happen that certain rather bright individuals come up
with suggestions during the proposal stage for cutting corners and saving expected costs - all aimed at securing a lower and
more competitive tender price. Provided that these ideas are recorded with the estimates, all will be well and the cost savings
can be achieved when the project goes ahead. Now imagine what could happen if, for instance, a project proposal were to be
submitted by one branch of the organisation, but that when an order eventually materialised responsibility for carrying out
the work was switched to a production facility at some other location in the organisation, with no record of the production
methods originally envisaged. The cost consequences could prove to be nothing short of disastrous. Unfortunately, it is not
necessary to transfer work between locations for mistakes of this kind to arise. Even the resignation of one production
engineer from a manufacturing company could produce such consequences if his intentions had not been adequately
recorded. The golden rule, once again, is to define and document the project in all essential respects before the estimates are
made and translated into budgets and price.

2.7 Product development project specification

Development programmes aimed at the introduction of additions or changes to a company's product range are perhaps more
prone than most to overspending on cost budgets and timescale. One possible cause of this phenomenon is that chronic
engineer's disease which might be termed 'creeping improvement sickness'. Many will recognise the type of situation
illustrated in the following case study:
Case Study Questions
1. Imagine you are involved in discussions between the chief engineer, marketing and the production manager before the
design engineer (George) is called in to brief him on the new product. You decide that you would like to create a written
project specification, rather than rely on a verbal briefing. Draw up a list of the information you would include in the
specification for the new product development project.

2. How would the team decide an appropriate target cost for the unit? How would the target completion date be set?

3. Imagine now that the engineer has received the design brief, along with the written specification you have developed
above. Which problems from the case study would have been avoided by creating and agreeing the project specification?


2.8 Developing and documenting the project specification

Given the importance of specifying project requirements as accurately as possible, it is appropriate to end this section with some thoughts on the preparation of a specification document.

Although the customer may be clear from the very first about his needs, it is usual for dialogue to take place between the customer and one or more potential contractors before a contract is signed. During this process each contractor can be expected to make various proposals for executing the project that effectively add to or amend the customer's initial enquiry document. In some companies this pre-project phase is aptly known as solution engineering, since the contractor's sales engineers work to produce and recommend an engineering solution which they consider would best suit the customer (and win the order).
Solution engineering may last a few days, several months, or even years. It can be an expensive undertaking (especially when
the resulting tender fails to win the contract). Although it is a nice tidy theory to imagine the contractor's sales engineers
putting their pens to paper and writing the definitive project specification at the end of the solution engineering phase, the
practice is likely to be quite different. An original descriptive text, written fairly early in the proceedings, will undergo
additions and amendments as the solution develops, and there will probably be a pile of drawings, artists' impressions,
flowsheets, schedules (or other documents appropriate to the type of project) which themselves have undergone amendments
and substitutions. A fundamental and obvious requirement when the contract is signed is to be able to identify positively
which of these versions record the actual contract commitment. Remember that the latest issue of any document may not be
the correct issue.

Consider, therefore, the composition of a project specification. The following arrangement provides the basis for
unambiguous definition of project requirements at any stage by reference to the specification serial number and its correct
revision number. The total specification will comprise:

1. Binder or folder The specification for a large project is going to be around for some time and receive considerable
handling. It deserves the protection of an adequate binder or folder. This should carry the project number and title,
prominently displayed for easy identification. The binder should be loose-leaf, to allow for the addition or substitution of
amended pages.

2. Descriptive text The narrative describing the project should be written clearly and concisely. The text should be preceded
by a contents list, and be divided logically into sections, with all pages numbered. Every amendment must be given an
identifying serial number or letter, and the overall amendment (or revision) number for the entire specification must be raised
each time the text is changed. Amended paragraphs or additional pages must be highlighted, for example by placing the
relevant amendment number alongside the change (possibly within an inverted triangle in the manner often used for
engineering drawings).

3. Supporting documents Most project specifications need a number of supporting engineering and other documents that
may be too bulky for binding in the folder. All these documents must be listed and treated as part of the specification (see the
next item).

4. Control schedule of specification documents This vital part of the specification should be bound in the folder along with
the main text (either in the front or at the back). This schedule must list every document which forms part of the complete
specification or which is otherwise relevant to adequate project definition (for example, a standard engineering specification).
Minimum data required for each document are its serial and correct revision numbers. Preferably the title of each document
should also be given. Should any of the associated documents itself be complex, it should have its own in-built control
schedule. It usually helps if a control schedule is given the same serial and amendment numbers as the main document which
it is controlling.
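This numbering discipline lends itself to a simple automated check. The sketch below is purely illustrative - the serial numbers, revisions and the audit routine are invented for the example and form no part of any standard:

# A minimal sketch, assuming the control schedule and the document register
# are held as dictionaries; all serial and revision numbers are invented.
control_schedule = {           # serial number -> revision required by the schedule
    "SPEC-001": "C",           # main descriptive text
    "DWG-1042": "B",           # supporting drawing
    "FS-0007": "A",            # flowsheet
}

documents_on_file = {          # serial number -> revision actually held
    "SPEC-001": "C",
    "DWG-1042": "A",           # out of date
    "FS-0007": "A",
}

def audit(schedule, on_file):
    """Report documents that are missing or not at the scheduled revision."""
    for serial, required in schedule.items():
        held = on_file.get(serial)
        if held is None:
            print(f"{serial}: MISSING (schedule requires revision {required})")
        elif held != required:
            print(f"{serial}: held at revision {held}, schedule requires {required}")

audit(control_schedule, documents_on_file)
# prints: DWG-1042: held at revision A, schedule requires B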

Self Assessment Questions

· The following self assessment questions are intended to reinforce the material presented in chapter 2.
· Once you have completed the questions click the 'submit' button at the bottom of the page.
· To reset given answers and start again click the 'reset' button at the bottom of this page.

Question 1
Describe the four elements of a documented project specification.
Question 2
What factors must be specified in order to appraise a project financially?

Question 3
How should amendments be incorporated into the specification?

Question 4
List the type of information which should be sought from the project "customer" in order to specify a microelectronics
project.

Question 5
List information the "contractor" would add to complete the specification.

Question 6
Describe QFD and the main steps in its implementation. What are the benefits of the approach? How could it be applied to a
Microelectronics example?
Microelectronic Project Management

Chapter 3 - Project Teams

Chapter Information

The chapter contains the following sections:


3.1 Introduction
3.2 The major players
3.3 The project manager
3.4 Senior management
3.5 Client
3.6 Project team
3.7 Selecting and working in teams
3.1 Introduction
Throughout the history of project management, project managers have managed
their projects according to three criteria: cost, schedule, and quality (see Figure
3.1). They treated all other considerations as subordinate.

Ironically, following this approach has not proven too successful for any of the three criteria. Projects in most industries often exceed project completion dates by months, even years, and overrun their budgets by thousands, even millions, of pounds. In addition, each criterion seems to go in different directions. Meeting the schedule often means foregoing budget and quality considerations. Adhering to budget frequently means sacrificing quality or ignoring the schedule. Concentrating on quality means 'blowing' the budget or ignoring the schedule. All this has occurred when project managers have a wide array of project management tools and techniques at their disposal. Many plan their projects by developing work breakdown structures, time estimates, and network diagrams. Many organise their projects by developing organisation charts and forms and allocating resources. Many control their projects by collecting information on progress of the project and developing contingency plans to address anticipated problems. In addition, these tools and techniques have become more sophisticated and automated. Then why the dismal record, at least from the perspective of the three criteria?

Figure 3.1 - Criteria for managing projects

The answer is that schedule, budget, and quality are not enough. One other
important criterion is missing: people.

What many project managers fail to realise is that their handling of people affects
the outcome of their projects. Indeed, their neglect or mismanagement of people
can affect schedule, cost, and quality.

People management, therefore, is as important for managing a project as schedule, budget, and quality. Indeed, it can bridge the gap that often exists between the other three criteria (see Figure 3.1).

Successful project managers are those who recognise the importance of people in
completing their projects. They know that without people no project would exist
in the first place. They also recognise that people play an integral role in
completing the project within budget, on schedule, and with top workmanship.

The people side of project management views people as a critical factor in completing projects and recognises that handling human beings cannot occur in a mechanical, systems-oriented way.

In contrast, the 'hardside' of project management entails planning projects by developing work breakdown structures, network diagrams, and budgets; organising by developing organisation charts and forms as well as allocating resources; and controlling by collecting information on progress of the project and developing contingency plans to address anticipated problems.

The people side is not more important than the hardside and vice versa. Rather,
project managers must recognise the equal importance of both sides. That entails
adding the fourth important criterion, people, to the traditional three: cost,
schedule, and quality.
3.2 The major players
To progress smoothly, project management requires that four key players (shown
in Figure 3.2) participate. These players are the project manager, senior
management, client, and the project team.

3.3 Project manager


As a project manager you play a vital function in the entire project. You are the
one who is responsible for the successful execution of your project. That can only
occur if you take the lead in getting all parties to participate fully in their projects.
You serve as a bridge between all three parties, enabling communication between
senior management, the client, and project team. When any one party fails to
participate in the project, you fail. The communication breakdowns that occur will
lead to obstacles towards making any progress.
Project managers are crucial to a successful project for another obvious reason.
They are the ones responsible for managing the entire project. They are the people
who plan, organize, control, and lead it. If project managers fail to participate
fully in their projects, the likelihood of failure increases.

Some project managers do not participate in a project even though they hold the
title of 'project manager.' They may be uninterested in the project because it was
forced upon them or they assumed the position by circumstance. In response to
this situation, they may fail to plan, organise, control, or lead these projects
adequately. The results are unsuccessful projects, that is, projects that fail to meet
goals and objectives with regard to cost, schedule, and quality.
The project manager of today plays an important central role in ensuring that communication and co-ordination among different participants occur efficiently and effectively. If project managers fail to perform such tasks, disaster is soon forthcoming.

PROJECT MANAGER
Orchestrates successful delivery of the project
Enables interactive communications among senior management, client and project team
Co-ordinates effective and efficient participation
Develops project plans, including estimates, work breakdown structure and schedules
Provides mechanism for monitoring and tracking progress regarding schedule, budget and technical performance
Creates infrastructure for managing the project team

SENIOR MANAGEMENT
Determine project's fate (proceed or stop)
Allocate project support resources including money and manpower
Identify favoured or preferred projects
Continued participation throughout the life cycle
Provide strategic guidance and direction

CLIENT
Pays for project/product
Co-ordinates with project manager for project/product clarification
Uses the product
Approves the product
Dedicates resources to the project including people, time and money

PROJECT TEAM
Supports the project manager
Provides requisite skills and creativity
Operates as a unified team
Works with the clients to obtain requirements, feedback and approvals

Figure 3.2 - Responsibilities of four key players in projects
3.4 Senior management
The project manager needs the participation of senior management because much
power resides with them. Senior management decides whether the project will
proceed. They also determine the extent of support the project will receive
relative to other projects. If they do not view the project as having much
importance, senior management will allocate resources to more 'significant'
endeavours. If they have a favourable view, the opposite will occur.
The importance of senior management's participation becomes very clear when
there is a split over how important a project is. This may give a project a 'stop and
go' mode of operation which can result in poor productivity and low morale. The
problem can become even worse if management withdraws their support.

For example, senior management may waver in support of a project due to changing market conditions. One month they support the project; the next month they give priority attention to another one. People with special skills may be pulled from the original project and then sent to another and returned. As a result the employees start feeling insignificant rather than like contributing members of the company.

Senior management's participation is critical but what is even more important is the style of participation. If they participate in an overbearing, authoritative manner, senior management may constrain the project manager and, consequently, the project. Senior management must do what they do best - manage. They should not tell members of the team how to do their jobs. If senior management want the project to succeed they must allow the project manager and team members the latitude to do the job. That means delegating, something many senior managers fail to do.

Senior management must not, however, adopt a policy of benign neglect. They
must keep abreast of what occurs on the project. The emphasis is on what, not
how. Feedback up and down the chain of command is absolutely essential.
3.5 Client
The client is the reason why the project exists in the first place. Clients may be
internal or external (outside the company). They pay for the project, either at the
beginning or later. Their participation, like that of senior management, is
principally during the start and end of a project.
The client is not always a monolithic entity but may comprise several types of
people. First, there are the people who pay for the product; they typically are the
principal decision-makers. Second, there are the people whose function is the co-ordination with the project manager during most of the project; they are the main
contacts for information and clarification. Third, and finally, there are the people
who will use the product in their operational environment; they see the value of
the product in terms of how it improves their productivity.

Dealing with the client requires sensitivity. What and how much to tell a client
depends on your relationship. The best policy, from your perspective as a project
manager, is to maintain an open and honest relationship. Any hint of dishonesty
or duplicity can result in a breakdown of communications and a cancellation of
agreements.

There is another aspect to the requirement for sensitivity: project managers can find themselves caught in a political crossfire. They can make one person on the client's side happy and inadvertently anger someone else. Project managers must always be aware of this possibility and focus on the key participants (with respect to political power) in the client's organisation.

3.6 Project Team


These people comprise the project manager's team and their skills should
complement one another.

Unity and co-operation among team members are absolutely necessary. Projects
involve a diverse number of specialised skills which must complement one
another in achieving goals. If team members fight with one another, energy is
directed into unproductive endeavours. If the team members fight with the client,
the latter can withhold co-operation, or, worse, cancel the contract. If team
members fight with senior management, communications up and down the chain
of command suffer and so, ultimately, will productivity.

Without the support of any one of these people, the quality of the product will
decline. As the project manager, you play an important role in ensuring that senior
management, client, and project team contribute to your project. If your
relationship with them deteriorates in any way or if their relationships with one
another worsen, the people side of project management can prove very difficult
and damage progress, affecting schedule, budget, and workmanship.

3.7 Selecting and working in teams


Some projects require the setting up of a project team. Often the judicious
selection of team members with the right technical and team-working capabilities
can influence the outcome of the project.
Team-roles according to the Belbin model

R. Meredith Belbin describes in his book Management Teams, Why they Succeed
or Fail (1981, Butterworth Heinemann) a number of team-roles that are played by
team-members when bringing them into a team. Understanding the different roles
helps the members in finding their place during the life-cycle of the project.
Teams need to have several of the different roles present in order to be successful.

A major mistake that is often made is selecting people on their technical competence alone. Research carried out by Belbin showed that teams composed of very clever members performed significantly worse than other teams. This phenomenon has been identified as the Apollo syndrome.

In finding explanations for this the following roles were discovered:

1. Creative team roles: PLANT and RESOURCE INVESTIGATOR


2. Leadership roles: SHAPER and CHAIRMAN
3. Miscellaneous roles: COMPANY WORKER, MONITOR EVALUATOR, COMPLETER-FINISHER and TEAM-WORKER
(PL) Plant (Creative):
Inventive, original, imaginative and unorthodox; solves difficult
problems. Weak in communicating and managing ordinary people. He
gives some guidelines in building the team.
(RI) Resource Investigator (Outgoing):
Extrovert, enthusiastic and communicative; Explores opportunities,
develops contacts. Loses interest once initial enthusiasm is passed
(SH) Shaper (Hard Driving):
Influencer; dynamic, outgoing and highly strung; challenges, pressurises
and finds ways around obstacles. Prone to provocation and short-lived
bursts of temper.
(CH) Chairman (Orchestrates):
Mature, confident and trusting; a good chairman; clarifies goals;
promotes decision making.
Not necessarily the most clever or creative member of the group.
(CO) Company Worker (Organising):
Disciplined, reliable, observant and efficient; turns ideas into practical actions.
Inflexible, slow to respond to new possibilities.
(ME) Monitor Evaluator (Objective):
Sober, strategic and discerning; sees all options; judges accurately. Lacks drive and ability to inspire others.
(CF) Completer Finisher (Meticulous):
Painstaking, conscientious and anxious; searches out errors and omissions; delivers on time.
Inclined to worry unduly. Reluctant to delegate.
(TW) Team Worker (Diplomatic):
Social, mild, perceptive and accommodating; listens, builds and averts
friction. Indecisive in crunch situations.
Figure 3.3 Team roles expanded
The lessons drawn from Belbin's work can be summarised thus:
1. People can contribute in two ways: their functional role and their team-role.
2. An optimal balance between these two roles is needed for each member.
3. The effectiveness of a team depends on the ability of the members to recognise and adjust themselves to the relative strengths within the team, in both expertise and specific team-roles.
4. Personal qualities make members succeed or fail in certain roles.
5. Technical resources within the team are employed to best advantage only when there is a right balance of team-roles.
A team role as defined in Belbin's book is a pattern of behaviour characteristic of the way in
which one team member interacts with another where performance serves to facilitate the
progress of the team as a whole. The value of the team-role theory lies in enabling an
individual or team to benefit from self-knowledge and adjust according to the demands being
made by the external situation.

As individuals differ greatly in personality and in behaviour, so will the team role profiles of
individuals vary. The research that Dr. Belbin conducted showed that the natural variations in
different team roles of individuals can give strength to a team if they occur in the right
combination and are used in an appropriate setting. A need thus developed for a way to make
these research findings available and useful.

The value for project management is an understanding of the necessary ingredients for a
successful team. The book has a self-perception questionnaire that gives someone’s roles as
perceived by himself. It can be used by teams as a team-building exercise in which team-
members get to know and understand each other.

Belbin Self Perception Inventory


An early questionnaire-based tool was developed and became known as the "Self
Perception Inventory". This allows team members to respond to questions posed
concerning their preferred mode of working in teams. From these responses, an
indication of a team member's role is given.

Knowledge of preferred team function can be useful both for individuals and the
project manager on many levels.

The website indicated above shows how a "team map" can be constructed to
visualise the team in terms of its members' preferred roles. This can help to
highlight potentially problematic situations such as a team where most members
tend towards one role. One can imagine the difficulties if for example, the team is
made up entirely of the "plant" role, where plenty of ideas might be conceived,
but the team has great difficulty in following these through. The Belbin theory
suggests a balanced team, with strengths in most of the roles would be most
effective (at least in terms of team dynamics).

Knowledge of one's own preferred role(s) can also be useful in aiding team
dynamics. If for example, the team seems to lack a natural leader, or perhaps is
struggling with various conflicts, individual members "can take up the missing
role" in an attempt to move the team forward.

For example, if there is a lack of "team-worker" (diplomatic, mild, builds and


averts friction), then when conflict arises, situations may be more difficult to
resolve. A team member who has "team-worker" as perhaps second or even third
preferred role may be able to 'deliberately' swap into this role.
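To make the team-map idea concrete, here is a hypothetical sketch; the member names, the scores and the two-roles-per-person cut-off are all invented, and Belbin's actual self-perception scoring differs in detail:

# Building a simple Belbin-style "team map" from role scores and flagging
# roles that nobody in the team naturally covers.
ROLES = ["PL", "RI", "SH", "CH", "CO", "ME", "CF", "TW"]

team = {                       # member -> {role: self-perception score}
    "Ann":   {"PL": 16, "RI": 4, "SH": 3, "CH": 5, "CO": 6, "ME": 9, "CF": 2, "TW": 5},
    "Bob":   {"PL": 14, "RI": 6, "SH": 4, "CH": 3, "CO": 5, "ME": 8, "CF": 4, "TW": 6},
    "Carol": {"PL": 3, "RI": 12, "SH": 10, "CH": 4, "CO": 7, "ME": 3, "CF": 5, "TW": 6},
}

def preferred_roles(scores, top_n=2):
    """Return a member's top_n strongest team-roles."""
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

team_map = {member: preferred_roles(scores) for member, scores in team.items()}
covered = {role for roles in team_map.values() for role in roles}

print(team_map)                                    # who naturally plays what
print("Uncovered roles:", sorted(set(ROLES) - covered))
# Here PL and ME dominate while CF, CH, CO and TW go uncovered - exactly the
# kind of imbalance a team map is meant to expose.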

Self Assessment Questions 1 - 3

Place your answers in an email or an attached Word file (using the button below) and please
indicate whether or not you would like a response.

Question 1. - Outline the main roles of the project manager.

Question 2. - Why is it important to ensure close communication between senior management and the project team?

Question 3. - Describe the Belbin team roles, and how the theory could be applied in the
Microelectronics industry.
Introduction to Programmable Logic Components

Definitions of Relevant Technology


• PLA: a small FPD that contains two levels of logic, an AND-plane and an OR-plane, where
both levels are programmable.

• PAL: a small FPD that contains two levels of logic, an AND-plane and an OR-plane, the
AND-plane is programmable while the OR- plane is fixed.

• CPLD: a more complex PLD that consists of an arrangement of multiple SPLD-like blocks on a single chip.

• FPGA: Field Programmable Gate Array is an FPD featuring a general structure that
allows very high logic capacity. Whereas CPLDs feature logic resources with a wide
number of inputs, FPGAs offer more narrow logic resources.
– FPGAs also offer a higher ratio of flip-flops to logic resources than CPLDs.

• Interconnect: the wiring resources in an FPD.

• Programmable Switch: a user programmable switch that can connect a logic element to an
interconnect wire, or one interconnect wire to another.

• Logic Block: a relatively small circuit block that is replicated in an array in an FPD. When a circuit is implemented in an FPD, it is first decomposed into smaller sub-circuits that can each be mapped into a logic block.

• Logic Capacity: The amount of digital logic that can be mapped into a single FPD.

Evolution of Programmable Devices

• The first type of programmable device was the programmable read-only memory PROM. PROM
is a one-time programmable device that consists of an array of read-only cells.

• Two basic versions of PROMs are available: mask-programmable, which can be programmed only by the manufacturer, and field-programmable, which can be programmed by the end-user.

• Superior speed performance can be obtained with a mask-programmable chip because connections within the device can be hardwired during manufacture.

• In field-programmable devices, connections always involve some form of programmable switch that is inherently slower than a hardwired connection.

However, field-programmable devices offer advantages that often outweigh their speed-performance shortcomings:
• Field-programmable chips are less expensive.
• They can be programmed in a very short time.
Two field-programmable variants of the PROM, the EPROM and the EEPROM, can both be erased and reprogrammed many times.

• PROMs are more suitable for the implementation of computer memory.

• Programmable Logic Devices PLDs are designed specifically for implementing logic circuits.

• A PLD typically comprises an array of AND gates connected to an array of OR gates.


There are basically two types of PLDs:
• PAL: Programmable Array Logic is the most basic version of a PLD.
PAL consists of a programmable AND-plane followed by a fixed OR-plane. The output of the
OR gate can be optionally registered by a flip-flop in most chips.
•PLA: Programmable Logic Array.
In PLA both AND-plane and OR-plane are programmable.

PLDs can only implement small logic circuits that can be represented with a modest number of
product terms.

Programmable Logic Elements and ASICs

•Programmable logic elements fall under the category of Application Specific Integrated Circuits
ASICs.

•A full custom IC includes logic cells that are customized and all mask layers that are customized.
A microprocessor is an example of a full-custom IC.

Full-custom ICs are the most expensive to manufacture and to design. The time needed to manufacture an IC is called the manufacturing lead time and is typically eight weeks for a full-custom IC. Usually these ICs are application specific and are therefore called Application Specific Integrated Circuits (ASICs).

Semicustom ASICs are of two types: standard-cell-based and gate-array-based ASICs.

Programmable ASICs are those that have predesigned logic cells. There are two types of programmable ASICs: the programmable logic device (PLD) and the field programmable gate array (FPGA).

Simple designs using Programmable logic

• A ROM is a memory that can generate all the possible minterms of its inputs. Diodes are used to sum selected minterms into a sum-of-products expression.
Example: Consider a ROM that has 3 inputs and must satisfy the following function:

Y = Σ(m0, m1, m2, m3, m4)

Solution: We first need to find the sum of product terms.
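As a hedged worked example, the function can be modelled directly as a look-up table (the simplified expression in the closing comment is our own check on the result, not part of the notes):

# Expanding Y = Σ(m0..m4) for a 3-input ROM: each minterm index is one
# address of the ROM's full input decoder, and Y is the bit stored there.
MINTERMS = {0, 1, 2, 3, 4}

def Y(a, b, c):
    """Treat (a, b, c) as a 3-bit address with a as the most significant bit."""
    address = (a << 2) | (b << 1) | c
    return 1 if address in MINTERMS else 0

for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            print(a, b, c, "->", Y(a, b, c))
# Addresses 000 to 100 give 1 and 101 to 111 give 0; simplifying the five
# minterms by hand gives Y = /A + /B /C.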

Example: Draw the schematic diagram of a ROM implementation of the following functions:

W=/A /C + A /C /D + A /B D
X=A /C + BD + AB + /A /B C /D
Y=/C /D + /A B + B /C + A /B D
Z=/C D + A B /C + /A B /D + A /B C /D
Example: Consider the following set of logic equations:

W=A B /C + A /B C + /A /B /C
X=A /B + /A C
Y=A B + /A /B + A C
Z=A /B /C + B C

What size of PAL is needed to implement these functions? Draw the schematic diagram.
Notice that there are 4 outputs, the maximum number of product terms per output is three, and
each product term must be able to select arguments from a domain of three input variables.
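That sizing argument can be checked mechanically. The sketch below is illustrative only - it is not a vendor fitting tool; the "/X" notation for complemented literals follows the notes:

# Sizing a PAL from sum-of-products equations by counting the distinct
# inputs, the outputs, and the worst-case product terms per output.
equations = {                  # output -> list of product terms
    "W": ["A B /C", "A /B C", "/A /B /C"],
    "X": ["A /B", "/A C"],
    "Y": ["A B", "/A /B", "A C"],
    "Z": ["A /B /C", "B C"],
}

inputs = set()
for terms in equations.values():
    for term in terms:
        inputs.update(literal.lstrip("/") for literal in term.split())

max_terms = max(len(terms) for terms in equations.values())
print(f"{len(inputs)} inputs, {len(equations)} outputs, "
      f"up to {max_terms} product terms per output")
# -> 3 inputs, 4 outputs, up to 3 product terms per output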
Field Programmable Gate Array FPGA

• An FPGA consists of an array of uncommitted elements that can be interconnected in a general way.

• Like a PAL, the interconnections between the elements are user-programmable.

• FPGAs were introduced by Xilinx in 1985. Since then, many different FPGAs have been developed by a number of companies: Actel, Altera, Synopsys and others.

• An FPGA consists of a two-dimensional array of logic blocks that can be connected by general interconnection resources.

• The interconnect comprises segments of wire, where the segments may be of various lengths.
Within the interconnect are programmable switches that serve to connect the logic blocks to
the wire segments, or one wire segment to another.

• Logic circuits are implemented in FPGA by partitioning the logic into individual logic blocks
and then interconnecting the blocks as required via the switches.

Logic Blocks

• The structure and contents of a logic block are called the architecture.
Logic block architecture can be designed in many different ways.
Most logic blocks also contain some type of flip-flops to aid in the
implementation of sequential circuits.

• Each vendor has its own design of logic blocks.


Interconnect Resources

• The structure and content of the interconnect in an FPGA is called its routing architecture.

• The routing architecture consists of both wire segments and programmable switches; the switches may be implemented with static RAM cells, anti-fuses, or EPROM and EEPROM transistors.

• There exist many different ways to design the structure of the routing architecture.
Applications of FPGAs

• FPGAs can be used in almost all of the applications that currently use Mask-
Programmable Gate Arrays, PLDs and Small Scale Integration (SSI) logic chips.
The following are a few categories of such designs:

• Application-Specific Integrated Circuits: FPGAs are particularly suited for the implementation of ASICs.
• Prototyping: the low cost of implementation and the short time needed to physically realize a design are two important advantages here.
• FPGA-based computer engines.
• On-site reconfigurable hardware.

Implementation Process
• The starting point is the design entry of the circuit to be implemented. This step
typically involves:

• drawing a schematic
• entering HDL code
• specifying Boolean expressions
• specifying state diagrams

• Regardless of the initial design entry method, the circuit is translated into
a standard form such as Boolean expressions.

• The Boolean expressions are then processed by a logic optimizer tool, which manipulates the expressions with the goal of optimizing the area and/or speed of the final circuit.

• The optimized Boolean expressions are then transformed into a circuit of FPGA logic blocks.
This is done by a technology mapping program.

• The mapper may attempt to optimize the total number of blocks required (area optimization).

• Alternatively, the objective may be to minimize the number of stages of logic blocks in time-
critical paths (delay optimization).

• The next step is to decide on where to place each block in the FPGA’s array.

• Typical placement algorithms attempt to minimize the total length of interconnect required for the
resulting placement.

• The next step is the routing. The routing assigns the FPGA’s wire segments and chooses
programmable switches to establish the required connections among the logic blocks. It is often
necessary to do routing such that propagation delays in time-critical connections are minimized.

• The final step is the programming. The CAD system's output is fed to a programming unit which configures the final FPGA chip.
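The steps above chain together naturally. The skeleton below is a schematic sketch only; every function is an invented placeholder standing in for a real CAD tool:

# Placeholder pipeline mirroring the FPGA implementation flow described
# above; none of these stubs performs real optimisation, placement or routing.
def optimize(exprs):              # logic optimisation (area and/or speed)
    return exprs

def tech_map(exprs):              # technology mapping into logic blocks
    return [f"LB{i}" for i, _ in enumerate(exprs)]

def place(blocks):                # placement: pick a location for each block
    return {block: (i, 0) for i, block in enumerate(blocks)}

def route(placement):             # routing: assign wire segments and switches
    return [("LB0", "LB1", "segment3")] if len(placement) > 1 else []

def program(placement, routing):  # programming unit configures the chip
    return {"placement": placement, "routing": routing}

design = ["A /B + C", "A B"]      # design entry (schematic, HDL, equations)
blocks = tech_map(optimize(design))
placement = place(blocks)
configuration = program(placement, route(placement))
print(configuration)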

Programmable Technologies

• Programming elements are used to implement the programmable connections among the routing resources and logic elements.

• Programmable Read Only Memory (PROM) devices are usually thought of as memory elements.
• However, the PROM can be thought of functionally as a fixed AND array followed by a programmable OR array.

ROM
• Each bit position in the memory consists of a transistor switch, bipolar
or MOS connected in series with a small fuse.
Fuse

• Fuses are made of nichrome, titanium tungsten or polycrystalline silicon.

• The fuses break open when the current flowing through them exceeds a certain limit.

• As the fuse starts to heat up, its resistance falls, which rapidly increases the current flow, causing further rapid heating until the fuse is vaporised to form an open circuit.

Antifuse

• An antifuse is the opposite of a regular fuse—an antifuse is normally an open circuit until you force a programming current through it (about 5 mA).

• In a poly–diffusion antifuse the high current density causes a large power dissipation
in a small area, which melts a thin insulating dielectric between polysilicon and diffusion
electrodes and forms a thin (about 20 nm in diameter), permanent, and resistive silicon link .

• The programming process also drives dopant atoms from the poly and diffusion
electrodes into the link, and the final level of doping determines the resistance
value of the link.
Metal-Metal Antifuse

• There are two advantages of a metal–metal antifuse over a poly–diffusion antifuse.

– The first is that connections to a metal–metal antifuse are direct to metal—the wiring layers. Connections from a poly–diffusion antifuse to the wiring layers require extra space and create additional parasitic capacitance.

– The second advantage is that the direct connection to the low-resistance metal
layers makes it easier to use larger programming currents to reduce
the antifuse resistance.

• The long-term reliability of antifuses is an important issue since there is a tendency for the antifuse properties to change over time.
• There have been some problems in this area, but as a result we now know an enormous amount about this failure mechanism.

• There are many failure mechanisms in ICs—electromigration is a classic example—and engineers have learned to deal with these problems.

Static RAM

• An example of static RAM ( SRAM ) programming technology is shown below.

• This Xilinx SRAM configuration cell is constructed from two cross-coupled inverters
and uses a standard CMOS process. The configuration cell drives the gates of other
transistors on the chip—either turning pass transistors or transmission gates on to make a
connection or off to break a connection.

• The advantages of SRAM programming technology are that designers can reuse chips during prototyping and a system can be manufactured using in-system programming (ISP).

• This programming technology is also useful for upgrades—a customer can be sent a new
configuration file to reprogram a chip, not a new chip.

• Designers can also update or change a system on the fly in reconfigurable hardware .

• The disadvantage of using SRAM programming technology is that you need to keep
power supplied to the programmable ASIC (at a low level) for the volatile SRAM
to retain the connection information.

• Alternatively you can load the configuration data from a permanently programmed memory
(typically a programmable read-only memory or PROM ) every time you turn the system on.

• The total size of an SRAM configuration cell plus the transistor switch that the SRAM
cell drives is also larger than the programming devices used in the antifuse technologies.

EPROM

EPROM (erasable programmable read-only memory) is programmable read-only memory (programmable ROM) that can be erased and re-used.

Erasure is caused by shining an intense ultraviolet light through a window that is designed
into the memory chip. (Although ordinary room lighting does not contain enough ultraviolet light
to cause erasure, bright sunlight can cause erasure. For this reason, the window is usually
covered with a label when not installed in the computer.)

• Altera MAX 5000 EPLDs and Xilinx EPLDs both use UV-erasable EPROM cells as
their programming technology.

• Altera's EPROM cell is shown below.

• The EPROM cell is almost as small as an antifuse. An EPROM transistor looks like a normal
MOS transistor except it has a second, floating, gate.

• Applying a programming voltage VPP (usually greater than 12 V) to the drain of the n-channel EPROM transistor programs the EPROM cell.

• A high electric field causes electrons flowing toward the drain to move so fast they “jump” across
the insulating gate oxide where they are trapped on the bottom, floating, gate.

• We say these energetic electrons are hot and the effect is known as hot-electron injection
or avalanche injection . EPROM technology is sometimes called floating-gate avalanche
MOS (FAMOS ).
EEPROM

EEPROM (electrically erasable programmable read-only memory) is user-modifiable read-only memory (ROM) that can be erased and reprogrammed (written to) repeatedly through the application of a higher than normal electrical voltage.

Unlike EPROM chips, EEPROMs do not need to be removed from the computer to
be modified. However, an EEPROM chip has to be erased and reprogrammed in its
entirety, not selectively.

Register Transfer and Data Paths

• In digital systems there exist two types of modules:

• Datapath: performs data-processing operations

• Control unit: determines the sequence of those operations


• Datapaths are defined by their registers and the operations that are performed on
data stored in them.

• Examples of register operations are shift, count, clear and load.

• Registers are considered the basic elements of any digital system.

• The movement of the data stored in registers and the processing performed
on them is referred to as register transfer operations.

• Register transfer operations of digital systems are specified by the following three basic components:

1. The set of registers in the system,

2. The operations that are performed on the data stored in the registers, and

3. The control that oversees the sequence of operations in the system.

• Elementary operations performed on data in registers are called microoperations.

• The control unit provides signals that sequence the microoperations in a prescribed manner.
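A behavioural sketch of these microoperations may help (illustrative only: the 4-bit width and the scripted sequence are invented, and in hardware the control unit, not software, issues the sequencing signals):

# A register supporting the microoperations named above: load, shift,
# count (increment) and clear, on an n-bit value.
class Register:
    def __init__(self, width=4):
        self.mask = (1 << width) - 1
        self.value = 0

    def load(self, data):          # parallel load
        self.value = data & self.mask

    def shift_left(self):          # logical shift left, dropping the MSB
        self.value = (self.value << 1) & self.mask

    def count(self):               # increment, wrapping at 2**width
        self.value = (self.value + 1) & self.mask

    def clear(self):
        self.value = 0

r = Register()
r.load(0b0011)   # microoperation sequence scripted in place of a control unit
r.shift_left()
r.count()
print(bin(r.value))   # 0b111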
Basic Technology Requirements

Switching
An input must be able to affect an output conditionally
At least one of the functions AND or OR

Inversion
It must be possible to invert (NOT) the logic state

Amplification
Nothing is 100% efficient so some gain must be provided

Instantiation
It must be possible to make many copies of the basic elements and
connect them selectively
The output of one gate is connectable to the inputs of others

Any new technology must fulfil these requirements.
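A quick illustration of why these requirements suffice (our own sketch, not from the notes): a single 2-input NAND provides conditional switching and inversion, and instantiating copies of it rebuilds NOT, AND and OR:

# One NAND meets the requirements above; wiring copies of it together
# recovers the other basic gates.
def nand(a, b):
    return 0 if (a and b) else 1

def not_(a):
    return nand(a, a)

def and_(a, b):
    return not_(nand(a, b))

def or_(a, b):
    return nand(not_(a), not_(b))

for a in (0, 1):
    for b in (0, 1):
        assert and_(a, b) == (a & b)
        assert or_(a, b) == (a | b)
print("NOT, AND and OR rebuilt from NAND alone")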

Universal Logic Elements


The lectures in this course will concentrate mostly on gate-level
CMOS design.
This is the technology most used for:
Microprocessors (Pentiums et alia)
ASICs – Application Specific Integrated Circuits
ASSPs – Application Specific Standard Products
General purpose logic chips

Other implementation technologies are used in some applications.

Specifically:

The Xilinx FPGAs used in the laboratories use RAM look-up tables.

Other Programmable Logic Devices (PLDs)

Memory devices use a variety of specialised processes


Programmable Logic Devices
PLDs do not really feature in this course. This is just a glimpse …

The structure allows a set of sum-of-product terms to be evaluated.


(This is what a Karnaugh map evaluates to.)
ASICs and Custom Chips

The VLSI part of this course is concerned primarily with the way gates are integrated on to large devices. Of course FPGAs, PLDs, memories etc. are also made in very similar ways.
The integrated circuits most commonly encountered in industry are Application Specific Integrated Circuits (ASICs). These are devices aimed at integrating as many desired functions as possible onto one device.

Integrated circuits are now sufficiently large (in capacity) that in many cases it is possible to produce a complete System-on-Chip (SoC).

Design methodologies will include some or all of:


Schematic Design
Logic Synthesis
Hardware Description Languages (HDLs)

and will combine locally produced and bought-in designs.


Architecture of FPGAs and CPLDs: A Tutorial

Abstract

This paper provides a tutorial survey of architectures of commercially available high-capacity


field-programmable devices (FPDs). We first define the relevant terminology in the field and then

describe the recent evolution of FPDs. The three main categories of FPDs are delineated: Simple

PLDs (SPLDs), Complex PLDs (CPLDs) and Field-Programmable Gate Arrays (FPGAs). We
then give details of the architectures of all of the most important commercially available chips,

and give examples of applications of each type of device.


1 Introduction to High-Capacity FPDs
Prompted by the development of new types of sophisticated field-programmable devices (FPDs),
the process of designing digital hardware has changed dramatically over the past few years.

Unlike previous generations of technology, in which board-level designs included large numbers

of SSI chips containing basic gates, virtually every digital design produced today consists mostly

of high-density devices. This applies not only to custom devices like processors and memory, but

also for logic circuits such as state machine controllers, counters, registers, and decoders. When

such circuits are destined for high-volume systems they have been integrated into high-density

gate arrays. However, gate array NRE costs often are too expensive and gate arrays take too long
to manufacture to be viable for prototyping or other low-volume scenarios. For these reasons,

most prototypes, and also many production designs are now built using FPDs. The most compel-

ling advantages of FPDs are instant manufacturing turnaround, low start-up costs, low financial

risk and (since programming is done by the end user) ease of design changes.

The market for FPDs has grown dramatically over the past decade to the point where there is
now a wide assortment of devices to choose from. A designer today faces a daunting task to

research the different types of chips, understand what they can best be used for, choose a particu-

lar manufacturer's product, learn the intricacies of vendor-specific software and then design the

hardware. Confusion for designers is exacerbated by not only the sheer number of FPDs avail-

able, but also by the complexity of the more sophisticated devices. The purpose of this paper is to

provide an overview of the architecture of the various types of FPDs. The emphasis is on devices

with relatively high logic capacity; all of the most important commercial products are discussed.

Before proceeding, we provide definitions of the terminology in this field. This is necessary

because the technical jargon has become somewhat inconsistent over the past few years as compa-

nies have attempted to compare and contrast their products in literature.


1.1 Definitions of Relevant Terminology

The most important terminology used in this paper is defined below.

• Field-Programmable Device (FPD) — a general term that refers to any type of integrated cir-
cuit used for implementing digital hardware, where the chip can be configured by the end user
to realize different designs. Programming of such a device often involves placing the chip into
a special programming unit, but some chips can also be configured “in-system”. Another name
for FPDs is programmable logic devices (PLDs); although PLDs encompass the same types of
chips as FPDs, we prefer the term FPD because historically the word PLD has referred to rela-
tively simple types of devices.

• PLA — a Programmable Logic Array (PLA) is a relatively small FPD that contains two levels
of logic, an AND-plane and an OR-plane, where both levels are programmable (note: although
PLA structures are sometimes embedded into full-custom chips, we refer here only to those
PLAs that are provided as separate integrated circuits and are user-programmable).

• PAL* — a Programmable Array Logic (PAL) is a relatively small FPD that has a programma-
ble AND-plane followed by a fixed OR-plane.

• SPLD — refers to any type of Simple PLD, usually either a PLA or PAL

• CPLD — a more Complex PLD that consists of an arrangement of multiple SPLD-like blocks
on a single chip. Alternative names (that will not be used in this paper) sometimes adopted for
this style of chip are Enhanced PLD (EPLD), Super PAL, Mega PAL, and others.

• FPGA — a Field-Programmable Gate Array is an FPD featuring a general structure that allows
very high logic capacity. Whereas CPLDs feature logic resources with a wide number of inputs
(AND planes), FPGAs offer more narrow logic resources. FPGAs also offer a higher ratio of
flip-flops to logic resources than do CPLDs.

• HCPLDs — high-capacity PLDs: a single acronym that refers to both CPLDs and FPGAs. This
term has been coined in trade literature for providing an easy way to refer to both types of
devices. We do not use this term in the paper.

* PAL is a trademark of Advanced Micro Devices.


• Interconnect — the wiring resources in an FPD.

• Programmable Switch — a user-programmable switch that can connect a logic element to an


interconnect wire, or one interconnect wire to another

• Logic Block — a relatively small circuit block that is replicated in an array in an FPD. When a
circuit is implemented in an FPD, it is first decomposed into smaller sub-circuits that can each
be mapped into a logic block. The term logic block is mostly used in the context of FPGAs, but
it could also refer to a block of circuitry in a CPLD.

• Logic Capacity — the amount of digital logic that can be mapped into a single FPD. This is
usually measured in units of “equivalent number of gates in a traditional gate array”. In other
words, the capacity of an FPD is measured by the size of gate array that it is comparable to. In
simpler terms, logic capacity can be thought of as “number of 2-input NAND gates”.

• Logic Density — the amount of logic per unit area in an FPD.

• Speed-Performance — measures the maximum operable speed of a circuit when implemented


in an FPD. For combinational circuits, it is set by the longest delay through any path, and for
sequential circuits it is the maximum clock frequency for which the circuit functions properly.

In the remainder of this section, to provide insight into FPD development the evolution of

FPDs over the past two decades is described. Additional background information is also included
on the semiconductor technologies used in the manufacture of FPDs.

1.2 Evolution of Programmable Logic Devices

The first type of user-programmable chip that could implement logic circuits was the Programma-
ble Read-Only Memory (PROM), in which address lines can be used as logic circuit inputs and

data lines as outputs. Logic functions, however, rarely require more than a few product terms, and

a PROM contains a full decoder for its address inputs. PROMs are thus an inefficient architecture
for realizing logic circuits, and so are rarely used in practice for that purpose. The first device

developed later specifically for implementing logic circuits was the Field-Programmable Logic

Array (FPLA), or simply PLA for short.

Figure 1 - Structure of a PAL (inputs and flip-flop feedbacks drive a programmable AND-plane whose product terms feed fixed OR-gates and D flip-flops on the outputs).

A PLA consists of two levels of logic gates: a programmable “wired” AND-plane followed by a programmable “wired” OR-plane. A PLA is structured

so that any of its inputs (or their complements) can be AND’ed together in the AND-plane; each

AND-plane output can thus correspond to any product term of the inputs. Similarly, each OR-

plane output can be configured to produce the logical sum of any of the AND-plane outputs. With
this structure, PLAs are well-suited for implementing logic functions in sum-of-products form.

They are also quite versatile, since both the AND terms and OR terms can have many inputs (this

feature is often referred to as wide AND and OR gates).
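As a behavioural sketch of this two-plane structure (our own illustration; the "fuse maps" below are invented example data, not any real device's programming):

# A programmable AND-plane feeding a programmable OR-plane, as in a PLA.
# In each AND-plane row: 1 = true literal, 0 = complemented literal;
# inputs absent from a row are not connected to that product term.
and_plane = [
    {"A": 1, "B": 0},          # term0 = A . /B
    {"B": 1, "C": 1},          # term1 = B . C
]

or_plane = {"F0": [0], "F1": [0, 1]}   # each output ORs chosen product terms

def evaluate(values):
    terms = [all(values[name] == wanted for name, wanted in row.items())
             for row in and_plane]
    return {out: int(any(terms[t] for t in term_ids))
            for out, term_ids in or_plane.items()}

print(evaluate({"A": 1, "B": 0, "C": 1}))   # {'F0': 1, 'F1': 1}
print(evaluate({"A": 0, "B": 1, "C": 1}))   # {'F0': 0, 'F1': 1}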

When PLAs were introduced in the early 1970s by Philips, their main drawbacks were that

they were expensive to manufacture and offered somewhat poor speed-performance. Both disad-

vantages were due to the two levels of configurable logic, because programmable logic planes
were difficult to manufacture and introduced significant propagation delays. To overcome these

weaknesses, Programmable Array Logic (PAL) devices were developed. As Figure 1 illustrates,
PALs feature only a single level of programmability, consisting of a programmable “wired” AND-

plane that feeds fixed OR-gates. To compensate for lack of generality incurred because the OR-
plane is fixed, several variants of PALs are produced, with different numbers of inputs and out-

puts, and various sizes of OR-gates. PALs usually contain flip-flops connected to the OR-gate out-

puts so that sequential circuits can be realized. PAL devices are important because when
introduced they had a profound effect on digital hardware design, and also they are the basis for

some of the newer, more sophisticated architectures that will be described shortly. Variants of the

basic PAL architecture are featured in several other products known by different acronyms. All
small PLDs, including PLAs, PALs, and PAL-like devices are grouped into a single category

called Simple PLDs (SPLDs), whose most important characteristics are low cost and very high

pin-to-pin speed-performance.

As technology has advanced, it has become possible to produce devices with higher capacity

than SPLDs. The difficulty with increasing capacity of a strict SPLD architecture is that the struc-

ture of the programmable logic-planes grow too quickly in size as the number of inputs is
increased. The only feasible way to provide large capacity devices based on SPLD architectures is

then to integrate multiple SPLDs onto a single chip and provide interconnect to programmably

connect the SPLD blocks together. Many commercial FPD products exist on the market today
with this basic structure, and are collectively referred to as Complex PLDs (CPLDs).

CPLDs were pioneered by Altera, first in their family of chips called Classic EPLDs, and then

in three additional series, called MAX 5000, MAX 7000 and MAX 9000. Because of a rapidly
growing market for large FPDs, other manufacturers developed devices in the CPLD category and

there are now many choices available. All of the most important commercial products will be

described in Section 2. CPLDs provide logic capacity up to the equivalent of about 50 typical

SPLD devices, but it is somewhat difficult to extend these architectures to higher densities. To
build FPDs with very high logic capacity, a different approach is needed.
The highest capacity general purpose logic chips available today are the traditional gate arrays

sometimes referred to as Mask-Programmable Gate Arrays (MPGAs). MPGAs consist of an array

of pre-fabricated transistors that can be customized into the user’s logic circuit by connecting the
transistors with custom wires. Customization is performed during chip fabrication by specifying

the metal interconnect, and this means that in order for a user to employ an MPGA a large setup

cost is involved and manufacturing time is long. Although MPGAs are clearly not FPDs, they are
mentioned here because they motivated the design of the user-programmable equivalent: Field-

Programmable Gate Arrays (FPGAs). Like MPGAs, FPGAs comprise an array of uncommitted

circuit elements, called logic blocks, and interconnect resources, but FPGA configuration is per-

formed through programming by the end user. An illustration of a typical FPGA architecture
appears in Figure 2. As the only type of FPD that supports very high logic capacity, FPGAs have

been responsible for a major shift in the way digital circuits are designed.

Figure 2 - Structure of an FPGA (a two-dimensional array of logic blocks surrounded by I/O blocks, linked by general interconnection resources).


Figure 3 summarizes the categories of FPDs by listing the logic capacities available in each of

the three categories. In the figure, “equivalent gates” refers loosely to “number of 2-input NAND

gates”. The chart serves as a guide for selecting a specific device for a given application, depend-
ing on the logic capacity needed. However, as we will discuss shortly, each type of FPD is inher-

ently better suited for some applications than for others. It should also be mentioned that there

exist other special-purpose devices optimized for specific applications (e.g. state machines, ana-
log gate arrays, large interconnection problems). However, since use of such devices is limited

they will not be described here. The next sub-section discusses the methods used to implement the

user-programmable switches that are the key to the user-customization of FPDs.

1.3 User-Programmable Switch Technologies

The first type of user-programmable switch developed was the fuse used in PLAs. Although fuses

are still used in some smaller devices, we will not discuss them here because they are quickly

being replaced by newer technology. For higher density devices, where CMOS dominates the IC

industry, different approaches to implementing programmable switches have been developed. For
CPLDs the main switch technologies (in commercial products) are floating gate transistors like

those used in EPROM and EEPROM, and for FPGAs they are SRAM and antifuse. Each of these is briefly discussed below.

Figure 3 - FPD Categories by Logic Capacity (equivalent gates from about 200 up to 40,000 across SPLDs, CPLDs and FPGAs; devices marked on the chart: Altera FLEX 10000 and AT&T ORCA 2 at about 40,000 gates; Altera MAX 9000 at about 12,000; Altera MAX 7000, AMD Mach, Lattice (p)LSI, Cypress FLASH370 and Xilinx XC9500 at about 5,000).

An EEPROM or EPROM transistor is used as a programmable switch for CPLDs (and also

for many SPLDs) by placing the transistor between two wires in a way that facilitates implemen-
tation of wired-AND functions. This is illustrated in Figure 4, which shows EPROM transistors as

they might be connected in an AND-plane of a CPLD. An input to the AND-plane can drive a

product wire to logic level ‘0’ through an EPROM transistor, if that input is part of the corre-
sponding product term. For inputs that are not involved for a product term, the appropriate

EPROM transistors are programmed to be permanently turned off. A diagram for an EEPROM-
based device would look similar.
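A small behavioural sketch of that wired-AND product wire (our own illustration: in a real AND-plane both true and complemented versions of each input are available as lines, and the set of transistors left active below is invented):

# The product wire is pulled up to '1'; any active EPROM transistor whose
# input line is high pulls it to '0'. Transistors for uninvolved inputs are
# programmed permanently off, so they never affect the wire.
def product_wire(line_values, active_transistors):
    return 0 if any(line_values[name] for name in active_transistors) else 1

# Realising the product term /A . /C : leave the A and C transistors active.
print(product_wire({"A": 0, "B": 1, "C": 0}, {"A", "C"}))   # 1
print(product_wire({"A": 1, "B": 1, "C": 0}, {"A", "C"}))   # 0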

Although there is no technical reason why EPROM or EEPROM could not be applied to

FPGAs, current commercial FPGA products are based either on SRAM or antifuse technologies,

as discussed below.

Figure 4 - EPROM Programmable Switches (EPROM transistors connect input wires to a product wire that is pulled up to +5 V).

An example of the use of SRAM-controlled switches is illustrated in Figure 5, showing two applications of SRAM cells: controlling the gate nodes of pass-transistor switches and controlling the select lines of multiplexers that drive logic block inputs. The figure gives an example of the connection of one logic block (represented by the AND-gate in the upper left corner) to another through two pass-transistor switches, and then a multiplexer, all controlled by SRAM cells. Whether an FPGA uses pass-transistors or multiplexers or both depends on the particular product.

Figure 5 - SRAM-controlled Programmable Switches (SRAM cells control pass transistors and the select lines of a multiplexer linking logic cells).

The other type of programmable switch used in FPGAs is the antifuse. Antifuses are origi-

nally open-circuits and take on low resistance only when programmed. Antifuses are suitable for

FPGAs because they can be built using modified CMOS technology. As an example, Actel’s anti-

fuse structure, known as PLICE [Ham88], is depicted in Figure 6. The figure shows that an anti-

fuse is positioned between two interconnect wires and physically consists of three sandwiched

layers: the top and bottom layers are conductors, and the middle layer is an insulator. When

unprogrammed, the insulator isolates the top and bottom layers, but when programmed the insula-

tor changes to become a low-resistance link. PLICE uses Poly-Si and n+ diffusion as conductors and ONO (see [Ham88]) as an insulator, but other antifuses rely on metal for conductors, with amorphous silicon as the middle layer [Birk92][Marp94].

Figure 6 - Actel Antifuse Structure (a dielectric antifuse sandwiched between a Poly-Si wire and an n+ diffusion wire, surrounded by oxide on a silicon substrate).

Table 1 lists the most important characteristics of the programming technologies discussed in

this section. The left-most column of the table indicates whether the programmable switches are

one-time programmable (OTP), or can be re-programmed (RP). The next column lists whether the

switches are volatile, and the last column names the underlying transistor technology.

Name       Re-programmable         Volatile   Technology
Fuse       no                      no         Bipolar
EPROM      yes (out of circuit)    no         UVCMOS
EEPROM     yes (in circuit)        no         EECMOS
SRAM       yes (in circuit)        yes        CMOS
Antifuse   no                      no         CMOS+

Table 1 - Summary of Programming Technologies.


1.4 Computer Aided Design (CAD) Flow for FPDs

When designing circuits for implementation in FPDs, it is essential to employ Computer-Aided

Design (CAD) programs. Such software tools are discussed briefly in this section to provide a feel

for the design process involved.

CAD tools are important not only for complex devices like CPLDs and FPGAs, but also for
SPLDs. A typical CAD system for SPLDs would include software for the following tasks: initial

design entry, logic optimization, device fitting, simulation, and configuration. This design flow is
illustrated in Figure 7, which also indicates how some stages feed back to others. Design entry

may be done either by creating a schematic diagram with a graphical CAD tool, by using a text-

based system to describe a design in a simple hardware description language, or with a mixture of
design entry methods. Since initial logic entry is not usually in an optimized form, algorithms are

employed to optimize the circuits, after which additional algorithms analyse the resulting logic

equations and “fit” them into the SPLD. Simulation is used to verify correct operation, and the

user would return to the design entry step to fix errors. When a design simulates correctly it can be
loaded into a programming unit and used to configure an SPLD. One final detail to note about Fig-

ure 7 is that while the original design entry step is performed manually by the designer, all other

steps are carried out automatically by most CAD systems.

fix errors

merge & optimize device simulate


text entry translate equations fitter SPLD

schematic Programming Unit


capture configuration
file

manual automatic

Figure 7 - CAD Design Flow for SPLDs.


The steps involved for implementing circuits in CPLDs are similar to those for SPLDs, but the

tools themselves are more sophisticated. Because the devices are complex and can accommodate

large designs, it is more common to use a mixture of design entry methods for different modules
of a complete circuit. For instance, some modules might be designed with a small hardware

description language like ABEL, others drawn using a symbolic schematic capture tool, and still

others described via a full-featured hardware description language such as VHDL. Also, for
CPLDs the process of “fitting” a design may require steps similar to those described below for

FPGAs, depending on how sophisticated the CPLD is. The necessary software for these tasks is

supplied either by the CPLD manufacturer or a third party.

The design process for FPGAs is similar to that for CPLDs, but additional tools are needed to

support the increased complexity of the chips. The major difference is in the “device fitter” step

that comes after logic optimization and before simulation, where FPGAs require at least three
steps: a technology mapper to map from basic logic gates into the FPGA’s logic blocks, placement

to choose which specific logic blocks to use in the FPGA, and a router to allocate the wire seg-

ments in the FPGA to interconnect the logic blocks. With this added complexity, the CAD tools
might require a fairly long period of time (often more than an hour or even several hours) to com-

plete their tasks.

2 Overview of Commercially Available FPDs


This section provides many examples of commercial FPD products. SPLDs are first discussed

briefly, and then details are given for all of the most important CPLDs and FPGAs. The reader

who is interested in more details on the commercial products is encouraged to contact the manu-

facturers, or their distributors, for the latest data sheets*.

* Most FPD manufacturers now provide their data sheets on the world wide web, and can be located at URL "http://www.companyname.com".
2.1 Commercially Available SPLDs

As the staple for digital hardware designers for the past two decades, SPLDs are very important

devices. SPLDs represent the highest speed-performance FPDs available, and are inexpensive.

However, they are also fairly straight-forward and well understood, so this paper will discuss

them only briefly.

Two of the most popular SPLDs are the PALs produced by Advanced Micro Devices (AMD)

known as the 16R8 and 22V10. Both of these devices are industry standards and are widely sec-
ond-sourced by various companies. The name “16R8” means that the PAL has a maximum of 16

inputs (there are 8 dedicated inputs and 8 input/outputs), and a maximum of 8 outputs. The “R”

refers to the type of outputs provided by the PAL and means that each output is “registered” by a
D flip-flop. Similarly, the “22V10” has a maximum of 22 inputs and 10 outputs. Here, the “V”

means each output is “versatile” and can be configured in various ways, some configurations reg-

istered and some not.
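
As a behavioural illustration of what "registered" means for these outputs, the Python sketch below models one 16R8-style output: a sum-of-products network whose result is latched by a D flip-flop on each clock. The product terms are arbitrary examples, not the contents of any real device.

# One registered PAL output: OR of AND (product) terms, latched by a D flip-flop.
class RegisteredOutput:
    def __init__(self, product_terms):
        # each term is a list of (input_index, required_value) pairs
        self.product_terms = product_terms
        self.q = 0  # flip-flop state

    def clock(self, inputs):
        """On a clock edge, latch the OR of all product (AND) terms."""
        sop = any(all(inputs[i] == v for i, v in term)
                  for term in self.product_terms)
        self.q = int(sop)
        return self.q

# Example: Q <= (in0 AND NOT in1) OR (in2 AND in3)
out = RegisteredOutput([[(0, 1), (1, 0)], [(2, 1), (3, 1)]])
print(out.clock([1, 0, 0, 0]))  # 1: first term true
print(out.clock([0, 0, 0, 0]))  # 0: no term true

A "V" (versatile) output, as in the 22V10, would add configuration bits selecting between this registered path and a purely combinational one.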

Another widely used and second sourced SPLD is the Altera Classic EP610. This device is

similar in complexity to PALs, but it offers more flexibility in the way that outputs are produced
and has larger AND- and OR- planes. In the EP610, outputs can be registered and the flip-flops

are configurable as any of D, T, JK, or SR.

In addition to the SPLDs mentioned above many other products are available from a wide

array of companies. All SPLDs share common characteristics, like some sort of logic planes

(AND, OR, NOR, or NAND), but each specific product offers unique features that may be partic-

ularly attractive for some applications. A partial list of companies that offer SPLDs includes:

AMD, Altera, ICT, Lattice, Cypress, and Philips-Signetics. Since some of these SPLDs have com-
plexity approaching that found in CPLDs, the paper will now move on to more sophisticated

devices.
Figure 8 - Altera MAX 7000 Series. [Figure: an array of Logic Array Blocks (LABs) and I/O blocks interconnected by the Programmable Interconnect Array (PIA).]

2.2 Commercially Available CPLDs

As stated earlier, CPLDs consist of multiple SPLD-like blocks on a single chip. However, CPLD
products are much more sophisticated than SPLDs, even at the level of their basic SPLD-like

blocks. In this section, CPLDs are discussed in detail, first by surveying the available commercial
products and then by discussing the types of applications for which CPLDs are best suited. Suffi-

cient details are presented to allow a comparison between the various competing products, with

more attention being paid to devices that we believe are in more widespread use than others.

2.2.1 Altera CPLDs

Altera has developed three families of chips that fit within the CPLD category: MAX 5000,

MAX 7000, and MAX 9000. Here, the discussion will focus on the MAX 7000 series, because it

is widely used and offers state-of-the-art logic capacity and speed-performance. MAX 5000 rep-

resents an older technology that offers a cost effective solution, and MAX 9000 is similar to MAX
7000, except that MAX 9000 offers higher logic capacity (the industry’s highest for CPLDs).

The general architecture of the Altera MAX 7000 series is depicted in Figure 8. It comprises

an array of blocks called Logic Array Blocks (LABs), and interconnect wires called a Program-
mable Interconnect Array (PIA). The PIA is capable of connecting any LAB input or output to

any other LAB. Also, the inputs and outputs of the chip connect directly to the PIA and to LABs.

A LAB can be thought of as a complex SPLD-like structure, and so the entire chip can be consid-
ered to be an array of SPLDs. MAX 7000 devices are available based on both EPROM and

EEPROM technology. Until recently, even with EEPROM, MAX 7000 chips could be program-

mable only “out-of-circuit” in a special-purpose programming unit; however, in 1996 Altera


released the 7000S series, which is reprogrammable “in-circuit”.

The structure of a LAB is shown in Figure 9. Each LAB consists of two sets of eight macro-

cells (shown in Figure 10), where a macrocell comprises a set of programmable product terms
(part of an AND-plane) that feeds an OR-gate and a flip-flop. The flip-flops can be configured as

D type, JK, T, SR, or can be transparent.

Figure 9 - Altera MAX 7000 Logic Array Block (LAB). [Figure: an array of 16 macrocells with product-term sharing, connected to the PIA, to other LABs, to the I/O pins, and to an I/O control block driving the I/O cells.]

As illustrated in Figure 10, the number of inputs to the OR-gate in a macrocell is variable; the OR-gate can be fed from any or all of the five product

terms within the macrocell, and in addition can have up to 15 extra product terms from macrocells

in the same LAB. This product term flexibility makes the MAX 7000 series LAB more efficient in
terms of chip area because typical logic functions do not need more than five product terms, and

the architecture supports wider functions when they are needed. It is interesting to note that vari-

able sized OR-gates of this sort are not available in basic SPLDs (see Figure 1). Similar features
of this kind are found in other CPLD architectures discussed shortly.
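
The arithmetic of this scheme is worth making explicit: a macrocell owns 5 product terms and can borrow up to 15 more, so a single output can realise a function of up to 5 + 15 = 20 product terms. A minimal sketch, with the numbers taken from the text and the interface invented for illustration:

# Product-term budgeting in a MAX 7000-style macrocell.
LOCAL_TERMS = 5
MAX_BORROWED = 15

def terms_available(borrowed):
    assert 0 <= borrowed <= MAX_BORROWED
    return LOCAL_TERMS + borrowed

print(terms_available(0))   # 5  : a typical logic function fits locally
print(terms_available(15))  # 20 : the widest function one macrocell can take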

Besides Altera, several other companies produce devices that can be categorized as CPLDs.

For example, AMD manufactures the Mach family, Lattice has the (i)pLSI series, Xilinx pro-
duces a CPLD series that they call XC7000 (unrelated to the Altera MAX 7000 series) and has

announced a new family called XC9500, and ICT has the PEEL array. These devices are dis-

cussed in the following sub-sections.

Figure 10 - MAX 7000 Macrocell. [Figure: product terms selected from the local LAB interconnect feed the macrocell's OR-gate and a flip-flop with programmable set and clear and a choice of global or array clock; inputs come from other macrocells in the LAB, and the output drives the PIA. The global clear is not shown.]


Figure 11 - Structure of AMD Mach 4 CPLDs. [Figure: multiple 34V16 PAL-like blocks, each with its own group of I/O pins, plus dedicated inputs and clock pins, all interconnected through a Central Switch Matrix.]

2.2.2 Advanced Micro Devices (AMD) CPLDs

AMD offers a CPLD family with five sub-families called Mach 1 to Mach 5. Each Mach device
comprises multiple PAL-like blocks: Mach 1 and 2 consist of optimized 22V16 PALs, Mach 3

and 4 comprise several optimized 34V16 PALs, and Mach 5 is similar but offers enhanced speed-
performance. All Mach chips are based on EEPROM technology, and together the five sub-fami-

lies provide a wide range of selection, from small, inexpensive chips to larger state-of-the-art

ones. This discussion will focus on Mach 4, because it represents the most advanced currently

available parts in the Mach family.

Figure 11 depicts a Mach 4 chip, showing the multiple 34V16 PAL-like blocks, and the inter-

connect, called Central Switch Matrix, for connecting the blocks together. Chips range in size

from 6 to 16 PAL blocks, corresponding roughly to 2000 to 5000 equivalent gates, and are in-

circuit programmable. All connections in Mach 4 between one PAL block and another (even from

a PAL block to itself) are routed through the Central Switch Matrix. The device can thus be
viewed not only as a collection of PALs, but also as a single large device. Since all connections

travel through the same path, timing delays of circuits implemented in Mach 4 are predictable.

A Mach 4 PAL-like block is depicted in Figure 12. It has 16 outputs and a total of 34 inputs

(16 of which are the outputs fed-back), so it corresponds to a 34V16 PAL. However, there are two
key differences between this block and a normal PAL: 1. there is a product term allocator

between the AND-plane and the macrocells (the macrocells comprise an OR-gate, an EX-OR gate

and a flip-flop), and 2. there is an output switch matrix between the OR-gates and the I/O pins.
These two features make a Mach 4 chip easier to use, because they “decouple” sections of the

PAL block. More specifically, the product term allocator distributes and shares product terms
from the AND-plane to whichever OR-gates require them. This is much more flexible than the

fixed-size OR-gates in regular PALs. The output switch matrix makes it possible for any macro-

cell output (OR-gate or flip-flop) to drive any of the I/O pins connected to the PAL block. Again,
flexibility is enhanced over a PAL, where each macrocell can drive only one specific I/O pin.

Mach 4’s combination of in-system programmability and high flexibility promote easy hardware

design changes.

Figure 12 - AMD Mach 4 PAL-like (34V16) Block. [Figure: 34 inputs pass through an input switch matrix into the AND-plane; a product term allocator feeds the OR/EXOR gates and 16 macrocells (flip-flops); an output switch matrix connects macrocell outputs, output or buried, to the I/O cells; a clock generator and connections to the central switch matrix complete the block.]
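
The effect of these two decoupling features can be sketched in a few lines of Python. The AND-plane size and the allocation policy below are simplified illustrations, not AMD's actual logic:

# Product-term allocator: OR-gates get as many terms as they demand,
# instead of a fixed number as in a regular PAL.
def allocate_terms(demands, total_terms=80):  # 80 is an assumed plane size
    allocation, used = {}, 0
    for or_gate, need in demands.items():
        if used + need > total_terms:
            raise ValueError("AND-plane exhausted")
        allocation[or_gate] = need
        used += need
    return allocation

print(allocate_terms({"or0": 1, "or1": 12, "or2": 3}))

# Output switch matrix: any macrocell output may drive any I/O pin of the block.
output_switch = {"macrocell5": "pin2", "macrocell0": "pin7"}
print(output_switch)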


2.2.3 Lattice CPLDs

Lattice offers a complete range of CPLDs, with two main product lines: the Lattice pLSI consists

of three families of EEPROM CPLDs, and the ispLSI devices are the same as the pLSI devices, except

that they are in-system programmable. For both the pLSI and ispLSI products, Lattice offers three

families that have different logic capacities and speed-performance.

Lattice’s earliest generation of CPLDs is the pLSI and ispLSI 1000 series. Each chip consists

of a collection of SPLD-like blocks, described in more detail later, and a global routing pool to
connect blocks together. Logic capacity ranges from about 1200 to 4000 gates. Pin-to-pin delays

are 10 nsec. Lattice also offers a CPLD family called the 2000 series: relatively small

CPLDs, with between 600 and 2000 gates, that offer a higher ratio of macrocells to I/O pins and
higher speed-performance than the 1000 series. At 5.5 nsec pin-to-pin delays, the 2000 series

offers state-of-the-art speed.

Lattice’s 3000 series represents their largest CPLDs, with up to 5000 gates. Pin-to-pin delays

for this device are about 10-15 nsec. In terms of other chips discussed so far, the 3000 series func-

tionality is most similar to AMD’s Mach 4. The 3000 series offers some enhancements over the
other Lattice parts to support more recent design styles, such as JTAG boundary scan.

The general structure of a Lattice pLSI or ispLSI device is indicated in Figure 13. Around the

outside edges of the chip are the bi-directional I/Os, which are connected both to the Generic

Logic Blocks (GLBs) and the Global Routing Pool (GRP). As the fly-out on the right side of the

figure shows, the GLBs are small PAL-like blocks that consist of an AND-plane, product term

allocator, and macrocells. The GRP is a set of wires that span the entire chip and are available to

connect the GLB inputs and outputs together. All interconnections pass through the GRP, so tim-
ing between levels of logic in the Lattice chips is fully predictable, much as it is for the AMD

Mach devices.
2.2.4 Cypress FLASH370 CPLDs

Cypress has recently developed a family of CPLD products that are similar to both the AMD

and Lattice devices in several ways. The Cypress CPLDs, called FLASH370 are based on FLASH

EEPROM technology, and offer speed-performance of 8.5 to 15 nsec pin-to-pin delays. The

FLASH370 parts are not in-system programmable. Recognizing that larger chips need more I/Os,

FLASH370 provides more I/Os than competing products, featuring a linear relationship between

the number of macrocells and number of bi-directional I/O pins. The smallest parts have 32 mac-
rocells and 32 I/Os and the largest 256 macrocells and 256 I/Os.

Figure 14 shows that FLASH370 has a typical CPLD architecture with multiple PAL-like

blocks and a programmable interconnect matrix (PIM) to connect them. Within each PAL-like
block, there is an AND-plane that feeds a product term allocator that directs from 0 to 16 product

terms to each of 32 OR-gates.

Figure 13 - Lattice (i)pLSI Architecture. [Figure: I/O pads around the chip edge connect to the Generic Logic Blocks (each an AND-plane, product term allocator, and macrocells), an output routing pool, an input bus, and the Global Routing Pool spanning the chip.]

Figure 14 - Architecture of Cypress FLASH370 CPLDs. [Figure: multiple PAL-like blocks connected through a central Programmable Interconnect Matrix (PIM).]

Note that in the feed-back path from the macrocell outputs to the

PIM, there are 32 wires; this means that a macrocell can be buried (not drive an I/O pin) and yet

the I/O pin that could be driven by the macrocell can still be used as an input. This illustrates

another type of flexibility available in PAL-like blocks in CPLDs, but not present in normal PALs.

2.2.5 Xilinx XC7000 CPLDs

Although Xilinx is mostly a manufacturer of FPGAs, they also offer a selection of CPLDs, called

XC7000, and have announced a new CPLD family called XC9500. There are two main families
in the XC7000 offering: the 7200 series, originally marketed by Plus Logic as the Hiper EPLDs,

and the 7300 series, developed by Xilinx. The 7200 series are moderately small devices, with

about 600 to 1500 gates capacity, and they offer speed-performance of about 25 nsec pin-to-pin

delays. Each chip consists of a collection of SPLD-like blocks that each have 9 macrocells. The

macrocells in the 7200 series are different from those in other CPLDs in that each macrocell

includes two OR-gates and each of these OR-gates is input to a two-bit Arithmetic Logic Unit
(ALU). The ALU can produce any function of its two inputs, and its output feeds a configurable

flip-flop. The Xilinx 7300 series is an enhanced version of the 7200, offering more capacity (up to
3000 gates when the entire family becomes available) and higher speed-performance. Finally, the
new XC9500, when available, will offer in-circuit programmability with 5 nsec pin-to-pin delays

and up to 6200 logic gates.

2.2.6 Altera FLASHlogic CPLDs

Altera’s FLASHlogic, previously known as Intel’s FLEXlogic, features in-system programmabil-

ity and provides on-chip SRAM blocks, a unique feature among CPLD products. The upper part
of Figure 15 illustrates the architecture of FLASHlogic devices; it comprises a collection of PAL-

like blocks, called Configurable Function Blocks (CFBs), that each represents an optimized
24V10 PAL.

Figure 15 - Altera FLASHlogic CPLD. [Figure, top: I/O blocks surrounding a global interconnect matrix that links the Configurable Function Blocks (CFBs); bottom: one CFB in PAL mode (24V10) beside another in SRAM mode (a 128 x 10 memory with address, data in/out, and control lines).]


In terms of basic structure, FLASHlogic is similar to other products already discussed. How-

ever, they have one unique feature that sets them apart from all other CPLDs: each PAL-like

block, instead of being used for AND-OR logic, can be configured as a block of 10 nsec Static
RAM. This concept is illustrated in the lower part of Figure 15, which shows one CFB being used

as a PAL and another configured as an SRAM. In the SRAM configuration, the PAL block

becomes a 128 word by 10 bit read/write memory. Inputs that would normally feed the AND-
plane in the PAL in this case become address lines, data in, and control signals for the memory.

Notice that the flip-flops and tri-state buffers are still available when the PAL block is configured

as memory.
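
A behavioural sketch of this dual personality is given below; the interface is a simplified illustration, not Altera's actual programming model.

# A FLASHlogic CFB configured either as a 24V10 PAL or as 128 x 10 SRAM.
class CFB:
    def __init__(self, mode):
        assert mode in ("pal", "sram")
        self.mode = mode
        self.mem = [0] * 128 if mode == "sram" else None

    def access(self, address=None, data=None, write=False):
        if self.mode == "sram":
            if write:
                self.mem[address] = data & 0x3FF  # 10-bit words
            return self.mem[address]
        raise NotImplementedError("PAL mode would evaluate AND-OR logic here")

ram = CFB("sram")  # inputs that fed the AND-plane now act as address/data/control
ram.access(address=27, data=0b1010101010, write=True)
print(bin(ram.access(address=27)))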

In the FLASHlogic device, the AND-OR logic plane’s configuration bits are SRAM cells that

are “shadowed” by EPROM or EEPROM cells. The SRAM cells are loaded with a copy of the

non-volatile EPROM or EEPROM memory when power is applied, but it is the SRAM cells that
control the configuration of the chip. It is possible to re-configure the chips in-system by down-

loading new information into the SRAM cells. The SRAM cells’ contents can be written back to

the EEPROM, so that non-volatile re-programming (in-circuit) is available.

2.2.7 ICT PEEL Arrays

The ICT PEEL Arrays are basically large PLAs that include logic macrocells with flip-flops

and feedback to the logic planes. This structure is illustrated by Figure 16, which shows a pro-
grammable AND-plane that feeds a programmable OR-plane. The outputs of the OR-plane are

divided into groups of four, and each group can be input to any of the logic cells. The logic cells

provide registers for the sum terms and can feed-back the sum terms to the AND-plane. Also, the

logic cells connect sum terms to I/O pins.

Because they have a PLA-like structure, logic capacity of PEEL Arrays is somewhat difficult

to measure compared to the CPLDs discussed so far; an estimate is 1600 to 2800 equivalent gates.
Figure 16 - Architecture of ICT PEEL Arrays.

PEEL Arrays offer relatively few I/O pins, with the largest part being offered in a 40 pin package.

Since they do not comprise SPLD-like blocks, PEEL Arrays do not fit well into the CPLD cate-

gory; however, they are included here because they represent an example of PLA-based, rather than

PAL-based devices, and they offer larger capacity than a typical SPLD.

The logic cell in the PEEL Arrays, depicted in Figure 17, includes a flip-flop, configurable as
D, T, or JK, and two multiplexers. The multiplexers each produce an output of the logic cell and

can provide either a registered or combinational output.

Figure 17 - Structure of ICT PEEL Array Logic Cell.

One of the logic cell outputs can connect to an I/O pin and the other output is buried. One of the interesting features of the logic cell is that

the flip-flop clock, as well as preset and clear, are full sum-of-product logic functions. This differs

from all other CPLDs, which simply provide product terms for these signals and is attractive for
some applications. Because of their PLA-like OR-plane, the ICT PEEL Arrays are especially

well-suited to applications that require very wide sum terms.

2.2.8 Applications of CPLDs

We will now briefly examine the types of applications which best suit CPLD architectures.
Because they offer high speeds and a range of capacities, CPLDs are useful for a very wide assort-

ment of applications, from implementing random glue logic to prototyping small gate arrays. One

of the most common uses in industry at this time, and a strong reason for the large growth of the
CPLD market, is the conversion of designs that consist of multiple SPLDs into a smaller number

of CPLDs.

CPLDs can realize reasonably complex designs, such as graphics controllers, LAN controllers,

UARTs, cache control, and many others. As a general rule-of-thumb, circuits that can exploit wide

AND/OR gates, and do not need a very large number of flip-flops are good candidates for imple-
mentation in CPLDs. A significant advantage of CPLDs is that they provide simple design

changes through re-programming (all commercial CPLD products are re-programmable). With in-

system programmable CPLDs it is even possible to re-configure hardware (an example might be
to change a protocol for a communications circuit) without power-down.

Designs often partition naturally into the SPLD-like blocks in a CPLD. The result is more pre-

dictable speed-performance than would be the case if a design were split into many small pieces

and then those pieces were mapped into different areas of the chip. Predictability of circuit imple-
mentation is one of the strongest advantages of CPLD architectures.
2.3 Commercially Available FPGAs

As one of the largest growing segments of the semiconductor industry, the FPGA market-place is

volatile. As such, the pool of companies involved changes rapidly and it is somewhat difficult to

say which products will be the most significant when the industry reaches a stable state. For this

reason, and to provide a more focused discussion, we will not mention all of the FPGA manufac-

turers that currently exist, but will instead focus on those companies whose products are in wide-

spread use at this time. In describing each device we will list its capacity, nominally in 2-input
NAND gates as given by the vendor. Gate count is an especially contentious issue in the FPGA

industry, and so the numbers given in this paper for all manufacturers should not be taken too seri-
ously. Wags have taken to calling them “dog” gates, in reference to the traditional ratio between

human and dog years.

There are two basic categories of FPGAs on the market today: 1. SRAM-based FPGAs and 2.

antifuse-based FPGAs. In the first category, Xilinx and Altera are the leading manufacturers in

terms of number of users, with the major competitor being AT&T. For antifuse-based products,
Actel, Quicklogic (with Cypress), and Xilinx offer competing products.

2.3.1 Xilinx SRAM-based FPGAs

The basic structure of Xilinx FPGAs is array-based, meaning that each chip comprises a two-

dimensional array of logic blocks that can be interconnected via horizontal and vertical routing

channels. An illustration of this type of architecture was shown in Figure 2. Xilinx introduced the

first FPGA family, called the XC2000 series, in about 1985 and now offers three more genera-

tions: XC3000, XC4000, and XC5000. Although the XC3000 devices are still widely used, we
will focus on the more recent and more popular XC4000 family. We note that XC5000 is similar

to XC4000, but has been engineered to offer similar features at a more attractive price, with some

penalty in speed. We should also note that Xilinx has recently introduced an FPGA family based
on anti-fuses, called the XC8100. The XC8100 has many interesting features, but since it is not

yet in widespread use, we will not discuss it here. The Xilinx 4000 family devices range in capac-

ity from about 2000 to more than 15,000 equivalent gates.

The XC4000 features a logic block (called a Configurable Logic Block (CLB) by Xilinx) that
is based on look-up tables (LUTs). A LUT is a small one bit wide memory array, where the

address lines for the memory are inputs of the logic block and the one bit output from the memory

is the LUT output. A LUT with K inputs would then correspond to a 2^K x 1 bit memory, and can
realize any logic function of its K inputs by programming the logic function’s truth table directly

into the memory. The XC4000 CLB contains three separate LUTs, in the configuration shown in
Figure 18. There are two 4-input LUTs that are fed by CLB inputs, and the third LUT can be used

in combination with the other two. This arrangement allows the CLB to implement a wide range

of logic functions of up to nine inputs, two separate functions of four inputs or other possibilities.
Each CLB also contains two flip-flops.

Figure 18 - Xilinx XC4000 Configurable Logic Block (CLB). [Figure: two 4-input lookup tables (inputs G4-G1 and F4-F1) feed a third lookup table together with inputs C1-C4; a selector routes the combinational outputs F and G and the flip-flop outputs Q1 and Q2; each flip-flop has set/reset, a clock enable, and a shared clock.]
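
The LUT principle itself is small enough to capture directly in code. The Python sketch below models a single K-input LUT as a 2^K x 1-bit memory; the two-level combination of LUTs inside a CLB is omitted.

# A K-input look-up table: the inputs form a memory address, and the
# programmed truth table supplies the one-bit output.
class LUT:
    def __init__(self, k, truth_table_bits):
        assert len(truth_table_bits) == 2 ** k
        self.bits = truth_table_bits

    def evaluate(self, inputs):
        address = sum(bit << i for i, bit in enumerate(inputs))
        return self.bits[address]

# Programme a 4-input LUT as a 4-way XOR (parity) by writing its truth table:
parity = LUT(4, [bin(a).count("1") % 2 for a in range(16)])
print(parity.evaluate([1, 0, 1, 0]))  # 0 (two ones)
print(parity.evaluate([1, 1, 1, 0]))  # 1 (three ones)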


Toward the goal of providing high density devices that support the integration of entire sys-

tems, the XC4000 chips have “system oriented” features. For instance, each CLB contains cir-

cuitry that allows it to efficiently perform arithmetic (i.e., a circuit that can implement a fast carry
operation for adder-like circuits) and also the LUTs in a CLB can be configured as read/write

RAM cells. A new version of this family, the 4000E, has the additional feature that the RAM can

be configured as a dual port RAM with a single write and two read ports. In the 4000E, RAM
blocks can be synchronous RAM. Also, each XC4000 chip includes very wide AND-planes

around the periphery of the logic block array to facilitate implementing circuit blocks such as

wide decoders.

Besides logic, the other key feature that characterizes an FPGA is its interconnect structure.

The XC4000 interconnect is arranged in horizontal and vertical channels. Each channel contains

some number of short wire segments that span a single CLB (the number of segments in each
channel depends on the specific part number), longer segments that span two CLBs, and very long

segments that span the entire length or width of the chip. Programmable switches are available

(see Figure 5) to connect the inputs and outputs of the CLBs to the wire segments, or to connect
one wire segment to another. A small section of a routing channel representative of an XC4000

device appears in Figure 19. The figure shows only the wire segments in a horizontal channel, and
does not show the vertical routing channels, the CLB inputs and outputs, or the routing switches.

An important point worth noting about the Xilinx interconnect is that signals must pass through

switches to reach one CLB from another, and the total number of switches traversed depends on
the particular set of wire segments used. Thus, speed-performance of an implemented circuit

depends in part on how the wire segments are allocated to individual signals by CAD tools.
Figure 19 - Xilinx XC4000 Wire Segments. [Figure: one horizontal routing channel between two rows of CLBs, containing length-1 wires, length-2 wires, and long wires; vertical channels, CLB pins, and routing switches are not shown.]
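
A toy delay model makes this routing point concrete: with made-up delay constants, a route built from many short segments pays for more switch crossings than one using a single long line.

# Hypothetical delay constants; only the trend matters, not the numbers.
SWITCH_DELAY_NS = 1.5
SEGMENT_DELAY_NS = {"length1": 0.5, "length2": 0.8, "long": 2.0}

def path_delay(segments):
    """Each entry/exit and segment-to-segment hop costs one switch."""
    switches = len(segments) + 1
    return switches * SWITCH_DELAY_NS + sum(SEGMENT_DELAY_NS[s] for s in segments)

# Two routes between the same CLBs that a router might choose:
print(path_delay(["length1"] * 4))  # 9.5 ns: four short segments, five switches
print(path_delay(["long"]))         # 5.0 ns: one long line, two switches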

2.3.2 Altera FLEX 8000 and FLEX 10000 FPGAs

Altera’s FLEX 8000 series consists of a three-level hierarchy much like that found in CPLDs.

However, the lowest level of the hierarchy consists of a set of lookup tables, rather than an SPLD-

like block, and so the FLEX 8000 is categorized here as an FPGA. It should be noted, however,
that FLEX 8000 is a combination of FPGA and CPLD technologies. FLEX 8000 is SRAM-based

and features a four-input LUT as its basic logic block. Logic capacity ranges from about 4000

gates to more than 15,000 for the 8000 series.

The overall architecture of FLEX 8000 is illustrated in Figure 20. The basic logic block,
called a Logic Element (LE) contains a four-input LUT, a flip-flop, and special-purpose carry cir-

cuitry for arithmetic circuits (similar to Xilinx XC4000). The LE also includes cascade circuitry

that allows for efficient implementation of wide AND functions. Details of the LE are illustrated

in Figure 21.

In the FLEX 8000, LEs are grouped into sets of 8, called Logic Array Blocks (LABs, a term

borrowed from Altera’s CPLDs). As shown in Figure 22, each LAB contains local interconnect

and each local wire can connect any LE to any other LE within the same LAB.

Figure 20 - Architecture of Altera FLEX 8000 FPGAs. [Figure: an array of LABs, each with 8 Logic Elements and local interconnect, surrounded by I/O and connected by FastTrack interconnect.]

Figure 21 - Altera FLEX 8000 Logic Element (LE). [Figure: a look-up table on data1-data4 feeds cascade circuitry and a flip-flop with set/clear, clock, and carry in/out; cntrl1-cntrl4 select the control signals, and cascade in/out chains to neighbouring LEs.]

Figure 22 - Altera FLEX 8000 Logic Array Block (LAB). [Figure: eight LEs sharing local interconnect, with cascade and carry chains to the adjacent LAB, data and control inputs from FastTrack, and LE outputs to FastTrack.]

Local interconnect

also connects to the FLEX 8000’s global interconnect, called FastTrack. FastTrack is similar to

Xilinx long lines in that each FastTrack wire extends the full width or height of the device. How-
ever, a major difference between FLEX 8000 and Xilinx chips is that FastTrack consists of only

long lines. This makes the FLEX 8000 easy for CAD tools to automatically configure. All Fast-

Track horizontal wires are identical, and so interconnect delays in the FLEX 8000 are more

predictable than in FPGAs that employ many smaller length segments, because there are fewer pro-

grammable switches in the longer paths. Predictability is further aided by the fact that connec-
tions between horizontal and vertical lines pass through active buffers.

The FLEX 8000 architecture has been extended in the state-of-the-art FLEX 10000 family.

FLEX 10000 offers all of the features of FLEX 8000, with the addition of variable-sized blocks of

SRAM, called Embedded Array Blocks (EABs). This idea is illustrated in Figure 23, which shows

that each row in a FLEX 10000 chip has an EAB on one end. Each EAB is configurable to serve

as an SRAM block with a variable aspect ratio: 256 x 8, 512 x 4, 1K x 2, or 2K x 1. In addition, an

EAB can alternatively be configured to implement a complex logic circuit, such as a multiplier, by
employing it as a large multi-output lookup table.

Figure 23 - Architecture of Altera FLEX 10K FPGAs. [Figure: a FLEX 8000-style LAB array with an Embedded Array Block (EAB) at the end of each row.]

Altera provides, as part of their CAD tools, sev-

eral macro-functions that implement useful logic circuits in EABs. Counting the EABs as logic
gates, FLEX 10000 offers the highest logic capacity of any FPGA, although it is hard to provide

an accurate number.
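
Note that the four EAB aspect ratios all describe the same 2048-bit block (1K = 1024, 2K = 2048), merely reshaped, as the following check confirms:

# Every EAB configuration quoted above stores 2048 bits.
for words, width in [(256, 8), (512, 4), (1024, 2), (2048, 1)]:
    assert words * width == 2048
    print(f"{words} x {width} = {words * width} bits")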

2.3.3 AT&T ORCA FPGAs

AT&T’s SRAM-based FPGAs feature an overall structure similar to that in Xilinx FPGAs,

and called the Optimized Reconfigurable Cell Array (ORCA). The ORCA logic block is based on

LUTs; each chip contains an array of Programmable Function Units (PFUs). The structure of a PFU is

shown in Figure 24. A PFU possesses a unique kind of configurability among LUT-based logic

blocks, in that it can be configured in the following ways: as four 4-input LUTs, as two 5-input

LUTs, and as one 6-input LUT.

Figure 24 - AT&T Programmable Function Unit (PFU). [Figure: four LUTs, each feeding a D flip-flop, connected through a switch matrix.]

A key element of this architecture is that when used as four 4-

input LUTs, several of the LUTs’ inputs must come from the same PFU input. While this reduces

the apparent functionality of the PFU, it also significantly reduces the cost of the wiring associ-
ated with the chip. The PFU also includes arithmetic circuitry, like Xilinx XC4000 and Altera

FLEX 8000, and like Xilinx XC4000 a PFU can be configured as a RAM block. A recently

announced version of the ORCA chip also allows dual-port and synchronous RAM.

ORCA’s interconnect structure is also different from those in other SRAM-based FPGAs.

Each PFU connects to interconnect that is configured in four-bit buses. This provides for more

efficient support for “system-level” designs, since buses are common in such applications. The
ORCA family has been extended in the ORCA 2 series, and offers very high logic capacity up to

40,000 logic gates. ORCA 2 features a two-level hierarchy of PFUs based on the original ORCA

architecture.
2.3.4 Actel FPGAs

In contrast to FPGAs described above, the devices manufactured by Actel are based on anti-

fuse technology. Actel offers three main families: Act 1, Act 2, and Act 3. Although all three gen-

erations have similar features, this paper will focus on the most recent devices, since they are apt

to be more widely used in the longer term. Unlike the FPGAs described above, Actel devices are

based on a structure similar to traditional gate arrays; the logic blocks are arranged in rows and

there are horizontal routing channels between adjacent rows. This architecture is illustrated in
Figure 25. The logic blocks in the Actel devices are relatively small in comparison to the LUT-

based ones described above, and are based on multiplexers. Figure 26 illustrates the logic block in
the Act 3 and shows that it comprises an AND and OR gate that are connected to a multiplexer-

based circuit block. The multiplexer circuit is arranged such that, in combination with the two

logic gates, a very wide range of functions can be realized in a single logic block. About half of

the logic blocks in an Act 3 device also contain a flip-flop.

Figure 25 - Structure of Actel FPGAs. [Figure: rows of logic blocks separated by horizontal routing channels, with I/O blocks on all four sides of the chip.]

Figure 26 - Actel Act 3 Logic Module. [Figure: gated inputs feeding a multiplexer-based circuit block that produces the module output.]
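
To see how a small multiplexer tree yields many functions, consider the behavioural sketch below. The gating and tree shape are illustrative stand-ins, not Actel's actual netlist.

def mux(sel, a, b):
    return b if sel else a

def logic_module(d0, d1, d2, d3, s0, s1, sa, sb):
    """Two-level mux tree; the top select is gated by (sa OR sb)."""
    lower0 = mux(s0, d0, d1)
    lower1 = mux(s1, d2, d3)
    return mux(sa | sb, lower0, lower1)

# Tying data and select inputs to constants or signals yields different
# functions from the same block, e.g. a 2-input AND:
a, b = 1, 1
print(logic_module(0, a, 0, 0, b, 0, 0, 0))  # a AND b -> 1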

As stated above, Actel’s interconnect is organized in horizontal routing channels. The chan-

nels consist of wire segments of various lengths with antifuses to connect logic blocks to wire
segments or one wire to another. Also, although not shown in Figure 25, Actel chips have vertical

wires that overlay the logic blocks, for signal paths that span multiple rows. In terms of speed-per-
formance, it would seem probable that Actel chips are not fully predictable, because the number

of antifuses traversed by a signal depends on how the wire segments are allocated during circuit

implementation by CAD tools. However, Actel provides a rich selection of wire segments of dif-
ferent length in each channel and has developed algorithms that guarantee strict limits on the

number of antifuses traversed by any two-point connection in a circuit which improves speed-per-

formance significantly.

2.3.5 Quicklogic pASIC FPGAs

The main competitor for Actel in antifuse-based FPGAs is Quicklogic, which has two fami-

lies of devices, called pASIC and pASIC-2. The pASIC-2 is an enhanced version that has only

recently been introduced, and will not be discussed here. The pASIC, as illustrated in Figure 27,

has similarities to several other FPGAs: the overall structure is array-based like Xilinx FPGAs, its

logic blocks use multiplexers similar to Actel FPGAs, and the interconnect consists of only long
lines like in Altera FLEX 8000.

Figure 27 - Structure of Quicklogic pASIC FPGA. [Figure: an array of logic cells surrounded by I/O blocks, with a ViaLink antifuse at every wire crossing; the antifuse is a sandwich of metal 2, amorphous silicon, and metal 1 over oxide.]

We note that the pASIC architecture is now independently devel-

oped by Cypress as well, but this discussion will focus only on Quicklogic’s version of their parts.

Quicklogic’s antifuse structure, called ViaLink, is illustrated on the left-hand side of Figure
27. It consists of a top layer of metal, an insulating layer of amorphous silicon, and a bottom layer

of metal. When compared to Actel’s PLICE antifuse, ViaLink offers a very low on-resistance of

about 50 ohms (PLICE is about 300 ohms) and a low parasitic capacitance. Figure 27 shows that

ViaLink antifuses are present at every crossing of logic block pins and interconnect wires, provid-

ing generous connectivity. pASIC's multiplexer-based logic block is depicted in Figure 28. It is more

complex than Actel's Logic Module, with more inputs and wide (6-input) AND-gates on the mul-

tiplexer select lines. Every logic block also contains a flip-flop.





Figure 28 - Quicklogic (Cypress) Logic Cell. [Figure: inputs A1-A6, B1-B2, C1-C2, D1-D2, E1-E2, and F1-F6 drive wide AND-gates and a multiplexer tree producing outputs AZ, OZ, NZ, and FZ, plus a flip-flop with set (QS), clock (QC), and reset (QR) producing QZ.]

2.3.6 Applications of FPGAs

FPGAs have gained rapid acceptance and growth over the past decade because they can be

applied to a very wide range of applications. A list of typical applications includes: random logic,

integrating multiple SPLDs, device controllers, communication encoding and filtering, small to
medium sized systems with SRAM blocks, and many more.

Other interesting applications of FPGAs are prototyping of designs later to be implemented in


gate arrays, and also emulation of entire large hardware systems. The former of these applications

might be possible using only a single large FPGA (which corresponds to a small Gate Array in
terms of capacity), and the latter would entail many FPGAs connected by some sort of intercon-

nect; for emulation of hardware, QuickTurn [Wolff90] (and others) has developed products that

comprise many FPGAs and the necessary software to partition and map circuits.

Another promising area for FPGA application, which is only beginning to be developed, is the

usage of FPGAs as custom computing machines. This involves using the programmable parts to

“execute” software, rather than compiling the software for execution on a regular CPU. The
reader is referred to the FPGA-Based Custom Computing Workshop (FCCM) held for the last

four years and published by the IEEE.

It was mentioned in Section 2.2.8 that when designs are mapped into CPLDs, pieces of the

design often map naturally to the SPLD-like blocks. However, designs mapped into an FPGA are
broken up into logic block-sized pieces and distributed through an area of the FPGA. Depending

on the FPGA’s interconnect structure, there may be various delays associated with the intercon-

nections between these logic blocks. Thus, FPGA performance often depends more upon how
CAD tools map circuits into the chip than is the case for CPLDs.

3 Summary and Conclusions


We have presented a survey of field-programmable devices, describing the basic technology that

provides the programmability and a description of many of the architectures in the current mar-
ketplace. This paper has not focussed on the equally important issue of CAD tools for FPDs.

We believe that over time programmable logic will become the dominant form of digital logic

design and implementation. The ease of access to these devices, principally through their low cost,

makes them attractive to small firms and small parts of large companies. The fast manufacturing

turn-around they provide is an essential element of success in the market. As architecture and

CAD tools improve, the disadvantages of FPDs compared to Mask-Programmed Gate Arrays will

lessen, and programmable devices will dominate.


4 Further Reading
A reasonable introduction to FPGAs can be found in the book:

[Brow92] S. Brown, R. Francis, J. Rose, Z. Vranesic, Field-Programmable Gate Arrays,


Kluwer Academic Publishers, May 1992.
A more specific discussion of three FPGA/CPLD architectures can be found in:
[Trim94] S. Trimberger, Ed., Field-Programmable Gate Array Technology, Kluwer
Academic Publishers, 1994.
More detailed discussion of architectural trade-offs can be found in:
[Rose93] J. Rose, A. El Gamal, A. Sangiovanni-Vincentelli, “Architecture of Field-
Programmable Gate Arrays,” in Proceedings of the IEEE, Vol. 81, No. 7, July 1993,
pp. 1013-1029.
A textbook-like treatment, including digital logic design based on the Xilinx 3000 series and the
Algotronix CAL chip can be found in:

[Oldf95] J. Oldfield, R. Dorf, Field Programmable Gate Arrays, John Wiley & Sons, New
York, 1995.

Up to date research topics can be found in several conferences (CICC, ICCAD, DAC) and in the
published proceedings of the FPGA Symposium Series: FPGA '95: The 3rd Int'l ACM Symposium
on Field-Programmable Gate Arrays, and FPGA '96. In addition, there have been international
workshops on Field-Programmable Logic in Oxford (1991), Vienna (1992), Oxford (1993), Prague
(1994), Oxford (1995), and Darmstadt (1996), some of the proceedings of which are published
by Abingdon Press, UK. Finally, there is the Canadian Workshop on Field Programmable
Devices, 1994, 1995, and 1996.

5 References
[Birk92] J. Birkner et al, "A very-high-speed field-programmable gate array using metal-to-
metal antifuse programmable elements," Microelectronics Journal, v. 23, pp. 561-568, 1992.
[Ham88] E. Hamdy et al, “Dielectric-based antifuse for logic and memory ICs,” IEEE
International Electron Devices Meeting Technical Digest, pp. 786 - 789, 1988.
[Marp94] David Marple and Larry Cooke, “Programming Antifuses in CrossPoint’s FPGA,”
Proc. CICC 94, May 1994, pp. 185-188.
[MAX94] Altera Corporation, “MAX+PlusII CAD Design System, version 5.0”, 1994.
[ORCA94] AT&T Corp., “ORCA 2C Series FPGAs, Preliminary Data Sheets,” April 1994.
[Wolff90] H. Wolff, “How QuickTurn is filling the gap,” Electronics, April 1990.

Acknowledgments

We wish to acknowledge students, colleagues, and acquaintances in industry who have helped

contribute to our knowledge.


PROGRAMMABLE CHIPS IN CONSUMER ELECTRONICS
AND TELECOMMUNICATIONS

Architectures and Design Technology

1. Introduction

Mobile and personal communication systems, and multi-media are among


the most prominent growth sectors of the electronics industry today. As
an illustration, Figure 1 gives an indication of the volume of some personal
communication applications in the European market. New business and
home applications are emerging, using advanced communication media such
as satellite links, cellular radio, or high-speed optical networks. The success
of these developments will however depend to a great extent on the ability
to realise complex digital signal processing functionalities in cost-efficient
VLSI chips.

Figure 1.
European market of personal communication systems (source: Elsevier Ad-
vanced Technology).

The design of these chips is subject to stringent requirements in terms


of processing performance and power dissipation. At the same time, strong
economical pressures exist, such as an increasingly shorter time-to-market,
or the desire to differentiate an existing design by adding new features to it.
These economical aspects necessitate a more flexible design paradigm that
supports specification changes in a late stage of the design cycle.
In this chapter we will discuss the concept of application-specific
instruction-set processors (ASIPs). ASIPs are an emerging design paradigm
that facilitates the implementation of complex, low-cost communication
and consumer electronics systems using VLSI integration. By integrating
an ASIP as a core component in a custom IC, programmability is introduced
(thus offering the desired flexibility in the design process), while maintain-
ing most of the advantages of customised VLSI architectures (such as the
potential to optimise the processing performance and power dissipation).
As such, ASIPs form a synergy between two previously distinct classes of
architectures.
The success of ASIPs will critically depend on the availability of a sup-
porting CAD environment, in the form of software tools for designing and
programming ASIPs. This chapter includes a discussion on CAD tool re-
quirements, and a survey of existing and new techniques for code generation
and ASIP design.

2. Traditional Architectures for Consumer and Communication


Systems

High-volume consumer or communication products have traditionally been


built by designing VLSI chips of which the architecture is customised to the
application. These chips are referred to as ASICs (application-specific
integrated circuits). The architecture of an ASIC typically consists of sev-
eral parallel, pipelined data-paths, whose topology matches the
structure of computations in the target application. Similarly, the ASIC's
memory structure matches the storage capacity and communication
bandwidth requirements of the application, and the control unit is tuned
to the application's control flow and speed requirements. Owing to this
architectural customisation, efficient solutions in terms of processing per-
formance and power dissipation can be obtained.
In the past decade, the CAD research community developed the concept
of high-level synthesis, to support the design process from behavioural spec-
ification to custom ASIC architecture. High-level synthesis tools are
now commercially available, e.g. from Mentor Graphics and Synopsys.
Next to ASICs, programmable processors such as standard micro-
processors or DSPs are being used in consumer and telecom applications
as well, in the form of off-the-shelf components. They were primarily
used for control functions (e.g. in telecom switching), for low-volume signal
processing functions, or to build (multi-processor) prototypes prior to the
design of an ASIC.

Until recently, programmable processors were not a viable alternative for


high-volume ASICs in applications like mobile or personal communicators.
However, with the advent of ASIPs, this picture is clearly changing. Before
discussing the concept of ASIPs in more detail, we will first introduce a
number of commonly used terms relating to processor architectures.

3. Terminology of Programmable Processors

3.1 CISC ARCHITECTURE

Processor architectures designed in the seventies and eighties were mainly of


the CISC (complex instruction-set computer) type. The instruction set of
a CISC contains many instructions that directly implement the constructs
available in high-level languages. As a result, a CISC has a large variety of
instruction formats. A CISC typically uses a shared data and programme
memory ("Von Neumann" concept) and has functional units that fetch their
operands directly from memory.
3.2. RISC AND VLIW ARCHITECTURE

In the eighties, two new processor types were introduced: RISC and VLIW.

RISC (reduced instruction-set computer) architectures are based on the


following principles. RISCs have a load-store architecture, meaning that
functional units get operands from, and produce results in addressable reg-
isters. Communication between memories and registers requires separate
"load" and "store" operations, which can often be pipelined with arith-
metic operations. Data and programme memory are separately accessible
("Harvard" concept). The load-store architecture used in RISCs has two
important consequences. First of all, it results in a small instruction set,
with fewer instruction formats. This implies that the control unit of a RISC
is less complex than that of a CISC. Secondly, the architecture can be intensively
pipelined: RISCs typically have more pipeline stages than CISCs. As a re-
sult of these two factors, clock frequencies can be significantly increased
which leads to a high processing performance.

VLIW (very long instruction word) architectures, on the other hand,


use a different concept. In this case, a larger number of functional units
operate in parallel. These units can be controlled independently, which im-
plies that the instruction word consists of a large number of bits, grouped
into orthogonal fields (also called instruction streams). Super-scalar ar-

chitectures are a special case, where the processor can dynamically
schedule pipelined instructions into parallel instruction streams.

3.3. DSP ARCHITECTURE

The above described concepts of CISC, RISC and VLIW, are applicable
to computer architectures for many broad application domains, such as
scientific computing, control, or digital signal processing. The term DSP
(digital signal processing) architecture is often used to designate a processor
architecture suited for the latter application domain, including e.g. audio,
speech processing and telecom applications.
DSPs may use any of the above concepts (CISC, RISC or VLIW). Usu-
ally a processor is classified as a DSP when it has the following features:
- It contains a parallel multiplier unit, in addition to the standard ALU.
This allows multiply-accumulate instructions to be executed at a rate of
one per machine cycle (see the sketch after this list).
- It has efficient memory and register structures, ensuring a high com-
munication bandwidth between the different functional units in the
processor's data path, and between data path and memory.
- Several addressing modes are available for the data memory.
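
The value of the single-cycle multiply-accumulate is easiest to see on a filter kernel. The Python sketch below is only an illustration of the data flow; on a DSP, each iteration of the inner loop would map onto one machine cycle.

# Inner loop of an N-tap FIR filter: one multiply-accumulate per tap.
def fir_sample(coeffs, history):
    acc = 0
    for c, x in zip(coeffs, history):
        acc += c * x  # the operation a DSP executes in a single cycle
    return acc

print(fir_sample([1, 2, 3], [4, 5, 6]))  # 1*4 + 2*5 + 3*6 = 32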

3.4. ASIP ARCHITECTURE

Since the nineties, a new class of architectures is emerging, called ASIP


(application-specific instruction-set processor). ASIPs are pro-
grammable processors of which the architecture and instruction set are cus-
tomised to a specific application, such as GSM or DECT terminals (cellular
and cordless phone), MPEG (image compression), or automotive real-time
control.
Compared to the class of DSP processors, discussed in Section 3.3,
ASIPs introduce a further degree of customisation. However, as illustrated
by the above examples, an ASIP can also be developed for non-signal pro-
cessing applications.
ASIPs typically have a small instruction set, containing:

- A selection of standard arithmetic, memory and control flow instruc-


tions that are useful in the application;

- A number of specialised instructions, e.g. a biquad section, a wave-


digital filter adaptor, or a full stage of a Viterbi decoding algorithm.
In this way the critical loops of the application code can be executed
in a minimal number of machine cycles (possibly one single cycle),
without excessive storage of intermediate values.

The term "ASIP" was first introduced by Sato for application-specific


integrated processor (rather than instruction-set processor). The prefix "integrated" re-
ferred to the use of these processors as an on-chip core component. Other authors use
the term "ASSP", standing for application-specific standard part, or application-specific
signal processor.

ASIPs are an intermediary between custom VLSI architectures and pro-


grammable processors. On the one hand, ASIPs provide field or mask pro-
grammability, which leads to increased design flexibility: late specifica-
tion changes can be accommodated by updating the application programme
running on the ASIP. On the other hand, by providing specialised functional
units and instructions that exploit the algorithmic parallelism, more effi-
cient solutions can be obtained than with a general purpose processor, like
e.g. a standard RISC or super-scalar processor or even an off-the-shelf DSP.
ASIPs typically operate at low supply voltages and low clock frequencies
to minimise power dissipation, while still meeting the required processing
performance (data throughput) of their target application(s).

4. ASIP Architectures for Consumer and Telecom Systems

In this section we will focus on ASIP architectures for consumer electronics


and telecom applications. We start by discussing the use of these ASIPs as
core processors. This is followed by an in-depth discussion of the architec-
tural characteristics of these ASIPs.
Within the consumer and telecom world, ASIPs are already being used
for applications with low to medium data rates, such as speech processing,
audio and data transmission. The concept may however extend to higher
rate applications as well: the first programmable video processors are emerging,
which contain more parallelism than contemporary processors.

4.1. INTEGRATION OF ASIPS INTO CUSTOMISED VLSI CHIPS

Many consumer systems, and especially mobile and personal communica-


tors, are compact portable devices. To make this miniaturisation possible,
there is an ongoing trend towards single-chip integration of complete elec-
tronic systems. Today, a GSM cellular radio terminal typically contains
three custom ASICs, but single-chip GSM terminals are expected to ap-
pear on the market within two or three years.
ASIP architectures, as defined in Section 3.4, will play an important role
in these future single-chip systems. Typically, ASIPs will be used as a core
processor that can be integrated as a component in an application-specific
IC. In the sequel, such an ASIP will be referred to as an ASIP core. The
ASIP core can be complemented with a number of other modules that are
integrated on the same chip. The result is a powerful, mask pro-
grammable device, referred to as a heterogeneous IC or soft chip. By virtue

note:
Such a mixed-signal chip would implement all signal processing and control functions,
excluding power amplifier circuits.

of its programmability, late specification changes can be accommodated in


the design of the chip, while still guaranteeing a cost-effective solution.
Figure 2. Block diagram of a GSM cellular radio terminal (bottom), and its
implementation in a heterogeneous IC.

The concept of heterogeneous ICs is illustrated in Figure 2. The IC


contains the following modules:

- An ASIP core, used as the central computation engine. The goal is to


implement as much as possible of the signal processing functions (e.g.
the source and channel coding/decoding functions in Figure 2) on the
ASIP core. In addition, the ASIP core can be used to implement control
functions (e.g. system management or the man-machine interface in
Figure 2), which in a more traditional design would be implemented
using a separate micro-controller.
The architecture of the ASIP core itself is designed together with the
first generation of the product. The core can then be reused in subse-
quent generations, or even in different products with a related func-
tionality. This approach is already being followed in major consumer
and telecom companies. An alternative approach would be to use the
core of a programmable DSP processor provided by a commercial ven-
dor. Certain vendors provide parameterisable versions of DSP proces-
sor cores for embedded applications. Compared to an in-house
developed ASIP core, these solutions imply less customisation, and
therefore a lower efficiency. Furthermore they may result in an unde-
sirable dependency on external suppliers.

- One or several hardware accelerators. These are specialised data paths,


that can be used to extend the instruction set of the ASIP core in
an application-specific way. Hardware accelerators are useful for two
reasons: to implement specialised operations that were not foreseen in
the original instruction-set of the ASIP, or to implement time-critical
functions of which the performance requirements cannot be realised
in programmable technology today. An example of the latter are the
modulation/demodulation functions in the GSM application, shown in
Figure 2. Accelerators may optionally have a local controller.

- Peripheral blocks which are mainly intended for communication pur-


poses. Examples are memories, timers, serial and parallel interfaces,
DMA controllers, analogue/digital and digital/analogue converters,
etc.

Data communication between the ASIP core and hardware accelerators


may occur in several ways, e.g. by connecting the ASIP's data port to an
on-chip data bus, by a serial interface, by means of DMA, etc.
The concept of heterogeneous ICs implies the use of silicon libraries,
containing increasingly complex, re-usable macro cells. For example, most
peripheral blocks have a standard functionality and can be provided in the
form of a library cell. Hardware accelerators can be designed separately
(possibly in a parameterised way), using logic or high-level synthesis tools,
and entered in libraries for later reuse. Even a complete ASIP core can be
considered as a macro cell, stored in the library. The task of the designer
then is to select and connect cells from the library, and to generate machine
code running on the ASIP core that implements the desired functionality.
In such an environment, CAD support is needed for several tasks:

- System-level design, i.e. partitioning an application into processor and


accelerator functions, and defining the appropriate communication
channels. These CAD issues are discussed in another chapter.

- Cell-level design, i.e. designing new ASIP architectures, implementing


the ASIP as a portable silicon macro cell, code generation for the
selected ASIP, and verification of the eventual solution. These CAD
issues are discussed in more detail in Sections 5 to 7.

4.2. ARCHITECTURAL CHARACTERISTICS OF CONSUMER AND


TELECOM ASIPS

As introduced in Section 3.4, ASIPs are processors of which the architecture


and instruction-set are tuned to a specific application. Compared to more
general-purpose processors, the architectural specialisation of an ASIP re-
sults in better area/performance and power/performance ratios. However,
as will be explained in Section 5, some of these architectural features may
complicate the task of programming the ASIP, and consequently of devel-
oping efficient code generation tools.
In this section we will discuss the architectural features of ASIPs for
consumer and telecom applications. This discussion will be based on six
main parameters, which are valid for more standard DSP processors as well.
With each parameter, we will indicate the typical values for our application
domain. These parameters are : data type, code type, memory structure, reg-
ister structure, instruction format, and finally the architectural peculiarities
of the processor. This discussion summarises our observations made during
recent cooperative projects with consumer and telecom companies.
Architectures of ASIPs, developed in systems industry, are normally
classified as sensitive information. Therefore, in the sequel we will often
refer to a hypothetical ASIP, of which the basic structure and a part of
the instruction set are shown in Figure 3. Furthermore, to illustrate the
use of the 6-parameter classification scheme, Table 1 also lists the most
important parameter values for a number of commercial DSP processors.
Figure 3. Example of an ASIP: (a) Basic structure; (b) Instruction set implemented in
the instruction decoder. For simplicity, only arithmetic operations are shown.

note:
The data path structure of this ASIP is largely similar to the ADSP-21xx processor.

4.2.1. Data type


ASIP cores for consumer and telecom applications normally support fixed-
point arithmetic only. The bit-width of functional units, busses and mem-
ories can be customised to the application. Note that general-purpose pro-
cessors often contain floating-point units, which result in a larger silicon
area and a higher power dissipation. Floating-point arithmetic can how-
ever be avoided relatively easily in the VLSI implementation of consumer
and telecom systems, without sacrificing numerical accuracy, by including
the appropriate scaling operations in the software programme.
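
A minimal sketch of this style of arithmetic, assuming a Q15 fixed-point format (1 sign bit, 15 fractional bits) purely for illustration:

FRAC_BITS = 15  # Q15: values are integers with an implied binary point

def to_q15(x):
    return int(round(x * (1 << FRAC_BITS)))

def q15_mul(a, b):
    # Full-precision integer product, then shift to restore the binary point;
    # this shift is the explicit "scaling operation" mentioned above.
    return (a * b) >> FRAC_BITS

a, b = to_q15(0.5), to_q15(-0.25)
print(q15_mul(a, b) / (1 << FRAC_BITS))  # -0.125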

4.2.2. Code type


Two basic code types are commonly distinguished in computer architec-
ture: microcode and macrocode, and ASIPs currently used in consumer
and telecom applications can be of either type.

- In the case of a microcoded processor, all instructions execute in ex-


actly one machine cycle. Every instruction word, fetched from the pro-
gramme memory, specifies all data path and memory actions to be
executed during the corresponding machine cycle. This is also termed
time-stationary coding.

- In the case of macrocoded processors, some instructions can execute in


multiple cycles. Every instruction word now specifies all data path and
memory actions that correspond to a specific transformation of data,
even if these actions take place in different machine cycles. This is
also termed data-stationary coding.

Both code types are illustrated in Figure 4, which shows the execution
of a small application consisting of multiply-accumulate sequences, imple-
mented on the multiplier-accumulator of Figure 3, assuming that an extra
pipeline stage has been added between the multiplier and accumulator unit.
Per sequence three operations have to be executed: an operand fetch, a
multiplication, and an accumulation (we assume that the result is kept in
the accumulator register MR). The processor allows a new sequence to be started
in every machine cycle. A macrocoded multiply-accumulate instruction is
shown, which executes in three cycles, thus controlling one complete se-
quence. Alternatively, a microcoded multiply-accumulate instruction would
execute in a single cycle, controlling the fetch, multiply and accumulate op-
erations of three consecutive sequences.
In the case of a microcoded processor, setting up and maintaining the
instruction pipeline is a responsibility of the programmer or the code gen-
erator. The resulting pipeline schedule is fully visible in the machine code
programme. In contrast, in a macrocoded processor these actions are per-
formed by the processor controller. Macrocoded processors may exhibit
pipeline hazards. Depending on the processor, pipeline hazards may
have to be resolved in the machine code programme (statically) or by means
of interlocking in the processor controller (dynamically). Macrocoded pro-
cessors with interlocking are relatively easy to programme, although it may
be more difficult for a designer to predict their exact cycle time behaviour.

Note: Hybrid code types do exist as well. For example, some processors contain
instructions that specify an operand fetch and an arithmetic operation that
start in the same machine cycle, but the arithmetic operation continues in the
next cycle. Such a hybrid code type is sometimes referred to as time-stationary
macrocode.

Figure 4. Different code types, illustrated for a multiply-accumulate instruction (b), on
a pipelined data path (a).

4.2.3. Instruction format


A distinction is made between orthogonal and encoded instruction formats.

- An orthogonal format consists of fixed control fields that can be set
independently from each other. For example, "Very Long Instruction
Word" (VLIW) processors have an orthogonal instruction format.

- In the case of an encoded format, the interpretation of the instruction
bits as control fields depends on the value of designated format bits
(e.g. instruction bits 1 and 2 in Figure 3(b)). The instruction decoder
translates instruction bits into control signals. Note that format bits
can also encode the opcode.

In the case of an orthogonal instruction format, the instruction bits within
a control field can be further encoded to reduce the field width. In our
classification, this will not be termed an encoded format however.
ASIP cores for consumer and telecom chips usually have a 16 to 32-bit
wide encoded instruction format. The application programme will normally
reside on-chip. By restricting the width of the instruction word, the chip
area devoted to the programme memory can be reduced. Moreover, should
the chip be field programmable, it is convenient to choose an instruction
width equal to the width of the chip's parallel data port (so that the instruc-
tions can be loaded via this port) and/or equal to the width of standard
memory components (used to store the application programme).
Instruction encoding restricts the parallelism offered by the processor.
A challenging task in the design of an ASIP is to determine an instruction
set that can be encoded using a restricted number of instruction bits, while
still offering highly parallel instructions for most critical functions in the
target application.
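
As an illustration of what the instruction decoder does with an encoded
format, the following C sketch decodes a hypothetical 16-bit instruction
word. The format bits and field layout are invented for the example and
do not correspond to Figure 3 : two format bits select how the remaining
fourteen bits are interpreted as control fields.

    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical 16-bit encoded instruction word : bits 15-14 are
       format bits that determine how the remaining 14 bits are to be
       interpreted as control fields. */
    static void decode(uint16_t iw)
    {
        switch (iw >> 14) {              /* format bits */
        case 0:   /* ALU format : opcode | dst | src1 | src2 */
            printf("ALU op=%u dst=%u src1=%u src2=%u\n",
                   (iw >> 10) & 0xFu, (iw >> 7) & 0x7u,
                   (iw >> 4) & 0x7u, iw & 0xFu);
            break;
        case 1:   /* load/store format : opcode | reg | address */
            printf("MEM op=%u reg=%u addr=%u\n",
                   (iw >> 11) & 0x7u, (iw >> 8) & 0x7u, iw & 0xFFu);
            break;
        default:  /* remaining formats : branch, immediate, ... */
            printf("other format\n");
            break;
        }
    }

    int main(void)
    {
        decode(0x1234);   /* format 0 : interpreted as an ALU instruction */
        decode(0x5234);   /* same low bits, format 1 : a memory access */
        return 0;
    }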

4.2.4. Memory structure


Consumer and telecom ASIPs have efficient memory and register struc-
tures, which ensure a high communication bandwidth between the differ-
ent data-path units, and between data-path and memory. In this section
we will discuss memory structures; register structures will be treated in
Section 4.2.5.
ASIPs typically have one or two data memories, which are placed on-
chip to reduce board cost, access time (allowing for single-cycle access) and
power dissipation. Data and programme memories are separately accessi-
ble ("Harvard" concept). Different data memory structures can be distin-
guished :

- Most ASIPs for consumer and telecom products have a load-store (also
called register-register) architecture. In a load-store architecture, all
data path operators get their operands from, and produce results in
addressable registers (e.g. AX,AY,AR in Figure 3(a)). Communication
between memories and registers requires separate "load" and "store"
operations, which may be scheduled in parallel with arithmetic opera-
tions if permitted by the instruction set (see Section 4.2.3).

- Memory-memory and memory-register architectures, although some-
times used in off-the-shelf DSPs, are less frequently encountered in
consumer/telecom ASIPs. In these architectures, some operands are
directly loaded from the data memory. An example is the TMS320C5x
DSP processor, which can load a multiplier operand from memory and
always stores the result in the accumulator register.

4.2.5. Register structure


We distinguish between homogeneous and heterogeneous register struc-
tures :

- The homogeneous case corresponds to a general-purpose register set,
of which all registers are interchangeable. This model is encountered
in most general-purpose micro-processors, as well as in many floating-
point DSPs. However, it is less common in ASIP cores.

- In order to increase the communication bandwidth, while keeping the
number of instruction bits low, consumer and telecom ASIPs mostly
have a heterogeneous register structure. This means that special-
purpose distributed registers and register files are used which are di-
rectly connected to specific ports of functional units. As a consequence,
functional units can only retrieve operands from and produce results
in designated registers. E.g. in Figure 3(a), the ALU's left operand can
come from registers AX, AR, MR1 or MR0.

During the design of an ASIP, determining the register structure is an
important task. The architecture should only contain those communication
paths that are useful to efficiently implement critical functions from the
target applications. Furthermore, writing machine code that exploits such
a heterogeneous register structure in an efficient way is non-trivial.

4.2.6. Architectural peculiarities


Architectures of ASIPs designed for consumer and telecom applications
usually exhibit a number of peculiarities that distinguish them from more
general-purpose processors and that do not fit in any of the previous char-
acteristics. The following is a non-exhaustive list of peculiarities encountered
in existing consumer and telecom ASIPs.

Specialised arithmetic instructions. As mentioned before, the instruction
set of an ASIP will contain specialised instructions, which allow critical
loops of the target algorithms to be executed in a minimal number of ma-
chine cycles and without excessive storage of intermediate values. The
topology of the ASIP's data path is especially designed for these algorithms.
Commonly encountered examples include : (1) ASIPs for mobile communi-
cation often contain a 16-bit ALU suited for Viterbi decoding. This ALU
supports the execution of two parallel 8-bit additions, as occurring in a
Viterbi butterfly function. In this case, the instruction set will contain a
special add instruction in which the carry flow between bits 8 and 9 is
interrupted. (2) ASIPs for speech or audio processing often contain spe-
cialised data paths for filter applications. For example, a complete biquad
section, or a wave-digital filter adapter, could be implemented in a single
machine cycle.
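
The split add of example (1) can be sketched in C as follows (a generic
software emulation of the idea, not the instruction of any particular
ASIP) : the carry between the two 8-bit halves of a 16-bit addition is
suppressed, so that the ALU effectively performs two independent 8-bit
additions.

    #include <stdint.h>
    #include <stdio.h>

    /* Emulate a 16-bit add in which the carry between the lower and
       upper bytes is interrupted, yielding two parallel 8-bit additions
       as used in a Viterbi butterfly. */
    static uint16_t add_split8(uint16_t a, uint16_t b)
    {
        uint16_t lo = (uint16_t)((a + b) & 0x00FFu);  /* low byte, carry discarded */
        uint16_t hi = (uint16_t)(((a & 0xFF00u) + (b & 0xFF00u)) & 0xFF00u);
        return (uint16_t)(hi | lo);
    }

    int main(void)
    {
        /* 0x01FF + 0x0101 : an ordinary add gives 0x0300 (the carry
           ripples into the high byte); the split add keeps both halves
           independent. */
        printf("normal : %04X\n", (unsigned)((0x01FF + 0x0101) & 0xFFFF)); /* 0300 */
        printf("split  : %04X\n", (unsigned)add_split8(0x01FF, 0x0101));   /* 0200 */
        return 0;
    }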

Multiple addressing modes. Data memories are normally provided with
one or two address generation units that support immediate, direct and in-
direct addressing modes. Specialised address operations are available, such
as modulo counting to implement circular buffers for filter applications, and
counting with reversed carry propagation for FFT applications.
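
The following C fragment sketches what these two address operations
compute (a software rendering of the hardware address generators, with
an invented buffer length of eight) : a modulo-incremented pointer for a
circular buffer, and a reversed-carry increment that visits the data of an
8-point FFT in bit-reversed order.

    #include <stdio.h>

    #define LEN 8   /* invented circular buffer length */

    /* Modulo counting : step through a circular buffer, as used for
       the delay line of a filter. */
    static int modulo_next(int addr) { return (addr + 1) % LEN; }

    /* Reversed-carry increment for an n-point FFT (n a power of two) :
       add 1 starting at the most significant address bit and let the
       carry propagate downwards instead of upwards. */
    static unsigned bitrev_next(unsigned addr, unsigned n)
    {
        unsigned bit = n >> 1;
        while (bit && (addr & bit)) { addr &= ~bit; bit >>= 1; }
        return addr | bit;
    }

    int main(void)
    {
        unsigned a = 0;
        for (int i = 0; i < 8; i++) {   /* prints 0 4 2 6 1 5 3 7 */
            printf("%u ", a);
            a = bitrev_next(a, 8);
        }
        printf("\n%d\n", modulo_next(LEN - 1));   /* wraps back to 0 */
        return 0;
    }
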
Bit manipulation. Speech and audio ASIPs often contain bit manipula-
tion units to implement specialised saturation characteristics. Furthermore,
these ASIPs usually support a range of fixed-point data types (e.g. two dif-
ferent 16-bit two's complement types with different binary point interpre-
tations, a 32-bit two's complement type in an accumulator, and an 8-bit
integer type in the address generation units). Data type conversions are
supported by the data path hardware. A prominent example of the latter
is the multi-word register MR in Figure 3. This 32-bit result register can be
broken up into two 16-bit registers (MR0, MR1) which can be individually
addressed as the source of some arithmetic or move operations.
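
A software model of such a saturation characteristic, together with the
splitting of a 32-bit accumulator into two 16-bit halves, might look as
follows (a generic sketch; the MR0/MR1 split is mimicked with shifts and
masks).

    #include <stdint.h>
    #include <stdio.h>

    /* Saturating 16-bit add : clip to the two's complement range
       instead of wrapping around, as a bit manipulation unit would. */
    static int16_t sat_add16(int16_t a, int16_t b)
    {
        int32_t s = (int32_t)a + (int32_t)b;
        if (s >  32767) return  32767;
        if (s < -32768) return -32768;
        return (int16_t)s;
    }

    int main(void)
    {
        printf("%d\n", sat_add16(30000, 10000));   /* 32767, not wrapped */

        /* Accessing the halves of a 32-bit accumulator, in the style
           of the MR register split into MR1 (high) and MR0 (low). */
        int32_t  mr  = 0x12345678;
        uint16_t mr1 = (uint16_t)((uint32_t)mr >> 16);   /* 0x1234 */
        uint16_t mr0 = (uint16_t)(mr & 0xFFFF);          /* 0x5678 */
        printf("MR1=%04X MR0=%04X\n", (unsigned)mr1, (unsigned)mr0);
        return 0;
    }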

Control flow. Most ASIPs support standard control flow instructions, like
conditional branching based on bit values in the condition code register.
However, several measures are usually taken to guarantee good performance
in the presence of control flow.

- Branch penalties are usually small, i.e. 0 or 1 cycles. This term refers
to the delay incurred in executing a branch due to the instruction
pipeline.

- Many ASIPs have zero-overhead loop instructions. These allow the
body of a repetitive algorithm to be executed without spending separate
cycles for loop control. This feature is essential for many time-critical
applications.

- Several arithmetic or move instructions are conditionally executable.
This allows conditional algorithms to be implemented without the over-
head of conditionally loading the programme counter.

- Some arithmetic instructions can be residually controlled. In this case
the behaviour of the instruction depends on the actual bit values in a
residual control register, which can be written by other operations.

- Finally, standard interrupt instructions are available. In some cases,
specialised context saving mechanisms like register shadowing are used
to minimise context switching times.

5. Design Technology Requirements for ASIPs

As explained in Section 4, ASIPs are a promising new paradigm to design
complex ICs for highly competitive markets like consumer electronics and
telecom. The success of this concept will however depend on the availability
of efficient and reliable CAD tools, to support the design and programming
of ASIPs.
CAD support for ASIPs is currently not commercially available. Over
the past two to three years, a growing number of research activities have
been reported in this area. These efforts are mainly situated in the
intersection between two existing research disciplines : software compilation
and hardware synthesis.
Figure 5. Components of a design environment supporting the use of ASIPs.

With respect to the use of an ASIP, the following major design tasks
can be identified (Figure 5):

ASIP architecture design. This task refers to the definition of the
instruction-set architecture of a new ASIP, based on a specification of a
set of benchmark functions which are typical representatives of the target
applications. Ideally, these benchmarks are described in a high-level pro-
gramming language (such as C). Based on an analysis of these descriptions,
the ASIP's data path, memory structure, instruction set, and peripherals
can be defined. Constraints on execution speed, area and power dissipa-
tion have to be taken into account. This task is discussed in more detail in
Section 7.

Retargetable code generation. The task of code generation is to translate
a behavioural description of a signal processing algorithm (e.g. a C pro-
gramme) into assembly or binary code that can be executed on the target
ASIP. A retargetable code generator differs from a traditional software com-
piler in that it can address a range of different target processors. This
aspect of retargetability is essential, since the designer may want to
design his/her own ASIP or propose modifications of an existing ASIP. To
that effect a retargetable code generator reads a specification of the target
processor, e.g. in a formal processor description language. More details are
provided in Section 6.

Instruction-set simulation. The purpose of this task is to simulate an as-
sembly or binary code programme in an instruction-cycle accurate way.
This programme may have been produced by the code generator. The sim-
ulator uses a formal specification of the target ASIP, preferably the same
specification that serves as an input to the retargetable code generator,
with behavioural models for the operations in the instruction set. The
simulation should execute fast enough, i.e. at a rate of a few thousand
instructions per second. For the eventual verification of a complete hetero-
geneous IC, an optional coupling between the instruction-set simulator and
a VHDL based simulator can be very useful.

ASIP implementation. When the ASIP architecture satisfies all func-
tional requirements, a silicon implementation of the ASIP has to be made
that satisfies certain clock rate requirements. The ASIP may be designed
in the form of a parameterisable macro cell that can be added to a library.
The ASIP must be made testable; usually this aspect is only considered
after the ASIP's basic structure and instruction set have been designed.
In the next two sections, we present an overview of existing work on
methodologies and tools for ASIPs. The main focus will be on retargetable
code generation (Section 6). Some of the approaches that are quoted below
were not originally developed in the context of ASIPs as sketched in this
chapter. In all cases mentioned, it is the authors' belief that an adaptation
to the ASIP context is possible.

6. Code Generation for ASIPs

Code generators start from an algorithmic description in a high-level lan-
guage. For signal processing applications, procedural languages (e.g. C) or
applicative languages (e.g. DFL) are mostly used. In an applicative lan-
guage, no execution order of operations is specified, except for the implicit
ordering caused by data dependences. In a procedural language, additional
ordering constraints can be imposed by the sequence of statements in the
description.
Signal processing algorithms have designated input and output vari-
ables. A software compiler may apply transformations to the algorithm
provided that the input/output behaviour does not change. The process of
software compilation is traditionally divided into a front-end and a back-
end :

- In the front-end, high-level transformations are carried out, whereby
the input description is transformed into an intermediate form. Well-
known intermediate forms include the static single assignment form
(SSA form) and the control/data flow graph (CDFG). The
front-end includes a data flow analysis to determine all required
data dependences in the algorithm. Often a set of standard, processor-
independent transformations is used, to reduce e.g. the number of oper-
ations, or the sequentiality of the description.
- The back-end performs the actual code generation, whereby the inter-
mediate form is mapped onto the instruction set of the target processor.
Figure 6. Different phases and design representations in the Chess/Checkers environ-
ment for retargetable code generation and instruction-set simulation.

The discussion in this chapter will be restricted to the code generation part.
As an illustration, Figure 6 shows the Chess retargetable code generation
environment, which is currently under development at IMEC. The code
generation process is traditionally divided into a number of phases. For the
ASIPs envisaged in this chapter, it is useful to distinguish the following
code generation phases :

- Code selection : The operations in the intermediate form are covered
with patterns that correspond to partial instructions supported by the
instruction set.

- Register allocation : Intermediate computation values are bound to
registers or memories. As a result, the previously identified patterns
are extended with read and write operations. If necessary, additional
transfer and memory load or store operations are added (e.g. for reg-
ister spills).

- Scheduling : Partial instructions that can execute in parallel are
grouped into complete instructions, and assigned to machine cycles.

Although these different phases are often solved in different passes of the
code generation trajectory, it is well known that they are strongly depen-
dent, especially in the case of parallel architectures. This is referred to
as phase coupling. In order to generate high quality code, each code
generation phase should take the impact on other phases into account.

Note: A distinction is sometimes made between determining a valid ordering
of partial instructions (called scheduling) and the actual merging of partial
instructions into complete instructions (called compaction).

6.1. TRADITIONAL COMPILER TECHNIQUES

In the software compiler community, the problem of code generation has
been addressed extensively during the seventies and eighties. Several
basic techniques for code selection, register allocation, and compaction are
well understood today. Also in the microprogramming community, several
basic techniques have been developed. A complete survey of existing tech-
niques for software compilation and microprogramming is beyond the scope
of this chapter. We will only summarise the main directions, and indicate
results that are of interest in the context of ASIPs. Although software com-
pilers tend to use representations like the SSA form, in the sequel we will
assume that the application is specified as a CDFG.

6.1.1. Code selection


In compilers for CISC architectures, code selection has always been one
of the most critical phases. Initial code selection techniques were therefore
primarily targeting CISCs, but were later applied to RISCs and microcoded
machines as well. Note that in the case of CISCs, which always have a
memory-memory or a memory-register structure, the code selection phase
also generates the register allocation and the schedule as a by-product.
A standard approach to code selection is based on tree pattern matching
and covering. The processor's instruction set is represented as a data
base of template patterns, each having the form of a CDFG tree. The code
selection problem then amounts to finding a complete cover of the applica-
tion CDFG with patterns from the template base. The idea of casting code
selection as a tree pattern matching problem is illustrated for our example
ASIP in Figure 7.
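
A minimal sketch of this covering scheme is given below (invented pattern
base and unit costs, homogeneous registers assumed) : each node of a small
expression tree receives the cheapest cost over all patterns whose root
matches it, computed recursively in the style of the dynamic programming
formulation.

    #include <stdio.h>

    /* Expression tree node of a toy CDFG : '*', '+' or a leaf 'v'. */
    typedef struct Node { char op; struct Node *l, *r; } Node;

    /* Invented pattern base :
         MUL (*)                           cost 1
         ADD (+)                           cost 1
         MAC (+ whose left child is a *)   cost 1, covers both nodes */
    static int cover(const Node *n)
    {
        if (!n || n->op == 'v') return 0;              /* leaves are free */
        int best = 1 + cover(n->l) + cover(n->r);      /* single-op pattern */
        if (n->op == '+' && n->l && n->l->op == '*') { /* try the MAC pattern */
            int mac = 1 + cover(n->l->l) + cover(n->l->r) + cover(n->r);
            if (mac < best) best = mac;
        }
        return best;
    }

    int main(void)
    {
        /* (a*b) + c : coverable by one MAC instead of MUL plus ADD. */
        Node a = {'v', 0, 0}, b = {'v', 0, 0}, c = {'v', 0, 0};
        Node m = {'*', &a, &b};
        Node s = {'+', &m, &c};
        printf("minimum cover cost = %d\n", cover(&s));   /* prints 1 */
        return 0;
    }
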
Several algorithms are available to solve this tree pattern matching and
covering problem. Usually a dynamic programming algorithm is used, as
proposed initially by Aho and Johnson and reused in several other pub-
lications. However, in the context of code generation for ASIPs,
this method has two drawbacks :

- It can be shown that the dynamic programming technique produces
an optimal solution to the code selection problem, if the processor
has a homogeneous register structure. As mentioned in Section 4.2,
this condition is often not satisfied for ASIPs. In the latter case the
technique is still applicable but may produce a suboptimal result.

- The method presupposes that both the legal instruction patterns and
the application programme take the form of trees. Again, these con-
ditions may not be satisfied in the case of ASIPs : instructions in an
ASIP may contain reconvergent and even cyclic paths. Likewise, CD-
FGs representing the application are generally graphs and not trees.

Figure 7. Code selection as a tree pattern matching problem: (a) Template pattern
base, derived for the ASIP of Figure 3; (b) CDFG of a symmetrical FIR filter, with a
possible cover using the available pattern base.

Another approach to code selection was presented by Davidson.
This method consists of an initial expansion step in which the applica-
tion programme is translated into register-transfer statements, followed by
a so-called combiner step which heuristically combines successive register
transfers into instructions. The combiner compares the possible combina-
tions with the instruction set, for which a register transfer model is assumed
to be available.

6.1.2. Register Allocation


Register allocation is an important problem especially for RISC-like archi-
tectures. The reason is that RISCs have load-store memory structures (see
Section 4.2.4), so the compiler must select registers for storing operands
and results of each partial instruction.
In the case of processors with a homogeneous register structure, the reg-
ister allocation problem is usually formulated as a graph colouring problem
on an interference graph, of which the nodes correspond to lifetime in-
tervals of values. In this problem it is assumed that the execution order of
the partial instructions is fixed (i.e. scheduling has been performed prior
to register allocation). Colouring is done by means of dedicated heuristics.
The problem is complicated by the fact that the available storage capacity
in a register bank is limited. When the maximum capacity is reached, a
standard solution is to spill values to memory. Efficient global algorithms
for register spilling have been proposed by Chaitin. Briggs developed a
technique to reduce the register load by re-computation of values.
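
The colouring step can be sketched as follows (an invented four-value
interference graph and two registers; Chaitin's method additionally
simplifies low-degree nodes first, which this sketch omits) : each value
gets the lowest register not used by an already coloured neighbour, and
values that cannot be coloured are marked for spilling.

    #include <stdio.h>

    #define N 4   /* lifetime intervals (values) */
    #define K 2   /* available registers */

    /* Interference graph : adj[i][j] = 1 if values i and j are live
       simultaneously and therefore need different registers. */
    static const int adj[N][N] = {
        {0, 1, 1, 0},
        {1, 0, 1, 0},
        {1, 1, 0, 1},
        {0, 0, 1, 0},
    };

    int main(void)
    {
        int colour[N];
        for (int i = 0; i < N; i++) {
            int used[K] = {0};
            for (int j = 0; j < i; j++)           /* registers taken by  */
                if (adj[i][j] && colour[j] >= 0)  /* coloured neighbours */
                    used[colour[j]] = 1;
            colour[i] = -1;                       /* -1 means : spill */
            for (int c = 0; c < K; c++)
                if (!used[c]) { colour[i] = c; break; }
            if (colour[i] < 0) printf("value %d -> spilled to memory\n", i);
            else               printf("value %d -> register R%d\n", i, colour[i]);
        }
        return 0;
    }
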
Again, the above methods cannot be applied immediately to ASIPs. As
mentioned above, the ASIPs considered in this chapter have heterogeneous
register structures. In this case, not every register is directly accessible by
a functional unit. Therefore the phase coupling between register allocation
and scheduling becomes essential.

6.1.3. Scheduling
Scheduling (or compaction) is an essential task for architectures that have
parallelism in their instructions, or that exhibit pipeline hazards (e.g.
RISCs, VLIWs). In these cases the code selection phase normally delivers
only partial instructions, which can be further combined by a scheduling
tool. In the microprogramming community, the set of partial instructions
after code selection is usually called vertical microcode, while the final in-
structions after scheduling are referred to as horizontal microcode.
Scheduling algorithms are usually based on list scheduling. Initial
scheduling algorithms operated only at the basic block level. More
global scheduling algorithms have been proposed as well, such as trace
scheduling and software pipelining.
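
A minimal list scheduler might look as follows (an invented four-operation
dependence graph, two issue slots per cycle, unit latencies) : per machine
cycle, ready operations, i.e. those whose predecessors completed in earlier
cycles, are issued until the slots are exhausted.

    #include <stdio.h>

    #define N 4
    /* dep[i][j] = 1 if operation i must complete before j may start. */
    static const int dep[N][N] = {
        {0, 1, 1, 0},   /* op0 feeds op1 and op2 */
        {0, 0, 0, 1},   /* op1 feeds op3 */
        {0, 0, 0, 1},   /* op2 feeds op3 */
        {0, 0, 0, 0},
    };

    int main(void)
    {
        int cycle[N], done = 0, t = 0;
        for (int i = 0; i < N; i++) cycle[i] = -1;   /* unscheduled */

        while (done < N) {
            int slots = 2;               /* partial instructions per cycle */
            for (int i = 0; i < N && slots > 0; i++) {
                if (cycle[i] >= 0) continue;
                int ready = 1;
                for (int p = 0; p < N; p++)
                    if (dep[p][i] && (cycle[p] < 0 || cycle[p] >= t))
                        ready = 0;       /* predecessor not yet finished */
                if (ready) { cycle[i] = t; done++; slots--; }
            }
            t++;
        }
        for (int i = 0; i < N; i++)      /* op0:0  op1:1  op2:1  op3:2 */
            printf("op%d -> cycle %d\n", i, cycle[i]);
        return 0;
    }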

6.2. RECENT APPROACHES TO RETARGETABLE CODE GENERATION

The code generation techniques described in the previous section were pri-
marily developed for general purpose processors. Complementary to this
work, new research activities into the problem of code generation for ASIPs
have started in the past few years. These new activities are justified by
the following reasons :

- For more specialised architectures, standard software compilation tech-
niques do not always produce acceptable results. This can be illustrated
by investigating the C compilers that are available with most com-
mercial fixed-point DSPs today. It is well known that these compilers,
which employ standard software compilation techniques, often produce
inferior code quality : they cannot sufficiently exploit the features
of the target architecture. As a result, industrial DSP system design
groups are today still spending a lot of time in manual assembly coding.
These observations can easily be extended to consumer and telecom
ASIPs.

- Notwithstanding some initial investigations, the problem of architec-
tural retargetability has never been fundamentally solved in the com-
piler community. As motivated in Section 5, retargetability is a basic
requirement in an ASIP environment.

Note: A number of commercial compilers for fixed-point DSPs are based on
standard software compilation techniques, enhanced with processor-specific
heuristics in the various code generation phases. These heuristics result in
an improved code quality, but they make the compiler non-retargetable.

In this section we survey some of the recent work on retargetable code
generation.

6.2.1. ASIP modelling for retargetability


A key aspect in making a code generator retargetable is to use an efficient
and powerful model to represent the characteristics of the processor. This
model should be used in all code generation phases, and even in other tools
such as an instruction-set simulator.
A first approach, also used in software compilation and microprogram-
ming, is to represent the target processor by enumerating all possible partial
instructions. This approach has been adopted in some code
generators for ASIPs, e.g. by Paulin. Such a pattern base only
describes a part of the instruction set however. In Paulin's approach it
is therefore complemented by an abstract netlist describing the data path
topology, and by a definition of register classes capturing the heterogeneity
of the register structure.
Nowak extracts a so-called connection-operation graph from a structural
description. This description is a detailed netlist of the processor, in-
cluding its controller and instruction decoder. A limitation of this approach
is that the netlist is often not available. Even when the ASIP is developed
in-house, it may be desirable to run the retargetable code generator at an
early stage of the ASIP design process, i.e. before the detailed netlist is
constructed. Furthermore, Nowak's approach can only deal with resource
conflicts that can be represented as encoding conflicts on the instruction
set. For some types of hardware conflicts, the latter is not possible.
Van Praet uses a graph-based processor model, called instruction-set
graph (ISG). The ISG can capture both behavioural (instruction
set) and structural (register structure, pipeline behaviour, structural haz-
ards) information, in a single model. The abstraction level is comparable to
programmers' manuals of processors. This single model is used by all Chess
code generation tools and by the instruction set simulator Checkers. Fig-
ure 8 shows the ISG representation of the ASIP of Figure 3. Vertices in
the ISG correspond to storage elements or to micro-operations. Both ob-
jects are annotated with their enabling condition to indicate the instruction
format they belong to.
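
Purely as an illustration of the kind of information such a model holds
(the field names below are invented, and do not render the actual Chess
data structures), an ISG vertex might be represented as follows :

    #include <stdint.h>

    /* Hypothetical rendering of an instruction-set-graph vertex. Each
       vertex is either a storage element or a micro-operation, and
       carries the enabling condition (required values of designated
       instruction bits) under which it is active. */
    typedef enum { STORAGE, MICRO_OP } VertexKind;

    typedef struct ISGVertex {
        VertexKind         kind;
        const char        *name;          /* e.g. "MR", "mul", "AX" */
        uint32_t           enable_mask;   /* which instruction bits matter */
        uint32_t           enable_value;  /* required values of those bits */
        struct ISGVertex **inputs;        /* connectivity : operand sources */
        int                n_inputs;
    } ISGVertex;

    /* A vertex takes part in the execution of instruction word iw iff
       its enabling condition is satisfied. */
    static int is_active(const ISGVertex *v, uint32_t iw)
    {
        return (iw & v->enable_mask) == v->enable_value;
    }

    int main(void)
    {
        ISGVertex mul = { MICRO_OP, "mul", 0x3u, 0x1u, 0, 0 };
        return !is_active(&mul, 0x5u);    /* 0x5 & 0x3 == 0x1 : active */
    }
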
Freericks proposed a processor specification language called nML.
Similar to the ISG model, nML captures the behaviour at the level
of programmers' manuals. nML is however not a graph but an attributed
grammar. The grammar's production rules define the composition of the in-
struction set. The semantics of the instructions are captured by attributes.
In the latest version of nML, it is also possible to combine behavioural
and structural information similar to the ISG. A code generator has been
described whose different phases use specialised views of the processor,
derived from an nML specification.

Note: The "best" form of retargetability offered by a traditional compiler today
is probably the portability of the GCC compiler, which requires a significant
intervention by the user.

Figure 8. ISG representation of the example ASIP.

6.2.2. Code selection

The tree pattern matching/covering technique, described in Section 6.1,
is directly applicable when both the application CDFG and the legal in-
struction patterns are trees. As explained before, this condition may not be
satisfied in the case of ASIPs. In [2], a method was proposed to partition
the application graph into different trees, so that the dynamic programming
algorithm can be applied to the individual trees, and to combine the results
afterwards by allocating registers for values that are shared among trees.
This method has been applied in several recent code generators for ASIPs,
e.g. by Liem and Fauth.

Another drawback of tree pattern matching/covering is that it can only
produce optimal results for homogeneous register structures. In the case
of heterogeneous structures, phase coupling with register allocation may
become important. Extensions for this case have been described by Wess
and Araujo. Wess considers tree-based heterogeneous structures.
His method combines the code selection and register allocation phases. The
covering problem is reformulated as a path search problem in trellis trees
(that can be solved optimally), followed by a separate compaction phase.

The above methods rely on processor models in which all patterns cor-
responding to legal partial instructions are enumerated in advance. Van
Praet proposed a bundling technique, in which the required patterns are
constructed on the fly during code selection. Whether or not such a
pattern is legal can be derived from the ISG model. A comparable ap-
proach is used by Nowak, who uses the connection-operation graph to
check patterns.

6.2.3. Register allocation


As already mentioned in Section 6.1, efficient phase coupling mechanisms
are essential in code generation for ASIP architectures with heterogeneous
register structures. Finding the best register to store intermediate computa-
tion values in a heterogeneous architecture, with restricted storage capacity,
is non-trivial. As an illustration, Figure 9 shows a number of alternative
solutions for the multiplication operand of the symmetrical FIR filter ap-
plication, implemented on our target ASIP (see also Figure 7). Extra data
transfers may even be inserted to route values between functional units. To
determine the best solution, scheduling information is needed.
Figure 9. Three alternative register allocations for the multiplication operand in the
symmetrical FIR filter. The route followed by the operand is indicated by bold arrows :
(a) Storage in AR; (b) Storage in AR followed by MX; (c) Spilling to memory. The latter
two alternatives require extra transfers.

Clearly the register allocation methods of Section 6.1 cannot be di-
rectly applied. This was noticed already by compiler developers for VLIW
architectures. Ellis proposed a method for combined register allocation and
scheduling in VLIWs. However, to cope with the increased compu-
tational complexity, his method only uses local, greedy search techniques,
which typically lack the power to identify good candidate values for spilling
to memory. Rimey and Hartmann have further elaborated on Ellis'
approach, in their schedulers for ASIP architectures. Wilson proposed an
integer-programming based scheduler that includes register allocation.
Liem uses the concept of register classes (see Section 6.2.1) and applies
traditional register allocation techniques to each register class separately.

Lanneer presented a register allocation technique for ASIPs with hetero-
geneous registers and a load-store architecture. This method supports
many different schemes to route values between functional units. It starts
from an unscheduled description. Phase coupling with scheduling is sup-
ported, by the use of probabilistic scheduling estimators during the register
allocation process.

7. Other Tasks in an ASIP Design Environment

7.1. DESIGN OF ASIP ARCHITECTURES

Hitherto, the design of industrial ASIPs has largely been a manual task.
Whereas the selection of functional units for a given application may be rea-
sonably straightforward, defining efficient memory and register structures
is non-trivial. Moreover, the design of the actual instruction set involves
a careful tradeoff between the parallelism that will be offered (which de-
termines the ASIP's performance and also its power dissipation) and the
width of the instruction word and programme memory. In reality, designers
sometimes start from an existing processor architecture that is modified for
new applications. In this case only part of the search space will be explored,
leading to sub-optimal solutions. A first aid to ASIP designers would be an
environment in which :

- The architectural parameters defined by the designer can be described
easily, at a sufficiently high abstraction level;

- Architectural decisions can be evaluated rapidly.

Such a methodology has been described by Van Praet. In this approach
the designer can describe a high-level view of the ASIP's structure and
instruction set, using a graph-based processor model (see also Section 6.2).
Evaluations of the architecture are made by calling a retargetable code
generator that operates on the processor model, and produces compile-time
diagnostics to the designer.

Note: This tradeoff is typical for any processor architecture (not only ASIPs).
Designers of the MIPS micro-processor quoted the so-called "1 %"-rule,
saying that a new instruction could only be added to an already existing
version of the instruction set if this would result in a performance
improvement by at least 1 %.
Several authors have explored the semi-automatic definition of ASIP
architectures for a given application. Alomary's approach is to start from a
generic ASIP model. This is a processor with a maximal instruction set,
consisting of a fixed set of basic instructions extended with more sophis-
ticated instructions that are compositions of the basic ones or correspond
to special hardware blocks. A tool then selects a subset from the maximal
instruction set. The latter problem is solved using a branch-and-bound al-
gorithm, based on user-defined constraints and metrics (e.g. performance
optimisation under area and power constraints). Holmer presented a tech-
nique to define macrocoded ASIP architectures using a code compaction
technique. In this approach, the designer defines the ASIP's data path,
the instruction-invariant behaviour of the instruction pipeline, and an up-
per bound on the allowed instruction width. The tool then determines the
actual instruction set, optimising the processor performance.
Huang presented a method similar to Holmer's that uses different com-
piler techniques. Moreover, his method can also automatically deter-
mine certain parameters of the data path, such as the number of parallel
functional units. Goossens presented an architectural synthesis system for
microcoded ASIPs with orthogonal instruction formats. The user spec-
ifies the required functional units in the data path. The tool then optimises
the actual register structure, within the scope of a restricted data path
model.

7.2. SYSTEM DESIGN ASPECTS

To conclude this chapter, we will turn our attention to a number of system
design aspects that are crucial in the context of ASIPs.

7.2.1. Low power design

One of the key motivations for using an ASIP is to reduce power dissipa-
tion. Although the use of power models is mentioned by researchers who
investigated the automatic optimisation of ASIP instruction sets (see e.g.
[4] and Section 7.1), no systematic methods to design low-power ASIPs
have been published hitherto. At the architectural level, low power ASIPs
for consumer and telecom applications usually employ a combination of
different techniques such as :
- Providing enough instruction-level parallelism in the architecture, such
that the supply voltage and the clock frequency can be reduced.

- Latching of the operands of functional units, to retain previous operand
values when units are idle.

- Providing power-down instructions, which allow switching to a slow-
clock idle state under software control.

Furthermore, power-friendly circuit design techniques and technologies are
used.
Compiler optimisations, like those used in the code generation phases of
Figure 6, normally aim at minimisation of execution time or machine code
size. An interesting question is whether these optimisation techniques could
be steered by power models. Power modelling of instruction sets for micro-
processors and DSPs has been explored in recent papers by Tiwari and Lee.
Based on experimental energy measurements for a number of exist-
ing processors, it is argued that for general purpose microprocessors energy
consumption can be estimated reasonably accurately, by simply adding
energy contributions associated with the individual instructions that oc-
cur in the machine code. For DSPs however, the estimator should take
into account all sequences of consecutive instructions in the machine code.
Therefore, more sophisticated models are required that model the energy
of instruction sequences.
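
A sketch of such an estimator is given below (invented energy numbers;
a base cost per instruction plus a pairwise overhead for each pair of
consecutive instructions, in the spirit of the models of Tiwari and Lee).

    #include <stdio.h>

    enum { NOP, MUL, MAC, LOAD, N_OPS };

    /* Invented per-instruction base energies (nJ)... */
    static const double base_nj[N_OPS] = { 0.5, 2.0, 2.5, 1.5 };

    /* ...and invented inter-instruction overheads (nJ), modelling the
       extra energy of switching between instruction types. */
    static const double overhead_nj[N_OPS][N_OPS] = {
        {0.0, 0.3, 0.3, 0.2},
        {0.3, 0.0, 0.1, 0.4},
        {0.3, 0.1, 0.0, 0.4},
        {0.2, 0.4, 0.4, 0.0},
    };

    /* Sum the base cost of every instruction and the overhead of every
       pair of consecutive instructions in the machine code. */
    static double estimate(const int *code, int n)
    {
        double e = 0.0;
        for (int i = 0; i < n; i++) {
            e += base_nj[code[i]];
            if (i > 0) e += overhead_nj[code[i - 1]][code[i]];
        }
        return e;
    }

    int main(void)
    {
        int prog[] = { LOAD, MAC, MAC, LOAD, MUL };
        printf("estimated energy : %.1f nJ\n", estimate(prog, 5));
        return 0;
    }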

7.2.2. Real-time kernels


The execution of machine code on a DSP or ASIP is controlled by a single
micro-sequencer that steers the entire architecture in a synchronous way.
For this reason, code generators for such a processor traditionally start
from a specification language that models a single thread of control (e.g.
C). The code generator can statically schedule the micro-operations derived
from this specification.
The abstraction level offered by these languages is however rather low
to serve as a full system specification. Complete audio or telecom systems
are more naturally specified using the paradigm of concurrent, communi-
cating processes, supporting an asynchronous control model. Consider for
example the GSM terminal shown in Figure 2. The execution of the oper-
ations in the system management process is dependent on external control
signals supplied by the man-machine interface process, as well as on status
information retrieved from the modulation/demodulation process (indicat-
ing the quality of the terminal's synchronisation with the base station).
On the other hand, during normal operation of the terminal, the execution
rates of the operations in the actual signal processing functions (modu-
lation/demodulation, channel and source coding/decoding) are determined
by the incoming data streams that are being processed. Clearly there is no
compile-time relationship between the execution rates of the system man-
agement operations on the one hand and the signal processing operations
on the other hand : these processes are asynchronous with respect to each
other. Asynchronous processes represent multiple parallel control threads.
A discussion on concurrent system specification languages is beyond the
scope of this chapter.
A run-time kernel is a software programme that dynamically controls
the execution of different control threads on a single processor (e.g. an
ASIP). For every control thread, the machine code is compiled statically
using a code generator. During execution, the kernel may e.g. decide to
interrupt an ongoing process whose timing is non-critical, in order to insert
a high-priority time-critical process. Run-time kernels introduce a multi-
tasking functionality on the processor, similar to multi-tasking in a time-
sharing operating system. The kernel itself is simply another software pro-
cess running on the processor. Note however that most processors have
special hardware provisions to facilitate the implementation of a run-time
kernel (e.g. interrupt handlers, timers, etc.).

Note: Some kernels also support the execution on multi-processor targets.
Several run-time kernels for DSP processors are commercially available.
These tools are stripped-down versions of time-sharing operating
systems, with more efficient mechanisms for context switching between
processes. In most cases they use a fixed-priority preemptive scheduling
mechanism, in which the user has to set the process priorities. Limitations
of these tools are : their inability to guarantee that hard real-time con-
straints will be met, and the lack of automatic retargetability to different
processor cores (e.g. ASIPs).
Consumer and telecom applications are real-time systems that must
operate under externally specified timing constraints. The kernel should
schedule the different processes in such a way that these timing constraints
are met. In this context the term real-time kernel is sometimes used. In
addition, care must be taken to ensure a correct communication of data
between processes. Current research in the field of run-time kernels is fo-
cussing on these issues.
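
As an example of the kind of compile-time check meant here, the classical
rate-monotonic schedulability test of Liu and Layland can be applied to a
set of periodic processes (the process set below is invented; commercial
kernels do not necessarily provide such a test) :

    #include <stdio.h>
    #include <math.h>

    /* Periodic process : worst-case execution time C and period T,
       in the same time unit, with an implicit deadline equal to T. */
    typedef struct { double C, T; } Task;

    int main(void)
    {
        /* Invented process set, e.g. speech coder, control loop, UI. */
        Task set[] = { {2.0, 10.0}, {4.0, 20.0}, {6.0, 50.0} };
        int  n     = 3;

        double U = 0.0;                      /* total utilisation */
        for (int i = 0; i < n; i++) U += set[i].C / set[i].T;

        /* Liu & Layland bound for fixed-priority rate-monotonic
           scheduling : U <= n*(2^(1/n) - 1) guarantees that all hard
           deadlines are met. */
        double bound = n * (pow(2.0, 1.0 / n) - 1.0);
        printf("U = %.3f, bound = %.3f : %s\n", U, bound,
               U <= bound ? "guaranteed schedulable"
                          : "no guarantee, further analysis needed");
        return 0;
    }
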
Recent publications in the CAD community have considered the prob-
lem of automatic synthesis of application-specific kernels.
By statically compiling a kernel that exploits the information about the
application at hand, more efficient communication and context switching
mechanisms can be incorporated. By means of a static timing analysis, the
timing constraints can already be checked at compile time.

8. Conclusions

The strong growth of the multi-media, mobile and personal communica-
tion systems market is pushing the development of new integration tech-
nologies. Heterogeneous ICs form an emerging concept to realise complex
communication systems using VLSI integration. These chips contain an
application-specific instruction-set processor (ASIP) as a core component,
complemented with high-speed accelerators and peripheral blocks. In this
chapter we have discussed the architectural characteristics of contempo-
rary ASIPs used in the application domains of consumer electronics and
telecommunications.
New design technologies are required to support the design of practical
systems using ASIPs. In this chapter we have outlined the most impor-
tant requirements for an ASIP-based CAD environment. Important design
problems that must be addressed include : the design of ASIP architectures,
retargetable code generation, instruction-set simulation, silicon implemen-
tation of ASIPs, and system support in the form of real-time kernels. New
research will be required in all these areas, without however neglecting the
large body of knowledge that was developed previously in domains like
software compilers, high-level synthesis or operating systems. We have pro-
vided a summary of previous and ongoing research work that we consider
to be relevant in the context of ASIPs.
An important task for researchers in the emerging discipline of soft-
ware/hardware co-design, as discussed in this book, may very well be to
master the existing literature in these previously disjoint research worlds,
and to find a way of applying and extending these concepts to the new
design problems of today. Co-design is an interdisciplinary research activ-
ity, and the challenge is to design the right synergy between previously
developed pieces of the puzzle.

ACKNOWLEDGEMENTS

The authors wish to acknowledge the help of the following researchers at IMEC,
who directly contributed to the insights described in this chapter : Augusli Kifli,
Koen Schoofs, Hans Cappelle, and Stefan De Troch.

Systems-on-programmable chips: A look at the packaging challenges

Complex FPGAs are increasingly taking on the characteristics of complete
systems-on-a-chip, including embedded memory and processors, specialized
I/O, and multiple differentiated power and ground planes. Developing packages
for these devices presents a set of challenges both common to other SoC
offerings and unique to systems-on-programmable chips (SoPCs).

For example, programmable logic device (PLD) vendors offer their customers
the ability to develop and verify designs for their devices well before the actual
devices ship, typically 4-6 months before first samples are available. This
means that the packaging aspects of the entire product family must be finalized
even earlier. These aspects include items like pinout and electrical and thermal
characteristics, which collectively facilitate early board layout, design timing
and verification, signal integrity analysis, and power budgeting.

Programmable logic vendors also offer customers the ability to migrate designs
between different device family members with the same package and pinout,
avoiding expensive board re-spins. This feature is called vertical migration and
has been accomplished largely through package/die layout optimization.
Facilitating this capability has required proactive development of associated
substrate technologies to support the routing densities required.

Altera has been one of the early users of high-density interconnect (HDI)
technology and has worked, and continues to work, extensively with HDI
providers to enhance capabilities and improve performance. One of the more
recent challenges in programmable logic packaging has been the integration of
high-speed transceivers.

The proper operation of these transceivers places several extra demands on
the packaging of these devices, including equalizing trace pair lengths to
minimize skew and optimizing transmission line impedance. The deleterious
effects of small discontinuities become increasingly evident at these data rates,
in excess of 3.125 Gbit/s. In addition, signal integrity should be
maximized by optimal trace placement and an overall reduction of inductance
within the package, in particular from the multiple power and ground planes
that are required to support device operation. All of these factors are
interdependent, and slight changes to influence one of them can cause
unforeseen changes in others.
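
To give a feel for the numbers involved, a short calculation relates an
intra-pair trace length mismatch to skew as a fraction of the unit interval
(the trace parameters are illustrative assumptions, not Altera package
data):

    #include <stdio.h>

    int main(void)
    {
        /* Assumed values, for illustration only. */
        double bitrate_gbps    = 3.125;  /* transceiver data rate */
        double mismatch_mm     = 1.0;    /* intra-pair length mismatch */
        double delay_ps_per_mm = 6.7;    /* assumed propagation delay in
                                            an organic substrate */

        double ui_ps   = 1000.0 / bitrate_gbps;   /* unit interval: 320 ps */
        double skew_ps = mismatch_mm * delay_ps_per_mm;

        printf("unit interval             : %.0f ps\n", ui_ps);
        printf("skew for %.1f mm mismatch : %.1f ps (%.1f%% of UI)\n",
               mismatch_mm, skew_ps, 100.0 * skew_ps / ui_ps);
        return 0;
    }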

Such requirements necessitate silicon-package co-planning and design.
Considerations are made for silicon-package partitioning and package power
optimization at the product planning stages. This analysis is done using
comprehensive simulations to determine package characteristics many months
before actual packaged units are available. The entire packaging design is now
an integrated, iterative process involving optimization between pin layout, chip
layout and cost/performance objectives. This has been a significant change in
packaging design methodology, one which has been quietly evolving over the
last four to five years.

Altera's experience with its first generation of transceiver devices — the Mercury
FPGA family — enabled us to establish procedures for the complex simulation of
transceiver-based FPGAs, laying the groundwork for our more recent work with
the Stratix GX family. At that time, Altera package engineers discovered that
they had to develop a common framework and process to address both the
mechanical and electrical aspects of these increasingly complex packages.

Working with the silicon design engineering team and using a combination of
tools from multiple vendors, the package engineers were able to develop
accurate models of the electrical behavior of the packaging circuitry that, when
used with the IC design test bench, would indicate the overall behavior of the
packaged die on the board. These models included ball-to-transmission line,
transmission line, and transmission line-to-bump HSPICE models, as well as
S-parameter models of the ball-to-bump behavior.

This process enabled Altera's engineers to accurately predict the signal integrity
behavior of the Stratix GX device several months before actual silicon was
available.
Higher priorities
The increased emphasis on signal integrity is coupled with enhanced customer
usability requirements: managing power dissipation and addressing board-
mounting considerations such as accommodating various reflow conditions for
different package lead finishes. The implementation of lead-free packages adds
a further dimension to these challenges.

Our approach to these challenges has been to develop modeling techniques
using finite element methodologies that allow engineers to predict board-level
behavior. Typical customer requirements are for 2000-5000 cycles of board-
level reliability through 0-100°C, with slight variations such as higher
temperature requirements or a greater number of cycles for certain market
segments, like communications, industrial, consumer and automotive.

Programmable logic devices tend to have high pin counts and relatively large
die sizes. Component and board-level reliability are realized by starting with
optimal silicon and package design. Programmable logic vendors need to
partner closely with their assembly partners to optimize processes to meet
customer requirements for reliability and manufacturability. This includes
participation in the material selection for substrates/leadframes, underfill and
die attach, as well as encapsulation. Modeling as well as empirical techniques
are used to validate these choices using test vehicles prior to product
introduction.

Going forward, recent advances in semiconductor processes — like the transition
to 300 mm wafers or the introduction of low-k dielectric materials — are also
impacting package technology. For example, low-k dielectric materials are more
brittle than conventional FSG dielectrics, and package engineers must determine
ways of maintaining high reliability in the face of this difference. These methods
might include developing design rules that influence the use of low-k dielectrics
during IC layout, or identifying an appropriate bill of materials to meet the
customer requirements. The enhanced power dissipation that is anticipated at
the 90 nm node will also influence packaging choices for next-generation
devices.

As devices continue to progress towards higher levels of integration, the
packaging aspects are becoming part of the product feature set. The associated
methodologies and processes need to evolve to keep ahead of the curve. The
concept of system-level design with co-design and co-optimization of the
various subsystems is gaining momentum in the industry. Semiconductor
vendors facing these challenges need to continue working as enablers, drawing
upon several technologies to advance the state of the packaging art. An
increasing amount of participative effort from the EDA, foundry and silicon
engineering communities, along with the packaging community, is needed to
support these needs.
