Evolution of Microprocessors
A 30-year history of microprocessors
Four generations of innovation
High-performance microprocessor drivers:
Memory hierarchies
Instruction-level parallelism (ILP)
Where are we and where are we going? Focus on desktop/server microprocessors vs. embedded/DSP microprocessors
Microprocessor Generations
First generation: 1971-78
Behind the power curve (16-bit, <50k transistors)
Second Generation: 1979-85
Becoming real computers (32-bit, >50k transistors)
Third Generation: 1985-89
Challenging the establishment (Reduced Instruction Set Computer/RISC, >100k transistors)
Fourth Generation: 1990-
Architectural and performance leadership (64-bit, >1M transistors; Intel/AMD translate x86 instructions into RISC-like operations internally)
In the beginning (8-bit): Intel 4004
First general-purpose, single-chip microprocessor
Shipped in 1971
8-bit architecture, 4-bit implementation
2,300 transistors
Performance < 0.1 MIPS (Million Instructions Per Second)
8008: 8-bit implementation in 1972
3,500 transistors
First microprocessor-based computer (Micral)
Targeted at laboratory instrumentation
Mostly sold in Europe
1st Generation (16-bit) Intel 8086
Introduced in 1978
Performance < 0.5 MIPS
New 16-bit architecture
Assembly language compatible with the 8080
29,000 transistors
Includes memory segmentation and support for a floating-point coprocessor
In 1981, IBM introduced the PC
Based on the 8088, an 8-bit external bus version of the 8086
2nd Generation (32-bit) Motorola 68000
Major architectural step in microprocessors:
First 32-bit architecture
Initial 16-bit implementation
First flat 32-bit address space
Support for paging
General-purpose register architecture
Loosely based on PDP-11 minicomputer
First implementation in 1979
68,000 transistors
< 1 MIPS (Million Instructions Per Second)
Used in Apple Mac, Sun, Silicon Graphics, and Apollo workstations
3rd Generation: MIPS R2000
Several firsts:
First (commercial) RISC microprocessor
First microprocessor to provide integrated support for instruction & data caches
First pipelined microprocessor (sustains 1 instruction/clock)
Implemented in 1985
125,000 transistors
5-8 MIPS (Million Instructions per Second)
4th Generation (64 bit) MIPS R4000
First 64-bit architecture
Integrated caches
On-chip support for off-chip, secondary cache
Integrated floating point
Implemented in 1991:
Deep pipeline
1.4M transistors
Initially 100MHz
> 50 MIPS
Intel translates 80x86/Pentium-family instructions into RISC-like operations internally
Key Architectural Trends
Performance increases by about 1.6x per year (2x every 1.5 years)
True from 1985 to the present
Combination of technology and architectural enhancements
Technology provides faster transistors (speed scales with 1/lithographic feature size) and more of them
Faster transistors lead to higher clock rates
More transistors (Moore's Law):
Architectural ideas turn transistors into performance
Responsible for about half the yearly performance growth
Two key architectural directions
Sophisticated memory hierarchies
Exploiting instruction-level parallelism
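The two forms of the growth rate quoted on this slide (1.6x per year and 2x every 1.5 years) are the same claim under compound growth. A minimal sketch that converts between them; the function names are illustrative, not from the slides:

```python
import math


def doubling_time_years(annual_factor: float) -> float:
    """Years needed to double performance at a constant yearly growth factor."""
    return math.log(2) / math.log(annual_factor)


def growth_over(years: float, annual_factor: float) -> float:
    """Total performance multiplier after `years` of compound growth."""
    return annual_factor ** years


if __name__ == "__main__":
    print(f"1.6x/year doubles every {doubling_time_years(1.6):.2f} years")
    print(f"1.5 years at 1.6x/year gives {growth_over(1.5, 1.6):.2f}x")
```

At 1.6x per year the doubling time comes out at roughly 1.47 years, which is why the slide can round it to 1.5.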
Memory Hierarchies
Caches: hide the latency of DRAM and increase bandwidth (BW)
Trend 1: Increasingly large caches
CPU-DRAM access gap has grown by a factor of 30-50!
On-chip: from 128 bytes (1984) to 100,000+ bytes
Multilevel caches: add another level of caching
First multilevel cache: 1986
Secondary cache sizes today: 128,000 B to 16,000,000 B
Third-level caches: 1998
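Why an extra cache level helps can be made concrete with the standard average memory access time (AMAT) formula, where each level's miss penalty is the AMAT of everything below it. This is a sketch; the hit times, miss rates and DRAM latency below are hypothetical numbers chosen only for illustration:

```python
def amat(levels: list[tuple[float, float]], memory_latency: float) -> float:
    """Average memory access time in cycles.

    levels: (hit_time, local_miss_rate) for each cache level, L1 first.
    Each level's miss penalty is the AMAT of all the levels below it.
    """
    penalty = memory_latency
    for hit_time, miss_rate in reversed(levels):
        penalty = hit_time + miss_rate * penalty
    return penalty


# Hypothetical parameters, for illustration only
DRAM_LATENCY = 100.0
print(amat([(1.0, 0.05)], DRAM_LATENCY))                # L1 only
print(amat([(1.0, 0.05), (10.0, 0.20)], DRAM_LATENCY))  # L1 + L2
```

With these assumed values, adding the second level cuts the average access time from about 6 cycles to about 2.5 cycles.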
Trend 2: Advances in caching techniques:
Reduce or hide cache miss latencies
Cache-aware combinations: computers, compilers, code writers
Prefetching: instruction to bring data into the cache early
Early restart after a cache miss (1992)
Nonblocking caches: continue during a cache miss (1994)
The 1st uP: Intel 4004
Introduced 1971
2,250 transistors
108 kHz, 60,000 ops/sec
16 pins
10-micron process
As powerful as the ENIAC, which had 18,000 tubes and occupied a large room
Targeted use: calculators
Cost: less than $100
Currently Popular Intel Pentium 4 (2.2GHz)
Introduced December 2001
55 million transistors
32-bit word size
2 ALUs, each working at 4.4GHz
128-bit FPU
0.13 micron process
Targeted use: PCs and low-end workstations
Cost: around $600
Moore's Law
In 1965, Gordon Moore, one of the founders of Intel, predicted that the number of transistors on an IC (and therefore the capability of microprocessors) would double every year. He later modified the prediction to 18 months. His prediction still held true in 2002. In fact, the doubling time is contracting back toward the original prediction and is now closer to a year
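The doubling rule can be written as count(year) = base_count × 2^((year - base_year) / doubling_period). A minimal sketch, using the Intel 4004 figures from these slides (1971, 2,300 transistors) as the baseline; the function name and the choice of target year are illustrative:

```python
def projected_transistors(base_count: int, base_year: float,
                          year: float, doubling_years: float) -> float:
    """Transistor count if it doubles every `doubling_years` after base_year."""
    return base_count * 2 ** ((year - base_year) / doubling_years)


# Baseline from the slides: Intel 4004, 1971, 2,300 transistors
for period in (1.0, 1.5):
    estimate = projected_transistors(2_300, 1971, 1981, period)
    print(f"doubling every {period} yr -> {estimate:,.0f} transistors in 1981")
```

The gap between the two projections after only ten years shows how sensitive the law is to the assumed doubling period.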
[Figure: Evolution of Intel Microprocessors: transistor count (log scale, 1,000 to 100,000,000) vs. year (1970-2005) for the 4004, 8008, 8080, 8086, 286, 386, 486, Pentium, Pentium 2, Pentium 3 and Pentium 4]
4-, 8-, 16-, 32-, 64-bit (Word Length)
The 4004 dealt with data in chunks of 4 bits at a time
The Pentium 4 deals with data in chunks (words) of 32 bits
The new Itanium processor deals with 64-bit chunks (words) at a time
Why have more bits (longer words)?
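One answer to "why more bits?": a narrow ALU must process a wide number one chunk at a time, propagating the carry between chunks. This sketch (a hypothetical helper, assuming unsigned numbers and a total width that is a multiple of the ALU width) counts the add steps needed for a 32-bit addition on ALUs of different widths:

```python
def add_in_chunks(a: int, b: int, total_bits: int, word_bits: int) -> tuple[int, int]:
    """Add two unsigned `total_bits`-wide numbers on a `word_bits`-wide ALU.

    Returns (sum modulo 2**total_bits, number of ALU add steps).
    Assumes total_bits is a multiple of word_bits.
    """
    mask = (1 << word_bits) - 1
    result, carry, steps = 0, 0, 0
    for shift in range(0, total_bits, word_bits):
        chunk = ((a >> shift) & mask) + ((b >> shift) & mask) + carry
        result |= (chunk & mask) << shift
        carry = chunk >> word_bits  # carry into the next, more significant chunk
        steps += 1
    return result, steps


for width in (4, 8, 16, 32):
    total, steps = add_in_chunks(0x12345678, 0x0FEDCBA9, 32, width)
    print(f"{width:2}-bit ALU: {steps} add step(s), sum = {total:#010x}")
```

The sum is the same in every case; only the number of steps changes (8 on a 4-bit ALU, 1 on a 32-bit ALU).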
kHz, MHz, GHz (Clock Frequency)
The 4004 worked at a clock frequency of 108 kHz
The latest processors have clock frequencies in GHz
Of two uPs with similar designs, the one with the higher clock frequency will be more powerful
The same is not true for two uPs of dissimilar designs. Example: of a PowerPC and a Pentium 4 working at the same frequency, the former performs better due to its superior design. The same holds for the Athlon uP compared with a Pentium
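The standard CPU performance equation explains why frequency alone does not decide the winner: CPU time = instruction count × cycles per instruction (CPI) / clock rate. In this sketch the two designs, their CPI values and the instruction count are hypothetical, not measurements of the PowerPC or Pentium 4:

```python
def cpu_time_seconds(instructions: float, cpi: float, clock_hz: float) -> float:
    """CPU time = instruction count x cycles per instruction / clock rate."""
    return instructions * cpi / clock_hz


# Two hypothetical designs at the same 2 GHz clock running the same program
PROGRAM = 1e9  # instructions, assumed equal for both designs
design_a = cpu_time_seconds(PROGRAM, cpi=1.0, clock_hz=2e9)
design_b = cpu_time_seconds(PROGRAM, cpi=1.5, clock_hz=2e9)
print(f"design A: {design_a} s, design B: {design_b} s")
```

Real designs can also differ in instruction count for the same task, which is another way a "superior design" wins at equal frequency.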
Basic Components of Digital Computer
[Figure: CPU, memory and I/O]
Could be a chip, a board, or several boards
Microcontrollers
[Figure: CPU, memory (ROM and RAM) and I/O subsystems (timers, counters, analog interfaces, I/O interfaces), all on a single chip]
A microprocessor system?
uPs are powerful pieces of hardware, but not very useful on their own
Just as the human brain needs hands, feet, eyes, ears and a mouth to be useful, so does the uP
A uP system is a uP plus all the components it requires to do a certain task
A microcomputer is one example of a uP system
[Figure: microprocessor block diagram: Bus Interface Unit (connected over the system bus and memory bus to RAM and I/O), data cache, instruction cache, control unit with instruction decoder, arithmetic & logic unit with registers, floating point unit with registers]
General-purpose microprocessor
CPU for computers
No RAM, ROM or I/O on the CPU chip itself
Examples: Intel's x86, Motorola's 680x0
Many chips on the motherboard
[Figure: General-Purpose Microprocessor System: the CPU (general-purpose microprocessor) connects over the data bus and address bus to separate RAM, ROM, I/O port, timer and serial COM port chips]
Microcontroller:
A smaller computer
On-chip RAM, ROM, I/O ports...
Examples: Motorola's 6811, Intel's 8051, Zilog's Z8 and PIC 16X
[Figure: Microcontroller: CPU, RAM, ROM, I/O port, timer and serial COM port on a single chip]
Microprocessor vs. Microcontroller
Microprocessor:
CPU is stand-alone; RAM, ROM, I/O and timer are separate
Designer can decide on the amount of ROM, RAM and I/O ports
Expensive
Versatile
General-purpose
Microcontroller:
CPU, RAM, ROM, I/O and timer are all on a single chip
Fixed amount of on-chip ROM, RAM and I/O ports
For applications in which cost, power and space are critical
Single-purpose
Block Diagram
[Figure: microcontroller block diagram: CPU with oscillator (OSC), interrupt control (external interrupts), on-chip ROM for program code, on-chip RAM, Timer 0 and Timer 1 (counter inputs), bus control, four I/O ports (P0, P1, P2, P3), serial port (TxD, RxD) and address/data lines]
8086 microprocessor
[Figure: 8086 microprocessor with a 20-line address bus (A19-A0), a 16-line data bus (D15-D0) and control signals]
16-bit microprocessor? It has a 16-bit data bus
20-bit address bus?
It can address any one of 1,048,576 (= 2^20) memory locations (addresses). Each memory location is one byte wide, so a 16-bit word requires 2 memory locations. If the first byte of the word is at an even address, the 8086 can read the entire word in one operation. If the first byte of the word is at an odd address, the 8086 reads the first byte with one bus operation and the second byte with another bus operation.
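A behavioral sketch of the even/odd rule above, assuming the usual x86 byte order (low byte at the lower address) and a 1 MB byte array standing in for memory. The word 5A2FH at 18000H is a hypothetical placement, and the bus-operation count is a simplified model, not a timing simulation:

```python
def read_word_8086(memory: bytearray, address: int) -> tuple[int, int]:
    """Read a 16-bit word stored low byte first (little-endian).

    Returns (word, bus_operations). Simplified model of the 8086's
    16-bit data bus: an even-aligned word needs one bus operation,
    an odd-aligned word needs two.
    """
    low = memory[address]
    high = memory[(address + 1) & 0xFFFFF]  # 20-bit addresses wrap at 1 MB
    bus_operations = 1 if address % 2 == 0 else 2
    return (high << 8) | low, bus_operations


memory = bytearray(1 << 20)  # 1,048,576 one-byte locations
memory[0x18000] = 0x2F       # low byte of the word 5A2FH
memory[0x18001] = 0x5A       # high byte
word, ops = read_word_8086(memory, 0x18000)
print(f"{word:04X}H in {ops} bus operation(s)")
```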
[Figure: Memory Address Space: address lines A19-A0 select one of 1,048,576 memory locations (1 MByte), from 00000H to FFFFFH]
8086 INTERNAL ARCHITECTURE
Two units: 1. BIU 2. EU
[Figure: 8086 internal block diagram]
BIU and EU
The BIU (bus interface unit) sends out addresses, fetches instructions from memory, reads data from ports and memory, and writes data to ports and memory. In other words, the BIU handles all transfers of data and addresses on the buses for the execution unit. The EU (execution unit) of the 8086 tells the BIU where to fetch instructions or data from, decodes instructions, and executes instructions.
Bus Interface Unit
Receives instructions & data from main memory
Instructions are then sent to the instruction cache, data to the data cache
Also receives the processed data and sends it to main memory
Floating-Point Unit (FPU)
Also known as the Numeric Unit
It performs calculations that involve numbers represented in scientific notation (also known as floating-point numbers)
This notation can represent extremely small and extremely large numbers in a compact form
Floating-point calculations are required for graphics, engineering and scientific work
The ALU can do these calculations as well, but much more slowly
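To see how a floating-point number stays compact, this sketch splits a value into sign, exponent and fraction fields. It assumes the IEEE 754 single-precision format as the example encoding (the slide names no format) and uses only Python's standard `struct` module; `rebuild_normal` handles normal numbers only:

```python
import struct


def decompose_float32(x: float) -> tuple[int, int, int]:
    """Split x into IEEE 754 single-precision (sign, biased exponent, fraction)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF


def rebuild_normal(sign: int, exponent: int, fraction: int) -> float:
    """Value of a normal number: (-1)^sign * 1.fraction * 2^(exponent - 127)."""
    return (-1) ** sign * (1 + fraction / 2**23) * 2.0 ** (exponent - 127)


# Very large and very small magnitudes fit in the same 32 bits
for value in (6.5, -0.15625, 3.0e38, 2.0e-38):
    fields = decompose_float32(value)
    print(value, fields, rebuild_normal(*fields))
```

Large and tiny values round to the nearest representable single-precision number, so their rebuilt values differ slightly from the inputs.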
Registers
Both ALU & FPU have a very small amount of super-fast private memory placed right next to them for their exclusive use. These are called registers The ALU & FPU store intermediate and final results from their calculations in these registers Processed data goes back to the data cache and then to main memory from these registers
Control Unit
The brain of the uP
Manages the whole uP
Tasks include fetching instructions & data, storing data, and managing input/output devices
Overview
Intel 8088 facts
20-bit address bus allows access to 1M memory locations
16-bit internal data bus and 8-bit external data bus. Thus, it needs two read (or write) operations to read (or write) a 16-bit datum
Byte addressable, with byte swapping: the word 5A2F is stored with its low byte (2F) at memory location 18000 and its high byte (5A) at 18001
[Figure: 8088 signal classification: VDD (5V), GND, CLK, 20-bit address, 8-bit data, control signals to and from the 8088]
Organization of 8088/8086
[Figure: Execution Unit (EU) with the general-purpose registers (AH/AL, BH/BL, CH/CL, DH/DL, SP, BP, SI, DI), ALU, flag register and EU control on a 16-bit data bus; Bus Interface Unit (BIU) with segment registers (CS, DS, SS, ES), IP, instruction queue and bus control driving the 20-bit address bus and external bus]
General Purpose Registers
Data Group (bits 15-8 form the high byte, bits 7-0 the low byte):
AX = AH:AL  Accumulator
BX = BH:BL  Base
CX = CH:CL  Counter
DX = DH:DL  Data
Pointer and Index Group:
SP  Stack Pointer
BP  Base Pointer
SI  Source Index
DI  Destination Index
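The data-group registers are one 16-bit register viewed two ways: writing AL changes the low half of AX, and writing AH the high half. A hypothetical `Register16` class sketches that aliasing:

```python
class Register16:
    """Hypothetical model of an 8086 data register such as AX = AH:AL."""

    def __init__(self, value: int = 0) -> None:
        self.value = value & 0xFFFF

    @property
    def high(self) -> int:
        """High byte, bits 15-8 (e.g. AH)."""
        return (self.value >> 8) & 0xFF

    @high.setter
    def high(self, byte: int) -> None:
        self.value = ((byte & 0xFF) << 8) | (self.value & 0x00FF)

    @property
    def low(self) -> int:
        """Low byte, bits 7-0 (e.g. AL)."""
        return self.value & 0xFF

    @low.setter
    def low(self, byte: int) -> None:
        self.value = (self.value & 0xFF00) | (byte & 0xFF)


ax = Register16(0x1234)
print(f"AH={ax.high:02X}H AL={ax.low:02X}H")
ax.low = 0xFF  # like MOV AL, 0FFH: AH is untouched
print(f"AX={ax.value:04X}H")
```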
Arithmetic Logic Unit (ALU)
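[Figure: ALU with two n-bit inputs A and B, a function-select input F, output Y and status outputs Carry, Y = 0? and A > B?]
F = 000: A + B
F = 001: A - B
F = 010: A - 1
F = 011: A and B
F = 100: A or B
F = 101: not A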
Signal F controls which function is performed by the ALU
Signal F is generated according to the current instruction
Basic arithmetic operations: addition, subtraction, ...
Basic logic operations: and, or, xor, shifting, ...
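A behavioral sketch of the ALU's function table, where F selects the operation and the outputs are Y, Carry, Y = 0? and A > B?. The F encodings come from the table; treating Carry as the borrow for subtraction and A > B as an unsigned comparison are assumptions of this sketch:

```python
def alu(f: int, a: int, b: int, n: int = 8) -> tuple[int, int, bool, bool]:
    """n-bit ALU driven by function-select code f.

    Returns (Y, Carry, Y == 0, A > B). Carry is the carry out for A+B and
    the borrow for A-B and A-1; it is 0 for the logic functions.
    """
    mask = (1 << n) - 1
    a, b = a & mask, b & mask
    operations = {
        0b000: lambda: a + b,
        0b001: lambda: a - b,
        0b010: lambda: a - 1,
        0b011: lambda: a & b,
        0b100: lambda: a | b,
        0b101: lambda: ~a,
    }
    if f not in operations:
        raise ValueError(f"unused function code {f:03b}")
    raw = operations[f]()
    y = raw & mask
    carry = int(f in (0b000, 0b001, 0b010) and (raw > mask or raw < 0))
    return y, carry, y == 0, a > b


print(alu(0b000, 200, 100))  # 8-bit add that overflows into Carry
print(alu(0b001, 5, 5))      # subtraction giving zero
```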
Flag Register
Flag register contains information reflecting the current status of a microprocessor. It also contains information which controls the operation of the microprocessor.
Bit positions (bit 15 on the left, bit 0 on the right): NT (14), IOPL (13-12), OF (11), DF (10), IF (9), TF (8), SF (7), ZF (6), AF (4), PF (2), CF (0)
Status Flags:
CF: Carry flag
PF: Parity flag
AF: Auxiliary carry flag
ZF: Zero flag
SF: Sign flag
OF: Overflow flag
NT: Nested task flag
Control Flags:
IF: Interrupt enable flag
DF: Direction flag
TF: Trap flag
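How the status flags are derived from a result can be sketched for an 8-bit addition. The rules used (PF set on an even number of 1 bits in the result, AF on a carry out of bit 3, OF on signed overflow) follow the usual 8086 definitions; the helper itself is hypothetical:

```python
def add8_flags(a: int, b: int) -> tuple[int, dict[str, int]]:
    """8-bit addition returning the result and the status flags it would set."""
    a, b = a & 0xFF, b & 0xFF
    total = a + b
    result = total & 0xFF
    flags = {
        "CF": int(total > 0xFF),                                 # carry out of bit 7
        "PF": int(bin(result).count("1") % 2 == 0),              # even parity
        "AF": int((a & 0xF) + (b & 0xF) > 0xF),                  # carry out of bit 3
        "ZF": int(result == 0),
        "SF": (result >> 7) & 1,                                 # copy of the sign bit
        "OF": int(((a ^ result) & (b ^ result) & 0x80) != 0),    # signed overflow
    }
    return result, flags


print(add8_flags(0x7F, 0x01))  # +127 + 1 overflows in signed arithmetic
print(add8_flags(0xFF, 0x01))  # -1 + 1 = 0: carry and zero set, no overflow
```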
Instruction Machine Codes
Instruction machine codes are binary numbers
For example, MOV AL, BL is encoded as 1000101011000011
Machine code structure: Opcode, Mode, Operand1, Operand2
(For MOV AL, BL: the opcode is MOV, the mode is register mode, and the operands are AL and BL)
Some instructions do not have operands, or have only one operand
The opcode tells what operation is to be performed
The mode indicates the type of instruction: register type or memory type
The operands tell what data should be used in the operation. Operands can be addresses telling where to get data (or where to store results)
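A decoding sketch for the register-to-register MOV case, assuming the standard 8086 field layout: a 6-bit opcode (100010), a D bit (1 means the REG field is the destination), a W bit (0 for byte registers, 1 for word registers), then MOD (11 for register mode), REG and R/M. Under that layout, 1000101011000011 is MOV AL, BL, while 1000100011000011 (D = 0) decodes as MOV BL, AL. The function name is hypothetical and every other instruction form is rejected:

```python
REG8 = ["AL", "CL", "DL", "BL", "AH", "CH", "DH", "BH"]
REG16 = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]


def decode_mov_reg_reg(code: int) -> str:
    """Decode a 16-bit register-to-register MOV machine code."""
    opcode = code >> 10          # top 6 bits
    d = (code >> 9) & 1          # direction: 1 = REG is the destination
    w = (code >> 8) & 1          # width: 0 = byte, 1 = word
    mod = (code >> 6) & 0b11     # 11 = register mode
    reg = (code >> 3) & 0b111
    rm = code & 0b111
    if opcode != 0b100010 or mod != 0b11:
        raise ValueError("only register-to-register MOV is modelled")
    names = REG16 if w else REG8
    dst, src = (names[reg], names[rm]) if d else (names[rm], names[reg])
    return f"MOV {dst}, {src}"


print(decode_mov_reg_reg(0b1000101011000011))
print(decode_mov_reg_reg(0b1000100011000011))
```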
EU Operation
1. Fetch an instruction from the instruction queue
2. According to the instruction, EU control logic generates control signals, including the ALU control signal. (This process is also referred to as instruction decoding)
3. Depending on the control signal, the EU performs one of the following operations:
An arithmetic operation
A logic operation
Storing a datum into a register
Moving a datum from a register
Changing the flag register
[Figure: EU with the general-purpose registers (AH/AL, BH/BL, CH/CL, DH/DL, SP, BP, SI, DI), ALU, flag register and EU control on a 16-bit data bus; an instruction such as 1011000101001010 enters EU control]
Generating Memory Addresses
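How can a 16-bit microprocessor generate 20-bit memory addresses? The 16-bit segment register is shifted left by 4 bits (0000 is appended) and the 16-bit offset is added to it:
$\text{20-bit memory address} = \text{segment} \times 16 + \text{offset}$
[Figure: a 64K segment inside the 1M memory space (00000-FFFFF), starting at segment address Addr1 and extending to Addr1 + 0FFFF]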
Intel 80x86 memory address generation
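The shift-and-add rule in a few lines. The mask to 20 bits models the 8086's 20 address lines, so an address past FFFFFH wraps around to the bottom of memory. The examples on the Memory Address Calculation slide (348AH:4214H, 1234H:0022H, 5000H:FFE0H) serve as checks:

```python
def physical_address(segment: int, offset: int) -> int:
    """8086 real-mode address: (segment << 4) + offset, truncated to 20 bits."""
    return ((segment << 4) + offset) & 0xFFFFF


for name, seg, off in (("CS:IP", 0x348A, 0x4214),
                       ("DS:DI", 0x1234, 0x0022),
                       ("SS:SP", 0x5000, 0xFFE0)):
    print(f"{name} = {seg:04X}H:{off:04X}H -> {physical_address(seg, off):05X}H")
```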
Memory Segmentation
A segment is a 64KB block of memory starting from any 16-byte boundary
For example, 00000, 00010, 00020, 20000, 8CE90 and E0840 are all valid segment addresses
The requirement of starting from a 16-byte boundary is due to the 4-bit left shift
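Because the segment register is shifted left by 4 bits, a segment can only start where the low 4 address bits are zero, and the register holds the start address divided by 16. A sketch with a hypothetical helper, checked against the valid segment addresses listed above:

```python
def segment_register_value(segment_start: int) -> int:
    """Segment register value for a segment starting at a 20-bit address."""
    if segment_start % 16 != 0:
        raise ValueError(f"{segment_start:05X}H is not on a 16-byte boundary")
    if not 0 <= segment_start <= 0xFFFF0:
        raise ValueError("segment start must lie in the 1 MB address space")
    return segment_start >> 4


for start in (0x00000, 0x00010, 0x00020, 0x20000, 0x8CE90, 0xE0840):
    print(f"{start:05X}H -> segment register {segment_register_value(start):04X}H")
```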
Segment registers in BIU
16-bit registers (bits 15-0):
CS: Code Segment
DS: Data Segment
SS: Stack Segment
ES: Extra Segment
Memory Address Calculation
Segment addresses must be stored in segment registers
The offset is derived from the combination of pointer registers, the Instruction Pointer (IP), and immediate values
Examples (segment address with 0000 appended + offset = memory address):
Instruction address: CS = 348AH, IP = 4214H: 348A0H + 4214H = 38AB4H
Data address: DS = 1234H, DI = 0022H: 12340H + 0022H = 12362H
Stack address: SS = 5000H, SP = FFE0H: 50000H + FFE0H = 5FFE0H
Fetching Instructions
Where to fetch the next instruction?
Example (8088): CS = 1234H and IP = 0012H, so the next instruction is fetched from 12340H + 0012H = 12352H
Memory location 12352H holds MOV AL, 0
After an instruction is fetched, register IP is updated as follows: IP = IP + length of the fetched instruction
For example, the length of MOV AL, 0 is 2 bytes. After it is fetched, IP is updated to 0014H
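The fetch-and-update rule for straight-line code, as a sketch: each fetch uses CS:IP, then IP advances by the instruction's length (modulo 2^16, since IP is a 16-bit register). The first instruction is MOV AL, 0 at 1234H:0012H from the slide; the following 3-byte instruction is hypothetical, and jumps are not modelled:

```python
def fetch_sequence(cs: int, ip: int, lengths: list[int]) -> tuple[list[int], int]:
    """Simulate sequential instruction fetches.

    For each instruction length, record the 20-bit address the instruction is
    fetched from, then advance IP by that length (IP is a 16-bit register).
    Returns (fetch addresses, final IP).
    """
    addresses = []
    for length in lengths:
        addresses.append(((cs << 4) + ip) & 0xFFFFF)
        ip = (ip + length) & 0xFFFF
    return addresses, ip


# MOV AL, 0 (2 bytes), then a hypothetical 3-byte instruction
addrs, final_ip = fetch_sequence(0x1234, 0x0012, [2, 3])
print([f"{a:05X}H" for a in addrs], f"IP={final_ip:04X}H")
```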
Accessing Data Memory
There are a number of methods to generate the memory address when accessing data memory. These methods are referred to as Addressing Modes. Examples (assume DS = 1234H):
Direct addressing: MOV AL, [0300H]
Memory address = 12340H + 0300H = 12640H
Register indirect addressing: MOV AL, [SI] (assume SI = 0310H)
Memory address = 12340H + 0310H = 12650H
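The two addressing modes differ only in where the offset comes from: the instruction itself (direct) or a register (register indirect). The final address is still DS shifted left by 4 plus the offset. A sketch with a hypothetical `data_address` helper and a dict standing in for the register file:

```python
from typing import Union


def data_address(ds: int, mode: str, operand: Union[int, str],
                 registers: dict[str, int]) -> int:
    """20-bit data memory address for two 8086 addressing modes (DS-relative)."""
    if mode == "direct":                 # MOV AL, [0300H]: offset is in the instruction
        offset = operand
    elif mode == "register_indirect":    # MOV AL, [SI]: offset is in a register
        offset = registers[operand]
    else:
        raise ValueError(f"mode {mode!r} is not modelled in this sketch")
    return ((ds << 4) + (offset & 0xFFFF)) & 0xFFFFF


regs = {"SI": 0x0310}
print(f"{data_address(0x1234, 'direct', 0x0300, regs):05X}H")
print(f"{data_address(0x1234, 'register_indirect', 'SI', regs):05X}H")
```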
Reserved Memory Locations
Some memory locations are reserved for special purposes. Programs should not be loaded in these areas
Locations from FFFF0H to FFFFFH are used for system reset code (reset instruction area)
Locations from 00000H to 003FFH are used for the interrupt pointer table
It has 256 table entries
Each table entry is 4 bytes
256 × 4 = 1024 bytes of memory addressing space, from 00000H to 003FFH
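Since each entry is 4 bytes, the entry for interrupt type n starts at n × 4. The layout inside an entry (offset word, loaded into IP, at the lower address; segment word, loaded into CS, above it; both low byte first) is the standard 8086 convention, not stated on this slide. A sketch with hypothetical helper names and a made-up vector for type 8:

```python
def vector_entry_address(interrupt_type: int) -> int:
    """Start address of an interrupt pointer table entry (4 bytes per entry)."""
    if not 0 <= interrupt_type <= 255:
        raise ValueError("the table has 256 entries (types 0-255)")
    return interrupt_type * 4


def read_vector(memory: bytearray, interrupt_type: int) -> tuple[int, int]:
    """Return (CS, IP) of the service routine: IP word first, then CS word."""
    base = vector_entry_address(interrupt_type)
    ip = memory[base] | (memory[base + 1] << 8)
    cs = memory[base + 2] | (memory[base + 3] << 8)
    return cs, ip


table = bytearray(0x400)                          # 00000H-003FFH
table[0x20:0x24] = bytes([0x34, 0x12, 0x00, 0xF0])  # hypothetical vector for type 8
cs, ip = read_vector(table, 8)
print(f"type 8 -> {cs:04X}H:{ip:04X}H")
```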
Interrupts
An interrupt is an event that occurs while the processor is executing a program
The interrupt temporarily suspends execution of the program and switches the processor to executing a special routine (interrupt service routine)
When the execution of the interrupt service routine is complete, the processor resumes execution of the original program
Interrupt classification:
Hardware interrupts
Caused by activating the processor's interrupt control signals (NMI, INTR)
Software interrupts
Caused by the execution of an INT instruction
Caused by an event generated by the execution of a program, such as division by zero
The 8088 can have 256 interrupts
Minimum and Maximum Operation modes
Intel 8088 (8086) has two operation modes:
Minimum Mode
The 8088 generates control signals for memory and I/O operations
Some functions are not available in minimum mode
Compatible with 8085-based systems
Maximum Mode
It needs an 8288 bus controller to generate control signals for memory and I/O operations
It allows the use of the 8087 coprocessor; it also provides other functions
8086/8088 Functional Units
[Figure: the 8086/8088 MPU consists of an Execution Unit (EU) and a Bus Interface Unit (BIU); the BIU fetches opcodes, reads operands and writes data]
8086/8088 Internal Organisation
[Figure: 8086/8088 internal organisation: the EU holds the general registers (AH/AL, BH/BL, CH/CL, DH/DL, SP, BP, SI, DI), temporary registers, the ALU, flags and EU control; the BIU holds the segment registers (CS, DS, SS, ES), IP and internal communications registers, a summation unit producing the 20-bit address, the instruction queue and bus control for the 8088 bus]
8086/8088 20-bit Addresses
[Figure: the 16-bit segment base address in CS, with 0000 appended, plus the 16-bit offset address in IP gives the 20-bit physical address]
Exercise: 20-bit Addressing
1. CS contains 0A820h, IP contains 0CE24h. What is the resulting physical address?
2. CS contains 0B500h, IP contains 0024h. What is the resulting physical address?
[Figure: i8086 Circuit - Maximum Mode: 8086 CPU with MN/MX# low for maximum mode; 8284A clock generator supplying CLK, READY and RESET; 8288 bus controller decoding S0#, S1#, S2# into MRDC#, MWTC#, AMWC#, IORC#, IOWC#, AIOWC# and INTA#, and generating DEN, DT/R# and ALE; three 74LS373 latches (LE, OE#) capturing AD15:AD0, A19:A16 and BHE# as A19:A0 and BHE#; two pairs of 74LS245 transceivers (DIR, EN#) buffering D15:D0]
8086/8 Maximum Mode
In maximum mode, the 8288 uses a set of status signals (S0, S1, S2) to rebuild the normal bus control signals of the microprocessor
MRDC#, MWTC#, IORC#, IOWC#, etc. (equivalent to MEMR#, etc.)
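The 8288's job amounts to a lookup from the three status bits to a bus cycle type and the command output(s) it activates. This sketch summarizes the standard 8086 S2-S1-S0 status codes; treat the table as a summary to check against the 8086 and 8288 datasheets rather than an authoritative reference:

```python
# S2 S1 S0 -> (bus cycle type, 8288 command outputs activated)
STATUS_DECODE = {
    0b000: ("interrupt acknowledge", ["INTA#"]),
    0b001: ("read I/O port", ["IORC#"]),
    0b010: ("write I/O port", ["IOWC#", "AIOWC#"]),
    0b011: ("halt", []),
    0b100: ("code access (instruction fetch)", ["MRDC#"]),
    0b101: ("read memory", ["MRDC#"]),
    0b110: ("write memory", ["MWTC#", "AMWC#"]),
    0b111: ("passive (no bus cycle)", []),
}


def decode_status(s2: int, s1: int, s0: int) -> tuple[str, list[str]]:
    """Look up the bus cycle and command signals for a status combination."""
    return STATUS_DECODE[(s2 << 2) | (s1 << 1) | s0]


print(decode_status(1, 0, 1))
```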
Look at some special signals briefly
RESET Signal
The active-high RESET signal puts the 8086/8 into a defined state
Clears the flags register and the DS, SS and ES segment registers, etc.
Sets the effective program address to 0FFFF0h (CS=0FFFFh, IP=0000h)
8086/8 programs always start at 0FFFF0h after reset has been asserted and removed
This continues into the latest generation CPUs
BHE# Signal (8086 Only)
The 8086 processor can address memory a byte at a time Its data bus is 16b wide It uses the BHE# signal and A0 (sometimes called BLE#) to address bytes using its 16b bus
Use of BHE#/A0(BLE#)
[Figure: byte-wide addressing (8088): a single bank 00000-FFFFF on D7:D0; on the 8086, odd addresses (00001, 00003, ..., FFFFF) are on D15:D8 and selected by BHE#, even addresses (00000, 00002, ..., FFFFE) are on D7:D0 and selected by A0/BLE#, with A19..A1 addressing both banks]
Use of BHE#/BLE#
BHE# = 0, A0/BLE# = 0: whole word (16 bits)
BHE# = 0, A0/BLE# = 1: high byte to/from odd address
BHE# = 1, A0/BLE# = 0: low byte to/from even address
BHE# = 1, A0/BLE# = 1: no selection
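Combining the BHE#/A0 table with the even/odd alignment rule gives the signal pattern for each bus cycle of a byte or word access. A sketch with hypothetical names; for an odd-aligned word, the word's low byte sits at the odd address (on D15:D8) and its high byte at the next even address (on D7:D0):

```python
SELECTION = {
    (0, 0): "whole word (16 bits)",
    (0, 1): "high byte to/from odd address",
    (1, 0): "low byte to/from even address",
    (1, 1): "no selection",
}


def bus_cycles(address: int, size: int) -> list[tuple[int, int]]:
    """Return (BHE#, A0) for each bus cycle of a byte (size=1) or word (size=2) access."""
    if size == 1:
        return [(0, 1)] if (address & 1) else [(1, 0)]
    if size == 2:
        if (address & 1) == 0:
            return [(0, 0)]
        # Odd-aligned word: first cycle on D15:D8, second on D7:D0
        return [(0, 1), (1, 0)]
    raise ValueError("size must be 1 or 2")


for addr in (0x100, 0x101):
    print(f"word at {addr:05X}H:", [SELECTION[c] for c in bus_cycles(addr, 2)])
```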
ALE and Address/data Bus Multiplexing
8086/8 multiplexes the address and data signals onto the same set of pins
Off-chip logic is needed to separate the signals
Transparent latches designed just for address demultiplexing are used
ALE and 74HC373 Transparent Latch
[Figure: ALE timing: the address/data bus carries the address during address time and data during data time; ALE is high during address time, and the output of the 74HC373 holds the address]
[Figure: 74HC373 or equivalent: inputs In0:In7 from the address/data bus, outputs Q0:Q7 to the system (microcomputer) address bus, LE driven by ALE; the tri-state control signal OE# is shown connected to GND for simplicity]
Use of ALE (Address Latch Enable)
ALE is used with an external latch (74HC373) to demultiplex the address and data lines
The 74HC373 is transparent when its LE input (connected to ALE) is high
When ALE goes low, the 373 holds the last data until ALE goes high again
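A behavioral sketch of one 8-bit transparent latch with OE# tied low, showing why the address survives into data time: Q follows D while LE (ALE) is high and freezes when it goes low. The class is hypothetical and ignores propagation delays:

```python
class TransparentLatch:
    """Behavioral sketch of an 8-bit transparent latch (74HC373-style, OE# tied low)."""

    def __init__(self) -> None:
        self.q = 0

    def update(self, le: int, d: int) -> int:
        """Follow D while LE is high; hold the last value while LE is low."""
        if le:
            self.q = d & 0xFF
        return self.q


latch = TransparentLatch()
# Address time: ALE high, the multiplexed bus carries the address byte 34H
latch.update(le=1, d=0x34)
# Data time: ALE low, the bus now carries data ABH, but Q still holds the address
print(f"{latch.update(le=0, d=0xAB):02X}H")
```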
8288 Bus Controller and Bus Transceivers
[Figure: the 8288 bus controller also generates direction (DT/R#) and enable (DEN#) signals for the bidirectional transceivers, supporting buffering of the system data bus: 74HC245 buffers (DIR, EN#) connect CPU D15:D8 and D7:D0 to the buffered data bus, which goes to the memory and I/O systems]
8086 Read Cycle
8086 Write Cycle
8086 Read Cycle (1 Wait State)
8086/8088 Summary
First generation (introduced June 1978)
One of the first 16b processors on the market
16b internal registers
16/8b external data bus
20b address bus (1MB addressable)
Used in 1st generation IBM PCs (1981)
80186/80188
Evolution of 8086/8088
80186/80188:
Increased instruction set
On-chip system components (clock generator, DMA, interrupt controller, timers)
Unsuccessful in PCs
Popular in embedded systems
2nd Generation Processor 286
P2 (286) = 2nd Generation Processor
Introduced in 1982
CPU behind the IBM AT
Throughput of the original IBM AT (6MHz) was about 500% of the IBM PC (4.77MHz)
Level of integration: 134k transistors (vs 29k in 8086)
Still a 16b processor
Available in higher clock frequencies: 25MHz
2nd Generation Processor 286
Fully backwards compatible with the 8086
Improved instruction execution
The 80286 runs 8086 software without modification
Average instruction takes 4.5 cycles vs. 12 cycles (8086)
Improved instruction set
Real mode and protected mode
Multitasking support: what happens in one area of memory doesn't affect other programs. Protected mode was supported by Windows 3.0
16MB addressable physical memory
On-chip MMU (1GB virtual memory)
Non-multiplexed address bus and data bus
Improving Computer Performance
We've seen how 16b computer technology based on the 8086 and 80286 processors developed
These computers are not powerful enough for today's applications
How do you improve the performance of your computer? Let's start with the CPU
CPU Performance (1)
MOST OBVIOUS: processor clock frequency
Increased frequency gives an increased execution rate
State of the art: >4GHz (03/2005)
Memory and I/O access times can be a performance bottleneck unless you take some special measures
CPU Performance (2)
ALU register width
A processor is an n-bit processor, where n represents the precision of the ALU
n can be 4, 8, 16, 32, or 64
The wider the registers, the more processing per clock
Data bus width
The wider the data bus, the faster we can transfer data
Since memory and I/O device access times are finite, the more bits transferred per cycle the better
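The data-bus argument in numbers: the count of bus transfers needed to move a block shrinks in proportion to the bus width. This idealized sketch (hypothetical helper) ignores alignment, wait states and bus turnaround:

```python
def bus_transfers_needed(num_bytes: int, bus_width_bits: int) -> int:
    """Bus transfers needed to move num_bytes over a data bus of the given width."""
    bytes_per_transfer = bus_width_bits // 8
    return -(-num_bytes // bytes_per_transfer)  # integer ceiling division


for width in (8, 16, 32, 64):
    print(f"{width:2}-bit data bus: {bus_transfers_needed(1024, width)} transfers for 1 KB")
```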
CPU Performance (3)
Address bus width
Increased address width doesn't provide a speed increase as such
The CPU can directly address more memory
PCs use big programs, which would not fit in a smaller address space
Overcoming a small address space takes time
This impacts overall system performance
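For a byte-addressed CPU, n address lines reach 2^n bytes. This sketch uses the 20-bit bus of the 8086/8088 and the 32-bit bus of the 80386 from these slides; the 24 bits for the 80286 is an assumption consistent with its 16MB of physical memory:

```python
def addressable_bytes(address_bits: int) -> int:
    """Directly addressable memory for a byte-addressed CPU: 2**address_bits."""
    return 1 << address_bits


for name, bits in (("8086/8088", 20), ("80286", 24), ("80386", 32)):
    size = addressable_bytes(bits)
    print(f"{name}: {bits}-bit address bus -> {size:,} bytes ({size // 2**20:,} MB)")
```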
3rd Generation Processor 386
P3 (386) = 3rd Generation Processor
Introduced: 10/1985
Full 32b processor
(32b registers. 32b internal and external databus. 32b address bus)
275k transistors. CMOS. 132-pin PGA package.
(Supply current Icc=400mA. Roughly the same as 8086 !)
Clock speeds: 16-33MHz
P3 processors were far ahead of their time: first 386 PCs in early 1987 (COMPAQ)
It took 10 years before 32b operating systems became mainstream!
3rd Generation Processor 386
Modes of operation:
Real. Protected. Virtual Real.
Protected mode of the 386 is fully compatible with the 286
New virtual real mode
Protected mode = native mode of operation
Chips are designed for advanced operating systems such as Windows NT
The processor can run with hardware memory protection while simulating the 8086's real-mode operation
Multiple copies of e.g. DOS can run simultaneously, each in a protected area of memory
If a program in one memory area crashes, the rest of the system is protected
Intel 32-bit Architecture: IA-32
[Figure: IA-32 units: Bus Unit (BU) with prefetch queue, Instruction Unit (IU), Execution Unit (EU) with ALU and registers, Control Unit (CU) and Addressing Unit (AU), linked by address and data paths]
The 80386 includes a Bus Interface Unit for reading and providing data and instructions, with a Prefetch Queue, an IU for controlling the EU with its registers, as well as an AU for generating memory and I/O addresses
80386 Features
32b general and offset registers
16B prefetch queue
Memory management unit with segmentation unit and paging unit
32b address and data bus
4GB physical address space
64TB virtual address space
i387 numerical coprocessor
Implementation of real, protected and virtual 8086 modes
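A worked derivation of the two address-space figures, assuming the usual explanation of the 64TB number: a 16-bit segment selector whose 2 requested-privilege bits do not select a segment, leaving 14 bits (a 13-bit descriptor index plus a table-indicator bit), combined with a 32-bit offset within each segment:

```latex
\begin{aligned}
\text{physical: } & 2^{32}\ \text{bytes} = 4\ \text{GB} \\
\text{virtual: } & \underbrace{2^{14}}_{\text{segments}} \times \underbrace{2^{32}}_{\text{bytes per segment}} = 2^{46}\ \text{bytes} = 64\ \text{TB}
\end{aligned}
```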
80386 Operating Modes
Protected Mode for multitasking support
Real Mode (native 8086 mode)
Processor powers up in Real Mode
System Management Mode
Power management or system security
The processor switches to a separate address space, while saving the entire context of the currently running program or task
80386 Register Set
Instruction Pointer: EIP (bits 31-0); its lower 16 bits (15-0) are IP
EFLAG Register: EFLAGS (bits 31-0); its lower 16 bits are FLAG
General-Purpose Registers (bits 31-0): EAX, EBX, ECX, EDX, ESI, EDI, EBP, ESP
Their lower 16 bits are AX, BX, CX, DX, SI, DI, BP and SP; AX-DX are further split into AH/AL, BH/BL, CH/CL and DH/DL (bits 15-8 and 7-0)
Segment Registers (bits 15-0): CS, SS, DS, ES, FS, GS
80386 Prefetch Queue
[Figure: the Bus Interface Unit fills a 16-byte-deep instruction queue over the 32-bit data bus, and the Execution Unit reads from it; fetching from the on-chip queue is fast, while reading from off-chip memory is slow]
The 80386 prefetch queue is 16B deep
1. The instruction fetch can read from the prefetch queue faster than from memory
2. The prefetcher can do some work while the execution unit is doing other tasks in parallel
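Point 2 can be quantified with an idealized two-stage model: without prefetching, each instruction pays fetch plus execute time; with it, the bus unit fetches the next instruction while the EU executes the current one, so the slower stage sets the pace. The cycle counts are hypothetical, and jumps (which flush the queue) are ignored:

```python
def cycles_without_prefetch(n: int, fetch: int, execute: int) -> int:
    """Fetch and execute strictly one after the other."""
    return n * (fetch + execute)


def cycles_with_prefetch(n: int, fetch: int, execute: int) -> int:
    """Idealized overlap: the bus unit fetches instruction i+1 while the EU executes i.

    Assumes straight-line code and a queue deep enough that the slower
    of the two stages determines the throughput.
    """
    if n == 0:
        return 0
    return fetch + (n - 1) * max(fetch, execute) + execute


for fetch, execute in ((4, 4), (2, 5)):
    print(f"fetch={fetch}, execute={execute}: "
          f"{cycles_without_prefetch(10, fetch, execute)} vs "
          f"{cycles_with_prefetch(10, fetch, execute)} cycles for 10 instructions")
```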
Coprocessor: i387
The hardware implementation of floating-point processing in the i387 means floating-point operations run at much higher speed. Without an i387, the i386 can still execute all mathematical expressions using software emulation of the i387
80386: Classic CISC Processor
CISC = Complex Instruction Set Computer
Complex instructions ...but code-size efficient
Micro-encoding of the machine instructions
Extensive addressing capabilities for memory operations
Few, but very useful CPU registers
80386 Execution Sequence
[Figure: 80386 execution sequence: bus interface, prefetch queue, decoding unit, microcode queue with microcode ROM, control unit, and execution unit (ALU and registers), with a coprocessor alongside the CISC processor]
In a microprogrammed CISC, the processor fetches the instructions via the bus interface into a prefetch queue, which transfers them to a decoding unit. The decoding unit breaks each machine instruction into many elementary micro-instructions and applies them to a microcode queue. The micro-instructions are transferred from the microcode queue to the control and execution unit, which drives the ALU and the registers
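The decode step can be pictured as a table lookup that expands each machine instruction into a sequence of micro-instructions appended to a queue. The instruction forms and micro-steps below are invented for illustration only; they are not the 80386's actual microcode:

```python
# Hypothetical microcode ROM contents, for illustration only
MICROCODE_ROM = {
    "MOV reg, reg": ["read source register", "write destination register"],
    "ADD reg, [mem]": ["compute address", "read memory", "ALU add",
                       "write register", "update flags"],
    "ADD [mem], reg": ["compute address", "read memory", "ALU add",
                       "write memory", "update flags"],
}


def decode(instructions: list[str]) -> list[str]:
    """Expand machine instructions into micro-instructions, in order,
    producing the contents of a microcode queue."""
    queue = []
    for instruction in instructions:
        queue.extend(MICROCODE_ROM[instruction])
    return queue


program = ["MOV reg, reg", "ADD [mem], reg"]
micro_queue = decode(program)
print(f"{len(program)} instructions -> {len(micro_queue)} micro-instructions")
```

The ratio of micro-instructions to machine instructions is what makes the microcode ROM's access time and size matter.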
80386 Complex Instructions
CISC drawback: most instructions are so complicated that they have to be broken into a sequence of micro-steps
These steps are called micro-code
Stored in a ROM in the processor core
Micro-code ROM: access time and size...
They require extra ROM and decode logic