
B. The Most Successful Microprocessors and Their Interface Circuits

Prof. Muhammad H. El-SABA is an accomplished electronic engineering educator with a Ph.D. from INSA-Lyon, France, and has authored numerous books and articles on microprocessors and electronic devices. His work includes the development of EDA tools for VLSI devices, and he has taught various subjects including microprocessor architecture and assembly language programming. The document outlines the contents of his book 'Introduction to Microprocessors & Interface Circuits,' which covers a wide range of topics related to microprocessors and their architecture.

Prof. Muhammad El-Saba,
mhs1308&gmail.com

Includes
_________________________________________

ABOUT THE AUTHOR


__________________________________________

Muhammad H. El-SABA was born in Cairo, Egypt. He obtained his Ph.D. in
integrated electronics from INSA-Lyon, France, in 1993. He has been a
lecturer, teacher, and associate professor of electronic engineering at the
Faculty of Engineering, Ain-Shams University, in Cairo. He designed and
implemented many EDA tools, such as the device simulator GOOD-SIM™. The
original tools, which he developed specifically for VLSI devices & circuits,
are currently adopted in the electronics industry. He has authored 33 books
and about 70 articles on device modeling, simulation of electronic devices,
superconductor devices, solid-state integrated circuits, microprocessors and
microcontrollers. He has also prepared and delivered several training courses
in different areas of industrial electronics, mobile communications,
object-oriented programming and VHDL. His current interests include the design
and implementation of VLSI circuits & SoC, with emphasis on communication
equipment.
Introduction to Microprocessors
&
Interface Circuits
________________________________________________

Covers

Intel 8086/8088/80286/80386/80486,
Pentium, Pentium II, Pentium III, Pentium 4, Xeon, Itanium,
Intel Atom, Core, Core2, Core i7, Core i9 Processors,
Latest AMD, ARM and SPARC Processors

Contains an intensive description of


Intel x86 Assembly, Microsoft C/C++ and Java Languages

Prof. Dr. Muhammad El-SABA


Department of Electronics and Communications,
Faculty of Engineering, Ain-Shams University in Cairo.

2002-2020
Introduction to Microprocessors & Interface Circuits INDEX

Copyright © 2002-2020, by the author. All rights reserved.

1st Edition 2002


2nd Edition 2003
3rd Edition 2005
4th Edition 2007
5th Edition 2009
6th Edition 2013
7th Edition 2020

Reproduction or translation of any part of this work, without permission of
the copyright owner, is unlawful. Requests for permission or further
information should be addressed to the author, at the Dept. of Electronic
Engineering, Faculty of Engineering, 1 Sarayat Street, 11517 Abbasia, Cairo,
Egypt. E-mail Address [email protected]

Deposit No. 2003/10177 (Dar El-Kottob)


CONTENTS

Subject Page

Preface
Preamble

CH1: Introduction to Microprocessors & Microcomputers 1


1-1. Microprocessors & Microcomputers 3
1-2. Microprocessor History 6
1-2.1. Intel Microprocessors 6
1-2.2. Motorola Microprocessors 8
1-2.3. MOS Microprocessors 9
1-2.4. Zilog Microprocessors 9
1-2.5. AMD Microprocessors 10
1-2.6. SPARC Microprocessors 11
1-2.7. DEC Alpha Microprocessors 12
1-2.8. ARM Microprocessors 13
1-3. Microcomputer History 14
1-4. RISC & CISC Processors 16
1-5. How does the Microprocessor Work? 16
1-6. Memory & Addressing 19
1-7. Microprocessor Instructions 20
1-8. How does the Microcomputer Work? 28
1-9. Operating Systems 29
1-10. Computer Languages 33
1-11. Summary 36
1-12. Problems 37
1-13. References 39
CH2: Microprocessor Architecture 41
2-1. Introduction 43
2-2. Architecture of 8086/8088 Microprocessor 43
2-3. Architecture of 8087 Coprocessor 55
2-4. Architecture of 80286 Microprocessor 57
2-5. Architecture of 80386 Microprocessor 59
2-6. Architecture of 80486 Microprocessor 64
2-7. Architecture of Intel’s Pentium Microprocessor 66
2-8. Architecture of Intel’s Pentium II Microprocessor 69
2-9. Architecture of Intel’s Pentium III Microprocessor 70

2-10. Architecture of Intel’s Pentium 4 Microprocessor 71
2-11. Intel’s Core and Core2 Microprocessors 76
2-12. Intel’s Core i5, Core i7 and Core i9 Microprocessors 77
2-13. Architecture of 64-bit Microprocessors 77
2-14. Architecture of AMD K10 78
2-15. Summary of Intel & AMD Architectures 79
2-16. Evolution of 80x86 from CISC to RISC Architecture 86
2-17. Architecture of RISC Processors (ARM & SPARC) 88
2-17.1. Architecture of ARM Processors 88
2-17.2. Architecture of SPARC Processors 92
2-17.3. Architecture of Super-SPARC Processors 93
2-17.4. Architecture of SPARC64 Processors 94
2-17.5. Architecture of UltraSPARC64 Processors 98
2-17.6. Multithreading Technology 100
2-18. CPU Market Share 101
2-19. Moore’s Law 104
2-20. Summary 105
2-21. Problems 112
2-22. References 114
CH3: Memory Organization and Segmentation 115
3-1. Memory Segmentation in Computer Systems 117
3-1.1. Virtual Memory 117
3-1.2. Physical Memory 117
3-1.3. Memory Paging 117
3-2. Memory Segmentation in x86 Systems 118
3-2.1. Flat Memory Model 118
3-2.2. Segmented Memory Model 118
3-3. Operation Modes of x86 Processors 119
3-3.1. Real Mode 120
3-3.2. Protected Mode 120
3-3.3. Virtual Mode 121
3-3.4. Long & Legacy Modes of x86-64 Processors 122
3-4. Memory Addressing of x86 Processors 123
3-4.1. Real Mode Addressing (Generating 20-bit Address) 124
3-4.2. Protected Mode Addressing (Generating 32-bit Address) 127
3-4.3. Memory Paging in Protected Mode 131
3-4.4. Protection Aspects 132
3-4.5. Privilege Levels 133

3-4.6. Entering and Leaving the Protected Mode 134
3-4.7. Protected Multitasking 135
3-4.8. Virtual Mode 135
3-4.9. Physical Address Extension (PAE) 135
3-4.10. Long Mode Addressing in x86-64 Architecture 135
3-4.11. Long Mode Memory Management 137
3-4.12. RIP-Relative Addressing 138
3-5. Stack Operation 139
3-5.1. Setting-up a Stack 141
3-5.2. Stack Operations 142
A. Push Operation 142
B. Pop Operation 144
3-5.3. Illustration Examples 145
3-5.4. Stack and Calling Procedures 146
3-5.5. Stack Behavior in 64-Bit Mode 147
3-6. IBM Memory Organization 148
3-7. SPARC Memory Models and Addressing Space 150
3-7.1. SPARC Memory Modes 150
3-7.2. SPARC Addressing Space 152
3-8. ARM Memory Organization 153
3-8.1. ARM Registers 153
3-8.2. ARM Stack 155
3-9. Summary 156
3-10. Problems 160
3-11. References 162
CH4: Microprocessor Instructions 163
4-1. Introduction 165
4-2. Data Types (Bytes, Words, Integers, Floating point numbers, … ) 165
4-3. Instruction Format of x86 Processors 172
4-4. Addressing Modes of x86 Processors 173
4-5. Intel’s 8086/80186/80286/80386/80486 Instruction Set (Alphabetical) 175
4-6. Basic Instruction Set of x86 Processors (by category) 180
4-6.1. Data Transfer Instructions 181
4-6.2. Arithmetic Instructions 187
4-6.3. Logic Instructions 190
4-6.4. String Instructions 193
4-6.5. Program Control Instructions 196
4-6.6. Processor Control Instructions 197
4-7. Math Coprocessor (x87) Instructions 199

4-8. Subroutine Calls & Interrupts in Assembly Language 201
4-8.1. Subroutine Calls (CALL) 202
4-8.2. Interrupts (INT) and interrupt vector table (IVT) 202
4-8.3. Masking Interrupts (Turning Interrupts Off) 206
4-8.4. Interrupts Priority 207
4-9. IBM PC Interrupts & DOS Calls 208
4-9.1. PC Boot Process 208
4-9.2. PC Interrupt Service Routines (ISR’s) 209
4-9.3. BIOS Calls & DOS Calls 213
4-10. Interrupts in Protected Mode 215
4-10.1. Gates 215
4-10.2. Interrupt Descriptor Table (IDT) 216
4-10.3. Interrupt Masking in Protected Mode 216
4-10.4. Debugging in Protected Mode 217
4-11. New Instruction Sets of x86-64 Architecture 218
4-11.1. Media Instructions 219
4-11.2. Floating-Point Instructions 219
4-12. Summary of the Recent x86 Instructions 221
4-12.1. MMX Instructions 221
4-12.2. Streaming SIMD Extensions (SSE) Instructions 221
4-12.3. SSE2 Instructions 222
4-12.4. SSE3 Instructions 223
4-12.5. SSE4 Instructions 223
4-13. Undocumented x86 Instructions 224
4-14. Converting Assembly Language to Machine Code 224
4-15. Case Study: Encoding the MOV Instruction 229
4-16. Execution Time of x86 Instructions 231
4-17. Instruction Set of SPARC Processors 233
4-18. Instruction Format of SPARC Processors 236
4-19. Encoding Load / Store Instructions of SPARC Processors 240
4-20. Instruction Set of ARM Processors 243
4-21. Summary 250
4-22. Problems 253
4-23. References 257
CH5: Assembly Language Programming, Compilation & Debugging 259
5-1. Introduction 261
5-2. DEBUG Program 262
5-3. Macro Assembler Programs 266
5-4. Assembly Language Instruction Format 267
5-5. Assembler Data Types 269
5-6. Assembler Directives 269
5-7. Declaring Variables 272
5-8. Modifiers & Attribute Operators 273
5-9. Difference between Values, Addresses and Pointers 275
5-10. Arrays in Assembly Language 276
5-11. Tables & Lookup Tables in Assembly Language 278
5-12. Other Data Structures in Assembly Language 278
5-12.1. Queues 279
5-12.2. Linked Lists 279
5-12.3. Hash Tables 282
5-12.4. Binary Trees 282
5-13. Working with Strings in Assembly Language 284
5-14. Procedures in Assembly Programs 286
5-15. Functions in Assembly Programs 286
5-16. Writing & Initializing Interrupts in Assembly Programs 287
5-17. Creating Macros in Assembly Programs 289
5-18. Assembly Program Compilation & Linking 291
5-19. 16-Bit Macro-Assemblers (MASM16, TASM16) 292
5-20. MASM Syntax for x86 Memory Addressing Modes 293
5-21. 32-Bit Macro-Assemblers (MASM32) 298
5-22. 64-Bit Macro-Assemblers (YASM) 300
5-23. Summary of x86 Macro-assembler Programs 301
5-24. Summary 302
5-25. Problems 303
5-26. References 306
CH6: Writing Assembly Routines within C/C++ and Java Programs 307
6-1. Introduction 309
6-2. General Considerations (16-bit, 32-bit and 64-bit programs) 310
6-2.1. Using YASM Assembler with Visual Studio and VC++ 310
6-2.2. I/O Software Layers 311
6-2.3. I/O in DOS, and Windows 311
6-2.4. Direct Memory Access (ActiveX and all that Stuff) 312
6-3. C-Programming Language (Summary) 314
6-3.1. Data Types in C-language 315
6-3.2. Variable Declaration in C-language 316
6-3.3. Expressions 316
6-3.4. Operators 316
6-3.5. Conditional Execution & Branching 318
6-3.6. Looping Instructions 319
6-3.7. Functions Declaration (Prototyping) and Definition 319
6-3.8. Derived Types 321
6-3.9. Data Structures 321
6-3.10. Accessing Structure Members 322
6-3.11. Pointers in C-language 322
6-3.12. Utilization of Pointers with Structures 322
6-3.13. Input / Output in C-Language 324
6-3.14. C-Preprocessor Directives 329
6-4. C++ and Object-Oriented Programming 331
6-4.1. Object-Oriented Programming (OOP) 331
6-4.2. Classes in C++ 333
6-4.3. Class Constructors and Destructors 334
6-4.4. Specific Operators in C++ 335
6-4.5. Input / Output in C++ 335
6-4.6. Inheritance in C++ 337
6-4.7. Polymorphism in C++ 340
6-4.8. Abstract Classes in C++ 342
6-4.9. Operator Overloading 342
6-4.10. Friend Functions in C++ 344
6-4.11. Generic Types (Templates) in C++ 344
6-4.12. Additional Notes about C/C++ 347
6-4.13. Common Problems in C/C++ 347
6-4.14. C++11 347
6-5. Programming under Windows 348
6-5.1. Windows Messaging System 348
6-5.2. Writing Windows DLL in C/C++ and Assembly 351
6-6. Writing Assembly Blocks inside C/C++ Programs 352
6-6.1. The _asm Keyword in Visual C/C++ 352
6-6.2. Using C or C++ Symbols in __asm Blocks 353
6-6.3. Writing Functions with Inline Assembly 353
6-6.4. Accessing C or C++ Data in __asm Blocks 354
6-6.5. Jumping to Labels in Inline Assembly 356
6-6.6. Calling C-Functions in Inline Assembly 347
6-6.7. Calling C++ Functions in Inline Assembly 348
6-6.8. Interrupts in Inline Assembly 358
6-7. Java-Programming Language (Summary) 360
6-8. Java versus C++ (Comparison) 405
6-9. Java versus C# (Comparison) 406
6-10. Invoking Assembly Language Programs from Java 408
6-11. Summary 415
6-12. Problems 417
6-13. References 420
CH7: Memory Interfacing with x86 Microprocessors 421
7-1. Introduction 423
7-2. Bus Timing of Memory Read/Write Operations 425
7-2.1. Memory Read Timing 425
7-2.2. Memory Write Timing 426
7-2.3. Wait States in 80x86 Microprocessors 427
7-2.4. Pentium Processor Bus Timing 428
7-2.5. Bus Cycle Time & Bus Bandwidth of 80x86 Processors 428
7-3. Memory Address Decoding 430
7-4. ROM Interfacing 431
7-5. RAM Interfacing (SRAM, DRAM) 434
7-5.1. SRAM Interfacing 435
7-5.2. SRAM versus Cache Memory 436
7-5.3. DRAM Interfacing (EDO, SDRAM, DDR, RAMBUS, DDR2) 439
7-5.4. DRAM Interfacing with 16-bit Data Bus 447
7-5.5. DRAM Interfacing with 32-bit Data Bus 447
7-5.6. DRAM Interfacing with 64-bit Data Bus 448
7-5.7. DRAM Modules 448
7-5.8. DRAM Controllers 452
7-6. Memory Requests 454
7-7. Checking Memory Errors 456
7-7.1. Parity Checking 456
7-7.2. Error Checking and Correction (ECC) 457
7-8. Serial Memory Devices 458
7-9. Secondary Memory 460
7-9.1. Magnetic Storage Devices 461
i. Magnetic Tapes 461
ii. Magnetic Disk Drives 462
7-9.2. Optical Memory & Compact Disks (CD) 464
7-10. Mobile Memory Modules 466
7-10.1. SRAM Cards 466
7-10.2. Flash Memory Cards 466
7-10.3. USB Flash Memory Drives 472
7-11. Summary 475
7-12. Problems 479
7-13. References 481
CH8: I/O Interfacing Circuits for 80x86 Microprocessors 483
8-1. Introduction (I/O Transfer Modes) 485
8-2. Methods of Addressing I/O Ports 486
8-2.1. I/O Address Space 486
8-2.2. Memory-mapped I/O 487
8-3. I/O Instructions 487
8-3.1. Register I/O Instructions 487
8-3.2. Block I/O Instructions 489
8-4. Protected I/O 491
8-5. Designing I/O Interfaces for 80x86 Systems 492
8-5.1. Implementing Simple Input Ports Using 74LS244 Buffers 492
8-5.2. Implementing Simple Output Ports Using 74LS373 Latch 493
8-6. Using 8255 Programmable Peripheral Interface (PPI) Chip 497
Example 8-1. Basic I/O Mode 499
Example 8-2. Basic I/O Mode 501
Example 8-3. Keyboard Scanner & 7-Segment Display 502
Example 8-4. Square Wave Generator (BSR Mode) 506
Example 8-5. Input from ADC 507
Example 8-6. Stepper Motor Control 509
8-7. I/O with Handshaking Capabilities 512
8-7.1. I/O with Handshaking Capabilities 514
Example 8-7. I/O with handshaking (Mode 1) 516
8-7.2. Bidirectional I/O with Handshaking Capabilities 517
Example 8-8. Bidirectional I/O with handshaking (Mode 2) 519
8-7.3. CPU Services for I/O Control 520
8-8. I/O – Memory Interface & Direct Memory Access (DMA) 521
8-8.1. The DMA Chip (8237) Architecture 521
8-8.2. How Does DMA Work? 522
8-8.3. DMA Usage in IBM PC 523
8-8.4. DMA Modes of Operation 524
8-8.5. Programming The DMA 527
8-9. I/O Processors 528
8-9.1. Features of IOP's 528
8-9.2. Intel 8089 IOP 529
8-9.3. Intel 80321 IOP 530
8-10. Summary 531
8-11. Problems 532
8-12. References 535
CH9: Interfacing with IBM PC & Compatibles 537
9-1. Introduction (Overview of the IBM PC) 539
9-2. PC Motherboard 540
9-3. Busses & Expansion Slots 544
9-4. History of PC Buses 546
9-4.1. PC (8-bit) Bus 547
9-4.2. ISA (16-bit) Bus 550
9-4.3. Proprietary Buses & their Problems 552
9-4.4. MCA & EISA (32-bit) Buses 551
9-4.5. VESA Local Bus 551
9-4.6. PCI (64-bit) Bus 553
9-4.7. Accelerated Graphics Port (AGP) & PCI-X (128-bit) Bus 556
9-4.8. PCI Express Bus 558
9-4.9. IEEE-488 (GP-IB) Bus 559
9-4.10. SMBus 560
9-4.11. I2C Bus 560
9-4.12. JTAG (IEEE 1149.1) Bus Architecture 561
9-4.13. Controller Area Network (CAN) 562
9-4.14. Local Interconnect Network (LIN) 562
9-4.15. Multi-Bus Architecture 564
9-4.16. Bus Hierarchy 565
9-4.17. Bus Topologies 566
9-4.18. PCMCIA & ExpressCard 568
9-4.19. PC I/O Extension Cards 568
9-5. PC Serial Ports 570
9-5.1. Introduction to Serial Communications & RS232 570
9-5.2. UART Chips 573
9-5.3. Description of the Serial Port 575
9-5.4. How Many Wires do We Need for a Serial Connection? 576
9-5.5. Addressing the Serial Port 577
9-5.6. Programming the Serial Port 577
9-5.7. Universal Serial Bus (USB) 579
9-5.8. USB to RS232 Interface 582
9-5.9. Other Serial Bus Standards (ACCESS.bus, FireWire, IrDA) 583
9-5.10. PC-to-PC Communication (Networking & Ethernet) 586
9-5.11. Switching Networks 591
9-6. PC Parallel Ports 593
9-6.1. Parallel Port Architecture 594
9-6.2. IBM-PC Parallel Port Cable 595
9-6.3. Parallel port I/O Addressing 596
9-6.4. Parallel Port Timing Diagram 597
9-6.5. Programming the Parallel Port 597
9-6.6. Recent Improvements in the PC Parallel Port 599
9-6.7. Parallel Port I/O under Windows 9X, 2K, XP 601
9-7. Attaching a Mass Storage Device to IBM PC 603
9-7.1. Historical Development of ATA and IDE 603
9-7.2. Parallel ATA (PATA) Interface 606
9-7.3. Serial ATA (SATA) Interface 607
9-7.4. Comparison between ATA, SATA, SCSI and USB 608
9-8. Keyboard Interface 610
9-8.1. Keyboard Operation 611
9-8.2. Detailed Operation of the PS/2 Keyboard 614
9-8.3. Keyboard Protocol & Data Format 614
9-8.4. Keyboard Connectors 615
9-8.5. Keyboard BIOS Calls 616
9-8.6. Keyboard Interface Circuits 617
9-9. Mouse Interface 619
9-10. Video Monitor Interface Circuits 622
9-10.1. Video Adaptor Standards 622
9-10.2. Video Monitors & Connectors 624
9-10.3. BIOS Video Interface & Interrupt 10H 627
9-10.4. Graphics Processing Unit (GPU) & Graphics Accelerators 629
9-11. Summary of I/O Addresses in IBM PC & Compatibles 630
9-12. Summary 632
9-13. Problems 635
9-14. References 637
CH 10: Microprocessor Support Chips & PC Chipsets 639
10-1. Introduction 641
10-2. The 8237 DMA Controller 642
10-3. The 8250A UART Chip 643
10-4. The 8251A USART Chip 646
10-5. The 16550 UART Chip 647
10-6. The 8253 Programmable Interval Timer (PIT) 647
10-7. The 8255 Programmable Peripheral Interface (PPI) 650
10-8. The 8259 Programmable Interrupt Controller (PIC) 652
10-9. The 8279 Keyboard / Display Controller 657
10-10. The 8288 Bus Controller 659
10-11. The 8289 Bus Arbiter 661
10-12. The 8275 & 6845 CRT Controllers 662

10-13. Universal Peripheral Interface (UPI), Intel 8742 Chip 664
10-14. Graphics Processing Units (GPU) & Graphics Accelerators 666
10-15. IBM PC Chipsets 668
10-16. Case Study: Intel 82845 Chipset 669
10-16.1. North-Bridge Chipset (MCH) 669
10-16.2. South-Bridge Chipset (ICH) 670
10-17. Case Study: Intel DG965 Chipset 673
10-18. Case Study: AMD 690G Chipset 673
10-19. Case Study: Intel X58 (Core i7) Chipset 675
10-20. Case Study: Apple PC Chipsets 677
10-21. Case Study: Intel Z170 (Skylake) Chipset 679
10-22. Summary 681
10-23. Problems 686
10-24. References 689
CH 11: Microprocessor Selection Guide 691
11-1. Intel Microprocessors Selection Guide 693
11-2. AMD Microprocessors Selection Guide 700
11-3. SPARC Microprocessors Selection Guide 704
11-4. ARM Microprocessors Selection Guide 706
11-5. Processor Performance Factors 706
11-6. Benchmarks 708
11-7. Microprocessor Packages & Marking 712
11-8. Processor Sockets 713
11-9. Processor Bus Speeds 718
11-10. Processor Overclocking 718
11-11. Processor Supply Voltages 719
11-12. Processor Cooling 719
11-13. Summary 721
11-14. Problems 724
11-15. References 725
Appendices 727
Appendix A: Quick Reference of 8086/8088 Instruction Set 729
Appendix B: Basic Instruction Set of x86 Microprocessors 737
Appendix C: Flag Reference of x86 Instructions 797
Appendix D: Math Coprocessor (x87) Instructions 801
Appendix E: Basic Instruction Set of SPARC Processors 809


Appendix F: Summary of ARM Instruction Set 821


Acronyms 825
Trademarks 855


List of Figures
Figure Figure Caption Page
Chapter 1.
Fig. 1-1 Schematic of a microprocessor system. 4
Fig. 1-2 Photograph of Intel's first microprocessor, the 4004 7
Fig. 1-3 Chronological evolution of Intel's 80x86 family 8
Fig. 1-4 Photograph of some old AMD processors 10
Fig. 1-5 Photograph of the ALTAIR 8800 computer 12
Fig. 1-6 Photograph of IBM's first personal computer (IBM PC 5150) 13
Fig. 1-7. Schematic diagram of a simple microprocessor 15
Fig. 1-8. Schematic diagram of a memory system 18
Fig. 1-9. Flowchart of the program and the equivalent C-language code 20
Fig. 1-10. Flowchart of the microprocessor operation 24
Fig. 1-11(a) Block diagram of a hardwired controller 25
Fig. 1-11(b) Block diagram of a microprogrammed controller 26
Fig. 1-12. Architecture of a microcomputer system (Software & Hardware) 27
Fig. 1-13. Compilation and linking of high-level languages 32

Chapter 2.
Fig. 2-1 Basic architectures of microprocessors 39
Fig. 2-2 Pin-out diagram of the Intel 8086 microprocessor 40
Fig. 2-3 Architecture of the Intel 8086/8088 microprocessors 42
Fig. 2-4 Block Diagram of the 74LS374 octal latch 44
Fig. 2-5 Address de-multiplexing from address/data lines of 8086 processors 45
Fig. 2-6 Block Diagram of the 74LS245 octal tri-state buffer 45
Fig. 2-7(a) Generating control bus signals in 8086 minimum mode 46
Fig. 2-7(b) Generating control bus signals in 8088 minimum mode 46
Fig. 2-8 Illustration of the FLAGS register in Intel 8086/8088 microprocessors. 48
Fig. 2-9 Handling hardware interrupts in 8086/8088 systems, using the 8259 chip 49
Fig. 2-10 Gating the hardware Interrupts inside the 80x86 systems 49
Fig. 2-11 RESET, CLK and READY pins in 8086/8088 microprocessors 50
Fig. 2-12 Architecture of the 8087 math coprocessor 51
Fig. 2-13 Connection of the 8087 with 8088 microprocessor, in maximum mode 52
Fig. 2-14(a) Architecture of the 80286 microprocessor 54
Fig. 2-14(b) Machine status word register (MSW) in the 80286 microprocessor 54
Fig. 2-15(a) Pinout diagram of the 80386DX microprocessor 55
Fig. 2-15(b) Internal architecture of the 80386 microprocessor 56
Fig. 2-16(c) Internal registers in 80386 microprocessors. 58
Fig. 2-16(a) Structure of the EFLAGS register in 80386 microprocessors 59
Fig. 2-16(b) Structure of the CR0 control register in 80386 microprocessors 60
Fig. 2-17 Architecture of the Intel 80486 microprocessor 61

Fig. 2-18 Architecture of the Intel Pentium (80586) microprocessor 63


Fig. 2-19 Pipelined and Non-pipelined instruction processing 64
Fig. 2-20 Architecture of the Intel Pentium 4 microprocessor 67
Fig. 2-21(a) Architecture of the Intel dual Core (Core2) microprocessor 70
Fig. 2-21(b) Photograph of the Intel dual Core2 microprocessor 71
Fig. 2-22 Nehalem Microarchitecture (Core i7) 72
Fig. 2-23 Application registers in x86-64 (64 bit) microprocessors 73
Fig. 2-24 Architecture of AMD K10 microprocessors 74
Fig. 2-25 Low-power state of Atom processors 80
Fig. 2-26 Comparison between the execution cycles of RISC and CISC machines. 83
Fig. 2-27(a) Architecture of ARM1176JZ microprocessor, with ARM11 core 85
Fig. 2-27(b) ARM processors roadmap 86
Fig. 2-28(a) SPARC processor registers 89
Fig. 2-28(b) SPARC integer unit registers organization 90
Fig. 2-29 Block diagram of a SuperSPARC processor 91
Fig. 2-30 Block diagram of the SuperSPARC64 architecture 92
Fig. 2-31 Details of the SuperSPARC64 VI architecture 93
Fig. 2-32 Block diagram of the SuperSPARC64 chip 93
Fig. 2-33 Roadmap of SuperSPARC64 architectures 94
Fig. 2-34 Architecture of UltraSPARC-T1 processor 95
Fig. 2-35 Vertical multithreading (VMT) and simultaneous multithreading (SMT) technologies. 97
Fig. 2-36 Market shares of x86 microprocessor manufacturers 98
Fig. 2-37 Microprocessors and DRAM roadmap. 99

Chapter 3.
Fig. 3-1(a) Basic memory models 115
Fig. 3-1(b) Basic memory operation modes in x86 processors 116
Fig. 3-1(c) Illustration of the operation modes in x86-64 processors. 118
Fig. 3-1(d) Virtual memory space in x86-64 microprocessors 119
Fig. 3-2(a) Memory segmentation in x86 systems (Real mode). 121
Fig. 3-2(b) Addressing a Memory Location inside a Segment, by adding an offset 122
Fig. 3-2(c) Addressing a memory location inside a segment 122
Fig. 3-3(a) Generation of the 32-bit address offset in 80386 and later processors 124
Fig. 3-3(b) Segmented address generation in real and protected modes 124
Fig. 3-4(a) Segment selector architecture 125
Fig. 3-4(b) Fields in a descriptor table (segment descriptor). 125
Fig. 3-4(c) Linear address generation mechanism in protected memory mode. 125
Fig. 3-4(d) Combining the 32-bit effective address with the segment selector 126
Fig. 3-5(a) One of the page directory records (translation table entry). 127
Fig. 3-5(b) Illustration of the paging mechanism 128
Fig. 3-6. Privilege levels, in 80386 (and later processors) systems 129


Fig. 3-7 Structure of virtual memory in x86-64 processors 132
Fig. 3-8 Memory management in long operation sub-modes 133
Fig. 3-9 A Stack of paper 135
Fig. 3-10 Stack segment structure 136
Fig. 3-11 Stack frame 137
Fig. 3-12(a) Operation of the PUSH instruction (in IA-32 processors) 138
Fig. 3-12(b) Operation of the POP instruction (in IA-32 processors) 140
Fig. 3-13(a) Illustration of the Push AX instruction (in 16-bit processors) 141
Fig. 3-13(b) Illustration of the Pop BX instruction (in 16-bit processors) 142
Fig. 3-14 Stack organization with local variables for calling procedures 143
Fig. 3-15(a) IBM PC memory map (in real mode). 144
Fig. 3-15(b) Modern PC memory map (32bit). 145
Fig. 3-16(a) SPARC memory model, memory side. 147
Fig. 3-16(b) SPARC memory model, processor side. 147

Chapter 4.
Fig. 4-1(a) Byte and word organization in memory, with little endian representation 158
Fig. 4-1(b) Packed and unpacked binary-coded decimal (BCD) number representation 160
Fig. 4-1(c) Fixed-point number representation (signed magnitude form). 161
Fig. 4-1(d) Floating-point number representation, as two fixed point numbers 161
Fig. 4-1(e) Floating-point number representation, in IEEE 754 format, for 32-bit 162
Fig. 4-1(f) Floating-point number representation, in IEEE 754 format, for 64-bit 163
Fig. 4-1(g) Floating-point number representation, in IEEE 754 format, for 80-bit 163
Fig. 4-2 Fundamental instruction format for x86 microprocessors 164
Fig. 4-3 Basic addressing modes of the x86 microprocessors 165
Fig. 4-4 Sequence of operations for executing the instruction: MOV AX,[BX+3]. 166
Fig. 4-5 BCD to 7-segment code translation 175
Fig. 4-6 Sign extension and zero extension, from 1 byte to 2 bytes 178
Fig. 4-7(a). Illustration of MUL BX instruction, where BX contains 0100 180
Fig. 4-7(b) Illustration of MUL and DIV instructions, with different operand sizes 181
Fig. 4-8. Shift and Rotate operations in 80x86 microprocessors 184
Fig. 4-9. String transfer (MOVS) or comparison (CMPS) operations, when DF = 0 186
Fig. 4-10. Schematic representation of simple and nested subroutine calls 193
Fig. 4-11. Schematic representation of interrupt vector table 197
Fig. 4-12(a) IBM PC memory map after boot-up process 202
Fig. 4-12(b) IBM PC layered architecture 205
Fig. 4-13. Architecture of the interrupt descriptor table registers 208
Fig. 4-14. Debug registers in 80386 and higher processors 209
Fig. 4-15(a) Instructions format of x86 processors (16-bit instructions). 217
Fig. 4-15(b) Instructions format of x86 processors (32-bit instructions). 217
Fig. 4-16 Encoding the MOV instruction of x86 processors 221
Fig. 4-17 Instruction formats of SPARC processors: Format 1,2,3 228



Fig. 4-18 Instruction formats of SPARC processors: Format 3 (cont.) 229
Fig. 4-19 Instruction formats of SPARC processors: Format 4 229
Fig. 4-20 Formats of Load/Store Instructions of SPARC processors 232
Chapter 5.
Fig. 5-1 Calling the DEBUG program. 247
Fig. 5-2 Assembly program tracing using the DEBUG Program 247
Fig. 5-3(a) Using the DEBUG program to display memory contents 248
Fig. 5-3(b) Using the DEBUG program to enter data, and run programs 248
Fig. 5-3(c) Using the DEBUG program to load a disk sector, with “-L” command. 249
Fig. 5-4 A piece of an assembly program, as it appears in DEBUG and MASM 250
Fig. 5-5 Symbolic format of Assembler instructions 251
Fig. 5-6. Arrangement of data arrays in the main memory 261
Fig. 5-7(a) Illustration of the linked list data structure 263
Fig. 5-7(b) Example of a linked list data structure 264
Fig. 5-7(c) Illustration of the double-linked list data structure 266
Fig. 5-8 Illustration of the binary tree data structure 268
Fig. 5-9. Flowchart of assembly program compilation and linking 275
Fig. 5-10. Memory models of 16-bit assemblers 276
Fig. 5-11 Template of an assembly program, under MASM32. 283

Chapter 6.
Fig. 6-1 Piece of a C++ code and its equivalent binary code 293
Fig. 6-2(a) Operating system layered structure of an IBM PC 295
Fig. 6-2(b) Operating system layered structure, in old and recent PC’s. 296
Fig. 6-3 Representation of an array of data elements in C-Language 300
Fig. 6-4 Main components of object-oriented programming technology 316
Fig. 6-5 Block diagram of a typical Windows application program 333
Fig. 6-6 Compilation and interpretation of Java programs 344
Fig. 6-6 Representation of an array of ten data elements 344
Fig. 6-7 Class hierarchy in java.lang package 372
Fig. 6-8 GraphicObject Class hierarchy 387
Fig. 6-9 Accessing JNI functions 393
Fig. 6-10 Windows MessageBox called from Java 398

Chapter 7.
Fig. 7-1 Memory organization of a computer system 407
Fig. 7-2 Primary, secondary and tertiary memory devices 408
Fig. 7-3 Memory interface circuit to the Intel 8088 microprocessor in IBM PC 409
Fig. 7-4(a) Timing diagram of memory read cycle in 8086/8088 microprocessors 410
Fig. 7-4(b) Timing diagram of memory write cycle in 8086/8088 microprocessors 411
Fig. 7-5 Timing diagram of memory read cycle in Pentium microprocessors 412


Fig. 7-6(a) Memory address decoding of 2 ROM chips, using a simple inverter 414
Fig. 7-6(b) Memory address decoding of several memory chips, using a decoder 415
Fig. 7-7(a) Schematic symbol of a ROM chip. 416
Fig. 7-7(b) Pin-out diagram of 2716 (2k x 8 bit) EPROM 416
Fig. 7-8 Interfacing 8088 to eight 2764 (8k x 8 bit) EPROM chips 417
Fig. 7-9(a) Schematic symbol of a RAM chip. 418
Fig. 7-9(b) External connection of RAM IC's 418
Fig. 7-10 Conventional SRAM Cell, with 6 MOSFET transistors (6T cell). 419
Fig. 7-11 Block diagram of an SRAM chip 420
Fig. 7-12(a) Cache memory operation. 421
Fig. 7-12(b) Multi-level cache memory organization 422
Fig. 7-13(a) Conventional DRAM Cell, with one MOSFET and one capacitor (1T-1C) 422
Fig. 7-13(b) A DRAM module, in READ mode 423
Fig. 7-13(c) A DRAM module, in Write mode 424
Fig. 7-14 General block diagram of a DRAM chip and its timing diagram 425
Fig. 7-15(a) Pin-out diagram of 41256 (256k x 1 bit) DRAM 426
Fig. 7-15(b) Interfacing 8088 with a bank of eight 4164 DRAM chips 427
Fig. 7-16(a) Memory interface with 16-bit data bus 430
Fig. 7-16(b) Memory interface with 32-bit data bus 430
Fig. 7-16(c) Memory interface with 64-bit data bus 431
Fig. 7-17(a) SIMM with 30-pins memory module and its pin-out diagram 432
Fig. 7-17(b) DIMM with 168-pin memory modules 432
Fig. 7-17(c) DDR SDRAM module (DIMM with 184 pin) 432
Fig. 7-17(d) DDR2 SDRAM module (DIMM with 240 pin) 433
Fig. 7-17(e) DDR SDRAM modules and their evolution roadmap 433
Fig. 7-18(a) Pin-out diagram of the 8205 DRAM controller chip 435
Fig. 7-18(b) Interfacing a 1MB DRAM in 2 banks, via the 8205 DRAM controller chip. 436
Fig. 7-18(c) Architecture of the 8420 (1MB) DRAM controller chip 436
Fig. 7-19(a) Schematic diagram of memory requests 419
Fig. 7-19(b) Details of memory request in IBM PC's 437
Fig. 7-20 Simple parity generator circuit 440
Fig. 7-21 Logic symbol of the 74ABT853 data transceiver with parity generator 440
Fig. 7-22 Logic diagram of the 74ABT853 data transceiver with parity generator 441
Fig. 7-23 Structure of a 8-bit serial RAM 441
Fig. 7-24 Structure of 1x1 bit RAM 441
Fig. 7-25 Pinout diagram of Atmel 1k bit serial EEPROM, AT24C01 442
Fig. 7-26 Schematic diagram of Atmel 1k bit serial EEPROM, AT24C01 443
Fig. 7-27. Schematic diagram of a magnetic storage system 444
Fig. 7-28. Schematic diagram of a magnetic tape system 444
Fig. 7-29. Photographs of some floppy disks (diskettes). 445
Fig. 7-30. Schematic diagram of the hard disk drive (HDD). 446

Fig. 7-31 Schematic diagram of a compact disk (CD) and the optical head assembly. 448
Fig. 7-32 Photograph of an SRAM card 449
Fig. 7-33 EEPROM array and cell structure 450
Fig. 7-34 Structure of an AND and NOR flash memories 451
Fig. 7-35 Flash memory cards of different sizes and form factors 452
Fig. 7-36(a) Structure of the Fujitsu 1 MB Flash memory, MBM29LV800 453
Fig. 7-36(b) Architecture of the 2GB NAND flash memory HY27HU08AG 454
Fig. 7-37 USB flash memory modules 456
Fig. 7-38 Internal structure of a USB flash memory module 456

Chapter 8.
Fig. 8-1 I/O interfacing in a microprocessor system 467
Fig. 8-2(a) Implementation of an input port using the 74LS244 octal buffer 473
Fig. 8-2(b) Implementation of a simple input port using 74LS244 and 8 dip switches 474
Fig. 8-3(a) Implementation of an output port using the 74LS373 octal latch 475
Fig. 8-3(b) Implementation of a simple output port using 74LS374 and 8 LED’s 475
Fig. 8-3(c) Implementation of an output port using octal latch and 7-segment display 476
Fig. 8-3(d) Output port with 7-segment circuits and a BCD-to-7-segment decoder 477
Fig. 8-3(e) Output driving circuits 477
Fig. 8-4 Pin-out diagram of the 8255 PPI (or PIO) chip. 478
Fig. 8-5 Control register word of the 8255 PIO chip. 479
Fig. 8-6 Connecting the 8255 PIO chip with a microprocessor busses 481
Fig. 8-7 Connecting the 8255 with the microprocessor busses 482
Fig. 8-8(a) Implementation of a 16-key keypad interface, using the 8255 PIO chip 484
Fig. 8-8(b) Flowchart of the KEY procedure 485
Fig. 8-9 Schematic of the square wave generator by the 8255 in BSR Mode 488
Fig. 8-10 Implementation of an ADC interface, using the 8255 PIO chip 489
Fig. 8-11(a) Schematic of the stepper motor driver circuit 490
Fig. 8-11(b) Schematic of the stepper motor driver circuit and the layout of ULN2003 491
Fig. 8-11(c) Cross section of a permanent magnet bipolar stepper motor 492
Fig. 8-12(a) Schematic representation of data transfer using strobing mechanism 493
Fig. 8-12(b) Schematic representation of data transfer with handshaking 494
Fig. 8-12(c) Data transfer from CPU (via PPI) to Output device with handshaking 494
Fig. 8-12(d) Data transfer from an Input device to CPU (via PPI) with handshaking 495
Fig. 8-13 Handshaking signals of the 8255 PIO chip 496
Fig. 8-14 Timing diagram of the 8255 PIO chip in mode 1 497
Fig. 8-15 Data output from the 8255 PIO chip to a line Printer with Handshaking 498
Fig. 8-16 Bidirectional I/O operation (Mode 2) of PA with handshaking 499
Fig. 8-17(a) Pin-out diagram of the Intel 8237 DMA controller 502
Fig. 8-17(b) Architecture of the Intel 8237 DMA controller 503
Fig. 8-18 Connection of the Intel 8237 DMA controller with the 8088 505
Fig. 8-19 Connection of an I/O Processor (IOP) to a host CPU, via a local bus 510
Fig. 8-20 Functional block diagram of the Intel 8089 I/O Processor 510
Fig. 8-21 Functional block diagram of the Intel 80321 I/O Processor 511

Chapter 9.
Fig. 9-1 Overview of an old desktop IBM PC 519
Fig. 9-2 General block diagram of an IBM PC, with peripheral devices. 520
Fig. 9-3(a) Motherboard of the first IBM PC (1981) and its schematic layout. 521
Fig. 9-3(b) Motherboard of a Pentium-based IBM PC. 522
Fig. 9-3(c) Intel motherboards: DG965 for Intel Core2 Duo microprocessor. 522
Fig. 9-3(d) Intel motherboards. for Intel Core i7 microprocessor, 523
Fig. 9-4 Installing an expansion card into an expansion slot. 525
Fig. 9-5 Different standard shapes of expansion cards 526
Fig. 9-6 Overview of the S-100 interface bus and an S-100 card 527
Fig. 9-7(a) Overview of the ISA interface bus and an ISA card 528
Fig. 9-7(b) Description of the ISA bus. 529
Fig. 9-8 Overview of the MCA bus and the EISA bus. 531
Fig. 9-9(a) Overview of the VESA local bus (VLB). 531
Fig. 9-9(b) Details of the VESA local bus (VLB). 532
Fig. 9-10(a) Pins of the original 32-bit PCI bus 534
Fig. 9-10(b) I/O Pins and corresponding signals of the 32-bit PCI bus 534
Fig. 9-11 Different shapes of PCI slots and cards. 536
Fig. 9-12(a) Illustration of the AGP slot on the motherboard 537
Fig. 9-12(b) AGP Architecture and its connection to the PCI bus 537
Fig. 9-13 Slots of the PCI Express bus 531
Fig. 9-14 GPIB bus and how it connects devices to a PC, via a GPIB cable 540
Fig. 9-15 Simplified model of I2C bus 541
Fig. 9-16 Simplified model of the JTAG bus 541
Fig. 9-17 Simplified model of the CAN bus 542
Fig. 9-18 PC expansion slots. 543
Fig. 9-19 Different busses and I/O devices in the IBM PC 544
Fig. 9-20 Bus topologies 545
Fig. 9-21 Circuit diagram and layout of an ISA bus extension card 547
Fig. 9-22 Transmission of serial data, through serial ports. 549
Fig. 9-23 OSI model of a computer communication system 550
Fig. 9-24 Serial data frame 551
Fig. 9-25 Connection of the 8250 UART with a MODEM, for serial communication 554
Fig. 9-26 DB9 serial cable interface 555
Fig. 9-27 Typical serial cable connection 556
Fig. 9-28 USB connectors 559
Fig. 9-29 USB cables lines 560
Fig. 9-30 USB 3.0 male plugs 561

Fig. 9-31 FT232R USB-UART chip for RS232 -USB interface. 562
Fig. 9-32 FireWire 400 (IEEE 1394a) connectors. 564
Fig. 9-33 Network interface card, with BNC connector. 564
Fig. 9-34 Network interface card, with BNC connector.. 566
Fig. 9-35 PC networking via hubs and switches. 566
Fig. 9-36 PC Networking via hubs or switches, with different configurations.. 567
Fig. 9-37 Ethernet standard connector (RJ-45 plug and cables). 568
Fig. 9-38 Wideband area network and Internet connection to a local area network 559
Fig. 9-39 ATM data packets (cells) as compared to conventional RS-232 frames 571
Fig. 9-40 Transmission of parallel data, through parallel ports. 572
Fig. 9-41 Parallel port DB25 connector. 574
Fig. 9-42 Timing Diagram for the Centronics protocol 576
Fig. 9-43(a) Structure of the parallel port 32-bit drivers. 581
Fig. 9-43(b) Pin list and cables of ATA connector (40-pin plug and ribbon cables).. 581
Fig. 9-44 Serial ATA (SATA) and parallel ATA (PATA) connectors 585
Fig. 9-45 Serial ATA (SATA) plug and cables.. 585
Fig. 9-46 SATA to USB converter. 586
Fig. 9-47 Photographs of two IBM PC keyboards 588
Fig. 9-48 PC/XT keyboard interface 589
Fig. 9-49 Keyboard keys and scan codes 592
Fig. 9-50 Keyboard data frames 592
Fig. 9-51 Main keyboard connectors 593
Fig. 9-52 Schematic of the PS2 keyboard interface, showing the 8048 controller 594
Fig. 9-53 Different shapes of the computer mouse 595
Fig. 9-54 Schematic diagram of a PS/2 mouse 596
Fig. 9-55 Old CRT (analog) monitor and a recent LCD (digital) monitor 597
Fig. 9-56 Video connectors (VGA and DVI and HDMI). 597
Fig. 9-57. Video connectors (VGA and DVI). 598
Fig. 9-58. Basic block diagram of monochrome CRT monitor 600
Fig. 9-59. Video Graphic Adaptor of an old IBM PC 602
Chapter 10.
Fig. 10-1 Block diagram of the IBM PC, showing the 8088 CPU and support chips. 613
Fig. 10-2(a) The 8250 UART chip block diagram 618
Fig. 10-2(b) Pin-out diagram of the 8250 and 16550 UART Chips 619
Fig. 10-3(a) Pin-out diagram of the 8253/8254 programmable interval timer chip 620
Fig. 10-3(b) Functional block diagram of the 8253/54 programmable interval timer 621
Fig. 10-3(c) Bits of the control register of the 8254 PIT. 621
Fig. 10-4(a) Pin-out diagram of the 8255b chip 622
Fig. 10-4(b) Architecture of the 8255 PPI chip 623
Fig. 10-4(c) Mode summary of the 8255 chip 624
Fig. 10-5(a) Pin-out diagram of 8259 PIC chip 625

Fig. 10-5(b) Functional block diagram of the 8259 PIC chip 625
Fig. 10-5(c) Connection of the 8259 PIC chip to the 8088 microprocessor in IBM PC 627
Fig. 10-6(a) Pin-out diagram of the 8279 keyboard controller 629
Fig. 10-6(b) Functional block diagram of the 8279 keyboard controller 629
Fig. 10-6(c) Circuit diagram illustrating the use of the 8279 controller 630
Fig. 10-6(d) Circuit diagram illustrating the use of the 8279 controller 630
Fig. 10-7 Functional block diagram of the 8288 bus controller. 631
Fig. 10-8 Connection of the 8289 bus arbiter to the 8088 in maximum mode. 634
Fig. 10-9a Pin-out diagram of the 6845 CRT controller 635
Fig. 10-9b Utilization of 6845 CRT controller in a monochrome display adapter 636
Fig. 10-9c Utilization of 6845 CRT controller in a color graphic adapter 636
Fig. 10-10a Interfacing the 8088 CPU to peripheral devices via the 8742 UPI 636
Fig. 10-10b Pinout diagram of the 8742 UPI. 637
Fig. 10-10c Block diagram the 8742 UPI. 637
Fig. 10-11 Block diagram of the 82485 chipset 639
Fig. 10-12 Pins of the 82485 North-Bridge chipsets (MCH). 641
Fig. 10-13 Pins of the 82485 SouthBridge chipsets (ICH2). 642
Fig. 10-14 PC motherboard with i485D chipsets 643
Fig. 10-15 Graphic Address Re-mapping Table (GART) 643
Fig. 10-16 Block diagram of the Intel DG965 chipset, 644
Fig. 10-17 AMD G690 chipset, as connected with the AMD Sempron 645
Fig. 10-18 Block diagram of the Intel x58 chipset 647
Fig. 10-19 Difference between Nehalem chipsets and previous Intel microprocessors 648
Fig. 10-20 Apple Xserve chipset with the Xeon microprocessor modules 649
Chapter 11.
Fig. 11-1 Photographs of some Intel microprocessors 661
Fig. 11-2 Photographs of some of the AMD microprocessors 668
Fig. 11-3 Photographs of some of the SPARC microprocessors 671
Fig. 11-4 Benchmarks of Pentium 3 and Pentium 4 microprocessors 676
Fig. 11-5 Benchmarks of Intel and AMD microprocessors 676
Fig. 11-6 Packages of different microprocessors 678
Fig. 11-7 Marking of the Pentium 4 microprocessors 677
Fig. 11-8 Socket 370 and Slot 1, their locations on the motherboard 679
Fig. 11-9 Photograph of the 754 socket (Intel) and the AM2 Socket (AMD) 682
Fig. 11-10 Chronological evolution of processor supply voltages 684
Fig. 11-11 Photograph of a laptop cooler 685

List of Tables
Table Table Title Page
Chapter 1.
Table 1-1 First Intel microprocessors 7
Table 1-2 First Motorola microprocessors. 9
Table 1-3 First Zilog microprocessors. 9
Table 1-4 First AMD microprocessors. 10
Table 1-5 First SPARC microprocessors. 11
Table 1-6 Instruction set of a simple microprocessor. 21
Table 1-7 Unix and its variants 30

Chapter 2.
Table 2-1 Pin assignment of 8086 microprocessor, in the minimum mode 41
Table 2-2 BHE pin signals 43
Table 2-3 Control signals of the 8086/8088, in minimum and maximum modes. 44
Table 2-4 Pin assignment of the 80386 microprocessors. 56
Table 2-5 Special registers in the 80386 microprocessors 60

Chapter 3.
Table 3-1 Segment registers and their typical offsets in x86 microprocessors 121
Table 3-2 Register usage in legacy and 64-bit operation modes 124
Table 3-3 ASI values for different addressing spaces in SPARC processors 148

Chapter 4.
Table 4-1 Instruction set of x86 microprocessors, arranged in alphabetical order. 167
Table 4-2 Variant data transfer instructions, in x86 microprocessors 173
Table 4-3 Stack instructions, in x86 microprocessors 177
Table 4-4 Arithmetic instructions, in 80x86 microprocessors 179
Table 4-5 Logic instructions, in x86 microprocessors 183
Table 4-6 String instructions in x86 microprocessors 185
Table 4-7. Program control instructions in 80x86 microprocessors 188
Table 4-8, Processor control instructions in 80x86 microprocessors 189
Table 4-9. Privilege instructions in x86 microprocessors 191
Table 4-10 Floating point instructions format of x87 coprocessors 192
Table 4-11. First 10 exceptions and interrupts in x86 systems 196
Table 4-12 Summary of main Interrupt vectors, in IBM PC & compatible computers 204
Table 4-13 Undocumented instructions of the 80x86 processors 216
Table 4-14 REG field bits, in the addressing mode byte of x86 instructions 218
Table 4-15 MOD field bits, in the addressing mode byte of x86 instructions 219
Table 4-16 R/M field bits in the addressing mode byte of 80x86 instructions 219
Table 4-17 Effective calculation time in x86 processors (with no pipelining) 224

Table 4-18 Effective address calculation in x86 processors (with no pipelining) 224
Table 4-19 Operation encodings for the load and store operations 233

Chapter 5.
Table 5-1 DEBUG program Instructions. 246
Table 5-2 Different possible formats of an assembler line 252
Table 5-3 Summary of the x86 macro assembler directives and pseudo-ops 255
Table 5-4 Summary of the most famous macro-assemblers, for x86 processors. 285

Chapter 6.
Table 6-1 Variable types in C-language 299
Table 6-2 Arithmetic operators in C-language 301
Table 6-3 Logical operators in C-language 301
Table 6-4 Relational operators in C-language 304
Table 6-5 Bit-level operators in C-language 302
Table 6-6 Escape sequences and string format identifiers in C language 308
Table 6-7 Data conversion specifiers in C language 309
Table 6-8 File I/O modes in C-language 311
Table 6-9. Reserved constants for file search (fseek) in C-language 312
Table 6-10. Standard I/O streams in C language 313
Table 6-11. Standard I/O streams in C++ language 320
Table 6-12. Access rights and inheritance in C++ 321
Table 6-13. Brief list of important messages of Windows operating systems 334
Table 6-14 Basic data types in Java 345
Table 6-15 Java operators 347
Table 6-16 Converters and flags used in TestFormat.java 385

Chapter 7.
Table 7-1 Pin assignment of the 2716 EPROM 417
Table 7-2 Pin assignment of the 41256 DRAM 427
Table 7-3 Bandwidth of most well-known DRAM types and their peak value 429
Table 7-4 Memory bank selection, in 16-bit data bus PC systems 430
Table 7-5 Specifications for SDRAM (DDR and DDR2) modules 434
Table 7-6 List of the most famous flash memory cards 453
Table 7-7 Pin assignment of the 2 MB Flash memory HY27HU08AG 455
Table 7-8 Summary of memory technologies and their applications 460

Chapter 8.
Table 8-1 Port selection map of the 8255 PIO chip 479
Table 8-2 Stepping modes of a stepper motor 492
Table 8-3 Direct memory access (DMA) channels usage 504

Chapter 9.
Table 9-1 Comparison between AGP and classic PCI bus. 537
Table 9-2 Comparison between AGP and other busses. 538
Table 9-3 Addresses of the PC communication ports 556
Table 9-4 Comparison between different communication networking technologies 570
Table 9-5 List of pins of the Parallel Interface cable 574
Table 9-6 Addresses of the PC line printers (LPTn) 577
Table 9-7 Comparison between serial devices and their data transfer rates 579
Table 9-8 Keyboard scan codes, for IBM PC’s 590
Table 9-9 Mouse data packets 596
Table 9-10(a) Pins of the VGA (15 pin) adaptor 598
Table 9-10(b) Pins of the VGA (9 pin) adaptor 598
Table 9-10(c) Pins of the DVI (24 pin) adaptor 599
Table 9-10(d) Pins of the DFP (20 pin) adaptor 599
Table 9-11 Video adaptor standards. 602
Table 9-12a Summary of I/O addresses in the IBM PC and compatibles 603
Table 9-12b Recent additions for I/O addresses in the IBM PC and compatibles 604

Chapter 10.
Table 10-1 Intel microprocessor support chips 614
Table 10-2 Registers of UART chips 610
Table 10-3 Bits of the Interrupt Enable Register (IER) 617
Table 10-4 Interrupts in the IBM PC 625
Table 10-5 Interrupt Requests (IRQs) in a recent IBM PC 628
Table 10-6 Input status and output control signals of the 8288 chip. 631
Table 10-7 Description of pins of the 8742 638
Table 10-8 List of recent PC chipsets and their characteristics 646

Chapter 11.
Table 11-1 INTEL Processors, from 1971 to 2008. 660
Table 11-2a Examples of Intel Mobile Processors 661
Table 11-2b Intel Desktop Processors. 662
Table 11-2c Intel Corei7 Desktop Processors 662
Table 11-2d Intel Laptop Processors 663
Table 11-2e Intel Laptop Processors (Cont) 664
Table 11-2f Intel Workstations and Server Processors. 664
Table 11-2g Intel Workstations and Server Processors (Cont.) 665
Table 11-3 AMD Processors, from 1975 to 2008. 666
Table 11-4 SPARC Processors from 1987 to 2008. 670
Table 11-5 Benchmarks of different processors 674
Table 11-6 Intel CPU sockets 681
Table 11-7 AMD CPU sockets. 681

PREFACE
Introduction to Microprocessors & Interface Circuits deals with the general
principles of microprocessor design and interfacing by looking at the famous
x86 microprocessors (from Intel and AMD) and their associated peripheral
interface chips. For the sake of comparison, I also briefly introduce the
architecture of ARM and SPARC processors. My goal for this book is first
educational. The book aims to give electrical engineering students a
general understanding of microprocessor system design and programming
techniques. The architecture, operation and programming of x86
microprocessors, as well as their interface circuits, are all covered in a
didactic manner throughout this book. In particular, we handle the x86
microprocessors, which are certainly worthy of study. In fact, the generic
term x86 refers to the instruction set of the most commercially successful
CPU architecture in the history of personal computing. The x86 assembly
language is used in order to emphasize the sequence of operations and their
implications for the hardware. We look at important concepts such as address
decoding, memory and input/output device interfacing, as well as data
communication and handshaking mechanisms. Furthermore, we look at the PC
architecture and operation. This should enable the student to enter the
workplace with microprocessor design skills and an understanding of
microprocessor-based applications. Assembly programming is adequately
explained in the course. My choice has been to present a large set of
assembly language examples, which illustrate the various design options and
possibilities, both in instruction sets and in overall configurations. I
make use of the DEBUG utility to show what action each instruction performs,
and then provide sample assembly programs to show its application. The given
examples are actual programs, taken from the technical literature and
manuals, offering students a fun, hands-on learning experience. Knowledge of
assembly language will help the reader write better programs, even in
high-level languages such as C, C++ and Java. The book covers, in eleven
chapters and seven appendices, the hardware and software design issues that
are needed for building a complete microprocessor-based system. However,
this is not a dedicated book about assembly language, nor is it a dedicated
book about the PC. Rather, it is a one-stop source on microprocessors that
uses an easy-to-understand, step-by-step approach to teaching the
fundamentals of assembly language programming and the PC architecture.

I handled PC-related subjects, including all types of interface buses, memory
map and management, BIOS interrupts, serial and parallel ports, I/O
interfacing techniques, supporting chips, chipsets, PC networking, Ethernet,
and more.

There are three distinct populations of professionals whose education is to be
served by this book: the electronic engineer, who will design computer
system circuitry; the computer engineer, who will design the physical
computer architecture; and, more generally, the electrical engineer, who sees
computer systems simply as one part of a larger system. This book will
hardly make easy fare for undergraduate students who do not have an
instructor somewhat skilled in the art that is being taught. However, this
book is meant for study. It goes without saying that, for the computer
designer, the worked examples of this book should be fully assimilated.

The learning objectives are stated at the beginning of each chapter. These
learning objectives serve as a preview of the information the student is
expected to learn in the chapter. Each chapter ends with a summary
and ample problems to test the student's understanding. The questions are
based on the learning objectives. On successful completion of this book, the
student will be able to:
• Analyze the performance of a microprocessor-based system and assess
the contribution of each part to overall system performance.
• Describe and analyze the effect of hardware limitations on the
performance of the microprocessor system and appreciate how the
designer can overcome such limitations.
• Predict the way in which trends in microprocessor architecture and
peripheral design will be incorporated in the next generation of
microprocessor systems.

I have goals for the book in addition to the educational ones. I think the book
can serve as a useful reference for practicing electronics and computer
engineers. Behind the goal of the book as a guide for the computer designer
lies the feeling that the field of computer engineering needs to develop a
sense of history and of looking to the past for guidance. The fantastic
advance in basic logic technology (in speed, cost, and reliability) makes each
day seem an absolutely new one. Thus, we have the goal of saving some of
the past and letting our engineers catch the technology train for our future needs
in the computer industry. This goal is mixed with a certain archival feeling.

We are all trying to increase the productivity of the creative work of our
society in general, and of our engineers in particular, by arming them with
the necessary tools to conquer the computer (hardware and software) design
arena.

I’m indebted to all my professors and colleagues at Ain-Shams University,
and to all the assistants and students (all over the world) who have read and
criticized the various subjects and examples throughout this book.

Finally, I'd like to thank my colleagues who encouraged me to publish this
book and adopted it as a main reference for students in the electrical and
electronic engineering departments in Egypt and the Gulf countries.

Prof. Dr. Muhammad EL-SABA

Cairo in April 2013

Preamble

This book refers to many public domain web sources and textbooks in the
field of microprocessors and assembly programming. We don’t claim any
major original scientific contribution in this book. In fact, almost all the
information in this book can be found elsewhere in many public domain
resources. For the sake of acknowledgment, all these resources and references
are collected at the end of this book. We intended to write the book in
a friendly manner, as an introductory textbook, and we intentionally did not
make it look like a scientific paper in a specialized journal. The footnotes
are restricted to the explanation of technical terms, whose explanation
might distract the reader if it were inserted in the main text.

However, a great effort has been exerted, throughout several years of hard
work, so that the huge amount of information contained in this book is
arranged and presented in a didactic manner. An additional effort has been
exerted to supplement all this information with pedagogic discussion,
illustrative figures, and solved examples graded in difficulty.

I would also like to emphasize that a book about microprocessors must
refer to many trademarks and advertising pages of various manufacturers.
All trademarks belong to their corresponding owners. These trademarks are
summarized in a separate page (Trademarks) at the end of this book.

Introduction to Microprocessors CHAPTER 1

Introduction to
Microprocessors
Contents

1-1. Microprocessors & Microcomputers
1-2. Microprocessor History (INTEL, ZILOG, MOTOROLA, Cyrix, AMD)
1-2.1. Intel’s Microprocessors
1-2.2. Zilog Microprocessors
1-2.3. MOS Microprocessors
1-2.4. Motorola Microprocessors
1-2.5. AMD & Cyrix Microprocessors
1-2.6. DEC Alpha Microprocessors
1-2.7. SPARC Microprocessors
1-2.8. ARM Microprocessors
1-3. Microcomputer History
1-4. RISC & CISC Processors
1-5. How does the Microprocessor Work?
1-6. Memory & Memory Addressing
1-7. Microprocessor Instructions
1-8. How does the Microcomputer Work?
1-9. Operating Systems
1-10. Computer Languages
1-11. Summary
1-12. Problems

Introduction to
Microprocessors

1-1. Microprocessors & Microcomputers


Microprocessors (μP’s) are the brains of computers. The microprocessor is
a general-purpose programmable integrated circuit, which is able to
execute many different sets of user-defined functions (programs). These
programs are usually written using a set of predefined instructions
which the microprocessor can handle. This set is called the
microprocessor instruction set. Microprocessors are widely used in many
products in our daily life, such as TVs, cars, home appliances and, of
course, computers.

The microcomputer is a microprocessor-based system, which consists of
a central processing unit (CPU), memory and input/output (I/O) ports
that are interfaced to input and output devices. The microprocessor is
essentially a CPU which is implemented on a single chip. The CPU is thus the
part of the computer where information is processed (add, subtract, etc.).

Memory is the part of the computer which stores information in the form of
binary numbers (codes of 0’s and 1’s). Input devices, like the keyboard and
mouse, permit us to input information to the computer system. Similarly,
output devices, like monitors and printers, permit us to get back the
information from within a computer system after it has been
processed. The microprocessor has the ability to perform the following
functions:

• Execute stored sets of instructions (programs),
• Access external memory chips for read and write operations,
• Access external input and output ports.

As shown in figure 1-1, the microprocessor system is interfaced to
external memory and input/output ports via a group of wires, called
system buses. There exist three main buses in the microprocessor system,
namely: data bus, address bus, and control bus.

The address bus (that may be 8, 16 or 32 bits wide) sends an address to
memory. The data bus (that may be 8, 16 or 32 bits wide) can send data
to memory or receive data from memory.
The control bus carries the signals that control and coordinate the
various activities across the computer. For instance, it has RD (read) and
WR (write) lines, which tell the memory whether the microprocessor wants to
get (read) or set (write) data. As shown in figure 1-1, the microprocessor
is generally composed of two units:
the bus interface unit (BIU) and the execution unit (EU). The EU handles
instructions and executes them, and the BIU handles the CPU transactions
with memory and input/output devices. The EU is composed of several
building blocks, among which the most important are the arithmetic logic
unit (ALU) and the control unit (CU).
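
The width of the address bus fixes how many memory locations the
microprocessor can select: an address bus with n lines can carry 2^n
different addresses. As a rough illustration, the short Python sketch below
computes this number for the bus widths mentioned above, plus the 20 address
lines of the 8086/8088. The function name is ours and is only illustrative.

```python
def addressable_locations(address_bits):
    """Number of distinct addresses an address bus of this width can carry."""
    return 2 ** address_bits


# Bus widths from the text, plus the 20 address lines of the 8086/8088.
for bits in (8, 16, 20, 32):
    print(bits, "address lines ->", addressable_locations(bits), "locations")
```

With one byte stored per location, 20 address lines correspond to 1 MB of
addressable memory.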

[Figure 1-1: block diagram showing the CPU (EU and BIU), the memory, an input
port (keyboard) and an output port (monitor), connected by the data bus, the
address bus and the control bus (RD/WR), together with the CLOCK signal.]

Fig. 1-1. Schematic of a microprocessor system. The central processing unit (CPU) is
generally composed of two units: bus interface unit (BIU) and execution unit (EU).

The ALU is a combination of logic and arithmetic circuits, like adders,
multipliers and dividers. More advanced ALUs have other logic functions,
like shift, rotate and compare capabilities. ALUs existed as standard
logic chips prior to microprocessors. Microprocessor designers just drew
on the ideas of such chips to create a complete processor on a single chip.
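
To make the role of the ALU more concrete, the following Python sketch models
a toy ALU. It is only a sketch under stated assumptions: an 8-bit data path
is assumed, and the operation names (ADD, SUB, AND, OR, SHL) are chosen for
illustration rather than taken from any particular microprocessor. The
operands are assumed to lie between 0 and 255.

```python
MASK = 0xFF  # assumption: an 8-bit data path


def alu(op, a, b=0):
    """Return (result, carry) for a toy 8-bit ALU (illustrative names only)."""
    if op == "ADD":
        full = a + b
    elif op == "SUB":
        full = a - b
    elif op == "AND":
        full = a & b
    elif op == "OR":
        full = a | b
    elif op == "SHL":          # shift left by one bit position
        full = a << 1
    else:
        raise ValueError("unknown operation: " + op)
    # Carry (or borrow) is raised when the true result does not fit in 8 bits.
    carry = 1 if (full > MASK or full < 0) else 0
    return full & MASK, carry


print(alu("ADD", 200, 100))
print(alu("AND", 0xF0, 0x3C))
```

Only the lowest 8 bits of the result are kept; the bit that does not fit is
reported separately as the carry.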

The control unit (CU) is the traffic cop of any microprocessor. The CU
implements the microprocessor instruction set. The CU handles the order
of execution of programs, and fetches and decodes the instructions to be
executed. Based on the bit combinations of the instructions, the CU
moves data around the microprocessor and sends the necessary signals to
the ALU to perform the operation needed. After instructions are executed,
the CU sets various signals, called flags, that indicate the status of the
microprocessor and of the execution of instructions. More advanced CUs can
respond to unplanned events inside and outside of the microprocessor
through interrupts. The interrupts cause the CU to invoke special
programs to deal with these events. More advanced CUs use pipelining
to handle the execution of multiple instructions at once, pre-execute
instructions, and predict jump sequences.
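
The fetch, decode and execute sequence described above can be sketched in a
few lines of Python. The machine below is hypothetical: we assume 8-bit
instruction words in which the upper 4 bits hold the opcode and the lower 4
bits hold an immediate operand, and we invent four opcodes (LOAD, ADD, SUB,
HALT). It has a single accumulator and one flag (the zero flag).

```python
def run(program):
    """Fetch, decode and execute a toy program; return (accumulator, zero_flag)."""
    acc, zero_flag, pc = 0, 0, 0
    while pc < len(program):
        word = program[pc]                 # fetch the next instruction word
        pc += 1
        opcode = word >> 4                 # decode: upper 4 bits
        operand = word & 0x0F              # decode: lower 4 bits
        if opcode == 0x1:                  # LOAD immediate (hypothetical)
            acc = operand
        elif opcode == 0x2:                # ADD immediate (hypothetical)
            acc = (acc + operand) & 0xFF
        elif opcode == 0x3:                # SUB immediate (hypothetical)
            acc = (acc - operand) & 0xFF
        elif opcode == 0xF:                # HALT (hypothetical)
            break
        else:
            raise ValueError("undefined opcode")
        zero_flag = 1 if acc == 0 else 0   # status flag set after execution
    return acc, zero_flag


# LOAD 5, ADD 3, SUB 8, HALT: the accumulator ends at 0 and the zero flag is set.
print(run([0x15, 0x23, 0x38, 0xF0]))
```

In this toy machine every executed instruction updates the zero flag; real
microprocessors define, instruction by instruction, which flags are affected.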

There are two types of control units: hard-wired and micro-programmed.
The difference between them lies in how the
microprocessor instruction set is implemented; otherwise the function is
pretty much the same. In a hard-wired microprocessor, the instruction
register is hardwired to the rest of the microprocessor. As described above,
each bit, in each position, of an instruction causes the control unit (CU)
to do something. The bits are, in effect, instructions within an instruction
that control how the microprocessor does what it does. This is a very
efficient and cost-effective way of implementing a microprocessor, but it
has one major drawback. If any significant changes are to be made in the
next generation of microprocessors, to increase
performance or even reliability, there is a very good chance that the
instruction set will need to change, because of the very tight coupling
between the bits of the instruction and the operation of the microprocessor.

To solve this problem, microprocessor designers decided to put a buffer
between the instructions and the CU. Designers decided on a fixed
macro-instruction set that would not change from one generation to
another. The designers would also create a set of micro-instructions that
are hardwired and optimized for each generation of microprocessor. The
macro-instructions are then mapped to their corresponding
micro-instructions.

The mapping of macro-instructions to micro-instructions is stored in a
memory accessible by the control unit (CU). During program execution, the
CU fetches the corresponding micro-instructions and executes them in place
of the macro-instruction. This concept allows the investment in programming
to be preserved when upgrading to new generations of microprocessors.
Famous examples of micro-programming include the IBM System/360
architecture and the Intel x86 architecture.
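
The mapping idea can be sketched in a few lines of C. This is only an illustration under stated assumptions: the micro-operation names (U_FETCH_OPERAND, U_ALU_ADD, U_WRITE_BACK), the two macro-instructions (MACRO_ADD, MACRO_MOVE) and the function expand_macro are hypothetical and do not describe any real control store, whose micro-instruction words are much wider.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical micro-operations; U_END marks the end of a sequence. */
typedef enum { U_END = 0, U_FETCH_OPERAND, U_ALU_ADD, U_WRITE_BACK } MicroOp;

/* Hypothetical macro-instruction opcodes, fixed across generations. */
enum { MACRO_ADD = 0, MACRO_MOVE = 1, MACRO_COUNT = 2 };

#define MAX_STEPS 4

/* Control store: one row of micro-operations per macro-instruction.
   A new processor generation could change these rows without
   changing the macro-instruction opcodes seen by the programmer. */
static const MicroOp control_store[MACRO_COUNT][MAX_STEPS] = {
    [MACRO_ADD]  = { U_FETCH_OPERAND, U_ALU_ADD, U_WRITE_BACK, U_END },
    [MACRO_MOVE] = { U_FETCH_OPERAND, U_WRITE_BACK, U_END, U_END },
};

/* Copy the micro-operations of one macro-instruction into out[]
   and return how many there are. */
size_t expand_macro(int opcode, MicroOp out[MAX_STEPS])
{
    assert(opcode >= 0 && opcode < MACRO_COUNT);
    size_t n = 0;
    while (n < MAX_STEPS && control_store[opcode][n] != U_END) {
        out[n] = control_store[opcode][n];
        n++;
    }
    return n;
}
```

Calling expand_macro(MACRO_ADD, buffer) fills the buffer with the three micro-operations of the hypothetical ADD, which is the lookup the CU is described as performing.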

1-2. Microprocessor History


The first single-chip CPU (which has been called a microprocessor) was
the Intel 4004, which was introduced in 1971. The story started in 1969,
when Busicom (Japan) asked Intel and Mostek to develop a set of ICs
for an electronic calculator. Intel completed the task with a single
microprocessor chip, the 4004, which was used by Busicom for the
desktop electronic calculator 141-PF. Mostek developed a complete
"calculator-on-a-chip", which was used in Busicom's first handheld
model, the Handy LE-120. We mention here some of the well-known
microprocessors that have been commercially available. The
following list mainly includes the microprocessors of the 1980s and 1990s.
The major microprocessor manufacturers, such as Intel, AMD, and IBM, are
constantly improving their CPU designs in several ways. Such
improvements or iterations are called technology steppings. In the next
chapter we discuss different generations of x86 microprocessors, their
internal architecture and functionality issues.

1-2.1. INTEL Microprocessors


Intel was founded in July 1968 by the American engineers Robert
Noyce and Gordon Moore. Historically, the Intel 4004 was the first
microprocessor. The 4004 was a 4-bit processor, but its instructions were
8 bits long. It had a 1kB data memory and a 12-bit program counter for
4kB program memory. It also had sixteen 4-bit general-purpose registers.
The 4004 had 46 instructions, using only 2,300 transistors in a 16-pin
dual inline package (DIP). It ran at a clock rate of 740 kHz (CPU cycle of
10.8 µs). Many authors expressed their admiration of the first
microprocessor chip, the 4004, by saying that if the Pioneer 10 should ever
be found by an extraterrestrial species, the 4004 will represent an
example of Earth's technology¹.

¹ The 4004 was the first CPU on a single chip. It had the same computing performance as the first
large-scale digital computer, ENIAC. However, the CPU of ENIAC was built with thousands of vacuum
tubes, on an area of about 180 m², while the 4004 was built on a single chip of a few mm².


The Intel 8008 was the first 8-bit microprocessor. It was introduced in
1972 and offered about twice the processing power of the Intel 4004. The
Intel 8008 microprocessor was able to perform 50,000 instructions per
second. The Intel 4040 (1974) was an enhanced version of the 4004, adding
14 instructions, 8K program space, a larger stack (8 levels), and interrupt
abilities (including shadows of the first 8 registers).

Fig. 1-2. Photograph of the first Intel microprocessor, the Intel 4004, and the latest
Intel Core i9 X-series

The Intel 8080 was the first general-purpose 8-bit microprocessor. It was
introduced in 1974 and was ten times faster than the 8008. Its 16-bit
address bus allowed the 8080 to access up to 64 kB of memory. So, it
was utilized in the first microcomputer kits, like the Altair and the
IMSAI. Much later, Intel, with Hewlett-Packard, developed a generation of
processors with a 64-bit architecture called IA-64 (the older 80x86 design
was renamed IA-32). The following chronological list depicts the
microprocessors that followed the 8080. Figure 1-3 also depicts the
chronological evolution of the Intel microprocessors.

Table 1-1. First Intel microprocessors

Processor                                                                Year
8086 (16-bit data, 20-bit address), 29,000 transistors, 5 MHz            [1978]
8088 (8-bit data, 20-bit address), first IBM PCs                         [1981]
80286 (16-bit data, 24-bit address), 134,000 transistors, 6 MHz          [1982]
80386 (32-bit data, 32-bit address), 275,000 transistors                 [1985]
80486 (32-bit data, 32-bit address), 1.2 million transistors             [1989]
Pentium (64-bit, 32-bit address), 3.1 million transistors, 60 MHz        [1993]
Pentium III (64-bit, 32-bit address), 9.5 million transistors, 450 MHz   [1999]
Pentium 4 (64-bit, 32-bit address), 42 million transistors, 1.5 GHz      [2000]
Xeon (64-bit, 32-bit address)                                            [2001]
Pentium M (64-bit, 32-bit address)                                       [2002]
Itanium (64-bit), 25 million transistors                                 [2003]
Pentium Extreme Edition (64-bit)                                         [2005]
Xeon (64-bit)                                                            [2006]
Core 2 (64-bit), 2 cores, 65 nm                                          [2007]
Core 2 Quad, 4 cores, 45 nm                                              [2008]
Core i3, 2-6 cores, 32 nm, 3.6 GHz                                       [2011-2020]
Core i5, 2-6 cores, 32-14 nm, 3.6 GHz                                    [2011-now]
Core i7, 4-8 cores, 14 nm, 3.6 GHz                                       [2011-now]
Core i9, 10-18 cores, 14 nm technology, 3.9 GHz                          [2019-now]

The first IA-64 implementation was named Itanium and was intended to
be compatible with the 80x86. Itanium is a VLIW (very long instruction
word) machine and can handle six instructions simultaneously. The
lineup of Core processors includes the Intel Core i3, Core i5, Core i7, and
Core i9, as well as the X-series of Intel Core CPUs. As of 2021,
the x86 architecture is used in most high end computers, including cloud
computing, servers, workstations, personal computers and laptops.

Fig. 1-3. Chronological evolution of Intel's 80x86 family.

1-2.2. MOTOROLA Microprocessors.


The Motorola MC6800 microprocessor was introduced in 1974 (six
months after Intel's 8080). Together with the MOS Technology 6502 and the
Intel 8080, it was one of the first 8-bit microprocessors,
all of which were used in early personal computers. The 6800
had a good reputation in the
microcomputer industry. Motorola went on to develop the 68000 series
processors, which were used by Apple to power their Macintosh
machines. The 68060 was the last development of the 680x0 series for
general purpose use. It was used in some Amiga machines, but by then Apple
had moved to RISC (PowerPC) processors.

Table 1-2. First Motorola microprocessors

Processor                                               Year
6800 (8-bit data, 16-bit address)                       [1974]
68000 (16-bit data, 24-bit address)                     [1979]
68010 (16-bit data, 24-bit address)                     [1983]
68020 (32-bit data, 32-bit address), for Apple Mac      [1985]
68030 (32-bit data, 32-bit address), 18 MIPS, 50 MHz    [1988]
68060 (32-bit data, 32-bit address)                     [1993]

1-2.3. MOS Technology Microprocessors.


The MOS Technology 6502 is an 8-bit microprocessor
that was introduced in 1975. The 6502 was designed by many of the same
engineers who had designed the Motorola 6800 microprocessor family.
The 6502 was used in the Apple II, which was among the first
PCs to be introduced. The 6502 was also used in Atari computers and the
Nintendo Entertainment System (NES).

Fig. 1-4. Development of Motorola 680x0 family.

1-2.4. ZILOG Microprocessors


After the introduction of the Intel 8080, Zilog introduced the Z80 in July
1976. The Z80 came about when Federico Faggin left Intel after working
on the 8080, and founded Zilog. The Zilog Z80 was an improved version
of the 8080. It was an 8-bit microprocessor and was widely used in
embedded computers. The Z80 used 8-bit data and 16-bit addressing, and
could execute all of the 8080 opcodes, but included 80 more instructions.
The clock speed of the Zilog processors ranged from 2.5 MHz to 10 MHz.

The thing that really made the Z-80 popular in designs was the built-in
memory interface. The Z-80 CPU generated its own RAM refresh signals,
which meant easier design and lower system cost. This was a deciding
factor in its selection for the Radio Shack microcomputer TRS-80.
Although Zilog made several attempts to move off the Z80 onto more
powerful 16-bit (Z800, Z8000) and 32-bit (Z80000) platforms, other
companies were offering CPU's in this performance range years earlier,
and the Zilog chips never caught on.

The Z80000 was introduced in 1986, as a 32-bit extension of the 16-bit
Z8000. Its address bus was 24-bit and it had a 256 byte cache.
However, the Z80000 was not Z80 compatible.

Table 1-3. First Zilog microprocessors

Processor                                  Year
Z80 (8-bit data, 16-bit address)           [1976]
Z8000 (16-bit data, 24-bit address)        [1982]
Z80000 (32-bit data, 24-bit address)       [1986]

1-2.5. Advanced Micro Devices (AMD) & Cyrix Processors


The most important competitors for Intel's share of the microprocessor
market have been Advanced Micro Devices (AMD) and Cyrix. National
Semiconductor purchased Cyrix, but the Cyrix name still appears on the x86
chips that National Semiconductor produces. In 1987 Intel cancelled the
technology exchange agreement with AMD. As a result, AMD didn't get
rights to the 80386 and later Intel CPUs. Since then, AMD has developed
its own line of products. The following list summarizes the main AMD
processors. A complete list can be found in Chapter 11.

Table 1-4. First AMD microprocessors

Processor                                                    Year
Am2900, 4-bit slice microprocessor                           [1975]
AMD 29000 series, aka 29K                                    [1987-1995]
x86 processors second-sourced, under contract with Intel     [1979-1991]
Amx86 series                                                 [1991-95]
K5 series                                                    [1995]
K6 series                                                    [1997-2001]
K7 series                                                    [1999-2003]
K8 series                                                    [2003-now]
K10 series                                                   [2007-now]
Zen series (Zen 1: 14 nm, Zen 2: 7 nm)                       [2017-now]

[Photographs: AMD Am2901, AMD 80286, AMD 80386]

Fig. 1-5. Photographs of some old AMD microprocessors

Other companies introduced their own versions of Intel’s 8080, like the
National Semiconductors IMP-8, and the Fairchild F-8. However, only
Intel and Motorola continued to create and improve new microprocessors.
Zilog has rather concentrated on microcontrollers, which are complete
microcomputer systems on a single chip. On the other hand, Texas
Instruments has concentrated on the production of RISC processors
(SPARC) and digital signal processors (DSP’s).

1-2.6. SPARC Processors


The Scalable Processor ARChitecture (SPARC) was formulated at Sun
Microsystems in 1984 through 1987. The original SPARC architecture
was based on the RISC I and II designs engineered at the University of
California at Berkeley (UCB) from 1980 through 1982. It was developed
at the same time as the MIPS architecture, which was developed at
Stanford University and then adopted by Silicon Graphics. Like MIPS,
the architecture is open (not proprietary), and the name SPARC is
licensed by SPARC International to those who manufacture chips.
SPARC was first implemented by Sun in 1987. In that year, Sun
introduced the first SPARC-based computer, the Sun-4. This was
packaged like the Sun-3 servers (based on the Motorola 68030 CPU). In 1989,
Sun introduced the SPARCstation 1, 3 times faster than the Sun-3 model. This
blew away all other competitors, who were waiting to get volume
quantities of Motorola's 68040 CPU to replace their 68030 models. In
fact the 68040 took a long time to come, which forced many workstation
manufacturers to ship older models with the promise of a future upgrade.
Since then, Sun has been the major player in the workstation market. The
line of SPARC processors has traditionally been produced by Texas
Instruments and utilized by Sun Microsystems in their famous
workstations. However, there is no single dominant supplier of SPARC
CPUs. Sun Microsystems, the largest consumer of SPARC
chips, runs a fabless business model. This means that Sun
does not own a fabrication facility (silicon foundry) to manufacture
its processors.

Table 1-5. First SPARC microprocessors

Processor Name     Year   Data bus   Frequency (MHz)   Platform
SPARC              1987   32 bit     14.2-40           SUN-4
SuperSPARC         1992   32 bit     33-90             SPARCstation20
hyperSPARC         1993   32 bit     40-200            SPARCstation20
TurboSPARC         1995   32 bit                       SPARCstation5
SPARC64            1995   64 bit     101-118           HALstation 300
SPARC64 VI         2007   64 bit     2150-2400
UltraSPARC I       1995   64 bit     145-167           Ultra 2 Server
UltraSPARC IV      2004   64 bit     1050-1350         SunFire
UltraSPARC T1      2005   64 bit     1000-1400         SunFire
UltraSPARC T2      2007   64 bit     1000-1400         Enterprise Server
UltraSPARC RK      2009   64 bit     2300-2500         Enterprise Server
Oracle SPARC-T5    2013   64 bit     3600              Enterprise Server

Figure 1-6 depicts the chronological evolution of SPARC64
processors, up to the end of 2010.

Fig. 1-6. Roadmap of SPARC64 architectures

1-2.7. DEC Alpha Processors


The DEC Alpha was originally developed by Digital Equipment
Corp (DEC). Alpha was designed as a successor to the VAX line of
computers; it supported the VMS operating system, UNIX as well as
Windows. The 64-bit Alpha processor was introduced in 1992 at 200 MHz.
It was designed as a 64-bit architecture with super-pipelining
and superscalar design. At the time, DEC touted it as the world's fastest
processor. Faster versions were announced in 2001 and have been available
since 2003 at 1.1 GHz and upwards.

1-2.8. ARM Processors


The Advanced RISC Machine processor (ARM) is a series of low-cost,
power-efficient 32-bit RISC microprocessors for embedded control,
computing, digital signal processing, and portable applications. ARM was
one of the first commercial RISC microprocessors (along with the MIPS
R2000). It was
licensed for production by Asahi Kasei Microsystems, Cirrus Logic, GEC
Plessey Semiconductors, Samsung, Sharp, Texas Instruments and VLSI
Technology. The ARM Holdings company develops the instruction set
and architecture for ARM products, but does not manufacture them. The
company periodically releases updates to its cores.

ARM has devised two different naming conventions, namely:

• ARM architecture revisions (ARMv1, v2, v3, v4, v5, v6, v7)
• ARM core implementations (ARM1, ARM2, … ARM11, Cortex)

Versions can be qualified with variant letters to specify collections of
additional instructions that are included as an architecture extension.
However, there exist interrelations between these naming conventions.
For instance, the ARM11 is based on ARMv6 and the ARM Cortex series is
based on ARMv7.

The ARM architecture has evolved significantly since it was first
developed. Several major versions have been defined to date, denoted by
v1, v2 and so on. The first three versions are now obsolete. All ARM cores
(after ARM3) support a 32-bit address space and arithmetic; the ARMv8
architecture, announced in 2011, added support for 64-bit addresses and
arithmetic.


Fig. 1-7. ARM processors roadmap.

1-3. Microcomputer History


The microcomputer, like the IBM PC, is a general purpose machine that
can be used for many applications such as word processing, electronic
spreadsheets, databases and computer-aided design. The definition of the
first PC is somewhat fuzzy. However, the first widely-known personal
computer kit (called the Altair 8800) was based on the Intel 8080
microprocessor and contained 256 bytes of memory. The Altair 8800 was
introduced² as a hobbyist project in Popular Electronics magazine in
January 1975. The basic Altair, shown in figure 1-8, had no keyboard, no
display and no disk drive. Rather, it was programmed by toggle switches,
with light-emitting diodes (LEDs) on its front panel.

Fig. 1-8. Photograph of the ALTAIR 8800 computer.

² Altair was developed by Edward Roberts, William Yates and Jim Bybee


Another early innovation was by Jobs and Wozniak with their invention
of the Apple II. This simple microcomputer used a 6502 processor, rather
than the M6800 (don't ask why!). It had a ROM BIOS-based operating
system³, and a BASIC program interpreter. The latter made it very easy to
operate, by supplying an easy method for programming and controlling it.
After the introduction of Apple II in 1977, the individual user became a
new target in the computer industry. IBM, the major computer
manufacturer at that time, needed to react quickly. Thus, IBM decided to
develop the Personal Computer. IBM outsourced the production of
microprocessor chips to Intel and the operating system to Microsoft.

In 1981, IBM chose the Intel 8088 microprocessor, which is a
version of the 8086 with an 8-bit external data bus, for the IBM 5150 PC.
Actually, IBM's engineers
wanted to use the Motorola 68000 microprocessor. But IBM already had
rights to manufacture the 8086 microprocessor, in exchange for giving
Intel the rights to its bubble memory designs, and IBM was already using the
8086 in the IBM Displaywriter word processor (the 8080 and 8085 were
also used in other products). Other factors that affected IBM's
choice were that the 8088 microprocessor, with its 8-bit bus, could use
existing low-cost 8085 support chips and components, and allowed the
computer to be based on a modified 8085 design, whereas the 68000
microprocessor components were not widely available. In fact, IBM's
first attempt to make a personal computer, in 1975, had failed (it was
called the IBM 5100). The IBM 5100 had a discrete logic CPU with no bus,
and was equipped with the BASIC language and APL as the operating system.
It had 16kB RAM and a 5" monochrome monitor, for $10,000! So, cost was a
large factor in the design of personal computers.

Fig. 1-9. Photograph of the IBM first personal computer (IBM PC 5150).
³ ROM stands for Read-Only Memory and BIOS stands for Basic Input/Output System.


The availability of the operating system (OS) was also an important
factor in IBM's choice of the 8088 microprocessor. At that time, the CP/M-86
operating system, which was written by Digital Research, was the
standard operating system for the 8086 microprocessor. However, due to
some legal problems between IBM and Digital Research, the former hired
Microsoft Corporation to provide the operating system of its first PC. The
OS of the first IBM PC was purchased by Microsoft from Seattle
Computer Products and renamed MS-DOS⁴. The system was greatly
improved by IBM engineers and called IBM DOS. On the other hand,
Microsoft continued to improve MS-DOS over the years and finally
introduced Windows, which became the de facto standard OS for IBM
compatible personal computers.

1-4. RISC & CISC Processors


According to the nature of their instruction set, microprocessor systems
can be divided into two basic architectures:

• Reduced instruction set computers (RISC),


• Complex instruction set computers (CISC).

Earlier microprocessors were based around the idea that making the CPU
support a larger number of advanced and complex instructions would
lead to increased performance. This idea is at the root of CISC systems.
The Intel x86 and the Motorola 680x0 are CISC systems. For instance,
the Intel x86 processors have more than three hundred instructions. On
the other hand, RISC systems are characterized by a small set of short
primitive instructions with fixed length, from which a computer
programmer can build more complex routines and programs. The
SPARC, PowerPC, MIPS, DEC Alpha and ARM have RISC
architectures. Such RISC systems are sometimes referred to as load/store
systems, because only dedicated load and store instructions access memory.
The so-called single-instruction computers (SIC) are extreme
cases of RISC machines whose instruction set is reduced to the minimum.

CISC machines have become more and more complex with
each new generation. For instance, x86 instructions vary in length from
one to over a dozen bytes. This increases the functionality of such
processors and enables their compatibility with older generations. RISC
design techniques offer power in even small sizes, and thus have become
dominant for low-power 32-bit CPUs. As of 2007, the x86 designs are as
fast as the fastest available RISC solutions.

⁴ MS-DOS stands for Microsoft Disk Operating System.

1-5. How Does the Microprocessor Work?


As we mentioned above, a microprocessor executes a collection of
machine instructions that tell the microcomputer what to do. Based on
these instructions, a microprocessor does three basic functions:

• Using its ALU (Arithmetic/Logic Unit), a microprocessor can perform
  mathematical operations like addition, subtraction, multiplication and
  division. Modern microprocessors contain complete floating-point
  processors (FPP) that can perform floating point operations.
• A microprocessor can move data from one memory location to another.
• A microprocessor can make decisions and jump to a new set of
  instructions (subroutines) based on those decisions.

There may be very sophisticated things that a microprocessor does, but
those are the basic activities.

Fig. 1-10. Schematic diagram of a simple microprocessor. This is a modified version
of the SAP processor, mentioned in Malvino’s book: Digital Computer Electronics.

Figure 1-10 illustrates the block diagram of a simple microprocessor
capable of performing the functions cited above. As shown in the figure,
the microprocessor is interfaced to the external world via the
following group of connection wires (buses):

• An address bus that sends an address to memory
• A data bus that sends data to memory or receives data from memory
• RD (read) and WR (write) lines that tell the memory whether the
  processor wants to get or set the addressed location
• A clock line that lets a clock pulse sequence the processor
• A reset line that resets the program counter to zero and restarts
  program execution.

Let's assume that both the address and data buses are 8 bits wide in this
example. Here are the components of this simple microprocessor:

• Registers A, B and C are cascaded latches made out of edge-triggered
  flip-flops. Registers are temporary information-holding
  places and are important in any processor architecture.
• The address latch is just like registers A, B and C.
• The program counter PC is a latch with the extra ability to increment
  by 1 when told to do so and also to reset to zero when told to do so.
• The ALU could be as simple as an 8-bit adder, or it might be able to add,
  subtract, multiply and divide 8-bit values.
• The test register is a special latch that can hold values from
  comparisons performed in the ALU. An ALU can compare two
  numbers and determine if they are equal, if one is greater than the
  other, etc. The test register can also hold a carry bit from the last stage
  of the adder. It stores these values in flip-flops and then the
  instruction decoder can use the values to make decisions.
• There are six boxes marked "3-State" in the diagram. These are
  tri-state buffers. A tri-state buffer
  can pass data when it is enabled or it can disconnect its output (by
  showing a high impedance) when it is disabled. So, tri-state buffers
  allow several data sources to connect to a wire, but only one of
  them can actually drive data onto the line at a time.
• The instruction register and instruction decoder are responsible for
  controlling all of the other components.
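
The sharing rule for tri-state buffers can be made concrete with a small C model. It is a sketch under assumptions: the type TriState and the function resolve_bus are hypothetical names, and real buffers are electrical devices, so the "contention" result below only flags a situation that in hardware would mean two outputs fighting over the same wire.

```c
#include <assert.h>
#include <stddef.h>

/* One tri-state buffer: when enabled it drives `value` onto the line,
   otherwise it is in the high-impedance state and has no effect. */
typedef struct {
    int enabled;
    unsigned char value;
} TriState;

/* Work out what appears on a shared 8-bit line.
   Returns  1 and stores the driven value in *bus if exactly one buffer
              is enabled,
            0 if no buffer is enabled (the line floats; *bus is unchanged),
           -1 if two or more buffers are enabled at once (contention;
              *bus is then not meaningful). */
int resolve_bus(const TriState *buffers, size_t count, unsigned char *bus)
{
    assert(buffers != NULL && bus != NULL);
    int drivers = 0;
    for (size_t i = 0; i < count; i++) {
        if (buffers[i].enabled) {
            drivers++;
            *bus = buffers[i].value;
        }
    }
    if (drivers == 0)
        return 0;
    return (drivers == 1) ? 1 : -1;
}
```

In the simple microprocessor, it is the instruction decoder that is expected to guarantee the "exactly one enabled" case by activating only one of the six buffer-enable lines at a time.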

Although they are not all shown in the above diagram, there are control
lines from the instruction decoder to perform the following functions:

• Tell the A register to latch the value currently on the data bus
• Tell the B register to latch the value currently on the data bus
• Tell the C register to latch the value currently on the data bus
• Tell the program counter to latch the value currently on the data bus
• Tell the address register to latch the value currently on the data bus
• Tell the instruction register to latch the value currently on the data bus
• Tell the program counter to increment
• Tell the program counter to reset to zero
• Activate any of the six tri-state buffers (six separate lines)
• Tell the ALU what operation to perform
• Tell the test register to latch the ALU test bits
• Activate the RD line
• Activate the WR line

Coming into the instruction decoder are the bits from the test register and
the clock line, as well as the bits from the instruction register.

1-6. Memory and Addressing


Memory can be thought of as a sort of file cabinet, with each location in
it being a folder in the cabinet. In a file cabinet, you go through the tabs
on the folders until you find the right one. To get to a memory location,
a different method is used: a unique address is assigned to each
location. The byte is the most used unit in a microcomputer because
each memory location is one byte wide.
As in microprocessor systems, there are three buses associated with the
memory subsystem: the address bus, the data bus,
and the control bus. The address bus acts to select one of the
unique memory locations.
The control bus determines whether the access will be a read or a write. In
the case of an instruction fetch, the control bus is set up for a read
operation. Data is read or written through the data bus.
There are several different types of memory in a microcomputer. One is
Program (or Code) memory. This is where the program is located.
Another type is Data memory. This is where data that might be used by
the program is located. The difference between the two is that Program
memory is write protected, or read only. Data memory, on the other hand,
can be changed by the program as necessary. Two terms are used when
talking about memory:
Memory Read is getting a value from memory and Memory Write is
putting a value into memory. In some computers the address is a word of
16 bits. This allows for a maximum of 2^16 (65,536) unique addresses or
memory locations that can be accessed. These addresses are
usually referred to by a 4-digit hexadecimal⁵ number. Such a memory
usually starts at address 0000H and could go up to FFFFH. To access
these locations, a 16-bit address is presented to memory and the byte at
that location is either read or written, according to a read-control (RD) or
write-control (WR) signal. Figure 1-11 depicts a memory system with a
16-bit address bus (A0 to A15) and an 8-bit data bus (D0 to D7). As shown,
the memory has a read/write control line (R/W) and 2^16 addressable
locations, and is said to have 64kB. In microcomputer systems, where
memory size extends to 1MB or more, the memory space is usually
divided into segments.
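
The relation between the width of the address bus and the number of addressable locations can be checked with a short C sketch. The function names address_count and last_address are hypothetical, chosen only for this illustration.

```c
#include <assert.h>

/* Number of distinct locations selectable with `bits` address lines. */
unsigned long address_count(unsigned bits)
{
    assert(bits < 32);   /* keeps the shift within unsigned long */
    return 1UL << bits;
}

/* Highest address on such a bus (the lowest is always 0). */
unsigned long last_address(unsigned bits)
{
    return address_count(bits) - 1UL;
}
```

With 16 address lines, address_count gives 65,536 locations and last_address gives FFFFH, as in the text; with 20 lines the count is 1,048,576, i.e. the 1MB space mentioned above.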

Fig. 1-11. Schematic diagram of a memory system, with addressable 64 k byte.

In 16-bit microcomputers, the segment size is usually 64kB. Therefore,
the information to be stored in memory is mapped into 3 types of
segments:

i) Code segments
ii) Data segments
iii) Stack segments

The code segments hold the main program code, while the data segments
hold data. The stack segments hold the necessary parameters to handle
subroutines. Subroutines are secondary tasks, to which the main program
sometimes needs to branch.

⁵ Hexadecimal numbers are base-16 numbers which are represented by a string of hexadecimal digits
followed by the character H. A hexadecimal digit is a character from the set (0, 1, 2, 3, 4, 5, 6, 7, 8, 9,
A, B, C, D, E, F). In some cases, a leading zero is added if the number would otherwise begin with one
of the digits A-F. For example, 0FH is equivalent to the decimal number 15.

The type of memory assignment used in Intel processors is Little Endian.
Other processors, like ARM, may support both Little Endian and Big
Endian formats of memory. In Little Endian format, the lowest
numbered byte in a word is considered the least significant byte, and the
highest numbered byte the most significant. In Big Endian format, the
most significant byte of a word is stored at the lowest numbered byte and
the least significant byte at the highest numbered byte.
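
The two byte orders can be illustrated by storing the same 16-bit word in a byte-addressed memory both ways. This is a minimal sketch; the function names store_little_endian and store_big_endian are hypothetical, and the memory is modelled as a plain array of bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store a 16-bit word at mem[addr]: least significant byte at the
   lowest address (Little Endian). */
void store_little_endian(uint8_t *mem, size_t addr, uint16_t word)
{
    assert(mem != NULL);
    mem[addr]     = (uint8_t)(word & 0xFF);
    mem[addr + 1] = (uint8_t)(word >> 8);
}

/* Store a 16-bit word at mem[addr]: most significant byte at the
   lowest address (Big Endian). */
void store_big_endian(uint8_t *mem, size_t addr, uint16_t word)
{
    assert(mem != NULL);
    mem[addr]     = (uint8_t)(word >> 8);
    mem[addr + 1] = (uint8_t)(word & 0xFF);
}
```

Storing the word 1234H at address 0 gives the bytes 34H, 12H in Little Endian order and 12H, 34H in Big Endian order.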


1-7. Microprocessor Instructions


As we have seen so far, a microprocessor can fetch and execute a
collection of machine instructions. Even the incredibly simple
microprocessor shown in the previous example will have a fairly large set
of instructions that it can perform. The collection of instructions is
implemented as bit patterns, each one of which has a different meaning
when loaded into the instruction register. Humans are not particularly
good at remembering bit patterns, so a set of short words are defined to
represent the different bit patterns. This collection of words is called the
assembly language of the processor. Here's the set of assembly language
instructions that the designer might create for the simple microprocessor
in our example:

• LOADA mem - Load register A from memory address (mem)
• LOADB mem - Load register B from memory address (mem)
• CONB con - Load a constant value (con) into register B
• SAVEB mem - Save register B to memory address (mem)
• SAVEC mem - Save register C to memory address (mem)
• ADD - Add A and B and store the result in C
• SUB - Subtract B from A and store the result in C
• MUL - Multiply A and B and store the result in C
• DIV - Divide A by B and store the result in C
• COMP - Compare A and B and store the result in test
• JUMP addr - Jump to an address (addr)
• JEQ addr - Jump, if equal, to address (addr)
• JNEQ addr - Jump, if not equal, to address (addr)
• JG addr - Jump, if greater than, to address (addr)
• JGE addr - Jump, if greater than or equal, to address (addr)
• JL addr - Jump, if less than, to address (addr)
• JLE addr - Jump, if less than or equal, to address (addr)
• STOP - Stop execution

If you are familiar with the C programming language, then you know that the
following simple piece of C code, in figure 1-12, will calculate the
factorial of 7 (7! = 7 * 6 * 5 * 4 * 3 * 2 * 1 = 5040). Here,
we declare two integer variables, namely a and F. At the end of the
program execution, the variable F will contain the factorial of 7.


int main ()
{
    int a = 1;
    int F = 1;

    while (a <= 7)    // loop 7 times
    {
        F = F * a;
        a = a + 1;    // increment a
    }
    return 0;
}

[Flowchart: Start; a = 1; F = 1; test "a <= 7 ?"; if YES, F = F * a, then a = a + 1, and back to the test; if NO, End.]

Fig. 1-12. Flowchart of the program and the equivalent C-language code.

A C compiler can translate this C code into the assembly language of a
given microprocessor. Assuming that RAM starts at address 128 in this
processor, and ROM (which contains the assembly language program)
starts at address 0, then for our simple microprocessor the assembly
language might look like this:

; Assume a is to be stored (saved) at address 128 (in the RAM)
; Assume F is to be stored (saved) at address 129 (in the RAM)
; Note that comments are preceded here by a semicolon ;
0   CONB 1      ; a = 1
1   SAVEB 128
2   CONB 1      ; F = 1
3   SAVEB 129
4   LOADA 128
5   CONB 7
6   COMP        ; a > 7 ?
7   JG 17       ; if a > 7, jump to 17 and stop, else proceed
8   LOADA 129   ; F = F * a
9   LOADB 128
10  MUL
11  SAVEC 129
12  LOADA 128   ; a = a + 1
13  CONB 1
14  ADD
15  SAVEC 128
16  JUMP 4      ; loop back: reload a and 7, then compare again (a > 7 ?)
17  STOP

So now the question is, "How do all of these instructions look in ROM?"
Each of these assembly language instructions must be represented by a
binary number. For the sake of simplicity, let's assume that each assembly
language instruction is given a unique code (or number), like this:
Table 1-6. Instruction set of a simple microprocessor.

INSTRUCTION MNEMONIC    CODE (OPCODE)
LOADA                   1
LOADB                   2
CONB                    3
SAVEB                   4
SAVEC                   5
ADD                     6
SUB                     7
MUL                     8
DIV                     9
COMP                    10
JUMP                    11
JEQ                     12
JNEQ                    13
JG                      14
JGE                     15
JL                      16
JLE                     17
STOP                    18

The above code numbers are usually called opcodes. Our little program,
in ROM would look like the second column (Opcode/Value) in the
following list (in binary form, without comments):

; Assume a is at address 128 (RAM)
; Assume F is at address 129 (RAM)
;Addr Opcode/Value Comment
0 3 ; CONB 1
1 1
2 4 ; SAVEB 128
3 128
4 3 ; CONB 1
5 1
6 4 ; SAVEB 129
7 129
8 1 ; LOADA 128
9 128
10 3 ; CONB 7
11 7
12 10 ; COMP
13 14 ; JG 17 (31)
14 31
15 1 ; LOADA 129
16 129
17 2 ; LOADB 128
18 128
19 8 ; MUL
20 5 ; SAVEC 129
21 129
22 1 ; LOADA 128
23 128
24 3 ; CONB 1
25 1
26 6 ; ADD
27 5 ; SAVEC 128
28 128
29 11 ; JUMP 4 (8)
30 8
31 18 ; STOP

You can see that the internal seven lines of C-code became 18 lines of
assembly language, and that became 32 bytes in ROM.
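To check that the ROM image really computes 7!, the listing can be run on a small software model of the processor. The following is a minimal sketch in C, not the author's design. The names run_rom and load_factorial_rom, the 256-cell memory and the use of int cells (a real 8-bit cell could not hold 5040) are assumptions made for illustration. The sketch also assumes that the jump at ROM address 29 targets ROM address 8, so that a is reloaded into register A before each comparison.

```c
#include <assert.h>

/* Opcodes as numbered in Table 1-6. */
enum { LOADA = 1, LOADB, CONB, SAVEB, SAVEC, ADD, SUB, MUL, DIV, COMP,
       JUMP, JEQ, JNEQ, JG, JGE, JL, JLE, STOP };

#define MEM_SIZE 256   /* assumed layout: ROM at 0..127, RAM at 128..255 */

/* Take the jump if cond is true, otherwise skip the address operand. */
static int branch(const int *mem, int pc, int cond)
{
    return cond ? mem[pc] : pc + 1;
}

/* Hypothetical interpreter: runs from address 0 until STOP.
   Returns the number of instructions executed, or -1 on a bad opcode.
   Operand addresses are not range-checked in this sketch. */
int run_rom(int mem[MEM_SIZE])
{
    int a = 0, b = 0, c = 0;   /* registers A, B and C */
    int test = 0;              /* COMP result: -1, 0 or +1 */
    int pc = 0, count = 0;

    for (;;) {
        int op = mem[pc++];    /* fetch the opcode and advance the PC */
        count++;
        switch (op) {
        case LOADA: a = mem[mem[pc++]];  break;
        case LOADB: b = mem[mem[pc++]];  break;
        case CONB:  b = mem[pc++];       break;
        case SAVEB: mem[mem[pc++]] = b;  break;
        case SAVEC: mem[mem[pc++]] = c;  break;
        case ADD:   c = a + b;           break;
        case SUB:   c = a - b;           break;
        case MUL:   c = a * b;           break;
        case DIV:   assert(b != 0); c = a / b; break;
        case COMP:  test = (a > b) - (a < b);  break;
        case JUMP:  pc = mem[pc];                      break;
        case JEQ:   pc = branch(mem, pc, test == 0);   break;
        case JNEQ:  pc = branch(mem, pc, test != 0);   break;
        case JG:    pc = branch(mem, pc, test > 0);    break;
        case JGE:   pc = branch(mem, pc, test >= 0);   break;
        case JL:    pc = branch(mem, pc, test < 0);    break;
        case JLE:   pc = branch(mem, pc, test <= 0);   break;
        case STOP:  return count;
        default:    return -1;
        }
    }
}

/* Copies the 32-value ROM image of the listing into a cleared memory. */
void load_factorial_rom(int mem[MEM_SIZE])
{
    static const int rom[32] = {
        3, 1,  4, 128,  3, 1,  4, 129,     /* a = 1; F = 1           */
        1, 128,  3, 7,  10,  14, 31,       /* if a > 7 jump to STOP  */
        1, 129,  2, 128,  8,  5, 129,      /* F = F * a              */
        1, 128,  3, 1,  6,  5, 128,        /* a = a + 1              */
        11, 8,                             /* jump back to address 8 */
        18                                 /* STOP                   */
    };
    int i;
    for (i = 0; i < MEM_SIZE; i++)
        mem[i] = 0;
    for (i = 0; i < 32; i++)
        mem[i] = rom[i];
}
```

After load_factorial_rom(mem) and run_rom(mem), cell 129 (F) holds 5040 and cell 128 (a) holds 8, the first value that fails the test a <= 7.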

The instruction decoder needs to turn each of the opcodes into a set of
signals that drive the different components inside the microprocessor.

Let's take the ADD instruction as an example and see what it needs to do:


1. During the first clock cycle, we need to fetch (or load) the
instruction. Therefore the microprocessor needs to:
 activate the tri-state buffer for the program counter (PC), so that
the address bus points to the instruction (ADD) location in memory,
 activate the RD line,
 activate the data-in tri-state buffer,
 read and latch the instruction into the instruction register.
2. During the second clock cycle, the ADD instruction is decoded
(and executed). It needs to do very little:
 set the operation of the ALU to addition,
 latch the output of the ALU into the C register.
3. During the third clock cycle, the program counter (PC) is
incremented.
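The three clock cycles above can be sketched as a small decoding function that returns the control lines active in each cycle of ADD. This is only an illustration under stated assumptions: the structure Signals, its field names and the function add_cycle_signals are hypothetical and are not taken from a real decoder.

```c
#include <stdbool.h>

/* Control lines mentioned in the three steps (names are assumptions). */
typedef struct {
    bool pc_out;     /* tri-state buffer of the program counter enabled */
    bool rd;         /* RD line active                                  */
    bool data_in;    /* data-in tri-state buffer enabled                */
    bool latch_ir;   /* latch the instruction register                  */
    bool alu_add;    /* ALU operation set to addition                   */
    bool latch_c;    /* latch the ALU output into register C            */
    bool inc_pc;     /* increment the program counter                   */
} Signals;

/* Returns the signals asserted during clock cycle 1, 2 or 3 of ADD.
   Any other cycle number leaves every line inactive. */
Signals add_cycle_signals(int cycle)
{
    Signals s = { false, false, false, false, false, false, false };

    switch (cycle) {
    case 1:                       /* fetch */
        s.pc_out = true;
        s.rd = true;
        s.data_in = true;
        s.latch_ir = true;
        break;
    case 2:                       /* decode and execute */
        s.alu_add = true;
        s.latch_c = true;
        break;
    case 3:                       /* advance to the next instruction */
        s.inc_pc = true;
        break;
    default:
        break;
    }
    return s;
}
```

A hardwired decoder would produce the same outputs with combinational logic driven by the opcode and a step counter; the switch statement stands in for that logic here.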

[Flowchart: Start; any instruction waiting? (N: wait, Y: continue); fetch instruction; execute instruction; HALT instruction? (Y: Stop); any I/O interrupts waiting? (Y: transfer control to the interrupt service routine; N: next instruction)]

Fig. 1-13. Flowchart of the microprocessor operation



Executing an instruction requires executing a sequence of
sub-instructions. This sequence is controlled in one of two ways,
according to the design of the CPU control unit:

 Hardwired control.
 Microprogramming.

In hardwired control, each processor has to be constructed from digital
circuits that are hardwired to execute the machine code instructions. Such
a design is specific to a particular architecture and organization. As
shown in figure 1-14(a), the inputs to a hardwired controller are:

 Instruction register (IR).
 Step numbers (instruction address).
 Condition codes and status flags.

Fig. 1-14(a). Block diagram of a hardwired controller

Microprogramming is a simple concept whereby a machine-level
instruction is decomposed into a sequence of primitive operations that,
together, implement the machine-level instruction. These primitive
operations (micro-operations or microinstructions) consist of simple
actions that trigger flip-flops and enable some tri-state gates.
Microprogramming is a general technique that can be applied to any
computer; for example, you could construct a microprogrammed control
unit that is capable of implementing, say, the Intel Pentium instruction set
or the Motorola 680x0 instruction set.

Because the microprogrammed control unit is such a regular structure, it
was possible to develop formal methods for the design of microprograms
and to construct tools to aid the creation, validation, and testing of
microprograms.
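As an illustration of the idea, a microprogram can be modelled as a table of control words, one per micro-step, that a sequencer reads in order. The sketch below is an assumption-laden toy: the bit assignments, the END_BIT marker, the table ADD_MICROPROGRAM and the function run_microprogram are all hypothetical and do not describe any real control store.

```c
#include <stddef.h>
#include <stdint.h>

/* One bit per control line (hypothetical assignment). */
enum {
    PC_OUT   = 1 << 0,   /* enable the program counter buffer   */
    RD       = 1 << 1,   /* activate the RD line                */
    DATA_IN  = 1 << 2,   /* enable the data-in buffer           */
    IR_LATCH = 1 << 3,   /* latch the instruction register      */
    ALU_ADD  = 1 << 4,   /* select addition in the ALU          */
    C_LATCH  = 1 << 5,   /* latch the ALU output into C         */
    PC_INC   = 1 << 6,   /* increment the program counter       */
    END_BIT  = 1 << 7    /* last microinstruction of the program */
};

/* Microprogram for ADD: one control word per clock cycle. */
static const uint8_t ADD_MICROPROGRAM[] = {
    PC_OUT | RD | DATA_IN | IR_LATCH,
    ALU_ADD | C_LATCH,
    PC_INC | END_BIT
};

/* Steps through a microprogram, copying the control lines of each step
   (without the end marker) into trace. Stops after the word that carries
   END_BIT or after max_steps words. Returns the number of steps taken. */
size_t run_microprogram(const uint8_t *store, uint8_t *trace,
                        size_t max_steps)
{
    size_t step = 0;

    while (step < max_steps) {
        uint8_t word = store[step];
        trace[step] = (uint8_t)(word & ~END_BIT);
        step++;
        if (word & END_BIT)
            break;
    }
    return step;
}
```

Changing the behaviour of an instruction then means editing a table entry rather than rewiring logic, which is the practical appeal of the microprogrammed approach.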

There are many ways in which the above microprocessor design can be
improved. For instance, the registers A, B and C may be replaced by a
larger set of general-purpose registers. Also, the microprocessor can be
designed to handle input/output (I/O) interrupt events that may occur
during the program execution and to transfer control to their service
subroutines. The device requesting an interrupt can identify itself by
sending a special code to the CPU over the bus. The code, supplied by the
interrupting device, may represent the starting address of an interrupt
service routine (ISR).
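The idea of a device-supplied code selecting a service routine can be sketched with a table of function pointers indexed by that code. This is a software analogy only, under stated assumptions: NUM_VECTORS, install_isr, dispatch_interrupt and keyboard_isr are hypothetical names, and a real CPU would save its state before transferring control.

```c
#include <stddef.h>

#define NUM_VECTORS 8            /* assumed number of device codes */

typedef void (*isr_t)(void);     /* an interrupt service routine */

static isr_t vector_table[NUM_VECTORS];   /* code -> start of the ISR */
static int last_device = -1;               /* set by the example ISR   */

/* Example service routine for a hypothetical device with code 1. */
void keyboard_isr(void)
{
    last_device = 1;
}

/* Registers the routine that serves a given device code.
   Returns 0 on success, -1 if the code is out of range. */
int install_isr(unsigned code, isr_t handler)
{
    if (code >= NUM_VECTORS)
        return -1;
    vector_table[code] = handler;
    return 0;
}

/* Called when a device places its code on the bus: looks up the routine
   and transfers control to it. Returns -1 if no routine is installed. */
int dispatch_interrupt(unsigned code)
{
    if (code >= NUM_VECTORS || vector_table[code] == NULL)
        return -1;
    vector_table[code]();
    return 0;
}
```

After install_isr(1, keyboard_isr), a call to dispatch_interrupt(1) runs the keyboard routine, while a code with no installed routine is rejected.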

Fig. 1-14(b). Block diagram of a microprogramming controller


1-8. How Does the Microcomputer Work?


Any computer can only do what the programmer has told it to do, in the
form of a program. A program is just a sequence of very simple
commands that lead the computer to solve some problem. Once the
program is written and debugged (you hardly ever get it right the first
time), the computer can execute the instructions very fast, and always in
the same way, every time, without a mistake. As mentioned earlier, the
microcomputer is a microprocessor-based system. In addition to the
microprocessor, the microcomputer system has other hardware and
software components, as shown in figure 1-15. The microcomputer
memory is composed of a RAM and a ROM. The RAM is a read/write
memory that contains a number of bytes. The microprocessor can read or
write those bytes depending on whether the RD or WR control line is
signaled. One problem with conventional RAM chips is that they forget
everything once the power goes off. That is why the computer needs a
ROM.

A ROM chip is programmed with a permanent collection of preset bytes.
The address bus tells the ROM which byte to get and place on the data
bus. When the RD control line is activated, and the address bus points to
a certain location in the ROM, the ROM chip presents the selected byte
on the data bus. On a PC, the ROM is called the BIOS (Basic Input/Output
System).
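The roles of the address bus and the RD and WR lines can be sketched in C as two functions that model a read cycle and a write cycle. The layout below (ROM at addresses 0 to 127, RAM at 128 to 255) follows the assumption used for the simple microprocessor earlier in this chapter; the names bus_read and bus_write and the few preset ROM bytes are hypothetical.

```c
#include <stdint.h>

#define ROM_SIZE 128   /* assumed: ROM occupies addresses 0..127   */
#define RAM_SIZE 128   /* assumed: RAM occupies addresses 128..255 */

/* Preset bytes of the ROM; cells not listed are 0. */
static const uint8_t rom[ROM_SIZE] = { 3, 1, 4, 128 };

/* Read/write memory; its contents would be lost at power-off. */
static uint8_t ram[RAM_SIZE];

/* RD active: the chip selected by the address places one byte
   on the data bus. */
uint8_t bus_read(uint8_t addr)
{
    return (addr < ROM_SIZE) ? rom[addr] : ram[addr - ROM_SIZE];
}

/* WR active: only the RAM accepts the byte. A write to a ROM address
   has no effect, because the ROM contents are permanent. */
void bus_write(uint8_t addr, uint8_t value)
{
    if (addr >= ROM_SIZE)
        ram[addr - ROM_SIZE] = value;
}
```

Writing 42 to address 128 and reading it back returns 42, whereas writing to address 0 leaves the preset ROM byte unchanged.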


Fig. 1-15. Architecture of a microcomputer system (Software & Hardware).


When the microprocessor starts, it begins executing instructions it finds
in the BIOS. The BIOS instructions do things like testing the machine
hardware (power-on self test or POST), and then go to the hard disk
to fetch the boot sector. This boot sector contains another small program,
and the BIOS stores it in RAM after reading it off the disk. The
microprocessor then begins executing the boot sector instructions from
RAM. The boot sector program tells the microprocessor to load the
operating system (e.g., DOS or Windows) from the hard disk into RAM,
which the microprocessor then executes. The operating system (OS) tells
the microprocessor how to load other application programs (like WORD)
from the hard disk into memory, in order to execute them at the user's
request.

1-9. Operating Systems


You can view a computer system as being built from three general
components: the hardware, the operating system, and the applications, as
shown in figure 1-15. The operating system (OS) is the component that
on one side manages and controls the hardware and on the other manages
the applications. It is the first "program" that loads when the computer
boots. Its main part, the kernel, stays loaded as long as the computer is
on. An operating system can be viewed as the set of basic programs and
utilities that make your computer run. At the core of an operating system
is the kernel. The kernel is the most fundamental program on the
computer; it does all the basic housekeeping and lets you start other
programs. Examples of operating systems are UNIX; MS-DOS and
Windows from Microsoft; Mac OS from Apple; and OS/2 and OS/400
from IBM (see footnote 6).

Once upon a time, not so long ago, everyone knew what an operating
system was. It was the complex software sold by the maker of your
computer system, without which no other program could function on the
computer. Its duties included spinning the disks, monitoring the
terminals, and generally keeping track of what the hardware was doing.
Application (user) programs asked the operating system to perform
various functions, and users seldom talked to the OS directly. Today
those boundaries are not quite so clear. The rise of graphical user
interfaces, macro and scripting languages, and the increased popularity
of networks have all blurred the traditional distinctions. Today's
computing environments consist of layers of hardware and software that
interact to form a working whole.

6. All these names are trademarks of their respective companies.


The operating system components reflect the services made available by
the OS, namely:

1. Process Management
o A process is a program in execution
o Process creation/deletion (book-keeping)
o Process suspension/scheduling
o Process synchronization
o Process communication
2. Memory Management
o Maintain bookkeeping information
o Map processes to memory locations
o Allocate/de-allocate memory space as requested/required
3. I/O Device Management
o Disk management functions such as free space management,
storage allocation, fragmentation removal, head scheduling
o I/O device interface through buffering/caching, custom drivers for
each device
4. File System (built on top of disk management)
o File creation/deletion
o Support for hierarchical file systems
o Update/retrieval operations: read, write, append, seek
o Mapping of files to secondary storage
5. Protection (controlling access to the system)
o Resources --- CPU cycles, memory, files, devices
o Users --- authentication, communication
6. Network Management (often built on top of the file system)
o TCP/IP, IPX, IPng
o Connection/routing strategies
o Communication mechanism
o Data/process migration
7. Network Services (built on top of networking)
o Email, messaging (Exchange)
o FTP
o WWW and Gopher
o Distributed file systems --- NFS, AFS, LAN Manager
o Name service --- DNS, NIS
o Security --- Kerberos
8. User Interface
o Character-based shells such as sh and command.com
o GUI --- X Windows, Win32


1-9.1. UNIX
UNIX (or UNICS; see footnote 7) was initially developed in 1969 and
released in 1971 by the AT&T engineers Ken Thompson and Dennis Ritchie
to run on the DEC PDP-7. When Thompson went to the University of
California at Berkeley (UCB) to teach for a year, one of his students
(Bill Joy) developed the ex editor, and the first release of the Berkeley
Software Distribution (BSD) appeared in 1977. UNIX BSD was licensed to
several companies and further editions of UNIX followed (see table 1-7).
For instance, SCO developed its first Unix system, called SCO XENIX
System V, for Intel x86 processor-based PCs. Also, Sun Microsystems
developed SunOS, which was later renamed Solaris.

UNIX is a powerful and popular operating system which operates in a
consistent and elegant way. UNIX has a superuser who is protected by a
password and who has special privileges. The superuser (su) is able to
configure and maintain the operating system. When a user logs in to a
UNIX system, a program called the shell interprets the user commands.
The commands of UNIX take a little getting used to, because they are
heavily abbreviated and the abbreviations are not always what you might
expect. The immense popularity of UNIX in the academic world influenced
the thinking of a generation of programmers and systems designers.

Table 1-7. Unix and its variants

Unix / Variant        Developer (Company)              Date of release
Unix version 1.0      AT&T                             1971
BSD                   BSD (UCB)                        1977
SCO Xenix             SCO (Santa Cruz Operation)       1979
HP-UX                 HP                               1986
Unix System V         SUN Microsystems and AT&T        1987
AIX                   IBM                              1990
Solaris               SUN Microsystems                 1991
Linux                 Linus Torvalds                   1991
FreeBSD               BSD                              1993
NetBSD                BSD                              1994
SCO Unix              SCO                              1995
IRIX                  SGI (Silicon Graphics)           1998
Lindows (Linspire)    Lindows, Inc. (later Linspire)   2001 / 2004

7. UNICS stands for UNiplexed Information and Computing Service; the name was later changed to UNIX.


1-9.2. MS-DOS
The Microsoft Disk Operating System (MS-DOS) was the most
popular PC operating system for about two decades. In 1980 IBM
commissioned Bill Gates to produce an operating system for its new
PC. Bill Gates was known to IBM because he had written a version of the
BASIC language for the Intel 8080-based Altair computer. Because IBM’s
original PC had only 64K bytes of RAM and no hard disk, a powerful
operating system like UNIX could not be supported. Bill Gates did not
have time to develop an entirely new operating system, so his company,
Microsoft, bought an operating system called 86-DOS (originally named
QDOS) from Seattle Computer Products. This product was then
modified, borrowing some features from CP/M (from Digital
Research), and renamed MS-DOS. The first version of MS-DOS was
released in 1981. MS-DOS Version 1.0 occupied 12K bytes of memory
and supported only 160 KB diskettes.

MS-DOS performed all its input and output transactions by calling
routines from the IBM PC basic input/output system (BIOS) ROM.
MS-DOS 1.0 also included a command processor, Command.com, like
UNIX’s shell. Indeed, MS-DOS has many features of UNIX but it lacks
UNIX’s consistency. Unlike UNIX, MS-DOS was not designed as a
multi-user timesharing system and, therefore, there is no logon procedure.
In other words, MS-DOS has no security mechanism. Also, MS-DOS file
names were restricted to 8 characters (plus a 3-character extension),
while UNIX file names can be up to 255 characters. In fact, MS-DOS was
designed for the 8086 microprocessor, which could address only up to
1 MB of address space. In addition, the pressure to maintain backward
compatibility with older versions of PC software has meant that MS-DOS
cannot handle programs larger than 640 KB. However, over the years
Microsoft refined MS-DOS to take advantage of the improved
microprocessors of later generations. While MS-DOS is not commonly
used today, its commands can still be accessed from Windows, by clicking
Start / Run and typing the desired command.

1-9.3. Windows
The need to make computers accessible to those who want to employ
them as a tool forced the development of graphical user interfaces (GUI)
like Windows. The first version of Windows appeared in November
1985. However, at least until version 3.11 (1994), Microsoft’s Windows
was not an operating system, but a front-end GUI. The first versions of
Windows enabled users to switch among several concurrently running
applications. The product included a set of desktop applications, such as a
file manager, notepad, calculator, clock, and communications programs.

Windows NT (new technology) was released in July 1993, and was the
first Windows operating system to support client/server
business applications. However, Windows NT shared many features with
IBM's OS/2, which appeared at almost the same time.

Windows 95 was the successor to the three existing desktop operating
systems from Microsoft: Windows 3.1, Windows for Workgroups, and
MS-DOS. Windows 95 integrated a 32-bit kernel with TCP/IP for
Internet support, and Plug and Play capabilities that made it easy for users
to install hardware. Windows 98 was the first version of Windows
designed specifically for consumers. Improvements included support for
DVD and universal serial bus (USB) devices. Windows Me
(Millennium) was released in 2000 to offer consumers numerous
improvements. Windows 2000 Professional was designed to replace
Windows 95, Windows 98, and Windows NT Workstation 4.0 on all
business desktops and laptops.

With the release of Windows XP in October 2001, Microsoft merged its
consumer and business lines around Windows 2000. Windows XP was
released in many editions, such as Home Edition, Professional,
Media Center, Embedded and Windows XP 64-Bit Edition. Windows
XP 64-Bit Edition was the first 64-bit client operating system from
Microsoft, intended to satisfy the needs of high-performance computing.
Windows XP was later succeeded by Windows Vista, Windows 7,
Windows 8 and, more recently, Windows 10.

1-10. Computer Languages


Computer languages are the means of communication between humans
and computers. There exist two basic types of computer languages:

 High-level languages like BASIC, FORTRAN, PASCAL, C and
C++. Computer programs can be written in any of these high-level
languages, without thorough knowledge of the computer's internal
workings. Some of these high-level languages are portable, that is, they
do not depend on the microprocessor type or the machine structure.
 Low-level languages. The assembly language of a specific
microprocessor, unlike high-level languages, is very close to the structure
of that microprocessor. Assembly programmers must have detailed
knowledge of the internal workings of the machine for which they are
writing a program.

The following figure depicts the historic development of high-level
languages, from low-level and machine code.

Fig. 1-16. Development of high-level languages

Figure 1-17 indicates the different steps, which are needed to translate a
high-level source program file (e.g., C-language source files *.C) into an
executable file (*.EXE).

[Figure: high-level program (*.BAS, *.FOR, *.PAS, *.C, *.CPP) -> Compiler (compilation) -> object code (*.OBJ) -> Linker (linking) -> executable code (*.EXE)]

Fig. 1-17. Compilation and linking of high-level languages.

NOTE: Static Linking and Dynamic Linking


The library files can be linked to the executable code in two ways. One is
static linking and the other is dynamic linking. Static linking is used to
combine the library routines with the application. When building the
program, the linker will search all the static libraries (*.LIB) for the
unresolved functions called in the program and if found, it will copy the
corresponding object code from the library. This method is fast and easy
but generates bulky executables.


In dynamic linking, the called functions are compiled and saved in a
dynamic library (*.DLL). Unlike static linking, no object code is copied
into the executable from the libraries. Instead, the executable keeps
the name of the DLL in which the required function resides. When the
executable is running, it loads the required DLL (from the system files
folder) into memory and calls the required functions. This method yields
small executables, at a small cost in speed.

Compilers are constructed in a pipeline architecture made up of several
stages that communicate different forms of data. The front end of a
compiler is language specific and includes a parser for the given language
that produces parse trees and the intermediate representation (the
Register Transfer Language, or RTL). The back end is then responsible
for using this language-independent representation to produce
instructions for a particular microprocessor. To do this, the optimizer uses
the RTL to create fast and compact code.

The optimized RTL is then fed to the code generator, which produces
target object (binary) code.
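As a rough illustration of the last stage, the sketch below turns one statement of a hypothetical intermediate form (dst = lhs op rhs, with all operands held at RAM addresses) into assembly text for the simple microprocessor of section 1-7. It is not how GCC or any real back end works; the type Quad and the function gen_quad are assumptions introduced only for this example.

```c
#include <stdio.h>

/* Hypothetical intermediate statement: mem[dst] = mem[lhs] op mem[rhs]. */
typedef struct {
    char op;     /* one of '+', '-', '*', '/' */
    int  dst;    /* RAM address of the result */
    int  lhs;    /* RAM address of the left operand  */
    int  rhs;    /* RAM address of the right operand */
} Quad;

/* Writes the assembly text for one Quad into out (at most size bytes).
   Returns the number of characters written, or -1 if the operator is
   unknown or the buffer is too small. */
int gen_quad(const Quad *q, char *out, size_t size)
{
    const char *mnemonic;
    int n;

    switch (q->op) {
    case '+': mnemonic = "ADD"; break;
    case '-': mnemonic = "SUB"; break;
    case '*': mnemonic = "MUL"; break;
    case '/': mnemonic = "DIV"; break;
    default:  return -1;
    }

    /* Load both operands, operate, then store register C. */
    n = snprintf(out, size, "LOADA %d\nLOADB %d\n%s\nSAVEC %d\n",
                 q->lhs, q->rhs, mnemonic, q->dst);
    if (n < 0 || (size_t)n >= size)
        return -1;
    return n;
}
```

For the statement F = F * a, with F at address 129 and a at address 128, gen_quad produces the four lines LOADA 129, LOADB 128, MUL and SAVEC 129, which match addresses 8 to 11 of the hand-written program in section 1-7.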

Fig. 1-18. Simplified view of the compiler stages

There exist a variety of open-source (downloadable for free) compilers.
For instance, the GNU Compiler Collection includes front ends for C,
C++, Objective-C, Fortran, Java, and Ada, as well as libraries for these
languages. GCC, which is a C/C++ compiler, was originally written as the
compiler for the GNU operating system.


1-11. Summary

The microprocessor is a general-purpose central processing unit (CPU),
which is integrated on a single chip. The microprocessor is hence a
programmable integrated circuit, which is able to execute user-defined
programs. These programs are written using a set of predefined
instructions, which the microprocessor can handle.

The microcomputer is a microprocessor-based system, which consists of
a central processing unit (CPU), memory and input/output (I/O) ports
that are interfaced to input and output devices.

The microprocessor has the ability to perform the following functions:

• Execute stored sets of instructions (programs),
• Access external memory chips for read and write operations,
• Access external input and output ports.

The first single chip CPU (which has been called a microprocessor) was
the Intel 4004 that was introduced in 1971. The Intel 8008 was the first
8-bit microprocessor. The 8088 was used in the first successful IBM PC.
Today the microprocessor is one of the most commonly used electronic
components for PC’s, communication and control systems as well as
automotive applications (some cars have over 10 of them inside).

According to the nature of their instruction set, microprocessor systems
can be divided into two basic architectures:

• Reduced instruction set computers (RISC),
• Complex instruction set computers (CISC).

Item of Comparison       CISC                 RISC
Instruction Set          Large (100 to 300)   Small (100 or less)
Addressing Modes         Complex (8 to 20)    Simple (4 or less)
Instruction Format       Specialized          Simple
Code Length              Variable             Fixed
Execution Cycles         Variable             Standard for most
Cost / CPU Complexity    Higher               Lower
Simplifies               Compilation          Processor design
Complicates              Processor design     Software


1-12. Problems
1-1) Draw a block diagram showing the architecture of a simple
microprocessor and explain briefly its main components. Explain why it
is necessary to have an address bus, a data bus and a control bus in
microprocessor systems.
1-2) An MPU is:
(a) the same as a microprocessor unit.
(b) made from more than one Central Processing Unit.
(c) a small, single chip computer.
(d) an abbreviation for main processing unit.

1-3) Explain why some processors have general-purpose registers.
What is the difference between a microprocessor register and ordinary
memory? Why do RISC processors have a large number of registers?
1-4) What is a tri-state-latch? And where is it used?
1-5) Explain, how you can isolate low voltage computers from high
voltage loads that they are controlling.
1-6) What is the difference between RISC and CISC processors? Give
examples of real commercial CISC and RISC processors.
1-7) What do you know about microprocessor benchmarks?
1-8) Draw a flow chart and write an assembly program to calculate the
exponential of a given value x using its series expansion (e^x). Use the
instruction set of the simple microprocessor system explained in this
chapter.
Hint: e^x = 1 + x + x^2/2! + x^3/3! + ...
We can obtain a reasonable approximation by adding just a few terms.
1-9) Write another program that makes use of the above program as a
subroutine. For instance, write a program that calculates the natural
logarithm of a user-defined value
1-10) What is the function of each of the following programs: operating
systems, compilers, assemblers, linkers, and loaders?
1-11) A 32-bit processor has
 32 registers
 32 I/O devices
 32 Mb of RAM
 a 32-bit bus or 32-bit registers

1-12) Big endian format:
(a) stores low byte in the highest address.
(b) is used in all CPUs.
(c) stores high byte in the highest address.
(d) is used in a cache but never in the main memory.
1-13) The Program Counter (PC) register is used for:
a) Holding the address of memory location that is to be next processed
b) Counting the amount of total memory being used by the program
c) Holding the result of operations performed by the ALU
d) Counting the number of programs run during one use of the computer
1-14) Which of these does NOT comprise a part of the system bus?
a) Data bus
b) Logic bus
c) Control bus
d) Address bus
1-15) The arithmetic and logic unit (ALU) is used to:
a) Control the operation of the rest of the CPU
b) Determine whether a program is able to be executed
c) Carry out basic operations on integers and Booleans
d) Transfer information to the memory
1-16) The width of the address bus determines:
a) The speed at which information is dealt with
b) The amount of instructions the CPU can deal with at one time
c) The distance information can be transported in the computer
d) The maximum addressing capacity
1-17) Which of these does NOT comprise a part of the control unit?
a) Decoder
b) Instruction register
c) Control logic circuits
d) Timer or clock
1-18) A 20-bit address bus allows access to a memory of capacity
a) 1 MB b) 2 MB c) 4 MB d) 8 MB
1-19) What is meant by Maskable interrupts?
a) An interrupt that can be turned off by the programmer.
b) An interrupt that cannot be turned off by the programmer.
c) An interrupt that can be turned off by the system.
d) An interrupt that cannot be turned off by the system


1-13. References

[1] A. R. Ismail and V. M. Rooney, Microprocessor Hardware and
Software Concepts, 1987.

[2] I. L. Sayers, A. P. Robson, A. E. Adams, and G. E. Chester,
Principles of Microprocessors, 1991.

[3] F. Faggin, "The Birth of the Microprocessor", Byte Magazine,
pp. 145-150, March 1992.

[4] M. Slater, A Guide to RISC Microprocessors, 1992.

[5] B. Randell, "The Origins of Computer Programming", IEEE Annals
of the History of Computing, Vol. 16, No. 4, pp. 6-13, 1994.

[6] J. P. Hoges, Computer Architecture and Organization, McGraw-Hill,
1998.

[7] John Crisp, Introduction to Microprocessors and Microcontrollers,
2004.

[8] Jon Stokes, Inside the Machine: An Illustrated Introduction to
Microprocessors and Computer Architecture, 2006.

[9] Microsoft Knowledge Base, Description of Windows XP (MSKB
886540), Microsoft, June 5, 2007.

Introduction to Microprocessors & Interface Circuits CHAPTER 2

Microprocessor
Architecture
Contents

2-1. Introduction
2-2. Architecture of 8086/8088 Microprocessor
2-3. Architecture of 8087 Coprocessor
2-4. Architecture of 80286 Microprocessor
2-5. Architecture of 80386 Microprocessor
2-6. Architecture of 80486 Microprocessor
2-7. Architecture of Intel's Pentium Microprocessor
2-8. Architecture of Intel's Pentium II Microprocessor
2-9. Architecture of Intel's Pentium III Microprocessor
2-10. Architecture of Intel's Pentium 4 Microprocessor
2-11. Intel's Core and Core2 Microprocessors
2-12. Intel's Core i5, Core i7, Core i9 Microprocessors
2-13. Summary of Intel Microprocessor Architectures
2-14. Architecture of 64-bit Microprocessors
2-15. Architecture of AMD K10
2-16. Evolution of x86 Processors from CISC to RISC Architecture
2-17. Architecture of Famous RISC Processors (ARM and SPARC)
2-18. CPU Market Share
2-19. Moore's Law
2-20. Summary
2-21. Problems

2-1. Introduction
The microprocessor architecture is the framework and the conceptual
design of the microprocessor structure. There are two basic architectures
of microprocessors, namely:

The Harvard architecture: This architecture has two physically
separate memories, one for data and one for instructions (code). It was
used in early computers and is still used in microcontroller units
(MCUs), which are complete microprocessor systems on a chip.

The Princeton architecture (von Neumann architecture): This
architecture has a single memory for both data and instructions.

Fig. 2-1. Basic architectures of microprocessors

Another way to divide computer architectures is into Complex Instruction
Set Computer (CISC) and Reduced Instruction Set Computer (RISC). To
add two values held in memory, for example, a CISC approach may use
only a single instruction that takes the values from memory, performs
the addition internally and writes the result back. This means the
instruction may take many cycles. A RISC approach performs operations
only on values in registers, and explicitly loads and stores values from
and to memory. Most modern architectures would be considered RISC
architectures.
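The difference between the two approaches can be sketched on a toy machine model in C. Everything here is an assumption made for illustration: the structure Machine, its 16 memory cells and 4 registers, and the function names do not correspond to any real instruction set. The instruction counter only counts instructions issued, not clock cycles.

```c
/* Toy machine: a small memory, a few registers and an instruction count. */
typedef struct {
    int mem[16];
    int reg[4];
    unsigned instr_count;
} Machine;

/* CISC style: one memory-to-memory instruction does all the work. */
void cisc_add(Machine *m, int dst, int a, int b)
{
    m->mem[dst] = m->mem[a] + m->mem[b];
    m->instr_count += 1;
}

/* RISC style: only loads and stores touch memory. */
static void risc_load(Machine *m, int r, int addr)
{
    m->reg[r] = m->mem[addr];
    m->instr_count += 1;
}

static void risc_add(Machine *m, int rd, int ra, int rb)
{
    m->reg[rd] = m->reg[ra] + m->reg[rb];   /* registers only */
    m->instr_count += 1;
}

static void risc_store(Machine *m, int r, int addr)
{
    m->mem[addr] = m->reg[r];
    m->instr_count += 1;
}

/* The same addition written as a RISC instruction sequence. */
void risc_add_from_memory(Machine *m, int dst, int a, int b)
{
    risc_load(m, 0, a);
    risc_load(m, 1, b);
    risc_add(m, 2, 0, 1);
    risc_store(m, 2, dst);
}
```

Both versions leave the same sum in memory; the CISC version issues one complex instruction, while the RISC version issues four simple ones that are each easier to execute in a fixed number of cycles.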

2-2. Architecture of the 8086 / 8088 Microprocessors


The real PC history started with the introduction of Intel's 8086
microprocessor in 1978 and the 8088 in 1979. The 8086 is a 16-bit
microprocessor, which means it processes 16 bits of data (2 bytes) at
a time. The 8086 was built in 3-micron NMOS technology and
had 29,000 transistors on a single chip, in a 40-pin package.

        GND   1          40  VCC
       AD14   2          39  AD15
       AD13   3          38  A16 / S3
       AD12   4          37  A17 / S4
       AD11   5          36  A18 / S5
       AD10   6          35  A19 / S6
        AD9   7   8086   34  BHE / S7
        AD8   8          33  MN/MX
        AD7   9          32  RD
        AD6  10          31  HOLD (RQ/GT0)
        AD5  11          30  HOLDA (RQ/GT1)
        AD4  12          29  WR (LOCK)
        AD3  13          28  IO/M (S2)
        AD2  14          27  DT/R (S1)
        AD1  15          26  DEN (S0)
        AD0  16          25  ALE (QS0)
        NMI  17          24  INTA (QS1)
       INTR  18          23  TEST
        CLK  19          22  READY
        GND  20          21  RESET

Fig. 2-2. Pin-out diagram of the Intel 8086 microprocessor.

Figure 2-2 shows the pin-out diagram of the 8086 microprocessor
and table 2-1 depicts the functions of each pin.

As shown, the 8086 microprocessor has a 16-bit data bus (D0
through D15) and a 20-bit address bus (A0 through A19), and
hence can address up to 1 MB (2^20 bytes) of memory. The 16
data lines of the 8086 are multiplexed with the first 16 lines of the
address bus (named AD0 through AD15). The clock frequency of
the 8086 has been rated from 3 to 10 MHz.
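The relation between the width of the address bus and the addressable memory is simply a power of two, which can be checked with a one-line function. The name addressable_bytes is hypothetical, and the sketch assumes one byte per address, as on the 8086.

```c
/* Number of byte locations reachable with n address lines: 2^n.
   Valid for n < 64 in this sketch. */
unsigned long long addressable_bytes(unsigned address_lines)
{
    return 1ULL << address_lines;
}
```

With 20 address lines the function gives 1,048,576 bytes (1 MB), and with 16 lines it gives 65,536 bytes (64 KB).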

Table 2-1. Pin assignment of the 8086 microprocessor, in the so-called minimum mode of
operation. In this mode pin 33 (MN/MX) should be connected to +5 V.

Pin # Pin Name Function


1 GND Ground
2 AD14 Address/Data bus
3 AD13 Address/Data bus
4 AD12 Address/Data bus
5 AD11 Address/Data bus
6 AD10 Address/Data bus
7 AD9 Address/Data bus
8 AD8 Address/Data bus
9 AD7 Address/Data bus
10 AD6 Address/Data bus
11 AD5 Address/Data bus
12 AD4 Address/Data bus
13 AD3 Address/Data bus
14 AD2 Address/Data bus
15 AD1 Address/Data bus
16 AD0 Address/Data bus
17 NMI Non-maskable interrupt
18 INTR Interrupt Request
19 CLK Clock
20 GND Ground
21 RESET Reset signal
22 READY Ready signal
23 TEST Test signal
24 INTA Interrupt Acknowledge
25 ALE Address Latch Enable
26 DEN Data Enable
27 DT/ R Data Transmit/Receive
28 IO/M Input Output/ Memory
29 WR Write
30 HOLDA Hold Acknowledge
31 HOLD Hold
32 RD Read

33 MN/ MX Min / Max mode
34 BHE / S7 Bus High Enable /Status
35 A19 / S6 Address bus / status
36 A18 / S5 Address bus / status
37 A17 / S4 Address bus / status
38 A16 / S3 Address bus / status
39 AD15 Address / Data bus
40 VCC Vcc (+5V)

Figure 2-3 depicts the 8086 internal architecture. The Intel 8086
was based on the design of the Intel 8080 and Intel 8085 (it was
source compatible with the 8080) with a similar register set, but
was expanded to 16 bits.

Fig. 2-3. Architecture of the Intel 8086/8088 microprocessors

The Bus Interface Unit (BIU) feeds the instruction stream to the
Execution Unit (EU) through a 6-byte prefetch queue, so fetch
and execution are concurrent (a primitive form of pipelining,
which is a method of parallel processing). The 8086
instructions vary from 1 to 6 bytes in length. The processor features
64k x 8-bit (or 32k x 16-bit) I/O ports and fixed vectored interrupts.

The Intel 8088 is a version of the 8086 with an 8-bit external data
bus. Even though its internal data bus is still 16 bits wide,
the 8088 is about 30% slower than the 8086, because every 16-bit
transfer needs two bus cycles. The original
IBM PC made use of the 8088 at a clock rate of 4.77 MHz. In
the 8088, only the lower 8 address lines are multiplexed
with the data lines (AD0-AD7).

The ALE (address latch enable) signal, on pin 25, indicates
whether the multiplexed address/data lines are carrying an address or
data. When the microprocessor sends an address, ALE is set
high. When the microprocessor sends or receives data, ALE is
set low. In order to de-multiplex the address signals (A0-A15)
from (AD0-AD15), we use an octal latch (e.g., 74LS374). The
block diagram of the 74LS374 is shown in figure 2-4. The 74LS373
octal latch is the same as the 74LS374, except that the clock is
replaced by a level-sensitive enable line (control). The latch has to be
enabled by the ALE signal, as shown in figure 2-6. The bi-directional
data lines (D0-D15) can be driven by the octal tri-state
transceiver 74LS245, whose block diagram is shown in figure 2-5.
Alternatively, the data lines (D0-D15) can be driven by the 8286
bus transceiver chip. In each case two chips are needed. The BHE
pin (pin 34 in the 8086) indicates whether the upper data byte
(AD8-AD15) takes part in the current transfer. In fact, both BHE and A0
are used to indicate how data appear on the data bus. The four
possibilities are shown in the following table. Note that the BHE
pin doesn't exist in the 8088, which has only an 8-bit external data bus.

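Table 2-2. BHE pin signals (BHE is active low).

Condition                          BHE  A0  Data on
R/W 16-bit word at even address     0    0  AD0-AD15
R/W 8-bit byte at odd address       0    1  AD8-AD15
R/W 8-bit byte at even address      1    0  AD0-AD7
No transfer                         1    1  -

A 16-bit word stored at an odd address needs two bus cycles: its low
byte is transferred on AD8-AD15 (BHE = 0, A0 = 1), then its high byte
is transferred on AD0-AD7 at the next even address (BHE = 1, A0 = 0).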
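
The BHE/A0 decoding can be sketched in a few lines of Python. This is only an illustration, assuming the standard 8086 encoding in which BHE is active low; the function names bus_lanes and cycles_for_word are hypothetical and do not come from any library.

```python
def bus_lanes(bhe_n: int, a0: int) -> str:
    """Return which half of the 16-bit data bus takes part in a bus cycle.

    bhe_n is the logic level on the active-low BHE pin, a0 is address bit 0.
    """
    if bhe_n not in (0, 1) or a0 not in (0, 1):
        raise ValueError("bhe_n and a0 must be 0 or 1")
    table = {
        (0, 0): "word on D0-D15 (even address)",
        (0, 1): "byte on D8-D15 (odd address)",
        (1, 0): "byte on D0-D7 (even address)",
        (1, 1): "no transfer",
    }
    return table[(bhe_n, a0)]


def cycles_for_word(address: int) -> int:
    """An aligned word needs one bus cycle, a word at an odd address two."""
    return 1 if address % 2 == 0 else 2
```

For example, bus_lanes(0, 1) reports a byte transfer on the upper half of the bus, and cycles_for_word(0x1001) returns 2 because the word straddles two even-aligned locations.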

The 8086/8088 microprocessors can operate in two basic modes:

• Minimum mode, enabled by connecting pin 33 (MN/MX) to logic 1.
• Maximum mode, enabled by connecting pin 33 to logic 0.


Table 2-3 depicts the assignment of pins 24-31 in minimum and
maximum modes. In minimum mode, the 8086/8088 processors work as
stand-alone processors that generate their own control signals. So, in
minimum mode the control pins 24 to 31 carry the control signals
(INTA, ALE, DEN, DT/R, IO/M, WR, HLDA, HOLD), which are
generated by the 8086 itself (like the 8085A control signals). Thus, in minimum
mode, the 8085A peripheral support chips can also be used with the
8086/8088 microprocessors.
Table 2-3. Control signals of the 8086/8088, in minimum and maximum modes.

Pin  Minimum Mode                          Maximum Mode
31   HOLD (Hold for DMA request)           RQ/GT0 (Request/Grant)
30   HLDA (Hold Acknowledge)               RQ/GT1 (Request/Grant)
29   WR (Write control)                    LOCK (Lock bus)
28   M/IO (Memory or I/O control, 8086)    S2 (Status signal)
     IO/M (I/O or Memory control, 8088)
27   DT/R (Data transmit/receive)          S1 (Status signal)
26   DEN (Data enable)                     S0 (Status signal)
25   ALE (Address latch enable)            QS0 (Queue status)
24   INTA (Interrupt acknowledge)          QS1 (Queue status)

[Figure: eight D-type storage elements with a common clock/enable input G (pin 11) and an output control input OC (pin 1); inputs AD0-AD7, outputs A0-A7; GND on pin 10, VCC on pin 20.]

Fig. 2-4. Block diagram of the 74LS374 octal latch.


Fig. 2-5. Block Diagram of the 74LS245 octal tri-state buffer.

Fig. 2-6. Address de-multiplexing from address/data lines of 8086 processors.

Figure 2-7 depicts how the control signals MEMR, MEMW, IOR,
and IOW can be generated from the IO/M, WR and RD signals, for
8086/8088 microprocessors in minimum mode. Note that the M/IO signal of
the 8086 is replaced by the inverted signal IO/M in the 8088. The IBM PC/XT made use
of the 8088 in its maximum mode, while the IBM PCjr was
running its 8088 in minimum mode.
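
As a rough illustration of this decoding, the following Python sketch models the gating logic. It assumes that RD, WR and the four generated strobes are all active low, and that IO/M follows the 8088 polarity (1 selects I/O, 0 selects memory). The function names are hypothetical.

```python
def control_bus(io_m: int, rd_n: int, wr_n: int) -> dict:
    """Derive the active-low strobes MEMR, MEMW, IOR, IOW (8088 polarity).

    A strobe is asserted (0) only when its read/write signal is low
    and IO/M selects the matching address space.
    """
    mem_sel_n = io_m        # low when a memory access is selected
    io_sel_n = 1 - io_m     # low when an I/O access is selected
    return {
        "MEMR": rd_n | mem_sel_n,
        "MEMW": wr_n | mem_sel_n,
        "IOR": rd_n | io_sel_n,
        "IOW": wr_n | io_sel_n,
    }


def control_bus_8086(m_io: int, rd_n: int, wr_n: int) -> dict:
    """Same logic for the 8086, whose pin 28 has the opposite polarity."""
    return control_bus(1 - m_io, rd_n, wr_n)
```

With these assumptions, a memory read (io_m = 0, rd_n = 0, wr_n = 1) asserts only MEMR, and the same dictionary is obtained from control_bus_8086(1, 0, 1).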


In maximum mode, the bus control signals have to be generated
externally, using the 8288 bus controller chip. In this case, the
8288 chip decodes the S0, S1, S2 status signals (pins 26-28) to generate the
control signals (INTA, IOR, IOW and other signals). Note that the 8288
output signals MRDC and MWTC are equivalent to the usual control
signals MEMR and MEMW, respectively. In return, maximum mode makes
some new features available, like the possibility of adding a
coprocessor chip (8087) to the system.
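
The status decoding performed by the bus controller can be sketched as a simple lookup. The table below follows the standard 8086 maximum-mode status encoding; the names BUS_CYCLES and decode_status are hypothetical and serve only this illustration.

```python
# Bus cycle type indexed by the status bits (S2, S1, S0).
BUS_CYCLES = {
    0b000: "interrupt acknowledge",
    0b001: "I/O read",
    0b010: "I/O write",
    0b011: "halt",
    0b100: "instruction fetch",
    0b101: "memory read",
    0b110: "memory write",
    0b111: "passive (no bus cycle)",
}


def decode_status(s2: int, s1: int, s0: int) -> str:
    """Return the bus cycle type announced on the S2, S1, S0 status lines."""
    for bit in (s2, s1, s0):
        if bit not in (0, 1):
            raise ValueError("status bits must be 0 or 1")
    return BUS_CYCLES[(s2 << 2) | (s1 << 1) | s0]
```

For instance, decode_status(1, 0, 1) identifies a memory read, for which the 8288 would assert its memory read command output.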

Fig. 2-7(a). Generating control bus signals in 8086 minimum mode.

Fig. 2-7(b). Generating control bus signals in 8088 minimum mode.



The RQ/GT0 and RQ/GT1 pins coordinate the use of other
processors (e.g., the 8087 math coprocessor or the 8089 I/O
processor) on the same 8086 busses, in maximum mode. The request
and grant lines RQ/GT0 and RQ/GT1 provide
handshaking between the other processors and the 8086. The LOCK
signal (pin 29) is used in maximum mode to prevent other
processors from gaining control of the system bus. In fact, some
8086 instructions require that no other processor intervenes or
takes over the busses while they are executed. Note that bus
mastering in minimum mode is performed by the HOLD and HLDA
signals (instead of RQ/GT0, RQ/GT1).

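The queue status lines QS0 and QS1 report the activity of the
internal instruction queue (whether the first byte of an opcode or a
subsequent byte has been taken from the queue, or the queue has been
emptied), so that external processors such as the 8087 can track
instruction execution. The READY pin is usually pulled low when the 8086 is
communicating with a slow memory or input/output device, which makes
the processor insert wait states.
When the memory or input/output device finishes the read or
write operation, it pulls READY high.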

The 8086/8088 processors have four 16-bit general purpose
registers, which can also be accessed as eight 8-bit registers, and
four 16-bit pointer and index registers (including the stack pointer). There are
also four segment registers. As shown in figure 2-3, the 8086/8088
registers are all 16 bits wide, as follows:

i) 4 General-purpose Registers: AX, BX, CX, DX
   AX (AL/AH): Accumulator, used in multiplication, division and I/O
   BX (BL/BH): Base register
   CX (CL/CH): Count register, used as counter in loop operations
   DX (DL/DH): Data register, used in multiplication, division and I/O
ii) 5 Pointer / Index Registers:
   SP: Stack Pointer
   BP: Base Pointer
   IP: Instruction Pointer
   SI: Source Index, used as pointer in string operations
   DI: Destination Index, used as pointer in string operations
iii) 4 Segment Registers: to access memory and I/O
   CS: Holds the code segment address (the beginning of the code segment)
   DS: Holds the data segment address (the beginning of the data segment)
   SS: Holds the stack segment address (the beginning of the stack segment)
   ES: Extra segment
iv) 1 Flag Register FLAGS: to monitor the result of ALU instructions.
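
The way a 16-bit general-purpose register such as AX is shared with its two 8-bit halves (AH and AL) can be modelled with a small Python class. This is only a sketch for illustration; the class name Reg16 and its attributes are hypothetical.

```python
class Reg16:
    """A 16-bit register whose high and low bytes can be accessed separately."""

    def __init__(self, value: int = 0):
        self.x = value & 0xFFFF          # full 16-bit value (e.g. AX)

    @property
    def high(self) -> int:               # upper byte (e.g. AH)
        return (self.x >> 8) & 0xFF

    @high.setter
    def high(self, value: int) -> None:
        self.x = ((value & 0xFF) << 8) | (self.x & 0x00FF)

    @property
    def low(self) -> int:                # lower byte (e.g. AL)
        return self.x & 0xFF

    @low.setter
    def low(self, value: int) -> None:
        self.x = (self.x & 0xFF00) | (value & 0xFF)
```

Writing to one half leaves the other half untouched: after ax = Reg16(0x1234) and ax.low = 0xFF, the full register holds 0x12FF.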

The FLAGS register of the Intel 8086/8088 is described in figure 2-8.
FLAGS is unlike the other registers of the 8086. This register is simply a
collection of one-bit values (flip-flops), which help us to determine the
current state of the processor. Although the FLAGS register is 16 bits
wide, the 8086 processor uses only nine of those bits. Of these flags, four
are used all the time: carry (CF), zero (ZF), sign (SF), and overflow (OF). For
example, after an addition of two 8-bit numbers, if the sum does not fit in 8 bits,
the Carry flag (CF) is set to one. When an arithmetic operation results in
zero, the Zero flag (ZF) is set to one.

15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
- - - - OF DF IF TF SF ZF - AF - PF - CF
• CF: Carry Flag (contains carry out of MSB of result)
• PF: Parity Flag (indicates if result has even parity)
• AF: Auxiliary carry Flag (contains carry out of bit 3 in AL)
• ZF: Zero Flag (indicates if result is zero)
• SF: Sign Flag (indicates if result is negative)
• OF: Overflow Flag (indicates if overflow occurred in result)
• IF: Interrupt Flag (indicates if interrupt is enabled or disabled)
• DF: Direction Flag (controls pointer updates during string operation)
• TF: Trap Flag (provides single-step execution capability for debugging)
Fig. 2-8. Illustration of the FLAGS register in Intel 8086/8088 microprocessors.
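
To make the meaning of the arithmetic flags concrete, the sketch below computes the six status flags after an 8-bit addition, in the way the definitions above describe them. It is a simplified model written for this illustration (the function name add8_flags is hypothetical), not a description of the processor's internal circuitry.

```python
def add8_flags(a: int, b: int):
    """Add two 8-bit values and return (result, flags) like an 8-bit ADD."""
    a &= 0xFF
    b &= 0xFF
    total = a + b
    result = total & 0xFF
    flags = {
        "CF": int(total > 0xFF),                            # carry out of bit 7
        "PF": int(bin(result).count("1") % 2 == 0),         # even number of 1 bits
        "AF": int((a & 0x0F) + (b & 0x0F) > 0x0F),          # carry out of bit 3
        "ZF": int(result == 0),                             # result is zero
        "SF": result >> 7,                                  # copy of the sign bit
        "OF": int(((a ^ result) & (b ^ result) & 0x80) != 0),  # signed overflow
    }
    return result, flags
```

For example, adding 0FFH and 01H gives the result 00H with CF = 1 and ZF = 1, while adding 7FH and 01H gives 80H with SF = 1 and OF = 1, since the signed result no longer fits in 8 bits.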

The instruction pointer (IP) is a 16-bit register which contains the offset,
within the code segment, of the next instruction to be fetched. The microprocessor uses this register
to sequence the execution of instructions. The IP and FLAGS are two
special registers of the 8086 CPU. You do not access these registers the
same way you access the other registers. Instead, it is the CPU which
manipulates these registers directly.
The 8086/88 microprocessors can handle hardware interrupts via the
INTR (interrupt request) and NMI (non-maskable interrupt) pins.
Interrupts are external events (like mouse movement or memory failure)
that need CPU attention. The NMI pin is edge triggered: when it is activated, the CPU
finishes its current instruction and handles the interrupt. Similarly, when
INTR is activated by an external device, the CPU finishes its current
instruction and responds with the interrupt acknowledge signal (INTA). The
INTA signal is received by an interrupt controller (the 8259 chip), as
shown in figure 2-9. This chip puts an interrupt vector byte on the data bus
and the CPU uses it to determine the address of the appropriate interrupt
service routine (ISR).
The interrupt signal INTR can be masked (and hence ignored) if the interrupt
flag (IF) is cleared by the CLI instruction. Figure 2-9 depicts how the
microprocessor handles the hardware interrupt signals.
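
The step from the vector byte to the address of the service routine can be sketched as follows. The sketch relies on the standard real-mode layout, in which the interrupt vector table starts at physical address 0 and each entry occupies 4 bytes (offset word first, then segment word, both little-endian). The function names and the sample memory contents are hypothetical.

```python
def vector_entry_address(vector: int) -> int:
    """Physical address of the table entry for an 8-bit vector number."""
    if not 0 <= vector <= 255:
        raise ValueError("vector must be in 0..255")
    return 4 * vector


def isr_physical_address(memory: bytes, vector: int) -> int:
    """Read the IP and CS words of the entry and form the 20-bit ISR address."""
    base = vector_entry_address(vector)
    ip = memory[base] | (memory[base + 1] << 8)
    cs = memory[base + 2] | (memory[base + 3] << 8)
    return ((cs << 4) + ip) & 0xFFFFF


# Usage with made-up contents: vector 8 points to F000:1234.
memory = bytearray(1024)
memory[32:36] = bytes([0x34, 0x12, 0x00, 0xF0])
isr_address = isr_physical_address(memory, 8)   # 0xF1234
```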

Fig. 2-9. Handling hardware interrupts in 8086/8088 systems, using the 8259
interrupt controller chip. The INTA signal and other control signals are generated in
the maximum mode via the 8288 bus controller.

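[Figure: the INTR request coming from the 8259 is gated by the interrupt flag (IF) and by the end of execution of the current instruction before the 8086 answers with INTA.]

Fig. 2-10. Gating the hardware interrupts inside the 80x86 systems.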


The clock of the 8086/8088 (pin 19, CLK) is equal to one third of the
crystal oscillator frequency. The CLK signal is usually supplied by
the 8284 clock generator chip, which performs this division, as shown in
figure 2-11. For instance, if the 8088 is
to be operated at 5 MHz, then the 8284 chip should be driven by a 15 MHz crystal. The
8284 is also used to manage the READY and RESET signals of the
8086/8088.

The RESET pin of the 8086/8088 is used to restart the microprocessor
system, for example when the system has crashed. When the RESET line goes high
momentarily, the 8086/8088 microprocessor clears its IP, DS, ES and
SS registers to zero and sets CS to 0FFFFH. Execution therefore
restarts at the physical address 0FFFF0H (CS:IP = FFFF:0000), which
should contain a jump to the startup routine of the microprocessor system. In the IBM
PC and compatible clones, this address lies in the
BIOS ROM. The reset signal can be generated via a simple pushbutton or
via the 8284 clock chip, as shown in figure 2-11.
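
The restart address follows from the way the 8086 forms a 20-bit physical address out of a 16-bit segment and a 16-bit offset (segment shifted left by 4 bits, plus offset). A minimal Python sketch of this arithmetic, with a hypothetical function name, is:

```python
def physical_address(segment: int, offset: int) -> int:
    """Form the 20-bit physical address: segment * 16 + offset."""
    return (((segment & 0xFFFF) << 4) + (offset & 0xFFFF)) & 0xFFFFF


# Register values after RESET, as described above.
reset_cs, reset_ip = 0xFFFF, 0x0000
reset_vector = physical_address(reset_cs, reset_ip)   # 0xFFFF0

# Only 16 bytes remain between the restart address and the top of
# the 1 MB address space, hence the need for a jump instruction there.
bytes_left = 0x100000 - reset_vector
```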

Fig. 2-11. RESET, CLK and READY pins in 8086/8088 microprocessors and their
connection to the 8284 clock chip.


2-3. Architecture of the 8087 Math Coprocessor


The ALU of the 8086/8088 can only do integer math. Floating point
calculations had to be handled by programs that break
down each math operation into a series of integer operations. As this was
slow, special purpose arithmetic chips were developed
specifically to perform mathematical functions on
floating point data. These chips are referred to as co-processors or
floating point units (FPU).

Fig. 2-12. Architecture of the 8087 math coprocessor.

The 8087 math coprocessor was one of the innovations of the x86 family
of microprocessors. The 8087 is capable of performing
floating point operations in 80-bit precision. It was designed to monitor
the instruction stream and watch for ESC instructions (instructions whose
opcode starts with the ESC bit pattern 11011). Whenever an ESC instruction is detected,
the coprocessor knows that it involves a floating
point operation and executes it more efficiently than the 8086/88 could. In
the meantime, the 8086/88 has to wait for the result of the 8087, by issuing a
WAIT instruction, until the 8087 is done and its BUSY signal goes low.
Other (non-ESC) instructions are ignored by the 8087 and handled normally
by the 8086/88.
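
The detection of coprocessor instructions can be illustrated with a one-line test on the first opcode byte. The sketch assumes the standard 8086 encoding, in which ESC opcodes occupy the range D8H to DFH (bit pattern 11011xxx) and WAIT is the single byte 9BH. The function name is hypothetical.

```python
WAIT_OPCODE = 0x9B   # makes the CPU wait while its TEST pin is inactive


def is_esc_opcode(first_byte: int) -> bool:
    """True if the opcode byte has the ESC pattern 11011xxx (D8H-DFH)."""
    return (first_byte & 0xF8) == 0xD8
```

Under this assumption, is_esc_opcode(0xD9) is true, whereas an ordinary opcode such as 90H (NOP) or the WAIT opcode itself is left to the main processor.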

As shown in figure 2-12, the 8087 coprocessor has eight 80-bit data
registers, named ST(0) through ST(7), which can be accessed randomly
or as a stack. The connection of the 8087 with the 8086/88 microprocessor in
maximum mode is shown in figure 2-13.

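[Figure: the 8087 shares the address/data lines, the A16/S3-A19/S6 and BHE/S7 lines, the S0-S2 status lines and the QS0-QS1 queue status lines with the CPU, together with CLK, READY and RESET. The RQ/GT line of the 8087 is tied to a request/grant pin of the CPU, the BUSY output of the 8087 drives the TEST pin of the CPU, and the INT output of the 8087 requests an interrupt.]

Fig. 2-13. Connection of the 8087 math coprocessor with 8088 microprocessor, in
maximum mode.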


2-4. Architecture of the 80286 Microprocessor


Following the great success of the IBM PC, Intel wanted to remain the
preferred supplier of CPU chips for all the IBM PC models. In order to do
that, Intel found it necessary to introduce improved versions
of the 8086, such as the 80186 and 80286.

The 80286 (also known as iAPX 286) is a 16-bit microprocessor that was
introduced in 1982. It was built in 1.5 micron NMOS technology with
134,000 transistors in a 68-pin quad-flat pack (QFP) package. The 80286
features a 16-bit data bus and a 24-bit address bus, and hence can address
up to 16 MB (2^24 bytes) of memory space. The processor was used in the IBM
PC/AT, which was introduced in 1984. The 80286 performance was
more than twice that of its predecessors (8086 and 8088) per clock cycle.
For instance, the model operated at 12 MHz had a benchmark of 2.66
MIPS (million instructions per second). Figure 2-14 depicts the
architecture of the 80286 microprocessor. As shown, the 80286 has 5
additional control registers for segmented memory management and
multitasking:

• GDTR: Global Descriptor Table Register
• LDTR: Local Descriptor Table Register
• IDTR: Interrupt Descriptor Table Register
• TR: Task Register
• MSW: Machine Status Word

The additional control registers enable the 80286 to work in the protected
virtual address mode (PVAM). The machine status word (MSW) is a
register that indicates whether the machine operates
in real mode or in PVAM. When bit 0 of the MSW register (termed PE)
is set to one, the machine enters the PVAM. In this case the 80286
can address up to 16 MB of memory. The GDTR and LDTR registers
point to the global and local segment descriptor tables. The IDTR register
points to a table of entry points for interrupts. The TR register points to
the information needed by the processor to define the current task.
Besides the additional registers, there are three additional bits in the
80286 flags register. The I/O Privilege Level (IOPL) is a two-bit value (bits 12
and 13). It specifies one of four different privilege levels necessary to
perform I/O operations. These two bits generally contain 00 when
operating in real mode on the 80286.

Fig. 2-14(a). Pinout of the 80286 microprocessor.

Fig. 2-14(b). Architecture of the 80286 microprocessor.


The NT (nested task) flag (bit 14) controls the operation of an interrupt return
(IRET) instruction. The benefit of the protected mode on the 80286 is the ability to
access more than one megabyte of RAM. However, as the 80286 is now
virtually obsolete, and there are better ways to access more memory on
later processors, programmers rarely use the PVAM mode.

MSW (80286)
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
- - - - PE

Fig. 2-14(c). Machine status word register (MSW) in the 80286 microprocessor.

2-5. Architecture of the 80386 Microprocessor


The 80386 processor was a significant step in the line of Intel's 80x86
processors. The 80386 is a 32-bit microprocessor that was first introduced
in 1985. It has 275,000 transistors, built in 1 µm CMOS
technology, in a 132-pin package. Figure 2-15(a) and table 2-4 depict the
pin assignment of the 80386 chip. The 80386 features a 32-bit data bus and a
32-bit address bus, and hence can address up to 4 GB (2^32 bytes) of memory.
This 32-bit architecture added better memory management and
multitasking. Before that time, personal computers based on Intel
processors were almost exclusively driven by DOS systems that could
run only one program at a time. The 386 allows multiple application
programs to run at the same time using the so-called "protected mode". In
this mode, each application has its own protected memory area. In
addition, the 80386 microprocessors have a virtual memory mode, in
which the microprocessor can address up to 64 TB of virtual memory. In
order to employ all these new facilities, Intel has equipped the 80386
microprocessors with new opcodes in a kludgy fashion similar to the
Zilog Z80. In addition to all the registers on the 80286 (and therefore, the
8086), the 80386 added several new registers and extended the definition
of the existing registers. There are different versions of the 80386 CPUs,
among which one can cite:
• 80386DX, which works with 16-bit and 32-bit external buses.
• 80386SX, which is a low-cost version of the 80386. It has a 16-bit
external data bus and a 24-bit external address bus.
Figure 2-15(a) shows the pinout diagram of the 80386SX microprocessor
and figure 2-15(b) depicts the internal architecture of the 80386 CPU.

Fig. 2-15(a). Pin-out diagram of the 80386SX microprocessor.


Table 2-4. Pin assignment of the 80386 microprocessor.

Pin       Assignment
A2-A31    Address Bus
D0-D31    Data Bus
BE0-BE3   Bus Enable, encoded from A0-A1 to select 4 (8-bit) memory banks
W/R       Write/Read
M/IO      Memory/IO
D/C       Data / Control
ADS       Address Data Strobe. Combined with W/R to generate MRD, MW, IOR, IOW
BS16      Bus Size 16. Selects a 16-bit or 32-bit data bus
NA        Next Address
CLK2      2X clock
BUSY      Input used by the coprocessor to make the 80386 wait
PEREQ     Coprocessor (FPP) request to the 80386
ERROR     Error signal from the coprocessor
INTR      Interrupt Request
NMI       Non-maskable Interrupt
READY     Ready signal. Controls the number of wait states to lengthen memory access
HOLD      Hold, requests a DMA action
HLDA      Hold Acknowledge


Fig. 2-15(b). Internal architecture of the 80386 microprocessor

As shown in figure 2-15(c), some of the internal registers of the 80386
microprocessors are extended to 32 bits, as follows:

i) 4 Data Registers (32 bit): EAX, EBX, ECX, EDX
ii) 6 Segment Registers (16 bit): CS, DS, SS, ES, FS and GS
iii) 5 Pointer/Index Registers (32 bit):
   ESP: Stack Pointer
   EBP: Base Pointer
   ESI: Source Index
   EDI: Destination Index
   EIP: Instruction Pointer
iv) 1 Flag register EFLAGS (32 bit): to indicate the result of ALU
instructions (like addition, subtraction, comparison, etc.)

The most important change in the 80386, from the programmer's point of view,
was the introduction of a 32-bit register set. The 16-bit AX, BX,
CX, DX, SI, DI, BP, SP, FLAGS, and IP registers were all extended to 32
bits. The 80386 calls these new 32-bit versions EAX, EBX, ECX, EDX,
ESI, EDI, EBP, ESP, EFLAGS, and EIP to differentiate them from their
16-bit versions (which are still available on the 80386). Besides the 32-bit
registers, the 80386 also provides two new 16-bit segment registers, FS
and GS, which allow the programmer to concurrently access six different
segments in memory without reloading a segment register.


Note that all the segment registers on the 80386 are still 16 bits wide. The
80386 did not extend the segment registers to 32 bits. The 80386
microprocessor extended the flags register to 32 bits (renamed EFLAGS)
and defined bits 16 and 17. Bit 16 of the EFLAGS register is the debug
resume flag (RF) used with the set of 80386 debug registers. Bit 17 is the
virtual 8086 mode flag (VM), which determines whether the processor is
operating in virtual-86 mode (which simulates an 8086) or standard
protected mode. Furthermore, the 80386 has additional special registers
for control and memory management. So, in addition to the 5 special
registers of the 80286 microprocessor (GDTR, LDTR, IDTR, TR, and
MSW), the 80386 added 14 registers, as shown in table 2-5. As
shown, the 80386 added four control registers (CR0-CR3). These
registers extend the MSW register of the 80286 (the 80386 emulates the
80286 MSW register for compatibility, but the information really appears
in the CRx registers). These registers control functions such as paged
memory management and protected mode operation.

Fig. 2-15(c). Internal registers in 80386 microprocessors.


The 80386 added eight debugging registers (DR0-DR7). A debugging
program like Microsoft's CodeView can use these registers to set
breakpoints when you try to locate errors within a program. While you do
not use these registers in an application program, you will notice that
such debuggers reduce the time needed to eradicate bugs in programs. Also, the
386 processor has a set of test registers (TR6-TR7), which
test the proper operation of the processor when the system powers up.
Intel added these registers to allow chip testing after fabrication. However,
system designers can also use these registers to do a power-on self test (POST).

EFLAGS (386+)
15  14  13-12  11  10  9   8   7   6   5  4   3  2   1  0
-   NT  IOPL   OF  DF  IF  TF  SF  ZF  -  AF  -  PF  -  CF
31-22  21  20   19   18  17  16
-      ID  VIP  VIF  AC  VM  RF

• CF: Carry Flag (contains carry out of MSB of result)
• PF: Parity Flag (indicates if result has even parity)
• AF: Auxiliary carry Flag (contains carry out of bit 3 in AL)
• ZF: Zero Flag (indicates if result is zero)
• SF: Sign Flag (indicates if result is negative)
• OF: Overflow Flag (indicates if overflow occurred in result)
• IF: Interrupt Flag (indicates if interrupt is enabled or disabled)
• DF: String Direction Flag (controls pointer updates during string operation)
• TF: Trap Flag (provides single-step execution capability for debugging)
• IOPL: I/O Privilege Level. Used in protected mode to select the privilege level
required for I/O operations. IOPL takes 2 bits (12, 13). So, there exist 4 privilege levels (0, 1, 2, 3)
• NT: Nested Task flag. Indicates if the current task is nested
• VM: Virtual Mode flag. Selects virtual 8086 mode operation in protected mode.
• RF: Resume Flag. Used with the set of 80386 debug registers.
• AC: Alignment Check flag (added in 80486)
• VIF: Virtual Interrupt Flag. Copy of the IF bit in Pentium (80586) and later processors.
• VIP: Virtual Interrupt Pending. Indicates that a virtual interrupt is pending; used together with VIF (80586).
• ID: Identification flag. Indicates whether the microprocessor supports the CPUID
instruction, which provides information about the microprocessor (80586)

Fig. 2-16(a). Structure of the EFLAGS register in 80386 and later microprocessors.
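
The bit layout of EFLAGS lends itself to a small decoding routine. The following Python sketch extracts every named field from a 32-bit value, using the bit positions of the figure; the names EFLAGS_BITS and decode_eflags are hypothetical and serve only this illustration.

```python
# Bit position of each single-bit flag in EFLAGS.
EFLAGS_BITS = {
    "CF": 0, "PF": 2, "AF": 4, "ZF": 6, "SF": 7, "TF": 8, "IF": 9,
    "DF": 10, "OF": 11, "NT": 14, "RF": 16, "VM": 17, "AC": 18,
    "VIF": 19, "VIP": 20, "ID": 21,
}


def decode_eflags(value: int) -> dict:
    """Split a 32-bit EFLAGS value into its named fields."""
    fields = {name: (value >> bit) & 1 for name, bit in EFLAGS_BITS.items()}
    fields["IOPL"] = (value >> 12) & 0b11   # two-bit field in bits 12-13
    return fields
```

For example, the value 00003202H decodes to IF = 1 and IOPL = 3, with all other named flags cleared.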


Table 2-5. Special registers in 80386 and later microprocessors.

Control registers         Debug registers         Test registers
CR0 Control Register 0    DR0 Debug Register 0    TR6 Test Register 6
CR1 Control Register 1    DR1 Debug Register 1    TR7 Test Register 7
CR2 Control Register 2    DR2 Debug Register 2
CR3 Control Register 3    DR3 Debug Register 3
                          DR4 Debug Register 4
                          DR5 Debug Register 5
                          DR6 Debug Register 6
                          DR7 Debug Register 7

CR0 (386+)
31  30  29  ...  18  17  16  ...  5   4   3   2   1   0
PG  CD  NW  ...  AM  -   WP  ...  NE  ET  TS  EM  MP  PE

• EM: Emulation Mode (of 80x87)
• ET: Extension Type (presence of 80387)
• MP: Math Present (monitor coprocessor)
• PE: Protection Enable
• PG: Paging enable (paging is disabled when PG = 0)
• TS: Task Switched
• CD: Cache Disable (486)
• NW: Not Write-through (486)
• AM: Alignment Mask (486)
• WP: Write Protect (486)
• NE: Numeric Error (486)
Fig. 2-16(b). Structure of the CR0 control register in 80386 and later microprocessors.
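
As an illustration of how the CR0 bits select the operating mode, the sketch below decodes a CR0 value using the bit positions of the figure. The names CR0_BITS, decode_cr0 and operating_mode are hypothetical, and the sketch ignores the fact that setting PG without PE is not a valid combination on the real processor.

```python
# Bit position of each named flag in CR0.
CR0_BITS = {
    "PE": 0, "MP": 1, "EM": 2, "TS": 3, "ET": 4, "NE": 5,
    "WP": 16, "AM": 18, "NW": 29, "CD": 30, "PG": 31,
}


def decode_cr0(value: int) -> dict:
    """Split a 32-bit CR0 value into its named one-bit flags."""
    return {name: (value >> bit) & 1 for name, bit in CR0_BITS.items()}


def operating_mode(value: int) -> str:
    """Describe the mode selected by the PE and PG bits."""
    flags = decode_cr0(value)
    if not flags["PE"]:
        return "real mode"
    if flags["PG"]:
        return "protected mode with paging"
    return "protected mode without paging"
```

With these definitions, operating_mode(0x80000011) reports protected mode with paging, while operating_mode(0x00000010) reports real mode.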

2-6. Architecture of the 80486 Microprocessor


The 80486 is also a 32-bit microprocessor that contains both the 80386
microprocessor and the 80387 coprocessor on a single chip. The first
version of the 80486 was introduced in 1989 using 1-micron CMOS
technology. It had 1.2 million transistors in a 168-pin package, called pin-grid
array (PGA). The 80486 has a 32-bit data bus and a 32-bit address
bus and hence can address up to 4 GB (2^32 bytes) of physical memory
space. Figure 2-17 depicts the architecture of the 80486 microprocessor
system. The 80486 is about 20 times faster (20X) than the 8086/8087
on integer operations and 40 times faster (40X) than the 8086/8087 on
floating point operations.

Exactly like the 80386, the 80486 processor has eight 32-bit general-purpose
registers. It is also equipped with a floating point unit (FPU) to perform
mathematical operations with floating point numbers. The FPU has a
register file of eight 80-bit floating-point registers. The 80486 processor
is also equipped with 8 kB of internal cache memory to accelerate
instruction handling. As we'll see in detail in chapter 7, the
cache memory is a sort of fast static RAM (SRAM) which is usually
integrated on the microprocessor chip. This memory is sometimes called
level-one cache (L1-Cache). The 80486 did not add any new registers to
the 80386 basic register set, but it defined a few bits in some registers left
undefined by the 80386. The 80486 adds a third bit to the upper half of the EFLAGS
register, at position 18: the alignment check flag (AC). Along with the
alignment mask bit (AM) of control register zero (CR0), this flag forces a trap (program
abort) when the processor accesses non-aligned data (e.g., a word at an
odd address). The 80486 extends the control functions of the control
registers. In addition to their use for paged memory management and
protected mode operation, the 80486 uses control registers for cache
enable/disable operation. It should also be noted that the 80486
is pipelined, so that many simple instructions execute in a single clock
cycle. Later versions (such as the 80486DX2) also use clock
doubling, where the processor core runs at twice the external bus clock.

Fig. 2-17. Architecture of the Intel 80486 microprocessor.


2-7. Architecture of Intel’s Pentium (80586) Microprocessor


The Intel Pentium (80586) is a 32-bit microprocessor that was first
introduced in 1993. It is called "Pentium" because it is the fifth in
the 80x86 line. It would have been called the 80586 had a US court
not ruled that a number cannot be trademarked. The Pentium processor has an
internal 32-bit data bus and a 32-bit address bus, and can address up to 4 GB of
memory. However, this processor has a 64-bit memory interface, with an
external data bus of 64 bits. It has 3.1 million transistors and was built in
0.8 µm BiCMOS technology. The clock of this CPU was initially rated
at 66 MHz. Figure 2-18 shows the architecture of the Pentium processor.

Fig. 2-18. Architecture of the Intel Pentium (80586) microprocessor.


As shown in figure 2-18, the Pentium processor is equipped with a floating-point
unit (FPU) with an 80-bit bus. It has two 32-bit integer pipelines.
The Pentium processor also has 16 kB of cache memory: 8 kB for data (D-
Cache) and 8 kB for instructions (I-Cache), to accelerate instruction
handling and execution. The branch prediction unit (BPU) predicts
which instruction block will be executed after a conditional branch
instruction. The instructions in that block are then fetched and their
execution can begin even before the branch decision is taken. Whenever
the prediction is correct, the instruction execution continues smoothly,
saving much time. Otherwise, the processing in the predicted path is
abandoned and the correct instruction block is fetched instead.

The Pentium surpasses the 80486 processor in speed by using internal
256-bit data paths and pipelined processing that lets several operations
proceed at once. In addition, instruction processing is split between dual arithmetic
logic units. In pipelining, the instructions are broken into small tasks, as
shown in figure 2-19, and their execution is shared between various
stages as follows:

• The Bus Interface Unit (BIU) retrieves code and data from RAM.
• The BIU sends code along one 64-bit path to the 8 kB I-Cache and sends
data along another 64-bit path to the 8 kB D-Cache. The two caches
collect code and data until other components request them.
• The BPU inspects the code in the I-Cache to determine which of the two
pipelines, or data paths, can most efficiently carry each instruction.
• The instruction Pre-fetch Buffer obtains new instructions in 256-bit
bursts and the Instruction Decode Unit prepares the code for execution.
• The FPU calculates any non-integer math and puts the result in the D-Cache.
• The two integer ALUs simultaneously take two sequential instructions
of up to 32 bits each from the Instruction Decode Unit.
• The instructions are executed using data placed in the Execution Unit's
registers from the D-Cache.
• The D-Cache receives the results of the calculation. The cache sends
the results to the BIU, which in turn stores the results to RAM.

The heat dissipation of the first Pentium processors was about 16 W, so they
required special cooling fans.


                     Instruction 1            Instruction 2
Task / Time Cycle    1    2    3    4    5    6    7    8    9    10
Instruction Fetch    IF1                      IF2
Instruction Decode        ID1                      ID2
Operand Load                   OL1                      OL2
Instruction Execute                 IE1                      IE2
Operand Store                            OS1                      OS2
(a) Non-pipelined processing

Task / Time Cycle    1    2    3    4    5    6    7    8    9    10
Instruction Fetch    IF1  IF2  IF3  IF4  IF5  IF6  IF7  IF8
Instruction Decode        ID1  ID2  ID3  ID4  ID5  ID6  ID7  ID8
Operand Load                   OL1  OL2  OL3  OL4  OL5  OL6  OL7
Instruction Execute                 IE1  IE2  IE3  IE4  IE5  IE6
Operand Store                            OS1  OS2  OS3  OS4  OS5
(b) Pipelined processing (with 5 stages)
Fig. 2-19. Pipelined and Non-pipelined instruction processing.
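
The benefit of pipelining can be quantified with a simple cycle count. The sketch below assumes an ideal pipeline in which every stage takes one clock cycle and no stalls occur; the function names are hypothetical.

```python
def cycles_non_pipelined(instructions: int, stages: int = 5) -> int:
    """Each instruction passes through all stages before the next one starts."""
    return instructions * stages


def cycles_pipelined(instructions: int, stages: int = 5) -> int:
    """The first instruction needs 'stages' cycles, each further one adds a cycle."""
    if instructions <= 0:
        return 0
    return stages + instructions - 1


def speedup(instructions: int, stages: int = 5) -> float:
    """Ratio of non-pipelined to pipelined execution time."""
    return cycles_non_pipelined(instructions, stages) / cycles_pipelined(instructions, stages)
```

Under these assumptions, two instructions take 10 cycles without pipelining, whereas five instructions complete in only 9 cycles with a five-stage pipeline. For long instruction sequences the speedup approaches the number of stages.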

The performance of a Pentium processor is about 100 million instructions
per second (MIPS) when it is operated at 66 MHz. Benchmarks are standard
programs, or sets of programs, which can be run on different computers to
give a measure of their performance. The number of instructions executed
per second, in millions (MIPS), was the usual measure of performance for
the first microprocessor generations. However, because microprocessors are
now produced by several manufacturers and their instruction sets differ in
length, MIPS is no longer an acceptable benchmark in the microprocessor
literature. Instead, SPECint92 and SPECfp92 were introduced as benchmarks
for the evaluation of microprocessor performance. SPECint92 is a
performance evaluation standard whose result is derived from a set of
integer benchmarks, so it may be used to estimate a machine's
single-tasking performance on integer operations. Similarly, SPECfp92 is
used to estimate single-tasking performance on floating-point numbers.
The performance of the Pentium in terms of the above benchmarks
is 64.5 SPECint92, and 56.9 SPECfp92 at 66 MHz.
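
The MIPS figure quoted above can be related to the clock frequency by simple arithmetic. The sketch below (Python, with a hypothetical function name) divides the instruction rate by the clock rate to obtain the average number of instructions completed per clock cycle. It uses only the 100 MIPS and 66 MHz values given in the text.

```python
def instructions_per_cycle(mips: float, clock_mhz: float) -> float:
    """Average number of instructions completed per clock cycle."""
    # MIPS and MHz are both "millions per second", so the units cancel.
    return mips / clock_mhz


if __name__ == "__main__":
    ipc = instructions_per_cycle(mips=100, clock_mhz=66)
    print(f"{ipc:.2f} instructions per clock cycle")
```

A result above one instruction per clock is consistent with the two integer pipelines described earlier, since a single pipeline can complete at most one instruction per cycle.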

2-8. Architecture of Intel’s Pentium II Microprocessor
Pentium II is the successor of the Pentium (80586) and Pentium Pro
(80686) microprocessors. The first Pentium II processors produced were
code-named Klamath. They were manufactured using a 0.35 micron
CMOS process with about 7.5 million transistors on a single chip. The
initial versions of Pentium II supported clock rates of 233 to 300 MHz at
a bus speed of 66 MHz. The second-generation Pentium II processors
(code-named Deschutes) were made with 0.25 micron CMOS technology and
supported clock rates of 350 to 450 MHz at a bus speed of 100 MHz. The
Pentium II can execute all the instructions of the earlier processors in
the x86 family. There are four versions targeted at different user markets. The
Celeron processor is the simplest and cheapest version. The standard
Pentium II is aimed at mainstream home and business users. The Pentium
II Xeon is intended for higher performance business servers. There is also
a mobile version of the Pentium II for use in portable computers.

All versions of the Pentium II are packaged on a special daughter-board
that plugs into a card-edge processor slot on the motherboard. The
daughter-board is enclosed within a rectangular black box called a Single
Edge Contact (SEC) cartridge.

The low-cost Celeron may be sold as a card only, without the box.
The consumer-line Pentium II requires a 242-pin slot called Slot 1. The Xeon
processor uses a 330-pin slot called Slot 2. Intel refers to Slot 1 and Slot 2
as SC242 and SC330 in some of its technical documentation. The
daughter-board has mounting points for the Pentium II CPU itself plus
various support chips and cache memory chips. You can find a
recapitulation of the features of all the above mentioned sockets and
interfaces, as well as their photos, in Chapter 11 of this book.

Pentium II is a super-scalar CPU. A superscalar architecture is a uniprocessor
that can execute two or more scalar operations in parallel.
Super-scalar architectures require multiple functional units, which may or
may not be identical to each other. In some superscalar processors the
order of instruction execution is determined statically (purely at compile-
time), in others it is determined dynamically (partly at run time). All
Pentium II processors have Multimedia Extensions (MMX) and
integrated level-1 and level-2 cache controllers. Additional features
include Dynamic Execution and Dual Independent Bus Architecture, with
separate 64 bit system and cache busses.


MMX is a set of 57 extra instructions built into some versions of Intel's
Pentium microprocessors for supporting SIMD (single instruction /
multiple data) operations on multimedia and communications data types.
According to Intel, MMX is not an acronym for "MultiMedia eXtension"
but an Intel brand name. MMX-enhanced processors were released in 1997.
They are fully compatible with previous Intel processors and software,
but software only benefits if it is written to use the new MMX
instructions. They can handle many common multimedia operations, such as
digital signal processing, normally handled by a
separate sound card or video card.
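
The SIMD idea behind MMX can be illustrated with a small Python sketch. It is a software model only, not Intel's implementation: it assumes a 64-bit register holding eight packed 8-bit lanes and mimics a packed byte addition with wrap-around (the behaviour of the MMX PADDB instruction). The helper names are hypothetical.

```python
# Software model of a SIMD packed-byte addition on a 64-bit value.
# Assumption: eight independent 8-bit lanes, each wrapping around modulo 256.

LANES = 8
LANE_BITS = 8
LANE_MASK = 0xFF


def pack_bytes(values):
    """Pack eight 8-bit values into one 64-bit integer (lane 0 = lowest byte)."""
    if len(values) != LANES:
        raise ValueError("expected exactly eight values")
    word = 0
    for i, value in enumerate(values):
        word |= (value & LANE_MASK) << (i * LANE_BITS)
    return word


def unpack_bytes(word):
    """Split a 64-bit integer back into its eight 8-bit lanes."""
    return [(word >> (i * LANE_BITS)) & LANE_MASK for i in range(LANES)]


def packed_add_bytes(a, b):
    """Add the eight lanes of a and b independently; carries never cross lanes."""
    result = 0
    for i in range(LANES):
        shift = i * LANE_BITS
        lane_a = (a >> shift) & LANE_MASK
        lane_b = (b >> shift) & LANE_MASK
        result |= ((lane_a + lane_b) & LANE_MASK) << shift
    return result


if __name__ == "__main__":
    a = pack_bytes([1, 2, 3, 4, 250, 6, 7, 8])
    b = pack_bytes([10] * 8)
    print(unpack_bytes(packed_add_bytes(a, b)))
```

In hardware all eight lane additions are performed by one instruction; the loop here only models the result, not the speed.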

2-9. Architecture of Intel’s Pentium III Microprocessor


The Pentium III microprocessor was introduced in 1999 as Intel
Corporation's successor to the Pentium II. The Pentium III is very similar
to the Pentium II in architecture. It has a 450 MHz to 1.4 GHz clock rate
and its external bus can be clocked at 100 or 133 MHz. It can have up to
512kB of secondary cache (L2-Cache) and it comes in various packages,
including Flip-Chip Pin Grid Array (FC-PGA).

The Pentium III has the Pentium Pro dynamic execution microarchitecture.
Dynamic Execution consists of fetching and decoding several
instructions and preparing them for eventual out-of-order execution.
Dynamic execution is actually a combination of different techniques
(e.g., multiple branch prediction and data flow analysis). Intel
implemented Dynamic Execution, for the first time, in the Pentium Pro
after analyzing the execution of billions of lines of code.

The Pentium III also features a multi-transaction system bus, and MMX,
like the Pentium II. It adds 70 new instructions, Dual Independent Bus
(DIB) architecture, the Intel processor serial number, internet streaming
and Single Instruction/Multiple Data (SIMD) Extensions. Some Pentium
III versions also include an Advanced Transfer Cache and Advanced
System Buffering.

SIMD is a technique used in parallel processors, where many
processing elements (functional units) perform the same operation on
different data. There is often a central controller which broadcasts the
instruction stream to all the processing elements. When Intel released a
1.1 GHz version of the Pentium III processor using a 0.18 micron
fabrication process in July 2000, it was the world's highest-performance
microprocessor for PCs.

2-10. Architecture of Intel’s Pentium 4 Microprocessor


The Pentium 4 microprocessor was introduced in November 2000. It comes in
various packages, including 423-pin FC-PGA, and has clock rates from 1.3
to 3.8 GHz. The Pentium 4 processor features the NetBurst
micro-architecture, which operates at higher clock speeds and delivers
performance levels that are higher than previous generations of IA-32
processors. While based on the Intel NetBurst micro-architecture, the
Pentium 4 still maintains the tradition of compatibility with IA-32
software. The Intel NetBurst micro-architecture features include
hyper-pipelined technology, a rapid execution engine, a 400, 533, 800 or
1066 MHz system bus, and an Execution Trace Cache (ET-Cache).

Fig. 2-20. Architecture of the Intel Pentium 4 microprocessor.

The hyper-pipelined technology doubles the pipeline depth in the
Pentium 4 processor, allowing the processor to reach much higher core
frequencies. The rapid execution engine allows the two integer ALUs in
the processor to run at twice the core frequency, which allows many
integer instructions to execute in half a clock cycle.

The 400 MHz system bus is a quad-pumped bus running off a 100 MHz
system clock making 3.2 GB/sec data transfer rates possible.
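
The 3.2 GB/sec figure follows directly from the bus parameters. The Python sketch below reproduces the arithmetic. The 100 MHz clock and the four transfers per clock come from the text; the 64-bit data bus width is an assumption added here for the calculation, and the function name is hypothetical.

```python
def peak_bandwidth_bytes_per_s(bus_clock_hz: int,
                               transfers_per_clock: int,
                               bus_width_bits: int) -> int:
    """Peak transfer rate = clock rate x transfers per clock x bytes per transfer."""
    return bus_clock_hz * transfers_per_clock * bus_width_bits // 8


if __name__ == "__main__":
    # Assumed 64-bit data bus, quad-pumped from a 100 MHz system clock.
    rate = peak_bandwidth_bytes_per_s(100_000_000, 4, 64)
    print(rate / 1e9, "GB/sec")
```

This is a peak rate; sustained throughput in a real system is lower because the bus is not transferring data on every clock.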

The ET-Cache is a level-1 cache (L1-Cache) that stores approximately 12 K
decoded micro-operations (µOPs), which removes the decoder from
the main execution path, thereby increasing performance.

Improved features within the Intel NetBurst micro-architecture include
advanced dynamic execution, an advanced transfer cache (AT-Cache), an
enhanced floating point and multimedia unit, and Streaming SIMD
Extensions 2 (SSE2).
The advanced dynamic execution improves speculative execution and
branch prediction internal to the processor. The AT-Cache is a 256 kB,
on-die level-2 cache (L2-Cache) with increased bandwidth over previous
micro-architectures. The floating point and multimedia units in Pentium
4 have been improved by making the registers 128 bits wide and adding a
separate register for data movement.

The SSE2 enables break-through levels of performance in multimedia
applications including 3-D graphics, video decoding/encoding, and
speech recognition. The new packed double-precision floating-point
instructions enhance performance for applications that require precision,
including scientific and engineering applications as well as advanced 3-D
geometry techniques. Finally, the SSE2 adds 144 new instructions for
double-precision floating point, SIMD integer, and memory management.
Figure 2-20 depicts the architecture of the Pentium 4. The Branch Target
Buffer (BTB) table contains all the addresses to where a branch will or
could be made.

Micro-operations (µOPs) is the name that Intel gives to the internal
instructions that can be directly understood by the execution units of
the microprocessor. These RISC-like instructions are very simple
and can be quickly carried out by the processor. Unlike x86
instructions, µOPs are of a defined size and can thus easily be fed
into the execution pipeline. The decoder translates an x86 instruction into
one or more µOPs, unless the x86 instruction is too complex. In this
case, the Micro Instruction Sequencer (MIS) has to produce a sometimes
rather long sequence of µOPs, using the Micro Code ROM (MCR). On
average, x86 instructions are decoded into about two µOPs.


The Pentium 4 processor uses a scalable system bus protocol referred to
as the "system bus". The system bus uses Source-Synchronous Transfer
(SST) of address and data to improve performance. Whereas the P6
processor family transfers data once per bus clock, the Pentium 4
processor transfers data four times per bus clock (4X data transfer rate).
Along with the 4X-data bus, the address bus can deliver addresses two
times per bus clock and is referred to as a "double-clocked" or
2X-address bus. Working together, the 4X-data bus and 2X-address bus
provide a data bus bandwidth of up to 3.2 GByte/sec (3200 MB/sec).

Most system bus signals of the Intel Pentium 4 processor in the 423-pin
package use the so-called Assisted Gunning
Transceiver Logic (AGTL) signaling technology. This signaling
technology provides improved noise margins and reduced ringing through
low voltage swings and controlled edge rates.
Unlike the P6 processor family, the termination voltage level for the
Pentium 4 processor AGTL signals is VCC, the operating voltage of the
processor core. The
AGTL inputs require a reference voltage (called GTLREF), which is
used by the receivers to determine if a signal is a logic 0 or a logic 1.
GTLREF must be generated on the system board.

Unlike previous processors, the Pentium 4 uses a differential clocking
implementation. The system bus clock (BCLK) directly controls the
system bus interface speed as well as the core frequency of the processor.
As in previous processors, the Pentium 4 processor core frequency is a
multiple of the BCLK frequency. The Pentium 4 processor bus ratio
multiplier is set at its default ratio at manufacturing. No jumpers or user
intervention is necessary; the processor will automatically run at the
speed indicated on the package.

In 2005, Intel released an Extreme Edition of the Pentium 4
microprocessor running at 3.73 GHz, which was likely the last of its kind
before the Core design. As far as we know today, there was no further
jump in clock frequency in this family of chips. This processor
supported the so-called hyper-threading (HT) technology. The Pentium D
processor belongs to the dual-core Pentium microprocessors. It operates
with an 800 MHz front bus clock and sits in the LGA 775 socket, with a
thin heat-sink on it.


2-11. Intel’s Core and Core2 Microprocessors


A "core" is a processor (not the chip), and each core adds more
processing power to the overall performance. The Intel Core
micro-architecture is a multi-core processor architecture issued by Intel
in 2006. It may be considered the latest iteration of the Intel 32-bit P6
architecture. The extreme power consumption of NetBurst-based
processors and the inability to increase their clock speed were the
primary reasons that made Intel abandon the NetBurst architecture.

The Intel Core2 is the eighth generation of Intel x86 microprocessors.
The first wave of Core2 processors was released on July 27, 2006. The
Intel Core2 is a multi-core 64-bit microprocessor, which allows for true
parallel computing capabilities on desktop PCs. It has two processor
cores in one package running at the same frequency. Each core has its
own 2MB L2-Cache (total 4MB), as shown in Fig. 2-21(a).

Fig. 2-21(a). Architecture of the Intel dual Core (Core2) microprocessor


The Core architecture has a 14-stage pipeline, but it can fetch, decode and
issue up to 4 instructions per clock cycle. Like other recent PC processors,
Core2 translates x86 instructions (using a Code ROM) into RISC-like
short instructions (µOps). One new technology included in the Core
design is Macro-Ops Fusion, which combines two x86 instructions into a
single micro-operation. For example, a common code sequence like a
compare followed by a conditional jump would become a single µOp.
Other new features include 1 cycle throughput of all 128-bit SSE
instructions and a new power saving design.
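
The fusion of a compare with the conditional jump that follows it can be pictured with a toy model. The Python sketch below is an illustration only and does not describe Intel's decoder: it treats a program as a list of assembly strings, assumes that only `cmp` or `test` followed directly by a conditional jump may be fused, and uses hypothetical function names.

```python
# Toy model of fusing "compare + conditional jump" pairs into one operation.
# Assumption: a jump is conditional if its mnemonic starts with "j" and is not "jmp".

COMPARE_MNEMONICS = {"cmp", "test"}


def mnemonic(instruction: str) -> str:
    """Return the lower-case mnemonic (first word) of an assembly line."""
    return instruction.split()[0].lower()


def is_conditional_jump(instruction: str) -> bool:
    name = mnemonic(instruction)
    return name.startswith("j") and name != "jmp"


def fuse_compare_and_branch(instructions):
    """Return the list of operations after fusing adjacent compare/jump pairs."""
    fused = []
    i = 0
    while i < len(instructions):
        current = instructions[i]
        following = instructions[i + 1] if i + 1 < len(instructions) else None
        if (following is not None
                and mnemonic(current) in COMPARE_MNEMONICS
                and is_conditional_jump(following)):
            fused.append(f"{current} + {following}")  # one fused operation
            i += 2
        else:
            fused.append(current)
            i += 1
    return fused


if __name__ == "__main__":
    program = ["mov eax, 1", "cmp eax, ebx", "jne loop", "add eax, 2"]
    for op in fuse_compare_and_branch(program):
        print(op)
```

In this model the four-instruction program occupies three operation slots instead of four, which is the saving that fusion aims for in loop-heavy code.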

The so-called Intel Virtualization Technology (VT) permits one
hardware platform to function as multiple virtual platforms. For high-end
CPUs, the front side bus (FSB) runs at 1333 MHz. However, this is
scaled down to 1066 MHz for the lower-end 1.60 and 1.86 GHz variants. The
Core2 Duo has a 1066 MHz front bus clock and dissipates only 65W at
frequencies up to 2.93GHz.

Fig. 2-21(b). Photograph of the Intel dual Core2 microprocessor

2-12. Intel’s Core i5, Core i7 & Core i9 Microprocessors


The Core i7 processors are based on the Nehalem microarchitecture,
which is the successor to the Core microarchitecture. As shown in figure
2-22, this architecture offers 6 dispatch ports (one Load, two Store and
three universal ports) and 4 decoders: one for complex instructions (CD)
and three for simple instructions (SC). The desktop Core i7 was released
in 2009. Server and mobile Core i7 processors followed in 2010 and 2011.
The initial Core i7 processors used the 45 nm technology and had 731
million transistors. The processor has 64 kB L1-Cache (32kB code, 32kB
data), 256 kB L2-Cache per core and 2-3 MB L3-Cache per core, shared
by all cores.
The Core i7 processors also have an integrated memory controller
supporting DDR3 SDRAM and 3 memory channels as well as an
integrated graphics processor (IGP) in the same CPU package. They also have
a new point-to-point processor interconnect, the Intel QuickPath
Interconnect (QPI), replacing the legacy front side bus.

Fig. 2-22. Nehalem microarchitecture (Core i7)

The Core i9, with up to 18 cores, is Intel's fastest consumer processor yet.
In Intel's simple terms, the Core i9 is faster than the Core i7, which in
turn is faster than the Core i5. However, "faster" is not always "better".
Many people don't need such extra power, which unfortunately affects
battery life in laptops.

Only the Core i7 and Core i9 series now support Hyper-Threading for
virtual cores. The new Core i5 series does not have it.

2-13. Architecture of 64-bit Microprocessors


The architecture of the Intel x86 series, up to the Pentium 4, has been
traditionally called IA-32. However, in 2003, Advanced Micro Devices
(AMD) introduced the so-called AMD64, which is a 64-bit superset
of the x86 instruction set architecture. The AMD64 architecture is a
simple yet powerful 64-bit, backward-compatible extension of the
industry-standard (legacy) x86 architecture. The need for 64-bit
architecture was driven by giant applications that address large amounts
of memory, such as high-performance servers, database management
systems, and CAD tools.

Fig. 2-23. Application registers in x86-64 (64 bit) microprocessors.

The AMD64 architecture has been cloned by Intel under the name Intel
64. This leads to the common use of the names x86-64 to collectively
refer to the two nearly identical implementations. Note that x86-64 is not
the same as IA-64, which is the architecture of Intel's Itanium processors.

Intel 64 is Intel's implementation of AMD64 (x86-64). It is used in newer
versions of Pentium 4, Pentium D, Pentium Extreme Edition, Celeron D,
Xeon and in all versions of the Core2 and later processors.

The x86-64 (x86 in 64-bit) processors extended the 32-bit registers in a
similar way to how 32-bit protected mode extended the 16-bit registers
before it. The register names become: RAX, RBX, RCX, RDX, RSI, RDI, RBP,
RSP, RFLAGS, RIP. In addition, x86-64 processors added 8 further 64-bit
general registers (R8 to R15), as shown in figure 2-23.
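
The way each register generation contains the previous one can be shown with bit masks. The Python sketch below is only a model of the naming convention: given a 64-bit RAX value, it extracts the 32-bit, 16-bit and 8-bit views that the earlier x86 generations call EAX, AX, AH and AL. The function name is hypothetical.

```python
def register_views(rax: int) -> dict:
    """Return the legacy sub-register views contained in a 64-bit RAX value."""
    rax &= 0xFFFFFFFFFFFFFFFF          # keep 64 bits
    return {
        "RAX": rax,                    # full 64-bit register (x86-64)
        "EAX": rax & 0xFFFFFFFF,       # low 32 bits (80386 and later)
        "AX": rax & 0xFFFF,            # low 16 bits (8086)
        "AH": (rax >> 8) & 0xFF,       # bits 15..8
        "AL": rax & 0xFF,              # bits 7..0
    }


if __name__ == "__main__":
    for name, value in register_views(0x1122334455667788).items():
        print(f"{name:>3} = {value:#x}")
```

The same nesting applies to the other extended registers (RBX, RCX, RDX and so on), which is what keeps older 16-bit and 32-bit code meaningful on a 64-bit processor.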

2-14. AMD K10 Architecture


The AMD K10 is AMD's latest microprocessor architecture. Actually,
AMD has used the K-nomenclature (which stands for Kryptonite) up to the
K8 or Athlon 64 processor family. The third-generation Opteron was
launched in September 2007 and the Phenom processors for desktop
followed as the successors to the AMD K8 series (Athlon 64, Sempron
64). Figure 2-24 describes the AMD K10 architecture. As shown in the
figure, the K10 architecture features 128-bit wide SSE units, a wider L1
data cache interface allowing for two 128-bit loads per cycle (as opposed
to two 64-bit loads per cycle with K8), lower integer divide latency, a
512-entry indirect branch predictor, and a larger return stack (size
doubled from K8) and branch target buffer.


Fig. 2-24. Architecture of the AMD K10 microprocessors.

2-15. Summary of Intel & AMD Architectures


Intel and AMD are always in a battle of processors. In this section, we
summarize the historical development of Intel architectures starting
from the Intel 8086 processor to the latest Intel Core processors. Note that
the object code created for processors released as early as 1978 still
executes on the latest x86-64 architecture.

2-15.1. Intel 16-bit Processors (1978)


The Intel 16-bit architecture (IA-16) family started with the 16-bit
processors, the 8086 and 8088. The 8086 has 16-bit registers and a 16-bit
external data bus, with 20-bit addressing giving a 1-MB address space.
The 8088 is similar to the 8086 except it has an 8-bit external data bus.
The 8086/8088 introduced segmentation to the Intel microprocessor
architecture.

With segmentation, a 16-bit segment register contains a pointer to a
memory segment of up to 64 KB. The 20-bit addresses that can be formed
using a segment register (CS or DS or SS or ES) and an additional 16-bit
pointer provide a total address range of 1 MB.
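
The formation of a 20-bit address from two 16-bit values can be written out in a few lines. The Python sketch below uses the standard 8086 real-mode rule (segment value shifted left by four bits, plus the 16-bit offset); the function name is hypothetical, and the final mask models the wrap-around of a 20-bit address bus.

```python
ADDRESS_MASK = 0xFFFFF  # 20 address lines give a 1 MB address space


def physical_address(segment: int, offset: int) -> int:
    """8086 real-mode address: segment * 16 + offset, truncated to 20 bits."""
    segment &= 0xFFFF
    offset &= 0xFFFF
    return ((segment << 4) + offset) & ADDRESS_MASK


if __name__ == "__main__":
    print(hex(physical_address(0x1234, 0x0010)))
    print(ADDRESS_MASK + 1, "bytes addressable")
```

Because the segment value is multiplied by 16, many different segment:offset pairs refer to the same physical location.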

2-15.2. Intel 286 Processor (1982)


The Intel 286 processor introduced protected mode operation into the
Intel architecture. Protected mode uses the segment register content as
selectors or pointers into descriptor tables. Descriptors provide 24-bit
base addresses with a physical memory size of up to 16 MB, support for
virtual memory management on a segment swapping basis, and a number
of protection mechanisms. These mechanisms include:
 Segment limit checking
 Four privilege levels
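
In protected mode the 16-bit value in a segment register is interpreted as a selector rather than as a segment base. The Python sketch below decodes such a selector. It assumes the standard 80286 selector layout (a 13-bit descriptor index, one table-indicator bit and a 2-bit requested privilege level), which is not spelled out in the text above, and the function name is hypothetical.

```python
PHYSICAL_ADDRESS_SPACE = 2 ** 24  # 24-bit base addresses give 16 MB


def decode_selector(selector: int) -> dict:
    """Split a 16-bit protected-mode selector into its three fields."""
    selector &= 0xFFFF
    return {
        "index": selector >> 3,                            # entry in the descriptor table
        "table": "LDT" if (selector >> 2) & 1 else "GDT",  # local or global table
        "rpl": selector & 0b11,                            # one of the four privilege levels
    }


if __name__ == "__main__":
    print(decode_selector(0x001B))
    print(PHYSICAL_ADDRESS_SPACE // (1024 * 1024), "MB")
```

The two low-order bits can encode exactly four values, which corresponds to the four privilege levels listed above.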

2-15.3. Intel 80386 Processor (1985)


The Intel386 processor was the first 32-bit processor in the IA-32
architecture family. It introduced 32-bit registers. The lower half of each
32-bit Intel 386 register retains the properties of the old 16-bit registers of
earlier generations, permitting backward compatibility. The processor
also provides a virtual-86 mode that allows executing programs created
for 8086/88 processors. In addition, the 80386 processor has support for:

 A 32-bit address bus that supports up to 4-GB of physical memory
 A segmented-memory model and a flat memory model
 Paging, with a fixed 4-KB page size, providing a method for virtual
memory management
 Support for parallel stages
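
With a fixed 4-KB page size, the low 12 bits of a 32-bit linear address select a byte within a page. The Python sketch below splits an address accordingly. The two-level split of the upper 20 bits into a 10-bit page-directory index and a 10-bit page-table index is the standard 80386 scheme but is an addition to the text above, and the function name is hypothetical.

```python
PAGE_SIZE = 4 * 1024  # fixed 4-KB pages


def split_linear_address(address: int) -> dict:
    """Split a 32-bit linear address into directory index, table index and offset."""
    address &= 0xFFFFFFFF
    return {
        "directory": address >> 22,          # bits 31..22
        "table": (address >> 12) & 0x3FF,    # bits 21..12
        "offset": address & (PAGE_SIZE - 1), # bits 11..0
    }


if __name__ == "__main__":
    print(split_linear_address(0x12345678))
```

The paging hardware replaces the upper 20 bits with a physical page frame number, while the 12-bit offset passes through unchanged.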

2-15.4. Intel 80486 Processor (1989)


The Intel 80486 processor added more parallel execution capability by
expanding the 80386 processor instruction-decode and execution units
into five pipelined stages. Each stage operates in parallel with the
others on up to five instructions in different stages of execution. In
addition, the processor added:

 8 kB on-chip first-level cache (L1-Cache) that increased the ratio of
instructions that could execute at the scalar rate of one per clock,
 Integrated 80x87 floating point unit (FPU),
 Power saving and system management capabilities

2-15.5. Intel Pentium Processor (1993)


The introduction of the Intel Pentium processor added a second execution
pipeline to achieve superscalar performance (two pipelines, known as U
and V, together can execute two instructions per clock). The on-chip
L1-Cache was doubled, with 8 KB for code and another 8 KB for data. The
data cache (D-Cache) uses the MESI protocol to support a more efficient
write-back cache in addition to the write-through cache of the 80486. Branch
prediction with on-chip branch table was added to increase performance
in looping constructs. In addition, the Pentium processor added the
following features:
 Extensions to make the virtual-8086 mode more efficient and allow
for 4-MB as well as 4-kB pages
 Internal data paths of 128 and 256 bits add speed to data transfers
 Burst external data bus was increased to 64 bits
 An APIC to support systems with multiple processors
 A dual processor mode to support glue-less two processor systems

A subsequent stepping of the Pentium family introduced Intel MMX
technology. Intel MMX technology uses the single instruction, multiple
data (SIMD) execution model to perform parallel computations on
packed integer data contained in 64-bit registers.

2-15.6. Intel P6 Family of Processors (1995-1999)


The P6 family of processors was based on a superscalar microarchitecture
that set new performance standards. One of the goals in the design of the
P6 family microarchitecture was to exceed the performance of the
Pentium processor significantly while using the same 0.6 micron, four-layer
metal BiCMOS process. Members of this family include the following
processors:

 The Intel Pentium-Pro processor is 3-way superscalar. Using parallel
processing, the processor is able to decode, dispatch, and complete
execution of (retire) 3 instructions per clock cycle.
 Pentium Pro introduced the dynamic execution (micro-data flow
analysis, out-of-order execution, superior branch prediction, and
speculative execution) in a superscalar implementation. The processor
was further enhanced by its caches. It has the same two on-chip 8-KB
L1-Caches as the Pentium processor and an additional 256KB L2-
Cache in the same package as the processor.
 Pentium II processor added the MMX technology to the Intel
processors along with other enhancements. The processor core is
packaged in the single edge contact cartridge (SECC). The L1- data
and instruction caches were enlarged to 16kB, and L2-cache of 512KB
and 1MB are supported. Low-power states such as Sleep, Deep-Sleep
and Auto-HALT are supported to conserve power when PC is idle.
 Pentium II Xeon processor combined the characteristics of previous
generations of Intel processors. This includes: scalability and a 2 MB
L2-Cache running on a full-clock-speed backside bus.
 The Intel Celeron processor family focused on the value PC market
segment. It offers an integrated 128 KB of L2-Cache and a plastic pin
grid array (PPGA) form factor to lower the PC system cost.
 The Intel Pentium III processor introduced the Streaming SIMD
Extensions (SSE) to the IA-32 architecture. SSE extensions expand
the SIMD model introduced with the Intel MMX technology by
providing a new set of 128-bit registers and the ability to perform
SIMD operations on packed single-precision floating-point values.
 The Pentium III Xeon processor extended the performance levels of
the IA-32 processors with the enhancement of speed, and Advanced
Transfer Cache (ATC).

2-15.7. Intel Pentium 4 Processor Family (2000-2006)


The Intel Pentium 4 processor family is based on Intel NetBurst
micro-architecture. The Pentium 4 processor introduced Streaming SIMD
Extensions 2 (SSE2), Streaming SIMD Extensions 3 (SSE3) and
Hyper-Threading Technology (HTT). The Intel Virtualization Technology
(IVT) was also introduced in the late Pentium 4 processors. Note that
Pentium processors are based on the IA-32 (32-bit) architecture. However,
some late versions of the Pentium 4 family, like Pentium 4E and Pentium D,
support Intel 64-bit technology (EM64T).
2-15.8. Intel Xeon Processor (2001- 2007)
Intel Xeon processors (except for the dual-core Xeon LV and the Xeon 5100
series) are based on the Intel NetBurst microarchitecture, which is an
implementation of the IA-32 architecture. The Xeon family is designed for
multi-processor servers and workstations. The 64-bit Intel Xeon processor
running at 3.6 GHz introduced the Intel 64 architecture. The Intel Xeon
processor 70xx series includes Intel Virtualization Technology. The Intel
Xeon processor 5100 series introduced the Intel Core microarchitecture,
which is based on Intel 64 architecture. This processor includes Intel
Virtualization Technology and dual-core technology.
The Intel Xeon processor 3000 series are also based on Intel Core
microarchitecture. The Intel Xeon processor 5300 series introduces four
processor cores in a single package. They are also based on Intel Core
microarchitecture.

2-15.9. Intel Pentium M Processor (2003-now)


The Intel Pentium M processor family is a low power mobile processor
family with microarchitecture enhancements over previous generations
of IA-32 mobile processors. This family is designed for extending battery
life and seamless integration with platform innovations that enable new
usage models (such as integrated wireless networking). Its enhanced
micro-architecture includes:
 Support for Intel Architecture with Dynamic Execution
 A high performance, low-power technology with copper interconnect
 On-die, 32-KB instruction cache and 32-KB write-back data cache
 On-die, L2-Cache (up to 2 MB) with Advanced Transfer Cache
 Advanced Branch Prediction and Data Prefetch Logic
 Support for MMX technology, SSE and SSE2 instructions.
 A 400/533 MHz, Source-Synchronous Processor System Bus
 Advanced power management using Intel SpeedStep technology

2-15.10. Intel Pentium Processor Extreme Edition (2005-2007)


The Intel Pentium processor Extreme Edition introduced dual-core
technology. This technology provides advanced hardware multi-threading
support. The processor is based on Intel NetBurst microarchitecture and
supports SSE, SSE2, SSE3, Hyper-Threading, and Intel 64 architecture.

2-15.11. Intel Core Processors (2006-2007)


The Intel Core processor offers power-efficient and multi-core
performance with a low-power design that extends battery life. The Intel
Core processors offer micro-architectural enhancements over Pentium M
processors. Its enhanced microarchitecture includes:

 Intel Smart Cache which allows for efficient data sharing between two
processor cores,
 Improved decoding and SIMD execution
 Intel Dynamic Power Coordination and Enhanced Intel Deeper Sleep
to reduce power consumption
 Intel Advanced Thermal Manager which features digital thermal
sensor interfaces

The dual-core Intel Xeon processor LV is based on the same micro-architecture
as the Intel Core Duo processor, and supports the IA-32 architecture.

2-15.12. Intel Xeon 5x00 & Multi-Core Processors (2006-now)


The Intel Xeon processor 3000, 3200, 5100, 5300, and 7300 series,
Pentium Dual-Core, Core Duo, Core Quad and Core Extreme processors
support Intel 64 architecture; they are based on the Intel Core micro-
architecture built on 65 nm process technology. The Intel Core micro-
architecture includes the following innovative features:
 Intel Wide Dynamic Execution to increase performance
 Intel Intelligent Power Capability to reduce power consumption
 Intel Advanced Smart Cache which allows for efficient data sharing
between two processor cores
 Intel Smart Memory Access to increase data bandwidth
 Intel Advanced Digital Media Boost which improves application
performance using multiple generations of SSE technology.
The Intel Xeon processor 5300 series, Core2 Extreme processor QX6800
series, and Core2 Quad processors support Intel quad-core technology.

2-15.13. Intel Xeon 5200, 5400 and Core2 Processors (2007-now)


The Xeon processor 5200, and 5400 series, Core2 Duo processor E8000
series and Core2 Quad processor Q9000 Series, support Intel 64
architecture; they are based on the Enhanced Intel Core micro-
architecture. The Enhanced Core microarchitecture provides the
following features:
 A radix-16 divider and faster OS primitives, which further increase the
performance of Intel Wide Dynamic Execution.
 An improved Intel Advanced Smart Cache with a larger L2-Cache.
 A 128-bit shuffler engine, which improves the performance of Advanced
Digital Media Boost and SSE4.
2-15.14. Intel Atom Processor Family (2008 - Current)
The Intel Atom processors are based on a new microarchitecture, called
Intel Atom microarchitecture, which is optimized for ultra low power
devices. Intel Atom processors are built on 45-nm technology. For
instance, the Atom Processor N270 (code named Diamondville) was
the first generation of low-power IA-32 microarchitecture designed for
Netbook Platforms. The processor supports low power states at the
thread level and the package level. Package low power states
include Sleep and Deep Sleep.

Fig. 2-25. Low power states of Atom processors

The Atom microarchitecture also has the following features:
 Deep Power Down Technology with Dynamic Cache Sizing
 Hyper-Threading and Enhanced SpeedStep Technology
 Support for new instructions including Supplemental Streaming SIMD
Extensions 3 (SSSE3) and Intel Virtualization Technology.
 Support for Intel 64 Architecture

2-15.15. Intel Nehalem Microarchitecture (2009 - Current)


Nehalem is the codename for the Intel microarchitecture that is the
successor to the Core microarchitecture. The architecture is named after
the Nehalem River in Northwest Oregon, which is in turn named after the
Nehalem Native American tribe in Oregon. As shown in figure 2-22, this
architecture offers 6 dispatch ports (one Load, two Store and three
universal ports) and 4 decoders: one for complex instructions (CD) and
three for simple instructions (SC). The first processor released with the
Nehalem architecture was the desktop Core i7, which was released in 2009.
Server and mobile Core i7 processors followed in 2010 and 2011.

2-15.16. Sandy Bridge & Ivy Bridge Microarchitectures (2011-now)


The Sandy Bridge microarchitecture was developed by Intel to replace
the Nehalem microarchitecture. Intel demonstrated a Sandy Bridge
processor in 2009, and released the first products based on this
architecture in 2011 under the Core brand. The original implementations
used a 32nm process with planar MOSFET transistors. One of the
main features of this microarchitecture is the integration of the memory
controller, the integrated graphics and the processor cores into a single
die. This architecture permits up to 8 physical cores, or 16 logical cores
through Hyper-Threading.
The Sandy Bridge architecture was followed by the Ivy Bridge
microarchitecture, which makes use of 22nm technology. Ivy Bridge is
based on 3D tri-gate MOSFET transistors and was demonstrated by Intel
in 2011. In 2013, Intel introduced the Haswell architecture, the
successor to Sandy Bridge and Ivy Bridge. Skylake, the 6th generation
of the Intel Core microarchitecture, was launched in 2015 and uses
14nm manufacturing technology.

Fig. 2-26. Timeline of Intel architectures

2-16. Evolution of x86 Processors from CISC to RISC Architecture


The 80x86 family of microprocessors has traditionally consisted of CISC
(Complex Instruction Set Computer) devices. In such processors, each
instruction needs multiple clock cycles to be read, decoded and executed.
For instance, the STA r (store accumulator in register r) instruction takes
2 cycles to be read, 1 cycle to be decoded and 2 cycles to be executed. If
the processor clock frequency is 100MHz, then the clock period is 10ns
and this instruction needs 5 x 10ns = 50ns to complete. Such processors
always have a large number of opcodes (400 for 8086).
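
The timing arithmetic above can be reproduced with a short sketch. The function name and its parameters are illustrative assumptions; only the cycle counts and the 100MHz clock come from the text.

```python
def instruction_time_ns(fetch_cycles, decode_cycles, execute_cycles, clock_hz):
    """Time in nanoseconds taken by one instruction on a non-pipelined CPU."""
    clock_period_ns = 1e9 / clock_hz                      # 100 MHz gives 10 ns
    total_cycles = fetch_cycles + decode_cycles + execute_cycles
    return total_cycles * clock_period_ns


# Figures quoted in the text: 2 + 1 + 2 cycles at 100 MHz
print(instruction_time_ns(2, 1, 2, 100e6))
```

Doubling the clock frequency halves the result, which is why early CISC designs relied so heavily on clock scaling before pipelining was introduced.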

On the other hand, the so-called RISC (Reduced Instruction Set
Computer) processors (like ARM, SPARC, Alpha, and PowerPC) have
a small number of simple opcodes. RISC processors are generally faster
than CISC processors because they use simpler fixed-length instructions
and their architecture enables higher performance through the use of
pipelining and superscalar execution. In order to increase the performance
of the latest generations of processors, Intel and its competitors have
borrowed from RISC technology. The first hint of the incorporation of
RISC technology into the x86 family came about in 1989, when the 80486
integrated a floating-point unit (FPU) and incorporated more hard-wired
instructions and pipelining. The FPU was Intel's response to the superior
floating-point performance of RISC processors.

Pipelining and reduced microcode enabled the 80486 processor to execute
many instructions at an effective rate of one per clock cycle. RISC
processors achieve the same result by using simpler instructions that
require fewer clock cycles.

Cyrix adapted these techniques to its improved 80386 chips
and created hybrids like the 486SLC. Texas Instruments and IBM took
the technology even further, with IBM using a larger cache memory and
introducing clock-doubling technology. Because the 80486 processor
has only one pipeline, its theoretical throughput limit is one instruction
per clock cycle, so Intel provided the Pentium with two pipelines so that
it could handle two instructions simultaneously. This allows the Pentium
to issue some instructions at a rate greater than one per clock cycle. The
nature of CISC instructions (like compare and jump) makes multiple
pipelines difficult to implement.

RISC processors generally use fixed-length instructions, whereas CISC
processors use variable-length instructions; x86 instructions range in
length from 8 to 120 bits. This means a CISC processor must at least
partly decode each instruction before it knows where the next one begins.
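
The difference can be sketched with a toy model. The variable-length encoding used here (low 4 bits of the first byte hold the instruction length in bytes) is a hypothetical one chosen for illustration; it is not the real x86 encoding, which is considerably more involved.

```python
def split_fixed(code: bytes, width: int = 4):
    """Fixed-length ISA: every boundary is known without decoding anything."""
    return [code[i:i + width] for i in range(0, len(code), width)]


def split_variable(code: bytes):
    """Toy variable-length ISA (hypothetical encoding, 1 to 15 bytes each)."""
    instructions, pc = [], 0
    while pc < len(code):
        length = code[pc] & 0x0F            # must inspect this instruction first
        if length == 0 or pc + length > len(code):
            raise ValueError(f"bad instruction length at offset {pc}")
        instructions.append(code[pc:pc + length])
        pc += length                        # only now is the next boundary known
    return instructions


print(split_fixed(bytes(8)))
print(split_variable(bytes([0x01, 0x03, 0xAA, 0xBB, 0x02, 0xCC])))
```

In the fixed-length case all boundaries could be computed in parallel, whereas the loop in `split_variable` is inherently sequential. This is the dependency that makes wide fetch and decode harder for variable-length instruction sets.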

By overcoming these limitations of a CISC processor, the Pentium and
later Core processors may be considered hybrid CISC/RISC
processors. Another limitation imposed by the CISC origin of the x86
family is the shortage of registers. This family of processors has only 8
general-purpose registers (GPRs), but the Cyrix M1 chip overcame this
limit with 32 physical GPRs that are dynamically renamed, so that software
still sees only eight general-purpose registers.
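
Register renaming can be illustrated with a minimal sketch. The class and method names are assumptions made for this example, and the model is deliberately simplified: it never recycles physical registers, which a real design does once the older value is no longer needed.

```python
class RenameTable:
    """Map 8 architectural registers onto a larger physical register file."""

    def __init__(self, n_arch=8, n_phys=32):
        self.mapping = list(range(n_arch))          # architectural -> physical
        self.free = list(range(n_arch, n_phys))     # unused physical registers

    def read(self, arch_reg):
        """Physical register currently holding the value of arch_reg."""
        return self.mapping[arch_reg]

    def write(self, arch_reg):
        """Give every new value of arch_reg a fresh physical register."""
        if not self.free:
            raise RuntimeError("no free physical register (recycling not modelled)")
        phys = self.free.pop(0)
        self.mapping[arch_reg] = phys
        return phys


table = RenameTable()
first = table.write(0)     # first value written to architectural register 0
second = table.write(0)    # a later, unrelated value reusing the same name
print(first, second, table.read(0))
```

Because the two writes land in different physical registers, an instruction still reading the first value does not block the second write. The program only ever names eight registers.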

As the development of the PowerPC microprocessors advanced, and as
the members of its consortium (Apple, IBM, Motorola) espoused the virtues
of RISC technology, more and more RISC technology was
incorporated into Intel's microprocessor chips. However, for a given level
of performance, such hybrid RISC/CISC processors have a higher transistor
count. This generally translates to a larger die size, more
power dissipation and more heating problems.


Fig. 2-27. Comparison between the execution cycles of RISC and CISC machines.

2-17. Architecture of RISC Processors


We have seen so far in Chapter 1 that the SPARC, PowerPC, MIPS,
ARM and DEC Alpha have RISC architectures. We have also pointed out
that RISC microprocessors are typically designed to have a relatively
small number of instructions that can be decoded and run quickly. To
enable the instructions to run quickly, the number of memory accesses is
kept to the smallest amount possible. To help limit the number of
memory accesses, RISC machines typically have a large number of
registers. MIPS, ARM and SPARC machines all have a relatively large
number of registers for the programmer/compiler to use. In the MIPS
machine, there are 32 registers that the program can use, while the 32-bit
ARM exposes 16 registers at a time. In the SPARC machine there are 32
registers that the program can use at a time.
Generally speaking, RISC architecture offers computing power even in
small chips, and thus has become dominant for low-power 32-bit CPUs.
However, 64-bit RISC processors are also available in applications ranging
from laptops to supercomputers. In this section we describe the architecture
of some famous RISC processors, such as ARM and SPARC machines.

2-17.1. Architecture of ARM Processors


The ARM is a Reduced Instruction Set Computer (RISC), as it
incorporates the typical RISC architecture features. The ARM
architecture has a large uniform register file and a small orthogonal
instruction set, like most RISC processors.
In an orthogonal instruction set, instructions share the same format and
all registers and addressing modes can be used interchangeably. Over time,
the ARM architecture has evolved to include architectural features to meet
the growing demand for new functionality and the needs of new and
emerging markets (e.g., smart mobile phones).

Fig. 2-28. ARM processor families

The first ARM processor, the ARM1, was a prototype that was never
released. The ARM2, originally called the Acorn RISC Machine, was
designed by Acorn Computers and used in the Archimedes, their
successor to the BBC Micro and BBC Master models, which were based on
the 8-bit 6502 microprocessor. The ARM2 was clocked at 8 MHz, giving an
average performance of 4.7 MIPS. Development of the ARM family was then
continued by a new company called Advanced RISC Machines Ltd.
The ARM3 added a fully-associative on-chip cache and some support for
multiprocessing. This was followed by the ARM600 chip, which was an
ARM6 processor core with a 4kB 64-way set-associative cache, an MMU
based on the MEMC2 chip, a write buffer and a coprocessor interface.
The ARM7 processor core uses half the power of the ARM6 and takes
around half the die size. In 1994 VLSI Technology, Inc. released the
ARM710 processor chip. The subsequent ARM11 microarchitecture
represented a major step in embedded systems. By scaling both the clock
frequency and the supply voltage, the developer can trade power
consumption against performance. The first ARM11 processors were
implemented in a 0.13µm process technology and dissipate less than
0.4 mW/MHz when powered at 1.2V. Figure 2-28 depicts one of the ARM11
processors and a roadmap to over 1GHz. As shown, the ARM11
processor contains an AMBA interface, which improves memory bus
performance and facilitates "right-first-time" development of embedded
systems with multiple peripherals.
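
The effect of scaling clock frequency and supply voltage can be estimated with a rough sketch. It assumes the usual first-order CMOS model in which dynamic power is proportional to frequency times the square of the supply voltage, and it treats the 0.4 mW/MHz figure at 1.2V quoted above as an upper bound. The function name and default arguments are illustrative only.

```python
def dynamic_power_mw(freq_mhz, vdd, mw_per_mhz_ref=0.4, vdd_ref=1.2):
    """Upper-bound dynamic power, assuming P is proportional to f * Vdd**2."""
    return mw_per_mhz_ref * freq_mhz * (vdd / vdd_ref) ** 2


full_speed = dynamic_power_mw(500, 1.2)    # 500 MHz at the reference voltage
scaled_down = dynamic_power_mw(250, 0.6)   # half the clock, half the voltage
print(full_speed, scaled_down, scaled_down / full_speed)
```

Under this model, halving the frequency alone halves the power, while halving the voltage as well brings it down to one eighth. Leakage power and the minimum voltage needed to sustain a given clock rate are not modelled.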

ARM cores have 32-bit fixed-length instructions, but
later versions of the architecture also support a variable-length instruction
set that provides both 32-bit and 16-bit instructions for improved code
density. Instructions are split into load and store instructions, which
access memory, and arithmetic/logic instructions, which work on registers
(two sources and one destination). The ARM has 27 registers, of which 16
(R0-R15) are accessible in any particular processor mode. The ALU includes
a barrel shifter, allowing a single-cycle shift and add.
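
As an illustration of the combined shift and add, the following sketch models the effect of an ARM instruction of the form ADD Rd, Rn, Rm, LSL #shift on 32-bit values. The Python function name is an assumption; it models only the arithmetic result, not flags or timing.

```python
MASK32 = 0xFFFFFFFF


def add_with_shift(rn, rm, shift):
    """Model ADD Rd, Rn, Rm, LSL #shift: Rd = Rn + (Rm << shift), 32-bit wrap."""
    shifted = (rm << shift) & MASK32        # barrel shifter on the second operand
    return (rn + shifted) & MASK32          # ALU addition in the same instruction


x = 7
print(add_with_shift(x, x, 2))              # x + 4*x, i.e. multiply by 5
```

This is why small constant multiplications on ARM are often compiled to a single data-processing instruction rather than a multiply.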

All modern ARM processors include hardware debugging facilities,
allowing software debugging operations such as halting, stepping, and
breakpoints. These facilities are built using JTAG support. The ARM
architectures used in smartphones, PDAs and other mobile devices range
from ARMv5, used in low-end devices, through ARMv6, to ARMv7 in
current high-end devices. ARMv7 includes a hardware floating-point unit
(FPU), with improved speed.

The first 32-bit ARM-based personal computer ran an interim operating
system called Arthur. The 32-bit ARM architecture is supported by a
large number of embedded and real-time operating systems, including
Linux, Symbian, Debian, Windows CE, Windows Phone, Windows RT,
Android, OS-9 and RISC OS. In 2009, some manufacturers introduced
netbooks based on ARM CPUs, in direct competition with netbooks
based on Intel Atom. ARM's CPU cores dominate the mobile space. This
year (2015) the core of choice for high-end smartphones and tablets is
ARM's Cortex-A9, and late next year it will probably be the Cortex-A15.

The ARM processor has four processor modes:

1- User mode,
2- Interrupt mode (with a private copy of R13 and R14),
3- Fast interrupt mode (private copies of R8 to R14) and
4- Supervisor mode (private copies of R13 and R14).
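
The banking scheme listed above accounts for the 27 registers mentioned earlier. The sketch below is a minimal model of it; the dictionary layout, mode labels and function names are assumptions for illustration.

```python
# Registers that each mode replaces with a private (banked) copy
BANKED = {
    "user": set(),
    "irq": {13, 14},
    "fiq": set(range(8, 15)),      # R8 to R14
    "svc": {13, 14},
}


def physical_register(mode, n):
    """Return a (bank, index) pair naming the register behind Rn in a mode."""
    if not 0 <= n <= 15:
        raise ValueError("register number must be 0-15")
    bank = mode if n in BANKED[mode] else "user"
    return (bank, n)


def total_physical_registers():
    """16 shared registers plus every private copy."""
    return 16 + sum(len(regs) for regs in BANKED.values())


print(total_physical_registers())
print(physical_register("fiq", 8), physical_register("irq", 8))
```

Since fast interrupt mode has private copies of R8 to R14, a fast interrupt handler can use those registers without first saving them to the stack.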

Fig. 2-29. Architecture of ARM1176JZ processor with ARM11 core.

The following table summarizes the different types of ARM processors
and microcontrollers (Embedded).


Table 2-6. ARM processors and microcontrollers

Processor | Class | Arch. | Speed | Notes
ARM7TDMI | Emb | v4T | 15 MIPS @ 16.8 MHz | -
ARM720T | App | v4T | 60 MIPS @ 59.8 MHz | -
ARM7EJ-S | Emb | v5TEJ | Up to 133MHz | Jazelle DBX
ARM920T | App | v4T | 200 MIPS @ 180 MHz | -
ARM922T | App | v4T | Up to 250MHz | -
ARM926EJ-S | App | v5TEJ | 220 MIPS @ 200 MHz | Jazelle DBX
ARM946E-S | Emb | v5TE | Up to 210MHz | -
ARM966E-S | Emb | v5TE | Up to 250MHz | (VFP)
ARM1020E | App | v5TE | Up to 325MHz | 32K/32K cache
ARM1022E | App | v5TE | Up to 325MHz | 16K/16K cache
ARM1026EJ-S | App/Emb | v5TEJ | Up to 325MHz | Jazelle DBX
ARM1136J(F)-S | App | v6 | Up to 550MHz | (VFP)
ARM1156T2(F)-S | Emb | v6T2 | Up to 550MHz | Thumb-2
Cortex-M3 | Emb | v7M | 120 DMIPS | Thumb-2 only
Cortex-A8 | App | v7A | Up to 800MHz | -
Cortex-A9 | App | v7A | 2.0 DMIPS/MHz | -


2-17.2. Architecture of SPARC Processors


The Scalable Processor ARChitecture (SPARC) is a CPU instruction set
architecture (ISA), which is derived from the reduced instruction set
computer (RISC) line of machines. Actually, the SPARC architecture was
formulated at Sun Microsystems in 1984 through 1987. The SPARC
architecture is based on the RISC I and II designs engineered at the
University of California at Berkeley (UCB) from 1980 through 1982.

The SPARC architecture was first implemented by Sun in 1987, using the
version 7 architecture. Since then, the SPARC architecture has gone
through two major revisions: SPARC V8 in 1990, and the latest
version, SPARC V9, first published in 1994, which introduced
64-bit addressing support.

SPARC is licensed by SPARC International Inc., a consortium of
computer makers who control the design. The scalable part of the name
means that the design allows for forward compatibility of programs.
Nowadays, SPARC processors are designed and implemented by Sun
Microsystems, Texas Instruments, Toshiba, Fujitsu, Cypress, and Tatung,
among others. These companies use the chip in applications ranging from
laptops to supercomputers.

The SPARC processor can operate in either of two modes:

1- User mode or
2- Supervisor mode.

In supervisor mode, the processor can execute any instruction, including
the privileged (supervisor-only) instructions. In user mode, an attempt to
execute a privileged instruction will cause a trap to supervisor software.
“User application” programs are programs that execute while the
processor is in user mode.
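
The mode check described above can be sketched as follows. The exception class, the function name and the small set of privileged opcodes are illustrative assumptions; the set shown is a partial sample rather than a complete list.

```python
class PrivilegedInstructionTrap(Exception):
    """Stands in for the trap taken to supervisor software."""


# Illustrative subset of supervisor-only instructions (assumed, not exhaustive)
PRIVILEGED = {"rdpsr", "wrpsr"}


def execute(opcode, supervisor_mode):
    """Execute an instruction, trapping if user code tries a privileged one."""
    if opcode in PRIVILEGED and not supervisor_mode:
        raise PrivilegedInstructionTrap(opcode)
    return f"executed {opcode}"


print(execute("add", supervisor_mode=False))
print(execute("wrpsr", supervisor_mode=True))
try:
    execute("wrpsr", supervisor_mode=False)
except PrivilegedInstructionTrap as trap:
    print("trap to supervisor software:", trap)
```

An operating system relies on exactly this behaviour to keep user applications from changing processor state that only the kernel should control.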

A SPARC processor logically comprises an integer unit (IU), a floating-
point unit (FPU), and an optional coprocessor (CP), each with its own
registers. This organization allows for implementations with maximum
concurrency between integer, floating-point, and coprocessor instruction
execution. All of the registers, with the possible exception of the
coprocessor's, are 32 bits wide. Instruction operands are generally single
registers, register pairs, or register quadruples.


In every SPARC machine there are 32 registers that the program can use at
the same time (actually, 28 of the 32 are generally available). Since the
SPARC has been optimized for subroutine calls, the actual number of
registers in the microprocessor is much larger than 32 (often 124 registers
exist), but only 32 registers are visible to the program at any given time
(see the subroutine overview section for more information). Registers
can be categorized by function, since they are typically created for a given
purpose. On RISC machines, if a register is not being used for the
purpose it was designed for, it is available for use by any instruction. The
most general registers are the "temporary" registers. These registers exist
to store values loaded from memory or calculated by the ALU before
being stored in memory. These registers are not preserved across
subroutine calls. Basically, the program should use these as its "general
use" registers.

Fig. 2-30. Layout of SPARC processor registers

The SPARC machine has only eight temporary registers, designated %l0-
%l7 (registers 16-23), but other registers are often available
that can also be used as temporaries if the program needs them. For
memory access and for subroutine calls, SPARC provides a few
specialized registers. Since they are key to the operation of the machine,
the program should use these registers as designed, unless they are known
not to be needed. The SPARC provides a stack pointer register, %sp.
It also provides a frame pointer register, %fp. Like
MIPS,

SPARC reserves one register for the constant zero. This register,
designated %g0 on SPARC, always returns zero when read and
does not change when written. To provide for subroutines, there need to be
registers available for passing arguments to the subroutine, returning
values from the subroutine and returning from the subroutine. The SPARC
machine has six registers for function arguments, designated %i0-%i5,
six registers for function return values, designated %o0-%o5, and a
function return address register, designated %i7. When a subroutine call
requires more arguments than the machine provides registers for, the
program must use the stack to pass the additional information. Figure 2-30
depicts the layout of the SPARC registers.

i. Integer Unit (IU)


The IU contains the general-purpose registers and controls the overall
operation of the SPARC processor. The IU executes the integer arithmetic
instructions and computes memory addresses for loads and stores. It also
maintains the program counters and controls instruction execution for the
FPU and the CP. An implementation of the IU may contain from 40 to
520 general-purpose 32-bit registers. This corresponds to a grouping of
the registers into 8 global registers, plus a circular stack of 2 to 32 sets of
16 registers each, known as register windows.
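
The 40 and 520 limits follow directly from this grouping, as the short sketch below shows. The function name is an assumption for illustration.

```python
def iu_register_count(n_windows):
    """8 globals plus 16 registers for each of the 2 to 32 register sets."""
    if not 2 <= n_windows <= 32:
        raise ValueError("the number of register sets must be between 2 and 32")
    return 8 + 16 * n_windows


print(iu_register_count(2), iu_register_count(8), iu_register_count(32))
```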

Fig. 2-31. Register organization of the Integer Unit (IU) of SPARC processors

At a given time, an instruction can access the 8 globals and a register
window into these registers. A 24-register window comprises a 16-
register set, divided into 8 IN registers (ins) and 8 local registers, together
with the 8 IN registers of an adjacent register set, addressable from the
current window as its OUT registers (outs).

The specified register organization is illustrated in Figure 2-31. The
current window is specified by the current window pointer (CWP) field
in the processor state register (PSR). Window overflow and underflow
are detected via the window invalid mask (WIM) register, which is
controlled by the supervisor software. The actual number of windows in a
SPARC implementation is invisible to user programs. When the IU
accesses an instruction or datum in memory, it appends to the address an
address space identifier (ASI), which encodes whether the processor is in
supervisor or user mode, and whether the access is to instruction memory
or data memory.
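
The overlap between adjacent windows can be made concrete with a small model. It uses the conventional SPARC register numbering (r0-r7 globals, r8-r15 outs, r16-r23 locals, r24-r31 ins) and assumes that the outs of window CWP are the ins of window CWP-1. The physical slot numbers and the function name are arbitrary choices made for this sketch.

```python
def physical_register(r, cwp, n_windows):
    """Map register r (0-31) seen from window cwp to a physical location."""
    if not 0 <= r <= 31:
        raise ValueError("register number must be 0-31")
    if not 0 <= cwp < n_windows:
        raise ValueError("cwp out of range")
    if r < 8:
        return ("global", r)                            # shared by all windows
    if r < 16:
        neighbour = (cwp - 1) % n_windows               # circular stack of sets
        return ("windowed", 16 * neighbour + (r - 8))   # outs = neighbour's ins
    if r < 24:
        return ("windowed", 16 * cwp + 8 + (r - 16))    # locals
    return ("windowed", 16 * cwp + (r - 24))            # ins


# %o0 (r8) of window 3 and %i0 (r24) of window 2 are the same physical register
print(physical_register(8, 3, 8), physical_register(24, 2, 8))
```

This sharing is what lets a caller place arguments in its outs and have the callee find them in its ins without copying anything, once the current window pointer has moved.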

ii. Floating-point Unit (FPU)


The FPU has 32 32-bit floating-point registers. Double-precision values
occupy an even-odd pair of registers, and quad-precision values occupy a
quad-aligned group of 4 registers. Thus, the floating-point registers can
hold a maximum of either 32 single-precision, 16 double-precision, or 8
quad-precision values. Floating-point load/store instructions are used to
move data between the FPU and memory whereas Floating-Point operate
(FPop) instructions are used to perform the actual floating-point
arithmetic. The floating-point data formats and instruction set conform to
the IEEE Standard for Binary Floating-point Arithmetic, ANSI/IEEE
Standard 754-1985. An implementation can indicate that a floating-point
instruction did not produce a correct ANSI/IEEE Standard 754-1985
result by generating a special floating-point unfinished or unimplemented
exception. Software must emulate any functionality not present in the
hardware. If an FPU is not present, or if the enable floating-point (EF) bit
in the PSR is 0, an attempt to execute a floating-point instruction will
generate an fp_disabled trap.
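
The register grouping rules can be checked with a brief sketch. The function names and the precision labels are assumptions chosen for this example.

```python
REGISTERS_PER_VALUE = {"single": 1, "double": 2, "quad": 4}


def fp_register_group(first_reg, precision):
    """Return the 32-bit registers occupied by one value of a given precision."""
    size = REGISTERS_PER_VALUE[precision]
    if not 0 <= first_reg < 32:
        raise ValueError("floating-point register number must be 0-31")
    if first_reg % size != 0:
        raise ValueError(f"{precision} values must start on a multiple of {size}")
    return list(range(first_reg, first_reg + size))


def capacity(precision):
    """How many values of a given precision fit in the 32 registers."""
    return 32 // REGISTERS_PER_VALUE[precision]


print(capacity("single"), capacity("double"), capacity("quad"))
print(fp_register_group(2, "double"), fp_register_group(4, "quad"))
```

A double-precision value starting at an odd register, or a quad-precision value not starting on a multiple of four, is rejected by the sketch, mirroring the alignment requirement stated above.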

iii. Coprocessor (CP)

The instruction set includes support for a single, implementation-
dependent coprocessor. The coprocessor has its own set of registers, the
actual configuration of which is implementation-defined but is nominally
some number of 32-bit registers. The Coprocessor load/store instructions
are used to move data between the coprocessor registers and memory. For
each floating-point load/store in the instruction set, there is an analogous
coprocessor load/store instruction.


2-17.3. Architecture of SuperSPARC Processors


A SuperSPARC microprocessor is a superscalar SPARC microprocessor,
which is compatible with the SPARC V8 (version 8) architecture. A
block diagram of the device is shown in Figure 2-32. The processor
contains an integer unit (IU), double precision floating point unit (FPU),
fully consistent instruction and data caches, a SPARC reference Memory
Management Unit (MMU), and a dual mode bus interface.

Fig. 2-32. Block diagram of a SuperSPARC processor

2-17.4. Architecture of SPARC64 Processors


Since 1998, Fujitsu has been making its own
SPARC implementation, called SPARC64. Presently the SPARC64 is in
its 6th generation (SPARC64 VI). The processor executes the
SPARC instruction set, but its internal design is different from
Sun's implementation. Since 2008, SPARC Enterprise mid-range and
high-end models have incorporated quad-core SPARC64 VII processors,
together with the SPARC64 VI. Figure 2-32 shows the block diagrams of the
SPARC64 VI and SPARC64 VII architectures. Also, figure 2-33
depicts the details of the dual-core SPARC64 VI architecture. Figure
2-34 illustrates the block diagram of the Fujitsu SPARC64 VI processor
chip. Two cores share the L2 cache. What is not shown in the diagrams is
that, like the IBM and Intel processors, the SPARC64 VI is dual-threaded
per core.

The type of multithreading is similar to that found in the Intel processors
and is called Vertical Multithreading (VMT). The multithreading
technology is illustrated in section 2-17.6.

Fig. 2-32. Block diagram of the SPARC64 architectures (VI and VII)

Fig. 2-33. Details of SPARC64 VI architectures


Currently, the highest clock frequency of the SPARC64 is 2.4 GHz, and the
theoretical peak performance is 9.6 GFlops/core. In 2008,
Fujitsu brought out a dual-core SPARC64 VI+ at 2.7GHz.
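
Theoretical peak performance is simply the clock frequency multiplied by the number of floating-point operations completed per cycle. In the sketch below, the value of 4 operations per cycle is not stated in the text; it is inferred from the quoted figures (9.6 / 2.4) and should be read as an assumption.

```python
def peak_gflops_per_core(clock_ghz, flops_per_cycle):
    """Theoretical peak: every cycle completes the maximum number of flops."""
    return clock_ghz * flops_per_cycle


# 4 flops per cycle is inferred from the figures in the text, not documented
print(peak_gflops_per_core(2.4, 4))
```

Real workloads fall short of this bound, since it ignores memory stalls and assumes the floating-point units are kept busy on every cycle.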

Fig. 2-34. Block diagram of the SPARC64 chip

2-17.5. Architecture of UltraSPARC Processors


UltraSPARC Architecture 2007 is derived from the
SPARC architecture V9. Like its ancestors, the UltraSPARC architecture
consists of an integer unit (IU) and a floating-point unit (FPU), each with
its own registers. This organization allows for implementations with
concurrent integer and floating-point instruction execution. However, the
new version of SPARC introduced 64-bit addressing, superscalar
execution, instruction and data pre-fetching, and handling of nested traps.

Since the sizes of data sets and programs continue to increase, 64-bit
addressing support became necessary, and SPARC took on this challenge.
Integer registers have become 64 bits wide; floating-point registers are
32, 64, or 128 bits wide.

Instruction operands are single registers, register pairs, register
quadruples, or immediate constants. In addition to pipelining, the new
architecture calls for superscalar execution, in which more than one
instruction may be fetched at a time. This causes far more serious hazards
in branching than with the previous design, and version 9 addresses this
with branch prediction, a feature available in other processors such as
MIPS during their 32-bit lives. Furthermore, instruction pre-fetching in
UltraSPARC can be done before taking a known branch or call to another
program. Without pre-fetching, the instructions at the branch or called
location are probably not in the same area of memory as the calling
routine, so they are probably not in the cache either.
Pre-fetching allows the programmer to speed up the transition over a call
by pre-fetching the instructions of the new routine into the cache.

Fig. 2-35. Architecture of UltraSPARC.T1 processor and its package photograph

The UltraSPARC Architecture virtual processor can run in nonprivileged
mode, privileged mode, or hyperprivileged mode. In hyperprivileged
mode, the processor can execute any instruction, including privileged
instructions. In privileged mode, the processor can execute nonprivileged
and privileged instructions. In nonprivileged mode, the processor can
only execute nonprivileged instructions.

In nonprivileged or privileged mode, an attempt to execute an instruction
requiring greater privilege than the current mode causes a trap to
hyperprivileged software.

The UltraSPARC T1 was the first multicore and multithreaded SPARC
processor. The processor is available with 4, 6 or 8 CPU cores, each core
able to handle 4 threads concurrently. Thus, the T1 can handle up to 32
simultaneous threads. This is called simultaneous multithreading
technology (SMT). The UltraSPARC T1 processor was designed to lower
the energy consumption of Sun servers; the CPU typically uses 72 W of
power at 1.4 GHz. Figure 2-35 shows the architecture of the UltraSPARC T1
processor.

Fig. 2-36. Photographs of UltraSPARC.T1 and Oracle T5 processors

Actually, the T1 cores are less complex than those of high-end processors
in order to allow 8 cores to fit on the same die. The UltraSPARC T1 and
T2 are designed for single-CPU systems. Recent UltraSPARC processors
such as Rock (2009) support multiple-chip server architectures. The most
recent commercial iterations of the SPARC processors are the SPARC64 X
"Athena", introduced in 2012, and the 16-core SPARC T5, introduced by
Oracle Corporation in 2013 and running at 3.6 GHz.

2-17.6. Multithreading Technology


The UltraSPARC T1 processor is slow on single-threaded work but
shines on multi-threaded work. In fact, studies carried out
by Intel showed that even under full load, a typical x86 CPU is idle 50 to
60% of the time. This is due to cache misses, from which all CPU
architectures suffer: they must wait for data to arrive from RAM.


However, CPUs belonging to the T1 family suffer much less from this
problem. As soon as a T1 thread stalls due to a cache miss, the T1
switches thread in 1 clock cycle and continues to do work while waiting
for the data. Typically on a modern CPU, a thread switch takes much
longer than 1 clock cycle. This is the reason a T1 can work 95% of
the time and only waits for data 5% of the time. Compare this to an x86
CPU at 3 GHz. Because the x86 CPU can only work at half speed due to
cache misses, it can be compared to a 1.5GHz CPU at full speed. The
following figure depicts the effect of multithreading on the speed of the
CPU. Both vertical multithreading (VMT) and simultaneous multithreading
(SMT) technologies are illustrated.
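
The comparison above can be put into numbers with a toy model. The first function reproduces the 3 GHz example from the text. The second is a simplifying assumption of this sketch, not a description of the real T1 scheduler: it supposes that threads stall independently, that a thread switch costs nothing, and that the core is busy whenever at least one thread is ready.

```python
def effective_clock_ghz(clock_ghz, busy_fraction):
    """Clock rate of an always-busy CPU that would do the same useful work."""
    return clock_ghz * busy_fraction


def core_utilization(n_threads, thread_ready_probability):
    """Fraction of time at least one of n independent threads is ready."""
    all_stalled = (1.0 - thread_ready_probability) ** n_threads
    return 1.0 - all_stalled


print(effective_clock_ghz(3.0, 0.5))     # the 3 GHz x86 example
print(core_utilization(1, 0.5))          # one thread per core
print(core_utilization(4, 0.5))          # four threads per core, as on the T1
```

With four threads that are each ready half the time, this model gives a utilization of about 94%, close to the figure quoted above, although the text does not say how that figure was obtained.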

Fig. 2-37. Illustration of the vertical multithreading (VMT) and simultaneous
multithreading (SMT) technologies.

2-18. CPU Market Share


Processors from both Intel and its close rival AMD have been extensively
used in the microcomputer and workstation market for a relatively long
time. The following figure depicts the market shares of x86
microprocessor manufacturers (Intel, AMD and other
manufacturers), according to Mercury Research. Obviously, the CPUs
based on the Intel Core microarchitecture stimulated rapid changes in the
computer market. As we have seen so far, these processors set new
performance records for mainstream PC systems.

Thus, Intel has earned the title of developer of today's fastest x86
processors. In the meantime, AMD processors were pushed back and
came to be seen as just a good solution for inexpensive systems. In order
to retain its sales volume, AMD undertook an unprecedented reduction in
the pricing of its products. For instance, the price of the AMD Phenom
X4 9600 (quad core with 4 x 512 kB L2 cache at 2.3 GHz) was $145.99, as
declared on the AMD official website in 2009. The price of the
corresponding Intel Core 2 Quad, running at 2.4GHz, ranged from
$184.99 to $310, depending on the cache size.

Fig. 2-38. Market shares of x86 microprocessor manufacturers, according to
Mercury Research (March 2013)

As for SPARC processors, they failed long ago on the desktop and
are still insignificant in the overall notebook market (despite the
availability of technically impressive products). Therefore, unlike Intel
and AMD architectures, SPARC is best viewed solely as a server
processor architecture. The market prospects for all servers (not just
SPARC) are driven by the following considerations.


 The Credit Crunch: for most enterprises, uncertainty about their
future survival and lack of funding will mean that spending on new
servers will be done only as a last resort when all other options are
exhausted.
 Virtualization: time-sharing application capacity in a common
pool of fewer servers is already a well-established trend in the market.
This will continue to an even greater degree.
 Fatter Multi-Core Processors: as the number of cores heads into
double digits, it satisfies many customer needs by reducing the physical
and energy footprint of server installations, as well as reducing cost.

2-19. Moore’s Law


Throughout the above discussion, we have seen that each generation of
microprocessor chips is more powerful than its predecessor. This is
because the electronics industry is constantly increasing the complexity
(the number of integrated transistors on the same chip) and reducing the
feature size of the integrated circuit.

Fig. 2-39. Microprocessors and DRAM roadmap.


In the mid-1960s, Gordon Moore, who later co-founded Intel and became
chairman of its board, formulated a principle or “law” which has continued
to hold for over three decades: the computing power and the complexity
(roughly, the number of transistors per chip) of the silicon integrated
circuit microprocessor doubles every one to two years, and the cost per
CPU chip is cut in half.
This law is the main explanation for the computer revolution, in which
the Intel Architectures (IA) play such a significant role.
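
The doubling rule can be written as N(t) = N0 x 2^(t/T), where T is the doubling period. The sketch below evaluates it; the function name, the starting count of 1000 transistors and the two-year default are illustrative assumptions rather than historical data.

```python
def projected_transistors(n0, years, doubling_period_years=2.0):
    """Transistor count after a number of years of steady doubling."""
    return n0 * 2 ** (years / doubling_period_years)


# Ten years of doubling, at the slow and fast ends of "one to two years"
print(projected_transistors(1000, 10, doubling_period_years=2.0))
print(projected_transistors(1000, 10, doubling_period_years=1.0))
```

Over a single decade the two ends of the "one to two years" range already differ by a factor of 32, which shows how sensitive such projections are to the assumed doubling period.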


2-20. Summary
In this chapter we described the architecture of some famous
microprocessors, with emphasis on x86 microprocessors. Computer jargon
terms such as pipelining, threading and virtualization, which
usually appear in microprocessor datasheets and the advertisements of
CPU vendors, have been explained in a simple didactic manner.
The 80x86 is the generic name of the Intel microprocessor architecture. The
generic term x86 refers to the instruction set of the most commercially
successful CPU architecture in the history of personal computing. The
Intel 8086 CPU, which appeared in 1978, was the first of the x86
architecture. Three years later, the 8088 (a version of the 8086 with an
eight-bit data bus) was chosen as the main CPU for the IBM PC.
The architecture has twice been extended to larger word sizes. In
1985, Intel released the 32-bit 80386 to replace the 16-bit 80286. This
extension to the x86 architecture is commonly called IA-32 (Intel
Architecture, 32-bit). In 2003, AMD further extended the architecture to
64 bits, variously called x86-64, AMD64 or Intel 64. Intel 64 should not be
confused with the unrelated IA-64 architecture.
The x86 architecture is a variable-instruction-length CISC design with
emphasis on backward compatibility. The instruction set is not typical
CISC, however, but basically an extended and orthogonalized version of
the simple eight-bit 8085 architecture. Words are stored in little-endian
order, and 16-bit and 32-bit accesses are allowed to unaligned memory
addresses. To conserve opcode space, most register addresses are three
bits, and at most one operand can be in memory (in contrast with some
highly orthogonal CISC designs such as the PDP-11, where both operands
can be in memory), but this memory operand may also be the destination,
while the other operand, the source, can be either a register or an
immediate. This contributes, among other factors, to a code footprint that
rivals 8-bit machines and enables efficient use of instruction cache
memory. During execution, current x86 processors employ a few extra
decoding steps to split most instructions into smaller pieces, micro-ops
(μ-ops), which are readily executed by a microarchitecture that may be
described as a RISC machine without the usual load/store limitations. The
small number of general registers (inherited from the 8085) has made
register-relative addressing (using small immediate offsets) an important
method of accessing operands, especially on the stack.
106
Prof. Dr. Muhammad El-SABA
Introduction to Microprocessors & Interface Circuits CHAPTER 2

Much work has therefore been invested in making such accesses as fast as
register accesses, i.e. a throughput of one instruction per cycle in most
circumstances. The following table summarizes the main generations of
the x86 processors:

Generation   | Date      | CPU brands                  | Linear / physical address space              | Features
-------------+-----------+-----------------------------+----------------------------------------------+-----------------------------------------------
1st (IA-16)  | 1978      | Intel 8086, Intel 8088      | 16-bit / 20-bit (segmented)                  | first x86 microprocessors
2nd          | 1982      | Intel 80186, 80188, NEC V20 | as above                                     | fast address, fast mul/div etc.
2nd          | 1982      | Intel 80286                 | 16-bit (30-bit virtual) / 24-bit (segmented) | MMU, protected mode and larger address space
3rd (IA-32)  | 1985      | Intel386, AMD Am386         | 32-bit (46-bit virtual) / 32-bit             | 32-bit instruction set, MMU, paging
4th          | 1989      | Intel486                    | see above                                    | RISC-like pipelining, FPU, on-chip cache
5th          | 1993      | Pentium, Pentium MMX        | see above                                    | superscalar, 64-bit data bus, faster FPU, MMX
5/6th        | 1996      | Cyrix 6x86, Cyrix MII       | see above                                    | register renaming, speculative execution
6th          | 1995      | Pentium Pro, AMD K5         | 36-bit physical (PAE)                        | μ-op translation, integrated L2 cache
6th          | 1997      | AMD K6, Pentium II/III      | see above                                    | L3-cache support, 3DNow!, SSE
7th          | 1999      | Athlon, Athlon XP           | see above                                    | superscalar FPU, wide design
7th          | 2000      | Pentium 4                   | see above                                    | deeply pipelined, SSE2, Hyper-Threading
6/7th -M     | 2003      | Pentium M                   | see above                                    | optimized for low power
8th (x86-64) | 2003      | Athlon 64                   | 64-bit / 40-bit physical                     | x86-64 instruction set, on-die memory controller
8th          | 2004      | Prescott                    | see above                                    | very deeply pipelined, high frequency, SSE3
9th          | 2006      | Intel Core, Intel Core 2    | some versions are 32-bit only                | low power, multi-core, lower clock frequency
10th         | 2007-2008 | AMD Phenom, K10             | see above                                    | 4-core, 128-bit FPUs, SSE4, on-die L3 cache


It should be noted that the P6 family processors are 32-bit Intel
Architecture (IA-32) processors. This includes the Pentium Pro,
Pentium II, Pentium III, and Pentium III Xeon processors. The
Pentium 4, Pentium D, and Pentium Extreme Edition processors
are based on the Intel NetBurst microarchitecture. Most early
Intel Xeon processors are also based on the Intel NetBurst
microarchitecture.

Intel Netburst MicroArchitecture

The Intel Core (Solo and Duo) and dual-core Intel Xeon processor
are based on an improved Pentium M processor architecture.

The Intel Pentium dual-Core, Intel Core Duo, Core Quad and Core
Extreme, Intel Xeon 3x00 and 7x00 series processors are all based
on Intel Core Microarchitecture. The Intel Core2 Duo, Core2
Quad and Core2 Extreme, Xeon 5x00 series are based on
Enhanced Intel Core microarchitecture.


Intel Core MicroArchitecture

The Intel Penryn microarchitecture, used in later members of the Core 2
family of processors, was the first mainstream Intel microarchitecture
based on the 45nm fabrication process. This allowed Intel to create higher
performance processors that consumed less power than previous processors.
The Intel Nehalem microarchitecture encompasses the first Core i7 class of
processors and also uses a 45nm fabrication process. Its successor,
codenamed Sandy Bridge, uses 32nm technology and forms the 2nd generation
of Core i-series processors. The Sandy Bridge architecture was followed by
the Ivy Bridge microarchitecture, which makes use of 22nm technology. Ivy
Bridge is based on 3D tri-gate MOSFET transistors and was demonstrated by
Intel in 2011. In 2013, Intel introduced the Haswell microarchitecture.
Skylake is Intel's 6th generation of Core microarchitecture, which was
launched in 2015. Skylake uses 14nm manufacturing technology.

Through the continuous development of x86 architecture, the processor
internal registers have grown from 16 bits to 64 bits wide. Although
the general-purpose registers in x86 processors can be used for anything,
they were envisaged to be used for the following purposes:

• AX (16-bit) / EAX (32-bit) / RAX (64-bit): accumulator
• BX / EBX / RBX: base
• CX / ECX / RCX: counter
• DX / EDX / RDX: data/general
• SI / ESI / RSI: "source index" for string operations.
• DI / EDI / RDI: "destination index" for string operations.
• SP / ESP / RSP: stack pointer, holds the top address of the stack.
• BP / EBP / RBP: stack base pointer, holds the address of the stack frame.
• IP / EIP / RIP: instruction pointer, holds the current instruction address.

However, the segment registers (CS, DS, SS, ES, and the FS and GS registers
added with the 80386) are still 16 bits wide, for the sake of compatibility
with previous processor generations.
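
The register naming above (AX / EAX / RAX) reflects that the narrower names are views onto the low-order bits of the same physical register. The following minimal Python sketch illustrates this for the accumulator. The helper name `accumulator_views` is hypothetical, and the AL/AH byte views are added from general x86 knowledge rather than from the list above.

```python
def accumulator_views(rax):
    """Return the narrower views of a 64-bit RAX value (illustrative helper)."""
    rax &= (1 << 64) - 1            # keep the value within 64 bits
    return {
        "RAX": rax,                 # full 64-bit register
        "EAX": rax & 0xFFFFFFFF,    # low 32 bits
        "AX": rax & 0xFFFF,         # low 16 bits
        "AH": (rax >> 8) & 0xFF,    # bits 8..15 (high byte of AX)
        "AL": rax & 0xFF,           # bits 0..7 (low byte of AX)
    }


views = accumulator_views(0x1122334455667788)
print({name: hex(value) for name, value in views.items()})
```

The same masking pattern applies to the other general-purpose registers, which is why 16-bit code written for AX can still run on a processor whose accumulator is physically 64 bits wide.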

The x86 processors also include various special/miscellaneous registers
such as control registers (CR0 through CR4), debug registers (DR0
through DR3, plus DR6 and DR7), test registers (TR4 through TR7), descriptor
registers (GDTR, LDTR, IDTR), and a task register (TR). In addition to
the increase of the size of internal registers, new technologies have been
introduced with the advent of new processor architectures.

Floating point unit (FPU): Initially, the x86 family provided floating-point
capabilities only on co-processors (8087, 80287 and 80387). With the
introduction of the 80486, the 8 floating point registers, known as ST(0)
through ST(7), were built into the CPU. Each register is 80 bits wide and
stores numbers in the double extended precision format of the IEEE
floating-point standard. These registers are accessed like a LIFO stack. The
register numbers are not fixed, but are relative to the top of the stack; ST(0)
is the top of the stack, ST(1) is the next register below the top of the stack,
etc. This means that data is pushed down from the top of the stack, and
operations are always done against the top of stack. Hence a register cannot
be addressed by a fixed number; it is always referenced by its position
relative to the current top of the stack.
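
The stack-relative numbering of ST(0) through ST(7) can be illustrated with a small Python model. This is only a sketch under stated assumptions: the class name `X87Stack` and its methods are hypothetical, Python floats stand in for the 80-bit registers, and stack overflow/underflow checking is omitted.

```python
class X87Stack:
    """Toy model of eight FPU registers addressed relative to a top pointer."""

    def __init__(self):
        self.regs = [0.0] * 8   # eight physical registers
        self.top = 0            # index of the register currently named ST(0)

    def _physical(self, i):
        # ST(i) is the register i places below the current top (modulo 8)
        return (self.top + i) % 8

    def push(self, value):
        # Loading a value moves the top pointer down by one, then stores there
        self.top = (self.top - 1) % 8
        self.regs[self.top] = value

    def pop(self):
        # Popping returns ST(0) and makes the old ST(1) the new ST(0)
        value = self.regs[self.top]
        self.top = (self.top + 1) % 8
        return value

    def st(self, i):
        return self.regs[self._physical(i)]

    def add_to_top(self, i):
        # Arithmetic is done against the top of stack: ST(0) = ST(0) + ST(i)
        self.regs[self.top] += self.st(i)


fpu = X87Stack()
fpu.push(1.5)        # ST(0) = 1.5
fpu.push(2.0)        # ST(0) = 2.0, and the earlier value is now ST(1)
fpu.add_to_top(1)    # ST(0) = 2.0 + 1.5
print(fpu.st(0), fpu.st(1))
```

Note how the value 1.5 changes its name from ST(0) to ST(1) after the second push, although it never moves between physical registers.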

MMX is a SIMD instruction set designed by Intel and introduced in 1997
with the Pentium MMX microprocessor. It is supported on most subsequent
IA-32 processors by Intel and other vendors. MMX is typically used for
video applications. MMX added 8 new registers to the architecture, known
as MM0 through MM7 (referred to as MMn). In reality, these registers are
just aliases for the existing x87 FPU stack registers. Hence, anything
that is done to the floating point stack will also affect the MMX
registers. Unlike the FP stack, the MMn registers are fixed, not relative.
Each MMn register holds a 64-bit integer value. The upper 16 bits of the
stack registers are unused in MMX, and these bits are set to ones, which
makes the contents look like NaNs or infinities in the floating point view.

Single Instruction/Multiple Data (SIMD): SIMD is a technology used in
parallel processors, where many processing elements perform the same
operations on different data. There is often a central controller which
broadcasts the instruction stream to all the processing elements. SIMD
technology entered x86 processors with MMX (integer data) and was extended
to floating-point data by SSE, starting from the Pentium III.
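
The SIMD idea (one instruction, several data elements) can be sketched by treating a 64-bit MMX-style register as eight independent 8-bit lanes. The function below is a minimal illustration, not the hardware algorithm; the name `packed_add_bytes` and the `saturate` flag are hypothetical, and the two behaviours are modelled on wrap-around and unsigned saturating byte addition.

```python
def packed_add_bytes(a, b, saturate=False):
    """Add two 64-bit values lane by lane, as eight unsigned bytes."""
    result = 0
    for lane in range(8):
        shift = 8 * lane
        x = (a >> shift) & 0xFF          # extract one byte lane from each operand
        y = (b >> shift) & 0xFF
        total = x + y
        if saturate:
            total = min(total, 0xFF)     # clamp at the largest byte value
        else:
            total &= 0xFF                # wrap around, the carry is discarded
        result |= total << shift         # carries never cross into the next lane
    return result


a = 0x01020304050607FF
b = 0x0101010101010101
print(hex(packed_add_bytes(a, b)))
print(hex(packed_add_bytes(a, b, saturate=True)))
```

In hardware all eight lane additions happen at once in a single instruction, whereas this sketch loops over them one by one. The essential point is that the lowest lane overflows without disturbing its neighbour.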

Streaming SIMD Extensions (SSE): The SSE instruction set enables high
levels of performance in multimedia applications like 3-D graphics, video
decoding, and speech recognition. In 1999, Intel introduced the SSE
instruction set, following in 2000 with SSE2. The first addition made MMX
almost obsolete and the second allowed the instructions to be realistically
targeted by conventional compilers. Introduced in 2004 along with the
Prescott revision of Pentium 4, SSE3 added specific memory and
thread-handling instructions to boost the performance of Intel's processors.
SSE discarded all legacy connections to the FPU stack. The designers created
eight 128-bit registers, named XMM0 through XMM7. In AMD64, the number of
SSE XMM registers has been increased from 8 to 16. However, operating
systems had to be aware of this new register set in order to save and
restore its state on task switches. For this reason, the SSE instructions
stay disabled until the operating system enables them by setting the OSFXSR
bit in control register CR4.
SSE2 is much more suitable for scientific calculations than previous
technologies such as SSE1 or 3DNow! from AMD, which were limited
to single precision only. SSE4a is a further extension to the SSE
instruction set, introduced by AMD.

Virtualization: x86 virtualization is difficult because the architecture did
not meet the so-called "Popek and Goldberg requirements" until recently.
Nevertheless, there are several commercial x86 virtualization products,
such as VMware, Parallels and Microsoft Virtual PC. Intel and AMD have
introduced x86 processors with hardware-based virtualization extensions
that overcome the classical virtualization limitations of the x86
architecture. These extensions are known as Intel VT (IVT) and AMD-V.
Although most modern x86 server-based and many modern x86
desktop-based processors include these extensions, the technology was
generally considered immature when it was first introduced.

Multithreading: Studies have shown that even under full load, a typical
x86 server CPU is idle about 50% of the time. This is due to cache misses,
which all CPU architectures suffer from; the CPU must wait for data to
arrive from RAM. However, CPUs belonging to the SPARC T1 family do not
suffer from this problem. Instead, as soon as a T1 thread stalls due to a
cache miss, the T1 switches thread in 1 clock cycle and continues to do
work while waiting for the data. Typically, on a modern CPU, a thread
switch takes much longer than 1 clock cycle.

Quick Path Interconnect (QPI): In the development of the Nehalem
microarchitecture, Intel had to remove some of the limitations of the
previous generation of Core2 Duo and Quad processors. The first thing to go
was the Front Side Bus (FSB), the old system interconnect through which the
CPU, chipset and memory communicated with one another. With processors
getting faster and memory sizes getting larger, the FSB became an increasing
bottleneck for data transfers between the CPU and RAM. Intel's solution
is a point-to-point bi-directional link called Quick Path Interconnect,
which transfers data directly between the CPU and the chipset. Every Core
i7 processor is actually equipped with two QPI links, which could
potentially allow for future multi-processor systems. At present, however,
QPI simply connects the CPU to the chipset over a set of 20-bit wide links
that operate at either 4.8 GT/s (Core i7) or 6.4 GT/s (Core i7 Extreme
Edition). Since these links are bi-directional and allow the CPU and the
chipset to send and receive information simultaneously, the end result is
19.2 GB/s of bandwidth between the Intel Core i7 and the Intel chipset
(25.6 GB/s for the Extreme Edition). Thanks to QPI, this link is unlikely
to become the bandwidth bottleneck of a Core i7 system.
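
The 19.2 GB/s figure quoted above can be checked with a short calculation. The sketch below takes the 4.8 and 6.4 link rates as giga-transfers per second and assumes that 16 of the 20 lanes in each direction carry payload data, which is not stated in the text; the function name `qpi_bandwidth_gb_per_s` is hypothetical.

```python
def qpi_bandwidth_gb_per_s(giga_transfers_per_s, payload_bits=16, directions=2):
    """Peak link bandwidth in GB/s, counting both directions.

    payload_bits=16 is an assumption: 16 of the 20 lanes are taken to carry data.
    """
    bytes_per_transfer = payload_bits / 8
    return giga_transfers_per_s * bytes_per_transfer * directions


for rate in (4.8, 6.4):
    print(rate, "GT/s ->", round(qpi_bandwidth_gb_per_s(rate), 1), "GB/s")
```

With these assumptions, the 4.8 GT/s link gives 9.6 GB/s in each direction, i.e. the 19.2 GB/s total mentioned in the text.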


2-21. PROBLEMS
2-1) Draw a general block diagram describing the internal architecture of
the 8086 microprocessor and explain briefly each block.
2-2) Calculate the maximum address space of the 8086 microprocessor
and the maximum number of segments that can be located in this space.
2-3) List and explain the significance and use of the general-purpose
registers in the 8086 microprocessor.
2-4) Explain how the address/data lines can be de-multiplexed in 8086
and 8088 microprocessors using octal latches like the 8282 and the 8286
bus transceivers.
2-5) In which microprocessor was the concept of pipelining first
introduced?
(a) 8086 (b) 80286 (c) 80386 (d) 80486
2-6) Explain how the bus controller chip 8288 can be used to obtain main
control signals from 8086 microprocessors.
2-7) What's meant by a 32-bit microprocessor? What are the main
features of 80386 and 80486/8088 microprocessors?
2-8) SIMD is:
(a) used in standard Pentiums but not in the MMX versions.
(b) A way of preventing wraparound.
(c) Single in-line multimedia data.
(d) Single instruction multiple data.
(e) All the above
2-9) Describe how INTR signals can be generated by the interrupt
controllers and how they can be interfaced to the 8086 microprocessor.
2-10) What do you know about cache memory and how does it operate in
a microprocessor? What is meant by L1-cache and L2-cache memory?
2-11) Branch prediction logic:
(a) is another name for the prefetch register.
(b) is only used in MMX versions.
(d) attempts to guess the future steps to be taken by a program.
(e) All the above

2-12) Mark the correct statements with a (√) sign and the false ones with (x):
[1] The early Intel x86 processors were generally CISC processors
because they made use of complex instruction sets [ ]
[2] The 8088 is an 8-bit microprocessor while the 8086 is a 16-bit
microprocessor [ ]
[3] The Core 2 Duo is a dual core microprocessor, which belongs to
Intel's x86 microprocessors [ ]
[4] RISC processors are faster than CISC processors because they
use simpler fixed-length instructions and their architecture
enables pipelining and superscalar execution [ ]
[5] In the flat memory mode, the whole memory of an 80386
microprocessor may be considered as one segment of 4GB [ ]
[6] The interrupt service routines are called by the CPU when IF=0 [ ]
[7] The AF is raised (AF=1) when the addition of 8-bit numbers
results in a carry [ ]
[8] The parity flag helps to correct memory errors in an x86
microprocessor system [ ]
[9] Core2 Duo is a dual core microprocessor with shared L2-Cache [ ]
[10] The 80486 has a built-in FPU [ ]
2-13) Describe the main features of Pentium processors, with respect to
their precursors. What's the difference between Pentium 4 and Itanium
microprocessors?
2-14) Describe the meaning of the following terms:
• Pipelining,
• Super-scalar architecture,
• SEC, SIMD, MMX, SSE, SSE2
2-15) What are the main power saving modes which are supported in
Pentium microprocessors?
2-16) Describe the operation of the main support chips in
8086/8088-based microcomputer systems, and show how they're interfaced to
the microcomputer system. Hint: The bus controller 8288, the programmable
timer / counter PTC 8253/8254, the programmable interrupt controller
PIC 8259, the programmable peripheral interface PPI 8255
2-17) Explain the difference between a directive, an operation, and an
instruction. Give an example of each.
2-18) How are the integer registers named on the SPARC?
2-19) How many integer registers are there on the SPARC? For each of
the integer registers that have special attributes, explain the special
attributes.

115
Prof. Dr. Muhammad El-SABA
Introduction to Microprocessors & Interface Circuits CHAPTER 3

Memory Organization & Segmentation

Contents

3-1. Memory Organization in Computer Systems
3-1.1. Virtual Memory
3-1.2. Physical Memory
3-1.3. Memory Paging
3-2. Memory Segmentation in x86 Systems
3-2.1. Flat Memory Model
3-2.2. Segmented Memory Model
3-3. Operation Modes in x86 Microprocessors
3-3.1. Real Mode
3-3.2. Protected Mode
3-3.3. Virtual Mode
3-3.4. Long & Legacy Modes of x86-64 Processors
3-4. Memory Addressing in x86 Microprocessors
3-4.1. Real Mode Addressing (Generating 20-bit Address)
3-4.2. Protected Mode Addressing (Generating 32-bit Address)
3-4.3. Memory Paging in Protected Mode
3-4.4. Protection Aspects
3-4.5. Privilege Level
3-4.6. Entering & Leaving the Protected Mode
3-4.7. Protected Multitasking
3-4.8. Virtual Mode
3-4.9. Physical Address Extension (PAE)
3-4.10. Long Mode Addressing in x86-64 Architecture
3-4.11. Long Mode Memory Management
3-4.12. RIP-Relative Addressing
3-5. Stack Operation
3-6. IBM Memory Organization

3-7. SPARC Processors Memory Models & Addressing Space
3-7.1. SPARC Memory Modes
3-7.2. SPARC Addressing Space
3-8. ARM Memory Organization
3-9. Summary
3-10. Problems
3-11. Problems


Memory Organization
& Segmentation

3-1. Memory Organization in Computer Systems


The physical memory of computer systems is usually organized as a
sequence of 8-bit bytes. Each byte is assigned a unique address that
ranges from zero to a maximum allowed memory space, which depends
on the width of the address bus. For instance, the 8086 has a 20-bit address
bus and can address up to 1 MB (2^20 bytes). Also, the 80386 has a 32-bit
address bus and can address up to 4 GB (2^32 bytes) of physical memory.
On the other hand, recent 64-bit processors, like the Core2, use 64-bit
addresses and can theoretically address up to 16 EB (2^64 bytes), although
actual chips implement fewer physical address lines.
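
The relation between address width and addressable memory is simply 2 raised to the number of address bits. The following minimal sketch reproduces the sizes quoted above; the helper names `addressable_bytes` and `human_readable` are hypothetical, and binary units (1 KB = 1024 bytes) are assumed.

```python
def addressable_bytes(address_bits):
    """Number of distinct byte addresses for a given address width."""
    return 1 << address_bits


def human_readable(n_bytes):
    """Express a power-of-two byte count in binary units (1 KB = 1024 B)."""
    units = ["B", "KB", "MB", "GB", "TB", "PB", "EB"]
    index = 0
    while n_bytes >= 1024 and index < len(units) - 1:
        n_bytes //= 1024
        index += 1
    return f"{n_bytes} {units[index]}"


for name, bits in [("8086", 20), ("80286", 24), ("80386", 32), ("64-bit address", 64)]:
    print(name, bits, "bits ->", human_readable(addressable_bytes(bits)))
```

The 24-bit entry for the 80286 is included for comparison; it corresponds to the 24-bit physical address listed for that processor in the generation table of Chapter 2.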

3-1.1. Virtual Memory


The so-called virtual memory consists of the entire address space
available to programs. It is a large linear-address space that is translated
by a combination of hardware and operating-system software to a smaller
physical-address space, parts of which are located in memory and parts
on disk or other external storage media.

3-1.2. Physical Memory


The physical memory is the installed memory (excluding cache memory)
in a particular computer system that can be accessed through the
processor’s bus interface. The maximum size of the physical memory
space is determined by the number of address bits on the bus interface. In
a virtual memory system, the large virtual-address space (also called
linear-address space) is translated to a smaller physical address space by
a combination of segmentation and paging hardware and software.

3-1.3. Memory Paging


Paging is a mechanism for translating linear (virtual) addresses into
physical addresses using fixed-size blocks called pages, which the
operating system can move, as needed, between memory and external storage
media (typically disk).
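
The principle of paging can be sketched in a few lines of Python. This is a simplified illustration, not the actual x86 page-table walk: a 4 KB page size is assumed, a plain dictionary stands in for the page tables, and the names `PAGE_SIZE`, `translate` and `page_table` are hypothetical.

```python
PAGE_SIZE = 4096  # assumed page size of 4 KB


def translate(linear_address, page_table):
    """Map a linear address to a physical address through a page table.

    page_table maps page numbers to physical frame numbers; a missing
    entry models a page that currently resides on disk.
    """
    page_number, offset = divmod(linear_address, PAGE_SIZE)
    frame_number = page_table.get(page_number)
    if frame_number is None:
        # In a real system the operating system would now load the page
        raise LookupError(f"page fault: page {page_number:#x} is not in memory")
    return frame_number * PAGE_SIZE + offset


page_table = {0x12345: 0x00042}   # one resident page (illustrative values)
print(hex(translate(0x12345ABC, page_table)))
```

Only the page number is translated; the offset within the page (here 0xABC) is carried over unchanged, which is what allows whole pages to be moved between memory and disk as fixed-size units.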


3-2. Memory Segmentation in x86 Systems


The architecture of modern processors allows multitasking and gives
designers the freedom to choose one of the following two memory
models for each task:

• Flat (linear) address space consisting of a single array of up to 2^n
bytes (where n is the number of address bits).
• Segmented address space consisting of a number of segments, each with a
linear address space of up to 2^n bytes.

In x86 systems, the flat address space of 32-bit processors consists of a
single array of up to 4 GB. Also, the segmented address space of such
systems consists of up to 16,383 segments, each with a linear address space
of up to 4 GB (for 32-bit addresses) or up to 16 EB (for 64-bit addresses).

3-2.1. Flat Memory Model


In a flat model of memory organization, the applications programmer
sees a single array of up to 4 GB (2^32 bytes) in 32-bit microprocessors, or
up to 2^64 bytes in recent 64-bit microprocessors. An operating system
designer can choose to simulate a flat memory model (also called linear
model), by creating one large code segment (CS) and one large data
segment (DS) and having all programs use the same values for CS and
DS. The UNIX operating system, with its VAX heritage, is typically
implemented in such a linear memory model.

The processor maps the flat memory space onto the physical address
space using a specific address translation mechanism. However,
applications programmers do not need to know the details of the
mapping. Relocation of separately compiled modules in this space must
be performed by systems software (e.g., linkers, locators,
binders, loaders).

3-2.2. Segmented Memory Model


In a segmented memory organization model, the address space, as viewed
by an applications program, is called the logical address space.
Applications programmers view the logical address space of the 80386
(and later processors) as a collection of up to 16,383 one-dimensional
subspaces, each with a specified length. Each of these linear subspaces is
called a segment. Note that a segment is a unit of contiguous address
space. The segment sizes in a 32-bit address system may range from one
byte up to 4 GB (2^32 bytes).


In x86 processors, the total logical address space is up to 64 TB (2^46
bytes). The processor maps the 64 TB logical address space onto the
physical address space (up to 4 GB) by specific address translation
mechanisms. Every time an x86 processor wants to refer to memory (to
fetch an instruction or get/put data) it must go through a complex
mapping process to calculate the physical memory address. This is
because application programmers do not use physical addresses. Rather,
they use logical addresses, which have at least one level of
indirection. Therefore, applications programmers do not need to know the
details of this mapping.
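
The 64 TB figure follows from the segment count and segment size given in the previous section. As a quick check, the sketch below multiplies the two numbers; the constant names are hypothetical.

```python
SEGMENTS = 16_383          # maximum number of segments quoted in the text
SEGMENT_SIZE = 2 ** 32     # 4 GB per segment with 32-bit offsets

total_bytes = SEGMENTS * SEGMENT_SIZE
print(total_bytes / 2 ** 40)       # size in TB, just under 64
print(2 ** 46 - total_bytes)       # shortfall from 2^46 bytes: one 4 GB segment
```

Since 16,383 is one less than 2^14, the product falls exactly one 4 GB segment short of 2^46 bytes, which is why the logical address space is usually rounded to 64 TB.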

[Figure: the memory address model branches into the flat address model and the segmented address model]

Fig. 3-1(a). Illustration of the basic memory models.

3-3. Operation Modes of x86 Microprocessors


The x86 microprocessor families can operate in various modes. The Intel
8086, Intel 8088, Intel 80188 and Intel 80186 had only real mode. In real
mode the processor can address up to 1 MB using 20 address lines.

Processors beginning with the Intel 80286 feature a second mode called
protected mode (or protected virtual address mode, PVAM). In this mode
the 80286 can address up to 16 MB of physical memory, using its 24-bit
address. To ease the transition to/from protected mode, the Intel 80386
and later processors have been provided with a third mode called
"virtual 86", or simply V86 mode. The x86-64 processors have two modes,
namely: the long mode and the legacy mode. Each of these modes has
sub-modes, which are backward compatible with the previous 16-bit real
mode, 32-bit protected mode as well as the virtual 86 mode.

[Figure: the operation modes branch into real mode, virtual mode and protected mode]

Fig. 3-1(b). Illustration of the basic memory operation modes in x86 processors.

3-3.1. Real Mode


The real mode is the oldest memory mode, which appeared with the
8086/8088 microprocessors and their 20-bit address. In real mode, addresses
are generated by adding an address offset to the value of a segment
register shifted left by four bits (i.e. multiplied by 16). As the segment
register and the address offset are both 16 bits long in such
microprocessors, this results in a 20-bit address. This is the origin of
the 1 MB (2^20 bytes) limit in real mode.
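
The shift-and-add rule described above can be written as a one-line function. The following is a minimal sketch; the function name `real_mode_address` is hypothetical, and the final masking to 20 bits assumes a processor with only 20 address lines, such as the 8086.

```python
def real_mode_address(segment, offset):
    """Combine a 16-bit segment and a 16-bit offset into a 20-bit address."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must be 16-bit values")
    # Shift the segment left by 4 bits (multiply by 16), then add the offset.
    # The mask models the 20 address lines assumed for this sketch.
    return ((segment << 4) + offset) & 0xFFFFF


print(hex(real_mode_address(0x1234, 0x0010)))   # 0x12340 + 0x0010
print(hex(real_mode_address(0x1235, 0x0000)))   # a different pair, same location
```

The two calls return the same address: because segments start every 16 bytes and overlap, many different segment:offset pairs refer to the same physical byte.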

3-3.2. Protected Mode


The 80386 and later processors contain mechanisms to verify memory
accesses and instruction execution for conformance to protection criteria.
In protected mode, the segment registers contain an index into a table of
segment descriptors. Each segment descriptor contains a start address of
the segment, to which the offset is added to generate the address. In
addition, the segment descriptor contains memory protection information.
This includes an offset limit and bits for write and read permission. This
allows the processor to prevent memory accesses to certain data. The
operating system can use this to isolate and protect memory areas of
different processes, hence the name protected mode.

The effective addresses generated by the CPU (EA or Offset) are passed
to the MMU to be checked against the limit in the segment descriptor and
are there added to the segment base address in the descriptor to form a
linear address. On an 80386 and later processors, the linear address is
further processed by the paged MMU before the final result (physical
address) appears on the chip address bus.


The 80286 doesn't have a paged MMU so the linear address is output
directly as the physical address.
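
The segment translation described above (descriptor lookup, limit check, base addition) can be sketched as follows. This is a deliberately simplified model: the selector is treated as a plain table index, the descriptor holds only a base, a limit and a write-permission bit, and paging is left out. The names `SegmentDescriptor`, `ProtectionFault` and `linear_address` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SegmentDescriptor:
    base: int        # start address of the segment
    limit: int       # highest valid offset within the segment
    writable: bool   # write permission bit


class ProtectionFault(Exception):
    """Raised when an access violates the descriptor's protection data."""


def linear_address(selector_index, offset, descriptor_table, write=False):
    """Translate (segment, offset) into a linear address with protection checks."""
    descriptor = descriptor_table[selector_index]
    if offset > descriptor.limit:
        raise ProtectionFault("offset exceeds the segment limit")
    if write and not descriptor.writable:
        raise ProtectionFault("segment is not writable")
    return descriptor.base + offset


table = [
    SegmentDescriptor(base=0x00100000, limit=0xFFFF, writable=False),  # code
    SegmentDescriptor(base=0x00200000, limit=0x0FFF, writable=True),   # data
]
print(hex(linear_address(1, 0x0123, table, write=True)))
```

On an 80286 the resulting linear address would be used directly as the physical address, while on an 80386 and later processors it would still pass through the paging unit if paging is enabled.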

Software which has been written or compiled to run in protected mode
must only use segment register values given to it by the operating system.
Unfortunately, most application code for MS-DOS, written before the
80286, will fail in protected mode because it assumes real mode
addressing and writes arbitrary values to segment registers. Such use of
segment registers is only necessary with data structures that are larger
than 64 kB and thus don't fit into a single segment. This is usually dealt
with by the huge memory model in compilers.

In this model, compilers generate address arithmetic involving segment
registers. A solution which is portable to protected mode with almost the
same efficiency would involve using a table of segments instead of
calculating new segment register values ad hoc. After processor reset, all
processors start in real mode. Protected mode has to be enabled by
software. On the 80286 there exists no documented way back to real
mode apart from resetting the processor. Later processors allow switching
back to real mode by software.

3-3.3. Virtual Mode (V86)


The Intel 80386 and later processors provide a virtual 86 mode (V86) to
facilitate the transition from real mode to protected mode. The 80386
processors support execution of one or more 8086/88 programs (DOS
sessions with 1 MB memory) in a protected-mode environment.

This means that the 80386 microprocessor can execute multiple DOS
programs in V86 mode. An 8086/8088 program runs in this environment
as part of a V86 task. V86 tasks take advantage of the hardware support
of multitasking offered by the protected mode. So, not only can there be
multiple V86 tasks, each one executing an 8086 program, but V86 tasks
can be multi-programmed with other 80386 tasks.

The VM flag bit in the EFLAGS register selects virtual mode operation
in protected mode. Once this mode is entered, any attempt to access
memory beyond 1 MB will result in an error. Thus, the purpose of a V86
task is to form a "virtual machine" with which we can execute an 8086
program. A complete virtual machine consists not only of 80386
hardware but also of system software. Thus, the emulation of an 8086 is
the result of cooperation between hardware and software.


The roles of hardware and software components are as follows:

• The hardware provides a virtual set of registers, via the task-state
segment (TSS), a virtual memory space (the first 1 MB of the linear
address space of the task), and directly executes all instructions that
deal with these registers and with this address space.
• The software controls the external interfaces of the virtual machine
(I/O, interrupts, and exceptions) in a manner consistent with the
larger environment in which it executes. In the case of I/O,
software can choose either to emulate I/O instructions or to let the
hardware execute them directly without software intervention.

3-3.4. Long & Legacy Modes in x86-64 Processors


A processor implementing x86-64 (the AMD64 architecture) can run in
either long mode or legacy mode. Long mode, in turn, has two sub-modes:
64-bit mode and compatibility mode. In 64-bit mode, 64-bit applications
(or operating systems) can access the 64-bit instructions and registers,
while 32-bit and 16-bit programs are executed in the compatibility
sub-mode.

Legacy mode is an operating mode of the x86-64 architecture in which
existing 16-bit and 32-bit applications and operating systems run without
modification. Legacy mode has three sub-modes: real mode, protected
mode, and virtual-8086 mode, as shown in figure 3-1(c).

[Figure: tree of the x86-64 (AMD64) operation modes, with the node labels Legacy Modes, 64-bit Mode, Real Mode, Virtual Mode, Protected Mode, Long Mode and Compatibility Mode]

Fig. 3-1(c). Illustration of the operation modes in x86-64 processors.


As shown in figure 3-1(d), the 64-bit mode of x86-64 utilizes the flat
memory model. In fact, most of the modern operating systems
neglect the segmentation features available in the legacy x86
architecture. Instead, operating systems handle segmentation
functions entirely in software.

3-4. Memory Addressing in x86 Microprocessors


As mentioned earlier, the virtual memory space of a
microprocessor system consists of the entire address space available to
programs. It is a large linear-address space that is translated by a
combination of hardware and operating-system software to a smaller
physical-address space, parts of which are located in memory and parts
on disk or other external storage media. Operating systems have used
segmented memory as a method to isolate programs from the data
they used to increase the reliability of systems running multiple
programs simultaneously. In such segmented memory systems, the
memory space is divided into data, code and stack segments and the
information to be stored in memory, is categorized and mapped into these
segments.

Fig. 3-1(d). Virtual memory space in x86-64 microprocessors.


We have seen that the 8086/80286 microprocessors have 4
segment registers (CS, DS, SS, and ES). The 80386 and later processors
have 2 additional 16-bit segment registers (FS and GS). The beginning of the
different segments may be directly contained in the segment registers (in
real mode), as shown in figure 3-2(a), or indirectly calculated from their
contents (in protected mode). The question now is how the
microprocessor can point to a certain memory location within one of
these segments. In order to point to a physical memory location within a
segment, the microprocessor adds an offset to the segment address, using
a certain addressing mechanism.

3.4.1. Real Mode Addressing (20-bit Address)


As mentioned above, the physical address in real mode is determined
from the segment address and the offset address. Figure 3-2(b) depicts
how the 8086/88 shifts the segment address left by 4 binary bits and then
adds the offset address, to obtain the memory location of a given byte in
a physical memory of 1 MB. The resultant 20-bit linear address is
therefore given by:

Linear (20-bit) Address = Segment Address x 16 + Offset (3.1)
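
Equation (3.1) can be checked with a few lines of code. The following is a minimal sketch in Python, not part of the original text; the function name and the sample register values are hypothetical, and the 20-bit mask models the wrap-around of the 8086/8088 address bus (A0-A19).

```python
def real_mode_linear_address(segment: int, offset: int) -> int:
    """Equation (3.1): 20-bit address = segment * 16 + offset."""
    if not 0 <= segment <= 0xFFFF or not 0 <= offset <= 0xFFFF:
        raise ValueError("segment and offset must be 16-bit values")
    # Shifting left by 4 bits is the same as multiplying by 16.
    # The mask keeps only the 20 address lines of the 8086/8088.
    return ((segment << 4) + offset) & 0xFFFFF


if __name__ == "__main__":
    # Hypothetical CS:IP pair 1234H:0010H
    print(f"{real_mode_linear_address(0x1234, 0x0010):05X}")  # 12350
```

Note that many different Segment:Offset pairs map to the same 20-bit address, because consecutive segment values start only 16 bytes apart.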

Note that, within the 1 MB memory space of the 8086/8088 processors, the
20-bit linear address is identical to the 20-bit physical address in memory.
As shown in figure 3-2(c), the offset into the code segment is taken from
the IP register. Therefore, when the processor wants to fetch a new
instruction from the code segment, it adds the IP content (offset) to the
CS content (code segment address), shifted as in equation (3.1), to point
to that instruction. After executing the instruction, the microprocessor
needs to point to the next one. To obtain its address, the processor
increments the IP register (by the length of the last instruction in bytes)
and again adds the IP content to the shifted CS content, and so on.
Table 3.1 lists the segment registers and the registers that hold their
corresponding offsets. Note that the data segment, addressed through DS,
is usually used to store program variables.

A variable name is actually an address, which represents the location
(offset) of the variable within a data segment. When the processor wants
to access a variable in a data segment, it simply adds the offset
represented by the variable name to the DS content, which points to the
data segment. More details will be given after the memory addressing
modes.


(Figure labels: memory addresses run from 00000 to FFFFF; the CS, DS, SS
and ES registers point to the CODE, DATA, STACK and EXTRA segments,
respectively.)

Fig. 3-2(a). Memory segmentation in x86 systems. In real mode, each segment
register points directly to the beginning of corresponding segment in memory.

In real mode, the resulting offset is a 16-bit value that is sometimes
called the effective address (EA)¹. The 16-bit addressing modes have no
scale factor, so:

EA = Base (BX or BP) + Index (SI or DI) + Displacement (3.2)

The combination of the segment address and offset address is usually
written in the form Segment:Offset. For instance, a location in the code
segment may be referred to as CS:IP.
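
As an illustration, the 16-bit effective address and the Segment:Offset notation can be sketched as follows. This Python fragment is only a sketch under stated assumptions: the function names and the register values (DS = 2000H, BX = 1000H, SI = 0020H, displacement 0005H) are hypothetical, and the sum is truncated to 16 bits because the offset wraps around within a 64 KB segment.

```python
def effective_address_16(base: int = 0, index: int = 0, displacement: int = 0) -> int:
    """EA = base (BX or BP) + index (SI or DI) + displacement, kept to 16 bits."""
    return (base + index + displacement) & 0xFFFF


def format_logical(segment: int, offset: int) -> str:
    """Write a logical address in the usual Segment:Offset form."""
    return f"{segment:04X}:{offset:04X}"


if __name__ == "__main__":
    ea = effective_address_16(base=0x1000, index=0x0020, displacement=0x0005)
    print(format_logical(0x2000, ea))  # 2000:1025
```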

Table 3.1. Segment registers and their typical offsets in x86 microprocessors

SEGMENT   OFFSET (EA)                        USE
CS        IP                                 Instruction address
DS        BX, DI, SI or any 16-bit number    Data address
SS        BP, SP                             Stack address
ES        DI (string instructions)           String destination

¹ Intel manuals sometimes call this combination the effective address (EA)
and sometimes call it the offset (when they discuss the assembly language).

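(Figure labels: the content of a segment register (CS, DS, SS or ES) is
extended on the right with 0000 and added to the OFFSET to form the
physical address.)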

Fig. 3-2(b). Addressing a Memory Location inside a Segment, by adding an offset.

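(Figure labels: in a memory running from 00000 to FFFFF, CS with IP points
to the next instruction in the CODE segment (program code); DS with an
offset (EA) points to a data item in the DATA segment (program variables);
SS with SP points to the top of stack in the STACK segment (subroutine
parameters).)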

Fig. 3-2(c). Addressing a memory location inside a segment


To provide more flexible addressing, the data offset address may be
formed by adding the content of the BX or BP register, the content of the
SI or DI register, and a displacement. The segment and offset addressing
scheme allows programs to be relocated in the memory system. The
executable (*.exe) programs under the MS-DOS and Windows operating
systems are relocatable programs. A relocatable program can be placed in
any area of memory and executed without any change. So, relocatable
programs that are written for real mode can also operate in protected
mode.

3-4.2. Protected-Mode Addressing (32-bit Address)


A virtual address is specified by 2 components: a selector and an offset.
So, a complete pointer in this address space consists of two parts:

1. A segment selector, which is used to identify the segment base
address. The selector is contained in a 16-bit segment register.
2. An offset, which is a 32-bit value that addresses a byte within the
segment.

Offset-Address Generation: Figure 3-3 depicts how the 80386 and later
processors generate a 32-bit address. As shown, the 32-bit offset, or
effective address (EA), is given by the summation below. This summation
is similar to equation (3.2), except that the base and index registers are
32 bits wide in the 80386 and later processors, and the index may be
multiplied by a scale factor of 1, 2, 4 or 8.

EA (offset) = Base + Index * Scale + Displacement (3.3)

Segment Base Address Generation: As mentioned above, the segment
registers do not directly contain the segment base address in protected
mode. Instead, each holds a 16-bit segment selector, whose upper 14 bits
locate a segment descriptor, as shown in figure 3-4. The descriptor tables
hold 64-bit (8-byte) segment descriptors, and each descriptor contains the
base address of its segment. The index (13-bit) and table-indicator (TI)
bit fields tell the CPU where to find the segment descriptor and in which
descriptor table. For instance, TI=0 means the global descriptor table
(GDT), and an index field of 13A7H means entry number 13A7H (5031
decimal) of the GDT. Alternatively, TI=1 means that the segment
descriptor is located in the local descriptor table (LDT) that is currently
active. The 2-bit RPL field designates the requested privilege level (4
levels are possible). Figures 3-4(c) and 3-4(d) depict how the 32-bit
effective address (offset) is combined with the segment base address found
through the selector to obtain the linear address in protected mode.
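
The field layout of figure 3-4(a) can be made concrete with a short decoding routine. The Python sketch below is an illustration only; the function name and dictionary keys are hypothetical. It treats 13A7H as a complete 16-bit selector value (as it appears in figure 3-4(c)), which is an assumption about how that value is meant to be read.

```python
def decode_selector(selector: int) -> dict:
    """Split a 16-bit selector into Index (bits 15-3), TI (bit 2) and RPL (bits 1-0)."""
    if not 0 <= selector <= 0xFFFF:
        raise ValueError("a selector is a 16-bit value")
    index = selector >> 3
    ti = (selector >> 2) & 0x1
    return {
        "index": index,
        "table": "LDT" if ti else "GDT",
        "rpl": selector & 0x3,
        # Each descriptor is 8 bytes long, so this is its byte offset in the table.
        "descriptor_offset": index * 8,
    }


if __name__ == "__main__":
    fields = decode_selector(0x13A7)
    print(fields["index"], fields["table"], fields["rpl"])  # 628 LDT 3
```

Read this way, the selector 13A7H designates entry 628 of the LDT with a requested privilege level of 3.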


Fig. 3-3(a). Generation of the 32-bit address offset (or effective address) in 80386
and later processors.

Fig. 3-3(b). Segmented address generation in real and protected modes.


Segment Selector (16 bits)

Bits 15-3    Bit 2    Bits 1-0
Index        TI       RPL

(Index and TI together form the 14-bit selector field.)

Figure 3-4(a). Segment selector architecture.

Segment Descriptor (64 bits)

Bits     Field
63-56    Base Address A24 - A31
55-52    Flags (G, D, 0, AVL)
51-48    Limit L16 - L19
47-40    Access rights
39-16    Base Address A0 - A23
15-0     Limit L0 - L15

Figure 3-4(b). Fields in a descriptor table (segment descriptor).

(Figure labels: the selector 13A7 picks a segment descriptor from the
descriptor table; the descriptor supplies the segment base address
032DD000, to which the offset 0010F405 is added to reach a location inside
the segment, within a memory running from 00000000 to FFFFFFFF.)

Figure 3-4(c). Linear address generation mechanism in protected memory.


A descriptor is a block of memory that describes the characteristics of a
given element of the system. These characteristics include the segment
base address, the segment size limit, the access rights, and the privilege
level. Figure 3-4(b) depicts the fields of a segment descriptor. As shown,
the segment base address (A0-A31) is split across 2 fields. The access
rights field occupies 8 bits and is divided into 3 subfields. The descriptor
privilege level (DPL) is one of these subfields. The DPL may differ from
the RPL, which is located in the segment selector.
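
To see how the split base and limit fields are reassembled, consider the following Python sketch. It assumes the standard 80386 descriptor layout (limit in bits 15-0 and 51-48, base in bits 39-16 and 63-56, access rights in bits 47-40 with the DPL in bits 46-45). The function name and the sample descriptor value are hypothetical; the sample was built by hand so that its base equals the 032DD000 value shown in figure 3-4(c).

```python
def unpack_descriptor(desc: int) -> dict:
    """Reassemble base, limit and DPL from a 64-bit segment descriptor."""
    limit = (desc & 0xFFFF) | (((desc >> 48) & 0xF) << 16)          # L0-L15, L16-L19
    base = ((desc >> 16) & 0xFFFFFF) | (((desc >> 56) & 0xFF) << 24)  # A0-A23, A24-A31
    access = (desc >> 40) & 0xFF                                    # 8-bit access rights
    return {
        "base": base,
        "limit": limit,
        "access": access,
        "dpl": (access >> 5) & 0x3,  # DPL is a 2-bit subfield of the access rights
    }


if __name__ == "__main__":
    sample = 0x0340922DD000FFFF  # hypothetical descriptor, base = 032DD000H
    d = unpack_descriptor(sample)
    # Linear address = segment base + 32-bit offset (offset taken from figure 3-4(c))
    print(f"{(d['base'] + 0x0010F405) & 0xFFFFFFFF:08X}")  # 033EC405
```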

A 14-bit selector field implies up to 16K segment descriptors (and hence
16K memory segments). Since each segment can have an offset of 32 bits
(4 GB of address space), an upper limit of 16K x 4 GB = 64 TB of virtual
address space is theoretically possible. This limit is not currently
supported by any existing operating system.

Figure 3-4(d). Combining the 32-bit effective address (offset) with the segment
selector to obtain the linear address in protected memory


3-4.3. Memory Paging in Protected Mode


As pointed out earlier, memory paging is an addressing mechanism (for
the 80386 and later processors) that allows any linear address to be
mapped to any physical memory location. As stated before, the linear
address is the address generated by the CPU by combining the segment
base address (found through the segment selector) with the effective
address in your program. The linear address can then be translated to a
physical address by the paging mechanism. The paging mechanism allows
arbitrary mapping of 4 KB memory blocks (pages), through translation
tables stored in memory.

The so-called page directory is a special page of memory that holds the
first level of translation table entries. It has up to 1024 entries, each 4
bytes long. The paging mechanism uses the control register CR3 of the
microprocessor (the page directory base register) to hold the physical
address of the page directory.

The whole linear address space of 4 GB is thus divided into 1024 regions
of 4 MB each, one per page directory entry. The leftmost 10 bits of a
linear address select the page directory entry, which points to a page
table; the next 10 bits select one of the 1024 entries of that page table,
each of which maps one 4 KB page.

Page Directory Table Entry

Bits     Field
31-12    Page Table Address
11-9     Reserved
8-7      0 0
6        D
5        A
4-3      0 0
2        U/S
1        R/W
0        P

• A: Accessed bit (set to 1 whenever the microprocessor accesses this page entry)
• D: Dirty bit (used by the operating system)
• P: Present bit (set to 1 whenever the page entry can be used in translation)
• R/W: Read/write bit (used in the page protection scheme)
• U/S: User/supervisor bit (used with the R/W bit to define the page protection level)

Figure 3-5(a). One of the page directory records (translation table entry).

A few entries of the translation tables are cached in the Translation
Look-aside Buffer (TLB) of the MMU, to avoid excessive memory accesses.
In the 80386, the TLB holds the 32 most recently used page table entries
in a small dedicated cache inside the processor.

The paging mechanism also makes it possible to place memory in address
ranges where no physical memory exists (e.g., in the unused gaps between
the VIDEO BIOS and system BIOS ROMs). For instance, the EMM386.EXE
program uses paging to remap extended memory, in 4 KB blocks or pages,
into the unused areas above the 640 KB DOS limit.

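Linear Address

Bits 31-22         Bits 21-12     Bits 11-0
Directory Index    Table Index    Offset

(Figure labels: the Directory Index selects one of the entries 0-1023 of
the Page Directory Table, which supplies the Page Table Address; the Table
Index selects one of the entries 0-1023 of that Page Table, which supplies
the Page Address (bits 31-12); the Offset (0-4095) selects the byte within
the Page Frame, giving the Physical Address.)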

Figure 3-5(b). Illustration of the paging mechanism
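
The two-level translation of figure 3-5(b) can be sketched in a few lines of Python. This is a simplified model, not the processor's actual implementation: the page directory and page tables are represented by plain dictionaries, only the Present bit is checked, and all names and sample addresses are hypothetical.

```python
PAGE_PRESENT = 0x1  # P bit (bit 0) of a directory or table entry


def split_linear(linear: int) -> tuple:
    """Return (directory index, table index, offset) of a 32-bit linear address."""
    return (linear >> 22) & 0x3FF, (linear >> 12) & 0x3FF, linear & 0xFFF


def translate(linear: int, page_directory: dict, page_tables: dict) -> int:
    """Walk the page directory and one page table to get the physical address.

    page_directory maps a directory index to a 32-bit directory entry.
    page_tables maps a page table's physical address to a dict of its entries.
    """
    dir_index, table_index, offset = split_linear(linear)
    pde = page_directory.get(dir_index, 0)
    if not pde & PAGE_PRESENT:
        raise LookupError("page fault: directory entry not present")
    table = page_tables[pde & 0xFFFFF000]   # bits 31-12: page table address
    pte = table.get(table_index, 0)
    if not pte & PAGE_PRESENT:
        raise LookupError("page fault: page not present")
    return (pte & 0xFFFFF000) | offset      # bits 31-12: page address


if __name__ == "__main__":
    directory = {1: 0x00010000 | PAGE_PRESENT}
    tables = {0x00010000: {3: 0x00ABC000 | PAGE_PRESENT}}
    print(f"{translate(0x00403004, directory, tables):08X}")  # 00ABC004
```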

3-4.4. Protection Aspects


The purpose of the protection features is to help detect and identify bugs,
particularly in sophisticated programs that may consist of multiple
modules. To help make applications more robust and faster to debug, the
80386 and later processors contain mechanisms that verify memory accesses
and instruction execution against protection criteria. These mechanisms
may be used or not, according to the system requirements. Protection has
five aspects:

• Restriction of addressable domain
• Segment type checking
• Segment limit checking
• Restriction of instruction set
• Restriction of procedure entry points

The protection hardware is an integral part of the memory management
hardware in the 80386 and later processors. Protection applies both to
segment translation and to page translation. Each reference to memory is
checked by the hardware to verify that it satisfies the protection criteria.
All the above checks are made before the memory cycle is started; any
violation prevents that cycle from starting and results in an exception.
The segment descriptors store the protection parameters. The CPU performs
protection checks automatically when the selector of a segment descriptor
is loaded into a segment register and with every segment access. The
protection parameters are placed in the descriptor by systems software at
the time a descriptor is created. In general, application programmers do
not need to be concerned about protection parameters.


3-4.5. Privilege Levels


The concept of privilege is implemented by assigning a value from zero
to three to key objects recognized by the processor. This value is called
the privilege level. The value zero represents the greatest privilege; the
value three represents the least privilege, as shown in figure 3-6. The
following processor-recognized objects contain privilege levels:
• Descriptors contain a field called the descriptor privilege level
(DPL).
• Selectors contain a field called the requestor privilege level (RPL).
The RPL is intended to represent the privilege level of the
procedure that originates a selector. In particular, the two low-order
bits of the selector held in the CS register give the privilege level
of the currently executing routine, which is called the current
privilege level (CPL).
• An internal processor register records the current privilege level
(CPL). Normally the CPL is equal to the DPL of the segment that
the processor is currently executing. The CPL changes as control is
transferred to other segments with different DPLs.

(Figure labels: concentric privilege rings numbered 0 to 3, from the most
secure ring 0, which holds the OS kernel, out to the least secure ring 3,
which holds applications.)

Fig. 3-6. Privilege levels, in 80386 (and later processors) systems.

The processor automatically evaluates the right of a procedure to access
another segment by comparing the CPL to one or more other privilege
levels. The evaluation is performed at the time the selector of a descriptor
is loaded into a segment register. The criterion used for evaluating access
to data differs from that used for evaluating transfers of control to
executable segments.
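
For data segments, the comparison can be sketched as follows. The rule used here is not spelled out in the text above; it is the usual data-access rule of the x86 protection model, stated as an assumption: access is granted when the DPL of the target segment is numerically greater than or equal to both the CPL and the RPL. The function name is hypothetical, and transfers of control follow different rules that are not modeled.

```python
def data_access_allowed(cpl: int, rpl: int, dpl: int) -> bool:
    """Data-segment check: the less privileged of CPL and RPL must not exceed DPL.

    Privilege levels run from 0 (most privileged) to 3 (least privileged),
    so a numerically larger value means less privilege.
    """
    for level in (cpl, rpl, dpl):
        if level not in (0, 1, 2, 3):
            raise ValueError("privilege levels range from 0 to 3")
    return max(cpl, rpl) <= dpl


if __name__ == "__main__":
    print(data_access_allowed(cpl=3, rpl=3, dpl=3))  # True: application data
    print(data_access_allowed(cpl=3, rpl=0, dpl=0))  # False: application touching kernel data
```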

3-4.6. Entering and Leaving Protected & Real-Address Modes


Real-address mode comes into effect when the CPU is powered up or
after a signal on the RESET pin. Even if the system is going to be used in
protected mode, the start-up program executes in real-address mode
temporarily while initializing for protected mode. The processor enters
protected mode when a MOV to CR0 instruction sets the protection
enable (PE) bit in CR0 to 1. For compatibility with the 80286, the LMSW
instruction may also be used to set the PE bit.

To switch back to real-address mode, the software should clear the
PE bit in CR0 with a MOV to CR0 instruction. A procedure that attempts
to do this, however, should proceed as follows:

1. If paging is enabled (PG=1), perform the following sequence:
   • Transfer control to linear addresses that have an identity mapping; i.e.,
     linear addresses equal physical addresses.
   • Clear the PG bit in CR0.
   • Move zeros to CR3 to clear out the paging cache.
2. Transfer control to a segment that has a limit of 64K (FFFFH). This
loads the CS register with the limit it needs to have in real mode.
3. Load segment registers SS, DS, ES, FS, and GS with a selector that
points to a descriptor containing the following values, which are
appropriate to real mode:
   • Limit = 64K (FFFFH)
   • Byte granular (G = 0)
   • Expand up (E = 0)
   • Writable (W = 1)
   • Present (P = 1)
   • Base (any value)
4. Disable interrupts. A CLI instruction disables INTR interrupts. NMI
can be disabled with external circuitry.
5. Clear the PE bit.
6. Jump to the real-mode code to be executed using a far JMP. This
action flushes the instruction queue and puts appropriate values in the
access rights of the CS register.
7. Use the LIDT instruction to load the base and limit of the real-mode
interrupt vector table.
8. Enable interrupts.
9. Load the segment registers as needed by the real-mode code.
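
The CR0 manipulations in the procedure above can be modeled on a plain integer. The Python sketch below only illustrates the order in which the PG and PE bits are cleared (steps 1 and 5); it does not model the control transfers, descriptor loads or interrupt handling of the other steps. It relies on PE being bit 0 and PG being bit 31 of CR0; the function names are hypothetical.

```python
CR0_PE = 1 << 0    # protection enable bit
CR0_PG = 1 << 31   # paging enable bit
MASK32 = 0xFFFFFFFF


def enter_protected_mode(cr0: int) -> int:
    """Value to write to CR0 in order to switch on protected mode."""
    return (cr0 | CR0_PE) & MASK32


def cr0_writes_to_real_mode(cr0: int) -> list:
    """Successive CR0 values written while going back to real-address mode."""
    writes = []
    if cr0 & CR0_PG:                  # step 1: paging must be switched off first
        cr0 &= ~CR0_PG & MASK32
        writes.append(cr0)
    cr0 &= ~CR0_PE & MASK32           # step 5: clear the PE bit
    writes.append(cr0)
    return writes


if __name__ == "__main__":
    for value in cr0_writes_to_real_mode(0x80000011):
        print(f"{value:08X}")  # 00000011, then 00000010
```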


3-4.7. Protected Multitasking


To provide efficient, protected multitasking, the 80386 and later
processors employ the following special registers and data structures:

• Task state segment (TSS)
• Task state segment descriptor
• Task register (TR)
• Task gate descriptor (TG)

With these structures, the 80386 and later processors can rapidly switch
execution from one task to another, saving the context of the original task
so that the task can be restarted later.

3-4.8. Entering and Leaving the Virtual Mode


As stated before, virtual mode is entered by setting the VM flag, which
is bit 17 of the EFLAGS register, to logic 1 (VM = 1). At privilege level 0
(the highest privilege), this flag is set by executing the IRET instruction;
ordinary instructions such as POPF cannot set it.

3-4.9. Physical Address Extension (PAE)


By default, physical addresses are 32 bits wide, so the size of memory in
protected mode is usually limited to 4 GB. However, there is a paging
extension mode called Physical Address Extension (PAE), first added in
the Intel Pentium Pro, which provides an additional 4 bits of physical
addressing (36 bits, or 64 GB). Through the processor's page and segment
memory management systems, x86 operating systems may thus be able to
access more than 32 bits of physical address space, even without switching
over to the 64-bit paradigm. This mode does not change the length of
segment offsets or linear addresses; those are still only 32 bits.

3-4.10. Long Mode Addressing in x86-64 Architecture²


As pointed out earlier, the virtual memory of a microprocessor system
consists of the entire address space available to programs. Figure 3-7
shows how the virtual-memory space is treated in the two sub-modes of
long mode:
• 64-bit mode: This mode uses a flat segmentation model of virtual
memory. The 64-bit virtual memory space is treated as a single, flat
(unsegmented) address space. Program addresses access locations that can
be anywhere in the linear 64-bit address space. The operating system can
use separate selectors for code, stack, and data segments for memory
protection, but the base address of all these segments is always 0.
• Compatibility mode: This mode uses a protected, multi-segment model
of virtual memory, just as in legacy protected mode. The 32-bit
virtual-memory space is treated as a segmented set of address spaces for
code, stack, and data segments, each with its own base address and
protection parameters. A segmented space is specified by adding a segment
selector to an address.

² Since AMD64 and Intel 64 are substantially similar, many software and
hardware products use one vendor-neutral term to indicate their support for
both implementations. AMD's original designation for this processor
architecture, "x86-64", is still sometimes used for this purpose.

Although virtual addresses are 64 bits wide in 64-bit mode, current
implementations do not allow the entire virtual address space of 2^64 bytes
to be used. Most operating systems and applications will not need such a
large address space for the foreseeable future. For example, 64-bit
Windows populates only 16 TB (44 bits) of it, so supporting such wide
virtual addresses would simply increase the complexity and cost of address
translation with no real benefit. AMD therefore decided that, in the first
implementations of the x86-64 architecture, only the least significant
48 bits of a virtual address would actually be used in address translation.
However, bits 48 through 63 of any virtual address must be copies of bit
47, or the processor will raise an exception.

Fig. 3-7. Structure in x86-64-bit virtual memory.


Addresses complying with the rule that bits 48 through 63 copy bit 47 are
said to be in canonical form. Canonical form addresses run from 0 through
00007FFF`FFFFFFFF, and from FFFF8000`00000000 through
FFFFFFFF`FFFFFFFF, for a total of 2^48 bytes or 256 terabytes of usable
virtual address space.
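
The canonical-form rule is easy to test in software. The following Python sketch (the function name and the default of 48 implemented address bits are assumptions for illustration) checks that bits 47 through 63 of an address are either all zeros or all ones.

```python
def is_canonical(address: int, va_bits: int = 48) -> bool:
    """True if bits va_bits..63 of a 64-bit address are copies of bit va_bits-1."""
    if not 0 <= address <= 0xFFFFFFFFFFFFFFFF:
        raise ValueError("address must be a 64-bit value")
    upper = address >> (va_bits - 1)           # bits 47..63 when va_bits = 48
    all_ones = (1 << (64 - va_bits + 1)) - 1   # 17 one-bits when va_bits = 48
    return upper == 0 or upper == all_ones


if __name__ == "__main__":
    print(is_canonical(0x00007FFFFFFFFFFF))  # True: top of the lower half
    print(is_canonical(0x0000800000000000))  # False: bit 47 set, bits 48-63 clear
    print(is_canonical(0xFFFF800000000000))  # True: bottom of the upper half
```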

The 64-bit addressing mode is a superset of Physical Address Extension
(PAE). Because of this, page sizes may be 4 KB (2^12 bytes), 2 MB
(2^21 bytes), or 1 GB (2^30 bytes). However, systems running in long mode
use four levels of page table rather than the three of PAE: the
Page-Directory Pointer Table of PAE is extended from 4 entries to 512, and
an additional Page-Map Level 4 Table is added, containing 512 entries in
48-bit implementations. A full mapping hierarchy of 4 KB pages for the
whole 48-bit space would take a bit more than 512 GB of RAM.
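
The figures quoted above can be verified with a short calculation. The Python sketch below assumes 4 KB pages, 8-byte table entries and 512 entries per table at each of the four levels; the function names and dictionary keys are hypothetical.

```python
def split_long_mode_address(va: int) -> dict:
    """Split a 48-bit virtual address into four 9-bit table indices and a 12-bit offset."""
    return {
        "pml4_index": (va >> 39) & 0x1FF,  # Page-Map Level 4 Table
        "pdpt_index": (va >> 30) & 0x1FF,  # Page-Directory Pointer Table
        "pd_index": (va >> 21) & 0x1FF,    # Page Directory
        "pt_index": (va >> 12) & 0x1FF,    # Page Table
        "offset": va & 0xFFF,              # byte within the 4 KB page
    }


def full_hierarchy_bytes() -> int:
    """Memory taken by every table needed to map the whole 48-bit space."""
    table_bytes = 512 * 8                    # 512 entries of 8 bytes = 4 KB
    tables = 1 + 512 + 512**2 + 512**3       # PML4, PDPTs, PDs, PTs
    return tables * table_bytes


if __name__ == "__main__":
    print(full_hierarchy_bytes() / 2**30)  # slightly above 513 (GB)
```

The page tables alone account for 512 GB; the three upper levels add about 1 GB more.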

3-4.11. Long Mode Memory Management


Figure 3-8 shows the flow of memory management functions
performed in the two sub-modes of long mode.

Fig. 3-8. Memory management in long operation sub-modes

The following table depicts the register usage in legacy and 64-bit
operation modes.


Table 3-2. Register usage in legacy and 64-bit operation modes

The following operating systems and releases support the x86-64
architecture in long mode:

• DOS: It is possible to enter long mode under DOS with a DOS
extender similar to DOS/4GW.
• BSD: Many versions of the Berkeley Software Distribution, such as
FreeBSD, NetBSD and OpenBSD, have supported the x86-64 architecture
since 2004.
• Linux: Linux was the first operating system kernel to run the x86-64
architecture in long mode, starting with the 2.4 version, prior to the
physical hardware's availability.
• Mac OS X: Mac OS X v10.4.7 and higher versions support 64-bit
command-line tools using the POSIX and math libraries.
• Solaris: Solaris 10 (from Sun Microsystems) and later releases
support the x86-64 architecture.
• Windows: Windows XP Professional x64 Edition, as well as the x64
editions of Windows Server 2003 and Windows Vista, support the x86-64
architecture.

3-4.12. RIP-Relative Addressing


RIP-relative addressing—that is, addressing relative to the 64-bit
instruction pointer — is available in 64-bit mode. The effective address
(EA) is formed by adding the displacement to the 64-bit RIP of the next
instruction. In the legacy x86 architecture, addressing relative to the
instruction pointer (IP or EIP) is available only in control-transfer
instructions.
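
As a small illustration, the RIP-relative effective address can be computed as follows. This Python sketch assumes a signed 32-bit displacement, as encoded in the instruction, and a hypothetical instruction address and length; the function name is also hypothetical.

```python
def rip_relative_ea(instruction_address: int, instruction_length: int, disp32: int) -> int:
    """EA = RIP of the next instruction + sign-extended 32-bit displacement."""
    # Sign-extend the 32-bit displacement field.
    displacement = disp32 - (1 << 32) if disp32 & 0x80000000 else disp32
    next_rip = instruction_address + instruction_length
    return (next_rip + displacement) & 0xFFFFFFFFFFFFFFFF


if __name__ == "__main__":
    # Hypothetical 7-byte instruction at 401000H with displacement +10H
    print(f"{rip_relative_ea(0x401000, 7, 0x00000010):X}")  # 401017
```

Because the address is expressed relative to the instruction itself, the same code bytes work wherever the program is loaded.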


3-5. Stack Operation


A stack is a contiguous array of identical items of any type, arranged
linearly with access at one end only, called the top. This means that data
can be added to or removed from the top only. Formally, this type of stack
is called a Last In, First Out (LIFO) stack. Think of a stack as a pile of
papers on your desk, to which papers can be added only on top. You
generally keep the things that you are working on toward the top, and you
take things off as you finish working with them.

Figure 3-9. A Stack of papers

Your computer has a stack, too. The computer’s stack is located at the
very top addresses of memory. Data is added to the stack using the Push
operation, and removed using the Pop operation. The stack is contained
in a memory segment and identified by the segment selector in the SS
register. A stack can be up to 4 GB long, the maximum size of a segment.
When the flat memory model is used, the stack can be located anywhere in
the linear address space that is dedicated to the program. As shown in
the following figure, you can push data onto the top of the stack with the
PUSH instruction, which pushes either a register or a memory value onto
the top of the stack. We call it the top, but the "top" of the stack is
actually the lowest address of the stack memory in use. This is confusing
because when we think of a stack of anything, like papers, we think of
adding to and removing from the top of it. In memory, however, the stack
starts at the top of memory and grows downward, due to architectural
considerations. Therefore, when we refer to the "top of the stack",
remember that it is at the bottom of the stack’s memory. We can keep
pushing data onto the stack and it will keep growing down in memory until
it hits the program code or data. This condition is called Stack Overflow.
You can also pop values off the top using the POP instruction. This
removes the top value from the stack and places it into a register or
memory location of your choice.


When an item is pushed onto the stack, the processor decrements the ESP
register, then writes the item at the new top of stack. When an item is
popped off the stack, the processor reads the item from the top of stack,
then increments the ESP register. In this manner, the stack grows down in
memory (to lesser addresses) when items are pushed on the stack and
shrinks up (to greater addresses) when the items are popped from stack.
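
The behavior described above, including the stack overflow condition, can be modeled in a few lines. The Python class below is a simplified sketch: memory is a dictionary, every item is a 32-bit doubleword, and the class name, the `limit` parameter (the lowest address the stack may reach) and the sample addresses are all hypothetical.

```python
class Stack32:
    """Toy model of a downward-growing stack addressed by ESP."""

    def __init__(self, top: int, limit: int):
        self.esp = top        # initial stack pointer (highest address)
        self.limit = limit    # lowest address that still belongs to the stack
        self.memory = {}

    def push(self, value: int) -> None:
        new_esp = self.esp - 4            # decrement first ...
        if new_esp < self.limit:
            raise OverflowError("stack overflow")
        self.esp = new_esp
        self.memory[self.esp] = value & 0xFFFFFFFF   # ... then write at the new top

    def pop(self) -> int:
        value = self.memory.pop(self.esp)  # read from the top first ...
        self.esp += 4                      # ... then increment
        return value


if __name__ == "__main__":
    stack = Stack32(top=0x1000, limit=0x0FF8)
    stack.push(0x11111111)
    stack.push(0x22222222)
    print(f"{stack.esp:04X}")    # 0FF8
    print(f"{stack.pop():08X}")  # 22222222
```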

Fig. 3-10. Stack segment structure

A program or operating system executive can set up many stacks. For
example, in multitasking systems, each task can be given its own stack.
The number of stacks in a system is limited by the maximum number of
segments and the available physical memory. When a system sets up
many stacks, only one stack (the current stack) is available at a time.
The current stack is the one contained in the segment referenced by the
SS register. For example, when the ESP register is used as a memory
address, it automatically points to an address in the current stack.


3-5.1. Setting-up a Stack


To set up a stack and establish it as the current stack, the program or
operating system/executive must do the following:

1. Establish a stack segment (or a stack frame, using the ENTER instruction).
2. Load the segment selector for the stack segment into the SS register
using a MOV, POP, or LSS instruction.
3. Load the stack pointer for the stack into the ESP register using a MOV,
POP, or LSS instruction. The LSS instruction can be used to load the SS
and ESP registers in one operation.

Figure 3-11. Stack frame


3-5.2. Stack Operations


You cannot write assembly-language functions without understanding
how the computer’s stack works. Each computer program that runs uses a
region of memory as a stack to enable functions to work properly. For IA-
32 processors from the Intel 286 on, the PUSH ESP instruction pushes
the value of the ESP register as it existed before the instruction was
executed. (This is also true for Intel64 architecture, real-address and
virtual-8086 modes of IA-32 architecture.) For the Intel® 8086 processor,
the PUSH SP instruction pushes the new value of the SP register (that is
the value after it has been decremented by 2).

Similarly, the POP ESP instruction increments the stack pointer (ESP)
before data at the old top of stack is written into the destination.
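
The difference between the 8086 and the later processors when pushing the stack pointer itself can be shown with a small model. The Python sketch below uses a dictionary as memory and a 16-bit stack pointer for both cases; the function name and the `is_8086` flag are hypothetical.

```python
def push_sp(sp: int, memory: dict, is_8086: bool) -> int:
    """Model PUSH SP on a 16-bit stack and return the new stack pointer.

    On the 8086 the value written is SP after the decrement;
    on the 286 and later it is SP as it was before the instruction.
    """
    new_sp = (sp - 2) & 0xFFFF
    memory[new_sp] = new_sp if is_8086 else sp
    return new_sp


if __name__ == "__main__":
    old_cpu, new_cpu = {}, {}
    push_sp(0x0100, old_cpu, is_8086=True)
    push_sp(0x0100, new_cpu, is_8086=False)
    print(f"{old_cpu[0x00FE]:04X} {new_cpu[0x00FE]:04X}")  # 00FE 0100
```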

A. Push Operation
The PUSH instruction decrements the stack pointer and then stores the
source operand on the top of the stack. The address-size attribute of the
stack segment determines the stack pointer size (16, 32 or 64 bits). The
operand-size attribute of the current code segment determines the amount
the stack pointer is decremented (2, 4 or 8 bytes). In non-64-bit modes: if
the address-size and operand-size attributes are 32, the 32-bit stack
pointer (ESP) is decremented by 4. If both attributes are 16, the 16-bit SP
register (stack pointer) is decremented by 2.

Fig. 3-12(a), Operation of the PUSH instruction (in IA-32 processors)


IF StackAddrSize = 64
    THEN
        IF OperandSize = 64
            THEN
                RSP ← (RSP − 8);
                IF (SRC is FS or GS)
                    THEN TEMP ← ZeroExtend64(SRC);
                ELSE IF (SRC is IMMEDIATE)
                    THEN TEMP ← SignExtend64(SRC);
                ELSE
                    TEMP ← SRC;
                FI;
                SS:RSP ← TEMP; (* Push quadword *)
            ELSE (* OperandSize = 16; 66H used *)
                RSP ← (RSP − 2);
                SS:RSP ← SRC; (* Push word *)
        FI;
ELSE IF StackAddrSize = 32
    THEN
        IF OperandSize = 32
            THEN
                ESP ← (ESP − 4);
                IF (SRC is FS or GS)
                    THEN TEMP ← ZeroExtend32(SRC);
                ELSE IF (SRC is IMMEDIATE)
                    THEN TEMP ← SignExtend32(SRC);
                ELSE
                    TEMP ← SRC;
                FI;
                SS:ESP ← TEMP; (* Push doubleword *)
            ELSE (* OperandSize = 16 *)
                ESP ← (ESP − 2);
                SS:ESP ← SRC; (* Push word *)
        FI;
ELSE (* StackAddrSize = 16 *)
        IF OperandSize = 16
            THEN
                SP ← (SP − 2);
                SS:SP ← SRC; (* Push word *)
            ELSE (* OperandSize = 32 *)
                SP ← (SP − 4);
                SS:SP ← SRC; (* Push doubleword *)
        FI;
FI;


B. Pop Operation
The POP instruction loads the value from the top of the stack to the
location specified with the destination operand (or explicit opcode) and
then increments the stack pointer. The destination operand can be a
general-purpose register, memory location, or segment register.

Fig. 3-12(b), Operation of the POP instruction (in IA-32 processors)

IF StackAddrSize = 32
    THEN
        IF OperandSize = 32
            THEN
                DEST ← SS:ESP; (* Copy a doubleword *)
                ESP ← ESP + 4;
            ELSE (* OperandSize = 16 *)
                DEST ← SS:ESP; (* Copy a word *)
                ESP ← ESP + 2;
        FI;
ELSE IF StackAddrSize = 64
    THEN
        IF OperandSize = 64
            THEN
                DEST ← SS:RSP; (* Copy a quadword *)
                RSP ← RSP + 8;
            ELSE (* OperandSize = 16 *)
                DEST ← SS:RSP; (* Copy a word *)
                RSP ← RSP + 2;
        FI;
ELSE (* StackAddrSize = 16 *)
        IF OperandSize = 16
            THEN
                DEST ← SS:SP; (* Copy a word *)
                SP ← SP + 2;
            ELSE (* OperandSize = 32 *)
                DEST ← SS:SP; (* Copy a doubleword *)
                SP ← SP + 4;
        FI;
FI;

3-5.3. Illustration Examples


The following two figures depict the execution of the Push and Pop
instructions in an x86 microprocessor. Note that the stack segment base
address is obtained by multiplying [SS] by 16 in real mode. In protected
mode, SS holds the stack segment selector, from which the CPU can find
the stack segment base address. Note also that the stack address decreases
as more data is pushed onto the stack, and increases as data is popped
back off it.

Fig. 3-13(a), Illustration of the Push AX instruction (in 16-bit processors)

As shown in the figure, stack operations (like push and pop) work
word-wise (and not byte-wise). For instance, look at the last example
(Push AX). Here the high byte (AH) is stored at address SP - 1 and the low
byte (AL) at address SP - 2. The stack pointer is then moved to the new
top of stack (decremented by 2 bytes), because the stack grows from high
to low memory addresses.

Although the conventional Little Endian representation may seem to be
violated, it is not. You can think of PUSH as follows: the low byte of the
operand goes into the low stack byte, and the high byte of the operand
into the high stack byte. Likewise, POP transfers the low stack byte into
the low byte of the destination, and the high stack byte into its high
byte.

Fig. 3-13(b), Illustration of the Pop BX instruction (in 16-bit processors)

The operation of PUSH AX can be summarized in the following 2
equations:

PUSH AX: SP ← SP - 2 , SS:[SP] ← AX

Also, the operation of POP BX can be summarized as follows:

POP BX: BX ← SS:[SP] , SP ← SP + 2
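
The byte placement of a 16-bit push and pop in real mode can be traced with the following Python sketch. It models the 1 MB memory as a byte array and computes the stack segment base as SS x 16; the function names and the sample values (SS = 2000H, SP = 0100H, AX = 1234H) are hypothetical.

```python
def push16(memory: bytearray, ss: int, sp: int, value: int) -> int:
    """Push a 16-bit value in real mode and return the new SP."""
    sp = (sp - 2) & 0xFFFF
    base = ss << 4                                                     # segment base = SS x 16
    memory[(base + sp) & 0xFFFFF] = value & 0xFF                       # low byte at lower address
    memory[(base + ((sp + 1) & 0xFFFF)) & 0xFFFFF] = (value >> 8) & 0xFF  # high byte above it
    return sp


def pop16(memory: bytearray, ss: int, sp: int) -> tuple:
    """Pop a 16-bit value in real mode and return (value, new SP)."""
    base = ss << 4
    low = memory[(base + sp) & 0xFFFFF]
    high = memory[(base + ((sp + 1) & 0xFFFF)) & 0xFFFFF]
    return (high << 8) | low, (sp + 2) & 0xFFFF


if __name__ == "__main__":
    ram = bytearray(1 << 20)                    # 1 MB real-mode memory
    sp = push16(ram, 0x2000, 0x0100, 0x1234)    # PUSH AX with AX = 1234H
    print(f"{sp:04X} {ram[0x200FE]:02X} {ram[0x200FF]:02X}")  # 00FE 34 12
    bx, sp = pop16(ram, 0x2000, sp)             # POP BX
    print(f"{bx:04X} {sp:04X}")                 # 1234 0100
```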

3-5.4. Stack and Calling Procedures


In high-level languages, such as C and Pascal, compilers use a part of
the stack as temporary storage (called a stack frame) for saving the
parameters and local variables of subroutines. This temporary storage
area is destroyed when the CPU returns to the calling program. The use of
a stack allows subroutines to be recursive, since each call can have its
own calling context. The CALL and RET instructions, which are used to
call and end procedures, also perform operations on the current stack.
More details about the procedure instructions will come in Chapters 4 and
5. It is also interesting to mention that some early computers (during the
1970s) were entirely based on the extensive use of stacks. Such computers
were called zero-address or stack-oriented computers.

Therefore, it is not surprising that some numeric coprocessors (like the
80x87) are also based on the extensive use of hardware stacks to perform
various floating-point operations.

Figure 3-14. Stack organization with local variables for calling procedures.

3-5.5. Stack Behavior in 64-Bit Mode


In 64-bit mode, the stack pointer (RSP) is 64 bits wide and its size cannot
be overridden by an instruction prefix. In implicit stack references,
address-size overrides are ignored. Pushes and pops of 32-bit values on
the stack are not possible in 64-bit mode. 16-bit pushes and pops are
supported by using the 66H operand-size prefix. PUSHA, PUSHAD, POPA, and
POPAD are not supported. Address calculations that reference the SS
segment in 64-bit mode are treated as if the segment base is zero. The
stack segment DPL (descriptor privilege level) is modified such that it is
always equal to the current privilege level (CPL). This is true even if it
is the only field in the SS descriptor that is modified. By default, PUSH
decrements and POP increments the stack pointer by 8 bytes (64 bits).
When the content of a segment register is pushed onto the 64-bit stack,
the pointer is automatically aligned to 64 bits.

3-6. IBM PC Memory Organization


The x86 microprocessors provide a 20-bit address (A0-A19) in real mode,
which means a total of 1 MB of addressable memory. Therefore, the memory
map of x86 microprocessor systems working in real mode resembles the old
IBM PC/XT memory map. As shown in figure 3-15, the IBM PC and compatible
clones in real mode have 1 MB of addressable space ranging from 00000H to
FFFFFH. A total of 256 kB (C0000H-FFFFFH) is set aside for the ROM in the
IBM PC. The PC Basic Input/Output System (BIOS), which is usually stored
in the PC ROM, occupies the highest 8 kB (FE000H-FFFFFH) of the memory
map. The upper 16 bytes of the ROM (FFFF0H-FFFFFH) are dedicated to the
start-up code of every x86 CPU. Each time the PC is reset, the CS:IP
registers are loaded with FFFF:0000, which points to the physical location
FFFF0H in the BIOS ROM. This location contains the code that starts the
operation of the PC. In 32-bit microprocessor systems (80386 and later),
the starting location is 0FFFFFFF0H (16 bytes below the top of the 32-bit
address space).
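
As a quick check of the reset addresses quoted above, the following Python sketch applies the real-mode rule (segment x 16 + offset) to FFFF:0000. The function name real_mode_physical is an assumption made for this illustration.

```python
def real_mode_physical(segment, offset):
    """Return the 20-bit real-mode physical address of segment:offset."""
    return ((segment << 4) + offset) & 0xFFFFF


# Reset vector of the 8086: CS:IP = FFFF:0000
reset_8086 = real_mode_physical(0xFFFF, 0x0000)
bytes_to_top_8086 = (1 << 20) - reset_8086      # room left below 1 MB

# Reset location quoted for the 80386 and later processors
reset_386 = 0xFFFFFFF0
bytes_to_top_386 = (1 << 32) - reset_386        # room left below 4 GB
```

In both cases only 16 bytes remain above the reset location, which is why the start-up code at this address is normally just a jump to the rest of the BIOS.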

Linear Address
(Hexadecimal)     Contents                              Decimal
FFFFFH            ROM (BIOS), 8 kB                      1024 kB
                  ROM (BASIC Compiler), 32 kB
                  ROM (user), 8 kB
                  ROM (expansion), 168 kB
                  ROM (Hard Disk Driver BIOS)
                  ROM expansion (32 kB)
C0000H                                                  768 kB
BFFFFH            RAM Video Adaptor (128 kB)
A0000H
9FFFFH                                                  640 kB
                  RAM (user)
00000H                                                  0 kB

Fig. 3-15(a). IBM PC memory map (in real mode).


Modern IBM PCs and compatible computers provide a system memory address
decode space of 4 GB. Figure 3-15(b) shows the mapping of specific memory
areas in modern IBM PCs with 32-bit address CPUs, such as the 80386 and
later Intel processors. As shown, there is a 1 MB DOS compatibility area.
This area is divided into the following regions:

• 0–640 kB: MS-DOS Area.
• 640–768 kB: Video Buffer Area.
• 768–896 kB: Expansion Area, in 16 kB sections (8 sections).
• 896–960 kB: Extended BIOS Area, in 16 kB sections (4 sections).
• 960 kB–1 MB: System BIOS memory (BIOS Area).
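
The region boundaries listed above can be turned into a simple lookup. This is a hedged sketch in Python: the table DOS_REGIONS and the function dos_region are names invented here, and the boundaries are taken directly from the list.

```python
KB = 1024

# (start, end, name) with end exclusive, taken from the list of regions
DOS_REGIONS = [
    (0 * KB,   640 * KB,  "MS-DOS Area"),
    (640 * KB, 768 * KB,  "Video Buffer Area"),
    (768 * KB, 896 * KB,  "Expansion Area"),
    (896 * KB, 960 * KB,  "Extended BIOS Area"),
    (960 * KB, 1024 * KB, "System BIOS Area"),
]


def dos_region(address):
    """Name the DOS compatibility region that contains a linear address."""
    for start, end, name in DOS_REGIONS:
        if start <= address < end:
            return name
    return "Extended memory (above 1 MB)"
```

For example, dos_region(0xA0000) falls in the video buffer, while the reset location FFFF0H falls in the system BIOS area.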

Fig. 3-15(b). Modern PC memory map (32bit).

The extended memory area, which covers the address range 100000H
(1 MB) to FFFFFFFFH (4 GB - 1), is divided into the following regions:


• Main Memory from 1 MB to the top of memory (up to 4 GB of system memory).
• PCI Memory from the top of memory to 4 GB, including 2 ranges:
  - APIC Configuration Space from FEC0_0000H (4 GB - 20 MB) to
    FECF_FFFFH and from FEE0_0000H to FEEF_FFFFH.
  - High BIOS area from 4 GB - 2 MB to 4 GB.

It is the responsibility of the BIOS or the system designer to limit memory
population so that adequate Peripheral Component Interconnect (PCI), High
BIOS, and Advanced Programmable Interrupt Controller (APIC) memory
space can be allocated.

3-7. SPARC Memory Models and Addressing Space


Like other computer systems, the SPARC system memory is byte
addressed. The SPARC system memory is the collection of locations
accessed by the load/store instructions. These locations include traditional
memory, as well as I/O registers, and registers accessible via address
space identifiers.

3-7.1. SPARC Memory Models


The SPARC memory model defines the semantics of memory operations
such as load and store, and specifies how the order in which these
operations are issued by a processor is related to the order in which they
are executed by memory. It also specifies how instruction fetches are
synchronized with memory operations. The model applies both to uni-
processors and to shared-memory multiprocessors.

Memory is modeled as an N-port device, as shown in Figure 3-16, where
N is the number of processors. A processor initiates memory operations
via its port in what is called the issuing order. Each port contains a Store
Buffer used to hold stores, FLUSHes, and atomic load-stores. A switch
connects a single-port memory to one of the ports at a time, for the
duration of each memory operation. The order in which this switch is
thrown from one port to another is nondeterministic and defines the
memory order of operations.

A standard memory model called Total Store Ordering (TSO) is defined
for SPARC. All SPARC implementations must support TSO. An
additional model called Partial Store Ordering (PSO) is defined, which
allows higher-performance memory systems to be built. Machines that
implement Strong Consistency (also known as Strong Ordering)
automatically satisfy both TSO and PSO. Machines that implement TSO
automatically satisfy PSO.

Fig. 3-16(a). SPARC memory model, memory side

Fig. 3-16(b). SPARC memory model, processor side.


3-7.2. SPARC Addressing Space


The SPARC uses separate address spaces for data and code (text). In fact,
the SPARC provides at least four address spaces: the user instruction
(code) space, the supervisor instruction (code) space, the user data space,
and the supervisor data space. When the processor is in user state,
instructions are fetched from the user instruction space while data values
are loaded from and stored to user data memory. Similarly, when the
processor is in supervisor mode, instructions are fetched from supervisor
instruction space and data values are, by default, loaded from and stored
to supervisor data memory.

When the processor is in supervisor state, you can use special load and
store instructions to access data values in alternate address spaces. For
example, you can load a value from the user data space, or store a value
into the user instruction space. These instructions require an explicit
address space identifier (ASI). Table 3-3 summarizes the ASI values used
for these instructions.

Table 3-3. ASI values for different addressing spaces, in SPARC processors.

Address Space                      ASI value
User Instructions (code)           8  (08H)
Supervisor Instructions (code)     9  (09H)
User Data                          10 (0AH)
Supervisor Data                    11 (0BH)


3-8. ARM Memory Organization


The ARM memory map is summarized in the following figure. As shown
in figure, the processor has a fixed default memory map that provides up
to 4GB of addressable memory. This is a unified address space covering
both program and data regions as well as peripherals and system control
registers.

Fig. 3-17. Memory organization of ARM processors

3-8.1. ARM registers


ARM has 31 general-purpose 32-bit registers. At any one time, 16 of
these registers (R0-R15) are visible and accessible in any particular
processor mode. The other registers are used to speed up exception
processing.

All the ARM instructions can address any of the 16 visible registers.

The main bank of 16 registers is used by all unprivileged code. These are
the User mode registers. User mode is different from all other modes as it
is unprivileged, which means it can only switch to another processor
mode by generating an exception. The SWI instruction provides this
facility from program control.

Fig. 3-17. ARM registers

Out of the 16 visible registers, the following three registers have special
roles:
Stack pointer: Software normally uses R13 as a Stack Pointer (SP). R13
is used by the PUSH and POP instructions in T variants, and by the SRS
and RFE instructions from ARMv6.
Link register: Register 14 is the Link Register (LR). This register holds
the address of the next instruction after a Branch and Link (BL or BLX)
instruction, which is used to make a subroutine call. It is also used for
the return address on entry to exception modes. At other times, R14 can
be used as a general-purpose register.
Program counter: Register 15 is the Program Counter (PC). It can be
used in most instructions as a pointer to the instruction which is two
instructions after the instruction being executed. In ARM state, all ARM
instructions are four bytes long (one 32-bit word) and are always aligned
on a word boundary. This means that the bottom two bits of the PC are
always zero, and therefore the PC contains only 30 non-constant bits.
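
The "two instructions ahead" rule can be expressed numerically. The sketch below, in Python, is only an illustration of the arithmetic for ARM state; the function name pc_value_seen and the example address 8000H are assumptions.

```python
ARM_INSTRUCTION_BYTES = 4   # every ARM-state instruction is one 32-bit word


def pc_value_seen(instruction_address):
    """Value read from R15 by an ARM-state instruction at the given address."""
    if instruction_address % ARM_INSTRUCTION_BYTES != 0:
        raise ValueError("ARM-state instructions are word aligned")
    # The PC points two instructions (8 bytes) past the executing one
    return (instruction_address + 2 * ARM_INSTRUCTION_BYTES) & 0xFFFFFFFF


pc = pc_value_seen(0x8000)      # hypothetical instruction address
low_two_bits = pc & 0b11        # always zero for word-aligned code
```

An instruction at 8000H therefore reads 8008H from R15, and the two least significant bits of that value are zero.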

3-8.2. Stacks
The processor uses a full descending stack. This means the stack pointer
holds the address of the last stacked item in memory. When the processor
pushes a new item onto the stack, it decrements the stack pointer and then
writes the item to the new memory location. The processor implements
two stacks, the main stack and the process stack, with a pointer for each
held in independent registers.


3-8. Summary
The physical memory of computer systems is usually organized as a
sequence of 8-bit bytes. Each byte is assigned a unique address that
ranges from zero to a maximum allowed memory space, which depends
on the width of the address bus. For instance, the 8086 has a 20-bit address
bus and can address up to 1 MB ($2^{20}$ bytes). On the other hand, the 80386
has a 32-bit address bus and can address up to 4 GB ($2^{32}$ bytes).
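
The relation between address bus width and addressable memory is simply a power of two. A minimal Python sketch follows; the helper names address_space_bytes and human_readable are assumptions made for this illustration.

```python
def address_space_bytes(address_lines):
    """Number of distinct byte addresses reachable with the given bus width."""
    return 1 << address_lines


def human_readable(size_bytes):
    """Format a power-of-two size using binary multiples (1 kB = 1024 B)."""
    units = ["B", "kB", "MB", "GB", "TB", "PB", "EB"]
    index = 0
    while size_bytes >= 1024 and index < len(units) - 1:
        size_bytes //= 1024
        index += 1
    return f"{size_bytes} {units[index]}"


size_8086 = human_readable(address_space_bytes(20))    # 8086, 20 lines
size_80386 = human_readable(address_space_bytes(32))   # 80386, 32 lines
```

The same helper gives 16 MB for a 24-bit address bus and 16 EB for a full 64-bit address.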

Virtual Memory: Virtual memory consists of the entire address space
available to programs. It is a large linear-address space that is translated
by a combination of hardware and operating-system software to a smaller
physical-address space, parts of which are located in memory and parts
on disk or other external storage media.

Physical Memory: Physical memory is the installed memory (excluding
cache memory) in a particular computer system that can be accessed
through the processor’s bus interface. The maximum size of the physical
memory space is determined by the number of address bits on the bus
interface. In a virtual memory system, the large virtual-address space
(also called linear-address space) is translated to a smaller physical
address space by a combination of segmentation and paging hardware and
software. Paging is a mechanism for translating linear (virtual) addresses
into fixed-size blocks called pages, which the operating system can move,
as needed, between memory and external storage media (typically disk).

The architecture of modern processors allows multitasking and gives
designers the freedom to choose one of the following two memory
models for each task:

• A flat (linear) address space consisting of a single array of up to 4 GB.
• A segmented address space consisting of up to 16,383 segments, each
  with a linear address space of up to 4 GB (for 32-bit addresses) or up
  to 16 EB (for 64-bit addresses).

The 80x86 processor families can operate in various modes. The Intel
8086, Intel 8088, Intel 80188 and Intel 80186 had only real mode.

In real mode the processor can address up to 1 MB (as the 8086 in its
maximum mode) using 20 address lines. Processors beginning with
the Intel 80286 feature a second mode called protected mode (or
protected virtual address mode, PVAM).

In protected mode the 80286 can address up to 16 MB of physical memory
(24 address lines). To ease the transition to/from protected mode, the
Intel 80386 and later processors have been provided with a third mode
called "virtual 86", or simply V86 mode.

Addressing modes for 16-bit x86 processors can be summarized by the
following formula:

The 16-bit offset or effective address (EA) may be also considered as the
16-bit instruction pointer (IP) or the 16-bit stack pointer (SP). There are
some special combinations of segment registers and general registers that
point to important addresses:

CS:IP points to the address of the next byte of code the processor will fetch.

SS:SP points to the location of the last item pushed onto the stack.

DS:SI is often used to point to data that is about to be copied to ES:DI.
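
As an illustration of how a 16-bit offset and a segment register combine, the Python sketch below uses the standard 8086 rules: the offset is the 16-bit sum of an optional base (BX or BP), an optional index (SI or DI) and an optional displacement, and the physical address is segment x 16 + offset. The function names and the register values are assumptions chosen for the example.

```python
def effective_address(base=0, index=0, displacement=0):
    """16-bit offset: [BX or BP] + [SI or DI] + [displacement], modulo 64 kB."""
    return (base + index + displacement) & 0xFFFF


def physical_address(segment, offset):
    """20-bit real-mode physical address: segment * 16 + offset."""
    return ((segment << 4) + offset) & 0xFFFFF


# Hypothetical register contents, as for an operand such as [BX+SI+10H]
DS, BX, SI = 0x1000, 0x0200, 0x0030
ea = effective_address(base=BX, index=SI, displacement=0x10)
pa = physical_address(DS, ea)
```

With these values the offset is 0240H and the physical address is 10240H. Note that the offset wraps around at 64 kB before the segment is added.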

With the advent of the 32-bit 80386 processor, the 16-bit general-purpose
registers, base registers, index registers, instruction pointer, and FLAGS
register, but not the segment registers, were expanded to 32 bits. This is
represented by prefixing an "E" (for Extended) to the register names.
Thus the expanded AX became EAX, SI became ESI and so on. The
general-purpose registers, base registers, and index registers could all be
used as the base in addressing modes, and all of those registers except for
the stack pointer could be used as the index in addressing modes. Two
new segment registers (FS and GS) were added. With a greater number of
registers, instructions and operands, the machine code format was
expanded. To provide backward compatibility, segments with executable
code can be marked as containing either 16-bit or 32-bit instructions.
Special prefixes allow inclusion of 32-bit instructions in a 16-bit segment
or vice versa.

Addressing modes for 32-bit processors can be summarized by this
formula:

The 64-bit x86 processors extended the 32-bit registers in a similar way
to how 32-bit protected mode extended the 16-bit registers before it (RAX,
RBX, RCX, RDX, RSI, RDI, RBP, RSP, RFLAGS, RIP). AMD also added 8
additional 64-bit general registers (R8, R9, ..., R15). The addressing
modes were not dramatically changed from 32-bit mode, except that
addressing was extended to 64 bits, virtual addresses are now sign
extended to 64 bits (canonical form), and the role of segment selectors
has been dramatically reduced. Addressing modes for 64-bit code on
64-bit x86 processors can be summarized by these formulas:

Where general-register is one of the 64-bit general purpose registers
inside the microprocessor (such as RAX, RBX, RCX, RDX). The 64-bit
effective address EA (offset) can also be formed as the sum of the 64-bit
instruction pointer (RIP) and a displacement:

Effective Address (Offset) = RIP + [displacement]
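
The RIP-relative form can be sketched numerically as follows. This Python fragment is an illustration only: it assumes that the RIP value used is the address of the next instruction and that the displacement is a signed 32-bit value, and the names sign_extend_32 and rip_relative_address are invented here.

```python
MASK64 = (1 << 64) - 1


def sign_extend_32(value):
    """Interpret the low 32 bits of value as a signed displacement."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def rip_relative_address(next_rip, disp32):
    """Effective address = RIP of the next instruction + signed displacement."""
    return (next_rip + sign_extend_32(disp32)) & MASK64


forward = rip_relative_address(0x401000, 0x00000020)    # data after the code
backward = rip_relative_address(0x401000, 0xFFFFFFF0)   # displacement of -16
```

Because the displacement is relative to the instruction pointer, the same code works wherever it is loaded in memory.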

The stack is a part of memory that can be used for temporary storage of
data, either within a subroutine or as the normal method of passing
parameters to a subroutine. The Stack stores data structures in a special
manner so that data can be accessed quickly in last-in first-out order
(LIFO).

Scalable Processor ARChitecture (SPARC) is a CPU architecture derived
from a reduced instruction set computer (RISC) lineage. SPARC is an
instruction set architecture (ISA) with 32-bit integer and 32-, 64-, and
128-bit IEEE Standard 754 floating-point as its principal data types. It
defines general-purpose integer, floating-point, and special state/status
registers and 72 basic instruction operations, all encoded in 32-bit wide
instruction formats. The load/store instructions address a linear,
$2^{32}$-byte address space. In addition to the floating-point
instructions, SPARC also provides instruction set support for an optional
implementation-defined coprocessor.

SPARC V8 architecture includes the following principal features:

• A linear, 32-bit address space.
• Few and simple instruction formats: All instructions are 32 bits
  wide, and are aligned on 32-bit boundaries in memory. There are only
  three basic instruction formats, and they feature uniform placement of
  opcode and register address fields. Only load and store instructions
  access memory and I/O.
• Few addressing modes: A memory address is given by either "register
  + register" or "register + immediate."
• Triadic register addresses: Most instructions operate on 2 register
  operands (or a register and a constant) and place the result in a third
  register.
• A large windowed register file: At any instant, a program sees 8 global
  integer registers plus a 24-register window into a larger register file.
  The windowed registers can be described as a cache of procedure
  arguments, local values, and return addresses.
• A separate floating-point register file: configurable by software into
  32-bit single-precision, 64-bit double-precision, 128-bit quad-precision
  registers, or a mixture thereof.
• Delayed control transfer: The processor always fetches the next
  instruction after a delayed control-transfer instruction. It executes it
  or not, depending on the control-transfer instruction annul bit.
• Fast trap handlers: Traps are vectored through a table, and cause
  allocation of a fresh register window in the register file.
• Tagged instructions: The tagged add/subtract instructions assume that
  the two least-significant bits of the operands are tag bits.
• Multiprocessor synchronization instructions: One instruction performs
  an atomic read-then-set-memory operation; another performs an
  atomic exchange-register-with-memory operation.
• Coprocessor: The architecture defines a straightforward coprocessor
  instruction set, in addition to the floating-point instruction set.

3-9. Problems

3-1) Calculate the maximum address space of the 80x86 microprocessors
in different memory models.

3-2) List and explain the use of the four segment registers in the 8086
microprocessors. Also describe how the 20-bit physical address is
obtained from the 16-bit segment and offset addresses.

3-3) For an 8086 microprocessor the 20-bit physical address of a byte of
program code is 5FB60H. Calculate the value of the appropriate segment
register, given that the contents of the instruction pointer register is
IP=0370H.

3-4) Describe the following memory addressing terms:


i- Logical address (in both real and protected modes)
ii- Segmented address (in real mode, e.g., CS:IP)
iii- Virtual address (in protected mode, e.g., CS:EIP)
iv- Linear address (e.g., 67890H)
v- Physical address (e.g., 1ABCDH)

3-5) Show how a 32-bit effective address (EA) is obtained, in the
following case: SS = 0001H, ESP = 100A0H, if the first descriptor table
record contains the base address value 01101H.

3-6) Describe the main operation of the stack in 8086 microprocessor
systems (PUSH and POP instructions).

3-7) What do you know about the memory paging addressing mechanism in
the Intel microprocessors?

3-8) Describe how the Translation Look-aside Buffer (TLB) can help and
save time when a linear-to-physical memory translation is needed.

3-9) _____ method is used to map logical addresses of variable length
onto physical memory.
a) Paging
b) Overlays
c) Segmentation
d) Paging with segmentation

3-10) The type of memory assignment used in Intel processors is _____ .
a) Little Endian
b) Big Endian
c) Medium Endian
d) None of the above

3-11) The unit which acts as an intermediate agent between memory and
backing store to reduce process time is _____ .
a) TLB’s
b) Registers
c) Page tables
d) Cache

3-12) Small, extremely fast RAMs are called _______ .
a) Cache
b) Heaps
c) Accumulators
d) Stacks

3-13) The reason for the implementation of the cache memory is
a) To increase the internal memory of the system
b) The difference in speeds of operation of the processor and memory
c) To reduce the memory access and cycle time
d) All of the above

3-14) The write-through procedure is used
a) To write onto the memory directly
b) To write and read from memory simultaneously
c) To write directly on the memory and the cache simultaneously
d) None of the above

3-15) Write-back:
a) reverses the order of the bits of data.
b) is used to double-check the accuracy of data before use.
c) is only used in the little-endian system.
d) stores results in the cache rather than in the external memory.

3-10. Bibliography

[1] M. Thorne, Computer Organization and Assembly Language
Programming for IBM PCs and Compatibles, Benjamin-Cummings, 1991.

[2] W. A. Triebel, 80386, 80486, and Pentium Microprocessor: The
Hardware, Software, and Interfacing, 1999.

[3] Barry B. Brey, The Intel Microprocessors 8086/8088, 80186/80188,
80286, 80386, 80486, Pentium, and Pentium Pro Processor Architecture,
Programming, and Interfacing, Book News, NY, 1999.

[4] T. Shanley, Pentium Pro and Pentium II System Architecture (2nd
Edition), MindShare Inc., 2000.


Prof. Dr. Muhammad El-SABA


Introduction to Microprocessors & Interface Circuits CHAPTER 4

Microprocessor
Instructions
Contents

4-1. Introduction
4-2. Data Types (Bytes, Words, Integers, Floating point numbers, … )
4-3. Instruction Format of x86 Microprocessors
4-4. Addressing Modes of x86 Microprocessors
4-5. Intel 8086/80186/80286/80386/80486 Instruction Set (Alphabetical)
4-6. Basic Instruction Set of x86 Microprocessors (by category)
4-6.1. Data Transfer Instructions
4-6.2. Arithmetic Instructions
4-6.3. Logic Instructions
4-6.4. String Instructions
4-6.5. Program Control Instructions
4-6.6. Processor Control Instructions
4-7. Math Coprocessor (x87) Instructions
4-8. Subroutine Calls & Interrupts in x86 Microprocessors
4-8.1. Subroutine Calls (CALL)
4-8.2. Interrupts (INT)
4-8.3. Masking Interrupts (Turning Interrupts Off)
4-8.4. Interrupts Priority
4-9. IBM PC Interrupts and DOS Calls
4-9.1. PC Boot Process
4-9.2. PC Interrupt Service Routines
4-9.3. BIOS Calls & DOS Calls

4-10. Interrupts in x86 Protected Mode
4-10.1. Gates
4-10.2. Interrupt Descriptor Table
4-10.3. Interrupt Masking in Protected Mode
4-10.4. Debugging in Protected Mode
4-11. New Instruction Sets of x86-64 Microprocessors
4-11.1. Media Instructions
4-11.2. Floating-Point Instructions
4-12. Summary of the Recent x86 (32-bit & 64-bit) Instructions
4-12.1. MMX Instructions.
4-12.2. Streaming SIMD Extensions (SSE) Instructions.
4-12.3. Streaming SIMD Extensions 2 (SSE2) Instructions.
4-12.4. SSE3 Instructions.
4-12.5. SSE4 Instructions.
4-13. Undocumented x86 Instructions
4-14. Converting x86 Assembly Language to Machine Code
4-15. Case Study: Encoding the MOV Instruction
4-16. Execution Time of x86 Instructions
4-17. Instruction Set of SPARC Processors
4-18. Instruction Format of SPARC Processors
4-19. Encoding of Load and Store Instructions in SPARC Processors
4-20. Summary
4-21. Problems


Microprocessor
Instructions

4-1. Introduction
It is well known that digital computers can only understand and execute
machine (binary) codes. However, humans almost never write programs
directly in machine code. Instead, they use higher-level programming
languages which can be translated by special computer programs
(compilers) into machine code.

The simplest kind of programming language is the assembly language,
which has a one-to-one correspondence with the resulting machine code
but allows the use of meaningful text strings, called mnemonics
(pronounced ne-mon-iks). The assembly language is thus a symbolic
representation of the machine language of a specific processor.

The assembly language of a certain microprocessor is composed of a set
of instructions, which are specific to this processor. Each instruction is
generally composed of two fundamental parts: the opcode (the part of the
instruction which encodes the basic type of operation to perform) and
the operands (names of variables and constants and their locations in the
program). In this chapter we'll present the main features of the assembly
language of x86 microprocessors and how it can be used to write
application programs for 80x86 microprocessor systems.

4-2. Data Types


Each programming language has its own data types. Thus, programmers
can instantiate variables and constants using these data types. In fact, the
programmer has to specify the type of each variable or constant to be
used in the course of his program. This is an obligation in modern
programming languages, which are called typed programming languages.

4-2.1. Fundamental Data Types


The assembly language of most processors (such as x86 microprocessors)
can recognize the following fundamental data types: bytes, words, and
dwords (double words).


The byte is eight contiguous bits starting at any logical address. The bits
are numbered 0 through 7; bit zero is the least significant bit (LSB).

The word is two contiguous bytes in many computer systems. A word
thus contains 16 bits. The bits of a word are numbered from 0 through 15;
bit 0 is the LSB. The byte containing bit 0 of the word is called the low
byte; the byte containing bit 15 is called the high byte. Each byte within
a word has its own address, and the smaller of the addresses is the address
of the word. The byte at this lower address contains the eight least
significant bits of the word, while the byte at the higher address contains
the eight most significant bits. This arrangement of bytes in memory, with
the low byte at the lower address and the high byte at the higher address,
as shown in figure 4-1a, is called the little endian representation.
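
The byte order described above can be observed directly with Python's standard library. This is a small illustrative sketch; the sample values 1234H and 12345678H are arbitrary.

```python
import struct

word = 0x1234
# Little endian: the low byte (34H) is stored at the lower address
word_bytes = word.to_bytes(2, byteorder="little")

dword = 0x12345678
# "<I" means little-endian unsigned 32-bit integer
dword_bytes = struct.pack("<I", dword)

# Reading the bytes back in the same order restores the original word
restored = int.from_bytes(word_bytes, byteorder="little")
```

Here word_bytes[0] is the byte at the lower address (the low byte), and word_bytes[1] is the byte at the higher address (the high byte).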

A dword (double word) is two contiguous words starting at any byte
address. A double word thus contains 32 bits. The bits of a double word
are numbered from 0 through 31; bit 0 is the LSB. The word containing
bit 0 of the double word is called the low word; the word containing bit
31 is called the high word.

When used in a configuration with a 32-bit bus, actual transfers of data
between processor and memory take place in units of double words
beginning at addresses evenly divisible by four; however, the processor
converts requests for misaligned words or double words into the
appropriate sequences. The misaligned data transfers reduce performance
by requiring extra memory cycles. For maximum performance, data
structures (including stacks) should be designed in such a way that,
whenever possible, word operands are aligned at even addresses and
double word operands are aligned at addresses evenly divisible by four.
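
The cost of misalignment can be estimated with a simplified model in which every bus transfer moves one aligned double word. The Python sketch below is an assumption-laden illustration of that model, not a description of any particular bus controller; the function names are invented here.

```python
def is_aligned(address, size):
    """True if an operand of the given size starts on a multiple of its size."""
    return address % size == 0


def bus_transfers_32bit(address, size):
    """Count the aligned 4-byte transfers needed to cover the operand."""
    first = address // 4                  # double word holding the first byte
    last = (address + size - 1) // 4      # double word holding the last byte
    return last - first + 1


aligned_dword = bus_transfers_32bit(0x1000, 4)      # aligned double word
misaligned_dword = bus_transfers_32bit(0x1002, 4)   # crosses a boundary
misaligned_word = bus_transfers_32bit(0x1003, 2)    # word split over two dwords
```

In this model an aligned double word needs one transfer, while a double word starting at an address that is not divisible by four needs two.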

Higher    FFFFF
Address     :
          XXXX1   | 15 (MSB)   Higher Byte         8 |
          XXXX0   |  7         Lower Byte    (LSB) 0 |
Lower       :
Address   00000

Fig. 4-1(a). Byte and word organization in memory, according to the little endian
representation (lower byte has lower address)


4-2.2. Additional Data Types


Although bytes, words, and double words are the fundamental types of
operands, the recent Intel x86 processors also support additional
interpretations of these operands. Depending on the instruction referring
to the operand, the following additional data types are recognized:
i- Integer: A signed binary numeric value contained in a 32-bit
doubleword, 16-bit word, or 8-bit byte. All operations assume a 2's
complement representation. The sign bit is located in bit 7 in a byte, bit
15 in a word, and bit 31 in a double word. The sign bit has the value zero
for positive integers and one for negative. Since the high-order bit is used
for a sign, the range of an 8-bit integer is -128 through +127; 16-bit
integers may range from -32,768 through +32,767; 32-bit integers may
range from $-2^{31}$ through $+2^{31} - 1$. The value zero has a positive sign.
ii- Ordinal: An unsigned binary numeric value contained in a 32-bit
doubleword, 16-bit word, or 8-bit byte. All bits are considered in
determining magnitude of the number. The value range of an 8-bit ordinal
number is 0-255; 16 bits can represent values from 0 through 65,535; 32
bits can represent values from 0 through $2^{32} - 1$.
iii- Near Pointer: A 32-bit logical address. A near pointer is an offset
within a segment. Near pointers are used in either a flat or a segmented
model of memory organization.

iv- Far Pointer: A 48-bit logical address of two components: a 16-bit
segment selector component and a 32-bit offset component. Far pointers
are used by application programmers, only when system designers choose
a segmented memory organization.
v- String: A contiguous sequence of bytes, words, or dwords. A string
may contain from zero bytes to $2^{32} - 1$ bytes (4 gigabytes).
vi- Bit field: A contiguous sequence of bits. A bit field may begin at any
bit position of any byte and may contain up to 32 bits.

vii- Bit string: A contiguous sequence of bits. A bit string may begin at
any bit position of any byte and may contain up to $2^{32} - 1$ bits.
viii- BCD: A byte (unpacked) representation of a decimal digit in the
range 0 through 9. Unpacked decimal numbers are stored as unsigned
byte quantities. One digit is stored in each byte. The magnitude of a
number is determined by the low-order half-byte; hexadecimal values 0-9
are interpreted as decimal numbers.
The high-order half-byte must be zero for multiplication and division; it
may contain any value for addition and subtraction.

ix- Packed BCD: A byte (packed) representation of two decimal digits,
each in the range 0 through 9. One digit is stored in each half-byte.

The digit in the high-order half-byte is the most significant. Values 0-9
are valid in each half-byte. The range of a packed decimal byte is 0-99, as
shown in figure 4-1(b).

Unpacked BCD
Bit:    7 6 5 4  |  3 2 1 0
                 |    0-9
Packed BCD
Bit:    7 6 5 4  |  3 2 1 0
          0-9    |    0-9
Fig. 4-1(b). Packed and unpacked binary-coded decimal (BCD) number
representation.
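
The two BCD layouts can be sketched with a few lines of Python. The helper names pack_bcd, unpack_bcd and to_unpacked_bcd are assumptions made for this illustration, and the unpacked form shown here stores the most significant digit first with a zero high-order half-byte.

```python
def pack_bcd(value):
    """Encode a number 0..99 as one packed BCD byte (two digits per byte)."""
    if not 0 <= value <= 99:
        raise ValueError("a packed BCD byte holds 0 through 99")
    return ((value // 10) << 4) | (value % 10)


def unpack_bcd(byte):
    """Decode one packed BCD byte back to a number 0..99."""
    high, low = byte >> 4, byte & 0x0F
    if high > 9 or low > 9:
        raise ValueError("each half-byte must hold a digit 0 through 9")
    return high * 10 + low


def to_unpacked_bcd(value):
    """Encode a non-negative number as unpacked BCD, one digit per byte."""
    return bytes(int(digit) for digit in str(value))


packed = pack_bcd(47)            # high half-byte 4, low half-byte 7
unpacked = to_unpacked_bcd(47)   # two bytes, one digit in each
```

The packed byte for decimal 47 reads 47H in hexadecimal, which is what makes BCD values easy to display digit by digit.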

x- Real Numbers: Real numbers are non-integer numbers and may be
32-bit short real or 64-bit long real. Encoding non-integer values requires
the use of fixed-point or floating-point notation.

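A- Fixed-Point Encoding: Fixed-point numbers are usually proper
fractions. The binary point is usually set after the first bit, which is
usually the sign bit, as follows:

$$N = 0.b_{-1} b_{-2} b_{-3} \ldots b_{-L} \quad \text{for } N \ge 0 \qquad (4\text{-}1a)$$
$$N = 1.b_{-1} b_{-2} b_{-3} \ldots b_{-L} \quad \text{for } N < 0$$

The equivalent decimal value is hence given by:

$$N = +\left(b_{-1} \times 2^{-1} + b_{-2} \times 2^{-2} + b_{-3} \times 2^{-3} + \cdots + b_{-L} \times 2^{-L}\right) \quad \text{for } N \ge 0 \qquad (4\text{-}1b)$$
$$N = -\left(b_{-1} \times 2^{-1} + b_{-2} \times 2^{-2} + b_{-3} \times 2^{-3} + \cdots + b_{-L} \times 2^{-L}\right) \quad \text{for } N < 0$$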

This representation is called the signed-magnitude form. The most
significant bit is hence 0 or 1 depending on whether the number (N) is
positive or negative, as shown in figure 4-1(c). Alternatively, negative
fixed-point numbers may be expressed in 2's complement form
($N \rightarrow 2 - |N|$) or 1's complement form
($N \rightarrow 2 - 2^{-L} - |N|$).


Fixed-point Number
Sign bit (0 or 1) | binary point | L fraction bits: b_{-1} b_{-2} b_{-3} ... b_{-L}
Fig. 4-1(c). Fixed-point number representation (signed magnitude form).
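
As a minimal sketch of the signed-magnitude and 2's complement forms above,
the following Python functions encode a proper fraction into a sign bit
followed by L fraction bits. The function names are hypothetical, and
truncating (rather than rounding) the bits beyond b_{-L} is an assumption
made for simplicity.

```python
def encode_sign_magnitude(x: float, L: int) -> str:
    """Return the sign bit followed by L fraction bits (extra bits truncated)."""
    if not -1.0 < x < 1.0:
        raise ValueError("only proper fractions are representable")
    sign_bit = "1" if x < 0 else "0"
    magnitude = int(abs(x) * (1 << L))   # keep L fraction bits
    return sign_bit + format(magnitude, f"0{L}b")


def decode_sign_magnitude(bits: str) -> float:
    """Inverse of encode_sign_magnitude, following equation (4-1b)."""
    L = len(bits) - 1
    magnitude = int(bits[1:], 2) / (1 << L)
    return -magnitude if bits[0] == "1" else magnitude


def encode_twos_complement(x: float, L: int) -> str:
    """Store a negative value as 2 - |x|, using L + 1 bits in total."""
    if not -1.0 < x < 1.0:
        raise ValueError("only proper fractions are representable")
    scaled = int(abs(x) * (1 << L))
    if x < 0:
        scaled = ((1 << (L + 1)) - scaled) % (1 << (L + 1))
    return format(scaled, f"0{L + 1}b")


print(encode_sign_magnitude(0.625, 4))    # 01010
print(encode_sign_magnitude(-0.625, 4))   # 11010
print(encode_twos_complement(-0.625, 4))  # 10110
print(decode_sign_magnitude("11010"))     # -0.625
```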

B- Floating Point Encoding: Floating point numbers are usually stored
as two fixed-point numbers, one for the signed exponent (E) and one for
the signed fraction or mantissa (F), as shown in figure 4-1(d).

Floating-point Number
Signed Exponent E:            sign bit (0 or 1), followed by the exponent
Signed Fraction (Mantissa) F: sign bit (0 or 1), followed by b_{-1} b_{-2} ... b_{-L}

Fig. 4-1(d). Floating-point number representation, as two fixed point numbers.

For example, the binary number 1001.11 is represented as 0.100111 x 2^4,
with E = +4 (the value 4 with a zero sign bit), and F = 0.100111. Many
floating-point representation systems exist, but the most common one is
the IEEE 754 standard. The actual structure of this representation is
examined in the following section.

Normalized Mantissa (Base 2): A normalized floating-point value in binary
is always represented as 1.xxx times 2 to the power of some "exponent",
where xxx represents some sequence of 0's and 1's. There may also be a
plus or minus sign in front of the 1.xxx.
The main point here is that a "normalized" base 2 floating-point value will
always have a single 1 to the left of the binary point. Since this bit is
always the same, there is no need to encode it; it can be assumed without
taking up any actual code bit space.

Excess-127 Exponent (Base 2): The IEEE 754 standard specifies that, in the
32-bit format, the "exponent" is encoded as an 8-bit value, using the
unsigned binary code of a value which is 127 more than (in "excess" of)
the actual base 2 exponent required to represent the desired value.

32-bit Format (Short Real): This format allows for approximately 7
decimal digits of precision and magnitudes up to about 10^38.


Floating-point Number (32-bit Format)
Bit 31: 1-bit Sign | Bits 30-23: 8-bit Exponent | Bits 22-0: 23-bit Fraction (Mantissa)
Fig. 4-1(e). Floating-point number representation, according to IEEE 754 format,
for 32-bit (short real) numbers

This format may be translated to the usual decimal form as follows:

$$N = (-1)^{S} \times 2^{E - 127} \times (1.F) \tag{4-1c}$$

Example 4-1.
Let's interpret the 32-bit number 434D4000H according to the IEEE 754
format. This number may be re-written in binary form as follows:

4 3 4 D 4 0 0 0
0100 0011 0100 1101 0100 0000 0000 0000

This binary number can be re-grouped in the floating-point format (IEEE
754), as shown in the following figure. Note that multiplying the number
1.F by 2^7 means moving its binary point 7 places to the right (because
the actual exponent is 134 - 127 = 7). Then you get 11001101.01, which is
equal to 205.25.

Floating-point Number (32-bit Format)
Sign (bit 31):        0, so the number is positive (+)
Exponent (bits 30-23): 10000110 = 134, so the actual exponent is 134 - 127 = 7
Fraction (bits 22-0):  (1.)10011010100000000000000
F = 1x2^-1 + 0x2^-2 + 0x2^-3 + 1x2^-4 + 1x2^-5
    + 0x2^-6 + 1x2^-7 + 0x2^-8 + 1x2^-9 + 0
  = 0.603515625
N = + 2^7 x 1.100110101 = 11001101.01 (binary) = +205.25
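
The decoding steps of Example 4-1 can be checked with a few lines of
Python. This is a minimal sketch that applies equation (4-1c) to
normalized numbers only; the function name decode_float32 is a
hypothetical helper, and the standard struct module is used only as an
independent cross-check.

```python
import struct


def decode_float32(word: int):
    """Split a 32-bit IEEE 754 word into S, E, F and evaluate equation (4-1c)."""
    sign = (word >> 31) & 0x1
    exponent = (word >> 23) & 0xFF        # biased (excess-127) exponent
    fraction = word & 0x7FFFFF            # 23 stored fraction bits
    if exponent in (0, 0xFF):
        raise ValueError("zero, denormal, infinity and NaN are not handled here")
    value = (-1) ** sign * 2.0 ** (exponent - 127) * (1 + fraction / 2 ** 23)
    return sign, exponent, fraction, value


sign, exponent, fraction, value = decode_float32(0x434D4000)
print(sign, exponent, exponent - 127, value)   # 0 134 7 205.25

# Cross-check against the machine's own interpretation of the same 4 bytes.
(reference,) = struct.unpack(">f", (0x434D4000).to_bytes(4, "big"))
print(reference == value)                      # True
```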

64-bit Format (Long Real or Double Precision): The 64-bit format has
the same structure as the 32-bit format except that it uses an 11-bit
exponent encoded in excess-1023 notation and a 52-bit mantissa. This
provides about 15 decimal digits of precision and magnitudes up to about
10^308.


Floating-point Number (64-bit Format)
Bit 63: 1-bit Sign | Bits 62-52: 11-bit Exponent | Bits 51-0: 52-bit Fraction (Mantissa)
Fig. 4-1(f). Floating-point number representation, according to IEEE 754 format,
for 64-bit (long real or double precision) numbers
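
The 64-bit layout can be inspected in the same way. The sketch below uses
Python's standard struct module to obtain the raw bits of a double and
then splits them into the three fields; split_float64 is a hypothetical
helper name used only for this illustration.

```python
import struct


def split_float64(x: float):
    """Return (sign, biased exponent, fraction) of a 64-bit IEEE 754 value."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF          # excess-1023 exponent
    fraction = bits & ((1 << 52) - 1)        # 52 stored fraction bits
    return sign, exponent, fraction


sign, exponent, fraction = split_float64(205.25)
# The actual exponent is the stored exponent minus the bias of 1023.
print(sign, exponent - 1023, hex(fraction))
```

For 205.25 the actual exponent is again 7, and the fraction holds the same
bit pattern as in the 32-bit case, padded with zeros on the right.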

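80-bit Format (Extended Precision): The 80-bit format also has the same
overall structure as the 32-bit format, except that it uses a 15-bit
exponent and a 64-bit mantissa. In the x86 extended format, the leading
integer bit of the mantissa is stored explicitly rather than implied.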

Floating-point Number (80-bit Format)
Bit 79: 1-bit Sign | Bits 78-64: 15-bit Exponent | Bits 63-0: 64-bit Fraction (Mantissa)
Fig. 4-1(g). Floating-point number representation, according to IEEE 754 format,
for 80-bit (extended precision) numbers

Floating-point values may be handled completely through software, by
means of an extra "math co-processor", or by complex instructions built
into the main processor. Some special floating-point values, like NaN
(Not a Number), are represented by special bit patterns that are
recognized by the CPU.

Note 4-1. Data Types in Different Computer Systems

The basic data types are similar in most computer systems. However, in
some computer systems, the word size is 32 bits. For instance, in RISC
machines, such as SPARC processors, the width of basic data types is as
follows:
Byte: 8 bits
Halfword: 16 bits
Word: 32 bits
Extended Word: 64 bits
Doubleword: 64 bits
Quadword: 128 bits
Additional data types, such as integers, may also have different sizes in
different computer systems. For instance, SPARC processors have the
following additional data types:
Signed Integer: 8, 16, 32, and 64 bits
Unsigned Integer: 8, 16, 32, and 64 bits
Floating Point: 32, 64, and 128 bits (according to IEEE 754 format)


4-3. Instruction Format of x86 Microprocessors


The information encoded in an x86 instruction includes a specification of
the operation to be performed (opcode), and the arguments (operands) to
be manipulated. If an operand is located in memory, the instruction must
also select, explicitly or implicitly, which of the currently addressable
segments contains the operand. The 80x86 instructions are composed of
various elements and have various formats. The fundamental elements of
instructions are described below. Figure 4-2 depicts the fundamental
instruction format for x86 microprocessors. The exact format of
instructions is described in Appendix B.

Prefix | Opcode     | Operands
       | (Mnemonic) | (Argument1, Argument2, Argument3)

Fig. 4-2. Fundamental instruction format for x86 microprocessors.

Here, Prefix is one or more bytes preceding an instruction to modify
its operation. It may be one of the instruction prefixes (like REP) or a
segment override prefix. For instance, the operand-size override prefix
(66H) selects the non-default operand size, for example 32-bit operands
when 80386 through Pentium processors run 16-bit code. The REX prefix
specifies a 64-bit operand size and provides access to additional
registers in x86-64 microprocessors.

A mnemonic is a reserved name (e.g., MOV) for a class of instruction
opcodes that have the same function.

The operands argument1, argument2, and argument3 are optional. There
may be from zero to three operands, depending on the opcode. When
present, they take the form of either literals or identifiers for data items.

Operand identifiers are either reserved names of registers or are assumed
to be assigned to data items declared in another part of the program
(which may not be shown in the example). When two operands are
present in an instruction that modifies data, the right operand is the source
and the left operand is the destination.

Example 4-2:
Consider the following instruction: MOV EAX, SUBTOTAL

In this example MOV is the mnemonic identifier of an opcode, EAX is
the destination operand, and SUBTOTAL is the source operand.


4-4. Addressing Modes of x86 Microprocessors


Addressing modes are a set of methods for specifying the operands of a
microprocessor instruction. Different processors vary in the number of
addressing modes they provide. The most common modes are
immediate, register, and memory (or absolute) modes. The following
table lists the most common addressing modes, with examples for x86
processors. In immediate addressing mode, the operand is contained
within the instruction. In register addressing mode, the operand is stored
in a specified register. In memory or absolute addressing mode, the
operand is stored at a specified memory address.

Addressing Mode                              Examples

1) Immediate Addressing                      ADD CX,1024
2) Register Addressing                       MOV AL,BL
3) Memory Addressing
   ¤ Direct:                                 MOV AX,[3000]
   ¤ Indirect:
     * Register Indirect,                    MOV CL,[BX]
     * Based,                                MOV CX,[BX+3]
     * Indexed,                              MOV AX,[DI+2]
     * Based Indexed,                        MOV [BP+SI],BL
     * Based Indexed with Displacement,      MOV [BX+SI+208],AH
4) String Addressing                         MOVSB
5) Port Addressing                           IN AL,40
                                             OUT 80,AL
Fig. 4-3. Basic addressing modes of the x86 microprocessors.

The memory address may be Directly given (e.g., [3000]), or Indirectly
pointed to by a register (e.g., [BX]; in 16-bit code only BX, BP, SI and
DI may be used for this purpose), or Based (e.g., [BX+3]), or
Indexed (e.g., [DI+2]), or Based Indexed (e.g., [BP+SI]), or Based
Indexed with Displacement (e.g., [BX+SI+208]). Port addressing
mode, which is used in input and output instructions (IN, OUT), works
like direct or register indirect memory addressing, but it refers to the
separate I/O address space.
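
The way the 16-bit memory addressing modes form an offset can be sketched
in Python as follows. The register values and the helper name
effective_address are assumptions made for illustration; the result is
truncated to 16 bits, as a 16-bit offset wraps around within its segment.

```python
def effective_address(regs, base=None, index=None, disp=0):
    """Form a 16-bit offset from an optional base, index and displacement."""
    ea = disp
    if base is not None:
        ea += regs[base]      # BX or BP
    if index is not None:
        ea += regs[index]     # SI or DI
    return ea & 0xFFFF        # offsets wrap around at 64 KB


# Hypothetical register contents, chosen only for this example.
regs = {"BX": 0x8000, "BP": 0x0100, "SI": 0x0010, "DI": 0x0020}

print(hex(effective_address(regs, disp=0x3000)))                          # direct: 0x3000
print(hex(effective_address(regs, base="BX")))                            # register indirect: 0x8000
print(hex(effective_address(regs, base="BX", disp=3)))                    # based: 0x8003
print(hex(effective_address(regs, index="DI", disp=2)))                   # indexed: 0x22
print(hex(effective_address(regs, base="BP", index="SI")))                # based indexed: 0x110
print(hex(effective_address(regs, base="BX", index="SI", disp=0x208)))    # 0x8218
```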

Most processors have indirect addressing modes (register indirect, or
memory indirect) where the specified register or memory location does
not contain the operand but contains its address (actually the effective
address EA). On some processors, indirect addressing modes also have
options for pre- or post-increment or decrement.


Multidimensional arrays of data structures may use complex addressing
modes, where the effective address is formed by adding one or more
registers and one or more constants.

It should be noted that the addressing mode may sometimes be
"implicit". This would be the case for instructions that modify a control
register in the CPU, or in a stack-based processor where operands are
always on the top of the stack.

Example 4-3:
Draw a schematic representation depicting the execution of the indirect
(based) addressing instruction: MOV AX,[BX+3]. Assume that BX contains
8000H and the contents of the data segment offset locations 8003H, 8004H
are 99H, 77H, respectively. Consider that the data segment register is
(DS) = A000H.
Solution:
As shown in figure 4-4, this instruction causes the following actions:

1) The effective address is calculated as follows:
   EA = (BX) + 3 = 8000H + 03H = 8003H
2) The physical memory address is calculated as follows:
   Physical address = Linear address = Segment Address*16 + Offset
   = (DS)*16 + EA = A0000H + 8003H = A8003H
3) Physical memory locations (A8003H, A8004H) are referenced by the CPU.
4) The content of physical memory locations A8003H, A8004H (2 bytes) is
   moved from memory to AX. Since the low-order byte is stored at the
   lower address, AX = 7799H.

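[Figure 4-4: the CPU registers (AX = 7799, BX = 8000, DS = A000) beside a
memory map extending to FFFFF, with 99 stored at A8003 and 77 at A8004.
Step (1): EA = (BX) + 3 = 8003. Step (2): Physical address = (DS)*16 + EA
= A8003. Step (3): memory is referenced. Step (4): the word is loaded
into AX.]

Fig. 4-4. Sequence of operations for executing the instruction: MOV AX,[BX+3]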
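
The address arithmetic of Example 4-3 can be reproduced with the following
Python sketch. The helper names and the dictionary used as a toy memory
are assumptions for illustration; the 20-bit mask models the address
wrap-around of the 8086, which has only 20 address lines.

```python
def physical_address(segment: int, offset: int) -> int:
    """Real-mode address: segment * 16 + offset, limited to 20 bits (8086)."""
    return ((segment << 4) + offset) & 0xFFFFF


def read_word(memory: dict, segment: int, offset: int) -> int:
    """Read a 16-bit word; the low byte is stored at the lower address."""
    low = memory[physical_address(segment, offset)]
    high = memory[physical_address(segment, (offset + 1) & 0xFFFF)]
    return (high << 8) | low


# Values taken from Example 4-3.
memory = {0xA8003: 0x99, 0xA8004: 0x77}
DS, BX = 0xA000, 0x8000

ea = (BX + 3) & 0xFFFF
print(hex(ea))                            # 0x8003
print(hex(physical_address(DS, ea)))      # 0xa8003
print(hex(read_word(memory, DS, ea)))     # 0x7799  (new content of AX)
```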


4-5. Intel 8086/80186/80286/80386/80486 Instruction Set


In this section we take a quick look at the entire instruction set of
Intel microprocessors. The following table recapitulates most of the 8086/
80186/80286/80386/80486 microprocessor instructions, arranged in
alphabetical order. The processor column indicates the processors to
which the instruction belongs. The + sign means that the instruction is
also valid for later processors, because x86 processors are backward
compatible with their predecessors. So, 386+ means that this
instruction is valid for 80386 and later processors. Also, PM means a
"protected mode" instruction and PRV means a "privileged" instruction.
When the processor type column is blank, the instruction is valid for all
the above-mentioned processors.
Table 4-1. Instruction set of x86 microprocessors, arranged in alphabetical order.

Instruction Description Processor

AAA Ascii Adjust after Addition 8086+


AAD Ascii Adjust for Division 8086+
AAM Ascii Adjust after Multiplication 8086+
AAS Ascii Adjust after Subtraction 8086+
ADC Add With Carry 8086+
ADD Arithmetic Addition 8086+
AND Logical And 8086+
ARPL Adjust Requested Privilege Level 286+ PM

BOUND Array Index Bound Check 80188+


BSF Bit Scan Forward 386+
BSR Bit Scan Reverse 386+
BSWAP Byte Swap 486+
BT Bit Test 386+
BTC Bit Test with Complement 386+
BTR Bit Test with Reset 386+
BTS Bit Test and Set 386+

CALL Procedure Call 8086+


CBW Convert Byte to Word 8086+
CDQ Convert Double to Quad 386+
CLC Clear Carry 8086+
CLD Clear Direction Flag 8086+
CLI Clear Interrupt Flag (disable) 8086+


CLTS Clear Task Switched Flag 286+ PRV


CMC Complement Carry Flag 8086+
CMP Compare 8086+
CMPS Compare String (Byte, Word, Dword) 8086+
CMPXCHG Compare and Exchange 486+
CWD Convert Word to Doubleword 8086+
CWDE Convert Word to Extended Dword 386+

DAA Decimal Adjust for Addition 8086+


DAS Decimal Adjust for Subtraction 8086+
DEC Decrement 8086+
DIV Divide 8086+

ENTER Make Stack Frame 80188+


ESC Escape 8086+

HLT Halt CPU 8086+

IDIV Signed Integer Division 8086+


IMUL Signed Multiply 8086+
IN Input Byte or Word From Port 8086+
INC Increment 8086+
INS Input String from Port 80188+
INT Interrupt (software) 8086+
INTO Interrupt on Overflow 8086+
INVD Invalidate Cache 486+
INVLPG Invalidate Translation Look-Aside 486+
Buffer Entry
IRET / Interrupt Return 8086+
IRETD 386+

Jxx Jump Instructions (on condition xx) 8086+


JCXZ / Jump if Register CX / 8086+
JECXZ ECX is Zero 386+
JMP Unconditional Jump 8086+
LAHF Load Register AH From Flags 8086+


LAR Load Access Rights 286+ PM


LDS Load Pointer Using DS 8086+
LEA Load Effective Address 8086+
LEAVE Restore Stack for Procedure Exit 80188+
LES Load Pointer Using ES 8086+
LFS Load Pointer Using FS 386+
LGDT Load Global Descriptor Table 286+ PRV
LIDT Load Interrupt Descriptor Table 286+ PRV
LGS Load Pointer Using GS 386+
LLDT Load Local Descriptor Table 286+ PRV
LMSW Load Machine Status Word 286+ PRV
LOCK Lock Bus (single prefix) 8086+
LODS Load String (Byte, Word or Double) 8086+
LOOP Loop and decrement CX if not Zero 8086+
LOOPE / Loop while Equal / 8086+
LOOPZ Loop while Zero
LOOPNZ / Loop while Not Zero / 8086+
LOOPNE Loop while Not Equal
LSL Load Segment Limit 286+ PM
LSS Load Pointer Using SS 386+
LTR Load Task Register 286+ PRV

MOV Move Byte or Word 8086+


MOVS Move String (Byte or Word) 8086+
MOVSX Move with Sign Extend 386+
MOVZX Move with Zero Extend 386+
MUL Unsigned Multiply 8086+

NEG Two's Complement Negation 8086+


NOP No Operation (90h) 8086+
NOT One's Complement, Logical NOT 8086+

OR Inclusive Logical OR 8086+


OUT Output Data to Port 8086+
OUTS Output String to Port 80188+



POP Pop Word off Stack 8086+
POPA / Pop All Registers off Stack 80188+
POPAD 386+
POPF / Pop Flags off Stack 8086+
POPFD 386+
PUSH Push Word onto Stack 8086+
PUSHA / Push All Registers onto Stack 80188+
PUSHAD 386+
PUSHF / Push Flags onto Stack 8086+
PUSHFD 386+

RCL Rotate Through Carry Left 8086+


RCR Rotate Through Carry Right 8086+
REP Repeat String Operation 8086+
REPE / REPZ Repeat Equal / Repeat Zero 8086+
REPNE / REPNZ Repeat Not Equal / Repeat Not Zero 8086+
RET / RETF Return From Procedure 8086+
ROL Rotate Left 8086+
ROR Rotate Right 8086+

SAHF Store AH Register into FLAGS 8086+


SAL Shift Arithmetic Left 8086+
SAR Shift Arithmetic Right 8086+
SBB Subtract with Borrow/Carry 8086+
SCAS Scan String (Byte, Word or Dword) 8086+
SETAE / Set if Above or Equal / 386+
SETNB Set if Not Below
SETB / Set if Below / 386+
SETNAE Set if Not Above or Equal
SETBE / Set if Below or Equal / 386+
SETNA Set if Not Above
SETE / SETZ Set if Equal / Set if Zero 386+
SETNE / SETNZ Set if Not Equal / Set if Not Zero 386+
SETL / SETNGE Set if Less / Set if Not Greater or Equal 386+
SETGE / SETNL Set if Greater or Equal / Set if Not Less 386+
SETLE / Set if Less or Equal / 386+
SETNG Set if Not Greater


SETG / SETNLE Set if Greater / Set if Not Less or Equal 386+


SETS Set if Signed 386+
SETNS Set if Not Signed 386+
SETC Set if Carry 386+
SETNC Set if Not Carry 386+
SETO Set if Overflow 386+
SETNO Set if Not Overflow 386+
SETP / SETPE Set if Parity / Set if Parity Even 386+
SETNP / SETPO Set if No Parity / Set if Parity Odd 386+
SGDT Store Global Descriptor Table 286+ PRV
SIDT Store Interrupt Descriptor Table 286+ PRV
SHL Shift Logical Left 8086+
SHR Shift Logical Right 8086+
SHLD / SHRD Double Precision Shift 386+
SLDT Store Local Descriptor Table 286+ PRV
SMSW Store Machine Status Word 286+ PRV
STC Set Carry Flag 8086+
STD Set Direction Flag 8086+
STI Set Interrupt Flag (Enable Interrupts) 8086+
STOS Store String (Byte, Word or Dword) 8086+
STR Store Task Register 286+ PRV
SUB Subtract 8086+

TEST Test For Bit Pattern 8086+

VERR Verify Read 286+ PM


VERW Verify Write 286+ PM

WAIT / FWAIT Event Wait 8086+


WBINVD Write Back and Invalidate Cache 486+

XCHG Exchange (two locations) 8086+


XLAT / XLATB Translate / Translate Byte 8086+
XOR Exclusive OR 8086+

Note that several instructions of the 80x86 assembly language share
similar features and there are often several ways to accomplish a task.
This is a general feature of assembly instructions of CISC machines.
However, this is not always the case. For instance, there are only three
instructions to set three flags, namely:
STC (set CF), STI (set IF) and STD (set DF).

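In order to set other flags, you have to work around this by pushing the
flags onto the stack (PUSHF), changing the flag bits you would like to
modify, and then popping the stack back into the FLAGS register (POPF).
The following piece of code sets the trap flag TF (bit 8 of FLAGS). Note
that it overwrites BP.

PUSHF                       ; Push FLAGS onto the stack
MOV BP,SP                   ; Point BP to the saved FLAGS word
OR WORD PTR [BP+0],0100H    ; This is a mask to set the TF (bit 8)
MOV SP,BP                   ; Reload SP from BP (SP has not changed)
POPF                        ; Pop the modified word back into FLAGS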
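
The masking step in this code can be modeled in Python as follows. The
bit positions are those of the 16-bit FLAGS register; the helper names
(set_flag, clear_flag, flag_is_set) are hypothetical and only illustrate
how an OR mask sets a single bit and an AND mask clears it.

```python
# Bit positions of the status and control flags in the 16-bit FLAGS register.
FLAG_BITS = {"CF": 0, "PF": 2, "AF": 4, "ZF": 6, "SF": 7,
             "TF": 8, "IF": 9, "DF": 10, "OF": 11}


def set_flag(flags: int, name: str) -> int:
    """OR the word with a one-bit mask, as OR WORD PTR [BP+0],0100H does for TF."""
    return (flags | (1 << FLAG_BITS[name])) & 0xFFFF


def clear_flag(flags: int, name: str) -> int:
    """AND the word with the inverted mask to clear a single flag."""
    return flags & ~(1 << FLAG_BITS[name]) & 0xFFFF


def flag_is_set(flags: int, name: str) -> bool:
    return bool(flags & (1 << FLAG_BITS[name]))


flags = 0x0002                      # example value of the saved FLAGS word
flags = set_flag(flags, "TF")
print(hex(flags))                   # 0x102
print(flag_is_set(flags, "TF"))     # True
print(hex(clear_flag(flags, "TF"))) # 0x2
```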

Note 4-2. Register Notations

From now on, we may use the notation (E)REG, where REG is any 16-bit
register, to indicate that an instruction may be used with either the
16-bit register (e.g., SP) or the corresponding 32-bit register (ESP).
Also, we may use the notation (R)REG to indicate either a 16-bit register
(e.g., SP), the corresponding 32-bit register (ESP), or the corresponding
64-bit register (RSP). However, we sometimes drop the initial letter (E)
or (R) for simplicity.

4-6. Basic Instructions of x86 Microprocessors (by Category)


In this section, we present a detailed summary (by category) of the
instruction set of x86 microprocessors. More detailed description of the
x86 instructions and their opcodes is given in Appendices A and B.

The x86 instruction set can be grouped in the following categories:

1. Data transfer instructions (e.g., MOV, XCHG)
2. Arithmetic instructions (e.g., ADD, SUB, MUL, DIV)
3. Logic (bit manipulation) instructions (e.g., AND, OR, NOT, ...)
4. String instructions (e.g., MOVS, CMPS, ...)
5. Program transfer instructions (e.g., CALL, JMP)
6. Processor control instructions (e.g., HLT, CLC)


4-6.1. Data Transfer Instructions


These instructions provide convenient methods for moving bytes, words,
or double words of data between memory and the registers of the base
architecture. They fall into the following classes:

i. General-purpose data movement instructions.
ii. Stack manipulation instructions.
iii. Type-conversion instructions.

4-6.1.i. General-Purpose Data Movement Instructions


MOV (Move) transfers a byte, word, or double word from the source
operand to the destination operand. It has the syntax MOV dest,src,
where src is the source operand and dest is the destination operand. The
MOV instruction is useful for transferring data along any of these paths:

- To a register from memory,       MOV BX,[300]
- To memory from a register,       MOV [200],AL
- Between general registers,       MOV BX,CX
- Immediate data to a register,    MOV CX,78H
- Immediate data to memory,        MOV BYTE PTR [BX],95H

In the last case, the size of the memory operand must be stated (BYTE PTR
or WORD PTR), because neither operand implies it.

The MOV instruction cannot move from memory to memory or from
segment register to segment register. Memory-to-memory moves can be
performed, however, by the string move instruction MOVS. Other
prohibited moves are an immediate operand to a segment register,
any data to the CS register, and any data to the IP register. There are
variants of MOV that operate on segment registers, and other instructions
that transfer data between memory locations, ports and registers.
They are all summarized in the following table:

Table 4-2. Variant data transfer instructions, in x86 microprocessors

Instruction   Function                               Notes

IN            Input byte or word from input port     8086+
OUT           Output byte or word to output port     8086+
LAHF          Load AH from flags                     For 8085 compatibility
SAHF          Store AH into flags                    For 8085 compatibility
LDS           Load pointer using DS                  8086+
LEA           Load effective address                 8086+
XLAT          Translate byte                         8086+
INS           Input string from input port           80188+
OUTS          Output string to output port           80188+
LFS           Load pointer using FS                  386+
LGS           Load pointer using GS                  386+
LSS           Load pointer using SS                  386+
MOVSX         Move with sign extension               386+
MOVZX         Move with zero extension               386+
BSWAP         Byte swap                              486+
MOV CRn       Move to/from control register          386+
CMOVcc        Conditional move                       Pentium Pro+
FCMOVcc       Floating-point conditional move        Pentium Pro+

The conditional move instruction (CMOV) was borrowed from the world
of single instruction computers (SIC), which are some sort of optimized
RISC machines, to enrich the instruction set of the Pentium Pro and later
processors. CMOV has the general syntax CMOVx dest,src, which
corresponds to the following statement:

if condition x is TRUE then dest := src

where x is the condition code. For instance, CMOVS AX,BX transfers BX
to AX if the sign flag is set (SF = 1).
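
The statement above can be written as a small Python model. Only a few
condition codes are shown, and the names CONDITIONS and cmov are
hypothetical; the point is simply that the destination keeps its old
value when the condition is false.

```python
# A few condition codes, each expressed as a test on the flags.
CONDITIONS = {
    "S":  lambda f: f["SF"] == 1,   # sign
    "NS": lambda f: f["SF"] == 0,   # not sign
    "Z":  lambda f: f["ZF"] == 1,   # zero / equal
    "NZ": lambda f: f["ZF"] == 0,   # not zero / not equal
    "C":  lambda f: f["CF"] == 1,   # carry
    "NC": lambda f: f["CF"] == 0,   # no carry
}


def cmov(cc: str, flags: dict, dest: int, src: int) -> int:
    """Return the new destination value: src if condition cc holds, else dest."""
    return src if CONDITIONS[cc](flags) else dest


flags = {"SF": 1, "ZF": 0, "CF": 0}
ax, bx = 0x1111, 0x2222
print(hex(cmov("S", flags, ax, bx)))   # 0x2222  (SF = 1, so AX receives BX)
print(hex(cmov("Z", flags, ax, bx)))   # 0x1111  (ZF = 0, so AX is unchanged)
```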

XCHG (Exchange) swaps the contents of two operands. This instruction
takes the place of three MOV instructions, without the need for a
temporary location to save the content of one operand while loading the
other. The XCHG instruction can swap two bytes, two words, or two dwords.
The operands for the XCHG instruction may be two registers, or a register
and a memory operand. When used with a memory operand, XCHG
activates the LOCK signal (bus lock). XCHG is useful for implementing
semaphores or similar data structures for process synchronization.

IN and OUT are special data transfer instructions. IN reads data in from
an input port and OUT writes data out to an output port. IN and OUT have
various formats. For instance, IN AL,1FH will input data to AL from the
input port address 1FH. Also, OUT 0FH,AL will output data from AL to
the output port address 0FH. The DX register is sometimes used to hold
the port address. For instance, IN AL,DX will input data to AL from the
input port whose address is stored in DX. Such instructions will be
discussed in detail in chapter 8.


The translate byte instructions (XLAT / XLATB) are frequently used for
translating values from one encoding format to another. XLATB replaces
the byte in AL with a byte from a user table addressed by BX. The original
value of AL is the index into the translate table. For instance, if we want
to transform a 4-bit binary number to an ASCII-encoded hexadecimal
digit, we proceed as follows (assuming the 4-bit digit is in AL):

TABLE DB '0123456789ABCDEF'   ; Define a table of bytes
MOV BX,OFFSET TABLE           ; Point BX to the table
XLAT                          ; Replace AL with the byte at TABLE[AL]
…….

This is equivalent to adding 30H to AL if the digit is between 0 and 9,
and adding 37H if it is between A and F.
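
The table lookup and the equivalent arithmetic can be compared with a
short Python sketch. The function names xlat and add_offset are
hypothetical; xlat simply models "AL := table[AL]" and is not an emulation
of the real instruction's segment handling.

```python
TABLE = b"0123456789ABCDEF"   # same table of bytes as in the listing above


def xlat(table: bytes, al: int) -> int:
    """Model of XLAT: replace AL by the table byte that AL indexes."""
    return table[al & 0xFF]


def add_offset(al: int) -> int:
    """Equivalent arithmetic: add 30H for 0-9 and 37H for A-F."""
    return al + 0x30 if al <= 9 else al + 0x37


print(chr(xlat(TABLE, 0x7)))   # 7
print(chr(xlat(TABLE, 0xB)))   # B
print(all(xlat(TABLE, d) == add_offset(d) for d in range(16)))   # True
```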

Example 4-4:
Show how to use the XLAT instruction to perform the translation from
binary-coded decimal (BCD) to 7-segment code, as shown in figure 4-5.
Solution:
Consider the following BCD to 7-segment translation program, where the
unpacked BCD string begins at the address BCD_STR and the resultant
7-segment code is to be stored at SEG7_STR.

Digit   G F E D C B A   Hex code
0       0 1 1 1 1 1 1   3F
1       0 0 0 0 1 1 0   06
2       1 0 1 1 0 1 1   5B
3       1 0 0 1 1 1 1   4F
4       1 1 0 0 1 1 0   66
5       1 1 0 1 1 0 1   6D
6       1 1 1 1 1 0 1   7D
7       0 0 0 0 1 1 1   07
8       1 1 1 1 1 1 1   7F
9       1 1 0 1 1 1 1   6F
Fig. 4-5. BCD to 7-segment code translation.

The next program is written in the form of a procedure (subroutine)
that can be called from any assembly program using the CALL
instruction. Note that LEA BX,TABLE is equivalent to MOV BX,OFFSET
TABLE. Note also that the XLAT operand (TABLE, in the program below) is
a dummy operand that is only used to improve the readability of the code.
The names SEG7 and SEG7_STR are used because assembler identifiers may
not begin with a digit, and the table is defined as data, outside the
procedure, so that it is never executed as code.

; Data segment: 7-segment codes of the digits 0-9
TABLE   DB 3FH                  ; 0 code
        DB 06H                  ; 1
        DB 5BH                  ; 2
        DB 4FH                  ; 3
        DB 66H                  ; 4
        DB 6DH                  ; 5
        DB 7DH                  ; 6
        DB 07H                  ; 7
        DB 7FH                  ; 8
        DB 6FH                  ; 9

; Code segment
SEG7    PROC FAR
        PUSH BX
        LEA BX,TABLE            ; Load base address of TABLE to BX
        LEA SI,BCD_STR          ; Set up pointer to source (BCD) string
        LEA DI,SEG7_STR         ; Set up pointer to destination (7-segment) string
        CLD                     ; Clear direction flag
        MOV CX,LENGTH BCD_STR   ; Load counter with length of BCD string
CONVERT:
        LODS BCD_STR            ; Load a BCD_STR digit to AL
        CMP AL,9                ; Check digit, is it > 9 ?
        JA INVALID              ; If true, invalid digit is detected
        XLAT TABLE              ; Otherwise convert digit to 7-segment code
        STOS SEG7_STR           ; Store the result in SEG7_STR
        LOOP CONVERT            ; Repeat for next digit until end of BCD_STR
INVALID:
        POP BX
        RET
SEG7    ENDP
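
The behavior of this procedure can be cross-checked with a Python model.
This is only a sketch: the function name bcd_to_7seg is hypothetical, the
code table is the one of figure 4-5, and the loop stops at the first
invalid digit, as the JA INVALID branch does.

```python
# 7-segment codes of the digits 0-9, taken from figure 4-5 (bit order G..A).
SEVEN_SEG = bytes([0x3F, 0x06, 0x5B, 0x4F, 0x66, 0x6D, 0x7D, 0x07, 0x7F, 0x6F])


def bcd_to_7seg(bcd_digits: bytes) -> bytes:
    """Translate unpacked BCD digits to 7-segment codes by table lookup."""
    result = bytearray()
    for digit in bcd_digits:          # LODS: fetch the next digit
        if digit > 9:                 # CMP AL,9 / JA INVALID
            break
        result.append(SEVEN_SEG[digit])   # XLAT followed by STOS
    return bytes(result)


print(bcd_to_7seg(bytes([1, 9, 5])).hex())    # 066f6d
print(bcd_to_7seg(bytes([2, 12, 3])).hex())   # 5b  (stops at the invalid digit)
```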

4-6.1.ii. Stack Manipulation Instructions


As mentioned earlier, the stack is a data structure for storing items
which are to be accessed in last-in first-out (LIFO) order. The following
instructions are specific to stack operations. Remember that we may use
the notation (E)REG or (R)REG, where REG is any 16-bit register, to
indicate that an instruction may be used with either a 16-bit register or
the corresponding 32-bit or 64-bit registers. However, we drop here the
initial letter (E) or (R), for simplicity.

PUSH (push) decrements the stack pointer SP, then transfers the source
operand to the top of stack indicated by SP, as shown in figure 3-8. PUSH
is often used to place parameters on the stack before calling a procedure;
it is also the basic means of storing temporary variables on the stack.
The PUSH instruction operates on memory operands, immediate operands
(80186 and later), and register operands (including segment registers).

PUSHA (Push All Registers) saves the contents of the eight general
registers on the stack. This instruction simplifies procedure calls by
reducing the number of instructions required to retain the contents of the
general registers for use in a procedure. The processor pushes the general
registers on the stack in the following order: AX, CX, DX, BX, the initial
value of SP before AX was pushed, BP, SI, and DI. PUSHA is
complemented by the POPA instruction.

POP (Pop) transfers the word or double word at the current top of stack,
indicated by SP, to the destination operand, and then increments SP to
point to the new top of stack, as shown in figure 3-8. POP moves
information from the stack to a general register, or to memory. There is
also a variant of POP that operates on segment registers.

POPA (Pop All Registers) restores the registers saved on the stack by
PUSHA, except that it ignores the saved value of SP. There exist other
instructions which deal with the stack. They are summarized in table 4-3.

Table 4-3. Stack instructions, in 80x86 microprocessors

Instruction   Function                                       Processor

POPF          Pop FLAGS register                             8086+
POPD          Pop double word off stack                      386+
POPFD         Pop double word EFLAGS off stack               386+
POPAD         Pop all double word registers off stack        386+
PUSHF         Push FLAGS register                            8086+
PUSHAD        Push all double word registers onto stack      386+
PUSHFD        Push double word EFLAGS register onto stack    386+
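
The PUSH, POP, PUSHA and POPA operations described in this section can be
modeled with the following Python sketch. The class Stack16, the initial
SP value and the register contents are assumptions made for illustration;
the model covers only 16-bit operands within a single 64 KB stack segment.

```python
class Stack16:
    """Toy model of a 16-bit stack inside one 64 KB segment."""

    def __init__(self, sp: int = 0x0100):
        self.memory = bytearray(0x10000)
        self.sp = sp

    def push(self, word: int) -> None:
        self.sp = (self.sp - 2) & 0xFFFF                  # decrement SP first
        self.memory[self.sp] = word & 0xFF                # low byte at lower address
        self.memory[(self.sp + 1) & 0xFFFF] = (word >> 8) & 0xFF

    def pop(self) -> int:
        low = self.memory[self.sp]
        high = self.memory[(self.sp + 1) & 0xFFFF]
        self.sp = (self.sp + 2) & 0xFFFF                  # increment SP afterwards
        return (high << 8) | low


PUSHA_ORDER = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]


def pusha(stack: Stack16, regs: dict) -> None:
    initial_sp = stack.sp                 # the value of SP before AX is pushed
    for name in PUSHA_ORDER:
        stack.push(initial_sp if name == "SP" else regs[name])


def popa(stack: Stack16, regs: dict) -> None:
    for name in reversed(PUSHA_ORDER):
        value = stack.pop()
        if name != "SP":                  # the saved SP value is discarded
            regs[name] = value


stack = Stack16()
stack.push(0x1234)
print(hex(stack.sp), hex(stack.pop()), hex(stack.sp))   # 0xfe 0x1234 0x100
```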

4-6.1.iii. Type Conversion Instructions


The type conversion instructions convert bytes into words, words into
double words, and double words into 64-bit items (quad-words). These
instructions are especially useful for converting signed integers, because
they automatically fill the extra bits of the larger item with the
value of the sign bit of the smaller item. This kind of conversion is called
sign extension.

There are two classes of type conversion instructions:

1. The forms CWD, CDQ, CBW, and CWDE, which operate only on
   data in the accumulator (AL, AX or EAX), extending it into DX or
   EDX where needed.
2. The forms MOVSX and MOVZX, which permit one operand to be
   in any general register while permitting the other operand to be in
   memory or in a register.

CWD (Convert Word to Double word) and CDQ (Convert Double word
to Quad-Word) double the size of the source operand. CWD extends the
sign of the word in register AX throughout register DX. CDQ extends the
sign of the double word in EAX throughout EDX. CWD can be used to
produce a double word dividend from a word before a word division.
CDQ can be used to produce a quad-word dividend from a double word
before double word division.

CBW (Convert Byte to Word) extends the sign of the byte in register AL
throughout AX.

CWDE (Convert Word to Double word Extended) extends the sign of the
word in register AX throughout EAX.

MOVSX (Move with Sign Extension) sign-extends an 8-bit value to a
16-bit value and an 8- or 16-bit value to a 32-bit value, and so on, by
repeating the sign bit into all high-order bytes, as shown in figure 4-6.

MOVZX (Move with Zero Extension) extends an 8-bit value to a 16-bit
value and an 8- or 16-bit value to a 32-bit value, and so on, by inserting
high-order zeros, as shown in figure 4-6.

MOVSX:   1xxxxxxx   ->   11111111 1xxxxxxx
MOVZX:   xxxxxxxx   ->   00000000 xxxxxxxx

Fig. 4-6. Sign extension and zero extension, from 1 byte to 2 bytes.
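
Sign extension and zero extension can be expressed in a few lines of
Python. The helper names below are hypothetical; cwd models the way CWD
fills DX with copies of the sign bit of AX.

```python
def sign_extend(value: int, from_bits: int, to_bits: int) -> int:
    """Copy the sign bit of a from_bits value into all higher bits (MOVSX)."""
    value &= (1 << from_bits) - 1
    if value & (1 << (from_bits - 1)):                    # sign bit is 1
        value |= ((1 << to_bits) - 1) & ~((1 << from_bits) - 1)
    return value


def zero_extend(value: int, from_bits: int) -> int:
    """Keep the low from_bits bits; all higher bits become 0 (MOVZX)."""
    return value & ((1 << from_bits) - 1)


def cwd(ax: int):
    """Model of CWD: return (DX, AX) with DX filled by the sign of AX."""
    dx = 0xFFFF if ax & 0x8000 else 0x0000
    return dx, ax & 0xFFFF


print(hex(sign_extend(0x80, 8, 16)))   # 0xff80
print(hex(sign_extend(0x7F, 8, 16)))   # 0x7f
print(hex(zero_extend(0x80, 8)))       # 0x80
print([hex(r) for r in cwd(0x8000)])   # ['0xffff', '0x8000']
```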

4-6.2. Arithmetic Instructions


There exist various instructions which perform arithmetic operations in
80x86. In fact, the first Intel microprocessor, the 4004, was designed for
calculators. The following table depicts the arithmetic instructions which
are available in 80x86 microprocessors. As shown, the arithmetic
operations can be divided into 5 main groups:

- Addition instructions (e.g., ADD, ADC)
- Subtraction instructions (e.g., SUB, SBB)
- Multiplication instructions (e.g., MUL, IMUL)
- Division instructions (e.g., DIV, IDIV)
- ASCII adjust instructions (e.g., AAA, AAS, AAM, AAD)

The ASCII adjust operations change the content of the accumulator (after
add, subtract or multiply operations, or before a divide operation) to a
valid unpacked decimal number.

Table 4-4. Arithmetic instructions, in 80x86 microprocessors

Instruction   Function                               Notes
AAA           ASCII adjust after addition            No operands
AAD           ASCII adjust for division
AAM           ASCII adjust after multiply
AAS           ASCII adjust after subtract
ADC           Add byte or word plus carry
ADD           Add byte or word
CMP           Compare byte or word                   Affects flags only
DAA           Decimal adjust for addition
DAS           Decimal adjust for subtraction
DEC           Decrement byte or word by 1
DIV           Divide byte or word                    Unsigned
IDIV          Integer divide byte or word
IMUL          Integer multiply byte or word
INC           Increment byte or word by 1
MUL           Multiply byte or word                  Unsigned
NEG           Negate byte or word
SBB           Subtract byte or word and borrow
SUB           Subtract byte or word                  Affects flags
CMPXCHG       Compare and exchange                   486+
XADD          Exchange and add                       486+
CMPXCHG8B     Compare and exchange 8 bytes           Pentium

The basic arithmetic instructions, like ADD (addition) and SUB
(subtraction), support the full range of addressing modes for their
operands. The syntax of these instructions usually takes the form:

ADD operand1,operand2

This instruction adds operand2 to operand1 and stores the answer in
operand1. Immediate data cannot be used as operand1 but can be used as
operand2. Operands may be 8, 16 or 32 bits. The result is always stored
in the destination operand, which is not necessarily the accumulator (as
in the 8080/8085). The MUL (multiply) and DIV (divide) instructions take
only one source operand, which must be a register or a memory location
(not an immediate value). The accumulator (AL, AX or EAX) is then assumed
as the destination operand, according to the size of the supplied source
operand.
For instance, ADD BL,CH adds CH to BL and puts the result in BL. Also
SUB SI,1234H subtracts the immediate value 1234H from the content of SI
and puts the result in SI. The byte instruction MUL CL multiplies AL by
the 8-bit content of CL and puts the 16-bit result in AX. Similarly, MUL
BX multiplies AX by BX and puts the result in DX:AX, as illustrated in
figure 4-7(a).

The product is always twice as long as the operands, so the destination
is extended into a second register. For instance, the word instruction
MUL CX multiplies AX by the 16-bit content of CX and puts the 32-bit
result in the register pair DX:AX. Also, the 32-bit multiplication MUL
ECX multiplies EAX by the 32-bit content of ECX and stores the result in
EDX:EAX. When the upper half of the product is not zero, the carry flag
(CF) and the overflow flag (OF) are set. Figure 4-7(b) illustrates these
operations.
Note that the MUL instruction multiplies two unsigned (positive) integers
while the IMUL instruction multiplies two signed integers (either
positive or negative). Similarly, the DIV instruction divides two unsigned
integers, while the IDIV instruction divides two signed integers.

MUL BX
               DX         AX         BX
Before      0 0 0 0    7 C 4 8    0 1 0 0
After       0 0 7 C    4 8 0 0    0 1 0 0

Fig. 4-7(a). Illustration of MUL BX instruction, where BX contains 0100H.

MUL op
Size     Multiplicand   Multiplier   Product
Byte     AL             (op)         AH:AL
Word     AX             (op)         DX:AX
Dword    EAX            (op)         EDX:EAX

DIV op
Size     Dividend       Divider      Remainder   Quotient
Byte     AH:AL          (op)         AH          AL
Word     DX:AX          (op)         DX          AX
Dword    EDX:EAX        (op)         EDX         EAX

Fig. 4-7(b). Illustration of MUL and DIV instructions, with different operand sizes.
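
The register pairing of MUL and DIV can be checked with a small Python model. This is a sketch under simplifying assumptions: the helper names mul and div and the bits parameter (8, 16 or 32) are illustrative, and a Python exception stands in for the divide error interrupt.

```python
def mul(acc, op, bits):
    """Unsigned MUL: return (high, low, CF) of the double-width product."""
    mask = (1 << bits) - 1
    product = (acc & mask) * (op & mask)
    high, low = product >> bits, product & mask
    return high, low, int(high != 0)      # CF = OF = 1 if the high half is non-zero


def div(high, low, op, bits):
    """Unsigned DIV: return (remainder, quotient) of the double-width dividend."""
    dividend = (high << bits) | low
    if op == 0 or dividend // op >= (1 << bits):
        raise ZeroDivisionError("divide error (INT 0)")
    return dividend % op, dividend // op


dx, ax, cf = mul(0x7C48, 0x0100, 16)      # MUL BX with AX = 7C48H, BX = 0100H
print(f"{dx:04X}:{ax:04X} CF={cf}")       # 007C:4800 CF=1
```

Dividing the product back, div(0x007C, 0x4800, 0x0100, 16), returns a zero remainder and the original quotient 7C48H.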

The ADC (add with carry) instruction also sums two binary operands,
placing the result in the destination. If the carry flag (CF) is set, an
extra 1 is added to the destination. The SBB (subtract with borrow)
instruction subtracts the source from the destination, and subtracts 1
extra if the carry flag is set. Results are returned in the destination.
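
The usual purpose of ADC and SBB is to chain an operation over operands longer than one register. The following Python sketch models this for 16-bit words; the helper names add16 and sub16 are illustrative only.

```python
def add16(a, b, carry_in=0):
    """ADD (carry_in = 0) or ADC (carry_in = CF): return (result, CF)."""
    total = a + b + carry_in
    return total & 0xFFFF, total >> 16


def sub16(a, b, borrow_in=0):
    """SUB (borrow_in = 0) or SBB (borrow_in = CF): return (result, CF)."""
    diff = a - b - borrow_in
    return diff & 0xFFFF, int(diff < 0)


# 32-bit addition 0001FFFFH + 00000001H, done as two 16-bit steps
low, cf = add16(0xFFFF, 0x0001)           # ADD on the low words sets CF
high, cf = add16(0x0001, 0x0000, cf)      # ADC on the high words adds that carry
print(f"{high:04X}{low:04X}")             # 00020000
```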

The AAA (ASCII adjust for addition) and AAS (ASCII adjust for
subtraction) instructions make it possible to perform simple arithmetic
operations directly on ASCII numbers. Also, AAM (ASCII adjust for
multiplication) is used after multiplication of two unpacked decimal
numbers. The high-order nibble of each byte must be zeroed before using
the AAM instruction.

The AAD (ASCII adjust before division) is used before dividing
unpacked decimal numbers. It multiplies AH by 10, adds the result to AL
and then sets AH to zero.

Similarly, the DAA (decimal adjust for addition) and DAS (decimal
adjust for subtraction) instructions make it possible to perform simple
arithmetic operations directly on BCD numbers. In fact, the BCD digits
(0-9) can be stored instead of the usual binary numbers, such that each
BCD digit occupies 4 bits. So, if we add 2 bytes which contain BCD
numbers (each byte contains 2 decimal digits), the result will not
necessarily be correct, because the ADD instruction assumes binary
numbers. The DAA instruction adjusts the result such that when we read it
as BCD, we find it correct.

Example 4-5:
Suppose we have the packed decimal number 72 in AL and we want to add
19 to it. We know that the BCD result is 91. However, the ADD AL,19H
instruction results in AL = 8BH, which is not a valid BCD number. So, the
DAA instruction adjusts the content of AL to the correct answer 91H.

MOV AL,72H ; AL = 72 (packed BCD)
ADD AL,19H ; binary addition of 72H and 19H gives AL = 8BH
DAA ; adjust AL such that AL = 72 + 19 = 91 (BCD)
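
The adjustment made by DAA in this example can be traced with a short Python model. This is a simplified sketch of the documented DAA behavior; the helper names add_al and daa are illustrative, and only AL, CF and the auxiliary carry AF are modeled.

```python
def add_al(al, value):
    """ADD AL,value: return (AL, CF, AF)."""
    total = al + value
    af = int((al & 0x0F) + (value & 0x0F) > 0x0F)   # carry out of the low nibble
    return total & 0xFF, total >> 8, af


def daa(al, cf, af):
    """DAA: correct AL after adding two packed BCD bytes; return (AL, CF)."""
    old_al = al
    if (al & 0x0F) > 9 or af:
        al += 0x06                  # fix the low decimal digit
    if old_al > 0x99 or cf:
        al += 0x60                  # fix the high decimal digit
        cf = 1
    return al & 0xFF, cf


al, cf, af = add_al(0x72, 0x19)     # the binary sum is 8BH
al, cf = daa(al, cf, af)
print(f"{al:02X}")                  # 91
```

Adding 99H and 01H gives AL = 00H with CF = 1 after the adjustment, that is, the decimal result 100.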

4-6.3. Logic Instructions


There exist many logic instructions, which perform logic algebra and
bit operations in x86 processors. The following table depicts the x86 logic
instructions. The logic instructions can be divided into 5 main groups:

• Main logic operations (e.g., AND, OR, XOR, NOT)
• Arithmetic shift operations (e.g., SAL, SAR)
• Logic shift operations (e.g., SHL, SHR, SHLD, SHRD)
• Rotate operations (e.g., ROL, ROR, RCL, RCR)
• Bit test operations (e.g., BT, BTC, BTR)

The main logic operations are performed between a destination operand
(often the accumulator) and a source operand, and the result is saved in
the destination. Figure 4-8 depicts the main shift and rotate operations,
and their effect on different flags, in x86 microprocessors.
The AND, OR and XOR instructions operate on 8, 16 and 32 bit operands.
One of the interesting uses of the AND instruction is to mask (set to
zero) selected bits in some value. For instance, to mask all bits of AL
except for the first (least significant) bit, we use the instruction:

AND AL,01H ; Mask all AL bits, except for first bit

Note that AND will change the destination content (AL) and will affect
the flags according to the result. The TEST instruction can be used for
masking without changing the content of the destination. For instance,
TEST is used to check if a certain bit is zero or one:

TEST AL,01H ; Test first bit of AL (check if it is 0 or 1)

If the first bit of AL is zero, the result is zero and the zero flag (ZF)
is set. Also, we can use TEST to check whether the content of DL is
positive or negative using the following instruction:

TEST DL,80H ; Check if DL content is positive or negative

If the content of DL is positive (the most significant bit is zero) then
the zero flag (ZF) will be set.
The following program reads data from an input port (IPORT1) and waits
until the first bit of the input byte goes from high (1) to low (0):

MOV DX,IPORT1 ; Point to IPORT1
HANG: IN AL,DX ; Input byte from IPORT1
TEST AL,01H ; Check first bit of input byte
JNZ HANG ; Keep on checking if not 0
ISLOW: … ; Come here if first bit is 0

Similarly, the XOR instruction can be used for toggling (ones to zeros
and zeros to ones) the bits of a certain value. For instance, in order to
toggle all bits of the byte variable MEMO we use the instruction:

XOR MEMO,0FFH ; Toggle MEMO bits
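
The difference between AND, TEST and XOR can be seen in a short Python model. This is a sketch only: the function names are illustrative, and of all the flags only ZF is modeled.

```python
def and_op(dest, src):
    """AND: return (new destination, ZF)."""
    result = dest & src
    return result, int(result == 0)


def tst_op(dest, src):
    """TEST: same ZF as AND, but the destination is left unchanged."""
    return dest, int((dest & src) == 0)


def xor_op(dest, src):
    """XOR: return (new destination, ZF)."""
    result = dest ^ src
    return result, int(result == 0)


al = 0xB6                              # 1011 0110B
print(and_op(al, 0x01))                # (0, 1): bit 0 is 0 and AL is overwritten
print(tst_op(al, 0x80))                # (182, 0): bit 7 is 1 and AL is preserved
print(hex(xor_op(0x5A, 0xFF)[0]))      # 0xa5: every bit is toggled
```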

Table 4-5. Logic instructions, in 80x86 microprocessors

Instruction   Function                                Notes
AND           Logical AND of byte or word             8086+
NOT           Logical NOT of byte or word             8086+
OR            Logical OR of byte or word              8086+
RCL           Rotate left byte or word (via carry)    8086+
RCR           Rotate right byte or word (via carry)   8086+
ROL           Rotate left byte or word                8086+
ROR           Rotate right byte or word               8086+
SAL           Arithmetic shift left byte or word      8086+
SAR           Arithmetic shift right byte or word     8086+
SHL           Logical shift left byte or word         8086+
SHR           Logical shift right byte or word        8086+
XOR           Logical X-OR of byte or word            8086+
BSF           Bit scan forward                        80386+
BSR           Bit scan reverse                        80386+
BT            Bit test                                80386+
BTC           Bit test and complement                 80386+
BTR           Bit test and reset                      80386+
SHLD          Shift left double precision             80386+
SHRD          Shift right double precision            80386+
SETcc         Set byte on condition                   80386+

As shown in figure 4-8, in the SHR operation, all bits are shifted right
by 1 bit, and the most significant (leftmost) bit is filled with zero.
The SHR is equivalent to unsigned division by 2.

Similarly, in the SHL operation, all bits are shifted left by 1 bit, and
the least significant (rightmost) bit is filled with zero. The SHL is
equivalent to unsigned multiplication by 2.

On the other hand, the arithmetic shift right (SAR) operation is similar
to SHR, with the following exception. In SAR, the most significant bit is
filled with a copy of the value it had before shifting (the sign bit). So,
the shifted register keeps its sign after the shift operation (and is
divided by 2).

[Figure 4-8 shows the shift operations (SHR, SHL, SAR, SAL) and the
rotate operations (ROR, ROL, RCR, RCL) as bit-flow diagrams. In each
diagram the bit leaving the operand goes to the carry flag (CF). SHR, SHL
and SAL shift in a 0, while RCR and RCL rotate through CF.]

Fig. 4-8. Shift and Rotate operations in 80x86 microprocessors.
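
The operations of figure 4-8 can be modeled on a byte with a few Python functions. This is a minimal sketch with single-bit shifts only; each illustrative function takes an operand in the range 0 to 255 and returns the new operand together with CF.

```python
BITS = 8
MASK = (1 << BITS) - 1


def shr(x):        # logical right shift: 0 enters at the left, bit 0 goes to CF
    return x >> 1, x & 1


def shl(x):        # left shift (SHL = SAL): 0 enters at the right, bit 7 goes to CF
    return (x << 1) & MASK, x >> (BITS - 1)


def sar(x):        # arithmetic right shift: the sign bit keeps its value
    return (x >> 1) | (x & 0x80), x & 1


def ror(x):        # rotate right: bit 0 re-enters at the left and is copied to CF
    return (x >> 1) | ((x & 1) << (BITS - 1)), x & 1


def rcl(x, cf):    # rotate left through carry: CF enters at the right
    return ((x << 1) | cf) & MASK, x >> (BITS - 1)


print(shr(0xF0))       # (120, 0): 78H, unsigned 240 / 2
print(sar(0xF0))       # (248, 0): F8H, signed -16 / 2 = -8
print(shl(0x81))       # (2, 1): bit 7 moved into CF
print(ror(0x01))       # (128, 1)
print(rcl(0x80, 1))    # (1, 1)
```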

4-6.4. String Instructions


Strings are chains of characters (or bytes). For instance, your given name
is a chain of characters, which may be called a string. In fact each
character can be stored in a byte, using the ASCII code. The x86
microprocessors can handle strings with several operations, like moving
a string or comparing two strings. String instructions make use of the data
segment (DS) and the extra segment (ES) as the source and destination
base addresses, and the source index (SI) and destination index (DI) as
pointers for source and destination strings. So, in string instructions, the
source string is pointed to by DS:SI and the destination string is pointed
to by ES:DI. Note that SI and DI may be replaced with ESI and
EDI in 32-bit mode or RSI and RDI in 64-bit mode. The following table
recapitulates most of the string instructions in x86 processors.
Table 4-6. String instructions in 80x86 microprocessors.

Instruction   Function                               Notes
CMPS          Compare 2 strings                      byte, word or dword strings
CMPSB         Compare 2 byte strings                 byte strings
CMPSW         Compare 2 word strings                 word strings
CMPSD         Compare 2 dword strings, (+386)        dword strings
CLD           Clear direction flag                   DF = 0
STD           Set direction flag                     DF = 1
LODS          Load a string in accumulator           load DS:SI to AL, AX or EAX
LODSB         Load a byte string                     load DS:SI to AL
LODSW         Load a word string                     load DS:SI to AX
LODSD         Load a dword string, (+386)            load DS:SI to EAX
STOS          Store a string                         store AL, AX or EAX to ES:DI
STOSB         Store a byte string                    store AL to ES:DI
STOSW         Store a word string                    store AX to ES:DI
STOSD         Store a dword string, (+386)           store EAX to ES:DI
MOVS          Move a string                          byte, word or dword strings
MOVSB         Move a byte string                     byte strings
MOVSW         Move a word string                     word strings
MOVSD         Move a dword string, (+386)            dword strings
REP           Repeat operation                       Prefix
REPE          Repeat operation while equal           Prefix
REPZ          Repeat operation while zero            Prefix
REPNE         Repeat operation while not equal       Prefix
REPNZ         Repeat operation while not zero        Prefix
SCAS          Scan a string for accumulator value    scanned value in AL, AX or EAX

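[Figure 4-9 shows a memory map for a string operation. The source element
is addressed by DS:SI (in the example, DS = 0380H gives the segment base
03800H, and SI = 2A03H) and the destination element is addressed by ES:DI.]
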
Fig. 4-9. String transfer (MOVS) or comparison (CMPS) operations, when DF = 0

Note that the direction flag (DF), which is bit 10 of the FLAGS register,
controls string instructions. Setting DF (to 1) causes string instructions to
auto-decrement; that is, to process strings from high to low addresses.
Clearing DF (to 0) causes string instructions to auto-increment, or to
process strings from low to high addresses. String instructions may be
preceded by the instruction prefix REP (repeat) or REPE (repeat if
equal) or REPNE (repeat if not equal), that repeat the instruction
operation, by the number of times specified by CX.

The first string instruction in the above table is the CMPS (compare
strings) instruction. CMPS is used for comparing two strings in memory,
to see if they are identical or not. The comparison is done by
subtracting the elements of the two strings. According to the result of
the comparison, the CF and ZF flags are changed. The result is not saved,
but each time CMPS is executed, both DI and SI are incremented (or
decremented) by the element size, according to the state of the direction
flag DF.

The CMPSB instruction is an alternative instruction that compares two
byte strings, so DI and SI are incremented (or decremented) by 1. If the
CMPSB instruction is repeated (using the prefix REPE or REPNE), the
comparison continues with the next byte of the string.

The CMPSW is similar to CMPSB but it compares two word strings, so
DI and SI are incremented (or decremented) by 2. So, if the CMPSW
instruction is repeated (by REPE or REPNE), the comparison continues with
the next word of the string.

Similarly, CMPSD compares two dword strings, and EDI and ESI are
incremented (or decremented) by 4, each time the instruction is invoked.
If the CMPSD instruction is repeated (by REPE or REPNE), the comparison
continues with the next dword of the string.

STOS (and its variants STOSB, STOSW, STOSD) store the value in the
accumulator to the location at ES:DI. This is the case even if an operand
is given. The destination index DI is incremented or decremented based on
the size of the operand (or instruction format) and the state of the DF.

MOVS (and its variants MOVSB, MOVSW, MOVSD) copy data from the
source string addressed by DS:SI to the destination location ES:DI,
based on the size of the operand (byte, word or double word)
or the used instruction. MOVS also updates SI and DI. In byte string
transfers (MOVSB), SI and DI are incremented (+1) when the DF is
cleared and decremented (-1) when the DF is set.

In word string transfers (MOVSW), SI and DI are incremented (+2)
when the DF is cleared and decremented (-2) when the DF is set. In
doubleword string transfers (MOVSD), SI and DI are incremented (+4)
when the DF is cleared and decremented (-4) when the DF is set. MOVS
and its variants are frequently used with the REP prefix.
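
The combined effect of REP, DF and the index registers on a byte transfer can be modeled in Python as follows. This is a sketch under simplifying assumptions: memory is a flat bytearray, the segment registers are ignored, and rep_movsb is an illustrative name.

```python
def rep_movsb(mem, si, di, cx, df=0):
    """REP MOVSB on a flat bytearray; return the final (SI, DI, CX)."""
    step = -1 if df else 1          # DF = 1: auto-decrement, DF = 0: auto-increment
    while cx != 0:
        mem[di] = mem[si]           # copy one byte from [SI] to [DI]
        si += step
        di += step
        cx -= 1                     # REP counts CX down to zero
    return si, di, cx


mem = bytearray(b"HELLO" + bytes(5))
print(rep_movsb(mem, si=0, di=5, cx=5))    # (5, 10, 0)
print(mem)                                  # bytearray(b'HELLOHELLO')
```

With DF = 1 the same copy would start from the last byte of each string (SI = 4, DI = 9) and move downwards.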

SCAS (and its variants SCASB, SCASW, SCASD) are used to scan a
string, searching for a particular byte or value (which is stored in the
accumulator). SCASB compares the byte at ES:DI (even if an operand
is specified) with the accumulator AL and sets the flags as in a
subtraction. SCAS takes either the REPE or the REPNE prefix to repeat the
instruction operation.


Example 4-6: The following assembly program searches for the letter 'X'
in the text string "THIS IS A TEXT".
STRING DB "THIS IS A TEXT" ; Define a byte string called STRING
CLD ; Clear direction flag (scan upwards)
LEA DI,STRING ; Point to the start address of string (ES:DI)
MOV AL,'X' ; Search for the value 'X'
MOV CX,0EH ; STRING contains 14 bytes
MOV BX,CX ; Save string length in BX
REPNE SCASB ; Scan STRING and repeat until match
SUB BX,CX ; Calculate position of 'X' (BX = 13)

Note that the prefixes REPE and REPNE are similar to REP in that they
cause the specified instruction to repeat for the number of times specified
by CX (until CX = 0). Furthermore, these two prefixes stop the repetition
early according to the zero flag (ZF): REPE stops as soon as ZF = 0 (a
mismatch), while REPNE stops as soon as ZF = 1 (a match).
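
The scan of example 4-6 can be replayed with a small Python model of REPNE SCASB. This is a sketch assuming DF = 0 and a flat byte string in place of ES:DI; the function name repne_scasb is illustrative.

```python
def repne_scasb(mem, di, al, cx):
    """REPNE SCASB with DF = 0; return the final (DI, CX, ZF)."""
    zf = 0
    while cx != 0:
        zf = int(mem[di] == al)     # SCASB: compare AL with the byte at [DI]
        di += 1                     # DI advances even when a match is found
        cx -= 1
        if zf:                      # REPNE stops as soon as ZF = 1
            break
    return di, cx, zf


text = b"THIS IS A TEXT"
bx = len(text)                      # saved copy of the string length (14)
di, cx, zf = repne_scasb(text, 0, ord("X"), bx)
print(bx - cx, zf)                  # 13 1
```

The difference BX - CX is the 1-based position of the match. When the value is not found, CX reaches zero with ZF = 0, so the flag must be checked before the position is used.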

4-6.5. Program Control (transfer) Instructions


Program control instructions handle the following operations:
• Subroutine calls
• Interrupt service routines, which are a special type of subroutine
• Conditional (and unconditional) jumps
• Looping

Subroutine calls and interrupts make the microprocessor redirect
execution from the main program to another subroutine or interrupt
handler. Also, conditional jumps make the microprocessor leave its
regular sequence of instructions and branch to another location, under
certain conditions. Looping instructions make the processor repeat the
execution of a group of instructions, until the CX content becomes zero
or a certain condition is satisfied.

Table 4-7. Program control instructions in 80x86 microprocessors.

Instruction   Function                                Notes
CALL          Call procedure (subroutine)
RET           Return from procedure (subroutine)
INT           Jump to interrupt service subroutine
INTO          Interrupt if overflow
IRET          Return from interrupt subroutine
JA (JNBE)     Jump if above (not below nor equal)     Unsigned condition
JAE (JNB)     Jump if above or equal (not below)      Unsigned condition
JB (JNAE)     Jump if below (not above nor equal)     Unsigned condition
JC            Jump if carry is set                    Unsigned condition
JCXZ          Jump if register CX equals zero         Tests CX, not the flags
JE (JZ)       Jump if equal (zero)                    Signed or unsigned
JG (JNLE)     Jump if greater (not less nor equal)    Signed condition
JGE (JNL)     Jump if greater or equal (not less)     Signed condition
JL (JNGE)     Jump if less (not greater nor equal)    Signed condition
JLE (JNG)     Jump if less or equal (not greater)     Signed condition
JMP           Unconditional jump
JNC           Jump if no carry                        Unsigned condition
JNE (JNZ)     Jump if not equal (not zero)            Signed or unsigned
JNO           Jump if no overflow                     Signed condition
JNS           Jump if no sign                         Signed condition
JS            Jump if sign                            Signed condition
JO            Jump if overflow                        Signed condition
JNP (JPO)     Jump if no parity (parity odd)          Tests the parity flag (PF)
JP (JPE)      Jump if parity (parity even)            Tests the parity flag (PF)
LOOP          Loop until the count is zero            CX (or ECX) is decremented
LOOPE         Loop if equal                           CX (or ECX) is decremented
LOOPZ         Loop if zero                            CX (or ECX) is decremented
LOOPNE        Loop if not equal                       CX (or ECX) is decremented
LOOPNZ        Loop if not zero                        CX (or ECX) is decremented
BOUND         Check index against array bound         286+
ENTER         Enter a procedure                       286+
LEAVE         Leave a procedure                       286+
IRETD         Interrupt return                        +386
JECXZ         Jump if ECX is zero                     +386

The jump instructions have one operand, which specifies the jump target.
Note that conditional jumps may be signed or unsigned. In signed
conditional jump, the sign flag (SF) is taken into account. For instance,
when executing the instructions JGE/JNL, which means jump if greater
or equal / jump if not less, the microprocessor checks if ( SF XOR OF)=0.
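
The difference between signed and unsigned conditions can be illustrated by computing the flags of a CMP instruction in Python and evaluating a few jump conditions on them. This is a sketch only; the function names are illustrative, and the flag formulas are the standard ones for a subtraction.

```python
def cmp_flags(a, b, bits=8):
    """Flags produced by CMP a,b (a - b, without storing the result)."""
    mask, sign = (1 << bits) - 1, 1 << (bits - 1)
    a, b = a & mask, b & mask
    r = (a - b) & mask
    return {
        "CF": int(a < b),                             # unsigned borrow
        "ZF": int(r == 0),
        "SF": int(bool(r & sign)),
        "OF": int(bool((a ^ b) & (a ^ r) & sign)),    # signed overflow
    }


def ja(f):
    return f["CF"] == 0 and f["ZF"] == 0              # unsigned: above


def jge(f):
    return (f["SF"] ^ f["OF"]) == 0                   # signed: greater or equal


def jl(f):
    return (f["SF"] ^ f["OF"]) == 1                   # signed: less


f = cmp_flags(0xFF, 0x01)         # 255 vs 1 as unsigned, but -1 vs 1 as signed
print(ja(f), jge(f), jl(f))       # True False True
```

The same byte FFH is therefore "above" 01H for JA but "less" than 01H for JL, which is why the choice of jump must match the interpretation of the data.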

4-6.6. Processor Control Instructions


Processor control instructions handle the CPU flags and the internal
operation of the processor. The following table depicts the main 80x86
processor control instructions.
Table 4-8. Processor control instructions in 80x86 microprocessors.


Instruction   Function                                  Notes
CLC           Clear carry flag                          +8086
CLD           Clear direction flag                      +8086
CLI           Clear interrupt flag                      +8086
CMC           Complement carry flag                     +8086
ESC           Escape to external processor              +8086
HLT           Halt processor                            Privileged instruction
LOCK          Lock bus during next instruction          +8086
NOP           No operation                              +8086
STC           Set carry flag                            +8086
STD           Set direction flag                        +8086
STI           Set interrupt flag                        +8086
ARPL          Adjust requested privilege level          +286
CLTS          Clear task switched flag                  Privileged instruction, +286
LAR           Load access rights                        +286
LGDT          Load global descriptor table              Privileged instruction, +286
LIDT          Load interrupt descriptor table           Privileged instruction, +286
LLDT          Load local descriptor table               Privileged instruction, +286
LMSW          Load machine status word                  Privileged instruction, +286
LSL           Load segment limit                        +286
WAIT          Wait for TEST pin activity (8086/8088)    +8086
              or BUSY pin activity (80286/80386)
LTR           Load task register                        Privileged instruction, +286
SGDT          Store global descriptor table             +286
SIDT          Store interrupt descriptor table          +286
SLDT          Store local descriptor table              +286
SMSW          Store machine status word                 +286
STR           Store task register                       +286
VERR          Verify segment for reading                +286
VERW          Verify segment for writing                +286
INVD          Invalidate cache                          +486
INVLPG        Invalidate TLB entry                      +486
WBINVD        Write back and invalidate cache           +486
CPUID         CPU identification                        Pentium
RDMSR         Read from model-specific register         Pentium
RDTSC         Read from time stamp counter              Pentium
RSM           Resume from system management mode        Pentium
WRMSR         Write to model-specific register          Pentium

The LOCK instruction prefix and its corresponding output signal
should only be used to prevent other bus masters from interrupting a data
movement operation. LOCK may only be used with the following 80386
instructions when they modify memory. An undefined-opcode exception
results from using LOCK before any other instruction.

• Bit test and change: BTS, BTR, BTC.
• Exchange: XCHG.
• 1-operand arithmetic and logical: INC, DEC, NOT, and NEG.
• 2-operand arithmetic and logical: ADD, ADC, SUB, SBB, AND, OR, XOR.

The so-called privileged instructions affect system data structures and
can only be executed when the current privilege level (CPL) is zero. If
the CPU encounters one of these instructions when CPL is greater than
zero, it signals a general protection exception. In addition to HLT,
LMSW, CLTS, LGDT, LIDT, LLDT and LTR, these instructions include the
instructions in the following table.
Table 4-9. Privileged instructions in 80x86 microprocessors.

Instruction        Function                           Notes
MOV to/from CRn    Move to/from control register n    Privileged instruction
MOV to/from DRn    Move to/from debug register n      Privileged instruction
MOV to/from TRn    Move to/from test register n       Privileged instruction

The ESC instruction (obsolete) was used in the 8086 to pass information to
the 8087 coprocessor. Six bits of the ESC instruction provided the
opcode to the coprocessor, while the 8086/8088 provided the memory
address (if required) and then waited.
The WAIT instruction monitors the TEST pin (in 8086/8088) or the
BUSY pin (in 80286/80386). The TEST/BUSY pin is usually connected
to the floating point processor (8087/80287/80387). The coprocessor
uses this pin to signal that it is still performing a floating point
operation, and WAIT suspends the CPU until that operation is finished.
Starting from the 80486, Intel CPUs do not contain a BUSY pin, as the
coprocessor is built into the CPU.

4-7. Math Co-processor (x87) Instructions


The 80x87 math coprocessors were developed by Intel to extend the
capabilities of the 80x86 family of processors to include floating point
arithmetic. Their operations include addition, subtraction, negation,
multiplication, division, square roots, and truncation. The operations
also include conversion instructions which can load or store a value from
memory in the following formats: BCD, 32-bit or 64-bit integers, and
32-bit, 64-bit or 80-bit floating point numbers. The 80x87 also includes
some transcendental functions like sine, cosine, tangent, arctangent,
exponential and logarithms of bases 10, 2 and e.

The 80486 and later processors contain a built-in coprocessor, called
the FPU. However, the program must be written to take advantage of the
coprocessor. If the program contains no coprocessor instructions, the
coprocessor will never be utilized.

The Intel mnemonics for the 80x87 begin with the letter 'F' (no normal
8086 mnemonics begin with 'F'). For example, the mnemonic ADD
specifies an 8086 integer addition, while the mnemonic FADD selects an
8087 floating-point addition.

The 8087 has 68 basic instructions, which may be divided into the
following 6 groups:

1- Data transfer instructions (e.g., FST ST[n], FLD ST[n])
2- Arithmetic instructions (e.g., FABS, FADD, FSUB, FMUL, FDIV)
3- Comparison instructions (e.g., FCOMP, FICOMP)
4- Transcendental instructions (e.g., FSIN, FSQRT)
5- Constant instructions (e.g., FLD1, FLDPI)
6- Processor control instructions (e.g., FNOP, FFREE).

The floating point programming model is a stack-oriented model. It
makes extensive use of the stack registers ST[0] through ST[7] of the
coprocessor.

The top of stack ST[0], or simply ST, is used as a default destination
operand (like the accumulator, in the main processor). The following
table depicts some variants of the FADD instruction:
Table 4-10. Floating point instructions format of x87 coprocessors.

Instruction         Operands            Function
FADD ST[2],ST[3]    2 stack operands    ST[2] ← ST[2] + ST[3]
FADD ST[3]          1 stack operand     ST ← ST + ST[3]
FADD [EBP+2]        1 memory operand    ST ← ST + [EBP+2]
FADD                No operand          temp ← pop()
                                        ST ← ST + temp
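
The stack-oriented behavior of the one-operand, memory-operand and no-operand forms of FADD can be sketched with a toy Python model. The class X87Stack and its method names are illustrative assumptions; the model uses ordinary Python floats and does not enforce the limit of 8 stack registers.

```python
class X87Stack:
    """Toy model of the x87 register stack; regs[0] plays the role of ST."""

    def __init__(self):
        self.regs = []

    def fld(self, value):          # FLD: push a value, which becomes the new ST
        self.regs.insert(0, float(value))

    def fadd_st(self, i):          # FADD ST[i]:  ST <- ST + ST[i]
        self.regs[0] += self.regs[i]

    def fadd_mem(self, value):     # FADD [mem]:  ST <- ST + memory operand
        self.regs[0] += float(value)

    def fadd(self):                # FADD (no operand): temp <- pop(); ST <- ST + temp
        temp = self.regs.pop(0)
        self.regs[0] += temp


fpu = X87Stack()
fpu.fld(1.5)          # stack: [1.5]
fpu.fld(2.0)          # stack: [2.0, 1.5]
fpu.fadd_st(1)        # stack: [3.5, 1.5]
fpu.fadd()            # stack: [5.0]
print(fpu.regs)       # [5.0]
```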

Appendix D illustrates all the 80x87 coprocessor instructions, with a brief
discussion of their different variants.

4-8. Subroutine Calls & Interrupts in x86 Microprocessors


In this section we'll demonstrate how subroutine calls and interrupts are
handled in assembly language. We'll also show the differences and
similarities between subroutine calls and interrupt execution mechanisms
in 80x86 microprocessor systems.

4-8.1. Subroutine Calls


As we pointed out earlier, subroutines are small programs or procedures
(like mathematical trigonometric functions), which we need to execute
frequently during the normal program. Subroutines are written once in the
main program and should be terminated with a RET instruction. When
they are needed in the main program, we use CALL addr, where addr is
the memory location of the subroutine code.

When a CALL instruction is executed in real mode, the microprocessor
saves (pushes) the contents of the IP register (2 bytes) onto the top of
the stack. Then the microprocessor jumps to the address indicated in the
CALL instruction (by loading it into IP) and starts subroutine execution.
When the RET instruction is executed, at the end of the subroutine, the
microprocessor reloads (pops) the stack top into IP. Then the
microprocessor continues the execution of the normal program.

If the microprocessor encounters another subroutine call inside the first
subroutine (i.e. a nested subroutine), then it pushes the current IP
contents (2 bytes) onto the stack (over the first 2 bytes). The last pushed 2
bytes (last in) are then popped when the microprocessor executes the RET
instruction of the second subroutine call. Figure 4-10 illustrates the call
and return from simple and nested subroutines. Also, the following piece
of code uses nested subroutines.


Main Program            Subroutine A            Nested Subroutine B
     .
  CALL A  ----------->       .
     .                    CALL B  ----------->        .
  Main program                .                        .
  continues  <--------      RET   <-----------       RET

Fig. 4-10. Schematic representation of simple and nested subroutine calls.


The following main program prints the content of BX (16 bits) as four
hexadecimal digits. To do so, it calls the PrintByte subroutine, which
prints the content of AL.

MAIN: …………… ; Main program
MOV AL,BH ; Get high order byte of BX into AL
CALL PrintByte ; Print AL content (the BH)
MOV AL,BL ; Get low order byte of BX into AL
CALL PrintByte ; Print AL content (the BL)
…………… ; Rest of main program (must not run into PrintByte)
PrintByte: ; 1st subroutine, which prints AL
PUSH AX ; Save a copy of AX
MOV CL,4
SHR AL,CL ; Shift AL right 4 bits
CALL PrintNibble ; Print 1st hex digit (high nibble of AL)
POP AX ; Restore original AL
CALL PrintNibble ; Print 2nd hex digit (low nibble of AL)
RET

PrintNibble: ……. ; 2nd subroutine, which prints
……. ; one hex digit (in low nibble of AL)
RET

Here the address of each subroutine is indicated by a label. The
PrintByte subroutine, in turn, calls another subroutine, PrintNibble,
which prints a hexadecimal digit. The details of the PrintNibble
subroutine are not shown here.

So, subroutines can be nested to any required depth; the only limitation is
the stack size. As we'll see later, in chapter 5, subroutines are usually
called procedures in assembly programming environments.
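
The push and pop of return addresses during nested calls can be followed with a toy Python model. This is a sketch of near calls only; the class name Cpu and all addresses are hypothetical values chosen for illustration.

```python
class Cpu:
    """Toy real-mode model: only IP and a stack of 16-bit return addresses."""

    def __init__(self):
        self.ip = 0
        self.stack = []

    def call(self, target, next_ip):
        self.stack.append(next_ip)    # push the address of the next instruction
        self.ip = target              # jump to the subroutine

    def ret(self):
        self.ip = self.stack.pop()    # pop the most recently pushed address


cpu = Cpu()
cpu.call(target=0x0200, next_ip=0x0103)    # main program calls subroutine A
cpu.call(target=0x0300, next_ip=0x0210)    # A calls the nested subroutine B
cpu.ret()                                   # RET in B returns into A
print(hex(cpu.ip))                          # 0x210
cpu.ret()                                   # RET in A returns to the main program
print(hex(cpu.ip))                          # 0x103
```

Because the last address pushed is the first one popped, each RET always returns to the most recent caller, whatever the nesting depth.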

4-8.2. Interrupts (INT)


Interrupts are an efficient way for communication between the computer and
external events. Think of an interrupt as a doorbell. If you did not have a
doorbell on your home, you would have to check periodically to see
whether someone was at your door. This repeated checking, or polling,
tends to be inefficient and wastes your time. The 80x86 has two
mechanisms for interrupting program execution:

1. Interrupts, which are asynchronous events typically triggered by
external devices needing attention.
2. Exceptions, which are the responses of the CPU to certain
conditions detected during the execution of an instruction.


Interrupts and exceptions are alike in that both cause the processor to
temporarily suspend its present program execution in order to execute a
program of higher priority. The major distinction between these two
kinds of interrupts is their origin. An exception is always reproducible by
re-executing with the program and data that caused the exception,
whereas an interrupt is generally independent of the executing program.

As we have seen in chapter 2, the 80x86 interrupts fall into one of the
following two categories:

• Hardware interrupts
  * NMI : non-maskable interrupt
  * INTR : maskable interrupt
• Software interrupts (handled by the INT instruction)
  * INT 0 to INT 255
  * INT 0 : divide error
  * INTO = INT 4 : interrupt on overflow

Application programmers are not normally concerned with servicing
interrupts (interrupt service routines are supplied by hardware suppliers
and are usually written by system programmers). Certain exceptions,
however, are of interest to applications programmers, and many operating
systems give applications programs the opportunity to service these
exceptions. However, the operating system itself defines the interface
between the applications programs and the exception mechanism of the x86
microprocessors. Table 4-11 highlights the exceptions that may be of
interest to applications programmers.
• A divide error exception results when the instruction DIV or IDIV is
executed with a zero denominator or when the quotient is too large for
the destination operand. The debug exception may be reflected back
to an applications program if it results from the trap flag (TF).
• A breakpoint exception results when the instruction INT 3 is
executed. This instruction is used by some debuggers to stop program
execution at specific points.
• An overflow exception results when the INTO instruction is executed
and the overflow (OF) flag is set (after an arithmetic operation that
sets the OF flag). The bounds check exception results when the
BOUND instruction is executed and the array index it checks falls
outside the bounds of the array. Invalid opcodes may be used by some
applications to extend the instruction set. In such a case, the invalid
opcode exception presents an opportunity to emulate the opcode.
• The "coprocessor not available" exception occurs in older x86
processors if the program contains instructions for a coprocessor, but
no coprocessor is present in the system.
• A coprocessor error is generated when a coprocessor detects an illegal
operation.
Table 4-11. First 10 exceptions and interrupts in x86 systems

VECTOR NUMBER DESCRIPTION


00 Divide error (exception)
01 Debug trace (single step)
02 NMI Interrupt
03 Breakpoint
04 Overflow (INTO Detected)
05 BOUND Range Exceeded
06 Invalid Opcode
07 Coprocessor Not Available
08 Double Exception
09 Coprocessor Segment Overrun

The INT instruction generates an interrupt whenever it is executed; the
processor treats this interrupt as an exception. The effects of this interrupt
(and the effects of all other exceptions) are determined by exception
handler routines provided by the application program or as part of the
systems software (provided by systems programmers). The INT
instruction (software interrupt) is a special type of indirect CALL
instruction. The format of the INT instruction takes the following form:

INT nn

where nn is a number, ranging from 0 to 255, indicating the interrupt
type. This number is called the interrupt vector number. The
microprocessor uses this number to calculate the address of the interrupt
service routine (ISR). The interrupt service routines jumped to by an
INT instruction are similar to usual subroutines reached by a CALL
instruction, but they should end with an IRET instruction (instead of RET).


The execution of INT nn, in real mode, causes the following sequence of
events:

At first, the flags register, the CS register and IP are all pushed onto
the stack, and the IF and TF flags are cleared. The interrupt service
routine address is then fetched from the interrupt vector table, which
occupies the absolute memory addresses 0:0 through 0:3FFH. IP and CS are
loaded with the two words of the interrupt vector, located at the absolute
memory addresses 0:4*nn and 0:4*nn+2, respectively.
Then the program jumps to the new CS:IP address, which is the location
of the interrupt service routine, and starts its execution. When the
microprocessor encounters the IRET instruction (at the end of the interrupt
service routine), it pops the original IP and CS as well as the flags from
the stack, and resumes the execution of the main program.
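
The sequence above can be modelled in a few lines of Python. This is only an illustrative sketch, not a processor emulator: the class name RealModeCPU, the initial register values and the helper names are assumptions made for the example, and IP is assumed to point already at the instruction that follows INT nn.

```python
# Illustrative model of real-mode INT nn / IRET (not a full emulator).
FLAG_TF = 0x0100  # trap flag
FLAG_IF = 0x0200  # interrupt-enable flag


def physical(seg, off):
    """20-bit physical address of seg:off in real mode."""
    return ((seg << 4) + off) & 0xFFFFF


class RealModeCPU:
    def __init__(self):
        self.mem = bytearray(1 << 20)        # 1 MB address space
        self.cs, self.ip = 0x1000, 0x0100    # arbitrary "main program" location
        self.ss, self.sp = 0x2000, 0xFFFE    # arbitrary stack location
        self.flags = FLAG_IF

    def read16(self, addr):
        return self.mem[addr] | (self.mem[addr + 1] << 8)   # little endian

    def write16(self, addr, value):
        self.mem[addr] = value & 0xFF
        self.mem[addr + 1] = (value >> 8) & 0xFF

    def push(self, value):
        self.sp = (self.sp - 2) & 0xFFFF
        self.write16(physical(self.ss, self.sp), value)

    def pop(self):
        value = self.read16(physical(self.ss, self.sp))
        self.sp = (self.sp + 2) & 0xFFFF
        return value

    def set_vector(self, nn, seg, off):
        """Store an ISR address in the IVT: IP word at 4*nn, CS word at 4*nn+2."""
        self.write16(4 * nn, off)
        self.write16(4 * nn + 2, seg)

    def int_(self, nn):
        self.push(self.flags)                 # 1. save flags
        self.flags &= ~(FLAG_IF | FLAG_TF)    # 2. clear IF and TF
        self.push(self.cs)                    # 3. save return address
        self.push(self.ip)
        self.ip = self.read16(4 * nn)         # 4. load the new CS:IP from the IVT
        self.cs = self.read16(4 * nn + 2)

    def iret(self):
        self.ip = self.pop()                  # restore in reverse order
        self.cs = self.pop()
        self.flags = self.pop()
```

For instance, after cpu.set_vector(0x21, 0x0070, 0x0756), a call to cpu.int_(0x21) leaves CS:IP at 0070:0756, and cpu.iret() brings the model back to the interrupted location with the original flags.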

Figure 4-11 depicts the structure of the interrupt vector table (IVT),
which is located in the lowest 400H bytes (1kB) of memory. It is a sort
of big jump table, which contains the addresses of the interrupt service
routines.

Address     Contents   Interrupt vector
03FEH       CS255      INT 255
03FCH       IP255
03FAH       CS254      INT 254
03F8H       IP254
 ...         ...        ...
4*nn +2     CSnn       INT nn
4*nn +0     IPnn
 ...         ...        ...
0002H       CS00       INT 0
0000H       IP00

Fig. 4-11. Schematic representation of interrupt vector table.

It should be noted that the interrupts INT 00 through INT 04 have
predefined tasks and cannot be used for any other purpose. For
instance, the divide-by-zero interrupt (INT 00) is a processor exception
raised when the CPU is unable to complete a division, since division
by zero produces an undefined answer. So, INT 00 is invoked by the
microprocessor when an attempt is made to divide a number by zero. In
the IBM PC, the interrupt service routine of this interrupt displays the
message "DIVIDE BY ZERO ERROR" on the screen of the
microcomputer. The following code, for instance, will invoke INT 00:

MOV AX,20 ; AX = 20 (DIV CL divides AX by CL)
SUB CL,CL ; CL = 0
DIV CL ; AX/CL = 20/0 → divide error (INT 00)

Example 4-7: Use the dump command of the DEBUG program to find
the memory addresses of the interrupt service routines (ISR) of INT 00
through INT 03.
Solution: We introduce the DEBUG program in the next chapter and
show how to handle assembly programs using it. Among the many
commands of DEBUG, the dump command "D", followed by the range of
addresses to be displayed, shows the memory content of a range of bytes.
The interrupt vectors of the first 4 interrupts can be found in the
following address range: 0000:0000 through 0000:000F

C:\> DEBUG
-D 0000:0000 000F
0000:0000 E8 56 2B 02 56 07 70 00-C3 E2 00 F0 56 07 70 00

The addresses (CS:IP) of the corresponding ISRs are as follows:

Interrupt   IVT bytes (offset)   Interrupt Vector (CS:IP)
INT 00      00-03                022B:56E8
INT 01      04-07                0070:0756
INT 02      08-0B                F000:E2C3
INT 03      0C-0F                0070:0756

Note that the low byte of each word is stored at the low address, because
of the little-endian convention (low byte = low address) used by the x86
processors. Within each vector, the IP word comes first, then the CS word.
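
The decoding of such a dump can be checked with a short Python sketch. The function name decode_ivt is only an assumption made for this illustration; the bytes are those of the dump shown above.

```python
def decode_ivt(dump_bytes, first_vector=0):
    """Decode IVT entries from raw bytes: each vector is IP (2 bytes), then CS (2 bytes)."""
    vectors = {}
    for i in range(0, len(dump_bytes) - 3, 4):
        ip = dump_bytes[i] | (dump_bytes[i + 1] << 8)      # little-endian offset word
        cs = dump_bytes[i + 2] | (dump_bytes[i + 3] << 8)  # little-endian segment word
        vectors[first_vector + i // 4] = (cs, ip)
    return vectors


# The 16 bytes displayed by the dump command at 0000:0000
dump = bytes.fromhex("E8 56 2B 02 56 07 70 00 C3 E2 00 F0 56 07 70 00")

for nn, (cs, ip) in decode_ivt(dump).items():
    print(f"INT {nn:02X}  {cs:04X}:{ip:04X}")
```

Each group of four bytes yields one CS:IP pair, so the sixteen bytes cover INT 00 through INT 03.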

4-8.3. Masking Interrupts (Turning Interrupts Off)


When the interrupt comes from an external device or from a programmable
interrupt controller (PIC), like the 8259 chip, it can be "masked" or
turned off. However, some events occurring in the computer, like a memory
parity error, cannot be masked; these are described as non-maskable
interrupts (NMI). The RESET signal cannot be masked either (although it is
not an interrupt in the strict sense), and its effect on whatever the
computer is doing is fatal! Some operations, such as reading and writing
to disk, should not be interrupted by anything else (under single-task
operating systems). For such cases, the maskable interrupts can also be
turned off completely, as follows:


• The assembly language instruction CLI clears the interrupt flag (IF)
and disables all maskable interrupts.
• The assembly language instruction STI sets the interrupt flag (IF)
and enables the maskable interrupts.

Note that an interrupt service routine can itself be interrupted. However,
two interrupts are never serviced simultaneously: when two requests arrive
at the same time, they are serviced one after the other, because they are
given different priorities!

4-8.4. Interrupt Priority


We can turn interrupts off or on, but what about priority? Interrupt
requests (IRQs) are given an order of priority so that one of higher
priority can interrupt another with a lower priority (a fire alarm is more
important than a telephone ringing!). The programmable interrupt
controller (PIC) chip is responsible for enforcing these priorities. With
the 8259 in its default mode, the order is usually as follows in the
IBM PC & compatibles:

• RESET (highest priority)
• NMI
• IRQ0 → Timer (8253 chip)
• IRQ1 → KB
• IRQ2 → A 2nd PIC (8259 chip) can be chained here
• IRQ3 → COM2
• IRQ4 → COM1
• IRQ5
• IRQ6 → FDD
• IRQ7 → LPT1 (lowest priority)

Notice that IRQ0 is used by the system timer, so it has the highest
priority among the maskable requests, whereas IRQ7, which is used by the
printer drivers (in MSDOS), has the lowest.
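
The way a PIC picks one request among several can be sketched in Python. This is a simplified model that assumes the default (fully nested) mode of the 8259, in which IRQ0 has the highest priority and IRQ7 the lowest, and the PC convention that IRQ0 through IRQ7 are mapped to INT 08 through INT 0F. The function names and the two bit-mask arguments are assumptions made for the illustration.

```python
def highest_priority_irq(pending_mask, masked=0x00):
    """Return the pending, unmasked IRQ with the highest priority, or None.

    pending_mask: bit n set means IRQn is requesting service.
    masked:       bit n set means IRQn is masked (turned off).
    """
    active = pending_mask & ~masked & 0xFF
    for irq in range(8):              # IRQ0 is examined first = highest priority
        if active & (1 << irq):
            return irq
    return None


def irq_to_vector(irq):
    """Interrupt vector number used on the IBM PC for IRQ0..IRQ7."""
    return 0x08 + irq
```

With IRQ1 (keyboard) and IRQ7 (printer) pending together, highest_priority_irq(0b10000010) selects IRQ1; if IRQ1 is masked, IRQ7 is selected instead.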

Example 4-8.
Show how the IBM PC uses Date and Time interrupt to keep its internal
clock.
Solution:
The computer keeps track of the date and the time to within 1/100th of a
second, and the time is stored as follows:

Hours: Minutes: Seconds: Centiseconds.


The sequence of events used to keep track of time is as follows:

1- The computer is running some program - it could be anything!
2- Each timer tick generates a hardware interrupt INT 8 (called IRQ0).
3- The processor finishes its present instruction.
4- The processor checks to see if this interrupt is allowed.
5- The CPU pushes the flags and the position of the instruction pointer
(CS:IP) onto the stack.
6- The CS:IP is loaded with the 4 bytes at the special interrupt address,
which contain the address of the code that services the timer interrupt.
7- The interrupt subroutine code runs to read the timer.
8- The interrupt returns by restoring the CS:IP and the flags from the
stack.
9- The original program, which was interrupted in step 1, continues.

In a following example (Example 4-9) we depict, with an assembly
program, how to use INT 21 to get the system time.

4-9. IBM PC Interrupts & DOS Calls


In this section we describe the supported interrupts in the IBM PC, the
location of their service routines in the main memory after the PC boot-up
process and how they can be called from within an assembly program, in
real mode or under DOS. You may wonder why we are still talking about
DOS, when we no longer use it and Windows has taken its place. The
simple answer is that the x86 processors still start in real mode, the
mode in which DOS works. So, it is not surprising to hear that Windows
95/98 had DOS hidden inside their core. Subsequent versions of Windows
changed a lot and no longer run on top of DOS, but the processor still
starts up in real mode. Therefore, understanding how the PC boots and
loads its components under DOS, and how interrupts are treated at that
level, is very important.

4-9.1. PC Boot Process


When the x86 processors are powered up or re-initialized (by reset), the
CPU is in real mode (with real addresses), all protection features are
disabled and the memory space is limited to 1MB of physical memory.
This is very similar to what happens with the earlier IBM PCs, operating
in real mode. Therefore, we assume an IBM PC equipped with an 8088
microprocessor to discuss the initialization (boot-up) process. The
boot-up process begins when the PC is powered up or reset. The reset
circuit in the clock generator chip produces the reset pulse that is
applied to pin 21 of the 8088 CPU. This pin is connected to the reset
circuit inside the microprocessor to force a reset address on the address
bus, A0 through A19.

This will execute a jump instruction at address F000:FFF0 inside the
ROM BIOS chip that points to the first instruction of the BIOS.
The ROM BIOS program is approximately 8K bytes long and controls all
of the hardware on the system board and interface cards. The CPU
support chips are initialized with the proper default values to control
such things as the video monitor, disk drives, printer ports and keyboard.
After the initialization of all the hardware, the program executes a very
extensive diagnostic test on the x86 CPU, ROMs, RAM etc. to
complete what is called the Power-On-Self-Test (POST). If there are no
critical errors during POST, the default disk drive (e.g., C:) is turned ON
and tested. The pass condition will cause the head to position over track
0, head 0, sector 1 (the first sector) of the disk and the boot loader
program is transferred into memory.
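
To see where the reset address F000:FFF0 lies in the 1MB memory space, the real-mode address calculation (segment shifted left by 4 bits, plus offset) can be sketched in Python. The function name physical_address is an assumption made for this illustration.

```python
def physical_address(segment, offset):
    """Real-mode physical address: segment * 16 + offset, limited to 20 bits."""
    return ((segment << 4) + offset) & 0xFFFFF


reset = physical_address(0xF000, 0xFFF0)
print(f"{reset:05X}")        # physical address of the reset jump
print(0x100000 - reset)      # number of bytes left below the 1MB limit
```

The reset address is FFFF0H, only 16 bytes below the top of the 1MB space, which is why this location holds just a jump to the actual start of the BIOS code.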

Once loaded in memory, the boot loader program is given control of the
CPU and a series of instructions are executed that will look in the
directory of the disk for the system files, DOS.SYS and BIO.SYS (see
footnote 2). If these two system files are on the disk, they are loaded
into low memory in that order, along with any driver programs that are
listed in the "DEVICE=" statements of the CONFIG.SYS file. Control of the
CPU is then given to the DOS program to finish the boot-up process, by
loading the command processor program COMMAND.COM into memory in the next
available space right after DOS.SYS. If the system files do not exist or
they are corrupt in any way, the boot loader program will display the
familiar message "DISK BOOT FAILURE".

4-9.2. PC Interrupt Service Routines


The boot process is complete when COMMAND.COM is given the final
control of the CPU. Actually, the operating system is made up of three
basic programs (IBMBIO.SYS, IBMDOS.SYS and COMMAND.COM)
that are loaded in low memory starting at 00000H and ending at 0B000H.
The actual ending address will depend on the version of DOS and the
number of device drivers that are loaded during the boot process. Figure
4-12 shows the memory map of the IBM PC after the boot-up process.
The first 400H bytes of RAM (00000H through 003FFH) contain the interrupt
vector table (IVT), which is a table of addresses that point to the
interrupt sub-routines in IBMBIO.SYS, IBMDOS.SYS or the ROM BIOS chip.
Table 4-12 depicts the main interrupt vectors in the IBM PC. A complete
listing of the interrupt vector table (IVT) in the IBM PC can be found in
Chapter 9. The next 100H bytes (00400H through 004FFH) are used by the
operating system to store a complete equipment list in HEX form. Then
starting at 00500H is the IBMBIO.SYS program that contains all of the
sub-routines used by the operating system to interface with the hardware.
The BIOS contains very low level instructions that communicate with such
devices as the keyboard, printer, video, disk drives and the chips on the
mother board. The next program loaded is called IBMDOS.SYS, which contains
subroutines that interface with the Disk Operating System (DOS).

2 In the original IBM PC these files are called IBMDOS.SYS and IBMBIO.SYS.
In IBM-compatible PCs which run Microsoft DOS, they are called MSDOS.SYS
and IO.SYS.

Address    Contents
FFFFFH     ROM (BIOS), 256kB
C8000H
BFFFFH     VRAM (Video Adaptor RAM swaps here), 128kB
A0000H
9FFFFH     Transient Program Area (TPA)
           DOS (COMMAND.COM program)
           DOS (IBMDOS.SYS / MSDOS.SYS)
00500H     DOS (IBMBIO.SYS / IO.SYS)
004FFH     DOS (Equipment List)
00400H
003FFH     Interrupt Vector Table (IVT)
00000H

(The area from 00000H through 9FFFFH amounts to 640kB.)

Fig. 4-12(a). IBM PC memory map after boot-up process.

The instructions in IBMDOS.SYS communicate with the IBMBIO.SYS
program, and the last program loaded into memory is COMMAND.COM.
COMMAND.COM is a command interpreter that monitors the keyboard and
waits for you to press the enter key. When the enter key is pressed,
COMMAND.COM determines what command was entered. The command will
either be an internal command or an external command, depending on where
the instructions for the command are stored.

All internal commands are located inside command.com itself and all
external commands are located on the disk. After the command is
interpreted, command.com will pass all the parameters to the IBMDOS
program. IBMDOS will process the parameters into the proper format for
the IBMBIO program, which will actually turn-on the drive or control the
hardware needed for the command entered. MS-DOS is a three level
operating system, such that it is made up of three programs that are at
three levels of programming. The COMMAND.COM program is the
highest level because it understands commands like DIR, COPY etc. The
IBMDOS.SYS is the second highest level because it receives instructions
from command.com and passes them on to IBMBIO.SYS, which is the
lowest level of programming.

All of the subroutines, that are part of these three programs, perform
specific functions in the computer. For example, there are sub-routines
that are written just to control the video monitor and how it displays data
on the screen. The subroutines used in the operating system require many
8088 instructions to perform a specific task and are very complex in the
program style. Fortunately, IBM and Microsoft designed a method that
allows the computer programmer to utilize all of these tested and proven
subroutines in their own programs. This makes programming easier
because programmers will spend less time writing code that has already
been written by knowledgeable programmers.

In order to use a software interrupt, the programmer must set up the
program before the INT instruction is actually executed. Most interrupts
require certain values for the proper operation of the interrupt
sub-routine. In other words, most interrupts require an input from your
program, which must be stored inside specific registers. Therefore, to use
these interrupts you must refer to the interrupt list that is supplied by
IBM and Microsoft (see table 4-12).

The interrupts used in programs are going to access either IBMDOS or
IBMBIO and you should know the difference before you use them. In
general, any interrupt that calls IBMDOS is going to execute a little
slower because, as was stated before, IBMDOS is a higher level program and
its instructions are passed on to IBMBIO, which actually turns on the
chips. Going through IBMDOS first slows down the program but, because it
is at a higher level, the input required from your program is much easier
to develop and set up.


Table 4-12. Summary of main Interrupt vectors, in IBM PC and compatible computers

VECTOR NUMBER (hex)   DESCRIPTION
0                     Divide by zero
1                     Single step (for Debugging)
2                     NMI Interrupt
3                     Breakpoint instructions
4                     Overflow
5                     Print screen
6,7                   Reserved
8                     Timer (time of day) H/W interrupt
8                     Double Exception
9                     Keyboard H/W interrupt
A                     Reserved
B,C                   Serial communications H/W interrupt
D                     Hard Disk H/W interrupt
E                     Floppy Disk H/W interrupt
F                     Printer H/W interrupt
10                    Video I/O Calls
11                    Equipment Check Calls
12                    Memory Check Calls
13                    Diskette I/O Call
14                    Serial Communication (RS232) I/O Calls
15                    Cassette I/O (now System functions) Calls
16                    Keyboard I/O Calls
17                    Printer I/O Calls
18                    ROM BASIC entry code
19                    Boot strap loader
1A                    Time of day (Timer) Call
1B                    Get control, by Keyboard break
1C                    Get control on timer interrupt
1D                    Pointer to Video initialization table
1E, 1F                Pointers to diskette table, character generator
20                    DOS Program terminate
21                    DOS Function Calls
25-26                 Direct disk read, disk write and handle errors
27                    Terminate and Stay Resident (TSR) in memory
28-5F                 Reserved
60-67                 Reserved for user software interrupts
80-F0                 Reserved for BASIC interpreter
F1-FF                 Not used

Note that the interrupt types 20H-3FH are serviced by DOS routines. DOS
interrupts are often referred to as DOS CALLS and they are all invoked by
INT instructions. The other interrupt type is often called a BIOS
interrupt because it calls sub-routines inside IBMBIO.SYS or the ROM BIOS
chip.

The BIOS interrupts are much faster than DOS interrupts because the
BIOS is a low-level program, in direct contact with the hardware layer, as
shown in figure 4-12(b).

Application

DOS

BIOS

HW

Fig. 4-12(b). IBM PC layered architecture

4-9.3. BIOS CALLS and DOS CALLS


All of these interrupts are sub-routines in one of the operating system
programs and they are further divided into smaller sub-routines called
functions and sub-functions. For example, INT 21 has a total of 87 DOS
functions and sub-functions within its code.

In order to access a function within an interrupt, the calling program
must have the function number inside the AH register. The function results
are usually returned in the AX and DX registers. For instance, INT 21
function 01 (AH=01) inputs a character from the keyboard into AL. Also,
INT 21 function 02 (AH=02) displays the character in DL on the screen. For
faster input/output, one can also use INT 10, which has a total of 55 BIOS
functions and sub-functions within its code.

Example 4-9:
Show how to use INT 21 to get the system time (DOS function 44).

Solution:
INT 21 is concerned with DOS function calls. To get the system time
we use the DOS function number 44 (2CH). This number should be put into
AH before we call interrupt 21. After calling INT 21, the result (system
time) can then be found in CX and DX, as follows:

Interrupt ENTRY: AH = 2CH (DOS function 44)
Interrupt EXIT: CH = hours, CL = minutes, DH = seconds,
                DL = hundredths of a second

MOV AH,2C ; Get system time (Function 44 = 2CH)
INT 21 ; DOS call
: ; Now system time in CX and DX.

A more detailed program, which gets the time of the day, converts it to
ASCII characters 'hh:mm:ss' and displays it on the screen, is presented in
Problem (5-13), at the end of the next chapter.
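
How the four time fields are packed into CX and DX can be illustrated with a small Python sketch. It only models the register layout stated above (CH, CL, DH, DL); the function name decode_dos_time and the sample register values are assumptions made for the example.

```python
def decode_dos_time(cx, dx):
    """Split the 16-bit CX and DX values into their high and low bytes."""
    ch, cl = (cx >> 8) & 0xFF, cx & 0xFF    # CH = hours,   CL = minutes
    dh, dl = (dx >> 8) & 0xFF, dx & 0xFF    # DH = seconds, DL = hundredths
    return f"{ch:02d}:{cl:02d}:{dh:02d}.{dl:02d}"


# Hypothetical register contents after INT 21, function 2CH
print(decode_dos_time(0x0E1E, 0x2D32))
```

Here CX = 0E1EH gives 14 hours and 30 minutes, and DX = 2D32H gives 45 seconds and 50 hundredths of a second.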

Example 4-10:
Show how to put the cursor in the middle of the screen and print the
letters "Hi" using the BIOS video calls (INT 10).

Solution:
Use INT 10, Function 2 (set cursor position) and Function 9 (display a
character) as shown in the following program:

MOV AH,2 ; Step 1, CALL Function 2 (set cursor position)
MOV DH,C ; Row 0CH (12)
MOV DL,24 ; Column 24H
MOV BH,0 ; Display page 0
INT 10 ; BIOS Video calls

MOV AH,9 ; Step 2, CALL Function 9 (display a character)
MOV AL,48 ; Move letter 'H' (ASCII 48H) to AL
MOV BH,0 ; Display page 0
MOV BL,17 ; Attribute 17H: blue background, white foreground
MOV CX,1 ; Display the character once
INT 10 ; BIOS Video calls

MOV AH,2 ; Step 3, CALL Function 2 (set cursor position)
MOV DH,C ; Row 0CH (12)
MOV DL,25 ; Column 25H
MOV BH,0 ; Display page 0
INT 10 ; BIOS Video calls

MOV AH,9 ; Step 4, CALL Function 9 (display a character)
MOV AL,69 ; Move letter 'i' (ASCII 69H) to AL
MOV BH,0 ; Display page 0
MOV BL,17 ; Attribute 17H: blue background, white foreground
MOV CX,1 ; Display the character once
INT 10 ; BIOS Video calls

INT 20 ; End the program

As an exercise, the reader may make use of the last two examples, to get
the system time and display it in the middle of the screen. You may use
the DEBUG program for writing and debugging your assembly program,
for that purpose.
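
The value 17 loaded into BL in the program above is a text attribute byte. As a minimal sketch, assuming the usual colour text-mode layout (background colour in the upper four bits, foreground colour in the lower four bits, with colour 1 = blue and colour 7 = white), it can be composed as follows. The function and constant names are hypothetical.

```python
BLUE = 1
WHITE = 7


def text_attribute(foreground, background):
    """Compose a text-mode attribute byte: background in bits 4-6, foreground in bits 0-3."""
    return ((background & 0x07) << 4) | (foreground & 0x0F)


attr = text_attribute(WHITE, BLUE)
print(f"{attr:02X}")
```

Other colour combinations can be tried in the same way before placing the resulting byte in BL.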

4-10. Interrupts in x86 Protected Mode


We have seen so far that the protected mode is a special mode on the
80386, and later processors, which is designed to provide protection
between different running programs. For this reason, the protected mode
has been the mode of choice in 32-bit multitasking environments.

We have also stated earlier that all interrupts and exceptions share a
common feature: the current execution location (CS:IP) and flags are
saved onto the stack, and the control is transferred to the interrupt
service routine (ISR). We also mentioned that x86 machines support 256
interrupts, invoked by the instruction INT nn, where nn ranges from 0 to
255.

In real mode, the interrupt number (nn) is used to point at a location in
the interrupt vector table (IVT), where the ISR address is stored. The
difference between real mode and protected mode interrupts is that the
IVT is replaced with an interrupt descriptor table (IDT), in protected
mode. The IDT still contains up to 256 interrupt entries, but each
entry is accessed via an interrupt gate instead of the interrupt address.
Thus the first 1kB of memory no longer contains interrupt vectors.
Instead, the IDT may be located anywhere in the memory map of the x86
system. In protected mode, the ISR is reached via a gate in the IDT. In
the following sections we show in detail what a gate is and what the IDT
is.

4-10.1. Gates
A gate is a system object that points to a procedure in the code segment.
Each gate has a descriptor and a privilege level. There exist 4 types of
gates:

1- Call gates, invoked by standard CALL instructions,
2- Interrupt gates, used for interrupts (IF is cleared on entry, so
further maskable interrupts are held off),
3- Trap gates, used for exceptions and software traps (IF is left
unchanged on entry),
4- Task gates, invoked by JMP, CALL or INT or hardware interrupts.


4-10.2. Interrupt Descriptor Table (IDT)


The IDT is a system object that contains descriptors that relate to
hardware and software interrupts. A special register inside the processor,
called IDTR, contains the linear base address and the size limit of the
IDT. Figure 4-13 depicts the architecture of the IDT records (called
interrupt gate descriptors). The DPL field indicates the interrupt
descriptor privilege level, and the Offset fields contain the ISR offset
address. The behavior of interrupt gates is similar to that of call gates
although, unlike call gates, they do not contain a word count field. So,
they can point to code segments (via the selector) of specific privilege
levels.

Interrupt gate descriptor (one record of the IDT)

Bits     Field
63-48    Offset (A16-A31)
47       P
46-45    DPL
44       S
43-40    Type
39-32    00H
31-16    Segment Selector
15-0     Offset (A0-A15)

• DPL: descriptor privilege level
• P: Present bit (set to 1 whenever the interrupt entry can be accessed)
• Type: Interrupt type
• S: System bit (0 for gate descriptors)

Fig. 4-13. Architecture of the interrupt descriptor table records
(interrupt gate descriptor).
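
As an illustration of this layout, the following Python sketch packs the fields of a gate descriptor into its 8 bytes. It is only a sketch under stated assumptions: the type codes 0EH (32-bit interrupt gate) and 0FH (32-bit trap gate) are those of the 80386 and later processors, and the names make_gate and gate_offset are hypothetical.

```python
import struct

INTERRUPT_GATE_32 = 0xE   # type code of a 32-bit interrupt gate
TRAP_GATE_32 = 0xF        # type code of a 32-bit trap gate


def make_gate(offset, selector, dpl=0, gate_type=INTERRUPT_GATE_32, present=True):
    """Pack an 8-byte gate descriptor (little endian, lowest bits first)."""
    access = (0x80 if present else 0x00)      # P bit (bit 47)
    access |= (dpl & 0x3) << 5                # DPL (bits 46-45)
    access |= gate_type & 0x0F                # Type (bits 43-40); S (bit 44) stays 0
    return struct.pack(
        "<HHBBH",
        offset & 0xFFFF,           # Offset A0-A15
        selector & 0xFFFF,         # Segment selector
        0x00,                      # Reserved byte (00H)
        access,                    # P, DPL, S, Type
        (offset >> 16) & 0xFFFF,   # Offset A16-A31
    )


def gate_offset(descriptor):
    """Recover the 32-bit ISR offset from a packed descriptor."""
    low, _selector, _reserved, _access, high = struct.unpack("<HHBBH", descriptor)
    return (high << 16) | low


print(make_gate(0x12345678, 0x0008).hex(" "))
```

The offset is split into two halves at opposite ends of the descriptor, which is why gate_offset has to reassemble it from the first and the last word.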

4-10.3. Interrupt Masking in Protected Mode


As in real mode, the maskable hardware interrupts can be turned on
(enabled) and off (masked) by the STI and CLI instructions.
These instructions set and clear the interrupt flag (IF). Note that both
IF and TF (trap flag) are cleared after the contents of EFLAGS are stacked
(pushed) at the beginning of an interrupt. Note also that when TF is
set, a single-step trap (INT 1) is generated after the execution of each
instruction. This process is called step-by-step (or single-step)
execution and it is very useful for program debugging.

In protected mode, there exist other situations where software interrupts
are not permitted because of access right or priority issues. Generally,
interrupts have the following priority ranking:

1- Normal (non-debug) faults
2- Traps (e.g., INT 3)
3- Debug traps
4- Debug faults
5- Hardware NMI
6- Hardware INTR interrupts


4-10.4. Debugging in Protected Mode


Debugging is accomplished with breakpoints (inserted at certain points of
your code, by a debugger) and the ability of the microprocessor to
execute step-by-step. In the 80386 and later processors, the debug
registers (DR0-DR7) can implement 4 breakpoints; each one is identified by
a linear address held in one of DR0-DR3, as shown in figure 4-14. When the
microprocessor accesses that address, a debug exception (INT 1) occurs.

Debug Registers

Register   Function
DR0        Linear Breakpoint address 0
DR1        Linear Breakpoint address 1
DR2        Linear Breakpoint address 2
DR3        Linear Breakpoint address 3
DR4        Intel reserved. Do not define
DR5        Intel reserved. Do not define
DR6        Breakpoint Status
DR7        Breakpoint Control

Fig. 4-14. Debug registers in 80386 and higher processors
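
The breakpoint control register DR7 is not detailed in the figure. As a hedged sketch, assuming the DR7 layout of the 80386 and later processors (local and global enable bits in bit positions 2n and 2n+1, and the R/W and LEN fields of breakpoint n starting at bit 16+4n), the value that enables one breakpoint could be computed as follows. The function name dr7_enable is hypothetical.

```python
def dr7_enable(index, rw=0b00, length=0b00, local=True):
    """Return the DR7 bits that enable breakpoint `index` (0..3).

    rw:     00 = instruction execution, 01 = data write, 11 = data read or write
    length: 00 = 1 byte, 01 = 2 bytes, 11 = 4 bytes
    local:  True sets the local enable bit, False sets the global enable bit
    """
    if not 0 <= index <= 3:
        raise ValueError("only breakpoints 0..3 exist (DR0-DR3)")
    value = 1 << (2 * index + (0 if local else 1))   # L or G enable bit
    value |= (rw & 0b11) << (16 + 4 * index)         # R/W field
    value |= (length & 0b11) << (18 + 4 * index)     # LEN field
    return value


# Breakpoint 1 on a 4-byte data write at the linear address stored in DR1
print(f"{dr7_enable(1, rw=0b01, length=0b11):08X}")
```

A debugger would combine such values for all active breakpoints with a bitwise OR before loading DR7.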


4-11. New Instruction Sets of x86-64 Microprocessors


The x86-64 architecture supports the full legacy x86 instruction set, with
additional instructions to support long mode. The application
programming instructions are organized into four subsets, as follows:

General-Purpose Instructions: These are the basic x86 integer
instructions used in virtually all programs. Most of these instructions
load, store, or operate on data located in the general-purpose registers
(GPRs) or memory. Some of the instructions alter sequential program
flow by branching to other program locations. These instructions are all
backward compatible with the x86 instructions we explained so far.

128-Bit Media Instructions: These are the streaming SIMD extension
(SSE, SSE2, SSE3, SSE4A) instructions that load, store, or operate on
data located primarily in the 128-bit XMM registers. They perform
integer and floating-point operations on vector (packed) and scalar data
types. Because the vector instructions can simultaneously perform a
single operation on multiple sets of data, they are called
single-instruction, multiple-data (SIMD) instructions. They are useful for
media and scientific applications that operate on blocks of data.

64-Bit Media Instructions: These are the multimedia extension (MMX
technology) and AMD 3DNow! technology (see footnote 3) instructions. These
instructions load, store, or operate on data located primarily in the
64-bit MMX registers. Like their 128-bit counterparts, described above,
they perform integer and floating-point operations on vector and scalar
data types. Thus, they are also SIMD instructions and are useful in media
applications that operate on blocks of data.

Note: AMD no longer recommends the use of 3DNow! instructions,
which have been superseded by more efficient 128-bit media
counterparts.

x87 Floating-Point Instructions: These are the floating-point
instructions used in legacy x87 applications. They load, store, or operate
on data located in the x87 registers.

Some of these application-programming instructions bridge two or more of
the above subsets. For example, there are instructions that move data
between the general-purpose registers and the XMM or MMX registers, and
many of the integer vector (packed) instructions can operate on either XMM
or MMX registers, although not simultaneously.

3 In 1998, AMD introduced the so-called 3DNow! as a natural evolution of
the Intel MMX technology from integers to floating point. It uses exactly
the same register naming convention as MMX, that is MM0 through MM7. The
only difference is that instead of packing byte to quad-word integers into
these registers, one would pack single precision floating point values
into them.

4-11.1. Media Instructions


Media applications (such as image processing, music synthesis, speech
recognition, full-motion video, and 3D graphics rendering) share certain
characteristics:

• They process large amounts of data,
• They often perform the same sequence of operations repeatedly across
the data,
• The data are often represented as small quantities, such as 8 bits for
pixel values, 16 bits for audio samples, and 32 bits for object coordinates
in floating-point format.

The 128-bit and 64-bit media instructions are designed to accelerate these
applications. The instructions use a form of vector (or packed) parallel
processing known as single-instruction, multiple data (SIMD) processing.
The vector technology has the following characteristics:

• A single register can hold multiple independent pieces of data. For
example, a single 128-bit XMM register can hold 16 8-bit integer data
elements, or four 32-bit single-precision floating-point data elements.
• The vector instructions can operate on all data elements in a register,
independently and simultaneously. For example, a PADDB instruction
operating on byte elements of two vector operands in 128-bit XMM
registers performs 16 simultaneous additions and returns 16 independent
results in a single operation.

The 128-bit and 64-bit media instructions take SIMD vector technology a
step further by including special instructions that perform operations
commonly found in media applications. For example, adding the brightness
values of two pixels must not wrap around to a small value when the sum
overflows the destination. The 128-bit and 64-bit media instructions
include saturating-arithmetic instructions to simplify this type of
operation. A result that would overflow or underflow is forced to saturate
at the largest or smallest value that can be represented in the
destination register.
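
The difference between ordinary (wraparound) and saturating packed addition can be sketched in Python. The two functions below only model the arithmetic of PADDB (wraparound) and PADDUSB (unsigned saturation) on lists of byte elements; they are illustrations, not implementations of the instructions.

```python
def paddb(a, b):
    """Packed byte add with wraparound: each sum is kept modulo 256."""
    return [(x + y) & 0xFF for x, y in zip(a, b)]


def paddusb(a, b):
    """Packed byte add with unsigned saturation: each sum is clamped to 255."""
    return [min(x + y, 0xFF) for x, y in zip(a, b)]


bright = [250, 10, 128, 200]   # hypothetical pixel brightness values
boost = [10, 20, 128, 100]

print(paddb(bright, boost))    # sums above 255 wrap around to small values
print(paddusb(bright, boost))  # sums above 255 saturate at 255
```

With wraparound, the first pixel (250 + 10) becomes 4, a dark value; with saturation it stays at the maximum brightness 255.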

4-11.2. Floating-Point Instructions


The x86-64 architecture provides three floating-point instruction subsets,
using three distinct register sets:


• 128-Bit Media Instructions support 32-bit single-precision and 64-bit
double-precision floating point operations, in addition to integer
operations. Operations on both vector data and scalar data are supported,
with a dedicated floating-point exception-reporting mechanism. These
floating point operations comply with the IEEE-754 standard.
• 64-Bit Media Instructions (the subset of 3DNow! technology
instructions) support single precision floating-point operations.
Operations on both vector data and scalar data are supported, but these
instructions do not support floating-point exception reporting.
• x87 Floating-Point Instructions support single-precision,
double-precision, and 80-bit extended precision floating-point operations.
Only scalar data are supported, with a dedicated floating-point
exception-reporting mechanism. The x87 floating-point instructions contain
special instructions for performing trigonometric and logarithmic
transcendental operations. The single-precision and double-precision
floating-point operations comply with the IEEE-754 standard.

Maximum floating-point performance can be achieved using the 128-bit media
instructions. One of these vector instructions can support up to four
single-precision (or two double-precision) operations in parallel.

In 64-bit mode, the x86-64 architecture doubles the number of legacy
XMM registers from 8 to 16. Applications gain additional benefits using
the 64-bit media and x87 instructions. The separate register sets
supported by these instructions relieve pressure on the XMM registers
available to the 128-bit media instructions. This provides application
programs with three distinct sets of floating-point registers. In addition,
certain high-end implementations of the x86-64 architecture may support
128-bit media, 64-bit media, and x87 instructions with separate
execution units.


4-12. Summary of the Recent x86 (32 & 64-bit) Instructions


This section recapitulates the additional instruction sets of the most
recent IA-32 and Intel 64 as well as x86-64 processors (see footnote 4).
4-12.1. MMX Instructions.


The following instructions were added with the Pentium MMX:

EMMS, MOVD, MOVQ, PACKSSDW, PACKSSWB, PACKUSWB,
PADDB, PADDD, PADDSB, PADDSW, PADDUSB, PADDUSW,
PADDW, PAND, PANDN, PCMPEQB, PCMPEQD, PCMPEQW,
PCMPGTB, PCMPGTD, PCMPGTW, PMADDWD, PMULHW,
PMULLW, POR, PSLLD, PSLLQ, PSLLW, PSRAD, PSRAW, PSRLD,
PSRLQ, PSRLW, PSUBB, PSUBD, PSUBSB, PSUBSW, PSUBUSB,
PSUBUSW, PSUBW, PUNPCKHBW, PUNPCKHDQ, PUNPCKHWD,
PUNPCKLBW, PUNPCKLDQ, PUNPCKLWD, PXOR.

The so-called MMX+ instructions were added with the Athlon processor,
from AMD. They are the same as the SSE SIMD integer instructions
which operate on MMX registers. The following extended MMX
(EMMX) instructions were added with the 6x86MX from Cyrix:

PAVEB, PADDSIW, PMAGW, PDISTIB, PSUBSIW, PMVZB,
PMULHRW, PMVNZB, PMVLZB, PMVGEZB, PMULHRIW,
PMACHRIW

The following 3DNow! instructions were added with the K6-2 from AMD:

FEMMS, PAVGUSB, PF2ID, PFACC, PFADD, PFCMPEQ, PFCMPGE,
PFCMPGT, PFMAX, PFMIN, PFMUL, PFRCP, PFRCPIT1, PFRCPIT2,
PFRSQIT1, PFRSQRT, PFSUB, PFSUBR, PI2FD, PMULHRW,
PREFETCH, PREFETCHW

4-12.2. Streaming SIMD Extensions (SSE)


Modern x86 CPUs contain SIMD instructions, which perform the same
operation in parallel on many values packed into a wide SIMD
register. The various instruction extensions support different operations on
different register sets, but taken as a whole (from MMX to SSE3)
they include general integer and floating-point computations
(addition, subtraction, multiplication, shift, minimization, maximization,
comparison, division or square root). For example, PADDW MM0, MM1
performs 4 parallel 16-bit (indicated by W) integer adds (indicated by
PADD) of the MM1 values to the MM0 values and stores the result in MM0.

4
IA-32 architecture is the instruction set architecture and programming environment for Intel's 32-bit
microprocessors. The x86-64 architecture or AMD64 is the instruction set architecture and
programming environment which is the superset of Intel's 32-bit architectures. It is compatible with the
IA-32 architecture, Intel 64 architecture as well as AMD64.
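The lane-wise behaviour of PADDW can be illustrated with a short Python sketch. This is only a software model, not processor code: the function name paddw and the sample register values are assumptions chosen for illustration. Each 64-bit value is treated as four independent 16-bit lanes, and any carry out of a lane is discarded (wraparound), so it never propagates into the neighbouring lane.

```python
def paddw(a, b):
    """Model of PADDW: add two 64-bit values as four independent 16-bit lanes."""
    result = 0
    for lane in range(4):
        shift = 16 * lane
        x = (a >> shift) & 0xFFFF          # extract lane from first operand
        y = (b >> shift) & 0xFFFF          # extract lane from second operand
        result |= ((x + y) & 0xFFFF) << shift   # wrap around, no carry into next lane
    return result

# Hypothetical register contents (illustrative values only)
mm0 = 0x0001_FFFF_0010_7FFF
mm1 = 0x0001_0001_0020_0001
mm0 = paddw(mm0, mm1)
print(f"{mm0:016X}")
```

Note that the lane holding FFFFH wraps to 0000H without disturbing the lane above it, which is the difference between a packed add and an ordinary 64-bit add.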

SSE and SSE2 also include scalar floating-point modes in which only the
first (lowest) value of the register is actually modified. Some other unusual
instructions have been added, including a sum of absolute differences
(used for motion estimation in video compression, such as is done in
MPEG) and a 16-bit multiply-accumulate instruction (useful for digital
filtering). The SSE3 and 3DNow! extensions include addition and subtraction
instructions for treating paired floating-point values like complex
numbers. These instruction sets also include numerous sub-word
instructions for shuffling, inserting and extracting values
within the registers.

A. SSE Floating-Point Instructions


ADDPS, ADDSS, CMPPS, CMPSS, COMISS, CVTPI2PS, CVTPS2PI,
CVTSI2SS, CVTSS2SI, CVTTPS2PI, CVTTSS2SI, DIVPS, DIVSS,
LDMXCSR, MAXPS, MAXSS, MINPS, MINSS, MOVAPS,
MOVHLPS, MOVHPS, MOVLHPS, MOVLPS, MOVMSKPS,
MOVNTPS, MOVSS, MOVUPS, MULPS, MULSS,
RCPPS, RCPSS, RSQRTPS, RSQRTSS, SHUFPS, SQRTPS, SQRTSS,
STMXCSR, SUBPS, SUBSS, UCOMISS, UNPCKHPS, UNPCKLPS

B. SSE Integer Instructions


ANDNPS, ANDPS, ORPS, PAVGB, PAVGW, PEXTRW, PINSRW,
PMAXSW, PMAXUB, PMINSW, PMINUB, PMOVMSKB,
PMULHUW, PSADBW, PSHUFW, XORPS

4-12.3. Streaming SIMD Extensions 2 (SSE2).


The following instructions were added with Pentium 4:

ADDPD, ADDSD, ANDNPD, ANDPD, CMPPD, CMPSD, COMISD,
CVTDQ2PD, CVTDQ2PS, CVTPD2DQ, CVTPD2PI, CVTPD2PS,
CVTPI2PD, CVTPS2DQ, CVTPS2PD, CVTSD2SI, CVTSD2SS,
CVTSI2SD, CVTSS2SD, CVTTPD2DQ, CVTTPD2PI, CVTTPS2DQ,
CVTTSD2SI, DIVPD, DIVSD, MAXPD, MAXSD, MINPD, MINSD,
MOVAPD, MOVHPD, MOVLPD, MOVMSKPD, MOVSD, MOVUPD,
MULPD, MULSD, ORPD, SHUFPD, SQRTPD, SQRTSD, SUBPD,
SUBSD, UCOMISD, UNPCKHPD, UNPCKLPD, XORPD

4-12.4. The SSE3 Instructions


The following instructions were added with Pentium 4 processors supporting SSE3:

A. SSE3 SIMD Integer Instructions

MOVDQ2Q, MOVDQA, MOVDQU, MOVQ2DQ, PADDQ, PSUBQ,
PMULUDQ, PSHUFHW, PSHUFLW, PSHUFD, PSLLDQ, PSRLDQ,
PUNPCKHQDQ, PUNPCKLQDQ

B. SSE3 SIMD Floating-Point Instructions

ADDSUBPD, ADDSUBPS (for Complex Arithmetic)
HADDPD, HADDPS, HSUBPD, HSUBPS (for Graphics)
MOVDDUP, MOVSHDUP, MOVSLDUP (for Complex Arithmetic)

C. SSSE3 Instructions

These instructions were added with the Xeon 5100 series and the initial Core 2:

PSIGNW, PSIGND, PSIGNB, PSHUFB, PMULHRSW, PMADDUBSW,
PHSUBW, PHSUBSW, PHSUBD, PHADDW, PHADDSW, PHADDD,
PALIGNR, PABSW, PABSD, PABSB

4-12.5. The SSE4 Instructions


The following SSE4.1 instructions were added with the Core 2 x9000 series:

MPSADBW, PHMINPOSUW, PMULLD, PMULDQ,
DPPS, DPPD, BLENDPS, BLENDPD, BLENDVPS, BLENDVPD,
PBLENDVB, PBLENDW, PMINSB, PMAXSB, PMINUW, PMAXUW,
PMINUD, PMAXUD, PMINSD, PMAXSD, ROUNDPS, ROUNDSS,
ROUNDPD, ROUNDSD, INSERTPS, PINSRB, PINSRD/PINSRQ,
EXTRACTPS, PEXTRB, PEXTRW, PEXTRD/PEXTRQ,
PMOVSXBW, PMOVZXBW, PMOVSXBD, PMOVZXBD,
PMOVSXBQ, PMOVZXBQ, PMOVSXWD, PMOVZXWD,
PMOVSXWQ, PMOVZXWQ, PMOVSXDQ, PMOVZXDQ, PTEST,
PCMPEQQ, PACKUSDW, MOVNTDQA

The following SSE4.2 instructions were added with Nehalem processors:

CRC32, PCMPESTRI, PCMPESTRM, PCMPISTRI, PCMPISTRM,
PCMPGTQ


4-13. Undocumented x86 Instructions


The x86 CPUs have undocumented instructions, which are implemented
on the chips but have never been listed in any officially available document.

Table 4-13. Undocumented instructions of the 8086 processors

Mnemonic    Opcode     Description                                                          Status
AAM imm8    D4 imm8    Divide AL by imm8, put the quotient in AH, and the remainder in AL   Available beginning with 8086, documented since Pentium
AAD imm8    D5 imm8    Multiplication counterpart of AAM                                    Available beginning with 8086, documented since Pentium
SALC        D6         Set AL depending on the value of the Carry Flag                      Available beginning with 8086, documented since Pentium Pro
ICEBP       F1         Single byte single-step exception / Invoke ICE                       Available beginning with 80386, documented (as INT1) since Pentium Pro
LOADALL     0F 05      Loads all registers from memory address 000800H                      Only available on 80286
LOADALLD    0F 07      Loads all registers from memory address ES:EDI                       Only available on 80386
POP CS      0F         Pop top of the stack into the CS segment register                    Only available on 8086

4-14. Converting x86 Assembly Programs to Machine Code


The microprocessor can only understand binary codes (ones and zeros).
So, after writing an assembly program, it has to be converted to
binary machine code in order to be executed by the microprocessor. This
is usually done by a special translator program (an assembler). However, for small
routines it may be useful to know how to convert the assembly language
instructions into machine code by hand. As shown in figure 4-15, each assembly
instruction has an operation code (opcode), which may be one or two
bytes long. The opcode of each assembly instruction is given by the
microprocessor manufacturer. Once the instruction opcode is known, it is
easy to convert the entire instruction into machine code, using the
following scheme. The opcode is possibly followed by an address
specifier consisting of an addressing mode byte (MOD-REG-R/M), a
displacement, and finally an immediate data field, if required.


Instruction Format (16-bit Code)

Field            Size         Meaning
OPCODE D W       1-2 bytes    Instruction opcode
   D                          Direction (to/from register) (1/0)
   W                          Word / Byte (1/0)
MOD (oo)         0-1 byte     Mode with displacement length
REG (rrr)                     Register extension
R/M (mmm)                     Register used in EA calculation
Displacement     0-2 bytes
Immediate        0-2 bytes

Figure 4-15(a). Instruction format of x86 processors (16-bit instructions).


Instruction Format (32-bit Code)

Prefix | OPCODE (1-2 bytes) | MOD REG R/M (0-1 byte) | SIB (0-1 byte) | Displacement (0-4 bytes) | Immediate (0-4 bytes)

Fig. 4-15(b). Instruction format of x86 processors (32-bit instructions). The register field
code (REG), addressing mode code (MOD) and R/M code are shown in tables 4-14, 4-15 and 4-16.

Now let us discuss the details of encoding bits in x86 instructions. The
opcode is the first byte of the instruction. However,
some opcodes may occupy more than 1 byte. Appendix B depicts the
opcode map of x86 instructions. Within most opcodes there are special
1-bit indicators, namely:

W-bit (Word). Defines whether the instruction operates on a byte (W = 0),
or on a word or a double word (W = 1).
D-bit (Direction of transfer). Defines the direction of transfer for two-
operand instructions (except for string instructions and those having an
immediate operand). As one of the two operands must be a register, the
D-bit defines whether the transfer is from REG (D = 0) or to REG (D = 1).
S-bit (Sign extension). Appears with the W-bit in the ADD, SUB and
CMP instructions, which involve immediate to REG/M operands. The
S-bit is assigned as follows:
• S:W=00, for 8-bit operations, without sign extension,
• S:W=01, for 16-bit immediate operand (with no sign extension),
• S:W=11, for 16-bit operations (with sign extension of 8-bit immediate
operand).

Sign extension of an 8-bit 2's complement number to a 16-bit or 32-bit 2's
complement number means setting all the bits in the high-order bytes equal
to the MSB (the sign bit) of the low-order byte.
V-Bit. Used by shift and rotate instructions to determine the number of
shifts.
Z-Bit. Used with the REP instruction prefix.
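The sign extension used with the S-bit can be sketched in a few lines of Python. The helper name sign_extend and the sample values are assumptions made for illustration; the routine simply copies the sign bit of the narrow value into all the added high-order bits, as described above.

```python
def sign_extend(value, from_bits, to_bits):
    """Extend a 2's complement value from `from_bits` to `to_bits` wide."""
    low_mask = (1 << from_bits) - 1
    value &= low_mask                         # keep only the narrow field
    if value & (1 << (from_bits - 1)):        # sign bit set: fill upper bits with ones
        value |= ((1 << to_bits) - 1) & ~low_mask
    return value

print(f"{sign_extend(0xF6, 8, 16):04X}")   # 8-bit -10 extended to 16 bits
print(f"{sign_extend(0x0A, 8, 16):04X}")   # 8-bit +10 extended to 16 bits
```

With S:W=11, an instruction such as CMP CX,0A can therefore store its immediate operand in a single byte and still operate on a 16-bit register.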

Following the opcode is the "MOD-REG-R/M" byte, or simply the
addressing mode byte. The bits in the REG field let you select one of
eight different registers. The 8086 supports eight 8-bit registers and eight
16-bit general purpose registers. The 80386 also supports eight 32-bit
general purpose registers. The CPU decodes the meaning of the REG
field as shown in table 4-14.
Table 4-14. REG field bits, in the addressing mode byte of x86 instructions

Register Field Encoding ( REG )


REG 8-bit mode 16-bit mode 32-bit mode
rrr W=0 W=1 W=1
000 AL AX EAX
001 CL CX ECX
010 DL DX EDX
011 BL BX EBX
100 AH SP ESP
101 CH BP EBP
110 DH SI ESI
111 BH DI EDI

The R/M field, in conjunction with the MOD field, chooses the
addressing mode. The mod field encoding is described in table 4-15.
Also, table 4-16 depicts the R/M encoding bits.

Table 4-15. MOD field bits, in the addressing mode byte of x86 instructions

Addressing Mode Encoding ( MOD )

MOD (oo)  Meaning
00        Memory addressing mode, with no displacement. The R/M field
          denotes a register indirect or a base/indexed addressing mode (see
          encoding for R/M) unless the R/M field contains 110. If MOD=00 and
          R/M=110, the MOD and R/M fields denote displacement-only (direct)
          addressing.
01        Memory addressing mode, with 8-bit displacement. The R/M field
          denotes an indexed or base/indexed/displacement addressing mode.
          There is an 8-bit signed displacement following the MOD-REG-R/M byte.
10        Memory addressing mode, with 16/32-bit displacement. The R/M
          field denotes an indexed or base/indexed/displacement addressing mode.
          There is a 16-bit signed displacement (in 16-bit mode) or a 32-bit
          signed displacement (in 32-bit mode, and also in 64-bit mode, where it
          is sign-extended to 64 bits) following the MOD-REG-R/M byte.
11        Register mode. The R/M field denotes a register and uses the same
          encoding as the REG field.

The MOD field chooses the memory mode. It also chooses the size of the
displacement (zero, one, two, or four bytes) that follows the instruction
for memory addressing modes. If MOD=00, then you have one of the
addressing modes without a displacement (register indirect or base/
indexed). If MOD does not equal 11, the R/M field encodes the memory
addressing mode as follows:

Table 4-16. R/M field bits in the addressing mode byte of 80x86 instructions

R/M Field Encoding
R/M (mmm)   Addressing mode (assuming MOD=00, 01, or 10)
000         [BX+SI] or DISP[BX+SI] (depends on MOD)
001         [BX+DI] or DISP[BX+DI] (depends on MOD)
010         [BP+SI] or DISP[BP+SI] (depends on MOD)
011         [BP+DI] or DISP[BP+DI] (depends on MOD)
100         [SI] or DISP[SI] (depends on MOD)
101         [DI] or DISP[DI] (depends on MOD)
110         Displacement-only or DISP[BP] (depends on MOD)
111         [BX] or DISP[BX] (depends on MOD)
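The three field tables above can be combined into a small decoder for the addressing mode byte. The following Python sketch is illustrative only: the function name decode_modrm is an assumption, and it covers just the 16-bit case with W = 1. It splits the byte into its MOD, REG and R/M fields and reports the register and the operand form they select.

```python
REG16 = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]              # REG field, W = 1
RM16 = ["BX+SI", "BX+DI", "BP+SI", "BP+DI", "SI", "DI", "BP", "BX"]   # R/M field

def decode_modrm(byte):
    """Split a MOD-REG-R/M byte (16-bit mode, W = 1) into its three fields."""
    mod = (byte >> 6) & 0b11
    reg = (byte >> 3) & 0b111
    rm = byte & 0b111
    if mod == 0b11:                        # register mode: R/M names a register
        operand = REG16[rm]
    elif mod == 0b00 and rm == 0b110:      # special case: displacement-only
        operand = "[disp16]"
    else:
        disp = {0b00: "", 0b01: "+disp8", 0b10: "+disp16"}[mod]
        operand = "[" + RM16[rm] + disp + "]"
    return mod, REG16[reg], operand

print(decode_modrm(0xEC))
print(decode_modrm(0x06))
print(decode_modrm(0x47))
```

The second call shows the special case of table 4-15: the pattern that would otherwise mean [BP] is decoded as displacement-only addressing.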

Example 4-11:
Given that the opcode of MOV instruction is 100010, show how to
encode the following instruction: MOV BP,SP


Solution:
Opcode = MOV (100010)
D = transfer to register (1)
W = word (1)
REG = BP (101)
MOD = register mode (11)
R/M = SP (100)

Byte 1                      Byte 2
OPCODE         D   W        MOD   REG     R/M
1 0 0 0 1 0    1   1        1 1   1 0 1   1 0 0

Therefore, the resulting 2-byte instruction in hexadecimal code is:
8BECH
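The hand encoding of Example 4-11 can be cross-checked with a short Python sketch. The function name encode_mov_reg_reg is an assumption introduced here, and the sketch handles only 16-bit register-to-register moves (D = 1, W = 1, MOD = 11); it packs the fields in the order opcode, D, W, then MOD, REG, R/M.

```python
REG16 = {"AX": 0, "CX": 1, "DX": 2, "BX": 3, "SP": 4, "BP": 5, "SI": 6, "DI": 7}

def encode_mov_reg_reg(dest, src):
    """Encode MOV dest, src for two 16-bit registers (D = 1, W = 1, MOD = 11)."""
    opcode = 0b100010                  # 6-bit MOV opcode from Example 4-11
    d, w = 1, 1                        # transfer to REG, word operands
    byte1 = (opcode << 2) | (d << 1) | w
    byte2 = (0b11 << 6) | (REG16[dest] << 3) | REG16[src]   # REG = dest, R/M = src
    return bytes([byte1, byte2])

print(encode_mov_reg_reg("BP", "SP").hex().upper())
```

For MOV BP,SP the sketch yields the same two bytes as the manual solution above.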

Example 4-12:
The following assembly program accumulates the sum of the ten integers (1
through 10) in the microprocessor accumulator. Encode the program
into equivalent 8088 machine code.

xxxx:0100 MOV AX,0 ; Initialize the sum (in AX) to zero
xxxx:0103 MOV CX,1 ; 1 is the first integer to be added
xxxx:0106 ADD AX,CX ; Add an integer to the sum
xxxx:0108 INC CX ; Increment the integer
xxxx:0109 CMP CX,0A ; Compare new integer with 10
xxxx:010C JBE 106 ; Is new integer below or equal to 10?
 ; If so, jump back to 0106
xxxx:010E NOP ; Terminate with the sum of the 10 integers in AX

Solution:
When the instructions are encoded using the 8088 opcodes, we have:

8088 Machine Code Assembly
xxxx:0100 B8 00 00 MOV AX,0
xxxx:0103 B9 01 00 MOV CX,1
xxxx:0106 01 C8 ADD AX,CX
xxxx:0108 41 INC CX
xxxx:0109 83 F9 0A CMP CX,0A
xxxx:010C 76 F8 JBE 106
xxxx:010E 90 NOP
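Two details of this example deserve a quick check: 16-bit immediate operands are stored low byte first, and the operand of the short jump is a signed 8-bit displacement measured from the address of the next instruction. The Python sketch below reproduces both; the helper names imm16 and rel8 are assumptions made for illustration. It also models the loop itself to confirm the final sum.

```python
def imm16(value):
    """16-bit immediate as stored in memory: low byte first (little-endian)."""
    return (value & 0xFFFF).to_bytes(2, "little")

def rel8(branch_addr, target_addr, length=2):
    """Displacement byte of a short jump located at branch_addr."""
    offset = target_addr - (branch_addr + length)   # relative to the next instruction
    if not -128 <= offset <= 127:
        raise ValueError("target out of short-jump range")
    return offset & 0xFF                            # 2's complement byte

print(imm16(1).hex().upper())              # immediate bytes of MOV CX,1
print(f"{rel8(0x010C, 0x0106):02X}")       # displacement byte of JBE 106

# Model of the loop: ADD AX,CX / INC CX / CMP CX,0A / JBE 106
ax, cx = 0, 1
while True:
    ax += cx
    cx += 1
    if not cx <= 10:       # JBE falls through when CX is above 10
        break
print(ax)
```

The jump is taken while CX is below or equal to 10, so the loop body runs ten times.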


4-15. Case Study: Encoding the MOV Instruction


The examples throughout this book make extensive use of the MOV
instruction. Furthermore, the MOV instruction is the most common 80x86
machine instruction. Therefore, it's worthwhile to spend a few moments
discussing the operation of this instruction. The MOV instruction is very
simple and takes the form:

MOV Destination , Source

As mentioned earlier, MOV makes a copy of Source and stores its value
into Destination. It overwrites the previous contents of Destination and does
not affect the original contents of Source. However, the encoding of the MOV
instruction is probably the most complex in the instruction set.
Nonetheless, without studying the machine code for this instruction you
will not be able to appreciate it, nor will you have a good understanding
of how to write optimal code using this instruction.

There are several versions of the MOV instruction. The mnemonic MOV
describes over a dozen different instructions on 80x86 processors. The
most commonly used form of the MOV instruction has the following binary
encoding scheme, shown in figure 4-16.

Opcode (1 byte) Addressing Mode (0-1 byte)


7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0
1 0 0 0 1 0 D W MOD REG R/M
Displacement (0 or 1 or 2 or 4 bytes)
7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0
x x x x x x x x x x x x x x x x

Fig. 4-16. Encoding the MOV instruction.

The opcode of MOV is the first eight bits of the instruction. Bits zero and
one define the width W of the instruction (Byte, Word, or Double Word)
and the direction D of the transfer. Sometimes, the values of D and W
will be filled for you, as a part of the opcode.

Following the opcode is the "MOD-REG-R/M" byte. This byte, if
present, chooses which of 256 different possible operand combinations
the MOV instruction allows. The generic MOV instruction takes different
assembly language forms, such as those shown in the following table:


MOV Instruction     Opcode        Operands
MOV reg, reg        1000 10 1w    11 rrr mmm
MOV mem, reg        1000 10 0w    oo rrr mmm displacement
MOV reg, mem        1000 10 1w    oo rrr mmm displacement
MOV reg, [disp]     1000 10 1w    00 rrr 110 displacement
MOV [disp], reg     1000 10 0w    00 rrr 110 displacement
MOV mem, accum      1010 00 1w    displacement
MOV reg, imm        1011 w rrr    immediate data

Note that at least one of the operands is always a general purpose register.
If present, the REG field in the addressing mode byte specifies that
register. As pointed out earlier, the bits in the REG field (rrr) let you
select one of eight different registers. Table 4-14 depicts the REG field
bits. The R/M field (mmm), with the MOD field (oo), chooses the
addressing mode. The MOD field encoding is shown in table 4-15.
The MOD field chooses between a register-to-register move and a move to or
from memory. It also chooses the size of the displacement (zero, one,
two, or four bytes) that follows the instruction for memory addressing
modes. If MOD = 00, then you have one of the addressing modes without
a displacement (register indirect or base/indexed). Note the special case
where MOD = 00 and R/M = 110, as indicated in the MOV reg, [disp] and
MOV [disp], reg rows. This would normally correspond to the [BP]
addressing mode. The 8086 uses this encoding for the displacement-only
addressing mode.

This means that there is no true [BP] addressing mode on the 8086. In
order to understand why you can use the [BP] addressing mode in your
programs, look at MOD = 01 and MOD = 10 in the above table. These bit
patterns activate the disp[reg] and the disp[reg][reg] addressing modes.
This is not the same as the [BP] addressing mode. However, consider the
following instructions:

MOV AL, 0[BX]
MOV AH, 0[BP]
MOV 0[SI], AL
MOV 0[DI], AH

These statements, using the indexed addressing modes, perform the same
operations as their register indirect counterparts (obtained by removing
the displacement from the above instructions). The only real difference
between the two forms is that the indexed addressing mode is one byte
longer (if MOD = 01, two bytes longer if MOD = 10) to hold the
displacement of zero.

Because they are longer, these instructions may also run a little slower.
This trait of the 80x86 - providing two or more ways to accomplish the
same thing - appears throughout the instruction set. In fact, you will see
more examples before you are through with the MOV instruction. Table 4-16
depicts the R/M field encoding, when MOD does not equal 11.
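The difference in length between the register indirect and the indexed forms can be made concrete with a Python sketch. The function name encode_mov_reg8_mem is an assumption, and the sketch covers only byte-sized loads of the form MOV reg8, [base] (D = 1, W = 0). Passing disp=None requests the register indirect form; because MOD = 00 with R/M = 110 is reserved for displacement-only addressing, a request for [BP] is silently encoded as 0[BP].

```python
RM = {"BX+SI": 0, "BX+DI": 1, "BP+SI": 2, "BP+DI": 3, "SI": 4, "DI": 5, "BP": 6, "BX": 7}
REG8 = {"AL": 0, "CL": 1, "DL": 2, "BL": 3, "AH": 4, "CH": 5, "DH": 6, "BH": 7}

def encode_mov_reg8_mem(dest, base, disp=None):
    """Encode MOV dest, [base] or MOV dest, disp[base] for an 8-bit register."""
    opcode = 0b10001010                       # 100010 with D = 1, W = 0
    if disp is None and base != "BP":
        mod, tail = 0b00, b""                 # register indirect, no displacement
    else:
        disp = disp or 0                      # [BP] must carry a zero displacement
        if -128 <= disp <= 127:
            mod, tail = 0b01, (disp & 0xFF).to_bytes(1, "little")
        else:
            mod, tail = 0b10, (disp & 0xFFFF).to_bytes(2, "little")
    modrm = (mod << 6) | (REG8[dest] << 3) | RM[base]
    return bytes([opcode, modrm]) + tail

for args in (("AL", "BX"), ("AL", "BX", 0), ("AH", "BP")):
    print(args, encode_mov_reg8_mem(*args).hex(" ").upper())
```

The first form occupies two bytes and the second three, while the [BP] request cannot be shorter than three bytes.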

Don't forget that addressing modes involving BP use the stack segment
(SS) by default. All others use the data segment (DS) by default. If this
discussion has got you totally lost, you haven't even seen the worst of it
yet. Keep in mind that these are just some of the 8086 addressing modes.
You've still got all the 80386 addressing modes to look at. You're
probably beginning to understand what is meant by the term
complex instruction set computer. A full description of the x86 opcode
map can be found in appendix C.

Notes on the MOV Instructions

There are several important facts you should always remember about the
MOV instruction. First of all, there is no memory to memory move. For
some reason, newcomers to assembly language have a hard time grasping
this point. While there are a couple of instructions that perform memory
to memory moves, loading a register and then storing that register is
almost always more efficient.

4-16. Execution Time of x86 Instructions


The time required to execute an instruction is calculated by multiplying
the processor clock cycle period by the number of clocks needed to
execute the instruction.

T = Tc . N = Tc (No + EA) (4-1)

The number of clocks (N) is the sum of the basic clocks (No) plus the
number of clocks required to calculate the effective address (EA), if a memory
operand is involved.
Example 4-13.
Refer to Appendix B and calculate the ADD instruction execution time
for different variations. Which ADD instruction is the fastest one?

Solution: The following table depicts the different variations of
ADD/SUB instructions and their corresponding number of clock cycles.


Table 4-17. Effective execution time in x86 processors (with no pipelining)

Instruction   Operands   N (clocks)   Number of memory transfers
ADD / SUB     reg,reg    3            0
"             mem,reg    16 + EA      2
"             reg,mem    9 + EA       1
"             mem,imm    17 + EA      2
"             reg,imm    4            0

As shown in the above table, the fastest ADD instruction is ADD
reg,reg, because it does not involve any external memory access. Note that
ADD reg,mem needs 9+EA clocks while ADD mem,reg takes 16+EA clocks, because
the former needs 1 memory reference while the latter needs 2 memory
references: one to read the operand from memory and another to store the result
in memory. The effective address calculation time (EA) depends on the
addressing mode, as shown in the following table.

Table 4-18. Effective address calculation in x86 processors (with no pipelining)

EA mode                                  Example            EA (clocks)
Direct                                   [100]              6
Register indirect                        [BX]               5
Reg + disp                               [BX+300]           9
Based indexed (BP+DI)                    [BP+DI]            7
              (BP+SI)                    [BP+SI]            8
Based indexed + disp (BP+DI+disp)        [BP+DI+320]        11
                     (BX+SI+disp)        [BX+SI+204]        11
                     (BP+SI+disp)        [BP+SI+502]        12
                     (BX+DI+disp)        [BX+DI+100]        12

It is worth noting that the LOOP instruction is often used to build
delay routines in assembly programs. On 8086 processors the LOOP
instruction takes 17 clocks to loop back (when CX≠0) and takes 5 clocks
to leave the loop (when CX=0). It should also be noted that conditional
branch instructions have two different timing figures, depending on
whether the condition is met or not.
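Equation (4-1) and the clock counts quoted above can be put together in a short Python sketch. The function names, the choice of addressing modes, and the 5 MHz clock frequency are assumptions made for illustration; the clock counts are those of table 4-17, three of the entries of table 4-18, and the LOOP figures given in the text.

```python
# (basic clocks No, needs EA?) for ADD/SUB, from table 4-17
ADD_CLOCKS = {
    ("reg", "reg"): (3, False),
    ("mem", "reg"): (16, True),
    ("reg", "mem"): (9, True),
    ("mem", "imm"): (17, True),
    ("reg", "imm"): (4, False),
}
# EA clocks for a few addressing modes, from table 4-18
EA_CLOCKS = {"reg+disp": 9, "BP+DI": 7, "BP+SI": 8}

def add_clocks(dest, src, ea_mode=None):
    """N = No + EA for one ADD/SUB variation."""
    base, needs_ea = ADD_CLOCKS[(dest, src)]
    return base + (EA_CLOCKS[ea_mode] if needs_ea else 0)

def exec_time_us(clocks, clock_mhz=5.0):
    """T = Tc * N, with Tc = 1/f; the 5 MHz default is an assumed clock rate."""
    return clocks / clock_mhz

def loop_delay_clocks(count):
    """Clocks spent in LOOP itself when CX starts at `count` (loop body excluded)."""
    return 17 * (count - 1) + 5      # count-1 jumps back, then one fall-through

n = add_clocks("reg", "mem", "reg+disp")
print(n, exec_time_us(n))
print(loop_delay_clocks(100))
```

The delay figure counts only the LOOP instruction; the clocks of any instructions inside the loop body would have to be added for each pass.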


4-17. Instruction Set of SPARC Processors


Like x86 instructions, SPARC instructions are accessed by the processor
from memory and then are executed, annulled, or trapped. SPARC
instructions are encoded in four major formats and partitioned into eleven
general categories. The instruction at the memory location specified by
the program counter (PC) is fetched and then executed. After instruction
execution, new values are assigned to the program counter (PC) and the
next program counter (nPC).

In much the same way as x86 instructions, the SPARC basic instructions
can be divided into the following categories.

Memory access
Integer operate
Control transfer
State register access
Floating-point operate
Conditional move
Register window management

These classes are discussed in the following subsections. A complete
description of the SPARC instruction set can be found in Appendix E.

4-17.1. Memory Access Instructions


The SPARC is based on the load/store architecture. This means that
registers are used as the operands for all data manipulation operations.
The operands for these operations cannot be in memory locations. Load,
Store, Prefetch, Load Store Unsigned Byte, Swap, and Compare and
Swap are the only instructions that access memory. All of these instructions
except Compare and Swap use either two r registers or an r register and
simm13 (a 13-bit signed immediate value) to calculate a 64-bit byte memory
address. Compare and Swap uses a single r register to specify a 64-bit
byte memory address. To this 64-bit address, the IU appends an ASI
(Address Space Identifier) that encodes address space information.

4-17.2. Arithmetic, Logical and Shift Instructions


The arithmetic/logical/shift instructions perform arithmetic, logical, and
shift operations. With one exception, these instructions compute a result
that is a function of two source operands; the result is either written into a
destination register or discarded. Shift instructions are used to shift the
contents of an r register left or right by a given count. The shift distance
is specified by a constant in the instruction or by the contents of an r
register.

The integer multiply instructions perform 64 x 64-bit operations. The
integer division instructions perform 64 / 64-bit operations. In addition,
for compatibility with SPARC-V8, 32 x 32-bit multiply, 32-bit divide, and
multiply-step instructions are included. Division by zero causes a trap.

4-17.3. Control Transfer Instructions


Control-transfer instructions (CTIs) include PC-relative branches and
calls, register-indirect jumps, and conditional traps. The basic control-
transfer instruction types are as follows:

Conditional branch (Bicc, BPcc, BPr, FBfcc, FBPfcc)


Unconditional Branch
Call and Link (CALL)
Jump and Link (JMPL, RETURN)
Return from trap (DONE, RETRY)
Trap (Tcc)

A control-transfer instruction functions by changing the value of the next
program counter (nPC) or by changing the value of both the program
counter (PC) and the next program counter (nPC). When only the next
program counter, nPC, is changed, the effect of the transfer of control is
delayed by one instruction. Most control transfers in SPARC-V9 are of
the delayed variety. Annulled instructions have no effect upon the
program-visible state nor can they cause a trap.
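The effect of changing only nPC can be visualized with a tiny Python model. This is a simplified sketch under stated assumptions: the function name step and the addresses are hypothetical, every instruction is taken to be 4 bytes long, and annulment and traps are ignored. After each instruction PC takes the old nPC, and nPC becomes either the branch target or the next sequential address.

```python
def step(pc, npc, target=None):
    """Advance one instruction; `target` is given when the instruction at pc
    is a taken delayed control transfer."""
    next_pc = npc                                          # PC always follows nPC
    next_npc = target if target is not None else npc + 4   # only nPC is redirected
    return next_pc, next_npc

pc, npc = 0x100, 0x104
trace = []
# A branch to 0x200 sits at 0x100, followed by ordinary instructions
for target in (0x200, None, None):
    trace.append(pc)
    pc, npc = step(pc, npc, target)

print([hex(a) for a in trace])
```

The instruction at 0x104, the delay slot, is executed before control reaches the branch target, which is the one-instruction delay described above.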

A trap is a vectored transfer of control to privileged software through a


trap table that may contain the first eight instructions (thirty-two for
fill/spill traps) of each trap handler. The base address of the table is
established by software in a state register (the Trap Base Address register,
TBA). The displacement within the table is encoded in the type number
of each trap and the level of the trap. One half of the table is reserved for
hardware traps; one quarter is reserved for software traps generated by
trap (Tcc) instructions; the final quarter is reserved for future expansion
of the architecture. A trap causes the current PC and nPC to be saved in
the TPC and TNPC registers. It also causes the CCR, ASI, PSTATE, and
CWP registers to be saved in TSTATE.

TPC, TNPC, and TSTATE are entries in a hardware trap stack, where the
number of entries in the trap stack is equal to the number of trap levels
supported (impl. dep. #101).


A trap also sets bits in the PSTATE register, one of which can enable an
alternate set of global registers for use by the trap handler. Normally, the
CWP is not changed by a trap; on a window spill or fill trap, however, the
CWP is changed to point to the register window to be saved or restored.

A trap may be caused by a Tcc instruction, an asynchronous exception, an


instruction induced exception, or an interrupt request not directly
related to a particular instruction. Before executing each instruction, the
processor behaves as though it determines if there are any pending
exceptions or interrupt requests. If any are pending, the processor selects
the highest-priority exception or interrupt request and causes a trap.

4-17.4. State Register Access Instructions


The read and write state register instructions read and write the contents
of state registers visible to non-privileged software (Y, CCR, ASI, PC,
TICK, and FPRS). The read and write privileged register instructions read
and write the contents of state registers visible only to privileged software
(TPC, TNPC, TSTATE, TT, TICK, TBA, PSTATE, TL, PIL, CWP,
CANSAVE, CANRESTORE, CLEANWIN, OTHERWIN, WSTATE,
FPQ, and VER). Software can use read/write state register instructions to
read/write implementation-dependent registers (ASRs 16..31).

4-17.5. Floating-Point Operate Instructions


Floating-point operate (FPop) instructions perform all floating-point
calculations; they are register-to-register instructions that operate on the
floating-point registers. Like arithmetic/ logical/shift instructions, FPops
compute a result that is a function of one or two source operands. Specific
floating-point operations are selected by a subfield of the FPop1/FPop2
instruction formats.

4-17.6. Conditional Move Instructions


Conditional move instructions conditionally copy a value from a source
register to a destination register, depending on an integer or floating-point
condition code or upon the contents of an integer register. These
instructions increase performance by reducing the number of branches.

4-17.7. Register Window Management Instructions


These instructions are used to manage the register windows. SAVE and
RESTORE are non-privileged and cause a register window to be pushed
or popped. FLUSHW is non-privileged and causes all of the windows
except the current one to be flushed to memory. SAVED and
RESTORED are used by privileged software to end a window spill or fill
trap handler.

4-18. Instruction Format of SPARC Processors


In SPARC machines, the instructions are encoded in four major 32-bit
formats and several minor formats, as shown in the following figures.

Fig. 4-17. Instruction formats of SPARC processors: Format 1,2,3


Fig. 4-18. Instruction formats of SPARC processors: Format 3 (cont.).

Fig. 4-19. Instruction formats of SPARC processors: Format 4


The instruction fields in the above formats are interpreted as follows:

a: The a bit annuls the execution of the following instruction if the branch
is conditional and untaken, or if it is unconditional and taken.
cc0, cc1, and cc2: specify the condition codes (icc, xcc, fcc0, fcc1, fcc2,
fcc3) to be used in the instruction.

Individual bits of the same logical field are present in several other
instructions: Branch on Floating-Point Condition Codes with Prediction
Instructions (FBPfcc), Branch on Integer Condition Codes with
Prediction (BPcc), Floating-Point Compare Instructions, Move Integer
Register if Condition is Satisfied (MOVcc), Move Floating-Point
Register if Condition is Satisfied (FMOVcc), and Trap on Integer
Condition Codes (Tcc). In instructions such as Tcc that do not contain the
cc2 bit, the missing cc2 bit takes on a default value. See table 38 on page
279 for a description of these fields' values.

cmask: This 3-bit field specifies sequencing constraints on the order of


memory references and the processing of instructions before and after a
MEMBAR instruction.

cond: This 4-bit field selects the condition tested by a branch instruction.

d16hi and d16lo: These 2-bit and 14-bit fields together comprise a
word-aligned, sign-extended, PC-relative displacement for a
branch-on-register-contents with prediction (BPr) instruction.

disp19: This 19-bit field is a word-aligned, sign-extended, PC-relative


displacement for an integer branch-with-prediction (BPcc) instruction or
a floating-point branch-with prediction (FBPfcc) instruction.

disp22 and disp30: These 22-bit and 30-bit fields are word-aligned, sign-
extended, PC-relative displacements for a branch or call, respectively.

fcn: This 5-bit field provides additional opcode bits to encode the DONE
and RETRY instructions.

i: The i bit selects the second operand for integer arithmetic and
load/store instructions. If i = 0, the operand is r[rs2]. If i = 1, the operand
is simm10, simm11, or simm13, depending on the instruction, sign-
extended to 64 bits.


imm22: This 22-bit field is a constant that SETHI places in bits 31..10 of
a destination register.
imm_asi: This 8-bit field is the address space identifier in instructions
that access alternate space.
impl-dep: The meaning of these fields is completely
implementation-dependent for IMPDEP1 and IMPDEP2 instructions.
mmask: This 4-bit field imposes order constraints on memory references
appearing before and after a MEMBAR instruction.
op and op2: These 2- and 3-bit fields encode the three major formats and
the Format 2 instructions.
op3: This 6-bit field (together with one bit from op) encodes the Format 3
instructions.
opf: This 9-bit field encodes the operation for a floating-point operate
(FPop) instruction.
opf_cc: Specifies the condition codes to be used in FMOVcc instructions.
See cc0, cc1, and cc2 above for details.
opf_low: This 6-bit field encodes the specific operation for a Move
Floating-Point Register if Condition is satisfied (FMOVcc) or Move
Floating-Point register if contents of integer register match condition
(FMOVr) instruction.
p: This 1-bit field encodes static prediction for BPcc and FBPfcc
instructions, as follows:

p Branch prediction
0 Predict branch will not be taken
1 Predict branch will be taken

rcond: This 3-bit field selects the register-contents condition to test for a
move based on register contents (MOVr or FMOVr) instruction or a
branch on register contents with prediction (BPr) instruction.
rd: This 5-bit field is the address of the destination (or source) r or f
register(s) for a load, arithmetic, or store instruction.
rs1: This 5-bit field is the address of the first r or f register(s) source
operand.

rs2: This 5-bit field is the address of the second r or f register(s) source
operand with i = 0.
shcnt32: This 5-bit field provides the shift count for 32-bit shift
instructions.
shcnt64: This 6-bit field provides the shift count for 64-bit shift
instructions.
simm10: This 10-bit field is an immediate value that is sign-extended to
64 bits and used as the second ALU operand for a MOVr instruction
when i = 1.
simm11: This 11-bit field is an immediate value that is sign-extended to
64 bits and used as the second ALU operand for a MOVcc instruction
when i = 1.
simm13: This 13-bit field is an immediate value that is sign-extended to
64 bits and used as the second ALU operand for an integer arithmetic
instruction or for a load/store instruction when i = 1.
sw_trap#: This 7-bit field is an immediate value that is used as the
second ALU operand for a Trap on Condition Code instruction.
x: The x bit selects whether a 32- or 64-bit shift will be performed.
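
The sign extension applied to the simm10, simm11 and simm13 fields, and the placement of imm22 by SETHI, can be sketched as follows. This is only an illustration: the helper names (sign_extend, as_register64, sethi) are hypothetical and are not part of any SPARC tool.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of `value` as a two's-complement number."""
    value &= (1 << bits) - 1          # keep only the immediate field
    sign_bit = 1 << (bits - 1)        # most significant bit of the field
    return value - (1 << bits) if value & sign_bit else value


def as_register64(value: int) -> int:
    """Show a (possibly negative) Python int as a 64-bit register pattern."""
    return value & 0xFFFF_FFFF_FFFF_FFFF


def sethi(imm22: int) -> int:
    """Place imm22 in bits 31..10; the low 10 bits stay at zero."""
    return (imm22 & 0x3F_FFFF) << 10


# A simm13 field with all 13 bits set represents -1 after sign extension.
print(sign_extend(0x1FFF, 13))
print(hex(as_register64(sign_extend(0x1FFF, 13))))
# The largest imm22 fills bits 31..10 of the destination register.
print(hex(sethi(0x3F_FFFF)))
```

The same sign_extend helper covers simm10 and simm11 by passing 10 or 11 as the field width.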

4-19. Encoding Load and Store Instructions of SPARC Processors


The SPARC machine language uses two different formats for load and
store instructions. These formats are shown in Figure 4-20. The first
format is used for instructions that use one or two registers in the
effective address. The second format is used for instructions that use an
integer constant in the effective address.

A. Instruction Format 1:
   Load instructions:    op [rs1+rs2], rd
   Store instructions:   op rd, [rs1+rs2]

   Bits:    31-30   29-25   24-19   18-14   13   12-5   4-0
   Field:   11      rd      op3     rs1     0    asi    rs2

B. Instruction Format 2:
   Load instructions:    op [rs1+const], rd
   Store instructions:   op rd, [rs1+const]

   Bits:    31-30   29-25   24-19   18-14   13   12-0
   Field:   11      rd      op3     rs1     1    const

Fig. 4-20. Formats of load and store instructions of SPARC processors

In the first format, the 32-bit instruction is divided into seven fields. The
first field (reading from the left) holds the 2-bit value 11, while the fifth
field (bit 13) holds the 1-bit value 0. These bits are the same for all load
and store instructions that use two source registers. The sixth field (bits 5
through 12) holds the address space identifier, asi. For the present, we
will always set the asi field to zero. The remaining fields, rd, op3, rs1, and
rs2, hold encodings for the destination register, the operation, and the two
source registers, respectively. Registers are encoded using the 5-bit
binary representation of the register number. Table 4-19 summarizes the
operation codes for the load and store instructions.

Table 4-19: Operation encodings for the load and store operations

Instruction   Opcode     Instruction   Opcode
ld            000000     ldsh          001010
ldub          000001     st            000100
lduh          000010     stb           000101
ldd           000011     sth           000110
ldsb          001001     std           000111

Example 4-14.
Show how to assemble the following load instruction:

ldd [%r4+%r7], %r11

Because this instruction uses two registers in the address specification, it
is encoded using the first format shown in Figure 4-20. As such, we must
determine the values for the rd, op3, rs1, and rs2 fields. The following
table summarizes these encodings:

Field   Symbolic Value   Encoded Value
rd      %r11             01011
op3     ldd              000011
rs1     %r4              00100
rs2     %r7              00111

These encodings lead to the following machine instruction:

Bits:    31-30   29-25   24-19    18-14   13   12-5       4-0
Value:   11      01011   000011   00100   0    00000000   00111

That is, 1101 0110 0001 1001 0000 0000 0000 0111 in binary or
0xD6190007 in hexadecimal.
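
The field packing of Example 4-14 can be checked with a short sketch. The function name encode_format1 is hypothetical; the field positions and the ldd opcode are the ones given in Figure 4-20 and Table 4-19.

```python
def encode_format1(op3: int, rd: int, rs1: int, rs2: int, asi: int = 0) -> int:
    """Pack a two-register load/store (first format, bit 13 = 0) into 32 bits."""
    for name, value, width in (("op3", op3, 6), ("rd", rd, 5), ("rs1", rs1, 5),
                               ("rs2", rs2, 5), ("asi", asi, 8)):
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
    return ((0b11 << 30)      # bits 31-30: fixed value 11
            | (rd << 25)      # bits 29-25: destination register
            | (op3 << 19)     # bits 24-19: operation code
            | (rs1 << 14)     # bits 18-14: first source register
            | (0 << 13)       # bit 13: 0 selects the two-register format
            | (asi << 5)      # bits 12-5: address space identifier
            | rs2)            # bits 4-0: second source register


LDD = 0b000011                                    # from Table 4-19
word = encode_format1(LDD, rd=11, rs1=4, rs2=7)   # ldd [%r4+%r7], %r11
print(f"{word:032b}")
print(f"0x{word:08X}")
```

The range checks are a design choice of this sketch: they reject a register number or opcode that would spill into a neighbouring field.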

If the assembly language instruction uses only a single register in the
address specification (e.g., register indirect addressing), the register is
encoded in one of the source register fields (i.e., rs1 or rs2) while %r0 is
encoded in the other. It doesn't matter which field holds the register
specified in the assembly language instruction and which field holds the
encoding for %r0. However, the isem-as assembler makes a fixed choice and
always encodes %r0 in the same field.
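
As an illustration of the two possible placements, the sketch below encodes ld [%r4], %r11 both ways. The helper pack_load_store is hypothetical, and the sketch relies on %r0 always reading as zero, which is why both words address the same location.

```python
def pack_load_store(op3: int, rd: int, rs1: int, rs2: int) -> int:
    """Two-register load/store word with bit 13 = 0 and asi = 0."""
    return (0b11 << 30) | (rd << 25) | (op3 << 19) | (rs1 << 14) | rs2


LD = 0b000000   # from Table 4-19
R0 = 0          # %r0 always reads as zero

# ld [%r4], %r11 with %r0 placed in rs2 ...
zero_in_rs2 = pack_load_store(LD, rd=11, rs1=4, rs2=R0)
# ... or with %r0 placed in rs1: a different bit pattern, same address.
zero_in_rs1 = pack_load_store(LD, rd=11, rs1=R0, rs2=4)

print(f"0x{zero_in_rs2:08X}")
print(f"0x{zero_in_rs1:08X}")
```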

4-20. Instruction Set of ARM Processors


Instructions in the regular ARM instruction set are all 32 bits long. In
fact, there are two instruction sets that the ARM core can use: regular ARM
code with 32-bit instructions, and a subset of it called Thumb, whose
instructions are 16 bits long. Naturally, the ARM set is more powerful, but
the most frequently used instructions can be found in both.

4-20.1. ARM Instruction Set (Alphabetic)


The following table summarizes the most commonly used ARM instructions in
alphabetic order. These instructions include a version of the Thumb (16-bit)
instruction set.

Note that angle brackets, <>, enclose alternative forms of the operand;
braces, {}, enclose optional operands; and Op2 is a flexible second
operand that can be either a register or a constant. Note also that most
instructions can use an optional condition code suffix.

Table 4-20. ARM instructions summary

Mnemonic       Operands                 Description                                     Flags
ADC, ADCS      {Rd,} Rn, Op2            Add with Carry                                  N,Z,C,V
ADD, ADDS      {Rd,} Rn, Op2            Add                                             N,Z,C,V
ADD, ADDW      {Rd,} Rn, #imm12         Add                                             N,Z,C,V
ADR            Rd, label                Load PC-relative Address                        -
AND, ANDS      {Rd,} Rn, Op2            Logical AND                                     N,Z,C
ASR, ASRS      Rd, Rm, <Rs|#n>          Arithmetic Shift Right                          N,Z,C
B              label                    Branch                                          -
BFC            Rd, #lsb, #width         Bit Field Clear                                 -
BFI            Rd, Rn, #lsb, #width     Bit Field Insert                                -
BIC, BICS      {Rd,} Rn, Op2            Bit Clear                                       N,Z,C
BKPT           #imm                     Breakpoint                                      -
BL             label                    Branch with Link                                -
BLX            Rm                       Branch indirect with Link                       -
BX             Rm                       Branch indirect                                 -
CBNZ           Rn, label                Compare and Branch if Non Zero                  -
CBZ            Rn, label                Compare and Branch if Zero                      -
CLREX          -                        Clear Exclusive                                 -
CLZ            Rd, Rm                   Count Leading Zeros                             -
CMN            Rn, Op2                  Compare Negative                                N,Z,C,V
CMP            Rn, Op2                  Compare                                         N,Z,C,V
CPSID          i                        Change Processor State, Disable Interrupts      -
CPSIE          i                        Change Processor State, Enable Interrupts       -
DMB            -                        Data Memory Barrier                             -
DSB            -                        Data Synchronization Barrier                    -
EOR, EORS      {Rd,} Rn, Op2            Exclusive OR                                    N,Z,C
ISB            -                        Instruction Synchronization Barrier             -
IT             -                        If-Then condition block                         -
LDM            Rn{!}, reglist           Load Multiple registers, increment after        -
LDMDB, LDMEA   Rn{!}, reglist           Load Multiple registers, decrement before       -
LDMFD, LDMIA   Rn{!}, reglist           Load Multiple registers, increment after        -
LDR            Rt, [Rn, #offset]        Load Register with word                         -
LDRB, LDRBT    Rt, [Rn, #offset]        Load Register with byte                         -
LDRD           Rt, Rt2, [Rn, #offset]   Load Register with two words                    -
LDREX          Rt, [Rn, #offset]        Load Register Exclusive                         -
LDREXB         Rt, [Rn]                 Load Register Exclusive with Byte               -
LDREXH         Rt, [Rn]                 Load Register Exclusive with Halfword           -
LDRH, LDRHT    Rt, [Rn, #offset]        Load Register with Halfword                     -
LDRSB, LDRSBT  Rt, [Rn, #offset]        Load Register with Signed Byte                  -
LDRSH, LDRSHT  Rt, [Rn, #offset]        Load Register with Signed Halfword              -
LDRT           Rt, [Rn, #offset]        Load Register with word                         -
LSL, LSLS      Rd, Rm, <Rs|#n>          Logical Shift Left                              N,Z,C
LSR, LSRS      Rd, Rm, <Rs|#n>          Logical Shift Right                             N,Z,C
MLA            Rd, Rn, Rm, Ra           Multiply with Accumulate, 32-bit result         -
MLS            Rd, Rn, Rm, Ra           Multiply and Subtract, 32-bit result            -
MOV, MOVS      Rd, Op2                  Move                                            N,Z,C
MOVT           Rd, #imm16               Move Top                                        -
MOVW, MOV      Rd, #imm16               Move 16-bit constant                            N,Z,C
MRS            Rd, spec_reg             Move from Special Register to general register  -
MSR            spec_reg, Rm             Move from general register to Special Register  N,Z,C,V
MUL, MULS      {Rd,} Rn, Rm             Multiply, 32-bit result                         N,Z
MVN, MVNS      Rd, Op2                  Move NOT                                        N,Z,C
NOP            -                        No Operation                                    -
ORN, ORNS      {Rd,} Rn, Op2            Logical OR NOT                                  N,Z,C
ORR, ORRS      {Rd,} Rn, Op2            Logical OR                                      N,Z,C
POP            reglist                  Pop registers from stack                        -
PUSH           reglist                  Push registers onto stack                       -
RBIT           Rd, Rn                   Reverse Bits                                    -
REV            Rd, Rn                   Reverse byte order in a word                    -
REV16          Rd, Rn                   Reverse byte order in each halfword             -
REVSH          Rd, Rn                   Reverse byte order in bottom halfword and       -
                                        sign extend
ROR, RORS      Rd, Rm, <Rs|#n>          Rotate Right                                    N,Z,C
RRX, RRXS      Rd, Rm                   Rotate Right with Extend                        N,Z,C
RSB, RSBS      {Rd,} Rn, Op2            Reverse Subtract                                N,Z,C,V
SBC, SBCS      {Rd,} Rn, Op2            Subtract with Carry                             N,Z,C,V
SBFX           Rd, Rn, #lsb, #width     Signed Bit Field Extract                        -
SDIV           {Rd,} Rn, Rm             Signed Divide                                   -
SEV            -                        Send Event                                      -
SMLAL          RdLo, RdHi, Rn, Rm       Signed Multiply with Accumulate                 -
                                        (32 x 32 + 64), 64-bit result
SMULL          RdLo, RdHi, Rn, Rm       Signed Multiply (32 x 32), 64-bit result        -
SSAT           Rd, #n, Rm {,shift #s}   Signed Saturate                                 Q
STM            Rn{!}, reglist           Store Multiple registers, increment after       -
STMDB, STMEA   Rn{!}, reglist           Store Multiple registers, decrement before      -
STMFD, STMIA   Rn{!}, reglist           Store Multiple registers, increment after       -
STR            Rt, [Rn, #offset]        Store Register word                             -
STRB, STRBT    Rt, [Rn, #offset]        Store Register byte                             -
STRD           Rt, Rt2, [Rn, #offset]   Store Register two words                        -
STREX          Rd, Rt, [Rn, #offset]    Store Register Exclusive                        -
STREXB         Rd, Rt, [Rn]             Store Register Exclusive Byte                   -
STREXH         Rd, Rt, [Rn]             Store Register Exclusive Halfword               -
STRH, STRHT    Rt, [Rn, #offset]        Store Register Halfword                         -
STRT           Rt, [Rn, #offset]        Store Register word                             -
SUB, SUBS      {Rd,} Rn, Op2            Subtract                                        N,Z,C,V
SUB, SUBW      {Rd,} Rn, #imm12         Subtract                                        N,Z,C,V
SVC            #imm                     Supervisor Call                                 -
SXTB           {Rd,} Rm {,ROR #n}       Sign extend a byte                              -
SXTH           {Rd,} Rm {,ROR #n}       Sign extend a halfword                          -
TBB            [Rn, Rm]                 Table Branch Byte                               -
TBH            [Rn, Rm, LSL #1]         Table Branch Halfword                           -
TEQ            Rn, Op2                  Test Equivalence                                N,Z,C
TST            Rn, Op2                  Test                                            N,Z,C
UBFX           Rd, Rn, #lsb, #width     Unsigned Bit Field Extract                      -
UDIV           {Rd,} Rn, Rm             Unsigned Divide                                 -
UMLAL          RdLo, RdHi, Rn, Rm       Unsigned Multiply with Accumulate               -
                                        (32 x 32 + 64), 64-bit result
UMULL          RdLo, RdHi, Rn, Rm       Unsigned Multiply (32 x 32), 64-bit result      -
USAT           Rd, #n, Rm {,shift #s}   Unsigned Saturate                               Q
UXTB           {Rd,} Rm {,ROR #n}       Zero extend a Byte                              -
UXTH           {Rd,} Rm {,ROR #n}       Zero extend a Halfword                          -
WFE            -                        Wait For Event                                  -
WFI            -                        Wait For Interrupt                              -

4-20.2. ARM Instruction Encoding


The ARM has many possible instruction word formats, as shown in
figure 4-20.

Fig. 4-20. ARM instruction word formats

By way of demonstration, the following figure illustrates the encoding
format of the ARM's load and store instructions. Memory access operations
have a conditional execution field in bits 31, 30, 29, and 28. The load and
store instructions can therefore be conditionally executed, depending on a
condition specified in the instruction. Consider the following example:

CMP R1, R2
LDREQ R3, [R4]
LDRNE R3, [R5]

Fig.4-21. Encoding ARM load/store instructions
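
The role of the condition field in bits 31 to 28 can be sketched as follows. The condition values (EQ = 0000, NE = 0001, AL = 1110) are the standard ARM ones; the base word 0xE5943000 is assumed here to be the unconditional encoding of LDR R3, [R4], and the helper names are hypothetical.

```python
# Standard ARM condition codes held in bits 31..28 of an instruction word.
CONDITION = {"EQ": 0b0000, "NE": 0b0001, "AL": 0b1110}


def with_condition(word: int, cond: str) -> int:
    """Replace the 4-bit condition field of a 32-bit ARM instruction word."""
    return (word & 0x0FFF_FFFF) | (CONDITION[cond] << 28)


def condition_of(word: int) -> int:
    """Extract the 4-bit condition field (bits 31..28)."""
    return (word >> 28) & 0xF


LDR_R3_R4 = 0xE5943000   # assumed encoding of LDR R3, [R4] (condition AL)

print(f"LDREQ R3, [R4] -> 0x{with_condition(LDR_R3_R4, 'EQ'):08X}")
print(f"LDRNE R3, [R4] -> 0x{with_condition(LDR_R3_R4, 'NE'):08X}")
```

Only the top four bits change between the conditional variants; the remaining 28 bits that describe the load itself stay the same.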

4-21. Summary

In this chapter we introduced the instruction sets and instruction formats
of x86, SPARC and ARM processors. The modern x86 instruction set is a
series of extensions of instruction sets that began with the Intel 8008
microprocessor. Nearly full binary backward compatibility is maintained
from the Intel 8086 through to the modern Pentium 4, Intel Core,
AMD Athlon 64, and Opteron processors.

The simplest kind of programming language is assembly language,
which has a one-to-one correspondence with the resulting machine code
but allows the use of meaningful text strings, called mnemonics.
Assembly language is a symbolic representation of the machine language of
a specific processor. The x86 assembly language recognizes the
following fundamental data types: bytes, words, and dwords (double
words). The information encoded in an x86 instruction includes a
specification of the operation to be performed (opcode) and the
arguments (operands) to be manipulated.

Addressing modes are a set of methods for specifying the operands of an
instruction. Different processors vary in the number of addressing modes
they provide. The most common modes are immediate, register, and
memory (or absolute) modes.

Addressing Mode Examples

1) Immediate Addressing:                  ADD CX,1024
2) Register Addressing:                   MOV AL,BL
3) Memory Addressing:
   ¤ Direct:                              MOV AX,[3000]
   ¤ Indirect:
     * Register Indirect,                 MOV CL,[BX]
     * Based,                             MOV CX,[BX+3]
     * Indexed,                           MOV AX,[DI+2]
     * Based Indexed,                     MOV [BP+SI],BL
     * Based Indexed with Displacement,   MOV [BX+SI+208],AH
   ¤ String Addressing,                   MOVSB
   ¤ Port Addressing.                     IN AL,40
                                          OUT 80,AL
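
How the indirect memory modes arrive at an offset can be sketched as below. The register contents are arbitrary sample values, the displacement 208 is read as decimal, and effective_address is a hypothetical helper; a real 8086 adds the segment base afterwards.

```python
# Arbitrary sample register contents, chosen only for this illustration.
REGS = {"BX": 0x1000, "BP": 0x2000, "SI": 0x0020, "DI": 0x0030}


def effective_address(base=None, index=None, disp=0, regs=REGS) -> int:
    """Offset = base register + index register + displacement (16-bit wrap)."""
    total = disp
    if base is not None:
        total += regs[base]
    if index is not None:
        total += regs[index]
    return total & 0xFFFF


examples = {
    "[BX+3]":      effective_address(base="BX", disp=3),                # based
    "[DI+2]":      effective_address(index="DI", disp=2),               # indexed
    "[BP+SI]":     effective_address(base="BP", index="SI"),            # based indexed
    "[BX+SI+208]": effective_address(base="BX", index="SI", disp=208),  # with displacement
}
for operand, offset in examples.items():
    print(f"{operand:<12} -> offset {offset:04X}h")
```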

Note that some special combinations of segment registers and general
registers point to important addresses. For instance, CS:IP points to the
address where the processor will fetch the next byte of code. SS:SP points
to the location of the last item pushed onto the stack. DS:SI is often used
to point to data that is about to be copied to ES:DI.

The basic x86 instruction set includes data manipulation, arithmetic,
logic, program flow control and processor control instructions.

The x86 data manipulation instructions include specific string
instructions for load, store and move (LODS, STOS, and MOVS). Each
performs its operation on an element of a specified size (B for 8-bit byte,
W for 16-bit word, D for 32-bit double word) and then increments (or
decrements, when the direction flag is set) the implicit address register
(SI for LODS, DI for STOS, and both for MOVS). For the load and store, the
implicit target/source register is AL, AX or EAX (depending on size). The
implicit segment is DS for the source address in SI and ES for the
destination address in DI, so MOVS uses ES for the store and DS for the
load. In modern x86 processors, these complex instructions don't offer any
performance advantage over more simply implemented separate
load/store and address increment instructions.
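
The implicit register updates of the byte-sized string instructions can be modelled with a toy sketch. It assumes a single flat memory (segments are ignored), and the class StringUnit and its attribute names are hypothetical.

```python
class StringUnit:
    """Toy model of LODSB/STOSB/MOVSB on one flat memory (segments ignored)."""

    def __init__(self, memory: bytearray):
        self.mem = memory
        self.si = 0   # implicit source index
        self.di = 0   # implicit destination index
        self.al = 0   # implicit data register for LODSB/STOSB
        self.df = 0   # direction flag: 0 = increment, 1 = decrement

    def _advance(self, reg: int) -> int:
        step = -1 if self.df else 1
        return (reg + step) & 0xFFFF

    def lodsb(self) -> None:
        self.al = self.mem[self.si]          # load AL from [SI]
        self.si = self._advance(self.si)

    def stosb(self) -> None:
        self.mem[self.di] = self.al          # store AL to [DI]
        self.di = self._advance(self.di)

    def movsb(self) -> None:
        self.mem[self.di] = self.mem[self.si]  # copy [SI] to [DI]
        self.si = self._advance(self.si)
        self.di = self._advance(self.di)


memory = bytearray(b"HELLO" + bytes(5))
cpu = StringUnit(memory)
cpu.si, cpu.di = 0, 5
for _ in range(5):          # plays the role of REP MOVSB with CX = 5
    cpu.movsb()
print(bytes(memory), cpu.si, cpu.di)
```

A LODSB followed by a STOSB moves one byte just as MOVSB does, which is the equivalence the paragraph above alludes to.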

The stack is implemented with an implicitly decrementing (push) and
incrementing (pop) stack pointer. In the 16-bit mode of x86 processors, this
implicit stack pointer is addressed as SS:[SP]; in 32-bit mode it is
SS:[ESP], and in 64-bit mode it is [RSP].
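
A minimal sketch of the 16-bit case is given below, assuming a flat 64 KB stack segment and little-endian word storage; the class name Stack16 and the initial SP value are illustrative only.

```python
class Stack16:
    """Toy model of 16-bit PUSH/POP inside a single 64 KB stack segment."""

    def __init__(self, sp: int = 0xFFEE):
        self.mem = bytearray(0x10000)   # the stack segment
        self.sp = sp                    # offset of the last pushed item

    def push(self, value: int) -> None:
        self.sp = (self.sp - 2) & 0xFFFF                       # decrement first
        self.mem[self.sp] = value & 0xFF                       # low byte
        self.mem[(self.sp + 1) & 0xFFFF] = (value >> 8) & 0xFF  # high byte

    def pop(self) -> int:
        low = self.mem[self.sp]
        high = self.mem[(self.sp + 1) & 0xFFFF]
        self.sp = (self.sp + 2) & 0xFFFF                       # increment after
        return (high << 8) | low


stack = Stack16()
stack.push(0x1234)
print(f"SP after push: {stack.sp:04X}")
print(f"popped: {stack.pop():04X}, SP after pop: {stack.sp:04X}")
```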

When an x86 processor is powered up or re-initialized (by reset), the
CPU is in real mode (with real addresses): all protection features are
disabled and the memory space is limited to 1 MB of physical memory.
This is very similar to what happens in the earlier IBM PCs, which operate
in real mode. We can therefore assume an IBM PC equipped with an 8088
microprocessor to discuss the initialization (boot-up) process. The boot-up
process begins when the PC is powered up or reset. The CPU then executes a
jump instruction at address F000:FFF0 inside the ROM BIOS chip, which
points to the first instruction of the BIOS. The ROM BIOS program is
approximately 8 KB long and controls all of the hardware on
the system board and interface cards. The CPU support chips are
initialized with the proper default values to control such things as the
video monitor, disk drives, printer ports and keyboard. After the
initialization of all the hardware, the program executes a very extensive
diagnostic test on the x86 CPU, ROMs, RAM, etc. to complete what
is called the Power-On-Self-Test (POST).
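
The real-mode address F000:FFF0 and the 1 MB limit mentioned above can be related with a short sketch. It assumes the usual real-mode rule (segment shifted left by four bits plus offset, truncated to 20 address bits); physical_address is a hypothetical helper.

```python
ADDRESS_SPACE = 1 << 20   # 1 MB of real-mode physical memory


def physical_address(segment: int, offset: int) -> int:
    """Real-mode address: segment * 16 + offset, kept to 20 bits."""
    return ((segment << 4) + offset) & (ADDRESS_SPACE - 1)


reset_entry = physical_address(0xF000, 0xFFF0)
print(f"F000:FFF0 -> {reset_entry:05X}h")
print(f"bytes left below the 1 MB limit: {ADDRESS_SPACE - reset_entry}")
# Different segment:offset pairs can name the same physical byte.
print(physical_address(0xFFFF, 0x0000) == reset_entry)
```

Only 16 bytes remain above this entry point, which is why it holds just a jump to the rest of the BIOS code.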

If there are no critical errors during POST, the default disk drive (e.g., C:)
is turned ON and tested. The pass condition will cause the head to
position over track 0, head 0, sector 0 of the disk and the boot loader
program is transferred into memory.

Once loaded in memory, the boot loader program is given control of the
CPU and a series of instructions are executed that will look in the
directory of the disk for the system files, dos.sys and bio.sys. If these two
system files are on the disk, they are loaded into low memory in that
order, along with any driver programs that are listed in the "device ="
statements of the config.sys file.

Control of the CPU is then given to the DOS program to finish the boot
up process, by loading the command processor program command.com
into memory in the next available space right after dos.sys. The boot
process is complete when command.com is given the final control of the
CPU. So, the operating system is made up of three basic programs
(ibmbio.sys, ibmdos.sys and command.com) that are loaded in low
memory starting at 00000H and ending at 0B000H. The actual ending
address will depend on the version of DOS and the number of device
drivers that are loaded during the boot process.

4-22. PROBLEMS

[4-1] What is the correct definition of the term 'instruction set'?


a) range of opcodes which a CPU is programmed to recognize
b) list of instructions which forms the program being executed in memory
c) specific subroutine of a program, run if certain conditions are satisfied
d) process by which a single instruction of a program is executed

[4-2] Find the contents of all the affected 8086 microprocessor registers
and flags, after each line of the following program has run. For each line,
indicate the new value for the registers that change.
ADD BL,AL
SHL AL,CL
AND AL,0Fh
SUB BL,AL

[4-3] Explain with examples all addressing modes supported by the 80x86
family of CPUs.

[4-4] Explain with examples all the stack operations supported by 80x86
family of CPUs (e.g., PUSH reg, PUSHA, PUSHF, POP reg, POPA,
POPF). Draw schematic representations of the stack area before and after
execution of these instructions

[4-5] Encode the following 8086 instructions into machine code:

MOV BL,AL        ;given that the opcode of MOV is 100010
MOV BL,AX        ;given that the opcode of MOV is 100010
MOV AX,BX        ;given that the opcode of MOV is 100010
ADD AX,[SI]      ;given that the opcode of ADD is 000000
ADD AX,[DI]      ;given that the opcode of ADD is 000000
XOR CL,[1234H]   ;given that the opcode of XOR is 001100

[4-6] Show how to encode the following instructions. The MOV
instruction opcodes can be found in appendix B.

MOV CL,08H
MOV BL,B2H
MOV AL,7FH

[4-7] What's the difference between the CALL and INT instructions in
80x86 microprocessor systems?

[4-8] Write an 8086 assembly program that fills a 1000D-byte block of
memory in the extra segment, beginning at address BLOCK, with the data
byte 20H (ASCII space).

[4-9] Examine and encode the following portion of a list file for a
real-mode program. Assume that the register BX initially contains 1234H.
        CMP BX,4            ; Be sure BX is in range
        JNC ERROR
        SHL BX,1            ; Convert to word offset
        MOV BX,TABLE[BX]    ; Index into table
TABLE:  DW PROC0
        DW PROC1
        DW PROC2
        DW PROC3
ERROR:

[4-10] Write an assembly program that inputs two 8-bit unsigned numbers
from input ports A0H and B0H and outputs the product to the 16-bit output
port 7080H.
[4-11] Obtain the approximate decimal value that conforms to the IEEE
754 floating point format of the following numbers:
A = 100101111 10000000000000000000000
B = 010001110 00000000000000000000001

[4-12] Calculate the largest positive number, the smallest non-zero
positive number and the negative number with largest magnitude that can
be represented by the 32-bit IEEE format.

[4-13] The BOOT sector files of the system are stored in _____ .
a) Hard disk b) ROM
c) RAM d) Fast solid state chips in the motherboard

[4-14] Calculate the largest positive number, the smallest non-zero
positive number and the negative number with largest magnitude that can
be represented by the 80-bit IEEE format.
[4-15] Which of the following instructions are valid and which are not?
Mention why
MOV CL, AH
MOV IP, CX
MOV CX, SP
MOV CL, 1234H
MOV CS, DS

[4-16] In Intel Pentium processors, the size of the floating-point registers
can be extended up to _____ .
a) 128 bit b) 256 bit
c) 80 bit d) 64 bit

[4-17] Find the contents of all the affected 8086 microprocessor registers
and flags, after each line of the following program has run. For each line,
indicate the new value for the registers that change. If the instruction is
not a legal instruction, write "ILLEGAL" anywhere inside the box. If no
registers change, write "NONE" anywhere inside the box. Assume the
following Status before each part:

Registers Memory
AX 0002 BX 0114 CX 0003 DX FF05 SI 0003 ARRAY DW 5,4,3,2,1

A) MOV AX, [ARRAY+SI] B) ASR DX, CL C) ADD BYTE [02], BL


Ax______________ Ax______________ Ax______________
Bx______________ Bx______________ Bx______________
Cx _____________ Cx _____________ Cx _____________
Dx_______________ Dx_______________ Dx_______________
SI________________ SI________________ SI________________

D) ADD BYTE 02, BL E) MUL 02 F) DIV CX, AL


Ax______________ Ax______________ Ax______________
Bx______________ Bx______________ Bx______________
Cx _____________ Cx _____________ Cx _____________
Dx_______________ Dx_______________ Dx_______________
SI________________ SI________________ SI________________

G) TEST AX, BX H) AND DH, 55H I) CWD


Ax______________ Ax______________ Ax______________
Bx______________ Bx______________ Bx______________
Cx _____________ Cx _____________ Cx _____________
Dx_______________ Dx_______________ Dx_______________
SI________________ SI________________ SI________________

[4-18] Show how to load the accumulator (AH) from the Flags register,
(AH) ← (Flags), on an 8088 microprocessor system. Show also how to perform
the inverse process, to store AH into Flags, (Flags) ← (AH).

[4-19] Show how to clear and set the interrupt flag on an 8088
microprocessor system.

[4-20] Calculate the execution time of different variations of the MOV
instruction. Put your answer in a table, as we did in example 4-13.

[4-21] Calculate the execution time of the BCD-to-7-segment conversion
procedure shown in example 4-4. Assume a non-pipelined processor
with a 1 GHz clock.

[4-22] Encode the following assembly program (explained in example 4-6).
Calculate the size of the program in memory, and the size of its
executable version after adding the PSP.

STRING DB "THIS IS A TEXT" ; Define a byte string called STRING
CLD                        ; Clear direction flag
LEA DI,STRING              ; Point to the start address of string
MOV AL,'X'                 ; Search for the value 'X'
MOV CX,0EH                 ; STRING contains 14 bytes
MOV BX,CX                  ; Save string length in BX
REPNE SCASB                ; Scan STRING and loop till match
SUB BX,CX                  ; Calculate position of 'X'

CHAPTER 5

Assembly Language:
Programming, Compilation &
Debugging
Contents
5-1. Introduction
5-2. DEBUG Program
5-3. Macro Assembler Programs
5-4. Assembly Language Instructions Format.
5-5. Assembler Data Types.
5-6. Assembler Directives
5-7. Declaring Variables
5-8. Modifiers & Attribute Operators
5-9. Difference between Values, Addresses and Pointers
5-10. Arrays in Assembly Language
5-11. Tables & Lookup Tables in Assembly Language
5-12. Other Data Structures in Assembly Language (Queues, Linked lists,..)
5-13. Working with Strings in Assembly Language
5-14. Procedures in Assembly Programs
5-15. Functions in Assembly Programs
5-16. Writing & Initializing Interrupts in Assembly Programs
5-17. Creating Macros in Assembly Programs
5-18. Assembly Program Compilation & Linking
5-19. 16-Bit Macro-Assemblers
5-20. MASM Assembler Syntax for x86 memory Addressing Modes
5-21. 32-Bit Macro-Assemblers (MASM32)
5-22. 64-Bit Macro-Assemblers (YASM)
5-23. Summary of x86 Macro-assembler Programs
5-24. Summary
5-25. Problems

5-1. Introduction
As outlined in chapter one, a program is a sequence of simple
commands that lead the computer to solve some problem. Once the
program is written and debugged, the computer can execute the
instructions. We have also indicated that the assembly language of a
given processor is a collection of instructions, which have to be translated
into bit patterns, or machine code, in order to be executed by the
microprocessor. Assembly language has several benefits:

• High speed of execution. Assembly language programs are
generally the fastest programs around.
• Small memory space. Assembly language programs are often the
smallest.
• Extended flexibility. You can do things in assembly that are
difficult or impossible in high-level languages.
• Higher efficiency. Your knowledge of assembly language will help
you write better programs, even when you use high-level languages. An
expert assembly language programmer is always capable of writing faster
and more efficient programs than an expert C programmer.

An assembler is a program that helps you translate the assembly
language words into their corresponding bit patterns very easily. The
output of the assembler is then placed in memory for the microprocessor to
execute. However, you hardly ever get it right the first time, so you may
need to debug your program and search for syntax or typing errors.

Assembler programs, like MASM (from Microsoft) or TASM (from
Borland), are equipped with powerful editing and debugging tools. You
may also use the DEBUG program, which is supplied with your
operating system, for this purpose. Once the program is written and
debugged, the computer can execute the instructions very fast, and always
in the same way, every time you run your program.

5-2. DEBUG Program


The DEBUG program, which is supplied with the disk operating system
(DOS) of the IBM PC, can be used to write and execute short assembly
programs. When the DEBUG program is started, it responds with its own
hyphen “-” prompt, as shown in figure 5-1. When the hyphen prompt
appears, DEBUG is waiting for you to enter one of its commands. One can
then enter one of the DEBUG single-letter commands, which are
listed in table 5-1, followed by the appropriate parameters.

Table 5-1. DEBUG commands. Square brackets [ ] enclose optional parameters.

DEBUG Command Function


A [address] Assemble (assembly input)
C range address Compare
D [range] Dump (display)
E address [list] Enter (binary input)
F range list Fill
G [=address] [addresses] Go (run program)
H value1 value2 Hex
I port Input
L [address] [drive] [first sector] [No.] Load
M range address Move
N [pathname] [arglist] Name
O port byte Output
P [=address] [number] Proceed
Q Quit
R [register] Register
S range list Search
T [=address] [value] Trace (run step-by-step)
U [range] Un-assemble
W [address] [drive] [first sector] [No.] Write (to disk)
XA [#pages] Allocate expanded memory
XD [handle] De-allocate expanded memory
XM [Lpage] [Ppage] [handle] Map expanded memory pages

Starting DEBUG this way allows you to work on the internal hardware
of the computer and view the contents of all of the memory locations in
RAM. You can also load as many as 128 sectors of a floppy or hard
disk and view, edit or move the contents to another location. You can also
use DEBUG to perform many other tasks, such as:

• Look at the DOS data area in memory to determine what kind of
equipment is installed on the motherboard.
• Look at the internal workings of DOS at such things as the real time
clock, interrupt vector table, ROM BIOS chip and VRAM chips, to
name a few.
• Recover deleted files from a floppy or hard disk.
• Recover lost data from a disk that is unreadable by DOS.
• Run diagnostics on the hardware, such as the video display and drives.
• Low-level format a hard drive.

For instance, the assemble command “-A” is used to enter an assembly
program at a specified location in memory.

C:\> DEBUG
-A 0100
14BA:0100 MOV CX,0A
14BA:0103 MOV AX,0
14BA:0106 ADD AX,CX
14BA:0108 LOOP 0106
-

Fig. 5-1. Calling the DEBUG program. The assemble command “-A” is used to
create an assembly program at a given address (here 0100).

One can also trace the program execution step-by-step, using the “T”
command. The trace command displays the content of 8086 registers after
execution of each line of program as shown in figure 5-2.

-T
AX=0000 BX=0000 CX=000A DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000
DS=14BA ES=14BA SS=14BA CS=14BA IP=0103 NV UP EI PL NZ NA PO NC
14BA:0103 B80000 MOV AX,0000
-T
AX=0000 BX=0000 CX=000A DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000
DS=14BA ES=14BA SS=14BA CS=14BA IP=0106 NV UP EI PL NZ NA PO NC
14BA:0106 01C8 ADD AX,CX
-T
AX=000A BX=0000 CX=000A DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000
DS=14BA ES=14BA SS=14BA CS=14BA IP=0108 NV UP EI PL NZ NA PE NC
14BA:0108 E2FC LOOP 0106
-T
AX=000A BX=0000 CX=0009 DX=0000 SP=FFEE BP=0000 SI=0000 DI=0000
DS=14BA ES=14BA SS=14BA CS=14BA IP=0106 NV UP EI PL NZ NA PE NC
14BA:0106 01C8 ADD AX,CX
-

Fig. 5-2. Assembly program tracing using the DEBUG Program.
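
To see where the traced program ends up, the loop can be modelled in a few lines. This is a simplified sketch that tracks only AX and CX (flags and other registers are ignored); the function name run_sum_loop is hypothetical.

```python
def run_sum_loop(count: int):
    """Model of: MOV CX,count / MOV AX,0 / ADD AX,CX / LOOP back to the ADD."""
    cx = count & 0xFFFF          # MOV CX,count
    ax = 0                       # MOV AX,0
    trace = []
    while True:
        ax = (ax + cx) & 0xFFFF  # ADD AX,CX
        cx = (cx - 1) & 0xFFFF   # LOOP decrements CX ...
        trace.append((ax, cx))
        if cx == 0:              # ... and falls through once CX reaches zero
            break
    return ax, trace


final_ax, trace = run_sum_loop(0x0A)
for ax, cx in trace:
    print(f"AX={ax:04X} CX={cx:04X}")
print(f"final AX = {final_ax:04X}h ({final_ax} decimal)")
```

The first line printed corresponds to the last register dump in the trace above (AX=000A, CX=0009); the program accumulates 10 + 9 + ... + 1 in AX.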

One can also list any part of the program using the dump “-D” command.
For instance, one can use the dump command to list the BIOS date, which
is stored in the memory address F000:FFF5 through F000:FFFD of the
ROM BIOS of the IBM PC:

C:\> DEBUG
-D F000:FFF5 FFFD
F000:FFF0 31 31 2F-31 32 2F 30 37 00 11/12/07.
-

Fig. 5-3(a). Using the DEBUG program to display memory contents, with “-D”
command.
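
The ASCII column of the dump can be reproduced from the hexadecimal bytes, as in the following sketch. The byte string is copied from the figure; the variable names are illustrative.

```python
# The nine bytes shown by the dump for F000:FFF5 through F000:FFFD.
dump = "31 31 2F 31 32 2F 30 37 00"

raw = bytes.fromhex(dump)
# The date string ends at the first zero byte.
bios_date = raw.split(b"\x00")[0].decode("ascii")
print(bios_date)
```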

Figure 5-3(b) depicts how to enter a value using the “-E”
command, store it at memory address [210] and calculate its square root
(using coprocessor instructions). A 16-bit integer constant, which is
stored at memory address [210], is read using the FILD (Floating Integer
Load) coprocessor instruction. This number is stored internally in the
80x87 as an 80-bit floating point value. After taking the square root,
using FSQRT, the floating-point result is stored, using FSTP, at
memory address [200] for inspection.

C:\> DEBUG
-A 100
FILD word [210]
FSQRT
FSTP qword [200]
INT 20
-E 210 ; Enter the value 0005 as a 16-bit integer
3AA0:0210 00.05 00.00 ; Enter the value 05 at address [210]
; and the value 00 at address [211]
-G ; Go! Run the program
-D 200
3AA0:0200 A8 F4 97 9B 77 E3 01 40
3AA0:0210 05 00

Fig. 5-3(b). Using the DEBUG program to enter data, with “-E” command, and run
programs using the “-G” command.

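Note that the 16-bit value of the integer 5 reads 0005, and the 64-bit value
of the square root of 5 reads 4001E3779B97F4A8. Both are stored with the
least significant byte first, which is why the dump shows 05 00 and
A8 F4 97 9B 77 E3 01 40.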
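
The byte order in the dump can be reproduced outside DEBUG. The following Python sketch is an illustration only (the helper name double_bytes_le is hypothetical); it assumes that the 64-bit value written by FSTP is an IEEE 754 double, and it uses the standard struct module to list the bytes of the square root of 5 in memory order.

```python
import math
import struct


def double_bytes_le(value):
    """Return the 8 bytes of an IEEE 754 double, lowest address first."""
    return struct.pack("<d", value)


raw = double_bytes_le(math.sqrt(5.0))
# Bytes in memory order, the way a DEBUG dump lists them.
print(" ".join(f"{b:02X}" for b in raw))
# The same 8 bytes read as a single 64-bit number.
print(f"{int.from_bytes(raw, 'little'):016X}")
# The 16-bit integer 5, also least significant byte first.
print(struct.pack("<h", 5).hex())
```

Reading the eight bytes from the highest address down to the lowest gives the 64-bit value in its usual written form.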

The “-L” command is also used to load a disk sector. For instance, the
following commands load and examine a sample boot sector:


C:\>DEBUG
-L0000 2 0 1
-D0000 001F
1026:0000 EB 3C 90 4D 53 44 4F 53-35 2E 30 00 02 04 01 00 .<MSDOS5.0
1026:0010 02 00 02 00 00 F8 F8 00-11 00 10 00 11 00 00 00 ... ... ..
-

Fig. 5-3(c). Using the DEBUG program to load a disk sector, with “-L” command.

The DEBUG program can also be called at the DOS prompt with the name of a
binary file that you'd like to load, un-assemble (decode from binary to
assembly) and edit, like this:

C:\> DEBUG filename

DEBUG is then loaded into memory along with the file named on the
command line, and the first byte of the file is placed at offset 100 Hex
of the work area. Starting DEBUG this way lets you view, edit or
move a COM program (smaller than 64 kB).

Note 5-1. Program Segment Prefix (PSP)

DEBUG sets up a work area in memory of 64 kB, which is 10000 Hex bytes
(offsets 0000 through FFFF). The first 256 bytes (100 Hex) of this area
are set aside for what is called the Program Segment Prefix (PSP) of a
program and must not be altered in any way. Whenever we load sectors or
data into memory with DEBUG, they must be put at a location starting at
offset 100. In fact, MS-DOS allows only two types of programs to run
under its control, and they must end with the extension EXE or COM. The
difference between these two program types is in the way DOS handles the
maintenance portions of the program. This maintenance area, often called
the Program Segment Prefix (PSP), is needed by DOS to return control
back to the operating system when the program terminates.

Note 5-2. Difference between *.COM & *.EXE Programs


The *.COM files are characterized by the following features:
1. COM programs are very small and compact programs that cannot be
larger than 64 kB in size. The PSP of a COM program is located in the
first 100 Hex (256 decimal) locations of the program.
2. The first instruction of a COM program starts at offset 100 in memory.
3. DOS creates the PSP for the COM program, which means we do not
have to be concerned with this when we assemble a program.

4. All the data, code, and the stack area are in the same segment.

As for *.EXE files, they have the following characteristics:


1. EXE programs can be of any size (from 200 bytes to 640 kB).
2. DOS also builds the PSP for an EXE program, but the programmer must
set up the segment registers (for example, load DS with the data
segment) and determine where the first instruction of the program is.
3. EXE programs use separate segments for the data, code and stack
areas in memory.

5-3 Macro-Assembler Programs


As we pointed out in the above section, the DEBUG program is simple,
but it cannot be used to edit long assembly programs. The
DEBUG program has no text editor and no simple method to save
assembly programs. In addition, we cannot refer to a memory location by
a symbol (a label) when we use the DEBUG program to create assembly
routines. Therefore, if we need to modify the program (e.g., by inserting
or deleting an assembly instruction), we have to modify the absolute
addresses that are used as targets in jump and loop instructions.
Figure 5-4 shows how assembler programs, like MASM, simplify
the editing job by using labels instead of absolute addresses. Assembler
programs also make it easy to edit and save assembly programs.

DEBUG                               MASM
14BA:0100   MOV  CX,0A                     MOV  CX,0A
14BA:0103   MOV  AX,0                      MOV  AX,0
14BA:0106   ADD  AX,CX              NEXT:  ADD  AX,CX
14BA:0108   LOOP 0106                      LOOP NEXT

Fig. 5-4. A piece of an assembly program, as it appears in DEBUG and MASM
programs. The use of labels in DEBUG is not possible.

Many programmers therefore consider the assembler a necessity when a
high-level language falls short in terms of capacity or performance.

Assembly programs, which are written within a macro-assembler (like
MASM and TASM), usually have three constituent parts:

• Machine Instructions
• Assembler Directives
• Assembler Controls


Machine instructions are the machine code that can be executed by the
microprocessor. Appendix A provides an overview of the 80x86 machine
instructions. Detailed discussion of the 80x86 instructions can be found in
chapter 4 and Appendix B.

Assembler directives are used to define the program structure and
symbols, and to generate non-executable code (data, messages, etc.). The
main assembler directives are grouped in table 5-3.

Assembler controls set the assembly modes and direct the assembly
flow. Table 5-3 contains a guide to all the assembler controls.

5-4 Assembly Language Instruction Format


Like any other language, the assembler has its own instruction format.
The 80x86 assembler instruction has the following general format:

Name Action Operands Comment

Fig. 5-5. Symbolic format of Assembler instructions

where:

• Name may be a label (an identifier that is followed by a colon), a
constant name or a variable name. Labels are used instead of
addresses, in front of instructions you want to refer to
somewhere else in your program. As we'll see later, labels make
the assembly code readable and relocatable.
• Action is either the mnemonic of a processor instruction (opcode)
or an assembler directive. Directives are different from opcodes:
they are instructions to the assembler, rather than
microprocessor instructions.
• Operands are optional parameters. There may be from zero to
three operands, depending on the opcode. When present, they take
the form of either literals or identifiers for data items.

The following table depicts different possible formats of an assembler
line.


Table 5-2. Different possible formats of an assembler line

            Name       Action     Operands     Comment

Format 1    Label:     Opcode     Operand(s)
Example     LOADREG:   MOV        CX,78        ; Let CX=78

Format 2    Constant   Directive  Operand
Example     NUM2       EQU        18H          ; let NUM2=18H

Format 3    Variable   Directive  Operand
Examples    VAR9       DB         00           ; let VAR9 be a byte
                                               ; variable and fill it with 00
            MSG        DB         "Hello"      ; let MSG be a byte string
                                               ; variable = Hello
            String2    DB         "Hey",0      ; String2 is a zero-terminated
                                               ; byte string variable = Hey
            X          BYTE       1            ; X is a byte whose value = 1
            Y          SBYTE      -2           ; Y is a signed byte whose
                                               ; initial value = -2

The directive DB (or BYTE or SBYTE) is short for define byte, and
MSG is an array of bytes (an ASCII character takes up one byte). Data
can be declared in a number of sizes, such as bytes (DB), words (DW),
double words (DD) and quad words (DQ). Note that "DB" is an older
term, which MASM 6.x and later assemblers supplement with “BYTE” and
“SBYTE”. More details about data types and assembler directives will
come in the following sections.

As for operands, there are three basic types of operands that can be used
in assembly instructions: immediate, memory and register operands.

An IMMEDIATE operand is usually a number, but it can also be a
character literal of the form "A", which is converted by the
assembler to its ASCII equivalent. A literal must fit in the
destination, so an 8-bit register such as AL takes a single character:

MOV AL, "A"      ; Immediate character literal (ASCII 41H)
MOV EDX, 0A1H    ; Immediate number
MOV EDX, A1H     ; Label (variable or constant)

Note that 0A1H is considered a number while A1H is considered a
label, which may be a constant or a variable name. The leading 0 tells
MASM that this is an immediate number.

A MEMORY operand is an address in memory of some form of data.

MOV AL, [ESI]    ; Copy byte whose address is in ESI into AL
MOV EDX, lpVar   ; Copy the variable whose name (address)
                 ; is lpVar into EDX.

A REGISTER operand is a register with a value in it.

MOV ECX, EDX     ; Copy EDX into ECX

The actions that can be performed are determined by the available
opcodes. Trying to move one memory operand into another directly does
not work, because there is no opcode in the processor to do it.

5-5. Assembler Data Types


The recent x86 assembler programs (like MASM and TASM) can
recognize and handle many data types. As we pointed out in chapter 4,
the x86 assembly language can recognize the following fundamental data
types: bytes, words, double words and quad words. In addition, the
recent Intel x86 processors also recognize integers, ordinal numbers, near
pointers, far pointers, strings, bit fields, unpacked BCD and packed
BCD as well as short and long real numbers.

5-6. Assembler Directives


As we mentioned above, assembler programs have directives, which are
different from microprocessor instructions (opcodes). Directives
are assembler instructions that tell the assembler program to do some
jobs. The assembler directives are sometimes called pseudo-codes (or
pseudo-ops). Table 5-3 depicts the main directives of 80x86 assembler
programs.

Example 5-1. Write a template for a minimal assembly program that
demonstrates the use of various MASM macro-assembler directives.

TITLE This is a minimal Assembly Program
PAGE 25, 80
;***********************************************************
DSEG SEGMENT            ; Start of a data segment called DSEG
                        ; All variables will go into this segment
NUM1  EQU 18H           ; NUM1 is a constant = 18H
VAR1  DW 1234H          ; VAR1 is defined as a word
VAR2  DB 00             ; VAR2 is a byte assigned value 00
MSG   DB "HELLO"        ; Variable MSG is assigned a string "HELLO"
TABLE DB 10 DUP(?)      ; TABLE is an array of 10 uninitialized bytes
DSEG ENDS
;***********************************************************
SSEG SEGMENT STACK      ; Start of a stack segment called SSEG
      DW 80 DUP (?)     ; Reserve 80 words for the stack
SSEG ENDS
;***********************************************************
CSEG SEGMENT            ; Start of a code segment called CSEG
      ASSUME CS:CSEG,DS:DSEG,SS:SSEG
START:                  ; START is just a label, start main code here
      :
MAIN PROC FAR           ; This is a procedure (subroutine) called MAIN
      :
      RET               ; Return from MAIN procedure
MAIN ENDP               ; End of subroutine
      :
CSEG ENDS               ; End of code segment
END START               ; End of assembly program, entry point is START

In the above example we see that the assembly program is divided into
three parts, each of which is called a program segment. Segments begin
with the segment name followed by the reserved word SEGMENT and end with
the segment name followed by ENDS. Note that some lines of assembly
modules may contain only assembler directives, instead of
microprocessor instructions. It should also be noted that your style of
writing assembly language programs is almost as important as your accuracy.
Good habits in layout, selection of symbolic names, and appropriate
comments help you to program correctly and easily.


Table 5-3. Summary of the x86 macro assembler directives and pseudo-ops.

Directive                   Description
.286, .386, .486, .586, …   Processor directives
.8087, .80387, .NO87        Coprocessor directives
.CODE                       Start code segment
.DATA                       Start data segment
.EXIT                       Exit to DOS
.MODEL                      Select memory model (small, medium, large, etc.)
.STARTUP                    Indicate start of program, when using many modules
ABS                         Absolute value of operand
ALIGN                       Align to word boundary
ASSUME sr:sy(,...)          Assume segment register name(s)
ASSUME NOTHING              Remove all former assumptions
BYTE                        Byte type operation (=DB)
DB e(,...)                  Define byte(s)
DD e(,...)                  Define double word(s)
DQ                          Define quad word(s)
DT                          Define ten byte(s)
DUP                         Generate duplicate variable or constant
DWORD                       Double word operation (=DD)
DW e(,...)                  Define word(s)
END                         End of program
ENDM                        End of macro
ENDP                        End of procedure
ENDS                        End of segment
EQU                         Assign this as equal
EXT(sr:) sy(t)              External(s) (t=ABS/BYTE/DWORD/FAR/NEAR/WORD)
FAR                         IP and CS registers altered
HIGH                        High-order 8 bits of 16-bit value
IF, ELSE, ENDIF             Conditional pseudo-ops
LABEL t                     Label (t=BYTE/DWORD/FAR/NEAR/WORD)
LENGTH                      Number of basic units
LOW                         Low-order 8 bits of 16-bit value
NEAR                        Only IP register need be altered
OFFSET                      Offset portion of an address
ORG                         Define program starting address (origin)
PAGE n1, n2                 Number of lines per page, maximum number of chars per line
PROC t                      Procedure (t=FAR/NEAR, default NEAR)
PTR                         Override the type of a variable or label
SEG                         Segment portion of an address
SHORT                       One byte for a JMP operation
SIZE                        Number of bytes defined by statement
TITLE                       Title line (header of each page)
TYPE                        Number of bytes in the unit defined
WORD                        Word operation (=DW)


The assembler program can distinguish between opcodes and its
directives (or pseudo-codes) by looking at the first word in the line. If the
first word ends with a colon, it will be interpreted as a label for an
instruction location (address). If it is one of the x86 mnemonics, the line
will be interpreted as a microprocessor instruction. If the first encountered
word is neither a label nor a mnemonic, the assembler program will check
if it is a legal directive or not.

5-7 Declaring Variables in Assembly Language


The MASM 6.x and later assemblers let you declare 1-byte, 2-byte,
4-byte, 6-byte, 8-byte, and 10-byte variables using the DB/BYTE,
DW/WORD, DD/DWORD, DF/FWORD, DQ/QWORD, and DT/TBYTE directives.

The DF/FWORD directive declares 48-bit pointers for use in
32-bit protected mode on the 80386 and later processors. You should only
use this directive for 48-bit far pointers. DQ/QWORD lets
you declare quadword (8-byte) variables. The original purpose of this
directive was to let you create 64-bit double precision floating point
variables and 64-bit integer variables. There are better directives for
creating floating point variables, as described below. The DT/TBYTE
directives allocate 10 bytes of storage.

There are two data types indigenous to the 80x87 coprocessor that use
10 bytes of data: ten-byte BCD values and extended precision (80-bit)
floating point values. As for the floating point type, you can use REAL4,
REAL8 and REAL10 to reserve 4, 8, and 10 bytes. The operand field of
these statements may contain a question mark (if you don't want to
initialize the variable) or an initial value in floating point
form. The following examples demonstrate their use:

I      DB 12            ; I is a byte whose initial value is 12 (decimal)
BS     DB 45,12         ; Store 45 in BS and 12 in the following memory location
CHR    DB 'A'           ; Store ASCII code of 'A' in a variable called CHR
                        ; (CH itself is a register name and cannot be used)
J      BYTE 0FFH        ; J is a byte whose initial value is 0FFH (255 decimal)
K      SBYTE -2         ; K is a signed byte whose initial value is -2
U      WORD ?           ; Reserve space for a word and leave it undefined
INTBIG DD 40            ; Reserve space for a dword and let it = 40
INTNEG SDWORD -1        ; Reserve space for a signed dword and let it = -1
X      REAL4 1.5        ; X is a 32-bit (short real) floating number
Y      REAL8 1.0E-5     ; Y is a double precision floating number
Z      REAL10 -1.94E+3  ; Z is an extended precision floating number


In addition, you can define your own types using the TYPEDEF
directive, in MASM 6 and later assemblers.

CHAR    TYPEDEF BYTE
BOOLEAN TYPEDEF BYTE
FLOAT   TYPEDEF REAL4

If you're writing a big assembly program, you'd rather divide it into several
modules (files). The EXTRN directive is used to tell the assembler that the
symbols following it are defined (declared) in another assembly
module. Also, the directive PUBLIC may be used to tell the assembler that
the symbols following it are to be made available to the other modules.

5-8 Modifiers & Attribute Operators in Assembly


There are several pseudo codes, referred to as attribute operators, that will
appear as we progress in our discussion of assembly language. One of these
operators is the pointer (PTR) operator. We also have the value-returning
operators LENGTH, SIZE, OFFSET, SEG, and TYPE.

PTR operator. One of the purposes of the PTR operator is to specify the
length of a quantity in ambiguous situations. It is written after the desired
type, in front of an operand whose length is otherwise unknown. For
instance, the following instruction:

INC BYTE PTR [BX]

tells the assembler to perform a byte increment operation, i.e., to consider
the content of the address which is pointed at by BX as a byte. Similarly,

INC WORD PTR [BX]

tells the assembler to perform a word increment operation, i.e., to
consider the content of the address which is pointed at by BX as a word.
Note that writing the above instruction in the following simple form:

INC [BX]

is ambiguous, because the assembler doesn't know if the quantity whose
address is pointed at by BX is a byte or a word. However, note that in
other situations, the assembler may infer the size of an operand
from the size of the other operand, such as:

ADD AX,[BX]

Here, the assembler will assume that the content whose address is in BX
is a word (because AX is 16 bits wide) and perform a word addition.

LENGTH Operator. The LENGTH operator retrieves the number of
units (bytes, words or dwords) assigned to a variable. For instance, the
following statement will cause 100 words to be associated with the
variable FEES:

FEES DW 100 DUP(?)

Then, the statement      MOV CX, LENGTH FEES
is equivalent to         MOV CX,100

SIZE Operator. The SIZE operator retrieves the number of bytes
assigned to a variable. For instance, in the above example

the statement            MOV CX, SIZE FEES
is equivalent to         MOV CX,200

OFFSET Operator. The OFFSET operator returns the value of the offset
address (EA) of a variable or a label. For instance, the instruction:

                         MOV EAX, OFFSET Op1
is equivalent to         LEA EAX, Op1

SEG Operator. The SEG operator causes the segment address of a
variable or a label to be inserted as an immediate operand. So, the
following instruction

MOV EAX,SEG Op2

will put the address of the (data) segment containing the variable Op2
inside EAX.

TYPE Operator. The TYPE operator is used primarily with variables and
structures to return the number of bytes in each of their elements. So,
if Array1 is an array of bytes, then

ADD SI,TYPE Array1 is equivalent to ADD SI,1
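
The relation between the three value-returning operators is simply SIZE = LENGTH x TYPE. The following Python sketch is an illustration of that arithmetic only; it is not how an assembler is implemented, and the names TYPE_BYTES and attributes are hypothetical.

```python
# Bytes per unit for the basic data-definition directives.
TYPE_BYTES = {"DB": 1, "DW": 2, "DD": 4, "DQ": 8, "DT": 10}


def attributes(directive, dup_count):
    """Values that TYPE, LENGTH and SIZE would return for 'directive dup_count DUP(?)'."""
    unit = TYPE_BYTES[directive]
    return {"TYPE": unit, "LENGTH": dup_count, "SIZE": unit * dup_count}


fees = attributes("DW", 100)   # models FEES DW 100 DUP(?)
print(fees)
```

For the FEES example above this gives the same constants as the text: 100 units of 2 bytes each, 200 bytes in total.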

Example 5-2:
The following code shows how to display a message using INT 21.


We declare a message in the data segment like this: MSG DB "Hello$".
This means that MSG is a pre-initialized string, which contains the
ASCII characters H, e, l, l, o followed by $. Alternatively, you can
declare this string as follows: MSG DB 'H','e','l','l','o','$'. The
operators OFFSET and SEG tell the assembler that you want the offset
address or the segment address of the message put in the register, not
the contents of the message (Hello) itself. The $ sign is just a
termination character for the message.

MSG DB "Hello$"
MOV DX,OFFSET MSG ; DX contains offset of message MSG
MOV AX,SEG MSG ; AX contains segment of MSG
MOV DS,AX ; DS:DX points to MSG
MOV AH,9 ; DOS function 9 - Display string MSG
INT 21H ; Call DOS service routine.

5-9. Difference between Pointers, Addresses and Values in Assembly


In assembly language it is very important to distinguish between the
ADDRESS of a variable and the VALUE of a variable. The ADDRESS of
a variable is WHERE it is located in memory; the VALUE of a variable
is what is written at that ADDRESS.

The method used in assembler to get the value at an address is a technique
called "dereferencing":

MOV EAX, lpvar    ; Copy address (held in lpvar) into EAX
MOV EAX, [EAX]    ; Dereference it
MOV newvar, EAX   ; Copy EAX into new variable (newvar)

Using square brackets around EAX gives access to the information at the
address in EAX. A register enclosed in square brackets is effectively a
memory operand. The size of the data accessed at the address is determined
by the size of the register used to receive it. In the above example it is a
32-bit value, as it uses a 32-bit register, but it can be done with 16- and
8-bit values as well, using a register of the correct size.
Pointers are a special type of variable, which contain the
addresses of other variables. Pointers are usually used in high level
languages (like C and PASCAL) for passing addresses between subroutines
and performing other types of complex data manipulation.

In assembly language, when we use an instruction like:

LEA EAX, MyVar

we put the ADDRESS of a variable into the EAX register (LEA means
load effective address). When you store that ADDRESS in a variable of its
own, you'll have a Pointer to the original variable:

MOV lpMyVar, EAX

5-10. Arrays in Assembly Language


Arrays are aggregates of similar data. Most programming languages
provide for arrays as primitive data structures. In a linear array, each
record is associated with a single integer called its subscript or index.
The records in a linear array X of n records are customarily denoted:
X(0), X(1),…X(n-1). Frequently, arrays are grouped as 1-dimensional
arrays (vectors) or 2-dimensional arrays (matrices). For instance, a
vector (called MyArray) of 4 byte elements may be declared and
initialized as follows:

MyArray DB 29,14,23,10

This line allocates 4 consecutive bytes in RAM. The address of the first
byte element is MyArray, the address of the second byte is MyArray+1,
and so on. Similarly, in order to declare a 100 byte element vector, whose
initial values are 0 we can make use of the DUP directive as follows:

MyBigArray DB 100 DUP(0)

Example 5-3. Assume you have a 1-dimensional array (vector) of 64 items
that are 32 bits (4 bytes) in size and you want to read the 16th member of
that array.

Solution: Copy the address of the array into the register that you are
using as the base address, copy the zero-based index of the 16th member
(that is, 15) into the register that you are using as the index, and then
read the value of the array member into another register.

MOV ESI, lpArray       ; base address register
MOV ECX, 15            ; index register (zero based index)
MOV EAX, [ESI+ECX*4]

These three lines of code have read the required variable from the array
into the EAX register. If you want to compare the 16th and 17th
members of the array without using an additional register, you can
add the required displacement, so that you only need one extra line of code.


MOV EAX, [ESI+ECX*4]     ; read the 16th member
CMP EAX, [ESI+ECX*4+4]   ; compare it with the 17th member
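
The effective addresses used by these two instructions follow from base + index x size. The following Python sketch only illustrates that arithmetic; the function name element_address and the base address 1000H are assumptions made for the example.

```python
def element_address(base, index, size, displacement=0):
    """Effective address of [base + index*size + displacement]."""
    return base + index * size + displacement


BASE = 0x1000                                   # assumed address of the array
addr_16th = element_address(BASE, 15, 4)        # [ESI+ECX*4]
addr_17th = element_address(BASE, 15, 4, 4)     # [ESI+ECX*4+4]
print(f"{addr_16th:04X} {addr_17th:04X}")
```

The extra displacement of 4 moves the access forward by exactly one 32-bit element without changing the index register.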

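For declaring matrices we can proceed as follows. Suppose we'd like to
declare a matrix of bytes with N columns and M rows, whose name (address)
is My2DArray. If N = 4 and M = 5, then the matrix needs 20 bytes:

My2DArray DB 20 DUP(0)

In order to point to the element in row I and column J (both zero based),
assuming that the matrix is stored row by row and that I, J and N are
defined as constants, compute its offset I*N + J and add it to the base
address, which leaves the address of the element in EAX:

MOV ESI, OFFSET My2DArray   ; base address of the matrix
MOV EAX, I                  ; row index
IMUL EAX, N                 ; EAX = I*N (N = number of columns)
ADD EAX, J                  ; EAX = I*N + J
ADD EAX, ESI                ; EAX = address of element (I, J)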
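
One common way to lay out such a matrix is row by row, so that the element in row I and column J (both zero based) lies I x N + J bytes after the start of the array, where N is the number of columns. The following Python sketch assumes this row-major layout, which the text does not state explicitly; the function name row_major_offset is hypothetical.

```python
def row_major_offset(i, j, n_cols):
    """Byte offset of element (row i, column j) in a row-major matrix of bytes."""
    return i * n_cols + j


N_COLS, M_ROWS = 4, 5          # the 4-column, 5-row example from the text
offsets = [row_major_offset(i, j, N_COLS)
           for i in range(M_ROWS) for j in range(N_COLS)]
print(offsets)
```

Every element gets a distinct offset, and together the offsets cover the 20 bytes reserved by My2DArray exactly once.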

Example 5-3. Draw a schematic representation to show how the variables in
the following assembly instructions are stored in main memory.
Vector1 DB 0,?,?,5
Mat1 DB 2 DUP(0,3,?)
Solution: The address of the first element of each array is just the name
of the array. The two arrays will be stored as shown in figure 5-6.

               Address       Content
High memory    Mat1+5        -----------
               Mat1+4        00000011
               Mat1+3        00000000
               Mat1+2        -----------
               Mat1+1        00000011
               Mat1          00000000
               Vector1+3     00000101
               Vector1+2     -----------
               Vector1+1     -----------
Low memory     Vector1       00000000

Fig. 5-6. Arrangement of data arrays in the main memory.

Note that the starting address of Vector1 will be decided by the assembler
and will be the first available memory place inside the data segment, where
there is a room for 4 consecutive bytes. The second array (Mat1) will
immediately follow, in the next available 6 consecutive bytes.
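
The expansion performed by DUP can be modelled in a few lines. The following Python sketch is an illustration only: the function name dup is hypothetical, and None stands for an uninitialized byte (the "?" operand).

```python
UNINIT = None   # stands for "?", a byte that is reserved but not initialized


def dup(count, *pattern):
    """Expand 'count DUP(pattern)' into the list of bytes it reserves."""
    return list(pattern) * count


vector1 = [0, UNINIT, UNINIT, 5]      # Vector1 DB 0,?,?,5
mat1 = dup(2, 0, 3, UNINIT)           # Mat1 DB 2 DUP(0,3,?)

# Mat1 follows Vector1 immediately, so list the bytes in address order.
for offset, value in enumerate(vector1 + mat1):
    shown = "--------" if value is UNINIT else f"{value:08b}"
    print(f"Vector1+{offset}: {shown}")
```

Offsets 4 through 9 from Vector1 in this listing correspond to Mat1 through Mat1+5 in the figure.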

5-11. Tables & Look-up Tables in Assembly Language


A table is nothing more than an array that is initialized with some data
values that do not change during the execution of the program. A table
can be compared to an array in the same way an integer constant can be
compared to an integer variable. In assembly language, you can use tables
for a variety of purposes: computing functions, controlling program flow,
or simply "looking things up". The assembly language programmers also
use tables to compute complex or otherwise slow functions.

Assembly language programmers tend to compute many values via
lookup tables rather than through the execution of some function. This
has the advantage of being easier, and often more efficient as well. The
following x86 assembly code converts the character variable character from
lower case to upper case if character is in the range 'a'..'z'.

MOV AL, character
CMP AL, 'a'
JB NotLower
CMP AL, 'z'
JA NotLower
AND AL, 05FH       ; Same operation as SUB AL, 32
NotLower:
MOV character, AL

Using a table look up, however, allows you to reduce this sequence of
instructions to just four instructions:

MOV AL, character
LEA BX, CnvrtLower
XLAT
MOV character, AL

CnvrtLower is a 256-byte table which contains the values 0…60H at
indices 0…60H, 41H…5AH at indices 61H…7AH, and 7BH…0FFH at
indices 7BH…0FFH. Often, using this table look up facility will increase
the speed of your code. Tables may be written using the DB and DW
directives. However, one big problem with using a look-up table is creating
the table in the first place, particularly if the table has a large number
of entries.
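
Creating such a table by hand is tedious, so it is usually generated by a small program. The following Python sketch is one possible way to do it (the names build_cnvrt_lower and xlat are hypothetical); it builds the 256 entries described above and models XLAT as a plain table index.

```python
def build_cnvrt_lower():
    """Build the 256-byte table: identity everywhere, except 'a'..'z' map to 'A'..'Z'."""
    table = bytearray(range(256))
    for code in range(ord("a"), ord("z") + 1):
        table[code] = code & 0x5F          # same effect as AND AL,05FH
    return bytes(table)


def xlat(table, al):
    """Model XLAT: replace AL by the table entry at index AL."""
    return table[al & 0xFF]


CNVRT_LOWER = build_cnvrt_lower()
print(chr(xlat(CNVRT_LOWER, ord("q"))))
# Emit the table as DB lines, 16 entries per line, ready to paste into a data segment.
for row in range(0, 256, 16):
    print("DB " + ",".join(f"0{b:02X}H" for b in CNVRT_LOWER[row:row + 16]))
```

The generated DB lines can be placed after a label such as CnvrtLower in the data segment.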

5-12. Other Data Structures in Assembly Language


In computer programs, we need to organize data for efficient storage and
retrieval. Therefore, one of the important issues in the development of a
software program is the selection of appropriate data structures and their
implementation.

Note 5-3. Data Structures.


Technically, a data structure is a collection of data records along with a mechanism
for insertion, deletion and retrieval of records. Each record consists of several fields,
including the information fields that contain the actual data. Other fields of a record
may contain links, which hold addresses of other records.

In the previous sections of this chapter we described the fundamental
types of data structures, like arrays, tables and look-up tables. We have
also described the operation of stacks, which are last-in first-out (LIFO)
data structures, in chapter 3. In the following section we describe other
useful data structures, such as queues, linked lists, binary trees and
hash tables and their implementations in x86 assembly programs.

5-12.1. Queues
A queue is a list of records in which records are inserted at one end of the
list (tail of the list), and records are extracted and deleted from the other
end (head of the list). Thus, a queue has a First-In-First-Out (FIFO)
structure: records are removed from the list in the same order as they
arrive. An insertion of a record is said to en-queue it; similarly, deletion
de-queues a record. Note that a queue is different from a stack, which has a
Last-In-First-Out (LIFO) structure, such that data is added (pushed) or
deleted (popped) at one end (top of the stack).
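
A queue is often kept in a fixed array that is used circularly, so that the head and tail indices wrap around at the end of the array. The following Python sketch assumes this circular-buffer arrangement, which is one common choice and is not prescribed by the text; the class name RingQueue is hypothetical.

```python
class RingQueue:
    """FIFO queue stored in a fixed-size array that is used circularly."""

    def __init__(self, capacity):
        self.slots = [None] * capacity
        self.head = 0      # index of the oldest record
        self.count = 0     # number of records currently queued

    def enqueue(self, record):
        if self.count == len(self.slots):
            raise OverflowError("queue is full")
        tail = (self.head + self.count) % len(self.slots)
        self.slots[tail] = record
        self.count += 1

    def dequeue(self):
        if self.count == 0:
            raise IndexError("queue is empty")
        record = self.slots[self.head]
        self.head = (self.head + 1) % len(self.slots)
        self.count -= 1
        return record


queue = RingQueue(3)
for name in ("first", "second", "third"):
    queue.enqueue(name)
print(queue.dequeue())     # the record that arrived first leaves first
```

The wrap-around of the indices is what lets the same three slots be reused indefinitely.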

5-12.2. Linked Lists


Sometimes we wish to maintain a list of records sorted according to the
value of an information field. We may also wish to insert a new record at
a certain point in the list or to delete some record from the list. What data
structure should we use then? Stacks and queues are not suitable because
they allow records to be inserted or deleted only at the ends of the list
(LIFO or FIFO). Inserting a new record into the middle of an array
requires tedious relocation of all the records that follow it. A good
solution to this problem is the linked list, which permits lists to be
modified easily.

Fig. 5-7(a). Illustration of a linked list data structure.


In a linked list, each record contains a link field which holds the address
of the next record in the list. The sequencing from one record of the list to
the next thus involves accessing the link field of each record. Therefore,
insertions and deletions of records involve only resetting of links. As
records may be located anywhere in memory, linked lists are appropriate
whenever dynamic allocation is needed. However, linked lists are not
needed for storage of static data like tables of constants. In order to
understand the idea of a linked list, consider the following list of names:
Ahmad at offset a, Badr at b, Camel at c, and Darsh at d. Each cell now
has 2 fields: info and link:

Address   Info    Link
a:        Ahmad   b
b:        Badr    c
c:        Camel   d
d:        Darsh   00

The link field in the last record, Darsh, has a special value `00' to mark
the end of the list. We draw this list with arrows as follows:

Fig. 5-7(b). Example of a linked list data structure.

To delete the record Badr, change the link field in the record Ahmad
from b to c. The list now has only three records: Ahmad, Camel and Darsh.

To insert a new record Danny at address g between Camel and Darsh, set
the link of Camel to the address of Danny (g) and the link of Danny to the
address of Darsh (d).


In memory, each link field holds a one-word offset in the data segment.
Thus, if the information fields occupy b bytes, then the length of each
record is b+2 bytes. Assume that the link field is located b bytes from the
beginning of the record; let us define the constant LINK:

LINK EQU b

Suppose BX holds the offset of (the first byte of) a record in a linked list.
Then [BX+LINK] specifies the link field of this record. To change BX to
point to the next record in the list:

MOV BX, [BX+LINK]

To insert a record whose offset is in SI immediately following the record
whose offset is in BX, we proceed as follows:

MOV AX, [BX+LINK]   ; Copy link
MOV [BX+LINK], SI
MOV [SI+LINK], AX
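
The link manipulations above can be tried out on a small model of memory. In the following Python sketch, memory is a dictionary that maps a record's address to its info and link fields; the numeric addresses (10H, 20H, ...) and all function names are made up for the illustration. The comments show which instruction each line corresponds to.

```python
END_OF_LIST = 0x00   # special link value that marks the last record


def make_list():
    """Records from the text as address -> [info, link]; the addresses are made up."""
    a, b, c, d = 0x10, 0x20, 0x30, 0x40
    memory = {a: ["Ahmad", b], b: ["Badr", c],
              c: ["Camel", d], d: ["Darsh", END_OF_LIST]}
    return memory, a


def walk(memory, head):
    """Follow the links from the head and collect the info fields."""
    names, bx = [], head
    while bx != END_OF_LIST:
        info, link = memory[bx]
        names.append(info)
        bx = link                     # MOV BX,[BX+LINK]
    return names


def insert_after(memory, bx, si):
    """Insert the record at address si right after the record at address bx."""
    ax = memory[bx][1]                # MOV AX,[BX+LINK]
    memory[bx][1] = si                # MOV [BX+LINK],SI
    memory[si][1] = ax                # MOV [SI+LINK],AX


def delete_after(memory, bx):
    """Unlink the record that follows the record at address bx."""
    victim = memory[bx][1]
    memory[bx][1] = memory[victim][1]


memory, head = make_list()
delete_after(memory, 0x10)                 # remove Badr, the record after Ahmad
memory[0x70] = ["Danny", END_OF_LIST]      # new record at a made-up address g = 70H
insert_after(memory, 0x30, 0x70)           # link it in after Camel
print(walk(memory, head))
```

Neither operation moves any record in memory; only link fields change.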


The use of linked lists arose in the mid-1950s in the course of artificial
intelligence research. The linked list is a fundamental data structure of the
LISP language, which is heavily used for artificial intelligence
programming. Many variations on the idea of linked cells have been
subsequently introduced. For example, a doubly linked list has both
forward and backward links to facilitate searching in the list. Binary
trees also use more than one link per record.


Fig. 5-7(c). Illustration of a double linked list data structure.

5-12.3. Hash Tables


In their quest for ever faster searching methods, engineers and scientists
discovered hashing in the 1950's. The idea is to transform the information
itself into a subscript in a linear array.

Let the array that holds the data be T(0), ..., T(n-1). The array T is called
a hash table. A hash function h transforms the information x into an
integer h(x) such that: . The information x is then stored
at T(h(x)), together with any additional information fields associated with
x. If the record T(h(x)) is already in use, then a collision occurs, and x
must be stored elsewhere. A good hashing scheme minimizes the
frequency of collisions by scattering information into random locations in
the hash table. The choice of hash functions and the resolution of
collisions are discussed below.

Hash Functions: Perhaps the simplest hash function maps x, interpreted
as a positive integer, to its remainder when divided by n:
h(x) = x mod n. If x consists of more than one word, then one can
compress x into one word by forming the exclusive-or of the words x1, ...,
xk that constitute x: x1 XOR x2 XOR ... XOR xk.
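The remainder hash and the exclusive-or folding described above can be sketched in C as follows. This is a minimal illustration, assuming 16-bit words; the function names fold_xor and hash_mod are hypothetical and are not taken from the text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Compress a key of k 16-bit words into one word by XOR-ing them. */
uint16_t fold_xor(const uint16_t *words, size_t k)
{
    uint16_t x = 0;
    for (size_t i = 0; i < k; i++)
        x ^= words[i];
    return x;
}

/* Remainder hash: maps x to a table subscript in the range 0 .. n-1. */
unsigned hash_mod(uint16_t x, unsigned n)
{
    assert(n > 0);          /* the table must have at least one entry */
    return x % n;
}
```

A multi-word key would first be folded with fold_xor and the result passed to hash_mod, with n equal to the number of entries in the hash table T.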

5-12.4. Binary Trees


To locate a record with a particular information field in a linked list, one
starts at the head of the list and traces through successive records, one at a
time. This process can take a long time, even with sorted lists. To reduce
the time, we must avoid inspecting many of the records. The binary tree data
structure permits faster searches than linked lists, but requires more space
because it has more link fields in each record.

In a binary tree, each record is stored in a node. For each node X, at most
one node Y is the left child of X and at most one node Z is the right child
of X. In other words, any node X may have 0, 1, or 2 children. X is the
common parent of nodes Y and Z. There is one node in the tree with no
parent; it is called the root of the tree.

Fig. 5-8. Illustration of the binary tree data structure.

The above figure depicts a binary tree, where each record has a link
pointing to its left child and a link pointing to its right child. For instance,
Badr is the left child of Darsh, and Lola is the right child of Darsh. Darsh
is the parent of both Badr and Lola. Ahmad is the root of the tree.
Some applications of the binary tree also include a pointer from
each node to its parent.

Notice the arrangement of names in this tree. All names in the left subtree
of any node are lexicographically less than the name at that node; that is,
they would occur earlier in an alphabetic sort. All names in the right
subtree are lexicographically greater. For instance, the left subtree of
Frank is the tree rooted at David, and all names in this subtree are
lexicographically less than Frank. Thus the location of a record in the tree
expresses its relationship to other records.

This arrangement of names permits rapid insertion of new records.


Starting at the root of the tree, compare the new name with the name at
the current node. If the new name is "smaller," then proceed to the left
child; if the new name is "larger," then proceed to the right child. For
example, to insert Harry into the tree, inspect Frank, Janet, and Garth in
that order and then make Harry the right child of Garth. In similar fashion
one can search the tree for a specific name.

If a tree with n records is well balanced, then the maximum number of
records inspected during an insertion is approximately log2(n). In contrast,
for linear lists this maximum number would be n. The references describe
techniques for keeping trees well balanced after both insertions and
deletions.
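The insertion procedure described above can be sketched in C. The sketch assumes that names are compared with strcmp; the type Node and the function tree_insert are illustrative names, not part of the text. The function returns the number of records inspected, so the Harry example (Frank, Janet, Garth) can be checked.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One tree node: a name plus links to the left and right children. */
typedef struct Node {
    const char *name;
    struct Node *left;
    struct Node *right;
} Node;

/* Insert `node` into the tree rooted at *root.
   Returns the number of existing records inspected on the way down. */
int tree_insert(Node **root, Node *node)
{
    assert(root != NULL && node != NULL);
    int inspected = 0;
    node->left = NULL;
    node->right = NULL;
    while (*root != NULL) {
        inspected++;
        if (strcmp(node->name, (*root)->name) < 0)
            root = &(*root)->left;    /* "smaller": go to the left child  */
        else
            root = &(*root)->right;   /* "larger": go to the right child  */
    }
    *root = node;                     /* empty link found: attach here */
    return inspected;
}
```

Inserting Frank, Janet and Garth in that order and then Harry inspects three records and makes Harry the right child of Garth. No rebalancing is attempted in this sketch.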


5-13. Working with Strings in Assembly Programs


There are many forms of string data used in 32-bit and 64-bit operating
systems. The normal data type for zero-terminated strings is BYTE data,
and a zero-terminated string can be addressed as a BYTE array in
memory. Working with binary data requires that you keep track of the
length of the BYTE data, and this is usually done in a separate variable.

The simplest approach is to use the built-in string instructions. In 32-bit
code, the string instructions use ESI as the source index, EDI as the
destination index and ECX as the loop counter. In this pre-built loop
technique, the string instructions are used with the prefix REP or
REPE/REPNE, which repeats the action of the string instruction until a
condition is met.

Example 5-4.
Write a string-copy subroutine, similar to the C-language
function strncpy(dest, src, len), where src and dest are the addresses of
the source and destination strings and len is the number of characters to
be copied.

Solution. The following example shows how this is done. Here src is the
address of the source buffer to copy, dest is the address of the
destination buffer, and len is the byte count to copy.

CLD ; Clear direction flag (DF=0): copy forward
MOV ESI, src ; Put address into the source index
MOV EDI, dest ; Put address into the destination index
MOV ECX, len ; Put the number of bytes to copy in ECX
REP MOVSB ; Repeat copying bytes from [ESI] to [EDI] until ECX = 0

In this example, MOVSB copies each byte from the address in ESI to the
address in EDI and advances both indexes, while the REP prefix
decrements ECX. The exit condition for the REP prefix is when ECX is
decremented to zero. It is assumed that the destination buffer (dest) is
large enough to receive the byte count copied from the source (src). When
you copy a zero-terminated string, you can write an algorithm that copies
until it finds an ASCII zero.

Example 5-5.
Write a string-copy subroutine, equivalent to the C-language
function strcpy(dest, src), where src and dest are the addresses of the
source and destination strings. The source string src is assumed to be
zero-terminated.


Solution.
The following algorithm shows how this is done.
CLD ; Clear direction flag (DF=0) to read forward
MOV ESI, src ; Put source string address into the source index
MOV EDI, dest ; Put destination string address into the destination index
BACK :
LODSB ; Load byte from source into AL and inc ESI
STOSB ; Write AL to dest and inc EDI
CMP AL, 0 ; See if the byte is an ASCII zero
JNE BACK ; Read the next byte if it is not

A trick that will make this algorithm run faster is to directly move each
byte from the source address (src) to AL and then from AL to the
destination address (dest). On Pentium and later processors, it is faster to
use MOV/INC than LODSB or STOSB. This is done by "dereferencing"
both ESI and EDI so that they function as memory addresses as shown in
the following example:

Example 5-6:
Show how to implement the above strcpy(dest, src) subroutine using
MOV and INC instead of LODS and STOS.

Solution:
The following algorithm shows how this is done.

MOV ESI, src ; Put source address into the source index
MOV EDI, dest ; Put destination address into the destination index
BACK :
MOV AL, [ESI] ; Copy byte at address in ESI to AL
INC ESI ; Increment address in ESI
MOV [EDI], AL ; Copy byte in AL to address in EDI
INC EDI ; Increment address in EDI
CMP AL, 0 ; See if the byte is an ASCII zero
JNE BACK ; Jump back and read next byte if not

It should be noted that the direction flag (DF) does not affect this method,
and you can use any 32-bit registers when you are not using the string
instructions. This code is longer, but faster on recent processors with
two pipelines, due to what is called pairing.

When instructions can go through the two pipelines in pairs, the code runs
nominally twice as fast. This simple algorithm is built from small
instructions that can be paired, so it runs faster than the shorter
algorithm that uses the older string instructions.

5-14. Procedures in Assembly Language


A procedure is a subroutine, which consists of a set of instructions that
compute some value or take some action (such as printing or reading a
character value). As stated in Chapter 4, assembly language
implements procedures using the CALL/RET mechanism.
For example, the following subroutine zeroes a 256-byte block, whose
address is held in SI:

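Init: XOR AX, AX ; AX = 0
MOV CX, 128 ; 128 words = 256 bytes
ZLOOP: MOV [SI], AX ; Clear one word of the block
ADD SI, 2 ; Advance to the next word
LOOP ZLOOP ; Repeat until CX = 0
RET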

By loading the SI register with the address of some block of 256
bytes and issuing a CALL Init instruction, you can zero out the
specified block. However, in a macro-assembler environment, you do not
usually define your own procedures in this manner. Instead, you should use
the MASM PROC and ENDP assembler directives as follows:

Init PROC
XOR AX, AX
MOV CX, 128
ZLOOP: MOV [SI], AX
ADD SI, 2
LOOP ZLOOP
RET
Init ENDP

The x86 microprocessors support NEAR and FAR subroutine calls.
NEAR calls and returns transfer control between procedures in the same
code segment. FAR calls and returns pass control between different
segments. The PROC directive has an optional operand that is either NEAR
or FAR. If the operand field is empty, then NEAR is assumed.

5-15. Functions in Assembly Language


There is no syntactic difference between functions and procedures in
assembly language. However, the purpose for a function is to return some
explicit value while the purpose for a procedure is to execute some action.
To declare a function in assembly language, use the PROC/ENDP
directives, just like procedures. So, all the rules and techniques that apply to
procedures apply to functions.

5-16. Writing & Initializing ISR's in Assembly Language


The interrupt service routines (ISR's) are written like any other assembly
procedure except that they return with an IRET instruction rather than
RET. However, hardware ISR's have a very special restriction: they must
preserve the state of the microprocessor. In particular, the ISR must
preserve all registers that it modifies. For instance, consider the following
simple ISR:

NaiveISR PROC FAR
MOV AX, 0
IRET
NaiveISR ENDP

This ISR obviously does not preserve the machine state. Suppose you
were executing the following code segment when a hardware interrupt
transferred control to the above ISR:

MOV AX, 5
ADD AX, 2
INT nn ; Suppose the interrupt (that calls NaiveISR) occurs here.
:
PRINT

The interrupt service routine would set the AX register to zero and your
program would print zero rather than the value seven. Worse yet, hardware
interrupts are generally asynchronous, meaning they can occur at any
time, and they rarely occur at the same spot in a program. Therefore,
the code sequence above would print seven most of the time; once in a
great while it might print zero or two (it will print two if the interrupt
occurs between the MOV AX,5 and ADD AX,2 instructions). Bugs in
hardware interrupt service routines are very difficult to find, because such
bugs often affect the execution of unrelated code.

The solution to this problem, of course, is to make sure you preserve all
registers you use in the interrupt service routine for hardware interrupts
and exceptions. Finally, it should be noted that writing an ISR is only
the first step in implementing an interrupt handler. You must also
initialize the interrupt vector table entry with the address of your ISR.
There are two ways to accomplish this: directly store the address in the
interrupt vector table, or use a DOS call and let it do this task for you.


Storing the address directly is an easy job. All you need to do is to load
a segment register (ES in the code below) with zero (since the interrupt
vector table is situated in segment zero) and store the four-byte address at
the appropriate offset within that segment. The following code sequence
initializes the entry for interrupt 255 with the address of the interrupt
routine NaiveISR presented above:

MOV AX, 0
MOV ES, AX
PUSHF
CLI
MOV WORD PTR ES:[0FFH*4], OFFSET NaiveISR
MOV WORD PTR ES:[0FFH*4 + 2], SEG NaiveISR
POPF

This code turns off the interrupts while changing the interrupt vector
table. This is important if you are patching a hardware interrupt vector,
because the interrupt must not occur between the last two MOV
instructions above; at that point the interrupt vector is in an inconsistent
state, and invoking the interrupt at that point would transfer control to the
offset of NaiveISR and the segment of the previous interrupt 0FFH
handler. This, of course, would be a disaster. Perhaps a better way to
initialize an interrupt vector is to use the DOS Set Interrupt Vector call.
Calling DOS with AH equal to 25H provides this function. This call
expects an interrupt number in the AL register and the address of the
interrupt service routine in DS:DX. The call to MS-DOS that would
accomplish the same thing as the code above is:

MOV AX, 25FFH ; AH=25H, AL=0FFH
MOV DX, SEG NaiveISR ; Load DS:DX with the
MOV DS, DX ; address of the ISR
LEA DX, NaiveISR
INT 21H ; Call DOS
MOV AX, DSEG ; Restore DS so that it
MOV DS, AX ; points back at DSEG

Although this code is a little bit longer than writing the data directly into
the interrupt vector table, it is safer. Many programs monitor changes
made to the interrupt vector table through DOS. If you call DOS to
change an interrupt vector table entry, those programs will become aware
of your changes.
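The layout written by the direct-store sequence earlier in this section can be modelled in C. This is only a sketch under stated assumptions: the real-mode interrupt vector table is represented by an ordinary 1024-byte array, and the function name set_vector is hypothetical. Each entry occupies four bytes at offset 4 times the interrupt number, with the offset word first and the segment word second, both stored low byte first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store the far address seg:off into entry `vec` of a simulated
   interrupt vector table (256 entries of 4 bytes each). */
void set_vector(uint8_t *ivt, unsigned vec, uint16_t seg, uint16_t off)
{
    assert(ivt != NULL && vec < 256);
    unsigned base = vec * 4;                 /* e.g. 0FFH*4 = 3FCH        */
    ivt[base + 0] = (uint8_t)(off & 0xFF);   /* offset word, low byte     */
    ivt[base + 1] = (uint8_t)(off >> 8);     /* offset word, high byte    */
    ivt[base + 2] = (uint8_t)(seg & 0xFF);   /* segment word, low byte    */
    ivt[base + 3] = (uint8_t)(seg >> 8);     /* segment word, high byte   */
}
```

The model makes the hazard visible: after the first two bytes are written, the entry holds the new offset with the old segment, which is why the assembly version disables interrupts around the two stores.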

5-17. Creating Macros in Assembly Language


Macros are named instruction sequences, which can be invoked at any point
of the assembly program without a CALL instruction; the assembler expands
the macro body in line at each invocation. Macros are therefore faster than
subroutine calls, because they do not need CALL and RET instructions.
Macro sequences can be easily created using the assembler directives
"MACRO" at the start and "ENDM" at the end. Although there is a
large class of procedures that are totally self-contained, most procedures
require some input data and return some data to the caller.

Parameters are values that you pass to and from a procedure. Pass by
name is the parameter passing mechanism used by macros.
For instance, consider the following MASM macro:

Add12 MACRO Parameter1, Parameter2
MOV AX, Parameter1
ADD AX, Parameter2
ENDM

If you invoked the Add12 macro in the form: Add12 BX, CX, then
MASM emits the following code, substituting BX for Parameter1 and CX
for Parameter2:
MOV AX, BX
ADD AX, CX

Example 5-7: The following example depicts the macro (COPYSTR)
which copies a string of N bytes from its memory address (STR1) to the
address of another string (STR2).

COPYSTR MACRO STR1,STR2,N
CLD ; Clear direction flag: copy forward
MOV CX,N ; Move number of bytes to copy to CX
LEA SI,STR1 ; Move address of STR1 to SI
LEA DI,STR2 ; Move address of STR2 to DI
REP MOVSB ; Copy N bytes from DS:[SI] to ES:[DI]
ENDM

You can place the COPYSTR macro at the beginning of your assembly
program and then invoke it as follows:

STRING1 DB "BLOCK1"
STRING2 DB "BLOCK2"
:
COPYSTR STRING1,STRING2,6
:

Note that the COPYSTR macro is invoked in the main program, with its
name followed by the actual parameters (STRING1, STRING2 and 6, the
length of each string in bytes).

Example 5-8:
The following example demonstrates the creation and use of an assembler
macro (MOVE), which moves data from a location (B) to another (A),
and how it can be invoked from within an assembly program:

MOVE MACRO A,B
PUSH AX ; Save AX
MOV AX,B ; Load the source word
MOV A,AX ; Store it in the destination
POP AX ; Restore AX
ENDM
:
VAR1 DW 0100H
VAR2 DW 0FFFH
:
MOVE VAR1,VAR2
:
END

Example 5-9:
The following example demonstrates the creation and use of an assembler
macro (PRINT), to print out the IBM PC system time on the screen in
ASCII characters:
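PRINT MACRO PARM
PUSHA ; Save all registers (80186 and later)
MOV AL,PARM ; Binary value (0..99) to print
AAM ; AH = tens digit, AL = units digit
ADD AX,3030H ; Convert both digits to ASCII
MOV BX,AX ; Keep the digits (DOS changes AL)
MOV DL,BH ; Print the tens digit
MOV AH,02H
INT 21H
MOV DL,BL ; Print the units digit
MOV AH,02H
INT 21H
POPA ; Restore all registers
ENDM
:
:
MOV AH,2CH ; DOS function 2CH: get system time
INT 21H ; CH=hours, CL=minutes, DH=seconds, DL=1/100 s
PRINT CH
PRINT CL
PRINT DH
PRINT DL
END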

5-18. Assembly Program Compilation & Linking


The programs written in assembly language need a sort of translation
(compilation), in order to be encoded into binary form and executed on a
specific computer. The translation programs, which are used to transform
assembly programs into machine code, are called assemblers. The
resulting machine code (called object code) should be then linked in order
to be ready for execution, on a specific machine, with a specific operating
system. Figure 5-9 depicts the different steps, which are needed to
translate an assembly program file (*.ASM) into executable file (*.EXE).

In addition to translation, assemblers allow the programmer to write
proper modular code, which becomes a necessity as a
project becomes larger. The programmer has the freedom to write code
ranging from a self-imposed structured approach to unrestrained
freestyle code, each having its respective advantages. Modular coding
has the advantage of easy organization and debugging, particularly with
larger projects, and freestyle code has its advantages in loop optimization.
Macro-assemblers translate code from top to bottom. Translation from
assembly code to machine code is usually performed in two passes:
- Pass one (no machine code created): the symbol table, a list
of all labels and their addresses, is created.
- Pass two (machine code created): the symbol table is used to resolve
all forward references (FRs).
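The two passes described above can be sketched with a toy model in C. Everything here is an illustrative assumption: a source line is reduced to an optional label, the size of its machine code in bytes and an optional label operand, and the names Line, Symbol, pass_one and pass_two are hypothetical. Real assemblers do considerably more work in each pass.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SYMS 16

/* One toy source line. */
typedef struct {
    const char *label;    /* NULL if the line defines no label          */
    int size;             /* bytes of machine code the line would emit  */
    const char *operand;  /* NULL if the line references no label       */
} Line;

typedef struct { const char *name; int addr; } Symbol;

/* Pass one: emit nothing, only record the address of every label.
   Returns the number of symbols stored in tab. */
int pass_one(const Line *src, int nlines, Symbol *tab)
{
    int addr = 0, nsyms = 0;
    for (int i = 0; i < nlines; i++) {
        if (src[i].label != NULL) {
            assert(nsyms < MAX_SYMS);
            tab[nsyms].name = src[i].label;
            tab[nsyms].addr = addr;
            nsyms++;
        }
        addr += src[i].size;          /* advance the location counter */
    }
    return nsyms;
}

/* Pass two: resolve each label operand through the symbol table.
   resolved[i] is the target address, or -1 when the line has no
   operand or the label is undefined. */
void pass_two(const Line *src, int nlines,
              const Symbol *tab, int nsyms, int *resolved)
{
    for (int i = 0; i < nlines; i++) {
        resolved[i] = -1;
        if (src[i].operand == NULL)
            continue;
        for (int s = 0; s < nsyms; s++)
            if (strcmp(tab[s].name, src[i].operand) == 0)
                resolved[i] = tab[s].addr;
    }
}
```

A forward reference such as a jump to a label defined further down cannot be resolved in a single pass, because the address of the label is not yet known when the jump is read; after pass one, every label is in the table and pass two can resolve it.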

Fig. 5-9. Flowchart of assembly program compilation and linking: the
assembler (pass 1 and pass 2) translates the assembly program (*.ASM) into
object code (*.OBJ) and a listing file (*.LST); the linker combines the
object code with libraries (*.LIB) to produce the executable code (*.EXE).


5-19. 16-Bit Macro Assemblers


The 16-bit assemblers (like the Microsoft Macro Assembler MASM16 or
the Borland TASM) are dedicated to the 8086/80186/80286 processors, but
can also work with higher processors under DOS. The paths to the library
files are specified in the source code, and the path of each binary file is
specified in the batch files that are used by the editor to drive the build
options. The 16-bit assemblers, which work with 64-kB memory segments,
provide different memory models, namely:

1- The TINY memory model,
2- The SMALL memory model,
3- The MEDIUM memory model,
4- The COMPACT memory model,
5- The LARGE memory model,
6- The HUGE memory model.

The TINY memory model mimics the 8080: all segments fit into one
segment of 64kB. In the SMALL memory model there is a separate code
segment, and all other segments (data and stack) fit into one segment.

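Fig. 5-10. Memory models of 16-bit assemblers. TINY: CS, DS, SS and ES
share one segment holding code, data and stack. SMALL: CS addresses the
code segment, while DS, SS and ES address one segment holding data and
stack. LARGE: separate code, data and stack segments.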

Other memory models permit the writing of larger programs with multiple
code segments. The LARGE model allows multiple code segments and
multiple data segments. The HUGE model is the same as the LARGE model, but
allows individual data arrays larger than 64 kB, for instance for large
double-precision tables. An illustration of these memory models is shown in
figure 5-10.

The following commands assemble and link assembly programs under DOS:

MASM16 synopsis:
ml [-o outfile] infile.asm
Turbo Assembler synopsis:
tasm [-o outfile] infile.asm
tlink outfile [/t]

where infile is the assembly source filename and outfile is the object
filename. The /t switch makes a COM file. This will only work if the
memory model is declared as tiny in the source file. If you have an
assembler other than MASM16 or Turbo Assembler (TASM), then refer to
its instruction manual.

5-20. MASM Syntax for x86 Memory Addressing Modes


The Microsoft macro assembler MASM uses several different variations
to denote indexed, based/indexed, and displacement plus based/indexed
addressing modes. You will see all of these forms used interchangeably
throughout this book. The following list depicts some of the possible
combinations that are legal for the various x86 addressing modes:

disp[BX], [BX][disp], [BX+disp], [disp][BX], [disp+BX],
[BX][SI], [BX+SI], [SI][BX], [SI+BX],
disp[BX][SI], disp[BX+SI], [disp+BX+SI], [disp+BX][SI], disp[SI][BX],
[disp+SI][BX], [disp+SI+BX], [SI+disp+BX], [BX+disp+SI], etc.

Note that MASM treats the "[ ]" symbols just like the "+" operator. This
operator is commutative, just like the "+" operator. Of course, this
discussion applies to all the 80x86 addressing modes, not just those
involving BX and SI. You may substitute any legal registers in all the
above addressing modes. The effective address (EA) is the final offset
produced by an addressing mode computation. For example, if BX
contains 10H, the effective address for 10H[BX] is 20H.
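The effective-address computation can be sketched in C as a 16-bit sum of the displacement and the optional base and index registers. The types Regs, Base and Index and the function effective_address are illustrative names, not MASM or processor terms; the sketch also assumes that the sum wraps around at 64 kB, since 16-bit offsets cannot exceed FFFFH.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { BASE_NONE, BASE_BX, BASE_BP } Base;
typedef enum { INDEX_NONE, INDEX_SI, INDEX_DI } Index;

/* The four registers that may take part in a 16-bit addressing mode. */
typedef struct { uint16_t bx, bp, si, di; } Regs;

/* EA = disp + (optional base) + (optional index), truncated to 16 bits. */
uint16_t effective_address(const Regs *r, uint16_t disp,
                           Base base, Index index)
{
    assert(r != NULL);
    uint32_t ea = disp;
    if (base == BASE_BX)   ea += r->bx;
    if (base == BASE_BP)   ea += r->bp;
    if (index == INDEX_SI) ea += r->si;
    if (index == INDEX_DI) ea += r->di;
    return (uint16_t)(ea & 0xFFFFu);   /* wrap within the 64 kB segment */
}
```

With BX = 10H, the mode 10H[BX] gives an effective address of 20H, as in the example above. Because addition is commutative, every spelling of the same mode yields the same result.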


NOTE: Fast Method to Memorize 80x86 Addressing Modes

There are a total of 17 different legal memory addressing modes on the
80x86 processors: disp, [BX], [BP], [SI], [DI], disp[BX], disp[BP],
disp[SI], disp[DI], [BX][SI], [BX][DI], [BP][SI], [BP][DI],
disp[BX][SI], disp[BX][DI], disp[BP][SI], and disp[BP][DI].

You can memorize all these forms so that you know which are valid
(and, by omission, which forms are invalid). However, there is an easier
way besides memorizing these 17 forms. Consider the following chart:
[BX] [SI]
DISP ------ -----
[BP] [DI]

If you choose zero or one item from each of the columns and wind up
with at least one item, you have a valid 80x86 memory addressing
mode. For instance, choosing disp from column one, nothing from column
two and [DI] from column three gives disp[DI].
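The chart rule can be checked by enumeration. In the following C sketch (the function names are hypothetical), column one offers two choices (nothing or disp), column two offers three (nothing, [BX] or [BP]) and column three offers three (nothing, [SI] or [DI]). That gives 2 x 3 x 3 = 18 combinations, of which only the all-empty one is invalid, leaving 17.

```c
#include <assert.h>
#include <stdbool.h>

/* base:  0 = none, 1 = [BX], 2 = [BP]
   index: 0 = none, 1 = [SI], 2 = [DI]
   A combination is valid when at least one column is used. */
bool is_valid_mode(bool has_disp, int base, int index)
{
    assert(base >= 0 && base <= 2 && index >= 0 && index <= 2);
    return has_disp || base != 0 || index != 0;
}

/* Count every valid combination of the three chart columns. */
int count_valid_modes(void)
{
    int count = 0;
    for (int d = 0; d <= 1; d++)
        for (int b = 0; b <= 2; b++)
            for (int x = 0; x <= 2; x++)
                if (is_valid_mode(d != 0, b, x))
                    count++;
    return count;
}
```

The count agrees with the 17 modes listed in this note. Forms such as [BX][BP] or [SI][DI] never arise, because each column contributes at most one item.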

Example 5-10.
The following program demonstrates how to write to the screen using the
file function 40H of interrupt 21H. The program makes use of the small
memory model, in which the code occupies one segment and the data and
stack share another segment.

TITLE Example10.asm
.MODEL SMALL
.STACK
.CODE
MOV AX,@DATA ; SETUP DS AS DATA SEGMENT
MOV DS,AX
MOV AH,40H ; FUNCTION 40H - WRITE FILE
MOV BX,1 ; HANDLE = 1 (SCREEN)
MOV CX,14 ; LENGTH OF STRING (14 CHARACTERS)
MOV DX,OFFSET TEXT ; DS:DX POINTS TO STRING
INT 21H ; CALL DOS SERVICE ROUTINE
MOV AX,4C00H ; TERMINATE PROGRAM
INT 21H
.DATA
TEXT DB "THIS IS A TEXT"
END


Example 5-11.
The next program shows how to set up and call function 13H of interrupt
10H - write string. This has the advantages of being able to write a string
anywhere on the screen in a specified color but it is hard to set up. The
program also makes use of the small memory model.

TITLE Example11.ASM
.MODEL SMALL
.STACK
.CODE
MOV AX,@DATA ; GET THE ADDRESS OF THE DATA SEGMENT
MOV ES,AX ; PUT THIS IN ES
MOV BP,OFFSET TEXT ; ES:BP POINTS TO MESSAGE
MOV AH,13H ; FUNCTION 13H - WRITE STRING
MOV AL,01H ; ATTRIBUTE IN BL, MOVE CURSOR
XOR BH,BH ; VIDEO PAGE 0
MOV BL,5 ; ATTRIBUTE - MAGENTA
MOV CX,14 ; LENGTH OF STRING (14 CHARACTERS)
MOV DH,5 ; ROW TO PUT STRING
MOV DL,5 ; COLUMN TO PUT STRING
INT 10H ; CALL BIOS SERVICE ROUTINE
MOV AX,4C00H ; RETURN TO DOS
INT 21H
.DATA
TEXT DB "THIS IS A TEXT"
END

Example 5-12.
The next program demonstrates how to write to the screen using REP
STOSW to put the writing in video memory.
TITLE Example12.ASM
.MODEL SMALL
.STACK
.CODE
MOV AX,0B800H ; SEGMENT OF VIDEO BUFFER
MOV ES,AX ; PUT THIS INTO ES
XOR DI,DI ; CLEAR DI, ES:DI POINTS TO VIDEO MEMORY
MOV AH,4 ; ATTRIBUTE - RED
MOV AL,"G" ; CHARACTER TO PUT THERE
MOV CX,2000 ; NUMBER OF CELLS TO FILL (80 X 25 = ONE PAGE)
CLD ; DIRECTION - FORWARDS
REP STOSW ; OUTPUT CHARACTER AT ES:[DI]
MOV AX,4C00H ; RETURN TO DOS
INT 21H
END
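The effect of REP STOSW in the program above can be modelled in C. This sketch assumes the 80 x 25 colour text mode, in which each screen cell is a 16-bit word with the character in the low byte (AL) and the attribute in the high byte (AH), so that one page holds 2000 cells. An ordinary array stands in for the video buffer at segment 0B800H, and the names make_cell and fill_cells are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { TEXT_COLS = 80, TEXT_ROWS = 25, TEXT_CELLS = TEXT_COLS * TEXT_ROWS };

/* Build one screen cell: attribute in the high byte, character in the low. */
uint16_t make_cell(uint8_t attr, char ch)
{
    return (uint16_t)((attr << 8) | (uint8_t)ch);
}

/* Store `cell` into `count` consecutive words, like REP STOSW with
   AX = cell, CX = count, DF = 0 and ES:DI pointing at buf. */
void fill_cells(uint16_t *buf, size_t count, uint16_t cell)
{
    assert(buf != NULL);
    for (size_t i = 0; i < count; i++)
        buf[i] = cell;
}
```

For the red letter "G" of the example, make_cell(4, 'G') gives the word 0447H, and filling TEXT_CELLS words covers exactly one text page.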

Example 5-13.
The next program makes use of the TINY memory model, in which all
segments fit into one segment of 64kB. The assembly program
demonstrates some simple input and output operations.

TITLE Example13.ASM
.MODEL TINY
.CODE
ORG 100H
START:
MOV DX,OFFSET MESSAGE ; DISPLAY MESSAGE ON SCREEN
MOV AH,9 ; USING FUNCTION 09H
INT 21H ; OF INTERRUPT 21H
MOV DX,OFFSET PROMPT ; DISPLAY MESSAGE ON SCREEN
MOV AH,9 ; USING FUNCTION 09H
INT 21H ; OF INTERRUPT 21H
JMP FIRST_TIME
PROMPT_AGAIN:
MOV DX,OFFSET ANOTHER ; DISPLAY MESSAGE On SCREEN
MOV AH,9 ; USING FUNCTION 09H
INT 21H ; OF INTERRUPT 21H
FIRST_TIME:
MOV DX,OFFSET AGAIN ; DISPLAY MESSAGE ON SCREEN
MOV AH,9 ; USING FUNCTION 09H
INT 21H ; OF INTERRUPT 21H
XOR AH,AH ; FUNCTION 00H OF
INT 16H ; INTERRUPT 16H GETS A CHAR
MOV BL,AL ; SAVE TO BL
MOV DL,AL ; MOVE AL TO DL
MOV AH,02H ; FUNCTION 02H - DISPLAY CHAR
INT 21H ; CALL DOS SERVICE
CMP BL,'Y' ; IS BL='Y'?
JE PROMPT_AGAIN ; IF YES THEN DISPLAY IT AGAIN
CMP BL,'y' ; IS BL='y'?
JE PROMPT_AGAIN ; IF YES THEN DISPLAY IT AGAIN
THEEND:
MOV DX,OFFSET GOODBYE ; PRINT GOODBYE MESSAGE
MOV AH,9 ; USING FUNCTION 09
INT 21H ; OF INTERRUPT 21H
MOV AH,4CH ; TERMINATE PROGRAM
INT 21H
.DATA
CR EQU 13 ; ENTER CHARACTER
LF EQU 10 ; LINE-FEED CHARACTER
MESSAGE DB "A SIMPLE ASSEMBLY PROGRAM$"
PROMPT DB CR,LF,"HERE IS YOUR FIRST PROMPT.$"

AGAIN DB CR,LF,"Do You Want To Be Prompted Again? $"
ANOTHER DB CR,LF,"Here Is Another Prompt!$"
GOODBYE DB CR,LF,"GoodBye Hon!.$"
END START

It should be noted that, if you would like to generate a *.COM file (which
fits inside a single 64kB segment), or an *.EXE file that can be easily
converted to a *.COM file (using the EXE2BIN program), proceed as follows:

1) Give the first instruction a label like START, and make sure that
the final statement is END START.

2) Put an ORG 100H statement at the beginning of the CODE
segment. This is not necessary if you write an installable device driver.

3) Move all variables and tables into the CODE segment.
You cannot have a separate DATA segment.

4) Do not include a STACK segment. Make sure that all references to the
DS, ES and SS registers in your ASSUME statement refer to the CODE
segment. For instance, if your CODE segment is called CSEG, then
write:

ASSUME CS:CSEG, DS:CSEG, SS:CSEG

5) Do not initialize DS, ES or SS in your program, because they
already point to the CODE segment.

6) When you link your program, the linker may issue a
warning that there is no STACK segment. Ignore this warning.


5-21. 32-Bit Macro-Assemblers


Programs written for 32-bit assemblers are both clearer and simpler than
DEBUG and 16-bit assembly programs. You no longer have to deal with the
complexity of segment arithmetic or the DX:AX pairs for long integers,
and there is no 64 kB boundary imposed by the segmented structure of
16-bit software. The complexity of writing 32-bit Windows software is
related to the structure of Windows and the Windows API (Application
Programming Interface) function set. It differs from DOS code mainly in
that the parameters are passed on the stack rather than in registers, as in
the DOS interrupt functions.

MASM32 (or ML) is a 32-bit version of the MASM macro-assembler.
One of the advantages of the MASM32 assembler is that it
comfortably handles the "C" format of the Windows APIs.
Zero-terminated strings, structures, pointers and data sizes are all
recognized. Files from different sources usually do not build with
MASM32 without some modifications. Placing the following MASM-specific
directives at the beginning of the source file will solve most of
the problems. Another remedy is to use the "include" and "includelib"
syntax for the include files and libraries, so that their paths can be found.

.386 ; Enables 80386 instructions (32-bit assembly)
.model flat, stdcall ; FLAT memory model, standard call convention

As indicated earlier, the FLAT mode can be reached under DOS by starting in
real mode, switching to 32-bit protected mode, loading the segment registers
with selectors which point to descriptors set to a segment size of 4 GB, and
switching back to real mode while keeping the segment register values
set in protected mode. You may expect that you are back in the standard
real mode, but, due to the fact that the 386 does not reset the segment
limits while switching back to real mode, you have full 32-bit access to
the whole memory while using 16-bit code. Figure 5-11 shows the
MASM main editing window, with a model assembly file (model.asm).

MASM32 uses its own version of "windows.inc". It is a file of about
1 MB and has a very large set of equates and structures for 32-bit
Windows programming. The "windows.inc" file should always be put
before the system include files and libraries in any assembly code.
MASM32 provides many libraries of input/output and graphic
subroutines. It starts with a set of pre-built include files that
are used to build matching libraries at installation.

include \masm32\include\windows.inc ; Always first
include \masm32\include\user32.inc ; System include
include \masm32\include\kernel32.inc ; files next
include \masm32\include\gdi32.inc
includelib \masm32\lib\user32.lib ; Matching system
includelib \masm32\lib\kernel32.lib ; libraries
includelib \masm32\lib\gdi32.lib

Fig. 5-11. Template of an assembly program, under MASM32.

So for each library, you use the include file that matches it. To find a
function that you need, look in the system include file to see which file
has the function prototype and include the matching library. Most of the
common functions are in the following three system DLLs:


GDI32.INC ; Graphics related functions
KERNEL32.INC ; Operating system kernel functions
USER32.INC ; Various user-interface functions

5-22. 64-Bit Macro-Assemblers


There are many 64-bit assemblers, such as YASM, NASM and
GNU AS (GAS). YASM started in 2001 as a rewrite of the Netwide
x86 assembler (NASM) under the BSD license. Since then, it has exceeded
NASM's capabilities, incorporating features such as support for the 64-bit
AMD64 architecture, parsing of GNU AS (GAS) syntax, and generation of
debugging information for CodeView 8, STABS, and DWARF2 debuggers.

In particular, YASM is available under Windows in two forms: win32
and win64. The win64 or x64 object format generates Microsoft Win64
object files for use on the 64-bit native Windows x64 (and Vista)
platforms. Object files produced using this object format may be linked
with 64-bit Microsoft linkers such as that in Visual Studio 2005 in order
to produce 64-bit executables.
The YASM program synopsis is as follows:

yasm [-f format] [-o outfile] [other options...] infile

where infile is the assembly source filename and outfile (if specified) is
the object filename. If outfile is not specified, yasm will derive a default
output file name from the name of its input file, usually by appending .o
or .obj, or by removing all extensions for a raw binary file. If errors or
warnings are discovered during execution, Yasm outputs the error
message to stderr (the terminal). Many options may be given in one of
two forms: either a dash followed by a single letter, or two dashes
followed by a long option name.

-a arch or --arch=arch : Select target architecture
-f format or --oformat=format : Select object format
-g debug or --dformat=debug : Select debugging format
-h or --help : Print a summary of options
-L list or --lformat=list : Select list file format
-m machine or --machine=machine : Select target machine architecture
-p parser or --parser=parser : Select parser

The last option selects the parser (the assembler syntax). The default
parser is 'nasm', which emulates the syntax of NASM, the Netwide
Assembler.

YASM supports multiple instruction set architectures (ISAs), such as
x86. The x86 architecture supports the IA-32 instruction set and
derivatives as well as the AMD64 instruction set. It consists of two
machines: x86 (for the IA-32 and derivatives) and amd64 (for AMD64
and derivatives). The default machine for the x86 architecture is the 'x86'
machine. Like most assemblers, each YASM source line contains (unless
it is a macro, a preprocessor directive or an assembler directive) some
combination of the following four fields:

label: instruction operands ; comment

As usual, most of these fields are optional; the presence or absence of any
combination of a label, an instruction and a comment is allowed. The
BITS directive specifies whether YASM should generate code designed
to run on a processor operating in 16-bit, 32-bit, or 64-bit modes. The
syntax is BITS 16, BITS 32, or BITS 64. Alternatively, USE16, USE32,
and USE64 directives can be used in place of BITS 16, BITS 32, and
BITS 64 respectively for compatibility with other assemblers. Another
available parser is GAS, which emulates the syntax of GNU AS (GAS).

5-23. Summary of x86 Macro-Assembler Programs


The following table lists the best-known 16-bit, 32-bit and 64-bit
macro-assemblers for x86 microprocessors.

Table 5-4. Summary of the best-known macro-assemblers for x86 microprocessors.

Assembler   License       OS                           x86 Platforms
A86         Proprietary   Windows, DOS                 16-bit
A386        Proprietary   Windows, DOS                 16, 32-bit
FASM        BSD           Windows, DOS, Unix           16, 32, 64-bit
GAS         GPL           Unix-like                    16, 32, 64-bit
HLA         Freeware      Windows, Linux, FreeBSD      64, 32-bit
MASM        Freeware      Windows                      16, 32-bit
NASM        LGPL          Windows, Linux, DOS, OS/2    16, 32, 64-bit
TASM        Proprietary   Windows                      16, 32-bit
YASM        BSD           Windows, DOS, Unix           16, 32, 64-bit
WinAsm      Freeware      Windows                      16, 32-bit

Note that WinAsm is a free Integrated Development Environment for
developing 32-bit Windows and 16-bit DOS assembly programs.
WinAsm supports MASM and FASM Add-Ins. WinAsm can be
downloaded from this website:
http://www.winasm.net/winasm-studio-full-package.html

5-24. Summary

An assembler is a program that translates assembly language mnemonics
into their corresponding bit patterns. The output of the assembler is
then placed in memory for the microprocessor to execute. However, you
hardly ever get a program right the first time, so you may need to debug
it and search for syntax or typing errors. Assembler packages, like MASM
(from Microsoft) or TASM (from Borland), are equipped with powerful
editing and debugging tools. You may also use the DEBUG program, which
is supplied with your operating system, for this purpose. Once the
program is written and debugged, the computer can execute the
instructions very fast, and always in the same way, every time you run
your program.

The DEBUG program, which is supplied with the disk operating system
(DOS) of the IBM PC, can be used to write and execute short assembly
programs. When the DEBUG program is started, it responds with its own
hyphen "-" prompt. When the hyphen prompt appears, DEBUG is waiting for
you to enter one of its single-letter commands, followed by the
appropriate parameters.

The DEBUG program, though simple, cannot be used to edit long assembly
programs. Assembler packages like MASM, in contrast, simplify the
editing job and make it easy to edit and save assembly programs.


5-25. PROBLEMS
5-1) Examine the following assembler code and explain the
meaning of each pseudo op.
WY DW 1000
WZ DW 1234H,0ABCDH
TEMP DW ?
SCORES DB 10 DUP(0)
TIMES DW 7 DUP(?)
TOP EQU 13

5-2) Use the DEBUG program to find out the memory address of the DOS
Timer Function. Show how to list the first 10 lines (80 bytes) of this
function, using your DEBUG program.

5-3) Show how to use the DEBUG program to find out the date of the
BIOS of your PC, given that the BIOS date lies in F000:FFF5 through
F000:FFFC.

5-4) Show how to use the "-L" command of the DEBUG program to load
the boot sector of a hard disk.

5-5) Write a template file, which may be used to generate any assembly
program, using MASM macro assembler.

5-6) Explain all the interrupts which are supported by the 8086
microprocessor, giving a brief description of each.

5-7) Explain the term "Vectored Interrupts", give an example of its use
and describe how the 8086 microprocessor obtains the address of an
interrupt vector in relation to its Type number.

5-8) For an 8086 microprocessor, describe the sequence of events which
occur following an interrupt up to the point when normal operation is
resumed.

5-9) Show how to use INT 12H to check the size of your PC memory.
Write a small assembly program that invokes this interrupt and displays
the result on your screen, and assemble it using MASM.
Hint: After executing INT 12H, AX will contain the total number of
kilobytes of conventional RAM on the system. Note that the value is
binary (shown in hexadecimal by DEBUG), and you should convert it to a
decimal number for display.


5-10) Write a program that displays the time of the day in the following
format: 8:15 P.M., Friday, February 11, 2005. Make use of INT 21, to
display a character on the screen.

5-11) The following procedure demonstrates the operation of a checksum
function, which is usually used as a check on the integrity of data
blocks or files.
CSUM PROC NEAR
MOV CX,8192 ; 8192 is the number of bytes to add
XOR AL,AL
C26: ADD AL,DS:[BX]
INC BX
LOOP C26
OR AL,AL
RET
CSUM ENDP

Write an assembly program that calls the above procedure to make a
checksum over a block of data of 2 KBytes. If the checksum fails, the
program should display the message "CHECKSUM FAILURE", and if it
succeeds, it should display "CHECKSUM SUCCEEDED".

5-12) Write an assembly program that detects whether a device driver is
installed in memory or not and finds out its ISR address.
Hint: Use the INT 21, DOS call 35 to find out the address of the ISR
whose interrupt number is known. For instance, the Sound Blaster card
uses the interrupt 2FH. Let AX=352FH and call INT 21, then look for
the ISR address (CS:IP) in ES:BX. If ES=0H then the driver is not
installed.

5-13) Find out what the following program does. Rewrite the program
with comments.
PRINT MACRO PARM8
PUSHA
PUSH AX
MOV AL, PARM8
AAM
ADD AL, 3030H
PUSH AX
MOV AH, 02
INT 21
POP AX


MOV DL,AH
MOV AH,02
INT 21
POPA
ENDM
HR:
DB “Hours”
DB “Min”
MOV AH,44
INT 21
PUSH CX
PUSH DX
PRINT CH
MOV CL,5
LEA SI, Hours
NEXT1:
LODSB
MOV DL,AL
MOV AH,02
INT 02
INC SI
LOOP NEXT1
POP CX
PRINT CL
MOV CL,3
LEA SI, Min
NEXT2:
LODSB
MOV DL,AL
MOV AH,02
INT 21
INC SI
LOOP NEXT2
POP DX
PRINT DH
END

5-14) Write down a string-copy subroutine, equivalent to the C-language
function strcpy(dest, src), where dest and src are the addresses of the
destination and source strings. Show how you can calculate the number of
characters to be copied.



CHAPTER 6

Writing Assembly Routines within C/C++ and Java Programs
Contents

6-1. Introduction
6-2. General Considerations (16-bit, 32-bit and 64-bit programs)
6-2.1. Using YASM Assembler with Visual Studio and VC++
6-2.2. I/O Software Layers
6-2.3. I/O in DOS and Windows Environments
6-2.4. Direct Hardware Access (DirectX and all this Stuff)
6-3. C-Programming Language (Summary)
6-4. C++ and Object-Oriented Programming
6-4.1. Object-Oriented Programming (OOP)
6-4.2. Classes in C++
6-4.3. Specific Operators in C++
6-4.4. Input / Output in C++
6-4.5. FILE Input / Output in C++
6-4.6. Inheritance in C++
6-4.7. Polymorphism in C++
6-4.8. Abstract Classes in C++
6-4.9. Operator Overloading
6-4.10. Friend Functions in C++
6-4.11. Generic Types (Templates) in C++
6-4.12. Additional Notes about C++
6-4.13. Common Problems in C/C++
6-4.14. C++11
6-5. Programming under Windows
6-5.1. Windows Messaging System
6-5.2. Writing Windows DLL in C/C++ and Assembly Languages


6-6. Writing Assembly Blocks inside C/C++ Programs


6-6.1. The __asm Keyword in Visual C/C++
6-6.2. Using C or C++ Symbols in __asm Blocks
6-6.3. Writing Functions with Inline Assembly
6-6.4. Accessing C or C++ Data in __asm Blocks
6-6.5. Jumping to Labels in Inline Assembly
6-6.6. Calling C-Functions in Inline Assembly
6-6.7. Calling C++ Functions in Inline Assembly
6-6.8. Interrupts in Inline Assembly
6-7. Java-Programming Language (Summary)
6-8. Java versus C++ (Comparison)
6-9. Java versus C# (Comparison)
6-10. Invoking Assembly Language Programs from Java
6-11. Summary
6-12. Problems


6-1. Introduction
Assembly language gives the programmer direct control over the
processor, and carefully written assembly code can be faster than the
code produced from a high-level language. However, in order to write a
huge software system, it is more practical to use a high-level language,
like C or C++, and only use assembly language where you need efficient
I/O routines. One of the old jokes about assembly language goes
something like this: "There are three reasons for using assembly
language: speed, speed, and more speed." Even those who absolutely hate
assembly language will admit that, if speed is your primary concern,
calling assembly I/O routines from within a high-level language is a
good way to go.

In this chapter we describe how to write assembly language routines
within C/C++ programs. Upon completion of this chapter you will be able
to write robust I/O assembly routines in your C/C++ application
programs.

The following figure shows a C++ cout statement together with its
equivalent assembly routine and binary code, and illustrates how much
easier the C/C++ syntax is. The assembly code shown writes out only the
letter "H"; the same operations have to be repeated for each of the
remaining letters "e", "l", "l" and "o". If you look up the ASCII code
of "H" you will find that it is 48 (hexadecimal), so substituting 65
gives "e", 6C gives "l", 6F gives "o", and so on. All the numbers in the
figure are hexadecimal.

C/C++ Source        Assembly Equivalent   Machine Code   Binary Code
cout << "Hello";    MOV DL,48             B2 48          10110010 01001000
                    MOV AH,02             B4 02          10110100 00000010
                    INT 21                CD 21          11001101 00100001
                    INT 20                CD 20          11001101 00100000

Fig. 6-1. Piece of a C++ program and its equivalent binary code


6-2. General Considerations.


In the following sections, we make use of well-known C/C++ compilers,
like Microsoft Visual C++, Borland C++ or GNU C, as a vehicle to
implement assembly routines inside C/C++ code. In each case,
applications may be simply built as console (or DOS) applications,
without a graphical user interface (GUI). It should be noted that DOS is
inherently a 16-bit operating system, while Microsoft Windows has
classically been a 32-bit system. Recent operating systems of the IBM PC
and compatibles, like Windows Vista, are available in 64-bit editions,
which run both 64-bit and 32-bit applications; 16-bit applications run
only on the 32-bit editions. It should also be noted that DOS functions,
which are called by INT 21, are only available in 16-bit console (DOS)
applications. However, the so-called DOS Protected Mode Interface (DPMI)
provides interrupt-level functions for things such as switching between
real and protected mode, allocating memory, and setting interrupt
vectors. Thus, calling interrupts in protected mode under DPMI is very
similar to calling interrupts in real mode.

Fortunately, Microsoft Visual C/C++ does not rely on the contents of the
AX, BX, CX, DX and ES registers of the x86 microprocessors being
preserved by an assembly routine. Therefore, we can use them freely in
assembly routines. In order to use any other register of the
microprocessor, we first save (PUSH) its content onto the stack. After
we are done, we restore (POP) its original content from the stack.
6-2.1. Using YASM Assembler with Visual Studio and VC++
At first, you need to locate the directory where the VC++ compiler
binaries are located and put a copy of yasm.exe in this directory. Yasm
executable binaries that are not named yasm.exe will need to be renamed
yasm.exe after being placed in the appropriate directory. On a win32
system the win32 version of Yasm has to be used. On an x64 system
either the 32 or the 64 bit versions can be used but the rules file is set up
to use the 32 bit version. The win32 Yasm should be placed in the 32-bit
VC++ binary directory, which is typically located at:

Program Files (x86)\Microsoft Visual Studio 8\VC\bin

If needed the 64-bit Yasm binary should be placed in the 64-bit tools
binary directory, which is typically at:

Program Files\Microsoft Visual Studio 8\VC\bin

This allows us to configure Yasm as an assembler within the VC++ IDE.


6-2.2. I/O Software Layers


In the PC environment, most input/output routines are performed through
the operating system. However, the application programmer may need to
perform input/output directly, from within his programs. In order to
better understand how an application program can interact with hardware,
directly or via the operating system, let us take a look at the I/O
software layers in a PC, as shown in figure 6-2. As shown in the figure,
there exist 4 typical I/O software layers, which perform well-defined
functions in any operating system (like DOS or Windows). The interrupt
handler layer is responsible for treating hardware interrupts through a
series of different routines, usually stored in BIOS ROMs. In fact, when
a peripheral device is attached to a computer, it needs a piece of
software called a device driver. These drivers are usually written by
system programmers, and supplied by the peripheral manufacturer. The
device-independent layer is the part of the operating system which is
responsible for making any externally-added peripheral devices
transparent to the rest of the computer.

This layer is represented in recent operating systems by a series of
virtual device drivers (VxDs), as shown in figure 6-2(b).

Fig. 6-2(a). Operating system layered structure of a PC

6-2.3. I/O in DOS and Windows Environments


The following figure depicts the migration of software from the 16-bit
DOS to the 32-bit / 64-bit Windows environments and how they deal with
user applications and I/O software. As shown, user calls to DOS and BIOS
are still supported in new systems, via the virtual machine manager
(VMM), which provides a virtual DOS environment inside Windows.


Fig. 6-2(b). Operating system layered structure, in old and recent PCs.
VxD = Virtual Device Driver, VMM = Virtual Machine Manager

6-2.4. Direct Hardware Access (DirectX and all this Stuff)


Nowadays, PCs with multimedia support (for video, audio, Internet and
games applications) are the norm. Such applications involve the transfer
of huge amounts of data.

In the early days of DOS, the only way to create such speed-hungry
applications was to access the hardware directly, bypassing DOS. In fact
DOS didn't give dedicated support for such multimedia devices. On the
other hand, the Windows API provided a suitable means to develop
multimedia applications in a seamless manner. For instance, the graphic
functions were grouped in the graphic device interface (GDI), which is a
subsystem of Windows. This made life easier for programmers, but it
resulted in a dramatic decrease of the execution speed of applications.
So, multimedia and game developers were forced to write their
applications using special direct hardware access interfaces, which do
not carry all the burden of the Windows API. The most famous direct
hardware access technologies are:

1- DirectX,
2- OpenGL, and
3- Glide

DirectX is nowadays installed as part of Microsoft's Windows
98/XP/Vista. DirectX can be subdivided into a large number of
subsystems, such as:


1- DirectDraw,
2- Direct3D,
3- DirectSound,
4- DirectInput,
5- DirectPlay, and
6- DirectMusic

DirectX also includes several layers and components, such as
ActiveMovie and VRML (Virtual Reality Modeling Language), as well
as NetMeeting (Video Conferencing Software).

OpenGL was originally developed by Silicon Graphics Inc. in 1992
and has been adopted by Microsoft and added to Windows to speed up
CAD programs and 3D graphics. OpenGL is portable over several
platforms, like UNIX and Macintosh OS as well as Windows.


6-3. C-Programming Language (Summary)


C is a modular programming language that is sometimes referred to as a
"high-level assembly language". This section provides a basic summary
of the C programming language. Every C program contains a number of
functions or subprograms. Each of these functions performs a specific
task through a series of C instructions, and each function can make use
of other functions by calling them. However, there exists a special
function called the main function, which calls other functions but is
not normally called by them. The main function is called by the
operating system when the compiled program starts to run. The structure
of any C-program looks like the following pseudo code:

#include <stdio.h>

return_type function1(type parameter1, type parameter2,..);   // function1 declaration
:
return_type function10(type parameter1, type parameter2,..);  // function10 declaration

// -------------------- This is a line comment --------------------------------

int main( )
{
    variable_type variable1_name;               // Variable declaration
    variable_type variable2_name;
    :
    instructions;
    :
    function1(parameter1, parameter2,..);       // function1 call
    variable_name = function10();               // function10 call
    return 0;
}
// ------------------------------- End of main -------------------------------

return_type function1(type parameter1, type parameter2,..)    // function1 definition
{
    :
    instructions;
    :
    return value;
}
/////////////////////////////////////////////////////////////////////////////////
return_type function2(type parameter1, type parameter2,..)    // function2 definition
{
    :
    instructions;
    :
    return value;
}


As shown in the above listing, the first part of the main() function
contains the declarations of the variables which we intend to use in our
C-program. Note that each C instruction should be terminated with a
semicolon ";". Also, all functions, as well as groups of C-instructions,
should be packed as whole blocks between braces { }.

6-3.1. Data Types in C-language


The following table depicts the main variable types in C language. The
int and enum lengths shown are those of 16-bit compilers; 32-bit
compilers use a 32-bit int.

Table 6-1. Variable types in C-language.

Type             Length                Range
int              16 bit = word         -32,768 : +32,767
long             32 bit = dword        -2,147,483,648 : +2,147,483,647
unsigned int     16 bit = word         0 : 65,535
unsigned long    32 bit = dword        0 : 4,294,967,295
float            32 bit = dword        3.4 x 10^-38 : 3.4 x 10^38
double           64 bit = qword        1.7 x 10^-308 : 1.7 x 10^308
long double      64/80 bit             3.4 x 10^-4932 : 1.1 x 10^4932
char             8 bit = byte          -128 : +127
unsigned char    8 bit = byte          0 : 255
Tchar            2-byte character      -32,768 : +32,767
enum             16 bit = word         -32,768 : +32,767

In C-language, the integer literal constants that begin with "0x" are
hexadecimal constants. When converting such a value from C to assembly,
you need to replace the "0x" prefix with the notation of your assembler:
a "$" prefix in HLA, or an "H" suffix in MASM. For example, the C
literal constant "0x1234A" becomes the assembly literal constant
"$1234A" in HLA or "1234AH" in MASM. A character literal constant in C
and assembly usually consists of a single character surrounded by single
quotes, e.g., 'a' and 'z'. The C language defines three different
floating-point sizes: float, double, and long double. Some compilers
(e.g., Borland) use a 10-byte extended precision format for long double
while others (e.g., Microsoft) use an eight-byte double precision
format.

The C language does not support a string type. Instead, C uses an array
of characters with a zero terminating byte to represent a character
string. On the other hand, some assemblers (such as HLA) define a
character string type. Fortunately, this string format is compatible
with the zero-terminated string format that C uses, so it is easy to
convert assembly strings into C format. Both languages use double quotes
to represent a string literal constant, like "MHS".


6-3.2. Variable Declaration in C-language


Variables are data storage locations. Variable names must start with a
letter (a to z or A to Z) or an underscore '_'. Only the first 32
characters are significant in C language (the exact limit depends on the
compiler). The compiler distinguishes between small and capital letters.

Examples 6-1:

int I, kilos, sacs;
float x, y, price, total;
double Z[5];        // a 1-dimensional array (a list) of 5 elements
float List[10][5];  // a 2-dimensional array of 10 rows and 5 columns

Note that the array is a collection of variables, which hold the same type
of data. In C, arrays start at position 0. Also, an array can be initialized
when declared. For instance: int Z[5] = {1, 20, 33, 4, 50}; This means
that Z[0]=1; Z[1]=20; Z[2]=33; Z[3]=4; and Z[4]=50.

Each item in an array is called an element, and each element is accessed
by its numerical index. As shown in the following illustration,
numbering begins with 0. The 9th element, for example, is accessed at
index 8.

Fig. 6-3. Representation of an array of ten data elements

6-3.3. Expressions
In C language, an expression is anything that evaluates to a value. An
expression followed by a semicolon forms a statement, e.g., y = x + 5;

6-3.4. Operators
There exist many mathematical, logical and relational operators which
can be used in C language. The following tables list the different types
of operators in C-language.


Table 6-2. Arithmetic Operators in C-language:

Symbol   Meaning                 Example       Notes
=        Assign                  Y = X;
+        Add                     Y = X + Z;
+=       Add and assign          Y += X;       Y = Y + X;
-        Subtract                Y = X - Z;
-=       Subtract and assign     Y -= X;       Y = Y - X;
*        Multiply                Y = X * Z;
*=       Multiply and assign     Y *= X;       Y = Y * X;
/        Divide                  Y = X / Z;    integer operands give an integer quotient
/=       Divide and assign       Y /= X;       Y = Y / X;
%        Remainder (modulo)      I = X % Z;    integer operands only
++       Increment               I++;          I = I + 1; postfix
--       Decrement               I--;          I = I - 1; postfix

Note that C has no power operator. The fourth power of x is written
pow(x, 4), using the pow() function declared in math.h.

Table 6-3. Logical Operators in C-language:

Operator       Symbol   Example                          Notes
AND logical    &&       (Expression1 && Expression2)     Evaluates to True or False
OR logical     ||       (Expression1 || Expression2)     Evaluates to True or False
NOT logical    !        !(Expression)                    Evaluates to True or False

Table 6-4. Relational Operators in C-language:

Operator           Symbol   Example                          Notes
Equal              ==       (Expression1 == Expression2)     Evaluates to True or False
Not equal          !=       (Expression1 != Expression2)     Evaluates to True or False
Greater            >        (Expression1 > Expression2)      Evaluates to True or False
Greater or equal   >=       (Expression1 >= Expression2)     Evaluates to True or False
Smaller than       <        (Expression1 < Expression2)      Evaluates to True or False
Smaller or equal   <=       (Expression1 <= Expression2)     Evaluates to True or False


Table 6-5. Bit-level Operators in C-language:

Operator      Symbol   Example            Notes
AND           &        (byte1 & byte2)    Evaluates to a bit pattern
OR            |        (byte1 | byte2)    Evaluates to a bit pattern
XOR           ^        (byte1 ^ byte2)    Evaluates to a bit pattern
NOT           ~        ~(byte)            Evaluates to a bit pattern
SHIFT RIGHT   >>       Byte >> 1;         Shift 1 bit right (divide by 2)
SHIFT LEFT    <<       Byte << 1;         Shift 1 bit left (multiply by 2)

6-3.5. Conditional Execution & Branching in C-language


The syntax of the if statement is as follows:

if (condition) {Instructions}
or
if (condition) {Instructions} else {Other Instructions}

Example 6-2:
if (I >= 5) printf("I is greater than or equal to 5\n");
if (I < 5) printf("I = %d\n", I); else printf("I is greater than or equal to 5\n");

The syntax of the switch statement is as follows:

switch (value)
{
case value1: instructions; break;
case value2: instructions; break;
default: instructions;
}
Example 6-3:
int i = 3;
switch (i)
{
case 1: printf("You entered i = 1\n"); break;
case 2: printf("You entered i = 2\n"); break;
case 3: break;   // do nothing for i = 3
default: printf("You entered a value other than 1, 2 or 3\n");
}

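The break statement causes an immediate exit from the enclosing switch
or loop, so that execution jumps to the statement after its closing
brace. The continue statement, which is only valid inside a loop, skips
the rest of the loop body and begins the next iteration.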

6-3.6. Looping Instructions in C


The syntax of the counted for loop statement is as follows:

for (initialization; condition; update)
{
    Instructions sequence;
}

Example 6-4:
for (i = 1; i < 10; i++) printf("i = %d\n", i);

The syntax of the pre-conditioned while loop statement is as follows:

while (Condition True) { Instructions sequence; }

The syntax of the post-conditioned do-while loop statement is as follows:

do { Instructions sequence; } while (Condition True);

Example 6-5:
int I = 1;
do { printf("i = %d\n", I); I++; } while (I < 10);

Example 6-6:
int I = 1;
while (I < 10) { printf("i = %d\n", I); I++; }

6-3.7. Functions Declaration (Prototyping) and Definition


We've said that every C program contains a number of functions (at least
the main function). Each function must be declared before it is called.
The declaration of a function is called function prototyping.

A) Function Declaration (Prototyping):

Any C-function should be declared before usage, as follows:

type function_name (type parameter1, type parameter2, .. );

Example 6-7:
float cube (float x );            // takes a float and returns a float value
void printxy(float x, float y);   // takes 2 floats, doesn't return any value

The declarations of functions (prototypes) are usually grouped in header
files. So, the declarations of the C standard functions which you intend
to use in your program should be included at the beginning, using
#include <headerfile_name.h>. For instance, the header file math.h
declares the standard mathematical functions.
B) Function Definition:
The code which describes what the function does is called the function
definition (body). Any function in C-language must have the following
definition style:
type function_name (type parameter1, type parameter2, .. )
{
    declarations of local variables;
    :
    instructions;
    :
    return value;   // omitted, or a plain "return;", in a void function
}

Example 6-8:
float cube (float x )
{
float y;
y = x*x*x;
return y;
}

C) Function Call:
If the function returns a value, it may be assigned to another variable:

Variable = function_name (parameter1, parameter2, .. );

If the function doesn't return a value (void function), it should be
called directly, as follows:

function_name (parameter1, parameter2, .. );

Examples 6-9:
float x, y;
y = cube (x );     // call the cube() function and assign its return value to y.
printxy(x, y);     // just call the printxy() function.

6-3.8. Derived Types


You can define new names for data types (other than the predefined ones:
int, float, char, etc.) using the keyword typedef.
Example 6-10:
#include <stdio.h>
int main( )
{
    typedef unsigned char Byte;   // Byte takes values from 0 to 255
    Byte alpha = 224, beta = 255;
    printf("%c %c", alpha, beta);
    return 0;
}

6-3.9. Data Structures


Structures are compound data types that group several members under one
name. A structure is created using the keyword struct.
Example 6-11: you can create a data structure called employee as
follows:

struct employee {
    char name[30];
    int code;
    float salary;
};

or as follows:

typedef struct {
    char name[30];
    int code;
    float salary;
} employee;

After creation of this new data type (structure) you can use it to create
new structure variables as follows (with the first form above, the
variable is declared as struct employee Engineer;):

int main ( )
{
employee Engineer ;
return 0;
}

6-3.10. Accessing Structure Members

Now, let us see how to access a structure member. A structure member is
accessed by writing the name of the structure variable, followed by a
dot "." and then the member variable name:

Structure_name.member_name

Example 6-12:

#include <stdio.h>

int main ( )
{
    employee clerk;      // employee is the structure type of Example 6-11
    float income;
    :
    income = clerk.salary;
    :
    return 0;
}

6-3.11. Pointers in C-language


A pointer is a variable that holds the address of another variable, not
the value of that variable itself. A pointer is declared as follows:

type *pointer_name;

Examples 6-13:

int *intptr;     // pointer to an integer type variable
char *chaine;    // pointer to a character type variable

6-3.12. Utilization of Pointers with Structures

You can use pointers with structures as follows. A structure member is
accessed through a pointer by writing the pointer name, followed by the
arrow operator "->" and then the member variable name:

Structure_type *Structure_pointer;    // declaration.
Structure_pointer->member_name;       // manipulation.


Example 6-14:

#include <stdio.h>
#include <string.h>

typedef struct
{
    char name[30]; float salary; int code;
} employee;

////////////////////////////////////// Main Function ///////////////////////////////////////////

int main( )
{
    int Num, I;
    int Code;
    char Name[30];
    float Salary;
    employee Engineer[10];
    employee *ptr_Engineer;
    printf("Enter Number of Engineers [<=10]: "); scanf("%d", &Num);
    if (Num > 10) Num = 10;      // never write past the end of the array
    ptr_Engineer = Engineer;     // point to first Engineer structure
    for (I = 0; I < Num; I++)
    {
        printf("Input Data of Engineer Number [%d]\n", I);
        printf("1- Input Engineer Name: "); scanf("%29s", Name);
        printf("2- Input Engineer Code: "); scanf("%d", &Code);
        printf("3- Input Engineer Salary: "); scanf("%f", &Salary);
        strcpy(ptr_Engineer->name, Name);
        ptr_Engineer->code = Code;
        ptr_Engineer->salary = Salary;
        ptr_Engineer++;          // point to following employee structure.
    }
    // : (further processing of the Engineer array goes here)
    return 0;

}/////////////////////////////////////////////////////////////////////////////////////////////////////


6-3.13. Input / Output in C-Language


There exist 3 basic types of data input output (I/O) commands:
1- Console I/O
2- File I/O
3- String (memory) I/O

6-3.13.i. Console Input / Output

The console means the keyboard (as the standard input device) and the
screen (as the standard output device). So you can write a variable to
the screen or read a variable from the keyboard as follows:
A) Input from the Keyboard:

int getchar(void);      // reads a single character from the keyboard
or
char *gets(char *s);    // reads a string from the keyboard (unsafe: it cannot
                        // limit the input length; prefer fgets())
or
scanf("format string ..", &var_list, ..);   // reads a list of variables from the keyboard

The scanf() function is a formatted input statement. Its format strings
are combinations of the following identifiers and escape sequences:
Table 6-6. Escape sequences and string format identifiers in C language

Format string identifiers:
%d      int
%c      char
%f      float
%lf     double
%s      string
%o      unsigned int (octal notation)
%[ ]    scan for a set of characters
Escape sequences:
\n      new-line character
\t      tab
\b      backspace
Examples 6-15:
char Ch; Ch = getchar();
char Str[80]; gets(Str); // Str must be a real buffer, not an uninitialized pointer
int A; scanf ("%d", &A);
int A, B; scanf ("%d %d %c %s", &A, &B, &Ch, Str);

B) Output to Screen
The C language provides powerful statements for screen output, such as:

putchar (int Ch); // write a single character to screen
puts (const char *Str); // write a string to screen, followed by a new line
printf ("..format string..", var_list, ...); // write a list of variables to screen

Example 6-16:

putchar (0x07); // write char whose hexadecimal code is 0x07 (Alarm)
printf ("%d %c", A, Ch);

printf() is a formatted output statement. The part of the string that begins
with % in the printf() is called the format specifier. The general form of a
format specifier is:

%[flag][min width][precision][length modifier][conversion specifier]

Most of these fields are optional, other than the conversion specifier,
which you've already seen (for example, using %d to print out a
decimal number). The conversion specifier is the part of the format specifier
that determines the basic formatting of the value to be printed out.
Table 6-7. Data conversion specifiers in C-Language.

Specifier   Description                                          Example
d or i      Decimal integer number (base 10)                     9
o           Octal integer number (base 8)                        7
x           Hexadecimal integer number (base 16)                 123ff
f           Floating point number using decimal representation   3.1415
e           Floating point number using scientific notation      1.86e+06
E           Like e, but with a capital E in the output           1.86E+06
g           Use the shorter of the two representations: f or e   3.1 or 1.86e+06
G           Like g, except uses the shorter of f or E            3.1 or 1.86E+06

The precision modifier is written ".number", and has different meanings
for the different conversion specifiers (like d or g). For floating numbers
(e.g. %f), it controls the number of digits printed after the decimal point:
printf( "%.3f", 1.2 );
will print 1.200

The plus flag forces the sign of the number to be printed, even when the
number is positive, such that

printf( "%+d\n", 10 );

will print +10. Finally, the minus flag will cause the output to be
left-justified. This is important if you use the width specifier and you want
the padding to appear at the end of the output instead of the beginning. Thus

printf( "|%-5d|%-5d|\n", 1, 2 );

displays: |1    |2    | with the padding at the end of the output.
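
The effect of the precision, plus and minus flags can be checked with a small sketch. It is written in C++ but calls the same C library formatting through <cstdio>. The helper names format_double() and format_int() are hypothetical and serve only this illustration; snprintf() is used instead of printf() so that the formatted text is returned as a string.

```cpp
#include <cstdio>
#include <string>

// Format one double with the given format string and return the text.
std::string format_double(const char *fmt, double value)
{
    char buf[64];
    std::snprintf(buf, sizeof buf, fmt, value);
    return std::string(buf);
}

// Format one int with the given format string and return the text.
std::string format_int(const char *fmt, int value)
{
    char buf[64];
    std::snprintf(buf, sizeof buf, fmt, value);
    return std::string(buf);
}
```

For instance, format_double("%.3f", 1.2) gives 1.200, format_int("%+d", 10) gives +10, and format_int("|%-5d|", 1) pads the digit with four trailing blanks.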

6-3-13-ii. String (Memory) Input /Output:


Here you can read from, or write to, a string variable (in memory).
A) Input from a String

sscanf (const char *str, "format string ..", &var1, &var2, ...);

B) Output to a String

sprintf (char *str, "format string....", var_list, ...);

Examples 6-17:
int A; char Ch; char Str[80] = "25 x";
sscanf(Str, "%d %c", &A, &Ch); // read A and Ch from the string Str
sprintf(Str, "%d %c", A, Ch); // write A and Ch in the string Str (values, not addresses)
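
The following sketch shows a round trip through a string, under the assumption that the text holds an integer followed by a character. The function names parse_pair() and format_pair() are hypothetical. The code is C++ but relies on the same sscanf() and snprintf() calls of the C library; snprintf() is the length-checked relative of sprintf().

```cpp
#include <cstdio>
#include <string>

// Parse "<int> <char>" from text; true only if both fields were read.
bool parse_pair(const char *text, int &number, char &letter)
{
    return std::sscanf(text, "%d %c", &number, &letter) == 2;
}

// Write the pair back into a string. Note that values are passed
// to the formatting call, not addresses.
std::string format_pair(int number, char letter)
{
    char buf[32];
    std::snprintf(buf, sizeof buf, "%d %c", number, letter);
    return std::string(buf);
}
```

Checking the return value of sscanf() is the usual way to detect input that does not match the format string.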

6-3-13.iii. FILE Input /Output in C-Language:


Disk files are treated as devices in the C-language. However, there exist
several levels of operation with disk files. The most convenient level
is called the buffered input/output (or streamed input/output) method. In
this case you have to declare a stream, which is a pointer to a structure
of FILE type. The stream, together with its memory buffer, is used as an
intermediate place between your variables and the disk files.

A) Opening and Closing a File

You can use fopen() and fclose() functions to open and close file streams.

FILE *fp; // fp is a pointer to FILE structure (a stream)
fp = fopen (filename, "mode");
:
fclose (fp);


Example 6-18:

FILE *fp;
if ( (fp = fopen ("test", "w")) == NULL )
{ printf ("cannot open file"); exit (1); } // exit() requires <stdlib.h>

The following table illustrates the available modes for file I/O in C-
language:

Table 6-8. File I/O modes in C-language

Mode Meaning
r Read from a text file ( default )
w Write to a text file ( default )
a Append to a text file ( default )
rb Read from a binary file
wb Write to a binary file
ab Append to a binary file
r+ Open a text file for read/write
w+ Create a text file for read/write
a+ Append a text file for read/write
r+b Open a binary file for read/write
w+b Create a binary file for read/write
a+b Append a binary file for read/write

B) Input from a File

You can use fgetc() and fgets() functions to read characters and strings
from a file stream.

fgetc (FILE *fp); // read a single character from a file
fgets (char *str, int length, FILE *fp); // read a string from a file
fscanf (FILE *fp, "format string ..", &var1, &var2, ...); // read a list of variables from a file
fread (void *buffer, size_t N_bytes, size_t count, FILE *fp); // read count items of N_bytes each from a file

Example 6-19:
Assume fp is a pointer to a FILE structure (a stream). In order to read
(get) characters from the file stream, you may use the fgetc() function as
follows:

int Ch; // int, not char, so that EOF can be told apart from valid characters
do { Ch = fgetc(fp); } while (Ch != EOF);


C) Output to a File

fputc (int var, FILE *fp); // write a single character in a file
fputs (const char *str, FILE *fp); // write a string in a file
fprintf (FILE *fp, "format string..", var_list, ...); // write a list of vars in file
fwrite (const void *buffer, size_t N_bytes, size_t count, FILE *fp); // write count items of N_bytes each in a file

D) Detection of the End of File


You can use feof() function to detect the end of file streams.

int feof (FILE *fp );

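Example 6-20:
Assume fp is a pointer to a file stream. Because feof() becomes true only
after a read has already failed, test the result of the read first and use
feof() afterwards to tell the end of file from a read error, as follows.

char str[128];
while ( fgets (str, sizeof str, fp) != NULL ) puts (str);
if ( feof (fp) ) puts ("end of file reached");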

E) Seeking a Certain Position in a File

int fseek (FILE *fp, long numbyte, int origin );

where numbyte is the offset in bytes, measured from origin, and origin is
one of the following reserved constants:

Table 6-9. Reserved constants for file search (fseek) in C-language

Origin Meaning
SEEK_SET Beginning of file
SEEK_CUR Current position
SEEK_END End of file

Example 6-21:
The following routine skips 128 bytes before each read and prints the
strings it finds at those positions in myfile

char str[128];
FILE *fp;
fp = fopen ("myfile", "r");
while ( fseek (fp, 128, SEEK_CUR) == 0 && fgets (str, sizeof str, fp) != NULL )
{ puts (str); }
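
A minimal sketch of random access with fseek() is given below. It assumes a hypothetical layout of fixed 8-byte records, writes three of them to a temporary file created by tmpfile(), and then jumps straight to the requested record. The function name read_record() and the record contents are illustrative only. The code is C++ calling the C stdio functions through <cstdio>.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Write three fixed-width records to a temporary file, then seek to
// record number `index` and read it back. Returns an empty string if
// any step fails (for example when index is past the last record).
std::string read_record(int index)
{
    const int RECORD_SIZE = 8;                   // hypothetical record layout
    const char *records[3] = { "alpha   ", "beta    ", "gamma   " };

    std::FILE *fp = std::tmpfile();              // binary update mode ("wb+")
    if (fp == NULL) return "";

    for (int i = 0; i < 3; i++)
        std::fwrite(records[i], 1, RECORD_SIZE, fp);

    char buf[RECORD_SIZE + 1] = { 0 };
    std::string result;
    // fseek() is also required when switching from writing to reading.
    if (std::fseek(fp, static_cast<long>(index) * RECORD_SIZE, SEEK_SET) == 0 &&
        std::fread(buf, 1, RECORD_SIZE, fp) == static_cast<std::size_t>(RECORD_SIZE))
        result = buf;

    std::fclose(fp);                             // the temporary file is removed
    return result;
}
```

With fixed-size records the byte offset is simply index times the record size, which is why SEEK_SET is used here rather than SEEK_CUR.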


F) Erasing a File (from the disk)

The remove() function can be used to delete a file, as follows:

int remove (const char *filename );

6-3.13.iv. Standard Streams in C-Language


You do not need to declare a stream, by a FILE declaration, when you
deal with the standard devices. The following table lists the standard
I/O stream names in C-language.

Table 6-10. Standard I/O streams in C-language.

I/O Device   Stream
screen       stdout
KB           stdin
screen       stderr

However, you can redirect input/output from/to the standard devices using
the redirection characters (< , >) in the command line.

6-3.14. C-Preprocessor Directives


Any standard C-Compiler has a preprocessor (sometimes called CPP),
that processes a number of specific statements and macros before the
program is compiled. The CPP statements begin with a directive, such as
#include. All directives begin with the character "#", to distinguish them
from other statements. Here are examples of such directives and macros.

A) The #include directive

This directive is used to insert the whole text of another file (usually
called include or header file) at the position of the directive in the
current C-file.

#include "file_name" // the file is searched for in the current directory first
#include <file_name> // the file is located at the include directory

Examples 6-22:

#include <stdio.h>
#include "myfile.h"

B) The #define directive

This directive is used to insert a macro definition in the current C-file.
#define macro_name sequence

Examples 6-23:

#define TRUE 1
#define FALSE 0
#define min(a,b) ((a) < (b) ? (a) : (b))
#define max(a,b) ((a) > (b) ? (a) : (b))
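
The reason for wrapping the whole macro body in parentheses can be seen in the following sketch. The macro names MIN_OF, MAX_OF and MIN_UNSAFE are hypothetical; MIN_UNSAFE omits the outer parentheses on purpose, only to show the pitfall. Note also that there must be no space between the macro name and the opening parenthesis of its parameter list.

```cpp
// Fully parenthesized function-like macros.
#define MIN_OF(a, b) ((a) < (b) ? (a) : (b))
#define MAX_OF(a, b) ((a) > (b) ? (a) : (b))

// Same body without the outer parentheses, kept only to show the pitfall.
#define MIN_UNSAFE(a, b) (a) < (b) ? (a) : (b)

// Expands to 10 * ((2) < (3) ? (2) : (3)), i.e. 10 * 2.
int safe_result()   { return 10 * MIN_OF(2, 3); }

// Expands to 10 * (2) < (3) ? (2) : (3), i.e. (20 < 3) ? 2 : 3.
int unsafe_result() { return 10 * MIN_UNSAFE(2, 3); }
```

Because the preprocessor performs plain text substitution, the surrounding expression can regroup the operators of an unparenthesized macro body.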

C) The #pragma directive


You'll see in this chapter that, with compilers that support them, you may
use the #pragma asm and #pragma endasm preprocessor directives to insert
assembly instructions:

#pragma asm
:
#pragma endasm

Example 6-24: Write a C-program that makes use of a pragma directive to
calculate the Clock Rate of your PC.

Solution: The following is a WATCOM C program that measures the
speed of your PC by comparing the Pentium's cycle counter (the time-stamp
counter) to the real-time system clock. The program reads the cycle counter
through the function RDTSC(), which a #pragma aux directive maps onto the
processor's RDTSC instruction.

#include <stdio.h>
#include <time.h>
int RDTSC (void); // Read the Time-Stamp Counter (only the low 32 bits, in EAX)
volatile time_t t;
#pragma aux RDTSC = ".586" "rdtsc" modify [eax edx] value [eax];

int main(int argc, char * argv[])
{
    int cyclemark1, cyclemark2, dt;
    double speed;
    if (argc >1) sscanf(argv[1],"%d",&dt); else dt = 5;
    printf("Please wait, this program takes about %d sec to run.\n",dt+1 );
    t = time(NULL);
    while( t == time(NULL) );     // wait for the start of a new second
    t = time(NULL);
    cyclemark1 = RDTSC();
    while( dt+t > time(NULL) );   // busy-wait for dt seconds
    cyclemark2 = RDTSC();
    // The difference is meaningful only while fewer than 2^31 cycles elapse.
    speed = ((double)(cyclemark2 - cyclemark1)) /
            ((double)1000000)/ ((double) dt);
    printf( "Your Pentium PC clock rate is %g MHz\n", speed);
    return 0;
}

6-4. C++ and Object-Oriented Programming


C++ was developed by Bjarne Stroustrup, starting in 1979 under the name
"C with Classes"; it was renamed C++ in 1983. C++ is very nearly a superset
of C, i.e. C++ includes almost the whole syntax of C and extends it with
object-oriented features, which came from the Simula programming language.
Today, C++ remains one of the most widely used object-oriented languages.

6-4.1. Object-Oriented Programming (OOP)


An object is a software bundle of related state and behavior. Software
objects are often used to model the real-world objects that you find in
everyday life. Object-oriented programming (OOP) is a programming
paradigm that uses objects and their interactions to design applications
and computer programs. Programming techniques may include features
such as data abstraction, encapsulation, polymorphism, and inheritance.

Abstraction is simplifying complex reality by modeling classes
appropriate to the problem, and working at the most appropriate level of
inheritance for a given aspect of the problem. A class is a blueprint or
prototype from which objects are created. All object-oriented
programming languages provide mechanisms that help you implement the
object-oriented model. They are:

Encapsulation
Inheritance
Polymorphism

Encapsulation is the mechanism that binds code and the data together,
and keeps both safe from outside interference and misuse. One way to
think about encapsulation is as a protective wrapper that prevents the
code and data from being arbitrarily accessed by other code defined
outside the wrapper. Access to the code and data inside the wrapper is
tightly controlled through a well-defined interface. In short, the
wrapping up of data and methods into a single unit (called a class) is
known as encapsulation.


Fig. 6-4. Main components of object-oriented programming technology

Inheritance is the process by which objects of one class acquire the
properties of another class. Inheritance supports the concept of
hierarchical classification. For example, the atlas is a part of the class
bicycle, which is again a part of the class cycle. The principle
behind this sort of division is that each derived class shares common
characteristics with the class from which it is derived. In OOP, the
concept of inheritance provides the idea of reusability. This means that
we can add additional features to an existing class without modifying it.
This is possible by deriving a new class from the existing one. The new
class will have the combined features of both classes. Thus the real appeal
and power of the inheritance mechanism is that it allows the programmer to
reuse a class that is almost, but not exactly, what he wants, and to tailor
the class in such a way that it does not introduce any undesirable side
effects into the rest of the classes. The derived class is known as a
'subclass'. Inheritance provides a powerful and natural mechanism for
organizing and structuring your software.

Polymorphism is a feature that allows one interface to be used for a
general class of actions. The specific action is determined by the exact
nature of the situation. Polymorphism means the ability to take more than
one form. For example, consider the operation of addition. For two
numbers, the operation will generate the sum. If the operands are
strings, then the operation would produce a third string by concatenation.
That is, a single function name can be used to handle different numbers and
different types of arguments.

This is something similar to a particular word having several different
meanings depending on the context. Polymorphism plays an important
role in allowing objects having different internal structures to share the
same external interface. This means that a general class of operations may
be accessed in the same manner even though specific actions associated
with each operation may differ. Polymorphism is extensively used in
implementing inheritance. Polymorphism allows the programmer to treat
derived class members just like their parent class's members.

6-4.2. Classes in C++


C++ allows the declaration and definition of classes. Technically
speaking, classes are some sort of structures that contain data aggregates
as well as functions (or methods) that operate on them. Classes are
declared using the keyword class.

Example 6-25
class Point {
int _x, _y; // member variables (point coordinates)
public: // member functions (methods)
void setX (const int val);
void setY (const int val);
int getX() { return _x; }
int getY() { return _y; }
};

The class data members are sometimes called the class variables and the
class member functions are sometimes called methods. The class data
members (variables) and member functions (methods) may be classified
by 3 access modifiers: public, protected and private. The public members
can be manipulated everywhere in the program, without restriction. However,
private members can only be manipulated by the member functions (and
friends) of the class. If not specified, class members are private by
default. Member functions (methods) have full access to all data members of
the class. They may be defined inside the class (inline definition), as
shown above, or only declared inside the class and defined outside it
(out-of-line definition), as follows:

int Point::getX()
{
    return _x;
}

All C++ classes have one or more special member functions, called
"constructors," that are called to initialize objects. If you don't specify a
constructor function in your class definition, the compiler generates a
default constructor with no arguments.
Instances of classes are called objects. An object of a certain class is just
an instance (variable) of this class type. For example, you can declare an
object of the Point class and call its members as follows:

#include <iostream>

int main ( )
{
    Point apoint;
    int x, y;
    apoint.setX(1); // Initialization
    apoint.setY(2);
    x = apoint.getX();
    y = apoint.getY();
    :
    return 0;
}

You can also create pointers to certain class objects and arrays of objects
in much the same manner as you do with structures.
6-4-3. Class Constructors and Destructors
Constructors are methods which are used to initialize an object at its
definition time. We extend our class Point such that it initializes a point
to coordinates (0, 0):

class Point {
int _x, _y;
public:
Point() { _x = _y = 0; } // constructor
~Point() { } // destructor
void setX(const int val);
void setY(const int val);
int getX() { return _x; }
int getY() { return _y; }
};

Constructors have the same name as the class. They have no return
value. Like other functions, constructors can take arguments.

When we leave the scope of the definition of the Point object, we must
ensure that the allocated memory is released. We therefore define a
special method called destructor, which is called for each object at its
destruction time. Destructors are declared similar to constructors. They
also use the name of the defining class prefixed by a tilde (~).
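
The moment at which constructors and destructors run can be made visible with a small sketch. The class Tracer and the function lifetime_demo() are hypothetical and exist only for this illustration: each object appends a mark to a log string when it is created and when it is destroyed.

```cpp
#include <string>

// Hypothetical helper class that records lifetime events in a log string.
class Tracer {
    std::string &_log;
    char _id;
public:
    Tracer(std::string &log, char id) : _log(log), _id(id)
    { _log += '+'; _log += _id; }              // constructor: "+id"
    ~Tracer() { _log += '-'; _log += _id; }    // destructor:  "-id"
};

std::string lifetime_demo()
{
    std::string log;
    {
        Tracer a(log, 'a');    // constructor runs at the definition
        Tracer b(log, 'b');
    }                          // scope ends: b is destroyed first, then a
    return log;                // "+a+b-b-a"
}
```

Objects defined in the same scope are destroyed in the reverse order of their construction.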
6-4-4. Specific Operators in C++
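C++ has some specific keywords and operators, such as this, new, and
delete. Inside a member function, this is a pointer to the object on which
the function was called. The new operator is used for dynamic memory
allocation. It returns a pointer to the allocated memory, and delete is used
to destroy the object and release the memory to which this pointer refers.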
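
A minimal sketch of both forms of new and delete follows. The function names sum_with_new() and single_object_demo() are hypothetical. The point to notice is that memory obtained with new[] must be released with delete[], and memory obtained with new must be released with delete.

```cpp
// Sum the integers 1..n using a dynamically allocated array.
int sum_with_new(int n)
{
    int *values = new int[n];        // allocate n ints in dynamic memory
    for (int i = 0; i < n; i++)
        values[i] = i + 1;

    int total = 0;
    for (int i = 0; i < n; i++)
        total += values[i];

    delete[] values;                 // array form of delete matches new[]
    return total;
}

// Single-object form: new returns a pointer, delete releases the object.
int single_object_demo()
{
    int *p = new int(41);            // allocate one int initialized to 41
    int result = *p + 1;
    delete p;
    return result;
}
```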
6-4-5. Input / Output in C++
The C++ has a distinct I/O library, whose facilities are available through
the iostream classes. When you include <iostream> (iostream.h on older,
pre-standard compilers) in your file, you can use cin and cout for console
input/output (in standard C++ these names live in namespace std). Thus, you
can use the input stream cin, to input data from the standard console as
follows:
int A; cin >> A;
int A, B; cin >> A >> B ;
Note the use of the >> operator to input data from an input stream. Also,
you can use cout to output data to the standard console as follows:
int A; cout << A;
int A, B; cout << A << B ;
Note the use of the << operator to output data to an output stream. You
can also open and close files for different modes using the derived classes
ifstream and ofstream (declared in <fstream>) and their associated
functions, as follows:
#include <fstream>
using namespace std;
int main()
{
    ifstream fin; // fin is an ifstream object
    fin.open (filename, mode);
    :
    fin.close();
    :
    ofstream fout; // fout is an ofstream object
    fout.open (filename, mode);
    :
    fout.close();
    :
    return 0;
}

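The following Table depicts the available modes for file I/O in C++:
Table 6-11. File I/O modes in C++ language.

Mode          Meaning
ios::in       Open for reading
ios::out      Open for writing (creates the file if it does not exist)
ios::app      Append: every write goes to the end of the file
ios::ate      Move to the end of the file immediately after opening
ios::trunc    Discard the existing contents of the file
ios::binary   Binary (rather than text) mode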


The member functions associated with setting the get and put pointers are
seekg() and seekp(), which move the get and put pointer, respectively, to
the specified position. Both seek methods take either an absolute position
(streampos) or an offset relative to the beginning of the file (ios::beg),
the end of the file (ios::end), or the current position (ios::cur). tellg()
and tellp() return the current location of the get and put pointers. The
following lines clear up most questions:

Example 6-26:
seekg(0); seekg(0,ios::beg); //sets the get pointer to the beginning.
seekg(5,ios::beg); //sets get pointer to 5 chars forward of the beginning.
tellp(); tellg() //returns the current value of the put/get pointer
seekp(-10,ios::end); //sets the put pointer to 10 chars before the end
seekp(1,ios::cur); //proceeds to next char
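
The seek and tell calls can be tried without touching the disk. The sketch below uses an in-memory std::stringstream as a stand-in for a file; this substitution is an assumption made for the illustration, and the same member functions apply to ifstream and ofstream objects. The function names seek_demo() and tell_demo() are hypothetical.

```cpp
#include <sstream>
#include <string>

std::string seek_demo()
{
    std::stringstream s("0123456789");
    std::string out;

    s.seekg(5, std::ios::beg);            // 5 chars forward of the beginning
    out += static_cast<char>(s.get());    // reads '5', get pointer is now at 6
    s.seekg(1, std::ios::cur);            // skip one character ('6')
    out += static_cast<char>(s.get());    // reads '7'
    s.seekg(-1, std::ios::end);           // one character before the end
    out += static_cast<char>(s.get());    // reads '9'
    s.seekg(0);                           // back to the beginning
    out += static_cast<char>(s.get());    // reads '0'
    return out;                           // "5790"
}

long tell_demo()
{
    std::stringstream s("0123456789");
    s.seekg(3, std::ios::beg);
    return static_cast<long>(s.tellg());  // current get position: 3
}
```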

6-4-6. Inheritance in C++


In our pseudo language, we formulate inheritance with "inherits from". In
C++ these words are replaced by a colon. As an example let's design a
class for 3D points. Of course we want to reuse our already existing class
Point. We start designing our class as follows:

class Point3D : public Point {
    int _z;
public:
    Point3D() { setX(0); setY(0); _z = 0; }
    Point3D(const int x, const int y, const int z) {
        setX(x); setY(y); _z = z; }
    ~Point3D() { }
    int getZ() { return _z; }
    void setZ(const int val) { _z = val; }
};

i. Types of Inheritance
You might notice the keyword public used in the first line of the class
definition (its signature). This is necessary because C++ distinguishes
several types of inheritance, of which public and private are the most
common. By default, classes are privately derived from each other.
Consequently, we must explicitly tell the compiler to use public
inheritance. The type of inheritance influences the access rights to
elements of the various superclasses. Using public inheritance, everything
which is declared private in a superclass remains private to that
superclass, and so is not directly accessible in the subclass. Similarly,
everything which is public remains public. When using private inheritance
things are quite different, as is shown in the following table.

Table 6-12: Access rights and inheritance

Access right      Type of Inheritance
in superclass     Private       Public
Private           Private       Private
Protected         Private       Protected
Public            Private       Public

The leftmost column lists possible access rights for elements of classes. It
also includes a third type, protected. This type is used for elements which
are directly usable in subclasses but are not accessible from the outside.
The second and third columns show the resulting access rights of the
elements of a superclass when the subclass is privately and publicly
derived, respectively.
ii- Inherited Class Construction
When we create an instance of class Point3D its constructor is called.
Since Point3D is derived from Point the constructor of class Point is also
called. However, this constructor is called before the body of the
constructor of class Point3D is executed. In general, prior to the
execution of a particular constructor body, constructors of all superclasses
are called to initialize their part of the created object. For instance

Point3D point(1, 2, 3);

The second constructor of Point3D is invoked. Prior to the execution of
the constructor body, the constructor Point() is invoked, to initialize the
point part of object point. Fortunately, we have defined a constructor
which takes no arguments. This constructor initializes the 2D coordinates
_x and _y to 0 (zero). As Point3D is only derived from Point there are no
other constructor calls and the body of Point3D(const int, const int, const
int) is executed. Here we invoke methods setX() and setY() to overwrite the
2D coordinates. Subsequently, the value of the third coordinate _z is set.
This is wasteful, because the 2D coordinates are initialized twice. If class
Point also provides a parameterized constructor Point(const int, const int),
we only need to tell the compiler to use it instead of the default
constructor Point(). We can do that by specifying the desired constructors
after a single colon just before the body of the constructor of Point3D:
class Point3D : public Point {
...
public:
Point3D() { ... }
Point3D(const int x, const int y, const int z) : Point(x, y) {_z = z; }
...
};
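
The order of the constructor calls can be observed with the following sketch. It assumes a version of class Point that provides both a default and a parameterized constructor; the global string ctor_trace and the function coordinate_sum() are hypothetical and only record what happens.

```cpp
#include <string>

std::string ctor_trace;   // records which constructor bodies ran, in order

class Point {
    int _x, _y;
public:
    Point() : _x(0), _y(0) { ctor_trace += "Point();"; }
    Point(const int x, const int y) : _x(x), _y(y) { ctor_trace += "Point(x,y);"; }
    int getX() const { return _x; }
    int getY() const { return _y; }
};

class Point3D : public Point {
    int _z;
public:
    // The superclass constructor to use is named after the colon.
    Point3D(const int x, const int y, const int z) : Point(x, y), _z(z)
    { ctor_trace += "Point3D(x,y,z);"; }
    int getZ() const { return _z; }
};

int coordinate_sum()
{
    ctor_trace.clear();
    Point3D p(1, 2, 3);
    return p.getX() + p.getY() + p.getZ();
}
```

After a call to coordinate_sum() the trace shows that the body of Point(x, y) ran before the body of the Point3D constructor, and that the default constructor Point() was not used at all.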

If we had more superclasses, we would simply provide their constructor
calls as a comma separated list. We also use this mechanism to create
contained objects. For example, suppose that class Part defines a
constructor with one argument. Then to correctly create an object of class
Compound we must invoke the constructor of Part with its argument:
class Compound {
Part part;
...
public:
Compound(const int partParameter) : part(partParameter) {...}
...
};

This initialization syntax (a member initializer list) can also be used with
built-in data types. For example, the constructors of class Point could be
written as:
Point() : _x(0), _y(0) {}
Point(const int x, const int y) : _x(x), _y(y) {}

You should use this initialization method as often as possible, because it
allows the compiler to create variables and objects correctly initialized,
instead of creating them with a default value and then using an additional
assignment (or other mechanism) to set their values.
iii. Inherited class Destruction
If an object is destroyed, the destructor of the corresponding class is
invoked. If this class is derived from other classes their destructors are
also called, in the reverse order of construction, leading to a recursive
call chain.
iv- Multiple Inheritance
The C++ language allows a class to be derived from more than one
superclass, as was already mentioned. You can easily derive from more
than one class by specifying the superclasses in a comma separated list:

class DrawableString : public Point, public DrawableObject {


...
public:
DrawableString(...) : Point(...),DrawableObject(...) {...}
~DrawableString() { ... }
...
};


6-4.7. Polymorphism in C++


Another way to make objects work together is to define methods that take
different objects as parameters. You get even more cooperation and
efficiency when objects are unified by a common superclass. We can use
this in C++, through the polymorphism mechanism. At first, we define a
virtual method (declared here without a body) within a class, and then
derive other classes from this class that override the method.

class DrawableObject {
public:
    virtual void print();
};

The virtual method print() will be overridden and defined later in derived
classes. For instance, the derived class Point can define print() as follows:

class Point : public DrawableObject {


...
public:
...
void print() { …any definition… }
};

Again, print() is a virtual method, because it inherits this property from
DrawableObject.

Any other function, like display() which is able to display any kind of
DrawableObject, can then call the function print(), as follows:

void display(DrawableObject &obj) { // non-const reference, because print() is not declared const
    :
    // prepare anything necessary
    :
    obj.print(); // the version of print() that matches the actual object is called
}
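
The run-time selection of the method can be sketched as follows. For the purpose of checking the result, print() is assumed here to return a string instead of writing to the screen, and it is declared const so that display() can take a const reference. The class Circle is a hypothetical second subclass added for the illustration.

```cpp
#include <string>

class DrawableObject {
public:
    virtual ~DrawableObject() {}
    virtual std::string print() const { return "object"; }
};

class Point : public DrawableObject {
public:
    std::string print() const { return "point"; }     // overrides the base version
};

class Circle : public DrawableObject {                 // hypothetical second subclass
public:
    std::string print() const { return "circle"; }
};

// display() knows only the superclass; the call is resolved at run time
// according to the actual type of the object behind the reference.
std::string display(const DrawableObject &obj)
{
    return "<" + obj.print() + ">";
}
```

The same display() function therefore produces different results for a Point, a Circle and a plain DrawableObject.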

When using virtual methods some compilers complain if the class
destructor is not declared virtual as well. A virtual destructor is
necessary when objects of subclasses are destroyed through pointers to a
superclass. As the pointer is declared as a pointer to the superclass,
normally the superclass destructor would be called. If the destructor is
virtual, the destructor of the actual referenced object is called (and then,
recursively, all destructors of its superclasses). Here is an example:

class Color {
public:
virtual ~Color();
};
class Red : public Color {
public:
~Red(); // Virtuality inherited from Color
};
class LightRed : public Red {
public:
~LightRed();
};

Using these classes, we can define a palette as follows:

Color *palette[3];
palette[0] = new Red; // Dynamically create a new Red object
palette[1] = new LightRed;
palette[2] = new Color;

The newly introduced operator new creates a new object of the specified
type in dynamic memory and returns a pointer to it. Thus, the first new
returns a pointer to an allocated object of class Red and assigns it to the
first element of array palette. The elements of palette are pointers to
Color and, because Red is-a Color the assignment is valid. The operator
delete explicitly destroys an object referenced by a pointer. If we apply
delete to the elements of palette the following destructor calls happen:

delete palette[0];
// Call destructor ~Red() followed by ~Color()
delete palette[1];
// Call ~LightRed(), ~Red() and ~Color()
delete palette[2];
// Call ~Color()

The various destructor calls only happen because of the use of virtual
destructors. If we had not declared them virtual, each delete would
typically have called only ~Color() (because palette[i] is of type pointer
to Color); formally, the behavior of such a delete is undefined.
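
The order of the destructor calls can be recorded with a sketch like the one below. The destructor bodies and the global string dtor_log are assumptions added for the illustration; the class hierarchy is the one used above.

```cpp
#include <string>

std::string dtor_log;   // collects destructor names in call order

class Color {
public:
    virtual ~Color() { dtor_log += "~Color "; }
};
class Red : public Color {
public:
    ~Red() { dtor_log += "~Red "; }            // virtuality inherited from Color
};
class LightRed : public Red {
public:
    ~LightRed() { dtor_log += "~LightRed "; }
};

std::string destroy_light_red()
{
    dtor_log.clear();
    Color *c = new LightRed;   // superclass pointer to a subclass object
    delete c;                  // virtual destructor: most derived class first
    return dtor_log;
}
```

The log shows the chain running from the most derived class up to the superclass.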


6-4.8. Abstract Classes in C++

Abstract classes are defined just as ordinary classes. However, some of
their methods are designated to be defined by subclasses. We just
mention their signature including their return type, name and parameters
but not a definition, in the abstract class. One could say, we omit the
method body or, in other words, specify "nothing". This is expressed by
appending "= 0" after the method or function signatures:

class DrawableObject {
...
public:
...
virtual void print() = 0;
};

This class definition forces every derived class from which objects
are to be created to define a method print(). Such method declarations
are also called pure methods. Pure methods must also be declared virtual,
because we only want to use objects from derived classes. Classes which
declare pure methods are called abstract classes; no objects can be created
from an abstract class itself.
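
A minimal sketch of an abstract class and one concrete subclass is given below. As an assumption made for the illustration, print() returns a string rather than drawing anything, and the helper function describe() is hypothetical.

```cpp
#include <string>

class DrawableObject {
public:
    virtual ~DrawableObject() {}
    virtual std::string print() const = 0;   // pure method: no body here
};

class Point : public DrawableObject {
    int _x, _y;
public:
    Point(int x, int y) : _x(x), _y(y) {}
    std::string print() const {              // the subclass supplies the body
        return "(" + std::to_string(_x) + ", " + std::to_string(_y) + ")";
    }
};

// Works for every concrete subclass of DrawableObject.
std::string describe(const DrawableObject &obj) { return obj.print(); }

// DrawableObject d;   // would not compile: the class is abstract
```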
6-4.9. Operator Overloading
If we recall the abstract data type for complex numbers, Complex, we can
create a C++ class as follows:

class Complex {
    double _real, _imag;
public:
    Complex() : _real(0), _imag(0) {}
    Complex(const double real, const double imag) : _real(real), _imag(imag) {}
    Complex add(const Complex op);
    Complex mul(const Complex op);
    ...
};

Then, we are able to use complex numbers and calculate with them as follows:

Complex a(1.0, 2.0), b(3.5, 1.2), c;
c = a.add(b);

Here we assign c the sum of a and b. What we should rather use is the "+"
operator to express addition of two complex numbers. Fortunately, C++
allows us to overload almost all of its operators for newly created types.

For example, we could define a "+" operator for our class Complex as
follows:

class Complex {
    ...
public:
    ...
    Complex operator +(const Complex &op) {
        double real = _real + op._real,
               imag = _imag + op._imag;
        return(Complex(real, imag));
    }
    ...
};

In this case, we have made operator '+' a member of class Complex. An
expression of the form

c = a + b;

is translated into a method call

c = a.operator +(b);

Thus, the binary operator '+' only needs one argument. The first argument
is implicitly provided by the invoking object (in this case a). However,
an operator call can also be interpreted as a usual function call,

c = operator +(a, b);

In this case, the overloaded operator is not a member of a class. It is
rather defined outside as a normal overloaded function. For example, we
could define operator + in this way:

class Complex {
public:
    double real() { return _real; }
    double imag() { return _imag; }
    // No need to define operator here!
};
Complex operator +(Complex &op1, Complex &op2) {
    double real = op1.real() + op2.real(),
           imag = op1.imag() + op2.imag();
    return(Complex(real, imag));
}

In this case we must define access methods for the real and imaginary
parts because the operator is defined outside of the class's scope.
However, the operator is so closely related to the class, that it would
make sense to allow the operator to access the private members. This can
be done by declaring it as a friend of class Complex.

6-4.10. Friend Functions in C++


We can define functions or classes to be friends of a class to allow them
direct access to its private data members. For example, in the previous
section we would like to have the function for operator '+' to have access
to the private data members _real and _imag of class Complex. Therefore
we declare operator '+' to be a friend of class Complex:
class Complex {
    ...
public:
    ...
    friend Complex operator +( const Complex &, const Complex &);
};
Complex operator +(const Complex &op1, const Complex &op2) {
    double real = op1._real + op2._real,
           imag = op1._imag + op2._imag;
    return(Complex(real, imag));
}
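
Put together, a compilable sketch of the friend version could look as follows. The access methods real() and imag() are kept only so that the result can be inspected from outside the class; the operator itself reads the private members directly.

```cpp
class Complex {
    double _real, _imag;
public:
    Complex() : _real(0), _imag(0) {}
    Complex(const double real, const double imag) : _real(real), _imag(imag) {}
    double real() const { return _real; }
    double imag() const { return _imag; }
    // Non-member operator that is granted access to the private members.
    friend Complex operator +(const Complex &, const Complex &);
};

Complex operator +(const Complex &op1, const Complex &op2)
{
    return Complex(op1._real + op2._real, op1._imag + op2._imag);
}
```

With this definition both c = a + b; and the explicit call c = operator +(a, b); are valid and give the same result.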

You should not use friends very often because they break the data
abstraction principle. If you have to use friends very often it is always a
sign that it is time to restructure your inheritance graph.

6-4.11. Generic Types (Templates) in C++


Templates facilitate the generic definition of functions and classes so that
they are not tied to specific types. They remove the burden of redefining a
function or class so that it will work with yet another data type. In C++,
the generic data types are called class templates. A class template looks
like a normal class definition, where some aspects are represented by
placeholders. In the forthcoming list example we use this mechanism to
generate lists for various data types:

template <class T> class List : ... {


public:
...
void append(const T data);
...
};

In the first line we introduce the keyword template which starts every
template declaration. The arguments of a template are enclosed in angle
brackets. Each argument specifies a placeholder in the following class
definition. In our example, we want class List to be defined for various
data types. One could say, that we want to define a class of lists. In this
case the class of lists is defined by the type of objects they contain. We
use the name T for the placeholder. We now use T at any place where the
type of the actual objects is expected. For example, each list provides a
method to append an element to it. We can now define this method with
T. An actual list definition must now specify the type of the list. If we
stick to the class expression, we have to create a class instance. From this
class instance we can then create ``real'' object instances:

List<int> integerList;

Here we create a class instance of a List which takes integers as its data
elements. We specify the type enclosed in angle brackets. The compiler
applies the provided argument ``int'' and generates a class definition
where the placeholder T is replaced by int, for example, it generates the
following method declaration for append():

void append(const int data);
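
The instantiation step can be tried with a small stand-in class. The sketch below is not the book's List (whose base class and body are elided in the text); it assumes a std::vector as storage and adds two hypothetical helpers, size() and at(), so that the result of append() can be observed.

```cpp
#include <cstddef>
#include <vector>

template <class T>
class List {
    std::vector<T> _items;                  // assumed storage, not from the text
public:
    // For List<int> the compiler generates: void append(const int data);
    void append(const T data) { _items.push_back(data); }
    std::size_t size() const { return _items.size(); }        // hypothetical helper
    const T &at(std::size_t i) const { return _items[i]; }    // hypothetical helper
};
```

Declaring List<int> integerList; and List<double> realList; makes the compiler generate two independent classes from the same template text.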

Templates can take more than one argument to provide more
placeholders. For example, to declare a dictionary class which provides access
to its data elements by a key, one can think of the following declaration:

template <class K, class T>
class Dictionary {
...
public:
...
K getKey(const T from);
T getData(const K key);
...
};

Here we use two placeholders to be able to use dictionaries for various
key and data types. Template arguments can also be used to generate
parameterized class definitions. For example, a stack might be
implemented by an array of data elements. The size of the array is
specified as follows:


template <class T, int size>
class Stack {
T _store[size];
public:
... };
Stack<int,128> mystack;

Here, mystack is a stack of integers using an array of 128 elements. The
next listing shows the definition of doubly-linked lists as class templates.

#include <iostream>
using namespace std;
// The built-in type bool (with values false and true) replaces the
// user-defined enum Bool that pre-standard compilers required.
template <class Type> class List; // forward declaration
template <class Type>
class ListElem {
public:
ListElem (const Type elem) : val(elem) {prev = next = 0;}
Type& Value (void) {return val;}
ListElem * Prev (void) {return prev;}
ListElem * Next (void) {return next;}
friend class List<Type>; // one-to-one friendship
protected:
Type val; // the element value
ListElem *prev; // previous element in the list
ListElem *next; // next element in the list
}; //---------------------------------------------------------

template <class Type>
class List {
public:
List (void) {first = last = 0;}
~List (void);
virtual void Insert (const Type&);
virtual void Remove (const Type&);
virtual bool Member (const Type&);
friend ostream& operator <<(ostream&, List&);
protected:
ListElem<Type> *first; // first element in the list
ListElem<Type> *last; // last element in the list
};

Here List represents a doubly-linked list and ListElem represents a list
element. It consists of a value whose type is denoted by the type Type.
The forward declaration of the List class is necessary because ListElem
refers to List.
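
The listing only declares Insert, Remove and Member. One possible implementation is sketched below under stated assumptions: Insert adds at the front, Remove deletes the first matching element, and the names Node and DList are hypothetical stand-ins for ListElem and List so that the fragment is self-contained. Copying a DList is not handled in this sketch.

```cpp
template <class Type>
struct Node {                       // plays the role of ListElem
    Type val;
    Node *prev;
    Node *next;
    explicit Node(const Type &v) : val(v), prev(nullptr), next(nullptr) {}
};

template <class Type>
class DList {                       // plays the role of List
    Node<Type> *first;
    Node<Type> *last;
public:
    DList() : first(nullptr), last(nullptr) {}
    ~DList() {                      // release every element
        while (first) {
            Node<Type> *n = first->next;
            delete first;
            first = n;
        }
    }
    void Insert(const Type &v) {    // assumed policy: insert at the front
        Node<Type> *e = new Node<Type>(v);
        e->next = first;
        if (first) first->prev = e; else last = e;
        first = e;
    }
    bool Member(const Type &v) const {
        for (Node<Type> *p = first; p; p = p->next)
            if (p->val == v) return true;
        return false;
    }
    void Remove(const Type &v) {    // unlink and delete the first match
        for (Node<Type> *p = first; p; p = p->next) {
            if (p->val == v) {
                if (p->prev) p->prev->next = p->next; else first = p->next;
                if (p->next) p->next->prev = p->prev; else last = p->prev;
                delete p;
                return;
            }
        }
    }
};
```

The two if/else pairs in Remove are where the double linking pays off: an element can be unlinked using only its own prev and next pointers, without a second traversal.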


6-4.12. Additional Notes about C/C++


In addition to the object-oriented extensions, we present here some other
basic extensions that C++ adds to the concepts of C introduced earlier.
C++ adds a new comment style, which is introduced by two slashes (//) and
lasts until the end of the line. C++ also introduces a new data type called
a reference. You can think of a reference as an alias for a variable or object.
An alias cannot exist without its corresponding real part. The ampersand (&)
is used to define a reference. For example:
int ix; // ix is "real" variable
int &rx = ix; // rx is alias for ix

References can be used as function arguments and return values.
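
A short sketch of both uses follows. The function names swapInts and largest are hypothetical, chosen only for illustration.

```cpp
// Reference parameters: the function works on the caller's variables,
// so the swap is visible after the call (no pointers, no copies).
void swapInts(int &a, int &b) {
    int tmp = a;
    a = b;
    b = tmp;
}

// Reference return value: the call expression is an alias for one of
// the arguments, so the caller may even assign to it.
int &largest(int &a, int &b) {
    return (a > b) ? a : b;
}
```

After int ix = 2, iy = 10; the statement largest(ix, iy) = 0; sets iy to zero, because the returned reference is an alias for iy.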


6-4.13. Common Problems in C/C++
Since C++ is a superset of C, it inherits some of the bad features of the C language.
Manual allocation and deallocation of memory is tedious and error-prone.
For instance, the usage of char * and strcpy causes memory problems.
The following techniques are proposed to overcome these faults of C.

1. Use C++ references instead of pointers.
2. Use the string class instead of char *. The string class is part of the C++ standard library.
3. If you must use char *, put all your C code in separate files
and link it to your C++ programs by extern "C":

extern "C" {
#include <stdlib.h>
}
extern "C" {
void some_c_function(void); // declaration of a function compiled as C (return type assumed)
}
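
The second technique can be illustrated with a short sketch. The function name makeGreeting is hypothetical; std::string and its operators are part of the standard library.

```cpp
#include <string>

// Build a greeting without manual buffer management.
std::string makeGreeting(const std::string &name) {
    std::string result = "Hello, ";   // storage grows as needed
    result += name;                   // no strcpy/strcat, no fixed-size buffer
    return result;                    // memory is released automatically
}
```

With char * and strcpy the caller would have to size the destination buffer correctly and free it afterwards; here both concerns are handled by the string object itself.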

6-4.14. C++11
C++11 is a major revision of the C++ standard. It fixes many defects and adds
many language features, such as the auto keyword and lambda
(inline function) expressions. In C++11, you don't need to provide the type of a
variable if the compiler can determine its type from its initialization. For
example, you can write a piece of code like this:

int x = 3;
auto y = x;

Then, the compiler will deduce that y is an int.
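
The two features mentioned above are often used together. The following sketch (the function name countAbove is hypothetical) stores a lambda expression in an auto variable and passes it to a standard algorithm.

```cpp
#include <algorithm>
#include <vector>

// Count the elements of v that are greater than limit.
int countAbove(const std::vector<int> &v, int limit) {
    // The lambda captures limit by value; auto deduces its unnamed type.
    auto isAbove = [limit](int x) { return x > limit; };
    // auto also deduces the (signed) count type returned by count_if.
    auto n = std::count_if(v.begin(), v.end(), isAbove);
    return static_cast<int>(n);
}
```

A lambda has a compiler-generated type that cannot be written out by hand, which is one reason why auto is more than a typing convenience.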



6-5. Programming under Windows


Programming under DOS (or writing console applications) is based on the
concept of a main program that takes control of the CPU when the
program first starts running and then drives the application.
The main program calls the various functions and subroutines of the
application and requests operating system services via calls to
DOS, which return once DOS has delivered the requested service.
However, programming under Windows makes use of a different
mechanism. Windows takes control of the show and communicates
with your application through messages. It is, effectively, the "main
program" that tracks events throughout the system and calls functions
within various applications while accumulating events (like keyboard
strokes or mouse clicks) that it feels the application needs to service.
Once the program is running, Windows calls a special
procedure (the window procedure, WndProc) and passes it messages. Part of each message
tells the window procedure what event has occurred that the window
procedure has to handle. The window procedure then transfers control
(dispatches) to a function that handles that particular event. This
completely changes the way one writes a program from the application
programmer's perspective.

6-5.1. Windows Messaging System


However, if you have ever written a 16-bit DOS
application in assembly language, you have already done some message
passing. Calling DOS through the interrupt INT 21H is analogous to
Windows calling the window procedure WndProc. The values you pass in the
x86 processor registers correspond to the message and, in particular, the
value in the AH register selects the particular DOS function you wish to
invoke. Although the perspective is different (Windows is calling you
instead of you calling Windows), the basic idea behind message passing
is exactly the same. Figure 6-5 depicts the block diagram of a typical
Windows application and how it runs under Windows.
The infinite message loop captures the Windows messages, translates
them and dispatches them, via WndProc, as follows:

while (GetMessage(&msg, NULL, 0, 0) > 0) // 0 means WM_QUIT, -1 means error
{
TranslateMessage (&msg);
DispatchMessage (&msg);
}


[Diagram: Windows messages enter WinMain( ), whose message loop passes them to WndProc( ) until Exit( ).]

Fig. 6-5. Block diagram of a typical Windows application program and its interaction
with Windows via messages.

To process the messages, a message queue is created for the window or
the thread. It collects all incoming messages. You retrieve them using the
function GetMessage. GetMessage always returns with a message, because
this function waits (and releases CPU time to other programs)
until a message has arrived.

If you don't want to wait, you can use PeekMessage instead. This
function returns immediately. Therefore it also returns whether a message
has arrived at all. If the return value of GetMessage equals 0, WM_QUIT
has occurred. Using DispatchMessage you forward the message to the
window procedure by means of the OS.
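
The control flow of the message loop can be studied without the Win32 headers. The sketch below is a portable simulation, not Windows code: the message identifiers, the Msg structure and the functions getMessage, wndProc and runLoop are all hypothetical stand-ins, and an empty queue is treated like a quit request because this simulation cannot block.

```cpp
#include <queue>
#include <vector>

// Hypothetical message identifiers; the real Win32 values differ.
enum MsgId { MSG_KEYDOWN, MSG_PAINT, MSG_QUIT };

struct Msg { MsgId id; int param; };

// Stand-in for GetMessage: fetches the next message and returns false
// once the quit message has been retrieved (or the queue is empty).
bool getMessage(std::queue<Msg> &q, Msg &out) {
    if (q.empty()) return false;
    out = q.front();
    q.pop();
    return out.id != MSG_QUIT;
}

// Stand-in for the window procedure: handles one message per call.
void wndProc(const Msg &m, std::vector<int> &keysSeen, int &paints) {
    switch (m.id) {
    case MSG_KEYDOWN: keysSeen.push_back(m.param); break;
    case MSG_PAINT:   ++paints;                    break;
    default:          break;        // unhandled messages are ignored
    }
}

// The message loop: runs until the quit message arrives and
// returns the number of messages that were dispatched.
int runLoop(std::queue<Msg> &q, std::vector<int> &keysSeen, int &paints) {
    Msg m;
    int dispatched = 0;
    while (getMessage(q, m)) {
        wndProc(m, keysSeen, paints);   // plays the role of DispatchMessage
        ++dispatched;
    }
    return dispatched;
}
```

Messages queued after the quit message are never dispatched, which mirrors the way a Windows program leaves its loop as soon as GetMessage returns 0.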

Resources are data which are linked in the program file. This feature is
used to include icons, menus, and multiple language support. In order to
create resources, you need a resource script (*.RC). It describes the
resources to be linked in your *.exe file. You can create resource scripts
using a text editor or a resource editor. It's compiled together with the
data to a *.res or *.obj file, which then gets passed to the linker.
To use Win32 functions, you have to include the required *.lib files in
the program. While TASM stores all functions in import32.lib, MASM
has a separate LIB for every DLL. That means that if you use MASM,
you have to check what DLL contains the function you need.

Table 6-13. Brief list of important Windows messages

Window Message     Function
WM_CREATE          The window has been created. This message is
                   automatically produced by CreateWindowEx.
WM_QUIT            The program is to exit.
WM_DESTROY         The window is to be destroyed.
WM_PAINT           The window is to be re-painted.
WM_KEYDOWN,        A key has been pressed or released. Useful: the
WM_KEYUP           scan code is included.
WM_SYSKEYDOWN,     Like above, but combined with the ALT key (menu
WM_SYSKEYUP        commands).
WM_MOUSEMOVE       The mouse has been moved. The message also contains the
                   state of the mouse buttons. For the mouse buttons
                   themselves there are also WM_LBUTTONDOWN, WM_LBUTTONUP,
                   WM_RBUTTONDOWN, WM_RBUTTONUP.
WM_TIMER           This is sent by the standard timer if it's activated. The
                   multimedia timers, however, are a better solution for
                   faster demos and games.
WM_ACTIVATE        The window is activated or deactivated. You should
                   evaluate this message in demos and games in order to
                   stop all threads to prevent the program from continuing
                   in the background.
WM_COMMAND         A menu item has been selected.


6-5.2. Writing Windows DLL in C/C++ and Assembly Languages


Most of the programming literature about Windows 95/98/NT/XP and Vista
deals exclusively with the C/C++ programming languages. As
the amount of memory in a system is always limited, it makes perfect
sense to make application programs as small and as fast as possible. We
duly note that the smallest and fastest programs are those written in
assembly language. Moreover, if anything needs to be optimized in assembly
language, it is the dynamic link library (DLL). After all, a DLL
is meant to be shared by several programs, perhaps running at the same
time. So it is essential that it fit in as little memory as possible.

We present in Appendix F a sample assembly program to show how to
write an optimized DLL for both assembly and C languages. The program
is named FileCRC.asm. It calculates the CRC-32 of a file and can be used
in many applications. If you read the comments in the code, you will
notice that calling a DLL function is fast when we pass
parameters in the CPU registers. Assembly language programs can call all
functions in crc32.dll using registers, while C and other programs can use
the stack.

This program is a classical example showing how efficient it is to use
registers. CRC-32 is typically called from a big loop and the return value
of the CRC-32 calculation is passed as a parameter to the next CRC-32
calculation. Using EAX for both the return value and the parameter to the
next call means you do not have to worry about passing the parameter at
all. This presents a dilemma: Should DLLs written in assembly language
use an HLL interface to be useful to as many programmers as possible, or
should they pass parameters in registers? The answer shown here is
simple: Use both. Code it for an assembly language interface, but also
include functions callable from C (or any other high-level language). As it
turns out, you can just pop the parameters off the stack and fall through to
the fast routines. This is thanks to the STDCALL interface used in
Windows 9X, in which the called function clears the stack, not the caller.
You can even give the C-language and assembly language functions the
same name, with capitalization being the only difference. The STDCALL
convention, after all, is case sensitive.

Two versions of the CRC-32 function are present in Appendix F, one
suitable for calls from high-level languages (HLL) like C or C++, the
other for calls from assembly language programs.
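
The chaining of CRC-32 calls described above (the result of one call becomes the input of the next) can be sketched in portable C++. This is not the FileCRC.asm code of Appendix F: the function name crc32Update and its interface are assumptions, and the simple bit-by-bit loop is chosen for clarity rather than speed. It uses the common reflected CRC-32 polynomial 0xEDB88320.

```cpp
#include <cstddef>
#include <cstdint>

// Update a running CRC-32 with len bytes of data.
// Pass 0 as crc for the first block, then feed each result back in.
std::uint32_t crc32Update(std::uint32_t crc, const unsigned char *data,
                          std::size_t len) {
    crc = ~crc;                               // undo the previous final inversion
    for (std::size_t i = 0; i < len; ++i) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; ++bit) {   // process one bit at a time
            if (crc & 1u) crc = (crc >> 1) ^ 0xEDB88320u;
            else          crc >>= 1;
        }
    }
    return ~crc;                              // final inversion
}
```

Processing a file block by block gives the same value as processing it in one call, which is why the return value is simply handed to the next call inside the loop.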


6-6. Writing Assembly Blocks in C/C++ Programs


As we stated above, every C program has a main function and sometimes
other functions. If we would like to add a sequence of assembly
instructions inside the main function (or any other function), they should
be written within an __asm { instruction sequence } block.

6-6.1. The _asm Keyword in Visual C/C++


The __asm keyword invokes the inline assembler and can appear
wherever a C or C++ statement is legal. It cannot appear by itself. It must
be followed by an assembly instruction, a group of instructions enclosed
in braces, or, at the very least, an empty pair of braces. The term "__asm
block" here refers to any instruction or group of instructions, whether or
not in braces. The following code fragment is a simple __asm block
enclosed in braces:

__asm
{
MOV AL, 2
MOV DX, 0xD007
OUT DX, AL
}

Alternatively, you can put __asm in front of each assembly instruction:

__asm MOV AL, 2
__asm MOV DX, 0xD007
__asm OUT DX, AL

Because the __asm keyword is a statement separator, you can also put
assembly instructions on the same line:

__asm MOV AL, 2 __asm MOV DX, 0xD007 __asm OUT DX, AL

All three examples generate the same code, but the first style
(enclosing the __asm block in braces) has some advantages. The braces
clearly separate assembly code from C/C++ code and avoid needless
repetition of the __asm keyword. Braces can also prevent ambiguities. If
you want to put a C/C++ statement on the same line as an __asm block,
you must enclose the block in braces. Without braces, the compiler
cannot tell where assembly code stops. Finally, because the text in braces
has the same format as MASM text, you can cut and paste text from
existing MASM source files.

Unlike braces in C/C++, the braces enclosing an __asm block don't affect
the variable scope.

6-6.2. Using C or C++ Symbols in __asm Blocks


An __asm block can refer to any C/C++ symbol in scope where the block
appears. Note that C/C++ symbols are variable names, function names
and labels; that is, names that aren't symbolic constants or enum
members. Therefore, you cannot call C++ member functions. The
following few restrictions apply to C/C++ symbols:

• Each assembly-language statement can contain only one C/C++
symbol. Multiple symbols can appear in the same assembly instruction
only with LENGTH, TYPE, and SIZE expressions.
• Functions referenced in an __asm block must be declared (prototyped)
earlier in the program. Otherwise, the compiler cannot distinguish
between function names and labels in the __asm block.
• An __asm block cannot use any C/C++ symbols with the same
spelling as MASM reserved words. MASM reserved words include
instruction names such as PUSH and register names such as SI.
• Structure and union tags are not recognized in __asm blocks.

6-6.3. Accessing C or C++ Data in __asm Blocks


A great convenience of inline assembly is the ability to refer to C/C++
variables by name. An __asm block can refer to any symbols, including
variable names, that are in scope where the block appears. For instance, if
the C variable var is in scope, the following instruction stores the value
of var in EAX

__asm MOV EAX, var

If a class, structure, or union member has a unique name, an __asm block
can refer to it using only the member name, without specifying the
variable or typedef name before the period (.) operator. If the member
name is not unique, however, you must place a variable or typedef name
immediately before the period operator. For example, the following
structure types share same_name as their member name:

struct first_type {
char *weasel; int same_name;
};

struct second_type
{
int waq; long same_name;
};

If you declare variables with the types

struct first_type hal;
struct second_type oat;

all references to the member same_name must use the variable name
because same_name is not unique. But the member weasel has a
unique name, so you can refer to it using only its member name:

__asm
{
MOV EBX, OFFSET hal
MOV ECX, [EBX] hal.same_name ; Must use 'hal'
MOV ESI, [EBX].weasel ; Can omit 'hal'
}

Note that omitting the variable name is merely a coding convenience. The
same assembly instructions are generated whether or not the variable
name is present. You can access data members in C++ without regard to
access restrictions. However, you cannot call member functions.
6-6.4. Writing Functions with Inline Assembly
If you write a function with inline assembly code, it's easy to pass
arguments to the function and return a value from it. The following
examples compare a function first written for a separate assembler and
then rewritten for the inline assembler. The function, called power2,
receives two parameters, multiplying the first parameter by 2 to the power
of the second parameter. Written for a separate assembler, the function
might look like this:
; POWER.ASM
; Compute the power of an integer
PUBLIC _power2
_TEXT SEGMENT WORD PUBLIC 'CODE'
_power2 PROC
PUSH EBP ; Save EBP
MOV EBP, ESP ; Move ESP into EBP so we can
; refer to arguments on the stack
MOV EAX, [EBP+8] ; Get first argument (above saved EBP and return address)
MOV ECX, [EBP+12] ; Get second argument
SHL EAX, CL ; EAX = EAX * ( 2 ^ CL )
POP EBP ; Restore EBP
RET ; Return with result in EAX
_power2 ENDP
_TEXT ENDS
END

Since it is written for a separate assembler, the function requires a
separate source file and assembly and link steps. C and C++ function
arguments are usually passed on the stack, so this version of the power2
function accesses its arguments by their positions on the stack. Note that
the MODEL directive, available in MASM and some other assemblers,
also allows you to access stack arguments and local stack variables by
name. The POWER2.C program writes the power2 function with inline
assembly code:

// POWER2.C
#include <stdio.h>

int power2( int num, int power );

int main( void )
{
printf( "3 times 2 to the power of 5 is %d\n", power2( 3, 5) );
return 0;
}
int power2( int num, int power )
{
__asm
{
MOV eax, num ; Get first argument
MOV ecx, power ; Get second argument
SHL eax, cl ; EAX = EAX * ( 2 to the power of CL )
}
/////////////// Return with result in EAX ////////////////////////////////
}


The inline version of the power2 function refers to its arguments by
name and appears in the same source file as the rest of the program. This
version also requires fewer assembly instructions. Because the inline
version of power2 does not execute a C return statement, it causes a
harmless warning if you compile at warning level 2 or higher. The
function does return a value, but the compiler cannot tell that in the
absence of a return statement.

You can use #pragma warning to disable the generation of this warning.
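
For comparison, the same computation can be written without any assembly. The sketch below is a portable counterpart, not the inline-assembly version: it assumes a non-negative num and a shift count small enough that the result fits in an int, and it uses an explicit return statement, so no warning arises.

```cpp
// Portable counterpart of power2: num * 2^power via a left shift.
// Assumes num >= 0 and 0 <= power < 31 with no overflow;
// other inputs are not handled in this sketch.
int power2(int num, int power) {
    return num << power;    // same effect as SHL EAX, CL on these inputs
}
```

For the call power2( 3, 5 ) used in POWER2.C, this returns 3 * 32 = 96.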

6-6.5. Jumping to Labels in Inline Assembly


Like an ordinary C or C++ label, a label in an __asm block has scope
throughout the function in which it is defined (not only in the block).
Both assembly instructions and goto statements can jump to labels inside
or outside the __asm block.

Labels defined in __asm blocks are not case sensitive; both goto
statements and assembly instructions can refer to those labels without
regard to case. C and C++ labels are case sensitive only when used by
goto statements. Assembly instructions can jump to a C or C++ label
without regard to case. The following code shows all the permutations:

void func( void )
{
goto C_Dest; // Legal: correct case
goto c_dest; // Error: incorrect case
goto A_Dest; // Legal: incorrect case
goto a_dest; // Legal: correct case
__asm
{
jmp C_Dest ; Legal: correct case
jmp c_dest ; Legal: incorrect case
jmp A_Dest ; Legal: incorrect case
jmp a_dest ; Legal: correct case
:
a_dest: ; __asm label
}
C_Dest: // C label
return;
}


Do not use C library function names as labels in __asm blocks. For
instance, you might be tempted to use exit as a label, as follows:

; This is BAD: using library function name (like EXIT) as a label
JNE EXIT
:
EXIT:

Because exit is the name of a C library function, this code might cause a
jump to the exit function instead of to the desired location. As in MASM
programs, the dollar symbol ($) serves as the current location counter. It
is a label for the instruction currently being assembled. The main use of
$ in __asm blocks is to make long conditional jumps:

JNE $+5 ; next instruction is 5 bytes long
JMP farlabel
; $+5
:
farlabel:

6-6.6. Calling C-Functions in Inline Assembly


The __asm block can call C functions, including C library routines. The
following example calls the printf library routine:

#include <stdio.h>
char format[] = "%s %s\n";
char hello[] = "Hello";
char world[] = "WORLD";
int main( void )
{
__asm
{
MOV EAX, OFFSET world
PUSH EAX
MOV EAX, OFFSET hello
PUSH EAX
MOV EAX, OFFSET format
PUSH EAX
CALL printf
//clean up the stack so that main can exit cleanly
//use the unused register EBX to do the cleanup
POP EBX
POP EBX
POP EBX
}
return 0;
}

Because function arguments are passed on the stack, you simply push the
needed arguments (string pointers, in the previous example) before
calling the function. The arguments are pushed in reverse order, so they
come off the stack in the desired order. To emulate the C statement

printf( format, hello, world );

this example pushes pointers to world, hello, and format, in that order, and
then calls printf.

6-6.7. Calling C++ Functions in Inline Assembly


An __asm block can call only global C++ functions that are not
overloaded. If you call an overloaded global C++ function or a C++
member function, the compiler issues an error. You can also call any
functions declared with extern "C" linkage. This allows an __asm block
within a C++ program to call the C library functions, because all the
standard header files declare the library functions to have extern "C"
linkage.
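
The restriction on overloaded functions follows from the way C++ names them for the linker. The sketch below uses hypothetical function names (scale, scale_int) to contrast an overloaded pair with an extern "C" wrapper that has one unambiguous name.

```cpp
// Two overloads share one source-level name; the compiler tells them
// apart by encoding the parameter types into the linker symbol
// (name mangling), so the bare name "scale" is ambiguous.
int scale(int x)       { return x * 2; }
double scale(double x) { return x * 2.0; }

// extern "C" suppresses the mangling: the function gets a single,
// C-style symbol and cannot itself be overloaded.
extern "C" int scale_int(int x) { return scale(x); }
```

A wrapper of this kind is one way to make an overloaded C++ function reachable from code that can only refer to a function by a single plain name.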

6-6.8. Interrupts in Inline Assembly


The following simple (and safe) program makes use of the INT 5 instruction,
which invokes the PC print screen function. You can compile this
program with a C/C++ compiler and execute it, under DOS, to check its
operation.

#include <iostream>
int main()
{
_asm {
INT 5
}
return 0;
}


Now consider the following example, which reads one character from the
keyboard and displays it on the screen, if it is between '0' and '9'.

#include <stdio.h>
int main()
{
_asm {
MOV AH,8
INT 21H
CMP AL,'0'
JB CORNER
CMP AL,'9'
JA CORNER
MOV DX,AX
MOV AH,2
INT 21H
}
CORNER:
return 0;
}

In this program we make use of INT 21H to call various DOS functions.
For instance, the keyboard input function is called by loading the
accumulator high byte, AH, with 8H and then calling INT 21H. Also, the
video output function is called by loading AH with 2H and then calling
INT 21H again. Note that if the input character is below '0' or above '9', the
assembly routine invokes conditional jump instructions (JB, which means
jump if below, and JA, which means jump if above) to transfer control to
an external location (the label CORNER) outside the assembly block.

It should be noted that DOS function calls (by INT 21H) are generally
not available in 32-bit Windows applications. So, if you'd like to perform data
input/output from/to the console in a 32-bit Windows application, use the
console functions _getch() to input characters (bytes) or _putch() to
display characters (bytes).
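
The range test performed by the CMP/JB/JA sequence above can also be expressed in portable C++. In the sketch below the function name isDecimalDigit is hypothetical; each comparison is annotated with the jump it corresponds to.

```cpp
// Returns true when c lies in the range '0'..'9', mirroring the
// two compare-and-jump pairs of the assembly routine.
bool isDecimalDigit(char c) {
    if (c < '0') return false;   // corresponds to JB CORNER
    if (c > '9') return false;   // corresponds to JA CORNER
    return true;                 // falls through to the output step
}
```

In a 32-bit console program, a character obtained from _getch() could be passed through such a test before it is handed to _putch().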


6-7. Java Programming Language (Summary)


The Java programming language was developed by Sun Microsystems as
a fast programming language for internet applications. Java was originally called
Oak, and was designed by James Gosling for use in embedded
electronic applications. After several years of experience with the
language, it was re-targeted to the Internet, and renamed Java. Java is a
high-level language that is characterized by simplicity, portability and
robustness. Java is an object-oriented programming language whose syntax
was derived largely from C and C++. Java is also platform-independent.
Platform independence is one of its most significant
features: you can write your code once and run it on any
computer, under any operating system, equipped with a Java virtual
machine (JVM). As you know, most programming languages need to
be either compiled or interpreted so that you can run programs on your
computer. The Java programming language is unusual in that a program
is both compiled and interpreted. The compiler first translates a program
into an intermediate language called Java bytecode. Compilation happens
just once and interpretation occurs each time the program is executed on a
computer. The following figure illustrates how compilation and
interpretation of Java programs work.

Fig. 6-6. Compilation and interpretation of Java programs

Sun Microsystems provides Java Development Kits (JDK) for many
platforms including Windows and Linux. Sun also provides a standard
edition for its Java platforms. You can obtain the Java 2 Standard Edition
(J2SE) and Enterprise Edition (J2EE) from the java.sun.com web site.
Microsoft Visual J++ is also a powerful tool that offers a visual
environment to create, test and deploy Java programs. The following
sections show how Java is both a programming language and a platform.


6-7.1. Variable Declaration in Java


The Java programming language is statically typed, which means that all
variables must first be declared before they can be used. This involves
stating the variable's type and name, as you've already seen in the C
programming language:
int gear = 1;

This tells the compiler that a field named "gear" exists, holds numerical
data, and has an initial value of "1". Other examples are as follows:

char capitalC = 'C';
byte b = 100;
double d1 = 123.4;

6-7.2. Primitive Data Types


The following table summarizes the main variable types and their default
values in Java. The eight primitive data types are: byte, short, int, long,
float, double, boolean, and char.

Table 6-14. Basic data types in Java

Type      Description                                        Default Value
byte      8-bit signed two's complement integer              0
short     16-bit signed two's complement short integer       0
int       32-bit signed two's complement integer             0
long      64-bit signed two's complement long integer        0L
float     Single-precision 32-bit IEEE 754 floating point    0.0f
double    Double-precision 64-bit IEEE 754 floating point    0.0d
char      16-bit Unicode character                           '\u0000'
boolean   One bit (true or false)                            false

In addition to the eight primitive data types listed above, the Java
programming language also provides special support for character strings
via the java.lang.String class. Enclosing your character string within
double quotes will automatically create a new String object; for
example, String s = "this is a string"; String objects
are immutable, which means that once created, their values cannot be
changed. The String class is not technically a primitive data type, but
you may think of it as such.


6-7.3. Enum Types


An enum type is a type whose fields consist of a fixed set of constants.
Common examples include directions (NORTH, SOUTH, EAST, and
WEST) and the days of the week. Because they are constants, the names
of an enum type's fields are in uppercase letters. In Java, you define an
enum type by using the enum keyword. For example, you may specify a
day-of-the-week type as follows:

public enum Day {
SUNDAY, MONDAY, TUESDAY, WEDNESDAY, THURSDAY,
FRIDAY, SATURDAY }

You can use enum types when you need to represent a fixed set of
constants. This includes natural enum types such as the solar system
planets, the choices on a menu and data sets where you know all possible
values at compile time.

6-7.4. Arrays in Java


An array is a container object that holds a fixed number of values of a
single type. The length of an array is established when the array is
created. After its creation, the array length is fixed. Each item in an array
is called an element, and each element is accessed by its numerical index.
As shown in the following illustration, numbering begins with 0. The 9th
element, for example, would therefore be accessed at index 8.

Like variables of other types, an array declaration has two components:
the array type and the array name. For example, you may declare
myArray as follows:

int[] myArray; // declaration of an array of integers

You may also place the square brackets after the array name:

float myArrayFloats[]; // this form is not recommended

An array is created with the new operator. The next statement
allocates an array with ten integer elements and assigns the array to the
myArray variable.

myArray = new int[10]; // create an array of 10 integers

You may assign values to each element of the array as follows:


myArray[0] = 100; // initialize first element
myArray[1] = 200; // initialize second element

6-7.5. Java Operators


Operators are special symbols that perform specific operations on one,
two, or three operands, and then return a result. One of the most common
operators is the simple assignment operator "=". It assigns the value on its
right to the operand on its left, like this:

int cadence = 0;

The following table summarizes all the Java operators and their
precedence.

Table 6-15. Java operators

Operators               Precedence
postfix                 expr++ expr--
unary                   ++expr --expr +expr -expr ~ !
multiplicative          * / %
additive                + -
shift                   << >> >>>
relational              < > <= >= instanceof
equality                == !=
bitwise AND             &
bitwise exclusive OR    ^
bitwise inclusive OR    |
logical AND             &&
logical OR              ||
ternary                 ? :
assignment              = += -= *= /= %= &= ^= |= <<= >>= >>>=

The following reference summarizes the operators supported by Java.



i- Simple Assignment Operator
=           Simple assignment operator

ii- Arithmetic Operators
+           Additive operator (also used for String concatenation)
-           Subtraction operator
*           Multiplication operator
/           Division operator
%           Remainder operator

iii- Unary Operators
+           Unary plus operator; indicates positive value
-           Unary minus operator; negates an expression
++          Increment operator; increments a value by 1
--          Decrement operator; decrements a value by 1
!           Logical complement operator; inverts the value of a boolean

iv- Equality and Relational Operators
==          Equal to
!=          Not equal to
>           Greater than
>=          Greater than or equal to
<           Less than
<=          Less than or equal to

v- Conditional Operators
&&          Conditional-AND
||          Conditional-OR
?:          Ternary (shorthand for if-then-else statement)

vi- Type Comparison Operator
instanceof  Compares an object to a specified type

vii- Bitwise and Bit Shift Operators
~           Unary bitwise complement
<<          Signed left shift
>>          Signed right shift
>>>         Unsigned right shift
&           Bitwise AND
^           Bitwise exclusive OR
|           Bitwise inclusive OR
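
A few of the operators listed above behave in ways that are easy to get
wrong, so the following short sketch exercises them. The class name
OperatorDemo and the method sizeLabel are illustrative names of our own;
the values in the comments follow directly from the operator definitions.

```java
class OperatorDemo {
    // The ternary operator picks one of two values based on a condition.
    static String sizeLabel(int a, int b) {
        return (a > b) ? "a is larger" : "a is not larger";
    }

    public static void main(String[] args) {
        int a = 7, b = 2;
        System.out.println(a / b);         // integer division truncates: 3
        System.out.println(a % b);         // remainder: 1
        System.out.println(1 + 2 * 3);     // * binds tighter than +: 7
        System.out.println((1 + 2) * 3);   // parentheses come first: 9

        int i = 5;
        int post = i++;    // post receives 5, then i becomes 6
        int pre = ++i;     // i becomes 7, then pre receives 7
        System.out.println(post + " " + pre);

        System.out.println(-16 >> 2);      // signed shift keeps the sign: -4
        System.out.println(-16 >>> 28);    // unsigned shift fills with 0s: 15
        System.out.println(6 & 3);         // bitwise AND: 2
        System.out.println(6 ^ 3);         // bitwise exclusive OR: 5
        System.out.println(6 | 3);         // bitwise inclusive OR: 7
        System.out.println(sizeLabel(a, b));
    }
}
```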

6-7.6. Expressions in Java


An expression is a construct made up of variables, operators, and method
invocations that is constructed according to the syntax of the language
and evaluates to a single value. You've already seen examples of
expressions, such as cadence = 0 and myArray[0] = 100 in the statements
below:

int cadence = 0;
myArray[0] = 100;

The Java programming language allows you to construct compound
expressions from various smaller expressions as long as the data type
required by one part of the expression matches the data type of the other.
Here's an example of a compound expression:

int j = 1 * 2 * 3;

6-7.7. Java Statements and Blocks


Statements are roughly equivalent to sentences in natural languages. A
statement forms a complete unit of execution. The following types of
expressions can be made into a statement by terminating the expression
with a semicolon ( ; ).
• Assignment expressions
• Any use of ++ or --
• Method invocations
• Object creation expressions
Such statements are called expression statements. Here are some
examples of expression statements.

aValue = 8933.234; // assignment statement


aValue++; // increment statement

In addition to expression statements, there are two other kinds of
statements: declaration statements and control flow statements. A block
of code is a group of zero or more statements between balanced braces and
can be used anywhere a single statement is allowed. The following example
illustrates the use of blocks:
class BlockDemo {
    public static void main(String[] args) {
        boolean condition = true;
        if (condition)
        { // begin block 1
            System.out.println("Condition is true.");
        } // end block 1
        else
        { // begin block 2
            System.out.println("Condition is false.");
        } // end block 2
    }
}

6-7.8. Control Flow Statements


The statements inside source programs are generally executed from top to
bottom. However, control flow statements break up the flow of execution
by employing decision making, looping, and branching, enabling your
program to conditionally execute particular blocks of code. We describe
here the decision-making statements (if-then, if-then-else, switch), the
looping statements (for, while, do-while), and the branching statements
(break, continue, return) of the Java language.
i. The if-then Statement
The if-then statement is the most basic of all the control flow statements.
It tells the program to execute a certain section of code only if a particular
test evaluates to true. For example, the Bicycle class could allow the
brakes to decrease the bicycle speed if the bicycle is already in motion.
One possible implementation of applyBrakes may be as follows:

void applyBrakes() {
    if (isMoving) {        // the "if" clause: the bicycle must be moving
        currentSpeed--;    // the "then" clause: decrease the current speed
    }
}

ii. The if-then-else Statement


The if-then-else statement provides a secondary path of execution when
an "if" clause evaluates to false. You could use an if-then-else statement
in the applyBrakes method to take some action if the brakes are applied
when the bicycle is not in motion. In this case, the action is to print an
error message stating that the bicycle has already stopped.
void applyBrakes()
{
    if (isMoving)
    {
        currentSpeed--;
    }
    else
    {
        System.err.println("The bicycle has already stopped!");
    }
}

iii. The switch Statement


Unlike if-then and if-then-else, the switch statement allows for any
number of possible execution paths. A switch works with the byte, short,
char, and int primitive data types. It also works with enumerated types
and, since Java SE 7, with the String class.
The following program, SwitchDemo, declares an int named month
whose value represents a month of the year. The program displays the
name of the month, based on its value, using the switch statement.

class SwitchDemo
{
public static void main(String[] args)
{
int month = 8;
switch (month) {
case 1: System.out.println("January"); break;
case 2: System.out.println("February"); break;
case 3: System.out.println("March"); break;
case 4: System.out.println("April"); break;
case 5: System.out.println("May"); break;
case 6: System.out.println("June"); break;
case 7: System.out.println("July"); break;
case 8: System.out.println("August"); break;
case 9: System.out.println("September"); break;
case 10: System.out.println("October"); break;
case 11: System.out.println("November"); break;
case 12: System.out.println("December"); break;
default: System.out.println("Invalid month.");break;
}
}
}
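
Each case in SwitchDemo ends with break. Without it, execution "falls
through" into the statements of the next case. The sketch below uses this
behavior on purpose, so that several case labels share one block of code.
The class name SwitchGroupDemo and the method daysInMonth are illustrative
names of our own, and leap years are deliberately ignored.

```java
class SwitchGroupDemo {
    // Number of days in a month of a non-leap year, or -1 if invalid.
    static int daysInMonth(int month) {
        int days;
        switch (month) {
            case 4: case 6: case 9: case 11:
                days = 30;      // the four labels above share this block
                break;
            case 2:
                days = 28;
                break;
            case 1: case 3: case 5: case 7: case 8: case 10: case 12:
                days = 31;
                break;
            default:
                days = -1;      // not a valid month number
                break;
        }
        return days;
    }

    public static void main(String[] args) {
        System.out.println("Days in month 8: " + daysInMonth(8));
        System.out.println("Days in month 2: " + daysInMonth(2));
    }
}
```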

iv. The while and do-while Statements


The while statement continually executes a block of statements while a
particular condition is true. Its syntax can be expressed as:

while (expression)
{
statement(s)
}

The while statement evaluates expression, which must return a boolean
value. If the expression evaluates to true, the while statement executes
the statement(s) in the while block. The while statement continues testing
the expression and executing its block until the expression evaluates to
false. You can implement an infinite loop using the while statement as
follows:

while (true)
{
// your code goes here
}

The Java programming language also provides a do-while statement, which
can be expressed as follows:

do
{
statement(s)
} while (expression);

The difference between do-while and while is that do-while evaluates its
expression at the bottom of the loop instead of the top. Therefore, the
statements within the do block are always executed at least once.
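
The difference is easiest to see when the condition is false from the very
start. In the following sketch (the class name LoopCompareDemo and both
method names are our own), each method counts how many times its loop body
runs.

```java
class LoopCompareDemo {
    // Counts the iterations of a while loop running from start up to limit.
    static int countWhile(int start, int limit) {
        int iterations = 0;
        int n = start;
        while (n < limit) {      // tested before the body runs
            iterations++;
            n++;
        }
        return iterations;
    }

    // Same loop written as do-while: the test comes after the body.
    static int countDoWhile(int start, int limit) {
        int iterations = 0;
        int n = start;
        do {
            iterations++;
            n++;
        } while (n < limit);
        return iterations;
    }

    public static void main(String[] args) {
        // Condition false at the start: while runs 0 times, do-while once.
        System.out.println(countWhile(10, 5));     // 0
        System.out.println(countDoWhile(10, 5));   // 1
        // Condition true at the start: both run the same number of times.
        System.out.println(countWhile(0, 5));      // 5
        System.out.println(countDoWhile(0, 5));    // 5
    }
}
```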

v. The for Statement


The for statement provides a compact way to iterate over a range of
values. Programmers often refer to it as the "for loop" because of the way
in which it repeatedly loops until a particular condition is satisfied. The
general form of the for statement can be expressed as follows:

for (initialization; termination; increment) { statement(s) }

When using this version of the for statement, keep in mind that:
• The initialization expression initializes the loop; it is executed once,
  as the loop begins.
• The loop terminates when the termination expression evaluates to false.
• The increment expression is invoked after each iteration of the loop.

The following program, ForDemo, uses a for statement to print the numbers
1 through 10:

class ForDemo {
public static void main(String[] args)
{
for(int i=1; i<11; i++)
{
System.out.println("Count is: " + i);
}
}
}

The three expressions of the for loop are optional; an infinite loop can
be created as follows:

for ( ; ; ) // infinite loop
{
    // your code goes here
}

vi. The break Statement


The break statement has two forms: labeled and unlabeled. You saw the
unlabeled form in the previous discussion of the switch statement. You
can also use an unlabeled break to terminate a for, while, or do-while
loop, as in the following program:

class BreakDemo {
public static void main(String[] args) {
int[] arrayOfInts = { 32, 87, 3, 589, 12, 1076, 2000, 8, 622, 127 };
int i; int searchfor = 12;
boolean foundIt = false;
for (i = 0; i < arrayOfInts.length; i++)
{
if (arrayOfInts[i] == searchfor) { foundIt = true; break; }
}
if (foundIt) {
System.out.println("Found " + searchfor+ " at index " + i);
} else
{ System.out.println(searchfor + " not in the array");
}
}
}
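
The labeled form of break is not shown in BreakDemo. An unlabeled break
leaves only the innermost loop, whereas a labeled break leaves the
statement that carries the label. The following sketch searches a
two-dimensional array and leaves both loops as soon as the value is found.
The class name LabeledBreakDemo, the method find and the label search are
illustrative names of our own.

```java
class LabeledBreakDemo {
    // Returns {row, column} of the first match, or null if target is absent.
    static int[] find(int[][] grid, int target) {
        int[] position = null;
        search:
        for (int row = 0; row < grid.length; row++) {
            for (int col = 0; col < grid[row].length; col++) {
                if (grid[row][col] == target) {
                    position = new int[] { row, col };
                    break search;    // terminates the outer for loop
                }
            }
        }
        return position;
    }

    public static void main(String[] args) {
        int[][] arrayOfInts = { { 32, 87, 3 }, { 589, 12, 1076 } };
        int[] where = find(arrayOfInts, 12);
        if (where != null) {
            System.out.println("Found 12 at " + where[0] + ", " + where[1]);
        } else {
            System.out.println("12 not in the array");
        }
    }
}
```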

vii. The continue Statement


The continue statement skips the current iteration of a for, while, or
do-while loop. The unlabeled form skips to the end of the innermost loop's
body and evaluates the boolean expression that controls the loop.
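
As a minimal sketch of continue, the following method counts how often a
character occurs in a string. Whenever the current character is not the
one we are looking for, continue skips the rest of the loop body. The
class name ContinueDemo and the method countChar are illustrative names.

```java
class ContinueDemo {
    // Counts the occurrences of wanted in text.
    static int countChar(String text, char wanted) {
        int count = 0;
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) != wanted) {
                continue;    // not a match: go on with the next character
            }
            count++;         // reached only for matching characters
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "peter piper picked a peck of pickled peppers";
        System.out.println("Found " + countChar(text, 'p') + " p's.");
    }
}
```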

viii. The return Statement


The last of the branching statements is the return statement. The return
statement exits from the current method, and control flow returns to
where the method was invoked. The return statement has two forms: one
that returns a value, and one that doesn't. To return a value, simply put the
value (or an expression that calculates the value) after the return keyword.

return ++count;
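
Both forms of return appear in the sketch below. The class ReturnDemo and
its methods next and reset are illustrative names of our own. A return
without a value is only allowed in a method declared void, where it is
typically used to leave the method early.

```java
class ReturnDemo {
    private int count = 0;

    // Form 1: return a value. ++count is evaluated first, so the caller
    // receives the already incremented value.
    int next() {
        return ++count;
    }

    // Form 2: return without a value, in a void method.
    void reset() {
        if (count == 0) {
            return;      // nothing to reset: leave the method early
        }
        count = 0;
    }

    public static void main(String[] args) {
        ReturnDemo demo = new ReturnDemo();
        System.out.println(demo.next());   // 1
        System.out.println(demo.next());   // 2
        demo.reset();
        System.out.println(demo.next());   // 1
    }
}
```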

6-7.9. Classes and Objects in Java


We have already introduced (in the discussion of C++) the concept of
classes and how they are instantiated into objects in object-oriented
programs. Here, we talk about classes, methods and constructors in Java.

i. Classes in Java
Here is sample code for a possible implementation of a Bicycle class, to
give you an overview of a class declaration. For the moment, don't
concern yourself with the details.

public class Bicycle
{
// the Bicycle class has three fields
public int cadence;
public int gear;
public int speed;
// the Bicycle class has one constructor
public Bicycle(int startCadence, int startSpeed, int startGear) {
gear = startGear;
cadence = startCadence;
speed = startSpeed; }
// the Bicycle class has four methods
public void setCadence(int newValue) { cadence = newValue; }
public void setGear(int newValue) { gear = newValue; }
public void applyBrake(int decrement) { speed -= decrement; }
public void speedUp(int increment) {
speed += increment;
}
}

ii. Class Declaration


You've seen classes defined in the following way:

class MyClass {
//field, constructor, and method declarations
}

This is a class declaration. The class body (between the braces) contains
all the code that the objects created from the class need: constructors
for initializing new objects, declarations for the fields that hold the
state of the class and its objects, and methods to implement the behavior
of the class and its objects. In general, class declarations can include
these components, in order:

1. Modifiers such as public, private and protected.
2. The class name, with the initial letter capitalized by convention.
3. The name of the class parent (superclass), if any, preceded by the
   keyword extends.
4. A comma-separated list of interfaces implemented by the class, if
   any, preceded by the keyword implements. A class can implement more
   than one interface.
5. The class body, surrounded by braces, {}.

iii. Declaration of Member Variables


There are several kinds of variables:
• Member variables in a class, which are sometimes called fields.
• Variables in a method or block of code, called local variables.
• Variables in method declarations, called parameters.
The Bicycle class uses the following lines of code to define its fields:

public int cadence;
public int gear;
public int speed;

Field declarations are composed of three components, in order:

1. Zero or more modifiers, such as public or private.
2. The field's type.
3. The field's name.

The fields of Bicycle are named cadence, gear, and speed and are all of
type int. The public keyword identifies these fields as public members,
accessible by any object that can access the class.

iv. Access Modifiers


The first (left-most) modifier used lets you control what other classes
have access to a member field. Let's consider only the public and private
modifiers for the moment. Other access modifiers will be discussed later.

• public modifier: the field is accessible from all classes.
• private modifier: the field is accessible only within its own class.

According to the encapsulation concept, it is common to make fields
private. This means that they can only be directly accessed from the
Bicycle class. We still need access to these values, however. This can be
done indirectly by adding public methods that obtain the field values:

public class Bicycle {
    private int cadence, gear, speed;

    public Bicycle(int startCadence, int startSpeed, int startGear) {
        gear = startGear; cadence = startCadence; speed = startSpeed;
    }
    public int getCadence() { return cadence; }
    public void setCadence(int newValue) { cadence = newValue; }
    public int getGear() { return gear; }
    public void setGear(int newValue) { gear = newValue; }
    public int getSpeed() { return speed; }
    public void applyBrake(int decrement) { speed -= decrement; }
    public void speedUp(int increment) { speed += increment; }
}
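
One practical benefit of private fields is that every change must pass
through a method, which can refuse values that make no sense. The
following sketch illustrates the idea with a hypothetical class,
GuardedBicycle. The validation rules (a gear of at least 1, a speed that
never drops below 0) are assumptions made for the example and are not
part of the Bicycle class above.

```java
class GuardedBicycle {
    private int gear = 1;     // hidden state: only this class can change it
    private int speed = 0;

    public int getGear() { return gear; }
    public int getSpeed() { return speed; }

    // The setter ignores values that would leave the object inconsistent.
    public void setGear(int newValue) {
        if (newValue >= 1) {
            gear = newValue;
        }
    }

    public void speedUp(int increment) { speed += increment; }

    public void applyBrake(int decrement) {
        speed -= decrement;
        if (speed < 0) {
            speed = 0;        // braking cannot produce a negative speed
        }
    }

    public static void main(String[] args) {
        GuardedBicycle bike = new GuardedBicycle();
        bike.setGear(0);      // rejected: the gear stays at 1
        bike.speedUp(5);
        bike.applyBrake(8);   // clamped: the speed becomes 0, not -3
        System.out.println(bike.getGear() + " " + bike.getSpeed());
    }
}
```

Code outside the class cannot write bike.gear = 0 directly, because the
compiler rejects any access to a private field from another class.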

v. Class Methods (Functions)


Class methods in Java are similar to class member functions in C++. Here
is an example of a typical method declaration in Java:

public double calculateAnswer(double wingSpan, int numberOfEngines,
                              double length, double grossTons) {
    // do the calculation here
}

The basic elements of a method declaration are the method name, return
type, parentheses, and a body between braces, {}. Generally, method
declarations have six components, in order:
1. Modifiers, such as public, private and protected.
2. The return type: the data type of the value returned by the method,
   or void if the method does not return a value.
3. The method name.
4. The parameter list in parentheses.
5. An exception list.
6. The method body, enclosed between braces.

Although a method (or function) name can be any legal identifier, code
conventions restrict method names. By convention, method names should
be a verb in lowercase or a multi-word name that begins with a verb in
lowercase, followed by adjectives, nouns, etc. In multi-word names, the
first letter of each of the second and following words should be
capitalized. Here are some examples:

runFast
getBackground
getFinalData
setX
isEmpty

Typically, a method has a unique name within its class. However, a method
may have the same name as another method in the same class, provided that
the two have different parameter lists. This feature is called method (or
function) overloading.
vi. Method Overloading
Like other object-oriented programming languages, Java supports method
(or function) overloading. This means that methods within a class can
have the same name if they have different parameter lists. Suppose that
you have a class that can use calligraphy to draw various types of data
(strings, integers, and so on) and that contains a method for drawing
each data type. It is cumbersome to use a new name for each method, for
example, drawString, drawInteger, drawFloat, and so on. In Java, you can
use the same name for all the drawing methods but with different
argument lists. Thus, the data drawing class might declare several
methods named draw, each of which has a different parameter list.

public class DataArtist {
    ...
    public void draw(String s) {...}
    public void draw(int i) {...}
    public void draw(double f) {...}
}

Overloaded methods are differentiated by the number and the type of the
arguments passed into the method. In the code sample, draw(String s) and
draw(int i) are distinct methods because they require different argument
types. You cannot declare more than one method with the same name
and the same number and type of arguments, because the compiler cannot
differentiate between them. The compiler does not consider return type
when differentiating methods.
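
Since the bodies of the draw methods are elided above, here is a small
runnable sketch of the same idea. The class Describer and its describe
methods are illustrative names of our own; each version returns a string
that reveals which overload the compiler selected for the call.

```java
class Describer {
    static String describe(String s) { return "String: " + s; }
    static String describe(int i)    { return "int: " + i; }
    static String describe(double d) { return "double: " + d; }

    // Same name again: the number of parameters differs this time.
    static String describe(int i, double d) {
        return "int and double: " + i + ", " + d;
    }

    public static void main(String[] args) {
        System.out.println(describe("ink"));    // String version
        System.out.println(describe(42));       // int version
        System.out.println(describe(3.5));      // double version
        System.out.println(describe(2, 1.5));   // two-parameter version
    }
}
```

The choice is made at compile time from the types of the arguments in
each call.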

vii. Class Constructors


A class contains constructors that are invoked to create objects from the
class. Constructor declarations look like method declarations, except
that they use the name of the class and have no return type. For
example, Bicycle has one constructor:

public Bicycle(int startCadence, int startSpeed, int startGear) {
    gear = startGear;
    cadence = startCadence;
    speed = startSpeed;
}

To create a new Bicycle object called myBike, a constructor is called by
the new operator:

Bicycle myBike = new Bicycle(30, 0, 8);

This creates space in memory for the object and initializes its fields.
Although Bicycle only has one constructor, it could have others, including
a no-argument constructor:

public Bicycle() { gear = 1; cadence = 10; speed = 0; }

The statement Bicycle yourBike = new Bicycle(); invokes the no-argument
constructor to create a new Bicycle object called yourBike. Both
constructors could have been declared in Bicycle because they have
different argument lists. As with methods, Java differentiates
constructors on the basis of the number of arguments in the list and
their types.

You cannot write two constructors that have the same number and type of
arguments for the same class, because the compiler won't be able to tell
them apart. It is not obligatory to provide a constructor for your class,
but you should be careful when doing this. The compiler automatically
provides a no-argument, default constructor for any class without
constructors. This default constructor will call the no-argument
constructor of the superclass.
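
The two situations described above can be sketched with two small
hypothetical classes, Lamp and Dimmer (both names are our own). Lamp
declares no constructor at all, so the compiler supplies the default one.
Dimmer declares two constructors that differ in their argument lists.

```java
class Lamp {
    int brightness;    // no constructor declared: the compiler adds Lamp()
}

class Dimmer {
    int level;

    Dimmer() {                   // no-argument constructor
        level = 5;
    }

    Dimmer(int startLevel) {     // same name, different argument list
        level = startLevel;
    }
}

class ConstructorDemo {
    public static void main(String[] args) {
        Lamp lamp = new Lamp();          // uses the default constructor
        Dimmer first = new Dimmer();     // selects Dimmer()
        Dimmer second = new Dimmer(9);   // selects Dimmer(int)
        // An int field that is never assigned holds the default value 0.
        System.out.println(lamp.brightness + " " + first.level
                + " " + second.level);
    }
}
```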

viii. Passing Information to a Method


The declaration for a method or a constructor declares the number and the
type of the arguments for that method or constructor. For example, the
following method computes the monthly payments for a home loan,
based on the amount of the loan, the interest rate, the length of the loan
(the number of periods), and the future value of the loan:

public double computePayment(double loanAmt, double rate,
                             double futureValue, int numPeriods)
{
    double interest = rate / 100.0;
    double partial1 = Math.pow((1 + interest), -numPeriods);
    double denominator = (1 - partial1) / interest;
    double answer = (-loanAmt / denominator)
                    - ((futureValue * partial1) / denominator);
    return answer;
}
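
To see how arguments are matched to the declared parameters, the method
can be wrapped in a small test class and called with concrete values. The
class name LoanDemo is our own, and the method is made static only so
that main can call it directly. As the code is written, rate is a
percentage per period and the result is negative; we assume here that the
negative sign stands for money paid out.

```java
class LoanDemo {
    static double computePayment(double loanAmt, double rate,
                                 double futureValue, int numPeriods) {
        double interest = rate / 100.0;
        double partial1 = Math.pow((1 + interest), -numPeriods);
        double denominator = (1 - partial1) / interest;
        return (-loanAmt / denominator)
                - ((futureValue * partial1) / denominator);
    }

    public static void main(String[] args) {
        // One period at 10 percent: the loan plus interest is due at once,
        // so the payment is about -1100.
        System.out.println(computePayment(1000.0, 10.0, 0.0, 1));
        // Two periods at 10 percent: two payments of about -576.19 each.
        System.out.println(computePayment(1000.0, 10.0, 0.0, 2));
    }
}
```

The arguments are assigned to the parameters by position, so the order in
the call must follow the order in the declaration: loan amount, rate,
future value, number of periods.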

ix- Parameter Types


You can use any data type for a parameter of a method or a constructor.
This includes primitive data types, such as doubles, floats, and integers,
as you saw in the computePayment method, and reference data types,
such as objects and arrays. Here's an example of a method that accepts an
array as an argument. In this example, the method creates a new Polygon
object and initializes it from an array of Point objects:

public Polygon polygonFrom(Point[] corners) {
    // method body goes here
}

x- Arbitrary Number of Arguments


You can use a construct called varargs to pass an arbitrary number of
values to a method. You use varargs when you don't know how many of
a particular type of argument will be passed to the method. It's a shortcut
to creating an array manually (the previous method could have used
varargs rather than an array).
To use varargs, you follow the type of the last parameter by an ellipsis
(three dots, ...), then a space, and the parameter name. The method can
then be called with any number of arguments of that type, including none.

public Polygon polygonFrom(Point... corners)
{
    int numberOfSides = corners.length;
    double squareOfSide1, lengthOfSide1;
    squareOfSide1 = (corners[1].x - corners[0].x) * (corners[1].x - corners[0].x)
                  + (corners[1].y - corners[0].y) * (corners[1].y - corners[0].y);
    lengthOfSide1 = Math.sqrt(squareOfSide1);
    // more method body code follows that creates and returns a polygon
    // connecting the Points
}

Note that, inside the method, corners is treated like an array. The method
can be called either with an array or with a sequence of arguments. The
method code will treat the parameter as an array in either case. You will
most commonly see varargs with the printing methods; for example, the
printf method, which allows you to print an arbitrary number of objects.
It can be called as follows:

System.out.printf("%s: %d, %s%n", name, idnum, address);
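
Because the polygonFrom method depends on the Point and Polygon classes,
here is a smaller self-contained sketch of varargs. The class VarargsDemo
and the method sum are illustrative names of our own. The same method is
called with no arguments, with a sequence of arguments, and with an
array.

```java
class VarargsDemo {
    // values behaves like an int[] inside the method.
    static int sum(int... values) {
        int total = 0;
        for (int i = 0; i < values.length; i++) {
            total += values[i];
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sum());                      // no arguments: 0
        System.out.println(sum(1, 2, 3));               // a sequence: 6
        System.out.println(sum(new int[] { 4, 5 }));    // an array: 9
    }
}
```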

xi- Parameter Names


When you declare a parameter to a method, you provide a name for that
parameter. This name is used within the method body to refer to the
passed-in argument. The name of a parameter must be unique in its
scope. It cannot be the same as the name of another parameter of the same
method or constructor, and it cannot be the name of a local variable
within the method or constructor. A parameter can have the same name as
one of the class fields. In this case, the parameter is said to shadow
the field. Shadowing fields can make your code difficult to read and is
conventionally used only within constructors and methods that set a
particular field. For example, consider the following Circle class and
its setOrigin method:
public class Circle
{
private int x, y, radius;
public void setOrigin(int x, int y) {
...
}
}
The Circle class has three fields: x, y, and radius. The setOrigin method
has two parameters, each of which has the same name as a field. Each
method parameter shadows the field that shares its name. So using the
simple names x or y within the method refers to the parameter, not the
field. To access the field inside the method, you must use a qualified
name such as this.x.
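
The effect of shadowing can be demonstrated with the following sketch.
The class ShadowDemo and the method setOriginWrong are hypothetical names
introduced for the example. The "wrong" setter contains a deliberate bug,
in which the simple names refer to the parameters on both sides of the
assignment.

```java
class ShadowDemo {
    private int x, y;

    // Deliberate bug: x and y are the parameters here, so each
    // assignment copies a parameter onto itself and the fields stay 0.
    void setOriginWrong(int x, int y) {
        x = x;
        y = y;
    }

    // Correct version: this.x is the field, x is the parameter.
    void setOrigin(int x, int y) {
        this.x = x;
        this.y = y;
    }

    int getX() { return x; }
    int getY() { return y; }

    public static void main(String[] args) {
        ShadowDemo c = new ShadowDemo();
        c.setOriginWrong(3, 4);
        System.out.println(c.getX() + ", " + c.getY());   // 0, 0
        c.setOrigin(3, 4);
        System.out.println(c.getX() + ", " + c.getY());   // 3, 4
    }
}
```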

xii- Passing Primitive Data Type Arguments


Primitive arguments, such as an int, are passed into methods by value.
When the method returns, the parameters are gone and any changes to
them are lost. Here is an example:

public class PassPrimitiveByValue
{
    public static void main(String[] args) {
        int x = 3;
        passMethod(x);    // invoke passMethod() with x as argument
        // print x to see if its value has changed
        System.out.println("After invoking passMethod, x = " + x);
    }

    // change parameter in passMethod()
    public static void passMethod(int p) {
        p = 10;
    }
}

When you run this program, the output will be as follows:

After invoking passMethod, x = 3

xiii. Passing Reference Data Type Arguments


Reference data type parameters, such as objects, are also passed into
methods by value. This means that when the method returns, the passed-
in reference still references the same object as before. However, the
values of the object's fields can be changed in the method, if they have
the proper access level. For example, consider a method in an arbitrary
class that moves Circle objects:

public void moveCircle(Circle circle, int deltaX, int deltaY) {
    // code to move origin of circle to x+deltaX, y+deltaY
    circle.setX(circle.getX() + deltaX);
    circle.setY(circle.getY() + deltaY);
    // code to assign a new reference to circle
    circle = new Circle(0, 0);
}

Let the method be invoked with these arguments:

moveCircle(myCircle, 23, 56);

Inside the method, circle initially refers to myCircle. The method changes
the x and y coordinates of the object that circle references (i.e., myCircle)
by 23 and 56, respectively. These changes will persist when the method
returns. Then circle is assigned a reference to a new Circle object with
x=y=0. This reassignment has no permanence, because the reference was
passed in by value and cannot change. Within the method, the object
pointed to by circle has changed, but, when the method returns, myCircle
still references the same Circle object as before the method was called.
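
The behavior described above can be checked with a runnable sketch. Since
the Circle class itself is not listed in the text, the example uses a
minimal stand-in of our own, MovableCircle, with only the accessor
methods that moveCircle needs. The demo class PassReferenceDemo is also a
hypothetical name.

```java
class MovableCircle {
    private int x, y;
    MovableCircle(int x, int y) { this.x = x; this.y = y; }
    int getX() { return x; }
    int getY() { return y; }
    void setX(int newX) { x = newX; }
    void setY(int newY) { y = newY; }
}

class PassReferenceDemo {
    static void moveCircle(MovableCircle circle, int deltaX, int deltaY) {
        // These calls change the object that the caller also refers to.
        circle.setX(circle.getX() + deltaX);
        circle.setY(circle.getY() + deltaY);
        // This only rebinds the local parameter; the caller is unaffected.
        circle = new MovableCircle(0, 0);
    }

    public static void main(String[] args) {
        MovableCircle myCircle = new MovableCircle(1, 2);
        MovableCircle before = myCircle;    // remember the original object
        moveCircle(myCircle, 23, 56);
        System.out.println(myCircle.getX() + ", " + myCircle.getY()); // 24, 58
        System.out.println(myCircle == before);                       // true
    }
}
```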
6-7.10. Java Objects
In a typical Java program, you create many objects, which interact by
invoking methods. Through object interactions, a program can carry out
various tasks, such as sending and receiving information over a network.
Once an object has completed its work, its resources are recycled for use
by other objects. Here's a small program that creates three objects: one
Point object and two Rectangle objects. The program displays information
about various objects.

public class CreateObjectDemo {
    public static void main(String[] args) {
        // Declare and create a point object and two rectangle objects.
        Point originOne = new Point(23, 94);
        Rectangle rectOne = new Rectangle(originOne, 100, 200);
        Rectangle rectTwo = new Rectangle(50, 100);
        // display rectOne's width, height, and area
        System.out.println("Width of rectOne: " + rectOne.width);
        System.out.println("Height of rectOne: " + rectOne.height);
        System.out.println("Area of rectOne: " + rectOne.getArea());
        // set rectTwo's position
        rectTwo.origin = originOne;
        // display rectTwo's position
        System.out.println("X Position of rectTwo: " + rectTwo.origin.x);
        System.out.println("Y Position of rectTwo: " + rectTwo.origin.y);
        // move rectTwo and display its new position
        rectTwo.move(40, 72);
        System.out.println("X Position of rectTwo: " + rectTwo.origin.x);
        System.out.println("Y Position of rectTwo: " + rectTwo.origin.y);
    }
}

After running this program, here's the output:

Width of rectOne: 100
Height of rectOne: 200
Area of rectOne: 20000
X Position of rectTwo: 23
Y Position of rectTwo: 94
X Position of rectTwo: 40
Y Position of rectTwo: 72

The following sections use the above example to describe the life cycle of
an object within a program. From them, you will learn how to write code
that creates and uses objects in your own programs. You will also learn
how the system cleans up after an object when its life has ended.

i. Creating Objects
As you know, a class provides the blueprint for objects; you create an
object from a class. Each of the following statements taken from the
CreateObjectDemo program creates an object and assigns it to a variable:

Point originOne = new Point(23, 94);
Rectangle rectOne = new Rectangle(originOne, 100, 200);
Rectangle rectTwo = new Rectangle(50, 100);

The first line creates an object of the Point class, and the second and third
lines each create an object of the Rectangle class. Each of these
statements has three parts:

Declaration: The code to the left of the = sign (for example, Point
originOne) is a variable declaration that associates a variable name with
an object type.
Instantiation: The new keyword is a Java operator that creates the object.
Initialization: The new operator is followed by a call to a constructor,
which initializes the new object.

ii. Declaring a Variable to Refer to an Object


Previously, you learned that to declare a variable, you write:

type name;

This notifies the compiler that you will use name to refer to data whose
type is type. With a primitive variable, this declaration also reserves the
proper amount of memory for the variable. You can also declare a
reference variable on its own line, for example: Point originOne; If you
declare originOne like this, its value will be undetermined until an object
is actually created and assigned to it. Simply declaring a reference
variable does not create an object. For that, you need to use the new
operator, as described in the next section. You must assign an object to
originOne before you use it in your code. Otherwise, you will get a
compiler error. A variable in this state, which currently references no
object, can be pictured as the variable name, originOne, plus a reference
pointing to nothing.
iii. Instantiating a Class
The new operator instantiates a class by allocating memory for a new
object and returning a reference to that memory. The new operator also
invokes the object constructor. The new operator requires a single, postfix
argument: a call to a constructor. The name of the constructor provides
the name of the class to instantiate.

The new operator returns a reference to the object it created. This
reference is usually assigned to a variable of the appropriate type, like:

Point originOne = new Point(23, 94);

iv. Initializing an Object


Here's the code for the Point class:

public class Point {
    public int x = 0;
    public int y = 0;
    // constructor
    public Point(int a, int b) {
        x = a; y = b;
    }
}

This class contains a single constructor. You can recognize a constructor
because its declaration uses the same name as the class and it has no
return type. The constructor in the Point class takes two integer
arguments, as declared by the code (int a, int b). The following statement
provides 23 and 94 as values for those arguments:

Point originOne = new Point(23, 94);

The result of executing this statement can be illustrated in the next figure:

Here's the code for the Rectangle class, which contains four constructors:

public class Rectangle {
    public int width = 0;
    public int height = 0;
    public Point origin;

    // four constructors
    public Rectangle() {
        origin = new Point(0, 0); }
    public Rectangle(Point p) {
        origin = p; }
    public Rectangle(int w, int h) {
        origin = new Point(0, 0); width = w; height = h; }
    public Rectangle(Point p, int w, int h) {
        origin = p; width = w; height = h; }
    // a method for moving the rectangle
    public void move(int x, int y) {
        origin.x = x; origin.y = y; }
    // a method for computing the area of the rectangle
    public int getArea() { return width * height; }
}

Each constructor lets you provide initial values for the rectangle's
origin, width, and height, using both primitive and reference types. If a
class has multiple constructors, they must have different signatures. The
Java compiler differentiates the constructors based on the number and the
type of the arguments.
When the Java compiler encounters the following code, it calls the
constructor in the Rectangle class that requires a Point argument followed
by two integer arguments:

Rectangle rectOne = new Rectangle(originOne, 100, 200);

This calls one of Rectangle's constructors to initialize origin to
originOne. Also, the constructor sets width to 100 and height to 200. Now
there are two references to the same Point object; an object can have
multiple references to it, as shown in the next figure. The following
line of code calls the Rectangle constructor that requires two integer
arguments, which provide the initial values for width and height. If you
inspect the code within the constructor, you will see that it creates a
new Point object whose x and y values are initialized to 0:

Rectangle rectTwo = new Rectangle(50, 100);

The Rectangle constructor used in the following statement doesn't take
any arguments, so it's called a no-argument constructor:

Rectangle rect = new Rectangle();


All classes have at least one constructor. If a class does not explicitly
declare any, the Java compiler automatically provides a no-argument
constructor, called the default constructor. This default constructor calls
the class parent's no-argument constructor, or the Object constructor if the
class has no other parent. If the parent has no accessible no-argument
constructor (Object does have one), the compiler will reject the program.

v- Calling the Object Methods


You also use an object reference to invoke an object's method. You
append the method's simple name to the object reference, with an
intervening dot operator (.). Also, you provide, within enclosing
parentheses, any arguments to the method. If the method does not require
any arguments, use empty parentheses.
objectReference.methodName(argumentList);

or

objectReference.methodName();

The Rectangle class has two methods: getArea() to compute the rectangle
area and move() to change the rectangle's origin.
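
Calling a method through one reference can be visible through another
one. In CreateObjectDemo, rectOne and rectTwo end up referring to the same
Point object, so moving one rectangle also moves the origin of the other.
The sketch below reproduces this effect with minimal stand-in classes of
our own, Pt and Rect, so that it can be compiled on its own.

```java
class Pt {
    int x, y;
    Pt(int x, int y) { this.x = x; this.y = y; }
}

class Rect {
    Pt origin;
    int width, height;
    Rect(Pt p, int w, int h) { origin = p; width = w; height = h; }
    void move(int x, int y) { origin.x = x; origin.y = y; }
    int getArea() { return width * height; }
}

class SharedOriginDemo {
    public static void main(String[] args) {
        Pt shared = new Pt(23, 94);
        Rect rectOne = new Rect(shared, 100, 200);
        Rect rectTwo = new Rect(shared, 50, 100);   // same Pt object
        rectTwo.move(40, 72);                       // changes the shared Pt
        // rectOne was never moved, yet its origin now reads 40, 72.
        System.out.println(rectOne.origin.x + ", " + rectOne.origin.y);
        System.out.println(rectOne.getArea());      // still 20000
    }
}
```

If each rectangle should move independently, each one needs its own Point
object, as created by the Rectangle(int w, int h) constructor.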

vi- Using the this Keyword with a Field


The most common reason for using the this keyword is that a field is
shadowed by a method or constructor parameter. For example, the Point
class was written like this:

public class Point {
    public int x = 0;
    public int y = 0;
    // constructor
    public Point(int a, int b) { x = a; y = b; }
}
It could also be written as follows:
public class Point
{
    public int x = 0;
    public int y = 0;
    public Point(int x, int y) { // constructor
        this.x = x; this.y = y; }
}

vii. Nested Classes


The Java programming language allows you to define a class within
another class. Such a class is called a nested class and is illustrated here:
class OuterClass {
    ...
    class NestedClass {
        ...
    }
}

A nested class is a member of its enclosing class and has access to other
members of the enclosing class, even if they are declared private (a
static nested class reaches instance members only through an object
reference). As a member of OuterClass, a nested class can be declared
private, public, protected, or package private. Recall that outer classes
can only be declared public or package private.
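
The access rule can be illustrated with a small sketch. The class names Counter and Incrementer are hypothetical, chosen only for this example; the point is that the inner class updates a private field of the enclosing object directly.

```java
class NestedDemo {
    public static void main(String[] args) {
        Counter counter = new Counter();
        // An inner-class object is always created from an enclosing instance.
        Counter.Incrementer inc = counter.new Incrementer();
        inc.bump();
        inc.bump();
        System.out.println(counter.getCount());
    }
}

class Counter {
    private int count = 0;          // private to Counter

    class Incrementer {             // non-static nested (inner) class
        void bump() { count++; }    // may touch the private field directly
    }

    int getCount() { return count; }
}
```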
6-7.11. Interfaces in Java
There are a number of situations in software engineering where each team
should be able to write its code without any knowledge of how the
other teams' code is written. Generally speaking, interfaces are such
protocols between different pieces of software.

In the Java programming language, an interface is a reference type, similar
to a class, that can contain only constants and method signatures
(prototypes). There are no method bodies (Java 8 and later additionally
allow default and static methods with bodies). Interfaces cannot be
instantiated; they can only be implemented by classes or extended by other
interfaces. Defining an interface is similar to creating a new class:

public interface OperateCar {
    // constant declarations, if any
    // method signatures
    int turn(Direction direction, // An enum with values RIGHT, LEFT
             double radius, double startSpeed, double endSpeed);
    int changeLanes(Direction direction, double startSpeed, double endSpeed);
    int signalTurn(Direction direction, boolean signalOn);
    int getRadarFront(double distanceToCar, double speedOfCar);
    int getRadarRear(double distanceToCar, double speedOfCar);
    ......
    // more method signatures
}

Note that the method signatures (prototypes) have no braces and are
terminated with a semicolon. To use an interface, you write a class that
implements the interface. When an instantiable class implements an
interface, it provides a method body for each of the methods declared in
the interface. For example,

public class OperateBMW760i implements OperateCar {
    // the OperateCar method signatures, with implementation, for example:
    // (interface methods are implicitly public, so the implementing
    // method must be declared public as well)
    public int signalTurn(Direction direction, boolean signalOn) {
        // code to turn BMW's LEFT turn indicator lights on
        // code to turn BMW's LEFT turn indicator lights off
        // code to turn BMW's RIGHT turn indicator lights on
        // code to turn BMW's RIGHT turn indicator lights off
    }
    // other members, as needed -- for example, helper classes
    // not visible to clients of the interface
}

If you have a robotic car, for example, it is the automobile manufacturers
who will implement the interface software. Chevrolet's implementation
will be substantially different from that of Toyota, of course, but both
manufacturers will adhere to the same interface.

ii- Interfaces and Multiple Inheritance


Interfaces have another very important role in the Java programming
language. Interfaces are not part of the class hierarchy, although they
work in combination with classes.
The Java programming language does not permit multiple inheritance of
classes (inheritance is discussed later in this lesson), but interfaces
provide an alternative.
In Java, a class can inherit from only one class but it can implement more
than one interface. Therefore, objects can have multiple types: the type of
their own class and the types of all the interfaces that they implement.
This means that if a variable is declared to be the type of an interface, its
value can reference any object that is instantiated from any class that
implements the interface.

iii- Defining an Interface


An interface declaration consists of modifiers, the keyword interface, the
interface name, a comma-separated list of parent interfaces (if any), and
the interface body. For example:

public interface GroupedInterface extends Interface1, Interface2, Interface3 {
    // constant declarations
    double E = 2.718282; // base of natural logarithms
    // method signatures
    void doSomething(int i, double x);
    int doSomethingElse(String s);
}

The public access specifier indicates that the interface can be used by any
class in any package. If you do not specify that the interface is public, it
will be accessible only to classes defined in the same package. An
interface can extend other interfaces, just as a class can derive from other
classes. The interface declaration includes a comma-separated list of all
the interfaces that it extends.

iii- The Interface Body


The interface body contains method declarations for all the methods
included in the interface. A method declaration within an interface is
followed by a semicolon, but no braces, because an interface does not
provide implementations for the methods declared within it. All methods
declared in an interface are implicitly public, so the public modifier can
be omitted. An interface can contain constant declarations in addition to
method declarations. All constant values defined in an interface are
implicitly public, static, and final. Once again, the modifiers can be omitted.


iv- Interface Implementation


To declare a class that implements an interface, you include an implements
clause in the class declaration. Your class can implement more than one
interface, so the implements keyword is followed by a comma-separated
list of the interfaces implemented by the class. By convention, the
implements clause follows the extends clause, if there is one. Consider
the following interface that defines how to compare the size of objects.

public interface Relatable {
    // this (object calling isLargerThan) and other must be instances
    // of the same class
    // returns 1, 0, -1 if this is greater than, equal to, or less than other
    public int isLargerThan(Relatable other);
}

If you want to be able to compare the size of similar objects, no matter
what they are, the class that instantiates them should implement
Relatable. Any class can implement Relatable if there is some way to
compare the relative "size" of objects instantiated from the class. For
strings, it could be number of characters; for books, it could be number of
pages; and so forth. For planar geometric objects, area would be a good
choice, while volume would work for three-dimensional geometric
objects. All such classes can implement the isLargerThan() method.

v- Using an Interface as a Type


When you define a new interface, you are defining a new reference data
type. You can use interface names anywhere you can use any other data
type name. If you define a reference variable whose type is an interface,
any object you assign to it must be an instance of a class that implements
the interface.
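
As a sketch of an interface used as a type, the method findLargest() below takes and returns Relatable references, so it works for any implementing class. The class Plank, its fields, and the method findLargest() are hypothetical names introduced for this illustration; only Relatable and isLargerThan() come from the text.

```java
class RelatableDemo {
    // Parameter and return types are the interface, not a concrete class.
    static Relatable findLargest(Relatable a, Relatable b) {
        return a.isLargerThan(b) > 0 ? a : b;
    }

    public static void main(String[] args) {
        Relatable small = new Plank(2, 3);
        Relatable big = new Plank(4, 5);
        System.out.println(findLargest(small, big) == big);
    }
}

interface Relatable {
    // returns 1, 0, -1 if this is greater than, equal to, or less than other
    int isLargerThan(Relatable other);
}

// Hypothetical planar object; "size" is taken to be its area.
class Plank implements Relatable {
    private final int width;
    private final int height;

    Plank(int width, int height) { this.width = width; this.height = height; }

    int getArea() { return width * height; }

    @Override
    public int isLargerThan(Relatable other) {
        Plank that = (Plank) other;   // both objects must be of the same class
        if (getArea() > that.getArea()) return 1;
        if (that.getArea() > getArea()) return -1;
        return 0;
    }
}
```

The cast inside isLargerThan() reflects the comment in the interface: the two objects being compared are expected to be instances of the same class.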


6-7.12. Class Inheritance in Java


The following sections describe how you can derive one class from
another, that is, how a subclass can inherit fields and methods from
a superclass. You will learn that all classes are derived from the Object
class, and how to modify the methods that a subclass inherits from
superclasses. This section also covers interface-like abstract classes.

i. Java Class Hierarchy


The Object class, defined in the java.lang package, defines and
implements behavior common to all classes, including the ones that you
write. In Java, many classes derive directly from Object, other classes
derive from some of those classes, and so on, forming a hierarchy of
classes. As shown in the following figure, all classes in the Java platform
are descendants of Object. At the top of the hierarchy, Object is the most
general of all classes. Classes near the bottom of the hierarchy provide
more specialized behavior.

Fig. 6-7(a). Class hierarchy in java


Fig. 6-7(b). Class hierarchy in java.lang package.


ii. An Example of Inheritance


Here is the sample code for a possible implementation of a Bicycle class
that was presented in the Classes and Objects lesson:

public class Bicycle {
    // the Bicycle class has 3 fields
    public int cadence, gear, speed;

    // the Bicycle class has 1 constructor
    public Bicycle(int startCadence, int startSpeed, int startGear) {
        gear = startGear;
        cadence = startCadence;
        speed = startSpeed;
    }

    // the Bicycle class has 4 methods
    public void setCadence(int newValue) { cadence = newValue; }
    public void setGear(int newValue) { gear = newValue; }
    public void applyBrake(int decrement) { speed -= decrement; }
    public void speedUp(int increment) { speed += increment; }
}

A class declaration for a MountainBike class that is a subclass of Bicycle
might look like this:

public class MountainBike extends Bicycle {
    // the MountainBike subclass adds one field
    public int seatHeight;

    // the MountainBike subclass has one constructor
    public MountainBike(int startHeight, int startCadence, int startSpeed,
                        int startGear) {
        super(startCadence, startSpeed, startGear);
        seatHeight = startHeight;
    }

    // the MountainBike subclass adds one method
    public void setHeight(int newValue) { seatHeight = newValue; }
}

The class MountainBike inherits all the fields and methods of Bicycle and
adds the field seatHeight and a method to set it. Except for the
constructor, it is as if you had written a new MountainBike class from
scratch, with 4 fields and 5 methods.

iii-What You Can Do in a Subclass


A subclass inherits all of the public and protected members of its parent,
no matter in which package the subclass is. If the subclass is in the same
package as its parent, it also inherits the package-private members of the
parent.

You can use the inherited members, replace them, hide them, or
supplement them with new members:

• The inherited fields can be used directly, just like any other fields.
• You can declare a field in the subclass with the same name as the
  one in the superclass, thus hiding it (not recommended).
• You can declare new fields in the subclass that are not in the
  superclass.
• The inherited methods can be used directly as they are.
• You can write a new instance method in the subclass that has the
  same signature as the one in the superclass, thus overriding it.
• You can write a new static method in the subclass that has the
  same signature as the one in the superclass, thus hiding it.
• You can declare new methods in the subclass that are not in the
  superclass.
• You can write a subclass constructor that invokes the constructor of
  the superclass, either implicitly or by using the keyword super.

The following sections in this lesson will expand on these topics.

iv- Private Members in a Superclass


A subclass does not inherit the private members of its parent class.
However, if the superclass has public or protected methods for accessing
its private fields, these can also be used by the subclass.
A nested class has access to all the private members of its enclosing
class—both fields and methods. Therefore, a public or protected nested
class inherited by a subclass has indirect access to all of the private
members of the superclass.

v- Casting Objects
We have seen that an object is of the data type of the class from which it
was instantiated. For example, if we write

public MountainBike myBike = new MountainBike();

then myBike is of type MountainBike. MountainBike is descended from
Bicycle and Object. Therefore, a MountainBike is a Bicycle and is also an
Object, and it can be used wherever Bicycle or Object objects are called
for. The reverse is not necessarily true.

Casting shows the use of an object of one type in place of another type,
among the objects permitted by inheritance and implementations. For
example, if we write

Object obj = new MountainBike();

then obj is both an Object and a MountainBike (until such time as obj is
assigned another object that is not a MountainBike). This is called implicit
casting. If, on the other hand, we write

MountainBike myBike = obj;

we would get a compile-time error because obj is not known to the
compiler to be a MountainBike. However, we can tell the compiler that
we promise to assign a MountainBike to obj by explicit casting:

MountainBike myBike = (MountainBike)obj;

This cast inserts a runtime check that obj refers to a MountainBike so
that the compiler can safely assume that obj is a MountainBike. If obj is
not a MountainBike at runtime, a ClassCastException will be thrown.

Note: You can make a logical test as to the type of a particular object
using the instanceof operator. This can save you from a runtime error
owing to an improper cast. For example:

if (obj instanceof MountainBike) {
    MountainBike myBike = (MountainBike)obj;
}

Here the instanceof operator verifies that obj refers to a MountainBike so
that we can make the cast with no runtime exception thrown.

vi- Overriding and Hiding Methods


An instance method in a subclass with the same signature (name, plus the
number and type of its parameters) and return type as an instance method
in the superclass overrides the superclass method. The ability of a
subclass to override a method allows a class to inherit from a superclass
and then to modify its behavior as needed. An overriding method can also
return a subtype of the type returned by the overridden method. This is
called a covariant return type. When overriding a method, you may want
to use the @Override annotation, which tells the compiler that you
intend to override a method in the superclass; if no such method exists,
the compiler reports an error.
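
A brief sketch of overriding, the @Override annotation, and a covariant return type follows. The classes Shape and Circle and their methods are hypothetical and serve only as an illustration.

```java
class OverrideDemo {
    public static void main(String[] args) {
        Shape s = new Circle();              // declared type Shape, actual object Circle
        System.out.println(s.describe());    // the overriding version in Circle runs
        System.out.println(s.copy() instanceof Circle);
    }
}

class Shape {
    String describe() { return "generic shape"; }
    Shape copy() { return new Shape(); }
}

class Circle extends Shape {
    @Override
    String describe() { return "circle"; }

    // Covariant return type: Circle is a subtype of Shape.
    @Override
    Circle copy() { return new Circle(); }
}
```

If describe() were misspelled in Circle, the @Override annotation would turn the mistake into a compile-time error instead of silently adding a new method.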


vii- Class Methods


If a subclass defines a class method with the same signature as a class
method in the superclass, the method in the subclass hides the one in the
superclass. The distinction between hiding and overriding has important
implications. The version of the overridden method that gets invoked is
the one in the subclass. The version of the hidden method that gets
invoked depends on whether it is invoked from the superclass or the
subclass. Let's look at an example that contains two classes. The first is
Animal, which contains one instance method and one class method:

public class Animal {
    public static void testClassMethod() {
        System.out.println("The class method in Animal.");
    }
    public void testInstanceMethod() {
        System.out.println("The instance method in Animal.");
    }
}

The second class, a subclass of Animal, is called Cat:

public class Cat extends Animal {
    public static void testClassMethod() {
        System.out.println("The class method in Cat.");
    }
    public void testInstanceMethod() {
        System.out.println("The instance method in Cat.");
    }
    public static void main(String[] args) {
        Cat myCat = new Cat();
        Animal myAnimal = myCat;
        Animal.testClassMethod();
        myAnimal.testInstanceMethod();
    }
}

The Cat class overrides the instance method in Animal and hides the class
method in Animal. The main method in this class creates an instance of
Cat and calls testClassMethod() on the class and testInstanceMethod() on
the instance. The output from this program is as follows:

The class method in Animal.
The instance method in Cat.

The version of the hidden method that gets invoked is the one in the
superclass, and the version of the overridden method that gets invoked is
the one in the subclass.

viii- Modifiers
The access specifier for an overriding method can allow more, but not
less, access than the overridden method. For example, a protected
instance method in the superclass can be made public, but not private, in
the subclass.

You will get a compile-time error if you attempt to change an instance
method in the superclass to a class method in the subclass, and vice versa.
Note that in a subclass, you can overload the methods inherited from the
superclass. Such overloaded methods neither hide nor override the
superclass methods—but they are new methods.

ix- Hiding Fields


Inside a class, a field that has the same name as a field in the superclass
hides the superclass field, even if their types are different. Within the
subclass, the field in the superclass cannot be referenced by its simple
name. Instead, the field must be accessed through super, which is covered
in the next section. Generally speaking, it is not recommended to hide
fields as this makes code difficult to read.

x- Accessing Superclass Members


If a method overrides one of its superclass's methods, you can invoke the
overridden method through the use of the keyword super. You can also
use super to refer to a hidden field. Consider this class, Superclass:

public class Superclass {
    public void printMethod() {
        System.out.println("Printed in Superclass.");
    }
}

Here is a subclass, called Subclass, that overrides printMethod():

public class Subclass extends Superclass {
    // overrides printMethod in Superclass
    public void printMethod() {
        super.printMethod();
        System.out.println("Printed in Subclass");
    }
    public static void main(String[] args) {
        Subclass s = new Subclass();
        s.printMethod();
    }
}


Within Subclass, the simple name printMethod() refers to the one
declared in Subclass, which overrides the one in Superclass. Thus, to
refer to printMethod() inherited from Superclass, Subclass must use a
qualified name, using super as shown. Compiling and executing Subclass
prints the following:

Printed in Superclass.
Printed in Subclass

xi- Subclass Constructors


The following example illustrates how to use the super keyword to
invoke a superclass's constructor. Recall from the Bicycle example that
MountainBike is a subclass of Bicycle. Here is the MountainBike
(subclass) constructor that calls the superclass constructor and then adds
initialization code of its own:

public MountainBike(int startHeight, int startCadence, int startSpeed,
                    int startGear) {
    super(startCadence, startSpeed, startGear);
    seatHeight = startHeight;
}

Invocation of a superclass constructor must be the first line in the
subclass constructor. The syntax for calling a superclass constructor is

super(); or super(parameter list);

With super(), the superclass no-argument constructor is called. With
super(parameter list), the superclass constructor with a matching
parameter list is called. Note that if a constructor does not explicitly
invoke a superclass constructor, the Java compiler automatically inserts a
call to the no-argument constructor of the superclass. If the superclass
does not have a no-argument constructor, you will get a compile-time
error. Object does have such a constructor, so if Object is the only
superclass, there is no problem.

If a subclass constructor invokes a constructor of its superclass, either
explicitly or implicitly, you might think that there will be a whole chain
of constructors called, all the way back to the constructor of the root
Object. In fact, this is the case. It is called constructor chaining, and
you need to be aware of it.
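
Constructor chaining can be observed with a short sketch. The class names Base, Middle, and Leaf, and the shared LOG buffer, are hypothetical helpers used only to record the order in which the constructor bodies run.

```java
class ChainDemo {
    // Records the order in which constructor bodies execute.
    static final StringBuilder LOG = new StringBuilder();

    public static void main(String[] args) {
        new Leaf();
        System.out.println(LOG);
    }
}

class Base {
    Base() { ChainDemo.LOG.append("Base "); }
}

class Middle extends Base {
    // No explicit super(): the compiler inserts a call to Base().
    Middle() { ChainDemo.LOG.append("Middle "); }
}

class Leaf extends Middle {
    Leaf() {
        super();                       // explicit call; must come first
        ChainDemo.LOG.append("Leaf");
    }
}
```

Creating a single Leaf object runs the constructor bodies from the top of the hierarchy downward, so the log reads "Base Middle Leaf".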


xii- Using Object as a Superclass


The Object class, in the java.lang package, sits at the top of the class
hierarchy tree. Every class is a descendant, direct or indirect, of the
Object class. Every class you use or write inherits the instance methods of
Object. You need not use any of these methods, but, if you choose to do
so, you may need to override them with code that is specific to your class.

The methods inherited from Object that are discussed in this section are:

• protected Object clone() throws CloneNotSupportedException
  Creates and returns a copy of this object.
• public boolean equals(Object obj)
  Indicates whether some other object is "equal to" this one.
• protected void finalize() throws Throwable
  Called by the garbage collector on an object when garbage
  collection determines that there are no more references to the object.
• public final Class getClass()
  Returns the runtime class of an object.
• public int hashCode()
  Returns a hash code value for the object.
• public String toString()
  Returns a string representation of the object.

The notify, notifyAll, and wait methods of Object all play a part in
synchronizing the activities of independently running threads in a
program, which is discussed in a later lesson and won't be covered here.

There are five of these methods:

• public final void notify()
• public final void notifyAll()
• public final void wait()
• public final void wait(long timeout)
• public final void wait(long timeout, int nanos)

A- The clone() Method


If a class, or one of its superclasses, implements the Cloneable interface,
you can use the clone() method to create a copy from an existing object.
In order to create a clone, you write:

CloneableObject.clone();


Object's implementation of this method checks to see whether the object
on which clone() was invoked implements the Cloneable interface. If the
object does not, the method throws a CloneNotSupportedException.
Exception handling will be covered in a later lesson. For the
moment, you need to know that clone() must be declared as

protected Object clone() throws CloneNotSupportedException
-- or --
public Object clone() throws CloneNotSupportedException

when you are going to write a clone() method to override the one in
Object. If the object on which clone() was invoked does implement the
Cloneable interface, Object's implementation of the clone() method
creates an object of the same class as the original object and initializes the
new object's member variables to have the same values as the original
object's corresponding member variables.

The simplest way to make your class Cloneable is to add implements
Cloneable to your class's declaration. Then your objects can invoke the
clone() method. For some classes, the default behavior of Object's
clone() method works just fine. If, however, an object contains a
reference to an external object, say ObjExternal, you may need to
override clone() to get correct behavior. Otherwise, a change in
ObjExternal made by one object will be visible in its clone also. This
means that the original object and its clone are not independent. To
decouple them, you must override clone() so that it clones the object and
ObjExternal. Then the original object references ObjExternal and the
clone references a clone of ObjExternal, so that the object and its clone
are truly independent.
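
The decoupling described above might be sketched as follows. The classes Order and Address are hypothetical; Address plays the role of the external object (ObjExternal), and both classes override clone() using the public form of the declaration.

```java
class CloneDemo {
    public static void main(String[] args) throws CloneNotSupportedException {
        Order original = new Order(new Address("Cairo"));
        Order copy = original.clone();
        copy.shipTo.city = "Lyon";                 // change only the clone
        System.out.println(original.shipTo.city);  // original is unaffected
    }
}

// Plays the role of the "external object" referenced by Order.
class Address implements Cloneable {
    String city;
    Address(String city) { this.city = city; }

    @Override
    public Address clone() throws CloneNotSupportedException {
        return (Address) super.clone();   // field-by-field copy is enough here
    }
}

class Order implements Cloneable {
    Address shipTo;
    Order(Address shipTo) { this.shipTo = shipTo; }

    @Override
    public Order clone() throws CloneNotSupportedException {
        Order copy = (Order) super.clone();   // copies the reference only
        copy.shipTo = shipTo.clone();         // so the referenced object is cloned too
        return copy;
    }
}
```

Without the second statement in Order.clone(), both objects would share one Address, and the change made through the copy would also show up in the original.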

B. The equals() Method


The equals() method compares two objects for equality and returns true if
they are equal. The equals() method provided in the Object class uses the
identity operator (==) to determine whether two objects are equal. For
primitive data types, this gives the correct result. For objects, it does not.

The equals() method provided by Object tests whether the object
references are equal, that is, whether the objects compared are the exact
same object. In order to test whether two objects are equal (containing the
same information), you must override the equals() method. Here is an
example of a Book class that overrides equals():


public class Book {
    ...
    public boolean equals(Object obj) {
        if (obj instanceof Book)
            // cast obj to Book first, then call getISBN() on the result
            return ISBN.equals(((Book) obj).getISBN());
        else
            return false;
    }
}

Consider this code that tests two instances of the Book class for equality:

Book firstBook = new Book("0201914670"); // Swing Tutorial, 2nd edition
Book secondBook = new Book("0201914670");
if (firstBook.equals(secondBook)) {
    System.out.println("objects are equal");
} else {
    System.out.println("objects are not equal");
}

This program displays objects are equal even though firstBook and
secondBook reference two distinct objects. They are considered equal
because the objects compared contain the same ISBN number. You
should always override the equals() method if the identity operator is not
appropriate for your class. Note that if you override equals(), you must
override hashCode() as well.

C. The finalize() Method


The Object class provides a callback method, finalize(), that may be
invoked on an object when it becomes garbage. Object's implementation
of finalize() does nothing—you can override finalize() to do cleanup, such
as freeing resources. The finalize() method may be called automatically
by the system, but when it is called, or even if it is called, is uncertain.
Therefore, you should not rely on this method to do your cleanup. For
example, if you don't close file descriptors after performing I/O and you
expect finalize() to close them, you may run out of file descriptors.

D. The getClass() Method


You cannot override getClass(). The getClass() method returns a Class
object, which has methods you can use to get information about the class,
such as its name (getSimpleName()), its superclass (getSuperclass()), and
the interfaces it implements (getInterfaces()). For example, the following
method gets and displays the class name of an object:


void printClassName(Object obj) {
    // the + operator joins the label and the class name
    System.out.println("Object class is " + obj.getClass().getSimpleName());
}

The Class class, in the java.lang package, has a large number of methods
(more than 50). For example, you can test to see if the class is an
interface (isInterface()), an annotation (isAnnotation()), or an
enumeration (isEnum()). You can see what the object's fields are
(getFields()) or what its methods are (getMethods()), and so on.

E. The hashCode() Method


The value returned by hashCode() is the object's hash code, an int value.
Object's implementation typically derives it from the object's memory
address. By definition, if two objects are equal, their hash codes must
also be equal. If you override the equals() method, you change the way two
objects are equated and Object's implementation of hashCode() is no longer
valid. Therefore, if you override the equals() method, you must also
override the hashCode() method.
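
A minimal sketch of overriding equals() and hashCode() together is shown below. This Book class is a simplified stand-in for the one used earlier: the field name isbn and the choice of hashing only that field are assumptions made for the example.

```java
class HashCodeDemo {
    public static void main(String[] args) {
        Book first = new Book("0201914670");
        Book second = new Book("0201914670");
        System.out.println(first.equals(second));                   // same ISBN
        System.out.println(first.hashCode() == second.hashCode());  // same hash code
    }
}

class Book {
    private final String isbn;   // assumed to be non-null

    Book(String isbn) { this.isbn = isbn; }

    String getISBN() { return isbn; }

    @Override
    public boolean equals(Object obj) {
        if (obj instanceof Book) {
            return isbn.equals(((Book) obj).getISBN());
        }
        return false;
    }

    // Built from the same field that equals() compares, so equal books
    // always produce equal hash codes.
    @Override
    public int hashCode() {
        return isbn.hashCode();
    }
}
```

Keeping the two methods based on the same fields is what lets hash-based collections such as HashSet and HashMap find an equal object again.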

F. The toString() Method


You should always consider overriding the toString() method in your
classes. The Object's toString() method returns a String representation of
the object, which is very useful for debugging. The String representation
for an object depends entirely on the object, which is why you need to
override toString() in your classes. You can use toString() along with
System.out.println() to display a text representation of an object, such as
an instance of Book:

System.out.println(firstBook.toString());

which would, for a properly overridden toString() method, print
something useful, like this:

ISBN: 0201914670; The JFC Swing Tutorial; A Guide to Constructing
GUIs, 2nd Edition

G. Writing Final Classes and Methods


You can declare some or all of a class's methods final. You use the final
keyword in a method declaration to indicate that the method cannot be
overridden by subclasses. The Object class does this—a number of its
methods are final.

You might wish to make a method final if it has an implementation that
should not be changed and it is critical to the consistent state of the
object. For example, you might want to make the getFirstPlayer method
in this ChessAlgorithm class final:

class ChessAlgorithm {
    enum ChessPlayer { WHITE, BLACK }
    ...
    final ChessPlayer getFirstPlayer() { return ChessPlayer.WHITE; }
    ...
}

Methods called from constructors should generally be declared final. If a
constructor calls a non-final method, a subclass may redefine that method
with surprising or undesirable results. Note that you can also declare an
entire class final; this prevents the class from being subclassed. This is
particularly useful, for example, when creating an immutable class like
the String class.
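
The kind of surprise mentioned above can be reproduced with a small sketch. The classes Parent and Child and the method describe() are hypothetical names used only for this illustration.

```java
class FinalDemo {
    public static void main(String[] args) {
        // Shows what the Parent constructor observed while Child was being built.
        System.out.println(new Child().seen);
    }
}

class Parent {
    String seen;

    Parent() {
        seen = describe();   // calls a non-final method from the constructor
    }

    String describe() { return "parent"; }
}

class Child extends Parent {
    private String name = "child";   // assigned only after Parent() has finished

    @Override
    String describe() { return "name=" + name; }
}
```

The Parent constructor runs before the field initializer in Child, so the overriding describe() still sees name as null and the recorded text is "name=null". Declaring describe() final in Parent would rule out this situation at compile time.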

H. The printf() and format() Methods


The java.io package includes a PrintStream class that has two formatting
methods that you can use to replace print() and println(). These methods,
format() and printf(), are equivalent to one another. The familiar
System.out that you have been using happens to be a PrintStream object,
so you can invoke PrintStream methods on System.out. Thus, you can use
format() or printf() anywhere in your code where you have previously
been using print() or println(). For example,

System.out.format(.....);

The syntax for these two java.io.PrintStream methods is the same:

public PrintStream format(String format, Object... args)

where format is a string that specifies the formatting to be used and args
is a list of the variables to be printed using that formatting. A simple
example would be

System.out.format("The value of the float variable is %f, while the value of the "
    + "integer variable is %d, and the string is %s", floatVar, intVar, stringVar);


The first parameter, format, is a format string specifying how the objects
in the second parameter, args, are to be formatted. The format string
contains plain text as well as format specifiers, which are special
characters that format the arguments of Object... args. Here, the notation
Object... args is called varargs, which means that the number of
arguments may vary.

Format specifiers begin with a percent sign (%) and end with a converter.
The converter is a character indicating the type of argument to be
formatted. Between the percent sign (%) and the converter you can have
optional flags and specifiers. There are many converters, flags, and
specifiers, which are documented in java.util.Formatter. Here is an
example:

int i = 461012;
System.out.format("The value of i is: %d%n", i);

The %d specifies that the single variable is a decimal integer. The %n is a
platform-independent newline character. The output is:

The value of i is: 461012

The printf() and format() methods are overloaded. Each has a version
with the following syntax:

public PrintStream format(Locale l, String format, Object... args)

To print numbers in the French system (where a comma is used in place
of the decimal point in the English representation of floating point
numbers), for example, you would use:

System.out.format(Locale.FRANCE, "The value of the float variable is "
    + "%f, while the value of the integer variable is %d, and the string is %s%n",
    floatVar, intVar, stringVar);

The following table lists some of the converters and flags.


Table 6-16. Converters and flags used in TestFormat.java

Converter   Flag   Explanation
---------   ----   ------------------------------------------------------
d                  A decimal integer.
f                  A float.
n                  A new line character. You should always use %n,
                   rather than \n.
tB                 A date & time conversion: locale-specific full name
                   of month.
td, te             A date & time conversion: 2-digit day of month. td
                   has leading zeroes as needed, te does not.
ty, tY             A date & time conversion: ty = 2-digit year,
                   tY = 4-digit year.
tl                 A date & time conversion: hour in 12-hour clock.
tM                 A date & time conversion: minutes in 2 digits, with
                   leading zeroes as necessary.
tp                 A date & time conversion: am/pm (lower case).
tm                 A date & time conversion: months in 2 digits, with
                   leading zeroes as necessary.
tD                 A date & time conversion: date as %tm/%td/%ty.
            08     Eight characters in width, with leading zeroes as
                   necessary.
            .3     Three places after decimal point.
            10.3   Ten characters in width, right justified, with three
                   places after decimal point.
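
The following short sketch shows a few of the converters and flags from Table 6-16 in use. The class name FormatDemo and its helper methods are hypothetical names chosen for this illustration; only String.format, System.out.format and java.util.Locale come from the Java library.

```java
import java.util.Locale;

// Hypothetical demo class: exercises the d and f converters with the
// 08, .3 and 10.3 flags listed in Table 6-16.
class FormatDemo {
    // Flag 0 with width 8: pad the integer with leading zeroes.
    static String zeroPadded(int i) {
        return String.format(Locale.US, "%08d", i);
    }

    // Width 10, three places after the decimal point, English notation.
    static String fixedWidthUS(double x) {
        return String.format(Locale.US, "%10.3f", x);
    }

    // Same specifier, but the French locale uses a decimal comma.
    static String fixedWidthFrench(double x) {
        return String.format(Locale.FRANCE, "%10.3f", x);
    }

    public static void main(String[] args) {
        System.out.format("%s%n", zeroPadded(461012));           // 00461012
        System.out.format("[%s]%n", fixedWidthUS(Math.PI));      // [     3.142]
        System.out.format("[%s]%n", fixedWidthFrench(Math.PI));  // [     3,142]
    }
}
```

Passing the Locale explicitly keeps the output independent of the default locale of the machine on which the program runs.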

6-7.13. Abstract Methods and Abstract Classes in Java


An abstract class is a class that is declared abstract—it may or may not
include abstract methods. Abstract classes cannot be instantiated, but they
can be subclassed. An abstract method is a method that is declared
without an implementation (without braces, and followed by a
semicolon), like this:

abstract void moveTo(double deltaX, double deltaY);

If a class includes abstract methods, the class itself must be declared
abstract, as in:


public abstract class GraphicObject {
    // declare fields
    // declare non-abstract methods
    abstract void draw();
}

When an abstract class is subclassed, the subclass usually provides
implementations for all of the abstract methods in its parent class.
However, if it does not, the subclass must also be declared abstract. Note
that all of the methods declared without a body in an interface (see the
Interfaces section) are implicitly abstract, so the abstract modifier is
not used with interface methods (it could be; it is just not necessary).

i. Abstract Classes versus Interfaces


Unlike interfaces, abstract classes can contain fields that are not static and
final, and they can contain implemented methods. Such abstract classes
are similar to interfaces, except that they provide a partial
implementation, leaving it to subclasses to complete the implementation.
If an abstract class contains only abstract method declarations, it should
be declared as an interface instead. Multiple interfaces can be
implemented by classes anywhere in the class hierarchy, whether or not
they are related to one another in any way. Think of Comparable or
Cloneable, for example. By comparison, abstract classes are most
commonly subclassed to share pieces of implementation. A single
abstract class is subclassed by similar classes that have a lot in common
(the implemented parts of the abstract class), but also have some
differences (the abstract methods).

ii. Example about the Abstract Classes


In an object-oriented drawing application, you can draw circles,
rectangles, lines, Bezier curves, and many other graphic objects. These
objects all have certain states (for example: position, orientation, line
color, fill color) and behaviors (for example: moveTo, rotate, resize,
draw) in common. Some of these states and behaviors are the same for all
graphic objects—for example: position, fill color, and moveTo. Others
require different implementations—for example, resize or draw.

All GraphicObjects must know how to draw or resize themselves; they
just differ in how they do it. This is a perfect situation for an abstract
superclass. You can take advantage of the similarities and declare all the
graphic objects to inherit from the same abstract parent object (for
example, GraphicObject), as shown in the following figure.

Fig. 6-8. GraphicObject Class hierarchy

Classes Rectangle, Line, Bezier, and Circle inherit from GraphicObject


First, you declare an abstract class, GraphicObject, to provide member
variables and methods that are wholly shared by all subclasses, such as
the current position and the moveTo method. GraphicObject also declares
abstract methods, such as draw or resize, that need to be implemented by
all subclasses but must be implemented in different ways. The
GraphicObject class can look something like this:

abstract class GraphicObject {
    int x, y;
    ...
    void moveTo(int newX, int newY) {...}
    abstract void draw();
    abstract void resize();
}

Each non-abstract subclass of GraphicObject, such as Circle, must
provide implementations for the draw and resize methods:

class Circle extends GraphicObject {
    void draw() {...}
    void resize() {...}
}
class Rectangle extends GraphicObject {
    void draw() {...}
    void resize() {...}
}
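
The outline above leaves the method bodies elided. A minimal runnable sketch of the same hierarchy could look as follows. The field names radius, width and height, the resize factor parameter, and the choice to let draw return a descriptive String (instead of painting on a screen) are assumptions made only for this illustration.

```java
// Abstract parent: holds the shared state and the shared moveTo method.
abstract class GraphicObject {
    protected int x, y;

    void moveTo(int newX, int newY) {
        x = newX;
        y = newY;
    }

    // Each subclass must supply its own version of these two methods.
    abstract String draw();
    abstract void resize(double factor);
}

class Circle extends GraphicObject {
    private double radius = 1.0;   // assumed default size

    @Override
    String draw() {
        return "Circle at (" + x + "," + y + ") r=" + radius;
    }

    @Override
    void resize(double factor) {
        radius *= factor;
    }
}

class Rectangle extends GraphicObject {
    private double width = 2.0, height = 1.0;   // assumed default size

    @Override
    String draw() {
        return "Rectangle at (" + x + "," + y + ") " + width + "x" + height;
    }

    @Override
    void resize(double factor) {
        width *= factor;
        height *= factor;
    }
}

class GraphicDemo {
    public static void main(String[] args) {
        GraphicObject[] shapes = { new Circle(), new Rectangle() };
        for (GraphicObject g : shapes) {
            g.moveTo(3, 4);      // shared implementation
            g.resize(2.0);       // subclass-specific implementation
            System.out.println(g.draw());
        }
    }
}
```

The loop in main treats every shape through the abstract type, which is the reason for collecting the common behavior in GraphicObject.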

iii. When an Abstract Class Implements an Interface


In the section of Interfaces, it was noted that a class that implements an
interface must implement all of the interface methods. It is possible,
however, to define a class that does not implement all of the interface
methods, provided that the class is declared to be abstract. For example,

abstract class X implements Y {
    // implements all but one method of Y
}
class XX extends X {
    // implements the remaining method in Y
}

In this case, class X must be abstract because it does not fully implement
Y, but class XX does, in fact, implement Y. An abstract class may have
static fields and static methods. You can use these static members with a
class reference—for example, AbstractClass.staticMethod()—as you
would with any other class.
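
As a concrete sketch of the X, Y and XX pattern, consider the following classes. The names Greeter, BaseGreeter and WorldGreeter are hypothetical and play the roles of Y, X and XX respectively.

```java
// Plays the role of Y: an interface with two methods.
interface Greeter {
    String greeting();
    String name();
}

// Plays the role of X: implements greeting() but not name(),
// so it must be declared abstract.
abstract class BaseGreeter implements Greeter {
    @Override
    public String greeting() {
        return "Hello, " + name();
    }

    // Static members of an abstract class are used through the class name.
    static String describe() {
        return "BaseGreeter: partial implementation of Greeter";
    }
}

// Plays the role of XX: supplies the remaining method.
class WorldGreeter extends BaseGreeter {
    @Override
    public String name() {
        return "World";
    }
}

class GreeterDemo {
    public static void main(String[] args) {
        Greeter g = new WorldGreeter();
        System.out.println(g.greeting());            // Hello, World
        System.out.println(BaseGreeter.describe());  // call via class reference
    }
}
```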


6-8. Java versus C++


As we have seen so far, C++ is one of the most powerful languages and
will be used for a long time in the future in spite of the advent of Java.
C++ compiles to native machine code and runs extremely fast; it can be 10
to 20 times faster than Java when the Java byte-code is purely interpreted
by the Java virtual machine (JVM). Java runs faster with Just-In-Time
(JIT) compilers, but optimized C++ programs are still about 3 to 4 times
faster than Java with a JIT compiler. In addition, the code from Java
functions can be copied into C++ functions with minimal change. Then, why
do people use Java? Because it is purely object oriented and is easier to
learn. Also, Java automates memory management, and programmers do not
need to deal with memory allocation and release in Java programs.

In fact, language choice is very difficult, due to many things, such as
people skills, cost, tools, and influence of business. The best language
based on technical merits may not be selected simply due to some
political issues! Java is much closer to Ada95 than C++. As per David
Wheeler's Ada comparison chart, Ada95 gets the maximum points. Ada
got 93%, Java 72%, C++ 68% and C got 53%. Thus, C++ and Java are
close in points (4% difference), and Java is not a big revolution as
compared to C++.

Java is indeed more suitable for developing applications running inside
web browsers (Java applets) but runs very slowly. Hence, the golden rule
is "in web-server side programming use C++ and in web-client side
(browser) programming use Java applets". The reason is that the
server-side OS is under your control, but you will never know what the
client-side web-browser OS is. It can be an Internet device (embedded
Linux) or a PC running Windows, Apple Mac, or Solaris etc. The advantage
of the Java language is that you can create applets (GUI) which can run
on any client OS platform. Many web browsers support Java applets, and a
web browser like HotJava is written in Java itself. But the price you pay
for cross-platform portability is lower performance.


6-9. Java versus C#


We have pointed out that Java was developed by Sun Microsystems as a
fast programming language for Internet applications. The C#
programming language (pronounced C-sharp) was Microsoft's answer to
Sun's Java. C# combines the speed and power of the C++ language with the
high productivity of Visual Basic. The only disadvantage of C# is its
platform dependency, because C# will probably only be available
for the Windows operating system.

C# has garbage collection, but it is also possible to use pointer
arithmetic in specially declared blocks and take responsibility for
releasing memory yourself. In C# all data types are objects. C# has
built-in Component Object Model (COM) functionality. Also, C# has get/set
methods (properties) and events. Every C# object is automatically a COM
object and, because of that, any other COM object created in any language
can be used. Altogether, C# is a very interesting language (especially
for Windows developers), and the change from C++ will not be very hard.
Also very interesting is the meta-documentation that C# attaches to every
element (class, method...); you can access this documentation at run time.

The following example demonstrates how C# differs from C, C++ and Java.

// Hello World in C

#include <stdio.h>

int main(void)
{
    printf("Hello World\n");
    return 0;
}

// Hello World in C++

#include <iostream>

int main(int argc, char *argv[])
{
    std::cout << "Hello World" << std::endl;
    return 0;
}


// Hello World in C#

using System;
class HelloWorld
{
    static void Main()
    {
        Console.WriteLine("Hello World");
    }
}

// Hello World in Java

class Hello
{
    public static void main(String[] args) {
        System.out.println("hello, world");
    }
}

It is interesting to show that the assembly program equivalent to the
above four listings is as simple as follows:

; Hello World program in Intel x86 assembly under DOS (using MASM)

        .MODEL tiny
        .CODE
        ORG 100H
HELLO   PROC
        MOV AH, 09h
        LEA DX, msg
        INT 21h             ; Display Hello World
        MOV AX, 4C00h       ; Exit to DOS
        INT 21h
HELLO   ENDP
msg     DB 'Hello World$'
        END HELLO           ; entry point is HELLO

In the following section, we'll see how to write fast and efficient
assembly routines within the C/C++ and Java programs.


6-10. Invoking Assembly Language Programs from Java


The Java Native Interface (JNI) provides a powerful platform for
integrating code written in languages other than Java (mainly C and C++)
with that written in the Java programming language. Although,
theoretically speaking, JNI does provide a fairly generalized interface, the
support structure that comes with JNI is basically aimed at linking C/C++
code with Java code. The literature that is available also appears to deal
exclusively with the methodology of linking Java and C/C++. This
section demonstrates the techniques that allow Java code to call code
written in assembly language. The version of assembly language used for
writing the illustration code is MASM32.

6-10.1. JNI Approach


When a Java method calls assembly language code, some information
will almost always have to move from one environment to the other. The
calling method will usually pass parameters to the called function and the
called function may return some information to the caller. In addition to
this, each environment requires information about the other to be able to
work together. The problem is that data representation within the Java
Virtual Machine (JVM) is different from that in the assembly language
environment. Also some information, especially within the JVM, is of a
specialized nature and there is no provision in native languages
(C/C++/assembly) to directly access such information. JNI provides a
rich set of interface functions that facilitate exchange of such data by
providing access to the internal database of the JVM and by providing the
required mapping from the data type of one environment to the
corresponding data type of the other. JNI also has certain other support
structures that make it easy for C and C++ programs to call these
interface functions. Unfortunately, these support mechanisms are not
directly usable by assembly language programs. The assembly language
programmer, therefore, needs to understand how the interface functions
can be directly accessed, and an appreciation of the structure of JNI is
necessary to achieve this understanding.

6-10.2. JNI Structure


Whenever a Java program calls a native method, the called method
compulsorily receives two parameters in addition to those specified by
the calling method. The first is the JNIEnv pointer and the second is a
reference to the calling object or class. It is the first parameter that is the
key to the world of JNI. JNIEnv is a pointer that, in turn, points to
another pointer.


The second pointer points to a function table that is an array of pointers.
Each pointer in the function table points to a JNI interface function. In
order to call an interface function, we have to determine the value of the
corresponding entry in the function table. Let us see how we can do this
in two steps. First we find out what the value of the second pointer in our
chain is. In other words, we get the contents of the location pointed to by
JNIEnv. We do that as follows:

    mov ebx, JNIEnv
    mov eax, [ebx]

The first instruction loads the contents of JNIEnv into ebx and the second
loads the contents of the address pointed to by ebx into eax. Since the
content of ebx is the same as that of JNIEnv, eax now has the content of
the location pointed to by JNIEnv. This means eax now contains the
starting address of the function table.

Fig. 6-9. Accessing JNI functions

Next, we need to retrieve the contents of the entry in the function table
that corresponds to the function we want to call. To do this, we have to
multiply the zero-based index of the function by four (since each pointer
is four bytes long in 32-bit code) and add the result to the starting
address of the function table, which we formed in eax earlier. We do it
as follows:

    mov ebx, eax        ; save pointer to function table
    mov eax, index      ; move the value of index into eax
    mov ecx, 4
    mul ecx             ; multiply index by 4 (result in edx:eax)
    add ebx, eax        ; ebx points to the desired entry
    mov eax, [ebx]      ; eax points to the desired function

The content of eax can now be used to call the function. This scheme of
accessing JNI interface functions is shown in Fig. 6-9.
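
The index-to-offset arithmetic can be checked with a few lines of Java. This is only a sketch of the calculation: the class name JniOffset and the method entryOffset are hypothetical, and the indices 167 and 169 are the ones used for NewStringUTF and GetStringUTFChars in the assembly listing of Example 6-31.

```java
// Hypothetical helper: byte offset of a function-table entry,
// given its zero-based index and the pointer size in bytes.
class JniOffset {
    static long entryOffset(int index, int pointerSize) {
        return (long) index * pointerSize;
    }

    public static void main(String[] args) {
        int pointerSize = 4;   // 32-bit code, as assumed in this section
        System.out.println("NewStringUTF      offset = "
                + entryOffset(167, pointerSize));   // 668
        System.out.println("GetStringUTFChars offset = "
                + entryOffset(169, pointerSize));   // 676
    }
}
```

So, for index 169, the assembly code reads the four bytes located 676 bytes after the start of the function table.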

Example 6-31
In order to see how the JNI technique can be used to call an assembly
language program, let us consider a simple example. In our example a
Java class (ShowMessage) calls assembly language code to display a
Windows message box. If the message box is displayed, then the
assembly language code returns a string to tell the calling class that it was
successful. Otherwise, an error message is returned. In either case, the
calling class prints the returned string on the console. The Java class
looks like this:

class ShowMessage
{
    public native String HelloDll(String s);
    static
    {
        System.loadLibrary("hjwdll");
    }
    public static void main(String[] args)
    {
        ShowMessage sm = new ShowMessage();
        String returnMessage = sm.HelloDll("Hello, World of JNI");
        System.out.println(returnMessage);
    }
}

Those familiar with JNI will notice that the Java class is identical to what
it would have been if the called native method had been written in C or
C++, which, of course, is as it should be, since the calling method need
not be aware of the language used to write the called method. All that
matters to the Java code is that it is calling a native method as declared in
the third line of the code:

public native String HelloDll(String s);

We will not go into the structure of the Java class; you may find it in
other books (it is as simple as C++). It is the assembly language code that
is of interest to us here, and we shall examine it in some detail.


.386
.model flat,stdcall
option casemap:none

include <pathname>\include\windows.inc
include <pathname>\include\user32.inc
include <pathname>\include\kernel32.inc
includelib <pathname>\lib\user32.lib
includelib <pathname>\lib\kernel32.lib

Java_ShowMessage_HelloDll PROTO :DWORD, :DWORD, :DWORD

; This macro returns pointer to the function table in fnTblPtr
GetFnTblPtr MACRO envPtr, fnTblPtr
    mov ebx, envPtr
    mov eax, [ebx]
    mov fnTblPtr, eax
ENDM

; This macro returns pointer to desired function in fnPtr.
GetFnPtr MACRO fnTblPtr, index, fnPtr
    mov eax, index
    mov ebx, 4
    mul ebx
    mov ebx, fnTblPtr
    add ebx, eax
    mov eax, [ebx]
    mov fnPtr, eax
ENDM

.data
Caption  db "JAV_ASM",0
ErrorMsg db "String conversion error",0
SccsMsg  db "MessageBox displayed",0

.code
hwEntry proc hInstance:HINSTANCE, reason:DWORD, reserved1:DWORD
    mov eax, TRUE
    ret
hwEntry endp


; ebx is changed by the macros, so it is saved and restored (uses ebx)
Java_ShowMessage_HelloDll proc uses ebx JNIEnv:DWORD, jobject:DWORD, Msgptr:DWORD
    LOCAL fntblptr : DWORD
    LOCAL Message : DWORD
    LOCAL fnptr : DWORD

    GetFnTblPtr JNIEnv, fntblptr        ; pointer to function table
    GetFnPtr fntblptr, 169, fnptr       ; pointer to GetStringUTFChars
    push NULL                           ; push parameters for GetStringUTFChars
    push Msgptr
    push JNIEnv
    call [fnptr]                        ; call GetStringUTFChars
    mov Message, eax                    ; if eax is NULL then error
    .if eax == NULL
        invoke MessageBox, NULL, addr ErrorMsg, addr Caption, 16
        GetFnPtr fntblptr, 167, fnptr   ; pointer to NewStringUTF
        push offset ErrorMsg            ; push parameters for
        push JNIEnv                     ; NewStringUTF
        call [fnptr]                    ; call NewStringUTF
    .else
        invoke MessageBox, NULL, Message, addr Caption, 64
        GetFnPtr fntblptr, 170, fnptr   ; pointer to ReleaseStringUTFChars
        push Message
        push Msgptr
        push JNIEnv
        call [fnptr]                    ; release string
        GetFnPtr fntblptr, 167, fnptr   ; pointer to NewStringUTF
        push offset SccsMsg             ; push parameters for
        push JNIEnv                     ; NewStringUTF
        call [fnptr]                    ; call NewStringUTF
    .endif
    ret                                 ; return to Java program
Java_ShowMessage_HelloDll endp
end hwEntry

Note that <pathname> will be determined by the directory structure of
your system. For instance, it may be C:\masm32. Note also that this
program has been written as a .dll (Dynamic Link Library) as shown by
the hwEntry procedure at the beginning of the .code section. A .dll
can be linked with a Java program at run time. Needless to say, the
function of a .dll does not have to be restricted to calling a Windows API;
it can do whatever you want it to do.


The MASM code should be saved as hjwdll.asm and, on successful
assembly, hjwdll.dll and hjwdll.lib files will be created. The first thing
that we need to look at is the name of the native method as it appears in
the assembly language code. HelloDll, the name used by the calling
method, appears in quite a different form in the called method. The
transformation from HelloDll to Java_ShowMessage_HelloDll is
called mangling and has been explained in The Java™ Native Interface
Programmer's Guide and Specification.

The mangled name can be derived manually by using the algorithm used
by JNI or generated automatically by running javah on
ShowMessage. You can do this by typing:

javah -jni ShowMessage

at the command line. The resulting file will be ShowMessage.h and will
show the mangled name. If you do use the javah approach, do not
include the output file in the assembly code. The only thing to be used is
the mangled name.
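
For simple cases the mangled name can also be computed by hand. The sketch below follows the basic JNI naming rules (the prefix Java_, the class name, an underscore and the method name, with special characters escaped). The class JniNames and its methods are hypothetical helpers written for this illustration; overloaded native methods, which need an additional signature suffix, are not handled.

```java
// Hypothetical helper that derives the native function name JNI expects
// for a non-overloaded native method.
class JniNames {
    // Escape one name component according to the basic JNI rules.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c == '.' || c == '/') {
                sb.append('_');            // package separator
            } else if (c == '_') {
                sb.append("_1");           // literal underscore
            } else if (c == ';') {
                sb.append("_2");
            } else if (c == '[') {
                sb.append("_3");
            } else if (c < 128 && Character.isLetterOrDigit(c)) {
                sb.append(c);              // plain ASCII letter or digit
            } else {
                // any other character: _0 followed by four hex digits
                sb.append(String.format("_0%04x", (int) c));
            }
        }
        return sb.toString();
    }

    static String nativeName(String className, String methodName) {
        return "Java_" + escape(className) + "_" + escape(methodName);
    }

    public static void main(String[] args) {
        // prints Java_ShowMessage_HelloDll
        System.out.println(nativeName("ShowMessage", "HelloDll"));
    }
}
```

For the class of Example 6-31 this gives the same name that appears in the assembly listing.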

Other important points here are, of course, the two macros
GetFnTblPtr and GetFnPtr. These are modified versions of the
code snippets introduced in the preceding section. The modifications
enable the macros to operate directly on appropriate memory locations
and obviate the need for manipulating input and output variables through
the registers. Obtaining the pointer to the function one wants to call
becomes fairly simple because of the macros.

The HelloDll procedure first gets the pointer to the function table. It
then gets the pointer to the GetStringUTFChars function to convert
the String object passed by the Java method into a UTF8 string that
can be handled by assembly language. The parameters required for
calling GetStringUTFChars are then pushed onto the stack. Note
that the right-most parameter is pushed first in accordance with the
stdcall convention followed by JNI. The function puts its return value
in eax. If this value is NULL, then there was an error. Otherwise, a valid
pointer to a UTF8 string is available in eax, which can be used to display
the message passed by the Java method. After the message is displayed,
the UTF8 string should be released as shown.


The native method returns one of two strings to the calling method
depending on whether it succeeded or failed in displaying the message
passed to it by the Java method. However, the string generated by the
native method has to be converted into a Java String object before
being returned. This is done by a call to NewStringUTF.

Take note of the fact that the pointer to the function table needs to be
derived only once in a thread. That is why it is better to split the pointer
translation process into two parts so that the first part need not be
executed unnecessarily over and over again. Once you have compiled the
ShowMessage class and have created the hjwdll.lib and hjwdll.dll files,
put all three files in the same folder. Now, if you execute
ShowMessage, you will see a message box like the one in Figure 6-10.

Figure 6-10. Windows MessageBox called from Java


6-11. Summary
Assembly language is more powerful and faster than all high-level
languages. However, in order to write a large software system, it is more
practical to use a high-level language, like C/C++ or Java, and to use
assembly language only when you would like to build efficient routines,
such as I/O routines. Anything you can do in C/C++ or Java you can also
do in assembly, since compilers ultimately translate the source code into
machine code. In this chapter we described how to write assembly language
routines within C/C++ and Java programs.

We also explained what you need to know to use the Visual C/C++ inline
assembler with Intel x86-series processors and compatible. Inline
assembly code may be included as a string parameter, one instruction per
line, to the asm function in a C/C++ source program.

asm("incl x; movl 8(%ebp), %eax ");

We have also explained how to call assembly routines in other ways, in a
multi-language programming environment. In the meantime we
introduced a descriptive summary of both C/C++ and Java programming
languages. The differences between the C++ and Java programming
languages can be traced to their heritage, as they have different design
goals.

C++ was designed mainly for systems programming, extending the
C programming language. To this procedural programming language
designed for efficient execution, C++ has added support for
statically-typed object-oriented programming, exception handling, scoped
resource management, and generic programming, in particular. It also
added a standard library that includes generic containers and
algorithms.

Java was created initially to support network computing. It relies
on a virtual machine to be secure and highly portable. It is bundled
with an extensive library designed to provide a complete abstraction of
the underlying platform. Java is a statically typed object-oriented
language that uses a syntax similar to C, but is not compatible with it.
It was designed from scratch, with the goal of being easy to use and
accessible to a wider audience.

The different goals in the development of C++ and Java resulted in
different principles and design trade-offs between the languages. The
differences are as follows:

C++                                     Java
-------------------------------------   -------------------------------------
More or less backwards compatible       Designed without backward
with C source code.                     compatibility with any previous
                                        language. The syntax is, however,
                                        influenced by C/C++ to make the
                                        transition easy for developers.

Allows direct calls to native system    Calls native code through the Java
libraries.                              Native Interface.

Exposes low-level system facilities.    Runs in a protected virtual machine.

Optional automated bounds checking.     Always performs bounds checking.

Supports native unsigned arithmetic.    No native support for unsigned
                                        arithmetic.

No standardized limits or sizes for     Standardized limits and sizes of all
any numerical types. Only relative      primitive types.
sizes specified.

Parameters passed by value, by          Parameters always passed by value;
pointer or by reference.                however, objects are accessed through
                                        references and it is these references
                                        that are passed or returned by value,
                                        not the objects themselves (comparable
                                        to passing pointers by value in C++).

Explicit memory management, though      Automatic garbage collection only,
third-party frameworks exist to         though it can be manually tuned by
provide garbage collection.             the programmer.

Allows explicitly overriding types.     Rigid type safety except for widening
                                        conversions.

The C++ Standard Library has a much     The standard library has grown with
more limited scope and functionality    each release.
than the Java standard library.

Operator overloading.                   Meaning of operators is immutable.

Full, multiple inheritance.             Full single inheritance, multiple
                                        inheritance from interfaces only.
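
The row on parameter passing is often misunderstood, so a short sketch may help. The class PassByValueDemo and its methods are hypothetical names used only for this illustration; it shows that Java copies the reference (or the primitive value) into the called method, so the caller's variable itself is never changed, although the object it refers to can be.

```java
// Hypothetical demo of Java's pass-by-value semantics.
class PassByValueDemo {
    // Reassigning the parameter only changes the local copy of the reference.
    static void reassign(StringBuilder sb) {
        sb = new StringBuilder("replaced");
    }

    // Mutating the object through the copied reference is visible to the caller.
    static void mutate(StringBuilder sb) {
        sb.append(" world");
    }

    // A primitive is copied; the caller's variable is unaffected.
    static void increment(int n) {
        n++;
    }

    static String run() {
        StringBuilder text = new StringBuilder("hello");
        int count = 1;
        reassign(text);    // no effect on text
        mutate(text);      // text now holds "hello world"
        increment(count);  // count is still 1
        return text + " " + count;
    }

    public static void main(String[] args) {
        System.out.println(run());   // hello world 1
    }
}
```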


6-12. Problems
6-1) Choose the most suitable answer for the following questions:
i) What is the correct value to return to the operating system upon the
successful completion of an executable program?
A. Programs do not return a value B. -1
C. 1 D. 0
ii) What is the only function all C++ programs must contain?
A. start() B. system()
C. main() D. program()
iii) What punctuation is used to show the start and end of code blocks?
A. { } B. -> and <-
C. BEGIN and END D. ( and )
iv) What punctuation ends most lines of C++ code?
A. . B. ;
C. : D. '
v) Which of the following is a correct comment in C/C++?
A. */ Comments */ B. ** Comment **
C. /* Comment */ D. { Comment }
vi) Which of the following is not a variable type in C language?
A. float B. real
C. int D. double
vii) Which of the following is the operator to compare 2 variables?
A. := B. =
C. equal D. ==
viii) Which is not a proper prototype?
A. int funct(char x, char y); B. void funct();
C. double funct(char x) D. char x();
ix) What purpose do classes serve?
A. data encapsulation
B. providing a convenient way of modeling real-world
objects
C. simplifying code reuse
D. all of the above
x) Which is not a protection level provided by classes in C++?
A. protected B. hidden
C. private D. public
xi) What value must a destructor return?
A. Pointer to the class. B. Object of the class.
C. Status code showing whether the class is destructed
correctly
D. Destructors do not return a value.


xii) Which of the following is a valid class declaration?


A. class A { int x; }; B. class B { }
C. public class A { } D. object A { int x;};
xiii) Which functions will every class contain?
A. None
B. Constructor
C. Destructor
D. Both a constructor and a destructor

6-2) Write a C-program that sorts a table of 100 strings and arranges them
in alphabetical order, in the same array.

6-3) Repeat the above program in 80x86 assembly language. Assemble
this program using MASM16 or MASM32 or TASM and compare the
size of the executable code and the speed of execution in both cases.
Hint: Use the timer functions to monitor the execution time in both cases.

6-4) Write a C-program that reads in a string of characters and displays it
as a message on the screen. Use assembly language instructions instead of
the standard C-language I/O functions.

6-5) Write a C-program that displays the BIOS date.

6-6) Write a C-program that hooks the timer interrupt and turns the
speaker on and off.

6-7) Write a C-program that opens a text file named list.txt, in write
mode, using the fopen() function. How can you accelerate opening the file
by using assembly routines instead of the fopen() function?

6-8) Write the output of the following Java program (Welcome.java) and
rewrite it using the printf() method (instead of println).

// Listing of Welcome.java
public class Welcome {
    // main method begins execution of Java application
    public static void main( String args[ ] )
    {
        System.out.println( "Welcome\n to \n Java \n Programming!" );
    } // end method main
} // end class Welcome


6-9) The following program (Welcome2.java) displays a dialog box.
Compile this program to see its output and describe each line in it.

import javax.swing.JOptionPane; // import class JOptionPane

public class Welcome2 {
    public static void main( String args[ ] )
    {
        JOptionPane.showMessageDialog(null,
            "Welcome Everybody to Java Programming!");
        System.exit( 0 ); // terminate Windows application
    } // end main
} // end class Welcome2

6-10) The following program (Addition.java) reads two numbers from
dialog boxes and prints the sum in another dialog box. Compile this
program to see its output and describe each line in it.

import javax.swing.JOptionPane;
public class Addition {
    public static void main(String args[ ])
    {
        String Number1, Number2;
        int number1, number2, sum;
        Number1 = JOptionPane.showInputDialog("Enter 1st integer");
        Number2 = JOptionPane.showInputDialog("Enter 2nd integer");
        number1 = Integer.parseInt( Number1 );
        number2 = Integer.parseInt( Number2 );
        sum = number1 + number2;
        JOptionPane.showMessageDialog(null, "The sum is " + sum,
            "Results", JOptionPane.PLAIN_MESSAGE );
        System.exit( 0 );
    }
}

6-11) Show how to use the JNI technology to replace input/output code
in the above Java programs with equivalent assembly code.


6-13. Bibliography

[1] D. Willen and J. Krantz, 8088 Assembler Language Programming:
The IBM PC, Macmillan, NY, 2nd Edition, 1989.

[2] Peter Norton et al, PC Programming Bible, Microsoft Press, 1996.

[3] Barry B. Brey, The Intel Microprocessors 8086/8088, 80186/80188,
80286, 80386, 80486, Pentium, and Pentium Pro Processor Architecture,
Programming, and Interfacing, Book News, NY, 1999.

[4] V. Rajaraman, and T. Radhakrishnan, Essentials of Assembly
Language Programming, for the IBM PC, Prentice-Hall, 2000.

[5] Intel 64 and IA-32 Architectures Software Developers Manual, Vol. 2,
Intel Corp., April 2008.

[6] http://www.intel.com

[7] http://www.microsoft.com

420

Prof. Dr. Muhammad El-SABA


Introduction to Microprocessors & Interface Circuits CHAPTER 7

Memory Interfacing
with Microprocessors

Contents

7-1. Introduction
7-2. Bus Timing of Memory Read/Write Operations
7-2.1. Memory Read Timing
7-2.2. Memory Write Timing
7-2.3. Wait States in 80x86 Microprocessors
7-2.4. Pentium Processor Bus Timing
7-2.5. Bus Cycle Time & Bus Bandwidth of 80x86 Processors
7-3. Memory Address Decoding
7-4. ROM & Its Interface Circuits
7-5. RAM (SRAM, DRAM) & Its Interface Circuits
7-5.1. SRAM Interfacing
7-5.2. Cache Memory and Content Addressable Memory (CAM)
7-5.3. DRAM Interfacing (EDO, SDRAM, DDR, RAMBUS)
7-5.4. DRAM Interfacing with 16-bit Data Bus
7-5.5. DRAM Interfacing with 32-bit Data Bus
7-5.6. DRAM Interfacing with 64-bit Data Bus
7-5.7. DRAM Modules
7-5.8. DRAM Controllers
7-6. Memory Requests
7-7. Checking Memory Errors
7-7.1. Parity Checking
7-7.2. Errors Checking & Correction (ECC)
7-8. Serial Memory Devices
7-9. Secondary Memory
7-9.1. Magnetic Storage Drives
i. Magnetic Tape Drives
ii. Magnetic Disk Memory
7-9.2. Optical Disk Memory & Compact Disks (CD)
7-10. Mobile Memory Modules
7-10.1. SRAM Cards
7-10.2. Flash Memory Cards
7-10.3. USB Flash Drives
7-11. Summary of PC Memory Types
7-12. Problems
7-13. References

Memory to Processors: Without me, you’re nothing!


Memory Interfacing
with Microprocessors

7-1. Introduction
Memory is one of the most important components in microprocessor-
based systems, like computers and embedded control systems. Some
computer basic input/output routines (BIOS) have to be permanently
stored in the computer read-only-memory (ROM). Every time a computer
is started up, programs are loaded from secondary memory (usually hard
disk) into the computer memory. The main memory into which these
programs are loaded is the computer random access memory (RAM).
Therefore, every computer contains several types of memory devices, as
shown in figure 7-1. These memory devices are different in capacity,
speed, and theory of operation. In this chapter we briefly discuss the
various aspects of memory interfacing in computer systems, in general,
and with 80x86 microprocessors, in particular.

Fig. 7-1. Memory organization of a computer system.


As shown in the following figure, the various memory devices may also be
classified as follows:

• Primary, secondary and tertiary storage
• Volatile and non-volatile storage
• Read-only and writable storage
• Random-access and sequential-access storage
• Magnetic storage
• Optical storage

Fig. 7-2. Primary, secondary and tertiary memory devices.

Primary storage devices are comparatively faster than all other kinds of
memory types. The most popular example of this kind of memory is the
RAM (Random Access Memory) that we use in modern computers and
PC’s. The following figure depicts the memory interface circuit to the
Intel 8088 microprocessor in the early IBM PC’s.


Fig. 7-3. Memory interface circuit to the Intel 8088 microprocessor in IBM PC.

7-2. Bus Timing & Decoding of Memory Read/Write Operations


The microprocessor operation involves reading from and writing to
memory. Hence, the memory and I/O devices should respond as fast as
the microprocessor. Consequently, it is very important to understand the
microprocessor timing of memory read/write operations before discussing
memory and I/O interfacing. The timing diagrams show how to calculate
the memory access times. The x86 processors make read/write operations
during the so-called bus cycle. In 8086/8088 microprocessors the bus
cycle is composed of 4 clock periods (T states). So if the microprocessor
is clocked at 10 MHz, then T = 100 ns and the bus cycle is 400 ns. More
recent microprocessors, like 80286, 80386, 80486 and Pentium have bus
cycles of only 2 clock periods.

7-2.1. Memory Read Timing


When the microprocessor executes a program, it reads instructions from
memory using memory read bus cycle. As shown in figure 7-4(a), the
read machine cycle of 8086/8088 microprocessors proceeds as follows:


1- The CPU puts the address on the address/data bus.
2- The address is latched (ALE), and DT/R goes low (receive).
3- The MEMR signal goes low and is sent to memory to enable the read.
4- When the microprocessor receives the READY signal from memory,
DEN goes low and data becomes available on the data bus for reading.

[Timing diagram over clock states T1-T4: CLK; S0, S1, S2; A/D (address valid, then data valid for memory read); ALE; MEMR; DT/R; DEN.]

Fig. 7-4(a). Timing diagram of memory read cycle in 8086/8088 microprocessors. All
signals (except for CLK, S0, S1, S2) are generated by the 8288 bus controller.

It should be noted that the Clock (CLK) keeps everything synchronized
during memory read and memory write operations. If the memory is
slower than the microprocessor, the microprocessor will issue wait states
until memory becomes ready for read operations.

7-2.2. Memory Write Timing


The write machine cycle, shown in figure 7-4(b), of 8086/8088 proceeds,
as follows:

1- The CPU puts the address on the address/data bus.
2- The address is latched (ALE is high).
3- Data is put on the data bus and DT/R goes high (transmit).
4- The MEMW signal goes low and is sent to memory to enable the write.
5- When the CPU receives the READY signal from memory, DEN goes
low and data becomes available on the data bus for writing.


The DEN signal should be kept high during a minimum setup time
(sometimes called time data valid to write going high or TDVWH).
Again, if the memory is slower than the microprocessor, the
microprocessor will issue wait states until memory becomes ready for
write operations.

[Timing diagram over clock states T1-T4: CLK; S0, S1, S2; A/D (address valid, then data valid for memory write); ALE; MEMW; DT/R; DEN.]

Fig. 7-4(b). Timing diagram of memory write cycle in 8086/8088 microprocessors.

7-2.3. WAIT States in x86 Microprocessors


It should be noted that the Clock (CLK) keeps everything synchronized
during memory read and memory write operations. The microprocessor
enables data bus (by issuing DEN), only when it receives high READY
signal from memory. If the memory is slower than the microprocessor,
the microprocessor will issue wait states until memory becomes ready for
read or write operations. For instance, the 8086/8088 microprocessor
samples the READY input signal during the third clock state (T3) of the
bus cycle.

When the memory device is not fast enough, and the READY signal is
low, then the microprocessor generates WAIT states (in addition to the
original 4 clock states), until memory presents its data and sends high
READY signal to the microprocessor.


7-2.4. Pentium Processors Bus Timing


The Pentium non-pipelined memory cycle consists of 2 clock periods (T1
and T2). As shown in figure 7-5, the read machine cycle of Pentium
microprocessors proceeds as follows:

1- During T1, the Pentium CPU issues address, ADS, W/R, and M/IO
signals. The MEMR or MEMW (sometimes denoted MRD, MWT) can
be generated from these signals by simple logic.
2- During T2, the data bus is sampled at the positive clock edge at the end of T2.
3- Memory wait states are inserted into timing by controlling BRDY
input (to CPU from external memory devices). BRDY should be 0 at the
end of T2, otherwise additional T2 (wait states) are inserted.

[Timing diagram over two consecutive T1-T2 bus cycles: CLK; ADDR; ADS; W/R (READ); DATA; BRDY.]

Fig. 7-5. Timing diagram of memory read cycle in Pentium microprocessors
(without pipelining)

7-2.5. Bus Cycle & Bus Bandwidth in 80x86 Microprocessors


We have seen in the previous sections that the bus cycle of 8086/8088
processors is composed of 4 clock periods, when there are no wait states.
Therefore, the total bus cycle time of 8086/8088 processors is given by:

Bus Cycle Time (8086) = (4 + WAIT states)*Clock Period (7-1)

In more recent processors, like 80286/80386/80486 and Pentium, the data
bus is not multiplexed with the address bus. So, the bus cycle is composed of
only 2 clock periods, when there are no wait states. Hence the bus cycle
time for such processors is given by:

Bus Cycle Time (386) = (2 + WAIT states)*Clock Period (7-2)

For instance, the bus cycle time of an 80386 operating at 20 MHz, with zero
wait states, is given by: Bus Cycle Time (386) = (2 + 0)*(1/20 MHz) = 100 ns.
And the bus cycle time of an 80486 operating at 50 MHz, with 1 wait state, is
given by: Bus Cycle Time (486) = (2 + 1)*(1/50 MHz) = 60 ns.

The so-called bus speed is given by the inverse of the bus cycle time.
Also the bus bandwidth is given by the product of the bus speed
multiplied by the width of the data bus.

Bus Speed = 1/ Bus Cycle time (7-3)

Bus Bandwidth = Bus Speed * Data Bus width (in bytes) (7-4)

Example 7-1:
Calculate the bus speed and the bus bandwidth of an 80486 operating at
50 MHz, with zero wait states and transferring data over a 32-bit data bus.

Solution:
Bus Cycle Time (486) = (2 + 0)*(1/50 MHz) = 40 ns
Bus speed = 1/(40 ns) = 25 MHz
Bus bandwidth = 25 (MHz) x 4 (bytes) = 100 MByte/sec

However, it should be noted that the bus speed is usually limited by the
external bus type used on the motherboard hosting the microprocessor.
For instance, the ISA bus, which is a 16-bit bus, has a maximum speed of
about 8 MHz. The EISA bus, which is a 32-bit bus, runs at a similar clock
rate but offers higher bandwidth through its wider data path. The more
recent PCI bus, which is 32 or 64 bits wide, runs at 33 or 66 MHz.

It should also be noted that the bus speed and the bus bandwidth are measures
of the computer performance, because they express how fast the communication
between the microprocessor and memory or I/O devices is.
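
Equations (7-1) to (7-4) can be checked with a few lines of code. The following is a minimal sketch, not a definitive tool: the function names and the base_clocks parameter (4 for 8086/8088, 2 for 80286 and later) are illustrative assumptions, and the figures reproduce the 80486 case of Example 7-1.

```python
def bus_cycle_time_ns(clock_mhz, wait_states=0, base_clocks=2):
    """Eq. (7-1)/(7-2): (base clocks + wait states) * clock period, in ns."""
    clock_period_ns = 1000.0 / clock_mhz
    return (base_clocks + wait_states) * clock_period_ns


def bus_speed_mhz(cycle_time_ns):
    """Eq. (7-3): bus speed is the inverse of the bus cycle time."""
    return 1000.0 / cycle_time_ns


def bus_bandwidth_mbytes_per_s(speed_mhz, data_bus_bits):
    """Eq. (7-4): bus speed times the data bus width in bytes."""
    return speed_mhz * (data_bus_bits // 8)


# 80486 at 50 MHz, zero wait states, 32-bit data bus (Example 7-1)
cycle = bus_cycle_time_ns(50, wait_states=0)
speed = bus_speed_mhz(cycle)
bandwidth = bus_bandwidth_mbytes_per_s(speed, 32)
print(cycle, "ns,", speed, "MHz,", bandwidth, "MByte/s")
```

Passing base_clocks=4 gives the 8086/8088 case, for example 400 ns at a 10 MHz clock.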

Example 7-2:
Calculate the number of wait states, which should be used when a 10 ns
ROM is used with a Pentium operating at 100 MHz, given that the ROM
selection circuits add a delay of 15 ns?


Solution:
The zero-wait-state bus cycle of a Pentium operating at 100 MHz is
2 x 10 ns = 20 ns, which is shorter than the total time needed to access
the ROM (10 + 15 = 25 ns). Adding one wait state is enough to make the
microprocessor bus cycle long enough (3 x 10 ns = 30 ns) to access the
ROM data. In general, the number of wait states for a processor with a
2-clock bus cycle (80286 and later) may be found using the following
inequality:

Bus Cycle Time = (2 + W)*T ≥ ta

Here T is the microprocessor clock period (T = 1/f = 10 ns), W is the
number of wait states and ta is the total memory access time. Here ta =
10 ns + 15 ns = 25 ns. The number of wait states is given by:

(2 + W)*10 ≥ 25, so W ≥ 0.5 and the smallest integer value is W = 1
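
The inequality above can be solved for the smallest integer W in code. This is a small sketch under the assumptions of Example 7-2; the function name and the base_clocks parameter are hypothetical, introduced only for illustration.

```python
import math


def wait_states_needed(clock_mhz, access_time_ns, base_clocks=2):
    """Smallest W such that (base_clocks + W) * T >= access_time_ns."""
    clock_period_ns = 1000.0 / clock_mhz
    clocks_required = math.ceil(access_time_ns / clock_period_ns)
    # A memory faster than the basic bus cycle needs no wait states.
    return max(0, clocks_required - base_clocks)


# Example 7-2: 10 ns ROM plus 15 ns of selection delay, Pentium at 100 MHz
rom_access_ns = 10 + 15
print(wait_states_needed(100, rom_access_ns))
```

With base_clocks=4 the same function covers the 8086/8088 bus cycle of equation (7-1).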

7-3. Memory Address Decoding


The memory space of a specific system is usually composed of several
memory chips. The role of address decoding is to generate a chip select
(CS) signal for each memory chip in the memory system. For example,
if we would like to use two 32kB ROM chips (to make a 64kB ROM), we need
15 common address lines (A0-A14), and an additional address line (A15) for
chip select, as shown in Fig. 7-6(a).

Fig. 7-6(a). Memory address decoding of 2 ROM chips, using a simple inverter.


For a large number of memory chips, we use dedicated address decoder
chips, as shown in Fig. 7-6(b). We usually break the address space into
blocks of equal size, and the block size determines the number of least
significant bits (A0 to A(n-1)) of the address bus. This block size should be
equal to the memory chip size. The number of memory chips to be used
determines the extra address lines (N - n). Therefore, we usually make use
of an address decoder with (N - n) inputs, to select one of 2^(N-n) memory
chips at a time.
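
The split of an N-bit address into a chip number (upper N - n bits) and an offset inside the chip (lower n bits) can be sketched as follows. The function name decode_address is an assumption made for this illustration; the sample values use the two 32kB ROM chips mentioned above (n = 15, N = 16).

```python
def decode_address(address, n, total_bits):
    """Return (chip_index, offset) for an address of total_bits bits.

    The lower n bits address a location inside a chip; the upper
    (total_bits - n) bits are what the address decoder sees.
    """
    if not 0 <= address < (1 << total_bits):
        raise ValueError("address does not fit in the address bus")
    offset = address & ((1 << n) - 1)
    chip_index = address >> n
    return chip_index, offset


# Two 32 kB ROM chips: A0-A14 go to both chips, A15 selects the chip.
for addr in (0x0000, 0x7FFF, 0x8000, 0xFFFF):
    chip, offset = decode_address(addr, n=15, total_bits=16)
    print(hex(addr), "-> chip", chip, "offset", hex(offset))
```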

Fig. 7-6(b). Memory address decoding of several memory chips, using a decoder.

7-4. ROM & its Interface Circuits


The read only memory (ROM) stores data permanently. The
programmable ROM (PROM) is a one-time programmable (OTP) ROM.
The erasable programmable ROM (EPROM) is another variant, which
can be erased by exposure to UV light and then programmed again. The
EEPROM is an electrically erasable (byte by byte) and programmable ROM.
EEPROM is sometimes called EAROM (electrically alterable ROM).
Flash memory is a sort of cheap EEPROM. However, unlike EEPROM,
Flash memory can only be erased by blocks. Figure 7-7(a) depicts the
schematic symbol of a ROM chip. As shown, a ROM chip should
have input address lines (A0 to A(n-1)) and output data lines (D0 to D7) as
well as a chip select input control (CE). The number of address lines (n)
is related to the ROM capacity (2^n bytes). Figure 7-7(b) depicts the pin-out
diagram of the 2716 EPROM chip. The M2716 (from Motorola) is a
16k bit (2k x 8 bit) UV erasable and electrically programmable memory.

Fig. 7-7(a). Schematic symbol of a ROM chip.

          2716 EPROM
  A7     1      24   VCC
  A6     2      23   A8
  A5     3      22   A9
  A4     4      21   VPP
  A3     5      20   CS
  A2     6      19   A10
  A1     7      18   PD / PGM
  A0     8      17   DO7
  DO0    9      16   DO6
  DO1   10      15   DO5
  DO2   11      14   DO4
  GND   12      13   DO3

Fig. 7-7(b). Pin-out diagram of 2716 (2k x 8 bit) EPROM.


The 2716 EPROM is usually housed in a 24-pin dual-in-line package
(DIP). The transparent lid allows the user to expose the chip to ultraviolet
light to erase the bit pattern. A new pattern can then be written to the
device using a suitable EPROM programmer. The following table
indicates the pin assignment of this chip. Note that the chip has two
power supplies, because the programming voltage VPP is much higher
than VCC. Figure 7-8 depicts how the 8088 processor can be interfaced
to eight 8kB EPROMs (2764), via the 74LS138 (3-to-8) decoder chip.
The 74LS138 chip is used for address decoding.

Table 7-1. Pin assignment of the 2716 EPROM.

Pin Description
A0-A10 Address lines
CS Chip Select
DO0-DO7 Output lines
PD / PGM Power down / Program

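[Schematic: address lines A0-A12 go to all eight 2764 EPROMs (data outputs O0-O7). A13, A14 and A15 drive the 74LS138 select inputs I0, I1 and I2, and each decoder output (Q0 to Q7) drives the CS input of one EPROM. A16-A19, MEMR and RESET are associated with the decoder enable inputs G2A, G2B and G1.]

Fig. 7-8. Interfacing 8088 to eight 2764 (8k x 8 bit) EPROM chips.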
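
The chip selection of figure 7-8 can be modeled in a few lines. This is a hedged sketch: ls138 is a hypothetical helper that mimics the 74LS138 behavior (active-low outputs, enabled when G1 = 1 and G2A = G2B = 0), and eprom_select assumes that the enable inputs are already asserted, so that only A13-A15 decide which EPROM responds.

```python
def ls138(g1, g2a_n, g2b_n, select):
    """Return the eight active-low outputs of a 3-to-8 decoder."""
    enabled = (g1 == 1) and (g2a_n == 0) and (g2b_n == 0)
    return [0 if (enabled and i == select) else 1 for i in range(8)]


def eprom_select(address):
    """Return (chip, offset) for the eight 8 kB EPROMs of Fig. 7-8.

    Assumption: the decoder enables are asserted for this access.
    """
    select = (address >> 13) & 0b111      # A13-A15 -> decoder select inputs
    outputs = ls138(1, 0, 0, select)
    chip = outputs.index(0)               # the single low output is the CS
    offset = address & 0x1FFF             # A0-A12 -> location inside the chip
    return chip, offset


for addr in (0x0000, 0x1FFF, 0x2000, 0xFFFF):
    print(hex(addr), eprom_select(addr))
```

When the decoder is disabled, all eight outputs stay high and no EPROM is selected.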


7-5. RAM (SRAM and DRAM) & its Interface Circuits


Random access memory or RAM works as a temporary storage medium in
computers and microprocessor-based systems. Every time a computer is
started up, programs are loaded into its random access memory. As its name
indicates, RAM can be read from or written to randomly. Any byte or piece
of RAM can be used without accessing the other bytes or pieces of memory.
Thus, the access time for any location in the RAM is the same. Figure 7-9
depicts the schematic symbol of a RAM chip and its external connection. As shown
in the figure, a RAM chip usually has input address lines (A0 to A(n-1)) and
input/output data lines (D0 to D7) as well as a chip select (CS) and
read/write (R/W) input controls.

Fig. 7-9(a). Schematic symbol of a RAM chip.

[Block diagram: a RAM of 2^n words x W bits, with n address (word) lines, W data (bit) lines, and the control lines CS, WE and OE.]

Fig. 7-9(b). External connection of RAM IC's.


There are two basic types of RAM, namely: static RAM (SRAM) and
dynamic RAM (DRAM). Basic memory devices can be fabricated using
different semiconductor technologies, such as the standard CMOS or the
standard bipolar technologies. Both types of RAM are volatile -- they lose
their contents when the power is turned off.

7-5.1. SRAM & its Interface Circuits


Static RAM or SRAM is a type of fast but expensive memory, which is
usually used as system cache on the PC motherboard. SRAM employs
several transistors, typically 4 to 6 transistors¹ for each bit, as shown
in figure 7-10, giving it faster speed but less storage capacity.

[Circuit diagram: 6T SRAM cell with a word line, two complementary bit lines, supply VDD, and storage nodes A and A'.]

Fig. 7-10. Conventional SRAM Cell, with 6 MOSFET transistors (6T cell).

In 1970, Fairchild introduced the first 256-bit SRAM chip.
SRAM does not need to be refreshed, which makes it faster than DRAM.
The typical access time of SRAM is 5-10 ns, in contrast to a typical
access time of 60 ns for DRAM. Figure 7-11 depicts the block diagram of
an SRAM chip, like the 4008.
¹ Some memories, such as FeRAM (ferroelectric RAM), use only one transistor per bit.


Fig. 7-11. Block diagram of SRAM chip.

7-5.2. Cache Memory and Content Addressable Memory (CAM)


Cache memory is a sort of small fast memory (piece of SRAM) which is
used to improve average memory response time. In order to maintain full
speed, the CPU must latch instructions and data in an internal cache
memory, thus avoiding any need to access external memory.

Caches usually make use of random access or an even faster access method,
called "associative addressing". Associative memories are also
commonly known as content-addressable memories (CAM). In a CAM,
any stored item can be accessed by using the contents of the item in
question. The field chosen to access the CAM is called a KEY. As shown
in figure 7-12(a), a CAM has a match output to indicate which words contain a
key. The CAM unit cell is similar to the (6T) SRAM cell, with the addition of 4
match transistors. The items stored in a CAM can be viewed as having a
two-field format: KEY and DATA, where KEY is the stored address and
DATA is the information to be accessed. An associative cache employs a
tag, which is a block address, as the KEY.

Fig. 7-12(a). Structure of a CAM chip.

At the start of a memory access, the incoming tag is compared
simultaneously to all the tags stored in the cache memory. If a match
(cache hit) occurs, a match-indicating signal triggers the cache to service
the requesting memory access. On the other hand, if a no-match signal
(cache miss) occurs, the memory access request will be forwarded to
main memory, as shown in figure 7-12(b).

Fig. 7-12(b). Cache memory operation. The KEY and DATA fields of the cache
memory are represented here by xi and value (xi).
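
The hit/miss behavior described above can be sketched in software. This is only an illustration under stated assumptions: the class name AssociativeCache is hypothetical, a dictionary lookup stands in for the parallel tag comparison performed by the CAM hardware, and no replacement policy is modeled (the cache simply grows).

```python
class AssociativeCache:
    """Toy fully associative cache: KEY (tag) -> DATA."""

    def __init__(self, main_memory):
        self.main_memory = main_memory   # backing store: address -> value
        self.lines = {}                  # stored KEY/DATA pairs
        self.hits = 0
        self.misses = 0

    def read(self, key):
        if key in self.lines:            # match: cache hit
            self.hits += 1
            return self.lines[key]
        self.misses += 1                 # no match: forward to main memory
        value = self.main_memory[key]
        self.lines[key] = value          # keep a copy for later accesses
        return value


memory = {0x100: 42, 0x104: 7}
cache = AssociativeCache(memory)
cache.read(0x100)    # first access misses and goes to main memory
cache.read(0x100)    # second access is served by the cache
print(cache.hits, cache.misses)
```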

Level one cache (L1-Cache) is the highest speed memory in the system
and is most often integrated with the CPU chip itself. The L1-cache is
sometimes divided into 2 parts; namely, the instruction cache (I-Cache)
and the data cache (D-Cache). However, some CPUs do not have on-chip
cache, and the L1-cache is a high-speed external SRAM tightly coupled
to the CPU, with the capability of operating at near-CPU speeds. High-
speed SRAM that works in this manner is very expensive, so typically a
price versus performance analysis is done to select the most cost-effective
cache configuration for a particular system. Unfortunately, cache is not
usually large enough to contain the entire executable code base, so the
CPU must periodically go off-chip for instructions and data. When the
CPU is forced to make external accesses (to memory or other I/O
devices), then the main memory performance becomes a critical issue. A
way to solve this problem is to build a two-level (or three-level) caching
system, as shown in figure 7-12(c).

The 80486 and later processors work in this fashion. The first level is on-
chip cache (typically 16kB with 10 ns access time). The next level,
between the on-chip cache and the main memory, is a secondary cache
(L2-Cache) built on the computer system motherboard. A typical L2-
Cache contains from 64kB to 2MB of memory. Common size on PC
systems is 512 kB of cache.

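[Block diagram: the CPU (execution unit EU and L1-Cache), followed by the L2-Cache, the L3-Cache and the RAM; on a cache miss the request passes to the next level.]

Fig. 7-12(c). Multi-level cache memory organization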

You might ask, "Why bother with a two-level cache? Why not use a
higher capacity SRAM (e.g., 512kB) in a one-level cache?" Well, the L2-
Cache generally does not operate at zero wait states. The circuitry to
support 512 kB of 10 ns access time memory would be too expensive.
Therefore, most system designers use slower memory, which requires one
or two wait states. However, this is still much faster than main memory.
Combined with the on-chip cache, you can get better performance.

7-5.3. DRAM Interfacing


DRAM stands for dynamic random access memory. The DRAM forms
the main memory in PC’s and other systems. DRAM is simpler in design
and cheaper in price than SRAM. Unlike SRAM, the DRAM chips need
to be periodically refreshed (recharged) to keep the information from
fading. A conventional DRAM cell is illustrated in figure 7-13(a).

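[Circuit diagram: one MOS transistor, gated by the word line, connecting a storage capacitor to the bit line.]

Fig. 7-13(a). Conventional DRAM Cell, with one MOS transistor and one capacitor (1T-1C)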

DRAM has higher power consumption and capacity than SRAM. A
newer version of DRAM is called single data rate synchronous DRAM
(SDRAM). The following figures depict the DRAM array in READ and
WRITE modes. Although DRAM is simpler than SRAM, it is the slower
of the two. This is because each cell of the DRAM has a small capacitor
that needs to be refreshed many times per second or it will lose its contents.


Fig. 7-13(b). A DRAM module, in READ mode


Fig. 7-13(c). A DRAM module, in WRITE mode


As shown, each DRAM cell consists of a MOSFET and a small capacitor
to hold data. Such a DRAM cell is usually referred to as a 1T-1C cell.
DRAM bit cells are arranged on a chip in a grid of rows and columns,
where the numbers of rows and columns are usually powers of two.
Often, but not always, the number of rows and columns is the same. A
1M bit device would then have 1024x1024 memory cells. A single
memory cell can be selected by a 10-bit row address and a 10-bit column
address. Conventional DRAM's have multiplexed address lines and
separate data inputs and outputs. In fact, multiplexing allows the
number of address lines to be cut in half. The two address halves are
applied to the address pins one after another, on 2 separate clock cycles.
In order to do that, there are three
control signals: RAS (row address strobe), CAS (column address strobe),
and WE (write enable), as shown in figure 7-14.

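[Block diagram: the address bus feeds a row address latch (strobed by RAS) and a column address latch (strobed by CAS); the memory array connects to the data bus through the sense and refresh amplifiers. Timing diagram: address bus, RAS and CAS versus time.]

Fig. 7-14. General block diagram of a DRAM chip and its timing diagram.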

Memory access to a DRAM proceeds as follows:

1. The control signals are all initially inactive (high). A memory cycle is
started with the row address applied to the address inputs and a falling
edge of RAS. This latches the row address and "opens" the row,
transferring data in the row to the buffer. The row address can then be
removed from the address inputs since it is latched on-chip.


2. With RAS still active, the column address is applied to the address
pins and CAS is made active as well. This selects the desired bit or
bits in the row, which subsequently appear at the data output(s). By
additionally activating WE the data applied to the data inputs can be
written into the selected location in the buffer.

3. Deactivating CAS disables the data input and output again.

4. Deactivating RAS causes the data in the buffer to be written back into
the memory array.
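
The four steps above can be mimicked by a toy software model of a 1M x 1 DRAM. This is a sketch, not a description of any real device: the names split_dram_address and DramModel are hypothetical, timing is ignored, and the choice of the upper 10 address bits as the row address is an assumption made for illustration.

```python
def split_dram_address(address, row_bits=10, col_bits=10):
    """Split a linear cell address into (row, column) halves."""
    col = address & ((1 << col_bits) - 1)
    row = (address >> col_bits) & ((1 << row_bits) - 1)
    return row, col


class DramModel:
    """Toy DRAM that follows the RAS/CAS sequence of the text."""

    def __init__(self, row_bits=10, col_bits=10):
        self.cells = [[0] * (1 << col_bits) for _ in range(1 << row_bits)]
        self.open_row = None       # row address latched by RAS
        self.row_buffer = None     # contents of the open row

    def ras_active(self, row):
        # Step 1: latch the row address and copy the row into the buffer.
        self.open_row = row
        self.row_buffer = list(self.cells[row])

    def cas_active(self, col, write_enable=False, data_in=0):
        # Step 2: select one bit of the open row; write it if WE is active.
        if self.row_buffer is None:
            raise RuntimeError("CAS asserted with no open row")
        if write_enable:
            self.row_buffer[col] = data_in
        return self.row_buffer[col]

    def ras_inactive(self):
        # Step 4: write the buffer back into the array and close the row.
        self.cells[self.open_row] = self.row_buffer
        self.open_row = None
        self.row_buffer = None


dram = DramModel()
row, col = split_dram_address(0x12345)
dram.ras_active(row)
dram.cas_active(col, write_enable=True, data_in=1)   # write cycle
dram.ras_inactive()

dram.ras_active(row)
bit = dram.cas_active(col)                           # read cycle
dram.ras_inactive()
print(row, col, bit)
```

Step 3 (deactivating CAS) has no state to update in this model, so it is not represented by a method.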

Figure 7-15(a) depicts the pin-out diagram of the 41256 DRAM chip.
Also, table 7-2 indicates the pin assignment of this chip. Note that the
chip is organized as 256k x 1 bit (Din/Dout). Figure 7-15(b) depicts how the
8088 microprocessor can be interfaced to eight 4164 (64k x 1 bit) DRAM
chips, via the 74LS245 bi-directional buffers. In this figure, the address
pins are multiplexed by the 74LS158 (as required by the DRAM).
Multiplexing the address pins saves pins on the DRAM chip, but usually
requires additional logic in the system to properly generate the address
and control signals, not to mention further logic for refresh. Therefore,
DRAM chips are usually preferred when a small pin count is required. The
additional cost for the control logic is outweighed by the lower price.

          41256 DRAM
  A8     1      16   GND
  Din    2      15   CAS
  WR     3      14   Dout
  RAS    4      13   A6
  A0     5      12   A3
  A2     6      11   A4
  A1     7      10   A5
  VCC    8       9   A7

Fig. 7-15(a). Pin-out diagram of 41256 (256k x 1 bit) DRAM.


Table 7-2. Pin assignment of the 41256 DRAM.

Pin Description
A0-A8 Address lines
Din Input data (1 bit)
Dout Output data (1 bit)
WR Write enable
RAS Row Address strobe
CAS Column Address strobe
VCC 5V Power supply

Fig. 7-15(b). Interfacing 8088 with a bank of eight 4164 DRAM chips (each chip is
64k x1 bit), via 74LS158 multiplexers and 74LS245 bi-directional data buffer.

Based on these principles, chip designers have developed many varieties
to improve performance or simplify system integration of DRAM’s:
The FPM DRAM (Fast Page Mode DRAM) was the traditional form of
DRAM for PC’s for a long time before EDO was introduced. FPM DRAM
waits through the entire process of locating a bit by column and row and
then reading the bit, before it starts on the next bit. The access time of
FPM DRAM is typically 60 or 70 ns. The maximum transfer rate of FPM
DRAM (to L2-Cache) is approximately 176 MB/s. It was usually mounted
on SIMM modules of 2, 4, 8, 16 and 32 MB.
The EDO DRAM (Extended Data Out DRAM) is an enhanced version of
FPM DRAM that can continue to output data from one address while
setting up a new address. EDO DRAM is 40% faster than conventional
FPM DRAM and can be used in pipelined systems. However, EDO
DRAM was effective only for bus speeds up to 66 MHz, a limit that was
quickly surpassed after the introduction of Pentium processors.
The SDRAM (Synchronous DRAM) adds a clock signal to the control
signals of the conventional DRAM. SDRAM is tied to the front-side bus
clock of the PC system, so the memory and the bus operate in step
rather than one of them having to wait for the other. In the last
few years, SDRAM has become the standard memory type for PC's.
DDR SDRAM (Double-Data-Rate Synchronous DRAM) memory
technology followed the SDRAM. Like SDRAM, DDR is synchronous
with the system clock. But it achieves greater bandwidth than the
preceding single-data-rate SDRAM by transferring data on the rising and
falling edges of the clock signal (double pumped). Effectively, it doubles
the transfer rate without increasing the frequency of the memory bus.
With data being transferred 64 bits at a time, DDR SDRAM gives a transfer
rate of (memory bus clock rate) × 2 (for dual rate) × 64 (number of bits
transferred) / 8 (number of bits/byte). Thus with a bus frequency of
100 MHz, DDR SDRAM gives a maximum transfer rate of 1600 MB/s.
Similarly, instead of a data rate of 133 MHz (the front side bus speed),
DDR memory can transfer data at 266 MHz. In this case it can achieve a
peak bandwidth of 2.1 GB/s.
DDR2 SDRAM memory modules are the second generation of DDR
SDRAM. They also transfer data on the rising and falling edges of the clock.
The difference between DDR2 and DDR is a doubled bus frequency for the same
physical clock rate, thus doubling the effective data rate another time.
DDR2 chips were supposed to take over the world PC memory market
soon after their introduction, but high prices and the marginal
performance boost they offered compared to existing DDR chips slowed
their adoption. Both DDR2 and DDR memory are available in ECC (error
correction code, typically used in servers) and non-parity (used on
desktops and laptops) versions.
DDR2 was introduced in the second quarter of 2003 at two initial speeds:
200 MHz (referred to as PC2-3200) and 266 MHz (PC2-4200). Both
performed worse than the original DDR specification due to higher
latency, which made total access times longer. DDR2 started to become
competitive with older DDR by the end of 2004. Further generations of
DDR such as DDR3 have entered the mass market, with DDR4 currently
being designed and anticipated to be available in 2013.
The so-called RDRAM (RAMBUS DRAM) changes the system
interface of DRAM completely. A byte-wide bus is used for address, data
and command transfers. RAMBUS memory sends less information on the
data bus (which is 18 bits wide as opposed to the standard 32 or 64 bits)
but it sends data more frequently. RDRAM reads data on both the rising
and falling edges of the clock signal, as DDR does. The bus therefore
operates at very high speed, on the order of 500 million transfers per
second. The chip operates synchronously with a 400 MHz clock; data is
transferred at speeds of 800 MHz and higher. The signals on the RAMBUS channel use
nonstandard signal levels, making it incompatible with standard system
logic. These disadvantages are compensated by a fast data transfer.
Another difference of RAMBUS memory is that all memory slots must
be populated. The unused memory slots must be populated with a PCB,
known as a continuity module. RAMBUS modules are known as RIMM
modules (RAMBUS inline memory modules).
XDR DRAM (extreme data rate DRAM) is a high-performance RAM
interface and successor to the Rambus RDRAM. XDR was designed to be
effective in small, high-bandwidth consumer systems. It eliminates the
high latency problems of RDRAM. XDR is used by Sony in the
PlayStation 3 consoles. The XDR2 DRAM can deliver up to 80 GB/s of
peak bandwidth from a single, 4-byte-wide, 20 Gbps XDR2 DRAM
device. The following table lists the bandwidth (BW, in bytes/sec) of
some well-known DRAM types.
Table 7-3. Bandwidth of some well-known DRAM types and their peak value. The
peak value is the maximum transfer rate from DRAM to L2-Cache.

DRAM (Frequency)     Module Bandwidth          Peak Bandwidth
FPM                  -                         176 MB/s
EDO                  -                         264 MB/s
SDRAM (100 MHz)      100 MHz x 64 bit          800 MB/s
DDR (100 MHz)        2 x 100 MHz x 64 bit      1.6 GB/s
DDR (200 MHz)        2 x 200 MHz x 64 bit      3.2 GB/s
DDR2 (400 MHz)       2 x 400 MHz x 64 bit      6.4 GB/s
DDR3 (400 MHz)       2 x 400 MHz x 64 bit      6.4 GB/s
DDR4 (400 MHz)       2 x 400 MHz x 64 bit      6.4 GB/s
XDR (800 MHz)        800 MHz x 64 bit          6.4 GB/s
XDR2 (800 MHz)       2 x 800 MHz x 64 bit      12.8 GB/s
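
The "Module Bandwidth" column of the table is a simple product of clock rate, transfers per clock and bus width. The following minimal sketch reproduces a few of the entries; the function name and its parameters are assumptions made for illustration.

```python
def peak_bandwidth_mb_per_s(bus_clock_mhz, bus_width_bits=64,
                            transfers_per_clock=1):
    """Peak transfer rate = clock * transfers per clock * width in bytes."""
    return bus_clock_mhz * transfers_per_clock * bus_width_bits / 8


# (name, bus clock in MHz, transfers per clock) for some rows of the table
modules = [
    ("SDRAM (100 MHz)", 100, 1),
    ("DDR (100 MHz)", 100, 2),
    ("DDR (200 MHz)", 200, 2),
    ("DDR2 (400 MHz)", 400, 2),
]
for name, clock, transfers in modules:
    rate = peak_bandwidth_mb_per_s(clock, 64, transfers)
    print(name, rate, "MB/s")
```

For a 133 MHz bus with two transfers per clock the same formula gives 2128 MB/s, which is the 2.1 GB/s figure quoted for DDR above.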

7-5.4. DRAM Interface With 16-Bit Data Bus


In the 8086 and 80286 microprocessors, which have a 16-bit data bus, the
BHE signal is combined with the first bit of the address bus A0 (which may
be called BLE) to select one of two (or both) memory banks connected to
the data bus, as shown in figure 7-16(a).

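[Diagram: two 8-bit memory banks sharing the address lines A1-A19. The low bank (even addresses 00000, 00002, ..., FFFFE) is connected to D0-D7 and selected by BLE (A0); the high bank (odd addresses 00001, 00003, ..., FFFFF) is connected to D8-D15 and selected by BHE. A decoder driven by A0 and BHE generates the CS signal of each bank.]

Fig. 7-16(a). Memory interface with 16-bit data bus.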

Table 7-4. Memory bank selection, in 16-bit data bus PC systems.

A0 (BLE)   BHE   Selected Memory Bank(s)
0          0     Both banks enabled (16-bit transfer)
1          0     High bank enabled (8-bit transfer)
0          1     Low bank enabled (8-bit transfer)
1          1     No bank enabled
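
The bank selection table can be written as a small decoding function. This is a sketch only; the function name select_banks is hypothetical, and both inputs are treated as active-low selects, as in the table.

```python
def select_banks(a0, bhe):
    """Return (low_bank_enabled, high_bank_enabled) from A0 (BLE) and BHE."""
    low_bank_enabled = (a0 == 0)     # BLE low selects the low (even) bank
    high_bank_enabled = (bhe == 0)   # BHE low selects the high (odd) bank
    return low_bank_enabled, high_bank_enabled


for a0, bhe in ((0, 0), (1, 0), (0, 1), (1, 1)):
    low, high = select_banks(a0, bhe)
    print("A0 =", a0, "BHE =", bhe, "low bank:", low, "high bank:", high)
```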

7-5.5. DRAM Interface With 32-Bit Data Bus


The 80386/80486 microprocessors have 4 byte-enable lines (BE0-BE3),
which encode the two missing address lines (A0-A1). Different
combinations of 4 memory banks (8-bit wide each) can be selected using a
decoder or separate write signals (WE), as shown in figure 7-16(b).

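[Diagram: four 8-bit memory banks selected by BE3, BE2, BE1 and BE0. The BE0 bank holds addresses 00000000, 00000004, ..., FFFFFFFC; the BE1 bank holds 00000001, 00000005, ..., FFFFFFFD; the BE2 bank holds 00000002, 00000006, ..., FFFFFFFE; the BE3 bank holds 00000003, 00000007, ..., FFFFFFFF.]

Fig. 7-16(b). Memory interface with 32-bit data bus.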


7-5.6. DRAM Interface With 64-Bit Data Bus


The Pentium processors have an external 64-bit data bus and 8 byte-enable
lines (BE0-BE7), which encode the 3 missing address lines (A0-A2). So,
different combinations of 8 memory banks (8-bit wide each) can be
selected using a decoder or separate write signals (WE), as shown in
figure 7-16(c).

BE7        BE6        BE5        BE4        BE3        BE2        BE1        BE0
F..FFFFF   F..FFFFE   F..FFFFD   F..FFFFC   F..FFFFB   F..FFFFA   F..FFFF9   F..FFFF8
:          :          :          :          :          :          :          :
0..0000F   0..0000E   0..0000D   0..0000C   0..0000B   0..0000A   0..00009   0..00008
0..00007   0..00006   0..00005   0..00004   0..00003   0..00002   0..00001   0..00000

Fig. 7-16(c). Memory interface with 64-bit data bus.
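The way the byte-enable lines stand in for the missing low address bits can be illustrated with a short Python sketch. The function byte_enables and its arguments are hypothetical names chosen for this example; the enables are modeled as active low, and a transfer that crosses a bus-width boundary is simply rejected, since it would need two bus cycles.

```python
def byte_enables(address, size, bus_bytes):
    """Return (aligned_address, be_n) for one bus cycle.

    bus_bytes is 4 for a 32-bit bus (BE0-BE3) or 8 for a 64-bit bus
    (BE0-BE7). be_n[i] is the active-low level of BEi.
    """
    offset = address % bus_bytes          # value of the missing address bits
    if offset + size > bus_bytes:
        raise ValueError("transfer crosses a bus-width boundary")
    be_n = [0 if offset <= lane < offset + size else 1
            for lane in range(bus_bytes)]
    return address - offset, be_n


# 16-bit access at 0x1002 on a 32-bit bus: BE2 and BE3 are active.
print(byte_enables(0x1002, 2, 4))
# Byte access at 0x0D on a 64-bit bus: only BE5 is active.
print(byte_enables(0x0D, 1, 8))
```

The second call agrees with figure 7-16(c), where address 0..0000D sits in the bank selected by BE5.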

7-5.7. DRAM Modules


Although DRAM chips allow memories in the range of gigabytes to be
implemented at a reasonable cost, the capacity of a single chip is still
small compared to the demands of recent computers. With the advent of
surface-mount technology, memory manufacturers began to offer memory
modules, where a bank of several memory chips is pre-assembled on a small
printed circuit board (PCB). So, memory in a computer is usually
installed as SIMMs (Single Inline Memory Modules) or DIMMs (Dual Inline
Memory Modules). SIMMs and DIMMs are small printed circuit boards with
memory chips soldered onto one side or both sides. Old SIMMs had 30 or
72 pins, and old SDRAM DIMMs had 168 pins, whereas DDR SDRAM DIMMs have
184 pins. More recent DIMMs with DDR2 memory usually have 240 pins, as
shown in figure 7-17. Note that the DDR2 memory chips are packaged using
Ball Grid Array (BGA) technology, instead of the previous Thin Shrink
Small-Outline package (TSSOP). This packaging change was necessary to
maintain signal integrity at higher speeds. The target speed range
(effective data rate) for DDR2 was 400~1066 MHz and that of DDR3 was
1066~2133 MHz.


Fig. 7-17(a). SIMM with 30-pins memory module and its pin-out diagram

Fig. 7-17(b). SIMM with 72-pins memory module

Fig. 7-17(c). DIMM with 168-pin and two notches.


Fig. 7-17(d). DDR SDRAM module (DIMM with 184-pin in TSSOP)

Fig. 7-17(e). DDR SDRAM modules and their evolution roadmap


Today, the DDR4 SDRAM standard aims for effective speeds (data rates)
between 2133 and 4266 MHz, with DRAM voltages of 1.1V~1.2V. Figure
7-17(e) shows some photographs of the above-mentioned memory modules.
The standardization authority JEDEC has set speed standards for DDR
SDRAM in two parts: the first specification is for memory chips and the
second is for memory modules. Table 7-5 lists these specifications.

Table 7-5. Specifications of SDRAM modules (SDR, DDR, DDR2 and DDR3)

Module      Standard     Memory    Time between   Data transfer   Peak transfer
name        name         clock     signals        rate            rate
PC-100      SDR SDRAM    100 MHz   10 ns          100 MT/s        0.80 GB/s
PC-1600     DDR-200      100 MHz   10 ns          200 MT/s        1.60 GB/s
PC-2100     DDR-266      133 MHz   7.5 ns         266 MT/s        2.13 GB/s
PC-2700     DDR-333      166 MHz   6 ns           333 MT/s        2.67 GB/s
PC2-3200    DDR2-400     100 MHz   10 ns          400 MT/s        3.20 GB/s
PC2-6400    DDR2-800     200 MHz   5 ns           800 MT/s        6.40 GB/s
PC3-6400    DDR3-800     100 MHz   10 ns          800 MT/s        6.40 GB/s
PC3-12800   DDR3-1600    200 MHz   5 ns           1600 MT/s       12.80 GB/s
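The peak transfer rate in the last column follows from the data transfer rate multiplied by the width of the module data bus (64 bits, i.e. 8 bytes per transfer). The short Python sketch below reproduces this arithmetic; the function name peak_rate_mb_s is an assumption made for the example, and the input is the number of transfers per second in millions.

```python
def peak_rate_mb_s(mega_transfers_per_s, bus_bits=64):
    """Peak module bandwidth in MB/s for a given data rate and bus width."""
    return mega_transfers_per_s * bus_bits // 8


# Data rates (millions of transfers per second) taken from Table 7-5.
for name, rate in [("DDR-200", 200), ("DDR2-800", 800), ("DDR3-1600", 1600)]:
    print(name, peak_rate_mb_s(rate), "MB/s")
```

This is also where the module names come from: a DDR-200 module is sold as PC-1600 because its peak rate is 1600 MB/s.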

Note that PC100 is the SDRAM standard that meets the Intel PC100
specification. Intel created this specification to enable RAM
manufacturers to make chips that work with Intel's i440BX chipset. This
chipset was designed to operate at clock frequency of 100 MHz, on a 64-bit bus.
As faster chipsets appeared, new standards, like PC2-6400 appeared.


7-5.8 DRAM Controllers


As we have pointed out so far, the DRAM chip has essentially millions of
tiny capacitors, each one holding one bit of data. These capacitors are
charged to represent a "1" or discharged to represent a "0". Because all
capacitors leak, the charge must be restored at regular intervals to keep
the "1" values intact. The RAM chips actually handle the task of
restoring the charge in all of the appropriate locations, but they must
be told when to do it by the computer system, so that the refresh
activity does not interfere with normal access to the DRAM. If the
computer is unable to refresh memory, the contents of memory will become
corrupted in just a few milliseconds. Since memory read and write cycles
count as refresh cycles (a DRAM refresh cycle is actually an incomplete
memory read cycle), when any peripheral controller continues reading or
writing data to memory locations, that action will refresh the memory.
DRAM controllers are responsible for multiplexing the address lines and
providing the refresh control signals. For instance, the 8203 chip can
control up to 2 banks of 64k x 16-bit DRAMs. Also, the 8208 DRAM
controller chip can control up to 2 banks of 256k x 16-bit DRAMs. Figure
7-18(a) depicts the pin-out diagram of the 8208 chip. Figure 7-18(b)
depicts the 8205 controller interface to 2 banks of 1MB DRAM. As shown in
the figure, the A0, BHE and WE signals can be decoded to generate the WE
signals for up to 4 DRAMs of 256k x 8 bits (like the 41256A8) to form a
1MB memory.
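The address multiplexing performed by a DRAM controller can be sketched as follows. A 256k device needs 18 address bits, which are presented to the chip in two halves of 9 bits over the same pins (the row half latched by RAS, the column half by CAS). The function name and the choice of the upper half as the row address are assumptions made for this sketch; real controllers may map the bits differently.

```python
def split_dram_address(address, bits_per_half=9):
    """Split a linear cell address into (row, column) for a multiplexed DRAM."""
    mask = (1 << bits_per_half) - 1
    row = (address >> bits_per_half) & mask   # sent first, latched by RAS
    col = address & mask                      # sent second, latched by CAS
    return row, col


# Highest cell of a 256k device: both halves are all ones.
print(split_dram_address(0x3FFFF))
# Cell 0x200 is the first column of row 1.
print(split_dram_address(0x00200))
```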

Fig. 7-18(a). Pin-out diagram of the 8205 DRAM controller chip.


Fig. 7-18(b). Interfacing a 1MB DRAM in 2 banks, via the 8205 DRAM controller.

Fig. 7-18(c). Architecture of the 8420 (1MB) DRAM controller chip.


Figure 7-18(c) depicts the architecture of the DP8420/21/22 DRAM
controller. The DP8420/21/22 DRAM controllers provide a single-chip
interface between DRAM and all 8/16/32-bit systems. These chips
generate all the required access control signals for 2 banks of DRAMs.
An on-chip refresh request clock is used to refresh the DRAM array.
Refresh and access are arbitrated on chip. If necessary, a WAIT (or
DTACK, data acknowledge) output inserts wait states into system access
cycles. The insertion of wait states guarantees the RAS low time during
refreshes, and the pre-charge time after refreshes and between
back-to-back accesses.

In modern computers and PCs, DRAM refreshing is incorporated with other
input/output interface functions in the so-called chipset. For
instance, a host Pentium 4 processor with a 64-bit Front-Side Bus (FSB)
has been supported by the so-called E7210 MCH chipset, which incorporates
a DDR memory controller. This DDR controller has two 64-bit wide
interfaces and supports up to 4GB of system memory.

7-6. Memory Requests


Figure 7-19(a) depicts the details of a memory request in general-purpose
microcomputer systems. More details about the instruction life cycle,
with emphasis on the memory access, are shown in figure 7-19(b).

Fig. 7-19(a). Schematic diagram of memory requests.


Fig. 7-19(b). Details of memory request in IBM PC's.


7-7. Checking Memory Errors


There exist various factors that may cause errors in memory contents, like
manufacturing defects and environmental factors. Such errors frequently
happen during data transmission or while data is stored in memory.
Memory errors may be generally classified into two categories:

1- Hard errors: due to failure or malfunction of the memory array or the
associated sensing and decoding circuits.
2- Soft errors: due to thermal or radiation effects. DRAM soft errors are
often induced by alpha particles (He2+), which are unavoidably emitted
from the IC package and can penetrate deep into silicon. Unlike hard
errors, soft errors are spontaneous and non-reproducible. The error is
called soft because the device functions normally after the data is
restored.

The reliability of a memory system can be improved by employing error
detection and correction codes (EDCC). This may be achieved by various
techniques such as:

1- Parity checking, which is a common way to detect errors by appending a
check bit to every word (or byte) of the memory. For instance, the
74LS280 is a simple parity generator/checker IC.
2- Hamming code, which adds k parity bits to each byte of memory, to
detect and correct errors. For instance, the 74LS636 appends 5 check bits
to each byte.
3- Checksum, where each block of data is used to generate a fingerprint
(checksum), which is recomputed on the same block after transmission to
search for errors. Checksum patterns are usually generated by feedback
shift registers (FSR).

7-7.1. Parity Checking. One popular technique for memory error
detection is to add a single check bit, called the parity bit, to each
byte of memory. So, if we have a word of n bits (with binary coefficients
x0, x1, x2, ..., xn-1), then the parity bit (Co) would be as follows:

Co = x0 ⊕ x1 ⊕ x2 ⊕ … ⊕ xn-1 for even parity (7-5.a)

Co = 1 ⊕ x0 ⊕ x1 ⊕ x2 ⊕ … ⊕ xn-1 for odd parity (7-5.b)

The original IBM PC specifications required that all RAM of the main
memory should have a parity bit, to check for errors. This means that an
additional parity bit is added to every 8 bits (byte) of main memory.
Unfortunately, while parity allows for the detection of single-bit
errors, it does not provide a means of determining which bit is in error
in order to correct it. An even number of bit errors in the same byte
also goes undetected.
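The parity scheme can be tried out with a minimal Python sketch. The function names parity_bit and has_parity_error are assumptions made for this example; the parity bit is the exclusive-OR of the data bits, complemented when odd parity is requested.

```python
def parity_bit(word, n_bits=8, odd=False):
    """Parity bit of an n-bit word: XOR of all bits, inverted for odd parity."""
    p = 0
    for i in range(n_bits):
        p ^= (word >> i) & 1
    return p ^ 1 if odd else p


def has_parity_error(word, stored_parity, n_bits=8, odd=False):
    """True when the recomputed parity disagrees with the stored parity bit."""
    return parity_bit(word, n_bits, odd) != stored_parity


data = 0b10110010                     # four 1s
p = parity_bit(data)                  # even parity bit
print(p, has_parity_error(data ^ 0b00000100, p))   # single-bit error
print(has_parity_error(data ^ 0b00000110, p))      # double-bit error
```

The single-bit error is flagged, while the double-bit error leaves the parity unchanged and passes unnoticed, which is the limitation that motivates the codes of the next section.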


Fig. 7-20. Simple parity generator circuit.

7-7.2. Error Checking & Correction (ECC).


ECC is an extension of the parity concept. The ECC technology is based
on the insertion of several check bits, such that the location of certain
errors can be identified (and hence corrected). The location of errors can
be determined by verifying specific combinations of the check bits and
the original bits, according to a certain scheme or code (e.g., the
Hamming code). Although the parity bit is still in use, ECC technology
has become widespread. ECC is less expensive and can correct errors
without even interrupting the computer's work. An example of an
error-correcting IC is the 74LS636, which inserts 5 check bits using the
Hamming code. Figures 7-21 and 7-22 show the logic symbol and the logic
diagram of the 74ABT853 data transceiver with parity generator.

Fig. 7-21. Logic symbol of the 74ABT853 data transceiver with parity generator.


Fig. 7-22 Logic diagram of the 74ABT853 data transceiver with parity generator.
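To make the idea of locating an error with several check bits concrete, the following Python sketch implements the textbook Hamming(12,8) code: 4 check bits placed at positions 1, 2, 4 and 8 of a 12-bit codeword protect 8 data bits and allow any single-bit error to be corrected. This is an illustrative sketch only; it is not claimed to be the check-bit matrix used inside the 74LS636, which has a fifth check bit. The function names are assumptions made for this example.

```python
CHECK_POSITIONS = (1, 2, 4, 8)
# Data bits occupy the positions that are not powers of two.
DATA_POSITIONS = [p for p in range(1, 13) if p & (p - 1)]


def hamming_encode(data):
    """Encode an 8-bit value as a list of 12 bits (codeword positions 1..12)."""
    code = [0] * 13                       # index 0 is unused
    for i, pos in enumerate(DATA_POSITIONS):
        code[pos] = (data >> i) & 1
    for c in CHECK_POSITIONS:
        parity = 0
        for pos in range(1, 13):
            if pos & c and pos != c:
                parity ^= code[pos]
        code[c] = parity                  # even parity over each group
    return code[1:]


def hamming_correct(bits):
    """Return (data, syndrome); a non-zero syndrome is the corrected position."""
    code = [0] + list(bits)
    syndrome = 0
    for pos in range(1, 13):
        if code[pos]:
            syndrome ^= pos               # XOR of the positions holding a 1
    if 1 <= syndrome <= 12:
        code[syndrome] ^= 1               # flip the faulty bit back
    data = sum(code[pos] << i for i, pos in enumerate(DATA_POSITIONS))
    return data, syndrome


word = hamming_encode(0xA7)
word[5] ^= 1                              # inject an error at position 6
print(hamming_correct(word))
```

The syndrome is zero for an error-free codeword and otherwise equals the position of the flipped bit. Adding one overall parity bit to this code would also allow double-bit errors to be detected.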

7-8. Serial Memory Devices


Serial memory devices, such as serial EEPROMs, save pin count and can be
easily interfaced to microprocessors and microcontrollers with a few
wires. Figure 7-23 depicts the implementation of an 8-bit serial RAM
from 1x1 bit cells. Note that data input and output are carried out on
single wires (Din and Dout). The structure of the 1x1 bit RAM cell is
shown in figure 7-24.

Fig. 7-23 Structure of a 8-bit serial RAM.


Fig. 7-24. Structure of 1x1 bit RAM.

Serial EEPROMs are small electrically-erasable programmable ROM chips.
These devices are usually used to store user-configurable parameters and
device serial numbers. They use a serial bus interface, which allows them
to be packaged in inexpensive 8-pin packages.

There are several types of serial EEPROMs, but most of them fall into
either a 2-wire or a 3-wire interface category. The 2-wire interface,
called I2C or Inter-Integrated Circuit, uses only two wires, regardless
of how many chips are attached. I2C is an interface bus invented by
Philips. The 3-wire interfaces include SPI (Serial Peripheral Interface)
and Microwire, which is a trademark of National Semiconductor. Figure
7-25 depicts the pinout of the Atmel 1k-bit serial EEPROM, AT24C01. As
shown in the figure, this 1k-bit serial EEPROM needs only a two-wire
connection, for serial data input/output (SDA) and serial clock (SCL).

Fig. 7-25. Pinout diagram of Atmel 1k bit serial EEPROM, AT24C01.


For microcontroller systems, these little chips offer a nifty way to store a
small amount of data, using only a few of the port pins, and without
raising the system cost. They are usually specified to retain the data for
10 years and to endure about 100,000 write operations before failure.

Fig. 7-26. Schematic diagram of Atmel 1k bit serial EEPROM, AT24C01.

7-9. Secondary & Tertiary Memory


The secondary memory of a computer system is its mass storage. It has a
high capacity (in the order of 100GB or more), but its speed is usually
much lower than that of the main semiconductor memory. In return, this
type of memory is usually cheaper than the main memory in terms of
cost-per-bit. Although the storage density of these devices grew rapidly
during the last few years, their access time could not be decreased at
the same rate. The slow speed of secondary memory devices is due to their
mechanism of operation, which is usually based on the mechanical motion
of the storage medium with respect to a pick-up head (optical or
magnetic).


7-9.1. Magnetic Storage Devices


Magnetic storage devices store information in the form of magnetized
regions on a magnetically coated surface. Magnetic storage devices fall
in the category of non-volatile devices, which makes them useful for
long-term data storage. Hard disks, floppy disks and tape devices are
examples of magnetic storage devices.

Fig. 7-27. Schematic diagram of a magnetic storage system.

i. Magnetic Tape Drives


The magnetic tape drives started to replace punch cards in 1950. Only a
couple of years later, magnetic drums appeared on the scene. The early
PC's used data cassette recorders, which are a sort of tape drives, as a
secondary storage device. Using a data cassette for storage was very slow.

Fig. 7-28. Schematic diagram of a magnetic tape system.


ii. Magnetic Disk Memory


IBM introduced the first floppy disk in 1971. It was an 8" floppy plastic
disk coated with iron oxide. The term "floppy" accurately fits the
earliest 8" diskettes and the 5.25" diskettes that succeeded them.
However, removable floppy disks and floppy disk drives (FDD) did not
become popular as storage devices before 1978, when Apple introduced the
Disk II. The inner disk that holds data inside floppy diskettes is
usually made of mylar and coated with a magnetic oxide, and the outer
plastic cover bends easily. The first 3.5" FDD, double-sided,
double-density, holding up to 875 kB, was introduced by Sony in 1980. The
inner disk of a 3.5" floppy is similarly constructed, but it is housed
in a rigid plastic case, which is much more durable than the older 5.25"
diskettes.

Fig. 7-29. Photographs of some floppy disks (diskettes).

The first hard disk drive (HDD) was introduced in 1957 as a component
of IBM's RAMAC 350. It required 50 disks of 24" diameter to store data
and cost about $35,000. In 1973, IBM introduced the IBM 3340 hard disk
unit, known as the Winchester (IBM's development code name). The
recording head of this drive rides on a thin air gap, 0.0005 mm thick,
over the rotating hard disks. The descriptor "hard" is used because the
inner disks that hold data in a hard drive are made of a rigid aluminum
alloy. These thin disks (called platters) are coated with a much improved
magnetic material and last much longer than a plastic floppy diskette.
The longer life of a hard drive is also a function of the disk drive
read/write head. In fact, the heads do not contact the storage media in a
hard disk drive, whereas in a floppy drive the read/write head does
contact the media, causing wear. For years, hard disk drives were
confined to mainframes and minicomputers. With the introduction of the
IBM PC, in 1982, hard disk drives also became a standard component of
most personal computers.

Fig. 7-30. Schematic of a hard disk drive (HDD), showing tracks and sectors.

In 1997, IBM announced the highest-capacity desktop PC hard disk drive,
with a breakthrough technology called Giant Magneto-resistive (GMR)
heads. The first HDD with GMR heads was used in the IBM Deskstar, a
16.8 GB drive. Figure 7-30 depicts the hard disk drive and the disk
organization. The hard disk drive is composed of several disks and
several read/write heads. The heads are arranged as a movable comb.
Each surface of a disk is divided into concentric tracks and each track is
subdivided into sectors. Each sector can hold 512 bytes of data or more.
The tracks of the same diameter on all surfaces of the hard disk
assembly are called a logical cylinder.

The data on the disk can be addressed by the surface number, the track
number and the sector number. The electronic circuits of the HDD can
identify the first sector of a given track using an outer timing track
(in hard disks) or an index hole (in floppy diskettes). The hard disk
capacity can be calculated as follows. Assume a HD with N recording
surfaces, T tracks per surface and S sectors per track, where each sector
holds a block of B bytes. Then the HD capacity is given by the following
equation:

HD capacity = N surfaces x T tracks/surface x
S sectors/track x B bytes/sector (7-6)
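Equation (7-6) can be evaluated directly, as in the Python sketch below. The function name and the drive geometry in the example (4 surfaces, 1000 tracks per surface, 63 sectors per track) are illustrative assumptions, not the parameters of a particular drive.

```python
def hd_capacity_bytes(surfaces, tracks_per_surface, sectors_per_track,
                      bytes_per_sector=512):
    """Capacity in bytes according to equation (7-6): N x T x S x B."""
    return (surfaces * tracks_per_surface
            * sectors_per_track * bytes_per_sector)


capacity = hd_capacity_bytes(surfaces=4, tracks_per_surface=1000,
                             sectors_per_track=63)
print(capacity, "bytes =", capacity / 10**6, "MB")
```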

The access time of a HD is the time delay between receiving the data
address and the beginning of data transfer. In moving head HDD's, the
access time is the sum of the track seek time and the rotational delay (or
latency) time. The track seek time is dependent on the relative distance
between the head initial track and requested track positions, and its
average value is about 10ms. The latency time is the time taken for the
head to be positioned on a requested address, after it has been positioned
on the requested track. The average latency time of a HDD is estimated as
the time of a half revolution, and its average value is about 5 ms.
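The access time estimate can be reproduced with a small calculation. The rotational latency is half a revolution, so it depends only on the spindle speed. The 6000 rpm figure below is an assumption chosen because it gives the 5 ms average latency quoted above; the function names are likewise made up for this sketch.

```python
def average_latency_ms(rpm):
    """Average rotational latency: the time of half a revolution, in ms."""
    return 0.5 * 60_000.0 / rpm


def average_access_ms(seek_ms, rpm):
    """Average access time: track seek time plus rotational latency."""
    return seek_ms + average_latency_ms(rpm)


print(average_latency_ms(6000))        # latency for an assumed 6000 rpm disk
print(average_access_ms(10.0, 6000))   # with the 10 ms average seek time
```

Doubling the spindle speed halves the latency term but leaves the seek term untouched, which is why mechanical access times improved much more slowly than capacity.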

The data on a floppy diskette is usually encoded using the so-called
modified frequency modulation (MFM) technique, whereas HDDs make use of
the so-called Run Length Limited (RLL) encoding mechanism. In all cases,
data is encoded by the disk drive controller before being sent for
storage on the disk surface. Such encoding mechanisms enable faster
access and a higher data storage density.

The so-called disk controller is an interface circuit that controls the
disk speed and the head motion, and provides data encoding and
interfacing services. The ST-506, the oldest disk controller, was capable
of transmitting data at a maximum speed of 1MB/s. The Integrated Drive
Electronics (IDE) systems incorporate the disk drive with its controller
interface, which can be attached to the computer motherboard through a
simple cable. The access time of an IDE drive is in the order of 10 ms
and its speed is 10 MB/s. Some enhanced versions of the IDE (EIDE) can
transfer data at 33 MB/s. The data transfer rate of the so-called SCSI
(Small Computer System Interface) controllers can attain 50 MB/s or even
higher speeds. The HDD is usually connected to the PC motherboard via
parallel Advanced Technology Attachment (PATA) or serial ATA (SATA)
cables. More details about PATA and SATA can be found in Chapter 9.

7-9.2. Optical Disk Memory & Compact Disks


Optical storage devices store data on reflective discs in the form of pits
and bumps (or lands), as shown in figure 7-31. The CD is usually
manufactured by plastic injection molding of polycarbonate disks. The
optical head emits a modulated laser beam onto the rotating CD surface
via lenses. The reflected light is collected and transformed into a
stream of 1's and 0's, which corresponds to the stored data. The compact
disk read-only memory (CD-ROM) and optical disk drives were introduced in
the 1980's by Philips and Sony, as an extension of audio CD technology.
The usual CD can store up to 700 MB of data or audio tracks. The first
low-cost CD-ROM drive for PCs was introduced by Tandy in 1991, at US$400.
In 1994, NEC Technologies announced its quad-speed (4X or 600 kB/s)
CD-ROM, priced at $1000. Nowadays, fast CD drives (CDDs) can be bought
for less than $20. Write-once read-many (WORM) CDDs are commonplace.
Also, digital video disk (DVD) drives, which can host much larger amounts
of data, are widespread in laptops and PCs.

Fig. 7-31. Schematic diagram of a compact disk (CD) and its optical head assembly.


7-10. Mobile Memory Modules


Mobile memory modules are some sort of plug-in memory modules.
Historically, memory module cards have been introduced with pocket and
handheld computers as an electronic alternative to mechanical hard-disk
drives. For this reason, memory cards are sometimes called solid-state
disks (SSD). They are usually implemented from either SRAM modules
with built-in back-up batteries or Flash (EEPROM) memory chips.
Today, mobile memory modules are realized in so many other forms,
which are suitable for laptop and desktop computers.

7-10.1 SRAM Cards


SRAM Memory cards, shown in figure 7-32, offer a high performance,
non-volatile storage solution for code and data storage, disk caching, and
write intensive mobile and embedded applications. Packaged in PCMCIA
housing (a sort of bus extension slots in mobile computers), the SRAM
card is based on high density and super low power SRAM memories,
providing densities starting from 512 kB up to hundreds of MB. The
SRAM Memory cards usually operate at speeds around 100 ns. They are
usually based on CMOS technology. SRAM cards contain rechargeable
lithium batteries and recharge circuitry. The recharge feature eliminates
the data loss during critical times.

Fig. 7-32. Photograph of an SRAM card.

7-10.2 Flash Memory Cards


Flash memory is non-volatile computer memory that can be electrically
erased and reprogrammed. Flash is a form of EEPROM that allows
multiple memory locations to be erased or written in one operation.
Figure 7-33 depicts an array of such non-volatile memory cells.


Fig. 7-33. EEPROM array and cell structure

The flash devices are configured as NAND flash or NOR flash. NOR and
NAND flash get their names from the structure of the interconnections
between memory cells, as shown in figure 7-34.

NOR-based flash has lower density than newer NAND-based systems. A
single-level NOR flash cell in its default state is logically equivalent
to a binary "1" value, because current will flow through the channel
under application of an appropriate voltage to the control gate (CG). A
NOR flash cell can be programmed, or set to a binary "0" value, by
applying an elevated on-voltage (typically above 5V) to the control gate,
so that hot electrons are injected onto the floating gate (FG). In order
to erase a NOR flash cell (resetting it to the "1" state), a large
voltage of the opposite polarity is applied between the CG and drain,
pulling the electrons off the FG through quantum tunneling. Modern NOR
flash memory chips are divided into erase segments (often called blocks).
Apart from being used as random-access ROM, NOR memories can also be used
as storage devices. However, NOR flash chips typically have slow write
speeds compared with NAND flash.


Fig. 7-34. Structure of NAND and NOR flash memories

NAND flash architecture was first introduced by Toshiba in 1989.
Unlike NOR flash memory, NAND flash memories cannot provide
execute-in-place (XIP) operation, due to their different construction
principles. Thus, NAND flash memories are accessed much like block
devices, in units of pages. The pages are typically 512 or 2,048 bytes in
size. A few bytes (12–16 bytes) are associated with each page for storage
of an error detection and correction checksum. Such NAND flash memory
chips form the core of the removable USB interface storage devices known
as USB flash drives, as well as most memory card formats available today.
One limitation of flash memory is that while it can be read or programmed
byte by byte, it must be erased by block. Starting with an erased block,
any byte within that block can be programmed. Once a byte is programmed,
it cannot be changed again until the entire block is erased. In other
words, flash memory (specifically NOR flash) offers random-access read
and program operations, but cannot offer random-access rewrite or erase.
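The erase-before-rewrite rule can be illustrated with a toy Python model of one flash block. The class FlashBlock and its methods are hypothetical names for this sketch. As a simplifying assumption, the model works at bit level: programming may only change bits from 1 to 0, and only a whole-block erase returns bits to 1, which is a slightly finer-grained version of the byte-level rule stated above.

```python
class FlashBlock:
    """Toy model of one flash erase block (erased state is all 0xFF)."""

    def __init__(self, size=16):
        self.cells = bytearray([0xFF] * size)

    def program(self, offset, value):
        # Programming can only clear bits (1 -> 0); setting a bit back
        # to 1 would need an erase of the whole block.
        if value & ~self.cells[offset] & 0xFF:
            raise ValueError("block must be erased before rewriting")
        self.cells[offset] = value

    def erase(self):
        # Erase acts on the entire block at once.
        for i in range(len(self.cells)):
            self.cells[i] = 0xFF


block = FlashBlock()
block.program(0, 0xA5)            # fine: the byte was erased
try:
    block.program(0, 0xFF)        # rewriting needs 0 -> 1 transitions
except ValueError as err:
    print("rejected:", err)
block.erase()
block.program(0, 0x3C)            # allowed again after the erase
print(hex(block.cells[0]))
```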


Flash memory is actually used in many applications, namely:

• The PC BIOS chip,
• Flash memory cards in digital cameras and game consoles,
• PCMCIA Type I and Type II memory cards,
• USB Memory Sticks or USB Flash.

Flash memory cards are solid-state electronic data storage devices used in
many applications, such as digital cameras, mobile computers, cell
phones, music players and game consoles. They offer high
re-recordability, power-free storage, a small form factor, and rugged
environmental specifications. There are also non-solid-state memory cards
that do not use flash memory, and there are many different types of
memory cards. The PC Card (PCMCIA) was among the first commercial memory
card formats (type I cards) to come out in the 1990s, but it is now
mainly used in industrial applications. In the 1990s, a number of memory
card formats smaller than the PC Card came out, including CompactFlash,
SmartMedia, and Miniature Card. In other areas, tiny embedded memory
cards (SID) were used in cell phones, and game consoles started using
proprietary memory card formats. From the late 1990s to the early 2000s a
host of new formats appeared, including SD/MMC, Memory Stick, and a
number of variants and smaller cards. The desire for ultra-small cards
for cell phones, PDAs, and digital cameras drove a trend toward smaller
cards. In digital cameras, SmartMedia and CompactFlash had been very
successful. In 2001, SmartMedia (SM) alone captured 50% of the digital
camera market. By 2005, however, SD/MMC had nearly taken over
SmartMedia's spot, with stiff competition coming from Memory Stick
variants, xD, as well as CompactFlash. In industrial fields, even the
venerable PC Card (PCMCIA) memory cards still manage to maintain a niche,
while in cell phones and PDAs the memory card market is highly fragmented.

Fig. 7-35. Flash memory cards of different sizes and form factors.


Table 7-6. List of the most famous flash memory cards

Name Acronym Form factor


PC Card PCMCIA 85.6 × 54 × 3.3 mm
CompactFlash I CF-I 43 × 36 × 3.3 mm
CompactFlash II CF-II 43 × 36 × 5.5 mm
SmartMedia SM / SMC 45 × 37 × 0.76 mm
Memory Stick MS 50.0 × 21.5 × 2.8 mm
Memory Stick Duo MSD 31.0 × 20.0 × 1.6 mm
Memory Stick Micro M2 M2 15.0 × 12.5 × 1.2 mm
Multimedia Card MMC 32 × 24 × 1.5 mm
MMCmicro Card MMCmicro 12 × 14 × 1.1 mm
Secure Digital card SD 32 × 24 × 2.1 mm
Universal Flash Storage UFS
miniSD card miniSD 21.5 × 20 × 1.4 mm
microSD card microSD 11 × 15 × 0.7 mm
xD-Picture Card xD 20 × 25 × 1.7 mm
Intelligent Stick iStick 24 x 18 x 2.8 mm
Serial Flash Module SFM 45 x 15 mm
µ card µcard 32 x 24 x 1 mm
NT Card NT NT+ 44 x 24 x 2.5 mm

Nowadays, most new PCs have built-in slots for a variety of memory
cards: Memory Stick, CompactFlash, SD, etc. Some digital gadgets
support more than one memory card to ensure compatibility. Fig. 7-36(a)
shows the Fujitsu 1MB Flash memory MBM29LV800. This flash memory is
organized as 1M bytes of 8 bits or 512K words of 16 bits. The Fujitsu
Flash memory MBM29LV800 features single 3V power supply operation for
both read and write functions. These devices can electrically erase the
entire chip, or all bits within a sector simultaneously, via
Fowler-Nordheim tunneling. A sector is typically erased and verified in
1 sec (if already preprogrammed). The bytes/words are programmed one
byte/word at a time, using the EPROM programming mechanism of hot
electron injection. Figure 7-36(b) depicts the architecture of the 2GB
NAND Flash memory HY27HU08AG. Also, Table 7-7 depicts the pin
assignments of this flash memory. Recently, some companies, like Samsung
Electronics, succeeded in producing NAND Flash memory chips with a 32GB
capacity per chip. Such Flash memory chips can be used in huge-capacity
memory modules, which are able to store up to 64GB of data, or 40 movies.

Fig. 7-36(a). Structure of the Fujitsu 1 MB Flash memory, MBM29LV800

Fig. 7-36(b). Architecture of the 2GB NAND flash memory HY27HU08AG


Table 7-7. Pin assignment of the 2GB Flash memory HY27HU08AG

7-10.3. USB Flash Drives


The USB flash drives are non-volatile flash storage devices, which are
used for storing and transferring data between computers via USB
(universal serial bus) ports. For more details about USB, refer to
chapter 9. USB flash drives have no moving parts and usually have a
design that incorporates excellent shock resistance and data retention.
The USB flash drive is lightweight and compact, small enough to fit into
a pocket. It requires neither cables nor batteries, since the USB
interface supplies the power, so no external power source is required.
USB flash drives come with optional password protection. The USB flash
drive is bootable and features a write-protect switch to prevent the data
from being erased by accident. In order to operate the USB flash drive,
it is simply plugged into the USB port of any computer and it is ready
to use. This device is especially attractive to persons who regularly
transport files between several systems. Flash drive capacities on the
market increase continually. Few manufacturers continue to produce models
of 1 GB, and many are phasing out 2GB, 4GB, 8GB and 16GB capacity flash
memories. High speed has become a standard for modern flash drives, and
capacities of up to 256 GB have come on the market, as of 2010.

Fig. 7-37. USB flash memory modules

Fig. 7-38. Internal structure of a USB flash memory module.
1- USB connector, 2- USB memory controller, 3- Test points, 4- Flash memory chip,
5- Crystal oscillator, 6- LED, 7- Write-protect switch, 8- Space for other flash memory.

Because of the particular characteristics of flash memory, it is best used
with specifically designed file systems, which deal with the long erase
times of NOR flash blocks. The basic concept behind flash file systems is
as follows: when the flash store is to be updated, the file system writes
a new copy of the changed data to a fresh block, remaps the file
pointers, and then erases the old block later, when it has time. JFFS was
the first of these file systems. YAFFS was released in 2003 to deal with
NAND flash, and JFFS was updated to support NAND flash too.
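The write-elsewhere-and-remap idea can be sketched with a toy Python model. This is not how JFFS or YAFFS are implemented; the class TinyFlashStore and its methods are hypothetical names that only illustrate the three steps named above: write the new copy to a fresh block, remap the pointer, and erase the old block later.

```python
class TinyFlashStore:
    """Toy flash store: one item per block, None marks an erased block."""

    def __init__(self, n_blocks):
        self.blocks = [None] * n_blocks
        self.block_of = {}        # item name -> physical block number
        self.stale = []           # old blocks waiting to be erased

    def write(self, name, data):
        fresh = self.blocks.index(None)   # ValueError if no erased block
        self.blocks[fresh] = data         # 1) write new copy to fresh block
        old = self.block_of.get(name)
        self.block_of[name] = fresh       # 2) remap the pointer
        if old is not None:
            self.stale.append(old)        # 3) erase is deferred

    def read(self, name):
        return self.blocks[self.block_of[name]]

    def garbage_collect(self):
        # Slow block erases are done when the system has time.
        while self.stale:
            self.blocks[self.stale.pop()] = None


store = TinyFlashStore(4)
store.write("config", b"v1")
store.write("config", b"v2")      # goes to a new block, v1 becomes stale
print(store.read("config"), store.stale)
store.garbage_collect()
print(store.blocks)
```

Because updates always land on erased blocks, the slow erase is taken off the write path, and writes are spread over the device instead of wearing out one block.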

A group called the Open NAND Flash Interface Working Group (ONFI)
has developed a standardized low-level interface for NAND flash chips.
This allows interoperability between conforming NAND devices from
different vendors. The ONFI specification version 1.0 was released on
December 28, 2006. It specifies:

• a standard physical interface (pinout) for NAND flash in TSOP-48,
  WSOP-48, LGA-52, and BGA-63 packages
• a standard command set for reading, writing, and erasing NAND flash
  chips
• a mechanism for self-identification (comparable to the Serial Presence
  Detect feature of SDRAM modules)

The ONFI group is supported by major NAND flash manufacturers,
including Intel, Micron Technology, and Sony, as well as by major
manufacturers of devices incorporating NAND flash chips.


7-11. Summary

In this chapter we discuss the different concepts of memory interface
circuits to a microprocessor, with emphasis on x86-based systems. The
chapter covers modern techniques in memory system design, including
cache memory, DRAM modules and emerging memory technologies. As
the performance of a computer system is strongly dependent on memory
technology, the chapter also covers high-performance storage systems based
on Flash technologies. The following figure depicts the memory interface
circuit to an 8088 microprocessor, which was used in the early IBM PCs.

In fact, the memory subsystem is one of the most important components of
any microprocessor-based system. Some basic input/output
routines (BIOS) have to be permanently stored in the computer's read-only
memory (ROM). Every time a computer is started up, programs are
loaded from secondary memory (hard disk) into the main memory,
which is the random access memory (RAM). Therefore, every computer
contains several types of memory devices. These memory devices differ in
capacity, speed, and theory of operation.


Random Access Memory (RAM) is a key component of computers, and
there is a wide variety of RAM types. RAM has been around since the 1940s,
when military computers used vacuum tubes to store a computer's
working memory, but today most people interact with RAM as
microchips, either inside their computing devices or outside as small Flash
drives. RAM's constant evolution has divided it into several main
categories based on speed, while modern computers also have nonvolatile
RAM in the form of Flash memory.

SRAM stands for static random access memory. PCs, routers and servers
have SRAM in their hardware. Unlike dynamic random access memory
(DRAM), SRAM does not need to be periodically refreshed to retain its
information, which helps it to conserve power. SRAM is
usually used as system cache, inside the CPU and on the PC motherboard.
SRAM employs many transistors, typically 4 to 6 transistors (see footnote 4)
for each bit, as shown in Fig. 7-6(c), giving it higher speed but less storage
capacity. Because SRAM does not need to be refreshed, it is faster than DRAM.
The typical access time of SRAM is 5-10 ns, in contrast to a typical
access time of 60 ns for DRAM. Figure 7-7 depicts the block diagram of
an SRAM chip, like the 4008.

Cache memory is a small, fast memory (a piece of SRAM) which is
used to improve the average memory response time. To maintain full speed,
the CPU must hold instructions and data in an internal cache memory,
thus avoiding the need to access external memory. Caches usually make
use of random access or an even faster access method, called "associative
addressing". Associative memories are also commonly known as
content-addressable memories (CAM). In a CAM, any stored item can be
accessed by using the contents of the item in question. The field chosen to
access the CAM is called the "key".

4
Some RAMs, called FeRAMs (ferroelectric RAMs), use only one transistor per bit.
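
Associative addressing can be sketched in Python as a lookup by key. This is an illustrative model only: the class AssociativeCache is a hypothetical name, the key is assumed to be the memory address, the replacement policy (evict the oldest entry) is an assumption, and a Python dictionary merely stands in for the parallel comparators of a real CAM.

```python
class AssociativeCache:
    """Fully associative cache: every entry is found by its key."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = {}       # key (memory address) -> cached data
        self.hits = 0
        self.misses = 0

    def read(self, address, main_memory):
        if address in self.entries:         # match on the key field
            self.hits += 1
            return self.entries[address]
        self.misses += 1
        if len(self.entries) >= self.capacity:
            # Evict the oldest entry (dicts preserve insertion order).
            oldest = next(iter(self.entries))
            del self.entries[oldest]
        self.entries[address] = main_memory[address]
        return self.entries[address]


main_memory = {address: address * 2 for address in range(16)}
cache = AssociativeCache(capacity=2)
for address in (3, 3, 5, 3, 7, 5):
    cache.read(address, main_memory)
print(cache.hits, cache.misses, list(cache.entries))
```

Repeated references to the same few addresses are served from the cache, which is why the average memory response time improves.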

DRAM stands for dynamic random access memory. Unlike SRAM,
the chip needs to be periodically refreshed to keep the
information on it from fading. DRAM has a higher power consumption and
a higher capacity than SRAM. A newer version of DRAM, called single data
rate synchronous DRAM (SDR SDRAM), led to faster computing and higher
memory capacities. The following figure depicts the DRAM cell:

The next table summarizes the RAM technologies and their applications.

RAM Technology | Application | Access Speed | Ports | Characteristics
Static RAM (SRAM) | Level-1 and level-2 cache memory | Fast | One | More expensive than DRAM
Burst SRAM (BSRAM) | Level-2 cache memory | Fast | One | SRAM in burst mode
DRAM | Main memory, low-cost video | Slow | One | A generic term for any kind of dynamic (refreshed) RAM
FPM (Fast Page Mode) DRAM | Main memory, low-cost video | Slow | One | Prior to EDO DRAM, the most common type of DRAM
EDO (Extended Data Out) DRAM | Main memory, low-cost video | 5-20% faster than FPM DRAM | One | Uses overlapping reads (one can begin while another is finishing)
BEDO (Burst EDO) DRAM | Main memory and low-cost video | Faster than EDO DRAM | One | Not widely used, not supported by processor chipset makers
EDRAM (Enhanced DRAM) | Level-2 cache memory | 15 ns SRAM, 35 ns DRAM | One | Contains a 256-byte SRAM inside a larger DRAM
Nonvolatile RAM (NVRAM) | Preset phone numbers | Fast | One | Battery-powered RAM
Synchronous DRAM (SDRAM) | Main memory | See forms of SDRAM | One | Generic term for DRAMs with a synchronous interface
JEDEC SDRAM | Main memory | Fast | One | Dual-bank architecture; most common form of SDRAM
PC100 SDRAM | Main memory | Intended to run at 100 MHz | One | An Intel specification designed to work with their i440BX
Double Data Rate (DDR) SDRAM | Main memory | Up to 200 MHz | One | Activates output on both the rising and falling clock edges; double the data rate of PC100 SDRAM
Double Data Rate (DDR2) SDRAM | Main memory | Up to 400 MHz | One | Activates output on both the rising and falling clock edges
Enhanced SDRAM (ESDRAM) | Main memory | Fast, > 100 MHz | Two | Twice as fast as SDRAM; see Enhanced Memory Systems
SyncLink DRAM (SLDRAM) | Main memory | Fastest, > 200 MHz | One | Uses "packets" for address, data, and control
Direct Rambus DRAM (DRDRAM) | Main memory | Up to 800 MHz but with 16-bit bus | One | Backed by Intel and Rambus Inc.
Ferroelectric RAM (FRAM) | Main memory in small devices | ? | ? | Developed by Ramtron
RAMDAC | Video card | Fast | One | SRAM to store color palette table
Rambus DRAM (RDRAM) | Video memory for Nintendos | Up to 600 MHz | One | Intel and Rambus Inc. architecture
Synchronous Graphics RAM (SGRAM) | Moderate to high-end video memory | Closer to VRAM than DRAM | One | Has special performance-enhancing features; example: Matrox Mystique
VRAM (Video RAM) | Higher-cost video memory | Twice the speed of DRAM | Two | Dual-ported; new image is stored while other is sent to display
WRAM (Window RAM) | Less expensive video memory | 25% faster than VRAM | Two | With RAMDAC, can handle true color at 1600 x 1200 pixels
Multibank DRAM (MDRAM) | Low-cost video memory applications | Faster | One | Interleaved memory accesses between banks


7-12. Problems
7-1) Discuss the different methods, which may be used for memory
addressing. Show how to decode an address for a ROM system using
simple NAND gates, 74LS138/74LS139 decoders. Show how to use a
PAL (programmable array logic) for address decoding of memory systems
7-2) Show how to use PAL16L8 for address decoding of sixteen 27512
EPROM memory devices (64k x 8 bits) interfaced to a Pentium
microprocessor at locations FFF80000H-FFFFFFFFH. Write down the
PAL program, to be used for the PAL16L8.
7-3) Draw a schematic diagram showing how the memory is organized in
the IBM PC, which is equipped with 8088 processor in maximum mode
7-4) Draw a schematic diagram showing how to interface 256kB RAM to
8088 microprocessor (8-bit data)
7-5) Draw a schematic diagram showing how to interface 256kB RAM to
8086 microprocessor (16-bit data)
7-6) Draw a schematic diagram showing how to implement a 32-bit
memory interface for 80386/80486 processors, using four 8-bit DRAM
banks (each up to 1GB).
Hint: each memory bank is connected to only 8 data lines. For instance,
the first memory bank is connected to D0-D7, bank 2 is connected to
D8-D15, and so on. The address lines A0 and A1 are used for bank
selection, while the other 30 address lines A2-A31 are used for
addressing memory locations inside each memory bank (up to 1GB).
7-7) Draw a schematic diagram showing how to implement a 64-bit
memory interface for Pentium processors (64-bit external data bus), using
eight 8-bit DRAM banks (each up to 1GB).
7-8) Consider a moving-arm disk storage device that has the
following parameters. Estimate the disk capacity and the average latency of
the disk, and calculate the data transfer rate of the whole drive.
- Number of recording surfaces: 8
- Number of tracks/recording surface: 200
- Track storage capacity: 64 kbit/track
- Disk rotation speed: 2400 revolutions per minute (RPM)
Hint: The data transfer rate may be calculated as (track storage capacity)
/ (time taken to read/write a track). The average time taken to read/write a
track, in seconds, may be approximated as 60/RPM.
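
The arithmetic suggested by the hint of problem 7-8 can be sketched as follows. The sketch rests on assumptions that the problem does not state: "64 k" is taken as 64 x 1024 bits, the average latency is taken as half a revolution, and only one head is assumed to transfer data at a time (reading all 8 surfaces in parallel would multiply the rate by 8).

```python
surfaces = 8
tracks_per_surface = 200
bits_per_track = 64 * 1024      # assumption: "64 k" means 64 x 1024 bits
rpm = 2400

# Capacity: every track on every surface holds bits_per_track bits.
capacity_bits = surfaces * tracks_per_surface * bits_per_track
capacity_bytes = capacity_bits // 8

rotation_time_s = 60 / rpm                  # time for one revolution
average_latency_s = rotation_time_s / 2     # on average, half a revolution
transfer_rate_bps = bits_per_track / rotation_time_s    # one head reading

print(capacity_bytes, average_latency_s, transfer_rate_bps)
```

With these assumptions the capacity is about 12.5 MB, the average latency about 12.5 ms, and the transfer rate about 2.6 Mbit/s.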


7-9) The reason for the implementation of the cache memory is


a) To increase the internal memory of the system
b) The difference in speeds of operation of the processor and memory
c) To reduce the memory access and cycle time
d) All of the above
7-10) The cache memory is based on the property of ____
a) Locality of reference
b) Memory localisation
c) Memory size
d) None of the above
7-11) The reason for the implementation of the cache memory is
a) To increase the internal memory of the system
b) The difference in speeds of operation of the processor and memory
c) To reduce the memory access and cycle time
d) All of the above
7-12) The type of memory assignment used in Intel processors is _____ .
a) Little Endian
b) Big Endian
c) Medium Endian
d) None of the above
7-13) The address space in ARM is ______ .
a) 2^24
b) 2^64
c) 2^16
d) 2^32
7-14) The address system supported by ARM systems is/are _____ .
a) Little Endian
b) Big Endian
c) X-Little Endian
d) Both a and b
7-15) The effective address of the instruction written in Post-indexed
mode, MOVE[Rn]+Rm is _____ .
a) EA = [Rn]
b) EA = [Rn + Rm]
c) EA = [Rn] + Rm
d) EA = [Rm] + Rn


7-13. Bibliography

[1] S. W. Miller, Memory and Storage Technology, AFIPS Press,
Montvale, 1977.

[2] J. E. Uffenbeck, The 8086/8088 Family: Design, Programming
and Interfacing, Prentice-Hall, 1987.

[3] K. Hwang, Advanced Computer Architecture, McGraw-Hill, 1993.

[4] Intel Corp., Peripheral Components, Santa Clara, CA, 1993.

[5] J. P. Hayes, Computer Architecture and Organization, McGraw-Hill,
1998.

[6] R. P. Nelson, Microsoft's 80386/80486 Programming Guides,
Microsoft Press, 1998.

[7] http://www.intel.com

Prof. Dr. Muhammad El-SABA
Introduction to Microprocessors & Interface Circuits CHAPTER 8

Input/Output Interface
Circuits for Microprocessors
Contents
8-1. Introduction (I/O Transfer Modes)
8-2. Methods of Addressing I/O Ports
8-2.1. I/O address Space
8-2.2. Memory-mapped I/O
8-3. I/O Instructions
8-3.1. Register I/O Instructions
8-3.2. Block I/O Instructions
8-4. Protected I/O
8-5. Designing I/O interfaces in 80x86 systems
8-5.1. Implementing Simple Input Ports Using 74LS244 Buffers
8-5.2. Implementing Simple Output Ports Using 74LS373 Latch
8-6. The 8255 Programmable Peripheral interface (PPI) Chip.
Example 8-1. Basic I/O Mode
Example 8-2. Basic I/O Mode
Example 8-3. Keyboard Scanner & 7-Segment Display
Example 8-4. Square Wave generator (BSR Mode).
Example 8-5. Input from ADC
Example 8-6. Stepper Motor Control
8-7. I/O with Handshaking Capabilities
8-7.1. I/O with Handshaking Capabilities
Example 8-7. I/O with handshaking (Mode 1)
8-7.2. Bidirectional I/O with Handshaking Capabilities
Example 8-8. Bidirectional I/O with handshaking (Mode 2)
8-7.3. CPU services for I/O Control
8-8. I/O – Memory Interface & Direct Memory Access (DMA)
8-8.1. The DMA Architecture
8-8.2. How Does DMA Work?
8-8.3. DMA Usage in IBM PC
8-8.4. DMA Modes of Operation
8-8.5. Programming The DMA
8-9. I/O Processors
8-9-1. Features of IOP's
8-9-2. Intel 8089 IOP
8-9-3. Intel 80321 IOP
8-10. Summary
8-11. Problems


Input/Output Interface
Circuits for
Microprocessors
8-1. Introduction (I/O Transfer Modes)
Interfacing is the process of connecting a microprocessor to
external devices. We have seen so far that microprocessors can access data
from I/O ports as well as memory. In this chapter we present the main
principles of I/O interfacing in microprocessor-based systems, with
emphasis on x86 processors. There exist several modes of data
input/output in microprocessor-based and computer systems:

• I/O under microprocessor control
• Interrupt-initiated data transfer
• Direct memory access (DMA) transfer, from secondary to main memory
• Transfer of data through I/O processors (IOPs)

Fig. 8-1. I/O interfacing in a microprocessor system. (The CPU is connected
through the data, address and control buses to three interfaces, which serve
an input device, an output device and an I/O device.)

The first two I/O modes are directly serviced by the microprocessor, whereas
the other two modes are serviced by specialized chips. In this chapter, we
discuss the I/O operations of the x86 microprocessors, from the following
perspectives:

• Methods of addressing I/O ports
• Instructions for I/O operations
• Protected I/O
• Designing I/O interfaces in 80x86 systems
• Using the 8255 programmable peripheral interface (PPI) chip.

In addition, we’ll discuss the microprocessor services for smart I/O data
transfer, like interrupts and I/O with handshaking. We’ll also discuss how
the 80x86 can be interfaced to the 8237 DMA controller or an I/O
processor.

8-2. Methods of I/O Addressing in 80x86 Systems


The x86 processors, like some other CPUs, allow input/output to be
performed in either of two ways:

• Using a separate I/O address space, with specific I/O instructions
• Using memory-mapped I/O, with general-purpose operand
manipulation instructions.

8-2.1. I/O Address Space


The x86 processors provide a separate I/O address space, distinct from
physical memory, that can be used to address the input/output ports. The
I/O address space consists of 64k (2^16) individually addressable 8-bit ports.
Also, any two consecutive 8-bit ports can be treated as a 16-bit port, and
any four consecutive 8-bit ports can be treated as a 32-bit port. Thus, the
I/O address space can accommodate up to 64k 8-bit ports, up to 32k 16-bit
ports, or up to 16k 32-bit ports.

The program can specify the address of the port in two ways:

1- Using an immediate byte constant, indicating the port address, the
program can specify:
* 256 x 8-bit ports numbered 0 through 255.
* 128 x 16-bit ports numbered 0, 2, 4, . . . , 252, 254.
* 64 x 32-bit ports numbered 0, 4, 8, . . . , 248, 252.
2- Using a value in DX, indicating the port address, the program can
specify:
* 64k x 8-bit ports numbered 0, 1, 2, . . . , 65534, 65535
* 32k x 16-bit ports numbered 0, 2, 4, . . . , 65532, 65534
* 16k x 32-bit ports numbered 0, 4, 8, . . . , 65528, 65532

Generally speaking, the x86 microprocessors can transfer 8, 16 or 32
bits at a time to a device located in the I/O space (according to the
processor data bus width and the accumulator register used). Like
double-words in memory, 32-bit ports
should be aligned at addresses evenly divisible by four so that the 32 bits
can be transferred in a single bus access.

The instructions IN and OUT move data between a register and a port in
the I/O address space. The instructions INS and OUTS move strings of
data between the memory address space and I/O ports. Like words in
memory, 16-bit ports should be aligned at even-numbered addresses so that
the 16 bits can be transferred in a single bus access. An 8-bit port may be
located at any I/O address, so both even and odd addresses are
possible.

8-2.2. Memory-Mapped I/O


I/O devices also may be placed in the memory address space. As long as
the devices respond like memory components, they are indistinguishable to
the processor.
Memory-mapped I/O provides additional programming flexibility. Any
instruction that references memory may be used to access an I/O port
located in the memory space. For example, the MOV instruction can
transfer data between any register and a port; and the AND, OR, and TEST
instructions may be used to manipulate bits in the internal registers of a
device (see Figure 8-1). Memory-mapped I/O maintains the full
complement of addressing modes for selecting the desired I/O device (e.g.,
direct address, indirect address, base register, index register). Memory-
mapped I/O, like any other memory reference, is subject to access
protection and control when executing in protected mode.


8-3. I/O Instructions


The I/O instructions of the x86 systems provide access to the processor's
I/O ports for the transfer of data to and from peripheral devices. These
instructions have as one operand the address of a port in the I/O address
space. There are two classes of I/O instruction:

• Those that transfer a single item (byte, word, or dword) located in a
register.
• Those that transfer strings of items (strings of bytes, words, or dwords)
located in memory. These are known as "string I/O" or "block I/O"
instructions.

8-3.1. Register I/O Instructions


The I/O instructions IN and OUT are provided to move data between I/O
ports and the accumulator EAX (32-bit), AX (16-bit), or AL (8-bit)
registers. IN and OUT instructions address I/O ports either directly, with
up to 256 port addresses coded in the instruction, or indirectly via the DX
register for up to 64k ports.

IN (Input from port) transfers a byte, word, or dword from an input port to
AL, AX or EAX. If a program specifies AL with the IN instruction, the
processor transfers 8 bits from the selected port to AL. If a program
specifies AX with the IN instruction, the processor transfers 16 bits from
the port to AX. If a program specifies EAX with the IN instruction, the
processor transfers 32 bits from the port to EAX.

IN eAX,port# Or IN eAX,DX

where port# is the immediate value of input port address and eAX indicates
the accumulator name (AL or AX or EAX, according to the port size). As
we mentioned above, the input port address may be pointed at by a value
inside the DX register.

OUT (Output to Port) transfers a byte, word, or doubleword to an output
port from AL, AX, or EAX. The program can specify the number of the
port using the same methods as the IN instruction.

OUT port#,eAX Or OUT DX,eAX

Again, port# is the immediate value of the output port address and eAX
indicates the accumulator name (AL or AX or EAX, according to the port
size). Also, the output port address may be pointed at by the DX register.
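
The behaviour of IN and OUT with the three accumulator sizes can be modelled by the following Python sketch. It is not processor code: the class IOSpace and the helper is_aligned are hypothetical names, wider ports are assumed to be stored in consecutive byte-wide ports in little-endian order, and wrap-around at the top of the 64k space is ignored.

```python
class IOSpace:
    """Model of the 64k byte-wide ports of the x86 I/O address space."""

    def __init__(self):
        self.ports = bytearray(64 * 1024)

    def out(self, port, value, size):
        # size is 1 (AL), 2 (AX) or 4 (EAX); low byte goes to the lowest port
        self.ports[port:port + size] = value.to_bytes(size, "little")

    def in_(self, port, size):
        return int.from_bytes(self.ports[port:port + size], "little")


def is_aligned(port, size):
    # An aligned port can be transferred in a single bus access.
    return port % size == 0


io = IOSpace()
io.out(0x300, 0x12345678, 4)    # models OUT DX,EAX with DX = 300H
print(hex(io.in_(0x300, 1)))    # models IN AL,DX  with DX = 300H
print(hex(io.in_(0x302, 2)))    # models IN AX,DX  with DX = 302H
print(is_aligned(0x302, 4))
```

The model shows that a 32-bit port is simply four consecutive 8-bit ports, so narrower accesses inside it read the corresponding bytes.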

8-3.2. Block I/O Instructions


The block (or string) I/O instructions INS and OUTS move blocks of data
between I/O ports and memory space. Block I/O instructions use
the DX register to specify the address of a port in the I/O address space.
INS and OUTS use DX to specify:

• 8-bit ports numbered 0 through 65535
• 16-bit ports numbered 0, 2, 4, . . . , 65532, 65534
• 32-bit ports numbered 0, 4, 8, . . . , 65528, 65532

We can perform block input or output operations by using the repeat
prefix REP with the INS and OUTS instructions.
The REP prefix modifies INS and OUTS to provide a means of
transferring blocks of data between an I/O port and memory. These block
I/O instructions are string primitives (refer also to Chapter 3 for more on
string primitives). They simplify programming and increase the speed of
data transfer by eliminating the need to use a separate LOOP instruction or
an intermediate register to hold the data.

Block I/O instructions use either ESI or EDI to designate the source (for
OUTS) or destination memory address (for INS). For each transfer, ESI or
EDI is automatically either incremented or decremented, as specified by the
direction flag bit (DF) in the flag register.

The string I/O primitives can operate on byte strings, word strings, or
doubleword strings. After each transfer, the memory address in ESI or EDI
is updated by 1 for byte operands, by 2 for word operands, or by 4 for
doubleword operands. The value in the direction flag (DF) determines
whether the processor automatically increments ESI or EDI (DF=0) or
whether it automatically decrements these registers (DF=1).

INS (Input String from Port) transfers a byte, word, or doubleword string
element from an input port to memory.

INS dest, port

INS dest,port transfers a byte, word or doubleword from the hardware port,
whose address is held in DX, to ES:EDI, even if a memory
destination operand "dest" is supplied. The mnemonics INSB, INSW, and
INSD are variants that explicitly specify the size of the operand. For INSB,
INSW and INSD no operands are allowed and the size is determined by the
mnemonic.

If a program specifies INSB, the processor transfers 8 bits from the
selected port to the memory location indicated by ES:DI. If a program
specifies INSW, the processor transfers 16 bits from the port to the
memory location indicated by ES:DI. If a program specifies INSD, the
processor transfers 32 bits from the port to the memory location indicated
by ES:EDI. The destination segment register choice (ES) cannot be
changed for the INS instruction. Combined with the REP prefix, the INS
instruction can move a block of information from an input port to a series
of consecutive memory locations.
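
The effect of REP combined with INSB, INSW or INSD can be modelled in Python as shown below. This is a behavioural sketch only: rep_ins and read_port are hypothetical names, the repeat count stands for the value that REP takes from (E)CX, and a bytearray stands in for the memory addressed through ES.

```python
def rep_ins(read_port, memory, edi, count, size, df=0):
    """Model of REP INSB/INSW/INSD: `count` items of `size` bytes.

    read_port: callable returning the next value supplied by the device
    memory:    bytearray standing in for the ES segment
    Returns the final value of EDI.
    """
    step = -size if df else size        # DF=1 decrements, DF=0 increments
    for _ in range(count):              # one iteration per REP count
        value = read_port()
        memory[edi:edi + size] = value.to_bytes(size, "little")
        edi += step                     # automatic pointer update
    return edi


samples = iter([0x0102, 0x0304, 0x0506])
buffer = bytearray(8)
final_edi = rep_ins(lambda: next(samples), buffer, edi=0, count=3, size=2)
print(final_edi, buffer.hex())
```

No explicit loop counter or intermediate register appears in the calling code, which mirrors the programming convenience of the string primitives.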

OUTS (Output String to Port) transfers a byte, word, or doubleword string
element from memory to an output port.

OUTS port, src

OUTS port,src transfers a byte, word or doubleword from "src" to the
hardware port whose address is held in DX. For instructions with
no operands the "src" is located at DS:ESI and ESI is incremented or
decremented by the size of the operand or the size dictated by the
instruction format.

The mnemonics OUTSB, OUTSW, and OUTSD are variants that
explicitly specify the size of the operand. If a program specifies OUTSB,
the processor transfers 8 bits from the memory location indicated by DS:SI
to the selected port. If a program specifies OUTSW, the processor transfers
16 bits from the memory location indicated by DS:SI to the selected port.
If a program specifies OUTSD, the processor transfers 32 bits from the
memory location indicated by DS:ESI to the selected port.

Combined with the REP prefix, the OUTS instruction can move a block of
information from a series of consecutive memory locations indicated by
DS:ESI to an output port.


8-4. Protection and I/O


There exist two mechanisms for providing protection for I/O functions in
80386 and later microprocessors:

1. The input/output privilege level (IOPL) field, in the EFLAGS register,
defines the right to use I/O-related instructions.
2. The I/O permission bit map of the task state segment (TSS) defines the
right to use ports in the I/O address space.

These mechanisms operate only in protected mode, including virtual 86
mode; they do not operate in real mode. In real mode, there is no protection
of the I/O space; any procedure can execute I/O instructions, and any I/O
port can be addressed by the I/O instructions.

Instructions that deal with I/O need to be restricted but also need to be
executed by procedures executing at privilege levels other than zero. The
IOPL defines the privilege level needed to execute I/O-related instructions
(IOPL=0 means the highest privilege and IOPL=3 means the lowest
privilege). The IN, INS, OUT, OUTS, STI and CLI instructions can be
executed freely in protected mode only if the current privilege level is at
least as privileged as IOPL, that is, CPL ≤ IOPL numerically.

The 80386 and later processors have the ability to selectively trap
references to specific I/O addresses. The structure that enables selective
trapping is called the I/O Permission Bit Map. When CPL > IOPL, the IN,
INS, OUT and OUTS instructions are still allowed for those ports whose
bits in the map are cleared.
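
The two protection checks can be summarized by the following Python sketch. It models ordinary protected mode only and is not the processor's actual microcode: the function names are hypothetical, the rule CPL ≤ IOPL is assumed for the privilege test, and the permission bit map is passed as a plain bytes object in which a 0 bit grants access to the corresponding byte-wide port.

```python
def io_instruction_allowed(cpl, iopl):
    # Privilege levels: 0 is the most privileged, 3 the least privileged.
    return cpl <= iopl


def port_access_allowed(cpl, iopl, permission_bitmap, port, size):
    """Decide whether IN/OUT on `size` consecutive ports may proceed."""
    if io_instruction_allowed(cpl, iopl):
        return True
    # Otherwise every byte-wide port touched must have a 0 bit in the map.
    for p in range(port, port + size):
        byte_index, bit_index = divmod(p, 8)
        if byte_index >= len(permission_bitmap):
            return False                    # beyond the map: access denied
        if (permission_bitmap[byte_index] >> bit_index) & 1:
            return False
    return True


bitmap = bytes([0b11111110])    # only port 0 is open to less privileged code
print(port_access_allowed(3, 0, bitmap, 0, 1))   # port 0, byte access
print(port_access_allowed(3, 0, bitmap, 0, 2))   # ports 0-1, word access
print(port_access_allowed(0, 0, bitmap, 5, 1))   # CPL 0 passes the IOPL test
```

Checking every byte of a word or doubleword access is what allows the operating system to open individual ports to a task without opening their neighbours.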


8-5. Designing I/O interface for x86 Systems


In this section we discuss the simplest methods to implement I/O ports for
a microprocessor, with emphasis on the 80x86 microprocessor systems,
using octal 3-state buffers (e.g., 74LS244/74LS245) and octal latches (e.g.,
74LS373/74LS374).

8-5.1. Implementing Simple Input Ports Using 3-State Buffers


The octal 3-state buffer chips (such as 74LS244 / 74LS245) can be used
for implementing simple 8-bit input ports, as shown in figure 8-2(a).

Fig. 8-2(a). Implementation of an input port using the 74LS244 octal buffer. Note that
G1 and G2 are active low and each one controls only 4 data bits of 74LS244 .

As shown in the figure, the input port address may be implemented by gating
(or decoding) the address lines (A0-A7) together with IOR. For instance,
we’ve chosen A0 to A4 and A6 active high, while A5 and A7 are active
low, to implement an input port address at 5FH (when IOR is low). The
input port data may be obtained via DIP switches, as shown in figure 8-2(b).
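
The gating described above can be checked with a short Python model. It is a sketch of the logic only: input_port_selected is a hypothetical name, and the function simply returns True when the 74LS244 enables would be asserted.

```python
def input_port_selected(address, ior_n):
    """Model of the address gating for the input port.

    address: value on address lines A0-A7
    ior_n:   level of the active-low IOR line (0 during an I/O read)
    """
    a = [(address >> bit) & 1 for bit in range(8)]
    high_lines = all(a[bit] for bit in (0, 1, 2, 3, 4, 6))  # must be 1
    low_lines = not (a[5] or a[7])                          # must be 0
    return high_lines and low_lines and ior_n == 0


selected = [addr for addr in range(256) if input_port_selected(addr, 0)]
print([hex(addr) for addr in selected])
```

Because all eight address lines take part in the gating, exactly one address in the range 00H-FFH enables the buffer.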


Fig. 8-2(b). Implementation of a simple input port using the 74LS244 octal
buffer and 8 DIP switches. (Schematic summary: the 8 DIP switches, with
22 kΩ resistors to VCC, feed the 74LS244 inputs on pins 2, 4, 6, 8, 11, 13,
15 and 17; the outputs on pins 18, 16, 14, 12, 9, 7, 5 and 3 drive the data
bus lines D0-D7; SEL drives the enables 1G (pin 1) and 2G (pin 19).)

8-5.2. Implementing Simple Output Ports Using Octal Latches


The 3-state octal latch chips (such as 74LS373 / 74LS374) can be used for
implementing 8-bit output ports, as shown in figure 8-3(a). As shown in the
figure, the port address can be implemented by appropriate gating (or
decoding) of the address lines with IOW. For instance, we’ve chosen A0,
A3, A4 and A7 active high, while A1, A2, A5 and A6 are active low, to
implement an output port address equal to 99H (when IOW is low). The
output port data may be connected to a seven-segment display, or simply to
8 light emitting diodes (LEDs), as shown in figure 8-3(b). Also, figure
8-3(c) depicts the implementation of a simple output port, with an octal latch
and a 7-segment display. Of course, the 74374 may be supplemented with a
7-segment decoder/driver circuit, which takes only 4 BCD input lines.
However, when using the 74374 alone, the decoding process may be
implemented by software, as indicated in example 4-4.
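
Software decoding for a 7-segment display usually reduces to a lookup table. The following Python sketch assumes that the latch outputs O0-O6 drive segments a-g and that O7 drives the decimal point; the names SEGMENTS and seven_segment_code are hypothetical, and the table holds the common-cathode patterns (a 1 lights a segment).

```python
# Patterns for digits 0-9; bit order from MSB to LSB is: . g f e d c b a
SEGMENTS = [0x3F, 0x06, 0x5B, 0x4F, 0x66, 0x6D, 0x7D, 0x07, 0x7F, 0x6F]


def seven_segment_code(digit, common_anode=False):
    """Return the byte to write to the output port for a decimal digit."""
    code = SEGMENTS[digit]
    if common_anode:
        code = ~code & 0xFF     # common-anode segments light on 0: invert
    return code


print(hex(seven_segment_code(2)), hex(seven_segment_code(2, common_anode=True)))
```

In an assembly program the same table would typically be stored in memory and indexed by the digit before the OUT instruction.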


Fig. 8-3(a). The implementation of an output port using the 74LS373 octal latch.

Fig. 8-3(b). Implementation of a simple output port using an octal latch and
8 LEDs. (Schematic summary: the data bus lines D0-D7 feed the 74LS374
inputs on pins 3, 4, 7, 8, 13, 14, 17 and 18; the outputs O0-O7 on pins 2, 5,
6, 9, 12, 15, 16 and 19 drive 8 LEDs through 680 Ω resistors connected to
VCC; OC is pin 1 and CLK is pin 11.)

Fig. 8-3(c). Implementation of an output port using an octal latch and a
7-segment display. (Schematic summary: the data bus lines D0-D7 feed the
74LS374, whose outputs O0-O7 drive the display segments, labelled
".gfedcba" in the figure, through 270 Ω resistors connected to VCC; the CLK
input is derived by gating the address lines A0-A7 with IOW.)

Note that the 8 storage elements of the 74LS374 are edge-triggered
D-type flip-flops, whereas those of the 74LS373 are transparent D-type
latches. On the positive edge transition of the clock (CLK) in the 74374, or
while the control signal (C) of the 74373 is high, the outputs are set
equal to the inputs. Note also that if you make use of a
common anode (CA) 7-segment display, then you should invert all data
inputs. This may be done either by hardware inverters or by software.
Instead of using 8 data lines (O0-O7) to drive the 7-segment display, one can
use only 4 lines and a BCD-to-7-segment decoder, as shown in the next figure.

Fig. 8-3(d). Output port with 7-segment circuits and a BCD-to-7-segment decoder

Of course, additional driving circuits may be needed to deliver more current
to the output devices, as shown in the following figure.

Fig. 8-3(e). Output driving circuits


8-6. Using 8255 Programmable Peripheral Interface Chip


The 8255 programmable peripheral interface (PPI) chip is one of the most
widely used chips for implementing I/O interfacing with x86 and
other processors. The 8255 is sometimes called PIO (parallel I/O). The
8255 is a 40-pin chip, which has 3 programmable 8-bit I/O ports (PA, PB
and PC). Each of these 3 ports can be programmed as input or output. So,
the 8255 is more economical than separate I/O ports with 74LS244 buffers
and 74LS374 latches. Figure 8-4 depicts the pinout of the 8255 chip.

Signal | Pin | Pin | Signal
PA3 | 1 | 40 | PA4
PA2 | 2 | 39 | PA5
PA1 | 3 | 38 | PA6
PA0 | 4 | 37 | PA7
RD | 5 | 36 | WR
CS | 6 | 35 | RESET
GND | 7 | 34 | D0
A1 | 8 | 33 | D1
A0 | 9 | 32 | D2
PC7 | 10 | 31 | D3
PC6 | 11 | 30 | D4
PC5 | 12 | 29 | D5
PC4 | 13 | 28 | D6
PC0 | 14 | 27 | D7
PC1 | 15 | 26 | VCC
PC2 | 16 | 25 | PB7
PC3 | 17 | 24 | PB6
PB0 | 18 | 23 | PB5
PB1 | 19 | 22 | PB4
PB2 | 20 | 21 | PB3

Fig. 8-4. Pin-out diagram of the 8255 PPI (or PIO) chip.

As shown in the figure, the three ports are all 8 bits wide (PA0-PA7,
PB0-PB7, and PC0-PC7). One can select a certain port at a given time by the
address lines (A0 and A1) as well as the chip select (CS), as indicated in
table 8-1. The read (RD) and write (WR) control signals are active low and
can be connected to IOR and IOW of the processor system bus.

Table 8-1. Port selection map of the 8255 PIO chip

Selected Port | CS | A1 | A0
PA | 0 | 0 | 0
PB | 0 | 0 | 1
PC | 0 | 1 | 0
Control Register | 0 | 1 | 1
Chip not selected | 1 | x | x

The 8255 chip has an internal control register, which can be selected for
writing or reading via the address lines (A0, A1), as indicated in table 8-1.
Figure 8-5 indicates the control register word mapping. As shown in the
figure, the control register word is used to program the input/output ports
according to the following modes:

Bit | Field | Values
D7 | Chip mode | 1 = I/O, 0 = BSR
D6-D5 | PA mode select | 00 = Mode 0, 01 = Mode 1, 1x = Mode 2
D4 | PA direction | 1 = input, 0 = output
D3 | PCU direction | 1 = input, 0 = output
D2 | PB mode select | 0 = Mode 0, 1 = Mode 1
D1 | PB direction | 1 = input, 0 = output
D0 | PCL direction | 1 = input, 0 = output

Fig. 8-5. Control register word of the 8255 PIO chip. PCL means the lower 4 bits of
Port C, while PCU means the upper 4 bits of Port C.

As indicated above, the direction of PA, PB as well as the upper nibble
PCU and lower nibble PCL of PC can be controlled (as input or output)
from the control word. Also, PA can be programmed in 3 modes (mode 0,
1, 2), while PB can be programmed in 2 modes (mode 0, 1), as follows:

Mode 0 (Basic I/O mode): In this mode any port can be selected as input
or output by adjusting its own bit (0 or 1).

Mode 1 (Strobed I/O mode): In this mode PA and PB can be used as I/O
ports with handshaking capabilities. The handshaking bits are provided by
port PC.

Mode 2 (Bi-directional I/O): Here port PA can be used as a bi-directional
I/O port with handshaking capabilities. Handshaking capabilities are
provided by PC.

In addition, the 8255 can be operated in BSR mode (bit set/reset mode). In
this mode, only individual bits of port PC can be programmed. When port
PC is used as status/control for PA or PB, the bits of PC can be set or reset
using the BSR mode. When the RESET pin is driven high, the control
register is cleared and the default mode is selected (all ports are set as
input ports).
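
The bit layout of figure 8-5 can be checked with a short calculation. The following Python sketch is only an illustration: the function name mode_word and its argument names are assumptions made here, not part of any 8255 documentation. It assembles the mode control word from the port modes and directions.

```python
def mode_word(pa_mode=0, pa_input=False, pcu_input=False,
              pb_mode=0, pb_input=False, pcl_input=False):
    """Build an 8255 mode control word (D7 = 1) from the fields of figure 8-5."""
    if pa_mode not in (0, 1, 2) or pb_mode not in (0, 1):
        raise ValueError("PA supports modes 0-2, PB supports modes 0-1")
    word = 0x80                      # D7 = 1: I/O mode (not BSR)
    word |= pa_mode << 5             # D6 D5: PA mode (mode 2 is written as 10)
    word |= int(pa_input) << 4       # D4: PA direction, 1 = input
    word |= int(pcu_input) << 3      # D3: PCU direction, 1 = input
    word |= pb_mode << 2             # D2: PB mode
    word |= int(pb_input) << 1       # D1: PB direction, 1 = input
    word |= int(pcl_input)           # D0: PCL direction, 1 = input
    return word


if __name__ == "__main__":
    # PA input, every other port output, all in mode 0
    print(f"{mode_word(pa_input=True):02X}H")
```

With PA as input and all other ports as outputs in mode 0, the function returns 90H, the control word used in Example 8-1 below.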
Example 8-1 (Basic I/O Mode).
Show how to configure the 8255 ports as follows:
port PA as input, port PB as output and port PC (both PCL and PCU) as
output. Proceed as follows:
i) Draw the circuit diagram, which contains the 8255 and the
microprocessor address, data, and read/write control lines.
ii) Determine the port addresses which will be assigned to PA, PB, PC
as well as the control register of the 8255.
iii) Determine the control word (byte) to be used.
iv) Write an assembly program that inputs data from port A and then sends
this data to both ports B and C (the output ports).

Solution:
Assume we use the lowest 8 address lines of the microprocessor (A0 through
A7) to generate the addresses of the 8255 ports, as follows:
i) The first 2 address lines (A0, A1) are connected to (A0, A1) of the 8255,
and the chip select (CS) is decoded from A2 through A7 as shown in figure
8-6, so that the chip is selected when A7-A2 = 110110.
ii) When the address lines A0 through A7 are decoded as shown in the
figure, the port addresses are as follows:

PORT               A7 A6 A5 A4 A3 A2 (CS)   A1 A0   Address
A                  1  1  0  1  1  0         0  0    D8H
B                  1  1  0  1  1  0         0  1    D9H
C                  1  1  0  1  1  0         1  0    DAH
Control register   1  1  0  1  1  0         1  1    DBH

iii) The control word is then as follows:

D7    D6 D5       D4         D3           D2          D1          D0
1     0  0        1          0            0           0           0
I/O   PA Mode 0   PA Input   PCU Output   PB Mode 0   PB Output   PCL Output


So, the control word is 10010000 or 90H

Fig. 8-6. Connecting the 8255 PIO chip with a microprocessor address, data, and
control lines (D0-D7 to the data bus, A0-A1 to the 8255 A0-A1, CS decoded from
A2-A7, RD and WR to IOR and IOW; PA is the input port, PB and PC are the
output ports).

iv) The assembly program:

TITLE PROGRAMMING THE 8255 PPI (in I/O Mode)

PORTA   EQU 0D8H        ; Port A address
PORTB   EQU 0D9H        ; Port B address
PORTC   EQU 0DAH        ; Port C address
CTRLW   EQU 0DBH        ; Control register address

        MOV AL,90H      ; Write control word into accumulator
        OUT CTRLW,AL    ; Send control word to control register
        IN  AL,PORTA    ; Input data from port A
        OUT PORTB,AL    ; Output data to port B
        OUT PORTC,AL    ; Output data to port C
        END

In the IBM PC, the 8255 chip is used in I/O mode 0, with PA at address 60H,
PB at 61H, PC at 62H and the control register at 63H. The default control
word is 99H (PA input, PB output, PC input).
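
The address decoding of Example 8-1 can be reproduced numerically. The sketch below is a hedged illustration in Python (port_addresses is a hypothetical helper name); it assumes, as in figure 8-6, that A7-A2 select the chip and that A1-A0 select the register inside it.

```python
def port_addresses(cs_bits):
    """Return the four 8255 register addresses for a given A7..A2 pattern.

    cs_bits is a 6-character string of '0'/'1' giving the levels of
    A7, A6, A5, A4, A3, A2 that select the chip, e.g. '110110'.
    """
    if len(cs_bits) != 6 or set(cs_bits) - {"0", "1"}:
        raise ValueError("cs_bits must be six binary digits (A7..A2)")
    base = int(cs_bits, 2) << 2          # A1 A0 = 00 gives the address of PA
    names = ("PA", "PB", "PC", "CTRL")   # A1 A0 = 00, 01, 10, 11 (table 8-1)
    return {name: base + offset for offset, name in enumerate(names)}


if __name__ == "__main__":
    for name, address in port_addresses("110110").items():
        print(f"{name}: {address:02X}H")
```

For the pattern 110110 this gives D8H, D9H, DAH and DBH, in agreement with the address table of the example.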


Example 8-2. Basic I/O Mode.


The circuit shown in figure 8-7 below makes use of the 8255
programmable peripheral interface. The 8255 ports are to be configured as
follows: PA as output, PB as input, PCL (PC0-PC3) as input and PCU
(PC4-PC7) as output.

i) Find the addresses of the different ports of the 8255.
ii) Find the control word (byte) to be used.
iii) Show how to program the 8255 chip (using 8086 assembly) so that
it gets data from PB and sends it to PA, and gets data from PCL and
sends it to PCU.

Fig. 8-7. Connecting the 8255 with the microprocessor address, data, and read/write
control lines (D0-D7 to the data bus, A0-A1 to the 8255 A0-A1, CS decoded from
A2-A7, RD and WR to IOR and IOW; PA is the output port, PB the input port,
PCU output and PCL input).

Solution:
i) The first 2 address lines (A0, A1) are connected to (A0, A1) of the 8255
chip, and the Chip select (CS) is gated from (A2 through A7) as shown in
figure (8-7) so that CS = 011111

ii) When the address lines A0 through A7 are gated as shown in figure, the
port addresses are as follows:


PORT               A7 A6 A5 A4 A3 A2 (CS)   A1 A0   Address
A                  0  1  1  1  1  1         0  0    7CH
B                  0  1  1  1  1  1         0  1    7DH
C                  0  1  1  1  1  1         1  0    7EH
Control register   0  1  1  1  1  1         1  1    7FH

ii) The control word is then as follows:

D7    D6 D5       D4          D3           D2          D1         D0
1     0  0        0           0            0           1          1
I/O   PA Mode 0   PA Output   PCU Output   PB Mode 0   PB Input   PCL Input

So, the control word is 10000011 or 83H

iii) The assembly program:

TITLE Programming the 8255 PPI in I/O Mode

        MOV AL,83H
        OUT 7FH,AL      ; Fill control register with control word
        IN  AL,7DH      ; Input data from port PB
        OUT 7CH,AL      ; Output data to port PA
        IN  AL,7EH      ; Read port PC (the 4 input bits are in PCL)
        AND AL,0FH      ; Mask upper bits of AL (make them 0)
        MOV CL,4        ; Load counter with 4
        ROL AL,CL       ; Rotate left 4 times (move the nibble to bits 4-7)
        OUT 7EH,AL      ; Send the 4 bits to PCU
        END

Example 8-3. Keypad Scanner. (Basic I/O Mode).


Show how to configure the 8255 ports to implement a 16-key keypad
scanner interface. Let PA work as input (4 rows) and PB work as output (4
columns). Let also PC work as output such that you can connect a 7-
segment display, to display the pressed key from the keypad.

Solution
i) The first 2 address lines (A0, A1) are connected to (A0, A1) of the 8255,
and the Chip select (CS) is gated from (A2 through A7) as shown in figure
8-8, so that CS = 011011
ii) When the address lines A0 through A7 are gated as shown in figure, the
port addresses are as follows:

Fig. 8-8(a). Implementation of a 16-key keypad interface, using the 8255 PIO chip
(rows Row0-Row3 on PA0-PA3, columns Col0-Col3 on PB0-PB3, display segments a-g
on PC0-PC6; the drawing also shows 10k resistors to VCC and 270 ohm resistors).

PORT               A7 A6 A5 A4 A3 A2 (CS)   A1 A0   Address
A                  0  1  1  0  1  1         0  0    6CH
B                  0  1  1  0  1  1         0  1    6DH
C                  0  1  1  0  1  1         1  0    6EH
Control register   0  1  1  0  1  1         1  1    6FH

ii) The control word is then as follows:

D7    D6 D5       D4         D3           D2          D1          D0
1     0  0        1          0            0           0           0
I/O   PA Mode 0   PA Input   PCU Output   PB Mode 0   PB Output   PCL Output

Therefore, the control word is 10010000 or 90H


iii) The assembly program is shown below. We give here the details of the
KEY procedure, which scans the keypad and returns the value of the pressed
key in AL. The flowchart of the KEY procedure is shown in figure 8-8(b).
The details of the 7SEG procedure, which reads the BCD value in AL and
translates it to 7-segment code, are shown in Example 4-4 (LODS BCD_STR
& STOS 7SEG_STR).

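Fig. 8-8(b). Flowchart of the KEY procedure. After Start, the procedure first
waits for key release (scan keys, delay to debounce, scan keys again, repeat
while a key is closed), then waits for a keystroke (scan keys, delay to
debounce, scan keys again, repeat while all keys are open), and finally
determines the key code and returns (RET).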

TITLE The 8255 PPI as a Keypad Interface (I/O Mode)

PA      EQU 6CH                 ; Port PA address (rows, input)
PB      EQU 6DH                 ; Port PB address (columns, output)
PC      EQU 6EH                 ; Port PC address (7-segment display)
CR      EQU 6FH                 ; Control register address
ROWS    EQU 4
COLS    EQU 4
;----------------------------------- Main Program -----------------------------------------------
        MOV AL,90H              ; Initialize the 8255 (PA input, PB output)
        OUT CR,AL               ; Fill control register with control word
        CALL KEY                ; Scan keypad, return pressed key in AL
        CALL 7SEG               ; Translate BCD value in AL to 7-segment code
        NOT AL                  ; Invert the bits for a common-anode 7-segment display
        OUT PC,AL               ; Display data
        HLT                     ; Stop here, so execution does not run into KEY
;----------------------------------- KEY Subroutine ---------------------------------------------
KEY     PROC NEAR
        CALL SCAN               ; Test all keys
        JNZ KEY                 ; If a key is still closed, wait for its release
        CALL DELAY              ; Wait for 10 ms (de-bounce)
        CALL SCAN               ; Test all keys again
        JNZ KEY                 ; If a key is still closed, keep waiting
KEY1:   CALL SCAN               ; Scan all keys
        JZ KEY1                 ; If no key is closed, wait for a keystroke
        CALL DELAY              ; Wait 10 ms (de-bounce)
        CALL SCAN               ; Scan all keys again
        JZ KEY1                 ; If no key is closed, keep waiting
        PUSH AX                 ; Save row code
        MOV AL,COLS             ; Compute the code of the first key in this column
        SUB AL,CL               ; AL = column number (CL = columns left in SCAN)
        MOV CH,ROWS
        MUL CH                  ; AL = column number x ROWS
        MOV CL,AL
        DEC CL
        POP AX                  ; Restore row code
KEY2:   ROR AL,1                ; Find the row position (first 0 bit)
        INC CL                  ; INC leaves the carry flag unchanged
        JC KEY2
        MOV AL,CL               ; AL = key code
        RET
KEY     ENDP
;-------------------------------------- SCAN Subroutine ----------------------------------------
SCAN    PROC NEAR
        MOV CL,ROWS             ; Form the row mask in BH
        MOV BH,0FFH
        SHL BH,CL
        MOV CX,COLS             ; Load column count
        MOV BL,0FEH             ; Column selection code (one column low)
SCAN1:  MOV AL,BL               ; Select a column
        OUT PB,AL
        ROL BL,1                ; Prepare the next column code
        IN AL,PA                ; Read rows
        OR AL,BH                ; Ignore the unused row bits
        CMP AL,0FFH             ; Any key closed in this column?
        JNZ SCAN2               ; Yes: return with ZF = 0
        LOOP SCAN1              ; No: try the next column
SCAN2:  RET                     ; ZF = 1 if no key is closed
SCAN    ENDP
;----------------------------------- DELAY Subroutine ------------------------------------------
DELAY   PROC NEAR
        MOV CX,5000             ; For 8086/8088 running at 8 MHz
DELAY1: LOOP DELAY1
        RET
DELAY   ENDP
        END
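
To see how a column scan followed by a row search leads to the key code, the following Python sketch imitates the register operations of SCAN and KEY on a simulated keypad. It is an illustration only: the function names and the (row, column) representation of a pressed key are assumptions, and a pressed key is assumed to pull its row line low when its column is driven low.

```python
ROWS, COLS = 4, 4


def read_rows(column_select, pressed):
    """Value read from PA while PB outputs column_select.

    pressed is a set of (row, col) pairs. A row bit reads 0 only if a
    pressed key connects it to a column that is currently driven low.
    """
    value = 0xFF
    for row, col in pressed:
        if not (column_select >> col) & 1:
            value &= ~(1 << row) & 0xFF
    return value


def scan(pressed):
    """Return (row_code, columns_left); row_code == 0xFF means no key closed."""
    row_mask = (0xFF << ROWS) & 0xFF          # plays the role of BH
    select = 0xFE                             # one 0 bit walks across the columns
    columns_left = COLS                       # plays the role of CX
    while columns_left:
        rows = read_rows(select, pressed) | row_mask
        select = ((select << 1) | (select >> 7)) & 0xFF   # rotate left by 1
        if rows != 0xFF:
            return rows, columns_left
        columns_left -= 1
    return 0xFF, 0


def key_code(pressed):
    """Key code = column * ROWS + row, or None if no key is closed."""
    rows, columns_left = scan(pressed)
    if rows == 0xFF:
        return None
    code = (COLS - columns_left) * ROWS - 1
    carry = 1
    while carry:                              # rotate right until a 0 bit falls out
        carry = rows & 1
        rows = (rows >> 1) | (carry << 7)
        code += 1
    return code


if __name__ == "__main__":
    print(key_code({(2, 1)}))   # key at row 2, column 1
```

In this model the key at row r and column c receives the code 4c + r, so the sixteen keys are numbered 0 to 15 column by column.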

Note that the delay is obtained by looping N times. Each LOOP instruction
takes about 17 clocks on the 8086/8088, where the clock period is
T = 1/(8 MHz) = 125 ns. So, we have 17 x N x T = 10 ms, or N = 4,700
approximately, which is rounded up to 5,000 in the DELAY procedure.
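
The loop count follows directly from the relation between the delay, the clock frequency and the clocks per iteration. A minimal Python sketch, assuming about 17 clocks per LOOP iteration as stated in the text (loop_count is a hypothetical helper name):

```python
def loop_count(delay_s, clock_hz, clocks_per_loop=17):
    """Number of LOOP iterations needed for a software delay.

    delay_s          required delay in seconds
    clock_hz         processor clock frequency in Hz
    clocks_per_loop  clocks taken by one LOOP iteration (about 17 on 8086/8088)
    """
    return round(delay_s * clock_hz / clocks_per_loop)


if __name__ == "__main__":
    # 10 ms de-bounce delay on an 8 MHz processor
    print(loop_count(10e-3, 8e6))
```

The exact result for 10 ms at 8 MHz is about 4,700 iterations; the program rounds this up to 5,000, which only makes the de-bounce delay slightly longer.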


Example 8-4. Square Wave Generator (BSR Mode).


Show how to program the 8255 PPI circuit shown in figure 8-7 in Bit
Set/Reset (BSR) mode, so that it can generate a square wave of 98 Hz and
50% duty cycle on bit 1 of Port C (PC1). Consider that the 8255 is
connected to an 8086 MPU operating at 10 MHz.
Solution: In the BSR mode, all bits of PC can be accessed individually by
setting the control register as follows:

D7        D6 D5 D4    D3 D2 D1 (bit select)    D0 (S/R)
0 = BSR   x  x  x     0  0  0 = PC0            1 = Set
                      0  0  1 = PC1            0 = Reset
                      0  1  0 = PC2
                      :  :  :
                      1  1  1 = PC7

In the first half cycle (Ts1) bit PC1 is set (high). So the control word in the
first half cycle is 00000011 or 03H. In the second half cycle (Ts2) bit PC1
is reset (low). So the control word in the second half cycle is 00000010 or
02H.
iii) The assembly program:
TITLE THE 8255 PPI as a Square Wave Generator (BSR Mode)
AGAIN:  MOV AL,03H
        OUT 7FH,AL      ; Set bit PC1 (BSR control word 03H)
        CALL DELAY      ; Keep PC1 high for a delay time = Ts/2
        MOV AL,02H
        OUT 7FH,AL      ; Reset bit PC1 (BSR control word 02H)
        CALL DELAY      ; Keep PC1 low for a delay time = Ts/2
        JMP AGAIN       ; Repeat, to generate a continuous square wave
        END

Assume the resultant square waveform has a period Ts = 1/98 = 10.2 ms.
For 50% duty cycle, the high and low durations of the square wave (Ts1,
Ts2) should be equal, as shown in figure 8-9. So, we have Ts1 = Ts2 = Ts/2
= 5.1 ms. This delay is obtained by looping N times. Each LOOP instruction
takes about 17 clocks on the 8086/8088, where the clock period is
T = 1/(10 MHz) = 100 ns. So, we have 17 x T x N = 5.1 ms, or N = 3,000.
So, the DELAY procedure may be written as follows:

DELAY   PROC NEAR
        MOV CX,3000
G7:     LOOP G7         ; Loop 3,000 times
        RET
DELAY   ENDP


Fig. 8-9. Schematic of the square wave, which is generated by the 8255 in BSR Mode
(output on PC1: high during Ts1, low during Ts2, with period Ts = Ts1 + Ts2).
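
The two BSR control words of this example can be derived from the bit layout given in the solution. The Python sketch below is a small illustration (bsr_word is a hypothetical helper name); it places the bit number in D3-D1 and the set/reset flag in D0, with D7 = 0 and the don't-care bits taken as 0.

```python
def bsr_word(bit, set_bit):
    """Build an 8255 bit set/reset (BSR) control word for port C.

    bit      port C bit number, 0 to 7 (goes to D3 D2 D1)
    set_bit  True to set the bit, False to reset it (goes to D0)
    """
    if not 0 <= bit <= 7:
        raise ValueError("port C bit number must be between 0 and 7")
    return (bit << 1) | int(bool(set_bit))   # D7 = 0 selects the BSR mode


if __name__ == "__main__":
    print(f"set PC1:   {bsr_word(1, True):02X}H")
    print(f"reset PC1: {bsr_word(1, False):02X}H")
```

For PC1 this yields 03H (set) and 02H (reset), the two values alternated by the square wave program.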

Example 8-5. Input from an ADC (Basic I/O, BSR Mode).


Show how to implement an interface between an 8-bit ADC (like the
AD570) and a microprocessor via the 8255 PPI.

Solution: The interface circuit is shown in figure 8-10. As shown, PA will
read the digital output (Do0-Do7) from the ADC (AD570). PC0 is used to
send a "Start of Conversion" pulse (BC) for at least 2 µs. At the end of
conversion, PC7 reads the "Data Ready" signal (DR) from the ADC.
The first control word (in the BSR mode) is as follows:

D7    D6 D5 D4    D3 D2 D1          D0
BSR   x  x  x     Bit select        S/R
0     0  0  0     0  0  0 = PC0     1 = Set, 0 = Reset

Therefore, the first control word is CW1 = 01H to set PC0, or 00H to reset
PC0. The second control word (in the I/O mode) is as follows:

D7    D6 D5       D4         D3          D2          D1          D0
1     0  0        1          1           0           0           0
I/O   PA Mode 0   PA Input   PCU Input   PB Mode 0   PB Output   PCL Output

So, the control word is CW2 = 10011000 or 98H


Fig. 8-10. Implementation of an ADC interface, using the 8255 PIO chip (the 8255 is
connected to the 8086 data bus D0-D7, A0-A1, IOR and IOW, with CS decoded from
A2-A7; PA0-PA7 read Do0-Do7 of the AD570, PC0 drives BC (pin 11) and PC7 reads
DR (pin 17); the analog input Vi is applied to the AD570).

The assembly program "ADC" that sends a start of conversion pulse and
reads digital data from the ADC at the end of conversion is as follows:
TITLE The 8255 PPI as ADC-Microprocessor Interface (BSR & I/O Modes)
ADC     PROC NEAR
        MOV AL,98H      ; Mode control word CW2 (issued first, so that PC0
        OUT 7FH,AL      ; is an output): PA input, PCU input, PCL output
        MOV AL,01H      ; BSR control word CW1 to set PC0
        OUT 7FH,AL      ; Fill control register, now PC0 = 1
        CALL DELAY
        MOV AL,00H      ; BSR control word to reset PC0
        OUT 7FH,AL      ; Now PC0 = 0. Start conversion
READ:   IN  AL,7EH      ; Read port PC
        ROL AL,1        ; Place PC7 in the carry flag (CF = PC7)
        JC  READ        ; If PC7 = 1 then rewind (wait) until end of conversion
        IN  AL,7CH      ; If PC7 = 0 then read ADC digital output into AL
        RET
ADC     ENDP
DELAY   PROC NEAR
        MOV CX,12
G7:     LOOP G7
        RET
DELAY   ENDP
        END

Note that, since the analog voltage needs to be constant during A/D
conversion, you need a sample & hold circuit before the ADC. The
DELAY routine is based on 17 x T x N = 20 µs (approximately, with N = 12)
for a 10 MHz processor.

Example 8-6. Stepper Motor Control. (Basic I/O Mode).


The stepper motor is useful in many applications like disk drives, printers,
plotters and robotics. Show how to interface a stepper motor to a PC
equipped with an 8086/8088 microprocessor, via the 8255 chip. Write an
assembly program that keeps the motor running continuously until a key is
pressed.

Fig. 8-11(a) Schematic of the stepper motor driver circuit (the 8255 is connected to
the microprocessor data bus D0-D7, A0-A1, IOR and IOW, with CS decoded from
the upper address lines; port PA feeds the stepper motor through a driver).

Solution:
The four leads of the stepper motor windings (A, B, Ā and B̄) can be
controlled by four bits of any port of the 8255. Consider PA0-PA3 to
control the stepper motor, as shown in figure 8-11(a). In order to rotate in
the clockwise direction, we have to feed the 4 motor coils with the step
positions (33H, 66H, 0CCH, 99H) from PA. These patterns allow the ROL and
ROR instructions to rotate the position bit pattern to the next step in
the forward or reverse direction. So, if POS=33H, then the instruction
ROL POS,1 will result in POS=66H. Similarly, if POS=66H, then ROL POS,1
will result in POS=0CCH, and so on.
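
The effect of the rotate instructions on the step pattern is easy to verify. The following Python sketch models an 8-bit rotate left and rotate right (rol8 and ror8 are hypothetical helper names standing in for ROL and ROR on a byte register):

```python
def rol8(value, count=1):
    """Rotate an 8-bit value left by count positions (like ROL on a byte)."""
    count %= 8
    return ((value << count) | (value >> (8 - count))) & 0xFF


def ror8(value, count=1):
    """Rotate an 8-bit value right by count positions (like ROR on a byte)."""
    count %= 8
    return ((value >> count) | (value << (8 - count))) & 0xFF


if __name__ == "__main__":
    pos = 0x33
    for _ in range(4):
        print(f"{pos:02X}H")     # one full cycle of four steps
        pos = rol8(pos)
```

Because the pattern 33H repeats every four bits, the lower nibble seen by PA0-PA3 cycles through four states and the sequence returns to 33H after four rotations, in either direction.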

We may also use a driver between the output bits (PA0-PA3) and the
stepper motor coils. The driver may be 4 inverter gates, with protection
diodes (to bypass the back EMF of motor coils). One can also use the
ULN2003 chip, which has 7 inverters with protection diodes, as shown in
figure (8-11b). The motor common wires should be connected to +VCC.

Fig. 8-11(b) Schematic of the stepper motor driver circuit and the layout of ULN2003.

TITLE THE 8255 PPI as a Motor Driver (I/O Mode)

PA      EQU 300H        ; Port PA address
CW      EQU 303H        ; Control register address
POS     EQU 33H         ; Step sequence (33H or 66H or 0CCH or 99H)
COUNT   EQU 0FFFFH      ; Count of delay loops

        MOV AL,80H      ; Control word for I/O (PA output)
        MOV DX,CW
        OUT DX,AL       ; Issue control word
        MOV BL,POS      ; First step sequence
ENCORE: MOV AH,01       ; Check if a key is pressed
        INT 16H         ; Call interrupt 16H, to check keyboard
        JNZ FIN         ; Finish if any key is pressed,
        MOV AL,BL       ; otherwise prepare a step sequence for the motor
        MOV DX,PA
        OUT DX,AL       ; Send the step sequence to the motor
        MOV CX,COUNT    ; Wait for a certain delay time
ICI:    LOOP ICI        ; The delay is controlled via COUNT
        ROR BL,1        ; Next step (33H, 99H, 0CCH, 66H); use ROL to reverse
        JMP ENCORE      ; Repeat the above process
FIN:    MOV AH,4CH      ; Return to DOS
        INT 21H


Note about Stepper Motors


A stepper motor is an electromagnetic device that converts electrical pulses
into mechanical rotational steps. The stepper motor rotation angle is
proportional to the number of input pulses. One of the most significant
advantages of stepper motors is their ability to be accurately controlled
in open loop. Unlike servo motors, a stepper motor doesn't need expensive
feedback and position sensing devices, such as optical encoders.
There exist 3 basic types of stepper motors, namely: variable reluctance
(VR) stepper motors, permanent magnet (PM) motors, and hybrid motors.
Permanent magnet motors may be unipolar or bipolar.

Fig. 8-11(c). Cross section of a permanent magnet bipolar stepper motor.

The following table indicates the most common drive modes of a stepper
motor (Ā and B̄ denote the complementary windings):
1- Wave drive (1 phase is ON at a time: A, B, Ā, B̄)
2- Full-step drive (2 phases ON at a time: AB, BĀ, ĀB̄, B̄A)
3- Half-step drive (1 and 2 phases ON alternately: A, AB, B, BĀ, Ā, ĀB̄, B̄, B̄A)

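Table 8-2. Stepping modes of a stepper motor (1 = phase ON, 0 = phase OFF)

         Wave drive    Full-step drive    Half-step drive
Phase    1 2 3 4       1 2 3 4            1 2 3 4 5 6 7 8
A        1 0 0 0       1 0 0 1            1 1 0 0 0 0 0 1
B        0 1 0 0       1 1 0 0            0 1 1 1 0 0 0 0
Ā        0 0 1 0       0 1 1 0            0 0 0 1 1 1 0 0
B̄        0 0 0 1       0 0 1 1            0 0 0 0 0 1 1 1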
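
The three drive modes are closely related: the half-step sequence is obtained by interleaving the wave-drive and full-step sequences. The Python sketch below generates the three sequences; it is an illustration only, the function names are assumptions, and the strings A' and B' stand for the complementary windings.

```python
PHASES = ("A", "B", "A'", "B'")   # A' and B' are the complementary windings


def wave_drive():
    """One phase ON at a time."""
    return [(phase,) for phase in PHASES]


def full_step_drive():
    """Two adjacent phases ON at a time."""
    return [(PHASES[i], PHASES[(i + 1) % 4]) for i in range(4)]


def half_step_drive():
    """Alternate between one phase ON and two phases ON (8 steps)."""
    steps = []
    for single, double in zip(wave_drive(), full_step_drive()):
        steps += [single, double]
    return steps


if __name__ == "__main__":
    for number, step in enumerate(half_step_drive(), start=1):
        print(number, "+".join(step))
```

Each phase is ON once per cycle in wave drive, twice in full-step drive and three times in half-step drive.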


8-7. Input /Output with Handshaking Capabilities.


There exist two well-known ways for controlling asynchronous data transfer
to and from smart I/O devices:

1- Strobing
2- Handshaking

Assume we have two communicating devices: a source device which will
issue data and a destination device which will receive this data. For
instance, the source device may be the microprocessor or a PPI chip (like
the 8255), and the destination device may be one of the I/O devices which
are attached to the microprocessor system. In order to synchronize data
transfer between data source and destination, we use either strobing or
handshaking techniques.

Strobing may be initiated either by the source or the destination. In
source-initiated strobing, the source device issues a STROBE signal
indicating that it is ready to send data. Then the destination device will
expect data on the data bus, which connects source with destination, as
shown in figure 8-12(a).

In destination-initiated strobing, the destination device issues a STROBE
signal indicating that it is ready to receive data. Then the source device will
issue data on the data bus, which connects source with destination.

Handshaking is a powerful method for controlling asynchronous data
transfer. In handshaking, the source device issues a request control
signal, to tell the destination that data is available, and the destination
replies with an acknowledge control signal. So, handshaking involves
two-way control from both source and destination devices, as shown in
figure 8-12(b,c,d).

Fig. 8-12(a) Schematic representation of data transfer using strobing mechanism
(a data bus and a strobe line connect the source to the destination).


Fig. 8-12(b) Schematic representation of data transfer with handshaking
(request and data lines from source to destination, reply line from destination
to source).

Fig. 8-12(c) depicts the handshaking signals between a PPI (source) and an
I/O device (destination), for a data output job.

Fig. 8-12(c). Data transfer from CPU (via PPI) to Output device with handshaking

The sequence of output data transfer with handshaking is as follows:

1- The CPU executes OUT and sends data from AL to the PPI
2- The PPI latches the data
3- The PPI sends the data to the I/O device
4- The PPI sets OBF to TRUE to tell the I/O device that the data is available and valid
5- The I/O device reads the data when OBF changes from FALSE to TRUE
6- The I/O device sets ACK to TRUE to tell the PPI that it has received the data
7- The PPI raises an interrupt by setting INTR to TRUE

Similarly, figure 8-12(d) depicts the handshaking signals between an I/O
device (source) and a PPI (destination), for a data input job. The sequence
of input data transfer with handshaking is as follows:

1- The I/O device sends data to the PPI
2- The I/O device asserts STB (an active-low signal, so it is driven to logic 0) to tell the PPI to latch the data
3- The PPI latches the data and sets IBF to TRUE
4- The PPI interrupts the CPU to inform it that it has data
5- The CPU reads the data, which sets INTR and IBF to FALSE
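
The input sequence above can be expressed as a tiny state model. The Python sketch below is only a logical illustration of the listed steps, not a timing-accurate model of the 8255: the class and method names are assumptions, and IBF, INTR and INTE are treated as plain TRUE/FALSE flags.

```python
class StrobedInputPort:
    """Logical model of an input port with handshaking (IBF, INTR flags)."""

    def __init__(self, inte=True):
        self.latch = None      # data held by the PPI
        self.ibf = False       # Input Buffer Full flag
        self.intr = False      # interrupt request to the CPU
        self.inte = inte       # interrupt enable

    def strobe(self, data):
        """Steps 1-4: the device presents data and strobes it into the PPI."""
        if self.ibf:
            raise RuntimeError("previous byte has not been read by the CPU")
        self.latch = data & 0xFF
        self.ibf = True
        self.intr = self.inte  # interrupt only if it is enabled

    def cpu_read(self):
        """Step 5: the CPU reads the data, which clears IBF and INTR."""
        data = self.latch
        self.ibf = False
        self.intr = False
        return data


if __name__ == "__main__":
    port = StrobedInputPort()
    port.strobe(0x41)
    print(port.ibf, port.intr)      # flags after the device has strobed data
    print(hex(port.cpu_read()))     # CPU reads the byte
    print(port.ibf, port.intr)      # flags after the CPU read
```

The model makes the role of IBF visible: the device must not strobe a new byte while IBF is still TRUE, otherwise the previous byte would be overwritten before the CPU has read it.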

Fig. 8-12(d). Data transfer from an Input device to CPU (via PPI) with handshaking.

8-7.1. Input /Output with Handshaking Using the 8255 .


The 8255 PPI chip can handle programmable I/O operation with
handshaking capabilities. In mode 1, both ports PA and PB can be
programmed as input or output, while port PC handles the handshaking
signals, as shown in figure 8-13. In this case, the control word of the
8255 is as follows:

D7    D6 D5       D4          D3                D2          D1          D0
1     0  1        1/0         1/0               1           1/0         x
I/O   PA Mode 1   Port A      PC4,5 / PC6,7     PB Mode 1   Port B
                  In/Out      In/Out                        In/Out

When the control register is loaded with a control word which indicates
that the 8255 is used in a handshaking mode (Mode 1), port PC is
furnished with a status word, which can be used by the I/O devices and the
microprocessor for servicing the asynchronous data transfer process. Note
that the free bits of PC are PC4, PC5 when PA is output, or PC6, PC7 when
PA is input. These free bits can be used as I/O bits.

The status word in output mode, with handshaking, is given by:

PC7    PC6      PC5   PC4   PC3      PC2      PC1    PC0
OBFA   INTE.A   I/O   I/O   INTR.A   INTE.B   OBFB   INTR.B

where OBFA is Output Buffer Full (of PA) and INTE.A is Interrupt Enable
for port PA. Also, INTR.A (Interrupt Request for port PA) occupies bit
PC3 and INTR.B (Interrupt Request for port PB) occupies bit PC0. The
two bits PC4 and PC5 are free and can be used for input/output. Figure
8-14(a) depicts the timing diagram of the 8255 in mode 1, for strobed input.
Similarly, the status word in input mode, with handshaking, is given by:

PC7    PC6    PC5    PC4      PC3      PC2      PC1    PC0
I/O    I/O    IBFA   INTE.A   INTR.A   INTE.B   IBFB   INTR.B


Here IBFA signal means Input Buffer Full (of PA) and occupies bit PC5.
Also, INTE.A occupies bit PC4 and the two bits PC6 and PC7 are free and
can be used for input/output.
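
A status byte read from port PC can be split into the named flags of the two status words given above. The Python sketch below is a hedged illustration (decode_status and the two field tuples are hypothetical names); for simplicity it assumes that PA and PB are programmed in the same direction, so that one of the two layouts applies to the whole byte.

```python
# Field names from PC7 down to PC0, as listed in the status words
OUTPUT_FIELDS = ("OBFA", "INTE.A", "I/O", "I/O", "INTR.A", "INTE.B", "OBFB", "INTR.B")
INPUT_FIELDS = ("I/O", "I/O", "IBFA", "INTE.A", "INTR.A", "INTE.B", "IBFB", "INTR.B")


def decode_status(status, fields):
    """Return {flag name: bit value} for a mode 1 status byte.

    The free bits (marked "I/O") are skipped, since they are not status flags.
    """
    flags = {}
    for position, name in zip(range(7, -1, -1), fields):
        if name != "I/O":
            flags[name] = (status >> position) & 1
    return flags


if __name__ == "__main__":
    # Example status byte with PC5 and PC3 high, decoded as an input status word
    print(decode_status(0x28, INPUT_FIELDS))
```

A polling routine only needs one of these flags at a time, which is why the assembly programs in this section mask the status byte with a single-bit constant such as 08H (PC3) before testing it.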

(a) PA input, with handshaking:   PC4 = STBA (INTE.A), PC5 = IBFA, PC3 = INTR.A; free bits PC6, PC7
(b) PA output, with handshaking:  PC6 = ACKA (INTE.A), PC7 = OBFA, PC3 = INTR.A; free bits PC4, PC5
(c) PB input, with handshaking:   PC2 = STBB (INTE.B), PC1 = IBFB, PC0 = INTR.B
(d) PB output, with handshaking:  PC2 = ACKB (INTE.B), PC1 = OBFB, PC0 = INTR.B

Fig. 8-13. Handshaking signals of the 8255 PIO chip for data input/output, to/from
PA and PB. The 2 free bits are determined according to whether PA is input or output

Figure 8-14(b) depicts the timing diagram of the 8255 in mode 1, for
strobed output. The handshaking signals for port PA in either of the above
two cases (input with handshaking or output with handshaking) are
delivered by 3 bits of port PC. Another 3 bits (PC0, PC1, PC2) provide the
handshaking signals of port PB, and the remaining two bits of PC are free.
So, in I/O with handshaking mode, one can use PA as input and PB as
output or vice versa, or both as inputs or both as outputs.

The following example depicts how the 8255 chip can be used for issuing
data from port PA (output data) to a line printer, with handshaking. So, the
PC7 and PC6 will hold the strobing (OBFA) and acknowledge (ACKA)
signals, for port PA, respectively.

Fig. 8-14. Timing diagram of the 8255 PIO chip in mode 1,
(a) strobed input, (b) strobed output.

Example 8-7: I/O with Handshaking.


Show how to program the 8255 PPI circuit shown in figure 8-15 for I/O
with handshaking capabilities (mode 1). Consider the 8255 is connected to
a line printer (LPT1), as shown in the figure. Write a program that prints
the message “Hello there” #. The # sign indicates the end of the message.

Solution: Here port PA is used as an output port with handshaking
capabilities. So the control word is as follows:

D7    D6 D5       D4         D3        D2          D1         D0
1     0  1        0          0         0           0          x
I/O   PA Mode 1   Port A     PC4,5     PB Mode 0   Port B
                  Output     Out                   Output


So, the first control word (CW1) for I/O mode is 10100000 (A0H). The
assembly program is as follows:

Fig. 8-15. Data output from the 8255 PPI (port PA) to a line Printer with
Handshaking signals (PA to the printer data lines D0-D7, PC7 = OBFA to the
printer Strobe input, PC6 = ACKA from the printer ACK output, PC3 = INTR.A).

TITLE Programming the 8255 PPI (I/O Mode, & Handshake)

PA      EQU 300H
PB      EQU 301H
PC      EQU 302H
CTRL    EQU 303H        ; Control register address
CR      EQU 0DH         ; Carriage return character
LF      EQU 0AH         ; Line feed character
CW1     EQU 0A0H
CW2     EQU 0DH         ; BSR word that sets PC6 (INTE.A)
MSG     DB "Hello There", CR, LF, "#"
        MOV DX,CTRL     ; Port addresses above 0FFH are passed in DX
        MOV AL,CW1      ; Control word for I/O with handshaking
        OUT DX,AL       ; Issue 1st control word
        MOV AL,CW2      ; Control word for BSR mode
        OUT DX,AL       ; Issue 2nd control word to set INTE.A
        MOV SI,OFFSET MSG
ENC:    MOV AH,[SI]     ; Get a character
        CMP AH,'#'      ; Is this the end character?
        JZ  FIN         ; If YES, exit
        MOV DX,PC
UP:     IN  AL,DX       ; else load status word from PC into AL
        AND AL,08H      ; Check if INTR.A signal (PC3) is high
        JZ  UP          ; If NO, rewind and keep checking (polling)
        MOV AL,AH       ; else move the character to AL,
        MOV DX,PA
        OUT DX,AL       ; and send it to LPT1, via PA
        INC SI          ; Point at the next character
        JMP ENC         ; Repeat steps for next character
FIN:                    ; Return to operating system
        END

8-7.2. Bidirectional Input/Output with Handshaking Using the 8255


In mode 2 of the 8255, PA is programmed as bidirectional port to exchange
data with other bidirectional I/O devices or microcontrollers or even
another computer.

For bidirectional operation of PA, we need 5 handshaking signals from PC
as shown in figure 8-16. In this case, PB may be programmed in either
mode 0 or mode 1.

Control words for PA in mode 2 (D7 D6 D5 D4 D3 D2 D1 D0):

PA in mode 2, with PB in mode 0 (input):    1 1 x x x 0 1 1/0
PA in mode 2, with PB in mode 1 (output):   1 1 x x x 1 0 x

Here D0 is 1/0 to assign the free bits (PC0-PC2) as I/O. In both cases the
handshaking signals of PA are INTR.A (PC3), STBA (PC4), IBFA (PC5), ACKA
(PC6) and OBFA (PC7). With PB in mode 1 (output), PC0, PC1 and PC2 carry
INTR.B, OBFB and ACKB.

Fig. 8-16. Bidirectional I/O operation (Mode 2) of PA with handshaking.

The status word in bidirectional mode, with handshaking, takes the
following form:

PC7    PC6       PC5    PC4       PC3      PC2   PC1   PC0
OBFA   INTE.A1   IBFA   INTE.A2   INTR.A   x     x     x

Note that there exist two interrupt enable signals here, INTE.A1
(associated with OBFA) and INTE.A2 (associated with IBFA). These
interrupt enable signals are gated inside the 8255 with OBFA and IBFA to
generate the interrupt request signal INTR.A as follows:

INTR.A = (INTE.A1 AND OBFA) OR (INTE.A2 AND IBFA)
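
The gating expression above can be evaluated on a status byte. A minimal Python sketch, following the bit positions of the bidirectional status word (intr_a is a hypothetical helper name, and the flags are treated as logical values exactly as in the expression):

```python
def intr_a(status):
    """Evaluate INTR.A = (INTE.A1 AND OBFA) OR (INTE.A2 AND IBFA).

    status is the mode 2 status byte: OBFA = PC7, INTE.A1 = PC6,
    IBFA = PC5, INTE.A2 = PC4.
    """
    obfa = (status >> 7) & 1
    inte_a1 = (status >> 6) & 1
    ibfa = (status >> 5) & 1
    inte_a2 = (status >> 4) & 1
    return (inte_a1 & obfa) | (inte_a2 & ibfa)


if __name__ == "__main__":
    for status in (0x80, 0xC0, 0x20, 0x30):
        print(f"{status:02X}H -> INTR.A = {intr_a(status)}")
```

The examples show that a buffer flag alone does not raise the request; the matching interrupt enable bit must be set as well.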


Example 8-8: Bidirectional I/O with Handshaking.


Show how to program the 8255 PPI such that data arriving at PA is read
into AL (receive) and data held in AH is sent out through PA (transmit),
using the same bidirectional port.

Solution:
The following program depicts the bidirectional operation of PA. We make
use of the masking bytes MB5 and MB7 to check IBFA (PC5) and OBFA
(PC7) of the status word.

TITLE Programming the 8255 (Bidirectional I/O Mode & Handshake)

PA      EQU 60H
PB      EQU 61H
PC      EQU 62H
CW      EQU 63H         ; Control register address
MB7     EQU 80H         ; Mask for OBFA (PC7)
MB5     EQU 20H         ; Mask for IBFA (PC5)
READ    PROC NEAR
        IN  AL,PC       ; Get IBFA (PC5) of the status word
        TEST AL,MB5     ; Test IBFA
        JZ  READ        ; Rewind if IBFA=0 (no data received yet)
        IN  AL,PA       ; else, read data from PA into AL (receive)
        RET
READ    ENDP
WRITE   PROC NEAR
        IN  AL,PC       ; Get OBFA (PC7) of the status word
        TEST AL,MB7     ; Test OBFA
        JZ  WRITE       ; Rewind if OBFA=0
        MOV AL,AH       ; else, move the data from AH to AL
        OUT PA,AL       ; and send it to PA (transmit)
        RET
WRITE   ENDP
        END


8-7.3. CPU Services for Input /Output Control


The CPU usually supports two types of input/output control services,
namely:

1- Polling Control
2- Interrupt Control

As we have seen in Example 8-7, the CPU was checking regularly
for the presence of the INTR.A signal (interrupt request for PA). This
process is called polling.

Polling service is simple, but it keeps the CPU busy checking the device
status. Interrupt service is more efficient than polling, but needs
somewhat more complex software handling and additional hardware. Handling
interrupts and their service routines was previously presented in
chapter 1 and chapter 4.

When the 8255 PPI chip is programmed to operate in mode 1 or mode 2,
control signals are provided to be used as interrupt request inputs to the
CPU. The interrupt request signals are generated from port PC and can be
enabled or disabled by setting or resetting the corresponding INTE
flip-flops (PC2 for port PB, and PC6 or PC4 for port PA in output or input
mode, respectively), using the BSR mode. For instance, if the PPI is in
mode 1 and PA is set for data output with handshaking, the INTR.A signal
is used to interrupt the CPU when an output device (e.g., LPT) has
accepted the data transmitted by the CPU. In this case INTR.A is set high
when ACKA is one, OBFA is one and INTE.A is one. This allows an output
device to request service from the CPU by strobing data from the PPI.
INTR.A is reset at the falling edge of WR of the 8255 chip.


8-8. I/O-Memory Interface & Direct Memory Access (DMA)


Direct memory access (DMA) allows computer peripherals to
communicate directly with the computer RAM, bypassing the processor.
In other words, DMA is a method of moving data between I/O or mass
storage devices and memory without intervention from the CPU. With DMA,
peripherals work faster and use less processor time. In modern computers,
DMA access is negotiated by the BIOS during startup, or by the operating
system. The PC DMA subsystem is based on the Intel 8237 DMA controller.
The 8237 contains four DMA channels that can be programmed independently,
and any one of the channels may be active at any moment. These channels
are numbered 0, 1, 2 and 3. Figure 8-17(a) depicts the pin-out diagram of
the 8237 chip.

Fig. 8-17(a). Pin-out diagram of the Intel 8237 DMA controller

8-8.1. DMA Chip (8237) Architecture


As shown in figure 8-17(b), the 8237 has four channels (DMA0-DMA3).
The DMA channels have several different operating modes, and two of
them can be joined together to allow direct data transfers without use
of the CPU. Each channel of the DMA controller has 4 registers assigned
to it.


The first holds a 16-bit value for the address in memory and the second
holds a 16-bit value for the number of bytes (8-bit channels) or words
(16-bit channels) to be transferred. The last two are used to monitor the
DMA transfer. The 8237 has two electrical signals for each channel, named
DRQ (DMA Request) and DACK (DMA Acknowledge).

There are additional signals with the names HRQ (Hold Request), HLDA
(Hold Acknowledge), EOP (End of Process), and the bus control signals
MEMR (Memory Read), MEMW (Memory Write), IOR (I/O Read), and
IOW (I/O Write).

Fig. 8-17(b). Architecture of the Intel 8237 DMA controller

The 8237 is known as a "fly-by" DMA controller. This means that
the data being moved from one location to another does not pass through
the DMA chip and is not stored in the DMA chip. Consequently, a fly-by
transfer can only move data between an I/O port and a memory address,
but not between two I/O ports or two memory locations.

8-8.2. How does DMA Work ?


When an I/O device, such as a hard disk controller, requests a DMA
transfer, it does so as follows:

• The I/O device signals the 8237 on its DRQ (DMA Request) line.
• The 8237 then signals the processor that it wants to take control of the
  bus by activating the HRQ (Hold Request) line to the processor.
• The system waits for the main processor to finish its current bus
  operation, then disconnects it from the bus and activates the HLDA (Hold
  Acknowledge) line, which locks the processor out of the bus. (The
  processor is not actually halted, but merely left idle.)
• The 8237 takes control of the bus, signals the device that it is ready via
  the DACK (DMA Acknowledge) line and transfers data to or from the
  device.
• When the transfer is completed, the DMA is disconnected from the bus,
  all the lines are reset, and the processor is reconnected to the bus and
  resumes whatever tasks are demanded of it.
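
The handshake above can be sketched as a small simulation. This is only an illustrative model, not driver code: the class name `Dma8237Model` and its `request` method are hypothetical, no real hardware is accessed, and the model simply records the order in which the signals named in the text are asserted.

```python
class Dma8237Model:
    """Toy model of the DRQ/HRQ/HLDA/DACK handshake (not cycle-accurate)."""

    def __init__(self):
        self.log = []  # ordered list of signal events

    def request(self, channel, device_data, memory, address):
        """Move the bytes in device_data into memory, starting at address."""
        self.log.append(f"DRQ{channel} asserted by device")
        self.log.append("HRQ asserted by 8237")
        self.log.append("HLDA asserted: CPU has released the bus")
        self.log.append(f"DACK{channel} asserted by 8237")
        for offset, byte in enumerate(device_data):
            # Fly-by transfer: the byte goes from the device straight to memory
            memory[address + offset] = byte
        self.log.append("HRQ released: CPU regains the bus")
        return len(device_data)


memory = bytearray(16)
dma = Dma8237Model()
moved = dma.request(channel=2, device_data=b"\x11\x22\x33", memory=memory, address=4)
print(moved, memory[4:7].hex())
for event in dma.log:
    print(event)
```

The data bytes never pass through the model's own state, which mirrors the fly-by behaviour described in Section 8-8.1.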

8-8.3. DMA Usage in IBM PC


As we stated above, the IBM PC DMA subsystem is based on the 8237
controller, which has four DMA channels (0-3). Starting from the PC/AT,
IBM added a second 8237 chip, and numbered its channels 4, 5, 6 and 7.
The original DMA controller channels (0-3) move one byte in each
transfer. The second DMA controller channels (4-7) move 16 bits from
two adjacent memory locations in each transfer, with the first byte always
coming from an even-numbered address. The two controllers are identical
components; the difference in transfer size is caused by the way the
second controller is wired into the system.

In the PC architecture, each DMA channel is activated only when the
hardware that uses it requests a transfer by asserting the DRQ line for
that channel. Table 8-3 shows what the different DMA channels in the PC
system are used for.

Table 8-3. Direct memory access (DMA) channels usage

Channel  Size    Normally Used For       Most Likely Use
0        8 bit   Unknown                 8 bit requested transfers
1        8 bit   Sound                   8 bit requested transfers
2        8 bit   Floppy Transfers        8 bit requested transfers
3        8 bit   Sound or LPT/ECP        8 bit requested transfers
4        16 bit  DMA-1 Cascaded Through  DMA-1 Cascade
5        16 bit  Unknown                 16 bit requested transfers
6        16 bit  Unknown                 16 bit requested transfers
7        16 bit  Sound                   16 bit requested transfers

The four 8-bit DMA channels of the 8237 can be used with a variety of
adaptors, such as:

• 8-bit sound input/output
• Floppy disk transfers
• COM (communication port) transfers
• LPT (line printer) transfers

The remaining three 16-bit channels can also be used with a variety of
adaptors, such as:

• 16-bit sound input/output
• Hard Disk Drive (HDD) / Ultra DMA support

Fig. 8-18. Connection of the Intel 8237 DMA controller with the 8088 microprocessor,
in IBM PC

8-8.4. DMA Modes of Operation


The 8237 DMA can be operated in several modes. The main modes are as
follows:

i-Single mode. A single byte (or word) is transferred. The DMA must
release and re-acquire the bus for each additional byte. This mode is
commonly used by devices that cannot transfer the entire block of data
immediately. The peripheral will request the DMA each time it is ready for
another transfer. The standard PC-compatible floppy disk controller
(NEC 765) only has a one-byte buffer, so it uses this mode.

ii-Block/Demand mode. Once the DMA acquires the system bus, an entire
block of data is transferred, up to a maximum of 64 kB. If the peripheral
needs additional time, it can use the READY signal to suspend the
transfer briefly. READY should not be used excessively, and for slow
peripheral transfers, Single mode should be used instead. The difference
between Block and Demand is that once a Block transfer is started, it runs
until the transfer count reaches zero; DRQ only needs to be asserted until
-DACK is asserted. Demand mode transfers one or more bytes until DRQ is
de-asserted, at which point the DMA suspends the transfer and releases
the bus back to the CPU. When DRQ is asserted later, the transfer resumes
where it was suspended.

Older hard disk controllers used Demand mode until CPU speeds
increased to the point that it was more efficient to transfer the data using
the CPU, particularly if the memory locations were above the 16M mark.

iii-Cascade mode. This mechanism allows a DMA channel to request the
bus, but the attached peripheral device is then responsible for placing the
address information on the bus instead of the DMA. This is also used to
implement a technique known as "Bus Mastering". When a DMA channel in
Cascade mode receives control of the bus, the DMA does not place
addresses and I/O control signals on the bus as it normally does when it is
active. Instead, the DMA only asserts the -DACK signal for the active DMA
channel.

It is therefore up to the peripheral connected to that DMA channel to
provide address and bus control signals. The peripheral has complete
control over the system bus, and can do reads/writes to any address below
16M. When the peripheral finishes with the bus, it de-asserts the DRQ line,
and the DMA controller can return control to the CPU or to another DMA
channel.

Cascade mode can be used to chain multiple DMA controllers together,
and this is exactly what DMA channel 4 is used for in the PC architecture.
When a peripheral requests the bus on DMA channels 0, 1, 2 or 3, the slave
DMA controller asserts HRQ, but this wire is actually connected to DRQ4
on the primary DMA controller instead of to the CPU.

The primary DMA controller, thinking it has work to do on channel 4,
requests the bus from the CPU using the HRQ signal. Once the CPU grants
the bus to the primary DMA controller, DACK4 is asserted, and that wire is
actually connected to the HLDA signal on the slave DMA controller.

The slave DMA controller then transfers data for the DMA channel that
requested it (0, 1, 2 or 3), or the slave DMA may grant the bus to a
peripheral that wants to perform its own bus-mastering, such as a SCSI
controller.

Because of this wiring arrangement, channel 4 is taken up by the cascade,
and only DMA channels 0, 1, 2 and 3 (of the slave DMA) and 5, 6 and 7 (of
the primary DMA) are usable with peripherals on PC/AT systems.

Note that DMA channel 0 was reserved for refresh operations in early IBM
PC computers, but it is generally available for use by peripherals in
modern systems. When a peripheral is performing Bus Mastering, it is
important that the peripheral transmit data to or from memory constantly
while it holds the system bus. If the peripheral cannot do this, it must
release the bus frequently so that the system can perform refresh operations
on main memory.

As mentioned in Chapter 7, the DRAM used in PCs must be accessed
frequently to preserve the charge of the stored bits. Since memory read
and write cycles "count" as refresh cycles (a dynamic RAM refresh cycle
is actually an incomplete memory read cycle), as long as the peripheral
controller continues reading or writing data to sequential memory
locations, that action will refresh all of memory.

iv-Auto-initialize mode. This mode causes the DMA to perform Single
(byte), Block or Demand transfers, but when the DMA transfer counter
reaches zero, the counter and address are set back to where they were when
the DMA channel was originally programmed. This means that as long as the
peripheral requests transfers, they will be granted. It is up to the CPU to
move new data into the fixed buffer ahead of where the DMA is about to
transfer it when doing output operations, and to read new data out of the
buffer behind where the DMA is writing when doing input operations.

This technique is frequently used on audio devices that have small or no
hardware "sample" buffers. There is additional CPU overhead to manage
this "circular" buffer, but in some cases this may be the only way to
eliminate the latency that occurs when the DMA counter reaches zero and
the DMA stops transfers until it is reprogrammed.
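
The effect of auto-initialization on the address and count registers can be illustrated with a short model. This is a sketch under stated assumptions: the class `AutoInitChannel` is hypothetical, it models only the two registers, and it uses the terminal-count rule given in Section 8-8.5 (the programmed count is one less than the number of transfers, and the transfer ends when the counter rolls over from 0000H to FFFFH).

```python
class AutoInitChannel:
    """Toy model of one 8237 channel's address and count registers."""

    def __init__(self, base_address, base_count, auto_init):
        self.base_address = base_address
        self.base_count = base_count      # programmed value = transfers - 1
        self.auto_init = auto_init
        self.address = base_address
        self.count = base_count
        self.stopped = False

    def transfer_one(self):
        """Perform one transfer; return the address used, or None if stopped."""
        if self.stopped:
            return None
        used = self.address
        self.address = (self.address + 1) & 0xFFFF
        self.count = (self.count - 1) & 0xFFFF
        if self.count == 0xFFFF:          # terminal count: 0000H rolled over to FFFFH
            if self.auto_init:
                self.address = self.base_address   # reload the original values
                self.count = self.base_count
            else:
                self.stopped = True       # channel must be reprogrammed by the CPU
        return used


ring = AutoInitChannel(base_address=0x1000, base_count=3, auto_init=True)
ring_addresses = [ring.transfer_one() for _ in range(6)]
print([hex(a) for a in ring_addresses])

one_shot = AutoInitChannel(base_address=0x1000, base_count=3, auto_init=False)
one_shot_addresses = [one_shot.transfer_one() for _ in range(6)]
print(one_shot_addresses)
```

With auto-initialization the four-byte buffer is reused as a ring; without it the channel falls silent after the fourth transfer until it is programmed again.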

8-8.5. Programming the DMA


The DMA channel that is to be programmed should always be "masked"
before loading any settings. This is because the hardware might
unexpectedly assert the DRQ for that channel, and the DMA might
respond, even though not all of the parameters have been loaded or
updated. Once the channel is masked, the host must specify the direction
of the transfer (memory-to-I/O or I/O-to-memory) and the mode of DMA
operation to be used for the transfer (Single, Block, Demand, Cascade,
etc.), and finally load the address and length of the transfer.

The length that is loaded is one less than the number of transfers you
expect the DMA to perform. The LSB and MSB of the address and length are
written to the same 8-bit I/O port, so another port must be written first to
reset the controller's byte pointer; this guarantees that the DMA accepts
the first byte as the LSB and the second byte as the MSB of the length and
address.

Then, be sure to update the Page Register, which is external to the DMA
and is accessed through a different set of I/O ports. Once all the settings
are ready, the DMA channel can be un-masked. That DMA channel is now
considered to be "armed", and will respond when the DRQ line for that
channel is asserted. Refer to the 8237 data sheet for more precise
programming details. You will also need to refer to the I/O port map for
the PC system, which describes where the DMA and Page Register ports are
located.
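
The programming order described above can be sketched as a function that lists the port writes instead of performing them. This is a minimal sketch, not a driver: the function name `program_channel` is hypothetical, the port numbers are an assumption based on the conventional PC/AT port map for the first 8237 and should be checked against the port map of the actual system, and only Single mode on the 8-bit channels is modelled. The check that the buffer stays inside one 64 kB page is also an assumption, reflecting the 16-bit address register combined with a separate Page Register.

```python
# Assumed port map: conventional PC/AT addresses of the first 8237 (channels 0-3).
MASK_PORT = 0x0A             # single-channel mask register
MODE_PORT = 0x0B             # mode register
CLEAR_FLIPFLOP_PORT = 0x0C   # any write resets the LSB/MSB byte pointer
ADDRESS_PORT = {0: 0x00, 1: 0x02, 2: 0x04, 3: 0x06}
COUNT_PORT = {0: 0x01, 1: 0x03, 2: 0x05, 3: 0x07}
PAGE_PORT = {0: 0x87, 1: 0x83, 2: 0x81, 3: 0x82}

MODE_SINGLE = 0x40           # mode bits 7-6 = 01
IO_TO_MEMORY = 0x04          # transfer-type bits 3-2 = 01
MEMORY_TO_IO = 0x08          # transfer-type bits 3-2 = 10


def program_channel(channel, physical_address, length, to_memory=True):
    """Return the ordered (port, value) writes that arm one 8-bit channel."""
    if channel not in ADDRESS_PORT:
        raise ValueError("only channels 0-3 are modelled")
    offset = physical_address & 0xFFFF
    if length < 1 or offset + length > 0x10000:
        raise ValueError("buffer must fit inside one 64 kB page")
    count = length - 1                       # the 8237 is loaded with length - 1
    mode = MODE_SINGLE | (IO_TO_MEMORY if to_memory else MEMORY_TO_IO) | channel
    return [
        (MASK_PORT, 0x04 | channel),         # mask the channel first
        (CLEAR_FLIPFLOP_PORT, 0x00),         # next byte written is taken as the LSB
        (MODE_PORT, mode),                   # direction and mode of operation
        (ADDRESS_PORT[channel], offset & 0xFF),
        (ADDRESS_PORT[channel], offset >> 8),
        (COUNT_PORT[channel], count & 0xFF),
        (COUNT_PORT[channel], count >> 8),
        (PAGE_PORT[channel], (physical_address >> 16) & 0xFF),
        (MASK_PORT, channel),                # un-mask: the channel is now armed
    ]


writes = program_channel(2, 0x12345, 512)
for port, value in writes:
    print(f"out 0x{port:02X}, 0x{value:02X}")
```

Returning the writes as data keeps the example runnable on any machine; on real hardware each pair would become one 8-bit port output.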

The following is a rough example of the communication protocol used when
DMA transfers data from an adapter to memory:

• Before the transfer begins, the DMA channel's registers are loaded with
  the correct base address and counter value.
• The adapter is told to begin the transfer.
• When the adapter has the first data ready, it signals the DMA controller
  chip.
• The DMA controller chip asks the CPU for ownership of the bus.
• When ownership is granted, the DMA chip signals back to the adapter
  to start sending.
• At the same time, the DMA puts the base address on the address bus.
• The adapter puts its data on the data bus, and the RAM circuits
  automatically store the data.
• The DMA chip controls the transfer, and sends a Terminal Count (TC)
  signal when the transfer is finished (i.e., when the counter value
  changes from 0000H to FFFFH).
• When the adapter senses the TC signal, it uses its IRQ to inform the
  code that requested the DMA transfer that it has finished.
• The program can then check whether the transfer ran smoothly.

8-9. I/O Processors


Many storage, networking, and embedded applications require fast I/O
throughput for optimal performance. Unlike DMA controllers, an I/O
processor (IOP) can fetch and execute its own instructions. IOP's also
differ from closely coupled coprocessors, in which the slave processor
executes instructions passed to it by the main processor via the ESC and
WAIT mechanism. Although the IOP instructions are tailored for I/O
processing, they may also include arithmetic and logic instructions. IOP's
thus allow servers, workstations and storage subsystems to transfer data
faster, reduce communication bottlenecks, and improve overall system
performance by offloading I/O processing functions from the host CPU.

8-9-1. Features of IOP's


IOP's have the following general features:

• Intelligent I/O Processing: Offloads I/O processing functions, such as
  I/O interrupt processing and parity calculations, from the CPU. This
  allows the CPU to streamline application processing and to use other
  system resources, such as the system bus and memory, more
  effectively.
• On-chip Cache: Improves data throughput by reducing external bus
  traffic.
• Parallel Transaction Capabilities: Eliminates the need to use
  expensive proprietary controllers to handle parallel transactions and
  compression algorithms.
• Single-chip Design: Provides smaller packaging and board cost
  savings.

Figure 8-19 depicts the connection of an IOP to the main processor via a
local bus. The communication between the host processor and the IOP may
be summarized as follows. The host processor initiates an I/O operation by
writing a message in memory that describes the I/O function to be
performed. The IOP then reads this message from memory and carries out
the I/O operation. When the IOP finishes, it notifies the host CPU.

8-9-2. The 8089 I/O Processor (IOP)


The 8089 is a 40-pin chip designed to handle the details of I/O processing
in conjunction with an 8086 processor. The 8089 has two independent
channels. The 8086 and the 8089 communicate with each other by writing
messages (or control blocks) into the main memory.

Fig. 8-19. Connection of an I/O Processor (IOP) to a host CPU, via a local bus.

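[Block diagram labels: Channel 1 (DRQ1, EXT1, SINTR1) and Channel 2 (DRQ2, EXT2, SINTR2); 20-bit registers GA, GB, GC and TP; 16-bit registers IX, BC, MC and CC; PP; CCP; ALU; PSW; control logic and bus control with the LOCK, RQ/GT, Status, SEL, READY, RESET and CLK signals.]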

Fig. 8-20. Functional block diagram of the Intel 8089 I/O Processor.

The 8086 prepares control blocks that describe the task to be performed,
and then dispatches the task to the IOP through a channel attention signal
(via the RQ/GT). The 8089 IOP then reads the control block to locate a
program sequence called a "channel program", which is written in the 8089
instruction set.

The IOP performs the assigned task by executing this program. When the
IOP is done, it notifies the 8086 either through an interrupt request or by
updating a status location in memory.
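
The message-passing scheme between the host and the IOP can be pictured with a toy model. This is a sketch of the idea only: the dictionary `shared_memory`, the functions `host_dispatch` and `iop_run`, and the opcodes `READ` and `XOR` are all hypothetical and do not correspond to the real 8089 control block layout or instruction set.

```python
shared_memory = {}   # stands in for main memory visible to both processors


def host_dispatch(program):
    """Host CPU side: write a control block, then raise channel attention."""
    shared_memory["control_block"] = {
        "program": program,   # the channel program to execute
        "status": "busy",     # status location the host can poll
        "result": None,
    }
    return "channel attention"


def iop_run(device_bytes):
    """IOP side: fetch the control block and execute its channel program."""
    block = shared_memory["control_block"]
    buffer = []
    for opcode, operand in block["program"]:
        if opcode == "READ":      # read `operand` bytes from the device
            buffer.extend(device_bytes[:operand])
        elif opcode == "XOR":     # a simple logic operation on the buffered data
            buffer = [b ^ operand for b in buffer]
    block["result"] = bytes(buffer)
    block["status"] = "done"      # completion is reported through memory


signal = host_dispatch([("READ", 3), ("XOR", 0xFF)])
iop_run(device_bytes=b"\x01\x02\x03\x04")
block = shared_memory["control_block"]
print(signal, block["status"], block["result"].hex())
```

The host never touches the device itself; it only writes the control block and later reads the status location, which is the division of labour the text describes.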

8-9-3. The 80321 I/O Processor


The Intel 80321 (or IOP321) is an I/O processor with a core speed of
600 MHz that operates with a PCI-X bus (frequency 133 MHz, width 64 bits)
and 200 MHz DDR SDRAM (expandable to 1 GB). Figure 8-21 depicts the
block diagram of the 80321 I/O processor.

Fig. 8-21. Functional block diagram of the Intel 80321 I/O Processor.

8-10. Summary

In this chapter we discussed the different methods to design and implement
interface circuits for input/output devices in a microprocessor system,
with emphasis on 80x86 microprocessors.

In particular, the 80x86 processors provide a separate I/O address space,
distinct from physical memory, that can be used to address the input/output
ports. The I/O address space consists of 64K (2^16) individually addressable
8-bit ports. Also, any two consecutive 8-bit ports can be treated as a 16-bit
port, and any four consecutive 8-bit ports can be treated as a 32-bit port.

The following circuits depict the simplest form of input and output ports
in a microprocessor system.

We also presented the several modes of data input/output in
microprocessor-based and computer systems, namely:

• I/O under microprocessor control
• Interrupt-initiated data transfer
• Direct memory access (DMA) transfer, from secondary to main
  memory
• Transfer of data through I/O processors (IOP's)

8-11. Problems

8-1) Explain the memory-mapped I/O concept in 80x86 microprocessors.

8-2) Write a program for a 8088 microprocessor to output the word stored
in EX register to the output ports whose addresses are 8004H and 8005H.

8-3) Show how you can design 2 output ports using the 74LS373 chips and
how to connect them to an 80x86 microprocessor as a single 16-bit port.

8-4) Show how to connect two 8255 chips to an 80x86 microprocessor system
to obtain 3 programmable 16-bit I/O ports.

8-5) The 8255 chip is connected to an 8086 microprocessor such that its
ports are assigned as follows: port A and port C as input, and port B as
output.

i) Draw the circuit diagram, which contains the 8255 and the
microprocessor address, data, and control read/write lines.
ii) Determine the port addresses which will be assigned to PA, PB,
PC as well as the control register of the 8255
iii) Determine the control word (byte) which you’ll use
iv) Write an assembly program that inputs data from port A and port
C, adds them, and then sends the result to port B

8-6) Show how to design and program an 8086 microprocessor interface to
a stepper motor using the 8255. Write a program that makes the motor run
forward continuously until the spacebar key is pressed, or run backward
continuously if the backspace key is pressed.

Hints: As shown in Figure 8-9, the four leads of the stepper motor (WA,
WB, WC and WD) can be controlled by four bits of any port of the 8255.
In this example, use PA0-PA3 (of port PA) to control the stepper motor.

8-7) Design a simple parallel printer interface card (with handshaking) on
the basis of the 8255 PIO.

8-8) Show how to design and program an 8086 microprocessor interface to
a keyboard and self-scanned display using the 8255.
Make use of the following port assignments in your design:

8-9) The DMA differs from the interrupt mode by
a) The involvement of the processor in the operation
b) The method of accessing the I/O devices
c) The amount of data transfer possible
d) Both a and c

8-10) The DMA transfers are performed by a control circuit called the
a) Device interface
b) DMA controller
c) Data controller
d) Overlooker

8-11) In DMA transfers, the required signals and addresses are given by
a) Processor
b) Device drivers
c) DMA controllers
d) The program itself

8-12) After the completion of the DMA transfer, the processor is notified by
a) Acknowledge signal
b) Interrupt signal
c) WMFC signal
d) None of the above

8-13) The DMA controller has _______ registers
a) 4
b) 2
c) 3
d) 1

8-12. Bibliography

[1] W. A. Triebel, 80386, 80486, and Pentium Microprocessor: The
    Hardware, Software, and Interfacing, 1999.
[2] http://www.hokeyball.com
[3] http://www.XBitlabs.com
[4] http://www.x86-guide.com

Interface Circuits
with
IBM PC & Compatibles
Contents

9-1. Introduction (Overview of the IBM PC)
9-2. The PC Motherboard
9-3. Buses & Expansion Slots
9-4. History of PC Buses
9-4.1. PC (8-bit) Bus
9-4.2. ISA Bus
9-4.3. Proprietary Buses & their Problems
9-4.4. MCA & EISA Buses
9-4.5. VESA Local Bus
9-4.6. PCI Bus
9-4.7. Accelerated Graphic Port (AGP)
9-4.8. PCI-Express Bus
9-4.9. IEEE-488 (GP-IB) Bus
9-4.10. I2C Bus
9-4.11. SMBus
9-4.12. JTAG (IEEE1149.1) Bus
9-4.13. Multi-Bus Architecture
9-4.14. Controller Area Network (CAN) Bus
9-4.15. Bus Hierarchy
9-4.16. Bus Topologies
9-4.17. PCMCIA & ExpressCard
9-4.18. PC I/O Extension Cards

9-5. The IBM PC Serial Ports


9-5.1. Introduction to Serial Communications & RS232
9-5-2. UART Chips
9-5.3. Description of the Serial Port
9-5.4. How Many Wires do We Need for a Serial Connection?
9-5.5. Addressing the Serial Port
9-5.6. Programming the Serial Port
9-5.7. Universal Serial Bus (USB)
9-5.8. Other Serial Bus Standards (FireWire and IrDA)
9-5.9. PC-to-PC Communication (Networking) & Ethernet
9-5.10. Switching Networks
9-6. The IBM PC Parallel Ports
9-6.1. Parallel Port Architecture
9-6.2. IBM-PC Parallel Port Cable
9-6.3. Parallel port I/O Addressing
9-6.4. Parallel Port Timing Diagram.
9-6.5. Programming the Parallel Port
9-6-6. Recent Improvements in the PC Parallel Port
9-6-7. Parallel Port I/O under Windows
9-7. Attaching a Mass Storage Device to IBM PC & Compatibles
9-8. Keyboard Interface Circuits
9-9. Mouse Interface Circuits
9-10. Video Monitor Interface Circuits
9-11. Summary of I/O Addresses in IBM PC & Compatibles
9-12. Summary
9-13. Problems
9-14. References

Interface Circuits
with
IBM PC & Compatibles
9-1. Introduction (Overview of the IBM PC)
Before we go into the details of the IBM PC and how to interface it with
external circuits, it is wise to look at its internal components and how
they work. The IBM PC & compatible microcomputers are based on the
Intel 80x86 series of microprocessors.

Fig. 9-1. Overview of an old desktop IBM PC.

The basic hardware configuration of the IBM PC has not changed much
over the years. A typical PC consists of the following items:
1. System Unit that contains:
   o Motherboard that provides the CPU, RAM, BIOS ROM, bus slots,
     and parallel and serial I/O ports
   o Hard disk drives, floppy disk drives and CD/DVD drives
   o Video interface card
   o Switch-mode power supply
2. Keyboard & Mouse
3. Video Display Unit (VDU or Monitor)
4. Printer

Fig. 9-2. General block diagram of an IBM PC, with peripheral devices.

9-2. The PC Motherboard


The motherboard is the main circuit board inside the PC, which holds the
processor, memory and expansion slots. Figure 9-3(a) shows the
motherboard of the original IBM PC, which is based on the 8088 CPU, and
its components. Figures 9-3(b), (c) and (d) show more recent
motherboards, based on Pentium, Core 2 and Core i7 microprocessors.

Fig. 9-3(a). Motherboard of the first IBM PC (1981) and its schematic layout

Fig. 9-3(b). Motherboard of a Pentium-based IBM PC.

Fig. 9-3(c). Intel motherboards: DG965 for an Intel Core2 Duo microprocessor

Fig. 9-3(d). Intel motherboard for an Intel Core i7 microprocessor.

As shown, the PC motherboard contains a socket to host the
microprocessor, sockets for DRAM (DIMM or DDR), the BIOS ROM, the
chipset containing most of the interface circuits, sockets for connecting
hard disks (ATA, SATA), sockets for CD/DVD drives, a socket for the power
supply, expansion slots that can receive add-on cards (e.g., ISA and PCI
slots), as well as serial and parallel port connectors. In fact, PCs use
many buses to link their various components.

Each PC motherboard is designed for a certain range of processors. The
microprocessor performs most of the functions of the PC. One of the
important factors is the CPU socket, which is soldered onto the board.
Most motherboards use Zero Insertion Force (ZIF) sockets that allow
easy insertion or removal of the CPU. Over the years, Intel has designed
a series of sockets for its line of CPUs. For example, the 80286 and early
versions of the 80386 (132-pin) microprocessor used PLCC sockets. The
80486 (168-pin) used special PGA sockets, such as Socket 1, Socket 2 and
Socket 3. More details about processor sockets can be found in Chapter 11
of this book.

The BIOS (basic input/output system) ROM performs what is called the
Power-On Self-Test (POST) when it boots the PC. The POST is a built-in
diagnostic program that checks the PC hardware to ensure that
everything is present and functioning properly. Another function of the
BIOS is to maintain a set of information that is critical to the operation
of your PC but is not stored on your hard disk at all. These are called the
CMOS settings. They are very important because minor changes to them
can have a major impact on how your system functions.

9-3. Buses & Expansion Slots


The components inside a computer communicate with each other in various
ways. Most of the internal system components, including the processor,
cache, memory, expansion cards and storage devices, talk to each other
over one or more buses. A bus, in computer terms, is simply a channel
over which information flows between two or more devices (technically,
a bus with only two devices on it is considered by some a "port" instead
of a bus). A bus normally has access points, or places into which a device
can tap to become part of the bus, and devices on the bus can send
information to, and receive information from, other devices. In IBM PCs
and compatible microcomputers, the motherboard carries slots that are
ready to receive add-on input/output cards, such as sound, video and
Fax/Modem adaptors. These expansion slots can host any other I/O devices
as well.

In this section, we describe the microcomputer buses and expansion slots
and how they can be exploited for interfacing the IBM PC & compatible
microcomputers with expansion cards.

Expansion slots have many pins, which power the expansion cards and
connect them to the data, address and control buses. The expansion slots
are connected with the CPU via a group of signal lines (on the
motherboard) called the expansion bus. The expansion bus contains a
large number of lines for data, address and control signals, and it is
usually operated at a frequency lower than the microprocessor clock. The
efficiency of the expansion bus is expressed in terms of its bandwidth,
i.e. its peak transfer rate in MB/s, which is calculated using the
following equation:

Bus speed (in MB/s) = Bus width (in Bytes) x Bus Clock (in MHz)
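
As a worked example of this equation, the short sketch below evaluates it for the two buses discussed later in this chapter, the 8-bit PC bus at 4.77 MHz and the 16-bit ISA bus at 8 MHz. The function name `bus_bandwidth_mb_per_s` is hypothetical, and the result is a theoretical peak that assumes one transfer per bus clock; real buses need several clocks per transfer, so sustained rates are lower.

```python
def bus_bandwidth_mb_per_s(width_bits, clock_mhz):
    """Peak bandwidth in MB/s = bus width in bytes x bus clock in MHz."""
    width_bytes = width_bits / 8
    return width_bytes * clock_mhz


pc_bus = bus_bandwidth_mb_per_s(width_bits=8, clock_mhz=4.77)    # 8-bit PC bus
isa_bus = bus_bandwidth_mb_per_s(width_bits=16, clock_mhz=8.0)   # 16-bit ISA bus
print(f"PC bus : {pc_bus:.2f} MB/s")
print(f"ISA bus: {isa_bus:.2f} MB/s")
```

Doubling the width and raising the clock from 4.77 MHz to 8 MHz therefore raises the peak figure from about 4.8 MB/s to 16 MB/s.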

Fig. 9-4. Installing an expansion card into an expansion slot.

9-4. History of the PC Buses

Since the appearance of PCs in the early 1980s, many expansion buses
have emerged, which differ in their speed and capacity to handle data
transfer. In the early days, all microcomputers were built using their own
proprietary bus designs. Before long, however, someone had the bright
idea that if designers used the same design specifications, one could
build a computer out of boards from different companies. This idea
created the S-100, a bus that is still in use today in some areas (see
footnote 1). The S-100 bus essentially consisted of the pins of the Intel
8080 run out onto the backplane to form the single system bus. With the
introduction of the IBM PC, several buses have appeared and been
adopted over the years in its different generations.

Fig. 9-5. Different standard shapes of expansion cards.

9-4.1. IBM PC (8-bit) Bus.


The original IBM PC was equipped with the 8-bit PC bus. This 62-pin bus
is still used by some I/O cards, under the name PC/XT bus (see footnote 2).
The 8088 ran at 4.77 MHz, which was fine for the expansion cards, and
running the expansion slots at the same clock speed as the CPU made the
system boards easier to design and cheaper to build.

Footnote 1: The Altair 8800, the first hobbyist microcomputer, used a
connector with 100 pins. This connector was called the Altair bus and was
later renamed the S-100 bus.
Footnote 2: This bus is sometimes called the 8-bit ISA bus.

Fig. 9-6. Overview of the S-100 interface bus and an S-100 card

9-4.2. The Industry Standard Architecture (ISA) Expansion Bus


The ISA bus (or AT bus) appeared in 1984 with the IBM PC/AT, which
had a 16-bit data bus. The new AT slots were designed to be backward
compatible with the 8-bit PC slots: a 36-pin extension edge connector was
added to the end of the 62-pin edge connector of the original 8-bit bus
slot. This bus was later called Industry Standard Architecture (ISA) and
has survived to this day. One important aspect of the ISA bus was that
IBM never published any specification for the bus speed.

In the original 6 MHz IBM PC/AT, and the subsequent 8 MHz version, the
bus ran at the same speed as the CPU. It was not surprising that as PC
clone vendors started looking for a marketing edge over IBM, they
simply kept the bus running at the CPU speed as they boosted speeds to
12 MHz or even faster. This led to problems for users: boards that ran
fine in 8 MHz PCs were not reliable at faster speeds. The industry settled
on 8 MHz as a standard clock speed and on the name Industry Standard
Architecture. Figure 9-7 depicts the ISA bus and its pin assignments.

Fig. 9-7(a). Overview of the ISA interface bus and an ISA card.

As shown, the upper part of the ISA bus is identical to the old 8-bit PC
bus and is divided into 'A' and 'B' sides. The lower part is divided into 'C'
and 'D' sides, which provide additional pins for the 16-bit data bus and
the 24-bit address bus, as well as supplementary interrupt requests and
DMA channels. Now, let's start describing the operation of the ISA bus
with a simple read cycle from an input/output port. The first thing the
microprocessor does is drive the ALE signal high and then place the port
address on the A0-A19 lines. After that, the ALE signal goes low.

For the "A" side:
• D0-D7 (pins A9 to A2): Data bus.
• A0-A19 (pins A31 to A12): The twenty lines of the address bus.
• AEN (pin A11): Used by the DMA controller to take over the data and
  address buses in a DMA transfer.

For the "B" side:
• GND (pins B1, B10, B31): Ground.
• +5V (pins B3, B29): 5V DC output.
• -5V (pin B5): -5V DC output.
• -12V (pin B7): -12V DC output.
• +12V (pin B9): +12V DC output.
• MEMW (pin B11): The microprocessor asserts this signal to write to memory.
• MEMR (pin B12): The microprocessor asserts this signal to read from memory.
• IOW (pin B13): The microprocessor asserts this signal when it writes to a port.
• IOR (pin B14): The microprocessor asserts this signal when it reads from a port.
• IRQ2-IRQ7 (pins B4, B21, B22, B23, B24 and B25): Interrupt signals.
• ALE (pin B28): Used by the microprocessor to latch the lower 16 address
  lines during a memory or I/O operation.
• CLOCK (pin B20): The system clock.
• OSC (pin B30): High-frequency clock which can be used by the I/O boards.

For the "C" side:
• D08-D15 (pins C11 to C18): The upper byte of the data bus.
• A17-A23 (pins C2 to C8): The rest of the address bus.

For the "D" side:
• IRQ10-IRQ12, IRQ14 and IRQ15: Additional interrupt signals.
• DRQ, DACK: Additional DMA channels.
• MASTER: This signal allows another microprocessor to take over the bus.

Pin   B side      A side
1     GND         -I/O CH CK
2     RST DRV     D7
3     +5V         D6
4     IRQ9        D5
5     -5V         D4
6     DRQ2        D3
7     -12V        D2
8     Reserved    D1
9     +12V        D0
10    GND         -I/O CH RDY
11    MEMW        AEN
12    MEMR        A19
13    IOW         A18
14    IOR         A17
15    -DACK3      A16
16    DRQ3        A15
17    -DACK1      A14
18    DRQ1        A13
19    Refresh     A12
20    CLK         A11
21    IRQ7        A10
22    IRQ6        A9
23    IRQ5        A8
24    IRQ4        A7
25    IRQ3        A6
26    -DACK2      A5
27    T/C         A4
28    +ALE        A3
29    +5V         A2
30    OSC         A1
31    GND         A0

Pin   D side      C side
1     -MEM CS16   BHE
2     -I/O CS16   A23
3     IRQ10       A22
4     IRQ11       A21
5     IRQ12       A20
6     IRQ15       A19
7     IRQ14       A18
8     -DACK0      A17
9     DRQ0        -MEMR
10    -DACK5      -MEMW
11    DRQ5        D08
12    -DACK6      D09
13    DRQ6        D10
14    -DACK7      D11
15    DRQ7        D12
16    +5V         D13
17    MASTER      D14
18    GND         D15

Fig. 9-7(b). Description of the ISA bus.

From now on the address of the target port to be read is latched. Then the
ISA bus takes the IOR signal to low level so that the addressed device
puts a data byte onto the D0-D7 data bus. The microprocessor then reads
the data bus and takes the IOR signal high again. A write cycle
to a port works this way: The microprocessor asserts ALE high, and
then outputs the port address on A0-A19. Then ALE goes low again.
The microprocessor sends out the data byte to be written to the data bus.
It then asserts the IOW signal. After the device has had time to read the data
byte, the microprocessor raises the IOW signal high again. The only
difference between a memory read/write cycle and a port read/write cycle
is that the MEMR and MEMW signals are replaced by the IOR and IOW
signals. The 24 address lines of the ISA bus limit it to
I/O cards that use the first 16 MB of addressable memory space
for the RAM or ROM built on the card. Some video cards look for
a memory aperture (also known as a linear frame buffer), a hole in the
system memory, where they can insert and address their own memory
(several megabytes). This memory aperture overcomes the problem of
page switching brought about by the assignment of only 128 kB for the
video RAM (VRAM) in the system memory map. The VRAM of such VGA
cards can be accessed by switching parts of it in and out of the memory
range.
Interfacing I/O devices to the IBM PC via the ISA bus (or PC bus) requires,
at least, connections to the following pins (among the first 62 pins):
1- A0-A9 for address decoding (you can assign up to 1k ports)
2- IOR and IOW (both active low)
3- AEN signal: AEN = 0 when the CPU is using the bus
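
The decoding rule implied by this pin list can be sketched in a few lines of Python. This is only an illustrative model of the logic a card's decoder would implement in hardware; the base address 0x300, the block of 4 ports and the function name card_selected are assumptions chosen for the example, not values from the text.

```python
# Illustrative model of ISA I/O address decoding (not taken from the book).
# BASE_ADDRESS and PORT_COUNT are hypothetical card settings.

ADDRESS_MASK = 0x3FF      # only A0-A9 are decoded -> 2**10 = 1024 ports
BASE_ADDRESS = 0x300      # hypothetical base address of the card
PORT_COUNT = 4            # hypothetical number of consecutive ports


def card_selected(address: int, aen: int, ior_n: int, iow_n: int) -> bool:
    """Return True when the card should respond to the current bus cycle.

    aen   -- AEN line (0 = the CPU is using the bus)
    ior_n -- IOR line, active low (0 = port read in progress)
    iow_n -- IOW line, active low (0 = port write in progress)
    """
    if aen != 0:                       # not a CPU cycle: stay silent
        return False
    if ior_n != 0 and iow_n != 0:      # neither a port read nor a port write
        return False
    port = address & ADDRESS_MASK      # keep A0-A9, ignore the higher lines
    return BASE_ADDRESS <= port < BASE_ADDRESS + PORT_COUNT
```

Because only ten address lines are examined, an address such as 0x700 selects the same card as 0x300 in this model; this aliasing is a direct consequence of decoding A0-A9 only.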

9-4.3. Proprietary Buses & Their Problems


Everything was fine until Intel made the 80386 available. Here was a
processor that could access the world in 32-bit chunks, so how should
the industry provide for the wider data path? In particular, the data bus to
the RAM needed to be 32 bits wide in order to take advantage of the 386
processor's wider data bus. However, many DOS computers had some of
their RAM on expansion cards, and the ISA bus limited such RAM to
only 16 bits at a speed of 8 MHz. One answer to this problem was to put
the system memory on a local bus with the CPU on the system board. The
memory could be connected directly to the processor's data bus, with
no intermediate buffer devices between it and the CPU. This way it could
be 32 bits wide and accessed at the CPU clock speed. Many companies
decided to make special 32-bit expansion slots for proprietary memory
boards that could be added later.

9-4.4. Micro-Channel Architecture (MCA) and Extended ISA (EISA)


In order to regain control of the PC computer market, IBM introduced a
32-bit proprietary bus, called Micro Channel Architecture (MCA), with its
PS/2 range of computers. The response from clone manufacturers was to
get together and design the Extended Industry Standard Architecture
(EISA) bus, providing a 32-bit data path. The advantage of the EISA
design over MCA was that it remained backward compatible with 16-bit ISA
boards and 8-bit cards. The cost of computers using the MCA or
EISA buses was high, so these buses failed to gain a wide market.

Fig. 9-8. Overview of the MCA bus and the EISA bus.

9-4.5. VESA (Video Electronics Standards Association) Local Bus


With the introduction of graphical user interfaces (GUI), such as Microsoft
Windows, the demand for high-speed graphics became tremendous. The
modest performance of MCA and EISA pushed developers to introduce
new bus architectures. The VESA Local bus (or VLB) was originally
introduced in 1992 to address such problems. By the middle of 1993 the
VLB had become familiar in the market and almost all PCs had VLB slots.
The VLB specifications provided two performance-boosting features:
burst mode and bus mastering. In burst mode, VLB devices gain
complete control of the external data bus for up to 4 bus cycles, passing
up to 16 bytes of data (4 cycles of 4 bytes each) in a single burst. Bus
mastering allows the VLB
controller to arbitrate data transfers between the external data bus and up
to three VLB devices without assistance from the CPU. The VLB
connectors resemble the ISA slot with an additional short slot.

Fig. 9-9(a). Overview of the VESA local bus (VLB).


Fig. 9-9(b). Details of the VESA local bus (VLB).



9-4.6. Peripheral Component Interconnect (PCI) Bus


The peripheral component interconnect (PCI) overcame the limitations of
ISA, EISA, MCA, and VLB, and offered the performance needed for fast
systems. PCI was developed by Intel in 1992, but it took some time to
get it to work reliably. The PCI-bus was originally designed to speed up
the display of graphics on Intel-based PCs, but the standard itself is
processor independent and suitable for other platforms and cards that
require high bandwidth, including network, video and SCSI adaptors.
For a while, some people in the computer industry saw a war between the
two competing local-bus standards (VESA-bus and PCI-bus) but in
reality they were not on the same battlefield. The PCI and VESA local
buses did basically the same thing - both sped up PCs by
letting peripherals like graphics adaptors and hard disk controllers run at
up to 33 MHz, instead of the 8 MHz of the ISA-bus. The similarity breaks
down when we start talking about how the two designs work. The VESA-bus
bypassed the ISA bus by using the same local bus that connects the CPU to
its RAM, and so it was relatively cheap and easy for system and
peripheral makers to implement. On the other hand, the Intel PCI-bus was
a whole new bus, in much the same way the EISA and MCA buses were.
The PCI bus gave only a slight speed improvement when used with 486-based
systems, but it was far ahead when used with the Pentium chips.
The PCI-bus has some other features, such as concurrent bus-mastering,
a full burst mode, and a type of pipelining queue that can
reduce the number of potential wait states compared to the VESA-bus
design.

The PCI-bus uses three elegant techniques to resolve local bus problems.
The first, known as reflective wave signaling, reduces the amount of
electrical amplification required on the signal paths and thus reduces
noise and loading problems. The second is multiplexing, which
allows two different signals to use the same electrical path, reducing the
number of pins required for peripheral chips. The third is a protocol
that lets the PCI controller receive specific configuration information from
the PCI devices themselves. Intel did not define a standard adaptor
connector for the bus, leaving that job to a PCI-bus special-interest
group (PCI-SIG), which settled on the white 112 pin connector.
The original PCI bus had 32 data lines and, running at 33 MHz, could
operate 4 times faster than the ISA bus. The 64-bit PCI bus, which operates at
66 MHz, is 4 times faster than the original PCI bus. As shown in figure 9-10, a
PCI interface needs a minimum of 47 pins if it operates as a target device
and 49 pins if it works as a master device.


Fig. 9-10(a). Pins of the original 32-bit PCI bus.

PCI is a platform-independent bus. Thus, it was soon used in other
computers built around the PowerPC microprocessors. This is one of the
few times a standard I/O bus has been used across platforms, and this is
a big feature in its favor. The various companies involved in the
PowerPC development, including Apple and IBM, adopted the PCI-bus
for PowerPC-based computers. Apple had been using the
Macintosh NuBus for many years, but switched to the PCI-bus for its
PowerPC products. Digital Equipment (DEC) with their Alpha systems,
Hewlett-Packard and SUN Microsystems all included PCI slots in
their products. Intel licensed its patents on the PCI free of royalties to all
who wished to use it. The wide range of cards that followed the use
of the PCI-bus on PC systems thus became available for the first time to users of
other hardware. All that should be required is alternative driver software
for the various platforms.


Fig. 9-10(b). I/O Pins and corresponding signals of the 32-bit PCI bus.


Fig. 9-11. Different shapes of PCI slots and cards.

9-4.7. Accelerated Graphic Port (AGP)


The Accelerated Graphic Port (AGP) interface is a bus specification that
enables high-performance video graphics capabilities on PCs at
reasonable prices. AGP was originally designed by Intel for Pentium II
based motherboards. The first version started at a peak transfer rate of
266 MB/s; the specification known as AGP 4X allows for a peak transfer rate of 1066
MB/s, eight times faster than the PCI bus found in desktop systems. The
AGP delivers this peak bandwidth using pipelining, sideband addressing,
and more data transfers per clock.

The AGP also enables graphics cards to use texture maps directly
from system memory instead of forcing them to pre-load the texture data into
the graphics card's local memory. Some authors consider the AGP as
some sort of advanced PCI. For a system with a front-side bus (FSB)
speed of 133 MHz, the AGP speed is equal to the memory bus speed.
Therefore, it can support a data transfer rate of 1066 MB/s. AGP attains
this high transfer rate because of its ability to transfer data on both the
rising and falling edges of the clock. In addition, AGP does not share
bandwidth with other devices, whereas the PCI bus shares bandwidth.
However, in recent PCs the AGP has been replaced by the PCI Express bus.


Fig. 9-12(a). Illustration of the AGP slot on the motherboard.

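[Block diagram labels: CPU; Chipset; System Memory (256 MB-4 GB); Graphics Chip with Local Memory (16-128 MB) driving the Display; AGP link between the Graphics Chip and the Chipset; PCI Bus. Peak rates marked on the links: 800 MB/s, 1066 MB/s and 1066 MB/s.]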

Fig. 9-12(b). AGP Architecture and its connection to the PCI bus.

Table 9-1. Comparison between AGP and classic PCI bus.

AGP                                      Classic PCI
Pipelined requests                       Non-pipelined
Address/data de-multiplexed              Address/data multiplexed
Peak at 1066 MB/s in 32 bits (AGP 4X)    133 MB/s in 32 bits
Single target, single master             Multi-target, multi-master
Memory read/write only, no other I/O     Link to entire system
High/low priority queues                 No priority queues


Table 9-2. Comparison between AGP and other busses

Bus                  Bus Width (bits)   Bus Speed (MHz)   Bus Bandwidth (MB/s)
PC/XT (8-bit ISA)    8                  4.7-8             3.25-7.9
16-bit ISA           16                 8-16              6.5-15.9
EISA                 32                 8                 31.8
VESA (VLB)           32                 33-50             133 and above
PCI                  32                 33                133
PCI 2.1 (64-bit)     64                 33-133            533 and above
AGP                  32                 66                254.3
AGP (X2)             32                 66 X2             508.6
AGP (X4)             32                 66 X4             1,066
PCI-Express          64                 133-400           533 and above
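
The peak figures quoted for the parallel buses follow from simple arithmetic: bytes per transfer, times clock rate, times transfers per clock. The short sketch below reproduces that calculation. The function name peak_bandwidth_mb_s is ours, the clock values 33.33 MHz and 66.66 MHz are assumed nominal values, and the result is in decimal megabytes (10^6 bytes), so it may differ slightly from table entries that appear to use other rounding or binary megabytes.

```python
def peak_bandwidth_mb_s(width_bits: int, clock_mhz: float,
                        transfers_per_clock: int = 1) -> float:
    """Peak transfer rate of a parallel bus in MB/s (1 MB = 10**6 bytes)."""
    bytes_per_transfer = width_bits / 8
    return bytes_per_transfer * clock_mhz * transfers_per_clock


# Assumed nominal clock rates; real systems vary slightly.
pci_32 = peak_bandwidth_mb_s(32, 33.33)                         # classic PCI
pci_64 = peak_bandwidth_mb_s(64, 66.66)                         # 64-bit, 66 MHz PCI
agp_1x = peak_bandwidth_mb_s(32, 66.66)                         # AGP, one transfer per clock
agp_4x = peak_bandwidth_mb_s(32, 66.66, transfers_per_clock=4)  # AGP 4X

for name, value in [("PCI 32-bit", pci_32), ("PCI 64-bit", pci_64),
                    ("AGP 1X", agp_1x), ("AGP 4X", agp_4x)]:
    print(f"{name}: about {value:.0f} MB/s")
```

With these assumed clocks, the ratio agp_4x / pci_32 comes out at 8, which matches the "eight times faster" statement made for AGP 4X.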

9-4.8. PCI-Express Bus


The PCI-Express (PCI-E or PCIe) is a computer expansion interface with
much higher speed than the PCI bus. The PCIe bus offers up to 3.5 times more
bandwidth than the conventional PCI bus. The PCIe link is built around
pairs of unidirectional, serial (1-bit), point-to-point connections known
as lanes. This is in contrast to the PCI connection, which is a bus-based
system where all the devices share the same bidirectional, 32-bit (or 64-bit),
parallel bus.

PCIe slots come in a variety of sizes referred to by the maximum lane
count they support. Each PCIe slot carries one, two, four, eight,
sixteen or thirty-two lanes of data between the motherboard and the card.
Lane counts are written with an x prefix, e.g. x1 for a single-lane card and
x16 for a sixteen-lane card. In PCIe 1.1 (2007) each lane carries 250 MB/s
in each direction. Thirty-two lanes of 250 MB/s give a maximum
transfer rate of 8 GB/s in each direction for PCIe 1.1. An eight-lane
slot therefore has a data rate comparable to the fastest version of AGP. PCIe
2.0 doubles this and PCIe 3.0 doubles it again. By the end of 2011, PCI-SIG
announced PCIe 4.0 featuring 16 GT/s. A larger card will not fit in
a smaller slot but a smaller card can be used in a larger slot. While a
16-lane card cannot be used in an 8-lane slot, it can be used in a 16-lane slot
with only 8 lanes connected. The PCIe electrical interface is used in a variety
of form factors, including the ExpressCard laptop expansion card
interface. As of 2013, the PCIe bus has replaced AGP as the default
interface for graphics cards on new PC systems. Almost all graphics cards
released since 2010 by AMD (ATI) and NVIDIA make use of PCIe.
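
The lane arithmetic above is easy to check with a small sketch. The function names are hypothetical, and the per-generation doubling is taken directly from the text as an approximation; it is not meant as an exact statement of the PCIe specifications.

```python
PCIE_1_LANE_MB_S = 250               # per lane, per direction (figure from the text)
VALID_LANE_COUNTS = (1, 2, 4, 8, 16, 32)


def pcie_bandwidth_mb_s(lanes: int, generation: int = 1) -> int:
    """Approximate bandwidth per direction, doubling with each generation 1-3."""
    if lanes not in VALID_LANE_COUNTS:
        raise ValueError(f"unsupported lane count: {lanes}")
    if generation not in (1, 2, 3):
        raise ValueError("this sketch only models generations 1 to 3")
    return PCIE_1_LANE_MB_S * lanes * 2 ** (generation - 1)


def card_fits_slot(card_lanes: int, slot_lanes: int) -> bool:
    """A card fits mechanically when the slot is at least as long as the card."""
    return card_lanes <= slot_lanes


print(pcie_bandwidth_mb_s(32))       # x32 link, first generation
print(pcie_bandwidth_mb_s(8))        # x8 link, comparable to the fastest AGP
print(card_fits_slot(16, 8))         # a x16 card in a x8 slot
```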


Fig. 9-13. Slots of the PCI Express bus (from top to bottom: x4, x16, x1 and x16),
compared to a traditional 32-bit PCI slot (bottom).

9-4.9. IEEE-488 (GP-IB) Bus


The IEEE-488 or GPIB (General Purpose Interface Bus) can drive
daisy-chained devices via a wide bidirectional cable connected to the back of
the PC, as shown in figure 9-14. GPIB is the newer name of the HP-IB
(Hewlett Packard Interface Bus), which was widely used as a
measurement bus. GPIB is an industry standard published by the Institute
of Electrical and Electronics Engineers (IEEE) as ANSI/IEEE 488. The
ANSI/IEEE 488.2-1987 strengthened the original standard by defining
precisely how controllers and instruments communicate. Standard
Commands for Programmable Instruments (SCPI) took the command
structures defined in IEEE 488.2 and created a single, comprehensive
programming command set that is used with any SCPI instrument. In
order to use the GPIB you need a GPIB adaptor card in your PC and a
GPIB cable. Fourteen devices can be connected to one GPIB and data can
be transferred at up to 200 kB/s. The GPIB uses a 16-line parallel
connection. The 16 lines are divided into 8 data lines, 3 handshake lines
to synchronize the transfer and 5 management lines to control use of the
bus. At any time there must be one device on the bus that acts as
the Controller. This device issues commands to the other devices, and is
always the PC. Other devices may be Talkers, which put data onto the bus,
or Listeners, which read from the bus. Only one device may talk at once, but
more than one may listen to the Talker.

Fig. 9-14. GPIB bus and how it connects devices to a PC, via a GPIB cable.

9-4.10. SMBus (System Management Bus)


The System Management Bus (SMBus), which was developed by Intel, is more
or less a derivative of the I2C bus. The main application of the
SMBus is to monitor critical parameters (e.g., supply voltage, fan and
CPU temperature) on PC motherboards and embedded systems.

9-4.11. I2C (Inter-Integrated Circuits) Bus


The I2C bus is a simple 2-wire serial interface which was developed
by Philips. It is widely used in consumer electronics and industrial
applications. In addition to microcontrollers, several peripherals also exist
that support the I2C bus. The I2C bus physically consists of 2 active wires
and a ground wire. The active wires, SDA (Serial Data) and SCL (Serial
Clock), are both bidirectional. Up to 128 devices can exist on the network
and they can be spread out over 10 meters. I2C devices can act as receiver
and/or transmitter. As shown in figure 9-15, the I2C bus is a multi-master,
multi-slave network interface. The master is usually the microcontroller
and the clock is always generated by the master. Each node
(microcontroller or peripheral device) may initiate a message, and then
transmit or receive data. Each node on the network has a unique address
which accompanies any message passed between nodes. The I2C bus
interface supports data transfer speeds up to 3.4 Mb/s (in high-speed mode).
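
The limit of 128 devices corresponds to 7-bit addressing (2^7 = 128 addresses). As a minimal sketch of how the address accompanies a message, the first byte a master sends after the start condition carries the 7-bit address in its upper bits and a read/write flag in its lowest bit. The function name and the example address 0x48 below are assumptions made for illustration only.

```python
def i2c_address_byte(address: int, read: bool) -> int:
    """Build the first byte of an I2C transfer using 7-bit addressing.

    Bits 7..1 hold the slave address; bit 0 is 1 for a read, 0 for a write.
    """
    if not 0 <= address <= 0x7F:
        raise ValueError("a 7-bit address must lie between 0x00 and 0x7F")
    return (address << 1) | (1 if read else 0)


# Hypothetical peripheral at address 0x48.
write_byte = i2c_address_byte(0x48, read=False)
read_byte = i2c_address_byte(0x48, read=True)
print(hex(write_byte), hex(read_byte))
```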


Fig. 9-15. Simplified model of I2C bus.

9-4.12. JTAG (IEEE 1149.1) Bus.


The JTAG is a standard for providing external test access to integrated
circuits serially, via a 5-pin external interface. JTAG is an acronym for
the Joint Test Action Group, which developed the standard. The JTAG
standard was adopted as an IEEE standard (IEEE 1149.1) in 1990
under the name "Standard Test Access Port and Boundary-Scan Architecture". In
1994, the standard was supplemented with the Boundary Scan
Description Language (BSDL). The Intel 80486 was the first processor
released with JTAG. Since then, JTAG ports have been widely embraced
by processor manufacturers.

Fig. 9-16. Simplified model of the JTAG bus. The connector pins are:
TDI (Test Data In), TDO (Test Data Out), TCK (Test Clock), TMS (Test Mode
Select) and an optional TRST (Test Reset).


9-4.13. Controller Area Network (CAN) Bus.


The CAN bus was originally developed in the 1980s by Bosch GmbH as
a low-cost communications bus between devices in electrically noisy
environments. Mercedes-Benz became the first automobile manufacturer
in 1992 to employ CAN in their automotive systems. Today almost every
automobile manufacturer uses CAN controllers and networks to control
devices such as: windshield wiper motor controllers, rain sensors, airbags,
door locks, engine timing controls, anti-lock braking systems, power train
controls and electric windows.
Owing to its electrical noise tolerance, minimal wiring, excellent error-
detection capabilities and high-speed data transfer, CAN is rapidly
expanding into other applications such as industrial control, marine,
medical, aerospace and more. The CAN bus is a balanced (differential) 2-
wire interface running over a shielded twisted pair (STP), unshielded
twisted pair (UTP) or ribbon cable. Each node uses a male 9-pin D
connector. Non-return-to-zero (NRZ) bit encoding is used, with bit
stuffing to ensure compact messages with a minimum number of
transitions and high noise immunity. The CAN bus interface uses an
asynchronous transmission scheme in which any node may begin
transmitting any time the bus is free. Messages are broadcast to all nodes
on the network.
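
Bit stuffing can be illustrated with a short sketch. In CAN, after five consecutive bits of the same value the transmitter inserts one bit of the opposite value, and the receiver removes it again. The code below models only that rule on a plain list of bits; the function names are ours, and framing, arbitration and CRC handling are deliberately left out.

```python
def stuff_bits(bits):
    """Insert a complementary bit after every run of five identical bits."""
    out = []
    run_value, run_length = None, 0
    for bit in bits:
        out.append(bit)
        if bit == run_value:
            run_length += 1
        else:
            run_value, run_length = bit, 1
        if run_length == 5:
            stuffed = 1 - bit
            out.append(stuffed)
            # The stuffed bit itself starts a new run.
            run_value, run_length = stuffed, 1
    return out


def unstuff_bits(bits):
    """Remove the stuffed bits again; raise ValueError on a stuffing error."""
    out = []
    run_value, run_length = None, 0
    expect_stuff_bit = False
    for bit in bits:
        if expect_stuff_bit:
            if bit == run_value:
                raise ValueError("six identical bits in a row: stuff error")
            run_value, run_length = bit, 1
            expect_stuff_bit = False
            continue
        out.append(bit)
        if bit == run_value:
            run_length += 1
        else:
            run_value, run_length = bit, 1
        if run_length == 5:
            expect_stuff_bit = True
    return out


payload = [1, 1, 1, 1, 1, 0, 0, 0, 0]
on_the_wire = stuff_bits(payload)
print(on_the_wire)
print(unstuff_bits(on_the_wire) == payload)
```

The inserted bits guarantee a level change at least every five bit times, which receivers can use to stay synchronized with an NRZ signal that would otherwise have no transitions during long runs.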

Fig. 9-17. Simplified model of the CAN bus

9-4.14. Local Interconnection Network (LIN)


The Local Interconnection Network (LIN) standard defines a low cost,
serial communication network for automotive distributed electronic
systems. LIN is a complement to the other automotive multiplex
networks, including the Controller Area Network (CAN), but it targets
applications that require networks that do not need excessive bandwidth,
performance, or extreme fault tolerance. LIN enables a cost-effective
communication network for switches, smart sensors and actuator
applications inside a vehicle. The communication protocol is based on the
UART data format, a single-master/multiple-slave concept, a single-wire
(plus ground) 12V bus, and a clock synchronization for nodes without a
precise time base (i.e., without a crystal or resonator).

Fig. 9-18. Local interconnection network (LIN) master-slave architecture.

Fig. 9-19. Local interconnection network (LIN) transceiver structure.

As shown in the figure above, the LIN bus is a single-wire bus connected via
a termination resistor to the positive battery node Vbat. The bus is
terminated with a pull-up resistance of 1kΩ in the master node, and
typically 30kΩ in a slave node.

LIN versus CAN

Compared to CAN, LIN offers the advantage of lower cost per node
when the bandwidth and performance of CAN are not needed. LIN's lower
cost results from the use of single-wire communications, a lower
implementation cost versus CAN, and no need for crystals or resonators
in the slave nodes. The tradeoff for LIN's lower cost is the more restrictive nature of
a single-master network and lower bandwidth, as indicated in the
following table.

9-4.15. Multi-Bus Architecture


Multi-bus system boards helped to overcome problems in the continuing
evolution of the computer buses. System boards with VESA-bus slots are
dual bus boards as they have ISA and VESA-bus slots by the design of
the VESA-bus standard. Many combinations of the various buses that
have been available over the years are possible and some system board
manufacturers produced boards with combinations of ISA, EISA, MCA,
VESA and PCI-bus. This was to allow users to make use of older cards
such as SCSI controllers. Some system boards still have ISA slots in
addition to PCI bus slots, as shown in figure 9-20.

Fig. 9-20. PC expansion slots. The shown motherboard has 2 ISA and 3 PCI slots

9-4.16. Bus Hierarchy


The PC has, in a way, a hierarchy of different buses. Most modern PCs
have at least four buses. Each one is generally slower than the one
above it (because the processor is the fastest device in modern
PCs). The top-down bus hierarchy is as follows:

The Processor Bus: This is the highest-level bus that the chipset uses to
send information to and from the microprocessor.
The Cache Bus: Higher-level architectures, such as those used by the
Pentium processors, employ a dedicated bus for accessing the system
cache. This is sometimes called a backside bus. Conventional processors
using fifth-generation motherboards and chipsets have the cache
connected to the standard memory bus.
The Memory Bus: This is a second-level system bus that connects the
memory subsystem to the chipset and the processor. In some systems the
processor and memory buses are basically the same thing.

Fig. 9-21. Different busses and I/O devices in the IBM PC.


The Local I/O Bus: This is a high-speed input/output bus used for
connecting performance-critical peripherals to the memory, chipset, and
processor. For example, video cards, disk storage devices, and high-speed
networks generally use such a bus. The two most common local I/O buses
are the VESA local bus and the PCI bus.
The Standard I/O Bus: Connecting to the above three buses is the
standard I/O ISA bus, used for slower peripherals (modems, sound cards,
low-speed network) and also for compatibility with older devices.

Nowadays, PCI Express has replaced AGP as the most common interface
for graphics cards. PCI Express is also used for gigabit Ethernet and Wi-
Fi. However, add-on cards are still generally PCI. Sound cards, modems
and other cards with low speed are still all PCI. For this reason most
motherboards still offer legacy PCI slots.

9-4.17. Bus Topologies


There exist three major topologies for cascading multiple devices through
an interface bus:
Multi Drop
Daisy Chain
Switched Hub

i- Multi Drop
In multi drop topology, the devices are connected in parallel on the bus.
The data transmitted by any device is sent to all other devices and it is up
to each device to accept or reject the data. If the data on the bus matches
the device criterion, the device may read the data. Otherwise the device
will stay inactive. A contention can happen if more than one device tries
to transmit data on the bus. In order to avoid this, multi drop buses have a
mechanism of collision detection and correction.

ii- Daisy Chain


In this topology, each device is connected to two adjacent devices.
The exceptions are the devices connected at the two ends of
the chain. The following diagram gives a clear idea of what the
daisy chain topology looks like. The major advantage of the daisy chain
topology is that, since it does not have a shared data path, bus contention is
practically zero. A major disadvantage is that a device's direct
accessibility is restricted to its two immediate peers only.


Fig. 9-22. Bus topologies. Multi-drop, Daisy-chain and Switched hub topology.

iii- Switched-Hub Topology


Switched hub topology uses a hub as a mediator for communication
between devices. All access requests are routed through the hub only. The
hub should be intelligent enough to route each request to the appropriate
connected device. The following picture shows a switched hub topology. The
major advantage of this topology is that the chance of collision is
virtually zero, since all requests are routed through the hub. The
disadvantage is that the hub can become a bottleneck in the network. The
most popular examples of
networks that use hub topology are Ethernet and USB.


9-4.18. PCMCIA and ExpressCards


The so-called CardBus was developed by the Personal Computer
Memory Card International Association (PCMCIA), for laptop memory.
This expansion was later developed and named Type II PCMCIA. These
cards led to all manner of devices being made available in this form.
Typical devices include hard disks, modems, and network cards. Type II
cards are 16-bit cards 5mm thick, 85.6 mm long, and 54.0 mm wide.

ExpressCard is the modern hardware standard which replaced Type II
cards for laptops and notebooks. The host device supports both PCI
Express and USB 2.0 connectivity. ExpressCard supports two form
factors, ExpressCard/34 (34 mm wide) and ExpressCard/54 (54 mm
wide); the connector is the same width (34 mm) on both. Due to the
shorter length of the ExpressCard standard, Type II cards inserted into
dual-purpose slots will protrude from the laptop. Laptop manufacturers
have started selling machines which have only ExpressCard ports.

Fig. 9-23. Cardbus and Express cards, for laptop and notebook computers

9-4.19. PC I/O Extension Cards


The I/O extension cards are expansion interface cards that can be inserted
into a PC bus extension slot (e.g., an ISA or a PCI slot) to provide
additional external I/O ports. The figure below depicts a simple ISA
extension card, with only a 24-line connection. The extension card delivers
3 additional programmable I/O ports via an 8255 PPI chip. Figure 9-24
depicts the circuit diagram of an ISA extension card and its layout. As
shown, the card has only 3 chips. The first one, U1, is the 8255 PPI. The
other two chips, U2 and U3, are 3-to-8 address decoders (74LS138). The
card occupies the address space (210H-213H). More details about the
8255 chip can be found in chapter 8 of this book. This extension card
provides extra I/O ports for use by external I/O devices. In our circuit, we
get three ports from connecting this card to the ISA bus. These extra
ports can be configured as either input or output ports by an application
program that you can develop using MASM 32.
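
As a rough sketch of what such an application program has to compute, the following Python fragment builds the 8255 mode-0 control word and lists the four port addresses of the card. It assumes that address lines A0 and A1 of the bus are wired straight to the A0 and A1 pins of the 8255, so that ports A, B, C and the control register appear at 210H, 211H, 212H and 213H respectively; the helper name is hypothetical, and the fragment only calculates values, it does not touch any hardware.

```python
BASE = 0x210                       # start of the card's address space (210H-213H)

# Assumed register order (A0/A1 wired directly to the 8255):
PORT_A = BASE + 0
PORT_B = BASE + 1
PORT_C = BASE + 2
CONTROL = BASE + 3


def mode0_control_word(a_input: bool, b_input: bool,
                       c_upper_input: bool, c_lower_input: bool) -> int:
    """Return the 8255 control word for simple I/O (mode 0) on all ports.

    Bit 7 = 1 marks a mode-set word; bits 6-5 and bit 2 stay 0 for mode 0.
    A direction bit of 1 means input, 0 means output.
    """
    word = 0x80
    if a_input:
        word |= 0x10               # bit 4: port A direction
    if c_upper_input:
        word |= 0x08               # bit 3: port C upper nibble direction
    if b_input:
        word |= 0x02               # bit 1: port B direction
    if c_lower_input:
        word |= 0x01               # bit 0: port C lower nibble direction
    return word


# Example: port A as input, ports B and C as outputs.
word = mode0_control_word(True, False, False, False)
print(f"write {word:02X}H to control register at {CONTROL:03X}H")
```

In an assembly program, the same value would simply be loaded into AL and sent to the control port with an OUT instruction before the data ports are used.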


Fig. 9-24. Circuit diagram and layout of an ISA bus extension card


9-5. PC Serial Ports


Serial communications are used for transferring data over long distances,
using a small number of wires. However, serial data received from a
MODEM or any other serial device has to be converted to parallel form so that
it can be transferred to the PC bus. This is the job of the PC serial port.

Fig. 9-25(a). Transmission of serial data, through serial ports.

PCs are usually equipped with two serial ports. IBM originally called
these communication ports COM ports (COM1 and COM2). The
serial ports are built using a universal asynchronous receiver/transmitter
(UART) chip to convert data from parallel to serial and from serial to
parallel form. Serial ports usually make use of the so-called RS232
interface standard (sometimes called EIA-232-D) that specifies the data
logic levels during data transfer. Serial ports are usually used for:

• Mouse and pointing devices
• Telephone MODEMs (which can also provide a fax connection)
• Serial printer with a serial interface
• Computer-to-computer file transfer (e.g., Laplink and Interlink)
• Simple network (software like Lantastic Z and Inter-server)
• Digitizer input device used for CAD drawing
• Digital camera
• Data acquisition and barcode readers
• Software security devices (dongles)

9-5.1. Introduction to Serial Communications & RS232


Modern computer communication links are based on the Open Systems
Interconnection (OSI) model. According to this model, the computer
communication link is subdivided into 7 layers (as shown in figure 9-26).

1. Physical layer - Hardware equipment of the bus and the layer that
passes the bit streams to and from the network.
2. Data Link layer - Deals with the messages at the bit and byte level and
provides data transfer control between a node and the network.
3. Network layer - Sets up addresses and delivers message packets.
4. Transport layer - Controls the sequencing of message components.
5. Session layer - Manages the data coordination during communication.
6. Presentation layer - Performs data conversion & encryption.
7. Application layer - Provides the user interface to lower levels.

Serial communication systems can be divided into simplex, half-duplex
and full-duplex systems. A simplex serial communication device sends
information in one direction only (e.g., a commercial radio station).
Half-duplex means that data can be sent in either direction between two
systems, but only in one direction at a time. In a full-duplex transmission
each system can send and receive data at the same time. There are two
ways to transmit serial data: synchronously or asynchronously.

Fig. 9-26. OSI model of a computer communication system


In a synchronous transmission, data is sent in blocks and the transmitter
and the receiver are synchronized by one or more special characters,
called sync characters. The serial port of the PC is an asynchronous
device. In asynchronous transmission, a start bit marks the beginning of
each character and 1 or 2 stop bits mark its end, so no separate
synchronization is needed. The data
bits are sent to the receiver after the start bit. The least significant bit is
transmitted first. A data character usually consists of 7 or 8 bits.
Depending on the configuration of the transmission, a parity bit is sent
after each character. It is used to check for errors in the stream of data.
Finally, 1 or 2 stop bits are sent.

1 2 3 4 5 6 7 8
Space
Mark
1 0 1 0 1 1 1 0 1 0 0
Start -------------- Data bits ----------- Parity Stop bits

Fig. 9-24. Serial data frame. The picture shows how data leaves the serial port for
the period character "." (0101110) with even parity.
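
The framing rules described above (start bit, data bits with the least significant bit first, optional parity, stop bits) can be written down as a small sketch. The function name uart_frame and its defaults are assumptions for illustration; logic 0 stands for SPACE and logic 1 for MARK.

```python
def uart_frame(char: str, data_bits: int = 7, parity: str = "even",
               stop_bits: int = 1) -> list:
    """Return the bits of one asynchronous character frame, in sending order."""
    code = ord(char)
    if code >= (1 << data_bits):
        raise ValueError("character does not fit in the chosen number of data bits")
    # Data bits are sent least significant bit first.
    data = [(code >> i) & 1 for i in range(data_bits)]
    frame = [0] + data                        # start bit is a SPACE (0)
    if parity in ("even", "odd"):
        parity_bit = sum(data) % 2            # makes the number of 1s even
        if parity == "odd":
            parity_bit ^= 1
        frame.append(parity_bit)
    elif parity != "none":
        raise ValueError("parity must be 'even', 'odd' or 'none'")
    frame += [1] * stop_bits                  # stop bits are MARKs (1)
    return frame


# The period character "." (0101110) with 7 data bits and even parity.
print(uart_frame("."))
```

For the period character the data bits 0101110 contain four 1s, so the even-parity bit is 0.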

Indeed, the majority of serial ports in the past used the RS232C
interface standard. The RS232 was first introduced in 1962 by the EIA,
and has remained widely used throughout the industry. In fact, RS232 is
still supported in many microcontrollers and embedded computer
projects. The RS232 signals are represented by voltage levels with
respect to a system ground. As shown in figure 9-24, the "idle" state
(MARK) has a negative signal level, and the "active" state (SPACE) has
a positive signal level. There are 2 types of RS-232 devices. The first is
called Data Terminal Equipment (DTE); a common example is a
computer. The other type is called Data Communications Equipment
(DCE); a common example is a MODEM. The RS-232 interface
pre-supposes a common ground between the DTE and the DCE. This is a
reasonable assumption when a short cable connects the DTE to the DCE,
but with longer lines or connections with different grounds, this may not
be true. RS232 data is bipolar. Thus, a level of +3 V to +12 V indicates an
"ON" or 0-state (SPACE), while a level of -3 V to -12 V indicates an "OFF"
or 1-state (MARK).
Modern computer equipment ignores the negative level and accepts a
zero voltage level as the "OFF" state. In fact, the "ON" state may be
achieved with lesser positive potential. This means circuits powered by
5V are capable of driving RS232 circuits directly. However, the overall
range of the RS232 signal may be dramatically reduced when transmitted
or received.
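
The signal levels just described can be summarized as a small classification routine. This is only a sketch using the ±3 V and ±12 V limits quoted in the text; the function name and the labels for the in-between and out-of-range cases are our own assumptions.

```python
def rs232_state(volts: float) -> str:
    """Classify an RS232 line voltage using the limits quoted in the text."""
    if 3.0 <= volts <= 12.0:
        return "SPACE"          # "ON", logic 0
    if -12.0 <= volts <= -3.0:
        return "MARK"           # "OFF", logic 1 (idle state)
    if -3.0 < volts < 3.0:
        return "UNDEFINED"      # between the two valid bands
    return "OUT_OF_RANGE"       # beyond the limits used in this sketch


for level in (9.0, -9.0, 0.5, 20.0):
    print(level, rs232_state(level))
```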

The region between -3V and +3V is an undefined dead area, and its width
may vary among the various RS-232-like definitions. RS232
has numerous handshaking lines. These lines specify the communication
protocol that controls data flow between the DTE (PC) and DCE
(external devices). Request to Send (RTS) is one of the hardware
handshaking signals. When the PC wants to send data to an external
device, it sets this pin to 0. In other words, it says "I
want to send you data. Is it OK?" The external device (MODEM) says it is
OK to send data by setting its Clear to Send (CTS) pin to 0. The PC then
sends the data. CTS is the other half of hardware
handshaking. As noted above, an external device, like a MODEM, sets
this pin to 0 when it is ready to receive data from the PC.

Another important data flow control is called software handshaking. Like
hardware handshaking, software handshaking is used to make sure
both communicating devices are ready to send/receive data. The most
popular "character flow control" is called XON/XOFF. The
receiver simply sends the XOFF character when it wants the transmitter to pause
sending data. When it is ready to receive data again, it sends the XON
character. XOFF is referred to as the hold-off character and XON as the
release character.

The types of buffer/driver ICs used in serial ports can
be divided into three general categories:

• Classical drivers, which require plus (+) and minus (-) voltage power
supplies, such as the 1488 series of ICs (most desktop PCs use this type).
• Low power drivers, which require one +5V power supply (such as the DS232).
This type of driver has an internal charge pump for voltage
conversion. Many industrial microprocessor controls use this type.
• Low voltage (3.3V) drivers, which meet the EIA-562 standard (for
laptops).
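
Returning to the XON/XOFF software handshaking described earlier, the receiver side can be sketched as below. This is a minimal sketch, not a definitive implementation: the structure, the function name and the high/low buffer marks are assumptions made for illustration. The character codes are the standard ASCII control codes DC1 (0x11) for XON and DC3 (0x13) for XOFF.

```c
#define XON  0x11   /* ASCII DC1: release character */
#define XOFF 0x13   /* ASCII DC3: hold-off character */

/* Hypothetical receiver-side flow control state. */
typedef struct {
    int fill;       /* bytes currently waiting in the receive buffer */
    int high_mark;  /* send XOFF when fill reaches this level */
    int low_mark;   /* send XON when fill drops to this level */
    int paused;     /* nonzero after XOFF has been sent */
} flow_ctl;

/*
 * Call whenever the buffer level changes.
 * Returns XOFF or XON if that character should be sent to the
 * transmitter, or 0 if nothing needs to be sent.
 */
int flow_update(flow_ctl *fc, int new_fill)
{
    fc->fill = new_fill;
    if (!fc->paused && fc->fill >= fc->high_mark) {
        fc->paused = 1;
        return XOFF;            /* buffer nearly full: ask sender to pause */
    }
    if (fc->paused && fc->fill <= fc->low_mark) {
        fc->paused = 0;
        return XON;             /* buffer drained: ask sender to resume */
    }
    return 0;
}
```

Using two separate marks avoids sending XOFF and XON in rapid succession when the buffer level hovers around a single threshold.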

9-5.2. Universal Asynchronous Receiver Transmitter (UART) Chip


The universal asynchronous receiver transmitter (UART) chip is the heart
of the PC serial port. This device performs the parallel-to-serial and
serial-to-parallel conversions and provides the handshaking between two
connected devices. The UART receives data from the I/O bus and sends a
start bit, eight bits of data, and a stop bit over one of the wires of an
external interface. In the other direction, it receives data one bit at a time
until the entire byte is assembled. It then generates an interrupt to the PC
so the system will read the new data from the chip and store it in the
computer memory. In brief, the UART chip converts the data from the
data bus into a serial flow and, when information is received, the UART
collects data into bytes (8 bits) and passes those bytes onto the data bus.

In order to do so, the UART has shift registers for parallel-to-serial and
serial-to-parallel conversions. The UART chip also provides all the
handshaking required to control the flow of data to and from the
computer and other serial devices. There are many different types of
UART chips, but the PC COM port is based on chips compatible with the
National Semiconductor 8250. The 8250 chip was used in the serial ports
of the PC/XT (based on the 8088 microprocessor), and the 16450 in the serial
ports of the PC/AT (80286) and then in 80386 and 80486 machines, until
early 1995. Over the years the maximum data rate provided by devices
connected to the serial ports has been steadily rising. Back in 1987, the
2.4 kb/s telephone MODEM was considered fast. By 2000, cost-effective
telephone MODEMs were transferring data at 56 kb/s.

The more recent ADSL MODEMs are much faster (about 1 Mb/s or
higher). The serial ports must keep up with the MODEM and therefore
the UART must be faster than the MODEM. The UART chips usually
include four internal registers:

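THR (Transmitter Holding Register): Temporary output register.
TSR (Transmitter Shift Register): Output register.
RDR (Receiver Data Register): Input register.
RSR (Receiver Shift Register): Temporary input register.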

Every character to be transmitted is stored in the THR register. The
UART adds the start and stop bits, then copies all bits (data, start, stop
bits) to the TSR. Finally, the bits are shifted out on the TD line.
Every character received from the RD line is stored in the RSR register.
The start and stop bits are removed and the UART writes this character
to the RDR. Finally, the character is read by the PC. Figure
9-25 depicts the 8250 UART connection with a MODEM.
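
The cooperation of the THR and TSR on the transmit side can be sketched as a small software model. This is only an illustration of the double-buffering idea, not a description of the 8250 internals: the structure and function names are hypothetical, and the framing bits are left out for brevity.

```c
/* Hypothetical model of the transmit half of a UART. */
typedef struct {
    unsigned char thr;   /* holding register written by the CPU */
    int thr_full;        /* 1 while THR holds an unsent character */
    unsigned char tsr;   /* shift register currently being sent */
    int tsr_busy;        /* 1 while TSR is shifting bits out */
} uart_tx;

/* CPU side: write a character to THR. Returns 0 if THR is still full. */
int uart_write(uart_tx *u, unsigned char c)
{
    if (u->thr_full)
        return 0;
    u->thr = c;
    u->thr_full = 1;
    return 1;
}

/* UART side: move THR into TSR when TSR is free. Returns 1 if moved. */
int uart_load_tsr(uart_tx *u)
{
    if (u->tsr_busy || !u->thr_full)
        return 0;
    u->tsr = u->thr;
    u->thr_full = 0;     /* THR is empty again: the CPU may write */
    u->tsr_busy = 1;
    return 1;
}

/* UART side: the last bit of the character in TSR has left the line. */
void uart_shift_done(uart_tx *u)
{
    u->tsr_busy = 0;
}
```

Because the CPU only ever touches the THR, it can queue the next character while the previous one is still being shifted out of the TSR.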

When the 8250 UART chip initiates transmission, the sequence of
operation is as follows:

1- The 8250 sends the DTR signal (Data Terminal Ready) to the MODEM (DCE)
2- The MODEM (DCE) responds with the DSR signal (Data Set Ready)
3- The 8250 sends the RTS signal (Request to Send) to the MODEM
4- The MODEM responds with the CTS signal (Clear to Send)
5- The 8250 starts transmission via SOUT or TxD (Transmitted Data)
6- The MODEM starts transmission via SIN or RxD (Received Data)


Fig. 9-25. Connection of the 8250 UART with a MODEM, for serial data
communication between two computers via a telephone line.

9-5.3. Description of the Serial Port.


The PC serial port has traditionally been compatible with the RS-232C
standard. The standard specifies 25 signal pins, and that the DTE
connector should be male and the DCE connector should be female.
The most used connector has been the DB25, although many PCs use a
DB9 connector. The voltage levels are -3V to -15V for a logic high and
+3V to +15V for a logic low. The most commonly used signals in a
serial port are listed below, for the case where data transmission is
initiated by the MODEM:

DTR (Data-Terminal-Ready): PC tells the MODEM that it is powered up
and ready to send data.
DSR (Data-Set-Ready): MODEM tells the PC it is powered up and ready
to transmit or receive data.
RTS (Request-To-Send): PC sets this signal when it has a character ready
to be sent.
CD (Carrier-Detect): MODEM sets this signal when it detects a carrier
from the remote MODEM.
CTS (Clear-To-Send): MODEM is ready to transmit data.
TxD: MODEM receives data from the PC.
RxD: MODEM transmits data to the PC.


9-5.4. How Many Wires do We Need for a Serial Connection?


An RS232 connection can be made with as few as 2 wires and a ground wire.
However, a full implementation of the standard uses 9 wires, to provide
handshaking capabilities, as shown in figure 9-21. The serial ports on the
back of the PC use either a DB9P or a DB25P plug, and the maximum
unshielded cable length is about 30 m. If you need to put a printer some
distance from the computer, it is necessary to use a serial interface rather
than a parallel interface.

Pin   Signal     Description
1     DCD        Data Carrier Detect
2     RxD        Received Data
3     TxD        Transmitted Data
4     DTR        Data Terminal Ready
5     SG (GND)   Signal Ground
6     DSR        Data Set Ready
7     RTS        Request (Ready) to Send
8     CTS        Clear to Send
9     RI         Ring Indicator
Fig. 9-26. DB9 serial cable interface.

Fig. 9-27. Typical serial cable connection.


9-5.5. Addressing the Serial Port.


The serial port requires a small range of I/O addresses and an IRQ line.
The original DOS assignments for the communication ports (COM1 and
COM2) are listed in the following table. With the introduction of DOS
version 3.1, two more serial ports (COM3 and COM4) were provided.

Table 9-3. Addresses of the PC communication ports

COM Port   I/O Address (hex)   IRQ
COM 1      3F8 to 3FF          IRQ 4
COM 2      2F8 to 2FF          IRQ 3
COM 3      3E8 to 3EF          IRQ 4
COM 4      2E8 to 2EF          IRQ 3
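
The assignments of Table 9-3 can be held in a small lookup table. The sketch below is illustrative only: the structure and function names are hypothetical, while the addresses and IRQ numbers are those of the table.

```c
#include <stddef.h>

/* One row of Table 9-3. */
typedef struct {
    const char *name;
    unsigned short base;   /* first I/O address of the 8-address block */
    int irq;
} com_port;

static const com_port com_ports[] = {
    { "COM1", 0x3F8, 4 },
    { "COM2", 0x2F8, 3 },
    { "COM3", 0x3E8, 4 },
    { "COM4", 0x2E8, 3 },
};

/* Return the entry for COMn (n = 1..4), or NULL if n is out of range. */
const com_port *com_lookup(int n)
{
    if (n < 1 || n > 4)
        return NULL;
    return &com_ports[n - 1];
}

/* Last I/O address of the port: each port occupies 8 addresses. */
unsigned short com_last_address(const com_port *p)
{
    return (unsigned short)(p->base + 7);
}
```

Note that COM1 and COM3 share IRQ 4, and COM2 and COM4 share IRQ 3, so two ports on the same IRQ line cannot normally be serviced by independent interrupt handlers at the same time.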

There are two ways to address the serial port: by the 14H BIOS interrupt
and by the 21H DOS interrupt. The 14H BIOS interrupt uses four
functions to program the serial port. Each function is selected by assigning a
value to the AH register of the CPU. We list these functions below:

Function 00H: Initializes the serial port; sets speed, data bits, stop bits and parity.
Function 01H: Sends a character to the specified serial port.
Function 02H: Reads a character from the specified serial port.
Function 03H: Returns the state of the specified serial port.

There are three functions in the 21H DOS interrupt related to the
operation of the serial port:

Function 03H: Reads a character from the COM1 serial port.
Function 04H: Writes a character to the COM1 serial port.
Function 40H: Sends bytes from a buffer to a device.

9-5.6. Programming the Serial Port.


The following program shows how two PCs can communicate via the
serial port:
// Program to communicate between two PCs via the serial port
// BIOS function 00H: configuration byte (in AL register)
// bits 7 6 5   Baud rate
//      0 0 0   110
//      0 0 1   150
//      0 1 0   300
//      0 1 1   600
//      1 0 0   1200
//      1 0 1   2400
//      1 1 0   4800
//      1 1 1   9600


// bits 4 3     Parity
//      0 0     no parity
//      0 1     odd parity
//      1 1     even parity
// bit 2        Stop bits
//      0       1 stop bit
//      1       2 stop bits
// bits 1 0     Number of bits per data character
//      1 0     7 data bits
//      1 1     8 data bits
// Register DX: 0->COM1, 1->COM2, 2->COM3, 3->COM4
// Configuration: 9600 bit/s, no parity, 2 stop bits,
// 8 data bits. AL register value is 11100111 (0xE7)
#include <stdio.h>
#include <process.h>
#include <conio.h>
#include <dos.h>
#include <bios.h>
#define TRUE 1
#define PARAM 0xE7 // 9600 bit/s, no parity, 2 stop bits, 8 data bits
#define COM1 0
#define COM2 1

void init_port(void);
char state_port(void);
void send_byte(unsigned char);
unsigned char read_byte(void);
void keyb(void);
int tecla = 1;
//**************************************************

int main ( void )
{
    unsigned char read_com, read_kb = 0;
    clrscr();
    init_port();
    while(read_kb != 'c') {          // type 'c' to quit
        read_kb=getch();             // wait for a key
        send_byte(read_kb);          // send it to the other PC
        read_com=read_byte();        // 0 means no character was waiting
        if(read_com!=0){printf("%c",read_com);}
    }
    return 0;
}//*************End main()**************************

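void init_port()
{
    union REGS regs;
    regs.h.ah = 0x00;        // BIOS function 00H: initialize the port
    regs.x.dx = COM2;        // port number goes in DX
    regs.h.al = PARAM;       // configuration byte goes in AL
    int86( 0x14, &regs, &regs);
}//*************************************************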

char state_port() // return state of the port*******
{
    union REGS regs;
    regs.h.ah = 0x03; regs.x.dx = COM2;
    int86( 0x14, &regs, &regs);


    if(regs.h.ah & 0x80) printf("\t TIMEOUT\n");
    if(regs.h.ah & 0x40) printf("\t TSR EMPTY\n");
    if(regs.h.ah & 0x20) printf("\t THR EMPTY\n");
    if(regs.h.ah & 0x10) printf("\t BREAK DETECTED\n");
    if(regs.h.ah & 0x08) printf("\t FRAMING ERROR\n");
    if(regs.h.ah & 0x04) printf("\t PARITY ERROR\n");
    if(regs.h.ah & 0x02) printf("\t OVERRUN ERROR\n");
    // bit 0 (0x01) means "data ready" and is tested by the caller
    return (regs.h.ah); }
//**************************************************

void keyb() //Keyboard handler********************
{
    union u_type{int a;char b[3];} keystroke;
    char inkey=0;
    if(bioskey(1)==0) return;        // no key waiting
    keystroke.a=bioskey(0);          // read the key
    inkey=keystroke.b[1];            // high byte of a 16-bit int: scan code
    switch (inkey)
    {
        case 1: tecla=0; return;     // ESC
        default: tecla=15;
        return;
    }
}//****************************************************

void send_byte(unsigned char byte) // Send char to serial port
{
    union REGS regs;
    regs.h.ah = 0x01;        // BIOS function 01H: send a character
    regs.x.dx = COM2;
    regs.h.al = byte;
    int86( 0x14, &regs, &regs);
    if( regs.h.ah & 0x80) {  // bit 7 set: the character was not sent
        printf("\t SENDING ERROR ");
        exit(1); }
}//************************************************

unsigned char read_byte() //read a char from serial port
{
    union REGS regs;
    if((state_port() & 0x01)) {      // bit 0 set: data ready
        regs.h.ah = 0x02; regs.x.dx = COM2;
        int86(0x14,&regs,&regs);
        if(regs.h.ah & 0x80) {printf("\t RECEIVING ERROR"); exit(1);}
        return(regs.h.al);}
    else return(0);
}//***************************************************
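
The configuration byte described in the comments at the top of the program can also be composed from its fields instead of being written as a constant. The following portable sketch shows one way to do this; the names baud_code, make_param and the PARITY_ constants are hypothetical, and only the bit layout is taken from the program comments.

```c
/* Parity field values (bits 4 and 3 of the configuration byte). */
enum { PARITY_NONE = 0, PARITY_ODD = 1, PARITY_EVEN = 3 };

/* Map a baud rate to its 3-bit code (bits 7..5); -1 if unsupported. */
int baud_code(long baud)
{
    static const long rates[8] = {110, 150, 300, 600,
                                  1200, 2400, 4800, 9600};
    for (int i = 0; i < 8; i++)
        if (rates[i] == baud)
            return i;
    return -1;
}

/*
 * Compose the AL value for BIOS interrupt 14H, function 00H.
 * Returns -1 if any argument is outside the documented choices.
 */
int make_param(long baud, int parity, int stop_bits, int data_bits)
{
    int code = baud_code(baud);
    if (code < 0)
        return -1;
    if (parity != PARITY_NONE && parity != PARITY_ODD &&
        parity != PARITY_EVEN)
        return -1;
    if (stop_bits != 1 && stop_bits != 2)
        return -1;
    if (data_bits != 7 && data_bits != 8)
        return -1;
    return (code << 5)                    /* bits 7..5: baud rate */
         | (parity << 3)                  /* bits 4..3: parity */
         | ((stop_bits - 1) << 2)         /* bit 2: stop bits */
         | (data_bits == 8 ? 3 : 2);      /* bits 1..0: data bits */
}
```

With this helper, make_param(9600, PARITY_NONE, 2, 8) gives 0xE7, the value quoted in the program comments.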

9-5.7. Universal Serial Bus (USB)


The Universal Serial Bus (USB), introduced in 1997, took some time to
gain acceptance, but by the middle of 1998 manufacturers were starting to
make peripheral devices interfaced to the PC via USB. USB has
now become the standard serial bus for notebook and desktop
computers. The USB standard was developed by Compaq, Digital
Equipment, IBM, Intel, Microsoft, NEC, and Northern Telecom.

It is an open and royalty-free standard. All that is required is to plug a
pair of external USB connectors into a socket on the board. USB
provides advantages over the old mixture of serial, parallel and games
(SPG) ports, and custom I/O interfaces used for existing devices like
keyboards, mice, printers, data modems, backup storage and scanner
devices. It can also be used to interconnect computer systems.
The biggest factor in its wide acceptance by the computer industry is its "ease
of use". Some key features of USB that provide ease of use include:

• Completely Plug and Play - Peripherals will be correctly detected and
configured automatically as soon as they are attached to a computer.
• Hot attach and detach - Devices can be added and removed at any
time, without powering down or rebooting the computer system.
• USB should provide a direct connection to the phone network with
these advantages:
• Enable PBX and digital phone connectivity without add-in cards.
• Enhanced Time Division Multiplexing (eTDM) to support high speed
digital telephone trunk interfaces, like ISDN, T1 or E1 lines.

[Figure: male and female USB connectors, host computer side and device side]

Fig. 9-28. USB connectors. USB device side (Type-B plug) and host computer side
(Type-A plug).


A single connector can be used for chaining many devices that had to be
interfaced via Serial, Parallel or Games Ports in the past. All kinds of
devices can be hooked to the PC through the same connector
simultaneously. These devices include:

• Keyboards, mice and other pointing devices
• Modems
• Printers, graphical scanners, bar code scanners and digitizers
• Digital cameras and web cams
• Memory sticks (mobile flash memory devices)

USB requires less real estate (less space on the back plane) than existing
I/O ports and this is particularly important for laptop and hand-held PDA
systems. It reduces the number of BUS slots required on the system
board, allowing a footprint reduction for desktop systems.

Fig. 9-29. USB cables lines. Data is transferred on differential pair (D-, D+).

The USB system is generally composed of 3 parts:

• USB host, which is usually the computer (only one host is allowed)
• USB devices (serial devices, like mice and keyboards, or hubs)
• USB interconnects.

USB Specifications
The old USB 1.1 had ample bandwidth for digital gaming peripherals and
video applications, and provided cost effective connection for peripheral
devices. The main features of USB 1.1 are as follows:

• 12 Mb/s design with low cost peripherals
• Supports up to 127 devices
• Both isochronous and asynchronous data transfers
• Up to 5 m cable length
• Built-in power distribution (VBUS) for low power devices
• Supports chaining through a tiered star multi-drop topology, via hubs

USB 2.0 is 40 times faster than USB 1.1. It provides additional bandwidth for
peripherals that may be attached to your computer. USB 2.0 moves data at
480 Mb/s, and is backward compatible with the old USB devices. USB 3.0
is the newer SuperSpeed Universal Serial Bus standard, and can
support speeds up to 5 Gb/s. USB 3.0 is backward compatible with USB 2.0. Much
like its predecessors, USB 3.0 offers a variety of plug and receptacle
types. Note that the "B" plug on USB 3.0 is different from that of USB 2.0.

Fig. 9-30. USB 3.0 male plugs. Note that the B plug is different from that of USB 2.0.

9-5.8. USB to RS232 Interface.


The FT232R is a USB to serial UART interface with optional clock
generator output. The FT232R adds two new functions compared with its
predecessors, effectively making it a "3-in-1" chip for some application
areas. The internally generated clock (6MHz, 12MHz, 24MHz, and
48MHz) can be brought out of the device and used to drive a
microcontroller or microprocessor. The following figure depicts the
FT232R circuit for the RS232-USB interface.

A unique identification number is burnt into the FT232R device during


manufacture and is readable over USB, thus forming the basis of a
security dongle which can be used to protect customer application
software from being copied. Software drivers, which allow FTDI devices
to work with several operating systems like Windows XP/Vista, are
available. For most of these operating systems two types of driver are
available: Virtual COM Port (VCP) drivers and direct (D2XX) drivers.
The VCP driver emulates a standard PC serial port such that the USB
device may be communicated with as a standard RS232 device. The

D2XX driver allows direct access to a USB device via a DLL interface.

Fig. 9-31. FT232R USB-UART chip for the RS232-USB interface.

9-5.9. Other Serial Bus Standards (ACCESS.bus, Fire Wire, IrDA)


The conventional RS232 standards based serial interfaces provided on
PC's are limited to a data rate of about 115 kb/s and this is only suitable
for slow peripheral devices like mice, printers and telephone modems.
The traditional parallel port has been enhanced to a point where it is fast
enough to be used to connect many devices today but its biggest problem
is with the standard cables used between the PC and devices connected
via the parallel port. The cables used for a Serial connection are far
simpler than those used for a Parallel connection, requiring only two to
four wires.

A. ACCESS.bus
The concept of ACCESS.bus (or A.b) was originally developed by
Philips Semiconductors and Digital Equipment Corp. (DEC) in the early
1990's, and was taken over by the Independent ACCESS.bus Industry
Group (ABIG). As compared to the I2C bus, ACCESS.bus adds two wires to
provide power to the connected devices (+5 V and GND). Also, A.b
supports the 100 kbit/s and 10 kbit/s modes. Compared to USB, A.b has
several advantages. One is that any device on the bus can be a master or a
slave, and a protocol is defined for selecting which role a device takes under
any particular circumstance. This allows devices to be plugged together
with A.b without a host computer. For instance, a digital camera could be
plugged directly into a printer and become the master. Under USB the
computer is always the master and the devices are always slaves. At first,
ACCESS.bus showed the potential to become an industry standard. On
the downside, A.b is much slower than USB. For this reason, it did not
attract much support from PC hardware manufacturers.

B. FireWire (IEEE 1394)


FireWire is a fast universal serial bus, which was designed by Apple.
FireWire has been endorsed by the IEEE, which has formulated a
specification for the technology, outlined in document IEEE 1394.
FireWire 400 (IEEE 1394a) has a transfer rate of 400 Mb/s, almost the
same as USB 2.0. There is also an even faster version with an 800 Mb/s transfer
rate (called FireWire 800 or IEEE 1394b). FireWire allows for 63 devices
on a single bus without pre-assigning addresses and without the need for
terminating devices. The FireWire bus can be used in daisy chain, star or
tree topologies. The FireWire connectors are derived from the Nintendo
GameBoy design and use either a friction detent (standard) or special
side-locking tab restraints (you squeeze the sides of the connectors for
removal). The most common FireWire interface is present in 6-pin
(desktops) and 4-pin (laptops, camcorders, etc.) variants.

4-pin FireWire connectors 6-pin FireWire connectors


Fig. 9-32. FireWire 400 (IEEE 1394a) connectors.

In the computer industry some people see USB and FireWire as
complementary technologies. They see USB being used for devices
that do not need a high-speed transfer rate, such as keyboards,
mice and modems, and FireWire being used for video processors, video
capture, virtual reality gaming devices and high speed data transfer to and
from external storage devices.

C. The IrDA Bus


IrDA is a standard that was started by the Infrared Data
Association in June 1993. The IrDA maximum data rate rose from the
original 115 kb/s to 4 Mb/s. It was intended as an alternative to wireless
(RF) networks for connecting laptop computers to local area networks
(LANs), or for replacing parallel or serial cables to printers. The first
IrDA standard was published in June 1994, and it was revised in August
1994 with the high speed extensions to the specification.

Fig. 9-33. Connection of the PC with peripheral devices, via the IrDA port.

The original basic specifications for IrDA were:

• Serial, asynchronous communications
• Range of at least 1 m
• Data rate, 2400 b/s to 115 kb/s (kilobits per second)

Support for IrDA must be provided at both the hardware and software
level. Many of the system boards have an IrDA interface and hardware
support provided by the BIOS. Many Laptop Computers have an IrDA
port built in and some printers have IrDA interfaces.


The implementation of IrDA on system boards usually takes the place of
one of the conventional serial ports. This may not be a problem with
system boards which provide a PS/2 mouse port. This also means the data
rate is limited to the fastest speed available from the UART, 115 kb/s.
However, higher speeds are included in the IrDA specification. The
high-speed extensions to the
IrDA standard allow data rates of 1.152 Mb/s and 4 Mb/s. High speed
devices are backward compatible with the first IrDA standard and devices
built to this specification.

9-5.10. PC-to-PC Communication (Networking & Ethernet).


The most popular reason to network PC's is to allow multiple users to
share files, applications and hardware resources. For instance, you can
share a single printer, or a single modem with one phone line, or an
Internet account for Web access. Network interface cards (NICs) act as
the physical interface or connection between the computer and the
network cable. Figure 9-34 shows a NIC with a coaxial-cable connection.
The cards are installed in an expansion slot in each computer and server
on the network. The role of the NIC is to:

• Prepare data from the computer for the network cable.
• Send the data to other computers.
• Control the flow of data between the computer and the cables.

Ethernet is the most popular networking topology standard for computer


connections. Ethernet was invented in the 1970's at the Xerox Research
Center and formalized as a universal standard (IEEE 802.3) in 1985.
There have been many kinds of Ethernet, but the most popular is 10/100
Mb/s running over copper twisted pair wires. 100 Mb/s Ethernet is also
called 100Base-T and Fast Ethernet.

Older Ethernet standards ran on coaxial cable and were referred to as


10Base2 thin Ethernet and 10Base5 thick Ethernet. Another Ethernet
standard called Gigabit Ethernet or 1000 Base-T can run over copper
wires. The 10/100 Ethernet cables have 8 wires, of which 4 are used for
data. The other wires are twisted around the data lines for electrical
isolation from electrical interference. The cables end in RJ-45 connectors
that resemble large telephone line connectors. Two kinds of wiring
schemes are available for Ethernet cables: patch cables and crossover
cables. Crossover cables are special because, with a single cable, two
computers can be directly connected together without a hub or switch.


Fig. 9-34. Network interface card, with BNC connector.

Fig. 9-35. PC networking via hubs and switches.

Hubs connect your computers together and switches act as full-duplex
traffic cops, making your network more efficient. If you are connecting
many computers via a hub or a switch, you need interface cards and patch
cables. There are different grades of cable quality. The most common are
CAT5, CAT5e and CAT6. CAT5 is good for most purposes and can
transfer data at 100 Mb/s. CAT5e is rated for 200 Mb/s and CAT6 is rated
for gigabit Ethernet. In order to cope with distance limitations,
the network signals are amplified with repeaters. A repeater is a
device that allows multiple and longer cables to be joined.


Figure 9-36. PC Networking via hubs or switches, with different configurations.

Traditional Ethernet employs a bus topology, meaning that all devices or
hosts on the network use the same shared line. Each device possesses an
Ethernet address. Data sent over the Ethernet exists in the form of
frames. An Ethernet frame contains a header, a data section, and a footer
having a combined length of up to 1518 bytes. The Ethernet header
contains the addresses of both the intended recipient and the sender. Data
sent over the Ethernet is automatically broadcast to all devices on the
network. By comparing its Ethernet address against the address in the
frame header, each Ethernet device tests each frame to determine if it was
intended for it and reads or discards the frame as appropriate. Devices
wanting to transmit on the Ethernet first perform a preliminary check to
determine whether the medium is available or whether a transmission is
currently in progress. If the Ethernet is available, the sending device
transmits onto the wire. It is possible, however, that two devices transmit
at the same time. When such a collision occurs, it causes both transmissions to
fail and both devices must re-transmit. Ethernet uses an algorithm

based on random delay times to determine the proper waiting period
between re-transmissions.
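
The random-delay rule can be illustrated with the truncated binary exponential backoff used by IEEE 802.3, which the text above does not name. The sketch below is a simplified model under that assumption: the function names are hypothetical, the delay is expressed in whole slot times, and the standard C rand() function stands in for the random source of a real adapter.

```c
#include <stdlib.h>

#define MAX_ATTEMPTS  16   /* the frame is abandoned at the 16th collision */
#define BACKOFF_LIMIT 10   /* the range stops doubling after 10 collisions */

/* Largest number of slot times that may be drawn after `collisions`
   successive collisions of the same frame (collisions >= 1). */
unsigned backoff_max_slots(int collisions)
{
    int k = collisions < BACKOFF_LIMIT ? collisions : BACKOFF_LIMIT;
    return (1u << k) - 1u;
}

/*
 * Number of slot times to wait before the next attempt.
 * Returns -1 when the frame should be given up.
 */
int backoff_slots(int collisions)
{
    if (collisions < 1 || collisions >= MAX_ATTEMPTS)
        return -1;
    /* Draw a whole number in the range 0 .. 2^k - 1. */
    return (int)((unsigned)rand() % (backoff_max_slots(collisions) + 1u));
}
```

Because the range doubles with each successive collision, two stations that collided once are increasingly unlikely to pick the same delay again.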

Crossover Cable                      Straight-Thru Cable
RJ-45 PIN (PC)   RJ-45 PIN (PC)      RJ-45 PIN (PC)   RJ-45 PIN (Hub)
1 Tx+            3 Rx+               1 Tx+            1 Rx+
2 Tx-            6 Rx-               2 Tx-            2 Rx-
3 Rx+            1 Tx+               3 Rx+            3 Tx+
4                4                   4                4
5                5                   5                5
6 Rx-            2 Tx-               6 Rx-            6 Tx-
7                7                   7                7
8                8                   8                8
Fig. 9-37. The Ethernet standard connector (RJ-45 plug and cables)
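
The two wiring schemes can be expressed as pin-mapping functions. This is an illustrative sketch with hypothetical function names; the mapping itself (pins 1 and 3 swapped, pins 2 and 6 swapped, all other pins straight through) is the one listed above for 10/100 Ethernet cables.

```c
/* Far-end pin of a straight-thru cable; 0 if the pin number is invalid. */
int straight_pin(int pin)
{
    return (pin >= 1 && pin <= 8) ? pin : 0;
}

/* Far-end pin of a crossover cable; 0 if the pin number is invalid. */
int crossover_pin(int pin)
{
    switch (pin) {
    case 1: return 3;    /* transmit pair goes to the receive pair */
    case 2: return 6;
    case 3: return 1;    /* receive pair goes to the transmit pair */
    case 6: return 2;
    default:
        return straight_pin(pin);   /* pins 4, 5, 7, 8 are not crossed */
    }
}
```

Applying crossover_pin twice returns the original pin, which is why a crossover cable can be plugged in either way round.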

Routers and bridges link two or more individual Local Area Networks
(LANs) to create an extended-network LAN or Wide Area Network
(WAN). Routers, in particular, are physical devices that link
multiple wired or wireless networks. Home networks often use an
Internet Protocol (IP) wired or wireless router, where IP is the most
common OSI network layer protocol. An IP router such as a DSL
MODEM router can join the home LAN to the Internet WAN. Since
2000, 802.11b has become the standard wireless Ethernet networking
technology for wireless LANs (WLANs). As shown in the following
figure, an ADSL (asymmetric digital subscriber line) circuit connects
an ADSL modem on each end of a twisted-pair telephone line. The
802.11b is a half duplex protocol: it can send or receive, but not both at
the same time. The 802.11b adapter cards come in two major forms,
namely, PC Cards for laptops and USB cards for desktops. In addition,
there are PCI adapters that let you plug a PC Card into a PCI slot. An
802.11b wireless network adapter can operate in two modes, Ad-Hoc and
Infrastructure. In infrastructure mode, all your traffic passes through a
wireless 'access point'. In Ad-hoc mode your computer talks directly to
other computers and does not need any access point.

Fig. 9-38(a). Wide area network (WAN) and Internet connection to a local area
network (LAN) via wideband DSL modems.

The Wi-Fi organization was created to ensure interoperability between
802.11b products. So, WiFi refers to a set of wireless networking
technologies more specifically referred to as 802.11a, 802.11b and
802.11g. The WiFi standards are universal, and allow users that have
WiFi devices, like a laptop, to connect anywhere a WiFi access
point is available. The WiFi standards signify the speed of the
connection. The 802.11b (which transmits at 11 Mb/s) is the most
common, although it is quickly being replaced by the faster WiFi
standards. Both 802.11a and 802.11g are capable of 54 Mb/s. Generally
speaking, all of the WiFi standards are fast enough to allow a
broadband connection. Table 9-4 shows a comparison between different
networking technologies. The 802.11b uses the same 2.4 GHz carrier as
many cordless and cell phones.


Fig. 9-38(b). Wide area network (WAN) and Internet connection to a local area
network (LAN) via wideband modems and a hub or a switch.

Table 9-4. Comparison between different communication networking technologies.

Technology            Speed (Mb/s)   Type           Range   Cost
FireWire (IEEE1394)   393-3145       Cables         D↓      A↑
USB 1.1/2.0/3.0       12/480/4800    Cables         D↓      A↑
Ethernet 10/100       10/100         Cables         A       A
Ethernet Gigabit      1/100 Gb/s     Cables         A       D
IrDA                  1-100          Wireless, IR   D       C
WiFi 802.11b          11             Wireless, RF   B       B
WiFi 802.11a          52/72          Wireless, RF   C       C
WiFi 802.11g          22/54          Wireless, RF   C       NA
Bluetooth             1-3            Wireless, RF   D       C
PowerLine (PLC)       15             Cables         D↓      A↑

9-5.11. Switching Networks.


A switched network goes through a switch instead of a router. This
is the way most networks are headed: toward flat switched networks
instead of routers. A router operates at Layer 3 of the OSI Model and can
create and connect several logical networks, including those of different
network topologies, such as Ethernet and Token Ring. Being a Layer 3
device, the router uses the destination IP address to decide where a packet
should go.

A switch is much like a bridge, as it looks at data destination addresses to
determine where data should be directed. However, switches are superior
to bridges because they provide greater port density and intelligence. The
three most common switching methods are:

1. Store-and-Forward - The entire frame is copied into the switch
memory and it stays there while the switch checks for errors. If the frame
contains no errors, it will be forwarded; otherwise it will be dropped.
2. Cut-through - Streams data so that the first part of a packet
is forwarded before the rest of the packet has finished entering the switch.
3. Fragment-free Switching - This is a hybrid of cut-through and
store-and-forward.

Data switching networks, that is, networks which are designed for data
transmission by switches, can be broadly divided into two categories:
1- Circuit switching networks
2- Packet switching networks
In circuit-switched networks, network resources are static, set over
copper wires from the sender to the receiver, thus creating a circuit. The
resources remain dedicated to the circuit during the entire transfer and the
entire message follows the same path. Examples of circuit switching
networks are the public switched telephone network (PSTN) and the
cellphone network. In such networks, each circuit cannot be used by other
callers until the circuit is released and a new call is set up. In packet-switched
networks, the message is broken into small entities called
frames or packets. Each packet is labeled with its destination. These
packets may take different routes to the destination, where the packets are
assembled into the original message. The so-called Asynchronous
Transfer Mode (ATM) combines voice and data communication using
short packets (called cells) of 53 bytes, as shown in figure 9-39.

 ATM Cell (53 bytes) 


Header (5 Bytes)             | Data (48 Bytes)
Address | Control | Checksum | D0 | D1 | ... | D46 | D47

Fig. 9-39. ATM data packets (cells) as compared to conventional RS 232 frames.
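
The fixed cell size makes the cost of the 5-byte header easy to
estimate. The following C sketch counts the cells needed for a message
and the bytes actually sent. It assumes the whole 48-byte data field is
available to the message and that the last cell is padded; real
adaptation layers add further overhead that is ignored here. The
function names are hypothetical.

```c
#define ATM_CELL_BYTES    53   /* total cell size (from the text)   */
#define ATM_HEADER_BYTES   5   /* header size (from the text)       */
#define ATM_PAYLOAD_BYTES 48   /* data bytes per cell (from the text) */

/* Number of cells needed to carry msg_bytes of data. */
unsigned long atm_cells_needed(unsigned long msg_bytes)
{
    /* round up: a partly filled last cell still has to be sent */
    return (msg_bytes + ATM_PAYLOAD_BYTES - 1) / ATM_PAYLOAD_BYTES;
}

/* Bytes sent on the link, headers and padding included. */
unsigned long atm_bytes_on_wire(unsigned long msg_bytes)
{
    return atm_cells_needed(msg_bytes) * ATM_CELL_BYTES;
}
```

For example, a 1000-byte message needs 21 cells, that is 1113 bytes on
the link.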

9-6. PC Parallel Ports


The parallel port is a standard interface designed to connect a computer
to peripheral devices and to transfer data between them several bits at
a time (in parallel). It was used in IBM PC & compatible computers to
send data to parallel printers. Besides the data lines, this interface
drives some input and output signals, whose purpose is to let the PC
know the state of the printer and control it. The parallel port has an
8-bit data bus to carry the information sent to the printer. The
parallel port standard is based on the Centronics parallel interface
standard, but it has been modified to be bi-directional. Some older
parallel port hardware in some DOS-type computers is not fully
bi-directional and will not work with some devices, such as pocket hard
drives and tape backup drives.

Fig. 9-40. Transmission of parallel data, through parallel ports.

Parallel ports can generally be used for:

• Printers with a parallel interface
• Computer-to-computer file transfer (e.g., with Interlink)
• Simple networks (like LANtastic Z)
• Low-cost tape backup drives
• External CD-ROM drives
• Optical scanners and digital cameras
• SCSI interface adaptors that provide SCSI via the parallel port
• Pocket hard disk drives interfaced via the parallel port
• Data acquisition and control
• Software security devices (dongles)


The standard parallel cable has a DB25P (plug) on the computer end (a
socket is used on the computer) and a 36-pin Centronics plug on the
printer end. The cable should be shielded and should be no longer than
3 m. When ASIC chips were first used to provide the parallel port, some
of them had trouble driving long cables (over 3 m) because they had
CMOS outputs rather than TTL outputs and could not drive highly
capacitive loads.

9-6.1. Parallel Port Architecture


The parallel port, as implemented in the original IBM PC, consisted of a
DB25S connector with 17 signal lines and 8 ground lines. The signal
lines can be divided into three groups:

• Data (8 lines)
• Control (4 lines)
• Status (5 lines)

Both the Status and Control lines (9 lines) provide handshaking. All these
signals are connected to a 25-pin connector (DB25), as shown in figure
9-41. All the bits have TTL logic levels. The signal lines are listed below:

Outputs:
• STROBE (pin 1): Tells the printer when the 8 data bits are ready to be
  read. It goes to a low logic level when the data are ready.
• D0-D7 (pins 2-9): Data bits.
• AUTO FD (pin 14): Tells the printer to insert a line feed automatically
  after each carriage return.
• INIT (pin 16): Resets the printer.
• SLCT IN (pin 17): Selects the printer when it goes to a low logic
  level.
Inputs:
• ACK (pin 10): Tells the CPU that the data has been correctly received.
• BUSY (pin 11): The printer sets this line when its buffer is full. The
  computer then stops sending data.
• PE (pin 12): Paper End. The printer is out of paper.
• SLCT (pin 13): Tells the computer that a printer is present.
• ERROR (pin 15): An error occurred. The CPU stops sending data.

The control lines (C0-C3) were originally designed as flow control
(handshaking) signals from the PC to the printer.

The status lines (S3-S7) were used for Flow Control signals and as
Status Indicators for such things as paper empty, busy indication and
interface or peripheral errors. The data lines (D0-D7) were used to
provide data from the PC to the printer, in that direction only. Later
implementations of the parallel port allowed for data to be driven from
the peripheral to the PC.

Fig. 9-41. Parallel port DB25 connector.
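
Software sees the five status lines as bits S3-S7 of a single byte. The
C sketch below decodes such a byte into named flags. The assignment of
signals to bits (ERROR = S3, SLCT = S4, PE = S5, ACK = S6, BUSY = S7,
with BUSY inverted by the port hardware and ERROR and ACK active low)
follows the conventional PC printer port layout and is an assumption
here, since the text only names the lines; the structure and function
names are hypothetical.

```c
#include <stdbool.h>

/* Decoded view of the status byte (true = condition is present). */
struct lpt_status {
    bool error;      /* pin 15, active low on the wire            */
    bool selected;   /* pin 13, printer is on line                */
    bool paper_end;  /* pin 12, printer is out of paper           */
    bool ack;        /* pin 10, active low on the wire            */
    bool busy;       /* pin 11, inverted by the port hardware     */
};

struct lpt_status lpt_decode_status(unsigned char s)
{
    struct lpt_status st;

    st.error     = (s & 0x08) == 0;  /* S3 = 0: printer reports an error */
    st.selected  = (s & 0x10) != 0;  /* S4 = 1: printer selected         */
    st.paper_end = (s & 0x20) != 0;  /* S5 = 1: paper end                */
    st.ack       = (s & 0x40) == 0;  /* S6 = 0: ACK pulse in progress    */
    st.busy      = (s & 0x80) == 0;  /* S7 = 0: printer is busy          */
    return st;
}
```

A status byte of DFh, for instance, decodes as an on-line printer that
is not busy, has paper and reports no error.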

9-6.2. IBM PC-Printer Connector Cable


The printer cable has a DB25P connector on the computer end and a
36-pin Centronics connector on the printer end. To limit the radio
frequency interference (RFI) generated, the cable should be shielded at
both ends. The original limit on cable length was 3 m, but this depends
on the type of parallel port hardware; some ports can drive far longer
cables.
Table 9-5. List of pins of the Parallel Interface cable.

PIN NAME         DB25 PIN   36-PIN CENTRONICS   DESCRIPTION
Strobe           1          1                   1 µs pulse used to clock data into the printer
Data 0           2          2
Data 1           3          3
Data 2           4          4
Data 3           5          5
Data 4           6          6
Data 5           7          7
Data 6           8          8
Data 7           9          9
Acknowledge      10         10                  acknowledge signal from printer to PC
Busy             11         11                  used by the printer to stop the flow of data
Paper Empty      12         12                  indicates the printer has run out of paper
Select Out       13         13                  indicates the printer is "on line"
Auto Feed        14         14                  not often implemented - wired to ground
Error            15         32                  indicates a fault in the printer
Initialization   16         31                  clears the printer's buffers and resets defaults
Select Input     17         36                  a signal on this line has the same effect as the select button
Ground           18-25      16, 19-30, 33       paired with Data pins 2 to 9 as shields

9-6.3. Parallel Port I/O Addressing


IBM PC & compatible computers support up to three LPTn ports. The
addresses of the data, status, and control registers for each LPTn port
are listed below in Table 9-6. Each port works in the same way as LPT1,
so the same software can drive any of them simply by changing the port
addresses.
Table 9-6 Addresses of the PC line printers (LPTn)

PORT   DATA   STATUS   CONTROL
LPT1   378H   379H     37AH
LPT2   278H   279H     27AH
LPT3   3BCH   3BDH     3BEH
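
In every row of the table the three registers occupy consecutive I/O
addresses, so a program only needs to know the data (base) address of a
port. A minimal C sketch of this relationship follows; the structure
and function names are hypothetical.

```c
/* I/O addresses of the three registers of one LPT port. */
struct lpt_regs {
    unsigned data;     /* base address     */
    unsigned status;   /* base address + 1 */
    unsigned control;  /* base address + 2 */
};

/* Derive the register addresses from the base (data) address. */
struct lpt_regs lpt_regs_from_base(unsigned base)
{
    struct lpt_regs r;

    r.data    = base;
    r.status  = base + 1;
    r.control = base + 2;
    return r;
}
```

Calling lpt_regs_from_base(0x378) yields the LPT1 addresses 378H, 379H
and 37AH.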

At boot-up of the PC, the setup routines in the BIOS look for parallel
ports on the I/O bus at the following addresses, and assign the names
LPT1 to LPT3 to the ports found, in this order:

• 03BC to 03BE
• 0378 to 037A
• 0278 to 027A
Officially LPT1 uses I/O address 0378 to 037A but when the BIOS setup
routine is looking for parallel Ports it assigns the first one it finds as
LPT1. The address 03BC to 03BE was first provided by a parallel port on
IBM Monochrome display adaptor but today it is quite common to find
this address available on Parallel Port hardware. The Parallel Ports are
assigned an IRQ line as follows.
Port IRQ
LPT 1 IRQ 7
LPT 2 IRQ 7 or IRQ 5

In old 8-bit PC computers (PC/XT) IRQ7 was assigned to both LPT1 and
LPT2 but in later generation hardware IRQ5 is assigned to LPT2. The
IRQ line is not usually used by software communicating with the LPT
ports and so IRQ7 and IRQ5 are available for other I/O functions. Sound
Cards use either IRQ5, 7 or 10 as the default IRQ.

9-6.4. Parallel Port Timing Diagram


The printer is assumed to use the Centronics protocol, shown in figure 9-
42. The printer sets BUSY high while it is processing a character. The
BUSY signal may also be high when the printer is disconnected, off-line,
or has an error. In the polled mode of printing, the character bits are put
on the DATA lines, BUSY is tested repeatedly until it is found to be low,
then the STROBE pulse is sent. The printer sets BUSY high when the
character data have been latched and sets it low again when the character
has been processed.

The Centronics protocol specifies that the DATA lines be stable from at
least 500 ns before to at least 500 ns after the STROBE pulse, and the
STROBE pulse be at least 500 ns long. These times may of course be
shortened for a specific printer, at the risk of loss of generality. Programs
using the polled mode should include a timeout counter to guard against a
permanent BUSY condition. BIOS calls and DOS functions use this
mode for printing.
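
The polled sequence described above (data out, wait for BUSY low with a
timeout, then pulse STROBE) can be sketched in C as follows. Port access
is passed in as two callbacks so that the routine can be tried against a
small simulated port as well as real hardware. The bit masks follow the
conventional PC register layout (status bit 7 reads 1 when the printer
is not busy, control bit 0 drives STROBE) and are assumptions, as are
all names; the 500 ns delays are only marked by a comment.

```c
typedef unsigned char (*port_in_fn)(unsigned port);
typedef void (*port_out_fn)(unsigned port, unsigned char value);

#define STATUS_NOT_BUSY 0x80  /* assumed: status bit 7 = 1 when BUSY is low */
#define CONTROL_STROBE  0x01  /* assumed: control bit 0 asserts STROBE      */

/* Send one character in polled mode.
   Returns 0 on success, -1 if the printer stayed busy for max_polls reads. */
int lpt_putc_polled(unsigned base, unsigned char ch,
                    port_in_fn in, port_out_fn out, unsigned long max_polls)
{
    unsigned long i;
    unsigned char ctrl;

    out(base, ch);                          /* 1. character on D0-D7       */

    for (i = 0; i < max_polls; i++)         /* 2. poll BUSY, with timeout  */
        if (in(base + 1) & STATUS_NOT_BUSY)
            break;
    if (i == max_polls)
        return -1;                          /* permanent BUSY condition    */

    ctrl = in(base + 2);
    out(base + 2, (unsigned char)(ctrl | CONTROL_STROBE));   /* 3. STROBE on  */
    /* on real hardware, wait at least 500 ns here */
    out(base + 2, (unsigned char)(ctrl & ~CONTROL_STROBE));  /* 4. STROBE off */
    return 0;
}

/* Simulated port for experiments without hardware (hypothetical). */
unsigned char sim_regs[4] = { 0x00, 0xDF, 0x0C, 0x00 };
unsigned char sim_latched;   /* last character "printed"      */
unsigned      sim_strobes;   /* number of STROBE pulses seen  */

unsigned char sim_in(unsigned port) { return sim_regs[port & 3]; }

void sim_out(unsigned port, unsigned char value)
{
    unsigned reg = port & 3;

    /* the simulated printer latches the data when STROBE becomes active */
    if (reg == 2 && (value & CONTROL_STROBE) && !(sim_regs[2] & CONTROL_STROBE)) {
        sim_latched = sim_regs[0];
        sim_strobes++;
    }
    if (reg != 1)                /* the status register is read-only */
        sim_regs[reg] = value;
}
```

With the simulated port, lpt_putc_polled(0x378, 'A', sim_in, sim_out,
1000) returns 0 and leaves 'A' in sim_latched; if sim_regs[1] is first
set to a value with bit 7 cleared (printer busy), the call returns -1
after max_polls reads. On real hardware the callbacks would wrap the
compiler's port I/O functions.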

In the interrupt-driven mode of printing, the positive-going edge of the
ACK signal is used to cause an interrupt 0Fh via the IRQ7 line to the
Interrupt Controller; the Interrupt Handler can send a new character to the
printer whenever it is invoked, since ACK indicates that the previous
character has been processed. The DOS command PRINT uses this mode
to spool and print files.
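
The relation between IRQ7 and interrupt 0Fh follows from the way the PC
maps hardware interrupt lines to interrupt numbers. The C sketch below
assumes the usual real-mode PC arrangement, in which IRQ0-IRQ7 are
mapped to interrupts 08h-0Fh and each vector occupies 4 bytes in the
interrupt vector table; these constants are not stated in the text, and
the function names are hypothetical.

```c
#define IRQ_BASE_VECTOR 0x08  /* assumed: IRQ0 is mapped to interrupt 08h */
#define VECTOR_SIZE     4     /* assumed: 4 bytes (offset:segment) each   */

/* Interrupt number raised by a hardware line IRQ0..IRQ7. */
unsigned irq_to_vector(unsigned irq)
{
    return IRQ_BASE_VECTOR + irq;
}

/* Offset of a vector inside the interrupt vector table. */
unsigned long vector_table_offset(unsigned vector)
{
    return (unsigned long)VECTOR_SIZE * vector;
}
```

Here irq_to_vector(7) gives 0Fh, the printer interrupt, whose handler
address is stored at offset 3Ch of the table.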

Fig. 9-42. Timing Diagram for the Centronics Protocol.

9-6.5. Programming the Parallel Port.


The two programs below are examples of how to program the parallel
port. They need a C compiler whose library provides port I/O functions.
For instance, Microsoft VC++ provides access to the I/O ports on the
80x86 CPU via the predefined functions _inp / _inpw and _outp / _outpw.


• int _inp(unsigned short pid); // returns a byte read from the I/O port pid
• unsigned short _inpw(unsigned short pid); // returns a word read from the I/O port pid
• int _outp(unsigned short pid, int value); // writes a byte to the I/O port pid
• unsigned short _outpw(unsigned short pid, unsigned short value); // writes a word to the I/O port pid

The following program shows how to send a byte to, and receive a byte
from, the parallel port. The _outp(pid, value) function sends a byte to
a specified I/O port. The first parameter (pid) is the address of the
port to write to; pid can be any unsigned integer in the range 0-65535.
The second parameter (value) is the value of the byte to send. Both
parameters can be variables. The _inp(pid) function reads a byte from
the specified I/O address (pid) of the computer.

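#include <stdio.h>
#include <dos.h>
#include <conio.h>      /* _inp(), _outp(), getch(), clrscr() */

#define DATA    0x378   /* LPT1 data register    */
#define STATUS  0x379   /* LPT1 status register  */
#define CONTROL 0x37A   /* LPT1 control register */

int main (void)
{
    int bits = 0x55;    /* byte to send, 0 <= bits <= 255 (arbitrary test value) */

    clrscr();                   /* clear the screen             */
    _outp(DATA, bits);          /* output the byte on D0-D7     */
    bits = _inp(STATUS);        /* input the status lines       */
    printf("Status register = %02X\n", bits);
    getch();                    /* wait for a key press         */
    return 0;
}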

The following example shows how to read and write a byte from/to the
parallel port in VC++. Here are the steps to write the parallel port
interfacing application (pptest).
Start the VC++ IDE and select 'New' from the File menu. Then select
"Win32 Console Application" from "Projects", enter the project name
"pptest", and click the OK button. Select "A simple Application" and
click Finish. Now open example1.cpp from "FileView" and replace the
existing code with the following code. The test board that you may build
to test this code is shown in problem 4.

#include "stdafx.h"
#include "conio.h"      // _inp(), _outp()
#include "stdio.h"
#include "string.h"
#include "stdlib.h"

int main (int argc, char* argv[])
{
    short data;

    // Both commands need at least a command name and a port address.
    // Addresses and data are given in decimal (e.g. 888 for 378H).
    if (argc < 3)
    {
        printf("Usage\n\n");
        printf("partest1.exe read <address>\n");
        printf("partest1.exe write <address> <data>\n\n\n");
        return 0;
    }
    if (!strcmp(argv[1], "read"))
    {
        data = _inp(atoi(argv[2]));
        printf("Data read from parallel port is ");
        printf("%d\n\n\n\n", data);
    }
    if (!strcmp(argv[1], "write"))
    {
        if (argc < 4)   // the data byte is missing
        {
            printf("partest1.exe write <address> <data>\n\n\n");
            return 0;
        }
        _outp(atoi(argv[2]), atoi(argv[3]));
        printf("Data written to parallel port is ");
        printf("%s\n\n\n\n\n", argv[3]);
    }
    return 0;
}

9-6.6. Recent Improvements in the PC Parallel Port


Over the years, the parallel port in typical PC computers has undergone
slow but steady improvement. Six types of parallel port have been used
over the years:

1. Unidirectional (4 bit)
2. Bi-directional (8 bit)
3. Standard Parallel Port (SPP), also called Type 1
4. DMA Type 3 (used only by IBM)
5. Enhanced Parallel Port (EPP)
6. Extended Capabilities Port (ECP)

By 1994 this development was getting out of hand, and so the IEEE set
down standard modes of operation for the parallel port, in a document
with the title IEEE 1284-1994, standard signaling method for a
bi-directional parallel interface for PCs. Before this time there were
no set standards as to how the parallel port should behave when
connected to devices such as printers, scanners, external disk drives,
etc. The IEEE defined five modes of operation. These modes take care of
the various types of hardware that have developed over the years since
the PC was released.

1. Compatibility (or Centronics) Mode
2. Nibble Mode
3. Byte Mode
4. EPP Mode (Enhanced Parallel Port)
5. ECP Mode (Extended Capabilities Port)

This IEEE specification is aimed at standardizing the behavior between a
PC and attached devices. Although the specification deals mainly with
printers, devices like SCSI adaptors, scanners, external disk drives and
tape backup, and LAN interfaces are also covered to some extent.
i- Unidirectional (4-bit) Port
When the PC was first designed, the parallel port was only intended to
send data to a printer. The data lines send 8-bit data to the printer,
and control lines were provided for flow control and error signals. The
5 status lines from the peripheral to the PC are dedicated to external
status indications. However, via these lines a peripheral can send a
byte of data (8 bits) to the PC as 2 nibbles (4 bits each) in two
transfer cycles. The unidirectional (4-bit) port was capable of data
transfer rates of 40 to 60 kB/s in the reverse direction and up to
140 kB/s in the forward direction.
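
The reassembly of a byte from two nibbles read on the status lines can
be sketched in C as follows. The wiring assumed here is the one
generally described for IEEE 1284 nibble mode: status bits S3, S4 and S5
carry nibble bits 0 to 2, the BUSY line (S7, inverted by the port
hardware) carries nibble bit 3, and the low nibble is sent first. This
mapping is not given in the text and should be checked against the
standard before use; the function names are hypothetical.

```c
/* Extract one 4-bit nibble from a status-register reading. */
unsigned char status_to_nibble(unsigned char s)
{
    unsigned char n = (unsigned char)((s >> 3) & 0x07);  /* S3..S5 -> bits 0..2 */

    if ((s & 0x80) == 0)   /* S7 reads inverted: 0 means the BUSY pin is high */
        n |= 0x08;         /* BUSY pin high -> nibble bit 3 set               */
    return n;
}

/* Combine two status readings (low nibble first) into one byte. */
unsigned char nibbles_to_byte(unsigned char low_status,
                              unsigned char high_status)
{
    return (unsigned char)(status_to_nibble(low_status) |
                           (status_to_nibble(high_status) << 4));
}
```

Each byte therefore costs two status reads plus the handshake between
them, which is consistent with the lower reverse-direction rate quoted
above.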
ii- Bi-directional (8-bit) Port and Standard Parallel Port (SPP)
This was introduced in 1987 with the IBM PS/2 range of computers.
Alternative names include PS/2 type, or Type 1. Data transfer rates as
high as 300 kB/s can be achieved. The bi-directional parallel port
opened up the way for eight-bit communication in both directions between
the computer and peripheral devices across the parallel I/O port. This
was done by redefining some unused pins in the Centronics parallel
connector, and by defining a direction bit in the port's control
register to indicate in which direction data is sent.
iii- Bi-directional (8-bit DMA) Type 3 Port
The use of a DMA channel made this port much faster than the Type 1
port covered above. There was also a similar Type 2 port from IBM, but
it was not used for long and was superseded by the Type 3. IBM was
the only company to use Type 2 and Type 3 parallel I/O ports.
iv- Enhanced Parallel Port (EPP)
The EPP was developed in 1992 by Intel, Xircom and Zenith and is
sometimes referred to as the Fast Mode Parallel Port. EPP can operate
at the ISA bus speed, providing about ten times the data rate of the
older parallel port modes. Transfer rates up to 2 MB/s are possible.
This is achieved by letting the hardware contained in the port provide
the flow control (handshaking), rather than having the software
applications (service routines) do it.

The IEEE incorporated the EPP standard into its document 1284-1994,
so we now have two standards for EPP: the original EPP standard,
version 1.7, and the IEEE 1284 version. Because the differences are
minor, peripherals can be designed to cope with the two variations, but
older peripherals made to the original EPP 1.7 standard may not work
with IEEE 1284 ports.

v- Extended Capabilities Port (ECP)
The Extended Capabilities Port was jointly designed by Hewlett Packard
and Microsoft and announced in 1992. ECP was also included in the IEEE
1284 specification in 1994. Like EPP, ECP uses additional hardware to
generate the flow control signals and runs at very much the same speed
as an EPP port. In addition, ECP requires a DMA channel to move data
about, and uses a FIFO buffer for sending and/or receiving data. The
use of a DMA channel can lead to conflicts with other devices that also
use DMA, and it was often best to choose EPP mode rather than ECP. The
rapid adoption of plug and play (PnP) hardware, and PnP-aware operating
systems like Windows XP/Vista, means the DMA channel should no longer
be a problem. Another feature of ECP is real-time data compression. It
uses Run Length Encoding (RLE) to achieve a data compression ratio of up
to 64:1. This is useful for optical scanners and printers, where a good
part of the data consists of long repetitive strings. ECP also supports
a method of channel addressing that can address multiple devices (e.g.,
faxes, scanners and printers) within one port.
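
The idea behind run length encoding, and the origin of the 64:1 figure,
can be illustrated with a short C sketch. It stores each run as a pair
(run length - 1, value) with runs limited to 128 bytes, so that 128
identical bytes shrink to 2. This is a generic illustration under those
assumptions, not the exact ECP wire format, and the function name is
hypothetical.

```c
#include <stddef.h>

#define RLE_MAX_RUN 128   /* assumed longest run stored in one pair */

/* Encode src[0..n) into dst as (run length - 1, value) pairs.
   Returns the number of bytes written, or 0 if dst is too small
   (0 is also returned for an empty input). */
size_t rle_encode(const unsigned char *src, size_t n,
                  unsigned char *dst, size_t dst_cap)
{
    size_t i = 0, out = 0;

    while (i < n) {
        size_t run = 1;

        /* extend the run while the next byte repeats the current one */
        while (i + run < n && src[i + run] == src[i] && run < RLE_MAX_RUN)
            run++;

        if (out + 2 > dst_cap)
            return 0;                         /* no room for another pair */
        dst[out++] = (unsigned char)(run - 1);
        dst[out++] = src[i];
        i += run;
    }
    return out;
}
```

A block of 128 identical bytes is encoded in 2 bytes (a ratio of 64:1),
whereas data without repetition doubles in size, which is why the
technique pays off mainly for scanner and printer images.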

9-6.7. Parallel Port I/O under Windows


As we have seen so far, writing programs that talk to the parallel port
is quite easy under MS-DOS, and under Win95/98 too. We can use the
inport() and outport() or the _inp() and _outp() C-language functions in
programs without any problem if we are running the program under DOS or
Win95/98. But with the NT-based operating systems, like Windows
NT/2k/XP/Vista, the old simplicity goes away. However, we can use
dedicated libraries like inpout32.dll (see footnote 3) to work with
NT/2k/XP and inpout64.dll to work with the 64-bit versions of Windows XP
and Vista. The inpout32.dll and inpout64.dll are dynamic libraries that
check the operating system version when their functions are called. If
the operating system is Windows 9X, the DLL uses the _inp() and _outp()
functions to read/write the parallel port. On the other hand, if the
operating system is Windows NT/XP/Vista, it installs a kernel mode
driver and talks to the parallel port through that driver. Therefore,
the code doesn't need to be aware of the operating system under which it
is running. The flow chart of the 32-bit version is given below. The
main functions exported from inpout32.dll are:

• Inp32(), reads data from a specified parallel port register.
• Out32(), writes data to a specified parallel port register.

3 Download this DLL from this site: http://www.logix4u.net/inpout32_source_and_bins.zip. This DLL
can be used in Win NT/XP as if it is Win 9X.

Fig. 9-43. Structure of the parallel port 32-bit drivers.

There exists an equivalent ActiveX control with the same features as
inpout32.dll. It is called hwinterface.ocx and can be used with VC++ and
VB. Data can be written to the parallel port using its outport() method
and read using its inport() method.


9-7. Attaching Mass Storage Devices to IBM PC & Compatibles


AT Attachment (ATA) and AT Attachment Packet Interface (ATAPI) are
interface standards for the connection of storage devices such as hard
disks and CD-ROM drives in computers.

ATA/ATAPI is an evolution of the AT Attachment Interface, which was
itself evolved in several stages from Western Digital's original
Integrated Drive Electronics (IDE) interface. As a result, many synonyms
for ATA/ATAPI exist, including abbreviations such as IDE which are still
in common informal use. With the market introduction of Serial ATA in
2003, the original ATA was retroactively renamed Parallel ATA (PATA).
For many years ATA provided the most common and the least expensive
interface for this application. By the beginning of 2007 it had largely
been replaced by Serial ATA (SATA).

9-7.1. Historical Development of ATA and IDE


The first version of what is now called the ATA/ATAPI interface was
developed by Western Digital under the name Integrated Drive
Electronics (IDE). Together with Control Data Corporation and Compaq
Computer, Western Digital developed the connector and the signaling
protocols, with the goal of remaining software compatible with the
existing ST-506 hard drive interface. The first such drives appeared in
Compaq PCs in 1986. The term IDE refers not just to the connector and
interface definition, but also to the fact that the drive controller is
integrated into the drive, as opposed to a separate controller on or
connected to the motherboard. The integrated controller presents the
drive to the host computer as an array of 512-byte blocks with a
relatively simple command interface. This relieves the software in the
host computer of chores such as stepping the disk head arm in and out,
as had to be done with the earlier ST-506 and ESDI hard drives.

The interface used by the IDE drives was standardized in 1994 as ANSI
standard X3.221, AT Attachment Interface for Disk Drives. In the
following years, updated versions of the standard were developed, under
the names ATA-1 and ATA-2. In 1994, Western Digital introduced the
Enhanced IDE (EIDE). Other manufacturers introduced their own
variations of ATA-1, such as Fast ATA and Fast ATA-2. The terms IDE
and EIDE have come to be used interchangeably with ATA (now Parallel
ATA). However, the terms "IDE" and "EIDE" are at best imprecise. Every
ATA drive is an IDE drive, but not every IDE drive is an ATA drive, as
the term IDE correctly describes any drive with an integrated controller.


Fig. 9-43(a). Structure of the IDE connector and how it is tied to the HDD.

Pin 1   Reset     Pin 11  Data 3    Pin 21  DDRQ          Pin 31  IRQ
Pin 2   Ground    Pin 12  Data 12   Pin 22  Ground        Pin 32  No connect
Pin 3   Data 7    Pin 13  Data 2    Pin 23  I/O write     Pin 33  Addr 1
Pin 4   Data 8    Pin 14  Data 13   Pin 24  Ground        Pin 34  GPIO_DMA
Pin 5   Data 6    Pin 15  Data 1    Pin 25  I/O read      Pin 35  Addr 0
Pin 6   Data 9    Pin 16  Data 14   Pin 26  Ground        Pin 36  Addr 2
Pin 7   Data 5    Pin 17  Data 0    Pin 27  IOCHRDY       Pin 37  Chip select 1P
Pin 8   Data 10   Pin 18  Data 15   Pin 28  Cable select  Pin 38  Chip select 3P
Pin 9   Data 4    Pin 19  Ground    Pin 29  DDACK         Pin 39  Activity
Pin 10  Data 11   Pin 20  Key /VCC  Pin 30  Ground        Pin 40  Ground

Fig. 9-43(b). Pin list and cables of ATA connector (40-pin plug and ribbon cables).


The so-called SCSI drives, for example, also have the drive controller on
board and present the drive to the host as an array of blocks. There
have been several generations of EIDE drives marketed, compliant with
successive versions of the ATA specification. Another common practice is
to refer to a specification version by the fastest mode it supports. For
example, ATA-4 supports Ultra DMA modes 0 through 2, the latter
providing a maximum transfer rate of 33 MB/s. ATA-4 drives are thus
sometimes called Ultra DMA-33 (UDMA-33) drives. Similarly, ATA-6
introduced a transfer speed of 100 MB/s.

Since the introduction of SATA around 2003, the conventional ATA is
referred to as Parallel ATA (PATA). The original ATA specification uses
a 28-bit addressing mode, allowing for the addressing of 2^28
(268,435,456) sectors (blocks) of 512 bytes each, resulting in a maximum
disk capacity of about 137 GB. This is displayed by Windows as 128 GB,
because Windows counts in units of 2^30 bytes. ATA-6 introduced 48-bit
addressing, increasing the disk size limit to 144 PB (petabytes).
Consequently, any ATA drive of capacity larger than 137 GB must be an
ATA-6 or later drive.
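
The capacity limits quoted above follow directly from the number of
address bits and the 512-byte sector size, as the following C sketch
shows. The function names are hypothetical, and the result fits in 64
bits only for address widths up to 54 bits.

```c
#include <stdint.h>

#define SECTOR_BYTES 512u   /* sector (block) size from the text */

/* Largest capacity, in bytes, addressable with the given number of
   sector-address bits (valid for address_bits <= 54). */
uint64_t lba_capacity_bytes(unsigned address_bits)
{
    return ((uint64_t)1 << address_bits) * SECTOR_BYTES;
}

/* The same capacity in units of 2^30 bytes, as reported by Windows. */
uint64_t lba_capacity_binary_gb(unsigned address_bits)
{
    return lba_capacity_bytes(address_bits) >> 30;
}
```

For 28 bits this gives 137,438,953,472 bytes (about 137 GB, or 128 units
of 2^30 bytes), and for 48 bits 144,115,188,075,855,872 bytes (about
144 PB).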
9-7.2. Parallel ATA (PATA) Interface
Until the introduction of Serial ATA, the 40-pin connectors were attached
to drives by a ribbon cable. Each cable has two or three connectors, one
of which plugs into an adapter interfacing with the rest of the computer.
The remaining connector(s) plug into drives. PATA cables transfer data
16 bits at a time.

ATA ribbon cables had 40 wires for most of their history (44 for the
small form-factor 2.5" drives), but an 80-wire version appeared with the
introduction of the Ultra DMA/66 (UDMA4) mode. All of the additional
wires in the new cable are ground wires, interleaved with the previously
defined wires to reduce the capacitive coupling between neighboring
signal wires, and hence the crosstalk. This was necessary to enable the
66 MB/s transfer rate of UDMA4 to work reliably. The faster UDMA5 and
UDMA6 modes also require 80-conductor cables. Though the number of
wires doubled, the number of connector pins and the pinout remain the
same as for 40-conductor cables, and the external appearance of the
connectors is identical. Internally the connectors are different; the
80-wire cables usually come with three differently colored connectors
(blue - controller, gray - slave drive, and black - master drive), as
opposed to the uniformly colored connectors of 40-wire cables (all
black). The gray connector on 80-wire cables has pin 28 not connected,
making it the slave position for drives.


9-7.3. Serial ATA (SATA) Interface


The Serial ATA (SATA) bus is the serial version of IDE (ATA), primarily
used for hard disk connections. SATA has been used for PC storage since
its market introduction in 2003. Like PATA, the SATA bus has the primary
function of transferring data between the motherboard and mass storage
devices (such as hard disk drives) inside a computer. SATA is a
software-transparent replacement for PATA that changes only the physical
interface layer and maintains compatibility with existing operating
systems and drivers. SATA offers advantages over the older PATA
interface, such as faster data transfer, the ability to remove or add
devices while operating (hot swapping), and thinner cables.

Fig. 9-44. Serial ATA (SATA) and parallel ATA (PATA) connectors

Designed as a successor to the Advanced Technology Attachment standard
(ATA), SATA has replaced the older technology (retroactively renamed
Parallel ATA or PATA, also known as IDE or EIDE). Serial ATA adapters
and devices communicate over a high-speed serial cable. The current SATA
specification can support data transfer rates as high as 3.0 Gbit/s per
device. SATA uses only 4 signal lines; cables are more compact and
cheaper than for PATA.

SATA supports hot-swapping and native command queuing (NCQ). A special
connector is specified for external devices, and there is an optionally
implemented provision for clips to hold internal connectors firmly in
place. SATA drives may be plugged into Serial Attached SCSI (SAS)
controllers and communicate on the same physical cable as native SAS
disks, but SATA controllers cannot handle SAS disks.

The SATA standard defines a data cable with seven pins: four carry the
signal conductors and the other three supply the ground that shields them.


Fig. 9-45. Serial ATA (SATA) plug and cables.

Pin   Name   Function
1     GND    Ground
2     A+     Transmit+
3     A-     Transmit-
4     GND    Ground
5     B-     Receive-
6     B+     Receive+
7     GND    Ground

Transmit pins are connected to Receive pins on the other side. The SATA
connector is keyed at pin 7. SATA uses a 4-conductor cable with two
differential pairs (Tx/Rx), plus 3 additional ground pins and a
separate power connector. SATA runs at transfer rates of 150 MB/s
(SATA/150), 300 MB/s (SATA II), or 600 MB/s. Faster SATA
implementations are backward compatible with older devices. 8B/10B
encoding is used for data transfers. The maximum unshielded cable
length is about 1 meter.
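
The MB/s figures above are related to the serial line rates by the
8B/10B encoding: every data byte is sent as 10 line bits. The C sketch
below makes the conversion explicit. The 3.0 Gbit/s rate is quoted in
the text; the 1.5 and 6.0 Gbit/s rates used in the example are the
commonly quoted line rates of the other two generations and are an
assumption here, as is the function name.

```c
/* Payload throughput in MB/s of a serial link that uses 8B/10B
   encoding, given its line rate in Mbit/s. */
unsigned long sata_payload_mb_per_s(unsigned long line_rate_mbit_per_s)
{
    /* 10 line bits carry 8 data bits, i.e. exactly one byte */
    return line_rate_mbit_per_s / 10;
}
```

Line rates of 1500, 3000 and 6000 Mbit/s thus correspond to 150, 300 and
600 MB/s of data, matching the figures given above.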


The so-called External SATA (eSATA) was standardized in 2004, with
specifically defined cables, connectors, and signal requirements for
external SATA drives. eSATA features full SATA speed for external
disks, requires no protocol conversion from IDE/SATA to USB/FireWire,
and allows cable lengths of up to 2 m.

Fig. 9-46. External SATA (eSATA) plug

9-7.4. Comparison between ATA, SATA, SCSI and USB


SCSI currently offers transfer rates higher than SATA, but it uses a more
complex bus, usually resulting in higher costs. SCSI buses also allow
connection of several drives (using multiple channels), whereas SATA
allows one drive per channel, unless a port multiplier is used.
SATA 3.0 Gbit/s offers a maximum bandwidth of 300 MB/s per device,
compared to SCSI with a maximum of 320 MB/s. Also, SCSI drives
provide greater sustained throughput than SATA drives because of
disconnect-reconnect and aggregating performance. Serial Attached
SCSI (SAS) is a newer-generation serial communication protocol for
devices, designed to allow much higher speed data transfers, and is
compatible with SATA.

SAS uses serial communication instead of the parallel method found in
traditional SCSI devices but still uses SCSI commands. SAS and
fibre-channel (FC) drives are typically more expensive, so they are
traditionally used in servers and disk arrays. Inexpensive ATA and SATA
drives evolved in the PC market, but there is a view that they are less
reliable.

USB HDDs are external units containing ATA or SCSI disks, with USB
ports on the back allowing very simple expansion and mobility.
Furthermore, USB HDDs are cheap and can work with both Macintosh
and Windows operating systems, without the need for driver installation.
The maximum transfer speed for external SATA (eSATA, at SATA 300 speed)
is 300 MB/s, for PATA it is 133 MB/s, and for USB2 it is around 60 MB/s.
As digital audio and image/video files are growing in use and size, you
may need to upgrade to a larger hard drive. External hard disks with a
USB3 Universal Drive Adapter make it easy to transfer data between the
old and new drives.

Fig. 9-47. SATA to USB converter.

Table 9-7. Comparison between serial devices and their data transfer rates

Interface         Description               Transfer Rate                  Best For
USB 1.1           Hot-swappable             12 Mb/s maximum burst          Quick connection
                                            transfer rate
USB 2.0           Hot-swappable             480 Mb/s maximum burst         Quick connection
                                            transfer rate
USB 3.0           Hot-swappable,            4.8 Gb/s maximum burst         HD video and storage
                  full-duplex,              transfer rate                  devices
                  asynchronous
FireWire 400      Hot-swappable*            400 Mb/s maximum sustained     Fast transfer of data
                                            transfer rate
FireWire 800      Hot-swappable*            800 Mb/s maximum sustained     High-resolution digital
                                            transfer rate                  audio and video
                                                                           applications
eSATA             Fast connection to        Up to 6 Gb/s maximum           Transferring large
                  external hard drives      sustained transfer rate        amounts of data
Gigabit Ethernet  Attaches to a network,    1,000 Mb/s maximum sustained   Remote file access
                  router, switch, or hub    transfer rate                  of data


9-8. Keyboard Interface Circuits


In this section we provide some details about the IBM keyboard and mouse
interface circuits and their connectors. Although the IBM PC keyboard
has a standard interface, there exist several types of keyboards which
work with IBM PCs and compatible computers. The original IBM PC
was equipped with a keyboard that had 83 keys. The keys are arranged in
three major groupings. The central portion of the keyboard is a standard
typewriter keyboard layout. On the left side are 10 function keys
(F1-F10). These keys are user-defined by software. On the right is a
15-key keypad. With the introduction of the PC/AT, an 84-key keyboard
was introduced. The above keyboards were connected to the PC by 5-pin
(XT and AT) connectors which were plugged into a socket on the rear of
the system board. After the introduction of the IBM PC/AT, the enhanced
101-106 key keyboards soon became the standard for most PC clones. Such
keyboards have the function keys along the top of the keyboard and a
numeric keypad at the right-hand end. The latest trend is ergonomic
keyboards with multimedia and internet browsing keys.

83-key PC/XT keyboard

101-key PC/AT keyboard

Fig. 9-48. Photographs of two IBM PC keyboards


The XT and AT keyboards are NOT compatible. The keyboard interface
circuit in the PC/XT was quite different from that used in the AT type
and all subsequent IBM PCs. The interface of the 8-bit PC/XT used a
serial-in parallel-out shift register (74LS322) to accept the keystroke
data. This interface was unidirectional (single sided): the keyboard
only talks to the computer and the computer does not talk back to the
keyboard.

The keyboard returns scan codes rather than ASCII (American Standard
Code for Information Interchange) codes. In addition, all keys are
typematic and generate both make and release scan codes. For example,
key 1 produces scan code hex 01 on make and code hex 81 on release.
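
In the example above the release code differs from the make code only in
its most significant bit (01h versus 81h). A minimal C sketch of this
relationship follows; it assumes that the same rule holds for every key
of the keyboard, and the function names are hypothetical.

```c
#define SCAN_RELEASE_BIT 0x80   /* set in the code sent when a key is released */

/* Release (break) code corresponding to a make code. */
unsigned char scan_release_code(unsigned char make)
{
    return (unsigned char)(make | SCAN_RELEASE_BIT);
}

/* Non-zero if the received code reports a key release. */
int scan_is_release(unsigned char code)
{
    return (code & SCAN_RELEASE_BIT) != 0;
}

/* Key number (make code) contained in a make or release code. */
unsigned char scan_key_number(unsigned char code)
{
    return (unsigned char)(code & 0x7F);
}
```

A keyboard service routine can use scan_is_release() to tell the two
events apart and scan_key_number() to find which key is concerned.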

The AT computer introduced a bi-directional interface, and used an 8048
microcontroller chip in the keyboard interface to accept the keystroke
data and to provide some data processing. The basic processes are the
same for both types of interface. The data from the keyboard is sent to
a keyboard interface circuit on the system board in serial form. Each
byte of keystroke data is clocked into the interface circuit by a clock
signal provided by the keyboard itself. The core of most keyboard
interfaces is the Intel 8048 microcontroller or some derivative of it.

9-8.1. Keyboard Operation


When a key on the keyboard is pressed, the keyboard controller
determines a scan code for that key from its position in the key matrix.
The scan code for each key represents the key's position on the
keyboard, rather than a particular character.

When a key is pressed, the keyboard clocks the scan code representing
that key into the keyboard interface circuit on the system board. When
the keyboard interface circuit has received 8 bits (PC/XT keyboard) or
11 bits (all other PC keyboards) of keystroke data, it generates a
hardware interrupt on IRQ1 to start the keyboard service routine. The
interrupt request reaches the CPU via the 8259 interrupt controller
chip, which supplies the interrupt number that identifies the source of
the interrupt. The CPU uses this number to look up, in the interrupt
vector table, the address of the keyboard service routine. The processor
loads this start address into its instruction pointer (CS:IP) and runs
the keyboard service routine. The keyboard service routine takes the
scan code from the keyboard interface and produces a 2-byte code that it
puts in a keyboard buffer area in the PC RAM. The 2-byte codes take the
following form:

Normal Characters
• Main byte (low byte): ASCII code
• Aux byte (high byte): scan code

Special Characters
• Main byte (low byte): 00 (zero)
• Aux byte (high byte): scan code or special code
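
The 2-byte layout listed above can be illustrated with a short Python sketch. It simply packs the scan code into the high byte and the ASCII code into the low byte; the helper names are hypothetical, and the scan codes in the demo are taken from Table 9-8.

```python
# Sketch of the 2-byte keyboard buffer entry:
# high byte = scan code, low byte = ASCII code (0 for special keys).

def pack_key(scan_code, ascii_code=0):
    """Build the 16-bit buffer entry for one keystroke."""
    return ((scan_code & 0xFF) << 8) | (ascii_code & 0xFF)


def unpack_key(word):
    """Return (scan_code, ascii_code, is_special) for a buffer entry."""
    ascii_code = word & 0xFF
    scan_code = (word >> 8) & 0xFF
    return scan_code, ascii_code, ascii_code == 0


if __name__ == "__main__":
    print(hex(pack_key(0x1E, ord("a"))))   # 0x1e61  (normal character 'a')
    print(hex(pack_key(0x3B)))             # 0x3b00  (special key F1)
    print(unpack_key(0x3B00))              # (59, 0, True)
```

A program reading the buffer can therefore test the low byte first: a non-zero value is an ordinary character, while zero means the high byte must be interpreted as a special key.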

The keyboard controller chip, inside the keyboard, scans the key matrix.
When a key is pressed, it sends the scan code for that key to the
keyboard interface circuit on the computer's system board. Therefore,
each key on a PC keyboard has a scan code in addition to the ASCII code
associated with it. The following table lists the scan code for each key
on a 101-key PC keyboard.

Table 9-8. Keyboard scan codes for IBM PCs

Scan   Base          Upper         Scan   Base          Upper         Scan   Base          Upper
code   case          case          code   case          case          code   case          case
29     `             ~             02     1             !             03     2             @
04     3             #             05     4             $             06     5             %
07     6             ^             08     7             &             09     8             *
0A     9             (             0B     0             )             0C     -             _
0D     =             +             0E     Backspace     Backspace     0F     Tab           Back Tab
10     q             Q             11     w             W             12     e             E
13     r             R             14     t             T             15     y             Y
16     u             U             17     i             I             18     o             O
19     p             P             1A     [             {             1B     ]             }
2B     \             |             3A     Caps Lock     na            1E     a             A
1F     s             S             20     d             D             21     f             F
22     g             G             23     h             H             24     j             J
25     k             K             26     l             L             27     ;             :
28     '             "             2B     #             ~             1C     Enter         Enter
2A     Left Shift    na            D5     \             |             2C     z             Z
2D     x             X             2E     c             C             2F     v             V
30     b             B             31     n             N             32     m             M
33     ,             <             34     .             >             35     /             ?


Scan   Base          Upper         Scan   Base          Upper         Scan   Base          Upper
code   case          case          code   case          case          code   case          case
36     Right Shift   na            1D     Left Ctrl     na            38     Left Alt      na
39     Spacebar      Spacebar      E0,38  Right Alt     na            E0,1D  Right Ctrl    na
E0,52  Insert        na            E0,53  Delete        na            E0,4B  Left Arrow    na
E0,47  Home          na            E0,4F  End           na            E0,48  Up Arrow      na
E0,49  Pg Up         na            E0,51  Pg Dn         na            E0,4D  Right Arrow   na
45,C5  Num Lock      na            47     Keypad 7      Home          4B     Keypad 4      Left Arrow
4F     Keypad 1      End           E0,35  Keypad /      Keypad /      48     Keypad 8      Up Arrow
4C     Keypad 5      na            50     Keypad 2      Dn Arrow      52     Keypad 0      Insert
E0,37  Keypad *      Keypad *      49     Keypad 9      Pg Up         4D     Keypad 6      Right Arrow
51     Keypad 3      Pg Dn         53     Keypad .      Delete        4A     Keypad -      Keypad -
4E     Keypad +      Keypad +      E0,1C  Keypad Enter  Keypad Enter  01     Escape        Escape
3B     F1                          3C     F2                          3D     F3
3E     F4                          3F     F5                          40     F6
41     F7                          42     F8                          43     F9
44     F10                         D9     F11                         DA     F12
2A,37  Prnt Scrn     na            46     Scroll Lock   na

Fig. 9-49. Keyboard keys and scan codes


9-8.2. Detailed Operation of the PS/2 Keyboard


The basic operation of the keyboard interface of PS/2 and later PCs can
be summarized in the following steps.
1. The controller (8042 or 8048) side of the interface consists of two
bidirectional open-collector lines called CLOCK and DATA.

2. The keyboard side of the interface is identical.

3. Both sides can drive the lines, possibly at the same time, so each
side has to detect when the other side has interrupted it.

4. When the system CPU wants to send something to the keyboard, it
writes a command to the controller telling it that the next byte written
to the controller should be passed on to the keyboard across the serial
DATA line.

5. When idle, both the DATA and CLOCK lines are high.

6. The controller pulls the CLOCK line low to inhibit any transmission
from the keyboard.

7. The controller pulls the DATA line low to signal to the keyboard that
it wants to transmit.

8. The controller releases the CLOCK line and waits for the keyboard to
pull it low. When the CLOCK line has been pulled low, the controller
places its first bit of data on the DATA line.

9. The keyboard toggles the CLOCK line and clocks the data across on the
DATA line. The controller places a new bit on the DATA line each time
CLOCK is pulled low by the keyboard.

9-8.3. Keyboard Protocol & Data Format


The keyboard transmits data bit by bit, synchronously with the clock
line. The older XT keyboards transmit in the same way, except that they
use only 10 bits (8 data bits in addition to start and stop bits). The
AT, PS/2 and later PC keyboards use 11-bit frames for data transmission:
1 start bit (0), 8 data bits, 1 parity bit (set if the number of 1s in
the data bits is even, i.e. odd parity), and 1 stop bit (1). The
keyboard uses a bidirectional protocol: it can send data to the host PC,
and the host PC can send data to it.
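
The 11-bit frame described above can be made concrete with a small Python sketch that builds and checks a frame. The bit order of the data field is not stated in the text; the sketch assumes the usual AT and PS/2 convention of sending the least significant bit first. The function names are hypothetical.

```python
# Sketch of the 11-bit AT / PS/2 keyboard frame:
# start (0), 8 data bits, odd parity bit, stop (1).
# Assumption: data bits are sent least significant bit first.

def odd_parity_bit(data):
    """Parity bit is 1 when the data byte holds an even number of 1s."""
    ones = bin(data & 0xFF).count("1")
    return 1 if ones % 2 == 0 else 0


def build_frame(data):
    """Return the frame as a list of 11 bits in transmission order."""
    bits = [0]                                      # start bit
    bits += [(data >> i) & 1 for i in range(8)]     # data, LSB first
    bits.append(odd_parity_bit(data))               # parity bit
    bits.append(1)                                  # stop bit
    return bits


def parse_frame(bits):
    """Validate a frame and return the data byte it carries."""
    if len(bits) != 11 or bits[0] != 0 or bits[10] != 1:
        raise ValueError("framing error")
    data = sum(bit << i for i, bit in enumerate(bits[1:9]))
    if bits[9] != odd_parity_bit(data):
        raise ValueError("parity error")
    return data


if __name__ == "__main__":
    frame = build_frame(0x1E)        # scan code of the 'a' key
    print(frame)                     # [0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1]
    print(hex(parse_frame(frame)))   # 0x1e
```

With odd parity, the data bits and the parity bit together always contain an odd number of 1s, so a single flipped bit is detected by the receiver.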


Keyboard-to-host PC data frames

Host PC-to-Keyboard data frames

Fig. 9-50. Keyboard data frames

9-8.4. Keyboard Connectors


The PC keyboard is connected to the computer via one of three types of
connectors, namely the 5-pin DIN connector (for PC/XT and PC/AT
computers), the 6-pin DIN connector (for PS/2 and later PCs) and USB
connectors for recent PCs.

Fig. 9-51. Main keyboard connectors

The AT keyboard connector has 5 pins. Pin 3 is called Reset, but it is
reserved and cannot be used. The clock and data pins are driven by
open-collector outputs, so when they are not driven low they are pulled
up to 5 V. The wiring assignments for these connectors are as follows:
i. The XT/ AT (5-Pin) Connector
1 CLK
2 DATA
3 NOT RESET
4 GND
5 +5V

ii. The PS2 (6-Pin) Connector


1 DATA
2 No connection
3 GND
4 +5V
5 CLK
6 No connection
iii. The USB (4-Pin) Connector
1 VCC (+5V)
2 Data -
3 Data +
4 GND

The PC keyboard may also be cordless (wireless). In this case the
keyboard communicates with its interface circuit on the host PC side via
infrared (IrDA) or radio (Bluetooth and WiFi).
9-8.5. Keyboard BIOS Calls
Although MS-DOS provides a set of routines to read ASCII and extended
character codes from the keyboard, the PC BIOS provides much better
keyboard input facilities. In general, if you do not need the I/O redirection
facilities provided by MS-DOS, reading the keyboard input using BIOS
functions provides much more flexibility. To call the BIOS keyboard
services, use the int 16H assembly instruction. The BIOS provides the
following keyboard functions (selected by the value in AH):
Table 9-9. Keyboard BIOS calls


9-8.6. Keyboard Interface Circuits


The following schematics depict the keyboard interface circuits of the
IBM PC/XT and PS/2, respectively. Both circuits are on the host PC side.

Fig. 9-52. PC/XT keyboard interface


Fig. 9-53. Schematic of the PS2 keyboard interface, showing the 8048 controller


9-9. Mouse Interface


A computer mouse is a pointing device that detects the x-y coordinates
of its motion on a supporting surface. The mouse motion is translated
into the movement of a pointer on the computer display, which
facilitates navigation through the graphical user interface (GUI) of the
operating system. There are many pointing device technologies, among
which one can cite the mechanical and the optical mice.

• A mechanical mouse has a large rubber-covered ball in its base. This
ball drives two encoder wheels that generate pulses in response to the
movement of the mouse. A variation on this is the Honeywell mechanical
mouse, which has two inclined wheels set at 90 degrees to each other
instead of a ball.

• Optical and laser mice use a light beam that shines out of the bottom
of the device and is reflected back into the mouse. A reflective mouse
mat with a series of lines on it is required, and photocells inside the
mouse detect the movement of the device from these lines.

• Wireless or cordless mice transmit data via infrared (IrDA) or radio
(Bluetooth and WiFi). The receiver is usually connected to the computer
through an RS232 serial port or USB.

Fig. 9-54. Different shapes of the computer mouse

9-9.1. Mouse Data Packet


Pointing devices use different connections, the most common of which
today are PS/2, USB and wireless. The PS/2 mouse uses the same protocol
as the PS/2 keyboard. The standard PS/2 mouse interface supports the
following inputs: X movement (right/left), Y movement (up/down), left
button, middle button, and right button. The mouse periodically reads
these inputs and updates various counters and flags to reflect movement
and button states.

There are other mouse devices that have additional inputs and may report
data differently. One popular extension is the Microsoft Intellimouse,
which supports the standard inputs plus a scroll wheel and two
additional buttons. The PS/2 mouse sends data to the host PC using the
following 3-byte frame:
Table 9-10. Mouse data packets

         Bit 7      Bit 6      Bit 5     Bit 4     Bit 3     Bit 2    Bit 1    Bit 0
Byte 1   Y          X          Y sign    X sign    Always    Middle   Right    Left
         overflow   overflow   bit       bit       1         button   button   button
Byte 2   X movement
Byte 3   Y movement

The motion values are 9-bit 2's complement integers, where the most
significant bit appears as a "sign" bit in byte 1 of the movement data
packet. Their value represents the mouse's offset relative to its position
when the previous packet was sent, in units defined by the current
resolution. The range of values that can be expressed is -255 to +255.
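
As an illustration of the packet layout in Table 9-10, the following Python sketch decodes one 3-byte packet. Each 9-bit two's complement value is rebuilt from the sign bit in byte 1 and the 8 bits in byte 2 or byte 3. The function name and the dictionary keys are hypothetical.

```python
# Sketch of decoding a standard 3-byte PS/2 mouse packet (Table 9-10).

def decode_mouse_packet(byte1, byte2, byte3):
    """Return button states and signed X/Y movement for one packet."""
    if not byte1 & 0x08:
        raise ValueError("bit 3 of byte 1 must always be 1")
    # 9-bit two's complement: the sign bit (in byte 1) carries weight -256.
    dx = byte2 - 256 if byte1 & 0x10 else byte2
    dy = byte3 - 256 if byte1 & 0x20 else byte3
    return {
        "left": bool(byte1 & 0x01),
        "right": bool(byte1 & 0x02),
        "middle": bool(byte1 & 0x04),
        "dx": dx,
        "dy": dy,
        "x_overflow": bool(byte1 & 0x40),
        "y_overflow": bool(byte1 & 0x80),
    }


if __name__ == "__main__":
    # Left button held, 5 counts in the negative X direction, 3 in positive Y.
    packet = decode_mouse_packet(0x19, 0xFB, 0x03)
    print(packet["left"], packet["dx"], packet["dy"])   # True -5 3
```

The host accumulates the dx and dy values from successive packets to track the pointer position.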

Fig. 9-55. Schematic diagram of a PS/2 mouse

9-9.2. Mouse BIOS Calls


In MS-DOS, the mouse is initially hidden. The function 0H initializes the
mouse hardware and software so the mouse is ready to be used.


To call this function, load AX=0000H and call the mouse interrupt
int 33H. The output of this function (the mouse status) is returned in
AX and BX. AX=0000H means ERROR; AX=FFFFH means OK, i.e. the mouse is
installed. BX then contains the number of mouse buttons.

The function 01H shows the mouse cursor on the screen. The function
02H hides the mouse cursor.

The function 03H returns the position and button status. The output of
this function appears in BX, CX and DX: BX contains the button status
(bit 0 is set if the left button is pressed, bit 1 if the right button
is pressed), CX contains the column position and DX contains the row
position. The function 04H positions the mouse cursor. To call this
function, put 04H in AX, the mouse column position in CX and the mouse
row position in DX.

The function 09H redefines the shape of the mouse cursor when the screen
is in graphics mode. The cursor hot spot (column and row) is given in BX
and CX, and ES:DX must contain a pointer to the cursor bitmap.


9-10. Video/Monitor Interface


For more than two decades, IBM PCs and compatible computers were
equipped with analog cathode-ray-tube (CRT) monitors. More recent PCs
are instead equipped with more compact liquid-crystal display (LCD)
monitors, which dissipate much less power.

Fig. 9-56. Old CRT (analog) monitor and a recent LCD (digital) monitor.

9-10.1. Video Monitors & Connectors


The video display monitors evolved along with the video system. In the
past, all video systems used the analog video graphic adaptor (VGA)
(DB15 or DB9) connectors. As the video standards proliferated,
multi-scan and digital monitors became available for a range of video
systems. Nowadays, monitors have VGA, Digital Visual Interface (DVI),
digital flat panel (DFP) and/or High-Definition Multimedia Interface
(HDMI) plugs. However, most monitors still have VGA sockets and
connectors, and so most graphics cards have a VGA output.

Fig. 9-57. Video connectors (VGA and DVI and HDMI).



Many modern computers and video cards are fitted with a standard
VGA/SVGA 15-way high-density D socket.

Table 9-11(a). Pins of the VGA (15 pin) adaptor

Pin  Code      Function              Pin  Code      Function
1    R         Red                   9    -         No pin (key) #
2    G         Green                 10   SYNC_GND  Sync Ground
3    B         Blue                  11   ID0       Monitor ID 0 * ‡
4    ID2       Monitor ID 2 ‡        12   ID1       Monitor ID 1 •
5    NC        Not connected ‡       13   HSYNC     Horizontal Sync
6    R_GND     Red Ground            14   VSYNC     Vertical Sync
7    G_GND     Green Ground          15   NC        Not connected †
8    B_GND     Blue Ground

Alternatively, the VGA circuits can appear on a 9-way D socket (DB9),
which is wired as follows:-

Table 9-11(b). Pins of the VGA (9 pin) adaptor

Pin  Code      Function              Pin  Code      Function
1    R         Red                   6    R_GND     Red Ground
2    G         Green                 7    G_GND     Green Ground
3    B         Blue                  8    B_GND     Blue Ground
4    HSYNC     Horizontal Sync       9    SYNC_GND  Sync Ground
5    VSYNC     Vertical Sync

The DVI connector is wired as follows:-


Table 9-11(c). Pins of the DVI (24 pin) adaptor

Pin  Code      Function                     Pin  Code      Function
1    TX2-      Channel 2 Data -             13   TX3+      Channel 3 Data +
2    TX2+      Channel 2 Data +             14   +5V       +5 V from interface
3    SHLD2/4   Channels 2 and 4 Shield      15   5V_GND    Ground for +5 V
4    TX4-      Channel 4 Data -             16   HPD       Hot Plug Detection *
5    TX4+      Channel 4 Data +             17   TX0-      Channel 0 Data -
6    SCL       DDC Clock                    18   TX0+      Channel 0 Data +
7    SDA       DDC Data (bidirectional)     19   SHLD0/5   Channels 0 and 5 Shield
8    VSYNC     Analogue Vertical Sync •     20   TX5-      Channel 5 Data -
9    TX1-      Channel 1 Data -             21   TX5+      Channel 5 Data +
10   TX1+      Channel 1 Data +             22   SCL_GND   DDC Clock Ground
11   SHLD1/3   Channels 1 and 3 Shield      23   TXC+      Clock +
12   TX3-      Channel 3 Data -             24   TXC-      Clock -

The DFP connector is wired as follows:-


Table 9-11(d). Pins of the DFP (20 pin) adaptor

Pin  Code      Function                     Pin  Code      Function
1    TX1+      Channel 1 Data +             11   TX2+      Channel 2 Data +
2    TX1-      Channel 1 Data -             12   TX2-      Channel 2 Data -
3    SHLD1     Channel 1 Shield             13   SHLD2     Channel 2 Shield
4    SHLDC     Clock Shield                 14   SHLD0     Channel 0 Shield
5    TXC+      Clock +                      15   TX0+      Channel 0 Data +
6    TXC-      Clock -                      16   TX0-      Channel 0 Data -
7    GND       Logic Ground                 17   NC        Not connected
8    +5V       +5 V from interface          18   HPD       Hot Plug Detection *
9    NC        Not connected                19   SDA       DDC Data (bidirectional)
10   NC        Not connected                20   SCL       DDC Clock

9-10.2. Video Adaptors (Interface Circuits)


A video adaptor (video card) is the interface between the CPU and a
video monitor (e.g. CRT or LCD). Video cards may be plugged into
expansion slots or into the Accelerated Graphics Port (AGP). A video
card consists essentially of a video controller chip and the video RAM
frame buffer. The frame buffer is a block of memory that can have its
physical address remapped to any location within the CPU logical address
space.

For the original IBM PC, there were two types of video adaptors
(interface cards): a monochrome character-only display adaptor called
the Monochrome Display Adaptor (MDA) and the Color Graphics Adaptor
(CGA).


Fig. 9-58. Basic block diagram of monochrome CRT monitor (above) and the
horizontal and vertical scan processes (below).

The CRT video controller provides the video information as well as
horizontal and vertical synchronization pulses to the CRT monitor. Color
monitors have separate video inputs for the three basic colors (red,
green and blue) and are sometimes called RGB monitors. The resolution of
a CRT display is determined by two factors: the number of scan lines and
the number of dots (pixels) that can be displayed on each line. The
number of scan lines is, in turn, determined by the scan rate.

In graphics display modes, it takes one bit to describe whether a pixel
is on or off. If we wish to display this pixel in more than two colors,
then we need more information for each pixel. The Video Graphics Array
(VGA) is a standard established by IBM to provide color graphics at
higher pixel addressability than previous adaptors. The VGA standard
provides 16 colors at 640 by 480. It takes 4 bits to describe the color
of a pixel. Multiply these 4 bits by the 307,200 pixels in a 640 x 480
display, divide by 8 bits per byte, and you find that 153,600 bytes of
video RAM are required to store a screen image. The first VGA cards were
equipped with 256K bytes of video RAM. The VGA consists of several
sub-systems, including the graphics controller, display memory,
serializer, attribute controller, sequencer and CRT controller.

• Graphics Controller: Can perform logical functions on data being
written to display memory.
• Display Memory: A bank of 256K DRAM divided into four 64K color
planes. It is used to store screen display data.
• Serializer: Takes display data from the display memory and converts
it to a serial bit-stream which is sent to the attribute controller.
• Attribute Controller: Contains the color LUT (Look Up Table)
which determines what color will be displayed for a given pixel value
in display memory.
• Sequencer: Controls timing of the board and color planes.
• CRT Controller: Generates syncing and blanking signals to control
the monitor display.
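
The video RAM arithmetic given above for the 16-color 640 x 480 mode can be checked with a short Python sketch. The function name is hypothetical, and the per-plane figure assumes the image is split evenly across the four 64K color planes listed above.

```python
# Sketch of the frame-buffer size calculation for a graphics mode.

def frame_buffer_bytes(width, height, bits_per_pixel):
    """Bytes of video RAM needed to hold one full screen image."""
    return width * height * bits_per_pixel // 8


if __name__ == "__main__":
    total = frame_buffer_bytes(640, 480, 4)   # 16 colors = 4 bits per pixel
    print(total)                              # 153600
    # Assumption: one bit of each pixel is stored in each of the 4 planes.
    per_plane = total // 4
    print(per_plane, per_plane <= 64 * 1024)  # 38400 True
```

The result shows why 256K bytes of video RAM is enough for this mode: each color plane needs 38,400 bytes, well within its 64K allocation.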

Basically, the CPU performs most of the work, feeding pixel and text
information to the VGA. The VGA card is thus a simple display adapter
with no processing capability. All the thinking is done by the CPU,
including writing and reading of text, and drawing of simple graphics
primitives like pixels, lines and memory transfers for images. Some
newer accelerator cards include functions for 3D graphics rendering like
polygon shading, coordinate manipulation and texture mapping. Others
provide on-the-fly magnification of video clips, so that MPEG movies do
not appear in a box that is three inches wide and two inches high on
your screen.

If we want to record a standard video signal for digital playback, we
have to digitize it at about 640x480 pixels/frame. At a frame rate of
30 fps (frames per second) and a 24-bit/pixel color depth (16.7 million
colors), we get 640 x 480 x 30 x 3 = 27.6 million bytes per second, or
about 28 MB/s. At that data rate, a 650MB CDROM would hold only 23
seconds of video! CDROM reader and hard drive technologies do not allow
us to transfer data at such high rates, so digital video is compressed
for storage.
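
The data-rate figures above can be reproduced with the following Python sketch. It assumes 1 MB = 1,000,000 bytes, which is the convention that matches the 23-second figure; the function names are hypothetical.

```python
# Sketch of the uncompressed digital video data-rate calculation.
# Assumption: 1 MB = 1,000,000 bytes.

def raw_video_rate(width, height, fps, bytes_per_pixel):
    """Uncompressed video data rate in bytes per second."""
    return width * height * fps * bytes_per_pixel


def playing_time_seconds(capacity_bytes, rate_bytes_per_s):
    """How long a medium of the given capacity lasts at the given rate."""
    return capacity_bytes / rate_bytes_per_s


if __name__ == "__main__":
    rate = raw_video_rate(640, 480, 30, 3)     # 24 bits = 3 bytes per pixel
    print(rate)                                # 27648000
    seconds = playing_time_seconds(650 * 1000 * 1000, rate)
    print(int(seconds))                        # 23
```

Dividing the rate by a compression ratio gives the corresponding figure for compressed video, which is why compression makes playback from a CDROM practical.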

The software or hardware that performs the compression and decompression
(when recording or displaying video) is called a CODEC. Dedicated CODECs
are available either as add-on cards or integrated into video cards.
There are several compressed video formats, including MPEG (Moving
Picture Experts Group), AVI, MOV, and Quicktime. Some of these formats
can provide compression ratios of up to 100:1 while still providing good
quality video. The following table lists the scan rates of video
monitors for different generations of video cards.

Fig. 9-59. Video Graphic Adaptor of an old IBM PC, with DVI and DFP connectors.

Table 9-12. Video adaptor standards.

Adaptor   Resolution   Colors         H. Freq (kHz)   V. Freq (Hz)   VRAM       Address   Connector
MDA                    -              18.432          50             4K         B0000     DB9
Hercules               -              18.432          50             64K        B0000     DB9
CGA                    16             15.750          60             16K        B8000     DB9
EGA                    16             21.800          60             128-256K   A0000     DB9
VGA       640 x 480    256            31.500          50-70          512K       A0000     Mini DB15
SVGA      800 x 600    256            35.5            56             1 Meg      A0000     Mini DB15
          1024 x 768   256
XGA       1280x1280    16.7 Million   48.2            56             6 Meg      A0000     Mini DB15

9-10.3. BIOS Video Interface & Interrupt 10H


In order to use one of the video functions, first place the function number
in AH, set the other input registers, then call the function with int 10H.
The function 0H sets the video mode. To call this function put 0H in AH
and the Video Mode in AL (01H means 40x25 text mode, 03H means
80x25 text mode, 13H means 320x200 Graphic mode with 256 colors).
This function will clear the screen unless bit 7 of the AL register is set.
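
As a small illustration of the register set-up for function 0H, the following Python sketch computes the AX value to load before int 10H. It only models the register contents described above (function number in AH, mode in AL, bit 7 of AL to keep the screen contents); it does not call the BIOS, and the function name is hypothetical.

```python
# Sketch of the AX value for int 10H function 00H (set video mode).
# AH = 00H (function number), AL = video mode; bit 7 of AL, when set,
# asks the BIOS not to clear the screen.

def set_mode_ax(mode, keep_screen=False):
    """Return the 16-bit AX value for a set-video-mode call."""
    al = mode & 0x7F
    if keep_screen:
        al |= 0x80
    ah = 0x00
    return (ah << 8) | al


if __name__ == "__main__":
    print(hex(set_mode_ax(0x13)))                    # 0x13
    print(hex(set_mode_ax(0x13, keep_screen=True)))  # 0x93
    print(hex(set_mode_ax(0x03)))                    # 0x3
```

In assembly language this corresponds to loading AX with the computed value and then executing int 10H.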

The function 01H sets the starting and ending lines of the screen
cursor. The starting line of the cursor (0-7) is put in CH and the
ending line of the cursor (0-7) is put in CL. The function 02H sets the
cursor position. Put the display page in BH (in text modes; use 00H in
graphic modes) and the row and column positions in DH and DL,
respectively.

The function 05H sets the current display page. To do this, put the
display page in AL and call int 10H. The function 08H reads a character
and its attributes at the current cursor position. To do this, put the
display page in BH and call int 10H. The ASCII code of the character is
returned in AL and its color attributes are returned in AH.

The function 09H writes a character and its attributes at the current
cursor position. To do this, put the display page in BH, the character
(ASCII code) in AL and its color attributes in BL, then call int 10H.

The function 0AH is identical to function 09H, except that in text modes
the attribute byte currently in video memory is not modified.

The function 0CH writes a graphic pixel. In this case you define the
display page in BH, the screen line in DX and the screen column in CX.
Also, the pixel color number should be put in AL before calling int 10H.
Alternatively, one can read a graphics pixel by calling the function
0DH. This function returns the color number of the specified pixel (row
in DX and column in CX). The return value (pixel color number) will be
in AL. The BIOS video functions are summarized in the following table.
Table 9-13. Summary of BIOS Video functions

Function (AH)   Description
00H             Set Video Mode
01H             Set Cursor Shape
02H             Set Cursor Position
03H             Get Cursor Position And Shape
04H             Get Light Pen Position
05H             Set Display Page
06H / 07H       Clear and Scroll Screen Up / Down
08H / 09H       Read / Write Character and Attribute at Cursor
0AH             Write Character at Cursor
0BH             Set Border Color
0CH / 0DH       Write / Read Graphics Pixel
0EH             Write Character in TTY Mode
0FH             Get Video Mode
10H             Set Palette Registers (EGA, VGA, SVGA)
11H             Character Generator (EGA, VGA, SVGA)
12H             Alternate Select Functions (EGA, VGA, SVGA)
13H             Write String
1AH             Get or Set Display Combination Code (VGA, SVGA)
1BH             Get Functionality Information (VGA, SVGA)
1CH             Save or Restore Video State (VGA, SVGA)
4FH             VESA BIOS Extension Functions (SVGA)

9-10.4. Graphic Processing Unit (GPU) & Graphic Accelerators


A graphics processing unit (GPU) is a special graphics rendering device
for PCs, workstations, or game consoles. Modern GPUs are more efficient
than typical CPUs for a range of complex algorithms. A GPU can sit on a
video card or be integrated directly into the motherboard.

A graphics accelerator implements special mathematical operations
commonly used in graphic rendering. Accelerators are mainly used for
playing 3D games or for high-end 3D rendering. Since the mid-1990s,
CPU-assisted real-time 3D graphics have been common in PCs and console
games. Early examples of mass-marketed 3D graphics hardware can be found
in video game consoles such as PlayStation and Nintendo. As DirectX
advanced from a rudimentary API for games to become one of the leading
3D graphics interfaces, 3D accelerators evolved rapidly. Direct3D 5.0
was the first version of the API to really dominate the gaming market
and displace many of the hardware-specific interfaces. Later versions of
Direct3D introduced support for hardware-accelerated transform and
lighting. The NVIDIA GeForce 256 was the first consumer card on the
market with this capability.


9-11. Summary of I/O Addresses in IBM PC & Compatibles


The following table summarizes the addresses of all input/output ports in
the IBM PC and compatibles.
Table 9-14a. Summary of I/O addresses in the IBM PC and compatibles.

I/O Address USE


0000-000F Slave DMA controller (8237 chip)
0010-001F System
0020-0021 First Interrupt controller (8259 chip)
0030-0031 Second interrupt controller
0040-0043 Programmable Interval Timer 1 (8254 chip)
0048-004B Programmable Interval Timer 2
0050-006F System devices
0070-0071 NMI Enable / Real Time Clock
0080-008B DMA Page registers
0090-009F System devices
00A0-00A1 Slave interrupt controller
00C0-00DE Master DMA controller
00F0-00FF System devices
0100-0167 System devices
0168-016F IDE Interface - Quaternary channel
0170-0177 IDE interface - Secondary channel
01E8-01EF IDE Interface - Tertiary channel
01F0-01F7 IDE interface - Primary channel
0200-0207 Games Port (joystick port)
0220-022F Usually used by sound cards & by Novell NetWare
0270-0273 Plug and Play hardware
0278-027A Parallel Port *
0280-028F Sometimes used for LCD Display I/O
02B0-02DF Alternate VGA Display Adaptor assignment (secondary address)
02E0-02E7 GPIB 0, data acquisition card 0 (02E1 to 02E3 only)
02E8-02EF Serial Port - COM 4
02F8-02FF Serial Port - COM 2
0300-031F Often used as a default for Network Interface cards (prototype card)
0320-032F ST506 and ESDI Hard Disk Drive Interface (PC/XT, early PC/AT)
0330-0331 MPU-401 (midi) interface, on Sound Cards
0360-036F Sometimes used for Network Interface cards
0376-0377 Another address used by the Secondary IDE Controller
0378-037A Parallel Port
0388-038B FM (sound) synthesis port on sound cards
03B0-03BB MDA, EGA and VGA Display Adaptor (only 03B0-03BB used)
03BC-03BF Parallel Port (originally only fitted to IBM mono display adaptors)
03C0-03DF EGA / VGA Video Display Adaptor, (Primary address)
03E0-03E7 PCIC PCMCIA Port Controller
03E8-03EF Serial Port - COM 3
03F0-03F6 Floppy Disk Drive Interface

03F7-03F7 Another address used by the Primary IDE Controller
03F8-03FF Serial Port - COM 1
0533-0537 Windows sound system (used by many sound cards)

The next assignments are recent additions used by some modern chipsets
on IBM PC motherboards. The three parallel port assignments listed in
this table are assigned as LPT1 to LPT3.

Table 9-14b. Recent additions for I/O addresses in the IBM PC and compatibles.

I/O Address USE


0678-067F Used by the Extended Parallel Port at 0278 (EPP)
0778-077F Used by the Extended Parallel Port at 0378 (EPP)
07BC-07C5 Used by the Extended Parallel Port at 03BC (EPP)
0CF8-0CFB PCI data registers
FF00-FF07 IDE Bus Mastering Registers
FFA0-FFA7 Primary IDE Bus Mastering Registers
FFA8-FFAF Secondary IDE Bus Mastering Registers


9-12. Summary
In this chapter, we presented the general architecture of IBM PC and
compatible microcomputers, which are based on Intel 80x86
microprocessors. The expansion slots and various busses were also
introduced.
The IBM Personal Computer, commonly known as the IBM PC, is the
original version and progenitor of the IBM PC compatible hardware
platform. The term personal computer was common currency before
1981. However, because of the success of the IBM PC, what had been a
generic term sometimes meant a microcomputer compatible with IBM
PC. The original PC was an IBM attempt to get into the small computer
market then dominated by the Commodore, Atari 8-bit family, Apple II
and Tandy TRS-80s, and various CP/M machines. The IBM PC model
5150 was introduced on August 12, 1981. Rather than going through the
usual IBM design process, a special team was formed with authorization
to bypass the company restrictions and get something to market rapidly.
The team consisted of twelve people headed by Don Estridge and Chief
Scientist Larry Potter. They developed the PC in about a year. To achieve
this they decided to build the machine with "off-the-shelf" parts from a
variety of different original equipment manufacturers (OEMs). IBM also
sold an IBM PC Technical Reference Manual which included a listing of
the BIOS source code. The following figure shows the block diagram of
an 8088-based PC and its support chips.
The expansion slots have so many pins to power expansion cards and to
connect them with data, address and control busses. The expansion slots
are connected with the microprocessor via a group of signal lines (on the
motherboard of a computer) called the expansion bus. The expansion bus
contains a large number of input/output pins, for data and address as well
as control signals and it is usually operated at a frequency lower than the
microprocessor clock. The efficiency of the expansion bus is expressed
in terms of its bandwidth, which is calculated using the following
equation:
Bus bandwidth (in MB/s) = Bus width (in Bytes) x Bus Clock (in MHz)
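
The equation above can be applied directly, as in the following Python sketch. The function name is hypothetical, and the result is a peak figure that assumes one transfer per bus clock cycle; buses that need several clock cycles per transfer achieve less.

```python
# Sketch of the bus bandwidth equation:
# bandwidth (MB/s) = bus width (bytes) x bus clock (MHz).
# Assumption: one transfer per clock cycle (peak rate).

def bus_bandwidth_mb_s(width_bits, clock_mhz):
    """Peak bus bandwidth in MB/s for a given width and clock."""
    width_bytes = width_bits / 8
    return width_bytes * clock_mhz


if __name__ == "__main__":
    # 32-bit PCI running at 33.33 MHz
    print(round(bus_bandwidth_mb_s(32, 33.33)))   # 133
```

For the 32-bit, 33 MHz PCI bus this gives about 133 MB/s, which agrees with the PCI figure quoted in this summary.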
Interfacing I/O devices to the IBM PC via the ISA bus requires, at
least, connections to the following pins (among the first 62 pins):

1- A0-A9 for address decoding (you can assign up to 1K ports)
2- IOR and IOW (both active low)
3- AEN signal: AEN = 0 when the CPU is using the bus
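
The role of the three signal groups listed above can be modelled in a few lines of Python. This is a behavioural sketch of the decoding logic, not a circuit: the base address 0300H and the 32-port block size are example values taken from the prototype-card entry in Table 9-14a, and the function names are hypothetical.

```python
# Behavioural sketch of ISA I/O port decoding using A0-A9, AEN, IOR and IOW.
# Assumption: example card occupying ports 0300H-031FH.

CARD_BASE = 0x300
CARD_SIZE = 0x20


def card_selected(address, aen):
    """True when the card's address range is addressed by the CPU."""
    if aen != 0:                 # AEN = 1: a DMA cycle owns the bus
        return False
    port = address & 0x3FF       # only A0-A9 are decoded (1K ports)
    return CARD_BASE <= port < CARD_BASE + CARD_SIZE


def read_enable(address, aen, ior_n):
    """Card drives the data bus: selected and IOR active (low)."""
    return card_selected(address, aen) and ior_n == 0


def write_enable(address, aen, iow_n):
    """Card latches the data bus: selected and IOW active (low)."""
    return card_selected(address, aen) and iow_n == 0


if __name__ == "__main__":
    print(read_enable(0x300, aen=0, ior_n=0))    # True
    print(read_enable(0x300, aen=1, ior_n=0))    # False (DMA cycle)
    print(write_enable(0x31F, aen=0, iow_n=0))   # True
    print(card_selected(0x700, aen=0))           # True (alias: A10 ignored)
```

The last line shows a side effect of decoding only A0-A9: addresses that differ only in the higher address bits select the same port.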

For a relatively long time, most PCs on the market had one or more ISA
slots for backward compatibility; however, most expansion cards are now
built using the PCI and PCI-Express interfaces. PCI was developed by
Intel in the early 1990s, but it took some time to get it to work
reliably. The PCI bus has some attractive features, such as concurrent
bus-mastering, a full burst mode, and a type of pipelining queue that
can reduce the number of potential wait states.

PCI-Express (PCIe) is a much faster standard that was developed by the
PCI Special Interest Group (PCI-SIG). A PCIe link offering 500 MB/s
provides roughly 3.5 times the bandwidth of conventional PCI at 133
MB/s.


Interface     Date         Max transfer
bus           introduced   rate           PC                 Notes
Acorn bus     1979                        6502 based         8-bit data / 16-bit address.
ISA-8         1981         1.99 MByte/s   IBM PC             Very widespread.
VME           1981         10 MByte/s     Motorola 68000     Industry use.
STEbus        1983                        CPU independent    8-bit data / 20-bit address.
ISA-16        1984         3.97 MByte/s   IBM PC/AT          Very widespread.
Zorro II      1985         3.56 MByte/s   Amiga              Auto configuration.
EISA          1988         20 MByte/s     IBM clones         32-bit version of ISA-16.
NuBus         1987         40 MByte/s     NeXT, Macintosh
Zorro III     1990         150 MByte/s    Amiga PC           Multiplexed 32-bit.
PCI-32/33     1993         133 MByte/s                       Widespread.
PCI Express   2004         250 MByte/s                       P2P highspeed PCI.

Hard disk drives are accessed over one of a number of bus types,
including parallel ATA (PATA, also called IDE or EIDE), Serial ATA
(SATA), SCSI, Serial Attached SCSI (SAS), and Fibre Channel. Bridge
circuitry is sometimes used to connect hard disk drives to buses that they
cannot communicate with natively, such as IEEE 1394 and USB. The table
below lists common buses and slots with their maximum bandwidths:
PCI 132 MB/s
AGP 8X 2,100 MB/s
PCI Express 1x 250 MB/s
PCI Express 2x 500 MB/s
PCI Express 4x 1000 MB/s
PCI Express 8x 2000 MB/s
PCI Express 16x 4000 MB/s
PCI Express 32x 8000 MB/s
USB 2.0 (Max Possible) 60 MB/s
IDE (ATA100) 100 MB/s
IDE (ATA133) 133 MB/s
SATA 150 MB/s
SATA II 300 MB/s
Gigabit Ethernet 125 MB/s
IEEE1394B [Firewire 800] ~100 MB/s


9-13. Problems
9-1) ISA stands for
a) International American Standard.
b) Industry Standard Architecture.
c) International Standard Architecture.
d) None of the above.
9-2) IDE disk is connected to the PCI BUS using ______ interface.
a) ISA
b) ISO
c) ANSI
d) IEEE
9-3) ________ is an extension of the processor BUS.
a) SCSI BUS
b) USB
c) PCI BUS
d) None of the above
9-4) _____ provides a separate physical connection to the memory.
a) PCI BUS
b) PCI interface
c) PCI bridge
d) Switch circuit
9-5) The key feature of the PCI BUS is
a) Low cost connectivity.
b) Plug and Play capability.
c) Expansion of Bandwidth.
d) Both a and c.
9-6) The DMA differs from the interrupt mode by
a) The involvement of the processor for the operation
b) The method accessing the I/O devices
c) The amount of data transfer possible
d) Both a and c

9-9) Describe briefly how to design a simple I/O extension card that can
be inserted into an ISA expansion slot to provide a simple 8-bit input
port (via DIP switches) and a simple 8-bit output port (to 8 LEDs or a
7-segment display). Hint: Refer to the circuit in figure 9-11.

9-10) Write an assembly program that reads the input port (A) and outputs
the reading to the output port (B) of the 8255 chip in the above card.

9-11) Show how to connect and initialize an external serial device, to
COM1, with a given baud rate and parity. Write the software driver that
you will use, in C language.
Hint: Browse the HyperTerminal program, which is supplied with your
Windows operating system (Start/Programs/Accessories/Communication)

9-12) Consider the parallel port test circuit shown below. Show how to
build the project (pptest), and run the resultant pptest.exe under DOS such
that when you set the DIP switches to "11111111", LED1 to LED8 in
the hardware will glow.

9-13) Design a parallel printer ISA interface card on the basis of the
8255, to drive a stepper motor, via the keyboard. Use a 3-to-8 decoder
(74137) for assigning the appropriate port address.
Hint: Let the right and left arrow keys (→ and ←) rotate the motor right
and left, the space bar start the motor, and the escape key stop it.



Prof. Dr. Muhammad El-SABA
Introduction to Microprocessors & Interface Circuits CHAPTER 10

Microprocessor Support
Chips & PC Chipsets
Contents

10-1. Introduction
10-2. DMA Controller, Intel 8237 Chip
10-3. UART Chip, Intel 8250A Chip
10-4. USART Chip, Intel 8251A Chip
10-5. UART Chip, Intel 16550 Chip
10-6. Programmable Interval Timer (PIT), Intel 8253 Chip
10-7. Programmable Peripheral Interface (PPI), Intel 8255 Chip
10-8. Programmable Interrupt Controller (PIC), Intel 8259 Chip
10-9. Keyboard / Display Controller, Intel 8279 Chip
10-10. Bus controller, Intel 8288 Chip
10-11. Bus Arbiter, Intel 8289 Chip
10-12. CRT Controller, Intel 8275 & Motorola 6845 Chips
10-13. Graphic Processing Units (GPU) & Graphic Accelerators
10-14. IBM PC Chipsets
10-15. Case Study: Intel 82845 Chipset
10-15.1. North-Bridge Chipset (MCH)
10-15.2. South-Bridge Chipset (ICH)
10-16. Intel DG965 Chipsets
10-17. AMD 690G Chipsets
10-18. Intel 5-Series (Core i7) Chipsets
10-18. Intel 6- Series and 7-Series Chipsets
10-19. Apple PC Chipsets
10-20. Intel Z170 (Skylake) Chipset
10-21. Summary

Microprocessor Support
Chips & PC Chipset
10-1. Introduction
In order to build a complete microprocessor system, various support chips
are needed in conjunction with the microprocessor. When Intel
introduced its 80186 processor in 1982, which was designed as an
embedded solution, it built the clock generator, DMA channels,
interrupt controller, timers, and wait-state generator into the CPU.
However, since the introduction of the 80286, the prevailing approach has
been to integrate such functions in a chipset. In 1999, Intel launched a
family of compatible 800-series chipsets, the first of which was the 810
chipset.

Table 10-1. Intel Microprocessor Support Chips (for early PCs)

Chip        Registers   Uses
8042        -           Keyboard Interface 8-Bit Microcontroller
8237        4           Direct-Memory Access (DMA) Controller, 4 channels
8250        10          Universal Asynchronous Receiver Transmitter (UART)
8251        4           Universal Synchronous/Asynchronous Receiver Transmitter (USART)
16550       16          UART, as 8250, but a more modern, faster chip
8253 / 54   4           Programmable Interval Timer (PIT), timer chip
8255        4           Programmable Peripheral Interface (PPI) Adaptor
8256        4           Multifunction UART (MUART)
8259        8           Programmable Interrupt Controller (PIC)
8272        -           Floppy Disk Controller
8275        16          CRT (video) Controller
8279        8           Keyboard / Display Controller
8288        -           Bus Controller
8289        -           Programmable Bus Arbiter
82091       -           Advanced Integrated Peripheral (AIP)
8742        -           Universal Peripheral Interface 8-Bit Microcontroller
6845        16          CRT Controller, for color graphics adaptors
6849        4           Direct-Memory Access (DMA) Controller, 4 channels


Fig. 10-1. Block diagram of the IBM PC, showing the 8088 CPU and support chips.

The system chipset is the set of logic circuits that form the intelligence
of the computer. It usually controls data transfers between the CPU, cache,
and system buses. Since data flow is such a critical issue, the chipset is
one of the components that have a major impact on computer
performance. In this chapter we describe the best-known support chips for
x86 CPUs and the IBM PC.

10.2. DMA Controller (DMAC), Intel 8237 Chip


The Direct Memory Access (DMA) subsystem in the IBM PC is based on
the Intel 8237 DMA controller. The 8237 contains four DMA channels
that can be programmed independently and any one of the channels may
be active at any moment. These channels are numbered 0, 1, 2 and 3. The
8237 has two electrical signals for each channel, named DRQ and
DACK.

There are additional signals with the names HRQ (Hold Request), HLDA
(Hold Acknowledge), EOP (End of Process), and the bus control signals
MEMR (Memory Read), MEMW (Memory Write), IOR (I/O Read), and
IOW (I/O Write).

The 8237 DMA controller is known as a fly-by controller. This means that
the data being moved from one location to another does not pass through the
DMA chip and is not stored in it. Consequently, a fly-by transfer can only
move data between an I/O port and memory, but not between two I/O ports
or two memory locations.
10.3. UART, Intel 8250A Chip
The IBM PC uses the Intel 8250A UART (universal asynchronous
receiver and transmitter) as the interface unit for serial communication
with the outside world. The 8250A is a special interface unit
that converts parallel information to serial information and vice versa.
The UART is used to connect the IBM PC to serial devices like
modems (modulator/demodulators) and fax/modem cards. Since the
introduction of the DOS computer, four types of UART have been used in
this hardware: the 8250 chip, followed by the 8250A, then the 16450 chip,
and finally the 16550 chip, which is found in more recent PCs. These
chips are pin compatible, but differ in performance. The following
table indicates the performance of these various chips.
Table 10-2. Max data rate of famous UART chips.

Type Max. Data Rate (Baud)


8250 9600 bits/second
16450 30K bits/second
16550 112K bits/second

The UART chip has a total of 12 different registers that are mapped into 8
different port I/O locations. Obviously, this means that more than one
register uses the same port I/O location, which affects how the UART can be
configured. In reality, two of the registers share one location in
different contexts: the port I/O address to which you write the characters
to be sent out of the serial data port is the same address from which you
read the characters that are sent to the computer. Another I/O port address
also selects a different register when you write data to it than when you
read data from it, so the value read back differs from the value just
written. More on that in a little bit.

One of the issues that came up when this chip was originally being
designed was that the designer needed to be able to send information
about the baud rate of the serial data with 16 bits. This actually takes up
two different registers and is toggled by what is called the Divisor Latch
Access Bit (DLAB). When the DLAB is set to "1", the baud rate registers
can be set and when it is "0" the registers have a different context. Does
all this sound confusing? Maybe, but let us take it step by step. The
following is a table of each of the registers that can be found in a typical
UART chip:

Table 10-2. Registers of UART chips.

Register Name                       Abbrv.   I/O Access   DLAB   Base Address
Transmitter Holding Buffer          THR      Write        0      +0
Receiver Buffer                     RBR      Read         0      +0
Divisor Latch Low Byte              DLL      Read/Write   1      +0
Interrupt Enable Register           IER      Read/Write   0      +1
Divisor Latch High Byte             DLM      Read/Write   1      +1
Interrupt Identification Register   IIR      Read         x      +2
FIFO Control Register               FCR      Write        x      +2
Line Control Register               LCR      Read/Write   x      +3
Modem Control Register              MCR      Read/Write   x      +4
Line Status Register                LSR      Read         x      +5
Modem Status Register               MSR      Read         x      +6
Scratch Register                    SR       Read/Write   x      +7

The "x" in the DLAB column means that the status of the DLAB has no
effect on what register is going to be accessed for that offset range.
Notice also that some registers are Read only. If you attempt to write data
to them, you may end up with either some problems with the modem
(worst case), or the data will simply be ignored (typically the result).
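The way 12 registers fit into 8 port locations can be summarised in a small C lookup. This is a sketch that simply restates the register table; the function name and its parameters are assumptions introduced for illustration.

```c
#include <stdbool.h>

/* Abbreviation of the register reached at (base + offset), given the
   state of the DLAB bit and the direction of the access. */
const char *uart_register(unsigned offset, bool dlab, bool is_write)
{
    switch (offset) {
    case 0:  return dlab ? "DLL" : (is_write ? "THR" : "RBR");
    case 1:  return dlab ? "DLM" : "IER";
    case 2:  return is_write ? "FCR" : "IIR";
    case 3:  return "LCR";
    case 4:  return "MCR";
    case 5:  return "LSR";
    case 6:  return "MSR";
    case 7:  return "SR";
    default: return "";     /* outside the 8-port window */
    }
}
```

Offset 0 alone reaches three registers (THR, RBR and DLL), which is why the DLAB bit must be cleared again after the baud rate has been programmed.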

As mentioned earlier, some registers share a port I/O address, where one
register is used when you write data to it and another register is
used when you read data from the same address. Each serial communication
port has its own set of these registers. For example, if you want to
access the Line Status Register (LSR) for COM1, and assuming the base
I/O port address of 03F8, the I/O port address of this register is
03F8 + 05 = 03FD. Some example code would be like this:

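#include <dos.h>            /* inportb() of Borland/Turbo C (assumed compiler) */
#define COM1_BASE  0x03F8
#define COM2_BASE  0x02F8
#define LSR_OFFSET 0x05
/* Read the Line Status Register of COM1 */
unsigned char LSR_Value(void) { return inportb(COM1_BASE + LSR_OFFSET); }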

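Note that we use the following formula to determine the number that we
need to put into the Divisor Latch bytes (DLL and DLM), assuming the
standard PC UART input clock of 1.8432 MHz:
Divisor = 1,843,200 / (16 x Baud rate) = 115,200 / Baud rate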
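The divisor calculation can be sketched in C as follows. The sketch assumes the standard PC UART input clock of 1.8432 MHz, divided by 16 inside the chip; the macro and function names are hypothetical.

```c
#include <stdint.h>

#define UART_CLOCK_HZ 1843200UL   /* assumed standard PC UART input clock */

/* 16-bit divisor for a requested baud rate (baud must be non-zero). */
uint16_t uart_divisor(unsigned long baud)
{
    return (uint16_t)(UART_CLOCK_HZ / (16UL * baud));
}

/* Byte to write to the Divisor Latch Low register (base+0, DLAB = 1). */
uint8_t divisor_low(uint16_t divisor)
{
    return (uint8_t)(divisor & 0xFFu);
}

/* Byte to write to the Divisor Latch High register (base+1, DLAB = 1). */
uint8_t divisor_high(uint16_t divisor)
{
    return (uint8_t)(divisor >> 8);
}
```

Under this assumption, 9600 baud gives a divisor of 12, and 300 baud gives 384 (low byte 80H, high byte 01H). A divisor of 1 corresponds to 115,200 baud, the highest rate this clock allows.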

The Interrupt Enable Register allows us to control when and how the
UART is going to trigger an interrupt event with the hardware interrupt
associated with the serial COM port. If used properly, this can enable an
efficient use of system resources and allow you to react to information
being sent across a serial data line in real time. The point here is that
you can use the UART to let you know exactly when you need to extract
some data. This register has both read and write access. The following
table shows each bit in this register and the event that it enables:

Table 10-3. Bits of the Interrupt Enable Register (IER)

Bit Notes
7 Reserved
6 Reserved
5 Enables Low Power Mode (16750)
4 Enables Sleep Mode (16750)
3 Enable Modem Status Interrupt
2 Enable Receiver Line Status Interrupt
1 Enable Transmitter Holding Register Empty Interrupt
0 Enable Received Data Available Interrupt

The Received Data Available Interrupt is a way to let you know that there
is some data waiting for you to pull off of the UART. This is probably the
one bit that you will use more than the rest.
The Transmitter Holding Register Empty Interrupt is to let you know
that the output buffer (on more advanced models of the chip like the
16550) has finished sending everything that you pushed into the buffer.
This is a way to streamline the data transmission routines so they take up
less CPU time.


The Receiver Line Status Interrupt indicates that something in the LSR
register has probably changed. This is usually an error condition, and if
you are going to write an efficient error handler for the UART that will
give plain text descriptions to the end user of your application, this is
something you should consider.
The Modem Status Interrupt notifies you when something changes with
an external modem connected to your computer. This includes the
telephone bell ringing, that you have successfully connected to another
modem (Carrier Detect has been turned on), or that somebody has hung
up the telephone (Carrier Detect has turned off). It can also help you to
know if the external modem or data equipment can continue to receive
data (Clear to Send). Essentially this deals with the other wires in the RS-
232 standard other than strictly the transmit and receive wires. The other
two modes are strictly for the 16750 chip, and help put the chip into a low
power state for use on battery-powered laptop computers or embedded
controllers. On earlier chips you should treat these bits as "Reserved", and
only put a "0" into them.
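As a minimal sketch, the Interrupt Enable Register value can be composed in C from bit masks that follow the bit table above. The macro and function names are assumptions for illustration; writing the value to the IER port (base + 1, with DLAB = 0) is left to whatever port-output routine the compiler provides.

```c
#include <stdint.h>

#define IER_RX_DATA      0x01u  /* bit 0: Received Data Available     */
#define IER_THR_EMPTY    0x02u  /* bit 1: Transmitter Holding Empty   */
#define IER_LINE_STATUS  0x04u  /* bit 2: Receiver Line Status        */
#define IER_MODEM_STATUS 0x08u  /* bit 3: Modem Status                */
#define IER_RESERVED     0xF0u  /* bits 4-7: keep 0 on earlier chips  */

/* Build the IER byte; each argument is non-zero to enable the event. */
uint8_t ier_value(int rx, int tx, int line, int modem)
{
    uint8_t value = 0;
    if (rx)    value |= IER_RX_DATA;
    if (tx)    value |= IER_THR_EMPTY;
    if (line)  value |= IER_LINE_STATUS;
    if (modem) value |= IER_MODEM_STATUS;
    return value;   /* reserved bits are always left at 0 */
}
```

A receive-only driver with error reporting would use ier_value(1, 0, 1, 0), which is 05H.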

10.4. USART, Intel 8251A Chip


The Intel 8251A is a USART: it can be programmed as a synchronous or an
asynchronous device. The 8251A has 8 data pins (D0-D7) that connect to
the data bus of the PC. The chip select (CS) input enables the IC when it
is asserted by the control bus of the PC system. This
IC has two internal addresses, a control address and a data address. The
control address is selected when the C/D input is high. The data address is
selected when the C/D input is low. The RESET signal resets the IC.
When RD is low, the computer reads a control or a data byte. The WR signal
enables the PC to write a byte. Both signals are connected to the system
signals with the same names.


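[Figure: the 8250 UART and its PC connections. Bus side: D0-D7 through a
74LS244 buffer, IOR, IOW, A0-A9, AEN, chip selects CS0-CS2, MR, and INTRPT
to IRQ3. Serial side: SOUT (TxD), SIN (RxD), RTS, DTR, CTS, DSR, RLSD, RI,
OUT1 and OUT2 through RS-232 drivers. Clock: 18.432 MHz crystal with a
divide-by-10 stage feeding Xin/Xout.]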

Fig. 10-2(a). The 8250 UART chip block diagram, and how it is connected in a PC.

10.5. UART, Intel 16550 Chip


Apart from its on-chip FIFO buffers, the 16550 is identical to its
predecessors, so it can be used as a 16450 or 8250 replacement. The older
I/O cards found in DOS computer hardware had a 40-pin DIL UART chip mounted
in a socket, so it was simple to upgrade by just replacing the chip. Some
SPG cards had a 16450 chip while others had a 16451, which is
equivalent to the 16450 with a parallel interface as well as a UART.


Fig. 10-2(b). Pin-out diagram of the 8250 and 16550 UART Chips

10.6. Programmable Interval Timer (PIT), Intel 8253/8254 Chips


The 8253 chip is designed to solve the timing control problems in
microcomputer systems. It is used in the IBM PC to generate various
clocks for different support chips. So, the 8253 solves one of the most
common problems in any microcomputer system, the generation of
accurate time delays under software control. Instead of setting up timing
loops in software, the programmer configures the 8253 to match his
requirements and programs one of the counters for the desired delay. The
8254 chip is a superset of the 8253 counter/timer. The 82C54 is a high-
performance CMOS version of the 8254. The pin-out and functional
block diagram of the 8253/8254 chips are shown in figures 10-3(a), (b).

As shown in the figure, the 8254 chip contains three 16-bit programmable
counters, which can be independently programmed to produce different
output clocks (OUT0, OUT1 and OUT2). Each counter is capable of handling
clock input frequencies of up to 12MHz.

Counter0 is connected to IRQ0 of the 8259 PIC (programmable interrupt
controller) chip, which is the highest-priority interrupt in the IBM PC.
This interrupt provides the time-of-day (TOD) as well as other services.
Counter1 is usually used in DRAM refreshing. As we have seen,
channel 0 of the 8237 DMA controller was used to refresh DRAM in
early IBM PCs. In this case, the 8253 was responsible for signalling the
DMA controller periodically (at least every 15 us, the allowable refresh
cycle time) to do memory refreshing.

Counter2 is usually connected to the PC speaker to play music. It is also
connected to PC5 (bit 5 of port PC) of the 8255 peripheral I/O (PIO)
adaptor chip. The 8253 can be programmed in 6 different modes
(mode0-mode5), by writing a specific control word into the 8253 control
register, as shown in figure 10-3(c) below. The two most significant bits
(SC1, SC0) are used to choose one of the 3 counters (11 is illegal on the
8253). The following 2 bits (RW1, RW0) define the method of reading/writing
the counters: 00 = latch count for reading, 01 = read/load low byte only,
10 = read/load high byte only and 11 = read/load low byte then high byte.
The 6 programmable timer modes are chosen via 3 bits of the control word
(M2, M1, M0). These modes allow the 8254 to be used as an event counter
(mode0), programmable one-shot (mode1), rate generator (mode2),
square-wave generator (mode3) and in other applications. In the IBM PC,
the I/O addresses of timer channels 0, 1, 2 and the control register are
40H, 41H, 42H and 43H, respectively.
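The layout of the control word can be illustrated with a short C sketch. The function names are hypothetical, and the input clock of 1.193182 MHz is the value commonly quoted for the PC timer, stated here as an assumption rather than taken from the text.

```c
#include <stdint.h>

#define PIT_CLOCK_HZ 1193182.0   /* assumed PC timer input clock */

/* Pack the fields of the 8253/8254 control word:
   D7-D6 = counter select, D5-D4 = read/load method,
   D3-D1 = mode, D0 = 0 for binary or 1 for BCD counting. */
uint8_t pit_control_word(unsigned counter, unsigned rw,
                         unsigned mode, unsigned bcd)
{
    return (uint8_t)(((counter & 3u) << 6) | ((rw & 3u) << 4) |
                     ((mode & 7u) << 1) | (bcd & 1u));
}

/* Output frequency for a 16-bit count; a count of 0 means 65536. */
double pit_output_hz(uint16_t count)
{
    double divisor = (count == 0) ? 65536.0 : (double)count;
    return PIT_CLOCK_HZ / divisor;
}
```

Counter 0 as a square-wave generator with both bytes loaded in binary gives pit_control_word(0, 3, 3, 0), which is 36H. With the maximum count of 65536, the output rate under the assumed clock is about 18.2 Hz.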

Fig. 10-3(a) Pin-out diagram of the 8253/8254 programmable interval timer (PIT).


Fig. 10-3(b). Functional block diagram of the 8253/54 programmable interval timer.

Bits       D7-D6        D5-D4            D3-D1                  D0
Name       SC1, SC0     RW1, RW0         M2, M1, M0             B/BCD
Function   Channel ID   Read/Load mode   Count mode selection   0 = Binary, 1 = BCD

Fig. 10-3(c). Bits of the control register of the 8254 programmable interval
timer/counter (PIT).


10.7. Programmable Peripheral Interface (PPI). Intel 8255 Chip


The 8255 is a programmable peripheral interface (PPI) adaptor chip that
can take care of up to three parallel I/O devices. Sometimes the 8255 chip
is called parallel input/output (PIO).

Fig. 10-4(a). Pin-out diagram of the 8255b chip

The 8255 chip contains four registers: one control register and three
data registers (one for each of its three ports PA, PB and PC). Not
necessarily all ports are doing I/O. When the CPU executes an IN or
OUT instruction, the two address bits (A0-A1) specify which of the four
registers will be accessed. Thus a connection between the CPU and the
I/O device can be established. As we have explained in chapter 8, the
8255A has three operation modes:
1- Basic I/O mode: Each of the 3 ports can be programmed as an input or
output port. When the CPU does an output, the value is stored in the
register; the 8255 latches it (holds it constantly) and drives it on its
output pins until the CPU writes new data to the register (the CPU should
not write new data until the I/O device has received the previous item).
The PPI does not latch its input; the CPU can only read a value while the
external device is sending it. So, the CPU reads the current value from the
port's data register.

2- Strobed mode: PA and PB serve as data buffers with strobing
(handshaking) signals provided by PC. The handshaking of PA is
provided by the upper half of PC. The handshaking of PB is provided by
the lower half of PC.
3- Bidirectional bus mode: this mode is equivalent to strobed mode
with direction flow control for PA (bi-directional with handshaking).

Fig. 10-4(b). Architecture of the 8255 PPI chip.

Programming details of the 8255 chip can be found with numerous
examples in chapter 8. As we stated before, the original IBM PC had a
single 8255 chip in I/O mode 0, with PA address = 60H, PB address =
61H, PC address = 62H, and the control register address = 63H. Also, the
default control word is 99H.
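To see what the control word 99H configures, the mode-set control word of the 8255 can be decoded with a short C sketch. The bit layout used here is the standard 8255 mode-definition format; the structure and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool     mode_set;        /* D7 = 1: mode-definition word  */
    unsigned group_a_mode;    /* D6-D5: mode of port A group   */
    bool     pa_input;        /* D4 = 1: port A is an input    */
    bool     pc_upper_input;  /* D3 = 1: PC4-PC7 are inputs    */
    unsigned group_b_mode;    /* D2: mode of port B group      */
    bool     pb_input;        /* D1 = 1: port B is an input    */
    bool     pc_lower_input;  /* D0 = 1: PC0-PC3 are inputs    */
} ppi_config;

/* Split an 8255 mode-set control word into its fields. */
ppi_config ppi_decode(uint8_t cw)
{
    ppi_config c;
    c.mode_set       = (cw & 0x80u) != 0;
    c.group_a_mode   = (cw >> 5) & 0x03u;
    c.pa_input       = (cw & 0x10u) != 0;
    c.pc_upper_input = (cw & 0x08u) != 0;
    c.group_b_mode   = (cw >> 2) & 0x01u;
    c.pb_input       = (cw & 0x02u) != 0;
    c.pc_lower_input = (cw & 0x01u) != 0;
    return c;
}
```

Decoding 99H (binary 1001 1001) gives mode 0 for both groups, with PA and both halves of PC as inputs and PB as an output.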


Fig. 10-4(c). Mode summary of the 8255 chip

10.8. Programmable Interrupt Controller (PIC). Intel 8259 Chip


The Intel 8259 PIC chip helps the microprocessor to take care of
multiple interrupts. In fact, the 8086/8088 processors have only two
interrupt control inputs, nonmaskable interrupt (NMI) and interrupt
request (INTR). NMI handles interrupts that cannot be masked (like
memory parity error). In the IBM PC, the interrupt vector table (IVT) is
stored in the first 1 kB of PC RAM. This table consists of 256 vectors.
Each vector requires 4 bytes because addresses are specified in segment
and offset format. The interrupt vectors are used to locate the interrupt
service routines (ISRs), which are available to BIOS, DOS and
applications. Figure 10-5(a) depicts the pin-out diagram of the 8259 chip.
Also, figure 10-5(b) illustrates the 8259 functional block diagram.
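Since each vector occupies 4 bytes, the position of a vector in the IVT is simply 4 times the interrupt number. The following C sketch illustrates this, together with the IRQ-to-vector mapping listed in Table 10-4 (IRQ0-IRQ7 at 08H-0FH, IRQ8-IRQ15 at 70H-77H). The function names are hypothetical.

```c
#include <stdint.h>

/* Byte offset of a vector inside the interrupt vector table. */
uint16_t ivt_offset(uint8_t vector)
{
    return (uint16_t)(vector * 4u);   /* 2-byte offset + 2-byte segment */
}

/* Interrupt number assigned to a hardware IRQ line (0-15) in the PC. */
uint8_t irq_to_vector(unsigned irq)
{
    return (uint8_t)(irq < 8u ? 0x08u + irq : 0x70u + (irq - 8u));
}
```

For example, IRQ4 (COM1) maps to interrupt 0CH, whose vector starts at address 30H, and the last vector (FFH) starts at 3FCH, the final 4 bytes of the first 1 kB.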

The 8259 interrupt controller chip can perform the following functions:

1. It receives interrupt requests from up to 8 different sources.
2. It prioritizes and queues interrupts that come at the same time.
3. It masks interrupt requests when instructed by the CPU.
4. It processes the CPU interrupt-acknowledge signal by sending the
interrupt vector number to the CPU (on the data bus).


Fig. 10-5(a). Pin-out diagram of 8259 PIC chip

Fig. 10-5(b). Functional block diagram of the 8259 PIC chip

Table 10-4. Interrupts in the IBM PC

Int. Num. Address in IVT Description


0 00-03 CPU divide by zero
1 04-07 Debug single step
2 08-0B Nonmaskable Interrupt (NMI input)
3 0C-0F Debug breakpoints
4 10-13 Arithmetic overflow
5 14-17 BIOS provided Print Screen routine

6 18-1B Reserved
7 1C-1F Reserved
8 20-23 IRQ0, Time of day hardware services
9 24-27 IRQ1, Keyboard Interface
A 28-2B IRQ2, ISA Bus cascade for second 8259
B 2C-2F IRQ3, Com 2 hardware
C 30-33 IRQ4, Com1 hardware
D 34-37 IRQ5, LPT2, parallel Hard Disk on XT
E 38-3B IRQ6, Floppy Disk adaptor
F 3C-3F IRQ7, LPT1, Parallel port hardware
10 40-43 Video services, see note 1
11 44-47 Equipment check
12 48-4B Memory size determination
13 4C-4F Floppy I/O routines
14 50-53 Serial port I/O routines
15 54-57 PC used for Cassette tape services
16 58-5B Keyboard I/O routines
17 5C-5F Printer I/O routines
18 60-63 Points to basic interpreter in a real IBMPC
19 64-67 Bootstrap loader
1A 68-6B Time of day services
1B 6C-6F Ctrl-Break service
1C 70-73 Timer tick (provides 18.2 ticks per sec)
1D 74-77 Video parameters
1E 78-7B Disk parameters
1F 7C-7F Video graphics
20 80-83 Program termination (obsolete)
21 84-87 All DOS functions through this Interrupt
22 88-8B Terminate address
23 8C-8F Ctrl-Break exit address
24 90-93 Critical error handler
25 94-97 Read logical sectors
26 98-9B Write logical sectors
27 9C-9F Terminate and Stay Resident (TSR)
28 to 3F A0-A3 to FC-FF Reserved for DOS
40 to 4F 100-103 to 13C-13F Reserved for BIOS
50 140-143 Reserved for BIOS
51 144-147 Mouse functions
52 to 59 148-14B to 164-167 Reserved for BIOS
5A 168-16B Reserved for BIOS
5B 16C-16F Reserved for BIOS
5D 174-177 Reserved for BIOS
5E 178-17B Reserved for BIOS
5F 17C-17F Reserved for BIOS
60 to 66 180-183 to 198-19B Reserved for User programs
67 19C-19F Used for EMS functions
68 to 6F 1A0-1A3 to 1BC-1BF Unused
70 1C0-1C3 IRQ8, ISA bus Real time clock
71 1C4-1C7 IRQ9, takes the place of IRQ2
72 1C8-1CB IRQ10 (available hardware interrupt)
73 1CC-1CF IRQ11 (available hardware interrupt)
74 1D0-1D3 IRQ12 (available hardware interrupt)
75 1D4-1D7 IRQ13, math co-processor
76 1D8-1DB IRQ14, ISA bus hard disk controller
77 1DC-1DF IRQ15, (available hardware interrupt)
78 to 7F 1E0-1E3 to 1FC-1FF Unused
80 to 85 200-203 to 214-217 Reserved for BASIC interpreter
86 to F0 218-21B to 3C0-3C3 Used by BASIC interpreter
F1 to FF 3C4-3C7 to 3FC-3FF Unused

Fig. 10-5(c). Connection of the 8259 chip to the 8088 microprocessor in IBM PC.


The 8259 has many modes of operation. In the simplest mode, the chip is
programmed by an initialization word of 3 bytes, in much the same way
as you program the 8255 chip. This 3-byte word assigns the interrupt
request lines (IRQs) to certain interrupt vectors and defines the
triggering mode of the interrupt signals (edge or level triggered). After
initialization, the 8259 is ready to accept up to 8 interrupt requests
and to raise its INT output pin. The IRQ table changes a little from one
IBM PC to another, according to the installed peripheral devices. The
table below lists the IRQs of a recent IBM PC.

Table 10-5. Interrupt Requests (IRQs) in a recent IBM PC

IRQ Number Description


IRQ00 System Timer (PIT Timer 0)
IRQ01 Standard KB
IRQ02 2nd PIC (chained)
IRQ03 COM2 (communication port 2)
IRQ04 COM1 (communication port 1)
IRQ05 In use by Unknown device
IRQ06 FDD (Floppy Disk Drive controller)
IRQ07 LPT1 (ECP printer port)
IRQ08 System CMOS/Real time clock
IRQ09 PCI Audio device
IRQ10 Display card
IRQ11 Modem card
IRQ12 PS/2 Mouse
IRQ13 Numeric coprocessor
IRQ14 IDE (Hard disk drive) controller
IRQ15 Secondary IDE (Hard disk drive) controller


10.9. Programmable Keyboard and Display. Intel 8279


The 8279 is a programmable keyboard and display interface designed for
use with Intel microprocessors. The keyboard portion can provide a
scanned interface to a 64-contact key matrix. The display portion
provides a scanned display interface for LED, incandescent and other
display technologies.

Fig. 10-6(a). Pin-out diagram of the 8279 keyboard controller.

Fig. 10-6(b). Functional block diagram of the 8279 keyboard controller.


Fig. 10-6(c). Circuit diagram illustrating the use of the 8279 controller to drive 8
multiplexed 7-segment display units.

Fig. 10-6(d). Circuit diagram illustrating the use of the 8279 controller to drive an 8x8
matrix keypad.


It should be noted that the keyboard unit of the IBM PC contains an Intel
8042 (or 8048) microcontroller that is programmed to scan the keyboard
for key presses and releases (each counts as an individual keystroke),
debounce the keystrokes, implement the typematic (hold-to-repeat)
feature, maintain a 16-keystroke buffer, and transmit each keystroke
serially to the host PC system unit.

10.10. Bus Controller, Intel 8288


The Intel 8288 bus controller chip coordinates activities on the
microprocessor bus system. It converts the CPU status (S0, S1, S2) and
clock signals into bus control signals. These control signals direct the
operation of latches, data transceivers, and the I/O bus.

For instance, the 8288 uses the status signals S0, S1, S2 (pins 26-28 of
the 8086/8088 in maximum mode) to generate bus control signals
(ALE, MRDC, MWTC, IORC, IOWC, INTA, and other signals),
which facilitate the use of other support chips.

Fig. 10-7. Functional block diagram of the 8288 bus controller.


Table 10.6. Input status and output control signals of the 8288 chip.

Status inputs
S2  S1  S0    8288 command    CPU cycle
0   0   0     INTA            Interrupt acknowledge
0   0   1     IORC            Read I/O
0   1   0     IOWC, AIOWC     Write I/O
0   1   1     None            Halt
1   0   0     MRDC            Instruction fetch
1   0   1     MRDC            Read memory
1   1   0     MWTC, AMWC      Write memory
1   1   1     None            Passive
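The decoding performed by the 8288 can be mirrored by a small lookup table in C. This is only a software restatement of the table above, with a hypothetical function name; the real chip performs the decoding in hardware and also handles the bus timing.

```c
/* Command issued by the 8288 for a given status code S2 S1 S0.
   Each argument is 0 or 1. */
const char *bus_command(unsigned s2, unsigned s1, unsigned s0)
{
    static const char *const commands[8] = {
        "INTA",          /* 000: interrupt acknowledge */
        "IORC",          /* 001: read I/O              */
        "IOWC, AIOWC",   /* 010: write I/O             */
        "None",          /* 011: halt                  */
        "MRDC",          /* 100: instruction fetch     */
        "MRDC",          /* 101: read memory           */
        "MWTC, AMWC",    /* 110: write memory          */
        "None"           /* 111: passive               */
    };
    unsigned code = ((s2 & 1u) << 2) | ((s1 & 1u) << 1) | (s0 & 1u);
    return commands[code];
}
```

Note that S2 separates the two halves of the table: S2 = 0 selects I/O and interrupt cycles, while S2 = 1 selects memory cycles.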

10.11. Bus Arbiter, Intel 8289


Bus mastering in the 8086/8088 is achieved by 5 signals. In minimum mode,
the HOLD (input from the controlling device) and HLDA (output from the
8088) signals handle bus mastering between the 8086/8088 and other control
devices, when they need to take over the local bus. In maximum mode,
bus mastering is performed via 3 other signals,
namely the Request/Grant (RQ/GT0, RQ/GT1) and LOCK signals. When
an external controller needs to master the local bus, it pulls one of the
RQ/GT lines low for a complete clock cycle. Then the 8086/8088
may respond to the request by issuing a grant signal. The 8289 bus arbiter
chip can also take the status (S0, S1, S2) and LOCK signals to deliver
multi-bus control signals, as shown in figure 10-8.


Fig. 10-8. Connection of the 8289 bus arbiter to the 8088 in maximum mode.

10.12. CRT Controllers, Intel 8275 & Motorola 6845


Every PC has a display for monitoring its output and needs a video card or interface into which the display can be plugged. The majority of PCs use an onboard (integrated) graphics processor, which is built into most low- and mid-range motherboards. For gaming and 3D simulation applications, a good-quality graphics card is needed. Many CRT controller chips exist. The Intel 8275 and the Motorola 6845 were used extensively in the monochrome and color/graphics adapters of older PCs. These chips provide the interface needed to drive a raster-scan CRT.

The 6845 CRT controller has 19 registers (1 address register and 18 data registers), which provide video timing, refresh memory addresses, cursor, and light pen strobe signals. As shown in figure 10-9, the CRT video timing signals include the Vertical Sync (VS), Horizontal Sync (HS), and Display Enable (DE) output signals. Refresh memory addressing uses the Memory Address (MA[13:0]) and Row Address (RA[4:0]) output buses. The 6845 microprocessor interface consists of unidirectional data input (DIN[7:0]) and data output (DOUT[7:0]) buses and the control signals RS, RWn, CSn, and E.
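
Having one address register and 18 data registers means that the CPU reaches every internal register through only two locations: it first writes a register index with RS = 0, then writes the value with RS = 1. The following Python sketch models that indexed access. It is a simplified software model rather than a description of the chip's timing, the class and method names are hypothetical, and the values 80 and 25 are example settings for R1 (horizontal displayed) and R6 (vertical displayed).

```python
class Crtc6845Model:
    """Toy model of the 6845 register file: 1 address + 18 data registers."""

    NUM_DATA_REGISTERS = 18

    def __init__(self):
        self.address = 0
        self.registers = [0] * self.NUM_DATA_REGISTERS

    def write(self, rs, value):
        """Bus write: RS = 0 selects the address register, RS = 1 the data port."""
        value &= 0xFF
        if rs == 0:
            self.address = value & 0x1F  # the address register is 5 bits wide
        elif self.address < self.NUM_DATA_REGISTERS:
            self.registers[self.address] = value
        # Writes aimed at indexes 18..31 are ignored in this model.

    def set_register(self, index, value):
        """Convenience helper: select a register, then load it."""
        self.write(0, index)
        self.write(1, value)


if __name__ == "__main__":
    crtc = Crtc6845Model()
    crtc.set_register(1, 80)  # R1: characters displayed per row (example)
    crtc.set_register(6, 25)  # R6: character rows displayed (example)
    print(crtc.registers[1], crtc.registers[6])
```

The two-step access keeps the chip's bus footprint to two addresses, at the cost of an extra write per register update.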


Fig. 10-9(a). Pin-out diagram of the 6845 CRT controller.

Figures 10-9(b) and 10-9(c) depict the interface circuits of the monochrome display adapter (MDA) and the color graphics adapter (CGA). The monitor adapters are designed around the Motorola 6845 CRT controller module. There are 4K bytes of static memory on the adapter, which are used for the display buffer. This buffer has two ports and may be accessed directly by the processor. No parity is provided on the display buffer.

Two bytes are fetched from the display buffer in 553ns, providing a data
rate of 1.8M byte/sec. The monitor adapter supports 256 different
character codes. An 8K-byte character generator contains the fonts for the
character codes.
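
The fetch timing above can be checked with a little arithmetic. Taken literally, two bytes every 553 ns works out to about 3.6 million bytes per second, so the quoted 1.8M figure appears to count two-byte fetches rather than single bytes. That reading is an interpretation, not something the text states. The function name below is hypothetical.

```python
def display_buffer_rates(fetch_time_ns=553, bytes_per_fetch=2):
    """Return (fetches per second, bytes per second) for the display buffer."""
    fetches_per_second = 1e9 / fetch_time_ns
    bytes_per_second = fetches_per_second * bytes_per_fetch
    return fetches_per_second, bytes_per_second


if __name__ == "__main__":
    fetches, byte_rate = display_buffer_rates()
    print(f"{fetches / 1e6:.2f} M fetches/s, {byte_rate / 1e6:.2f} M bytes/s")
```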

The CGA has three modes available within the graphics mode. They are
low-resolution color graphics, medium-resolution color graphics, and
high-resolution color graphics. However, only medium- and high-
resolution graphics are supported in ROM. The following table
summarizes the three modes.


Fig. 10-9(b). Utilization of 6845 CRT controller in a monochrome display adapter


Fig. 10-9(c). Utilization of 6845 CRT controller in a color graphic adapter

10.13. Universal Peripheral Interface (UPI), Intel 8742 Chip


The Intel 8742 is a general-purpose Universal Peripheral Interface (UPI)
that allows the control and interface of peripheral devices. It contains a
microcontroller with 2K of program memory, 128 bytes of data memory,
8-bit timer/counter, and clock generator. Interface registers enable the
UPI to function as a peripheral controller in 8088 and 8086 systems.

Fig. 10-10(a). Interfacing the 8088 CPU to peripheral devices via the 8742 UPI.


Fig. 10-10(b). Pinout diagram of the 8742 UPI.

Fig. 10-10(c). Block diagram of the 8742 UPI.



Table 10-7. Description of pins of the 8742

10-14. Graphic Processing Units (GPU) & Graphic Accelerators


A graphics processing unit (GPU) is a computer chip that performs rapid
mathematical calculations, primarily for the purpose of rendering images.

The term GPU was popularized by Nvidia in 1999, when it marketed the GeForce 256 as "the world's first GPU". Many companies, such as Intel, Nvidia, AMD, and ATI, have produced GPUs under a number of brand names. GPUs are used in embedded systems, mobile phones, personal computers, workstations, and game consoles.

The GPUs of the most powerful class typically interface with the
motherboard by means of an expansion slot such as PCI Express (PCIe)
or Accelerated Graphics Port (AGP).

GPUs have moved away from the traditional fixed-function 3D graphics pipeline toward general-purpose computational engines. GPUs are located on plug-in cards, in a chipset on the motherboard, or in the same chip as the CPU, as shown in the following figure.

Fig. 10-11. Block diagram of NVidia GPU motherboard design.


10-15. IBM PC Chipsets


Since the early Intel microprocessors (more precisely, since the 80286), chipsets have been the backbone of every PC. The PC chipset combines the functions of the separate chips described above. The term chipset is also used to refer to the main processing circuitry on video cards. However, that is a totally different type of chipset from the motherboard (system) chipset. Sometimes the chipset chips are implemented as application-specific integrated circuits (ASICs). The total glue logic of the PC is divided between two major chips, called the North-Bridge and the South-Bridge.

Fig. 10-11. Block diagram of the 82845 chipset, showing the North-Bridge (MCH) and the South-Bridge (ICH2), as connected with the Pentium 4 and other devices.


The history of chipsets started when Intel licensed ZyMOS's POACH technology to create its 82230/82231 chipset for the 80286. This chipset integrated two interrupt controllers, an interval timer, the clock generator, the bus controller, two DMA controllers, the RTC, and a memory mapper into two chips. Each of those subsystems was previously a discrete IC, so this represented an unprecedented level of integration, which helped boost the performance of PCs. The two basic components of such chipsets are called the North-Bridge and the South-Bridge.

The North-Bridge chipset connects the CPU front-side data bus (see footnote 1) with main memory as well as other high-speed devices and ports, such as the AGP. The PCI bus is also connected to the North-Bridge chipset and serves as a mezzanine stage that links different types of buses. As shown in figure 10-12, the PCI bus is connected to the rest of the system via the South-Bridge chipset. The South-Bridge chipset conveys signals between the PCI bus and the ISA bus as well as other slow devices. For instance, the floppy disk drive (FDD), the enhanced IDE (EIDE) channels, the keyboard, the mouse, and one or more serial ports may be connected via this chipset. However, in most cases the system chipset does not integrate all of the circuitry needed by the motherboard.

Some motherboards have the following additional controllers:

• The Super I/O chip, which handles the serial ports, parallel port, USB ports, floppy disks, and sometimes the IDE hard disks.
• SCSI controllers (for SCSI hard disks and devices), and the controllers found in video, sound, and network cards.

10-16. Case Study: The i845 Chipset


The 82845 (i845D) is one of the best-known chipsets that support the Pentium 4 processor. This chipset supports DDR-SDRAM of the PC1600 (1.6 GB/s bandwidth) and PC2100 specifications. The i845D chipset is composed of a North-Bridge, the i82845 Memory Controller Hub (MCH), and a South-Bridge, the 82801BA I/O Controller Hub (ICH). In the following subsections we present both the MCH and the ICH components of the 82845 chipset. Besides its role as a bus and memory administrator, the i845D chipset retains the characteristics of its predecessor chipsets.

Footnote 1: As stated before, recent microprocessors have a wider external data bus (called the front-side bus) to connect them to main memory.


10-16.1. North-Bridge of the i845 Chipset (MCH)


The 82845 North-Bridge comes in a 593-pin FC-BGA package that differs from those of previous chipsets. The power supply of the chipset was optimized for higher stability. The North-Bridge is organized so that motherboard manufacturers have less difficulty building boards around this new chipset.

10-16.2. South-Bridge of the i845 Chipset (ICH)


The ICH chipset appeared with the i8xx range. Its principal contribution was support for the ATA100 specification in the new chipsets. The ICH2 has a 360-pin E-BGA package. As shown in figure 10-13, all components are well arranged, just as in the North-Bridge, to allow less expensive and more rapid development. We detail here the various elements controlled by the South-Bridge. The IDE controller can manage 4 units of ATA33/66/100 in bus-mastering DMA mode. Note that AT Attachment (ATA) is the standard specification that describes how the IDE and EIDE interfaces work with hard disks.

The later versions of ATA allow up to 4 EIDE devices (hard disks and CD drives) to be connected to the local bus of the motherboard. The ATA33 standard supports a data transfer rate of 33.3 MB/s (2 x 8.33 MHz x 2 bytes) and ATA66 supports a transfer rate of 66.7 MB/s (2 x 16.7 MHz x 2 bytes). The PCI controller can manage 6 PCI 2.2 devices at 33 MHz, with a maximum theoretical throughput of 133 MB/s. It uses 75 of the 360 pins of the South-Bridge. Although very seldom exploited, the ICH2 also contains a LAN controller, which must be associated with an external component.
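
The transfer rates quoted above all follow the same product: clock frequency, times transfers per clock, times bytes per transfer. The short Python sketch below reproduces the figures. The function name is hypothetical, and the 32-bit (4-byte) bus width used for the 133 MB/s case is an assumption based on the standard 33 MHz PCI bus.

```python
def peak_rate_mb_s(clock_mhz, transfers_per_clock, bytes_per_transfer):
    """Peak transfer rate in MB/s (10^6 bytes per second)."""
    return clock_mhz * transfers_per_clock * bytes_per_transfer


if __name__ == "__main__":
    # ATA33 and ATA66 move 2 bytes on both clock edges.
    print(f"ATA33: {peak_rate_mb_s(8.33, 2, 2):.1f} MB/s")
    print(f"ATA66: {peak_rate_mb_s(16.67, 2, 2):.1f} MB/s")
    # Assumed: 32-bit PCI at 33.33 MHz, one transfer per clock.
    print(f"PCI:   {peak_rate_mb_s(33.33, 1, 4):.0f} MB/s")
```

These are theoretical peaks; sustained rates on a real bus are lower because of protocol overhead.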

The LPC (Low Pin Count) controller connects the I/O controller to the South-Bridge. It controls the PS/2 ports, the LPT port, the IR ports, the diskette controller, the USB controller (4 ports), the AC97 sound controller (6 channels), and a system management bus (SMBus) controller. As shown in the figure, the inter-bridge connection is quad data rate (QDR) and communicates over a proprietary bus called the "Hub Interface". Figure 10-14 depicts the reference design for motherboards based on the i845D.

A request to allocate 256 KB of memory for the graphics card to access via the AGP bus could result in several small, non-contiguous chunks of main memory being allocated. To make this appear to the graphics card as one contiguous 256 KB piece of memory, the chipset has a Graphics Address Re-Mapping Table (GART). The GART maps a linear range of virtual memory addresses to multiple 4 KB physical pages in main memory. The amount of memory available for remapping by the GART is often determined by a BIOS setting known as the AGP aperture.
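
The remapping performed by the GART can be illustrated with a small Python model. It is a sketch under simplifying assumptions (a flat list of 4 KB page addresses and a single aperture base); the class name, the aperture base 0xE0000000, and the page addresses are hypothetical values chosen for the example.

```python
PAGE_SIZE = 4 * 1024  # 4 KB pages, as in the text


class Gart:
    """Toy model of a Graphics Address Re-mapping Table."""

    def __init__(self, aperture_base, physical_pages):
        for page in physical_pages:
            if page % PAGE_SIZE != 0:
                raise ValueError("physical pages must be 4 KB aligned")
        self.aperture_base = aperture_base
        self.table = list(physical_pages)  # entry i maps the i-th aperture page

    @property
    def aperture_size(self):
        return len(self.table) * PAGE_SIZE

    def translate(self, address):
        """Map a linear aperture address to a physical main-memory address."""
        offset = address - self.aperture_base
        if not 0 <= offset < self.aperture_size:
            raise ValueError("address is outside the aperture")
        index, within_page = divmod(offset, PAGE_SIZE)
        return self.table[index] + within_page


if __name__ == "__main__":
    # Three scattered physical pages appear as one contiguous 12 KB range.
    gart = Gart(0xE0000000, [0x00200000, 0x00135000, 0x007F0000])
    print(hex(gart.translate(0xE0001004)))  # second page, offset 4 -> 0x135004
```

A 256 KB allocation would use 64 such table entries; the graphics card only ever sees the contiguous aperture addresses.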

Fig. 10-12. Pins of the 82845 North-Bridge chipset (MCH).

Fig. 10-13. Pins of the 82845 South-Bridge chipset (ICH2).


Fig. 10-14. PC motherboard with the i845D chipset.

Fig. 10-15. Graphic Address Re-mapping Table.


10-17. Case Study: Intel DG 900 Series Chipset


The introduction of the Core 2 Duo in 2006 saw Intel release an additional eight desktop and six mobile chipsets in the 900 series. Figure 10-16 depicts the layout of the Intel DG965 chipset, which is dedicated to supporting the Core 2 Duo microprocessors. Most Core 2 chipsets used PCI Express 2 (PCIe) slots as well as DDR2 and, later, DDR3 memory.

Fig. 10-16. Block diagram of the Intel DG965 chipset, showing the North-Bridge
(MCH) and South-Bridge (ICH8) chipsets.

10-18. Case Study: AMD 690G Chipset


Most AMD systems were based on third-party chipsets, from vendors such as VIA, SiS, and Nvidia. However, in February 2007 AMD (jointly with ATI) announced its first product for the AMD64 platform, namely the AMD 690 chipset with the Radeon X1250 graphics core. The AMD 690G chipset consists of two chips, the RS690 North Bridge and the SB600 South Bridge. They are connected to one another via a PCI-Express x4 bus.

The North Bridge works with any contemporary Socket AM2 processor. It is also responsible for the PCI-Express expansion cards and the High Definition Audio codec, and it contains the integrated graphics adapter. The SB600 South Bridge supports the PCI bus, 10 USB 2.0 ports, 4 Serial ATA (SATA) channels (with RAID 0, 1, and 10 arrays), and one IDE channel.

The key feature of the AMD 690 chipset family is the powerful integrated graphics core from the Radeon X1200 series. There are two models in this family: the AMD 690G and the 690V. The top 690G solution features the Radeon X1250 graphics core and supports HDMI output, while the AMD 690V features an integrated

Fig. 10-17. AMD 690G chipset, connected with AMD processors and other devices.

The forefather of the Radeon X1250 was the Radeon X700, released in 2004. The major enhancement is support for Avivo (advanced features for media content processing, including HD video). The clock speed of the Radeon X1250 is 400 MHz, and up to 1 GB of system memory can be allocated to the graphics subsystem. The AMD 690G was the industry's first chipset to support High-Definition Multimedia Interface output (HDMI-Out). It has two monitor connectors, including one DVI. Moreover, the AMD 690G supports the SurroundView function, which allows four monitors to be connected to the system with an additional discrete PCIe video card. The following table recapitulates some of the most recent chipsets for Intel and AMD x86 machines and their characteristics.

Table 10-8. List of recent PC chipsets and their characteristics.

10-19. Case Study: Intel 5 Series (Core i7) Chipset


As mentioned in Chapter 2, the Nehalem-architecture processors (such as the Core i7) integrate the memory controllers and other features that were previously placed in the North-Bridge chipset. Figure 10-19 shows the difference between the chipsets of Nehalem-based microprocessors (e.g., the Core i7) and those of previous Intel processors. Figure 10-18 depicts the layout of a recent chipset from Intel (X58), which is dedicated to supporting the Core i7 Extreme Edition microprocessors.

As in earlier iterations of Intel core logic, the X58 Express is a chipset of two separate ICs, rather than a monolithic design like the nVidia or SiS chipsets. The rationale is cost efficiency, since the same ICH10 South-Bridge can be used in any other Intel chipset configuration.


One thing to be aware of is the lack of legacy hooks: there is no PS/2 connector and no EIDE parallel drive support. Needless to say, there are no serial or printer ports either. So, those who are still using old printers, scanners, or CD burners will have to find an expansion card or else upgrade to new devices with USB or Serial ATA (SATA) ports.

Fig. 10-18. Block diagram of the Intel x58 chipset.


Fig. 10-19. Difference between Nehalem processors (such as Corei7) and previous
Intel microprocessors and their chipsets

10-20. Case Study: Intel's 6- and 7-Series Chipsets (Sandy Bridge and Ivy Bridge)
The arrival of Sandy Bridge-based Core CPUs coincided with the launch of Intel's Cougar Point chipsets in 2011. The most feature-rich solution was Intel's Z68, which boasted eight PCIe 2.0 lanes, two SATA 6 Gb/s ports, four SATA 3 Gb/s ports, and 14 USB 2.0 ports. It also featured support for overclocking, RAID 0/1/10, and the ability to split PCIe connectivity between multiple GPUs. Later, with the introduction of the Panther Point platform and the Ivy Bridge processors, Intel integrated a USB 3.0 controller into its PCHs. As a result, all 7-series chipsets can support up to four USB 3.0 ports. Ivy Bridge CPUs also saw the introduction of PCIe 3.0.

10-21. Apple PC Chipsets


It is well known that Apple has changed from PowerPC to x86-64 processors, and specifically to Intel Xeon processors in its servers. The Xserve chipset, from Apple, makes use of the Intel Xeon (5400 series), which is based on the Core 2 architecture. Each processor has four 64-bit cores with a shared L2 cache of 12 MB. The FSB operates at 1.6 GHz. There are 8 FB-DIMM slots for PC2-6400 (DDR2-800) memory with ECC. Each slot can hold 4 GB, for a total of 32 GB. The graphics card is an AMD/ATI X1300, which has a PCI-X interface via the south-bridge chipset.


Fig. 10-20. Apple Xserve chipset with the Xeon microprocessor modules.

10-22. Case Study: Intel Z170 (Skylake) Chipset


The Z170 is the flagship performance chipset, which supports the unlocked K-series Skylake processors.

Fig. 10-21. Intel Z170 (Skylake) Chipset.

The chipset features up to 20 PCIe Gen 3 lanes, 6 SATA Gen 3 ports, 10 USB 3.0 ports out of a total of 14 USB ports (USB 3.0 / USB 2.0), up to 3 SATA Express capable ports, and up to 3 Intel RST capable PCIe storage ports, which may include x2 SATA Express or M.2 SSD ports with Enhanced SPI. The processor provides x4/x8/x16 capable Gen 3 PCI Express support. In addition, the Skylake processors are compatible with LGA 1151 socket boards.


Table 10-1. Intel Mainstream Platforms Comparison Chart:

                       | Intel Sandy Bridge Platform | Intel Ivy Bridge Platform | Intel Haswell Platform | Intel Broadwell Platform | Intel Skylake Platform
Processor Architecture | Sandy Bridge                | Ivy Bridge                | Haswell                | Broadwell                | Skylake
Date                   | 2011                        | 2012                      | 2013-2014              | 2015                     | 2015
Process                | 32nm                        | 22nm                      | 22nm                   | 14nm                     | 14nm
Cores (Max)            | 4                           | 4                         | 4                      | 4                        | 4
Platform               | Desktop LGA                 | Desktop LGA               | Desktop LGA            | Desktop LGA              | Desktop LGA
Chipset                | 6-Series                    | 7-Series                  | 8-Series               | 9-Series                 | 100-Series
Socket                 | LGA 1155                    | LGA 1155                  | LGA 1150               | LGA 1150                 | LGA 1151
Memory Support         | DDR3                        | DDR3                      | DDR3                   | DDR3                     | DDR4 / DDR3 (Up to 64 GB)


10-23. Summary
In this chapter we reviewed the best-known support chips of the IBM PC. In particular, we described the system chipset, which is a set of integrated computer support chips. A chipset is thus the logic circuitry that gathers the intelligence of the PC motherboard. Chipsets control data transfers between the processor, cache, and system buses, basically everything inside the PC. Since data flow is such a critical issue, the chipset is one of the most important components in the PC. The following figure depicts the general structure of the PC chipset, which is composed of two basic ICs, namely the north-bridge (IOH) and the south-bridge (ICH).

The north bridge (IOH) contains the fast system I/O interface circuits and
the south bridge (ICH) usually handles the slower transfer devices. Some
manufacturers combined the two ICs in a single monolithic chipset, but
the solution of two ICs (south and north bridges) is still widely adopted in
the PC industry.

Integrated chipsets (with graphics accelerators) are very attractive because they save money during PC assembly. Platforms using integrated graphics are much more economical. The power consumption of a computer not only translates directly into ever-growing power costs, but is also proportional to the amount of heat that must be removed from the system. Since the power consumption of most graphics cards keeps increasing, integrated systems may become a really good choice. The following table shows a comparison between some well-known chipsets.


Pentium 4 Chipsets

Chipset | South bridge | Date | CPU | FSB (MHz) | Memory | Max. Mem | PCI | Graphics | TDP (W)
850 | ICH2 | 2000 | Pentium 4 | 400 | PC800/600 RDRAM | 2 GB | v2.2/33 MHz | AGP 4× | -
850E | ICH2 | 2002 | Pentium 4 | 400/533 | PC1066/800 RDRAM | 2 GB | v2.2/33 MHz | AGP 4× | -
845E | ICH4 | 2002 | Celeron, Celeron D, Pentium 4 | 400/533 | DDR 200/266 | 2 GB | v2.2/33 MHz | AGP 4× | 5.8
845GL | ICH4 | 2002 | Celeron, Pentium 4 | 400 | DDR 266 | 2 GB | v2.2/33 MHz | Integrated | 5.1
845G | ICH4 | 2002 | Celeron, Pentium 4 | 400/533 | DDR 266 | 2 GB | v2.2/33 MHz | AGP 4× & s | 5.1
845GE | ICH4 | 2002 | Celeron, Pentium 4 | 400/533 | DDR 266/333 | 2 GB | v2.2/33 MHz | AGP 4× & integrated | 6.3
845PE | ICH4 | 2002 | Celeron, Pentium 4 | 400/533 | DDR 266/333 | 2 GB | v2.2/33 MHz | AGP 4× | 5.6
845GV | ICH4 | 2002 | Celeron, Celeron D, Pentium 4 | 400/533 | DDR 266/333 | 2 GB | v2.2/33 MHz | Integrated | 5.1
865G | ICH5/ICH5R | 2003 | Pentium 4, Pentium EE, Celeron, Celeron D | 400/533/800 | DDR 266/333/400 | 4 GB | v2.3/33 MHz | AGP 8× & | 12.9
865P | ICH5 | 2003 | Pentium 4, Celeron D | 400/533 | DDR 266/333 | 4 GB | v2.3/33 MHz | AGP 8× | 10.3
865PE | ICH5/ICH5R | 2003 | Pentium 4, Pentium D, Pentium EE, Celeron D | 400/533/800 | DDR 266/333/400 | 4 GB | v2.3/33 MHz | AGP 8× | 11.3
848P | ICH5/ICH5R | 2003 | Celeron, Celeron D, Pentium 4 | 400/533/800 | DDR 266/333/400 | 2 GB | v2.3/33 MHz | AGP 8× | 8.1
865GV | ICH5/ICH5R | 2003 | Pentium 4, Pentium D, Pentium EE, Celeron, Celeron D | 400/533/800 | DDR 266/333/400 | 4 GB | v2.3/33 MHz | Integrated | -


Core2 Chipsets
All Core2 Duo chipsets support the Pentium Dual-Core and Celeron
processors based on the Core architecture. Support for all NetBurst based
processors was officially dropped starting with the P35 chipset family.

Chipset | South Bridge | Date | CPU | Tech (nm) | FSB (MHz) | Memory types | Max Mem | Graphics
945GC | ICH7/7R/7DH | 2007 | Pentium 4, Core 2 Duo, Atom | 45 | 533/800 | DDR2 533/667 | 2 GB | PCI-E 16×, GMA 950
945GZ | ICH7 | 2005 | Pentium 4, Core 2 Duo | 65 | 533/800 | DDR2 400/533 | 2 or 4? GB | GMA 950
946PL | ICH7 | 2006 | Pentium 4, Core 2 Duo | 65 | 533/800 | DDR2 533/667 | 4 GB | PCI-Express 16×
946GZ | ICH7 | 2006 | Pentium 4, Core 2 Duo | 65 | 533/800 | DDR2 533/667 | 4 GB | PCI-E 16×, GMA 3000
G965 | ICH8/ICH8R/ICH8-DH | 2006 | Pentium Dual-Core/Core 2 Duo | 65 | 533/800/1066 | DDR2 533/667/800 | 8 GB | PCI-E 16×, GMA X3000
G31 | ICH7 | 2007 | Core 2 Quad/Core 2 Duo | 45 | 800/1066/1333 | DDR2 667/800 | 4 GB | PCI-E 16× 1.1, GMA 3100
G35 | ICH8/ICH8R/ICH8-DH | 2007 | Core 2 Quad/Core 2 Duo | 45 | 800/1066/1333 | DDR2 667/800 | 8 GB | PCI-E 16× 1.1, GMA X3500
X48 | ICH9/ICH9R/ICH9-DH | 2008 | Core 2 Quad/Core 2 Duo/Core 2 Extreme | 45 | 800/1066/1333/1600 | DDR3 1600, DDR2 1066 | 8 GB, 16 GB | 2 PCI-E 16× 2.0
G41 | ICH7 | 2008 | Core 2 Quad/Core 2 Duo | 45 | 800/1066/1333 | DDR3 800/1066, DDR2 667/800 | 4 GB, 8 GB | 1 PCI-E 16× 1.1, GMA X4500
G45 | ICH10/ICH10R | 2008 | Core 2 Quad/Core 2 Duo | 45 | 800/1066/1333 | DDR3 800/1066, DDR2 667/800 | 8 GB, 16 GB | 1 PCI-E 16× 2.0, GMA X4500
B43 | ICH10D | 2008 | Core 2 Quad/Core 2 Duo | 45 | 800/1066/1333 | DDR3 800/1066, DDR2 667/800 | 16 GB | 1 PCI-E 16× 2.0, GMA 4500


Core / Core2 mobile chipsets

Chipset | South bridge | Date | CPU | FSB (MHz) | Memory types | Max. memory | Graphics | TDP (W)
940GML | ICH7-M | 2006 | Celeron M, Pentium Dual-Core | 533 | DDR2 400/533 | 2 GB | GMA 950 graphics core | 7
945GSE | ICH7-M | Q2 '08 | Intel Atom | 533/667 | DDR2 400/533 | 2 GB | GMA 950 graphics core | 6
945GM/E | ICH7-M | 2006 | Core 2 Duo, Core Duo, Celeron M | 533/667 | DDR2 400/533/667 | 4 GB | GMA 950 graphics core | 7
945PM | ICH7-M | 2006 | Core 2 Duo, Core Duo, Celeron M | 533/667 | DDR2 400/533/667 | 4 GB | PCI-Express 16× | 7

Core i Series chipsets


The Nehalem microarchitecture, which is the basis of the Core i7 brand and of further projected brands, moves the memory controller onto the microprocessor. For high-end Nehalem processors, the X58 IOH acts as a bridge from the QPI to the PCI-Express peripherals and, via DMI, to the ICH10 southbridge. For mainstream and lower-end Nehalem processors, the processor itself contains the entire Northbridge, including the integrated memory controller (IMC) and in some cases a GPU, and the PCH (Platform Controller Hub) acts as the South-bridge. The Z68 chipset, which supports CPU overclocking and use of the integrated graphics, does not have the hardware bug that affected early 6-series chipsets. The Z68 also added support for transparently caching hard disk data onto solid-state drives (up to 64 GB). This is called Smart Response Technology.

LGA 1155
Chipsets supporting LGA 1155 CPUs.

Chipset | Date | Bus Interface | Bus Speed | PCI Express lanes | SATA | USB | TDP (W)
H61 | 2011 | DMI 2.0 | 4 GB/s | 6 PCI-E 2.0 | 3 Gbit/s, 4 Ports | Rev 2.0, 10 Ports | 6.1
P67 | 2011 | DMI 2.0 | 4 GB/s | 8 PCI-E 2.0 | 6 Gbit/s, 2 Ports & 3 Gbit/s, 4 Ports | Rev 2.0, 14 Ports | 6.1
Z68 | 2011 | DMI 2.0 | 4 GB/s | 8 PCI-E 2.0 | 6 Gbit/s, 2 Ports & 3 Gbit/s, 4 Ports | Rev 2.0, 14 Ports | 6.1
Q67 | 2011 | DMI 2.0 | 4 GB/s | 8 PCI-E 2.0 | 6 Gbit/s, 2 Ports & 3 Gbit/s, 4 Ports | Rev 2.0, 14 Ports | 6.1
Z77 | 2012 | DMI 2.0 | 4 GB/s | 8 PCI-E 2.0 | 6 Gbit/s, 2 Ports & 3 Gbit/s, 4 Ports | Rev 3.0, 4 Ports & Rev 2.0, 10 Ports | 6.7
H77 | 2012 | DMI 2.0 | 4 GB/s | 8 PCI-E 2.0 | 6 Gbit/s, 2 Ports & 3 Gbit/s, 4 Ports | Rev 3.0, 4 Ports & Rev 2.0, 10 Ports | 6.7
Q77 | 2012 | DMI 2.0 | 4 GB/s | 8 PCI-E 2.0 | 6 Gbit/s, 2 Ports & 3 Gbit/s, 4 Ports | Rev 3.0, 4 Ports & Rev 2.0, 10 Ports | 6.7
B75 | 2012 | DMI 2.0 | 4 GB/s | 8 PCI-E 2.0 | 6 Gbit/s, 1 Port & 3 Gbit/s, 5 Ports | Rev 3.0, 4 Ports & Rev 2.0, 8 Ports | 6.7

LGA 1156
Chipsets supporting LGA 1156 CPUs.

Chipset | Date | Bus Interface | Bus Speed | PCI Express lanes | SATA | USB | TDP (W)
P55 | 2009 | DMI | 2 GB/s | 8 PCI-E 2.0 at 2.5 Gbit/s | 3 Gbit/s, 6 Ports | Rev 2.0, 14 Ports | 4.7
H55 | 2010 | DMI | 2 GB/s | 6 PCI-E 2.0 at 2.5 Gbit/s | 3 Gbit/s, 6 Ports | Rev 2.0, 12 Ports | 5.2
H57 | 2010 | DMI | 2 GB/s | 8 PCI-E 2.0 at 2.5 Gbit/s | 3 Gbit/s, 6 Ports | Rev 2.0, 14 Ports | 5.2
Q57 | 2010 | DMI | 2 GB/s | 8 PCI-E 2.0 at 2.5 Gbit/s | 3 Gbit/s, 6 Ports | Rev 2.0, 14 Ports | 5.1

LGA 1366 & LGA 2011
Chipsets supporting LGA 1366 and LGA 2011 CPUs.

Chipset | Date | Bus Interface | Bus Speed | PCI Express lanes | SATA | USB | TDP (W)
X58 | 2008 | QPI | Up to 25.6 GB/s | 36 PCI-E 2.0 at 5 Gbit/s (IOH); 6 PCI-E 1.1 (ICH) | 3 Gbit/s, 6 Ports | Rev 2.0, 12 Ports | 28.6
X79 | 2011 | DMI 2.0 | 4 GB/s | 8 PCI-E 2.0 | 6 Gbit/s, 2 Ports & 3 Gbit/s, 4 Ports | Rev 2.0, 14 Ports | 7.8


10-24. Problems

10-1) Consider the parallel port application shown below. Show how to
build the project under DOS such that when you turn on the relay switch,
the appliance is connected to the AC mains.

10-2) Assessment Project #1

Show how to interface a stepper motor to the 8086 microprocessor using the 8255. Use the following assembly programs to rotate the stepper motor clockwise and anticlockwise from the keyboard, and follow the procedure.
Assembly routine to rotate a stepper motor in clockwise direction
.MODEL SMALL
.STACK 100
.DATA
PORTA EQU 0FFC0H ; PORT A ADDRESS
PORTB EQU 0FFC2H ; PORT B ADDRESS
PORTC EQU 0FFC4H ; PORT C ADDRESS
CWR EQU 0FFC6H ; CONTROL WORD REGISTER ADDRESS
PHASEC EQU 03H
PHASEB EQU 06H ; SEQUENCE IN SERIES TO ROTATE MOTOR
PHASED EQU 0CH ; IN CLOCKWISE DIRECTION
PHASEA EQU 09H
.CODE
START:
MOV AL,80H ; ASSUMED 8255 CONTROL WORD: MODE 0, ALL PORTS OUTPUT
MOV DX,CWR
OUT DX,AL ; PROGRAM THE 8255
AGAIN:
MOV AL,PHASEC
MOV DX,PORTC
OUT DX,AL ; ENERGIZE FIRST PHASE PATTERN
MOV CX,0FFFFH ; DELAY COUNT
UP:
LOOP UP
MOV AL,PHASEB
MOV DX,PORTC
OUT DX,AL
MOV CX,0FFFFH
UP1:
LOOP UP1
MOV AL,PHASED
MOV DX,PORTC
OUT DX,AL
MOV CX,0FFFFH
UP2:
LOOP UP2
MOV AL,PHASEA
MOV DX,PORTC
OUT DX,AL
MOV CX,0FFFFH
UP3:
LOOP UP3
JMP AGAIN ; REPEAT OUTPUT SEQUENCE
INT 03H ; NOT REACHED (BREAKPOINT AFTER THE ENDLESS LOOP)
END START

Assembly routine to rotate stepper motor in anticlockwise direction


.MODEL SMALL
.STACK 100
.DATA
PORTA EQU 0FFC0H ; PORT A ADDRESS
PORTB EQU 0FFC2H ; PORT B ADDRESS
PORTC EQU 0FFC4H ; PORT C ADDRESS
CWR EQU 0FFC6H ; CONTROL WORD REGISTER ADDRESS
PHASEC EQU 03H
PHASEA EQU 09H ; SEQUENCE IN SERIES TO ROTATE MOTOR
PHASED EQU 0CH ; IN ANTICLOCKWISE DIRECTION
PHASEB EQU 06H
.CODE
START:
MOV AL,80H ; ASSUMED 8255 CONTROL WORD: MODE 0, ALL PORTS OUTPUT
MOV DX,CWR
OUT DX,AL ; PROGRAM THE 8255
AGAIN:
MOV AL,PHASEC
MOV DX,PORTC
OUT DX,AL ; ENERGIZE FIRST PHASE PATTERN
MOV CX,0FFFFH ; DELAY COUNT
UP:
LOOP UP
MOV AL,PHASEA
MOV DX,PORTC
OUT DX,AL
MOV CX,0FFFFH
UP1:
LOOP UP1
MOV AL,PHASED
MOV DX,PORTC
OUT DX,AL
MOV CX,0FFFFH
UP2:
LOOP UP2
MOV AL,PHASEB
MOV DX,PORTC
OUT DX,AL
MOV CX,0FFFFH
UP3:
LOOP UP3
JMP AGAIN ; REPEAT OUTPUT SEQUENCE
INT 03H ; NOT REACHED (BREAKPOINT AFTER THE ENDLESS LOOP)
END START

Procedure
1. Connect power supply 5V & GND to both microprocessor trainer kit &
Stepper motor interfacing kit.
2. Connect data bus between microprocessor trainer kit & Stepper motor
interfacing kit.
3. Enter the program to rotate Stepper motor in clockwise &
anticlockwise.
4. Execute the program by typing:
GO E000:00C0 ENTER (for clockwise),
GO E000:0030 ENTER (for anticlockwise).
5. Observe the rotation of stepper motor.
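
The two routines differ only in the order of the four phase patterns. The Python sketch below, which is an illustration and not part of the trainer-kit procedure, shows that the clockwise sequence 03H, 06H, 0CH, 09H is a 4-bit left rotation of the pattern and that the anticlockwise sequence is the corresponding right rotation. The function names are hypothetical.

```python
def rotate_left_4(pattern):
    """Rotate a 4-bit phase pattern one position to the left."""
    return ((pattern << 1) | (pattern >> 3)) & 0x0F


def rotate_right_4(pattern):
    """Rotate a 4-bit phase pattern one position to the right."""
    return ((pattern >> 1) | (pattern << 3)) & 0x0F


def phase_sequence(start, steps, clockwise=True):
    """Return the list of phase patterns written to the port, starting at start."""
    step = rotate_left_4 if clockwise else rotate_right_4
    sequence = []
    pattern = start & 0x0F
    for _ in range(steps):
        sequence.append(pattern)
        pattern = step(pattern)
    return sequence


if __name__ == "__main__":
    print([f"{p:02X}H" for p in phase_sequence(0x03, 4, clockwise=True)])
    print([f"{p:02X}H" for p in phase_sequence(0x03, 4, clockwise=False)])
```

Generating the patterns by rotation avoids keeping two separate tables, one per direction.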


10-25. References

[1] M. Sergent, IBM PC Inside Out, McGraw Hill, 1986.

[2] Peter Norton, Inside the IBM PC, Brady, New York, 1986.

[3] IBM Personal Computer Hardware Reference, 1986.

[4] Intel Corp., Peripheral Components, Santa Clara, CA, 1993.

[5] V. Rajaraman and T. Radhakrishnan, Essentials of Assembly Language Programming for the IBM PC, Prentice-Hall, 2000.

[6] B. Govindarajalu, IBM PC and Clones: Hardware, Troubleshooting and Maintenance, Tata McGraw-Hill Education, 2002.


Introduction to Microprocessors & Interface Circuits CHAPTER 11

Microprocessor Selection Guide
Contents
11-1. Intel Microprocessors Selection Guide
11-2. AMD Microprocessors Selection Guide
11-3. SPARC Microprocessors Selection Guide
11-4. Processor Performance Factors
11-5. Benchmarks
11-6. Microprocessor Packages
11-7. Processor Sockets
11-8. Processor Bus Speed
11-9. Overclocking
11-10. Processor Supply Voltages
11-11. CPU Cooling
11-12. Summary
11-13. Problems

11-1. Intel Microprocessors Selection Guide
At the time of writing the first version of this book, the most recent Intel 80x86 microprocessor was the Intel® Pentium 4. However, the Intel Core2 processors followed the Pentium 4 and became the stars of more recent PCs and other computing platforms. The Pentium 4 processors are based on the Intel NetBurst microarchitecture and maintain the tradition of compatibility with IA-32 software.

In 2005, Intel released a 3.73 GHz Pentium 4 microprocessor with a 1066 MHz front-side bus, based on hyper-threading (HT) technology. In 2006, Intel introduced the Core2 series of microprocessors. The Intel dual-core (Core2) microprocessor allows true parallel computing on desktop PCs. It has 2 complete processor cores in one package running at the same frequency. The Intel Core2 features a new virtualization technology (VT), which allows one hardware platform to function as multiple virtual platforms. The two major features of the Core i7 architecture, developed under the code name Nehalem, are the triple-channel integrated memory controller and the new interface called QuickPath Interconnect, or QPI for short. Additional highlights include the monolithic die, which consolidates all four cores in a single piece of silicon, and the shared L3 cache. The Intel Celeron processors expanded Intel's processor family into the value-priced PC market segment. The Intel Mobile processors offer great performance for full-size, thin-and-light, and desktop-replacement notebooks. In table 11-1, we list the Intel microprocessors introduced since 1971. Note that each kind of processor is aimed at a particular type of use:
1. Desktop processors,
2. Laptop processors,
3. Wireless processors,
4. Server processors,
5. Workstation processors,
6. Network processors, and
7. Embedded processors.

Table 11-1. INTEL Processors, from 1971 to 2015. The indicated data bus width is
the internal data bus.

FAMILY | TRADE NAME (CODE NAME) | CLOCK (MHZ) | MIPS | DATE | NUMBER OF TRANSISTORS | DESIGN RULE (MICRON) | ADDRESS BUS (BITS) | DATA BUS (BITS)
i7-6700 | CORE i7 | 47GHZ | - | 2015 | ----- | 0.014 | 64 | 64/128
i7-965 | CORE i7 | 3.4 GHZ | - | 2011 | 1.16 BILLION | 0.032 | 64 | 64/128
I7-035 | CORE i7 | 3.3 GHZ | - | 2009 | 731 MILLION | 0.045 | 64 | 64/128
80557 | CORE2 DUO | 1.6-2.3G | - | 2007 | 167 MILLION | 0.065 | 64 | 64/128
P4E | EXTREME | 3.75G | 11000 | 2005 | 125 MILLION | 0.09/0.065 | 64 | 64/128
P4 | MADISON | 2400 | - | 2003 | 50 MILLION | 0.13 | 64 | 64/128
80686 | PENTIUM M | 900-1700 | 1500 | 2003 | 77 MILLION | 0.13 – 0.9 | 64 | 64/32
80686 | PENTIUM 4 | 1500 | 1500 | 2001 | 42 MILLION | 0.18-0.13 | 64 | 64/32
80686 | PIII-CELERON | 533-1300 | 300 | 2000 | 7.5 MILLION | 0.18 | 64 | 64/32
80686 | PIII XEON | 733 | 733.0 | 1999 | 28.1 MILLION | 0.18 | 64 | 64/32
80686 | PENTIUM III | 500-1400 | 500 | 1999 | 9.5 MILLION | 0.25-0.18 | 64 | 64/32
80686 | P II XEON | 400 | 400.0 | 1998 | 7.5 MILLION | 0.25 | 64 | 64/32
80686 | PII-CELERON | 266-400 | 300 | 1998 | 7.5 MILLION | 0.25 | 32 | 64/32
80686 | PENTIUM II | 233-300 | 300 | 1997 | 7.5 MILLION | 0.35 | 64 | 64/32
80586 | PENTIUM PRO | 200 MHZ | 200 | 1995 | 5.5 MILLION | 0.35 CMOS | 32 | 64/32
80586 | PENTIUM | 60 / 66 | 100-112 | MAR 1993 | 3.1 MILLION (273 PIN PGA) | 0.8 CMOS | 32 | 64/32
80486 | 80486 DX2 | 20-100 | 50 | MAR 1992 | 1.4 MILLION (168 PIN) | 0.8 CMOS | 32 | 32
80486 | 486 DX | 25-50 | 20-41 | APRIL 1989 | 1.2 MILLION (168 PIN) | 1.0 CMOS | 32 | 32
80386 | 386 DX | 16-33 | 5-11.4 | 1985 | 275000 (132 PIN) | 1.0 CMOS | 32 | 32
80286 | 80286 | 6-12.5 | 2.66 | 1982 | 134,000 | 1.5 NMOS | 24 | 16
8088 | 8088 | 5-8 | 0.33-0.75 | JUNE 1979 | 29,000 (40 PIN) | 3.0 NMOS | 20 | 8/16
8086 | 8086 | 5-10 | 0.33-0.75 | JUNE 1978 | 29,000 (40 PIN) | 3.0 NMOS | 20 | 16
8080 | 8080 | 2-3 | 0.64 | 1974 | 4,500 (40 PIN) | 6.0 NMOS | 16 | 8
4004 | 4004 | 0.74 | 0.07 | 1971 | 2300 | 10 PMOS | 6 | 4
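
The address-bus column of Table 11-1 determines how much memory each processor can address directly: an n-bit address bus selects 2^n locations. The short Python sketch below works this out for a few rows of the table, assuming byte-addressable memory. The function names `addressable_bytes` and `human_readable` are illustrative only and do not come from any Intel document.

```python
def addressable_bytes(address_bits: int) -> int:
    """Return the number of byte locations reachable with an n-bit address bus."""
    return 2 ** address_bits


def human_readable(n_bytes: int) -> str:
    """Format a byte count using binary units (1 KB = 1024 bytes)."""
    units = ["B", "KB", "MB", "GB", "TB"]
    value = n_bytes
    for unit in units:
        # Stop at the first unit in which the value is below 1024,
        # or at the largest unit we know about.
        if value < 1024 or unit == units[-1]:
            return f"{value} {unit}"
        value //= 1024


# Address-bus widths taken from Table 11-1.
for name, bits in [("8080", 16), ("8086/8088", 20), ("80286", 24), ("80386", 32)]:
    print(f"{name}: {bits} address bits -> {human_readable(addressable_bytes(bits))}")
```

Each extra address line doubles the reachable memory, which is why the step from the 20-bit 8086 to the 24-bit 80286 raised the limit from 1 MB to 16 MB.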


Intel 4004 Intel 8088 Intel 80286

Intel 80386 Intel 80486 Intel Pentium

Intel Pentium 4 Intel Core2 Duo Intel Core i7


Fig. 11-1. Photograph of some Intel microprocessors.

11-1.1. Intel Mobile Processors


As shown in the following table, Intel mobile processors include a line of
products such as Centrino. The Intel Pentium 4-M processor is a 0.13 µm
mobile processor with 512 kB of L2 cache and a 400 MHz front-side bus. The
Intel Centrino is a 45 nm mobile processor with Deep Power Down
technology. It has an 800 MHz front-side bus and 6 MB of L2 cache memory.
Table 11-2(a). Examples of Intel Mobile Processors.

PROCESSOR | CLOCK | POWER | VOLT | FSB (MHZ) | L2/L3 CACHE
Pentium 4-M | 2.2 GHz | <2 W | 1.3 V | 400 | 1/4 MB
Core 2 Duo T7700 | 2.4 GHz | <2 W | 1.3 V | 800 | 1/4 MB
Core i7-5950 | 2.9 GHz | 47 W | 0.8 V | 1150 | 1/6 MB


11-1.2. Intel Desktop Processors


As shown in the following table, Intel desktop processors include a large
number of products. The Intel Core i7 Extreme processor, launched in 2011
with 6 cores, was Intel's latest desktop processor at that time. This
processor is built in 45 nm technology. It has an 8 MB L2 cache and runs
at 3.2-3.6 GHz.

Table 11-2(b). Intel Desktop Processors.

Processor Clock FSB L2-Cache


Processor No. Tech. Cores
name (GHz) (MT/s) (MB)
420, 430, 440 1 1.6, 1.8, 2
Celeron 65 nm 800 0.512
E1200, E1400 2 1.6, 2.0
Pentium Dual E2140, E2160, 1.6, 1.8, 2,
1
Core E2180, E2200 2.2
E4300, E4400, 800
65 nm 2 1.8, 2, 2.2,
E4500, E4600,
Core 2 Duo 2.4, 2.6 2
E4700
E6300, E6400 1.86, 2.13 1066
E6300, E6400 2
1.86, 2.13
E6320, E6420 1066
E6600, E6700 2.40, 2.67
Core 2 Duo 65 nm 2
E6540 2.33 4
E6550, E6750, 1333
2.33, 2.67, 3
E6850
E8190 2.67
Core 2 Duo E8200, E8400, 45 nm 2 1333 6
2.67, 3, 3.16
E8500
Core 2
QX9650 45 nm 4 3 1333 12
Extreme

Table 11-2(c). Intel Core i7 desktop processors.

Name | Cores | Threads | Frequency | Turbo | L2 cache | L3 cache | Date
Core i7-6700X | 4 | 12 | 4 GHz | 4.00 GHz | 4x256KB | 8MB | 2015
Core i7-3970X Extreme Edition | 6 | 12 | 3.50 GHz | 4.00 GHz | 6x256KB | 15MB | 2013
Core i7-3960X Extreme Edition | 6 | 12 | 3.30 GHz | 3.90 GHz | 6x256KB | 15MB | 2011
Core i7-3930K | 6 | 12 | 3.20 GHz | 3.80 GHz | 6x256KB | 12MB | 2011
Core i7-3820 | 4 | 8 | 3.60 GHz | 3.80 GHz | 4x256KB | 10MB | 2012


11-1.3. Intel Laptop Processors


As shown in the following table, Intel laptop processors include a large
number of products.
Table 11-2(d). Intel Laptop Processors.

L2-
Processor No. of Clock speed FSB
Processor No. Tech. Cache
name Cores (GHz) (MT/s)
(MB)
520, 530 1.6, 1.73
Celeron M 520, 530, 540, 550, 1.6, 1.73, 533 1
65
560 1 1.86, 2, 2.13
nm
Celeron M
523 0.933 533 1
ULV
Core 2 Solo 65
U2100, U2200 1 1.06, 1.2 533 1
ULV nm
Pentium Dual T2310, T2330, 65 1.46, 1.6,
2 533 1
Core T2370, T2390 nm 1.73, 1.86
Core 2 Duo U7500, U7600, 1.06, 1.2,
533
ULV U7700 1.33
T5300 1.73 533
T5250, T5450,
65 1.5, 1.67,
T5550, T5750, 2 2
nm 1.83, 2.0, 2.1 667
Core 2 Duo T5850
T5500, T5600 1.67, 1.83
T5270, T5470, 1.4, 1.6, 1.8, 800
T7100, T7250 2.0 MT/s
667
L7200, L7400 1.33, 1.5
Core 2 Duo MT/s
4
LV L7300, L7500,
1.4, 1.6, 1.8 800
L7700
T5200 65 1.6 533
2 2
T5500, T5600 nm 1.67, 1.83
T7200, T7400, 667
Core 2 Duo 2, 2.16, 2.33
T7600
4
T7300, T7500, 2, 2.2, 2.4,
800
T7700, T7800 2.6
T8100, T8300 45 2.1, 2.4 3
Core 2 Duo 2 800
T9300, T9500 nm 2.5, 2.6 6


Table 11-2(e). Intel Laptop Processors (Cont.)

Processor name | Processor No. | Tech. | No. of Cores | Clock speed (GHz) | FSB (MT/s) | L2-Cache (MB)
Core 2 Extreme | X9000 | 45 nm | 2 | 2.8 | 800 | 6
Core i7 | I7-2960XM | 32 nm | 4 | 2.7-3.7 | 5000 | 8

11-1.4. Intel Workstation & Server Processors


As shown in the following table, Intel workstation and server processors
include a large number of products.

Table 11-2(f). Intel Workstations and Server Processors.

Clock L2-
Processor No. of FSB
Processor No. Tech. speed Cache
name Cores (MT/s)
(GHz) (MB)
Dual-Core
3040, 3050 65 nm 2 1.86, 2.13 1066 2
Xeon
3040, 3050 1.86, 2.13 2
1066
Dual-Core 3060, 3070 2.4, 2.67
65 nm 2
Xeon 3065, 3075, 2.33, 2.67, 4
1333
3085 3
Dual-Core
E3110 45 nm 2 3.0 1333 6
Xeon
Dual-Core 5128, 5138 1.86, 2.13 1066
Xeon LV 5148 2.33 1333
5110, 5120 65 nm 2 1.6, 1.86 1066 4
Dual-Core
Xeon 5130, 5140, 2, 2.33,
1333
5150, 5160 2.67, 3
E5205 1.86 1066
Dual-Core
X5260 45 nm 2 3.33 1333 6
Xeon
X5272 3.4 1600
Quad-Core X3210, X3220, 2.13, 2.4,
65 nm 4 1066 8
Xeon X3230 2.67


Table 11-2(g). Intel Workstations and Server Processors (Cont.)

Clock L2-
Processor No. of FSB
Processor No. Tech. speed Cache
name Cores (MT/s)
(GHz) (MB)
Quad-Core L5310, L5320 1.6, 1.86 1066
Xeon LV L5335 2 1333
E5310, E5320 1.6, 1.86
65
E5330, E5340, 4 2.13, 2.4, 1066 8
Quad-Core nm
E5350 2.67
Xeon
E5335, E5345, 2, 2.33,
1333
X5355, X5365 2.67, 3
E5405, E5410, 2, 2.33, 2.5,
E5420, E5430 2.67
1333
Quad-Core E5440, E5450, 45 2.83, 3, 3,
4 12
Xeon X5450, X5460 nm 3.16
E5462, E5472, 2.8, 3, 3,
1600
X5472, X5482 3.2
Quad-Core
L7345 1.86 8
Xeon LV
E7310, E7320 65 1.6, 2.13 4
4 1066
Quad-Core nm
E7330 2.4 6
Xeon
E7340, X7350 2.4, 2.93 8

Xeon EX E7-8870 32 nm 10 2.4-2.8 600 30M

Itanium 9350 9350 4 1.73 GHz 4800 24M

Note that the Pentium Pro, Pentium II, Pentium III, and Pentium
III Xeon processors belong to the 32-bit Intel Architecture
(IA-32) processors based on the P6 microarchitecture. The
Pentium 4, Pentium D, and Pentium Extreme Edition processors
are based on the Intel NetBurst microarchitecture. Most early
Intel Xeon processors are also based on the Intel NetBurst
microarchitecture. The original Intel Core processors and some Xeon
processors are based on an improved version of the Pentium M
microarchitecture. The Intel Xeon processor 3x00 and 7x00 series,
Pentium dual-core and Intel Core2 Duo processors are based on the
Intel Core microarchitecture, while Intel Core i7 processors are
based on the Nehalem microarchitecture.


11-2. AMD Processors


Nowadays, the most important competitor to Intel in the microprocessor
market is Advanced Micro Devices (AMD). The final estimate of the PC
microprocessor market share in 2007 showed that Intel accounted for
about 75% and AMD for about 15% of overall microprocessor revenues.

Table 11-3. AMD Processors, from 1975 to 2010.
Processor Series Date
AMD 29000 series, aka 29K, 1987-1995
x86 processors second-sourced, under contract with Intel 1979-1991
Amx86 series 1991–1995
Am386 1991
Am486 1993
Am5x86 (a 486-class µP) 1995
K5 series 1995
AMD K5 (SSA5/5k86)
K6 series 1997–2001
AMD K6 (NX686/Little Foot) 1997
AMD K6-2 (Chompers/CXT)
AMD K6-2-P (Mobile K6-2)
AMD K6-III (Sharptooth)
AMD K6-III-P
AMD K6-2+
AMD K6-III+
K7 series 1999–2005
Athlon (Slot A) (Argon,Pluto/Orion,Thunderbird) 1999
Athlon (Socket A) (Thunderbird) 2000
Duron (Spitfire,Morgan,Applebred) 2000
Athlon MP (Palomino) 2001
Mobile Athlon 4 (Corvette/Mobile Palomino) 2001
Athlon XP (Palomino) 2001
Mobile Athlon XP (Mobile Palomino) 2002
Mobile Duron (Camaro/Mobile Morgan) 2002
Sempron (Thoroughbred,Thorton,Barton) 2004
Mobile Sempron



K8 series 2003– now
Families: Opteron, Athlon 64, Sempron, Turion 64,
Athlon 64 X2, Turion 64 X2
Opteron (SledgeHammer) 2003
Athlon 64 FX (SledgeHammer) 2003
Athlon 64 (ClawHammer/Newcastle) 2003
Mobile Athlon 64 (Newcastle) 2004
Athlon XP-M (Dublin) 2004
Sempron (Paris) 2004
Athlon 64 (Winchester) 2004
Turion 64 (Lancaster) 2005
Athlon 64 FX (San Diego) 2005
Athlon 64 (San Diego/Venice) 2005
Sempron (Palermo) 2005
Athlon 64 X2 (Manchester) 2005
Athlon 64 X2 (Toledo) 2005
Athlon 64 FX (Toledo) 2005
Turion 64 X2 (Taylor) 2006
Athlon 64 X2 (Windsor) 2006
Athlon 64 FX (Windsor) 2006
Athlon 64 X2 (Brisbane) 2006
Athlon 64 (Orleans) 2006
Sempron (Manila) 2006
Opteron (Santa Rosa)
Opteron (Santa Ana)
Mobile Sempron
K9 series
K10 series 2007-now
Opteron (Barcelona) 2007
Phenom FX (Agena FX) 2008
Phenom 9-series (Agena) 2007
Phenom 8-series (Toliman) 2008
Athlon 6-series (Kuma) 2008
Athlon 4-series (Kuma) 2008
Athlon X2 (Rana) 2007
Sempron (Spica) 2007
Opteron (Budapest) 2007
Opteron (Shanghai) 2007
Opteron (Cadiz) 2007
Phenom X6 2010

Note that in 2003, AMD extended the Intel 32-bit architecture (IA-32) to
64 bits, variously called x86-64 or AMD64. The AMD Opteron
processors, the AMD Athlon processors, the AMD Phenom processors and
AMD Turion 64 mobile technology comprise the AMD64 family, as
follows:

• AMD Opteron processor – servers and workstations
• AMD Athlon processor family – desktops and notebooks
• AMD Turion 64 mobile technology – notebooks

AMD64 is designed to enable simultaneous 32- and 64-bit computing
with no degradation in performance. With Direct Connect Architecture,
AMD64 processors address and help eliminate the real challenges and
bottlenecks of system architectures because everything is directly
connected to the central processing unit.

AMD Sempron (front and back sides)

AMD Athlon AMD Phenom


Fig. 11-2. Photographs of some AMD microprocessors.

The AMD Sempron processor family is a line of AMD processors that
superseded the AMD Duron family. The Sempron family is based on two
architectures: K7 and K8. Despite being advertised as a budget line of
AMD processors, Socket A Sempron processors are simply re-branded
Athlon XP processors. Unlike the Duron processors, which had some CPU
features crippled, the Socket A Semprons have all of the features of
Athlon XP CPUs. All Socket A Sempron processors have a 333 MHz (166
MHz DDR) bus speed and are manufactured in 0.13 micron technology.

The AMD Family 10h, or K10, is a microprocessor microarchitecture.
AMD refers to K10 as Family 10h Processors, as it is the
successor of the Family 0Fh Processors (K8). Both 10h and 0Fh refer to
the processor family value returned by the CPUID x86 processor
instruction. In hexadecimal numbering, 0Fh equals the decimal number 15,
and 10h equals 16.
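
The family numbers 0Fh and 10h can be recovered from the processor signature that CPUID function 1 returns in EAX. The sketch below is a minimal Python illustration of that decoding, assuming the usual x86 signature layout (stepping in bits 3:0, base model in bits 7:4, base family in bits 11:8, extended model in bits 19:16, extended family in bits 27:20) and the AMD convention that the extended fields apply only when the base family is 0Fh. The function name and the two sample EAX values are illustrative assumptions, not values read from real hardware here.

```python
def decode_cpuid_signature(eax: int) -> dict:
    """Split the EAX value of CPUID function 1 into family, model and stepping."""
    stepping = eax & 0xF
    base_model = (eax >> 4) & 0xF
    base_family = (eax >> 8) & 0xF
    ext_model = (eax >> 16) & 0xF
    ext_family = (eax >> 20) & 0xFF

    # AMD convention: the extended fields only count when the
    # 4-bit base family field is saturated at 0Fh.
    if base_family == 0xF:
        family = base_family + ext_family
        model = (ext_model << 4) | base_model
    else:
        family = base_family
        model = base_model
    return {"family": family, "model": model, "stepping": stepping}


# Illustrative signatures: one with extended family 0 (K8-style, Family 0Fh)
# and one with extended family 1 (K10-style, Family 10h).
for eax in (0x00020F32, 0x00100F22):
    info = decode_cpuid_signature(eax)
    print(f"EAX={eax:08X}h -> family {info['family']:02X}h ({info['family']} decimal)")
```

Because the base family field is only 4 bits wide, 0Fh is the largest value it can hold; Family 10h is therefore encoded as base family 0Fh plus an extended family of 1.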


11-3. SPARC Processors


The following list is intended to be helpful to users of SPARC-based
computers. There are other types of SPARC CPUs which are less
commonly seen in computing environments.

Table 11.4. SPARC processors from 1987 to 2013


CPU Name Year Model Freq. Arch. Transistors Power L1 Cache L2
(codename) (MHz) version (million) (W) (D+I) (k) Cache (k)
SPARC 1987 TI/ Sun 14.2–40 V7 0.1–1.8 -- 0–128 none
microSPARC I 1992 TI 40–50 V8 0.8 2.5 2+4 none
(Tsunami)
SuperSPARC I 1992 TI / 33–60 V8 3.1 14.3 16+20 0-2048
(Viking) Sun
SPARClite 1992 Fujitsu 66–108 V8E -- -- 16+16 none
hyperSPARC 1993 Ross 40–90 V8 1.5 -- 0+8 128-256
(Colorado 1)
microSPARC II 1994 Fujitsu / 60–125 V8 2.3 5 8+16 none
(Swift) Sun
hyperSPARC 1994 Ross 90–125 V8 1.5 -- 0+8 128-256
(Colorado 2)
SuperSPARC II 1994 Sun 75–90 V8 3.1 16 16+20 1024-2048
(Voyager)
hyperSPARC 1995 Ross 125–166 V8 1.5 -- 0+8 512-1024
(Colorado 3)
TurboSPARC 1995 Fujitsu 160–180 V8 3.0 7 16+16 512
UltraSPARC I 1995 Sun 143–167 V9 5.2 30 16+16 512-1024
(Spitfire)
UltraSPARC I 1998 Sun 200 V9 5.2 -- 16+16 512-1024
(Hornet)
hyperSPARC 1996 Ross 180–200 V8 1.7 -- 16+16 512
(Colorado 4)
SPARC64 1995 Fujitsu 101–118 V9 -- 50 128+128 --
SPARC64 II 1996 Fujitsu 141–161 V9 -- 64 128+128 --
SPARC64 III 1998 Fujitsu 250–330 V9 17.6 -- 64+64 8192
UltraSPARC IIs 1997 Sun 250–400 V9 5.4 25 16+16 1024 /4096
(Blackbird)
UltraSPARC IIs 1999 Sun 360–480 V9 5.4 21 16+16 1024–8192
(Sapphire-
Black)
UltraSPARC IIi 1997 Sun 270–360 V9 5.4 21 16+16 256–2048
(Sabre)
UltraSPARC IIi 1998 Sun 333–480 V9 5.4 21 16+16 2048
(Sapphire-Red) SME
UltraSPARC 2000 Sun 400–500 V9 -- 13 16+16 256
IIe
(Hummingbird)
UltraSPARC IIi 2002 -- 550–650 V9 -- 17.6 16+16 512
(IIe+)
(Phantom)
SPARC64 GP 2000 Fujitsu 400–810 V9 30.2 -- 128+128 8192
SPARC64 IV 2000 Fujitsu 450–810 V9 -- -- 128+128 2048

UltraSPARC III 2001 Sun 600 V9 29 53 64+32 8192
(Cheetah)
UltraSPARC III 2001 Sun 750–900 V9 29 -- 64+32 8192
(Cheetah)
UltraSPARC 2001 Sun 1002– V9 29 80 64+32 8192
IIICu(Cheetah) 1200
UltraSPARC 2003 Sun 1064– V9 87.5 52 64+32 1024
IIIi (Jalapeno) 1593
SPARC64 V 2003 Fujitsu 1100– V9/JPS 190 40 128+128 2048
(Zeus) 1350 1
SPARC64 V+ 2004 Fujitsu 1650– V9/JPS 400 65 128+128 4096
(Olympus-B) 2160 1
UltraSPARC IV 2004 Sun 1050– V9 66 108 64+32 16384
(Jaguar) 1350
UltraSPARC 2005 Sun 1500– V9 295 90 64+64 2048
IV+ (Panther) 2100
UltraSPARC T1 2005 Sun 1000– V9/UA 300 72 8+16 3072
1400
SPARC64 VI 2007 Fujitsu 2150– V9/JPS 540 120 128+128 5120
(Olympus-C) 2400 2
UltraSPARC T2 2007 Sun 1000– V9 / 503 95 8+16 4096
(Niagara 2) 1400 UA
UltraSPARC T2 2008 Sun 1200– V9 / 503 - 8+16 4096
+ (VictoriaFalls) 1400 UA
SPARC64 VII 2008 Fujitsu 2400 V9/JPS 600 135 64+64 6144
UltraSPARC 2009 Sun 2300 V9 / ? ? 32+32 2048
RK (Rock) UA
SPARC64 X 2012 Fujitsu 3000 V9 2950 -- 64x16 2048
SPARC T5 2013 Oracle 3600 V9 (28nm) - 16x8

Having failed long ago on the desktop and still being insignificant in the
overall notebook market (despite the availability of technically
impressive products), SPARC, unlike the Intel architecture, is best viewed
solely as a server processor architecture.

SPARC64 VII UltraSPARC II Oracle T5


Fig. 11-3. Photographs of some SPARC processors, from Sun Microsystems, Fujitsu
and Oracle.


11-4. ARM Microprocessors Selection Guide


The ARM processor range provides solutions for open platforms
in wireless, consumer and imaging applications, embedded real-
time systems for storage, automotive, industrial and networking
applications, and secure applications for VISA and SIM cards.

Fig. 11-4. ARM processors.

There are currently eight product families which make up the ARM
processor range:
ARM7 processor family
ARM9 processor family
ARM9E processor family
ARM10E processor family
ARM11 processor family
Cortex processor family
SecurCore processor family
OptimoDE Data Engines
Further implementations of the ARM architecture are available from ARM's
partners, such as the Intel XScale microarchitecture.

11-5. Processor Performance Factors


Since the performance of a processor is (in most cases) based on how
many instructions it can execute in a given time, it has become common
to use the words "performance" and "speed" somewhat interchangeably.
Unfortunately, the word "speed" has too many meanings when it comes
to processors, and as a result one person will frequently use it to mean
one thing and another person to mean quite another. There are two major
factors that determine the performance level of a processor:

• Clock Speed: The processor clock speed is a measure of how fast
it is running in raw terms, meaning how many clock cycles it has to work
with in a given period of time. Using the analogy of a bicycle, this would
be equivalent to how fast you are pedaling.
• Architecture: The internal design and architecture of the processor
determine how efficiently the processor does its work and how much it
does with each clock cycle. In the bicycle analogy, this would be a
measure of how hard you are pedaling.
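
The two factors above combine multiplicatively: the instruction rate is the clock rate times the average number of instructions completed per clock cycle. The Python sketch below illustrates this with two hypothetical processors; the clock rates and instructions-per-cycle figures are invented for illustration and do not describe any real device.

```python
def instructions_per_second(clock_hz: float, instructions_per_cycle: float) -> float:
    """Instruction throughput = clock rate x average instructions per cycle."""
    return clock_hz * instructions_per_cycle


# Hypothetical processor A: lower clock, more work done per cycle.
rate_a = instructions_per_second(clock_hz=2.0e9, instructions_per_cycle=2.0)
# Hypothetical processor B: higher clock, less work done per cycle.
rate_b = instructions_per_second(clock_hz=3.0e9, instructions_per_cycle=1.0)

print(f"A: {rate_a / 1e6:.0f} MIPS, B: {rate_b / 1e6:.0f} MIPS")
print("A is faster" if rate_a > rate_b else "B is faster")
```

In this made-up case the processor with the lower clock rate executes more instructions per second, which is exactly why clock speed alone can mislead.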

Since clock speed is easy to see and understand while architecture is
complex and difficult to understand, it is not surprising that the former
receives much more attention than the latter. This is unfortunate, because
looking at just the clock speed of a processor is very deceptive: it
tells only one part of the story. In fact, this is more so today than ever
before, due to the much greater variety in processor designs and
technologies.
The "P" rating scheme was invented by Intel competitors AMD and Cyrix
to provide what they felt was a fairer assessment of the value of their
processors than clock speed alone.

One great example showing the importance of architecture is the AMD
K5 and K6 series of processors. Some of these processors actually have
very different performance ratings despite running at the same clock
speed. For example, the K5-PR100 and the K5-PR133 both run at the
same clock speed, 100 MHz, but the PR133 has about 33% better
performance, due to changes in the processor's internal architecture.

Clock speed can be used to compare the performance of processors only
if they are otherwise identical internally. A Pentium 200 is in fact
20% faster than a Pentium 166. But it isn't 20% faster than a Pentium
with MMX 166, because of the improvements in the latter's architecture.
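
The comparisons above can be made explicit by splitting a performance ratio into a clock factor and a remaining per-clock (architecture) factor. The sketch below is only an illustration under the simplifying assumption that performance equals clock rate times work per clock; the helper name `architecture_factor` is hypothetical, and the inputs are the figures quoted in the text (P-ratings 133 and 100 at an identical 100 MHz clock, and Pentium clocks of 200 and 166 MHz).

```python
def architecture_factor(performance_ratio: float, clock_ratio: float) -> float:
    """Per-clock improvement left over once the clock ratio is divided out."""
    return performance_ratio / clock_ratio


# Pentium 200 vs Pentium 166: same internals, so the gain is the clock ratio.
pentium_clock_ratio = 200 / 166
print(f"Pentium 200 vs 166: {100 * (pentium_clock_ratio - 1):.1f}% faster")

# K5-PR133 vs K5-PR100: same 100 MHz clock, rated performance differs by 133/100.
k5_factor = architecture_factor(performance_ratio=133 / 100, clock_ratio=100 / 100)
print(f"K5-PR133 vs K5-PR100: {100 * (k5_factor - 1):.0f}% more work per clock")
```

When the clock ratio is 1, the whole difference must come from the architecture; when the internals are identical, the whole difference comes from the clock.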

11-6. Benchmarks
Benchmarks are standard evaluation programs, which can be run on
different computers to give a measure of their performance. Processors
are frequently benchmarked, in the hopes of coming up with a single
number that can capture the value of the CPU and let it be easily
compared to others. There is a difference between benchmarking
processors and benchmarking whole systems. System benchmarking
attempts to evaluate "real world" performance of a whole system, which
involves much more than just the processor. Processor benchmarks show
isolated performance of processors relative to one another. Even in just
looking at processors, there are many types of benchmarks, which can
result in many different scores.

The number of instructions executed per second, in millions (MIPS), was
the measure of performance for the first microprocessor generations. The
Digital Equipment VAX 11/780 minicomputer was rated at just 1 MIPS.
However, as microprocessors came to be produced by several manufacturers,
with instruction sets that differ radically in their byte count, MIPS
ceased to be an acceptable benchmark in the microprocessor industry.
Alternatively, the number of floating-point operations executed per
second, in millions (MFLOPS) or billions (GFLOPS), has been proposed as
a benchmark. Another interesting benchmark, introduced more recently, is
CPU 2000. CPU 2000 builds on two earlier benchmarks, namely SPECint92
and SPECfp92, both of which have been widely accepted as trustworthy
benchmarks.

The SPECint92 is a performance evaluation standard whose result is
derived from a set of integer benchmarks. It is calculated as the
geometric mean of 6 tests from another benchmark suite called CINT92.
SPECint92 may therefore be used to estimate a machine's single-tasking
performance on integer code. Similarly, SPECfp92 may be used to estimate
a machine's single-tasking performance on floating-point code. For
instance, the Pentium (80586) performance on integer operations is
estimated as 64.5 SPECint92, and its performance on floating-point
operations is about 56.9 SPECfp92. In the following paragraphs we present
some utilities that make use of the above benchmarks:
Norton SI: This is one of the earliest universal benchmarks, created by
Peter Norton of Norton Utilities fame and used as part of the Norton
System Information utility. This benchmark has been around for years,
and in fact is one of the few for which scores are available even for older
processors. It is, however, outdated.

Norton SI32: This is an updated
version of the original Norton SI index and produces scores that are on a
different scale from the old benchmark. It operates under Windows and
uses 32-bit code to perform its assessment. Scores are relative to the
386SX, which is defined as 1.0. This benchmark depends on the system,
so two different systems with the same processor are likely to yield
different results for this test.

CPUmark32: This is a part of the Ziff-Davis Benchmark Operation
WinBench benchmark set. It tests performance executing 32-bit code. It
is widely used and somewhat dependent on the particular machine setup.

iCOMP: This is an acronym for the Intel COmparative Microprocessor
Performance index. This is the "official" benchmark that Intel has chosen
to use to state the relative performance of its processors. It is in fact a
blending of the results from several other benchmarks. Intel used this
rating to produce values for processors ranging from the 386-16 to the
Pentium 166, before replacing it with iCOMP 2.0.

iCOMP 2.0: This is Intel's revised iCOMP benchmark, used for Pentium
and later processors. It is also an amalgam of several other benchmarks
(including SI32 and CPUmark32). It focuses more on 32-bit performance
than the original iCOMP index, and also partially incorporates a
multimedia benchmark.

The following table lists the benchmark scores of
several processors. The benchmark values were obtained from many
different sources. Where possible, we used official values from
manufacturers, after cross-referencing with independent numbers. Values
with a tilde "~" in front of them are extrapolated or approximated.
Table 11-5. Benchmarks of different processors.

Processor Speed ICOMP ICOMP Norton SI Norton CPUmark


2.0 SI32 32
8088 regular -- -- 1.0 -- --
-8 -- -- 1.7 -- --
8086 regular -- -- !? -- --
-8 -- -- !? -- --
-10 -- -- !? -- --
80286 -6 -- -- 3.1 -- --
-10 -- -- 5.6 -- --
-20 ~38 -- ~20 !? --
80386DX -25 49 -- ~25 !? --
-33 68 -- 35 !? --
-40 ~85 -- ~43 !? --


-25 122 -- 54 !? --
80486DX -50 249 -- 109 !? --
-133 ~610 ~67 288 18 ~160
-100 ~610 ~67 264 ~16 ~150
AMD 5x86 -120 ~735 ~81 316 19 ~180
Cyrix 5x86 60 510 51 190 ~16 ~120
66 567 57 211 ~18 ~140
Pentium 75 610 67 237 23 181
100 815 90 317 30 243
133 1110 111 421 36 300
166 1308 127 529 40 343
233 ~2210 203 ~890 62 460
200 -- 220 -- 90 553
Pentium 200 -- ~240 -- 98 611
with MMX 233 -- 267 -- ~115 ~640
300 -- 332 -- !? !?
AMD K5 333 -- 366 -- !? !?
233 -- !? -- ~91 !?
Pentium 266 -- !? -- ~100 !?
Pro 233 -- 267 -- 115 640
300 -- 332 -- !? !?
Pentium II 166 -- !? -- 73 420
233 -- !? -- 91 !?
AMD K6 266 -- !? -- 100 !?

The following benchmarks are more recent, and are currently used to
compare modern microprocessor systems.

PCMark05 from FutureMark is an application-based benchmarking tool
used to measure overall PC performance by running portions of real
applications.

SPECint_base2006 and SPECfp_base2006 are speed-based metrics,
calculated from the time it takes to complete a series of individual
applications, each run one after another. Unless each application is
compiled specifically to be parallel, these tests are single-threaded and
will not show any performance benefits of a multi-core processor.

SPECint_rate_base2006 and SPECfp_rate_base2006 are used to
measure the throughput of a computer that is performing a number of tasks.
This is achieved by running multiple copies of each benchmark
simultaneously, with the number of copies set to the number of
logical hardware cores seen by the operating system.

For instance, the Intel Core2 Duo Processor E6700 has a
SPECint_base2006 of 17.9 and a SPECfp_base2006 of 16.3. The same
processor has a SPECint_rate_base2006 of 39 and a SPECfp_rate_base2006
of 25.5.
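
Composite SPEC figures, from SPECint92 through SPECint_base2006, are geometric means of per-program ratios, where each ratio is a reference time divided by the measured run time. The Python sketch below shows that calculation on hypothetical timings; the function names and all numbers are assumptions made for illustration and are not taken from any published SPEC result.

```python
import math


def geometric_mean(ratios):
    """Geometric mean of positive per-test ratios, computed via logarithms."""
    if not ratios:
        raise ValueError("at least one ratio is required")
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


def spec_style_score(reference_times, measured_times):
    """Composite score: geometric mean of reference_time / measured_time."""
    ratios = [ref / run for ref, run in zip(reference_times, measured_times)]
    return geometric_mean(ratios)


# Hypothetical timings (in seconds) for a three-program suite.
reference = [1000.0, 2000.0, 4000.0]
measured = [100.0, 100.0, 100.0]
print(round(spec_style_score(reference, measured), 2))
```

The geometric mean is used rather than the arithmetic mean so that no single program with an unusually large ratio dominates the composite score.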

Fig. 11-5. Benchmarks of Pentium 3 and Pentium 4 microprocessors.

Fig. 11-6. Benchmarks of Intel's Pentium D, Core2 and AMD's Athlon 64
microprocessors.


11-7. Microprocessor Packages & Marking


Microprocessors are fabricated in different packaging styles, according to
the pin count and operating frequency. The first microprocessors (such as
the 4004, 8080, 8086 and 8088) were usually fabricated in dual inline
packages (DIP). As microprocessor complexity increased (in terms of number
of transistors and functionality), the pin count increased. At the same
time, the operating clock frequency rose from a few MHz
to several GHz in recent microprocessors. Therefore, packages with high
pin counts, like the leaded chip carrier (LCC), small-outline package (SOP),
quad flat pack (QFP), pin-grid array (PGA) and ball-grid array (BGA)
packages², have been widely used in the microprocessor industry. The
80286 microprocessor was realized in a PLCC package, while the 80386 (132
pins), 80486 (168 pins) and Pentium (273 pins) microprocessors were
realized in PGA packages. Usually, the package of the processor carries
some marking data to indicate its speed and cache size as well as its date
of production. Figure 11-8 depicts the Pentium 4 marking.

DIP (dual inline package) QFP (quad flat pack) FC - PGA (pin grid array)

Fig. 11-7. Packages of different microprocessors.

Fig. 11-8. Marking of the Pentium 4 microprocessors.

² and their variants (P = Plastic, C = Ceramic, etc.)

11-8. Processor Sockets


Each PC motherboard is designed for a certain range of
processors. One of the determining factors of processor compatibility is
the socket connector, which is soldered onto the motherboard. Intel
designed a set of sockets, each of which supports a certain range of
CPUs. The 80286 and early versions of the 80386 (132 pins) microprocessor
used PLCC sockets. The 80486 (168 pins) used special PGA sockets, such
as Socket 1, Socket 2 and Socket 3. Socket 4, a 273-pin PGA socket, was
used for the initial Pentium 60 and 66 MHz processors.
Intel Pentium classic and compatible processors plug into Socket 5 or
Socket 7. Sockets 5 and 7 are Staggered Pin Grid Array (SPGA) sockets. The
Pentium Pro introduced a larger Socket 8.

Socket 7 is a 321-pin ZIF processor socket that supports Intel Pentium
processors and AMD processors. This includes all 2.5 V to 3.3 V Pentium
processors at 75 to 233 MHz, the AMD K5 through K6 and the Cyrix 6x86
P120 to P233.

Socket 8 is a 387-pin socket used by the various Intel Pentium Pro and
Pentium II OverDrive processors. The pins are arranged in a 24x26 matrix
that is more rectangular than that of Socket 7. The voltages used with
Socket 8 fall in the range of 3.1 V to 3.3 V. The Intel Pentium Pro 150 to
200 MHz and Pentium II OverDrive 300 to 333 MHz use this socket.

Socket 370: Once Intel changed designs from sockets to slots, people who
were used to the ease of use of the socket designs did not want to use a
slot. So Intel developed Socket 370 (named for the fact that it uses 370
pins). Socket 370 supports Intel Pentium II and III as well as
several Celeron processors.

Socket A is one of the most popular sockets today. This socket consists
of 462 pins and looks similar to other socket types. Socket A
primarily supports AMD Athlon, Duron and Athlon XP processors.

Socket 423 came with the introduction of the Pentium 4. Socket 423
was introduced alongside the Intel 850 motherboard chipset.

Socket 478 is used with Pentium 4 processors. Socket 478 is similar
in appearance to Socket 423 but it supports more pins for extra
capabilities.


Socket 370 for Pentium

Slot 1 for Pentium II

Fig. 11-9. Socket 370 and Slot 1, their locations on the motherboard

Intel's next move was away from sockets to an edge connector
configuration, called Slot 1. This was first used with the Pentium II. All
versions of the Pentium II were packaged on a special daughter-board
that plugs into a card-edge processor slot on the motherboard. The
daughter-board is enclosed within a rectangular box called a Single Edge
Contact (SEC) cartridge. Slot 1 is electrically identical to Socket 8 but is
an edge connector, rather than a Pin Grid Array (PGA) socket. Actually,
the Pentium II required a 242-pin Slot 1, while the Xeon processor used a
330-pin slot called Slot 2. Intel refers to Slot 1 and Slot 2 as SEC-242
and SEC-330 in some of its technical documentation. The daughter-board has
mounting points for the Pentium II CPU itself plus various support chips
and cache memory chips.

Slot 1: With the development of the Pentium II, Intel designed the Single
Edge Contact Cartridge (SECC) to contain its processors. The SECC has
the processor on a small circuit board with 242 small fingers or leads.
This board is then inserted into a special slot on the motherboard called a
Slot 1 connector. Slot 1 connectors support SECC processors
including the Intel Celeron, Pentium II, and some Pentium III processors.

Slot 2: Slot 2 specifies a 330-lead edge connector for the processor
card. It functions similarly to the Slot 1 setup but has more leads on the
connector between the card and the motherboard. It allows the CPU to
communicate with the L2 cache at the CPU's full clock speed, thus enhancing
performance. Slot 2 was primarily designed for use in workstations with
SECC Pentium III and Xeon processors.

Intel changed the Celeron processor interface to a new socket called
Socket 370. With the release of the Pentium 4 in late 2000, Intel
introduced another socket called Socket 423. Indicative of the trend toward
processors with lower power consumption, the PGA-style Socket
423 has a voltage range of 1.0 V to 1.85 V. Socket 423 had been in use for
only a few months when Intel announced the Socket 478 form factor. The
principal difference between Socket 478 and its predecessor is that the
newer socket features a more densely packed arrangement of pins known as
micro Pin Grid Array (µPGA), which reduces the size of the CPU and the
space occupied by the socket on the motherboard. Socket 478 accommodated
the later Pentium 4 processors launched in 2002. The µPGA 478 socket
works with Pentium 4 chips and Pentium 4-based Celeron processors. By
2007, Land Grid Array (LGA) sockets had become popular. With LGA
sockets, the socket contains pins that make contact with pads or lands on
the bottom of the processor package. The LGA 775 socket (also known as
Socket T) was initially introduced to accommodate some late
Pentium 4 and Celeron D processors. Nowadays, LGA 775 sockets are
also available for Core2 Duo and Core2 Quad processors. The Intel LGA
1155, also called Socket H2, supports Intel Sandy Bridge and Ivy Bridge
microprocessors (such as Intel's Core i7). The Intel LGA 1366 (also
known as Socket B) supersedes Intel's LGA 775 (Socket T) in the high-end
desktop segment. It also replaces the server-oriented LGA 771
(Socket J) in the entry level.

The Intel LGA 2011 (also called Socket R) replaced Intel's LGA 1366
(Socket B) and LGA 1567 in the high-end desktop and server platforms.
The socket was released in 2011 and supports Sandy Bridge-E processors
with four channels of DDR3-1600 memory as well as 40 PCIe 2.0 or 3.0 lanes.


Table 11-6 identifies some CPU interface sockets, starting from Intel's
Socket 1.

Table 11-6. Some well-known Intel CPU sockets

Feature      Socket 1  Socket 2   Socket 3         Socket 4  Socket 5  Socket 6  Socket 7
No. of pins  169       238        237              273       320       235       321
Voltage (V)  5         5          3.3/5            5         3.3       3.3       3.3/3.45
CPU          486       486SX, DX  486DX, DX2, DX4  Pentium   Pentium   486DX4    Pentium

Feature      Slot 1      Slot 2      Socket 370  Socket 423  Socket 478  Socket T
No. of pins  242         330         370         423         478         775
Voltage (V)  3.3/2.5     3.3/2.5     3.3/2.5     1 - 1.85    1 - 1.85    1 - 1.85
CPU          Celeron,    Pentium 2,  Celeron,    Pentium 4   Pentium 4   Pentium 4,
             Pentium 3   Pentium 3   Pentium 3                           Core, Core2
Speed < 2.8 GHz > 3 GHz

As for the latest AMD processors, there exists a variety of sockets for
different processors. For instance, all K7-based Sempron processors are
compatible with Socket 462 (Socket A). At about the time the Slot 1
connector began to gain use, AMD developed its own card-and-slot
processor interface (Slot A). Slot A looks the same as Slot 1, and the two
are physically the same size. Slot A allows for a higher bus rate than
Socket 7 and is used primarily with the AMD K7 processor family. K8-based
Sempron processors use Socket 754. Figure 11-10 shows some of these
sockets. Table 11-7 lists some AMD sockets.

Table 11-7. AMD CPU sockets

Characteristics  Slot A      Socket A       Socket 754  Socket AM2
No. of pins      242         462            754         940
Voltage (V)      3.3/2.5     2.5            1 - 1.85    1 - 1.85
Original CPU     AMD Athlon  AMD Athlon XP  Athlon 64,  Athlon 64,
                                            Sempron     Phenom

In 2006, AMD released Socket AM2 for desktop processors. AM2 (940
pins) was announced as a replacement for Socket 754 and Socket 939. AM2
supports AMD Athlon 64, Sempron, Opteron and Phenom processors.
Socket AM2 is part of a generation of AMD CPU sockets that also includes
Socket F for servers and Socket S1 for mobile computing. Socket
AM3 (938-contact PGA) is intended for single AMD processors, with
support for DDR3 SDRAM and separate power planes.


Fig. 11-10. Photographs of Socket 754 (AMD), Socket AM2 (AMD) and Socket 2011 (Intel)


11-9. Processor Bus Speed


The latest version of the Pentium 4 had a front-side bus (FSB) rated at
1066 MHz, but in reality the bus clock ran at 266 MHz and passed four
bits of data per connection per clock tick (a quad-pumped bus). Other
Pentium 4 editions feature a 533 (4×133) or 800 (4×200) MHz system bus.
The Athlon XP came with a 266 MHz double-pumped bus, which ran at
133 MHz and passed two bits of data per connection per tick. The Core i7
processors feature a point-to-point processor interconnect, called Intel
QuickPath Interconnect (QPI), which replaced the legacy front-side bus.
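
The relation between the base bus clock and the advertised bus rate can be checked with a short Python sketch. The function names are illustrative only, and the 64-bit data-bus width used for the bandwidth estimate is an assumption that is not stated in the text above.

```python
def effective_rate_mts(base_clock_mhz, transfers_per_clock):
    """Advertised bus rate (mega-transfers/s) = base clock x transfers per clock tick."""
    return base_clock_mhz * transfers_per_clock


def peak_bandwidth_mb_s(base_clock_mhz, transfers_per_clock, bus_width_bits=64):
    """Peak bandwidth in MB/s; bus_width_bits=64 is an assumed data-bus width."""
    return effective_rate_mts(base_clock_mhz, transfers_per_clock) * bus_width_bits // 8


# Nominal base clocks quoted in the text (name: (base clock in MHz, transfers per tick)).
buses = {
    "Pentium 4, quad-pumped 266 MHz": (266, 4),
    "Pentium 4, quad-pumped 200 MHz": (200, 4),
    "Athlon XP, double-pumped 133 MHz": (133, 2),
}

for name, (base, pump) in buses.items():
    print(f"{name}: {effective_rate_mts(base, pump)} MT/s, "
          f"{peak_bandwidth_mb_s(base, pump)} MB/s peak")
```

With the rounded 266 MHz figure the product is 1064 rather than 1066, because the marketing number is derived from the exact base clock of about 266.67 MHz.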

11-10. Processor Overclocking


Overclocking (OC) is the practice of making a component run at a higher
clock speed than the manufacturer's specification. The idea is to increase
performance for free or to exceed current performance limits, but this
may come at the cost of stability. Think of the 3 GHz rating on your new
Core i7 as a speed limit. Overclocking often takes advantage of the fact
that many manufacturers mark higher-end components as lower-end ones
in order to meet demand for the lower-end component. In this way you can
get extra performance out of your components for free. It is even possible
to reach performance that top-of-the-line components cannot deliver at
their stock settings.

Note: Overclocking
Overclocking will void the warranty on the parts being overclocked.
Doing so may also cause system instability, and may also cause damage
to components and data. Be careful and cautious when overclocking.

The CPU's clock speed is the FSB clock speed (the base, not the effective,
speed) times the CPU's multiplier. On most new CPUs the multiplier is
locked, so you will have to adjust the FSB clock speed. The FSB is not
adjustable on a few motherboards and on many OEM systems. The FSB and
the multiplier, if not locked, are adjustable from within the BIOS. Note
that raising the FSB clock speed also increases the clock speed of many
other components, including RAM. Increase the FSB clock speed only in
small increments of a few MHz at a time. After each increase, boot the
computer to make sure it works. If the computer boots successfully,
increase the FSB some more. If it will not boot, lower the FSB until the
computer boots properly. Repeat until you have found the highest setting
at which the computer will boot. Finally, check that the operating system
is stable at this setting by running a burn-in application or any other
demanding application.
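
The clock formula and the step-by-step search described above can be sketched in Python as follows. This is only an illustration: the `boots` callback stands in for a manual boot test, and the 200 MHz stock FSB, the 15x multiplier and the 218 MHz failure point belong to a hypothetical machine.

```python
def cpu_clock_mhz(fsb_mhz, multiplier):
    """Core clock = base (not effective) FSB clock x CPU multiplier."""
    return fsb_mhz * multiplier


def highest_bootable_fsb(start_mhz, step_mhz, boots, max_mhz=400):
    """Raise the FSB in small steps until the machine stops booting.

    `boots` is a caller-supplied predicate standing in for a manual boot test.
    `max_mhz` is a safety cap so the loop always terminates.
    Returns the last FSB setting that still booted, or None if even the
    starting value fails.
    """
    if not boots(start_mhz):
        return None
    fsb = start_mhz
    while fsb + step_mhz <= max_mhz and boots(fsb + step_mhz):
        fsb += step_mhz
    return fsb


# Hypothetical machine: stock 200 MHz FSB, 15x multiplier, fails above 218 MHz.
def simulated_boot(fsb_mhz):
    return fsb_mhz <= 218


best = highest_bootable_fsb(start_mhz=200, step_mhz=3, boots=simulated_boot)
print("stock clock:", cpu_clock_mhz(200, 15), "MHz")
print("highest bootable FSB:", best, "MHz ->", cpu_clock_mhz(best, 15), "MHz")
```

A setting found this way only shows that the machine boots; the stability test under load described above is still needed.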


11-11. Processor Supply Voltages


The trend is toward a split power supply, with 2.8 V (now as low as 1.8 V)
to the core of the processor and 3.3 V to the I/O and interface circuitry.
The Core processors can consume as much as 100 W, so the voltage
regulators supplying the 3.3 V should be able to deliver about 30 A to the
processor. Better system boards now use switch-mode power supplies
on the board to regulate the supply voltages to the processor chip. This
gives less heat dissipation than linear regulators.
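
The 30 A figure follows from I = P / V. The Python sketch below repeats that arithmetic and compares the heat lost in a linear regulator with that of a switch-mode regulator. The 5 V input, the 10 A load and the 90% efficiency are assumed values chosen for illustration, not data from this chapter.

```python
def supply_current_a(power_w, voltage_v):
    """Current drawn from a supply rail: I = P / V."""
    return power_w / voltage_v


def linear_regulator_loss_w(v_in, v_out, current_a):
    """A linear regulator drops (v_in - v_out) at the full load current."""
    return (v_in - v_out) * current_a


def switching_regulator_loss_w(v_out, current_a, efficiency):
    """Loss of a switch-mode regulator with the given efficiency (0..1)."""
    p_out = v_out * current_a
    return p_out / efficiency - p_out


i_33 = supply_current_a(100, 3.3)
print(f"100 W at 3.3 V -> {i_33:.1f} A")

# Assumed case: 5 V input regulated down to 3.3 V at a 10 A load.
linear_loss = linear_regulator_loss_w(5.0, 3.3, 10)
switch_loss = switching_regulator_loss_w(3.3, 10, 0.90)
print(f"linear regulator loss: {linear_loss:.1f} W")
print(f"switch-mode loss at 90% efficiency: {switch_loss:.1f} W")
```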

Fig. 11-11. Chronological evolution of Processor supply voltages

11-12. CPU Cooling


CPU cooling is very important and should not be overlooked. A lower-
than-average CPU temperature prolongs CPU life (to more than 10 years).
On the other hand, high CPU temperatures can cause unreliable operation,
such as computer freezes or slow operation. Extremely high temperatures
can destroy the CPU immediately by melting the materials in the chip
and changing the physical shape of the sensitive transistors on the CPU.
Because of this, never switch on the computer if your CPU has no cooling
at all. Powering up 'just to test whether my CPU works' is a serious
mistake: the CPU can fry in less than 5 seconds, and you will be off to buy
a new one. Most CPU installations use forced-air cooling, but convection
cooling and water cooling are also options. For traditional forced-air
cooling, the heat sink and fan (HSF) included with most retail CPUs is
usually sufficient to cool the CPU at stock speed.


Overclockers might want to use a more powerful fan, or water cooling,
because of the increased heat of overclocking. HSFs with decent
performance are usually copper-based. The cooling effect is enhanced if
the HSF has heat pipes. Silent HSFs provide nearly silent cooling.
Many retail heat sink and fan units have a thermal pad installed, which
transfers heat from the CPU to the heat sink and so helps to dissipate the
heat created by the CPU. This pad is usable only once. If you wish to move
the HSF from another CPU to your new one, or need to take it off for some
reason, you will need to remove the old pad and apply thermal paste or
another thermal pad.

Fig. 11-12. Photograph of a laptop cooler

Note that some of the cheaper pads can melt in unexpected heat and may
cause problems, and potentially even damage, if you are overclocking. In
either case, thermal paste is usually more effective, just harder to apply. If
you are planning a long-term installation, a thermal pad is suggested.
Non-conductive thermal pastes made of silicone are the cheapest and safest.
Silver-based thermal pastes sometimes perform better than normal
thermal pastes, and carbon-based ones perform better still. Some
low-noise CPU cooling fans require special mounting hardware on the
motherboard. Be sure that the cooling fan you choose is compatible with
your motherboard.

11-13. Summary

As computer architecture advanced, it became more difficult to compare
the performance of various computer systems simply by looking at their
specifications. Therefore, tests were developed that allowed comparison
of different architectures. For example, Pentium 4 processors generally
operate at a higher clock frequency than Athlon XP processors, which
does not necessarily translate to more computational power. A slower
processor, with regard to clock frequency, can perform as well as a
processor operating at a higher frequency. Benchmarks are standard
evaluation programs, which can be run on different computers to give a
measure of their performance.
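
One simple way to see why clock frequency alone is misleading is the first-order model in which throughput equals the clock rate times the average number of instructions completed per cycle (IPC). The Python sketch below uses this model, which is an assumption brought in for illustration. The two processors and their IPC values are hypothetical, not measured data.

```python
def instructions_per_second(clock_ghz, ipc):
    """First-order throughput model: clock rate x average instructions per cycle."""
    return clock_ghz * 1e9 * ipc


# Hypothetical processors; the IPC figures are illustrative, not measured.
cpu_a = {"name": "CPU A", "clock_ghz": 3.0, "ipc": 1.0}
cpu_b = {"name": "CPU B", "clock_ghz": 2.0, "ipc": 1.5}

for cpu in (cpu_a, cpu_b):
    mips = instructions_per_second(cpu["clock_ghz"], cpu["ipc"]) / 1e6
    print(f"{cpu['name']}: {cpu['clock_ghz']} GHz x {cpu['ipc']} IPC = {mips:.0f} MIPS")
```

In this model the 2 GHz part matches the 3 GHz part because it completes more instructions per cycle. Real benchmarks exist precisely because IPC depends on the workload and cannot be read off a specification sheet.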

In this chapter, we presented the recent releases of Intel and AMD x86
microprocessors. We also recapitulated all the Intel microprocessors
introduced since 1974. As shown in the following table, the
Pentium 4 processors are based on the Intel NetBurst microarchitecture
and still maintain the tradition of compatibility with IA-32 software. The
Intel Core processors (e.g., Core 2 and Core i7) followed the Pentium 4
and became the stars of recent PCs and other computing platforms.


Processor         Clock (GHz)  Socket                  Tech (nm)          Power (W)    Cores          Bus speed, MHz (MT/s)   L2 cache              L3 cache
Pentium           0.06 - 0.2   Socket 2, 3, 4, 5, 7    800 - 350          NA           1              50 - 66                 -                     -
Pentium MMX       0.12 - 0.3   Socket 7                350 - 250          NA           1              60 - 66                 -                     -
Atom              0.8 - 2.13   PBGA437, PBGA441        32, 45             0.65 - 13    1, 2           400, 533, 667, 2.       512KB - 1MB           -
Celeron           0.266 - 3.6  Slot 1, Socket 370,     45, 65, 90, 130,   5.5 - 6      1, 2           66, 100, 133, 400,      0KB - 1MB             -
                               478, 495, LGA 775,      180, 250                                       533, 800
                               Socket M, Socket T
Pentium Pro       0.15 - 0.2   Socket 8                350, 500           29.2 - 47    1              60, 66                  256KB, 512KB, 1024KB  -
Pentium II        0.233 - 0.45 Slot 1, MMC-1, MMC-2,   250, 350           16.8 - 38.2  1              66, 100                 256KB - 512KB         -
                               Mini-Cartridge
Pentium III       0.45 - 1.4   Slot 1, Socket 370      130, 180, 250      17 - 34.5    1              100, 133                256KB - 512KB         -
Xeon              0.4 - 4.4    Slot 2, Socket 603,     45, 65, 90, 130,   16 - 165     1, 2, 4, 6, 8  100, 133, 400, 533,     256KB - 12MB          4MB - 16MB
                               Socket 604, Socket J,   180, 250                                       667, 800, 1066, 1333,
                               T, B, LGA 1156,                                                        1600, 4800, 5860, 6400
                               LGA 1366
Pentium 4         1.3 - 3.8    Socket 423, Socket 478, 65, 90, 130, 180   21 - 115     1              400, 533, 800, 1066     256KB - 2MB           -
                               LGA 775, Socket T
Pentium 4 Extreme 3.2 - 3.73   Socket 478, Socket T    90, 130            92 - 115     1              800, 1066               512KB - 1MB           0KB - 2MB
Edition
Pentium M         0.8 - 2.266  Socket 479              90, 130            5.5 - 27     1              400, 533                1MB - 2MB             -
Pentium D/EE      2.66 - 3.73  Socket T                65, 90             95 - 130     2              533, 800, 1066          2×1 MiB, 2×2 MiB      -
Pentium Dual-Core 1.6 - 2.93   Socket 775, Socket M,   45, 65             10 - 65      2              533, 667, 800, 1066     1MB - 2MB             -
                               P, T


Processor         Clock (GHz)  Socket                  Tech (nm)          Power (W)    Cores          Bus speed, MHz (MT/s)   L2 cache              L3 cache
Pentium New       1.2 - 3.33   Socket 775, LGA 1156,   32, 45, 65         5.5 - 73     1, 2           800, 1066, 2.5 GT/s,    2×256KB - 2MB         0KB - 3MiB
                               LGA 1155                                                               5 GT/s
Core              1.06 - 2.33  Socket M                65                 5.5 - 49     1, 2           533, 667                2MB                   -
Core 2            1.06 - 3.33  Socket 775, Socket M,   45, 65             5.5 - 150    1, 2, 4        533, 667, 800, 1066,    1MB - 12MB            -
                               P, J, T                                                                1333, 1600
Core i3           2.4 - 3.4    LGA 1156, LGA 1155      22, 32             35 - 73      2              1066, 1600, 2.5-5 GT/s  256KB                 3MB - 4MB
Core i5           1.06 - 3.46  LGA 1156, LGA 1155      22, 32, 45         17 - 95      2, 4           2.5-5 GT/s              256KB                 4MB - 8MB
Core i7           1.6 - 3.6    LGA 1156, 1366, 2011    22, 32, 45         45 - 130     4              4.8 GT/s, 6.4 GT/s      4×256KB               6MB - 10MB
Core i7           3.2 - 4      LGA 1366, LGA 2011      32, 22, 14         130          6              6.4 GT/s                6×256KB               12MB - 15MB

The Core i7 family of processors features an inclusive shared L3 cache
that can be up to 12 MB in size. The following figure shows the different
types of caches and their layout for the Core i7 quad-core processor.

In the older architectures, the front-side bus (FSB) was the interface for
exchanging data between the CPU and the chipset north bridge. If the
CPU had to read or write into system memory or over the PCIe bus, then
the data had to traverse over the external FSB. In the new Nehalem
microarchitecture, Intel moved the memory controller and PCIe controller
from the north bridge onto the CPU die. These changes help increase
data-throughput and reduce the latency for memory and data transactions.

11-14. Problems

11-1) Draw a graph indicating the temporal evolution of the number of
transistors integrated in Intel's microprocessors.

11-2) Draw a graph indicating the temporal evolution of the MIPS
benchmark of Intel's microprocessors.

11-3) Compare the different 80x86 processors, indicating the number of
pins, sockets, operating voltage and power dissipation.

11-4) Compare the Pentium 4 and AMD K7 processors in terms of speed,
number of pins, sockets, operating voltage and power dissipation.

11-5) Compare the Intel Core 2 Duo and AMD Sempron processors in terms
of speed, number of pins, sockets, operating voltage and power dissipation.

11-6) Explain how the improvements make the Core i7 family of
processors ideal for test and measurement applications such as high-speed
design validation and high-speed data record and playback.


